Basic Guide to Database Normalisation: First Normal Form
Posted by daniel | Posted in Features, Tutorial | Posted on 29-12-2009
Tags: 1NF, 2NF, Database, Database Fields, Database Optimisation, Database Performance, Database Records, Database Struture, Edgar F Codd, First Normal Form, Forms, mySQL, Normalisation, Normalization, Oracle, Primary Keys, SQL
0
Before I start talking about the normal forms, I thought it would be best to explain what I am planning to write about, and what normalisation is.
I am going to cover what normalisation is for beginners and try to write it in such a way a novice will at least understand the key principles.
If I have missed anything or written something incorrect as always please let me know.
So… What is normalistation?
Normalisation is effectively a set of rules which can be followed when creating a database structure. I have notably put “can” be followed. Sometimes people specifically break a rule for such reasons as database performance. However I am not planning to discuss this area in my post.
There are 7 forms of Normalisation. They are labelled First Normal Form, up to Sixth Normal Form. Before anyone says that is only 6 – there is one more. It sits between 3 and 4. It is called Boyce-Codd Normal Form — It was named after the people who created it.
First Normal Form (1NF)
1NF was first created by Edgar F. Codd in 1970. The first rule is really the basis of table structures. Effectively it states that you should create separate tables for groups of data and give that data a unique reference (commonly known as the primary key).
One way to notice if areas which you need to assess, is if the data repeats either in columns or rows. So for instance if we have teachers at a school we may have a table like this (data repeats in rows):
| FirstName | Surname | Lesson | Date | Room |
| Chris | Jones | IT | 23/01/2009 11:00 | 102 |
| Chris | Jones | IT | 23/01/2009 14:00 | 110 |
| Chris | Jones | Business | 23/01/2009 16:00 | 102 |
| Mary | Charles | IT | 23/01/2009 11:00 | 119 |
| Mary | Charles | IT | 23/01/2009 15:00 | 119 |
So here we have a few items which repeat. So firstly we have a look at what data repeats. Firstname and Surname both repeat as a combined unit. So logically we need a table called “Teachers”. Subjects repeat and so do rooms. However they repeat on there own, so there is no information about the room, or the subject which repeats with it – so they would not need to be in individual tables.
So then we would have a table called “Lessons” with the remaining data. So teachers would look like this:
| TeacherCode | Firstname | Surname |
| 1 | Chris | Jones |
| 2 | Mary | Charles |
And Lessons:
| LesssonCode | Subject | Date | TeacherCode | Room |
| L1 | IT | 23/01/2009 11:00 | 1 | 102 |
| L2 | IT | 23/01/2009 14:00 | 1 | 110 |
| L3 | Business | 23/01/2009 16:00 | 1 | 102 |
| L4 | IT | 23/01/2009 11:00 | 2 | 119 |
| L5 | IT | 23/01/2009 15:00 | 2 | 119 |
If however we had a information about the room – for instance location, or amount of computers – I would then separate the rooms into another table. The same would apply for subject.
This could easily be replication in columns. So for instance if you have customers which have multiple phone numbers you should not have columns called Telephone1, Telephone2, Telephone3, Telephone4 etcetera. The telephone numbers should be in a separate table with a unique key.
The second rule is that each intersection between a field and a record should only contain one piece of information. So for instance you should not have a field called “Name”. It should be split into first name, surname, and maybe
even middle name or initial. That way each intersection contains one piece of information.
The next rule is that all data should relate to the domain of the table. The domain is what the data is about.
So for the above example we have domains of Teachers and Lessons. So we should not for instance store information about Lessons within the Teachers domain.
The final rule is there should be no duplicate rows. So in our example what would you do if you had two people called Chris Jones. That is why in our 1NF example we have a unique reference. This is usually called the primary key and
is the reference for the record within the table.
Why?
Well one of the most obvious reasons for this level is that if Mary for instance gets
married and changes her name. If the table is laid out correctly this name
change should happen in one place.

If you have laid the tables out incorrectly
the amount of changes could be limitless. If you store the same information
multiple times, and have to change it in multiple places. When you have to do
this it is also the concern that data will be missed in the change. Then you
would have data that conflicted.
A second reason is that you may require to identify separate data that is
identical. So for instance two people with the same name.
Also there is data storage. With repeating data it can cause database size
to be inflated.
The next level up is Second Normal Form (2NF). By default if you have
completed the 2NF or higher you have also done the 1NF. I will do another post
in the future discussing 2NF, so stay tuned.
I hope you found this post useful, Happy Holidays and New year from all at Cardboard Coder.



