To understand normalization in a database with example tables, let us assume that we are supposed to store the details of courses and instructors. Insertion, update and deletion anomalies are very frequent if a database is not normalized.


When developing the schema of a relational database, one of the most important aspects to take into account is ensuring that duplication is minimized.

This is done for 2 purposes: to reduce the storage the database needs, and to avoid inconsistencies when data is modified.

Database Normalization is a technique that helps in designing the schema of the database in an optimal manner so as to ensure the above points. The core idea of database normalization is to divide the tables into smaller subtables and store pointers to data rather than replicating it. For a better understanding of what we just said, here is a simple normalization example. Suppose a sample database stores each course in a single table, together with the name and mobile number of its instructor.

At first, this design seems to be good. However, issues start to develop once we need to modify information. For instance, suppose Prof. George changes his mobile number. In such a situation, we will have to make edits in 2 places.

What if someone edited the mobile number against one CS course, but forgot to edit it for the other? The table would then hold two conflicting numbers for the same person. This problem can be avoided by dividing the table into 2 simpler tables. Basically, we store the instructors separately, and in the course table we do not store the entire data of the instructor. We rather store the ID of the instructor. Then, if we were to change the mobile number of Prof. George, it can be done in exactly one place.
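As a rough illustration of the split described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, course codes and phone numbers are all assumptions made up for this example; they are not taken from the original tables.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructors (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        mobile TEXT NOT NULL
    );
    CREATE TABLE courses (
        code          TEXT PRIMARY KEY,
        instructor_id INTEGER NOT NULL REFERENCES instructors(id)
    );
    INSERT INTO instructors VALUES (1, 'Prof. George', '555-0100');
    INSERT INTO courses VALUES ('CS101', 1), ('CS154', 1);
""")

# The mobile number lives in exactly one row, so a single UPDATE
# changes it for every course taught by this instructor.
conn.execute("UPDATE instructors SET mobile = '555-0199' WHERE id = 1")

# Joining on the instructor ID reassembles the original wide view.
rows = conn.execute("""
    SELECT c.code, i.name, i.mobile
    FROM courses AS c
    JOIN instructors AS i ON i.id = c.instructor_id
    ORDER BY c.code
""").fetchall()
print(rows)
```

Both course rows report the updated number after the single `UPDATE`, because neither of them stores the number itself.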

Further, if you observe, the mobile number now need not be stored 2 times. We have stored it at just 1 place. This also saves storage. This may not be obvious in the above simple example. However, think about the case when there are hundreds of courses and instructors and for each instructor, we have to store not just the mobile number, but also other details like office address, email address, specialization, availability, etc.

In such a situation, replicating so much data will increase the storage requirement unnecessarily.

The above is a simplified example of how database normalization works. We will now study it more formally. Each normal form has an importance which helps in optimizing the database to save storage and to reduce redundancies.

The First Normal Form simply says that each cell of a table should contain exactly one value. Let us take an example. Suppose we are storing the courses that a particular instructor takes. We could store all of an instructor's courses in a single cell of that instructor's row.

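Here, the issue is that in the first row, we are storing 2 courses against one professor. A better method would be to store the courses separately, with one row per instructor and course under the columns "Instructor's name" and "Course code". Also, observe that each row stores unique information. There is no repetition. This is the First Normal Form.

For a table to be in the Second Normal Form, this article uses 2 conditions: the table should be in the First Normal Form, and its primary key should consist of exactly 1 column. (More precisely, the Second Normal Form requires that no non-key column depends on only a part of a multi-column key; a 1 column primary key satisfies this trivially.) The first point is obviously straightforward since we just studied 1NF. Let us understand the second point: the 1 column primary key. Well, a primary key is a set of columns that uniquely identifies a row.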


Basically, no 2 rows have the same primary keys. Here, in this table, the course code is unique.

So, that becomes our primary key. Let us take another example of storing student enrollment in various courses.

DBMS Normalization: 1NF, 2NF, 3NF and BCNF with Examples

Each student may enrol in multiple courses. Similarly, each course may have multiple enrollments. A sample table would have 2 columns. Here, the first column is the student name and the second column is the code of the course taken by the student.

Neither column is unique on its own. For instance, the course code column is not unique, as we can see that there are 2 entries corresponding to the same CS course code in row 2 and row 4. However, the tuple (student name, course code) is unique, since a student cannot enroll in the same course more than once. So, these 2 columns, when combined, form the primary key for the table.
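The composite key described above can be sketched with `sqlite3` as follows. The course codes are hypothetical placeholders, and the `try_enroll` helper is an assumption introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key is the pair (student_name, course_code), not either column alone.
conn.execute("""
    CREATE TABLE enrollments (
        student_name TEXT NOT NULL,
        course_code  TEXT NOT NULL,
        PRIMARY KEY (student_name, course_code)
    )
""")
# Hypothetical rows: the same course code appears in row 2 and row 4,
# and the same student appears in row 1 and row 3.
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [("Rahul", "CS101"), ("Rajat", "CS152"), ("Rahul", "CS154"), ("Raman", "CS152")],
)

def try_enroll(student, course):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO enrollments VALUES (?, ?)", (student, course))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_enroll("Rahul", "CS101")  # pair already present
new_pair_accepted = try_enroll("Raman", "CS101")   # both values seen before, pair is new
print(duplicate_accepted, new_pair_accepted)
```

The database rejects only a repeated pair; repeating a student name or a course code on its own is allowed.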

To move this table from 1NF to 2NF, we can rather break it into 2 tables:

Student name    Enrolment number
Rahul           1
Rajat           2
Raman           3

Here the second column indicates the enrollment number for the student. Clearly, the enrollment number is unique.

Now, we can attach each of these enrollment numbers to course codes.

Before we delve into the details of third normal form, let us understand the concept of a functional dependency on a table.

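Column A is said to be functionally dependent on column B if the value of B determines the value of A, so that changing the value of B in a row may require a change in the value of A. As an example, consider a table with a course code column, a professor name column and a department column.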

Here, the department column is dependent on the professor name column. This is because if in a particular row, we change the name of the professor, we will also have to change the department value.
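A dependency like this can be checked mechanically against sample rows. The sketch below is one simple way to do it; the `holds` function, the column names and the rows are assumptions made for illustration, and a check on sample data can only refute a dependency, never prove it for all future data.

```python
def holds(rows, determinant, dependent):
    """Return True if, in these rows, each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        if key in seen and seen[key] != row[dependent]:
            return False  # same determinant, two different dependent values
        seen[key] = row[dependent]
    return True

# Hypothetical rows, not the article's original table.
rows = [
    {"course": "CS101", "professor": "Prof. George", "department": "Computer Science"},
    {"course": "CS154", "professor": "Prof. George", "department": "Computer Science"},
    {"course": "MA214", "professor": "Prof. Ronald", "department": "Mathematics"},
]

department_depends_on_professor = holds(rows, "professor", "department")
course_depends_on_department = holds(rows, "department", "course")
print(department_depends_on_professor, course_depends_on_department)
```

In these rows the department is determined by the professor, while the course is not determined by the department, since one department offers two courses.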

As an example, suppose the MA course is now taken by Prof. Ronald, who happens to be from the Mathematics department. Both the professor name and the department in that row have to be updated.

Here, when we changed the name of the professor, we also had to change the department column. This is not desirable, since someone who is updating the database may remember to change the name of the professor, but may forget to update the department value.

This can cause inconsistency in the database. Third normal form avoids this by moving the professor's details into a separate table keyed by an ID. Whenever we want to refer to the professor from another table, we can simply use the ID.

Boyce-Codd Normal Form is a stricter version of third normal form. Let us first understand what a superkey means. Here, the first column (course code) is unique across various rows. So, it is a superkey. Consider the combination of columns (course code, professor name). It is also unique across various rows.


So, it is also a superkey. A superkey is basically a set of columns such that the value of that set of columns is unique across various rows.

That is, no 2 rows have the same set of values for those columns. A superkey that is minimal, meaning that no column can be removed from it without losing uniqueness, is called a candidate key. For instance, the first superkey above has just 1 column. The second one and the last one have 2 columns. So, the first superkey (course code) is a candidate key.
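To make the two definitions concrete, here is a small sketch that tests column sets for uniqueness and then keeps only the minimal ones. The function names and the sample rows are assumptions for illustration. Uniqueness in sample data only suggests a key; whether a column set is really a key depends on the rules of the domain.

```python
from itertools import combinations

def is_superkey(rows, columns):
    """True if no two rows share the same values for the given columns."""
    projected = [tuple(row[c] for c in columns) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows):
    """Return the minimal superkeys: those with no smaller key inside them."""
    columns = list(rows[0])
    keys = []
    # Visit column sets from smallest to largest so minimal keys are found first.
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical rows, not the article's original table.
rows = [
    {"course": "CS101", "professor": "Prof. George", "department": "Computer Science"},
    {"course": "CS154", "professor": "Prof. George", "department": "Computer Science"},
    {"course": "MA214", "professor": "Prof. Ronald", "department": "Mathematics"},
]

pair_is_superkey = is_superkey(rows, ("course", "professor"))
keys = candidate_keys(rows)
print(pair_is_superkey, keys)
```

The pair (course, professor) is a superkey but not a candidate key here, because the course column alone is already unique.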


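A table is said to be in Boyce-Codd Normal Form if, for every functional dependency in which a set of columns B is determined by a set of columns A, at least one of the following holds.

It is a trivial functional dependency: a trivial functional dependency means that all columns of B are contained in the columns of A.

A is a superkey: basically, if a set of columns B can be determined knowing some other set of columns A, then A should be a superkey. A superkey basically determines each row uniquely.

If neither holds, the value of B is repeated in every row that shares the same value of A, and this may lead to an inconsistent database.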
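The two conditions can be checked with the standard attribute-closure technique: a set of columns A is a superkey exactly when the columns reachable from A through the functional dependencies cover the whole table. The sketch below assumes a hypothetical table with columns course, professor and department, and assumes the two listed dependencies; neither is taken from the original tables.

```python
def closure(attributes, fds):
    """All columns determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_columns, fds):
    """Dependencies that are neither trivial nor determined by a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        lhs_is_superkey = closure(lhs, fds) == set(all_columns)
        if not trivial and not lhs_is_superkey:
            violations.append((lhs, rhs))
    return violations

# Assumed dependencies: a course has one professor, a professor has one department.
columns = {"course", "professor", "department"}
fds = [
    (("course",), ("professor",)),
    (("professor",), ("department",)),
]

violations = bcnf_violations(columns, fds)
print(violations)
```

Under these assumptions the dependency of department on professor is reported, because professor alone does not determine the course and so is not a superkey. Moving professor and department into their own table removes the violation.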

A table is said to be in fourth normal form if it does not contain two or more independent, multivalued facts describing the relevant entity.

The various forms of database normalization are useful while designing the schema of a database in such a way that there is no data replication which may possibly lead to inconsistencies.

While designing a schema for an application, we should always think about how we can make use of these forms.

Aman Goel: entrepreneur, coder, speed-cuber, blogger, fan of Air Crash Investigation. Fascinated by the world of technology, he went on to build his own start-up, AllinCall Research and Solutions, to build the next generation of Artificial Intelligence, Machine Learning and Natural Language Processing based solutions to power businesses.

A lock is the mechanism that prevents data from being overwritten.

Database locks serve to protect shared resources or objects such as tables and rows.

In the star schema, dimensions are denormalized.

For example, suppose you have an employee dimension and each employee belongs to a particular department. Then in a star schema, you will only have the employee table and repeat the department data for each employee. This increases data retrieval speed because no join is needed, but the repeated data takes extra storage rather than saving it.
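A denormalized employee dimension of this kind might look like the following sketch. The table name, column names and rows are hypothetical and chosen only to show the shape of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical denormalized dimension: department attributes are stored
# directly in each employee row instead of in a separate department table.
conn.executescript("""
    CREATE TABLE employee_dim (
        employee_id     INTEGER PRIMARY KEY,
        employee_name   TEXT NOT NULL,
        department_name TEXT NOT NULL,
        department_city TEXT NOT NULL
    );
    INSERT INTO employee_dim VALUES
        (1, 'Asha',  'Sales',   'Pune'),
        (2, 'Bruno', 'Sales',   'Pune'),
        (3, 'Chen',  'Finance', 'Delhi');
""")

# Filtering by department needs no join.
sales = conn.execute(
    "SELECT employee_name FROM employee_dim "
    "WHERE department_name = 'Sales' ORDER BY employee_id"
).fetchall()

# The department's city is stored once per employee in that department.
copies = conn.execute(
    "SELECT COUNT(*) FROM employee_dim WHERE department_city = 'Pune'"
).fetchone()[0]
print(sales, copies)
```

The query reads a single table, and the city of the Sales department appears in two rows, once for each of its employees.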

Fact tables are completely normalized because the redundant information is maintained in the dimension tables. Dimension tables can be normalized or denormalized. If anyone says that a fact table is denormalized because it might contain duplicate foreign keys, that would be only partially correct. There can be some situations where a fact table contains a lot of columns.

In that case, we can say that the fact table is denormalized, but it would be much better to say that the schema is denormalized.