We may not have the course you’re looking for. If you enquire or give us a call on 01344203999 and speak to our training experts, we may still be able to help with your training requirements.
Training Outcomes Within Your Budget!
We ensure quality, budget-alignment, and timely delivery by our expert instructors.
Data is king in the digital age and its organisational structure can make or break a system’s efficiency. Consequently, one of the prime goals of any programmer is the transformation of chaotic data into a streamlined structure that enhances efficiency and clarity. This is where the technique of Normalisation in SQL comes in as the knight in shining armour.
Normalisation in SQL is a multi-step process that eliminates data redundancy and enhances data integrity by structuring data into tables and removing duplicates from relational tables. Read on as this blog takes a deep dive into this vital technique including its various types and its importance in the realm of Database Management.
Table of Contents
1) What is Normalisation in SQL?
2) Types of Database Normalisation
3) The Implications of Normalisation on SQL
4) Importance of Normalisation in SQL
5) Conclusion
What is Normalisation in SQL?
Normalisation is the process of eliminating data redundancy and increasing data integrity within a SQL table. It also helps to organise data efficiently in a database. This multi-step procedure involves structuring data into a tabular form and eliminating duplicated entries from relational tables.
Normalisation systematically arranges the tables and columns in a database to ensure that integrity constraints are properly maintained. This technique decomposes tables to remove data redundancy and prevents undesirable issues like update, insertion, and deletion anomalies.
Types of Database Normalisation
The process of taking a database design and applying a set of formal rules and criteria is called Normal Forms. The Database Normalisation process is further categorised into the following types:
First Normal Form (1NF)
A table is considered to be in its First Normal Form if the atomicity of the table is 1. Here, atomicity states that a single cell can't hold multiple values. It must have only a single-valued attribute. The First standard form disallows the multi-valued attribute, composite attribute, and their combinations.
Let’s explore 1NF with an example:
Original Table (Not in 1NF)
OrderID |
Product |
Quantity |
Prices |
1 |
Pen, Notebook |
2, 1 |
10, 40 |
2 |
Pencil, Eraser |
5,3 |
8, 6 |
In this table, the Product, Quantity, and Prices columns contain multiple values in a single cell. This is not compliant with 1NF.
Transformed Table (In 1NF)
OrderID |
Product |
Quantity |
Prices |
1 |
Pen |
2 |
20 |
1 |
Notebook |
1 |
40 |
2 |
Pencil |
5 |
10 |
2 |
Eraser |
3 |
8 |
In the transformed table, each column contains atomic values, and there are no repeating groups. Each row showcases a single product with its corresponding quantity and price, complying with the rules of 1NF.
Dive into the world of SQL with our Introduction To SQL Course – Sign up now!
Second Normal Form (2NF)
Second Normal Form (2NF) requires that the table first meet the criteria for First Normal Form (1NF). The table should also not have any partial dependencies. A partial dependency occurs when a non-prime attribute is dependent on a part of a candidate key, rather than on the whole candidate key. A non-prime attribute is an attribute that is not part of any candidate key.
Let’s explore 2NF with an example:
Consider a table that tracks orders with the structure as shown below:
OrderID |
ProductID |
ProductName |
Quantity |
Unit Price |
1 |
101 |
Pen |
10 |
20 |
1 |
102 |
Notebook |
5 |
40 |
2 |
101 |
Pen |
7 |
20 |
2 |
103 |
Eraser |
12 |
8 |
In this table, the composite primary key is (OrderID, ProductID). But the ProductName and UnitPrice attributes is dependent only on ProductID, not the entire composite key. This creates a partial dependency and violates 2NF.
You can resolve this by splitting the data into two tables:
Orders Table:
OrderID |
ProductID |
Quantity |
1 |
101 |
10 |
1 |
102 |
5 |
2 |
101 |
7 |
2 |
103 |
12 |
Products Table:
ProductID |
ProductName |
Unit Price |
101 |
Pen |
20 |
102 |
Notebook |
40 |
103 |
Eraser |
8 |
In the Orders table, (OrderID, ProductID) is the composite key, and Quantity is functionally dependent on this composite key. There are no instances of partial dependencies here.
In the Products table, ProductID is the primary key, where ProductName and UnitPrice as non-prime attributes. There are no partial dependencies because both ProductName and UnitPrice are entirely dependent on the ProductID.
This separation ensures that the database structure complies with 2NF, eliminating partial dependencies and maintaining data integrity.
Third Normal Form (3NF)
To illustrate Third Normal Form (3NF), remember that a table must first fulfill the Second Normal Form (2NF) criteria. Additionally, it must not show any transitive dependencies which occurs when a non-prime attribute is dependent on another non-prime attribute rather than directly on the primary key.
We explore 3NF below with an example
StudentID |
StudentName |
CourseID |
CourseName |
Instructor |
1 |
James Doe |
C101 |
Math |
Dr. Smith |
2 |
Jaimie Smith |
C102 |
English |
Prof. Brown |
3 |
Alicia Jones |
C101 |
Math |
Dr. Smith |
In this table, StudentID is the primary key. The CourseName and Instructor attributes depend on CourseID rather than on StudentID, which creates a transitive dependency. This situation violates 3NF.
To remove transitive dependencies and achieve 3NF, we can split the table into two:
Students Table:
StudentID |
StudentName |
CourseID |
1 |
James Doe |
C101 |
2 |
Jaimie Smith |
C102 |
3 |
Alicia Jones |
C101 |
Courses Table:
CourseID |
Course Name |
Instructor |
C101 |
Math |
Dr. Smith |
C102 |
English |
Prof. Brown |
In the Students table, StudentID is the primary key, and CourseID is a foreign key referring to the Courses table.
In the Courses table, CourseID is the primary key, and CourseName and Instructor depend directly on CourseID. This structure eliminates the transitive dependency by ensuring that non-key attributes are dependent only on the primary key. Thus, the database design now adheres to 3NF, preventing redundant data and maintaining data integrity.
Looking to expand your SQL skills? Our Advanced SQL Course is here to help - Sign up now!
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form (BCNF) is an advanced version of the Third Normal Form (3NF) in SQL database normalisation. A table is in BCNF if it is in 3NF and, for every one of its non-trivial functional dependencies X→Y, X is a superkey.
Let’s explore BCNF with an example:
Consider a table StudentCourse with the following attributes:
a) StudentID
b) CourseID
c) InstructorID
d) InstructorName
Assume the following functional dependencies:
a) StudentID, CourseID→InstructorID
b) InstructorID→InstructorName
Here are the steps to Achieve BCNF:
1) Identify Functional Dependencies:
a) StudentID, CourseID→InstructorID
b) InstructorID→InstructorName
2) Check for BCNF Violations:
To be in BCNF, every determinant must be a superkey.
a) StudentID, CourseID is a composite key and therefore a superkey for the table.
b) InstructorID is not a superkey for the table, as it does not uniquely determine all attributes of the table.
The dependency InstructorID → InstructorName violates BCNF because InstructorID is not a superkey
3) Decompose the Table
To resolve this BCNF violation, decompose the table into two tables:
Table 1: StudentCourse
StudentID |
CourseID |
InstructorID |
101 |
CSE101 |
201 |
102 |
CSE101 |
201 |
101 |
MAT101 |
202 |
102 |
PHY101 |
203 |
Table 2: Instructor
InstructorID |
InstructorName |
201 |
Dr. Smith |
202 |
Dr. Jones |
203 |
Dr. Brown |
4) Verify BCNF
a) In StudentCourse, the composite key StudentID, CourseID uniquely determines InstructorID.
b) In Instructor, InstructorID is the primary key and uniquely determines InstructorName.
Both tables are now in BCNF because every determinant is a superkey.
Here’s how the final BCNF tables look like
StudentCourse Table:
StudentID |
CourseID |
InstructorID |
101 |
CSE101 |
201 |
102 |
CSE101 |
201 |
101 |
MAT101 |
202 |
102 |
PHY101 |
203 |
Instructor Table:
InstructorID |
InstructorName |
201 |
Dr. Smith |
202 |
Dr. Jones |
203 |
Dr. Brown |
These tables are in Boyce-Codd Normal Form, as each non-trivial functional dependency has a superkey as its determinant.
Learn how to design and generate a report using SQL through our SQL Server Reporting Services (SSRS) Masterclass – Register now!
Fourth Normal Form (4NF)
4NF deals with multi-valued dependencies, which exist when one attribute can have multiple dependent attributes, and these dependent attributes are independent of the primary key.
Consider the table StudentActivities with the following attributes:
a) StudentID
b) Activity
c) Hobby
Let’s assume the following data:
StudentID |
Activity |
Hobby |
1 |
Football |
Painting |
1 |
Football |
Singing |
1 |
Basketball |
Painting |
1 |
Basketball |
Singing |
2 |
Tennis |
Reading |
2 |
Tennis |
Cycling |
2 |
Swimming |
Reading |
2 |
Swimming |
Cycling |
In this table, Activity and Hobby are independent multi-valued attributes of StudentID. This means that a student's hobbies and activities are independent of each other and should not be combined in a single table.
Here’s how you can achieve 4NF:
1) Identify Multi-valued Dependencies:
The multi-valued dependencies are:
a) StudentID ->> Activity
b) StudentID ->> Hobby
2) Decompose the Table
To resolve the multi-valued dependencies, the table must be decomposed into two separate tables:
Table 1: StudentActivities
StudentID |
Hobby |
1 |
Football |
1 |
Basketball |
2 |
Tennis |
2 |
Swimming |
Table 2: StudentHobbies
StudentID |
Hobby |
1 |
Painting |
1 |
Singing |
2 |
Reading |
2 |
Cycling |
3) Verify 4NF
a) In StudentActivities, StudentID uniquely determines Activity.
b) In StudentHobbies, StudentID uniquely determines Hobby.
Both tables are now in 4NF because other than a candidate key, there are no non-trivial multi-valued dependencies.
The final 4NF tables will look like this:
StudentActivities Table:
StaffID |
Hobby |
1 |
Football |
1 |
Basketball |
2 |
Tennis |
2 |
Swimming |
StudentHobbies Table:
StaffID |
Hobby |
1 |
Painting |
1 |
Singing |
2 |
Reading |
2 |
Cycling |
These tables are in 4NF because they do not contain two or more independent multi-valued dependencies. Each table contains only one multi-valued dependency per candidate key.
Fifth Normal Form (5NF)
5NF is the most complex form of normalisation that removes join dependencies. In this situation, data needs to be joined from multiple tables to answer a specific query, even when those tables are already aligned with 4NF.
In simpler terms, 5NF ensures that all the information in the database can be completely reconstructed from its separate tables without needing any extra data.
Let’s illustrate 5NF with an example. Consider a scenario where the table ProjectStaffSkill keeps track of projects, the staff members working on them, and the skills they are employing:
ProjectID |
StaffID |
Skill |
1 |
1 |
Java |
1 |
2 |
Python |
2 |
1 |
SQL |
2 |
3 |
C++ |
3 |
2 |
Java |
3 |
3 |
Python |
The steps to achieve decomposition into 5NF is as follows:
Step 1: Identify Decomposition
The table ProjectStaffSkill has a potential problem where combinations of ProjectID, StaffID, and Skill may lead to anomalies and redundancy. To achieve 5NF, you need to decompose it into smaller tables that circumvent these issues.
Step 2: Decompose into Three Relations
You can decompose the original table into three smaller tables:
ProjectStaff Table:
ProjectID |
Skill |
1 |
1 |
1 |
2 |
2 |
1 |
2 |
3 |
3 |
2 |
3 |
3 |
StaffSkill Table:
StaffID |
Skill |
1 |
Java |
1 |
Python |
2 |
SQL |
2 |
C++ |
3 |
Java |
3 |
Python |
ProjectSkill Table:
ProjectID |
Skill |
1 |
Java |
1 |
Python |
2 |
SQL |
2 |
C++ |
3 |
Java |
3 |
Python |
Step 3: Verify 5NF
Now, the original information can be reconstructed from these three tables without any loss of data or redundancy:
a) ProjectStaff showcases which staff members are working on which projects.
b) StaffSkill captures the skills of individual staff members.
c) ProjectSkill captures the skills required for each project.
The Implications of Normalisation on SQL
The outcome of the normalisation process significantly impacts your SQL queries. Normalisation often leads to the creation of more tables and additional relationships between them. While this might initially seem like it would make writing SQL queries harder and slower due to the greater number of tables and joins, this isn't necessarily the case. Here are some benefits of writing SQL on a Normalised Database:
a) Less Data is Considered: With multiple smaller tables, the database has less data to search through when returning query results as compared to tables with more rows and columns.
b) Performance can be Optimised: There are various methods to enhance the performance of queries, even when multiple tables are involved.
c) One Place to Make Changes: Writing Delete, Insert, or Update statements is simplified because changes only need to be made in one place.
Importance of Normalisation in SQL
Normalisation plays a vital role in database design. Here are some of the reasons why it is so important in SQL:
a) Reduces Redundancy: Redundancy occurs when the same information is stored multiple times, and splitting data into smaller tables is a good way to bypass this.
b) Improves Query Performance: You can perform quicker query execution on smaller tables that have undergone normalisation.
c) Minimises Update Anomalies: You can easily update data without affecting other records with normalised tables.
d) Enhances Data Integrity: It ensures that data remains consistent and accurate.
Conclusion
In conclusion, normalisation is a vital process in SQL that reduces redundancy, ensures data integrity, and enhances database efficiency. By following the principles and stages of normalisation, you can create a reliable and well-structured database. We hope this blog helps you expand the horizons of your SQL Developer expertise.
Looking to learn advanced techniques for optimising database performance? Sign up for our PostgreSQL Administration Training now!
Frequently Asked Questions
There are a few situations when you can consider denormalisation of a database. These situations include if you want to maintain history, improve query performance, speed up reporting and compute commonly needed values up front.
Potential drawbacks of normalisation include performance overhead, loss of data context and over-normalisation.
The Knowledge Academy takes global learning to new heights, offering over 30,000 online courses across 490+ locations in 220 countries. This expansive reach ensures accessibility and convenience for learners worldwide.
Alongside our diverse Online Course Catalogue, encompassing 17 major categories, we go the extra mile by providing a plethora of free educational Online Resources like News updates, Blogs, videos, webinars, and interview questions. Tailoring learning experiences further, professionals can maximise value with customisable Course Bundles of TKA.
The Knowledge Academy’s Knowledge Pass, a prepaid voucher, adds another layer of flexibility, allowing course bookings over a 12-month period. Join us on a journey where education knows no bounds.
The Knowledge Academy offers various SQL Courses, including the Advanced SQL Training and PostgreSQL Administration Course. These courses cater to different skill levels, providing comprehensive insights into Basic and Advanced SQL Functions.
Our Programming & DevOps Blogs cover a range of topics related to SQL, offering valuable resources, best practices, and industry insights. Whether you are a beginner or looking to advance your SQL skills, The Knowledge Academy's diverse courses and informative blogs have got you covered.
Upcoming Programming & DevOps Resources Batches & Dates
Date
Tue 11th Feb 2025
Tue 13th May 2025
Tue 12th Aug 2025
Tue 9th Dec 2025