Normalization



Normalization













Normalization is the process of splitting relations into well-structured relations that allow users to insert, delete, and update tuples without introducing database inconsistencies. The focus of normalization is to reduce redundant data to the minimum.  Normalization is also called “Bottom-up approach”, because this technique requires very minute details like every participating attribute and how it is dependant on the key attributes, is crucial. If you add new attributes after normalization, it may change the normal form itself.


Through Normalization, the collection of data in a single table is distributed into multiple tables with specific relation. Without normalization many problems can occur when trying to load an integrated conceptual model into the DBMS. These problems arise from relations that are generated directly from user views are called anomalies. There are three types of anomalies:












RollNo


StudentName


CourseNo


CourseName


Instructor


120


SKL


CS-75


DBMS


SB


89


KBC


CS-75


DBMS


SB


25


ABC


CS-75


DBMS


SB


48


XYZ


CS-30


CA


MC


57


TKC


CS-80


OS


JB


120


SKL


CS-80


OS


JB


68





CS-90


FLAT


LD







Update anomaly: An update anomaly is a data inconsistency that results from data redundancy and a partial update. For example, if there is a change in the name of the instructor for CS-75, we need to make change for all the rows. If we forget to update a single row the database will show two instructor names for the Course CS-75.





Deletion anomaly:  This means loss of useful information. In some cases it may occur that some useful data is lost. For example, consider the row for RollNo 48. XYZ is the only student who has opted for CS-30. If XYZ leaves the institute and the data is deleted for XYZ, the associated data for Course will also be deleted.





Insertion anomaly: An insertion anomaly is the inability to add data to the database due to absence of other data. For example, assume that the above table is defined so that null values are not allowed. If a course is added but not immediately students opt for the course, then this course could not be entered into the database (68). This results in database inconsistencies due to omission.





In database tables are normalized for the following reasons:


To allow data retrieval at an optimal speed


Data maintenance through update, insertion and deletion.


To reduce the need to restructure tables for new applications





A functional dependency is a constraint between two sets of attributes in a relation from a database. It occurs when an attribute in a relation uniquely determines another attribute.


In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines EXACTLY ONE value of Y, which is represented as X → Y (X can be composite in nature).


We say here “X determines Y” or “y is functionally dependent on x”.


 X→Y does not imply Y→X.


If the value of an attribute “Marks” is known then the value of an attribute “Grade” is determined since Marks→ Grade







RollNo


StudentName


Marks


Grade


59


ABC


90


A


96


XYZ


70


B







Types of functional dependencies:


1.       Full Functional dependency: X and Y are attributes. X Functionally determines Y (Note: Subset of X should not functionally determine Y)


2.       Partial Functional dependency:  X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a sub-set of attribute X.


3.       Transitive dependency: X Y and Z are three attributes. X -> Y, Y-> Z => X -> Z








Now consider the following Relation


REPORT (STUDENT#, COURSE#, CourseName, IName, Room#, Marks, Grade)


• STUDENT# - Student Number


• COURSE# - Course Number


• CourseName - Course Name


• IName - Name of the Instructor who delivered the course


• Room# - Room number which is assigned to respective Instructor


• Marks - Scored in Course COURSE# by Student STUDENT#


• Grade - obtained by Student STUDENT# in Course COURSE#





The Functional Dependencies are:


• STUDENT# COURSE# → Marks


• COURSE# → CourseName,


• COURSE# → IName (Assuming one course is taught by one and only one


Instructor)


• IName → Room# (Assuming each Instructor has his/her own and non-shared room)


• Marks → Grade


           In above example Marks is fully functionally dependent on STUDENT# COURSE# and not on subset of STUDENT# COURSE#. This means Marks cannot be determined either by STUDENT# OR COURSE# alone. It can be determined only using STUDENT# AND COURSE# together. Hence Marks is fully functionally dependent on STUDENT# COURSE#. CourseName is not fully functionally dependent on STUDENT# COURSE# because subset of STUDENT# COURSE# i.e only COURSE# determines the CourseName and STUDENT# does not have any role in deciding CourseName. Hence CourseName is not fully functionally dependent on STUDENT# COURSE#.


Now CourseName, IName, Room# are partially dependent on composite attributes STUDENT# COURSE# because COURSE# alone defines the CourseName, IName, Room#.


Again, Room# depends on IName and in turn IName depends on COURSE#. Hence Room# transitively depends on COURSE#. Similarly Grade depends on Marks, in turn Marks depends on STUDENT# COURSE# hence Grade depends Fully transitively on STUDENT# COURSE#.





Now consider this table:







Student Data


Course Details


Result


STUDENT#


StudentName


COURSE#


CourseName


IName


Room#


Marks


Grade































First normal form (1NF)


A relation schema is in 1NF if:


·         If and only if all the attributes of the relation R are atomic in nature.


·         Atomic: the smallest level to which data may be broken down and remain meaningful.


In relational database design it is not practically possible to have a table which is not in 1NF.






STUDENT#


StudentName


COURSE#


CourseName


IName


Room#


Marks


Grade































Second normal form (2NF)


A Relation is said to be in Second Normal Form if and only if:


·         It is in the First normal form, and


·         No partial dependency exists between non-key attributes and key attributes.


An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a non-prime attribute To make a table 2NF compliant, we have to remove all the partial dependencies


Note: - All partial dependencies are eliminated






STUDENT#


StudentName

















COURSE#


CourseName


IName


Room#























STUDENT#


COURSE#


Marks


Grade



















Third normal form (3 NF)


A relation R is said to be in the Third Normal Form (3NF) if and only if:


·         It is in 2NF and


·         No transitive dependency exists between non-key attributes andkey attributes.


• STUDENT# and COURSE# are the key attributes.


• All other attributes, except grade are nonpartially, non-transitively dependent on key attributes.









STUDENT#


StudentName

















COURSE#


CourseName


IName


Room#























STUDENT#


COURSE#


Marks




















Marks (Lower Bound)


Marks (Upper Bound)


Grade













Boyce-Codd Normal form (BCNF)


A relation is said to be in Boyce Codd Normal Form (BCNF)


·         If and only if all the determinants are candidate keys.


BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.


Consider the table:






Student#


EmailID


Course#


Marks


ABC


abc@sumitkar.in


CS-75


98







Candidate Keys for the relation are [C#, S#] and [C#, Email]. Since Course # is overlapping, it is referred as Overlapping Candidate Key. Valid Functional Dependencies are:


S#  → EmailID


EmailID → S#


S#, C# → Marks


C#, EmailID →Marks






Student#


EmailID


ABC


abc@sumitkar.in











Student#


Course#


Marks


ABC


CS-75


98


















0 Comments