Unit 3 DBMS
Data integrity refers to how accurate and consistent the data in a database is. Databases with a lot of
missing or incorrect information are said to have low data integrity.
Data independence refers to the separation between data and the application (or applications) in which
it is being used. This allows you to update the data in your application (such as fixing a spelling
mistake) without having to recompile the entire application.
Data Redundancy refers to having the exact same data at different places in the database. Data
redundancy increases the size of the database, creates integrity problems, decreases efficiency and
leads to anomalies. Data should be stored so that it is not repeated in multiple tables.
Data security refers to how well the data in the database is protected from crashes, hacks and accidental
deletion.
Data maintenance refers to monthly, daily or hourly tasks that are run to fix errors within a database
and prevent anomalies from occurring. Data maintenance not only fixes errors, but also detects
potential errors and prevents future errors from occurring.
It is easy to modify and maintain without affecting other fields or tables in the database.
Information is easy to retrieve, and user applications are easy to develop and build.
The database is scalable, meaning that it can be expanded to meet the changing needs of an
organization.
Schema Refinement
Schema Refinement or Normalization is a process of organizing the data in a database to avoid data redundancy,
insertion anomalies, update anomalies and deletion anomalies. Normalization means decomposing (splitting)
tables into smaller tables with fewer attributes, in such a way that the table design is free of insertion,
deletion and update anomalies and redundancy is minimized.
A good understanding of the semantics of the data helps the designer build an efficient design using the concept of
normalization.
Anomalies or Problems without Normalization
1. Redundant Storage: Some information is stored repeatedly; it occupies extra memory space and
makes the database difficult to handle and update.
2. Insertion, Updating and Deletion Anomalies
To understand these anomalies let us take an example of a Student table.
a. Insertion Anomaly: It may not be possible to store some information unless some other information is
stored as well.
Example- Suppose a new course C4 is introduced, but no student is enrolled in C4 yet. The course data
cannot be inserted unless we set the SID and Sname values to NULL.
b. Deletion Anomaly: It may not be possible to delete some information without losing some other
information as well.
Example- Deleting student S3 also deletes the course information stored with it. Deleting some data
forces us to delete other useful data as well.
Insertion and Deletion Anomalies exist only due to redundancy, otherwise they do not exist.
c. Update Anomaly: If one copy of redundant data is updated, then inconsistency is created unless all
redundant copies of data are updated.
Example- If the fee is updated from 5000 to 7000, we have to update the FEE column in all the rows that store it,
else the data will become inconsistent.
Purpose of Normalization
✓ Minimize the redundancy in data.
✓ Remove insert, update, and delete anomalies during the database activities.
Functional Dependency
A Functional Dependency is a constraint that specifies the relationship between two sets of attributes where
one set can accurately determine the value of the other set. It is denoted as
X→Y
where X is a set of attributes that is capable of determining the value of Y. We read it as X determines Y.
Two tuples sharing the same values of X will necessarily have the same values of Y, i.e., if t1.X = t2.X then
t1.Y = t2.Y, where t1, t2 are tuples and X, Y are attributes.
The attribute set on the left side of the arrow, X is called Determinant, while on the right side, Y is called the
Dependent. Functional dependencies are used to mathematically express relations among database entities.
Functional Dependencies are fundamental to the process of Normalization, i.e., Functional Dependency
plays a key role in differentiating good database designs from bad ones.
A functional dependency is a “type of constraint that is a generalization of the notion of the key”.
Functional Dependency describes the relationship between attributes (columns) in a table.
Functional dependency is represented by an arrow sign (→).
Case 1:
A → B Here A1 is associated with both B1 and B2, so A1 does not determine a unique value of B. Hence A → B is not an FD.
Case 2:
A → C Here A1 → C1 and A2, A3 → C2. Each value of A determines a unique value of C, so A → C is an FD.
[Note: try to find all the possibilities. i.e., A→D, B→C, B→D, and C→D]
Example 3:
Cases of dependency:
1) Roll_no → name
44→ xyz
44→ xyz
2) Roll_no → name
44→ xyz
45 → xyz
3) Roll_no → name
44→ xyz
46 → mno
These are valid cases of dependency.
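The definition given earlier (if t1.X = t2.X then t1.Y = t2.Y) can be checked mechanically on a relation instance. The sketch below is an illustration under stated assumptions: the helper name `fd_holds` is hypothetical, rows are plain dictionaries, and the sample rows reuse the roll_no/name values from the cases above. Keep in mind that an instance can only show that an FD is violated; an FD is a design constraint that must hold for every possible instance.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]],
             x: Sequence[str], y: Sequence[str]) -> bool:
    """Return True if X -> Y holds in this instance:
    any two tuples that agree on X must also agree on Y."""
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if x_val not in seen:
            seen[x_val] = y_val
        elif seen[x_val] != y_val:
            # Same X value, different Y value: X -> Y is violated.
            return False
    return True


students = [
    {"roll_no": 44, "name": "xyz"},
    {"roll_no": 45, "name": "xyz"},
    {"roll_no": 46, "name": "mno"},
]
print(fd_holds(students, ["roll_no"], ["name"]))  # True
print(fd_holds(students, ["name"], ["roll_no"]))  # False: xyz goes with 44 and 45
```

The second call shows why case 2 above is still valid for roll_no → name: two different roll numbers may share a name, but the reverse dependency name → roll_no does not hold.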
Some more valid cases-
roll_no → { name, phone, dept_name, dept_building }: here, roll_no can determine the values of fields name, phone,
dept_name and dept_building, hence it is a valid functional dependency.
roll_no → dept_name: since roll_no can determine the whole set {name, dept_name, dept_building}, it can also
determine its subset dept_name.
Here, roll_no → name is a non-trivial functional dependency, since the dependent ‘name’ is not a subset of
determinant ‘roll_no.’ Similarly, {roll_no, name} → age is also a non-trivial functional dependency,
since age is not a subset of {roll_no, name}
Completely non-trivial functional dependency: If X→Y and X∩Y=∅ (the empty set), then it is called a
completely non-trivial functional dependency.
2. Partial Functional Dependency: If a non-prime attribute of the relation is derived from only a part of
the candidate key, then such a dependency is known as a Partial Dependency.
(OR)
In a relation having more than one key field, a subset of non-key fields may depend on all key fields but
another subset or a particular non-key field may depend on only one of the key fields. Such dependency is
defined as Partial Dependency.
Example: Consider the FDs AC→P, A→D and D→P. P is not fully functionally dependent on AC: computing
A+ (the closure of A), A→D and D→P give A→P, so C is not required to determine P. Even with C removed
completely, A still determines P, so P is partially dependent on AC.
3. Transitive Functional Dependency: If a non-prime attribute of a relation is derived either from
another non-prime attribute or from a combination of part of the candidate key and a non-prime attribute,
then the dependency is called a transitive dependency, i.e., in a relation, there may be dependencies among
non-key fields. Such a dependency is called a Transitive Functional Dependency.
Example: If X→Y and Y→Z, then X→Z holds.
For example, roll_no → dept and dept → building_no. Hence, according to the axiom of transitivity, roll_no →
building_no is a valid functional dependency. This is an indirect functional dependency, hence it is called a
transitive functional dependency.
4. Trivial Functional Dependency: It follows from the reflexive rule, i.e., if X is a set of attributes and
Y is a subset of X, then X→Y holds.
Example: ABC→BC is a Trivial Dependency.
For example, X={roll_no,name} and Y = {name}
X → Y is a trivial functional dependency, since the dependent attribute “name” is a subset of determinant set
{roll_no, name}
Prepared By- Charu Kavadia Page 8
Database Management System Unit 3
5. Multi-Valued Dependency: Entities of the dependent set are not dependent on each other, i.e., if a → {b, c}
and there is no functional dependency between b and c, then it is called a multivalued functional
dependency.
For example,
here, roll_no → {name, age} is a multivalued functional dependency, since the dependent attributes “name”
and “age” are not dependent on each other (i.e., neither name → age nor age → name exists).
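ii) Augmentation: If A holds B, then AC holds BC.
{A → B}, then {AC → BC}
Explanation:
A = {roll_no}, B = {name}, C = {phone}
A → B: roll_no → name
AC → BC: {roll_no, phone} → {name, phone}
It means that adding the same attributes to both sides of a dependency does not change the basic dependency.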
Example 1: {roll_no, phoneno} → {name, phoneno}
Example 2: If {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building,
dept_name} is also valid.
iii) Transitivity: If A holds B and B holds C, then A holds C.
{A → B} and {B → C}, then {A → C}
Explanation:
Example: roll_no → dept_name & dept_name → dept_building, then roll_no →dept_building is also valid.
B) Secondary axioms:
i) Union
If A holds B and A holds C, then A holds BC
If {A → B} and {A → C}, then {A → BC}
Explanation:
A ={Roll_no}
B= {name}
C= {phone }
Roll_no→name: valid
Roll_no→phone: valid
Roll_no→{name , phone}
ii) Decomposition
If A holds BC then A holds B, and A holds C.
If {A → BC} then {A → B}and{A → C}
Explanation:
A ={Roll_no}
B= {name}
C= {phone }
If {A → BC} = Roll_no→{name , phone}
then
{A → B} = Roll_no→name
and {A → C} = Roll_no→phone
iii) Pseudo Transitivity
If A holds B and BC holds D, then AC holds D.
If {A → B} and {BC → D}, then {AC → D}
Explanation:
A ={Roll_no}
B= {name}
C= {phone}
D = {dept_name}
If Roll_no → name
and {name, phone} → dept_name,
then {Roll_no, phone} → dept_name is valid.
2) Attribute Closure: Attribute closure of an attribute set can be defined as set of attributes which can be
functionally determined from it.
NOTE:
To find attribute closure of an attribute set-
1) add elements of attribute set to the result set.
2) recursively add elements to the result set which can be functionally determined from the elements of result
set.
Example 1: R (A, B, C, D) and set of Functional Dependencies are A→B, B→D, C→B then what is the
Closure of A?
Solution: A+ = {A, B, D}, as A→B and B→D exist, while C is not functionally determined by A, so it is excluded.
As there is no other attribute remaining in relation to be derived from E-ID. So result is:
(E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE }
[Similarly, (E-NAME)+ = {E-NAME}, (E-CITY)+ = {E-CITY, E-STATE}]
Example 3: Given R(ABCDE) with FDs {AB→C, B→D, C→E, D→A}, find B+ and the candidate key.
Solution:
{B}+ = {B}
{B}+ = {B, D} (as B→D)
{B}+ = {B, D, A} (as D→A)
{B}+ = {B, D, A, C} (as AB→C)
{B}+ = {B, D, A, C, E} (as C→E)
Since B+ contains all the attributes of R, the candidate key is B.
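Similarly, with the FDs AB→C, A→DE, B→F and F→GH:
(AB)+ = {A, B}
(AB)+ = {A, B, C} (as AB→C)
(AB)+ = {A, B, C, D, E} (as A→DE)
(AB)+ = {A, B, C, D, E, F} (as B→F)
(AB)+ = {A, B, C, D, E, F, G, H} (as F→GH)
So AB is the candidate key.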
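The two-step procedure in the NOTE above (start with the attribute set, then keep adding whatever the current result determines) is a simple fixed-point loop. The sketch below is illustrative: the function name `closure` is hypothetical, and it assumes single-letter attribute names so that "AB" stands for {A, B}. It reproduces the closures computed in the examples above.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under the FD list `fds`.

    Attributes are single letters, so "AB" stands for {A, B}, and an FD
    such as AB -> C is written as the pair ("AB", "C").
    """
    # Step 1: add the elements of the attribute set to the result set.
    result = set(attrs)
    # Step 2: repeatedly add every right-hand side whose left-hand side is
    # already inside the result, until a full pass adds nothing new.
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Example 1: R(A, B, C, D) with A->B, B->D, C->B
ex1 = [("A", "B"), ("B", "D"), ("C", "B")]
print(sorted(closure("A", ex1)))            # ['A', 'B', 'D']

# Example 3: R(A, B, C, D, E) with AB->C, B->D, C->E, D->A
ex3 = [("AB", "C"), ("B", "D"), ("C", "E"), ("D", "A")]
print(sorted(closure("B", ex3)))            # ['A', 'B', 'C', 'D', 'E']

# The (AB)+ computation with AB->C, A->DE, B->F, F->GH
ex4 = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH")]
print("".join(sorted(closure("AB", ex4))))  # ABCDEFGH
```

A set X is a super key of R exactly when X+ contains every attribute of R, which is the test used above to conclude that B and AB are candidate keys.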
{EC}+={EC}
As A and E are not part of the candidate key, we can combine them and find the closure:
{AEC}+={AEC}
Candidate Key: BC, DC
Example 5: R(A,B,C,D,E)
AB→CD
D→A
BC→DE
Solution: Candidate Keys: AB, BC, BD (how these are calculated is left as an exercise).
Example-Compute the closure of the following set F of functional dependencies for relation schema R = (A,
B, C, D, E). A →BC CD →E B→ D E→ A. List the candidate keys for R.
Solution: Starting with A → BC, we can conclude: A → B and A → C (decomposition)
Since A → B and B → D, then A → D (transitive)
Since A→C and A → D then A → CD (union)
Since A → CD and CD → E, A → E (transitive)
Since A → A (reflexive), we have
A → ABCDE from the above steps (union)
Since E → A and A → ABCDE, then E → ABCDE (transitive)
Since CD → E and E → ABCDE, then CD → ABCDE (transitive)
Since B → D, then BC → CD (augmentation)
Since BC → CD and CD → ABCDE, then BC → ABCDE (transitive)
Since CD → E and E → A, then CD → A (transitive)
Since BC → CD and CD → A, then BC → A (transitive)
Since none of B, C or D alone determines all the attributes, the candidate keys of R are A, E, BC and CD.
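The candidate keys listed above can be cross-checked by brute force: try every attribute subset from smallest to largest, keep those whose closure covers all of R, and skip supersets of keys already found (they are super keys but not candidate keys). This is an exponential sketch suitable only for small textbook relations; the helper names are hypothetical and single-letter attributes are assumed.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; FDs are pairs of single-letter strings, e.g. ("CD", "E")."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Return every minimal super key of `relation` (brute force over subsets)."""
    keys = []
    # Smaller subsets are tried first, so any superset of a key already found
    # is skipped: it is a super key, but not a candidate key.
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue
            if closure(attrs, fds) == set(relation):
                keys.append(attrs)
    return ["".join(sorted(key)) for key in keys]


# The example above: R(A, B, C, D, E) with A->BC, CD->E, B->D, E->A
print(candidate_keys("ABCDE", [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]))
# ['A', 'E', 'BC', 'CD']

# Example 5: R(A, B, C, D, E) with AB->CD, D->A, BC->DE
print(candidate_keys("ABCDE", [("AB", "CD"), ("D", "A"), ("BC", "DE")]))
# ['AB', 'BC', 'BD']
```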
4) Database Keys: They are used to establish and identify relationships between tables. They also ensure
that each record within a table can be uniquely identified by a combination of one or more fields within the table.
a) Candidate Key: -Candidate Key is minimal set of attributes of a relation which can be used to identify a
tuple uniquely.
Consider student table: student(sno, sname, sphone, age)
We can take sno as the candidate key. A table can have more than one candidate key.
Prime or Key Attribute: The attributes of the relation which are part of candidate key are called Prime or key
attribute. And rest are called Non-Prime Attributes.
Types of candidate keys:
1. simple (having only one attribute)
2. composite (having multiple attributes as candidate key)
b) Super Key: -Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.
Adding zero or more attributes to candidate key generates super key.
A candidate key is a super key but vice versa is not true.
c) Primary Key: It is a candidate key that is most appropriate to become the main key of the table. It is the key
that uniquely identifies the records in a table.
d) Composite Key: A key that consists of two or more attributes that uniquely identifies an entity occurrence
is called a composite key.
e) Secondary Key or Alternate Key: The candidate keys which are not selected as the primary key are known
as secondary (alternate) keys.
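f) Foreign Key: Tuples in one relation, say r1 (R1), often need to refer to tuples in another relation, say r2 (R2).
A foreign key is an attribute (or set of attributes) of R1 whose values must match the primary key values of some tuple in R2.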
Normalization
Normalization is a process for evaluating and correcting table structures to minimize database redundancies,
thereby reducing the likelihood of data anomalies. Normalization is a process of designing a consistent
database with minimum redundancy that supports data integrity by decomposing the given relation into
smaller relations while preserving the constraints on the relation.
Normalization works through a series of stages called normal forms. The first three stages are described as first
normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of view,
2NF is better than 1NF, and 3NF is better than 2NF.
For most purposes in business database design, 3NF is as high as we need to go in the normalization process.
Normalization of data can hence be looked upon as a process of analyzing the given relation schemas based on
their FDs and primary keys to achieve the desirable properties of
(1) minimizing redundancy
(2) minimizing the insertion, deletion, and update anomalies
Prime and non-prime attributes
Attributes which are part of any candidate key of a relation are called prime attributes; the others are non-prime
attributes.
Example 2: Relation STUDENT in the following table is not in 1NF because of the multi-valued attribute
STUD_PHONE. Its decomposition into 1NF is as follows.
Example 3:
To be in second normal form, a relation must be in first normal form and must not contain any partial
dependency. A relation is in 2NF if it has no partial dependency, i.e., no non-prime attribute (an attribute that
is not part of any candidate key) depends on any proper subset of any candidate key of the table.
Definition: If a relation is in First Normal Form and every non-primary-key attribute is fully functionally
dependent on the primary key, then the relation is in Second Normal Form (2NF).
If a proper subset of a candidate key determines a non-prime attribute, the dependency is called a partial
dependency: the attribute depends on only a part of the primary key and not on the whole key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial
dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a new
relation along with a copy of their determinant.
Consider the examples given below.
Example-1: Consider the following table.
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;
Hence, COURSE_FEE would be a non-prime attribute, as it does not belong to the one and only candidate key
{STUD_NO, COURSE_NO} ;
But, COURSE_NO → COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper
subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the
candidate key, which is a partial dependency and so this relation is not in 2NF.
To convert the above relation to 2NF, we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
2NF tries to reduce the redundant data stored in memory. For instance, if there are 100 students taking the
C1 course, we don’t need to store its fee of 1000 in all 100 records; instead, we can store it once in the
second table, recording that the course fee for C1 is 1000.
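The 2NF test used in Example-1 (does a proper subset of the candidate key determine a non-prime attribute?) can be sketched with attribute closures. This is a minimal illustration, assuming a single candidate key that is supplied by the caller and FDs written as pairs of attribute-name tuples; the helper name `partial_dependencies` is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of attribute-name tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """Find proper subsets of `key` that determine some non-prime attribute."""
    found = []
    # Proper, non-empty subsets only: sizes 1 .. len(key) - 1.
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & set(non_prime)
            if determined:
                found.append((set(subset), determined))
    return found


fds = [(("COURSE_NO",), ("COURSE_FEE",))]
# Original table: key {STUD_NO, COURSE_NO}, non-prime COURSE_FEE.
print(partial_dependencies({"STUD_NO", "COURSE_NO"}, {"COURSE_FEE"}, fds))
# [({'COURSE_NO'}, {'COURSE_FEE'})]
# Table 2 after the split: key COURSE_NO, so no partial dependency remains.
print(partial_dependencies({"COURSE_NO"}, {"COURSE_FEE"}, fds))  # []
```

Each pair returned corresponds to one new relation of the kind built above: the subset of the key together with the attributes it determines.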
Example 2: Let's assume a school stores the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
Solution:
Step 1: List all functional dependencies
StudentID → StuName
StudentID → {CourseID, Grade}
CourseID → CourseName
Step 2: Break the table to remove the partial dependencies
Example 4: A university uses the following relation: Student (IDSt, LastName, IDProf, ProfessorName, Grade)
The attributes IDSt and IDProf are the identification keys.
All attributes are single-valued (1NF).
The following functional dependencies exist:
1. The attribute ProfessorName is functionally dependent on attribute IDProf (IDProf → ProfessorName)
2. The attribute StudentName is functionally dependent on IDSt (IDSt → StudentName)
3. The attribute Grade is fully functionally dependent on IDSt and IDProf (IDSt, IDProf → Grade)
If a relation is in First and Second Normal Form and no non-primary-key attribute is transitively
dependent on the primary key, then it is in Third Normal Form (3NF).
If A→B and B→C are two FDs then A→C is called transitive dependency.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing
the attribute(s) in a new relation along with a copy of the determinant.
Consider the examples given below.
In the relation STUDENT given in the table:
FD set:
{STUD_NO → STUD_NAME,
STUD_NO → STUD_STATE, (A → B)
STUD_STATE → STUD_COUNTRY, (B → C)
STUD_NO → STUD_AGE}
Candidate Key:
{STUD_NO}
For this relation in table,
STUD_NO → STUD_STATE and STUD_STATE → STUD_COUNTRY are true.
So STUD_COUNTRY is transitively dependent on STUD_NO.
It violates the third normal form. To convert it into third normal form, we decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
Table 1:
FD : STUD_STATE → STUD_COUNTRY,
Example 2:
Example 3:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Example 2:
The above relation is in 1NF, 2NF and 3NF, but not in BCNF. Here is the reason:
Package → Ground holds, but Package is not a candidate key.
For an FD X → Y to satisfy BCNF, X should be a super key; here X (Package) is not a super key.
Decomposition
Decomposition: It is the process of splitting the original table into smaller relations such that the attribute set
of each smaller relation is a subset of the attribute set of the original table.
Rules of Decomposition:
If a relation ‘R’ is split into relations ‘R1’ and ‘R2’, the decomposition should satisfy the following:
1) The union of the two smaller attribute sets gives all the attributes of ‘R’.
R1(attributes) ∪ R2(attributes) = R(attributes)
2) The intersection of the attributes of both relations must not be empty.
R1(attributes) ∩ R2(attributes) ≠ ∅
3) The common attributes of both relations must form a key of at least one of them.
R1(attributes) ∩ R2(attributes) = key attribute(s) of R1 or of R2
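The three rules above can be checked for a binary decomposition using attribute closure: rule 3 holds when the closure of the common attributes contains all of R1 or all of R2. The sketch below is an illustration, not the notes' own procedure; the helper name `is_lossless` is hypothetical, and the test data reuses the STUD_NO / COURSE_NO / COURSE_FEE relation from the 2NF example, with COURSE_NO → COURSE_FEE as its only FD.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a (lhs, rhs) pair of attribute-name tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(relation, r1, r2, fds):
    """Check the decomposition rules for splitting `relation` into r1 and r2."""
    relation, r1, r2 = set(relation), set(r1), set(r2)
    # Rule 1: together the two relations must keep every attribute of R.
    if r1 | r2 != relation:
        return False
    # Rule 2: the relations must share at least one attribute.
    common = r1 & r2
    if not common:
        return False
    # Rule 3: the common attributes must be a key (super key) of R1 or of R2.
    common_plus = closure(common, fds)
    return r1 <= common_plus or r2 <= common_plus


fds = [(("COURSE_NO",), ("COURSE_FEE",))]
R = {"STUD_NO", "COURSE_NO", "COURSE_FEE"}
print(is_lossless(R, {"STUD_NO", "COURSE_NO"}, {"COURSE_NO", "COURSE_FEE"}, fds))   # True
print(is_lossless(R, {"STUD_NO", "COURSE_FEE"}, {"COURSE_NO", "COURSE_FEE"}, fds))  # False
```

In the second split the common attribute COURSE_FEE determines nothing else, so it is not a key of either part and rule 3 fails.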
Types of Decomposition
1) Lossless Decomposition
2) Lossy Decomposition
1) Lossless Decomposition
Consider there is a relation R which is decomposed into sub relations R1 , R2 , …. , Rn.
This decomposition is called lossless join decomposition when the join of the sub relations results in
the same relation R that was decomposed, i.e. no data should be lost and should satisfy all the rules of
decomposition.
For lossless join decomposition, we always have-
R1 ⋈ R2 ⋈ … ⋈ Rn = R
Consider that this relation is decomposed into two sub relations R1(A, B) and R2(B, C)-
The natural join R1 ⋈ R2 gives the same relation as the original relation R. Thus, we conclude that the above
decomposition is a lossless join decomposition.
Note-
Lossless join decomposition is also known as non-additive join decomposition.
This is because the relation obtained by joining the sub relations is the same as the original relation.
No extraneous tuples appear after joining of the sub-relations.
2) Lossy Join Decomposition
Consider there is a relation R which is decomposed into sub relations R1, R2, …. , Rn.
This decomposition is called lossy join decomposition when the join of the sub relations does not result
in the same relation R that was decomposed.
The natural join of the sub relations is always found to have some extraneous tuples.
For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ … ⋈ Rn ⊃ R
Consider that this relation is decomposed into two sub relations R1(A, C) and R2(B, C)-
The natural join R1 ⋈ R2 is not the same as the original relation R and contains some extraneous tuples. Clearly, R1 ⋈ R2 ⊃ R.
Thus, we conclude that the above decomposition is a lossy join decomposition.
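The difference between the two decompositions can be reproduced by projecting a relation instance and joining the projections back. The original tables for R(A, B, C) are not reproduced here, so the sketch below uses a small hypothetical instance in which A → B and B → C hold; the helpers `project`, `natural_join` and `as_set` are illustrative names, not part of any library.

```python
def project(rows, attrs):
    """Projection: keep only `attrs` and drop duplicate tuples."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Natural join: combine every pair of tuples that agree on the common attributes."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if all(left_row[a] == right_row[a] for a in common)
    ]


def as_set(rows):
    """Order-independent form of a relation, for comparisons."""
    return {tuple(sorted(row.items())) for row in rows}


# Hypothetical instance of R(A, B, C); here A -> B and B -> C hold.
R = [
    {"A": 1, "B": 10, "C": "p"},
    {"A": 2, "B": 20, "C": "p"},
    {"A": 3, "B": 30, "C": "q"},
]

lossless = natural_join(project(R, ["A", "B"]), project(R, ["B", "C"]))
lossy = natural_join(project(R, ["A", "C"]), project(R, ["B", "C"]))

print(as_set(lossless) == as_set(R))  # True
print(len(lossy))                     # 5
print(as_set(lossy) > as_set(R))      # True: R1 ⋈ R2 ⊃ R
```

Joining on C pairs both tuples with C = "p" in R1(A, C) with both tuples with C = "p" in R2(B, C), which produces the two extraneous tuples (1, 20, p) and (2, 10, p).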
Note-
Lossy join decomposition is also known as careless decomposition.
This is because extraneous tuples get introduced in the natural join of the sub-relations.
Extraneous tuples make the identification of the original tuples difficult.
Hence resultant FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
4 a. Closure Z+ = ZXWY using FD = { W → X, Y → X, Z → W, Z → X, Z → Y, WY → Z }
4 b. Closure Z+ = ZWYX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
From 4 a and 4 b, we find that the two closures (including Z → X and excluding Z → X) are equivalent,
hence the FD Z → X is redundant and can be removed from the set of FDs.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
5 a. Closure Z+ = ZYWX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
5 b. Closure Z+ = ZWX using FD = {W → X, Y → X, Z → W, WY → Z }
From 5 a and 5 b, we find that the two closures (including Z → Y and excluding Z → Y) are not
equivalent, hence the FD Z → Y is important and cannot be removed from the set of FDs.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
6 a. Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
6 b. Closure WY+ = WYX using FD = { W → X, Y → X, Z → W, Z → Y }
From 6 a and 6 b, we find that the two closures (including WY → Z and excluding WY → Z) are not
equivalent, hence the FD WY → Z is important and cannot be removed from the set of FDs.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Since FD = { W → X, Y → X, Z → W, Z → Y, WY → Z } is now the resultant FD set and every FD has been checked
for redundancy, we next check for extraneous attributes. The FD WY → Z has two attributes on its left side, so
let's check whether both are important or only one.
Closure WY+ = WYZX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure W+ = WX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Closure Y+ = YX using FD = { W → X, Y → X, Z → W, Z → Y, WY → Z }
Since neither W+ nor Y+ is equivalent to WY+ (neither contains Z), in FD WY → Z both W and
Y are important attributes and cannot be removed.
Hence resultant FD = { W → X, Y → X, Z → W, Z → Y, WY → Z } and we can rewrite as:
FD = {W → X, Y → X, Z → WY, WY → Z } is Canonical Cover of FD = { W → X, Y → X, Z → WXY,
WY → Z }.
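The steps above (split right-hand sides, test each FD for redundancy with closures, test left-hand attributes for extraneousness, then merge) can be automated. The sketch below follows the usual textbook order, extraneous attributes first and redundant FDs second, which differs slightly from the order used above. It assumes single-letter attributes, and `canonical_cover` is a hypothetical helper name. A canonical cover is not unique in general, and the result can depend on the order in which FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is a pair of single-letter attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    """Canonical cover sketch for FDs written like ("WY", "Z")."""
    # 1. Decomposition rule: one attribute on each right-hand side.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # 2. Remove extraneous left-hand-side attributes: b is extraneous in
    #    X -> a if a is still in the closure of X - {b}.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
        cover[i] = (lhs, a)

    # 3. Remove redundant FDs: X -> a is redundant if the remaining FDs
    #    already put a into the closure of X.
    i = 0
    while i < len(cover):
        lhs, a = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if a in closure(lhs, rest):
            cover = rest
        else:
            i += 1

    # 4. Union rule: merge FDs that share a left-hand side.
    merged = {}
    for lhs, a in cover:
        merged.setdefault(lhs, set()).add(a)
    return {("".join(sorted(lhs)), "".join(sorted(rhs))) for lhs, rhs in merged.items()}


fds = [("W", "X"), ("Y", "X"), ("Z", "WXY"), ("WY", "Z")]
print(sorted(canonical_cover(fds)))
# [('W', 'X'), ('WY', 'Z'), ('Y', 'X'), ('Z', 'WY')]
```

For this FD set the sketch arrives at the same canonical cover as the hand computation: {W → X, Y → X, Z → WY, WY → Z}.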
From the above arrow diagram on R, we can see that attribute X is not determined by any of the given FDs,
hence X must be part of every candidate key: no matter what the candidate keys are, or how many there are,
all of them will contain X.
Definition of 3NF: A relational schema R is said to be in 3NF if, first, it is in 2NF and, second, no non-prime
attribute is transitively dependent on the key of the table.
If X → Y and Y → Z exist then X → Z also exists which is a transitive dependency, and it should not hold.
Since R has 3 attributes (X, Y, Z) and the candidate key is X, the prime attribute (part of the candidate key)
is X, while the non-prime attributes are Y and Z.
Given FD are X → Y and Y → Z
So, we can write X → Z (which is a transitive dependency)
In the above FD X → Z, a non-prime attribute (Z) is transitively dependent on the key of the table (X). Hence,
as per the definition of 3NF, R is not in 3NF, because no non-prime attribute should be transitively dependent
on the key of the table.
We can also prove the same from Definition 2: First, it should be in 2NF and if there exists a non-trivial
dependency between two sets of attributes X and Y such that X → Y (i.e., Y is not a subset of X) then
a) Either X is Super Key
b) Or Y is a prime attribute.
R is in 2NF, since its candidate key X is a single attribute and therefore no partial dependency can exist.
However, for Y → Z, Y is not a super key and Z is not a prime attribute; hence, using definition 2 of 3NF, the
above table R is not in 3NF.
From the above arrow diagram on R, we can see that attributes XZ are not determined by any of the given FDs,
hence XZ must be part of every candidate key: no matter what the candidate keys are, or how many there are,
all of them will contain XZ.
Calculate the closure of XZ
XZ+ = XZYPW
Since the closure of XZ contains all the attributes of R, hence XZ is Candidate Key
By the definition of a candidate key (a super key no proper subset of which is a super key), and since every key
must contain XZ and XZ has been shown to be a candidate key, any proper superset of XZ is a super key but not a
candidate key.
Hence there is only one candidate key, XZ.
Definition of 3NF: First it should be in 2NF and if there exists a non-trivial dependency between two sets of
attributes X and Y such that X → Y ( i.e., Y is not a subset of X) then
a) Either X is Super Key
b) Or Y is a prime attribute.
Since R has 5 attributes (X, Y, Z, W, P) and the candidate key is XZ, the prime attributes (part of the
candidate key) are X and Z, while the non-prime attributes are Y, W and P.
From the above arrow diagram on R, we can see that attributes PQ are not determined by any of the given FDs,
hence PQ must be part of every candidate key: no matter what the candidate keys are, or how many there are,
all of them will contain PQ.
Definition of 3NF: First it should be in 2NF and if there exists a non-trivial dependency between two sets of
attributes X and Y such that X → Y (i.e., Y is not a subset of X) then
a) Either X is Super Key
b) Or Y is a prime attribute.
Since R has 10 attributes (P, Q, R, S, T, U, V, W, X, Y) and the candidate key is PQ, the prime
attributes (part of the candidate key) are P and Q, while the non-prime attributes are R, S, T, U, V, W, X and Y.
Given FD are {PQ → R, P → ST, Q → U, U → VW and S → XY} and Super Key / Candidate Key is PQ
a) FD: PQ → R satisfies the definition of 3NF, as PQ is a super key.
b) FD: P → ST does not satisfy the definition of 3NF, since P is not a super key and ST is not a prime
attribute.
c) FD: Q → U does not satisfy the definition of 3NF, since Q is not a super key and U is not a prime
attribute.
d) FD: U → VW does not satisfy the definition of 3NF, since U is not a super key and VW is not a prime
attribute.
e) FD: S → XY does not satisfy the definition of 3NF, since S is not a super key and XY is not a prime
attribute.
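The FD-by-FD check above (either the left side is a super key, or every attribute on the right side is prime) can be sketched as follows. It is an illustration under stated assumptions: single-letter attributes, candidate keys found by brute force (fine for small examples only), and hypothetical helper names. It reports exactly the four violating FDs listed above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; FDs are pairs of single-letter strings, e.g. ("PQ", "R")."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal super keys, found by brute force from the smallest subsets up."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue
            if closure(attrs, fds) >= set(relation):
                keys.append(attrs)
    return keys


def violations_3nf(relation, fds):
    """Return the given FDs that break the 3NF definition stated above."""
    prime = set().union(*candidate_keys(relation, fds))
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # ignore any trivially dependent part
        if not extra:
            continue
        is_superkey = closure(lhs, fds) >= set(relation)
        # 3NF condition: X is a super key, or every attribute of Y is prime.
        if not is_superkey and not extra <= prime:
            bad.append(lhs + " -> " + rhs)
    return bad


pq = [("PQ", "R"), ("P", "ST"), ("Q", "U"), ("U", "VW"), ("S", "XY")]
print(violations_3nf("PQRSTUVWXY", pq))
# ['P -> ST', 'Q -> U', 'U -> VW', 'S -> XY']
```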
R4( S, X, Y) { Using FD S → XY }
R5( P, Q, R) { Using FD PQ → R, and candidate key PQ }
All the decomposed tables R1, R2, R3, R4 and R5 are in 2NF (as there is no partial dependency) as well as in
3NF.
Hence decomposed tables are:
R1(P, S, T), R2(Q, U), R3(U, V, W), R4( S, X, Y), and R5( P, Q, R)
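The decomposition R1 to R5 above matches what the standard 3NF synthesis algorithm produces: one relation per FD of a canonical cover (grouped by left-hand side), plus a relation for a candidate key if no relation already contains one, then dropping any relation contained in another. The sketch below assumes the given FD set {PQ → R, P → ST, Q → U, U → VW, S → XY} is already a canonical cover and that the candidate key is supplied by the caller; `synthesize_3nf` is a hypothetical name.

```python
def synthesize_3nf(cover, key):
    """3NF synthesis sketch.

    Assumes `cover` is already a canonical cover with one FD per distinct
    left-hand side, and `key` is a candidate key of the original relation.
    """
    schemas = []
    # One relation per FD: the left side together with its right side.
    for lhs, rhs in cover:
        schema = frozenset(lhs) | frozenset(rhs)
        if schema not in schemas:
            schemas.append(schema)
    # Make sure some relation contains a candidate key (keeps the join lossless).
    if not any(frozenset(key) <= s for s in schemas):
        schemas.append(frozenset(key))
    # Drop relations whose attributes are contained in another relation.
    kept = [s for s in schemas if not any(s < t for t in schemas)]
    return ["".join(sorted(s)) for s in kept]


pq_cover = [("PQ", "R"), ("P", "ST"), ("Q", "U"), ("U", "VW"), ("S", "XY")]
print(synthesize_3nf(pq_cover, "PQ"))
# ['PQR', 'PST', 'QU', 'UVW', 'SXY']
```

Here R5(P, Q, R) already contains the candidate key PQ, so no extra key relation is added.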
From the above arrow diagram on R, we can see that attribute X is not determined by any of the given FDs,
hence X must be part of every candidate key: no matter what the candidate keys are, or how many there are,
all of them will contain X.
To check whether R is in 3NF, use the definition of 3NF: first, it should be in 2NF, and if there exists a
non-trivial dependency between two sets of attributes X and Y such that X → Y (i.e., Y is not a subset of X)
then
a) Either X is Super Key
b) Or Y is a prime attribute.
Given FDs are XY → Z and Z → Y, and the super keys / candidate keys are XZ and XY.
a) FD: XY → Z satisfies the definition of 3NF, as XY is a super key (also, Z is a prime attribute).
b) FD: Z → Y satisfies the definition of 3NF: even though Z is not a super key, Y is a prime attribute.
Since both FD of R, XY → Z and Z → Y satisfy the definition of 3NF hence R is in 3 NF
To check whether R is in BCNF, use the definition of BCNF: first, it should be in 3NF, and if there exists
a non-trivial dependency between two sets of attributes X and Y such that X → Y ( i.e. Y is not a subset of X)
then
a) X is Super Key
Given FDs are XY → Z and Z → Y, and the super keys / candidate keys are XZ and XY.
a) FD: XY → Z satisfies the definition of BCNF, as XY is a super key.
b) FD: Z → Y does not satisfy the definition of BCNF, as Z is not a super key.
Hence R is in 3NF but not in BCNF.
Convert the table R( X, Y, Z) into BCNF:
Since due to FD: Z → Y, our table was not in BCNF, let's decompose the table
FD: Z → Y was creating an issue, hence one table R1(Z, Y)
Create a table for key XY: R2(X, Y), as XY was a candidate key
Create a table for key XZ: R3(X, Z), as XZ was a candidate key
Note: When we have more than one key (e.g., XY and XZ), then while decomposing, compare both R2 and R3
with R1: the pair R1 and R2, or R1 and R3, should have at least one common attribute, and that common
attribute must be a key of one of the two tables.
Considering R1(Z, Y) and R2(X, Y), both tables have one common attribute Y, but Y is not a key of either
R1 or R2, hence we discard R2(X, Y), i.e., we discard candidate key XY.
Considering R1(Z, Y) and R3(X, Z), both tables have one common attribute Z, and Z is the key of table R1,
hence we include R3(X, Z), i.e., we include candidate key XZ.
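The hand decomposition above agrees with the standard BCNF decomposition algorithm: while some relation r has an FD X → Y where X is not a super key of r, replace r by X+ (restricted to r) and by X together with the remaining attributes of r. The sketch below is an illustration, not the notes' own method. As a simplification it only tests left-hand sides that appear in the given FD list, which is enough for this example; single-letter attributes and the helper name `bcnf_decompose` are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure; FDs are pairs of single-letter strings, e.g. ("XY", "Z")."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_decompose(relation, fds):
    """BCNF decomposition sketch (tests only the left-hand sides present in `fds`)."""
    pending = [set(relation)]
    result = []
    while pending:
        r = pending.pop()
        for lhs, _ in fds:
            x = set(lhs)
            if not x <= r:
                continue  # this FD's left side does not lie inside r
            x_plus = closure(x, fds) & r
            # Violation: X determines something new in r, but not all of r.
            if x_plus != x and x_plus != r:
                pending.append(x_plus)            # X together with what it determines
                pending.append((r - x_plus) | x)  # X together with the rest of r
                break
        else:
            result.append(r)  # no violating FD found: r is in BCNF
    return sorted("".join(sorted(s)) for s in result)


print(bcnf_decompose("XYZ", [("XY", "Z"), ("Z", "Y")]))  # ['XZ', 'YZ']
```

The result corresponds to R1(Z, Y) and R3(X, Z); the relation R2(X, Y) for key XY never appears, matching the decision to discard it above.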
From the above arrow diagram on R, we can see that attribute X is not determined by any of the given FDs,
hence X must be part of every candidate key: no matter what the candidate keys are, or how many there are,
all of them will contain X.
To check whether R is in BCNF, use the definition of BCNF: first, it should be in 3NF, and if there exists
a non-trivial dependency between two sets of attributes X and Y such that X → Y ( i.e. Y is not a subset of X)
then
a) X is Super Key
To check whether R is in 3NF, use the definition of 3NF: if there exists a non-trivial dependency
between two sets of attributes X and Y such that X → Y ( i.e. Y is not a subset of X) then
a) Either X is Super Key
b) Or Y is a prime attribute.
a) FD: X → Y is in 3NF (as X is a super Key)
b) FD: Y → Z is not in 3NF (as neither Y is Key nor Z is a prime attribute)
Hence, because of Y → Z, the above table R is not in 3NF.
Convert the table R( X, Y, Z) into 3NF:
Since due to FD: Y → Z our table was not in 3NF, let's decompose the table
FD: Y → Z was creating issue, hence one table R1(Y, Z)
Create one Table for key X, R2(X, Y), since X → Y
STEP 6: If for any FD the previous step fails (it signifies that the table is not in BCNF), then verify that FD and the
remaining FDs with the definition of 2NF (no non-prime attribute should be partially dependent on the key of the table).
STEP 7: If for any FD the previous step fails (it signifies that the table is not in 2NF), there is no need to check it
for 1NF, as by default it is in 1NF.
STEP 8: If all the FDs satisfy the definition of BCNF, then we can say that the given R is in BCNF; if any FD fails
for BCNF and that FD and remaining FD satisfy for 3NF then we say R is in 3NF, similarly if any FD fails for
3NF and that FD and remaining FD satisfy for 2NF then we say R is in 2NF, otherwise the table is in 1NF.
Definition of 2NF: No non-prime attribute should be partially dependent on a Candidate Key, i.e. there should
not be a dependency X → Y where X is a proper subset of a candidate key and Y is a non-prime attribute.
Definition of 3NF: First, it should be in 2NF, and for every non-trivial dependency X → Y between two sets of
attributes X and Y (i.e. Y is not a subset of X)
a) either X is a Super Key,
b) or every attribute in Y is a prime attribute.
Definition of BCNF: First, it should be in 3NF, and for every non-trivial dependency X → Y between two sets
of attributes X and Y (i.e. Y is not a subset of X)
a) X must be a Super Key.
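The BCNF → 3NF → 2NF → 1NF procedure and the three definitions above can be combined into one classifier. This is a sketch under stated assumptions, not the author's method: attributes are single letters, FDs are (left, right) string pairs, candidate keys are found by brute force (adequate only for small relations like the ones in this unit), BCNF and 3NF are tested on the given FDs, and 2NF is tested by checking whether any proper subset of a candidate key determines a non-prime attribute. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure covers every attribute."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a super key, not a candidate key
            if closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def highest_normal_form(attrs, fds):
    attrs = set(attrs)
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    # Keep only the non-trivial part of each FD (RHS attributes not already on the LHS).
    nontrivial = [(set(l), set(r) - set(l)) for l, r in fds if set(r) - set(l)]

    def superkey(x):
        return closure(x, fds) >= attrs

    if all(superkey(l) for l, _ in nontrivial):
        return "BCNF"
    if all(superkey(l) or r <= prime for l, r in nontrivial):
        return "3NF"
    # 2NF fails if a proper subset of some candidate key determines a non-prime attribute.
    non_prime = attrs - prime
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return "1NF"
    return "2NF"


print(highest_normal_form("XYZ", [("X", "Y"), ("Y", "Z")]))  # 2NF
```

The other worked examples can be passed in the same way; for instance `highest_normal_form("PQRST", [("QR", "PST"), ("S", "Q")])` returns "3NF", matching the conclusion reached for that relation.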
From the above arrow diagram on R, we can see that the attributes P, Q and S are not determined by any of the
given FDs. Hence PQS must be part of every candidate key: no matter which, or how many, candidate keys R has,
each of them must contain P, Q and S.
Let us calculate the closure of PQS
PQS + = P Q R S T U X V W (from the closure method we studied earlier)
Since the closure of PQS contains all the attributes of R, PQS is a Candidate Key.
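The closure method used here can be written as a short fixed-point loop: keep applying any FD whose left side is already inside the result until nothing new is added. A minimal sketch, assuming single-letter attributes and FDs as (left, right) string pairs; `closure_with_trace` is a hypothetical name, and it prints each step so the derivation of PQS+ can be followed.

```python
def closure_with_trace(attrs, fds):
    """Attribute closure that reports each FD as it is applied."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                print(f"{lhs} -> {rhs}: closure is now {''.join(sorted(result))}")
                changed = True
    return result


FDS = [("PQ", "R"), ("QS", "TU"), ("PS", "VW"), ("P", "X")]
pqs_plus = closure_with_trace("PQS", FDS)
print("".join(sorted(pqs_plus)))  # PQRSTUVWX
```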
Recall the definition of a Candidate Key: a Super Key no proper subset of which is a Super Key.
Since every key must have PQS as an integral part, and we have proved that PQS is a Candidate Key, any
proper superset of PQS will be a Super Key but not a Candidate Key.
Hence there is only one candidate key, PQS.
Since R has 9 attributes (P, Q, R, S, T, U, V, W, X) and the Candidate Key is PQS, the prime attributes
(part of the candidate key) are P, Q and S, while the non-prime attributes are R, T, U, V, W and X.
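The argument just made (attributes that never appear on the right side of an FD must be in every key, and if they already form a key, that key is the only candidate key) can be checked mechanically. A sketch assuming single-letter attributes and (left, right) FD pairs; `unique_key_from_core` is a hypothetical helper that returns None when the core alone is not a key and larger combinations must be tried.

```python
R = set("PQRSTUVWX")
FDS = [("PQ", "R"), ("QS", "TU"), ("PS", "VW"), ("P", "X")]


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def unique_key_from_core(attrs, fds):
    """Core = attributes on no right side. If the core is a key, it is the only candidate key."""
    attrs = set(attrs)
    core = attrs - {a for _, rhs in fds for a in rhs}
    return core if closure(core, fds) >= attrs else None


key = unique_key_from_core(R, FDS)
print("Candidate key:", "".join(sorted(key)))    # Candidate key: PQS
print("Non-prime:", "".join(sorted(R - key)))   # Non-prime: RTUVWX
```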
Given FDs are { PQ → R, QS → TU, PS → VW, P → X } and the Super Key / Candidate Key is PQS.
NOTE: To solve such questions, we apply reverse engineering, i.e. first check BCNF; if not, then 3NF; if not,
then 2NF; and so on.
a. FD: PQ → R does not satisfy the definition of BCNF, as PQ is not a Super Key; hence the table is not in
BCNF (if one dependency fails, the whole table fails). Now we check the same FD for 3NF.
b. FD: PQ → R does not satisfy the definition of 3NF either, as PQ is not a Super Key and R is not a prime
attribute; hence the table is not in 3NF either (if one dependency fails, the whole table fails). Now we check
the same FD for 2NF.
c. FD: PQ → R does not satisfy the definition of 2NF either, as R, a non-prime attribute, depends on PQ,
which is only part of the key PQS (partial dependency); hence the table is not in 2NF either
(if one dependency fails, the whole table fails).
Hence, from the above three statements, we can say that table R(P, Q, R, S, T, U, V, W, X) is in 1NF
only.
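Statement c rests on a partial dependency. One way to list every partial dependency, assuming single-letter attributes, (left, right) FD pairs and the key PQS found above, is to take the closure of each proper subset of the key and keep the non-prime attributes it reaches; `partial_dependencies` is a hypothetical helper name.

```python
from itertools import combinations

R = set("PQRSTUVWX")
FDS = [("PQ", "R"), ("QS", "TU"), ("PS", "VW"), ("P", "X")]
KEY = "PQS"
NON_PRIME = R - set(KEY)


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, fds, non_prime):
    """Map each proper subset of the key to the non-prime attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for part in combinations(key, size):
            determined = closure(part, fds) & non_prime
            if determined:
                found["".join(part)] = "".join(sorted(determined))
    return found


print(partial_dependencies(KEY, FDS, NON_PRIME))
# {'P': 'X', 'PQ': 'RX', 'PS': 'VWX', 'QS': 'TU'}
```

Any non-empty result means the table is not in 2NF; here every proper subset except Q and S alone reaches some non-prime attribute.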
From the above arrow diagram on R, we can see that the attributes P and Q are not determined by any of the
given FDs. Hence PQ must be part of every candidate key: no matter which, or how many, candidate keys R has,
each of them must contain P and Q.
Let us calculate the closure of PQ
PQ + = P Q R S T U V W (from the closure method we studied earlier)
Since the closure of PQ contains all the attributes of R, PQ is a Candidate Key.
Recall the definition of a Candidate Key: a Super Key no proper subset of which is a Super Key.
Since every key must have PQ as an integral part, and we have proved that PQ is a Candidate Key, any
proper superset of PQ will be a Super Key but not a Candidate Key.
Hence there is only one candidate key, PQ.
Since R has 8 attributes (P, Q, R, S, T, U, V, W) and the Candidate Key is PQ, the prime attributes (part of the
candidate key) are P and Q, while the non-prime attributes are R, S, T, U, V and W.
Given FDs are { PQ → R, P → ST, Q → U, U → VW } and the Super Key / Candidate Key is PQ.
NOTE: To solve such questions, we apply reverse engineering, i.e. first check BCNF; if not, then 3NF; if not,
then 2NF; and so on.
a. FD: PQ → R satisfies the definition of BCNF, as PQ is a Super Key, so there is no need to check it for
lower normal forms, as it satisfies the highest one. Now we check another dependency in the same reverse
engineering manner.
b. FD: P → ST does not satisfy the definition of BCNF, as P is not a Super Key; hence the table is not in BCNF
(if one dependency fails, the whole table fails). Now we check the same FD for 3NF.
c. FD: P → ST does not satisfy the definition of 3NF either, as P is not a Super Key and S and T are not prime
attributes; hence the table is not in 3NF either (if one dependency fails, the whole table fails). Now we check
the same FD for 2NF.
d. FD: P → ST does not satisfy the definition of 2NF either, as S and T, which are non-prime attributes, depend
on P, which is only part of the key PQ (partial dependency); hence the table is not in 2NF either
(if one dependency fails, the whole table fails).
Hence, from the above three statements b, c and d, we can say that table R(P, Q, R, S, T, U, V, W) is in
1NF only.
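The reverse-engineering check in statements a to d works one FD at a time. A sketch of that per-FD trace, assuming single-letter attributes, (left, right) FD pairs and PQ as the only candidate key; `best_form_for_fd` is a hypothetical name, and its 2NF rule (LHS is a proper part of a key) mirrors the text's per-FD reasoning rather than a full 2NF test. The last FD, U → VW, which the text did not need to examine, comes out as 2NF only: U is not part of the key, so that dependency is transitive (through Q → U) rather than partial.

```python
R = set("PQRSTUVW")
FDS = [("PQ", "R"), ("P", "ST"), ("Q", "U"), ("U", "VW")]
KEYS = [set("PQ")]
PRIME = set("PQ")


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def best_form_for_fd(lhs, rhs):
    """Highest normal form one FD satisfies, checked BCNF -> 3NF -> 2NF as in the text."""
    lhs, rhs = set(lhs), set(rhs) - set(lhs)
    if closure(lhs, FDS) >= R:
        return "BCNF"
    if rhs <= PRIME:
        return "3NF"
    # Here RHS holds a non-prime attribute; if LHS is a proper part of a key, it is a partial dependency.
    if any(lhs < key for key in KEYS):
        return "1NF"
    return "2NF"


for lhs, rhs in FDS:
    print(f"{lhs} -> {rhs}: {best_form_for_fd(lhs, rhs)}")
# PQ -> R: BCNF
# P -> ST: 1NF
# Q -> U: 1NF
# U -> VW: 2NF
```

The table as a whole is in the lowest of these forms, which is 1NF.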
From the above arrow diagram on R, we can see that the attributes Q and S are not determined by any of the
given FDs. Hence QS must be part of every candidate key: no matter which, or how many, candidate keys R has,
each of them must contain Q and S.
Let us calculate the closure of QS
QS + = QS (from the closure method we studied earlier)
Since the closure of QS does not contain all the attributes of R, QS is not a Candidate Key.
On combining QS with one other attribute at a time, we find that PQS and RQS determine all the
attributes of R; hence PQS and RQS are the candidate keys of R.
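The step "combine QS with one other attribute" can be sketched as a loop over the remaining attributes. This assumes single-letter attributes and (left, right) FD pairs; keys are printed with their letters sorted, so RQS appears as QRS. Each key found is minimal, since every key must contain Q and S and QS alone is not a key.

```python
R = set("PQRSTU")
FDS = [("PQ", "R"), ("SR", "PT"), ("T", "U")]


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


core = set("QS")   # attributes on no right-hand side; QS+ = QS, so QS alone is not a key
keys = []
for extra in sorted(R - core):
    candidate = core | {extra}
    if closure(candidate, FDS) >= R:
        keys.append("".join(sorted(candidate)))
print(keys)  # ['PQS', 'QRS']
```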
Since R has 6 attributes (P, Q, R, S, T, U) and the Candidate Keys are PQS and RQS, the prime attributes
(part of a candidate key) are P, Q, R and S, while the non-prime attributes are T and U.
Given FDs are { PQ → R, SR → PT, T → U } and the Super Keys / Candidate Keys are PQS and RQS.
NOTE: To solve such questions, we apply reverse engineering, i.e. first check BCNF; if not, then 3NF; if not,
then 2NF; and so on.
a. FD: PQ → R does not satisfy the definition of BCNF, as PQ is not a Super Key; hence the table is not in
BCNF (if one dependency fails, the whole table fails). Now we check the same FD for 3NF.
b. FD: PQ → R satisfies the definition of 3NF: even though PQ is not a Super Key, R is a prime attribute,
so this FD satisfies 3NF. Now we check the other FDs using the reverse engineering process.
c. FD: SR → PT does not satisfy the definition of 3NF, as SR is not a Super Key and PT is not made up entirely
of prime attributes (P is prime, but T is not, and the whole right-hand side must be prime); hence the table is
not in 3NF (if one dependency fails, the whole table fails). Now we check the same FD for 2NF.
d. FD: SR → PT does not satisfy the definition of 2NF either, as T, a non-prime attribute, depends on SR,
which is only part of the candidate key RQS (partial dependency); hence the table is not in 2NF either
(if one dependency fails, the whole table fails).
Hence, from the above two statements c and d, we can say that table R(P, Q, R, S, T, U) is in 1NF only.
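Statement c treats SR → PT as two dependencies, SR → P and SR → T, and only the second one breaks 3NF. A sketch of that split, assuming single-letter attributes, (left, right) FD pairs and the prime attributes P, Q, R, S found above; `check_3nf_split` is a hypothetical name. It also checks T → U, which the text does not need once SR → PT has failed.

```python
R = set("PQRSTU")
FDS = [("PQ", "R"), ("SR", "PT"), ("T", "U")]
PRIME = set("PQRS")   # union of the candidate keys PQS and RQS


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def check_3nf_split(fds):
    """Read X -> AB as X -> A and X -> B, then apply the 3NF test to each single attribute."""
    results = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= R
        for a in rhs:
            ok = superkey or a in PRIME or a in lhs
            results.append((lhs, a, ok))
    return results


for lhs, a, ok in check_3nf_split(FDS):
    print(f"{lhs} -> {a}: {'satisfies 3NF' if ok else 'violates 3NF'}")
# PQ -> R: satisfies 3NF
# SR -> P: satisfies 3NF
# SR -> T: violates 3NF
# T -> U: violates 3NF
```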
From the above arrow diagram on R, we can see that attribute R is not determined by any of the given FDs.
Hence R must be part of every candidate key: no matter which, or how many, candidate keys the relation has,
each of them must contain R.
Let us calculate the closure of attribute R
R+ = R (from the closure method we studied earlier)
Since R+ does not contain all the attributes of the relation R, attribute R alone is not a Candidate Key.
On combining R with one other attribute at a time, we find that RS and RQ determine all the attributes of
the relation; hence RS and RQ are the candidate keys of R.
Since R has 5 attributes (P, Q, R, S, T) and the Candidate Keys are RS and RQ, the prime attributes (part of a
candidate key) are Q, R and S, while the non-prime attributes are P and T.
Given FDs are { QR → PST, S → Q } and the Super Keys / Candidate Keys are RS and RQ.
NOTE: To solve such questions, we apply reverse engineering, i.e. first check BCNF; if not, then 3NF; if not,
then 2NF; and so on.
a. FD: QR → PST satisfies the definition of BCNF, as QR is a Super Key; we check the other FD for BCNF.
b. FD: S → Q does not satisfy the definition of BCNF, as S is not a Super Key; hence the table is not in BCNF
(if one dependency fails, the whole table fails). Now we check the same FD for 3NF.
c. FD: S → Q satisfies the definition of 3NF: even though S is not a Super Key, Q is a prime attribute,
so this FD satisfies 3NF.
Since there are only two FDs, of which one (QR → PST) satisfies BCNF while the other (S → Q) satisfies only
3NF, the highest normal form is 3NF, i.e. R(P, Q, R, S, T) is in 3NF.
From the above arrow diagram on R, we can see that every attribute is determined by some given FD, so no
attribute is forced into every key; hence we check each attribute (i.e. A, B and C) as a possible candidate key.
Let us calculate the closure of A
A+ = ABC (from the closure method we studied earlier)
Since the closure of A contains all the attributes of R, A is a Candidate Key.
Let us calculate the closure of B
B+ = BAC (from the closure method we studied earlier)
Since the closure of B contains all the attributes of R, B is a Candidate Key.
Let us calculate the closure of C
C+ = CAB (from the closure method we studied earlier)
Since the closure of C contains all the attributes of R, C is a Candidate Key.
Hence the three Candidate Keys are A, B and C.
Since R has 3 attributes (A, B and C) and the Candidate Keys are A, B and C, the prime attributes (part of a
candidate key) are A, B and C, and there is no non-prime attribute.
Given FDs are { A → B, B → C, C → A } and the Super Keys / Candidate Keys are A, B and C.
NOTE: To solve such questions, we apply reverse engineering, i.e. first check BCNF; if not, then 3NF; if not,
then 2NF; and so on.
a. FD: A → B satisfies the definition of BCNF, as A is a Super Key; we check the other FDs for BCNF.
b. FD: B → C satisfies the definition of BCNF, as B is a Super Key; we check the other FDs for BCNF.
c. FD: C → A satisfies the definition of BCNF, as C is a Super Key.
Since there are only three FDs and all of them { A → B, B → C, C → A } satisfy BCNF, the
highest normal form is BCNF.
Therefore R(A, B, C) is in BCNF.
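The BCNF test used in statements a to c reduces to one condition: the left side of every non-trivial FD must be a super key. A minimal sketch, assuming single-letter attributes and (left, right) FD pairs; `in_bcnf` is a hypothetical helper name.

```python
R = set("ABC")
FDS = [("A", "B"), ("B", "C"), ("C", "A")]


def closure(attrs, fds):
    """Attribute closure: keep adding right sides whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def in_bcnf(attrs, fds):
    """BCNF: the left side of every non-trivial FD must be a super key."""
    attrs = set(attrs)
    return all(closure(lhs, fds) >= attrs for lhs, rhs in fds if not set(rhs) <= set(lhs))


print({lhs: "".join(sorted(closure(lhs, FDS))) for lhs, _ in FDS})  # {'A': 'ABC', 'B': 'ABC', 'C': 'ABC'}
print(in_bcnf(R, FDS))  # True
```

The same function returns False for the previous example, R(P, Q, R, S, T) with QR → PST and S → Q, because S+ = SQ is not a super key.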