CBD 04 Normalisation

The document discusses the normalization process for relational schema design, focusing on the first three normal forms (1NF, 2NF, 3NF) and Boyce-Codd Normal Form (BCNF). It outlines the importance of functional dependencies, primary keys, and the necessity of minimizing redundancy and anomalies in database design. The document also emphasizes practical applications of normalization in real-world database projects and the challenges associated with achieving higher normal forms.


Conception de Bases de Données (Database Design)

Normalization process for relational schema design.

Y. Sam (DI - Univ. de Tours) L2 Info.


Normal Forms Based on Primary Keys
❑ Having introduced functional dependencies, we are now ready to
specify how to use them to develop a formal methodology for testing
and improving relation schemas.

❑ We assume that :
▪ a set of functional dependencies is given for each relation, and
▪ each relation has a designated primary key;
this information combined with the tests (conditions) for normal forms
drives the normalization process for relational schema design.

Normal Forms Based on Primary Keys
❑ Evaluate each relation for goodness and decompose it further as
needed to achieve higher normal forms using the normalization
theory

❑ We focus on the first three normal forms for relation schemas, the
intuition behind them, and how they were developed

❑ We start by informally discussing normal forms and the motivation
behind their development

Normalization of Relations
❑ The normalization process takes a relation schema through a series of tests to certify whether it
satisfies a certain normal form.

❑ The process, which proceeds in a top-down fashion by evaluating each relation against the criteria
for normal forms and decomposing relations as necessary, can thus be considered as relational
design by analysis.

❑ Three normal forms (1NF, 2NF, 3NF) plus Boyce-Codd NF (BCNF), based on functional dependencies

❑ 4NF and 5NF, based on multivalued dependencies and join dependencies, respectively

Normalization of data
❑ process of analyzing the given relation schemas based on their FDs
and primary keys to achieve the desirable properties of
▪ (1) minimizing redundancy and
▪ (2) minimizing the insertion, deletion, and update anomalies
❑ a “filtering” or “purification” process to make the design have
successively better quality.
▪ An unsatisfactory relation schema that does not meet the condition for a normal form—the
normal form test—is decomposed into smaller relation schemas that contain a subset of the
attributes and meet the test that was otherwise not met by the original relation

Normalization procedure
❑ provides database designers with :
▪ A formal framework for analyzing relation schemas based on their keys and on the functional
dependencies among their attributes
▪ A series of normal form tests that can be carried out on individual relation schemas so that
the relational database can be normalized to any desired degree

❑ The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it
has been normalized.

Normalization procedure
❑ Normal forms, when considered in isolation from other factors, do not guarantee
a good database design.

❑ It is generally not sufficient to check separately that each relation schema in the
database is, say, in BCNF or 3NF. Rather, the process of normalization through
decomposition must also guarantee two properties:
▪ The nonadditive join or lossless join property, which guarantees that the spurious tuple generation
problem does not occur with respect to the relation schemas created after decomposition (mandatory).
▪ The dependency preservation property, which ensures that each functional dependency is represented
in some individual relation resulting after decomposition (can sometimes be sacrificed).
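
The spurious tuple problem can be illustrated with a small sketch. The relation, its attribute names, and the two sample rows below are hypothetical and chosen only for illustration; the code projects a relation onto two pairs of schemas and joins the projections back together.

```python
def row(**values):
    """Build an immutable row from attribute=value pairs."""
    return frozenset(values.items())

def project(rows, attrs):
    """Project a set of rows onto the given attributes (duplicates collapse)."""
    return {frozenset((a, v) for a, v in r if a in attrs) for r in rows}

def natural_join(left, right):
    """Join two sets of rows on all attributes they have in common."""
    joined = set()
    for l in left:
        for r in right:
            merged = dict(l)
            # Rows match when every shared attribute carries the same value.
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(dict(r))
                joined.add(frozenset(merged.items()))
    return joined

# Hypothetical sample data: two students take the same course with different instructors.
R = {
    row(Student="Narayan", Course="Database", Instructor="Mark"),
    row(Student="Smith", Course="Database", Instructor="Navathe"),
}

# Decomposition joined on Course: the join invents student/instructor pairings.
lossy = natural_join(project(R, {"Student", "Course"}),
                     project(R, {"Course", "Instructor"}))

# Decomposition joined on Instructor: the join gives back exactly R.
lossless = natural_join(project(R, {"Instructor", "Course"}),
                        project(R, {"Instructor", "Student"}))

spurious = lossy - R
print(len(spurious))   # 2
print(lossless == R)   # True
```

The first decomposition returns two tuples that were never in R, which is exactly what the nonadditive join property rules out.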

Practical Use of Normal Forms
❑ Most practical design projects in commercial and governmental environments
acquire existing designs of databases from previous designs, from designs in
legacy models, or from existing files. Such projects are certainly interested in
assuring that the designs are of good quality and sustainable over long periods of time.

❑ Existing designs are evaluated by applying the tests for normal forms, and
normalization is carried out in practice so that the resulting designs are of high
quality and meet the desirable properties stated previously.

Practical Use of Normal Forms
❑ Although several higher normal forms have been defined, such as the 4NF and 5NF, the practical
utility of these normal forms becomes questionable. The reason is that the constraints on which
they are based are rare and hard for the database designers and users to understand or to detect.
Designers and users must either already know them or discover them as a part of the business.
Thus, database design as practiced in industry today pays particular attention to normalization
only up to 3NF, BCNF, or at most 4NF.

❑ Another point worth noting is that the database designers need not normalize to the highest
possible normal form. Relations may be left in a lower normalization status, such as 2NF, for
performance reasons
▪ Doing so incurs the corresponding penalties of dealing with the anomalies.
▪ Denormalization is the process of storing the join of higher normal form relations as a base relation, which is in a
lower normal form.

Keys and Attributes Participating in Keys

❑ The difference between a key and a superkey is that a key has to be minimal; that is, if we have a key
K = {A1, A2, … , Ak} of R, then K - {Ai} is not a key of R for any Ai, 1 <= i <= k.

❑ {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn, Ename, Bdate}, and any set of attributes that includes
Ssn are all superkeys.
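
The distinction between a superkey and a key can be checked mechanically with attribute closures. The sketch below is one possible way to do it; the reduced EMPLOYEE schema and its single functional dependency (Ssn -> Ename, Bdate) are assumptions made for the example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_superkey(attrs, schema, fds):
    """A superkey functionally determines every attribute of the schema."""
    return closure(attrs, fds) >= schema

def is_key(attrs, schema, fds):
    """A key is a minimal superkey: removing any attribute breaks it."""
    attrs = frozenset(attrs)
    return is_superkey(attrs, schema, fds) and all(
        not is_superkey(attrs - {a}, schema, fds) for a in attrs
    )

# Assumed for illustration: a reduced EMPLOYEE schema with one FD.
EMPLOYEE = frozenset({"Ssn", "Ename", "Bdate"})
FDS = [(frozenset({"Ssn"}), frozenset({"Ename", "Bdate"}))]

print(is_key({"Ssn"}, EMPLOYEE, FDS))                # True
print(is_superkey({"Ssn", "Ename"}, EMPLOYEE, FDS))  # True
print(is_key({"Ssn", "Ename"}, EMPLOYEE, FDS))       # False
```

{Ssn, Ename} is a superkey but not a key, because dropping Ename still leaves a superkey.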

Keys and Attributes Participating in Keys
❑ If a relation schema has more than one key, each is called a candidate key.
❑ One of the candidate keys is arbitrarily designated to be the primary key, and the others are called
secondary keys.
❑ In a practical relational database, each relation schema must have a primary key.
❑ If no candidate key is known for a relation, the entire relation can be treated as a default superkey.

❑ {Ssn} is the only candidate key for EMPLOYEE, so it is also the primary key.

First Normal Form
❑ It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the
value of any attribute in a tuple must be a single value from the domain of that attribute. Hence, 1NF
disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single
tuple.

❑ We assume that each department can have a number of locations.

❑ The DEPARTMENT relation is not in 1NF because Dlocations is not an atomic attribute.

First Normal Form
❑ There are three main techniques to achieve first normal form for such a relation. First technique:
▪ Remove the attribute Dlocations that violates 1NF and place it in a separate relation DEPT_LOCATIONS along with
the primary key Dnumber of DEPARTMENT.
▪ The primary key of this newly formed relation is the combination {Dnumber, Dlocation}.
▪ A distinct tuple in DEPT_LOCATIONS exists for each location of a department.
▪ This decomposes the non-1NF relation into two 1NF relations.
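
As a rough sketch of this decomposition, the code below splits rows whose Dlocations value is a set into two flat collections. The sample rows and the function name to_1nf are hypothetical; only the attribute names Dnumber, Dlocations and the relation name DEPT_LOCATIONS come from the slide.

```python
# Hypothetical non-1NF rows: Dlocations holds a set of values per department.
department = [
    {"Dname": "Research", "Dnumber": 5,
     "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Administration", "Dnumber": 4, "Dlocations": {"Stafford"}},
    {"Dname": "Headquarters", "Dnumber": 1, "Dlocations": {"Houston"}},
]

def to_1nf(rows):
    """Split off the multivalued attribute into its own relation."""
    # DEPARTMENT keeps every attribute except Dlocations.
    dept = [{k: v for k, v in r.items() if k != "Dlocations"} for r in rows]
    # DEPT_LOCATIONS gets one (Dnumber, Dlocation) tuple per location.
    dept_locations = sorted(
        (r["Dnumber"], loc) for r in rows for loc in r["Dlocations"]
    )
    return dept, dept_locations

dept, dept_locations = to_1nf(department)
for pair in dept_locations:
    print(pair)
```

Each tuple in dept_locations is identified by the combination (Dnumber, Dlocation), and every value in both collections is atomic.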

First Normal Form
❑ Second technique: expand the key so that there will be a separate tuple in the original DEPARTMENT relation
for each location of a department.

❑ In this case, the primary key becomes the combination {Dnumber, Dlocation}.

❑ This solution has the disadvantage of introducing redundancy in the relation and hence is rarely adopted.

First Normal Form
❑ Third technique: if a maximum number of values is known for the attribute (for example, if it is known that at
most three locations can exist for a department), replace the Dlocations attribute by three atomic attributes:
Dlocation1, Dlocation2, and Dlocation3.

❑ This solution has the disadvantage of introducing NULL values if most departments have fewer than three
locations. It further introduces spurious semantics about the ordering among the location values; that ordering is
not originally intended.

❑ Querying on this attribute becomes more difficult.
▪ E.g., consider how you would write the query "List the departments that have 'Bellaire' as one of their locations" in this design.

❑ For all these reasons, it is best to avoid this alternative.

❑ The first solution is generally considered best because:
▪ it does not suffer from redundancy and it is completely general;
▪ it places no maximum limit on the number of values.
❑ If we choose the second solution, it will be decomposed further during subsequent normalization steps into the first solution.
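
To make the querying difficulty concrete, here is a small comparison of the 'Bellaire' query under the third design (fixed Dlocation1..Dlocation3 columns) and under the first design (a separate DEPT_LOCATIONS relation). The sample rows are hypothetical.

```python
# Third design: three atomic columns, padded with None (NULL) when unused.
wide = [
    {"Dnumber": 5, "Dlocation1": "Bellaire",
     "Dlocation2": "Sugarland", "Dlocation3": "Houston"},
    {"Dnumber": 4, "Dlocation1": "Stafford",
     "Dlocation2": None, "Dlocation3": None},
]

# First design: one (Dnumber, Dlocation) tuple per location.
dept_locations = [
    (5, "Bellaire"), (5, "Sugarland"), (5, "Houston"), (4, "Stafford"),
]

# The wide design must test every location column explicitly.
wide_hits = [
    r["Dnumber"] for r in wide
    if "Bellaire" in (r["Dlocation1"], r["Dlocation2"], r["Dlocation3"])
]

# The separate relation needs a single condition on a single attribute.
narrow_hits = [dnum for dnum, loc in dept_locations if loc == "Bellaire"]

print(wide_hits)    # [5]
print(narrow_hits)  # [5]
```

Both queries return the same department, but the wide version has to be rewritten whenever the maximum number of locations changes.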

Second Normal Form
❑ Second normal form (2NF) is based on the concept of full functional dependency.
❑ A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X means
that the dependency does not hold anymore.

❑ The test for 2NF involves testing for functional dependencies whose left-hand side attributes are part of the
primary key.
❑ If the primary key contains a single attribute, the test need not be applied at all

[Figure: example of a relation that is in 1NF but not in 2NF]
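
The 2NF test described above can be sketched as a search for nonkey attributes that depend on a proper subset of the primary key. The EMP_PROJ schema and its functional dependencies below are assumed for illustration, and partial_dependencies is a hypothetical helper name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def partial_dependencies(schema, primary_key, fds):
    """List (key_part, attribute) pairs where a nonkey attribute
    depends on a proper subset of the primary key."""
    found = []
    nonkey = schema - primary_key
    # Only proper, non-empty subsets: nothing to test for a one-attribute key.
    for size in range(1, len(primary_key)):
        for part in combinations(sorted(primary_key), size):
            for attr in sorted(closure(part, fds) & nonkey):
                found.append((part, attr))
    return found

# Assumed example schema and FDs (not taken from the slide).
EMP_PROJ = frozenset({"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"})
PK = frozenset({"Ssn", "Pnumber"})
FDS = [
    (frozenset({"Ssn", "Pnumber"}), frozenset({"Hours"})),
    (frozenset({"Ssn"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
]

violations = partial_dependencies(EMP_PROJ, PK, FDS)
for part, attr in violations:
    print(part, "->", attr)
```

Under these assumptions Ename depends on Ssn alone and Pname, Plocation depend on Pnumber alone, so the relation is in 1NF but not in 2NF. With a single-attribute primary key the loop body never runs, matching the remark that the test need not be applied.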

Third Normal Form
❑ Third normal form (3NF) is based on the concept of transitive dependency.
❑ A functional dependency X -> Y in a relation schema R is a transitive dependency if there exists a set of
attributes Z in R that is neither a candidate key nor a subset of any key of R and both X -> Z and Z -> Y hold.

❑ The dependency Ssn -> Dmgr_ssn is transitive through Dnumber in EMP_DEPT because both the
dependencies Ssn -> Dnumber and Dnumber -> Dmgr_ssn hold and Dnumber is neither a key itself nor a
subset of the key of EMP_DEPT.
❑ Intuitively, we can see that the dependency of Dmgr_ssn on Dnumber is undesirable in EMP_DEPT since
Dnumber is not a key of EMP_DEPT.
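
A possible way to detect such dependencies is to look for listed FDs whose left-hand side is not a superkey and whose right-hand side contains a nonprime attribute. The sketch below uses a reduced EMP_DEPT schema; the attributes Ename and Dname and the exact FD set are assumptions added for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Brute-force search for minimal superkeys (fine for small schemas)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            # Smaller keys are found first, so supersets of them are skipped.
            if closure(x, fds) >= schema and not any(k <= x for k in keys):
                keys.append(x)
    return keys

def violations_of_3nf(schema, fds):
    """Listed FDs X -> A where X is not a superkey and A is not prime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue
        for attr in sorted(rhs - lhs - prime):
            bad.append((sorted(lhs), attr))
    return bad

# Reduced EMP_DEPT; Ename and Dname are assumed extra attributes.
EMP_DEPT = frozenset({"Ssn", "Ename", "Dnumber", "Dname", "Dmgr_ssn"})
FDS = [
    (frozenset({"Ssn"}), frozenset({"Ename", "Dnumber"})),
    (frozenset({"Dnumber"}), frozenset({"Dname", "Dmgr_ssn"})),
]

bad = violations_of_3nf(EMP_DEPT, FDS)
print(bad)
```

The reported violations are the dependencies of Dmgr_ssn and Dname on Dnumber, which is the second link of the transitive chain Ssn -> Dnumber -> Dmgr_ssn.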

Summary of 3 NFs

Decomposition Process
❑ Figure 14.12(a), on the next slide, describes parcels of land for sale in various counties
of a state.

❑ Suppose that there are two candidate keys: Property_id# and {County_name,
Lot#};
▪ that is, lot numbers are unique only within each county,
▪ but Property_id# numbers are unique across counties for the entire state.

Decomposition Process

Decomposition Process
❑ Based on the two candidate keys Property_id# and {County_name, Lot#}, the
functional dependencies FD1 and FD2 in Figure 14.12(a) hold.

❑ We choose Property_id# as the primary key, so it is underlined in Figure 14.12(a),
but no special consideration will be given to this key over the other candidate key.

❑ Suppose that the following two additional functional dependencies hold in LOTS:
▪ FD3: County_name -> Tax_rate says that the tax rate is fixed for a given county (does not vary lot by lot within the same county),
▪ FD4: Area -> Price says that the price of a lot is determined by its area regardless of which county it is in.

Decomposition Process
❑ The LOTS relation schema violates the general definition of 2NF because Tax_rate is partially
dependent on the candidate key {County_name, Lot#}, due to FD3.

❑ To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2, shown in
Figure 14.12(b).

❑ We construct LOTS1 by removing the attribute Tax_rate that violates 2NF from LOTS and placing it
with County_name (the left-hand side of FD3 that causes the partial dependency) into another
relation LOTS2.
▪ Both LOTS1 and LOTS2 are in 2NF.
▪ Notice that FD4 does not violate 2NF and is carried over to LOTS1.

Decomposition Process
❑ LOTS2 (Figure 14.12(b)) is in 3NF.

❑ However, FD4 in LOTS1 violates 3NF because Area is not a superkey and Price is not
a prime attribute in LOTS1.

❑ To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and LOTS1B shown in
Figure 14.12(c).

❑ We construct LOTS1A by removing the attribute Price that violates 3NF from LOTS1 and placing it with
Area into another relation LOTS1B.

❑ Both LOTS1A and LOTS1B are in 3NF.
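
The whole LOTS decomposition can be replayed with a small classifier that applies the general definitions of 2NF and 3NF to each schema. This is a brute-force sketch: FD1 and FD2 are written out from the two candidate keys, FD3 and FD4 are taken as County_name -> Tax_rate and Area -> Price, and the helper names are hypothetical.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Helper: build an FD from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def subsets(attrs):
    items = sorted(attrs)
    return [frozenset(c) for n in range(1, len(items) + 1)
            for c in combinations(items, n)]

def normal_form(schema, fds):
    """Return '1NF', '2NF' or '3NF' for schema, using the FDs implied on it."""
    schema = frozenset(schema)
    supers = {x for x in subsets(schema) if closure(x, fds) >= schema}
    keys = [k for k in supers if not any(s < k for s in supers)]
    prime = frozenset().union(*keys)
    in_2nf = in_3nf = True
    for x in subsets(schema):
        if x in supers:
            continue
        # Nonprime attributes of this schema determined by a non-superkey x.
        if (closure(x, fds) & schema) - x - prime:
            in_3nf = False
            if any(x < k for k in keys):  # x is part of a key: partial dependency
                in_2nf = False
    return "3NF" if in_3nf else "2NF" if in_2nf else "1NF"

FDS = [
    fd("Property_id#", "County_name Lot# Area Price Tax_rate"),   # FD1
    fd("County_name Lot#", "Property_id# Area Price Tax_rate"),   # FD2
    fd("County_name", "Tax_rate"),                                # FD3
    fd("Area", "Price"),                                          # FD4
]

schemas = {
    "LOTS":   "Property_id# County_name Lot# Area Price Tax_rate",
    "LOTS1":  "Property_id# County_name Lot# Area Price",
    "LOTS2":  "County_name Tax_rate",
    "LOTS1A": "Property_id# County_name Lot# Area",
    "LOTS1B": "Area Price",
}
forms = {name: normal_form(attrs.split(), FDS) for name, attrs in schemas.items()}
print(forms)
```

Under these assumptions the classifier reports LOTS as 1NF only, LOTS1 as 2NF, and LOTS2, LOTS1A and LOTS1B as 3NF, in line with the steps described above.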

Boyce-Codd Normal Form
❑ Suppose that we have thousands of lots in the relation but the lots are from only two counties: DeKalb and
Fulton.
❑ Suppose also that lot sizes in DeKalb County are only 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 acres, whereas lot sizes in
Fulton County are restricted to 1.1, 1.2, … , 1.9, and 2.0 acres. The area of a lot then determines its county:
▪ FD5: Area -> County_name.
▪ LOTS1A is still in 3NF, because County_name is a prime attribute.

❑ Representing FD5 in a separate relation with the attributes Area and County_name reduces the redundancy
of repeating the same information in the thousands of LOTS1A tuples.
❑ BCNF is a stronger normal form that would disallow LOTS1A
and suggest the need for decomposing it.

Boyce-Codd Normal Form
❑ In practice, most relation schemas that are in 3NF are also in BCNF.
▪ A relation schema R is in 3NF but not in BCNF only if there exists some FD X -> A that holds in R
❑ with X not being a superkey
❑ and A being a prime attribute.

❑ Such an FD leads to potential redundancy of data, as illustrated above by
▪ FD5: Area -> County_name in the LOTS1A relation.

❑ Ideally, relational database design should strive to achieve BCNF or 3NF for every relation
schema.

❑ Achieving the normalization status of just 1NF or 2NF is not considered adequate, since
both were developed historically to be intermediate normal forms as stepping stones to
3NF and BCNF.
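
The difference between the 3NF and BCNF conditions can be checked per FD. The sketch below evaluates LOTS1A with FD5 added; FD1 and FD2 are restricted to the attributes of LOTS1A, which is an assumption about how the slide's figure states them.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """Brute-force minimal superkeys, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if closure(x, fds) >= schema and not any(k <= x for k in keys):
                keys.append(x)
    return keys

LOTS1A = frozenset({"Property_id#", "County_name", "Lot#", "Area"})
FDS = {
    "FD1": (frozenset({"Property_id#"}),
            frozenset({"County_name", "Lot#", "Area"})),
    "FD2": (frozenset({"County_name", "Lot#"}),
            frozenset({"Property_id#", "Area"})),
    "FD5": (frozenset({"Area"}), frozenset({"County_name"})),
}
fds = list(FDS.values())
keys = candidate_keys(LOTS1A, fds)
prime = frozenset().union(*keys)

report = {}
for name, (lhs, rhs) in FDS.items():
    lhs_is_superkey = closure(lhs, fds) >= LOTS1A
    rhs_is_prime = (rhs - lhs) <= prime
    # BCNF demands a superkey on the left; 3NF also accepts a prime right side.
    report[name] = {"BCNF": lhs_is_superkey,
                    "3NF": lhs_is_superkey or rhs_is_prime}

for name, result in report.items():
    print(name, result)
```

FD5 passes the 3NF condition only because County_name is prime, and it fails the BCNF condition because Area is not a superkey. With FD5 in place the brute-force search also finds {Area, Lot#} as a candidate key.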

Boyce-Codd Normal Form
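❑ Let TEACH (Student, Course, Instructor) be a relation with the following dependencies:
▪ FD1: {Student, Course} -> Instructor
▪ FD2: Instructor -> Course

▪ FD2 means that each instructor teaches one course; this is a constraint for this application.
▪ Note that {Student, Course} is a candidate key for this relation.
▪ With Student as A, Course as B, and Instructor as C, the dependencies have the form {A, B} -> C and C -> B.

❑ Hence this relation is in 3NF but not in BCNF.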


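❑ Decomposition of this relation schema into two schemas is not straightforward because it may be
decomposed into one of the three following possible pairs:
▪ 1. {Student, Instructor} and {Student, Course}
▪ 2. {Course, Instructor} and {Course, Student}
▪ 3. {Instructor, Course} and {Instructor, Student}

❑ All three decompositions lose the functional dependency FD1.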

Boyce-Codd Normal Form
❑ Which of the above three is a desirable decomposition?
❑ We are not able to meet the functional dependency preservation for any of the above BCNF decompositions,
as seen above; but we must meet the nonadditive join property. A simple test comes in handy to test the
binary decomposition of a relation into two relations:
▪ NJB (nonadditive join test for binary decompositions): a decomposition D = {R1, R2} of R has the nonadditive
join property with respect to a set F of FDs on R if and only if either (R1 ∩ R2) -> (R1 - R2) or
(R1 ∩ R2) -> (R2 - R1) is in F+.

❑ If we apply this test to the above three decompositions, we find that only the third decomposition meets the
test. In the third decomposition, the R1 ∩ R2 for the above test is Instructor and R1 - R2 is Course. Because
Instructor -> Course, the NJB test is satisfied and the decomposition is nonadditive.
❑ Hence, the proper decomposition of TEACH into BCNF relations is:
❑ TEACH1 (Instructor, Course) and TEACH2 (Instructor, Student)
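
The binary nonadditive join test can be sketched with attribute closures: the common attributes of the two schemas must determine all of R1 - R2 or all of R2 - R1. The code assumes the TEACH dependencies {Student, Course} -> Instructor and Instructor -> Course and tries the three two-attribute decompositions; the function name njb is hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def njb(r1, r2, fds):
    """True if the binary decomposition {r1, r2} has the nonadditive join property."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

# Assumed TEACH dependencies.
FDS = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]

decompositions = [
    (frozenset({"Student", "Instructor"}), frozenset({"Student", "Course"})),
    (frozenset({"Course", "Instructor"}), frozenset({"Course", "Student"})),
    (frozenset({"Instructor", "Course"}), frozenset({"Instructor", "Student"})),
]

results = [njb(r1, r2, FDS) for r1, r2 in decompositions]
print(results)  # [False, False, True]
```

Only the pair that shares Instructor passes, because Instructor is the only single attribute that determines another attribute of TEACH.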

Boyce-Codd Normal Form
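❑ We make sure that we meet this property, because nonadditive decomposition is a must during
normalization.
❑ In general, a relation R not in BCNF can be decomposed so as to meet the nonadditive join property by the
following procedure. It decomposes R successively into a set of relations that are in BCNF:
▪ Let X -> A be an FD that holds in R and violates BCNF. Decompose R into the two relations R - A and XA.
▪ If either R - A or XA is not in BCNF, repeat the process.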
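
A minimal sketch of such a successive decomposition, under the usual textbook formulation (pick an FD X -> A that violates BCNF, split R into R - A and XA, and repeat), is shown below. The brute-force violation search and the function names are assumptions suitable only for small schemas; the TEACH dependencies are assumed as {Student, Course} -> Instructor and Instructor -> Course.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_violation(schema, fds):
    """Return (X, A) where X -> A holds on schema, A is not in X,
    and X is not a superkey of schema; None if schema is in BCNF."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            determined = closure(x, fds) & schema
            if determined != schema and determined - x:
                return x, min(determined - x)  # deterministic choice of A
    return None

def bcnf_decompose(schema, fds):
    """Split schema into R - A and XA until no BCNF violation remains."""
    violation = find_violation(schema, fds)
    if violation is None:
        return [schema]
    x, a = violation
    return bcnf_decompose(schema - {a}, fds) + bcnf_decompose(x | {a}, fds)

TEACH = frozenset({"Student", "Course", "Instructor"})
FDS = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]

result = bcnf_decompose(TEACH, FDS)
print([sorted(r) for r in result])
# [['Instructor', 'Student'], ['Course', 'Instructor']]
```

Each split shares X between the two parts and X determines A, so every step satisfies the binary nonadditive join test. For TEACH the result corresponds to TEACH2 (Instructor, Student) and TEACH1 (Instructor, Course).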
