Chapter 2 - Relational Model and Languages
www.EBooksWorld.ir
M04_CONN3067_06_SE_C04.indd 147 10/06/14 4:27 PM
Chapter
Chapter Objectives
In this chapter you will learn:
The Relational Database Management System (RDBMS) has become the domi-
nant data-processing software in use today, with an estimated total software
revenue worldwide of US$24 billion in 2011 and estimated to grow to about
US$37 billion by 2016. This software represents the second generation of DBMSs
and is based on the relational data model proposed by E. F. Codd (1970). In the
relational model, all data is logically structured within relations (tables). Each
relation has a name and is made up of named attributes (columns) of data. Each
tuple (row) contains one value per attribute. A great strength of the relational
model is this simple logical structure. Yet behind this simple structure is a sound
theoretical foundation that is lacking in the first generation of DBMSs (the net-
work and hierarchical DBMSs).
We devote a significant amount of this book to the RDBMS, in recognition
of the importance of these systems. In this chapter, we discuss the terminology
and basic structural concepts of the relational data model. In the next chapter,
we examine the relational languages that can be used for update and data
retrieval.
150 | Chapter 4 The Relational Model
Although interest in the relational model came from several directions, the most
significant research may be attributed to three projects with rather different per-
spectives. The first of these, at IBM’s San José Research Laboratory in California,
was the prototype relational DBMS System R, which was developed during the late
1970s (Astrahan et al., 1976). This project was designed to prove the practicality
of the relational model by providing an implementation of its data structures
4.1 Brief History of the Relational Model | 151
4.2 Terminology
The relational model is based on the mathematical concept of a relation, which is
physically represented as a table. Codd, a trained mathematician, used terminology
taken from mathematics, principally set theory and predicate logic. In this section
we explain the terminology and structural concepts of the relational model.
An RDBMS requires only that the database be perceived by the user as tables.
Note, however, that this perception applies only to the logical structure of the
database: that is, the external and conceptual levels of the ANSI-SPARC archi-
tecture discussed in Section 2.1. It does not apply to the physical structure of the
database, which can be implemented using a variety of storage structures (see
Appendix F).
In the relational model, relations are used to hold information about the
objects to be represented in the database. A relation is represented as a two-
dimensional table in which the rows of the table correspond to individual records
and the table columns correspond to attributes. Attributes can appear in any
order and the relation will still be the same relation, and therefore will convey
the same meaning.
For example, the information on branch offices is represented by the Branch
relation, with columns for attributes branchNo (the branch number), street, city, and
postcode. Similarly, the information on staff is represented by the Staff relation, with
columns for attributes staffNo (the staff number), fName, lName, position, sex, DOB
(date of birth), salary, and branchNo (the number of the branch the staff member
works at). Figure 4.1 shows instances of the Branch and Staff relations. As you can see
from this example, a column contains values of a single attribute; for example, the
branchNo columns contain only numbers of existing branch offices.
Domain: A domain is the set of allowable values for one or more attributes.
Domains are an extremely powerful feature of the relational model. Every attrib-
ute in a relation is defined on a domain. Domains may be distinct for each attribute,
or two or more attributes may be defined on the same domain. Figure 4.2 shows the
domains for some of the attributes of the Branch and Staff relations. Note that at any
given time, typically there will be values in a domain that do not currently appear
as values in the corresponding attribute.
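As a rough illustration of how a domain constrains the values an attribute may take, the following Python sketch encodes three of the domain definitions from Figure 4.2 as membership tests. The names `DOMAINS`, `ATTRIBUTE_DOMAIN`, and `in_domain` are hypothetical and exist only for this example; a real DBMS would declare domains in its own data definition language.

```python
import re
from decimal import Decimal


def is_branch_number(value):
    # character: size 4, range B001-B999
    return (
        isinstance(value, str)
        and re.fullmatch(r"B[0-9]{3}", value) is not None
        and value != "B000"
    )


def is_sex(value):
    # character: size 1, value M or F
    return value in ("M", "F")


def is_salary(value):
    # monetary: only the range 6000.00-40000.00 is checked in this sketch
    return (
        isinstance(value, Decimal)
        and Decimal("6000.00") <= value <= Decimal("40000.00")
    )


# Each domain is defined once, in a central place.
DOMAINS = {
    "BranchNumbers": is_branch_number,
    "Sex": is_sex,
    "Salaries": is_salary,
}

# Two attributes may be defined on the same domain (branchNo appears twice).
ATTRIBUTE_DOMAIN = {
    ("Branch", "branchNo"): "BranchNumbers",
    ("Staff", "branchNo"): "BranchNumbers",
    ("Staff", "sex"): "Sex",
    ("Staff", "salary"): "Salaries",
}


def in_domain(relation, attribute, value):
    """Return True if value belongs to the domain of relation.attribute."""
    return DOMAINS[ATTRIBUTE_DOMAIN[(relation, attribute)]](value)
```

Under these assumptions, `in_domain("Staff", "branchNo", "B999")` holds even if no branch B999 currently exists, which mirrors the remark that a domain usually contains values that do not yet appear in the attribute.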
The domain concept is important, because it allows the user to define in a central
place the meaning and source of values that attributes can hold. As a result, more
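Figure 4.1 Instances of the Branch and Staff relations. (The figure labels the attributes, the degree, the cardinality, the primary key, and the foreign key.)

Branch
branchNo  street        city      postcode
B005      22 Deer Rd    London    SW1 4EH
B007      16 Argyll St  Aberdeen  AB2 3SU
B003      163 Main St   Glasgow   G11 9QX
B004      32 Manse Rd   Bristol   BS99 1NZ
B002      56 Clover Dr  London    NW10 6EU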
Figure 4.2 Domains for some attributes of the Branch and Staff relations.

Attribute  Domain Name    Meaning                                 Domain Definition
branchNo   BranchNumbers  The set of all possible branch numbers  character: size 4, range B001–B999
street     StreetNames    The set of all street names in Britain  character: size 25
city       CityNames      The set of all city names in Britain    character: size 15
postcode   Postcodes      The set of all postcodes in Britain     character: size 8
sex        Sex            The sex of a person                     character: size 1, value M or F
DOB        DatesOfBirth   Possible values of staff birth dates    date, range from 1-Jan-20, format dd-mmm-yy
salary     Salaries       Possible values of staff salaries       monetary: 7 digits, range 6000.00–40000.00
The elements of a relation are the rows or tuples in the table. In the Branch rela-
tion, each row contains four values, one for each attribute. Tuples can appear in
any order and the relation will still be the same relation, and therefore convey the
same meaning.
The structure of a relation, together with a specification of the domains and any
other restrictions on possible values, is sometimes called its intension, which is usu-
ally fixed, unless the meaning of a relation is changed to include additional attrib-
utes. The tuples are called the extension (or state) of a relation, which changes
over time.
The Branch relation in Figure 4.1 has four attributes or degree four. This means
that each row of the table is a four-tuple, containing four values. A relation with only
one attribute would have degree one and be called a unary relation or one-tuple.
A relation with two attributes is called binary, one with three attributes is called
ternary, and after that the term n-ary is usually used. The degree of a relation is a
property of the intension of the relation.
By contrast, the number of tuples is called the cardinality of the relation and this
changes as tuples are added or deleted. The cardinality is a property of the extension
of the relation and is determined from the particular instance of the relation at any
given moment. Finally, we define a relational database.
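The distinction between degree and cardinality can be sketched in a few lines of Python. This is only an illustration: the variable and function names are hypothetical, and a Python `set` of tuples stands in for the extension of the Branch relation.

```python
# Intension: the attribute names of the Branch relation (normally fixed).
BRANCH_ATTRIBUTES = ("branchNo", "street", "city", "postcode")

# Extension: the tuples held at one moment. A set is used so that the
# order of tuples is immaterial.
branch = {
    ("B005", "22 Deer Rd", "London", "SW1 4EH"),
    ("B007", "16 Argyll St", "Aberdeen", "AB2 3SU"),
    ("B003", "163 Main St", "Glasgow", "G11 9QX"),
}


def degree(attributes):
    """Number of attributes: a property of the intension."""
    return len(attributes)


def cardinality(tuples):
    """Number of tuples: a property of the extension."""
    return len(tuples)


degree_before = degree(BRANCH_ATTRIBUTES)
cardinality_before = cardinality(branch)

# Adding a tuple changes the extension, and therefore the cardinality,
# but leaves the degree untouched.
branch.add(("B004", "32 Manse Rd", "Bristol", "BS99 1NZ"))
degree_after = degree(BRANCH_ATTRIBUTES)
cardinality_after = cardinality(branch)
```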
Alternative terminology
The terminology for the relational model can be quite confusing. We have intro-
duced two sets of terms. In fact, a third set of terms is sometimes used: a relation
may be referred to as a file, the tuples as records, and the attributes as fields. This
terminology stems from the fact that, physically, the RDBMS may store each rela-
tion in a file. Table 4.1 summarizes the different terms for the relational model.
Any set of n-tuples from this Cartesian product is a relation on the n sets. Note that
in defining these relations we have to specify the sets, or domains, from which we
choose values.
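A small Python sketch may make the idea concrete. The sets `D1` and `D2` and the relations `R` and `S` are illustrative values chosen for this example, not definitions taken from the surrounding text.

```python
from itertools import product

# Two small sets from which values are chosen (illustrative only).
D1 = {2, 4}
D2 = {1, 3, 5}

# The Cartesian product D1 x D2: every ordered pair (x, y)
# with x drawn from D1 and y drawn from D2.
cartesian = set(product(D1, D2))

# Any subset of the product is a relation on D1 and D2.
# Here: the pairs whose second element is 1.
R = {(x, y) for (x, y) in cartesian if y == 1}

# A relation can also be specified by a condition linking the elements.
S = {(x, y) for (x, y) in cartesian if x == 2 * y}
```

With these sets the product has six pairs, `R` contains `(2, 1)` and `(4, 1)`, and `S` contains only `(2, 1)`.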
Let $A_1, A_2, \ldots, A_n$ be attributes with domains $D_1, D_2, \ldots, D_n$. Then the set
$\{A_1{:}D_1, A_2{:}D_2, \ldots, A_n{:}D_n\}$ is a relation schema. A relation R defined by a relation schema S
is a set of mappings from the attribute names to their corresponding domains.
Thus, relation R is a set of n-tuples:
$(A_1{:}d_1, A_2{:}d_2, \ldots, A_n{:}d_n)$ such that $d_1 \in D_1, d_2 \in D_2, \ldots, d_n \in D_n$
Each element in the n-tuple consists of an attribute and a value for that attribute.
Normally, when we write out a relation as a table, we list the attribute names as column
headings and write out the tuples as rows having the form $(d_1, d_2, \ldots, d_n)$, where each
value is taken from the appropriate domain. In this way, we can think of a relation
in the relational model as any subset of the Cartesian product of the domains of the
attributes. A table is simply a physical representation of such a relation.
In our example, the Branch relation shown in Figure 4.1 has attributes branchNo,
street, city, and postcode, each with its corresponding domain. The Branch relation
is any subset of the Cartesian product of the domains, or any set of four-tuples in
which the first element is from the domain BranchNumbers, the second is from the
domain StreetNames, and so on. One of the four-tuples is:
{(B005, 22 Deer Rd, London, SW1 4EH)}
or more correctly:
{(branchNo: B005, street: 22 Deer Rd, city: London, postcode: SW1 4EH)}
We refer to this as a relation instance. The Branch table is a convenient way of writ-
ing out all the four-tuples that form the relation at a specific moment in time, which
explains why table rows in the relational model are called “tuples”. In the same way
that a relation has a schema, so too does the relational database.
If $R_1, R_2, \ldots, R_n$ are a set of relation schemas, then we can write the relational
database schema, or simply relational schema, R, as:
$R = \{R_1, R_2, \ldots, R_n\}$
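The definitions above can be mirrored loosely in Python. In this sketch a relation schema is a dictionary from attribute names to domain tests, a tuple is a dictionary from attribute names to values, and a database schema is a collection of relation schemas. The names `branch_schema`, `conforms`, and `database_schema` are hypothetical, and the domain tests check only the character sizes from Figure 4.2.

```python
# A relation schema pairs each attribute name with its domain. Domains are
# represented by simple membership tests (an assumption of this sketch).
branch_schema = {
    "branchNo": lambda v: isinstance(v, str) and len(v) == 4,
    "street": lambda v: isinstance(v, str) and len(v) <= 25,
    "city": lambda v: isinstance(v, str) and len(v) <= 15,
    "postcode": lambda v: isinstance(v, str) and len(v) <= 8,
}


def conforms(row, schema):
    """True if row supplies exactly one in-domain value per attribute."""
    return row.keys() == schema.keys() and all(
        schema[name](value) for name, value in row.items()
    )


# The same tuple written with its attributes in two different orders.
t1 = {"branchNo": "B005", "street": "22 Deer Rd",
      "city": "London", "postcode": "SW1 4EH"}
t2 = {"postcode": "SW1 4EH", "city": "London",
      "street": "22 Deer Rd", "branchNo": "B005"}

# A relational database schema is a collection of relation schemas.
database_schema = {"Branch": branch_schema}
```

Because each value is paired with its attribute name, `t1` and `t2` compare equal, which is why writing a tuple with attribute names is the more correct form.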
To illustrate what these restrictions mean, consider again the Branch relation
shown in Figure 4.1. Because each cell should contain only one value, it is illegal
to store two postcodes for a single branch office in a single cell. In other words,
relations do not contain repeating groups. A relation that satisfies this property
is said to be normalized or in first normal form. (Normal forms are discussed in
Chapters 14 and 15.)
The column names listed at the tops of columns correspond to the attributes
of the relation. The values in the branchNo attribute are all from the BranchNumbers
domain; we should not allow a postcode value to appear in this column. There can be
no duplicate tuples in a relation. For example, the row (B005, 22 Deer Rd, London,
SW1 4EH) appears only once.
Provided that an attribute name is moved along with the attribute values, we can
interchange columns. The table would represent the same relation if we were to
put the city attribute before the postcode attribute, although for readability it makes
more sense to keep the address elements in the normal order. Similarly, tuples can
be interchanged, so the records of branches B005 and B004 can be switched and
the relation will still be the same.
Most of the properties specified for relations result from the properties of math-
ematical relations:
• When we derived the Cartesian product of sets with simple, single-valued ele-
ments such as integers, each element in each tuple was single-valued. Similarly,
each cell of a relation contains exactly one value. However, a mathematical rela-
tion need not be normalized. Codd chose to disallow repeating groups to simplify
the relational data model.
• In a relation, the possible values for a given position are determined by the set,
or domain, on which the position is defined. In a table, the values in each col-
umn must come from the same attribute domain.
• In a set, no elements are repeated. Similarly, in a relation, there are no dupli-
cate tuples.
• Because a relation is a set, the order of elements has no significance. Therefore,
in a relation, the order of tuples is immaterial.
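The properties in the list above follow naturally if a relation is modelled as a set. The sketch below is one possible way to show this in Python; `make_relation` is a hypothetical helper, and only three of the Branch attributes are used to keep the example short.

```python
def make_relation(rows):
    """Build a relation from dict rows; each row becomes a hashable tuple."""
    relation = set()
    for row in rows:
        for value in row.values():
            # Reject repeating groups: every cell must hold a single value.
            if isinstance(value, (list, tuple, set, dict)):
                raise ValueError(f"non-atomic value: {value!r}")
        # frozenset ignores the order in which attributes were written.
        relation.add(frozenset(row.items()))
    return relation


b5 = {"branchNo": "B005", "city": "London", "postcode": "SW1 4EH"}
b4 = {"branchNo": "B004", "city": "Bristol", "postcode": "BS99 1NZ"}

r1 = make_relation([b5, b4, b5])  # the same tuple supplied twice
r2 = make_relation([b4, b5])      # tuples supplied in a different order
```

Here `r1` and `r2` are the same relation: the duplicate collapses and the order of insertion leaves no trace, whereas a row holding a list of two postcodes would be rejected as non-atomic.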
There may be several candidate keys for a relation. When a key consists of more
than one attribute, we call it a composite key. Consider the Branch relation shown
in Figure 4.1. Given a value of city, we can determine several branch offices (for
example, London has two branch offices). This attribute cannot be a candidate
key. On the other hand, because DreamHome allocates each branch office a unique
branch number, given a branch number value, branchNo, we can determine at most
one tuple, so that branchNo is a candidate key. Similarly, postcode is also a candidate
key for this relation.
Now consider a relation Viewing, which contains information relating to properties
viewed by clients. The relation comprises a client number (clientNo), a property num-
ber (propertyNo), a date of viewing (viewDate) and, optionally, a comment (comment).
Given a client number, clientNo, there may be several corresponding viewings for dif-
ferent properties. Similarly, given a property number, propertyNo, there may be several
clients who viewed this property. Therefore, clientNo by itself or propertyNo by itself
cannot be selected as a candidate key. However, the combination of clientNo and prop-
ertyNo identifies at most one tuple, so for the Viewing relation, clientNo and propertyNo
together form the (composite) candidate key. If we need to take into account the pos-
sibility that a client may view a property more than once, then we could add viewDate
to the composite key. However, we assume that this is not necessary.
Note that an instance of a relation cannot be used to prove that an attribute or
combination of attributes is a candidate key. The fact that there are no duplicates
for the values that appear at a particular moment in time does not guarantee that
duplicates are not possible. However, the presence of duplicates in an instance can
be used to show that some attribute combination is not a candidate key. Identifying
a candidate key requires that we know the “real-world” meaning of the attribute(s)
involved so that we can decide whether duplicates are possible. Only by using this
semantic information can we be certain that an attribute combination is a candi-
date key. For example, from the data presented in Figure 4.1, we may think that
a suitable candidate key for the Staff relation would be lName, the employee’s
surname. However, although there is only a single value of “White” in this instance
of the Staff relation, a new member of staff with the surname “White” may join the
company, invalidating the choice of lName as a candidate key.
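The asymmetry described above, that an instance can refute a candidate key but never prove one, can be sketched as follows. The function `has_duplicates` is a hypothetical helper, and the rows reproduce three attributes of the Branch instance.

```python
def has_duplicates(rows, attributes):
    """True if two rows agree on every attribute in `attributes`."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attributes)
        if key in seen:
            return True
        seen.add(key)
    return False


branch = [
    {"branchNo": "B005", "city": "London", "postcode": "SW1 4EH"},
    {"branchNo": "B007", "city": "Aberdeen", "postcode": "AB2 3SU"},
    {"branchNo": "B003", "city": "Glasgow", "postcode": "G11 9QX"},
    {"branchNo": "B004", "city": "Bristol", "postcode": "BS99 1NZ"},
    {"branchNo": "B002", "city": "London", "postcode": "NW10 6EU"},
]

# Duplicates in the instance refute candidacy: city cannot be a key.
city_refuted = has_duplicates(branch, ["city"])

# No duplicates for branchNo: consistent with it being a key, but not a
# proof. Only the real-world meaning of the attribute can settle that.
branch_no_refuted = has_duplicates(branch, ["branchNo"])
```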
Primary key: The candidate key that is selected to identify tuples uniquely within the relation.
Foreign key: An attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation.
When an attribute appears in more than one relation, its appearance usually rep-
resents a relationship between tuples of the two relations. For example, the inclusion
of branchNo in both the Branch and Staff relations is quite deliberate and links each
branch to the details of staff working at that branch. In the Branch relation, branchNo is
the primary key. However, in the Staff relation, the branchNo attribute exists to match
staff to the branch office they work in. In the Staff relation, branchNo is a foreign key.
We say that the attribute branchNo in the Staff relation targets the primary key attribute
branchNo in the home relation, Branch. These common attributes play an important
role in performing data manipulation, as we see in the next chapter.
The common convention for representing a relation schema is to give the name of
the relation followed by the attribute names in parentheses. Normally, the primary
key is underlined.
The conceptual model, or conceptual schema, is the set of all such schemas for the
database. Figure 4.3 shows an instance of this relational schema.
Figure 4.3 Instance of the DreamHome rental database.
4.3 Integrity Constraints
4.3.1 Nulls
Null: Represents a value for an attribute that is currently unknown or is not applicable for this tuple.
A null can be taken to mean the logical value “unknown.” It can mean that a value
is not applicable to a particular tuple, or it could merely mean that no value has
yet been supplied. Nulls are a way to deal with incomplete or exceptional data.
However, a null is not the same as a zero numeric value or a text string filled with
spaces; zeros and spaces are values, but a null represents the absence of a value.
Therefore, nulls should be treated differently from other values. Some authors use
the term “null value”; however, as a null is not a value but represents the absence
of a value, the term “null value” is deprecated.
For example, in the Viewing relation shown in Figure 4.3, the comment attrib-
ute may be undefined until the potential renter has visited the property and
returned his or her comment to the agency. Without nulls, it becomes necessary
to introduce false data to represent this state or to add additional attributes that
may not be meaningful to the user. In our example, we may try to represent
a null comment with the value -1. Alternatively, we may add a new attribute
hasCommentBeenSupplied to the Viewing relation, which contains a Y (Yes) if a com-
ment has been supplied, and N (No) otherwise. Both these approaches can be
confusing to the user.
Nulls can cause implementation problems, arising from the fact that the rela-
tional model is based on first-order predicate calculus, which is a two-valued or
Boolean logic—the only values allowed are true or false. Allowing nulls means that
we have to work with a higher-valued logic, such as three- or four-valued logic
(Codd, 1986, 1987, 1990).
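The effect of a third truth value can be observed in practice. The sketch below uses Python's built-in `sqlite3` module, so it shows how one SQL implementation (SQLite) treats nulls rather than the relational model itself. The table layout follows the Viewing relation, and the sample rows are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT, comment TEXT)"
)
conn.executemany(
    "INSERT INTO Viewing VALUES (?, ?, ?)",
    [
        ("CR56", "PA14", "too small"),
        ("CR76", "PG4", None),  # comment not yet supplied: stored as a null
        ("CR62", "PA14", ""),   # an empty string is a value, not a null
    ],
)

# Comparing null with null yields "unknown", which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# A row qualifies only when the condition is true, so "unknown" filters
# every row out, including the one whose comment is null.
matched_by_equals = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment = NULL"
).fetchone()[0]

# IS NULL is the dedicated test for the absence of a value.
matched_by_is_null = conn.execute(
    "SELECT COUNT(*) FROM Viewing WHERE comment IS NULL"
).fetchone()[0]
```

The equality test matches no rows, while `IS NULL` finds the single viewing without a comment. The row with the empty string is not counted, which reflects the point that spaces and zeros are values but a null is not.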
Referential integrity: If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.
For example, branchNo in the Staff relation is a foreign key targeting the branchNo
attribute in the home relation, Branch. It should not be possible to create a staff
record with branch number B025, for example, unless there is already a record
for branch number B025 in the Branch relation. However, we should be able to cre-
ate a new staff record with a null branch number to allow for the situation where
a new member of staff has joined the company but has not yet been assigned to a
particular branch office.
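The three cases just described (a matching branch number, a non-existent branch number, and a null branch number) can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the tables carry only a few of the Branch and Staff attributes, the staff numbers and names are invented, and SQLite is used merely as a convenient system that enforces foreign keys once asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")
conn.execute(
    "CREATE TABLE Staff ("
    " staffNo TEXT PRIMARY KEY,"
    " lName TEXT,"
    " branchNo TEXT REFERENCES Branch(branchNo))"
)
conn.execute("INSERT INTO Branch VALUES ('B005', 'London')")


def try_insert(staff_no, l_name, branch_no):
    """Insert a Staff row; return True on success, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO Staff VALUES (?, ?, ?)",
            (staff_no, l_name, branch_no),
        )
        return True
    except sqlite3.IntegrityError:
        return False


matching = try_insert("SL21", "White", "B005")  # matches an existing branch
dangling = try_insert("SX01", "Smith", "B025")  # no branch B025 exists
unassigned = try_insert("SX02", "Jones", None)  # wholly null foreign key
```

Only the insert that refers to the non-existent branch B025 is rejected; the staff member with a null branch number is accepted and can be assigned to a branch later.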
It is also possible for users to specify additional constraints that the data must sat-
isfy. For example, if an upper limit of 20 has been placed upon the number of staff
that may work at a branch office, then the user must be able to specify this general
constraint and expect the DBMS to enforce it. In this case, it should not be possible
to add a new member of staff at a given branch to the Staff relation if the number of
staff currently assigned to that branch is 20. Unfortunately, the level of support for
general constraints varies from system to system. We discuss the implementation of
relational integrity in Chapters 7 and 18.
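One way such a general constraint might be enforced is with a trigger. The sketch below uses SQLite through Python's `sqlite3` module; the trigger name `staff_limit`, the helper `add_staff`, and the generated staff numbers are all hypothetical, and other systems may offer different mechanisms or none at all.

```python
import sqlite3

MAX_STAFF_PER_BRANCH = 20

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, branchNo TEXT)")

# The trigger aborts any insert that would take a branch past the limit.
conn.execute(
    f"""
    CREATE TRIGGER staff_limit BEFORE INSERT ON Staff
    WHEN (SELECT COUNT(*) FROM Staff WHERE branchNo = NEW.branchNo)
         >= {MAX_STAFF_PER_BRANCH}
    BEGIN
        SELECT RAISE(ABORT, 'branch already has the maximum number of staff');
    END
    """
)


def add_staff(staff_no, branch_no):
    """Insert a Staff row; return True on success, False if rejected."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?)", (staff_no, branch_no))
        return True
    except sqlite3.DatabaseError:
        return False


first_twenty = [add_staff(f"S{i:03d}", "B005") for i in range(1, 21)]
twenty_first = add_staff("S021", "B005")  # would exceed the limit at B005
other_branch = add_staff("S022", "B003")  # a different branch is unaffected
```

The first twenty inserts for branch B005 succeed, the twenty-first is rejected, and staff can still be added at another branch.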
4.4 Views
In the three-level ANSI-SPARC architecture presented in Chapter 2, we described
an external view as the structure of the database as it appears to a particular user.
In the relational model, the word “view” has a slightly different meaning. Rather
than being the entire external model of a user’s view, a view is a virtual or derived
relation: a relation that does not necessarily exist in its own right, but may be
dynamically derived from one or more base relations. Thus, an external model can
consist of both base (conceptual-level) relations and views derived from the base
relations. In this section, we briefly discuss views in relational systems. In Section
7.4 we examine views in more detail and show how they can be created and used
within SQL.
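As a brief preview, the sketch below shows a view being derived dynamically from a base relation, again using SQLite through Python's `sqlite3` module. The view name `StaffB003` and the sample rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff ("
    " staffNo TEXT PRIMARY KEY, lName TEXT, salary REAL, branchNo TEXT)"
)
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [
        ("SG37", "Beech", 12000.0, "B003"),
        ("SG14", "Ford", 18000.0, "B003"),
        ("SL21", "White", 30000.0, "B005"),
    ],
)

# The view stores a definition, not data: it hides the salary attribute
# and restricts the rows to a single branch.
conn.execute(
    "CREATE VIEW StaffB003 AS "
    "SELECT staffNo, lName FROM Staff WHERE branchNo = 'B003'"
)

view_columns = [
    column[0] for column in conn.execute("SELECT * FROM StaffB003").description
]
before = conn.execute(
    "SELECT staffNo FROM StaffB003 ORDER BY staffNo"
).fetchall()

# A change to the base relation is visible through the view immediately,
# because the view is derived again each time it is queried.
conn.execute("INSERT INTO Staff VALUES ('SG99', 'Brown', 12000.0, 'B003')")
after = conn.execute(
    "SELECT staffNo FROM StaffB003 ORDER BY staffNo"
).fetchall()
```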
4.4.1 Terminology
The relations we have been dealing with so far in this chapter are known as base
relations.
Chapter Summary
• The Relational Database Management System (RDBMS) has become the dominant data-processing software in use
today, with estimated new licence sales of between US$6 billion and US$10 billion per year (US$25 billion with
tools sales included). This software represents the second generation of DBMSs and is based on the relational
data model proposed by E. F. Codd.
• A mathematical relation is a subset of the Cartesian product of two or more sets. In database terms, a relation
is any subset of the Cartesian product of the domains of the attributes. A relation is normally written as a set of
n-tuples, in which each element is chosen from the appropriate domain.
• Relations are physically represented as tables, with the rows corresponding to individual tuples and the columns to
attributes.
• The structure of the relation, with domain specifications and other constraints, is part of the intension of the
database; the relation with all its tuples written out represents an instance or extension of the database.
• Properties of database relations are: each cell contains exactly one atomic value, attribute names are distinct,
attribute values come from the same domain, attribute order is immaterial, tuple order is immaterial, and there
are no duplicate tuples.
• The degree of a relation is the number of attributes, and the cardinality is the number of tuples. A unary
relation has one attribute, a binary relation has two, a ternary relation has three, and an n-ary relation has n
attributes.
• A superkey is an attribute, or set of attributes, that identifies tuples of a relation uniquely, and a candidate
key is a minimal superkey. A primary key is the candidate key chosen for use in identification of tuples. A rela-
tion must always have a primary key. A foreign key is an attribute, or set of attributes, within one relation that
is the candidate key of another relation.
• A null represents a value for an attribute that is unknown at the present time or is not applicable for this tuple.
• Entity integrity is a constraint that states that in a base relation no attribute of a primary key can be null.
Referential integrity states that foreign key values must match a candidate key value of some tuple in the
home relation or be wholly null. Apart from relational integrity, integrity constraints include required data,
domain, and multiplicity constraints; other integrity constraints are called general constraints.
• A view in the relational model is a virtual or derived relation that is dynamically created from the under-
lying base relation(s) when required. Views provide security and allow the designer to customize a user’s
model. Not all views are updatable.
Review Questions
4.1 Discuss each of the following concepts in the context of the relational data model:
(a) relation
(b) attribute
(c) domain
(d) tuple
(e) intension and extension
(f ) degree and cardinality.
4.2 Describe the relationship between mathematical relations and relations in the relational data model.
4.3 Describe the term “normalized relation.” Why are constraints so important in a relational database?
4.4 Discuss the properties of a relation.
4.5 Discuss the differences between the candidate keys and the primary key of a relation. Explain what is meant by a
foreign key. How do foreign keys of relations relate to candidate keys? Give examples to illustrate your answer.
4.6 Define the two principal integrity rules for the relational model. Discuss why it is desirable to enforce these rules.
4.7 Define “views.” Why are they important in a database approach?
Exercises