2 Relational Model
CMPT 354
Database Systems I
Martin Ester
Simon Fraser University
Spring 2023
What is a Data Model?
A formal notation (language) for describing data.
Structure of the data
Conceptual model
Higher level of abstraction than data structures in
programming languages such as lists or arrays.
Operations on the data
Limited set of high level operations: queries and
modifications.
Speeds up database programming.
Allows DBS to optimize query execution, e.g. choice of
most efficient sorting method.
Constraints on the data
Capture more of the real-world meaning of the data.
Why Study the Relational Model?
Most widely used model.
Vendors: Oracle, IBM, Microsoft, Sybase, etc.
Legacy systems in older models.
E.g., IBM's IMS
Not so recent competitor: object-oriented model.
ObjectStore, Versant, Ontos
A synthesis emerging: object-relational model
Informix Universal Server, Oracle, DB2
More recent competitor: semi-structured model.
XML
XML
eXtensible Markup Language
Inspired by HTML.
HTML specifies layout on the screen, XML
specifies content (semantics) of data.
Hierarchical data, i.e. records are not flat but
can be nested.
Semi-structured, i.e. data does not need to have
a schema.
Data is self-describing.
XML: Example dataset
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
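As a minimal sketch of what "hierarchical" and "self-describing" mean in practice, the bibliography above can be parsed with Python's standard xml.etree.ElementTree module. The helper name authors_per_book is hypothetical, the elided title text is kept exactly as on the slide, and the elided further books are simply left out.

```python
import xml.etree.ElementTree as ET

# The bibliography from the slide; the elided title text is kept as-is.
XML_DATA = """
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""


def authors_per_book(xml_text):
    """Return, for each <book>, the list of its <author> values."""
    root = ET.fromstring(xml_text.strip())
    return [
        [author.text.strip() for author in book.findall("author")]
        for book in root.findall("book")
    ]


if __name__ == "__main__":
    print(authors_per_book(XML_DATA))
```

Note that one book record holds several author values, which a flat relational tuple with atomic attribute values would not allow.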
Relational Database: Definitions
Rows are called tuples (or records), columns
are called attributes (or fields).
Attributes are referenced not by column
number, but by name.
Order of attributes does not matter.
Attribute types are called domains. Domains
consist of atomic values such as integers or
strings.
No structured values such as lists or sets.
The order of tuples does not matter: a relation
is a set of tuples. The order of tuples resulting
from a relational query is undefined.
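These definitions can be sketched in Python by modelling a relation instance as a set of named tuples. This is only an illustration, not how a DBMS stores data; the class name Student is an assumption, and the values are taken from the Students example in these slides.

```python
from typing import NamedTuple


class Student(NamedTuple):
    # Attribute names and domains follow the Students example in the slides.
    sid: str
    name: str
    login: str
    age: int
    gpa: float


# A relation instance is a set of tuples: no duplicates, no order.
students_a = {
    Student("53666", "Jones", "jones@cs", 18, 3.4),
    Student("53688", "Smith", "smith@eecs", 18, 3.2),
    Student("53650", "Smith", "smith@math", 19, 3.8),
}

# The same tuples listed in a different order, with one of them repeated.
students_b = {
    Student("53650", "Smith", "smith@math", 19, 3.8),
    Student("53666", "Jones", "jones@cs", 18, 3.4),
    Student("53688", "Smith", "smith@eecs", 18, 3.2),
    Student("53666", "Jones", "jones@cs", 18, 3.4),
}

# Both describe the same relation instance.
same_relation = students_a == students_b

# Attributes are referenced by name, not by column number.
names = sorted({s.name for s in students_a})
```

Every attribute value here is atomic (a string or a number), matching the rule that domains contain no lists or sets.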
Example
Relation Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
The SQL Language
Proposed by IBM (System R) in the 1970s.
Later developed into a standard, since the relational
data model was adopted by many vendors.
Structured Query Language (SQL):
retrieval,
insertion, updating, and deletion of data,
management and administrative functions.
All commercial DBSs support SQL, but with
proprietary extensions to the standard
language.
The SQL Language
Major versions of the standard:
SQL-1986
first version
SQL-1992
major revision
SQL-1999
triggers, recursive queries, object-oriented features
SQL-2003
XML-related features, window functions, etc.
SQL-2006
importing XML data, publishing in XML format,
integration of XQuery, etc.
The SQL Language
Major versions of the standard (contd.)
SQL-2008
ORDER BY outside of cursors, TRUNCATE
statement, FETCH clause
SQL-2011
support for temporal data, PERIOD FOR
SQL-2016
row pattern matching, polymorphic table functions
SQL-2019
multidimensional arrays
The SQL Language
SQL supports the
creation
CREATE TABLE <relation name> (<attributes>);
modification
INSERT INTO <relation name> (<attribute names>)
VALUES (<attribute values>);
and querying of relations / tables.
SELECT <attribute names>
FROM <relation names>
WHERE <condition>;
Queries are more complex and will be covered in a
separate chapter.
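The three statement forms can be tried out with a small sketch that uses Python's built-in sqlite3 module and an in-memory database. SQLite is an assumption made for convenience here, not the system used in the course; the function name run_demo is hypothetical, and the data is the Students example instance.

```python
import sqlite3


def run_demo():
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Creation of a relation
    conn.execute(
        "CREATE TABLE Students "
        "(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)"
    )
    # Modification: insert three tuples
    conn.executemany(
        "INSERT INTO Students (sid, name, login, age, gpa) VALUES (?, ?, ?, ?, ?)",
        [
            ("53666", "Jones", "jones@cs", 18, 3.4),
            ("53688", "Smith", "smith@eecs", 18, 3.2),
            ("53650", "Smith", "smith@math", 19, 3.8),
        ],
    )
    # Querying; ORDER BY is added so that the result order is defined
    rows = conn.execute(
        "SELECT sid, name FROM Students WHERE age = 18 ORDER BY sid"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_demo())
```

Without the ORDER BY clause the two matching tuples could come back in either order, since a relation is a set.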
Creating Relations
CREATE TABLE specifies the relation name
and its attributes.
The domain of each attribute is specified, and
enforced by the DBMS whenever tuples are
added or modified.
Attributes can have zero or one value from
their domain.
NOT NULL specifies that this attribute must
have exactly one value.
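A minimal sketch of NOT NULL enforcement, assuming SQLite through Python's sqlite3 module. The helper try_insert and the shortened Students schema are hypothetical. One caveat: SQLite is lenient about declared column types by default, so this sketch only demonstrates the NOT NULL part, not domain enforcement.

```python
import sqlite3


def try_insert(conn, row):
    """Insert one tuple; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Students (sid, name, age) VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20) NOT NULL, name CHAR(20), age INTEGER)"
)

accepted_full = try_insert(conn, ("53666", "Jones", 18))
accepted_null_name = try_insert(conn, ("53688", None, 18))  # name may be null
accepted_null_sid = try_insert(conn, (None, "Smith", 19))   # violates NOT NULL
```

The second tuple is stored with zero values for name, while the third is rejected because sid must have exactly one value.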
Creating Relations
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa REAL);
Integrity Constraints (ICs)
IC: condition that must be true for any instance
of the database; e.g., domain constraints.
ICs are specified when schema is defined.
ICs are checked when instance is modified.
A legal instance of a relation is one that satisfies
all specified ICs.
DBMS does not allow illegal instances.
If the DBMS checks ICs, stored data is more
faithful to real-world meaning.
Avoids data entry errors, too!
Primary Keys and Candidate Keys
A set of attributes is a key for a relation if:
1. No two distinct tuples can (!) have the same values in
all key attributes, and
2. Condition 1 is not true for any proper subset of the key.
A set of attributes is a superkey for a relation if it
fulfills condition 1; a superkey that fails condition 2 is
not a key.
Possibly many candidate keys (specified using
UNIQUE), but exactly one primary key
(specified using PRIMARY KEY).
DBMS ensures that no two tuples share the
same (primary or candidate) key value(s).
Primary Keys and Candidate Keys
Artificial keys are often introduced, since they
are fully under the control of the DBS / the
enterprise, e.g., sid.
For each primary key attribute a value needs to be
provided, i.e. a primary key cannot contain the special
value null.
Primary key can be used to express references
between tables and may also be used to
optimize data storage.
Primary and Candidate Keys
CREATE TABLE Students
(sid CHAR(20),
name CHAR(20),
login CHAR(10),
age INTEGER,
gpa REAL);
What are the keys?
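One plausible answer, sketched with SQLite through Python's sqlite3 module: sid is declared as the primary key, and login is declared UNIQUE. Treating login as a candidate key is an assumption made for illustration; the slides do not state it. The helper name accepts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students
      (sid   CHAR(20),
       name  CHAR(20),
       login CHAR(10),
       age   INTEGER,
       gpa   REAL,
       PRIMARY KEY (sid),   -- sid as the primary key
       UNIQUE (login))      -- assumption: login is a candidate key
    """
)


def accepts(row):
    """Return True if the DBMS stores the tuple, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


first = accepts(("53666", "Jones", "jones@cs", 18, 3.4))
same_name = accepts(("53688", "Jones", "jones@eecs", 18, 3.2))  # name is not a key
dup_sid = accepts(("53666", "Smith", "smith@math", 19, 3.8))    # repeats sid
dup_login = accepts(("53650", "Smith", "jones@cs", 19, 3.8))    # repeats login
```

Two students may share a name, but the DBMS rejects any tuple that repeats a primary or candidate key value.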
Primary and Candidate Keys
Students can take a course only once. For a given student and
course, there is a single grade.
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid, cid));
Students can take only one course, and receive a single grade for
that course; further, no two students in a course receive the same
grade.
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid),
UNIQUE (cid, grade));
Used carelessly, an IC can prevent the storage of database
instances that arise in practice!
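The warning can be made concrete with a small sketch, assuming SQLite through Python's sqlite3 module. The constant names LENIENT and STRICT and the helper count_accepted are hypothetical; the key declarations are one way to express the two situations described above, and the sample tuples are the Enrolled instance used later in these slides.

```python
import sqlite3

# "A student takes a course only once, with a single grade."
LENIENT = "PRIMARY KEY (sid, cid)"
# "A student takes only one course; no two students in a course share a grade."
STRICT = "PRIMARY KEY (sid), UNIQUE (cid, grade)"

ROWS = [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]


def count_accepted(key_constraints):
    """Create Enrolled with the given key constraints and count accepted tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Enrolled "
        "(sid CHAR(20), cid CHAR(20), grade CHAR(2), " + key_constraints + ")"
    )
    accepted = 0
    for row in ROWS:
        try:
            conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the tuple violates a key constraint and is not stored
    conn.close()
    return accepted
```

With the stricter constraints, the second and fourth tuples are rejected because student 53666 already has an enrollment, even though such data clearly arises in practice.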
Foreign Keys
Foreign key: set of attributes in one relation that
is used to ‘reference’ a tuple in another relation.
Must correspond to the primary key of the
referenced relation.
→ ‘logical pointer’
E.g. sid in Enrolled is a foreign key referring to
Students.
Referential Integrity
If all foreign key constraints are enforced,
referential integrity is achieved, i.e., no dangling
references.
Can you name a data model without
referential integrity?
Foreign Keys in SQL
Only students listed in the Students relation are
allowed to enroll for courses.
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students);
Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
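A minimal sketch of how the foreign key constraint behaves on insertion, assuming SQLite through Python's sqlite3 module. Two SQLite-specific assumptions apply: foreign key checks must be switched on with PRAGMA foreign_keys, and Students needs a declared primary key for REFERENCES Students to resolve. The shortened Students schema and the helper enroll are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute(
    "CREATE TABLE Enrolled "
    "(sid CHAR(20), cid CHAR(20), grade CHAR(2), "
    "PRIMARY KEY (sid, cid), "
    "FOREIGN KEY (sid) REFERENCES Students)"
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")


def enroll(sid, cid, grade):
    """Return True if the Enrolled tuple is stored, False if it is rejected."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", (sid, cid, grade))
        return True
    except sqlite3.IntegrityError:
        return False


known_student = enroll("53666", "Carnatic101", "C")   # sid exists in Students
unknown_student = enroll("99999", "Reggae203", "B")   # dangling reference
```

Only the enrollment whose sid appears in Students is stored; the other would create a dangling reference and is rejected.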
Enforcing Referential Integrity
Consider Students and Enrolled; sid in Enrolled is
a foreign key that references Students.
What should be done if an Enrolled tuple with a
non-existent sid is inserted? Reject it!
What should be done if a Students tuple is
deleted?
Also delete all Enrolled tuples that refer to it.
Disallow deletion of a Students tuple that is referred to.
Set sid in Enrolled tuples that refer to it to a default sid.
In SQL, also: Set sid in Enrolled tuples that refer to it to
a special value null, denoting ‘unknown’ or
‘inapplicable’.
→ Similar options for updates of the primary key of Students.
Referential Integrity in SQL
SQL supports all 4 options on deletes and updates.
Default is NO ACTION (delete/update is rejected).
CASCADE (also delete all tuples that refer to deleted tuple).
SET NULL / SET DEFAULT (sets foreign key value of referencing tuple).
CREATE TABLE Enrolled
(sid CHAR(20),
cid CHAR(20),
grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid)
REFERENCES Students
ON DELETE CASCADE
ON UPDATE SET DEFAULT);
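The difference between NO ACTION and CASCADE on deletion can be observed with a small sketch, again assuming SQLite through Python's sqlite3 module with foreign key checks switched on. The function name delete_student and the shortened Students schema are hypothetical; only the ON DELETE clause varies between runs.

```python
import sqlite3


def delete_student(on_delete):
    """Delete student 53666 and report what happens to Enrolled.

    on_delete is the referential action to try, e.g. "CASCADE" or "NO ACTION".
    Returns the remaining Enrolled tuples, or None if the delete was rejected.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
    conn.execute(
        "CREATE TABLE Enrolled "
        "(sid CHAR(20), cid CHAR(20), grade CHAR(2), "
        "PRIMARY KEY (sid, cid), "
        "FOREIGN KEY (sid) REFERENCES Students ON DELETE " + on_delete + ")"
    )
    conn.executemany(
        "INSERT INTO Students VALUES (?, ?)",
        [("53666", "Jones"), ("53650", "Smith")],
    )
    conn.executemany(
        "INSERT INTO Enrolled VALUES (?, ?, ?)",
        [("53666", "Carnatic101", "C"), ("53650", "Topology112", "A")],
    )
    try:
        conn.execute("DELETE FROM Students WHERE sid = '53666'")
    except sqlite3.IntegrityError:
        return None  # the delete was rejected because Enrolled still refers to it
    return conn.execute(
        "SELECT sid, cid, grade FROM Enrolled ORDER BY sid"
    ).fetchall()
```

With CASCADE the enrollment of student 53666 disappears together with the student; with NO ACTION the delete itself is refused.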
Where do ICs Come From?
ICs are based upon the semantics of the real-
world enterprise that is being described in the
database relations.
We can check a database instance to see if an
IC is violated, but we can never infer that an
IC is true by looking at an instance.
An IC is a statement about all possible
instances!
E.g., we know name is not a key, but the
assertion that sid is a key is given to us.
Key and foreign key ICs are the most
common; more general ICs are discussed later.
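The asymmetry between detecting a violation and inferring a constraint can be sketched in plain Python. The function violates_key is a hypothetical helper, and the instance is the Students example from these slides, stored as a list of dictionaries on the assumption that it contains no duplicate tuples.

```python
def violates_key(instance, key_attrs):
    """Return True if two tuples of the instance agree on all attributes in key_attrs."""
    seen = set()
    for row in instance:
        key_value = tuple(row[a] for a in key_attrs)
        if key_value in seen:
            return True
        seen.add(key_value)
    return False


students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
]

name_violated = violates_key(students, ["name"])  # two students are called Smith
sid_violated = violates_key(students, ["sid"])
gpa_violated = violates_key(students, ["gpa"])
```

This instance shows that name cannot be a key. It shows no violation for sid or for gpa, yet that does not make gpa a key: another legal instance could contain two students with the same gpa. Only the application semantics tell us that sid is a key.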
Summary
The relational model is a tabular representation of data.
A relation is a subset of the Cartesian product of some
domains.
Simple and intuitive, currently the most widely used
data model.
SQL is the standard language for creating, updating
and querying relational databases.
Integrity constraints can be specified by the DBA, based
on application semantics. DBMS checks for violations.
Two most important kinds of ICs: primary and foreign
key constraints.
In addition, we always have domain constraints.
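The statement that a relation is a subset of the Cartesian product of some domains can be illustrated with a small Python sketch. The two domains below are hypothetical and deliberately tiny so that the full product stays readable; real domains such as CHAR(20) are far larger.

```python
from itertools import product

# Tiny illustrative domains (hypothetical, kept small on purpose).
sids = {"53666", "53688"}
ages = {18, 19}

# The Cartesian product contains every possible (sid, age) combination.
all_combinations = set(product(sids, ages))

# A relation instance contains only some of those combinations.
relation = {("53666", 18), ("53688", 18)}

# Set inclusion: every tuple of the relation is drawn from the product.
is_relation_over_domains = relation <= all_combinations
```

Domain constraints correspond to the requirement that every tuple comes from this product; key and foreign key constraints further restrict which subsets are legal instances.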