Dbs Merge
ICT1407
ISHANTHA HEWARATNE
CHAPTER 01
INTRODUCTION
CHAPTER OUTLINE
u Drawbacks of file systems
u Types of Databases and Database Applications
u Basic Definitions
u Typical DBMS Functionality
u Example of a Database (UNIVERSITY)
u Main Characteristics of the Database Approach
u Database Users
u Advantages of Using the Database Approach
u When Not to Use Databases
DATA
u A representation of facts, concepts or instructions in a formalised manner
suitable for communication, interpretation or processing by human beings or
by automatic means.
u Raw data is unprocessed. Text, colours, symbols, shapes, graphics,
images, temperatures, sound, video or other facts and figures are data
suitable for processing.
u E.g. Person or Employee or Customer
u name, address, phone, date of birth, designation, department, salary,
employee no, photograph
INFORMATION
u Knowledge derived from data.
u Processed or organised or summarised data.
u E.g.:
u Process Date of Birth -> Age
u Process Salary (all) -> Highest paid employee
u Process all -> No of employees
u Process all -> Employees working for
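The step from data to information can be sketched in a few lines of Python. This is only an illustration: the employee records, the field names and the fixed reference date are all made up for the example and are not part of the course material.

```python
from datetime import date

# Hypothetical raw data: unprocessed facts about three employees.
employees = [
    {"name": "A. Perera", "date_of_birth": date(1990, 5, 17), "salary": 85000},
    {"name": "B. Silva", "date_of_birth": date(1985, 11, 2), "salary": 120000},
    {"name": "C. Fernando", "date_of_birth": date(1998, 1, 30), "salary": 60000},
]


def age_on(date_of_birth, today):
    """Return the number of completed years between date_of_birth and today."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


reference_day = date(2024, 1, 1)  # fixed (assumed) date so the result is reproducible

# Information: knowledge derived by processing the raw data.
ages = {e["name"]: age_on(e["date_of_birth"], reference_day) for e in employees}
highest_paid = max(employees, key=lambda e: e["salary"])["name"]
employee_count = len(employees)
```

Each derived value (`ages`, `highest_paid`, `employee_count`) corresponds to one of the "Process ... ->" lines above.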
Why use a Database?
u Many people collect things
u e.g. stamps, photos, paper cuttings
u If you collect anything, you are probably familiar with some of the problems
of managing a collection
u e.g. storing, filtering, updating
u One way to keep track of a collection is to create a database
Why Database Technology?
u The need to manipulate large collections of data for frequently used
queries and reports.
E.g. Collection of information on library books
u Queries:
u List of books written by a particular author
u List of books about a particular subject
u Borrowing a book
u Reserving a book for borrowing
Examples of Database Applications
u Integrity problems
u Integrity constraints
u Hard to add new constraints or change existing ones
The Solution
n Unique and centralized administration of data in a given
company
i.e. database notion
What is a Database?
n A database is a collection of related data
n It represents some aspects of the real world.
n It is a coherent collection of data.
n It is designed for specific purpose and intended users.
n It can be of varying size and complexity
n It can be generated and maintained manually
Types of Databases and Database Applications
u Traditional Applications:
u Numeric and Textual Databases
u More Recent Applications:
u Multimedia Databases
u Geographic Information Systems (GIS)
u Data Warehouses
u Real-time and Active Databases
u Many other applications
u First we focus on traditional applications
Basic Definitions
u Database:
u A collection of related data.
u Data:
u Known facts that can be recorded and have an implicit meaning.
u Mini-world:
u Some part of the real world about which data is stored in a
database. For example, student grades and transcripts at a
university.
u Database Management System (DBMS):
u A software package/ system to facilitate the creation and
maintenance of a computerized database.
u Database System:
u The DBMS software together with the data itself. Sometimes, the
applications are also included.
Simplified database system environment
Typical DBMS Functionality
u Defining
u Constructing
u Manipulating
u Processing and Sharing
Typical DBMS Functionality
u Defining a particular database in terms of its data types,
structures, and constraints
u Constructing or Loading the initial database contents on a
secondary storage medium
u Manipulating the database:
u Retrieval: Querying, generating reports
u Modification: Insertions, deletions and updates to its content
u Accessing the database through Web applications
u Processing and Sharing by a set of concurrent users and
application programs – yet, keeping all data valid and
consistent
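As a rough sketch of defining, constructing and manipulating a database, the following uses Python's built-in `sqlite3` module with an in-memory database. The `student` table, its columns and its rows are assumptions chosen for illustration; SQLite stands in for any DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Defining: data types, structure and constraints.
conn.execute(
    "CREATE TABLE student ("
    " student_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL CHECK (gpa BETWEEN 0.0 AND 4.0))"
)

# Constructing: load the initial database contents.
conn.executemany(
    "INSERT INTO student (student_no, name, gpa) VALUES (?, ?, ?)",
    [(1, "Smith", 3.2), (2, "Brown", 3.8)],
)

# Manipulating: a modification followed by a retrieval.
conn.execute("UPDATE student SET gpa = 3.5 WHERE student_no = 1")
rows = conn.execute("SELECT name, gpa FROM student ORDER BY student_no").fetchall()

# Keeping data valid: the DBMS rejects a row that violates the CHECK constraint.
try:
    conn.execute("INSERT INTO student VALUES (3, 'Invalid', 9.9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Concurrent sharing is not shown here; a single connection is enough to see the first three functions.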
Typical DBMS Functionality
u Other features:
u Protection or Security measures to prevent unauthorized access
u “Active” processing to take internal actions on data
u Presentation and Visualization of data
u Maintaining the database and associated programs over the
lifetime of the database application
u Called database, software, and system maintenance
Example of a Database
u Those who design and develop the DBMS software and related
tools, and the computer systems operators (called “Workers
Behind the Scene”).
Database Users
u Actors on the scene
u Database administrators:
u Responsible for authorizing access to the database, for
coordinating and monitoring its use, acquiring software and
hardware resources, controlling its use and monitoring
efficiency of operations.
u Database Designers:
u Responsible for defining the content, the structure, the
constraints, and functions or transactions against the database.
They must communicate with the end-users and understand
their needs.
Categories of End-users
u Actors on the scene (continued)
u End-users: They use the data for queries, reports and some of
them update the database content. End-users can be categorized
into:
u Casual: access database occasionally when needed
u Naïve or Parametric: they make up a large section of the end-
user population.
u They use previously well-defined functions in the form of
“canned transactions” against the database.
u Examples are bank-tellers or reservation clerks who do this
activity for an entire shift of operations.
Categories of End-users (continued)
u Sophisticated:
u Hardware
u Set of physical devices on which a database resides.
u Can range from a PC to a network of computers.
u Software
– database management system (DBMS)
– operating system
– application programs
– User Interface
u Data
u Used by the organization and a description of this data called the
schema.
COMPONENTS OF DATABASE SYSTEM ENVIRONMENT
Procedures
u Instructions and rules that should be applied to the design and use of the
database.
People
u Two different types of people (end-users and practitioners) are concerned
with the database.
Database Systems
CHAPTER 02
DATABASE ENVIRONMENT
Chapter Outline
• Data Models and Their Categories
• History of Data Models
• Schemas, Instances, and States
• Three-Schema Architecture
• Data Independence
• DBMS Languages and Interfaces
• Database System Utilities and Tools
• Centralized and Client-Server Architectures
• Classification of DBMSs
Data Models
• Data Model:
• A set of concepts to describe the structure of a database,
the operations for manipulating these structures, and
certain constraints that the database should obey.
• Data Model Structure and Constraints:
• Constructs are used to define the database structure
• Constructs typically include elements (and their data types)
as well as groups of elements (e.g. entity, record, table),
and relationships among such groups
• Constraints specify some restrictions on valid data; these
constraints must be enforced at all times
Data Models (continued)
• Data Model Operations:
• These operations are used for specifying database retrievals and
updates by referring to the constructs of the data model.
• Operations on the data model may include basic model operations
(e.g. generic insert, delete, update) and user-defined operations
(e.g. compute_student_gpa, update_inventory)
Categories of Data Models
• Conceptual (high-level, semantic) data models:
• Provide concepts that are close to the way many users
perceive data.
• (Also called entity-based or object-based data models.)
• Physical (low-level, internal) data models:
• Provide concepts that describe details of how data is stored
in the computer. These are usually specified in an ad-hoc
manner through DBMS design and administration manuals
• Implementation (representational) data models:
• Provide concepts that fall between the above two, used by
many commercial DBMS implementations (e.g. relational
data models used in many commercial systems).
Schemas versus Instances
• Database Schema:
• The description of a database.
• Includes descriptions of the database structure, data types, and
the constraints on the database.
• Schema Diagram:
• An illustrative display of (most aspects of) a database schema.
• Schema Construct:
• A component of the schema or an object within the schema, e.g.,
STUDENT, COURSE.
Schemas versus Instances
• Database State:
• The actual data stored in a database at a particular moment in
time. This includes the collection of all the data in the database.
• Also called database instance (or occurrence or snapshot).
• The term instance is also applied to individual database
components, e.g. record instance, table instance, entity
instance
Database Schema vs. Database State
• Database State:
• Refers to the content of a database at a moment in time.
• Initial Database State:
• Refers to the database state when it is initially loaded into the
system.
• Valid State:
• A state that satisfies the structure and constraints of the database.
Database Schema vs. Database State
(continued)
• Distinction
• The database schema changes very infrequently.
• The database state changes every time the database is updated.
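The distinction can be observed directly. The sketch below, which assumes a hypothetical `course` table in an in-memory SQLite database, reads the schema from SQLite's catalog table `sqlite_master` and compares it before and after an update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY, title TEXT NOT NULL)")


def schema(connection):
    """Return the stored definitions, i.e. the description of the database."""
    return connection.execute("SELECT sql FROM sqlite_master ORDER BY name").fetchall()


def state(connection):
    """Return the current contents of the (hypothetical) course table."""
    return connection.execute(
        "SELECT course_no, title FROM course ORDER BY course_no"
    ).fetchall()


schema_before = schema(conn)
initial_state = state(conn)  # empty: nothing has been loaded yet

conn.execute("INSERT INTO course VALUES ('CS1310', 'Intro to Computer Science')")

state_after_update = state(conn)  # the state has changed
schema_after = schema(conn)       # the schema has not
```

Only a schema-level statement such as `ALTER TABLE` would change what `schema()` returns.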
Three-tier client-server architecture
Classification of DBMSs
• Based on the data model used
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
• Other classifications
• Single-user (typically used with personal computers)
vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one database)
vs. distributed (uses multiple computers, multiple databases)
Variations of Distributed DBMSs (DDBMSs)
• Homogeneous DDBMS
• Heterogeneous DDBMS
• Federated or Multi-database Systems
Cost considerations for DBMSs
• Cost Range: from free open-source systems to configurations costing millions
of dollars
• Examples of free relational DBMSs: MySQL, PostgreSQL, others
• Commercial DBMS offer additional specialized modules, e.g. time-series
module, spatial data module, document module, XML module
• These offer additional specialized functionality when purchased separately
• Sometimes called cartridges (e.g., in Oracle) or blades
• Different licensing options: site license, maximum number of concurrent users
(seat license), single user, etc.
History of Data Models
• Network Model
• Hierarchical Model
• Relational Model
• Object-oriented Data Models
• Object-Relational Models
History of Data Models
• Network Model:
• The first network DBMS was implemented by Honeywell in 1964-65
(IDS System).
• Adopted heavily due to the support by CODASYL (Conference on
Data Systems Languages) (CODASYL - DBTG report of 1971).
• Later implemented in a large variety of systems - IDMS (Cullinet -
now Computer Associates), DMS 1100 (Unisys), IMAGE (H.P.
(Hewlett-Packard)), VAX -DBMS (Digital Equipment Corp., next
COMPAQ, now H.P.).
Example of Network Model Schema
Network Model
• Advantages:
• Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and
relationship types.
• Language is navigational; uses constructs like FIND, FIND member,
FIND owner, FIND NEXT within set, GET, etc.
• Programmers can do optimal navigation through the database.
Network Model
• Disadvantages:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through
a set of records.
• Little scope for automated “query optimization”
History of Data Models
• Hierarchical Data Model:
• Initially implemented in a joint effort by IBM and North American
Rockwell around 1965. Resulted in the IMS family of systems.
• IBM’s IMS product had (and still has) a very large customer base
worldwide
• Hierarchical model was formalized based on the IMS system
• Other systems based on this model: System 2k (SAS inc.)
Hierarchical Model
• Advantages:
• Simple to construct and operate
• Corresponds to a number of natural hierarchically organized
domains, e.g., organization (“org”) chart
• Language is simple:
• Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT, etc.
• Disadvantages:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
History of Data Models
• Relational Model:
• Proposed in 1970 by E.F. Codd (IBM), first commercial system in
1981-82.
• Now in several commercial products (e.g. DB2, ORACLE, MS SQL
Server, SYBASE, INFORMIX).
• Several free open source implementations, e.g. MySQL, PostgreSQL
• Currently most dominant for developing database applications.
• SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99
(SQL3), …
History of Data Models
• Object-oriented Data Models:
• Several models have been proposed for implementing in a database
system.
• One set comprises models of persistent O-O Programming
Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and
Smalltalk (e.g., in GEMSTONE).
• Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS
(at H.P.- used in Open OODB).
• Object Database Standard: ODMG-93, ODMG-version 2.0, ODMG-
version 3.0.
• Chapters 20 and 21 describe this model.
History of Data Models
• Object-Relational Models:
• Most Recent Trend. Started with Informix Universal Server.
• Relational systems incorporate concepts from object databases
leading to object-relational.
• Exemplified in the latest versions of Oracle-10i, DB2, and SQL
Server and other DBMSs.
• Standards included in SQL-99 and expected to be enhanced in
future SQL standards.
Summary
• Data Models and Their Categories
• History of Data Models
• Schemas, Instances, and States
• Three-Schema Architecture
• Data Independence
• DBMS Languages and Interfaces
• Database System Utilities and Tools
• Centralized and Client-Server Architectures
• Classification of DBMSs
Data Modeling Using
the Entity-Relationship (ER)
Model
Chapter Outline
u Overview of Database Design Process
u Example Database Application (COMPANY)
u ER Model Concepts
u Entities and Attributes
u Entity Types, Value Sets, and Key Attributes
u Relationships and Relationship Types
u Weak Entity Types
u Roles and Attributes in Relationship Types
u ER Diagrams - Notation
u ER Diagram for COMPANY Schema
Overview of Database Design Process
u Two main activities:
u Database design
u Applications design
u Focus in this chapter on database design
u To design the conceptual schema for a database application
u Applications design focuses on the programs and interfaces that
access the database
u Generally considered part of software engineering
Database Design Process
stages in the design of a database:
u requirement analysis
u conceptual database design
u choice of the DBMS
u data model mapping
u physical design
u implementation
Requirement Gathering and Analysis
Purpose: to document the data requirements of the users
Functional requirements are the operations that will be applied to the database,
including queries and updates.
The specification will then be used as the basis for the design of the database.
Typical activities:
u identification of application areas and user groups
u analysis of existing documentation of application areas, e.g. policy documents,
forms, reports, organization charts
u analysis of current operating environments and the planned use of the information,
e.g. information flow, types of transactions, frequency of transaction types
u responses to user questionnaires are analyzed
In Other words
start from a description of the requirements which is:
u poorly structured,
u heterogeneous
u informal
and use a technique to transform that into a specification of the database
requirements which is:
u formal
u homogeneous
u consistent
u complete
Conceptual Design
Two parallel activities
u Schema Design
look at the data requirements resulting from the analysis (phase 1) and produce a
conceptual schema in a DBMS-independent high level data model
u Transaction Design
look at the database applications whose requirements were analyzed in phase 1
and produce high level specifications for these transactions
Conceptual Schema Design
Purpose: to produce a conceptual schema of the database
u Expressed using concepts of the high-level data model
u Not including implementation details (has to be understood by
non-technical users)
u but detailed in terms of the “objects” of the domain the database
will represent
u independent of the DBMS to be used (no relational DB-oriented
notions!)
u cannot be used directly to implement the database
design is made in terms of a semantic or conceptual data model
Transaction Design
Purpose: to produce a design of the transactions that will run on the
database
u retrieval: retrieve data for display or as part of a report
u update: enter new data or amend existing data
u mixed: more complex applications may do both retrieval and update
Why?
u need to be sure to include in the conceptual schema all information
required by transactions
u relative importance and frequency of use of transactions will
influence physical database design
u ... the software needs to be designed as well as the data!
Choosing a DBMS
Purpose: Deciding the best framework for implementing the produced
schema:
u type of DBMS (relational, network, deductive, Object Oriented, ...)
u user and programmer interfaces
u type of query languages
choice made on the basis of
u technical factors: the DBMS has to support the required tasks
u economic factors: software acquisition/maintenance, hardware acquisition,
creation/conversion, training of staff
u organizational factors: platforms supported, availability of vendor services
Logical Design
Purpose: to transform the generic, DBMS-independent conceptual
schema into the data model of the chosen DBMS (data model mapping)
Two stages:
u system independent mapping: no consideration of any specific
characteristics that may apply to the specific DBMS package
u tailoring to DBMS: different DBMSs may implement the same data
model in slightly different ways
u result is a set of Data Description Language (DDL) statements in the
language of the chosen DBMS
Physical Design
Purpose: to choose the specific storage structures and access paths for
the database files
Attention is paid to performance. Some relevant criteria:
u response time: may want to minimise database access time for data
items referenced by frequently used transactions
u space utilisation: less frequently used data and queries may be
archived
u transaction throughput: average number of transactions that can be
processed per minute
Implementation
Purpose: to create the database
u compile and execute DDL statements
u populate the database
u Manually / Automatically (May need to convert data from previous
formats)
u Implement application programs (transactions)
u Programs are written with embedded DML statements
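The implementation steps above might look as follows in a small host-language program. This is a sketch under stated assumptions: SQLite (via Python's `sqlite3`) plays the role of the chosen DBMS, the `employee` table is a simplified stand-in, and the function names `create_database`, `raise_salary` and `salary_of` are invented for the example.

```python
import sqlite3

# DDL produced by the logical and physical design phases (simplified, assumed).
DDL = """
CREATE TABLE employee (
    ssn    TEXT PRIMARY KEY,
    name   TEXT NOT NULL,
    salary INTEGER NOT NULL
);
"""


def create_database():
    """Execute the DDL, then populate the database with initial rows."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(DDL)
    connection.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [("123456789", "John Smith", 30000), ("987654321", "Jane Doe", 40000)],
    )
    connection.commit()
    return connection


def raise_salary(connection, ssn, amount):
    """Update transaction: a DML statement embedded in the application program."""
    with connection:  # commits on success, rolls back if an exception is raised
        cursor = connection.execute(
            "UPDATE employee SET salary = salary + ? WHERE ssn = ?", (amount, ssn)
        )
    return cursor.rowcount


def salary_of(connection, ssn):
    """Retrieval transaction: return the salary, or None if no such employee."""
    row = connection.execute(
        "SELECT salary FROM employee WHERE ssn = ?", (ssn,)
    ).fetchone()
    return None if row is None else row[0]


conn = create_database()
updated = raise_salary(conn, "123456789", 5000)
```

Passing values as `?` parameters rather than building the SQL string by hand keeps the embedded DML safe from malformed input.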
Overview of Database Design Process
Example COMPANY Database
u We need to create a database schema design based on the following
(simplified) requirements of the COMPANY Database:
u The company is organized into DEPARTMENTs. Each department has
a name, number and an employee who manages the department.
We keep track of the start date of the department manager. A
department may have several locations.
u Each department controls a number of PROJECTs. Each project has
a unique name, unique number and is located at a single location.
Example COMPANY Database (Contd.)
u We store each EMPLOYEE’s social security number, address, salary,
sex, and birthdate.
u Each employee works for one department but may work on
several projects.
u We keep track of the number of hours per week that an
employee currently works on each project.
u We also keep track of the direct supervisor of each employee.
u Each employee may have a number of DEPENDENTs.
u For each dependent, we keep track of their name, sex,
birthdate, and relationship to the employee.
Entity-Relationship Model
u model to express the conceptual schema of the database
u originally proposed in 1976 by Peter Chen in the journal “ACM Transactions
on Database Systems” as a means to unify the network and
relational DB models
u used routinely for system analysis and design
u simple enough to learn and understand the basic concepts
u Powerful enough to be used in the development of complex
applications
conceptual designs using the ER model are called ER schemas
ER Model Concepts
Entities and Attributes
u Entities are specific objects or things in the mini-world that are
represented in the database.
u For example the EMPLOYEE John Smith, the Research
DEPARTMENT, the ProductX PROJECT
u Attributes are properties used to describe an entity.
u For example an EMPLOYEE entity may have the attributes
Name, SSN, Address, Sex, BirthDate
u A specific entity will have a value for each of its attributes.
u For example a specific employee entity may have Name='John
Smith', SSN='123456789', Address ='731, Fondren, Houston, TX',
Sex='M', BirthDate='09-JAN-55'
u Each attribute has a value set (or data type) associated with it –
e.g. integer, string, subrange, enumerated type, …
Types of Attributes
u Simple
u Each entity has a single atomic value for the attribute. For
example, SSN or Sex.
u Composite
u The attribute may be composed of several components. For
example:
u Address(Apt#, House#, Street, City, State, ZipCode, Country),
or
u Name(FirstName, MiddleName, LastName).
u Composition may form a hierarchy where some components are
themselves composite.
u Multi-valued
u An entity may have multiple values for that attribute. For
example, Color of a CAR or PreviousDegrees of a STUDENT.
u Denoted as {Color} or {PreviousDegrees}.
Types of Attributes (2)
u In general, composite and multi-valued attributes may be nested
arbitrarily to any number of levels, although this is rare.
u For example, PreviousDegrees of a STUDENT is a composite multi-
valued attribute denoted by {PreviousDegrees (College, Year,
Degree, Field)}
u Multiple PreviousDegrees values can exist
u Each has four subcomponent attributes:
u College, Year, Degree, Field
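One way to picture simple, composite and multi-valued attributes is with nested data structures. The sketch below uses Python dataclasses; the class and field names follow the slide where possible, while the sample values and the name `field_of_study` (used in place of "Field") are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Name:
    """Composite attribute: made up of several component attributes."""
    first_name: str
    middle_name: str
    last_name: str


@dataclass
class PreviousDegree:
    """One value of the composite multi-valued attribute {PreviousDegrees}."""
    college: str
    year: int
    degree: str
    field_of_study: str  # "Field" on the slide; renamed to avoid confusion with dataclasses.field


@dataclass
class Student:
    ssn: str    # simple attribute: a single atomic value
    name: Name  # composite attribute
    # Multi-valued attribute: zero or more values per entity.
    previous_degrees: List[PreviousDegree] = field(default_factory=list)


student = Student("123456789", Name("John", "B", "Smith"))
student.previous_degrees.append(PreviousDegree("College A", 2010, "BSc", "Computing"))
student.previous_degrees.append(PreviousDegree("College B", 2012, "MSc", "Databases"))
```

The nesting of a list of composite values inside `Student` mirrors the notation {PreviousDegrees (College, Year, Degree, Field)}.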
Example of a composite attribute
Entity Types and Key Attributes
u Entities with the same basic attributes are grouped or typed into an
entity type.
u For example, the entity type EMPLOYEE and PROJECT.
u An attribute of an entity type for which each entity must have a
unique value is called a key attribute of the entity type.
u For example, SSN of EMPLOYEE.
Entity Types and Key Attributes
u A key attribute may be composite.
u License plate number is a key of the CAR entity type with components
(Number, Province)
u An entity type may have more than one key.
u The CAR entity type may have two keys:
u Vehicle Identification Number (popularly called VIN)
u License plate number (Number, Province)
u Each key is underlined
Displaying an Entity type
u In ER diagrams, an entity type is displayed in a rectangular box
u Attributes are displayed in ovals
u Each attribute is connected to its entity type
u Components of a composite attribute are connected to the oval
representing the composite attribute
u Each key attribute is underlined
u Multivalued attributes are displayed in double ovals
u See CAR example on next slide
Entity Type CAR with two keys and a
corresponding Entity Set
Entity Set
u Each entity type will have a collection of entities stored in the
database
u Called the entity set
u Previous slide shows three CAR entity instances in the entity set for
CAR
u Same name (CAR) used to refer to both the entity type and the entity
set
u Entity set is the current state of the entities of that type that are
stored in the database
Initial Design of Entity Types for the
COMPANY Database Schema
Refining the initial design by introducing
relationships
Discussion on Relationship Types
u In the refined design, some attributes from the initial entity
types are refined into relationships:
u Manager of DEPARTMENT -> MANAGES
u Works_on of EMPLOYEE -> WORKS_ON
u Department of EMPLOYEE -> WORKS_FOR
u In general, more than one relationship type can exist between
the same participating entity types
u MANAGES and WORKS_FOR are distinct relationship types between
EMPLOYEE and DEPARTMENT
u With different meanings and different relationship instances.
Recursive Relationship Type
u Is a relationship type with the same participating entity type in
distinct roles
u Example: the SUPERVISION relationship
u EMPLOYEE participates twice in two distinct roles:
u supervisor (or boss) role
u supervisee (or subordinate) role
u Each relationship instance relates two distinct EMPLOYEE entities:
u One employee in supervisor role
u One employee in supervisee role
Example of relationships of different
degrees
(a) Unary recursive relationships
Weak Entity Types
u An entity type that does not have a key attribute of its own
u A weak entity must participate in an identifying relationship type with
an owner or identifying entity type
u Entities are identified by the combination of:
u A partial key of the weak entity type
u The particular entity they are related to in the identifying entity
type
u Example:
u A DEPENDENT entity is identified by the dependent’s first name,
and the specific EMPLOYEE to whom the dependent is related
u Name of DEPENDENT is the partial key
u DEPENDENT is a weak entity type
u EMPLOYEE is its identifying entity type via the identifying
relationship type DEPENDENT_OF
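A possible relational rendering of this weak entity type is sketched below with Python's `sqlite3`. The table and column names (`essn`, `dependent_name`, `relationship`) and the sample rows are assumptions for illustration. The point is that the primary key of `dependent` combines the owner's key with the partial key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE dependent (
    essn           TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
    dependent_name TEXT NOT NULL,        -- partial key
    relationship   TEXT,
    PRIMARY KEY (essn, dependent_name)   -- owner key + partial key
);
""")

conn.execute("INSERT INTO employee VALUES ('123456789', 'John Smith')")
conn.execute("INSERT INTO employee VALUES ('987654321', 'Jane Doe')")
# The same first name may appear under different owners.
conn.execute("INSERT INTO dependent VALUES ('123456789', 'Alice', 'Daughter')")
conn.execute("INSERT INTO dependent VALUES ('987654321', 'Alice', 'Daughter')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO dependent VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(("123456789", "Alice", "Daughter"))  # same owner, same partial key
orphan_ok = try_insert(("000000000", "Bob", "Son"))            # owner does not exist
```

Both attempts are rejected: the first repeats an (owner, partial key) pair, and the second has no owning EMPLOYEE, which reflects the existence dependency of a weak entity.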
Weak Entity Type
Constraints on Relationships
u Constraints on Relationship Types
u Cardinality Ratio (specifies maximum participation)
u Also known as ratio constraints
u One-to-one (1:1)
u One-to-many (1:N) or Many-to-one (N:1)
u Many-to-many (M:N)
u Existence Dependency Constraint (specifies minimum
participation) (also called participation constraint)
u zero (optional participation, not existence-dependent)
u one or more (mandatory participation, existence-dependent)
Supervision Relationship
Manages Relationship
Recursive Relationship Type is: SUPERVISION
(participation role names are shown)
Attributes of Relationship types
u A relationship type can have attributes:
u For example, HoursPerWeek of WORKS_ON
u Its value for each relationship instance describes the number of
hours per week that an EMPLOYEE works on a PROJECT.
u A value of HoursPerWeek depends on a particular (employee,
project) combination
u Most relationship attributes are used with M:N relationships
u In 1:N relationships, they can be transferred to the entity type on
the N-side of the relationship
Notation for Constraints on Relationships
u Cardinality ratio (of a binary relationship): 1:1, 1:N, N:1, or M:N
u Shown by placing appropriate numbers on the relationship edges.
u Participation constraint (on each participating entity type): total
(called existence dependency) or partial.
u Total shown by double line, partial by single line.
u NOTE: These are easy to specify for Binary Relationship Types.
Alternative diagrammatic notation
u ER diagrams are one popular example for displaying database schemas
u Many other notations exist in the literature and in various database
design and modeling tools
u Appendix A illustrates some of the alternative notations that have
been used
u UML class diagrams are representative of another way of displaying ER
concepts that is used in several commercial design tools
Summary of notation for ER diagrams
Relationships of Higher Degree
u Relationship types of degree 2 are called binary
u Relationship types of degree 3 are called ternary and of degree n are
called n-ary
u In general, an n-ary relationship is not equivalent to n binary
relationships
u Constraints are harder to specify for higher-degree relationships (n >
2) than for binary relationships
n-ary relationships (n > 2)
u In general, 3 binary relationships can represent different information
than a single ternary relationship (see Figure 3.17a and b on next
slide)
u If needed, the binary and n-ary relationships can all be included in the
schema design (see Figure 3.17a and b, where all relationships convey
different meanings)
u The entity relationship model in its original form did not support the
specialization and generalization abstractions
u Next chapter illustrates how the ER model can be extended with
u Type-subtype and set-subset relationships
u Specialization/Generalization Hierarchies
u Notation to display them in EER diagrams
Chapter Summary
u Disjointness Constraint:
u Completeness Constraint:
Constraints on Specialization and Generalization
u Disjointness Constraint:
u Disjoint, total
u Disjoint, partial
u Overlapping, total
u Overlapping, partial
u Hierarchy has a constraint that every subclass has only one superclass
(called single inheritance); this is basically a tree structure
u Can have:
u We just use specialization (to stand for the end result of either
specialization or generalization)
Specialization/Generalization Hierarchies, Lattices
& Shared Subclasses
u In specialization, start with an entity type and then define subclasses
of the entity type by successive specialization
u In some cases, we need to model a single superclass/subclass relationship with more than
one superclass
u For each regular (strong) entity type E in the ER schema, create a relation R
that includes all the simple attributes of E.
u If the chosen key of E is composite, the set of simple attributes that form it will
together form the primary key of R.
u SSN, DNUMBER, and PNUMBER are the primary keys for the relations EMPLOYEE,
DEPARTMENT, and PROJECT as shown.
Foreign Key Constraint
u A FOREIGN KEY is a key used to link two tables together.
u A FOREIGN KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table.
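The following sketch shows a foreign key linking two tables and the DBMS rejecting a dangling reference. It assumes SQLite through Python's `sqlite3`; the column name `dno`, the department number 5 and the sample rows are illustrative choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dnumber INTEGER PRIMARY KEY,
    dname   TEXT NOT NULL
);
CREATE TABLE employee (
    ssn  TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    dno  INTEGER NOT NULL REFERENCES department(dnumber)  -- foreign key
);
INSERT INTO department VALUES (5, 'Research');
INSERT INTO employee VALUES ('123456789', 'John Smith', 5);
""")

# The foreign key is what links the two tables in a join.
linked = conn.execute(
    "SELECT e.name, d.dname FROM employee e JOIN department d ON e.dno = d.dnumber"
).fetchall()

# A reference to a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES ('987654321', 'Jane Doe', 99)")
    dangling_reference_accepted = True
except sqlite3.IntegrityError:
    dangling_reference_accepted = False
```

Placing the foreign key in `employee` is also how a 1:N relationship is mapped: the relation on the N-side carries the primary key of the other relation.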
ER-to-Relational Mapping Algorithm (contd.)
u Step 2: Mapping of Weak Entity Types
u For each weak entity type W in the ER schema with owner entity type E,
create a relation R & include all simple attributes (or simple components of
composite attributes) of W as attributes of R.
u For each binary 1:1 relationship type R in the ER schema, identify the relations S
and T that correspond to the entity types participating in R.
u For each regular binary 1:N relationship type R, identify the relation S that
represents the participating entity type at the N-side of the relationship type.
u Include as foreign key in S the primary key of the relation T that represents the
other entity type participating in R.
u For each regular binary M:N relationship type R, create a new relation S to
represent R.
u Include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types; their combination will form the primary
key of S.
u Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.
u Example: The M:N relationship type WORKS_ON from the ER diagram
is mapped by creating a relation WORKS_ON in the relational database
schema.
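A minimal sketch of the WORKS_ON mapping, assuming SQLite through Python's `sqlite3`. The column names `essn`, `pno` and `hours`, the second project name and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, pname TEXT NOT NULL UNIQUE);
CREATE TABLE works_on (
    essn  TEXT    NOT NULL REFERENCES employee(ssn),
    pno   INTEGER NOT NULL REFERENCES project(pnumber),
    hours REAL,               -- attribute of the relationship itself
    PRIMARY KEY (essn, pno)   -- combination of the two foreign keys
);
INSERT INTO employee VALUES ('123456789', 'John Smith'), ('987654321', 'Jane Doe');
INSERT INTO project  VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO works_on VALUES ('123456789', 1, 32.5),
                            ('123456789', 2, 7.5),
                            ('987654321', 1, 20.0);
""")

# Total hours per employee across all projects they work on.
hours_per_employee = conn.execute(
    "SELECT e.name, SUM(w.hours) "
    "FROM works_on w JOIN employee e ON e.ssn = w.essn "
    "GROUP BY e.ssn ORDER BY e.name"
).fetchall()
```

Because `hours` belongs to an (employee, project) pair rather than to either entity alone, it can only live in the relationship relation.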
u For each n-ary relationship type R, where n>2, create a new relation
S to represent R.
u Also include any simple attributes of the n-ary relationship type (or
simple components of composite attributes) as attributes of S.
Example: The relationship type SUPPLY in the ER
This can be mapped to the relation SUPPLY shown in the relational schema,
whose primary key is the combination of the three foreign keys {SNAME,
PARTNO, PROJNAME}
Summary of Mapping constructs and constraints
u Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attr(Li) =
{attributes of Si} ∪ {k, a1, …, an} and PK(Li) = k.
This option only works for a specialization whose subclasses are total (every entity in
the superclass must belong to (at least) one of the subclasses).
EER diagram notation for an attribute-defined specialization on JobType.
Options for mapping specialization or generalization.
(a) Mapping the EER schema using option 8A.
Generalization.
(b) Generalizing CAR and TRUCK into the superclass VEHICLE.
Options for mapping specialization or generalization.
(b) Mapping the EER schema using option 8B.
Mapping EER Model Constructs to Relations (contd.)
Mapping the EER specialization lattice using multiple options.
Mapping EER Model Constructs to Relations
u Step 9: Mapping of Union Types (Categories).
Chapter Summary
Relational Model Concepts
u A Relation is a mathematical concept based on the ideas of sets
u The model was first proposed by Dr. E.F. Codd of IBM Research
in 1970 in the following paper:
u "A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970
u The above paper caused a major revolution in the field of
database management and earned Dr. Codd the coveted ACM
Turing Award
Informal Definitions
u The data elements in each row represent certain facts that correspond to a real-
world entity or relationship
u In the formal model, rows are called tuples
u Each column has a column header that gives an indication of the meaning of the
data items in that column
u In the formal model, the column header is called an attribute name (or just
attribute)
Example of a Relation
Informal Definitions
u Key of a Relation:
u Each row has a value of a data item (or set of items) that uniquely identifies
that row in the table
u Called the key
u In the STUDENT table, SSN is the key
u The domain of an attribute is the set of all possible values the attribute can take.
u For example, dom(Cust-name) is varchar(25)
u The role these strings play in the CUSTOMER relation is that of the name of a
customer.
Formal Definitions - Summary
u Formally,
u Given R(A1, A2, ..., An)
u r(R) ⊆ dom(A1) X dom(A2) X ... X dom(An)
u R(A1, A2, …, An) is the schema of the relation
u R is the name of the relation
u A1, A2, …, An are the attributes of the relation
u r(R): a specific state (or "value" or “population”) of relation R – this is a set of
tuples (rows)
u r(R) = {t1, t2, …, tm} where each ti is an n-tuple
u ti = <v1, v2, …, vn> where each vj ∈ dom(Aj)
Formal Definitions - Example
u Let R(A1, A2) be a relation schema:
u Let dom(A1) = {0,1}
u Let dom(A2) = {a,b,c}
u Then: dom(A1) X dom(A2) is all possible combinations:
{<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> }
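The example can be reproduced in a few lines of Python. This is a minimal sketch: the relation state r below is made up, and is_valid_state is a hypothetical helper name.

```python
from itertools import product

dom_A1 = {0, 1}
dom_A2 = {"a", "b", "c"}

# dom(A1) X dom(A2): every possible <v1, v2> combination.
all_tuples = set(product(dom_A1, dom_A2))

# A relation state r(R) is any subset of that product (this one is made up).
r = {(0, "a"), (1, "c")}


def is_valid_state(state):
    """True if every tuple takes each value from the matching domain."""
    return state <= all_tuples
```

Here all_tuples holds the six combinations listed above, and a state containing a tuple such as <2,a> would be rejected because 2 is not in dom(A1).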
u The tuples are not considered to be ordered, even though they appear to be in
the tabular form.
u We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2,
..., vn> to be ordered .
Characteristics Of Relations
u Values in a tuple:
u All values are considered atomic (indivisible).
u Each value in a tuple must be from the domain of the attribute for that column
u If t = <v1, v2, …, vn> is a tuple (row) in the relation state r of R(A1, A2,
…, An)
u Then each vi must be a value from dom(Ai)
u A special null value is used to represent values that are unknown or inapplicable
to certain tuples.
Characteristics Of Relations
u Notation:
u We refer to component values of a tuple t by:
u t[Ai] or t.Ai
u This is the value vi of attribute Ai for tuple t
u Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively in t
Relational Integrity Constraints
u Constraints are conditions that must hold on all valid relation states.
u There are three main types of constraints in the relational model:
u Key constraints
u Entity integrity constraints
u Referential integrity constraints
u Another implicit constraint is the domain constraint
u Every value in a tuple must be from the domain of its attribute (or it could be
null, if allowed for that attribute)
Key Constraints
u Superkey of R:
u Is a set of attributes SK of R with the following condition:
u No two tuples in any valid relation state r(R) will have the same value for SK
u That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
u This condition must hold in any valid state r(R)
u Key of R:
u A "minimal" superkey
u That is, a key is a superkey K such that removal of any attribute from K results in
a set of attributes that is not a superkey (does not possess the superkey
uniqueness property)
Key Constraints (continued)
u Example: Consider the CAR relation schema:
u CAR(State, Reg#, SerialNo, Make, Model, Year)
u CAR has two keys:
u Key1 = {State, Reg#}
u Key2 = {SerialNo}
u Both are also superkeys of CAR
u {SerialNo, Make} is a superkey but not a key.
u In general:
u Any key is a superkey (but not vice versa)
u Any set of attributes that includes a key is a superkey
u A minimal superkey is also a key
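The superkey and key definitions can be checked against a single relation state, as in the sketch below. Note the limitation: a key constraint must hold in every valid state, so one state can refute a proposed key but can never prove it. The sample CAR tuples and the function names are assumptions made for illustration.

```python
from itertools import combinations


def is_superkey(state, attrs):
    """True if no two tuples in this state agree on all attributes in attrs."""
    seen = set()
    for t in state:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(state, attrs):
    """A key is a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(state, attrs):
        return False
    return not any(
        is_superkey(state, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


# Made-up CAR state; only four of the six attributes are shown.
cars = [
    {"State": "TX", "Reg#": "R1", "SerialNo": "S1", "Make": "Ford"},
    {"State": "TX", "Reg#": "R2", "SerialNo": "S2", "Make": "Ford"},
    {"State": "NY", "Reg#": "R1", "SerialNo": "S3", "Make": "Toyota"},
]
```

On this state, ("State", "Reg#") and ("SerialNo",) pass is_key, while ("SerialNo", "Make") passes is_superkey only, because dropping Make still leaves a superkey.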
Key Constraints (continued)
u If a relation has several candidate keys, one is chosen arbitrarily to be the primary
key.
u The primary key attributes are underlined.
u Example: Consider the CAR relation schema:
u CAR(State, Reg#, SerialNo, Make, Model, Year)
u We chose SerialNo as the primary key
u The primary key value is used to uniquely identify each tuple in a relation
u Provides the tuple identity
u Also used to reference the tuple from another tuple
u General rule: Choose as primary key the smallest of the candidate keys (in terms of
size)
u Not always applicable – choice is sometimes subjective
CAR table with two candidate keys –
LicenseNumber chosen as Primary Key
Relational Database Schema
u Relational Database Schema:
u A set S of relation schemas that belong to the same database.
u S is the name of the whole database schema
u S = {R1, R2, ..., Rn}
u R1, R2, …, Rn are the names of the individual relation schemas within the
database S
u Following slide shows a COMPANY database schema with 6 relation schemas
COMPANY Database Schema
Entity Integrity
u Entity Integrity:
u The primary key attributes PK of each relation schema R in S cannot
have null values in any tuple of r(R).
u This is because primary key values are used to identify the individual tuples.
u t[PK] ≠ null for any tuple t in r(R)
u If PK has several attributes, null is not allowed in any of these attributes
u Note: Other attributes of R may be constrained to disallow null values,
even though they are not members of the primary key.
Referential Integrity
Populated database state
u Each relation will have many tuples in its current relation state
u The relational database state is a union of all the individual relation states
u Whenever the database is changed, a new state arises
u Basic operations for changing the database:
u INSERT a new tuple in a relation
u DELETE an existing tuple from a relation
u MODIFY an attribute of an existing tuple
u Next slide shows an example state for the COMPANY database
Populated database state for COMPANY
Update Operations on Relations
u INSERT a tuple.
u DELETE a tuple.
u MODIFY a tuple.
u Integrity constraints should not be violated by the update
operations.
u Several update operations may have to be grouped together.
u Updates may propagate to cause other updates automatically.
This may be necessary to maintain integrity constraints.
Update Operations on Relations
u In case of integrity violation, several actions can be taken:
u Cancel the operation that causes the violation (RESTRICT or
REJECT option)
u Perform the operation but inform the user of the violation
u Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
u Execute a user-specified error-correction routine
Possible violations for each operation
u INSERT may violate any of the constraints:
u Domain constraint:
u if one of the attribute values provided for the new tuple is not of the specified
attribute domain
u Key constraint:
u if the value of a key attribute in the new tuple already exists in another tuple
in the relation
u Referential integrity:
u if a foreign key value in the new tuple references a primary key value that does
not exist in the referenced relation
u Entity integrity:
u if the primary key value is null in the new tuple
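The four INSERT checks can be sketched as one Python function. This is an illustration only, not how any particular DBMS implements them; the function name, its parameters, and the sample EMPLOYEE data are assumptions, and None stands in for null.

```python
def insert_violations(new, state, pk, domains, fks):
    """Return the names of the constraints that inserting `new` would violate.

    pk      -- tuple of primary key attribute names
    domains -- dict: attribute -> predicate accepting legal non-null values
    fks     -- dict: foreign key attribute -> set of existing referenced PK values
    """
    violations = []
    if any(v is not None and not domains[a](v) for a, v in new.items()):
        violations.append("domain")
    if any(new[a] is None for a in pk):
        violations.append("entity integrity")
    elif any(all(t[a] == new[a] for a in pk) for t in state):
        violations.append("key")
    if any(new[a] is not None and new[a] not in ref for a, ref in fks.items()):
        violations.append("referential integrity")
    return violations


domains = {
    "SSN": lambda v: isinstance(v, str) and len(v) == 9,
    "LNAME": lambda v: isinstance(v, str),
    "DNO": lambda v: isinstance(v, int),
}
employees = [{"SSN": "123456789", "LNAME": "Smith", "DNO": 5}]
fks = {"DNO": {1, 4, 5}}  # department numbers that currently exist

ok = insert_violations(
    {"SSN": "987654321", "LNAME": "Wong", "DNO": 4}, employees, ("SSN",), domains, fks)
dup = insert_violations(
    {"SSN": "123456789", "LNAME": "Jones", "DNO": 5}, employees, ("SSN",), domains, fks)
bad = insert_violations(
    {"SSN": None, "LNAME": "Jones", "DNO": 7}, employees, ("SSN",), domains, fks)
```

The first tuple is accepted, the second repeats an existing SSN, and the third has a null primary key and a DNO that matches no department.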
Possible violations for each operation
u DELETE may violate only referential integrity:
u If the primary key value of the tuple being deleted is referenced from other
tuples in the database
u Can be remedied by several actions: RESTRICT, CASCADE, SET NULL (see Chapter 8
for more details)
u RESTRICT option: reject the deletion
u CASCADE option: propagate the deletion by also deleting the
referencing tuples
u SET NULL option: set the foreign keys of the referencing tuples to NULL
u One of the above options must be specified during database design for each
foreign key constraint
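The three options can be observed with Python's built-in sqlite3 module, which accepts ON DELETE RESTRICT, CASCADE and SET NULL on a foreign key. This is a minimal sketch: the two-column tables and the sample rows are assumptions, and the helper name is hypothetical.

```python
import sqlite3


def referencing_rows_after_delete(action):
    """Delete department 5 under the given ON DELETE action; return EMPLOYEE rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, "
        f"DNO INTEGER REFERENCES DEPARTMENT(DNUMBER) ON DELETE {action})"
    )
    conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
    conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 5)")
    try:
        conn.execute("DELETE FROM DEPARTMENT WHERE DNUMBER = 5")
    except sqlite3.IntegrityError:
        pass  # RESTRICT: the deletion is rejected and nothing changes
    return conn.execute("SELECT SSN, DNO FROM EMPLOYEE").fetchall()
```

Calling the function with "RESTRICT" leaves the employee row untouched, "CASCADE" removes it together with the department, and "SET NULL" keeps it with DNO set to NULL.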
Possible violations for each operation
u UPDATE may violate domain constraint and NOT NULL constraint on an attribute
being modified
u Any of the other constraints may also be violated, depending on the attribute being
updated:
u Updating the primary key (PK):
u Similar to a DELETE followed by an INSERT
u Need to specify similar options to DELETE
u Updating a foreign key (FK):
u May violate referential integrity
u Updating an ordinary attribute (neither PK nor FK):
u Can only violate domain constraints
Summary
Relational Algebra
Chapter Outline
u Relational Algebra
u Unary Relational Operations
u Relational Algebra Operations From Set Theory
u Binary Relational Operations
u Additional Relational Operations
u Examples of Queries in Relational Algebra
u Relational Calculus
u Tuple Relational Calculus
u Domain Relational Calculus
u Example Database Application (COMPANY)
u Overview of the QBE language (appendix D)
Relational Algebra Overview
u Relational algebra is the basic set of operations for the
relational model
u These operations enable a user to specify basic retrieval
requests (or queries)
u The result of an operation is a new relation, which may have
been formed from one or more input relations
u This property makes the algebra “closed” (all objects in
relational algebra are relations)
Relational Algebra Overview (continued)
u The algebra operations thus produce new relations
u These can be further manipulated using operations of the
same algebra
u A sequence of relational algebra operations forms a relational
algebra expression
u The result of a relational algebra expression is also a
relation that represents the result of a database query (or
retrieval request)
Relational Algebra Overview
u Relational Algebra consists of several groups of operations
u Unary Relational Operations
u SELECT (symbol: σ (sigma))
u PROJECT (symbol: π (pi))
u RENAME (symbol: ρ (rho))
u Relational Algebra Operations From Set Theory
u UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – )
u CARTESIAN PRODUCT ( x )
Relational Algebra Overview cont.
u Binary Relational Operations
u JOIN (several variations of JOIN exist)
u DIVISION
u Additional Relational Operations
u OUTER JOINS, OUTER UNION
u AGGREGATE FUNCTIONS (These compute summaries of
information: for example, SUM, COUNT, AVG, MIN, MAX)
Database State for COMPANY
Unary Relational Operations: SELECT
u The SELECT operation (denoted by σ (sigma)) is used to select a subset of
the tuples from a relation based on a selection condition.
u The selection condition acts as a filter
u Keeps only those tuples that satisfy the qualifying condition
u Tuples satisfying the condition are selected whereas the other
tuples are discarded (filtered out)
u Examples:
u Select the EMPLOYEE tuples whose department number is 4:
σ DNO = 4 (EMPLOYEE)
u Select the employee tuples whose salary is greater than $30,000:
σ SALARY > 30,000 (EMPLOYEE)
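A small Python sketch of SELECT, treating a relation as a list of dictionaries. The EMPLOYEE tuples below are made up for illustration, and select is a hypothetical helper, not part of any library.

```python
def select(condition, relation):
    """σ_condition(relation): keep the tuples for which condition is true."""
    return [t for t in relation if condition(t)]


# Made-up EMPLOYEE state, for illustration only.
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5, "SALARY": 30000},
    {"LNAME": "Wong", "DNO": 5, "SALARY": 40000},
    {"LNAME": "Wallace", "DNO": 4, "SALARY": 43000},
]

dept4 = select(lambda t: t["DNO"] == 4, EMPLOYEE)             # σ DNO = 4
high_paid = select(lambda t: t["SALARY"] > 30000, EMPLOYEE)   # σ SALARY > 30,000
```

The result has the same attributes as EMPLOYEE and never more tuples than the input; only the rows are filtered.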
Unary Relational Operations: SELECT
The following query results refer to this
database state
Unary Relational Operations: PROJECT
u PROJECT Operation is denoted by π (pi)
u This operation keeps certain columns (attributes) from a relation and discards
the other columns.
u PROJECT creates a vertical partitioning
u The list of specified columns (attributes) is kept in each tuple
u The other attributes in each tuple are discarded
u Example: To list each employee’s first and last name and salary, the following is
used:
π LNAME, FNAME, SALARY (EMPLOYEE)
Unary Relational Operations: PROJECT (contd.)
u PROJECT Operation Properties
u The number of tuples in the result of projection π<list>(R) is always less than or
equal to the number of tuples in R, because duplicate tuples are eliminated
u If the list of attributes includes a key of R, then the number of tuples in
the result of PROJECT is equal to the number of tuples in R
u PROJECT is not commutative
u π<list1>(π<list2>(R)) = π<list1>(R) as long as <list2> contains the
attributes in <list1>
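These PROJECT properties can be checked on a small example. The sketch below is illustrative: the EMPLOYEE tuples are made up, and project is a hypothetical helper that removes duplicates explicitly.

```python
def project(attrs, relation):
    """π_attrs(relation): keep only attrs and drop duplicate result tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result


# Made-up EMPLOYEE state in which SSN is the key.
EMPLOYEE = [
    {"SSN": "111", "DNO": 5, "SALARY": 30000},
    {"SSN": "222", "DNO": 5, "SALARY": 30000},
    {"SSN": "333", "DNO": 4, "SALARY": 43000},
]

without_key = project(["DNO", "SALARY"], EMPLOYEE)   # duplicates collapse
with_key = project(["SSN", "DNO"], EMPLOYEE)         # key keeps every tuple
nested = project(["DNO"], project(["DNO", "SALARY"], EMPLOYEE))
direct = project(["DNO"], EMPLOYEE)
```

Projecting without the key returns two tuples instead of three, projecting with the key returns all three, and the nested projection equals the direct one because ["DNO", "SALARY"] contains ["DNO"].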
Examples of applying SELECT and PROJECT
operations
Relational Algebra Expressions
u We may want to apply several relational algebra operations one
after the other
u Either we can write the operations as a single relational
algebra expression by nesting the operations, or
u We can apply one operation at a time and create
intermediate result relations.
u In the latter case, we must give names to the relations that
hold the intermediate results.
Single expression versus sequence of
relational operations (Example)
u To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must
apply a select and a project operation
u We can write a single relational algebra expression as
follows:
u πFNAME, LNAME, SALARY(σ DNO=5(EMPLOYEE))
u OR We can explicitly show the sequence of operations,
giving a name to each intermediate relation:
u DEP5_EMPS ← σ DNO=5(EMPLOYEE)
u RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
Unary Relational Operations: RENAME
Relational Algebra Operations from Set Theory: UNION
u UNION Operation
u Binary operation, denoted by ∪
u The result of R ∪ S, is a relation that includes all tuples that
are either in R or in S or in both R and S
u Duplicate tuples are eliminated
u The two operand relations R and S must be “type
compatible” (or UNION compatible)
u R and S must have same number of attributes
u Each pair of corresponding attributes must be type
compatible (have same or compatible domains)
Relational Algebra Operations from Set Theory: UNION
u Example:
u To retrieve the social security numbers of all employees who either work in
department 5 (RESULT1 below) or directly supervise an employee who works in
department 5 (RESULT2 below)
DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
RESULT1 ← π SSN (DEP5_EMPS)
RESULT2(SSN) ← π SUPERSSN (DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
u The union operation produces the tuples that are in either RESULT1 or RESULT2 or
both
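The same query can be mimicked with Python sets, which eliminate duplicates just as UNION does. The DEP5_EMPS tuples below are made up for illustration; each result tuple has a single component, matching the one-attribute schema (SSN).

```python
# Made-up state of DEP5_EMPS (the employees of department 5).
DEP5_EMPS = [
    {"SSN": "111", "SUPERSSN": "333"},
    {"SSN": "222", "SUPERSSN": "333"},
    {"SSN": "333", "SUPERSSN": "888"},
]

RESULT1 = {(t["SSN"],) for t in DEP5_EMPS}        # π SSN
RESULT2 = {(t["SUPERSSN"],) for t in DEP5_EMPS}   # π SUPERSSN, renamed to SSN
RESULT = RESULT1 | RESULT2                        # RESULT1 ∪ RESULT2
```

SSN 333 appears in both operands but only once in RESULT, and 888 appears even though that supervisor is not in department 5.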
Example of the result of a UNION operation
u UNION Example
Relational Algebra Operations from Set Theory
u Type Compatibility of operands is required for the binary set operation UNION ∪,
(also for INTERSECTION ∩, and SET DIFFERENCE –, see next slides)
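u R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
u They have the same number of attributes, and
u The domains of corresponding attributes are type compatible (i.e. dom(Ai) = dom(Bi) for i = 1, 2, ..., n)
u The resulting relation for R1∪R2 (also for R1∩R2, or R1–R2, see next slides) has
the same attribute names as the first operand relation R1 (by convention)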
Relational Algebra Operations from Set Theory: INTERSECTION
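u INTERSECTION is denoted by ∩
u The result of R ∩ S is a relation that includes all tuples that are in both R and S
u The attribute names in the result will be the same as the attribute
names in R
u The two operand relations R and S must be “type compatible”
Relational Algebra Operations from Set Theory: SET DIFFERENCE
u SET DIFFERENCE (also called MINUS) is denoted by –
u The result of R – S is a relation that includes all tuples that are in R but not in S
u The attribute names in the result will be the same as the attribute
names in R
Some properties of UNION, INTERSECTION, and DIFFERENCE
u Both union and intersection are commutative operations; that is
u R ∪ S = S ∪ R, and R ∩ S = S ∩ R
u Both union and intersection can be treated as n-ary operations applicable to any
number of relations as both are associative operations; that is
u R ∪ (S ∪ T) = (R ∪ S) ∪ T
u (R ∩ S) ∩ T = R ∩ (S ∩ T)
u The minus operation is not commutative; that is, in general
u R – S ≠ S – R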
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
u CARTESIAN (or CROSS) PRODUCT Operation
u This operation is used to combine tuples from two relations in a combinatorial fashion.
u The resulting relation state has one tuple for each combination of tuples—one from R and
one from S.
u Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S will have nR *
nS tuples.
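u Example (meaningful): the product is followed by a selection that keeps only related tuples
u FEMALE_EMPS ← σ SEX=’F’(EMPLOYEE)
u EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
u EMP_DEPENDENTS ← EMPNAMES x DEPENDENT
u ACTUAL_DEPS ← σ SSN=ESSN(EMP_DEPENDENTS)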
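The effect of a product followed by a selection can be sketched in Python. This is a simplified version that assumes EMP_DEPENDENTS is the product of FEMALE_EMPS with a DEPENDENT relation having attributes ESSN and DEPENDENT_NAME; all tuples below are made up, and the helper assumes the two operands share no attribute names.

```python
from itertools import product


def cartesian_product(r, s):
    """R x S: one combined tuple for every pairing of a tuple of r with one of s."""
    return [{**tr, **ts} for tr, ts in product(r, s)]


# Made-up states; attribute names in the two relations are disjoint.
FEMALE_EMPS = [
    {"FNAME": "Alicia", "SSN": "111"},
    {"FNAME": "Jennifer", "SSN": "222"},
]
DEPENDENT = [
    {"ESSN": "222", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "333", "DEPENDENT_NAME": "Alice"},
    {"ESSN": "222", "DEPENDENT_NAME": "Joy"},
]

EMP_DEPENDENTS = cartesian_product(FEMALE_EMPS, DEPENDENT)
ACTUAL_DEPS = [t for t in EMP_DEPENDENTS if t["SSN"] == t["ESSN"]]  # σ SSN=ESSN
```

The product has 2 * 3 = 6 tuples, most of which pair an employee with someone else's dependent; the selection keeps only the two tuples where the dependent really belongs to the employee.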
Binary Relational Operations: JOIN
u The sequence of CARTESIAN PRODUCT followed by SELECT is used so commonly that a
special operation, called JOIN (denoted by ⋈), combines it into a single operation
u This operation is very important for any relational database with more than a single
relation, because it allows us to combine related tuples from various relations
u The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . .
., Bm) is:
R ⋈<join condition> S
u where R and S can be any relations that result from general relational algebra
expressions.
Binary Relational Operations: JOIN (cont.)
u Example: Suppose that we want to retrieve the name of the manager of each department.
u To get the manager’s name, we need to combine each DEPARTMENT tuple with the
EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.
u DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE
u This combines each department record with the employee who manages the department
u In general, consider the JOIN R(A1, A2, ..., An) ⋈ R.Ai=S.Bj S(B1, B2, ..., Bm)
u The resulting relation state has one tuple for each combination of tuples (r from R and
s from S), but only if they satisfy the join condition r[Ai]=s[Bj]
u Hence, if R has nR tuples, and S has nS tuples, then the join result will generally have
fewer than nR * nS tuples.
u Only related tuples (based on the join condition) will appear in the result
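The manager example can be sketched as a nested-loop join in Python. The tuples are made up, the helper name join is hypothetical, and a real DBMS would normally use a more efficient join algorithm than comparing every pair.

```python
def join(r, s, condition):
    """R ⋈_condition S: combined tuples (tr, ts) that satisfy the join condition."""
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]


# Made-up states with only the attributes needed for the example.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987"},
]
EMPLOYEE = [
    {"LNAME": "Wong", "SSN": "333"},
    {"LNAME": "Wallace", "SSN": "987"},
    {"LNAME": "Smith", "SSN": "123"},
]

# Join condition MGRSSN = SSN.
DEPT_MGR = join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
```

Only two of the 2 * 3 = 6 possible combinations satisfy the condition, one per department, and each result tuple carries the attributes of both relations.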
Some properties of JOIN
u The general case of JOIN operation is called a Theta-join: R ⋈theta S
u Theta can be any general boolean expression on the attributes of R and S; for
example: R.Ai < S.Bj AND R.Ak = S.Bl
u Most join conditions involve one or more equality conditions “AND”ed together; for
example: R.Ai = S.Bj AND R.Ak = S.Bl
u The most common use of join involves join conditions with equality comparisons
only
u Such a join, where the only comparison operator used is =, is called an EQUIJOIN.
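Binary Relational Operations: NATURAL JOIN
u NATURAL JOIN (denoted by *) is an EQUIJOIN that keeps only one copy of each join
attribute in the result
u The standard definition of natural join requires that the two join attributes, or
each pair of corresponding join attributes, have the same name in both
relations
u Example: in DEPARTMENT * DEPT_LOCATIONS the only attribute with the same name in both
relations is DNUMBER, so the implicit join condition is
DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER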
Binary Relational Operations NATURAL JOIN (contd.)
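u Another example: Q ← R(A,B,C,D) * S(C,D,E)
u The implicit join condition is R.C=S.C AND R.D=S.D, and the result keeps only one
copy of each join attribute: Q(A,B,C,D,E)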
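The R * S example can be sketched in Python as follows. The tuples are made up, and natural_join is a hypothetical helper that reads the attribute names from the first tuple of each operand.

```python
def natural_join(r, s):
    """R * S: equijoin on all same-named attributes, keeping one copy of each."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**tr, **ts}
        for tr in r
        for ts in s
        if all(tr[a] == ts[a] for a in common)
    ]


# Made-up states for R(A,B,C,D) and S(C,D,E).
R = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 5, "B": 6, "C": 7, "D": 8},
]
S = [
    {"C": 3, "D": 4, "E": 9},
    {"C": 3, "D": 0, "E": 10},
]

Q = natural_join(R, S)
```

Only the pair that agrees on both C and D survives, and the result tuple has the five attributes A, B, C, D, E with C and D appearing once each.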
Example of NATURAL JOIN operation
Complete Set of Relational Operations
u The set of operations including SELECT σ, PROJECT π , UNION ∪, DIFFERENCE − ,
RENAME ρ, and CARTESIAN PRODUCT X is called a complete set because any
other relational algebra expression can be expressed by a combination of these
six operations.
u For example:
u R ∩ S = (R ∪ S ) – ((R − S) ∪ (S − R))
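The identity is easy to confirm on sample data with Python sets. The two relation states below are made up; tuples are used so that R and S are type compatible.

```python
# Made-up, type-compatible relation states.
R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}

# R ∩ S built only from UNION and DIFFERENCE, as in the formula above.
derived = (R | S) - ((R - S) | (S - R))

# Python's own intersection operator, used here only as a cross-check.
builtin = R & S
```

Both expressions give the tuples (2, "b") and (3, "c"), which is why INTERSECTION does not need to be part of the complete set.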
Binary Relational Operations: DIVISION
u DIVISION is applied to two relations: R(Z) ÷ S(X), where X ⊆ Z. Let Y = Z - X (and
hence Z = X ∪ Y); that is, let Y be the set of attributes of R that are not attributes of S.
u The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear
in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S.
u For a tuple t to appear in the result T of the DIVISION, the values in t must
appear in R in combination with every tuple in S.
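DIVISION can be sketched in Python by grouping R on the Y attributes and checking that each group covers all of S. The relation contents and attribute names (ESSN, PNO) are made up for illustration, and the helper assumes r is non-empty so that its attribute names can be read from the first tuple.

```python
def divide(r, s, x_attrs):
    """R(Z) ÷ S(X): the Y-values (Y = Z - X) paired in r with every tuple of s."""
    y_attrs = [a for a in r[0] if a not in x_attrs]
    s_values = {tuple(ts[a] for a in x_attrs) for ts in s}
    seen = {}
    for tr in r:
        y = tuple(tr[a] for a in y_attrs)
        seen.setdefault(y, set()).add(tuple(tr[a] for a in x_attrs))
    return [dict(zip(y_attrs, y)) for y, xs in seen.items() if s_values <= xs]


# Made-up R(ESSN, PNO): which employee works on which project.
R = [
    {"ESSN": "111", "PNO": 1}, {"ESSN": "111", "PNO": 2},
    {"ESSN": "222", "PNO": 1},
    {"ESSN": "333", "PNO": 1}, {"ESSN": "333", "PNO": 2}, {"ESSN": "333", "PNO": 3},
]
# Made-up S(PNO): the projects that must all be covered.
S = [{"PNO": 1}, {"PNO": 2}]

T = divide(R, S, ["PNO"])
```

Employees 111 and 333 appear in R with both project 1 and project 2, so they are in the result; employee 222 works only on project 1 and is left out.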
Example of DIVISION
Recap of Relational Algebra Operations