Dbs Merge


Database Systems

ICT1407
ISHANTHA HEWARATNE
CHAPTER 01
INTRODUCTION
CHAPTER OUTLINE
u Drawbacks of file systems
u Types of Databases and Database Applications
u Basic Definitions
u Typical DBMS Functionality
u Example of a Database (UNIVERSITY)
u Main Characteristics of the Database Approach
u Database Users
u Advantages of Using the Database Approach
u When Not to Use Databases
DATA
u A representation of facts, concepts or instructions in a formalised manner
suitable for communication, interpretation or processing by human beings or
by automatic means.
u Raw data is unprocessed: text, colours, symbols, shapes, graphics,
images, temperatures, sound, video or other facts and figures are data
suitable for processing.
u E.g. Person or Employee or Customer
u name, address, phone, date of birth, designation, department, salary,
employee no, photograph
INFORMATION
u Knowledge derived from data.
u Processed or organised or summarised data.
u E.g.
u Process Date of Birth -> Age
u Process Salary (all) -> Highest paid employee
u Process all -> No. of employees
u Process all -> Employees working for
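The step from data to information can be sketched in a few lines of Python. This is only an illustration: the employee records, the fixed reference date and the helper name age_on are assumptions made for the example, not part of the course material.

```python
from datetime import date

# Hypothetical raw employee records (the "data").
employees = [
    {"name": "Amal", "date_of_birth": date(1990, 5, 17), "salary": 85000},
    {"name": "Nimal", "date_of_birth": date(1985, 11, 2), "salary": 120000},
    {"name": "Kamala", "date_of_birth": date(1998, 1, 30), "salary": 64000},
]


def age_on(date_of_birth, today):
    """Return the age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


today = date(2024, 1, 1)  # fixed reference date so the result is repeatable

# Processing the data yields information.
ages = {e["name"]: age_on(e["date_of_birth"], today) for e in employees}
highest_paid = max(employees, key=lambda e: e["salary"])["name"]
employee_count = len(employees)
```

Each derived value (age, highest paid employee, number of employees) is information: it is not stored anywhere, but is computed from the stored facts.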
Why use a Database?
u Many people collect things
u e.g. stamps, photos, paper cuttings
u If you collect anything, you are probably familiar with some of the problems
of managing a collection
u e.g. storing, filtering, updating
u One way to keep track of a collection is to create a database
Why Database Technology?

u The need to manipulate large collections of data for frequently used
queries and reports.
E.g. a collection of information on library books
u Queries:
u List of books written by a particular author
u List of books about a particular subject
u Borrowing a book
u Reserving a book for borrowing
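As a rough sketch, the first two library queries could look like this using Python's built-in sqlite3 module. The table layout, the sample books and the function names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE book (title TEXT, author TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Database Basics", "A. Perera", "Databases"),
        ("Advanced SQL", "A. Perera", "Databases"),
        ("Intro to Networks", "S. Silva", "Networking"),
    ],
)


def books_by_author(author):
    """List the titles written by a particular author."""
    rows = conn.execute(
        "SELECT title FROM book WHERE author = ? ORDER BY title", (author,)
    )
    return [title for (title,) in rows]


def books_about(subject):
    """List the titles about a particular subject."""
    rows = conn.execute(
        "SELECT title FROM book WHERE subject = ? ORDER BY title", (subject,)
    )
    return [title for (title,) in rows]
```

With a card catalogue, each of these questions needs its own index of cards; here both are answered from the same stored data.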
Examples of Database Applications

u Purchases from the supermarket
u Purchases using your credit card
u Booking a holiday at the travel agents
u Using the Internet
u Studying at university
Manual Systems – Information on
library books
u Before and during most of last century, libraries used card catalogues stored
in drawers of special cabinets
u cards with typed book information
e.g. the title index has one card for every book in the library
Drawbacks of file systems
u Program-Data Dependence
u All programs maintain metadata for each file they use

u Data Redundancy (Duplication of data)
u Different systems/programs have separate copies of the
same data
u Multiple file formats, duplication of information in different
files
u Requires space and effort, and results in loss of data & metadata
integrity
u Limited Data Sharing
u No centralized control of data
u Each application has its own private files & users have little chance
to share data outside their own applications
Drawbacks of file systems (cont…)
u Lengthy Development Times
u For each new application programmers must design their own
file formats & descriptions from scratch

u Excessive Program Maintenance
u 80% of information systems budget
u Difficulty in accessing data
u Need to write a new program to carry out each new task

u Integrity problems
u Integrity constraints
u Hard to add new constraints or change existing ones
The Solution
n Unique and centralized administration of data in a given
company
i.e. database notion
What is a Database?
n A database is a collection of related data
n It represents some aspects of the real world.
n It is a coherent collection of data.
n It is designed for specific purpose and intended users.
n It can be of varying size and complexity
n It can be generated and maintained manually
Types of Databases and Database Applications
u Traditional Applications:
u Numeric and Textual Databases
u More Recent Applications:
u Multimedia Databases
u Geographic Information Systems (GIS)
u Data Warehouses
u Real-time and Active Databases
u Many other applications
u First we focus on traditional applications
Basic Definitions
u Database:
u A collection of related data.
u Data:
u Known facts that can be recorded and have an implicit meaning.
u Mini-world:
u Some part of the real world about which data is stored in a
database. For example, student grades and transcripts at a
university.
u Database Management System (DBMS):
u A software package/ system to facilitate the creation and
maintenance of a computerized database.
u Database System:
u The DBMS software together with the data itself. Sometimes, the
applications are also included.
Simplified database system environment
Typical DBMS Functionality

u Defining
u Constructing
u Manipulating
u Processing and Sharing
Typical DBMS Functionality
u Defining a particular database in terms of its data types,
structures, and constraints
u Constructing or Loading the initial database contents on a
secondary storage medium
u Manipulating the database:
u Retrieval: Querying, generating reports
u Modification: Insertions, deletions and updates to its content
u Accessing the database through Web applications
u Processing and Sharing by a set of concurrent users and
application programs – yet, keeping all data valid and
consistent
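The defining, constructing and manipulating steps can be sketched with Python's built-in sqlite3 module. SQLite is used only because it needs no setup; the student table and its rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Defining: data types, structure and constraints.
conn.execute(
    "CREATE TABLE student ("
    " student_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " class_year INTEGER CHECK (class_year BETWEEN 1 AND 4))"
)

# Constructing: loading the initial database contents.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(17, "Smith", 1), (8, "Brown", 2)],
)

# Manipulating: a modification followed by a retrieval.
conn.execute("UPDATE student SET class_year = 2 WHERE student_no = 17")
second_years = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM student WHERE class_year = 2 ORDER BY name"
    )
]
```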
Typical DBMS Functionality
u Other features:
u Protection or Security measures to prevent unauthorized access
u “Active” processing to take internal actions on data
u Presentation and Visualization of data
u Maintaining the database and associated programs over the
lifetime of the database application
u Called database, software, and system maintenance
Example of a Database

u Mini-world for the example:
u Part of a UNIVERSITY environment.
u Some mini-world entities:
u STUDENTs
u COURSEs
u SECTIONs (of COURSEs)
u (academic) DEPARTMENTs
u INSTRUCTORs
Example of a Database

u Some mini-world relationships:
u SECTIONs are of specific COURSEs
u STUDENTs take SECTIONs
u COURSEs have prerequisite COURSEs
u INSTRUCTORs teach SECTIONs
u COURSEs are offered by DEPARTMENTs
u STUDENTs major in DEPARTMENTs
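One possible way to record two of these relationships ("SECTIONs are of specific COURSEs" and "COURSEs are offered by DEPARTMENTs") is with foreign keys. The sketch below uses Python's sqlite3 module; all table names, column names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (dept_code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE course (
    course_no TEXT PRIMARY KEY,
    title TEXT,
    dept_code TEXT REFERENCES department(dept_code)   -- offered by
);
CREATE TABLE section (
    section_id INTEGER PRIMARY KEY,
    course_no TEXT REFERENCES course(course_no),      -- section of
    instructor TEXT
);
INSERT INTO department VALUES ('CS', 'Computer Science');
INSERT INTO course VALUES ('CS1310', 'Intro to Computer Science', 'CS');
INSERT INTO section VALUES (85, 'CS1310', 'King');
""")

# Follow the relationships: section -> course -> department.
row = conn.execute("""
SELECT d.name
FROM section s
JOIN course c ON c.course_no = s.course_no
JOIN department d ON d.dept_code = c.dept_code
WHERE s.section_id = 85
""").fetchone()
offering_department = row[0]

# A section of a course that does not exist is rejected.
try:
    conn.execute("INSERT INTO section VALUES (99, 'XX0000', 'Nobody')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```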
Example of a simple database
Main Characteristics of the Database Approach
u Self-describing nature of a database system:
u A DBMS catalog stores the description of a particular
database (e.g. data structures, types, and constraints)
u The description is called meta-data.
u This allows the DBMS software to work with different
database applications.
u Insulation between programs and data:
u Called program-data independence.
u Allows changing data structures and storage organization
without having to change the DBMS access programs.
Example of a simplified database catalog
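The self-describing nature can be observed directly in SQLite, which keeps its catalog in a table called sqlite_master. The sketch below assumes a one-table database; other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# sqlite_master is SQLite's catalog: one row per object stored in the database.
table_names = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (name, col_type, bool(notnull))
    for (_cid, name, col_type, notnull, _default, _pk) in conn.execute(
        "PRAGMA table_info(student)"
    )
]
```

A general-purpose tool can read this meta-data at run time, which is why the same DBMS software can work with many different databases.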
Main Characteristics of the Database Approach
(continued)
u Data Abstraction:
u A data model is used to hide storage details and present the users with a
conceptual view of the database.
u Programs refer to the data model constructs rather than data storage details
u Support of multiple views of the data:
u Each user may see a different view of the database, which describes only the data
of interest to that user.
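A view is one common way to give a user only the data of interest. In this sketch (sqlite3 again; the employee table and the staff_directory view are assumptions), a directory user sees names and departments but not salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_no INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER
);
INSERT INTO employee VALUES (1, 'Amal', 'Sales', 85000), (2, 'Nimal', 'IT', 120000);
-- A view for a staff-directory user: the salary column is left out.
CREATE VIEW staff_directory AS SELECT name, department FROM employee;
""")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY name")
directory_columns = [description[0] for description in cursor.description]
directory_rows = cursor.fetchall()
```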
Main Characteristics of the Database Approach
(continued)
u Sharing of data and multi-user transaction processing:
u Allowing a set of concurrent users to retrieve from and to update the
database.
u Concurrency control within the DBMS guarantees that each
transaction is correctly executed or aborted.
u Recovery subsystem ensures each completed transaction has its effect
permanently recorded in the database
u OLTP (Online Transaction Processing) is a major part of database
applications. This allows hundreds of concurrent transactions to
execute per second.
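The "correctly executed or aborted" guarantee can be illustrated with a money transfer. This is a single-user sketch with sqlite3, so it shows the all-or-nothing behaviour only, not concurrency control; the account table and the transfer function are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(amount, source, target):
    """Move money between accounts: both updates are kept, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the transaction was aborted


def balances():
    return dict(conn.execute("SELECT id, balance FROM account"))


first = transfer(30, 1, 2)    # enough money: executed
second = transfer(500, 1, 2)  # would overdraw account 1: aborted
final = balances()
```

In the second call the credit to account 2 has already been applied when the debit fails, yet the rollback removes it, so no half-finished transfer is ever recorded.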
Database Users
u Users may be divided into two groups:
u Those who actually use and control the database content, and
those who design, develop and maintain database applications
(called “Actors on the Scene”), and

u Those who design and develop the DBMS software and related
tools, and the computer systems operators (called “Workers
Behind the Scene”).
Database Users
u Actors on the scene
u Database administrators:
u Responsible for authorizing access to the database, for
coordinating and monitoring its use, acquiring software and
hardware resources, controlling its use and monitoring
efficiency of operations.
u Database Designers:
u Responsible for defining the content, the structure, the
constraints, and functions or transactions against the database.
They must communicate with the end-users and understand
their needs.
Categories of End-users
u Actors on the scene (continued)
u End-users: They use the data for queries, reports and some of
them update the database content. End-users can be categorized
into:
u Casual: access database occasionally when needed
u Naïve or Parametric: they make up a large section of the end-
user population.
u They use previously well-defined functions in the form of
“canned transactions” against the database.
u Examples are bank-tellers or reservation clerks who do this
activity for an entire shift of operations.
Categories of End-users (continued)
u Sophisticated:
u These include business analysts, scientists, engineers, and
others thoroughly familiar with the system capabilities.
u Many use tools in the form of software packages that work
closely with the stored database.
u Stand-alone:
u Mostly maintain personal databases using ready-to-use
packaged applications.
u An example is a tax program user that creates its own
internal database.
u Another example is a user that maintains an address book
Advantages of Using the Database Approach
u Controlling redundancy in data storage and in development and
maintenance efforts.
u Sharing of data among multiple users.
u Restricting unauthorized access to data.
u Providing Storage Structures (e.g. indexes) for efficient Query
Processing
Advantages of Using the Database Approach
(continued)
u Providing backup and recovery services.
u Providing multiple interfaces to different classes of users.
u Representing complex relationships among data.
u Enforcing integrity constraints on the database.
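Two of these advantages, storage structures and integrity constraints, can be sketched as follows. The constraints are declared once in the schema instead of being repeated in every program; the table, the index name and the helper try_insert are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " emp_no INTEGER PRIMARY KEY,"          # no two employees share a number
    " name TEXT NOT NULL,"                  # every employee has a name
    " salary INTEGER CHECK (salary > 0))"   # salaries are positive
)
# A storage structure that speeds up lookups by name.
conn.execute("CREATE INDEX idx_employee_name ON employee(name)")


def try_insert(emp_no, name, salary):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (emp_no, name, salary))
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


results = [
    try_insert(1, "Amal", 85000),   # valid row
    try_insert(1, "Nimal", 90000),  # duplicate employee number
    try_insert(2, None, 90000),     # missing name
    try_insert(3, "Kamala", -5),    # negative salary
]
stored_rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```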
Additional Implications of Using the Database
Approach
u Potential for enforcing standards:
u This is very crucial for the success of database applications in
large organizations. Standards refer to data item names, display
formats, screens, report structures, meta-data (description of
data), Web page layouts, etc.
u Reduced application development time:
u Incremental time to add each new application is reduced.
Additional Implications of Using the Database
Approach (continued)
u Flexibility to change data structures:
u Database structure may evolve as new requirements are defined.
u Availability of current information:
u Extremely important for on-line transaction systems such as
airline, hotel, car reservations.
u Economies of scale:
u Wasteful overlap of resources and personnel can be avoided by
consolidating data and applications across departments.
Historical Development of Database Technology

u Early Database Applications:


u The Hierarchical and Network Models were introduced in
mid 1960s and dominated during the seventies.
u A large share of worldwide database processing still occurs
using these models, particularly the hierarchical model.
u Relational Model based Systems:
u The relational model was introduced in 1970 and was
heavily researched and experimented with at IBM Research
and several universities.
u Relational DBMS Products emerged in the early 1980s.
Historical Development of Database Technology
(continued)
u Object-oriented and emerging applications:
u Object-Oriented Database Management Systems (OODBMSs) were
introduced in late 1980s and early 1990s to cater to the need of
complex data processing in CAD and other applications.
u Their use has not taken off much.
u Many relational DBMSs have incorporated object database
concepts, leading to a new category called object-relational
DBMSs (ORDBMSs)
u Extended relational systems add further capabilities (e.g. for
multimedia data, XML, and other data types)
Historical Development of Database Technology
(continued)
u Data on the Web and E-commerce Applications:
u The Web contains data in HTML (Hypertext Markup Language) with links
among pages.
u This has given rise to a new set of applications and E-commerce is
using new standards like XML (eXtensible Markup Language).
u Script programming languages such as PHP and JavaScript allow
generation of dynamic Web pages that are partially generated
from a database
u Also allow database updates through Web pages
When not to use a DBMS

u Main inhibitors (costs) of using a DBMS:
u High initial investment and possible need for additional hardware.
u Overhead for providing generality, security, concurrency control,
recovery, and integrity functions.
u When a DBMS may be unnecessary:
u If the database and applications are simple, well defined, and not
expected to change.
u If there are stringent real-time requirements that may not be met
because of DBMS overhead.
u If access to data by multiple users is not required.
When not to use a DBMS

u When no DBMS may suffice:
u If the database system is not able to handle the complexity of data because of
modeling limitations
u If the database users need special operations not supported by the DBMS.
Summary

u Types of Databases and Database Applications
u Basic Definitions
u Typical DBMS Functionality
u Example of a Database (UNIVERSITY)
u Main Characteristics of the Database Approach
u Database Users
u Advantages of Using the Database Approach
u When Not to Use Databases
COMPONENTS OF DATABASE SYSTEM ENVIRONMENT

u Hardware
u Set of physical devices on which a database resides.
u Can range from a PC to a network of computers.
u Software
– database management system (DBMS)
– operating system
– application programs
– User Interface
u Data
u Used by the organization and a description of this data called the
schema.
COMPONENTS OF DATABASE SYSTEM ENVIRONMENT

u Procedures
u Instructions and rules that should be applied to the design and use of the
database.
u People
u Two different types of people (end-users and practitioners) are concerned
with the database.
CHAPTER 02
DATABASE ENVIRONMENT
Chapter Outline
• Data Models and Their Categories
• History of Data Models
• Schemas, Instances, and States
• Three-Schema Architecture
• Data Independence
• DBMS Languages and Interfaces
• Database System Utilities and Tools
• Centralized and Client-Server Architectures
• Classification of DBMSs
Data Models
• Data Model:
• A set of concepts to describe the structure of a database,
the operations for manipulating these structures, and
certain constraints that the database should obey.
• Data Model Structure and Constraints:
• Constructs are used to define the database structure
• Constructs typically include elements (and their data types)
as well as groups of elements (e.g. entity, record, table),
and relationships among such groups
• Constraints specify some restrictions on valid data; these
constraints must be enforced at all times
Data Models (continued)
• Data Model Operations:
• These operations are used for specifying database retrievals and
updates by referring to the constructs of the data model.
• Operations on the data model may include basic model operations
(e.g. generic insert, delete, update) and user-defined operations
(e.g. compute_student_gpa, update_inventory)
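The difference between a basic model operation and a user-defined operation can be sketched like this. The slide names compute_student_gpa but does not define it, so the table layout, the 4-point grade scale and the credit-weighted formula below are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grade_report ("
    " student_no INTEGER, course_no TEXT, credit_hours INTEGER, grade TEXT)"
)

GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}  # assumed scale

# Basic model operation: a generic insert.
conn.executemany(
    "INSERT INTO grade_report VALUES (?, ?, ?, ?)",
    [(17, "CS1310", 4, "A"), (17, "MATH2410", 3, "B"), (8, "CS1310", 4, "C")],
)


def compute_student_gpa(student_no):
    """User-defined operation: credit-weighted grade point average, or None."""
    rows = conn.execute(
        "SELECT credit_hours, grade FROM grade_report WHERE student_no = ?",
        (student_no,),
    ).fetchall()
    total_hours = sum(hours for hours, _grade in rows)
    if total_hours == 0:
        return None  # no graded courses on record
    points = sum(hours * GRADE_POINTS[grade] for hours, grade in rows)
    return points / total_hours
```

For student 17 this gives (4 × 4.0 + 3 × 3.0) / 7, roughly 3.57.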
Categories of Data Models
• Conceptual (high-level, semantic) data models:
• Provide concepts that are close to the way many users
perceive data.
• (Also called entity-based or object-based data models.)
• Physical (low-level, internal) data models:
• Provide concepts that describe details of how data is stored
in the computer. These are usually specified in an ad-hoc
manner through DBMS design and administration manuals
• Implementation (representational) data models:
• Provide concepts that fall between the above two, used by
many commercial DBMS implementations (e.g. relational
data models used in many commercial systems).
Schemas versus Instances
• Database Schema:
• The description of a database.
• Includes descriptions of the database structure, data types, and
the constraints on the database.
• Schema Diagram:
• An illustrative display of (most aspects of) a database schema.
• Schema Construct:
• A component of the schema or an object within the schema, e.g.,
STUDENT, COURSE.
Schemas versus Instances
• Database State:
• The actual data stored in a database at a particular moment in
time. This includes the collection of all the data in the database.
• Also called database instance (or occurrence or snapshot).
• The term instance is also applied to individual database
components, e.g. record instance, table instance, entity
instance
Database Schema vs. Database State
• Database State:
• Refers to the content of a database at a moment in time.
• Initial Database State:
• Refers to the database state when it is initially loaded into the
system.
• Valid State:
• A state that satisfies the structure and constraints of the database.
Database Schema vs. Database State
(continued)
• Distinction
• The database schema changes very infrequently.
• The database state changes every time the database is updated.

• Schema is also called intension.
• State is also called extension.
Example of a Database Schema
Example of a database state
Three-Level Architecture
• Proposed to support DBMS characteristics of:
• Program-data independence.
• Support of multiple views of the data.
• Not explicitly used in commercial DBMS products, but has been useful
in explaining database system organization
Three-Level Architecture
• Defines DBMS schemas at three levels:
• Internal schema at the internal level to describe physical
storage structures and access paths (e.g. indexes).
• Typically uses a physical data model.
• Conceptual schema at the conceptual level to describe the
structure and constraints for the whole database for a
community of users.
• Uses a conceptual or an implementation data model.
• External schemas at the external level to describe the
various user views.
• Usually uses the same data model as the conceptual schema.
Three-Level Architecture
Three-Level Architecture
• Mappings among schema levels are needed to transform requests and
data.
• Programs refer to an external schema, and are mapped by the
DBMS to the internal schema for execution.
• Data extracted from the internal DBMS level is reformatted to
match the user’s external view (e.g. formatting the results of an
SQL query for display in a Web page)
Data Independence
• Logical Data Independence:
• The capacity to change the conceptual schema without having to
change the external schemas and their associated application
programs.
• Physical Data Independence:
• The capacity to change the internal schema without having to
change the conceptual schema.
• For example, the internal schema may be changed when certain
file structures are reorganized or new indexes are created to
improve database performance
Data Independence (continued)
• When a schema at a lower level is changed, only the mappings
between this schema and higher-level schemas need to be changed in
a DBMS that fully supports data independence.
• The higher-level schemas themselves are unchanged.
• Hence, the application programs need not be changed since they
refer to the external schemas.
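Physical data independence can be demonstrated on a small scale: adding an index changes the internal level, yet the query text and its result stay the same. The sketch uses sqlite3; the table, the index name and the helper cs_majors are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT, major TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS"), (21, "Perera", "MATH")],
)

# The application's query refers only to the table and its columns.
QUERY = "SELECT name FROM student WHERE major = ? ORDER BY name"


def cs_majors():
    return [name for (name,) in conn.execute(QUERY, ("CS",))]


before = cs_majors()

# Internal-level change: a new access path is created to improve performance.
conn.execute("CREATE INDEX idx_student_major ON student(major)")

after = cs_majors()  # same query text, same answer
index_names = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
```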
DBMS Languages
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• High-Level or Non-procedural Languages: These include the
relational language SQL
• May be used in a standalone way or may be embedded in a
programming language
• Low Level or Procedural Languages:
• These must be embedded in a programming language
DBMS Languages
• Data Definition Language (DDL):
• Used by the DBA and database designers to specify the conceptual schema of a
database.
• In many DBMSs, the DDL is also used to define internal and external schemas
(views).
• In some DBMSs, separate storage definition language (SDL) and view definition
language (VDL) are used to define internal and external schemas.
• SDL is typically realized via DBMS commands provided to the DBA and database
designers
DBMS Languages
• Data Manipulation Language (DML):
• Used to specify database retrievals and updates
• DML commands (data sublanguage) can be embedded in a general-
purpose programming language (host language), such as COBOL, C,
C++, or Java.
• A library of functions can also be provided to access the DBMS
from a programming language
• Alternatively, stand-alone DML commands can be applied directly
(called a query language).
Types of DML
• High Level or Non-procedural Language:
• For example, the SQL relational language
• Are “set”-oriented and specify what data to retrieve rather than
how to retrieve it.
• Also called declarative languages.
• Low Level or Procedural Language:
• Retrieve data one record-at-a-time;
• Constructs such as looping are needed to retrieve multiple records,
along with positioning pointers.
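The contrast between the two styles can be sketched in Python with sqlite3. The first total is computed declaratively by one set-oriented SQL statement; the second imitates the record-at-a-time style with an explicit loop in the host language. The employee table is an assumption, and the loop only mimics a procedural DML, it is not one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Amal", "Sales", 85000), ("Nimal", "IT", 120000), ("Kamala", "IT", 64000)],
)

# Declarative: state WHAT is wanted; the DBMS decides how to find it.
declarative_total = conn.execute(
    "SELECT SUM(salary) FROM employee WHERE department = 'IT'"
).fetchone()[0]

# Record-at-a-time: the program loops, tests and accumulates by itself.
cursor = conn.execute("SELECT name, department, salary FROM employee")
procedural_total = 0
while True:
    record = cursor.fetchone()  # advance to the next record
    if record is None:          # no more records
        break
    _name, department, salary = record
    if department == "IT":
        procedural_total += salary
```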
DBMS Interface
• Stand-alone query language interfaces
• Example: Entering SQL queries at the DBMS interactive SQL
interface (e.g. SQL*Plus in ORACLE)
• Programmer interfaces for embedding DML in programming languages
• User-friendly interfaces
• Menu-based, forms-based, graphics-based, etc.
DBMS Programming Language Interfaces
• Programmer interfaces for embedding DML in programming
languages:
• Embedded Approach: e.g. embedded SQL (for C, C++, etc.), SQLJ
(for Java)
• Procedure Call Approach: e.g. JDBC for Java, ODBC for other
programming languages
• Database Programming Language Approach: e.g. ORACLE has
PL/SQL, a programming language based on SQL; the language incorporates
SQL and its data types as integral components
User-Friendly DBMS Interfaces

• Menu-based, popular for browsing on the web
• Forms-based, designed for naïve users
• Graphics-based
• (Point and Click, Drag and Drop, etc.)
• Natural language: requests in written English
• Combinations of the above:
• For example, both menus and forms used extensively in Web
database interfaces
Other DBMS Interfaces
• Speech as Input and Output
• Web Browser as an interface
• Parametric interfaces, e.g., bank tellers using function keys.
• Interfaces for the DBA:
• Creating user accounts, granting authorizations
• Setting system parameters
• Changing schemas or access paths
Database System Utilities
• To perform certain functions such as:
• Loading data stored in files into a database. Includes data
conversion tools.
• Backing up the database periodically on tape.
• Reorganizing database file structures.
• Report generation utilities.
• Performance monitoring utilities.
• Other functions, such as sorting, user monitoring, data
compression, etc.
Other Tools
• Data dictionary / repository:
• Used to store schema descriptions and other information such as
design decisions, application program descriptions, user
information, usage standards, etc.
• Active data dictionary is accessed by DBMS software and users/
DBA.
• Passive data dictionary is accessed by users/DBA only.
Typical DBMS Component Modules
Centralized and
Client-Server DBMS Architectures
• Centralized DBMS:
• Combines everything into a single system, including DBMS software,
hardware, application programs, and user interface processing
software.
• User can still connect through a remote terminal – however, all
processing is done at the centralized site.
A Physical Centralized Architecture
Basic 2-tier Client-Server Architectures
• Specialized Servers with Specialized functions
• Print server
• File server
• DBMS server
• Web server
• Email server
• Clients can access the specialized servers as needed
Logical two-tier client server architecture
DBMS Server
• Provides database query and transaction services to the clients
• Relational DBMS servers are often called SQL servers, query servers, or
transaction servers
• Applications running on clients utilize an Application Program Interface
(API) to access server databases via a standard interface such as:
• ODBC: Open Database Connectivity standard
• JDBC: for Java programming access
• Client and server must install appropriate client module and server
module software for ODBC or JDBC
Two Tier Client-Server Architecture
• A client program may connect to several DBMSs, sometimes called the
data sources.
• In general, data sources can be files or other non-DBMS software that
manages data.
• Other variations of clients are possible: e.g., in some object DBMSs,
more functionality is transferred to clients including data dictionary
functions, optimization and recovery across multiple servers, etc.
Three Tier Client-Server Architecture
• Common for Web applications
• Intermediate Layer called Application Server or Web Server:
• Stores the web connectivity software and the business logic part of
the application used to access the corresponding data from the
database server
• Acts like a conduit for sending partially processed data between
the database server and the client.
• Three-tier Architecture Can Enhance Security:
• Database server only accessible via middle tier
• Clients cannot directly access database server

Three-tier client-server architecture

Classification of DBMSs
• Based on the data model used
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
• Other classifications
• Single-user (typically used with personal computers)
vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one database)
vs. distributed (uses multiple computers, multiple databases)
Variations of Distributed DBMSs (DDBMSs)
• Homogeneous DDBMS
• Heterogeneous DDBMS
• Federated or Multi-database Systems
Cost considerations for DBMSs
• Cost Range: from free open-source systems to configurations costing millions
of dollars
• Examples of free relational DBMSs: MySQL, PostgreSQL, others
• Commercial DBMSs offer additional specialized modules, e.g. time-series
module, spatial data module, document module, XML module
• These offer additional specialized functionality when purchased separately
• Sometimes called cartridges (e.g., in Oracle) or blades
• Different licensing options: site license, maximum number of concurrent users
(seat license), single user, etc.
History of Data Models

• Network Model
• Hierarchical Model
• Relational Model
• Object-oriented Data Models
• Object-Relational Models

History of Data Models
• Network Model:
• The first network DBMS was implemented by Honeywell in 1964-65
(IDS System).
• Adopted heavily due to the support by CODASYL (Conference on
Data Systems Languages) (CODASYL - DBTG report of 1971).
• Later implemented in a large variety of systems - IDMS (Cullinet -
now Computer Associates), DMS 1100 (Unisys), IMAGE (H.P.
(Hewlett-Packard)), VAX-DBMS (Digital Equipment Corp., next
COMPAQ, now H.P.).
Example of Network Model Schema

Network Model
• Advantages:
• Network Model is able to model complex relationships and
represents semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and
relationship types.
• Language is navigational; uses constructs like FIND, FIND member,
FIND owner, FIND NEXT within set, GET, etc.
• Programmers can do optimal navigation through the database.

Network Model
• Disadvantages:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through
a set of records.
• Little scope for automated “query optimization”

History of Data Models
• Hierarchical Data Model:
• Initially implemented in a joint effort by IBM and North American
Rockwell around 1965. Resulted in the IMS family of systems.
• IBM’s IMS product had (and still has) a very large customer base
worldwide
• Hierarchical model was formalized based on the IMS system
• Other systems based on this model: System 2k (SAS Inc.)
Hierarchical Model
• Advantages:
• Simple to construct and operate
• Corresponds to a number of natural hierarchically organized
domains, e.g., organization (“org”) chart
• Language is simple:
• Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT, etc.
• Disadvantages:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
History of Data Models
• Relational Model:
• Proposed in 1970 by E.F. Codd (IBM), first commercial system in
1981-82.
• Now in several commercial products (e.g. DB2, ORACLE, MS SQL
Server, SYBASE, INFORMIX).
• Several free open source implementations, e.g. MySQL, PostgreSQL
• Currently most dominant for developing database applications.
• SQL relational standards: SQL-89 (SQL1), SQL-92 (SQL2), SQL-99
(SQL3), …

History of Data Models
• Object-oriented Data Models:
• Several models have been proposed for implementing in a database
system.
• One set comprises models of persistent O-O Programming
Languages such as C++ (e.g., in OBJECTSTORE or VERSANT), and
Smalltalk (e.g., in GEMSTONE).
• Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS
(at H.P.- used in Open OODB).
• Object Database Standard: ODMG-93, ODMG-version 2.0, ODMG-
version 3.0.
• Chapters 20 and 21 describe this model.
History of Data Models
• Object-Relational Models:
• Most Recent Trend. Started with Informix Universal Server.
• Relational systems incorporate concepts from object databases
leading to object-relational.
• Exemplified in versions of Oracle (e.g. Oracle 10g), DB2, SQL
Server and other DBMSs.
• Standards included in SQL-99 and expected to be enhanced in
future SQL standards.

Summary
• Data Models and Their Categories
• History of Data Models
• Schemas, Instances, and States
• Three-Schema Architecture
• Data Independence
• DBMS Languages and Interfaces
• Database System Utilities and Tools
• Centralized and Client-Server Architectures
• Classification of DBMSs

Data Modeling Using
the Entity-Relationship (ER)
Model
Chapter Outline
u Overview of Database Design Process
u Example Database Application (COMPANY)
u ER Model Concepts
u Entities and Attributes
u Entity Types, Value Sets, and Key Attributes
u Relationships and Relationship Types
u Weak Entity Types
u Roles and Attributes in Relationship Types
u ER Diagrams - Notation
u ER Diagram for COMPANY Schema
Overview of Database Design Process
u Two main activities:
u Database design
u Applications design
u Focus in this chapter on database design
u To design the conceptual schema for a database application
u Applications design focuses on the programs and interfaces that
access the database
u Generally considered part of software engineering
Database Design Process
stages in the design of a database:
u requirement analysis
u conceptual database design
u choice of the DBMS
u data model mapping
u physical design
u implementation
Requirement Gathering and Analysis
Purpose: to document the data requirements of the users
Functional requirements are the operations that will be applied to the database,
including queries and updates
The specification will then be used as the basis for the design of the database
Typical activities:
u identification of application areas and user groups
u analysis of existing documentation of application areas, e.g. policy documents,
forms, reports, organization charts
u analysis of current operating environments and the planned use of the information,
e.g. information flow, types of transactions, frequency of transaction types
u responses to user questionnaires are analyzed
In Other words
start from a description of the requirements which is:
u poorly structured,
u heterogeneous
u informal
and use a technique to transform that into a specification of the database
requirements which is:
u formal
u homogeneous
u consistent
u complete
Conceptual Design
Two parallel activities
u Schema Design
look at the data requirements resulting from the analysis (phase 1) and produce a
conceptual schema in a DBMS-independent high level data model
u Transaction Design
look at the database applications whose requirements were analyzed in phase 1
and produce high level specifications for these transactions
Conceptual Schema Design
Purpose: to produce a conceptual schema of the database
u Expressed using the concepts of the high level data model
u Not including implementation details (has to be understood by
non-technical users)
u but detailed in terms of the “objects” of the domain the database
will represent
u independent of the DBMS to be used (no relational DB-oriented
notions!)
u cannot be used directly to implement the database
design is made in terms of a semantic or conceptual data model
Transaction Design
Purpose: to produce a design of the transactions, that will run on the
database
u retrieval: retrieve data for display or as part of a report
u update: enter new data or amend existing data
u mixed: more complex applications may do both retrieval and update
Why?
u need to be sure to include in the conceptual schema all information
required by transactions
u relative importance and frequency of use of transactions will
influence physical database design
u ... the software needs to be designed as well as the data!
Choosing a DBMS
Purpose: Deciding the best framework for implementing the produced
schema:
u type of DBMS (relational, network, deductive, Object Oriented, ...)
u user and programmer interfaces
u type of query languages
choice made on the basis of
u technical factors:
the DBMS has to support the required tasks
u economic factors:
software acquisition/maintenance, hardware acquisition,
creation/conversion, training of staff
u organizational factors:
platforms supported, availability of vendor services
Logical Design
Purpose: to transform the generic, DBMS independent conceptual
schema into the data model of the chosen DBMS (data model mapping)
Two stages:
u system independent mapping: no consideration of any specific
characteristics that may apply to the specific DBMS package
u tailoring to DBMS: different DBMSs may implement the same data
model in slightly different ways
u result is a set of Data Description Language (DDL) statements in the
language of the chosen DBMS
Physical Design
Purpose: to choose the specific storage structures and access paths for
the database files
Attention is paid to performance. Some relevant criteria:
u response time: may want to minimise database access time for data
items referenced by frequently used transactions
u space utilisation: less frequently used data and queries may be
archived
u transaction throughput: average number of transactions that can be
processed per minute
Implementation
Purpose: to create the database
u compile and execute DDL statements
populate the database
u Manually / Automatically (May need to convert data from previous
formats)
u Implement application programs (transactions)
u Programs are written with embedded DML statements
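The implementation steps above can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as the DBMS; the table, the sample rows and the function name department_name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# 1. Execute a DDL statement to create part of the schema.
conn.execute("""
    CREATE TABLE department (
        dnumber INTEGER PRIMARY KEY,
        dname   TEXT NOT NULL UNIQUE
    )
""")

# 2. Populate the database (here manually, from a literal list of sample rows).
rows = [(5, "Research"), (4, "Administration"), (1, "Headquarters")]
conn.executemany("INSERT INTO department (dnumber, dname) VALUES (?, ?)", rows)

# 3. An application program issues DML statements from the host language.
def department_name(conn, dnumber):
    """Retrieval transaction: look up a department name by its number."""
    row = conn.execute(
        "SELECT dname FROM department WHERE dnumber = ?", (dnumber,)
    ).fetchone()
    return row[0] if row else None

print(department_name(conn, 5))
```

In a real project the DDL would normally live in a separate script and the data would often be converted from an earlier format rather than typed in.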
Overview of Database Design Process

Example COMPANY Database
u We need to create a database schema design based on the following
(simplified) requirements of the COMPANY Database:
u The company is organized into DEPARTMENTs. Each department has
a name, number and an employee who manages the department.
We keep track of the start date of the department manager. A
department may have several locations.
u Each department controls a number of PROJECTs. Each project has
a unique name, unique number and is located at a single location.
Example COMPANY Database (Contd.)
u We store each EMPLOYEE’s social security number, address, salary,
sex, and birthdate.
u Each employee works for one department but may work on
several projects.
u We keep track of the number of hours per week that an
employee currently works on each project.
u We also keep track of the direct supervisor of each employee.
u Each employee may have a number of DEPENDENTs.
u For each dependent, we keep track of their name, sex,
birthdate, and relationship to the employee.
Entity-Relationship Model
u model to express the conceptual schema of the database
u originally proposed in 1976 by Peter Chen in the journal “ACM Transactions
on Database Systems” as a means to unify the network and
relational DB models
used routinely for system analysis and design
u simple enough to learn and understand the basic concepts
u Powerful enough to be used in the development of complex
applications
conceptual designs using the ER model are called ER schemas
ER Model Concepts
Entities and Attributes
u Entities are specific objects or things in the mini-world that are
represented in the database.
u For example the EMPLOYEE John Smith, the Research
DEPARTMENT, the ProductX PROJECT
u Attributes are properties used to describe an entity.
u For example an EMPLOYEE entity may have the attributes
Name, SSN, Address, Sex, BirthDate
u A specific entity will have a value for each of its attributes.
u For example a specific employee entity may have Name='John
Smith', SSN='123456789', Address ='731, Fondren, Houston, TX',
Sex='M', BirthDate='09-JAN-55'
u Each attribute has a value set (or data type) associated with it –
e.g. integer, string, subrange, enumerated type, …
Types of Attributes
u Simple
u Each entity has a single atomic value for the attribute. For
example, SSN or Sex.
u Composite
u The attribute may be composed of several components. For
example:
u Address(Apt#, House#, Street, City, State, ZipCode, Country),
or
u Name(FirstName, MiddleName, LastName).
u Composition may form a hierarchy where some components are
themselves composite.
u Multi-valued
u An entity may have multiple values for that attribute. For
example, Color of a CAR or PreviousDegrees of a STUDENT.
u Denoted as {Color} or {PreviousDegrees}.
Types of Attributes (2)
u In general, composite and multi-valued attributes may be nested
arbitrarily to any number of levels, although this is rare.
u For example, PreviousDegrees of a STUDENT is a composite multi-
valued attribute denoted by {PreviousDegrees (College, Year,
Degree, Field)}
u Multiple PreviousDegrees values can exist
u Each has four subcomponent attributes:
u College, Year, Degree, Field
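One way to picture composite and multi-valued attributes is with nested record types. The sketch below uses Python dataclasses; the class names, field names and sample values are assumptions chosen for illustration and are not part of the ER notation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Name:
    """Composite attribute Name(FirstName, MiddleName, LastName)."""
    first: str
    middle: str
    last: str

@dataclass
class PreviousDegree:
    """One value of {PreviousDegrees(College, Year, Degree, Field)}."""
    college: str
    year: int
    degree: str
    field_of_study: str

@dataclass
class Student:
    name: Name                                   # composite, single-valued
    previous_degrees: List[PreviousDegree] = field(default_factory=list)  # composite, multi-valued

s = Student(Name("Jane", "Q", "Doe"))
s.previous_degrees.append(PreviousDegree("College A", 2018, "BSc", "Physics"))
s.previous_degrees.append(PreviousDegree("College B", 2020, "MSc", "Computer Science"))
print(len(s.previous_degrees), s.name.last)
```

The list holds any number of PreviousDegrees values, and each value carries its four subcomponents.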
Example of a composite attribute
Entity Types and Key Attributes
u Entities with the same basic attributes are grouped or typed into an
entity type.
u For example, the entity type EMPLOYEE and PROJECT.
u An attribute of an entity type for which each entity must have a
unique value is called a key attribute of the entity type.
u For example, SSN of EMPLOYEE.
Entity Types and Key Attributes
u A key attribute may be composite.
u For example, license plate number, with components (Number, Province), is a key of the CAR entity type
u An entity type may have more than one key.
u The CAR entity type may have two keys:
u Vehicle Identification Number (popularly called VIN)
u License plate number (Number, Province)
u Each key is underlined
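The idea that an entity type can have two keys, one of them composite, can be tried out with uniqueness constraints. This is a sketch only, assuming SQLite through Python's sqlite3 module; the column names and sample cars are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE car (
        vin            TEXT PRIMARY KEY,          -- first key (simple)
        plate_number   TEXT NOT NULL,
        plate_province TEXT NOT NULL,
        make           TEXT,
        UNIQUE (plate_number, plate_province)     -- second key (composite)
    )
""")
conn.execute("INSERT INTO car VALUES ('VIN001', 'ABC-1234', 'Western', 'Toyota')")
# The same number in a different province is allowed: only the pair must be unique.
conn.execute("INSERT INTO car VALUES ('VIN002', 'ABC-1234', 'Southern', 'Honda')")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO car VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

dup_plate = try_insert(("VIN003", "ABC-1234", "Western", "Nissan"))  # repeats key 2
dup_vin = try_insert(("VIN001", "XYZ-9999", "Western", "Nissan"))    # repeats key 1
print(dup_plate, dup_vin)
```

Both rejected rows would have given two cars the same value for one of the keys.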
Displaying an Entity type
u In ER diagrams, an entity type is displayed in a rectangular box
u Attributes are displayed in ovals
u Each attribute is connected to its entity type
u Components of a composite attribute are connected to the oval
representing the composite attribute
u Each key attribute is underlined
u Multivalued attributes displayed in double ovals
u See CAR example on next slide
Entity Type CAR with two keys and a
corresponding Entity Set

Entity Set
u Each entity type will have a collection of entities stored in the
database
u Called the entity set
u Previous slide shows three CAR entity instances in the entity set for
CAR
u Same name (CAR) used to refer to both the entity type and the entity
set
u Entity set is the current state of the entities of that type that are
stored in the database
Initial Design of Entity Types for the
COMPANY Database Schema

u Based on the requirements, we can identify four initial entity types in
the COMPANY database:
u DEPARTMENT
u PROJECT
u EMPLOYEE
u DEPENDENT
u Their initial design is shown on the following slide
u The initial attributes shown are derived from the requirements
description
Initial Design of Entity Types:
EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT

Refining the initial design by introducing
relationships

u The initial design is typically not complete


u Some aspects in the requirements will be represented as relationships
u ER model has three main concepts:
u Entities (and their entity types and entity sets)
u Attributes (simple, composite, multivalued)
u Relationships (and their relationship types and relationship sets)
Relationships and Relationship Types
u A relationship relates two or more distinct entities with a specific
meaning.
u For example, EMPLOYEE John Smith works on the ProductX
PROJECT, or EMPLOYEE Franklin Wong manages the Research
DEPARTMENT.
u Relationships of the same type are grouped or typed into a
relationship type.
u For example, the WORKS_ON relationship type in which
EMPLOYEEs and PROJECTs participate, or the MANAGES
relationship type in which EMPLOYEEs and DEPARTMENTs
participate.
u The degree of a relationship type is the number of participating entity
types.
u Both MANAGES and WORKS_ON are binary relationships.
Relationship instances
u In the mini-world represented by the figure below, employees
e1, e3 and e6 work for department d1, e2 and e4 work
for d2, and e5 and e7 work for d3.
Relationship type vs. relationship set
u Relationship Type:
u Is the schema description of a relationship
u Identifies the relationship name and the participating entity types
u Also identifies certain relationship constraints
u Relationship Set:
u The current set of relationship instances represented in the
database
u The current state of a relationship type
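The distinction can be made concrete with a small sketch: the relationship type is a fixed description, while the relationship set is data that changes over time. The instances and the helper employees_of below are made up for illustration.

```python
# Relationship type: schema-level description (name and participating entity types).
WORKS_FOR_TYPE = ("WORKS_FOR", ("EMPLOYEE", "DEPARTMENT"))

# Relationship set: the current instances, one (employee, department) pair each.
works_for = {
    ("Smith", "Research"),
    ("Wong", "Research"),
    ("Wallace", "Administration"),
}

def employees_of(department):
    """Employees related to the given department in the current state."""
    return sorted(e for e, d in works_for if d == department)

before = employees_of("Administration")
works_for.add(("Jabbar", "Administration"))   # the state changes ...
after = employees_of("Administration")        # ... the type does not
print(before, after)
```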
Relationship types and instances
Relationship type vs. relationship set
u Previous figures displayed the relationship sets
u Each instance in the set relates individual participating entities – one
from each participating entity type
u In ER diagrams, we represent the relationship type as follows:
u Diamond-shaped box is used to display a relationship type
u Connected to the participating entity types via straight lines
Quiz 02
u What is the main purpose of the database design process?
u What are the main stages of the database design process?
u What is a data model?
u List the categories of data models and describe two of them.
u Distinguish between the database schema and the database state.
u What is a valid state of a database?
u What are the three levels of the three-layer architecture?
u What is DDL? Who uses DDL, and for what purpose?
Refining the COMPANY database schema by
introducing relationships
u By examining the requirements, six relationship types are identified
u All are binary relationships (degree 2)
u Listed below with their participating entity types:
u WORKS_FOR (between EMPLOYEE, DEPARTMENT)
u MANAGES (also between EMPLOYEE, DEPARTMENT)
u CONTROLS (between DEPARTMENT, PROJECT)
u WORKS_ON (between EMPLOYEE, PROJECT)
u SUPERVISION (between EMPLOYEE (as subordinate), EMPLOYEE (as
supervisor))
u DEPENDENTS_OF (between EMPLOYEE, DEPENDENT)
ER Diagram For Company Database

Discussion on Relationship Types
u In the refined design, some attributes from the initial entity
types are refined into relationships:
u Manager of DEPARTMENT -> MANAGES
u Works_on of EMPLOYEE -> WORKS_ON
u Department of EMPLOYEE -> WORKS_FOR
u In general, more than one relationship type can exist between
the same participating entity types
u MANAGES and WORKS_FOR are distinct relationship types between
EMPLOYEE and DEPARTMENT
u With different meanings and different relationship instances.
Recursive Relationship Type
u Is a relationship type with the same participating entity type in
distinct roles
u Example: the SUPERVISION relationship
u EMPLOYEE participates twice in two distinct roles:
u supervisor (or boss) role
u supervisee (or subordinate) role
u Each relationship instance relates two distinct EMPLOYEE entities:
u One employee in supervisor role
u One employee in supervisee role
Example of relationships of different
degrees
(a) Unary recursive relationships
Weak Entity Types
u A weak entity type is an entity type that does not have a key attribute of its own
u A weak entity must participate in an identifying relationship type with
an owner or identifying entity type
u Entities are identified by the combination of:
u A partial key of the weak entity type
u The particular entity they are related to in the identifying entity
type
u Example:
u A DEPENDENT entity is identified by the dependent’s first name,
and the specific EMPLOYEE with whom the dependent is related
u Name of DEPENDENT is the partial key
u DEPENDENT is a weak entity type
u EMPLOYEE is its identifying entity type via the identifying
relationship type DEPENDENTS_OF
Weak Entity Type
Constraints on Relationships
u Constraints on Relationship Types
u Cardinality Ratio (specifies maximum participation)
u Also known as ratio constraints
u One-to-one (1:1)
u One-to-many (1:N) or Many-to-one (N:1)
u Many-to-many (M:N)
u Existence Dependency Constraint (specifies minimum
participation) (also called participation constraint)
u zero (optional participation, not existence-dependent)
u one or more (mandatory participation, existence-dependent)
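A quick way to get a feel for both constraints is to inspect a relationship set. The sketch below is an assumption-laden illustration: the function names and sample pairs are invented, and it only reports the tightest ratio a given state is consistent with. The actual constraint is declared on the schema and cannot be proven from one state.

```python
from collections import Counter

def cardinality_ratio(pairs):
    """Tightest left:right cardinality ratio consistent with a set of (left, right) pairs."""
    pairs = set(pairs)
    per_left = Counter(l for l, _ in pairs)    # right partners of each left entity
    per_right = Counter(r for _, r in pairs)   # left partners of each right entity
    left_side = "N" if any(n > 1 for n in per_right.values()) else "1"
    right_side = "N" if any(n > 1 for n in per_left.values()) else "1"
    if left_side == "N" and right_side == "N":
        return "M:N"
    return f"{left_side}:{right_side}"

def is_total(entities, pairs, side):
    """True if every entity takes part in at least one instance (side 0 = left, 1 = right)."""
    return set(entities) <= {p[side] for p in pairs}

manages = {("wong", "research"), ("wallace", "administration")}
works_for = {("e1", "d1"), ("e2", "d1"), ("e3", "d2")}
works_on = {("e1", "p1"), ("e1", "p2"), ("e2", "p1")}

print(cardinality_ratio(manages), cardinality_ratio(works_for), cardinality_ratio(works_on))
print(is_total(["e1", "e2", "e3"], works_for, 0), is_total(["d1", "d2", "d3"], works_for, 1))
```

Here every employee participates in works_for (mandatory on that side), while department d3 has no employees, so participation on the department side is optional in this state.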
Supervision Relationship
Manages Relationship
Recursive Relationship Type is: SUPERVISION
(participation role names are shown)

Attributes of Relationship types
u A relationship type can have attributes:
u For example, HoursPerWeek of WORKS_ON
u Its value for each relationship instance describes the number of
hours per week that an EMPLOYEE works on a PROJECT.
u A value of HoursPerWeek depends on a particular (employee,
project) combination
u Most relationship attributes are used with M:N relationships
u In 1:N relationships, they can be transferred to the entity type on
the N-side of the relationship
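A small sketch of why HoursPerWeek belongs to the relationship and not to either entity: the value is keyed by the (employee, project) pair. The sample hours and the works_for_since attribute are hypothetical, added only to show the 1:N case.

```python
# M:N relationship WORKS_ON: the attribute is attached to each (employee, project) pair.
hours_per_week = {
    ("Smith", "ProductX"): 32.5,
    ("Smith", "ProductY"): 7.5,
    ("Wong", "ProductY"): 10.0,
}

def total_hours(employee):
    """Sum of hours over all projects the employee works on."""
    return sum(h for (e, _), h in hours_per_week.items() if e == employee)

# 1:N relationship WORKS_FOR: each employee has at most one department, so a
# relationship attribute (a hypothetical start date) can be stored with the
# employee, the entity on the N-side.
employee = {"Smith": {"department": "Research", "works_for_since": "2020-01-15"}}

print(total_hours("Smith"), employee["Smith"]["works_for_since"])
```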
Notation for Constraints on Relationships
u Cardinality ratio (of a binary relationship): 1:1, 1:N, N:1, or M:N
u Shown by placing appropriate numbers on the relationship edges.
u Participation constraint (on each participating entity type): total
(called existence dependency) or partial.
u Total shown by double line, partial by single line.
u NOTE: These are easy to specify for Binary Relationship Types.
Alternative diagrammatic notation
u ER diagrams are one popular way of displaying database schemas
u Many other notations exist in the literature and in various database
design and modeling tools
u Appendix A illustrates some of the alternative notations that have
been used
u UML class diagrams are representative of another way of displaying ER
concepts that is used in several commercial design tools
Summary of notation for ER diagrams

Relationships of Higher Degree
u Relationship types of degree 2 are called binary
u Relationship types of degree 3 are called ternary and of degree n are
called n-ary
u In general, an n-ary relationship is not equivalent to n binary
relationships
u Constraints are harder to specify for higher-degree relationships (n >
2) than for binary relationships
n-ary relationships (n > 2)
u In general, 3 binary relationships can represent different information
than a single ternary relationship (see Figure 3.17a and b on next
slide)

u If needed, the binary and n-ary relationships can all be included in the
schema design (see Figure 3.17a and b, where all relationships convey
different meanings)

u In some cases, a ternary relationship can be represented as a weak
entity if the data model allows a weak entity type to have multiple
identifying relationships (and hence multiple owner entity types) (see
Figure 3.17c)
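The claim that three binary relationships can carry different information than one ternary relationship can be checked on a tiny example. The sketch below assumes a ternary relationship over (supplier, part, project) with invented instances: it splits the ternary set into three binary sets and then tries to rebuild it.

```python
# Ternary relationship instances: (supplier, part, project).
supply = {
    ("s1", "bolt", "p1"),
    ("s1", "nut", "p2"),
    ("s2", "bolt", "p2"),
}

# Project the ternary set onto three binary relationship sets.
can_supply = {(s, p) for s, p, j in supply}   # supplier - part
uses = {(p, j) for s, p, j in supply}         # part - project
supplies = {(s, j) for s, p, j in supply}     # supplier - project

# Rebuild: every triple whose three pairs all appear in the binary sets.
rebuilt = {
    (s, p, j)
    for s, p in can_supply
    for p2, j in uses
    if p2 == p and (s, j) in supplies
}

spurious = rebuilt - supply
print(spurious)
```

The rebuilt set contains a triple that was never recorded: s1 supplies bolts, bolts are used in p2, and s1 supplies something to p2, yet s1 never supplied bolts to p2. The binary sets alone cannot tell these situations apart.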
Example of a ternary relationship
n-ary relationships (n > 2)
u If a particular binary relationship can be derived from a higher-degree
relationship at all times, then it is redundant

u For example, the TAUGHT_DURING binary relationship in Figure 3.18
(see next slide) can be derived from the ternary relationship OFFERS
(based on the meaning of the relationships)
Another example of a ternary relationship
Displaying constraints on higher-degree
relationships

u Displaying a 1, M, or N indicates additional constraints


u An M or N indicates no constraint
u A 1 indicates that an entity can participate in at most one
relationship instance that has a particular combination of the
other participating entities
Extended Entity-Relationship (EER)
Model (in next chapter)

u The entity relationship model in its original form did not support the
specialization and generalization abstractions
u Next chapter illustrates how the ER model can be extended with
u Type-subtype and set-subset relationships
u Specialization/Generalization Hierarchies
u Notation to display them in EER diagrams
Chapter Summary

u ER Model Concepts: Entities, attributes, relationships


u Constraints in the ER model
u Using ER in step-by-step conceptual schema design for the COMPANY database
u ER Diagrams - Notation
Enhanced Entity-Relationship Modeling
Lecture 04
ICT 1407 Database Systems
Chapter Outline
u EER Model Concepts
u Includes all modeling concepts of basic ER
u Additional concepts:
u Sub-classes/super-classes
u specialization/generalization
u categories (UNION types)
u attribute and relationship inheritance
u These are fundamental to conceptual modeling
u The additional EER concepts are used to model applications more
completely and more accurately
u EER includes some object-oriented concepts, such as inheritance
Sub-classes and Super-classes
u An entity type may have additional meaningful subgroupings of its
entities
u Example: EMPLOYEE may be further grouped into:
u SECRETARY, ENGINEER, TECHNICIAN, …
u Based on the EMPLOYEE’s Job
u MANAGER
u EMPLOYEEs who are managers
u SALARIED_EMPLOYEE, HOURLY_EMPLOYEE
u Based on the EMPLOYEE’s method of pay
u EER diagrams extend ER diagrams to represent these additional
subgroupings, called subclasses or subtypes
Sub-classes and Super-classes
Sub-classes and Super-classes

u Each of these subgroupings is a subset of EMPLOYEE
entities
u Each is called a sub-class of EMPLOYEE
u EMPLOYEE is the super-class for each of these subclasses
u These are called superclass/subclass relationships:
u EMPLOYEE/SECRETARY
u EMPLOYEE/TECHNICIAN
u EMPLOYEE/MANAGER
u …
Sub-classes and Super-classes
u These are also called IS-A relationships
u SECRETARY IS-A EMPLOYEE, TECHNICIAN IS-A EMPLOYEE, ….
u Note: An entity that is member of a subclass represents
the same real-world entity as some member of the
superclass:
u The subclass member is the same entity in a distinct specific
role
u An entity cannot exist in the database merely by being a
member of a subclass; it must also be a member of the
superclass
u A member of the superclass can be optionally included as a
member of any number of its subclasses
Sub-classes and Super-classes
u Examples:
u A salaried employee who is also an engineer belongs to the two
subclasses:
u ENGINEER, and
u SALARIED_EMPLOYEE
u A salaried employee who is also an engineering manager belongs to
the three subclasses:
u MANAGER,
u ENGINEER, and
u SALARIED_EMPLOYEE

u It is not necessary that every entity in a superclass be a
member of some subclass
Attribute Inheritance in Super-class / Sub-class Relationships
u An entity that is member of a subclass inherits
u All attributes of the entity as a member of the superclass
u All relationships of the entity as a member of the superclass
u Example:
u In the next slide, SECRETARY (as well as TECHNICIAN and
ENGINEER) inherit the attributes Name, SSN, …, from EMPLOYEE
u Every SECRETARY entity will have values for the inherited
attributes
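Attribute inheritance resembles class inheritance in an object-oriented language. The sketch below uses Python dataclasses as a rough analogy; the sample values are invented. One difference to keep in mind: an EER entity can belong to several subclasses at once, which a single Python object of one class does not capture.

```python
from dataclasses import dataclass

@dataclass
class Employee:                 # superclass
    name: str
    ssn: str

@dataclass
class Secretary(Employee):      # subclass: inherits name and ssn
    typing_speed: int           # specific (local) attribute of the subclass

sec = Secretary(name="Alicia Zelaya", ssn="999887777", typing_speed=70)

# Every SECRETARY entity has values for the inherited attributes,
# and is also a member of the superclass.
print(sec.name, sec.ssn, sec.typing_speed, isinstance(sec, Employee))
```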
Representing Specialization in EER Diagrams
Specialization

u Specialization is the process of defining a set of subclasses of a
superclass
u The set of subclasses is based upon some distinguishing characteristics
of the entities in the superclass
u Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization
of EMPLOYEE based upon job type.
u May have several specializations of the same superclass
Specialization
u Example: Another specialization of EMPLOYEE based on method of pay
is {SALARIED_EMPLOYEE, HOURLY_EMPLOYEE}.
u Superclass/subclass relationships and specialization can be
diagrammatically represented in EER diagrams

u Attributes of a subclass are called specific or local attributes.
u For example, the attribute TypingSpeed of SECRETARY
u The subclass can also participate in specific relationship types.
u For example, a relationship BELONGS_TO of
HOURLY_EMPLOYEE
Specialization
Generalization
u Generalization is the reverse of the specialization process
u Several classes with common features are generalized into
a superclass;
u original classes become its subclasses
u Example: CAR, TRUCK generalized into VEHICLE;
u both CAR, TRUCK become subclasses of the superclass
VEHICLE.
u We can view {CAR, TRUCK} as a specialization of VEHICLE
u Alternatively, we can view VEHICLE as a generalization of
CAR and TRUCK
Generalization
Generalization and Specialization
u Diagrammatic notations are sometimes used to distinguish between
generalization and specialization
u Arrow pointing to the generalized superclass represents a generalization
u Arrows pointing to the specialized subclasses represent a specialization
u We do not use this notation because it is often subjective as to which process is
more appropriate for a particular situation
u We advocate not drawing any arrows


Constraints on Specialization and Generalization

u If we can determine exactly those entities that will become members
of each subclass by a condition, the subclasses are called
predicate-defined (or condition-defined) subclasses
u Condition is a constraint that determines subclass members
u Display a predicate-defined subclass by writing the predicate
condition next to the line attaching the subclass to its superclass
Constraints on Specialization and Generalization
u If all subclasses in a specialization have their membership condition on the same
attribute of the superclass, the specialization is called an attribute-defined specialization
u The attribute is called the defining attribute of the specialization
u Example: JobType is the defining attribute of the specialization {SECRETARY,
TECHNICIAN, ENGINEER} of EMPLOYEE
u If no condition determines membership, the subclass is called user-defined
u Membership in a subclass is determined by the database users by applying an
operation to add an entity to the subclass
u Membership in the subclass is specified individually for each entity in the
superclass by the user
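The difference between attribute-defined and user-defined subclasses can be sketched as follows. The employee records, the job_type field standing in for JobType, and the MANAGER example are assumptions made up for the illustration.

```python
employees = [
    {"ssn": "111", "name": "A", "job_type": "Secretary"},
    {"ssn": "222", "name": "B", "job_type": "Engineer"},
    {"ssn": "333", "name": "C", "job_type": "Technician"},
    {"ssn": "444", "name": "D", "job_type": "Accountant"},  # matches no subclass
]

# Attribute-defined specialization: every predicate tests the same
# defining attribute (job_type), so membership is computed, not entered.
predicates = {
    "SECRETARY": lambda e: e["job_type"] == "Secretary",
    "ENGINEER": lambda e: e["job_type"] == "Engineer",
    "TECHNICIAN": lambda e: e["job_type"] == "Technician",
}
subclasses = {
    name: {e["ssn"] for e in employees if pred(e)}
    for name, pred in predicates.items()
}

# User-defined subclass: no predicate; a user adds each member explicitly.
manager = set()
manager.add("222")

print(subclasses, manager)
```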
Displaying an attribute-defined specialization in EER diagrams
Constraints on Specialization and Generalization

u Two basic constraints can apply to a specialization/generalization:

u Disjointness Constraint:

u Completeness Constraint:
Constraints on Specialization and Generalization
u Disjointness Constraint:

u Specifies that the subclasses of the specialization must be disjoint:
u an entity can be a member of at most one of the subclasses of the
specialization
u Specified by d in EER diagram
u If not disjoint, specialization is overlapping:
u that is, the same entity may be a member of more than one
subclass of the specialization
u Specified by o in EER diagram


Constraints on Specialization and Generalization
u Completeness Constraint:

u Total specifies that every entity in the superclass must be a
member of some subclass in the specialization/generalization
u Shown in EER diagrams by a double line
u Partial allows an entity not to belong to any of the subclasses
u Shown in EER diagrams by a single line


Constraints on Specialization and Generalization
u Hence, we have four types of specialization/generalization:
u Disjoint, total
u Disjoint, partial
u Overlapping, total
u Overlapping, partial
u Note: Generalization usually is total because the superclass is derived
from the subclasses.
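The four combinations can be told apart mechanically for a given database state. The following sketch uses plain sets of entity identifiers; the function name classify and the sample data are invented, and the result describes only what the given state satisfies, not what the schema requires.

```python
def classify(superclass, subclasses):
    """Return (disjointness, completeness) satisfied by the given state.

    superclass: set of entity ids; subclasses: dict of name -> set of ids.
    """
    members = list(subclasses.values())
    if not all(m <= superclass for m in members):
        raise ValueError("every subclass member must also be in the superclass")
    covered = set().union(*members)
    # Disjoint: no entity is counted in more than one subclass.
    disjoint = sum(len(m) for m in members) == len(covered)
    # Total: no superclass entity is left out of all subclasses.
    total = covered == superclass
    return ("disjoint" if disjoint else "overlapping",
            "total" if total else "partial")

employees = {1, 2, 3, 4}
by_job = {"SECRETARY": {1}, "ENGINEER": {2}, "TECHNICIAN": {3}}
by_pay = {"SALARIED_EMPLOYEE": {1, 2}, "HOURLY_EMPLOYEE": {3, 4}}
by_role = {"MANAGER": {1, 2, 3}, "ENGINEER": {3, 4}}

print(classify(employees, by_job))
print(classify(employees, by_pay))
print(classify(employees, by_role))
```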
Example of disjoint partial Specialization
Example of overlapping total Specialization
Specialization/Generalization Hierarchies,
Lattices & Shared Subclasses
u A subclass may itself have further subclasses specified on it
u forms a hierarchy or a lattice
u Hierarchy has a constraint that every subclass has only one superclass
(called single inheritance); this is basically a tree structure
u In a lattice, a subclass can be subclass of more than one superclass
(called multiple inheritance)
Shared Subclass “Engineering_Manager”
Specialization/Generalization Hierarchies,
Lattices & Shared Subclasses
u In a lattice or hierarchy, a subclass inherits attributes not only of its
direct superclass, but also of all its predecessor superclasses
u A subclass with more than one superclass is called a shared subclass
(multiple inheritance)
u Can have:
u specialization hierarchies or lattices, or
u generalization hierarchies or lattices,
u depending on how they were derived
u We just use specialization (to stand for the end result of either
specialization or generalization)
Specialization/Generalization Hierarchies, Lattices
& Shared Subclasses
u In specialization, start with an entity type and then define subclasses
of the entity type by successive specialization
u called a top down conceptual refinement process
u In generalization, start with many entity types and generalize those
that have common properties
u Called a bottom up conceptual synthesis process
u In practice, a combination of both processes is usually employed


Specialization / Generalization Lattice Example (UNIVERSITY)
Categories (UNION TYPES)
u All of the superclass/subclass relationships we have seen thus far have a single superclass
u A shared subclass is a subclass in:
u more than one distinct superclass/subclass relationship
u each relationship has a single superclass
u shared subclass leads to multiple inheritance
u In some cases, we need to model a single superclass/subclass relationship with more than
one superclass
u Superclasses can represent different entity types
u Such a subclass is called a category or UNION TYPE


Categories (UNION TYPES)
u Example: In a database for vehicle registration, a vehicle owner can
be a PERSON, a BANK (holding a lien on a vehicle) or a COMPANY.
u A category (UNION type) called OWNER is created to represent a
subset of the union of the three superclasses COMPANY, BANK, and
PERSON
u A category member must exist in at least one of its superclasses
u Difference from shared subclass, which is a:
u subset of the intersection of its superclasses
u shared subclass member must exist in all of its superclasses
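The union versus intersection distinction can be written directly with set operations. The entity identifiers and the two checking functions below are made up for illustration.

```python
def valid_category(members, superclasses):
    """A category is a subset of the UNION of its superclasses."""
    return members <= set().union(*superclasses)

def valid_shared_subclass(members, superclasses):
    """A shared subclass is a subset of the INTERSECTION of its superclasses."""
    return members <= set.intersection(*superclasses)

# Category OWNER over three different entity types.
person, bank, company = {"p1", "p2", "p3"}, {"b1", "b2"}, {"c1"}
owner = {"p1", "b2", "c1"}

# Shared subclass ENGINEERING_MANAGER over three subclasses of EMPLOYEE.
engineer, manager, salaried = {"e1", "e2", "e3"}, {"e2", "e3", "e4"}, {"e2", "e4"}

print(valid_category(owner, [person, bank, company]))
print(valid_shared_subclass({"e2"}, [engineer, manager, salaried]))
print(valid_shared_subclass({"e3"}, [engineer, manager, salaried]))
```

Entity e3 is an engineer and a manager but is not salaried, so it cannot be a member of the shared subclass.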


Two categories (UNION types): OWNER, REGISTERED_VEHICLE
Mapping ER and EER Designs to Relational
Designs
Chapter Outline
u ER-to-Relational Mapping Algorithm
u Step 1: Mapping of Regular Entity Types
u Step 2: Mapping of Weak Entity Types
u Step 3: Mapping of Binary 1:1 Relationship Types
u Step 4: Mapping of Binary 1:N Relationship Types
u Step 5: Mapping of Binary M:N Relationship Types
u Step 6: Mapping of Multivalued Attributes
u Step 7: Mapping of N-ary Relationship Types
u Mapping EER Model Constructs to Relations
u Step 8: Options for Mapping Specialization or Generalization
u Step 9: Mapping of Union Types (Categories)


ER-to-Relational Mapping Algorithm
u Step 1: Mapping of Regular Entity Types.
u For each regular (strong) entity type E in the ER schema, create a relation R
that includes all the simple attributes of E.
u Choose one of the key attributes of E as the primary key for R.
u If the chosen key of E is composite, the set of simple attributes that form it will
together form the primary key of R.
u Example: We create the relations EMPLOYEE, DEPARTMENT, and PROJECT in the
relational schema corresponding to the regular entities in the ER diagram.
u SSN, DNUMBER, and PNUMBER are the primary keys for the relations EMPLOYEE,
DEPARTMENT, and PROJECT as shown.
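Step 1 might look like this as DDL. This is a sketch assuming SQLite through Python's sqlite3 module; only SSN, DNUMBER and PNUMBER are taken from the text, and the remaining column names and types are assumptions based on the COMPANY requirements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        ssn     TEXT PRIMARY KEY,
        fname   TEXT, minit TEXT, lname TEXT,   -- simple components of composite Name
        address TEXT, salary REAL, sex TEXT, bdate TEXT
    );
    CREATE TABLE department (
        dnumber INTEGER PRIMARY KEY,
        dname   TEXT UNIQUE
    );
    CREATE TABLE project (
        pnumber   INTEGER PRIMARY KEY,
        pname     TEXT UNIQUE,
        plocation TEXT
    );
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Relationships and the multivalued Locations attribute are deliberately absent here; the later steps add them.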
Foreign Key Constraint
u A FOREIGN KEY is a key used to link two tables together.
u A FOREIGN KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table.
u For referential integrity to hold in a relational database, any column in a
base table that is declared a foreign key can contain either a null value, or only values
from a parent table's primary key or a candidate key.
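The referential integrity rule can be observed directly. The sketch below assumes SQLite through Python's sqlite3 module, where foreign key enforcement has to be switched on per connection; the tables, sample values and the helper accepts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce foreign keys by default
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        ssn TEXT PRIMARY KEY,
        dno INTEGER REFERENCES department(dnumber)   -- foreign key
    );
    INSERT INTO department VALUES (5, 'Research');
""")

def accepts(ssn, dno):
    """Return True if the employee row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (ssn, dno))
        return True
    except sqlite3.IntegrityError:
        return False

ok_match = accepts("111", 5)     # 5 exists as a primary key value in department
ok_null = accepts("222", None)   # a null foreign key is allowed
bad = accepts("333", 9)          # there is no department 9
print(ok_match, ok_null, bad)
```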
FIGURE 7.1
The ER conceptual schema diagram for the COMPANY database.
FIGURE 7.2
Result of mapping the COMPANY ER schema into a relational schema.

ER-to-Relational Mapping Algorithm (contd.)
u Step 2: Mapping of Weak Entity Types
u For each weak entity type W in the ER schema with owner entity type E,
create a relation R and include all simple attributes (or simple components of
composite attributes) of W as attributes of R.
u Also, include as foreign key attributes of R the primary key attribute(s) of
the relation(s) that correspond to the owner entity type(s).
u The primary key of R is the combination of the primary key(s) of the
owner(s) and the partial key of the weak entity type W, if any.
Example: Mapping Weak Entity Types
u Example: Create the relation DEPENDENT in this step to correspond to
the weak entity type DEPENDENT.
u Include the primary key SSN of the EMPLOYEE relation as a foreign
key attribute of DEPENDENT (renamed to ESSN).
u The primary key of the DEPENDENT relation is the combination
{ESSN, DEPENDENT_NAME} because DEPENDENT_NAME is the
partial key of DEPENDENT.
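A sketch of the DEPENDENT mapping, again assuming SQLite through Python's sqlite3 module. ESSN and DEPENDENT_NAME follow the text; the other columns and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT);
    CREATE TABLE dependent (
        essn           TEXT NOT NULL REFERENCES employee(ssn),  -- owner's key
        dependent_name TEXT NOT NULL,                           -- partial key
        sex TEXT, bdate TEXT, relationship TEXT,
        PRIMARY KEY (essn, dependent_name)
    );
    INSERT INTO employee VALUES ('123456789', 'Smith'), ('333445555', 'Wong');
    INSERT INTO dependent (essn, dependent_name, relationship) VALUES
        ('123456789', 'Alice', 'Daughter'),
        ('333445555', 'Alice', 'Daughter');
""")

# The same dependent name under two different owners was accepted above.
# Repeating a name under the same owner violates the composite primary key.
try:
    conn.execute(
        "INSERT INTO dependent (essn, dependent_name) VALUES ('123456789', 'Alice')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```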
ER-to-Relational Mapping Algorithm (contd.)
u Step 3: Mapping of Binary 1:1 Relationship Types
u For each binary 1:1 relationship type R in the ER schema, identify the relations S
and T that correspond to the entity types participating in R.
u There are three possible approaches:
1. Foreign key approach
2. Merged relation option
3. Cross-reference or relationship relation option
1. Foreign key approach: Choose one of the relations, say S, and include as a foreign
key in S the primary key of T. It is better to choose an entity type with total
participation in R in the role of S.
u Example: The 1:1 relationship type MANAGES is mapped by choosing the participating entity
type DEPARTMENT to serve in the role of S, because its participation in the
MANAGES relationship type is total.
2. Merged relation option: An alternative mapping of a 1:1 relationship type is
possible by merging the two entity types and the relationship into a single
relation. This may be appropriate when both participations are total.
3. Cross-reference or relationship relation option: The third alternative is to set up
a third relation R for the purpose of cross-referencing the primary keys of the two
relations S and T representing the entity types.
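The foreign key approach for MANAGES could be sketched as below, assuming SQLite through Python's sqlite3 module. The column names mgrssn and mgrstartdate and the sample rows are assumptions; NOT NULL stands in for DEPARTMENT's total participation and UNIQUE for the 1:1 ratio.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT);
    CREATE TABLE department (
        dnumber      INTEGER PRIMARY KEY,
        dname        TEXT,
        -- MANAGES: NOT NULL because every department must have a manager,
        -- UNIQUE because an employee manages at most one department.
        mgrssn       TEXT NOT NULL UNIQUE REFERENCES employee(ssn),
        mgrstartdate TEXT              -- attribute of the MANAGES relationship
    );
    INSERT INTO employee VALUES ('333445555', 'Wong');
    INSERT INTO department VALUES (5, 'Research', '333445555', '2020-05-22');
""")

try:
    # A second department managed by the same employee would break the 1:1 ratio.
    conn.execute(
        "INSERT INTO department VALUES (4, 'Administration', '333445555', '2021-01-01')")
    second_accepted = True
except sqlite3.IntegrityError:
    second_accepted = False
print(second_accepted)
```

Putting the foreign key in DEPARTMENT avoids null values, since every department has a manager while most employees manage nothing.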
ER-to-Relational Mapping Algorithm (contd.)
u Step 4: Mapping of Binary 1:N Relationship Types.
u For each regular binary 1:N relationship type R, identify the relation S that
represents the participating entity type at the N-side of the relationship type.
u Include as foreign key in S the primary key of the relation T that represents the
other entity type participating in R.
u Include any simple attributes of the 1:N relationship type as attributes of S.
u Example: 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION in the
figure.
u For WORKS_FOR we include the primary key DNUMBER of the DEPARTMENT
relation as foreign key in the EMPLOYEE relation and call it DNO.
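A sketch of Step 4 for WORKS_FOR and the recursive SUPERVISION relationship, assuming SQLite through Python's sqlite3 module. DNO follows the text; the column name superssn and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        ssn      TEXT PRIMARY KEY,
        lname    TEXT,
        dno      INTEGER REFERENCES department(dnumber),  -- WORKS_FOR (N-side)
        superssn TEXT REFERENCES employee(ssn)            -- SUPERVISION (recursive)
    );
    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee VALUES ('333445555', 'Wong', 5, NULL);
    INSERT INTO employee VALUES ('123456789', 'Smith', 5, '333445555');
""")

# Many employees can point at the same department ...
staff = [row[0] for row in conn.execute(
    "SELECT lname FROM employee WHERE dno = 5 ORDER BY lname")]

# ... and each employee points at no more than one supervisor.
boss = conn.execute("""
    SELECT s.lname
    FROM employee e JOIN employee s ON e.superssn = s.ssn
    WHERE e.lname = 'Smith'
""").fetchone()[0]
print(staff, boss)
```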
ER-to-Relational Mapping Algorithm (contd.)

u Step 5: Mapping of Binary M:N Relationship Types.
u For each regular binary M:N relationship type R, create a new relation S to
represent R.
u Include as foreign key attributes in S the primary keys of the relations that
represent the participating entity types; their combination will form the primary
key of S.
u Also include any simple attributes of the M:N relationship type (or simple
components of composite attributes) as attributes of S.
u Example: The M:N relationship type WORKS_ON from the ER diagram
is mapped by creating a relation WORKS_ON in the relational database
schema.
u The primary keys of the PROJECT and EMPLOYEE relations are
included as foreign keys in WORKS_ON and renamed PNO and
ESSN, respectively.
u Attribute HOURS in WORKS_ON represents the HOURS attribute of
the relationship type. The primary key of the WORKS_ON relation is
the combination of the foreign key attributes {ESSN, PNO}.
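The WORKS_ON mapping might be written as follows, assuming SQLite through Python's sqlite3 module. ESSN, PNO and HOURS follow the text; the sample employees, projects and hours are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT);
    CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE works_on (
        essn  TEXT    NOT NULL REFERENCES employee(ssn),
        pno   INTEGER NOT NULL REFERENCES project(pnumber),
        hours REAL,                       -- attribute of the relationship
        PRIMARY KEY (essn, pno)
    );
    INSERT INTO employee VALUES ('123456789', 'Smith'), ('333445555', 'Wong');
    INSERT INTO project  VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO works_on VALUES
        ('123456789', 1, 32.5), ('123456789', 2, 7.5), ('333445555', 2, 10.0);
""")

# Total hours per employee across all projects.
hours = dict(conn.execute(
    "SELECT essn, SUM(hours) FROM works_on GROUP BY essn"))
print(hours)
```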
ER-to-Relational Mapping Algorithm (contd.)

u Step 6: Mapping of Multivalued attributes.

u For each multivalued attribute A, create a new relation R.

u This relation R will include an attribute corresponding to A, plus the primary
key attribute K (as a foreign key in R) of the relation that represents the
entity type or relationship type that has A as an attribute.

u The primary key of R is the combination of A and K. If the multivalued
attribute is composite, we include its simple components.
u Example: The relation DEPT_LOCATIONS is created.

u The attribute DLOCATION represents the multivalued attribute
LOCATIONS of DEPARTMENT, while DNUMBER (as foreign key)
represents the primary key of the DEPARTMENT relation.

u The primary key of DEPT_LOCATIONS is the combination of {DNUMBER,
DLOCATION}.
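The DEPT_LOCATIONS example could look as follows in sqlite3. The relation and attribute names are taken from the slide; the location values are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME TEXT);
    -- R: one tuple per (department, location) value of the multivalued attribute.
    CREATE TABLE DEPT_LOCATIONS (
        DNUMBER   INTEGER NOT NULL REFERENCES DEPARTMENT(DNUMBER),
        DLOCATION TEXT    NOT NULL,
        PRIMARY KEY (DNUMBER, DLOCATION)
    );
    INSERT INTO DEPARTMENT VALUES (5, 'Research');
    INSERT INTO DEPT_LOCATIONS VALUES (5, 'Bellaire'), (5, 'Houston');
""")

# All values of the multivalued attribute for department 5.
locations = [row[0] for row in conn.execute(
    "SELECT DLOCATION FROM DEPT_LOCATIONS WHERE DNUMBER = 5 ORDER BY DLOCATION")]
print(locations)
```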
ER-to-Relational Mapping Algorithm (contd.)

u Step 7: Mapping of N-ary Relationship Types.

u For each n-ary relationship type R, where n>2, create a new relation
S to represent R.

u Include as foreign key attributes in S the primary keys of the relations
that represent the participating entity types.

u Also include any simple attributes of the n-ary relationship type (or
simple components of composite attributes) as attributes of S.
u Example: The relationship type SUPPLY in the ER diagram can be mapped to the
relation SUPPLY shown in the relational schema, whose primary key is the
combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
Summary of Mapping constructs and constraints

ER Model                        Relational Model

Entity type                     “Entity” relation
1:1 or 1:N relationship type    Foreign key (or “relationship” relation)
M:N relationship type           “Relationship” relation and two foreign keys
n-ary relationship type         “Relationship” relation and n foreign keys
Simple attribute                Attribute
Composite attribute             Set of simple component attributes
Multivalued attribute           Relation and foreign key
Value set                       Domain
Key attribute                   Primary (or secondary) key
Mapping EER Model Constructs to Relations

u Step 8: Options for Mapping Specialization or Generalization.

u Convert each specialization with m subclasses {S1, S2, …, Sm} and
generalized superclass C, where the attributes of C are {k, a1, …, an}
and k is the (primary) key, into relational schemas using one of the
four following options:

u Option 8A: Multiple relations-Superclass and subclasses

u Option 8B: Multiple relations-Subclass relations only

u Option 8C: Single relation with one type attribute

u Option 8D: Single relation with multiple type attributes


Mapping EER Model Constructs to Relations

u Option 8A: Multiple relations-Superclass and subclasses


u Create a relation L for C with attributes Attrs(L) = {k, a1, …, an} and PK(L) = k.
Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attrs(Li) = {k} U
{attributes of Si} and PK(Li) = k.
This option works for any specialization (total or partial, disjoint or overlapping).

u Option 8B: Multiple relations-Subclass relations only

u Create a relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attrs(Li) =
{attributes of Si} U {k, a1, …, an} and PK(Li) = k.
This option only works for a specialization whose subclasses are total (every entity in
the superclass must belong to at least one of the subclasses).
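Option 8A can be sketched as below. The EMPLOYEE superclass with SECRETARY and ENGINEER subclasses is assumed here for illustration, as are the attribute names TYPING_SPEED and ENG_TYPE and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- L: relation for the superclass C, with key k = SSN.
    CREATE TABLE EMPLOYEE (SSN TEXT NOT NULL PRIMARY KEY, NAME TEXT);
    -- Li: one relation per subclass Si, keyed by the same k.
    CREATE TABLE SECRETARY (
        SSN TEXT NOT NULL PRIMARY KEY REFERENCES EMPLOYEE(SSN),
        TYPING_SPEED INTEGER
    );
    CREATE TABLE ENGINEER (
        SSN TEXT NOT NULL PRIMARY KEY REFERENCES EMPLOYEE(SSN),
        ENG_TYPE TEXT
    );
    INSERT INTO EMPLOYEE  VALUES ('111', 'Smith'), ('222', 'Wong'), ('333', 'Zelaya');
    INSERT INTO SECRETARY VALUES ('111', 70);
    INSERT INTO ENGINEER  VALUES ('222', 'Civil');
""")

# An equijoin on the shared key recovers inherited plus specific attributes.
engineers = conn.execute(
    "SELECT E.SSN, E.NAME, G.ENG_TYPE "
    "FROM EMPLOYEE E JOIN ENGINEER G ON E.SSN = G.SSN"
).fetchall()
print(engineers)
```

Employee '333' belongs to no subclass, which option 8A allows (partial specialization). Under option 8B there would be no EMPLOYEE relation, so such an entity could not be stored.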
Figure: EER diagram notation for an attribute-defined specialization on JobType.
Figure: Options for mapping specialization or generalization. (a) Mapping the EER schema using option 8A.
Figure: Generalization. (b) Generalizing CAR and TRUCK into the superclass VEHICLE.
Figure: Options for mapping specialization or generalization. (b) Mapping the EER schema using option 8B.
Mapping EER Model Constructs to Relations (contd.)

u Option 8C: Single relation with one type attribute

u Create a single relation L with attributes Attrs(L) = {k, a1, …, an} U
{attributes of S1} U … U {attributes of Sm} U {t} and PK(L) = k.
The attribute t is called a type (or discriminating) attribute that indicates
the subclass to which each tuple belongs.

u Option 8D: Single relation with multiple type attributes

u Create a single relation schema L with attributes Attrs(L) = {k, a1, …, an} U
{attributes of S1} U … U {attributes of Sm} U {t1, t2, …, tm} and PK(L) = k.
Each ti, 1 ≤ i ≤ m, is a Boolean type attribute indicating whether a tuple
belongs to the subclass Si.
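Options 8C and 8D can be sketched side by side. The flag names MFLAG and PFLAG follow the Mflag and Pflag fields named in the figure caption for option 8D; every other table name, column name and row is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 8C: a single type attribute t (JOB_TYPE); subclass-specific
    -- attributes are NULL for tuples outside that subclass.
    CREATE TABLE EMPLOYEE (
        SSN TEXT NOT NULL PRIMARY KEY,
        NAME TEXT,
        TYPING_SPEED INTEGER,
        ENG_TYPE TEXT,
        JOB_TYPE TEXT
    );
    INSERT INTO EMPLOYEE VALUES
        ('111', 'Smith', 70, NULL, 'Secretary'),
        ('222', 'Wong', NULL, 'Civil', 'Engineer');

    -- Option 8D: one Boolean flag per subclass, so a tuple can belong to
    -- several subclasses at once (overlapping specialization).
    CREATE TABLE PART (
        PARTNO INTEGER PRIMARY KEY,
        MFLAG INTEGER, BATCHNO INTEGER,
        PFLAG INTEGER, SUPPLIER TEXT
    );
    INSERT INTO PART VALUES (1, 1, 42, 0, NULL), (2, 1, 43, 1, 'Acme');
""")

secretaries = [r[0] for r in conn.execute(
    "SELECT NAME FROM EMPLOYEE WHERE JOB_TYPE = 'Secretary'")]
in_both_subclasses = [r[0] for r in conn.execute(
    "SELECT PARTNO FROM PART WHERE MFLAG = 1 AND PFLAG = 1")]
print(secretaries, in_both_subclasses)
```

A single type attribute can name only one subclass per tuple, which is why 8C suits disjoint subclasses while 8D handles overlapping ones.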
Figure: EER diagram notation for an attribute-defined specialization on JobType.
Figure: Options for mapping specialization or generalization. (c) Mapping the EER schema using option 8C.
Figure: EER diagram notation for an overlapping (non-disjoint) specialization.
Figure: Options for mapping specialization or generalization. (d) Mapping using option 8D with Boolean type fields Mflag and Pflag.
Mapping EER Model Constructs to Relations
u Mapping of Shared Subclasses (Multiple Inheritance)

u A shared subclass, such as STUDENT_ASSISTANT, is a subclass of several
classes, indicating multiple inheritance. These classes must all have the
same key attribute; otherwise, the shared subclass would be modeled
as a category.

u We can apply any of the options discussed in Step 8 to a shared
subclass, subject to the restrictions discussed in Step 8 of the mapping
algorithm. Below both 8C and 8D are used for the shared subclass
STUDENT_ASSISTANT.
A specialization lattice with multiple inheritance for a UNIVERSITY database.

Mapping the EER specialization lattice using multiple options.
Mapping EER Model Constructs to Relations
u Step 9: Mapping of Union Types (Categories).

u For mapping a category whose defining superclasses have different
keys, it is customary to specify a new key attribute, called a
surrogate key, when creating a relation to correspond to the
category.

u In the example below we can create a relation OWNER to
correspond to the OWNER category and include any attributes of
the category in this relation. The primary key of the OWNER
relation is the surrogate key, which we call OwnerId.
Two categories (union types): OWNER and REGISTERED_VEHICLE.
Mapping the EER categories (union types) in Figure 4.7 to relations.
Mapping Exercise
Exercise 7.4.

Chapter Summary

u ER-to-Relational Mapping Algorithm


u Step 1: Mapping of Regular Entity Types
u Step 2: Mapping of Weak Entity Types
u Step 3: Mapping of Binary 1:1 Relationship Types
u Step 4: Mapping of Binary 1:N Relationship Types.
u Step 5: Mapping of Binary M:N Relationship Types.
u Step 6: Mapping of Multivalued attributes.
u Step 7: Mapping of N-ary Relationship Types.

u Mapping EER Model Constructs to Relations


u Step 8: Options for Mapping Specialization or Generalization.
u Step 9: Mapping of Union Types (Categories).
ICT1407
Database Systems
The Relational Data Model and Relational
Database Constraints
Chapter Outline

u Relational Model Concepts


u Relational Model Constraints and Relational Database Schemas
u Update Operations and Dealing with Constraint Violations

Relational Model Concepts
u A Relation is a mathematical concept based on the ideas of sets
u The model was first proposed by Dr. E.F. Codd of IBM Research
in 1970 in the following paper:
u "A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970
u The above paper caused a major revolution in the field of
database management and earned Dr. Codd the coveted ACM
Turing Award

Informal Definitions

u Informally, a relation looks like a table of values.

u A relation typically contains a set of rows.

u The data elements in each row represent certain facts that correspond to a real-
world entity or relationship
u In the formal model, rows are called tuples

u Each column has a column header that gives an indication of the meaning of the
data items in that column
u In the formal model, the column header is called an attribute name (or just
attribute)
Example of a Relation

Informal Definitions
u Key of a Relation:
u Each row has a value of a data item (or set of items) that uniquely identifies
that row in the table
u Called the key
u In the STUDENT table, SSN is the key

u Sometimes row-ids or sequential numbers are assigned as keys to identify the
rows in a table
u Called artificial key or surrogate key
Formal Definitions - Schema
u The Schema (or description) of a Relation:
u Denoted by R(A1, A2, ..., An)
u R is the name of the relation
u The attributes of the relation are A1, A2, ..., An
u Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
u CUSTOMER is the relation name
u Defined over the four attributes: Cust-id, Cust-name, Address, Phone#
u Each attribute has a domain or a set of valid values.
u For example, the domain of Cust-id is 6 digit numbers.
Formal Definitions - Tuple
u A tuple is an ordered list of values (enclosed in angled brackets ‘< … >’)
u Each value is derived from an appropriate domain.
u A row in the CUSTOMER relation is a 4-tuple, because it consists of four values, for
example:
u <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
u A relation is a set of such tuples (rows)
Formal Definitions - Domain
u A domain has a logical definition:
u Example: “USA_phone_numbers” are the set of 10 digit phone numbers valid in
the U.S.
u A domain also has a data-type or a format defined for it.
u The USA_phone_numbers may have a format: (ddd)ddd-dddd where each d is a
decimal digit.
u Dates have various formats such as year, month, date formatted as yyyy-mm-dd,
or as dd mm,yyyy etc.
u The attribute name designates the role played by a domain in a relation:
u Used to interpret the meaning of the data elements corresponding to that
attribute
u Example: The domain Date may be used to define two attributes named “Invoice-
date” and “Payment-date” with different meanings
Formal Definitions - State
u The relation state is a subset of the Cartesian product of the domains of its
attributes

u each domain contains the set of all possible values the attribute can take.

u Example: attribute Cust-name is defined over the domain of character strings of
maximum length 25

u dom(Cust-name) is varchar(25)

u The role these strings play in the CUSTOMER relation is that of the name of a
customer.
Formal Definitions - Summary
u Formally,
u Given R(A1, A2, .........., An)
u r(R) ⊆ dom(A1) X dom(A2) X ... X dom(An)
u R(A1, A2, …, An) is the schema of the relation
u R is the name of the relation
u A1, A2, …, An are the attributes of the relation
u r(R): a specific state (or "value" or “population”) of relation R – this is a set of
tuples (rows)
u r(R) = {t1, t2, …, tm} where each ti is an n-tuple
u ti = <v1, v2, …, vn> where each vj is an element of dom(Aj)
Formal Definitions - Example
u Let R(A1, A2) be a relation schema:
u Let dom(A1) = {0,1}
u Let dom(A2) = {a,b,c}
u Then: dom(A1) X dom(A2) is all possible combinations:
{<0,a> , <0,b> , <0,c>, <1,a>, <1,b>, <1,c> }

u The relation state r(R) ⊆ dom(A1) X dom(A2)


u For example: r(R) could be {<0,a> , <0,b> , <1,c> }
u this is one possible state (or “population” or “extension”) r of the relation R,
defined over A1 and A2.
u It has three 2-tuples: <0,a> , <0,b> , <1,c>
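The example above can be checked mechanically. The sketch below uses Python sets and itertools.product; the helper name is_valid_state is hypothetical and covers only the domain requirement.

```python
from itertools import product

dom_A1 = {0, 1}
dom_A2 = {"a", "b", "c"}

# dom(A1) X dom(A2): every possible 2-tuple.
cartesian = set(product(dom_A1, dom_A2))

def is_valid_state(state):
    """A relation state must be a subset of the Cartesian product of the domains."""
    return state <= cartesian

r = {(0, "a"), (0, "b"), (1, "c")}

print(len(cartesian))               # number of possible 2-tuples
print(is_valid_state(r))            # r draws every value from the right domain
print(is_valid_state({(2, "a")}))   # 2 is not in dom(A1)
```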
Definition Summary
Informal Terms                Formal Terms
Table                         Relation
Column Header                 Attribute
All possible Column Values    Domain
Row                           Tuple
Table Definition              Schema of a Relation
Populated Table               State of the Relation
Example – A relation STUDENT
Characteristics Of Relations
u Ordering of tuples in a relation r(R):

u The tuples are not considered to be ordered, even though they appear to be in
the tabular form.

u Ordering of attributes in a relation schema R (and of values within each tuple):

u We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2,
..., vn> to be ordered .

u (However, a more general alternative definition of relation does not require
this ordering).
Same state as previous Figure (but with
different order of tuples)

Characteristics Of Relations
u Values in a tuple:
u All values are considered atomic (indivisible).
u Each value in a tuple must be from the domain of the attribute for that column
u If t = <v1, v2, …, vn> is a tuple (row) in the relation state r of R(A1, A2,
…, An)
u Then each vi must be a value from dom(Ai)

u A special null value is used to represent values that are unknown or inapplicable
to certain tuples.
Characteristics Of Relations
u Notation:
u We refer to component values of a tuple t by:
u t[Ai] or t.Ai
u This is the value vi of attribute Ai for tuple t
u Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of
attributes Au, Av, ..., Aw, respectively in t
Relational Integrity Constraints
u Constraints are conditions that must hold on all valid relation states.
u There are three main types of constraints in the relational model:
u Key constraints
u Entity integrity constraints
u Referential integrity constraints
u Another implicit constraint is the domain constraint
u Every value in a tuple must be from the domain of its attribute (or it could be
null, if allowed for that attribute)
Key Constraints
u Superkey of R:
u Is a set of attributes SK of R with the following condition:
u No two tuples in any valid relation state r(R) will have the same value for SK
u That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
u This condition must hold in any valid state r(R)
u Key of R:
u A "minimal" superkey
u That is, a key is a superkey K such that removal of any attribute from K results in
a set of attributes that is not a superkey (does not possess the superkey
uniqueness property)
Key Constraints (continued)
u Example: Consider the CAR relation schema:
u CAR(State, Reg#, SerialNo, Make, Model, Year)
u CAR has two keys:
u Key1 = {State, Reg#}
u Key2 = {SerialNo}
u Both are also superkeys of CAR
u {SerialNo, Make} is a superkey but not a key.
u In general:
u Any key is a superkey (but not vice versa)
u Any set of attributes that includes a key is a superkey
u A minimal superkey is also a key
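The superkey and key conditions can be tested against a single relation state, as in the sketch below. The CAR rows are invented sample data and the function names are hypothetical. A key is a property of the schema and must hold in every valid state, so passing this check on one state is necessary but does not prove that an attribute set is a key.

```python
def is_superkey_in_state(rows, attrs):
    """True if no two rows of this state agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key_in_state(rows, attrs):
    """A key is a superkey that stops being one when any attribute is removed."""
    if not is_superkey_in_state(rows, attrs):
        return False
    return all(
        not is_superkey_in_state(rows, [a for a in attrs if a != removed])
        for removed in attrs
    )

CAR = [  # assumed sample state
    {"State": "TX", "Reg#": "ABC-739", "SerialNo": "A69352", "Make": "Ford"},
    {"State": "FL", "Reg#": "ABC-739", "SerialNo": "B43696", "Make": "Nissan"},
    {"State": "TX", "Reg#": "TVP-347", "SerialNo": "X83554", "Make": "Toyota"},
]

print(is_key_in_state(CAR, ["State", "Reg#"]))          # Key1
print(is_key_in_state(CAR, ["SerialNo"]))               # Key2
print(is_superkey_in_state(CAR, ["SerialNo", "Make"]))  # superkey ...
print(is_key_in_state(CAR, ["SerialNo", "Make"]))       # ... but not minimal
```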
Key Constraints (continued)
u If a relation has several candidate keys, one is chosen arbitrarily to be the primary
key.
u The primary key attributes are underlined.
u Example: Consider the CAR relation schema:
u CAR(State, Reg#, SerialNo, Make, Model, Year)
u We chose SerialNo as the primary key
u The primary key value is used to uniquely identify each tuple in a relation
u Provides the tuple identity
u Also used to reference the tuple from another tuple
u General rule: Choose as primary key the smallest of the candidate keys (in terms of
size)
u Not always applicable – choice is sometimes subjective
CAR table with two candidate keys –
LicenseNumber chosen as Primary Key

Relational Database Schema
u Relational Database Schema:
u A set S of relation schemas that belong to the same database.
u S is the name of the whole database schema
u S = {R1, R2, ..., Rn}
u R1, R2, …, Rn are the names of the individual relation schemas within the
database S
u Following slide shows a COMPANY database schema with 6 relation schemas
COMPANY Database Schema

Entity Integrity
u Entity Integrity:
u The primary key attributes PK of each relation schema R in S cannot
have null values in any tuple of r(R).
u This is because primary key values are used to identify the individual tuples.
u t[PK] ≠ null for any tuple t in r(R)
u If PK has several attributes, null is not allowed in any of these attributes
u Note: Other attributes of R may be constrained to disallow null values,
even though they are not members of the primary key.
Referential Integrity

u A constraint involving two relations


u The previous constraints involve a single relation.
u Used to specify a relationship among tuples in two relations:
u The referencing relation and the referenced relation.
Referential Integrity
u Tuples in the referencing relation R1 have attributes FK (called foreign key
attributes) that reference the primary key attributes PK of the referenced relation
R2.
u A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
u A referential integrity constraint can be displayed in a relational database schema
as a directed arc from R1.FK to R2.
Referential Integrity (or foreign key)
Constraint
u Statement of the constraint
u The value in the foreign key column (or columns) FK of the
referencing relation R1 can be either:
u (1) an existing value of the corresponding primary key PK in the
referenced relation R2, or
u (2) a null.
u In case (2), the FK in R1 should not be a part of its own primary
key.
Displaying a relational database schema
and its constraints
u Each relation schema can be displayed as a row of attribute names
u The name of the relation is written above the attribute names
u The primary key attribute (or attributes) will be underlined
u A foreign key (referential integrity) constraint is displayed as a directed arc
(arrow) from the foreign key attributes to the referenced table
u Can also point to the primary key of the referenced relation for clarity
u Next slide shows the COMPANY relational schema diagram
Referential Integrity Constraints for COMPANY database

Populated database state
u Each relation will have many tuples in its current relation state
u The relational database state is a union of all the individual relation states
u Whenever the database is changed, a new state arises
u Basic operations for changing the database:
u INSERT a new tuple in a relation
u DELETE an existing tuple from a relation
u MODIFY an attribute of an existing tuple
u Next slide shows an example state for the COMPANY database
Populated database state for COMPANY

Update Operations on Relations
u INSERT a tuple.
u DELETE a tuple.
u MODIFY a tuple.
u Integrity constraints should not be violated by the update
operations.
u Several update operations may have to be grouped together.
u Updates may propagate to cause other updates automatically.
This may be necessary to maintain integrity constraints.

Update Operations on Relations
u In case of integrity violation, several actions can be taken:
u Cancel the operation that causes the violation (RESTRICT or
REJECT option)
u Perform the operation but inform the user of the violation
u Trigger additional updates so the violation is corrected
(CASCADE option, SET NULL option)
u Execute a user-specified error-correction routine

Possible violations for each operation
u INSERT may violate any of the constraints:
u Domain constraint:
u if one of the attribute values provided for the new tuple is not of the specified
attribute domain
u Key constraint:
u if the value of a key attribute in the new tuple already exists in another tuple
in the relation
u Referential integrity:
u if a foreign key value in the new tuple references a primary key value that does
not exist in the referenced relation
u Entity integrity:
u if the primary key value is null in the new tuple
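Each of the four INSERT violations can be provoked in sqlite3, as sketched below. The schema is a cut-down, assumed version of EMPLOYEE and DEPARTMENT, a CHECK clause stands in for the domain constraint, and try_insert is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (
        -- SQLite needs an explicit NOT NULL on a TEXT primary key.
        SSN    TEXT NOT NULL PRIMARY KEY,
        SALARY REAL CHECK (SALARY >= 0),
        DNO    INTEGER REFERENCES DEPARTMENT(DNUMBER)
    );
    INSERT INTO DEPARTMENT VALUES (5);
    INSERT INTO EMPLOYEE VALUES ('111', 30000, 5);
""")

def try_insert(row):
    """Attempt the INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "domain":      try_insert(("222", -1, 5)),     # salary outside its domain
    "key":         try_insert(("111", 1000, 5)),   # SSN already exists
    "referential": try_insert(("333", 1000, 9)),   # department 9 does not exist
    "entity":      try_insert((None, 1000, 5)),    # null primary key
    "valid":       try_insert(("444", 1000, 5)),
}
print(results)
```

Here the DBMS takes the default action of rejecting the insertion in every violating case.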
Possible violations for each operation
u DELETE may violate only referential integrity:
u If the primary key value of the tuple being deleted is referenced from other
tuples in the database
u Can be remedied by several actions: RESTRICT, CASCADE, SET NULL (see Chapter 8
for more details)
u RESTRICT option: reject the deletion
u CASCADE option: propagate the deletion by also deleting the referencing tuples
u SET NULL option: set the foreign keys of the referencing tuples to NULL
u One of the above options must be specified during database design for each
foreign key constraint
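The three options can be compared by declaring the same foreign key with a different ON DELETE action each time. This is a sketch on an assumed two-table schema; delete_department is a hypothetical helper.

```python
import sqlite3

def delete_department(on_delete):
    """Delete department 5 under the given ON DELETE action and report the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY);
        CREATE TABLE EMPLOYEE (
            SSN TEXT NOT NULL PRIMARY KEY,
            DNO INTEGER REFERENCES DEPARTMENT(DNUMBER) ON DELETE {on_delete}
        );
        INSERT INTO DEPARTMENT VALUES (5);
        INSERT INTO EMPLOYEE VALUES ('111', 5);
    """)
    try:
        conn.execute("DELETE FROM DEPARTMENT WHERE DNUMBER = 5")
    except sqlite3.IntegrityError:
        return "rejected"
    # Show what happened to the referencing EMPLOYEE tuple.
    return conn.execute("SELECT SSN, DNO FROM EMPLOYEE").fetchall()

print(delete_department("RESTRICT"))  # deletion is refused
print(delete_department("CASCADE"))   # the referencing employee is deleted too
print(delete_department("SET NULL"))  # the employee stays, with DNO set to NULL
```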
Possible violations for each operation
u UPDATE may violate domain constraint and NOT NULL constraint on an attribute
being modified
u Any of the other constraints may also be violated, depending on the attribute being
updated:
u Updating the primary key (PK):
u Similar to a DELETE followed by an INSERT
u Need to specify similar options to DELETE
u Updating a foreign key (FK):
u May violate referential integrity
u Updating an ordinary attribute (neither PK nor FK):
u Can only violate domain constraints
Summary

u Presented Relational Model Concepts


u Definitions
u Characteristics of relations
u Discussed Relational Model Constraints and Relational Database
Schemas
u Domain constraints
u Key constraints
u Entity integrity
u Referential integrity
u Described the Relational Update Operations and Dealing with
Constraint Violations
In-Class Exercise
(Taken from Exercise 5.15)
Consider the following relations for a database that keeps track of student
enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this
schema.

Relational Algebra
Chapter Outline
u Relational Algebra
u Unary Relational Operations
u Relational Algebra Operations From Set Theory
u Binary Relational Operations
u Additional Relational Operations
u Examples of Queries in Relational Algebra
u Relational Calculus
u Tuple Relational Calculus
u Domain Relational Calculus
u Example Database Application (COMPANY)
u Overview of the QBE language (appendix D)
Relational Algebra Overview
u Relational algebra is the basic set of operations for the
relational model
u These operations enable a user to specify basic retrieval
requests (or queries)
u The result of an operation is a new relation, which may have
been formed from one or more input relations
u This property makes the algebra “closed” (all objects in
relational algebra are relations)
Relational Algebra Overview (continued)
u The algebra operations thus produce new relations
u These can be further manipulated using operations of the
same algebra
u A sequence of relational algebra operations forms a relational
algebra expression
u The result of a relational algebra expression is also a
relation that represents the result of a database query (or
retrieval request)
Relational Algebra Overview
u Relational Algebra consists of several groups of operations
u Unary Relational Operations
u SELECT (symbol: σ (sigma))
u PROJECT (symbol: π (pi))
u RENAME (symbol: ρ (rho))
u Relational Algebra Operations From Set Theory
u UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – )
u CARTESIAN PRODUCT ( x )
Relational Algebra Overview cont.
u Binary Relational Operations
u JOIN (several variations of JOIN exist)
u DIVISION
u Additional Relational Operations
u OUTER JOINS, OUTER UNION
u AGGREGATE FUNCTIONS (These compute summary of
information: for example, SUM, COUNT, AVG, MIN, MAX)
Database State for COMPANY
Unary Relational Operations: SELECT
u The SELECT operation (denoted by σ (sigma)) is used to select a subset of
the tuples from a relation based on a selection condition.
u The selection condition acts as a filter
u Keeps only those tuples that satisfy the qualifying condition
u Tuples satisfying the condition are selected whereas the other
tuples are discarded (filtered out)
u Examples:
u Select the EMPLOYEE tuples whose department number is 4:
σ DNO = 4 (EMPLOYEE)
u Select the employee tuples whose salary is greater than $30,000:
σ SALARY > 30,000 (EMPLOYEE)
Unary Relational Operations: SELECT

u In general, the select operation is denoted by σ <selection condition>(R) where

u the symbol σ (sigma) is used to denote the select operator


u the selection condition is a Boolean (conditional) expression
specified on the attributes of relation R
u tuples that make the condition true are selected
u appear in the result of the operation
u tuples that make the condition false are filtered out
u discarded from the result of the operation
Unary Relational Operations: SELECT (contd.)
u SELECT Operation Properties
u The SELECT operation σ <selection condition>(R) produces a relation S
that has the same schema (same attributes) as R
u SELECT σ is commutative:
u σ <condition1>(σ < condition2> (R)) = σ <condition2> (σ < condition1> (R))
u Because of commutativity property, a cascade (sequence) of
SELECT operations may be applied in any order:
u σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond2>(σ<cond3>(σ<cond1>(R)))
u A cascade of SELECT operations may be replaced by a single
selection with a conjunction of all the conditions:
u σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)
u The number of tuples in the result of a SELECT is less than (or
equal to) the number of tuples in the input relation R
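These SELECT properties can be tried out on a small relation held as a list of dictionaries. This is a sketch only: the select function and the four EMPLOYEE rows are assumptions made for illustration.

```python
def select(condition, relation):
    """σ<condition>(relation): keep the tuples for which condition is true."""
    return [t for t in relation if condition(t)]

EMPLOYEE = [  # assumed sample state
    {"LNAME": "Smith",   "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong",    "SALARY": 40000, "DNO": 5},
    {"LNAME": "Zelaya",  "SALARY": 25000, "DNO": 4},
    {"LNAME": "Wallace", "SALARY": 43000, "DNO": 4},
]

def in_dept_4(t):
    return t["DNO"] == 4

def high_paid(t):
    return t["SALARY"] > 30000

# Commutativity: the order of the two selections does not matter.
a = select(in_dept_4, select(high_paid, EMPLOYEE))
b = select(high_paid, select(in_dept_4, EMPLOYEE))
# A cascade equals one selection with the conjunction of the conditions.
c = select(lambda t: in_dept_4(t) and high_paid(t), EMPLOYEE)

print(a == b == c)
print([t["LNAME"] for t in a])
```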
The following query results refer to this
database state

Unary Relational Operations: PROJECT
u PROJECT Operation is denoted by π (pi)
u This operation keeps certain columns (attributes) from a relation and discards
the other columns.
u PROJECT creates a vertical partitioning
u The list of specified columns (attributes) is kept in each tuple
u The other attributes in each tuple are discarded
u Example: To list each employee’s first and last name and salary, the following is
used:
πLNAME, FNAME,SALARY(EMPLOYEE)
Unary Relational Operations: PROJECT (cont.)

u The general form of the project operation is:


π<attribute list>(R)
u π (pi) is the symbol used to represent the project operation
u <attribute list> is the desired list of attributes from relation
R.
u The project operation removes any duplicate tuples
u This is because the result of the project operation must be a
set of tuples
u Mathematical sets do not allow duplicate elements.

Unary Relational Operations: PROJECT
(contd.)
u PROJECT Operation Properties
u The number of tuples in the result of projection π<list>(R) is always less than or
equal to the number of tuples in R
u If the list of attributes includes a key of R, then the number of tuples in
the result of PROJECT is equal to the number of tuples in R
u PROJECT is not commutative
u π<list1>(π<list2>(R)) = π<list1>(R) as long as <list2> contains the
attributes in <list1>
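A small sketch of PROJECT with duplicate elimination illustrates these properties. The project function and the sample rows are assumptions, with SSN playing the role of the key.

```python
def project(attrs, relation):
    """π<attrs>(relation): keep only attrs and drop duplicate tuples."""
    result = []
    for t in relation:
        projected = {a: t[a] for a in attrs}
        if projected not in result:   # the result must be a set of tuples
            result.append(projected)
    return result

EMPLOYEE = [  # assumed sample state; SSN is the key
    {"SSN": "111", "LNAME": "Smith",   "DNO": 5},
    {"SSN": "222", "LNAME": "Wong",    "DNO": 5},
    {"SSN": "333", "LNAME": "Zelaya",  "DNO": 4},
    {"SSN": "444", "LNAME": "Wallace", "DNO": 4},
]

departments = project(["DNO"], EMPLOYEE)      # duplicates collapse
with_key = project(["SSN", "DNO"], EMPLOYEE)  # a key keeps every tuple distinct
nested = project(["DNO"], project(["SSN", "DNO"], EMPLOYEE))

print(len(departments), len(with_key), nested == departments)
```

Swapping the two lists in the nested call would fail, because DNO alone does not contain SSN; that is the sense in which PROJECT is not commutative.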

Examples of applying SELECT and PROJECT
operations

Relational Algebra Expressions
u We may want to apply several relational algebra operations one
after the other
u Either we can write the operations as a single relational
algebra expression by nesting the operations, or
u We can apply one operation at a time and create
intermediate result relations.
u In the latter case, we must give names to the relations that
hold the intermediate results.

Single expression versus sequence of
relational operations (Example)
u To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must
apply a select and a project operation
u We can write a single relational algebra expression as
follows:
u πFNAME, LNAME, SALARY(σ DNO=5(EMPLOYEE))
u OR We can explicitly show the sequence of operations,
giving a name to each intermediate relation:
u DEP5_EMPS ← σ DNO=5(EMPLOYEE)
u RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
Unary Relational Operations: RENAME

u The RENAME operator is denoted by ρ (rho)


u In some cases, we may want to rename the attributes of a relation or
the relation name or both
u Useful when a query requires multiple operations
u Necessary in some cases (see JOIN operation later)

Unary Relational Operations: RENAME
(contd.)

u The general RENAME operation ρ can be expressed by any of the
following forms:
u ρS (B1, B2, …, Bn )(R) changes both:
u the relation name to S, and
u the column (attribute) names to B1, B2, …, Bn
u ρS(R) changes:
u the relation name only to S
u ρ(B1, B2, …, Bn )(R) changes:
u the column (attribute) names only to B1, B2, …, Bn
Unary Relational Operations: RENAME
(contd.)

u For convenience, we also use a shorthand for renaming attributes in an
intermediate relation:
u If we write:
• RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
• RESULT will have the attributes FNAME, LNAME, SALARY, with the same
names as in DEP5_EMPS (and in EMPLOYEE)
• If we write:
• RESULT (F, M, L, S, B, A, SX, SAL, SU, DNO) ← DEP5_EMPS
• The 10 attributes of DEP5_EMPS are renamed to F,
M, L, S, B, A, SX, SAL, SU, DNO, respectively
Example of applying multiple operations and
RENAME

Relational Algebra Operations from Set Theory: UNION
u UNION Operation
u Binary operation, denoted by ∪
u The result of R ∪ S, is a relation that includes all tuples that
are either in R or in S or in both R and S
u Duplicate tuples are eliminated
u The two operand relations R and S must be “type
compatible” (or UNION compatible)
u R and S must have the same number of attributes
u Each pair of corresponding attributes must be type
compatible (have same or compatible domains)

Relational Algebra Operations from Set Theory: UNION
u Example:

u To retrieve the social security numbers of all employees who either work in
department 5 (RESULT1 below) or directly supervise an employee who works in
department 5 (RESULT2 below)

u We can use the UNION operation as follows:

DEP5_EMPS ← σDNO=5 (EMPLOYEE)

RESULT1 ← π SSN(DEP5_EMPS)

RESULT2(SSN) ← πSUPERSSN(DEP5_EMPS)

RESULT ← RESULT1 ∪ RESULT2

u The union operation produces the tuples that are in either RESULT1 or RESULT2 or
both
Example of the result of a UNION operation
u UNION Example

Relational Algebra Operations from Set Theory
u Type Compatibility of operands is required for the binary set operation UNION ∪,
(also for INTERSECTION ∩, and SET DIFFERENCE –, see next slides)

u R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:

u they have the same number of attributes, and

u the domains of corresponding attributes are type compatible (i.e.
dom(Ai)=dom(Bi) for i=1, 2, ..., n).

u The resulting relation for R1∪R2 (also for R1∩R2, or R1–R2, see next slides) has
the same attribute names as the first operand relation R1 (by convention)
Relational Algebra Operations from Set Theory: INTERSECTION

u INTERSECTION is denoted by ∩

u The result of the operation R ∩ S, is a relation that includes all tuples
that are in both R and S

u The attribute names in the result will be the same as the attribute
names in R

u The two operand relations R and S must be “type compatible”


Relational Algebra Operations from Set Theory: SET DIFFERENCE (cont.)

u SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –

u The result of R – S, is a relation that includes all tuples that are in R
but not in S

u The attribute names in the result will be the same as the attribute
names in R

u The two operand relations R and S must be “type compatible”


Some properties of UNION, INTERSECT, and DIFFERENCE
u Notice that both union and intersection are commutative operations; that is

u R ∪ S = S ∪ R, and R ∩ S = S ∩ R

u Both union and intersection can be treated as n-ary operations applicable to any
number of relations as both are associative operations; that is

u R ∪ (S ∪ T) = (R ∪ S) ∪ T

u (R ∩ S) ∩ T = R ∩ (S ∩ T)

u The minus operation is not commutative; that is, in general

u R – S ≠ S – R
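Because type-compatible relations are sets of tuples, these properties can be checked directly with Python's set operators. The three relations below are assumed sample data with the schema (FNAME, LNAME).

```python
R = {("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")}
S = {("Susan", "Yao"), ("John", "Smith")}
T = {("Ramesh", "Shah"), ("John", "Smith")}

union_commutes = (R | S) == (S | R)
intersection_commutes = (R & S) == (S & R)
union_associates = (R | (S | T)) == ((R | S) | T)
intersection_associates = ((R & S) & T) == (R & (S & T))
difference_commutes = (R - S) == (S - R)   # expected to differ for this data

print(union_commutes, intersection_commutes,
      union_associates, intersection_associates, difference_commutes)
print(sorted(R - S))
print(sorted(S - R))
```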
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
u CARTESIAN (or CROSS) PRODUCT Operation

u This operation is used to combine tuples from two relations in a combinatorial fashion.

u Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)

u Result is a relation Q of degree n + m, that is, with n + m attributes:

u Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.

u The resulting relation state has one tuple for each combination of tuples—one from R and
one from S.

u Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S will have nR *
nS tuples.

u The two operands do NOT have to be "type compatible”


Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT (cont.)

u Generally, CROSS PRODUCT is not a meaningful operation

u Can become meaningful when followed by other operations

u Example (not meaningful):

u FEMALE_EMPS ← σ SEX=’F’(EMPLOYEE)

u EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)

u EMP_DEPENDENTS ← EMPNAMES x DEPENDENT

u EMP_DEPENDENTS will contain every combination of EMPNAMES and DEPENDENT

u whether or not they are actually related


Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT (cont.)
u To keep only combinations where the DEPENDENT is related to the EMPLOYEE, we add a SELECT
operation as follows

u Example (meaningful):

u FEMALE_EMPS ← σ SEX=’F’(EMPLOYEE)

u EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)

u EMP_DEPENDENTS ← EMPNAMES x DEPENDENT

u ACTUAL_DEPS ← σ SSN=ESSN(EMP_DEPENDENTS)

u RESULT ← π FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)

u RESULT will now contain the names of female employees and their
dependents
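
The sequence above can be sketched in Python as follows. The helper functions select, project and cross are hypothetical stand-ins for σ, π and x, relations are assumed to be lists of dictionaries, and the sample rows are invented for the illustration.

```python
from itertools import product

def select(relation, predicate):
    """sigma: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attrs):
    """pi: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

def cross(r, s):
    """x: pair every tuple of r with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]

# Invented sample data; only the attribute names come from the slides.
EMPLOYEE = [
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "111", "SEX": "F"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "222", "SEX": "M"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "333", "SEX": "F"},
]
DEPENDENT = [
    {"ESSN": "333", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "222", "DEPENDENT_NAME": "Alice"},
]

FEMALE_EMPS = select(EMPLOYEE, lambda t: t["SEX"] == "F")
EMPNAMES = project(FEMALE_EMPS, ["FNAME", "LNAME", "SSN"])
EMP_DEPENDENTS = cross(EMPNAMES, DEPENDENT)   # 2 * 2 = 4 combinations
ACTUAL_DEPS = select(EMP_DEPENDENTS, lambda t: t["SSN"] == t["ESSN"])
RESULT = project(ACTUAL_DEPS, ["FNAME", "LNAME", "DEPENDENT_NAME"])

print(RESULT)
```

With this sample data only one of the four combinations in EMP_DEPENDENTS survives the final SELECT, which is why the product alone is not meaningful.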
Example of applying CARTESIAN PRODUCT
Binary Relational Operations: JOIN
u JOIN Operation (denoted by ⋈)
u The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to
identify and select related tuples from two relations

u A special operation, called JOIN combines this sequence into a single operation

u This operation is very important for any relational database with more than a single
relation, because it allows us to combine related tuples from various relations

u The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . .
., Bm) is:

R ⋈ <join condition> S

u where R and S can be any relations that result from general relational algebra
expressions.
Binary Relational Operations: JOIN (cont.)
u Example: Suppose that we want to retrieve the name of the manager of each department.

u To get the manager’s name, we need to combine each DEPARTMENT tuple with the
EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.

u We do this by using the join operation.

u DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE

u MGRSSN=SSN is the join condition

u Combines each department record with the employee who manages the department

u The join condition can also be specified as DEPARTMENT.MGRSSN= EMPLOYEE.SSN
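
A minimal sketch of this join, assuming relations are lists of dictionaries. The function theta_join is a hypothetical helper written for the illustration, and the rows are invented.

```python
def theta_join(r, s, condition):
    """Combine each pair of tuples from r and s that satisfies the condition."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]

# Invented sample rows; the attribute names follow the slides.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "987"},
]
EMPLOYEE = [
    {"FNAME": "Franklin", "LNAME": "Wong", "SSN": "333"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987"},
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123"},
]

# Join condition: DEPARTMENT.MGRSSN = EMPLOYEE.SSN
DEPT_MGR = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])

for t in DEPT_MGR:
    print(t["DNAME"], t["FNAME"], t["LNAME"])
# Research Franklin Wong
# Administration Jennifer Wallace
```

The result has 2 tuples, far fewer than the 2 * 3 = 6 that the Cartesian product would produce.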


Example of applying the JOIN operation
Some properties of JOIN
u Consider the following JOIN operation:

u R(A1, A2, . . ., An) ⋈ R.Ai=S.Bj S(B1, B2, . . ., Bm)

u Result is a relation Q with degree n + m attributes:

u Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.

u The resulting relation state has one tuple for each combination of tuples—r from R and
s from S, but only if they satisfy the join condition r[Ai]=s[Bj]

u Hence, if R has nR tuples, and S has nS tuples, then the join result will generally have
fewer than nR * nS tuples.

u Only related tuples (based on the join condition) will appear in the result
Some properties of JOIN (cont.)
u The general case of the JOIN operation is called a theta-join: R ⋈ theta S

u The join condition is called theta

u Theta can be any general boolean expression on the attributes of R and S; for
example:

u R.Ai<S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)

u Most join conditions involve one or more equality conditions “AND”ed together; for
example:

u R.Ai=S.Bj AND R.Ak=S.Bl AND R.Ap=S.Bq
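
A theta condition need not be a pure equality test. The sketch below assumes relations stored as lists of dictionaries and uses the made-up condition R.Ai < S.Bj AND R.Ak = S.Bl on invented values.

```python
# Invented sample relations; attribute names mirror the slide notation.
R = [{"Ai": 1, "Ak": 10}, {"Ai": 5, "Ak": 20}]
S = [{"Bj": 3, "Bl": 10}, {"Bj": 4, "Bl": 20}]

def theta(r, s):
    """Theta: R.Ai < S.Bj AND R.Ak = S.Bl (mixes an inequality and an equality)."""
    return r["Ai"] < s["Bj"] and r["Ak"] == s["Bl"]

result = [{**r, **s} for r in R for s in S if theta(r, s)]

print(result)  # [{'Ai': 1, 'Ak': 10, 'Bj': 3, 'Bl': 10}]
```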


Binary Relational Operations: EQUIJOIN
u EQUIJOIN Operation

u The most common use of join involves join conditions with equality comparisons
only

u Such a join, where the only comparison operator used is =, is called an EQUIJOIN.

u In the result of an EQUIJOIN we always have one or more pairs of attributes
(whose names need not be identical) that have identical values in every
tuple.

u The JOIN seen in the previous example was an EQUIJOIN.


Binary Relational Operations: NATURAL JOIN Operation
u NATURAL JOIN Operation

u Another variation of JOIN, called NATURAL JOIN and denoted by *, was created
to get rid of the second (superfluous) attribute in an EQUIJOIN condition.

u because one of each pair of attributes with identical values is superfluous

u The standard definition of natural join requires that the two join attributes, or
each pair of corresponding join attributes, have the same name in both
relations

u If this is not the case, a renaming operation is applied first.


Binary Relational Operations NATURAL JOIN (contd.)
u Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT and
DEPT_LOCATIONS, it is sufficient to write:

u DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

u The only attribute with the same name is DNUMBER

u An implicit join condition is created based on this attribute:

DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER
Binary Relational Operations NATURAL JOIN (contd.)
u Another example: Q ← R(A,B,C,D) * S(C,D,E)

u The implicit join condition includes each pair of attributes
with the same name, “AND”ed together:

u R.C=S.C AND R.D=S.D

u Result keeps only one attribute of each such pair:

u Q(A,B,C,D,E)
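
A minimal sketch of Q ← R(A,B,C,D) * S(C,D,E), assuming relations are lists of dictionaries keyed by attribute name. The helper natural_join and the data values are hypothetical.

```python
def natural_join(r, s):
    """Join on every attribute name the two tuples share, keeping one copy of each."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()          # here: {"C", "D"}
            if all(a[k] == b[k] for k in common):  # R.C=S.C AND R.D=S.D
                result.append({**a, **b})          # shared attributes appear once
    return result

R = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 7, "D": 8}]
S = [{"C": 3, "D": 4, "E": 9}, {"C": 7, "D": 0, "E": 10}]

Q = natural_join(R, S)

print(Q)  # [{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 9}]
```

The second tuple of R matches S on C but not on D, so it is dropped. Because the dictionaries are merged, each of C and D appears only once in Q.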
Example of NATURAL JOIN operation

Complete Set of Relational Operations
u The set of operations including SELECT σ, PROJECT π, UNION ∪, DIFFERENCE −,
RENAME ρ, and CARTESIAN PRODUCT X is called a complete set because any
other relational algebra expression can be expressed by a combination of these
six operations.

u For example:

u R ∩ S = (R ∪ S ) – ((R − S) ∪ (S − R))

u R ⋈ <join condition> S = σ <join condition> (R X S)
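
Both identities can be checked on small made-up relations. The sketch assumes relations encoded as Python sets or lists of tuples, so it illustrates the identities on one data set rather than proving them.

```python
from itertools import product

# Identity 1: R ∩ S = (R ∪ S) – ((R – S) ∪ (S – R))
R = {(1, "x"), (2, "y"), (3, "z")}
S = {(2, "y"), (3, "z"), (4, "w")}
derived_intersection = (R | S) - ((R - S) | (S - R))
print(derived_intersection == (R & S))  # True

# Identity 2: a join is a SELECT applied to a CARTESIAN PRODUCT.
A = [(1, "a"), (2, "b")]
B = [(1, "p"), (3, "q")]

def join_condition(t):
    """First attribute of A equals first attribute of B (positions 0 and 2)."""
    return t[0] == t[2]

cartesian = [a + b for a, b in product(A, B)]
join_via_product = [t for t in cartesian if join_condition(t)]
direct_join = [a + b for a in A for b in B if a[0] == b[0]]
print(join_via_product == direct_join)  # True
```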


Binary Relational Operations: DIVISION
u DIVISION Operation

u The division operation is applied to two relations

u R(Z) ÷ S(X), where X ⊆ Z. Let Y = Z – X (and hence Z = X ∪ Y); that is, let Y
be the set of attributes of R that are not attributes of S.

u The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear
in R with tR[Y] = t, and with tR[X] = tS for every tuple tS in S.

u For a tuple t to appear in the result T of the DIVISION, the values in t must
appear in R in combination with every tuple in S.
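
A minimal sketch of DIVISION, assuming relations are lists of dictionaries. The helper divide, the attribute names ESSN and PNO, and the rows are all hypothetical choices for the illustration, with X = {PNO} and Y = {ESSN}.

```python
def divide(r, s, x_attrs, y_attrs):
    """Return the Y-values that appear in r together with every X-tuple of s."""
    r_pairs = {
        (tuple(t[a] for a in y_attrs), tuple(t[a] for a in x_attrs)) for t in r
    }
    s_values = {tuple(t[a] for a in x_attrs) for t in s}
    candidates = {y for y, _ in r_pairs}
    return {y for y in candidates if all((y, x) in r_pairs for x in s_values)}

# R(ESSN, PNO): which employee works on which project (invented rows).
R = [
    {"ESSN": "111", "PNO": 1}, {"ESSN": "111", "PNO": 2},
    {"ESSN": "222", "PNO": 1},
    {"ESSN": "333", "PNO": 1}, {"ESSN": "333", "PNO": 2}, {"ESSN": "333", "PNO": 3},
]
# S(PNO): the projects every result employee must work on.
S = [{"PNO": 1}, {"PNO": 2}]

T = divide(R, S, x_attrs=["PNO"], y_attrs=["ESSN"])

print(sorted(T))  # [('111',), ('333',)]
```

Employee 222 is excluded because it appears with project 1 only, not with every tuple in S.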
Example of DIVISION

Recap of Relational Algebra Operations
