The document provides an overview of relational database systems, including their purpose, architecture, and the various data models used to represent data. It discusses the roles of database users and administrators, as well as the entity-relationship model which is fundamental for database design. Key concepts such as instances, schemas, attributes, and relationships are explained, along with the importance of data abstraction and constraints in database management.
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0 ratings0% found this document useful (0 votes)
3 views
M1 (1)
The document provides an overview of relational database systems, including their purpose, architecture, and the various data models used to represent data. It discusses the roles of database users and administrators, as well as the entity-relationship model which is fundamental for database design. Key concepts such as instances, schemas, attributes, and relationships are explained, along with the importance of data abstraction and constraints in database management.
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 95
Module1
M O D U L E I : R E L AT I O N A L D ATA B A S E S Introduction
A database-management system (DBMS) is a
collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as the database, contains information relevant to an enterprise. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient. Purpose of Database System
Before database management systems (DBMSs) were
introduced, organizations usually stored information in file-processing system. Keeping organizational information in a file- processing system has a number of major disadvantages: Data redundancy and inconsistency . In addition, it may lead to data inconsistency; Difficulty in accessing data. Data isolation Integrity problems. The data values stored in the database must satisfy certain types of consistency constraints • Atomicity problems. Concurrent-access anomalies. Security problems View of Data
A database system is a collection of interrelated
data and a set of programs that allow users to access and modify these data. A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained. Data Abstraction
developers hide the complexity from users
through several levels of abstraction, to simplify users’ interactions with the system: Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail. Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The user of the logical level does not need to be aware of the complexity of physical-level structures. This is referred to as physical data independence. View level. The highest level of abstraction describes only part of the entire database. Many users of the database system do not need all information; instead, they need to access only a part of the database. The system may provide many views for the same database. Instances and Schemas
Databases change over time as information is
inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema Database systems have several schemas, partitioned according to the levels of abstraction. The physical schema describes the database design at the physical level, while the logical schema describes the database design at the logical level. A database may also have several schemas at the view level, sometimes called sub schemas, that describe different views of the database. Data Models
Underlying the structure of a database is the
data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. A data model provides a way to describe the design of a database at the physical, logical, and view levels. The data models can be classified into four different categories: 1. Relational Model.
The relational model uses a collection of tables
to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record- based model. 2. Entity-Relationship Model.
The entity-relationship (E-R) data model uses a
collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in data base design 3. Object-Based Data Model.
Object-oriented programming (especially in
Java,C++, or C#) has become the dominant software-development methodology . This led to the development of an object-oriented data model that can be seen as extending the E-R model with notions of encapsulation, methods(functions), and object identity. The object-relational data model combines features of the object-oriented data model and relational data model. 4. Semi structured Data Model.
The semi structured data model permits the
specification of data where individual data items of the same type may have different sets of attributes. This is in contrast to the data models mentioned earlier, where every data item of a particular type must have the same set of attributes. The Extensible Markup Language (XML)is widely used to represent semi structured data. Database Architecture
Database applications are usually partitioned
into two or three parts. Database Users and Administrators
A primary goal of a database system is to
retrieve information from and store new information into the database. People who work with a database can be categorized as database users or database administrators Database Users and User Interfaces
There are four different types of database-system users,
differentiated by the way they expect to interact with the system. Different types of user interfaces have been designed for the different types of users. 1. Naıve users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously. For example, a clerk in the university The typical user interface for naıve users is a forms interface, where the user can fill in appropriate fields of the form. 2. Application programmers are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces. 3. Sophisticated users interact with the system without writing programs. In-stead, they form their requests either using a database query language or by using tools such as data analysis software. 4. Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing frame work. Among these applications are computer-aided design systems, knowledge-base and expert systems, systems that store data with complex data types (for example, graphics data and audio data), and environment-modeling systems. Database Administrator
One of the main reasons for using DBMSs is to
have central control of both the data and the programs that access those data. A person who has such central control over the system is called a database administrator (DBA). The functions of a DBA include: Schema definition. The DBA creates the original database schema by executing a set of data definition statements in the DDL. Storage structure and access-method definition. Schema and physical-organization modification. The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or to alter the physical organization to improve performance. Granting of authorization for data access. By granting different types of authorization, the database administrator can regulate which parts of the data base various users can access. The authorization information is kept in a special system structure that the database system consults whenever some one attempts to access the data in the system. Routine maintenance. Examples of the database administrator’s routine maintenance activities are: Periodically backing up the database, either onto tapes or onto remote servers, to prevent loss of data in case of disasters such as flooding. Ensuring that enough free disk space is available for normal operations , and upgrading disk space as required. Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users. The Entity-Relationship Model
The entity-relationship(E-R) data model was
developed to facilitate data base design. The E-R data model employs three basic concepts: entity sets, relationship sets, and attributes. Entity Sets
An entity is a “thing” or “object” in the real
world that is distinguishable from all other objects. For example, each person in a university is an entity. An entity has a set of properties, and the values for some set of properties may uniquely identify an entity. For instance, a person may have a person id property whose value uniquely identifies that person. An entity set is a set of entities of the same type that share the same properties , or attributes. The set of all people who are instructors at a given university, for example, can be defined as the entity set instructor. Similarly, the entity set student might represent the set of all students in the university Entity sets do not need to be disjoint. For example, it is possible to define the entity set of all people in a university (person). A person entity may be an instructor entity, a student entity, both, or neither. Attributes
An entity is represented by a set of attributes.
Attributes are descriptive properties possessed by each member of an entity set. Possible attributes of the instructor entity set are ID ,name , deptname , and salary. Possible attributes of the course entity set are courseid , title , deptname , and credits. Each entity has a value for each of its attributes. Relationship Sets
A relationship is an association among several
entities. For example, we can define a relationship advisor that associates instructor James with student Shankar. This relationship specifies that James is an advisor to student Shankar. A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n≥2 (possibly non distinct) entity sets. If E1,E2,...,En are entity sets, then a relationship set Ris a subset of {(e1,e2,...,en)|e1∈E1,e2∈E2,..., en∈En} where (e1,e2,...,en) is a relationship The association between entity sets is referred to as participation; that is, the entity setsE1,E2,...,En participate in relationship set R. The function that an entity plays in a relationship is called that entity’s role. A relationship may also have attributes called descriptive attributes . Consider a relationship set advisor with entity sets instructor and student. We could associate the attribute date with that relationship to specify the date when an instructor became the advisor of a student. Attributes
For each attribute, there is a set of permitted
values, called the domain,or value set, of that attribute. The domain of attribute courseid might be the set of all text strings of a certain length. An attribute, as used in the E-R model, can be characterized by the following attribute types Simple and composite attributes. Single-valued and multi valued attributes. Derived attribute. Simple and composite attributes.
Simple- that is, they have not been divided into
subparts. Composite attributes- can be divided into subparts (that is, other attributes). For example, an attribute name could be structured as a composite attribute consisting of first name ,middle initial , and last name. Single-valued and multi valued attributes.
single value for a particular entity.
For instance, the studentID attributefor a specific student entity refers to only one studentID. Such attributes are said to be single valued. multi valued attributes -an attribute has a set of values for a specific entity. Eg: a phone number attribute. Derived attribute.
The value for this type of attribute can be
derived from the values of other related attributes or entities. Eg: age Constraints
An E-R enterprise schema may define certain
constraints to which the contents of a database must conform. Mapping Cardinalities
Mapping cardinalities, or cardinality ratios,
express the number of entities to which another entity can be associated via a relationship set. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following: One-to-one. One-to-many. Many-to-one. Many-to-many. One-to-one. An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A One-to-many. An entity in A is associated with any number (zero or more)of entities in B. An entity in B, however, can be associated with at most one entity in A. Many-to-one. An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A. Many-to-many. An entity in A is associated with any number (zero or more)of entities in B, and an entity in B is associated with any number (zero or more) of entities in A. Participation Constraints
The participation of an entity set E in a
relationship set R is said to be total if every entity in E participates in at least one relationship in R. If only some entities in E participate in relationships in R, the participation of entity set E in relationship R is said to be partial. Keys
The values of the attribute values of an entity
must be such that they can uniquely identify the entity. Entity-Relationship Diagrams
An E-R diagram can express the overall logical
structure of a database graphically. Basic Structure An E-R diagram consists of the following major components Rectangles divided into two parts :represent entity sets. The first part, contains the name of the entity set. The second part contains the names of all the attributes of the entity set. Diamonds represent relationship sets. Undivided rectangles represent the attributes of a relationship set. Attributes that are part of the primary key are underlined. Lines link entity sets to relationship sets. Dashed lines link attributes of a relationship set to the relationship set. Double lines indicate total participation of an entity in a relationship set Double diamonds represent identifying relationship sets linked to weak entity sets Roles Weak Entity Sets
An entity set that does not have sufficient
attributes to form a primary key is termed a weak entity set. An entity set that has a primary key is termed a strong entity set. For a weak entity set to be meaningful, it must be associated with another entity set, called the identifying or owner entity set. Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set. The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship The identifying relationship is many-to-one from the weak entity set to the identifying entity set, and the participation of the weak entity set in the relationship is total. The identifying relationship set should not have any descriptive attributes, since any such attributes can instead be associated with the weak entity set. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. The primary key of a weak entity set is formed by the primary key of the identifying entity set, plus the weak entity set’s discriminator. In E-R diagrams, a weak entity set is depicted via a rectangle, like a strong entity set, but there are two main differences: The discriminator of a weak entity is underlined with a dashed, rather than a solid, line. The relationship set connecting the weak entity set to the identifying strong entity set is depicted by a double diamond. Problem A company database needs to store information about employees (identified by ssn, with salary and phone as attributes), departments (identified by dno, with dname and budget as attributes), and children of employees (with name and age as attributes). Problem Employees work in departments; each department is managed by an employee; a child must be identified uniquely by name when the parent (who is an employee; assume that only one parent works for the company) is known. We are not interested in information about a child once the parent leaves the company. Draw an ER diagram that captures this information. Introduction to the RelationalModel
Structure of Relational Databases
A relational database consists of a collection of tables, each of which is assigned a unique name. For example, consider the instructor table,which stores information about instructors. The table has four column headers: ID, name, deptname, and salary. Each row of this table records information about an instructor. in the relational model the term relation is used to refer to a table, while the term tuple is used to refer to a row. Similarly, the term attribute refers to a column of a table. the term relation instance to refer to a specific instance of a relation, i.e., containing a specific set of rows. For each attribute of a relation, there is a set of permitted values, called the domain of that attribute. The domain of the name attribute is the set of all possible instructor names. for all relations r, the domains of all attributes of r be atomic. A domain is atomic if elements of the domain are considered to be indivisible units. Database Schema
The database schema, which is the logical design
of the database, and the database instance, which is a snapshot of the data in the database at a given instant in time. In general, a relation schema consists of a list of attributes and their corresponding domains. department(deptname,building,budget) Keys
Super key is a set of one or more attributes that, taken
collectively, allow us to identify uniquely a tuple in the relation. Such minimal super keys are called candidate keys The term primary key to denote a candidate key that is chosen by the database designer as the principal means of identifying tuples within a relation. A relation, say r1, may include among its attributes the primary key of an other relation, say r2. This attribute is called a foreign key from r1, referencing r2. Relational Query Languages
A query language is a language in which a user requests
information from the database. These languages are usually on a level higher than that of a standard programming language. Query languages can be categorized as either procedural or nonprocedural. In a procedural language, the user instructs the system to perform a sequence of operations on the database to compute the desired result. In a non procedural language, the user describes the desired information with out giving a specific procedure for obtaining that information The Relational Algebra
The relational algebra is a procedural query
language. It consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations in the relational algebra are select, project, union , set difference ,Cartesian product , and rename. In addition to the fundamental operations, there are several other operations—namely, set intersection , naturaljoin ,and assignment. Fundamental Operations
The select, project, and rename operations
are called unary operations, because they operate on one relation. The other three operations operate on pairs of relations and are, therefore, called binary operations The Select Operation
The select operation selects tuples that satisfy a
given predicate. We use the lower case Greek letter sigma (σ) to denote selection. The predicate appears as a subscript to σ . The argument relation is in parentheses after the σ Thus, to select those tuples of the instructor relation where the instructor is in the “Physics” department, we write: σ deptname=“Physics” (instructor) We can find all instructors with salary greater than $90,000 by writing: σ salary>90000(instructor) we allow comparisons using=,=,<,≤,>, and ≥ in the selection predicate. Furthermore, we can combine several predicates into a larger predicate by using the connectives and (∧),or(∨), and not(¬). Thus, to find the instructors in Physics with a salary greater than $90,000, we write: σ deptname=“Physics”∧salary>90000 (instructor) The Project Operation
Projection is denoted by the uppercase Greek
letter pi (Π). We list those attributes that we wish to appear in the result as a subscript to Π . The argument relation follows in parentheses. Π ID,name,salary(instructor) Composition of Relational Operations
“Find the name of all instructors in the Physics
department.” Πname(σdeptname=“Physics” (instructor)) The Union Operation To find the set of all courses taught in the Fall 2009 semester, we write: Πcourseid(σsemester=“Fall”∧year=2009(section)) To find the set of all courses taught in the Spring 2010 semester, we write: Π courseid(σsemester=“Spring”∧year=2010(section)) Πcourseid(σsemester=“Fall”∧year=2009(section)) U Π (σsemester=“Spring”∧year=2010(section)) courseid The Set-Difference Operation
The set-difference operation, denoted by −,
allows us to find tuples that are in one relation but are not in another. The expression r−s produces a relation containing those tuples in r but not in s. Courses offered in the Fall 2009 semester but not in Spring 2010 semester. Πcourseid(σsemester=“Fall”∧year=2009(section)) - Π courseid(σsemester=“Spring”∧year=2010(section)) The Cartesian-Product Operation
The Cartesian-product operation, denoted by a
cross (×), allows us to combine information from any two relations. We write the Cartesian product of relations r1 and r2 as r1×r2. The Rename Operation
The rename operator, denoted by the lowercase
Greek letter rho (ρ), ρ x(E) Formal Definition of the Relational Algebra
Let E1 and E2 be relational-algebra expressions.
Then, the following are all relational-algebra expressions: •E1∪E2 •E1−E2 •E1×E2 •σP(E1), where P is a predicate on attributes inE1 •ΠS(E1), where S is a list consisting of some of the attributes inE1 ρx(E1), where x is the new name for the result of E1 Additional Relational-Algebra Operations
The Set-Intersection Operation
Suppose that we wish to find the set of all courses taught in both theFall 2009 and the Spring 2010 semesters. Using set intersection, we can write Π courseid (σ semester=“Fall”∧year=2009(section)) ∩ Π courseid (σ semester=“Spring”∧year=2010(section)) Note that we can rewrite any relational-algebra expression that uses set intersection by replacing the intersection operation with a pair of set-difference operations as: r∩s=r−(r−s) The Natural-Join Operation
The natural join is a binary operation that allows
us to combine certain selections and a Cartesian product into one operation. It is denoted by the join symbol .
The natural-join operation forms a Cartesian
product of its two arguments,performs a selection forcing equality on those attributes that appear in both rela-tion schemas, and finally removes duplicate attributes. “Find the names of all instructors together with the courseid of all courses they taught. The Assignment Operation
It is convenient at times to write a relational-
algebra expression by assigning parts of it to temporary relation variables. The assignment operation, denoted by ←, works like assignment in a programming language. OUTER JOINs
Notice that much of the data is lost when applying a
join to two relations. In some cases this lost data might hold useful information. An outer join retains the information that would have been lost from the tables, replacing missing data with nulls. There are three forms of the outer join, depending on which data is to be kept. LEFT OUTER JOIN - keep data from the left-hand table RIGHT OUTER JOIN - keep data from the right-hand table FULL OUTER JOIN - keep data from both tables 1. Write a relational algebra expression that returns the food items required to cook the recipe “Pasta and Meat-balls”. For each such food item return the item paired with the number of ounces required by the recipe. 2. Write a relational algebra expression that returns food items that are sold at “Aldi” and their price 3. Write a relational algebra expression that returns food items (item) that are of type “Wheat product” or oftype “Meat” and have at least 20 calories per ounce (attribute calories)