UNIT-I
Database Concepts
A database is a collection of inter-related data that is organized so that the data can be
retrieved, inserted, and deleted efficiently. The data is typically organized in the form of
tables, schemas, views, reports, etc.
For example, a college database organizes data about the admin, staff, students, and
faculty.
What is Data?
The word 'data' originates from the word 'datum', which means 'a single piece of
information'. 'Data' is the plural of 'datum'.
In computing, data is information that has been translated into a form that is efficient
to move and process. Data is interchangeable.
What is Database?
You can organize data into tables, rows, columns, and index it to make it easier to find
relevant information.
Database designers create a database in such a way that a single set of software programs
provides access to the data for all users.
The main purpose of the database is to operate a large amount of information by storing,
retrieving, and managing data.
SQL, or Structured Query Language, is used to operate on the data stored in a database.
SQL is based on relational algebra and tuple relational calculus.
Evolution of Databases
Databases have evolved for more than 50 years, from flat-file systems to relational and
object-relational systems, passing through several generations along the way.
The Evolution
File-Based
File-based databases were introduced in 1968. In file-based databases, data was
maintained in a flat file. Though files have many advantages, they also have several
limitations.
One of the major advantages is that the file system offers various access methods, e.g.,
sequential, indexed, and random.
Fig: Hierarchical Data Model (small circles represent objects)
Like the file system, this model also had some limitations, such as complex
implementation, lack of structural independence, and difficulty handling many-to-many
relationships.
In this model, files are related as owners and members, as in the common network
model.
Relational Database
1970 - Present: This is the era of relational databases and database management. In 1970,
the relational model was proposed by E.F. Codd.
The relational database model has two main terms: instance and schema.
The schema specifies the structure, such as the name of the relation and the name and
type of each column.
This model uses mathematical concepts such as set theory and predicate logic.
During the era of the relational database, many more models were introduced, such as the
object-oriented model and the object-relational model.
o Table
o Record/ Tuple
o Field/Column name /Attribute
o Keys
o Schema
Characteristics of DBMS
Advantages of DBMS
Disadvantages of DBMS
o Cost of Hardware and Software: A DBMS requires a high-speed data processor and a
large memory size to run.
o Size: It occupies a large amount of disk space and memory to run efficiently.
o Complexity: A database system creates additional complexity and requirements.
o Higher impact of failure: A failure has a severe impact because, in most
organizations, all the data is stored in a single database. If that database is
damaged due to a power failure or database corruption, the data may be lost
forever.
An RDBMS is a DBMS that stores data in tables and maintains the security, integrity,
accuracy, and consistency of the data.
Many widely used database management systems, such as MS SQL Server, IBM DB2,
Oracle, MySQL, and Microsoft Access, are based on the relational model.
How it works
The relational database is the most commonly used type of database. It contains several
tables, and each table has its own primary key.
Because the data is organized as a set of tables, it can be accessed easily in an RDBMS.
In 1970, E.F. Codd published a paper proposing the relational database model.
Properties of a Relation:
1 Ajeet 24 B.Tech
2 Aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
What is a row or record?
A row of a table is also called a record or tuple. It contains the specific information of
one entry in the table. It is a horizontal entity in the table. For example, the above table
contains 5 records.
Properties of a row:
1 Ajeet 24 B.Tech
What is a column/attribute?
A column is a vertical entity in the table which contains all information associated with a
specific field in a table. For example, "name" is a column in the above table which
contains all information about a student's name.
Properties of an Attribute:
Name
Ajeet
Aryan
Mahesh
Ratan
Vimal
The smallest unit of data in the table is the individual data item. It is stored at the
intersection of a tuple and an attribute.
1 Ajeet 24 B.Tech
Degree:
The total number of attributes that comprise a relation is known as the degree of the table.
For example, the student table has 4 attributes, and its degree is 4.
1 Ajeet 24 B.Tech
2 Aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
Cardinality:
The total number of tuples at any one time in a relation is known as the table's cardinality.
The relation whose cardinality is 0 is called an empty table.
For example, the student table has 5 rows, and its cardinality is 5.
1 Ajeet 24 B.Tech
2 Aryan 20 C.A
3 Mahesh 21 BCA
4 Ratan 22 MCA
5 Vimal 26 BSC
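As a rough illustration, the degree and cardinality of the student table can be computed
with Python's built-in sqlite3 module. The column names id, name, age, and course are
assumptions made for this sketch; only the "name" column is named in the text.

```python
import sqlite3

# In-memory database; the column names are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Ajeet", 24, "B.Tech"),
        (2, "Aryan", 20, "C.A"),
        (3, "Mahesh", 21, "BCA"),
        (4, "Ratan", 22, "MCA"),
        (5, "Vimal", 26, "BSC"),
    ],
)

# Degree: the number of attributes (columns) in the relation.
cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)

# Cardinality: the number of tuples (rows) currently in the relation.
cardinality = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]

print(degree, cardinality)  # prints: 4 5
```

Deleting every row would bring the cardinality down to 0 (an empty table) while the degree
stays at 4.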
Domain:
The domain refers to the possible values each attribute can contain. It can be specified
using standard data types such as integers, floating-point numbers, etc. For example, an
attribute named Marital_Status may be limited to the values married or unmarried.
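One possible way to express such a domain is a CHECK constraint. The sketch below uses
Python's built-in sqlite3 module; the table name person and the sample rows are
hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK constraint restricts the attribute's domain to two values.
# (Table name and sample data are assumptions for this sketch.)
conn.execute(
    """
    CREATE TABLE person (
        name TEXT NOT NULL,
        marital_status TEXT CHECK (marital_status IN ('married', 'unmarried'))
    )
    """
)
conn.execute("INSERT INTO person VALUES ('Ajeet', 'married')")  # inside the domain

try:
    conn.execute("INSERT INTO person VALUES ('Ratan', 'unknown')")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(rejected, row_count)  # prints: True 1
```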
NULL Values
A NULL value specifies that the field was left blank when the record was created. It is
different from a field that contains zero or a field that contains a space.
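The difference can be seen in a small sketch using Python's built-in sqlite3 module. The
table layout and the remark column are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, remark TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("Ajeet", None, None),  # age and remark left blank, stored as NULL
        ("Aryan", 0, " "),      # age is zero, remark is a single space
    ],
)

def names(sql):
    """Return the names selected by the given query."""
    return [row[0] for row in conn.execute(sql)]

# IS NULL matches only the missing values, not zero or a space.
null_age = names("SELECT name FROM student WHERE age IS NULL")
zero_age = names("SELECT name FROM student WHERE age = 0")
null_remark = names("SELECT name FROM student WHERE remark IS NULL")

print(null_age, zero_age, null_remark)  # prints: ['Ajeet'] ['Aryan'] ['Ajeet']
```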
Data Integrity
The following categories of data integrity exist in each RDBMS:
Domain integrity: It enforces valid entries for a given column by restricting the type, the
format, or the range of values.
Referential integrity: It specifies that rows that are referenced by other records cannot
be deleted.
User-defined integrity: It enforces some specific business rules defined by users. These
rules are different from the entity, domain, and referential integrity rules.
Although DBMS and RDBMS are both used to store information in a physical database,
there are some notable differences between them.
The main differences between DBMS and RDBMS are given below:
9) DBMS: Examples of DBMS are file systems, XML, etc.
   RDBMS: Examples of RDBMS are MySQL, PostgreSQL, SQL Server, Oracle, etc.
After observing the differences between DBMS and RDBMS, you can say that an RDBMS
is an extension of a DBMS. Many software products on the market today are compatible
with both DBMS and RDBMS. In other words, an RDBMS application today is also a
DBMS application, although not every DBMS is relational.
DBMS Architecture
o The DBMS design depends upon its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers and other components that are connected with networks.
o The client/server architecture consists of many PCs and a workstation which are
connected via the network.
o DBMS architecture depends upon how users are connected to the database to get
their request done.
1-Tier Architecture
o In this architecture, the database is directly available to the user. It means the user
works directly on the DBMS and uses it.
o Any changes made here are applied directly to the database itself. It doesn't
provide a handy tool for end users.
o The 1-Tier architecture is used for development of the local application, where
programmers can directly communicate with the database for the quick response.
2-Tier Architecture
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In
this architecture, the client can't directly communicate with the server.
o The application on the client end interacts with an application server, which in turn
communicates with the database system.
o The end user has no idea about the existence of the database beyond the application
server. The database also has no idea about any user beyond the application.
o The 3-Tier architecture is used for large web applications.
The main objective of the three-level architecture is to enable multiple users to access
the same data with a personalized view while storing the underlying data only once. Thus
it separates the user's view from the physical structure of the database. This separation
is desirable for the following reasons:
o The internal level has an internal schema which describes the physical storage
structure of the database.
o The internal schema is also known as a physical schema.
o It uses the physical data model. It is used to define how the data will be stored
in a block.
o The physical level is used to describe complex low-level data structures in detail.
The internal level is generally concerned with the following activities:
2. Conceptual Level
o The conceptual schema describes the design of a database at the conceptual level.
Conceptual level is also known as logical level.
o The conceptual schema describes the structure of the whole database.
o The conceptual level describes what data are to be stored in the database and also
describes what relationship exists among those data.
o In the conceptual level, internal details such as an implementation of the data
structure are hidden.
o Programmers and database administrators work at this level.
3. External Level
o At the external level, a database contains several schemas, which are sometimes
called subschemas. A subschema is used to describe a particular view of the database.
o An external schema is also known as a view schema.
o Each view schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group.
o The view schema describes the end user's interaction with the database system.
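In SQL, an external schema is commonly realised with views. The following is a minimal
sketch using Python's built-in sqlite3 module; the student table layout and the view
name student_course are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, course TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ajeet", 24, "B.Tech"), (2, "Aryan", 20, "C.A")],
)

# External (view) schema: exposes only name and course, hiding id and age.
conn.execute("CREATE VIEW student_course AS SELECT name, course FROM student")

cursor = conn.execute("SELECT * FROM student_course ORDER BY name")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # prints: ['name', 'course']
print(view_rows)     # prints: [('Ajeet', 'B.Tech'), ('Aryan', 'C.A')]
```

A user group that is given access only to the view sees two columns, while the underlying
table is still stored once.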
Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the ability to modify the schema at one level of the
database system without altering the schema at the next higher level.
There are two types of data independence:
o Physical data independence can be defined as the capacity to change the internal
schema without having to change the conceptual schema.
o If we make any changes to the storage size of the database system server, the
conceptual structure of the database will not be affected.
o Physical data independence is used to separate conceptual levels from the internal
levels.
o Physical data independence occurs at the logical interface level.
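One way to picture physical data independence is adding an index, which changes how data
is stored and located but not the table definition or the queries written against it. The
sketch below uses Python's built-in sqlite3 module; the table, data, and index name are
assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ajeet", 24), (2, "Aryan", 20), (3, "Mahesh", 21)],
)

query = "SELECT name FROM student WHERE age > 20 ORDER BY name"
before = conn.execute(query).fetchall()

# Physical-level change: add an index. The table definition and the query are untouched.
conn.execute("CREATE INDEX idx_student_age ON student (age)")
after = conn.execute(query).fetchall()

print(before == after)  # prints: True
```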
Fig: Data Independence
o A DBMS has appropriate languages and interfaces to express database queries and
updates.
o Database languages can be used to read, store and update the data in the database.
o DDL stands for Data Definition Language. It is used to define database structure
or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the
number of tables and schemas, their names, indexes, columns in each table,
constraints, etc.
Here are some tasks that come under DDL:
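A few typical DDL statements (CREATE, ALTER, DROP) can be sketched with Python's built-in
sqlite3 module. The table and index names are hypothetical, and the catalog table
sqlite_master is specific to SQLite; other systems keep their metadata elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(connection):
    """List user tables recorded in SQLite's catalog (its stored metadata)."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]

def column_names(connection, table):
    """PRAGMA table_info returns one row per column; index 1 is the column name."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

# DDL statements define structure only; no rows are involved.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE student ADD COLUMN age INTEGER")
conn.execute("CREATE INDEX idx_student_name ON student (name)")

after_create = table_names(conn)
columns = column_names(conn, "student")

conn.execute("DROP TABLE student")
after_drop = table_names(conn)

print(after_create, columns, after_drop)
# prints: ['student'] ['id', 'name', 'age'] []
```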
DML stands for Data Manipulation Language. It is used for accessing and manipulating
data in a database. It handles user requests.
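As a small sketch of DML, the statements below insert, update, delete, and read rows
using Python's built-in sqlite3 module. The student table and its values are assumptions
made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# DML statements work on the rows, not on the structure.
conn.execute("INSERT INTO student VALUES (1, 'Ajeet', 24)")
conn.execute("INSERT INTO student VALUES (2, 'Aryan', 20)")
conn.execute("UPDATE student SET age = 25 WHERE id = 1")
conn.execute("DELETE FROM student WHERE id = 2")

remaining = conn.execute("SELECT id, name, age FROM student").fetchall()
print(remaining)  # prints: [(1, 'Ajeet', 25)]
```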
o DCL stands for Data Control Language. It is used to control access to the data stored
in the database, for example by granting or revoking user permissions.
o The DCL execution is transactional. It also has rollback parameters.
(But in Oracle database, the execution of data control language does not have the
feature of rolling back.)
TCL stands for Transaction Control Language. It is used to manage the changes made by
DML statements, which can be grouped into a logical transaction.
ACID Properties
1) Atomicity
Atomicity means that a transaction is treated as a single, indivisible unit. If any
operation is performed on the data, it should either be executed completely or not be
executed at all. The operation should not break in between or execute partially.
Example: Remo has account A with a balance of $30, from which he wishes to send $10 to
Sheero's account B, which already holds $100. After the transfer, account B should hold
$110. Two operations take place: the $10 that Remo wants to transfer is debited from his
account A, and the same amount is credited to account B, i.e., Sheero's account. Now
suppose the debit executes successfully but the credit fails. The balance of account A
becomes $20, while the balance of account B remains $100, as it was before.
In this case, $10 has been debited from account A but the amount in account B is still
$100. So, it is not an atomic transaction.
If both the debit and the credit operations complete successfully, the transaction is
atomic.
When a transaction loses atomicity in a bank system, it becomes a serious issue, so
atomicity is a main focus in bank systems.
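The transfer example can be sketched with Python's built-in sqlite3 module, using the
balances from the text. The transfer function and its fail_before_credit flag are
hypothetical helpers that simulate a failure between the debit and the credit; this is
one possible way to obtain atomic behaviour, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 30), ("B", 100)])
conn.commit()

def transfer(connection, source, target, amount, fail_before_credit=False):
    """Debit and credit as one transaction; roll back both on any error."""
    try:
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source)
        )
        if fail_before_credit:
            raise RuntimeError("simulated failure between debit and credit")
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target)
        )
        connection.commit()
        return True
    except Exception:
        connection.rollback()  # undo the debit so no partial result survives
        return False

def balances(connection):
    """Return the current balances as a dict keyed by account name."""
    return dict(connection.execute("SELECT name, balance FROM account"))

failed = transfer(conn, "A", "B", 10, fail_before_credit=True)
after_failure = balances(conn)
succeeded = transfer(conn, "A", "B", 10)
after_success = balances(conn)

print(failed, after_failure)     # prints: False {'A': 30, 'B': 100}
print(succeeded, after_success)  # prints: True {'A': 20, 'B': 110}
```

Because the failed attempt is rolled back, account A never ends up at $20 while account B
is still at $100.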
2) Consistency
Consistency means that the integrity of the data is always preserved. In a DBMS, a
change made to the database must leave the data correct, so that the database is
consistent both before and after the transaction.
Example:
There are three accounts, A, B, and C, and A makes a transaction T to B and then to C,
one after the other. Two operations take place, i.e., debit and credit. Account A first
transfers $50 to account B, and the balance of account A read before this transfer is
$300. After the successful transfer, the available amount in B becomes $150. Next, A
transfers $20 to account C, and this time the balance of A that is read is $250 (which is
correct, as the debit of $50 to B has already completed). The debit and credit from
account A to C also complete successfully. The transaction succeeds and every value is
read correctly. Thus, the data is consistent. If the balance of A were read as $300 for
both transfers, the data would be inconsistent, because the second read would not reflect
the debit that had already executed.
3) Isolation
The term 'isolation' means separation. In a DBMS, isolation is the property that
operations executing concurrently do not affect one another. In effect, the result should
be the same as if one operation had begun only after the other had completed. If two
operations are being performed on different data, neither may affect the value seen by
the other. When two or more transactions occur simultaneously, consistency must be
maintained. Changes made in one transaction are not visible to other transactions until
that change is committed.
Example: If two operations are running concurrently on two different accounts, the value
of neither account should be affected by the other operation. Suppose account A makes
transactions T1 and T2 to accounts B and C, and both execute independently without
affecting each other. This is known as isolation.
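The rule that uncommitted changes stay invisible to other transactions can be observed
with two connections to the same SQLite file, using Python's built-in sqlite3 module. The
file name, the account table, and the amounts are assumptions for this sketch, and other
DBMSs offer several isolation levels that behave differently.

```python
import os
import sqlite3
import tempfile

# A file-based database so that two separate connections can share it.
path = os.path.join(tempfile.mkdtemp(), "bank.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO account VALUES ('A', 300)")
writer.commit()

reader = sqlite3.connect(path)

def read_balance(connection):
    """Read the balance of account A through the given connection."""
    rows = connection.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
    return rows[0][0]

# The writer changes the balance inside a transaction that is not yet committed.
writer.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
seen_before_commit = read_balance(reader)  # the reader still sees the old value

writer.commit()
seen_after_commit = read_balance(reader)   # now the committed value is visible

writer.close()
reader.close()

print(seen_before_commit, seen_after_commit)  # prints: 300 250
```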
4) Durability
Durability means permanence. In a DBMS, durability ensures that once an operation has
executed successfully, its data becomes permanent in the database. The data should be
durable enough that even if the system fails or crashes, the database still survives. If
data does get lost, it is the responsibility of the recovery manager to restore it and
ensure the durability of the database. To make changes permanent, the COMMIT command
must be used every time we make changes.
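The role of COMMIT can be sketched with Python's built-in sqlite3 module and a file-based
database. Closing and reopening the connection stands in for a restart here; it is only an
approximation of a real crash, and the file name and values are assumptions.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('B', 110)")
conn.commit()  # committed: this value must survive the connection going away

conn.execute("UPDATE account SET balance = 0 WHERE name = 'B'")
conn.close()   # closed without COMMIT: the pending update is discarded

reopened = sqlite3.connect(path)
rows = reopened.execute("SELECT balance FROM account WHERE name = 'B'").fetchall()
balance_after_reopen = rows[0][0]
reopened.close()

print(balance_after_reopen)  # prints: 110
```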
Therefore, the ACID properties of a DBMS play a vital role in maintaining the consistency
and availability of data in the database.
Components of ER Diagram
1. Entity:
An entity may be any object, class, person or place. In the ER diagram, an entity is
represented by a rectangle.
a. Weak Entity
An entity that depends on another entity is called a weak entity. The weak entity doesn't
contain any key attribute of its own. The weak entity is represented by a double rectangle.
2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent
an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of many other attributes is known as a composite attribute.
The composite attribute is represented by an ellipse, and its component attributes are
shown as ellipses connected to it.
c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A
double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute.
It is represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute
such as date of birth.
3. Relationship
A relationship is used to describe the relation between entities. A diamond or rhombus is
used to represent the relationship.
a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known
as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.
b. One-to-many relationship
When only one instance of the entity on the left and more than one instance of the entity
on the right are associated with the relationship, it is known as a one-to-many
relationship.
For example, a scientist can make many inventions, but each invention is made by only
one specific scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left and only one instance of the entity
on the right are associated with the relationship, it is known as a many-to-one
relationship.
For example, a student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship
When more than one instance of the entity on the left and more than one instance of the
entity on the right are associated with the relationship, it is known as a many-to-many
relationship.
For example, an employee can be assigned to many projects, and a project can have many
employees.
DBMS
DBMS stands for Database Management System, which is a tool or software used to perform
various operations on a database, such as creating the database, deleting the database,
or updating the current database. To simplify processing and data querying, the most
popular types of databases currently in use typically model their data as rows and
columns in a set of tables. The data may then be handled, updated, regulated, and
structured with ease. For writing and querying data, most databases employ Structured
Query Language (SQL).
Cardinality
Cardinality describes how the entities in a relationship set are related to each other.
In a Database Management System, cardinality is a number that denotes how many times an
entity participates with another entity in a relationship set. Cardinality is a very
important property in representing the structure of a database. For a table, cardinality
means the number of rows or tuples.
Cardinality Ratio
Cardinality ratio is also called Cardinality Mapping, which represents the mapping of
one entity set to another entity set in a relationship set. We generally take the example of
a binary relationship set where two entities are mapped to each other.
1. One to one
2. Many to one
3. One to many
4. Many to many
One to One
One-to-one cardinality is represented by 1:1. Here, each entity is related to at most one
entity in the other entity set. There are many examples of one-to-one cardinality in
real-life databases.
For example, one student can have only one student id, and one student id can belong to
only one student. So, the relationship mapping between student and student id is a
one-to-one cardinality mapping.
Another example is the relationship between the director of the school and the school,
because one school can have a maximum of one director, and one director can belong to
only one school.
Note: In one-to-one cardinality, it is not necessary for every entity in an entity set to
have a mapping. Some entities may not participate in the mapping.
One-to-one cardinality is a subset of many-to-one cardinality. Many-to-one cardinality is
represented by M:1.
For example, there are multiple patients in a hospital who are served by a single doctor,
so the relationship between patients and doctors can be represented by Many to one
Cardinality.
Many-to-many cardinality is represented by M:N or N:M.
One-to-one, one-to-many, and many-to-one cardinalities are subsets of many-to-many
cardinality.
For example, in a college, multiple students can work on a single project, and a single
student can also work on multiple projects. So, the relationship between the project and
the student can be represented by many-to-many cardinality.
Evidently, the real-world context in which the relationship set is modeled determines the
appropriate mapping cardinality for a specific relationship set.
o If the cardinality is one-to-many or many-to-one, the relationship table can be
combined with the table on the "many" side.
o If the cardinality is one-to-one, the relationship table can be combined with an entity
that has total participation, and both entities can be combined with their relationship
to form a single table if both of them have total participation.
o If the cardinality is many-to-many, no two tables can be combined.
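These rules can be sketched as table designs using Python's built-in sqlite3 module. The
patient/doctor and student/project examples come from the text; every table name, column
name, and sample row below is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Many-to-one: the relationship is folded into the 'many' side (patient).
    CREATE TABLE doctor  (doctor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT,
                          doctor_id INTEGER REFERENCES doctor(doctor_id));

    -- Many-to-many: the relationship needs a table of its own.
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE works_on (
        student_id INTEGER REFERENCES student(student_id),
        project_id INTEGER REFERENCES project(project_id),
        PRIMARY KEY (student_id, project_id)
    );

    -- Hypothetical sample data.
    INSERT INTO doctor VALUES (1, 'Doctor One');
    INSERT INTO patient VALUES (1, 'Patient One', 1), (2, 'Patient Two', 1);
    INSERT INTO student VALUES (1, 'Ajeet'), (2, 'Aryan');
    INSERT INTO project VALUES (10, 'Library system'), (20, 'Payroll');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
    """
)

def count(sql):
    """Run a COUNT(*) query and return the single number it yields."""
    return conn.execute(sql).fetchone()[0]

patients_of_doctor = count("SELECT COUNT(*) FROM patient WHERE doctor_id = 1")
projects_of_ajeet = count("SELECT COUNT(*) FROM works_on WHERE student_id = 1")
students_on_project_10 = count("SELECT COUNT(*) FROM works_on WHERE project_id = 10")

print(patients_of_doctor, projects_of_ajeet, students_on_project_10)  # prints: 2 2 2
```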
Keys
Types of keys:
1. Primary key
o It is the key used to identify one and only one instance of an entity uniquely.
An entity can contain multiple keys, as we saw in the PERSON table. The most
suitable of those keys becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. We could also select License_Number or Passport_Number as the primary
key, since they are also unique.
o For each entity, the primary key is selected based on the requirements and the
developers' choice.
2. Candidate key
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of
a candidate key.
4. Foreign key
o A foreign key is a column of a table that points to the primary key of another
table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we don't store the department's
information in the employee table. Instead, we link these two tables through the
primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables
are related.
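The EMPLOYEE and DEPARTMENT link can be sketched with Python's built-in sqlite3 module.
Department_Id and ID come from the text; the name columns and the sample rows are
assumptions. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (Department_Id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE employee (
        ID INTEGER PRIMARY KEY,
        name TEXT,
        Department_Id INTEGER REFERENCES department(Department_Id)
    )
    """
)
conn.execute("INSERT INTO department VALUES (1, 'Accounts')")
conn.execute("INSERT INTO employee VALUES (100, 'Ajeet', 1)")  # valid reference

try:
    conn.execute("INSERT INTO employee VALUES (101, 'Ratan', 99)")  # no department 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets the two tables be joined back together.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e "
    "JOIN department d ON e.Department_Id = d.Department_Id"
).fetchall()

print(orphan_rejected, joined)  # prints: True [('Ajeet', 'Accounts')]
```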
5. Alternate key
There may be one or more attributes or combinations of attributes that uniquely identify
each tuple in a relation. These attributes or combinations of attributes are called the
candidate keys. One key is chosen as the primary key from these candidate keys, and each
remaining candidate key, if any exists, is termed an alternate key. In other words, the
total number of alternate keys is the total number of candidate keys minus one (the
primary key). An alternate key may or may not exist. If there is only one candidate key
in a relation, it does not have an alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the other
candidate key, PAN_No, acts as the alternate key.
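In SQL, an alternate key is often declared with a UNIQUE constraint. The sketch below
uses Python's built-in sqlite3 module; the name column and the placeholder value
'PAN0001' are assumptions and do not follow any real PAN format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        Employee_Id INTEGER PRIMARY KEY,  -- chosen primary key
        PAN_No TEXT NOT NULL UNIQUE,      -- alternate key: still unique per row
        name TEXT
    )
    """
)
conn.execute("INSERT INTO employee VALUES (1, 'PAN0001', 'Ajeet')")

try:
    conn.execute("INSERT INTO employee VALUES (2, 'PAN0001', 'Aryan')")  # duplicate PAN_No
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

candidate_keys = ["Employee_Id", "PAN_No"]
alternate_key_count = len(candidate_keys) - 1  # candidate keys minus the one primary key

print(duplicate_rejected, alternate_key_count)  # prints: True 1
```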
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite
key. This key is also known as a concatenated key.
For example, in an employee relation, we assume that an employee may be assigned
multiple roles and may work on multiple projects simultaneously. So the primary key is
composed of all three attributes, namely Emp_ID, Emp_role, and Proj_ID, in combination.
These attributes act as a composite key, since the primary key comprises more than one
attribute.
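A composite key of this kind can be sketched with Python's built-in sqlite3 module. The
attribute names come from the text; the table name employee_assignment and the sample
rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee_assignment (
        Emp_ID INTEGER,
        Emp_role TEXT,
        Proj_ID INTEGER,
        PRIMARY KEY (Emp_ID, Emp_role, Proj_ID)
    )
    """
)
conn.execute("INSERT INTO employee_assignment VALUES (1, 'developer', 10)")
conn.execute("INSERT INTO employee_assignment VALUES (1, 'tester', 10)")     # other role
conn.execute("INSERT INTO employee_assignment VALUES (1, 'developer', 20)")  # other project

try:
    # The same combination of all three attributes is not allowed twice.
    conn.execute("INSERT INTO employee_assignment VALUES (1, 'developer', 10)")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

assignment_count = conn.execute("SELECT COUNT(*) FROM employee_assignment").fetchone()[0]
print(repeat_rejected, assignment_count)  # prints: True 3
```

No single attribute is unique on its own here; only the combination of the three
identifies a row.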