Data Base Complete
Data Model
Object-based models:
1. ER Model
2. Object-oriented Model
Record-based models:
1. Relational
2. Network
3. Hierarchical
Many data models have been proposed, and they are broadly grouped into the categories above. Database designers start by choosing a particular model for the database they are building. Two popular models are the Entity-Relationship (ER) model and the object-oriented model.
• ER Model
The E/R model focuses on entities, relationships, and attributes, which database designers use to model data. Conceptual data models use concepts such as
entities, attributes, and relationships. An entity represents a real-world object or
concept, such as an employee or a project from the miniworld that is described in
the database. An attribute represents some property of interest that further
describes an entity, such as the employee’s name or salary. A relationship among
two or more entities represents an association among the entities, for example, a
works-on relationship between an employee and a project.
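As a minimal sketch (the table and column names are hypothetical, not from the notes), the Employee and Project entities and the works-on relationship could be mapped to relational tables with Python's built-in sqlite3 module: each entity becomes a table, its attributes become columns, and the many-to-many relationship becomes a table of key pairs.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity: Employee, with attributes name and salary
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        salary  REAL
    );
    -- Entity: Project
    CREATE TABLE project (
        proj_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- Relationship: works_on links employees to projects (many-to-many)
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000)")
conn.execute("INSERT INTO project VALUES (10, 'Payroll')")
conn.execute("INSERT INTO works_on VALUES (1, 10)")

# Follow the relationship from an employee to the projects they work on.
rows = conn.execute("""
    SELECT e.name, p.title
    FROM employee e
    JOIN works_on w ON w.emp_id = e.emp_id
    JOIN project p ON p.proj_id = w.proj_id
""").fetchall()
print(rows)  # [('Asha', 'Payroll')]
```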
Database Programming
Database programming involves designing and maintaining a database for an
application. Best practices include establishing relationships between different
data sets and testing for errors and duplicate records. Retrieving instances of data
from the database is another key responsibility in database programming. Efficient
database design, implementation, and programming, together with efficient use of SQL,
are among the most critical factors in the performance of a web-based application. Poorly
written queries can severely degrade database performance. Because in many organizations
power users access the production databases via reporting tools and direct
queries, efficiently written SQL not only results in better application performance
but also reduces traffic on the network.
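The duplicate-record checking mentioned above can be sketched with an aggregate query; the table, column, and data are hypothetical. The values are passed as query parameters rather than pasted into the SQL string, a common habit in well-written database code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customer (email) VALUES (?)",  # parameterized insert
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)

# Emails that occur more than once are candidate duplicate records.
duplicates = conn.execute("""
    SELECT email, COUNT(*) AS n
    FROM customer
    GROUP BY email
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)  # [('a@example.com', 2)]
```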
Programming Languages
Our programmers can work with essentially any programming language; however,
we focus on extending our expertise in the languages that are most efficient for
web-based applications.
JavaScript
JavaScript is an object-oriented scripting language that interacts with website
visitors and instantly responds to what they do without reloading the page,
which makes pages feel more dynamic and gives feedback to the user.
JavaScript is growing in popularity due to its gentle learning curve relative to the
amount of power it provides. An alternative to CGI (Common Gateway Interface)
scripting, JavaScript is designed for the web.
Perl
Perl is one of the most popular server-side scripting languages for writing CGI programs.
Perl programs, or scripts, are text files that are parsed (read and executed) by a
program called an interpreter on the server. Because Perl is an interpreted language,
it is easy to build and test simple programs.
File Organization:
File organization describes how records are stored in blocks and how the blocks
are placed on the storage medium. Files of fixed-length records are easier to
implement than files of variable-length records.
Objectives of file organization
o Records can be selected optimally, i.e., as fast as possible.
o Insert, delete, and update operations on records should be quick and easy.
o Insert, update, and delete operations should not introduce duplicate records.
o Records should be stored efficiently to minimize storage cost.
CIT-503(Database Administration and Management )
Hash File Organization
When a record has to be retrieved using the hash key columns, the address is
generated from the hash key, and the whole record is retrieved using that address.
In the same way, when a new record has to be inserted, the address is generated
using the hash key and the record is inserted directly at that address. The same
process is applied for delete and update. With this method, there is no need to
search or sort the entire file, and records are stored at scattered (effectively
random) locations determined by the hash function.
Cluster File Organization
In cluster file organization, records from two or more related tables are stored in
the same file. Rows of these tables are kept together in the same data blocks, and
the key attributes used to join the tables are stored only once. This lowers the cost
of searching several files for related records.
Cluster file organization is used when tables are frequently joined on the same
condition and these joins return only a few records from both tables.
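The clustering idea can be sketched in memory as follows; the two tables, their rows, and the use of a dictionary as a stand-in for a data block are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical rows from two tables that are frequently joined on dept_id.
departments = [(1, "Sales"), (2, "HR")]
employees = [("Asha", 1), ("Ravi", 2), ("Meena", 1)]

# One "block" per cluster key: the key dept_id is stored once per block,
# and rows from both tables that share that key are placed together.
clusters = defaultdict(lambda: {"department": None, "employees": []})
for dept_id, dept_name in departments:
    clusters[dept_id]["department"] = dept_name
for emp_name, dept_id in employees:
    clusters[dept_id]["employees"].append(emp_name)

# Joining one department with its employees reads a single block.
block = clusters[1]
joined = [(name, block["department"]) for name in block["employees"]]
print(joined)  # [('Asha', 'Sales'), ('Meena', 'Sales')]
```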
Distributed Database
A distributed database is a database that is not limited to one system; it is spread
over different sites, i.e., over multiple computers or a network of computers. The
sites of a distributed database system do not share physical components. This may
be required when a particular database needs to be accessed by users around the
world. The system must be managed so that it looks like one single database to its
users.
Heterogeneous distributed DBMSs have local users, while homogeneous distributed
DBMSs do not.
Data can be stored at different sites in two ways:
1. Replication.
2. Fragmentation.
Replication
• As the name suggests, the system stores copies of data at different sites. If
an entire database is available on multiple sites, it is a fully redundant
database.
Fragmentation
In fragmentation, the relations are fragmented, which means they are split
into smaller parts. Each fragment is stored at the site where it is required.
The data is not replicated, and no copies are created. Because no copies are
kept, fragmentation makes it easier to keep the data consistent.
The prerequisite for fragmentation is to make sure that the fragments can
later be reconstructed into the original relation without losing any data.
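The reconstruction requirement can be checked on a toy relation. This sketch assumes two common kinds of fragmentation, horizontal (splitting rows) and vertical (splitting columns); the EMPLOYEE data and site names are hypothetical.

```python
# Hypothetical EMPLOYEE relation as a list of rows.
employee = [
    {"emp_id": 1, "name": "Asha", "city": "Delhi", "salary": 52000},
    {"emp_id": 2, "name": "Ravi", "city": "Mumbai", "salary": 48000},
    {"emp_id": 3, "name": "Meena", "city": "Delhi", "salary": 61000},
]

# Horizontal fragmentation: split rows by a predicate; each site stores its rows.
site_delhi = [r for r in employee if r["city"] == "Delhi"]
site_other = [r for r in employee if r["city"] != "Delhi"]

# Reconstruction is the union of the fragments.
rebuilt_h = sorted(site_delhi + site_other, key=lambda r: r["emp_id"])

# Vertical fragmentation: split columns. Each fragment keeps the key emp_id,
# so the fragments can be joined back without losing data.
frag_personal = [{"emp_id": r["emp_id"], "name": r["name"], "city": r["city"]} for r in employee]
frag_payroll = [{"emp_id": r["emp_id"], "salary": r["salary"]} for r in employee]

salary_by_id = {r["emp_id"]: r["salary"] for r in frag_payroll}
rebuilt_v = [{**p, "salary": salary_by_id[p["emp_id"]]} for p in frag_personal]

print(rebuilt_h == employee, rebuilt_v == employee)  # True True
```

If the vertical fragments did not both carry emp_id, there would be no way to match a salary to its employee, which is exactly the lossless-reconstruction condition stated above.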
Techniques such as indexing, together with improvements in CPU and RAM hardware,
have made database systems very fast. Even so, new DBMS strategies and more
modern approaches continue to be developed.
1. Cloud Database
Databases in the cloud are not a new concept. Many organizations have adopted them
at some point in their application life cycle. However, the trend we see now is
the adoption of native cloud support for databases. These are databases that are
built with the cloud’s advantages in mind. Your grocery store, bank, restaurant,
online shopping sites, hospital, favorite clothing store and mobile service
provider.
3. Graph Database
Transaction Processing
o A transaction is a set of logically related operations. It contains a group of
tasks.
o A transaction is an action or series of actions. It is performed by a single
user to access the contents of the database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's
account. This small transaction consists of several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
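The same transfer can be sketched as a single database transaction using Python's sqlite3 module; the account table and Y's starting balance of 1000 are assumptions. The connection's context manager commits when the block finishes and rolls back if it raises, so the debit and the credit happen together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 4000), ("Y", 1000)])
conn.commit()

def transfer(conn, src, dst, amount):
    # "with conn" commits on success and rolls back on an exception,
    # so both updates belong to one transaction.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))

transfer(conn, "X", "Y", 800)
balances = conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall()
print(balances)  # [('X', 3200), ('Y', 1800)]
```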
Operations of Transaction:
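Read(X): The read operation reads the value of X from the database and
stores it in a buffer in main memory.
Write(X): The write operation writes the value back to the database from
the buffer.
For example, assume the value of X before the transaction starts is 4000, and
consider a debit transaction that performs the following operations: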
1. R(X);
2. X = X - 500;
3. W(X);
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation decreases the value of X by 500, so the buffer will
contain 3500.
o The third operation writes the buffer's value to the database, so X's final
value will be 3500.
However, a hardware, software, or power failure may cause the transaction to fail
before it finishes all the operations in the set.
For example, if the debit transaction above fails after executing operation 2, X's
value will remain 4000 in the database, which is not acceptable to the bank.
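A small sketch of how a DBMS avoids leaving a half-finished transaction behind, again using sqlite3 with a hypothetical account table holding X = 4000. The failure is simulated by raising an exception after the debit has executed but before the transaction commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 4000)")
conn.commit()

def debit_with_failure(conn):
    with conn:  # rolls back automatically if an exception escapes the block
        # R(X), X = X - 500 and W(X), expressed as one UPDATE statement
        conn.execute("UPDATE account SET balance = balance - 500 WHERE name = 'X'")
        # Simulated failure before the transaction can commit
        raise RuntimeError("simulated power failure")

try:
    debit_with_failure(conn)
except RuntimeError:
    pass

balance = conn.execute("SELECT balance FROM account WHERE name = 'X'").fetchone()[0]
print(balance)  # 4000: the uncommitted debit was rolled back
```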
Transaction property
A transaction has four properties: Atomicity, Consistency, Isolation, and Durability
(ACID). They are used to maintain consistency in a database before and after the
transaction.
Properties of Transactions
1. Atomicity
• It states that either all operations of the transaction take place, or none do
and the transaction is aborted.
• There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either runs to completion or is not
executed at all.
Abort: If a transaction aborts, none of the changes it made are visible.
Commit: If a transaction commits, all the changes it made are visible.
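T1                T2
Read(A)           Read(B)
A := A - 100      B := B + 100
Write(A)          Write(B)
Here a transfer of 100 from A to B consists of two parts, T1 and T2. If the
transaction fails after T1 completes but before T2 completes, 100 is deducted
from A but never added to B, which leaves the database inconsistent.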
2. Consistency
o The integrity constraints are maintained so that the database is consistent
before and after the transaction.
o The execution of a transaction will leave a database in either its prior stable
state or a new stable state.
o The consistent property of database states that every transaction sees a
consistent database instance.
o The transaction is used to transform the database from one consistent
state to another consistent state. For example, the total amount must be the
same before and after the transaction.
3. Isolation
o It states that the data used during the execution of a transaction cannot be
used by a second transaction until the first one is completed.
o In isolation, if transaction T1 is being executed and is using data item X,
then X can't be accessed by any other transaction T2 until T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation
property.
4. Durability
o The durability property ensures that the database's consistent state persists:
the changes made by a committed transaction are permanent.
o They cannot be lost through the erroneous operation of a faulty transaction or
through a system failure. When a transaction completes, the database reaches
a consistent state, and that consistent state cannot be lost, even in the event
of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability
property.
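Durability can be illustrated, loosely, with a file-backed SQLite database: a committed value is still there when a new connection opens the file. This sketch only closes and reopens the connection, which is a much gentler event than a real crash, and the file name and table are hypothetical.

```python
import os
import sqlite3
import tempfile

# A file-backed database, so the data outlives the connection (unlike ":memory:").
path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('X', 3500)")
conn.commit()  # after COMMIT returns, the change is recorded in the database file
conn.close()   # stands in for the session ending

# A fresh connection still sees the committed value.
conn2 = sqlite3.connect(path)
balance = conn2.execute("SELECT balance FROM account WHERE name = 'X'").fetchone()[0]
conn2.close()
print(balance)  # 3500
```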
Transaction States
A transaction may go through a subset of five states: active, partially committed,
committed, failed, and aborted.
• Active − The initial state where the transaction enters is the active state.
The transaction remains in this state while it is executing read, write or
other operations.
• Partially Committed − The transaction enters this state after the last
statement of the transaction has been executed.
• Committed − The transaction enters this state after it completes successfully
and the system checks have issued the commit signal.
• Failed − The transaction goes from partially committed state or active state
to failed state when it is discovered that normal execution can no longer
proceed or system checks fail.
• Aborted − This is the state after the transaction has been rolled back following
a failure and the database has been restored to the state it was in before the
transaction began.
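The five states and the moves between them can be sketched as a small state machine. The class and method names are hypothetical; the transition table is read off the state descriptions above.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"

# Allowed moves between the five states.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),  # terminal state
    State.ABORTED: set(),    # terminal state
}

class Transaction:
    def __init__(self):
        self.state = State.ACTIVE  # every transaction starts in the active state
        self.history = [self.state]

    def move_to(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

ok = Transaction()
ok.move_to(State.PARTIALLY_COMMITTED)  # last statement has executed
ok.move_to(State.COMMITTED)            # commit signal issued

bad = Transaction()
bad.move_to(State.FAILED)              # normal execution cannot proceed
bad.move_to(State.ABORTED)             # rolled back, database restored
print(ok.state.value, bad.state.value)  # committed aborted
```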
Schedule
A schedule is a series of operations from one or more transactions. A schedule
can be of two types:
Serial Schedule: When one transaction executes completely before another
transaction starts, the schedule is called a serial schedule. A serial schedule is
always consistent, provided each individual transaction preserves consistency. A
serial schedule has low throughput and poor resource utilization.
Concurrent Schedule: When operations of a transaction are interleaved with
operations of other transactions in a schedule, the schedule is called a
concurrent schedule. However, concurrency can lead to inconsistency in the
database.
Concurrency Control
When several transactions execute concurrently without any rules or protocols,
various problems arise that may harm the integrity of the database. These are
known as concurrency control problems. Therefore, rules known as concurrency
control protocols are designed to maintain consistency while transactions
execute concurrently.
A transaction is a single logical unit of work that can retrieve or change
the data of a database. Executing each transaction individually increases
the waiting time for the other transactions, and the overall execution also gets
delayed. Hence, to increase throughput and reduce waiting time,
transactions are executed concurrently.
Example: Suppose five trains have to travel between two railway stations, A and B.
If all the trains are lined up and only one train is allowed to move from station A
to B while the others wait for it to reach its destination, it will take a long time
for all the trains to make the journey. To reduce the time, all the trains should be
allowed to move concurrently from station A to B, while ensuring there is no risk
of collision between them.
• In a multi-user system, multiple users can access and use the same
database at one time; this is known as concurrent execution. It means that
the same database is accessed simultaneously by different users of a
multi-user system.
The dirty read problem occurs when a transaction reads data that has been
updated by another transaction that has not yet committed. It arises when
uncommitted transactions execute concurrently.
For example, transaction A reads the value of data item DT as 1000 and modifies it
to 1500, which is stored in a temporary buffer. Transaction B reads DT as 1500 and
commits, so the value of DT is permanently changed to 1500 in the database DB.
Then a server error occurs in transaction A, and it has to roll back to DT's initial
value, 1000. Transaction B has already committed a value that never became
valid, which is the dirty read problem.
The unrepeatable read problem occurs when a transaction reads the same data item
more than once and gets different values, because another transaction changed it
in between.
The lost update problem occurs when two different transactions update the same
data item and one update overwrites the other, so the first update is lost.
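The lost update problem can be reproduced with a toy schedule interpreter in which each transaction reads into its own local buffer before writing back. The transactions, amounts, and starting value of X are hypothetical; running the same steps in a serial order gives the expected result.

```python
def run_schedule(schedule, db):
    """Execute a schedule step by step.

    Each step is (txn, op, item, fn): "R" copies db[item] into the transaction's
    local buffer, "C" applies fn to the buffered value, "W" writes the buffer back.
    """
    local = {}
    for txn, op, item, fn in schedule:
        buf = local.setdefault(txn, {})
        if op == "R":
            buf[item] = db[item]
        elif op == "C":
            buf[item] = fn(buf[item])
        elif op == "W":
            db[item] = buf[item]
    return db

# T1 adds 100 to X and T2 subtracts 50; X starts at 1000.
interleaved = [
    ("T1", "R", "X", None),
    ("T2", "R", "X", None),               # T2 reads X before T1 writes it
    ("T1", "C", "X", lambda v: v + 100),
    ("T2", "C", "X", lambda v: v - 50),
    ("T1", "W", "X", None),               # X = 1100
    ("T2", "W", "X", None),               # X = 950: T1's update is lost
]
serial = [interleaved[i] for i in (0, 2, 4, 1, 3, 5)]  # all of T1, then all of T2

print(run_schedule(interleaved, {"X": 1000}))  # {'X': 950}
print(run_schedule(serial, {"X": 1000}))       # {'X': 1050}
```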
Advantages of Concurrency
In general, concurrency means that more than one transaction can work on a
system at the same time. The advantages of a concurrent system are:
• Waiting Time: Waiting time is the time a process spends in the ready state
without getting the system to execute. Concurrency leads to less waiting
time.
• Response Time: Response time is the time it takes to get the first response
from the CPU. Concurrency leads to less response time.
Disadvantages of Concurrency
• Deadlocks: Deadlocks can occur when two or more transactions are waiting
for each other to release resources, causing a circular dependency that can
prevent any of the transactions from completing. Deadlocks can be difficult
to detect and resolve, and can result in reduced throughput and increased
latency.
Steps
Query processing can be divided into compile-time and run-time phases. The
compile-time phase includes:
2. Query Optimization
Finally, the query processor checks whether the meaning of the query is correct:
for example, whether the table(s) mentioned in the query exist in the database,
and whether the column(s) referred to actually exist in those tables
(semantic analysis).
Query:
SELECT emp_name
FROM employee
WHERE salary > 10000;
• The name of the queried table is looked up in the data dictionary table.
• The names of the columns mentioned in the tokens (emp_name and salary)
are validated for existence.
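SQLite performs the same kind of semantic check, which can be observed from Python's sqlite3 module. This sketch uses the notes' employee table (the emp_id column and the misspelled emp_nme are assumptions for illustration) and `PRAGMA table_info` to list the columns the catalog knows about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, salary REAL)")

def columns_of(conn, table):
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

print(columns_of(conn, "employee"))  # ['emp_id', 'emp_name', 'salary']

# A query naming a column that does not exist is rejected before it runs.
error_message = None
try:
    conn.execute("SELECT emp_nme FROM employee WHERE salary > 10000")
except sqlite3.OperationalError as e:
    error_message = str(e)
print(error_message)  # no such column: emp_nme
```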
The next step is to translate the generated set of tokens into a relational algebra
query. Relational algebra expressions are easy for the optimizer to handle in later steps.
2) Query Evaluation
The next step is to apply certain rules and algorithms to generate other efficient
data structures that help in constructing query evaluation plans. For example, if
a relational graph was constructed, there could be multiple paths from source to
destination, and a query execution plan is generated for each path.
3) Query Optimization
In the next step, the DBMS picks the most efficient evaluation plan based on the
cost of each plan. The aim is to minimize query evaluation time. The optimizer also
evaluates the use of the indexes present on the table and the columns being used.
It also determines the best order in which to execute subqueries, so that only the
best plan is executed.
Simply put, for any query, there are multiple evaluation plans to execute it.
Choosing the one which costs the least is called Query Optimization in DBMS.
Some of the factors the optimizer weighs to calculate the cost of a query
evaluation plan are:
• CPU time
• number of operations
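Choosing the cheapest plan can be sketched as taking the minimum of a cost function. The two plans, their estimates, and the weights that combine CPU time and number of operations are all made up for illustration; real optimizers derive their estimates from table statistics.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    cpu_time_ms: float   # estimated CPU time
    num_operations: int  # estimated number of operations

# Hypothetical weights that turn the two factors into a single cost number.
CPU_WEIGHT = 1.0
OP_WEIGHT = 0.01

def cost(plan: Plan) -> float:
    return CPU_WEIGHT * plan.cpu_time_ms + OP_WEIGHT * plan.num_operations

# Hypothetical alternative plans for: SELECT emp_name FROM employee WHERE salary > 10000
plans = [
    Plan("full table scan, then filter salary > 10000", cpu_time_ms=40.0, num_operations=5000),
    Plan("index range scan on salary", cpu_time_ms=6.0, num_operations=800),
]

best = min(plans, key=cost)  # costs: 90.0 and 14.0
print(best.name)  # index range scan on salary
```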
Database System
Components of SQL
Security is the priority: DCL (Data Control Language) commands such as GRANT and
REVOKE control access permissions, ensuring that only authorized users can access the data.
TCL (Transaction Control Language) commands such as COMMIT and ROLLBACK direct the
flow of transactions, ensuring data integrity.
SQL Commands
SELECT commands are used to retrieve data.
INSERT, UPDATE, and DELETE commands modify data entries. JOIN clauses combine
tables, forming associations that improve data retrieval.
WHERE condition
There are many popular RDBMS available to work with. Some of the most popular
RDBMS are listed below −
• MySQL
• MS SQL Server
• ORACLE
• MS ACCESS
• PostgreSQL
• SQLite
A join is a query that combines records from two or more tables. A join will be
performed whenever multiple tables appear in the FROM clause of the query. The
select list of the query can select any columns from any of these tables. If the join
condition is omitted or invalid, a Cartesian product is formed. If any two of
these tables have a column name in common, you must qualify these columns
throughout the query with table or table alias names to avoid ambiguity. Most
join queries contain at least one join condition, either in the FROM clause or in
the WHERE clause.
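A minimal sketch with sqlite3, using hypothetical emp and dept tables, of a join whose condition sits in the WHERE clause, and of the Cartesian product that results when the condition is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'HR');
    INSERT INTO emp  VALUES (10, 'Asha', 1), (11, 'Ravi', 2), (12, 'Meena', 1);
""")

# Join condition in the WHERE clause; dept_id is qualified with table aliases
# because both tables have a column with that name.
joined = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM emp e, dept d
    WHERE e.dept_id = d.dept_id
    ORDER BY e.emp_id
""").fetchall()
print(joined)  # [('Asha', 'Sales'), ('Ravi', 'HR'), ('Meena', 'Sales')]

# Without the join condition the result is the Cartesian product: 3 x 2 = 6 rows.
product = conn.execute("SELECT e.emp_name, d.dept_name FROM emp e, dept d").fetchall()
print(len(product))  # 6
```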
Advantages Of Joins:
• Joins generally execute faster: a query written with joins usually retrieves
data faster than the equivalent query written with a subquery.
• By using one join query instead of multiple queries, you can reduce the
computational burden on the database and make better use of its abilities to
search through, filter, sort, etc.
Disadvantages Of Joins:
• Joins are not as easy to read as subqueries.
• The more joins a query has, the more work the database server has to do,
which makes retrieving data more time-consuming.
Subquery
A subquery (also called an inner query or nested query) is a query embedded within
another SQL query, most often in the WHERE clause. A subquery is a SELECT statement
that is embedded in a clause of another SQL statement. Subqueries are very useful for
selecting rows from a table with a condition that depends on data in the same or
another table. A subquery returns data that the main query uses as a condition to
further restrict the data to be retrieved. A subquery can be placed in the WHERE,
HAVING, or FROM clause.
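As an illustration with hypothetical data, a subquery in the WHERE clause can compute a value (here the average salary) that the outer query then uses as its condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_name TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('Asha', 52000), ('Ravi', 48000), ('Meena', 61000);
""")

# The inner query returns the average salary (about 53666.67);
# the outer query keeps only the rows above that value.
above_avg = conn.execute("""
    SELECT emp_name
    FROM employee
    WHERE salary > (SELECT AVG(salary) FROM employee)
    ORDER BY emp_name
""").fetchall()
print(above_avg)  # [('Meena',)]
```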
Advantages Of Subquery:
• Subqueries divide the complex query into isolated parts so that a complex
query can be broken down into a series of logical steps.
• Subqueries allow you to use the results of another query in the outer
query.
Disadvantages of Subquery:
• In MySQL, the optimizer is more mature for joins than for subqueries, so in
many cases a statement that uses a subquery can be executed more
efficiently if you rewrite it as a join.
• In MySQL, you cannot modify a table and select from the same table in a
subquery within the same SQL statement.
• MIN and MAX return the lowest and highest values in a particular column,
respectively.
Backup
Backup refers to storing a copy of the original data that can be used in the event
of data loss. Backup is considered one of the best data security methods, and
organizations should protect their important data by backing it up. Backup can be
accomplished by storing a copy of the original data on storage devices or in a
database.
The frequency of backup creation might vary depending on the importance of the
data. For example, if the data is particularly valuable, it must be backed up
daily. Monthly and quarterly backups are the same as daily backups but are only
performed on the last day of the month or quarter.
There are three main types of data backup: full, incremental, and differential.
1. Full Backup
It is a simple, complete backup procedure that copies all of your data to another
media set, such as a tape, disk, or CD. As a result, a complete copy of all your
data is available on a single media set. It takes longer to complete and requires
a lot of storage space.
2. Incremental Backup
Incremental backups take up less space and time than differential and full
backups, but they are the most time-consuming technique for restoring a full
system. They're ideal for backing up data that hasn't changed in a long time.
However, there is no method to predict how much space you would use for future
backups.
3. Differential Backup
These backups are very useful because they allow you to recover a database or
server quickly without generating a completely new version of it. Instead, you
restore the last full backup, apply the most recent differential backup, and have a
working copy of the database or server.
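The difference between the three backup types comes down to which files each one copies. This sketch assumes a hypothetical catalogue of files with last-modified day numbers, and hypothetical days for the previous full backup and the most recent backup of any type.

```python
# Hypothetical file catalogue: file name -> day it was last modified.
files = {"orders.db": 9, "users.db": 3, "logs.db": 6, "config.db": 1}

LAST_FULL_BACKUP = 2  # day of the last full backup (hypothetical)
LAST_ANY_BACKUP = 5   # day of the most recent backup of any type (hypothetical)

def full_backup(files):
    # Copies everything.
    return sorted(files)

def incremental_backup(files, last_any):
    # Copies only what changed since the previous backup of any kind.
    return sorted(f for f, day in files.items() if day > last_any)

def differential_backup(files, last_full):
    # Copies everything that changed since the last full backup.
    return sorted(f for f, day in files.items() if day > last_full)

print(full_backup(files))                            # ['config.db', 'logs.db', 'orders.db', 'users.db']
print(incremental_backup(files, LAST_ANY_BACKUP))    # ['logs.db', 'orders.db']
print(differential_backup(files, LAST_FULL_BACKUP))  # ['logs.db', 'orders.db', 'users.db']
```

Restoring from differential backups needs the last full backup plus only the latest differential, while restoring from incremental backups needs the full backup plus every incremental taken since, which is why incremental backups are the slowest to restore.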
Features of Backup
There are various features of Backup. Some of the backup features are as follows:
1. It is generally a data replica that is utilized to restore the actual data in the
case of data loss/damage.
What is Recovery?
Recovery is the process of restoring the database to a correct state after a failure.
The failure could be of any type, including system failure, concurrency control
enforcement, transaction errors, exception conditions, disk failure, and disasters.
Any event that results in downtime requires recovery. There are various recovery
techniques, including steal/no-steal and force/no-force policies, shadowing,
caching, before and after images of data items, and UNDO/REDO recovery.
Features of recovery
There are various features of recovery. Some of them are as follows:
1. It is a process for restoring lost, damaged, or corrupted data to its original state.
2. The process of recovering is expensive.
3. When there is a failure, it refers to recovering the lost data.
4. It increases the database's reliability.
5. It is rarely utilized in production environments.
Indexing in Databases
Indexing improves database performance by minimizing the number of disk
accesses required to fulfill a query. It is a data structure technique used to locate
and quickly access data in databases. Indexes are generated from one or more
database fields. An index has two columns. The first column, the search key,
contains a copy of the primary key or a candidate key of the table; these values
are kept in sorted order to speed up data retrieval. Note that the data itself does
not need to be sorted. The second column, the data reference or pointer, contains
a set of pointers holding the address of the disk block where that particular key
value can be found.
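A sketch of such an index as a sorted list of (search key, pointer) pairs, searched with Python's bisect module; the keys and block numbers are hypothetical. Both a value-based lookup and a range access are shown.

```python
import bisect

# Index entries: (search key, pointer), kept sorted by search key.
# The pointer is a hypothetical disk-block label.
index = sorted([(105, "block 7"), (101, "block 2"), (110, "block 4"), (103, "block 2")])
keys = [k for k, _ in index]

def lookup(key):
    """Value-based search: binary search over the sorted keys, O(log n) comparisons."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i][1]
    return None

def range_lookup(low, high):
    """Range access: pointers for every key in [low, high]."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [ptr for _, ptr in index[lo:hi]]

print(lookup(103))             # block 2
print(lookup(104))             # None
print(range_lookup(102, 106))  # ['block 2', 'block 7']
```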
Attributes of Indexing
• Access Types: This refers to the type of access such as value-based search,
range access, etc.
• Access Time: It refers to the time needed to find a particular data element
or set of elements.
• Insertion Time: It refers to the time taken to find the appropriate space and
insert new data.
• Deletion Time: Time taken to find an item and delete it as well as update
the index structure.
NoSQL System
NoSQL is a type of database management system (DBMS) that is designed to
handle and store large volumes of unstructured and semi-structured data. Unlike
traditional relational databases that use tables with pre-defined schemas to store
data, NoSQL databases use flexible data models that can adapt to changes in data
structures and are capable of scaling horizontally to handle growing amounts of
data.
2. Key-value stores: These databases store data as key-value pairs, and are
optimized for simple and fast read/write operations.
4. Graph databases: These databases store data as nodes and edges, and are
designed to handle complex relationships between data.
NoSQL databases are often used in applications where there is a high volume of
data that needs to be processed and analyzed in real-time, such as social media
analytics, e-commerce, and gaming. They can also be used for other applications,
such as content management systems, document management, and customer
relationship management.
However, NoSQL databases may not be suitable for all applications, as they may
not provide the same level of data consistency and transactional guarantees as
traditional relational databases. It is important to carefully evaluate the specific
needs of an application when choosing a database management system.
1. Dynamic schema: NoSQL databases do not have a fixed schema and can
accommodate changing data structures without the need for migrations or
schema alterations.
Serializability
Example
If two transactions are performed without interfering with each other, the
schedule is called a serial schedule.
Types of serializability
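View serializability
Two schedules S1 and S2 are view equivalent if they satisfy the following
conditions, and a schedule is view serializable if it is view equivalent to some
serial schedule:
• Initial read: if T1 reads the initial value of A in S1, then T1 must also read the
initial value of A in S2.
• Updated read: if T1 reads a value of A written by T2 in S1, then T1 must also
read the value of A written by T2 in S2.
• Final write: if T1 performs the final write on A in S1, then T1 must also perform
the final write on A in S2.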
Conflict serializability
A schedule is conflict serializable if it orders all conflicting operations in the same
way as some serial execution. A pair of operations conflicts if they belong to
different transactions, operate on the same data item, and at least one of them is
a write operation.
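One standard way to test conflict serializability, not described in these notes, is to build a precedence graph with an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable exactly when the graph has no cycle. A minimal sketch, with schedules written as hypothetical (transaction, operation, item) triples:

```python
from collections import defaultdict

def precedence_graph(schedule):
    """schedule: list of (txn, op, item), op in {"R", "W"}, in execution order.
    Adds Ti -> Tj when an operation of Ti conflicts with a later one of Tj:
    different transactions, same item, at least one write."""
    edges = defaultdict(set)
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    # Depth-first search; reaching a node that is still on the stack (GREY) means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(u):
        color[u] = GREY
        for v in edges[u]:
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(edges))

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

serial_s = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A")]
lost_update_s = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(conflict_serializable(serial_s))       # True
print(conflict_serializable(lost_update_s))  # False: edges T1 -> T2 and T2 -> T1
```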
Recoverability
Irrecoverable schedules
Recoverable Schedules
Example
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write a data item until it
acquires an appropriate lock on it. There are two types of locks:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only
be read by the transaction.
o It can be shared between transactions because a transaction holding a
shared lock can't update the data item.
2. Exclusive lock:
o With an exclusive lock, the data item can be both read and written by
the transaction.
o The lock is exclusive: only one transaction can hold it on a data item at a time,
so multiple transactions cannot modify the same data simultaneously.
Two-phase locking (2PL)
o In the first part, when the execution of the transaction starts, it seeks
permission for the locks it requires.
o In the second part, the transaction acquires all the locks. The third phase
starts as soon as the transaction releases its first lock.
o In the third phase, the transaction cannot request any new locks; it only
releases the locks it has acquired.
Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may
be released, but no new locks can be acquired.
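The two-phase rule can be checked mechanically for one transaction's sequence of lock and unlock steps. In this sketch the step format and the example sequences are hypothetical (they are not the T1/T2 schedule from the notes' example); the function also reports the lock point, the step at which the last lock is acquired.

```python
def check_two_phase(ops):
    """ops: one transaction's steps in order, e.g. ("lock", "A") or ("unlock", "A").
    Returns the 1-based step of the lock point if the sequence obeys two-phase
    locking; raises ValueError if a lock is requested after any release."""
    shrinking = False
    lock_point = None
    for pos, (action, item) in enumerate(ops, start=1):
        if action == "lock":
            if shrinking:
                raise ValueError(f"lock on {item} at step {pos} after a release violates 2PL")
            lock_point = pos  # still in the growing phase
        elif action == "unlock":
            shrinking = True  # the first release starts the shrinking phase
    return lock_point

t1 = [("lock", "A"), ("lock", "B"), ("lock", "C"),
      ("unlock", "A"), ("unlock", "B"), ("unlock", "C")]
print(check_two_phase(t1))  # 3

bad = [("lock", "A"), ("unlock", "A"), ("lock", "B")]
try:
    check_two_phase(bad)
except ValueError as e:
    print(e)  # lock on B at step 3 after a release violates 2PL
```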
Transaction T1:
o Lock point: at 3
Transaction T2:
o Lock point: at 6
Timestamp Ordering Protocol
o The older transaction has higher priority, which is why it executes first. To
determine the timestamp of a transaction, this protocol uses the system time
or a logical counter.
o Assume there are two transactions, T1 and T2. Suppose T1 entered the
system at time 007 and T2 entered the system at time 009. T1 has the higher
priority, so it executes first because it entered the system first.
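Here RTS(A) and WTS(A) are the read and write timestamps of data item A, and
TS(Ti) is the timestamp of transaction Ti.
Rule 1 is used when a transaction Ti wants to perform a Read(A) operation:
• If WTS(A) > TS(Ti), roll back Ti.
• Else (otherwise) execute the R(A) operation and SET RTS(A) = MAX{RTS(A), TS(Ti)}.
Rule 2 is used when a transaction Ti needs to perform a WRITE(A) operation:
• If RTS(A) > TS(Ti) or WTS(A) > TS(Ti), roll back Ti.
• Else (otherwise) execute the W(A) operation and SET WTS(A) = TS(Ti).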
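A minimal sketch of the basic timestamp-ordering checks, assuming the read rule rolls Ti back when WTS(A) > TS(Ti), the write rule rolls Ti back when RTS(A) > TS(Ti) or WTS(A) > TS(Ti), and all timestamps start at 0. The timestamps 200 and 300 are hypothetical; the scenario replays the conflict discussed in the solution below, where the older T2 writes B after the younger T3 has read it.

```python
class Rollback(Exception):
    """Raised when the protocol forces a transaction to roll back."""

class TimestampOrdering:
    def __init__(self):
        self.rts = {}  # item -> largest timestamp of a transaction that read it
        self.wts = {}  # item -> timestamp of the last transaction that wrote it

    def read(self, ts, item):
        # Rule 1: a younger transaction has already written the item.
        if self.wts.get(item, 0) > ts:
            raise Rollback(f"transaction with ts={ts} cannot read {item}")
        self.rts[item] = max(self.rts.get(item, 0), ts)

    def write(self, ts, item):
        # Rule 2: a younger transaction has already read or written the item.
        if self.rts.get(item, 0) > ts or self.wts.get(item, 0) > ts:
            raise Rollback(f"transaction with ts={ts} cannot write {item}")
        self.wts[item] = ts

tso = TimestampOrdering()
tso.read(200, "B")        # T2 (older) reads B
tso.read(300, "B")        # T3 (younger) reads B, so RTS(B) = 300
try:
    tso.write(200, "B")   # T2 now tries to write B
    outcome = "write allowed"
except Rollback:
    outcome = "T2 rolled back"
print(outcome)  # T2 rolled back
```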
Solution:
In the example table, A, B, and C are data items, and their read and write
timestamps are initially 0. The table runs from time0 to time7. Let’s discuss it
step by step.
• Go to else part and SET RTS (A) = MAX {RTS(A), TS(T1)} So,
• Go to else part and SET RTS (B) = MAX {RTS(B), TS(T2)} So,
• Go to else part and SET RTS (B) = MAX {RTS(B), TS(T3)} So,
• Go to else part and SET RTS (C) = MAX {RTS(C), TS(T1)} So,
When T2 is rolled back, it does not resume; it restarts with a new timestamp value.
Note that T2 restarts after all running transactions complete, so in this
example, T2 restarts after T3 completes.
The rollback happens because of a conflict: the older transaction (T2) wants to
write data item “B,” but the younger transaction (T3) has already read
the same data item “B.”
Recovery Techniques
• Disk crashes may be due to the inability of the system to read the
disk.
Even if the database system fails, the data in the database must be
recoverable to the last consistent state before the failure.
Example: Database
A 100
B 200
T1                Log File (new values)
R(A)
A = A + 100
W(A)              <T1, A, 200>
R(B)
B = B + 200
W(B)              <T1, B, 400>
Commit            <T1, Commit>
Second Case:
T1                Log File (new values)
R(A)
A = A + 100
W(A)              <T1, A, 200>
R(B)
B = B + 200
W(B)              <T1, B, 400>
Another Example:
<T1, Start>
<T1, A, 200>
<T1, B, 400>
<T2, Start>
<T2, C, 500>
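Recovery under this scheme only needs REDO, because uncommitted changes never reached the database. This sketch uses a hypothetical tuple format for log records, a variant of the log above with `<T1, Commit>` added, and a made-up starting value of 300 for C.

```python
def recover_deferred(log, db):
    """Deferred-modification recovery: redo only committed transactions.

    Log entries: ("start", T), ("write", T, item, new_value), ("commit", T).
    Uncommitted transactions never changed the database, so they are ignored."""
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _, item, new_value = entry
            db[item] = new_value  # REDO with the logged new value
    return db

log = [
    ("start", "T1"), ("write", "T1", "A", 200), ("write", "T1", "B", 400),
    ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "C", 500),  # T2 never commits
]
print(recover_deferred(log, {"A": 100, "B": 200, "C": 300}))
# {'A': 200, 'B': 400, 'C': 300}
```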
Database
A 100
B 200
T1                Log File (old, new values)
R(A)
A = A + 100
W(A)              <T1, A, 100, 200>
R(B)
B = B + 200
W(B)              <T1, B, 200, 400>
Commit            <T1, Commit>
Second Case:
T1                Log File (old, new values)
R(A)
A = A + 100
W(A)              <T1, A, 100, 200>
R(B)
B = B + 200
W(B)              <T1, B, 200, 400>
Another Example:
<T1, Start>
<T1, Commit>
<T2, Start>
<T2, C, 700,800>
Deferred database modification:
• The log file holds the changes that are going to be applied.
• If a rollback is made, the log records are discarded and no change is made to
the database.
Immediate database modification:
• The log file holds the changes along with the old and new values.
• If a rollback is made, the old state of the data is restored using the records in
the log file.
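With old and new values in the log, recovery under immediate modification can UNDO uncommitted transactions and REDO committed ones. The log below combines the values from the examples above, and the on-disk state at crash time is an assumption chosen to show both actions; the undo-then-redo order is one simple choice.

```python
def recover_immediate(log, db):
    """Immediate-modification recovery using ("write", T, item, old, new) records.

    Uncommitted transactions are undone (backward pass, old values);
    committed transactions are redone (forward pass, new values)."""
    committed = {e[1] for e in log if e[0] == "commit"}
    # UNDO: newest record first, restore old values of uncommitted writes.
    for e in reversed(log):
        if e[0] == "write" and e[1] not in committed:
            db[e[2]] = e[3]
    # REDO: oldest record first, reapply new values of committed writes.
    for e in log:
        if e[0] == "write" and e[1] in committed:
            db[e[2]] = e[4]
    return db

log = [
    ("start", "T1"), ("write", "T1", "A", 100, 200), ("write", "T1", "B", 200, 400),
    ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "C", 700, 800),  # T2 has not committed
]
# Assumed on-disk state at the crash: B's new value was never flushed,
# while T2's uncommitted write to C had already reached the database.
on_disk = {"A": 200, "B": 200, "C": 800}
print(recover_immediate(log, on_disk))  # {'A': 200, 'B': 400, 'C': 700}
```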