Advanced Database Chapter Four & Five
and payroll management, with relatively simple data types that are well suited to
the relational data model.
The solution was the introduction of object-based databases, which allow one to
of the structure, and the lack of support for new data types such as graphics, xml,
2D and 3D data.
In the 1980s, with the advent of object-oriented methodologies and languages,
This led to the development of OODB and OODBMS, where objects are stored in the
database.
Data models are data structures which describe how data are represented and
o Hierarchical model
o Network Model
o Relational Model
ORD supports the basic components of any object-oriented database model in its
schemas and the query language used, such as objects, classes and inheritance.
For example, the attributes of a Person class such as name, age, and gender
same name as a structured type is a constructor function for the structured type.
For instance, we could declare a constructor for the type Name like this:
returns Name
begin
End;
database systems.
indices, etc.), the complex data types supported by object-relational systems can
be translated to the simpler type system of relational databases.
Availability is the measure of the percentage of time a machine is operable. For example, a
machine may be available 90% of the time, but reliable only 75% of the time from a
performance standpoint.
When the data and DBMS software are distributed over several sites, one site may fail
while other sites continue to operate.
Only the data and software that exist at the failed site cannot be accessed.
In a centralized system, failure at a single site makes the whole system unavailable to all
users.
In a distributed database, some of the data may be unreachable, but users may still be able
to access other parts of the database.
28 Prepared BY: Abebe SH. (MSc.) 1/16/2024
Advantages of DDB Systems
Improved performance:
A distributed DBMS fragments the database by keeping the data closer to where it
is needed most.
When a large database is distributed over multiple sites, smaller databases exist at
each site.
As a result, local queries and transactions accessing data at a single site have
of the data; users at one site can access data stored at other sites.
Data can be placed at the site close to the users who normally use that data.
In this way, users have local control of the data and they can consequently
establish and enforce local policies regarding the use of this data.
A global database administrator (DBA) is responsible for the entire system.
Generally, part of this responsibility is devolved to the local level, so that the local
database size can usually be handled by adding processing and storage power to
the network.
In a centralized DBMS, growth may entail changes to both hardware (the
The procurement and maintenance costs for a DDBMS are expected to be higher than those for a
centralized DBMS.
A distributed DBMS requires additional hardware to establish a network between sites.
There are ongoing communication costs incurred with the use of this network.
There are also additional labor costs to manage and maintain the local DBMSs and the
underlying network.
Maintaining Data
sites
Distributed Data Storage
Consider a relation r that is to be stored in the database. There are two approaches to
storing this relation in the distributed database:
Replication
– The system maintains several identical replicas (copies) of the relation, and stores each
replica at a different site.
– The alternative to replication is to store only one copy of relation r
Fragmentation
– The system partitions the relation into several fragments, and stores each fragment at a
different site.
– Fragmentation and replication can be combined: A relation can be partitioned into several
fragments and there may be several replicas of each fragment.
Data Replication
Replication is useful in improving the availability of data.
The most extreme case is replication of the whole database at every site in the
This strategy consists of maintaining a complete copy of the database at each site.
are maximized.
However, storage costs and communication costs for updates are the most
expensive.
r2,...,rn.
Each tuple of relation r must belong to at least one of the fragments, so that the original
The tuples that belong to the horizontal fragment are specified by a condition on one or
We reconstruct the relation r by taking the union of all fragments; that is,
r = r1 ∪ r2 ∪ ···∪ rn
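The union-based reconstruction above can be sketched in a few lines of Python. This is only an illustration: the relation is modelled as a set of tuples, and the rows, the attribute layout, and the helper name horizontal_fragment are hypothetical.

```python
# Hypothetical relation r: each tuple is (student_id, name, college).
r = {
    (1, "Abel", "CCI"),
    (2, "Sara", "CBE"),
    (3, "Lily", "CCI"),
}

def horizontal_fragment(relation, predicate):
    """Select the tuples of `relation` that satisfy `predicate`."""
    return {t for t in relation if predicate(t)}

# One fragment per selection condition on the college attribute.
r1 = horizontal_fragment(r, lambda t: t[2] == "CCI")
r2 = horizontal_fragment(r, lambda t: t[2] == "CBE")

# Every tuple lands in at least one fragment, so the union restores r.
reconstructed = r1 | r2
print(reconstructed == r)
```

Because every tuple satisfies one of the two conditions, no tuple is lost, which is what makes the union equal to the original relation.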
Horizontal Fragmentation
Example: consider the following student relation.
Assuming that there are only two colleges in the university, CCI and CBE, the horizontal fragmentation
of the student relation by college can be obtained as follows:
[Tables: Student relation before fragmentation; Fragment S1; Fragment S2]
Completeness:- Each attribute in the student relation appears in either fragment S1 or S2.
Reconstruction:- The student relation can be reconstructed from the fragments
Disjointness:- The fragments are disjoint except for the primary key, which is necessary
for reconstructing the student relation.
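The three properties above read like those of vertical fragments that share the primary key. Under that assumption, a minimal Python sketch of reconstruction by joining on the key could look as follows. The rows, the attribute names, and the helper join_on_key are hypothetical.

```python
# Hypothetical student relation: (student_id, name, college).
student = [
    (1, "Abel", "CCI"),
    (2, "Sara", "CBE"),
]

# Vertical fragments: each keeps the primary key plus some attributes.
s1 = [(sid, name) for sid, name, _ in student]
s2 = [(sid, college) for sid, _, college in student]

def join_on_key(left, right):
    """Natural join of two fragments on their first column (the key)."""
    right_by_key = {row[0]: row[1:] for row in right}
    return [row + right_by_key[row[0]] for row in left if row[0] in right_by_key]

rebuilt = join_on_key(s1, s2)
print(rebuilt == student)
```

The only attribute repeated in both fragments is student_id, which is exactly what the join needs in order to pair the rows again.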
Transparency
The user of a distributed database system should not be required to know either where
the data are physically located or how the data can be accessed at the specific local site.
This characteristic, called data transparency, can take several forms:
Fragmentation transparency. Users are not required to know how a relation has been
fragmented.
Replication transparency. Users view each data object as logically unique. The
distributed system may replicate an object to increase either system performance or
data availability.
Users do not have to be concerned with what data objects have been replicated, or
where replicas have been placed.
Location transparency. Users are not required to know the physical location of the
data.
The distributed database system should be able to find any data as long as the data
identifier is supplied by the user transaction.
Distributed Transactions
Access to the various data items in a distributed system is usually accomplished
Local Transactions are those that access and update data in only one local
database;
Global Transactions are those that access and update data in several local
databases.
transaction.
o The Transaction Coordinator coordinates the execution of the various transactions (both local
system does (for example, software errors, hardware errors, or disk crashes).
There are, however, additional types of failure with which we need to deal in a
Loss of messages
Network partition
are allowed to access the database concurrently, namely the lost update problems,
uncommitted dependency problems, and inconsistent analysis problems.
However, there are additional problems that can arise as a result of data distribution.
One such problem is the multiple-copy consistency problem, which occurs when a data
Clearly, to maintain consistency of the global database, when a replicated data item is
updated at one site all other copies of the data item must also be updated.
If a copy is not updated, the database becomes inconsistent.
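The multiple-copy consistency problem can be illustrated with a small Python sketch. The class ReplicatedItem, its methods, and the site names are hypothetical, and the model ignores failures and concurrency.

```python
class ReplicatedItem:
    """A data item with one copy per site (site names are hypothetical)."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}

    def write_all(self, value):
        # Update every copy so the replicas stay identical.
        for site in self.copies:
            self.copies[site] = value

    def write_one(self, site, value):
        # Update a single copy only: the other replicas become stale.
        self.copies[site] = value

    def is_consistent(self):
        return len(set(self.copies.values())) == 1

item = ReplicatedItem(["S1", "S2", "S3"], 100)
item.write_all(120)
print(item.is_consistent())   # every copy holds 120
item.write_one("S2", 150)
print(item.is_consistent())   # S2 now differs from S1 and S3
```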
Distributed Concurrency Control
The various locking protocols described in Chapter 2 can be used in a distributed
environment.
The only change that needs to be incorporated is in the way the lock manager
deals with replicated data.
In this section we will see the following locking protocols that can be employed
to ensure serializability for distributed DBMSs:
Single Lock-Manager Approach
Simple implementation. This scheme requires two messages for handling lock
One is that all locking requests are sent to a single site, possibly overloading that site
locking information is kept at that site. This can limit system reliability and
availability.
Distributed Lock Manager
This protocol attempts to overcome the disadvantages of the single lock-manager approach
by distributing the lock managers to every site.
Each site maintains a local lock manager whose function is to administer the lock and
unlock requests for those data items that are stored in that site.
When a transaction wishes to lock a data item X that is not replicated and resides at site
Si, a message is sent to the lock manager at site Si requesting a lock (in a particular lock
mode).
If data item X is locked in an incompatible mode, then the request is delayed until it can be
granted.
Once it has determined that the lock request can be granted, the lock manager sends a message
back to the initiator transaction indicating that it has granted the lock request.
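The request handling described above can be sketched as follows. This is a simplified illustration, not a full lock manager: the class LockManager, the routing table site_of, and the site names are hypothetical, a delayed request is reported as False rather than queued, and lock upgrades are not handled.

```python
class LockManager:
    """Lock table for the data items stored at one site."""

    def __init__(self):
        self.table = {}   # item -> (mode, set of holders)

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if it must wait."""
        if item not in self.table:
            self.table[item] = (mode, {txn})
            return True
        held_mode, holders = self.table[item]
        # Only shared ("S") locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False

    def release(self, txn, item):
        _, holders = self.table.get(item, (None, set()))
        holders.discard(txn)
        if not holders:
            self.table.pop(item, None)

# One manager per site; the non-replicated item X resides at site Si.
managers = {"Si": LockManager(), "Sj": LockManager()}
site_of = {"X": "Si"}

def lock(txn, item, mode):
    # Route the request "message" to the site that stores the item.
    return managers[site_of[item]].request(txn, item, mode)

print(lock("T1", "X", "S"))   # granted
print(lock("T2", "X", "S"))   # granted, shared locks are compatible
print(lock("T3", "X", "X"))   # delayed, exclusive conflicts with shared
```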
The distributed-lock-manager scheme has the advantage of simple implementation, and
reduces the degree to which the coordinator is a bottleneck.
However, deadlock handling is more complex, since the lock and unlock requests are no
longer made at a single site: There may be inter-site deadlocks even when there is no
deadlock within a single site.
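To see how an inter-site deadlock can exist without any local deadlock, consider the following Python sketch. It assumes each site keeps a local wait-for graph; the graphs, the transaction names, and the helper has_cycle are hypothetical.

```python
def has_cycle(graph):
    """Detect a cycle in a wait-for graph given as {txn: set of txns}."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))

# Hypothetical local wait-for graphs: an edge T1 -> T2 means T1 waits for T2.
site1 = {"T1": {"T2"}}
site2 = {"T2": {"T1"}}

# Merge the local graphs into one global wait-for graph.
global_graph = {}
for local in (site1, site2):
    for txn, waits_for in local.items():
        global_graph.setdefault(txn, set()).update(waits_for)

print(has_cycle(site1), has_cycle(site2), has_cycle(global_graph))
```

Neither local graph contains a cycle, but the merged graph does, so the deadlock is visible only when information from both sites is combined.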
Primary Copy
For each replicated data item, one copy is chosen as the primary copy; the other copies
are called slave copies.
The choice of primary site is flexible, and the site that is
chosen to manage the locks for a primary copy need not hold the primary copy of that
item.
It is only necessary to exclusively lock the primary copy of the data item that is to be
updated. Once the primary copy has been updated, the change can be propagated to the
slave copies.
The disadvantages of this approach are that deadlock handling is more complex owing to
multiple lock managers, and that there is still a degree of centralization in the system:
lock requests for a specific primary copy can be handled only by one site.
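The primary copy scheme can be sketched in Python as below. This is a minimal illustration under simplifying assumptions: the class PrimaryCopyItem and the site names are hypothetical, the lock is managed at the site holding the primary copy, and propagation is modelled as a single synchronous step.

```python
class PrimaryCopyItem:
    """Replicated item whose exclusive lock is taken on the primary copy only."""

    def __init__(self, primary, slaves, value):
        self.primary = primary
        self.copies = {site: value for site in [primary, *slaves]}
        self.lock_holder = None   # transaction holding the exclusive lock

    def lock_primary(self, txn):
        if self.lock_holder is None:
            self.lock_holder = txn
            return True
        return False              # another transaction holds the lock

    def write_primary(self, txn, value):
        if self.lock_holder != txn:
            raise RuntimeError("exclusive lock on the primary copy required")
        self.copies[self.primary] = value

    def propagate(self):
        # Push the primary copy's value to the slave copies.
        value = self.copies[self.primary]
        for site in self.copies:
            self.copies[site] = value

    def unlock_primary(self, txn):
        if self.lock_holder == txn:
            self.lock_holder = None

x = PrimaryCopyItem("S1", ["S2", "S3"], 10)
x.lock_primary("T1")
x.write_primary("T1", 20)   # the slave copies still hold 10 at this point
x.propagate()
x.unlock_primary("T1")
print(x.copies)
```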
Majority Protocol
This protocol is an extension of distributed lock manager to overcome having to lock all
Again, the system maintains a lock manager at each site to manage the locks for all data at
that site.
When a transaction wishes to read or write a data item that is replicated at n sites, it must
send a lock request to more than half of the n sites where the item is stored.
The transaction cannot proceed until it obtains locks on a majority of the copies
If the transaction does not receive a majority within a certain timeout period, it cancels
the copies; however, only one transaction can hold an exclusive lock on a majority
of the copies.
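The majority rule can be sketched in Python as follows. This is a simplified illustration: the function names and site names are hypothetical, only exclusive locks are modelled, and a request that fails to win a majority simply takes no locks, which stands in for the timeout and cancellation step.

```python
def majority(n):
    """Smallest number of copies that is more than half of n."""
    return n // 2 + 1

def request_majority(txn, site_locks, targets):
    """Ask the lock managers at `targets` for an exclusive lock on one item.

    `site_locks` maps each site holding a copy to the current lock holder
    (None if the copy is unlocked). Returns True if a majority was obtained.
    """
    granted = [s for s in targets if site_locks[s] is None]
    if len(granted) < majority(len(site_locks)):
        return False              # no majority: take no locks at all
    for s in granted:
        site_locks[s] = txn
    return True

# The item is replicated at n = 5 hypothetical sites.
site_locks = {f"S{i}": None for i in range(1, 6)}
print(request_majority("T1", site_locks, ["S1", "S2", "S3"]))
print(request_majority("T2", site_locks, ["S3", "S4", "S5"]))
```

Any two majorities of the same n copies overlap in at least one site, which is why the second transaction cannot also obtain an exclusive lock on a majority.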
The disadvantages are that the protocol is more complicated and deadlock