DD Design
DATABASE DESIGN
Topics
◦Data Fragmentation
◦Data Replication
◦Allocation Techniques
Data Fragmentation
◦ Techniques that are used to break up the database into logical units, called fragments, which may be assigned
for storage at the various sites.
◦ In a DDB, decisions must be made regarding which site should be used to store which portions of the
database.
◦ Before deciding how to distribute the data, we must determine the logical units of the database that are
to be distributed.
◦ The simplest logical units are the relations themselves; that is, each whole relation is to be stored at a
particular site.
◦ Fragmentation should be carried out in such a way that the original database can be reconstructed from
the fragments.
◦ To reconstruct the relation R from a complete horizontal fragmentation, we need to apply the UNION
operation to the fragments.
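The reconstruction by UNION can be sketched in a few lines of Python. This is a minimal illustration, not a DBMS implementation: the toy EMPLOYEE tuples, the fragment names emp_d4 and emp_d5, and the helper functions select and union are all assumptions made up for the example.

```python
# Toy EMPLOYEE relation; the tuples are invented for illustration only.
EMPLOYEE = [
    {"Ssn": "111", "Name": "Ana", "Salary": 25000, "Dno": 4},
    {"Ssn": "222", "Name": "Ben", "Salary": 40000, "Dno": 5},
    {"Ssn": "333", "Name": "Cho", "Salary": 30000, "Dno": 5},
]


def select(relation, condition):
    """Relational SELECT: keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]


def union(*fragments):
    """Relational UNION: merge fragments and drop duplicate tuples."""
    seen = set()
    result = []
    for fragment in fragments:
        for row in fragment:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                result.append(row)
    return result


# Two horizontal fragments; together the conditions cover every tuple
# of the toy relation, so the fragmentation is complete.
emp_d4 = select(EMPLOYEE, lambda r: r["Dno"] == 4)
emp_d5 = select(EMPLOYEE, lambda r: r["Dno"] == 5)

# Reconstruct the relation by taking the UNION of the fragments.
reconstructed = union(emp_d4, emp_d5)
```

If some tuple satisfied none of the selection conditions, it would be missing from the union, which is why completeness of the fragmentation matters for reconstruction.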
◦ Each horizontal fragment of a relation R can be specified in the relational algebra by a σCi(R) operation.
◦ A vertical fragment on a relation R can be specified by a πLi (R) operation in the relational Algebra.
◦ A set of vertical fragments whose projection lists L1, L2, ..., Ln include all the attributes in R but share only
the primary key attribute of R is called a complete vertical fragmentation of R.
◦ In this case the projection lists satisfy the following two conditions:
◦ L1 ∪ L2 ∪ ... ∪ Ln = ATTRS(R)
◦ Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and PK(R) is the primary key of
R.
◦ To reconstruct the relation R from a complete vertical fragmentation, we apply the OUTER UNION
operation to the vertical fragments; a FULL OUTER JOIN on the primary key gives the same result.
◦ The two vertical fragments of the EMPLOYEE relation with projection lists:
◦ L1 = {Ssn, Name, Bdate, Address, Sex} and
◦ L2 = {Ssn, Salary, Super_ssn, Dno}
◦ constitute a complete vertical fragmentation of EMPLOYEE.
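The two completeness conditions and the reconstruction step can be checked with a short Python sketch. The sample tuples and the helpers is_complete_vertical, project, and join_on_key are hypothetical names introduced for this example. The reconstruction here uses a plain join on the key, which matches the outer union only under the assumption that every Ssn appears in both fragments.

```python
ATTRS = {"Ssn", "Name", "Bdate", "Address", "Sex", "Salary", "Super_ssn", "Dno"}
PK = {"Ssn"}
L1 = {"Ssn", "Name", "Bdate", "Address", "Sex"}
L2 = {"Ssn", "Salary", "Super_ssn", "Dno"}


def is_complete_vertical(lists, attrs, pk):
    """Check: the lists cover all attributes and pairwise share only the key."""
    covers_all = set().union(*lists) == attrs
    share_only_pk = all(
        lists[i] & lists[j] == pk
        for i in range(len(lists))
        for j in range(i + 1, len(lists))
    )
    return covers_all and share_only_pk


def project(relation, attr_list):
    """Relational PROJECT; no duplicates arise because the key is kept."""
    return [{a: row[a] for a in attr_list} for row in relation]


def join_on_key(left, right, key):
    """Match tuples with equal key values and merge their attributes."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


# Invented sample data, for illustration only.
EMPLOYEE = [
    {"Ssn": "111", "Name": "Ana", "Bdate": "1990-01-01", "Address": "1 Main St",
     "Sex": "F", "Salary": 25000, "Super_ssn": "222", "Dno": 4},
    {"Ssn": "222", "Name": "Ben", "Bdate": "1985-05-05", "Address": "2 Oak Ave",
     "Sex": "M", "Salary": 40000, "Super_ssn": None, "Dno": 5},
]

emp_personal = project(EMPLOYEE, L1)   # fragment for L1
emp_work = project(EMPLOYEE, L2)       # fragment for L2
rebuilt = join_on_key(emp_personal, emp_work, "Ssn")
```

Dropping Ssn from L2 would make is_complete_vertical return False, since the two lists would then share no attribute and the fragments could no longer be matched tuple by tuple.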
3. Mixed (Hybrid) Fragmentation:
◦ We can intermix the two types of fragmentation, yielding a mixed fragmentation.
◦ A fragment of a relation R can be specified by a SELECT-PROJECT combination of operations πL(σC(R)).
◦ For some applications, neither horizontal nor vertical fragmentation alone is enough to distribute the
data; in those cases we need a mixed fragmentation.
Mixed fragmentation can be done in two different ways:
◦ The first method is to first create a set or group of horizontal fragments and then create vertical fragments
from one or more of the horizontal fragments.
◦ The second method is to first create a set or group of vertical fragments and then create horizontal
fragments from one or more of the vertical fragments.
◦ Example:
◦ πFname, Minit, Dno(σSalary <= 30000(EMPLOYEE))
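The example expression can be traced with a small Python sketch that applies the selection first and the projection second. The sample tuples and the helper name mixed_fragment are assumptions made for illustration.

```python
# Invented sample data, for illustration only.
EMPLOYEE = [
    {"Fname": "Ana", "Minit": "B", "Ssn": "111", "Salary": 25000, "Dno": 4},
    {"Fname": "Ben", "Minit": "C", "Ssn": "222", "Salary": 40000, "Dno": 5},
    {"Fname": "Cho", "Minit": "D", "Ssn": "333", "Salary": 30000, "Dno": 5},
]


def mixed_fragment(relation, condition, attr_list):
    """Apply SELECT (condition), then PROJECT (attr_list)."""
    selected = [row for row in relation if condition(row)]
    projected = [tuple(row[a] for a in attr_list) for row in selected]
    # PROJECT removes duplicates; dict.fromkeys keeps first-seen order.
    return list(dict.fromkeys(projected))


fragment = mixed_fragment(
    EMPLOYEE,
    lambda r: r["Salary"] <= 30000,
    ["Fname", "Minit", "Dno"],
)
# fragment == [("Ana", "B", 4), ("Cho", "D", 5)]
```

Note that this projection list omits the primary key Ssn, so the fragment on its own could not be joined back to other vertical fragments of EMPLOYEE.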
Data Replication and Allocation
Data Replication
◦ Replication is useful in improving the availability of data.
◦ The most extreme case is replication of the whole database at every site in the distributed system, thus
creating a fully replicated distributed database.
◦ This can improve availability remarkably because the system can continue to operate as long as at least one
site is up.
◦ It also improves performance of retrieval for global queries because the results of such queries can be
obtained locally from any one site; hence, a retrieval query can be processed at the local site where it is
submitted, if that site includes a server module.
◦ The disadvantage of full replication is that it can slow down update operations drastically, since a single
logical update must be performed on every copy of the database to keep the copies consistent.
◦ This is especially true if many copies of the database exist.
◦ Full replication makes the concurrency control and recovery techniques more expensive than they would be
if there were no replication.
◦ The other extreme from full replication involves having no replication; that is, each fragment is stored at
exactly one site.
◦ All fragments must be disjoint except for the repetition of primary keys among vertical (or mixed) fragments.
◦ This is also called nonredundant allocation.
◦ Between these two extremes, we have a wide spectrum of partial replication of the data; that is, some
fragments of the database may be replicated whereas others may not.
◦ The number of copies of each fragment can range from one up to the total number of sites in the
distributed system.
◦ A description of the replication of fragments is sometimes called a replication schema.
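One way to picture a replication schema is as a mapping from each fragment to the set of sites that hold a copy. The Python sketch below uses that representation, which is an assumption of this example and not a standard format. The site names, fragment names, and the helpers classify_replication and writes_per_update are hypothetical, and the sketch assumes the system has more than one site.

```python
def classify_replication(schema, sites):
    """Classify a schema given as {fragment: set of sites holding a copy}."""
    copy_counts = [len(holders) for holders in schema.values()]
    if all(count == len(sites) for count in copy_counts):
        return "full"      # every fragment is stored at every site
    if all(count == 1 for count in copy_counts):
        return "none"      # nonredundant allocation
    return "partial"


def writes_per_update(schema, fragment):
    """A logical update must be applied to every copy of the fragment."""
    return len(schema[fragment])


sites = {"S1", "S2", "S3"}

full = {"EMP_D4": {"S1", "S2", "S3"}, "EMP_D5": {"S1", "S2", "S3"}}
nonredundant = {"EMP_D4": {"S1"}, "EMP_D5": {"S2"}}
partial = {"EMP_D4": {"S1"}, "EMP_D5": {"S1", "S2"}}

kinds = [classify_replication(s, sites) for s in (full, nonredundant, partial)]
# kinds == ["full", "none", "partial"]
```

The writes_per_update helper makes the update cost visible: under the full schema a single logical update to EMP_D5 touches three copies, while under the nonredundant schema it touches one.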
Data Allocation
◦ Each copy of a fragment must be assigned to a particular site in the distributed system.
◦ This process is called data distribution or data allocation.
◦ The choice of sites and the degree of replication depend on the performance and availability goals of the
system and on the types and frequencies of transactions submitted at each site.
◦ If high availability is required, transactions can be submitted at any site, and most transactions are
retrieval-only, then a fully replicated database is a good choice.
◦ If certain transactions that access particular parts of the database are mostly submitted at a particular site,
the corresponding set of fragments can be allocated at that site only.
◦ Data that is accessed at multiple sites can be replicated at those sites.
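The guidelines above can be turned into a simple heuristic, sketched here in Python. This is only an illustrative rule of thumb and not an allocation algorithm from the source: the function name allocate, the access counts, and the 30% threshold are all assumptions. A fragment is placed at every site that issues at least the threshold share of its accesses, and otherwise at the single busiest site.

```python
def allocate(access_counts, threshold=0.3):
    """Map each fragment to the set of sites that should hold a copy.

    access_counts: {fragment: {site: number of accesses from that site}}
    threshold: minimum share of a fragment's accesses for a site to get a copy
               (hypothetical tuning parameter).
    """
    allocation = {}
    for fragment, by_site in access_counts.items():
        total = sum(by_site.values())
        chosen = {
            site for site, count in by_site.items()
            if total and count / total >= threshold
        }
        if not chosen:
            # No site reaches the threshold: store one copy at the busiest site.
            chosen = {max(by_site, key=by_site.get)}
        allocation[fragment] = chosen
    return allocation


# Invented access frequencies, for illustration only.
access_counts = {
    "EMP_D4": {"site1": 90, "site2": 10},   # mostly accessed at site1
    "EMP_D5": {"site1": 50, "site2": 50},   # accessed at both sites
}

allocation = allocate(access_counts)
# allocation == {"EMP_D4": {"site1"}, "EMP_D5": {"site1", "site2"}}
```

A real design would also weigh update frequency against retrieval frequency, since each extra copy adds to the cost of every update.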
To summarize, the allocation strategies are:
◦ Centralized: The entire database is stored at a single site. No data distribution occurs.
◦ Partitioned: The database is divided into fragments, which are stored at several sites.
◦ Replicated: Copies of the data are stored at multiple sites, so it can be accessed at each of them.