Ddbms-Unit 1 Part2
Advantages:
1. Availability
2. Resource sharing
3. Incremental growth
4. Increased user involvement and control
5. End-user productivity
6. Distance and location independence
7. Privacy and security
Disadvantages:
• The ability to modify a scheme definition at one level without affecting a scheme
definition at a higher level is called data independence. Data independence is a
fundamental form of transparency that we look for within a DBMS. Data definition
occurs at two levels: at one level the logical structure of the data is specified, and
at the other level its physical structure.
• There are two kinds: physical data independence and logical data independence.
1. Physical data independence: The ability to modify the physical scheme without
causing application programs to be rewritten. Modifications at this level are usually
made to improve performance.
2. Logical data independence: The ability to modify the conceptual scheme without
causing application programs to be rewritten. This is usually done when the logical
structure of the database is altered.
Logical data independence is harder to achieve as the application programs are usually
heavily dependent on the logical structure of the data.
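One common way to approximate logical data independence is to let applications query a view rather than the base tables. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, and the names `StudentInfo`, `StudentNames`, `StudentCities`, and the `City` column are hypothetical, not part of these notes. The logical structure is changed (one table is split into two), the view is redefined, and the application query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT);
    INSERT INTO Students VALUES (101, 'Alice', 'Pune');
    CREATE VIEW StudentInfo AS SELECT ID, Name, City FROM Students;
""")

def application_query(connection):
    """The 'application program': it only knows the view StudentInfo."""
    return connection.execute(
        "SELECT ID, Name, City FROM StudentInfo ORDER BY ID"
    ).fetchall()

before = application_query(conn)

# Alter the logical structure: split Students into two tables,
# then redefine the view so it presents the same columns as before.
conn.executescript("""
    DROP VIEW StudentInfo;
    CREATE TABLE StudentNames (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE StudentCities (ID INTEGER PRIMARY KEY, City TEXT);
    INSERT INTO StudentNames SELECT ID, Name FROM Students;
    INSERT INTO StudentCities SELECT ID, City FROM Students;
    DROP TABLE Students;
    CREATE VIEW StudentInfo AS
        SELECT n.ID, n.Name, c.City
        FROM StudentNames n JOIN StudentCities c ON n.ID = c.ID;
""")

after = application_query(conn)  # same function, unchanged
```

Here `before` and `after` hold the same rows even though the base tables changed. Updates through such a view are a harder problem, which is one reason logical data independence is difficult to achieve in full.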
- Computer workstations or remote devices (sites or nodes) that form the network system.
The distributed database system must be independent of the computer system hardware.
- Network hardware and software components that reside in each workstation or device.
The network components allow all sites to interact and exchange data. Because the
components (computers, operating systems, network hardware, and so on) are likely to be
supplied by different vendors, it is best to ensure that distributed database functions can
be run on multiple platforms.
- Communications media that carry the data from one node to another. The DDBMS must
be communications media-independent; that is, it must be able to support several types of
communications media.
- The Transaction Processor (TP), which is the software component found in each
computer or device that requests data. The transaction processor receives and processes
the application's data requests (remote and local). The TP is also known as the Application
Processor (AP) or the Transaction Manager (TM).
Distributed Data Management Systems are used wherever large-scale, reliable, and
efficient data storage and processing are required across multiple machines. Here are the
key application areas:
• Use Case: Processing massive datasets (e.g., logs, clickstreams, sensor data)
• Examples: Hadoop, Apache Spark
• Industries: E-commerce, telecom, social media, finance
2. Cloud Computing
• Use Case: Storage and access of distributed data over the cloud
• Examples: Amazon S3, Google Bigtable, Microsoft Azure Cosmos DB
• Industries: All modern tech-enabled businesses
• Use Case: Managing user profiles, messages, media files, real-time feeds
• Examples: Facebook (Cassandra), Twitter (Manhattan), Instagram (PostgreSQL +
caching layers)
4. E-commerce Systems
6. Healthcare Systems
• Use Case: Storing and managing patient records, research data, imaging files
across institutions
• Examples: Distributed health record systems, genomic data analysis platforms
• Use Case: Efficient data replication and retrieval across global users
• Examples: Akamai, Cloudflare, Netflix’s Open Connect
• Higher reliability
• Improved performance
• Easier system expansion
• Transparency of distributed and replicated data
Higher reliability
• Replication of components
• No single points of failure
• e.g., a broken communication link or processing element does not bring down the
entire system
• Distributed transaction processing guarantees the consistency of the database and
concurrency
Improved performance
• Refers to the separation of the higher-level semantics of the system from the lower-
level implementation issues
• A transparent system “hides” the implementation details from the users.
• A fully transparent DBMS provides high-level support for the development of
complex applications.
• The user is protected from the operational details of the network (and may not even
know that the network exists)
• The user does not need to know the location of data items; a command used to
perform a task is independent of both the location of the data and the site at which
the task is performed (location transparency)
• A unique name is provided for each object in the database (naming transparency)
In the absence of this, users are required to embed the location name as part of
an identifier
Different ways to ensure naming transparency:
Replication transparency ensures that the user is not involved in the management of
copies of some data
• The user should not even be aware of the existence of replicas, but should
work as if there were a single copy of the data
• Replication of data is needed for various reasons
e.g., increased efficiency for read-only data access
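A minimal sketch of the idea, assuming a simple "read any copy, write all copies" policy (the policy and the class name `ReplicatedItem` are assumptions for illustration, not something these notes prescribe): the user calls `read` and `write` on one logical item and never sees the individual replicas.

```python
import random

class ReplicatedItem:
    """Hypothetical wrapper that hides several copies behind one logical item."""

    def __init__(self, value, copies=3):
        self._replicas = [value] * copies

    def read(self):
        # Any replica may serve a read; the caller cannot tell which one did.
        return random.choice(self._replicas)

    def write(self, value):
        # Update every copy so that later reads agree, whichever replica serves them.
        self._replicas = [value] * len(self._replicas)


item = ReplicatedItem(10)
first_read = item.read()
item.write(20)
second_read = item.read()
```

Reads can be spread over the copies, which is where the efficiency gain for read-mostly data comes from; the cost is that every write has to reach all replicas.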
Fragmentation transparency ensures that the user is not aware of and is not involved in
the fragmentation of the data
• The user is not involved in finding query processing strategies over fragments or
formulating queries over fragments
The evaluation of a query that is specified over an entire relation but now has
to be performed on top of the fragments requires an appropriate query
evaluation strategy
• Fragmentation is commonly done for reasons of performance, availability, and
reliability
• Two fragmentation alternatives
Horizontal fragmentation: divide a relation into subsets of tuples
Vertical fragmentation: divide a relation by columns
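The two alternatives can be sketched on a small in-memory relation. This is an illustration only: the sample rows, the `City` attribute, and all function names are hypothetical. Horizontal fragments are recombined by union, and vertical fragments by joining on the key, which is why each vertical fragment keeps the key column.

```python
students = [
    {"ID": 101, "Name": "Alice", "City": "Pune"},
    {"ID": 102, "Name": "Bob", "City": "Delhi"},
    {"ID": 103, "Name": "Carol", "City": "Pune"},
]

def horizontal_fragments(relation, attribute):
    """Split into subsets of tuples: one fragment per value of the attribute."""
    fragments = {}
    for row in relation:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments

def vertical_fragments(relation, key, column_groups):
    """Split by columns: project onto each column group, keeping the key in each."""
    return [
        [{c: row[c] for c in [key] + cols} for row in relation]
        for cols in column_groups
    ]

def rebuild_from_horizontal(fragments, key):
    """Union of all horizontal fragments, ordered by key."""
    rows = [row for frag in fragments.values() for row in frag]
    return sorted(rows, key=lambda r: r[key])

def rebuild_from_vertical(fragments, key):
    """Join the vertical fragments on the shared key, ordered by key."""
    merged = {}
    for frag in fragments:
        for row in frag:
            merged.setdefault(row[key], {}).update(row)
    return sorted(merged.values(), key=lambda r: r[key])

by_city = horizontal_fragments(students, "City")
by_cols = vertical_fragments(students, "ID", [["Name"], ["City"]])
```

With fragmentation transparency, the user would query the whole relation and the DDBMS would perform the equivalent of the two `rebuild_*` steps internally.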
Transaction transparency ensures that all distributed transactions maintain integrity and
consistency of the DDB and support concurrency
• Each distributed transaction is divided into a number of sub-transactions (a sub-
transaction for each site that has relevant data) that concurrently access data at
different locations
• DDBMS must ensure the indivisibility of both the global transaction and each of the
sub-transactions
• Can be further divided into
Concurrency transparency
Failure transparency
Failure transparency: DDBMS must ensure atomicity and durability of the global
transaction, i.e., the sub-transactions of the global transaction either all commit or all
abort.
• Thus, the DDBMS must synchronize the global transaction to ensure that all sub-
transactions have completed successfully before recording a final COMMIT for the
global transaction
• The solution should be robust in the presence of site and network failures
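The all-commit-or-all-abort rule above can be sketched as a voting round followed by a decision round, in the spirit of a two-phase commit protocol. The notes do not name a specific protocol, so this is an assumption, and the `Site` class and `run_global_transaction` function are hypothetical. The sketch also leaves out what makes real protocols robust: logging, timeouts, and recovery after site or network failures.

```python
class Site:
    """Hypothetical participant that holds one sub-transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote: True means this sub-transaction completed and is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def run_global_transaction(sites):
    """Record COMMIT only if every site votes yes; otherwise abort everywhere."""
    votes = [site.prepare() for site in sites]
    if all(votes):
        for site in sites:
            site.commit()
        return "COMMIT"
    for site in sites:
        site.abort()
    return "ABORT"


ok_sites = [Site("A"), Site("B")]
ok_result = run_global_transaction(ok_sites)

bad_sites = [Site("A"), Site("B", can_commit=False)]
bad_result = run_global_transaction(bad_sites)
```

In the second run a single failing sub-transaction forces every site to abort, so the global transaction stays indivisible.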
1. Complexity
Distributed systems are more complex than centralized ones because they involve
multiple sites, networks, and synchronization. Managing data consistency,
communication between nodes, and fault tolerance adds to the system's overall difficulty.
2. Cost
3. Security
Since data is stored and transmitted across different locations and networks, it is more
vulnerable to unauthorized access, attacks, and data breaches. Ensuring secure
communication and access control across nodes is more difficult.
Maintaining data integrity, that is, ensuring that the data is accurate and consistent, is
harder in a distributed environment. This is because updates and transactions may happen
simultaneously across different locations, increasing the chance of conflicts or
inconsistencies.
5. Lack of Standards
Relational databases are based on the relational model, introduced by E.F. Codd (1970),
which organizes data into structured tables (relations) with defined relationships. Below is
a comprehensive breakdown of key concepts:
2. Relational Constraints
Operation    Symbol   Description                     Example (SQL-like)
Difference   −        Rows in A but not B (EXCEPT)    π(CourseID)(Math) − π(CourseID)(Art)
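The difference example can be tried with Python's built-in sqlite3 module. The table layouts and sample rows below are assumptions made for illustration; only the relation names `Math` and `Art` and the `CourseID` attribute come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Math (StudentID INTEGER, CourseID TEXT);
    CREATE TABLE Art  (StudentID INTEGER, CourseID TEXT);
    INSERT INTO Math VALUES (1, 'C1'), (2, 'C2'), (3, 'C3');
    INSERT INTO Art  VALUES (4, 'C2'), (5, 'C4');
""")

# π(CourseID)(Math) − π(CourseID)(Art): project both sides, then subtract.
rows = conn.execute("""
    SELECT CourseID FROM Math
    EXCEPT
    SELECT CourseID FROM Art
    ORDER BY CourseID
""").fetchall()

difference = [r[0] for r in rows]
print(difference)  # ['C1', 'C3']
```

`C2` appears in both projections and is removed; `C4` appears only in `Art` and never enters the result, because difference is not symmetric.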
• Deal with multi-valued dependencies and join anomalies (rarely used in practice).
Command        Purpose                 Example
INSERT         Add new records         INSERT INTO Students VALUES (101, 'Alice')
UPDATE         Modify existing data    UPDATE Students SET Age = 23 WHERE ID = 101
CREATE TABLE   Define a new table      CREATE TABLE Students (ID INT PRIMARY KEY, Name VARCHAR(50))
DROP TABLE     Delete a table          DROP TABLE Students
(b) Joins
Join Type   Description                            Example
LEFT JOIN   All rows from left + matching right    SELECT * FROM Students LEFT JOIN Enrollments ON Students.ID = Enrollments.StudentID
FULL JOIN   All rows from both tables              SELECT * FROM Students FULL JOIN Enrollments ON Students.ID = Enrollments.StudentID
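A small runnable sketch of the LEFT JOIN row, using Python's built-in sqlite3 module. The sample rows and the `CourseID` column of `Enrollments` are assumptions for illustration. FULL JOIN is left out because older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (ID INT PRIMARY KEY, Name VARCHAR(50));
    CREATE TABLE Enrollments (StudentID INT, CourseID VARCHAR(10));
    INSERT INTO Students VALUES (101, 'Alice');
    INSERT INTO Students VALUES (102, 'Bob');
    INSERT INTO Enrollments VALUES (101, 'C1');
""")

# Every student appears once per matching enrollment;
# students with no enrollment still appear, with NULL for CourseID.
left_join = conn.execute("""
    SELECT Students.ID, Students.Name, Enrollments.CourseID
    FROM Students LEFT JOIN Enrollments
    ON Students.ID = Enrollments.StudentID
    ORDER BY Students.ID
""").fetchall()

print(left_join)  # [(101, 'Alice', 'C1'), (102, 'Bob', None)]
```

Bob has no matching row in `Enrollments`, so the right-hand column comes back as `None` (SQL NULL) instead of the row being dropped.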
• Atomicity: Transactions are all-or-nothing (if one part fails, the whole transaction
rolls back).
• Consistency: Transactions bring the database from one valid state to another.
• Isolation: Concurrent transactions don’t interfere (via locking or MVCC).
• Durability: Once committed, changes persist even after a crash.
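Atomicity can be observed directly with Python's built-in sqlite3 module. The `Accounts` table, its CHECK constraint, and the `transfer` function are hypothetical and exist only for this sketch. A transfer is two updates; if the second one fails, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    "ID INTEGER PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(connection, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE Accounts SET Balance = Balance + ? WHERE ID = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if funds are insufficient.
            connection.execute(
                "UPDATE Accounts SET Balance = Balance - ? WHERE ID = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    return connection.execute(
        "SELECT ID, Balance FROM Accounts ORDER BY ID"
    ).fetchall()

failed = transfer(conn, 1, 2, 500)    # debit fails, so the credit is undone too
after_failed = balances(conn)
succeeded = transfer(conn, 1, 2, 30)  # both updates commit together
after_success = balances(conn)
```

After the failed transfer the balances are unchanged, even though the credit to account 2 had already executed inside the transaction. The CHECK constraint also illustrates consistency: the database never moves to a state with a negative balance.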