PPT -CC-UNIT-5
Storage Systems: Evolution of storage technology, storage models, file systems and
database, distributed file systems, general parallel file systems. Google file system.
[Figure: timelines illustrating the two properties for a memory cell M.]
◼ Read/write coherence ➔ the result of a Read of memory cell M should be the same as the most recent Write to that cell.
◼ Before-or-after atomicity ➔ the result of every Read or Write is the same as if that Read or Write occurred either completely before or completely after any other Read or Write.
◼ Read/write coherence and before-or-after atomicity are two highly desirable properties of any storage model, and in particular of cell storage.
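The two properties above can be illustrated with a minimal Python sketch. The `Cell` class and the `run_workers` helper are hypothetical names introduced only for this example; the lock is one common way to obtain before-or-after atomicity, not the only one.

```python
import threading


class Cell:
    """A single memory cell M guarded by a lock (hypothetical helper)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        # Holding the lock means the read happens entirely before or
        # entirely after any write, never in the middle of one.
        with self._lock:
            return self._value

    def write(self, value):
        with self._lock:
            self._value = value

    def add(self, delta):
        # Read-modify-write executed as one before-or-after action.
        with self._lock:
            self._value += delta


def run_workers(cell, workers=4, increments=1000):
    """Start several threads that each increment the cell many times."""

    def work():
        for _ in range(increments):
            cell.add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read()
```

A read that follows a write returns the value just written (read/write coherence), and because every update runs under the lock, no increment is lost when several threads update the cell concurrently.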
Data Base Management System (DBMS)
◼ Database ➔ a collection of logically-related records.
◼ Data Base Management System (DBMS) ➔ the software that controls the access
to the database.
◼ Query language ➔ a dedicated programming language used to develop database
applications.
◼ Most cloud applications do not interact directly with the file system, but through a
DBMS.
◼ Database models ➔ reflect the limitations of the hardware available at the time
and the requirements of the most popular applications of each period.
navigational model of the 1960s.
relational model of the 1970s; e.g., MySQL, Oracle, and Microsoft SQL Server.
object-oriented model of the 1980s.
NoSQL model of the first decade of the 2000s; e.g., MongoDB and Cassandra.
10/16/2023 Cloud Computing/ Unit-5 9
Requirements of cloud applications
◼ Most cloud applications are data-intensive and test the limitations of the existing infrastructure.
Requirements:
Rapid application development and short time to market.
Low latency.
Scalability.
High availability.
Consistent view of the data.
◼ These requirements cannot be satisfied simultaneously by existing database models; e.g.,
relational databases are easy to use for application development but do not scale well.
◼ The NoSQL model is useful when the structure of the data does not require a relational model
and the amount of data is very large.
Does not support SQL as a query language.
May not guarantee the ACID (Atomicity, Consistency, Isolation, Durability) properties of
traditional databases; it usually guarantees only eventual consistency, for transactions limited
to a single data item.
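Eventual consistency for a single data item can be sketched as follows. This is a toy model, not the protocol of any particular NoSQL system: the `Replica` class, the `sync` function, and the last-writer-wins rule based on a caller-supplied timestamp are all assumptions made for the illustration.

```python
class Replica:
    """One copy of a key-value store; each item carries a timestamp."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value, timestamp):
        # Last-writer-wins: keep the version with the larger timestamp.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange items so both replicas end up with the newest version."""
    for key, (ts, value) in list(a.store.items()):
        b.put(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.put(key, value, ts)
```

Before `sync` runs, two replicas may return different values for the same key; after it runs, they agree. That temporary disagreement is what distinguishes this behavior from the ACID guarantees of a traditional database.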
Logical and physical organization of a file
◼ File ➔ a linear array of cells stored on a persistent storage device. Viewed
by an application as a collection of logical records; the file is stored on a
physical device as a set of physical records, or blocks, of size dictated by
the physical media.
◼ File pointer ➔ identifies a cell used as a starting point for a read or
write operation.
◼ The logical organization of a file ➔ reflects the data model, the view of the
data from the perspective of the application.
◼ The physical organization of a file ➔ reflects the storage model and
describes the manner in which the file is stored on a given storage medium.
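The relation between logical records, the file pointer, and physical blocks can be sketched in a few lines of Python. The sketch assumes fixed-size records and one byte per cell; the function names and the sizes used in the usage note are illustrative only.

```python
import io


def blocks_for_record(record_index, record_size, block_size):
    """Return the numbers of the physical blocks that hold one logical record."""
    start = record_index * record_size   # first cell of the record
    end = start + record_size - 1        # last cell of the record
    return list(range(start // block_size, end // block_size + 1))


def read_record(f, record_index, record_size):
    """Read one logical record from a file-like object."""
    # seek() sets the file pointer to the cell where the read starts.
    f.seek(record_index * record_size)
    return f.read(record_size)
```

With 100-byte records and 4096-byte blocks, record 40 occupies cells 4000 to 4099 and therefore spans blocks 0 and 1: the logical view (records) and the physical view (blocks) do not have to line up.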
The NFS client-server interaction. The vnode layer implements file operations in a
uniform manner, regardless of whether the file is local or remote.
An operation targeting a local file is directed to the local file system, while one for a
remote file involves NFS: an NFS client packages the relevant information about the
target, and the NFS server passes it to the vnode layer on the remote host which, in turn,
directs it to the remote file system.
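The dispatch described above can be sketched as follows. This is a toy model in a single process: all class names are hypothetical, the "RPC" is a plain method call, and the mount-prefix lookup is an assumption about how a remote file is recognized.

```python
class LocalFS:
    """In-memory stand-in for a local file system."""

    def __init__(self, files):
        self.files = files  # path -> bytes

    def read(self, path):
        return self.files[path]


class NFSServer:
    """Runs on the remote host; hands requests to that host's vnode layer."""

    def __init__(self, vnode_layer):
        self.vnode_layer = vnode_layer

    def handle(self, request):
        op, path = request
        if op != "read":
            raise ValueError("unsupported operation: " + op)
        return self.vnode_layer.read(path)


class NFSClient:
    """Packages information about the target and sends it to the server."""

    def __init__(self, server):
        self.server = server

    def read(self, path):
        request = ("read", path)
        return self.server.handle(request)  # a real client would use an RPC


class VnodeLayer:
    """Uniform entry point: same read() call for local and remote files."""

    def __init__(self, local_fs, mounts=None):
        self.local_fs = local_fs
        self.mounts = mounts or {}  # mount prefix -> NFSClient

    def read(self, path):
        for prefix, client in self.mounts.items():
            if path.startswith(prefix):
                return client.read(path[len(prefix):])  # remote file
        return self.local_fs.read(path)                 # local file
```

The caller uses the same `read` call in both cases; only the vnode layer knows whether the request is served by the local file system or forwarded through the NFS client.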
Comparison of distributed file systems
• A GPFS configuration. The disks are interconnected by a SAN, and the compute servers are distributed in four LANs. The I/O nodes are connected to LAN1.
[Figure labels: LAN1, LAN2, SAN, disk.]
GPFS reliability
◼ To recover from system failures, GPFS records all metadata updates in
a write-ahead log file.
◼ Write-ahead ➔ updates are written to persistent storage only after the
log records have been written.
◼ The log files are maintained by each I/O node for each file system it
mounts; any I/O node can initiate recovery on behalf of a failed node.
◼ Data striping allows concurrent access and improves performance but
can have unpleasant side effects: when a single disk fails, a large
number of files are affected.
◼ The system uses RAID devices with the stripes equal to the block size
and dual-attached RAID controllers.
◼ To further improve the fault tolerance of the system, GPFS data files as
well as metadata are replicated on two different physical disks.
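The write-ahead rule and log-based recovery can be sketched as follows. This is a toy in-memory model, not GPFS code: `MetadataStore`, `replay`, and the `crash_before_apply` flag are hypothetical, and Python containers stand in for the persistent log and the on-disk metadata.

```python
class MetadataStore:
    """Toy metadata store that logs every update before applying it."""

    def __init__(self):
        self.log = []       # stands in for the persistent write-ahead log
        self.metadata = {}  # stands in for metadata on persistent storage

    def update(self, key, value, crash_before_apply=False):
        # 1. Write the log record first.
        self.log.append((key, value))
        if crash_before_apply:
            return  # simulated failure between logging and applying
        # 2. Only then apply the update itself.
        self.metadata[key] = value


def replay(log, metadata):
    """Redo every logged update; any node with access to the log can run this."""
    for key, value in log:
        metadata[key] = value
```

Because the log record is always written first, an update interrupted by a failure is not lost: replaying the log brings the metadata to the state it would have reached without the failure. Replaying an update that was already applied is harmless in this sketch, since it simply writes the same value again.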
GPFS distributed locking
◼ In GPFS, consistency and synchronization are ensured by a distributed locking
mechanism. A central lock manager grants lock tokens to local lock managers running in
each I/O node. Lock tokens are also used by the cache management system.
◼ Lock granularity has important implications for performance.
GPFS uses a variety of techniques for different types of data.
Byte-range tokens ➔ used for read and write operations to data files as follows: the
first node attempting to write to a file acquires a token covering the entire file; this
node is allowed to carry out all reads and writes to the file without any need for
permission until a second node attempts to write to the same file; then, the range of
the token given to the first node is restricted.
Data-shipping ➔ an alternative to byte-range locking, allows fine-grain data sharing. In
this mode the file blocks are controlled by the I/O nodes in a round-robin manner. A
node forwards a read or write operation to the node controlling the target block, the
only one allowed to access the file.
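Both techniques can be sketched in a few lines. This is a simplified illustration, not the GPFS implementation: `TokenManager` and `controlling_node` are hypothetical names, a conflicting holder simply keeps the bytes before the new request, and the node names in the usage note are made up.

```python
import math


class TokenManager:
    """Toy central lock manager handing out byte-range write tokens."""

    def __init__(self):
        self.tokens = {}  # node -> (start, end), end exclusive

    def acquire(self, node, start, end):
        if not self.tokens:
            # First writer: the token covers the entire file.
            self.tokens[node] = (0, math.inf)
            return self.tokens[node]
        for holder, (h_start, h_end) in list(self.tokens.items()):
            if holder == node:
                continue
            if h_start < end and start < h_end:  # the ranges overlap
                # Restrict the holder's token to the bytes before the request.
                if h_start < start:
                    self.tokens[holder] = (h_start, start)
                else:
                    del self.tokens[holder]
        self.tokens[node] = (start, end)
        return self.tokens[node]


def controlling_node(block_index, io_nodes):
    """Data-shipping: blocks are assigned to I/O nodes in round-robin order."""
    return io_nodes[block_index % len(io_nodes)]
```

The first node to write receives a token for the whole file regardless of the range it asked for; when a second node requests bytes 4096 to 8191, the first node's token shrinks to bytes 0 to 4095. In data-shipping mode, with three I/O nodes, block 5 is controlled by the third node, and every read or write to that block is forwarded to it.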
[Figure: GFS architecture. Labels: chunk data, state information, instructions, communication network, chunk handle & data count.]