module_3_Study Session_1
module_3_Study Session_1
module_3_Study Session_1
Introduction
3.0 Conclusion/Summary
4.0 Self-Assessment Questions
5.0 Additional Activities (Videos, Animations & out of Class Activities)
6.0 Reference/Further Reading
Introduction
The collection of data that makes up a computerized database must be stored physically on some computer
storage medium. The DBMS software can then retrieve, update, and process this data as needed. Computer
storage media form a storage hierarchy that includes two main categories: primary storage and secondary
best suit the given application requirements from among the options. DBMS system implementers must
study data organization techniques so that they can implement them efficiently and thus provide the DBA
and users of the DBMS with sufficient options. Typical database applications need only a small portion of
the database at a time for procesing. Whenever a certain portion of the data is needed, it must be located
on disk, copied to main memory for processing, and then rewritten to the disk if the data is changed. The
data stored on disk is organized as files of records. Each record is a collection of data values that can be
interpreted as facts about entities, their attributes, and their relationships. Records should be stored on disk
in a manner that makes it possible to locate them efficiently when they are needed.
■ Primary storage. This category includes storage media that can be operated on directly by the
computer’s central processing unit (CPU), such as the computer’s main memory and smaller but faster
cache memories. Primary storage usually provides fast access to data but is of limited storage capacity.
Although main memory capacities have been growing rapidly in recent years, they are still more expensive
and have less storage capacity than secondary and tertiary storage devices.
■ Secondary and tertiary storage. This category includes magnetic disks, optical disks (CD-ROMs,
DVDs, and other similar storage media), and tapes. Hard-disk drives are classified as secondary storage,
whereas removable media such as optical disks and tapes are considered tertiary storage. These devices
usually have a larger capacity, cost less, and provide slower access to data than do primary storage devices.
Data in secondary or tertiary storage cannot be processed directly by the CPU; first it must be copied into
We first give an overview of the various storage devices used for primary and secondary storage, and then
highest-speed memory is the most expensive and is therefore available with the least capacity. The lowest-
speed memory is offline tape storage, which is essentially available in indefinite storage capacity.
At the primary storage level, the memory hierarchy includes at the most expensive end, cache memory,
which is a static RAM (Random Access Memory). Cache memory is typically used by the CPU to speed
up execution of program instructions using techniques such as prefetching and pipelining. The next level
of primary storage is DRAM (Dynamic RAM),which provides the main work area for the CPU for keeping
program instructions and data. It is popularly called main memory. The advantage of DRAM is its low cost,
which continues to decrease; the drawback is its volatility1 and lower speed compared with static RAM .At
the secondary and tertiary storage level, the hierarchy includes magnetic disks, as well as mass storage in
the form of CD-ROM (Compact Disk–Read-Only Memory) and DVD (Digital Video Disk or Digital
Versatile Disk) devices, and finally tapes at the least expensive end of the hierarchy. The storage capacity
is measured in kilobytes (Kbyte or 1000 bytes), megabytes (MB or 1 million bytes), gigabytes (GB or 1
billion bytes), and even terabytes (1000 GB). The word petabyte (1000 terabytes or 10**15 bytes) is now
becoming relevant in the context of very large repositories of data in physics, astronomy, earth sciences,
Programs reside and execute in DRAM. Generally, large permanent databases reside on secondary storage,
(magnetic disks), and portions of the database are read into and written from buffers in main memory as
needed. Nowadays, personal computers and workstations have large main memories of hundreds of
megabytes of RAM and DRAM, so it is becoming possible to load a large part of the database into main
memory. Eight to 16 GB of main memory on a single server is becoming commonplace. In some cases,
entire databases can be kept in main memory (with a backup copy on magnetic disk), leading to main
memory databases; these are particularly useful in real-time applications that require extremely fast
response times. An example is telephone switching applications, which store databases that contain routing
and line information in main memory. Between DRAM and magnetic disk storage, another form of
memory, flash memory, is becoming common, particularly because it is nonvolatile. Flash memories are
Only Memory) technology. The advantage of flash memory is the fast access speed; the disadvantage is
that an entire block must be erased and written over simultaneously. Flash memory cards are appearing as
the data storage medium in appliances with capacities ranging from a few megabytes to a few gigabytes.
These are appearing in cameras, MP3 players, cell phones, PDAs, and so on. USB (Universal Serial Bus)
flash drives have become the most portable medium for carrying data between personal computers; they
have a flash memory storage device integrated with a USB interface. CD-ROM (Compact Disk – Read
Only Memory) disks store data optically and are read by a laser. CD-ROMs contain prerecorded data that
cannot be overwritten. WORM (Write-Once-Read-Many) disks are a form of optical storage used for
archiving data; they allow data to be written once and read any number of times without the possibility of
erasing. They hold about half a gigabyte of data per disk and last much longer than magnetic disks.2 Optical
jukebox memories use an array of CD-ROM platters, which are loaded onto drives on demand. Although
optical jukeboxes have capacities in the hundreds of gigabytes, their retrieval times are in the hundreds of
milliseconds, quite a bit slower than magnetic disks. This type of storage is continuing to decline because
of the rapid decrease in cost and increase in capacities of magnetic disks. The DVD is another standard for
optical disks allowing 4.5 to 15 GB of storage per disk. Most personal computer disk drives now read
CDROM and DVD disks. Typically, drives are CD-R (Compact Disk Recordable) that can create CD-
ROMs and audio CDs (Compact Disks), as well as record on DVDs. Finally, magnetic tapes are used for
archiving and backup storage of data. Tape jukeboxes—which contain a bank of tapes that are catalogued
and can be automatically loaded onto tape drives—are becoming popular as tertiary storage to hold
terabytes of data. For example, NASA’s EOS (Earth Observation Satellite) system stores archived databases
in this fashion. Many large organizations are already finding it normal to have terabyte-sized databases.
The term very large database can no longer be precisely defined because disk storage capacities are on the
rise and costs are declining. Very soon the term may be reserved for databases containing tens of terabytes
1.1.2 Storage of Databases
Databases typically store large amounts of data that must persist over long periods of time, and hence is
often referred to as persistent data. Parts of this data are accessed and processed repeatedly during this
period. This contrasts with the notion of transient data that persist for only a limited time during program
execution. Most databases are stored permanently (or persistently) on magnetic disk secondary storage, for
■ The circumstances that cause permanent loss of stored data arise less frequently for disk secondary storage
than for primary storage. Hence, we refer to disk—and other secondary storage devices—as nonvolatile
■ The cost of storage per unit of data is an order of magnitude less for disk secondary storage than for
primary storage. Some of the newer technologies—such as optical disks, DVDs, and tape jukeboxes—are
likely to provide viable alternatives to the use of magnetic disks. In the future, databases may therefore
reside at different levels of the memory hierarchy. However, it is anticipated that magnetic disks will
continue to be the primary medium of choice for large databases for years to come. Hence, it is important
to study and understand the properties and characteristics of magnetic disks and the way data files can be
organized on disk in order to design effective databases with acceptable performance. Magnetic tapes are
frequently used as a storage medium for backing up databases because storage on tape costs even less than
storage on disk. However, access to data on tape is quite slow. Data stored on tapes is offline; that is, some
intervention by an operator—or an automatic loading device—to load a tape is needed before the data
becomes available. In contrast, disks are online devices that can be accessed directly at any time. The
techniques used to store large amounts of structured data on disk are important for database designers, the
DBA, and implementers of a DBMS. Database designers and the DBA must know the advantages and
disadvantages of each storage technique when they design, implement, and operate a database on a specific
DBMS. Usually, the DBMS has several options available for organizing the data. The process of physical
database design involves choosing the particular data organization techniques that best suit the given
application requirements from among the options. DBMS system implementers must study data
organization techniques so that they can implement them efficiently and thus provide the DBA and users
of the DBMS with sufficient options. Typical database applications need only a small portion of the
database at a time for procesing. Whenever a certain portion of the data is needed, it must be located on
disk, copied to main memory for processing, and then rewritten to the disk if the data is changed. The data
stored on disk is organized as files of records. Each record is a collection of data values that can be
interpreted as facts about entities, their attributes, and their relationships. Records should be stored on disk
in a manner that makes it possible to locate them efficiently when they are needed. There are several
primary file organizations, which determine how the file records are physically placed on the disk, and
hence how the records can be accessed. A heap file (or unordered file) places the records on disk in no
particular order by appending new records at the end of the file, whereas a sorted file (or sequential file)
keeps the records ordered by the value of a particular field (called the sort key). A hashed file uses a hash
function applied to a particular field (called the hash key) to determine a record’s placement on disk. Other
primary file organizations, such as B-trees, use tree structures. A secondary organization or auxiliary
access structure allows efficient access to file records based on alternate fields than those that have been
3.0 Conclusion/Summary
In particular, we focused on magnetic disks because they are used most often to store online database files.
Data on disk is stored in blocks; accessing a disk block is expensive because of the seek time, rotational
delay, and block transfer time. To reduce the average block access time, double buffering can be used when
accessing consecutive disk blocks. We presented different ways of storing file records on disk. File records
are grouped into disk blocks and can be fixed length or variable length, spanned or unspanned, and of the
same record type or mixed types. We discussed the file header, which describes the record formats and
keeps track of the disk addresses of the file blocks. Information in the file header is used by system software
In-text Questions
List At least three examples of secondary starage devices
In-text Answer
magnetic disks, optical disks (CD-ROMs, DVDs,
2. Why are disks, not tapes, used to store online database files?
3. Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, read/write
head.