Storage and File Structure
Topic 7: Storage and File Structure
Classification of Physical Storage Media
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
Magnetic Disks
• Read-write head
• Positioned very close to the platter surface (almost touching it)
• Reads or writes magnetically encoded information.
• Surface of platter divided into circular tracks
• 50K to 100K tracks per platter on typical hard disks
• Each track is divided into sectors.
• A sector is the smallest unit of data that can be read or written.
• Sector size typically 512 bytes
• Typical sectors per track: 500 to 1000 on inner tracks, 1000 to 2000 on outer
tracks
• To read/write a sector
• disk arm swings to position head on right track
• platter spins continually; data is read/written as sector passes under head
• Head-disk assemblies
• multiple disk platters on a single spindle (1 to 5 usually)
• one head per platter, mounted on a common arm.
• Cylinder i consists of ith track of all the platters
Magnetic Disks (Cont.)
• Earlier generation disks were susceptible to head-crashes
• Surface of earlier generation disks had metal-oxide coatings which would
disintegrate on head crash and damage all data on disk
• Current generation disks are less susceptible to such disastrous failures,
although individual sectors may get corrupted
• Disk controller – interfaces between the computer system and the disk drive
hardware.
• accepts high-level commands to read or write a sector
• initiates actions such as moving the disk arm to the right track and actually
reading or writing the data
• Computes and attaches checksums to each sector to verify that data is read
back correctly
• If data is corrupted, with very high probability stored checksum won’t
match recomputed checksum
• Ensures successful writing by reading back sector after writing it
• Performs remapping of bad sectors
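The checksum step above can be illustrated with a small sketch. It assumes a CRC-32 checksum (via Python's `zlib.crc32`) appended to each sector; real disk controllers use their own error-detecting and error-correcting codes, so the functions `write_sector` and `read_sector` are hypothetical stand-ins for the idea rather than a description of actual firmware.

```python
import zlib

SECTOR_SIZE = 512  # bytes, the typical sector size mentioned above

def write_sector(data):
    """Return the bytes stored on disk: the data followed by its checksum."""
    return data + zlib.crc32(data).to_bytes(4, "big")

def read_sector(stored):
    """Recompute the checksum and compare it with the stored one."""
    data, checksum = stored[:-4], stored[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != checksum:
        raise IOError("checksum mismatch: sector is corrupted")
    return data

payload = bytes(range(256)) * 2          # 512 bytes of sample data
stored = write_sector(payload)
read_back = read_sector(stored)          # matches, so the data is returned

# Flip one bit to simulate a corrupted sector.
corrupted = bytes([stored[0] ^ 0x01]) + stored[1:]
try:
    read_sector(corrupted)
    detected = False
except IOError:
    detected = True
```

A controller that detects a mismatch like this would typically retry the read and, if the sector stays bad, remap it.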
Disk Subsystem
• Deletion of record i, alternatives:
• move records i + 1, . . . , n to i, . . . , n – 1
• move record n to i
• do not move records, but link all free records on a free list
Deleting record 3 and compacting
Deleting record 3 and moving last record
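The first two deletion alternatives can be sketched as follows. This is a minimal illustration that assumes the file is an in-memory Python list of fixed-length records indexed from 0; the function names are hypothetical.

```python
def delete_and_compact(records, i):
    """Shift every record after slot i down by one; keeps order, moves many records."""
    for j in range(i, len(records) - 1):
        records[j] = records[j + 1]
    records.pop()

def delete_and_move_last(records, i):
    """Overwrite slot i with the last record; moves one record, changes order."""
    records[i] = records[-1]
    records.pop()

compacted = ["r0", "r1", "r2", "r3", "r4", "r5"]
delete_and_compact(compacted, 3)

moved = ["r0", "r1", "r2", "r3", "r4", "r5"]
delete_and_move_last(moved, 3)
```

Compacting preserves record order at the cost of moving every later record, while moving the last record touches only one slot but reorders the file.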
Free Lists
• Store the address of the first deleted record in the file header.
• Use this first record to store the address of the second deleted
record, and so on
• Can think of these stored addresses as pointers since they “point” to
the location of a record.
• More space-efficient representation: reuse the space for normal
attributes of free records to store pointers. (No pointers are stored in
in-use records.)
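A free list along these lines might look like the sketch below. It assumes an in-memory list of slots where an in-use slot holds a record tuple and a free slot holds the index of the next free slot; the class `FixedLengthFile`, its methods, and the choice to push newly freed slots at the head of the list are all illustrative assumptions.

```python
class FixedLengthFile:
    """Fixed-length record slots; deleted slots are chained on a free list."""

    END = -1  # marks the end of the free list

    def __init__(self):
        self.slots = []            # record tuple, or int pointer when the slot is free
        self.free_head = self.END  # plays the role of the pointer in the file header

    def insert(self, record):
        if self.free_head != self.END:
            slot = self.free_head
            self.free_head = self.slots[slot]  # follow the pointer stored in the free slot
            self.slots[slot] = record
        else:
            slot = len(self.slots)             # no free slot: grow the file
            self.slots.append(record)
        return slot

    def delete(self, slot):
        # Reuse the record's own space to store the pointer to the next free slot.
        self.slots[slot] = self.free_head
        self.free_head = slot

f = FixedLengthFile()
for name in ("A", "B", "C", "D"):
    f.insert((name,))
f.delete(1)
f.delete(3)
first_reused = f.insert(("E",))
second_reused = f.insert(("F",))
appended = f.insert(("G",))
```

Inserts reuse freed slots before the file grows, and no in-use slot carries any pointer overhead.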
Variable-Length Records
• Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields such as
strings (varchar)
• Record types that allow repeating fields (used in some older data
models).
• Attributes are stored in order
• Variable length attributes represented by fixed size (offset,
length), with actual data stored after all fixed length attributes
• Null values represented by null-value bitmap
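The representation above can be sketched for a single record type. The layout here is an assumption made for illustration: three variable-length string attributes (ID, name, dept_name) each get a 2-byte offset and a 2-byte length, followed by an 8-byte salary and a 1-byte null bitmap, with the string data stored after this fixed-size part. The attribute names and field widths are hypothetical.

```python
import struct

# Assumed fixed-size part: three (offset, length) pairs, salary, null bitmap.
HEADER = struct.Struct("<HHHHHHdB")

def encode(id_, name, dept_name, salary):
    null_bitmap = 0
    for bit, value in enumerate((id_, name, dept_name, salary)):
        if value is None:
            null_bitmap |= 1 << bit          # bit i set means attribute i is null
    pairs, body = [], b""
    for text in (id_, name, dept_name):
        data = b"" if text is None else text.encode("utf-8")
        pairs += [HEADER.size + len(body), len(data)]
        body += data                          # variable data follows the fixed part
    fixed_salary = 0.0 if salary is None else salary
    return HEADER.pack(*pairs, fixed_salary, null_bitmap) + body

def decode(record):
    *pairs, salary, null_bitmap = HEADER.unpack_from(record)
    values = []
    for bit in range(3):
        offset, length = pairs[2 * bit], pairs[2 * bit + 1]
        if null_bitmap & (1 << bit):
            values.append(None)
        else:
            values.append(record[offset:offset + length].decode("utf-8"))
    values.append(None if null_bitmap & (1 << 3) else salary)
    return tuple(values)

record = encode("10101", "Srinivasan", None, 65000.0)
decoded = decode(record)
```

Because every (offset, length) pair sits at a known position, any attribute can be located without scanning the attributes before it.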
Variable-Length Records: Slotted Page Structure
[Figure: the department and instructor relations, and a multitable clustering of department and instructor]
Multitable Clustering File Organization (cont.)
• good for queries involving department ⋈ instructor, and for
queries involving one single department and its instructors
• bad for queries involving only department
• results in variable size records
• Can add pointer chains to link records of a particular relation
Data Dictionary Storage
The data dictionary (also called the system catalog) stores
metadata, that is, data about data, such as
• Information about relations
• names of relations
• names, types and lengths of attributes of each relation
• names and definitions of views
• integrity constraints
• User and accounting information, including passwords
• Statistical and descriptive data
• number of tuples in each relation
• Physical file organization information
• How relation is stored (sequential/hash/…)
• Physical location of relation
• Information about indices (Chapter 11)
Relational Representation of System Metadata
• Relational representation on disk
• Specialized data structures designed for efficient access, in memory
Storage Access
• A database file is partitioned into fixed-length storage
units called blocks. Blocks are units of both storage
allocation and data transfer.
• Database system seeks to minimize the number of block
transfers between the disk and memory. We can reduce
the number of disk accesses by keeping as many blocks as
possible in main memory.
• Buffer – portion of main memory available to store copies
of disk blocks.
• Buffer manager – subsystem responsible for allocating
buffer space in main memory.
Buffer Manager
• Programs call on the buffer manager when they need a
block from disk.
1. If the block is already in the buffer, the buffer manager returns
the address of the block in main memory.
2. If the block is not in the buffer, the buffer manager
   a. Allocates space in the buffer for the block.
      • Replaces (throws out) some other block, if required, to make
        space for the new block.
      • The replaced block is written back to disk only if it was modified since
        the most recent time that it was written to or fetched from the disk.
   b. Reads the block from the disk into the buffer, and returns the
      address of the block in main memory to the requester.
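The request-handling steps above can be sketched as a small class. This is an illustration under stated assumptions: the disk is modeled as a Python dict from block number to bytes, the replacement policy is LRU, and the names `BufferManager`, `get_block`, and `mark_dirty` are hypothetical. A real buffer manager also has to handle pinning, concurrency, and recovery.

```python
from collections import OrderedDict

class BufferManager:
    def __init__(self, disk, capacity):
        self.disk = disk              # dict: block number -> bytes (stand-in for the disk)
        self.capacity = capacity      # number of blocks the buffer can hold
        self.frames = OrderedDict()   # block number -> bytearray, least recently used first
        self.dirty = set()            # blocks modified since they were last read or written
        self.disk_reads = 0
        self.disk_writes = 0

    def get_block(self, block_id):
        if block_id in self.frames:               # step 1: already buffered
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:     # step 2a: make space if required
            victim, data = self.frames.popitem(last=False)
            if victim in self.dirty:              # write back only if modified
                self.disk[victim] = bytes(data)
                self.dirty.discard(victim)
                self.disk_writes += 1
        self.frames[block_id] = bytearray(self.disk[block_id])  # step 2b: read from disk
        self.disk_reads += 1
        return self.frames[block_id]

    def mark_dirty(self, block_id):
        self.dirty.add(block_id)

disk = {i: bytes([i]) * 4 for i in range(3)}
bm = BufferManager(disk, capacity=2)
block0 = bm.get_block(0)
block0[0] = 99            # modify the buffered copy
bm.mark_dirty(0)
bm.get_block(1)
bm.get_block(2)           # buffer is full: block 0 is evicted and written back
bm.get_block(1)           # still buffered, so no disk read
```

Returning the `bytearray` itself plays the role of returning the block's address in main memory: the caller works on the buffered copy, not on the disk.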
Buffer-Replacement Policies
• Most operating systems replace the block least recently
used (LRU strategy)
• Idea behind LRU – use past pattern of block references as a
predictor of future references
• Queries have well-defined access patterns (such as
sequential scans), and a database system can use the
information in a user’s query to predict future references
• LRU can be a bad strategy for certain access patterns involving
repeated scans of data
• For example, when computing the join of two relations r and s by
nested loops:
for each tuple tr of r do
    for each tuple ts of s do
        if the tuples tr and ts match …
• Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
• Pinned block – memory block that is not allowed to be written
back to disk.
• Toss-immediate strategy – frees the space occupied by a block
as soon as the final tuple of that block has been processed
• Most recently used (MRU) strategy – system must pin the block
currently being processed. After the final tuple of that block has
been processed, the block is unpinned, and it becomes the most
recently used block.
• Buffer manager can use statistical information regarding the
probability that a request will reference a particular relation
• E.g., the data dictionary is frequently accessed. Heuristic: keep
data-dictionary blocks in main memory buffer
• Buffer managers also support forced output of blocks for the
purpose of recovery (more in Chapter 16)
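The effect of repeated scans on LRU, as in the inner loop of the nested-loops join above, can be checked with a small simulation. It is a simplified sketch: it counts block fetches for a reference string under plain LRU or MRU eviction and ignores pinning; the function `count_misses` and the block counts are assumptions chosen for illustration.

```python
from collections import OrderedDict

def count_misses(references, capacity, policy):
    """Count block fetches for a reference string under 'LRU' or 'MRU' eviction."""
    buffer = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in references:
        if block in buffer:
            buffer.move_to_end(block)      # hit: block becomes most recently used
            continue
        misses += 1
        if len(buffer) >= capacity:
            # LRU evicts the oldest entry, MRU evicts the newest one.
            buffer.popitem(last=(policy == "MRU"))
        buffer[block] = True
    return misses

# Relation s occupies 5 blocks and is scanned 4 times; the buffer holds 3 blocks.
inner_scans = list(range(5)) * 4
lru_misses = count_misses(inner_scans, 3, "LRU")
mru_misses = count_misses(inner_scans, 3, "MRU")
```

With a buffer smaller than the scanned relation, LRU always evicts exactly the block that will be needed soonest, so every reference goes to disk, while MRU keeps part of the relation resident across scans.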