Elmasri Storage Hashing


Disk Storage, Basic File Structures, and Hashing

(Based on Elmasri and Navathe, Chapter 13)
Contents
Disk Storage Devices
Files of Records
Operations on Files
Unordered Files
Ordered Files
Hashed Files
– Dynamic and Extendible Hashing Techniques

RAID Technology
Disk Storage Devices
 Preferred secondary storage device
for high storage capacity and low cost.
 Data stored as magnetized areas on
magnetic disk surfaces.
 A disk pack contains several magnetic
disks connected to a rotating spindle.
 Disks are divided into concentric
circular tracks on each disk surface.
Track capacities vary typically from
tens of Kbytes to 150 Kbytes.
Disk Storage Devices (cont.)
Because a track usually contains a
large amount of information, it is
divided into smaller blocks or sectors.
 The division of a track into sectors is hard-coded
on the disk surface and cannot be changed. In one
type of sector organization, a sector is the portion
of a track that subtends a fixed angle at the center.
 A track is divided into blocks. The block size B
is fixed for each system. Typical block sizes
range from B=512 bytes to B=8192 bytes.
Whole blocks are transferred between disk
and main memory for processing.
Disk Storage Devices (cont.)
 A read-write head moves to the track that contains
the block to be transferred. Disk rotation moves the
block under the read-write head for reading or
writing.
 A physical disk block (hardware) address consists of
a cylinder number (imaginary collection of tracks of
same radius from all recorded surfaces), the track
number or surface number (within the cylinder), and
block number (within track).
 Reading or writing a disk block is time consuming
because of the seek time s and rotational delay
(latency) rd.
 Double buffering can be used to speed up the
transfer of contiguous disk blocks.
– If CPU processing is faster than I/O processor’s
transferring
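The cost of reading one block can be estimated from the seek time s and rotational delay rd named above. The sketch below is only an illustration: it assumes the average rotational delay is half a revolution, and it adds a block transfer time term that the slide does not mention. The function names and the sample figures (8 ms seek, 7200 rpm, 0.1 ms transfer) are hypothetical.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay rd: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def block_access_time_ms(seek_ms, rpm, block_transfer_ms):
    """Estimated time to read one random block: s + rd + transfer time."""
    return seek_ms + avg_rotational_delay_ms(rpm) + block_transfer_ms


# Hypothetical drive: 8 ms average seek, 7200 rpm, 0.1 ms to transfer a block.
print(round(avg_rotational_delay_ms(7200), 2))       # 4.17
print(round(block_access_time_ms(8, 7200, 0.1), 2))  # 12.27
```

With figures like these, seek time and rotational delay dominate the transfer time, which is why transferring contiguous blocks pays off.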
Disk Storage Devices (cont.)
[Table: Typical Disk Parameters]

Records
 Fixed and variable length records
 Records contain fields which have values of a
particular type (e.g., amount, date, time, age)
 Fields themselves may be fixed length or
variable length
 Variable length fields can be mixed into one
record: separator characters or length fields
are needed so that the record can be “parsed”.
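A minimal sketch of the length-field approach, assuming each field is stored as a 2-byte big-endian length followed by its UTF-8 bytes. The layout and the names pack_record and unpack_record are illustrative choices, not a format given in the text.

```python
import struct


def pack_record(fields):
    """Encode a list of strings as length-prefixed variable-length fields."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack(">H", len(data)) + data  # 2-byte length, then bytes
    return bytes(out)


def unpack_record(buf):
    """Parse a record produced by pack_record back into its fields."""
    fields, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields


record = pack_record(["Smith", "2024-01-15", ""])
print(unpack_record(record))  # ['Smith', '2024-01-15', '']
```

Separator characters would work similarly, but the separator must then never occur inside a field value.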

Blocking
 Blocking: refers to storing a number of
records in one block on the disk.
 Blocking factor (bfr) refers to the number of
records per block.
 There may be empty space in a block if an
integral number of records do not fit in one
block.
 Spanned records: records that do not fit in the
space remaining in a block (or are larger than a
block) and hence are stored across several blocks.
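For fixed-length records with unspanned blocking, the blocking factor and the unused space per block follow directly from the block size and the record size. The sketch below assumes a block size B = 512 bytes (within the range quoted earlier) and a hypothetical record size R = 100 bytes and record count r = 30000.

```python
import math


def blocking(B, R, r):
    """Unspanned blocking of r fixed-length records of R bytes in B-byte blocks."""
    bfr = B // R                 # blocking factor: whole records per block
    unused = B - bfr * R         # bytes left empty in each block
    blocks = math.ceil(r / bfr)  # blocks needed to hold the whole file
    return bfr, unused, blocks


print(blocking(B=512, R=100, r=30000))  # (5, 12, 6000)
```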
Files of Records
 A file is a sequence of records, where each
record is a collection of data values (or data
items).
 A file descriptor (or file header) includes
information that describes the file, such as the
field names and their data types, and the
addresses of the file blocks on disk.
 Records are stored on disk blocks. The blocking
factor bfr for a file is the (average) number of
file records stored in a disk block.
 A file can have fixed-length records or variable-length records.
Files of Records (cont.)
 File records can be unspanned (no record can
span two blocks) or spanned (a record can be
stored in more than one block).
 The physical disk blocks that are allocated to
hold the records of a file can be contiguous,
linked, or indexed.
 In a file of fixed-length records, all records have
the same format. Usually, unspanned blocking is
used with such files.
 Files of variable-length records require
additional information to be stored in each
record, such as separator characters and field
types. Usually spanned blocking is used with such files.
Operations on Files
Typical file operations include:
 OPEN: Readies the file for access, and
associates a pointer that will refer to a current
file record at each point in time.
 FIND: Searches for the first file record that
satisfies a certain condition, and makes it the
current file record.
 FINDNEXT: Searches for the next file record
(from the current record) that satisfies a certain
condition, and makes it the current file record.
 READ: Reads the current file record into a
program variable.
 INSERT: Inserts a new record into the file, and
makes it the current file record.
Operations on Files (cont.)
 DELETE: Removes the current file record from
the file, usually by marking the record to
indicate that it is no longer valid.
 MODIFY: Changes the values of some fields of
the current file record.
 CLOSE: Terminates access to the file.
 REORGANIZE: Reorganizes the file records.
For example, the records marked deleted are
physically removed from the file or a new
organization of the file records is created.
 READ_ORDERED: Reads the file blocks in
order of a specific field of the file.
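The way these operations share a current-record pointer can be sketched with a small in-memory stand-in. The class RecordFile and its method names are hypothetical; a real file system would work on disk blocks rather than a Python list.

```python
class RecordFile:
    """In-memory stand-in for a file of records with a current-record pointer."""

    def __init__(self, records=()):          # OPEN
        self.records = [dict(r) for r in records]
        self.deleted = [False] * len(self.records)
        self.current = None                   # index of the current record

    def _scan(self, start, cond):
        for i in range(start, len(self.records)):
            if not self.deleted[i] and cond(self.records[i]):
                self.current = i
                return True
        return False

    def find(self, cond):                     # FIND
        return self._scan(0, cond)

    def find_next(self, cond):                # FINDNEXT
        start = 0 if self.current is None else self.current + 1
        return self._scan(start, cond)

    def read(self):                           # READ
        return dict(self.records[self.current])

    def insert(self, record):                 # INSERT
        self.records.append(dict(record))
        self.deleted.append(False)
        self.current = len(self.records) - 1

    def delete(self):                         # DELETE: mark only
        self.deleted[self.current] = True

    def reorganize(self):                     # REORGANIZE: drop marked records
        self.records = [r for r, d in zip(self.records, self.deleted) if not d]
        self.deleted = [False] * len(self.records)
        self.current = None


f = RecordFile([{"name": "a", "age": 30}, {"name": "b", "age": 40},
                {"name": "c", "age": 30}])
is_30 = lambda r: r["age"] == 30
f.find(is_30)
print(f.read()["name"])   # a
f.find_next(is_30)
print(f.read()["name"])   # c
```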
Unordered Files
 Also called a heap or a pile file.

 New records are inserted at the end of the file.
 To search for a record, a linear search through
the file records is necessary. This requires
reading and searching half the file blocks on the
average, and is hence quite expensive.
 Record insertion is quite efficient.
 Reading the records in order of a particular
field requires sorting the file records.
Ordered Files
 Also called a sequential file.
 File records are kept sorted by the values of an
ordering field.
 Insertion is expensive: records must be inserted
in the correct order. It is common to keep a
separate unordered overflow (or transaction)
file for new records to improve insertion
efficiency; this is periodically merged with the
main ordered file.
 A binary search can be used to search for a
record on its ordering field value. This requires
reading and searching about log2(b) of the b file
blocks on the average, an improvement over linear search.
 Reading the records in order of the ordering
field is quite efficient.
Ordered Files (cont.)
Average Access Times
The following table shows the average access time
to access a specific record for a given type of
file
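As a rough illustration of the search costs stated for unordered and ordered files, the sketch below counts block reads for a linear scan and for a block-level binary search. The file layout (1024 blocks of 4 integer keys each) and the function names are assumptions made for the example.

```python
def linear_search(blocks, key):
    """Scan blocks in file order; return (found, blocks_read)."""
    for reads, block in enumerate(blocks, start=1):
        if key in block:
            return True, reads
    return False, len(blocks)


def binary_search(blocks, key):
    """Binary search over non-empty blocks sorted on the ordering field."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]      # one block read
        reads += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads


# Hypothetical ordered file: b = 1024 blocks, 4 keys per block.
blocks = [list(range(i, i + 4)) for i in range(0, 4096, 4)]
print(linear_search(blocks, 2050))  # (True, 513): about b/2 blocks
print(binary_search(blocks, 2050))  # found in roughly log2(b) = 10 block reads
```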
Hashed Files
 Hashing for disk files is called External Hashing
 The file blocks are divided into M equal-sized
buckets, numbered bucket 0, bucket 1, ..., bucket M-1.
Typically, a bucket corresponds to one disk block
(or a fixed number of disk blocks).
 One of the file fields is designated to be the hash key
of the file.
 The record with hash key value K is stored in bucket
i, where i=h(K), and h is the hashing function.
 Search is very efficient on the hash key.
 Collisions occur when a new record hashes to a
bucket that is already full. An overflow file is kept
for storing such records. Overflow records that hash
Hashed Files (cont.)
There are numerous methods for collision resolution, including
the following:
 Open addressing: Proceeding from the occupied position
specified by the hash address, the program checks the
subsequent positions in order until an unused (empty)
position is found.
 Chaining: For this method, various overflow locations are
kept, usually by extending the array with a number of
overflow positions. In addition, a pointer field is added to
each record location. A collision is resolved by placing the
new record in an unused overflow location and setting the
pointer of the occupied hash address location to the address
of that overflow location.
 Multiple hashing: The program applies a second hash
function if the first results in a collision. If another collision
results, the program uses open addressing or applies a
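Open addressing can be sketched as linear probing over an in-memory array. The sketch assumes that probing wraps around to the start of the table and that the hash function is K mod M; both are choices made for this example, and the function name is hypothetical.

```python
def insert_open_addressing(table, key, h):
    """Place key at h(key), or at the next unused position after it."""
    M = len(table)
    start = h(key) % M
    for step in range(M):
        pos = (start + step) % M   # assumption: wrap around at the table end
        if table[pos] is None:
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


table = [None] * 7
h = lambda k: k % 7
print(insert_open_addressing(table, 10, h))  # 3
print(insert_open_addressing(table, 17, h))  # 4 (position 3 is occupied)
print(insert_open_addressing(table, 24, h))  # 5 (positions 3 and 4 are occupied)
```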
Hashed Files (cont.)
 To reduce overflow records, a hash file is
typically kept 70-80% full.
 The hash function h should distribute the
records uniformly among the buckets;
otherwise, search time will be increased
because many overflow records will exist.
 Main disadvantages of static external hashing:
- Fixed number of buckets M is a problem if
the number of records in the file grows or
shrinks.
- Ordered access on the hash key is quite
inefficient (requires sorting the records).
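A minimal in-memory sketch of static external hashing with a fixed number of buckets M. It assumes integer keys, the hash function h(K) = K mod M, and one overflow list per bucket; the class StaticHashFile is hypothetical and stands in for buckets that would really be disk blocks.

```python
class StaticHashFile:
    """Static hashing: M fixed buckets of limited capacity plus overflow."""

    def __init__(self, M, bucket_capacity):
        self.M = M
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(M)]    # primary buckets
        self.overflow = [[] for _ in range(M)]   # overflow records per bucket

    def h(self, key):
        return key % self.M                      # assumption: integer keys

    def insert(self, key, record):
        i = self.h(key)
        if len(self.buckets[i]) < self.capacity:
            self.buckets[i].append((key, record))
        else:
            self.overflow[i].append((key, record))  # bucket full: collision

    def search(self, key):
        i = self.h(key)
        for k, record in self.buckets[i] + self.overflow[i]:
            if k == key:
                return record
        return None


f = StaticHashFile(M=4, bucket_capacity=2)
for k in (1, 5, 9):            # all three keys hash to bucket 1
    f.insert(k, f"record {k}")
print(f.search(9))             # record 9 (found in the overflow list)
```

Because M is fixed, a growing file pushes more and more records into overflow, which is the first disadvantage listed above.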
Hashed Files - Overflow handling
Dynamic and Extendible Hashed Files
Dynamic and Extendible Hashing Techniques
 Hashing techniques are adapted to allow the
dynamic growth and shrinking of the number of
file records.
 These techniques include the following: dynamic
hashing, extendible hashing, and linear hashing.
 Both dynamic and extendible hashing use the
binary representation of the hash value h(K) in
order to access a directory. In dynamic hashing
the directory is a binary tree. In extendible
hashing the directory is an array of size 2^d where
d is called the global depth.
Dynamic and Extendible Hashing (cont.)
 The directories can be stored on disk, and they
expand or shrink dynamically. Directory entries
point to the disk blocks that contain the stored
records.
 An insertion in a disk block that is full causes the
block to split into two blocks, and the records are
redistributed between the two blocks. The directory
is updated appropriately.
 Dynamic and extendible hashing do not require an
overflow area.
 Linear hashing does require an overflow area but
does not use a directory. Blocks are split in linear
Examples of Extendible Hashing
 Suppose that we are using an extendible hash table with bucket
size 2 and suppose that our hash function H is such that:
H(ANT)  = 1110...    H(DOG)  = 0101...    H(PIG)   = 1001...
H(BEAR) = 0010...    H(ELK)  = 1000...    H(RAT)   = 0000...
H(CAT)  = 1010...    H(GORN) = 1010...    H(WOLF)  = 0111...
H(COW)  = 0001...    H(MOOSE) = 0001...
Each diagram lists directory entries and the bucket they point to;
entries listed on one line share a bucket. Each operation is shown
applied to the original state.

Original state
  000                     COW RAT
  001                     BEAR
  010, 011                DOG
  100, 101, 110, 111      CAT ELK

Insert WOLF
  000                     COW RAT
  001                     BEAR
  010, 011                DOG WOLF
  100, 101, 110, 111      CAT ELK

Insert ANT
  000                     COW RAT
  001                     BEAR
  010, 011                DOG
  100, 101                CAT ELK
  110, 111                ANT

Insert GORN
  000                     COW RAT
  001                     BEAR
  010, 011                DOG
  100                     ELK
  101                     CAT GORN
  110, 111                (empty)

Delete RAT
  00                      COW BEAR
  01                      DOG
  10, 11                  CAT ELK

Delete DOG
  000                     COW RAT
  001                     BEAR
  010, 011                (empty)
  100, 101, 110, 111      CAT ELK

Insert MOOSE
  0000                    RAT
  0001                    COW MOOSE
  0010, 0011              BEAR
  0100, 0101, 0110, 0111  DOG
  1000 to 1111            CAT ELK
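The insertions in this example can be reproduced with a short sketch of extendible hashing. It assumes the directory is indexed by the leading d bits of the hash value, uses only the four bits listed for each key, and builds the original state by inserting CAT, ELK, DOG, BEAR, COW, RAT in that order (an order chosen here because it happens to yield that state). Deletion and bucket merging are not covered, and the class names are hypothetical.

```python
# Hash values from the example, limited to the four bits shown.
H = {"ANT": "1110", "DOG": "0101", "PIG": "1001", "BEAR": "0010",
     "ELK": "1000", "RAT": "0000", "CAT": "1010", "GORN": "1010",
     "WOLF": "0111", "COW": "0001", "MOOSE": "0001"}


class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: leading bits shared by all keys
        self.keys = []


class ExtendibleHash:
    def __init__(self, bucket_size=2):
        self.size = bucket_size
        self.depth = 0                 # global depth d
        self.dir = [Bucket(0)]         # directory with 2**d entries

    def _index(self, key):
        bits = H[key][:self.depth]
        return int(bits, 2) if bits else 0

    def insert(self, key):
        bucket = self.dir[self._index(key)]
        if len(bucket.keys) < self.size:
            bucket.keys.append(key)
            return
        if bucket.depth == self.depth:
            if self.depth == len(H[key]):
                raise ValueError("ran out of hash bits")
            # Double the directory: each old entry becomes two entries.
            self.dir = [b for b in self.dir for _ in (0, 1)]
            self.depth += 1
        # Split the full bucket on its next bit.
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        old, bucket.keys = bucket.keys, []
        shift = self.depth - bucket.depth
        for i, b in enumerate(self.dir):
            if b is bucket and (i >> shift) & 1:
                self.dir[i] = sibling
        for k in old:
            self.dir[self._index(k)].keys.append(k)
        self.insert(key)               # retry; may split again

    def buckets(self):
        """Map each bucket's bit prefix to the keys it holds."""
        out = {}
        for i, b in enumerate(self.dir):
            bits = format(i, "b").zfill(self.depth) if self.depth else ""
            out[bits[:b.depth]] = list(b.keys)
        return out


def original_state():
    table = ExtendibleHash()
    for key in ["CAT", "ELK", "DOG", "BEAR", "COW", "RAT"]:
        table.insert(key)
    return table


t = original_state()
print(t.buckets())  # {'000': ['COW', 'RAT'], '001': ['BEAR'], '01': ['DOG'], '1': ['CAT', 'ELK']}
t.insert("MOOSE")
print(t.depth)      # 4: the directory doubled because bucket 000 was full
```

A prefix such as '01' stands for all directory entries beginning with those bits (010 and 011 at global depth 3).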
