L14 PPT IVSem


Lecture- 14

File Structures
Sequential Files,
Indexed Sequential Files

OBJECTIVES
After reading this chapter, we should be able to:
Understand the file access methods.
Describe the characteristics of a sequential file.
Describe the characteristics of an indexed file.
Describe the characteristics of a hashed file.
Distinguish between a text file and a binary file.

File

A file is an external collection of related data treated as a unit.
Files are stored in auxiliary/secondary storage devices.

Disk
Tapes

A file is a collection of data records, with each record consisting of one or more fields.

13.1 ACCESS METHODS

Figure 13-1 Taxonomy of file structures

The access method determines how records can be retrieved: sequentially or randomly.

Sequential access: one record after another, from beginning to end.

Random access: access one specific record without having to retrieve all records before it.

SEQUENTIAL FILES

Figure 13-2 Sequential file

Sequential file records can only be accessed sequentially, one after another, from beginning to end.

Program 13.1 Processing records in a sequential file
While Not EOF
{
Read the next record
Process the record
}
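
The loop in Program 13.1 can be sketched in Python as follows. This is a minimal illustration, assuming one record per line with comma-separated fields; the function name `process_sequential` and the sample data are hypothetical, and an in-memory `io.StringIO` stands in for a file on disk.

```python
import io

def process_sequential(stream, process):
    """Read records one after another until end of file."""
    count = 0
    for line in stream:                         # iteration stops at EOF
        record = line.rstrip("\n").split(",")   # split the record into fields
        process(record)                         # process the record
        count += 1
    return count

# Hypothetical sample file: key, name
sample = io.StringIO("101,Ann\n102,Bob\n103,Cy\n")
names = []
n = process_sequential(sample, lambda rec: names.append(rec[1]))
```

Every record is visited exactly once, in file order, which is why this pattern suits jobs that must touch all records.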

Applications

Applications that need to access all records from beginning to end.
Personal information

Because you have to process each record, sequential access is more efficient and easier than random access.

A sequential file is not efficient for random access.

Updating sequential files

Sequential files must be updated periodically to reflect changes in information.
The updating process: all of the records need to be checked and updated (if necessary) sequentially.

New Master File
Old Master File
Transaction File: contains changes to be applied to the master file.
Add transaction
Delete transaction
Change transaction
A key is one or more fields that uniquely identify the data in the file.

Error Report File

Updating a sequential file

Updating sequential files

To make the updating process efficient, all files are sorted on the same key.
The update process requires that you compare:
[transaction file key] vs. [old master file key]

< : add transaction to new master (transaction code = A (add))
= : change content of master file data (transaction code = R (revise)), or
    remove data from master file (transaction code = D (delete))
> : write old master file record to new master file

Updating process
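
The key comparison can be sketched as a merge of two sorted lists. This is a rough illustration under stated assumptions: records are `(key, data)` tuples, transactions are `(key, code, data)` tuples with at most one transaction per key, and anything that does not match the three cases goes to the error report. The name `update_master` and the error messages are hypothetical.

```python
def update_master(old_master, transactions):
    """Merge sorted transactions into a sorted old master file.

    Returns (new_master, error_report).
    """
    new_master, errors = [], []
    m = t = 0
    while m < len(old_master) or t < len(transactions):
        m_key = old_master[m][0] if m < len(old_master) else None
        t_key = transactions[t][0] if t < len(transactions) else None
        if t_key is None or (m_key is not None and t_key > m_key):
            # transaction key > master key: copy the old record unchanged
            new_master.append(old_master[m])
            m += 1
        elif m_key is None or t_key < m_key:
            # transaction key < master key: only an add (A) is valid
            key, code, data = transactions[t]
            if code == "A":
                new_master.append((key, data))
            else:
                errors.append((key, code, "no such record"))
            t += 1
        else:
            # keys are equal: revise (R) or delete (D)
            key, code, data = transactions[t]
            if code == "R":
                new_master.append((key, data))
            elif code != "D":
                errors.append((key, code, "duplicate key"))
                new_master.append(old_master[m])
            m += 1
            t += 1
    return new_master, errors

old = [(1, "a"), (3, "c"), (5, "e")]
trans = [(2, "A", "b"), (3, "R", "C"), (5, "D", None), (7, "R", "x")]
new, report = update_master(old, trans)
```

Because both inputs are sorted on the same key, a single pass over each file is enough.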

INDEXED FILES

Mapping in an indexed file

To access a record in a file randomly, you need to know the address of the record.
An index file can relate the key to the record address.

Indexed files

An indexed file is made of a data file, which is a sequential file, and an index.
Index: a small file with only two fields:
The key of the sequential file
The address of the corresponding record on the disk

To access a record in the file:
1. Load the entire index file into main memory.
2. Search the index file to find the desired key.
3. Retrieve the address of the record.
4. Retrieve the data record (using the address).
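
The four access steps can be sketched in Python. This is only an illustration, assuming fixed-length records, an index held as an in-memory dictionary (standing in for the loaded index file), and an `io.BytesIO` buffer standing in for the data file on disk. `RECORD_SIZE`, `build_file`, and `fetch` are hypothetical names.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

def build_file(records):
    """Write fixed-length records; return (data_file, index)."""
    data = io.BytesIO()
    index = {}
    for key, text in records:
        index[key] = data.tell()   # address (byte offset) of this record
        # assumes the text fits in RECORD_SIZE bytes; pad with spaces
        data.write(text.encode("ascii").ljust(RECORD_SIZE))
    return data, index

def fetch(data, index, key):
    address = index[key]           # steps 2 and 3: find the key, get the address
    data.seek(address)             # step 4: jump straight to the record
    return data.read(RECORD_SIZE).decode("ascii").rstrip()

# Step 1 corresponds to having `index` available in main memory.
data, index = build_file([(17, "Ann"), (42, "Bob"), (99, "Cy")])
name = fetch(data, index, 42)
```

No record before the requested one is read; the only cost beyond the single read is the index lookup.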

Inverted file

You can have more than one index, each with a different key.

Inverted file: a file that reorganizes the structure of an existing data file to enable a rapid search to be made for all records having one field falling within set limits.

For example, a file used by an estate agent might store records on each house for sale, using a reference number as the key field for sorting. One field in each record would be the asking price of the house. To speed up the process of drawing up lists of houses falling within certain price ranges, an inverted file might be created in which the records are rearranged according to price. Each record would consist of an asking price, followed by the reference numbers of all the houses offered for sale at this approximate price.
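
The estate agent example can be sketched as follows. The reference numbers and prices are made-up sample data, and `build_inverted` and `refs_in_range` are hypothetical helper names; exact prices are used as keys here, whereas a real inverted file might group approximate prices.

```python
from collections import defaultdict

# (reference number, asking price): invented sample records
houses = [(1001, 250_000), (1002, 180_000), (1003, 250_000), (1004, 320_000)]

def build_inverted(records):
    """Map each asking price to the reference numbers at that price."""
    inverted = defaultdict(list)
    for ref, price in records:
        inverted[price].append(ref)
    return dict(sorted(inverted.items()))   # keep entries ordered by price

def refs_in_range(inverted, low, high):
    """List reference numbers of houses priced within [low, high]."""
    return [ref
            for price, refs in inverted.items() if low <= price <= high
            for ref in refs]

inverted = build_inverted(houses)
mid_range = refs_in_range(inverted, 200_000, 300_000)
```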

Logical view of an indexed file

HASHED FILES

Mapping in a hashed file

A hashed file uses a hash function to map the key to the address.
Eliminates the need for an extra file (index).
There is no need for an index and all of the overhead associated with it.

Hashing methods

Direct Hashing
The key is the address without any algorithmic manipulation.

Modulo Division Hashing (Division remainder hashing)
Divides the key by the file size and uses the remainder plus 1 for the address.

Digit Extraction Hashing
Selected digits are extracted from the key and used as the address.

Direct hashing

The key is the address without any algorithmic manipulation.
The file must contain a record for every possible key.
Advantage: no collisions.
Disadvantage: space is wasted.

Hashing techniques map a large population of possible keys into a small address space.

Modulo division

address = key % list_size + 1

list_size: a prime number produces fewer collisions

A new employee numbering system that will handle 1 million employees.
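
A minimal sketch of the formula above. The list size 307 is an assumed prime chosen only for illustration, and the keys are borrowed from the employee-number examples elsewhere in these slides.

```python
def modulo_division_hash(key, list_size):
    """address = key % list_size + 1, so addresses run from 1 to list_size."""
    return key % list_size + 1

LIST_SIZE = 307  # assumption: a prime list size, not taken from the slides

addresses = [modulo_division_hash(k, LIST_SIZE) for k in (121267, 122801)]
# 121267 % 307 = 2 -> address 3; 122801 % 307 = 1 -> address 2
```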

Digit Extraction Hashing

Selected digits are extracted from the key and used as the address.

For example: digits 1, 3, 4 are extracted.

6-digit employee number    3-digit address
125870                     158
122801                     128
121267                     112
123413                     134
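
The extraction of digits 1, 3, and 4 can be sketched as below. The function name and its parameters are hypothetical; positions are counted from 1 at the leftmost digit, as in the example.

```python
def digit_extraction_hash(key, positions=(1, 3, 4), width=6):
    """Build an address from the selected digit positions of the key."""
    digits = str(key).zfill(width)   # pad so every key has `width` digits
    return int("".join(digits[p - 1] for p in positions))

table = {k: digit_extraction_hash(k) for k in (125870, 122801, 121267, 123413)}
```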

Collision

Because there are many keys for each address in the file, there is a possibility that more than one key will hash to the same address in the file.
Synonyms: the set of keys that hash to the same address.
Collision: a hashing algorithm produces an address for an insertion key, and that address is already occupied.
Prime area: the part of the file that contains all of the home addresses.
Home address: the address produced by the hashing algorithm.

Collision Resolution

With the exception of direct hashing, none of the methods we discussed creates a one-to-one mapping.

Several collision resolution methods:
Open addressing resolution
Linked list resolution
Bucket hashing resolution

Figure 13-11

Open addressing resolution

Resolve collisions in the prime area.
The prime area addresses are searched for an open or unoccupied record where the new data can be placed.
Simplest strategy: the next address (home address + 1).
Disadvantage: each collision resolution increases the possibility of future collisions.
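
The "home address + 1" strategy can be sketched as follows. This assumes the prime area is a Python list with `None` marking unoccupied addresses, uses 0-based addresses for simplicity, and wraps around at the end of the prime area (the wrap-around is an assumption, not stated in the slides). `insert_open_addressing` is a hypothetical name.

```python
def insert_open_addressing(prime_area, key, home):
    """Place key at its home address, or at the next unoccupied address."""
    size = len(prime_area)
    for step in range(size):
        address = (home + step) % size      # home, home + 1, ... wrapping around
        if prime_area[address] is None:     # open (unoccupied) record found
            prime_area[address] = key
            return address
    raise RuntimeError("prime area is full")

area = [None] * 5
placed = [insert_open_addressing(area, k, 3) for k in ("a", "b", "c")]
```

The second and third keys occupy addresses that are home to other keys, which illustrates why each resolution makes later collisions more likely.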

Linked list resolution

The first record is stored in the home address (prime area), but it contains a pointer to the second record (overflow area).
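
A small sketch of this idea, assuming each record carries a `next` pointer and that overflow records are ordinary objects rather than addresses in a separate disk area. `Slot`, `insert_chained`, and `synonyms` are hypothetical names.

```python
class Slot:
    """A record with a pointer to the next synonym."""
    def __init__(self, key):
        self.key = key
        self.next = None        # pointer into the overflow area, if any

def insert_chained(prime_area, key, home):
    if prime_area[home] is None:
        prime_area[home] = Slot(key)    # first record lives in the prime area
        return
    node = prime_area[home]
    while node.next is not None:        # walk to the end of the chain
        node = node.next
    node.next = Slot(key)               # later synonyms go to overflow

def synonyms(prime_area, home):
    """Collect all keys stored under one home address."""
    keys, node = [], prime_area[home]
    while node is not None:
        keys.append(node.key)
        node = node.next
    return keys

area = [None] * 4
for k in ("a", "b", "c"):
    insert_chained(area, k, 2)
```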

Figure 13-13

Bucket hashing resolution

Bucket: a node that can accommodate more than one record.
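
A bucket can be sketched as a small list with a capacity limit. The capacity of 3 and the name `insert_bucket` are assumptions for illustration; what happens when a bucket fills up is left to the caller.

```python
BUCKET_CAPACITY = 3  # assumed number of records per bucket

def insert_bucket(buckets, key, home):
    """Store key in its home bucket; report False if the bucket is full."""
    bucket = buckets[home]
    if len(bucket) >= BUCKET_CAPACITY:
        return False            # full: another resolution method is needed
    bucket.append(key)          # synonyms share the same bucket
    return True

buckets = [[] for _ in range(4)]
results = [insert_bucket(buckets, k, 1) for k in ("a", "b", "c", "d")]
```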

TEXT VERSUS BINARY

Text and binary interpretations of a file

A file stored on a storage device is a sequence of bits that can be interpreted by an application program as a text file or a binary file.

Text vs. Binary

Text files
A file of characters.
Cannot contain integers, floating-point numbers, or any other data structures in their internal memory format.
Encoding system: ASCII or EBCDIC

Binary files
A collection of data stored in the internal format of the computer.
Contain data that are meaningful only if they are properly interpreted by a program.
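
The difference can be seen by storing the same integer both ways. This sketch assumes ASCII for the text form and a 4-byte little-endian integer (`"<i"` in Python's `struct` module) as a stand-in for the computer's internal format; actual internal formats vary by machine.

```python
import struct

value = 1234

as_text = str(value).encode("ascii")       # four character codes: '1' '2' '3' '4'
as_binary = struct.pack("<i", value)       # 4-byte little-endian integer

# The binary bytes are meaningful only when interpreted with the same format.
restored = struct.unpack("<i", as_binary)[0]
```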
