File Processing and Organization
Dr. Nehal Nabil Hassan Mostafa
1-Introduction to File Processing
Data Structures vs. File Structures
►Both involve: Representation of Data + Operations for accessing data
►Difference:
–Data Structures deal with data in main memory
–File Structures deal with data in secondary storage device (File).
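Computer Architecture
Memory hierarchy
Levels, from top to bottom: CPU registers, cache, RAM (main memory), HDD/SSD/CD (secondary storage)
Differences
–Top of the hierarchy: fast, small, expensive, volatile
–Bottom of the hierarchy: slow, large, cheap, stable
Moving down the hierarchy: increase in capacity & access time
Moving up the hierarchy: increase in cost per byte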
On systems with 32-bit addressing, only 2^32 bytes (4 GiB) can be
directly referenced in main memory.
The number of data objects may exceed this number!
Data must be maintained across program executions. This
requires storage devices that retain information when the
computer is restarted.
- We call such storage nonvolatile.
-Primary storage is usually volatile, whereas secondary and
tertiary storage are nonvolatile.
►Typical times for getting info
Volatile (information is lost when power failure occurs)
Stable, persistent (information is preserved longer)
The goal of the Course
Minimize the number of trips to the disk needed to get the desired
information. Ideally, get what we need in one disk access, or with as
few disk accesses as possible.
Good file organization and processing design:
–Reduce the number of disk accesses
–By collecting data into buffers, blocks or buckets
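The idea of collecting data into buffers can be sketched as follows. This is a minimal illustration, not a prescribed method: the helper name countBlockReads and the block sizes are assumptions made for the example, and it counts read requests issued by the program, which only approximates real disk accesses because the stream library and the O/S also buffer.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file block by block and returns how many
// read requests were issued. A larger block means fewer requests.
std::size_t countBlockReads(const std::string& path, std::size_t blockSize) {
    if (blockSize == 0) return 0;  // a zero-byte block could never make progress
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(blockSize);
    std::size_t requests = 0;
    // Count every request that delivered data, including a short final block.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        ++requests;
    }
    return requests;
}
```

For a 1000-byte file, a block size of 1 needs 1000 requests, while a block size of 256 needs only 4.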
5- What about getting info with a single request?
Hash tables (theory developed over the 60’s and 70’s, but
still a research topic):
good when files do not change too much over time.
Expandable, dynamic hashing (late 70’s and 80’s): one or
two disk accesses even if the file grows dramatically.
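A minimal sketch of why hashing can reach a record with a single request: the key itself is turned into a bucket number, and the bucket number into a file offset. The names bucketFor and bucketOffset, and the fixed bucket size, are assumptions for illustration. This shows plain (static) hashing only, not the expandable schemes mentioned above.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: maps a key to one of bucketCount buckets.
// Assumes bucketCount > 0.
std::size_t bucketFor(const std::string& key, std::size_t bucketCount) {
    return std::hash<std::string>{}(key) % bucketCount;
}

// Byte offset of the key's bucket in a file made of fixed-size buckets.
std::size_t bucketOffset(const std::string& key, std::size_t bucketCount,
                         std::size_t bucketSize) {
    return bucketFor(key, bucketCount) * bucketSize;
}
```

The same key always maps to the same bucket, so one seek to bucketOffset(key, ...) followed by one read of bucketSize bytes fetches the bucket that should hold the record.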
2-Fundamentals of File Processing Operations
What is a File? A collection of data placed under
permanent (non-volatile) storage.
Layers, from top to bottom:
File Processing, DBMS
Operating systems
Hardware
The program (application) sends (or receives) bytes to (from) a file through the logical file. The
program knows nothing about where the bytes go (or come from).
The operating system is responsible for associating a logical file in a program with a physical file on
disk or tape. Writing to or reading from a file in a program is done through the operating system.
Note that from the program's point of view, input devices (keyboard) and output
devices (console, printer, etc.) are treated as files: places where bytes come
from or are sent to.
There may be thousands of physical files on a disk, but a program can only have
about 20 logical files open at the same time.
The physical file has a name, for instance myfile.txt
The logical file has a logical name used for referring to the file inside the
program. This logical name is a variable inside the program, for instance
outfile.
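The distinction between the two names can be sketched as follows. The function name writeGreeting is hypothetical; outfile is the logical file (a program variable), and the string passed in is the physical file name.

```cpp
#include <fstream>
#include <string>

// Writes one line to the physical file named by physicalName.
// Returns false if the file could not be opened.
bool writeGreeting(const std::string& physicalName) {
    std::ofstream outfile;        // logical file: a variable inside the program
    outfile.open(physicalName);   // the O/S links it to the physical file
    if (!outfile.is_open()) return false;
    outfile << "hello\n";         // the program only ever refers to outfile
    outfile.close();
    return true;
}
```

Calling writeGreeting("myfile.txt") sends the bytes to myfile.txt, yet the body of the function never mentions where those bytes end up on disk.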
Basic file operations
Opening a file
basically, links a logical file to a physical file.
–On open, the O/S performs a series of operations that
end in the program that is trying to open the file being
assigned a file descriptor.
–Additionally, the O/S will perform particular
operations on the file at the request of the calling
program. These operations are intended to ‘initialize’
the file for use by the program.
►Two options for opening a file:
–Open an existing file
–Create a new file
The mode
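A minimal sketch of how the mode selects between the two options, using the standard C++ stream flags ios::in, ios::out and ios::trunc. The helper names openExisting and createNew are assumptions for illustration.

```cpp
#include <fstream>
#include <string>

// Opening for input only succeeds if the file already exists.
bool openExisting(const std::string& name) {
    std::fstream f(name, std::ios::in);
    return f.is_open();
}

// Opening for output with truncation creates the file if it is missing
// and discards any old contents if it is present.
bool createNew(const std::string& name) {
    std::fstream f(name, std::ios::out | std::ios::trunc);
    return f.is_open();
}
```

With these helpers, openExisting reports false for a missing file, while createNew brings the file into existence so that a later openExisting succeeds.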
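Example
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    char c;
    fstream infile;
    infile.open("account.txt", ios::in);  // link the logical file to the physical file
    if (!infile.is_open()) {              // stop if the file could not be opened
        cerr << "Cannot open account.txt" << endl;
        return 1;
    }
    infile.unsetf(ios::skipws);           // do not skip whitespace on input
    infile >> c;                          // read the first character
    while (!infile.fail()) {              // loop until a read fails (end of file)
        cout << c;                        // echo the character to the console
        infile >> c;                      // read the next character
    }
    infile.close();                       // release the logical file
    return 0;
}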
Basic file operations
Closing a file
Keys
Primary Key
A key that uniquely identifies a record.
Secondary key
Other keys that may be used for search.
►Note that
In general, not every field is a key. Keys correspond to fields, or combinations of
fields, that may be used in a search.
FILE ACCESS METHODS
Search for a record matching a given key
1. Sequential Search
Look at records sequentially until a matching record is found. Time is
in O(n) for n records.
Appropriate for pattern matching and for files with few records.
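Sequential search can be sketched as below. The record layout is an assumption made for the example (one record per line, key first, fields separated by '|'), and sequentialSearch is a hypothetical name.

```cpp
#include <fstream>
#include <string>

// Assumed layout: one record per line, key first, fields separated by '|'.
// Returns the zero-based record number of the first match, or -1 if none.
long sequentialSearch(const std::string& path, const std::string& key) {
    std::ifstream in(path);
    std::string record;
    long position = 0;
    while (std::getline(in, record)) {  // visit the records one after another
        std::string recordKey = record.substr(0, record.find('|'));
        if (recordKey == key) return position;
        ++position;
    }
    return -1;  // every record was examined without a match
}
```

In the worst case (key absent or in the last record) every record is read, which is where the O(n) cost comes from.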
2. Direct Access
We might prefer to jump directly to the location of a target record, then
read its contents. Time is in O(1), regardless of the number of records n.
One example of direct access you will immediately recognize is array
indexing.
Direct access
First, we need fixed-length records, since we need
to know how far to offset from the front of the file
to find the i-th record.
Second, we need some way to convert a record’s
key into an offset location.
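The two requirements above can be sketched as follows, assuming the simplest possible key-to-offset conversion: the key is the record number i, so the offset is i times the record size. The 16-byte record size and the names kRecordSize and readRecord are assumptions for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Assumed record size in bytes; every record occupies exactly this many bytes.
const std::size_t kRecordSize = 16;

// Reads the i-th record (zero-based) with one seek and one read.
// Returns an empty string if the record does not exist.
std::string readRecord(const std::string& path, std::size_t i) {
    std::ifstream in(path, std::ios::binary);
    // Offset from the front of the file: i records of kRecordSize bytes each.
    in.seekg(static_cast<std::streamoff>(i * kRecordSize), std::ios::beg);
    std::string record(kRecordSize, '\0');
    if (!in.read(&record[0], static_cast<std::streamsize>(kRecordSize))) return "";
    return record;
}
```

The cost does not depend on how many records precede record i, which mirrors array indexing. With variable-length records the multiplication would no longer give the right offset.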
Very Slow
Finding Information Fast
If we have a sorted file, we can perform a binary search to locate
information. This is much faster than sequentially looking at each record
(recall that sequential search is O(n), while binary search is O(lg n)).
but....
The file must be sorted, and maintaining this
property is very expensive.
Records must be fixed length, otherwise we cannot
jump directly to the i-th record in the file.
Binary search still requires more than one or two
seeks to find a record, even on moderately sized
files.
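Binary search over a sorted file, together with the seek count that the last point refers to, can be sketched as below. The layout is an assumption: fixed-length records of recordSize bytes, each starting with a keySize-byte key, sorted by key. The name binarySearchFile is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Binary search over a file of fixed-length records sorted by key.
// Assumes recordSize > 0, keySize <= recordSize, and key.size() == keySize.
// Returns the zero-based record number, or -1 if the key is absent.
// If seeks is non-null, it receives the number of records probed.
long binarySearchFile(const std::string& path, const std::string& key,
                      std::size_t recordSize, std::size_t keySize,
                      int* seeks = nullptr) {
    int probes = 0;
    long found = -1;
    std::ifstream in(path, std::ios::binary);
    if (in) {
        in.seekg(0, std::ios::end);
        std::streamoff fileSize = in.tellg();
        long low = 0;
        long high = static_cast<long>(fileSize / static_cast<std::streamoff>(recordSize)) - 1;
        std::string probe(keySize, '\0');
        while (low <= high) {
            long mid = low + (high - low) / 2;
            // Jump directly to record mid and read only its key.
            in.seekg(static_cast<std::streamoff>(mid) *
                     static_cast<std::streamoff>(recordSize), std::ios::beg);
            in.read(&probe[0], static_cast<std::streamsize>(keySize));
            ++probes;
            if (probe == key) { found = mid; break; }
            if (probe < key) low = mid + 1; else high = mid - 1;
        }
    }
    if (seeks) *seeks = probes;
    return found;
}
```

Even for a file of only 16 records, a search can take up to 5 seeks, which illustrates why binary search alone does not reach the goal of one or two disk accesses.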
3. An index: a list of pairs (key, reference), sorted by key
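An index of this kind can be sketched in memory as a sorted vector of pairs. Treating the reference as a byte offset into the data file is an assumption, as are the names IndexEntry, sortIndex and lookup.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One index entry: (key, reference). The reference is assumed here to be
// the byte offset of the record in the data file.
using IndexEntry = std::pair<std::string, long>;

// Sorts the entries by key so the index can be searched with binary search.
void sortIndex(std::vector<IndexEntry>& index) {
    std::sort(index.begin(), index.end());
}

// Returns the reference stored for key, or -1 if the key is not in the index.
long lookup(const std::vector<IndexEntry>& index, const std::string& key) {
    auto it = std::lower_bound(
        index.begin(), index.end(), key,
        [](const IndexEntry& entry, const std::string& k) { return entry.first < k; });
    if (it != index.end() && it->first == key) return it->second;
    return -1;
}
```

Because only the small index has to stay sorted, the data file itself can hold records in any order, and the reference returned by lookup can be used for a single direct access to the record.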