File Processing
and Organization
Dr. Nehal Nabil Hassan Mostafa
References
1- Introduction to File Processing
Data Structures vs. File Structures

►Difference:
–Data Structures deal with data in main memory.
–File Structures deal with data in secondary storage devices (files).
►Both involve:
–Representation of data
+
–Operations for accessing data
Computer Architecture: the memory hierarchy
CPU registers → Cache → RAM (main memory) → HDD, SSD, CD (secondary storage)
–Moving down the hierarchy, capacity and access time increase.
–Moving up the hierarchy, cost per byte increases.
–Top (registers): fast, small, expensive, volatile.
–Bottom (secondary storage): slow, large, cheap, stable.
On systems with 32-bit addressing, only 2^32 bytes (4 GiB) can be
directly referenced in main memory.
The number of data objects may exceed this number!
Data must be maintained across program executions. This
requires storage devices that retain information when the
computer is restarted.
- We call such storage nonvolatile.
- Primary storage is usually volatile, whereas secondary and
tertiary storage are nonvolatile.
How Fast?
►Typical times for getting information:
– Main memory: ~120 nanoseconds = 120 × 10^-9 s
– Magnetic disks: ~30 milliseconds = 30 × 10^-3 s
►An analogy keeping the same time proportion as above:
– Looking at the index of a book: 20 seconds
VS
– Going to the library: 58 days
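A quick check of where the 58 days come from, using only the two access times above (1 day = 86 400 s): a disk access takes about 250 000 times as long as a main-memory access, and the analogy scales 20 seconds by the same factor.

```latex
\frac{30\times10^{-3}\,\mathrm{s}}{120\times10^{-9}\,\mathrm{s}} = 2.5\times10^{5},
\qquad
20\,\mathrm{s}\times 2.5\times10^{5} = 5\times10^{6}\,\mathrm{s}
\approx \frac{5\times10^{6}}{86\,400}\ \text{days} \approx 57.9\ \text{days}
```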
Comparison
Main Memory | Secondary Storage
Fast (since electronic) | Slow (since electronic and mechanical)
Small (since expensive) | Large (since cheap)
Volatile (information is lost when a power failure occurs) | Stable, persistent (information is preserved longer)
The Goal of the Course
Minimize the number of trips to the disk needed to get the desired
information. Ideally, get what we need in one disk access, or with as
few disk accesses as possible.

Group related information so that we are likely to get everything we
need with only one trip to the disk (e.g. name, address, phone number,
account balance).

Locality of Reference in Time and Space

Fast access to great capacity through good file organization and processing:
–Reduce the number of disk accesses by collecting data into buffers,
blocks or buckets
–Manage growth by splitting these collections
History of File Structure Design
1- In the beginning… it was the tape
–Sequential access
–Access cost proportional to size of file
[Analogy to sequential access to array data]
2- Disks became more common
–Direct access
[Analogy to access to a position in an array]
–Indexes were invented
•list of keys and pointers stored in a small file
•allows direct access to a large primary file
Great if the index fits into main memory.
As the file grows, the index grows too, and we have the same problem we had with a
large primary file.
3- Tree structures emerged for main memory (1960s)
–Binary search trees (BSTs)
–Balanced, self-adjusting BSTs: e.g. AVL trees (1963)
4- A tree structure suitable for files was invented:
B-trees (1979) and B+ trees,
good for accessing millions of records with 3 or 4 disk
accesses.
5- What about getting info with a single request?
Hash tables (theory developed over the 60's and 70's, but
still a research topic),
good when files do not change too much over time.
Expandable, dynamic hashing (late 70's and 80's): one or
two disk accesses even if the file grows dramatically.
2- Fundamentals of
File Processing
Operations
What is a File? A collection of data placed on permanent (non-volatile)
storage.

Examples: anything that you can store on a
disk, hard drive, tape, optical media, or any
other medium that doesn't lose the
information when the power is turned off.

Notice that this is only an informal
definition!
Where do File Structures fit in CS?
(layers, from top to bottom)
Applications
File Processing | DBMS
Operating systems
Hardware


Physical File vs Logical File
►Physical file: physically exists on secondary storage; known by the operating system;
appears in its file directory.
►Logical file: what your program actually uses, a 'pipe' through which information can be
extracted or sent.

The program (application) sends (or receives) bytes to (from) a file through the logical file. The
program knows nothing about where the bytes go (or come from).
The operating system is responsible for associating a logical file in a program with a physical file on
disk or tape. Writing to or reading from a file in a program is done through the operating system.
Note that from the program's point of view, input devices (keyboard) and output
devices (console, printer, etc.) are treated as files: places where bytes come
from or are sent to.
There may be thousands of physical files on a disk, but a program typically has only
about 20 logical files open at the same time.
The physical file has a name, for instance myfile.txt
The logical file has a logical name used for referring to the file inside the
program. This logical name is a variable inside the program, for instance
outfile.
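A minimal C++ sketch of the two names, assuming std::ofstream plays the role of the logical file. The helper write_greeting and the text it writes are hypothetical.

```cpp
#include <fstream>
#include <string>

// Writes one line to a physical file through a logical file.
// "outfile" is the logical name used inside the program;
// physical_name (e.g. "myfile.txt") names the physical file known to the OS.
bool write_greeting(const std::string& physical_name) {
    std::ofstream outfile;                  // logical file, not yet linked
    outfile.open(physical_name);            // OS links it to the physical file
    if (!outfile) return false;             // open failed (e.g. no permission)
    outfile << "Hello, file!\n";            // bytes travel through the logical file
    return static_cast<bool>(outfile);      // true if the write succeeded
}                                           // destructor closes the file
```

Calling write_greeting("myfile.txt") would create (or overwrite) myfile.txt, while inside the function the program only ever refers to outfile.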
Basic file operations: Opening a file
►Opening a file basically links a logical file to a physical file.
–On open, the O/S performs a series of operations that end in the program
that is trying to open the file being assigned a file descriptor.
–Additionally, the O/S will perform particular operations on the file at the
request of the calling program; these operations are intended to 'initialize'
the file for use by the program.
►Two options for opening a file:
–Open an existing file
–Create a new file
The mode
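Example
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    char c;
    fstream infile;
    infile.open("account.txt", ios::in);   // link logical file to physical file
    if (infile.fail()) {                   // the file could not be opened
        cerr << "Cannot open account.txt" << endl;
        return 1;
    }
    infile.unsetf(ios::skipws);            // do not skip whitespace: read every character
    infile >> c;
    while (!infile.fail()) {               // stops at end of file (or on an error)
        cout << c;
        infile >> c;
    }
    infile.close();                        // cut the link to the physical file
    return 0;
}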
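The table of open modes did not survive extraction. As an assumption about what it covered, this sketch shows the standard C++ open-mode flags that correspond to the two options above (create a new file, open an existing one) plus appending. The helper names are hypothetical.

```cpp
#include <fstream>
#include <string>

// Create a new file, or empty an existing one: out | trunc.
bool create_file(const std::string& name) {
    std::fstream f(name, std::ios::out | std::ios::trunc);
    return f.is_open();
}

// Open an existing file for reading and writing: in | out.
// With in | out the open fails if the file does not already exist.
bool open_existing(const std::string& name) {
    std::fstream f(name, std::ios::in | std::ios::out);
    return f.is_open();
}

// Open for appending: every write goes to the end of the file
// (the file is created if it does not exist).
bool append_line(const std::string& name, const std::string& line) {
    std::fstream f(name, std::ios::out | std::ios::app);
    if (!f.is_open()) return false;
    f << line << '\n';
    return static_cast<bool>(f);
}
```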
Basic file operations: Closing a file

-Closing cuts the link between the physical and logical files.

–Upon closing, the OS takes care of
'synchronizing' the contents of the file; e.g.
often a buffer is used, and the buffer
content must be written to the file.
In general, files are automatically closed when
the program ends.
Basic file operations: Reading and Writing
– Basic I/O operations.
– Usually require three parameters: a logical
file, an address, and the amount of data that
is to be read or written.
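A sketch of the three parameters using std::ostream::write and std::istream::read: the stream is the logical file, the char pointer is the address, and the count is the amount of data. The helper names are hypothetical.

```cpp
#include <fstream>
#include <string>

// write(): logical file = out, address = data, amount = n bytes.
bool write_bytes(const std::string& name, const char* data, std::streamsize n) {
    std::ofstream out(name, std::ios::binary);
    out.write(data, n);
    return static_cast<bool>(out);
}

// read(): logical file = in, address = buffer, amount = at most n bytes.
// Returns how many bytes were actually read (fewer at end of file).
std::streamsize read_bytes(const std::string& name, char* buffer, std::streamsize n) {
    std::ifstream in(name, std::ios::binary);
    in.read(buffer, n);
    return in.gcount();
}
```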
Additional file operations
►Seeking: requires the source (logical) file and an offset to move to.
►Detecting the end of a file
►Detecting I/O errors
Seeking with C++ Stream Classes

An fstream has 2 file pointers: a get pointer (for input) and a put pointer (for output).

file1.seekg(byte_offset, origin); // moves get pointer
file1.seekp(byte_offset, origin); // moves put pointer
origin can be ios::beg (beginning of file)
ios::cur (current position)
ios::end (end of file)
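A small sketch of seekg and seekp with different origins, assuming binary-mode streams so that offsets count raw bytes. The helper names are hypothetical.

```cpp
#include <fstream>
#include <string>

// File size: move the get pointer to offset 0 from ios::end, then ask where it is.
long long file_size(const std::string& name) {
    std::ifstream in(name, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);
    return static_cast<long long>(in.tellg());
}

// Overwrite one byte: move the put pointer 'offset' bytes from ios::beg.
bool patch_byte(const std::string& name, long long offset, char value) {
    std::fstream f(name, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;
    f.seekp(offset, std::ios::beg);
    f.put(value);
    return static_cast<bool>(f);
}

// Read the byte 'back' positions before the end (negative offset from ios::end).
int byte_from_end(const std::string& name, long long back) {
    std::ifstream in(name, std::ios::binary);
    in.seekg(-back, std::ios::end);
    return in.get();   // EOF (-1) if the file is missing or the seek failed
}
```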
3- Managing Files
of Records
File types
A file can be treated as
–a stream of bytes
–a collection of records with fields
Stream

Every stream has an associated file position.

When we open a file, the file position is set to the beginning.
The first fread() will read 8 into c and increment the file position.
The 38th fread() will read the newline character (referred to as
‘\n’ in C/C++) into c and increment the file position.
The 39th fread() will read 0 into c and increment the file
position, and so on.
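The example file behind these fread() counts is not shown here. A sketch, assuming any small file opened in binary mode, of how each one-byte fread() moves the file position forward; nth_fread is a hypothetical helper.

```cpp
#include <cstdio>

// Returns the byte delivered by the n-th call to fread() (n counts from 1),
// or -1 if the file ends first. Every successful 1-byte fread() advances the
// file position by one, so after the i-th call ftell() would report i.
int nth_fread(const char* name, int n) {
    std::FILE* fp = std::fopen(name, "rb");   // file position starts at 0
    if (!fp) return -1;
    unsigned char c = 0;
    int result = -1;
    for (int i = 1; i <= n; ++i) {
        if (std::fread(&c, 1, 1, fp) != 1) break;   // end of file (or error)
        if (i == n) result = c;
    }
    std::fclose(fp);
    return result;
}
```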
Field and Record Organization
►Field: a data value; the smallest unit of data with logical meaning
►Record: a group of fields that forms a logical unit
►Key: a subset of the fields in a record used to uniquely identify the
record
In our example, “example.txt” contains information about books:
Each line of the file is a record.
Fields in each record:
-ISBN Number,
-Author Name,
-Book Title
In order to manage fields in a file, we need to include
information to identify where one field ends and the next one
begins.
In this case, you might rely on capital letters to mark where a new field
begins, but that would not work for names like O’Leary or
MacAllen. There are four common methods to delimit fields in
a file.
Methods for Organizing Fields
►Fix the length of each field
►Begin each field with its length indicator
►Use delimiters to separate fields
►Use a “keyword=value” pair to identify each field and its content
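One record encoded with each of the four methods. The field width, the two-digit length prefix, the '|' delimiter and the keyword names are illustrative choices for this sketch, not fixed conventions.

```cpp
#include <string>
#include <vector>

// 1. Fixed length: pad (or cut) every field to the same width.
std::string fixed_length(const std::vector<std::string>& fields, std::size_t width) {
    std::string out;
    for (const auto& f : fields) {
        std::string s = f.substr(0, width);
        s.resize(width, ' ');               // pad with blanks
        out += s;
    }
    return out;
}

// 2. Length indicator: a two-digit length before each field (assumes < 100 chars).
std::string length_prefixed(const std::vector<std::string>& fields) {
    std::string out;
    for (const auto& f : fields) {
        std::string len = std::to_string(f.size());
        if (len.size() < 2) len = "0" + len;
        out += len + f;
    }
    return out;
}

// 3. Delimiters: a character that never appears inside a field.
std::string delimited(const std::vector<std::string>& fields, char delim = '|') {
    std::string out;
    for (const auto& f : fields) out += f + delim;
    return out;
}

// 4. keyword=value pairs, each followed by a delimiter.
std::string keyword_value(const std::vector<std::string>& keys,
                          const std::vector<std::string>& fields) {
    std::string out;
    for (std::size_t i = 0; i < keys.size() && i < fields.size(); ++i)
        out += keys[i] + "=" + fields[i] + "|";
    return out;
}
```

For the fields {"Ames", "Mary"} these give "Ames  Mary  " (width 6), "04Ames04Mary", "Ames|Mary|" and "last=Ames|first=Mary|".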
Identifying Records
Once records are placed in a file, a related question arises: when
we're searching for a target record, how can we identify it? That
is, how do we distinguish the record we want from the other records in
the file?
Primary keys are used to uniquely identify a record in a file. However, a key
taken from the record's own data (such as a name) cannot guarantee uniqueness,
and changing it can cause costly system updates. A better approach is to
generate a non-data field for each record, like a student ID.
Primary and Secondary Keys

►Primary key: a key that uniquely identifies a record.
►Secondary key: other keys that may be used for search.

►Note that in general not every field is a key. Keys correspond to fields, or combinations of
fields, that may be used in a search.
FILE ACCESS METHODS
Search for a record matching a given key
1. Sequential Search
Look at records sequentially until the matching record is found. Time is
O(n) for n records.
Appropriate for pattern matching and for files with few records.
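A sketch of sequential search, assuming a hypothetical layout of one record per line, fields separated by '|', and the key in the first field.

```cpp
#include <fstream>
#include <string>

// Read records from the start of the file until one whose key (the first
// '|'-delimited field) matches. In the worst case all n records are read: O(n).
// Returns the whole record, or an empty string if no record matches.
std::string sequential_search(const std::string& name, const std::string& key) {
    std::ifstream in(name);
    std::string record;
    while (std::getline(in, record)) {
        std::string record_key = record.substr(0, record.find('|'));
        if (record_key == key) return record;
    }
    return "";
}
```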
FILE ACCESS METHODS
Search for a record matching a given key
2. Direct Access
We might prefer to jump directly to the location of a target record, then
read its contents. Time is O(1), independent of the number of records n.
One example of direct access you will immediately recognize is array
indexing.
Direct access requirements
First, we need fixed-length records, since we need
to know how far to offset from the front of the file
to find the i-th record.
Second, we need some way to convert a record’s
key into an offset location.
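A sketch of direct access with fixed-length records, assuming 16-byte records and that the key has already been converted into a relative record number (RRN) counting from 0. RECORD_SIZE and read_record are hypothetical.

```cpp
#include <fstream>
#include <string>

const int RECORD_SIZE = 16;   // assumed fixed record length in bytes

// The record with relative record number rrn starts at byte rrn * RECORD_SIZE,
// so a single seek reaches it no matter how many records the file holds.
std::string read_record(const std::string& name, long long rrn) {
    std::ifstream in(name, std::ios::binary);
    in.seekg(rrn * RECORD_SIZE, std::ios::beg);
    std::string record(RECORD_SIZE, '\0');
    in.read(&record[0], RECORD_SIZE);
    if (in.gcount() != RECORD_SIZE) return "";   // past the end of the file
    return record;
}
```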
Finding Information
If we have a sorted file, we can perform a binary search
to locate information; this is much faster than sequentially
looking at each record! (Recall that sequential search is O(n),
while binary search is O(log n).)
But...
The file must be sorted, and maintaining this
property is very expensive.
Records must be fixed length; otherwise we cannot
jump directly to the i-th record in the file.
Binary search still requires more than one or two
seeks to find a record, even on moderately sized
files.
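A sketch of binary search performed directly on the file, assuming fixed-length 12-byte records sorted by a 4-character key at the start of each record (both sizes are hypothetical). Each probe costs one seek and one read, which is why even a moderately sized file needs several seeks.

```cpp
#include <fstream>
#include <string>

const int REC_LEN = 12;   // assumed record length in bytes
const int KEY_LEN = 4;    // assumed key length; the key starts each record

// Returns the relative record number of the record whose key equals 'key'
// (which must be KEY_LEN characters long), or -1 if there is none.
long long binary_search_file(const std::string& name, const std::string& key) {
    std::ifstream in(name, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);
    long long n = static_cast<long long>(in.tellg()) / REC_LEN;  // record count
    long long low = 0, high = n - 1;
    std::string probe(KEY_LEN, '\0');
    while (low <= high) {
        long long mid = low + (high - low) / 2;
        in.seekg(mid * REC_LEN, std::ios::beg);   // one seek per probe
        in.read(&probe[0], KEY_LEN);
        if (probe == key) return mid;
        if (probe < key) low = mid + 1;
        else high = mid - 1;
    }
    return -1;
}
```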
3. An index: a list of pairs (key, reference), sorted by key

It provides an efficient way to access all the data blocks or records
within a large file without having to search the entire file for the
data. An index:
►Allows direct, fast access to files
►Eliminates the need to reorganize or sort the file (files can be entry-sequenced)
►Provides direct access for files with variable-length records
►Provides multiple access paths to the file
►Imposes an order on a file without rearranging the file
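A sketch of a simple in-memory index over an entry-sequenced file of variable-length records, assuming one '|'-delimited record per line with the key in the first field and '\n' line endings. The Index type and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// (key, byte offset) pairs sorted by key; the data file itself stays unsorted.
using Index = std::vector<std::pair<std::string, long long>>;

// Scan the data file once, recording where each record starts.
Index build_index(const std::string& data_file) {
    Index index;
    std::ifstream in(data_file, std::ios::binary);
    std::string line;
    long long offset = 0;
    while (std::getline(in, line)) {
        index.emplace_back(line.substr(0, line.find('|')), offset);
        offset += static_cast<long long>(line.size()) + 1;   // +1 for '\n'
    }
    std::sort(index.begin(), index.end());
    return index;
}

// Binary search the index in memory, then a single seek into the data file.
std::string find_record(const std::string& data_file, const Index& index,
                        const std::string& key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const std::pair<std::string, long long>& e, const std::string& k) {
            return e.first < k;
        });
    if (it == index.end() || it->first != key) return "";
    std::ifstream in(data_file, std::ios::binary);
    in.seekg(it->second, std::ios::beg);
    std::string record;
    std::getline(in, record);
    return record;
}
```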
Index of a File of Books
Primary Index
►Contains a primary key in canonical form,
and a pointer to a record in the file
►Each entry in the primary index uniquely identifies
a record in the file
►Designed to support binary search on the
primary key
Basic Operations on Indexes
►Index creation
►Index loading
►Updating of index files
►Record additions / deletions / updates
Use of Multiple Indexes
►Provides multiple views of a data file
►Allows us to search for particular values within fields
that are not primary keys
►Allows us to search using combinations of secondary
/ primary keys
►Each entry in a secondary index contains a key value
and a primary key (or list of primary keys).
Secondary Key
►Does not identify records uniquely
►It is not dataless
►Has a canonical form (i.e. there are
restrictions on the values that the key must
take)
Secondary Index Structure
►List of secondary keys, sorted first by value of the
secondary key, and then by the value of the primary key
►Updates to the file must now be applied on the
secondary indexes as well.
►The fact that we store primary keys instead of pointers
into the file minimizes the impact of file updates on the
secondary index.
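A sketch of a secondary index kept as a sorted vector of (secondary key, primary key) pairs, e.g. (author, ISBN). Keeping it in memory is an assumption made for brevity, and the names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sorted first by secondary key, then by primary key (std::pair's ordering).
// It stores primary keys, not byte offsets into the data file.
using SecondaryIndex = std::vector<std::pair<std::string, std::string>>;

// Insert one entry, keeping the index sorted.
void add_entry(SecondaryIndex& idx, const std::string& sec, const std::string& pk) {
    std::pair<std::string, std::string> entry(sec, pk);
    idx.insert(std::upper_bound(idx.begin(), idx.end(), entry), entry);
}

// Return every primary key stored for one secondary key value.
// The caller then looks each one up in the primary index.
std::vector<std::string> lookup(const SecondaryIndex& idx, const std::string& sec) {
    std::vector<std::string> pks;
    // "" sorts before any primary key, so this finds the first row for sec
    auto it = std::lower_bound(idx.begin(), idx.end(),
                               std::make_pair(sec, std::string()));
    for (; it != idx.end() && it->first == sec; ++it) pks.push_back(it->second);
    return pks;
}
```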
Deletion of a Record
►Change only the data file and the primary index.

►A later search by secondary key finds the primary key in the
secondary index, then searches for that primary key in the primary index
---> record-not-found
►This saves us from reading wrong data.
Update a Record
►Change secondary key: rearrange the secondary index
►Change primary key: rearrange the primary index;
rewrite the reference fields of the secondary index (no
rearrangement)
►Change other fields: no effect on the secondary index
Improving Secondary Indexes
►We can store several primary keys per row in the
secondary index
–This, however, wastes space for secondary keys with few references, and a fixed
number of slots is not sufficient for secondary keys with many references.
►We can store a pointer to a linked list of primary keys
–We want these lists to be stored in a file, and to be easy
to manage; hence, the inverted list
Inverted Lists
►Solve the problems associated with the
variability in the number of references a
secondary key can have
►Greatly reduce the need to reorganize /
sort the secondary index
►Store primary keys in the order they are
entered; they do not need to be sorted
►The downside is that references for one
secondary key are spread across the
inverted list
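An in-memory sketch of an inverted list: the secondary index keeps one row per distinct secondary key, and each row points to the head of a linked list of primary keys stored in a separate, entry-sequenced list. Linking each new node in at the head of its list is one common choice, assumed here; on disk both vectors would be files, and the names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// -1 plays the role of a nil pointer.
struct IndexRow { std::string secondary_key; int head; };  // secondary index row
struct ListNode { std::string primary_key; int next; };    // inverted list row

struct InvertedIndex {
    std::vector<IndexRow> index;  // one row per distinct secondary key, sorted
    std::vector<ListNode> list;   // appended in entry order, never sorted

    // Position of sec in the sorted index (or where it would be inserted).
    std::vector<IndexRow>::iterator find_row(const std::string& sec) {
        return std::lower_bound(index.begin(), index.end(), sec,
            [](const IndexRow& r, const std::string& k) { return r.secondary_key < k; });
    }

    void add(const std::string& sec, const std::string& pk) {
        auto it = find_row(sec);
        bool found = it != index.end() && it->secondary_key == sec;
        list.push_back({pk, found ? it->head : -1});     // new node points to old head
        int new_head = static_cast<int>(list.size()) - 1;
        if (found) it->head = new_head;                  // row updated in place
        else index.insert(it, IndexRow{sec, new_head});  // only new keys change the index order
    }

    // Follow the linked list for sec; the most recently added key comes first.
    std::vector<std::string> lookup(const std::string& sec) {
        std::vector<std::string> pks;
        auto it = find_row(sec);
        if (it == index.end() || it->secondary_key != sec) return pks;
        for (int n = it->head; n != -1; n = list[n].next) pks.push_back(list[n].primary_key);
        return pks;
    }
};
```

Adding a primary key for an existing secondary key only appends one list row and rewrites one pointer, so the secondary index is not re-sorted.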
Some Notes
►Even though it is preferred to store lists of primary
keys, under certain circumstances it could be better to store
pointers into the file:
-When access speed is critical
-When the file is static (does not suffer updates, or
updates are very rare)
►Consider also that there is a safety issue related to having to
propagate updates to the file to several indexes: the updating
algorithm must be robust to different types of failure.
