Ch05

Chapter 5 of 'Operating Systems: Design and Implementation' discusses file systems, emphasizing the need for long-term information storage that allows multiple processes to access data concurrently. It covers file structure, types, attributes, operations, and directory management, highlighting the importance of organizing files efficiently to prevent conflicts and ensure reliability. Additionally, it addresses disk space management strategies and the implementation of file systems, including linked list allocation and i-nodes.


OPERATING SYSTEMS

DESIGN AND IMPLEMENTATION


Third Edition
ANDREW S. TANENBAUM
ALBERT S. WOODHULL

Chapter 5
File Systems
Storing/Retrieving Information
• All computer applications need to store and retrieve information.

• While a process is running, it can store a limited amount of
information within its own address space.

• First problem: the storage capacity is restricted to the size of the
virtual address space. For some applications this size is adequate,
but for others, such as airline reservations, banking, or corporate
record keeping, it is far too small.
Storing/Retrieving Information
• A second problem with keeping information within a process'
address space is that when the process terminates, the information is
lost. For many applications, the information must be retained for
weeks, months, or even forever. Furthermore, it must not go away
when a computer crash kills the process.

• A third problem is that it is frequently necessary for multiple
processes to access (parts of) the information at the same time. If we
have an online telephone directory stored inside the address space of
a single process, only that process can access it.

• The way to solve these problems is to make the information itself
independent of any one process.
Storing/Retrieving Information
• Essential requirements for long-term information storage:
1. It must be possible to store a very large amount of information.
2. The information must survive the termination of the process using it.
3. Multiple processes must be able to access the information
concurrently.

• Files are managed by the OS.

• How they are structured, named, accessed, used, protected,
and implemented are major topics in OS design.
• The part of the OS dealing with files is known as the file
system.
File Naming

Figure 5-1. Some typical file extensions.


File Structure

Figure 5-2. Three kinds of files.
(a) Byte sequence. (b) Record sequence. (c) Tree.
...
File Structure
• Files can be structured in any one of several ways.
• Three common possibilities are depicted in Figure 5-2.
• The file in (a) is just an unstructured sequence of bytes. In
effect, the OS does not know or care what is in the file. All it
sees are bytes. Any meaning must be imposed by user-level
programs. Both UNIX and Windows 98 use this approach.
• In (b) a file is a sequence of fixed-length records, each with
some internal structure. The idea is that the read operation returns
one record and the write operation overwrites or appends one
record. No current general-purpose system works this way.
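On a byte-sequence system such as UNIX, the record model of (b) can still be imposed by a user-level program, which is exactly the point made for (a). The sketch below is a minimal illustration in Python, whose `os` module exposes the POSIX calls; `RECORD_SIZE`, `write_record`, and `read_record` are hypothetical names, and because records are padded with NUL bytes it assumes records never end in a NUL byte.

```python
import os

RECORD_SIZE = 16  # hypothetical fixed record length, in bytes


def write_record(fd: int, index: int, data: bytes) -> None:
    """Overwrite (or append) record number `index` in a plain byte-stream file."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record too long")
    # Pad to the fixed length so record i always starts at i * RECORD_SIZE.
    os.pwrite(fd, data.ljust(RECORD_SIZE, b"\0"), index * RECORD_SIZE)


def read_record(fd: int, index: int) -> bytes:
    """Return record number `index`; the OS itself only ever sees bytes."""
    raw = os.pread(fd, RECORD_SIZE, index * RECORD_SIZE)
    # Strip the padding (assumes real records never end in NUL bytes).
    return raw.rstrip(b"\0")
```

All of the "record" structure lives in the arithmetic `index * RECORD_SIZE`; the kernel just reads and writes byte ranges.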
File Structure
• In organization (c), a file consists of a tree of records, not
necessarily all the same length, each containing a key field in a fixed
position in the record.
• The tree is sorted on the key field, to allow rapid searching for a
particular key.
• The basic operation here is not to get the "next" record, although
that is also possible, but to get the record with a specific key.
• Furthermore, new records can be added to the file, with the OS, and
not the user, deciding where to place them.
• This type of file is widely used on the large mainframe computers
still used in some commercial data processing.
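Organization (c) can be modeled in memory with a binary search tree keyed on the record key. This is only a toy sketch: `KeyedFile` is a hypothetical class, the tree is unbalanced, and a real mainframe file system would keep a balanced, multiway tree on disk.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    key: str
    record: bytes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class KeyedFile:
    """Toy model of a file organized as a tree of records sorted on a key."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: str, record: bytes) -> None:
        # The "file system", not the caller, decides where the record goes.
        if self.root is None:
            self.root = Node(key, record)
            return
        node = self.root
        while True:
            if key == node.key:
                node.record = record  # same key: replace the record
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, record))
                return
            node = child

    def lookup(self, key: str) -> Optional[bytes]:
        # The basic operation: fetch the record with a given key.
        node = self.root
        while node is not None:
            if key == node.key:
                return node.record
            node = node.left if key < node.key else node.right
        return None

    def in_order(self) -> Iterator[str]:
        """Getting the "next" record is still possible: walk keys in order."""
        stack, node = [], self.root
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right
```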
File Types

Figure 5-3. (a) An executable file. (b) An archive.


File Types
• Many operating systems support several types of files. UNIX and Windows, for example,
have regular files and directories. UNIX also has character and block special files.

• Regular files are the ones that contain user information. They are either
ASCII files, which consist of lines of text (the lines need not all be the same
length) and can be edited with text editors, or binary files, which just means
that they are not ASCII files. Binary files usually have some internal structure
known to the programs that use them.

• Directories are system files for maintaining the structure of the file system.

• Character special files are related to input/output and used to model serial I/O
devices such as terminals, printers, and networks.

• Block special files are used to model disks.
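On UNIX the file type is stored in the mode bits returned by `stat`/`lstat`, and Python's standard `stat` module provides the usual `S_ISREG`, `S_ISDIR`, `S_ISCHR`, and `S_ISBLK` tests. A minimal sketch (the `file_type` helper is our own name):

```python
import os
import stat


def file_type(path: str) -> str:
    """Classify a path the way UNIX does, using the mode bits from lstat()."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        return "regular"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    return "other"  # e.g. symbolic links, FIFOs, sockets
```

Note that the OS does not distinguish ASCII from binary regular files here; that interpretation is left to the programs that read them.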


File Access
• There are two types of file access:

• Sequential access: a process could read all the bytes or records
in a file in order, starting at the beginning, but could not skip
around and read them out of order. Sequential files were
convenient when the storage medium was magnetic tape rather
than disk.

• Random access: files whose bytes or records can be read in
any order. When disks came into use for storing files, it became
possible to read the bytes or records of a file out of order, or to
access records by key rather than by position. Random access
files are essential for many applications.
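The difference between the two access modes shows up directly in the system calls: sequential access just calls `read` repeatedly, while random access first moves the file position with `lseek`. A minimal sketch using Python's `os` wrappers (the helper names are hypothetical):

```python
import os


def read_sequential(path: str, chunk: int = 4096) -> bytes:
    """Sequential access: read from the beginning, in order, until EOF."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while True:
            data = os.read(fd, chunk)
            if not data:  # an empty read means end of file
                break
            parts.append(data)
        return b"".join(parts)
    finally:
        os.close(fd)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Random access: jump straight to `offset`, skipping what comes before."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)
```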
File Attributes

Figure 5-4. Some possible file attributes.



File Attributes
• Every file has a name and its data.

• In addition, all operating systems associate other information with each file,
for example, the date and time the file was created and the file's size. We
will call these extra items the file's attributes, although some people
call them metadata.

• The list of attributes varies considerably from system to system.

• No existing system has all of the attributes in Figure 5-4, but each is
present in some system.
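On POSIX systems, many of these attributes come back from a single `stat` call. A small sketch (the `attributes` helper and the chosen keys are ours); POSIX `stat` does not portably report a creation time, so the last-modification time is shown instead:

```python
import os
import stat
import time


def attributes(path: str) -> dict:
    """Collect a few common attributes (metadata) the OS keeps for a file."""
    st = os.stat(path)
    return {
        "size": st.st_size,                 # current size in bytes
        "owner_uid": st.st_uid,             # numeric owner id
        "mode": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "modified": time.ctime(st.st_mtime),  # time of last modification
        "links": st.st_nlink,               # number of directory entries
    }
```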
File Operations
• Files exist to store information and allow it to be retrieved later.
•Different systems provide different operations to allow storage and
retrieval.
• The most common system calls relating to files are:

1. Create
2. Delete
3. Open
4. Close
5. Read
6. Write
7. Append
8. Seek (for random access)
9. Get attributes
10. Set attributes
11. Rename
12. Lock
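Most of these calls map directly onto POSIX functions, which Python's `os` module wraps almost one-to-one. The sketch below walks through them on a scratch file; the function name, the file names, and the numbering in the comments are ours, and `flock` gives only advisory locking on UNIX-like systems:

```python
import fcntl
import os
from typing import Tuple


def demo_file_operations(directory: str) -> Tuple[bytes, int]:
    """Exercise the common file system calls on a scratch file."""
    path = os.path.join(directory, "notes.txt")
    # 1. Create and 3. Open in one call, then 6. Write and 4. Close.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    os.write(fd, b"hello")
    os.close(fd)

    # 7. Append: O_APPEND makes every write go to the current end of file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    os.write(fd, b", world")
    os.close(fd)

    # 10. Set attributes, then 9. Get attributes.
    os.chmod(path, 0o600)
    mode = os.stat(path).st_mode & 0o777

    # 11. Rename.
    new_path = os.path.join(directory, "greeting.txt")
    os.rename(path, new_path)

    fd = os.open(new_path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)  # 12. Lock (advisory, shared)
        os.lseek(fd, 7, os.SEEK_SET)    # 8. Seek past "hello, "
        data = os.read(fd, 5)           # 5. Read
        fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)

    os.unlink(new_path)                 # 2. Delete
    return data, mode
```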
Directories

Figure 5-5. (a) Attributes in the directory entry.
(b) Attributes elsewhere.
Directories
• To keep track of files, file systems normally have directories or folders, which, in
many systems, are themselves files.
• A directory typically contains a number of entries, one per file.
• One possibility is shown in Figure 5-5(a), in which each entry contains the file name,
the file attributes, and the disk addresses where the data are stored.
• Another possibility is shown in Figure 5-5(b): here a directory entry holds the file
name and a pointer to another data structure where the attributes and disk addresses
are found.
• Both of these approaches are commonly used.
• The simplest directory system has a single directory containing all the files of all
users, as in Figure 5-6(a).
• The problem with having only one directory in a system with multiple users is that
different users may accidentally use the same names for their files.
• Consequently, this scheme is no longer used on multiuser systems, but it could be
used on a small embedded system, for example, a handheld PDA (personal digital
assistant) or a cellular telephone.
Hierarchical Directory Systems
• To avoid conflicts caused by different users choosing the same file names, the next
step up is giving each user a private directory.
• Names chosen by one user do not interfere with names chosen by a
different user, and there is no problem caused by the same name
occurring in two or more directories.
• This design leads to the system of Figure 5-6(b). This design could be used, for
example, on a multiuser computer or on a simple network of personal
computers that shared a common file server over a local area network.
• Implicit in this design is that when a user tries to open a file, the OS
must know which user it is in order to know which directory to search,
so some kind of login procedure is needed.
• In its simplest form, users can only access files in their own directories.
Hierarchical Directory Systems

• But another problem is that users with many files may want to
group them in smaller subgroups; for instance, a professor might
want to separate handouts for a class from drafts of chapters of a
new textbook.
• What is needed is a general hierarchy (i.e., a tree of directories).
With this approach, each user can have as many directories as are
needed so that files can be grouped together in natural ways. This
approach is shown in Figure 5-6(c).
Hierarchical Directory Systems

Figure 5-6. Three file system designs.
(a) Single directory shared by all users.
(b) One directory per user.
(c) Arbitrary tree per user.
Path Names

Figure 5-7. A UNIX directory tree.


Path Names
• When files are organized in a directory tree, some way is needed for specifying
file names.

• Two different methods are commonly used.

• Absolute path name: the path from the root directory to the file.
Absolute path names always start at the root directory and are unique.
In UNIX the components of the path are separated by /. In Windows the
separator is \.

• Relative path name: used in conjunction with the concept of the
working directory (also called the current directory). All path names
not beginning at the root directory are taken relative to the working
directory. For example, if the working directory is /usr/ast, the file
whose absolute path is /usr/ast/mailbox can be referenced simply as
mailbox.
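Resolving a path name can be sketched as a purely textual operation on its components. This hypothetical `resolve` helper starts from the root for absolute names and from the working directory for relative ones, handling `.` and `..`; it ignores symbolic links and never touches the disk, unlike a real kernel lookup:

```python
def resolve(path: str, cwd: str) -> str:
    """Turn a UNIX path name into an absolute one (lexically).

    Absolute names start at the root '/'; relative names start at `cwd`,
    the working directory. '.' means this directory and '..' its parent.
    """
    parts = [] if path.startswith("/") else [c for c in cwd.split("/") if c]
    for comp in path.split("/"):
        if comp in ("", "."):
            continue          # empty (from '//' or a leading '/') or current dir
        if comp == "..":
            if parts:
                parts.pop()   # up one level; '..' at the root stays at the root
        else:
            parts.append(comp)
    return "/" + "/".join(parts)
```

For example, `resolve("mailbox", "/usr/ast")` and `resolve("/usr/ast/mailbox", "/usr/jim")` both name `/usr/ast/mailbox`.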
Directory Operations
The system calls for managing directories exhibit more variation
than system calls for files.
Some sample directory system calls (taken from UNIX) are:

1. Create (creates a directory that is empty except for . and ..)
2. Delete (only an empty directory can be deleted)
3. Opendir (opens a directory so that it can be read)
4. Closedir (closes a directory after it has been read)
5. Readdir (returns the next entry in an open directory)
6. Rename
7. Link (a technique that allows a file to appear in more than one
directory)
8. Unlink (removes a directory entry)
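The directory calls can be exercised from Python, where `os.mkdir`, `os.rename`, `os.unlink`, and `os.rmdir` wrap the corresponding POSIX calls and `os.scandir` roughly plays the role of Opendir/Readdir/Closedir (it does not report the `.` and `..` entries). A sketch with hypothetical directory and file names:

```python
import os
from typing import List


def directory_walkthrough(base: str) -> List[str]:
    """Exercise Create, Opendir/Readdir/Closedir, Rename, Unlink and Delete."""
    d = os.path.join(base, "handouts")
    os.mkdir(d)                                    # Create
    for name in ("week1.txt", "week2.txt"):
        open(os.path.join(d, name), "w").close()

    names = []
    with os.scandir(d) as it:                      # Opendir
        for entry in it:                           # Readdir, one entry per step
            names.append(entry.name)
    # Closedir happens automatically when the with-block exits.

    renamed = os.path.join(base, "course")
    os.rename(d, renamed)                          # Rename
    for name in names:
        os.unlink(os.path.join(renamed, name))     # Unlink each entry
    os.rmdir(renamed)                              # Delete: only when empty
    return sorted(names)
```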
File System Implementation: File System Layout

Figure 5-8. A possible file system layout.


File System Implementation: File System Layout
• Users are concerned with how files are named, what operations are allowed
on them, what the directory tree looks like, and similar interface issues.

• Implementers are interested in how files and directories are stored, how
disk space is managed, and how to make everything work efficiently and
reliably.

• File systems are usually stored on disks. Most disks can be divided up
into partitions, with independent file systems on each partition.

• Sector 0 of the disk is called the MBR (Master Boot Record) and is used
to boot the computer.

• The end of the MBR contains the partition table. This table gives the
starting and ending addresses of each partition. One of the partitions in the
table may be marked as active.
File System Implementation: File System Layout
• When the computer is booted, the BIOS reads in and executes the code in
the MBR. The first thing the MBR program does is locate the active
partition, read in its first block, called the boot block, and execute it.

•The program in the boot block loads the OS contained in that partition.

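• Other than starting with a boot block, the layout of a partition varies
from one file system to another. Figure 5-8 shows items that are often
present. The first one is the superblock. It contains all the key parameters
about the file system and is read into memory when the computer is booted
or the file system is first touched.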

•Next might come information about free blocks in the file system. This
might be followed by the i-nodes, an array of data structures, one per file,
telling all about the file and where its blocks are located.

• After that might come the root directory, which contains the top of the
file system tree. Finally, the remainder of the disk typically contains all the
other directories and files.
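On classic PC (BIOS) disks the partition table has a simple fixed layout: four 16-byte entries starting at byte 446 of sector 0, followed by the signature bytes 0x55 0xAA at offsets 510 and 511. Each entry holds a status byte (0x80 marks the active partition), a partition type, and the starting sector and size in sectors as little-endian 32-bit numbers; it also holds legacy cylinder/head/sector addresses, which this sketch skips. This is an illustration in Python with our own names; real MBR boot code is a few hundred bytes of x86 machine code, and GPT-partitioned disks are laid out differently.

```python
import struct
from typing import List, NamedTuple, Optional


class Partition(NamedTuple):
    active: bool
    ptype: int
    first_sector: int
    num_sectors: int


def parse_mbr(sector0: bytes) -> List[Partition]:
    """Decode the partition table at the end of a classic MBR (sector 0)."""
    if len(sector0) != 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR")
    parts = []
    for i in range(4):  # four 16-byte entries starting at offset 446
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        # status, skip 3 CHS bytes, type, skip 3 CHS bytes, LBA start, count
        status, ptype, lba, count = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 marks an unused slot
            parts.append(Partition(status == 0x80, ptype, lba, count))
    return parts


def active_partition(sector0: bytes) -> Optional[Partition]:
    """What the MBR program does first: find the partition marked active."""
    for p in parse_mbr(sector0):
        if p.active:
            return p
    return None
```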
Implementing Files: Linked List Allocation

Figure 5-9. Storing a file as a linked list of disk blocks.


Implementing Files: Linked List Allocation

• One method for storing files is to keep each one as a
linked list of disk blocks, as in Figure 5-9.

• The first word of each block is used as a pointer to the next one.
The rest of the block is for data.

• Every disk block can be used in this method. No space is lost to
disk fragmentation (except for internal fragmentation in the last
block of each file).

• Also, it is sufficient for the directory entry to store the disk address
of the first block. The rest can be found starting there.

• Reading a file sequentially is straightforward, but random access is
extremely slow: to get to block n, the OS has to start at the beginning
and read the n - 1 blocks before it, one at a time.
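The cost of random access is easy to see in a small simulation. Here the "disk" is a hypothetical dictionary mapping a block number to a pair (next block, data), with -1 marking the end of the chain; the block numbers are chosen for illustration:

```python
from typing import Dict, Tuple

END = -1  # end-of-chain marker (not a valid block number)

# Hypothetical disk: block number -> (pointer to next block, data bytes).
Disk = Dict[int, Tuple[int, bytes]]


def read_file(disk: Disk, first: int) -> bytes:
    """Sequential read: start at the block in the directory entry, follow pointers."""
    out, block = [], first
    while block != END:
        nxt, data = disk[block]
        out.append(data)
        block = nxt
    return b"".join(out)


def block_for_index(disk: Disk, first: int, n: int) -> Tuple[int, int]:
    """Find logical block n; returns (physical block, disk reads needed).

    Every earlier block must be read just to learn where the next one is.
    """
    block, reads = first, 0
    for _ in range(n):
        block, _data = disk[block]  # reading a block yields the next pointer
        reads += 1
        if block == END:
            raise IndexError("file has fewer blocks")
    return block, reads
```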
Linked List Allocation Using a Table in Memory

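• Both disadvantages of linked list allocation, slow random access and a data
area per block that is no longer a power of two (because the pointer takes up
a few bytes), can be eliminated by taking the pointer word from each disk
block and putting it in a table in memory.

• Figure 5-10 shows what the table looks like for the example of Figure 5-9.

• Using the table of Figure 5-10, we can start with block 4 and follow the
chain all the way to the end.

• The same can be done starting with block 6. Both chains are terminated
with a special marker (e.g., -1) that is not a valid block number.

• Such a table in main memory is called a FAT (File Allocation Table).

• The main drawback is that the entire table must be in memory all the
time, and its size grows with the size of the disk rather than with the
number of files in use.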
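With the pointers moved into a table, following a chain is just array indexing in memory. In this sketch the table is a Python list indexed by block number; the `EOF` and `FREE` marker values and the two chains are illustrative (they start at blocks 4 and 6, as in the text, but the other block numbers are ours), and real FAT variants use their own reserved values:

```python
from typing import List

EOF = -1   # hypothetical end-of-chain marker
FREE = -2  # hypothetical "unused block" marker


def example_fat(n_blocks: int = 16) -> List[int]:
    """Build a small table holding two files, starting at blocks 4 and 6."""
    fat = [FREE] * n_blocks
    for blocks in ([4, 7, 2, 10, 12], [6, 3, 11, 14]):
        for cur, nxt in zip(blocks, blocks[1:]):
            fat[cur] = nxt
        fat[blocks[-1]] = EOF
    return fat


def chain(fat: List[int], start: int) -> List[int]:
    """Follow a file's chain entirely in memory; no disk reads needed."""
    blocks, b = [], start
    while b != EOF:
        if b == FREE or b in blocks:  # guard against a corrupt table
            raise ValueError("broken chain")
        blocks.append(b)
        b = fat[b]
    return blocks


def block_at(fat: List[int], start: int, n: int) -> int:
    """Random access: find logical block n by walking the in-memory table."""
    b = start
    for _ in range(n):
        b = fat[b]
        if b == EOF:
            raise IndexError("file has fewer blocks")
    return b
```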


Linked List Allocation Using a Table in Memory

Figure 5-10. Linked list allocation using a file allocation table in main memory.
I-nodes
• An i-node (index-node) is a data structure that lists the attributes and
disk addresses of the file's blocks.
• The big advantage of this scheme over linked files using an in-memory
table is that the i-node need only be in memory when the corresponding file
is open.
• If each i-node occupies n bytes and a maximum of k files may be open at
once, the total memory occupied by the array holding the i-nodes for the
open files is only k × n bytes.
• One problem with i-nodes is that if each one has room for a fixed number
of disk addresses, what happens when a file grows beyond this limit?
• One solution is to reserve the last disk address not for a data block, but for
the address of an indirect block containing more disk block addresses. This
idea can be extended to use double indirect blocks and triple indirect
blocks, as in Figure 5-11.
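Finding where a given logical block of a file is recorded is a small calculation over the i-node's address slots. The sketch assumes hypothetical parameters (1-KB blocks, 4-byte disk addresses, hence 256 addresses per indirect block, and 10 direct addresses in the i-node); real systems, MINIX included, use different counts:

```python
from typing import List, Tuple

BLOCK_SIZE = 1024                    # bytes per block (assumed)
ADDR_SIZE = 4                        # bytes per disk address (assumed)
PER_BLOCK = BLOCK_SIZE // ADDR_SIZE  # 256 addresses fit in one indirect block
N_DIRECT = 10                        # direct addresses in the i-node (assumed)


def max_file_blocks() -> int:
    """Blocks reachable via direct, single, double and triple indirect pointers."""
    return N_DIRECT + PER_BLOCK + PER_BLOCK ** 2 + PER_BLOCK ** 3


def locate(n: int) -> Tuple[int, List[int]]:
    """Map logical block n of a file to (level, indices).

    Level 0 is a direct address; level k means the address is reached
    through k indirect blocks, and indices gives the slot used in each,
    starting with the block the i-node points to.
    """
    if n < 0:
        raise ValueError("negative block number")
    if n < N_DIRECT:
        return 0, [n]
    n -= N_DIRECT
    span = 1
    for level in (1, 2, 3):
        span *= PER_BLOCK            # blocks covered at this level
        if n < span:
            indices = []
            for _ in range(level):
                indices.append(n % PER_BLOCK)
                n //= PER_BLOCK
            return level, indices[::-1]
        n -= span
    raise ValueError("beyond the maximum file size")
```

With these numbers, `locate(266)` is the first block that needs the double indirect block, and the largest file holds `max_file_blocks()` blocks.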
I-nodes

Figure 5-11. An i-node with three levels of indirect blocks.


Implementing Directories: Shared Files
When a file is opened, the OS uses the path name supplied by the user to locate
the directory entry.
The root directory may be in a fixed location relative to the start of a partition.

Figure 5-12. File system containing a shared file.


Disk Space Management: Block Size
• Files are normally stored on disk, so management of disk space is a major
concern to file system designers. Two general strategies are possible for storing an
n-byte file:

• n consecutive bytes of disk space are allocated, or

• the file is split up into a number of (not necessarily contiguous) blocks.

• The same trade-off is present in memory management systems between pure
segmentation and paging.

• Storing a file as a contiguous sequence of bytes has the obvious problem that if a
file grows, it will probably have to be moved on the disk.

• The same problem holds for segments in memory, except that moving a segment
in memory is a relatively fast operation compared to moving a file from one disk
position to another.
Disk Space Management: Block Size

• For this reason, nearly all file systems chop files up into fixed-size
blocks that need not be adjacent.

• Once it has been decided to store files in fixed-size blocks, the
question arises of how big the blocks should be.

• As an example, consider a disk with 131,072 bytes/track, a rotation
time of 8.33 msec, and an average seek time of 10 msec.

• The time in milliseconds to read a block of k bytes is then the sum
of the seek time, the average rotational delay (half a rotation, 4.165 msec),
and the transfer time:

10 + 4.165 + (k / 131072) × 8.33
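The formula can be evaluated directly to show the trade-off behind Figure 5-17. In this sketch the disk parameters are the ones given above, the 2-KB file size matches the figure's assumption, and the helper names are ours; larger blocks raise the data rate but waste more space on small files:

```python
SEEK_MS = 10.0        # average seek time
ROTATION_MS = 8.33    # time for one full rotation
TRACK_BYTES = 131072  # bytes per track


def read_time_ms(k: int) -> float:
    """Seek + half a rotation on average + time to transfer k bytes."""
    return SEEK_MS + ROTATION_MS / 2 + (k / TRACK_BYTES) * ROTATION_MS


def data_rate_kb_per_s(k: int) -> float:
    """Effective throughput when every k-byte block costs a full access."""
    return (k / 1024) / (read_time_ms(k) / 1000)


def space_efficiency(block_size: int, file_size: int = 2048) -> float:
    """Fraction of allocated space holding data when all files are file_size bytes."""
    blocks = -(-file_size // block_size)  # ceiling division
    return file_size / (blocks * block_size)


if __name__ == "__main__":
    for k in (512, 1024, 2048, 4096, 8192):
        print(k, round(data_rate_kb_per_s(k), 1), space_efficiency(k))
```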


Disk Space Management: Block Size

Figure 5-17. The solid curve (left-hand scale) gives the data rate of a disk.
The dashed curve gives the disk space efficiency.
All files are 2 KB.
Keeping Track of Free Blocks

Figure 5-18. (a) Storing the free list on a linked list. (b) A bitmap.
Keeping Track of Free Blocks
• Once a block size has been chosen, the next issue is how to keep
track of free blocks.

• Two methods are widely used, as shown in Figure 5-18.

• The first one consists of using a linked list of disk blocks, with
each block holding as many free disk block numbers as will fit.
With a 1-KB block and a 32-bit disk block number, each block
on the free list holds the numbers of 255 free blocks (one of the
256 slots is needed for the pointer to the next block).

• The other free space management technique is the bitmap. A
disk with n blocks requires a bitmap with n bits. Free blocks are
represented by 1s in the map, allocated blocks by 0s (or vice
versa).
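Both techniques fit in a few lines. The bitmap sketch below uses the "free = 1" convention and a first-fit search; the class name and the search policy are our own choices, and the last function just repeats the free-list arithmetic from the text:

```python
class Bitmap:
    """Free-space bitmap: bit i is 1 when block i is free."""

    def __init__(self, n_blocks: int) -> None:
        self.n = n_blocks
        self.bits = bytearray([0xFF]) * ((n_blocks + 7) // 8)  # all free
        # Clear padding bits past the last block so they are never handed out.
        for i in range(n_blocks, len(self.bits) * 8):
            self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def is_free(self, i: int) -> bool:
        return bool(self.bits[i // 8] & (1 << (i % 8)))

    def allocate(self) -> int:
        """Find the first free block, mark it allocated, and return its number."""
        for byte_index, byte in enumerate(self.bits):
            if byte:  # at least one free block in these 8
                bit = (byte & -byte).bit_length() - 1  # lowest set bit
                self.bits[byte_index] &= ~(1 << bit) & 0xFF
                return byte_index * 8 + bit
        raise RuntimeError("disk full")

    def free(self, i: int) -> None:
        self.bits[i // 8] |= 1 << (i % 8)


def free_list_capacity(block_size: int = 1024, addr_bytes: int = 4) -> int:
    """Free block numbers one free-list block can hold; one slot is the link."""
    return block_size // addr_bytes - 1
```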
File System Reliability
• Destruction of a file system is often a far greater disaster than
destruction of a computer.

• Backups are made to handle two potential problems:

1. Recover from disaster.

2. Recover from stupidity (e.g., users accidentally removing files they later need).
Backup Issues
1. Back up all or part of the system?
2. Don't back up files that have not changed since the last backup.
3. Compress the backup or not?
4. Difficulty of backing up while the file system is active.
5. Physical security of backup media.
File System Performance: Caching
• Access to disk is much slower than access to memory.
• If only a single word is needed, the memory access is on the order of
a million times as fast as disk access; as a result, many file systems have
been designed with various optimizations to improve performance.
• The most common technique used to reduce disk accesses is the
block cache or buffer cache.
• Cache is pronounced "cash" and is derived from the French cacher,
meaning to hide.
• A cache is a collection of blocks that logically belong on the disk but
are being kept in memory for performance reasons.
File System Performance: Caching
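• Various algorithms can be used to manage the cache, but a common
one is to check all read requests to see if the needed block is in the
cache.
If it is, the read request can be satisfied without a disk access.
If the block is not in the cache, it is first read into the cache and
then copied to wherever it is needed.
• To find out quickly whether a block is present, the cache is usually
organized as a hash table on the device and disk address, with the blocks
also kept on a bidirectional list in order of use (Figure 5-20).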

Figure 5-20. The buffer cache data structures.
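The lookup-then-fill logic can be sketched with Python's `OrderedDict`, which provides both a hash lookup and a recency order for least-recently-used (LRU) replacement. This is a simplification: `BlockCache` and the `read_from_disk` callback are hypothetical, and writes, dirty blocks, and device numbers are not modeled:

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Buffer cache sketch with LRU replacement over a fixed number of slots."""

    def __init__(self, capacity: int, read_from_disk: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # stands in for the disk driver
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()  # LRU first
        self.hits = 0
        self.misses = 0

    def get(self, block_no: int) -> bytes:
        if block_no in self.blocks:
            # Hit: satisfy the request without a disk access.
            self.hits += 1
            self.blocks.move_to_end(block_no)  # now the most recently used
            return self.blocks[block_no]
        # Miss: read the block into the cache first, then hand it out.
        self.misses += 1
        data = self.read_from_disk(block_no)
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used
        self.blocks[block_no] = data
        return data
```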


The Security Environment
• File systems generally contain information that is highly valuable to their users.

• Protecting this information against unauthorized usage is therefore a major
concern of all file systems.

• People frequently use the terms "security" and "protection" interchangeably.

• It is nevertheless useful to distinguish between the general problem of making
sure that files are not read or modified by unauthorized persons and the specific
mechanisms the OS uses to provide security.

• To avoid confusion, we will use the term
Security to refer to the overall problem, and
Protection mechanisms to refer to the specific operating system mechanisms
used to safeguard information in the computer.

• Security has many facets. Three of the more important ones are
the nature of the threats,
the nature of intruders, and
accidental data loss.
The Security Environment

Figure 5-22. Security goals and threats.


Categories of Intruders
1. Casual prying by nontechnical users.

2. Snooping by insiders.

3. Determined attempts to make money.

4. Commercial or military espionage (spying).


Accidental Data Loss
1. Acts of God: fires, floods, earthquakes, wars, etc.

2. Hardware or software errors: CPU malfunctions, unreadable
disks or tapes, telecommunication errors, program bugs.

3. Human errors: incorrect data entry, wrong tape or disk
mounted, wrong program run, lost disk or tape, or some other
mistake.
Generic Security Attacks
• Finding security flaws is not easy. The usual way to test a
system's security is to hire a group of experts, known as tiger
teams or penetration teams, to see if they can break in.
1. Request memory pages, disk space, or tapes and just read them.

2. Try illegal system calls, or legal system calls with illegal parameters,
or even legal system calls with legal but unreasonable parameters.

3. Start logging in and then hit DEL or BREAK halfway through the
login sequence. (In some systems, the password checking program
will be killed and the login considered successful.)

4. Try modifying complex OS structures kept in user space (if any).


Generic Security Attacks
5. Spoof the user by writing a program that types "login:" on the screen
and then walk away. Unsuspecting users will type their login name and
password into the fake program, which records them.

6. Look for manuals that say "Do not do X." Try as many variations of
X as possible.

7. Convince a system programmer to change the system to skip certain
vital security checks for any user with your login name.

8. All else failing, the penetrator might find the computer center
director's secretary and offer a large bribe.
Design Principles for Security
1. The system design should be public.

2. The default should be no access.

3. Check for current authority.

4. Give each process the least privilege possible.

5. The protection mechanism should be simple, uniform, and built
into the lowest layers of the system.

6. The scheme chosen must be psychologically acceptable.


User Authentication: Physical Identification
1. Passwords
2. Physical Identification

Figure 5-23. A device for measuring finger length.


Protection Mechanisms: Protection Domains
• All of these techniques make a clear distinction between
• Policy: whose data are to be protected from whom, and
• Mechanism: how the system enforces the policy.

• Our emphasis will be on mechanisms, not policies.

• In some systems, protection is enforced by a program called a reference
monitor. Every time an access to a potentially protected resource is
attempted, the system first asks the reference monitor to check its legality.
The reference monitor then looks at its policy tables and makes a decision.
• Protection domains: a computer system contains many "objects" that
need to be protected.
• These objects can be hardware (e.g., CPUs, memory segments, disk
drives, or printers).
• They can be software (e.g., processes, files, databases, or semaphores).
Protection Mechanisms: Protection Domains
• Each object has a unique name by which it is referenced, and a
finite set of operations that processes are allowed to carry out on it.
The read and write operations are appropriate to a file; up and down
make sense on a semaphore.
• A domain is a set of (object, rights) pairs. Each pair specifies an
object and some subset of the operations that can be performed on it.
• A right in this context means permission to perform one of the
operations.
• Often a domain corresponds to a single user, telling what the user
can do and not do.
• But a domain can also be more general than just one user.
Protection Mechanisms: Protection Domains
Figure 5-24 shows three domains, giving the objects in each domain and the
rights [Read, Write, eXecute] available on each object. Note that the same
object can be in multiple domains, with different rights in each one.

Figure 5-24. Three protection domains.
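A domain as a set of (object, rights) pairs maps naturally onto a dictionary from object name to a set of rights, and a reference-monitor-style check then becomes a lookup. The domains below are hypothetical (not a copy of Figure 5-24), chosen only so that one object, `printer1`, appears in two domains:

```python
from typing import Dict, FrozenSet

# A domain: object name -> set of rights ("R", "W", "X") it holds on that object.
Domain = Dict[str, FrozenSet[str]]

# Hypothetical domains; printer1 appears in domains 2 and 3.
DOMAINS: Dict[int, Domain] = {
    1: {"file1": frozenset("R"), "file2": frozenset("RW")},
    2: {"file3": frozenset("R"), "file4": frozenset("RWX"), "printer1": frozenset("W")},
    3: {"file6": frozenset("RWX"), "printer1": frozenset("W")},
}


def allowed(domain: int, obj: str, right: str) -> bool:
    """Reference-monitor style check: does this domain hold `right` on `obj`?"""
    return right in DOMAINS.get(domain, {}).get(obj, frozenset())
```

Anything not listed is denied, which matches the design principle that the default should be no access.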


Protection Domains

Figure 5-25. A protection matrix.


Protection Domains

Figure 5-26. A protection matrix with domains as objects.


Access Control Lists

Figure 5-27. Use of access control lists to manage file access.


Access Control Lists
• Most domains have no access at all to most objects, so storing a
very large, mostly empty matrix is a waste of disk space.

• Two practical methods are storing the matrix by rows or by
columns, and then storing only the nonempty elements.

• The two approaches are surprisingly different.

• The first technique (storing by columns) consists of associating with
each object an (ordered) list containing all the domains that may access
the object, and how.

• This list is called the Access Control List or ACL.
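Storing the matrix by columns means each object carries its own list. The sketch below is a hypothetical in-memory ACL table; the `*` entry (meaning any other domain) is our addition to show why the list's order matters: the first matching entry decides, and any domain not listed gets no access.

```python
from typing import Dict, List, Tuple

# One column of the protection matrix per object, nonempty entries only.
ACL = List[Tuple[str, str]]  # (domain name or "*" for anyone else, rights)

acls: Dict[str, ACL] = {
    "F1": [("A", "RW"), ("B", "R")],
    "F2": [("A", "R"), ("B", "RW"), ("C", "R")],
    "F3": [("B", "RWX"), ("C", "RX"), ("*", "R")],  # everyone else may read
}


def acl_permits(obj: str, domain: str, right: str) -> bool:
    """Scan the object's ordered list; the first matching entry decides."""
    for who, rights in acls.get(obj, []):
        if who == domain or who == "*":
            return right in rights
    return False  # default: no access
```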


Access Control Lists

Figure 5-28. An example of two access control lists.


Capabilities

Figure 5-29. When capabilities are used, each process has a capability list.
Capabilities
• The other way of slicing up the matrix is by rows.
• Associated with each process is a list of objects that may be
accessed, along with an indication of which operations are permitted
on each (its domain).
• This list is called a capability list or C-list, and the individual items
on it are called capabilities.
• Usually, a capability consists of a file identifier and a bitmap for the
various rights. In a UNIX-like system, the file identifier would
probably be the i-node number.
• Capability lists are themselves objects and may be pointed to from
other capability lists, thus facilitating sharing of subdomains.
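A capability can be sketched as an (i-node number, rights bitmap) pair, with each process holding a list of them. In this hypothetical example the C-lists are plain dictionaries, so unlike a real system nothing stops a process from forging entries; a real system must keep C-lists out of user control, for example by holding them inside the kernel:

```python
from enum import IntFlag
from typing import Dict, List, NamedTuple


class Rights(IntFlag):
    READ = 1
    WRITE = 2
    EXECUTE = 4


class Capability(NamedTuple):
    inode: int      # object identifier (an i-node number in a UNIX-like system)
    rights: Rights  # bitmap of permitted operations


# Hypothetical C-lists: the matrix stored by rows, one list per process.
c_lists: Dict[str, List[Capability]] = {
    "proc_a": [Capability(17, Rights.READ | Rights.WRITE)],
    "proc_b": [Capability(17, Rights.READ),
               Capability(42, Rights.READ | Rights.EXECUTE)],
}


def may(process: str, slot: int, right: Rights) -> bool:
    """A process names an object by its C-list slot; check all requested bits."""
    c_list = c_lists.get(process, [])
    if not 0 <= slot < len(c_list):
        return False
    return (c_list[slot].rights & right) == right
```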
Examples of Generic Rights
1. Copy capability: create a new capability for the same object.

2. Copy object: create a duplicate object with a new capability.

3. Remove capability: delete an entry from the C-list; object
unaffected.

4. Destroy object: permanently remove an object and a
capability.
Covert Channels
• Even with access control lists and capabilities, security leaks can still
occur.

• Lampson's (1973) model of this confinement problem was originally
formulated in terms of a single timesharing system, but the same ideas can
be adapted to LANs and other multiuser environments.

• In the purest form, it involves three processes on some protected machine.

The first process is the client, which wants some work performed by the
second one, the server. The client and the server do not entirely trust each
other. The third process is the collaborator, which is conspiring with the
server to steal the client's confidential data.

The collaborator and server are typically owned by the same person.
Covert Channels

Figure 5-31. (a) The client, server, and collaborator processes.
(b) The encapsulated server can still leak to the collaborator via covert channels.
Covert Channels
• From the system designer's point of view, the goal is to encapsulate or
confine the server in such a way that it cannot pass information to the
collaborator.
• Using a protection matrix scheme we can easily guarantee that the server
cannot communicate with the collaborator by writing a file to which the
collaborator has read access.
• We can probably also ensure that the server cannot communicate with the
collaborator using the system's normal interprocess communication
mechanism.
• Unfortunately, more subtle channels remain. For example, the server can
modulate its CPU usage: to send a 1 it computes as hard as it can for a
fixed interval, and to send a 0 it sleeps for the same interval. The
collaborator detects the bit stream by carefully monitoring its own
response time.
• Such a covert channel is noisy, containing a lot of extraneous
information, but information can be reliably sent over a noisy channel by
using an error-correcting code (e.g., a Hamming code, or even something
more sophisticated).
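As an example of the kind of code meant here, the Hamming (7,4) code sends 4 data bits as a 7-bit codeword and corrects any single flipped bit per codeword, at the price of sending almost twice as many bits. A minimal sketch (the function names are ours; positions are numbered 1 to 7 with parity bits at 1, 2, and 4):

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

For instance, a bit stream such as 11010100 would be sent as two codewords, one per 4-bit group, and survives one flipped bit in each.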
Covert Channels

• The use of an error-correcting code reduces the already low
bandwidth of the covert channel even more, but it still may be
enough to leak substantial information.
• It is fairly obvious that no protection model based on a matrix of
objects and domains is going to prevent this kind of leakage.
• The paging rate can also be modulated (many page faults for a 1, no
page faults for a 0). In fact, almost any way of degrading system
performance in a clocked way is a candidate.
• If the system provides a way of locking files, then the server can
lock some file to indicate a 1, and unlock it to indicate a 0.
Covert Channels
• On some systems, it may be possible for a process to detect the
status of a lock even on a file that it cannot access.
• This covert channel is illustrated in Figure 5-32, with the file locked
or unlocked for some fixed time interval known to both the server
and collaborator. In this example, the secret bit stream 11010100 is
being transmitted.
Covert Channels

Figure 5-32. A covert channel using file locking.
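The file-locking channel can be demonstrated on a UNIX-like system with `flock`, whose locks belong to an open file description, so two separate `open` calls behave like two independent parties. This is a single-process simulation under that assumption: the time slots are stepped through in order instead of being timed, and the collaborator is allowed to open the file, which is the easy case (the point in the text is that on some systems lock status is visible even without access). The helper names are ours:

```python
import fcntl
import os


def lock_is_held(path: str) -> bool:
    """Collaborator's probe: try a non-blocking exclusive lock on the file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True   # someone else holds it: read a 1
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False  # we got it, so it was free: read a 0
    finally:
        os.close(fd)


def leak(path: str, bits: str) -> str:
    """Send `bits` one agreed time slot at a time (slots simulated in order)."""
    server_fd = os.open(path, os.O_RDONLY)
    received = []
    try:
        for bit in bits:
            if bit == "1":
                fcntl.flock(server_fd, fcntl.LOCK_EX)  # server: locked means 1
            else:
                fcntl.flock(server_fd, fcntl.LOCK_UN)  # server: unlocked means 0
            received.append("1" if lock_is_held(path) else "0")
    finally:
        os.close(server_fd)  # closing the descriptor releases any lock
    return "".join(received)
```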


File System Layout

Figure 5-34. Disk layout for a floppy disk or small hard disk partition, with 64
i-nodes and a 1-KB block size
(i.e., two consecutive 512-byte sectors are treated as a single block).
The Block Cache

Figure 5-37. The linked lists used by the block cache.


Directories and Paths

Figure 5-38. (a) Root file system. (b) An unmounted file system.
(c) The result of mounting the file system of (b) on /usr/.
File Descriptors

Figure 5-39. How file positions are shared between a parent and a child.
Initialization of the File System

Figure 5-44. Block cache initialization.
(a) Before any buffers have been used.
Initialization of the File System

Figure 5-44. Block cache initialization.
(b) After one block has been requested.
Initialization of the File System

Figure 5-44. Block cache initialization.
(c) After the block has been released.
Reading a File

Figure 5-45. Three examples of how the first chunk size is determined for a 10-byte file.
The block size is 8 bytes, and the number of bytes requested is 6. The chunk is
shown shaded.
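The rule illustrated in Figure 5-45 can be written as a one-line calculation: the first chunk stops at the end of the current block, the end of the request, or the end of the file, whichever comes first. A sketch (the function name is ours, and the example positions in the usage note are our own, not necessarily the figure's):

```python
def first_chunk(position: int, nbytes: int, block_size: int, file_size: int) -> int:
    """Number of bytes to copy from the first block touched by a read."""
    offset = position % block_size  # where the read starts inside its block
    return max(0, min(block_size - offset,   # stop at the end of the block
                      nbytes,                # stop at the end of the request
                      file_size - position))  # stop at the end of the file
```

With the figure's parameters (10-byte file, 8-byte blocks, 6 bytes requested), a read at position 0 yields a 6-byte chunk, one at position 4 is cut to 4 bytes by the block boundary, and one at position 8 is cut to 2 bytes by the end of the file.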
Reading a File

Figure 5-46. Some of the procedures involved in reading a file.
Writing a File

Figure 5-47. (a) to (f) The successive allocation of 1-KB blocks with a 2-KB zone.
Converting a Path to an I-Node

Figure 5-48. Some of the procedures used in looking up path names.
Mounting File Systems
Possible file system mounting errors:
• The special file given is not a block device.
• The special file is a block device but is already mounted.
• The file system to be mounted has a rotten magic number.
• The file system to be mounted is invalid (e.g., no i-nodes).
• The file to be mounted on does not exist or is a special file.
• There is no room for the mounted file system’s bitmaps.
• There is no room for the mounted file system’s superblock.
• There is no room for the mounted file system’s root i-node.
Linking and Unlinking Files
Possible errors in a linking or unlinking call:

• File_name does not exist or cannot be accessed.

• File_name already has the maximum number of links.

• File_name is a directory (only the superuser can link to it).

• Link_name already exists.

• File_name and link_name are on different devices.
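From user space, these failures surface as `errno` values from the POSIX `link` call. The mapping below covers the cases listed above; the dictionary and helper are ours. On Linux, hard links to directories are refused with `EPERM` even for the superuser, which differs from the MINIX rule stated above:

```python
import errno
import os

# errno values that link() reports for the cases listed above.
LINK_ERRORS = {
    errno.ENOENT: "file_name does not exist",
    errno.EACCES: "file_name cannot be accessed",
    errno.EMLINK: "file_name already has the maximum number of links",
    errno.EPERM: "file_name is a directory (or linking is not permitted)",
    errno.EEXIST: "link_name already exists",
    errno.EXDEV: "file_name and link_name are on different devices",
}


def try_link(file_name: str, link_name: str) -> str:
    """Attempt a hard link and translate any failure into the cases above."""
    try:
        os.link(file_name, link_name)
        return "ok"
    except OSError as e:
        return LINK_ERRORS.get(e.errno, os.strerror(e.errno))
```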
