Distributed File Systems (DFS) : A Resource Management Component of A Distributed Operating System
High Availability
Users should have the same easy access to files, irrespective of their physical location
System failures or regularly scheduled activities such as backups or maintenance should not result in the unavailability of files
Architecture
Files can be stored at any machine and computation can be performed at any machine
A machine can access a file stored on a remote machine, where the file access operations are performed and the data is returned
Alternatively, dedicated File Servers are provided for storing files and performing storage and retrieval operations
The two most important services in a DFS are:
Name Server: a process that maps names specified by clients to stored objects, e.g. files and directories
Cache Manager: a process that implements file caching, i.e. copying a remote file to the client's machine when it is referred to by the client
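The name service above can be sketched as a simple lookup table; this is a minimal illustration, and the paths, server names, and object ids are hypothetical.

```python
# Minimal sketch of a DFS name server, assuming an in-memory mapping
# from client-visible path names to (server, object-id) pairs.

class NameServer:
    def __init__(self):
        self._table = {}  # path -> (server_address, object_id)

    def register(self, path, server, object_id):
        self._table[path] = (server, object_id)

    def resolve(self, path):
        # Raises KeyError if the name is unknown to this server.
        return self._table[path]

ns = NameServer()
ns.register("/home/alice/notes.txt", "fileserver-1", 4711)
print(ns.resolve("/home/alice/notes.txt"))  # ('fileserver-1', 4711)
```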
Architecture of DFS
Caching
To reduce delays in accessing data by exploiting the temporal locality of reference exhibited by programs
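Exploiting temporal locality usually means keeping recently used blocks and evicting the least recently used ones. A minimal sketch of such a cache manager, with a stand-in `fetch_remote` in place of a real remote read:

```python
# Sketch of a client cache manager with an LRU eviction policy;
# fetch_remote is a hypothetical stand-in for a remote file access.

from collections import OrderedDict

def fetch_remote(block_id):
    return f"data-for-{block_id}"

class LRUCache:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, oldest first

    def read(self, block_id):
        if block_id in self.blocks:           # hit: refresh recency
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        data = fetch_remote(block_id)         # miss: go to the server
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:  # evict least recent
            self.blocks.popitem(last=False)
        return data
```

A repeated read of the same block is then served locally, which is exactly the delay reduction the slide describes.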
Hints
An alternative to cached data to overcome the inconsistency problem that arises when multiple clients access shared data
Encryption
To enforce security in distributed systems; in a typical scenario, two entities wishing to communicate establish a key for the conversation
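One classic way two entities can establish a conversation key is Diffie-Hellman key agreement; the slide does not name a specific protocol, so this is only an illustrative choice, and the tiny prime below is for demonstration, not real security.

```python
# Toy Diffie-Hellman key agreement: both entities derive the same
# session key from values exchanged in the clear. The small public
# parameters are illustrative only and would be insecure in practice.

p, g = 23, 5        # public prime and generator (demo-sized)
a, b = 6, 15        # private values chosen by each entity

A = pow(g, a, p)    # sent by entity 1 to entity 2
B = pow(g, b, p)    # sent by entity 2 to entity 1

key1 = pow(B, a, p) # entity 1's view of the shared key
key2 = pow(A, b, p) # entity 2's view of the shared key
assert key1 == key2 # both sides hold the same conversation key
```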
Design Goals
Naming and Name Resolution
Caches on Disk or Main Memory
Writing Policy
Cache Consistency
Availability
Scalability
Semantics
Name Server
Resolves the names in distributed systems. Drawbacks include a single point of failure and a performance bottleneck. An alternative is to have several name servers, e.g. Domain Name Servers
Writing Policy
The decision of when a modified cache block at a client should be transferred to the server
Write-through policy
All writes requested by the applications at clients are also carried out at the server immediately.
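The write-through policy can be sketched in a few lines; the in-memory `server_store` dict below is a hypothetical stand-in for a real file server.

```python
# Sketch of a write-through policy: every client write updates the
# local cache block and is carried out at the server immediately,
# so the server copy is never stale.

server_store = {}  # hypothetical server-side storage
client_cache = {}  # client-side cache blocks

def write_through(block_id, data):
    client_cache[block_id] = data  # update the cached copy...
    server_store[block_id] = data  # ...and the server, immediately

write_through("b1", b"hello")
assert server_store["b1"] == client_cache["b1"]
```

The cost, implicit in the slide, is that every write pays a round trip to the server even when the block is rewritten many times.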
Cache Consistency
Two approaches to guarantee that the data returned to the client is valid.
Server-initiated approach
The server informs cache managers whenever the data in the client caches becomes stale
Cache managers at clients can then retrieve the new data or invalidate the blocks containing the old data
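A minimal sketch of the server-initiated approach, assuming the server tracks which clients cache each block; all class and method names here are illustrative.

```python
# Sketch of server-initiated consistency: the server records which
# client caches hold each block and calls back to invalidate the
# stale copies whenever the block is updated.

class Server:
    def __init__(self):
        self.data = {}
        self.cachers = {}  # block_id -> set of client caches

    def read(self, block_id, client):
        self.cachers.setdefault(block_id, set()).add(client)
        return self.data.get(block_id)

    def write(self, block_id, value):
        self.data[block_id] = value
        for client in self.cachers.get(block_id, ()):
            client.invalidate(block_id)  # notify each cacher

class ClientCache:
    def __init__(self):
        self.blocks = {}

    def invalidate(self, block_id):
        self.blocks.pop(block_id, None)  # drop the stale block
```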
Client-initiated approach
It is the responsibility of the cache managers at the clients to validate data with the server before returning it
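The client-initiated approach can be sketched with a version check before each read; the per-block version counter is an illustrative bookkeeping choice, not something the slide prescribes.

```python
# Sketch of client-initiated consistency: before returning cached
# data, the cache manager validates it against the server's current
# version and refetches if the cached copy is stale.

class Server:
    def __init__(self):
        self.data, self.version = {}, {}

    def write(self, block_id, value):
        self.data[block_id] = value
        self.version[block_id] = self.version.get(block_id, 0) + 1

class CacheManager:
    def __init__(self, server):
        self.server = server
        self.blocks = {}  # block_id -> (version, data)

    def read(self, block_id):
        cached = self.blocks.get(block_id)
        current = self.server.version.get(block_id, 0)
        if cached is not None and cached[0] == current:
            return cached[1]                     # cache still valid
        data = self.server.data.get(block_id)    # stale: refetch
        self.blocks[block_id] = (current, data)
        return data
```

The trade-off against the server-initiated approach is visible here: every read costs a validation message, but the server keeps no per-client state.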
Availability
Immunity to the failure of the server or of the communication network
Replication is used for enhancing the availability of files at different servers
It is expensive because:
Extra storage space is required
Overhead is incurred in keeping all the replicas up to date
Issues involve
How to keep the replicas of a file consistent
How to detect inconsistencies among replicas of a file and recover from them
Causes of Inconsistency
A replica is not updated due to the failure of a server
All the file servers are not reachable from all the clients due to a network partition
The replicas of a file in different partitions are updated differently
Availability (contd.)
Unit of Replication
The most basic unit is a file
A group of files of a single user, or the files that are in a server (such a group is referred to as a volume, e.g. in Coda)
A combination of the two techniques, as in Locus
Replica Management
The maintenance of replicas and the use of them to provide increased availability
Concerned with the consistency among replicas
A weighted voting scheme (e.g. Roe File System)
A designated agents scheme (e.g. Locus)
A backup servers scheme (e.g. Harp File System)
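The first of these schemes can be illustrated concretely. In weighted voting, an operation proceeds only if it gathers a quorum of votes, with the read and write quorums chosen so that they must overlap; the vote assignments below are illustrative, not taken from Roe.

```python
# Sketch of a weighted voting scheme: each replica holds votes, and
# reads/writes need quorums r and w with r + w > total, so any read
# quorum intersects any write quorum and sees the latest write.

votes = {"replica1": 1, "replica2": 1, "replica3": 1}
total = sum(votes.values())  # 3 votes in all
r, w = 2, 2                  # read and write quorums
assert r + w > total         # quorums are guaranteed to overlap

def can_read(reachable):
    return sum(votes[s] for s in reachable) >= r

def can_write(reachable):
    return sum(votes[s] for s in reachable) >= w

assert can_read({"replica1", "replica2"})
assert not can_write({"replica3"})  # one replica alone cannot write
```

With this choice of quorums the system tolerates the failure of any single replica for both reads and writes, at the price of contacting a majority on every operation.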
Scalability
The suitability of the design of a system to cater to the demands of a growing system
As the system grows larger, both the size of the server state and the load due to invalidations increase
The structure of the server process also plays a major role in deciding how many clients a server can support
If the server is designed as a single process, then many clients have to wait for a long time whenever a disk I/O is initiated
These waits can be avoided if a separate process is assigned to each client, but the significant overhead of frequent context switches to handle requests from different clients can slow down the server
An alternative is to use lightweight processes (threads)
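The thread-based alternative can be sketched with a thread pool; `handle_request` and its simulated I/O wait are hypothetical stand-ins for a real request handler.

```python
# Sketch of a threaded file server: a pool of lightweight threads
# serves clients concurrently, so a request blocked on disk I/O
# does not stall the others, without per-client process overhead.

from concurrent.futures import ThreadPoolExecutor
import time

def handle_request(client_id):
    time.sleep(0.01)  # simulated disk I/O wait
    return f"reply-to-{client_id}"

with ThreadPoolExecutor(max_workers=8) as pool:
    # Four requests overlap their I/O waits instead of queueing
    # behind a single server process.
    replies = list(pool.map(handle_request, range(4)))
print(replies)
```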
Semantics
The semantics of a file system characterizes the effects of accesses on files
Guaranteeing the semantics in distributed file systems, which employ caching, is difficult and expensive
With server-initiated cache invalidation, the invalidation may not occur immediately after updates and before reads occur at clients, due to communication delays
To guarantee the above semantics, all the reads and writes from the various clients will have to go through the server
Alternatively, sharing will have to be disallowed, either by the server or by the use of locks by applications
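The first option, serializing all accesses through the server, can be sketched with a per-file lock; the whole-file locking granularity is an illustrative choice.

```python
# Sketch of serializing reads and writes through the server with a
# per-file lock, one way to guarantee that each read observes the
# most recent write.

import threading

class FileServer:
    def __init__(self):
        self.files = {}
        self.locks = {}

    def _lock(self, name):
        # dict.setdefault is atomic in CPython, so this is a safe
        # way to create one lock per file name for a sketch.
        return self.locks.setdefault(name, threading.Lock())

    def write(self, name, data):
        with self._lock(name):
            self.files[name] = data

    def read(self, name):
        with self._lock(name):
            return self.files.get(name)
```

The cost is that every access pays a trip to the server and possible lock contention, which is exactly why the slide calls guaranteeing these semantics expensive.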
Students' Task
Case Studies
9.5.1 The Sun Network File System