Distributed File System
By Bandana Mahapatra
Why files are used
1. Permanent storage of information on a secondary storage medium.
2. Sharing of information between applications.
What is a file system
• A file system is a subsystem of the operating system that performs file
management activities such as organization, storing, retrieval, naming,
sharing, and protection of files.
• A file system frees the programmer from concerns about the details of
space allocation and layout of the secondary storage device.
Desirable features of a distributed file system:
1. Remote information sharing: Any node, irrespective of the physical location of a file, can access that file.
2. User mobility: Users should be permitted to work on different nodes.
3. Availability: For better fault-tolerance, files should be available for use even in the event of temporary failure of
one or more nodes of the system. Thus the system should maintain multiple copies of the files, the existence of which
should be transparent to the user.
4. Diskless workstations: A distributed file system, with its transparent remote-file accessing capability, allows the use of diskless workstations in a system.
A distributed file system provides the
following types of services:
Storage service
- Structure transparency: Clients should not know the number or locations of the file servers and storage devices.
- Access transparency: Both local and remote files should be accessible in the same way. The file system should automatically locate an accessed file and transport it to the client's site.
- Naming transparency: The name of a file should give no hint as to its location, and the name must not change when the file moves from one node to another.
- Replication transparency: If a file is replicated on multiple nodes, both the existence of the multiple copies and their locations should be hidden from clients.
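The transparency properties above can be illustrated with a small sketch (a toy model for teaching purposes, not a real DFS API; all class and method names here are invented): a name service maps a location-independent file name to the nodes holding its replicas, so the client sees neither where a file lives nor how many copies exist.

```python
# Toy sketch of naming, replication, and access transparency.
# All names here (NameService, Client, etc.) are illustrative inventions.

class NameService:
    def __init__(self):
        # file name -> list of (node, local_path) replicas; hidden from clients
        self._replicas = {}

    def register(self, name, node, local_path):
        self._replicas.setdefault(name, []).append((node, local_path))

    def locate(self, name):
        # Return any available replica; the caller never learns how many exist.
        return self._replicas[name][0]

class Client:
    def __init__(self, name_service, nodes):
        self._ns = name_service
        self._nodes = nodes  # node -> {local_path: data}; stands in for the network

    def read(self, name):
        # Same call whether the file is local or remote: access transparency.
        node, path = self._ns.locate(name)
        return self._nodes[node][path]

ns = NameService()
nodes = {"nodeA": {"/disk1/f1": b"hello"}, "nodeB": {"/disk7/f1": b"hello"}}
ns.register("/shared/report.txt", "nodeA", "/disk1/f1")  # two replicas of
ns.register("/shared/report.txt", "nodeB", "/disk7/f1")  # one logical name

client = Client(ns, nodes)
print(client.read("/shared/report.txt"))  # b'hello' -- no hint of location
```

Note that the logical name `/shared/report.txt` carries no location information (naming transparency), and `locate` could return any replica without the client noticing (replication transparency).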
HDFS
• The Hadoop Distributed File System (HDFS) is the primary data
storage system used by Hadoop applications. It employs a NameNode
and DataNode architecture to implement a distributed file system that
provides high-performance access to data across highly scalable
Hadoop clusters.
• HDFS is a key part of many Hadoop ecosystem technologies, as it provides a reliable means of managing pools of big data and supporting related big data analytics applications.
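The NameNode/DataNode split can be sketched as a toy model (illustrative only, not the real Hadoop API): the NameNode keeps only metadata mapping files to blocks and blocks to DataNode replicas, while the actual bytes are fetched from DataNodes directly.

```python
# Minimal sketch of the HDFS read path (toy model; class names mirror the
# HDFS roles, but the methods and data structures are invented for teaching).

class NameNode:
    def __init__(self):
        self.file_blocks = {}      # file path -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode ids (replicas)

    def get_block_locations(self, path):
        # Metadata only: the NameNode never serves file contents itself.
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]

class DataNode:
    def __init__(self):
        self.blocks = {}  # block id -> bytes

def read_file(namenode, datanodes, path):
    data = b""
    for block_id, locations in namenode.get_block_locations(path):
        # Read each block from the first available replica.
        data += datanodes[locations[0]].blocks[block_id]
    return data

nn = NameNode()
dns = {"dn1": DataNode(), "dn2": DataNode()}
nn.file_blocks["/logs/app.log"] = ["blk_1", "blk_2"]
nn.block_locations["blk_1"] = ["dn1", "dn2"]  # blocks are replicated
nn.block_locations["blk_2"] = ["dn2"]
dns["dn1"].blocks["blk_1"] = b"part1-"
dns["dn2"].blocks["blk_1"] = b"part1-"
dns["dn2"].blocks["blk_2"] = b"part2"

print(read_file(nn, dns, "/logs/app.log"))  # b'part1-part2'
```

This mirrors the design choice that makes HDFS scale: clients contact the NameNode once for locations, then stream block data from DataNodes in parallel, keeping the metadata server off the data path.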
Case study: Andrew File System
• Andrew is a distributed computing environment developed jointly by Carnegie Mellon University and IBM. One of the major components of Andrew is a distributed file system.
• The goal of the Andrew File System is to support growth to at least 7,000 workstations (one for each student, faculty member, and staff member at Carnegie Mellon) while providing users, application programs, and system administrators with the amenities of a shared file system.
Characteristics of Big Data
(i) Volume – The name Big Data itself relates to an enormous size. The size of the data plays a crucial role in determining its value, and whether particular data can actually be considered Big Data depends on its volume. Hence, 'Volume' is one characteristic that must be considered when dealing with Big Data.
(ii) Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses issues for storing, mining, and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demands determines the real potential in the data. Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, and mobile devices. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.
Benefits of Big Data Processing