
Distributed Computing

Systems (CS 230)

Prof. Nalini Venkatasubramanian

Dept. of Computer Science
Donald Bren School of Information and Computer Sciences
University of California, Irvine

CS230
Distributed Computing Systems
Winter 2020
Lecture 1 - Introduction to Distributed Computing
Wed 5:00-8:50p.m., ALP 2200
Nalini Venkatasubramanian
nalini@uci.edu

CS230: Distributed Computing Systems
Course logistics and details

● Course Web page - http://www.ics.uci.edu/~cs230
● Lectures – W 6:00 – 8:50 p.m, ALP 2200
● Must Read: Course Reading List
● Collection of Technical papers and reports by topic
● Reference Books (recommended)
● Distributed Systems: Concepts & Design, 5th ed. by
Coulouris et al.(preferred)
● Distributed Systems: Principles and Paradigms, 2nd
ed. by Tanenbaum & van Steen.
● Distributed Computing: Principles, Algorithms, and
Systems, 1st ed. by Kshemkalyani & Singhal.

● TA for Course
● Nailah Alhassoun (nailah@uci.edu)
CS230: Distributed Computing Systems
Course logistics and details
● Homeworks
● Written homeworks
● Problem sets
● Includes paper summaries (1-2 papers on the
specific topic from the reading list)
● Course Examination (tentatively Week 8)
● Course Project
● In groups of 3
● Will require use of open source distributed computing
platforms
● Suggested projects will be available on webpage
CS230: Distributed Computing Systems
Prerequisite Knowledge
● Necessary – Operating Systems Concepts and
Principles, basic computer system architecture
● Highly Desirable – Understanding of Computer
Networks, Network Protocols
● Necessary – Basic programming skills in Java,
Python, C++,…

Distributed Systems
CompSci 230 Grading Policy
● Homeworks - 30% of final grade
• 4 homeworks - one for each segment of the course
– Problem sets, paper summaries (2 in each set)
• A homework due approximately every 2 weeks
• Make sure to follow instructions while writing and creating
summary sets.
• Extra Credit - Summary of 2 distributed computing related
distinguished talks this quarter

● Course Exam – 40% of final grade

● Class Project - 30% of final grade
● Part 1: Due Week 6
● Part 2: Due Finals Week
● Final assignment of grades will be based on a curve.
CS230: Distributed Computing Systems
Syllabus and Lecture schedule
● Part 0 - Introduction to Distributed Systems
● Part 1: Time and State in Distributed Systems
○ Physical Clocks, Logical Clocks, Clock Synchronization
○ Global Snapshots and State Capture
● Part 2: From Operating Systems to Distributed Systems
○ Architectural Possibilities, Communication Primitives (Distributed Shared
Memory, Remote Procedure Calls)
○ Distributed Coordination (mutual exclusion, leader election, deadlocks)
○ Scheduling and Load Balancing in distributed systems
○ Distributed Storage and FileSystems
● Part 3: Messaging and Communication in Distributed Systems
○ ALM, Mesh/Tree Protocols, Group Communication, Distributed
Publish/Subscribe
● Part 4: Reliability and Fault Tolerance in Distributed Systems
○ Fault Tolerance, Consensus, Failure Detection, Replication, Handling
Byzantine Failures
CS230: Distributed Computing Systems
Wk | Dates | Lecture Topic | Deliverables
1 | Jan 8 (Jan 10 - Gul Agha) | Introduction to distributed systems and models | Project group formation
2 | Jan 15 | Time in Distributed Systems (Physical/Logical Clocks, Clock Synchronization) | Project proposal due
3 | Jan 22 | Global State in Distributed Systems | Homework 1 due
4 | Jan 29 (Jan 30 - Nancy Lynch) | Distributed OS - RPC, DSM, Distributed Mutual Exclusion, Deadlocks |
5 | Feb 5 | Distributed OS - Scheduling, Migration, Load Balancing, Distributed FileSystems | Homework 2; Project meeting
6 | Feb 12 | Group Communication, ALM | Project update/initial demo
7 | Feb 19 | Publish/Subscribe, Fault Tolerance | Homework 3
8 | Feb 26 | Course Exam |
9 | Mar 4 | Failure Detection, Consensus | Project update
10 | Mar 11 | Replication, Replicated State Management | Homework 4
11 | Mar 18 | Project demos, reports, slides (if possible, poster presentation to dept.) |
Lecture Schedule
● Weeks 1,2,3: Distributed Systems Introduction
● Introduction – Needs/Paradigms
• Basic Concepts and Terminology, Concurrency
● Time and State in Distributed Systems
• Physical and Logical Clocks
• Distributed Snapshots, Termination Detection, Consensus
● Week 4,5,6: Distributed OS and Middleware Issues
● Interprocess Communication
• Remote Procedure Calls, Distributed Shared Memory
● Distributed Process Coordination/Synchronization
• Distributed Mutual Exclusion/Deadlocks, Leader Election
● Distributed Process and Resource Management
• Task Migration, Load Balancing
● Distributed I/O and Storage Subsystems
• Distributed FileSystems

Lecture Schedule
● Weeks 7,8: Messaging and Communication in
Distributed Systems
● Naming in Distributed Systems
● Gossip, Tree, Mesh Protocols
● Group Communication
● Weeks 9,10: Non-functional “ilities” in distributed
systems
● Reliability and Fault Tolerance
● Quality of Service and Real-time Needs
● Sample Distributed Systems (time permitting)
● P2P, Grid and Cloud Computing, Mobile/Pervasive

What is not covered

● Security in Distributed Systems (Prof. Tsudik’s course)
● Distributed Database Management and
Transaction Processing (CS 223, Prof. Mehrotra)
● Distributed Objects and Middleware Platforms
(CS237 - Spring Quarter 2020, Prof. Nalini)

Distributed Systems
● Lamport’s Definition
● “ You know you have one when the crash of a computer you have
never heard of stops you from getting any work done.”

● “A number of interconnected autonomous computers that provide
services to meet the information processing needs of modern
enterprises.”

● Andrew Tanenbaum
A distributed system is a collection of independent computers that
appear to the users of the system as a single computer.
● “An interconnected collection of autonomous processes” - Wan
Fokkink (an algorithmic view)
● FOLDOC (Free On-line Dictionary of Computing)
A collection of (probably heterogeneous) automata whose distribution is transparent to the user
so that the system appears as one local machine. This is in contrast to a network, where the
user is aware that there are several machines, and their location, storage replication, load
balancing and functionality are not transparent. Distributed systems usually use some kind of
“client-server organization”.
People-to-Computer Ratio Over Time

From David Culler (Berkeley)

What is a Distributed System?

Internet

More Examples: Banking systems, communication (messaging, email), distributed information systems (WWW,
federated DBs), manufacturing and process control, inventory systems, ecommerce, cloud platforms, mobile
computing infrastructures, pervasive/IoT systems
Distributed Computing Systems
Globus Grid Computing Toolkit
Gnutella P2P Network
PlanetLab
Cloud Computing Offerings
Parallel Systems
● Multiprocessor systems with more than one CPU
in close communication.
● Improved Throughput, increased speedup,
increased reliability.
● Kinds:
• Vector and pipelined
• Symmetric and asymmetric multiprocessing
• Distributed memory vs. shared memory

● Programming models:
• Tightly coupled vs. loosely coupled, message-based vs. shared
variable
Principles of Operating Systems - Lecture 1
Parallel Computing Systems
Applications: climate modeling, earthquake simulations, genome analysis, protein
folding, nuclear fusion research, …
Example machines: ILLIAC 2 (UIllinois), K-computer (Japan), Tianhe-1 (China),
IBM Blue Gene, Connection Machine (MIT)
Peer to Peer Systems

P2P File Sharing

Napster, Gnutella, Kazaa, eDonkey,
BitTorrent
Chord, CAN, Pastry/Tapestry,
Kademlia

P2P Communications
MSN, Skype, Social Networking Apps

P2P Distributed Computing

Seti@home

Use the vast resources of machines at the edge of the Internet to build a network that
allows resource sharing without any central authority.
Real-time distributed systems
● Correct system function depends on timeliness
● Feedback/control loops
● Sensors and actuators
● Hard real-time systems -
● Failure if response time too long.
● Secondary storage is limited
● Soft real-time systems -
● Less accurate if response time is too long.
● Useful in applications such as multimedia, virtual reality.

New application domains
Key problem space challenges
•Highly dynamic behavior
•Transient overloads
•Time-critical tasks
•Context-specific requirements
•Resource conflicts
•Interdependence of (sub)systems
•Integration with legacy (sub)systems

Key solution space challenges

• Enormous accidental & inherent
complexities
• Continuous evolution & change
• Highly heterogeneous platform,
language, & tool environments

Mapping problem space requirements to solution space artifacts is very hard!

Mobile & ubiquitous distributed systems
Sample SmartSpaces Built - UCI
● Responsphere - A campus-wide infrastructure to instrument, monitor,
disaster drills & technology validation
● SAFIRE – Situational awareness for fire incident command
● OpsTalk – Speech based awareness & alerting system for soldiers on the field
[Figure: OpsTalk pipeline. Acoustic capture (voice, ambient noise), acoustic analysis
(speech, real-time processing), SA application (alerts; conversation monitoring and
playback; image and video tagging; spatial messaging)]
● SCALE – A smart community awareness and alerting testbed @ Montgomery County,
MD. A NIST/Whitehouse SmartAmerica Project extended to Global Cities Challenge.
Today’s Platforms Landscape - examples
System | Goal
BitTorrent | Swarm-style (unstructured peer-oriented) downloads - used in Twitter datacenter
Memcached | A massive key-value store
Hadoop (+ HDFS) | Reliable, scalable, high-performance distributed computing platform for data reduction
MapReduce | Programming massively parallel/distributed applications
Spark | Programming massively parallel/distributed real-time applications
Zookeeper | Support for coordination in distributed clusters
Spanner | Globally distributed database solution/storage service
Storm | Dealing with stream data processing
Dynamo | Amazon’s massively replicated key-value store
Spread | Group communication and replicated data
Distributed Systems

Hardware – very cheap ; Human – very expensive

Characterizing Distributed Systems
● Multiple Autonomous Computers
● each consisting of CPUs, local memory, stable storage, I/O paths
connecting to the environment
● Multiple architectural possibilities
● client/server, peer-oriented, cloud computing, edge-cloud
continuum
● Distribute computation among many processors.
● Geographically Distributed
● Interconnections
● some I/O paths interconnect computers that talk to each other
● Various communication possibilities
● Shared State
● No shared physical memory - loosely coupled
● Systems cooperate to maintain shared state
● Maintaining global invariants requires correct and coordinated operation
of multiple computers.

Why Distributed Computing?
● Inherent distribution
● Bridge customers, suppliers, and companies at
different sites.
● remote data access - e.g. web
● Support for interaction - email/messaging/social media
● Computation Speedup - improved performance
● Fault tolerance and Reliability
● Resource Sharing
● Exploitation of special hardware
● Scalability
● Flexibility

Why are Distributed Systems Hard?
● Scale
● numeric, geographic, administrative
● Loss of control over parts of the system
● Unreliability of message passing
● unreliable communication, insecure communication,
costly communication
● Failure
● Parts of the system are down or inaccessible
● Independent failure is desirable

An entertaining talk: https://www.youtube.com/watch?v=JG2ESDGwHHY
Design goals of a distributed
system
● Sharing
● HW, SW, services, applications
● Openness (extensibility)
● use of standard interfaces, advertise services,
microkernels
● Concurrency
● compete vs. cooperate
● Scalability
● avoids centralization
● Fault tolerance/availability
● Transparency
● location, migration, replication, failure, concurrency

Intro to Distributed Systems Middleware
Modeling Distributed
Systems

Key Questions
● What are the main entities in the system?
● How do they interact?
● How does the system operate?
● What are the characteristics that affect their
individual and collective behavior?

Classifying Distributed
Systems
● Based on Architectural Models
● Client-Server, Peer-to-peer, Proxy based,…
● Based on computation/communication - degree
of synchrony
● Synchronous, Asynchronous

● Based on communication style

● Message Passing, Shared Memory

● Based on Fault model

● Crash failures, Omission failures, Byzantine failures
● how to handle failure of processes/channels
Architectural Models: Client-server

● Client/server computing allocates application processing between the client
and server processes.
● Request-response paradigm
● A typical application has three basic components:
● Presentation logic, Application logic, Data management logic
Client/Server Models
● There are at least three different models for
distributing these functions:
● Presentation logic module running on the client
system and the other two modules running on one or
more servers.
● Presentation logic and application logic modules
running on the client system and the data
management logic module running on one or more
servers.
● Presentation logic and a part of application logic
module running on the client system and the other
part(s) of the application logic module and data
management module running on one or more servers
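To make the split of the three components concrete, here is a minimal sketch of the first placement above (presentation logic on the client, the other two modules on the server). The names `TITLES`, `Server`, `Client`, and the three functions are hypothetical and chosen only for illustration; the "network" is a plain method call standing in for a request-response exchange.

```python
# Hypothetical in-memory records owned by the data management logic.
TITLES = {1: "intro to distributed systems", 2: "logical clocks"}


def data_management(title_id):
    """Data management logic: look up a raw record."""
    return TITLES.get(title_id)


def application(title_id):
    """Application logic: validate the request and shape the result."""
    record = data_management(title_id)
    if record is None:
        return {"ok": False, "error": "not found"}
    return {"ok": True, "title": record.title()}


def presentation(response):
    """Presentation logic: turn a response into text for the user."""
    return response["title"] if response["ok"] else "Error: " + response["error"]


class Server:
    """Thin-client placement: application and data logic run on the server."""

    def handle(self, request):
        return application(request["title_id"])


class Client:
    """Runs only the presentation logic and speaks request-response."""

    def __init__(self, server):
        self.server = server

    def show(self, title_id):
        # Request goes out, response comes back (a method call in this sketch).
        response = self.server.handle({"title_id": title_id})
        return presentation(response)
```

Moving `application` (or part of it) into `Client` would give the second and third placements; the request-response boundary simply shifts.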
Architectural Models: Peer-to-peer

• No single node serves as a dedicated server
• All nodes act as clients (and servers) at the same time
More Architectural Models

Multiple servers, proxy servers and caches, mobile code, …

[Figure panels: mobile code, multiple servers, proxy]
Computation in distributed systems

Two variants based on bound on timing of events

● Asynchronous system
● no assumptions about process execution speeds and message
delivery delays

● Synchronous system
● make assumptions about relative speeds of processes and delays
associated with communication channels
● constrains implementation of processes and communication

● Concurrent Programming Models

● Communicating processes, Functions, Logical clauses, Passive
Objects, Active objects, Agents
Concurrency issues
● Consider the requirements of transaction based
systems
● Atomicity - either all effects take place or none
● Consistency - correctness of data
● Isolation - as if there were one serial database
● Durability - effects are not lost
● General correctness of distributed computation
● Safety
● Liveness

Parallel Computing Systems
● Special case of a distributed system
● often to run a special application
● Designed to run a single program faster
● Supercomputer - high-end parallel machine

Barcelona - BSC MareNostrum 4 (165,888 cores, 24 cores/processor):
the world’s most elegant supercomputer
Intel-Cray Theta @ Argonne: 281,888 cores, 64 cores per processor,
11.69 petaflops
Aurora: USA’s First ExaSCALE computer
Imagine …
- A computer so powerful that it can predict future climate patterns, saving
millions of people from drought, flood, and devastation.
- A computer so powerful that it can simulate every activity of a cancer cell,
at the sub-atomic level, with such accuracy that we can effectively cure it, or
create a personalized treatment, just for you.

cf: Argonne National Labs
https://youtu.be/dYUEFvqQso8
Flynn’s Taxonomy for Parallel Computing

                    | Single Instruction (SI)       | Multiple Instruction (MI)
Single Data (SD)    | SISD: Single-threaded process | MISD: Pipeline architecture
Multiple Data (MD)  | SIMD: Vector Processing       | MIMD: Multi-threaded Programming

Parallelism – A Practical Realization of Concurrency
SISD (Single Instruction Single Data)

[Figure: one processor applying a single instruction stream to a single data stream]
A sequential computer which exploits no parallelism in either the
instruction or data streams.
Examples of SISD architecture are the traditional uniprocessor machines
(currently manufactured PCs have multiple processors) or old mainframes.
SIMD (Single Instruction Multiple Data)

[Figure: one processor applying each instruction to multiple data streams D0 … Dn in parallel]
A computer which exploits multiple data streams against a single instruction
stream to perform operations which may be naturally parallelized.
For example, an array processor or GPU.
MISD (Multiple Instruction Single Data)

Multiple instructions operate on a single data stream.
Uncommon architecture which is generally used for fault tolerance.
Heterogeneous systems operate on the same data stream and
aim to agree on the result.
Examples include the Space Shuttle flight control computer.
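The "aim to agree on the result" idea can be sketched as majority voting over redundant units that each process the same input. This is only an illustration of the voting principle under simple assumptions (results are hashable and directly comparable); it does not describe the actual flight control software, and `majority_vote`, `redundant_compute`, and `units` are hypothetical names.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of units, or None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def redundant_compute(units, data):
    """Run every unit (a different instruction stream) on the same data."""
    return majority_vote([unit(data) for unit in units])


# Three hypothetical implementations of the same function; the third is faulty.
units = [lambda x: x * x, lambda x: x ** 2, lambda x: x * x + 1]
```

With these three units, `redundant_compute(units, 7)` masks the faulty unit because the two correct ones outvote it; if no value reaches a strict majority, the sketch returns `None` rather than guessing.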
MIMD(Multiple Instruction Multiple Data)
[Figure: several processors, each applying its own instruction stream to its own data stream]
Multiple autonomous processors simultaneously executing different instructions on
different data.
Distributed systems are generally recognized to be MIMD architectures;
either exploiting a single shared memory space or a distributed memory space.
Communication in Distributed
Systems
● Provide support for entities to communicate
among themselves
● Centralized (traditional) OS’s - local communication
support
● Distributed systems - communication across machine
boundaries (WAN, LAN).
● 2 paradigms
● Message Passing
● Processes communicate by exchanging messages
● Distributed Shared Memory (DSM)
● Communication through a virtual shared memory.

Message Passing
[Figure: two processes, each with its own private state, exchanging a message]

● Basic primitives
● Send message, Receive message

Properties of communication channel

Latency, bandwidth and jitter
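A minimal sketch of the two primitives, assuming an in-process queue stands in for the network link (so latency, bandwidth, and jitter are not modelled). `Channel`, `worker`, and `run_demo` are hypothetical names. The point is that the worker's state is private and changes only in response to received messages.

```python
import queue
import threading


class Channel:
    """A one-way in-process channel standing in for a network link."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, message):
        self._q.put(message)  # does not block: the queue is unbounded

    def receive(self, timeout=None):
        return self._q.get(timeout=timeout)  # blocks until a message arrives


def worker(inbox, outbox):
    state = 0  # private state, never shared directly
    while True:
        msg = inbox.receive()
        if msg == "stop":
            outbox.send(state)
            return
        state += msg


def run_demo():
    to_worker, from_worker = Channel(), Channel()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    for n in (1, 2, 3):
        to_worker.send(n)
    to_worker.send("stop")
    total = from_worker.receive(timeout=5)
    t.join()
    return total
```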

Messaging issues
Synchronous
● atomic action requiring the participation of the sender and receiver.
● Blocking send: blocks until message is transmitted out of the system send queue
● Blocking receive: blocks until message arrives in receive queue
Asynchronous
● Non-blocking send: sending process continues after message is sent
● Blocking or non-blocking receive: Blocking receive implemented by
timeout or threads. Non-blocking receive proceeds while waiting for
message. Message is queued (BUFFERED) upon arrival.
Unreliable communication
● Best effort, no ACKs or retransmissions
● Application programmer designs own reliability mechanism
Reliable communication
● Different degrees of reliability
● Processes have some guarantee that messages will be delivered.
● Reliability mechanisms - ACKs, NACKs.
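One way an application can build its own reliability mechanism on a best-effort channel is stop-and-wait with ACKs and retransmission. The sketch below is a simplified simulation under stated assumptions: `LossyLink` drops each transmission independently with a fixed probability, a missing ACK is treated as an immediate timeout, and all class and function names are hypothetical.

```python
import random


class LossyLink:
    """Best-effort link: each transmission is dropped with probability `loss`."""

    def __init__(self, loss, seed=0):
        self.loss = loss
        self.rng = random.Random(seed)

    def deliver(self, packet):
        return None if self.rng.random() < self.loss else packet


class Receiver:
    def __init__(self):
        self.delivered = []
        self.seen = set()

    def on_packet(self, packet):
        seq, payload = packet
        if seq not in self.seen:  # suppress duplicates caused by retransmission
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq  # the ACK carries the sequence number


def reliable_send(link, receiver, seq, payload, max_tries=50):
    """Stop-and-wait: retransmit until an ACK for `seq` comes back."""
    for attempt in range(1, max_tries + 1):
        packet = link.deliver((seq, payload))
        if packet is None:
            continue  # data lost: treat as a timeout and retransmit
        ack = link.deliver(receiver.on_packet(packet))
        if ack == seq:
            return attempt
    raise TimeoutError("no ACK after %d tries" % max_tries)
```

Note that a lost ACK makes the sender retransmit data the receiver already has, which is why the receiver must remember sequence numbers to avoid delivering a payload twice.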

Synchronous vs. Asynchronous

Communication | Type (sync/async)
Personal greetings | Sync
Email | Async
Voice call | Sync
Online messenger/chat | Sync ?
Letter correspondence | Async
Skype call | Sync
Voice mail/voice SMS | Async
Text messages | Async

Remote Procedure Call
● Builds on message passing
● extend traditional procedure call to perform transfer of control
and data across network
● Easy to use - fits well with the client/server model.
● Helps programmer focus on the application instead of the
communication protocol.
● Server is a collection of exported procedures on some shared
resource
● Variety of RPC semantics
● “maybe call”
● “at least once call”
● “at most once call”
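The difference between "at least once" and "at most once" shows up when a reply is lost and the client retransmits. A common way to get at-most-once behaviour is for the server to remember request ids and replay the cached reply for duplicates. The sketch below simulates that under simple assumptions (no real network, the reply cache is never pruned); `Server`, `NaiveServer`, and `call_with_retries` are hypothetical names.

```python
class Server:
    """Executes each request id at most once by caching replies."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # request id -> cached reply

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    def handle(self, request_id, amount):
        if request_id in self.replies:  # duplicate: replay, do not re-execute
            return self.replies[request_id]
        reply = self.deposit(amount)
        self.replies[request_id] = reply
        return reply


class NaiveServer(Server):
    """At-least-once behaviour: every retransmission is executed again."""

    def handle(self, request_id, amount):
        return self.deposit(amount)


def call_with_retries(server, request_id, amount, lost_replies):
    """Client stub: resend the same request while replies are 'lost'."""
    for _ in range(lost_replies):
        server.handle(request_id, amount)  # executed, but the reply never arrives
    return server.handle(request_id, amount)
```

At-least-once is only safe for idempotent operations; a deposit is not idempotent, so the naive server over-credits the account when replies are lost.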

Distributed Shared Memory
● Abstraction used for processes on machines that
do not share memory
● Motivated by shared memory multiprocessors that do
share memory
● Processes read and write from virtual shared
memory.
● Primitives - read and write
● OS ensures that all processes see all updates
● Caching on local node for efficiency
● Issue - cache consistency
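The cache consistency issue can be illustrated with a write-invalidate scheme, one of several possible approaches: a home copy tracks which nodes cache an address and invalidates the other copies on every write. This is a single-process sketch with hypothetical names (`SharedMemory`, `Node`), not a real DSM implementation; invalidations here are direct method calls rather than messages.

```python
class SharedMemory:
    """Home copy of every location plus the set of nodes caching it."""

    def __init__(self):
        self.pages = {}
        self.sharers = {}  # address -> set of caching nodes

    def fetch(self, node, addr):
        self.sharers.setdefault(addr, set()).add(node)
        return self.pages.get(addr, 0)

    def store(self, writer, addr, value):
        self.pages[addr] = value
        for node in self.sharers.get(addr, set()) - {writer}:
            node.invalidate(addr)  # write-invalidate: drop stale copies
        self.sharers[addr] = {writer}


class Node:
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}

    def read(self, addr):
        if addr not in self.cache:  # miss: fetch from the home copy
            self.cache[addr] = self.memory.fetch(self, addr)
        return self.cache[addr]

    def write(self, addr, value):
        self.cache[addr] = value
        self.memory.store(self, addr, value)

    def invalidate(self, addr):
        self.cache.pop(addr, None)
```

Reads are served locally while the cached copy is valid, which is the efficiency gain; the cost is the invalidation traffic on every write to a shared location.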

Fault Models in Distributed
Systems
● Crash failures
● A processor experiences a crash failure when it ceases
to operate at some point without any warning. Failure
may not be detectable by other processors.
● Failstop - processor fails by halting; detectable by
other processors.
● Byzantine failures
● completely unconstrained failures
● conservative, worst-case assumption for behavior of
hardware and software
● covers the possibility of intelligent (human) intrusion.

Other Fault Models in
Distributed Systems
● Dealing with message loss
● Crash + Link
● Processor fails by halting. Link fails by losing
messages but does not delay, duplicate or corrupt
messages.
● Receive Omission
● processor receives only a subset of messages sent to
it.
● Send Omission
● processor fails by transmitting only a subset of the
messages it actually attempts to send.
● General Omission
● Receive and/or send omission

Failure Models
Omission and arbitrary failures

Class of failure | Affects | Description
Fail-stop | Process | Process halts and remains halted. Other processes may detect this state.
Crash | Process | Process halts and remains halted. Other processes may not be able to detect this state.
Omission | Channel | A message inserted in an outgoing message buffer never arrives at the other end’s incoming message buffer.
Send-omission | Process | A process completes a send, but the message is not put in its outgoing message buffer.
Receive-omission | Process | A message is put in a process’s incoming message buffer, but that process does not receive it.
Arbitrary (Byzantine) | Process or channel | Process/channel exhibits arbitrary behaviour: it may send/transmit arbitrary messages at arbitrary times, commit omissions; a process may stop or take an incorrect step.
Failure Models
Timing failures

Class of failure | Affects | Description
Clock | Process | Process’s local clock exceeds the bounds on its rate of drift from real time.
Performance | Process | Process exceeds the bounds on the interval between two steps.
Performance | Channel | A message’s transmission takes longer than the stated bound.
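Each timing failure is a violation of a stated bound, so it can be expressed as a simple predicate. The sketch below assumes the drift bound is given as a dimensionless rate (for example 1e-5, i.e. 10 microseconds of drift per second of real time) and that a trusted measure of real elapsed time is available; the function names are hypothetical.

```python
def clock_failure(clock_elapsed, real_elapsed, max_drift_rate):
    """True if the local clock drifted more than the bound allows.

    All times are in seconds; max_drift_rate is dimensionless.
    """
    return abs(clock_elapsed - real_elapsed) > max_drift_rate * real_elapsed


def performance_failure(observed_seconds, bound_seconds):
    """True if a process step or a message transmission exceeded its bound."""
    return observed_seconds > bound_seconds
```

For example, with a drift bound of 1e-5 a clock may be off by at most 0.036 s after one hour of real time, so a clock that has advanced 3600.05 s in that hour has suffered a clock failure. These checks are only meaningful in a synchronous system, where such bounds are assumed to exist.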

Other distributed system
issues
● Concurrency and Synchronization
● Distributed Deadlocks
● Time in distributed systems
● Naming
● Replication
● improve availability and performance
● Migration
● of processes and data
● Security
● eavesdropping, masquerading, message tampering,
replaying
Middleware for distributed systems
● Middleware is the software between the application programs and
the Operating System/base networking.
● An Integration Fabric that knits together applications, devices, systems
software, data
● Distributed Middleware
● Provides a comprehensive set of higher-level distributed computing
capabilities and a set of interfaces to access the capabilities of the
system.
● Provides Higher-level programming abstraction for developing
distributed applications
● Higher than “lower” level abstractions, such as sockets, monitors
provided by the operating system
● Includes software technologies to help manage complexity and
heterogeneity inherent to the development of distributed
systems/applications/information systems. Enables modular
interconnection of distributed “services”.
Useful Management Services: Naming and Directory Service, State Capture Service, Event Service,
Transaction Service, Fault Detection Service, Discovery/trading Service, Replication Service, Migration
Services

cf: Arno Jacobsen lectures, Univ. of Toronto
Types of Middleware
● Integrated Sets of Services
● DCE from OSF - provides key distributed technologies, including RPC, a distributed
naming service, time synchronization service, …
[Figure: DCE architecture diagram]
Distributed Computing Environment (DCE)
● DCE - from the Open Software Foundation (OSF), offers an environment
that spans multiple architectures, protocols, and operating systems
(supported by major software vendors)
● It provides key distributed technologies, including RPC, a distributed naming service, time
synchronization service, a distributed file system, a network security service, and a threads
package.

[Figure: DCE architecture. From top to bottom: Applications; DCE Distributed File Service;
DCE Distributed Time Service, DCE Directory Service, and Other Basic Services;
DCE Remote Procedure Calls; DCE Threads Services; Operating System Transport Services.
Management and the DCE Security Service run alongside the layers.]
Distributed Object Models
● Goal: Merge distributed computing/parallelism with an object model
● Object Oriented Programming
● Encapsulation, modularity, abstraction
● Separation of concerns
● Concurrency/Parallelism
● Increased efficiency of algorithms
● Use objects as the basis (lends itself well to natural design of
algorithms)
● Distribution
● Build network-enabled applications
● Objects on different machines/platforms communicate
● The use of a broker like entity or bus that keeps track of
processes, provides messaging between processes and other
higher level services
● CORBA, COM, DCOM, JINI, EJB, J2EE, Agent and actor-based models
The Object Management Architecture (OMA)
● Application objects: document handling objects.
● Common facilities: accessing databases, printing files, etc.
● ORB: the communication hub for all objects in the system
● Object Services: object events, persistent objects, etc.
Objects and Threads
● C++ Model
● Objects and threads are tangentially related
● Non-threaded program has one main thread of control
● Pthreads (POSIX threads)
• Invoke by giving a function pointer to any function in the system
• Threads mostly lack awareness of OOP ideas and environment
• Partially due to the hybrid nature of C++?

● Java Model
● Objects and threads are separate entities
● Threads are objects in themselves
● Can be joined together (complex object implements
java.lang.Runnable)
• BUT: Properties of connection between object and thread are not
well-defined or understood

Java and Concurrency
● Java has a passive object model
● Objects, threads separate entities
● Primitive control over interactions
● Synchronization capabilities also primitive
● “Synchronized keyword” guarantees safety but not
liveness
● Deadlock is easy to create
● Fair scheduling is not an option
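The deadlock hazard is not specific to Java; it appears whenever two threads acquire the same pair of locks in opposite orders. The sketch below, written in Python for brevity, shows the standard remedy of acquiring locks in a fixed global order. `Account`, `transfer`, and `run_demo` are hypothetical names, and ordering by `ident` is an assumption of this example.

```python
import threading


class Account:
    def __init__(self, ident, balance):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move money while holding both locks, always acquired in id order."""
    first, second = sorted((src, dst), key=lambda acct: acct.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_demo(rounds=1000):
    a, b = Account(1, 100), Account(2, 100)
    t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

If `transfer` instead locked `src` first and `dst` second, the two threads could each hold one lock and wait forever for the other. Mutual exclusion preserves safety (balances stay correct) but says nothing about liveness, which is why the ordering discipline is needed.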

Actors: A Model of Distributed Objects
[Figure: several actors, each with an interface, a thread, state, and a procedure, exchanging messages]
Actor system - collection of independent agents interacting via message passing
Features
• Acquaintances
•initial, created, acquired
•History Sensitive
•Asynchronous communication
An actor can do one of three things:
1. Create a new actor and initialize its behavior
2. Send a message to an existing actor
3. Change its local state or behavior
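The three actor actions can be sketched with a mailbox and one thread per actor. This is a toy illustration, not a faithful actor runtime: the `None` shutdown message, the `Actor` class, and the behaviors `counting`, `saturated`, and `spawner` are all hypothetical conventions introduced for this example.

```python
import queue
import threading


class Actor:
    """An actor: private state, a mailbox, and one thread draining it."""

    def __init__(self, behavior, state):
        self.behavior = behavior  # function(actor, message)
        self.state = state
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def send(self, message):  # asynchronous: returns immediately
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # hypothetical shutdown convention
                return
            self.behavior(self, message)

    def stop(self):
        self.send(None)
        self.thread.join()


def counting(actor, message):
    actor.state += message  # action 3: change local state
    if actor.state >= 3:
        actor.behavior = saturated  # action 3: or replace the behavior


def saturated(actor, message):
    pass  # history sensitive: after saturation, input is ignored


def spawner(actor, message):
    child = Actor(counting, 0)  # action 1: create a new actor
    actor.state.append(child)  # the child becomes a "created" acquaintance
    child.send(message)  # action 2: send a message to an existing actor
```

Because each actor processes its mailbox one message at a time, its state needs no locks; all coordination happens through asynchronous sends.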
Distributed Objects
● Techniques
● Message Passing
● Object knows about network;
● Network data is minimum
● Argument/Return Passing
● Like RPC.
● Network data = args + return result + names
● Serializing and Sending Object
● Actual object code is sent. Might require synchronization.
● Network data = object code + object state + sync info
● Shared Memory
● based on DSM implementation
● Network Data = Data touched + synchronization info
● Issues with Distributed Objects
● Abstraction
● Performance
● Latency
● Partial failure
● Synchronization
● Complexity
● …..
Cloud Computing
● Cloud - Large multi-tenant data centers hosting storage,
computing, analytics, applications as services.
● Amazon, Salesforce, Google, Microsoft

● An example: Netflix
● Offers Online streaming video service (17,000+ titles in
2010)
● Netflix website with support for video search
● Recommendation engines
● Instant playback on 100s of devices including Xbox,
game consoles, Roku, mobile devices, etc.
● Transcoding service
●…
Netflix App: version 0 (how
it started)
● Plays movies on demand on a mobile device

Server

Netflix.com
Simple Design
• Web Services standards
• Netflix owns the data center
• Uses a fairly standard server
Challenges with Version 0

● Incredible growth in customers and devices led to
● Need for horizontal scaling of every layer of
software stack.
● Needed to support high availability, low latency,
synchronization, fault-tolerance, …
● Had a decision to make:
● Build their own data centers to do all the above
OR
● Write a check to someone else to do all that
instead
Netflix migrated to Amazon
AWS
● John Ciancutti, VP engg. Netflix 2010 [Technical Blog]

● Letting Amazon focus on data center infrastructure allows our engineers
to focus on building and improving our business.
● Amazon calls their web services “undifferentiated heavy lifting” and
that’s what it is. The problems they are trying to solve are incredibly
difficult ones, but they aren’t specific to our business. Every successful
company has to figure out great storage, hardware failover, network
infrastructure, etc.

● We’re not very good at predicting customer growth or device engagement.
● Netflix has revised our public guidance for the number of customers we
will end 2010 with three times over the course of the year. We are
operating in a fast-changing and emerging market. How many
subscribers would you guess used our Wii application the week it was
launched? How many would you guess will use it next month? We have
to ask ourselves these questions for each device we launch because our
software systems need to scale to the size of the business, every time.
● Cloud environments are ideal for horizontally scaling architecture. We
don’t have to guess months ahead what our hardware, storage, and
networking needs are going to be. We can programmatically access
more of these resources from shared pools within AWS almost
instantly.
Netflix “outsourcing”
components
● Think of Netflix in terms of main components
● The API you see that runs on your client system
● The routing policy used to connect you to a data
center
● The Netflix “home page” service in that data
center
● The movie you end up downloading
● Netflix cloud-based design
● breaks the solution into parts
● Builds each of these aspects itself
● But then pays a hosting company to run each
part, and not necessarily just one company!
Netflix Version 1

[Figure: Netflix Home and Movies: Master copies, both hosted within Amazon.com]
Features of new version

● Netflix.com is actually a “pseudonym” for Amazon.com
● An IP address domain within Amazon.com
● Amazon’s control over the DNS allows it to vector
your request to a nearby Amazon.com data
center, then on arrival, Amazon gateway routes
request to a Netflix cloud service component
● The number of these varies elastically based on
load Netflix is experiencing
● Amazon AWS used to host the master copies
of Netflix movies

Akamai

● Akamai is an example of a “content distribution service”
● A company that plays an intermediary role
● Content is delivered to the service by Netflix.com
(from its Amazon.com platform)
● Akamai makes copies “as needed” and distributes
them to end users who present Akamai with
appropriate URLs
● Netflix.com (within Amazon.com) returns a
web page with “redirection” URLs to tell your
browser app what to fetch from Akamai

Multi-tier View of Cloud Computing
● Good to view cloud applications running
in a data center in a tiered way
● Outer tier near the edge of the cloud
hosts applications & web-sites
● Clients typically use web browsers or
web services interface to talk to the
outer tier
● Focus is on vast numbers of clients &
rapid response.
● Inside the cloud (next tier) we find high
volume services that operate in a
pipelined manner, asynchronously
● Caching to support nimble outer tier
services
● Deep inside the cloud is a world of
virtual computer clusters that are
scheduled to share resources and on
which applications like MapReduce
(Hadoop) are very popular
[Figure: tiers of numbered service replicas (1, 2) backed by Shards, Index, and DB]
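The shards and the cache in this picture can be sketched in a few lines. The example below is a single-process toy: keys are assigned to shards by hashing, and a small cache sits in front so that repeated reads do not reach the inner tier. The class name, the hashing scheme, and the invalidate-on-write policy are all assumptions chosen for brevity, not a description of any particular cloud service.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class CachedShardedStore:
    """Toy key-value store: N in-memory shards behind a read cache."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]
        self.cache = {}
        self.cache_hits = 0

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value
        # Invalidate so the next read does not return a stale cached value.
        self.cache.pop(key, None)

    def get(self, key: str):
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        value = self.shards[shard_for(key, len(self.shards))].get(key)
        if value is not None:
            self.cache[key] = value
        return value
```

In a real deployment each shard and the cache would live on different machines, which is exactly where the consistency questions discussed later come from.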
In the outer tiers replication is key
● We need to replicate
● Processing: each client has what seems to be a
private, dedicated server (for a little while)
● Data: as much as possible, that server has copies of
the data it needs to respond to client requests without
any delay at all
● Control information: the entire structure is managed
in an agreed-upon way by a decentralized cloud
management infrastructure
But in a more general setting, with updates and
faults, consistency becomes hard to maintain
across the replicas (more later)
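A small sketch shows why updates make replication hard. Here a write is applied to one replica immediately and reaches the others only when `sync()` is called, so a read from another replica in between returns stale data. The class and method names are hypothetical, and the delayed propagation is a deliberately simple stand-in for network delay.

```python
class ReplicatedStore:
    """Toy replicated store with delayed propagation of updates."""

    def __init__(self, num_replicas: int) -> None:
        self.replicas = [dict() for _ in range(num_replicas)]
        self.pending = []  # updates not yet applied: (replica index, key, value)

    def write(self, key, value) -> None:
        # Replica 0 sees the update at once; the rest see it only after sync().
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, replica: int, key):
        return self.replicas[replica].get(key)

    def sync(self) -> None:
        """Deliver all queued updates in order (models the network catching up)."""
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()
```

Between `write` and `sync`, two clients talking to different replicas can see different values for the same key. If a replica fails before `sync` runs, the picture gets worse still.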
Tradeoffs in Distributed Systems
Some interesting experiences

“Hopelessness and Confidence in Distributed Systems Design”

https://youtu.be/TlU1opuCXB0
Tradeoffs: The CAP Conjecture
(Eric Brewer: PODC 2000 Keynote)

It is impossible for a networked shared-data system to
provide the following three guarantees at the same time:
● Consistency
● Availability
● Partition-tolerance
Proved in 2002 by Gilbert and Lynch (CAP Theorem)

Will revisit later…
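The tradeoff can be made concrete with two replicas and a flag that says whether the network between them is partitioned. This is an informal sketch, not the formal model used in the proof: the “CP” and “AP” labels, the `Unavailable` exception, and the `write` helper are assumptions introduced for this example.

```python
class Unavailable(Exception):
    """Raised when a replica refuses a request to preserve consistency."""


class Replica:
    def __init__(self, name: str, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.name = name
        self.mode = mode
        self.value = None


def write(replica: Replica, peer: Replica, value, partitioned: bool) -> None:
    """Apply a write at `replica`, replicating to `peer` when reachable."""
    if not partitioned:
        # No partition: both replicas are updated together.
        replica.value = value
        peer.value = value
        return
    if replica.mode == "CP":
        # Stay consistent by giving up availability.
        raise Unavailable(f"{replica.name} cannot reach {peer.name}")
    # AP: stay available by accepting the write locally; replicas now diverge.
    replica.value = value
```

During a partition the CP replica rejects the write, while the AP replica accepts it and leaves its peer holding an older value. Neither choice provides all three guarantees at once.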
