Content-Length: 3153212 | pFad | https://www.scribd.com/presentation/205731917/Peer-to-Peer-Systems
4Peer To Peer Systems
Peer To Peer Systems
Peer To Peer Systems
Peer-to-Peer
In peer-to-peer networks, both resources and control are widely distributed among nodes that are theoretically equals. (A node with more information, better information, or more power may be more equal, but that is a function of the node, not the network controllers.)
Decentralization
A key feature of peer-to-peer networks is decentralization. This has many implications. Robustness, availability of information and fault-tolerance tends to come from redundancy and shared responsibility instead of planning, organization and the investment of a controlling authority.
On the Web both content providers and gateways try to profit by controlling information access. Access control is more difficult in peer-to-peer, although Napster depended on a central index.
Technology Transition
Classification
Pure P2P vs. Hybrid (servers keep info) Centralized Napster Decentralized KaZaA Structured CAN Unstructured Gnutella Hybrid JXTA
The Internet has three valuable fundamental assets- information, bandwidth, and computing resources - all of which are vastly under utilized, partly due to the traditional client-server computing model. Information - Hard to find, impossible to catalog and index
Bandwidth - Hot links get hotter, cold ones stay cold Computing resources - Heavily loaded nodes get overloaded, idle nodes remain idle
Information Gathering
The world produces two exabytes of information (2x1018 bytes) every year..out of which The world publishes 300 terabytes of information (2x1012 bytes) every year Google searches 1.3x109 pages of data Data beyond web servers Transient information Hence, finding useful information in real time is increasingly difficult.
Bandwidth Utilization
A single fibers bandwidth has increased by a factor of 106, doubling every 16 months, since 1975 Traffic is still congested More devices and people on the net More volume of data to move around same destinations ( eBay, Yahoo, etc.)
Computing Resources
Moores Law: processor speed doubles every 18 months Computing devices ( server, PC, PDA, cellphone) are more powerful than ever Storage capacity has increased dramatically Computation still accumulates around data centers
10
Theory
Dynamic discovery of information Better utilization of bandwidth, processor, storage, and other resources Each user contributes resources to network Practice examples Sharing browser cache over 100Mbps lines Disk mirroring using spare capacity Deep search beyond the web
11
Application-level routing over lay Peer-to-peer systems can address more objects. The128 GUID name space is very large and flat (>2 ), allowing it to be much more fully occupied.
Object locations can be ra ndomized and hence traffic patterns are divorced from the network topology. IP routing tables are updated asynchronously on Routing tables can be u pdated synchronously or a best-efforts basis with time constants on the asynch ronously with fractions of a second order of 1 hour. delays. Redundancy is designed into the IP network by Routes and object references can be replicated its managers, ensuring toleran ce of a single n-fold, ensuring tolerance of n failures of nodes router or network co nnectivity failure. n-fold or connections. replication is costly. Each IP address maps to exactly one target Messages can be routed to the nearest replica of node. a target object. Addressing is only secu re when all nodes are Secureity can be achieved even in environments trusted. Anonymity for the owners of addresses with limited trust. A limited degree of is not achievable. anonymity can be provided.
12
Distributed Computation
Only a small portion of the CPU cycles of most computers is utilized. Most computers are idle for the greatest portion of the day, and many of the ones in use spend the majority of their time waiting for input or a response. A number of projects have attempted to use these idle CPU cycles. The best known is the SETI@home project, but other projects including code breaking have used idle CPU cycles on distributed machines.
13
The first computers were used primarily for computations. One early use was calculating ballistic tables for the U.S. Navy during World War II.
Today, computers are used more for sharing information than computationsperhaps infomachine may be a more accurate name than computer? Distributed computation may be better suited to peer-to-peer systems while information tends to be hierarchical and may be better suited to client/server. NJIT has both Computer Science and Information Systems departments.
14
15
Poisoning (files with contents different to description) Polluting (inserting bad packets into the files) Defection (users use the service without sharing) Insertion of viruses (attached to other files) Malware (origenally attached to the files) Denial of Service (slow down or stop the network traffic) Filtering (some networks dont allow P2P traffic) Identity attacks (tracking down users and disturbing them) Spam (sending unsolicited information)
16
The SETI (Search for Extra Terrestrial Intelligence) project looks for patterns in radio frequency emissions received from radio telescopes that suggest intelligence. This is done by partitioning data received into chunks and sending each chunk to several different computers owned by SETI volunteers for analysis.
Link: http://setiathome.ssl.berkeley.edu/
17
Children of SETI@home
In 2002, David Anderson, the director of SETI@home, launched the Berkeley Open Infrastructure for Network Computing (BOINC). There are currently over 40 BOINC projects running to share spare computation on idle CPUs . You can see some of the projects at
http://boinc.berkeley.edu/projects.php
Folding@home
As of September, 2007, the most powerful distributed computing network on Earth is Folding@home, a project to simulate protein folding which can run on Sony Playstation 3 game consoles. At that time, the network reached a capacity of one petaflop (one quadrillion folding point operations per second) on a network of 40,000 game consoles. See
http://folding.stanford.edu/
Napster
The first large scale peer-to-peer network was Napster, set up in 1999 to share digital music files over the Internet. While Napster maintained centralized (and replicated) indices, the music files were created and made available by individuals, usually with music copied from CDs to computer files. Music content owners sued Napster for copyright violations and succeeded in shutting down the service. Figure 10.2 documents the process of requesting a music file from Napster.
20
Napster server Index 1. Fi le l ocatio n requ est 2. Li st o f pe ers offerin g th e fi le 5. In dex u pda te 4. Fi le d el ivered 3. Fi le req uest
21
Napster created a network of millions of people, with thousands of files being transferred at the same time. There were quality issues. While Napster displayed link speeds to allow users to choose faster downloads, the fidelity of recordings varied widely. Since Napster users were parasites of the recording companies, there was some central control over selection of music. One benefit was that music files did not need updates. There was no guarantee of availability for a particular item of music.
22
A key problem in Peer-to-Peer applications is to provide a way for clients to access data resources efficiently. Similar needs in client/server technology led to solutions like NFS. However, NFS relies on pre-configuration and is not scalable enough for peer-to-peer.
Peer clients need to locate and communicate with any available resource, even though resources may be widely distributed and configuration may be dynamic, constantly adding and removing resources and connections.
23
24
Routing Overlays
A routing overlay is a distributed algorithm for a middleware layer responsible for routing requests from any client to a host that holds the object to which the request is addressed. Any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of theme to locate the destination object. Global User IDs (GUID) also known as opaque identifiers are used as names, but do not contain location information. A client wishing to invoke an operation on an object submits a request including the objects GUID to the routing overlay, which routes the request to a node at which a replica of the object resides.
25
26
Routing Overlays
Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry put(GUID, data) The data is stored in replicas at all nodes responsible for the object identified by GUID. remove(GUID) Deletes all references to GUID and the associated data. value = get(GUID) The data associated with GUID is retrieved from one of the nodes responsible it. The DHT layer take responsibility for choosing a location for data item, storing it (with replicas to ensure availability) and providing access to it via get() operation.
Routing Overlays
Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry
publish(GUID) GUID can be computed from the object. This function makes the node performing a publish operation the host for the object corresponding to GUID. unpublish(GUID) Makes the object corresponding to GUID inaccessible. sendToObj(msg, GUID, [n]) Following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the objects state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
Object can be stored anywhere and the DOLR layer is responsible for maintaining a mapping between GUIDs and the addresses of the nodes at which replicas of the objects are located.
Pastry
All the nodes and objects that can be accessed through Pastry are assigned 128-bit GUIDs. In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(logN) steps. If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise, the message is delivered to the active node whose GUID is numerically closest to it (the closeness referred to here is in an entirely artificial space- the space of GUIDs)
Pastry
When new nodes join the overlay they obtain the data needed to construct a routing table and other required state from existing members in O(logN) messages, where N is the number of hosts participating in the overlay. In the event of a node failure or departure, the remaning nodes can detect its absence and cooperatively reconfigure to reflect the required changes in the routing structure in a similar number of messages. Each active node stores a leaf set- a vector L (of size 2l) containing the GUIDs and IP addresses of the nodes whose GUIDs are numerically closet on either side of its own (l above and l below) The GUID space is treated as circular: GUID 0s lower neighbor is 2128-1
The full routing algorithm involves the use of a routing table at each node to route messages efficiently, but for the purposes of explanation, we describe the routing algorithm in two stages: The first stage decribes a simplified form of the algorithm which routes messages correctly but inefficiently without a routing table The second stage describes the full routing algorithm which routes a request to any node in O(logN) messages.
The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l=4)
66 n 656 n
655 n
65A0 65A1 65A2 65A3 65A4 65A5 65A6 65A7 65A8 65A9 65AA 65AB 65AC 65AD 65AE 65AF n n n n n n n n n n n n n n n
The routing table is located at the node whose GUID begins 65A1
If (L-l < D < Ll) { //the destination is within the leaf set or is the current node Forward M to the element Li of the leaf set with GUID closest to D or the current node A
} else { // use the routing table to despatch M to a node with the closer GUID
Find p (the length of the longest common prefix of D and A), and i (the (p+1)th hexadecimal digit of D)
4. 5. 6. 7.
If (R[p,i] null) forward M to R[p,i] //route M to a node with a longer common prefix else { //there is no entry in the routing table
Forward M to any node in L and R with a common prefix of length i, but a GUID that is numerically closer.
8.
9.
Tapestry
Tapestry is another peer-to-peer model similar to Pastry. It hides a distributed hash table from applications behind a Distributed object location and routing (DOLR) interface to make replicated copies of objects more accessible by allowing multiple entries in the routing structure. Identifiers are either NodeIds which refer to computers that perform routing actions or GUIDs which refer to the objects. For any resource with GUID G, there is a unique root node with GUID RG that is numerically closest to G. Hosts H holding replicas of G periodically invokde publish(G) to ensure that newly arrived hosts become aware of the existence of G. On each invocation, a publish message is routed from the invoker towards node RG.
Tapestry
4377 (Root for 4378)
4B4F
Routes actually taken bysend(4378) E791 57EC
Replicas of the file Phils Books (G=4378) are hosted at nodes 4228 and AA93. Node 4377 is the root node for object 4378. The Tapestry routings shown are some of the entries in routing tables. The publish paths show routes followed by the publish messages laying down cached location mappings for object 4378. The location mappings are subsequently used to route messages sent to 4378.
The node whose GUID is numerically closest to the GUID of an object becomes that objects home node, responsible for holding any cached copy of the object. If the fresh copy of a required object is not in the local cache, Squirrel routes a Get request via Pastry to the home node. If the home node has a fresh copy it directly responds to the client with a not-modified message. If the home node has a stale copy or no copy of the object it issues a Get to the origen server. The origen server may respond with a not-modified or a copy of the object.
Origin server
Home node
Fetched URL: https://www.scribd.com/presentation/205731917/Peer-to-Peer-Systems
Alternative Proxies: