Grid vs. Peer-to-Peer: Project Report No. 2




Yin Chen
s0231189@sms.ed.ac.uk

1 Grid vs. P2P


Same general problem: the organization of resource sharing within virtual communities.

Same general approach: the creation of overlay structures that coexist with, but need not correspond in structure to, underlying organizational structures.

Grid computing addresses infrastructure but not yet failure, whereas P2P addresses failure but not yet infrastructure. The interests of the two communities are likely to grow together over time.

1.1 Definition
Grids are sharing environments implemented by the deployment of a persistent, standards-based service infrastructure that supports the creation of, and resource sharing within, distributed communities.

P2P is a class of applications that takes advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P design requirements commonly include independence from DNS and significant or total autonomy from central servers. P2P implementations frequently involve the creation of overlay networks with a structure independent of that of the underlying Internet.

1.2 Comparing Grids and P2P


Grids provide many services to moderate-sized communities and emphasize the integration of
substantial resources to deliver nontrivial qualities of service within an environment of at least
limited trust.

P2P systems deal with many more participants but offer limited and specialized services, have been
less concerned with qualities of service, and have made few if any assumptions about trust.

Grids have incrementally scaled the deployment of relatively sophisticated services and applications,
connecting small numbers of sites into collaborations engaged in complex scientific applications. As
system scale increases, Grid developers are now facing and addressing problems relating to
autonomic configuration and management.

P2P communities developed rapidly around simple file sharing and are now seeking to expand to more
sophisticated applications, as well as to address management issues.

1.2.1 Target Communities and Incentives


Grids were motivated by the requirements of professional communities needing to access remote
resources, federate datasets, and/or pool computers for large-scale simulations and data analyses.
They were initially developed to address the needs of scientific collaborations; commercial interest
is growing.

P2P has been popularized by grass-roots, mass-culture file sharing and highly parallel
computing applications that scale in some instances to hundreds of thousands of nodes.
1.2.2 Resources
Grids integrate resources that are more powerful, more diverse, and better connected than the
typical P2P resource.
A Grid resource might be a cluster, storage system, database, or scientific instrument of
considerable value that is administered in an organized fashion according to some well-defined
policy. This explicit administration enhances the resource's ability to deliver desired qualities of
service and can facilitate, e.g., software upgrades, but it can also increase the cost of integrating
the resource into a Grid. Explicit administration, the higher cost of membership, and the stronger
community links within scientific VOs mean that resource availability tends to be higher and
more uniform.

P2P systems often deal with intermittent participation and highly variable behavior, and their major
resources are home computers. The difference in capabilities between home and work computers is
illustrated by the average CPU time per work unit in SETI@home: home computers are about 30%
slower than work computers (13:45 vs. 10:16 hours per work unit).
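
(As a quick check of that figure: 13:45 is about 13.75 hours and 10:16 about 10.27 hours, so home machines take roughly 13.75 / 10.27 ≈ 1.34 times as long per work unit, broadly in line with the quoted ~30% figure.)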

1.2.3 Applications
Grid applications tend to be far more data intensive, a consequence of better network connectivity,
which also allows for more flexibility in Grid application design.

1.2.4 Scale and Failure


Grids often involve only modest numbers of participants, although the amount of activity can be
large. Early Grid implementations did NOT address scalability and self-management as priorities.
While core Grid protocols (e.g., the Globus Toolkit) do not preclude scalability, actual deployments
often employ centralized components, e.g., central repositories for shared data, centralized resource
management components, and centralized (and/or hierarchical) information directories. This
situation is changing.

P2P systems have far larger communities, although the total amount of activity is about the same as
in Grids. First-generation systems use centralized structures, second-generation systems are
flooding-based, and third-generation systems are based on distributed hash tables. First- and
second-generation systems have been characterized at the level of both individual nodes (behavior,
resources) and network properties (topological properties, scale, traffic), revealing not only general
resilience but also unexpected emergent properties. Third-generation systems have been characterized
by simulation studies rather than large-scale deployments. Scalable autonomic management has been
achieved, but only in narrow domains.
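
To make the third-generation approach concrete, here is a minimal sketch of distributed-hash-table-style key placement via consistent hashing; the node names and the use of SHA-1 are illustrative assumptions, not details from the paper:

    import hashlib
    from bisect import bisect_right

    def h(s: str) -> int:
        # map a name or key onto the hash ring (SHA-1 chosen for illustration)
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    class HashRing:
        """Each key is owned by the first node clockwise from its hash."""
        def __init__(self, nodes):
            self.ring = sorted((h(n), n) for n in nodes)

        def lookup(self, key: str) -> str:
            i = bisect_right(self.ring, (h(key), chr(0x10FFFF)))
            return self.ring[i % len(self.ring)][1]  # wrap around the ring

    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("some-file"))  # the same key always maps to the same owner

A real DHT distributes knowledge of this ring across the nodes themselves, so that a lookup takes O(log n) hops instead of consulting a central table.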

1.2.5 Services and Infrastructure


In Grids, much work has gone into creating and operating persistent, multipurpose
infrastructure services for authentication, authorization, discovery, resource access, and data
movement. Less effort has been devoted to managing participation in the absence of trust.

P2P systems offer much in the way of scalability, fault tolerance, self-configuration, and automatic
problem determination. They have tended to focus on the integration of simple
resources (individual computers) via protocols designed to provide specific vertically integrated
functionality. The persistence properties of such infrastructures are not specifically engineered
but are rather emergent.

1.2.6 Distinctions in Requirements

First, some services are specific to particular regimes, e.g., mechanisms that make up for the
inherent lack of incentives for cooperation in P2P.

Second, functionality requirements can conflict, e.g., Grids require accountability whereas P2P
systems favor anonymity.

Third, common services may start from different hypotheses, as in the case of trust.
1.2.7 Future Directions
Grid and P2P are both concerned with the pooling and coordinated use of resources within
distributed communities and are constructed as overlay structures that operate largely
independently of institutional relationships.

2 Injecting P2P into Grids - Case Study [2]

3 Issues of P2P

3.1 Some link topologies [6]

3.2 Response Modes [6]

3.3 Query Processing [6]

(A) Template Query Execution Plan


(B) Centralized Execution Plan

A centralized plan can be inefficient; however, it is sometimes the only plan that satisfies the semantics of a query.

(C) Recursively Partitionable Query

3.4 Dynamic Abort Timeout [6]


Problems
- The user may no longer be interested in the query results.
- To avoid roaming the network forever, a query should fade away after some time.
- A static timeout remains unchanged across hops.

Solution => Dynamic abort timeout

- Nodes further away from the originator time out earlier than nodes closer to the originator.
- The timeout is decreased at each hop.
- Exponential decay with halving, as in the sketch below.
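
A minimal sketch of the halving scheme; the function name and the choice of seconds as the unit are illustrative assumptions:

    def abort_timeout(initial_timeout_s: float, hop: int) -> float:
        # halve the remaining abort timeout at every hop, so nodes far
        # from the originator give up earlier than nodes close to it
        return initial_timeout_s / (2 ** hop)

    # e.g., a 32 s budget at the originator leaves only 4 s three hops away
    assert abort_timeout(32.0, 3) == 4.0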
3.5 Query Scope [6]
Problem
- It is not necessary to search the whole network.
- A broadcast model floods the network.

Solution => Select a neighbour subset

- Search only a specific domain, host, owner, etc.
- Randomly select half of the neighbours.
- In a tree-like topology, select all child nodes and ignore all parent nodes.
- Maintain statistics about neighbours; select only neighbours that meet minimum
requirements in terms of latency, bandwidth, or historic results (maxLatency, minBandwidth,
minHistoricResult).
- Find only a single result.
- Specify the maximum number of result tuples (maxResults) and bytes (maxResultBytes) to be
returned.
- Neighbour Selection Query
- The radius of a query is a measure of path length (see the sketch after this list).
  * It is the maximum number of hops a query is allowed to travel.
  * The radius is decreased by one at each hop.
  * The roaming query and its response traffic fade away upon reaching a radius of less than zero.
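
A minimal sketch of radius-scoped forwarding; the Node class is invented here for illustration, and the framework in [6] is far richer:

    class Node:
        def __init__(self, name, neighbours=None):
            self.name = name
            self.neighbours = neighbours or []

        def query(self, predicate, radius: int, seen=None):
            # collect local matches, then forward with a decremented radius;
            # the query fades away once the radius drops below zero
            seen = seen if seen is not None else set()
            if radius < 0 or self.name in seen:
                return []
            seen.add(self.name)
            results = [self.name] if predicate(self) else []
            for n in self.neighbours:
                results += n.query(predicate, radius - 1, seen)
            return results

    a, b, c = Node("a"), Node("b"), Node("c")
    a.neighbours, b.neighbours = [b], [c]
    print(a.query(lambda n: True, radius=1))  # reaches a and b, but not c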

3.6 Routing [5]


- Random forwarding (random walk).
- Learning: nodes record the requests answered by other nodes. A request is forwarded to the peer
that answered similar requests previously, or randomly if no relevant experience exists.
- Best neighbour: each node records the number of answers received from each peer (without
recording the type of request answered). A request is forwarded to the peer that answered the
largest number of requests.
- Learning + best neighbour: identical to learning, except that, when no relevant experience
exists, the request is forwarded to the best neighbour. [4] A sketch of this combined strategy follows.
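
A minimal sketch of the learning + best neighbour decision, assuming per-peer answer counts and an experience table keyed by request type (both data structures are assumptions for illustration):

    import random

    def choose_next_hop(request_type, experience, answer_counts, peers):
        """experience: request_type -> peer that answered it before;
        answer_counts: peer -> total answers received from that peer."""
        if request_type in experience:
            return experience[request_type]                   # learning
        if answer_counts:
            return max(answer_counts, key=answer_counts.get)  # best neighbour
        return random.choice(peers)                           # no history at all

    # with no experience for "cpu-query", fall back to the best neighbour
    peers = ["p1", "p2", "p3"]
    print(choose_next_hop("cpu-query", {}, {"p1": 4, "p2": 9}, peers))  # p2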

4 P2P Case Study - Freenet [4]


Freenet provides a file-storage service.

4.1 Freenet Architecture


4.1.1 Adding a new file
- A user assigns the file a GUID key and sends an insert message containing the file identifier
(GUID) and a time-to-live (TTL) value that represents the number of copies to store.

<GUID Keys>
* Freenet GUID keys are calculated using SHA-1 secure hashes (see the sketch after this list).
* A GUID is a location-independent, globally unique identifier based on the contents of the file.
* Hashing ensures that similar works will be scattered throughout the network, so a single node's
failure has little impact on others, which increases robustness.
- On receiving an insert, a node checks its data store to see if the key already exists.
- If the key does not exist in the node's data store, the node looks up the closest key and forwards
the message to the corresponding node, as it would for a query.
- If the TTL expires without a collision, the final node returns an "all clear" message. The user then
sends the data down the path established by the initial insert message.
- Each node along the path verifies the data against its GUID, stores it, and creates a routing table
entry that lists the data holder as the final node in this chain.
- If the insert encounters a loop or a dead end, it backtracks to the second-nearest key, then the
third nearest, and so on, until it succeeds.
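
A minimal sketch of deriving a content-based key, assuming a SHA-1 hex digest is used directly as the GUID (Freenet's real key types are more elaborate):

    import hashlib

    def guid_key(content: bytes) -> str:
        # location-independent key derived solely from the file's contents
        return hashlib.sha1(content).hexdigest()

    print(guid_key(b"hello freenet"))  # same bytes always yield the same key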

4.1.2 Messaging and Privacy


- Rather than move directly from sender to recipient, messages travel through node-to-node
chains, in which each link is individually encrypted, until the message finally reaches its
recipient.
- Each node in the chain knows only about its immediate neighbours.
- The node immediately after the sender cannot tell whether its predecessor was the message's
originator or was merely forwarding a message from another node.
- The node immediately before the receiver cannot tell whether its successor is the true recipient or
will continue to forward the message.
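
A minimal sketch of per-link encryption, using the third-party cryptography package's Fernet recipe purely for illustration (Freenet's actual wire encryption differs); each hop decrypts with the inbound link key and re-encrypts with the outbound one:

    from cryptography.fernet import Fernet

    key_ab = Fernet.generate_key()   # link: sender A -> relay B
    key_bc = Fernet.generate_key()   # link: relay B -> recipient C

    msg = b"request for some key"
    wire_ab = Fernet(key_ab).encrypt(msg)          # A encrypts for the A-B link
    at_b = Fernet(key_ab).decrypt(wire_ab)         # B briefly holds the plaintext
    wire_bc = Fernet(key_bc).encrypt(at_b)         # B re-encrypts for the B-C link
    assert Fernet(key_bc).decrypt(wire_bc) == msg  # C recovers the message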

4.1.3 Routing
- Steepest-ascent hill-climbing search: Each node forwards queries to the node that it thinks is
closest to the target.

4.1.4 Requesting files


- Every node maintains a routing table that lists the addresses of other nodes and the GUID keys
it thinks they hold.
- When a node receives a query, it first checks its own store; if it finds the file, it returns it with
a tag identifying itself as the data holder. Otherwise the node forwards the request to the node in
its table with the closest key to the one requested. That node then checks its store, and so on.
- If the request is successful, each node in the chain passes the file back upstream and creates a
new entry in its routing table associating the data holder with the requested key.
- Each node might also cache a copy locally.
- Routing tables are never revealed to other nodes.
- To limit resource usage, the requester gives each query a time-to-live (TTL) limit that is
decremented at each node. If the TTL expires, the query fails, although the user can try again
with a higher TTL (up to some maximum).
- If a node sends a query to a recipient that is already in the chain, the message is bounced back
and the node tries to use the next-closest key instead.
- If a node runs out of candidates to try, it reports failure back to its predecessor in the chain,
which then tries its second choice, and so on.
- The request homes in closer with each hop until the key is found (a sketch of the closest-key
choice follows).
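
A minimal sketch of picking the closest key from a routing table, assuming keys are compared as integers derived from their hex digests (Freenet's real notion of closeness is defined over its own key space, so this is only illustrative):

    def closest_node(routing_table: dict, target_key: str):
        # routing_table maps hex GUID keys to node addresses;
        # pick the entry whose key is numerically nearest the target
        t = int(target_key, 16)
        nearest = min(routing_table, key=lambda k: abs(int(k, 16) - t))
        return routing_table[nearest]

    table = {"0a31": "node-1", "ff20": "node-2"}
    print(closest_node(table, "0b00"))  # node-1 holds the nearer key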

4.1.5 Adding Nodes


- A new node first generates a public-private key pair for itself. The public key is not certified
and is used as the node's identity.
- Next, the node sends an announcement message, including the public key and its physical address,
to an existing node located through some out-of-band means (such as personal communication
or lists of nodes posted on the web), with a user-specified TTL.
- The receiving node notes the new node's identifying information and forwards the
announcement to another node chosen randomly from its routing table.
- The announcement continues to propagate until its TTL runs out, as sketched below.
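
A minimal sketch of that propagation, with a deliberately bare Node class invented for illustration:

    import random

    class Node:
        def __init__(self, name):
            self.name = name
            self.known = []   # nodes this node has learned about

        def receive_announcement(self, new_node, ttl: int):
            # record the newcomer's identity, then forward while the TTL lasts
            if new_node not in self.known:
                self.known.append(new_node)
            candidates = [n for n in self.known if n is not new_node]
            if ttl > 0 and candidates:
                random.choice(candidates).receive_announcement(new_node, ttl - 1)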

4.1.6 Training Routes


- Nodes' routing tables should become specialised in handling clusters of similar keys, because each
node will mostly receive requests for keys similar to the keys it is associated with in other
nodes' routing tables.
- When those requests succeed, the node learns about previously unknown nodes that can supply
such keys and creates new routing entries for them.
- As the node gains more experience in handling queries for those keys, it will answer them
successfully more often and, in a positive feedback loop, get asked about them more often.
- Nodes' data stores should likewise specialise in storing clusters of files with similar keys: because
inserts follow the same paths as requests, similar keys tend to cluster in the nodes along those
paths. Files cached after requests cluster similarly, because most requests will be for similar keys.
- Well-known nodes tend to see more requests and become even better connected.

4.1.7 Managing Storage


- Given finite disk space, the system must sometimes decide which files to keep.
- Each node orders the files in its data store by time of last request; when a new file arrives
that cannot fit in the available space, the node deletes the least recently requested files until
there is room (a sketch follows this list).
- Because routing table entries are smaller than files, they can be kept around longer.
- Evicted files don't necessarily disappear right away, because the node can respond to a later
request for the file by using its routing table to contact the original data holder, which might be
able to supply another copy.
- Nodes holding data-holder pointers might see only a few local requests for a file, but nodes higher
up the tree receive requests from a larger part of the network, which keeps their copies more
popular.
- The query-routing mechanism automatically creates more copies in an area of the network
where a file is requested, and the tree grows in that direction.
- Files that go unrequested in another part of the network are subject to deletion. As that part of
the tree shrinks, space is freed up for other files.
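
A minimal sketch of that least-recently-requested eviction policy, using an OrderedDict as the store (the class and method names are assumptions for illustration):

    from collections import OrderedDict

    class DataStore:
        def __init__(self, capacity_bytes: int):
            self.capacity = capacity_bytes
            self.used = 0
            self.files = OrderedDict()  # key -> bytes, least recently requested first

        def request(self, key):
            if key in self.files:
                self.files.move_to_end(key)  # mark as most recently requested
                return self.files[key]
            return None  # a real node would now route the query onward

        def insert(self, key, data: bytes):
            # evict least recently requested files until the new file fits
            while self.files and self.used + len(data) > self.capacity:
                _, evicted = self.files.popitem(last=False)
                self.used -= len(evicted)
            self.files[key] = data
            self.used += len(data)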

References
[1] On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing, Ian Foster and Adriana
Iamnitchi, Department of Computer Science, University of Chicago, Chicago, IL 60615, and Mathematics
and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439.
[2] Framework for Peer-to-Peer Distributed Computing in a Heterogeneous, Decentralized
Environment, Jerome Verbeke, Neelakanth Nadgir, Greg Ruetsch, and Ilya Sharapov, Sun Microsystems,
Inc., Palo Alto, CA 94303.
[4] A Peer-to-Peer Approach to Resource Location in Grid Environments, Adriana Iamnitchi,
Computer Science Dept., The University of Chicago; Ian Foster and Daniel C. Nurmi, MCS Division,
Argonne National Laboratory.
[6] A Unified Peer-to-Peer Database Framework for Scalable Service and Resource Discovery,
Wolfgang Hoschek, CERN IT Division, European Organization for Nuclear Research.
