Enhancing and Analyzing Search Performance in Unstructured Peer To Peer Networks Using Enhanced Guided Search Protocol (EGSP)
R. Anusuya (1), Dr. V. Kavitha (2), Mrs. E. Golden Julie (3)
(1, 2, 3) University Department, Anna University Tirunelveli, Tirunelveli, Tamilnadu, India
Abstract: Peer-to-peer (P2P) networks establish loosely coupled application-level overlays on top of the Internet to facilitate efficient sharing of resources. They can be roughly classified as either structured or unstructured networks. Without stringent constraints on the network topology, unstructured P2P networks can be constructed very efficiently and are therefore considered suitable for the Internet environment. However, the random search strategies adopted by these networks usually perform poorly as the network grows large. This paper seeks to enhance the search performance in unstructured P2P networks by exploiting users' common interest patterns, captured within a probability-theoretic framework termed the user interest model (UIM). A search protocol and a routing table updating protocol are further proposed to expedite the search process by self-organizing the P2P network into a small world. Both theoretical and experimental analyses are conducted, and they demonstrate the effectiveness and efficiency of the approach.
Keywords: Peer-to-Peer Networks, Self-Organization, Users' Common Interest.
1. Introduction
Peer-to-peer (P2P) networks have become, in a short period of time, one of the fastest growing and most popular Internet applications [6]. P2P is a class of applications that takes advantage of resources such as storage, CPU cycles, content, and even human presence available at the edges of the Internet. One fundamental challenge of P2P networks is to achieve efficient resource discovery. These networks can be broadly classified into two categories: structured P2P networks based on a distributed hash table (DHT) [21], and unstructured P2P networks based on diverse random search strategies (e.g., flooding) [3]. Without imposing any stringent constraints on the network topology, unstructured P2P networks can be constructed very efficiently and have therefore seen far more practical use on the Internet [1], [2] than structured networks. Peers in unstructured networks are often termed blind, since they are usually incapable of determining how likely their neighbour peers are to satisfy a given resource query. An undesirable consequence is that the efficiency of distributed resource discovery is compromised. The fundamental idea of this paper is that the statistical patterns over the locally shared resources of a peer can be exploited to guide the distributed resource discovery process and thereby enhance the overall resource discovery performance in unstructured P2P networks.
Journal of Computing, Volume 2, Issue 6, June 2010, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG
Three essential research issues are identified and studied in this paper in order to save peers from their blindness. The first research issue concerns the practicality of modelling users' diverse interests. To address it, we adopt the user interest model (UIM), which is based on a general probabilistic modelling tool termed Conditional Random Fields (CRFs) [14]. With UIM, we are able to estimate the probability that a peer shares a certain resource (file) f_j given that it shares another resource (file) f_i. This estimate further gives rise to an interest distance between any two peers. Conditional random fields are a framework for building probabilistic models to segment and label sequence data. They offer a unique combination of properties: discriminatively trained models for sequence segmentation and labelling; the combination of arbitrary, overlapping, and agglomerative observation features from both the past and the future; efficient training and decoding based on dynamic programming; and parameter estimation guaranteed to find the global optimum. The second research issue concerns the actual exploitation of users' interests as embodied by UIM. For this purpose, a greedy file search protocol is presented for fast resource discovery: whenever a peer receives a query for a file that is not available locally, it forwards the query to the neighbour that has the highest probability of actually sharing that file. The third research issue arises because the search protocol alone is not sufficient to achieve high resource discovery performance. This paper therefore proposes a routing table updating protocol that supports the search protocol by self-organizing the whole P2P network into a small world [11], [16], [18]. In a P2P network, queries handled by a peer may be satisfied by any peer in the network, with uneven probability.
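The greedy forwarding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the routing table is modelled as a list of entries <p', f'> (the structure defined in Section 3.1), and `prob(f_j, f_i)` is a hypothetical stand-in for the UIM estimate of Pr(f_j | f_i), here backed by a toy lookup table.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

CondProb = Callable[[str, str], float]  # hypothetical: prob(f_j, f_i) ~ Pr(f_j | f_i)
Entry = Tuple[str, str]                 # routing entry <p', f'>: neighbour p' shares f'


def choose_next_hop(requested: str, routing_table: Iterable[Entry],
                    prob: CondProb, visited: Set[str]) -> Optional[str]:
    """Return the unvisited neighbour judged most likely to share `requested`.

    Each entry <p', f'> is scored by Pr(requested | f'); peers already in the
    search history are skipped. Returns None if no unvisited neighbour exists.
    """
    best_peer: Optional[str] = None
    best_score = float("-inf")
    for peer, known_file in routing_table:
        if peer in visited:
            continue
        score = prob(requested, known_file)
        if score > best_score:
            best_peer, best_score = peer, score
    return best_peer


# Toy usage: a hand-written table stands in for the UIM estimates.
TOY_PROBS = {("jazz2.mp3", "jazz1.mp3"): 0.6, ("jazz2.mp3", "rock1.mp3"): 0.1}


def toy_prob(f_j: str, f_i: str) -> float:
    return TOY_PROBS.get((f_j, f_i), 0.0)


table = [("p1", "rock1.mp3"), ("p2", "jazz1.mp3")]
print(choose_next_hop("jazz2.mp3", table, toy_prob, visited=set()))  # p2
```

Skipping peers already in the search history keeps the query from bouncing back to where it came from; if every neighbour has been visited, the caller must treat the query as stuck.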
passing intermediary entities. Peer-to-peer (P2P) systems make it possible to harness resources such as the storage, bandwidth, and computing power of large populations of networked computers in a cost-effective manner. P2P is a decentralized, distributed architecture in which all nodes are equivalent: there is no centralized client-server scheme, and a network of equal peer nodes serve as both clients and servers to other nodes. In structured P2P systems, data items are spread across distributed computers (nodes), and the location of each item is determined in a decentralized manner using a distributed hash table (DHT). Structured P2P systems based on the DHT [21] mechanism have proven to be an effective design for resource sharing on a global scale, and many applications have been built on top of them, such as file sharing, distributed file systems, real-time streaming, and distributed processing.
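To make decentralized item placement concrete, the sketch below shows successor-based key assignment on a hash ring, the consistent-hashing scheme used by DHTs such as Chord [25]; Pastry [21] instead assigns a key to the node whose identifier is numerically closest. It is a toy under stated assumptions: `ToyRing`, `ring_id`, the 32-bit identifier space, and the peer names are hypothetical, and the object sees the whole ring instead of routing hop by hop as a real DHT does.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple

ID_BITS = 32  # small identifier space for illustration (Chord uses 160-bit SHA-1 ids)


def ring_id(name: str) -> int:
    """Map a node name or a data key onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class ToyRing:
    """Successor-based key placement, as in consistent hashing.

    For clarity this object knows every node; a real DHT reaches the same
    answer by routing through a logarithmic number of peers with partial state.
    """

    def __init__(self, nodes: Iterable[str]) -> None:
        self.ring: List[Tuple[int, str]] = sorted((ring_id(n), n) for n in nodes)
        self.ids = [node_id for node_id, _ in self.ring]

    def responsible_node(self, key: str) -> str:
        """Return the first node whose id is >= the key's id, wrapping around."""
        idx = bisect.bisect_left(self.ids, ring_id(key))
        return self.ring[idx % len(self.ring)][1]


ring = ToyRing(["peer-a", "peer-b", "peer-c", "peer-d"])
print(ring.responsible_node("song.mp3"))  # the same node on every run
```

Because placement depends only on hash values, any peer can compute where a key lives without a central index, which is exactly the property unstructured networks lack.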
paper. For the purposes of this paper, we found that UIM is expressive enough to model users' common interests and to guide the resource discovery process. In practice, every file shared through a P2P network can be uniquely described by a list of attributes. The key structure for estimating Pr(f_j | f_i) in UIM is the feature function. Each feature function stands for a certain domain-specific criterion that is essential for evaluating Pr(f_j | f_i). If the criterion is satisfied, F(.) returns 1; otherwise, it returns 0. The definition of the feature functions forms the structural core of UIM; it is domain dependent and can be continually learned through model learning algorithms. Based on UIM, Pr(f_j | f_i) is then evaluated from the outputs of these feature functions.
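The exact equation for Pr(f_j | f_i) is not reproduced in this text. Purely as an illustration, the sketch below assumes the standard CRF-style log-linear form, Pr(f_j | f_i) = exp(Σ_k λ_k F_k(f_i, f_j)) / Z(f_i), with binary feature functions F_k, weights λ_k, and a normalising constant Z(f_i) taken over a finite set of candidate files. The attribute names, the `same_attr` feature, and the weight values are hypothetical.

```python
import math
from typing import Callable, Dict, List

File = Dict[str, str]                  # a file described by its attribute list
Feature = Callable[[File, File], int]  # binary feature function F(f_i, f_j) -> 1 or 0


def same_attr(attr: str) -> Feature:
    """Hypothetical feature: 1 if f_i and f_j have the same value for `attr`."""
    def feature(f_i: File, f_j: File) -> int:
        return int(attr in f_i and f_i[attr] == f_j.get(attr))
    return feature


def cond_prob(f_j: File, f_i: File, candidates: List[File],
              features: List[Feature], weights: List[float]) -> float:
    """Pr(f_j | f_i) under an assumed log-linear model normalised over `candidates`."""
    if f_j not in candidates:
        raise ValueError("f_j must be one of the candidate files")

    def potential(f: File) -> float:
        # exp of the weighted sum of feature outputs for the pair (f_i, f)
        return math.exp(sum(w * feat(f_i, f) for feat, w in zip(features, weights)))

    z = sum(potential(f) for f in candidates)  # normalising constant Z(f_i)
    return potential(f_j) / z


features = [same_attr("genre"), same_attr("artist")]
weights = [1.5, 2.0]  # hypothetical; in UIM such weights would come from model learning
f_i = {"genre": "jazz", "artist": "A"}
candidates = [{"genre": "jazz", "artist": "A"},
              {"genre": "jazz", "artist": "B"},
              {"genre": "rock", "artist": "C"}]
for f in candidates:
    print(f, round(cond_prob(f, f_i, candidates, features, weights), 3))
```

Files that satisfy more (and more heavily weighted) criteria relative to f_i receive higher probability, which is the behaviour the guided search relies on.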
structure update through servers will not introduce considerable communication cost. UIM is based on a general probabilistic modelling tool termed Conditional Random Fields (CRFs). In comparison with structural learning, parameter learning usually happens more frequently. UIM serves essentially as a measure of the distance between any two peers or any two files. In cooperation with a proper strategy for updating routing tables, a small-world network that guarantees search efficiency can be formed. (3) The Enhanced Guided Search Protocol: a file search protocol is presented to regulate the activities of every peer p in a P2P network upon receiving a query q = <p_q, f_q, h_q, TTL, t_s, t_e>.
3. Proposed work
3.1 The Updating Routing Table Protocol (URTP)
In this section, a protocol for updating routing tables is presented and analyzed. An uneven updating problem is also highlighted, and a filtering mechanism is introduced to tackle it. This paper considers a loosely connected P2P network. We use p to denote a single peer in the network, and P to denote the set of all peers in the network. The main type of resource, namely a data file, is represented by f. For every peer p, F_p represents the group of files shared by p. In order to conduct distributed search over the P2P network, every peer p locally maintains a list of neighbor peers. This list serves as the routing table of peer p, denoted R_p. There is an upper bound B_r on the size of any routing table R_p, where size is measured as the number of entries in R_p. An entry E_p of R_p is a tuple of two elements, <p', f'>, representing a link from peer p to another peer p' that shares file f'.
To locate (or discover) any requested file, the user of a peer p, denoted u_p, sends a query into the network. A query is a tuple of six elements, q = <p_q, f_q, h_q, TTL, t_s, t_e>. Here, p_q is the peer that issued the query and f_q is the file it requests. h_q records the search history: the list of peers that have previously processed the query, including the issuing peer itself. To prevent a query from incurring too much traffic in the network, its time-to-live (TTL) defines an upper bound on the allowable size of h_q. t_s is the time when the query is issued, and t_e is the time when the query is completed. A query completes successfully if the requested file f_q has been located; conversely, the query fails if the size of h_q exceeds the TTL. Upon receiving a query q, a peer p performs three basic operations: 1) append itself to the search history h_q; 2) search for the requested file f_q among its locally shared files (i.e., its local repository); and 3) forward the query to one of its neighbor peers. Each forwarding operation is termed a hop. When query q finishes, the number of hops (NOP) becomes an important measure of search performance. In practice, we want the NOP for an average search task to be as low as possible, which essentially means that only a small group of peers is involved in processing any query.
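The query tuple and the three per-peer operations above can be sketched as a small simulation loop. `Query`, `run_query`, and the `next_hop` callback are hypothetical names; the forwarding policy is passed in (in the guided protocol it would pick the neighbor with the highest UIM probability), the default TTL of 7 is an arbitrary example value, and treating a dead end (no unvisited neighbor) as failure is an assumption of this sketch.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Query:
    """q = <p_q, f_q, h_q, TTL, t_s, t_e>."""
    origin: str                                       # p_q: peer that issued the query
    file: str                                         # f_q: requested file
    history: List[str] = field(default_factory=list)  # h_q: peers that processed q
    ttl: int = 7                                      # upper bound on len(h_q) (example value)
    t_s: float = field(default_factory=time.time)     # time the query is issued
    t_e: Optional[float] = None                       # time the query is completed


NextHop = Callable[[str, Query], Optional[str]]  # hypothetical forwarding policy


def run_query(q: Query, shared: Dict[str, Set[str]], next_hop: NextHop) -> Tuple[bool, int]:
    """Drive q from its origin until success or failure; return (success, NOP)."""
    peer: Optional[str] = q.origin
    hops = 0
    while peer is not None:
        q.history.append(peer)                 # 1) append itself to h_q
        if len(q.history) > q.ttl:             # size of h_q exceeds TTL: query fails
            break
        if q.file in shared.get(peer, set()):  # 2) search the local repository
            q.t_e = time.time()
            return True, hops
        peer = next_hop(peer, q)               # 3) forward to a neighbor (None = dead end)
        if peer is not None:
            hops += 1                          # each forwarding operation is one hop
    q.t_e = time.time()
    return False, hops


# Toy usage on a three-peer line a - b - c, where only c shares the file.
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
shared = {"c": {"song.mp3"}}


def first_unvisited(peer: str, q: Query) -> Optional[str]:
    return next((n for n in neighbours[peer] if n not in q.history), None)


print(run_query(Query("a", "song.mp3"), shared, first_unvisited))  # (True, 2)
```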
To summarize, there are two widely used performance metrics for resource discovery in P2P networks: NOP and search success rate. Search success rate refers to the proportion of queries that have been successful among all the queries issued by network users.
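Both metrics are simple aggregates over finished queries. The sketch below computes them from hypothetical (success, hops) pairs, such as those produced by a query simulation; averaging NOP over all finished queries, rather than over successful ones only, is an assumption.

```python
from statistics import mean
from typing import List, Tuple


def summarize(results: List[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (average NOP, search success rate) for a batch of finished queries."""
    if not results:
        raise ValueError("need at least one finished query")
    success_rate = sum(1 for ok, _ in results if ok) / len(results)
    avg_nop = float(mean(hops for _, hops in results))
    return avg_nop, success_rate


print(summarize([(True, 2), (False, 7), (True, 3)]))  # average NOP 4.0, success rate 2/3
```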
The details of our protocol for updating routing tables are as follows. Whenever the search process driven by a query q = <p_q, f_q, h_q, TTL, t_s, t_e> completes successfully, a new routing entry E_p = <p_i, f_q>, indicating that peer p_i shares the queried file f_q, is temporarily added to the routing table R_p of every peer p recorded in the search history h_q. If R_p is not full, no entries of R_p are removed. Otherwise, the size of R_p is reduced to within B_r by deleting one or more selected entries. In our approach, for each routing entry E_p = <p', f'> maintained by peer p, the interest distance d(p, p') between p and p' is evaluated, and the probability of removing that entry is proportional to d(p, p')^r. In contrast to this approach, the three competing strategies for updating routing tables analyzed in this paper are summarized as follows:
1. The LRU strategy. The routing entry least recently used to forward queries is dropped.
2. The ECCR scheme. With a certain probability Pr_e, the least recently used routing entry is dropped; otherwise, the neighbour peer p' that has the longest interest distance from peer p is removed from R_p.
3. The distance-centric (DC) strategy. Either the peer p', which has the longest interest distance from peer p, or another peer p'', which has the second longest distance, is removed from R_p, depending on a probability Pr_d.
To make our analysis tractable, the routing table updating process will be represented through a DLM
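The distance-weighted eviction rule of our approach can be sketched as follows. This is a minimal illustration under stated assumptions: the interest distance `distance(p, p')` is a hypothetical stand-in (UIM's actual distance is not reproduced here), the exponent r is assumed positive so that distant entries are more likely to be dropped, and the file provider is assumed not to add a link to itself. The competing LRU, ECCR, and DC strategies would differ only in how the victim entry is chosen.

```python
import random
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]                 # routing entry <p', f'>
Distance = Callable[[str, str], float]  # hypothetical interest distance d(p, p')


def trim_table(owner: str, table: List[Entry], bound: int, distance: Distance,
               r: float, rng: random.Random) -> None:
    """Delete entries until len(table) <= bound.

    Each deletion picks entry <p', f'> with probability proportional to
    d(owner, p') ** r (r > 0 assumed), so distant neighbours tend to be dropped.
    """
    while len(table) > bound:
        weights = [distance(owner, p) ** r for p, _ in table]
        if sum(weights) > 0:
            victim = rng.choices(range(len(table)), weights=weights, k=1)[0]
        else:  # every distance is zero: fall back to a uniform choice
            victim = rng.randrange(len(table))
        del table[victim]


def urtp_on_success(history: List[str], provider: str, file: str,
                    tables: Dict[str, List[Entry]], bound: int,
                    distance: Distance, r: float, rng: random.Random) -> None:
    """After a successful query, each peer in h_q adds <provider, file> and trims R_p."""
    for peer in history:
        if peer == provider:  # assumption: the provider does not link to itself
            continue
        table = tables.setdefault(peer, [])
        table.append((provider, file))
        trim_table(peer, table, bound, distance, r, rng)


# Toy usage with a hypothetical distance that ignores the owner.
TOY_DIST = {"near": 0.0, "mid": 1.0, "far": 3.0}


def toy_distance(owner: str, other: str) -> float:
    return TOY_DIST.get(other, 1.0)


table = [("near", "f1"), ("mid", "f2"), ("far", "f3")]
trim_table("me", table, bound=2, distance=toy_distance, r=2.0, rng=random.Random(1))
print(table)  # ("near", "f1") always survives because its removal weight is 0
```

Keeping close-interest neighbours while occasionally retaining distant ones is what lets the overlay drift toward small-world structure rather than collapsing into isolated interest clusters.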
4. Performance Evaluation
Compared with guided search and the plain routing protocol, routing table updating with filtering gives the best search performance. Initially, when the number of queries is small, guided search performs well. As the number of queries increases, the filtering mechanism with routing table updating becomes the most suitable choice, giving the best results, of up to 90%. Hence, it improves the search performance of peers. The routing table updating protocol retains past successful search results, which are used for future queries. The updating process can take place every second.
5. Simulation Model
The simulation is based on NS-2, using Tcl and C++. Network simulators such as NS-2 have been used for testing P2P protocols, while for other network simulators, such as OMNeT++, a simulator specifically designed for P2P systems, namely OverSim, has had to be built. The X-axis represents the number of queries and the Y-axis the success rate. Guided search, simple routing, and routing with filtering are compared in terms of search performance with a varying number of queries (50, 100, 150).
6. Simulation Results
Experiments were run using different parameters, protocols, and system settings. The performance analysis presented here is designed to compare the effects of different filtering mechanisms and parameters, such as NOP, success rate, and number of queries, together with P2P protocols, on search performance. Figs. 6.1(a) and 6.1(b) show the results.
Fig. 6.1(a) Performance of filtering mechanism
7. Conclusion
Peer-to-peer networks are autonomously created, self-organizing, decentralized systems that appeal to everyday home computer users. We have shown that these networks can be organized into interest-based communities using simple formation and discovery algorithms. This paper has sought to enhance the search performance in unstructured P2P networks by exploiting users' common interest patterns, captured within a probability-theoretic framework termed the user interest model (UIM); the results indicate that exploiting the statistical patterns over users' common interests effectively improves search performance. A search protocol and a routing table updating protocol were proposed to expedite the search process by self-organizing the P2P network into a small world. The search protocol was shown to be quite efficient in small-world networks, and subsequent analysis further justifies that, by using our routing table updating protocol, the P2P network self-organizes into a small world that guarantees search efficiency.
8. References
[1] Kazaa Media Desktop, http://www.kazaa.com/, 2001.
[2] BitTorrent, http://bitconjurer.org/, 2003.
[3] The Gnutella Website, http://gnutella.wego.com, 2003.
[4] The Napster Website, http://www.napster.com/, 2007.
[5] T.M. Adami, E. Best, and J.J. Zhu, Stability Assessment Using Lyapunov's First Method, Proc. 34th Southeastern Symp. System Theory (SSST '02), pp. 297-301, 2002.
[6] S. Androutsellis-Theotokis and D. Spinellis, A Survey of Peer-to-Peer Content Distribution Technologies, ACM Computing Surveys, vol. 36, no. 4, pp. 335-371, 2004.
[7] V. Cholvi, P.A. Felber, and E.W. Biersack, Efficient Search in Unstructured Peer-to-Peer Networks, European Trans. Telecomm., vol. 15, no. 6, 2004.
[8] E. Cohen, A. Fiat, and H. Kaplan, Associative Search in Peer-to-Peer Networks: Harnessing Latent Semantics, Proc. IEEE INFOCOM, 2003.
[9] E. Herskovits, Computer-Based Probabilistic-Network Construction,
[10] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge Univ. Press, 1990.
[11] H. Jin, X. Ning, and H. Chen, Efficient Search for Peer-to-Peer Information Retrieval Using Semantic Small World, Proc. Int'l Conf. World Wide Web (WWW '06), pp. 1003-1004, 2006.
[12] M. Khambatti, K. Ryu, and P. Dasgupta, Structuring Peer-to-Peer Networks Using Interest-Based Communities, Proc. Int'l Workshop Databases, Information Systems and Peer-to-Peer Computing (P2PDBIS), 2003.
[13] J. Kleinberg, The Small-World Phenomenon: An Algorithmic Perspective, Proc. 32nd ACM Symp. Theory of Computing (STOC), 2000.
[14] J. Lafferty, A. McCallum, and F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, Proc. 18th Int'l Conf. Machine Learning (ICML), 2001.
[15] A. Löser, S. Staab, and C. Tempich, Semantic Social Overlay Networks, IEEE J. Selected Areas in Comm., vol. 25, no. 1, pp. 5-14, 2007.
[16] M. Li, W. Lee, and A. Sivasubramaniam, Semantic Small World: An Overlay Network for Peer-to-Peer Search, Proc. 12th IEEE Int'l Conf. Network Protocols (ICNP), 2004.
[17] E.K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, A Survey and Comparison of Peer-to-Peer Overlay Network Schemes, IEEE Comm. Surveys and Tutorials, vol. 7, no. 2, pp. 72-93, 2005.
[18] G.S. Manku, M. Bawa, and P. Raghavan, Symphony: Distributed Hashing in a Small World, Proc. Fourth USENIX Symp. Internet Technologies and Systems (USITS), 2003.
[19] M. Mitzenmacher, The Power of Two Choices in Randomized Load Balancing, IEEE Trans. Parallel and Distributed Systems, vol. 12, no. 10, pp. 1094-1104, Oct. 2001.
[20] C. Plaxton, R. Rajaraman, and A. Richa, Accessing Nearby Copies of Replicated Objects in a Distributed Environment, Proc. Ninth Ann. ACM Symp. Parallel Algorithms and Architectures, 1997.
[21] A. Rowstron and P. Druschel, Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems, Proc. IFIP/ACM Int'l Conf. Distributed Systems Platforms (Middleware), 2001.
[22] P. Smyth, D. Heckerman, and M. Jordan, Probabilistic Independence Networks for Hidden Markov Models, Neural Computation, vol. 9, no. 2, pp. 227-269, 1997.
[23] P. Spirtes and C. Meek, Learning Bayesian Networks with Discrete Variables from Data, Proc. ACM SIGKDD, 1995.
[24] K. Sripanidkulchai, B. Maggs, and H. Zhang, Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems, Proc. IEEE INFOCOM, 2003.
[25] I. Stoica, R. Morris, D. Liben-Nowell, D.R. Karger, M.F. Kaashoek, F. Dabek, and H. Balakrishnan, Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications, IEEE/ACM Trans. Networking, vol. 11, no. 1, pp. 17-32, 2003.
[26] S. Tewari and L. Kleinrock, Optimal Search Performance in Unstructured Peer-to-Peer Networks with Clustered Demands, IEEE J. Selected Areas in Comm., vol. 25, no. 1, 2007.
[27] B. Yang and H. Garcia-Molina, Efficient Search in Peer-to-Peer Networks, Proc. 22nd IEEE Int'l Conf. Distributed Computing Systems (ICDCS), 2002.
[28] H. Zhang, A. Goel, and R. Govindan, Using the Small-World Model to Improve Freenet Performance, Computer Networks, vol. 46, pp. 555-574, 2004.
[29] B.Y. Zhao, L. Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz, Tapestry: A Resilient Global-Scale Overlay for Service Deployment, IEEE J. Selected Areas in Comm., vol. 22, no. 1, pp. 41-53, 2004.
Ms. R. Anusuya received her B.E. degree in Computer Science and Engineering in 2006 from Anna University Chennai and her M.E. degree in Computer Science and Engineering in 2010 from Anna University Tirunelveli, Tirunelveli, India. Her areas of interest are network security, mobile computing, computer networks, and software engineering. She has presented many papers at national and international conferences in various fields. As part of this work, she is developing routing protocols optimized for wired networks that can support shortest-path routing and improved search performance. She is a member of ISTE.
Dr. V. Kavitha obtained her B.E. degree in Computer Science and Engineering in 1996 from MS University and her M.E. degree in Computer Science and Engineering in 2000 from Madurai Kamaraj University. She was a university rank holder in UG and a gold medalist in PG. She received her PhD degree in Computer Science and Engineering from Anna University Chennai in 2009. Since 1996 she has served in the Department of Computer Science & Engineering in various designations. Presently she is working as Assistant Professor in the Department of CSE at Anna University Tirunelveli. In addition, she is the Director In-Charge of University V.O.C. College of Engineering, Tuticorin. Currently, ten research scholars are pursuing PhDs under her guidance, on a full-time or part-time basis. Her research interests are wireless networks, mobile computing, network security, wireless sensor networks, image processing, and cloud computing. She has published many papers in national and international journals in areas such as network security, mobile computing, wireless network security, and cloud computing. She is a life member of ISTE.
Mrs. E. Golden Julie received her B.E. degree in Computer Science and Engineering in 2005 from Madurai Kamaraj University and her M.E. degree in Computer Science and Engineering in 2008 from Anna University Chennai. Currently she is pursuing her PhD at Anna University Tirunelveli. She has published many papers in various fields. Her research areas include data mining, grid computing, mobile computing, wireless networks, and image processing. She is a member of ISTE.