Web Search Personalization Based On Browsing History by Artificial Immune System
Article in International Journal of Advances in Soft Computing and its Applications · January 2010
All content following this page was uploaded by Siti Mariyam Shamsuddin on 21 May 2014.
Abstract
1 Introduction
The World Wide Web (WWW) is the largest and most accessible source of
information. Web structures are usually large and sophisticated, and users often
miss the goal of their inquiry, or receive ambiguous results, when they try to
navigate through them. Users search for a subject on which they need
information, and search engines commonly retrieve the relevant web pages
according to the user query. Users often find a great deal of information on each
subject on the web; however, one of the issues is finding the useful information
in the search results. Therefore, many studies have investigated the importance
of personalization in web search engines [1-4].
Personalization of search results is defined as any action that finds more relevant
pages in the search results list for a particular user or set of users. The objective
of a personalization system is to “provide information that users want or need
exactly, without expecting from them to ask for it explicitly” [5].
Web personalization is one of the fastest-growing segments of the Internet
economy. Because it can reduce information overload and give users a more
customized experience of a web site, search personalization can reduce the time
wasted in finding requested information on the web. Recently, researchers have
used the Artificial Immune System (AIS) as a web personalization technique and
to solve a variety of other problems [6],[7],[8].
AIS is defined by de Castro and Timmis as “adaptive systems, inspired by
theoretical immunology and observed immune functions, principles and models,
which are applied to problem solving” [9]. The immune system is a vast, complex,
interconnected network of agents and processes. While the innate immune system
is of great importance to our wellbeing, it is the adaptive immune system that
most AIS algorithms take inspiration from. As its name would suggest, the
adaptive immune system may change and adapt over time to provide protection
against previously unseen dangers. It is this learning and adaptability that AIS
algorithms seek to exploit.
This paper provides an overview of analyzing user behavior for web search
personalization. In Section 2, we describe the process of personalizing web search
results and the current challenges. Section 3 describes the various techniques used
to generate personalized web search, while Section 4 introduces Artificial Immune
Systems (AIS). Section 5 describes how AIS is applied to web search
personalization, and finally, Section 6 concludes the paper with a discussion of the
current state and future directions.
users with the same query at a certain time. It is therefore natural that search
engines are not designed to adapt to personal preferences.
A personalized system helps each user overcome the problems mentioned above.
First, the system has to extract interest keywords for each user. There are two
ways of finding these keywords: explicitly and implicitly. In the explicit approach,
the user fills in registration forms or rates visited pages, while the implicit
approach finds interest keywords by examining the user's search history and
analyzing the user's web browsing behavior. Second, the personalization system
applies these keywords to find more relevant pages in the search results list by
filtering and re-ranking.
Web search personalization is still in its infancy [11]. Real-world systems that
claim to be doing personalization are often actually offering what we would call
customization: the ability for users to build profiles of preferences for the content
they want to see and the layout it should be displayed in, with users typically
choosing from a fixed set of possibilities. For example, My Yahoo! allows users to
specify which news, stock prices, weather, and sports scores they want displayed
on their My Yahoo! pages. These preferences are stored in a profile that is used to
create the pages each time the user visits [12]. This is what we call customization
rather than personalization.
Google has recently developed personalization and customization techniques for
its users. Previously, Google offered personalized search only to signed-in users
who had web history enabled on their Google accounts. A newer addition enables
customized search results for signed-out users, based on 180 days of search
activity linked to an anonymous cookie in their browser. This data is kept
completely separate from the Google account and web history, which are
available only to signed-in users.
In the Google personalization system¹, Google uses the user's account to keep the
user's search history. It also saves the links of the pages in the search results that
the user has clicked and viewed. In subsequent searches, these previously viewed
pages are placed at the top of the ranking. In most situations this technique
benefits the user, but some researchers have argued that the best indicators of a
user's interest keywords are the elapsed time on, and the number of clicks on, web
pages [13], [14], [15].
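The "previously viewed pages first" behavior described above can be illustrated with a minimal sketch. It is not Google's implementation: the function name `promote_viewed` and the decision to order the previously viewed pages by elapsed reading time (the indicator suggested in [13]-[15]) are assumptions made for illustration only.

```python
from typing import Dict, List


def promote_viewed(results: List[str], dwell_seconds: Dict[str, float]) -> List[str]:
    """Hypothetical re-ranking: previously viewed pages first, longest-read first.

    results       -- URLs in the order returned by the search engine
    dwell_seconds -- elapsed reading time for URLs the user has viewed before
    """
    # sorted() is stable, so viewed pages with equal dwell time keep engine order.
    viewed = sorted((u for u in results if u in dwell_seconds),
                    key=lambda u: -dwell_seconds[u])
    # Pages never viewed keep their original relative order after the viewed ones.
    unviewed = [u for u in results if u not in dwell_seconds]
    return viewed + unviewed


print(promote_viewed(["a", "b", "c", "d"], {"c": 30.0, "a": 5.0}))  # ['c', 'a', 'b', 'd']
```

Using time rather than a simple viewed/not-viewed flag is one way to reflect the elapsed-time indicator; clicks could be folded in the same way.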
In this article, we present an alternative approach to personalizing search results
by analyzing the number of clicks and the elapsed time in the user's browsing
history and past searches. We use elapsed time to identify documents of interest
to a particular user. In our proposed method, AIS is applied as a model to calculate
an affinity function between the user's interest keywords and the search results.
¹ https://www.google.com/psearch
pages to filter and recommend web documents for a new user. Initially, Goldberg
et al. [19] implemented the Tapestry system, which used collaborative filtering for
recommendation. However, recommender systems for large communities
generally cannot assume that each user knows the other individuals; hence, the
Tapestry framework is not appropriate for large communities.
GroupLens [20] was an automated collaborative filtering system that filtered
Usenet news using a k-nearest-neighbor algorithm. In this system, a subset of k
users is chosen based on their similarity to the active user, and a weighted
aggregate of their ratings is used to generate predictions for the active user.
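The neighborhood-based prediction described above can be sketched as follows. This is an illustrative user-based k-nearest-neighbor predictor in the spirit of GroupLens, not its original code: Pearson correlation as the similarity measure, weighting of mean-centered ratings, the default `k = 2`, and the toy ratings are all assumptions.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}; absent key = not rated


def mean(r: Dict[str, float]) -> float:
    return sum(r.values()) / len(r)


def pearson(ratings: Ratings, u: str, v: str) -> float:
    """Pearson correlation computed over the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * \
        sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings: Ratings, active: str, item: str, k: int = 2) -> Optional[float]:
    """Predict the active user's rating of `item` from its k most similar neighbors."""
    # Only users who actually rated the item can serve as neighbors.
    candidates = [(pearson(ratings, active, v), v) for v in ratings
                  if v != active and item in ratings[v]]
    neighbors = sorted(candidates, reverse=True)[:k]
    norm = sum(abs(w) for w, _ in neighbors)
    if norm == 0:
        return None  # no informative neighbor rated this item
    # Weighted aggregate of the neighbors' deviations from their own mean rating.
    offset = sum(w * (ratings[v][item] - mean(ratings[v])) for w, v in neighbors)
    return mean(ratings[active]) + offset / norm


ratings = {
    "a": {"i1": 4, "i2": 2, "i3": 5},
    "b": {"i1": 5, "i2": 1, "i3": 4, "i4": 5},
    "c": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}
print(predict(ratings, "a", "i4"))  # about 4.92
```

Here user "b" agrees with "a" and rates "i4" above its own mean, while "c" disagrees with "a" and rates it below its mean; both push the prediction upward.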
While the two systems mentioned above rely on explicit ratings, other systems rely
on implicit ratings. For example, Morita and Shinoda [15] exploited “time spent
reading” as the main parameter of implicit ratings. PHOAKS (People Helping One
Another Know Stuff) also uses implicit ratings to build a recommender system, by
examining Usenet news postings to find “endorsements” of Web sites [21]. It
creates a listing of the top endorsed web sites in each newsgroup. Like the
implicit-rating systems described above, some recommender systems learn user
preferences transparently, without any extra effort from the users.
In addition, automated collaborative filtering systems have been used with
considerable success at e-commerce sites such as Amazon.com, CDnow.com and
MovieFinder.com.
Collaborative filtering can be formulated as the problem of predicting missing
values in a user-item ratings matrix. Table 1 shows a simplified example of a
user-item ratings matrix; in this case, the rating of “Item i” is to be predicted for
“User a”.
Table 1. A simplified user-item ratings matrix ("-" denotes a missing rating)
User 1    2   3   -   5   1
User 2    -   3   1   1   2
User a    2   1
User k    5   1   -   1   -
to modify both the personal filter and the originating topic filter. Basu et al. [25]
integrated content and collaboration in a framework that treats recommendation
as a classification task. Melville et al. [26] addressed drawbacks of collaborative
filtering systems in their recommender system by exploiting content information
about items already rated.
user clicks one or more documents that look relevant and skips documents that do
not interest the user. If a personalization method can rank the documents relevant
to a user higher in the results list, the user should be more satisfied. Hence,
click-through data can be used to judge search accuracy. Since click-through data
can be collected at low cost, large-scale evaluation is possible.
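As a minimal sketch of click-through-based evaluation, one simple measure is the average position of the clicked documents in a result list (lower is better), in the spirit of the rank-based click-through measures discussed in [34]. The function name and the toy data are assumptions for illustration.

```python
from typing import List, Set


def average_clicked_rank(ranking: List[str], clicked: Set[str]) -> float:
    """Mean 1-based position of the clicked documents in `ranking` (lower is better)."""
    positions = [pos for pos, doc in enumerate(ranking, start=1) if doc in clicked]
    if not positions:
        raise ValueError("none of the clicked documents appears in the ranking")
    return sum(positions) / len(positions)


clicked = {"d4", "d5"}
print(average_clicked_rank(["d1", "d2", "d3", "d4", "d5"], clicked))  # 4.5 (original order)
print(average_clicked_rank(["d4", "d5", "d1", "d2", "d3"], clicked))  # 1.5 (personalized order)
```

A personalization method that moves the documents the user actually clicked toward the top lowers this value, which is the effect the paragraph above uses as a judgment of search accuracy.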
This section briefly describes some capabilities of AIS that correspond to the web
search personalization process.
4.3 Diversity
Diversity is a feature of the immune system and is key to its ability to recognize
and respond to a continuously changing environment: immune system cells can
adapt to different components and structures [36]. Like the immune system, web
pages are diverse; they carry many different information formats and structures,
from simple text to PDF documents. The immune system includes a set of various
cells, each with its own specialized function, and is able to recognize a huge
number of different types of antigen. This metaphor can be used to build a system
in which different types of cells handle different types of data, so that the
information contained in these diverse media can be identified, which is a great
advantage. Finding relevant information in web pages with different structures
requires a powerful tool, and in this paper we found the artificial immune system
to be an appropriate and adaptable model for personalizing search results.
When a personalization system uses web log files for pattern discovery, the main
issue with server-side data is reliability [37]. Client-side data are collected from
the host that is accessing the web site; these data are more reliable and accessible.
One of the most common techniques for acquiring client-side data is to send out a
remote agent, implemented in Java or JavaScript [38]. These agents are embedded
in web pages, for example as Java applets, and collect information directly from
the client, such as the time the user spends on the web site and when the user
leaves it, the list of sites visited before and after the current site, and the user's
navigation history.
In addition to reliability, scalability is a second issue for server-side data. Since a
usage data stream can grow to be huge, trying to discover the new behaviors from
the accumulated log files each time will require significant computational
resources, and could even be impractical or impossible for websites with huge
traffic.
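To illustrate the scalability point, a profile can be updated incrementally from client-side page-view events instead of re-mining the whole accumulated log each time. This is a minimal sketch under stated assumptions: the event fields (`url`, `keywords`, `seconds`) and the class names are hypothetical, and the keyword extraction and the agent that reports dwell time are assumed to exist elsewhere.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, List


@dataclass
class PageVisit:
    """One hypothetical client-side event reported by the browsing agent."""
    url: str
    keywords: List[str]
    seconds: float  # time the user spent on the page


@dataclass
class StreamingProfile:
    """Running per-keyword totals; each event is processed once and then discarded."""
    clicks: DefaultDict[str, int] = field(default_factory=lambda: defaultdict(int))
    dwell: DefaultDict[str, float] = field(default_factory=lambda: defaultdict(float))

    def update(self, visit: PageVisit) -> None:
        # Cost depends only on this event, not on the size of the accumulated log.
        for kw in set(visit.keywords):
            self.clicks[kw] += 1
            self.dwell[kw] += visit.seconds


profile = StreamingProfile()
profile.update(PageVisit("http://example.com/1", ["data", "mining", "data"], 40.0))
profile.update(PageVisit("http://example.com/2", ["mining", "rules"], 20.0))
print(profile.clicks["mining"], profile.dwell["mining"])  # 2 60.0
```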
After a user submits a query to a search engine, the search engine returns search
results according to the submitted query. From the results list, the user may select
a number of web pages according to the information requested. The user may
also access more web pages by following the hyperlinks on a selected web page
and continue browsing. In the proposed approach, the system monitors the user's
browsing history and updates the user profile. The next time the user submits a
query, the personalization system re-ranks the search results based on the
keywords in the user profile. Figure 1 shows the proposed framework for
personalizing search results by analyzing the user's browsing history to construct
a user profile.
To improve the relevance of search results, some researchers use query expansion,
modifying the user query with synonymous words [39-40]. One method of finding
appropriate words to include in a query is to compare the current query with the
user's previous queries using semantic similarity measures. If a previous query is
semantically related to the current query, it can be suggested to the user or used
internally by the search engine to modify the original query [41].
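A minimal sketch of the "compare with previous queries" step is given below. Note the swapped-in technique: [41] measures semantic similarity with the help of web search engines, whereas this sketch uses plain token-overlap (Jaccard) similarity, which is purely lexical. The function names and the 0.3 threshold are arbitrary assumptions.

```python
from typing import List, Optional


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: a lexical stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest_previous(query: str, history: List[str],
                     threshold: float = 0.3) -> Optional[str]:
    """Return the most similar previous query if it clears the (assumed) threshold."""
    best = max(history, key=lambda q: jaccard(query, q), default=None)
    if best is not None and jaccard(query, best) >= threshold:
        return best
    return None


history = ["intelligent data mining", "cheap flights to rome"]
print(suggest_previous("data mining rules", history))  # intelligent data mining
print(suggest_previous("buy a car", history))          # None
```

The suggested query could then be shown to the user or merged into the current query, as the paragraph above describes.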
H. Rastegari et al. 292
5.1 WordNet²
WordNet is a useful lexical database of words, phrases and the relationships
between them. WordNet is described as an attempt to map the human
understanding of words and their synonyms. WordNet has proven useful in data
mining and web mining, and it has been used to improve the performance of
information retrieval systems [42]. On the WWW, different sites covering the same
subject use different terms with the same meaning. For instance, a user looking for
a car may submit a query like “buy a car”, while some web sites use the phrase
“purchase a vehicle”. In the proposed framework, we use WordNet to create a
synonym word vector to overcome this problem.
When the user submits a query, the system finds synonyms of the query phrases
using WordNet and builds a synonym word vector (synset). A search engine such as
Google receives this vector and searches the WWW for all the words or phrases in
it, thereby gathering a more complete set of web pages, which it delivers to the web
search personalization system. The personalization system then selects relevant
pages based on the user's interest words in the user profile.
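The synonym-vector step can be sketched as follows. To keep the example self-contained, a small hand-written synonym table stands in for WordNet; it is a hypothetical stand-in, not WordNet data. With NLTK installed and its WordNet corpus downloaded, the table lookup could instead collect the `lemma_names()` of each synset returned by `nltk.corpus.wordnet.synsets(word)`.

```python
from typing import Dict, List

# Hypothetical stand-in for WordNet lookups (hand-written, not taken from WordNet).
SYNONYMS: Dict[str, List[str]] = {
    "buy": ["purchase"],
    "car": ["auto", "automobile", "motorcar"],
}


def synonym_vector(query: str) -> List[str]:
    """Expand each query term with its synonyms, keeping order and dropping duplicates."""
    vector: List[str] = []
    for term in query.lower().split():
        for word in [term, *SYNONYMS.get(term, [])]:
            if word not in vector:
                vector.append(word)
    return vector


print(synonym_vector("buy a car"))
# ['buy', 'purchase', 'a', 'car', 'auto', 'automobile', 'motorcar']
```

The resulting vector is what would be handed to the search engine, so that pages using "purchase" or "automobile" are retrieved alongside those using the user's own words.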
² http://wordnet.princeton.edu/
usage mining, corresponding to the two software systems interacting during a web
session: data on the web server side and data on the client side.
On the web server side, data are collected and stored in web log files. They consist
primarily of various types of logs generated by the web server. These logs record
the web pages accessed by the visitors of the site. Web mining tools use web
server log files as the main data source for discovering usage patterns. However,
log files cannot always be considered a reliable source of information about the
usage of a site. The problem of data reliability becomes particularly serious for
web personalization, where it is important to identify individual users, in order to
discover their interests.
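A minimal sketch of reading such server-side data is shown below, assuming the logs use the Common Log Format; the sample lines are made up. It also illustrates the reliability problem: requests can only be attributed to a client address, so two users behind one proxy look like a single visitor.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Common Log Format: host ident authuser [time] "METHOD path PROTOCOL" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if the line is malformed."""
    m = CLF.match(line)
    return m.groupdict() if m else None


def pages_per_host(lines: List[str]) -> Counter:
    """Count requests per client address; the address is not necessarily one user."""
    return Counter(rec["host"] for rec in map(parse_line, lines) if rec)


log = [
    '10.0.0.1 - - [10/Oct/2009:13:55:36 +0800] "GET /cars.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2009:13:56:02 +0800] "GET /mining.html HTTP/1.1" 200 5120',
]
print(pages_per_host(log))  # Counter({'10.0.0.1': 2})
```

Both requests are credited to one address even though they might come from two different users sharing it, which is why server logs alone are a weak basis for individual profiles.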
On the client side, data about the user for creating his or her profile can be
gathered both implicitly and explicitly. The simplest method is to collect the user's
preferences explicitly through forms, questionnaires, ratings of search results,
value elicitation and preference feedback [43].
Although explicitly entered profile information is potentially of “high quality” and
provides reliable information about the user, studies have shown that users'
reluctance and lack of motivation to provide information make the explicit
collection of sufficient profile data difficult [44].
Therefore, user data must also be collected implicitly, by inferring preferences
from the user's web browsing activities. This can be implemented using the
following techniques:
• Monitoring the user's past search queries.
• Monitoring the search results actually clicked on by the user; clicking on a
link and spending time on the page indicates its relevance to the user.
• Monitoring the user's browsing patterns.
• Using background information about the user; for example, the user's IP
address gives an idea of his or her geographical location.
Implicit and explicit data collection methods can be used in conjunction with one
another, potentially giving the best of both worlds [44]. For implicit data collection,
the time spent on visited web pages can be calculated. Experimental studies in [15]
showed that a user usually jumps quickly to another page if a page is not
interesting. However, a quick jump might be caused by the short length of the
page; hence, the user's interest might be more appropriately approximated by the
time spent on a page normalized by the page's length. The proposed approach uses
term frequency and word density, based on Equation (1), to extract interest
keywords from web pages.
$$ tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \qquad (1) $$
where $n_{i,j}$ denotes the number of occurrences of term $i$ in web document $d_j$,
and the denominator is the sum of the numbers of occurrences of all terms in $d_j$.
In this paper, we express this value as a percentage of the terms in the web
document and call it word density. Table 2 shows a snapshot of the profile of a
user who submitted the query “intelligent data mining” to the Google search
engine and clicked on some of the links. We analyzed the user's behavior on the
web documents of interest based on the time spent and the number of clicks.
In this table, “weight” indicates the measure of the user's interest. We use the user
profile to find more relevant web documents for the user's next query by
calculating the affinity measure.
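Equation (1), the share of term $i$ among all term occurrences in document $d_j$, together with the length-normalized reading time discussed above, can be sketched as follows. `word_density` follows Equation (1) expressed as a percentage. The `interest_weight` formula (clicks multiplied by seconds per word) is a hypothetical way of combining the two behavioral indicators; the exact weighting behind the “weight” column of Table 2 is not given in this text.

```python
from collections import Counter
from typing import Dict, List


def word_density(tokens: List[str]) -> Dict[str, float]:
    """Equation (1) as a percentage: 100 * n_ij / sum_k n_kj for one document d_j."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: 100.0 * n / total for term, n in counts.items()}


def interest_weight(seconds: float, clicks: int, page_length: int) -> float:
    """Hypothetical interest score: reading time per word, scaled by number of clicks.

    Normalizing by page length keeps a quick exit from a short page from being
    read as a lack of interest.
    """
    return clicks * seconds / max(page_length, 1)


print(word_density(["data", "mining", "data", "rules"]))
# {'data': 50.0, 'mining': 25.0, 'rules': 25.0}
print(interest_weight(seconds=120.0, clicks=2, page_length=600))  # 0.4
```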
relevant, every cell must carry a set of words (user keywords) relevant to the user's
subject. If the system detects that the current web page is relevant, the cell
removes it from the text file and saves it into the user interest file.
In recent work, Secker et al. [8] developed an artificial immune system for
interesting information discovery on the web (AISIID). This system applies an
artificial immune system to collect and rank web pages judged to be interesting
to the user. AISIID uses a population of immune cells and, again, processes
inspired by clonal selection to discover interesting web pages. The user specifies a
small collection of web pages that summarizes his or her own knowledge of the
search subject. Starting from one of the user-specified pages, each cell is given a
position on the web and is free to move, following hyperlinks that may lead it to
other interesting web pages. Each web page it encounters is regarded as an
antigen and is therefore available for affinity evaluation.
The affinity function is a mathematical formulation that measures the similarity
between the user's interest keywords in the user profile and the keywords in the
search result pages. In immune terms, the affinity function expresses how well an
antibody matches an antigen, and hence how readily it can remove it. Equation (2)
calculates the affinity function used to find the similarity between the search
results and the user profile.
(2)
To calculate the affinity, we need two values: relevance and interest. Relevance is
the similarity between the query word vector submitted to the search engine and
the web page keywords in the search results. Interest is the similarity between the
user word vector in the user profile and the web page keywords. Alpha (α) is the
measure of the web page's affinity with new knowledge, and Beta (β) is the
measure of its affinity with the user knowledge saved in the user profile. Note that
if the affinity between the query submitted to the search engine and the keywords
in the user profile is zero, the query is a new subject for the user. In other words:
If Affinity(UWV, QWV) = 0, then the query is a new subject: update the user profile.
The result is the affinity between the antigen and the immune cell, which by
definition is a real number in the range [0, 1]. The relevance of a page is
calculated as shown in Equation (3), where the number of words in the QWV
(query word vector) that also appear on the Web page is counted and then
normalized by the length of the QWV.
$$ \mathrm{Relevance} = \frac{\left|\{\, i : QWV_i \in WPK \,\}\right|}{|QWV|} \qquad (3) $$
where $QWV_i$ is the $i$th component (word) of vector QWV and WPK is the set of
keywords in the Web page result. Likewise, interestingness is calculated as shown
in Equation (4): the UWV (user word vector) is compared against the Web page,
and the number of words present in both the Web page and the UWV is
normalized by the length of the UWV.
$$ \mathrm{Interest} = \frac{\left|\{\, i : UWV_i \in WPK \,\}\right|}{|UWV|} \qquad (4) $$
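Relevance and interest, as described in the text, share one form: the fraction of a word vector's words that appear among the web page keywords. The sketch below therefore uses a single helper for both. Equation (2) itself is not reproduced in this text, so the weighted sum `alpha * relevance + beta * interest` with `alpha + beta = 1` is an assumption, chosen because it keeps the affinity in [0, 1] as stated above. The new-subject check applies the same overlap measure between UWV and QWV; all names and sample words are illustrative.

```python
from typing import List, Set


def overlap_ratio(vector: List[str], keywords: Set[str]) -> float:
    """Share of the vector's words that appear in `keywords` (relevance/interest form)."""
    if not vector:
        return 0.0
    return sum(1 for w in vector if w in keywords) / len(vector)


def affinity(qwv: List[str], uwv: List[str], wpk: Set[str],
             alpha: float = 0.5, beta: float = 0.5) -> float:
    """Assumed form of Equation (2): alpha * Relevance + beta * Interest."""
    relevance = overlap_ratio(qwv, wpk)  # query word vector vs. page keywords
    interest = overlap_ratio(uwv, wpk)   # user word vector vs. page keywords
    return alpha * relevance + beta * interest


def is_new_subject(uwv: List[str], qwv: List[str]) -> bool:
    """Zero overlap between the profile (UWV) and the query (QWV) signals a new subject."""
    return overlap_ratio(qwv, set(uwv)) == 0.0


uwv = ["intelligent", "data", "mining", "rule"]
qwv = ["rule", "generator"]
wpk = {"rule", "generator", "mining"}
print(affinity(qwv, uwv, wpk))              # 0.75
print(is_new_subject(uwv, ["buy", "car"]))  # True
```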
The affinity function is used to calculate the similarity between the search results
and the user profile in order to find the most relevant web documents. Table 3
shows the re-ranking of the search results for “rule generator”, submitted as the
next query by the user whose profile is described above. In this table, after
calculating the affinity function for the 50 links in the search results, we sorted the
list by affinity in descending order and selected the top 10 links.
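The final ordering step can be sketched as follows, assuming the affinity of each result URL has already been computed. The function name and the made-up scores are illustrative; only the "sort descending, keep the top 10 of 50" procedure comes from the text.

```python
from typing import Dict, List


def rerank(affinities: Dict[str, float], top_n: int = 10) -> List[str]:
    """Sort result URLs by descending affinity and keep the first `top_n`.

    `affinities` is assumed to preserve the engine's original order; Python's sort
    is stable even with reverse=True, so ties keep that original order.
    """
    ordered = sorted(affinities, key=affinities.__getitem__, reverse=True)
    return ordered[:top_n]


# 50 made-up results with made-up affinity scores
scores = {f"r{i}": (i % 7) / 6 for i in range(50)}
print(rerank(scores))
# ['r6', 'r13', 'r20', 'r27', 'r34', 'r41', 'r48', 'r5', 'r12', 'r19']
```

Keeping ties in the search engine's original order is a design choice: when affinity cannot distinguish two pages, the engine's own ranking is used as the tie-breaker.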
Figure 2 compares the proposed approach with a common search engine, Google.
The figure indicates that personalization can improve the search results by
analyzing the user's behavior and constructing a profile for each user.
6 Conclusion
In this paper, we have reviewed the use of web browsing history to extract a
user's preferences and create a user profile implicitly. The user profile is applied
to adapt the search results to the user's needs. This approach is attractive because
it allows each user to perform a fine-grained search by capturing changes in that
user's preferences.
This work considers AIS as an alternative model for finding more relevant pages in
the search results. It is suitable for intelligent search personalization, mining the
relevant documents in the search results based on the user's preferences. We
defined an affinity function to detect web pages that interest the user.
Based on the experimental results, web search personalization can improve the
search results for a particular user most of the time, but it needs intelligent
techniques to update the user profile. In future work, location information and
hybridization with collaborative filtering and clustering methods will be
considered.
ACKNOWLEDGMENT
The authors would like to thank Soft Computing Research Group (SCRG) for
their continuous support in making this research a success.
References
[1] Sugiyama, K., K. Hatano, and M. Yoshikawa, "Adaptive web search based
on user profile constructed without any effort from users", Proceedings of
the 13th international conference on World Wide Web, ACM: New York,
NY, USA, (2004), pp.675-684.
[2] Jeh, G. and J. Widom, "Scaling personalized web search", Proceedings of
the 12th international conference on World Wide Web, ACM: Budapest,
Hungary, (2003), pp.271-279.
[3] Tanudjaja, F. and L. Mui, "Persona: A Contextualized and Personalized
Web Search", IEEE Proceedings of the 35th Annual Hawaii International
Conference on System Sciences (HICSS'02), (2002), pp. 1232-1240.
[4] Teevan, J., S.T. Dumais, and E. Horvitz, "Personalizing search via
automated analysis of interests and activities", Proceedings of the 28th
annual international ACM SIGIR conference on Research and
development in information retrieval, ACM: Salvador, Brazil, (2005),
pp.449-456.
[5] Mulvenna, M.D., S.S. Anand, and A.G. Büchner, "Personalization on the
Net using Web mining: introduction", Commun. ACM, 43(8), (2000),
pp. 122-125.
[6] Acilar, A.M. and A. Arslan, "A collaborative filtering method based
on artificial immune network", Expert Systems With Applications, 36(4),
(2009), pp. 8324-8332.
[7] Nazri, M., et al., "A Hybrid Approach for Learning Concept Hierarchy
from Malay Text Using GAHC and Immune Network", in Artificial
Immune Systems. (2009). pp. 315-328.
[8] Secker, A., A.A. Freitas, and J. Timmis, "AISIID: An artificial immune
system for interesting information discovery on the web", Applied Soft
Computing, 8(2), (2008), pp. 885-905.
[9] de Castro, L.N. and J. Timmis, Artificial Immune Systems: A New
Computational Intelligence Paradigm, Springer-Verlag New York, Inc.
368, (2002).
[10] Gauch, S., J. Chaffee, and A. Pretschner, "Ontology-based personalized
search and browsing", Web Intelligence and Agent Systems, 1(3), (2003),
pp. 219-234.
[11] Jeevan, V.K.J. and P. Padhi, "A selective review of research in content
personalization", Library Review, 55(9), (2006), pp. 556-586.
[12] Manber, U., A. Patel, and J. Robison, "Experience with personalization of
Yahoo!", Commun. ACM, 43(8), (2000), pp. 35-39.
[13] Jung, K., Modeling web user interest with implicit indicators, MSc Thesis,
Florida Institute of Technology, (2001).
[14] Claypool, M., et al., "Implicit interest indicators", Proceedings of the 6th
international conference on Intelligent user interfaces. ACM: Santa Fe,
New Mexico, United States, (2001), pp.33-40.
[30] Haveliwala, T.H., et al., "Evaluating strategies for similarity search on the
web", Proceedings of the 11th international conference on World Wide
Web, ACM: Honolulu, Hawaii, USA, (2002), pp. 432-442.
[31] Kamvar, S., T. Haveliwala, and G. Golub, "Adaptive methods for the
computation of PageRank", Linear Algebra and its Applications, 386,
(2004), pp. 51-65.
[32] Shen, Y., L. Xing, and Y. Peng. "Study and application of Web-based data
mining in e-business". Proceedings - SNPD 2007: Eighth ACIS
International Conference on Software Engineering, Artificial Intelligence,
Networking, and Parallel/Distributed Computing, Qingdao, (2007),
pp.812-816.
[33] Sieg, A., B. Mobasher, and R. Burke. "Web search personalization with
ontological user profiles", International Conference on Information and
Knowledge Management, Proceedings, Lisboa, (2007), pp.525-534.
[34] Dou, Z., et al., "Evaluating the Effectiveness of Personalized Web Search".
IEEE Transactions on Knowledge and Data Engineering, 21(8), (2009),
pp. 1178-1190.
[35] Dasgupta, D., "Artificial neural network and artificial immune systems:
similarities and differences", IEEE Proceeding on Systems, Man, and
Cybernetics, (1997), pp. 873-878.
[36] Nurulhuda Firdaus Mohd, A. "Profile Adaptation in Adaptive Information
Filtering: An Immune Inspired Approach", Proceeding of IEEE
conference on Soft Computing and Pattern Recognition, (2009), pp.414-
419.
[37] Pierrakos, D., et al., "Web usage mining as a tool for personalization: A
survey", User Modelling and User-Adapted Interaction, 13(4), (2003),
pp. 311-372.
[38] Shahabi, C., et al., "Yoda: An Accurate and Scalable Web-Based
Recommendation System", Cooperative Information Systems, (2001),
pp. 418-432.
[39] Mitra, M., A. Singhal, and C. Buckley, "Improving automatic query
expansion", Proceedings of the 21st annual international ACM SIGIR
conference on Research and development in information retrieval, ACM:
Melbourne, Australia, (1998), pp.206-214.
[40] Biancalana, C., A. Lapolla, and A. Micarelli, "Personalized Web Search
Using Correlation Matrix for Query Expansion", Web Information
Systems and Technologies, (2009), pp.186-198.
[41] Bollegala, D., Y. Matsuo, and M. Ishizuka, "Measuring semantic similarity
between words using web search engines", Proceedings of the 16th
international conference on World Wide Web, ACM: Banff, Alberta,
Canada, (2007), pp.757-766.
[42] Fellbaum, C., WordNet: An Electronic Lexical Database, MIT Press, (1998).