An Optimized K-Harmonic Mean Based Clustering User Navigation Patterns
R. Gobinath1, M. Hemalatha2
I. INTRODUCTION
The surfing path of a user carries not only the
footprint of the user's navigational history but also the
user's frame of mind while using the web site. The
behavior of the user can be read clearly from the
navigational path, which is very useful for web site
improvement [1]. Web sites are built mainly around the
interests of their users, so they should be organized in
a convenient way that satisfies the users' basic needs.
The access log files gathered from the web server contain
the detailed history of previous users, which is valuable
in web site design [2] [3]. The navigation path chosen by
each user may differ from the others depending on
interest and needs. The strategies used for surfing a web
site may also vary depending on the time spent on the
site and the user's comfort level in using it to gather
information [4] [5] [6].
The web site designer plays a vital role in directing
the user along a short navigation path by applying ideas
collected from previous users' navigation path sequences
[7] [8]. A web mining application can extract from the
access log files the information needed for a better
arrangement of navigational paths by following basic data
mining techniques. Web access log entries are single-line
statements stored on the web server, and they contain the
information needed to analyze users' navigation behavior.
However, they also contain items that are unwanted for
web site personalization. These unwanted items should be
removed from the acquired access log files before
proceeding to the actual web site personalization
process. The time and user
II. BACKGROUND
The proposed framework consists of the following
steps.
1) Data collection
2) Pre-processing
3) Feature extraction
4) Pattern Discovery
5) Pattern analysis
The methodology used in this paper is shown in the
following architecture.
[Figure: architecture of the methodology. Blocks: Access Log Files; Data Pre-Processing; User & Session Identification; Feature Extraction; Sequence Clustering]
A. Data Collection
Web access log files are stored on the web server and
record the navigational history of the users. Each entry
is a single-line statement that contains full information
about a user's progress through the web site. Each
single-line entry holds the following information about
the user who visited.
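As an illustration of how a single-line entry can be split into fields, the following Python sketch parses one line in the Common Log Format. The log format, the sample line, and the name parse_log_line are assumptions made for this example; the paper does not state which format its server used.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [timestamp] "request" status size
# (assumed layout; the source does not specify the server's log format)
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the textual fields that are useful as typed values.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Hypothetical sample line, not taken from the paper's data set.
sample = ('192.0.2.1 - - [10/Oct/2005:13:55:36 +0000] '
          '"GET /software/index.html HTTP/1.0" 200 2326')
entry = parse_log_line(sample)
```

Lines that do not match the pattern return None, so malformed entries can be dropped during pre-processing rather than raising an error.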
Path Navigation
This attribute selection method analyzes how the
users reached a particular web page. The associations
that exist among web page navigations are considered when
analyzing user browsing patterns.
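One simple way to capture the association between consecutively visited pages is to count page-to-page transitions across sessions. The sketch below assumes each session is already available as an ordered list of page paths; the function name count_transitions and the sample sessions are hypothetical and are not the paper's own procedure.

```python
from collections import Counter

def count_transitions(sessions):
    """Count how often page A is immediately followed by page B."""
    transitions = Counter()
    for pages in sessions:
        # Pair each page with its successor within the same session.
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[(current_page, next_page)] += 1
    return transitions

# Hypothetical sessions used only for illustration.
sessions = [
    ["/", "/software/", "/software/suites.html"],
    ["/", "/software/", "/software/text.html"],
    ["/jobs/", "/"],
]
transitions = count_transitions(sessions)
```

Pairs with high counts indicate pages that users tend to visit one after the other.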
First and Last Pages Visited
The pages that are viewed first and last are
identified, showing whether visitors entered the web site
through the main page or from a different page.
Inferring why visitors left after reading these pages
can be difficult: it could be that these pages satisfied
their information requirements, or that they had become
irritated by the time they arrived at these pages.
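Identifying the first and last pages can be sketched as follows, again assuming that sessions are ordered lists of page paths. The names entry_and_exit_pages and entry_share, and the sample sessions, are hypothetical.

```python
from collections import Counter

def entry_and_exit_pages(sessions):
    """Count the first (entry) and last (exit) page of each non-empty session."""
    entries = Counter()
    exits = Counter()
    for pages in sessions:
        if not pages:
            continue  # skip sessions with no recorded page views
        entries[pages[0]] += 1
        exits[pages[-1]] += 1
    return entries, exits

def entry_share(entries, page):
    """Percentage of sessions that started on the given page."""
    total = sum(entries.values())
    return 100.0 * entries[page] / total if total else 0.0

# Hypothetical sessions used only for illustration.
sessions = [["/", "/software/"], ["/", "/jobs/"], ["/jobs/", "/"], []]
entries, exits = entry_and_exit_pages(sessions)
```

Comparing the share of sessions that start at the main page with those that start elsewhere gives the entry statistic described above.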
E. Sequence Clustering
The categorized navigational paths can be arranged in
a certain order for easy analysis. The clustering concept
applied to navigation path grouping differs from that
used by some researchers. The Extensible Markup Language
(XML) has been used to represent sequences of web pages,
in a form named the Log Markup Language [9] [10]. The Log
Markup Language arranges web pages according to the
indices processed in sessions. Using the Log Markup
Language instead of plain statements simplifies the
calculation of the distance between navigational
sequences. The use of a different representation does not
prevent the researcher from using data mining algorithms
in the clustering process. Although web mining follows
the techniques of data mining, the navigational path
clustering adopted by researchers uses a raw clustering
process.
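The text does not say which distance is used between navigational sequences. As one common choice, the sketch below uses the Levenshtein (edit) distance over sequences of page identifiers; this measure is an assumption swapped in for illustration, not the paper's stated method.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of page identifiers."""
    # previous[j] holds the distance between the processed prefix of seq_a
    # and the first j items of seq_b.
    previous = list(range(len(seq_b) + 1))
    for i, page_a in enumerate(seq_a, start=1):
        current = [i]
        for j, page_b in enumerate(seq_b, start=1):
            cost = 0 if page_a == page_b else 1
            current.append(min(
                previous[j] + 1,         # delete page_a
                current[j - 1] + 1,      # insert page_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

# Hypothetical navigation paths used only for illustration.
path_a = ["/", "/software/", "/software/text.html"]
path_b = ["/", "/jobs/", "/software/text.html"]
distance = edit_distance(path_a, path_b)
```

Two paths that differ in a single visited page are at distance 1, while identical paths are at distance 0.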
The raw clustering approach of collecting the groups
can slow down the analysis. The clustering technique
known as k-harmonic means, explained by Bin Zhang et al.
in 1999 [13], is considered for sequence clustering;
their work showed the difference between the k-means and
k-harmonic means clustering algorithms.
K-harmonic clustering
The k-harmonic means (KHM) algorithm is a method
similar to k-means (KM) that arises from a different
objective function [13]. The KHM objective function uses
the harmonic mean of the distances from each navigational
path to the centers of all clusters.
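$$\mathrm{KHM}(X, C) = \sum_{i=1}^{n} \frac{k}{\sum_{j=1}^{k} \frac{1}{\lVert x_i - c_j \rVert^{p}}} \qquad (1)$$
where $X = \{x_1, \dots, x_n\}$ are the navigational paths, $C = \{c_1, \dots, c_k\}$ are the cluster centers, and $p$ is the exponent applied to the distance, following the formulation in [13].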
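The harmonic-mean objective described above can be evaluated with a short Python sketch. It assumes that navigation paths have already been encoded as numeric feature vectors, that the distance is Euclidean, and that the distance is raised to an exponent p. The function name khm_objective, the parameter eps, and the sample data are hypothetical.

```python
import math

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """Sum, over all points, of the harmonic mean of d(x, c)**p over all centers."""
    k = len(centers)
    total = 0.0
    for x in points:
        inverse_sum = 0.0
        for c in centers:
            # eps guards against division by zero when a point sits on a center.
            distance = max(math.dist(x, c), eps)
            inverse_sum += 1.0 / distance ** p
        total += k / inverse_sum
    return total

# Hypothetical two-dimensional points and centers, for illustration only.
points = [(0.0, 0.0), (2.0, 0.0)]
centers = [(1.0, 0.0), (3.0, 0.0)]
value = khm_objective(points, centers, p=2.0)
```

Because a harmonic mean is dominated by its smallest term, each point contributes mostly through its nearest center, while the other centers still carry some weight.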
Cluster sequence
/ (default page)                                 860
/fdsearch/search.pl?(parameters)                 174
/dmcourse/data_mining_course/course_notes.pdf    174
/software/ (default page)                        166
/robots.txt                                      155
/jobs/index.html                                 114
/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf      109
/software/index.html                             102
/phpBB/viewtopic.php?(parameters)                98
The navigation paths most frequently used by users are
shown in TABLE II. The root page accounts for a larger
share of entries than any other path.
TABLE II
USER ENTRY IN PERCENTAGE
Path Listing
Pages/Files                      Visitors    Percentage of visitors
/                                666         16.20%
/software/                       187         4.55%
/jobs/                           136         3.31%
/datasets/                       64          1.56%
/companies/consulting.html       69          1.68%
/dmcourse/data_mining_course/    53          1.29%
/software/suites.html            64          1.56%
/software/visualization.html     52          1.26%
/news/2005/n21/                  57          1.39%
/software/text.html              47          1.14%
IV. CONCLUSION
The challenges faced by the web site designer in
obtaining clear information for easy web site design can
be tackled by applying web mining techniques. The problem
of retrieving the navigational patterns needed for the
web personalization process can be solved with this
method, and the navigational patterns obtained are then
taken for analysis. This paper focus in
REFERENCES
[1] J. Nielsen, Designing Web usability: the practice of
simplicity, Indianapolis, IN: New Riders Press, 2000.
[2] A. Cooper, The inmates are running the asylum.
Indianapolis, IN: SAMS, 1999.
[3] J. Preece, Y. Rogers and H. Sharp, Interaction design,
New York, NY: John Wiley and Sons, Inc, 2002.
[4] M. Graff, Individual differences in hypertext browsing
strategies, Behaviour and Information Technology, Vol.
24, no. 2, 2005.
[5] P. Pirolli and S. Card, Information foraging,
Psychological Review, Vol. 106, no. 4, 1999.
[6] L.D. Catledge and J.E. Pitkow, Characterizing browsing
strategies in the World-Wide Web, Computer Networks
and ISDN Systems, Vol. 27, no. 6, 1995.
[7] J. Holsanova, Tracking multimodal interaction with new
media, Paper presented at the workshop on The Citizen's
Use and Comprehension of Information on the Internet,
Uppsala, 2004.
[8] M.J. Bates, The design of browsing and berrypicking
techniques for the online search interface, Online
Review, Vol. 13, no. 5, 1989.
[9] T. Bray and C.M. Sperberg-McQueen, Extensible markup
language (XML) 1.0, W3C recommendation, Technical
Report REC-xml-19980210, World Wide Web Consortium,
1998.
[10] J.R. Punin, M.S. Krishnamoorthy and M.J. Zaki, Web usage
mining: languages and algorithms, Technical Report,
Rensselaer Polytechnic Institute, 2001.
[11] R. Gobinath and M. Hemalatha, Optimized Feature
Extraction for Identifying user Behavior in Web Mining,
European Journal of Scientific Research, Vol. 105, no. 3,
2012.
[12] R. Gobinath and M. Hemalatha, Improved Preprocessing
Techniques for Analyzing Patterns in Web Personalization
Process, International Journal of Computer Application,
Vol. 58, no. 3, 2013.
[13] B. Zhang, M. Hsu and U. Dayal, K-Harmonic Means - A
Data Clustering Algorithm, Hewlett-Packard Research
Laboratory, 1999.