Cluster Optimization For Improved Web Usage Mining
Volume: 3 Issue: 11
ISSN: 2321-8169
6394 - 6399
________________________________________________________________________________________________________
__________________________________________________*****_________________________________________________
I. INTRODUCTION
of promotional campaigns, optimizing the functionality of Web-based applications, providing more tailored content to viewers, and finding the most effective logical structure for a Web space. This kind of analysis helps in the automatic discovery of significant patterns and associations among huge collections of chiefly semi-structured data stored in Web server and application server access logs and related operational data sources. The goal is to capture, model, and examine the behavioral patterns and profiles of users interacting with a Web site. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed or used by groups of users with common needs or interests [10].

Web usage mining consists of three phases: pre-processing, pattern discovery, and pattern analysis [11]. Fig 1 below shows the sequence of the Web usage mining process.

III. CLUSTERING

Clustering is a data mining method that groups together sets of items having similar characteristics. In the usage domain, there are broadly two basic kinds of clusters: user clusters and page clusters [12]. Clustering of user records (sessions or transactions) is the most common analysis task in Web usage mining and Web analytics. Clustering users tends to create groups exhibiting similar browsing or access patterns. This knowledge is especially helpful for inferring user demographics in order to perform market segmentation in e-commerce applications or to provide personalized Web content to users with similar interests [8]. Further analysis of user groups based on their demographic attributes can lead to the discovery of valuable business intelligence. Moreover, usage-based clustering has also been used to create Web-
general problem can be expressed in linguistic rules, then a fuzzy inference system (FIS) can be built; if it is available as data, or can be learned from simulation or training, then artificial neural networks (ANNs) can be applied [16][17].

IV. OPTIMIZATION THROUGH SWARM INTELLIGENCE

Particle Swarm Optimization (PSO) was originally designed and introduced by Eberhart and Kennedy. The PSO algorithm is a population-based search algorithm modeled on the social behavior of birds, bees, or a school of fish [20]. Originally, swarm intelligence research focused on graphically simulating the graceful and unpredictable choreography of a bird flock. Every individual (particle) is represented as a vector in a multidimensional search space. Each such vector has one assigned vector, called the velocity vector, that determines the subsequent movement of the particle. The PSO also determines how the velocity of a particle is revised: each particle updates its velocity based on its present velocity and the best position explored so far [20]. The PSO procedure is iterated for a fixed number of times or until a minimum error based on the preferred performance index is attained. It has been shown that this simple model can deal with difficult optimization problems efficiently. PSO was initially developed for real-valued spaces, but many problems are defined for discrete-valued spaces where the domain of the variables is finite.

Recently, nature has inspired a family of algorithms known as Swarm Intelligence (SI). It has attracted a number of researchers from the areas of pattern recognition and clustering [21]. Clustering techniques based on SI have reportedly outperformed many classical methods of partitioning a complex real-world dataset. Swarm Intelligence is a relatively new interdisciplinary field of research that has gained huge popularity in recent years. Algorithms belonging to this domain draw inspiration from the collective intelligence emerging from the behavior of a group of social insects (like bees, termites and wasps). When acting as a community, these insects, even with very limited individual capability, cooperatively perform many complex tasks necessary for their survival. Problems such as finding and storing food, or selecting and picking up materials for future usage, require detailed planning, yet are solved by insect colonies without any kind of supervisor or controller. Particle Swarm Optimization (PSO) is another very popular SI algorithm for global optimization over continuous search spaces.

The complex social behavior of ants and other social insects requires multiple levels of recognition. The Ant Nest Mate approach builds on the observation that ants can distinguish nest mates from non-nest mates, which allows them to limit altruism and cooperation to members of their own colony and to protect their colony from exploitation by outsiders. Ants that have the same odor will be in the same nest. The clusters obtained are fed into an ant-based clustering approach that checks the similarity of the pheromone values of the artificial ants. This relies on the fact that ants belonging to the same nest have a similar odor. In this algorithm, clusters are considered as the ants' nests and the URL combinations in each cluster are considered as the artificial ants.

As the size of the clusters keeps increasing due to growth in the number of users or in user interest, it has become necessary to optimize the clusters. Here we introduce a cluster optimizing methodology based on the ants' nest mate recognition ability, which is used for eliminating the data redundancies that may occur after the clustering done by the Web usage mining methods. The Ant Nest Mate approach for cluster optimization is presented to personalize Web page clusters of target users. Hierarchy relationships exist within groups. These complex behaviors can be illustrated by the fact that ants can distinguish between nest mates and non-nest mates, which allows them to limit altruism and cooperation to members of their own colony. The level of interaction and cooperation among ants of different colonies is nearly nil, which protects the colony from exploitation by outsiders.

V. EXPERIMENTAL RESULTS

1) Creating Web log File.
The Windows Firewall log allows advanced users to collect and identify inbound traffic. You can log dropped packets and successful connections. Once logging is turned on, all of the information is written to a file called pfirewall.log. The log file is stored under the %systemroot% (Windows) directory. This log file contains fields such as date, time, action, protocol, src-ip, dst-ip, src-port, dst-port, size, tcpflags, tcpsyn, tcpack, tcpwin, icmptype, icmpcode, info and path. This log file is filtered and pre-processed initially.

Step 1: Take the pfirewall.log file as input.
Step 2: Parse pfirewall.log, selecting only the required attributes, i.e. source IP, destination IP and size (no. of packets).
Step 3: Store the selected attributes in an array.
Step 4: Check which entries have the same destination and source IP and apply the accumulation filter / discretion filter.
Step 5: Display the output in the console.
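The pre-processing steps listed under "Creating Web log File" (Steps 1 to 5) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names `parse_log` and `accumulate` are hypothetical, the sample records are invented, and the "accumulation filter" is assumed to mean summing the size field and counting hits for each (source IP, destination IP) pair.

```python
from collections import defaultdict

# Leading fields of a pfirewall.log record, in the order listed in the text.
FIELDS = ["date", "time", "action", "protocol", "src-ip", "dst-ip",
          "src-port", "dst-port", "size"]


def parse_log(lines):
    """Steps 2-3: keep only (src-ip, dst-ip, size) for each data line."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):    # skip blanks and header comments
            continue
        parts = line.split()
        if len(parts) < len(FIELDS):             # malformed line, ignore it
            continue
        row = dict(zip(FIELDS, parts))
        # A missing size is logged as "-"; treat it as 0 (assumption).
        size = int(row["size"]) if row["size"].isdigit() else 0
        records.append((row["src-ip"], row["dst-ip"], size))
    return records


def accumulate(records):
    """Step 4 (assumed meaning): total size and hit count per (src, dst) pair."""
    totals = defaultdict(lambda: [0, 0])         # pair -> [total_size, hit_count]
    for src, dst, size in records:
        totals[(src, dst)][0] += size
        totals[(src, dst)][1] += 1
    return {pair: tuple(values) for pair, values in totals.items()}


# Invented sample lines, used only to exercise the sketch.
SAMPLE = """#Fields: date time action protocol src-ip dst-ip src-port dst-port size
2015-11-01 10:15:32 ALLOW TCP 192.168.1.5 179.60.192.7 49152 443 60
2015-11-01 10:15:40 ALLOW TCP 192.168.1.5 179.60.192.7 49152 443 40
2015-11-01 10:16:02 DROP UDP 192.168.1.5 10.0.0.9 5353 5353 -
""".splitlines()

if __name__ == "__main__":
    # Step 5: display the accumulated entries in the console.
    for (src, dst), (total, hits) in sorted(accumulate(parse_log(SAMPLE)).items()):
        print(src, dst, total, hits)
```

To process a real log, the lines of an opened pfirewall.log file could be passed to `parse_log` in place of `SAMPLE`.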
time & total accessed counts. Using these input values, a decision is taken on whether to add the data to a cluster.

Clusters as a whole, referred to as the Centroid, are divided into Scouts (Cluster 1), showing similar user access patterns, and Searchers (Cluster 2), showing dissimilar ones.
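One way to picture the Scouts / Searchers split is a simple distance test against the centroid. The text does not specify the similarity measure, so the sketch below rests on assumptions: each destination IP is described by a hypothetical (access time, access count) pair, Euclidean distance stands in for the odor similarity check, and the threshold value is arbitrary. The function names and sample values are invented for illustration.

```python
import math


def centroid(points):
    """Mean point of a list of (access_time, access_count) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def split_by_similarity(records, threshold):
    """Put each record in 'scouts' (near the centroid) or 'searchers' (far)."""
    center = centroid(list(records.values()))
    groups = {"scouts": [], "searchers": []}
    for key, point in records.items():
        distance = math.dist(point, center)      # Euclidean distance (assumed)
        name = "scouts" if distance <= threshold else "searchers"
        groups[name].append(key)
    return groups


# Hypothetical (access_time_seconds, access_count) per destination IP.
records = {
    "10.0.0.1": (10.0, 2),
    "10.0.0.2": (12.0, 3),
    "10.0.0.3": (11.0, 1),
    "10.0.0.4": (60.0, 9),
}
print(split_by_similarity(records, threshold=15.0))
```

With these invented values, the three destinations with short access times fall into the Scouts group and the outlier falls into the Searchers group. In practice the features would likely need scaling before distances are compared.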
VI. PERFORMANCE ANALYSIS
Once the user profiles have been tracked, pie charts are generated to analyze the behavior of each user across the various IP addresses. Fig 8 below represents the user access times, showing which destination IP was accessed and for how long. In the result below, green represents the maximum access time; the corresponding IP address was 179.60.192.7. The remaining IPs were accessed for much shorter durations than this one.
Fig 10: Pie Chart for User Access Sizes
CONCLUSION
REFERENCES
[1] http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/
[2] Lin, C.-W. and Hong, T.-P. (2013), A survey of fuzzy web mining. WIREs Data Mining Knowl Discov, 3: 190-199. doi: 10.1002/widm.1091
[3] Srivastava J, Desikan P and V Kumar, Web Mining-Concepts, Applications & Research Direction, in 2002 Conference.
[4] Abraham, Ajith, He Guo, and Hongbo Liu. Swarm intelligence: foundations, perspectives and applications. Springer Berlin Heidelberg, 2006.
[5] Srivastava J, Desikan P and V Kumar, Web Mining-Accomplishment & Future Directions, in 2004 Conference.
[6] R. Kosala, and H. Blockeel, Web Mining Research: A Survey, SIGKDD Explorations, Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining, Vol. 2, No. 1, pp 1-15, 2000.
[7] Puzis, Yury, et al. "Predictive web automation assistant for people with vision impairments." Proceedings of the 22nd international conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2013.
[8] Mobasher, Bamshad. "Data mining for web personalization." The adaptive web. Springer Berlin Heidelberg, 2007. 90-135.
[9] Qingtian Han, Xiaoyan Gao, Wenguo, Study on Web Mining Algorithm based on Usage Mining, Computer Aided Industrial Design and Conceptual Design, 2008, CAID/CD 2008.
[10] Facca, Federico Michele, and Pier Luca Lanzi. "Mining interesting knowledge from weblogs: a survey." Data & Knowledge Engineering 53.3 (2005): 225-241.
[11] Jaideep Srivastava, Robert Cooley, Mukund Deshpande, Pang-Ning Tan, Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data, ACM SIGKDD, Jan 2000.
[12] www.springer.com/cda/content/.../cda.../9783642539640-c1.pdf
[13] WANG Tong, HE Pi-lian, Web Log Mining by an Improved AprioriAll Algorithm, Proceedings of World Academy of Science, Engineering and Technology, Volume 4, February 2005, ISSN 1307-6884, 2005 WASET.ORG.
[14] Mohd Helmy Abd Wahab, Mohd Norzali Haji Mohd, Hafizul Fahri Hanafi, and Mohamad Farhan Mohamad Mohsin, Data Pre-processing on Web Server Logs for Generalized Association Rules Mining Algorithm, World Academy of Science, Engineering and Technology, 2008.
[15] Kobra Etminani, Mohammad-R. Akbarzadeh-T, and Noorali Raeeji Yanehsari, Web Usage Mining: users' navigational patterns extraction from web logs using Ant-based Clustering Method, in Proc. IFSA-EUSFLAT, 2009.
[16] Roohi, Farhat. "Neuro Fuzzy Approach to Data Clustering: A Framework for Analysis." European Scientific Journal 9.9 (2013).
[17] Cordón, Oscar. Genetic fuzzy systems: evolutionary tuning and learning of fuzzy knowledge bases. Vol. 19. World Scientific, 2001.
[18] R. Cooley, B. Mobasher, and J. Srivastava, Web Mining: Information and Pattern Discovery on the World Wide Web, IEEE Computer Society, 2009, pp. 558
[19] R. Cooley, B. Mobasher, and J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, Knowledge and Information Systems, vol. 1, 1999.
[20] He, Jie, and Hui Guo. "A modified particle swarm optimization algorithm." TELKOMNIKA Indonesian Journal of Electrical Engineering 11.10 (2013): 6209-6215.
[21] Ali, Yasir Hassan, Roslan Abd Rahman, and Raja Ishak Raja Hamzah. "Acoustic emission signal analysis and artificial intelligence techniques in machine condition monitoring and fault diagnosis: a review." Jurnal Teknologi 69.2 (2014).
[22] D. Vasumathi, and A. Govardan, BC-WASPT: Web Access Sequential Pattern Tree Mining, IJCSNS International Journal of Computer Science and Network Security, Vol. 9, June 2009, pp. 569-571.
[23] S. Vijayalakshmi, V. Mohan, S. Suresh Raja, Mining Constraint-based Multidimensional Frequent Sequential Pattern in Web Logs, European Journal of Scientific Research, Vol. 36, pp. 480-490, 2009.
[24] Ming-Syan Chen, Jong Soo Park, Philip S. Yu, Efficient Data Mining for Path Traversal Patterns, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 2, March/April 1998.
[25] F.M. Facca, P.L. Lanzi, Mining interesting knowledge from Weblogs: a survey, Data and Knowledge Engineering, Vol. 53, No. 3, June 2005, pp 225-241.
[26] C. W. Cleverdon, The Cranfield Tests on Index Languages
IJRITCC | November 2015, Available @ http://www.ijritcc.org
_______________________________________________________________________________________