Clickstream Data
June 2014
Submitted by:
Kartik Gupta
201100048
M.C.A
Thapar University
Submitted to:
School of Mathematics and
Computer Application
Department,
Thapar University,
Patiala.
Outline
Overview
Big Data
Hadoop
Major Steps
Results and Analysis
Conclusion and Future Scope
Overview
This project produces an analytic report on the
behavior and location of website visitors using Hadoop.
MapReduce is used to refine and sort
the raw data.
Searching is supported by country, IP
address, postal code and category.
Hadoop handles unstructured, structured and
semi-structured data by converting it into
key/value pairs, which are reduced to a single
value per key and represented in binary format.
The MapReduce framework is used for parallel
implementation.
Big Data
Big Data is a term used to describe large
collections of data that may be unstructured
and that grow so large and so quickly that they
are difficult to manage with regular database or
statistical tools.
3 Vs of Big Data: volume, velocity and variety
Hadoop
Processing (MapReduce)
MapReduce
MapReduce is a programming model and an
associated implementation for processing large
data sets.
MapReduce usually splits the input data set into
independent chunks, which are processed in a
completely parallel manner.
This allows programmers without any experience
with parallel and distributed systems to easily
utilize the resources of a large distributed system.
The run-time system takes care of scheduling
tasks, monitoring them and re-executing failed
tasks.
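The map, group and reduce flow described above can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code: the record layout (IP address, country, category) and the names `RECORDS`, `map_phase`, `shuffle`, `reduce_phase` and `run_job` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical clickstream records: (ip_address, country, category).
RECORDS = [
    ("10.0.0.1", "IN", "shoes"),
    ("10.0.0.2", "US", "books"),
    ("10.0.0.3", "IN", "books"),
    ("10.0.0.4", "IN", "shoes"),
]


def map_phase(record):
    """Emit one (key, value) pair per record: (country, 1)."""
    _ip, country, _category = record
    yield country, 1


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single value."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially over the records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


visits_by_country = run_job(RECORDS)
print(visits_by_country)
```

In a real cluster the map calls run in parallel on separate chunks of the input and the grouping step is handled by the framework; only the map and reduce functions are written by the programmer.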
OTHER TECHNOLOGICAL TERMS
Clickstream Data
Clickstream data is the information trail a user leaves
behind while visiting a website. It is typically captured in
semi-structured website log files.
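As an illustration of what "semi-structured" means in practice, the sketch below parses log lines into records and skips malformed ones. The tab-separated layout (timestamp, IP address, URL, country) and the names `Click` and `parse_line` are assumptions for the example; the actual log format used in the project is not shown here.

```python
from typing import NamedTuple, Optional


class Click(NamedTuple):
    """One parsed log entry (assumed four-field layout)."""
    timestamp: str
    ip: str
    url: str
    country: str


def parse_line(line: str) -> Optional[Click]:
    """Return a Click, or None when the line lacks the four assumed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None
    return Click(*fields)


# Hypothetical raw log lines, including one malformed entry.
raw_log = [
    "2014-06-01 10:00:00\t10.0.0.1\t/home\tIN\n",
    "corrupted line without tabs\n",
    "2014-06-01 10:00:05\t10.0.0.1\t/products/shoes\tIN\n",
]

clicks = [c for c in map(parse_line, raw_log) if c is not None]
for click in clicks:
    print(click.ip, click.url, click.country)
```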
Potential Uses of Clickstream Data
What is the most efficient path for a site visitor to research
a product, and then buy it?
What products do visitors tend to buy together, and what
are they most likely to buy in the future?
Where should I spend resources on fixing or enhancing the
user experience on my website?
We focus on the path optimization use case.
Specifically: how can we improve our website to reduce
bounce rates and improve conversion?
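The two metrics named above can be computed from per-session page paths. The sketch below assumes the common definitions (a bounce is a session with a single page view; a conversion is a session that reaches a goal page); the session data, the `/checkout` goal URL and the function names are hypothetical.

```python
# Hypothetical sessions: session id -> ordered list of visited URLs.
SESSIONS = {
    "s1": ["/home"],
    "s2": ["/home", "/products/shoes", "/checkout"],
    "s3": ["/products/books"],
    "s4": ["/home", "/products/books"],
}

# Assumed goal page that marks a completed purchase.
CHECKOUT_URL = "/checkout"


def bounce_rate(sessions):
    """Fraction of sessions that viewed exactly one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for pages in sessions.values() if len(pages) == 1)
    return bounces / len(sessions)


def conversion_rate(sessions, goal_url):
    """Fraction of sessions that reached the goal page at least once."""
    if not sessions:
        return 0.0
    converted = sum(1 for pages in sessions.values() if goal_url in pages)
    return converted / len(sessions)


print("bounce rate:", bounce_rate(SESSIONS))
print("conversion rate:", conversion_rate(SESSIONS, CHECKOUT_URL))
```

Comparing these values before and after a change to the site gives a simple measure of whether the change helped.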
STEP I
STEP II
STEP III
STEP IV
STEP V
STEP VI
Configuration of Hadoop
Conclusion
The amount of clickstream data is growing
rapidly, and with it the demand for accessing
information over the web has increased
significantly.
This project therefore analyzes the behavior
and location of visitors.
It is inefficient to process large data sets
using traditional sequential methods.
MapReduce is therefore used for processing
large data sets.
Future Scope
Clickstream information plays an important
role in a wide variety of applications, such as
decision support systems and profile-based
marketing.
Location search is used by various industries,
such as telecom and e-commerce, and in event
detection.
The nearest-location method can be combined
with other methods to support better
decision making.
A tradeoff would then be made between
distance and the other factors being combined.