School of Computer Science and Technology

Data Mining Project


Section A
Group Members:
1. Abenezer Tariku
2. Amanuel Belaineh
3. Beaman Belay
4. Biniyam Assefa
5. Binyam Edmealem

Submission date – 28/05/22


Submitted to – Dr. Eyob N.
Table of Contents

Introduction
Objective
Methodology
Review related works
Data preparation
Experimental setup
Mining Method
Parameter used
Experimental results and findings
Conclusion
References
Introduction
Shopping is an activity in which a customer browses the goods or services offered by one or
more retailers with the potential intent to purchase a suitable selection of them. Online
shopping has become a major disruptor in the retail industry, as consumers can now search for
product information and place orders across different regions. Online retailers deliver their
products directly to consumers' homes, offices, or wherever they choose. With online shopping,
consumers do not need to visit physical stores, which saves them time and the cost of traveling.
Shopping experiences vary from shopper to shopper. They depend on a variety of factors,
including how the customer is treated, convenience, the type of goods being purchased, and mood.
Grocery shopping is the purchase of food and household supplies sold at a store. Common
grocery items include citrus fruit, tropical fruit, whole milk, pip fruit, rolls/buns, coffee,
yoghurt, and butter.
Understanding customer behavior is essential for retailers: it lets them determine the specific
wants and needs of their customers and provide them with a great shopping experience. The main
problem our project aims to tackle is reducing the time customers spend buying groceries by
enabling retailers to better predict their customers' grocery choices.

Various factors affect customers' buying experience, such as previous experience as a customer,
previous purchases, social characteristics, lifestyle, culture, education, occupation, beliefs,
and others. Using such factors, data mining techniques can help predict how customers will
behave while they are shopping for groceries. Data mining is the process of sorting through
large data sets to identify patterns and relationships that can help solve business problems
through data analysis.

The data mining technique that we used for this project is association rule mining.
Specifically, we used the Apriori and FP-Growth algorithms to predict customers' behavior
while they are shopping for groceries. These two algorithms help determine strong association
rules between customers' purchases.

Objective
I. Use association rule mining to generate strong rules about customer behavior in grocery
shopping
II. Predict which grocery item(s) customers are most likely to buy
III. Predict the demand for and supply of certain grocery items from customer behavior

Methodology
We used the data set from [1] to generate rules using association rule mining techniques, namely the
Apriori and FP-Growth algorithms.
Literature review
[2] Performance study of classification algorithms for consumer online shopping attitudes and behavior
using data mining, by Rana Alaa El-Deen Ahmed, M. Elemam Shehab, Shereen Morsy, and Nermeen
Mekawie.

The sales data in this paper includes information about customers' buying history and the goods or
services offered to customers. Hidden relationships in sales data can be discovered by applying data
mining techniques. Because of the growing popularity and acceptance of e-commerce platforms, users
face an ever-increasing burden in choosing the right product from the large number of online offers.
Users therefore need techniques for personalization and shopping guidance. For a pleasant and
successful shopping experience, users need to know easily, and with high confidence, which products
to buy. The paper comparatively tests eleven data mining classification techniques to find the best
classifier for consumer online shopping attitudes and behavior, using a dataset obtained from a large
online shopping agency. The results show that the decision table classifier and the filtered
classifier give the highest accuracy, while the lowest accuracy is achieved by classification via
association rule mining and simple CART. The paper also provides a recommender system, based on the
decision table classifier, that helps customers find the products they are searching for on some
e-commerce web sites. The recommender system learns from information about customers and products
and provides appropriate personalized recommendations that help customers find the desired products.

[3] A review of data mining techniques for research in online shopping behaviour through frequent
navigation paths, by Wing Lok Yeung, Department of Computing and Decision Sciences, Lingnan
University, Hong Kong, China.

Knowing how consumers navigate online shopping web sites enables retailers not only to design their
sites better for navigation but also to place buying recommendations at strategic points and
personalize the flow of content. Frequent item sets can be derived from browsing histories or
clickstreams with sequence-oriented data mining techniques. The working paper highlights, with
examples, the relevance of frequent navigation paths to online shopping behavior research and
reviews some relevant data mining techniques. The Internet has opened up a new area of consumer
behavior research with a vast trove of data to explore, and data mining makes it possible to study
that data methodically. Yet the methods of research are still developing, and ever-increasing
computational power allows researchers to probe ever deeper into the data. As the Age of Big Data
looms large, there are ample opportunities to apply novel data mining techniques in consumer
behavior research, and the working paper briefly reviews a few of them.

Data preparation
Of the various data mining techniques available, we judged association rule mining to be the most
suitable for this project, because it finds items that are frequently purchased together, which is
the customer behavior we want to analyze and predict. Our data set has no incomplete or missing
values, no outliers, and no unbalanced data. We added a header row to the dataset (CSV file) that
names the columns 1 to 32.

The CSV file was read transaction by transaction, and each transaction was saved as a list. A mapping
was created from the unique items in the dataset to integers so that each item corresponded to a
unique integer. The entire dataset was mapped to integers to reduce storage and computational
requirements. A reverse mapping was created from the integers to the items, so that the item names
could be written in the final output file.
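
The preparation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `load_transactions` and `encode_transactions` are hypothetical, and the sketch assumes one transaction per CSV row with empty cells padding the shorter rows.

```python
import csv
import io


def load_transactions(csv_file, has_header=True):
    """Read one transaction per CSV row, dropping empty padding cells."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row that names the columns
    return [[cell.strip() for cell in row if cell.strip()] for row in reader]


def encode_transactions(transactions):
    """Map each unique item to an integer and build the reverse mapping."""
    item_to_id = {}
    encoded = []
    for transaction in transactions:
        ids = []
        for item in transaction:
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id)  # next unused integer
            ids.append(item_to_id[item])
        encoded.append(ids)
    id_to_item = {i: item for item, i in item_to_id.items()}
    return encoded, item_to_id, id_to_item


# Tiny in-memory stand-in for the real CSV file (assumed layout).
sample = io.StringIO("1,2,3\ncitrus fruit,whole milk,\nwhole milk,yogurt,coffee\n")
transactions = load_transactions(sample)
encoded, item_to_id, id_to_item = encode_transactions(transactions)
print(encoded)
print([id_to_item[i] for i in encoded[1]])
```

The reverse mapping `id_to_item` is what allows the mined rules to be written out with item names rather than integers.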

Experimental setup
The experiments were carried out in Visual Studio Code using the Python programming language. We
used two association rule mining algorithms in this project.

Mining Method
• Apriori algorithm
The Apriori algorithm follows a sequence of steps to generate rules from frequent item sets. It
repeats the join step (combining frequent item sets into larger candidate item sets) and the prune
step (discarding candidates that have an infrequent subset) until no new frequent item sets are
found. Our program allows the user to set the minimum thresholds for support and confidence.

[4] Figure 1. Flowchart of the Apriori algorithm
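
The join and prune steps, together with the support and confidence thresholds, can be sketched as below. This is a simplified illustration rather than the program used in the project; the names `apriori` and `association_rules` are hypothetical, and the toy transactions are invented for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent item set."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join: union pairs of frequent (k-1)-item sets that yield k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]  # supp(A and B) / supp(A)
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, support, confidence))
    return rules


toy = [["milk", "bread"], ["milk", "bread", "butter"], ["milk"], ["bread", "butter"]]
frequent = apriori(toy, min_support=0.5)
for a, b, s, c in association_rules(frequent, min_confidence=0.9):
    print(sorted(a), "->", sorted(b), "support", s, "confidence", c)
```

Looking up `frequent[a]` is safe because every subset of a frequent item set is itself frequent, which is the same property the prune step relies on.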

• FP-Growth algorithm
FP-Growth is an alternative to the Apriori algorithm that is widely used for generating rules from
frequent item sets. Instead of generating candidate item sets, it compresses the transactions into a
frequent-pattern tree (FP-tree) and mines common patterns or correlations from that tree. For
example, grocery store transaction data might contain the frequent pattern that certain grocery
items are usually bought together.
[5] Figure 2. Flowchart of the FP-Growth algorithm
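
A compact sketch of the FP-Growth idea is given below: build an FP-tree from the transactions, then mine each item's conditional pattern base recursively. It is an illustrative simplification, not the project's implementation; the names `Node`, `build_tree`, and `fp_growth` are hypothetical, and transactions are passed as (items, count) pairs so that the same function can handle conditional pattern bases.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links up and down the tree."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Return (header table, item counts) for the frequent items."""
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {item: c for item, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, weight in weighted_transactions:
        # Insert frequent items in descending frequency so prefixes are shared.
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset: absolute support count} for all frequent item sets."""
    header, freq = build_tree(weighted_transactions, min_count)
    patterns = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        patterns[itemset] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        patterns.update(fp_growth(base, min_count, itemset))
    return patterns


toy = [(["milk", "bread"], 1), (["milk", "bread", "butter"], 1),
       (["milk"], 1), (["bread", "butter"], 1)]
for itemset, count in fp_growth(toy, min_count=2).items():
    print(sorted(itemset), count)
```

Unlike Apriori, this sketch never enumerates candidate item sets; it only scans the data to build each (conditional) tree.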

Parameter used
Brief description of our dataset:

The dataset contains 9835 transactions by customers shopping for groceries. It has 169 unique items
and 32 attributes (columns).

Experimental results and findings


Results for different Support and Confidence

1. Support = 0.0045, Confidence = 0.1
   Number of frequent 1-item sets: 121
   Association rules: 2123
   Number of maximal sets: 1113
   Number of closed sets: 1192
   Number of redundant rules: 0
2. Support = 0.05, Confidence = 0.25
   Number of frequent 1-item sets: 28
   Association rules: 4
   Number of maximal sets: 27
   Number of closed sets: 31
   Number of redundant rules: 0
3. Support = 0.05, Confidence = 0.05
   Number of frequent 1-item sets: 28
   Association rules: 6
   Number of maximal sets: 27
   Number of closed sets: 31
   Number of redundant rules: 0
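
The counts of maximal and closed sets reported above follow the standard definitions: a frequent item set is maximal if it has no frequent proper superset, and closed if no proper superset has the same support. A minimal sketch of how these could be computed from a table of frequent item sets is shown below; the function name `closed_and_maximal` and the toy support counts are assumptions made for illustration.

```python
def closed_and_maximal(frequent):
    """Split frequent item sets ({frozenset: support}) into closed and maximal."""
    closed, maximal = [], []
    for itemset, support in frequent.items():
        supersets = [s for s in frequent if itemset < s]  # proper supersets
        if not supersets:
            maximal.append(itemset)
        # A superset with equal support would itself be frequent, so checking
        # only the frequent supersets is enough to decide closedness.
        if all(frequent[s] < support for s in supersets):
            closed.append(itemset)
    return closed, maximal


toy_frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"milk", "bread"}): 2,
    frozenset({"bread", "butter"}): 2,
}
closed, maximal = closed_and_maximal(toy_frequent)
print(len(closed), "closed sets,", len(maximal), "maximal sets")
```

Every maximal set is also closed, which is why the number of closed sets is never smaller than the number of maximal sets.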

We observed that the FP-Growth algorithm is faster than the Apriori algorithm. On the same dataset,
Apriori took approximately 7.52 s to complete, whereas FP-Growth took only 3.87 s. FP-Growth also
used much less memory: Apriori used 16-17 MB, whereas FP-Growth used 7-9 MB. In short, FP-Growth
used less time and less memory in our experiments.
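
The report does not state how the run time and memory figures were collected. One possible way to take such measurements in Python is sketched below, using the standard `time` and `tracemalloc` modules; the helper name `measure` is hypothetical, and `sorted` merely stands in for a mining function.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak memory in MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return result, elapsed, peak / (1024 * 1024)


# Stand-in workload; a real run would pass the Apriori or FP-Growth function.
result, seconds, peak_mb = measure(sorted, list(range(100000, 0, -1)))
print(f"{seconds:.4f} s, peak {peak_mb:.2f} MB")
```

Note that `tracemalloc` only tracks memory allocated by Python and slows execution somewhat, so time and memory are better measured in separate runs when precise timings matter.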
Conclusion

Data mining is a powerful tool that, used with care, can improve consumer satisfaction by
helping retailers offer the best, safest, and most useful products at fair and affordable
prices. Put to good use, it can make a company more competitive and profitable. This project
focused on applying association rule mining to find patterns in customer behavior during
grocery shopping. Association rule mining algorithms can support retailers in determining
whether a customer who bought certain grocery item(s) is likely to buy certain other grocery
items. Such a system can help store owners make decisions about stock provision and marketing
strategy, based on the segments of products that consumers purchase. It can also make sorting
products more convenient for both buyers and sellers.
References
[1] https://www.kaggle.com/datasets/irfanasrullah/groceries

[2] https://www.academia.edu/16818466/Performance_Study_of_Classification_Algorithms_for_Consumer_Online_Shopping_Attitudes_and_Behavior_Using_Data_Mining

[3] https://commons.ln.edu.hk/cgi/viewcontent.cgi?article=1075&context=hkibswp

[4] https://www.softwaretestinghelp.com/apriori-algorithm/

[5] https://www.degruyter.com/document/doi/10.1515/jisys-2020-0146/html
