
ISSN: 0374-8588

Volume 21 Issue 14, December 2019

_____________________________________________________________________________________

RETAIL SALES ANALYSIS USING CLUSTERING

Dr. M. Rajeshwari¹, P. R. Bharathi Nandha²

ABSTRACT
This project analyses the sales data of a retail company to determine where the company should
invest to earn more profit. A retail company may have several branches and may sell many
products through both online and offline channels (the channel of sale). Because it sells
through these different channels, the company should analyse its sales and invest more in the
areas that generate the most profit. A profitable area can be a particular branch with high
sales, a best-selling product, or the online or offline channel that yields higher profit. This
analysis helps identify the mode of sale that gives the company more profit and the reasons why
that mode is more profitable. The paper also identifies the areas the company needs to improve,
so that it can focus on them.

Keywords: Retail sales, Clustering, Data analysis, K-means.

_____________________________________________________________________________

I. INTRODUCTION:

Data analysis is the process of inspecting, transforming and modelling data sets to extract
insights that support business decisions, often with the help of machine learning algorithms.
For this work we used the analysis tool WEKA, a data mining software suite developed at the
University of Waikato, New Zealand. It contains tools for data pre-processing, classification,
clustering and regression. Here, Simple K-means clustering is used to analyse the retail sales
data (clustering is the method of grouping objects so that similar objects fall in the same group).
Every retailer's main aim is to earn a high profit. To make purchasing easier for customers,
many retailers extend their business online through mobile apps, websites or social media.
However, not every retailer succeeds online, so it is important to understand what customers
dislike about online shopping.

II. RELATED WORKS

This paper focuses on improving the retail sales of a shop that has both online and offline
establishments in different regions. The main analysis compares sales through the offline and
online channels, and we conclude with steps to improve the channel with lower sales. The paper
examines the relationship between the sales channel and sales revenue. The study may help
retailers that operate both online and offline channels, for example when deciding which offers
or discounts will increase revenue in each sales channel.
Data analytics is the science of examining raw data to draw conclusions from it. It applies an
algorithmic or mechanical process to raw data, whether structured or unstructured, to find
trends, answer questions and derive insights. Data is now everywhere and has become a major
asset, and the amount of digital data is growing rapidly, which makes data analysis a powerful
technology. Data mining is a data analysis technique that uses statistical modelling to extract
insights from data, for example for predictive analysis.
Two main applications of data analysis are predictive analytics and text analytics. Predictive
analytics focuses on forecasting and classification, while text analytics deals with
unstructured text data and is widely applied across sectors. In business in particular, data
analysis plays a major role in making more effective decisions.
Cluster analysis, or clustering, is one of the most commonly used machine learning techniques.
Machine learning (ML) is a branch of artificial intelligence that gives computers the ability to
learn without being explicitly programmed, and it is one of today's most widely used
technologies. It has two major types.
The first is supervised learning, in which an algorithm learns from existing data together with
their target responses, which may be numeric values or labels (strings such as classes or tags),
so that it can predict the target variable, such as next year's sales, when new data is given to it.
The second is unsupervised learning, in which the algorithm is trained on data that is neither
classified nor labelled and must act on that data without guidance. The goal is to group the
unlabelled data by its similarities, patterns and differences without any prior training.
Clustering is a central unsupervised learning technique: it finds clusters (groups of similar
data points) such that the points within each cluster match each other as closely as possible.
In short, clustering is “the process of organizing objects into groups whose members are similar
in some way”.

Clustering algorithms can be applied in many fields, for instance:

• Marketing: finding groups of customers with similar behaviour, given a large database of
customer data containing their properties and past buying records;
• Biology: finding clusters of similar genes in DNA analysis, and segmenting communities in
ecology;
• Libraries: book ordering;
• Retail: grouping the content of a website or the products of a retail business, and
customer segmentation.

An important component of a clustering algorithm is the distance measure between data points.
If all components of the data vectors are in the same physical units, the simple Euclidean
distance may be sufficient to group similar data instances.

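For higher-dimensional data, a popular measure is the Minkowski metric:

$$ d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{d} \lvert x_i - y_i \rvert^{p} \right)^{1/p} $$

where d is the dimensionality of the data and x_i, y_i are the i-th components of the data points
x and y. The Euclidean distance is the special case p = 2, while the Manhattan metric has p = 1.
However, there are no general theoretical guidelines for selecting a measure for a given
application.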
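As a minimal sketch of the metric above, the Python function below computes the Minkowski distance for any order p ≥ 1. The function name `minkowski_distance` is our own illustration, not part of WEKA. Setting p = 1 and p = 2 reproduces the Manhattan and Euclidean special cases.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p (p >= 1) between two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality d")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    # Sum |x_i - y_i|^p over all d dimensions, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a = (0.0, 0.0)
b = (3.0, 4.0)
print(minkowski_distance(a, b, 1))  # Manhattan (p = 1): 7.0
print(minkowski_distance(a, b, 2))  # Euclidean (p = 2): 5.0
```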

III. METHODOLOGY:

The most widely used clustering algorithms are:

• K-means
• Fuzzy C-means
• Hierarchical clustering
• Mixture of Gaussians

Among these algorithms, K-means is the most popular and simplest. It assumes that most of the
data lies near prototypes (points in the data space that represent a group of elements). It
assigns each data point to the most similar cluster and iterates, moving points into better
clusters until the assignments stop changing. It is commonly used in medical imaging, biometrics
and related fields.
The algorithm keeps track of the centroid of each subset and proceeds in simple iterations. The
initial partition is generated randomly, that is, the centroids are initialised to random points
in the data space.
K-means is an iterative algorithm that partitions the dataset into K predefined, distinct,
non-overlapping subgroups (clusters), so that each data point belongs to exactly one group. It
involves the following steps:

• Start by picking k random centroids.
• Assign each point to the nearest centroid.
• Move each centroid to the centre (mean) of its cluster.
• Calculate the distance from each point to the centroids again.
• Move points to the cluster of their nearest centroid and recalculate the centroids.
• Repeat until no point changes cluster, i.e. the total distance of the points from their
cluster centres can no longer be reduced.
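The steps above can be sketched in plain Python as the standard Lloyd iteration. This is an illustrative implementation under our own assumptions (Euclidean distance, initial centroids drawn from the data with a fixed random seed, a hypothetical `kmeans` helper); it is not WEKA's SimpleKMeans, which has its own initialisation and attribute-handling options.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm) on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop when no point changes cluster: the assignment has converged.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

Because the initial centroids are random, different seeds can lead to different local optima; running several seeds and keeping the result with the lowest within-cluster sum of squared distances is a common safeguard.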

S.No. | Variable      | Description                                          | Possible values
1.    | Region        | Region of the customer who bought the product        | String
2.    | Country       | Country of the customer                              | Numeric (min: 100, max: 1750)
3.    | Item type     | Type of the product                                  | Character: Business to Business, Business to Consumer, Business to Government/non-profit
4.    | Sales channel | Mode of the order                                    | String: Offline, Online
5.    | Shipping cost | Cost of shipping the product (online customers only) | Numeric
6.    | Units sold    | Total quantity of the product ordered                | Numeric
7.    | Unit price    | Selling price of one unit of the product             | Numeric
8.    | Unit cost     | Actual cost of one unit of the product               | Numeric
9.    | Total revenue | Total revenue for the product                        | Numeric
10.   | Total cost    | Actual total cost of the product                     | Numeric
11.   | Total profit  | Total profit                                         | Numeric

Table 1: Selected variables from the retail sales records.
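Several attributes in Table 1 are related. Assuming, as the names suggest but the paper does not state explicitly, that the totals are per-unit values multiplied by units sold, that profit is revenue minus cost, and that shipping cost is excluded, one record can be sketched as follows. The class name, field names and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SalesRecord:
    """One row of Table 1 (a subset of its attributes); names are hypothetical."""
    region: str
    item_type: str
    sales_channel: str   # "Online" or "Offline"
    units_sold: int
    unit_price: float    # selling price of one unit
    unit_cost: float     # actual cost of one unit

    @property
    def total_revenue(self) -> float:
        # Assumption: revenue = units sold x unit price.
        return self.units_sold * self.unit_price

    @property
    def total_cost(self) -> float:
        # Assumption: cost = units sold x unit cost (shipping not included).
        return self.units_sold * self.unit_cost

    @property
    def total_profit(self) -> float:
        return self.total_revenue - self.total_cost


rec = SalesRecord("Region A", "Business to Consumer", "Offline",
                  units_sold=10, unit_price=12.5, unit_cost=8.0)
print(rec.total_revenue, rec.total_cost, rec.total_profit)  # 125.0 80.0 45.0
```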

The table above lists the attributes used in the data, with a description and the possible
values of each. Using these attributes, we analysed the frequency of each sales channel value to
find out which channel customers prefer. We also analysed the relationship between the sales
channel and the shipping cost, and then looked for the reasons why the less preferred channel is
chosen less often.
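The frequency analysis described above amounts to counting orders per sales channel and averaging the shipping cost within each channel. A minimal sketch, assuming each order is a dictionary with hypothetical keys `channel` and `shipping_cost`, and that offline orders carry no shipping cost (`None`), since Table 1 marks shipping cost as applicable only to online customers. The records are toy values, not the paper's data.

```python
from collections import Counter, defaultdict


def channel_summary(orders):
    """Count orders per sales channel and average the shipping cost per channel."""
    counts = Counter(o["channel"] for o in orders)
    shipping = defaultdict(list)
    for o in orders:
        cost = o.get("shipping_cost")
        if cost is not None:  # shipping cost applies only to online orders
            shipping[o["channel"]].append(cost)
    total = sum(counts.values())
    return {
        ch: {
            "orders": n,
            "share_pct": 100.0 * n / total,
            "mean_shipping_cost": (sum(shipping[ch]) / len(shipping[ch])) if shipping[ch] else None,
        }
        for ch, n in counts.items()
    }


# Toy records for illustration only; these are not the paper's data.
orders = [
    {"channel": "Offline", "shipping_cost": None},
    {"channel": "Offline", "shipping_cost": None},
    {"channel": "Offline", "shipping_cost": None},
    {"channel": "Online", "shipping_cost": 40.0},
]
print(channel_summary(orders))
```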

IV. RESULT

Cluster     | No. of orders | Percentage
0 (online)  | 75            | 75%
1 (offline) | 25            | 25%
Total samples: 100


Figure 1

Problem 1: Which sales channel (offline or online) do customers prefer?

From the above analysis, it is clear that customers prefer the offline sales channel to the
online one. Drilling further into the plot shows that offline orders generate lower revenue than
online orders; each channel has its own advantages and disadvantages. Analysing the regions and
units sold further, we found that most online purchases come from more distant regions and
involve larger quantities. Based on these insights, products ordered in large quantities, or
high-value products (low quantity but high price), appear to be purchased mainly through the
online channel. A further analysis of shipping cost yielded some strong insights.
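The drill-down by channel, region and units sold is essentially a grouped average. The sketch below uses a hypothetical helper `mean_by` and toy records (not the paper's data) to show how mean revenue per order by channel, and mean units sold by region and channel, could be computed.

```python
from collections import defaultdict
from statistics import mean


def mean_by(orders, group_keys, field):
    """Average `field` over the orders that share the same values of `group_keys`."""
    groups = defaultdict(list)
    for o in orders:
        # The group key is a tuple, e.g. ("Online",) or ("Far", "Online").
        groups[tuple(o[k] for k in group_keys)].append(o[field])
    return {key: mean(values) for key, values in groups.items()}


# Toy records for illustration only; these are not the paper's data.
orders = [
    {"region": "Near", "channel": "Offline", "units_sold": 2, "revenue": 30.0},
    {"region": "Near", "channel": "Offline", "units_sold": 4, "revenue": 50.0},
    {"region": "Far", "channel": "Online", "units_sold": 120, "revenue": 1800.0},
]
print(mean_by(orders, ["channel"], "revenue"))
print(mean_by(orders, ["region", "channel"], "units_sold"))
```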


Figure 2

Problem 2: Why is the online channel less preferred?

As Figure 1 shows, online orders were received mostly in bulk and generate more revenue than the
offline mode of sale. This may be due to the following reasons:
1. Products ordered online cannot easily be transported by customers themselves.
2. Online stores may offer more options and customisation than offline stores, which might
explain the higher revenue.
3. Online ordering may provide order tracking, which offline purchases do not.
Our analysis suggests the following reasons why fewer orders are placed online:
1. Shipping cost is relatively high for small purchases.
2. Some household products are needed quickly. Since online orders take longer to deliver,
offline sales of small products are high.
3. Customers who are unaware of the online mode of sale buy offline.
V. CONCLUSION
In this paper, WEKA is used to analyse the sales data of a retail company using centroid-based
clustering (K-means). The main aim of this paper is to help the company identify the sectors in
which it earns more profit and those in which it needs to improve its sales. We used 100 sales
records containing attributes such as mode of sale, area of sale, profit amounts and product.
Using these data, the analysis provides the company with the information it needs for these
decisions.

