ABSTRACT
This project analyses the sales data of a retail company to identify where the company should invest more to earn more profit. A retail company may have several branches and may sell many products through both online and offline channels (the channel of sale). Because it sells through several channels, the company should analyse where it earns the most profit and invest more in that area. The profitable area may be a particular branch with high sales, a product that sells the most, or a sales channel (online or offline) that yields high profit. This analysis helps identify the mode of sale that gives the company the most profit and the reasons behind it. The paper also identifies areas where the company can improve, so that it can focus on those parts.
I. INTRODUCTION:
Data analysis is the process of inspecting, transforming and analysing data sets to extract insights that support business decisions, for example by applying machine learning algorithms. For this purpose we used WEKA, a data mining software package developed at the University of Waikato, New Zealand. It contains tools for data pre-processing, classification, clustering and regression. Here, Simple K-means clustering (clustering is the method of grouping similar objects into the same group) is used to analyse the retail sales data. Every retailer's main aim is to make a high profit. To make purchasing easier for customers, many retailers extend their business online through mobile apps, websites or social media. However, not every retailer succeeds online, so it is important to know what customers dislike about online shopping.
The paper focuses mainly on improving the retail sales of a shop that has both online and offline outlets in different regions. The main analysis compares sales in the offline and online channels. We conclude with steps to improve the channel that has low
876
ISSN: 0374-8588
Volume 21 Issue 14, December 2019
_____________________________________________________________________________________
sales. This paper examines the relationship between the sales channel and the sales revenue. The study may be helpful for retailers who run both online and offline channels. It can also be useful when planning offers or discounts, for example to decide which offer will increase revenue in each sales channel.
Data analytics is the science of examining raw data to draw conclusions about that information. It analyses raw data, whether structured or unstructured, to find trends and answer questions, applying an algorithmic or mechanical process to derive insights. Nowadays data is everywhere and has become a main asset; the amount of digital data is growing rapidly, which makes data analysis a powerful technology. Data mining is a data analysis technique that uses statistical modelling to extract insights from data for predictive analysis.
Predictive analytics and text analytics are the main applications of data analysis. Predictive analytics focuses mainly on forecasting and classification, while text analytics focuses on unstructured text data and has wide application in all sectors. In today's world, particularly in business, data analysis plays a major role in making more effective decisions.
Cluster analysis, or clustering, is one of the most commonly used techniques in machine learning. Machine learning (ML) is a branch of artificial intelligence that gives computers the ability to learn without being explicitly programmed, and it is one of today's most widely used technologies. ML has two major types.
The first is supervised learning, in which an algorithm learns from existing data together with its target responses, which may be numeric values or labels (strings such as classes or tags), so that it can predict the target variable, such as next year's sales, when new data is given to it.
The second is unsupervised learning, in which the algorithm is trained on data that is neither classified nor labelled and acts on that data without guidance. The aim is to group the unlabelled data by its similarities, patterns and differences without any prior training. Clustering is a main part of unsupervised learning: it finds clusters (groups of similar data points) such that the points within each cluster are as closely matched as possible. In short, clustering is “the process of organizing objects into groups whose members are similar in some way”.
Libraries: book ordering.
Retail: grouping the content of a website or the products of a retail business; customer segmentation.
An important component of a clustering algorithm is the distance measure between data points. If all components of the data instance vectors are in the same physical units, the simple Euclidean distance may be sufficient to group similar data instances successfully. More generally, the Minkowski distance between two instances $\mathbf{x}_i$ and $\mathbf{x}_j$ is

$$d_p(\mathbf{x}_i, \mathbf{x}_j) = \left( \sum_{k=1}^{d} \lvert x_{i,k} - x_{j,k} \rvert^{p} \right)^{1/p}$$

where d is the dimensionality of the data. The Euclidean distance is the special case p = 2, while the Manhattan metric has p = 1. However, there are no general theoretical guidelines for selecting a measure for a given application.
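The Minkowski family of distances can be written as a small Python function. This is a minimal sketch: the function name `minkowski_distance` and the two sample points are hypothetical and are not taken from the paper's data. Setting `p = 2` gives the Euclidean distance and `p = 1` the Manhattan metric.

```python
def minkowski_distance(x, y, p=2.0):
    """Minkowski distance between two equal-length numeric vectors.

    p = 2 gives the Euclidean distance, p = 1 the Manhattan distance.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality d")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    # Sum |x_k - y_k|^p over all d components, then take the p-th root.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Two hypothetical 2-D points, used only for illustration.
a = (1.0, 2.0)
b = (4.0, 6.0)
print(minkowski_distance(a, b, p=2))  # Euclidean: 5.0
print(minkowski_distance(a, b, p=1))  # Manhattan: 7.0
```

Because the distance adds up raw coordinate differences, an attribute measured on a much larger scale than the others dominates the result; this is why the same-units condition above matters, and why such attributes are usually rescaled before clustering.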
III. METHODOLOGY:
Commonly used clustering algorithms include:
K-means
Fuzzy C-means
Hierarchical clustering
Mixture of Gaussians
Among these algorithms, K-means is the most popular and a simple method for clustering. It assumes that most of the data lies near prototypes (elements of the data space that represent a group of elements). It assigns each training point to the most similar cluster and repeats this process iteratively, moving data points into better clusters until the model converges. It is commonly used in medical imaging, biometrics and related fields.
The algorithm keeps track of the centroids of the clusters and proceeds in simple iterations. The initial partition is generated randomly; that is, the centroids are initialized at random points in the data space.
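The iterative procedure described above (random initialization of centroids, assignment of each point to its nearest centroid, recomputation of centroids) can be sketched in plain Python. This is a minimal illustration of the standard Lloyd iteration under our own assumptions (Euclidean distance, k randomly chosen data points as initial centroids, a fixed random seed); it is not WEKA's SimpleKMeans implementation, and the function names and sample points are hypothetical.

```python
import random


def squared_euclidean(x, y):
    # Minimizing the squared distance selects the same nearest centroid
    # as minimizing the Euclidean distance, without the square root.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(points, k, max_iter=100, seed=0):
    """Partition `points` into k non-overlapping clusters.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Random initialization: use k randomly chosen data points as centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the partition is stable
        labels = new_labels

        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    return centroids, labels


# Hypothetical 2-D points forming two well-separated groups (illustration only).
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = k_means(data, k=2)
print(labels)
```

The cluster numbers themselves depend on the random initialization; only the grouping of points is meaningful. Because the result can vary with the starting centroids, K-means is often run several times with different seeds.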
The K-means algorithm is an iterative algorithm that tries to partition the dataset into K predefined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It involves
relationship between the sales channels and shipping cost. Then, we investigated why the less preferred channel is chosen less often.
IV. RESULT
Figure 1
From the above analysis, it is clear that customers prefer the offline sales channel to the online sales channel. Drilling further into the plot, however, shows that offline orders bring in lower revenue than online orders, so each channel has its own advantages and disadvantages. Further analysis of regions and units sold shows that most online purchases come from more distant regions and involve larger quantities. The data suggest that large-quantity orders, and high-value orders (few units but a high price), tend to be placed only through the online channel. Further analysis of shipping cost yielded some strong insights.
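The channel comparison above (number of orders versus units and revenue per channel) amounts to a group-by aggregation. The following is a minimal sketch in plain Python, assuming each order record has the hypothetical fields `channel`, `units` and `revenue`; the function name and the toy records are invented for illustration and are not the paper's data set or WEKA output.

```python
from collections import defaultdict


def summarize_by_channel(orders):
    """Aggregate order count, units and revenue per sales channel.

    Each order is a dict with the hypothetical keys "channel", "units" and
    "revenue"; these field names are assumptions, not the paper's schema.
    """
    stats = defaultdict(lambda: {"orders": 0, "units": 0, "revenue": 0.0})
    for order in orders:
        s = stats[order["channel"]]
        s["orders"] += 1
        s["units"] += order["units"]
        s["revenue"] += order["revenue"]
    # Per-order averages make channels with different order counts comparable.
    for s in stats.values():
        s["avg_units"] = s["units"] / s["orders"]
        s["avg_revenue"] = s["revenue"] / s["orders"]
    return dict(stats)


# Invented toy records for illustration only; NOT the paper's 100-record data set.
orders = [
    {"channel": "Offline", "units": 2, "revenue": 120.0},
    {"channel": "Offline", "units": 1, "revenue": 80.0},
    {"channel": "Offline", "units": 3, "revenue": 100.0},
    {"channel": "Online", "units": 40, "revenue": 900.0},
    {"channel": "Online", "units": 25, "revenue": 600.0},
]

for channel, s in sorted(summarize_by_channel(orders).items()):
    print(f"{channel}: {s['orders']} orders, "
          f"avg {s['avg_units']:.1f} units, avg revenue {s['avg_revenue']:.2f}")
```

Comparing the per-order averages, rather than only the totals, separates "which channel gets more orders" from "which channel earns more per order", the two questions the paragraph above distinguishes.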
Figure 2
As Figure 1 shows, online orders are received mostly in bulk and generate more revenue than offline sales. This may be due to the following reasons:
1. Products bought in bulk cannot easily be transported by customers themselves, so delivery makes ordering them online attractive.
2. Online stores may offer more options to choose from and customise than physical stores, which might explain the higher revenue.
3. Online ordering may offer order-tracking facilities that offline purchases do not.
Our analysis of why fewer orders are placed online suggests the following reasons:
1. Shipping cost is relatively high for small purchases.
2. Some household products are needed quickly. Because online orders take time to be delivered, offline sales of small products are high.
3. Customers who are unaware of the online channel buy offline.
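The first reason for fewer online orders can be made concrete with a short calculation. Assuming, purely for illustration, a flat shipping fee that does not depend on order size (the paper does not state the company's actual fee structure), the fee weighs far more heavily on a small purchase than on a bulk one. The function name and all amounts below are hypothetical.

```python
def shipping_share(order_value, shipping_fee):
    """Return the shipping fee as a percentage of the order value."""
    if order_value <= 0:
        raise ValueError("order_value must be positive")
    return 100.0 * shipping_fee / order_value


# Hypothetical flat fee of 5.0 currency units applied to a small and a bulk order.
FLAT_FEE = 5.0
for value in (20.0, 500.0):
    share = shipping_share(value, FLAT_FEE)
    print(f"order {value:.2f}: shipping is {share:.1f}% of the order value")
```

With these assumed numbers the fee adds 25% to the small order but only 1% to the bulk order, which is consistent with small purchases staying offline while bulk purchases move online.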
V. CONCLUSION
In this paper, WEKA is used to analyse the sales data of a retail company using centroid-based clustering. The main aim is to help the company identify the sectors where it earns more profit and where it needs to improve its sales. We used 100 sales records containing attributes such as mode of sale, area of sale, profit amount and product. Using these data, the project analyses sales and provides the company with the information it needs.