Customer Data Analysis

This document summarizes a presentation on using customer data to improve demand forecasting. It describes the retail transaction dataset used, which contains over 500,000 records. Key attributes include invoices, products, prices, customers and countries. The solution segments customers based on purchase history and uses machine learning to predict which customers will buy certain products. Decision trees achieved the best accuracy on new test data.

Customer data analysis

Presentation for Hackerearth Sigma-Thon 1.0


Demand Forecasting -> FMCG

Problem statement
The goal is to come up with an analytical solution for better demand
forecasting by mining insights from the marketplace, consumer, and
competitor data.
Dataset Used

 The dataset is a transnational data set containing all the transactions
that occurred between 01/12/2010 and 09/12/2011 for a UK-based, registered
non-store online retailer. The company mainly sells unique all-occasion gifts.
Many of the company's customers are wholesalers.
 Data Set Characteristics:
 Multivariate, Sequential, Time-Series
 Number of Instances: 541909
 Area: Business
 Attribute Characteristics: Integer, Real
 Number of Attributes: 8
 Date Donated: 2015-11-06
Dataset information

 InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned
to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
 StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely
assigned to each distinct product.
 Description: Product (item) name. Nominal.
 Quantity: The quantities of each product (item) per transaction. Numeric.
 InvoiceDate: Invoice Date and time. Numeric, the day and time when each
transaction was generated.
 UnitPrice: Unit price. Numeric, Product price per unit in sterling.
 CustomerID: Customer number. Nominal, a 5-digit integral number uniquely
assigned to each customer.
 Country: Country name. Nominal, the name of the country where each customer
resides.
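The attribute list above can be turned into a small parsing step. The following is a minimal sketch using only the Python standard library; the two sample rows are made up for illustration, and the helper name parse_rows is an assumption, not something from the presentation.

```python
import csv
import io

# Made-up rows in the eight-column layout described above.
SAMPLE = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
100001,10001,GIFT BOX SMALL,6,01/12/2010 08:26,2.50,20001,United Kingdom
C100002,10002,FLOWER POT,-2,01/12/2010 09:41,4.00,20002,United Kingdom
"""

def parse_rows(text):
    """Parse CSV text into dicts with typed fields and a cancellation flag."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        quantity = int(raw["Quantity"])
        unit_price = float(raw["UnitPrice"])
        rows.append({
            "invoice": raw["InvoiceNo"],
            "stock_code": raw["StockCode"],
            "quantity": quantity,
            "unit_price": unit_price,
            "line_total": quantity * unit_price,
            "customer": raw["CustomerID"],
            "country": raw["Country"],
            # An invoice number starting with 'c' (any case) marks a cancellation.
            "cancelled": raw["InvoiceNo"].lower().startswith("c"),
        })
    return rows

rows = parse_rows(SAMPLE)
```

Keeping the cancellation flag as its own field makes it easy to filter cancelled orders out later without touching the invoice number again.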
Our Solution

 We have come up with a model that identifies customer types from their
purchase history and segments the customers accordingly.
 We then use Machine Learning to predict each customer's type and suggest
products to them accordingly.
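To make "purchase history" concrete, each customer can be summarised by a few numbers before any segmentation. The sketch below is only an illustration: the two features (number of orders and total spend) and the function name customer_profiles are assumptions, since the slides do not list the features actually used.

```python
from collections import defaultdict

def customer_profiles(lines):
    """Summarise purchase history per customer.

    lines: iterable of (customer_id, invoice_no, quantity, unit_price).
    """
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, quantity, unit_price in lines:
        invoices[customer].add(invoice)
        spend[customer] += quantity * unit_price
    return {
        customer: {
            "orders": len(invoices[customer]),
            "total_spend": round(spend[customer], 2),
        }
        for customer in invoices
    }

# Made-up invoice lines for two hypothetical customers.
lines = [
    ("20001", "100001", 2, 1.50),
    ("20001", "100001", 1, 4.00),
    ("20001", "100005", 3, 2.00),
    ("20002", "100002", 10, 0.50),
]
profiles = customer_profiles(lines)
```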
Additional information

 What is Customer Segmentation?

Customer segmentation is a process in which we divide the consumer base of the
company into subgroups. We generate the subgroups using specific
characteristics so that the company sells more products with less marketing
expenditure. Before moving forward, we need to understand the basics: what do we
mean by consumer base? What do we mean by segment? How do we generate
the consumer subgroups? Which characteristics do we consider while
segmenting the consumers? Let's answer these questions one by one.
Basically, the consumer base of any company consists of two types of consumers:
 Existing consumers
 Potential consumers
Generally, we need to categorize our consumer base into subgroups. These subgroups
are called segments. We need to create the groups in such a way that each subgroup of
customers has some shared characteristics.
 What is STP?
STP stands for Segmentation-Targeting-Positioning. This approach has three stages.
The points that we handle in each stage are explained as follows:
 Segmentation: In this stage, we create segments of our customer base using their profile
characteristics as well as the features provided in the preceding figure. Once the
segmentation is firm, we move on to the next stage.
 Targeting: In this stage, marketing teams evaluate segments and try to understand which
kind of product is suited to which particular segment(s). The team performs this exercise
for each segment, and finally, the team designs customized products that will attract the
customers of one or many segments. They will also select which product should be offered
to which segment.
 Positioning: This is the last stage of the STP process. In this stage, companies study the
market opportunity and what their product is offering to the customer. The marketing
team should come up with a unique selling proposition. Here, the team also tries to
understand how a particular segment perceives the products, brand, or service. This is a
way for companies to determine how best to position their offering. The marketing and
product teams create a value proposition that clearly explains how their
offering is better than that of any competitor. Lastly, the companies start their campaign,
presenting this value proposition in such a way that the consumer base will be happy
about what they are getting.
Data Analysis

The UK accounts for the largest number of transactions (19857).

The fewest transactions were made by countries such as Brazil and RSA (only 1).
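A count like the one above could be produced as follows. This is a sketch under one assumption: a "transaction" is taken to be a distinct invoice number, so an invoice that spans several rows is counted once. The records are made up.

```python
from collections import defaultdict

def invoices_per_country(records):
    """Count distinct invoice numbers per country, largest first."""
    seen = defaultdict(set)
    for invoice_no, country in records:
        seen[country].add(invoice_no)
    counts = [(country, len(invoices)) for country, invoices in seen.items()]
    # Sort by descending count, then by country name for a stable order.
    return sorted(counts, key=lambda item: (-item[1], item[0]))

# Made-up (InvoiceNo, Country) pairs; one invoice can span several rows.
records = [
    ("100001", "United Kingdom"),
    ("100001", "United Kingdom"),
    ("100002", "United Kingdom"),
    ("100003", "Brazil"),
]
ranking = invoices_per_country(records)
```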
After removing duplicate entries and all the cancelled orders, the order amounts
are distributed as follows:
Next, we grouped the products according to the important words in their
descriptions and clustered them. We then plotted the silhouette score for each
cluster.
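As an illustration of this step, the sketch below encodes each product description as a binary keyword vector and computes the mean silhouette score per cluster from given cluster labels. The keyword list, descriptions and labels are made up, and the clustering algorithm itself is not shown because the slides do not name it. It assumes at least two clusters.

```python
import math

def keyword_vector(description, keywords):
    """1 if the keyword occurs as a word in the description, else 0."""
    words = description.lower().split()
    return [1 if k in words else 0 for k in keywords]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_samples(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b); needs >= 2 clusters."""
    scores = []
    for i, p in enumerate(points):
        # Mean distance from p to each cluster, excluding p itself.
        mean_dist = {}
        for c in set(labels):
            members = [q for j, q in enumerate(points)
                       if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(euclidean(p, q) for q in members) / len(members)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: score is 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

def silhouette_by_cluster(points, labels):
    """Mean silhouette score of the points in each cluster."""
    per_point = silhouette_samples(points, labels)
    result = {}
    for c in set(labels):
        values = [s for s, l in zip(per_point, labels) if l == c]
        result[c] = sum(values) / len(values)
    return result

KEYWORDS = ["box", "pot", "candle"]          # hypothetical keyword list
descriptions = ["red gift box", "blue box", "herb pot", "candle pot"]
labels = [0, 0, 1, 1]                        # hypothetical cluster labels
points = [keyword_vector(d, KEYWORDS) for d in descriptions]
by_cluster = silhouette_by_cluster(points, labels)
```

Scores close to 1 indicate a compact, well-separated cluster; scores near 0 indicate overlap with a neighbouring cluster.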
We also analysed which words are the most frequent in each cluster using
wordclouds.

It is seen that words like 'box' and 'pot' are common in all clusters.
Using PCA, we reduced the dimensionality of the dataset. The plot for amount of
variance explained is:

It is seen that more than 100 Principal Components are needed to explain more
than 90% of the variance.
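The "number of components needed" reading can be reproduced from the variance of each principal component (the eigenvalues of the covariance matrix). The sketch below shows only that arithmetic; the eigenvalues are made up and the function name components_needed is an assumption.

```python
from itertools import accumulate

def components_needed(eigenvalues, target=0.90):
    """Smallest k whose top-k components explain more than `target` of the variance.

    Returns (k, cumulative_ratios). Falls back to all components if the
    target is never exceeded.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = [running / total for running in accumulate(ordered)]
    for k, ratio in enumerate(cumulative, start=1):
        if ratio > target:
            return k, cumulative
    return len(ordered), cumulative

# Made-up per-component variances.
k, cumulative = components_needed([5.0, 3.0, 1.5, 0.5], target=0.90)
```

With a flat spectrum, where no single component dominates, k grows large, which is consistent with needing many components to pass the 90% mark.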
We then built customer segments and again checked the variance explained.
After the segmentation of customers and some hyperparameter tuning, the
customers are classified and grouped. The selected customers are then
labelled, and their data is retained.
This led us to build a machine learning model that can predict which kinds of
customers are most likely to buy certain goods.
Of all the classifiers we tried, the Decision Tree Classifier gave the best
accuracy.
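To show the idea behind a decision tree, here is a minimal sketch of a depth-1 tree (a "stump") that picks the split with the lowest weighted Gini impurity and is then scored on held-out rows. It is not the model from the presentation: the features (number of orders, mean basket value), the labels and the data are all made up, and a real project would more likely use a library implementation of a full tree.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) minimising weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Fit a depth-1 tree and return it as a prediction function."""
    split = best_split(rows, labels)
    if split is None:  # no usable split: always predict the majority class
        constant = majority(labels)
        return lambda row: constant
    f, t = split
    left = majority([y for r, y in zip(rows, labels) if r[f] <= t])
    right = majority([y for r, y in zip(rows, labels) if r[f] > t])
    return lambda row: left if row[f] <= t else right

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)

# Made-up customers: (number of orders, mean basket value) -> segment label.
train_X = [(1, 20.0), (2, 25.0), (8, 300.0), (10, 450.0)]
train_y = [0, 0, 1, 1]
test_X = [(1, 15.0), (9, 400.0)]
test_y = [0, 1]

model = fit_stump(train_X, train_y)
test_accuracy = accuracy(model, test_X, test_y)
```

Scoring on rows that were not used for fitting is what makes the accuracy figure meaningful for new customers.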

The accuracy obtained by the model when tested on some relatively
new data was… which is pretty good.
