Assessed Coursework Coversheet: Leeds University Business School

Leeds University
Business School
Assessed Coursework Coversheet

For use with individual assessed work
Student ID Number: 201456454
Module Code: LUBS5403M
Module Title: Marketing Analytics
Module Leader: Yeyi Liu
Declared Word Count: 2441
Please Note:
Your declared word count must be accurate, and should not mislead. Making a fraudulent statement concerning the
work submitted for assessment could be considered academic malpractice and investigated as such.
If the amount of work submitted is higher than that specified by the word limit or that declared on your word count, this
may be reflected in the mark awarded and noted through individual feedback given to you.
It is not acceptable to present matters of substance, which should be included in the main body of the text, in the
appendices (“appendix abuse”). It is not acceptable to attempt to hide words in graphs and diagrams; only text which
is strictly necessary should be included in graphs and diagrams.
By submitting an assignment you confirm you have read and understood the University of Leeds
Declaration of Academic Integrity
( http://www.leeds.ac.uk/secretariat/documents/academic_integrity.pdf).
Data Analysis Report for a Premium Chocolate Company: Managing
Customer Segmentation, Preference, and Sustainability
Student ID: 201456454
Table of Contents
1. Introduction
2. Managing customer heterogeneity
2.1 Customer segmentation
a. Hierarchical method
b. Kmeans method
c. Mclust method
2.2 Understanding consumers’ perception
3. Customer dynamics
3.1 Consumer sustainability
3.2 Customer lifetime value
4. Company sustainable competitive advantage
4.1 Customer value
4.2 Managing sustainability
5. Summary
6. Future suggestions
7. References
1. Introduction
This report analyses data from a premium chocolate manufacturer (Crafty
Chocolates) to examine customer heterogeneity, customer dynamics, and the company’s
sustainable competitive advantage, using RStudio as the analysis tool.
2. Managing customer heterogeneity

2.1 Customer segmentation
To maximise the return on promotion, this analysis aims to find the most profitable
customer group in which to invest the advertising budget. Using quantitative research
on 378 customers, we determine the one or two best groups to target from each
analysis.
Research on premium chocolate consumers’ perception (Brown, Bakke, & Hopfer, 2020)
found that most consumers buy premium chocolate as a gift or for its emotional
effect. Hence, customers are segmented on the basis of income (Salary: <=3 is
<25000, >3 is >25000) and sustainability preference (the consumer's sustainability
score: <0 is less sensitive to sustainability, >0 is more sensitive to
sustainability) using three methods: hierarchical clustering, a non-hierarchical
method (k-means), and model-based cluster analysis (Mclust). The pros and cons of
each segmentation method are discussed in the summary.
a. Hierarchical method:
In the cluster analysis, the dendrogram shows a clear picture of 4 clusters. The
resulting data frame assigns each observation to a group according to its distance
from the other observations, which means the 378 customers can be divided into 4
groups.
Table 1. Cluster Dendrogram
The hierarchical cluster plot gives a closer look at the different clusters.
Table 2. Hierarchical cluster plot
These two components explain 40.07% of the point variability.
Then we can describe the variables in seg.df data below:

Table 3. Hierarchical method segment
Group  Salary  Choco_Consumption  Sustainability_Score
1      2.316   2.899              -0.416
2      2.606   2.887               0.319
3      2.500   3.093               0.148
4      1.883   2.992              -0.048
The group means from the hierarchical clustering indicate that groups 2 and 3 are
the groups we need to target, because they have the highest salaries and are also
more sensitive to sustainability.
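The hierarchical steps described above can be sketched in R as follows. This is only a minimal sketch: the data are simulated stand-ins for seg.df, and the standardisation and complete-linkage choices are assumptions, since the report does not state which settings were used.

```r
# Simulated stand-in for seg.df; the real survey data is not reproduced here.
set.seed(42)
seg.df <- data.frame(
  Salary               = sample(0:5, 378, replace = TRUE),
  Choco_Consumption    = sample(1:5, 378, replace = TRUE),
  Sustainability_Score = rnorm(378)
)

# Euclidean distances on standardised variables (assumed), then
# complete-linkage hierarchical clustering (assumed).
seg.dist <- dist(scale(seg.df))
seg.hc   <- hclust(seg.dist, method = "complete")

# Cut the dendrogram into 4 groups and summarise each group by its means.
seg.hc.segment <- cutree(seg.hc, k = 4)
seg.means <- aggregate(seg.df, by = list(Group = seg.hc.segment), FUN = mean)
print(seg.means)
```

The aggregate() call produces a table of group means with the same layout as Table 3, although the simulated values will of course differ.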
b. Kmeans method:
This method is better suited to larger data sets.
The result shows 4 groups that differ in their sustainability scores.
Table 4. Boxplot
Table 5. Kmeans method segment

Group  Salary  Choco_Consumption  Sustainability_Score
1      2.404   3.043              -0.281
2      2.579   2.852               0.430
3      2.305   3.011              -0.303
4      1.851   3.021               0.152
The boxplot and the group means suggest that group 2 is the better option: this
group has the highest salary and the highest sustainability score, together with
positive chocolate consumption.
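A minimal sketch of the k-means step, again on simulated data rather than the real survey. The seed, the standardisation, and nstart = 25 are assumptions made for illustration. The boxplot statistics are computed without drawing, so the sketch also runs in a non-interactive session.

```r
set.seed(96743)  # k-means starts from random centres, so fix the seed

# Simulated stand-in for seg.df (hypothetical values).
seg.df <- data.frame(
  Salary               = sample(0:5, 378, replace = TRUE),
  Choco_Consumption    = sample(1:5, 378, replace = TRUE),
  Sustainability_Score = rnorm(378)
)

# Four clusters on standardised variables; several random starts for stability.
seg.k <- kmeans(scale(seg.df), centers = 4, nstart = 25)

# Group means, in the same layout as the segment table.
seg.k.means <- aggregate(seg.df, by = list(Group = seg.k$cluster), FUN = mean)
print(seg.k.means)

# Five-number summaries of the sustainability score per group
# (the numbers behind a boxplot); set plot = TRUE to draw it.
seg.df$Group <- factor(seg.k$cluster)
bp <- boxplot(Sustainability_Score ~ Group, data = seg.df, plot = FALSE)
print(bp$stats)
```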
c. Mclust method:
The Mclust method can suggest the best number of groups into which to separate the
customers.
## Mclust VEV (ellipsoidal, equal shape) model with 7 components:
##
## log-likelihood n df BIC ICL
## -4131.093 378 336 -10256.31 -10260.48
##
## Clustering table:
## 1 2 3 4 5 6 7
## 60 79 48 62 59 49 21
## Compare to 4 group segmentation

## df BIC
## seg.mc 336 10256.31
## seg.mc4 195 10885.80
From this result, we know that 7 clusters are better than 4 clusters. (The 7-group
segmentation has the lower BIC and the higher log-likelihood value.)
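The BIC of the 7-component model can be checked by hand from the reported log-likelihood, sample size, and degrees of freedom. The sketch below also shows why the sign differs between the Mclust summary (-10256.31) and the comparison table (10256.31): mclust defines BIC as 2 logLik - df log(n), where higher is better, while the comparison table appears to use the usual -2 logLik + df log(n) scale, where lower is better.

```r
loglik <- -4131.093   # log-likelihood reported for the 7-component model
n      <- 378         # number of customers
df     <- 336         # number of estimated parameters

bic_stats  <- -2 * loglik + df * log(n)  # usual convention: lower is better
bic_mclust <-  2 * loglik - df * log(n)  # mclust convention: higher is better
print(round(c(bic_stats, bic_mclust), 2))

# Gap to the 4-component model on the "lower is better" scale.
bic_gap <- 10885.80 - bic_stats
print(round(bic_gap, 2))
```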
Table 6. Mclust method segment

Group  Salary  Choco_Consumption  Sustainability_Score
1      2.300   3.117              -0.040
2      2.797   2.975              -0.184
3      3.271   3.479              -0.089
4      2.097   2.839              -0.090
5      0.932   2.576              -0.179
6      2.673   2.776               0.588
7      1.524   3.524               0.410
Focusing on the 7-group result, we recommend group 3, which has the highest salary
and high chocolate consumption. However, group 3 has a negative sustainability
score. For this reason, if the company places more emphasis on sustainable
development, group 6 could be its best choice of existing and potential
customers.
2.2 Understanding consumers’ perception

Crafty Chocolates would like to understand brand perception using the ratings of
56 major chocolate brands by 257 customers. Comparing the customer ranking with the
other variables helps the company find the relationships between the different
variables.
The correlation analysis shows that sugar, butter, and ingredients are highly
correlated, and that vanilla and salt are correlated, so each of these two groups
could be represented by a common factor.
Table 7. Correlation plot
The principal component analysis (PCA) shows that we retain the majority of the
variance in the data if we keep 5 components to compare our brand ranking.
Table 8. PCA brand.pc
Continuing, we see that the first 5 components account for well over half of the
variance.
A closer look at the cumulative proportion of each component:
Comp.1  Comp.2  Comp.3  Comp.4  Comp.5  Comp.6  Comp.7  Comp.8  Comp.9  Comp.10
27.6%   48.1%   63.3%   75.3%   84%     91.4%   97.3%   99.7%   100%    100%
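The cumulative proportions above can be obtained along the following lines. This is a sketch on simulated data: the ten random columns stand in for the real brand attributes, and the 80% threshold used to pick the number of components is an assumption for illustration, not a rule stated in the analysis.

```r
set.seed(1)

# Simulated stand-in: 257 rows and 10 numeric attributes (hypothetical).
brand.sc <- as.data.frame(matrix(rnorm(257 * 10), ncol = 10))

# PCA on the correlation matrix, so every attribute has equal weight.
brand.pc <- princomp(brand.sc, cor = TRUE)

# Cumulative proportion of variance explained by Comp.1, Comp.2, ...
cum.prop <- cumsum(brand.pc$sdev^2) / sum(brand.pc$sdev^2)
print(round(100 * cum.prop, 1))

# Smallest number of components reaching the (assumed) 80% threshold.
n.keep <- which(cum.prop >= 0.80)[1]
print(n.keep)
```

With the real data, the same calculation gives 84% at five components, which is why five components are kept.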
Comparing the positions of the variables on component 1 and component 2 (which
together cover 48%) helps us see more specific groupings of the variables. Sugar,
cocoa percentage, and sweetener can be considered one factor, and the other
variables can be seen as another factor. However, this result is not clear, so we
need to use rotation to confirm the factors.
Table 9. Visualising PCA
In order to load the variables onto two factors, we delete two highly correlated
variables: sweetener (correlated with cocoa percentage) and salt (correlated with
vanilla). After rotation, each variable loads more strongly on one factor than on
the other; for example, brand has a larger loading (in absolute value) on factor 2
than on factor 1. From this result, ingredients, butter, organic, and sugar belong
to factor 1, and brand, rating, cocoa percent, and vanilla belong to factor 2.
                       Factor1   Factor2
Brand                   0.0124   -0.070
Cocoa percent          -0.1157    0.1183
Rating                  0.0351   -0.2156
Counts of ingredients   0.9767    0.2024
Cocoa butter            0.8499   -0.0589
Vanilla                 0.2772    0.9582
Organic                 0.7186   -0.2686
Sugar                   0.2176   -0.0126
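A two-factor solution with rotation can be sketched as below. The data are simulated from two hypothetical latent factors, so the variable grouping is built in by construction and is not the real result; the use of factanal() with varimax rotation is also an assumption, as the report does not name the function used.

```r
set.seed(7)
n  <- 300
f1 <- rnorm(n)   # hypothetical latent factor 1
f2 <- rnorm(n)   # hypothetical latent factor 2

# Six observed variables, three driven by each latent factor (simulated).
ratings <- data.frame(
  ingredients = f1 + rnorm(n, sd = 0.5),
  butter      = f1 + rnorm(n, sd = 0.6),
  organic     = f1 + rnorm(n, sd = 0.7),
  vanilla     = f2 + rnorm(n, sd = 0.5),
  rating      = f2 + rnorm(n, sd = 0.6),
  cocoa       = f2 + rnorm(n, sd = 0.7)
)

# Maximum-likelihood factor analysis with varimax rotation.
fa   <- factanal(ratings, factors = 2, rotation = "varimax")
load <- unclass(loadings(fa))
print(round(load, 3))

# Assign each variable to the factor with the larger absolute loading.
dominant <- colnames(load)[apply(abs(load), 1, which.max)]
print(setNames(dominant, rownames(load)))
```

Assigning each variable to the factor with the larger absolute loading is the same rule used to read the loadings table above.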
We then plot the relationships between the original variables.
Table 10. Variable relation
From this factor analysis, we conclude that customers rank brands on two key
attributes: one is the chocolate's main ingredients, and the other is brand
image.
3. Customer dynamics
In this part, we examine customers’ purchase orientation and lifetime value.
3.1 Consumer sustainability
In order to understand how customer requirements change, we use a dynamic customer
segmentation approach to analyse the data.
In this case, the purchase label is the dependent variable, and country, rural or
urban, GDP, BMI, and children are the five independent variables.
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.457e+00 8.129e-01 6.713 1.91e-11 ***
## country 1.175e-02 2.322e-03 5.061 4.18e-07 ***
## ruralurban 1.206e-01 2.065e-02 5.840 5.23e-09 ***
## GDP 2.807e-05 1.492e-06 18.809 < 2e-16 ***
## BMI -3.206e-01 3.133e-02 -10.233 < 2e-16 ***
## children 1.851e-01 3.619e-02 5.115 3.13e-07 ***
In the coefficient results, the p-values of all five independent variables are
below 0.001 (well under 0.05), so each estimate is statistically significant at the
0.1% level.
We then take the exponential of each coefficient to obtain the odds ratio of each
independent variable, where the odds are the ratio between the purchase probability
and the non-purchase probability.
##> exp(coef(model1))
##(Intercept) country ruralurban GDP BMI children
234.3179038 1.0118223 1.1281484 1.0000281 0.7256828 1.2033729
This result means that if the children value increases by one unit, the odds of the
purchase label are multiplied by about 1.2, which is an increase of about 20%.
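The link between a coefficient and its odds ratio can be sketched as follows. The data are simulated with only two of the five predictors, and the simulation coefficients are hypothetical values chosen to be close to the reported ones; only the last lines use an estimate taken from the output above.

```r
set.seed(123)
n <- 2000

# Simulated customers (hypothetical): number of children and BMI.
cust <- data.frame(children = rpois(n, 1), BMI = rnorm(n, 25, 4))

# Simulate the purchase label from assumed log-odds coefficients.
eta <- -1 + 0.185 * cust$children - 0.32 * (cust$BMI - 25)
cust$purchaselabel <- rbinom(n, 1, plogis(eta))

# Logistic regression, then odds ratios as exp(coefficient).
model1 <- glm(purchaselabel ~ children + BMI, data = cust, family = binomial)
odds.ratio <- exp(coef(model1))
print(round(odds.ratio, 3))

# Using the reported estimate for children: odds are multiplied by exp(0.1851).
or.children <- exp(0.1851)
print(round(or.children, 4))
```

An odds ratio above 1 (children) raises the odds of purchase, while an odds ratio below 1 (BMI) lowers them.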
In the next step, we use anova to compare model1 with another model in order to
check whether model1 is better.
## Analysis of Deviance Table
## Model 1: purchaselabel ~ country + ruralurban + GDP + BMI + children
## Model 2: purchaselabel ~ 1
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
##1 25337 23758
##2 25342 24346 -5 -588 < 2.2e-16 ***
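This result shows that model1 fits significantly better than the intercept-only
model (Model 2): the five predictors reduce the deviance by 588 on 5 degrees of
freedom (p < 2.2e-16).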
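The test in the deviance table can be reproduced by hand from the reported residual deviances and degrees of freedom. This sketch uses only numbers that appear in the output above.

```r
dev.null  <- 24346; df.null  <- 25342   # Model 2: intercept only
dev.model <- 23758; df.model <- 25337   # Model 1: five predictors

# Likelihood-ratio statistic: the drop in deviance, compared against a
# chi-squared distribution with as many degrees of freedom as added terms.
lr.stat <- dev.null - dev.model
lr.df   <- df.null - df.model
p.value <- pchisq(lr.stat, df = lr.df, lower.tail = FALSE)

print(c(statistic = lr.stat, df = lr.df))
print(p.value)
```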
Next, we obtain the estimates of the logistic regression model in order to predict
customer purchase probability.
Table 11. Logistic regression results
We then predict each customer's purchase probability and segment the customers into
buy (probability > 0.5) and not buy (probability < 0.5), and then run the confusion
matrix to compare the predicted purchases with the actual purchase behaviour.
0 1
0 20409 4627
1 221 86
From this result, we know that 221 customers were predicted not to purchase but
actually did purchase, and 4627 customers were predicted to purchase but actually
did not. Our accuracy is therefore 80.87%.
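The accuracy figure follows directly from the confusion matrix: it is the share of customers on the diagonal. A short check using the four reported counts:

```r
# Confusion matrix as printed above (filled column by column).
conf <- matrix(c(20409, 221, 4627, 86), nrow = 2,
               dimnames = list(c("0", "1"), c("0", "1")))
print(conf)

# Accuracy = correctly classified customers / all customers.
accuracy <- sum(diag(conf)) / sum(conf)
print(round(accuracy, 4))
```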
In the final step, we use the receiver operating characteristic (ROC) curve and the
area under the curve.
Table 12. ROC curve
This curve indicates poor predictive power (below 90%), although the result is
still positive. The best cutoff is 0.22.
We then know that, 61.9% of the time, a customer who purchases will have a higher
predicted purchase probability than one who does not.
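The area under the ROC curve can be read as the probability that a randomly chosen purchaser receives a higher predicted probability than a randomly chosen non-purchaser. A minimal base-R sketch of that calculation is given below; the helper name auc and the four toy observations are hypothetical, and the report's own ROC curve was presumably produced with a dedicated package.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic).
auc <- function(prob, actual) {
  r  <- rank(prob)                 # average ranks handle tied probabilities
  n1 <- sum(actual == 1)           # number of purchasers
  n0 <- sum(actual == 0)           # number of non-purchasers
  (sum(r[actual == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: one of the four purchaser/non-purchaser pairs is ranked wrongly.
prob   <- c(0.10, 0.40, 0.35, 0.80)
actual <- c(0, 0, 1, 1)
toy.auc <- auc(prob, actual)
print(toy.auc)
```

In the toy example three of the four purchaser/non-purchaser pairs are ordered correctly, so the AUC is 0.75.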
3.2 Customer lifetime value

If the company wants to know how much value all its customers generate for the
firm, it can calculate the customer lifetime value (CLV).
We organise the online purchase data from January 2019 to June 2020 (numbering the
months from 1 to 18 in Excel), and then count the active customers, the monthly
profit, and the cost in each month (which depends on the total number of items
bought). We set the discount rate at 1%.
Using these data, we can then calculate the CLV for each month.
Table 13. Monthly data and CLV
Each variable comes from the customer purchase data: P is the purchase cost, C is
the total number of items bought, and r is the retention ratio.
Table 14. CLV evolution
From these data, we observe that the first month of online sales has the highest
CLV, and the sum of the CLV values over the eighteen months is 22970.6.
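The exact CLV formula is not written out in this report, so the sketch below assumes one common form: each month's margin is weighted by the share of customers retained and discounted at the monthly rate. All customer counts and margins are hypothetical; only the 1% discount rate and the 18-month horizon come from the analysis.

```r
d     <- 0.01    # monthly discount rate used in the report
month <- 1:18    # January 2019 to June 2020

# Hypothetical inputs: active customers shrinking by 10% a month,
# and a constant margin (revenue minus cost) per active customer.
active <- round(1000 * 0.9^(month - 1))
margin <- rep(20, length(month))

# Retention ratio relative to the first month (assumed definition of r).
r <- active / active[1]

# Assumed form: CLV_t = margin_t * r_t / (1 + d)^t
clv <- margin * r / (1 + d)^month
clv.total <- sum(clv)

print(round(clv, 2))
print(round(clv.total, 2))
```

Under these assumptions the first month has the highest value, because both retention and the discount factor shrink over time.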
4. Company sustainable competitive advantage

4.1 Customer value
In order to help the company understand the value customers place on different
newly developed chocolate products, we run a conjoint analysis on the attitudes
toward different brands of the 378 customers segmented in the clustering analysis.
To begin, we count the number of choices at each price:
## 2.76 price has 266 choices
## Price
## 2 2.76 3 4 4.95 5 7
## 81 266 443 339 31 380 350
We find that the lowest price, £2, does not have the largest number of choices; on
the contrary, the two highest prices have the second and third largest numbers of
customer selections.
Next, we would like to know whether premium chocolates are popular, so we use
xtabs() to calculate the counts.
##No Yes
##394 1496
From the result, we know that more customers choose premium chocolates than
low-priced chocolates. Premium is the most popular choice.
Compared with the reference levels, there are significant differences for nuts, the
loyalty option taken with the chocolates (especially the donation one), organic,
premium, fairtrade, sugar, and price (a higher price means lower utility). The
results also show that consumers attach lower sub-utility to origin and
manufacturing location.
Table 15. Coefficients
Confusion Matrix and Statistics:

                     selected_alternative
predicted_alternative   1   2   3
                    1 315 104 102
                    2 142 705 140
                    3 102  61 219

## Overall Statistics Accuracy : 0.6556
Based on the estimation, we can predict market share, and we obtain a good
prediction accuracy (65.56%).
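Both the accuracy and the predicted shares can be read off the confusion matrix. The sketch below uses only the nine reported counts; treating the row totals as predicted market shares is an illustrative reading, not a calculation taken from the report.

```r
# Rows: predicted alternative; columns: selected alternative.
conf <- matrix(c(315, 142, 102,
                 104, 705,  61,
                 102, 140, 219),
               nrow = 3,
               dimnames = list(predicted = c("1", "2", "3"),
                               selected  = c("1", "2", "3")))

# Accuracy: share of choices where prediction and selection agree.
accuracy <- sum(diag(conf)) / sum(conf)

# Predicted and observed share of each alternative.
predicted.share <- rowSums(conf) / sum(conf)
observed.share  <- colSums(conf) / sum(conf)

print(round(accuracy, 4))
print(round(rbind(predicted.share, observed.share), 3))
```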
We can also use the data to find out which options customers are more willing to
pay for.
## 1 pound ------> 0.0824 utility
## 1/0.0824 pounds ------> 1 utility
> coef(model)["NutsNuts and Fruit"]-coef(model)["NutsNuts only"] / (-coef(model)["Price"])
## NutsNuts and Fruit
## -2.845858
For example, from this formula, we know that customers are more willing to pay for
nuts-only chocolate than for nuts-and-fruit chocolate.
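Willingness to pay for one attribute level over another is the difference in their utilities divided by the utility of one pound. In R, division binds more tightly than subtraction, so the difference needs its own parentheses. The sketch below uses hypothetical attribute coefficients; the price coefficient of -0.0824 is an assumption based on the "1 pound = 0.0824 utility" note.

```r
# Hypothetical coefficients (not the report's estimates), named as in the model.
coefs <- c("NutsNuts and Fruit" = 0.20,
           "NutsNuts only"      = 0.45,
           "Price"              = -0.0824)

# Willingness to pay for "nuts and fruit" relative to "nuts only", in pounds:
# (utility difference) / (utility per pound).
wtp <- (coefs["NutsNuts and Fruit"] - coefs["NutsNuts only"]) / (-coefs["Price"])
wtp <- unname(wtp)
print(round(wtp, 2))

# Without the parentheses, only the second coefficient is divided by the price.
no.parens <- unname(coefs["NutsNuts and Fruit"] - coefs["NutsNuts only"] / (-coefs["Price"]))
print(round(no.parens, 2))
```

A negative value means customers would pay less for nuts and fruit than for nuts only, which is the direction of the conclusion above.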
4.2 Managing sustainability

Market basket analysis of data from a supermarket partner can help the chocolate
company gain insight into the relationships between customers' choices and help it
design sales models or packages.
There are 127 items usable for generating association rules.
After plotting the rules, we can see that most of the association rules have a
support below 0.1, and the majority of the associations appear in 5% to 10% of
baskets. Furthermore, the confidence level is quite high, ranging from 30% to 100%.
Table 16. Scatter plot for 7824 rules
In this result, there are 7824 rules with significant co-occurrence.
We then focus on the rules that contain chocolate.
Table 17. Zoom in on the baskets that have chocolate
Table 18. Data for the baskets that have chocolate
From the plot, we can see that chocolate bought together with milk, salty, and
snack items has a high lift. It is therefore highly recommended that the company
place its products beside the milk, salty, and snack areas, or include them in a
sales package.
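Support, confidence, and lift can be computed by hand on a small example to make the three measures concrete. The five baskets and the helper functions below are hypothetical and written in base R; the real analysis was run on the supermarket data with association-rule tooling.

```r
# Five hypothetical baskets.
baskets <- list(
  c("chocolate", "milk", "snack"),
  c("chocolate", "milk"),
  c("milk", "bread"),
  c("chocolate", "snack"),
  c("bread", "snack")
)

# TRUE for every basket that contains all of the given items.
has <- function(items) vapply(baskets, function(b) all(items %in% b), logical(1))

support    <- function(items) mean(has(items))                           # P(items)
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)     # P(rhs | lhs)
lift       <- function(lhs, rhs) confidence(lhs, rhs) / support(rhs)     # vs. independence

print(support(c("milk", "chocolate")))    # 2 of 5 baskets
print(confidence("milk", "chocolate"))    # 2 of the 3 milk baskets
print(lift("milk", "chocolate"))          # above 1: bought together more than by chance
```

A lift above 1 means the two items appear together more often than they would if they were bought independently, which is the basis for placing them side by side.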
5. Summary
To sum up all of the analyses:
If the company wants to segment its customers, the Mclust method is the best choice
when the focus is on small, high-quality groups. This method not only separates
customers into the smallest groups, but also provides the subdivision result. If
the company has an ample budget and wants to promote to broader customer groups,
the hierarchical method is the best choice because it covers a wider range of
customers.
When we compare several brands across many dimensions, we can use a few helpful
components to position them. A perceptual map helps us understand the differences
that influence customers’ rankings in the chocolate industry.
From the logistic regression, we know that factors such as country, living in a
rural or urban area, GDP, and having children influence customers' sustainable
buying. With a high prediction accuracy, we also know that the company has a
sustainable buying rate of about 12% among all customers who have bought its
chocolate.
To understand how customer lifetime value changes over time, we can use customer
evolution to explore the change. We needed to reorganise the data in order to run
them in R, so this case taught me how to manage big data.
Conjoint analysis helps the company pass market requirements to the innovation
department. It also helps the company compare its products with those of other
brands.
Using the market basket data, the company can learn which product sets it can use
to increase sales at the point of sale.
6. Future suggestions
In this R analysis, we have seen that the customer segmentation, brand rating,
choice model, customer lifetime value, sustainability, and basket correlation
analyses can help the company make decisions on promotion, brand image management,
innovation, market trends, and requirement innovation.
In addition, I think there could be a model to analyse the customer journey and the
experience of the after-sales process; it would be very helpful for the company in
improving its future sales service and product development.
Also, because trends and data are always changing, a system that converts the data
companies routinely collect into specific indicators, and that automatically
filters out irrelevant data, would help companies understand the meaning of the
data more easily and use it quickly.
7. References
1. Brown, A. L., Bakke, A. J., & Hopfer, H. (2020). Understanding American premium
chocolate consumer perception of craft chocolate and desirable product attributes
using focus groups and projective mapping. PLoS ONE, 15(11), e0240177.