e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:04/Issue:05/May-2022 Impact Factor- 6.752 www.irjmets.com
CUSTOMER SEGMENTATION USING MACHINE LEARNING
Pyla. Srinivas Dileep*1, Dr. M. Seshashayee*2
*1Final Year Student, M.Sc. Data Science, Department of Computer Science, Gandhi Institute of
Technology and Management, Visakhapatnam, Andhra Pradesh, India.
*2Assistant Professor, Department of Computer Science, Gandhi Institute of Technology
and Management, Visakhapatnam, Andhra Pradesh, India.
ABSTRACT
The emergence of many competitors and entrepreneurs has created considerable tension among competing
businesses to find new buyers and retain existing ones. As a result, exceptional customer service has
become necessary regardless of the size of the business. Furthermore, the ability of a business to
understand the needs of each of its customers allows it to provide targeted customer services and to
develop customized customer service plans. This understanding is possible through structured customer
segmentation. Each segment contains customers who share similar market characteristics. Big data ideas
and machine learning have encouraged wider acceptance of automated customer segmentation over
traditional market analytics, which often fail when the customer base is very large. The k-means
clustering algorithm is therefore employed for this purpose, and the program is trained using a
100-pattern two-factor dataset derived from the retail trade.
Keywords: Customer Segmentation, K-Means, Machine Learning.
I. INTRODUCTION
Over the years, increasing competition among businesses and the availability of large-scale historical data
have resulted in the extensive use of data mining techniques to uncover important and strategic information
hidden in the datasets of organizations. Data mining is the process of extracting meaningful information from a
dataset and presenting it in a human-accessible way for decision support. Data mining techniques draw on
fields such as statistics, artificial intelligence, machine learning, and database systems. Data mining
applications include, but are not limited to, bioinformatics, meteorology, fraud detection, financial analysis,
and customer segmentation. The aim of this paper is to identify customer segments within a commercial business
using a data mining method. Customer segmentation means dividing customers, based on some
characteristics of the business, into groups called customer segments such that each customer segment consists
of customers who share similar market characteristics. These distinctions rest on factors that
directly or indirectly influence the market or business, such as product preferences or expectations, locations,
behaviour, and so on. The significance of customer segmentation includes: the ability of a business to customize
market plans so that they are appropriate for each segment of its customers; support for business decisions
in risky environments, such as credit relationships with customers; identification of the products associated
with individual segments and of how to manage the forces of demand and supply; disclosure of the interdependencies
and interactions between customers, between products, or between customers and products; the ability to
predict customer attrition and which customers are most likely to have problems; and the ability to raise other
market research questions and provide clues to finding solutions. Clustering, the technique used to find such
segments, is classified under unsupervised learning. Clustering algorithms include the k-means algorithm, the
k-nearest neighbour algorithm, the Self-Organizing Map (SOM), and more. These algorithms, without prior
knowledge of the data, are able to identify clusters in it by repeated comparison of input patterns until stable
clusters in the training examples are obtained, depending on the subject matter or the method.
II. METHODOLOGY
In this paper, data collection forms part of the data preparation phase. Feature scaling helps to bring all data
items onto a common scale, which improves the performance of clustering algorithms. There are many approaches
to segmentation, which vary in rigour, data requirements, and purpose. Cluster analysis is an approach to
grouping consumers based on their similarity. There are two main types of cluster analysis in market policy: a)
hierarchical cluster analysis, and b) partitioning.
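As a minimal sketch of bringing all data items onto a common scale before clustering, the snippet below rescales each column to the range [0, 1]. Min-max scaling is an assumption made for illustration; the paper does not state which scaling method was used. The function name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Illustrative values only (annual income, spending score); not the paper's data.
sample = [[15, 39], [60, 50], [120, 90]]
scaled_sample = min_max_scale(sample)
print(scaled_sample)
```

Without a step like this, a feature measured in large units (such as income) would dominate the distance computation over a feature measured on a 1 to 100 scale.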
The k-means algorithm proceeds in the following steps:

1. Initialization: Once the number of clusters k has been decided, k initial centroids are chosen from the data space.
2. Assignment of objects to centroids: Each object in the data is assigned to its nearest centroid.
3. Updating of the centroids: The centroid of each group is updated to the average position of the objects
assigned to it.
4. Cluster assignment: Steps 2 and 3 are repeated, so that the mean of each cluster is adjusted automatically
after every assignment, until the assignments stabilize and the final clusters are obtained.
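The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: initial centroids are sampled at random from the data, distance is squared Euclidean distance, and the names `k_means`, `squared_distance`, and `mean_point` as well as the toy points are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: initialization, pick k distinct data points as centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean_point(members)
    return centroids, labels


# Toy data: two well-separated groups (illustrative values only).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

Because the starting centroids are random, different seeds can give different cluster numbering, and on harder data different final clusters; running several seeds and keeping the best result is a common precaution.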
III. MODELING AND ANALYSIS
The main goal behind clustering methods like k-means is to define the clusters such that the intra-cluster
variation stays minimum.
$$\min \sum_{k=1}^{K} W(C_k)$$
where $C_k$ represents the k-th cluster and $W(C_k)$ denotes its intra-cluster variation. By calculating the
total intra-cluster variation, one can evaluate the compactness of the clustering. We can then proceed
to determine the optimal number of clusters as follows.
First, we run the clustering algorithm for several values of k, for example by varying k from 1 to 10
clusters. For each k, the total intra-cluster variation is measured. We then plot the total intra-cluster
variation against the number of clusters k.
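The procedure just described can be sketched as below. The paper does not define the intra-cluster variation explicitly; this sketch assumes it is the sum of squared Euclidean distances from each point to its cluster centroid, which is a common choice. The helper names, the number of restarts, and the toy points are hypothetical, and the compact k-means routine is included only to keep the example self-contained.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    """Compact k-means: random initial centroids, then assign and update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                n = len(members)
                centroids[j] = tuple(sum(c) / n for c in zip(*members))
    return centroids, labels


def total_variation(points, centroids, labels):
    """Assumed form of sum_k W(C_k): squared distances to own centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, k_values, restarts=5):
    """For each k, keep the lowest total variation over a few random restarts."""
    curve = {}
    for k in k_values:
        curve[k] = min(
            total_variation(points, *k_means(points, k, seed=seed))
            for seed in range(restarts)
        )
    return curve


# Toy data: three tight groups of four points (illustrative values only).
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (8, 1), (8, 2), (9, 1), (9, 2),
    (5, 8), (5, 9), (6, 8), (6, 9),
]
curve = elbow_curve(points, range(1, 11))
for k, variation in curve.items():
    print(k, round(variation, 2))
```

Plotting `curve` with k on the horizontal axis gives the elbow plot; the value of k after which the curve flattens is the usual candidate for the number of clusters.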
IV. RESULTS AND DISCUSSION
Analyse and visualize our dataset:
Here we have the following features:
1. Customer_ID: It is the unique ID given to a customer
2. Gender: Gender of the customer
3. Age: The age of the customer
4. Annual Income ($): It is the annual income of the customer
5. Spending Score: It is the score (out of 100) given to a customer by the mall authorities, based on the money
spent and the behaviour of the customer

Visualize our dataset:


We now visualize the dataset using matplotlib and seaborn to understand the relationships between
columns. The plots show that people in the 20-40 age group shop more than people in other age
groups, and that people whose annual income is between $50,000 and $100,000 shop more than
others.
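The comparison by age group rests on a simple aggregation, which can be sketched as follows. The rows below are invented toy records, not taken from the paper's dataset, and the 20-year band width and the function names are assumptions chosen for illustration. The paper's own plots were produced with matplotlib and seaborn; this sketch only shows the grouping behind such a plot.

```python
from collections import defaultdict

# Invented toy rows: (age, annual_income, spending_score). Not the paper's data.
customers = [
    (23, 40_000, 77), (31, 72_000, 81), (38, 65_000, 60),
    (45, 54_000, 42), (52, 88_000, 35), (67, 30_000, 20),
]


def band(value, width):
    """Return the inclusive (low, high) band of the given width containing value."""
    low = (value // width) * width
    return (low, low + width - 1)


def mean_score_by_band(rows, column, width):
    """Average spending score (index 2) grouped by bands of the chosen column."""
    scores = defaultdict(list)
    for row in rows:
        scores[band(row[column], width)].append(row[2])
    return {b: sum(s) / len(s) for b, s in sorted(scores.items())}


age_summary = mean_score_by_band(customers, column=0, width=20)
for (low, high), score in age_summary.items():
    print(f"ages {low}-{high}: mean spending score {score:.1f}")
```

The same function can be applied to annual income by passing `column=1` and a suitable band width.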

Now let us see which gender purchases more:


Here we can see that females shop more than males.

Creating Clusters:
It is clear from the figure that the number of clusters to be taken is 5, as the slope of the curve is no longer
steep beyond that point.

The clusters are plotted on a spending score versus annual income chart. Let us now analyse the results of the
model.
The mall customers can be broadly grouped into 5 groups based on their purchases made in the mall.
V. CONCLUSION
In this project, customer segments were created using the k-means clustering model, and the dataset was
analysed in various ways. The dataset was visualized to give a better understanding of its elements and the
relationships between them. In particular, we used a clustering approach called k-means clustering.
K-means clustering is one of the most popular clustering methods, and it is frequently the first thing
practitioners try when working on a clustering problem. K-means is used to divide data points into discrete,
non-overlapping groups. One of the most common uses of k-means clustering is customer segmentation, which
gives a better understanding of customers that can then be used to increase the company's revenue.
