
Bisecting K-Means - An Efficient Approach to Customer Segmentation

Divanshu Nayan, Pranshu Mishra, Shahoor Ahmed, Dr. Bichitrananda Behra
Computer Science and Engineering
C.V. Raman Global University
Bhubaneswar, India

Abstract—Organizations can efficiently segment their consumer base by leveraging RFM (Recency, Frequency, and Monetary) values, derived from analyzing transactional data over a specified period. This segmentation approach facilitates the identification of groups with similar behaviors, enabling a deeper understanding of customer needs and the exploration of potential clients for the business. Additionally, segmenting the client base has a positive impact on revenue generation. Emphasizing the retention of current consumers over acquiring new ones is widely acknowledged as a priority. For instance, businesses can employ marketing strategies tailored to specific market niches to cultivate client loyalty and enhance retention efforts. The study utilizes classic K-means and hierarchical clustering algorithms to cluster transactional data after conducting an RFM analysis, and then introduces the Bisecting K-Means method as a novel alternative. The efficacy of these techniques is evaluated based on cluster compactness, execution time, and the average similarity between each cluster and its most similar counterpart.

Index Terms—Customer segmentation, RFM analysis, K-Means, Hierarchical Clustering, Bisecting K-Means

I. INTRODUCTION

The current business environment has become more competitive, requiring new approaches to maintain a competitive edge. Implementing a customer segmentation model can greatly increase business earnings. The Pareto principle, which holds that roughly 2 out of every 10 customers contribute a disproportionate share of revenue, emphasizes the need to prioritize client retention over acquiring new customers. Business experts can customize marketing strategies, identify trends, plan product development, coordinate advertising campaigns, and offer pertinent products by utilizing customer segmentation, which capitalizes on a variety of distinctive client attributes. Customer segmentation ensures successful communication with specific groups by tailoring messages to each of them. It often makes use of variables including location, age, gender, income, lifestyle, and past purchasing patterns.

In this context, behavioral data is utilized for segmentation due to its widespread availability, dynamic nature, and foundation in past purchase behaviors. Recency, Frequency, and Monetary (RFM) analysis emerges as a prominent method for evaluating clients based on their purchasing patterns. To quantify the Recency, Frequency, and Monetary aspects, a scoring system is developed. These scores are then amalgamated to generate an RFM score, ranging from 555 to 111 (Haiying and Yu, 2010). This composite score serves as a tool for examining customers' historical and current behaviors to predict future patterns. Notably, within this framework, the scores of Recency, Frequency, and Monetary exhibit a direct correlation with customers' lifetime value and retention rates.

Following the computation of recency, frequency, and monetary values, the K-Means technique is applied to group the customers into clusters. This facilitates the identification of which consumer group contributes most significantly to the company's profitability by examining the behavior of each cluster. Additionally, two other clustering algorithms, the Bisecting K-Means algorithm and Hierarchical clustering, are employed. The objective of this study is to introduce a method for enhancing the interpretability of clusters, improving compactness, and reducing cluster spread and processing time. Once customer clusters are established, understanding the distinctions among these groupings becomes imperative. A thorough examination of the clusters is conducted to identify targeted clients and tailor offers and promotions relevant to their needs and preferences. Marketing professionals will find the proposed consumer segmentation methodology valuable for its potential to enhance targeted marketing efforts. The remainder of the paper concentrates on comparing and contrasting the three clustering techniques, evaluating them based on similarity, cluster compactness, execution time, and other pertinent variables. This comparative analysis will provide valuable insights into the strengths and weaknesses of each technique, enabling marketing professionals to make informed decisions about which clustering approach best suits their specific needs and objectives.
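To make the scoring scheme above concrete, the sketch below shows one common convention (the paper does not spell out its exact scoring rule): each attribute is mapped to a quintile score between 1 and 5, and the three digits are concatenated into a composite code between 111 and 555. The column names Recency, Frequency and Monetary are assumptions for illustration, not the authors' code.

import pandas as pd

# Illustrative sketch: assumes an 'rfm' DataFrame with one row per customer and
# columns Recency (days since last purchase), Frequency (number of purchases)
# and Monetary (total spend).
def rfm_scores(rfm: pd.DataFrame) -> pd.DataFrame:
    out = rfm.copy()
    # Lower recency is better, so the quintile labels are reversed for R.
    out["R"] = pd.qcut(out["Recency"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
    # Rank first so that heavily tied frequency values still split into quintiles.
    out["F"] = pd.qcut(out["Frequency"].rank(method="first"), 5,
                       labels=[1, 2, 3, 4, 5]).astype(int)
    out["M"] = pd.qcut(out["Monetary"], 5, labels=[1, 2, 3, 4, 5]).astype(int)
    # Concatenate the three digits into a composite code from 111 to 555.
    out["RFM_score"] = out["R"] * 100 + out["F"] * 10 + out["M"]
    return out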
II. ALGORITHM DESCRIPTION

The segmentation process utilizes the transactional dataset of a company's clients, employing three distinct algorithms to group clients based on RFM analysis. Initially, the data undergoes pre-processing to remove outliers and filter significant occurrences. The z-score method is employed to identify outliers and assess how the data align with the mean and standard deviation. Through this method, the mean and standard deviation are standardized to 0 and 1, respectively, and outliers are identified as data points that deviate significantly from the mean (zero). Subsequently, the recency, frequency, and monetary values are computed by feeding the preprocessed data into the RFM model. The three clustering algorithms, namely K-Means, Hierarchical Clustering, and Bisecting K-Means, are then applied to the three attributes (recency, frequency, and monetary values). These algorithms partition the clients into distinct groups based on their RFM characteristics. Following this, the cluster compactness, similarity index, and execution time of each clustering method are scrutinized to assess their effectiveness. For a quick reference, a summarized depiction of the suggested client segmentation strategy is presented in Figure 1.

Fig. 1. Overview of the integrity discovery system using secure introspection.

A. RFM analysis

In database marketing, Recency, Frequency, and Monetary (RFM) analysis stands as a potent and widely recognized method. Ranking clients based on their historical purchasing behavior is a prevalent practice in this realm. RFM analysis finds numerous applications across various domains, including online shopping and e-commerce, particularly in scenarios involving a large number of clients. This strategy entails utilizing three dimensions to segment customers: Monetary (M), Frequency (F), and Recency (R).

1) Recency: How recently did the client make a purchase?: The amount of time that elapses between a customer's purchases is known as their recency value. A lower recency value suggests that the client makes frequent, closely spaced visits to the business; conversely, a higher value suggests a lower likelihood of the customer visiting the business soon.

2) Frequency: How many times did the customer make a purchase?: The number of purchases a consumer makes in a certain time frame is known as their frequency. The greater the frequency value, the more devoted the company's clients are.

3) Monetary: What was the customer's expenditure?: The amount of money spent by the consumer during a specific time period is referred to as the monetary value. The more money customers spend, the more revenue the company receives from them.
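As an illustration of how the three attributes and the z-score filter described above might be computed, here is a minimal sketch assuming pandas and SciPy, with column names (CustomerID, InvoiceNo, InvoiceDate, Quantity, UnitPrice) borrowed from the UCI online-retail dataset used later in the paper; it is not the authors' own code.

import pandas as pd
from scipy import stats

def build_rfm(tx: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    # Keep only valid transactions (positive quantity and unit price).
    tx = tx[(tx["Quantity"] > 0) & (tx["UnitPrice"] > 0)].copy()
    tx["Amount"] = tx["Quantity"] * tx["UnitPrice"]
    # One row per customer: days since last purchase, number of invoices, total spend.
    rfm = tx.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Amount", "sum"),
    )
    # Z-score outlier removal: drop customers more than 3 standard deviations
    # from the mean on any of the three attributes.
    z = rfm.apply(stats.zscore)
    return rfm[(z.abs() < 3).all(axis=1)]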
B. K-Means clustering:

K-Means is a common algorithm that divides the data into a specified number of clusters such that the intra-cluster similarity is high. It takes the data attributes and the number of clusters as inputs. The iterative K-Means method recalculates the centroid values before each iteration, and the centroids determined at each iteration decide which cluster each data point is assigned to. The procedure is repeated until the total within-cluster distance can no longer be reduced. Algorithm 1 displays the K-Means algorithm.

Min-max normalization is used to normalize the recency, frequency, and monetary values, because skewed value ranges could be troublesome. The scaled data is then subjected to the clustering method. To determine which customer category generates the most revenue for the business, the amount of money earned from each cluster is calculated. K-Means has complexity O(nki), where k is the number of clusters, i denotes the number of iterations, and n denotes the number of instances.
K-Means Algorithm

1: Input:
2: - Dataset D = {x_1, x_2, ..., x_n} with n data points in d dimensions.
3: - Number of clusters k.
4: Output:
5: - Cluster centroids {c_1, c_2, ..., c_k}.
6: Initialization:
7: 1. Randomly select k data points as initial cluster centroids {c_1^(0), c_2^(0), ..., c_k^(0)}.
8: for t = 1 to T (maximum iterations) do
9:    2. Assignment Step: Assign each data point x_i to the nearest cluster centroid c_j based on a distance metric (e.g., Euclidean distance):
          assign x_i to cluster j = arg min_{l in {1, 2, ..., k}} || x_i - c_l^(t-1) ||^2
10:   3. Update Step: Recompute the centroid of each cluster c_j as the mean of the data points assigned to it:
          c_j^(t) = (1 / |C_j|) * sum_{x_i in C_j} x_i
11:   4. Termination Criterion: If the centroids have not changed significantly between iterations (or the maximum number of iterations is reached), terminate; otherwise, go back to step 2.
12: end for
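A compact sketch of how this clustering step might look in practice, assuming scikit-learn and the rfm table from the earlier sketch; the choice of four clusters and the other parameters are illustrative assumptions, not the paper's settings.

from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Min-max scale the recency, frequency and monetary values, then cluster.
X = MinMaxScaler().fit_transform(rfm[["Recency", "Frequency", "Monetary"]])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
rfm["cluster"] = kmeans.fit_predict(X)

# Revenue contributed by each segment, to identify the most profitable group.
print(rfm.groupby("cluster")["Monetary"].sum().sort_values(ascending=False))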
C. Hierarchical Clustering:

Hierarchical clustering is an unsupervised learning technique that organizes data points into a nested structure, similar to a family tree. It begins by treating each data point as its own individual cluster. Then, it iteratively merges the most similar clusters based on a chosen distance metric (such as Euclidean distance) until a single cluster encompassing all data points is formed. This process creates a visual representation called a dendrogram, which depicts the merging hierarchy and allows you to determine the optimal number of clusters for your data analysis. However, the computational cost of hierarchical clustering can be significant. In the worst case, its time complexity scales with the cube of the number of data points, O(n^3), making it less suitable for massive datasets compared to other clustering algorithms.

Min-max normalization is used to scale the variables, as in the preceding procedure. The clients are then grouped according to their recency, frequency, and monetary values using hierarchical clustering.

Agglomerative Hierarchical Clustering Algorithm:

1: Input:
2: - Dataset D = {x_1, x_2, ..., x_n} with n data points in d dimensions.
3: - Distance metric (e.g., Euclidean distance).
4: - Linkage function (e.g., Single Linkage, Complete Linkage).
5: Output:
6: - Dendrogram representing the hierarchical cluster structure.
7: Initialization:
8: 1. Consider each data point as an individual cluster.
9: 2. Compute a proximity matrix storing the distance between all data points.
10: for t = 1 to n - 1 (merging iterations) do
11:   3. Find closest clusters: Identify the two most similar clusters based on the chosen linkage function and the proximity matrix.
12:   4. Merge clusters: Combine the identified clusters into a new cluster.
13:   5. Update proximity matrix: Recalculate distances between the new cluster and all remaining clusters.
14: end for
15: 6. The final set of clusters and their hierarchy is represented by the dendrogram.
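A short sketch of the agglomerative procedure using SciPy, again on the min-max scaled RFM matrix X from the K-Means sketch; Ward linkage and the cut into four clusters are illustrative assumptions rather than the paper's configuration.

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Build the merge hierarchy bottom-up with Ward linkage.
Z = linkage(X, method="ward")

# The dendrogram visualises the merging order and helps choose a cut level.
dendrogram(Z, truncate_mode="lastp", p=20)
plt.show()

# Cut the tree into a fixed number of flat clusters (four here).
labels = fcluster(Z, t=4, criterion="maxclust")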
D. Bisecting K-means

Bisecting K-means offers a unique clustering perspective, merging the top-down logic of divisive hierarchical clustering with the iterative splitting of K-means. It starts with all data points in a single cluster and repeatedly divides the cluster with the most significant internal differences using K-means (K = 2). This selective splitting continues until the desired number of clusters is reached. The approach can be advantageous for large cluster counts due to its focus on the most informative splits and its tendency to produce clusters with more balanced sizes than standard K-means. The time complexity of the bisecting K-means algorithm is O((K - 1)IN), where I is the number of iterations needed to converge; bisecting K-means is thus linear in the size of the dataset.

Bisecting K-Means Clustering Algorithm

1: Input:
2: - Dataset D = {x_1, x_2, ..., x_n} with n data points in d dimensions.
3: - Number of clusters k.
4: Output:
5: - Cluster centroids {c_1, c_2, ..., c_k}.
6: Initialization:
7: 1. Start with all data points in a single cluster.
8: for t = 1 to k - 1 (bisection steps) do
9:    2. Splitting Step: Apply the K-Means algorithm (often with a single iteration) to the current cluster to split it into two sub-clusters.
10:   3. Choose one of the sub-clusters for further splitting in the next iteration. Common strategies include:
11:      (a) Selecting the sub-cluster with the higher centroid distance (larger diameter).
12:      (b) Selecting the sub-cluster with the higher within-cluster variance (more spread).
13: end for
14: 4. The final set of clusters consists of the k remaining clusters after bisection.
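One way to try this in code: recent scikit-learn releases (1.1 and later) ship a BisectingKMeans estimator that follows the same top-down scheme; its use here is an assumption about tooling, not the authors' implementation. The bisecting_strategy option roughly corresponds to the selection strategies listed above.

from sklearn.cluster import BisectingKMeans

# Top-down bisecting K-Means on the scaled RFM matrix X (scikit-learn >= 1.1).
# "biggest_inertia" splits the sub-cluster with the largest within-cluster SSE,
# similar to strategy (b) above; "largest_cluster" is the alternative.
bkm = BisectingKMeans(n_clusters=4, bisecting_strategy="biggest_inertia",
                      random_state=42)
rfm["cluster_bkm"] = bkm.fit_predict(X)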
III. EXPERIMENTATION AND RESULT DISCUSSION

The effectiveness of the suggested methodology is assessed using one year of transactional data from customers of an online retailer, sourced from the University of California, Irvine (UCI) repository. This section outlines the consumer segmentation process step by step. The dataset has eight attributes, such as the customer ID, product code, name, price, date, and time of purchase, among others. There are 541910 instances with eight attributes in the original data set, covering consumer purchases made between December 1, 2010, and December 9, 2011. During data pre-processing, any cases with missing values in significant attributes, unit prices or quantities less than 0, or dates later than the current date are eliminated. As an extra pre-processing step, Z-score analysis is carried out to detect outliers. Only records that pass the filtering process (such as invoice date and time, product quantity per transaction, and product price per unit in terms of currency and frequency) are fed into the benchmark algorithms. The three extra attributes (recency, frequency, and monetary) produced by the RFM computation are present in the 4067 instances of the amended dataset. Table 2 displays a description of the original dataset.

Fig. 2 displays the result plots produced by Bisecting K-Means, hierarchical clustering, and K-Means clustering. Every algorithm's execution time is computed using the system time. It is found that, because of its lower computational cost, the suggested Bisecting K-Means method runs faster than the other two. The average distance between the generated clusters is studied using the silhouette width, and the average similarity of each cluster with its most similar cluster is measured by the Davies-Bouldin score. The silhouette plot is a visual analysis of the clustering result that shows the number of customers in each cluster as well as the shortest distance between a cluster point and points in other clusters. The larger the average silhouette width, the closer the data points inside a cluster are to one another and the farther they are from points in other clusters, and vice versa. Likewise, the smaller the Davies-Bouldin score, the less similar the data points inside a cluster are to those in other clusters, and vice versa. The average silhouette width is computed for the final clusters produced by the Hierarchical Clustering, Bisecting K-Means, and K-Means clustering techniques. The average silhouette width of the Bisecting K-Means clustering is found to be larger than that of the K-Means clustering and the Hierarchical clustering.
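The comparison described above can be reproduced in outline with scikit-learn; the snippet below is a hedged sketch that reuses the scaled RFM matrix X from the earlier examples and standard metric functions, not the paper's exact experimental code.

import time
from sklearn.cluster import KMeans, AgglomerativeClustering, BisectingKMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

models = {
    "K-Means": KMeans(n_clusters=4, n_init=10, random_state=42),
    "Hierarchical": AgglomerativeClustering(n_clusters=4, linkage="ward"),
    "Bisecting K-Means": BisectingKMeans(n_clusters=4, random_state=42),
}

for name, model in models.items():
    start = time.perf_counter()
    labels = model.fit_predict(X)  # X: min-max scaled recency/frequency/monetary
    elapsed = time.perf_counter() - start
    # A larger silhouette width and a smaller Davies-Bouldin score both
    # indicate more compact, better separated clusters.
    print(f"{name}: time={elapsed:.3f}s  "
          f"silhouette={silhouette_score(X, labels):.3f}  "
          f"Davies-Bouldin={davies_bouldin_score(X, labels):.3f}")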

IV. CONCLUSION

Customer relationships will be strengthened by customer segmentation. While acquiring new clients is significant for the business, keeping the current clientele is even more crucial (Tong et al., 2017). This work uses RFM analysis for segmentation and then extends it with K-Means clustering, Hierarchical Clustering, and a new technique, Bisecting K-Means, obtained by slightly altering the standard K-Means clustering. The operation of these methods is examined. After analyzing how long each algorithm takes to run, it is found that the suggested Bisecting K-Means strategy takes less time; because of its simplicity and lower computation cost, the suggested algorithm is more efficient. Because segmentation is carried out according to monetary, frequency, and recency values, the business is able to tailor its marketing campaigns to the clients' purchasing habits. Future research will examine consumer behavior in each segment, including the products that members of that segment purchase on a regular basis. This would make it easier to give particular products greater promotional incentives.
REFERENCES

[1] Phan Duy Hung, Nguyen Thi Thuy Lien, and Nguyen Duc Ngoc. 2019. Customer Segmentation Using Hierarchical Agglomerative Clustering. In Proceedings of the 2nd International Conference on Information Science and Systems (ICISS '19). Association for Computing Machinery, New York, NY, USA, 33–37.
[2] Chihli Hung, Chih-Fong Tsai, Market segmentation based on hierarchical self-organizing map for markets of multimedia on demand, Expert Systems with Applications, Volume 34, Issue 1, 2008, Pages 780-787, ISSN 0957-4174.
[3] I. Maryani, D. Riana, R. D. Astuti, A. Ishaq, Sutrisno and E. A. Pratama, "Customer Segmentation based on RFM model and Clustering Techniques With K-Means Algorithm," 2018 Third International Conference on Informatics and Computing (ICIC), Palembang, Indonesia, 2018, pp. 1-6, doi: 10.1109/IAC.2018.8780570.
[4] A. Joy Christy, A. Umamakeswari, L. Priyatharsini, A. Neyaa, RFM ranking – An effective approach to customer segmentation, Journal of King Saud University - Computer and Information Sciences, Volume 33, Issue 10, 2021, Pages 1251-1257, ISSN 1319-1578.
[5] Chongkolnee Rungruang, Pakwan Riyapan, Arthit Intarasit, Khanchit
Chuarkham, Jirapond Muangprathub, RFM model customer segmenta-
tion based on hierarchical approach using FCA, Expert Systems with
Applications, Volume 237, Part B, 2024, 121449, ISSN 0957-4174,
[6] M. Aryuni, E. Didik Madyatmadja and E. Miranda, ”Customer Seg-
mentation in XYZ Bank Using K-Means and K-Medoids Cluster-
ing,” 2018 International Conference on Information Management and
Technology (ICIMTech), Jakarta, Indonesia, 2018, pp. 412-416, doi:
10.1109/ICIMTech.2018.8528086.
[7] R. Kashef, M.S. Kamel, Enhanced bisecting k-means clustering using
intermediate cooperation, Pattern Recognition, Volume 42, Issue 11,
2009, Pages 2557-2569, ISSN 0031-3203.
[8] V. Rohilla, M. S. S. kumar, S. Chakraborty and M. S. Singh,
”Data Clustering using Bisecting K-Means,” 2019 International Con-
ference on Computing, Communication, and Intelligent Systems (IC-
CCIS), Greater Noida, India, 2019, pp. 80-83, doi: 10.1109/ICC-
CIS48478.2019.8974537.
[9] S. Banerjee, A. Choudhary and S. Pal, ”Empirical evaluation of K-
Means, Bisecting K-Means, Fuzzy C-Means and Genetic K-Means clus-
tering algorithms,” 2015 IEEE International WIE Conference on Elec-
trical and Computer Engineering (WIECON-ECE), Dhaka, Bangladesh,
2015, pp. 168-172, doi: 10.1109/WIECON-ECE.2015.7443889.
[10] He X., Li, C., 2016. The research and application of customer segmentation on e-commerce websites. In: 2016 6th International Conference on Digital Home (ICDH), Guangzhou, pp. 203–208. doi: 10.1109/ICDH.2016.050.
[11] Haiying, M., Yu, G., 2010. Customer Segmentation Study of Col-
lege Students Based on the RFM. In: 2010 International Conference
on E-Business and EGovernment, Guangzhou, pp. 3860-3863. doi:
10.1109/ICEE.2010.968.
[12] Sheshasaayee, A., Logeshwari, L., 2017. An efficiency analysis on
the TPA clustering methods for intelligent customer segmentation. In:
2017 International Conference on Innovative Mechanisms for Industry
Applications (ICIMIA), Bangalore, pp. 784–788.
[13] Liu, C.C., Chu, S.W., Chan, Y.K., Yu, S.S., 2014. A Modified K-
Means Algorithm – Two-Layer K-Means Algorithm. In: 2014 Tenth
International Conference on Intelligent Information Hiding and Multi-
media Signal Processing, Kitakyushu, pp. 447–450. doi: 10.1109/IIH-
MSP.2014.118.
[14] Cho, Young, Moon, S.C., 2013. Weighted mining frequent pattern-based
customer’s RFM score for personalized u-commerce recommendation
system. J. Converg. 4, 36–40.
[15] Jiang, T., Tuzhilin, A., March 2009. Improving personal-
ization solutions through optimal segmentation of customer
bases. IEEE Trans. Knowledge Data Eng. 21(3), 305–320.
https://doi.org/10.1109/TKDE.2008.163N.
[16] Lu, H., Lin, J.Lu., Zhang, G., May 2014. A customer churn prediction
model in telecom industry using boosting. IEEE
[17] Memon, K.H., Lee, D.H., 2017. Generalised fuzzy c-means clustering
algorithm with local information. In: IET Image Processing, vol. 11, no.
1, pp. 1-12, 1.
[18] Zahrotun, L., 2017. Implementation of data mining technique for customer relationship management (CRM) on online shop tokodiapers.com with fuzzy c-means clustering. In: 2017 2nd International Conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Yogyakarta, pp. 299–303.
[19] Tong, L., Wang, Y., Wen, F., Li, X., Nov. 2017. The research of customer loyalty improvement in telecom industry based on NPS data mining. China Commun. 14 (11), 260–268. https://doi.org/10.1109/CC.2017.8233665.
[20] Shah, S., Singh, M., 2012. Comparison of a Time Efficient Modified K-mean Algorithm with K-Mean and K-Medoid Algorithm. In: 2012 International
