
Expert Systems with Applications 31 (2006) 515–524

www.elsevier.com/locate/eswa

Applying data mining to telecom churn management


Shin-Yuan Hung a, David C. Yen b,*, Hsiu-Yu Wang c

a Department of Information Management, National Chung Cheng University, Chia-Yi 62117, Taiwan, ROC
b Department of DSC and MIS, Miami University, 309 Upham, Oxford, OH 45056, USA
c Department of Information Management, National Chung Cheng University, Chia-Yi 62117, Taiwan, ROC

Abstract
Taiwan deregulated its wireless telecommunication services in 1997. Fierce competition followed, and churn management has become a major focus of mobile operators, who must retain subscribers by satisfying their needs under resource constraints. One of the challenges is churner prediction. Through empirical evaluation, this study compares various data mining techniques that can assign a 'propensity-to-churn' score periodically to each subscriber of a mobile operator. The results indicate that both decision tree and neural network techniques can deliver accurate churn prediction models using customer demographics, billing information, contract/service status, call detail records, and service change logs.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Churn management; Wireless telecommunication; Data mining; Decision tree; Neural network

1. Introduction

Taiwan opened its wireless telecommunication services market in 1997, with licenses granted to six mobile operators. Competition has been fierce ever since. For any acquisition activity, mobile operators need significant network investment to provide ubiquitous access and quality communications. The market was saturated within 5 years, and mergers and acquisitions reduced the number of mobile operators from six to four by the end of 2003.

When the market is saturated, the pool of 'available customers' is limited and an operator has to shift from an acquisition strategy to a retention strategy, because the cost of acquisition is typically five times higher than the cost of retention. As Mattersion (2001) noted, 'For many telecom executives, figuring out how to deal with churn is turning out to be the key to the very survival of their organizations'.

Based on marketing research (Berson, Smith, & Thearling, 2000), the average churn of a wireless operator is about 2% per month; that is, a carrier loses about a quarter of its customer base each year. Furthermore, Fig. 1 suggests that Asian telecom providers face a more challenging churn problem than those in other parts of the world.

Fig. 1. Annual telecom operator customer churn rate by region (Mattersion, 2001): Europe 25%, U.S. 37%, Asia 48% of customers churning.

From a business intelligence perspective, the churn management process under the customer relationship management (CRM) framework consists of two major analytical modeling efforts: predicting those who are about to churn, and assessing the most effective way an operator can react (including 'do nothing') in terms of retention. This research focuses on the former. It illustrates how information technology can be applied to facilitate telecom churn management. Specifically, this research uses data mining techniques to find the best churn prediction model from a data warehouse, in order to prevent customer turnover and enhance the operator's competitive edge.

The remainder of this paper is organized as follows. Section 2 defines the basic concepts (and rationale) used in the research, Section 3 describes our research methodology, and Section 4 presents the findings. Section 5 concludes the paper.

* Corresponding author. E-mail addresses: syhung@mis.ccu.edu.tw (S.-Y. Hung), yendc@muohio.edu (D.C. Yen), HsiuYu.Wang@msa.hinet.net (H.-Y. Wang).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.09.080

2. Basic concept

2.1. Churn management

Berson et al. (2000) noted that 'customer churn' is a term used in the wireless telecom service industry to denote the movement of customers from one provider to another, and 'churn management' is the term that describes an operator's process to retain profitable customers. Similarly, Kentrias (2001) considered that, in the telecom services industry, churn management describes the procedure of securing a company's most important customers. In essence, proper customer management presumes an ability to forecast the customer's decision to move from one service provider to another, a measurement of 'customer profitability', and a set of strategic and tactical retention measures to reduce the movement (Lariviere & Van den Poel, 2004).

In practice, an operator can segment its customers by 'profitability' and focus retention management only on the profitable segments, or score the entire customer base with 'propensity to churn' and prioritize the retention effort based on both profitability and churn propensity. However, the telecom services industry has yet to standardize a set of 'profitability' measurements (e.g. current versus life-time, business unit versus corporate, account versus customer, 'loyalty' versus 'profitability', etc.). This research focuses on churn prediction, and assumes a low correlation between 'profitability' and 'propensity to churn' to simplify the modeling framework.

2.2. Data mining

Thearling (1999) proposed that data mining is 'the extraction of hidden predictive information from large databases', a cutting-edge technology with great potential to help companies dig out the most important trends in their huge databases. Emerging data mining tools can answer business questions that have traditionally been too time-consuming to solve. Lejeune (2001) noted that data mining techniques allow the transformation of raw data into business knowledge. The SAS Institute (2000) defines data mining as 'the process of selecting, exploring and modeling large amounts of data to uncover previously unknown data patterns for business advantage'. Consequently, we would say that data mining is the application of data analysis and discovery algorithms to detect patterns in data for prediction and description.

With sufficient database size and quality, data mining technology can provide business intelligence to generate new opportunities (Lau, Wong, Hui, & Pun, 2003; Su, Hsu, & Tsai, 2002; Zhang, Hu, Patuwo, & Indro, 1999; Langley & Simon, 1995; Bortiz & Kennedy, 1995; Fletcher & Goss, 1993; Salchenberger, Cinar, & Lash, 1992; Tam & Kiang, 1992). In CRM, the data mining techniques most commonly used include clustering, association, rule induction, genetic algorithms, decision trees, and neural networks.

2.3. Data mining application

Table 1 summarizes some data mining functionalities, techniques, and applications in the CRM domain.

Table 1
Data mining functionalities, techniques, and CRM applications

Functionality   Technique                                                      Application
Association     Set theory; statistics; Bayesian classification                Cross sell
Estimation      Neural network; statistics; time series                        Exchange rate estimation; stock price estimation
Classification  Decision tree; fuzzy; neural network; genetic algorithm        Credit embezzle; market segmentation
Prediction      Regression; neural network; decision tree                      Churn prediction; fraudster prediction
Segmentation    Neural network; statistics; genetic algorithm; decision tree   Market segmentation

3. Churn prediction data mining assessment methodology

The purpose of this research is to assess the performance of various data mining techniques when applied to churn prediction. The methodology consists of three parts:

(1) An IT infrastructure to facilitate our research, which includes a common customer base, attributes and transactions, modeling parameters, model results, etc.
(2) A model-independent knowledge discovery process to discover customer behavior prior to churn, by using data mining techniques, and
(3) A set of measurements to quantify the performance of models developed by different modeling tools, such as decision tree and neural network.

3.1. Churn management research framework

Fig. 2 shows the conceptual infrastructure that we use to assess the performance of the various churn prediction models built via data mining techniques. We built the infrastructure on a data warehouse with tables, views, and macros to facilitate the following model development and assessment processes:

(1) Identify data items of interest from customer behavior attributes that can differentiate between churners and non-churners,
(2) Extract, transform, and derive variables from the identified data items,

Fig. 2. IT infrastructure of model assessment process (source: NCR Inc.): from the data warehouse, data derivation and extraction (sample and full) feed predictive model creation, model testing, model scoring, scoring of the full population, and monitoring of results via cubes/reports.

This research selected decision tree, neural network, and K-means clustering as the data mining techniques to build predictive models or segment customers.

Note that in addition to conducting empirical research, we can use the same IT infrastructure to collect, analyze, detect, and eliminate major customer churn factors. This 'closed loop' infrastructure is imperative to business management as we manage churn to sustain our relationship with customers.

3.2. Prediction model creation process

Fig. 3 shows our process of creating a predictive model.

Fig. 3. Process of predictive model creation: exploratory data analysis on the data warehouse, data preprocessing, variable analysis, and data extraction into a sample database; Approach 1 then performs customer segmentation (K-means) followed by a decision tree model per segment, while Approach 2 creates predictive models with decision tree and BPN.

3.2.1. Define scope
In this study, we focus on post-paid subscribers who pay a monthly fee and had been activated for at least 3 months prior to July 1, 2001. A churner is defined as a subscriber who leaves voluntarily; a non-churner is a subscriber who is still using this operator's service. Moreover, we used the latest 6 months of transactions of each subscriber to predict the customer's churn probability in the following month. The transaction data include billing data, call detail records (CDR), customer care records, etc.

3.2.2. Exploratory data analysis (EDA)
The purpose of EDA is to explore, from the customer database, the possible variables that can characterize or differentiate customer behavior. For the variable extraction, we interviewed telecom experts, such as telecom business consultants, marketing analysts, customers, and the mobile provider's sales staff, to identify churn causes or symptoms prior to customer churn, such as 'contract expired', 'low usage', or 'query about terminating the contract'.

3.2.3. Data preprocessing, variable analysis and selection, and data extraction
Based on the results of the interviews with experts, we extract some possible variables from the customer database as an analytical base for EDA, to determine which variables are useful for differentiating between churners and non-churners.

For each of the causes/symptoms gained from the interviews, we determine whether we can observe similar customer behavior in the database. For example, for the symptom 'contract expired' we can define a variable 'number of days between today and the contract expiration date' to test its correlation with customer churn, where 'today' is the date of prediction. Depending on the variable type, we can use different statistical tests, such as the z-test. (That is, we examine variable significance by z-score at the 99% level, and select the variable if its z-score is over 3.)

Note that 'contract expiration date' must be a quality data field ('table column') in the database; otherwise, the statistical inference based on this variable would be invalid. A significant effort in data preprocessing is to resolve data quality issues related to unspecified business rules, or business rules not enforced in the business process.

Note also that we can have alternative variable definitions, such as '1 if today is later than the contract expiration date and 0 otherwise'. It is an iterative process to define the variables, identify the table columns, specify the calculation formulas, test the validity of the statistical inference, and select useful variables for modeling. Data extraction is a formalized system integration process to ensure data quality and code optimization in modeling, production (e.g. scoring), and model maintenance.
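To make this derivation-and-screening step concrete, the sketch below (not the authors' code; column names, dates, and data are illustrative, pandas/NumPy assumed) derives the 'days to contract expiration' variable and keeps it only if the z statistic comparing churners and non-churners exceeds 3 in absolute value:

```python
import numpy as np
import pandas as pd

prediction_date = pd.Timestamp("2001-07-01")      # 'today', the date of prediction

# Synthetic stand-in for the analytical base: one row per subscriber, with the
# churn label observed in the predictive window. Column names are hypothetical.
rng = np.random.default_rng(0)
n = 10_000
days = rng.integers(-60, 365, n)                  # days until contract expiration
churn_prob = np.where(days < 30, 0.05, 0.01)      # churn more likely near/after expiry
base = pd.DataFrame({
    "contract_expiration_date": prediction_date + pd.to_timedelta(days, unit="D"),
    "churned_next_month": (rng.random(n) < churn_prob).astype(int),
})

# Derived variable: number of days between 'today' and the contract expiration date.
base["days_to_expiry"] = (base["contract_expiration_date"] - prediction_date).dt.days

def z_score(df, var, label="churned_next_month"):
    """Two-sample z statistic comparing churners and non-churners on one variable."""
    a = df.loc[df[label] == 1, var]
    b = df.loc[df[label] == 0, var]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

# Keep the variable for modeling only if |z| exceeds 3 (roughly the 99%+ level).
z = z_score(base, "days_to_expiry")
print(f"z = {z:.2f}, selected = {abs(z) > 3}")
```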

3.2.4. Machine learning (model creation)
We took two approaches to assess how models built by the decision tree (C5.0) and back-propagation neural network (BPN) techniques perform.

In Approach 1, we used the K-means clustering method to segment the customers into five clusters according to their billing amount (to approximate 'customer value'), tenure months (to approximate 'customer loyalty'), and payment behavior (to approximate 'customer credit risk'). Then we created a decision tree model in each cluster (see Approach 1 in Fig. 3). This is to assess whether churn behavior differs across the 'value-loyalty' segments.

In Approach 2 (see Approach 2 in Fig. 3), we used a neural network to segment customers, followed by decision tree modeling. This is a technology assessment to test whether BPN can improve DT prediction accuracy.

3.3. Model performance evaluation

In actual practice, it is necessary to know how accurately a model predicts and how long it takes before the model requires maintenance.

To assess model performance, we use LIFT and hit ratio. Let A be the number of subscribers predicted to churn in the predictive time window who actually churned, and B the number of subscribers predicted to churn who did not (C and D are the corresponding counts for subscribers who were not predicted to churn).

Hit ratio is defined as A/(A+B), instead of (A+D)/(A+B+C+D). This makes it a measure of the model's effectiveness in predicting churners, rather than in predicting all customer behavior in the predictive window.

To assess LIFT, we rank order all customers by their churn score, and define hit ratio (X%) as the hit ratio of the X% of customers with the top churn scores. LIFT (X%) is then defined as the ratio of hit ratio (X%) to the overall monthly churn rate. For example, if the overall monthly churn rate is 2%, X=5, and hit ratio (5%)=20%, then LIFT (5%)=20%/2%=10. LIFT is a measure of modeling productivity: random sampling of the entire customer base would yield 2% churners, whereas focusing on the 5% of customers with the top churn scores would yield 20% churners. (Note that in this case, the top 5% contains 50% of the total churners.)
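As an illustration of these two measures, the following sketch (synthetic scores and labels, not the study's data) ranks subscribers by churn score, takes the top X%, and computes hit ratio (X%) and LIFT (X%) exactly as defined above:

```python
import numpy as np

def hit_ratio_and_lift(churn_score, churned, top_pct=0.05):
    """Hit ratio = A/(A+B) among the top-scored X% of subscribers;
    LIFT = that hit ratio divided by the overall churn rate."""
    churn_score = np.asarray(churn_score)
    churned = np.asarray(churned)
    n_top = max(1, int(len(churn_score) * top_pct))
    top = np.argsort(-churn_score)[:n_top]      # subscribers with the highest scores
    hit_ratio = churned[top].mean()             # A/(A+B) within the top X%
    return hit_ratio, hit_ratio / churned.mean()

# Synthetic example mirroring the text: ~2% overall churn and a model whose
# top 5% of scores contain churners at a ~20% rate.
rng = np.random.default_rng(0)
scores = rng.random(100_000)
labels = (rng.random(100_000) < np.where(scores > 0.95, 0.20, 0.0105)).astype(int)
hr, lift = hit_ratio_and_lift(scores, labels, top_pct=0.05)
print(f"hit ratio (5%) = {hr:.1%}, LIFT (5%) = {lift:.1f}")
```

With these synthetic numbers the output should come out close to the worked example in the text, hit ratio (5%) of roughly 20% and LIFT (5%) of roughly 10.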
To assess model robustness, we monitor each month's model hit ratio and LIFT for an extended period of time to detect degradation.

4. Empirical findings

4.1. Data source

A wireless telecom company in Taiwan provided the customer-related data. To protect customer privacy, the data source includes data on about 160,000 subscribers, including 14,000 churners, from July 2001 to June 2002, randomly selected based on their telephone numbers.

4.2. Exploratory data analysis results

We developed possible variables from other research and from interviews with telecom experts. We then analyzed these variables with z-tests along four dimensions and list the significant churn variables below, based on our analysis database.

- Customer demography
  † Age: the analysis shows that customers between 45 and 48 have a higher propensity to churn than the population's churn rate.
  † Tenure: customers with 25–30 months of tenure have a high propensity to churn. A possible cause is that most subscription plans have a 2-year contract period.
  † Gender: the churn probability for corporate accounts is higher than for others. A possible cause is that when employees quit, they lose the corporate subsidy for mobile services.
- Bill and payment analysis
  † Monthly fee: the churn probability is higher for customers with a monthly fee less than NT$100 or between NT$520 and NT$550.
  † Billing amount: the churn probability tends to be higher for customers whose average billing amount over 6 months is less than or equal to NT$190.
  † Count of overdue payments: the churn probability is higher for customers with fewer than four overdue payments in the past 6 months. In Taiwan, if a payment is 2 months overdue, the mobile operator will most likely suspend the mobile service until it is fully paid. This may cause customer dissatisfaction and churn.
- Call detail records analysis
  † In-net call duration: customers who do not often make phone calls to others in the same operator's mobile network are more likely to churn. The in-net unit price is relatively lower than that of other call types, so price-sensitive subscribers may leave for the mobile operator their friends use.
  † Call type: customers who often make PSTN or IDD calls are more likely to churn than those who make more mobile calls.
- Customer care/service analysis
  † MSISDN change count: customers who have changed their phone number or made two or more changes to their account information are more likely to churn.
  † Count of bar and suspend: customers who have ever been barred or suspended are more likely to churn. In general, a subscriber is barred or suspended by the mobile operator because of overdue payments.

Table 2 summarizes the variables found significant in differentiating between churners and non-churners in EDA. We use those variables for machine learning.

Table 2
Significant variables of churn

Dimension         Items
Demography        Gender, age, area, tenure
Bill/payment      Bill amount, payment, overdue payment, monthly fee
CDR               Inbound call, outbound call, domestic call
Customer service  Inquiry, phone number change, bar/suspend

4.3. Customer segmentation

To segment customers by loyalty, contribution, and usage, we selected bill amount, tenure, MOU (outbound call usage), MTU (inbound call usage), and payment rate as variables and used K-means to model the customers into five clusters. To generate roughly the same number of subscribers in each of the five clusters, we divided the customers equally into three segments (high, medium, and low) for each variable. Table 3 and Fig. 4 summarize the clustering results. Note that each cluster has its own characteristics.
Table 3
Customer segmentation clusters

Cluster ID  Tenure  Bill AMT  MOU  MTU  PYMT rate  Percentage of population  Churn rate (%)
C1          H       H         H    H    M          32.9                      0.50
C2          L       L         L    L    L          26.8                      1.19
C3          H       M         M    M    L          14.0                      0.32
C4          L       M         M    M    L          16.7                      0.30
C5          M       M         M    M    H          9.6                       1.37

Note: L, low; M, medium; H, high.

Fig. 4. Cluster distribution (churn rate vs tenure) for clusters C1–C5.
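One way to implement this segmentation is sketched below (scikit-learn assumed; the column names and values are hypothetical, and it is an interpretation of the text that the high/medium/low binning feeds the clustering rather than only describing the resulting clusters):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical per-subscriber summary pulled from the warehouse.
usage = pd.DataFrame({
    "tenure_months": [6, 28, 40, 12, 33, 20, 55, 9],
    "bill_amount":   [150, 900, 620, 80, 410, 300, 700, 95],
    "mou_outbound":  [30, 420, 260, 15, 180, 120, 350, 20],
    "mtu_inbound":   [25, 380, 240, 10, 160, 100, 300, 18],
    "payment_rate":  [0.6, 1.0, 0.9, 0.5, 0.95, 0.8, 1.0, 0.55],
})

# Divide each variable into three equal-sized segments: low (0), medium (1), high (2).
binned = usage.apply(lambda col: pd.qcut(col, 3, labels=False, duplicates="drop"))

# Five clusters, as in Table 3 (C1-C5).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
usage["cluster"] = kmeans.fit_predict(binned)
print(usage.groupby("cluster").mean(numeric_only=True))
```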

4.4. Supervised machine learning

4.4.1. Decision tree
In this research we use the decision tree technique to create many models under different scenarios. One scenario is to create a single decision tree model for all customers. Another is to create separate decision tree models for each of the customer segments.

4.4.1.1. Decision tree modeling without customer segmentation. From EDA we selected about 40 variables for C5.0 decision tree modeling. We used many different training sets to create models, and the same data set for model validation and performance tests. We compare model performance based on LIFT (10%), that is, the LIFT of the 10% of subscribers with the top churn scores.

The churn rate of the whole population is only 0.71%. We over-sample the churners to get a training set with a higher churn rate for machine learning. Table 4 summarizes the results of six different sampling strategies. For example, the training set of model S1-1M-RS-30K contains 30 K randomly sampled subscribers, while that of model S1-1M-P3 contains 36.7 K subscribers with a 3% over-sampling of churners.

Table 4 shows that model S1-1M-P5 has the highest LIFT (10%), about 10. This means that using a 1-month analytical window to predict the possible churners of the next month performs better than the alternatives.
Table 4
Model evaluation of decision tree without segmentation

Model          Hit ratio (%)  Capture ratio (%)  Lift at 10%  Description
S1-1M-RS-30K   92.92          85.91              8.74         Analytical base=1 M, random sample, learning records=30 K
S1-1M-P3       97.90          84.82              9.18         Analytical base=1 M, over sampling=3%, learning records=36.7 K
S1-1M-P5       96.21          94.55              9.96         Analytical base=1 M, over sampling=5%, learning records=22 K
S1-1M-P10      96.72          93.82              9.93         Analytical base=1 M, over sampling=3%, learning records=11 K
S1-2M-RS       87.85          92.95              9.21         Analytical base=2 M, random sample, learning records=30 K
S1-3M-RS       98.04          86.27              9.53         Analytical base=3 M, random sample, learning records=30 K
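The over-sampling and tree-training step summarized in Table 4 can be sketched as follows. scikit-learn's DecisionTreeClassifier (CART) stands in for C5.0, which the study used but which has no standard Python implementation; the column names, data, hyperparameters, and the 5% target rate are illustrative only:

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def oversample_churners(df, label="churned", target_rate=0.05, seed=0):
    """Duplicate churner rows until churners make up roughly `target_rate` of the set."""
    churners, stayers = df[df[label] == 1], df[df[label] == 0]
    n_target = int(target_rate * len(stayers) / (1.0 - target_rate))
    n_extra = max(0, n_target - len(churners))
    extra = churners.sample(n=n_extra, replace=True, random_state=seed)
    return pd.concat([df, extra], ignore_index=True)

# Hypothetical training frame standing in for the ~40 EDA-selected variables.
rng = np.random.default_rng(0)
train = pd.DataFrame({
    "days_to_expiry": rng.integers(-30, 365, 20_000),
    "avg_bill_6m": rng.integers(50, 2000, 20_000),
    "overdue_count_6m": rng.integers(0, 5, 20_000),
    "innet_minutes": rng.integers(0, 600, 20_000),
})
train["churned"] = (rng.random(20_000) < 0.0071).astype(int)   # 0.71% base churn rate

balanced = oversample_churners(train, target_rate=0.05)        # cf. model S1-1M-P5
tree = DecisionTreeClassifier(max_depth=6, min_samples_leaf=50, random_state=0)  # illustrative settings
tree.fit(balanced.drop(columns="churned"), balanced["churned"])
churn_scores = tree.predict_proba(train.drop(columns="churned"))[:, 1]
```

Scoring the full subscriber base with predict_proba corresponds to the 'score population' step in Fig. 2; the models in Table 4 differ mainly in their analytical window and sampling strategy.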
4.4.1.2. Decision tree modeling with customer segmentation. Since all customers have been assigned a cluster identity, we separate the training set into five sub-sets for decision tree modeling by cluster identity. Then we used the same validation sets to evaluate all models. Table 5 reports the performance of each cluster's decision tree model.

Table 5 (not reproduced here) lists the hit ratio, capture rate, and LIFT of the decision tree model for each segment (C1–C5) over the monthly prediction windows 200108–200207.

4.4.2. Neural network (back propagation network, BPN)
Based on other research results (e.g. Cybenco, 1998; Zhan, Patuwo, & Hu, 1998), we know that a single hidden layer and an optimal network design can provide an accurate neural network model. In this study, we use a 1-1-1 (input-hidden-output) structure as the training model type, with 43 inputs and only one output. Since public information is not available on key modeling parameters such as the learning rate or the number of neurons in the hidden layer, we tried many different combinations. Table 6 shows the results, in which model N18-R6, for example, uses 18 neurons in the hidden layer with a 0.6 learning rate.

To minimize other variances, we use the same training set for BPN as for the decision tree. Table 6 shows that N21-R6 achieves the best performance in terms of R-square and MSE.

4.5. Model performance stability

4.5.1. Overall performance trend
We use the data from the telecom operator to track model performance over a period of time. Fig. 5 shows the trend of performance in terms of hit ratio and capture rate:

† Fig. 5 shows that all the models demonstrate stable accuracy in the first 6 months. However, significant degradation occurs in February 2002, regardless of modeling technique. The Chinese New Year fell in February 2002, and it is possible that consumers behave differently during this period.
† In the first 6 months, NN outperforms DT, and DT without segmentation slightly outperforms DT with segmentation.

Fig. 5. Hit ratio and capture rate of different models, by prediction month (200108–200207).

In general, a predictive model built on individual segments is more accurate than one built on the entire customer population, so a decision tree model with segmentation should outperform a decision tree model without segmentation. Our experiment shows otherwise. Furthermore, we are concerned about the significant performance gap after the first month between the NN and DT techniques.

4.5.2. Test modeling technique differences
We use t-tests to compare the modeling techniques under different modeling parameters. The hypotheses are:

† H01: the hit ratio of the decision tree model without segmentation is not different from that with segmentation (DTH-DTSH).
† H02: the capture rate of the decision tree model without segmentation is not different from that with segmentation (DTC-DTSC).
† H03: the hit ratio of the neural network model is not different from that of the decision tree model without segmentation (NNH-DTH).
† H04: the capture rate of the neural network model is not different from that of the decision tree model without segmentation (NNC-DTC).
† H05: the hit ratio of the neural network model is not different from that of the decision tree model with segmentation (NNH-DTSH).
† H06: the capture rate of the neural network model is not different from that of the decision tree model with segmentation (NNC-DTSC).

Table 7 lists the t-test results:

† The performance of the decision tree model without segmentation is better than that with segmentation.
† The performance of BPN is better than that of the decision tree model without segmentation on both hit ratio and capture rate.

4.5.3. Sample size impact
One theory is that our results are biased because of the limited number of churn samples in the analysis base: the mobile service provider only budgeted this study at a population of about 160,000 customers, and the associated monthly churn rate was only 0.71%. The data size was not sufficient to build a good predictive model for each customer segment, because we could not extract truly significant information from the few churners in each segment. For example, Table 3 shows that C3 contains about 17% of the customer population with a 0.3% churn rate. Thus there are only 160,000 × 17% × 0.3%, or about 80, churners in C3.

4.5.4. Robust models
In order to validate the models' accuracy, we randomly selected 50,000 subscribers to evaluate model performance. Fig. 6 shows the results: the models are more robust, and the performance disparity between the NN and DT models in the first 6 months disappears.

Fig. 6. Model performance with 50,000 subscribers: hit ratio and capture rate by validation month (200108–200207).

Fig. 7 compares the LIFT of all models: both the NN and DT techniques generate models with a hit rate of 98% in the top 10% of predicted churners. That is, LIFT (10%) is about 10.

Fig. 7. LIFT comparison: LIFT against percentage of population for DT without segmentation, DT with segmentation, and the neural network.

4.5.5. Performance comparison with prior studies
Berson et al. (2000) used customer demographics, contractual data, and customers' service data as predictors to generate models with a hit rate of about 40% in the top 10% of predicted churners, that is, a LIFT (10%) of about 4. Wei and Chiu (2002) used customer call detail records as predictors and generated models with a hit rate of less than 50% in the top 10% of predicted churners, that is, a LIFT (10%) of less than 5. Our LIFT (10%) is about 10. Although the customer bases are different and there are other modeling parameters to consider, the LIFT achieved by all the techniques proposed in this study demonstrates a significant improvement over the early studies.

Table 6
Learning results of BPN

           N18-R6  N19-R6  N20-R6  N21-R6  N22-R6  N23-R6  N24-R6  N25-R6
R squared  0.9934  0.9922  0.9998  0.9999  0.9998  0.9998  0.9995  0.9941
MSE        0.001   0.002   0       0       0       0       0       0.001
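A configuration of the kind evaluated in Table 6 can be approximated with scikit-learn's MLPClassifier, a back-propagation-trained multilayer perceptron. The study's actual tool is not named, so this is only an illustrative stand-in for model N21-R6 (43 inputs, one hidden layer of 21 sigmoidal neurons, learning rate 0.6, one output), trained on synthetic data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training set: 43 input variables, one churn label.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 43))
y = (rng.random(2_000) < 0.05).astype(int)      # over-sampled label, ~5% churners

scaler = StandardScaler().fit(X)

# One hidden layer of 21 neurons, sigmoidal activation, learning rate 0.6 (N21-R6).
bpn = MLPClassifier(hidden_layer_sizes=(21,),
                    activation="logistic",
                    solver="sgd",
                    learning_rate_init=0.6,
                    max_iter=500,
                    random_state=0)
bpn.fit(scaler.transform(X), y)
churn_scores = bpn.predict_proba(scaler.transform(X))[:, 1]
```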

Table 7
Model performance evaluation

One-sample statistics
            N     Mean          Std. deviation  Std. error mean
DTH-DTSH    12    1.992×10^-2   1.146×10^-2     3.309×10^-3
DTC-DTSC    12    1.108×10^-2   8.062×10^-3     2.327×10^-3
NNH-DTH     12    0.1508        8.284×10^-2     2.391×10^-2
NNC-DTC     12    3.333×10^-2   2.103×10^-2     6.072×10^-3
NNH-DTSH    12    0.1708        7.416×10^-2     2.141×10^-2
NNC-DTSC    12    4.417×10^-2   2.610×10^-2     7.534×10^-3

One-sample test (test value = 0)
            t      df   Sig. (2-tailed)  Mean difference  95% CI of the difference (lower, upper)
DTH-DTSH    6.020  11   .000             1.992×10^-2      (1.263×10^-2, 2.720×10^-2)
DTC-DTSC    4.762  11   .001             1.108×10^-2      (5.961×10^-3, 1.621×10^-2)
NNH-DTH     6.307  11   .000             0.1508           (9.820×10^-2, 0.2035)
NNC-DTC     5.490  11   .000             3.333×10^-2      (1.997×10^-2, 4.670×10^-2)
NNH-DTSH    7.980  11   .000             0.1708           (0.1237, 0.2180)
NNC-DTSC    5.863  11   .000             4.417×10^-2      (2.759×10^-2, 6.075×10^-2)

DTH, hit ratio of decision tree model without segmentation; DTC, capture rate of decision tree model without segmentation; DTSH, hit ratio of decision tree model with segmentation; DTSC, capture rate of decision tree model with segmentation; NNH, hit ratio of neural network (BPN); NNC, capture rate of neural network (BPN).
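The statistics in Table 7 are one-sample t-tests of the monthly performance differences against a test value of 0, over the 12 prediction windows. A minimal sketch with SciPy (the difference values below are illustrative, not the study's data):

```python
import numpy as np
from scipy import stats

# Monthly differences (e.g. DTH - DTSH, the hit ratio of DT without segmentation
# minus DT with segmentation) over the 12 prediction windows 200108-200207.
# The values below are illustrative, not the study's data.
diff = np.array([0.031, 0.008, 0.022, 0.015, 0.035, 0.012,
                 0.005, 0.028, 0.019, 0.024, 0.017, 0.023])

t_stat, p_value = stats.ttest_1samp(diff, popmean=0.0)      # test value = 0
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1,
                                   loc=diff.mean(), scale=stats.sem(diff))
print(f"t = {t_stat:.3f}, df = {len(diff) - 1}, p = {p_value:.4f}, "
      f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```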

5. Conclusions and future research

Churn prediction and management is critical in liberalized mobile telecom markets. In order to be competitive in this market, mobile service providers have to be able to predict possible churners and take proactive action to retain valuable customers.

In this research, we proposed different techniques to build predictive models for telecom churn prediction. Following the suggestions of the prior research of Wei and Chiu (2002), we included customer service and customer complaint logs in the modeling. We also examined the impact of inadequate data on model building. Our empirical evaluation shows that data mining techniques can effectively assist telecom service providers in making more accurate churner predictions.

However, an effective churn prediction model only tells a company which customers are about to leave. Successful churn management must also include effective retention actions: mobile service providers need to develop attractive retention programs to satisfy those customers. Furthermore, integrating the churn score with customer segments and applying customer value also helps mobile service providers design the right strategies to retain valuable customers.

Data mining techniques can be applied in many other CRM fields, such as credit card fraud detection, credit scoring, affinity between churners and retention programs, response modeling, and customer purchase decision modeling. We expect to see more data mining applications in business management, and more sophisticated data mining techniques will be developed as business complexity increases.

References

Berson, A., Smith, S., & Thearling, K. (2000). Building data mining applications for CRM. New York, NY: McGraw-Hill.
Bortiz, J. E., & Kennedy, D. B. (1995). Effectiveness of neural network types for prediction of business failure. Expert Systems with Applications, 9(4), 503–512.
Cybenco, H. (1998). Approximation by super-positions of sigmoidal function. Mathematical Control Signal Systems, 2, 303–314.
Fletcher, D., & Goss, E. (1993). Forecasting with neural networks: An application using bankruptcy data. Information and Management, 3, 159–167.
Kentrias, S. (2001). Customer relationship management: The SAS perspective, www.cm2day.com.
Langley, P., & Simon, H. A. (1995). Applications of machine learning and rule induction. Communications of the ACM, 38(11), 55–64.
Lariviere, B., & Van den Poel, D. (2004). Investigating the role of product features in preventing customer churn, by using survival analysis and choice modeling: The case of financial services. Expert Systems with Applications, 27(2), 277–285.
Lau, H. C. W., Wong, C. W. Y., Hui, I. K., & Pun, K. F. (2003). Design and implementation of an integrated knowledge system. Knowledge-Based Systems, 16(2), 69–76.
Lejeune, M. (2001). Measuring the impact of data mining on churn management. Internet Research: Electronic Networking Applications and Policy, 11(5), 375–387.
Mattersion, R. (2001). Telecom churn management. Fuquay-Varina, NC: APDG Publishing.
Salchenberger, L. M., Cinar, E. M., & Lash, N. A. (1992). Neural networks: A new tool for predicting thrift failures. Decision Sciences, 23(4), 899–916.
SAS Institute (2000). Best Price in Churn Prediction, SAS Institute White Paper.
Su, C. T., Hsu, H. H., & Tsai, C. H. (2002). Knowledge mining from trained neural networks. Journal of Computer Information Systems, 42(4), 61–70.
Tam, K. Y., & Kiang, M. Y. (1992). Managerial applications of neural networks: The case of bank failure predictions. Management Science, 38(7), 926–947.
Thearling, K. (1999). An introduction to data mining. Direct Marketing Magazine.
Wei, C. P., & Chiu, I. T. (2002). Turning telecommunications call details to churn prediction: A data mining approach. Expert Systems with Applications, 23, 103–112.
Zhan, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14, 35–62.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.
