Lariviere 2005

This document discusses using random forests techniques to predict customer retention, defection, and profitability using a sample of 100,000 customers from a large European financial services company. It analyzes three dependent variables: next purchase, partial defection from a non-ending product, and evolution of customer profitability over time. Random forests classification is used for binary outcomes like next purchase, while regression forests are used for variables like profit evolution. The study finds these techniques provide better predictions than traditional regression models. Variables impacting retention differ from those impacting defection or profitability. Past customer behavior is more important for repeat purchases and profitability, while intermediaries have more impact on defection.


Expert Systems with Applications 29 (2005) 472–484

www.elsevier.com/locate/eswa

Predicting customer retention and profitability by using random forests and regression forests techniques
Bart Larivière*, Dirk Van den Poel
Department of Marketing, Ghent University, Hoveniersberg 24, 9000 Ghent, Belgium

Abstract
In an era of strong customer relationship management (CRM) emphasis, firms strive to build valuable relationships with their existing
customer base. In this study, we attempt to better understand three important measures of customer outcome: next buy, partial-defection and
customers’ profitability evolution. By means of random forests techniques we investigate a broad set of explanatory variables, including past
customer behavior, observed customer heterogeneity and some typical variables related to intermediaries. We analyze a real-life sample of
100,000 customers taken from the data warehouse of a large European financial services company. Two types of random forests techniques
are employed to analyze the data: random forests are used for binary classification, whereas regression forests are applied for the models with
linear dependent variables. Our research findings demonstrate that both random forests techniques provide better fit for the estimation and
validation sample compared to ordinary linear regression and logistic regression models. Furthermore, we find evidence that the same set of
variables has a different impact on buying versus defection versus profitability behavior. Our findings suggest that past customer behavior is
more important to generate repeat purchasing and favorable profitability evolutions, while the intermediary’s role has a greater impact on the
customers’ defection proneness. Finally, our results demonstrate the benefits of analyzing different customer outcome variables
simultaneously, since an extended investigation of the next buy–partial-defection–customer profitability triad indicates that one cannot fully
understand a particular outcome without understanding the other related behavioral outcome variables.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Data mining; Customer relationship management; Customer retention and profitability; Random forests and regression forests

* Corresponding author. Tel.: +32 9 264 35 24; fax: +32 9 264 42 79.
E-mail address: bart.lariviere@ugent.be (B. Larivière).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2005.04.043

1. Introduction

Over the last decade, many companies have come to perceive the retention of the customer as a central topic in their management and marketing decisions (Van den Poel & Larivière, 2004). The emphasis on retention is based on the implicit assumption that there exists a strong association between customer retention and profitability: long-term customers buy more and are less costly to serve (Ganesh, Arnold, & Reynolds, 2000; Hwang, Jung, & Suh, 2004), whereas replacing existing customers by ‘new’ ones is known to be a more expensive (Bhattacharya, 1998; Colgate & Danaher, 2000) and risky strategy, since it is reasonable to assume that switched customers are more likely to continue their churning behavior in the near future (Lewis & Bingham, 1991; McNeal, 1999). Nevertheless, there is no clear consensus about the true relationship between customer retention and profitability. In their study, Reinartz and Kumar (2000) argue that the most loyal customers are not necessarily the most profitable ones. As such, it is plausible to assume that some of the most retention-prone customers represent lower profits for the company than some other prosperous customers that divide their money among different financial services providers.

In this study, we investigate both customer retention and profitability outcomes, and we explicitly test for differences with respect to the impact of the same set of explanatory variables on both outcomes.

Unlike previous retention studies that mainly focus on one particular type of retention, this study adopts a more extended approach in the conceptualization of the retention dependent variables; we investigate a repeat purchase as well as a defection outcome. The first retention variable ‘next buy’ represents whether a customer has bought another product given a particular subset of independent variables. The second retention variable is labeled ‘active partial-defection’ and expresses the customer’s decision to
B. Larivière, D. Van den Poel / Expert Systems with Applications 29 (2005) 472–484 473

cancel a product that is characterized by a ‘non-ending’ status. Contrary to typical grocery products like milk, coffee or cookies, financial products are bought and owned for a specific period in time. As a consequence, you remain a customer until all the products are closed or expired. Regarding the ending status of financial products, there exist two notable types: (i) products that have a fixed duration term and as a consequence automatically end when the expiration date is reached, and (ii) products that do not have a fixed expiration date and hence receive a ‘non-ending’ label, since they only stop when a customer explicitly asks to cancel that product. With the ‘active partial-defection’ retention variable, we emphasize the latter ending status scenario. The ‘partial’ refers to the fact that the closure of one particular product does not necessarily mean a ‘total’ defection of the customer, since that customer is allowed to have other products that are still open or not expired.

With respect to the ‘customer profitability’ dependent variables, we investigate the customer’s evolution in profit. Contrary to the existent literature that mainly investigated profitability in a cross-sectional manner by spanning companies and industries, we investigate each customer’s profitability longitudinally. As such we are able to analyze the direct relationship between a customer’s set of explanatory variables and his generated profits in contrast to previous studies that were often constrained by linking aggregated customer information with, for example, the stock-price performance per firm or the turnover per outlet due to the unavailability of profitability measures at the customer level. In this study, we investigate two measures of customer profitability. The first measure is ‘profit evolution’ and represents the customers’ evolution with respect to the profits generated during the observed window of observation. The second variable ‘profit drop’ is a deduced version of the former profitability measure. ‘Profit drop’ is a binary variable expressing whether the customer has become less profitable for the company by the end of observation. The variable is created as an extra tool to validate the accuracy of predicting customers’ profitability evolutions and to compare its performance with the other binary retention dependent variables.

In sum, we investigate two major groups of customer outcome: customer retention and profitability. We analyze two measures of retention that both involve an ‘active’ transaction of the customer: the opening of a new product (next buy) and the decision to end a product that is still open (active partial-defection). Furthermore, we also investigate how customers evolve in terms of the profitability they represent for the company by means of a linear (profit evolution) and a binary (profit drop) dependent variable.

The rest of the paper is organized as follows. In Section 2, we elucidate the methodological underpinnings of the random forests and the regression forests techniques. In Section 3, we present the data set and the explanatory variables under investigation. The study results and their implications are reported in Section 4. In Section 5, we summarize and discuss the results of this study.

2. Methodology

In this study, we use random forests techniques to predict customers’ profitability evolution and their next buy and partial-defection decisions. Two types of random forests are used depending on the conceptualization of the dependent variable: that is binary classification and linear prediction outcomes. In the next paragraphs we present the methodological underpinnings of the random forests techniques and the evaluation criteria we use to investigate their performance.

2.1. Random forests

With regard to binary classification tasks, decision trees (DT) have become very popular, thanks to their ease of use and interpretability (Duda, Hart, & Stork, 2001) as well as their ability to deal with covariates measured at different measurement levels (including nominal variables). Nevertheless, conventional decision tree techniques also have their disadvantages. For instance, Dudoit, Fridlyand, and Speed (2002) mention their lack of robustness and their suboptimal performance. Fortunately, many of these disadvantages have been dealt with by some researchers who optimized the DT technique. More specifically, the creation of an ensemble of trees followed by a vote for the most popular class, labeled forests (Breiman, 2001), is the result of such a DT optimization.

In this paper, we also use the more advanced DT technique. We select the random forests as proposed by Breiman (2001), which uses the strategy of a random selection of a subset of m predictors to grow each tree, where each tree is grown on a bootstrap sample of the training set. This number, m, is used to split the nodes and is much smaller than the total number of variables available for analysis.

Since their introduction, random forests have been enjoying increased popularity. The number of applications in fields with large datasets is growing: e.g. in bioinformatics (Deng et al., 2004). On the other hand, applications in economics, and more specifically in marketing-related issues, are rather scarce (Buckinx & Van den Poel, 2005). The available applications using random forests reveal that the predictive performance is among the best of available techniques (Luo et al., 2004). Furthermore, an interesting by-product of the technique is the set of importance measures produced for each variable (Ishwaran, Blackstone, Pothier, & Lauer, 2004), which indicate which variables have the strongest impact on the dependent variables of investigation. Another advantage of the technique concerns the consistently high and robust performance results (Breiman, 2001). Finally, the random forests as proposed

by Breiman have reasonable computing times (Buckinx & Van den Poel, 2005) and are easy to use; the only two parameters a user of the technique has to determine are the number of trees to be used and the number of variables (m) to be randomly selected from the available set of variables. In both cases, we follow Breiman’s recommendation to pick a large number (5000 in this case) for the number of trees to be used, as well as the square root of the number of variables for the latter parameter. Since the number of explanatory variables equals 30 (cf. Table 2) in this study, and the square root of 30 is approximately 5.5, we fix the number of variables to six.

2.2. Regression forests

Breiman also extended the concept of random forests to regression cases. Random forests for regression are formed by growing trees depending on a random vector such that the tree predictor takes on numerical values as opposed to class labels (cf. Section 2.1). The random forests predictor is formed by taking the average over a number of trees specified by the user.

2.3. Evaluation criteria

In this study, we investigate four different dependent variables: next buy, active partial-defection, profit drop and profit evolution. The first three measures involve a binary classification problem of a specific event; that is the event of buying a new product, the event of canceling a ‘non-ending’ status product and the event of becoming less profitable for the company.

In order to assess the predictive performance of the classification models based on the random forests technique, we use the area under the receiver operating characteristic curve (AUC) criterion. Furthermore, we benchmark the performance of the random forests against the AUC resulting from conventional logistic regression models in which we use the same set of customers, independent and dependent variables. The AUC measure is based on a range of comparisons between the predicted status of the event and the true status of the customer with respect to that event, by considering all possible cut off levels for the predicted values. More specifically, for all the cut off points, the sensitivity (the number of true positives versus the total number of events) and the specificity (the number of true negatives versus the total number of non-events) of the confusion matrix are considered and summarized by means of a two-dimensional graph, resulting in a ROC curve. The area under this curve is used to evaluate the predictive accuracy of the classification models (Hanley & McNeil, 1982). In order to compare the AUCs resulting from the random forests with those of the logistic regression models, we apply the non-parametric test proposed by DeLong, DeLong, and Clark-Pearson (1988) that investigates whether the areas under both ROC curves are significantly different.

With respect to the linear dependent variable, profit evolution, we cannot use the AUC evaluator, since both the predicted and the real values can take more than two values (i.e. they are not binary). Profit evolution represents the change in the customer’s profitability during the observed window of analysis, and consequently can have a wide range of both positive and negative values. In order to evaluate the predicted values, we calculate the mean absolute deviation (MAD)

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert P_i - R_i \rvert \qquad (1)$$

where $n$ is the sample size, $P_i$ the predicted profit evolution for customer $i$ and $R_i$ the real profit evolution for customer $i$. Similar to the goodness-of-fit evaluation of the random forests models, we also apply conventional linear regression models in order to benchmark their performance against the regression forests results with respect to the profit evolution target variable.

3. Empirical study

A major Belgian financial services company delivered the data for this study. Their data warehouse stores detailed information about customers’ banking and insurance acquisitions; that is we know when, what, how much and at which point of sales the customer has bought a specific product. Furthermore, the company gathers demographic information about its customers and provides its customers with a monthly revenue indicator. Since our research setting implies a fourfold analysis of dependent variables, we decided to use the same group of customers, as well as the same set of potential explanatory variables in order to compare their relative and different impact on the customer retention and profitability target variables we emphasize. We decided to take two randomly selected samples of 50,000 customers each, of which one is used for the estimation process and the second is used for validation. In the next paragraphs, we present the dependent and explanatory variables that are created to perform the customer retention and profitability models.

3.1. Conceptualization of the dependent variables

The following timeline provides some detailed information about the period of analysis in this study.

As is clear from the timeline, we determine the dependent variables within the time period of 1 June 2003 through 1 February 2004 (= latest release date of the data warehouse). Two measures of retention are created in order to investigate the postulated research objectives. The first measure is ‘next buy’ and expresses whether the customer has bought a new product during the 8 months of follow-up (i.e. 1 June 2003 through 1 February 2004). The second

dependent variable ‘active partial-defection’ explores whether the customer has himself ended a product that was still open. Note that with respect to the latter dependent variable, we explicitly focus on ‘active’ defection, meaning that we do not consider an ‘automatic’ product defection as the event of investigation (cf. Section 1). Both retention variables are binary and receive the value of ‘1’ when the event happened during the follow-up period (‘0’ in the other case).

With respect to the profitability measures we make use of the company’s internal records. Each month, the investigated company computes an individual profitability score for its entire customer base. The monthly score is calculated as a weighted average of the total number of products owned multiplied by the corresponding balance amount (at the end of each month) and the net margin that the product represents for the company. Based on the scores throughout the follow-up period, we were able to investigate the customers’ evolution with respect to that profitability score. We created two dependent variables. The first profitability measure is ‘profit evolution’ and represents the shift (expressed in profitability points) in the customer’s profitability, whereas ‘profit drop’ is a binary indicator expressing whether the customer showed a negative evolution with respect to his revenue profile, meaning that he became less profitable for the company by the end of the follow-up period. In Table 1, we provide some insights about the 100,000 customers under investigation in this study and their corresponding retention and profitability measures.

It is clear from Table 1 that some 13% of the customers bought a new product during the follow-up period, whereas fewer customers (6.8% of the customers) decided to cancel a product with a non-ending status. With respect to the binary profitability measure, we observe that approximately a quarter of the customers experienced a negative evolution in the profitability they represent for the firm. This latter finding is intriguing in the context of the second profitability measure that reflects the absolute shift in a customer’s profit evolution expressed in profitability points. It is clear from the minimum and maximum values that there is a wide range of movements within the follow-up period with regard to the customers’ profit evolutions. Furthermore, the mean and median values are situated around zero, indicating that the extra profits generated by some customers are fully absorbed by the lost revenues of some other profitability defectors. Given the fact that only one quarter experienced a decrease in profits, we can ascertain the need to gain insight into the drivers of the target variable ‘profit drop’, because on average one customer is likely to absorb the extra profits generated by three other customers.

3.2. Explanatory variables

In this study, we explore three major predictor categories that encompass potential explanatory variables. The three categories are: past customer behavior, observed customer heterogeneity and variables related to intermediaries. In the next paragraphs, we introduce each category by presenting its variables. Note that all explanatory variables are measured at the date of 31 May 2003 (cf. Fig. 1). Table 2 presents the explanatory variables that are investigated in this study.

3.2.1. Past customer behavior

There is ample evidence in the literature that behavioral exchange characteristics are strong predictors of future customer behavior (Baesens et al., 2004; Reinartz & Kumar, 2003) and profitability (Hsieh, 2004). In this study, we investigate the following past customer behavior variables: specific product ownership, self-banking activity, total number of products owned, monetary value and cross-buying.

3.2.1.1. Specific product ownership. Some researchers investigated the impact of specific product ownership on customer outcome. Their findings indicate that specific product ownership is likely to influence future customer behavior (e.g. Athanassopoulos, 2000; Larivière & Van den Poel, 2004). In this study, we test for the impact of seven
Table 1
Insight in the dependent variables for both estimation and validation sample

Dependent variables (a)             Estimation sample (N = 50,000)    Validation sample (N = 50,000)
                                    Absolute     Relative (%)         Absolute     Relative (%)
Next buy                   Yes      6642         13.3                 6644         13.3
                           No       43,358       86.7                 43,356       86.7
Active partial-defection   Yes      3420         6.8                  3386         6.8
                           No       46,580       93.2                 46,614       93.2
Profit drop                Yes      14,349       28.7                 14,167       28.3
                           No       35,651       71.3                 35,833       71.7
Profit evolution           Min      −2938.28                          −3500.86
                           Max      1179.53                           2027.58
                           Mean     −1.06                             −0.98
                           Median   0.01                              0.01

(a) All dependent variables are measured within the follow-up period (1 June 2003 through 1 February 2004).
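
As a hedged illustration of how the dependent variables in Table 1 relate to the evaluation criteria of Section 2.3, the following Python sketch derives the binary ‘profit drop’ flag from a ‘profit evolution’ value, computes the MAD of Eq. (1), and computes the AUC through the standard pairwise-comparison formulation, which equals the area under the ROC curve. The function names and the toy numbers are hypothetical and are not taken from the study; treating a zero evolution as ‘no drop’ is likewise an assumption.

```python
from typing import List, Sequence


def profit_drop(profit_evolution: Sequence[float]) -> List[int]:
    """Binary flag: 1 if the profit evolution is negative, else 0 (assumed rule)."""
    return [1 if change < 0 else 0 for change in profit_evolution]


def mean_absolute_deviation(predicted: Sequence[float],
                            real: Sequence[float]) -> float:
    """MAD = (1/n) * sum(|P_i - R_i|), following Eq. (1)."""
    if len(predicted) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, real)) / len(real)


def auc(scores: Sequence[float], events: Sequence[int]) -> float:
    """AUC as the share of (event, non-event) pairs in which the event
    receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, events) if y == 1]
    neg = [s for s, y in zip(scores, events) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study).
real_evolution = [-12.5, 0.4, 3.0, -0.8, 7.2]
predicted_evolution = [-10.0, 1.0, 2.0, 2.5, 6.0]

drops = profit_drop(real_evolution)
mad = mean_absolute_deviation(predicted_evolution, real_evolution)
# A lower predicted evolution should signal a higher drop risk, so the
# negated prediction is used as the risk score.
drop_auc = auc([-p for p in predicted_evolution], drops)
print(drops, round(mad, 2), round(drop_auc, 3))
```

The pairwise form avoids building the ROC curve explicitly, which keeps the sketch short; for samples of 50,000 customers a rank-based implementation would be preferable because the pairwise loop grows with the product of the two class sizes.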

different ownership variables. We introduce six dummy variables that categorize all types of banking and insurance products as well as one variable expressing whether the customer owns credit cards or not.

[Fig. 1. Period of analysis. Timeline: the explanatory variables are measured at 31 May 2003; the follow-up period for the conceptualization of the dependent variables runs from 1 June 2003 to 1 Feb 2004.]

3.2.1.2. Self-banking by means of internet and phone. Nowadays, more and more financial services providers encourage their customers to perform their daily transactions by means of electronic banking services (such as self-banking with ATM, phone banking or internet banking) in order to minimize their operational working costs. Also, the company of investigation enables its customers to use internet or phone services for both banking and insurance transactions. For this study, we created a dummy variable expressing whether the customer is a self-banking user (by means of internet or phone).

3.2.1.3. Total number of products owned and monetary value. Previous research suggests that there exists a positive association between these two explanatory variables and customers’ subsequent customer behavior. For instance, Huber, Lane, and Pofcher (1998) reveal that the more products a customer possesses with the bank, the more retention prone he is. Similarly, the more money a customer invests with a company the more likely he is to stay (Baesens, Viaene, Van den Poel, Vanthienen, & Dedene, 2002; Ganesan, 1994). With respect to the customer’s profitability, it is plausible to assume that a higher quantity of products represents higher profits, since previous research found a positive relationship between customers’
Table 2
Explanatory variables used in this study

Description                                                                                                Variable name

1. Past customer behavior
  Specific product ownership
    Possession of savings and investment products, type low risk (e.g. a savings account, bonds, etc.)     d_SI_low_risk (a)
    Possession of savings and investment products, type high risk (e.g. exchange products like stocks)     d_SI_high_risk
    Possession of a typical savings and investment product that is created as the steppingstone
      between the two other savings and investment groups                                                  d_SI_stepst
    Possession of risk products (e.g. fire insurance, car insurance, etc.)                                 d_risks
    Possession of credit products (e.g. a mortgage, etc.)                                                  d_credits
    Possession of current account                                                                          d_curracc
    Possession of (credit) cards                                                                           d_card
  Self-banking by means of internet or phone                                                               d_self_b
  Total number of products owned                                                                           nbr_p
  Monetary value                                                                                           mon_val
  Cross-buying                                                                                             cross_b

2. Customer demographics
  Age                                                                                                      age
  Lifecycle stage: that is, respectively, (1) youngsters, (2) families with young children,
    (3) midlife and (4) seniors                                                                            d_lifec_stage_1, d_lifec_stage_2, d_lifec_stage_3, d_lifec_stage_4
  Gender (1 = male, 0 = female)                                                                            d_gender
  Region (1 = Flanders, 0 = Walloon)                                                                       d_region
  Geo-demographic data
    Social status of the place of residence                                                                d_soc_status_1, d_soc_status_2, d_soc_status_3, d_soc_status_4,
                                                                                                           d_soc_status_5, d_soc_status_6, d_soc_status_7, d_soc_status_8
    Median income per place of residence                                                                   med_income

3. Intermediary variables
  Selling tendency                                                                                         ST
  Number of customers served                                                                               nbr_cust
  Sales assortment                                                                                         sales_assort

(a) The prefix ‘d_’ refers to the fact that the corresponding variable is a dummy variable.
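
A minimal sketch, assuming dummy coding with one reference level, of how the categorical items in Table 2 expand into the 30 explanatory variables and how the number of variables tried per split (m, Section 2.1) follows from the square-root rule. The helper `dummy_code`, the choice of the last level as the reference category, and rounding the square root up are assumptions; the paper only states that 30 variables lead to m = 6, that the lifecycle has five stages coded by four dummies, and that social status has nine groups coded by eight dummies.

```python
import math
from typing import Dict, List


def dummy_code(level: int, n_levels: int, prefix: str) -> Dict[str, int]:
    """Encode a level in 1..n_levels as n_levels - 1 dummy variables.

    The last level acts as the reference category (all dummies are 0);
    this choice of reference level is an assumption of the sketch.
    """
    if not 1 <= level <= n_levels:
        raise ValueError("level out of range")
    return {f"{prefix}_{k}": int(level == k) for k in range(1, n_levels)}


# Non-categorical variable names as listed in Table 2.
base_vars: List[str] = [
    "d_SI_low_risk", "d_SI_high_risk", "d_SI_stepst", "d_risks",
    "d_credits", "d_curracc", "d_card", "d_self_b", "nbr_p", "mon_val",
    "cross_b", "age", "d_gender", "d_region", "med_income",
    "ST", "nbr_cust", "sales_assort",
]

# Hypothetical customer: lifecycle stage 3 of 5, social status group 9 of 9.
lifecycle = dummy_code(3, 5, "d_lifec_stage")
social = dummy_code(9, 9, "d_soc_status")

predictors = base_vars + list(lifecycle) + list(social)

# Square-root rule for the number of variables tried at each split;
# rounding up is assumed because it reproduces the reported value of six.
m = math.ceil(math.sqrt(len(predictors)))
print(len(predictors), m)
```

Coding k levels with k − 1 dummies avoids a redundant column, which matters for the regression benchmarks; tree-based models would also accept the full set of k indicators.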

spending level and profitable lifetimes (Reinartz & Kumar, 2003). In this study, we also control for the customers’ total product ownership and monetary value.

3.2.1.4. Cross-buying. Cross-buying refers to the degree to which customers purchase products from different product categories offered by the company. In this study, we explicitly decided to create a cross-buying variable, since the investigated company is characterized by a large group of mono-product customers. As such, it offers a viable opportunity to investigate the impact of a higher share-of-wallet on both retention and profitability dependent variables.

3.2.2. Customer demographics

It is clear from previous research that accounting for observed customer heterogeneity is warranted. In this study, we control for the customer’s age, lifecycle stage, gender, geographical region, and some geo-demographic data.

3.2.2.1. Age and lifecycle stage. It is well known that customers’ financial-need priorities and resource availability vary at different stages of their lifecycle, and as such influence the quantity and the sequence in which financial products and services are acquired (Kamakura, Ramaswami, & Srivastava, 1991): e.g. in general, younger customers (i.e. the ‘bachelor’ stage) have less money to invest than older individuals (i.e. the ‘empty-nest’ and the ‘retirement’ stage). As such older people that belong to a later stage in their lifecycle are assumed to have more money available. In this study, the lifecycle stage consists of five stages; as such we create four dummy variables in order to express to which stage the customer belongs. A higher number corresponds with a later stage in the lifecycle.

3.2.2.2. Gender. As in most studies that account for customer demographic data, we also control for the customer’s gender. The variable ‘gender’ is operationalized as a dummy variable that receives the value of ‘1’ when the customer is male, and a ‘0’ when the customer is female.

3.2.2.3. Geographical region. The investigated company provides its financial products and services at the Belgian market. The variable region reflects a geographical cohort and is operationalized as a dummy variable. In general, the Belgian market can be divided into two large geographical areas: Flanders in the north and the Walloon part in the south. Besides the fact that each region has its own language (respectively, Dutch and French), the marketing department of the investigated company reveals that they also have a different way of doing business with a financial services company; that is Flemish people are known to be ‘savers’, whereas the Walloons make more use of personal loans to acquire the products they want. Previous research has also taken the geographical region into account. For example, Patterson and Smith (2003) investigated the propensity to remain with a service supplier for both Australia and Thailand and found a significant difference. In this study, we also account for this cohort information in order to test whether we observe some significant differences with respect to the profitability and retention proneness for Flemish versus Walloon customers.

3.2.2.4. Geo-demographic data. Besides customer demographic data gathered at the customer level, the company also buys some additional customer information that is gathered based on the place of residence (that is geo-demographic data). In this study, we analyze two different information items: the social status and the median income of the region of residence. The social status consists of nine groups. Therefore, we create eight dummy variables per customer in order to know to which categorical group a customer belongs. We examine whether these variables provide additional explanatory information with respect to the dependent variables we emphasize and, as a consequence, whether they are worth paying for from the company’s practical perspective.

3.2.3. Variables related to intermediaries

To date there is still a poor understanding of the impact of salespersons (or intermediaries) on customers’ behavior (Guenzi & Pelloni, 2004). Nevertheless, it seems important to investigate the salesperson’s role, since he acts as the crucial player who interacts with the company’s customers. In this study, we investigate three variables related to these intermediary agents: the selling tendency of the salesperson, the number of customers served by a salesperson and the sales assortment.

3.2.3.1. Selling tendency of the salesperson. In real life, it is reasonable to assume that not every intermediary is equally skilled in selling financial products and services to the company’s customers. With the variable ‘selling tendency’, we aim to explore the impact of a salesperson’s selling capabilities on both the customers’ profitability and retention proneness. The variable ‘selling tendency’ represents the number of products sold in relation to the number of customers served by a specific intermediary. The variable is created by using the information from 1 year preceding the date of 31 May 2003 (cf. Fig. 1); the higher the value for the variable the more products the intermediary had sold to its own customer base.

3.2.3.2. Number of customers served by the salesperson. Although many researchers have suggested that the performance of the salesperson during sales encounters is critical, many of the underlying mechanisms that govern the interaction between salespersons and customers are still unclear (Van Dolen, Lemmink, de Ruyter, & de Jong, 2002). In the financial services setting, it is plausible to believe that some customers experience less personal attention when a salesperson is serving a large customer base, and as a

consequence is unable to know each customer personally. In models. For all three binary classification targets, we
this study, we account for this information and investigate its observe a significant (DeLong et al., 1988) and better
impact on both customer’s behavior and profitability. performance in favor of the random forests (cf. all p-values
range between <0.0001 and 0.025). Even for the next-buy
3.2.3.3. Sales assortment. The ‘sales assortment’ represents classification, we find a significant difference although the
the product variety offered by the salesperson. In their study, increase in prediction accuracy is rather low; that is an AUC
Hoch, Bradlow, and Wansink (1999) state that the variety in improvement of 0.006 (0.005) for the validation (esti-
offerings is viewed as the entree fee for maintaining future mation) sample. With respect to the predictive performance
customer loyalty. With respect to the investigated financial of the profit drop target variable, we observe a significant
services company, not every intermediary is selling the difference in AUC of 0.019 and 0.016 for, respectively, the
whole range of financial products to its customers; that is validation and estimation sample. In this study, the most
some typical ‘banking’ intermediaries solely supply bank- important and outperforming prediction accuracy of random
ing products, whereas some others only sell a limited variety forests can be found in the active partial-defection analysis,
of insurance products to their customers. As such, it is where we observe an AUC improvement of 0.106 (0.094)
possible that some customers are unable to acquire all for the validation (estimation) sample when benchmarking
financial products and services they need with their current its performance against a logistic regression model.
salesperson. In this study, we explore its impact on both In sum, the classification findings of this study indicate
retention and future customer profitability. the viable opportunity for both academics and practitioner
to consider other than the conventional prediction tech-
4. Findings niques (such as logistic regression) when investigating a
binary-classification problem. Especially, when the
The next paragraphs present the findings of the study. obtained goodness-of-fit indices based on conventional
First, we report the prediction accuracies of the various prediction models perform rather low—indicating that there
models. Next, we present the relative importance of each is more room for improvement—it is appealing to
explanatory variable with respect to the four dependent investigate whether other prediction techniques (such as
variables under investigation. Finally, we further examine random forests) perform better, since each major improve-
the signs of the 10 most important covariates for each target ment in predictive accuracy is likely to represent major
variable by means of some descriptive statistics.
shifts in terms of the effectiveness and the return on
4.1. Performance evaluation investment of marketing actions—that are based on
prediction models.
The evaluation criteria applied to investigate the In order to evaluate the performance of the linear
predictability of the four dependent variables are presented dependent variable, we use the mean absolute deviation
in Table 3. (MAD) criterion (cf. Section 2.3). The MAD for the
It is clear from the table that random forests provide regression forests model amounts to 5 (more specifically,
better prediction accuracies compared to logistic regression 5.099 for the test sample and 4.940 for the estimation
Table 3
Performance results

Dependent variable        Technique            AUC (train)  AUC (test)
Next buy                  Random forests       0.752 (a)    0.751 (b)
                          Logistic regression  0.747 (a)    0.745 (b)
Active partial-defection  Random forests       0.734 (c)    0.742 (d)
                          Logistic regression  0.640 (c)    0.636 (d)
Profit drop               Random forests       0.713 (e)    0.714 (f)
                          Logistic regression  0.697 (e)    0.695 (f)

Dependent variable        Technique            MAD (train)  MAD (test)
Profit evolution          Regression forests   4.940        5.099
                          Linear regression    5.346        5.445

(a) Chi2 = 5.057; df = 1; p = 0.025.
(b) Chi2 = 7.157; df = 1; p = 0.007.
(c) Chi2 = 305.470; df = 1; p < 0.0001.
(d) Chi2 = 383.140; df = 1; p < 0.0001.
(e) Chi2 = 114.528; df = 1; p < 0.0001.
(f) Chi2 = 145.970; df = 1; p < 0.0001.

sample), meaning that on average we obtain a prediction ‘error’ of five profitability points per customer. In contrast, when evaluating the performance of the ordinary linear regression model, we observe an average MAD of 5.4 (that is, 5.454 for the test sample and 5.346 for the estimation sample), which corresponds to a decline of approximately 7% in terms of prediction accuracy. As such, we find evidence in this study that regression forests outperform traditional linear regression techniques.

4.2. Relative importance indices for the explanatory variables

As stated in Section 2, a welcome feature of the random forests techniques is the importance measures for the explanatory variables. In Table 4, we report these importance indices with regard to each dependent variable of the study. The first three subsections in the table present the importance measures with respect to the event of a next buy, active partial-defection and profit drop, respectively, whereas the last part of the table reports the importance measures for the profit evolution target variable that is analyzed by means of regression forests. Per subsection, we ranked the variables in terms of their corresponding importance level.

A number of interesting findings emerge from the table. In the next paragraphs, we elaborate on the top-10 variables for each dependent variable in terms of their ranking.

4.2.1. Next buy
With respect to the binary classification variable that expresses the customer’s likelihood to buy another product, we observe that past customer behavior variables in particular drive subsequent repeat-purchase behavior. It is clear that specific product ownership has a major influence; more specifically: the possession of savings and investment (SI) products characterized by high risks (e.g. stock-market products), the possession of a current account and bank cards, as well as the possession of risk products. Furthermore, we observe the importance of the customer’s monetary value, the total product ownership and whether a customer owns products from different categories (that is, cross-buying). In terms of demographic variables, two variables belong to the top-10 list: the customer’s age and the indicator for individuals in their senior lifecycle stage (that is, d_lifec_stage_4). With respect to the salesperson’s role, the selling tendency of the intermediary seems to have a significant and major influence. In sum, with respect to the customer’s next buy decision, we can conclude that the stage in the lifecycle as well as the decision to be an active (cf. possession of cards, current account, etc.) and loyal customer (measured by means of cross-buying, monetary value and total product ownership), supported by the intermediary’s selling skills, are responsible for future purchase behavior.

4.2.2. Active partial-defection
With regard to the active partial-defection prediction, it is striking that the three explanatory variables related to the salespeople show the highest importance measures. In particular, the selling tendency variable, which is intuitively more affiliated with the previous dependent variable (next buy), appears to be the number one predictor in terms of importance, whereas the variable was attributed a 10th place for the next buy dependent variable. As such, the selling capabilities of the agent do not only influence the customers’ purchase decisions, they also strongly drive customers’ vulnerability to cancel non-ending status products. Furthermore, it is clear that the monetary value, the total product ownership, the customer’s cross-buying behavior and age belong to the top-10 list of the most important variables. Moreover, Table 4 reveals that these four variables are the only ones that appear in the top 10 for all the dependent variables under investigation in this study, suggesting their importance for a variety of CRM applications. Table 4 also reveals the importance of risk products such as fire and car insurances as well as SI products characterized by high capital and revenue risks. Finally, the cohort variable indicating whether a customer belongs to the Walloon or Flemish part of Belgium seems to influence active partial-defection. As such, we find evidence that besides the difference in language, there also exist differences with respect to the cancellation of financial services products.

4.2.3. Profit drop
For the profit drop dependent variable, we observe that again the past customer behavior variables play an important role as well as some customer demographic variables. On the other hand, none of the variables related to the intermediaries show up in the top-10 list. With respect to the past behavior variables, we observe a stronger influence of being an ‘active’ customer, since the self-banking variable has joined the top-10 importance list. As a consequence, our findings give evidence that the interactiveness of customers strongly influences the company’s future profits.

4.2.4. Profit evolution
A first thing that attracts attention when investigating the importance indices for the profit evolution variable is that more than half of the explanatory variables reveal no significant impact on the prediction of customers’ profit evolutions. Hence, our results suggest that it is more difficult to understand the customer’s absolute changes in profitability, compared to the deduced format that only indicates a customer’s profit direction by means of a binary classification. With respect to the most important variables, we find results similar to those for the binary profit drop variable; that is, past customer behavior variables are most important, followed by customer demographic variables.
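
As a quick check on the performance figures reported in Section 4.1, the gains can be recomputed from the Table 3 values. The sketch below is illustrative only: the dictionary layout and variable names are assumptions introduced here, not part of the original analysis, and the linear regression test MAD is taken from Table 3 (5.445).

```python
# Values transcribed from Table 3 as (train, test) pairs.
# The dictionary layout and names are illustrative assumptions.
auc = {
    "next buy": {"random_forests": (0.752, 0.751), "logistic": (0.747, 0.745)},
    "active partial-defection": {"random_forests": (0.734, 0.742), "logistic": (0.640, 0.636)},
    "profit drop": {"random_forests": (0.713, 0.714), "logistic": (0.697, 0.695)},
}
mad = {"regression_forests": (4.940, 5.099), "linear": (5.346, 5.445)}

# AUC gain of random forests over logistic regression, per sample.
auc_gain = {
    target: tuple(
        round(rf - lr, 3)
        for rf, lr in zip(values["random_forests"], values["logistic"])
    )
    for target, values in auc.items()
}

# Relative increase in MAD (in %) when linear regression replaces regression forests.
mad_increase_pct = tuple(
    round(100 * (lin - rf) / rf, 2)
    for rf, lin in zip(mad["regression_forests"], mad["linear"])
)

for target, (train_gain, test_gain) in auc_gain.items():
    print(f"{target}: train +{train_gain:.3f}, test +{test_gain:.3f}")
print("MAD increase in % (train, test):", mad_increase_pct)
```

With these inputs the test-sample MAD rises by a little under 7% when linear regression replaces regression forests, which is in line with the approximate figure quoted in Section 4.1.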

Table 4
Importance of variables

     Random forests                                                                                  Regression forests
     Dependent variable = Next buy   Dependent variable = Active     Dependent variable = Profit     Dependent variable = Profit
                                     partial-defection               drop                            evolution
No.  Importance (a) Variable         Importance     Variable         Importance     Variable         Importance     Variable
1    149.536        d_SI_high_risk   273.260        ST               174.616        mon_val          1.490          d_credits
2    147.642        d_curracc        240.702        nbr_cust         148.870        d_curracc        1.004          age
3    141.566        age              191.625        sales_assort     125.883        d_card           0.623          mon_val
4    128.041        mon_val          167.320        age              114.565        nbr_p            0.575          nbr_p
5    124.833        nbr_p            144.917        mon_val          95.953         age              0.273          cross_b
6    118.025        d_card           137.873        nbr_p            95.284         cross_b          0.232          nbr_cust
7    106.901        d_risks          132.024        d_region         94.376         d_credits        0.171          d_curracc
8    92.611         d_lifec_stage_4  126.118        d_risks          89.111         d_lifec_stage_4  0.144          d_soc_status_6
9    91.995         cross_b          123.324        cross_b          83.848         d_self_b         0.137          d_risks
10   75.584         ST               94.772         d_SI_high_risk   81.666         d_lifec_stage_2  0.091          d_lifec_stage_2
11   74.022         d_SI_low_risk    93.972         d_lifec_stage_2  79.941         sales_assort     0.039          d_card
12   73.556         d_credits        88.316         d_credits        72.971         d_region         0.009          d_self_b
13   70.177         nbr_cust         86.139         d_card           72.587         d_risks          0              ST
14   67.816         med_income       85.948         med_income       69.856         d_SI_high_risk   0              sales_assort
15   62.856         d_self_b         85.570         d_curracc        69.758         nbr_cust         0              med_income
16   61.791         d_lifec_stage_2  72.341         d_lifec_stage_4  64.157         d_SI_low_risk    0              d_SI_low_risk
17   57.312         d_lifec_stage_3  66.582         d_SI_low_risk    59.016         ST               0              d_SI_high_risk
18   54.640         sales_assort     63.783         d_lifec_stage_3  54.052         med_income       0              d_SI_stepst
19   54.232         d_SI_stepst      59.480         d_soc_status_6   47.041         d_lifec_stage_3  0              d_gender
20   36.329         d_region         54.517         d_self_b         39.008         d_SI_stepst      0              d_soc_status_1
21   28.606         d_soc_status_1   52.649         d_soc_status_8   38.321         d_soc_status_8   0              d_soc_status_2
22   25.130         d_soc_status_7   39.375         d_soc_status_2   25.617         d_soc_status_7   0              d_soc_status_3
23   21.529         d_soc_status_2   37.604         d_SI_stepst      22.509         d_soc_status_3   0              d_soc_status_4
24   19.544         d_soc_status_6   35.199         d_soc_status_1   21.363         d_soc_status_6   0              d_soc_status_5
25   14.317         d_soc_status_3   32.463         d_soc_status_7   17.138         d_soc_status_2   0              d_soc_status_7
26   8.789          d_soc_status_8   30.958         d_soc_status_3   13.215         d_soc_status_1   0              d_soc_status_8
27   4.66           d_lifec_stage_1  12.634         d_soc_status_5   11.699         d_soc_status_4   0              d_lifec_stage_1
28   4.193          d_gender         9.913          d_gender         10.836         d_lifec_stage_1  0              d_lifec_stage_3
29   0              d_soc_status_4   9.059          d_lifec_stage_1  9.72           d_gender         0              d_lifec_stage_4
30   0              d_soc_status_5   6.666          d_soc_status_4   4.288          d_soc_status_5   0              d_region

(a) An importance measure of ‘0’ represents no significant impact of the corresponding explanatory variable on the target variable of investigation.
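
The overlap between the top-10 lists discussed in Section 4.2 can be checked directly from Table 4. The following is a minimal sketch with the top-10 variable names transcribed from the table; the dictionary keys and the `intermediary_vars` grouping are illustrative names introduced here, not identifiers from the original study.

```python
# Top-10 variables per dependent variable, transcribed from Table 4.
top10 = {
    "next_buy": ["d_SI_high_risk", "d_curracc", "age", "mon_val", "nbr_p",
                 "d_card", "d_risks", "d_lifec_stage_4", "cross_b", "ST"],
    "active_partial_defection": ["ST", "nbr_cust", "sales_assort", "age", "mon_val",
                                 "nbr_p", "d_region", "d_risks", "cross_b",
                                 "d_SI_high_risk"],
    "profit_drop": ["mon_val", "d_curracc", "d_card", "nbr_p", "age", "cross_b",
                    "d_credits", "d_lifec_stage_4", "d_self_b", "d_lifec_stage_2"],
    "profit_evolution": ["d_credits", "age", "mon_val", "nbr_p", "cross_b",
                         "nbr_cust", "d_curracc", "d_soc_status_6", "d_risks",
                         "d_lifec_stage_2"],
}

# Variables that appear in the top 10 of every dependent variable.
common_all = set.intersection(*(set(names) for names in top10.values()))

# The three salesperson-related variables (selling tendency, number of
# customers served, sales assortment).
intermediary_vars = {"ST", "nbr_cust", "sales_assort"}
in_profit_drop_top10 = intermediary_vars & set(top10["profit_drop"])

print(sorted(common_all))
print(sorted(in_profit_drop_top10))
```

Under this transcription, the intersection consists of age, cross-buying, monetary value and total product ownership, and none of the intermediary variables reaches the profit drop top 10, which matches the statements in Sections 4.2.2 and 4.2.3.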

4.3. Investigation of the direction of impact on the dependent variables

While Section 4.2 provides a clear understanding of the explanatory variables that have a strong impact on the four dependent variables of this study, the directions of these impacts are still unknown. For example, the variable ‘d_region’ plays an important role in the prediction of active partial-defection, but nevertheless we have no indication whether Flemish customers, in contrast to their Walloon counterparts, are less or more likely to defect. Hence, we decided to perform some additional descriptive analyses to gain insight into the direction of the most important explanatory variables. Analogous to Section 4.2, we focus on the top-10 most important predictors and we only investigate the binary target variables. Table 5 summarizes the descriptive statistics. In fact, we analyze two strata (e.g. next buyers or not) and examine whether we observe a statistically significant difference with respect to the 10 most important variables. For the binary explanatory variables, we apply simple chi-square statistics, whereas T-tests are performed for the other covariates.

While most explanatory variables have the expected sign, some other findings deserve further explanation. In the next paragraphs we briefly summarize the most intriguing findings of Table 5.

Section 4.2 revealed the importance of the past behavior variables, such as total product ownership, cross-buying, monetary value and specific product ownership; in this extended analysis we find that all these explanatory variables have a positive association with all the events under investigation: that is, next buy, active partial-defection and profit drop. The latter finding implies that, for instance, customers with higher monetary value or individuals that possess more products from different categories (cross-buying) are not only more likely to buy new products in the future (next buy), they are also more vulnerable to cancel other products with a non-ending status (active partial-defection), which probably results in a negative profitability evolution (profit drop). In sum, our findings suggest the

Table 5
Descriptive statistics for the most important explanatory variables

Strata: next buyers versus no next buyers
No.  Variable         Next buyers  No next buyers  p-value  Description
1    d_SI_high_risk   26.78%       12.80%          <0.0001  Percentage of customers that possess SI products characterized by high risks
2    d_curracc        55.69%       37.74%          <0.0001  Percentage of customers that own a current account
3    age              50.70        44.32           <0.0001  Mean age per strata
4    mon_val          14,849       7380.9          <0.0001  Mean monetary value per strata
5    nbr_p            11.08        4.58            <0.0001  Mean number of products owned by the customer
6    d_card           40.29%       23.88%          <0.0001  Percentage of customers that possess cards
7    d_risks          24.82%       8.28%           <0.0001  Percentage of customers that own risk products
8    d_lifec_stage_4  34.06%       27.70%          <0.0001  Percentage of customers that belong to lifecycle stage 4
9    cross_b          2.99         1.87            <0.0001  Mean number of cross-buyings per strata
10   ST               34.70        31.24           <0.0001  Mean selling tendency of the customer’s intermediary

Strata: active partial-defectors versus no active partial-defectors
No.  Variable         Active partial-defectors  No active partial-defectors  p-value  Description
1    ST               30.99                     31.76                        0.025    Mean selling tendency of the customer’s intermediary
2    nbr_cust         1367                      1540.5                       <0.0001  Mean number of customers served by the salesperson
3    sales_assort     11.837                    11.987                       <0.0001  Mean number of different product categories sold by the intermediary
4    age              47.18                     44.99                        <0.0001  Mean age per strata
5    mon_val          12,198                    8104                         <0.0001  Mean monetary value per strata
6    nbr_p            12.70                     4.91                         <0.0001  Mean number of products owned by the customer
7    d_region         73.60%                    70.54%                       <0.0001  Percentage of customers belonging to the Flemish part of Belgium
8    d_risks          17.50%                    9.96%                        <0.0001  Percentage of customers that own risk products
9    cross_b          2.49                      1.99                         <0.0001  Mean number of cross-buyings per strata
10   d_SI_high_risk   24.61%                    13.93%                       <0.0001  Percentage of customers that possess SI products characterized by high risks

Strata: profit droppers versus no profit droppers
No.  Variable         Profit droppers  No profit droppers  p-value  Description
1    mon_val          9964.8           7731.1              <0.0001  Mean monetary value per strata
2    d_curracc        63.48%           30.81%              <0.0001  Percentage of customers that own a current account
3    d_card           43.94%           18.93%              <0.0001  Percentage of customers that possess cards
4    nbr_p            5.79             5.31                0.002    Mean number of products owned by the customer
5    age              46.86            44.42               <0.0001  Mean age per strata
6    cross_b          2.37             1.89                <0.0001  Mean number of cross-buyings per strata
7    d_credits        24.40%           12.56%              <0.0001  Percentage of customers that possess credit products
8    d_lifec_stage_4  27.21%           29.08%              <0.0001  Percentage of customers that belong to lifecycle stage 4
9    d_self_b         16.74%           7.42%               <0.0001  Percentage of customers doing self-banking
10   d_lifec_stage_2  31.96%           35.39%              <0.0001  Percentage of customers that belong to lifecycle stage 2

existence of a typical group of active customers that are constantly buying and defecting on financial products.

Furthermore, with respect to the variables related to the salesperson in the active partial-defection case, Table 5 reveals rather small (but significant) differences when comparing defectors versus non-defectors. Given the fact that these variables nevertheless represent the top three of most important variables, we can ascertain the need to consider the intermediaries’ role when trying to understand typical customer behavior outcomes, since even small improvements in, for example, the intermediary’s selling capabilities or the sales assortment are likely to result in favorable customer behaviors. With regard to the ‘number of customers served’ variable, we find the opposite effect of what was hypothesized; that is, customers belonging to larger agencies show lower active partial-defection rates. A possible explanation might be found in the fact that serving fewer customers is just the result (instead of the ‘cause’) of customer defections in the past. Furthermore, it is also plausible to assume that intermediaries who serve fewer customers experience heavier competition in their immediate vicinity, such that their customers have more alternatives to switch to. Another explanation might be that customers perceive large agencies as more reliable, and as a consequence prefer them above smaller agencies. Further research on this issue is warranted.

Finally, when we consider the customer lifecycle stage, we observe that seniors (d_lifec_stage_4) are more likely to repurchase, but less vulnerable to a decrease in their profitability. Also, families with young children show evidence of positive profitability evolutions. As a consequence, the other categories, such as the youngsters and the midlife category, are mainly responsible for the negative profit evolutions. Hence, it is crucial for financial services companies to gain a better understanding of these typical lifecycle stages such that appropriate and proactive actions can be taken to guarantee the company’s future profits.

5. Discussion

This study investigates two typical and major outcomes of customer relationship management (CRM): customer retention and profitability. For the first outcome, we analyze two different measures: the opening of a new product (next buy) and the decision to cancel a product with a non-ending status (active partial-defection). With respect to the latter outcome, we investigate how customers evolve in terms of the profitability they represent for the company by means of a binary (profit drop) and a linear (profit evolution) dependent variable. More specifically, the first three measures involve a binary classification problem and are analyzed by using random forests; for the last target variable (profit evolution), we applied regression forests.

Our research findings support previous studies that favor the use of random forests techniques. In this study, we observe significant improvements in terms of prediction accuracy when benchmarking the random and regression forests against the conventional logistic and linear regression models.

Another interesting feature of the random forests technique concerns the produced importance measures, which indicate the variables that have the greatest impact on the dependent variable of investigation. In this study, we find evidence that past customer behavior variables play an important role in predicting future customer behavior and profitability. Another important finding of the study is the relative importance of the variables related to intermediaries with respect to the active partial-defection classification. It is clear that good selling agents not only generate more repeat purchases, they also indirectly prevent customers from (partial) defection. The same logic applies to the sales assortment of the salesperson. For the company of investigation, it offers a viable opportunity to encourage its salesforce to supply the whole range of financial products and services, since a limited sales assortment is likely to stimulate customer-switching behavior. With respect to the customer demographic variables, our findings reveal the importance of the customer’s age and lifecycle stage. On the other hand, the customer’s gender and the geographical data gathered at the place of residence level are less powerful in terms of predicting customer retention and profitability, although they report significant associations with the binary dependent variables.
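
The frequency tables in Appendix A report p-values for 2×2 cross-tabulations of the binary outcomes. One common way to obtain such a p-value is Pearson's chi-square test, which Section 4.3 also mentions for binary variables. As a hedged illustration, the sketch below recomputes that statistic for the next buy × active partial-defection counts from Appendix A. It assumes no continuity correction (the paper does not state which variant was used), and the function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(table):
    """Return (chi2, p_value) for a 2x2 table of observed counts (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For df = 1, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from Appendix A: rows = next buy (no, yes),
# columns = active partial-defection (no, yes).
counts = [[81671, 5043], [11523, 1763]]
chi2, p_value = pearson_chi2_2x2(counts)

# Share of next buyers among defectors and non-defectors (column percentages).
buy_rate_defectors = 100 * counts[1][1] / (counts[0][1] + counts[1][1])
buy_rate_non_defectors = 100 * counts[1][0] / (counts[0][0] + counts[1][0])

print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
print(f"next-buy rate: {buy_rate_defectors:.2f}% (defectors) "
      f"vs {buy_rate_non_defectors:.2f}% (non-defectors)")
```

The two column percentages reproduce the 25.90% and 12.36% reported in Appendix A, and the resulting p-value is far below 0.0001.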

Furthermore, when comparing the three binary classification outcomes and their most important predictors, it is striking that four of the top-10 variables are the same: total product ownership, monetary value, cross-buying and the customer’s age. Moreover, when exploring their impact on the dependent variables by means of descriptive statistics, we observe the same positive impact on next buy, active partial-defection and profit drop. As such, we find evidence that the same set of variables is likely to generate both next-buy and defection behavior in terms of profits and products. These intriguing findings suggest the existence of a highly active customer segment that is buying new products while it is switching on other financial products, and invite us to perform some extra analyses that relate the next buy with the active partial-defection and customer profitability variables.

In Appendix A, we present the statistics for the next buy versus active partial-defection versus customer profitability triad. The statistics indicate that more than 25% of the active partial-defectors also bought a new product within the same period of observation, compared to a buying percentage of 12.36% for the customers who did not cancel a non-ending product. As such, we find support for our theory that the company contains a typical segment of active customers that constantly replace old products by newer ones. Another striking finding concerns the link between next buy and customers’ profit evolutions. It is clear from Appendix A that more than 35% of the customers who bought a new product also experienced a profit drop, whereas their counterparts who did not repurchase report lower percentages for the profit drop variable (that is, 27.44%). Fortunately, in terms of absolute profitability shifts (cf. profit evolution), we do not observe a statistically significant difference depending on whether customers purchased a next product. With respect to the relationship between active partial-defection and customers’ profit evolution, we observe the dramatic impact of customers’ decision to cancel a non-ending product on their profitability evolution. Appendix A reveals that almost 70% of the active partial-defectors experienced a profit drop, while approximately one quarter (25.56%) of the people who did not defect on products showed a negative evolution with regard to the revenues they represent for the company. Similar conclusions can be derived for the profit evolution variable. In summary, the latter findings are in line with previous research studies that underscore the impact of customer retention on a company’s profitability: ‘It is important to retain existing customers’. Finally, when linking the two profit evolution variables with each other, we confirm our descriptive findings resulting from Table 1 (cf. Section 3.1): on average the extent to which customers experience profit drops (in terms of absolute profitability points) is more intense than the extent to which other customers are able to grow in profits (that is, −8.72 versus +1.60 profitability points, respectively). In sum, just as the well-known claim that ‘it is important to retain existing customers’, our research findings extend the same analogy with regard to customers’ profitability: ‘It is more profitable to retain the most profitable customers of the company’.

Acknowledgements

The authors would like to thank the anonymous company that supplied the data to perform this research study. Moreover, we are grateful to Leo Breiman for the public availability of the random forests and regression forests software.

Appendix A. Investigation of the next buy–partial-defection–customer profitability triad

Frequency table of next buy × active partial-defection

Frequency
Row%                     Active partial-defection
Column%                  No         Yes        Total
Next buy       No        81,671     5043       86,714
                         94.18%     5.82%
                         87.64%     74.10%
               Yes       11,523     1763       13,286
                         86.73%     13.27%
                         12.36%     25.90%
Total                    93,194     6806       100,000
p-value < 0.0001.

Frequency table of next buy × profit drop

Frequency
Row%                     Profit drop
Column%                  No         Yes        Total
Next buy       No        62,923     23,791     86,714
                         72.56%     27.44%
                         88.02%     83.43%
               Yes       8561       4725       13,286
                         64.44%     35.56%
                         11.98%     16.57%
Total                    71,484     28,516     100,000
p-value < 0.0001.

Frequency table of active partial-defection × profit drop

Frequency
Row%                     Profit drop
Column%                  No         Yes        Total
Active         No        69,372     23,822     93,194
partial-                 74.44%     25.56%
defection                97.05%     83.54%
               Yes       2112       4694       6806
                         31.03%     68.97%
                         2.95%      16.46%
Total                    71,484     28,516     100,000
p-value < 0.0001.

T-tests for the profit evolution variables

Strata                        Mean profit evolution
Next buyers                   −1.83 (ns)
No next buyers                −1.31 (ns)
Active partial-defectors      −4.45 (a)
No active partial-defectors   −1.11 (a)
Profit droppers               −8.72 (a)
No profit droppers            1.60 (a)

ns, not statistically significant.
(a) Statistically significant at <0.0001.

References

Athanassopoulos, A. D. (2000). Customer satisfaction cues to support market segmentation and explain switching behavior. Journal of Business Research, 47(3), 191–207.
Baesens, B., Verstraeten, G., Van den Poel, D., Egmont-Petersen, M., Van Kenhove, P., & Vanthienen, J. (2004). Bayesian network classifiers for identifying the slope of the customer lifecycle of long-life customers. European Journal of Operational Research, 156(2), 508–523.
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., & Dedene, G. (2002). Bayesian neural network learning for repeat purchase modelling in direct marketing. European Journal of Operational Research, 138(1), 191–211.
Bhattacharya, C. B. (1998). When customers are members: Customer retention in paid membership contexts. Journal of the Academy of Marketing Science, 26(1), 31–44.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Buckinx, W., & Van den Poel, D. (2005). Customer base analysis: Partial defection of behaviourally-loyal clients in a non-contractual FMCG retail setting. European Journal of Operational Research, 164(1), 252–268.
Colgate, M. R., & Danaher, P. J. (2000). Implementing a customer relationship strategy: The asymmetric impact of poor versus excellent execution. Journal of the Academy of Marketing Science, 28(3), 375–387.
DeLong, E. R., DeLong, D. M., & Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics, 44(3), 837–845.
Deng, Y. P., Chen, H. S., Tao, L., Sha, Q. Y., Chen, J., Tsai, C. J., et al. (2004). Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes. BMC Bioinformatics, 5(81), 1–12.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification. New York: Wiley.
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457), 77–87.
Ganesan, S. (1994). Determinants of long-term orientation in buyer–seller relationships. Journal of Marketing, 58(2), 1–19.
Ganesh, J., Arnold, M. J., & Reynolds, K. E. (2000). Understanding the customer base of service providers: An examination of the differences between switchers and stayers. Journal of Marketing, 64(3), 65–87.
Guenzi, P., & Pelloni, O. (2004). The impact of interpersonal relationships on customer satisfaction and loyalty to the service provider. International Journal of Industry Management, 15(3/4), 365–384.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36.
Hoch, S. J., Bradlow, E. T., & Wansink, B. (1999). The variety of an assortment. Marketing Science, 18(4), 527–546.
Hsieh, N.-C. (2004). An integrated data mining and behavioral scoring model for analyzing bank customers. Expert Systems with Applications, 27(4), 623–633.
Huber, C. P., Lane, K. R., & Pofcher, S. (1998). Format renewal in banks – it’s not that easy. McKinsey Quarterly, 1998(2), 148–156.
Hwang, H., Jung, T., & Suh, E. (2004). An LTV model and customer segmentation based on customer value: A case study on the wireless telecommunication industry. Expert Systems with Applications, 26(2), 181–188.
Ishwaran, H., Blackstone, E. H., Pothier, C. E., & Lauer, M. S. (2004). Relative risk forests for exercise heart rate recovery as a predictor of mortality. Journal of the American Statistical Association, 99(467), 591–600.
Kamakura, W. A., Ramaswami, S. N., & Srivastava, R. K. (1991). Applying latent trait analysis in the evaluation of prospects for cross-selling of financial services. International Journal of Research in Marketing, 8(4), 329–349.
Larivière, B., & Van den Poel, D. (2004). Investigating the role of product features in preventing customer churn, by using survival analysis and choice modeling: The case of financial services. Expert Systems with Applications, 27(2), 277–285.
Lewis, B. R., & Bingham, G. H. (1991). The youth market for financial services. International Journal of Bank Marketing, 9(2), 3–11.
Luo, T., Kramer, K., Goldgof, D. B., Hall, L. O., Samson, S., Remsen, A., et al. (2004). Recognizing plankton images from the shadow image particle profiling evaluation recorder. IEEE Transactions on Systems Man and Cybernetics Part B: Cybernetics, 34(4), 1753–1762.
McNeal, J. U. (1999). The kids market: Myths and realities. Ithaca, New York: Paramount Market Publishing.
Patterson, P. G., & Smith, T. (2003). A cross-cultural study of switching barriers and propensity to stay with service providers. Journal of Retailing, 79(2), 107–120.
Reinartz, W. J., & Kumar, V. (2000). On the profitability of long-life customers in a noncontractual setting: An empirical investigation and implications for marketing. Journal of Marketing, 64(4), 17–35.
Reinartz, W. J., & Kumar, V. (2003). The impact of customer relationship characteristics on profitable lifetime duration. Journal of Marketing, 67(1), 77–99.
Van den Poel, D., & Larivière, B. (2004). Customer attrition analysis for financial services using proportional hazard models. European Journal of Operational Research, 157(1), 196–217.
Van Dolen, W., Lemmink, J., de Ruyter, K., & de Jong, A. (2002). Customer-sales employee encounters: A dyadic perspective. Journal of Retailing, 78(4), 265–279.
