Emad Hanif
10374354
DECLARATION
I, Emad Hanif, declare that this research is my original work and that it has never been presented to any institution or university for the award of a degree or diploma. In addition, I have correctly referenced all literature and sources used in this work, and this work is fully compliant with Dublin Business School’s academic honesty policy.
Date: 07-01-2019
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my supervisor, Dr. Shahram Azizi Sazi, who built the foundations of my work through his “Research Methods” modules, for his guidance, encouragement, and gracious support throughout the course of my work, and for his expertise in the field that motivated me to work in this area.
I would like to thank Terri Hoare, instructor for “Data Mining” and John O’Sullivan, instructor
for “Programming for Data Analysis, Processing and Visualization”, who both taught me
several important concepts.
I would also like to thank Anita Dwyer, Postgraduate Programme Coordinator, who was always helpful and quick to clarify and resolve any query.
Finally, I dedicate my work to my mother, who motivated me to pursue a Master’s degree and who always supported me through her prayers and her financial and moral support, especially during my illness and difficulties.
ABSTRACT
Customer Churn is a critical point of concern for organizations in the telecommunications
industry. It is estimated that this industry has an approximate annual churn rate of 30% leading
to a huge loss of revenue for organizations every year. Even though the telecom industry was
one of the first adopters of data mining and machine learning techniques to gain meaningful
insights from large sets of data, the issue of customer churn remains prevalent in this industry. This thesis presents a predictive analytics approach to predicting customer churn in the telecom industry, as well as the application of a technique typically used in retail contexts known as “cross-selling” or “market basket analysis”.
A publicly available telecom dataset was used for the analysis. K-Nearest Neighbor, Decision
Tree, Naïve Bayes and Random Forest were the four classification algorithms that were used
to predict customer churn in RapidMiner and R. Apriori and FP-Growth were implemented in RapidMiner to understand the associations between the attributes in the dataset. The results show that Decision Tree and Random Forest are the two most accurate algorithms for predicting customer churn. The “cross-selling” results show that association algorithms are a practical way to discover associations between the items and services offered in this industry. The
discovery of patterns and frequent item sets can be used by telecom companies to engage
customers and offer services in a unique manner that is beneficial to their operation.
Overall, the key drivers of churn are identified in this study and useful associations between
products are established. This information can be used by companies to create personalised
offers and campaigns for customers who are at risk of churning. The study also shows that
association rules can help in identifying usage patterns, buying preferences, and socio-economic influences of customers.
ACKNOWLEDGEMENTS
ABSTRACT
2.2 Research Model of Churn Prediction Based on Customer Segmentation and Misclassification Cost in the Context of Big Data (Yong Liu and Yongrui Zhuang, 2015)
2.3 Analysis and Application of Data Mining Methods used for Customer Churn in Telecom Industry (Saurabh Jain, 2016)
2.4 A Survey on Data Mining Techniques in Customer Churn Analysis for Telecom Industry (Amal M. Almana, Mehmet Sabih Aksoy, Rasheed Alzahrani, 2014)
2.5 Mining Big Data in Telecommunications Industry: Challenges, Techniques, and Revenue Opportunity (Hoda A. Abdel Hafez, 2016)
2.6 Improved Churn Prediction in Telecommunication Industry Using Data Mining Techniques (A. Keramati, R. Jafari-Marandi, M. Aliannejadi, I. Ahmadian, M. Mozzafari, U. Abbasi, 2014)
2.7 Predict the Rotation of Customers in the Mobile Sector Using Probabilistic Classifiers in Data Mining (Clement Kirui, Li Hong, Wilson Cheruiyot and Hillary Kirui, 2013)
2.8 A Proposed Model of Prediction of Abandonment (Essam Shaaban, Yehia Helmy, Ayman Khedr, Mona Nasr, 2012)
2.9 Telecommunication Subscribers' Churn Prediction Model Using Machine Learning (Saad Ahmed Qureshi, Ammar Saleem Rehman, Ali Mustafa Qamar, Aatif Kamal, Ahsan Rehman, 2013)
2.11 Crunch Time: Using Big Data to Boost Telco Marketing Capabilities (Holger Hurtgen, Samba Natarajan, Steven Spittaels, Ole Jorgen Vetvik, Shaowei Ying, 2012)
3.5.2 R
CHAPTER 5 – CONCLUSION
5.1 Introduction
REFERENCES
APPENDIX
TABLE OF FIGURES
Figure 1: Types of Churners (Source: Saraswat and Tiwari, 2018)
Figure 2: Data Mining Process (Han et al., 2011)
Figure 3: Hybrid Methodology (Source: Keramati et al., 2014)
Figure 4: NPTB Recommendation Engine (Source: Hurtgen et al., 2012)
Figure 5: k-Nearest Neighbour Algorithm (Bronshtein, 2017)
Figure 6: Decision tree showing survival probability of passengers on the Titanic ship (Source: Milborrow, 2011)
Figure 7: Artificial Neural Networks example (Source: McDonald, 2017)
Figure 8: FP-Tree of the example (Source: Kotu and Deshpande, 2014)
Figure 9: Customer Churn by Gender and Type of Contract
Figure 10: Treemap of Customer Churn, Tenure and Monthly Charges
Figure 11: Customer Churn by Gender and Payment Method
Figure 12: Customer Churn by Tenure
Figure 13: Auto Model Overview
Figure 14: Auto Model Select Inputs
Figure 15: Model Types in Auto Model
Figure 16: Auto Model Results Screen
Figure 17: Auto Model Simulator
Figure 18: Auto Model Simulator
Figure 19: k-Nearest Neighbor: How to Implement in RapidMiner with Split Validation
Figure 20: k-Nearest Neighbor: How to Implement in RapidMiner with Split Validation
Figure 21: k-Nearest Neighbor: Performance Vector
Figure 22: k-Nearest Neighbor: How to Implement in RapidMiner with Split Data
Figure 23: k-Nearest Neighbor: Performance Vector
Figure 24: k-Nearest Neighbor: How to Implement in RapidMiner with Cross-Validation
Figure 25: k-Nearest Neighbor: Performance Vector
Figure 26: k-Nearest Neighbor: Interpreting the Results
Figure 27: Decision Tree: How to Implement in RapidMiner with Cross-Validation
Figure 28: Decision Tree: Interpreting the Results
Figure 29: Decision Tree: Interpreting the Results
Figure 30: Decision Tree: Interpreting the Results
Figure 31: Decision Tree in R: Interpreting the Results
Figure 32: Decision Tree in R: Interpreting the Results
Figure 33: Naïve Bayes: How to Implement in RapidMiner
Figure 34: Naïve Bayes: Performance Vector
Figure 35: Naïve Bayes: Interpreting the Results – Distribution Table Output (Class Conditional Probability Table)
Figure 36: Naïve Bayes: Interpreting the Results – Distribution Table Output (Class Conditional Probability Table)
Figure 37: Naïve Bayes: Interpreting the Results – Probability Distribution Function for “Tenure”
Figure 38: Naïve Bayes: Interpreting the Results – Bar Chart for Contract (Yes or No)
Figure 39: Naïve Bayes: Interpreting the Results – Probability Distribution Function for “Monthly Charges”
Figure 40: Naïve Bayes in R: Class Conditional Probability for Attributes
Figure 41: Naïve Bayes in R: Class Conditional Probability for Attributes
Figure 42: Random Forest: How to Implement in RapidMiner
Figure 43: Random Forest: Performance Vector
Figure 44: Random Forest: Interpreting the Results
Figure 45: Random Forest: Interpreting the Results – Random Forest Tree
Figure 46: Random Forest Model in R
Figure 47: Random Forest Model in R
Figure 48: Random Forest: Plotting Important Variables
Figure 49: ROC Curve of the Three Classification Models
Figure 50: FP-Growth: How to Implement in RapidMiner
Figure 51: FP-Growth: Interpreting the Results – Frequent Item Sets
Figure 52: FP-Growth: Interpreting the Results – Frequent Item Sets
Figure 53: FP-Growth: Interpreting the Results – Association Rules
Figure 54: FP-Growth: Interpreting the Results – Association Rules
Figure 55: FP-Growth: Interpreting the Results – Association Rules
Figure 56: FP-Growth: Interpreting the Results – Association Rules
Figure 57: Apriori: How to Implement in RapidMiner
Figure 58: Apriori: Interpreting the Results
LIST OF TABLES
Table 1: Example of a list of transactions
Table 2: Optimizing Decision Tree Parameters
Table 3: Summary of Performance of Classification Algorithms
CHAPTER 1 - INTRODUCTION
Customers in the telecom industry, especially pre-paid customers, are usually not bound to a telecom operator by any contract and are thus always at risk of churning. This means that customers can change their telecom operator without notice, at their own convenience. Hence,
it is important to manage and identify customers that are likely to churn, especially in an
industry such as the telecom industry which is often characterized by strong competition and
volatile markets. Proper management of customers that are likely to churn can minimize the
probability of churn while maximizing the profit of a company. Data mining plays a very important role in telecommunications companies’ efforts to reduce overall churn: it helps them develop better marketing strategies, identify fraudulent activities and customers, and better manage their networks. Hence, one of the first and most important steps in managing and improving churn is identifying customers that are likely to churn.
Involuntary Churners
Some customers have service deliberately withheld from them for reasons that may include fraud, failure to pay bills and sometimes even non-utilization or insufficient utilization of the service. It can also be due to a customer’s relocation to a “long-term care facility, death, or the relocation to a distant location”. These customers are generally removed from service by the phone company and are referred to as involuntary churners (Saraswat and Tiwari, 2018).
Voluntary Churners
Voluntary churn occurs when a customer decides to terminate his/her service with the provider and switch to another company or provider. Telecom churn is usually of the voluntary kind. It can be further divided into two sub-categories – deliberate and incidental (Saraswat and Tiwari, 2018).
Incidental churn can happen when something significant changes in a customer’s personal life and forces the customer to churn, whereas deliberate churn can happen for reasons of technology (customers always wanting newer or better technology), service quality factors, social or psychological factors, and convenience. According to Shaaban et al. (2014), this is the churn issue that management in telecom companies is always looking to solve.
1. Call Detail Data: This relates to information about calls. For every call placed on a network, a call detail record is generated to store information about that call. Call detail data essentially covers the average call duration, average calls originated, call period and calls to/from different area codes.
2. Network Data: Network data includes information about error generation and status
messages, which need to be generated in real time. The volume of network messages generated
is huge and data mining techniques and technologies are used to identify network faults by
extracting knowledge from network data (Joseph, 2013, p. 526). Network data also includes information about the complex configuration of equipment, data about error generation and data that is essential for network management.
3. Customer Data: Customer data includes information about the customer, such as their name, age, address, telephone type, type of subscription plan, payment history and so on.
1.1.5 Data Mining Challenges in Telecom Industry
Data mining in the telecommunications industry faces a number of challenges. Advances in technology have led to a monumental increase in the amount of data in the last decade or so. The
advent of mobile phones has led to the creation of highly diverse sources of data, which are
available in many different forms including tabular, objects, log records and free text (Chen,
2016, p. 3). Data in this industry has also grown exponentially since the growth of 3G and
Broadband, and it will continue to grow as technology is evolving constantly and at a rapid
pace.
According to Weiss (2010, p. 194), the main challenges are that “telecom companies generate a tremendous amount of data, the sequential and temporal aspects of their data, and the need to predict very rare events—such as customer fraud and network failures—in real-time”. According to Joseph (2013),
another challenge in mining big data in the telecom industry is that the data comes in the form of transactions, which is not at the proper semantic level for data mining.
The biggest telecom companies have data which is usually in petabytes and often exceeds
manageable levels. Hence the scalability of data mining can also be a concern. Another concern
with telecommunication data and its associated applications includes the problem of rarity.
This is because telecom fraud and network failure are both rare events. According to Weiss,
(2004) “predicting and identifying rare events has been shown to be quite difficult for many
data mining algorithms” and this issue must be approached carefully to ensure good results.
These challenges can be overcome by the application of appropriate data mining techniques, and useful insights can be gained from the data that is available in this industry.
1.2 Market Basket Analysis for Marketing in Telecom
Market Basket Analysis is a technique used by retailers to discover associations between items. It allows companies to identify relationships between the items that people buy together. It is not a widely used technique in the telecom industry, but telecom companies can benefit if market basket analysis is applied appropriately.
For data mining, CRISP-DM will be followed. This methodology provides a complete blueprint for tackling data mining projects in six stages: business understanding, data understanding, data preparation, modelling, evaluation and deployment.
1.3 Research Problem Definition & Research Purpose
Customer churn is a key concern for any service- and customer-centric industry. Among them is the telecom industry, which suffers greatly from customer churn every year; it is estimated that this industry has an approximate annual churn rate of 30%.
The telecom industry was one of the first adopters of data mining techniques to gain meaningful
insights from large sets of data. In order to tackle the issue of customer churn, data mining and
machine learning can be applied to predict customers who are likely to churn. These customers
can then be approached with appropriate sales and marketing strategies in order to retain their
services. Mining of big data in the telecom industry also offers organizations a real opportunity
to gain a comprehensive view of their business operations.
Cross-selling or market basket analysis is a technique that is usually applied in retail contexts
to discover associations between frequently purchased items. The telecom industry has become
an industry where customers usually buy or subscribe to multiple services from one company.
These include phone service, internet service, TV packages, streaming TV, online security etc.
Finding associations between these items can lead to the discovery of patterns that can be used
by telecom companies to engage customers and offer services that are beneficial to their
operation.
Hence, the purpose of this research is not only to develop effective and efficient models to recognize customers before they churn, but also to apply cross-selling techniques in a telecom context to find useful patterns and associations that telecom companies can use effectively.
How can data mining and machine learning techniques be effectively applied to predict
customer churn in the telecom industry?
Does cross-selling or market basket analysis offer a viable solution to gain valuable
insights in the telecom industry?
What are the opportunities and challenges in the application of data mining and machine
learning techniques in the telecom industry?
Chapter 1 - This chapter includes the Introduction and background of the topic as well as the
research problem & purpose, research question and objectives.
Chapter 2 - This chapter includes a review of relevant literature, summary and findings from
the reviewed research papers as well as a review of classification and association machine
learning algorithms.
Chapter 3 - This chapter defines the research methodology and the information about the
dataset used for the research.
Chapter 4 - This chapter includes the process of creating classification and association models
in RapidMiner as well as an analysis of their performance, results and the insights gained
from applying these models. These classification algorithms are also applied in R.
Chapter 5 - This chapter concludes the thesis with a conclusion of the results and the insights
gained from the study.
CHAPTER TWO - LITERATURE REVIEW
The literature review for this thesis summarizes the research from a list of papers related to
churn modeling and prediction as well as cross-selling products in telecom. The algorithms and methodologies used by the researchers have also been detailed.
This research paper helped in addressing some of the commonly used machine learning algorithms for developing a research model of churn prediction. The research showed that the C5.0 decision tree algorithm with misclassification cost and segmentation was much more accurate than without them. To summarize, the authors established a research model of customer churn based on customer segmentation and misclassification cost and utilized this model to analyze customer behavior data from a Chinese telecom company.
This paper used a number of statistical techniques to analyze customer churn in the
telecom industry. It essentially analyzed the performance and accuracy of different algorithms
and how they can be applied to a large telecom dataset. The researcher concluded that decision
tree-based techniques especially C5.0 and CART (Classification and Regression Trees)
outperformed widely used techniques such as regression in terms of accuracy. He also stated that selecting the correct combination of attributes and fixing proper threshold values may produce much more accurate results. It was also established that RULES3 is a great choice for
handling large datasets.
This research paper focuses on the use of different algorithms in the context of a customer churn analysis problem for the telecom industry. This paper helped in establishing
the usefulness of neural networks and statistical based methods for predicting telecom churn.
Like the previous research paper, it also validated the use of C5.0 for churn prediction.
2.5 Mining Big Data in Telecommunications Industry:
Challenges, Techniques, and Revenue Opportunity (Hoda
A. Abdel Hafez, 2016)
This research paper focuses on the challenges presented by the mining of big data in the
telecom industry as well as some of the more commonly used techniques and data
mining tools to solve these challenges.
The paper goes into detail about some of the major challenges presented by mining of
big data in the telecom industry.
The massive volume of data in this industry is characterized by heterogeneous and diverse dimensionalities. Also, “the autonomous data sources with distributed and
decentralized controls as well as the complexity and evolving relationships among data
are the characteristics of big data applications” (Abdel Hafez, 2016). These
characteristics present an enormous challenge for the mining of big data in this industry.
Apart from this, the data generated from different sources possesses different types and representation forms, which leads to great variety or heterogeneity of big data, and mining a massive heterogeneous dataset can be a big challenge.
Heterogeneity in big data deals with structured, semi-structured, and unstructured data
simultaneously and unstructured data may not always fit with traditional database
systems.
There is also the issue of privacy, accuracy, trust and provenance. Personal data is
usually contained within the high volume of big data in the telecom industry. According
to the researcher, for this issue, it would be useful to develop a model where a balance is
reached with the benefits of mining this data for business and research purposes against
individual privacy rights.
The issue of accuracy and trust arises because these data sources have different origins, not all of which are known and verifiable. According to the researcher, to solve this problem, data validation and provenance tracing are necessary steps in the data mining process. For this, unsupervised learning methods have been used to derive trust measures for suspected data sources, using other data sources as testimony.
The paper also goes into detail about the machine learning techniques that can be used
to mine big data in the telecom industry. Both classification and clustering techniques are discussed in this paper. The classification algorithms covered include decision trees (BOAT - optimistic decision tree construction, ICE - implication counter examples and VFDT - very fast decision tree) and artificial neural networks.
Clustering algorithms for handling large datasets mentioned in this paper include
hierarchical clustering, k-means clustering and density based clustering.
Both k-means and hierarchical clustering are used for high-dimensional datasets and for improving the processing of data streams. Density-based clustering, another method for identifying clusters in large, high-dimensional datasets with varying sizes and shapes, is a better option for inferring the noise in a dataset. DBSCAN and DENCLUE are two common examples of density-based clustering.
The paper also goes into detail about some of the tools that can be used for performing
data mining tasks including R, WEKA, KNIME, RapidMiner, Orange, MOA etc.
The paper mentions that WEKA is useful for classification and regression problems but
not recommended for descriptive statistics and clustering methods. The software works
well on large datasets according to the developers of WEKA, but the author of this
research paper mentions that there is limited support for big data, text mining and semi-
supervised learning. It is also mentioned that WEKA is weaker than R in classical statistical testing but stronger in machine learning. It supports many model evaluation procedures
and metrics but lacks many data survey and visualization methods despite some recent
improvements.
KNIME is also mentioned as a useful tool for performing data mining tasks on large datasets. One of the biggest advantages of KNIME is that it can be easily integrated
with WEKA and R, which allows for use of almost all of the functionality of WEKA
and R in KNIME. The tool has been used primarily in pharmaceutical research, business
intelligence and financial data analysis but is also often used in areas like customer data
analysis and can be a great tool for extracting information from customer data in the
telecom industry.
Orange is a Python-based data mining tool which can be used either through Python
scripting as a Python plug-in, or through visual programming. It offers a visual
programming front-end for exploratory data analysis and data visualisation. It consists
of a canvas in which users can place different processors to create a data analysis
workflow. Its components are called widgets and they can be used for combining
methods from the core library and associated modules to create custom algorithms. An
advantage of this tool is that the algorithms are organized in hierarchical toolboxes,
making them easy to implement.
RapidMiner is another excellent data mining tool and is generally considered one of the
most useful data mining tools in the market today. It offers an environment for data
preparation, machine learning, deep learning, text mining, predictive analytics and
statistical modelling. According to the official RapidMiner website, this tool unifies the
entire data science lifecycle from data preparation to machine learning and predictive
modelling to deployment (RapidMiner, no date). It is an excellent tool that can be used
resourcefully in the telecom industry to gain useful insights using data mining
techniques like classification, clustering, support vector machines etc.
This paper went into great detail in covering the challenges, techniques, tools and advantages/revenue opportunities of mining big data in the telecom industry. Starting with the challenges presented by mining big data, such as the diversity of data sources and the issues of data privacy and customer trust, the paper also addressed how these challenges can be tackled using data validation and provenance tracing. A number of supervised and unsupervised algorithms and their usefulness were also explored. K-means and DBSCAN are mentioned as two important clustering-based algorithms.
This paper also covered the practicality of different data mining tools. R, WEKA and RapidMiner are mentioned as some of the best tools for the purpose of data mining, and this helped in finalizing RapidMiner as the primary data mining tool for this thesis.
The data about the customer can include sociodemographic data as well as purchase history, contact data, etc., and an example of how it was done in this paper is shown below. The NPTB engine
is able to identify the top 3-5 services that a customer is likely to purchase, thus informing the
sales agents of what products to up-sell or cross-sell when a customer steps into a store or
reaches out to a call center. It is also able to inform marketing agents of specific products
customers are likely to buy. This information can be used by marketers to tailor specific
marketing campaigns for different customers.
2. Decision Tree: Decision tree is one of the most popular classification algorithms used
in data mining and machine learning. As the name suggests, in this algorithm a tree-shaped structure is used to represent a set of related decisions or choices. Berry and Linoff
(2004) perhaps offer the easiest understanding of a decision tree by explaining that a
large collection of data is divided into smaller sets of data by applying a set of decision
rules. They can be either classification trees, where the target variable takes a set of discrete values, or regression trees, where the target variable takes a set of continuous values. A decision tree comprises leaves, branches and nodes, where the leaves represent the output variable and lie at the base of the tree.
A decision tree shown below representing the survival probability of passengers on the
ship Titanic can be used to understand the workings of a decision tree and also help answer a series of questions.
In the figure below, the percentage represents the proportion of observations in the leaf, and the number (for example, 0.73) represents the probability of survival of a passenger. It can be summarized that if the passenger was not a male, then there was a probability of 0.73, or a 73% chance, of survival. A male passenger older than 9.5 years had only a 17% chance of survival, whereas a male passenger aged 9.5 years or younger with fewer than 2.5 (i.e. 2, 1 or 0) siblings or spouses aboard had an 89% chance of survival. A minimal sketch of how such a tree could be grown in R follows below.
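The following is a minimal sketch of how a tree like the one in figure 6 could be grown in R with the rpart and rpart.plot packages; the data frame name "titanic" and its column names are assumptions made purely for illustration, not part of the original study.

# Sketch: growing a survival tree like the one in figure 6
# (assumes a data frame "titanic" with columns survived, sex, age and sibsp)
library(rpart)
library(rpart.plot)
tree <- rpart(survived ~ sex + age + sibsp, data = titanic, method = "class")
# Each leaf shows the predicted class, the survival probability and the
# percentage of observations falling into that leaf
rpart.plot(tree)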
3. Naïve Bayes: This algorithm is based on Bayes’ theorem but is called “naïve” because
of the strong assumption of independence between the features. Bayes’ theorem, given a class variable $y$ and dependent features $x_1$ through $x_n$, states that:
\[ P(y \mid x_1, \ldots, x_n) = \frac{P(y)\,P(x_1, \ldots, x_n \mid y)}{P(x_1, \ldots, x_n)} \]

Under the naïve assumption of conditional independence between the features, this simplifies to:

\[ P(y \mid x_1, \ldots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y) \]
Despite this assumption of independence, the Naïve Bayes classifier still works well,
and its use cases include the likes of spam filtering and document classification. The
algorithm can input both numerical and categorical attributes and is robust to outliers
and missing data.
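As a brief worked illustration of the theorem with invented numbers (hypothetical values, not taken from the dataset): suppose $P(\text{churn}) = 0.3$, $P(\text{month-to-month} \mid \text{churn}) = 0.8$ and $P(\text{month-to-month} \mid \text{no churn}) = 0.4$. Then

\[ P(\text{churn} \mid \text{month-to-month}) = \frac{0.3 \times 0.8}{0.3 \times 0.8 + 0.7 \times 0.4} = \frac{0.24}{0.52} \approx 0.46, \]

so observing a month-to-month contract raises the churn estimate from 30% to roughly 46%.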
4. Neural Networks: Neural networks, also known as artificial neural networks, are inspired by the biological neural networks that constitute the brain. Neural networks cannot be defined as a single algorithm, but rather as a framework for many different machine learning algorithms to work together and process data inputs (Deep AI, 2018).
They work by learning from and processing examples, learning the characteristics of the input and using this information to correctly construct the output. Once the
algorithm has processed a sufficient number of examples, the neural network can start
processing unseen inputs and successfully return the correct results (Deep AI, 2018).
An example of how the process of identifying an image by neural networks works can
help in better understanding neural networks. According to Deep AI (2018), the image
is decomposed into data points and information that a computer can use, using layers of functions. Once this happens, “the neural network can start to identify trends that exist
across the many, many examples that it processes and classify images by their
similarities”. After studying many examples of this image, the algorithm has an idea of what data points and elements to look for when classifying this particular image
again. Hence it is often said of neural networks that the more examples the algorithm sees, the more accurate the results become, as the network learns from experience.
Berry and Linoff (2004) share a similar opinion about neural networks and mention that
neural networks have the ability to learn by example much the same way humans have
the ability to learn from experience.
Introduced by Rakesh Agrawal, Tomasz Imieliński and Arun Swami in 1993, it is a concept
that is now widely used across retail sectors, recommendation engines in e-commerce and
social media websites and online clickstream analysis across pages. One of the most popular
applications of this concept is “Market Basket Analysis” which finds the co-occurrence of one
retail item with another, for example {milk, bread} → {eggs}. In simpler terms, if a customer bought milk and bread, then there is an increased likelihood that the customer will buy eggs as well (Agrawal et al., 1993). Such information is used by retailers to create
bundle pricing, shelf optimization and product placement. This is also implemented in e-
commerce through cross-selling and upselling to increase the average value of an order.
This thesis will aim to explore the concept of “Market Basket Analysis” and whether it can be
implemented in a telecom context. The two main algorithms used in association rules are
discussed below:
1. Apriori Algorithm: The Apriori principle states that if an item set is frequent (that is, it appears sufficiently often in the database), then all of its subsets will also be frequent; conversely, if an item set is infrequent, then all of its supersets will be infrequent. The algorithm uses a “bottom up” approach, meaning that frequent item sets are extended one item at a time until no further extensions are found, upon which the algorithm terminates.
Support measures how popular an item set is, defined as the proportion of transactions in which the item set appears; a support threshold is used to filter out infrequent item sets. Another measure is Confidence, which measures the likelihood of the purchase of item Y when item X is purchased. The confidence of (X → Y) is calculated by:
\[ \text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)} \]
A drawback of the Confidence measure is that it only accounts for how popular item X is, and does not consider the popularity of item Y, thus misrepresenting the importance of
an association. If item Y is popular in general then there is a higher chance that a
transaction containing item X will also contain item Y, thus inflating the value of the
Confidence measure.
A third measure, known as Lift, is used to account for the popularity of both items. Lift is the ratio of the observed support to the support that would be expected if X and Y were completely independent. It is calculated by:
\[ \text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)} \]
A lift value of greater than 1 suggests that item Y is likely to be bought if item X is bought, while a lift value of less than 1 suggests that item Y is unlikely to be bought if item X is bought.
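As a small worked example with invented values: suppose $\text{Support}(X \cup Y) = 0.2$, $\text{Support}(X) = 0.4$ and $\text{Support}(Y) = 0.25$. Then

\[ \text{Confidence}(X \rightarrow Y) = \frac{0.2}{0.4} = 0.5, \qquad \text{Lift}(X \rightarrow Y) = \frac{0.2}{0.4 \times 0.25} = 2, \]

so even though only half of the transactions containing X also contain Y, Y appears in X-transactions twice as often as its overall popularity would suggest, indicating a genuine positive association.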
The algorithm still has some drawbacks, mainly its need to generate a large number of candidate subsets. It also scans the database many times, which leads to high memory consumption and performance issues.
1. {News, Finance}
2. {News, Finance}
4. {Sports}
6. {News, Entertainment}
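As a minimal sketch, the example transactions above could be mined in R with the arules package (the elided sessions are omitted, and the support and confidence thresholds are chosen purely for illustration):

# Sketch: mining the example transactions with the arules package
library(arules)
baskets <- list(c("News", "Finance"),
                c("News", "Finance"),
                c("Sports"),
                c("News", "Entertainment"))
trans <- as(baskets, "transactions")
rules <- apriori(trans, parameter = list(supp = 0.5, conf = 0.8))
inspect(rules)  # yields {Finance} => {News} with confidence 1 and lift 1.33

On these four transactions, {News, Finance} has support 0.5, so the rule {Finance} → {News} survives both thresholds, matching the hand calculation of confidence and lift above.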
To summarize, both association algorithms have their own pros and cons. FP-Growth is considered by many to be the better option due to its ability to compress the data, which means that it requires less time and memory than Apriori. The FP-tree is, however, more expensive to build and may not fit in memory. Both algorithms offer the advantage of producing easy-to-understand rules.
This concludes the review of the different types of classification and association algorithms
that will be implemented in the design phase.
CHAPTER 3 – RESEARCH METHODOLOGY
• Business Understanding: A project always starts with understanding the business context. This step involves setting project objectives, setting up a project plan, defining business success criteria and determining data mining goals.
• Data Understanding: The second phase involves acquiring the data that will be used in the project and understanding this data. It also involves describing the data that has been acquired and assessing its quality.
• Data Preparation: Once the data has been collected, it is time to clean the data, identify any missing values and errors, and make sure the data is ready for the modeling phase.
• Modeling: This phase involves selecting the modeling technique(s) and tool(s) that will be used in the project. Generating a test design, using the modeling tool(s) on the prepared dataset and assessing the performance and quality of the model are also part of this phase.
• Evaluation: This step involves evaluating the results of the modeling phase and assessing to what degree the models meet the business objectives. The entire process is reviewed to make sure the results satisfy the business needs. The review also covers quality assurance questions, and the next steps are determined.
Generally, there are two categories of data in a research project – Primary and Secondary.
Primary data refers to data that is collected by the researcher himself/herself for a specific purpose. Secondary data refers to data collected by someone other than the researcher for some other purpose, but which is utilized by the researcher for another purpose. The data used here can be classified as a fusion of primary and secondary data, since it was collected by someone other than this researcher, but the purpose of this research remains similar to what the data was originally collected for – data mining and data analytics.
The visualization in figure 9 is a bar chart showing customer churn by gender and type of contract. The binary attribute churn is shown by color. The bar chart illustrates that the distribution of churn by type of contract is relatively the same for both genders.
Figure 10: Tree map of Customer Churn, Tenure and Monthly Charges
Figure 10 is a treemap visualization of customer churn, tenure and monthly charges. A treemap is a data visualization technique that illustrates hierarchical data using nested rectangles. Here, the rectangles are separated based on the tenure attribute (one year, two years, etc.), and the size of each rectangle illustrates the amount of monthly charges. Similar to figure 9, figure 11 is a bar chart of customer churn by gender and type of payment method. Again, the distribution of churn is relatively the same across the attributes.
3.5.1 RapidMiner
RapidMiner is a leading data science tool that offers an all-in-one package for data preparation, data mining, machine learning and text mining, amongst a plethora of other useful features. It offers a user-friendly environment for data preparation, data modelling and evaluation. It allows users to create workflows, from basic to highly advanced, that deliver almost instantaneous results. RapidMiner offers a simplified and unified approach to data mining and machine learning, resulting in greatly enhanced productivity and efficiency. It also features scripting support in several languages and offers a number of data mining tool sets.
It offers a free license for students and researchers, with its educational license offering
unlimited data rows and premium features like Turbo Prep and Auto Model. Another excellent
feature of this tool is “Wisdom of Crowds”, which works as a sort of recommendation system for the operators and parameters that can be used in a data mining process.
RapidMiner users worldwide. According to RapidMiner “This data is anonymously gathered
and stored in a best-practice knowledge base” (RapidMiner, no date). All these features
combine to make RapidMiner one of the best data science platforms in the market, which is
validated by its place as one of the leaders in the 2018 Gartner Magic Quadrant for Data Science
and Machine Learning platforms.
3.5.2 R
R is an open source programming language which offers fast implementations of various
machine learning algorithms. Jovic, Brkic and Bogunovic (2014, p.1112) mention that R has
specific data types for big data, web mining, data streams, graph mining and spatial mining and
that it can be easily implemented for use in the telecom industry. R can be easily extended with
more than 13,000 packages available on CRAN (The Comprehensive R Archive Network) as
of November 2018.
4.1 Introduction
This chapter details the process of building and implementing the machine learning models in RapidMiner and R. The quality of the models will be assessed based on measures such as
accuracy, AUC, precision and recall.
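As a brief illustration of how these measures relate to a confusion matrix, the following R sketch uses invented counts, with “Yes” taken as the positive (churn) class; none of these numbers come from the models in this chapter:

# Sketch: accuracy, precision and recall from a confusion matrix (invented counts)
tab <- matrix(c(900, 150, 100, 350), nrow = 2,
              dimnames = list(pred = c("No", "Yes"), actual = c("No", "Yes")))
accuracy  <- sum(diag(tab)) / sum(tab)              # (TN + TP) / all
precision <- tab["Yes", "Yes"] / sum(tab["Yes", ])  # TP / (TP + FP)
recall    <- tab["Yes", "Yes"] / sum(tab[, "Yes"])  # TP / (TP + FN)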
After selecting Auto Model in RapidMiner studio, the first step is to load the dataset from the
repository. Once the dataset is selected, the type of task needs to be selected (Predict, Clusters
or Outliers). Since we want to predict customer churn, Predict is selected, and Churn is selected as the attribute that we want to predict.
At this stage, the inputs are selected. Not all attributes are useful when making a prediction and
removing the unneeded attributes can help speed up the model and improve its performance.
Attributes with a high degree of correlation or attributes where all values are different or
identical (Customer ID in this case) should be removed. Auto Model helps in this task by marking the attributes that should be removed in red.
The next step is selecting the models that are relevant to the problem. Auto Model provides a
default list of classification models. Some models like Deep Learning and Random Forest
take longer than others to run. If there is no time constraint, it makes sense to run all the models
and compare their performance and fit with the dataset. The following models were run for the
Telecom Churn dataset:
Naïve Bayes
Generalized Linear Model (GLM)
Logistic Regression
Deep Learning
Decision Tree
Random Forest
Random Forest took the longest time to run, and Naïve Bayes was the least accurate model, with an accuracy of 72.7%.
The next two screenshots show the Auto Model simulator. This interactive simulator consists
of sliders and dropdowns and the user has the ability to change the values for different attributes
to see how the predicted variable is impacted.
For example, changing the contract from “Two year” to “Month-to-month” changes the probability of the customer not churning significantly, from 58% to 83%, which tells us that the length of the contract is an important factor in deciding whether a customer will churn or not.
It also has an “Important Factors for No” section which shows how the different attributes
affect the possibility of a customer not churning.
Overall, Auto Model serves as a great feature for quickly creating automated predictive models.
It highlights the features which have the greatest impact on the business objective and also
offers built-in visualizations and an interactive model simulator to see how the model performs
under a variety of conditions.
Figure 18: Auto Model Simulator
Figure 19: k-Nearest Neighbor: How to Implement in RapidMiner with Split Validation
The steps in building a k-nearest neighbor classification process in RapidMiner are detailed
below:
Figure 20: k-Nearest Neighbor: How to Implement in RapidMiner with Split Validation
Figure 22: K-Nearest Neighbor: How to Implement in RapidMiner with Split Data
Figure 23: K-Nearest Neighbor: Performance Vector
The Cross-Validation operator in RapidMiner allows users to input the value of k, which determines the number of subsamples the example set is divided into. The value of k was set to 10, and the sampling type can be set to either shuffled or stratified sampling so that random subsamples are created.
One of the reasons for the large variance in performance between cross-validation and split validation is the difference in the number of iterations that take place in the two validation processes. In split validation, the model is learned on a training set and then applied to a test set in a single iteration, whereas in cross-validation, as explained above, the number of iterations is k, and k in this case is 10, leading to a more accurate k-nearest neighbor model.
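For comparison, a minimal sketch of 10-fold cross-validation of a k-NN model in R using the caret package is shown below; the preprocessing and tuning are simplified, and this is not the process used for the RapidMiner results in this section:

# Sketch: 10-fold cross-validation of k-NN with the caret package
library(caret)
ctrl <- trainControl(method = "cv", number = 10)  # 10 folds, as in RapidMiner
fit <- train(Churn ~ ., data = NewTelecomChurn, method = "knn",
             preProcess = c("center", "scale"), trControl = ctrl)
fit  # reports accuracy averaged across the 10 folds for each candidate k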
The screenshot above shows the output of the k-nearest neighbor algorithm. The prediction(Churn) variable shows whether any customers are likely to move from no to yes; these are the customers identified as the most likely churners. The confidence for both yes and no is also shown. The customers who are likely to churn can then be approached with appropriate marketing strategies.
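A bare-bones equivalent in R can be sketched with the class package; the numeric column names and the 70/30 split below are assumptions, and k-NN requires scaled numeric predictors, so only the numeric attributes are used here:

# Sketch: k-NN churn prediction with the class package (column names assumed)
library(class)
set.seed(7)
num <- scale(NewTelecomChurn[, c("tenure", "MonthlyCharges", "TotalCharges")])
idx <- sample(nrow(num), round(0.7 * nrow(num)))
pred <- knn(train = num[idx, ], test = num[-idx, ],
            cl = NewTelecomChurn$Churn[idx], k = 10, prob = TRUE)
mean(pred == NewTelecomChurn$Churn[-idx])  # test-set accuracy
attr(pred, "prob")  # proportion of winning votes, akin to RapidMiner's confidence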
4.4 Decision Tree: How to Implement in RapidMiner
The screenshot below shows the process of creating a classification decision tree in RapidMiner
with cross-validation. The classification decision tree operator in RapidMiner is a collection of nodes intended to make decisions on values belonging to a class or to estimate a numerical target value. Each node represents a splitting rule for a specific attribute, and a classification decision tree uses these rules to separate values belonging to different classes (RapidMiner).
One of the benefits of a decision tree is its relative ease in interpreting the results for both
technical and non-technical users. However, a large number of attributes can lead to a decision
tree becoming cluttered and hard to understand, hence eliminating one of its biggest benefits.
Another advantage of a decision tree is that it requires very little data preparation.
Normalization is not necessary and the tree is not sensitive to missing values.
Here, the select attributes operator is used to select the attributes that are important for
the data mining process. The rest of the attributes are not used and this makes the
resulting tree easier to interpret.
Cross validation is used for training and testing.
The model does require some fine tuning to get the most accurate results. The decision
tree operator has a number of parameters that can be optimized and experimented with
in order to improve precision, accuracy and recall.
Feature selection is implicitly performed using Information Gain. The partitioning criterion is set to information gain and the maximal depth of the tree is set to 5.
The minimal gain, according to RapidMiner is “The gain of a node calculated before
splitting it and the node is split if its gain is greater than the minimal gain. A higher
value of minimal gain results in fewer splits and thus a smaller tree. A value that is too
high will completely prevent splitting and a tree with a single node is generated”
(RapidMiner). The minimal gain was kept to its default value of 0.01. Other larger
values were also tested but this led to a decrease in accuracy, precision and AUC.
The values for minimal size for split, minimal leaf size and maximal depth were kept at their default values. These are auto-determined by the size of the dataset.
Apply Model and the Performance operator are then applied to assess the quality of
the model.
The table below shows the task of optimizing decision tree parameters to get the best
possible value for accuracy, precision and AUC.
In figure 28:
The “Contract” attribute manages to classify 100% of the rows in the dataset.
Contract = Two-year, manages to classify a total of 1695 customers. The output variable
distribution is “NO” for 1647 customers and “YES” for 48 customers. We can conclude that when a customer’s contract is two years, it is unlikely that the customer will churn.
Contract = One-year, manages to classify a total of 1473 customers. The output variable
distribution is “NO” for 1307 customers and “YES” for 166 customers. Again, we can conclude that customers with a one-year contract are unlikely to churn, though their probability of churning is slightly higher than that of customers with a two-year contract.
The contract variable is highly significant in predicting whether a customer will churn
or not.
Contract = month-to-month manages to classify 3875 customers. When contract =
month-to-month, Internet Service = DSL, total charges > 310.9, the output variable
distribution is “NO” for 583 customers and “YES” for 142 customers.
When contract = month-to-month and Internet Service = Fiber Optic and tenure > 15.5,
the output variable distribution is “NO” for 647 customers and “YES” for 445
customers. When tenure <= 15.5, output variable distribution is “YES” for 717
customers and “NO” for 319 customers.
We can conclude that tenure – the number of months the customer has been a subscriber to the telecom company’s services – is also an important variable for predicting whether a customer will churn or not.
Using this information, the telecom company can identify the customers with a high
probability of churning. They can focus their marketing efforts on customers with
shorter contract lengths and customers who are relatively new to the telecom company’s
services.
# Decision Tree (party package; the elided lines below are reconstructed and
# the 70/30 train/test split is an assumption)
library(party)
# For easier interpretation, the tenure attribute can be converted from months
# to years for the decision tree
n = nrow(NewTelecomChurn)
indexes = sample(n, round(0.7 * n))
Trainset = NewTelecomChurn[indexes,]
Testset = NewTelecomChurn[-indexes,]
CustomerChurn_ctree = ctree(Churn ~ ., data = Trainset)  # fit a conditional inference tree
print(CustomerChurn_ctree)
plot(CustomerChurn_ctree)
plot(CustomerChurn_ctree, type = "simple")
# Testset Predictions
pred = predict(CustomerChurn_ctree, newdata = Testset)
mean(pred == Testset$Churn)  # accuracy on the held-out test set
Similar to the decision tree in RapidMiner, the decision tree in R can be analyzed to find out
how the selected attributes affect the outcome of Churn:
The Naïve Bayesian model accepts both numeric and categorical variables.
Cross-validation is used for training and testing the model. Random subsets can be built
using stratified sampling, and this is important as the training set needs to be representative of and proportional to the underlying dataset.
The Naïve Bayes operator in RapidMiner has one parameter – Laplace Correction. If
within a training set a given value never occurs within the context of a given class, then
the class conditional probability is set to 0. Using Laplace Correction adds 1 to the
count of each attribute to avoid the occurrence of zero values. This is recommended for
smaller data sets.
Apply Model and the Performance operator are then applied to assess the quality of
the model.
Figure 35: Naïve Bayes: Interpreting the Results – Distribution Table Output (Class
Conditional Probability Table)
Figure 36: Naïve Bayes: Interpreting the Results – Distribution Table Output (Class
Conditional Probability Table)
Figure 35 and 36 show the Naïve Bayes Distribution Table Output (Class Conditional
Probability Table) for the attributes in the dataset.
Figure 37: Naïve Bayes: Interpreting the Results – Probability Distribution Function for
“Tenure”.
The probability distribution function for tenure, as shown in figure 37, illustrates the likelihood of “Yes” and “No” for the output variable churn.
Figure 38: Naïve Bayes: Interpreting the Results – Bar Chart for Contract (Yes or No)
Figure 38 is a bar chart showing the results of whether a customer will churn or not based on
their contract with the telecom company. The bar chart shows that the length of the contract is a vital factor in determining whether a customer will churn or not. If a customer is on a short-
term contract (month-to-month), it increases the probability of a customer churning. However,
customers on one-year and two-year contracts are more likely to retain the telecom company’s
services.
Figure 39: Naïve Bayes: Interpreting the Results – Probability Distribution Function for
“Monthly Charges”.
Figure 39 is a chart of the probability distribution function for the attribute “Monthly Charges”.
This attribute denotes the monthly charges accrued by the customer. The probability
distribution function shows the effect of this attribute on whether a customer will churn or not.
When monthly charges exceed 60, the likelihood of churn = no increases.
# Naïve Bayes in R (e1071 package; the elided sampling line is reconstructed and the 70/30 split is an assumption)
library(e1071)
n2 = nrow(NewTelecomChurn)
indexes = sample(n2, round(0.7 * n2))
Trainset2 = NewTelecomChurn[indexes,]
Testset2 = NewTelecomChurn[-indexes,]
NaiveBayes <- naiveBayes(Churn ~ ., data = Trainset2)
NaiveBayes
# Confusion Matrix
pred <- predict(NaiveBayes, Testset2)
tab <- table(pred, Testset2$Churn)
sum(tab[row(tab) == col(tab)]) / sum(tab)
# accuracy = 0.7342222
# Repeating with Laplace smoothing gives the same accuracy
NaiveBayesLaplace <- naiveBayes(Churn ~ ., data = Trainset2, laplace = 1)
tabL <- table(predict(NaiveBayesLaplace, Testset2), Testset2$Churn)
sum(tabL[row(tabL) == col(tabL)]) / sum(tabL)
The code above was used to implement Naïve Bayes in R for the dataset.
The accuracy (73.42%) of the model was slightly higher than that of the Naïve Bayes model in RapidMiner. The same accuracy was achieved using Laplace smoothing.
4.7.1 Naïve Bayes in R: Interpreting the Results
Figures 40 and 41, show the class conditional probabilities for the Naïve Bayes model
implemented in R.
The a priori probabilities are calculated from the class proportions in the training data. The Y values are the means and standard deviations of the predictors for each class.
4.8 Random Forest: How to Implement in RapidMiner
Random Forest, also known as Random Decision Forests, performs random selection of attributes and training records. It is a type of ensemble model where, for each base decision tree in the ensemble, a random sample of records is selected with replacement, and a random subset of all the attributes in the training set is considered when deciding how to split each node of the tree. Once all trees are built, for each new record all the trees predict a class and vote with equal weight (Kotu and Deshpande, 2014). A toy sketch of the bagging-and-voting part of this idea is shown below.
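The following toy R sketch illustrates only the bootstrap-and-majority-vote part of the idea (a true random forest also samples attributes at each split); it is purely illustrative and is not the model built later in this chapter:

# Toy sketch: bagging 20 trees and taking a majority vote
library(rpart)
set.seed(1)
votes <- replicate(20, {
  boot <- NewTelecomChurn[sample(nrow(NewTelecomChurn), replace = TRUE), ]
  fit <- rpart(Churn ~ ., data = boot, method = "class")
  as.character(predict(fit, NewTelecomChurn, type = "class"))
})
# Each record's final prediction is the class most trees voted for
majority <- apply(votes, 1, function(v) names(which.max(table(v))))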
The results of the Random Forest model are shown in the screenshot below. The prediction(Churn) variable shows the customers that are likely to churn; these are the customers identified as the most likely churners in the future. The confidence for both yes and no is also shown. The attributes of the customers who are likely to churn can be examined, and these customers can then be approached with appropriate marketing strategies.
As mentioned above, Random Forest is also known as Random Decision Forests. It operates
by constructing a multitude of decision trees at training time. For this model, a total of 20 trees
were constructed, and one of them is shown in figure 45. Analyzing
the results of this decision tree shows how the attributes affect the
output variable “Churn”.
In figure 45:
- The “Paperless Billing” attribute classifies 1409 records in the dataset. The decision
tree is first split into customers with paperless billing and customers without paperless billing.
- For customers without paperless billing, the attribute “Tech Support” is checked next.
Under this attribute, customers with “no internet service” are highly unlikely to churn,
with a distribution of 179 for “No” and 14 for “Yes”.
- For customers with paperless billing and tech support, “Total Charges” is the next
node to be checked. Customers with total charges <= 332.95 are likely to churn,
whereas customers with total charges > 332.95 are highly unlikely to churn, with a
distribution of 135 for “No” and 20 for “Yes”.
- Other nodes and their distributions of “No” and “Yes” can be examined similarly.
Understanding how these attributes affect customer churn can help the telecom
company prioritize its marketing efforts on specific aspects.
4.9 Random Forest: How to Implement in R
# Random Forest in R (randomForest package)
n1 = nrow(NewTelecomChurn)
indexes = sample(n1, n1 * 0.8)                  # 80/20 train/test split
Trainset1 = NewTelecomChurn[indexes,]
Testset1 = NewTelecomChurn[-indexes,]
RandomForestModel <- randomForest(Churn ~ ., data = Trainset1)
RandomForestModel                               # OOB error with default parameters
RandomForestModel2 <- randomForest(Churn ~ ., data = Trainset1, mtry = 3)
RandomForestModel2                              # OOB error with mtry lowered from 4 to 3
predTrainRandomForest <- predict(RandomForestModel2, Trainset1)
table(predTrainRandomForest, Trainset1$Churn)   # confusion matrix on training data
predTestRandomForest <- predict(RandomForestModel2, Testset1)
mean(predTestRandomForest == Testset1$Churn)    # accuracy on the test set
# 0.7896233
importance(RandomForestModel2)                  # variable importance scores
varImpPlot(RandomForestModel2)                  # variable importance plot
Random Forest gives out-of-bag (OOB) error estimates: the mean prediction error on each
training sample, computed using only the trees that did not see that sample in their bootstrap
sample. The OOB error estimate using the default parameters for Random Forest is
19.82%, as shown in figure 46. The parameters that can be changed are the number of trees and
the number of variables tried at each split (mtry). After changing the latter from 4 to 3, the OOB
error estimate is slightly lower, as shown in figure 47. The number of trees
was also experimented with, but this always increased the OOB error estimate, and
hence it was kept at its default value of 500. The accuracy of the Random Forest model after
this fine-tuning was 78.96%.
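As a hedged sketch, the randomForest package can also automate this search over mtry using the OOB error; the object and column names below are assumed from the code above.
# Search for the mtry value with the lowest OOB error (assumes Trainset1 as above)
tuneRF(Trainset1[, setdiff(names(Trainset1), "Churn")], Trainset1$Churn,
       ntreeTry = 500, stepFactor = 1.5, improve = 0.01)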
4.9.1 Random Forest in R: Interpreting the Results
The variable importance plot shows the most significant attributes in decreasing order of mean
decrease in accuracy and mean decrease in Gini.
According to Metagenomics Statistics, “The more the accuracy of the random forest decreases
due to the exclusion of a single variable, the more important that variable is deemed, and
therefore variables with a large mean decrease in accuracy are more important for classification
of the data.”
The mean decrease in Gini is a measure of the purity of the nodes at the ends of the trees, and
hence a higher value correlates with higher homogeneity (Rahman, 2018).
The plot reaffirms that the attributes “Total Charges”, “Tenure”, “Monthly Charges” and
“Contract” are the four most important variables in determining customer churn.
# ROC curves with pROC, built from class probabilities on the test sets
DecisionTreeROC = roc(Testset$Churn, sapply(predict(CustomerChurn_ctree, newdata = Testset, type = "prob"), "[", 2))
RandomForestROC = roc(Testset1$Churn, predict(RandomForestModel2, Testset1, type = "prob")[, "Yes"])
NaiveBayesROC = roc(Testset2$Churn, predict(NaiveBayes, Testset2, type = "raw")[, "Yes"])
plot(DecisionTreeROC, col = "blue", print.auc = TRUE, print.auc.y = 0.4)
plot(RandomForestROC, col = "red", add = TRUE, print.auc.y = 0.5, print.auc = TRUE)
plot(NaiveBayesROC, col = "green", add = TRUE, print.auc.y = 0.6, print.auc = TRUE)
legend(0.1, 0.3, c("Naive Bayes", "Random Forest", "Decision Tree"), lty = 1, lwd = 2, col = c("green", "red", "blue"), cex = 0.8)
Naïve Bayes has the highest AUC value, followed by Random Forest and Decision Tree.
4.11 FP-Growth: How to Implement in RapidMiner
The screenshot above shows the process of creating an FP-Growth process in RapidMiner. For
the association algorithms, the categorical attributes were converted into binomial attributes in
Excel.
Confidence measures the likelihood of the purchase of item Y when item X is
purchased. Figures 53 and 54 show the association rules output with Support, Confidence,
Laplace, Gain, p-s, Lift and Conviction values for the results.
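Formally, confidence is derived from support by the standard definition (not specific to this dataset):
\[
\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}
\]
For example, if 10% of customers hold both X and Y while 16% hold X, the rule X ⇒ Y has a confidence of 0.10 / 0.16 ≈ 0.63.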
Figure 55: FP-Growth: Interpreting the Results – Association Rules
Figures 55 and 56 show easy-to-understand rules produced by the FP-Growth algorithm. For example,
[Multiple Lines, Streaming TV] → [Device Protection] (confidence: 0.627) can be interpreted
as: a customer with multiple lines and streaming TV is likely to purchase device protection with
a confidence of 0.627. Similarly, other rules and their confidence values are shown in the two
aforementioned figures. These rules and the frequent item sets can be used to generate an
effective cross-selling model for telecom services and products.
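Although the rules here were mined with FP-Growth in RapidMiner, the same rule-mining step can be sketched in R with the arules package loaded in the appendix; the data frame name ServicesData and the thresholds below are illustrative assumptions, and arules implements the closely related Apriori algorithm rather than FP-Growth.
# Mine association rules over binomial (logical) service attributes
Transactions <- as(ServicesData, "transactions")   # one row per customer
ServiceRules <- apriori(Transactions,
                        parameter = list(support = 0.1, confidence = 0.6))
inspect(head(sort(ServiceRules, by = "confidence"), 10))   # top 10 rules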
5.1 Introduction
This chapter concludes the thesis with a review of the overall research performed, as well as
the results and the insights gained from it. The ways in which the telecom
industry can benefit from using data mining and machine learning are also discussed, and
suggestions for future research are presented. The performance of the classification
algorithms implemented in RapidMiner is shown in the table below.
Model          Validation         Accuracy           Precision          Recall             AUC
Naïve Bayes    Cross-validation   70.08 +/- 0.80%    46.45 +/- 0.84%    83.21 +/- 2.73%    0.820
- Overall, Random Forest with an 80/20 train/test split was the best performing
classification model, with an accuracy of 79.39%. It also had the highest AUC value, at 0.829.
- Naïve Bayes had the highest recall, at 83.21 +/- 2.73%.
- Meanwhile, k-nearest neighbor with split-validation was the worst performing model.
To successfully achieve the objectives of this research, as defined in section 1.5, a publicly
available telecom dataset consisting of 7043 rows with 21 attributes of customers was
used. The data was already in a clean and readable format, and hence significant data pre-
processing was not necessary. RapidMiner and R were used for implementing the machine
learning models, and the results were similar in both tools. K-Nearest Neighbor, Decision Tree,
Naïve Bayes and Random Forest were the four classification algorithms used to
predict customer churn and to understand which attributes in the dataset were most
significant in determining whether a customer will churn or not. The results show that these
algorithms are successful in predicting customer churn. The attributes found to be
most significant in predicting customer churn are Contract, Tenure, Monthly Charges and
Total Charges. Customers with longer contracts (one year and two year) are
less likely to churn than customers with a month-to-month contract. The tenure
attribute is also highly influential in determining customer churn: longer-tenure customers are
less likely to churn, and loyal customers are more likely to have high total and monthly
charges. The large number of telecom-service attributes in this dataset also allowed for a
successful implementation of “cross-selling” techniques in a telecom context. FP-Growth and
Apriori were implemented in RapidMiner to discover frequent item sets and associations
between attributes such as “Streaming Movies”, “Streaming TV”, “Device Protection”, “Tech
Support”, “Online Backup”, “Multiple Lines” etc.
To summarize, the key drivers of churn are identified in this study, and relevant and useful
associations between products are established. Using this information, telecom companies can
work on reducing customer churn by performing targeted marketing.
Companies can create personalized offers and campaigns for customers who are at risk of
churning. Association rules can also help in identifying usage patterns, buying preferences,
and socio-economic influences of customers.
- Fraud Detection: Fraud is a major source of revenue loss for organisations every year;
fraud detection can also be improved by mining data in the telecom industry.
- Network Optimisation and Expansion: Correlating network usage and subscriber
density with traffic and location data can help telecom providers accurately manage
network capacity and forecast and plan for potential network outages. Data
mining techniques can also be used when planning network expansions, by
identifying areas where network usage is nearing capacity.
Future studies can focus on these aspects of data mining in the telecom industry. Overall,
machine learning and data mining can help telecom companies make much better decisions
regarding their business, marketing, service offerings etc. in order to get the most out of their
operations.
REFERENCES
Agrawal, R., Imieliński, T. & Swami, A. (1993). ‘Mining association rules
between sets of items in large databases’. Proceedings of the 1993 ACM
SIGMOD international conference on Management of data - SIGMOD '93.
p. 207.
Almana, A.M., Aksoy, M.S. & Alzahrani, R. (2014), ‘A Survey On Data
Mining Techniques In Customer Churn Analysis For Telecom Industry’,
Int. Journal of Engineering Research and Applications, Vol. 4(5), pp. 165-
171.
Berry, M. & Linoff, G. (2014), ‘Data Mining Techniques for Marketing,
Sales and Customer Relationship Management (2nd edition)’. Indianapolis:
Wiley.
Bronshtein, A. (2017), ‘A Quick Introduction to K-Nearest Neighbors
Algorithm’, Medium. Http: https://medium.com/@adi.bronshtein/a-
quick-introduction-to-k-nearest-neighbors-algorithm-62214cea29c7
(Last Accessed: 25/10/2018).
Chaurasia, K. (No Date), ‘DBSCAN Clustering Algorithm’, Practice 2
Code. Http: https://practice2code.blogspot.com/2017/07/dbscan-
clustering-algorithm.html (Last Accessed: 1/11/2018).
Chen, C. (2016), ‘Use Cases and Challenges in Telecom Big Data
Analytics’, Asia Pacific Signal and Information Processing Association,
Vol. 5(19), pp. 1-7.
Deep AI (2018), ‘Neural Network: What is a Neural Network?’. Http:
https://deepai.org/machine-learning-glossary-and-terms/neural-
network (Last Accessed: 29/10/2018).
Deshpande, B. & Kotu, V. (2014), ‘Predictive Analytics and Data Mining:
Concepts and Practice with RapidMiner’. London: Elsevier.
Gruber, C. (2016), ‘Using Customer Behavior Data to Improve Customer
Retention’, IBM. Http:
https://www.ibm.com/communities/analytics/watson-analytics-
blog/using-customer-behavior-data-to-improve-customer-retention/.
(Last Accessed: 15/10/2018).
Hafez, H.A.A (2016), ‘Mining Big Data in Telecommunications Industry:
Challenges, Techniques, and Revenue Opportunity’, Dubai UAE Jan 28-
29, 18 (1), pp. 4297-4304. Academia [Online].
Han, J., Pei, J. & Kamber, M. (2011), ‘Data Mining: Concepts and
Techniques’. Elsevier.
Hurtgen, H., Natarajan, S., Spittaels, S., Vetvik, O.J. & Ying, S. (2012),
‘Crunch Time: Using Big Data to Boost Telco Marketing Capabilities’,
McKinsey. Http:
https://www.mckinsey.com/~/media/mckinsey/dotcom/client_service/T
elecoms/PDFs/RecallNo21_Big_Data_2012-07.ashx (Last Accessed:
17/11/2018).
Jain, S. (2016) ‘Analysis and Application of Data Mining Methods used
for Customer Churn in Telecom Industry’, LinkedIn. Http:
https://www.linkedin.com/pulse/analysis-application-data-
miningmethods-used-customer-saurabh-jain/ (Last Accessed:
4/9/2018).
Jaroszewicz, S. (2008), ‘Cross-Selling Models for Telecommunication
Services’, Journal of Telecommunications and Information Technology,
Vol. 3. pp. 52-59.
Jin, X. & Han, J. (2011), ‘K-Medoids Clustering. In: Sammut C., Webb
G.I. (eds) Encyclopedia of Machine Learning’. Boston, MA: Springer.
Joseph, M. (2013), ‘Data Mining and Business Intelligence Applications
in Telecommunication Industry’, International Journal of Engineering and
Advanced Technology, Vol. 2(3), pp. 525-528.
Jovic, A., Brkic, K. & Bogunovic, N. (2014), “An Overview of Free
Software Tools for General Data Mining”, 37th International Convention
on Information & Communication Technology Electronics &
Microelectronics, pp. 1112-1117.
Keramati, A., Jafari-Marandi, R., Aliannejadi, M, Ahmadian, I.,
Mozzafari, M. & Abbasi, U. (2014), ‘Improved churn prediction in
telecommunication industry using data mining techniques’, Applied Soft
Computing, Vol. 24, pp. 994-1012.
Kirui, C., Hong, L., Cheruiyot, W. & Kirui, H. (2013), ‘Predicting
Customer Churn in Mobile Telephony Industry Using Probabilistic
Classifiers in Data Mining’, International Journal of Computer Science
Issues, Vol. 10(2), pp. 165-172.
Liu, Y. & Zhuang, Y. (2015), ‘Research Model of Churn Prediction Based
on Customer Segmentation and Misclassification Cost in the Context of
Big Data’, Journal of Computer and Communications, Vol. 3, pp. 87- 93.
McDonald, C. (2017), ‘Machine learning fundamentals (II): Neural
networks’, Towards Data Science. Http:
https://towardsdatascience.com/machine-learning-fundamentals-ii-
neural-networks-f1e7b2cb3eef (Last Accessed: 29/10/2018).
Qureshi, S.A., Rehman, A.S., Qamar, A.M., Kamal, A. & Rehman, A.
(2013), ‘Telecommunication Subscribers' Churn Prediction Model Using
Machine Learning’, IEEE International Conference on Digital
Information Management (ICDIM), Islamabad, Pakistan.
RapidMiner (No Date), ‘RapidMiner Documentation (Auto Model)’, Http:
https://docs.rapidminer.com/latest/studio/auto-model/ (Last Accessed:
21/11/2018).
RapidMiner (No Date), ‘RapidMiner Documentation: Cross Validation
(Concurrency)’, Http:
https://docs.rapidminer.com/latest/studio/operators/validation/cross_v
alidation.html (Last Accessed: 27/11/2018).
Saraswat, S. & Tiwari, A. (2018), ‘A New Approach for Customer Churn
Prediction in Telecom Industry’, International Journal of Computer
Applications, Vol. 181(11), pp. 40-46.
SAS (No Date), ‘What is Data Mining?’. Http:
https://www.sas.com/en_ie/insights/analytics/data-mining.html (Last
Accessed: 4/9/2018).
Scheffer, T. (2001) ‘Finding Association Rules That Trade Support
Optimally against Confidence. In: 5th European Conference on Principles
of Data Mining and Knowledge Discovery’, pp. 424-435.
Shaaban, E., Helmy, Y., Khedr, A. & Nasr, M. (2012), ‘A Proposed Churn
Prediction Model’, International Journal of Engineering Research and
Applications, Vol. 2(4), pp. 693-697.
Tsiptsis, K. and Chorianopoulos, A. (2009), ‘Data Mining Techniques in
CRM: Inside Customer Segmentation’. Chichester: Wiley.
Van den Poel, D. & Lariviere, B. (2004), ‘Customer Attrition Analysis for
Financial Services Using Proportional Hazard Models’, European Journal
of Operational Research, Vol. 157(1), pp. 196-217.
Webb, G.I. (2011) Lazy Learning. In: Sammut, C., Webb, G.I. (eds)
Encyclopedia of Machine Learning. Boston, MA: Springer.
Weiss, G. (2010), ‘Data Mining in the Telecommunications Industry’,
Networking and Telecommunications: Concepts, Methodologies, Tools,
and Applications, pp. 194-201.
Witten, I., Frank, E. & Hall, M. (2011), ‘Data Mining’. Burlington, MA:
Morgan Kaufmann.
APPENDIX
# install.packages("ggplot2")
# install.packages("caret")
# install.packages("class)
# install.packages("rpart")
# install.packages("arules")
# Load libraries
library(MASS)
library(randomForest)
library(party)
library(e1071)
library(caret)
library(caTools)
library(dplyr)
library(rpart)
library(arules)
library(pROC)
summary(TelecomChurn)
str(TelecomChurn)
sum(is.na(TelecomChurn))
sum(is.na(NewTelecomChurn))
str(NewTelecomChurn)
# Changing the values for the Senior Citizen attribute from 0 and 1 to No and Yes
# respectively (one possible recoding, assumed here)
NewTelecomChurn$SeniorCitizen = ifelse(NewTelecomChurn$SeniorCitizen == 1, "Yes", "No")
str(NewTelecomChurn)
# Decision Tree
# For easier interpretation, we can convert the tenure attribute from months to
# years for the decision tree (one possible conversion, assumed here)
NewTelecomChurn$tenure = round(NewTelecomChurn$tenure / 12)
n = nrow(NewTelecomChurn)
indexes = sample(n, n * 0.8)                      # 80/20 train/test split
Trainset = NewTelecomChurn[indexes,]
Testset = NewTelecomChurn[-indexes,]
CustomerChurn_ctree <- ctree(Churn ~ ., data = Trainset)
print(CustomerChurn_ctree)
plot(CustomerChurn_ctree)
plot(CustomerChurn_ctree, type="simple")
# Testset predictions
predDecisionTree <- predict(CustomerChurn_ctree, newdata = Testset)
table(predDecisionTree, Testset$Churn)            # confusion matrix
# Random Forest
n1 = nrow(NewTelecomChurn)
indexes = sample(n1, n1 * 0.8)
Trainset1 = NewTelecomChurn[indexes,]
Testset1 = NewTelecomChurn[-indexes,]
RandomForestModel <- randomForest(Churn ~ ., data = Trainset1)
RandomForestModel                                 # OOB error with default parameters
RandomForestModel2 <- randomForest(Churn ~ ., data = Trainset1, mtry = 3)
predTrainRandomForest <- predict(RandomForestModel2, Trainset1)
table(predTrainRandomForest, Trainset1$Churn)
predTestRandomForest <- predict(RandomForestModel2, Testset1)
mean(predTestRandomForest == Testset1$Churn)
# 0.7896233
importance(RandomForestModel2)
varImpPlot(RandomForestModel2)
# Naive Bayes
n2 = nrow(NewTelecomChurn)
indexes = sample(n2, n2 * 0.8)
Trainset2 = NewTelecomChurn[indexes,]
Testset2 = NewTelecomChurn[-indexes,]
NaiveBayes <- naiveBayes(Churn ~ ., data = Trainset2)   # fit on the training set
NaiveBayes
predNaiveBayes <- predict(NaiveBayes, Testset2)
tab = table(predNaiveBayes, Testset2$Churn)
sum(tab[row(tab) == col(tab)]) / sum(tab)               # accuracy
# ROC curves (pROC), using class probabilities on the test sets
DecisionTreeROC = roc(Testset$Churn, sapply(predict(CustomerChurn_ctree, newdata = Testset, type = "prob"), "[", 2))
RandomForestROC = roc(Testset1$Churn, predict(RandomForestModel2, Testset1, type = "prob")[, "Yes"])
NaiveBayesROC = roc(Testset2$Churn, predict(NaiveBayes, Testset2, type = "raw")[, "Yes"])
plot(DecisionTreeROC, col = "blue", print.auc = TRUE, print.auc.y = 0.4)
plot(RandomForestROC, col = "red", add = TRUE, print.auc.y = 0.5, print.auc = TRUE)
plot(NaiveBayesROC, col = "green", add = TRUE, print.auc.y = 0.6, print.auc = TRUE)
legend(0.1, 0.3, c("Naive Bayes", "Random Forest", "Decision Tree"), lty = 1, lwd = 2, col = c("green", "red", "blue"), cex = 0.8)