Ensemble Approach On Customer Churn Prediction
Ensemble Approach On Customer Churn Prediction
The number of customers lost within a certain time frame divided by the number of
active customers at the start of the period is one way to compute a churn rate. For
example, if you gained 1000 clients last month but lost 50, the monthly churn rate
would be 5%Every month, the active customer base is fed into a Machine Learning
Predictive Model, which calculates the likelihood of each client churning, will be
sorted from highest to lowest probability value (or score. Clients with a low
likelihood of turnover (or, in other words, customers for whom the model forecasts
no churn) are satisfied customers.
iv
TABLE OF CONTENTS
CHAPTER CHAPTER NAME PAGE NUMBER
Abstract iv
List of Figures vii
List of Abbreviations vii
1 INTRODUCTION 01
3 METHODOLOGY 09
3.1 EXISTING METHODOLOGY 09
3.2 PROPOSED METHODOLOGY 09
3.3 SYSTEM ARCHITECTURE 10
3.3.1 TAKING INPUT DATA 11
3.3.2 DATA PREPROCESSING AND ACUISATION 13
3.3.3 VALIDATION OF DATASET 17
3.3.4 USING DIFFERENT ML ALGORITHMS 19
3.3.4.1 KNN 19
3.3.4.2 RANDOM FOREST 21
3.3.4.3 XGBOOST 24
v
3.3.4.4 LOGISTIC REGRESSION 26
3.3.4.5 STACKING 29
3.3.5 TEST OF DATASET 32
3.3.6 METRICS USED 32
3.4 WORKFLOW -DIAGRAM 36
4 RESULTS AND DISSCUSSIONS 37
4.1 RESULTS 37
4.2 REQUIRED ANALYSIS 38
5 CONCLUSION AND FUTURE WORK 39
5.1 CONCLUSION 39
5.2 SCOPE OF FUTURE WORK 39
REFERENCES 40
APPENDICIES
A. SAMPLE CODE 41
B. SCREENSHOTS
C. PLAGIARISM REPORT
vi
LIST OF FIGURES
LIST OF TABLES
LIST OF GRAPHS
vii
LIST OF ABBRIVATIONS
viii
CHAPTER 1
INTRODUCTION
Today, web-based shopping is the quickest and internet promoting tool. The is vital
when shopping. quantity of engineers who purchase their items on the web with the
nature of online administrations and items on the web. In this way, the internet
shopping climate assumes a significant part in the connection among shippers and
purchasers.
Accurate online prediction for a product buying or not involves expert knowledge,
because price usually depends on many distinctive features and factors. Typically,
most significant ones are brand, Expiry date, Tenure, Membership and Average
Salary of person. The main objective of the Online Shopping System is to manage
the details of Shopping, Internet, Payment, Bills, Customer. It manages all the
1
information about Shopping, Products, Customer, Shopping. The project is totally
built at administrative end and thus only the administrator is guaranteed the access
based on the products purchased.
A typical online store enables the customer to browse the firm's range of products and
services, view photos or images of the products, along with information about the
product specifications, features and prices.
Online stores usually enable shoppers to use "search" features to find specific
models, brands or items. Online customers must have access to the Internet and a
valid method of payment in order to complete a transaction, such as a credit card,
debit card, or a service such as PayPal. For physical products (e.g., paperback books
or clothes), the e-tailer ships the products to the customer; for digital products, such
as digital audio files of songs or software, the e-tailer usually sends the file to the
customer over the Internet.
People in large number are doing online shopping today, and it is not only because it
is convenient as one can shop from home, but also because there is an ample
number of varieties available, with a high competition of prices, and also it is easy to
navigate for searching regarding any particular item.
K Maheshwari,P Packia Amutha Priya has discussed, in their paper in 2018 written
for Master thesis [2], that regression model that was built using Support Vector
Machines (SVM). Customer buying behaviour is identified by people's personality
and character. These personality characters vary from person to person. The
character includes quality, motivation, occupation and income level, perception,
2
psychological, personality, reference groups and demographic reasons learning,
beliefs, attitude, Culture and social forces. Nowadays, Data mining normally used to
investigate the customer activities on shopping by using various algorithms and
methods. Data mining has gradually raised and it gains numerous industries which
applies this technology. Each and every activity of a customer is stored as a byte of
data in a database to collect information such as how the customer spends their
valuable time, day in buying decision.
3
The end user of this product is a departmental store where the
application is hosted on the web and the administrator maintains the database. The
application which is deployed at the customer database, the details of the items are
brought forward from the database for the customer view based on the selection
through the menu and the database of all the products are updated at the end of
each transaction. Data entry into the application can be done through various
screens designed for various levels of users. Once the authorized personnel feed the
relevant data into the system, several reports could be generated as per the security.
The Server process the customer to the demo page to
enter the details i.e, age ,membership, has any offers, coupons , time spent on
website etc things to fill it and predict the score whether they are willing to buy the
product or not based on the churn score we get on the screen.
4
1.KNN (K Nearest Neighbor)
2.Random Forest
3.XGBoost (Xtreme Gradient Boosting)
4.Logistic Regression
5.Stacking (Ensemble Method)
HARDWARE REQUIREMENTS
• System : Pentium Dual Core.
• Hard Disk : 120 GB.
• Monitor : 15”LED
• Input Devices : Keyboard, Mouse
• Ram : 1GB.
SOFTWARE REUIREMENTS
• Operating system : Windows 7.
• Coding Language : python
• Toolkit : Jupiter Notebook
• DATABASE : EXCEL
5
CHAPTER 2
LITERATURE SURVEY
2.1 DIFFERENT SURVEYS FOR ONLINE PREDICTION
[1]Theresa Maria Rausch, Nicholas Daniel Derra, Lukas Wolf in the year 2020
Predicting online shopping cart abandonment with machine learning approaches was
proposed by using the methodology of k-nearest neighbour algorithm, Decision Tree
And Boosting Algorithm proposing that Thereby, we provide methodological insights
to gather a comprehensive understanding of the practicability of classification
methods in the context of online shopping cart abandonment prediction: our findings
indicate that gradient boosting with regularization outperforms the remaining models
yielding an F1-Score of 0.8569 and an AUC value of 0.8182. Nevertheless, as
gradient boosting tends to be computationally infeasible, a decision tree or boosted
logistic regression may be suitable alternatives, balancing the trade-off between
model complexity and prediction accuracy.
[2] Cheng Lin; Qinpei Zhao; Jiangfeng Li; Weixiong Rao in the year 2020 has
proposed that Size Prediction for Online Clothing Shopping with Heterogeneous
Information by a methodology of Naïve Bayes Algorithm by saying in their paper, we
studied the size prediction problem and proposed a method to improve the
predication accuracy. We capture the subtle semantics of customer reviews by using
Bert model to learn the latent representations. Deep features are learned from the
review data, combining with customers' body features and the product's information to
predict the product's fitness for a customer. The experiments on the real online
clothing retailer dataset validate the effectiveness of the proposed method.
[3] Yuanbang Liang; Yunyu Jia; Jinglin Li; Meiyi Chen; Yifan Hu; Yinan Shi; Fei Ma in
the year 2020 has proposed that Online Shop Daily Sale Prediction Using Adaptive
Network-Based Fuzzy Inference System by saying that This paper proposes a
method for online shop daily sale forecasting. In the method, Kalman Filter was firstly
applied on the historic sale data to smooth the data. Adaptive Network-based Fuzzy
Inference System (ANFIS) was then built to achieve time series forecasting. Sale
histories of an online shop was used to evaluate the method. Overall performance of