
Internship Report

on
CUSTOMER CHURN PREDICTION IN TELECOM
INDUSTRY

Submitted by

R SANTOSHKUMAR REDDY

22695A3114

In Partial Fulfillment of the Requirements for the Award of the Degree of

BACHELOR OF TECHNOLOGY
In

ARTIFICIAL INTELLIGENCE

MADANAPALLE INSTITUTE OF TECHNOLOGY & SCIENCE


(UGC – AUTONOMOUS)
(Affiliated to JNTUA, Ananthapuramu)
(Accredited by NBA, Approved by AICTE, New Delhi)
An ISO 9001:2008 Certified Institution
P. B. No: 14, Angallu, Madanapalle – 517325

2024-2025


DEPARTMENT OF ARTIFICIAL INTELLIGENCE

BONAFIDE CERTIFICATE
This is to certify that the internship work entitled “Customer Churn Prediction in Telecom Industry”
is a bonafide work carried out by

R Santoshkumar Reddy - 22695A3114

Submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology in the Department of Artificial Intelligence, Madanapalle Institute of Technology &
Science, Madanapalle, affiliated to Jawaharlal Nehru Technological University Anantapur,
Ananthapuramu, during the academic year 2024-2025.

Faculty Internship Co-ordinator:
Dr. A. Poongodai, Assistant Professor, Department of AI

Head of the Department:
Dr. K. Chokkanathan, Head & Associate Professor, Department of AI

ACKNOWLEDGEMENT

I sincerely thank the Management of Madanapalle Institute of Technology & Science for
providing excellent infrastructure and lab facilities that helped me to complete this project.

I sincerely thank Dr. C. Yuvaraj, M.E., Ph.D., Principal, for guiding and providing facilities
for the successful completion of my project at Madanapalle Institute of Technology & Science,
Madanapalle.

I express my deep sense of gratitude to Dr. K. Chokkanathan, M.Tech., Ph.D., Associate
Professor & Head, Department of AI, for his continuous support in making necessary arrangements
for the successful completion of the project.

I express my sincere thanks to the Internship Coordinator, Dr. A. Poongodai, Ph.D.,
Assistant Professor, Department of AI, for the tremendous support extended towards the
successful completion of the internship project.

I also wish to place on record my gratefulness to the other faculty members of the Department
of AI, and to my friends and parents, for their help and cooperation during my internship.

CERTIFICATE:

DECLARATION

I, the undersigned, hereby declare that the results embodied in this internship report,
“Customer Churn Prediction in Telecom Industry”, are a bonafide record of the work done by
me in partial fulfillment of the requirements for the award of Bachelor of Technology in
Artificial Intelligence from Jawaharlal Nehru Technological University Anantapur,
Ananthapuramu. The content of this report has not been submitted to any other
university or institute for the award of any other degree.

Place: Madanapalle
Date:

R Santoshkumar Reddy
22695A3114
Department of Artificial Intelligence
Madanapalle Institute of Technology & Science
Madanapalle.

CONTENTS

ABSTRACT
List of Figures
List of Abbreviations
1. INTRODUCTION
   1.1. Problem Statement
   1.2. Objective
   1.3. Domain Technology
   1.4. Industry Vertical
   1.5. Data
   1.6. Methods
2. TOOLS AND TECHNOLOGIES
   2.1. Platform and Hardware Used
   2.2. Software Used
3. PROJECT ANALYSIS
   3.1. Architecture Diagram
   3.2. Implementation
   3.3. Algorithm / Techniques
   3.4. Result
CONCLUSIONS
APPENDICES
   SOURCE CODE
   SCREENSHOTS
BIBLIOGRAPHY

ABSTRACT

In this project, we implement a machine learning approach to predict customer churn in the
telecom industry.

Customer churn prediction is a crucial analytical technique for the telecom industry to identify
customers likely to discontinue their services. High churn rates can lead to revenue losses and increased
customer acquisition costs.

This project explores the development of a predictive model that can accurately forecast
customer churn using historical data, including customer demographics, service usage, billing
information, and interaction history. By leveraging machine learning techniques, such as logistic
regression, decision trees, and ensemble methods, we aim to identify patterns that contribute to customer
attrition. The model's insights enable telecom providers to implement targeted retention strategies,
enhance customer satisfaction, and ultimately reduce churn rates.

This project demonstrates the effectiveness of machine learning in enhancing customer
retention and optimizing business operations within the telecom sector.

LIST OF FIGURES

Figure 1: Target Variables per Category
Figure 2: Dataset
Figure 3: Percentage of Missing Values
Figure 4: Bivariate Analysis Report
Figure 5: Distribution of Churners
Figure 6: Heatmap of Churners

LIST OF ABBREVIATIONS

ACRONYM    FULL FORM

DS Data Science

ML Machine Learning

AI Artificial Intelligence

EDA Exploratory Data Analysis

SMOTE Synthetic Minority Oversampling Technique

MLA Machine Learning Algorithm

CHAPTER 1
INTRODUCTION

1.1 Problem Statement


The telecom industry faces a significant challenge in retaining customers due to high
competition, leading to increased customer churn. Customer churn occurs when
subscribers stop using the service, which impacts revenue and requires high costs to
acquire new customers. Despite the importance of customer retention, identifying
customers at risk of churning remains a complex problem due to the vast amount of data
and various influencing factors, such as customer usage patterns, service quality, billing,
and interaction history.

1.2 Objective

The primary objective of this project is to develop a predictive model for customer churn
in the telecom industry, enabling the early identification of customers likely to discontinue their
service. Specifically, the project aims to:
o Analyze customer behavior patterns and identify critical factors contributing to churn.
o Build and evaluate machine learning models that can predict churn with high accuracy.
o Provide actionable insights to help the telecom company design targeted retention
strategies.
o Develop a cost-effective and efficient solution to reduce churn, enhance customer
satisfaction, and increase overall customer lifetime value.

1.3 Domain Technology

Machine Learning
Machine learning plays a crucial role in developing predictive models that can analyze
historical customer data and forecast future churners. Here is how machine learning can be
applied at various stages of the project.
1. Data Preprocessing

• Handling Missing Data


Machine learning algorithms often require complete datasets. Techniques such as
interpolation or imputation can be used to address missing data.

• Normalization/Scaling
Ensuring that features are on similar scales to prevent certain features from dominating
the learning process.

2. Model Selection
• Machine Learning Algorithms
Classification algorithms (logistic regression, decision trees, random forests) and more
advanced techniques like gradient boosting or neural networks can capture complex patterns.

3. Training and Testing


• Train-Test Split
Dividing the dataset into training and testing sets to evaluate model performance on
unseen data.
• Cross-Validation
Employing techniques like k-fold cross-validation to assess the robustness of the model.

4. Evaluation Metrics
• Accuracy, Precision, Recall (Sensitivity), F1 Score, ROC-AUC (Receiver Operating
Characteristic – Area Under Curve), Confusion Matrix
Common metrics for evaluating classification models.

• Business Impact Metrics

Designing custom metrics that reflect the financial impact of retention decisions based on
predictions.
5. Deployment
Integrating the trained model into a user-friendly interface for practical use by business
and retention teams.

It's important to note that churn prediction is a challenging task due to the variety and
volatility of customer behavior. Continuous monitoring and adaptation of the model to changing
customer behavior are often necessary for sustained performance. Additionally, considering
market conditions and external events that might influence customer churn can further
enhance the predictive capabilities of the model.
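
A minimal end-to-end sketch of the preprocessing, train-test splitting, and cross-validation stages described above (assuming a pandas DataFrame loaded from a hypothetical telco_churn.csv file with a binary Churn column; the column handling is illustrative, not this project's exact pipeline):

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

# Load the data (hypothetical file name) and separate features from the target
df = pd.read_csv("telco_churn.csv")
X = df.drop(columns=["Churn"]).select_dtypes(include="number")
y = df["Churn"]

# Stage 1 (Data Preprocessing): impute missing values, then scale features to [0, 1]
X = SimpleImputer(strategy="median").fit_transform(X)
X = MinMaxScaler().fit_transform(X)

# Stage 3 (Training and Testing): hold out a test set, then k-fold cross-validate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5).mean())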

1.4 Industry Vertical


The telecommunications industry is highly competitive, with numerous service
providers vying for market share. This industry vertical involves offering mobile, broadband, and TV
services to both individual and enterprise customers.

In this domain, customer churn prediction has emerged as a crucial strategic focus, as it helps providers
identify which customers are at risk of leaving for a competitor, allowing for proactive retention
strategies.

Churn prediction leverages data science and machine learning to analyse customer behaviours, usage
patterns, complaints, and service interactions, identifying factors that contribute to churn.

By effectively predicting and managing churn, telecom companies can reduce revenue loss, optimize
customer service, and increase customer satisfaction and loyalty.
This focus on customer retention is critical in the telecom industry where acquisition costs are high, and
customer lifetime value is a key metric for profitability.

1.5 Data
1. Demographic Data:
Includes age, gender, location, and income level, which helps understand different customer
segments and their propensity to churn.

2. Usage Data:
Encompasses information about call usage, SMS, data usage, and roaming frequency.
Patterns in high or low usage can be indicators of customer engagement or dissatisfaction.

3. Billing and Payment Data:


Contains billing cycle details, payment history, and overdue accounts. Late payments or
payment declines can indicate financial distress or dissatisfaction.

4. Contract and Tenure Data:


Includes contract duration, renewal status, and customer tenure (how long they've been with
the provider). Long-tenure customers often have different churn patterns than newer ones.

5. Customer Support Data:


Tracks interactions with customer service, including the frequency and nature of complaints,
resolution times, and satisfaction ratings. Frequent complaints or unresolved issues can be
predictive of churn.

6. Network Quality Data:


Consists of information on call drops, signal strength, and downtime or service
interruptions. Poor network quality is often a significant driver of customer churn.

7. Engagement Data:
Captures how customers interact with various services, apps, promotions, and loyalty
programs. Low engagement with these offerings can indicate a lower likelihood of customer
retention.

8. Promotional and Plan Change Data:


Records details about offers, discounts, and plan changes. Customers frequently changing
plans or only responding to significant discounts might be more prone to churn.

Using these data types, machine learning models can be developed to predict churn risk
accurately, enabling telecom companies to take targeted, data-driven retention actions.

1.6 Methods
In a "Customer Churn Prediction" project, various methods and techniques are used to analyze
historical data and forecast future customer churn. Here are common methods employed in
such projects:

1. Machine Learning Algorithms:

Decision Trees and Random Forests: Ensembles of decision trees can capture complex patterns
in historical data.
Gradient Boosting Machines (GBM): Builds a strong predictive model by combining weak
models sequentially.

Support Vector Machines (SVM): Can be used for classification to predict customer churn.

2. Feature Engineering:

Creating lag features to capture historical trends.
Engineering additional features based on usage, billing, and engagement indicators.

3. Ensemble Methods:

Combining multiple models using ensemble techniques, such as bagging or boosting, to improve
overall prediction accuracy.
4. Hyperparameter Tuning:

Using techniques like grid search or random search to find the optimal hyperparameters for the
chosen models (see the sketch at the end of this section).

5. Cross-Validation:

Employing k-fold cross-validation to assess model performance on different subsets of the data.
6. Evaluation Metrics:

Using appropriate evaluation metrics such as Accuracy, Precision, Recall, F1-score, and
ROC-AUC to assess the performance of each model.
Designing custom metrics that reflect the financial impact of retention decisions based on
predictions.
7. Regularization Techniques:

Applying L1 and L2 regularization to prevent overfitting and improve model generalization.


8. Feature Selection:

Identifying and selecting the most relevant features to improve model efficiency and
interpretability.
9. Backtesting:

Simulating the performance of the predictive model on historical data to assess how well it would
have performed in the past.
10. Retention Strategies:

Implementing retention actions based on predicted customer churn, such as threshold-based
targeting of at-risk customers.
11. Deep Learning Models:

Exploring more advanced deep learning architectures beyond traditional neural networks, such as
recurrent neural networks (RNNs) or transformer models.
12. Time Series Forecasting Libraries:

Leveraging specialized libraries like Prophet or statsmodels for time series forecasting.
13. Reinforcement Learning (in specific cases):
Exploring reinforcement learning approaches for dynamic retention decision-making, although
this is less common in practice.
It's important to note that the choice of methods depends on factors such as the characteristics of
the data, the time horizon for predictions, and the specific goals of the project. Often, a
combination of methods or an ensemble approach yields the best results. Additionally, regular
model updating and adaptation to changing customer behavior are crucial for sustained performance.
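
A brief sketch of hyperparameter tuning with grid search and k-fold cross-validation (methods 4 and 5 above). The parameter grid and scoring choice are illustrative, and x_train/y_train are assumed from an earlier train-test split:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200],   # illustrative values only
    "max_depth": [4, 6, 8],
    "min_samples_leaf": [4, 8],
}
search = GridSearchCV(RandomForestClassifier(random_state=100),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)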

CHAPTER 2

TOOLS AND TECHNOLOGIES


2.1 Platform and Hardware Used
This section presents a detailed description of the platforms used for this project. It
explains the hardware and software requirements for developing the application and its
interface, and the tested features of the program:
• What the program will do

• The constraints under which it must operate

• How the program reacts to external stimuli

Development and deployment of the application requires the following general and specific
minimum hardware requirements:

➢ Processor : 11th Gen Intel(R) Core(TM) i5-1240P @ 1.70 GHz
➢ RAM Capacity : 8.0 GB for development and evaluation use
➢ System Type : 64-bit operating system, x64-based processor
➢ Storage Type : SSD, 512 GB total capacity (5 GB free for development and evaluation use)
➢ High-speed broadband connection

2.2 Software Used:

Development and deployment of the application requires the following general and specific
minimum software requirements:
Operating System : Windows
IDE : Google Colab
Programming Language : Python 3
Domain : Machine Learning
Module : scikit-learn (sklearn)
Packages : NumPy, Pandas, Matplotlib, Seaborn, os

GOOGLE COLAB
Colaboratory, or “Colab” for short, is a product from Google Research. Colab
allows anybody to write and execute arbitrary Python code through the browser, and is
especially well suited to machine learning, data analysis, and education.
Google Colab Features
Google Colab has all of the exciting features that any modern IDE has, plus a lot more. The
following are some of its exciting features.
➢ Machine learning and neural networks are taught using interactive tutorials.
➢ Without a local setup, you can write and execute Python 3 code.
➢ Execute terminal commands from the Notebook.
➢ Import datasets from external sources such as Kaggle.
➢ Save your Notebooks to Google Drive.
➢ Import Notebooks from Google Drive.
➢ Free cloud service, GPUs and TPUs.
➢ Integrate with PyTorch, TensorFlow, OpenCV.

1. Programming Languages
• Python
Widely used for data analysis, machine learning, and building predictive models.
2. Data Processing and Analysis
• Pandas
A Python library for data manipulation and analysis.
• NumPy
Essential for numerical operations in Python.
• Matplotlib and Seaborn
Libraries for data visualization.
3. Machine Learning Libraries
• Scikit-learn
Offers a wide range of machine learning algorithms and tools for data preprocessing.
4. Statistical Analysis
• statsmodels
Useful for statistical modeling and analysis.
5. Data Sources
• Kaggle and similar repositories
Sources for obtaining historical customer churn datasets, such as the Telco Customer Churn
dataset used in this project.
6. Data Visualization
• Matplotlib, Plotly
Tools for creating interactive and informative visualizations.

CHAPTER 3

PROJECT ANALYSIS
3.1 Architecture Diagram

3.2 Implementation
• Importing the data
• Importing the libraries
• Loading the dataset and analyzing the customer behavior
• Scaling the data
• Building the model and training the data
• Performance evaluation on test set
1. Data Collection:
Gather historical customer data for the target company, including features such as tenure,
contract and payment details, subscribed services, monthly charges, and total charges.

2. Data Preprocessing:
Handle missing data, normalize or scale features as needed, and perform any necessary data
cleaning.
3. Feature Engineering:
Create derived features, such as tenure groups, and incorporate relevant behavioral and billing
indicators to enhance the dataset.
4. Model Development:
Implement three classification models: Decision Tree, Random Forest, and Support Vector
Machine (SVM).
Train each model on a subset of historical data.
5. Hyperparameter Tuning:
Optimize hyperparameters for each regression model to improve predictive accuracy.
6. Evaluation Metrics:
Use appropriate evaluation metrics such as Accuracy, Precision, Recall, F1-score, and ROC-AUC
to assess the performance of each model (a sketch follows this list).
7. Model Comparison:
Compare the accuracy of the three models and identify the one that demonstrates the highest
predictive performance.
8. Visualization:
Visualize the predicted customer churn against the actual churn outcomes to provide a clear
understanding of the model's performance.
9. Documentation:
Document the entire process, including data preprocessing steps, feature engineering, model
development, hyperparameter tuning, and evaluation metrics.
10. Deliverables:
A report detailing the methodology, findings, and insights gained from the customer churn
prediction models.
Visualizations comparing predicted and actual customer churn.
Codebase with well-documented scripts for data preprocessing, feature engineering, and model
development.
11. Success Criteria:
The success of the project will be measured by selecting the model that achieves the highest
accuracy in predicting customer churn, as determined by the chosen evaluation metrics.
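
A sketch of the evaluation step (6) above, assuming a fitted classifier named model and a held-out test set x_test/y_test from the earlier split:

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_pred = model.predict(x_test)
y_prob = model.predict_proba(x_test)[:, 1]   # churn probability for ROC-AUC

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
print(confusion_matrix(y_test, y_pred))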

3.3 Algorithm/Techniques

1. Machine Learning Algorithms:


Logistic Regression
Decision Trees
Random Forests
Gradient Boosting Machines (GBM)
Support Vector Machines (SVM)
2. Feature Engineering:
Creating lag features
Engineering additional features based on usage, billing, and engagement indicators
3. Ensemble Methods:
Combining multiple models using ensemble techniques (bagging or boosting)
4. Hyperparameter Tuning:
Using techniques like grid search or random search to find optimal hyperparameters for
models
5. Cross-Validation:
Employing k-fold cross-validation to assess model performance on different subsets of the
data
6. Evaluation Metrics:
Accuracy
Precision
Recall
F1-Score
ROC-AUC Curve
Designing custom metrics reflecting the financial impact of retention decisions
7. Feature Selection:
Identifying and selecting the most relevant features to improve model efficiency and
interpretability
8. Backtesting:
Simulating the performance of the predictive model on historical data to assess how well it
would have performed in the past

9. Deep Learning Models:


Exploring more advanced deep learning architectures beyond traditional neural networks,
such as recurrent neural networks (RNNs) or transformer models
10. Reinforcement Learning (in specific cases):
Exploring reinforcement learning approaches for dynamic retention decision-making,
although this is less common in practice.

These algorithms and techniques offer a diverse set of tools for addressing the complexities of
customer churn prediction, combining traditional statistical analysis, machine learning, and
deep learning.
3.3.1 Algorithm 1
• Importing the data
• Importing the libraries
• Loading the dataset and analyzing the customer behavior
• Scaling the data
• Building the model and training the data
• Performance evaluation on test set
Here we import the dataset and explore it.
The dataset contains the following features:
• customerID: Unique identifier for each customer.

• gender, SeniorCitizen, Partner, Dependents: Demographic features.


• tenure: How long a customer has been with the company (in months).
• Phone Service, Multiple Lines, Internet Service, Online Security, etc.: Information on
services the customer is subscribed to.
• Contract, Paperless Billing, Payment Method: Contract and payment details.
• Monthly Charges and Total Charges: Financial data regarding the customer's payment.
• Churn: The target variable (whether the customer has churned or not).
Reading the CSV file of the dataset
The dataset is provided as a CSV file. It can be uploaded to the working environment (or
imported from an external source such as Kaggle) and then read using
pandas.read_csv(‘saved dataset’). After loading, the dataset should be checked for missing
values; in this dataset, the TotalCharges column contains blank entries that must be coerced
to numeric and handled.
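
A minimal loading-and-inspection sketch; the file name follows the appendix source code, and the path may differ in other environments:

import pandas as pd

df = pd.read_csv("Telco_Customer_Churn_Dataset.csv")
print(df.shape)            # rows and columns
print(df.isnull().sum())   # check for missing values after loading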

Steps for Exploratory Data Analysis (EDA):
1. Data Cleaning:
o Handle missing values.
o Convert data types where necessary (e.g., Total Charges might need to be converted from
string to numeric).
2. Univariate Analysis:
o Distribution of the Churn variable to check churn rates.
o Analyze individual features like tenure, Monthly Charges, etc., using histograms or box plots
to understand distributions.
3. Bivariate Analysis:
o Analyze relationships between Churn and other features (e.g., gender, Contract, Payment
Method) using bar plots or stacked charts.
o Correlation analysis between numeric variables like tenure, Monthly Charges, and Total
Charges.
4. Multivariate Analysis:
o Investigate interactions between multiple variables (e.g., how tenure and Contract affect
Churn).
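
A short sketch of steps 1–3 above, mirroring the appendix code (assuming the DataFrame df from the loading step):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: TotalCharges arrives as text; coerce to numeric and drop blank rows
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna()

# Step 2: churn rate as a percentage (univariate)
print(100 * df["Churn"].value_counts() / len(df))

# Step 3: churn against contract type (bivariate)
sns.countplot(data=df, x="Contract", hue="Churn")
plt.show()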

Scaling the data

After separating the numeric features, we scale the data to avoid large differences in value
ranges when features are compared. The data can then be divided into parts for training and
testing. Here we use MinMaxScaler() to map all values into the range of 0 to 1, and fit the
numeric features with this scaler.
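
A minimal sketch of this scaling step; the numeric column names follow the dataset description above:

from sklearn.preprocessing import MinMaxScaler

num_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
print(df[num_cols].describe())   # all values now lie between 0 and 1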

Building the model and training the data

1. Decision Trees
A Decision Tree is a simple, interpretable model that splits the data into branches based on
feature values to reach a decision (in this case, whether a customer will churn). It uses a series of
if-then rules to classify data. For instance, a tree might split customers based on features like
monthly charges, contract length, or customer service calls. Each split aims to increase the purity
of the nodes, meaning each node ideally contains mostly churners or non-churners.
Pros:
• Easy to interpret and visualize.

• Handles both numerical and categorical data.


Cons:
• Prone to overfitting, especially with noisy data.
• Performance can be suboptimal with complex datasets.

2. Random Forests
A Random Forest is an ensemble method that builds multiple decision trees and combines their
predictions. Each tree in the forest is trained on a random subset of the data and features. The final
prediction is made by averaging the predictions from all the trees (for regression) or using
majority voting (for classification).

Pros:
• Reduces overfitting compared to a single decision tree.
• More accurate and robust to variations in data.
Cons:
• Less interpretable than a single decision tree.

3.4 Result
1. Model Performance Metrics:
Higher values of Accuracy, Precision, and Recall indicate better predictive performance.
Evaluate the metrics for each model to understand how close the predicted churn labels are to
the actual outcomes.
2. Model Comparison:
Identify the model that consistently achieves the highest values across the chosen metrics. This
model is considered the best performer in terms of accuracy for your specific dataset.
3. Visualization:
Create visualizations, such as bar charts, comparing the predicted customer churn of each model
against the actual churn labels. This provides a qualitative understanding of how well the
models capture the trends and patterns in the data.
4. Consideration of Complexity:
While accuracy is crucial, also consider the complexity of each model. Simpler models are often
preferred, especially if they achieve comparable accuracy, as they may generalize better to
unseen data.
5. Potential Next Steps:
If the selected model meets your accuracy requirements, you may proceed to deploy it for real-
time predictions or further refine it with additional features or hyperparameter tuning.
6. Documentation:
Document the results comprehensively, including insights gained from the comparison, any
challenges encountered, and recommendations for future improvements or adjustments to the
models.

CONCLUSIONS

The "Customer Churn Prediction in Telecom Industry" project aimed to leverage Decision Tree ,
Random Forest, and support vector Machine (SVM) to forecast the customer churn of company. The
following key conclusions and insights were obtained from the analysis:

Based on the accuracy of the models:

The accuracy of the Decision Tree model is 91.27%.
The accuracy of the Random Forest model is 94.85%.
The accuracy of the Support Vector Machine model is 90.46%.

Random Forest predicts customer churn most precisely, as its accuracy is higher than that of
the other models.

APPENDICES
SOURCE CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the plots below
import seaborn as sns            # needed for the plots below
data=pd.read_csv("/content/Telco_Customer_Churn_Dataset (1).csv")
data.head()
data.shape
data.columns.values
data.dtypes
data.describe()
data['Churn'].value_counts().plot(kind='barh', figsize=(8,6))
plt.xlabel("count",labelpad=14)
plt.ylabel("Target variable",labelpad=14)
plt.title("count of Target per category",y=1.02)
data['Churn'].value_counts()
100*data['Churn'].value_counts()/len(data['Churn'])
data.info(verbose=True)
miss=pd.DataFrame((data.isnull().sum())*100/data.shape[0]).reset_index()
plt.figure(figsize=(16,5))
ax=sns.pointplot(x='index',y=0,data=miss)
plt.xticks(rotation=90,fontsize=7)
plt.title("percentage of missimg value")
plt.ylabel("percentage")
plt.show()

telco_data=data  # the listing switches from `data` to `telco_data` here; alias keeps it runnable
telco_data.TotalCharges=pd.to_numeric(telco_data.TotalCharges,errors='coerce')
telco_data.isnull().sum()

telco_data.loc[telco_data['TotalCharges'].isnull()==True]
telco_data.dropna(how='any',inplace=True)
print(telco_data['tenure'].max())

labels=["{0}-{1}".format(i,i+11) for i in range(1,72,12)]


telco_data['tenure_group']=pd.cut(telco_data.tenure,range(1,80,12),right=False,labels=labels)
telco_data['tenure_group'].value_counts()

#Data Exploration
for i,predictor in enumerate(telco_data.drop(columns=['Churn','TotalCharges','MonthlyCharges'])):
    plt.figure(i)
    sns.countplot(data=telco_data,x=predictor,hue='Churn')

telco_data['Churn'] = np.where(telco_data.Churn == 'No',0,1)


telco_data.head()
telco_data_dummies=pd.get_dummies(telco_data)  # assumed: dummy encoding before the plots below
sns.lmplot(data=telco_data_dummies,x='MonthlyCharges',y='TotalCharges',fit_reg=False)

Mth=sns.kdeplot(telco_data_dummies.MonthlyCharges[(telco_data_dummies["Churn"]==0)],
color="Red",fill=True)
Mth=sns.kdeplot(telco_data_dummies.MonthlyCharges[(telco_data_dummies["Churn"]==1)],
ax =Mth,color="Blue",fill=True)
Mth.legend(["No Churn","Churn"],loc='upper right')  # labels follow plotting order: Churn==0 first
Mth.set_ylabel('Density')
Mth.set_xlabel('Monthly Charges')
Mth.set_title('Monthly Charges by churn')

plt.figure(figsize=(20,8))
telco_data_dummies.corr()['Churn'].sort_values(ascending=False).plot(kind='bar')

plt.figure(figsize=(20,8))
sns.heatmap(telco_data_dummies.corr(),cmap="Paired")

def uniplot(df,col,title,hue=None):
    sns.set_style('whitegrid')
    sns.set_context('talk')
    plt.rcParams["axes.labelsize"]=20
    plt.rcParams['axes.titlesize']=22
    plt.rcParams['axes.titlepad']=30

    temp=pd.Series(data=hue)
    fig,ax=plt.subplots()
    width=len(df[col].unique()) + 7 + 4 * len(temp.unique())
    fig.set_size_inches(width,8)
    plt.xticks(rotation=45)
    plt.yscale('log')
    plt.title(title)
    ax=sns.countplot(data=df,x=col,order=df[col].value_counts().index,hue=hue,palette='bright')
    plt.show()

Model Building
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from imblearn.combine import SMOTEENN

df=pd.read_csv("df_dummies.csv")
df

x=df.drop('Churn',axis=1)
print(x)

y=df['Churn']
print(y)

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)
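
# Optional class balancing with SMOTEENN (imported above). The original
# listing never calls it, so the resampling below is an illustrative sketch,
# not part of the recorded pipeline.
sm=SMOTEENN()
x_train,y_train=sm.fit_resample(x_train,y_train)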

#Decision tree classifier


dt=DecisionTreeClassifier(criterion='gini',random_state=100,max_depth=6,min_samples_leaf=8)
dt.fit(x_train,y_train)

y_pred=dt.predict(x_test)
y_pred

dt.score(x_test,y_test)  # score against the true labels, not the predictions

print(classification_report(y_test,y_pred,labels=[0,1]))

print(confusion_matrix(y_test,y_pred))

# Random Forest Classifier

from sklearn.ensemble import RandomForestClassifier


rf=RandomForestClassifier(n_estimators=100,criterion='gini',random_state=100,max_depth=6,min_samples_leaf=8)
rf.fit(x_train,y_train)



y_pred_rf=rf.predict(x_test)
print(classification_report(y_test, y_pred_rf, labels=[0,1]))
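
# Support Vector Machine (the third model reported in the conclusions).
# The original listing omits this block, so the sketch below is illustrative;
# SVM is scale-sensitive, so appropriately scaled features are assumed.
from sklearn.svm import SVC

svc=SVC(kernel='rbf',random_state=100)
svc.fit(x_train,y_train)
y_pred_svc=svc.predict(x_test)
print(classification_report(y_test,y_pred_svc,labels=[0,1]))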

SCREENSHOTS:

Figure 1: Target Variables per Category

Figure 2: Dataset

Figure 3: Percentage of Missing Values

Figure 4: Bivariate Analysis Report

Figure 5: Distribution of Churners

Figure 6: Heatmap of Churners
BIBLIOGRAPHY

Code Links:
https://colab.research.google.com/drive/1N-1kkpIRlKglirZHJxqpzkAN9Fx868ew?authuser=3#scrollTo=p5TybBo9v9xz


