Balaji Capstone Project 1

The document discusses a project to develop a churn prediction model for a DTH company facing competition. It describes understanding the business problem, need for the study, and social opportunity. It then discusses the customer churn dataset including data ingestion, visualization, and attribute information to understand the data for building a predictive model.

Uploaded by

Balaji Bala


CAPSTONE GRADED PROJECT -1

A PROJECT REPORT

Submitted by

BALAJI S (PGP-DSBA, JUNE 2023 TO JUNE 2024)

Introduction of the business problem
Problem Statement: -
A DTH (direct-to-home) provider is facing heavy competition in the current market, and retaining existing customers has become a challenge. The company therefore wants a model that predicts account churn so that it can make segmented offers to potential churners. Account churn is a major concern in this company because one account can have multiple customers, so losing one account may mean losing more than one customer.
We have been assigned to develop a churn prediction model for this company and to provide business recommendations on the campaign.
The model or campaign has to be unique and has to be sharp when offers are suggested. The suggested offers should be a win-win for the company and its customers, so that the company retains customers without taking a hit on revenue.

Need of the study/project

This study is essential for the client to plan for the future in terms of product design, sales, and rolling out different offers for different customer segments. The outcome of this project will give a clear understanding of where the firm stands now and how much capacity it has for taking risk. It will also indicate the future prospects of the organization, how it can improve and plan for them, and how it can retain customers in the long run.

Understanding business/social opportunity

This is a case study of a DTH company whose customers are assigned to unique account IDs. A single account ID can hold many customers (like a family plan) across gender and marital status, and customers have flexibility in the mode of payment they opt for. Customers are further segmented by the type of plan they choose according to their usage, which also depends on the device they use (computer or mobile). Moreover, they earn cashback on bill payments.
The overall business runs on customer loyalty and stickiness, which in turn come from providing quality and value-added services. Running promotional and festival offers may also help the organization acquire new customers and retain existing ones.
We can conclude that a customer retained is regular income for the organization, a customer added is new income, and a customer lost has a negative impact. Because a single account ID holds multiple customers, the closure of one account ID means losing multiple customers.
This is a great opportunity for the company, as almost every individual or family needs a DTH connection, which in turn also leads to increased competition.
The question is how a company can differentiate itself from its competitors, and which parameters play a vital role in earning customer loyalty and making customers stay. All these social responsibilities will decide the best player in the market.
Data Report
Dataset of problem: - Customer Churn Data
Data Dictionary: -
• AccountID -- Account unique identifier
• Churn -- Account churn flag (target variable)
• Tenure -- Tenure of the account
• City_Tier -- Tier of the primary customer's city
• CC_Contacted_L12m -- Number of times all the customers of the account have contacted customer care in the last 12 months
• Payment -- Preferred payment mode of the customers in the account
• Gender -- Gender of the primary customer of the account
• Service_Score -- Satisfaction score given by the customers of the account on the service provided by the company
• Account_user_count -- Number of customers tagged with this account
• account_segment -- Account segmentation on the basis of spend
• CC_Agent_Score -- Satisfaction score given by the customers of the account on the customer care service provided by the company
• Marital_Status -- Marital status of the primary customer of the account
• rev_per_month -- Monthly average revenue generated by the account in the last 12 months
• Complain_l12m -- Whether any complaint has been raised by the account in the last 12 months
• rev_growth_yoy -- Revenue growth percentage of the account (last 12 months vs. last 24 to 13 months)
• coupon_used_l12m -- Number of times customers have used coupons to make a payment in the last 12 months
• Day_Since_CC_connect -- Number of days since any customer in the account last contacted customer care
• cashback_l12m -- Monthly average cashback generated by the account in the last 12 months
• Login_device -- Preferred login device of the customers in the account
Data Ingestion: -
Loaded the required packages, set the working directory, and
loaded the data file.
The data set has 11,260 observations and 19 variables (18
independent and 1 dependent or target variable).

Table 1 – glimpse of the data-frame head with top 5 rows

Understanding how data was collected in terms of time, frequency and methodology
• Data was collected for 11,260 randomly selected unique account IDs, across gender and marital status.
• Looking at the variables “CC_Contacted_L12m”, “rev_per_month”, “Complain_l12m”, “rev_growth_yoy”, “coupon_used_l12m”, “Day_Since_CC_connect” and “cashback_l12m”, we can conclude that the data was collected for the last 12 months.
• The data has 19 variables: 18 independent and 1 dependent (target) variable, which shows whether the customer churned or not.
• The data combines the services customers are using with their payment option and basic individual details. It is a mix of categorical and continuous variables.
Visual inspection of data (rows, columns, descriptive details)

Data has 11,260 rows and 19 variables.

Table 2:- Dataset Information

Fig 1:- Shape of dataset

• Describing data: - The summary statistics vary widely across variables, which indicates that each variable has its own scale and distribution.
Table 3: - Describing Dataset

1. Except for the variables “AccountID”, “Churn”, “rev_growth_yoy” and “coupon_used_for_payment”, all variables have null values present.

Table 4: - Showing Null Values in Dataset

The data has no duplicate observations.
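The null and duplicate checks described above can be sketched with pandas. This is a minimal illustration on a toy data frame whose values are invented; only the column names come from the data dictionary, and the report's actual code is not shown in the source.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the churn data; all values are invented for illustration.
df = pd.DataFrame({
    "AccountID": [20000, 20001, 20002, 20003],
    "Churn": [1, 0, 0, 1],
    "Tenure": [4, np.nan, 0, 13],
    "Gender": ["Female", "Male", None, "Male"],
})

# Null count per variable (the kind of summary a null-value table reports).
null_counts = df.isnull().sum()
print(null_counts)

# Number of fully duplicated rows; 0 means no duplicate observations.
n_duplicates = int(df.duplicated().sum())
print(n_duplicates)
```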

Understanding of attributes (variable info, renaming if required)

This project has 18 attributes contributing towards the target variable. Let’s discuss these variables one after another.
• AccountID – This variable is the unique identifier of an account. It is of integer data type and has no null values.
• Churn – This is our target variable, which represents whether the customer has churned or not. It is categorical in nature with no null values. “0” represents “NO” and “1” represents “YES”.
• Tenure – This represents the total tenure of the account since it was opened. It is a continuous variable with 102 null values.
• City_Tier – This variable segregates customers into 3 groups based on the city in which the primary customer resides. It is categorical in nature and has 112 null values.
• CC_Contacted_L12m – This variable represents the number of times all the customers of the account have contacted customer care in the last 12 months. It is continuous in nature and has 102 null values.
• Payment – This variable represents the preferred mode of bill payment opted for by the customer. It is categorical in nature and has 109 null values.
• Gender – This variable represents the gender of the primary account holder. It is categorical in nature and has 108 null values.
• Service_Score – Score given by the customer for the service provided by the company. This variable is categorical in nature and has 98 null values.
• Account_user_count – This variable gives the number of customers attached to an account ID. It is continuous in nature and has 112 null values.
• account_segment – This variable segregates customers into different segments based on their spend and revenue generation. It is categorical in nature and has 97 null values.
• CC_Agent_Score – Score given by the customer for the service provided by the company's customer care representative. This variable is categorical in nature and has 116 null values.
• Marital_Status – This represents the marital status of the primary account holder. It is categorical in nature and has 212 null values.
• rev_per_month – This represents the average revenue generated per account ID in the last 12 months. It is continuous in nature and has 102 null values.
• Complain_l12m – This denotes whether the customer has raised any complaint in the last 12 months. It is categorical in nature and has 357 null values.
• rev_growth_yoy – This variable shows the percentage revenue growth of the account for the last 12 months vs. the last 24 to 13 months. It is continuous in nature and has no null values.
• coupon_used_l12m – This represents the number of times customers have used discount coupons for bill payment. It is continuous in nature and has no null values.
• Day_Since_CC_connect – This represents the number of days since the customer last contacted customer care. A higher number of days indicates better service. It is continuous in nature and has 357 null values.
• cashback_l12m – This variable represents the amount of cashback earned by the customer during bill payment. It is continuous in nature and has 471 null values.
• Login_device – This variable represents the device on which the customer is availing the services, phone or computer. It is categorical in nature and has 221 null values.

❖ With the above understanding of the data, renaming any of the variables is not required.
❖ With the above understanding of the data, we can move on to the EDA part, where we will understand the data better and treat bad data, null values, and outliers.

Exploratory data analysis

Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones)
Univariate Analysis: -
❖ The variables show outliers, which need to be treated in further steps.
Table 5: - Showing Outliers in data
❖ None of the variables is normally distributed; all are skewed.
Fig 2: - Count plot of categorical variable
Inferences from count plot: -
• Most customers are from city tier “1”, which suggests a high population density in this city type.
• Most customers prefer debit and credit cards as their mode of payment.
• The proportion of male customers is higher than that of female customers.
• The average service score given by customers is around “3”, which shows room for improvement.
• Most customers are in the “Super+” segment and the fewest are in the “Regular” segment.
• Most customers availing the services are “Married”.
• Most customers prefer “Mobile” as the device to avail the services.
Bi-variate Analysis: -
• Pair plot across all categorical data and its impact on the target variable.
Fig 4: - Pair plot across categorical variables

• The pair plot shown above indicates that the independent variables are weak or poor predictors of the target variable, as the densities of the independent variables overlap with the density of the target variable.

Correlation among variables: -

We computed the correlation between variables after treating bad data and missing values. We also converted the variables to integer data types before checking the correlation, because variables of a categorical data type would not appear in the picture below.
Fig 6: - Correlation among variables
Inferences from correlation: -
• Variable “Tenure” shows a high correlation with Churn.
• Variable “Marital_Status” shows a high correlation with Churn.
• Variable “Complain_ly” shows a high correlation with Churn.
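As a rough sketch of how such a correlation check might look in pandas, the snippet below ranks variables by the strength of their correlation with the target. The data values are invented, and the use of `DataFrame.corr` (Pearson by default) is an assumption, since the report does not say how the correlation was computed.

```python
import pandas as pd

# Toy integer-encoded data; values are invented for illustration only.
df = pd.DataFrame({
    "Churn":       [1, 1, 0, 0, 0, 1, 0, 0],
    "Tenure":      [1, 2, 10, 12, 9, 0, 15, 11],
    "Complain_ly": [1, 1, 0, 0, 1, 1, 0, 0],
})

# Pearson correlation of every variable with the target,
# ordered by absolute strength (sign is kept in the values).
corr_with_churn = (
    df.corr()["Churn"]
    .drop("Churn")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_churn)
```

A negative value for “Tenure” in this toy data would mean that longer-tenured accounts churn less; the sign matters as much as the magnitude when reading the heat map.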

Removal of unwanted variables: - After an in-depth look at the data, we conclude that no variable needs to be removed at this stage of the project. The variable “AccountID”, a unique ID assigned to each account, is a candidate for removal; however, dropping it would leave 8 duplicate rows. All the remaining variables look important based on the univariate and bi-variate analysis.
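The effect described above, where dropping the identifier exposes duplicate rows, can be illustrated as follows. The toy values are invented; in the report's data the same check is said to yield 8 duplicates.

```python
import pandas as pd

# Toy data: every row is unique only because of AccountID.
df = pd.DataFrame({
    "AccountID": [1, 2, 3, 4],
    "Churn": [0, 0, 1, 0],
    "Tenure": [5, 5, 2, 5],
})

# Duplicates with the identifier present.
dups_with_id = int(df.duplicated().sum())

# Duplicates once the identifier is dropped.
dups_without_id = int(df.drop(columns=["AccountID"]).duplicated().sum())

print(dups_with_id, dups_without_id)
```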

Outlier treatment: -
This dataset is a mix of continuous and categorical variables. It does not make sense to perform outlier treatment on categorical variables, as each category denotes a type of customer. So, we perform outlier treatment only on the continuous variables.
• We used box plots to determine the presence of outliers in a variable.
• The dots outside the upper limit of the box plot represent the outliers in the variable.
• We have 8 continuous variables in the dataset, namely “Tenure”, “CC_Contacted_LY”, “Account_user_count”, “cashback”, “rev_per_month”, “Day_Since_CC_connect”, “coupon_used_for_payment” and “rev_growth_yoy”.
• We used the upper limit and lower limit to remove outliers. Below is the pictorial representation of the variables before and after outlier treatment.
Fig 7: - Before and after outlier treatment
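One common way to apply upper and lower limits is to cap values at the box-plot whiskers. The sketch below assumes the usual 1.5 × IQR rule and assumes capping rather than dropping rows; the report does not state which limits or which treatment were actually used, and `cap_outliers_iqr` is a hypothetical helper name.

```python
import pandas as pd

def cap_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those limits."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return series.clip(lower=lower, upper=upper)

# Toy values with one obvious outlier (99); invented for illustration.
tenure = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 99], name="Tenure")
capped = cap_outliers_iqr(tenure)
print(capped.tolist())
```

Capping keeps the row count unchanged, which matters here because every row is an account that the model should still be able to score.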
Missing value treatment and variable transformation: -
• Out of 19 variables, data anomalies are present in 17 variables and null values in 15 variables.
• We use the “Median” to impute null values where the variable is continuous, because the median is less sensitive to outliers than the mean.
• We use the “Mode” to impute null values where the variable is categorical.
• We treated null values variable by variable, as each variable is unique.
Treating Variable “Tenure”
• We look at the unique observations in the variable and see that “#” and “nan” are present in the data.
• Here “#” is an anomaly and “nan” represents a null value.
Fig 8: - before treatment
• We replace “#” with “nan” and then replace “nan” with the calculated median of the variable. After this we no longer see any bad data or null values.
• We converted the data type to integer, because the IDE had recognized it as an object data type due to the presence of bad data.
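The steps for “Tenure” can be sketched in pandas as below. The sample values are invented; only the “#” anomaly, the median imputation, and the integer conversion come from the text.

```python
import numpy as np
import pandas as pd

# Toy column read in as strings because of the "#" anomaly.
tenure = pd.Series(["4", "0", "#", np.nan, "13", "8"], name="Tenure")

# Step 1: treat "#" as a null value, then convert the strings to numbers.
tenure = pd.to_numeric(tenure.replace("#", np.nan))

# Step 2: impute nulls with the median of the observed values.
median = tenure.median()
tenure = tenure.fillna(median)

# Step 3: cast to integer (note: this truncates a fractional median).
tenure = tenure.astype(int)
print(tenure.tolist())
```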
Treating Variable “City_Tier”
• We look at the unique observations in the variable and see the presence of a null value, as shown below.

Fig 9: - before treatment

• We replaced “nan” with the calculated mode of the variable, and now we don’t see any null values.
• We converted the data type to integer, because the IDE had recognized it as an object data type due to the presence of bad data.

Treating Variable “CC_Contacted_LY”

• We look at the unique observations in the variable and see the presence of a null value, as shown below.
• We replace “nan” with the calculated median of the variable, and now we don’t see any null values.
• We converted the data type to integer, because the IDE had recognized it as an object data type due to the presence of bad data.
Treating Variable “Payment”
• We look at the unique observations in the variable and see the presence of a null value, as shown below.

Fig 10: - before treatment

• We replace “nan” with the calculated mode of the variable, and now we don’t see any null values.
• We also performed label encoding for the observations, where 1 = Debit card, 2 = UPI, 3 = Credit card, 4 = Cash on delivery and 5 = E-wallet. We then convert them to integer data type, as it will be used for further model building.
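A minimal sketch of the mode imputation and label encoding for “Payment” follows. The numeric codes are the ones listed above; the exact spelling of the category labels in the raw data is an assumption, as are the sample values.

```python
import numpy as np
import pandas as pd

# Toy column; label spellings are assumed, not taken from the raw data.
payment = pd.Series(
    ["Debit Card", "UPI", np.nan, "Credit Card",
     "Debit Card", "E wallet", "Cash on Delivery"],
    name="Payment",
)

# Impute nulls with the most frequent category (the mode).
payment = payment.fillna(payment.mode()[0])

# Label encoding with the codes given in the report.
payment_codes = {
    "Debit Card": 1,
    "UPI": 2,
    "Credit Card": 3,
    "Cash on Delivery": 4,
    "E wallet": 5,
}
payment_encoded = payment.map(payment_codes).astype(int)
print(payment_encoded.tolist())
```

Note that these integer codes impose an arbitrary order on the payment modes; whether that matters depends on the model used later.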
Treating Variable “Gender”
• We look at the unique observations in the variable and see the presence of a null value and multiple abbreviations of the same observations, as shown below.

Fig 11: - before treatment

• We replace “nan” with the calculated mode of the variable, and now we don’t see any null values.
• We also performed label encoding for the observations, where 1 = Female and 2 = Male.
• We then convert them to integer data type, as it will be used for further model building.
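When the same category appears under several abbreviations, the variants need to be merged before imputing and encoding. The sketch below assumes the abbreviations are “M” and “F”; the report does not list the actual variants, and the sample values are invented.

```python
import numpy as np
import pandas as pd

# Toy column with assumed abbreviations "M" and "F".
gender = pd.Series(["Female", "M", "Male", "F", np.nan, "Male"], name="Gender")

# Merge abbreviations into one label per category.
gender = gender.replace({"M": "Male", "F": "Female"})

# Impute nulls with the mode, then encode: 1 = Female, 2 = Male.
gender = gender.fillna(gender.mode()[0])
gender_encoded = gender.map({"Female": 1, "Male": 2}).astype(int)
print(gender_encoded.tolist())
```

Merging before computing the mode matters: otherwise “M” and “Male” are counted separately and the imputed value could be wrong.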
Treating Variable “Service_Score”
• We look at the unique observations in the variable and see the presence of a null value, as shown below.
• We replace “nan” with the calculated mode of the variable, and now we don’t see any null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “Account_user_count”
• We look at the unique observations in the variable and see the presence of a null value as well as “@” as bad data, as shown below.

Fig 12: - before treatment

• We replace “@” with “nan” and then replace “nan” with the calculated median of the variable. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “account_segment”
• We look at the unique observations in the variable and see the presence of a null value as well as different denotations for the same type of observation, as shown below.

Fig 13: - before treatment

• We replace “nan” with the calculated mode of the variable and also label the different account segments, where 1 = Super, 2 = Regular Plus, 3 = Regular, 4 = HNI and 5 = Super Plus. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “CC_Agent_Score”
• We look at the unique observations in the variable and see the presence of a null value, as shown below.

• We replace “nan” with the calculated mode of the variable, and now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.

Treating Variable “Marital_Status”

• We look at the unique observations in the variable and see the presence of a null value, as shown below.

Fig 14: - before treatment

• We replace “nan” with the calculated mode of the variable and also label the observations, where 1 = Single, 2 = Divorced and 3 = Married. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “rev_per_month”
• We look at the unique observations in the variable and see the presence of a null value as well as “+”, which denotes bad data, as shown below.
Fig 15: - before treatment
• We replace “+” with “nan” and then replace “nan” with the calculated median of the variable. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “Complain_ly”
• We look at the unique observations in the variable and see the presence of a null value.
• We replace “nan” with the calculated mode of the variable, and now we don’t see any null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “rev_growth_yoy”
• We look at the unique observations in the variable and see the presence of “$”, which denotes bad data, as shown below.

Fig 15: - before treatment

• We replace “$” with “nan” and then replace “nan” with the calculated median of the variable. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.

Treating Variable “coupon_used_for_payment”

• We look at the unique observations in the variable and see the presence of “$”, “*” and “#”, which denote bad data, as shown below.

• We replace “$”, “*” and “#” with “nan” and then replace “nan” with the calculated median of the variable. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “Day_Since_CC_connect”
• We look at the unique observations in the variable and see the presence of “$”, which denotes bad data, as well as null values.
• We replace “$” with “nan” and then replace “nan” with the calculated median of the variable. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.

Treating Variable “cashback”

• We look at the unique observations in the variable and see the presence of “$”, which denotes bad data, as well as null values.
• We replace “$” with “nan” and then replace “nan” with the calculated median of the variable. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Treating Variable “Login_device”
• We look at the unique observations in the variable and see the presence of “&&&&”, which denotes bad data, as well as null values.
• We replace “&&&&” with “nan” and then replace “nan” with the calculated mode of the variable.
• We also label the observations, where 1 = Mobile and 2 = Computer. Now we don’t see any bad data or null values.
• We then convert them to integer data type, as it will be used for further model building.
Count of null values before and after treatment

Fig 16: - Before and after null value treatment

• We see no null values across the variables, which indicates that the data is now clean and we can move on to data transformation if required.
Variable transformation: -
• Different variables are measured on different scales. For example, “Cashback” denotes currency whereas “CC_Agent_Score” denotes a rating provided by the customers, so they differ in their statistical summaries as well.
• Scaling is therefore required for this data set, so that all variables lie on a comparable range.
• We use the MinMax scaler to normalize the data, which rescales each variable to the range 0 to 1.
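Min-max normalization maps each column to the range 0 to 1 via (x − min) / (max − min). The sketch below writes the formula out with pandas on invented values; scikit-learn's `MinMaxScaler` with its default `feature_range` performs the same rescaling, and which of the two the report used is not stated.

```python
import pandas as pd

# Toy columns on very different scales; values are invented.
df = pd.DataFrame({
    "cashback": [100.0, 150.0, 200.0, 300.0],
    "CC_Agent_Score": [1, 3, 5, 2],
})

# Column-wise min-max scaling: (x - min) / (max - min).
scaled = (df - df.min()) / (df.max() - df.min())
print(scaled)
```

After scaling, every column has minimum 0 and maximum 1, so a currency amount and a 1-to-5 rating contribute on the same footing to distance-based or gradient-based models.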
Addition of new variables: -
At the current stage we do not see a need to create any new variable. New variables may be required at a later stage of model building and can be created accordingly.
Business insights from EDA
Is the data unbalanced? If so, what can be done? Please explain in
the context of the business
• The dataset provided is imbalanced. The category counts of our target variable “Churn” differ widely.
• The count of “0” is 9364 and the count of “1” is 1896, i.e. roughly 83% non-churners and 17% churners.

Fig 18: - Imbalanced dataset
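The degree of imbalance follows directly from the two counts reported above. The short calculation below uses only those counts; the variable names are illustrative.

```python
# Class counts of the target variable "Churn" as reported above.
counts = {0: 9364, 1: 1896}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}
imbalance_ratio = counts[0] / counts[1]

print(total)                                         # 11260
print({k: round(v, 3) for k, v in shares.items()})   # {0: 0.832, 1: 0.168}
print(round(imbalance_ratio, 2))                     # 4.94
```

With about five non-churners for every churner, a model that always predicts “no churn” would already be about 83% accurate, so accuracy alone is a misleading yardstick for this problem.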

Any other business insights
• We see decent variation in the data collected, with a mixture of services provided, ratings given by the customers, and customer profile details.
• The business needs to increase its visibility in tier 2 cities, where it can acquire new customers.
• The business can promote payment via standing instructions on a bank account or UPI, which is hassle-free and safe for the customer.
• Service scores need improvement, and a lot of room for improvement remains.
• The business can roll out a survey to better understand customers’ expectations.
• The business can train its customer care executives to provide a better customer experience, which in turn will improve their feedback scores.
• The business can offer curated plans to customers based not only on their spend but also on the tenure they have spent with the business.
• The business can offer a curated plan for married people, something like a family floater.
End of Project Note - 1
