Cart-Rf-Ann: Prepared by Muralidharan N
CART-RF-ANN
An Insurance firm providing tour insurance is facing higher claim frequency. The
management decides to collect data from the past few years. You are assigned the task to
make a model which predicts the claim status and provide recommendations to management.
Use CART, RF & ANN and compare the models' performances in train and test sets.
Data Dictionary
1. Target: Claim Status (Claimed)
2. Code of tour firm (Agency Code)
3. Type of tour insurance firms (Type)
4. Distribution channel of tour insurance agencies (Channel)
5. Name of the tour insurance products (Product)
6. Duration of the tour (Duration)
7. Destination of the tour (Destination)
8. Amount of sales of tour insurance policies (Sales)
9. The commission received for tour insurance firm (Commission)
10. Age of insured (Age)
2.1 Data Ingestion: Read the dataset. Do the descriptive statistics and do null value
condition check, write an inference on it.
Channel is almost entirely Online.
We look further at the distribution of the data in the univariate and bivariate analysis.
Because the dataset has no unique identifier, I am not dropping duplicate rows; they may belong to different customers.
AGENCY_CODE: 4
JZI 239
CWT 472
C2B 924
EPX 1365
TYPE: 2
Airlines 1163
Travel Agency 1837
CLAIMED: 2
Yes 924
No 2076
CHANNEL: 2
Offline 46
Online 2954
PRODUCT NAME: 5
Gold Plan 109
Silver Plan 427
Bronze Plan 650
Cancellation Plan 678
Customised Plan 1136
DESTINATION: 3
EUROPE 215
Americas 320
ASIA 2465
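A listing like the one above can be produced by counting the levels of each categorical column. The following is a minimal sketch, assuming pandas; the small DataFrame is a hypothetical stand-in for the insurance data, and its column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature stand-in for the insurance data (not the real file).
df = pd.DataFrame({
    "Type": ["Airlines", "Travel Agency", "Travel Agency", "Airlines", "Travel Agency"],
    "Channel": ["Online", "Online", "Offline", "Online", "Online"],
    "Age": [36, 48, 29, 51, 33],
})

# Categorical columns are listed explicitly here (an assumption for this sketch).
cat_cols = ["Type", "Channel"]

for col in cat_cols:
    # Number of distinct levels, then the count per level, least frequent first.
    counts = df[col].value_counts().sort_values()
    print(f"{col.upper()}: {df[col].nunique()}")
    print(counts.to_string())
```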
Categorical Variables
Agency Code
The distribution of agency codes shows EPX with the highest frequency.
The box plot shows the distribution of Sales for each agency code, with the Claimed column as the hue.
C2B appears to have more claims than the other agencies.
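A box plot of this kind can be drawn as follows. This is a minimal sketch, assuming seaborn and matplotlib are available; the column names (Agency_Code, Sales, Claimed) and the values are hypothetical stand-ins, not the report's data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative values only; the real dataset is not reproduced here.
df = pd.DataFrame({
    "Agency_Code": ["C2B", "C2B", "EPX", "EPX", "CWT", "JZI", "C2B", "EPX"],
    "Sales": [20.0, 55.0, 12.0, 30.0, 45.0, 18.0, 70.0, 25.0],
    "Claimed": ["Yes", "No", "No", "No", "Yes", "No", "Yes", "No"],
})

# One box per agency code, split by claim status through the hue argument.
ax = sns.boxplot(data=df, x="Agency_Code", y="Sales", hue="Claimed")
ax.set_title("Sales by agency code, split by claim status")
plt.tight_layout()
```

The same call with x set to another categorical column would give the corresponding plots for Type, Channel, Product Name, and Destination.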
The box plot shows the distribution of Sales for each Type, with the Claimed column as the hue.
The Airlines type appears to have more claims.
The majority of customers used the online channel; very few used the offline channel.
The box plot shows the distribution of Sales for each Channel, with the Claimed column as the hue.
The Customised Plan appears to be the most popular product compared with the other plans.
The box plot shows the distribution of Sales for each Product Name, with the Claimed column as the hue.
Asia is the destination customers choose most often compared with the other destinations.
The box plot shows the distribution of Sales for each Destination, with the Claimed column as the hue.
To build our models, we convert the object-type columns to numeric codes.
Feature: Type
[Airlines, Travel Agency]
Categories (2, object): [Airlines, Travel Agency]
[0 1]
Feature: Claimed
[No, Yes]
Categories (2, object): [No, Yes]
[0 1]
Feature: Channel
[Online, Offline]
Categories (2, object): [Offline, Online]
[1 0]
Feature: Destination
[ASIA, Americas, EUROPE]
Categories (3, object): [ASIA, Americas, EUROPE]
[0 1 2]
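The encoding shown above is consistent with pandas categorical codes, which number the levels in sorted order (so Offline becomes 0 and Online becomes 1). The following is a minimal sketch, assuming that approach; the three-row DataFrame is a hypothetical stand-in for the real data.

```python
import pandas as pd

# Hypothetical miniature stand-in for the insurance data.
df = pd.DataFrame({
    "Type": ["Airlines", "Travel Agency", "Airlines"],
    "Channel": ["Online", "Offline", "Online"],
    "Destination": ["ASIA", "Americas", "EUROPE"],
})

for col in ["Type", "Channel", "Destination"]:
    cat = pd.Categorical(df[col])
    # Show which integer code each level receives (levels are sorted).
    print(f"Feature: {col} -> {dict(enumerate(cat.categories))}")
    # Replace the text column with its integer codes.
    df[col] = cat.codes
```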
Checking the proportion of 0s and 1s in the target column (Claimed). From the counts above, roughly 69% of records are No (0) and 31% are Yes (1).
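The class proportions can be checked with a short calculation. This sketch uses the Claimed counts reported in the frequency listing (2076 No, 924 Yes); on the full DataFrame, the equivalent would be a normalized value count of the target column.

```python
import pandas as pd

# Class counts as reported in the frequency listing of this report.
claimed_counts = pd.Series({"No": 2076, "Yes": 924})

# Share of each class in the 3000 records.
proportions = claimed_counts / claimed_counts.sum()
print(proportions.round(3))
```

The classes are moderately imbalanced, which is worth keeping in mind when reading accuracy figures.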
2.2 Data Split: Split the data into test and train, build classification
model CART, Random Forest, Artificial Neural Network
For training and testing, we split the dataset into train and test sets in a 70:30 ratio.
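A 70:30 split can be sketched with scikit-learn as below. The random data and the random_state value are assumptions made for illustration; the report does not state the seed it used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 4 numeric features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# test_size=0.30 holds out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1  # seed is an assumption
)
print(X_train.shape, X_test.shape)
```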
BEST GRID
MODEL 2
MODEL 3
MLP CLASSIFIER
GRID SEARCH
FITTING THE MODEL USING THE OPTIMAL VALUES FROM GRID SEARCH
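The grid-search step for the three models can be sketched as follows. This is an illustration under stated assumptions, not the report's actual configuration: the data is synthetic, and the parameter grids, cv value, and seeds are hypothetical choices kept small so the example runs quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the insurance features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1
)

# Estimator and (hypothetical) parameter grid for each model.
searches = {
    "CART": (DecisionTreeClassifier(random_state=1),
             {"max_depth": [3, 5], "min_samples_leaf": [10, 20]}),
    "RF": (RandomForestClassifier(random_state=1),
           {"n_estimators": [50], "max_depth": [3, 5]}),
    "ANN": (MLPClassifier(random_state=1, max_iter=500),
            {"hidden_layer_sizes": [(10,), (20,)]}),
}

best_models = {}
for name, (estimator, grid) in searches.items():
    # Cross-validated search over the grid; refits on the best parameters.
    search = GridSearchCV(estimator, grid, cv=3)
    search.fit(X_train, y_train)
    best_models[name] = search.best_estimator_
    print(name, search.best_params_)
```

Each entry in best_models is already refitted on the full training set with the best parameters found, so it can be used directly for prediction.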
ACCURACY
CONFUSION MATRIX
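Accuracy and the confusion matrix can be computed with scikit-learn as in this minimal sketch. The label vectors are made-up values for illustration, not results from the report.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative true and predicted labels (0 = No claim, 1 = Claim).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]

# Fraction of predictions that match the true labels.
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print(acc)
print(cm)
```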
ACCURACY
CONFUSION MATRIX
AUC and ROC for the training data for Random Forest
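The AUC and the points of the ROC curve can be obtained as sketched below. The scores are illustrative values; for a fitted random forest, the score vector would typically be the positive-class column of predict_proba on the training features.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative true labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Area under the ROC curve.
auc = roc_auc_score(y_true, y_score)

# False positive rate and true positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(auc)
```

Plotting tpr against fpr gives the ROC curve itself.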
ACCURACY
CONFUSION MATRIX
MODEL 3
ANN
CONFUSION MATRIX
ACCURACY
ACCURACY
CONFUSION MATRIX
2.4 Final Model: Compare all the models and write an inference on which
model is best/optimized.
CONCLUSION:
2.5 Inference: Based on the whole Analysis, what are the business
insights and recommendations?
More data would help us understand the patterns and improve the models' predictions.
• Another interesting fact is that almost all of the offline business has a claim associated with it.
• The JZI agency's resources need training to pick up sales, as JZI is at the bottom; we should run a
promotional marketing campaign or evaluate whether to tie up with an alternate agency.
• The model gives about 80% accuracy, so when a customer books airline tickets or plans, we can
cross-sell insurance based on the claim data pattern.
• More sales happen via Travel Agency than Airlines, yet the trend shows that more claims are
processed at Airlines. We may need to take a deep dive into the process to understand the
workflow and the reason for this.