Fraud Detection in Auto Insurance

This document discusses developing a predictive model to detect fraudulent auto insurance claims. It describes preprocessing a dataset of 1000 claims, each with 38 features, to select important features and address class imbalance. Two models, RandomForestClassifier and XGBoostClassifier, are tuned and evaluated based on precision, recall, and ROC AUC. The best model achieves over 80% recall for both classes, but there is room for improvement. Key features for identifying potentially fraudulent claims are also identified.

Uploaded by

Dinesh Choudhary

Fraud Detection in Auto Insurance

Aditya Bhattar
Problem Statement

• Given a set of features, determine whether an auto insurance claim is fraudulent, and develop a predictive model that does so.
Potential Business Problems

• Identifying fraudulent claims matters to any insurance company, since it does not want to pay out on claims that are not legitimate.
• In auto insurance, a fraudulent claim may arise when someone claims damages for an accident that never took place.
• Traditionally, insurance companies have relied on a claims investigation team to determine whether a claim is fraudulent.
• With growing computing power and advanced analytics, insurers are now trying to build automated solutions for this task, because the manual methods are prone to errors.
Why Solve This Problem

• Identify the circumstances that lead to fraudulent claims.
• Reduce reliance on human investigators to decide whether a claim is fraudulent.
• Correctly identify legitimate claims so that rightful policyholders are paid.
• Help determine which types of policyholders the company should avoid.
Features of the Dataset

• 1000 datapoints
• 38 features:
  • 13 numerical features
  • 25 categorical features
• One target variable: Fraud_Reported
  • Yes: 247 claims (24.7%)
  • No: 753 claims (75.3%)
• The target variable is imbalanced (roughly one fraudulent claim for every three legitimate ones), so this must be handled during modelling.
• The dataset was split into a train set of 800 datapoints and a test set of 200 datapoints.
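As a rough illustration of the 800/200 split, the sketch below keeps the fraud share the same in both parts. The deck does not say whether the split was stratified, so stratification is an assumption here, as are the helper name `stratified_split` and the `"Y"`/`"N"` label values. Only the class counts (247 and 753) come from the slide.

```python
import random
from collections import Counter

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Class counts from the slide: 247 fraudulent ("Y"), 753 legitimate ("N").
labels = ["Y"] * 247 + ["N"] * 753
train_idx, test_idx = stratified_split(labels)

fraud_share = labels.count("Y") / len(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(f"fraud share overall: {fraud_share:.1%}")
print(f"train size: {len(train_idx)}, test size: {len(test_idx)}")
print(f"test class counts: {dict(test_counts)}")
```

With scikit-learn, `train_test_split(X, y, test_size=0.2, stratify=y)` gives the same kind of split in one call.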
Evaluation Metric

• The evaluation metrics for this project are precision, recall and ROC AUC score.
• For the business use case, the costly error is a false negative (a fraudulent claim that goes undetected), so recall is given more importance.
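To make the metrics concrete, here is a minimal sketch of how precision, recall and F1 follow from confusion-matrix counts for the fraud class. The counts are hypothetical toy numbers chosen for easy arithmetic. They are not results from this project.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 frauds caught, 4 false alarms, 2 frauds missed.
precision, recall, f1 = precision_recall_f1(tp=8, fp=4, fn=2)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Recall falls only when frauds are missed (false negatives), which is why it is the metric to watch when an undetected fraudulent claim is the expensive mistake.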
Data Cleaning and Preprocessing
Changing dtypes and Dropping Columns with Unique Values

• Changed the dtype of three columns from numerical to categorical, since they are categorical by nature:
  • Number of Vehicles Involved
  • Number of Bodily Injuries
  • Number of Witnesses
• Dropped 5 columns that had too many unique values and gave no useful insight:
  • Policy Number
  • Policy Bind Date
  • Insured Zip
  • Incident Date
  • Incident Location
Imputation of Missing Values

• Three columns used ‘?’ to mark missing values:
  • Collision Type
  • Property Damage
  • Police Report Available
• Imputed the missing values using a KNNImputer (K = 5).
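A minimal sketch of KNN imputation with scikit-learn's `KNNImputer` follows. The deck does not describe how the ‘?’ entries were prepared, so this assumes the columns were first encoded as numbers with `np.nan` in place of ‘?’. The toy matrix is hypothetical, and rounding the imputed values back to integer codes is also an assumption.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical toy data: three numerically encoded columns;
# np.nan marks an entry that was '?' in the raw data.
X = np.array([
    [0.0, 1.0, 0.0],
    [1.0, np.nan, 1.0],
    [2.0, 0.0, np.nan],
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [np.nan, 1.0, 1.0],
    [2.0, 0.0, 0.0],
])

imputer = KNNImputer(n_neighbors=5)  # K = 5, as on the slide
X_filled = imputer.fit_transform(X)

# KNNImputer averages the neighbours' values, so round back to
# integer category codes (an assumption of this sketch).
X_imputed = np.rint(X_filled).astype(int)
print(X_imputed)
```

Because the imputed value is an average of neighbours, it always stays within the range of codes already present in the column.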
Label Encoding of Categorical Features

• Categorical features were label encoded before being fed into the model.
• One-hot encoding was not used because the models are tree-based.
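Label encoding simply maps each distinct category to an integer. The sketch below shows the idea in plain Python. The helper name `fit_label_encoder` and the sample column are hypothetical. scikit-learn's `LabelEncoder` assigns codes in the same sorted order.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}

# Hypothetical sample of one categorical column.
incident_severity = [
    "Minor Damage", "Major Damage", "Total Loss",
    "Minor Damage", "Trivial Damage",
]

mapping = fit_label_encoder(incident_severity)
encoded = [mapping[value] for value in incident_severity]
print(mapping)
print(encoded)
```

The integer codes imply an ordering that does not really exist. Tree-based models split on thresholds and tend to cope with this, which is the reasoning given on the slide for skipping one-hot encoding.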
Exploratory Data Analysis
Number of Witnesses Gives Some Information

• The presence of even one witness increases the chances of detecting a claim as fraudulent.
• Out of 247 fraudulent claims, 197 had at least one witness present.
Police Report Is Important

• Availability of a police report reduces the chances of a fraudulent claim, although fewer cases actually have a police report available.
• More datapoints would help establish a clearer relationship between fraudulent claims and the availability of police reports.
Occupations Reveal an Interesting Fact

• People who listed their occupation as Exec-Managers are the most likely to make a fraudulent claim.
• 37% of the claims made by this category were fraudulent.
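Figures such as the 37% above are per-category fraud rates. A minimal sketch of that calculation is shown here. The column names `occupation` and `fraud_reported`, the helper name, and the toy rows are all hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

def fraud_rate_by_category(rows, feature, target="fraud_reported"):
    """Return {category: share of claims in that category marked fraudulent}."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in rows:
        totals[row[feature]] += 1
        if row[target] == "Y":
            frauds[row[feature]] += 1
    return {category: frauds[category] / totals[category] for category in totals}

# Hypothetical toy rows; the values are illustrative only.
rows = [
    {"occupation": "exec-managerial", "fraud_reported": "Y"},
    {"occupation": "exec-managerial", "fraud_reported": "N"},
    {"occupation": "sales", "fraud_reported": "N"},
    {"occupation": "sales", "fraud_reported": "N"},
    {"occupation": "sales", "fraud_reported": "Y"},
    {"occupation": "sales", "fraud_reported": "N"},
]

rates = fraud_rate_by_category(rows, "occupation")
print(rates)
```

The same function applies to any categorical feature explored in the EDA, such as hobbies or incident severity.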
Hobbies Seem to Be an Important Question to Ask Policyholders

• People who play chess or practice CrossFit are actually more likely to make a fraudulent claim than a rightful one.
• For both hobbies, people are more than 4 times more likely to report a fraudulent claim.
Incident May Not Be So Severe

• Claims reporting Major Damage are fraudulent more often.
• This is what would be expected: people claiming for minor damage tend to add earlier damage that was not actually part of the incident for which the claim was made.
Feature Selection
Correlation Amongst Numerical Features

• The correlation heatmap shows high correlation between the following columns:
  • Age and Months as Customer
  • Total Claim Amount and its individual constituents (Injury, Property, Vehicle)
• Age and the individual constituents were dropped from the dataset.
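The heatmap step can be sketched as a search for column pairs whose Pearson correlation exceeds a cut-off. The deck does not state the cut-off used, so the 0.9 threshold below is an assumption, and the toy columns are hypothetical values made up for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def highly_correlated_pairs(columns, threshold=0.9):
    """List (col_a, col_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical toy columns, not taken from the dataset.
columns = {
    "age": [25, 32, 41, 48, 55, 60],
    "months_as_customer": [30, 110, 220, 300, 390, 450],
    "incident_hour": [3, 22, 9, 14, 1, 17],
}

pairs = highly_correlated_pairs(columns)
print(pairs)
```

For each flagged pair, one column carries most of the information of the other, so one of the two can be dropped, as was done with Age on the slide.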
Feature Importance as per RFC

• As per the RandomForestClassifier (RFC), incident severity and insured hobbies are the most important features, as the EDA also suggested. Total claim amount comes next.
• However, these three features alone do not let the model separate the classes well.
• Decided to drop features with a feature importance of less than 1%. This leaves 19 features.
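A minimal sketch of the 1% importance cut-off with scikit-learn follows. It runs on synthetic data from `make_classification`, not on the claims dataset, so the number of features it keeps says nothing about the 19 reported on the slide. The hyperparameters are also assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the claims data (not the real dataset).
X, y = make_classification(
    n_samples=1000, n_features=30, n_informative=5,
    n_redundant=2, random_state=0,
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

importances = model.feature_importances_    # non-negative, sums to 1
keep = np.flatnonzero(importances >= 0.01)  # 1% cut-off from the slide
X_selected = X[:, keep]
print(f"kept {keep.size} of {X.shape[1]} features")
```

The importances are relative shares of the forest's total impurity reduction, so a 1% threshold scales with the number of features: with 30 features the average share is about 3.3%.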
Feature Importance as per XGBC

• As per the XGBoost classifier (XGBC), insured education level and policy_csl are the most important features, followed by incident hour of the day.
• However, these three features alone do not let the model separate the classes well.
• Decided to drop features with a feature importance of less than 1%. This leaves 19 features.
Models and Approaches
Models Used

• After removing correlated features and features with unique values, the dataset has 30 features:
  • 9 numerical features
  • 21 categorical features
• The predictive models used are RandomForestClassifier and XGBoost's XGBClassifier.
• GradientBoostingClassifier was also tried, but the two models above performed better.
• RandomForestClassifier gave the best performance, so its results are shown.
Base Model Performance

• The base model performed fairly well on rightful claims, but not so well on fraudulent claims.
• This shows the need to tune the model's hyperparameters and give more weight to the underrepresented class. The base model may also be overfitting the training dataset.
Model Tuning

• Model performance improved substantially after hyperparameter tuning.
• Recall for both classes is above 80%.
• From a business point of view, recall on the fraudulent class should ideally be 1, so there is still room for improvement.
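One way to tune for recall while weighting the minority class is a grid search scored on recall. The sketch below is only an illustration on synthetic data. The deck does not list the hyperparameters or search method actually used, so the parameter grid, `cv=3`, and the use of `GridSearchCV` are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the claims data (about 25% positives).
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# Hypothetical grid; class_weight="balanced" up-weights the rarer class.
param_grid = {
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
    "class_weight": [None, "balanced"],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="recall",  # recall of the positive class
    cv=3,
)
search.fit(X_train, y_train)

test_recall = recall_score(y_test, search.predict(X_test))
print(search.best_params_)
print(f"positive-class recall on the test set: {test_recall:.2f}")
```

Limiting tree depth or raising `min_samples_leaf` also addresses the overfitting concern raised for the base model.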
Model Performance with Important Features

• Training on only the important features improved model performance slightly further.
• F1-score increased from 0.84 to 0.85.
ROC Curve and AUC

• The ROC curve shows how well the model separates the classes.
• A ROC AUC score of 0.82 shows the model has done a good job, but there is room for improvement.
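The AUC has a direct reading: it is the probability that a randomly chosen fraudulent claim gets a higher model score than a randomly chosen legitimate one. The sketch below computes it that way in plain Python. The labels and scores are hypothetical toy values, not outputs of the project's model.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, label in zip(scores, y_true) if label == 1]
    neg = [s for s, label in zip(scores, y_true) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = fraud) and predicted fraud probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.2f}")
```

Here 9 of the 12 fraud/legitimate pairs are ordered correctly, giving 0.75. A score of 0.5 would mean the ranking is no better than chance, and 1.0 would mean perfect separation.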
Insights and Business Decisions

• Claims to be targeted:
  • Claims with at least one witness
  • Insured has a hobby of chess or CrossFit
  • Insured lists their occupation as Exec-Managers
  • Insured reports a claim for major damage
• Claim amount and insured education level are also important features, as shown by the model feature importances.
• Police reports should be collected on more claims, since they can serve as an initial screen for fraudulent claims.
• Also prepared an interactive dashboard using Dash and Plotly, showing count plots for the categorical features.
Future Steps

• Time permitting, the following additional actions could have been taken:
  • Try different imputers for missing values, such as an iterative imputer, to see if any gives better performance.
  • Try other models such as LogisticRegression and SVM to see how the dataset responds to them.
  • Apply sampling techniques such as SMOTE or random oversampling to handle class imbalance.
  • Identify any unique set of features that consistently leads to fraudulent claims.
  • Use a pipeline to streamline the modelling process.
  • Add features to the dashboard and make it more user-friendly.
• More data needs to be captured, since few useful business insights can be drawn from 1000 datapoints. More data would give better results.
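Of the sampling techniques listed, random oversampling is the simplest: minority-class rows are duplicated at random until the classes are the same size. The sketch below is a plain-Python illustration. The helper name, the `claim_id` field and the 600/200 class mix are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical training set: 600 legitimate ("N") and 200 fraudulent ("Y") claims.
labels = ["N"] * 600 + ["Y"] * 200
rows = [{"claim_id": i} for i in range(len(labels))]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling should be applied to the training set only. Oversampling before the train/test split would leak duplicated rows into the test set and inflate the reported scores.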
