R Codes

The document analyzes an insurance dataset to build regression models and identify key factors that affect insurance payments and claims. It first cleans and explores the data, then builds multiple linear regression models with payment as the target variable and various factors as predictors. The models show that insured amount, claims, kilometers, and zone have significant effects on payments. Further analysis identifies that zone 4, kilometer group 2, and bonus group 7 tend to have higher claims and payments on average. A regression model is also built with claims as the target, finding that all analyzed predictors significantly impact claims rates.

Uploaded by

Nitika Dhariwal

rm(list = ls(all.names = TRUE))  # clear the workspace

mydata = read.csv("Insurance_factor_identification.csv", header = TRUE)
mydata

# Data exploration
# The data set comes from the insurance industry.

head(mydata) # shows the first 6 records of the data

dim(mydata)

# The data has 2,182 observations and 7 variables.

summary(mydata) # gives descriptive statistics for each variable

str(mydata) # displays the data type of each variable

# In the given data set, all the variables are numeric.
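The statement above can also be checked programmatically, together with a count of missing values. This is a minimal sketch on a small hypothetical data frame, `toy`, that only borrows the column names of the insurance file (the values are made up); on the real data you would pass `mydata` instead.

```r
# Hypothetical stand-in for mydata: same column names, made-up values
toy <- data.frame(
  Kilometres = c(1, 2, 3),
  Zone       = c(1, 4, 7),
  Bonus      = c(1, 5, 7),
  Make       = c(1, 2, 9),
  Insured    = c(100.5, 250.0, 40.2),
  Claims     = c(10, 30, 2),
  Payment    = c(50000, 140000, 9000)
)

# TRUE for every column whose type is numeric (integer or double)
col_is_numeric <- sapply(toy, is.numeric)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))

col_is_numeric
na_counts
```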

# The committee is interested in understanding each field of the collected data through
# descriptive analysis, to gain basic insights into the data set and to prepare for
# further analysis.

summary(mydata)

# Splitting the data into a training data set and a testing data set
# Split ratio: 70:30

library(caTools)

# Split the data, 70% for training and 30% for testing, on the basis of the dependent
# variable

set.seed(125)
sample= sample.split(mydata$Payment,SplitRatio = 0.70)
sample

#Training Data set

train_data=subset(mydata,sample==TRUE);train_data

#Testing data set

test_data = subset(mydata, sample == FALSE); test_data
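The 70:30 split can also be sketched with base R alone. Note that this swaps in a plain random split via `sample()` in place of `caTools::sample.split()`, which additionally tries to balance the values of the variable passed to it. The data frame `toy` is hypothetical and only mirrors the 2,182 rows reported above.

```r
set.seed(125)

n <- 2182  # number of rows reported for the insurance data
# Hypothetical data frame with a row id and a made-up Payment column
toy <- data.frame(id = seq_len(n), Payment = rexp(n, rate = 1e-5))

# Draw 70% of the row indices at random for training; the rest go to testing
train_idx <- sample(seq_len(n), size = round(0.70 * n))
train_toy <- toy[train_idx, ]
test_toy  <- toy[-train_idx, ]

c(train = nrow(train_toy), test = nrow(test_toy))
```

Indexing with `-train_idx` guarantees that the two sets are disjoint and together cover every row.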

# Build a multiple linear regression model on train_data

model_1 = lm(Payment ~., data=train_data)

summary(model_1)

# The p-values for model_1 are in column 4 (Pr(>|t|)) of the coefficients table.

# Except for Bonus and Make, the p-values for all the independent variables are less
# than 0.05, so all independent variables other than Bonus and Make are significant.
# Insured and Claims are the most significant variables, as their p-values are very
# small.
# The slopes are: Kilometres = 4.671e+03, Zone = 2.825e+03, Insured = 2.916e+01,
# Claims = 4.319e+03
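Reading the p-values from column 4 by eye works, but the same selection can be done programmatically. The sketch below fits a model on simulated data (`toy`, with made-up coefficients in which Bonus has no true effect on Payment), keeps the predictors with p < 0.05, and refits. With the real data, `model_1` and `train_data` would take the place of `fit` and `toy`.

```r
set.seed(1)
n <- 200

# Simulated data with made-up effects; Bonus has no true effect on Payment
toy <- data.frame(
  Kilometres = sample(1:5, n, replace = TRUE),
  Zone       = sample(1:7, n, replace = TRUE),
  Bonus      = sample(1:7, n, replace = TRUE),
  Insured    = runif(n, 10, 1000)
)
toy$Claims  <- rpois(n, toy$Insured / 20)
toy$Payment <- 1500 * toy$Kilometres + 2000 * toy$Zone + 30 * toy$Insured +
  4000 * toy$Claims + rnorm(n, sd = 5000)

fit <- lm(Payment ~ ., data = toy)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|) (column 4)
coefs <- summary(fit)$coefficients
p_values <- coefs[-1, 4]  # drop the intercept row

# Keep only predictors significant at the 5% level and refit
keep <- names(p_values)[p_values < 0.05]
fit_reduced <- lm(reformulate(keep, response = "Payment"), data = toy)

keep
summary(fit_reduced)
```

If `keep` came back empty, `reformulate()` would fail, so on real data it is worth checking its length before refitting.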

# Correlation matrix of all variables except columns 3 and 4 (Bonus and Make)
cor(mydata[,-c(3,4)])

# Insured and Claims show a strong positive correlation with Payment (the dependent
# variable), whereas Kilometres and Zone show a weak negative correlation with Payment.

# Multiple R-squared and adjusted R-squared are the same for model_1: 0.9947
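The two figures coincide because, with many observations and few predictors, the adjustment is tiny. Adjusted R-squared is defined below; the worked numbers assume roughly 1,527 training rows (70% of 2,182), the p = 6 predictors of model_1, and the rounded R-squared of 0.9947, so they are approximate.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          \approx 1 - (1 - 0.9947)\,\frac{1526}{1520}
          \approx 1 - 0.00532
          \approx 0.9947
```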

# Since Bonus and Make are not significant, we fit a new model that keeps only the
# significant variables as independent variables.

model_1a=lm(Payment~ Kilometres+Zone+Insured+Claims, data=train_data)

summary(model_1a)

# The p-values for model_1a are in column 4 (Pr(>|t|)) of the coefficients table.

# The p-values for all the independent variables are less than 0.05,
# so all the independent variables are significant.
# Insured and Claims are the most significant variables, as their p-values are very
# small.
# The slopes are: Kilometres = 4.625e+03, Zone = 2.782e+03, Insured = 2.948e+01,
# Claims = 4.310e+03

# Multiple R-squared and adjusted R-squared are the same for model_1a: 0.9947

# So model_1a is the final model, as all its independent variables are significant, i.e.,
# Kilometres, Zone, Insured and Claims all have a significant effect on Payment.
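Dropping Bonus and Make can also be checked with a single partial F-test that compares the two nested models. The sketch below uses simulated data (`toy`, made-up effects, with Payment deliberately independent of Bonus and Make); on the real data the corresponding call would be `anova(model_1a, model_1)`.

```r
set.seed(2)
n <- 300

# Simulated data; Payment deliberately does not depend on Bonus or Make
toy <- data.frame(
  Kilometres = sample(1:5, n, replace = TRUE),
  Zone       = sample(1:7, n, replace = TRUE),
  Bonus      = sample(1:7, n, replace = TRUE),
  Make       = sample(1:9, n, replace = TRUE),
  Insured    = runif(n, 10, 1000)
)
toy$Claims  <- rpois(n, toy$Insured / 20)
toy$Payment <- 1500 * toy$Kilometres + 2000 * toy$Zone + 30 * toy$Insured +
  4000 * toy$Claims + rnorm(n, sd = 5000)

full    <- lm(Payment ~ ., data = toy)
reduced <- lm(Payment ~ Kilometres + Zone + Insured + Claims, data = toy)

# Partial F-test: H0 is that the Bonus and Make coefficients are both zero.
# A large Pr(>F) means dropping them does not significantly worsen the fit.
cmp <- anova(reduced, full)
cmp
```

Unlike reading individual t-tests, this tests both dropped terms jointly, which matters when the dropped predictors are correlated with each other.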

# Visualize the results for better understanding

par(mfrow = c(1, 2))
plot(mydata$Claims, mydata$Payment, xlab = "Number of Claims", ylab = "Payments",
     main = "")

plot(mydata$Insured, mydata$Payment, xlab = "The number of insured in policy-years",
     ylab = "Payments", main = "")

# Prediction on the testing data set

predtest = predict(model_1a, newdata = test_data)
head(predtest)

#transform into data frame

pred_payment=data.frame(predtest)

# Bind the predicted values to the testing data set with cbind()

final_mydata = cbind(test_data,pred_payment); final_mydata
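The predictions are written out but not scored. A minimal sketch of two common test-set metrics, root mean squared error and out-of-sample R-squared, follows; the helper names `rmse` and `r2_test` are hypothetical, and the small vectors only illustrate the arithmetic.

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Out-of-sample R-squared: 1 - SSE / SST, with SST taken around the test-set mean
r2_test <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

actual    <- c(10, 20, 30)
predicted <- c(12, 18, 30)

rmse(actual, predicted)     # sqrt(8 / 3), about 1.633
r2_test(actual, predicted)  # 1 - 8 / 200 = 0.96

# With the objects above: rmse(test_data$Payment, predtest)
```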

#export the final file with predicted values

write.csv(final_mydata,"InsuranceFinal.csv")

# The insurance company is planning to establish a new branch office, so it is
# interested in finding at which location (zone), kilometre group, and bonus level the
# insured amount, claims, and payments increase.

library(dplyr)

# Group the data by Zone and compare the means of Insured, Claims and Payment across zones

grupzone = apply(mydata[, c(5, 6, 7)], 2, function(x) tapply(x, mydata$Zone, mean))

grupzone
# Zone 4 has the highest mean number of claims, and thus the highest mean payment as well.
# Zones 1-4 have more insured years, claims, and payments.
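The `apply()`/`tapply()` combination above computes one mean per group and per column. An equivalent, arguably more readable base R call is `aggregate()` with a formula; this sketch uses a small hypothetical data frame, `toy`. One difference to keep in mind: the formula interface of `aggregate()` drops rows with missing values by default, whereas `tapply(x, g, mean)` would return NA for a group containing one.

```r
# Hypothetical data: two zones with made-up values
toy <- data.frame(
  Zone    = c(1, 1, 2, 2, 2),
  Insured = c(10, 20, 30, 40, 50),
  Claims  = c(1, 3, 2, 4, 6),
  Payment = c(100, 300, 200, 400, 600)
)

# Approach used in the script: a groups x variables matrix of means
by_apply <- apply(toy[, c("Insured", "Claims", "Payment")], 2,
                  function(x) tapply(x, toy$Zone, mean))

# Same means as a data frame with one row per zone
by_aggregate <- aggregate(cbind(Insured, Claims, Payment) ~ Zone,
                          data = toy, FUN = mean)

by_apply
by_aggregate
```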

# Group the data by Kilometres and compare the means across kilometre groups

grupkil = apply(mydata[, c(5, 6, 7)], 2, function(x) tapply(x, mydata$Kilometres, mean))
grupkil

# Kilometre group 2 has the highest mean payment.

# Although its mean number of insured years is lower than that of kilometre group 1,
# group 2 has higher claims and payments.

# Group the data by Bonus and compare the means across bonus groups

grupbon = apply(mydata[, c(5, 6, 7)], 2, function(x) tapply(x, mydata$Bonus, mean))
grupbon

# Bonus group 7 has the highest mean number of claims and payments.
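Rather than reading the group with the largest mean off the printed table, it can be picked out per column with `which.max()`. The matrix `grp_means` below is a made-up stand-in for `grupzone`, `grupkil` or `grupbon` (group labels as row names, variables as columns).

```r
# Hypothetical table of group means: rows are groups, columns are variables
grp_means <- matrix(
  c(40, 15, 25,     # Insured
    2,  4,  3,      # Claims
    200, 400, 300), # Payment
  nrow = 3,
  dimnames = list(c("1", "2", "3"), c("Insured", "Claims", "Payment"))
)

# For each column, return the row name (group label) holding the largest mean
top_group <- apply(grp_means, 2, function(col) names(which.max(col)))
top_group

# With the objects above: apply(grupbon, 2, function(col) names(which.max(col)))
```

`which.max()` returns only the first position when there is a tie, so a tied maximum would need a separate check.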

# The committee wants to understand what affects the claim rates, in order to set the
# right premiums for particular situations. Hence, it needs to find whether the insured
# amount, zone, kilometres, bonus, or make affect the claim rates, and to what extent.

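# Claims ~ . would also use Payment as a predictor; exclude it so that the model
# matches the independent variables listed below
model_2 = lm(Claims ~ . - Payment, data = mydata)
summary(model_2)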

# Dependent variable: Claims. Independent variables: Kilometres, Zone, Bonus, Make, and
# Insured.
# The summary gives the intercept and the estimated coefficients, and shows that the
# p-values of all the independent variables (Kilometres, Zone, Bonus, Make, and Insured)
# are highly significant, i.e., they all have an impact on Claims.
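Claims is a count, and a claim rate is claims per unit of exposure (here, policy-years in Insured). A common alternative to the linear model for this question, not used in the original analysis, is a Poisson GLM with `log(Insured)` as an offset, so that exponentiated coefficients read as multiplicative effects on the claim rate. The sketch below fits one to simulated data (`toy`, with an assumed true rate). On the real data the formula would list Kilometres, Zone, Bonus and Make, and any rows with Insured equal to 0 would have to be dropped first because log(0) is -Inf.

```r
set.seed(3)
n <- 500

# Simulated exposure (policy-years) and zone; values are made up
toy <- data.frame(
  Zone    = sample(1:7, n, replace = TRUE),
  Insured = runif(n, 50, 2000)
)

# Assumed true claim rate per policy-year: exp(-2 - 0.1 * Zone)
true_rate  <- exp(-2 - 0.1 * toy$Zone)
toy$Claims <- rpois(n, lambda = toy$Insured * true_rate)

# Poisson regression for claim counts with log(exposure) as an offset,
# i.e. log(E[Claims]) = log(Insured) + b0 + b1 * Zone
rate_fit <- glm(Claims ~ Zone + offset(log(Insured)),
                family = poisson(link = "log"), data = toy)

coef(rate_fit)               # should be close to the assumed values -2 and -0.1
exp(coef(rate_fit))["Zone"]  # multiplicative change in claim rate per zone step
```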

You might also like

Analyze The Report of Swedish Motor Insurance (14 pages)
Machine Learning Assignment Report - Cars (42 pages)
STAT-2450 Assignment 1: Name:, Student ID: B00 (9 pages)
(Practical) Programming With R (5 pages)
Financial Risk Analytics: Assignment (35 pages)
R Course (7 pages)
A Short List of The Most Useful R Commands (11 pages)
R Console (6 pages)
Group 5 - Applied Statistics and Experimental 152611 (28 pages)
Midterm_Project_Group_6 (41 pages)
AMDA Practical - A048 (35 pages)
Regression (36 pages)
R All Program (10 pages)
A Short List of The Most Useful R Commands (8 pages)
R Practice (38 pages)
Final Cost Practical (29 pages)
Report (24 pages)
4027 Assignment Q5 (12 pages)
CS2B 220 Exam (4 pages)
Churn Assignment (11 pages)
COST - JournalPracticals (1-7) (22 pages)
Econometrics All R Codes Final (12 pages)
DM Assignment - Thena Bank (39 pages)
FRA Assignment - India Credit Model (14 pages)
Assignments (10 pages)
CS1B April 2024 (9 pages)
SPECIMEN EXAM SOLUTIONS - CS1B - IFoA - 2019 - Final (8 pages)
DataScience R Project Insurance (9 pages)
07exercise Solution (9 pages)
ds (2 pages)
R Poisson (11 pages)
FRA Group Assignment - Report (22 pages)
Rstudio Study Notes For PA 20181126 (6 pages)
222BDA35 Activity2 (5 pages)
ps3 Bongioanni Metrics TXT (9 pages)
Amta - Final - Notes.r: ### Step Wise AIC Regression (6 pages)
Case 4 - Tutorial 2 (20 pages)
Aditya Garg DMDW (40 pages)
LAb Test 2 (4 pages)
Descriptive Statistics in R (46 pages)
Cost Practical (13 pages)
R Practicals (32 pages)
Statistical Modelling: Regression: Choosing The Independent Variables (14 pages)
Linear Regression With LM Function, Diagnostic Plots, Interaction Term, Non-Linear Transformation of The Predictors, Qualitative Predictors (15 pages)
DSBAProject Oct 2020 (24 pages)
Lec 4 (18 pages)
Sakhil Assignment 02 (8 pages)
Praktikum Modul 3 (5 pages)
Regression Analysis Script (24 pages)
Experiment 1 (4 pages)
Linear Regression (22 pages)
Predictive Analytics: Group Assignment 2 (6 pages)
COMP2501 - Assignment - 1 - Questions - RMD 2 (7 pages)
R-Codes-1 (3 pages)
data analysis in r (10 pages)
2023 Tutorial 12 (6 pages)
HW1 2023 (4 pages)
21BCS5999 - Ankit Kumar (Assignment 2) (16 pages)
Advanced C++ Interview Questions You'll Most Likely Be Asked (Vibrant Publishers, from Everand)
Introduction To Business Statistics Through R Software: Software (Editor IJSMI, from Everand)
Assignment Kit For Program 3: Personal Software Process (PSP) For Engineers: Part I (18 pages)
George Et Al. (2001) (18 pages)
COSTE53 ConferenceWarsaw Presentations All PDF (178 pages)
Statistical Thinking for Non Statisticians in Drug Regulation 3rd Edition Richard Kay (43 pages)
Statistics in Economics and Management (592 pages)
Google Play Store Apps-Data Analysis and Ratings Prediction (10 pages)
SESI 11 - CH 13 - SIMPLE-REGRESSION - Levine - Smume6 - ppt13 (56 pages)
Internet Self-Efficacy and Interaction of Students in Mathematics Courses (21 pages)
Democratic Transitions Modes and Outcomes 1st Edition Sujian Guo (67 pages)
Ecotrix Assignment (5 pages)
Critical Path - ITC5203-Full (6 pages)
ML Algorithms (12 pages)
Choosing The Correct Statistical Test (CHS 627 - University of Alabama) (3 pages)
Carmignani 2018 (9 pages)
Credit Scoring in The Age of Big Data - A State-of-the-Art (13 pages)
The Impact of Advertising On Consumers Buying Behaviour: B.A. Chukwu, E.C. Kanu and A.N. Ezeabogu (15 pages)
The Effect of Financial Literacy and Attitude On Financial Management Behavior and Satisfaction (7 pages)
Notes For Lectures 11 To 16 - 2024 (68 pages)
Statistics Of Earth Science Data Their Distribution In Time Space And Orientation 2nd Edition Graham J Borradaile (79 pages)
Tarea Nro 9 Diego Vasquez (7 pages)
Functions and Change: A Modeling Approach to College Algebra 6th Edition Bruce Crauder (41 pages)
Energies: Machine Learning Based Photovoltaics (PV) Power Prediction Using Di Parameters of Qatar (19 pages)
Logistic Ordinal Regression (10 pages)
STAT 125 HK Business Statistics Midterm Exam (65 pages)
Artikel Erwin Bakti (9 pages)
STA 211 - Manual 1 - Agri Junction (151 pages)
Journal of Retailing and Consumer Services: Johan Anselmsson, Niklas Bondesson (13 pages)
THE_INFLUENCE_OF_PERSONAL_GROWTH_INITIATIVE_ON_STU (11 pages)
Working Conditions and Three Types of Well-Being A Longitudinal Study With Self-Report and Rating Data (14 pages)
29 Regression Ext (4 pages)
