0% found this document useful (0 votes)

24 views

R Assignment

The document discusses building machine learning models to predict credit card default using different classification techniques. Three models were built: [1] A decision tree model achieved 80% precision and 94.6% recall for non-defaults, and 80% precision and 27.6% recall for defaults. [2] A random forest model achieved similar results to the decision tree with 80% precision and 93.5% recall for non-defaults, and 79.6% precision and 28.1% recall for defaults. [3] A logistic regression identified some features as not statistically significant and backward selection was used to remove them from the model.

Uploaded by

Tuna

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

24 views

R Assignment

Uploaded by

Tuna

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

# Name: Mohamed Reda Ibrahim

# Code: NC6NVD

#### What types of machine learning models perform best on this dataset? Build at least two types of models.
###------------------------- The answer---------------------------------------------

the machine learning model for such problem is that this problem is Classification problem- Supervised learning, and there are
many methods for Classification to solve such a problem such as :

# 1- Logistic Regression -- 2- Decision tree Classifier 3- Random Forest Classifier or we can use K Nearest Neighbors as a fourth
techniques
## i built three model to solve such a problem where i used the Decision tree model, the Random forest model and Finally the
logistic model

##The Steps for Solving and building up models:

Before we Explain the Codes, I First start by uploading the necessary packages to facilitate my work, then I start to upload the
data , after uploading it , I made the necessary cleaning for this data , then some visualization between the features, and finally
applying the necessary models and testing the accuracy of such a models .

# 1. install RandomForest and GGally

install.packages("randomForest")

install.packages("GGally")

#2. load the necessary packages

library(ggplot2)

library(dplyr)

library(rpart)

library(rpart.plot)
library(randomForest)

library(GGally)

Data loading and cleaning

I uploaded the data , before processing I renamed the label variables default_nextMonth to be
DEFAULT

library(readr)
default_creditcard <- read_delim("D:/MSc 3 Semester/data mining/default_creditcard.csv",

";", escape_double = FALSE, trim_ws = TRUE)

View(default_creditcard)

#Renaming default_nexrMonth as Default

colnames(default_creditcard)[colnames((default_creditcard)) == "default_nextMonth"] = "DEFAULT"
# Checking the structure

str(default_creditcard)

# 3. Let's move the features to be factors

cols<-c("DEFAULT", "PAY_2", "PAY_3")

for (i in cols){
default_creditcard[,i] <- as.factor(unlist(default_creditcard[,i]))

# Now let's look at the structure of the full data set

str(default_creditcard)

The Variables DEFAULT", "PAY_2", "PAY_3 are numbers so I factored them to be Categorical as it is obvious from the above
codes, then I checked the structure of the data again to ensure that are transferred

# Missing values

colSums(is.na(default_creditcard))

# We have a lot of missing data in the LIMIT_BAL feature (2657/221521), I got 2657 in the LIMIT_BAL feature

#so I Replaced these missing values with the mean as following

default_creditcard$LIMIT_BAL [is.na(default_creditcard$LIMIT_BAL )] <- mean(default_creditcard$LIMIT_BAL ,na.rm=T)

# Replace BILL_AMT1 missing values with the mean

default_creditcard$BILL_AMT1 [is.na(default_creditcard$BILL_AMT1 )] <- mean(default_creditcard$BILL_AMT1 ,na.rm=T)

# So, no missing values now

colSums(is.na(default_creditcard))

Comment:
I checked the missing values in all features, and there was missing values in LIMIT_BAL (2657out of 221521), also, BILL-AMTI
(1), so I replaced all theses missing’s with the mean and checked again to become zero missing’s

######## Some Visualizations#####------------------------------

# Plot Density for Client categorized on Education

> ggplot(default_creditcard, aes(x=EDUCATION, fill = EDUCATION)) +

+ geom_density(aes(fill=factor(EDUCATION)), alpha=0.8) + + labs(title="Density Plot", + subtitle="Clients Grouped by

Education", + caption="Source: UCI Credit Card",+ x="Education", + fill="Education")

# let's look at the relationship between sex and the relationship between sex and DEFAULT:
DEFAULT:
ggplot(data=default_creditcard,

aes(x=SEX,fill= DEFAULT)) +
geom_bar()

# females had a much greater chance of

defaulting rather than male

# The relationship between DEFAULT and Age:

ggplot(data = default_creditcard,
aes(x=AGE,fill=DEFAULT)) + geom_histogram(binwidth
=3)

# whose more than 20 years old are expecting to see

high default than the other, while 60 years old and
more will show no default
# The relationship between DEFAULT and EDUCATION: # The relationship between DEFAULT and EDUCATION:

ggplot(data = default_creditcard,

aes(x=EDUCATION ,fill=DEFAULT)) +
geom_bar(position="fill") + ylab("Frequency")

#student in High schools and university are the most

exposure to default than the other and graduate

# The relationship between DEFAULT and MARITAL The relationship between DEFAULT and MARITAL Status:
Status:

ggplot(data = default_creditcard,

aes(x=MARRIAGE ,fill=DEFAULT)) +
geom_bar(position="fill") + ylab("Frequency")

# there is non significance conclusion between the

married and single almost the same percentage in case
of default.

------------------------------------------### Predictions with simple machine learning models## ---------

# 4. The train set with the variables

train_features<- default_creditcard[,c("DEFAULT",
"LIMIT_BAL","SEX","EDUCATION","PAY_2","PAY_3","BILL_AMT1","PAY_AMT1", "PAY_AMT2","PAY_AMT3","PAY_AMT4")]
set.seed(2019)
defaultdatasample <-sample(1:nrow(train_features)[1],10000)
train<-train_features[defaultdatasample,] # The train set of the model
test<-train_features[-defaultdatasample,] # The test set of the model
train
test

Comment:
Before I start modeling , I classified the data in to two sub categories, the first part is the training part which Is consist of
10000 observation as a sample randomly talking, and the remaining are the testing data to test the accuracy of the model
as the above codes indicating, then we will start to deploy different methods as following
# A- let’s try a decision tree model to predict the default : #The Results of the codes:

# (Gini index is the default cutting function) pred.train.dt 0 1

model_dt<- rpart(DEFAULT ~.,data=train, method="class") 0 8633 2186
rpart.plot(model_dt) 1 496 837
pred.train.dt <- predict(model_dt,test,type = "class") > precision_dt<- Confmatrix[1,1]/(sum(Confmatrix[1,]))
Confmatrix<-table(pred.train.dt,test$DEFAULT) > recall_dt<- Confmatrix[1,1]/(sum(Confmatrix[,1]))
Confmatrix > precision_dt
precision_dt<- Confmatrix[1,1]/(sum(Confmatrix[1,])) [1] 0.7979481
recall_dt<- Confmatrix[1,1]/(sum(Confmatrix[,1])) > recall_dt
precision_dt [1] 0.9456677
recall_dt > presicion_dt<-Confmatrix[2,2]/(sum(Confmatrix[2,]))
# 80% precision and 94.6% recall for the non default > recall_dt<- Confmatrix[2,2]/(sum(Confmatrix[,2]))

> precision_dt
presicion_dt<-Confmatrix[2,2]/(sum(Confmatrix[2,])) [1] 0.7979481
recall_dt<- Confmatrix[2,2]/(sum(Confmatrix[,2])) > recall_dt
precision_dt [1] 0.2768773
recall_dt

# 80% precision and 27.6% recall for the default

#B- Let's try to predict DEFAULT using a random forest.

model_rf<-randomForest(DEFAULT~.,data=train)

model_rf

# Let's look at the error as a function of the number of trees

plot(model_rf)

# taiking about 200 trees

model_rf<-randomForest(DEFAULT~.,data=train,ntree=200)

pred.train.rf <- predict(model_rf,test)

t3<-table(pred.train.rf,test$DEFAULT)
t3
precision_rf<- t3[1,1]/(sum(t3[1,]))
recall_rf<- t3[1,1]/(sum(t3[,1]))
precision_rf
recall_rf
# 80% percision and 93.5% recall for the Non default--> almost the same as the tree!!

presicion_rf<- t3[2,2]/(sum(t3[2,]))
presicion_rf<- t3[2,2]/(sum(t3[2,]))
recall_rf<- t3[2,2]/(sum(t3[,2]))
precision_rf
recall_rf

# 79.6% percision and 28.1% recall for the default--> abit better for recall

The Results of codes :

Call:
randomForest(formula = DEFAULT ~ ., data = train)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3

OOB estimate of error rate: 21.52%

Confusion matrix:
0 1 class.error
0 7172 4 3 0.0618705
1 1679 676 0.7129512
> # Let's look at the error as a function of the number of trees
> plot(model_rf)
> # taiking about 200 trees
> model_rf<-randomForest(DEFAULT~.,data=train,ntree=200)
> pred.train.rf <- predict(model_rf,test)
> t3<-table(pred.train.rf,test$DEFAULT)

> t3

pred.train.rf 0 1
0 8507 2172
1 622 851
> precision_rf<- t3[1,1]/(sum(t3[1,]))
> recall_rf<- t3[1,1]/(sum(t3[,1]))

> precision_rf

[1] 0.7966102

> recall_rf

[1] 0.9318655

> presicion_rf<- t3[2,2]/(sum(t3[2,]))

> recall_rf<- t3[2,2]/(sum(t3[,2]))

> precision_rf

[1] 0.7966102

> recall_rf

[1] 0.2815084
# C- Let's try to run a logistic regression (glm (generalized The Results of codes :
linear model)) Call:
logit <- glm(DEFAULT glm(formula = DEFAULT ~ ., family = binomial(link = "logit"),
~.,family=binomial(link='logit'),data=train)
summary(logit) data = train)
# We can see EDUCATION and other features are not
statisticaly significant at 5%. Deviance Residuals:
Min 1Q Median 3Q Max
# Let's try to run a simple backward feature selection in -1.8037 -0.6716 -0.5737 -0.3661 2.6094
order to remove not significant features
Coefficients:
backwards = step(logit, direction = "backward") Estimate Std. Error z value Pr(>|z|)
# Let's look at the prediction of this model on the test set: (Intercept) -9.028e-01 1.014e-01 -8.899 < 2e-16 ***
pred.test <- predict(backwards,test) LIMIT_BAL -1.572e-06 3.169e-07 -4.959 7.09e-07 ***
SEXmale 1.624e-01 5.153e-02 3.151 0.001628 **
pred.test <- ifelse(pred.test > 0.5,1,0) EDUCATIONhigh school 7.989e-02 7.586e-02 1.053 0.292292
EDUCATIONother -1.322e-01 4.952e-01 -0.267 0.789492
t1<-table(pred.test,test$DEFAULT) EDUCATIONuniversity 4.964e-02 5.973e-02 0.831 0.405862
PAY_2-1 -4.336e-01 1.466e-01 -2.957 0.003107 **
t1 PAY_20 -6.630e-01 1.673e-01 -3.962 7.43e-05 ***
PAY_21 3.495e-01 8.752e-01 0.399 0.689674
PAY_22 6.442e-01 1.710e-01 3.767 0.000165 ***
# Presicion and recall of the model for the deaths PAY_23 3.007e-01 2.775e-01 1.084 0.278529
PAY_3-1 2.743e-01 1.451e-01 1.890 0.058695 .
precision<- t1[1,1]/(sum(t1[1,]))
PAY_30 2.277e-01 1.618e-01 1.408 0.159265
recall<- t1[1,1]/(sum(t1[,1])) PAY_31 -9.452e+00 1.195e+02 -0.079 0.936941
PAY_32 9.531e-01 1.669e-01 5.712 1.11e-08 ***
precision
PAY_33 1.482e+00 3.492e-01 4.243 2.20e-05 ***
recall BILL_AMT1 3.702e-06 8.609e-07 4.301 1.70e-05 ***
PAY_AMT1 -6.184e-05 1.528e-05 -4.046 5.21e-05 ***
# 78.6% percision and 97% recall for the dead, better PAY_AMT2 -2.725e-05 1.419e-05 -1.920 0.054916 .
than before!! PAY_AMT3 -5.776e-05 1.478e-05 -3.909 9.28e-05 ***
# Presicion and recall of the model for the survivors PAY_AMT4 -4.121e-05 1.518e-05 -2.715 0.006625 **
---
presicion<- t1[2,2]/(sum(t1[2,])) Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
recall<- t1[2,2]/(sum(t1[,2]))
(Dispersion parameter for binomial family taken to be 1)
precision Null deviance: 10917 on 9999 degrees of freedom
Residual deviance: 9710 on 9979 degrees of freedom
recall
AIC: 9752
# 78.6% percision and 18% recall for the default, very low Number of Fisher Scoring iterations: 9
than the other model After deleting the insignificance features , we will get the
following features:
Df Deviance AIC
<none> 9711.3 9747.3
- PAY_AMT2 1 9715.1 9749.1
- PAY_AMT4 1 9719.0 9753.0
- SEX 1 9721.0 9755.0
- PAY_AMT3 1 9727.3 9761.3
- PAY_AMT1 1 9728.8 9762.8
- BILL_AMT1 1 9730.0 9764.0
- LIMIT_BAL 1 9739.6 9773.6
- PAY_3 5 9790.9 9816.9
Pred.test 0 1

0 8885 2639

1 244 384

> # Presicion and recall of the model for the deaths

> precision<- t1[1,1]/(sum(t1[1,]))

> recall<- t1[1,1]/(sum(t1[,1]))

> precision

[1] 0.7709997

> recall

[1] 0.973272

> # Presicion and recall of the model for the survivors

> presicion<- t1[2,2]/(sum(t1[2,]))

> recall<- t1[2,2]/(sum(t1[,2]))

> precision

[1] 0.7897

> recall

[1] 0.1870261

UNIVERSAL BANK CASE SOLUTION
No ratings yet
UNIVERSAL BANK CASE SOLUTION
9 pages
Supervised Learning
100% (1)
Supervised Learning
15 pages
ISYE6501 HW1 Kevin
No ratings yet
ISYE6501 HW1 Kevin
7 pages
FRA Assignment - India Credit Model
No ratings yet
FRA Assignment - India Credit Model
14 pages
7708 - MBA PredAnanBigDataNov21
No ratings yet
7708 - MBA PredAnanBigDataNov21
11 pages
Codes
No ratings yet
Codes
14 pages
Janani Prakash Loan Prediction Study
No ratings yet
Janani Prakash Loan Prediction Study
97 pages
ISYE 6501 Georgia Tech Hmwk3.1a
No ratings yet
ISYE 6501 Georgia Tech Hmwk3.1a
4 pages
Supervised Learning in R Classification
No ratings yet
Supervised Learning in R Classification
7 pages
BDA MSC It
No ratings yet
BDA MSC It
35 pages
ISYE6501-Homework-2
No ratings yet
ISYE6501-Homework-2
11 pages
R Codes
No ratings yet
R Codes
23 pages
DA_Programs
No ratings yet
DA_Programs
44 pages
Thera Bank
100% (1)
Thera Bank
25 pages
Solution 1
No ratings yet
Solution 1
6 pages
Handling The Dataset Using R - Word
No ratings yet
Handling The Dataset Using R - Word
54 pages
Rstudio Study Notes For PA 20181126
No ratings yet
Rstudio Study Notes For PA 20181126
6 pages
1
No ratings yet
1
19 pages
ISYE6501-Homework-1
No ratings yet
ISYE6501-Homework-1
7 pages
ML Practical 205160694034
No ratings yet
ML Practical 205160694034
33 pages
Kaggle Course Notes
No ratings yet
Kaggle Course Notes
87 pages
Machine Learning Project Checklist
No ratings yet
Machine Learning Project Checklist
30 pages
Lecture 3 - MachineLearning-CrashCourse2023
No ratings yet
Lecture 3 - MachineLearning-CrashCourse2023
99 pages
data science
No ratings yet
data science
15 pages
Machine learning with Titanic dataset tutorial
No ratings yet
Machine learning with Titanic dataset tutorial
7 pages
Datamining 2
No ratings yet
Datamining 2
54 pages
hw2 - Credit
No ratings yet
hw2 - Credit
3 pages
Chenhao_HW1
No ratings yet
Chenhao_HW1
5 pages
Da Exp9,10
No ratings yet
Da Exp9,10
9 pages
Final Practical
No ratings yet
Final Practical
53 pages
Analysis Course HW2
No ratings yet
Analysis Course HW2
13 pages
Discussion 3 Supervised
No ratings yet
Discussion 3 Supervised
14 pages
Chemo Mortality Analysis
No ratings yet
Chemo Mortality Analysis
5 pages
Praktikum Modul 3
No ratings yet
Praktikum Modul 3
5 pages
Vighnesh - S Log 13
No ratings yet
Vighnesh - S Log 13
4 pages
Computer Lab 2 Block 1-3
No ratings yet
Computer Lab 2 Block 1-3
7 pages
CE802 Report
No ratings yet
CE802 Report
7 pages
Map Assign 8
No ratings yet
Map Assign 8
7 pages
4063 Final复习资料
No ratings yet
4063 Final复习资料
6 pages
7406HW02-1
No ratings yet
7406HW02-1
3 pages
Data Science in Non-Life Insurance Pricing
No ratings yet
Data Science in Non-Life Insurance Pricing
142 pages
Final Project
No ratings yet
Final Project
9 pages
Part I
No ratings yet
Part I
12 pages
A Note On R
No ratings yet
A Note On R
90 pages
Digital Assignment-6: Read The Data
No ratings yet
Digital Assignment-6: Read The Data
30 pages
G 203008076 - 4 - Christhian Quiñonez - Ex1 - 2 A PDF
No ratings yet
G 203008076 - 4 - Christhian Quiñonez - Ex1 - 2 A PDF
20 pages
Articles Xgboost Classification With Smote-Enn Algorithm
No ratings yet
Articles Xgboost Classification With Smote-Enn Algorithm
11 pages
Machine Learning Model Building
No ratings yet
Machine Learning Model Building
6 pages
Assignment 11-17-15: Michael Petzold November 19, 2015
No ratings yet
Assignment 11-17-15: Michael Petzold November 19, 2015
4 pages
ML File - Merged
No ratings yet
ML File - Merged
24 pages
Sakhil Assignment 02
No ratings yet
Sakhil Assignment 02
8 pages
Final Report (1)
No ratings yet
Final Report (1)
17 pages
Company Bankruptcy Detection PDF
No ratings yet
Company Bankruptcy Detection PDF
34 pages
KNN - Model: Train Test CL K
No ratings yet
KNN - Model: Train Test CL K
2 pages
Tree-Based-Methods
No ratings yet
Tree-Based-Methods
21 pages
Machine learning lab manual
No ratings yet
Machine learning lab manual
9 pages
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Amazing Java: Learn Java Quickly
From Everand
Amazing Java: Learn Java Quickly
Andrei Besedin
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
3/5 (1)
IT Manager (2022 version)
No ratings yet
IT Manager (2022 version)
3 pages
Oc60-E3, Oc95e3
No ratings yet
Oc60-E3, Oc95e3
108 pages
Resin Transfer Molding Process A Numerical
No ratings yet
Resin Transfer Molding Process A Numerical
12 pages
GM Counter
No ratings yet
GM Counter
4 pages
SEO Audit
No ratings yet
SEO Audit
16 pages
MARK1234 - Final Submission - 001281872
No ratings yet
MARK1234 - Final Submission - 001281872
24 pages
Assignment Chapter 2 Classification and Types of Research
No ratings yet
Assignment Chapter 2 Classification and Types of Research
4 pages
MLP405 Topic 2.1 Undue Influence and Unconscientious Dealing Study Guide
No ratings yet
MLP405 Topic 2.1 Undue Influence and Unconscientious Dealing Study Guide
19 pages
1 Fundamental of DC Machines
No ratings yet
1 Fundamental of DC Machines
28 pages
Principle of Accounting Assignment NUST
No ratings yet
Principle of Accounting Assignment NUST
2 pages
Dell Xps 13 - Dad13cmbag0 Quanta d13c d13c - QT - 0927 Schematic - A Rev
No ratings yet
Dell Xps 13 - Dad13cmbag0 Quanta d13c d13c - QT - 0927 Schematic - A Rev
44 pages
UTD 2000 Oscilloscope
No ratings yet
UTD 2000 Oscilloscope
23 pages
Pages 219 - 225
No ratings yet
Pages 219 - 225
8 pages
Lecture 9
No ratings yet
Lecture 9
33 pages
Flex Catalogue of Tailwind Css Compiled by Muhammad Rizwan - Compressed
No ratings yet
Flex Catalogue of Tailwind Css Compiled by Muhammad Rizwan - Compressed
88 pages
Contract
No ratings yet
Contract
82 pages
BUSINESS STATISTICS: Simple Linear Regression and Correlation
No ratings yet
BUSINESS STATISTICS: Simple Linear Regression and Correlation
55 pages
ELEC2911 Tutorial 3 Questions
No ratings yet
ELEC2911 Tutorial 3 Questions
2 pages
VCCT in Ansys
No ratings yet
VCCT in Ansys
13 pages
SEMINAR REPORT Railway Braking
No ratings yet
SEMINAR REPORT Railway Braking
21 pages
Scholar Search: A Simple Polysilicon Thin-Film Transistor SPICE Model
No ratings yet
Scholar Search: A Simple Polysilicon Thin-Film Transistor SPICE Model
5 pages
企業環境資訊揭露能否降低銀行貸款成本：來自台灣證券交易所上市公司的證據
No ratings yet
企業環境資訊揭露能否降低銀行貸款成本：來自台灣證券交易所上市公司的證據
44 pages
Grant
100% (2)
Grant
1 page
25lalitpur WardLevel
No ratings yet
25lalitpur WardLevel
23 pages
Airbus 49 A300 A310 Auxiliary Power Unit APU PDF
100% (1)
Airbus 49 A300 A310 Auxiliary Power Unit APU PDF
141 pages
Initial Information Report (Iir) : I. Internship Program Details
No ratings yet
Initial Information Report (Iir) : I. Internship Program Details
2 pages
SAP B1 9.0 FA Processes
100% (1)
SAP B1 9.0 FA Processes
40 pages
Mobile TV Seminar
0% (1)
Mobile TV Seminar
24 pages
Rem 321 Condo Reviewer
100% (3)
Rem 321 Condo Reviewer
14 pages
Joshua Pearson: Carroll University Student Athlete
No ratings yet
Joshua Pearson: Carroll University Student Athlete
2 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

R Assignment

Uploaded by

R Assignment

Uploaded by

# Name: Mohamed Reda Ibrahim

##The Steps for Solving and building up models:

# 1. install RandomForest and GGally

#2. load the necessary packages

Data loading and cleaning

";", escape_double = FALSE, trim_ws = TRUE)

#Renaming default_nexrMonth as Default

# 3. Let's move the features to be factors

cols<-c("DEFAULT", "PAY_2", "PAY_3")

# Now let's look at the structure of the full data set

#so I Replaced these missing values with the mean as following

default_creditcard$LIMIT_BAL [is.na(default_creditcard$LIMIT_BAL )] <- mean(default_creditcard$LIMIT_BAL ,na.rm=T)

# Replace BILL_AMT1 missing values with the mean

default_creditcard$BILL_AMT1 [is.na(default_creditcard$BILL_AMT1 )] <- mean(default_creditcard$BILL_AMT1 ,na.rm=T)

# So, no missing values now

######## Some Visualizations#####------------------------------

> ggplot(default_creditcard, aes(x=EDUCATION, fill = EDUCATION)) +

+ geom_density(aes(fill=factor(EDUCATION)), alpha=0.8) + + labs(title="Density Plot", + subtitle="Clients Grouped by

# females had a much greater chance of

# The relationship between DEFAULT and Age:

# whose more than 20 years old are expecting to see

#student in High schools and university are the most

# there is non significance conclusion between the

------------------------------------------### Predictions with simple machine learning models## ---------

# 4. The train set with the variables

# (Gini index is the default cutting function) pred.train.dt 0 1

# 80% precision and 27.6% recall for the default

#B- Let's try to predict DEFAULT using a random forest.

# Let's look at the error as a function of the number of trees

# taiking about 200 trees

pred.train.rf <- predict(model_rf,test)

The Results of codes :

OOB estimate of error rate: 21.52%

> presicion_rf<- t3[2,2]/(sum(t3[2,]))

> recall_rf<- t3[2,2]/(sum(t3[,2]))

> # Presicion and recall of the model for the deaths

> precision<- t1[1,1]/(sum(t1[1,]))

> recall<- t1[1,1]/(sum(t1[,1]))

> # Presicion and recall of the model for the survivors

> presicion<- t1[2,2]/(sum(t1[2,]))

> recall<- t1[2,2]/(sum(t1[,2]))

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.