Lab 4

Logistic Regression in R

MACC7006 Accounting Data and Analytics

Keri Hu

Faculty of Business and Economics

Today: Logistic regression in R

By the end of today’s lecture, you should be able to:

• Create training and testing sets


• Build a logistic regression model
• Evaluate the model

We will work with the dataset: Healthcare.csv

• Predict whether a patient receives poor-quality care, based on information in his/her medical claims history

Variables in the dataset
Create training and testing sets

• Training dataset: used to build model

• Testing dataset: used to test the model’s out-of-sample accuracy

• If the observations have no chronological order, we randomly assign them to the training set or the testing set.

Install and load new package

1. Install the package: install.packages("caTools")

2. Load into your current R session: library(caTools)


• When you use this package in the future, you will not need to
re-install it, but you will need to load it with the library function.

Split dataset

1. To make results that depend on random numbers reproducible:


set.seed(any number)
• Setting the same seed restores the state of the random number generator, which enables us to reuse the same set of random values across sessions.

2. Randomly group data points:


sample.split(dependent variable, SplitRatio = fraction of data in training set)
• Produces a TRUE/FALSE vector that randomly splits the data into two pieces according to the SplitRatio value (the proportion of data in the training set)

3. Split data into training set or testing set:


subset(data frame, spl==TRUE/FALSE )
• If spl is TRUE, put the corresponding observation in the training set;
if spl is FALSE, put the corresponding observation in the testing set.
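
Putting the three steps together, here is a minimal sketch. It assumes the data frame is called Healthcare (read from Healthcare.csv) with outcome column PoorCare; the seed value and the 75% split ratio are illustrative choices, not necessarily the ones used in class.

library(caTools)

Healthcare <- read.csv("Healthcare.csv")

set.seed(88)                                    # illustrative seed; any fixed number works
spl <- sample.split(Healthcare$PoorCare,        # TRUE/FALSE vector, keeps the outcome ratio in both pieces
                    SplitRatio = 0.75)          # 75% of observations go to the training set

HealthTrain <- subset(Healthcare, spl == TRUE)  # training set
HealthTest  <- subset(Healthcare, spl == FALSE) # testing set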

Build a logistic regression model

1. Change the type/class of variables if needed using as.factor(), as.numeric(), as.character(), etc.
• Here, PoorCare = "Y" means quality is poor and "N" otherwise.

2. Generalized linear model:


glm(dependent variable ~ sum of independent variables, data =
training set, family = binomial)
• Used for many different types of models
• family = binomial indicates that we are building a logistic
regression model
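
A minimal sketch, continuing from the split above; the predictor names OfficeVisits and Narcotics are placeholders for whichever variables in Healthcare.csv you want to include.

HealthTrain$PoorCare <- as.factor(HealthTrain$PoorCare)   # make sure the outcome is a factor ("N"/"Y")

QualityLog <- glm(PoorCare ~ OfficeVisits + Narcotics,    # placeholder predictors
                  data = HealthTrain, family = binomial)  # family = binomial gives logistic regression
summary(QualityLog)                                       # coefficients, significance, deviance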

Result of the model
Evaluate performance of the model

If we want to calculate accuracy on the training set with threshold 0.5:

1. Prediction for the training set:


PredictTrain <- predict(logistic model, type="response")
• The type="response" option tells R to output probabilities of the form Pr(Y = 1 | X), as opposed to other quantities such as the logit (log-odds).
• If no new data is specified within predict(), then probabilities are
computed for the training data used to fit the logistic regression.

2. Create a classification/confusion matrix for a threshold of 0.5:


table(training set$dependent variable, PredictTrain > 0.5)
• table() counts observations in each class of the variable(s).
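
A minimal sketch of both steps, continuing from the QualityLog model above:

PredictTrain <- predict(QualityLog, type = "response")        # predicted probabilities on the training set

confTrain <- table(HealthTrain$PoorCare, PredictTrain > 0.5)  # rows = actual class, columns = prediction
confTrain

sum(diag(confTrain)) / sum(confTrain)                         # training-set accuracy: correct / all observations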

Plot predictions

1. Add the vector of predictions to the data set:


training set$Predict <- PredictTrain
2. Plot predictions (about the training set)
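
A minimal sketch of one way to visualise the training-set predictions; colouring points by the actual outcome is an illustrative choice, not necessarily the plot shown in class.

HealthTrain$Predict <- PredictTrain                               # add the predictions to the training data

plot(HealthTrain$Predict,
     col  = ifelse(HealthTrain$PoorCare == "Y", "red", "blue"),   # red = actual poor care, blue = good care
     ylab = "Predicted probability of poor care")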

Example: Classification/confusion matrix

Threshold value = 0.5:

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              71                             3
Y (actual poor care)              14                            11

• The prediction is FALSE if the probability is less than (or equal to)
0.5, and TRUE if the probability is greater than 0.5.

Accuracy = (71 + 11) / [(71 + 11) + (3 + 14)] = 82.83%

• 3 false positive errors: predict poor care but actually good care
• 14 false negative errors: predict good care but actually poor care
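
The same accuracy, computed directly in R from the matrix entries:

(71 + 11) / (71 + 11 + 3 + 14)   # correct predictions / all observations = 0.8283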

Different threshold values

Threshold value = 0.3:

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              67                             7
Y (actual poor care)              12                            13

Accuracy = (67 + 13) / [(67 + 13) + (7 + 12)] = 80.81%

• 7 false positive errors: predict poor care but actually good care
• 12 false negative errors: predict good care but actually poor care

Different threshold values

Threshold value = 0.7:

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              73                             1
Y (actual poor care)              19                             6

Accuracy = (73 + 6) / [(73 + 6) + (1 + 19)] = 79.80%

• 1 false positive error: predict poor care but actually good care
• 19 false negative errors: predict good care but actually poor care

ROC curve for the training set

1. Install and load the ROCR package:


install.packages("ROCR"), library(ROCR)

2. Generate an ROC curve:


2.1 Create a prediction object that the ROCR package can understand:
ROCRpred <- prediction(PredictTrain, training set$dependent variable)
2.2 Calculate performance metrics for the ROC curve:
ROCCurve <- performance(ROCRpred, "tpr", "fpr")
• "tpr": true positive rate
• "fpr": false positive rate

2.3 Plot the ROC curve:


plot(ROCCurve)
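
A minimal sketch of the full sequence, continuing from PredictTrain and HealthTrain above:

# install.packages("ROCR")                                  # only needed once
library(ROCR)

ROCRpred <- prediction(PredictTrain, HealthTrain$PoorCare)  # predicted probabilities first, then actual labels
ROCCurve <- performance(ROCRpred, "tpr", "fpr")             # true positive rate vs. false positive rate
plot(ROCCurve)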

Example: ROC curve

Where is the threshold, say 0.5, on the curve?
Add threshold labels and calculate AUC

• plot(ROCCurve, colorize=TRUE,
print.cutoffs.at=seq(0,1,0.1), text.adj=c(-0.2,0.7))

• AUC of the training set


as.numeric(performance(ROCRpred, "auc")@y.values)
[1] 0.7945946
Prediction for the test set

• We should make out-of-sample predictions.

• This can be done on the test set by adding newdata:


PredictTest = predict(logistic model, type = "response",
newdata = testing set)

Classification/confusion matrix for the test set

Threshold value = 0.5:


table(testing set$dependent variable, PredictTest > 0.5)

Example: Classification matrix for the test set

                       FALSE (predicted good care)   TRUE (predicted poor care)
N (actual good care)              23                             1
Y (actual poor care)               3                             5

• Accuracy on the test set = (23 + 5) / [(23 + 5) + (1 + 3)] = 87.5%


• 1 false positive prediction
• 3 false negative predictions
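
A minimal sketch tying the test-set steps together, continuing from QualityLog and HealthTest above:

PredictTest <- predict(QualityLog, type = "response", newdata = HealthTest)  # out-of-sample probabilities

confTest <- table(HealthTest$PoorCare, PredictTest > 0.5)   # confusion matrix at threshold 0.5
confTest

sum(diag(confTest)) / sum(confTest)                         # test-set accuracy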

ROC curve and AUC of the test set

• Plot ROC curve


• ROCRpredtest = prediction(PredictTest, testing set$dependent variable)
• ROCCurvetest = performance(ROCRpredtest, "tpr", "fpr")
• plot(ROCCurvetest, colorize=TRUE,
print.cutoffs.at=seq(0,1,0.1), text.adj=c(-0.2,0.7))

• AUC of the test set


as.numeric(performance(ROCRpredtest, "auc")@y.values)
[1] 0.875

