Chenhao_HW1
2024-05-20
Week 1 Homework submission
First, I import the credit card data and convert it into a matrix.
library(kernlab)
library(kknn)
head(cc_data)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## 1 1 30.83 0.000 1.25 1 0 1 1 202 0 1
## 2 0 58.67 4.460 3.04 1 0 6 1 43 560 1
## 3 0 24.50 0.500 1.50 1 1 0 1 280 824 1
## 4 1 27.83 1.540 3.75 1 0 5 0 100 3 1
## 5 1 20.17 5.625 1.71 1 1 0 1 120 0 1
## 6 1 32.08 4.000 2.50 1 1 0 0 360 0 1
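The loading step itself does not appear in the output above. A minimal sketch, assuming the data are whitespace-delimited with no header row. The inline text reuses the six rows printed by head; in practice it would be replaced by the path to the data file, which is not given here. The names cc_sample and cc_matrix are hypothetical.

```r
# Assumption: the data are whitespace-delimited with no header row.
# The six rows printed by head(cc_data) stand in for the real file.
raw <- "1 30.83 0.000 1.25 1 0 1 1 202 0 1
0 58.67 4.460 3.04 1 0 6 1 43 560 1
0 24.50 0.500 1.50 1 1 0 1 280 824 1
1 27.83 1.540 3.75 1 0 5 0 100 3 1
1 20.17 5.625 1.71 1 1 0 1 120 0 1
1 32.08 4.000 2.50 1 1 0 0 360 0 1"

# header = FALSE makes read.table name the columns V1..V11
cc_sample <- read.table(text = raw, header = FALSE)

# ksvm is called on a numeric matrix of predictors, so convert the data frame
cc_matrix <- as.matrix(cc_sample)

dim(cc_matrix)
```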
For 2.2.1, I largely follow the provided sample code, using the vanilladot (linear) kernel with C = 100.
# calculate a1…am
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a
## V1 V2 V3 V4 V5
## -0.0010065348 -0.0011729048 -0.0016261967 0.0030064203 1.0049405641
## V6 V7 V8 V9 V10
## -0.0028259432 0.0002600295 -0.0005349551 -0.0012283758 0.1063633995
# calculate a0
a0 <- -model@b
a0
## [1] 0.08158492
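With a linear kernel, the fitted classifier reduces to the sign of a1*x1 + ... + am*xm + a0. Below is a small sketch that evaluates that score with the coefficients printed above. The input vectors are made up for illustration and are assumed to be already scaled, on the assumption that this model was fit with scaled=TRUE as in the later ksvm call. Which sign maps to which label depends on how kernlab ordered the classes, so the sketch reports only the score.

```r
# Coefficients and intercept copied from the output above
a <- c(V1 = -0.0010065348, V2 = -0.0011729048, V3 = -0.0016261967,
       V4 = 0.0030064203, V5 = 1.0049405641, V6 = -0.0028259432,
       V7 = 0.0002600295, V8 = -0.0005349551, V9 = -0.0012283758,
       V10 = 0.1063633995)
a0 <- 0.08158492

# Linear decision score: a1*x1 + ... + am*xm + a0
decision_score <- function(x) sum(a * x) + a0

# Hypothetical scaled inputs: every feature at its mean, then the same
# point with V5 one standard deviation below its mean
x_mean <- rep(0, 10)
x_low_v5 <- replace(x_mean, 5, -1)

decision_score(x_mean)
decision_score(x_low_v5)
```

V5 carries by far the largest coefficient, followed by V10, so moving V5 alone is enough to flip the sign of the score in this example.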
## [1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [38] 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0
## [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [186] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [223] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
## [260] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## [297] 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [482] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [519] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1
## [556] 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## [593] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## [630] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# see what fraction of the model’s predictions match the actual classification
sum(pred == matrix[,11]) / nrow(matrix)
## [1] 0.8639144
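The fraction above is the share of rows where the prediction equals the label. As a small illustration on made-up vectors (not the homework data), the same quantity can be written with mean, and table shows where the mismatches fall:

```r
# Made-up labels and predictions, for illustration only
actual <- c(1, 1, 0, 0, 1, 0, 1, 0)
pred   <- c(1, 0, 0, 0, 1, 1, 1, 0)

# mean() of a logical vector is the fraction of TRUE values,
# equivalent to sum(pred == actual) / length(actual)
accuracy <- mean(pred == actual)

# Cross-tabulate predictions against labels to see the error types
confusion <- table(predicted = pred, actual = actual)

accuracy
confusion
```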
For question 2.2.3, I use the kknn classifier. I create a function that takes a value of k and returns the leave-one-out accuracy for that k.
#question 2.2.3
#use kknn to predict
#create a function to pass value of k to the model and return accuracy
choose_K <- function(j){
  #one prediction per row; kept local so each call starts from a clean vector
  knnPred <- rep(0, nrow(cc_data))
  for (i in 1:nrow(cc_data)){
    #train on every row except i, then predict row i
    model_knn <- kknn(V11~., cc_data[-i, ], cc_data[i, ], k=j, scale=TRUE)
    #round the continuous fitted value to a 0/1 class label
    knnPred[i] <- as.integer(fitted(model_knn)+0.5)
  }
  accuracy <- sum(knnPred == cc_data[,11])/nrow(cc_data)
  return(accuracy)
}
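To make the leave-one-out loop concrete without depending on the homework data, here is a sketch using a hand-rolled, unweighted k-nearest-neighbour vote in base R on synthetic data. It is a stand-in for kknn, which additionally weights neighbours by a kernel and scales the features; the helper names and the data are hypothetical.

```r
set.seed(1)

# Synthetic two-class data: two well-separated clusters (illustration only)
n <- 20
toy <- data.frame(
  x1 = c(rnorm(n, mean = 0), rnorm(n, mean = 10)),
  x2 = c(rnorm(n, mean = 0), rnorm(n, mean = 10)),
  y  = rep(c(0, 1), each = n)
)

# Predict one held-out row from the remaining rows by unweighted kNN
knn_predict_one <- function(train, test_row, k) {
  features <- c("x1", "x2")
  # Euclidean distance from the held-out point to every training point
  diffs <- sweep(as.matrix(train[, features]), 2,
                 unlist(test_row[, features]))
  d <- sqrt(rowSums(diffs^2))
  nearest <- order(d)[seq_len(k)]
  # Average the neighbours' 0/1 labels and round at 0.5
  as.integer(mean(train$y[nearest]) + 0.5)
}

# Leave-one-out accuracy for a given k
loo_accuracy <- function(data, k) {
  preds <- vapply(seq_len(nrow(data)), function(i) {
    knn_predict_one(data[-i, ], data[i, ], k)
  }, integer(1))
  mean(preds == data$y)
}

# Compare a few values of k
accs <- sapply(c(1, 3, 5, 7), function(k) loo_accuracy(toy, k))
accs
```

Calling the function over a range of k values, as in the last lines, is the same pattern that would be used with choose_K to pick a k.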
For question 3.1, I choose a larger kmax and use train.kknn, which performs leave-one-out cross-validation over the candidate values of k.
#question 3.1.a
#this time we try k values from 1 to 80
#train kknn via leave-one-out cross-validation
k_cv <- 80
kcvmodel <- train.kknn(V11 ~ ., data = cc_data, kmax = k_cv, kernel = "optimal", scale = TRUE)
kcvmodel
##
## Call:
## train.kknn(formula = V11 ~ ., data = cc_data, kmax = k_cv, kernel = "optimal", scale = TRUE)
##
## Type of response variable: continuous
## minimal mean absolute error: 0.1850153
## Minimal mean squared error: 0.1073792
## Best kernel: optimal
## Best k: 58
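The summary above reports a continuous response, so train.kknn scored each k by mean absolute and mean squared error rather than by misclassification rate. One option, sketched here on synthetic data rather than the homework file, is to pass the response as a factor so the search is scored on classification error. The data frame and its column names are hypothetical; train.kknn and the best.parameters and MISCLASS elements of its result come from the kknn package.

```r
library(kknn)

set.seed(1)

# Synthetic two-class data (illustration only, not the credit card data)
n <- 50
toy <- data.frame(
  x1 = c(rnorm(n, mean = 0), rnorm(n, mean = 2)),
  x2 = c(rnorm(n, mean = 0), rnorm(n, mean = 2)),
  y  = factor(rep(c(0, 1), each = n))
)

# With a factor response, train.kknn treats the task as classification
# and runs leave-one-out cross-validation for k = 1..kmax
fit <- train.kknn(y ~ ., data = toy, kmax = 15,
                  kernel = "optimal", scale = TRUE)

best_k <- fit$best.parameters$k    # k with the lowest leave-one-out error
best_err <- min(fit$MISCLASS)      # that lowest misclassification rate

best_k
best_err
```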
Lastly, for question 3.1.b, I use sample to create a 60/20/20 split of the credit card data, train the same ksvm model as in 2.2.1 on the training set, and then use it to predict on the validation set.
#question 3.1.b
set.seed(6)
#assign each row to train, validation, or test (roughly 60/20/20) using sample
key <- sample(seq(1, 3), size = nrow(cc_data), replace = TRUE, prob = c(0.6, 0.2, 0.2))
train <- cc_data[key == 1,]
validation <- cc_data[key == 2,]
test <- cc_data[key == 3,]
set.seed(6)
#train on the training data set with C=100
ksvm_model <- ksvm(as.matrix(train[,1:10]),
train[,11],
type="C-svc",
kernel="vanilladot",
C=100,
scaled=TRUE)
## [1] 0.8671875
## [1] 0.9206349
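Because sample draws a label for each row independently, the split used here is 60/20/20 only on average. If exact proportions are wanted, one alternative (not what the code above does) is to shuffle the row indices once and cut them at fixed positions. A sketch on a stand-in data frame; the row count of 654 matches the length of the prediction vector printed earlier, and the variable names are hypothetical.

```r
set.seed(6)

# Stand-in for cc_data: only the row count matters for this sketch
n <- 654
dat <- data.frame(id = seq_len(n))

# Shuffle the row indices once, then cut at 60% and 80%
idx <- sample(n)
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)

train_idx <- idx[seq_len(n_train)]
valid_idx <- idx[n_train + seq_len(n_valid)]
test_idx  <- idx[(n_train + n_valid + 1):n]

train_set <- dat[train_idx, , drop = FALSE]
valid_set <- dat[valid_idx, , drop = FALSE]
test_set  <- dat[test_idx, , drop = FALSE]

# Sizes of the three disjoint subsets
c(train = nrow(train_set), validation = nrow(valid_set), test = nrow(test_set))
```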
Including Plots
Below is the plot for 2.2.3.
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that
generated the plot.