
Application of Linear Algebra

This document is an R Markdown report detailing the analysis of Olympic decathlon scores from 2012-2020 using PCA and k-means clustering. The analysis shows that using PCA to reduce dimensionality before clustering improves the classification of data into three distinct classes. The results indicate that the third class is more accurately defined after applying PCA compared to the original dataset.


HW3

Freeman Chen

11/16/2021

R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring
HTML, PDF, and MS Word documents. For more details on using R Markdown see
http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as
well as the output of any embedded R code chunks within the document. You can embed an
R code chunk like this:
summary(cars)

##      speed           dist
##  Min.   : 4.0   Min.   :  2.00
##  1st Qu.:12.0   1st Qu.: 26.00
##  Median :15.0   Median : 36.00
##  Mean   :15.4   Mean   : 42.98
##  3rd Qu.:19.0   3rd Qu.: 56.00
##  Max.   :25.0   Max.   :120.00

library(devtools)

## Loading required package: usethis

#install_github("vqv/ggbiplot")
library(ggbiplot)

## Loading required package: ggplot2

## Loading required package: plyr

## Loading required package: scales

## Loading required package: grid

#install.packages("factoextra")
library(factoextra)

## Welcome! Want to learn more? See two factoextra-related books at
## https://goo.gl/ve3WBa

data <- read.csv("Olympic_Dec.csv")


head(data)
## X100M LongJump ShotPut HighJump X400M X110M DiscusThrow PoleVault
## 1 1011 1068 769 850 963 1032 716 972
## 2 994 942 807 794 904 1035 834 849
## 3 801 940 759 906 859 917 782 819
## 4 850 970 819 850 853 863 835 849
## 5 980 945 712 850 899 926 785 819
## 6 940 864 782 714 906 989 852 880
## JavelinThrow X1500M
## 1 767 721
## 2 838 674
## 3 996 744
## 4 763 795
## 5 780 746
## 6 698 695

## Perform PCA (centering and scaling the variables)
pca.fit <- prcomp(data, center = TRUE, scale=TRUE)
summary(pca.fit)

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6    PC7
## Standard deviation     1.8102 1.3196 1.1844 0.95432 0.88665 0.82298 0.6760
## Proportion of Variance 0.3277 0.1741 0.1403 0.09107 0.07861 0.06773 0.0457
## Cumulative Proportion  0.3277 0.5018 0.6421 0.73314 0.81176 0.87949 0.9252
##                            PC8     PC9    PC10
## Standard deviation     0.54742 0.49014 0.45629
## Proportion of Variance 0.02997 0.02402 0.02082
## Cumulative Proportion  0.95516 0.97918 1.00000

plot(pca.fit$sdev^2, main="SCREE Diagram", type="l")


#From
the SCREE Diagram and the cumulative proportions of variances, we can observe that only
about the first 6 components may be sufficient to explain the variation in the original
dataset
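The cutoff can also be checked programmatically. A minimal sketch, assuming `pca.fit` from the `prcomp()` chunk above; the 85% threshold is an illustrative choice, not part of the original analysis:

```r
# Proportion of variance explained by each principal component
var.explained <- pca.fit$sdev^2 / sum(pca.fit$sdev^2)
cum.var <- cumsum(var.explained)

# Smallest number of components reaching an (assumed) 85% threshold;
# with the cumulative proportions printed above, this gives k = 6
k <- which(cum.var >= 0.85)[1]
k
```

Because `prcomp()` was called with `scale=TRUE`, `pca.fit$sdev^2` sums to the number of variables, so dividing by the sum gives the same proportions as `summary(pca.fit)`.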
# Compute k-means with k = 3
set.seed(123)
km.res <- kmeans(data, 3, nstart = 25)
# Print the results
print(km.res)

## K-means clustering with 3 clusters of sizes 36, 32, 1
##
## Cluster means:
##      X100M LongJump  ShotPut HighJump    X400M    X110M DiscusThrow PoleVault
## 1 917.3611 936.0556 757.4167 838.0833 894.5833 935.1667    762.7222  884.6944
## 2 858.5312 825.1250 753.0938 774.3438 826.6250 873.9375    764.5312  825.3438
## 3 825.0000 847.0000   0.0000 925.0000 765.0000 869.0000    618.0000  941.0000
##   JavelinThrow   X1500M
## 1     779.3056 728.8333
## 2     707.9688 649.5938
## 3     746.0000 634.0000
##
## Clustering vector:
##  [1] 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 2 1 2 2 2 2 1 2 2 2 2 3 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 2 2 2 2
##
## Within cluster sum of squares by cluster:
## [1] 1433565 1346603       0
##  (between_SS / total_SS =  32.7 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

# graph the cluster

fviz_cluster(km.res, data = data)

## Cluster after PCA, using the first 6 principal components


km.res.pca <- kmeans(pca.fit$x[,1:6], 3, nstart = 25)
# Print the results
print(km.res.pca)

## K-means clustering with 3 clusters of sizes 32, 24, 13
##
## Cluster means:
##         PC1        PC2        PC3         PC4          PC5         PC6
## 1 -1.407563  0.2612312  0.3442238 -0.13098381  0.006352374  0.09804290
## 2  1.686239  0.7046040 -0.2202246 -0.04617983 -0.177651106  0.04757325
## 3  0.351714 -1.9438381 -0.4407515  0.40767675  0.312334659 -0.32916390
##
## Clustering vector:
##  [1] 1 1 1 1 1 1 1 1 1 2 3 2 3 2 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 3 1 1 3 1
## [39] 3 3 2 2 3 2 3 2 2 3 2 3 1 1 1 1 1 1 1 1 1 1 1 1 2 1 3 2 2 2 2
##
## Within cluster sum of squares by cluster:
## [1] 156.8503 117.7800 112.9131
##  (between_SS / total_SS =  35.2 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

# graph the cluster

fviz_cluster(km.res.pca, data = data)


ggbiplot(pca.fit,ellipse=TRUE,choices=c(1,2))
# I use the Olympic decathlon scores from 2012-2020 as the dataset. First I
# perform k-means clustering to make 3 classes for the dataset; from the graph
# we can see that the 3rd class is not classified well on the original
# dataset. I then decided to use PCA to lower the dimension of the dataset
# before clustering it. Compared to other methods, I think PCA is the best
# choice here because all the variables in this dataset are integer scores, so
# we do not have to normalize the columns. After applying PCA, I pick the
# first 6 components and use k-means clustering to define 3 classes; from the
# graph we can see that the 3rd class is defined much better than in the
# original clustering.
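The visual comparison can be backed with a number already present in the two printouts. A minimal sketch, assuming `km.res` and `km.res.pca` from the chunks above: `kmeans()` returns `betweenss` and `totss`, and their ratio is the between_SS / total_SS percentage shown in each output (32.7% on the raw data, 35.2% after PCA).

```r
# Percentage of total variation captured between clusters
# (higher means more separated clusters)
bss.ratio <- function(km) 100 * km$betweenss / km$totss

bss.ratio(km.res)      # 32.7 for clustering on the raw data
bss.ratio(km.res.pca)  # 35.2 for clustering on the first 6 PCs
```

The increase after PCA is consistent with the cleaner separation of the 3rd class seen in the `fviz_cluster` plots.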
