
Analysis Course HW1

Uploaded by 12joe
Copyright © All Rights Reserved

HW2

5/21/2020

Question 4.1
One everyday situation where clustering would be appropriate is grouping students by academic performance, using predictors such as:
- Amount of time studied
- Average GPA
- Student-teacher interaction
- Class size
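A minimal sketch of the student example, using simulated data: the four columns, their units, and the two well-separated groups are hypothetical and exist only to illustrate the mechanics. Because these predictors live on very different scales (GPA runs from 0 to 4, class size can be in the hundreds), the sketch standardizes them with `scale()` before calling `kmeans()`, so that no single predictor dominates the Euclidean distances.

```r
# Hypothetical, simulated student data (not real observations)
set.seed(1)
n_per_group <- 20
students <- data.frame(
  hours_studied = c(runif(n_per_group, 1, 5),     runif(n_per_group, 10, 20)),  # hours per week
  gpa           = c(runif(n_per_group, 2.0, 2.8), runif(n_per_group, 3.3, 4.0)),
  interaction   = c(runif(n_per_group, 0, 2),     runif(n_per_group, 3, 8)),    # assumed unit: office-hour visits per month
  class_size    = round(c(runif(n_per_group, 80, 200), runif(n_per_group, 15, 40)))
)

# Standardize each column (mean 0, sd 1) so class size does not swamp GPA
students_scaled <- scale(students)

# k-means with several random starts; K = 2 only because two groups were simulated
km_students <- kmeans(students_scaled, centers = 2, nstart = 10)
print(table(km_students$cluster))
```

Without `scale()`, class size alone would largely determine the clusters.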

Question 4.2

library(stats)    # kmeans() (attached by default in R)
library(cluster)  # clusplot()
library(fpc)      # not used in the code shown below

iris_response <- read.csv("iris.txt", sep="") #includes response column


iris <- iris_response[,1:4] #removes the response values from the data set
head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
## 6          5.4         3.9          1.7         0.4

tail(iris)

##     Sepal.Length Sepal.Width Petal.Length Petal.Width
## 145          6.7         3.3          5.7         2.5
## 146          6.7         3.0          5.2         2.3
## 147          6.3         2.5          5.0         1.9
## 148          6.5         3.0          5.2         2.0
## 149          6.2         3.4          5.4         2.3
## 150          5.9         3.0          5.1         1.8

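Generating an elbow plot to determine an ideal number of clusters for the data set.
WSS <- numeric(10)  # total within-cluster sum of squares for each K
for (K in 1:10) {
  km <- kmeans(iris,
               centers = K,
               iter.max = 100,
               nstart = 25)
  WSS[K] <- km$tot.withinss
}

plot(1:10, WSS, type = "b", xlab = "Number of clusters K", ylab = "Total within-cluster SS")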

K = 3 is used because the marginal benefit of adding more clusters (the further drop in WSS) diminishes after K = 3.
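The elbow choice can be made a little more concrete by printing how much WSS drops with each additional cluster. This is a minimal sketch that assumes R's built-in `datasets::iris` matches `iris.txt` (the `head()` and `tail()` rows printed above agree with it), and it fixes the random seed, which the original code does not do.

```r
x <- datasets::iris[, 1:4]    # the four measurement columns, no species label

set.seed(42)  # kmeans() uses random starts; fix the seed for reproducibility
WSS <- sapply(1:10, function(k) {
  kmeans(x, centers = k, iter.max = 100, nstart = 25)$tot.withinss
})

reduction <- -diff(WSS)       # WSS[K - 1] - WSS[K] for K = 2..10
print(data.frame(K = 2:10, WSS = round(WSS[-1], 2), reduction = round(reduction, 2)))
```

For K = 1 the WSS is simply the total sum of squares around the column means, which gives a quick sanity check on the loop.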

km1 <- kmeans(iris,
              centers = 3,
              iter.max = 100,
              nstart = 25)

clusplot(iris, km1$cluster, color = TRUE, lines = 0)  # lines = 0: no distance lines between clusters
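Because `iris.txt` also carries the species labels (the response column dropped above), one optional check is to cross-tabulate the K = 3 clusters against species. A sketch, again assuming `datasets::iris` matches `iris.txt`; the purity figure (share of flowers that fall in their cluster's majority species) is an added summary, not part of the original analysis.

```r
ir <- datasets::iris

set.seed(42)  # reproducible random starts
km3 <- kmeans(ir[, 1:4], centers = 3, iter.max = 100, nstart = 25)

# Rows: cluster labels (arbitrary numbering); columns: true species
tab <- table(cluster = km3$cluster, species = ir$Species)
print(tab)

# Share of flowers belonging to the majority species of their cluster
purity <- sum(apply(tab, 1, max)) / nrow(ir)
print(purity)
```

Cluster numbers are arbitrary, so the table should be read by which species dominates each row, not by the row labels.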


Question 5.1
library(outliers)  # grubbs.test()
crime <- read.table("uscrime.txt", header = TRUE)
head(crime)

## M So Ed Po1 Po2 LF M.F Pop NW U1 U2 Wealth Ineq Prob
## 1 15.1 1 9.1 5.8 5.6 0.510 95.0 33 30.1 0.108 4.1 3940 26.1 0.084602
## 2 14.3 0 11.3 10.3 9.5 0.583 101.2 13 10.2 0.096 3.6 5570 19.4 0.029599
## 3 14.2 1 8.9 4.5 4.4 0.533 96.9 18 21.9 0.094 3.3 3180 25.0 0.083401
## 4 13.6 0 12.1 14.9 14.1 0.577 99.4 157 8.0 0.102 3.9 6730 16.7 0.015801
## 5 14.1 0 12.1 10.9 10.1 0.591 98.5 18 3.0 0.091 2.0 5780 17.4 0.041399
## 6 12.1 0 11.0 11.8 11.5 0.547 96.4 25 4.4 0.084 2.9 6890 12.6 0.034201
## Time Crime
## 1 26.2011 791
## 2 25.2999 1635
## 3 24.3006 578
## 4 29.9012 1969
## 5 21.2998 1234
## 6 20.9995 682

tail(crime)
## M So Ed Po1 Po2 LF M.F Pop NW U1 U2 Wealth Ineq Prob
## 42 14.1 0 10.9 5.6 5.4 0.523 96.8 4 0.2 0.107 3.7 4890 17.0 0.088904
## 43 16.2 1 9.9 7.5 7.0 0.522 99.6 40 20.8 0.073 2.7 4960 22.4 0.054902
## 44 13.6 0 12.1 9.5 9.6 0.574 101.2 29 3.6 0.111 3.7 6220 16.2 0.028100
## 45 13.9 1 8.8 4.6 4.1 0.480 96.8 19 4.9 0.135 5.3 4570 24.9 0.056202
## 46 12.6 0 10.4 10.6 9.7 0.599 98.9 40 2.4 0.078 2.5 5930 17.1 0.046598
## 47 13.0 0 12.1 9.0 9.1 0.623 104.9 3 2.2 0.113 4.0 5880 16.0 0.052802
## Time Crime
## 42 12.1996 542
## 43 31.9989 823
## 44 30.0001 1030
## 45 32.5996 455
## 46 16.6999 508
## 47 16.0997 849

Checking the last column of the data set (Crime, the number of crimes per 100,000 people) for outliers, using a box-and-whisker plot and the grubbs.test() function.

crime[,16] #the crime column of the dataset

## [1] 791 1635 578 1969 1234 682 963 1555 856 705 1674 849 511 664 798
## [16] 946 539 929 750 1225 742 439 1216 968 523 1993 342 1216 1043 696
## [31] 373 754 1072 923 653 1272 831 566 826 1151 880 542 823 1030 455
## [46] 508 849

# Grubbs test for a single outlier (the most extreme value, 1993).
# p = 0.079 > 0.05, so 1993 is not a significant outlier at the 5% level (it would be at 10%).
grubbs.test(crime[,16], type = 10)

##
## Grubbs test for one outlier
##
## data: crime[, 16]
## G = 2.81287, U = 0.82426, p-value = 0.07887
## alternative hypothesis: highest value 1993 is an outlier
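The G statistic above can be reproduced by hand: for the largest value, G = (max(x) - mean(x)) / s, where s is the sample standard deviation. The sketch below also compares G with the standard one-sided Grubbs critical value, G_crit = ((n - 1) / sqrt(n)) * sqrt(t^2 / (n - 2 + t^2)), with t the upper alpha/n quantile of a t distribution on n - 2 degrees of freedom. The crime values are copied from the printed column above; alpha = 0.05 is an assumed significance level.

```r
# Crime column values copied from the printed output above (47 observations)
crime_rate <- c(791, 1635, 578, 1969, 1234, 682, 963, 1555, 856, 705, 1674, 849, 511, 664, 798,
                946, 539, 929, 750, 1225, 742, 439, 1216, 968, 523, 1993, 342, 1216, 1043, 696,
                373, 754, 1072, 923, 653, 1272, 831, 566, 826, 1151, 880, 542, 823, 1030, 455,
                508, 849)
n <- length(crime_rate)

# Grubbs statistic for the largest value, in units of the sample standard deviation
G <- (max(crime_rate) - mean(crime_rate)) / sd(crime_rate)

# One-sided critical value; alpha = 0.05 is an assumed significance level
alpha <- 0.05
t_crit <- qt(1 - alpha / n, df = n - 2)
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# p-value implied by the same relation (map G back to t, then a Bonferroni bound)
t_G <- sqrt(G^2 * n * (n - 2) / ((n - 1)^2 - n * G^2))
p_value <- min(1, n * pt(t_G, df = n - 2, lower.tail = FALSE))

print(round(c(G = G, G_crit = G_crit, p = p_value), 5))
```

G comes out below G_crit, consistent with the reported p-value of about 0.079: at the 5% level, 1993 is not flagged.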

# Grubbs test for two outliers on opposite tails (the lowest value, 342, and the highest, 1993).
# p = 1, so there is no evidence that 342 and 1993 are outliers.
grubbs.test(crime[,16], type = 11)

##
## Grubbs test for two opposite outliers
##
## data: crime[, 16]
## G = 4.26877, U = 0.78103, p-value = 1
## alternative hypothesis: 342 and 1993 are outliers
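For the opposite-tails version (`type = 11`), the G printed above equals the sample range measured in standard deviations, G = (max(x) - min(x)) / s. A short check, with the values again copied from the printed crime column:

```r
# Crime column values copied from the printed output above (47 observations)
crime_rate <- c(791, 1635, 578, 1969, 1234, 682, 963, 1555, 856, 705, 1674, 849, 511, 664, 798,
                946, 539, 929, 750, 1225, 742, 439, 1216, 968, 523, 1993, 342, 1216, 1043, 696,
                373, 754, 1072, 923, 653, 1272, 831, 566, 826, 1151, 880, 542, 823, 1030, 455,
                508, 849)

# Range of the sample divided by its standard deviation
G_range <- (max(crime_rate) - min(crime_rate)) / sd(crime_rate)
print(round(G_range, 5))
```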

boxplot(crime[,16]) # Boxplot flags 1674, 1969, and 1993 as potential outliers (1635 is the end of the upper whisker)
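`boxplot()` draws a point individually when it lies more than 1.5 times the box length (the distance between the hinges) beyond either hinge. `boxplot.stats()` applies the same rule and returns the values involved, so the flagged points can be listed directly instead of read off the plot:

```r
# Crime column values copied from the printed output above (47 observations)
crime_rate <- c(791, 1635, 578, 1969, 1234, 682, 963, 1555, 856, 705, 1674, 849, 511, 664, 798,
                946, 539, 929, 750, 1225, 742, 439, 1216, 968, 523, 1993, 342, 1216, 1043, 696,
                373, 754, 1072, 923, 653, 1272, 831, 566, 826, 1151, 880, 542, 823, 1030, 455,
                508, 849)

b <- boxplot.stats(crime_rate)   # coef = 1.5, the boxplot() default

hinges <- b$stats[c(2, 4)]                       # lower and upper hinge
upper_fence <- hinges[2] + 1.5 * diff(hinges)    # points above this are drawn individually

print(b$stats)      # lower whisker, lower hinge, median, upper hinge, upper whisker
print(upper_fence)
print(b$out)        # flagged values, in data order
```

With these data the upper fence is 1656, so 1674, 1969, and 1993 are drawn as points, while 1635 is the largest value inside the fence and marks the end of the upper whisker.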
Question 6.1
In technical trading, a change detection strategy could be used to detect significant cumulative changes in a stock's price. Both the critical value (C) and the threshold (T) could be set to match the risk tolerance of the trader's strategy. Since an increase in a stock's price could be advantageous, the trader could set up an indicator or alarm that triggers when the CUSUM model detects a decrease in the price.
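A minimal sketch of the decrease-detecting CUSUM described above. It uses the usual one-sided recursion S_t = max(0, S_{t-1} + (mu - x_t - C)) and raises an alarm the first time S_t >= T. The price series, the baseline window used for mu, and the values C = 0.5 and T = 5 are all hypothetical choices for illustration.

```r
# One-sided CUSUM for a decrease; returns the S_t path and the first alarm index (NA if none)
cusum_decrease <- function(x, mu, C, threshold) {
  S <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    prev <- max(0, prev + (mu - x[i] - C))
    S[i] <- prev
  }
  list(S = S, alarm = which(S >= threshold)[1])
}

# Hypothetical closing prices: flat around 100, then drifting down from day 11
prices <- c(100, 101, 99, 100, 100, 101, 99, 100, 101, 100,
            98, 97, 96, 95, 94, 93, 92, 91, 90, 89)

# Baseline mean from the first 10 days (assumed to be a no-change period)
res <- cusum_decrease(prices, mu = mean(prices[1:10]), C = 0.5, threshold = 5)
print(res$alarm)
```

On this made-up series the alarm fires on day 13, two days after the decline begins; a larger C or T would delay it and suppress more false alarms.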

Question 6.2
The data and plots for this problem can be found in the “6501 HW2.xlsx” file uploaded along with this file. For the first part of the question, we needed to identify when unofficial summer ends (i.e., when the weather starts cooling off). My approach was to compute the average temperature across the years 1997-2015 for each day of available data and plot it against the day. I assumed that the mean of x_t with no change could be taken from the month of July, since it is certainly summer in July. Implementing the CUSUM approach in Excel, I found that, over the period from June 18th to October 31st, S_t peaked on August 24th, indicating that this could be the unofficial day summer ends. A C value of 0.2 fit the model best; any C value larger than 0.75 made the plot inconclusive.
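The same recursion can be applied to daily temperatures, with mu taken from July as described above. In this sketch the temperatures are invented (a stable July and August followed by a September cool-down), and the threshold T = 5 is an assumed value, since the text only reports C. Running the detector for several C values shows the general effect of C: because S_t can only shrink as C grows, a larger C never moves the alarm earlier.

```r
# Hypothetical daily high temperatures (deg F), July 1 to September 30
days  <- seq(as.Date("2015-07-01"), as.Date("2015-09-30"), by = "day")
temps <- c(rep(c(88, 90, 87, 89), length.out = 31),   # July
           rep(c(88, 89, 87),     length.out = 31),   # August
           seq(86, 72, length.out = 30))              # September cool-down
stopifnot(length(temps) == length(days))

# Index of the first day with S_t >= threshold, where
# S_t = max(0, S_{t-1} + (mu - x_t - C)); NA if no alarm
first_alarm_decrease <- function(x, mu, C, threshold) {
  S <- 0
  for (i in seq_along(x)) {
    S <- max(0, S + (mu - x[i] - C))
    if (S >= threshold) return(i)
  }
  NA_integer_
}

mu <- mean(temps[format(days, "%m") == "07"])   # July mean as the no-change baseline
C_values <- c(0.2, 1, 3)
alarm_idx <- sapply(C_values, function(C) first_alarm_decrease(temps, mu, C, threshold = 5))
print(data.frame(C = C_values, alarm_date = days[alarm_idx]))
```

On this made-up series the alarm date moves later as C grows; with the real data, C and T would be tuned as the text describes.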

For the second part of 6.2, a similar approach was used to determine whether the average summer temperature has been increasing over the years. The CUSUM model showed an increase in summer temperatures between 2009 and 2012.
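For the second part, the CUSUM is flipped to detect an increase: S_t = max(0, S_{t-1} + (x_t - mu - C)). The sketch below runs it on yearly summer averages. All numbers are hypothetical, as are the baseline (the mean of the first nine years), C = 0.5, and T = 2.5; it illustrates the mechanics only and does not reproduce the spreadsheet results.

```r
# Hypothetical average summer temperature (deg F) per year
years <- 2000:2015
summer_avg <- c(83.1, 82.7, 83.4, 82.9, 83.0, 83.2, 82.8, 83.1, 82.9,  # 2000-2008
                84.0, 84.6, 85.1, 85.3,                                # 2009-2012
                84.8, 84.9, 85.0)                                      # 2013-2015

mu <- mean(summer_avg[years <= 2008])   # baseline: years assumed to have no change
C <- 0.5
threshold <- 2.5

# Increase-detecting CUSUM path
S <- numeric(length(summer_avg))
prev <- 0
for (i in seq_along(summer_avg)) {
  prev <- max(0, prev + (summer_avg[i] - mu - C))
  S[i] <- prev
}

alarm_year <- years[which(S >= threshold)[1]]
print(data.frame(year = years, S = round(S, 2)))
print(alarm_year)
```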
