Analysis Course HW1
5/21/2020
Question 4.1
A situation from everyday life where clustering would be appropriate is grouping students by performance, based on
predictors such as:
- Amount of time studied
- Average GPA
- Student-teacher interaction
- Class size
Question 4.2
library(stats)
library(cluster)
library(fpc)
tail(iris)
Generating an elbow plot to determine an ideal number of clusters to use for the data set.
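WSS <- vector("numeric")
for (K in 1:10) {
  # Species (column 5) is a factor, so cluster on the four numeric columns only
  km <- kmeans(iris[, 1:4],
               centers = K,
               iter.max = 100,
               nstart = 25)
  # Total within-cluster sum of squares for this K
  WSS[K] <- km$tot.withinss
}
plot(WSS, type = "b", xlab = "Number of clusters K", ylab = "Total within-cluster sum of squares")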
K = 3 will be used, since the marginal benefit of adding more clusters diminishes after 3.
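To see what the chosen K produces, the clusters can be compared with the known species labels. The following is a minimal sketch rather than part of the original analysis; the seed value and the names km3 and confusion are assumptions introduced here for illustration.

```r
# Sketch: fit k-means with K = 3 on the four numeric iris columns
set.seed(42)  # arbitrary seed (an assumption), used only so the run is repeatable
km3 <- kmeans(iris[, 1:4], centers = 3, iter.max = 100, nstart = 25)

# Cross-tabulate the cluster labels against the known species
confusion <- table(cluster = km3$cluster, species = iris$Species)
print(confusion)
```

Each row of the table is one cluster; a row dominated by a single species indicates that the cluster lines up well with that species.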
tail(crime)
## M So Ed Po1 Po2 LF M.F Pop NW U1 U2 Wealth Ineq Prob
## 42 14.1 0 10.9 5.6 5.4 0.523 96.8 4 0.2 0.107 3.7 4890 17.0 0.088904
## 43 16.2 1 9.9 7.5 7.0 0.522 99.6 40 20.8 0.073 2.7 4960 22.4 0.054902
## 44 13.6 0 12.1 9.5 9.6 0.574 101.2 29 3.6 0.111 3.7 6220 16.2 0.028100
## 45 13.9 1 8.8 4.6 4.1 0.480 96.8 19 4.9 0.135 5.3 4570 24.9 0.056202
## 46 12.6 0 10.4 10.6 9.7 0.599 98.9 40 2.4 0.078 2.5 5930 17.1 0.046598
## 47 13.0 0 12.1 9.0 9.1 0.623 104.9 3 2.2 0.113 4.0 5880 16.0 0.052802
## Time Crime
## 42 12.1996 542
## 43 31.9989 823
## 44 30.0001 1030
## 45 32.5996 455
## 46 16.6999 508
## 47 16.0997 849
Determining any outliers in the last column of the dataset (number of crimes per 100,000 people) using a box and
whisker plot and the grubbs.test function.
## [1] 791 1635 578 1969 1234 682 963 1555 856 705 1674 849 511 664 798
## [16] 946 539 929 750 1225 742 439 1216 968 523 1993 342 1216 1043 696
## [31] 373 754 1072 923 653 1272 831 566 826 1151 880 542 823 1030 455
## [46] 508 849
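library(outliers) # provides grubbs.test
grubbs.test(crime[,16],type = 10) # Grubbs' first test for a single outlier in the data set.
# The highest value, 1993, is the candidate outlier; with p = 0.079 it is significant at the 10% level but not at the 5% level.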
##
## Grubbs test for one outlier
##
## data: crime[, 16]
## G = 2.81287, U = 0.82426, p-value = 0.07887
## alternative hypothesis: highest value 1993 is an outlier
grubbs.test(crime[,16],type = 11) # Grubbs' second test, checking whether the lowest and highest values are two outliers on opposite tails of the sample.
# The candidates are 342 and 1993, but with p = 1 the test gives no evidence that both are outliers.
##
## Grubbs test for two opposite outliers
##
## data: crime[, 16]
## G = 4.26877, U = 0.78103, p-value = 1
## alternative hypothesis: 342 and 1993 are outliers
boxplot(crime[,16]) # Boxplot showing points 1674, 1969, and 1993 as potential outliers in the data set
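As a cross-check on the boxplot, the 1.5 x IQR rule that boxplot() applies by default can be evaluated directly with boxplot.stats(). This is a minimal sketch; the names crime_rate, bs, upper_fence, and flagged are introduced here for illustration, and the vector simply re-enters the 47 values printed above.

```r
# The 47 crime-rate values as printed in the output above
crime_rate <- c(791, 1635, 578, 1969, 1234, 682, 963, 1555, 856, 705, 1674, 849,
                511, 664, 798, 946, 539, 929, 750, 1225, 742, 439, 1216, 968,
                523, 1993, 342, 1216, 1043, 696, 373, 754, 1072, 923, 653, 1272,
                831, 566, 826, 1151, 880, 542, 823, 1030, 455, 508, 849)

# boxplot.stats() returns the five-number summary ($stats) and the points
# lying more than coef * IQR beyond the hinges ($out)
bs <- boxplot.stats(crime_rate, coef = 1.5)

# Upper fence: upper hinge plus 1.5 times the hinge spread
upper_fence <- bs$stats[4] + 1.5 * diff(bs$stats[c(2, 4)])
flagged <- sort(bs$out)

print(upper_fence)
print(flagged)
```

Points above the upper fence are the ones drawn as individual dots on the boxplot, so the printed values can be read against the plot.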
Question 6.1
In technical trading, a change detection strategy could be employed to detect significant cumulative changes in a
stock's price. Both the critical value and the threshold could be set according to the risk tolerance of the trader's
strategy. Since an increase in a stock's price is generally advantageous, the trader could set up an indicator or
alarm that fires when the CUSUM model detects a decrease in the stock price.
Question 6.2
The data and plots for this problem can be seen in the “6501 HW2.xlsx” file uploaded along with this file. For the
first part of the question, we needed to identify when unofficial summer ends (i.e., when the weather starts
cooling off). My approach was to obtain the average temperature across the years 1997-2015 and plot it against
each day of available data. I assumed that the mean of x_t under no change could be taken from the month of July,
since July is certainly summer. Implementing the CUSUM approach in Excel, I found that S_t peaked on August 24th
over the period June 18th to October 31st, indicating that this could be the unofficial day summer ends. A C value
of 0.2 fit the model best; C values larger than 0.75 made the plot inconclusive.
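The CUSUM calculation was carried out in Excel, but the same recursion can be sketched in R. The sketch assumes the standard one-sided form for detecting a decrease, S_t = max(0, S_{t-1} + (mu - x_t - C)). The function name cusum_decrease, the threshold argument, and the toy temperature vector are assumptions made for illustration; they are not the homework data or the spreadsheet's exact layout.

```r
# One-sided CUSUM for detecting a decrease in a series x.
# mu: mean of x when no change has occurred; C: slack value; threshold: alarm level.
cusum_decrease <- function(x, mu, C, threshold) {
  S <- numeric(length(x))
  prev <- 0
  for (t in seq_along(x)) {
    # Accumulate shortfalls below mu, less the slack C, and never drop below zero
    S[t] <- max(0, prev + (mu - x[t] - C))
    prev <- S[t]
  }
  # Index of the first observation at which S_t reaches the threshold (NA if never)
  list(S = S, first_alarm = which(S >= threshold)[1])
}

# Toy daily temperatures (made up): stable for five days, then cooling
temps <- c(90, 91, 89, 90, 90, 85, 84, 83, 82, 80)
res <- cusum_decrease(temps, mu = mean(temps[1:5]), C = 1, threshold = 10)
print(res$S)
print(res$first_alarm)
```

A larger C makes the statistic less sensitive to small dips, which is consistent with the plot becoming inconclusive at high C values.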
For the second part of 6.2, a similar approach was used to determine whether the average summer temperature
was increasing over the years. The CUSUM model showed an increase in temperature in the summer months between
2009 and 2012.