
Kmeans.R: Finding the ‘Elbow’ in WSS Curve

The document discusses using K-means clustering analysis and the elbow method to determine the optimal number of clusters for a dataset with two variables, English and Math marks. It performs K-means clustering with 2, 3, and 4 clusters, and plots the within-group sum of squares against the number of clusters, identifying 4 clusters as optimal. It then explores visualizing the data using various ggplot geometries and options.

Uploaded by

singh_saahil14

Kmeans.R
data1<-read.csv("grades.csv")

head(data1)

data2<-data1[,2:3]

data2

plot(data2, main = "Marks of English Vs Maths", pch =20, cex =2)

library(animation)

ki<-kmeans.ani(data2,2)

ki=kmeans(data2,3)

#kmeans.

ki

plot(data2, col= ki$cluster,main = "% English Vs Maths", pch =20, cex =2)

#ki<-kmeans(data2,centers=2,nstart)

Finding the ‘Elbow’ in wss Curve

wss <- numeric(15L)

# Total within-cluster sum of squares for k = 1 to 15 clusters
for (i in 1:15) {
  wss[i] <- sum(kmeans(data2, centers = i, nstart = 25)$withinss)
}

plot(1:15, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares", main = "Assessing the Optimal Number of Clusters with the Elbow Method", pch = 10, cex = 1)

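# Use the same nstart as the loop above so the result is comparable to wss[4]
ki <- kmeans(data2, centers = 4, nstart = 25)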

sum(ki$withinss)

#[1] 430.243

wss[4]

#[1] 430.243
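
The elbow search above can also be tried without grades.csv. The following is a minimal sketch, assuming synthetic marks drawn from three made-up groups; the names marks, max_k and rel_drop are illustrative and do not come from the original script. It uses the tot.withinss component returned by kmeans(), which is the same quantity as sum(...$withinss).

```r
# Sketch of the elbow method on synthetic data (an assumption, not grades.csv)
set.seed(42)  # make the random data and the k-means starts reproducible

# Three hypothetical groups of (English, Maths) marks, 30 students each
marks <- rbind(
  cbind(rnorm(30, mean = 40, sd = 4), rnorm(30, mean = 45, sd = 4)),
  cbind(rnorm(30, mean = 60, sd = 4), rnorm(30, mean = 75, sd = 4)),
  cbind(rnorm(30, mean = 85, sd = 4), rnorm(30, mean = 55, sd = 4))
)
colnames(marks) <- c("English", "Maths")

# Total within-cluster sum of squares for k = 1..max_k
max_k <- 10
wss <- sapply(seq_len(max_k), function(k) {
  kmeans(marks, centers = k, nstart = 25)$tot.withinss
})

# Relative drop in WSS when going from k to k + 1 clusters;
# the elbow is roughly where these drops become small
rel_drop <- -diff(wss) / wss[-max_k]
print(round(rel_drop, 2))

plot(seq_len(max_k), wss, type = "b",
     xlab = "Number of Clusters", ylab = "Within groups sum of squares")
```

Reading the elbow from the plot remains a judgment call; the relative drops only make the flattening of the curve easier to see.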
Usage of ggplot:

options(scipen=999) # turn-off scientific notation like 1e+48

library(ggplot2)

#theme_set(theme_bw())

data("midwest", package = "ggplot2")

#midwest2 <- read.csv("http://goo.gl/G1K41K") # bkup data source

# Scatterplot

gg <- ggplot(midwest, aes(x=area, y=poptotal)) +

#geom_point(aes(col=state, size=popdensity)) +

geom_point(aes(col=state, size=popblack)) +

geom_point(aes(col=state, size=popwhite)) +

geom_smooth(method="loess", se=FALSE) +

geom_smooth(method = "lm", se=FALSE) +

xlim(c(0, 0.1)) +

ylim(c(0, 500000)) +

labs(subtitle="Area Vs Population", y="Population", x="Area", title="Scatterplot", caption = "Source: midwest")

plot(gg)

ggplot2
housing <- read.csv("D:/BigdataAnalytics/R_folder/landdata-states.csv")

# Path to the file landdata-states.csv

head(housing[1:5])

hist(housing$Home.Value)

library(ggplot2)

ggplot(housing, aes(x = Home.Value)) + geom_histogram()


# with plot

plot(Home.Value ~ Date, data=subset(housing, State == "MA"))

points(Home.Value ~ Date, col="red",data=subset(housing, State == "TX"))

legend(1975, 400000, c("MA", "TX"), title="State", col=c("black", "red"), pch=c(1, 1))

#with ggplot

#plot 1

ggplot(subset(housing, State %in% c("MA", "TX")), aes(x=Date, y=Home.Value, color=State)) + geom_point()

#Scatter Plot

#plot 2

hp2001Q1 <- subset(housing, Date == 2001.25)

#plot 3

ggplot(hp2001Q1, aes(y = Structure.Cost, x = Land.Value)) + geom_point()

#plot 4

ggplot(hp2001Q1, aes(y = Structure.Cost, x = log(Land.Value))) + geom_point()

hp2001Q1$pred.SC <- predict(lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))

p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))

plot(p1)

#plot 5

p1 + geom_point(aes(color = Home.Value)) + geom_line(aes(y = pred.SC))

#plot 6

p1+ geom_point(aes(color = Home.Value)) + geom_smooth()


#plot 7

p1 + geom_text(aes(label=State), size = 3)

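# Install ggrepel only if it is not already available
if (!requireNamespace("ggrepel", quietly = TRUE)) install.packages("ggrepel")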

library("ggrepel")

#plot 8

p1 + geom_point() + geom_text_repel(aes(label=State), size = 3)

#plot 9

# A constant size is set outside aes(); inside aes() it is treated as a variable and adds a legend
p1 + geom_point(size = 2, color="red")

#plot 10

p1+ geom_point(aes(color=Home.Value, shape = region))
