Exercise6 Solution
Essential Libraries
Let us begin by importing the essential Python libraries.
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt # we only need pyplot
sb.set() # set the default Seaborn style for graphics
The dataset is train.csv; hence we use the read_csv function from Pandas.
Immediately after importing, take a quick look at the data using the head function.
houseData = pd.read_csv('train.csv')
houseData.head()
[5 rows x 81 columns]
[Figure: 2D scatter plot]
Basic KMeans Clustering
Guess the number of clusters from the 2D plot, and perform KMeans Clustering.
We will use the KMeans clustering model from the sklearn.cluster module.
KMeans(n_clusters=3)
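The fitting step can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the column names x1 and x2 are hypothetical stand-ins for whichever two variables were extracted from houseData, and n_init and random_state are set here only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for two numeric columns (hypothetical names "x1", "x2");
# in the exercise these would be two variables taken from houseData.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
X = pd.DataFrame(points, columns=["x1", "x2"])

# Guess the number of clusters, then fit the model
num_clust = 3
kmeans = KMeans(n_clusters=num_clust, n_init=10, random_state=0)
kmeans.fit(X)

# Attach the predicted cluster label to each point
X_labeled = X.copy()
X_labeled["Cluster"] = kmeans.predict(X)
cluster_sizes = X_labeled["Cluster"].value_counts().sort_index()

print(kmeans.cluster_centers_)   # one row of coordinates per cluster
print(cluster_sizes)             # number of points assigned to each cluster
```

Plotting X_labeled with the Cluster column as the colour is one way to judge the result visually.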
Discuss: Is this the optimal clustering that you will be happy with? If not, try changing num_clust.
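One common way to compare different values of num_clust is to look at the within-cluster sum of squares (the inertia_ attribute of a fitted KMeans model) for a range of values and look for the point where the improvement flattens out. The sketch below assumes synthetic data with three groups; it is not the notebook's own code, and the range of values tried is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2D data with three groups (an assumption for illustration only)
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(60, 2)) for c in centers])

# Fit KMeans for several candidate cluster counts and record the inertia
inertias = {}
for num_clust in range(1, 7):
    model = KMeans(n_clusters=num_clust, n_init=10, random_state=0).fit(X)
    inertias[num_clust] = model.inertia_

for k, value in inertias.items():
    print(f"num_clust={k}: within-cluster sum of squares = {value:.1f}")
```

Inertia always tends to fall as num_clust grows, so the useful signal is where the drop becomes small, not the minimum itself.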
Anomaly Detection for the Dataset
Extract the required variables from the dataset, and then perform Bi-Variate Anomaly Detection.
[Figure: scatter plot]
LocalOutlierFactor(contamination=0.05)
[Figure: scatter plot]
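A bi-variate anomaly detection step with LocalOutlierFactor might look like the sketch below. The data is synthetic, the column names x1 and x2 are hypothetical placeholders for the two variables extracted from the dataset, and n_neighbors=20 is simply the library default written out explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in: a dense cloud plus a few far-away points
# (hypothetical columns "x1", "x2" in place of the real variables)
rng = np.random.default_rng(2)
inliers = rng.normal(loc=0.0, scale=1.0, size=(190, 2))
far_points = rng.uniform(low=8.0, high=12.0, size=(10, 2))
X = pd.DataFrame(np.vstack([inliers, far_points]), columns=["x1", "x2"])

# contamination is the expected fraction of anomalies in the data
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
labels = lof.fit_predict(X)      # -1 marks an anomaly, +1 a normal point

# Attach the label to each point for plotting or inspection
X_labeled = X.copy()
X_labeled["Anomaly"] = labels
num_anomalies = int((labels == -1).sum())
print(f"{num_anomalies} of {len(X)} points flagged as anomalies")
```

Colouring a scatter plot of X_labeled by the Anomaly column shows which points were flagged.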
Discuss: Is this the optimal anomaly detection that you will be happy with? If not, try changing the parameters, such as contamination.
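To see how sensitive the result is to the parameters, one option is to refit the detector with a few contamination values and count the flagged points. The sketch below assumes synthetic data and an arbitrary choice of values to try; it only illustrates the effect and does not suggest which setting is right for the exercise.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic 2D data (an assumption for illustration only)
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(285, 2)),   # dense cloud
    rng.uniform(low=6.0, high=10.0, size=(15, 2)),   # scattered far points
])

# Refit with different contamination values and count flagged points
flagged = {}
for contamination in (0.01, 0.05, 0.10):
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    labels = lof.fit_predict(X)
    flagged[contamination] = int((labels == -1).sum())

for contamination, count in flagged.items():
    print(f"contamination={contamination}: {count} points flagged")
```

A larger contamination value lowers the bar for what counts as an anomaly, so the flagged count grows with it; the n_neighbors parameter can be varied in the same way.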