ML Lab
Ex.no:1
Linear Regression with the California Housing Dataset
Date:
Aim:
Build a linear regression model to predict housing prices using the California Housing Dataset.
Experiment with different features and tune the model's hyperparameters.
Procedure:
Step 1: Load the Dataset
import pandas as pd
# Load dataset
df = pd.read_csv('housing.csv')
df.head()
df.info()
df.describe()
df.isnull().sum()
Step 4: Feature Selection
# Correlation matrix
plt.figure(figsize=(10, 8))
plt.show()
y = df['MedHouseVal']
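One way to carry out the feature selection step is to rank features by their correlation with the target. The sketch below is only an illustration: it uses a small synthetic table as a stand-in for `housing.csv`, and the feature names `MedInc`, `HouseAge`, and `AveRooms` as well as the cutoff of two features are assumptions, not part of the original record.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (assumption: not the real dataset).
rng = np.random.default_rng(0)
n = 500
df_demo = pd.DataFrame({
    "MedInc": rng.normal(4.0, 1.5, n),
    "HouseAge": rng.uniform(1, 50, n),
    "AveRooms": rng.normal(5.0, 1.0, n),
})
# The target depends strongly on MedInc and only weakly on HouseAge.
df_demo["MedHouseVal"] = (
    0.5 * df_demo["MedInc"] + 0.01 * df_demo["HouseAge"] + rng.normal(0, 0.3, n)
)

# Absolute correlation of every feature with the target, strongest first.
corr_with_target = (
    df_demo.corr()["MedHouseVal"]
    .drop("MedHouseVal")
    .abs()
    .sort_values(ascending=False)
)
print(corr_with_target)

# Keep the two most correlated features (the cutoff of 2 is arbitrary).
selected_features = corr_with_target.index[:2].tolist()
X_demo = df_demo[selected_features]
y_demo = df_demo["MedHouseVal"]
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is a starting point for experimenting with feature subsets rather than a final answer.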
Use `LinearRegression` from scikit-learn to build and train the model.
model = LinearRegression()
model.fit(X_train, y_train)
Predict on the test set and evaluate using mean squared error (MSE).
y_pred = model.predict(X_test)
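The split, training, and evaluation steps can be sketched end to end as follows. This is a minimal example on synthetic data from `make_regression` (an assumption, used so the snippet runs without `housing.csv`); the 80-20 split and `random_state` values are illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the housing features and prices.
X_demo, y_demo = make_regression(
    n_samples=400, n_features=5, noise=10.0, random_state=0
)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on unseen rows and report the mean squared error.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
```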
Step 8: Hyperparameter Tuning
Use Ridge (L2) regression and tune the hyperparameter `alpha` using GridSearchCV.
grid_search.fit(X_train, y_train)
best_model = Ridge(alpha=grid_search.best_params_['alpha'])
best_model.fit(X_train, y_train)
y_pred_final = best_model.predict(X_test)
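The tuning step relies on a `grid_search` object whose construction is not shown. A possible setup is sketched below; the candidate `alpha` values, the 5-fold cross-validation, the scoring metric, and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data standing in for the housing features and prices.
X_demo, y_demo = make_regression(
    n_samples=400, n_features=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Candidate regularization strengths (illustrative values).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation on the training set, scored by negative MSE.
grid_search = GridSearchCV(
    Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error"
)
grid_search.fit(X_train, y_train)
best_alpha = grid_search.best_params_["alpha"]
print(f"Best alpha: {best_alpha}")

# Refit with the best alpha and evaluate once on the test set.
best_model = Ridge(alpha=best_alpha)
best_model.fit(X_train, y_train)
y_pred_final = best_model.predict(X_test)
mse_final = mean_squared_error(y_test, y_pred_final)
print(f"Final MSE: {mse_final:.2f}")
```

Because `GridSearchCV` refits on the full training set by default, `grid_search.best_estimator_` could be used in place of the manual refit.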
Result:
A linear regression model to predict housing prices using the California Housing Dataset was
implemented successfully.
Ex.no:2
Binary Classification with the California Housing Dataset
Date:
Aim:
Implement a binary classification model to predict whether houses in a neighborhood are above a certain
price threshold.
Procedure:
Step 1: Load the Dataset
import pandas as pd
# Load dataset
df = pd.read_csv('housing.csv')
df.head()
df.info()
df.describe()
df.isnull().sum()
threshold = 200000
# Correlation matrix
plt.figure(figsize=(10, 8))
plt.show()
y = df['Above_Threshold']
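The `Above_Threshold` column has to be derived from the price column before it can serve as the target. A minimal sketch of that step is given below; the tiny hand-made table and the price column name `MedHouseVal` (here in dollars) are assumptions for illustration.

```python
import pandas as pd

# Tiny hand-made table standing in for the housing data (assumption).
df_demo = pd.DataFrame({
    "MedInc": [2.5, 8.1, 3.9, 6.4],
    "MedHouseVal": [150000, 420000, 200000, 310000],
})

threshold = 200000

# 1 if the price is strictly above the threshold, otherwise 0.
df_demo["Above_Threshold"] = (df_demo["MedHouseVal"] > threshold).astype(int)

# Drop the price itself from the features so the label cannot leak in.
X_demo = df_demo.drop(columns=["MedHouseVal", "Above_Threshold"])
y_demo = df_demo["Above_Threshold"]
print(y_demo.tolist())  # prints [0, 1, 0, 1]
```

Note that a house priced exactly at the threshold is labeled 0 here because the comparison is strict.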
Use `LogisticRegression` from scikit-learn to build and train the model.
model = LogisticRegression()
model.fit(X_train, y_train)
Step 8: Evaluate the Model
Predict on the test set and evaluate using accuracy, precision, recall, and F1-score.
y_pred = model.predict(X_test)
f1 = f1_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(f'Precision: {precision}')
print(f'Recall: {recall}')
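The four metrics can be computed with the functions in `sklearn.metrics`. The following is a sketch on synthetic data from `make_classification` (an assumption, standing in for the thresholded housing data); `max_iter=1000` is an illustrative setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the housing labels).
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Each metric compares the true labels with the predicted labels.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)

print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1-score: {f1:.3f}")
```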
Adjust the threshold for classification and observe the changes in model performance.
y_pred_proba = model.predict_proba(X_test)[:, 1]
# Modify threshold
new_threshold = 0.7
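Applying the modified threshold means comparing the predicted probabilities against it instead of calling `predict`. The sketch below shows one way to do this on synthetic data (an assumption); the names `y_pred_default` and `y_pred_new` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the housing labels).
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class for each test row.
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Modify threshold: predict 1 only when the model is at least 70% confident.
new_threshold = 0.7
y_pred_default = (y_pred_proba >= 0.5).astype(int)
y_pred_new = (y_pred_proba >= new_threshold).astype(int)

for name, preds in [("0.5", y_pred_default), ("0.7", y_pred_new)]:
    precision = precision_score(y_test, preds, zero_division=0)
    recall = recall_score(y_test, preds, zero_division=0)
    print(f"threshold={name}: precision={precision:.3f}, recall={recall:.3f}")
```

Raising the threshold can never increase recall, since every row predicted positive at 0.7 is also predicted positive at 0.5; precision usually rises, but that is not guaranteed.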
Step 10: Experiment with Different Classification Metrics
plt.figure()
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.legend(loc='lower right')
plt.show()
print(f'Accuracy: {accuracy_new}')
print(f'Precision: {precision_new}')
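The axis limits and the lower-right legend in this step suggest an ROC curve, although the record does not say so explicitly. Under that assumption, the curve's data points and the area under it could be computed as sketched below (synthetic data; plotting is omitted so the snippet runs without a display).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the housing labels).
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred_proba = model.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at every candidate threshold.
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Area under the ROC curve: 0.5 is chance level, 1.0 is a perfect ranking.
roc_auc = auc(fpr, tpr)
print(f"ROC AUC: {roc_auc:.3f}")
```

The arrays `fpr` and `tpr` can then be passed to `plt.plot(fpr, tpr)` to draw the curve.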
Result:
A binary classification model to predict whether houses in a neighborhood are above a certain
price threshold was implemented successfully.
Ex.no:3
Classification with Nearest Neighbours
Date:
Aim:
To classify real vs. fake news headlines using the K-Nearest Neighbors (KNN) classifier in scikit-learn.
Procedure:
Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
print(X.head())
print(y.head())
Split the dataset into training and validation sets.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_val)
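The split, fit, and validation steps with the scikit-learn API can be sketched as follows. The data here is synthetic numeric data from `make_classification` (an assumption; the record does not show how the features and labels were prepared), and the 80-20 split and feature scaling are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for the real features).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)

# Split the dataset into training and validation sets (80-20 split).
X_train, X_val, y_train, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# KNN is distance based, so features are scaled using training statistics only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

y_pred = knn.predict(X_val_scaled)
val_accuracy = accuracy_score(y_val, y_pred)
print(f"Validation accuracy: {val_accuracy:.3f}")
```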
Result:
A binary classifier to classify real vs. fake news headlines was implemented. Training and
validation with the scikit-learn API were completed successfully.
Ex.no:4
Validation and test sets using the California Housing Dataset
Date:
Aim:
To perform the complete exercise of training a KNN classifier with validation and test sets
using the California Housing Dataset.
Procedure:
Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
print(X.head())
print(y.head())
Split the dataset into training and test sets.
Further split the training set into a smaller training set and a validation set.
# Split the dataset into training and test sets (80-20 split)
# Further split the training set into a smaller training set and a validation set (80-20 split of the training set)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_train_pred = knn.predict(X_train)
Evaluate the trained model on the test set to check for overfitting.
y_test_pred = knn.predict(X_test)
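The two-stage split described in the comments above can be sketched with two calls to `train_test_split`. The data is synthetic (an assumption), and the names `X_train_full` and `y_train_full` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (stand-in for the real features).
X_demo, y_demo = make_classification(n_samples=1000, n_features=8, random_state=0)

# Split the dataset into training and test sets (80-20 split).
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Further split the training set into training and validation sets (80-20 split).
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.2, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Compare accuracy on all three sets; a large gap between training and
# validation/test accuracy is a sign of overfitting.
train_acc = accuracy_score(y_train, knn.predict(X_train))
val_acc = accuracy_score(y_val, knn.predict(X_val))
test_acc = accuracy_score(y_test, knn.predict(X_test))
print(f"Train: {train_acc:.3f}  Validation: {val_acc:.3f}  Test: {test_acc:.3f}")
```

The validation set is the place to compare settings such as `n_neighbors`; the test set should be scored only once, after those choices are fixed.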
Result:
The complete exercise of training a KNN classifier with validation and test sets using the
California Housing Dataset was performed successfully.
Ex.no:5
k-means algorithm using the Codon Usage dataset
Date:
Aim:
To implement the k-means algorithm using the Codon Usage dataset from the UCI Machine Learning
Repository.
Procedure:
Step 1: Import Necessary Libraries
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/codon/codon_usage.txt"
print(data.head())
scaler = StandardScaler()
Choose the number of clusters `k`.
Fit the k-means algorithm to the dataset.
Analyze the clustering results.
kmeans.fit(data_scaled)
labels = kmeans.labels_
data['Cluster'] = labels
print(data.head())
inertia = []
kmeans.fit(data_scaled)
inertia.append(kmeans.inertia_)
plt.figure(figsize=(8, 5))
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.show()
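The inertia values plotted above are normally collected by fitting k-means once per candidate `k` (the elbow method). A minimal sketch of that loop follows; it uses synthetic blobs from `make_blobs` in place of the Codon Usage data, and the range of `k` and the `n_init` and `random_state` settings are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data with 4 true clusters (stand-in for the codon frequencies).
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=6, random_state=0)

# Standardize so every feature contributes equally to the distances.
scaler = StandardScaler()
data_scaled = scaler.fit_transform(X_demo)

# Fit k-means for each candidate k and record the within-cluster sum of squares.
inertia = []
k_values = range(1, 9)
for k in k_values:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(data_scaled)
    inertia.append(kmeans.inertia_)

for k, value in zip(k_values, inertia):
    print(f"k={k}: inertia={value:.1f}")
```

The "elbow" is the value of `k` after which inertia stops dropping sharply; plotting `inertia` against `k_values` makes it easier to spot.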
Step 5: Visualize the Clusters:
If the dataset has more than 2 dimensions, use dimensionality reduction techniques to visualize
the clusters.
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_scaled)
data_pca_df['Cluster'] = labels
plt.figure(figsize=(10, 7))
plt.scatter(data_pca_df[data_pca_df['Cluster'] == cluster]['PC1'],
data_pca_df[data_pca_df['Cluster'] == cluster]['PC2'],
label=f'Cluster {cluster}')
plt.legend()
plt.show()
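The plotting lines above assume a `data_pca_df` table with `PC1`, `PC2`, and `Cluster` columns and a loop over the cluster labels. One way to build that table is sketched below on synthetic data (an assumption); the choice of 3 clusters is illustrative, and the plotting call is left as a comment so the snippet runs without a display.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 6-dimensional data (stand-in for the codon frequencies).
X_demo, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)
data_scaled = StandardScaler().fit_transform(X_demo)

# Cluster in the full feature space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(data_scaled)

# Project to 2 principal components for visualization only.
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_scaled)
data_pca_df = pd.DataFrame(data_pca, columns=["PC1", "PC2"])
data_pca_df["Cluster"] = labels

# One subset per cluster; each could be passed to
# plt.scatter(subset["PC1"], subset["PC2"], label=f"Cluster {cluster}").
for cluster in sorted(data_pca_df["Cluster"].unique()):
    subset = data_pca_df[data_pca_df["Cluster"] == cluster]
    print(f"Cluster {cluster}: {len(subset)} points")

print("Variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```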
Result:
The k-means algorithm was implemented successfully using the Codon Usage dataset from the UCI
Machine Learning Repository.
Ex.no:6
Naïve Bayes Classifier using the Gait Classification dataset
Date:
Aim:
To implement the Naïve Bayes Classifier using the Gait Classification dataset from the UCI Machine
Learning Repository.
Procedure:
Step 1: Import Necessary Libraries
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00310/GaitClassification.csv"
data = pd.read_csv(url)
print(data.head())
scaler = StandardScaler()
Step 3: Split the Data:
Split the dataset into training and test sets.
# Split the dataset into training and test sets (80-20 split)
nb = GaussianNB()
nb.fit(X_train, y_train)
y_pred = nb.predict(X_test)
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
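The full Naïve Bayes workflow (split, scale, fit, report) can be sketched as follows. Synthetic data from `make_classification` stands in for the Gait Classification file (an assumption, so the snippet runs offline), and the 80-20 split and `random_state` values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for the gait features).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)

# Split the dataset into training and test sets (80-20 split).
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Gaussian Naive Bayes models each feature as normally distributed per class.
nb = GaussianNB()
nb.fit(X_train_scaled, y_train)
y_pred = nb.predict(X_test_scaled)

print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Rows of the confusion matrix are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions.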
Result:
The Naïve Bayes Classifier was implemented successfully using the Gait Classification dataset
from the UCI Machine Learning Repository.