
Machine Learning

Assignment 3: Classification Analysis

Submitted to: Sir Atif Mehmood

Group Members:
Abeeha Abid (F2021065300)
Esha Tur Razia (F2021065348)
Eman Azam (F2021065253)

2024-2025
Task 1

1.1 Support Vector Machines (SVM)


• Working Principle: Classifies data by finding the maximum-margin hyperplane that best separates the classes in feature space.
• Assumptions: Data is linearly separable, or can be made separable using kernel functions.
• Strengths: Works well with small datasets, effective in high-dimensional spaces, and handles non-linear boundaries via kernels.
• Weaknesses: Computationally intensive for large datasets; sensitive to noise and overlapping classes.
• Training Efficiency: Slow for large datasets.
• Computational Efficiency: Moderate; prediction cost grows with the number of support vectors. A minimal sketch follows this list.
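
As a minimal illustration of this workflow, the scikit-learn sketch below fits an RBF-kernel SVM. The synthetic dataset from make_classification and the parameters (C=1.0, 200 samples) are illustrative assumptions, not from the assignment:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data as a stand-in for a real dataset
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale features (SVMs are distance-based), then fit an RBF-kernel SVM,
# which handles non-linearly separable classes
svm = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
svm.fit(X_train, y_train)
print('Test accuracy:', svm.score(X_test, y_test))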

1.2 Naïve Bayes


• Working Principle: Applies Bayes' theorem, assuming features are conditionally independent given the class.
• Assumptions: Feature independence, which is often unrealistic in practice.
• Strengths: Simple, fast, and works well with categorical data.
• Weaknesses: Performance degrades when the independence assumption is strongly violated.
• Training Efficiency: Extremely fast; suitable for large datasets.
• Computational Efficiency: Very high. A minimal sketch follows this list.
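
A minimal sketch, assuming continuous features modeled with GaussianNB (for categorical count data, MultinomialNB would be the usual choice); the synthetic data is illustrative only:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# GaussianNB models each feature as a class-conditional normal
# distribution and treats features as conditionally independent
nb = GaussianNB()
nb.fit(X_train, y_train)
print('Test accuracy:', nb.score(X_test, y_test))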

1.3 k-Nearest Neighbors (KNN)

• Working Principle: Predicts the class by majority vote among the k nearest neighbors in feature space.
• Assumptions: Nearby points share labels; requires a meaningful distance metric.
• Strengths: Easy to implement; no explicit training phase.
• Weaknesses: Sensitive to irrelevant features and feature scaling; slow predictions on large datasets.
• Training Efficiency: Essentially none; the model simply stores the training data.
• Computational Efficiency: Low at prediction time, especially with large datasets. A minimal sketch follows this list.
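
A minimal sketch (k=5 is an illustrative choice, not from the assignment); features are scaled first because KNN's distance computations are sensitive to feature scale:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# fit() only stores the training data; the work happens at prediction time
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print('Test accuracy:', knn.score(X_test, y_test))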

1.4 Neural Networks

• Working Principle: Loosely inspired by the brain; interconnected layers of neurons learn features and patterns via backpropagation.
• Assumptions: Requires large amounts of labeled data and sufficient computational resources.
• Strengths: Very powerful for complex, non-linear problems.
• Weaknesses: Needs significant computational power and training time; prone to overfitting.
• Training Efficiency: Low for large networks.
• Computational Efficiency: Relatively low; depends on the size of the network. A minimal sketch follows this list.
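
A minimal sketch using scikit-learn's MLPClassifier; the layer sizes and iteration count are illustrative assumptions (a deep-learning framework would be the usual choice for larger problems):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Two hidden layers; the weights are fitted by backpropagation
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                  random_state=42),
)
mlp.fit(X_train, y_train)
print('Test accuracy:', mlp.score(X_test, y_test))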

Task 2

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the dataset and inspect its columns
data = pd.read_csv('data.csv')
print(data.columns)

# Select the predictor columns and the target (Income)
X = data[['Age', 'Education_Level', 'Occupation', 'Number_of_Dependents',
          'Location', 'Work_Experience', 'Marital_Status',
          'Employment_Status', 'Household_Size', 'Homeownership_Status',
          'Type_of_Housing', 'Gender', 'Primary_Mode_of_Transportation']]
y = data['Income']

# Convert categorical variables into dummy/indicator variables
X = pd.get_dummies(X, drop_first=True)

# Hold out 20% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit an ordinary least-squares linear regression
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate with standard regression error metrics
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f'Mean Absolute Error: {mae}')
print(f'Mean Squared Error: {mse}')
print(f'Root Mean Squared Error: {rmse}')
print(f'R² Score: {r2}')

# Plot actual vs. predicted income; points near the red diagonal
# indicate accurate predictions
plt.scatter(y_test, y_pred)
plt.xlabel('Actual Income')
plt.ylabel('Predicted Income')
plt.title('Actual vs Predicted Income')
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)],
         color='red')  # Diagonal reference line
plt.show()

Task 3

Analysis and Interpretation of Results


• MAE, MSE, and RMSE measure how far the model's predictions fall from the actual incomes.
• The R² score indicates how much of the variance in income is explained by the features.
• A good model has low error metrics (MAE, MSE, RMSE) and a high R² score; the standard definitions are given below.
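
For reference, the standard definitions of these metrics, where y_i is the actual income, ŷ_i the predicted income, ȳ the mean actual income, and n the number of test points:

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert,\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}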

References: Sources used for this assignment:


 https://www.geeksforgeeks.org/machine-learning/

 https://chatgpt.com/
