0% found this document useful (0 votes)

17 views17 pages

Train

Uploaded by

faizyabiqbal21

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

17 views17 pages

Train

Uploaded by

faizyabiqbal21

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 17

Implement Find-S Algorithm

Load Libraries
In [1]:
import pandas as pd

Load Dataset
In [2]:
df = pd.read_csv('./tennis.csv')

Explore Dataset
In [3]:
df.head()
Out[3]:

tem
outlook humidity windy play
p

0 sunny hot high False no

1 sunny hot high True no

2 overcast hot high False yes

3 rainy mild high False yes

4 rainy cool normal False yes

In [4]:
df.shape
Out[4]:
(14, 5)
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 outlook 14 non-null object
1 temp 14 non-null object
2 humidity 14 non-null object
3 windy 14 non-null bool
4 play 14 non-null object
dtypes: bool(1), object(4)
memory usage: 590.0+ bytes
In [16]:
for i in df.columns:
print(f'{i} : {df[i].unique()}')
outlook : ['sunny' 'overcast' 'rainy']
temp : ['hot' 'mild' 'cool']
humidity : ['high' 'normal']
windy : [False True]
play : ['no' 'yes']

Split Dataset Into Attributes And Target

In [6]:
result=df['play'].values
In [7]:
attributes=df.drop('play',axis=1).values

Initialization Of Specific Hypothesis

In [8]:
H=['0']*attributes.shape[1]
In [9]:
print(f'Initial Hypothesis is : {H}')
Initial Hypothesis is : ['0', '0', '0', '0']

Implement The Logic Of Find-S Algorithm

In [10]:
for i in range(attributes.shape[0]):
if result[i]=='yes':
for j in range(attributes.shape[1]):
if H[j]=='0':
H[j]=attributes[i][j]
elif H[j]!=attributes[i][j]:
H[j]='?'
print(f'For Step-{i} : {H}')

For Step-0 : ['0', '0', '0', '0']

For Step-1 : ['0', '0', '0', '0']
For Step-2 : ['overcast', 'hot', 'high', False]
For Step-3 : ['?', '?', 'high', False]
For Step-4 : ['?', '?', '?', False]
For Step-5 : ['?', '?', '?', False]
For Step-6 : ['?', '?', '?', '?']
For Step-7 : ['?', '?', '?', '?']
For Step-8 : ['?', '?', '?', '?']
For Step-9 : ['?', '?', '?', '?']
For Step-10 : ['?', '?', '?', '?']
For Step-11 : ['?', '?', '?', '?']
For Step-12 : ['?', '?', '?', '?']
For Step-13 : ['?', '?', '?', '?']

Final General Hypothesis

In [11]:
print(f'Final Hypothesis is : {H}')
Final Hypothesis is : ['?', '?', '?', '?']

Q2)

You are given a dataset Housing.csv, which contains information about various features of houses and
their corresponding prices. The goal is to predict the house prices based on the available features
using linear regression.

Code :

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset

df = pd.read_csv('Housing.csv')
# Check for missing values

print(df.isnull().sum())

# Fill missing values for numerical columns with mean and categorical columns with mode

df.fillna(df.mean(), inplace=True)

df['Column_name'] = df['Column_name'].fillna(df['Column_name'].mode()[0])

# One-Hot Encoding for categorical columns

df_encoded = pd.get_dummies(df, drop_first=True)

# Define features (X) and target (y)

X = df_encoded.drop(columns=['Price'])

y = df_encoded['Price']

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the linear regression model

model = LinearRegression()

model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test)

# Evaluate the model's performance

mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

# Output the evaluation metrics

print("Mean Squared Error (MSE):", mse)

print("R-squared (R²):", r2)

# Plot the actual vs predicted values

plt.scatter(y_test, y_pred)

plt.xlabel("Actual Prices")

plt.ylabel("Predicted Prices")

plt.title("Actual vs Predicted House Prices")

plt.show()

# Residuals plot

residuals = y_test - y_pred

sns.scatterplot(x=y_pred, y=residuals)

plt.axhline(y=0, color='r', linestyle='--')

plt.xlabel("Predicted Prices")

plt.ylabel("Residuals")

plt.title("Residuals vs Predicted Prices")

plt.show()
Q: Design a task where you acquire two distinct types of datasets: one comprising numerical data and
the other categorical data. Subsequently, you will perform Linear Regression on the dataset containing
numerical values, and Logistic Regression on the dataset containing categorical values.

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression, LogisticRegression

from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, confusion_matrix

from sklearn.preprocessing import OneHotEncoder

# Dataset 1: Numerical Data (House Prices Prediction)

# Generate a random numerical dataset

data_numerical = {

'Size': [1200, 1500, 1800, 2000, 1600, 1100, 2500, 2200, 2300, 1400],

'Bedrooms': [3, 4, 3, 5, 4, 2, 5, 4, 4, 3],

'Age': [5, 10, 15, 7, 3, 12, 8, 14, 5, 20],

'Price': [400000, 500000, 450000, 350000, 475000, 320000, 600000, 550000, 580000, 330000]

df_numerical = pd.DataFrame(data_numerical)

# Features and target variable for numerical dataset

X_numerical = df_numerical.drop(columns=['Price'])

y_numerical = df_numerical['Price']

# Split the data into training and testing sets for linear regression

X_train_num, X_test_num, y_train_num, y_test_num = train_test_split(X_numerical, y_numerical,

test_size=0.2, random_state=42)

# Linear Regression Model

linear_model = LinearRegression()

linear_model.fit(X_train_num, y_train_num)

# Predict on test data

y_pred_num = linear_model.predict(X_test_num)

# Evaluate Linear Regression Model

mse_num = mean_squared_error(y_test_num, y_pred_num)

r2_num = r2_score(y_test_num, y_pred_num)

# Dataset 2: Categorical Data (Customer Purchase Prediction)

# Generate a random categorical dataset

data_categorical = {

'Age_Group': ['18-25', '26-35', '36-45', '46-60', '18-25', '26-35', '36-45', '46-60', '18-25', '26-35'],

'Gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male'],

'Product_Category': ['A', 'B', 'A', 'B', 'B', 'A', 'B', 'A', 'B', 'A'],

'Purchased': [1, 0, 1, 0, 1, 1, 0, 1, 0, 1] # 1 = Purchased, 0 = Not Purchased

}
df_categorical = pd.DataFrame(data_categorical)

# One-Hot Encoding for categorical features

df_categorical_encoded = pd.get_dummies(df_categorical, drop_first=True)

# Features and target variable for categorical dataset

X_categorical = df_categorical_encoded.drop(columns=['Purchased'])

y_categorical = df_categorical_encoded['Purchased']

# Split the data into training and testing sets for logistic regression

X_train_cat, X_test_cat, y_train_cat, y_test_cat = train_test_split(X_categorical, y_categorical,

test_size=0.2, random_state=42)

# Logistic Regression Model

logistic_model = LogisticRegression()

logistic_model.fit(X_train_cat, y_train_cat)

# Predict on test data

y_pred_cat = logistic_model.predict(X_test_cat)

# Evaluate Logistic Regression Model

accuracy_cat = accuracy_score(y_test_cat, y_pred_cat)

conf_matrix_cat = confusion_matrix(y_test_cat, y_pred_cat)

# Output the evaluation metrics

print("Linear Regression - Numerical Dataset Evaluation:")

print(f"Mean Squared Error: {mse_num}")

print(f"R-squared: {r2_num}")

print("\nLogistic Regression - Categorical Dataset Evaluation:")

print(f"Accuracy: {accuracy_cat}")

print(f"Confusion Matrix:\n{conf_matrix_cat}")

question 4

You decide to use the algorithm for classification.The dataset is split into 80%
training data and 20% test data.
1. Load the dataset and preprocess it (handle missing values, normalize the
data if necessary).
2. Implement the KNN algorithm to classify.
3. Evaluate the model's performance by calculating the and displaying the .

# Importing necessary libraries

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.impute import SimpleImputer

from sklearn.preprocessing import StandardScaler

from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Step 1: Load and preprocess the dataset

# Load the dataset (replace 'your_dataset.csv' with the actual path to your dataset)

# For demonstration, let's create a synthetic dataset

data = {

'Feature1': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10],

'Feature2': [5, 4, 3, 2, 1, np.nan, 3, 2, 1, 5],

'Feature3': [np.nan, 1, 2, 3, 4, 5, 6, 7, 8, 9],

'Target': [0, 1, 0, 1, 0, 1, 0, 1, 0, 1] # Binary classification (0 or 1)

df = pd.DataFrame(data)

# Show the dataset

print("Original Dataset:")

print(df)

# Handling missing values

imputer = SimpleImputer(strategy='mean') # Fill missing values with the mean of the column

df_imputed = pd.DataFrame(imputer.fit_transform(df.drop(columns=['Target'])))
df_imputed['Target'] = df['Target'] # Add target column back

# Normalize/standardize the data (important for KNN)

scaler = StandardScaler()

X_scaled = scaler.fit_transform(df_imputed.drop(columns=['Target']))

y = df_imputed['Target']

# Step 2: Split the dataset into training and testing sets (80% training, 20% testing)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Step 3: Implement the KNN algorithm

# Initialize the KNN classifier (let's use k=3 for this example)

knn = KNeighborsClassifier(n_neighbors=3)

# Train the KNN model on the training data

knn.fit(X_train, y_train)

# Make predictions on the test data

y_pred = knn.predict(X_test)

# Step 4: Evaluate the model's performance

# Calculate accuracy

accuracy = accuracy_score(y_test, y_pred)

print(f"\nAccuracy: {accuracy:.2f}")

# Classification report (precision, recall, f1-score)

print("\nClassification Report:")

print(classification_report(y_test, y_pred))

# Confusion matrix

print("\nConfusion Matrix:")

print(confusion_matrix(y_test, y_pred))

Question 2

Your task is to take an unclean dataset and drop the unnecessary columns from it. Then, check
the remaining columns to see if there are any NaN (missing) values, and if there are, fill those
values. After that, apply One-hot encoding to the categorical values. Finally, calculate the mean
of the columns you are using.

import pandas as pd

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Step 1: Load the dataset

# For this demonstration, we'll create a synthetic dataset with some NaN values and categorical data.

data = {

'Feature1': [1, 2, np.nan, 4, 5],

'Feature2': [5, np.nan, 3, 2, 1],

'Feature3': ['A', 'B', 'A', 'B', 'A'],

'Feature4': [10, 20, 30, 40, 50],

'UnnecessaryFeature': ['X', 'Y', 'Z', 'X', 'Y'] # This will be dropped

# Create DataFrame

df = pd.DataFrame(data)

# Show the original dataset

print("Original Dataset:")

print(df)

# Step 2: Drop unnecessary columns (in this case, 'UnnecessaryFeature')

df_cleaned = df.drop(columns=['UnnecessaryFeature'])

# Step 3: Check for missing (NaN) values

print("\nMissing Values Before Filling:")

print(df_cleaned.isna().sum())

# Fill missing values with the mean for numerical columns

df_cleaned = df_cleaned.fillna(df_cleaned.mean())

# Check again for missing values after filling

print("\nMissing Values After Filling:")

print(df_cleaned.isna().sum())

# Step 4: Apply One-hot encoding to categorical variables

# For this example, 'Feature3' is categorical and will be encoded

df_encoded = pd.get_dummies(df_cleaned, columns=['Feature3'], drop_first=True)

# Show the DataFrame after encoding

print("\nDataset After One-hot Encoding:")

print(df_encoded)

# Step 5: Calculate the mean of the numerical columns

mean_values = df_encoded.mean()

# Show the means of the columns

print("\nMean of Columns:")

print(mean_values)

output :

Original Dataset:

Feature1 Feature2 Feature3 Feature4 UnnecessaryFeature

0 1.0 5.0 A 10 X

1 2.0 NaN B 20 Y

2 NaN 3.0 A 30 Z

3 4.0 2.0 B 40 X

4 5.0 1.0 A 50 Y

Missing Values Before Filling:

Feature1 1
Feature2 1

Feature3 0

Feature4 0

UnnecessaryFeature 0

dtype: int64

Missing Values After Filling:

Feature1 0

Feature2 0

Feature3 0

Feature4 0

dtype: int64

Dataset After One-hot Encoding:

Feature1 Feature2 Feature4 Feature3_B

0 1.0 5.0 10 0

1 2.0 2.0 20 1

2 3.0 3.0 30 0

3 4.0 2.0 40 1

4 5.0 1.0 50 0

Mean of Columns:

Feature1 3.000000

Feature2 2.750000

Feature4 30.000000

Feature3_B 0.400000

dtype: float64
OR

import pandas as pd

# Load the dataset

file_path = '/mnt/data/laptopData.csv'

data = pd.read_csv(file_path)

# Step 1: Drop unnecessary columns

data_cleaned = data.drop(columns=['Unnamed: 0'])

# Step 2: Check for missing values

# Fill missing values: Numerical columns will be filled with their mean; categorical with the mode.

for column in data_cleaned.columns:

if data_cleaned[column].dtype == 'object': # Categorical data

data_cleaned[column].fillna(data_cleaned[column].mode()[0], inplace=True)

else: # Numerical data

data_cleaned[column].fillna(data_cleaned[column].mean(), inplace=True)

# Verify no missing values remain

assert data_cleaned.isnull().sum().sum() == 0, "There are still missing values!"

# Step 3: Apply One-Hot Encoding to categorical columns

categorical_columns = data_cleaned.select_dtypes(include=['object']).columns

data_encoded = pd.get_dummies(data_cleaned, columns=categorical_columns, drop_first=True)

# Step 4: Calculate the mean of all numerical columns

column_means = data_encoded.mean()
# Display the results

print("Column Means:")

print(column_means)

Ml Manual
No ratings yet
Ml Manual
30 pages
ML JOURNAL
No ratings yet
ML JOURNAL
53 pages
Ml_datascience Manual (1)
No ratings yet
Ml_datascience Manual (1)
64 pages
Ml Lab Manual
No ratings yet
Ml Lab Manual
43 pages
Data Science Record_05
No ratings yet
Data Science Record_05
20 pages
22K61A0654_2_sasi_auto
No ratings yet
22K61A0654_2_sasi_auto
24 pages
Crop Prediction System Final Report
No ratings yet
Crop Prediction System Final Report
46 pages
ml_labmanual (3)
No ratings yet
ml_labmanual (3)
33 pages
Machine Learning
No ratings yet
Machine Learning
22 pages
Ml Lab Mannual1
No ratings yet
Ml Lab Mannual1
37 pages
Untitled Document
No ratings yet
Untitled Document
19 pages
Machine Learning Lab Manual (1)
No ratings yet
Machine Learning Lab Manual (1)
33 pages
Machine_Learning_Lab_Manaul_BCSL606
No ratings yet
Machine_Learning_Lab_Manaul_BCSL606
27 pages
ml record
No ratings yet
ml record
19 pages
Data Analytics Program
No ratings yet
Data Analytics Program
11 pages
Machine learning Lab Assignment 1
No ratings yet
Machine learning Lab Assignment 1
23 pages
Great Learnings - Data Science and Business Analytics - Project List
No ratings yet
Great Learnings - Data Science and Business Analytics - Project List
30 pages
ml manual
No ratings yet
ml manual
9 pages
ML Final Prac
No ratings yet
ML Final Prac
47 pages
Machine Learning Strategies
No ratings yet
Machine Learning Strategies
59 pages
Machine Learning Lab Manual (1) (1)
No ratings yet
Machine Learning Lab Manual (1) (1)
26 pages
Aiml Practical
No ratings yet
Aiml Practical
17 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
18 pages
ML MANUAL
No ratings yet
ML MANUAL
24 pages
23BCE7092_ML_Lab_Assignment[1]
No ratings yet
23BCE7092_ML_Lab_Assignment[1]
14 pages
ML full for print new 1
No ratings yet
ML full for print new 1
38 pages
ML Shristi File
No ratings yet
ML Shristi File
49 pages
Praveen Ai
No ratings yet
Praveen Ai
6 pages
ML Lab Manual
No ratings yet
ML Lab Manual
24 pages
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
No ratings yet
Import Pandas As PD DF PD - Read - CSV ("Titanic - Train - CSV") DF - Head
20 pages
ML Final-1
No ratings yet
ML Final-1
7 pages
23BCE7199 ML Lab Assignment[1]
No ratings yet
23BCE7199 ML Lab Assignment[1]
15 pages
MLLabManual
No ratings yet
MLLabManual
24 pages
MachineLearning
No ratings yet
MachineLearning
10 pages
ml.yogesh
No ratings yet
ml.yogesh
23 pages
machinelearning
No ratings yet
machinelearning
26 pages
CP4252 Lab Manual(1)
No ratings yet
CP4252 Lab Manual(1)
13 pages
Machine learning lab manual
No ratings yet
Machine learning lab manual
22 pages
ML lap
No ratings yet
ML lap
23 pages
ML lab manual
No ratings yet
ML lab manual
25 pages
mlalllabprgs
No ratings yet
mlalllabprgs
17 pages
27_KrishParasShah
No ratings yet
27_KrishParasShah
17 pages
DA Practicle Answers Easyw
No ratings yet
DA Practicle Answers Easyw
30 pages
Ml Short Code_under Updating
No ratings yet
Ml Short Code_under Updating
4 pages
Experiment Number: 3: Aim:-Study of The Linear Regression in The Machine Learning Using The Boston Housing Dataset. 1)
No ratings yet
Experiment Number: 3: Aim:-Study of The Linear Regression in The Machine Learning Using The Boston Housing Dataset. 1)
14 pages
Machine learning lab manual
No ratings yet
Machine learning lab manual
9 pages
ML Lab Manual
No ratings yet
ML Lab Manual
14 pages
Zerox Ready
No ratings yet
Zerox Ready
21 pages
Data_preprocessing_example_programs1
No ratings yet
Data_preprocessing_example_programs1
9 pages
Thesis ppt
No ratings yet
Thesis ppt
37 pages
LAB MANUAL For Machine Learning
No ratings yet
LAB MANUAL For Machine Learning
15 pages
hemraj_python_ass1
No ratings yet
hemraj_python_ass1
7 pages
External
No ratings yet
External
11 pages
ML lab manual
No ratings yet
ML lab manual
13 pages
Material+for+Student+CAIEC™+(V062021A)+EN
No ratings yet
Material+for+Student+CAIEC™+(V062021A)+EN
189 pages
Machine Learning Lab Manual
No ratings yet
Machine Learning Lab Manual
23 pages
CP4252 Machine Learning Lab Manual
No ratings yet
CP4252 Machine Learning Lab Manual
26 pages
DA_012307
No ratings yet
DA_012307
8 pages
Machine Learning Lab Manual 06
100% (1)
Machine Learning Lab Manual 06
8 pages
Data analytics
No ratings yet
Data analytics
10 pages
dsbda_5
No ratings yet
dsbda_5
4 pages
LM04 Extensions of Multiple Regression IFT Notes
No ratings yet
LM04 Extensions of Multiple Regression IFT Notes
17 pages
Contingency Table Analysis Methods and Implementation Using R Full eBook Access
100% (8)
Contingency Table Analysis Methods and Implementation Using R Full eBook Access
17 pages
ML Lab Programs (1)
No ratings yet
ML Lab Programs (1)
9 pages
193-Article Text-479-1-10-20240215
No ratings yet
193-Article Text-479-1-10-20240215
20 pages
Decision and Choice Luce's Choice Axiom PDF
100% (1)
Decision and Choice Luce's Choice Axiom PDF
18 pages
About The Dataset - Car Evaluation Dataset (UCI Machine Learning Repository
No ratings yet
About The Dataset - Car Evaluation Dataset (UCI Machine Learning Repository
5 pages
Advanced Statistical Methods using R Notes
No ratings yet
Advanced Statistical Methods using R Notes
55 pages
6 Loglinear Models Beamer-Online PDF
No ratings yet
6 Loglinear Models Beamer-Online PDF
112 pages
ML Unit-2 Final
No ratings yet
ML Unit-2 Final
32 pages
Chapter - Five - Limited Dependent Variable Models
No ratings yet
Chapter - Five - Limited Dependent Variable Models
75 pages
Multinomial Logistic Regression Basic Relationships
No ratings yet
Multinomial Logistic Regression Basic Relationships
73 pages
Annalsats Articles in Press. Published February 03, 2023 As 10.1513/Annalsats.202211-946Oc
100% (1)
Annalsats Articles in Press. Published February 03, 2023 As 10.1513/Annalsats.202211-946Oc
33 pages
FRA LOng Quiz
0% (1)
FRA LOng Quiz
10 pages
Predictive Clustering For Credit Scoring
100% (1)
Predictive Clustering For Credit Scoring
5 pages
Agadjanian 2015 Women S Religious Authority in A Sub Saharan Setting Dialectics of Empowerment and Dependency
No ratings yet
Agadjanian 2015 Women S Religious Authority in A Sub Saharan Setting Dialectics of Empowerment and Dependency
27 pages
Cognitive Computing Model Brief - Hospital Admissions and ED Visits
No ratings yet
Cognitive Computing Model Brief - Hospital Admissions and ED Visits
9 pages
Logistic Regression
No ratings yet
Logistic Regression
5 pages
Humez Et Al - Methane - Probability - VF - EST - 2019
No ratings yet
Humez Et Al - Methane - Probability - VF - EST - 2019
9 pages
Forgiveness and Well-Being
No ratings yet
Forgiveness and Well-Being
17 pages
Saghar 8612-2
No ratings yet
Saghar 8612-2
22 pages
Berry 2011
No ratings yet
Berry 2011
14 pages
Structural Health Monitoring of Exterior Beam-Column Subassemblies (Feb 2022)
No ratings yet
Structural Health Monitoring of Exterior Beam-Column Subassemblies (Feb 2022)
17 pages
Econometric Modelling: Module - 2
No ratings yet
Econometric Modelling: Module - 2
17 pages
Attributable Fractions (As11) : Course: PG Diploma/ MSC Epidemiology
No ratings yet
Attributable Fractions (As11) : Course: PG Diploma/ MSC Epidemiology
37 pages
A Note On The Tobit Likelihood Function
No ratings yet
A Note On The Tobit Likelihood Function
2 pages
STIPS Course Outline
No ratings yet
STIPS Course Outline
4 pages
Logistic Regression Cost Function
No ratings yet
Logistic Regression Cost Function
1 page
Noc20-Cs28 Week 07 Assignment 02
No ratings yet
Noc20-Cs28 Week 07 Assignment 02
6 pages
Computer Engineering Laboratory Solution Primer
From Everand
Computer Engineering Laboratory Solution Primer
Karan Bhandari
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.