
Machine Learning Lab Manual - Record

The document outlines a series of machine learning experiments implemented in Python, including probability calculations using Bayes' rule, data extraction from databases and Excel, and various algorithms like k-nearest neighbors, linear regression, K-means clustering, Naïve Bayes classification, and genetic algorithms. Each experiment includes a detailed algorithm, program code, and expected outputs. The content serves as a practical guide for applying machine learning techniques using Python.

TABLE OF CONTENTS

S.No  Title

1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days in a week, the probability that it is Friday is 20%. What is the probability that a student is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
2. Extract the data from a database using Python
3. Implement k-nearest neighbours classification using Python
4. Implement linear regression using Python
5. Implement K-Means clustering using Python
6. Implement Naïve Bayes theorem to classify the English text
7. Implement an algorithm to demonstrate the significance of Genetic Algorithm in Python
8. Implement an algorithm to demonstrate Back Propagation Algorithm in Python
9. Implementing FIND-S algorithm using Python
10. Implementing Candidate Elimination algorithm using Python

Machine Learning Lab

Experiment 1
Date:

1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school days in a week, the probability that it is Friday is 20%. What is the probability that a student is absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)
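As a quick check of the expected answer, the definition of conditional probability can be applied directly. The symbols A (a student is absent) and F (today is Friday) are labels introduced here for illustration only.

```latex
P(A \mid F) = \frac{P(A \cap F)}{P(F)} = \frac{0.03}{0.20} = 0.15 = 15\%
```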

ALGORITHM:

Step 1: Read the joint probability that it is Friday and that a student is absent, P(F and A).
Step 2: Read the probability that it is Friday, P(F).
Step 3: Compute the conditional probability P(A | F) = P(F and A) / P(F).
Step 4: Print the result.
Step 5: End.

PROGRAM:

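# Joint probability P(Friday and absent), e.g. 0.03
PFIA = float(input("Enter probability that it is Friday and that a student is absent="))
# Probability that it is Friday, e.g. 0.2
PF = float(input("probability that it is Friday="))
# Conditional probability P(absent | Friday)
PABF = PFIA / PF
print("probability that a student is absent given that today is Friday using conditional probabilities=", PABF)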

OUTPUT:

Enter probability that it is Friday and that a student is absent= 0.03
probability that it is Friday= 0.2
probability that a student is absent given that today is Friday using conditional probabilities= 0.15

Result: -

Experiment 2
Date:

2. Extract the data from a database using Python

ALGORITHM:

Step 1: Connect to MySQL from Python
Step 2: Define a SQL SELECT query
Step 3: Get a cursor object from the connection
Step 4: Execute the SELECT query using the execute() method
Step 5: Extract all rows from the result
Step 6: Iterate over each row
Step 7: Close the cursor object and the database connection object
Step 8: End.
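The steps above can be tried without a MySQL server. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the helper name fetch_guests is hypothetical. The table name and sample row mirror the MyGuests table created in the procedure of this experiment.

```python
import sqlite3

def fetch_guests(connection):
    """Run a SELECT query and return all rows as a list of tuples."""
    query = "SELECT id, name, email FROM MyGuests"   # define the SELECT query
    cursor = connection.cursor()                      # get a cursor object
    try:
        cursor.execute(query)                         # execute the query
        return cursor.fetchall()                      # extract all rows
    finally:
        cursor.close()                                # always close the cursor

# Connect: an in-memory SQLite database stands in for MySQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyGuests (id INT, name VARCHAR(20), email VARCHAR(20))")
conn.execute("INSERT INTO MyGuests (id, name, email) VALUES (1, 'sairam', 'xyz@abc.com')")
conn.commit()

rows = fetch_guests(conn)
for row in rows:                                      # iterate over each row
    print(row)

conn.close()                                          # close the connection
```

With MySQL, the same flow applies; only the connection call and the driver module differ.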

PROCEDURE

CREATING A DATABASE IN MYSQL AS FOLLOWS:

CREATE DATABASE myDB;
SHOW DATABASES;
USE myDB;
CREATE TABLE MyGuests (id INT, name VARCHAR(20), email VARCHAR(20));
SHOW TABLES;
INSERT INTO MyGuests (id, name, email) VALUES (1, "sairam", "xyz@abc.com");
SELECT * FROM MyGuests;

We need to install a MySQL connector package to connect Python with MySQL. Use the command below to install it on your system.

pip install mysql-connector-python-rf

PYTHON SOURCE CODE:


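import mysql.connector

# Connect to the MySQL server and select the myDB database
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="myDB"
)

# Get a cursor and run the SELECT query
mycursor = mydb.cursor()
mycursor.execute("SELECT * FROM MyGuests")

# Fetch all rows and print them one by one
myresult = mycursor.fetchall()

for x in myresult:
    print(x)

# Close the cursor and the connection (Step 7 of the algorithm)
mycursor.close()
mydb.close()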

OUTPUT:

Extracting data from Excel sheet using Python


Step 1: First convert the dataset in the Excel sheet to a CSV file using online resources, then execute the following program. Assume the Excel dataset consists of 14 input columns and 3 output columns (C1, C2, C3).

Python Source Code:

import pandas as pd

dataset = pd.read_csv("Mul_Label_Dataset.csv", delimiter=',')
print(dataset)  # Prints the entire dataset
X = dataset[['Send','call','DC','IFMSCV','MSCV','BA','MBZ','TxO','RS','CA','AL','IFWL','WWL','FWL']].values
Y = dataset[['C1','C2','C3']].values
print(Y)  # Prints output values
print(X)  # Prints input values
X1 = dataset[['Send','call','DC','IFMSCV','MSCV']].values
print(X1)  # Prints first 5 columns of input values
print(X[0:5])  # Prints only first 5 rows of input values
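Where pandas is not available, the same split into input and output columns can be sketched with the standard csv module. This is only an illustration: the inline data is a hypothetical miniature of the dataset (three of the input columns plus C1, C2, C3), and the helper name split_columns is not part of the manual.

```python
import csv
import io

# Hypothetical miniature of the dataset: 3 input columns and the 3 output columns
csv_text = """Send,call,DC,C1,C2,C3
1,0,5,1,0,0
0,1,7,0,1,0
1,1,2,0,0,1
"""

input_columns = ["Send", "call", "DC"]
output_columns = ["C1", "C2", "C3"]

def split_columns(text, inputs, outputs):
    """Return (X, Y): lists of integer rows for the input and output columns."""
    reader = csv.DictReader(io.StringIO(text))
    X, Y = [], []
    for record in reader:
        X.append([int(record[name]) for name in inputs])
        Y.append([int(record[name]) for name in outputs])
    return X, Y

X, Y = split_columns(csv_text, input_columns, output_columns)
print(X[0:2])   # first two rows of the input values
print(Y)        # all output values
```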

OUTPUT SCREENS:
Excel Format:
CSV Format:

Result: -

Experiment 3
Date:

3. Implement k-nearest neighbours classification using Python

ALGORITHM:

Step 1: Load the data
Step 2: Initialize the value of k
Step 3: To get the predicted class, iterate from 1 to the total number of training data points:
i) Calculate the distance between the test data and each row of training data. Here we use Euclidean distance as our distance metric since it is the most popular method. Other metrics that can be used are Chebyshev, cosine, etc.
ii) Sort the calculated distances in ascending order based on distance values
iii) Get the top k rows from the sorted array
iv) Get the labels of the selected k entries
v) Return the prediction:
• If regression, return the mean of the k labels
• If classification, return the mode of the k labels
Step 4: End.
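The program in this experiment lists the nearest neighbours but does not take the final vote. A minimal pure-Python sketch of the full procedure, including the majority vote, might look as follows. The toy points, labels, and the function names euclidean and knn_predict are assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training row, paired with the row's label
    distances = [(euclidean(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    # Sort ascending by distance and keep the top k rows
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Return the mode of the k labels (classification case)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups in 2-D
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
print(prediction)
```

For regression, the last step would return the mean of the k labels instead of the mode.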

PROGRAM

import numpy as np
from sklearn import datasets

# Load the iris dataset
iris = datasets.load_iris()
data = iris.data
labels = iris.target

# Show a few samples with their labels
for i in [0, 79, 99, 101]:
    print(f"index: {i:3}, features: {data[i]}, label: {labels[i]}")

# Shuffle the indices and hold out the last 12 samples as the test set
np.random.seed(42)
indices = np.random.permutation(len(data))
n_training_samples = 12
learn_data = data[indices[:-n_training_samples]]
learn_labels = labels[indices[:-n_training_samples]]
test_data = data[indices[-n_training_samples:]]
test_labels = labels[indices[-n_training_samples:]]

print("The first samples of our learn set:")
print(f"{'index':7s}{'data':20s}{'label':3s}")
for i in range(5):
    print(f"{i:4d} {learn_data[i]} {learn_labels[i]:3}")

print("The first samples of our test set:")
print(f"{'index':7s}{'data':20s}{'label':3s}")
for i in range(5):
    print(f"{i:4d} {test_data[i]} {test_labels[i]:3}")

# The following code is only necessary to visualize the data of our learn set
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Group the learn set by class; the third axis is the sum of the last two features
X = []
for iclass in range(3):
    X.append([[], [], []])
    for i in range(len(learn_data)):
        if learn_labels[i] == iclass:
            X[iclass][0].append(learn_data[i][0])
            X[iclass][1].append(learn_data[i][1])
            X[iclass][2].append(sum(learn_data[i][2:]))

colours = ("r", "g", "y")
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for iclass in range(3):
    ax.scatter(X[iclass][0], X[iclass][1], X[iclass][2], c=colours[iclass])
plt.show()

def distance(instance1, instance2):
    """Calculates the Euclidean distance between two instances."""
    return np.linalg.norm(np.subtract(instance1, instance2))

def get_neighbors(training_set, labels, test_instance, k, distance):
    """
    get_neighbors calculates a list of the k nearest neighbors of an instance 'test_instance'.
    The function returns a list of k 3-tuples. Each 3-tuple consists of (instance, dist, label).
    """
    distances = []
    for index in range(len(training_set)):
        dist = distance(test_instance, training_set[index])
        distances.append((training_set[index], dist, labels[index]))
    distances.sort(key=lambda x: x[1])
    neighbors = distances[:k]
    return neighbors

for i in range(5):
    neighbors = get_neighbors(learn_data, learn_labels, test_data[i], 3, distance=distance)
    print("Index: ", i, '\n',
          "Testset Data: ", test_data[i], '\n',
          "Testset Label: ", test_labels[i], '\n',
          "Neighbors: ", neighbors, '\n')
OUTPUT:

Result: -

Experiment 4
Date:

4. Implement linear regression using Python

ALGORITHM:

Step 1: Create a dataset for linear regression
Step 2: Find the hypothesis of linear regression
Step 3: Train a linear regression model
Step 4: Evaluate the model
Step 5: Scikit-learn implementation
Step 6: End
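For a single input variable, the fitted line can also be computed in closed form with ordinary least squares. The sketch below does this in plain Python and evaluates it with the root mean squared error. The function names fit_line and rmse and the noise-free sample data are assumptions for illustration; they are not taken from the manual's program.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on the given points."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5

# Hypothetical noise-free data lying exactly on y = 2 + 3x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, rmse(xs, ys, slope, intercept))
```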

PROGRAM:

# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate a random data set
np.random.seed(0)
x = np.random.rand(100, 1)  # 2-D array with 100 rows, each row containing 1 random number
y = 2 + 3 * x + np.random.rand(100, 1)

regression_model = LinearRegression()  # Model initialization
regression_model.fit(x, y)  # Fit the data (train the model)
y_predicted = regression_model.predict(x)  # Predict

# Model evaluation
rmse = np.sqrt(mean_squared_error(y, y_predicted))  # square root of the MSE gives the RMSE
r2 = r2_score(y, y_predicted)

# Printing values
print('Slope:', regression_model.coef_)
print('Intercept:', regression_model.intercept_)
print('Root mean squared error: ', rmse)
print('R2 score: ', r2)

# Plotting values
# Data points
plt.scatter(x, y, s=10)
plt.xlabel('x-Values from 0-1')
plt.ylabel('y-values from 2-6')
# Predicted values
plt.plot(x, y_predicted, color='r')
plt.show()

OUTPUT:

Result: -

Experiment 5
Date:

5. Implement K-Means clustering using Python

ALGORITHM:

Step 1: Read the given data sample into X
Step 2: Train on the dataset with K=5
Step 3: Find the optimal number of clusters (k) in the dataset using the Elbow method
Step 4: Train on the dataset with K=3 (optimal K value)
Step 5: Compare results
Step 6: End
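To see what the Elbow method measures, the sketch below runs a small pure-Python version of the standard k-means iteration (Lloyd's algorithm) on one-dimensional data and records the error (inertia) for several values of k. The data, the starting centroids, and the function name kmeans_1d are assumptions chosen for illustration; the manual's program relies on scikit-learn instead.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centroids.

    Returns (final centroids, inertia), where inertia is the sum of squared
    distances of the points to their nearest centroid.
    """
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = sum(members) / len(members)
    inertia = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical data with three obvious groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

# Inertia for k = 1..4 with hand-picked (assumed) starting centroids
starts = {1: [5.0], 2: [1.0, 9.0], 3: [1.0, 5.0, 9.0], 4: [1.0, 5.0, 9.0, 9.2]}
errors = {k: kmeans_1d(points, init)[1] for k, init in starts.items()}
for k in sorted(errors):
    print(k, round(errors[k], 3))
```

The error drops sharply up to k=3 and barely changes afterwards, which is the "elbow" that identifies the number of clusters in this toy data.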

PROGRAM:

# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets

# Read dataset
df = datasets.load_iris()
x = df.data
y = df.target

print(x)
print(y)

# Let's try with k=5 initially
kmeans5 = KMeans(n_clusters=5)
y_kmeans5 = kmeans5.fit_predict(x)
print(y_kmeans5)

print(kmeans5.cluster_centers_)

# To find the optimal number of clusters (k) in a dataset
Error = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i).fit(x)  # fit once for each candidate k
    Error.append(kmeans.inertia_)  # within-cluster sum of squared distances

plt.plot(range(1, 11), Error)
plt.title('Elbow method')
plt.xlabel('No of clusters')
plt.ylabel('Error')
plt.show()

# Now try with k=3 finally
kmeans3 = KMeans(n_clusters=3)
y_kmeans3 = kmeans3.fit_predict(x)
print(y_kmeans3)

print(kmeans3.cluster_centers_)

OUTPUT:

Result: -
Experiment 6
Date:

6. Implement Naïve Bayes theorem to classify the English text

E.g. the first transformed row is [0 1 1 1 0 0 1 0 1] and the unique vocabulary is ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']. This means that the words "document", "first", "is", "the" and "this" each appeared once in the initial text string (i.e. 'This is the first document.').
In our example, we will convert the collection of text documents (train and test sets) into a matrix of token-based features. The program in this experiment uses TfidfVectorizer, which weights the token counts by TF-IDF.
To implement that text transformation we will use the make_pipeline function. It internally transforms the text data and then fits the model on the transformed data.
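The token-count example above can be reproduced with a few lines of plain Python. This is a simplified sketch, not the library's implementation: the tokenizer (lower-casing and keeping words of two or more characters) is an assumption, the function name count_matrix is hypothetical, and only the first document comes from the text; the other three are assumed so that the vocabulary matches the one listed.

```python
import re

def count_matrix(documents):
    """Return (vocabulary, rows) where rows[i][j] counts word j in document i."""
    # Lower-case and keep words of two or more characters (an assumed tokenizer)
    tokenized = [re.findall(r"\b\w\w+\b", doc.lower()) for doc in documents]
    # Sorted list of all distinct tokens across the documents
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = [[tokens.count(word) for word in vocabulary] for tokens in tokenized]
    return vocabulary, rows

# Only the first document is quoted in the text; the rest are assumed
documents = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

vocabulary, rows = count_matrix(documents)
print(vocabulary)
print(rows[0])
```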

Source Code
print("NAIVE BAYES ENGLISH TEXT CLASSIFICATION")

import numpy as np, pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import confusion_matrix, accuracy_score

sns.set()  # use seaborn plotting style

# Load the dataset
data = fetch_20newsgroups()
# Get the text categories
text_categories = data.target_names
# Define the training set
train_data = fetch_20newsgroups(subset="train", categories=text_categories)
# Define the test set
test_data = fetch_20newsgroups(subset="test", categories=text_categories)

print("We have {} unique classes".format(len(text_categories)))
print("We have {} training samples".format(len(train_data.data)))
print("We have {} test samples".format(len(test_data.data)))

# Let's have a look at some of the data, e.g. the sample at index 5
#print(test_data.data[5])

# Build the model
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
# Train the model using the training data
model.fit(train_data.data, train_data.target)
# Predict the categories of the test data
predicted_categories = model.predict(test_data.data)

print(np.array(test_data.target_names)[predicted_categories])

# Plot the confusion matrix
mat = confusion_matrix(test_data.target, predicted_categories)
sns.heatmap(mat.T, square=True, annot=True, fmt="d",
            xticklabels=train_data.target_names, yticklabels=train_data.target_names)
plt.xlabel("true labels")
plt.ylabel("predicted label")
plt.show()
print("The accuracy is {}".format(accuracy_score(test_data.target, predicted_categories)))

OUTPUT:
Experiment 7
Date:

7. Implement an algorithm to demonstrate the significance of Genetic Algorithm in Python

Given a target string, the goal is to produce the target string starting from a random string of the same length. In the following implementation, the following analogies are made:
• Characters A-Z, a-z, 0-9 and other special symbols are considered genes
• A string generated from these characters is considered a chromosome/solution/individual

The fitness score is the number of characters that differ from the characters of the target string at the same index. So an individual with a lower fitness value is given more preference.
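The fitness score described above can be illustrated in isolation. In this sketch the function name fitness and the two imperfect candidate strings are assumptions made for illustration; the target string is the one used by the program in this experiment.

```python
def fitness(chromosome, target):
    """Number of positions at which chromosome and target differ (lower is better)."""
    return sum(1 for gene, wanted in zip(chromosome, target) if gene != wanted)

target = "I love GeeksforGeeks"
# Hypothetical candidates of the same length as the target
candidates = ["X love geeksforGeeks", "I love GeeksforGeekz", "I love GeeksforGeeks"]

# Rank candidates from fittest (fewest mismatches) to least fit
ranked = sorted(candidates, key=lambda c: fitness(c, target))
for c in ranked:
    print(fitness(c, target), c)
```

A fitness of 0 means the candidate equals the target, which is the stopping condition of the search.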

Source Code
# Python3 program to create target string, starting from
# random string using Genetic Algorithm

import random

# Number of individuals in each generation
POPULATION_SIZE = 100

# Valid genes
GENES = '''abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 1234567890, .-;:_!"#%&/()=?@${[]}'''

# Target string to be generated
TARGET = "I love GeeksforGeeks"

class Individual(object):
    '''
    Class representing individual in population
    '''
    def __init__(self, chromosome):
        self.chromosome = chromosome
        self.fitness = self.cal_fitness()

    @classmethod
    def mutated_genes(self):
        '''
        create random genes for mutation
        '''
        global GENES
        gene = random.choice(GENES)
        return gene

    @classmethod
    def create_gnome(self):
        '''
        create chromosome or string of genes
        '''
        global TARGET
        gnome_len = len(TARGET)
        return [self.mutated_genes() for _ in range(gnome_len)]

    def mate(self, par2):
        '''
        Perform mating and produce new offspring
        '''
        # chromosome for offspring
        child_chromosome = []
        for gp1, gp2 in zip(self.chromosome, par2.chromosome):

            # random probability
            prob = random.random()

            # if prob is less than 0.45, insert gene from parent 1
            if prob < 0.45:
                child_chromosome.append(gp1)

            # if prob is between 0.45 and 0.90, insert gene from parent 2
            elif prob < 0.90:
                child_chromosome.append(gp2)

            # otherwise insert random gene (mutate), for maintaining diversity
            else:
                child_chromosome.append(self.mutated_genes())

        # create new Individual (offspring) using generated chromosome
        return Individual(child_chromosome)

    def cal_fitness(self):
        '''
        Calculate fitness score, it is the number of
        characters in string which differ from target string.
        '''
        global TARGET
        fitness = 0
        for gs, gt in zip(self.chromosome, TARGET):
            if gs != gt:
                fitness += 1
        return fitness

# Driver code
def main():
    global POPULATION_SIZE

    # current generation
    generation = 1

    found = False
    population = []

    # create initial population
    for _ in range(POPULATION_SIZE):
        gnome = Individual.create_gnome()
        population.append(Individual(gnome))

    while not found:

        # sort the population in increasing order of fitness score
        population = sorted(population, key=lambda x: x.fitness)

        # if the individual with the lowest fitness score has score 0,
        # we have reached the target, so break the loop
        if population[0].fitness <= 0:
            found = True
            break

        # Otherwise generate new offspring for the new generation
        new_generation = []

        # Perform elitism: 10% of the fittest population
        # goes to the next generation
        s = int((10*POPULATION_SIZE)/100)
        new_generation.extend(population[:s])

        # From the fittest 50% of the population, individuals
        # will mate to produce offspring
        s = int((90*POPULATION_SIZE)/100)
        for _ in range(s):
            parent1 = random.choice(population[:50])
            parent2 = random.choice(population[:50])
            child = parent1.mate(parent2)
            new_generation.append(child)

        population = new_generation

        print("Generation: {}\tString: {}\tFitness: {}".format(
            generation, "".join(population[0].chromosome), population[0].fitness))

        generation += 1

    print("Generation: {}\tString: {}\tFitness: {}".format(
        generation, "".join(population[0].chromosome), population[0].fitness))

if __name__ == '__main__':
    main()

OUTPUT:

Result: -
Experiment 8
Date:

8. Implement an algorithm to demonstrate Back Propagation Algorithm in Python

Source Code:

import numpy
import matplotlib.pyplot as plt

def sigmoid(sop):
    return 1.0/(1+numpy.exp(-1*sop))

def error(predicted, target):
    return numpy.power(predicted-target, 2)

def error_predicted_deriv(predicted, target):
    return 2*(predicted-target)

def sigmoid_sop_deriv(sop):
    return sigmoid(sop)*(1.0-sigmoid(sop))

def sop_w_deriv(x):
    return x

def update_w(w, grad, learning_rate):
    return w - learning_rate*grad

x1 = 0.1
x2 = 0.4

target = 0.7
learning_rate = 0.01

w1 = numpy.random.rand()
w2 = numpy.random.rand()

print("Initial W : ", w1, w2)

predicted_output = []
network_error = []

old_err = 0
for k in range(80000):
    # Forward pass
    y = w1*x1 + w2*x2
    predicted = sigmoid(y)
    err = error(predicted, target)

    predicted_output.append(predicted)
    network_error.append(err)

    # Backward pass (chain rule): error -> prediction -> sum of products -> weight
    g1 = error_predicted_deriv(predicted, target)
    g2 = sigmoid_sop_deriv(y)

    g3w1 = sop_w_deriv(x1)
    g3w2 = sop_w_deriv(x2)

    gradw1 = g3w1*g2*g1
    gradw2 = g3w2*g2*g1

    # Gradient descent update of both weights
    w1 = update_w(w1, gradw1, learning_rate)
    w2 = update_w(w2, gradw2, learning_rate)

    #print(predicted)

plt.figure()
plt.plot(network_error)
plt.title("Iteration Number vs Error")
plt.xlabel("Iteration Number")
plt.ylabel("Error")
plt.show()

plt.figure()
plt.plot(predicted_output)
plt.title("Iteration Number vs Prediction")
plt.xlabel("Iteration Number")
plt.ylabel("Prediction")
plt.show()

OUTPUT:
Initial W : 0.08698924153243281 0.4532713230157145

Result: -

Experiment 9
Date:

9. Implementing FIND-S algorithm using Python

Training Database

Algorithm

1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
   For each attribute constraint a_i in h
      If the constraint a_i is satisfied by x
      Then do nothing
      Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
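The algorithm above can be sketched on in-memory data before reading from a CSV file. The training rows below are an assumed EnjoySport-style sample, since the manual's training database is not reproduced in the text, and the function name find_s is hypothetical.

```python
def find_s(examples):
    """FIND-S: return the maximally specific hypothesis covering the positives.

    Each example is (attribute_values, is_positive). '0' marks "no value
    accepted yet" and '?' marks "any value accepted".
    """
    num_attributes = len(examples[0][0])
    hypothesis = ['0'] * num_attributes          # most specific hypothesis
    for values, is_positive in examples:
        if not is_positive:                      # negative examples are ignored
            continue
        for j, value in enumerate(values):
            if hypothesis[j] == '0':             # first positive: copy its value
                hypothesis[j] = value
            elif hypothesis[j] != value:         # conflict: generalize to '?'
                hypothesis[j] = '?'
    return hypothesis

# Assumed EnjoySport-style training data (not taken from the manual)
examples = [
    (['sunny', 'warm', 'normal', 'strong', 'warm', 'same'], True),
    (['sunny', 'warm', 'high', 'strong', 'warm', 'same'], True),
    (['rainy', 'cold', 'high', 'strong', 'warm', 'change'], False),
    (['sunny', 'warm', 'high', 'strong', 'cool', 'change'], True),
]

print(find_s(examples))
```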

Hypothesis Construction

Source Code:

import csv

# Read the training examples from the CSV file
a = []
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)
    print(a)

print("\n The total number of training instances are : ", len(a))
num_attribute = len(a[0]) - 1

print("\n The initial hypothesis is : ")
hypothesis = ['0'] * num_attribute
print(hypothesis)

for i in range(0, len(a)):
    if a[i][num_attribute] == 'TRUE':  # for each positive example only
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is : \n".format(i+1), hypothesis)

print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)

OUTPUT:

Result: -

Experiment 10
Date:

10. Implementing Candidate Elimination algorithm using Python

Training Database

Algorithm

Source Code:
import csv

with open("enjoysport.csv") as f:
    csv_file = csv.reader(f)
    data = list(csv_file)

print(data)
print(" ------------------- ")
s = data[1][:-1]  # extracting one row or instance or record
g = [['?' for i in range(len(s))] for j in range(len(s))]

print(s)
print(" ------------------- ")
print(g)
print(" ------------------- ")

# enumerate gives the step number directly, which also works for duplicate rows
for step, i in enumerate(data, start=1):
    if i[-1] == "TRUE":  # For each positive training record or instance
        for j in range(len(s)):
            if i[j] != s[j]:
                s[j] = '?'
                g[j][j] = '?'

    elif i[-1] == "FALSE":  # For each negative training record or example
        for j in range(len(s)):
            if i[j] != s[j]:
                g[j][j] = s[j]
            else:
                g[j][j] = "?"
    print("\nSteps of Candidate Elimination Algorithm", step)
    print(s)
    print(g)

# Keep only the general hypotheses that still constrain at least one attribute
gh = []
for i in g:
    for j in i:
        if j != '?':
            gh.append(i)
            break
print("\nFinal specific hypothesis:\n", s)
print("\nFinal general hypothesis:\n", gh)

OUTPUT:

Result: -
