
R22 B.Tech. III Year I Sem.

Subject Code : AM502PC / IT503PC

Machine Learning

LECTURE NOTES (QA)

UNIT – II
1. Explain the Multi-layer Perceptron with an example.
Multi-layer Perceptron (MLP) : A multilayer perceptron (MLP) is a modern feedforward artificial neural network consisting of fully connected neurons with nonlinear activation functions. These networks are trained using the backpropagation method.
A Multi-layer Perceptron (MLP) is a type of artificial neural network that consists of multiple layers of neurons, typically including an input layer, one or more hidden layers, and an output layer. Each layer is fully connected to the next one, meaning each neuron in one layer is connected to every neuron in the next layer.

There are two phases in MLP training: 1. Forward Propagation 2. Backward Propagation

• 1. Going Forwards (Forward Propagation) : To generate the output, the input data is fed in the forward direction only. The data should not flow in the reverse direction during output generation. This type of network configuration is known as a feed-forward network.
• Purpose: To compute the predicted output of the network given the input data.

Forward Propagation : Input Layer → Hidden Layers → Output Layer

1. Input Layer : The input data is fed in the forward direction through the network.
2. Hidden Layers : Each hidden layer accepts the input data, processes it as per the activation function, and passes the result to the successive layer.
The original perceptron used the Heaviside step function as its activation function.

The backpropagation algorithm requires differentiable activation functions, so modern MLPs use continuous activation functions such as the sigmoid or ReLU (rectified linear unit).

3. Output Layer: The final layer, which produces the output of the network.
This output is a prediction or classification based on the input data.
 The main goal of forward propagation is to compute the output of the neural network for a
given input.

Error Calculation: After the forward propagation, the error or loss is calculated by comparing
the network's output with the actual target values.
 for regression problems : Common loss functions include Mean Squared Error (MSE)
 for classification problems : Cross-Entropy Loss
The primary goal of backpropagation is to minimize the error by iteratively adjusting the weights of the network.

Error = Target Output − Actual Output
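The forward pass and error calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the exact network from the notes; all weights, inputs, and the target are assumed values:

```python
import numpy as np

# A minimal sketch of one forward pass through a 2-3-1 MLP with sigmoid
# activations, followed by the error calculation. The weights, inputs,
# and target below are made-up illustrative values.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2])                  # input vector
W1 = np.array([[0.1, 0.4, -0.3],
               [0.8, -0.6, 0.2]])         # input -> hidden weights (2x3)
W2 = np.array([[0.3], [0.9], [-0.5]])     # hidden -> output weights (3x1)

h = sigmoid(x @ W1)                       # hidden layer activations
y = sigmoid(h @ W2)                       # network output (prediction)

t = np.array([1.0])                       # target value
loss = 0.5 * np.sum((y - t) ** 2)         # sum-of-squares error
print("output:", y, "error:", loss)
```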

2. Going Backwards (Backward Propagation) : The error signal is propagated backward from the output layer to the earlier layers of the network. The weights and biases of the network are adjusted based on the error signal.

Gradient descent : The gradient descent approach moves the weights and biases through weight space in the direction that most reduces the error. A small learning rate results in smaller changes to the weights and biases, while a large learning rate can cause unstable changes.

In machine learning, backpropagation is a training algorithm that uses a gradient descent approach to calculate the error signal in a neural network and update its weights and biases.

Backward Propagation :

Output Layer → Hidden Layers → Input Layer

Backward Pass (Weight Update)

• Purpose: To update the network's weights and biases to minimize the error between the
predicted output and the actual target.

• Steps:

• Calculate Error: Determine the difference between the predicted output and the
actual target.

• Compute Gradients: Calculate the gradient of the error with respect to each
weight and bias in the network using the chain rule. This involves:

• Output Layer: Compute the gradient of the error with respect to the output
layer's inputs (derivative of the loss function with respect to the network's output).

• Hidden Layers: Propagate the error back through the network, computing the
gradient of the error with respect to the inputs of each hidden layer neuron.

• Update Weights and Biases: Adjust each weight and bias by a small amount
proportional to the negative of its gradient (using the learning rate to control the
size of the update).

• Output: Updated weights and biases for the network, aimed at reducing the prediction
error.
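These steps can be sketched as follows, continuing the small network from the forward-pass example above (sigmoid activations, sum-of-squares loss; all numeric values are assumed):

```python
import numpy as np

# A minimal backward-pass sketch for the 2-3-1 network from the earlier
# forward-pass example. Assumed values throughout.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta = 0.25                                 # learning rate (assumed)
x = np.array([0.5, 0.2]); t = np.array([1.0])
W1 = np.array([[0.1, 0.4, -0.3], [0.8, -0.6, 0.2]])
W2 = np.array([[0.3], [0.9], [-0.5]])

h = sigmoid(x @ W1); y = sigmoid(h @ W2)   # forward pass

delta_out = (y - t) * y * (1 - y)          # gradient at the output layer
delta_hid = h * (1 - h) * (W2 @ delta_out) # error propagated to hidden layer

W2 -= eta * np.outer(h, delta_out)         # update hidden -> output weights
W1 -= eta * np.outer(x, delta_hid)         # update input -> hidden weights
```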
• The simple error function used for perceptrons, $\sum_k (y_k - t_k)$, is inadequate for MLPs as positive and negative errors can cancel out.

• A sum-of-squares error function is therefore introduced:

$E = \frac{1}{2} \sum_k (y_k - t_k)^2$

• This new error function ensures all errors contribute positively to the total error.

• The $\frac{1}{2}$ factor in the error function simplifies differentiation.

• The weights of the network are trained so that the error goes downhill until it reaches a local minimum, just like a ball rolling under gravity.

2. Write the Multi-layer Perceptron algorithm

Multi-layer Perceptron (MLP) : A multilayer perceptron (MLP) is a modern feedforward artificial neural network consisting of fully connected neurons with nonlinear activation functions. These networks are trained using the backpropagation method.

Multi-layer Perceptron algorithm :

Initialisation – initialise all weights to small (positive and negative) random values

• Training – repeat:

∗ for each input vector:

In Forward phase:
- compute the activation of each hidden neuron: $a_j = g\big(\sum_i x_i w_{ij}\big)$, where $g$ is the sigmoid function
- compute the activation of each output neuron: $y_k = g\big(\sum_j a_j w_{jk}\big)$

In Backward phase:
- compute the error at the output: $\delta_k = (y_k - t_k)\, y_k (1 - y_k)$
- propagate the error back to the hidden layer: $\delta_j = a_j (1 - a_j) \sum_k w_{jk} \delta_k$
- update the weights: $w_{jk} \leftarrow w_{jk} - \eta\, \delta_k a_j$ and $w_{ij} \leftarrow w_{ij} - \eta\, \delta_j x_i$

∗ until the stopping criterion is met (see the next question)
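A runnable sketch of this algorithm on the XOR problem (a minimal illustration; the 2-4-1 architecture, learning rate, and epoch count are assumed values):

```python
import numpy as np

# A minimal MLP trained with the algorithm above: forward phase, backward
# phase, weight updates. Sigmoid activations, sum-of-squares error.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(A):
    # append a constant bias input of 1 to each vector
    return np.hstack([A, np.ones((A.shape[0], 1))])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets

# Initialisation: small random weights, bias weights included
W1 = rng.uniform(-0.5, 0.5, (3, 4))   # (2 inputs + bias) -> 4 hidden
W2 = rng.uniform(-0.5, 0.5, (5, 1))   # (4 hidden + bias) -> 1 output
eta = 0.5

for epoch in range(20000):
    # Forward phase
    H = sigmoid(add_bias(X) @ W1)
    Y = sigmoid(add_bias(H) @ W2)
    # Backward phase: output deltas, then hidden deltas via the chain rule
    delta_out = (Y - T) * Y * (1 - Y)
    delta_hid = H * (1 - H) * (delta_out @ W2[:-1].T)
    # Weight updates (gradient descent)
    W2 -= eta * add_bias(H).T @ delta_out
    W1 -= eta * add_bias(X).T @ delta_hid

print(np.round(Y.ravel(), 2))   # should approach [0, 1, 1, 0]
```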

3. When should an MLP stop learning ?

The MLP is trained over multiple epochs (iterations over the entire dataset). Weights are
adjusted as the network makes errors in each iteration.

Stopping Criteria: Simple methods like setting a fixed number of iterations or a minimum
error threshold are not sufficient. These can lead to overfitting or underfitting.

Validation Set: A separate dataset used to monitor the network's generalization ability
during training.

Error Curves:

Training error: Typically decreases rapidly at first, then slows down.

Validation error: Initially decreases but may start increasing at some point.

Early Stopping: The technique of stopping training when the validation error starts to
increase.
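Early stopping can be sketched as a simple training loop (train_one_epoch and validation_error are hypothetical placeholders for the actual training and evaluation routines; patience is an assumed hyperparameter):

```python
# A minimal early-stopping loop. `train_one_epoch` and `validation_error`
# are hypothetical placeholders for the real training/evaluation code;
# `patience` (how many worsening epochs to tolerate) is an assumed value.
def train_with_early_stopping(model, train_data, valid_data,
                              max_epochs=1000, patience=10):
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_data)           # forward + backward pass
        error = validation_error(model, valid_data)  # monitor generalization
        if error < best_error:
            best_error = error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                                 # validation error rising
    return model
```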

4. Assume that the neurons have the sigmoid activation function to perform a forward and backward pass on the network. Also assume that the actual output of y is 0.5. Calculate the forward propagation and error.

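The network diagram for this question is not reproduced in these notes. As a minimal sketch, the calculation below uses assumed inputs and weights for a 2-2-1 sigmoid network; only the target output of 0.5 comes from the question:

```python
import numpy as np

# Forward pass and error for a 2-2-1 sigmoid network. The inputs and
# weights are assumed values (the original figure is not shown); the
# target output 0.5 comes from the question.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = 0.35, 0.9                        # assumed inputs
w1, w2, w3, w4 = 0.1, 0.8, 0.4, 0.6       # assumed input -> hidden weights
w5, w6 = 0.3, 0.9                         # assumed hidden -> output weights
target = 0.5

h1 = sigmoid(w1 * x1 + w2 * x2)           # first hidden neuron
h2 = sigmoid(w3 * x1 + w4 * x2)           # second hidden neuron
y = sigmoid(w5 * h1 + w6 * h2)            # network output

error = target - y                        # error = target - actual output
print(f"y = {y:.4f}, error = {error:.4f}")  # roughly y = 0.69, error = -0.19
```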
5. Explain the Forward propagation in MLP
Forward Propagation : To generate the output, the input data is fed in the forward direction only. The data should not flow in the reverse direction during output generation. This type of network configuration is known as a feed-forward network. The feed-forward network supports forward propagation.

Forward Propagation : Input Layer → Hidden Layers → Output Layer

1. Input Layer : The input data is fed in the forward direction through the network.
2. Hidden Layers : Each hidden layer accepts the input data, processes it as per the activation function, and passes the result to the successive layer.
The original perceptron used the Heaviside step function as its activation function.

The backpropagation algorithm requires differentiable activation functions, so modern MLPs use continuous activation functions such as the sigmoid or ReLU (rectified linear unit).

3. Output Layer: The final layer, which produces the output of the network.
This output is a prediction or classification based on the input data.
 The main goal of forward propagation is to compute the output of the neural network for a
given input.

Error Calculation: After the forward propagation, the error or loss is calculated by comparing
the network's output with the actual target values.
 for regression problems : Common loss functions include Mean Squared Error (MSE)
 for classification problems : Cross-Entropy Loss
The primary goal of backpropagation is to minimize the error by iteratively adjusting the weights of the network.

Error = Target Output − Actual Output

6. Deriving the back-propagation algorithm
Backward Propagation : The error signal is propagated backward from the output layer to the earlier layers of the network. The weights and biases of the network are adjusted based on the error signal.

In machine learning, backpropagation is a training algorithm that uses a gradient descent approach to calculate the error signal in a neural network and update its weights and biases.

Gradient descent : The gradient descent approach moves the weights and biases through weight space in the direction that most reduces the error. A small learning rate results in smaller changes to the weights and biases, while a large learning rate can cause unstable changes.

Backward Propagation : Output Layer → Hidden Layers → Input Layer

Deriving Back-Propagation : Refer to the note shared in the drive.
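Since the derivation itself is not reproduced here, the following is a standard sketch, assuming sigmoid activations and the sum-of-squares error used earlier in this unit:

```latex
% Sketch of the backpropagation derivation (sigmoid activations,
% sum-of-squares error), following the standard chain-rule argument.
\begin{align*}
E &= \tfrac{1}{2}\sum_k (y_k - t_k)^2,
  \qquad y_k = g(u_k),\quad u_k = \sum_j w_{jk}\, a_j,
  \qquad g(u) = \frac{1}{1+e^{-u}} \\[4pt]
\frac{\partial E}{\partial w_{jk}}
  &= \frac{\partial E}{\partial y_k}\,
     \frac{\partial y_k}{\partial u_k}\,
     \frac{\partial u_k}{\partial w_{jk}}
   = (y_k - t_k)\; y_k (1 - y_k)\; a_j
   \;=\; \delta_k\, a_j \\[4pt]
\frac{\partial E}{\partial w_{ij}}
  &= \Big(\sum_k \delta_k\, w_{jk}\Big)\, a_j (1 - a_j)\, x_i
   \;=\; \delta_j\, x_i \\[4pt]
w &\leftarrow w - \eta\, \frac{\partial E}{\partial w}
  \quad\text{(gradient descent update with learning rate } \eta\text{)}
\end{align*}
```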

7. Describe the construction of Radial Basis Function Network with an example
(OR)
Illustrate the RBF Network in detail

Radial Basis Function : A Radial Basis Function (RBF) is a real-valued function (real-valued inputs and outputs) that depends on the distance between the input value and an imaginary fixed point known as the center.
RBF is used in many machine learning and deep learning algorithms such as Support Vector Machines, Artificial Neural Networks, etc. RBFs are used as function approximators.

Kernel : Kernels play a fundamental role in transforming data into higher-dimensional spaces,
enabling algorithms to learn complex patterns and relationships.

The Radial Basis Function (RBF) kernel, also known as the Gaussian kernel, operates by measuring the similarity between data points based on their Euclidean distance in the input space:

$g(x, w, \sigma) = e^{-\|x - w\|^2 / 2\sigma^2} = e^{-r^2 / 2\sigma^2}$

where $\|x - w\|^2$ represents the squared Euclidean distance between the two data points, and $\sigma$ is a parameter known as the bandwidth or width of the kernel, controlling the smoothness of the decision boundary.

Other Kernels :
1. Multiquadric Function : $f(r) = \sqrt{r^2 + c^2}$, where parameter $c > 0$
2. Inverse Multiquadric Function : $f(r) = \dfrac{1}{\sqrt{r^2 + c^2}}$, where parameter $c > 0$

Architecture of RBF Networks : RBF Network consists of three layers:


1. Input Layer : It contains the same number of neurons as there are features in the input data; each neuron in the input layer corresponds to one feature of the input vector. After receiving the input features, the input layer passes them straight to the hidden layer.

2. Hidden Layer : This layer uses radial basis functions (RBFs) to perform the non-linear transformation of the input data; the Gaussian function is the most commonly used RBF.
RBF Neurons: Every neuron in the hidden layer has a center and a spread parameter (σ). The neuron's output depends on the distance between its center and the input vector, modulated by the spread parameter (σ).

3. Output Layer : The output layer integrates the hidden layer neurons' outputs through a weighted sum (a linear combination) to create the network's final output. To reduce the error between the network's predictions and the actual target values, the weights of this combination are adjusted during training.

Training Process of radial basis function neural network


An RBF neural network must be trained in three stages:
Step 1: Selecting the Centers : Centers can be picked at random from the training data or by applying k-means clustering.
Step 2: Determining the Spread Parameters : The spread parameter (σ) governs each RBF neuron's area of effect and establishes the width of the RBF.
Calculation: The spread parameter can be manually adjusted for each neuron or set as a constant for all neurons. A common heuristic sets

$\sigma = \dfrac{d}{\sqrt{2M}}$

where $d$ is the greatest distance between centers and $M$ is the number of centers.
Step 3: Training the Output Weights :
Linear Regression: Linear regression techniques are commonly used to estimate the output layer weights, with the objective of minimizing the error between the predicted output and the actual target values.
Pseudo-Inverse Method: The weights can be obtained using the pseudo-inverse of the hidden-layer output matrix. The fit method trains the RBFN model by computing activations for the input data points and solving for the weights using the Moore-Penrose pseudo-inverse:

$A^{+} = (A^{T} A)^{-1} A^{T}$
Example of XOR gate: The XOR dataset (X) consists of four data points, each with two features. Corresponding labels (y) represent the XOR function output for each data point.

Truth Table :

A | B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

Graphical Representation of XOR : ( Non-Linearly Separable Data )

• An RBFN instance is created with a specified sigma value.


• The model is trained using the fit method on the XOR dataset.
• Predictions are obtained for the same dataset using the predict method.
• The mean squared error (MSE) between the predicted and actual outputs is calculated.
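A compact Python sketch of these steps, mirroring the RBFN instance, fit, predict, and MSE bullets above; the sigma value and the use of the four XOR points themselves as centers are assumptions:

```python
import numpy as np

# A minimal RBF network for XOR: Gaussian hidden layer, output weights
# solved with the Moore-Penrose pseudo-inverse. Sigma is an assumed value.
class RBFN:
    def __init__(self, centers, sigma=0.7):
        self.centers = np.asarray(centers, dtype=float)
        self.sigma = sigma
        self.weights = None

    def _activations(self, X):
        # Gaussian RBF: exp(-||x - w||^2 / (2 sigma^2)) for every center
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2 * self.sigma ** 2))

    def fit(self, X, y):
        # Solve for output weights with the pseudo-inverse of activations
        A = self._activations(np.asarray(X, dtype=float))
        self.weights = np.linalg.pinv(A) @ np.asarray(y, dtype=float)

    def predict(self, X):
        return self._activations(np.asarray(X, dtype=float)) @ self.weights

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])                    # XOR labels

model = RBFN(centers=X, sigma=0.7)            # XOR points used as centers
model.fit(X, y)
pred = model.predict(X)
mse = np.mean((pred - y) ** 2)
print(np.round(pred, 3), "MSE:", round(mse, 6))  # predictions near [0,1,1,0]
```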

General Example : Consider the following features as independent variables in classifying whether a student will pass or fail the examination:
1. Marks in internal exams 2. Marks in projects 3. Attendance percentage

These 3 independent variables become 3 dimensions of a feature space.

Take an imaginary point as the center and draw concentric circles. Take the radius r of the circle on whose circumference the data points lie.

(Figure: effect of the spread parameter, shown as horizontal expansion and vertical compression of the Gaussian.)

Positive weights are assigned to neurons belonging to the same category
Negative weights are assigned to other categories.
The decision boundary can be plotted by evaluating scores over a grid.

Applications of RBF Networks : Classification, Regression, Function Approximation

8. Explain the Support Vector Machine.


(OR)
Outline the steps to determine the linear decision boundary for separating two classes in a
feature space using the Support Vector Machine.
Support Vector Machine (SVM) : Support Vector Machine (SVM) is a supervised machine
learning algorithm used for both classification and regression. The main objective of the SVM
algorithm is to find the optimal hyperplane in an N-dimensional space that can separate the
data points in different classes in the feature space.

The margin is the largest region that separates the classes without any points inside it; it is bounded by two lines parallel to the decision boundary.

Support Vectors: Support vectors are the data points closest to the hyperplane; they play a critical role in deciding the hyperplane and margin.

Margin: Margin is the distance between the support vector and the hyperplane.
 The main objective of the SVM algorithm is to maximize the margin.
 A wider margin indicates better classification performance.

Hard Margin or Maximum-margin : A hyperplane that properly separates the data points of different categories without any misclassifications.

Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique, which introduces a slack variable for each data point.

KERNEL : A kernel is a mathematical function used in SVM to map the original input data points into high-dimensional feature spaces, so that the hyperplane can be found easily even if the data points are not linearly separable in the original input space.

Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided
into two main parts:

 Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of
different classes. When the data can be precisely linearly separated, linear SVMs are
very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher
dimensions) can entirely divide the data points into their respective classes. A
hyperplane that maximizes the margin between the classes is the decision boundary.

 Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel
functions, nonlinear SVMs can handle nonlinearly separable data. The original input
data is transformed by these kernel functions into a higher-dimensional feature space,
where the data points can be linearly separated. A linear SVM is used to locate a
nonlinear decision boundary in this modified space.
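As an illustrative sketch of the two variants using scikit-learn (a library not covered in these notes; the toy dataset and parameter values are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

# A sketch contrasting linear and non-linear SVMs on toy data.
# scikit-learn is assumed to be available; parameters are illustrative.
rng = np.random.default_rng(0)

# Non-linearly separable toy data: class depends on distance from origin
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)

# No straight line separates a circular boundary, so the linear SVM
# scores poorly while the RBF-kernel SVM is near perfect.
print("linear SVM accuracy:", linear_svm.score(X, y))
print("RBF SVM accuracy:", rbf_svm.score(X, y))
```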

Linear SVM Algorithm :

Step 1 : Let $S_1, S_2, S_3$ be the support vectors, the points closest to the hyperplane, where $x_1, x_2, x_3$ are the corresponding input vectors and $\alpha_1, \alpha_2, \alpha_3$ are constants to be determined.

Step 2 : The linear combinations give one equation per support vector:

If $S_1$ is a positive support vector: $\alpha_1 S_1 \cdot S_1 + \alpha_2 S_2 \cdot S_1 + \alpha_3 S_3 \cdot S_1 = +1$ → (1)

If $S_2$ is a positive support vector: $\alpha_1 S_1 \cdot S_2 + \alpha_2 S_2 \cdot S_2 + \alpha_3 S_3 \cdot S_2 = +1$ → (2)

If $S_3$ is a negative support vector: $\alpha_1 S_1 \cdot S_3 + \alpha_2 S_2 \cdot S_3 + \alpha_3 S_3 \cdot S_3 = -1$ → (3)

Step 3 : Solving equations (1), (2), and (3) gives the constants $\alpha_1, \alpha_2, \alpha_3$.

Step 4 : Evaluate $w$ by

$W = \sum_{i=1}^{n} \alpha_i S_i$

Step 5 : The equation for the linear hyperplane is $w^T x + b = 0$
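A worked sketch of these steps with assumed support vectors, each augmented with a bias component of 1 so that the offset b appears as the last entry of W:

```python
import numpy as np

# Worked sketch of the algorithm above. The support vectors are assumed
# example values: S1 = (1, 0) negative, S2 = (3, 1) and S3 = (3, -1)
# positive. Each is augmented with a bias component of 1.
S = np.array([[1.0, 0.0, 1.0],    # S1 (negative class)
              [3.0, 1.0, 1.0],    # S2 (positive class)
              [3.0, -1.0, 1.0]])  # S3 (positive class)
targets = np.array([-1.0, 1.0, 1.0])

# Steps 2-3: build the dot-product equations and solve for alpha
G = S @ S.T                       # G[i, j] = Si . Sj
alpha = np.linalg.solve(G, targets)

# Step 4: W = sum_i alpha_i * S_i  (the last entry is the bias b)
W = alpha @ S
print("alpha:", alpha)            # approx [-3.5, 0.75, 0.75]
print("w:", W[:2], "b:", W[2])    # approx w = [1, 0], b = -2

# Step 5: the hyperplane w.x + b = 0 reduces to x1 = 2 here
```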

Mathematical intuition of Support Vector Machine : Consider a binary classification problem


with two classes, labeled as +1 and -1. We have a training dataset consisting of input feature
vectors X and their corresponding class labels Y.

The equation for the linear hyperplane can be written as:

$w^T x + b = 0$

The vector $w$ represents the normal vector to the hyperplane, i.e., the direction perpendicular to the hyperplane. The parameter $b$ in the equation represents the offset or distance of the hyperplane from the origin along the normal vector $w$.

The distance between a data point $x_i$ and the decision boundary can be calculated as:

$d_i = \dfrac{w^T x_i + b}{\|w\|}$

where $\|w\|$ represents the Euclidean norm of the weight vector $w$ (the normal vector to the hyperplane).

For the Linear SVM classifier :

$\hat{y} = \begin{cases} 1 & \text{if } w^T x + b \ge 0 \\ 0 & \text{if } w^T x + b < 0 \end{cases}$

Optimization: For the hard-margin linear SVM classifier:

$\underset{w,\,b}{\text{minimize}} \;\; \frac{1}{2} w^T w \;=\; \underset{w,\,b}{\text{minimize}} \;\; \frac{1}{2}\|w\|^2$

subject to $y_i (w^T x_i + b) \ge 1 \;\; \text{for } i = 1, 2, 3, \ldots, m$

16
The target variable or label for the i-th training instance is denoted by $t_i$, with $t_i = -1$ for negative instances (where $y_i = 0$) and $t_i = +1$ for positive instances (where $y_i = 1$). This is because we require a decision boundary that satisfies the constraint:

$t_i (w^T x_i + b) \ge 1$

Advantages of Support Vector Machine :
1. Support vector machine works comparatively well when there is a clear margin of separation between classes.
2. It is more productive in high-dimensional spaces.
3. It is effective in instances where the number of dimensions is larger than the number of samples.
4. Support vector machine is comparatively memory efficient.
5. Handling high-dimensional data: SVMs are effective in handling high-dimensional data, which is common in many applications such as image and text classification.
6. Handling small datasets: SVMs can perform well with small datasets, as they only require a small number of support vectors to define the boundary.
7. Modeling non-linear decision boundaries: SVMs can model non-linear decision boundaries by using the kernel trick, which maps the data into a higher-dimensional space where the data becomes linearly separable.
8. Robustness to noise: SVMs are robust to noise in the data, as the decision boundary is determined by the support vectors, which are the closest data points to the boundary.
9. Generalization: SVMs have good generalization performance, which means that they are able to classify new, unseen data well.
10. Versatility: SVMs can be used for both classification and regression tasks, and they can be applied to a wide range of applications such as natural language processing, computer vision, and bioinformatics.
11. Sparse solution: SVMs have sparse solutions, which means that they only use a subset of the training data to make predictions. This makes the algorithm more efficient and less prone to overfitting.
12. Regularization: SVMs can be regularized, which means that the algorithm can be modified to avoid overfitting.
Very Short Answer Questions

1. What is multilayer network?


A multilayer neural network consists of multiple layers of interconnected nodes or neurons.
Each neuron computes a weighted sum of its input values and passes it through an activation
function to produce an output value.

2. If $x_1 = 2$, $x_2 = 1$ in the following neural network, then calculate the net input of the neural network

The net input $= \sum_i w_i x_i = 1(-20) + 2(15) + 1(10) = -20 + 30 + 10 = 20$

3 If the input to a single-input neuron is 2.0, its weight is 2.3 and its bias is – 3. Then find the
net input to the transfer function ?
Given weight w = 2.3
Input x = 2
Bias b = – 3
The net input = wx + b = (2.3) (2) + (– 3) = 1.6

4. Write any two activation functions used in multi-layer perceptron ?

Activation functions used in multi-layer perceptron :

(i) Sigmoid Function : $f(x) = \dfrac{1}{1 + e^{-x}}$

(ii) Gaussian Function : $g(x, w, \sigma) = e^{-\|x - w\|^2 / 2\sigma^2} = e^{-r^2 / 2\sigma^2}$

(iii) ReLU Function : $f(x) = \max(0, x)$

5. Interpret a Sigmoid Function and Threshold unit?

The sigmoid activation function is defined as

$f(x) = \dfrac{1}{1 + e^{-x}}$

The sigmoid function is a soft threshold unit that returns values between 0 and 1.

6. What is gradient descent?


Gradient descent : The gradient descent approach moves the weights and biases through weight space in the direction that most reduces the error. A small learning rate results in smaller changes to the weights and biases, while a large learning rate can cause unstable changes.

7. What is Delta Rule?

Delta Rule : The delta rule is a gradient descent learning technique in machine learning that
updates the weights of inputs to artificial neurons in a single-layer neural network.

8. What is the Curse of Dimensionality?


Curse of Dimensionality : The Curse of Dimensionality refers to the phenomenon where the efficiency and effectiveness of algorithms deteriorate as the dimensionality of the data increases, because the volume of the feature space grows exponentially with the number of dimensions.

9. What is radial basis function?


Radial Basis Function : A Radial Basis Function(RBF) is a real-valued function (real valued input
and outputs) that depends on the distance between the input value and an imaginary fixed
point known as the center.

10. What is interpolation?


Interpolation : In machine learning, interpolation refers to the process of estimating unknown
values that fall between known data points.

11. What is Learning Rate ?


Learning Rate : Learning rate is a hyperparameter that controls how much a model's
parameters are adjusted during each iteration of an optimization algorithm.

12. What is spline?


Spline Interpolation : A method of interpolation where the interpolating function is a piecewise-defined polynomial called a spline. Spline interpolation divides the data into smaller segments and fits a separate polynomial to each segment.

