ML Unit 2 Lecture Notes
Machine Learning
UNIT – II
1. Explain the Multi-layer perceptron with an example. (OR)
Multi-layer Perceptron (MLP): A multilayer perceptron (MLP) is a modern feedforward
artificial neural network consisting of fully connected neurons with nonlinear activation
functions. These networks are trained using the backpropagation method.
A Multi-layer Perceptron (MLP) is a type of artificial neural network that consists of multiple
layers of neurons, typically including an input layer, one or more hidden layers, and an output
layer. Each layer is fully connected to the next one, meaning each neuron in one layer is
connected to every neuron in the next layer.
• 1. Going Forwards (Forward Propagation): To generate the output, the input data is fed
in the forward direction only; data must not flow in the reverse direction during output
generation. This type of network configuration is known as a feed-forward network.
• Purpose: To compute the predicted output of the network given the input data.
1. Input Layer : The input data is fed in the forward direction through the network.
2. Hidden Layers : Each hidden layer accepts the input data, processes it as per the activation
function, and passes it to the successive layer.
The original perceptron used a step function as its activation function.
The backpropagation algorithm requires differentiable activations, so modern MLPs use
continuous activation functions such as the sigmoid or ReLU (rectified linear unit).
3. Output Layer: The final layer, which produces the output of the network.
This output is a prediction or classification based on the input data.
The main goal of forward propagation is to compute the output of the neural network for a
given input.
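As a concrete sketch (the layer sizes and random weights below are illustrative assumptions, not values from these notes), forward propagation through a one-hidden-layer MLP looks like this:

```python
import numpy as np

def sigmoid(z):
    # Continuous, differentiable activation (as required later by backpropagation)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)  # hidden layer: weighted sum, then activation
    y = sigmoid(W2 @ h + b2)  # output layer: prediction from hidden activations
    return h, y

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)  # assumed: 2 inputs -> 2 hidden
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)  # assumed: 2 hidden -> 1 output
h, y = forward(np.array([2.0, 1.0]), W1, b1, W2, b2)
print("hidden activations:", h, "output:", y)
```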
Error Calculation: After the forward propagation, the error or loss is calculated by comparing
the network's output with the actual target values.
For regression problems: common loss functions include Mean Squared Error (MSE).
For classification problems: Cross-Entropy Loss.
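A minimal sketch of the two losses named above (the example values are made up):

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean Squared Error: common for regression
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(y_pred, y_true, eps=1e-12):
    # Binary cross-entropy: common for classification; eps guards against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(mse(np.array([0.8]), np.array([0.5])))            # 0.09
print(cross_entropy(np.array([0.8]), np.array([1.0])))  # ~0.223
```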
The primary goal of back propagation is to minimize the error by iteratively adjusting the
weights of the network.
2. Going Backwards (Backward Propagation): The error signal is propagated backward from the
output layer through the network's layers, and the weights and biases of the network are adjusted
based on this error signal.
Gradient descent : The gradient descent approach moves through weight and bias space along the
negative gradient of the error. A small learning rate results in smaller changes to the weights and
biases, while a large learning rate can cause unstable changes.
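The effect of the learning rate can be seen on a toy one-dimensional error surface (this example is illustrative, not from the notes):

```python
# Gradient descent on a toy error surface E(w) = (w - 3)^2, with dE/dw = 2(w - 3)
def step(w, lr):
    grad = 2 * (w - 3)
    return w - lr * grad  # move against the gradient

w_small, w_large = 0.0, 0.0
for _ in range(5):
    w_small = step(w_small, lr=0.1)  # small rate: small, stable steps toward w = 3
    w_large = step(w_large, lr=1.1)  # too-large rate: oscillates and diverges
print(w_small, w_large)
```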
In machine learning, backpropagation is a training algorithm that uses a gradient descent approach to
calculate the error signal in a neural network and update its weights and biases.
Backward Propagation :
• Purpose: To update the network's weights and biases to minimize the error between the
predicted output and the actual target.
• Steps:
• Calculate Error: Determine the difference between the predicted output and the
actual target.
• Compute Gradients: Calculate the gradient of the error with respect to each
weight and bias in the network using the chain rule. This involves:
• Output Layer: Compute the gradient of the error with respect to the output
layer's inputs (derivative of the loss function with respect to the network's output).
• Hidden Layers: Propagate the error back through the network, computing the
gradient of the error with respect to the inputs of each hidden layer neuron.
• Update Weights and Biases: Adjust each weight and bias by a small amount
proportional to the negative of its gradient (using the learning rate to control the
size of the update).
• Output: Updated weights and biases for the network, aimed at reducing the prediction
error.
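Putting these steps together, here is a minimal sketch of one backpropagation update for a single-hidden-layer MLP with sigmoid activations and squared error (all sizes and initial weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, b1, W2, b2, lr=0.5):
    # Forward phase
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    # Backward phase: output delta via the chain rule (squared error, sigmoid)
    delta_out = (y - t) * y * (1 - y)
    # Hidden deltas: propagate the error back through W2
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Update weights and biases against the gradient, scaled by the learning rate
    W2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x)
    b1 -= lr * delta_hid
    return y

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)  # assumed: 2 inputs -> 3 hidden
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)  # assumed: 3 hidden -> 1 output
for _ in range(100):
    y = backprop_step(np.array([1.0, 0.0]), np.array([1.0]), W1, b1, W2, b2)
print(y)  # approaches the target 1.0 as training proceeds
```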
• The simple error function used for perceptrons, $\sum_k (y_k - t_k)$, is inadequate for MLPs, as
positive and negative errors can cancel out.
• The sum-of-squares error function fixes this by making all errors contribute positively to the total error.
• The weights of the network are trained so that the error goes downhill until it reaches a
local minimum, just like a ball rolling under gravity.
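In its usual form, this sum-of-squares error is

$$E = \frac{1}{2} \sum_k (y_k - t_k)^2$$

Squaring makes every term non-negative, so errors of opposite sign cannot cancel, and the factor of $\frac{1}{2}$ simplifies the derivative used during training.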
• Initialisation – initialise all weights to small (positive and negative) random values.
• Training – repeat:
In Forward phase: compute the activations of the hidden-layer and output-layer neurons for the current input.
In Backward phase: compute the output errors, propagate them backward through the network, and update the weights.
The MLP is trained over multiple epochs (iterations over the entire dataset). Weights are
adjusted as the network makes errors in each iteration.
Stopping Criteria: Simple methods like setting a fixed number of iterations or a minimum
error threshold are not sufficient. These can lead to overfitting or underfitting.
Validation Set: A separate dataset used to monitor the network's generalization ability
during training.
Error Curves:
Training error: typically decreases steadily as training proceeds.
Validation error: initially decreases but may start increasing at some point, signalling overfitting.
Early Stopping: The technique of stopping training when the validation error starts to
increase.
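Early stopping can be written as a small wrapper around the training loop; in this sketch, `train_fn`, `val_error_fn`, and the patience value are illustrative placeholders:

```python
def train_with_early_stopping(model, train_fn, val_error_fn,
                              max_epochs=1000, patience=5):
    # Stop once validation error has not improved for `patience` epochs
    best_err, bad_epochs = float("inf"), 0
    for epoch in range(max_epochs):
        train_fn(model)            # one epoch over the training set
        err = val_error_fn(model)  # error on the held-out validation set
        if err < best_err:
            best_err, bad_epochs = err, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break              # validation error is rising: stop training
    return model
```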
3. Assume that the neurons have the sigmoid activation function to perform the forward and
backward pass on the network, and also assume that the actual output y is 0.5. Calculate the
forward propagation and error.
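The network diagram for this problem is not included in these notes; as a stand-in, here is a worked pass for a single sigmoid neuron with assumed values: inputs $x_1 = 1$, $x_2 = 2$, weights $w_1 = 0.3$, $w_2 = 0.2$, bias $b = 0$, and the stated actual output $y = 0.5$ as the target.

```python
import math

# Assumed (hypothetical) weights and inputs, since the original figure is missing
x1, x2 = 1.0, 2.0
w1, w2, b = 0.3, 0.2, 0.0
target = 0.5  # actual output y given in the question

net = w1 * x1 + w2 * x2 + b          # net input = 0.7
out = 1.0 / (1.0 + math.exp(-net))   # sigmoid(0.7) ~ 0.668
error = 0.5 * (out - target) ** 2    # squared error ~ 0.014
print(net, out, error)
```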
5. Explain the Forward propagation in MLP
Forward Propagation: To generate the output, the input data is fed in the forward direction
only; data must not flow in the reverse direction during output generation. This type of
network configuration is known as a feed-forward network, and it is this feed-forward
structure that enables forward propagation.
1. Input Layer : The input data is fed in the forward direction through the network.
2. Hidden Layers : Each hidden layer accepts the input data, processes it as per the activation
function, and passes it to the successive layer.
The original perceptron used a Heaviside step function as its activation function.
The backpropagation algorithm requires differentiable activations, so modern MLPs use
continuous activation functions such as the sigmoid or ReLU (rectified linear unit).
3. Output Layer: The final layer, which produces the output of the network.
This output is a prediction or classification based on the input data.
The main goal of forward propagation is to compute the output of the neural network for a
given input.
Error Calculation: After the forward propagation, the error or loss is calculated by comparing
the network's output with the actual target values.
For regression problems: common loss functions include Mean Squared Error (MSE).
For classification problems: Cross-Entropy Loss.
The primary goal of back propagation is to minimize the error by iteratively adjusting the
weights of the network.
6. Deriving the back-propagation algorithm
Backward Propagation: The error signal is propagated backward from the output layer through the
network's layers, and the weights and biases of the network are adjusted based on this error signal.
In machine learning, backpropagation is a training algorithm that uses a gradient descent approach to
calculate the error signal in a neural network and update its weights and biases.
Gradient descent : The gradient descent approach moves through weight and bias space along the
negative gradient of the error. A small learning rate results in smaller changes to the weights and
biases, while a large learning rate can cause unstable changes.
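In outline, with the sum-of-squares error and sigmoid activations (the usual textbook setting for this derivation), the chain rule gives, for a hidden-to-output weight $w_{jk}$ (where $z_k$ is the net input to output neuron $k$ and $h_j$ is the activation of hidden neuron $j$):

$$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y_k} \cdot \frac{\partial y_k}{\partial z_k} \cdot \frac{\partial z_k}{\partial w_{jk}} = (y_k - t_k)\, y_k (1 - y_k)\, h_j$$

This yields output-layer deltas $\delta_k = (y_k - t_k)\, y_k (1 - y_k)$, hidden-layer deltas $\delta_j = h_j (1 - h_j) \sum_k w_{jk}\, \delta_k$, and the gradient descent updates $w \leftarrow w - \eta \, \partial E / \partial w$, where $\eta$ is the learning rate.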
7. Describe the construction of Radial Basis Function Network with an example
(OR)
Illustrate the RBF Network in detail
Radial Basis Function : A Radial Basis Function (RBF) is a real-valued function (real-valued inputs
and outputs) that depends on the distance between the input value and a fixed point known as
the center.
RBF is used in many machine learning and deep learning algorithms such as Support Vector
Machines, Artificial Neural Networks, etc. RBFs are used as function approximators
Kernel : Kernels play a fundamental role in transforming data into higher-dimensional spaces,
enabling algorithms to learn complex patterns and relationships.
The Radial Basis Function (RBF) kernel, also known as the Gaussian kernel, operates by
measuring the similarity between data points based on their Euclidean distance in the input
space.
$$g(x, w, \sigma) = e^{-\frac{\|x - w\|^2}{2\sigma^2}} = e^{-\frac{r^2}{2\sigma^2}}$$
where $\|x - w\|^2$ represents the squared Euclidean distance between the two data points, and
$\sigma$ is a parameter known as the bandwidth or width of the kernel, controlling the
smoothness of the decision boundary.
Other Kernels :
1. Multiquadric function: $f(r) = \sqrt{r^2 + c^2}$, where parameter $c > 0$
2. Inverse multiquadric function: $f(r) = \frac{1}{\sqrt{r^2 + c^2}}$, where parameter $c > 0$
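A minimal sketch of these kernels as Python functions (function names are illustrative):

```python
import numpy as np

def gaussian_rbf(x, w, sigma):
    # g(x, w, sigma) = exp(-||x - w||^2 / (2 * sigma^2))
    r2 = np.sum((x - w) ** 2)
    return np.exp(-r2 / (2 * sigma ** 2))

def multiquadric(r, c):
    # f(r) = sqrt(r^2 + c^2), with c > 0
    return np.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c):
    # f(r) = 1 / sqrt(r^2 + c^2), with c > 0
    return 1.0 / np.sqrt(r ** 2 + c ** 2)

print(gaussian_rbf(np.array([1.0, 2.0]), np.array([0.0, 0.0]), sigma=1.0))
```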
1. Input Layer : Each neuron in this layer corresponds to one input feature and simply passes the
input vector on to the hidden layer.
2. Hidden Layer : This layer uses radial basis functions (RBFs) to perform the non-linear
transformation of the input data; the Gaussian function is the most commonly used RBF.
RBF Neurons: Every neuron in the hidden layer has a center and a spread parameter (σ). The
distance between an RBF neuron's center and the input vector, scaled by the spread
parameter (σ), determines the neuron's output.
3. Output Layer : The output layer integrates the hidden-layer neurons' outputs through a
weighted sum (linear combination) to produce the network's final output. To reduce the error
between the network's predictions and the actual target values, the weights of this
combination are adjusted during training.
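Putting the three layers together, a minimal sketch of an RBF network's forward pass (the centers, spreads, and output weights below are illustrative assumptions; in practice they are chosen from or trained on the data):

```python
import numpy as np

def rbf_forward(x, centers, sigmas, weights):
    # Hidden layer: one Gaussian RBF activation per (center, spread) pair
    phi = np.array([np.exp(-np.sum((x - c) ** 2) / (2 * s ** 2))
                    for c, s in zip(centers, sigmas)])
    # Output layer: weighted sum (linear combination) of hidden activations
    return weights @ phi

centers = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]  # assumed centers
sigmas = [1.0, 1.0]                                      # assumed spreads
weights = np.array([0.7, -0.3])                          # assumed output weights
print(rbf_forward(np.array([0.5, 0.5]), centers, sigmas, weights))
```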
Positive weights are assigned to hidden neurons belonging to the same category, and negative
weights are assigned to neurons of other categories.
The decision boundary can be plotted by evaluating scores over a grid.
The margin is the largest region that separates the classes without any points inside it; it is
bounded by two lines parallel to the decision boundary.
Support Vectors: Support vectors are the data points closest to the hyperplane; they play a
critical role in deciding the hyperplane and margin.
Margin: Margin is the distance between the support vector and hyperplane.
The main objective of the SVM algorithm is to maximize the margin: a wider margin indicates
better classification performance.
Hard Margin (Maximum Margin): A hyperplane that properly separates the data points of
different categories without any misclassifications.
Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a
soft-margin technique, which introduces a slack variable for each data point so that some
margin violations are tolerated.
KERNEL : A kernel is a mathematical function used in SVM to map the original input data
points into a high-dimensional feature space, so that a separating hyperplane can be found
even if the data points are not linearly separable in the original input space.
Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided
into two main types:
Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of
different classes. When the data can be precisely linearly separated, linear SVMs are
very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher
dimensions) can entirely divide the data points into their respective classes. A
hyperplane that maximizes the margin between the classes is the decision boundary.
Non-Linear SVM: Non-Linear SVM can be used to classify data when it cannot be
separated into two classes by a straight line (in the case of 2D). By using kernel
functions, nonlinear SVMs can handle nonlinearly separable data. The original input
data is transformed by these kernel functions into a higher-dimensional feature space,
where the data points can be linearly separated. A linear SVM is used to locate a
nonlinear decision boundary in this modified space.
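As a concrete sketch, both variants can be fitted with scikit-learn (one common implementation; the toy ring-shaped dataset below is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: class 0 clustered near the origin, class 1 on a ring around it
rng = np.random.default_rng(0)
inner = rng.normal(scale=0.5, size=(50, 2))
angles = rng.uniform(0, 2 * np.pi, size=50)
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

linear_svm = SVC(kernel="linear").fit(X, y)  # linear decision boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)        # kernel trick for non-linear data
print("linear accuracy:", linear_svm.score(X, y))  # poor: not linearly separable
print("rbf accuracy:", rbf_svm.score(X, y))        # near perfect
```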
For each support vector $S_j$, the parameters satisfy $\sum_i \alpha_i \, S_i \cdot S_j = t_j$,
where $t_j = -1$ for a negative-class support vector and $t_j = +1$ for a positive-class one.
For example, if $S_1$ is a negative-class support vector, then
$$\alpha_1 \, S_1 \cdot S_1 + \alpha_2 \, S_2 \cdot S_1 + \alpha_3 \, S_3 \cdot S_1 = -1 \qquad (3)$$
Step 4 : Evaluate $w$ by
$$W = \sum_{i=1}^{n} \alpha_i S_i$$
The equation of the hyperplane is $w^T x + b = 0$.
The vector $w$ is the normal vector to the hyperplane, i.e., the direction perpendicular to
the hyperplane. The parameter $b$ represents the offset, or distance of the hyperplane from
the origin along the normal vector $w$.
The distance between a data point $x_i$ and the decision boundary can be calculated as:
$$d_i = \frac{w^T x_i + b}{\|w\|}$$
where $\|w\|$ represents the Euclidean norm of the weight (normal) vector $w$.
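A small numeric sketch of these steps, using illustrative support vectors and $\alpha$ values consistent with equation (3) above (the numbers are assumed for demonstration, with each support vector augmented by a bias component of 1):

```python
import numpy as np

# Hypothetical support vectors, augmented with a bias component of 1
S = [np.array([1.0, 0.0, 1.0]),    # negative-class support vector
     np.array([3.0, 1.0, 1.0]),    # positive-class support vector
     np.array([3.0, -1.0, 1.0])]   # positive-class support vector
alphas = [-3.5, 0.75, 0.75]

# Step 4: W = sum_i alpha_i * S_i (the last component plays the role of b)
W = sum(a * s for a, s in zip(alphas, S))
w, b = W[:2], W[2]
print("w =", w, "b =", b)  # w = [1. 0.], b = -2.0

# Distance of a point from the hyperplane: d = (w.x + b) / ||w||
x = np.array([4.0, 0.0])
print("distance =", (w @ x + b) / np.linalg.norm(w))  # 2.0
```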
The target variable (label) for the $i$th training instance is denoted $t_i$, with $t_i = -1$
for negative instances (when $y_i = 0$) and $t_i = 1$ for positive instances (when $y_i = 1$).
This is because we require a decision boundary that satisfies the constraint:
$$t_i (w^T x_i + b) \geq 1$$
6. Outline the steps to determine the linear decision boundary for separating two classes in a
feature space using the Support Vector Machine
Support Vector Machine (SVM) : Support Vector Machine (SVM) is a supervised machine
learning algorithm used for both classification and regression. The main objective of the SVM
algorithm is to find the optimal hyperplane in an N-dimensional space that can separate the
data points in different classes in the feature space.
Step 4 : Evaluate $w$ by
$$W = \sum_{i=1}^{n} \alpha_i S_i$$
Step 5 : The equation of the linear hyperplane is $w^T x + b = 0$.
Advantages and disadvantages of Support Vector Machine :
1. Support vector machines work comparatively well when there is a clear margin of
separation between classes.
2. They are effective in high-dimensional spaces.
3. They are effective in cases where the number of dimensions is greater than the number of
samples.
4. Support vector machines are comparatively memory efficient.
5. Handling high-dimensional data: SVMs are effective in handling high-dimensional data,
which is common in many applications such as image and text classification.
6. Handling small datasets: SVMs can perform well with small datasets, as they only require a
small number of support vectors to define the boundary.
7. Modeling non-linear decision boundaries: SVMs can model non-linear decision boundaries by
using the kernel trick, which maps the data into a higher-dimensional space where the data
becomes linearly separable.
8. Robustness to noise: SVMs are robust to noise in the data, as the decision boundary is
determined by the support vectors, which are the closest data points to the boundary.
9. Generalization: SVMs have good generalization performance, which means that they are
able to classify new, unseen data well.
10. Versatility: SVMs can be used for both classification and regression tasks, and it can be
applied to a wide range of applications such as natural language processing, computer vision,
and bioinformatics.
11. Sparse solution: SVMs have sparse solutions, which means that they only use a subset of the
training data (the support vectors) to make predictions. This makes the algorithm more efficient
and less prone to overfitting.
12. Regularization: SVMs can be regularized, which means that the algorithm can be modified to
avoid overfitting.
Very Short Answer Questions
2. If $x_1 = 2$, $x_2 = 1$ in the following neural network, then calculate the net input of the
neural network.
The net input $= \sum_i w_i x_i = 1(-20) + 2(15) + 1(10) = -20 + 30 + 10$
The net input $= 20$
3. If the input to a single-input neuron is 2.0, its weight is 2.3, and its bias is –3, then find the
net input to the transfer function.
Given weight w = 2.3
Input x = 2
Bias b = – 3
The net input = wx + b = (2.3) (2) + (– 3) = 1.6
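Both computations can be checked in a couple of lines:

```python
# Question 2: net input with bias input 1 and the weights used in the solution
print(1 * (-20) + 2 * 15 + 1 * 10)  # 20

# Question 3: single-input neuron
w, x, b = 2.3, 2.0, -3.0
print(w * x + b)                    # 1.6
```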
Delta Rule : The delta rule is a gradient descent learning technique in machine learning that
updates the weights of inputs to artificial neurons in a single-layer neural network.
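In its common form (for a linear output unit), the delta rule update is $\Delta w_i = \eta\,(t - y)\,x_i$, where $\eta$ is the learning rate, $t$ the target, and $y$ the neuron's output. A short sketch in Python (names are illustrative):

```python
def delta_rule(w, x, t, y, eta=0.1):
    # w_i <- w_i + eta * (t - y) * x_i for every input weight
    return [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]

print(delta_rule([0.2, -0.1], x=[1.0, 2.0], t=1.0, y=0.4))  # [0.26, 0.02]
```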