AI & ML Unit 5 Notes
Neuron:
• A neuron is a cell in the brain whose principal function is the collection, processing
and dissemination of electric signals.
Neural Networks:
Perceptron:
• A network with all the inputs connected directly to the outputs is called a Single
layer neural network or a perceptron.
• It is the basic processing element
• It has inputs that may come from the environment or may be the outputs of other
perceptrons.
• Perceptron model is also treated as one of the best and simplest types of artificial
neural networks.
• Input Nodes: This is the primary component of the perceptron, which accepts the initial data into the system.
• Weight: It represents the strength of the connection between units. The weight is directly proportional to the strength of the associated input neuron in deciding the output.
• Activation Function: These are the final important components that help to
determine whether the neuron will fire or not.
Types of activation function:
(i) Sign function
(ii) Step function
(iii) Sigmoid function
• The output of the perceptron can be written as a dot product: Y = Wᵀx
• Each perceptron is a local function of its inputs and synaptic weights.
Sigmoid function:
Perceptron Function:
• The perceptron function f(x) is obtained by multiplying the input x with the learned weight coefficient w and adding the bias b.
• It can be expressed as f(x) = 1 if w·x + b > 0; otherwise f(x) = 0.
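• A minimal sketch of this decision rule in NumPy; the weight vector, bias, and input values below are illustrative, not learned:

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron decision rule: f(x) = 1 if w.x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative (not learned) weights and bias
w = np.array([0.5, -0.6])
b = 0.1
print(perceptron(np.array([1.0, 0.2]), w, b))  # 1, since 0.5 - 0.12 + 0.1 > 0
```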
Characteristics of perceptron:
Single-Layer Perceptron:
Multi-Layer Perceptron:
• In the multi-layer perceptron diagram, we can see that there are three inputs and
thus three input nodes and the hidden layer has three nodes.
• The output layer gives two outputs, therefore there are two output nodes.
• Every node in the multi-layer perception uses a sigmoid activation function. The
sigmoid activation function takes real values as input and converts them to numbers
between 0 and 1 using the sigmoid formula σ(x) = 1/(1 + exp(-x)).
• The multi-layer perceptron is trained using the Backpropagation algorithm, which
executes in two stages as follows:
✓ Forward Stage: Activation functions start from the input layer in the forward
stage and terminate on the output layer.
✓ Backward Stage: In the backward stage, weight and bias values are modified
as per the model's requirement.
• The neural network has neurons that work in correspondence with weight, bias,
and their respective activation function. In a neural network, we would update the
weights and biases of the neurons on the basis of the error at the output. This
process is known as backpropagation.
• Two Types of Backpropagation Networks are:
1. Static Back-propagation
2. Recurrent Backpropagation
Static Back-propagation: It is one kind of backpropagation network which
produces a mapping of static input for static output.
Recurrent Back-propagation: the activations are fed forward until a fixed value is
achieved, after which the error is computed and propagated backward.
• x_j, j = 0, …, d are the inputs and z_h, h = 1, …, H are the hidden units, where H is
the dimensionality of this hidden space. z_0 is the bias of the hidden layer. y_i, i = 1,
…, K are the output units. w_hj are the weights in the first layer, and v_ih are the
weights in the second layer.
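• A minimal sketch of one forward pass through this two-layer network with sigmoid hidden and output units; the layer sizes, random weights, and input values are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d, H, K = 3, 3, 2                     # inputs, hidden units, outputs (illustrative)
W = rng.normal(size=(H, d + 1))       # first-layer weights w_hj (bias column included)
V = rng.normal(size=(K, H + 1))       # second-layer weights v_ih (bias column included)

x = np.array([0.2, -0.4, 0.7])
x_aug = np.append(1.0, x)             # x_0 = 1 acts as the bias input
z = sigmoid(W @ x_aug)                # hidden units z_h
z_aug = np.append(1.0, z)             # z_0 = 1 is the hidden-layer bias
y = sigmoid(V @ z_aug)                # output units y_i
print(y)
```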
Advantages:
✓ It can be used to solve complex non-linear problems.
✓ It handles large amounts of input data well.
✓ It makes quick predictions after training.
✓ It works well with both small and large input data.
Disadvantages:
✓ Time consuming
✓ Depends on quality of training
Activation Function:
• In an artificial neural network, the function which takes the incoming signals as
input and produces the output signal is known as the activation function.
• The activation functions are:
✓ ReLU Function
✓ Sigmoid Function
✓ Linear Function
✓ Tanh Function
✓ Softmax Function
ReLU Function:
Sigmoid Function:
Linear Function:
Tanh Function:
• The activation that almost always works better than the sigmoid function is the Tanh
function, also known as the hyperbolic tangent function.
• Equation: f(x) = tanh(x) = 2/(1 + e^(-2x)) − 1
• Value Range: -1 to +1
• Nature: Non-linear
• Uses: Usually used in hidden layers of a neural network as it’s value lies between
-1 to 1.
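• A small NumPy check that the formula above matches the built-in tanh and stays within (-1, 1); the sample points are illustrative:

```python
import numpy as np

x = np.linspace(-3, 3, 7)
manual = 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0
print(np.allclose(manual, np.tanh(x)))  # True: same function
print(manual.min(), manual.max())       # values stay within (-1, 1)
```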
Softmax Function
• A generalization of the sigmoid function, the softmax function comes in handy when
dealing with multiclass classification problems.
• Used frequently when managing several classes.
• The softmax function divides each (exponentiated) output by the sum of the outputs,
squeezing the output for each category to a value between 0 and 1 so that the outputs sum to 1.
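• A minimal softmax sketch: exponentiate, then divide by the sum so the outputs lie between 0 and 1 and sum to 1. The logits are illustrative; subtracting the maximum is a standard numerical-stability trick, not part of the definition:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtracting max(z) avoids overflow, result unchanged
    return e / e.sum()          # divide by the sum of the outputs

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1
```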
Network Training:
Training Set:
• Training set is a set of pairs of input patterns with corresponding desired output
patterns.
• Each pair represents how the network is supposed to respond to a particular input.
• The network is trained to respond correctly to each input pattern from the training
set.
Test Set:
• The test set is the dataset on which the trained model is evaluated; it consists of
examples the model has not seen during training.
Step 4: Calculate the forward pass (what would be the output with the current weights)
Step 6: Adjust the weights (using the learning rate increment or decrement) according to
the backward pass (backward gradient propagation)
• Batch Gradient Descent involves calculations over the full training set at
each step as a result of which it is very slow on very large training data.
• In SGD, only one training example is used to compute the gradient and
update the parameters at each iteration.
• SGD is generally noisier than typical Gradient Descent; because of the randomness in
its descent, it usually takes a higher number of iterations to reach the minima.
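• A minimal sketch of the SGD update on a toy linear-regression problem: one randomly chosen training example per step, whereas Batch Gradient Descent would sum the gradient over the full training set. The data, learning rate, and number of steps are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=100)

w, lr = np.zeros(2), 0.01
for step in range(1000):
    i = rng.integers(len(X))      # pick ONE training example (this is the SGD part)
    err = X[i] @ w - y[i]
    w -= lr * err * X[i]          # noisy gradient step using that single example
print(w)                          # approximately [2, -1]
```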
Advantages:
• Speed: SGD is faster than other variants of Gradient Descent.
• Memory Efficiency: It is memory-efficient and can handle large datasets that
cannot fit into memory.
Disadvantages:
• Noisy updates: The updates in SGD are noisy and have a high variance
• Slow Convergence: SGD may require more iterations to converge to the
minimum
• Less accurate
Error Backpropagation:
2. Input is modeled using real weights W. The weights are usually randomly
selected.
3. Calculate the output for every neuron from the input layer, to the hidden layers,
to the output layer.
4. Calculate the error in the outputs: Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layer to adjust the weights such
that the error is decreased.
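• A minimal sketch of steps 2–5 for a one-hidden-layer network with sigmoid units and squared error; the layer sizes, toy data, learning rate, and omission of bias terms are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
T = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy desired outputs

W = rng.normal(size=(3, 4)) * 0.1   # input -> hidden weights, randomly selected
V = rng.normal(size=(4, 1)) * 0.1   # hidden -> output weights, randomly selected
lr = 0.5

for epoch in range(200):
    Z = sigmoid(X @ W)              # step 3: forward pass through the hidden layer
    Y = sigmoid(Z @ V)              # step 3: forward pass through the output layer
    err = Y - T                     # step 4: actual output minus desired output
    dV = Z.T @ (err * Y * (1 - Y))  # step 5: gradient at the output layer
    dW = X.T @ ((err * Y * (1 - Y)) @ V.T * Z * (1 - Z))  # step 5: gradient at the hidden layer
    V -= lr * dV / len(X)           # adjust weights so the error decreases
    W -= lr * dW / len(X)
print(np.mean(np.abs(Y - T)))       # mean absolute error after training
```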
• Static Back-Propagation
• Recurrent Back-Propagation
Static Back-Propagation
Recurrent Back-Propagation:
• In Recurrent Back-propagation, the activations are fed forward until a fixed value is
achieved, after which the error is computed and propagated backward.
Advantages:
• It does not have any parameters to tune except for the number of inputs.
• It is a standard process that usually works well.
Disadvantages:
Unit Saturation:
• The vanishing gradient problem is an issue that sometimes arises when training
machine learning algorithms through gradient descent.
• This most often occurs in neural networks that have several neuronal layers such
as in a deep learning system, but also occurs in recurrent neural networks.
• The key point is that the calculated partial derivatives used to compute the gradient
become smaller and smaller as one goes deeper into the network.
• Since the gradients control how much the network learns during training, if the
gradients are very small or zero, then little to no training can take place, leading to
poor predictive performance.
The Problem:
• As more layers using certain activation functions are added to neural networks,
the gradients of the loss function approach zero, making the network hard to
train.
Why:
• Certain activation functions, like the sigmoid function, squish a large input
space into a small output space between 0 and 1.
• For a shallow network with only a few layers that use these activations, this isn't
a big problem. However, when more layers are used, it can cause the gradient
to be too small for training to work effectively.
• Gradients of neural networks are found using backpropagation. Simply put,
backpropagation finds the derivatives of the network by moving layer by layer
from the final layer to the initial one.
• By the chain rule, the derivatives of each layer are multiplied down the network.
• However, when n hidden layers use an activation like the sigmoid function, n
small derivatives are multiplied together.
• Thus, the gradient decreases exponentially as we propagate down to the initial
layers.
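• A small numerical illustration of this chain-rule effect: the sigmoid derivative is at most 0.25, so multiplying n such derivatives shrinks the gradient exponentially. The depth of 20 layers and the input of 0 are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1 - s)             # maximum value is 0.25, reached at x = 0

n_layers = 20
grad = np.prod([sigmoid_deriv(0.0) for _ in range(n_layers)])
print(grad)                        # 0.25**20 ~ 9.1e-13: the gradient has vanished
```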
Solution:
• The simplest solution is to use other activation functions, such as ReLU, which
doesn't cause a small derivative.
• The residual connection directly adds the value at the beginning of the block,
x, to the end of the block (F(x) + x).
• This residual connection doesn't go through activation functions that
"squash" the derivatives, resulting in a higher overall derivative of the block.
• Finally, batch normalization layers can also resolve the issue.
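• A minimal sketch of the residual connection described above: the block output is F(x) + x, so the identity path carries a derivative of 1 even when F's gradient is tiny. Using a single sigmoid layer as F, with random weights, is purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) * 0.1

def residual_block(x):
    F = sigmoid(x @ W)            # the "squashing" transformation F(x)
    return F + x                  # skip connection adds x back at the end of the block

x = rng.normal(size=(1, 4))
print(residual_block(x))
```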
ReLU:
• Leaky ReLU: In this variant, the output is also linear (with a small slope) on the negative side.
Advantages:
Disadvantages:
• Since the derivative is zero for a ≤ 0, there is no further training if, for a hidden unit,
the weighted sum somehow becomes negative.
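• A minimal sketch of ReLU and Leaky ReLU; the negative-side slope of 0.01 is a common illustrative choice that keeps a small nonzero derivative, so units with negative weighted sums can still learn:

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)               # derivative is 0 for a <= 0

def leaky_relu(a, alpha=0.01):
    return np.where(a > 0, a, alpha * a)    # small linear slope on the negative side

a = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(a), leaky_relu(a))
```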
Hyperparameter Tuning:
GridSearchCV:
RandomizedSearchCV:
Batch Normalization:
Normalization:
Batch Normalization:
• Batch normalization is a process to make neural networks faster and more stable
through adding extra layers in a deep neural network.
• The new layer performs the standardizing and normalizing operations on the input
of a layer coming from a previous layer.
• A typical neural network is trained using a collected set of input data called a batch.
• A similar case can also be made for the hidden units, and this is the idea behind
batch normalization.
• For each batch or minibatch, for each hidden unit j we calculate the mean m_j and
standard deviation s_j of its values, and we first z-normalize: ẑ_j = (z_j − m_j) / s_j.
• We can then map these to have an arbitrary mean and scale using learnable parameters
γ_j and β_j, that is, z̃_j = γ_j ẑ_j + β_j, and then we apply the activation function.
• First, m_j and s_j are calculated anew for each batch, and we see immediately that
batch normalization is not meaningful with online learning or very small
minibatches.
• Second, γ_j and β_j are parameters that are initialized and updated (after each batch
or minibatch) using gradient descent, just like the connection weights. So they
require extra memory and computation.
• An internal covariate shift occurs when there is a change in the input distribution
to our network.
• When the input distribution changes, hidden layers try to learn to adapt to the
new distribution. This slows down the training process.
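• A minimal sketch of the batch-normalization computation described above: z-normalize each hidden unit over the minibatch using m_j and s_j, then rescale with γ_j and β_j. The minibatch values are illustrative, γ and β are initialized to 1 and 0 (in practice they are updated by gradient descent), and the small eps is a standard numerical-stability addition:

```python
import numpy as np

def batch_norm(H, gamma, beta, eps=1e-5):
    m = H.mean(axis=0)                 # per-unit batch mean m_j
    s = H.std(axis=0)                  # per-unit batch standard deviation s_j
    H_hat = (H - m) / (s + eps)        # z-normalize
    return gamma * H_hat + beta        # map to learnable mean/scale

rng = np.random.default_rng(0)
H = rng.normal(loc=5.0, scale=3.0, size=(32, 4))   # minibatch of hidden-unit values
gamma, beta = np.ones(4), np.zeros(4)
print(batch_norm(H, gamma, beta).mean(axis=0))     # ~0 per unit after normalization
```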
Advantages:
Overfitting:
• Overfitting means that the model is a good fit on the training data but a poor fit on
new, unseen (test) data.
• Overfitting is also a result of the model being too complex
• In other words, in such a scenario, the model has low bias and high variance
and is too complex. This is called overfitting.
• Hints
• Weight Decay
• Ridge Regression (or) L2 Regularization
• Lasso Regression (or) L1 Regularization.
• Dropout
Hints:
• Hints are properties of the target function that are known to us independent
of the training examples.
• The identity of the object does not change when it is translated, rotated, or
scaled.
• These are hints that can be incorporated into the learning process to make
learning easier.
Weight Decay:
Ridge regression:
• The Ridge regression technique is used to analyze models where the
variables may exhibit multicollinearity.
• It shrinks the coefficients of insignificant independent variables, though it does not
remove them completely. This type of regularization uses the L₂ norm.
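• A minimal sketch of the L₂ (ridge) penalty: the regularized solution adds λ‖w‖² to the squared-error objective, shrinking coefficients toward zero without removing them. The toy data, λ value, and use of the standard closed-form ridge solution are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + 0.1 * rng.normal(size=50)

lam = 1.0                                   # regularization strength (illustrative)
d = X.shape[1]
# Closed-form ridge solution: w = (X^T X + lam * I)^-1 X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_ridge)                              # coefficients shrunk toward zero, not set to zero
```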
Lasso regression:
Dropout:
Bias:
• Neural network bias can be defined as the constant which is added to the product
of features and weights
• It is used to offset the result.
• It helps the models to shift the activation function towards the positive or
negative side.
• "The process of receiving an input to produce some kind of output to make some
kind of prediction is known as Feed Forward."
• Feed Forward neural network is the core of many other important neural
networks such as convolution neural network.
Artificial Neuron: