Neural Networks and CNN

Artificial neural networks are machine learning models inspired by the human nervous system. They consist of interconnected neurons organized in layers that pass messages between each other. Deep learning is a subset of machine learning that uses artificial neural networks with many layers to learn representations of data without needing to be programmed with domain-specific features. Perceptrons are the simplest type of neural network, consisting of inputs, weights, a weighted sum function, and an activation function to produce an output. They can learn linearly separable patterns but multilayer perceptrons are needed for more complex problems.

Uploaded by

cn8q8nvnd5


Artificial Neural Networks and Deep Learning

Artificial neural networks (briefly, "nets" or ANNs) represent a class of machine learning models
loosely inspired by studies about the central nervous systems of mammals. Each ANN is made up of
several interconnected "neurons," organized in "layers." Neurons in one layer pass messages to
neurons in the next layer (they "fire," in jargon terms) and this is how the network computes things.

What Is Deep Learning?

Deep learning is a subset of machine learning, under the broader umbrella of artificial intelligence.
However, deep learning has some key distinctions from traditional machine learning methods that
make it unique in the field.

In traditional machine learning techniques, most of the applied features need to be identified by a
domain expert in order to reduce the complexity of the data and make patterns more visible to the
learning algorithm. The biggest advantage of deep learning algorithms is that they try to learn
high-level features from data in an incremental manner. This reduces the need for domain expertise
and hand-crafted feature extraction.

When to use deep learning over other techniques?
• Deep learning tends to outperform other techniques when the dataset is large. With small
datasets, traditional machine learning algorithms are often preferable.
• Deep learning techniques need high-end infrastructure to train in reasonable time.
• When there is a lack of domain understanding for feature introspection, deep learning
techniques outshine others, as you have to worry less about feature engineering.
• Deep learning really shines on complex problems such as image classification,
natural language processing, and speech recognition.

Why deep learning is rising now:

Big data:
Availability of large datasets.

Easier collection and storage of data.

There has been exponential growth of data in the last few years.

[Figure: How machine learning techniques scale with the amount of data]

Hardware:
Increased performance of computer processors (CPUs & GPUs) and larger storage media for
retaining huge training datasets.

Software:
Today, deep learning frameworks are everywhere. Many companies open source the entire
pipeline, from model generation and debugging all the way through to operating in production. By
open sourcing these products they gain valuable insights from the open source community and
businesses, who can further develop and implement interesting new neural network types.
Until recently, neural networks were limited by computing power and thus were limited in
complexity. However, advancements in big data analytics have permitted larger, more sophisticated
neural networks, allowing computers to observe, learn, and react to complex situations, in some
tasks faster than humans. Deep learning has aided image classification, language translation, and
speech recognition. It can be applied to a wide range of pattern recognition problems with little
human intervention.

Rise of Artificial Neurons (Based on Biological Neuron)

Warren McCulloch and Walter Pitts published the first concept of a simplified brain cell
in 1943. This model, called the McCulloch-Pitts (MCP) neuron, later became a building block of
artificial neural networks. They described such a neuron as a simple logic gate with binary outputs.

Multiple signals arrive at the dendrites and are then integrated into the cell body, and, if the
accumulated signal exceeds a certain threshold, an output signal is generated that will be passed on
by the axon.

Biological neuron
A biological neuron consists of one cell body, multiple dendrites, and a single axon. The connections
between neurons are known as synapses. The neuron receives stimuli on the dendrites, and in cases
of sufficient stimuli, the neuron fires (also known as getting activated or excited) and outputs
stimulus on its axon, which is transmitted to other neurons that have synaptic connections to the
excited neuron. Synaptic signals can be excitatory or inhibitory; that is, some signals can prevent a
neuron from firing instead of causing it to fire.

What Is an Artificial Neuron?
An artificial neuron is a mathematical function based on a model of biological neurons, where each
neuron takes inputs, weighs them separately, sums them up and passes this sum through a
nonlinear function to produce output.

Artificial Neuron at a Glance

The artificial neuron has the following characteristics:

• A neuron is a mathematical function modeled on the working of biological neurons
• It is an elementary unit in an artificial neural network
• One or more inputs are separately weighted
• Inputs are summed and passed through a nonlinear function to produce output
• Every neuron holds an internal state called the activation signal
• Each connection link carries information about the input signal
• Every neuron is connected to other neurons via connection links

Perceptron
The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. It is
based on a slightly different artificial neuron called a threshold logic unit (TLU), or sometimes a
linear threshold unit (LTU). The inputs and output are numbers (instead of binary on/off values), and
each input connection is associated with a weight. The TLU computes a weighted sum of its inputs,
$z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = \mathbf{x}^\top \mathbf{w}$, then applies a step function to that sum and
outputs the result: $h_{\mathbf{w}}(\mathbf{x}) = \operatorname{step}(z)$, where $z = \mathbf{x}^\top \mathbf{w}$.
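
The weighted sum and step function described above can be sketched in a few lines of plain Python. The names `step` and `tlu`, the optional `bias` argument, and the convention that the step function returns 1 at exactly z = 0 are assumptions made for illustration; the text does not fix them.

```python
def step(z: float) -> int:
    """Heaviside step (assumed convention): 1 when z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def tlu(x, w, bias=0.0):
    """Threshold logic unit: weighted sum of the inputs followed by a step."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return step(z)


# Illustrative weights (hypothetical): with w = [1, 1] and bias = -1.5
# the unit behaves like a logical AND of two binary inputs.
and_weights = [1.0, 1.0]
and_bias = -1.5
and_outputs = [tlu(x, and_weights, and_bias) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
```

Only the input (1, 1) pushes the weighted sum above zero here, so the unit fires for that input alone.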

This algorithm enables neurons to learn and processes elements in the training set one at a time. A
perceptron consists of four parts: input values, weights and a bias, a weighted sum (net input
function), and an activation function.

There are two types of perceptrons: single layer and multilayer.

• Single layer: single layer perceptrons can learn only linearly separable patterns
• Multilayer: multilayer perceptrons, or feedforward neural networks with two or more layers,
have greater processing power

Perceptron Learning Rule

The Perceptron Learning Rule is an algorithm that automatically learns the weight coefficients
from labeled examples. The perceptron receives multiple input signals; if the weighted sum of the
input signals exceeds a certain threshold, it outputs a signal, otherwise it does not. In the context of
supervised learning and classification, this can then be used to predict the class of a sample.
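
A minimal sketch of how such a rule can learn weights, assuming the classic perceptron update (weight += learning rate × error × input), which the text names but does not spell out. The AND-gate data, the learning rate, the epoch count, and the function names are all illustrative choices.

```python
def predict(weights, bias, x):
    """Fire (1) when the weighted sum plus bias is positive, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


def train_perceptron(samples, labels, lr=1.0, epochs=20):
    """Process one training sample at a time and nudge the weights by the error."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Linearly separable toy problem: logical AND of two binary inputs.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_and = [0, 0, 0, 1]

w, b = train_perceptron(X, y_and)
predictions = [predict(w, b, x) for x in X]
```

Because AND is linearly separable, the updates stop once every sample is classified correctly; on a non-separable problem such as XOR the same loop would keep changing the weights without settling.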

The Perceptron algorithm learns the weights for the input signals in order to draw a linear decision
boundary. What if we wanted the outputs to fall into a certain range, say 0 to 1? We can do this by
using an activation function. An activation function is a function that converts the input given (the
input, in this case, would be the weighted sum) into a certain output based on a set of rules.

The original Perceptron was designed to take a number of binary inputs, and produce one binary
output (0 or 1).

The idea was to use different weights to represent the importance of each input, and that the sum
of the values should be greater than a threshold value before making a decision like true or false (0
or 1).

Example
Imagine a perceptron (in your brain).

The perceptron tries to decide if you should go to a musical show.

Is the artist good? Is the weather good? Will a friend come? Is food served? Is alcohol served?
What weights should these facts have?

Criteria             Input          Weight
Artist is Good       x1 = 0 or 1    w1 = 0.7
Weather is Good      x2 = 0 or 1    w2 = 0.6
Friend will Come     x3 = 0 or 1    w3 = 0.5
Food is Served       x4 = 0 or 1    w4 = 0.3
Alcohol is Served    x5 = 0 or 1    w5 = 0.4

1. Set a threshold value:

Threshold = 1.5

2. Multiply each input by its weight:

x1 * w1 = 1 * 0.7 = 0.7
x2 * w2 = 0 * 0.6 = 0
x3 * w3 = 1 * 0.5 = 0.5
x4 * w4 = 0 * 0.3 = 0
x5 * w5 = 1 * 0.4 = 0.4

3. Sum all the results:

0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the weighted sum)

4. Activate the output:

Return true if the sum > 1.5 ("Yes, I will go to the concert"). Here 1.6 > 1.5, so the output is true.
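
The four steps above map directly onto a few lines of Python. The inputs, weights, and threshold are the ones from the example; the variable and function names are hypothetical.

```python
# Inputs from the example: artist good, weather bad, friend comes,
# no food, alcohol served.
inputs = [1, 0, 1, 0, 1]
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5


def perceptron_decision(inputs, weights, threshold):
    """Return the weighted sum and whether it exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum, weighted_sum > threshold


weighted_sum, go_to_concert = perceptron_decision(inputs, weights, threshold)
```

Flipping the third input to 0 (the friend does not come) lowers the weighted sum to about 1.1, which is below the threshold, so the decision changes to false.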

Perceptron Terminology

• Perceptron inputs
• Node values
• Node weights
• Activation function

Perceptron Inputs

Perceptron inputs are called nodes.

The nodes have both a value and a weight.

Node Values

In the example above, the node values are: 1, 0, 1, 0, 1. The binary input values (0 or 1) can be
interpreted as (no or yes) or (false or true).

Node Weights
Weights show the strength of each node. In the example above, the node weights are: 0.7, 0.6, 0.5,
0.3, 0.4.

The Activation Function

Activation functions introduce non-linear properties to our network, allowing it to learn more
complex functions. The main purpose of an activation function is to convert an input signal of a node
in an artificial neural network to an output signal. This output signal is then used as an input in the
next layer in the stack.

Suppose we have a neural network working without activation functions. In that case, every
neuron only performs a linear transformation on the inputs using the weights and biases.
It then doesn’t matter how many hidden layers we attach to the neural network; the whole network
behaves like a single linear layer, because the composition of two linear functions is itself a linear function.
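
The collapse of stacked linear layers can be checked numerically. The sketch below uses plain Python lists to stay dependency-free; the weight values, helper names, and layer sizes are made up for illustration.

```python
def matvec(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Matrix-matrix product for nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Two linear layers with no activation function (illustrative values).
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # 2 inputs -> 3 hidden units
b1 = [1.0, 0.0, -2.0]
W2 = [[2.0, 0.0, 1.0], [-1.0, 1.0, 0.0]]    # 3 hidden units -> 2 outputs
b2 = [0.5, -0.5]


def two_layers(x):
    hidden = [h + b for h, b in zip(matvec(W1, x), b1)]
    return [o + b for o, b in zip(matvec(W2, hidden), b2)]


# Equivalent single layer: W = W2 W1 and b = W2 b1 + b2.
W_combined = matmul(W2, W1)
b_combined = [o + b for o, b in zip(matvec(W2, b1), b2)]


def one_layer(x):
    return [o + b for o, b in zip(matvec(W_combined, x), b_combined)]


x = [3.0, -2.0]
same_output = two_layers(x) == one_layer(x)
```

Inserting any nonlinear activation between the two layers breaks this equivalence, which is what lets deeper networks represent more complex functions.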

Types of activation functions

Sigmoid / Logistic Activation Function

This function takes any real value as input and outputs values in the range of 0 to 1.

The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller
the input (more negative), the closer the output will be to 0.0.

Mathematically it can be represented as:

$f(x) = \frac{1}{1 + e^{-x}}$

Here’s why the sigmoid/logistic activation function is one of the most widely used functions:
• It is commonly used for models where we have to predict a probability as an output. Since
a probability always lies between 0 and 1, sigmoid is a natural choice
because of its range.
• The function is differentiable and provides a smooth gradient, preventing jumps in
output values. This is reflected in the S-shape of the sigmoid activation function.

The limitations of the sigmoid function are discussed below:

• The derivative of the function is f'(x) = sigmoid(x) * (1 - sigmoid(x)).
The gradient values are only significant in the range -3 to 3, and the curve gets much
flatter in other regions. This implies that for values greater than 3 or less than -3, the
function will have very small gradients. As the gradient value approaches zero, the network
ceases to learn and suffers from the vanishing gradient problem.
• The output of the logistic function is not symmetric around zero, so the outputs of all the
neurons have the same (positive) sign. This makes the training of the neural network more
difficult and unstable.
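
To make the vanishing gradient concrete, here is a small sketch that evaluates the sigmoid and its derivative at a few points. The two-branch form of `sigmoid` is an implementation choice (it avoids overflow for large negative inputs) and the sample points are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic function, written to stay numerically stable for any x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The gradient peaks at 0.25 for x = 0 and shrinks quickly away from zero.
gradients = {x: sigmoid_grad(x) for x in (0, 3, 6, 10)}
```

The gradient at x = 3 is already below 0.05, and at x = 10 it is below 0.0001, so a neuron stuck in that region barely updates its weights.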

Tanh Activation Function (Hyperbolic Tangent)

The tanh function is very similar to the sigmoid/logistic activation function, and even has the same
S-shape, with the difference that its output range is -1 to 1. In tanh, the larger the input (more positive), the
closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the
output will be to -1.0.

Mathematically it can be represented as:

$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

Advantages of using this activation function are:

• The output of the tanh activation function is zero-centered; hence we can easily map the
output values as strongly negative, neutral, or strongly positive.
• It is usually used in hidden layers of a neural network, as its values lie between -1 and 1; therefore,
the mean for the hidden layer comes out to be 0 or very close to it. This helps in centering the
data and makes learning for the next layer much easier.

ReLU Activation Function

ReLU stands for Rectified Linear Unit.

Although it gives an impression of a linear function, ReLU has a derivative function and allows for
backpropagation while simultaneously making it computationally efficient.

The main catch here is that the ReLU function does not activate all the neurons at the same time.

The neurons will only be deactivated if the output of the linear transformation is less than 0.

Mathematically it can be represented as:

$f(x) = \max(0, x)$

The advantages of using ReLU as an activation function are as follows:

• Since only a certain number of neurons are activated, the ReLU function is far more
computationally efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards a minimum of the
loss function due to its linear, non-saturating property.

The limitations faced by this function are:

• The dying ReLU problem: for negative inputs the gradient is zero, so the affected neurons
receive no weight updates during backpropagation and can become permanently inactive.

Leaky ReLU Activation Function

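Leaky ReLU is an improved version of the ReLU function that addresses the dying ReLU problem, as it has a
small positive slope in the negative area.

Mathematically it can be represented as:

$f(x) = \max(\alpha x, x)$, where $\alpha$ is a small positive constant (for example, 0.01).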
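
A short sketch contrasting ReLU and Leaky ReLU and their gradients. The slope `alpha=0.01` is a commonly used default assumed here, not a value taken from the text, and the gradient at exactly x = 0 is set to the negative-side value by convention.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Zero gradient for non-positive inputs: the source of "dying" neurons.
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    # A small but non-zero gradient keeps negative-side neurons trainable.
    return 1.0 if x > 0 else alpha


negative_input = -2.0
relu_signal = relu_grad(negative_input)         # no learning signal
leaky_signal = leaky_relu_grad(negative_input)  # small learning signal
```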

Softmax Activation Function

Before exploring the ins and outs of the Softmax activation function, we should focus on its building
block—the sigmoid/logistic activation function that works on calculating probability values.

The output of the sigmoid function was in the range of 0 to 1, which can be thought of as probability.

This function faces certain problems. Let’s suppose we have five output values of 0.8, 0.9, 0.7, 0.8,
and 0.6, respectively. How can we move forward with it?

The answer is: We can’t.

The above values don’t make sense as the sum of all the classes/output probabilities should be equal
to 1.

The Softmax function is described as a combination of multiple sigmoids.

It calculates the relative probabilities. Similar to the sigmoid/logistic activation function, the SoftMax
function returns the probability of each class.

It is most commonly used as an activation function for the last layer of the neural network in the
case of multi-class classification.

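Mathematically it can be represented as:

$\operatorname{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, for $i = 1, \dots, K$, where $K$ is the number of classes.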
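
A minimal softmax sketch in plain Python. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that does not change the result; the example scores are arbitrary.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # stability: avoids overflow in exp for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
```

Unlike five independent sigmoid outputs, these values always add up to 1, so they can be read as relative class probabilities.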

Multi-layer ANN
A Perceptron is simply composed of a single layer of neurons, with each neuron connected to all the
inputs. When all the neurons in a layer are connected to every neuron in the previous layer (i.e., its
input neurons), the layer is called a fully connected layer, or a dense layer. The inputs of the
Perceptron are fed to special passthrough neurons called input neurons: they output whatever input
they are fed. All the input neurons form the input layer. Moreover, an extra bias feature is generally
added ($x_0 = 1$): it is typically represented using a special type of neuron called a bias neuron, which
outputs 1 all the time.

A Multilayer Perceptron has input and output layers, and one or more hidden layers with many
neurons stacked together. And while in the Perceptron the neuron must have an activation function
that imposes a threshold, such as a step function, neurons in a Multilayer Perceptron can use other
activation functions, such as ReLU, sigmoid, or tanh.

Note that the input and the output layers are visible from outside, while all the other layers in the
middle are hidden, hence the name hidden layers.

In a typical network diagram, the circles are nodes or neurons, which apply their functions to the data,
and the edges/lines connecting them carry the weights and the information being passed along.

Each column represents a layer. The first layer, which receives your data, is the input layer. The hidden layers
are located between the input layer and the output layer.

If you have one or a few hidden layers, then you have a shallow neural network. If you have many
hidden layers, then you have a deep neural network.

Multilayer Perceptron falls under the category of feedforward algorithms, because inputs are
combined with the initial weights in a weighted sum and subjected to the activation function, just
like in the Perceptron. But the difference is that each linear combination is propagated to the next
layer.

Each layer is feeding the next one with the result of their computation, their internal representation
of the data. This goes all the way through the hidden layers to the output layer.

A neural network is usually described as having different layers. The first layer is the input layer; it
picks up the input signals and passes them to the next layer. The next layer does all kinds of
calculations and feature extractions; it’s called the hidden layer. Often, there will be more than one
hidden layer. And finally, there’s an output layer, which delivers the final result.
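
The layer-by-layer flow described above can be sketched as a forward pass through one hidden layer and one output layer. The weights, layer sizes, and choice of ReLU for the hidden layer and sigmoid for the output are illustrative assumptions, not values from the text.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters: 3 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]


def forward(x):
    hidden = dense(x, hidden_weights, hidden_biases, relu)              # hidden layer
    return dense(hidden, output_weights, output_biases, sigmoid)        # output layer


prediction = forward([1.0, 2.0, 3.0])
```

Each call to `dense` feeds its result to the next layer, which is all that "feedforward" means here; training would additionally adjust the weights by backpropagation.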

Overfitting in ANN
Overfitting is a condition that occurs when a machine learning or deep neural network model
performs significantly better on training data than it does on new data.

Overfitting is the result of an ML model placing importance on relatively unimportant information in
the training data. When an ML model has been overfit, it can't make accurate predictions about new
data because it can't distinguish extraneous (noisy) data from essential data that forms a pattern.

Regularization
Regularization refers to a set of different techniques that lower the complexity of a neural network
model during training, and thus help prevent overfitting.

Dropout

Dropout is a technique where, during each iteration of gradient descent, we drop a set of randomly
selected nodes. This means that we ignore some nodes randomly as if they did not exist. Each neuron
is kept with probability q and dropped with probability 1-q. The value of q may be
different for each layer in the neural network. A keep probability of 0.5 for the hidden layers, and
close to 1 (little or no dropout) for the input layer, works well on a wide range of tasks. The idea
behind dropout is as follows: in a neural network without dropout regularization, neurons develop
co-dependency amongst each other, which leads to overfitting.
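
A sketch of dropout applied to one layer's activations, using the keep probability q from the text (`keep_prob` below). Rescaling the kept activations by 1/q, often called inverted dropout, is a common variant assumed here so that no adjustment is needed at prediction time; the text itself does not specify it.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Keep each activation with probability keep_prob, zero it otherwise."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training or keep_prob == 1.0:
        return list(activations)  # dropout is disabled at prediction time
    # Scale survivors by 1/keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # seeded only to make the sketch repeatable
hidden = [1.0] * 10
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
```

A fresh random mask is drawn on every training iteration, so no neuron can rely on a specific partner always being present.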

Early Stopping

A problem with training neural networks is in the choice of the number of training epochs to use.

Too many epochs can lead to overfitting of the training dataset, whereas too few may result in an
underfit model. Early stopping is a method that allows you to specify an arbitrarily large number of
training epochs and stop training once the model performance stops improving on a holdout
validation dataset.

The idea behind early stopping is intuitive: we stop training when the error starts to increase. Here,
by error, we mean the error measured on validation data, which is the part of the training data used for
tuning hyper-parameters. In this case, the hyper-parameter is the stopping criterion.
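
One common way to implement this stopping criterion is a "patience" counter: stop once the validation loss has failed to improve for a fixed number of consecutive epochs. The function name, the patience value, and the validation losses below are invented for illustration; in real training each loss would be computed after the corresponding epoch.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss, stopped_epoch) for a sequence of losses."""
    best_loss = float("inf")
    best_epoch = -1
    stopped_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error has stopped improving
    return best_epoch, best_loss, stopped_epoch


# Made-up validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, best_loss, stopped_epoch = early_stopping(val_losses)
```

With these numbers training stops at epoch 5, and the weights saved at epoch 3 (the lowest validation loss) would be the ones to keep.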

Data Augmentation

Data augmentation is a technique that can be used to artificially expand the size of a training set by
creating modified data from the existing data. This is achieved by applying transformations to
the existing data, such as:

• Cropping
A section of the image is selected, cropped, and then resized to the original image size.
• Flipping
The image is flipped horizontally or vertically. Flipping rearranges the pixels while
preserving the features of the image. Vertical flipping is not meaningful for some photos, but
it can be useful in cosmology or for microscopic photos.
• Rotation
The image is rotated by an angle between 0 and 360 degrees. Every rotated image is
seen as a unique sample by the model.
• Scaling
The image is scaled outward or inward. An object in the new image can be smaller or bigger
than in the original image.
• Translation
The image is shifted along the x-axis or y-axis, so the neural network learns to look
everywhere in the image to find the object.
• Brightness
The brightness of the image is changed, so the new image is darker or lighter. This
technique allows the model to recognize images at different lighting levels.
• Contrast
The contrast of the image is changed, so the new image differs in luminance and
colour.
• Colour Augmentation
The colour of the image is changed by assigning new pixel values, for example by converting
the image to grayscale.
• Padding
The image is padded with a given value on all sides.
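
A few of the transformations above, sketched on a tiny grayscale image stored as a nested list. The function names and pixel values are hypothetical, and only a 90-degree rotation is shown because arbitrary angles need interpolation, which real projects usually leave to an image library.

```python
def flip_horizontal(image):
    return [row[::-1] for row in image]


def flip_vertical(image):
    return [list(row) for row in image[::-1]]


def rotate_90_clockwise(image):
    return [list(row) for row in zip(*image[::-1])]


def pad(image, width, value=0):
    """Surround the image with `width` pixels of `value` on all sides."""
    padded_width = len(image[0]) + 2 * width
    top = [[value] * padded_width for _ in range(width)]
    bottom = [[value] * padded_width for _ in range(width)]
    middle = [[value] * width + list(row) + [value] * width for row in image]
    return top + middle + bottom


def adjust_brightness(image, delta, max_value=255):
    """Add `delta` to every pixel, clipping to the valid range."""
    return [[min(max_value, max(0, p + delta)) for p in row] for row in image]


image = [[10, 20, 30],
         [40, 50, 60]]

augmented = [
    flip_horizontal(image),
    flip_vertical(image),
    rotate_90_clockwise(image),
    pad(image, 1),
    adjust_brightness(image, 220),
]
```

Each transformed copy keeps the original label, so one labeled image yields several training samples.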
Transfer Learning

Transfer learning is when elements of a pre-trained model are reused in a new machine learning
model. A neural network model is first trained on a problem similar to the problem that is being
solved. One or more layers from the trained model are then used in a new model trained on the
problem of interest. This means reusing the weights in one or more layers from a pre-trained
network model in a new model and either keeping the weights fixed, fine-tuning them, or adapting
the weights entirely when training the model.

DEEP NEURAL NETWORKS
A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output
layers. Similar to shallow ANNs, DNNs can model complex non-linear functions. The main purpose of a
neural network is to receive a set of inputs, perform progressively complex calculations on them,
and give output to solve real world problems like classification. We have an input, an output, and a
flow of sequential data in a deep network.

Main Types of Deep Neural Networks

The following types of deep neural networks are popularly used today:

• Multi-Layer Perceptrons (MLP)
• Convolutional Neural Networks (CNN)
• Recurrent Neural Networks (RNN)

Convolutional Neural Networks (CNN)

• A CNN is a feed-forward neural network, which is widely used for image recognition.
• A CNN represents the input data in the form of multidimensional arrays.
• A CNN processes the input image region by region; each such region is known as a receptive field.
• It assigns weights to each neuron based on the significance of its receptive field.

How do convolutional neural networks work?

Convolutional neural networks are distinguished from other neural networks by their superior
performance with image, speech, or audio signal inputs. They have three main types of layers, which
are:

• Convolutional layer
• Pooling layer
• Fully-connected (FC) layer

The convolutional layer is the first layer of a convolutional network. While convolutional layers can
be followed by additional convolutional layers or pooling layers, the fully-connected layer is the final
layer. With each layer, the CNN increases in its complexity, identifying greater portions of the image.
Earlier layers focus on simple features, such as colors and edges. As the image data progresses
through the layers of the CNN, it starts to recognize larger elements or shapes of the object until it
finally identifies the intended object.

Convolutional Layer
This is the very first layer in the CNN that is responsible for the extraction of the different features
from the input images. The convolution mathematical operation is done between the input image
and a filter of a specific size MxM in this layer.

The convolutional layer is the core building block of a CNN, and it is where the majority of
computation occurs. It requires a few components, which are input data, a filter, and a feature map.
Let’s assume that the input will be a color image, which is made up of a matrix of pixels in 3D. This
means that the input will have three dimensions—a height, width, and depth—which correspond to
RGB in an image. We also have a feature detector, also known as a kernel or a filter, which will move
across the receptive fields of the image, checking if the feature is present. This process is known as a
convolution.

The feature detector is a two-dimensional (2-D) array of weights, which represents part of the
image. While they can vary in size, the filter size is typically a 3x3 matrix; this also determines the
size of the receptive field. The filter is then applied to an area of the image, and a dot product is
calculated between the input pixels and the filter. This dot product is then fed into an output array.
Afterwards, the filter shifts by a stride, repeating the process until the kernel has swept across the
entire image. The final output from the series of dot products from the input and the filter is known
as a feature map, activation map, or a convolved feature.
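
The sliding dot product described above can be sketched for a single-channel image in plain Python. As in most CNN libraries, the kernel is applied without flipping (strictly a cross-correlation); the function name, the 4x4 image, and the 2x2 kernel are illustrative assumptions.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and collect one dot product per position."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride  # corner of the receptive field
            row.append(sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h) for b in range(k_w)
            ))
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [0, 1, 2, 3]]
kernel = [[1, 0],
          [0, 1]]  # adds the top-left and bottom-right pixel of each patch

stride1_map = convolve2d(image, kernel, stride=1)
stride2_map = convolve2d(image, kernel, stride=2)
```

With stride 1 the 4x4 input yields a 3x3 feature map; with stride 2 the kernel skips every other position and the map shrinks to 2x2.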

Note that each output value in the feature map does not have to connect
to each pixel value in the input image. It only needs to connect to the receptive field, where the filter
is being applied. Since the output array does not need to map directly to each input value,
convolutional (and pooling) layers are commonly referred to as “partially connected” layers.

There are three hyperparameters that affect the volume size of the output and need to be set
before the training of the neural network begins:

1. The number of filters affects the depth of the output. For example, three distinct filters
would yield three different feature maps, creating a depth of three.
2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix.
While stride values of two or greater are rare, a larger stride yields a smaller output.
3. Zero-padding is usually used when the filters do not fit the input image. This sets all
elements that fall outside of the input matrix to zero, producing a larger or equally sized
output. There are three types of padding:
• Valid padding: This is also known as no padding. In this case, the last convolution is
dropped if dimensions do not align.
• Same padding: This padding ensures that the output layer has the same size as the
input layer.
• Full padding: This type of padding increases the size of the output by adding zeros to
the border of the input.
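
The effect of kernel size, stride, and padding on the output size can be captured with the standard formula floor((W - F + 2P) / S) + 1, which is not stated in the text but is consistent with it. The function name and the example sizes are assumptions for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension: floor((W - F + 2P) / S) + 1."""
    # Floor division drops a final position that does not fully fit,
    # matching the "valid" behaviour described above.
    return (input_size + 2 * padding - kernel_size) // stride + 1


valid_size = conv_output_size(9, 3)                # no padding
same_size = conv_output_size(9, 3, padding=1)      # output matches input
full_size = conv_output_size(9, 3, padding=2)      # padding = kernel_size - 1
strided_size = conv_output_size(9, 3, stride=2)    # larger stride, smaller output
```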

After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU) transformation to the
feature map, introducing nonlinearity to the model.

As we mentioned earlier, another convolution layer can follow the initial convolution layer. When
this happens, the structure of the CNN can become hierarchical as the later layers can see the pixels
within the receptive fields of prior layers. As an example, let’s assume that we’re trying to
determine if an image contains a bicycle. You can think of the bicycle as a sum of parts. It is
comprised of a frame, handlebars, wheels, pedals, et cetera. Each individual part of the bicycle
makes up a lower-level pattern in the neural net, and the combination of its parts represents a
higher-level pattern, creating a feature hierarchy within the CNN.
Ultimately, the convolutional layer converts the image into numerical values, allowing the neural
network to interpret and extract relevant patterns.

Example

The kernel is a matrix of weights.

Consider a 3x3 kernel that detects vertical lines.

Let us imagine a 9x9 input image of a plus sign.

This has two kinds of lines, horizontal and vertical, and a crossover.

Imagine we want to test the vertical line detector kernel on the plus sign image. To perform the
convolution, we slide the convolution kernel over the image. At each position, we multiply each
element of the convolution kernel by the element of the image that it covers, and sum the results.

Since the kernel has width 3, it can only be positioned at 7 different positions horizontally in an
image of width 9. So the end result of the convolution operation on an image of size 9x9 with a 3x3
convolution kernel is a new image of size 7x7.

First the kernel is placed in the top left corner, and each element of the
kernel is multiplied by the image element it covers. Since these
values are all 0, the result for that cell is 0 in the top left of the output matrix.

Now consider a position where the kernel covers part of a vertical
line. When the kernel is placed over this vertical line, it matches and returns 3.

Recall that this convolution kernel is a vertical line detector. For the parts of the original image which
contained a vertical line, the kernel has returned a value 3, whereas it has returned a value of 1 for
the horizontal line, and 0 for the empty areas of the image.
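
The plus-sign example can be reproduced in a few lines. The kernel values (a single column of ones) and the exact image (one-pixel-wide lines through the centre, spanning the full width and height) are assumptions, since the text does not give them; they are chosen so that the outputs match the values 3, 1, and 0 reported above.

```python
def convolve2d(image, kernel):
    """Valid convolution (no padding, stride 1) of a square image and kernel."""
    k = len(kernel)
    size = len(image) - k + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
         for j in range(size)]
        for i in range(size)
    ]


# ASSUMPTION: a vertical line detector with ones down the middle column.
vertical_kernel = [[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]

# ASSUMPTION: 9x9 plus sign made of row 4 and column 4.
plus_image = [[1 if r == 4 or c == 4 else 0 for c in range(9)] for r in range(9)]

feature_map = convolve2d(plus_image, vertical_kernel)
response_values = sorted({v for row in feature_map for v in row})
```

The 7x7 feature map holds 3 wherever the kernel sits on the vertical line, 1 where it crosses only the horizontal line, and 0 over empty areas.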

In practice, a convolution kernel contains both weights and biases, similar to the formula for linear
regression. So an input pixel is multiplied by the weight and then the bias is added.

Pooling Layer
The pooling layer is used to reduce the size of the representations and to speed up calculations, as well
as to make some of the features it detects a bit more robust.

Pooling layers, also known as downsampling layers, conduct dimensionality reduction, reducing the
number of parameters in the input. Similar to the convolutional layer, the pooling operation sweeps
a filter across the entire input, but the difference is that this filter does not have any weights.
Instead, the kernel applies an aggregation function to the values within the receptive field,
populating the output array. There are two main types of pooling:

• Max pooling: As the filter moves across the input, it selects the pixel with the maximum
value to send to the output array. As an aside, this approach tends to be used more often
compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the average value within
the receptive field to send to the output array.

While a lot of information is lost in the pooling layer, it also has a number of benefits for the CNN:
pooling layers help to reduce complexity, improve efficiency, and limit the risk of overfitting.
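
Both pooling types can be sketched with one function that sweeps a weightless window over a feature map. The 2x2 window with stride 2 is a common choice assumed here, and the feature map values are made up.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling over size x size windows."""
    if mode == "max":
        aggregate = max
    elif mode == "average":
        aggregate = lambda window: sum(window) / len(window)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size)
            ]
            row.append(aggregate(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 6]]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
```

The 4x4 map shrinks to 2x2 in both cases, which is the dimensionality reduction the layer is meant to provide.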

Fully-Connected Layer
The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel values of
the input image are not directly connected to the output layer in partially connected layers.
However, in the fully-connected layer, each node in the output layer connects directly to every node in
the previous layer.

This layer performs the task of classification based on the features extracted through the previous
layers and their different filters. While convolutional and pooling layers tend to use ReLU functions,
FC layers usually leverage a softmax activation function to classify inputs appropriately, producing a
probability from 0 to 1.

[Figure: Convolutional neural network representation]

[Figure: Example of a CNN to detect handwritten numbers]

Commonly used CNN architectures

Almost all CNN architectures follow the same general design principles of successively applying
convolutional layers to the input, periodically downsampling the spatial dimensions while increasing
the number of feature maps.

• These architectures serve as general design guidelines which machine learning practitioners
can adapt to solve various computer vision tasks.
• These architectures serve as rich feature extractors which can be used for image
classification, object detection, image segmentation, and many other more advanced tasks.
Classic network architectures (included for historical purposes)

• LeNet-5
• AlexNet
• VGG 16

Modern network architectures

• Inception
• ResNet
• ResNeXt
• DenseNet

