Activation Function
• An activation function is a mathematical function that determines the output of a neural network.
• It is attached to each neuron in the network and determines whether the neuron should be
activated ("fired") or not.
• In neural networks, we usually use the Sigmoid activation function for binary classification tasks.
Multi-Class Classification
• In multi-class classification, each input will have only one output class. For example, if we are making an
animal classifier that classifies between Dog, Rabbit, Cat, and Tiger, it makes sense for only one of these
classes to be selected each time.
• By contrast, if we are building a model which predicts all the clothing articles a person is wearing, we can
use a multi-label classification model, since there can be more than one possible option at once.
Binary Step Function
Used in: Hidden layer and output layer for binary classification problems
Limitation
• It can't provide multi-valued outputs, i.e., it is not suitable for multi-class classification problems.
Signum Function
Used in: Hidden layer and output layer for binary classification problems
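Below is a minimal sketch of the signum function, assuming the standard definition that maps positive inputs to +1, negative inputs to -1, and zero to 0.

def signum(x):
    # Map any real input to one of three discrete values: +1, -1, or 0.
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0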
Linear Activation Function
• Also called no activation or identity function
• Activation is proportional to the input.
Range
-∞ to +∞
Limitation
• It is not possible to use backpropagation, as the derivative of the function is a constant and has
no relation to the input x.
• All the layers of the neural network will collapse into one if a linear activation is used: no matter
how many layers the network has, the last layer will still be a linear function of the first layer.
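As a minimal sketch (assuming the identity form f(x) = x), the function and its constant derivative can be written as:

def linear(x):
    # Identity activation: the output is simply the input.
    return x

def linear_derivative(x):
    # The derivative is a constant (1) with no relation to x, which is why
    # backpropagation cannot use it to learn anything useful.
    return 1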
Sigmoid / Logistic Activation Function
• This function takes any real value as input and outputs values in the range of 0 to 1.
• The larger the input (more positive), the closer the output value will be to 1.0, whereas the
smaller the input (more negative), the closer the output will be to 0.0, as shown below.
Advantages:
• It is commonly used for models where we have to predict a probability as the output. Since the
probability of anything exists only in the range of 0 to 1, sigmoid is the right choice because of its range.
• Used in hidden layers and in the output layer for classification; it gives the likelihood of a class
rather than a hard classification.
• The function is differentiable and provides a smooth gradient, preventing jumps in output values.
This is reflected in the S-shape of the sigmoid activation function.
Limitation
• The derivative of the function is f'(x) = sigmoid(x) * (1 - sigmoid(x)).
• The gradient values are only significant for inputs in the range -3 to 3; the graph gets much flatter in other regions.
This implies that for values greater than 3 or less than -3, the function will have very small gradients. As the
gradient approaches zero, the network ceases to learn and suffers from the vanishing gradient problem.
• The output of the logistic function is not symmetric around zero, so the outputs of all the neurons will be of the
same sign. This makes training the neural network more difficult and unstable.
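A minimal sketch of the sigmoid and the derivative quoted above, assuming the standard form f(x) = 1 / (1 + e^(-x)):

import math

def sigmoid(x):
    # Squash any real input into the range (0, 1).
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = sigmoid(x) * (1 - sigmoid(x)); it peaks at 0.25 when x = 0 and
    # approaches zero for |x| > 3, which is the source of vanishing gradients.
    s = sigmoid(x)
    return s * (1 - s)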
Tanh Function (Hyperbolic Tangent)
• Tanh function is very similar to the sigmoid/logistic activation function, and even has the
same S-shape with the difference in output range of -1 to 1. In Tanh, the larger the
input (more positive), the closer the output value will be to 1.0, whereas the smaller the
input (more negative), the closer the output will be to -1.0.
Advantages of using this activation function are:
• The output of the tanh activation function is Zero centered; hence we can easily map
the output values as strongly negative, neutral, or strongly positive.
• Used in hidden layers and in the output layer for classification; it gives the likelihood of a class
rather than a hard classification.
• Usually used in hidden layers of a neural network.
Limitation
• It also faces the problem of vanishing gradients, similar to the sigmoid activation function. In
addition, the gradient of the tanh function is much steeper than that of the sigmoid.
• Note: Although both sigmoid and tanh face vanishing gradient issue, tanh is
zero centered, and the gradients are not restricted to move in a certain direction.
Therefore, in practice, tanh nonlinearity is always preferred to sigmoid nonlinearity.
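A minimal sketch of tanh and its derivative, assuming the standard hyperbolic tangent; note the zero-centered output in (-1, 1):

import math

def tanh(x):
    # Zero-centered S-shaped curve with outputs in (-1, 1).
    return math.tanh(x)

def tanh_derivative(x):
    # 1 - tanh(x)^2: equal to 1 at x = 0 (steeper than sigmoid's peak of 0.25)
    # and close to 0 for large |x|, so vanishing gradients still occur.
    return 1 - math.tanh(x) ** 2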
ReLU Function
• ReLU stands for Rectified Linear Unit.
• Although it gives an impression of a linear function, ReLU has a derivative function and allows for
backpropagation while simultaneously making it computationally efficient.
• Used in hidden layers of CNNs or vision applications, and in the output layer where the dependent
variable is always positive.
• The main catch here is that the ReLU function does not activate all the neurons at the same time.
• The neurons will only be deactivated if the output of the linear transformation is less than 0.
def ReLU(x):
    # Output the input unchanged when it is positive; otherwise output 0,
    # which deactivates the neuron for negative inputs.
    if x > 0:
        return x
    else:
        return 0
• Since only a certain number of neurons are activated, the ReLU function is far more computationally
efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards the global minimum of the loss
function due to its linear, non-saturating property.
Leaky ReLU Function
Range
-∞ to +∞
• The advantages of Leaky ReLU are the same as those of ReLU; in addition, it does enable
backpropagation even for negative input values. The gradient is a small non-zero value, so there are no dead neurons.
• Mostly used in hidden layers of CNNs.
Limitations
• Predictions may not be consistent for negative input values
• The gradient for negative values is small, which makes learning the model parameters time-consuming.
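A minimal sketch of Leaky ReLU, assuming the commonly used fixed negative slope of 0.01:

def leaky_relu(x, alpha=0.01):
    # Pass positive inputs through unchanged; scale negative inputs by a small
    # fixed slope so the gradient is never exactly zero (no dead neurons).
    return x if x > 0 else alpha * x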
Parametric ReLU Function
• Solves the problem of the gradient becoming zero for the left half of the axis.
• This function provides the slope of the negative part of the function as an argument a. By performing
backpropagation, the most appropriate value of a is learnt.
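A minimal sketch of Parametric ReLU; here the negative slope a is passed in as an argument, with the understanding that in a real network it is a learnable parameter updated by backpropagation:

def parametric_relu(x, a):
    # Same shape as Leaky ReLU, but the negative slope 'a' is learnt during
    # training instead of being fixed in advance.
    return x if x > 0 else a * x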
Softmax Function
Suppose we have five output values of 0.8, 0.9, 0.7, 0.8, and 0.6, respectively. How can we move forward with them?
The above values don't make sense on their own, as the sum of all the class/output probabilities should be equal to 1.
• The Softmax function is described as a combination of multiple sigmoids.
• It is most commonly used as an activation function for the last layer of the neural network
in the case of multi-class classification.
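A minimal sketch of softmax applied to the five raw scores above (0.8, 0.9, 0.7, 0.8, 0.6); after normalization the outputs sum to 1 and can be read as class probabilities:

import math

def softmax(scores):
    # Exponentiate each score and divide by the total so the outputs form a
    # probability distribution that sums to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([0.8, 0.9, 0.7, 0.8, 0.6]))
# Roughly [0.21, 0.23, 0.19, 0.21, 0.17], which sums to 1.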
The activation function used in hidden layers is typically chosen based on the type of
neural network architecture.
• A neural network will almost always have the same activation function in all hidden layers. This
activation function should be differentiable so that the parameters of the network are learned in
backpropagation.
• ReLU is the most commonly used activation function for hidden layers.
• While selecting an activation function, you must consider the problems it might face: vanishing
and exploding gradients.
• Regarding the output layer, we must always consider the expected value range of the predictions.
If it can be any numeric value (as in the case of a regression problem), you can use the linear
activation function or ReLU.
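As a small illustrative summary of the guidelines above (the mapping below is just an example, not a fixed rule for every architecture):

# Typical output-layer activation choices, following the guidelines above.
OUTPUT_ACTIVATION = {
    "regression": "linear or ReLU",           # any numeric value
    "binary_classification": "sigmoid",       # probability in (0, 1)
    "multi_class_classification": "softmax",  # probabilities summing to 1
}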