Unit 5: Activation Function

ACTIVATION FUNCTION

● An activation function in a neural network defines how the weighted sum of the input is
transformed into an output from a node or nodes in a layer of the network.
● Sometimes the activation function is called a “transfer function.” If the output range of the
activation function is limited, then it may be called a “squashing function.” Many activation
functions are nonlinear and may be referred to as the “nonlinearity” in the layer or the network
design.
● The choice of activation function has a large impact on the capability and performance of the
neural network, and different activation functions may be used in different parts of the model.
● Technically, the activation function is applied within or after the internal processing of each node in
the network, although networks are typically designed so that all nodes in a layer use the same
activation function.
● A network may have three types of layers: input layers that take raw input from the domain,
hidden layers that take input from another layer and pass output to another layer, and output
layers that make a prediction.
● All hidden layers typically use the same activation function. The output layer typically uses a
different activation function from the hidden layers, depending on the type of prediction
required by the model.
● Activation functions are also typically differentiable, meaning the first-order derivative can be
calculated for a given input value. This is required given that neural networks are typically trained
using the backpropagation of error algorithm that requires the derivative of prediction error in
order to update the weights of the model.
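To make the bullets above concrete, here is a minimal sketch (not part of the original notes) of a single node computing a weighted sum of its inputs and passing it through an activation function; the input values, weights, bias, and the choice of sigmoid are illustrative assumptions.

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights, and bias for one node.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2

z = np.dot(w, x) + b   # weighted sum of the input
a = sigmoid(z)         # activation transforms the sum into the node's output
print(z, a)

The same pattern repeats for every node in a layer; only the choice of activation function changes between, say, the hidden layers and the output layer.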
BINARY STEP FUNCTION
● The binary step function depends on a threshold value that decides whether a neuron should be
activated or not.
● The input fed to the activation function is compared to the threshold; if the input is greater than
the threshold, the neuron is activated, otherwise it is deactivated, meaning that its output is not
passed on to the next hidden layer.

Here are some of the limitations of the binary step function:

● It cannot provide multi-valued outputs; for example, it cannot be used for multi-class
classification problems.
● The gradient of the step function is zero, which hinders the backpropagation
process.
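A minimal sketch of the binary step function, assuming a threshold of 0 (the threshold value here is an illustrative choice):

import numpy as np

def binary_step(x, threshold=0.0):
    # Output 1 when the input reaches the threshold, otherwise 0.
    return np.where(x >= threshold, 1, 0)

print(binary_step(np.array([-2.0, -0.1, 0.0, 1.5])))   # [0 0 1 1]

Because the output jumps straight from 0 to 1, the derivative is 0 everywhere (and undefined at the threshold), which is exactly why gradients cannot flow through it during backpropagation.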
LINEAR ACTIVATION FUNCTION
● The linear activation function, also known as "no activation" or the "identity function"
(the input is simply multiplied by 1.0), is one where the activation is proportional to the input.
● The function does nothing to the weighted sum of the input; it simply returns
the value it was given.

However, a linear activation function has two major problems:

● It’s not possible to use backpropagation as the derivative of the function is a constant
and has no relation to the input x.
● All layers of the neural network will collapse into one if a linear activation function is
used. No matter the number of layers in the neural network, the last layer will still be
a linear function of the first layer. So, essentially, a linear activation function turns the
neural network into just one layer.
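A minimal sketch (illustrative weights) of the collapse described above: two stacked linear layers compose into a single linear map, so the extra layer adds no expressive power.

import numpy as np

def linear(z):
    # Identity / "no activation": the output equals the input.
    return z

W1 = np.array([[1.0, 2.0], [0.5, -1.0]])   # illustrative layer weights
W2 = np.array([[0.3, 0.7], [-0.2, 0.4]])
x = np.array([1.0, -2.0])

deep = W2 @ linear(W1 @ x)         # two stacked linear layers
shallow = (W2 @ W1) @ x            # one layer with the combined weight matrix

print(np.allclose(deep, shallow))  # True: the two layers collapse into one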
NON-LINEAR ACTIVATION FUNCTION
Because of its limited expressive power, a linear activation function does not allow the model to create
complex mappings between the network’s inputs and outputs.

Non-linear activation functions solve the following limitations of linear activation functions:

● They allow backpropagation because the derivative is now a function of the input, so it is
possible to go back and understand which weights in the input neurons can provide a better
prediction.
● They allow the stacking of multiple layers of neurons, as the output is now a non-linear
combination of the input passed through multiple layers; any output can then be represented
as a functional computation in the neural network.
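A small numerical sketch (illustrative inputs) of the first point above: the gradient of a linear activation is the same constant for every input, whereas the gradient of a non-linear activation such as sigmoid changes with the input, which is what gives backpropagation useful information about the weights.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # varies with the input z

def linear_grad(z):
    return np.ones_like(z)        # constant, regardless of z

z = np.array([-4.0, 0.0, 4.0])
print(linear_grad(z))    # [1. 1. 1.]  the same everywhere
print(sigmoid_grad(z))   # roughly [0.018 0.25 0.018], varies with the input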
1. Sigmoid
● It is commonly used for models where we have to predict a probability as the
output. Since the probability of anything exists only in the range of 0 to 1,
sigmoid is the right choice because of its range.
● The function is differentiable and provides a smooth gradient, i.e., it prevents jumps
in output values. This is reflected in the S-shape of the sigmoid activation
function.

Drawbacks:

1. For very large or very small input values, the gradient approaches zero, so the network
ceases to learn and suffers from the vanishing gradient problem.
2. The output of the logistic function is not symmetric around zero. So the output of all
the neurons will be of the same sign. This makes the training of the neural network
more difficult and unstable.
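A minimal sketch of the sigmoid function, sigmoid(x) = 1 / (1 + e^(-x)), showing both drawbacks on a few illustrative inputs: the gradient is near zero for large-magnitude inputs, and every output is positive (not symmetric around zero).

import numpy as np

def sigmoid(x):
    # Output always lies in the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
y = sigmoid(x)
grad = y * (1.0 - y)   # derivative of the sigmoid

print(y)      # all outputs positive, so not zero-centered
print(grad)   # near zero for large |x|: the vanishing gradient problem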
2. Tanh
The tanh function is very similar to the sigmoid/logistic activation function and even has
the same S-shape, with the difference that its output range is -1 to 1.

Advantages of using this activation function are:

● The output of the tanh activation function is zero-centered; hence we can easily
map the output values as strongly negative, neutral, or strongly positive.
● It is usually used in the hidden layers of a neural network, as its values lie between -1 and
1; therefore, the mean of the hidden layer's outputs comes out to be 0 or very close to it.
This helps in centering the data and makes learning for the next layer much easier.

Drawback:

It also faces the problem of vanishing gradients, similar to the sigmoid activation
function.
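A minimal sketch of tanh on the same illustrative inputs, showing the zero-centered output range of -1 to 1 and the same saturation of the gradient for large-magnitude inputs.

import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
y = np.tanh(x)
grad = 1.0 - y ** 2    # derivative of tanh

print(y)      # outputs lie in (-1, 1) and are centered on zero
print(grad)   # like sigmoid, the gradient vanishes for large |x|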
3. ReLU
● ReLU stands for Rectified Linear Unit.
● Although it gives an impression of a linear function, ReLU has a derivative function and allows for
backpropagation while simultaneously making it computationally efficient.
● The main catch here is that the ReLU function does not activate all the neurons at the same time.
● The neurons will only be deactivated if the output of the linear transformation is less than 0.
● The advantages of using ReLU as an activation function are as follows:
○ Since only a certain number of neurons are activated, the ReLU function is far more
computationally efficient when compared to the sigmoid and tanh functions.
○ ReLU accelerates the convergence of gradient descent towards the global minimum of the loss
function due to its linear, non-saturating property.
● The limitations faced by this function are:
○ The Dying ReLU problem: neurons whose inputs remain negative always output zero and receive
zero gradient, so they stop learning.
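A minimal sketch of ReLU and its gradient on illustrative inputs, showing both the sparse activation (negative inputs map to 0) and why dying ReLU happens (those inputs also get zero gradient).

import numpy as np

def relu(x):
    # relu(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise; neurons whose inputs stay
    # negative receive no gradient and stop learning ("dying ReLU").
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))        # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))   # [0. 0. 0. 1. 1.]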
