
Activation Functions

Dr. Chiradeep Mukherjee


Department of CST and CSIT
University of Engineering and Management Kolkata
Activation Functions
Definition: An activation function decides whether a neuron should be activated or not. In other words, it decides whether a neuron's input to the network is important for the prediction, using simple mathematical operations.

Types of Activation Function: Activation functions can be broadly divided into two types:
i) Linear Activation Function
ii) Non-linear Activation Functions

Linear Activation Function: It does not help with the complexity or the various parameters of the usual data that is fed to the neural network.

Non-linear Activation Functions: The non-linear activation functions are the most widely used. They make it easy for the model to generalize or adapt to a variety of data and to differentiate between the outputs. The main terminologies needed to understand non-linear functions are:

Derivative or Differential: the change along the y-axis with respect to the change along the x-axis; also known as the slope.

Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.
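To make these terms concrete, here is a minimal NumPy sketch (not part of the original slides; the sample points are illustrative) that estimates the slope of a linear activation and of the sigmoid by a finite difference, and checks monotonicity on a sampled grid:

import numpy as np

def linear(z):
    return z                             # linear activation: slope is 1 everywhere

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # non-linear: slope changes with z

def slope(f, z, h=1e-6):
    """Derivative of f at z, approximated by a central finite difference."""
    return (f(z + h) - f(z - h)) / (2 * h)

print("slope of linear at z=-2 and z=2 :", slope(linear, -2.0), slope(linear, 2.0))
print("slope of sigmoid at z=-2 and z=2:", slope(sigmoid, -2.0), slope(sigmoid, 2.0))

# Monotonic: the values never decrease as z increases (sigmoid passes this check).
z = np.linspace(-10, 10, 1001)
print("sigmoid is monotonic (non-decreasing):", bool(np.all(np.diff(sigmoid(z)) >= 0)))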
Purpose of Activation Functions
• The purpose of an activation function in a neural network is to introduce non-linearity into the model. Here's a more detailed
breakdown of its role:
• 1. Introducing Non-linearity
• Without an activation function, a neural network would essentially be a linear model, no matter how many layers it has, because a composition of linear functions is still a linear function (a small sketch after this slide's bullets demonstrates this).
• The activation function enables the network to learn complex, non-linear relationships in the data. By applying non-linearity, the
network can model a wider range of patterns and behaviors, making it capable of solving more complex tasks.
• 2. Enabling Neural Networks to Learn Complex Patterns
• Real-world data (like images, speech, text, etc.) is often non-linear, and the relationships between inputs and outputs can be
intricate. Without non-linear activation functions, the neural network would be limited in its capacity to model such relationships.
• Activation functions allow the neural network to learn hierarchical patterns, making it suitable for complex tasks such as image
recognition, natural language processing, and more.
• 3. Control of Output Range
• Activation functions can also serve to control the range of the output of a neuron. For example, functions like sigmoid or tanh
compress the output to a specific range (e.g., between 0 and 1 for sigmoid or -1 to 1 for tanh), which can be useful for certain tasks
like binary classification.
• Some activation functions, like ReLU (Rectified Linear Unit), do not compress the output but rather set a lower bound, which can
help with issues like vanishing gradients.
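As a minimal demonstration of point 1 above (the layer sizes and random weights are illustrative, not from the slides), the sketch below stacks two layers with no activation function and shows that the composition is still a single linear map:

import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layer_out = W2 @ (W1 @ x + b1) + b2

# The same mapping collapses into one linear layer: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer_out = W @ x + b

print(np.allclose(two_layer_out, one_layer_out))  # True: still a linear model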
Purpose of Activation Functions
• 4. Gradient Flow
• In backpropagation, gradients are propagated backward through the network to adjust the weights. The behavior of the activation
function affects how well gradients flow through the network.
• Activation functions like ReLU (and its variants like Leaky ReLU or Parametric ReLU) help mitigate issues such as vanishing
gradients, which can occur with functions like sigmoid or tanh when the gradients become very small and prevent effective
learning.
• Common Activation Functions:
• Sigmoid: Outputs values between 0 and 1, typically used for binary classification.
• Tanh: Outputs values between -1 and 1, often used when both positive and negative outputs are needed.
• ReLU (Rectified Linear Unit): Outputs the input directly if positive, and zero otherwise. It's widely used because it helps mitigate
the vanishing gradient problem and is computationally efficient.
• Leaky ReLU: A variation of ReLU that allows a small, non-zero gradient when the input is negative, which can help during
training by preventing "dead neurons."
• Softmax: Used in the output layer for multi-class classification tasks, converting the output into a probability distribution.
• In summary, the activation function plays a key role in enabling a neural network to learn and approximate complex functions by
introducing non-linearity, ensuring efficient gradient flow, and controlling the output values of the neurons.
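For reference, here is a minimal NumPy sketch (not from the original slides) of the activation functions listed above; the Leaky ReLU slope alpha = 0.01 is an illustrative default:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # output in (0, 1)

def tanh(z):
    return np.tanh(z)                     # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)             # 0 for z < 0, z otherwise

def leaky_relu(z, alpha=0.01):            # alpha is an illustrative choice
    return np.where(z > 0, z, alpha * z)  # small slope for z < 0 avoids "dead neurons"

def softmax(z):
    e = np.exp(z - np.max(z))             # subtract max for numerical stability
    return e / e.sum()                    # probabilities that sum to 1

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(z))
print(leaky_relu(z))
print(softmax(z))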
Non-Linear Activation Functions
i) Sigmoid function: σ(z) = 1 / (1 + e^(-z))

ii) Hyperbolic Tangent (tanh) function: g(z) = tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

[Figure: graphs of σ(z) and g(z)]

Note: The tanh activation function often works better because its output lies between -1 and +1 and is zero-centred. Fixing the mean at 0 makes the next layer's decision easier.
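A quick numerical check of the zero-mean point in the note above (the symmetric input range is an illustrative choice, not from the slides):

import numpy as np

z = np.linspace(-5, 5, 1001)                                       # inputs symmetric around 0
print("mean of sigmoid outputs:", (1 / (1 + np.exp(-z))).mean())   # ~0.5, not zero-centred
print("mean of tanh outputs:   ", np.tanh(z).mean())               # ~0.0, zero-centred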
Non-Linear Activation Functions

iii) Rectified Linear Unit (ReLU): ReLU(z) = max(0, z)

iv) Leaky ReLU Function: PReLU(z) = max(α*z, z), where α is a small positive constant (e.g. 0.01); when α is learned during training, this is the Parametric ReLU (PReLU).

[Figure: graphs of ReLU(z) and PReLU(z)]
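A minimal sketch (not from the slides) contrasting the gradients of ReLU and Leaky ReLU for negative inputs, which is why Leaky ReLU helps avoid "dead neurons"; α = 0.01 is an illustrative value:

import numpy as np

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)      # gradient is exactly 0 for negative inputs

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)    # keeps a small gradient (alpha) for negative inputs

z = np.array([-3.0, -0.1, 0.5, 2.0])
print("ReLU gradients:      ", relu_grad(z))        # zeros for the negative inputs
print("Leaky ReLU gradients:", leaky_relu_grad(z))  # alpha for the negative inputs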
Choice of Activation Functions
i) Why and when sigmoid function:
• The main reason why we use the sigmoid function is that its output lies between 0 and 1. Therefore, it is especially used for models where we have to predict a probability as the output. Since the probability of anything exists only in the range 0 to 1, sigmoid is the right choice for the OUTPUT layer of a binary classification problem.
• The function is differentiable, which means we can find the slope of the sigmoid curve at any point.
• The function is monotonic but function’s derivative is not.
• The logistic sigmoid function can cause a neural network to get stuck during training. EXCEPT FOR BINARY CLASSIFICATION, WE SHOULD AVOID IT.

ii) Why and when tanh Function:


• The range of the tanh function is from -1 to 1.
• The advantage is that the negative inputs will be mapped strongly negative and the zero inputs will be mapped
near zero in the tanh graph.
• The function is differentiable.
• The function is monotonic while its derivative is not monotonic.
• The tanh function is mainly used for classification between two classes.
• Both tanh and logistic sigmoid activation functions are used in feed-forward nets, AND IT IS RECOMMENDED TO USE tanh IN THE HIDDEN LAYERS.
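Putting the two recommendations together, here is a minimal forward-pass sketch (layer sizes and weights are illustrative, not from the slides) that uses tanh in the hidden layer and sigmoid at the output of a binary classifier:

import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 3 -> 4 -> 1 network for binary classification.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = np.array([0.2, -1.0, 0.5])
h = np.tanh(W1 @ x + b1)        # tanh in the hidden layer (zero-centred outputs)
p = sigmoid(W2 @ h + b2)        # sigmoid at the output: a probability in (0, 1)
print("predicted probability of class 1:", float(p))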
Derivative of Activation Functions
i) When we implement backpropagation for a neural network, we need to compute the slope, or the derivative, of the activation function.
ii) When updating the model during training, we need to know in which direction and by how much to change the weights, and that depends on the slope. That is why we use differentiation in almost every part of Machine Learning and Deep Learning.

Derivative of Sigmoid Function:

σ(z) = 1 / (1 + e^(-z))
σ'(z) = d/dz [1 / (1 + e^(-z))] = e^(-z) / (1 + e^(-z))² = σ(z)(1 - σ(z))
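A minimal numerical check of this identity (the test point z = 0.7 is illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z0, h = 0.7, 1e-6
numerical = (sigmoid(z0 + h) - sigmoid(z0 - h)) / (2 * h)   # finite-difference slope
analytic = sigmoid(z0) * (1 - sigmoid(z0))                  # sigma(z)(1 - sigma(z))
print(abs(numerical - analytic) < 1e-8)                     # True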


Derivative of Activation Functions
Derivative of tanh Function:

g(z) = tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))
g'(z) = d/dz tanh(z) = 1 - tanh²(z) = 1 - (g(z))²
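And a matching numerical check for the tanh identity (the test point z = -0.3 is illustrative):

import numpy as np

z0, h = -0.3, 1e-6
numerical = (np.tanh(z0 + h) - np.tanh(z0 - h)) / (2 * h)   # finite-difference slope
analytic = 1.0 - np.tanh(z0) ** 2                           # 1 - (g(z))^2
print(abs(numerical - analytic) < 1e-8)                     # True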

Plots of Derivatives of Activation Functions:

[Figure: graphs of σ'(z) and g'(z)]