Activation
Types of AF:
Activation functions can be broadly divided into 3 types:
1. Binary step Activation Function
2. Linear Activation Function
3. Non-linear Activation Functions
TanH is similar to the logistic sigmoid, but improves on it: the range of the
TanH function is from -1 to +1.
TanH is often preferred over the sigmoid neuron because it is zero-centred.
The advantage is that negative inputs are mapped strongly negative
and zero inputs are mapped near zero on the tanh graph.
tanh(x) = 2 * sigmoid(2x) - 1
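As a quick sanity check of this identity, here is a minimal NumPy sketch (the helper names sigmoid and tanh_via_sigmoid are illustrative, not taken from any library):

import numpy as np

def sigmoid(x):
    # Logistic sigmoid: maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh_via_sigmoid(x):
    # Rescaled sigmoid, using the identity tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), tanh_via_sigmoid(x)))  # True
print(np.tanh(x))  # all values lie in (-1, +1) and are zero-centred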
The ReLU is the most widely used activation function. It is used in almost all
convolutional neural networks, though only in the hidden layers.
The ReLU is half rectified (from the bottom):
f(z) = 0, if z < 0
f(z) = z, otherwise
i.e. R(z) = max(0, z)
The range is 0 to inf.
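A minimal NumPy sketch of ReLU and the gradient used during backpropagation (the names relu and relu_grad are illustrative). Note that the gradient is exactly zero for negative inputs, which is the cause of the 'Dying ReLU' problem discussed below:

import numpy as np

def relu(z):
    # R(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative used in backpropagation: 1 for z > 0, 0 otherwise
    return (z > 0).astype(float)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(z))  # [0. 0. 0. 1. 1.]  (zero gradient for negative inputs)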
Advantages
Mitigates the vanishing gradient problem: the gradient is constant for positive inputs.
Computationally efficient: allows the network to converge very quickly.
Non-linear: although it looks like a linear function, ReLU has a derivative and allows for backpropagation.
Disadvantages
Can only be used in the hidden layers, not as an output activation.
Hard to train on small datasets; it needs a lot of data to learn non-linear behavior.
The Dying ReLU problem: when inputs approach zero or are negative, the gradient of the function becomes zero, so the network cannot perform backpropagation through those units and cannot learn.
The Leaky ReLU activation function was introduced to solve the 'Dying ReLU' problem.
In Leaky ReLU we do not set all negative inputs to zero but to a value near zero, which solves the major issue of the ReLU activation function.
R(z) = max(0.1*z,z)
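A minimal NumPy sketch of Leaky ReLU with the 0.1 slope used in the formula above (the names leaky_relu and leaky_relu_grad are illustrative; other implementations use a smaller slope such as 0.01):

import numpy as np

def leaky_relu(z, alpha=0.1):
    # max(alpha*z, z): negative inputs keep a small slope instead of being zeroed
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.1):
    # Gradient is 1 for positive inputs and alpha otherwise,
    # so backpropagation still updates neurons that receive negative inputs
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(z))       # [-0.2  -0.05  0.    0.5   2.  ]
print(leaky_relu_grad(z))  # [0.1 0.1 0.1 1.  1. ]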
Advantages
Prevents the dying ReLU problem: this variation of ReLU has a small positive slope in the negative region, so it still enables backpropagation, even for negative input values.
Otherwise behaves like ReLU.
Disadvantages
Results not consistent: leaky ReLU does not provide consistent predictions for negative input values.
3.4 Softmax:
sigma(z)_i = e^{z_i} / sum_{j=1}^{K} e^{z_j}
where:
sigma = the softmax function
z_i = the i-th element of the input vector z
e^{z_i} = standard exponential function applied to the input element
K = number of classes in the multi-class classifier
e^{z_j} = standard exponential function applied to each element of the input vector; the sum over j normalizes the outputs
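A minimal NumPy sketch of softmax matching the formula above (the softmax name is illustrative; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result):

import numpy as np

def softmax(z):
    # Exponentiate each score and divide by the sum of exponentials
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])   # raw scores (logits) for K = 3 classes
p = softmax(z)
print(p)        # approximately [0.659 0.242 0.099]
print(p.sum())  # 1.0, the outputs form a probability distribution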
Advantages
Able to handle multiple classes, where other activation functions handle only one class: it normalizes the output for each class to between 0 and 1, dividing by the sum of the exponentials so that the probabilities sum to 1, and gives the probability of the input value belonging to a specific class.
Useful for output neurons: Softmax is typically used only in the output layer, for neural networks that need to classify inputs into multiple categories.
These models are called feedforward because information flows through the
function being evaluated from x, through the intermediate computations