Logistic Regression
o Logistic regression is one of the most popular machine learning algorithms and belongs
to the family of supervised learning techniques. It is used to predict a categorical
dependent variable from a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable, so the
outcome must be a categorical or discrete value such as Yes or No, 0 or 1, or True or
False. Instead of returning exactly 0 or 1, however, it returns a probability that lies
between 0 and 1.
o Logistic regression is similar to linear regression; the difference lies in how they
are used. Linear regression is used for solving regression problems, whereas logistic
regression is used for solving classification problems.
o In logistic regression, instead of fitting a regression line, we fit an "S"-shaped
logistic function whose output is bounded by two limiting values, 0 and 1.
o Logistic regression is a significant machine learning algorithm because it can provide
probabilities and classify new data using both continuous and discrete features.
o Logistic regression can be used to classify observations using different types of
data, and it can help identify the variables that are most effective for the classification.
[Figure: the logistic (sigmoid) function, an S-shaped curve]
Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
o It maps any real value to a value between 0 and 1.
o The output of logistic regression must lie between 0 and 1 and cannot go beyond these
limits, so the function forms an "S"-shaped curve. This S-shaped curve is called the
sigmoid function or the logistic function.
o In logistic regression, we use a threshold value to convert the predicted probability
into a class of 0 or 1: values above the threshold are assigned to 1, and values below
the threshold are assigned to 0.
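The mapping and the threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are made up for this example, and the threshold of 0.5 is an assumption, since the text does not fix a particular value.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Numerically stable form: never call exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def classify(z: float, threshold: float = 0.5) -> int:
    """Return class 1 if the predicted probability reaches the threshold, else 0.

    The default threshold of 0.5 is an assumption made for this sketch.
    """
    return 1 if sigmoid(z) >= threshold else 0

# Probability and class for a few sample inputs.
for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), classify(z))
```

Raising the threshold makes the classifier more reluctant to predict class 1; lowering it has the opposite effect.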
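The logistic regression equation can be obtained from the linear regression equation. The
mathematical steps are given below:
o The equation of a straight line can be written as:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
o In logistic regression y can only lie between 0 and 1, so we replace y on the left-hand
side by y divided by (1-y), which is 0 for y = 0 and tends to infinity as y approaches 1:
$\frac{y}{1-y}$
o But we need a range from $-\infty$ to $+\infty$, so we take the logarithm, and the
equation becomes:
$\log\left(\frac{y}{1-y}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$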
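The two transformations in these steps (dividing by 1-y, then taking the logarithm) can be checked numerically. The sketch below is only an illustration; the function names `logit` and `inverse_logit` are chosen for this example and do not come from the text.

```python
import math

def logit(y: float) -> float:
    """Log of y / (1 - y), defined for 0 < y < 1; the result can be any real number."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    return math.log(y / (1.0 - y))

def inverse_logit(z: float) -> float:
    """Undo logit: map a real number back to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Values near 0 give large negative results, values near 1 give large positive ones.
for y in (0.01, 0.1, 0.5, 0.9, 0.99):
    z = logit(y)
    print(y, round(z, 3), round(inverse_logit(z), 3))
```

Applying `inverse_logit` to the right-hand side of the final equation recovers y, which is why the S-shaped function appears in logistic regression.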
Suppose an organization wants to decide how large a salary increase to give an employee
based on their performance. For this purpose, a linear regression algorithm will help them
decide. Plotting a regression line with the employee’s performance as the independent
variable and the salary increase as the dependent variable makes the task easier.
Now, what if the organization wants to know whether an employee will get a promotion
or not based on their performance? The linear graph above is not suitable in this case,
because a straight line can produce values below zero or above one, while the answer is
either yes or no. Instead, we clip the line at zero and one and convert it into a sigmoid
curve (S-curve). Based on a threshold value, the organization can then decide whether an
employee will get a promotion or not.
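As a rough sketch of how such a yes/no example could be fitted, the code below trains a one-variable logistic model with plain gradient descent. Everything specific here is an assumption made for illustration: the ratings, the promotion labels, the learning rate, the number of steps, and the 0.5 threshold are all hypothetical and do not come from the text.

```python
import math

# Hypothetical data: performance ratings on a 1-9 scale and promotion outcomes (1 = yes).
ratings = [1, 2, 3, 4, 6, 7, 8, 9]
promoted = [0, 0, 0, 0, 1, 1, 1, 1]

# Centre the ratings so that a simple fixed step size behaves well.
mean_rating = sum(ratings) / len(ratings)
xs = [r - mean_rating for r in ratings]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, lr=0.1, steps=2000):
    """Fit p = sigmoid(w * x + b) by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit(xs, promoted)

def promotion_probability(rating):
    """Predicted probability of promotion for a given rating."""
    return sigmoid(w * (rating - mean_rating) + b)

def will_be_promoted(rating, threshold=0.5):
    """Apply the (assumed) threshold to turn the probability into a yes/no decision."""
    return promotion_probability(rating) >= threshold

for r in (2, 5, 8):
    print(r, round(promotion_probability(r), 3), will_be_promoted(r))
```

Unlike a straight line, the fitted curve never leaves the range from 0 to 1, so its output can be read as a probability.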
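To understand logistic regression, let’s first go over the odds of success, defined as the
ratio of the probability p of success to the probability 1-p of failure:
$\theta = \frac{p}{1-p}$
The odds range from zero to $\infty$, while the probability lies between zero and one.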
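A small numerical illustration of the relationship between probability and odds is given below. The helper names `odds` and `probability` are chosen for this sketch only.

```python
def odds(p: float) -> float:
    """Odds of success for a probability p, defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)

def probability(theta: float) -> float:
    """Invert the odds: recover p from theta = p / (1 - p)."""
    return theta / (1.0 + theta)

# As p moves from 0 towards 1, the odds grow from 0 without bound.
for p in (0.25, 0.5, 0.75, 0.99):
    print(p, odds(p))
```

A probability of 0.5 corresponds to odds of 1 (success and failure equally likely), and odds above 1 mean success is more likely than failure.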
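Logistic regression models the logarithm of the odds as a linear function of x:
$y = \beta_0 + \beta_1 x$, where $y = \ln\left(\frac{p(x)}{1-p(x)}\right)$
Exponentiating both sides and writing $Y = e^{\beta_0 + \beta_1 x}$ gives
$\frac{p(x)}{1-p(x)} = Y$. Solving for $p(x)$:
$p(x) = Y - Y\,p(x)$
$p(x) + Y\,p(x) = Y$
$p(x)\,(1 + Y) = Y$
$p(x) = \frac{Y}{1+Y} = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$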
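The final expression can be checked numerically against the sigmoid function. In the sketch below the coefficient values are hypothetical, picked only so that the numbers are easy to follow; Y is taken to be the exponential of the linear term, which is what makes p(x)/(1-p(x)) equal to Y.

```python
import math

def probability_from_linear(beta0: float, beta1: float, x: float) -> float:
    """p(x) = Y / (1 + Y), with Y = exp(beta0 + beta1 * x) playing the role of the odds."""
    Y = math.exp(beta0 + beta1 * x)
    return Y / (1.0 + Y)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only.
beta0, beta1 = -3.0, 0.5

for x in (0.0, 6.0, 12.0):
    p = probability_from_linear(beta0, beta1, x)
    # Both forms give the same probability; p / (1 - p) recovers Y.
    print(x, round(p, 4), round(sigmoid(beta0 + beta1 * x), 4), round(p / (1.0 - p), 4))
```

Dividing the numerator and denominator of Y / (1 + Y) by Y shows why the two forms agree: the result is 1 / (1 + 1/Y), which is the sigmoid of the linear term.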
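Linear regression versus logistic regression:
o Purpose: linear regression helps estimate the dependent variable when there is a change
in the independent variable; logistic regression helps calculate the probability of a
particular event taking place.
o Shape: linear regression fits a straight line; logistic regression fits an S-curve
(S = sigmoid).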
Some applications of logistic regression:
1. Banks can use logistic regression to predict whether a customer will default on a
loan.
2. Predicting the weather conditions of a certain place (sunny, windy, rainy, humid,
etc.).
3. E-commerce companies can identify buyers who are likely to purchase a certain
product.
4. Companies can predict whether they will gain or lose money in the next month,
quarter, or year based on their current performance.
5. Classifying objects based on their features and attributes.
On the basis of the categories of the dependent variable, logistic regression can be
classified into three types:
o Binomial: In binomial logistic regression, the dependent variable can take only two
possible values, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial logistic regression, the dependent variable can take three
or more possible unordered values, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal logistic regression, the dependent variable can take three or more
possible ordered values, such as "Low", "Medium", or "High".
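For the multinomial case, a common way to turn one linear score per class into probabilities is the softmax function. The text above does not name it, so treat its use here as an assumption of this sketch. The scores below are hypothetical values rather than the output of a fitted model.

```python
import math

def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["cat", "dog", "sheep"]
scores = [2.0, 1.0, 0.1]                     # hypothetical linear scores, one per class

probs = softmax(scores)
predicted = classes[probs.index(max(probs))]

for name, p in zip(classes, probs):
    print(name, round(p, 3))
print("predicted:", predicted)
```

With only two classes and scores z and 0, softmax reduces to the sigmoid of z, so the binomial case can be seen as a special case of this construction.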