Module IV - Logistic Regression
Prepared by: Dr. Ram Paul Hathwal
Dept of CSE, ASET, AUUP
Logistic Regression
Logistic regression is one of the most popular machine learning algorithms and
belongs to the family of supervised learning techniques.
It is used to predict a categorical dependent variable from a given set of
independent variables.
In a classification problem, the target variable (or output), y, can take only discrete
values for a given set of features (or inputs), X.
The objective of logistic regression is to find the best-fitting model to describe the
relationship between a dichotomous characteristic of interest (the dependent variable)
and a set of independent variables.
It is used when a researcher wants to predict whether an event will occur.
Graphical Representation: [figure: logistic regression]
Linear vs Logistic Regression [figures]
Logistic Regression: Mathematical Representation
In linear regression, the output Y is in the same units as the target variable (the
thing you are trying to predict).
However, in logistic regression the linear combination of the inputs is in log odds,
and the sigmoid function converts it to a probability.
$$\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(\theta^{T} x_i)}}$$
The sigmoid function converts any real-valued input to a value in the range 0 to 1.
e = Euler’s number ≈ 2.71828
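The sigmoid function described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides; the function name `sigmoid` and the two-branch form (used only to avoid overflow for large negative inputs) are assumptions made for the example.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z),
    # which avoids overflow in exp(-z) when z is very negative.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 gives exactly 0.5.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

Note that sigmoid(-z) = 1 - sigmoid(z), so the curve is symmetric about the point (0, 0.5).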
Types of Logistic Regression
Binomial:
Target variable can have only 2 possible types: “0” or “1” which may
represent “win” vs “loss”, “pass” vs “fail”, “dead” vs “alive”, etc.
Multinomial:
Target variable can have 3 or more possible types which are not ordered
(i.e. the types have no quantitative significance), like “disease A” vs “disease
B” vs “disease C”.
Ordinal:
It deals with target variables with ordered categories. For example, a test
score can be categorized as: “very poor”, “poor”, “good”, “very good”.
Here, each category can be given a score like 0, 1, 2, 3.
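As a small illustration of the ordinal case, the category-to-score mapping from the test-score example can be written as a lookup table. The names `ORDINAL_SCORES` and `encode_ordinal` are hypothetical and chosen only for this sketch.

```python
# Ordered categories from the test-score example, each given a score 0..3.
ORDINAL_SCORES = {"very poor": 0, "poor": 1, "good": 2, "very good": 3}

def encode_ordinal(labels):
    """Replace each ordered category label with its integer score."""
    return [ORDINAL_SCORES[label] for label in labels]

encoded = encode_ordinal(["good", "very poor", "very good"])
print(encoded)  # [2, 0, 3]
```

Unlike the multinomial case, the order of these scores is meaningful: a higher score denotes a better category.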
How does Logistic Regression Work?
Consider a model with one predictor “x” and one Bernoulli response variable “ŷ”,
where p is the probability that ŷ=1. A linear model for p can be written as:
p = b0+b1x --------> (1)
Odds: The ratio of the probability of an event occurring to the probability of an event not occurring.
Odds = p/(1-p)
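Because p must lie between 0 and 1 while b0+b1x is unbounded, we replace p in equation (1) with the odds:
p/(1-p) = b0+b1x --------> (2)
Odds can only be positive, whereas b0+b1x can also be negative. To handle negative values, we predict the logarithm of the odds.
Log of odds (i.e. logit) = ln(p/(1-p))
Replacing the odds in equation (2) with the log of odds gives: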
ln(p/(1-p)) = b0+b1x --------> (3)
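To make the odds and log-odds definitions concrete, here is a short Python sketch. The function names `odds` and `logit` are illustrative, and the probabilities 0.2, 0.5 and 0.8 are arbitrary example values.

```python
import math

def odds(p: float) -> float:
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log of odds = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(odds(p))

# Odds are always positive; the log of odds can take any real value.
for p in (0.2, 0.5, 0.8):
    print(p, round(odds(p), 4), round(logit(p), 4))
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0. Probabilities below 0.5 give negative log odds, which is why the log of odds can be matched to the unbounded expression b0+b1x.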
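To recover p from equation (3), we apply the exponential function to both sides:
exp(ln(p/(1-p))) = exp(b0+b1x)
p/(1-p) = e^(b0+b1x)
Solving for p gives the sigmoid function:
p = e^(b0+b1x) / (1 + e^(b0+b1x)) = 1 / (1 + e^-(b0+b1x)) --------> (4)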
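The step from log odds back to a probability can be checked numerically. The sketch below assumes hypothetical coefficients b0 = -1.0 and b1 = 0.5 and a hypothetical input x = 2.0; none of these values come from the slides.

```python
import math

def prob_from_log_odds(log_odds: float) -> float:
    """Invert ln(p / (1 - p)) = log_odds to recover p."""
    odds = math.exp(log_odds)   # exponentiate both sides: p / (1 - p)
    return odds / (1.0 + odds)  # solve the odds equation for p

# Hypothetical coefficients and input, for illustration only.
b0, b1 = -1.0, 0.5
x = 2.0

log_odds = b0 + b1 * x          # right-hand side of equation (3)
p = prob_from_log_odds(log_odds)
print(log_odds, p)
```

Here the log odds are 0, so the odds are 1 and p is 0.5. This simple form is adequate for moderate values; very large log odds would overflow in `math.exp`.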
The sigmoid function maps any real-valued input, such as the linear predictor b0+b1x, to a value
between 0 and 1 that can be interpreted as a probability.
Logistic Regression Assumptions
Consider removing outliers from your training set, because extreme or mislabeled
observations can distort the estimated coefficients.
Logistic regression does not favor sparse data (data consisting of a lot of zero values).
Logistic regression is a classification model, unlike linear regression.
The coefficients in logistic regression are estimated using a process called
maximum-likelihood estimation.
In a binary logistic regression, the dependent variable must be binary.
For a binary regression, factor level 1 of the dependent variable should
represent the desired outcome.
The independent variables should be independent of each other. This means the
model should have little or no multicollinearity.
Remove highly correlated inputs.
Logistic regression requires quite large sample sizes.
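The maximum-likelihood estimation mentioned above can be sketched for a single predictor. The example below uses plain gradient ascent on the log-likelihood, which is one simple way to find the estimates and not necessarily the method used by any particular library. The toy data, learning rate and step count are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum of y*ln(p) + (1-y)*ln(1-p) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, lr=0.1, steps=5000):
    """Estimate b0 and b1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err                       # gradient w.r.t. b0
            g1 += err * x                   # gradient w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy data (hypothetical): the classes overlap, so finite estimates exist.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(xs, ys)
print(b0, b1, log_likelihood(b0, b1, xs, ys))
```

The fitted coefficients give a higher log-likelihood than the starting point b0 = b1 = 0. If the two classes were perfectly separable, the coefficients would keep growing without converging, which is one reason overlapping data is used here.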
Making Predictions with Logistic Regression
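As a minimal sketch of how a fitted model could be used to make a prediction: compute b0+b1x, pass it through the sigmoid function, and compare the resulting probability with a threshold. The coefficients below are hypothetical, and the threshold of 0.5 is a common default assumed here, not a value stated in the slides.

```python
import math

def predict_proba(x, b0, b1):
    """Probability that the response is 1 for input x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict_class(x, b0, b1, threshold=0.5):
    """Return class 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(x, b0, b1) >= threshold else 0

# Hypothetical coefficients, for illustration only.
B0, B1 = -4.0, 1.0

for x in (2.0, 4.0, 6.0):
    print(x, round(predict_proba(x, B0, B1), 4), predict_class(x, B0, B1))
```

With these coefficients the decision boundary lies at x = 4, where b0+b1x = 0 and the predicted probability is exactly 0.5.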
Thanks!