Logistic Regression
• Logistic Regression enables you to use regression models to predict the probability of a
particular categorical response for a given set of independent variables.
• Logistic regression is used to describe data and to explain the relationship between one
dependent binary variable and one or more nominal, ordinal, interval or ratio-level
independent variables.
• The goal is to model the probability of a random variable Y being 0 or 1 given experimental
data.
• The logistic regression model uses the odds of the event of interest: the probability of having the event compared with the probability of not having it.
Odds Ratio
An odds ratio (OR) is a statistic that quantifies the strength of the association between two
events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and
the odds of A in the absence of B, or equivalently, the ratio of the odds of B in the presence of A
and the odds of B in the absence of A.
Odds ratios are used to compare the relative odds of the occurrence of the outcome of interest
(e.g. disease or disorder), given exposure to the variable of interest (e.g. health characteristic,
aspect of medical history).
Two events are independent if and only if the OR equals 1, i.e., the odds of one event are the
same in either the presence or absence of the other event.
If the OR is greater than 1, then A and B are associated (correlated) in the sense that, compared
to the absence of B, the presence of B raises the odds of A, and symmetrically the presence of A
raises the odds of B.
If the OR is less than 1, then A and B are negatively correlated, and the presence of one event
reduces the odds of the other event.
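As a quick numerical sketch, the OR can be computed directly from a 2×2 table of counts (the numbers below are invented for illustration):

```python
# Odds ratio from a 2x2 contingency table (invented counts, for illustration only).
#                 event A     no event A
# B present        a = 20        b = 80
# B absent         c = 10        d = 90
a, b, c, d = 20, 80, 10, 90

odds_with_B    = a / b            # odds of A in the presence of B
odds_without_B = c / d            # odds of A in the absence of B
odds_ratio = odds_with_B / odds_without_B

print(odds_ratio)                 # 2.25 > 1, so B raises the odds of A
```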
The logit is a transformation. Logistic regression is a regression model.
The logit transformation turns a line into a logistic curve. Logistic regression fits a logistic curve to a set of data in which the dependent variable can take only the values 0 and 1. It can be generalized to fit ordinal data.
Logistic regression is a regression model used when the response variable is binary. The logit is the transformation we apply to make the model linear. Since logistic regression is not linear regression, we model the log of the odds:
y = ln(π / (1 − π)),
where π is the probability of success (i.e., Y = 1). This log of the odds is the logit function.
If we used Y itself as the outcome variable and tried to fit a line, it would not be a good representation of the relationship. Since the response variable is bounded between 0 and 1, the graph of the model should not be linear; it should be S-shaped (or reverse S-shaped).
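A minimal sketch of the two transformations (the helper names logit and sigmoid are ours; NumPy is assumed available). The logit maps a probability onto the whole real line, and its inverse maps any real value back into (0, 1), which is what produces the S-shape:

```python
import numpy as np

def logit(p):
    """Log of the odds: maps a probability in (0, 1) onto the whole real line."""
    return np.log(p / (1 - p))

def sigmoid(y):
    """Inverse of the logit: maps any real value back to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-y))

p = np.array([0.1, 0.5, 0.9])
y = logit(p)                # [-2.197, 0.0, 2.197]
print(sigmoid(y))           # recovers [0.1, 0.5, 0.9]: the S-shaped curve
```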
Linear vs. Logistic Regression
Linear Regression
• Outcome is continuous
• Linear relationship between the dependent and independent variables
• Fits a straight line to the data
• Regression algorithm in machine learning
Logistic Regression
• Outcome is discrete (not continuous)
• No linear relationship between the dependent and independent variables
• Fits a curve to the data
• Classification algorithm in machine learning
LINEAR REGRESSION vs. LOGISTIC REGRESSION
• Linear regression is a supervised regression model; logistic regression is a supervised classification model.
• Linear regression predicts a continuous numeric value; logistic regression predicts the value 1 or 0.
• Linear regression uses no activation function; logistic regression uses an activation function to convert the linear regression equation into the logistic regression equation.
• In linear regression the dependent variable must be numeric and the response is continuous; in logistic regression the dependent variable consists of only two categories.
• In linear regression, plotting the training data suggests a straight line that passes as close as possible to the points; in logistic regression, any change in a coefficient changes both the direction and the steepness of the logistic function.
• Linear regression is used to estimate the dependent variable when the independent variables change, e.g., predicting the price of houses; logistic regression is used to calculate the probability of an event, e.g., classifying whether tissue is benign or malignant.
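To make the contrast concrete, here is a small sketch fitting both models with scikit-learn (assumed installed) on invented data; nothing here comes from the examples above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))

# Linear regression: continuous outcome, fitted with a straight line.
y_cont = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=100)
lin = LinearRegression().fit(X, y_cont)
print(lin.predict(X[:3]))          # continuous predictions

# Logistic regression: binary outcome, fitted with an S-shaped curve.
y_bin = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)
clf = LogisticRegression().fit(X, y_bin)
print(clf.predict_proba(X[:3]))    # class probabilities between 0 and 1
```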
Binary Logistic Regression Model
• The logistic regression model is based on the natural logarithm of the odds, ln(odds). The logit link function is used to model the probability of ‘success’ as a function of the covariates. The purpose of the logit link is to take a linear combination of the covariate values (which may take any value between ±∞) and convert those values to the scale of a probability, i.e., between 0 and 1.
• The logit link function is defined as

logit(p) = ln(p / (1 − p)) = ln(odds) = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + βₖXₖᵢ + εᵢ
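As a sketch of fitting this model in practice, the statsmodels Logit class can be used (the data and coefficients below are invented, not the credit-card example that follows):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))

# Simulate outcomes from a known model: logit(p) = 0.5 + 1.0*x1 - 2.0*x2
p = 1 / (1 + np.exp(-(0.5 + 1.0 * X[:, 0] - 2.0 * X[:, 1])))
y = rng.binomial(1, p)

model = sm.Logit(y, sm.add_constant(X)).fit()
print(model.params)              # estimates of beta0, beta1, beta2 (log-odds scale)
print(np.exp(model.params[1:]))  # exponentiated slopes, i.e., odds ratios
```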
For example: consider a cardholder who charged $36,000 last year and holds additional cards for members of the household. What is the probability that this cardholder will upgrade to the premium card during the marketing campaign?
Substituting these values into the fitted model gives odds of 2.3558 to 1 that a credit cardholder who spent $36,000 last year and has additional cards will purchase the premium card during the campaign.
Converting the odds to a probability: p = odds / (1 + odds) = 2.3558 / 3.3558 = 0.702.
Thus, the estimated probability is 0.702 that a credit cardholder who spent $36,000 last year and has additional cards will purchase the premium card during the campaign. In other words, you predict that 70.2% of such individuals will purchase the premium card.
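The conversion from odds to probability is a one-liner; this sketch reproduces only the arithmetic above (the odds value itself comes from the fitted model):

```python
# Converting the fitted odds to a probability.
odds = 2.3558              # odds from the fitted logistic regression model
p = odds / (1 + odds)      # p = odds / (1 + odds)
print(round(p, 3))         # 0.702, i.e., a 70.2% chance of upgrading
```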
Now that we have used the logistic regression model for prediction, we need to determine
whether or not the model is a good-fitting model.
• We will need to ascertain how good our regression model is once we have fitted it to the
data – does it accurately explain the data, or does it incorrectly classify cases as often as it
correctly classifies them?
• The deviance, or -2 log-likelihood (-2LL) statistic, can help us here.
• The deviance is basically a measure of how much unexplained variation there is in our
logistic regression model – the higher the value the less accurate the model.
• It compares the difference in probability between the predicted outcome and the actual
outcome for each case and sums these differences together to provide a measure of the
total error in the model.
• This is similar in purpose to looking at the total of the residuals (the sum of squares) in linear
regression analysis in that it provides us with an indication of how good our model is at
predicting the outcome.
• The -2LL statistic (often called the deviance) is an indicator of how much unexplained
information there is after the model has been fitted, with large values of -2LL indicating
poorly fitting models.
The deviance statistic follows a chi-square distribution with n − k − 1 degrees of freedom, where n is the number of observations and k is the number of independent variables. Reject H0 if the deviance exceeds the chi-square critical value at significance level α; otherwise, do not reject H0.
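A minimal sketch of this test (the outcomes and fitted probabilities below are invented; NumPy and SciPy are assumed available):

```python
import numpy as np
from scipy.stats import chi2

def neg2_log_likelihood(y, p_hat):
    """-2 * log-likelihood of binary outcomes y under fitted probabilities p_hat."""
    return -2 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y     = np.array([1, 0, 1, 1, 0])            # observed outcomes (invented)
p_hat = np.array([0.8, 0.2, 0.7, 0.9, 0.3])  # fitted probabilities (invented)

deviance = neg2_log_likelihood(y, p_hat)     # about 2.53; lower is better
critical = chi2.ppf(0.95, df=len(y) - 1 - 1) # alpha = 0.05, assuming k = 1 predictor
print(deviance > critical)                   # False -> do not reject H0
```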
Another way of evaluating the effectiveness of a regression model is to calculate how strong the relationship between the explanatory variable(s) and the outcome variable is. This was represented by the R² statistic in linear regression analysis. R², or rather a form of it, can also be calculated for logistic regression.
The two versions most commonly used are Hosmer & Lemeshow’s R² and Nagelkerke’s R². Both describe the proportion of variance in the outcome that the model successfully explains.
Like R² in multiple regression, these values range between 0 and 1; a value of 1 suggests that the model accounts for 100% of the variance in the outcome, and 0 that it accounts for none of the variance.
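Nagelkerke’s R² is conventionally obtained by rescaling Cox & Snell’s R² so that it can reach 1; here is a sketch computing it from the model and null log-likelihoods (the values passed in at the bottom are invented):

```python
import numpy as np

def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke's R^2: Cox & Snell's R^2 rescaled by its maximum attainable value."""
    cox_snell = 1 - np.exp((2 / n) * (ll_null - ll_model))
    max_r2 = 1 - np.exp((2 / n) * ll_null)   # upper bound of Cox & Snell's R^2
    return cox_snell / max_r2

# Illustrative log-likelihoods for a fitted model and the intercept-only model.
print(nagelkerke_r2(ll_model=-45.0, ll_null=-69.3, n=100))  # ~0.51
```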
Interpretation of the Classification Table / Confusion Matrix
The following measures are used in the confusion matrix:
• True positive (TP), cell (1, 1): the number of correctly predicted positive instances, i.e., positive values predicted as positive. For example, you predicted that France would win the World Cup, and it won.
• True negative (TN), cell (0, 0): the number of correctly predicted negative instances, i.e., negative values predicted as negative. You predicted that England would not win, and it lost.
• False negative (FN), cell (1, 0): the number of incorrectly predicted positive instances, i.e., positive values predicted as negative. You predicted that France would not win, but it won.
• False positive (FP), cell (0, 1): the number of incorrectly predicted negative instances, i.e., negative values predicted as positive. You predicted that England would win, but it lost.
Confusion Matrix
Accuracy = (TP + TN) / ALL: overall, how often is the classifier correct?
Recall: when the actual value is yes, how often does the model predict yes?
Recall = TP / (TP + FN)
Precision: when the model predicts yes, how often is it correct?
Precision = TP / (TP + FP)
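These three measures can be computed with scikit-learn (assumed installed; the labels below are invented for illustration):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual outcomes (invented)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (invented)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                   # 3 3 1 1
print(accuracy_score(y_true, y_pred))   # (TP + TN) / ALL  = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN)   = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP)   = 0.75
```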