Logistic Regression

• Logistic Regression enables you to use regression models to predict the probability of a
particular categorical response for a given set of independent variables.

• Logistic regression is used to describe data and to explain the relationship between one
dependent binary variable and one or more nominal, ordinal, interval or ratio-level
independent variables.

• Logistic regression is an important machine learning algorithm.

• The goal is to model the probability of a random variable Y being 0 or 1 given experimental
data.

• The logistic regression model uses the odds ratio, which represents the probability of having
an event of interest compared with the probability of not having an event of interest.
Types of Logistic Regression

• Binary Logistic Regression


The categorical response has only two possible outcomes.
Example: spam or not spam

• Multinomial Logistic Regression


Three or more categories without ordering.
Example: predicting which type of food is preferred (Veg, Non-Veg, Vegan)

• Ordinal Logistic Regression


Three or more categories with ordering.
Example: Movie rating from 1 to 5
Odds Ratio

An odds ratio (OR) is a statistic that quantifies the strength of the association between two
events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and
the odds of A in the absence of B, or equivalently, the ratio of the odds of B in the presence of A
and the odds of B in the absence of A.

Odds Ratio = Probability of an event of interest / (1 − Probability of an event of interest)

If the probability of an event of interest is 0.75, then

odds ratio = 0.75 / (1 − 0.75) = 3

that is, odds of 3 to 1.
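The conversion from a probability to odds can be sketched as a small helper function (the function name is ours, purely for illustration):

```python
def odds(p):
    """Convert a probability of an event into its odds, odds = p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)

print(odds(0.75))  # 3.0, i.e. odds of 3 to 1
```

A probability of 0.5 gives odds of 1, the "independent" baseline discussed below.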
When is it used?

Odds ratios are used to compare the relative odds of the occurrence of the outcome of interest
(e.g. disease or disorder), given exposure to the variable of interest (e.g. health characteristic,
aspect of medical history).

Two events are independent if and only if the OR equals 1, i.e., the odds of one event are the
same in either the presence or absence of the other event.

If the OR is greater than 1, then A and B are associated (correlated) in the sense that, compared
to the absence of B, the presence of B raises the odds of A, and symmetrically the presence of A
raises the odds of B.

If the OR is less than 1, then A and B are negatively correlated, and the presence of one event
reduces the odds of the other event.
The logit is a transformation; logistic regression is a regression model.
The logit transformation links a logistic curve to a straight line. Logistic regression fits a
logistic curve to data where the dependent variable can take only the values 0 and 1. It can be
generalized to fit ordinal data.
Logistic regression is the appropriate regression model when the response variable is binary.
The logit is the transformation we use to turn the model into a linear one. Since logistic
regression is not linear regression, we take

y = ln(π / (1 − π)),

where π is the probability of success in the binary variable (i.e. y = 1). This log of the odds is
the logit function.

If we used Y itself as the outcome variable and tried to fit a straight line, it would not be a
good representation of the relationship. Since the response variable lies between 0 and 1, the
graph of the model should not be linear; it should be slightly S-shaped (or reverse S-shaped).
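The S-shape comes from the logistic (inverse-logit) function. A minimal sketch of the logit transform and its inverse, using only the standard library:

```python
import math

def logit(p):
    """Logit transform: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inverse_logit(x):
    """Logistic (sigmoid) function: maps any real x back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The curve is S-shaped: large negative x gives ~0, large positive x gives ~1.
for x in (-4, 0, 4):
    print(round(inverse_logit(x), 3))  # 0.018, 0.5, 0.982
```

The two functions are inverses, which is why fitting a line on the logit scale is equivalent to fitting an S-curve on the probability scale.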
Linear vs. Logistic Regression
Linear Regression
• Outcome is continuous
• Linear relationship between the dependent and independent variables
• Fits a straight line to the data
• Regression algorithm in machine learning

Logistic Regression
• Outcome is discrete (not continuous)
• No linear relationship between the dependent and independent variables
• Fits a curve to the data
• Classification algorithm in machine learning
• Linear regression is a supervised regression model; logistic regression is a supervised
classification model.
• In linear regression, we predict a numeric value; in logistic regression, we predict the value
1 or 0.
• Linear regression uses no activation function; logistic regression uses an activation function
to convert the linear regression equation into the logistic regression equation.
• In linear regression, the dependent variable must be numeric and the response is continuous;
in logistic regression, the dependent variable consists of only two categories.
• In linear regression, plotting the training data lets us draw a straight line that passes close
to the maximum number of points; in logistic regression, any change in a coefficient changes
both the direction and the steepness of the logistic function.
• Linear regression estimates the dependent variable in case of a change in the independent
variables (for example, predicting house prices); logistic regression calculates the
probability of an event (for example, classifying tissue as benign or malignant).
Binary Logistic Regression Model
• The logistic regression model is based on the natural logarithm of the odds ratio, ln(odds
ratio). The logit link function is used to model the probability of ‘success’ as a function of
covariates (e.g., logistic regression). The purpose of the logit link is to take a linear
combination of the covariate values (which may take any value between ±∞) and convert
those values to the scale of a probability, i.e., between 0 and 1.
• The logit link function is defined as

logit(p) = ln(p / (1 − p)) = ln(odds ratio) = β0 + β1X1i + β2X2i + … + βkXki + εi

where k = number of independent variables in the model
εi = random error in observation i
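The logit link can be sketched in code: a linear combination of covariate values (any real number) is converted to the probability scale by inverting the link. The function names here are ours, for illustration only:

```python
import math

def linear_predictor(betas, xs):
    """beta0 + beta1*x1 + ... + betak*xk (the error term is omitted
    when computing a prediction)."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def probability(betas, xs):
    """Invert the logit link: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    return 1 / (1 + math.exp(-linear_predictor(betas, xs)))
```

Whatever value the linear predictor takes between ±∞, the returned probability stays between 0 and 1, which is the whole point of the link.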
Logistic Regression Equation
• In logistic regression, a mathematical method called maximum likelihood estimation is
typically used to develop a regression equation to predict the natural logarithm of this odds
ratio.

• The following equation defines the logistic regression equation:

ln(estimated odds ratio) = b0 + b1X1i + b2X2i + … + bkXki


Once the logistic regression equation is determined, the estimated odds ratio is computed:

Estimated odds ratio = e^(ln(estimated odds ratio))

And then finally we compute the estimated probability of an event of interest:

Estimated probability of an event of interest = estimated odds ratio / (1 + estimated odds ratio)
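The three steps above (fit the log of the odds, exponentiate, convert odds to a probability) collapse into a short helper; the function name is ours:

```python
import math

def estimated_probability(log_odds):
    """Follow the steps above: exponentiate the fitted log-odds to get
    the estimated odds ratio, then convert via odds / (1 + odds)."""
    estimated_odds = math.exp(log_odds)
    return estimated_odds / (1 + estimated_odds)
```

A log-odds of 0 corresponds to odds of 1 and a probability of exactly 0.5; larger log-odds always mean larger probabilities.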
Illustration
• Consider the case of the sales and marketing manager for the credit card division of a major
financial company. The manager wants to conduct a campaign to persuade existing holders of
the bank’s standard credit card to upgrade, for a nominal annual fee, to the bank’s platinum
card.
The manager wonders, “which of the existing standard credit cardholders should we target for
this campaign?”
The manager has access to the results from a sample of 30 cardholders who were targeted
during a pilot campaign last year. These results have been organised as three variables:
dependent variable
(Y): cardholder upgraded to a premium card, (1=yes, 0=no);
independent variables
(X1): prior year’s credit card purchases (in $ thousands)
(X2): cardholder ordered additional credit cards for other authorized users, (1=yes,
0=no)
Using an Excel data-mining tool, the following logistic regression equation was obtained:
ln(estimated odds of purchasing versus not purchasing)= -6.9394 + 0.1395(X1) + 2.7743(X2)

In the above model, the regression coefficients are interpreted as follows:


• The regression constant b0 is -6.9394. This means that for a credit cardholder who did not
charge any purchases last year and who does not have additional cards, the estimated natural
log of the odds ratio of purchasing the premium card is -6.9394.
• The regression coefficient b1 is 0.1395. This means that, holding constant whether the
cardholder has additional cards for members of the household, each increase of $1,000 in annual
credit card spending using the company's card raises the estimated natural log of the odds
ratio of purchasing the premium card by 0.1395. Therefore, cardholders who charged more in the
previous year are more likely to upgrade to a premium card.
• The regression coefficient b2 is 2.7743. This means that, holding annual credit card spending
constant, the estimated natural log of the odds ratio of purchasing the premium card is 2.7743
higher for a cardholder who has additional cards for members of the household than for one who
does not. Therefore, cardholders possessing additional cards for other members of the household
are much more likely to upgrade to a premium card.
• The regression coefficients suggest that the credit card company should develop a marketing
campaign that targets cardholders who tend to charge large amounts to their cards, and
households that possess more than one card.
• The main purpose of performing logistic regression analysis is to provide predictions of a
dependent variable.

For example, consider a cardholder who charged $36,000 last year and possesses additional
cards for members of the household. What is the probability that the cardholder will upgrade to
the premium card during the marketing campaign?

Using X1=36, X2=1


ln(estimated odds of purchasing versus not purchasing)
= -6.9394 + (0.1395)(36) + (2.7743)(1)
= 0.8569
Then,

Estimated odds ratio = e^(ln(estimated odds ratio)) = e^0.8569 = 2.3558

Therefore, the odds are 2.3558 to 1 that a credit cardholder who spent $36,000 last year and has
additional cards will purchase the premium card during the campaign.

We can convert this odds ratio to a probability:

Estimated probability of an event of interest = 2.3558 / (1 + 2.3558) = 0.702
Thus, the estimated probability is 0.702 that a credit cardholder who spent $36,000 last year and
has additional cards will purchase the premium card during the campaign.
In other words, we predict that 70.2% of such individuals will purchase the premium card.
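The arithmetic of the worked example can be checked directly with the coefficients fitted above:

```python
import math

# Coefficients from the fitted equation in the illustration.
b0, b1, b2 = -6.9394, 0.1395, 2.7743
x1, x2 = 36, 1  # $36,000 in prior-year charges; has additional cards

log_odds = b0 + b1 * x1 + b2 * x2
odds = math.exp(log_odds)
prob = odds / (1 + odds)

print(round(log_odds, 4))  # 0.8569
print(round(odds, 4))      # 2.3558
print(round(prob, 3))      # 0.702
```

The three printed values reproduce the log-odds, odds, and probability computed step by step in the text.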
Now that we have used the logistic regression model for prediction, we need to determine
whether or not the model is a good-fitting model.
• We will need to ascertain how good our regression model is once we have fitted it to the
data – does it accurately explain the data, or does it incorrectly classify cases as often as it
correctly classifies them?
• The deviance, or -2 log-likelihood (-2LL) statistic, can help us here.
• The deviance is basically a measure of how much unexplained variation there is in our
logistic regression model – the higher the value the less accurate the model.
• It compares the difference in probability between the predicted outcome and the actual
outcome for each case and sums these differences together to provide a measure of the
total error in the model.
• This is similar in purpose to looking at the total of the residuals (the sum of squares) in linear
regression analysis in that it provides us with an indication of how good our model is at
predicting the outcome.
• The -2LL statistic (often called the deviance) is an indicator of how much unexplained
information there is after the model has been fitted, with large values of -2LL indicating
poorly fitting models.
The deviance statistic follows a chi-square distribution with n-k-1 degrees of freedom.

The null and alternative hypotheses are

H0: The model is a good-fitting model.

H1: The model is not a good-fitting model.

Using the α level of significance, the decision rule is:

Reject H0 if deviance > the chi-square critical value at α; otherwise, do not reject H0.
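The decision rule is a simple comparison against a tabulated critical value; this helper and its example numbers are illustrative, with the critical value looked up in a chi-square table for n - k - 1 degrees of freedom:

```python
def deviance_test(deviance, chi2_critical):
    """Decision rule: reject H0 (the model fits well) when the deviance
    exceeds the chi-square critical value at the chosen alpha level."""
    return "reject H0" if deviance > chi2_critical else "do not reject H0"

# e.g. a deviance of 20.1 against a critical value of 31.4
# (chi-square table, alpha = 0.05, 20 degrees of freedom):
print(deviance_test(20.1, 31.4))  # do not reject H0
```

Here the deviance does not exceed the critical value, so we do not reject H0 and treat the model as adequately fitting.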
Another way of evaluating the effectiveness of a regression model is to calculate how strong
the relationship between the explanatory variable(s) and the outcome variable is. This was
represented by the R2 statistic in linear regression analysis. R2, or rather a form of it, can also
be calculated for logistic regression.

The two versions most commonly used are Hosmer & Lemeshow's R2 and Nagelkerke's R2.
Both describe the proportion of variance in the outcome that the model successfully
explains.

Like R2 in multiple regression, these values range between ‘0’ and ‘1’; with a value of ‘1’
suggesting that the model accounts for 100% of variance in the outcome and ‘0’ that it
accounts for none of the variance.
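As a sketch, Nagelkerke's R2 can be computed from the log-likelihoods of the null (intercept-only) and fitted models using its standard formula; the input values below are placeholders, not from the text:

```python
import math

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R^2 from the log-likelihood of the intercept-only
    model (ll_null), the fitted model (ll_model), and sample size n.
    It rescales the Cox & Snell R^2 so its maximum possible value is 1."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_r2 = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_r2
```

If the fitted model is no better than the null model the statistic is 0, and a perfectly fitting model (log-likelihood 0) gives exactly 1, matching the range described above.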
Interpretation of Classification
Table/Confusion Matrix
The following measures are used in the confusion matrix:

• True positive (TP) (1, 1): the number of correctly predicted positive instances, i.e.
positive values predicted as positive. For example, you predicted that France would win the
World Cup, and it won.

• True negative (TN) (0, 0): the number of correctly predicted negative instances, i.e.
negative values predicted as negative. You predicted that England would not win, and it lost.

• False negative (FN) (1, 0): the number of incorrectly predicted positive instances, i.e.
positive values predicted as negative. You predicted that France would not win, but it won.

• False positive (FP) (0, 1): the number of incorrectly predicted negative instances, i.e.
negative values predicted as positive. You predicted that England would win, but it lost.
Confusion Matrix

Accuracy = (TN + TP) / ALL

Recall: out of the cases that are actually positive, how often the model predicts yes.
Recall = TP / (TP + FN)

Precision: when the model predicts yes, how often it is correct.
Precision = TP / (TP + FP)

F-Measure = (2 × Recall × Precision) / (Recall + Precision)
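The four measures can be computed together from the confusion-matrix counts; the counts in the example call are made up for illustration:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the four measures above from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # share of actual positives found
    precision = tp / (tp + fp)     # share of positive predictions correct
    f_measure = 2 * recall * precision / (recall + precision)
    return accuracy, recall, precision, f_measure

# e.g. 40 true positives, 45 true negatives, 5 false positives, 10 false negatives
print(classification_metrics(40, 45, 5, 10))
```

Note that the F-measure is the harmonic mean of recall and precision, so it is high only when both are high.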


Practice Questions (Homework)
Q1. Given an estimated odds ratio of 2.5, compute the estimated probability of
an event of interest.

Q2. Consider the following logistic regression equation:

ln(Estimated odds ratio) = 0.1 + 0.5X1 + 0.2X2

a. Interpret the meaning of the logistic regression coefficients.


b. If X1= 2 and X2= 1.5, compute the estimated odds ratio and interpret its
meaning.
c. On the basis of the results of (b), compute the estimated probability of an
event of interest.
