Regression Analysis


Regression Analysis

Dr. Basharat Mahmood


Department of Computer Science
COMSATS University Islamabad
REGRESSION ANALYSIS
Regression analysis is used to:
 Predict the value of a dependent variable based on the value of at least one independent variable
 Explain the impact of changes in an independent variable on the dependent variable
 Provide an equation for estimating the value of the dependent variable from a known value of the independent variable

Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable
UTILITY OF REGRESSION
o Degree & Nature of relationship
o Forecasting
o Making data-driven decisions
o Recognizing opportunities to improve
DIFFERENCE BETWEEN CORRELATION &
REGRESSION
o Degree & Nature of Relationship
• Correlation measures the degree of relationship between X & Y.
• Regression studies the nature of the relationship between the variables, so that the value of one variable can be predicted from the value of the other.
o Cause & Effect Relationship
• Correlation does not always assume a cause-and-effect relationship between the two variables.
• Regression clearly expresses a cause-and-effect relationship between the two variables: the independent variable is the cause and the dependent variable is the effect.
DIFFERENCE BETWEEN CORRELATION &
REGRESSION
o Prediction
• Correlation does not help in making predictions.
• Regression enables us to make predictions using the regression line.
o Symmetry
• Correlation coefficients are symmetrical, i.e. r_xy = r_yx.
• Regression coefficients are not symmetrical, i.e. b_xy ≠ b_yx in general.
TYPES OF REGRESSION ANALYSIS
o Simple & Multiple Regression
o Linear & Non-Linear Regression
o Partial & Total Regression
o Logistic Regression
SIMPLE LINEAR REGRESSION

Simple linear regression involves three building blocks:
• Regression Lines
• Regression Equations
• Regression Coefficients
REGRESSION LINES
o The regression line shows the average relationship
between two variables. It is also called Line of Best Fit.
o If two variables X & Y are given, then there are two
regression lines:
• Regression Line of X on Y
• Regression Line of Y on X
o Nature of Regression Lines
• If r = ±1, then the two regression lines coincide.
• If r = 0, then the two regression lines intersect each other at 90°.
• The nearer the regression lines are to each other, the greater the degree of correlation.
• If the regression lines rise upward from left to right, the correlation is positive.
REGRESSION EQUATIONS
o Regression Equations are the algebraic formulation of regression lines.
o There are two regression equations:
• Regression Equation of Y on X
o Y = a + bX
o Y − Ȳ = b_yx(X − X̄)
• Regression Equation of X on Y
o X = a + bY
o X − X̄ = b_xy(Y − Ȳ)
REGRESSION COEFFICIENTS
• The regression coefficient measures the average change in the value of one variable for a unit change in the value of the other variable.
• Regression coefficients represent the slope of the regression line.
• There are two regression coefficients:
• Regression coefficient of Y on X: b_yx = r · (σ_y / σ_x)
• Regression coefficient of X on Y: b_xy = r · (σ_x / σ_y)
PROPERTIES OF REGRESSION COEFFICIENTS

• The coefficient of correlation is the geometric mean of the regression coefficients, i.e. r = ±√(b_yx · b_xy).
• Both regression coefficients must have the same algebraic sign.
• The coefficient of correlation must have the same sign as the regression coefficients.
• The arithmetic mean of the two regression coefficients is greater than or equal to the correlation coefficient, i.e. (b_yx + b_xy)/2 ≥ r.
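These properties can be checked numerically. The sketch below (not part of the original slides) computes both regression coefficients and r from the X/Y sample used in the practice problem later in this deck:

```python
# Sketch: compute b_yx, b_xy and r from raw data and verify the
# geometric-mean and arithmetic-mean properties listed above.
import math

x = [5, 7, 9, 8, 11, 12]
y = [14, 18, 25, 29, 32, 37]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

byx = sxy / sxx                   # regression coefficient of Y on X
bxy = sxy / syy                   # regression coefficient of X on Y
r = sxy / math.sqrt(sxx * syy)    # correlation coefficient

print(byx, bxy, r)
print(math.isclose(abs(r), math.sqrt(byx * bxy)))  # geometric-mean property
print((byx + bxy) / 2 >= abs(r))                   # arithmetic-mean property
```

Both property checks print True for any data set, since they follow algebraically from the definitions above.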
OBTAINING REGRESSION EQUATIONS
REGRESSION EQUATIONS USING NORMAL
EQUATIONS
o This method is also called the Least Squares Method.
o Under this method, the regression equation is obtained by solving two normal equations:
• For the regression equation of Y on X (Y = a + bX):

ΣY = na + bΣX
ΣXY = aΣX + bΣX²

• Another method: solve the normal equations directly for the coefficients:

b = (nΣXY − ΣX·ΣY) / (nΣX² − (ΣX)²)
a = Ȳ − bX̄

o Here a is the Y-intercept, the value of Y when X = 0, and b is the slope of the line, the absolute increase in Y for a unit increase in X.
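The least-squares method above can be sketched in a few lines of Python (an illustration, not from the slides):

```python
# Sketch: fit Y = a + bX by the direct solution of the normal equations.
def fit_y_on_x(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n        # equivalent to a = Ȳ − b·X̄
    return a, b

# Data made up for illustration: these points lie exactly on Y = 1 + 2X
a, b = fit_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)   # prints 1.0 2.0
```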
Residual

• The difference between the observed value yᵢ and the corresponding fitted value ŷᵢ:

eᵢ = yᵢ − ŷᵢ

• Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
Estimation of the variance of the error terms, σ²

• The variance σ² of the error terms εᵢ in the regression model needs to be estimated for a variety of purposes.
• It gives an indication of the variability of the probability distributions of y.
• It is needed for making inferences concerning the regression function and the prediction of y.
• To estimate σ, we work with the variance and take the square root to obtain the standard deviation.
• For simple linear regression, the estimate of σ² is the sum of squared residuals divided by n − 2:

s²_y.x = (1/(n − 2)) Σ eᵢ² = (1/(n − 2)) Σ (yᵢ − ŷᵢ)²
REGRESSION EQUATIONS USING REGRESSION COEFFICIENTS (USING ACTUAL VALUES)

o Regression Equation of Y on X: Y − Ȳ = b_yx(X − X̄), where
b_yx = (NΣXY − ΣX·ΣY) / (NΣX² − (ΣX)²)

o Regression Equation of X on Y: X − X̄ = b_xy(Y − Ȳ), where
b_xy = (NΣXY − ΣX·ΣY) / (NΣY² − (ΣY)²)
PRACTICE PROBLEMS

Q1: Calculate the regression equation of X on Y using the method of least squares:

X:  5   7   9   8  11  12
Y: 14  18  25  29  32  37

Also find the variance of the error terms.
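A numerical check for Q1 can be sketched as follows (this is an illustration, not the slides' worked solution). Note that for a regression of X on Y, the residuals are in X:

```python
# Sketch: regression of X on Y, X = a + bY, by least squares, plus
# the error variance s² = Σe_i² / (n − 2) with e_i = x_i − (a + b·y_i).
def fit_x_on_y(x, y):
    n = len(y)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_y2 = sum(yi ** 2 for yi in y)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)
    a = (sum_x - b * sum_y) / n
    return a, b

X = [5, 7, 9, 8, 11, 12]
Y = [14, 18, 25, 29, 32, 37]
a, b = fit_x_on_y(X, Y)

residuals = [xi - (a + b * yi) for xi, yi in zip(X, Y)]
s2 = sum(e ** 2 for e in residuals) / (len(X) - 2)
print(a, b, s2)
```

As a sanity check, the residuals of any least-squares fit with an intercept sum to zero.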


INFERENCE CONCERNING THE REGRESSION COEFFICIENTS

Under the usual normality assumptions, a 100(1 − α)% confidence interval for the slope β is

b ± t_{α/2} · s / √Sxx

where b is the estimated slope, s = √(s²_y.x) is the residual standard deviation, Sxx = Σ(xᵢ − x̄)², and t_{α/2} is the critical value of the t-distribution with n − 2 degrees of freedom.
TESTING ABOUT REGRESSION COEFFICIENT

To test H0: β = β0, use the statistic

t = (b − β0) / (s / √Sxx)

which follows a t-distribution with n − 2 degrees of freedom when H0 is true.
TESTING ABOUT REGRESSION
COEFFICIENT

Example: A study was conducted at Virginia Tech to determine if certain static arm-strength measures have an influence on the "dynamic lift" characteristics of an individual. Twenty-five individuals were subjected to strength tests and then were asked to perform a weightlifting test in which weight was dynamically lifted overhead. The data are given here.
TESTING ABOUT REGRESSION
COEFFICIENT

• Fit a linear regression line to the given data.
• Compute the residuals and verify that they add to zero.
• Find a 95% confidence interval for the slope β in the regression.
• Test the hypothesis that β = 1.0 against the alternative that β < 1.0.
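The test statistic for the slope can be sketched in Python. The data below is made up for illustration, since the example's Virginia Tech data table is not reproduced in this text; the resulting t value would be compared against a t-table with n − 2 degrees of freedom:

```python
# Sketch: slope t-statistic t = (b − β0) / (s / √Sxx) for H0: β = β0.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]     # made-up data for illustration
y = [3.1, 6.8, 8.9, 12.3, 14.6]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = sxy / sxx                      # least-squares slope
a = mean_y - b * mean_x
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

beta0 = 1.0
t_stat = (b - beta0) / (s / math.sqrt(sxx))
print(b, t_stat)   # compare t_stat with the t-table value for n − 2 df
```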
Multiple Regression Analysis (MRA)
Method for studying the relationship between a dependent
variable and two or more independent variables.

Y’ = a + b1X1 + b2X2 + … + bkXk

Purposes:
Prediction
Explanation
Theory building
Multiple Regression Analysis (MRA)
The following are situations where we can use multiple regression:
• Testing if IQ and level of education affect income (IQ and years of education are the IVs and income is the DV).
• Testing if study time and pre-test scores affect final grades (final grades is the DV, and study time and pre-test scores are the IVs).
• Testing if exercise and amount of salt in the diet affect blood pressure (exercise and salt are the IVs and blood pressure is the DV).
Multiple Regression Analysis (MRA)

Multiple Regression with Two independent variables

Y’ = a + b1X1 + b2X2
Normal Equations:

ΣY = na + b1ΣX1 + b2ΣX2
ΣX1Y = aΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = aΣX2 + b1ΣX1X2 + b2ΣX2²
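For two predictors, the normal equations form a 3×3 linear system. A self-contained sketch (not from the slides, with made-up data) that builds and solves that system:

```python
# Sketch: fit Y' = a + b1·X1 + b2·X2 by solving the three normal
# equations with a small Gaussian-elimination solver.
def solve3(m, v):
    """Solve the 3x3 linear system m·x = v by Gaussian elimination."""
    aug = [row[:] + [vi] for row, vi in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (aug[r][3] - sum(aug[r][c] * x[c] for c in range(r + 1, 3))) / aug[r][r]
    return x

def fit_two_predictors(x1, x2, y):
    n = len(y)
    sx1x2 = sum(u * w for u, w in zip(x1, x2))
    m = [[n,        sum(x1),                  sum(x2)],
         [sum(x1),  sum(u * u for u in x1),   sx1x2],
         [sum(x2),  sx1x2,                    sum(w * w for w in x2)]]
    v = [sum(y),
         sum(u * c for u, c in zip(x1, y)),
         sum(w * c for w, c in zip(x2, y))]
    return solve3(m, v)   # [a, b1, b2]

# Made-up data lying exactly on Y = 1 + 2·X1 + 3·X2,
# so the fit should recover (1, 2, 3)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [1 + 2 * u + 3 * w for u, w in zip(x1, x2)]
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(a, b1, b2)
```

In practice a linear-algebra library would be used instead of a hand-written solver; the point here is that the coefficients come directly from the normal equations above.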
Multiple Regression Analysis (MRA)

Coefficient of Determination

R² = SSR / SST   or   R² = 1 − SSE / SST

where SST = Σ(yᵢ − ȳ)² is the total sum of squares, SSR = Σ(ŷᵢ − ȳ)² is the regression (explained) sum of squares, and SSE = Σ(yᵢ − ŷᵢ)² is the error sum of squares.
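The two forms of R² can be verified numerically; this sketch (not from the slides) fits a simple least-squares line to the practice-problem data and computes R² both ways:

```python
# Sketch: R² = SSR/SST and R² = 1 − SSE/SST agree for a
# least-squares fit with an intercept.
x = [5, 7, 9, 8, 11, 12]
y = [14, 18, 25, 29, 32, 37]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total SS
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # error SS
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)            # regression SS

print(ssr / sst, 1 - sse / sst)   # the two forms agree
```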
Polynomial Regression
• Polynomial regression describes the fitting of a nonlinear relationship between the values of x and y.
• Examples:
• The number of clicks on a post
• Total working hours per week vs. overall happiness
Polynomial Regression

Quadratic regression calculation
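A quadratic regression y = a + bx + cx² is still linear in its coefficients, so it can be fitted by least squares on a design matrix with columns 1, x, and x². A sketch (not from the slides) assuming numpy is available, with made-up data:

```python
# Sketch: quadratic regression y = a + b·x + c·x² via least squares.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 0.5 * x - 1.5 * x ** 2        # exact quadratic, no noise

# Design matrix with columns 1, x, x²; lstsq minimizes Σ(y − A·coeffs)²
A = np.column_stack([np.ones_like(x), x, x ** 2])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b, c = coeffs
print(a, b, c)   # should recover roughly (2.0, 0.5, -1.5)
```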
