Statistical Inference Lecture-8
Regression Analysis
Presented by Dr. Muhammad Khalid Sohail
Regression
• In statistics, regression analysis is a statistical process for estimating the relationships among variables. It is used when the focus is on the relationship between a dependent variable and one or more independent variables.
• The two basic types of regression are simple linear
regression and multiple linear regression.
• Simple linear regression uses one independent variable to explain and/or predict the outcome, Y (the dependent variable).
• Multiple regression uses two or more independent variables to predict the outcome, Y.
• Regression is concerned with describing and evaluating the relationship between a
given variable and one or more other variables. More specifically, regression is an
attempt to explain movements in a variable by reference to movements in one or more
other variables.
• Denote the dependent variable by y and the independent variable(s) by x1, x2, ... , xk
where there are k independent variables.
• Note that there can be many x variables but we will limit ourselves to the case where
there is only one x variable to start with. In our set-up, there is only one y variable.
Regression is different from Correlation
• If we say y and x are correlated, it means that we are treating y and x in a completely symmetrical
way.
• In regression, we treat the dependent variable (y) and the independent variable(s) (x’s) very
differently. The y variable is assumed to be random or “stochastic” in some way, i.e. to have a
probability distribution. The x variables are, however, assumed to have fixed (“non-stochastic”)
values in repeated samples.
Simple Regression
• For simplicity, say k=1. This is the situation where y depends on only one x
variable.
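Finding a Line of Best Fit
• The straight line y = α + βx is completely deterministic: every observation would lie exactly on it. Is this realistic? No. So we add a random disturbance term, u, to the equation:
$$y_t = \alpha + \beta x_t + u_t$$
where t = 1, 2, 3, 4, 5.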
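A short simulation can make the role of u_t concrete. This sketch assumes hypothetical parameter values (α = 1, β = 2) and normally distributed disturbances with standard deviation 0.5; none of these come from the lecture, and the function name `simulate_sample` is also hypothetical. The x values stay fixed across repeated samples while y changes, matching the assumption that x is non-stochastic and y is stochastic.

```python
import random

# Hypothetical "true" parameters, chosen only for illustration.
ALPHA = 1.0   # intercept
BETA = 2.0    # slope
SIGMA = 0.5   # standard deviation of the disturbance u_t


def simulate_sample(x_values, seed):
    """Draw one sample y_t = ALPHA + BETA * x_t + u_t with u_t ~ N(0, SIGMA^2)."""
    rng = random.Random(seed)
    return [ALPHA + BETA * xt + rng.gauss(0.0, SIGMA) for xt in x_values]


# x is held fixed ("non-stochastic") across repeated samples, t = 1..5.
x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Each repeated sample gives different y values because of the disturbance term.
sample_1 = simulate_sample(x, seed=1)
sample_2 = simulate_sample(x, seed=2)

# Without u_t, every point would lie exactly on the line y = ALPHA + BETA * x.
exact_line = [ALPHA + BETA * xt for xt in x]

print("exact line:", exact_line)
print("sample 1:  ", [round(v, 3) for v in sample_1])
print("sample 2:  ", [round(v, 3) for v in sample_2])
```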
Why do we include a Disturbance term?
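Regression
• Regression model:
$$y_t = \alpha + \beta x_t + u_t$$
• a = alpha = intercept
• b = beta = slope
• u = error term
• The least-squares estimates of the slope and the intercept are
$$\hat{b} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$$
$$\hat{a} = \bar{Y} - \hat{b}\bar{X}$$
• After calculating a^ and b^, we can predict any value of Y^ with the following formula:
$$\hat{Y} = \hat{a} + \hat{b}X$$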
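The estimation formulas above can be checked with a small Python sketch. The function names and the five data points are hypothetical, chosen only to show that the computational form and the deviation form of b^ give the same slope, and how a^ and b^ are then used to predict Y^.

```python
def slope_computational(xs, ys):
    """b^ = (n*Sum XY - Sum X * Sum Y) / (n*Sum X^2 - (Sum X)^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def slope_deviation(xs, ys):
    """b^ = Sum (X - Xbar)(Y - Ybar) / Sum (X - Xbar)^2."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def intercept(xs, ys, b_hat):
    """a^ = Ybar - b^ * Xbar."""
    return sum(ys) / len(ys) - b_hat * sum(xs) / len(xs)


def predict(a_hat, b_hat, x):
    """Y^ = a^ + b^ * X."""
    return a_hat + b_hat * x


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b_hat = slope_computational(x, y)
b_hat_dev = slope_deviation(x, y)   # same slope, deviation form
a_hat = intercept(x, y, b_hat)
y_pred = predict(a_hat, b_hat, 6.0)  # prediction at X = 6

print(f"b^ = {b_hat:.4f} (deviation form: {b_hat_dev:.4f}), a^ = {a_hat:.4f}")
print(f"Y^ at X = 6: {y_pred:.4f}")
```

The computational form needs only running sums, which suits hand calculation; the deviation form is usually better behaved numerically when the X values are large.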
Regression
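• If |t(a^)| > 2, a^ is significant (a rule of thumb, roughly the 5% level in large samples).
• If |t(b^)| > 2, b^ is significant.
• OR, more precisely:
• If |t(a^)| > CV, a^ is significant.
• If |t(b^)| > CV, b^ is significant.
• CV = critical value (from the t table).
• Here t(a^) = a^ / SE(a^) and t(b^) = b^ / SE(b^), where T is the number of observations, the û_t are the residuals, and
$$s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}}$$
$$SE(\hat{a}) = s\sqrt{\frac{\sum x_t^2}{T\sum (x_t - \bar{x})^2}} = s\sqrt{\frac{\sum x_t^2}{T\sum x_t^2 - T^2\bar{x}^2}}$$
$$SE(\hat{b}) = s\sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s\sqrt{\frac{1}{\sum x_t^2 - T\bar{x}^2}}$$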
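• Say N = 10 and the significance level is 0.10.
• Ho: α = 0 and β = 0 (each tested separately)
• H1: α ≠ 0 and β ≠ 0
• It is a two-tailed test, so the significance level is split between the two tails: 0.10/2 = 0.05. The table value is t(0.05, N-2) = t(0.05, 8) ≈ 1.860. (At a 5% significance level the value would be t(0.025, 8) ≈ 2.306.)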
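The t-test procedure above can be sketched in Python with only the standard library. The helper names and the data are hypothetical. Instead of reading CV from a printed table, this sketch substitutes a numerical approximation: it integrates the Student t density with Simpson's rule and finds the critical value by bisection, which should agree with the table to about three decimal places.

```python
import math


def ols_with_t_ratios(xs, ys):
    """Fit y_t = a + b x_t + u_t by OLS; return a^, b^, s, SE(a^), SE(b^), t(a^), t(b^)."""
    T = len(xs)
    x_bar, y_bar = sum(xs) / T, sum(ys) / T
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a_hat = y_bar - b_hat * x_bar
    # Residuals u^_t and s = sqrt(Sum u^_t^2 / (T - 2)).
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(u * u for u in residuals) / (T - 2))
    # SE formulas from the slide.
    se_a = s * math.sqrt(sum(x * x for x in xs) / (T * sxx))
    se_b = s * math.sqrt(1.0 / sxx)
    return a_hat, b_hat, s, se_a, se_b, a_hat / se_a, b_hat / se_b


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def t_critical(tail_prob, df):
    """Value c with P(T > c) = tail_prob, for 0 < tail_prob < 0.5, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail_prob:
            lo = mid  # tail still too heavy, so c lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical data with N = 10, two-tailed test at the 10% level.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.8, 7.2, 8.9, 11.3, 12.8, 15.1, 17.2, 18.8, 21.1]
a_hat, b_hat, s, se_a, se_b, t_a, t_b = ols_with_t_ratios(x, y)
cv = t_critical(0.10 / 2, len(x) - 2)
print(f"t(a^) = {t_a:.3f}, t(b^) = {t_b:.3f}, CV = {cv:.3f}")
print("a^ significant:", abs(t_a) > cv)
print("b^ significant:", abs(t_b) > cv)
```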
Regression
• Try to solve some regression questions from the book.