Multiple Linear Regression: y = b1x1 + b2x2 + ... + bkxk + a
Standard error of estimate:

s(y.x) = sqrt[ sum(y - y-hat)^2 / (N - 2) ]
In multiple regression we may wish to test a hypothesis concerning all of the
predictors or some subset of the predictors, in addition to tests of the individual slopes.
- t-tests of individual coefficients, with all other predictors held constant
- F-tests of whether, taken together, the predictors are a significant predictor of the criterion
- F-tests of whether some subset of predictors is significant
y = b1x1 + b2x2 + a
Example of strange outcomes:
- t-test of b1 may be non-significant
- t-test of b2 may be non-significant
- F-test of b1, b2 may be significant
When two predictors are correlated, the standard errors of the coefficients are
larger than they would be in the absence of the other predictor variable.
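How much larger the standard errors get can be sketched with the variance inflation factor: with two predictors correlated at r, each slope's sampling variance is multiplied by 1/(1 - r^2) relative to the uncorrelated case. A minimal sketch (the function name `vif` is mine, not from the notes):

```python
def vif(r: float) -> float:
    """Variance inflation factor for one of two predictors correlated at r.

    The slope's sampling variance is multiplied by this factor, so the
    standard error grows by its square root.
    """
    return 1.0 / (1.0 - r ** 2)

# At r = 0.90 the slope variance is inflated more than fivefold,
# so the standard error roughly doubles (sqrt(5.26) is about 2.29).
print(vif(0.90))
```

This is why both individual t-tests can be non-significant even when the overall F-test is significant: each coefficient is tested against an inflated standard error.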
F = (R^2 / k) / [ (1 - R^2) / (N - k - 1) ]

Or

F = (N - p - 1) R^2 / [ p (1 - R^2) ]

k and p = # of predictors
df = p (or k), (N - p (or k) - 1)
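The two forms of the overall F-test above are algebraically identical, which can be checked numerically. A minimal sketch (function names are mine):

```python
def overall_f(r2: float, n: int, p: int) -> float:
    """Overall F for H0: all p slopes are zero; df = p and n - p - 1."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

def overall_f_alt(r2: float, n: int, p: int) -> float:
    """The second, algebraically identical form from the notes."""
    return ((n - p - 1) * r2) / (p * (1.0 - r2))

# R^2 = .50 with 3 predictors and N = 30: both forms give 26/3, about 8.67
print(overall_f(0.50, 30, 3))
print(overall_f_alt(0.50, 30, 3))
```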
Limitations of tests of significance: both of individual predictors and of the overall model
If there are small differences in the betas of the various predictors, different
patterns of significance may easily arise from another sample. The relative
variabilities of the predictors may change.
A significant beta does NOT necessarily mean that the variable is of theoretical or
of practical importance. The issue of IMPORTANCE is a difficult one. The relative size
of the betas is not always the solution. (For example, your ability to manipulate a variable
may be as important an issue in practical terms.)
Difference between two R^2 values:

F = [ (R^2_larger - R^2_smaller) / (k_larger - k_smaller) ] / [ (1 - R^2_larger) / (N - k_larger - 1) ]

k = # of variables in the regression
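The R^2-difference test above can be sketched as a small function (the name and the example numbers are mine, chosen only for illustration):

```python
def r2_change_f(r2_large: float, r2_small: float,
                k_large: int, k_small: int, n: int) -> float:
    """F for the increment in R^2 when predictors are added.

    df = (k_large - k_small) and (n - k_large - 1).
    """
    numerator = (r2_large - r2_small) / (k_large - k_small)
    denominator = (1.0 - r2_large) / (n - k_large - 1)
    return numerator / denominator

# Going from 3 to 5 predictors raises R^2 from .30 to .40 with N = 106:
# F = (.10 / 2) / (.60 / 100) = 25/3, about 8.33, on df = 2 and 100
print(r2_change_f(0.40, 0.30, 5, 3, 106))
```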
Types of Analysis: Data types
Cross-sectional: cases represent different objects at one point in time
Time-Series: same object and variables are tested over time
- a lagged dependent variable (criterion) (value at previous time) can
be used as an independent variable (predictor)
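Building a lagged criterion to use as a predictor can be sketched as follows (the helper name `lag` is mine; the first k time points have no lagged value and are marked None):

```python
def lag(series, k: int = 1):
    """Shift a time series back k steps, padding the start with None."""
    return [None] * k + list(series[:-k])

# A criterion measured at five time points; its lag-1 version can serve
# as a predictor of the value at each subsequent time point.
y = [10, 12, 11, 15, 14]
y_lag1 = lag(y)
print(y_lag1)  # [None, 10, 12, 11, 15]
```

In practice the first k cases (with no lagged value) are dropped before fitting the regression.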
Continuous versus dummy variables
Dummy variables: categorical, binary, dichotomous (0 and 1).
There may be more than two categories; for example, there may be four. Four
categories would produce three dummy variables. Let us say that there are four
types of people: A, B, C, and D. There would be three variables: A (yes/no, 0/1),
B (0/1), and C (0/1); a case coded all zeros belongs to the fourth category, D,
which is reflected in the intercept.
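The four-category coding scheme above can be sketched directly (the helper name `dummy_code` is mine; D is the reference category carried by the intercept):

```python
def dummy_code(category: str, levels=("A", "B", "C")) -> dict:
    """0/1 indicators for every level except the reference category D."""
    return {lvl: int(category == lvl) for lvl in levels}

print(dummy_code("B"))  # {'A': 0, 'B': 1, 'C': 0}
print(dummy_code("D"))  # {'A': 0, 'B': 0, 'C': 0} -- all zeros: D is the intercept
```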
Interactions: derived variables
Between two continuous variables or between one continuous and
one dummy variable.
y = a + b1x1 + b2x2 + b3x1x2
If x1 is the continuous variable, then b1 tells us its effect on the criterion when x2 = 0.
b1 + b3 tells us the effect when x2 = 1, and b3 tells us the difference between the two slopes.
For two continuous variables, the additional interaction term will indicate whether the
effect of x1 at low values of x2 is greater or less than its effect at higher values
of x2.
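The slope interpretation above follows from differentiating y = a + b1*x1 + b2*x2 + b3*x1*x2 with respect to x1, which gives b1 + b3*x2. A minimal sketch (the function name and coefficient values are mine, for illustration only):

```python
def slope_of_x1(b1: float, b3: float, x2: float) -> float:
    """Effect of x1 on y at a given x2, in y = a + b1*x1 + b2*x2 + b3*x1*x2."""
    return b1 + b3 * x2

b1, b3 = 2.0, 0.5
print(slope_of_x1(b1, b3, 0))  # 2.0 -> b1 alone when the dummy x2 = 0
print(slope_of_x1(b1, b3, 1))  # 2.5 -> b1 + b3 when the dummy x2 = 1
```

The gap between the two printed slopes is exactly b3, the difference between the two regression lines.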
Remember, adding additional predictor variables, even interaction terms,
can change the betas of all other predictors.
Basic issues
If you omit a relevant variable(s) from your regression, the betas of the included
variables will be at best unreliable and at worst invalid.
If you include an irrelevant predictor variable, the betas of the other relevant
variables remain unbiased; however, to the extent that this irrelevant variable
is correlated with some of the other predictors, it will increase the size of the
standard errors (reduce power).
If the underlying function between one or more of the predictors and the criterion
is something other than linear, then the betas will be biased and unreliable. This
is one reason why it is important to look at all bivariate plots prior to the analysis.
Addressing Collinearity
Ideally, you should collect new data that is free of multicollinearity. This
usually requires an experimental design (creating true independent variables).
This is usually not feasible, or it would have been done in the first place.
1. Model Respecification: combining correlated variables through various
techniques or choosing to remove some. (Theoretical & Statistical)
2. Statistical Variable Selection
a. Step-wise procedures: can be deceptive and often fail to maximize R^2
b. Examine all subsets: may reveal subsets with similar R^2,
but the resulting solution may not fit with either the research question
or the theoretical approach.