3.1-Multivariate-Analysis
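• The simple linear regression model for the $n$ observations is
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n \tag{11-3}$$
• and the sum of the squares of the deviations of the observations from the true regression line is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2 \tag{11-4}$$
• The least squares estimators $\hat\beta_0$ and $\hat\beta_1$ must satisfy
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$$
$$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) x_i = 0 \tag{11-5}$$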
Method of Least Squares
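• Simplifying Equations (11-5):
$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$\hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i \tag{11-6}$$
• Equations 11-6 (least squares normal equations)
• Solving the normal equations gives
$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x \tag{11-7}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \tag{11-8}$$
where $\bar y = (1/n)\sum_{i=1}^{n} y_i$ and $\bar x = (1/n)\sum_{i=1}^{n} x_i$.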
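The normal equations are a 2×2 linear system in $\hat\beta_0$ and $\hat\beta_1$, so they can be solved directly. The following is a minimal sketch, assuming plain Python lists as input. The function name `solve_normal_equations` is hypothetical, and the data are the midterm and final grades from exercise 11.2 in these notes.

```python
def solve_normal_equations(x, y):
    """Solve the two least squares normal equations (11-6) for (b0, b1)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Normal equations written as a 2x2 linear system:
    #   n*b0     + sum_x*b1  = sum_y
    #   sum_x*b0 + sum_xx*b1 = sum_xy
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Cramer's rule for the 2x2 system
    b0 = (sum_y * sum_xx - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1


# Midterm (x) and final examination (y) grades from exercise 11.2
x = [77, 50, 71, 72, 81, 94, 96, 99, 67]
y = [82, 66, 78, 34, 47, 85, 99, 99, 68]

b0, b1 = solve_normal_equations(x, y)
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
# The intercept should agree with Equation 11-7: b0 = y_bar - b1 * x_bar
print(abs(b0 - (y_bar - b1 * x_bar)) < 1e-9)
```

Solving the system directly and using the closed forms 11-7 and 11-8 are algebraically equivalent; the sketch only checks that numerically.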
Least Squares Estimates
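• Notationally, it is occasionally convenient to give special symbols to the numerator and denominator of Equation 11-8. Given data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, let
$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \tag{11-10}$$
• Equation 11-10 (denominator)
and
$$S_{xy} = \sum_{i=1}^{n} (y_i - \bar y)(x_i - \bar x) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \tag{11-11}$$
• Equation 11-11 (numerator)
so that
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}$$
• Equation 11-9 (fitted or estimated regression line)
$$\hat y = \hat\beta_0 + \hat\beta_1 x \tag{11-9}$$
• Note that each pair of observations satisfies the relationship
$$y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i, \quad i = 1, 2, \ldots, n$$
• where $e_i = y_i - \hat y_i$ is called the residual.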
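As an illustration of the $S_{xx}$ and $S_{xy}$ notation, the sketch below computes both quantities with the computing forms and with the centered (definitional) forms, then builds the fitted values and residuals. It assumes the exercise 11.2 grades as data, and the helper name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(x, y):
    """Return (Sxx, Sxy) using the computing forms of Equations 11-10 and 11-11."""
    n = len(x)
    sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    return sxx, sxy


# Midterm (x) and final examination (y) grades from exercise 11.2
x = [77, 50, 71, 72, 81, 94, 96, 99, 67]
y = [82, 66, 78, 34, 47, 85, 99, 99, 68]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx, sxy = sums_of_squares(x, y)

# Centered forms, for comparison with the computing forms
sxx_centered = sum((xi - x_bar) ** 2 for xi in x)
sxy_centered = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx               # slope
b0 = y_bar - b1 * x_bar      # intercept, Equation 11-7

fitted = [b0 + b1 * xi for xi in x]                  # y-hat for each x_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = y_i - y-hat_i

print(f"Sxx = {sxx:.4f}, Sxy = {sxy:.4f}")
print(f"fitted line: y_hat = {b0:.4f} + {b1:.4f} x")
# The first normal equation forces the residuals to sum to zero
# (up to floating-point rounding).
print(f"sum of residuals = {sum(residuals):.2e}")
```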
Fitted or Estimated Regression Line
11.2/398 The grades of a class of 9 students on a midterm report (x) and on the final examination (y) are as follows:
x  77  50  71  72  81  94  96  99  67
y  82  66  78  34  47  85  99  99  68
Sample correlation coefficient:
$$r = \hat\beta_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
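The two expressions for $r$ can be checked against each other numerically. This is a small sketch on the grades data above; the variable names are illustrative, not part of the original exercise.

```python
import math

# Midterm (x) and final examination (y) grades from exercise 11.2
x = [77, 50, 71, 72, 81, 94, 96, 99, 67]
y = [82, 66, 78, 34, 47, 85, 99, 99, 68]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx  # least squares slope

# Two equivalent forms of the sample correlation coefficient
r_from_slope = b1 * math.sqrt(sxx / syy)
r_from_sums = sxy / math.sqrt(sxx * syy)

print(f"r (via slope) = {r_from_slope:.4f}")
print(f"r (via Sxy)   = {r_from_sums:.4f}")
```

Both forms give the same value because $\hat\beta_1 = S_{xy}/S_{xx}$.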
Interpretations of r
±1.00 perfect positive (negative) correlation
±0.91 - ±0.99 very high positive (negative) correlation
±0.71 - ±0.90 high positive (negative) correlation
±0.51 - ±0.70 moderate positive (negative) correlation
±0.31 - ±0.50 low positive (negative) correlation
±0.01 - ±0.30 negligible positive (negative) correlation
0.00 no correlation
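The interpretation table can be read as a simple lookup on $|r|$. The sketch below is one way to encode it. It assumes that $r$ is first rounded to two decimals, because the table leaves small gaps between bands (for example between 0.30 and 0.31); that rounding rule and the function name `interpret_r` are assumptions, not part of the table.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal label in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    # Assumption: work at the two-decimal precision used by the table.
    size = round(abs(r), 2)
    if size == 0.0:
        return "no correlation"

    direction = "positive" if r > 0 else "negative"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.91:
        strength = "very high"
    elif size >= 0.71:
        strength = "high"
    elif size >= 0.51:
        strength = "moderate"
    elif size >= 0.31:
        strength = "low"
    else:
        strength = "negligible"
    return f"{strength} {direction} correlation"


print(interpret_r(0.56))
print(interpret_r(-0.98))
print(interpret_r(0.0))
```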
Coefficient of Determination
Denoted by $r^2$
• A descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data
• Ordinarily, we do not use $r^2$ for inference about $\rho^2$.
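To make "how well the regression line fits the data" concrete, $r^2$ can be compared with the fraction of the total variation in $y$ that the fitted line removes, $1 - SSE/SST$. Combining $SSE = SST - \hat\beta_1 S_{xy}$ with $SST = S_{yy}$ shows the two are equal in simple linear regression. The sketch below checks this on the exercise 11.2 grades; the variable names are illustrative.

```python
# Midterm (x) and final examination (y) grades from exercise 11.2
x = [77, 50, 71, 72, 81, 94, 96, 99, 67]
y = [82, 66, 78, 34, 47, 85, 99, 99, 68]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Error sum of squares from the residuals, total sum of squares from y
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = syy

r_squared = sxy ** 2 / (sxx * syy)   # square of r = Sxy / sqrt(Sxx * Syy)
explained = 1 - sse / sst            # share of variability accounted for

print(f"r^2         = {r_squared:.4f}")
print(f"1 - SSE/SST = {explained:.4f}")
```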
Coefficient of Determination
11-13/400 A study of the amount of rainfall and the quantity of air pollution removed produced the following data:
Find the equation of the regression line to predict the particulate removed from the amount of daily rainfall.
Estimate the amount of particulate removed when the daily rainfall is x = 4.8 units.
Calculate r.

Daily Rainfall, x (0.01 cm)    Particulate Removed, y (μg/m³)
4.3                            126
4.5                            121
5.9                            116
5.6                            118
6.1                            114
5.2                            118
3.8                            132
2.1                            141
7.5                            108
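One way to work this exercise numerically is sketched below, using $\hat\beta_1 = S_{xy}/S_{xx}$, $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$ and $r = S_{xy}/\sqrt{S_{xx} S_{yy}}$. The function name `fit_line` is hypothetical, and the printed values are rounded for display.

```python
import math


def fit_line(x, y):
    """Return (b0, b1, r) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return b0, b1, r


# Daily rainfall x (0.01 cm) and particulate removed y (micrograms per m^3)
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
removed = [126, 121, 116, 118, 114, 118, 132, 141, 108]

b0, b1, r = fit_line(rainfall, removed)
y_hat_48 = b0 + b1 * 4.8   # point estimate at x = 4.8 units

print(f"regression line: y_hat = {b0:.3f} + ({b1:.3f}) x")
print(f"estimate at x = 4.8: {y_hat_48:.2f}")
print(f"r = {r:.4f}")
```

The negative slope and the value of $r$ close to $-1$ both reflect that more rainfall goes with less particulate remaining to be removed in these data.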
Estimating σ2
There is actually another unknown parameter in our regression model, $\sigma^2$ (the variance of the error term $\epsilon$). The residuals $e_i = y_i - \hat y_i$ are used to obtain an estimate of $\sigma^2$. The sum of squares of the residuals, often called the error sum of squares, is
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 \tag{11-12}$$
Computing $SSE$ using Equation 11-12 would be fairly tedious. A more convenient computing formula can be obtained by substituting $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$ into Equation 11-12 and simplifying. The resulting computing formula is
$$SSE = SST - \hat\beta_1 S_{xy}$$
where $SST = \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - n\bar y^2$ is the total sum of squares of the response variable $y$. Formulas such as this are presented in Section 11-4.
Estimator of Variance
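We can show that the expected value of the error sum of squares is $E(SSE) = (n - 2)\sigma^2$. Therefore, an unbiased estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{SSE}{n-2}$$
and
$$SSE = SST - \hat\beta_1 S_{xy} \tag{11-13}$$
Recall:
$$\sigma_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar x)^2}{n}, \quad \sigma_y^2 = \frac{\sum_{i=1}^{n} (y_i - \bar y)^2}{n}, \quad r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
From calculator:
$$SST = \sum_{i=1}^{n} (y_i - \bar y)^2 = S_{yy} = n\sigma_y^2$$
$$\hat\beta_1 = B$$
$$S_{xy} = r\sqrt{S_{xx} S_{yy}} = r\sqrt{n\sigma_x^2 \cdot n\sigma_y^2} = n r \sigma_x \sigma_y$$
Finally,
$$SSE = n\sigma_y^2 - B\,n\,r\,\sigma_x \sigma_y$$
so that, from calculator,
$$\hat\sigma^2 = \frac{n\sigma_y^2 - B\,n\,r\,\sigma_x \sigma_y}{n-2}$$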
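The three routes to $SSE$ (the definition, the computing formula, and the calculator form) should agree. The sketch below checks this on the rainfall data from exercise 11-13. It assumes that the calculator's $\sigma_x$ and $\sigma_y$ are population standard deviations (divisor $n$), which is what `statistics.pstdev` computes, and that `B` is the slope reported by the calculator.

```python
import math
import statistics

# Daily rainfall x (0.01 cm) and particulate removed y, exercise 11-13
x = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
y = [126, 121, 116, 118, 114, 118, 132, 141, 108]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)   # this is SST
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Route 1: definition, sum of squared residuals (Equation 11-12)
sse_direct = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Route 2: computing formula, SSE = SST - b1 * Sxy
sse_formula = syy - b1 * sxy

# Route 3: calculator form, SSE = n*sigma_y^2 - B*n*r*sigma_x*sigma_y
sigma_x = statistics.pstdev(x)   # population standard deviation of x
sigma_y = statistics.pstdev(y)   # population standard deviation of y
r = sxy / math.sqrt(sxx * syy)
B = b1                           # slope as a calculator would report it
sse_calculator = n * sigma_y ** 2 - B * n * r * sigma_x * sigma_y

sigma2_hat = sse_formula / (n - 2)   # unbiased estimator of sigma^2

print(f"SSE (definition) = {sse_direct:.4f}")
print(f"SSE (formula)    = {sse_formula:.4f}")
print(f"SSE (calculator) = {sse_calculator:.4f}")
print(f"sigma^2 estimate = {sigma2_hat:.4f}")
```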
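TABLE 11-1 Oxygen and Hydrocarbon Levels
Columns: Observation Number, Hydrocarbon Level x (%), Purity y (%) [data values not reproduced]
Annotated regression output:
HC level: coefficient 14.947 ($\hat\beta_1$), standard error 1.317, t statistic 11.35, p-value 0.000
Residual error: 18 degrees of freedom, sum of squares 21.25 ($SSE$), mean square 1.18 ($\hat\sigma^2$)
Residuals: $e_i = y_i - \hat y_i$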
Summary
• Generally, correlation is a measure of the interdependence among data. The concept may include more than two variables. The term is most commonly used in a narrow sense to express the relationship between quantitative variables or ranks.
• The correlation coefficient (r) is a dimensionless measure of the linear association between two variables, lying in the interval from $-1$ to $+1$, with zero indicating the absence of correlation (but not necessarily the independence of the two variables).

Sample Correlation Coefficient
$$r = \hat\beta_1 \sqrt{\frac{S_{xx}}{S_{yy}}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
where
$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
The coefficient of determination ($r^2$) is often used to judge the adequacy of a regression model. Its value tells that the model accounts for $100\,r^2$% of the variability in the data.
Summary
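Test Statistic for the Slope
• $T_0 = \dfrac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2 / S_{xx}}} = \dfrac{\hat\beta_1 - \beta_{1,0}}{se(\hat\beta_1)}$
• $\nu = n - 2$
• $se(\hat\beta_1)$ is the standard error of the slope.
Test Statistic for the Intercept
• $T_0 = \dfrac{\hat\beta_0 - \beta_{0,0}}{\sqrt{\hat\sigma^2 \left[\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}\right]}} = \dfrac{\hat\beta_0 - \beta_{0,0}}{se(\hat\beta_0)}$
• $\nu = n - 2$
• $se(\hat\beta_0)$ is the standard error of the intercept.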
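The two test statistics can be computed from the same sums of squares used for the fit. The sketch below applies them to the rainfall data from exercise 11-13, with hypothesized values $\beta_{1,0} = 0$ and $\beta_{0,0} = 0$ chosen only for illustration. The function name `regression_t_statistics` is hypothetical, and the comparison with a $t$ critical value on $\nu = n - 2$ degrees of freedom is left to a table or a statistics library.

```python
import math


def regression_t_statistics(x, y, beta1_0=0.0, beta0_0=0.0):
    """Return T0 for the slope and intercept, with their degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    sse = syy - b1 * sxy            # SSE = SST - b1 * Sxy
    sigma2_hat = sse / (n - 2)      # unbiased estimator of sigma^2

    se_b1 = math.sqrt(sigma2_hat / sxx)
    se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))

    return {
        "t_slope": (b1 - beta1_0) / se_b1,
        "t_intercept": (b0 - beta0_0) / se_b0,
        "df": n - 2,
    }


# Daily rainfall x (0.01 cm) and particulate removed y, exercise 11-13
x = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
y = [126, 121, 116, 118, 114, 118, 132, 141, 108]

stats = regression_t_statistics(x, y)
print(f"T0 (slope)     = {stats['t_slope']:.3f}")
print(f"T0 (intercept) = {stats['t_intercept']:.3f}")
print(f"degrees of freedom = {stats['df']}")
```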
References
• Aczel-Sounderpandian. Business Statistics, 7th Ed. © 2008
• Montgomery and Runger. Applied Statistics and Probability for Engineers, 6th Ed. © 2014
• Walpole, et al. Probability and Statistics for Engineers and Scientists, 9th Ed. © 2012, 2007, 2002