
8STT117

Session 10

Simple Linear
Regression
Part 2
Plan

Part 2
• Confidence intervals on the parameters of the regression line
• Hypothesis tests on the parameters of the regression line
• Point prediction and interval prediction
• Coefficient of determination and correlation

2
Confidence Intervals on b1 and b0

The confidence interval for estimating $\beta_1$, the slope of the theoretical regression model, at confidence level $1 - \alpha$ is given by:

$[LL, UL] = b_1 \pm t_{\alpha/2}\, S_{b_1}$   if $n - 2 < 30$, where $t_{\alpha/2}$ comes from the Student t distribution with $(n - 2)$ d.f.

$[LL, UL] = b_1 \pm z_{\alpha/2}\, S_{b_1}$   if $n - 2 \ge 30$, where $z_{\alpha/2}$ comes from $N(0, 1)$
3
Confidence Intervals on b1 and b0

If the value X = 0 lies within the range of the observed values of X, then it is of interest to estimate $\beta_0$ by a confidence interval.

The confidence interval for estimating $\beta_0$, the y-intercept of the theoretical regression model, at confidence level $1 - \alpha$ is given by:

$[LL, UL] = b_0 \pm t_{\alpha/2}\, S_{b_0}$   if $n - 2 < 30$, where $t_{\alpha/2}$ comes from the Student t distribution with $(n - 2)$ d.f.

$[LL, UL] = b_0 \pm z_{\alpha/2}\, S_{b_0}$   if $n - 2 \ge 30$, where $z_{\alpha/2}$ comes from $N(0, 1)$

4
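As a numerical companion to the two intervals above, here is a minimal Python sketch (not part of the original slides) that switches between the Student t and the standard normal quantile according to the n − 2 < 30 rule; the function name `coef_confidence_interval` and the use of `scipy.stats` are illustrative choices, not something prescribed by the course.

```python
from scipy import stats

def coef_confidence_interval(b, s_b, n, alpha=0.05):
    """CI for a regression coefficient: b +/- quantile * s_b.

    b   : estimated coefficient (b0 or b1)
    s_b : its estimated standard error (S_b0 or S_b1)
    n   : sample size; the rule on the slides uses n - 2 degrees of freedom
    """
    df = n - 2
    if df < 30:
        q = stats.t.ppf(1 - alpha / 2, df)   # Student t quantile with (n - 2) d.f.
    else:
        q = stats.norm.ppf(1 - alpha / 2)    # standard normal quantile
    return b - q * s_b, b + q * s_b
```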
Confidence Intervals on b1 and b0

Estimation of $\sigma^2_{b_0}$ and $\sigma^2_{b_1}$

The variances $\sigma^2_{b_0}$ and $\sigma^2_{b_1}$ are unknown, so they are estimated by:

$S^2_{b_0} = S^2_e \left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]$   and   $S^2_{b_1} = \dfrac{S^2_e}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$

5
Confidence Intervals on b1 and b0

Estimation of the variance $\sigma^2$

In practice, the variance $\sigma^2$ is unknown, so it is estimated by:

$\hat{\sigma}^2 = S^2_e = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}$

6
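The three estimators above translate directly into code. A minimal sketch, assuming plain Python lists for the observations and fitted values (the helper name `variance_estimates` is illustrative):

```python
def variance_estimates(x, y, y_hat):
    """Return (S2_e, S2_b0, S2_b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)               # sum of (x_i - x_bar)^2
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
    s2_e = sse / (n - 2)                                   # estimate of sigma^2
    s2_b0 = s2_e * (1 / n + x_bar ** 2 / sxx)              # estimated variance of b0
    s2_b1 = s2_e / sxx                                     # estimated variance of b1
    return s2_e, s2_b0, s2_b1
```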
Confidence Intervals on b1 and b0
Example 2: $\hat{Y} = 33.31 + 3.95X$
(e.g., the fitted value for $x = 2$: $41.21 = 33.31 + 3.95 \times 2$)

| Advertising cost $x_i$ (M$) | Sales volume $y_i$ (M$) | $(x_i - \bar{x})^2$ | $\hat{y}_i$ | $y_i - \hat{y}_i$ | $(y_i - \hat{y}_i)^2$ | $(\hat{y}_i - \bar{y})^2$ | $(y_i - \bar{y})^2$ |
|---|---|---|---|---|---|---|---|
| 4 | 49.5 | 0.49 | 49.11 | 0.39 | 0.1521 | 7.6176 | 9.9225 |
| 2 | 41 | 1.69 | 41.21 | -0.21 | 0.0441 | 26.4196 | 28.6225 |
| 2.5 | 43 | 0.64 | 43.185 | -0.185 | 0.0342 | 10.0172 | 11.2225 |
| 2 | 39 | 1.69 | 41.21 | -2.21 | 4.8841 | 26.4196 | 54.0225 |
| 3 | 46 | 0.09 | 45.16 | 0.84 | 0.7056 | 1.4161 | 0.1225 |
| 5 | 53 | 2.89 | 53.06 | -0.06 | 0.0036 | 45.0241 | 44.2225 |
| 1 | 38 | 5.29 | 37.26 | 0.74 | 0.5476 | 82.6281 | 69.7225 |
| 5.5 | 54 | 4.84 | 55.035 | -1.035 | 1.0712 | 75.4292 | 58.5225 |
| 3.5 | 48.5 | 0.04 | 47.135 | 1.365 | 1.8632 | 0.6162 | 4.6225 |
| 4.5 | 51.5 | 1.44 | 51.085 | 0.415 | 0.1722 | 22.4202 | 26.5225 |
| Sum |  | 19.1 |  |  | 9.478 | 298.008 | 307.525 |

Mean: $\bar{x} = 3.3$, $\bar{y} = 46.35$

Calculate $S^2_e$, $S^2_{b_0}$, $S^2_{b_1}$

SSE = sum of squares due to error
SSR = sum of squares due to regression
SST = total sum of squares

7
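The columns of this table can be regenerated from the raw (x, y) pairs and the fitted line Ŷ = 33.31 + 3.95X. A short sketch (the variable names are illustrative):

```python
x = [4, 2, 2.5, 2, 3, 5, 1, 5.5, 3.5, 4.5]           # advertising cost (M$)
y = [49.5, 41, 43, 39, 46, 53, 38, 54, 48.5, 51.5]   # sales volume (M$)

b0, b1 = 33.31, 3.95                                 # fitted line from Part 1
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n                # 3.3 and 46.35

y_hat = [b0 + b1 * xi for xi in x]                   # fitted values
sxx = sum((xi - x_bar) ** 2 for xi in x)             # 19.1
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # ~9.478  (SSE)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # ~298.0  (SSR)
sst = sum((yi - y_bar) ** 2 for yi in y)                # 307.525 (SST)
print(sxx, sse, ssr, sst)
```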
Confidence Intervals on b1 and b0

SST = SSR + SSE

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

8
Calculate $S^2_e$, $S^2_{b_0}$, $S^2_{b_1}$

$S^2_e = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} = \dfrac{9.478}{8} = 1.1847$

$S^2_{b_0} = S^2_e \left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right] = 1.1847 \left[\dfrac{1}{10} + \dfrac{(3.3)^2}{19.1}\right] \approx 0.794$

$S^2_{b_1} = \dfrac{S^2_e}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \dfrac{1.1847}{19.1} \approx 0.062$
9
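As a quick numerical check of these three values, continuing with the sums taken from the table (the 0.794 value for $S^2_{b_0}$ is recomputed here from the data, not read off a slide):

```python
n, x_bar = 10, 3.3
sxx, sse = 19.1, 9.478                       # sum of (x_i - x_bar)^2 and SSE

s2_e = sse / (n - 2)                         # 1.1847
s2_b0 = s2_e * (1 / n + x_bar ** 2 / sxx)    # ~0.794
s2_b1 = s2_e / sxx                           # ~0.062
print(s2_e, s2_b0, s2_b1)
```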
Confidence Intervals on b1 and b0
Example:
Confidence interval for $\beta_1$ ($1 - \alpha = 95\%$):
$n - 2 = 10 - 2 = 8 < 30$, so $t_{0.025, 8\,d.f.} = 2.306$ (from the Student t table)

$[LL, UL] = b_1 \pm t_{\alpha/2}\, S_{b_1}$
$[LL, UL] = 3.95 \pm t_{0.025}\sqrt{0.062}$
$[LL, UL] = 3.95 \pm 2.306 \times \sqrt{0.062}$
$[LL, UL] = [3.3758, 4.5242]$

10
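The same interval can be reproduced without the Student table; a sketch assuming `scipy.stats` is available (its `t.ppf(0.975, 8)` equals 2.306 to three decimals):

```python
from math import sqrt
from scipy import stats

b1, s2_b1, df = 3.95, 0.062, 8
t_crit = stats.t.ppf(0.975, df)              # ~2.306
half_width = t_crit * sqrt(s2_b1)            # S_b1 = sqrt(0.062) ~ 0.249
print(b1 - half_width, b1 + half_width)      # ~(3.376, 4.524)
```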
Validation of the Empirical Regression Line

Hypothesis tests on $\beta_1$

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

$Y = \beta_0 + \beta_1 X + \varepsilon$

If $\beta_1 = 0$, the values of X do not influence Y.
11
Validation of the Empirical Regression Line
1. Hypotheses $H_0$ and $H_1$:
   $H_0: \beta_1 = 0$
   $H_1: \beta_1 \neq 0$

2. Conditions of the test:
   • Population normally distributed
   • Variance $\sigma^2$ unknown
   • If $n - 2 \ge 30$, use Z (standard normal)
   • If $n - 2 < 30$, use t (Student)

3. Test statistic:
   $Z_0 = \dfrac{b_1 - \beta_1}{S_{b_1}}$ if $n - 2 \ge 30$;   $T_0 = \dfrac{b_1 - \beta_1}{S_{b_1}}$ if $n - 2 < 30$

4. Critical region and rejection rule:
   Reject $H_0$ if $T_0 > t_{\alpha/2, (n-2)\,d.f.}$ or if $T_0 < -t_{\alpha/2, (n-2)\,d.f.}$
   Reject $H_0$ if $Z_0 > z_{\alpha/2}$ or if $Z_0 < -z_{\alpha/2}$

12
Example: Answer
$\alpha = 0.05$

Step 1:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Step 2:
$n - 2 = 8 < 30$, population normally distributed, $\sigma^2$ unknown

Step 3:
$T_0 = \dfrac{b_1 - \beta_1}{S_{b_1}} = \dfrac{3.95 - 0}{\sqrt{0.062}} = 15.86$

Step 4:
$t_{0.025, 8\,d.f.} = 2.306$
$T_0 = 15.86 > t_{0.025, 8\,d.f.} = 2.306$, so we reject $H_0$.

The statistical evidence is sufficient to conclude that there is a significant relationship between the advertising cost (X) and the sales volume (Y).

13
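A short sketch reproducing this two-sided t test on the slope (illustrative names; `scipy.stats` assumed for the critical value):

```python
from math import sqrt
from scipy import stats

b1, s_b1, df, alpha = 3.95, sqrt(0.062), 8, 0.05

t0 = (b1 - 0) / s_b1                         # test statistic under H0: beta1 = 0, ~15.86
t_crit = stats.t.ppf(1 - alpha / 2, df)      # ~2.306
reject_h0 = abs(t0) > t_crit                 # two-sided rejection rule
print(t0, t_crit, reject_h0)                 # 15.86, 2.306, True
```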
Example:

Forecast of the sales volume for a weekly advertising cost of $3.5 million:

$\hat{Y} = 33.31 + 3.95X$
$\hat{Y} = 33.31 + 3.95(3.5) = 47.14$ million dollars

14
Application of the Linear Regression Model

Once the regression model has been validated, it is possible to perform two types of applications:

• Confidence interval (on the mean value of Y for a given x₀)
• Prediction interval (on an individual value of Y for a given x₀)

15
Confidence Interval on a Regression Line

If we want to estimate the mean value of the regression line, for a specified value $x_0$ of X, by a confidence interval at level $1 - \alpha$, then:

If $\sigma^2$ is unknown and $n - 2 < 30$:

$[LL, UL] = b_0 + b_1 x_0 \pm t_{\alpha/2, (n-2)\,d.f.}\; S(\hat{Y}/x_0)$, with $S(\hat{Y}/x_0) = S_e \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

If $\sigma^2$ is known or if $n - 2 \ge 30$, we use $z_{\alpha/2}$.

16
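Under the same assumptions (σ² unknown, n − 2 < 30), this interval can be wrapped in a small helper. A minimal sketch with illustrative names:

```python
from math import sqrt
from scipy import stats

def mean_response_ci(x0, b0, b1, s_e, n, x_bar, sxx, alpha=0.05):
    """CI for the mean of Y at X = x0 (sigma^2 unknown, n - 2 < 30).

    sxx is the sum of (x_i - x_bar)^2 over the sample.
    """
    y_hat0 = b0 + b1 * x0
    s_yhat = s_e * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)   # S(Y_hat / x0)
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
    return y_hat0 - t_crit * s_yhat, y_hat0 + t_crit * s_yhat
```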
Confidence Interval on a Regression Line
Example
Estimate, by a confidence interval at $\alpha = 0.05$, the average sales volume when $4 million is invested in advertising.

Here $\sigma^2$ is unknown and $n - 2 = 8 < 30$.

$S(\hat{Y}/x_0) = S_e \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \sqrt{1.1847}\,\sqrt{\dfrac{1}{10} + \dfrac{(4 - 3.3)^2}{19.1}} = 0.3858$

$[LL, UL] = b_0 + b_1 x_0 \pm t_{0.025, 8\,d.f.}\; S(\hat{Y}/x_0)$
$= 33.31 + 3.95(4) \pm 2.306(0.3858)$
$= 49.11 \pm 2.306(0.3858)$
$= [48.22, 50]$

17
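The same numbers can be checked inline (S_e = √1.1847; a self-contained sketch rather than the helper above):

```python
from math import sqrt
from scipy import stats

s_e = sqrt(1.1847)
s_yhat = s_e * sqrt(1 / 10 + (4 - 3.3) ** 2 / 19.1)   # ~0.3858
y_hat0 = 33.31 + 3.95 * 4                             # 49.11
t_crit = stats.t.ppf(0.975, 8)                        # ~2.306
print(y_hat0 - t_crit * s_yhat, y_hat0 + t_crit * s_yhat)   # ~(48.22, 50.00)
```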
Confidence Interval on an Individual
Prediction
In addition to the n observations in the sample, it is possible to make a new observation $x_0$ of X, independent of the first n. We then want to predict the value of Y corresponding to this new observation. The prediction interval at level $1 - \alpha$ is:

If $\sigma^2$ is unknown and $n - 2 < 30$:

$[LL, UL] = b_0 + b_1 x_0 \pm t_{\alpha/2, (n-2)\,d.f.}\; S(Y_0 - \hat{Y}_0)$, with $S(Y_0 - \hat{Y}_0) = S_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

If $\sigma^2$ is known or if $n - 2 \ge 30$, we use $z_{\alpha/2}$.

18
Confidence Interval on an Individual
Prediction

Example: Provide a prediction interval at $\alpha = 0.05$ for the sales volume of a new observation, if $4 million in advertising is invested.

Here $\sigma^2$ is unknown and $n - 2 = 8 < 30$.

$S(Y_0 - \hat{Y}_0) = S_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \sqrt{1.1847}\,\sqrt{1 + \dfrac{1}{10} + \dfrac{(4 - 3.3)^2}{19.1}} = 1.155$

$[LL, UL] = b_0 + b_1 x_0 \pm t_{0.025, 8\,d.f.}\; S(Y_0 - \hat{Y}_0) = 49.11 \pm 2.306(1.155) = [46.45, 51.77]$
19
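A self-contained sketch of this prediction interval; it differs from the mean-response computation only by the extra "1 +" under the square root:

```python
from math import sqrt
from scipy import stats

s_e = sqrt(1.1847)
s_pred = s_e * sqrt(1 + 1 / 10 + (4 - 3.3) ** 2 / 19.1)   # ~1.155
y_hat0 = 33.31 + 3.95 * 4                                 # 49.11
t_crit = stats.t.ppf(0.975, 8)                            # ~2.306
print(y_hat0 - t_crit * s_pred, y_hat0 + t_crit * s_pred) # ~(46.45, 51.77)
```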
Correlation Analysis

20
Difference between regression and
correlation?

Simple linear regression is primarily concerned with the form of the linear relationship that exists between X and Y.

Correlation attempts to measure the intensity, or strength, of the relationship between X and Y.

21
Correlation Analysis

There are two possible measures to quantify the intensity of the relationship between X and Y:

• Coefficient of determination
• Coefficient of correlation

22
Coefficient of Determination
The coefficient of determination, denoted $R^2$ or $r^2$ and pronounced "R squared", measures the proportion of the total variation of y that is explained by the regression.

$0 \le r^2_{YX} \le 1$

$r^2_{YX}$ is an indicator of how well the fitted line matches the experimental points.

23
Coefficient of Determination
Example 2:

$r^2_{YX} = r^2 = \dfrac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \dfrac{SSR}{SST} = \dfrac{298.008}{307.525} \approx 0.9692$

96.92% of the variation of y is explained by the linear regression.

Since $r^2_{YX}$ is close to 1, we can say that the regression line fits the scatter plot very well.

| Advertising cost $x_i$ (M$) | Sales volume $y_i$ (M$) | $(x_i - \bar{x})^2$ | $\hat{y}_i$ | $y_i - \hat{y}_i$ | $(y_i - \hat{y}_i)^2$ | $(\hat{y}_i - \bar{y})^2$ | $(y_i - \bar{y})^2$ |
|---|---|---|---|---|---|---|---|
| 4 | 49.5 | 0.49 | 49.11 | 0.39 | 0.1521 | 7.6176 | 9.9225 |
| 2 | 41 | 1.69 | 41.21 | -0.21 | 0.0441 | 26.4196 | 28.6225 |
| 2.5 | 43 | 0.64 | 43.185 | -0.185 | 0.0342 | 10.0172 | 11.2225 |
| 2 | 39 | 1.69 | 41.21 | -2.21 | 4.8841 | 26.4196 | 54.0225 |
| 3 | 46 | 0.09 | 45.16 | 0.84 | 0.7056 | 1.4161 | 0.1225 |
| 5 | 53 | 2.89 | 53.06 | -0.06 | 0.0036 | 45.0241 | 44.2225 |
| 1 | 38 | 5.29 | 37.26 | 0.74 | 0.5476 | 82.6281 | 69.7225 |
| 5.5 | 54 | 4.84 | 55.035 | -1.035 | 1.0712 | 75.4292 | 58.5225 |
| 3.5 | 48.5 | 0.04 | 47.135 | 1.365 | 1.8632 | 0.6162 | 4.6225 |
| 4.5 | 51.5 | 1.44 | 51.085 | 0.415 | 0.1722 | 22.4202 | 26.5225 |
| Sum |  | 19.1 |  |  | 9.478 | 298.008 | 307.525 |

Mean: $\bar{x} = 3.3$, $\bar{y} = 46.35$

24
Coefficient of Correlation
Example 2:

$r_{XY} = (\text{sign of } b_1)\sqrt{r^2} = +\sqrt{0.9692} = 0.9845$

(where $b_1$ is the slope of the regression line)

Remarks:
• $-1 \le r_{XY} \le 1$
• $r_{XY} = \pm 1$ indicates a perfect (very strong) linear correlation
• $r_{XY} \approx 0$ indicates a weak linear correlation

25
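Both coefficients follow directly from the sums of squares in the data table. A minimal sketch (names are illustrative; the slight difference from the slide's 0.9692 comes from rounding in the tabulated sums):

```python
from math import sqrt

ssr, sst = 298.008, 307.525   # sums of squares from the data table
b1 = 3.95                     # slope of the fitted regression line

r_squared = ssr / sst                         # ~0.969 (coefficient of determination)
r = (1 if b1 > 0 else -1) * sqrt(r_squared)   # ~0.984 (coefficient of correlation)
print(r_squared, r)
```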
Coefficient of Correlation

[Scatter plots illustrating a positive linear relationship, a negative linear relationship, and no linear relationship]

26
Session 10

The End

27
