
8STT117

Session 10

Simple Linear
Regression
Part 2
Plan

Part 2
• Confidence intervals on the parameters of the regression line
• Hypothesis tests on the parameters of the regression line
• Point prediction and interval prediction
• Coefficient of determination and correlation

2
Confidence Intervals on b1 and b0

The confidence interval for estimating $\beta_1$, the slope of the theoretical regression model, at confidence level $1 - \alpha$ is given by:

$[LL, UL] = b_1 \pm t_{\alpha/2}\, S_{b_1}$   if $n - 2 < 30$, where $t_{\alpha/2}$ comes from the Student t distribution with $(n - 2)$ d.f.

$[LL, UL] = b_1 \pm z_{\alpha/2}\, S_{b_1}$   if $n - 2 \ge 30$, where $z_{\alpha/2}$ comes from $N(0, 1)$
3
Confidence Intervals on b1 and b0

If the value X = 0 lies within the range of the observed values of X, then it is of interest to estimate $\beta_0$ by a confidence interval.

The confidence interval for estimating $\beta_0$, the y-intercept of the theoretical regression model, at confidence level $1 - \alpha$ is given by:

$[LL, UL] = b_0 \pm t_{\alpha/2}\, S_{b_0}$   if $n - 2 < 30$, where $t_{\alpha/2}$ comes from the Student t distribution with $(n - 2)$ d.f.

$[LL, UL] = b_0 \pm z_{\alpha/2}\, S_{b_0}$   if $n - 2 \ge 30$, where $z_{\alpha/2}$ comes from $N(0, 1)$

4
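As a numerical companion to the two intervals above, here is a minimal Python sketch (not part of the original slides) that switches between the Student t and the standard normal quantile according to the n − 2 < 30 rule; the function name `coef_confidence_interval` and the use of `scipy.stats` are illustrative choices, not something prescribed by the course.

```python
from scipy import stats

def coef_confidence_interval(b, s_b, n, alpha=0.05):
    """CI for a regression coefficient: b +/- quantile * s_b.

    b   : estimated coefficient (b0 or b1)
    s_b : its estimated standard error (S_b0 or S_b1)
    n   : sample size; the rule on the slides uses n - 2 degrees of freedom
    """
    df = n - 2
    if df < 30:
        q = stats.t.ppf(1 - alpha / 2, df)   # Student t quantile with (n - 2) d.f.
    else:
        q = stats.norm.ppf(1 - alpha / 2)    # standard normal quantile
    return b - q * s_b, b + q * s_b
```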
Confidence Intervals on b1 and b0

Estimation of $\sigma^2_{b_0}$ and $\sigma^2_{b_1}$

The variances $\sigma^2_{b_0}$ and $\sigma^2_{b_1}$ are unknown, so they are estimated by:

$S^2_{b_0} = S^2_e \left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right]$   and   $S^2_{b_1} = \dfrac{S^2_e}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$

5
Confidence Intervals on b1 and b0

Estimation of the variance $\sigma^2$

In practice, the variance $\sigma^2$ is unknown, so it is estimated by:

$\hat{\sigma}^2 = S^2_e = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}$

6
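The three estimators above translate directly into code. A minimal sketch, assuming plain Python lists for the observations and fitted values (the helper name `variance_estimates` is illustrative):

```python
def variance_estimates(x, y, y_hat):
    """Return (S2_e, S2_b0, S2_b1) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)               # sum of (x_i - x_bar)^2
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
    s2_e = sse / (n - 2)                                   # estimate of sigma^2
    s2_b0 = s2_e * (1 / n + x_bar ** 2 / sxx)              # estimated variance of b0
    s2_b1 = s2_e / sxx                                     # estimated variance of b1
    return s2_e, s2_b0, s2_b1
```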
Confidence Intervals on b1 and b0
Example 2: $\hat{Y} = 33.31 + 3.95X$
(e.g., the fitted value for $x = 2$: $41.21 = 33.31 + 3.95 \times 2$)

| Advertising cost $x_i$ (M$) | Sales volume $y_i$ (M$) | $(x_i - \bar{x})^2$ | $\hat{y}_i$ | $y_i - \hat{y}_i$ | $(y_i - \hat{y}_i)^2$ | $(\hat{y}_i - \bar{y})^2$ | $(y_i - \bar{y})^2$ |
|---|---|---|---|---|---|---|---|
| 4 | 49.5 | 0.49 | 49.11 | 0.39 | 0.1521 | 7.6176 | 9.9225 |
| 2 | 41 | 1.69 | 41.21 | -0.21 | 0.0441 | 26.4196 | 28.6225 |
| 2.5 | 43 | 0.64 | 43.185 | -0.185 | 0.0342 | 10.0172 | 11.2225 |
| 2 | 39 | 1.69 | 41.21 | -2.21 | 4.8841 | 26.4196 | 54.0225 |
| 3 | 46 | 0.09 | 45.16 | 0.84 | 0.7056 | 1.4161 | 0.1225 |
| 5 | 53 | 2.89 | 53.06 | -0.06 | 0.0036 | 45.0241 | 44.2225 |
| 1 | 38 | 5.29 | 37.26 | 0.74 | 0.5476 | 82.6281 | 69.7225 |
| 5.5 | 54 | 4.84 | 55.035 | -1.035 | 1.0712 | 75.4292 | 58.5225 |
| 3.5 | 48.5 | 0.04 | 47.135 | 1.365 | 1.8632 | 0.6162 | 4.6225 |
| 4.5 | 51.5 | 1.44 | 51.085 | 0.415 | 0.1722 | 22.4202 | 26.5225 |
| Sum |  | 19.1 |  |  | 9.478 | 298.008 | 307.525 |

Mean: $\bar{x} = 3.3$, $\bar{y} = 46.35$

Calculate $S^2_e$, $S^2_{b_0}$, $S^2_{b_1}$

SSE = sum of squares due to error
SSR = sum of squares due to regression
SST = total sum of squares

7
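The columns of this table can be regenerated from the raw (x, y) pairs and the fitted line Ŷ = 33.31 + 3.95X. A short sketch (the variable names are illustrative):

```python
x = [4, 2, 2.5, 2, 3, 5, 1, 5.5, 3.5, 4.5]           # advertising cost (M$)
y = [49.5, 41, 43, 39, 46, 53, 38, 54, 48.5, 51.5]   # sales volume (M$)

b0, b1 = 33.31, 3.95                                 # fitted line from Part 1
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n                # 3.3 and 46.35

y_hat = [b0 + b1 * xi for xi in x]                   # fitted values
sxx = sum((xi - x_bar) ** 2 for xi in x)             # 19.1
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # ~9.478  (SSE)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # ~298.0  (SSR)
sst = sum((yi - y_bar) ** 2 for yi in y)                # 307.525 (SST)
print(sxx, sse, ssr, sst)
```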
Confidence Intervals on b1 and b0

SST = SSR + SSE

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error

8
Calculate $S^2_e$, $S^2_{b_0}$, $S^2_{b_1}$

$S^2_e = \dfrac{\sum_{i=1}^{n} e_i^2}{n - 2} = \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} = \dfrac{9.478}{8} = 1.1847$

$S^2_{b_0} = S^2_e \left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right] = 1.1847 \left[\dfrac{1}{10} + \dfrac{(3.3)^2}{19.1}\right] \approx 0.794$

$S^2_{b_1} = \dfrac{S^2_e}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \dfrac{1.1847}{19.1} \approx 0.062$
9
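As a quick numerical check of these three values, continuing with the sums taken from the table (the 0.794 value for $S^2_{b_0}$ is recomputed here from the data, not read off a slide):

```python
n, x_bar = 10, 3.3
sxx, sse = 19.1, 9.478                       # sum of (x_i - x_bar)^2 and SSE

s2_e = sse / (n - 2)                         # 1.1847
s2_b0 = s2_e * (1 / n + x_bar ** 2 / sxx)    # ~0.794
s2_b1 = s2_e / sxx                           # ~0.062
print(s2_e, s2_b0, s2_b1)
```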
Confidence Intervals on b1 and b0
Example:
Confidence interval for $\beta_1$ ($1 - \alpha = 95\%$):
$n - 2 = 10 - 2 = 8 < 30$, so $t_{0.025, 8\,d.f.} = 2.306$ (from the Student t table)

$[LL, UL] = b_1 \pm t_{\alpha/2}\, S_{b_1}$
$[LL, UL] = 3.95 \pm t_{0.025}\sqrt{0.062}$
$[LL, UL] = 3.95 \pm 2.306 \times \sqrt{0.062}$
$[LL, UL] = [3.3758, 4.5242]$

10
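The same interval can be reproduced without the Student table; a sketch assuming `scipy.stats` is available (its `t.ppf(0.975, 8)` equals 2.306 to three decimals):

```python
from math import sqrt
from scipy import stats

b1, s2_b1, df = 3.95, 0.062, 8
t_crit = stats.t.ppf(0.975, df)              # ~2.306
half_width = t_crit * sqrt(s2_b1)            # S_b1 = sqrt(0.062) ~ 0.249
print(b1 - half_width, b1 + half_width)      # ~(3.376, 4.524)
```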
Validation of the Empirical Regression Line

Hypothesis tests on $\beta_1$

To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero.

$Y = \beta_0 + \beta_1 X + \varepsilon$

If $\beta_1 = 0$, the values of X do not influence Y.
11
Validation of the Empirical Regression Line
1. Hypotheses $H_0$ and $H_1$:
   $H_0: \beta_1 = 0$
   $H_1: \beta_1 \neq 0$

2. Conditions of the test:
   • Population normally distributed
   • Variance $\sigma^2$ unknown
   • If $n - 2 \ge 30$, use Z (standard normal)
   • If $n - 2 < 30$, use t (Student)

3. Test statistic:
   $Z_0 = \dfrac{b_1 - \beta_1}{S_{b_1}}$ if $n - 2 \ge 30$;   $T_0 = \dfrac{b_1 - \beta_1}{S_{b_1}}$ if $n - 2 < 30$

4. Critical region and rejection rule:
   Reject $H_0$ if $T_0 > t_{\alpha/2, (n-2)\,d.f.}$ or if $T_0 < -t_{\alpha/2, (n-2)\,d.f.}$
   Reject $H_0$ if $Z_0 > z_{\alpha/2}$ or if $Z_0 < -z_{\alpha/2}$

12
Example: Answer
$\alpha = 0.05$

Step 1:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Step 2:
$n - 2 = 8 < 30$, population normally distributed, $\sigma^2$ unknown

Step 3:
$T_0 = \dfrac{b_1 - \beta_1}{S_{b_1}} = \dfrac{3.95 - 0}{\sqrt{0.062}} = 15.86$

Step 4:
$t_{0.025, 8\,d.f.} = 2.306$
$T_0 = 15.86 > t_{0.025, 8\,d.f.} = 2.306$, so we reject $H_0$.

The statistical evidence is sufficient to conclude that there is a significant relationship between the advertising cost (X) and the sales volume (Y).

13
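A short sketch reproducing this two-sided t test on the slope (illustrative names; `scipy.stats` assumed for the critical value):

```python
from math import sqrt
from scipy import stats

b1, s_b1, df, alpha = 3.95, sqrt(0.062), 8, 0.05

t0 = (b1 - 0) / s_b1                         # test statistic under H0: beta1 = 0, ~15.86
t_crit = stats.t.ppf(1 - alpha / 2, df)      # ~2.306
reject_h0 = abs(t0) > t_crit                 # two-sided rejection rule
print(t0, t_crit, reject_h0)                 # 15.86, 2.306, True
```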
Example:

Forecast of the sales volume for a weekly advertising cost of $3.5 million:

$\hat{Y} = 33.31 + 3.95X$
$\hat{Y} = 33.31 + 3.95(3.5) = 47.14$ million dollars

14
Application of the Linear Regression Model

Once the regression model has been validated, it is possible to perform two types of applications:

• Confidence interval (on the mean value of Y for a given x₀)
• Prediction interval (on an individual value of Y for a given x₀)

15
Confidence Interval on a Regression Line

If we want to estimate the mean value of the regression line, for a specified value $x_0$ of X, by a confidence interval at level $1 - \alpha$, then:

If $\sigma^2$ is unknown and $n - 2 < 30$:

$[LL, UL] = b_0 + b_1 x_0 \pm t_{\alpha/2, (n-2)\,d.f.}\; S(\hat{Y}/x_0)$, with $S(\hat{Y}/x_0) = S_e \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

If $\sigma^2$ is known or if $n - 2 \ge 30$, we use $z_{\alpha/2}$.

16
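Under the same assumptions (σ² unknown, n − 2 < 30), this interval can be wrapped in a small helper. A minimal sketch with illustrative names:

```python
from math import sqrt
from scipy import stats

def mean_response_ci(x0, b0, b1, s_e, n, x_bar, sxx, alpha=0.05):
    """CI for the mean of Y at X = x0 (sigma^2 unknown, n - 2 < 30).

    sxx is the sum of (x_i - x_bar)^2 over the sample.
    """
    y_hat0 = b0 + b1 * x0
    s_yhat = s_e * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)   # S(Y_hat / x0)
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
    return y_hat0 - t_crit * s_yhat, y_hat0 + t_crit * s_yhat
```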
Confidence Interval on a Regression Line
Example
Estimate, by a confidence interval at $\alpha = 0.05$, the average sales volume when $4 million is invested in advertising.

Here $\sigma^2$ is unknown and $n - 2 = 8 < 30$.

$S(\hat{Y}/x_0) = S_e \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \sqrt{1.1847}\,\sqrt{\dfrac{1}{10} + \dfrac{(4 - 3.3)^2}{19.1}} = 0.3858$

$[LL, UL] = b_0 + b_1 x_0 \pm t_{0.025, 8\,d.f.}\; S(\hat{Y}/x_0)$
$= 33.31 + 3.95(4) \pm 2.306(0.3858)$
$= 49.11 \pm 2.306(0.3858)$
$= [48.22, 50]$

17
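The same numbers can be checked inline (S_e = √1.1847; a self-contained sketch rather than the helper above):

```python
from math import sqrt
from scipy import stats

s_e = sqrt(1.1847)
s_yhat = s_e * sqrt(1 / 10 + (4 - 3.3) ** 2 / 19.1)   # ~0.3858
y_hat0 = 33.31 + 3.95 * 4                             # 49.11
t_crit = stats.t.ppf(0.975, 8)                        # ~2.306
print(y_hat0 - t_crit * s_yhat, y_hat0 + t_crit * s_yhat)   # ~(48.22, 50.00)
```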
Confidence Interval on an Individual
Prediction
In addition to the n observations in the sample, it is possible to make a new observation $x_0$ of X, independent of the first n. We then want to predict the value of Y corresponding to this new observation. The prediction interval at level $1 - \alpha$ is:

If $\sigma^2$ is unknown and $n - 2 < 30$:

$[LL, UL] = b_0 + b_1 x_0 \pm t_{\alpha/2, (n-2)\,d.f.}\; S(Y_0 - \hat{Y}_0)$, with $S(Y_0 - \hat{Y}_0) = S_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$

If $\sigma^2$ is known or if $n - 2 \ge 30$, we use $z_{\alpha/2}$.

18
Confidence Interval on an Individual
Prediction

Example: Provide a prediction interval at $\alpha = 0.05$ for the sales volume of a new observation, if $4 million in advertising is invested.

Here $\sigma^2$ is unknown and $n - 2 = 8 < 30$.

$S(Y_0 - \hat{Y}_0) = S_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \sqrt{1.1847}\,\sqrt{1 + \dfrac{1}{10} + \dfrac{(4 - 3.3)^2}{19.1}} = 1.155$

$[LL, UL] = b_0 + b_1 x_0 \pm t_{0.025, 8\,d.f.}\; S(Y_0 - \hat{Y}_0) = 49.11 \pm 2.306(1.155) = [46.45, 51.77]$
19
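A self-contained sketch of this prediction interval; it differs from the mean-response computation only by the extra "1 +" under the square root:

```python
from math import sqrt
from scipy import stats

s_e = sqrt(1.1847)
s_pred = s_e * sqrt(1 + 1 / 10 + (4 - 3.3) ** 2 / 19.1)   # ~1.155
y_hat0 = 33.31 + 3.95 * 4                                 # 49.11
t_crit = stats.t.ppf(0.975, 8)                            # ~2.306
print(y_hat0 - t_crit * s_pred, y_hat0 + t_crit * s_pred) # ~(46.45, 51.77)
```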
Correlation Analysis

20
Difference between regression and
correlation?

Simple linear regression is primarily concerned with the form of the linear relationship that exists between X and Y.

Correlation attempts to measure the intensity, or strength, of the relationship between X and Y.

21
Correlation Analysis

There are two possible measures to quantify the intensity of the relationship between X and Y:

• Coefficient of determination
• Coefficient of correlation

22
Coefficient of Determination
The coefficient of determination, denoted $R^2$ or $r^2$ and pronounced "R squared", measures the proportion of the total variation of y that is explained by the regression.

$0 \le r^2_{YX} \le 1$

$r^2_{YX}$ is an indicator of how well the fitted line matches the experimental points.

23
Coefficient of Determination
Example 2:

$r^2_{YX} = r^2 = \dfrac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \dfrac{SSR}{SST} = \dfrac{298.008}{307.525} \approx 0.9692$

96.92% of the variation of y is explained by the linear regression.

Since $r^2_{YX}$ is close to 1, we can say that the regression line fits the scatter plot very well.

| Advertising cost $x_i$ (M$) | Sales volume $y_i$ (M$) | $(x_i - \bar{x})^2$ | $\hat{y}_i$ | $y_i - \hat{y}_i$ | $(y_i - \hat{y}_i)^2$ | $(\hat{y}_i - \bar{y})^2$ | $(y_i - \bar{y})^2$ |
|---|---|---|---|---|---|---|---|
| 4 | 49.5 | 0.49 | 49.11 | 0.39 | 0.1521 | 7.6176 | 9.9225 |
| 2 | 41 | 1.69 | 41.21 | -0.21 | 0.0441 | 26.4196 | 28.6225 |
| 2.5 | 43 | 0.64 | 43.185 | -0.185 | 0.0342 | 10.0172 | 11.2225 |
| 2 | 39 | 1.69 | 41.21 | -2.21 | 4.8841 | 26.4196 | 54.0225 |
| 3 | 46 | 0.09 | 45.16 | 0.84 | 0.7056 | 1.4161 | 0.1225 |
| 5 | 53 | 2.89 | 53.06 | -0.06 | 0.0036 | 45.0241 | 44.2225 |
| 1 | 38 | 5.29 | 37.26 | 0.74 | 0.5476 | 82.6281 | 69.7225 |
| 5.5 | 54 | 4.84 | 55.035 | -1.035 | 1.0712 | 75.4292 | 58.5225 |
| 3.5 | 48.5 | 0.04 | 47.135 | 1.365 | 1.8632 | 0.6162 | 4.6225 |
| 4.5 | 51.5 | 1.44 | 51.085 | 0.415 | 0.1722 | 22.4202 | 26.5225 |
| Sum |  | 19.1 |  |  | 9.478 | 298.008 | 307.525 |

Mean: $\bar{x} = 3.3$, $\bar{y} = 46.35$

24
Coefficient of Correlation
Example 2:

$r_{XY} = (\text{sign of } b_1)\sqrt{r^2} = +\sqrt{0.9692} = 0.9845$

(where $b_1$ is the slope of the regression line)

Remarks:
• $-1 \le r_{XY} \le 1$
• $r_{XY} = \pm 1$ indicates a perfect (very strong) linear correlation
• $r_{XY} \approx 0$ indicates a weak linear correlation

25
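Both coefficients follow directly from the sums of squares in the data table. A minimal sketch (names are illustrative; the slight difference from the slide's 0.9692 comes from rounding in the tabulated sums):

```python
from math import sqrt

ssr, sst = 298.008, 307.525   # sums of squares from the data table
b1 = 3.95                     # slope of the fitted regression line

r_squared = ssr / sst                         # ~0.969 (coefficient of determination)
r = (1 if b1 > 0 else -1) * sqrt(r_squared)   # ~0.984 (coefficient of correlation)
print(r_squared, r)
```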
Coefficient of Correlation

[Scatter plots illustrating a positive linear relationship, a negative linear relationship, and no linear relationship]

26
Session 10

The End

27
