Sampling Unit 6

Chapter 6: Regression Estimates

Regression analysis is one of the statistical techniques that can be used for investigating and
modeling the relationship between variables. Regression is applied in almost every field of
human endeavor.
Assume a regression model describing a chance relationship between the auxiliary variable and
the variable of interest for the study. Then a simple regression model is given by
$$Y = \beta_0 + \beta_1 X + \varepsilon,$$
where $\beta_0$ is the intercept and $\beta_1$ is the slope of the model; they are unknown constants (called
regression coefficients). $X$ is the independent variable, $Y$ is the dependent variable, and $\varepsilon$ is a
random error component. This random error is assumed to have mean zero and unknown
variance $\sigma^2$, and the errors are uncorrelated.
The objective of sampling is the 'prediction' of some characteristic of the y-values of the
population, such as the population mean and total, or the y-value of a single unit not yet in the sample.

6.1 The Linear Regression Estimate

The linear regression estimate is designed to increase the efficiency of population estimation by
using information on the auxiliary variable $X_i$, which is correlated with $Y_i$. Suppose that $(y_i, x_i)$,
$i = 1, 2, \ldots, n$, denote pairs of observations on the study and auxiliary variables from a simple
random sample without replacement of $n$ units selected from the finite population of $N$ units.
Let $\bar{y}$ and $\bar{x}$ denote the corresponding sample means, and let $\bar{X}$ be the population mean of the
auxiliary variable. Then the linear regression estimate of the population mean $\bar{Y}$ is
$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}),$$
where $b$ is an estimate of the change in $Y$ per unit change in $X$. The regression estimate of the
population total $Y$ is given as $\hat{Y}_{lr} = N\bar{y}_{lr}$.
In the regression estimate, the value of $b$ can be obtained from various sources:
i) it can be obtained from the results of the sample survey;
ii) in repeated surveys $b$ remains fairly constant, so the value of $b$ may be chosen or assigned in advance;
iii) one may take $b = 0$ or $b = \bar{y}/\bar{x}$.

First we consider the case where $b = 0$ or $b = \bar{y}/\bar{x}$:

When $b = 0$ or $b = \bar{y}/\bar{x}$, the regression estimate $\bar{y}_{lr}$ reduces to $\bar{y}$ (the mean
per unit) or to $\hat{\bar{Y}}_R$ (the ratio estimate), respectively. That is,
$$\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x}) = \bar{y} \quad \text{if } b = 0,$$
and
$$\bar{y}_{lr} = \bar{y} + \frac{\bar{y}}{\bar{x}}(\bar{X} - \bar{x}) = \bar{y} + \frac{\bar{y}}{\bar{x}}\bar{X} - \bar{y} = \frac{\bar{y}}{\bar{x}}\bar{X} = \hat{\bar{Y}}_R \quad \text{if } b = \bar{y}/\bar{x}.$$
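A minimal Python sketch of these special cases, using made-up sample values and an assumed known population mean $\bar{X}$ (all numbers below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical sample data (illustrative values only)
y = np.array([12.0, 15.0, 11.0, 14.0, 13.0])   # study variable
x = np.array([30.0, 38.0, 28.0, 35.0, 33.0])   # auxiliary variable
X_bar = 34.0                                    # assumed known population mean of x

y_bar, x_bar = y.mean(), x.mean()

def y_lr(b):
    """Linear regression estimate of the population mean for a given b."""
    return y_bar + b * (X_bar - x_bar)

# Special cases discussed above:
print(y_lr(0.0) == y_bar)                       # b = 0 gives the mean per unit
print(np.isclose(y_lr(y_bar / x_bar),           # b = y_bar/x_bar gives the
                 (y_bar / x_bar) * X_bar))      # ratio estimate (y_bar/x_bar)*X_bar
```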

When the value of b is pre-assigned a constant value:

If $b$ is pre-assigned a value, say $b = b_0$, it gives the following result.

Theorem 6.1: In simple random sampling, in which the regression coefficient is constant and
known (say $b_0$), the linear regression estimate $\bar{y}_{lr} = \bar{y} + b_0(\bar{X} - \bar{x})$ is unbiased, with variance
$$V(\bar{y}_{lr}) = \frac{1-f}{n} \cdot \frac{\sum_{i=1}^{N}\left[(y_i - \bar{Y}) - b_0(x_i - \bar{X})\right]^2}{N-1} = \frac{1-f}{n}\left(S_y^2 - 2b_0 S_{yx} + b_0^2 S_x^2\right) \quad \text{(prove this theorem)}$$

Corollary: An unbiased sample estimate of $V(\bar{y}_{lr})$ is
$$v(\bar{y}_{lr}) = \frac{1-f}{n} \cdot \frac{\sum_{i=1}^{n}\left[(y_i - \bar{y}) - b_0(x_i - \bar{x})\right]^2}{n-1} = \frac{1-f}{n}\left(s_y^2 - 2b_0 s_{yx} + b_0^2 s_x^2\right).$$
The interest could be to find the best value of $b_0$ that minimizes $V(\bar{y}_{lr})$; it is given in the following theorem.
Theorem 6.2: The value of $b_0$ that minimizes $V(\bar{y}_{lr})$ is
$$b_0 = B = \frac{S_{yx}}{S_x^2} = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})(x_i - \bar{X})}{\sum_{i=1}^{N}(x_i - \bar{X})^2} \quad \text{(prove this theorem)}$$
The resulting minimum variance is
$$V(\bar{y}_{lr})_{\min} = \frac{1-f}{n} S_y^2 (1 - \rho^2),$$
which is obtained by substituting $b_0 = B = S_{yx}/S_x^2 = \rho S_y/S_x$ into $V(\bar{y}_{lr})$, where $\rho$ is the population correlation
coefficient between y and x. Note that B does not depend on the properties of any sample.

When the value of b is computed from the sample:

In most applications, B is unknown and must be estimated from the sample. It is known that
the BLUE of B is
$$b = \frac{s_{yx}}{s_x^2} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$

Theorem 6.3: If $b$ is the least squares estimate of B and $\bar{y}_{lr} = \bar{y} + b(\bar{X} - \bar{x})$, then in a simple
random sample of size n, with n large,
$$V(\bar{y}_{lr}) \approx \frac{1-f}{n} S_y^2 (1 - \rho^2),$$
where $\rho = S_{yx}/(S_y S_x)$ is the population correlation between y and x. If $\rho$ is unknown, estimate it by the sample
correlation coefficient $\hat{\rho} = s_{yx}/(s_y s_x)$.

Example: A medical student was given an assignment to estimate the average systolic blood
pressure (BP) of teachers between 30 and 60 years of age in a certain university. The objective
was to compare it with the systolic BP of a part of the population engaged in manual work. A
simple random sample of 24 teachers was drawn without replacement from a frame
consisting of 961 teachers. Age of the teachers was taken as the auxiliary variable. The following
data are available to estimate the average systolic BP:
Average age for the population of teachers = 42.7 years; regression coefficient from an earlier
study = 0.8952.
Measurements from the sample are: $\sum_{i=1}^{n} y_i = 3247$, $\sum_{i=1}^{n} y_i^2 = 442652$, $\sum_{i=1}^{n} x_i = 1059$,
$\sum_{i=1}^{n} x_i^2 = 48778$, where $y_i$ and $x_i$ represent the BP and age of the teachers, respectively. From these data
we obtain: $\bar{y} = 135.292$, $\bar{x} = 44.125$, $s_y^2 = 146.085$, $s_x^2 = 89.114$, $s_{xy} = 82.875$.

i) Average systolic BP: $\bar{y}_{lr} = \bar{y} + b_0(\bar{X} - \bar{x}) = 135.292 + 0.8952(42.7 - 44.125) = 134.016$

ii) Estimated variance:
$$v(\bar{y}_{lr}) = \frac{1-f}{n}\left(s_y^2 + b_0^2 s_x^2 - 2b_0 s_{yx}\right) = \frac{1 - \tfrac{24}{961}}{24}\left[146.085 + (0.8952)^2(89.114) - 2(0.8952)(82.875)\right]$$
$$v(\bar{y}_{lr}) = 2.808 \;\Rightarrow\; s.e.(\bar{y}_{lr}) = 1.6757$$
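The arithmetic of this example can be reproduced in a few lines of Python; only the sums and constants quoted above are used (the raw data are not available, and $s_{xy}$ is taken as given since the cross-product sum is not listed):

```python
N, n = 961, 24
sum_y, sum_y2 = 3247, 442652
sum_x, sum_x2 = 1059, 48778
X_bar, b0 = 42.7, 0.8952           # population mean age and slope from the earlier study
s_xy = 82.875                      # given in the text

y_bar, x_bar = sum_y / n, sum_x / n                         # 135.292, 44.125
s_y2 = (sum_y2 - n * y_bar**2) / (n - 1)                    # 146.085
s_x2 = (sum_x2 - n * x_bar**2) / (n - 1)                    # 89.114

y_lr = y_bar + b0 * (X_bar - x_bar)                         # about 134.016
f = n / N
v_lr = (1 - f) / n * (s_y2 + b0**2 * s_x2 - 2 * b0 * s_xy)  # about 2.808
se = v_lr ** 0.5                                            # about 1.676
print(round(y_lr, 3), round(v_lr, 3), round(se, 4))
```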

6.2 Large-sample Comparison of Regression with SRS and Ratio Estimates

To make comparisons, the sample size n must be large so that the approximate formulas for the
variances of the ratio and regression estimates apply. For the estimated population mean $\bar{Y}$, we have
the following three variances:
$$V(\bar{y}_{lr})_{\min} = \frac{1-f}{n} S_y^2 (1 - \rho^2) \text{ for regression}, \quad V(\hat{\bar{Y}}_R) = \frac{1-f}{n}\left(S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x\right) \text{ for ratio},$$
$$\text{and} \quad V(\bar{y}) = \frac{1-f}{n} S_y^2 \text{ for the mean per unit}.$$

Comparison with SRS:

If $V(\bar{y}_{lr}) \le V(\bar{y})$, then the regression estimate is more efficient, i.e.,
$$\frac{1-f}{n} S_y^2 (1 - \rho^2) \le \frac{1-f}{n} S_y^2,$$
which is true since $1 - \rho^2 \le 1$. It is obvious that the two variances are equal when $\rho = 0$.

Comparison with ratio estimate:

The regression estimate is more precise than the ratio estimate when $V(\bar{y}_{lr}) \le V(\hat{\bar{Y}}_R)$, i.e.,
$$\frac{1-f}{n} S_y^2 (1 - \rho^2) \le \frac{1-f}{n}\left(S_y^2 + R^2 S_x^2 - 2R\rho S_y S_x\right).$$
After rearranging this inequality, we get
$$(\rho S_y - R S_x)^2 \ge 0 \;\Leftrightarrow\; \left(\rho\frac{S_y}{S_x} - R\right)^2 \ge 0 \;\Leftrightarrow\; (B - R)^2 \ge 0,$$
which always holds. If B = R the two estimates are equally efficient, and this equality holds when the
relationship between y and x is a straight line through the origin. If $B \ne R$, the regression
estimate is more precise.
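The comparison can also be illustrated numerically. The population quantities below ($S_y$, $S_x$, $\rho$, $R$, $f$, $n$) are assumed purely for illustration; the sketch checks that the gap between the ratio and regression variances equals $\frac{1-f}{n}(\rho S_y - R S_x)^2$:

```python
# Assumed population quantities (purely illustrative)
S_y, S_x, rho, R = 12.0, 9.0, 0.8, 1.1
f, n = 0.05, 50

factor = (1 - f) / n
V_srs   = factor * S_y**2                                        # mean per unit
V_ratio = factor * (S_y**2 + R**2 * S_x**2 - 2 * R * rho * S_y * S_x)
V_reg   = factor * S_y**2 * (1 - rho**2)                         # minimum regression variance

# V_ratio - V_reg equals factor * (rho*S_y - R*S_x)^2, i.e. it is proportional to (B - R)^2
gap = factor * (rho * S_y - R * S_x)**2
print(abs((V_ratio - V_reg) - gap) < 1e-12)     # gap identity holds
print(V_reg <= V_srs and V_reg <= V_ratio)      # regression is never less precise
```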

Example: See Cochran 3rd edition page 196.

6.3 Regression Estimate in Stratified Sampling

Like the ratio estimate, two types of regression estimate can be made in stratified random
sampling: separate and combined regression estimates.

Separate regression estimate: For each stratum, a regression estimate can be given as
$$\bar{y}_{lrh} = \bar{y}_h + b_h(\bar{X}_h - \bar{x}_h),$$
where $b_h$ is a pre-assigned value for the hth stratum regression coefficient. Its variance is
$$V(\bar{y}_{lrh}) = \frac{1-f_h}{n_h}\left(S_{yh}^2 + b_h^2 S_{xh}^2 - 2b_h S_{yxh}\right).$$
Then a separate regression estimate is $\bar{y}_{lrs} = \sum_{h=1}^{L} W_h \bar{y}_{lrh}$, which is an unbiased estimate of $\bar{Y}$. Assuming independent sampling
in each stratum,
$$V(\bar{y}_{lrs}) = \sum_h W_h^2 V(\bar{y}_{lrh}) = \sum_h W_h^2 \frac{1-f_h}{n_h}\left(S_{yh}^2 + b_h^2 S_{xh}^2 - 2b_h S_{yxh}\right).$$
$V(\bar{y}_{lrs})$ is minimized when $b_h = B_h$, the true regression coefficient in stratum h, and the
minimum variance is
$$V(\bar{y}_{lrs})_{\min} = \sum_h W_h^2 \frac{1-f_h}{n_h} S_{yh}^2 (1 - \rho_h^2).$$

Sample estimate: For the separate regression estimate, the sample estimate of $B_h$ is
$$\hat{b}_h = \frac{s_{yxh}}{s_{xh}^2} = \frac{\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)}{\sum_{i=1}^{n_h}(x_{hi} - \bar{x}_h)^2},$$
which is the least squares estimate of $B_h$.

The sample estimate of $V(\bar{y}_{lrs})_{\min} = \sum_h W_h^2 \frac{1-f_h}{n_h} S_{yh}^2 (1 - \rho_h^2)$ is
$$v(\bar{y}_{lrs}) = \sum_h W_h^2 \frac{1-f_h}{n_h} s_h^2, \quad \text{where} \quad s_h^2 = \frac{\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)^2 - \hat{b}_h^2 \sum_{i=1}^{n_h}(x_{hi} - \bar{x}_h)^2}{n_h - 2},$$
and $s_h^2$ is an unbiased estimate of $S_{yh}^2(1 - \rho_h^2)$ if the sample size is large in all strata and the regression is linear.
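A sketch of the separate regression estimate and its estimated variance for a hypothetical two-stratum population (the stratum sizes, sample data, and known stratum means of x below are all assumed for illustration):

```python
import numpy as np

# Hypothetical two-stratum population: (N_h, y-sample, x-sample)
strata = [
    (400, np.array([110.0, 125.0, 118.0, 131.0, 142.0]),
          np.array([32.0, 41.0, 37.0, 46.0, 55.0])),
    (600, np.array([128.0, 137.0, 149.0, 122.0, 140.0, 133.0]),
          np.array([38.0, 45.0, 58.0, 35.0, 49.0, 43.0])),
]
X_bar_h = [42.0, 44.0]            # assumed known stratum means of x
N = sum(Nh for Nh, _, _ in strata)

y_lrs, v_lrs = 0.0, 0.0
for (Nh, yh, xh), Xh in zip(strata, X_bar_h):
    nh, Wh, fh = len(yh), Nh / N, len(yh) / Nh
    bh = np.cov(yh, xh, ddof=1)[0, 1] / xh.var(ddof=1)       # least squares estimate of B_h
    y_lrs += Wh * (yh.mean() + bh * (Xh - xh.mean()))         # stratum regression estimate
    # s_h^2 = [sum of y deviations^2 - bh^2 * sum of x deviations^2] / (n_h - 2)
    sh2 = (((yh - yh.mean())**2).sum()
           - bh**2 * ((xh - xh.mean())**2).sum()) / (nh - 2)
    v_lrs += Wh**2 * (1 - fh) / nh * sh2

print(y_lrs, v_lrs)
```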

Combined regression estimate:

Assume $B_h$ is the same in all strata. Then the combined regression estimate is
$$\bar{y}_{lrc} = \bar{y}_{st} + b(\bar{X} - \bar{x}_{st}), \quad \text{where} \quad \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h, \quad \bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h,$$
and b is a pre-assigned value. $\bar{y}_{lrc}$ is an unbiased estimate of $\bar{Y}$, i.e., $E(\bar{y}_{lrc}) = \bar{Y}$. The variance of this estimate can be computed as
$$V(\bar{y}_{lrc}) = \sum_h W_h^2 \frac{1-f_h}{n_h}\left(S_{yh}^2 + b^2 S_{xh}^2 - 2b S_{yxh}\right).$$
The value of b which minimizes this variance is
$$b_c = \frac{Cov(\bar{y}_{st}, \bar{x}_{st})}{V(\bar{x}_{st})} = \frac{\sum_h W_h^2 \frac{1-f_h}{n_h} S_{yxh}}{\sum_h W_h^2 \frac{1-f_h}{n_h} S_{xh}^2} = \frac{\sum_h a_h B_h}{\sum_h a_h},$$
which is a weighted mean of the stratum regression coefficients $B_h$, with $a_h = W_h^2 \frac{1-f_h}{n_h} S_{xh}^2$. Then the minimum variance is
$$V(\bar{y}_{lrc})_{\min} = \sum_h W_h^2 \frac{1-f_h}{n_h}\left(S_{yh}^2 + b_c^2 S_{xh}^2 - 2b_c S_{yxh}\right).$$

An estimate of $V(\bar{y}_{lrc})$ is given as
$$v(\bar{y}_{lrc}) = \sum_h W_h^2 \frac{1-f_h}{n_h}\left(s_{yh}^2 + \hat{b}_c^2 s_{xh}^2 - 2\hat{b}_c s_{yxh}\right), \quad \text{where}$$
$$\hat{b}_c = \frac{\sum_h W_h^2 \frac{1-f_h}{n_h} s_{yxh}}{\sum_h W_h^2 \frac{1-f_h}{n_h} s_{xh}^2} = \frac{\sum_h W_h^2 \frac{1-f_h}{n_h} \frac{\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)}{n_h - 1}}{\sum_h W_h^2 \frac{1-f_h}{n_h} \frac{\sum_{i=1}^{n_h}(x_{hi} - \bar{x}_h)^2}{n_h - 1}}.$$
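A companion sketch for the combined estimate, reusing the same hypothetical two-stratum data and an assumed overall population mean $\bar{X}$; it computes $\hat{b}_c$, $\bar{y}_{lrc}$, and $v(\bar{y}_{lrc})$:

```python
import numpy as np

# Hypothetical two-stratum data (illustrative values only)
strata = [
    (400, np.array([110.0, 125.0, 118.0, 131.0, 142.0]),
          np.array([32.0, 41.0, 37.0, 46.0, 55.0])),
    (600, np.array([128.0, 137.0, 149.0, 122.0, 140.0, 133.0]),
          np.array([38.0, 45.0, 58.0, 35.0, 49.0, 43.0])),
]
X_bar = 43.2                      # assumed known overall population mean of x
N = sum(Nh for Nh, _, _ in strata)

num = den = y_st = x_st = 0.0
for Nh, yh, xh in strata:
    nh, Wh, fh = len(yh), Nh / N, len(yh) / Nh
    g = Wh**2 * (1 - fh) / nh
    num += g * np.cov(yh, xh, ddof=1)[0, 1]    # g * s_yxh
    den += g * xh.var(ddof=1)                  # g * s_xh^2
    y_st += Wh * yh.mean()
    x_st += Wh * xh.mean()

b_c = num / den                                # pooled (combined) slope estimate
y_lrc = y_st + b_c * (X_bar - x_st)            # combined regression estimate

v_lrc = 0.0
for Nh, yh, xh in strata:
    nh, Wh, fh = len(yh), Nh / N, len(yh) / Nh
    v_lrc += Wh**2 * (1 - fh) / nh * (yh.var(ddof=1)
             + b_c**2 * xh.var(ddof=1)
             - 2 * b_c * np.cov(yh, xh, ddof=1)[0, 1])

print(b_c, y_lrc, v_lrc)
```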

Total estimate:
$$\hat{Y}_{lrs} = N\bar{y}_{lrs} \quad \text{and} \quad V(\hat{Y}_{lrs}) = N^2 V(\bar{y}_{lrs}) \quad \text{for the separate regression estimate};$$
$$\hat{Y}_{lrc} = N\bar{y}_{lrc} \quad \text{and} \quad V(\hat{Y}_{lrc}) = N^2 V(\bar{y}_{lrc}) \quad \text{for the combined regression estimate}.$$
