Advanced Sampling Theory with Applications
How Michael 'selected' Amy
Volume I
by
Sarjinder Singh
St. Cloud State University,
Department of Statistics,
St. Cloud, MN, U.S.A.
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
PREFACE
1.0 Introduction
1.1 Population
1.1.1 Finite population
1.1.2 Infinite population
1.1.3 Target population
1.1.4 Study population
1.2 Sample
1.3 Examples of populations and samples
1.4 Census
1.5 Relative aspects of sampling versus census
1.6 Study variable
1.7 Auxiliary variable
1.8 Difference between study variable and auxiliary variable
1.9 Parameter
1.10 Statistic
1.11 Statistics
1.12 Sample selection
1.12.1 Chit method or Lottery method
1.12.1.1 With replacement sampling
1.12.1.2 Without replacement sampling
1.12.2 Random number table method
1.12.2.1 Remainder method
1.13 Probability sampling
1.14 Probability of selecting a sample
1.15 Population mean/total
1.16 Population moments
1.17 Population standard deviation
1.18 Population coefficient of variation
1.19 Relative mean square error
1.20 Sample mean
1.21 Sample variance
1.22 Estimator
1.23 Estimate
1.24 Sample space
1.25 Univariate random variable
1.25.1 Qualitative random variables
2.0 Introduction
2.1 Simple random sampling with replacement
2.2 Simple random sampling without replacement
2.3 Estimation of population proportion
2.4 Searls' estimator of population mean
2.5 Use of distinct units in the WR sample at the estimation stage
2.5.1 Estimation of mean
2.5.2 Estimation of finite population variance
2.6 Estimation of total or mean of a subgroup (domain) of a population
2.7 Dealing with a rare attribute using inverse sampling
2.8 Controlled sampling
2.9 Determinant sampling
Exercises
Practical problems
5 USE OF AUXILIARY INFORMATION: PROBABILITY PROPORTIONAL TO SIZE AND WITHOUT REPLACEMENT (PPSWOR) SAMPLING
5.11 Ordered and unordered estimators
5.11.1 Ordered estimators
5.11.2 Unordered estimators
5.12 Rao--Hartley--Cochran (RHC) sampling strategy
5.13 Unbiased strategies using IPPS sampling schemes
5.13.1 Estimation of population mean using a ratio estimator
5.13.2 Estimation of finite population variance
5.14 Godambe's strategy: Estimation of parameters in survey sampling
5.14.1 Optimal estimating function
5.14.2 Regression type estimators
5.14.3 Singh's strategy in two-dimensional space
5.14.4 Godambe's strategy for linear Bayes and optimal estimation
5.15 Unified theory of survey sampling
5.15.1 Class of admissible estimators
5.15.2 Estimator
5.15.3 Admissible estimator
5.15.4 Strictly admissible estimator
5.15.5 Linear estimators of population total
5.15.6 Admissible estimators of variances of estimators of total
5.15.6.1 Condition for the unbiased estimator of variance
5.15.6.2 Admissible and unbiased estimator of variance
5.15.6.3 Fixed size sampling design
5.15.6.4 Horvitz and Thompson estimator and its variance in two forms
5.15.7 Polynomial type estimators
5.15.8 Alternative optimality criterion
5.15.9 Sufficient statistic in survey sampling
5.16 Estimators based on conditional inclusion probabilities
5.17 Current topics in survey sampling
5.17.1 Survey design
5.17.2 Data collection and processing
5.17.3 Estimation and analysis of data
5.18 Miscellaneous discussions/topics
5.18.1 Generalized IPPS designs
5.18.2 Tam's optimal strategies
5.18.3 Use of ranks in sample selection
5.18.4 Prediction approach
5.18.5 Total of bottom (or top) percentiles of a finite population
5.18.6 General form of estimator of variance
5.18.7 Poisson sampling
5.18.8 Cosmetic calibration
5.18.9 Mixing of non-parametric models in survey sampling
5.19 Golden Jubilee Year 2003 of the linear regression estimator
Exercises
Practical Problems
13 MISCELLANEOUS TOPICS
APPENDIX
TABLES
POPULATIONS
BIBLIOGRAPHY
AUTHOR INDEX
HANDY SUBJECT INDEX
ADDITIONAL INFORMATION
PREFACE
I have provided a summary of my book from which a statistician can reach a fruitful
decision by making a comparison in his/her mind with the existing books in the
international market.
Title(s) 4
Dedication 2
Table of contents 14
Preface 8 9 1
1 70 13 11 20 2 58
2 66 20 22 19 58 24
3 158 36 68 38 307 61
4 54 9 15 10 84 26
5 180 13 43 15 651 43
6 86 10 29 10 170 21
7 34 8 17 9 72 23
8 116 21 24 19 112 70
9 64 12 11 14 61 57
10 60 3 31 4 162 13
11 86 3 33 5 216 7
12 90 8 24 9 154 28
13 40 6 7 5 100 15
Appendix 26 12
Bibliography 62
Author Index 22
Subject Index 4
Related Books 2
24
This book also covers, in a very simple and compact way, many new topics not yet
available in any book on the international market. A few of these interesting topics
are: median estimation under single phase and two-phase sampling, difference
between low level and higher level calibration approach, calibration weights and
design weights, estimation of parametric functions, hidden gangs in finite
populations, compromised imputation, variance estimation using distinct units,
general class of estimators of population mean and variance, wider class of
estimators of population mean and variance, power transformation estimators,
estimators based on the mean of non-sampled units of the auxiliary character, ratio
and regression type estimators for estimating finite population variance similar to
those proposed by Isaki (1982), unbiased estimators of mean and variance under
Midzuno's scheme of sampling, usual and modified jackknife variance estimator,
This book has 459 tables, figures, maps, and graphs to explain the exercises and
theory in a simple way. The collection of 1179 references (assembled over more
than ten years from journals available in India, Australia, Canada, and the USA) is a
vital resource for researchers. The most interesting part is the method of notation
along with complete proofs of the basic theorems. From my experience and
discussion with several research workers in survey sampling, I found that most
people dislike the form or method of notation used by different writers in the past.
In the book I have tried to keep these notations simple, neat, and understandable. I
used data relating to the United States of America and other countries of the world,
so that international students should find it interesting and easy to understand. I am
confident that the book will find a good place and reputation in the international
market, as there is currently no book which is so thorough and simple in its
presentation of the subject of survey sampling.
The objective, style, and pattern of this book are quite different from other books
available in the market. This book will be helpful to:
In this book I have begun each chapter with basic concepts and complete
derivations of the theorems or results. I ended each chapter by filling the gap
between the origin of each topic and the recent references. In each chapter I
provided exercises which summarize the research papers. Thus this book not only
gives the basic techniques of sampling theory but also reviews most of the research
papers available in the literature related to sampling theory. It will also serve as an
umbrella of references under different topics in sampling theory, in addition to
clarifying the basic mathematical derivations. In short, it is an advanced book, but
provides an exposure to elementary ideas too. It is a much better restatement of the
existing knowledge available in journals and books. I have used data, graphs,
tables, and pictures to make sampling techniques clear to the learners.
EXERCISES
At the end of each chapter I have provided exercises and their solutions are given
through references to the related research papers. Exercises can be used to clarify or
relate the classroom work to the other possibilities in the literature.
At the end of each chapter I have provided practical problems which enable
students and teachers to do additional exercises with real data.
I have taken real data related to the United States of America and many other
countries around the world. This data is freely available in libraries for public use
and it has been provided in the Appendix of this book for the convenience of the
readers. This will be interesting to the international students.
SOLUTION MANUAL
I am working on a complete solution manual to the practical problems and selected
theoretical exercises given at the end of the chapters.
I was born in the village of Ajnoud, in the district of Ludhiana, in the state of
Punjab, India in 1963. My primary education is from the Govt. Primary School,
Ajnoud; the Govt. Middle School, Bilga; and Govt. High School, Sahnewal, which
are near my birthplace. I did my undergraduate work at Govt. College Karamsar,
Rarra Sahib. I still remember that I used to bicycle my way to college, about 15 km,
daily on the bank of canals. It was fun and that life has never come back. My M.Sc. and
Ph.D. degrees in statistics were completed at the Punjab Agricultural University
(PAU), Ludhiana, and most of my time was spent in room no. 46 of hostel no. 5.
At present I am an Assistant Professor at St. Cloud State University, St. Cloud, MN,
USA, and recently introduced the idea of obtaining the exact traditional linear
regression estimator using the calibration approach. From 2001 to 2002 I did post
doctoral work at Carleton University, Canada. From 2000 to 2001 I was a Visiting
Instructor at the University of Saskatchewan, Canada. From 1999 to 2000 I was a
Visiting Instructor at the University of Southern Maine, USA, where I taught
several courses to undergraduate and graduate students, and introduced the idea of
compromised imputation in survey sampling. From 1998 to 1999 I was Visiting
Scientist at the University of Windsor, Canada. From 1996 to 1998 I was Research
Officer-II in the Methodology Division of the Australian Bureau of Statistics where
I developed the higher order calibration approach for estimating the variance of the
GREG, and introduced the concept of hidden gangs in finite populations. From
1995 to 1996 I was Research Assistant at Monash University, Australia. From 1991
to 1995 I was Research Fellow, Assistant Statistician and then Assistant Professor
at PAU, Ludhiana, India and was also awarded a Ph.D. in statistics in 1991. I have
published over 80 research papers in reputed journals of statistics and energy
science. I am also co-author of a monograph entitled, Energy in Punjab Agriculture,
published by the Indian Council of Agricultural Research, New Delhi.
ACKNOWLEDGEMENTS
Indeed the words at my command are not adequate to convey the feelings of
gratitude toward the late Prof. Ravindra Singh for his constant, untiring and ever
encouraging support since 1996 when I started writing this book. Prof. Ravindra
Singh passed away Feb. 4, 2003, which is a great loss to his erstwhile students and
colleagues, including me. He was my major advisor in my Ph.D. and was closely
associated in my research work. Since 1996 Mr. Stephen Hom, supervisor at the
Australian Bureau of Statistics, always encouraged me to complete this book and
I appreciate his sincere co-operation, contribution and kindness in joint research
papers as well as guidance to complete this book. The help of Prof. M.L. King,
Monash University is also appreciated. I started writing this book while staying
with Dr. Jaswinder Singh, his wife Dr. Rajvinder Kaur, and their daughter Miss
Jasraj Kaur in Australia during 1996. For almost seven years I worked day and night on
this book, and during May-July, 2003, I rented a room near an Indian restaurant in
Malton, Canada to save cooking time and spent most of the time on this book.
Thanks are due to Prof. Ragunath Arnab, University of Durban--Westville, for help
in completing the work in Chapter 10 related to his contribution in successive
sampling, and completing some joint research papers. The help of Prof. H.P. Singh,
Vikram University in joint publications is also duly acknowledged.
The contribution of the late Prof. D.S. Tracy, University of Windsor, in reading a few
chapters of the very early draft of the manuscript is also duly acknowledged.
The contribution of Ms. Margot Siekman, University of Southern Maine, in reading
a few chapters is also duly acknowledged. Thanks are also due to a
professional editor Kathlean Prenderqast, University of Saskatchewan, for critically
checking the grammar and punctuation of a few chapters. Prof. M. Bickis,
University of Saskatchewan, really helped me in my career when I was on the road
and looking for a job by going from university to university in Canada. The help of
Prof. Silvia Valdes and Ms. Laurie McDermott, University of Southern Maine, has been
much appreciated. Thanks are also due to Professor Patrick Farrell, Carleton
University, for giving me a chance to work with him as a post doctoral fellow.
Thanks are also due to Prof. David Robinson at SCSU for providing a very peaceful
work environment in the department. The aid of one Stat 321 student, Miss Kok
Yuin Ong, in cross checking all the solved numerical examples, and a professional
English editor Mr. Eric Westphal in reading the entire manuscript at SCSU is much
appreciated. Thanks are also due to a professional editor Dr. M. Cole from England
for editing the complete manuscript and bringing it into its present form. The help of
Mary Shrode and Mitra Sangrovla, Learning Resources and Technology Service, SCSU,
in drawing a few illustrations using the NOVA art explosion 600,000 images
collection is duly acknowledged.
The permission of Dimitri Chappas, NOAA/National Climatic Data Center to print
a few maps is also duly acknowledged. Free access to data given in the Appendix by
Agricultural Statistics and Statistical Abstracts of the United States is also duly
acknowledged. I would also like to extend my thanks to the Editor James Finlay,
Associate Editor Inge Hardon, and reviewers for bringing the original version of the
manuscript into the present form and into the public domain.
Note that I used EXCEL to solve the numerical examples, and while using a hand
calculator there may be some discrepancies in the results after one or two decimal
places. Further note that the names used in the examples such as Amy, Bob, Mr.
Bean, etc., are generic, and are not intended to resemble any real people. I would
also like to submit that all opinions and methods of presentation of results in this
book are solely the author's and are not necessarily representative of any institute or
organization. I tried to collect all recent and old papers, but if you have any
published related paper and would like that to be highlighted in the next volume of
my book, please feel free to mail a copy to me, and it will be my pleasure to give a
suitable place to your paper. To my knowledge this will be the first book in survey
sampling open to everyone to share contributions irrespective of your designation,
status, group of scientists, journal names, or any other discriminating character
existing in this world, you feel. Your opinions are most welcome and any suggestion
for improvement will be much appreciated via e-mail.
Sarjinder Singh (B.Sc., M.Sc., Ph.D., Gold Medalist, and Post Doctorate)
Assistant Professor, Department of Statistics, St. Cloud State University,
St. Cloud, MN, 56301-4498, USA E-mail: sarjinder@yahoo.com
1. BASIC CONCEPTS AND MATHEMATICAL NOTATION
1.0 INTRODUCTION
In this chapter we introduce some basic concepts and mathematical notation, which
should be known to every survey statistician. The meaning and the use of these
terms are supported by using them in the subsequent chapters.
1.1 POPULATION
If the number of objects or units in the population is countable, it is said to be a
finite population. For example, the number of houses in a suburb is a finite
population.
A finite or infinite population about which we require information is called the target
population. For example, all 18 year old girls in the United States.
This is the basic finite set of individuals we intend to study. For example, all 18 year
old girls whose permanent address is in New York.
The following table provides some of the major differences between a sample and a
census.
Aspect                     Sample                                      Census
Cost                       Less                                        More
Effort                     Less                                        More
Time consumed              Less                                        More
Errors                     May be predicted with certain confidence    No such errors
Accuracy of measurements   More                                        Less
The variable of interest or the variable about which we want to draw some inference
is called a study variable. Its value for the $i$-th unit is generally denoted by $Y_i$. For
example, the life of the bulbs produced by a certain plant can be taken as a study
variable.
1.7 AUXILIARY VARIABLE
A variable having a direct or indirect relationship to the study variable is called an
auxiliary variable. The value of an auxiliary variable for the $i$-th unit is generally
denoted by $X_i$ or $Z_i$, etc. For example, the time or money spent on producing each
bulb by the plant to maintain the quality can be taken as an auxiliary variable.
The main differences between the study variable and auxiliary variable are as
follows:
Factors                       Study variable                    Auxiliary variable
Cost                          More                              Less
Effort                        More                              Less
Sources of availability       Current surveys or experiments    Current or past surveys, books or journals, etc.
Interest of an investigator   More                              Less
Error in measurement          More                              Less
Sources of error              More                              Fewer
Notation                      Y                                 X, Z
1.9 PARAMETER
An unknown quantity, which may vary over different sets of values forming a
population, is called a parameter. Any function of the population values of a variable is
called a parameter. It is generally denoted by $\theta$.
Mathematically, suppose a population $\Omega$ consists of $N$ units and the value of its $i$-th
unit is $Y_i$. Then any function of the $Y_i$ values is a parameter, i.e.,
$$\text{Parameter} = f(Y_1, Y_2, \ldots, Y_N). \qquad (1.9.1)$$
For example, if $Y_i$ denotes the total lifetime of the $i$-th bulb, then the average
lifetime of the bulbs produced by the company is a parameter and is given by
$$\text{Parameter} = \frac{1}{N}(Y_1 + Y_2 + \cdots + Y_N). \qquad (1.9.2)$$
1.10 STATISTIC
1.11 STATISTICS
A sample can be selected from a population in many ways. In this chapter, we will
discuss only two simple methods of sample selection. As the readers get familiar
with sample selection, more complicated schemes will be discussed in the following
chapters.
Suppose we have N = 10,000 blocks in New York City. We wish to draw a sample
of n = 100 blocks to draw an inference about a character under study, e.g., average
amount of alcohol used or number of bulbs used in each block produced by a
certain company. Assign numbers to the 10,000 blocks and write these numbers on
chits and fold them in such a way that all chits look identical. Put all the chits in a
box. Then there are two possibilities:
Select one chit out of 10,000 chits in the box and note the number of the block
written on it. This is the first unit selected in the sample. Before selecting the
second chit, we replace the first chit in the box and mix with the other chits
thoroughly. Then select the second chit and note the number of the block written on it.
This is called the second unit selected in the sample. Go on repeating the process
until 100 chits have been selected. Note that because each chit is selected after
replacing the previous chit in the box, some chits may be selected more than once. Such a
sampling procedure is called Simple Random Sampling With Replacement or
simply SRSWR sampling. Let us explain with a few blocks in a
population as follows:
In general, the total number of samples of size $n$ drawn from a population of size
$N$ in with replacement sampling is $N^n$ and is denoted by $s(n)$.
Thus
$$s(n) = N^n. \qquad (1.12.1)$$
Now imagine the situation: 'How many WR samples, each of n = 100 blocks, are
possible out of N = 10,000 blocks?'
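The counting rule (1.12.1) can be checked with a short Python sketch. The three-block population A, B, C and the function name `wr_samples` are assumptions made only for illustration; the sketch lists every ordered with replacement sample of size 2 and then evaluates $N^n$ for the 10,000 block question.

```python
from itertools import product


def wr_samples(units, n):
    """Return every ordered with-replacement sample of size n."""
    return list(product(units, repeat=n))


blocks = ["A", "B", "C"]            # hypothetical population, N = 3
samples = wr_samples(blocks, 2)     # all WR samples of size n = 2
n_samples = len(samples)            # equals N**n = 3**2

# For N = 10,000 and n = 100 the samples cannot be listed, but the
# count s(n) = N**n is an exact Python integer.
count_wr = 10_000 ** 100
num_digits = len(str(count_wr))     # number of decimal digits in s(n)
```

Since $10{,}000^{100} = 10^{400}$, the answer to the question above is a 401 digit number, which is why listing all samples is never attempted in practice.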
In the case of without replacement sampling, we do not replace the chit while selecting
the next chit; i.e., the number of chits in the box goes on decreasing as we go on
selecting chits. Hence, there is no chance for a chit to be selected more than once.
Such a sampling procedure is called Simple Random Sampling Without
Replacement or simply SRSWOR sampling. Let us explain it as follows: Suppose a
population consists of N = 3 blocks A, B and C. We wish to draw all possible
unordered samples of size n = 2. Evidently, the possible samples are: AB,
AC, BC. Thus a total of 3 samples of size 2 can be drawn from the population of
size 3, which in fact is given by ${}^{3}C_{2} = 3$. In general, the total number of samples of
size $n$ drawn without replacement from a population of size $N$ is given by ${}^{N}C_{n}$ or $\binom{N}{n}$.
Thus
$$s(n) = {}^{N}C_{n} = \frac{N!}{n!\,(N-n)!}, \qquad (1.12.2)$$
where $n! = n(n-1)(n-2)\cdots 2 \cdot 1$, and $0! = 1$.
Now think again: 'How many WOR samples, each of n = 100 blocks, are possible
out of N = 10,000 blocks?'
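A similar sketch, again only illustrative, reproduces the three unordered samples AB, AC, BC and evaluates (1.12.2) for the larger problem. The helper name `n_wor_samples` is hypothetical; `math.comb` is the standard library routine for binomial coefficients (Python 3.8 or later).

```python
from itertools import combinations
from math import comb, factorial

blocks = ["A", "B", "C"]                                   # N = 3
wor_samples = ["".join(s) for s in combinations(blocks, 2)]


def n_wor_samples(N, n):
    """Number of unordered WOR samples, N! / (n! (N - n)!), as in (1.12.2)."""
    return factorial(N) // (factorial(n) * factorial(N - n))


small_count = n_wor_samples(3, 2)          # three samples: AB, AC, BC
count_wor = comb(10_000, 100)              # same formula via the library
count_wr = 10_000 ** 100                   # WR count, for comparison
```

The without replacement count is much smaller than the with replacement count $N^n$, but it is still far too large for the samples to be listed one by one.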
Note that it is a very cumbersome job to make identical chits if the size of the
population is very large. In such situations, another method of sample selection is
based on the use of a random number table. A random number table is a set of
numbers used for drawing random samples. The numbers are usually compiled by a
process involving a chance element, and in their simplest form, consist of a series of
digits 0 to 9 occurring at random with equal probability.
As mentioned above, in this table the numbers from 0 to 9 are written both in
columns and rows. For the purpose of illustration, we used Pseudo-Random
Numbers (PRN), generated by using the UNIF subroutine following Bratley, Fox,
and Schrage (1983), as given in Table 1 of the Appendix. We generally apply the
following rules to select a sample:
Rule 1. First we write all random numbers into groups of columns as already done
in Table 1 of the Appendix. We take as many columns in each group as the number
of digits in the population size.
Rule 2. List all the individuals or units in the population and assign them numbers
1,2,3,...,N.
Rule 3. Randomly select any starting point in the table of random numbers. Write
all the numbers less than or equal to N that follow the starting point until we obtain
n numbers. If we are using SRSWOR sampling, discard any number that is repeated
in the random number table. If we are using SRSWR sampling, retain the repeated
numbers.
Rule 4. Select those units that are assigned the numbers listed in Rule 3. This will
constitute a required random sample.
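Rules 1 to 4 can be sketched in Python as follows. This is a minimal illustration rather than the procedure used to build the Appendix table: the function name `select_sample` and the digit string are hypothetical, and the string simply stands in for digits read from a random number table starting at some point.

```python
def select_sample(digits, N, n, replacement=True):
    """Select n serial numbers from 1..N by reading a string of random digits.

    digits      : random digits read from the table after the starting point
    N           : population size (units are numbered 1, 2, ..., N)
    n           : required sample size
    replacement : True for SRSWR (keep repeats), False for SRSWOR
    """
    width = len(str(N))                    # Rule 1: columns per group
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if number < 1 or number > N:       # Rule 3: keep only 1..N
            continue
        if not replacement and number in selected:
            continue                       # SRSWOR: discard repeated numbers
        selected.append(number)
        if len(selected) == n:
            return selected                # Rule 4: these units form the sample
    raise ValueError("not enough random digits to complete the sample")


# Hypothetical digit stream read in groups of three: 039 048 039 992 078 163
stream = "039048039992078163"
srswr = select_sample(stream, N=225, n=4, replacement=True)
srswor = select_sample(stream, N=225, n=4, replacement=False)
```

With this stream the SRSWR sample keeps the repeated 039, while the SRSWOR sample skips it and reads on until four distinct numbers are found. In both cases 992 is rejected because it exceeds N = 225.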
In the case of SRSWOR sampling, the figures 039, 048 would not get repeated; i.e.,
we would take every unit only once, so we will continue to select two more distinct
random numbers as 078 and 163.
Although the above method of selecting a sample by using a random number table
is very efficient, it may reject a lot of the random numbers; therefore we
would like to discuss a shortcut method called the remainder method.
1.12.2.1 REMAINDER METHOD
Using the above example, if any selected three-digit random number is greater than
225, then divide it by 225. We choose the serial number from 1 through 224
corresponding to the remainder when it is not zero, and the serial number 225 when
the remainder is zero. However, it is necessary to reject the numbers from 901 to
999 (besides 000) in adopting this procedure, as otherwise units with serial numbers
1 to 99 will have a larger probability (5/999) of selection, while those with serial
numbers 100 to 225 will have probability only equal to 4/999. If we use this
procedure and also the same three-figure random numbers as given in columns 1 to
3, 4 to 6, etc., we obtain the sample of units which are assigned the numbers given
below. Again, in SRSWR sampling the numbers that give rise to the same remainder
are not discarded, while in the SRSWOR sampling procedure such numbers are
discarded. Thus an SRSWR sample is as given below:
Units selected in the sample
138 151 099 025 014 022 197 176 I I 209 042 194
015 049 095 040 027 124 116 097 126 142 073 158
108 053 046 001 207 156 201 027 II I 209 065 184
Note that in the SRSWR sample, only one unit 209 is repeated, thus for SRSWOR
sampling, we continue to apply the remainder approach until another distinct unit is
selected, which is 089 in this case. Further note that the first random number 992
was discarded due to the requirement of this rule.
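The remainder rule for N = 225 can be sketched as a small Python function. The name `remainder_unit` and its arguments are assumptions for illustration; the function returns `None` for a rejected random number.

```python
def remainder_unit(number, N=225, max_number=999):
    """Map a three-digit random number to a serial number 1..N.

    Returns None when the number must be rejected, i.e. when it is 000 or
    lies above the largest multiple of N not exceeding max_number (900 here).
    """
    largest_multiple = (max_number // N) * N      # 900 when N = 225
    if number < 1 or number > largest_multiple:   # reject 000 and 901..999
        return None
    remainder = number % N
    return remainder if remainder != 0 else N     # remainder 0 means unit N


# Every serial number is reached by exactly four of the numbers 001..900,
# so all units keep the same chance of selection.
hits = [remainder_unit(k) for k in range(1000)]
```

For example, `remainder_unit(992)` is rejected, in line with the note above, while 226 maps to unit 1 and 450 maps to unit 225.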
Every sample selected from the population has some known probability of being
selected at any occasion. It is generally denoted by the symbol $p_t$ or $p(t)$. For
example, the probability of selecting a sample using
with replacement sampling is
$$p_t = 1/N^n, \quad t = 1, 2, \ldots, N^n, \qquad (1.14.1)$$
and using without replacement sampling is
$$p_t = 1/{}^{N}C_{n}, \quad t = 1, 2, \ldots, {}^{N}C_{n}. \qquad (1.14.2)$$
The following table describes the difference between with replacement and without
replacement sampling procedures.
With replacement sampling                        Without replacement sampling
Cheaper                                          Costly
A few units may be selected more than once.      A unit can get selected only once.
Less efficient.                                  More efficient.
Number of possible samples $s(n) = N^n$          Number of possible samples $s(n) = {}^{N}C_{n}$
Let $Y_i$, $i = 1, 2, \ldots, N$, denote the value of the $i$-th unit in a population, then the
population mean is defined as
$$\bar{Y} = \frac{1}{N}(Y_1 + Y_2 + \cdots + Y_N) = \frac{1}{N}\sum_{i=1}^{N} Y_i \qquad (1.15.1)$$
and the population total is defined as
$$Y = (Y_1 + Y_2 + \cdots + Y_N) = \sum_{i=1}^{N} Y_i = N\bar{Y}. \qquad (1.15.2)$$
The units of measurement of the population mean are the same as those for the actual
data. For example, if the $i$-th unit, $Y_i$, $\forall\, i$, is measured in dollars, then the population
mean, $\bar{Y}$, is also measured in dollars.
The positive square root of the population variance is called the population standard
deviation and it is denoted by $\sigma_y$. The units of measurement of $\sigma_y$ will again be
the same as that of the actual data. For instance, in the above example, the units of
measurement of $\sigma_y$ will be dollars.
The ratio of the standard deviation to the population mean is called the coefficient of
variation. It is denoted by $C_y$, that is
$$C_y = \frac{\sigma_y}{\bar{Y}}. \qquad (1.18.1)$$
Evidently $C_y$ is a unit free number. It is useful to compare the variability in two
different populations having different units of measurement, e.g., $ and kg. It is
also called the relative standard error (RSE). Sometimes we also consider
$C_y \cong S_y/\bar{Y}$.
The relative mean square error is defined as the square of the coefficient of
variation $C_y$ and is generally written as RMSE.
Mathematically
$$\text{RMSE} = C_y^2 = \frac{\sigma_y^2}{\bar{Y}^2}. \qquad (1.19.1)$$
Let $y_i$, $i = 1, 2, \ldots, n$, denote the value of the $i$-th unit selected in the sample, then the
sample mean is defined as
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i. \qquad (1.20.1)$$
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2. \qquad (1.21.1)$$
Remark 1.1. The population mean $\bar{Y}$ and population variance $\sigma_y^2$, etc., are
unknown quantities (parameters) and can be denoted by the symbol $\theta$. The sample
mean $\bar{y}$ and sample variance $s_y^2$, etc., are known after sampling and are called
statistics and can be denoted by $\hat{\theta}$. Also note that the sample standard deviation (or
standard error) and sample coefficient of variation can also be defined as $s_y = \sqrt{s_y^2}$
and $\hat{C}_y = s_y/\bar{y}$, respectively. Note that standard error is a statistic whereas standard
deviation is a parameter.
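The quantities in (1.15.1) to (1.21.1) can be computed with the following Python sketch. The data values and function names are hypothetical. One assumption is made explicit in the code: the population variance $\sigma_y^2$ is taken with divisor $N$, which is a common convention but is not stated in the passage above.

```python
from math import sqrt


def population_summary(values):
    """Population parameters for the full list of N values."""
    N = len(values)
    total = sum(values)                                    # (1.15.2)
    mean = total / N                                       # (1.15.1)
    # ASSUMPTION: population variance uses divisor N.
    variance = sum((y - mean) ** 2 for y in values) / N
    sd = sqrt(variance)                                    # sigma_y
    cv = sd / mean                                         # (1.18.1)
    return {"mean": mean, "total": total, "sd": sd, "cv": cv,
            "rmse": cv ** 2}                               # (1.19.1)


def sample_summary(values):
    """Sample statistics for the n sampled values."""
    n = len(values)
    mean = sum(values) / n                                 # (1.20.1)
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)   # (1.21.1)
    sd = sqrt(variance)
    return {"mean": mean, "variance": variance, "sd": sd, "cv": sd / mean}


population = [2, 4, 6, 8]      # hypothetical values, e.g. in dollars
sample = [2, 8]                # one possible sample of size n = 2
pop = population_summary(population)
smp = sample_summary(sample)
```

The entries of `pop` are parameters, fixed for the population, whereas the entries of `smp` change from sample to sample and are therefore statistics.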
1.22 ESTIMATOR
A statistic $\hat{\theta}_t$ obtained from values in the sample $s$ is also called an estimator of the
population parameter $\theta$. Note that the notation $\hat{\theta}_t$, or $\hat{\theta}$, or $\hat{\theta}_n$ have the same
meaning. For example, the notation $\bar{y}_t$, or $\bar{y}$, or $\bar{y}_n$ have the same meaning, and $s_y^2$
or $s_{y(t)}^2$ have the same meaning. We choose according to our requirements for a
given topic or exercise.
1.23 ESTIMATE
Any numeric value obtained from the sample information is called the estimate of
the population parameter. It is also called a statistic.
A pictorial representation of such a sample space is given in Figure 1.24.1.
[Figure 1.24.1: Tree diagram for two coins (first coin, second coin), giving the sample space {HH, HT, TH, TT}; 2 x 2 = 4 outcomes.]
A random variable is a real valued function defined on the sample space $\psi$. It is
generally of two types:
Qualitative random variables assume values that are not necessarily numerical, but
can be categorized. For example, Gender has two possible values: Male and
Female. These two can be arbitrarily coded numerically as Female = 0 and Male = 1.
Such coded variables are called Nominal variables. In another example, consider
Grades that can take five possible values: A, B, C, D, and F. These five
categories can be arbitrarily coded numerically as: A = 4, B = 3, C = 2, D = 1, and
F = 0. Note that here the magnitude of the coding tells us the quality of the Grade: if the code
is 3 then the Grade is better than the Grade if the code is 2. Such a coded variable is
called an Ordinal variable. Also note that in the case of the Nominal variable, the code
Male = 1 and Female = 0 does not mean that males are superior to females.
Adding, subtracting or averaging such qualitative variables has no meaning. Thus
qualitative variables are of two types: (a) Nominal variables; (b) Ordinal
variables. Pie charts or bar charts are generally used to present qualitative
variables.
Quantitative random variables can take numerical values for which adding,
subtracting or averaging such variables does have meaning. Examples of
quantitative variables are weight, height, number of students, etc. In general, two
types of quantitative random variables are available: (a) Discrete random variable;
(b) Continuous random variable.
A random variable is said to be continuous if it can take all possible values between
certain limits. For example, the height of a student can be 5.6 feet.
[Diagram: Random variable, classified as Qualitative or Quantitative.]
Note that Age itself is a quantitative variable whereas Age Groups is a qualitative
variable. Pie charts, bar charts, dot plots, line charts, stem and leaf plots, histograms
and box plots are generally used to present quantitative variables.
1.28 EXPECTED VALUE AND VARIANCE OF A UNIVARIATE RANDOM VARIABLE
If a discrete random variable $X$ takes all possible values $x_i$ with probability mass
function $P(x_i)$ in the sample space $\psi$, then its expected value is
or, equivalently
Sometimes (1.28.2) is called a formula by definition and that in (1.28.3) is called a
computing formula.
computing formula.
or equivalently
$$V(x) = \int_a^b x^2 f(x)\,dx - \{E(x)\}^2. \qquad (1.28.6)$$
In this case there are a countable number of points $x_1, x_2, \ldots$ along with associated
by $f(x) = \dfrac{dF(x)}{dx}$. The c.d.f. $F(x)$ is a non-decreasing function of $x$ and is
continuous on the right. Also note that $F(-\infty) = 0$, $F(+\infty) = 1$, $0 \le F(x) \le 1$, and
$P(a \le x \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$. For example, if $x$ is a continuous random
variable with probability density function (p.d.f.)
$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise}, \end{cases} \qquad (1.29.2)$$
then its cumulative distribution function (c.d.f.) is given by
$$F(x) = \begin{cases} 0 & \text{if } x < 0, \\ x & \text{if } 0 \le x \le 1, \\ 1 & \text{if } x > 1, \end{cases} \qquad (1.29.3)$$
and its graphical representation is given in Figure 1.29.2.
Example 1.30.1. A discrete random variable $X$ has the following probability mass
function:
Select a random sample of three units using the method of random numbers.
Solution: The cumulative distribution function of the random variable $X$ is given
by
We used the first six columns of the Pseudo-Random Number (PRN) Table 1 given
in the Appendix multiplied by $10^{-6}$ as the randomly selected values of $F(x)$. Then
the integral value of the random variable $x$ selected in the sample is obtained using
In case of with replacement sampling, the value of x = 3 has bee n selected twice, as
otherwise for WOR sampling we have to continue the process until three distinct
values of x are not selected .
Example 1.30.2. Suppose x follows a binomial distribution with parameters N and p, that
is, x ~ B(N, p), with N = 10 and p = 0.4. Select an SRSWR sample of n = 4 units by
using the random number method.
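One way to carry out a selection of this kind can be sketched in Python. The sketch assumes the usual inverse-c.d.f. rule (select the smallest x whose cumulative probability F(x) is at least the random number) and takes 0.622, 0.771, 0.917 and 0.675 as the random numbers, these being the first four values the text reads from the 7th to 9th PRN columns in a later example. The function names are illustrative only.

```python
from math import comb


def binomial_cdf(n_trials, p):
    """Return [F(0), F(1), ..., F(n_trials)] for a B(n_trials, p) variable."""
    cdf, running = [], 0.0
    for k in range(n_trials + 1):
        running += comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        cdf.append(running)
    return cdf


def select_value(u, cdf):
    """Smallest x with F(x) >= u (assumed selection rule)."""
    for x, cumulative in enumerate(cdf):
        if u <= cumulative:
            return x
    return len(cdf) - 1  # guard against rounding in the final cumulative sum


cdf = binomial_cdf(10, 0.4)
random_numbers = [0.622, 0.771, 0.917, 0.675]  # assumed PRN values times 10^-3
sample = [select_value(u, cdf) for u in random_numbers]
print(sample)
```

Under these assumptions the four selected values are 4, 5, 6 and 5; a different reading of the PRN table would of course give a different sample.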
We used three columns, the 7th to 9th, of the Pseudo-Random Number (PRN) Table
1 given in the Appendix multiplied by $10^{-3}$ as the randomly selected values of
F(x). Then the integral value of the random variable x selected in the sample is
number drawn from the Pseudo-Random Number (PRN) Table 1 given in the
Appendix.
Then the value of the random variable x selected in the sample is given by
$$F(x) = \begin{cases} \frac{1}{16}(x-1)^4 & \text{if } 1 \le x \le 3, \\ 1 & \text{if } x > 3. \end{cases}$$
Select a sample of n = 10 units by using SRSWR sampling.
Solution. We are given $F(x) = \frac{1}{16}(x-1)^4$, which implies that $x = 2[F(x)]^{1/4} + 1$. By
using the first three columns of the Pseudo-Random Numbers (PRN) Table 1 given
in the Appendix multiplied by $10^{-3}$ we obtain the observed values of F(x) and the
sampled values of x as:
F(x)     x
0.992    2.995988
0.588    2.751356
0.601    2.760956
0.549    2.721563
0.925    2.961397
0.014    1.687958
0.697    2.827419
0.872    2.932676
0.626    2.778990
0.236    2.393985
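The table above can be reproduced with a few lines of Python. This is only a sketch of the inverse-c.d.f. step, using the ten F(x) values listed in the table; the function name is illustrative.

```python
def inverse_cdf(F):
    """Invert F(x) = (x - 1)**4 / 16, giving x = 2 * F**(1/4) + 1."""
    return 2.0 * F ** 0.25 + 1.0


# Observed values of F(x) as listed in the table (PRN digits times 10^-3).
observed_F = [0.992, 0.588, 0.601, 0.549, 0.925,
              0.014, 0.697, 0.872, 0.626, 0.236]

sampled_x = [round(inverse_cdf(F), 6) for F in observed_F]
for F, x in zip(observed_F, sampled_x):
    print(f"{F:.3f}  {x:.6f}")
```

Every sampled value lies between 1 and 3, as it must for this distribution.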
Using three columns, say the 7th to 9th, of the Pseudo-Random Numbers (PRN)
Table 1 given in the Appendix multiplied by $10^{-3}$, the first five observed values of
F(x) are given by 0.622, 0.771, 0.917, 0.675 and 0.534. Thus the sampled five
values from the above distribution are
F(x)     x
0.622    0.403214
0.771    1.141487
0.917    3.747745
0.675    0.612801
0.534    0.107222
Note that we have used the tan function in radians and $\pi = 4\tan^{-1}(1)$.
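The density for this example is not shown in this part of the text, but the tabulated values agree with the transformation x = tan(pi (F(x) - 1/2)), which is the inverse c.d.f. of a standard Cauchy-type distribution. Treat that formula as an assumption inferred from the table; the short Python sketch below simply checks it against the listed values.

```python
from math import atan, tan

PI = 4.0 * atan(1.0)  # pi = 4 * arctan(1), as noted in the text


def inverse_cdf_tan(F):
    """Assumed inverse c.d.f.: x = tan(pi * (F - 1/2)), angle in radians."""
    return tan(PI * (F - 0.5))


observed_F = [0.622, 0.771, 0.917, 0.675, 0.534]
sampled_x = [inverse_cdf_tan(F) for F in observed_F]
for F, x in zip(observed_F, sampled_x):
    print(f"{F:.3f}  {x:.6f}")
```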
Solution. The distribution of x is uniform between 5 and 10, so its probability
distribution function is
$$F(x) = P[X \le x] = \int_5^x f(t)\,dt = \frac{1}{5}(x - 5) \qquad (1.30.9)$$
which implies that
$$x = 5[F(x) + 1] . \qquad (1.30.10)$$
Using three columns, say the 7th to 9th, of the Pseudo-Random
Number Table 1 given in the Appendix multiplied by $10^{-3}$, the first five observed values of F(x) are
given by 0.622, 0.771, 0.917, 0.675 and 0.534. Thus the sampled five values from
the above distribution are given by
F(x)     x
0.622    8.110
0.771    8.855
0.917    9.585
0.675    8.375
0.534    7.670
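As a quick check of (1.30.10), the five sampled values can be recomputed in Python. This is a minimal sketch using the F(x) values quoted above; the variable names are illustrative.

```python
def inverse_uniform_cdf(F, lower=5.0, upper=10.0):
    """Invert F(x) = (x - lower) / (upper - lower); for (5, 10) this is 5 * (F + 1)."""
    return lower + (upper - lower) * F


observed_F = [0.622, 0.771, 0.917, 0.675, 0.534]
sampled_x = [round(inverse_uniform_cdf(F), 3) for F in observed_F]
print(sampled_x)
```

Writing the inverse in terms of the two end points makes the same function usable for any uniform interval, not only (5, 10).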
If X and Y are discrete random variables, the function giving the probability that X will
take on the value x and Y will take on the value y, p(X = x, Y = y) = p(x, y), is called the joint
probability distribution function of a bivariate random variable.
DISCRETE RANDOM VARIABLES
( a ) $p(x, y) \ge 0$ for each pair of values (x, y) within its domain,
and
( b ) $\sum_x \sum_y p(x, y) = 1$, where the sum extends over all possible pairs (x, y).
If X and Y are discrete random variables and p(x, y) is the value of the joint
probability distribution at (x, y), the function given by
$$p_X(x) = \sum_y p(x, y) \qquad (1.34.1)$$
for each x within the range of X is called the marginal distribution of X, and the
function
$$p_Y(y) = \sum_x p(x, y) \qquad (1.34.2)$$
for each y within the range of Y is called the marginal distribution of Y.
Let p(x, y) denote the joint probability mass function (p.m.f.) of two random
variables x and y. Also, let F(x, y) denote the cumulative mass function (c.m.f.)
of x and y. It is well known that the distribution of the marginal distribution
function (m.d.f.) $P_Y(y)$ for any joint probability mass function of x and y is
rectangular (or uniform) in the range [0, 1]. Random numbers in the random
number table also follow the same distribution. Then to find the value of y one
solves the equation (1.35.1) below.
The known form of the joint mass function p(x, y) [one can choose any suitable
form for p(x, y)] can be substituted in (1.35.1). The value y* of y so obtained is
used to find the value x* of x. For this we use the conditional mass function of x
given y = y*, since the distribution of the conditional mass function will also be
uniform in [0, 1]. Thus another random number $R_1$ is drawn and the value x* of x
is determined from the equation
(1.35.2)
1.36 CONTINUOUS BIVARIATE RANDOM VARIABLE
A bivariate function with values f(x, y), defined over the two-dimensional plane, is
called a joint probability density function of the continuous random variables X
and Y if and only if
A bivariate function can serve as the joint probability distribution of a pair of
continuous random variables X and Y if and only if its values, f(x, y), satisfy
the conditions:
( a ) $f(x, y) \ge 0$ for each pair of values (x, y) within its domain; (1.37.1)
( b ) $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = 1$. (1.37.2)
If X and Y are continuous random variables, the function given by
$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(s, t)\,ds\,dt \qquad (1.38.1)$$
for $-\infty < x < +\infty$, $-\infty < y < +\infty$, where f(s, t) is the value of the joint probability
distribution of X and Y at the point (s, t), is called the joint distribution function,
or the joint cumulative distribution, of X and Y.
If X and Y are continuous random variables and f(x, y) is the value of the joint
probability density function, then the cumulative marginal probability distribution
function of y is given by
$$F_Y(y) = \int_{-\infty}^{y}\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy \qquad (1.39.1)$$
for $-\infty < y < +\infty$, and the cumulative marginal probability distribution function of x
is given by
$$F_X(x) = \int_{-\infty}^{x}\int_{-\infty}^{+\infty} f(x, y)\,dy\,dx \qquad (1.39.2)$$
In general, let f(x, y) denote the joint probability density function (p.d.f.) of two
continuous random variables x and y. Also let F(x, y) denote the cumulative
distribution function (c.d.f.) of x and y. It is well known that the distribution of the
marginal distribution function (m.d.f.) $F_Y(y)$ for any joint probability density
function of x and y is rectangular (or uniform) in the range [0, 1]. Random
numbers in the random number table also follow the same distribution. To find
the value of y, one solves the equation (1.40.1) below.
y = y*, since the distribution of the conditional distribution function will also be
uniform in [0, 1]. Thus another random number $R_1$ is drawn and the value x* of x
is determined from the equation:
Example 1.40.1. If the joint probability density function of two continuous random
variables x and y is given by
$$f(x, y) = \begin{cases} \frac{2}{3}(x + 2y) & \text{for } 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{otherwise}, \end{cases}$$
then select six pairs of observations (x, y) by using the Random Number Table
method.
Solution. We have
$$F_Y(y) = \int_0^y \int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = \int_0^y \int_0^1 \frac{2}{3}(x + 2y)\,dx\,dy = \frac{y + 2y^2}{3}$$
Let $0 < R_1 < 1$ be any other random number, say obtained by using the 7th to 9th
columns of the Pseudo-Random Numbers given in Table 1 of the Appendix, then
the value of x is given by solving the integral
$$\int_0^x f(x \mid y^*)\,dx = R_1 \quad \text{or} \quad \int_0^x \frac{2}{3}(1.99 + x)\,dx = 0.622$$
or, equivalently, solving a quadratic equation $x^2 + 3.98x - 3R_1 = 0$, which implies
1.41.1 UNBIASEDNESS
where $p_t$ denotes the probability of selecting the $t$-th sample from the population,
$\Omega$, and $\sum_{t=1}^{s(\Omega)} p_t = 1$. Note that the total number of possible samples in the case of SRSWR
sampling is $s(\Omega) = N^n$, and in the case of SRSWOR sampling is $s(\Omega) = {}^{N}C_{n}$.
For example:
( i ) Sample mean $\bar{y}_t$ is an unbiased estimator of population mean $\bar{Y}$ under both
SRSWR and SRSWOR sampling
$$E(\bar{y}_t) = \sum_{t=1}^{s(\Omega)} p_t \bar{y}_t = \bar{Y} . \qquad (1.41.2)$$
Show that the sample mean $\bar{y}_t$ is an unbiased estimator of population mean $\bar{Y}$ under
both SRSWR and SRSWOR sampling. The sample variance $s_y^2$ is an unbiased
estimator of population mean squared error $S_y^2$ under SRSWOR sampling, and the
population variance $\sigma_y^2$ under SRSWR sampling, respectively.
The total number of all possible samples is $s(\Omega) = {}^{N}C_{n} = {}^{4}C_{2} = 6$ and $p_t = 1/6$.
Now we have the following table.
Sample | Sampled units | Sample mean | Sample variance | Probability of selecting a sample
The above table shows that the distribution of sample means is symmetric and that
of sample variance is skewed to the right in the case of without replacement
sampling.
Case II. Suppose we are drawing all possible samples of size n = 2 by using
SRSWR sampling.
The total number of all possible samples is $s(\Omega) = N^n = 4^2 = 16$ and $p_t = 1/16$ for
all t = 1, 2, ..., 16.
t    Sampled units        Sample mean                 Sample variance   Probability of selecting
                                                      s²_y(t)           a sample, p_t
1    (A, A) or (1, 1)     ȳ_1 = (1 + 1)/2 = 1.0       0.0               1/16
2    (A, B) or (1, 2)     ȳ_2 = (1 + 2)/2 = 1.5       0.5               1/16
3    (A, C) or (1, 3)     ȳ_3 = (1 + 3)/2 = 2.0       2.0               1/16
4    (A, D) or (1, 4)     ȳ_4 = (1 + 4)/2 = 2.5       4.5               1/16
5    (B, A) or (2, 1)     ȳ_5 = (2 + 1)/2 = 1.5       0.5               1/16
6    (B, B) or (2, 2)     ȳ_6 = (2 + 2)/2 = 2.0       0.0               1/16
7    (B, C) or (2, 3)     ȳ_7 = (2 + 3)/2 = 2.5       0.5               1/16
8    (B, D) or (2, 4)     ȳ_8 = (2 + 4)/2 = 3.0       2.0               1/16
9    (C, A) or (3, 1)     ȳ_9 = (3 + 1)/2 = 2.0       2.0               1/16
10   (C, B) or (3, 2)     ȳ_10 = (3 + 2)/2 = 2.5      0.5               1/16
11   (C, C) or (3, 3)     ȳ_11 = (3 + 3)/2 = 3.0      0.0               1/16
12   (C, D) or (3, 4)     ȳ_12 = (3 + 4)/2 = 3.5      0.5               1/16
13   (D, A) or (4, 1)     ȳ_13 = (4 + 1)/2 = 2.5      4.5               1/16
14   (D, B) or (4, 2)     ȳ_14 = (4 + 2)/2 = 3.0      2.0               1/16
15   (D, C) or (4, 3)     ȳ_15 = (4 + 3)/2 = 3.5      0.5               1/16
16   (D, D) or (4, 4)     ȳ_16 = (4 + 4)/2 = 4.0      0.0               1/16
Thus the expected value of the sample mean $\bar{y}_t$ is given by
$$E(\bar{y}_t) = \frac{1}{N^n}\sum_{t=1}^{N^n} \bar{y}_t = \frac{1}{16}\sum_{t=1}^{16} \bar{y}_t = \frac{1}{16}(1 + 1.5 + \cdots + 4) = \frac{40}{16} = 2.5 = \bar{Y}$$
and that of the sample variance $s_y^2$ is given by
$$E\left[s^2_{y(t)}\right] = \frac{1}{N^n}\sum_{t=1}^{N^n} s^2_{y(t)} = \frac{1}{16}\sum_{t=1}^{16} s^2_{y(t)} = \frac{1}{16}(0 + 0.5 + \cdots + 0.5 + 0) = \frac{20}{16} = 1.25 = \sigma_y^2 .$$
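These unbiasedness results can be checked by brute-force enumeration. The Python sketch below lists every SRSWR and SRSWOR sample of size 2 from the population A = 1, B = 2, C = 3, D = 4 and averages the sample mean and sample variance over equally likely samples; the helper name `expected` is illustrative.

```python
from itertools import combinations, product
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4]  # units A, B, C, D
n = 2

wr_samples = list(product(population, repeat=n))  # 16 ordered SRSWR samples
wor_samples = list(combinations(population, n))   # 6 SRSWOR samples


def expected(stat, samples):
    """Average of a statistic over equally likely samples."""
    return sum(stat(s) for s in samples) / len(samples)


E_mean_wr = expected(mean, wr_samples)
E_var_wr = expected(variance, wr_samples)
E_mean_wor = expected(mean, wor_samples)
E_var_wor = expected(variance, wor_samples)

sigma2 = pvariance(population)  # population variance, divisor N
S2 = variance(population)       # population mean squared error, divisor N - 1

print(E_mean_wr, E_var_wr, sigma2)   # SRSWR: E(mean) = 2.5, E(s^2) = sigma^2
print(E_mean_wor, E_var_wor, S2)     # SRSWOR: E(mean) = 2.5, E(s^2) = S^2
```

Enumeration is only feasible for tiny populations, but it makes the distinction between the divisor N (SRSWR) and N - 1 (SRSWOR) easy to see.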
Then we have the following new term .
1.41.1.1 BIAS
It is the difference between the expected value of a statistic $\hat{\theta}_t$ and the actual value
of the parameter $\theta$, that is,
$$B(\hat{\theta}_t) = E(\hat{\theta}_t) - \theta . \qquad (1.41.5)$$
Thus an estimator $\hat{\theta}_t$ is unbiased if $E(\hat{\theta}_t) = \theta$, which is obvious by setting $B(\hat{\theta}_t) = 0$.
1.41.2 CONSISTENCY
There are several definitions for the consistency of any statistic, but we will use the
simplest. An estimator $\hat{\theta}_t$ of the population parameter $\theta$ is said to be consistent if
$$\lim_{n \to \infty} (\hat{\theta}_t) = \theta . \qquad (1.41.6)$$
For example:
( i ) The sample mean $\bar{y}_t$ (or simply $\bar{y}$) is a consistent estimator of the finite
population mean, $\bar{Y}$.
( ii ) The sample mean squared error $s_y^2$ is a consistent estimator of the population
mean squared error, $S_y^2$.
1.41.3 SUFFICIENCY
An estimator $\hat{\theta}_t$ is said to be sufficient for a parameter $\theta$ if the distribution of a
sample $y_1, y_2, \ldots, y_n$ given $\hat{\theta}_t$ does not depend on $\theta$. The distribution of $\hat{\theta}_t$ then
contains all the information in the sample relevant to the estimation of $\theta$, and
knowledge of $\hat{\theta}_t$ and its sampling distribution is 'sufficient' to give that
information. In general, a set of estimators or statistics $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k$ are 'jointly
sufficient' for parameters $\theta_1, \theta_2, \ldots, \theta_k$ if the distribution of sample values given
$\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k$ does not depend on $\theta_1, \theta_2, \ldots, \theta_k$.
1.41.4 EFFICIENCY
Before defining the term efficiency, we shall discuss two more terms, viz., variance
and mean square error of the estimator.
1.41.4.1 VARIANCE
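(1.41.7)
$MSE(\hat{\theta}_1) = V(\hat{\theta}_1)$.
Thus if $\hat{\theta}_1$ and $\hat{\theta}_2$ are two different estimators of the parameter $\theta$, then the
estimator $\hat{\theta}_1$ is said to be more efficient than the estimator $\hat{\theta}_2$ if and only if
$$MSE(\hat{\theta}_1) < MSE(\hat{\theta}_2) .$$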
1.42 RELATIVE EFFICIENCY
$$RE = MSE(\hat{\theta}_2) \times 100 / MSE(\hat{\theta}_1) . \qquad (1.42.1)$$
The ratio of the absolute value of the bias in an estimator to the square root of the
mean square error of the estimator is called the relative bias.
It is defined as:
$$RB = \left|B(\hat{\theta}_1)\right| / \sqrt{MSE(\hat{\theta}_1)} \qquad (1.43.1)$$
where $B(\hat{\theta}_1) = E(\hat{\theta}_1) - \theta$ and the relative bias is independent of the units of
measurement of the original data.
If $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n$ are independently distributed random variables with $E(\hat{\theta}_j) = \theta \ \forall j$,
and $\hat{\theta} = \frac{1}{n}\sum_{j=1}^{n} \hat{\theta}_j$, then
$$v(\hat{\theta}) = \frac{1}{n(n-1)} \sum_{j=1}^{n} (\hat{\theta}_j - \hat{\theta})^2 \qquad (1.44.1)$$
is an unbiased estimator of $V(\hat{\theta})$. If $\hat{\theta}_j = \hat{\theta}_{(j)}$ is the $j$-th estimator of $\theta$ obtained by
dropping the $j$-th unit from the sample of size n, then such a method of variance
estimation is called the Jackknife method of variance estimation, and the
estimator of variance takes the form
$$v_{Jack}(\hat{\theta}) = \frac{(n-1)}{n} \sum_{j=1}^{n} (\hat{\theta}_{(j)} - \hat{\theta})^2 \qquad (1.44.2)$$
where $\hat{\theta} = \frac{1}{n}\sum_{j=1}^{n} \hat{\theta}_{(j)}$.
For example, if $\hat{\theta} = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is an estimator of the population mean $\bar{Y}$ under
SRSWR sampling, then $\hat{\theta}_{(j)} = \bar{y}_{(j)} = \frac{1}{n-1}\sum_{i \ne j} y_i$ denotes the estimator of the
population mean $\bar{Y}$ obtained by dropping the $j$-th unit from the sample. Clearly, we can
write
Also
where $f = n/N$.
Note that it is not always possible to adjust the Jackknife estimator of variance to
make it unbiased for other sampling schemes available in the literature.
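The delete-one form (1.44.2) is easy to try numerically. The following Python sketch applies it to the sample mean, for which the Jackknife variance estimate reduces to the familiar s²/n. The four data values are illustrative (they happen to be marks used in a later example of this chapter), and the function name is an assumption of this sketch.

```python
from statistics import mean, variance


def jackknife_variance(sample, estimator=mean):
    """Delete-one Jackknife variance estimate of `estimator`, as in (1.44.2)."""
    n = len(sample)
    # Recompute the estimator n times, each time dropping the j-th unit.
    leave_one_out = [estimator(sample[:j] + sample[j + 1:]) for j in range(n)]
    centre = mean(leave_one_out)
    return (n - 1) / n * sum((e - centre) ** 2 for e in leave_one_out)


marks = [76, 51, 48, 92]                      # illustrative sample values
v_jack = jackknife_variance(marks)
v_classic = variance(marks) / len(marks)      # s^2 / n for the sample mean
print(v_jack, v_classic)
```

For the sample mean the two numbers coincide; the Jackknife becomes useful for estimators, such as ratios, whose variance has no simple closed form.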
1.45 LOSS FUNCTION
(1.46.1)
holds for all possible values of the characteristic under study. Now an estimator $\hat{\theta}_t$
belonging to $\Gamma$ is said to be admissible in $\Gamma$ if there exists no other estimator in $\Gamma$
which is better than $\hat{\theta}_t$.
A sample survey is a survey which is carried out using sampling methods, i.e., in
which only a portion and not the whole population is surveyed.
Example 1.48.1. Select all possible SRSWR samples each of two units from the
population consisting of four units 1,3,5 and 7.
( a ) Construct the sampling distribution of the sample means.
( b ) Construct the sampling distribution of the sample variances.
Solution. The list of 16 samples of size 2 from the population and the mean of each
sample is given in the following table.
Sample     1,1  1,3  1,5  1,7  3,1  3,3  3,5  3,7  5,1  5,3  5,5  5,7  7,1  7,3  7,5  7,7
Means      1    2    3    4    2    3    4    5    3    4    5    6    4    5    6    7
Variances  0    2    8    18   2    0    2    8    8    2    0    2    18   8    2    0
( a ) The relative frequency distribution of the sample means is
Sample means   Frequency   Relative frequency
1              1           0.0625
2              2           0.1250
3              3           0.1875
4              4           0.2500
5              3           0.1875
6              2           0.1250
7              1           0.0625
[Figure: Sampling distribution of the sample means (relative frequency against sample means).]
( b ) The relative frequency distribution of the sample variances is
Sample variances   Frequency   Relative frequency
0                  4           0.250
2                  6           0.375
8                  4           0.250
18                 2           0.125
[Figure: Sampling distribution of the sample variances (relative frequency against sample variance).]
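Both sampling distributions in this example can be rebuilt mechanically. The Python sketch below enumerates the 16 SRSWR samples of size 2 from the population {1, 3, 5, 7} and tabulates the frequencies of the sample means and sample variances; the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from statistics import mean, variance

population = [1, 3, 5, 7]
samples = list(product(population, repeat=2))  # 16 ordered SRSWR samples

mean_counts = Counter(mean(s) for s in samples)
var_counts = Counter(variance(s) for s in samples)

# Relative frequency = frequency / total number of samples.
mean_rel = {m: c / len(samples) for m, c in sorted(mean_counts.items())}
var_rel = {v: c / len(samples) for v, c in sorted(var_counts.items())}

print(mean_rel)
print(var_rel)
```

The relative frequencies of the means rise and fall symmetrically around 4, whereas those of the variances are concentrated at the small values 0 and 2.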
1.49 SAMPLING FRAME
A sample space $\Psi$ (or S) of identifiable units or elements of the population to be
surveyed is called a sampling frame. It may be a discrete space such as households or
individuals, or a continuous space such as the area under a particular crop.
Let $\Psi = \{t\}$, $t = 1, 2, \ldots, s(\Omega)$ be a specified space of samples, $B_t$ be a Borel set in
$\Psi$ and $p_t$ be the probability measure defined on $B_t$, then the triplet $(\Psi, B_t, p_t)$ is
called a sample survey design.
In general, two types of errors, which arise during the process of sampling, have
been observed in actual practice in the estimators:
( a ) Sampling errors; ( b ) Non-sampling errors.
Let us briefly explain these errors.
An error which arises due to sampling is called a sampling error. Let us explain this
with the help of the following example. For a population of size N = 4, let the units
be A = 1, B = 2, C = 3, and D = 4. The population mean is given by $\bar{Y} = 2.5$.
There are ${}^{N}C_{n} = {}^{4}C_{2} = 6$ possible samples each of size n = 2. The units selected in
the six samples are: (A, B), (A, C), (A, D), (B, C), (B, D), and (C, D). Thus six
sample means are given by:
Sample        1        2        3        4        5        6
Units         (A, B)   (A, C)   (A, D)   (B, C)   (B, D)   (C, D)
or            (1, 2)   (1, 3)   (1, 4)   (2, 3)   (2, 4)   (3, 4)
Sample mean   1.5      2.0      2.5      2.5      3.0      3.5
If we take each of the sample means and the population mean separately, then we have
the following cases: error of (A, B) = |1.5 - 2.5| = 1.0; error of (A, C) = |2.0 - 2.5| = 0.5;
error of (A, D) = |2.5 - 2.5| = 0.0; error of (B, C) = |2.5 - 2.5| = 0.0; error of
(B, D) = |3.0 - 2.5| = 0.5; error of (C, D) = |3.5 - 2.5| = 1.0. Note that we are measuring
only two units out of four units, i.e., we have only partial information in the sample;
therefore sampling error arises. One of the measurements for the sampling error is
the variance of the estimator. For example, the variance of the sample mean
estimator, $\bar{y}_t$, is
The people from whom we get the information are called the respondents and the
people in the sample from whom we do not get information are called
non-respondents. The error which arises when we fail to get the information is called
non-response error and the phenomenon is called non-response. This error arises
because we are not able to cover the whole sample. For example,
suppose we want to interview 100 farmers and 5 of them do not allow us to
interview them. Then we are interviewing only 95, so the sample is not complete.
Such errors are called non-response errors.
MEASUREMENT ERRORS
The errors that we bring in measuring the characters are called measurement errors.
For example, suppose we want to measure the age of the respondents. Among the
respondents, some may report their age less than their actual age. These types of
errors are called measurement errors.
The errors which arise from missing some numbers due to non-availability of data,
or from recording some numbers wrongly while making a table, are called tabulation
errors.
After the table is formed, we start our calculations. The errors committed in
calculations are known as computational errors.
1.52 POINT ESTIMATOR
A point estimator endeavours to give the best single estimated value of the
parameter. For example, the average height of school children is 5.3 feet.
Thus there are two cases: If $V(\bar{y}_t)$ is known, then for a large sample, a
$(1 - \alpha)100\%$ confidence interval estimate for the population mean $\bar{Y}$ is given by
$$\bar{y}_t \pm Z_{\alpha/2}\sqrt{V(\bar{y}_t)} \qquad (1.54.4)$$
where $Z_{\alpha/2}$ values are given in Table 3 of the Appendix, and if $V(\bar{y}_t)$ is unknown,
then for a small sample, a $(1 - \alpha)100\%$ confidence interval estimate for the
population mean $\bar{Y}$ is given by
$$\bar{y}_t \pm t_{\alpha/2}(df = n - 1)\sqrt{v(\bar{y}_t)} \qquad (1.54.5)$$
where $t_{\alpha/2}(df = n - 1)$ values are given in Table 2 of the Appendix, and df stands for
degrees of freedom. Note that $\alpha = 0.05$ represents a $(1 - \alpha)100\% = (1 - 0.05)100\%$
= 95% confidence interval.
( a ) When population variance is known: The lower and upper limits are
( b ) When population variance is not known: The lower and upper limits are
All possible samples, sample means, and variances, lower and upper limits of the
95% confidence interval, and their coverage are given in the following table.
Thus we observed that the population mean $\bar{Y} = 6.29$ lies 20 times between $L_1$ and
$U_1$ out of a total of 21 times, and hence the observed proportion of confidence intervals
containing the population mean = 20/21 = 0.9524. In other words, in 95.24% of cases the
population mean lies within the confidence interval estimates when the variance is
known. Thus the observed percentage is very close to the expected coverage of
95%.
Also we observed that, when the population variance is not known, the population
mean lies 16 times between $L_2$ and $U_2$, and hence the observed proportion of the
confidence interval estimates containing the population mean = 16/21 = 0.7619, that is,
only 76.19% of the time the population mean lies within the confidence interval
estimates when the variance is unknown. Here the observed percentage of the coverage
is lower than the expected coverage of 95%. This may be due to the very small sample
and population size. In practice, as the sample size becomes large (How large? Just
smile because there is no unique answer), the observed proportion of coverage
in both cases converges to 95%.
0, that is, $y_i = 1$ (if $i \in A$) and $y_i = 0$ (if $i \in A^c$), then the sample mean $\bar{y}_t$ also
becomes the sample proportion $\hat{p}$, as follows:
$$\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}(1 + 0 + 1 + 0 + \cdots + 0 + 1) = \frac{n_1}{n} = \hat{p}, \qquad (1.56.1)$$
where $n_1$ denotes the number of units of the sample in the group A, and n denotes
the total number of units in the sample. Note that the value of the sample proportion
also lies between 0 and 1, that is, $0 \le \hat{p} \le 1$.
Example 1.57.1. Consider a class consisting of 6 students. Their names and majors
are given in the following table:
Name    Major
Amy     Math
Bob     English
Chris   Math
Don     English
Erin    Math
Frank   English
( b ) How many SRSWOR samples, each of four units, will there be?
The possible combinations of choosing 4 objects out of 6 objects are given by:
$${}^{6}C_{4} = \frac{6!}{4!(6-4)!} = \frac{6!}{4! \times 2!} = \frac{6 \times 5 \times 4 \times 3 \times 2 \times 1}{4 \times 3 \times 2 \times 1 \times 2 \times 1} = 15 .$$
Note that each combination can be taken as a without replacement sample, so the
total number of distinct samples will be 15.
( c ) Sampling distribution of estimate of proportion: Let us construct those 15
samples as follows:
The above table shows that the distribution of estimates of proportion is symmetric,
or say normal.
$$\sigma^2 = V(x_i) = V(\hat{p}) = \left[\sum_{i=1}^{3} p_i x_i^2\right] - (\mu)^2 = \left[p_1 x_1^2 + p_2 x_2^2 + p_3 x_3^2\right] - (\mu)^2$$
$$= \left[\frac{3}{15} \times 0.25^2 + \frac{9}{15} \times 0.50^2 + \frac{3}{15} \times 0.75^2\right] - (0.5)^2$$
$$= [0.0125 + 0.15 + 0.1125] - (0.25) = 0.275 - 0.25 = 0.025 .$$
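The sampling distribution behind this variance can be enumerated directly. The Python sketch below lists all 15 SRSWOR samples of four students from the class of six and computes the estimate of proportion in each. It assumes the proportion being estimated is that of Math majors; by symmetry the English majors give the same distribution. Exact fractions are used so that no rounding enters.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

majors = {"Amy": "Math", "Bob": "English", "Chris": "Math",
          "Don": "English", "Erin": "Math", "Frank": "English"}

samples = list(combinations(majors, 4))  # 15 SRSWOR samples of size 4

# Estimate of proportion of Math majors in each sample (assumed attribute).
estimates = [Fraction(sum(majors[name] == "Math" for name in s), 4)
             for s in samples]

distribution = Counter(estimates)                      # value -> frequency
E_p = sum(estimates) / len(estimates)                  # expected value
V_p = sum(p**2 for p in estimates) / len(estimates) - E_p**2

for value, freq in sorted(distribution.items()):
    print(float(value), freq, f"{freq}/{len(samples)}")
print(float(E_p), float(V_p))
```

The frequencies 3, 9 and 3 out of 15 are the probabilities that appear as weights in the variance calculation above.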
Example 1.57.2. Consider a class of 16 students taking a statistics course; their
names, marks, and major subjects are given in the following table:
Solution. We have
Name    Marks (y_i)   y_i²
Ruth    92            8464
Ryan    97            9409
Tim     68            4624
Raul    62            3844
Marla   97            9409
Erin    68            4624
Judy    76            5776
Troy    75            5625
Tara    51            2601
Lisa    94            8836
John    70            4900
Cher    89            7921
Lona    62            3844
Gina    63            3969
Jeff    48            2304
Sara    97            9409
Sum     1209          95559
So our SRSWOR sample consists of four students = {Judy, Tara, Jeff, Ruth}.
Name    Marks (y_i)   y_i²
Judy    76            5776
Tara    51            2601
Jeff    48            2304
Ruth    92            8464
Sum     267           19145
Thus
$$n = 4, \quad \sum_{i=1}^{n} y_i = 267 \quad \text{and} \quad \sum_{i=1}^{n} y_i^2 = 19145 .$$
( b ) Sample mean:
$$\bar{y}_t = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{267}{4} = 66.75 \text{ (statistic)}$$
which is an estimate of the population mean.
Note that an estimator of the population total $Y = \sum_{i=1}^{N} Y_i$ will be given by
$\hat{Y} = N\bar{y}_t = 16 \times 66.75 = 1068$.
The sample variance is
$$s_y^2 = \frac{\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 / n}{n - 1} = \frac{19145 - (267)^2/4}{4 - 1} = 440.91 \text{ (statistic)}.$$
( e ) An estimator of the variance of the estimator of the population mean is
$$v(\bar{y}_t) = \left(\frac{N - n}{Nn}\right) s_y^2 = \left(\frac{16 - 4}{16 \times 4}\right) \times 440.91 = 82.67 \text{ (statistic)}.$$
( f ) Here the 95% confidence interval is given by
$\bar{y}_t \pm 1.96\sqrt{V(\bar{y}_t)}$, or $66.75 \pm 1.96\sqrt{52.548}$, or $66.75 \pm 14.20$, or [52.55, 80.95].
Yes, the true population mean $\bar{Y} = 75.56$ lies in the 95% confidence interval
estimate. The interpretation of the 95% confidence interval is that we are 95% sure that
the true mean lies within the two limits of this interval estimate. Note that an interval
estimate is a statistic.
where $t_{\alpha/2}(df = n - 1) = t_{0.025}(df = 3) = 3.182$ is taken from Table 2 of the Appendix.
Yes, again the true population mean lies in this 95% confidence interval and its
interpretation is the same as above. Again note that an interval estimate is a statistic.
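The calculations of this example can be retraced in Python. The sketch below is illustrative: it recomputes the sample mean, the sample variance and the estimated variance of the sample mean for the sample {Judy, Tara, Jeff, Ruth}, and then forms two 95% intervals, one with z = 1.96 and the variance of the mean computed from the whole population, the other with t = 3.182 and the estimated variance. Small differences from the printed figures come from rounding of intermediate values.

```python
from math import sqrt
from statistics import mean, variance

marks = {"Ruth": 92, "Ryan": 97, "Tim": 68, "Raul": 62, "Marla": 97,
         "Erin": 68, "Judy": 76, "Troy": 75, "Tara": 51, "Lisa": 94,
         "John": 70, "Cher": 89, "Lona": 62, "Gina": 63, "Jeff": 48,
         "Sara": 97}
N = len(marks)
Ybar = mean(list(marks.values()))             # population mean

y = [marks[name] for name in ("Judy", "Tara", "Jeff", "Ruth")]
n = len(y)
ybar = mean(y)                                # sample mean
s2 = variance(y)                              # sample variance, divisor n - 1
v_hat = (N - n) / (N * n) * s2                # estimated variance of the mean

S2 = variance(list(marks.values()))           # population S^2, divisor N - 1
V_true = (N - n) / (N * n) * S2               # variance of the mean, S^2 known

z, t = 1.96, 3.182                            # table values quoted in the text
ci_known = (ybar - z * sqrt(V_true), ybar + z * sqrt(V_true))
ci_unknown = (ybar - t * sqrt(v_hat), ybar + t * sqrt(v_hat))

print(ybar, round(s2, 2), round(v_hat, 2), round(V_true, 2))
print(ci_known, ci_unknown, Ybar)
```

The t-based interval is noticeably wider than the z-based one, which is the price of estimating the variance from only four observations.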
3.
( a ) Let us give upper case 'FLAG' of 1 to English majors and 0 to Math major
students in the whole population, then we have
No.   Name    Major     FLAG
1     Ruth    Math      0
2     Ryan    Math      0
3     Tim     English   1
4     Raul    Math      0
5     Marla   English   1
6     Erin    Math      0
7     Judy    English   1
8     Troy    English   1
9     Tara    Math      0
10    Lisa    Math      0
11    John    Math      0
12    Cher    English   1
13    Lona    Math      0
14    Gina    Math      0
15    Jeff    Math      0
16    Sara    Math      0
Population Proportion:
$$P = \frac{\sum_{i=1}^{N} FLAG_i}{N} = \frac{\text{No. of students with English major}}{\text{Total No. of students}} = \frac{5}{16} = 0.3125 \text{ (parameter)}.$$
( b ) Let us now give the same lower case 'flag' to students in the sample.
Name    Major     flag
Judy    English   1
Tara    Math      0
Jeff    Math      0
Ruth    Math      0
Sum               1
Note that a proportion can never be negative, so the lower limit has been changed to 0.
Caution! It must be noted that we have here a very small sample, but in practice,
when we deal with the problem of estimation of proportion, a minimum sample
size of 30 units is recommended from large populations. Note that instead of using
'FLAG' or 'flag', sometimes we assign codes 0 or 1 directly to the variable Y or
X.
Sample number   Sampled units               Probability p_t
1               Cher, John, Marla, Sara     0.25
2               Erin, Judy, Raul, Tara      0.25
3               Gina, Lisa, Ruth, Tim       0.25
4               Jeff, Lona, Ryan, Troy      0.25
Sample number   (ȳ_t - Ȳ)²
1               161.03
2               127.91
3               13.61
4               25.60
Sum             328.18
$$MSE(\bar{y}_t) = \sum_{t=1}^{4} p_t \{\bar{y}_t - \bar{Y}\}^2 = \frac{1}{4} \times 328.18 = 82.045 ,$$
where $\bar{Y} = 75.56$.
Thus we have
$$E(\bar{y}_t) = \sum_{t=1}^{13} p_t \bar{y}_t = \frac{1}{13} \times 962.3 = 74.02 ,$$
and
$$B(\bar{y}_t) = E(\bar{y}_t) - \bar{Y} = 74.02 - 75.56 = -1.54 ,$$
$$V(\bar{y}_t) = \sum_{t=1}^{13} p_t \{\bar{y}_t - E(\bar{y}_t)\}^2 = \frac{1}{13} \times 237.8077 = 18.2929 ,$$
and
$$MSE(\bar{y}_t) = \sum_{t=1}^{13} p_t \{\bar{y}_t - \bar{Y}\}^2 = \frac{1}{13} \times 268.76956 = 20.675 .$$
Although John's sampling scheme is less biased, it has too much mean square error
compared to Mike's sampling scheme. Thus we shall prefer Mike's sampling
scheme over John's sampling scheme. Also note that the relative efficiency of
Mike's sampling scheme over John's sampling scheme is given by
$$RE = \frac{MSE(\bar{y}_t)_{John}}{MSE(\bar{y}_t)_{Mike}} \times 100 = \frac{82.045}{20.675} \times 100 = 396.83\% .$$
Thus one can say that Mike's sampling plan is almost four times more efficient than
John's sampling scheme.
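John's side of this comparison can be recomputed from the marks of the 16 students. The Python sketch below takes his four equally likely samples as listed earlier in the example, computes each sample mean, and evaluates the bias and mean square error; Mike's mean square error of 20.675 is taken from the text as given, since his 13 samples are not listed here. Variable names are illustrative, and the result differs slightly from the printed 82.045 because the text rounds the population mean to 75.56.

```python
from statistics import mean

marks = {"Ruth": 92, "Ryan": 97, "Tim": 68, "Raul": 62, "Marla": 97,
         "Erin": 68, "Judy": 76, "Troy": 75, "Tara": 51, "Lisa": 94,
         "John": 70, "Cher": 89, "Lona": 62, "Gina": 63, "Jeff": 48,
         "Sara": 97}
Ybar = mean(list(marks.values()))  # population mean, 75.5625

john_samples = [("Cher", "John", "Marla", "Sara"),
                ("Erin", "Judy", "Raul", "Tara"),
                ("Gina", "Lisa", "Ruth", "Tim"),
                ("Jeff", "Lona", "Ryan", "Troy")]
p_t = 1 / len(john_samples)        # each sample has probability 0.25

sample_means = [mean([marks[name] for name in s]) for s in john_samples]
bias_john = sum(p_t * m for m in sample_means) - Ybar
mse_john = sum(p_t * (m - Ybar) ** 2 for m in sample_means)

mse_mike = 20.675                  # value quoted in the text
relative_efficiency = mse_john / mse_mike * 100

print(sample_means, bias_john, round(mse_john, 3), round(relative_efficiency, 2))
```

Because John's four samples partition the class into equal groups, his estimator is exactly unbiased, yet its mean square error is still about four times Mike's.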
1.58 RELATIVE STANDARD ERROR
where $RV(\hat{\theta}) = V(\hat{\theta})/[E(\hat{\theta})]^2$ denotes the relative variance of the estimator $\hat{\theta}$.
Another well-known name for the relative standard error is the coefficient of variation.
1.59 AUXILIARY INFORMATION
[Figure: scatter plots of y against x for different values of the population correlation coefficient $\rho_{xy}$, including $\rho_{xy} = +1$ and $\rho_{xy} = -1$.]
Note that a similar scatter plot can be made from sample values to find the sign of
the sample correlation coefficient $r_{xy}$.
( c ) The population regression coefficient of Y on X is defined as
$$\beta = Cov(X, Y)/V(X) . \qquad (1.59.5)$$
For simple random sampling, it is given by
$$\beta = S_{xy}/S_x^2 . \qquad (1.59.6)$$
A biased estimator of $\beta$ is given by
$$b = s_{xy}/s_x^2 \qquad (1.59.7)$$
which in fact represents a change in the study variable Y with a unit change in the
auxiliary variable X. Note that the sign of $\beta$ (or b) is the same as that of $\rho_{xy}$ (or $r_{xy}$).
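From the above table, $\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = 88$, $\sum_{i=1}^{N}(X_i - \bar{X})^2 = 52$ and $\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}) = 65$,
so that
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{88}{5-1} = 22 , \quad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2 = \frac{52}{5-1} = 13 ,$$
$$S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}) = \frac{65}{5-1} = 16.25 , \quad \rho_{xy} = \frac{S_{xy}}{\sqrt{S_x^2 S_y^2}} = \frac{16.25}{\sqrt{13 \times 22}} = 0.960 ,$$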
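The arithmetic above is easy to retrace. The underlying data table is not reproduced here, so the Python sketch below starts from the three sums of squares and products quoted in the text (88, 52 and 65) with N = 5; it also evaluates the regression coefficient from (1.59.6) and, taking n = 3 as in the later part of the example, the covariance between the two sample means.

```python
from math import sqrt

N = 5
ss_y, ss_x, sp_xy = 88, 52, 65     # sums of squares and products from the text

S2_y = ss_y / (N - 1)              # 22.0
S2_x = ss_x / (N - 1)              # 13.0
S_xy = sp_xy / (N - 1)             # 16.25

rho_xy = S_xy / sqrt(S2_x * S2_y)  # population correlation coefficient
beta = S_xy / S2_x                 # regression coefficient, as in (1.59.6)

n = 3                              # sample size used later in the example
cov_means = (N - n) / (N * n) * S_xy

print(S2_y, S2_x, S_xy, round(rho_xy, 4), beta, round(cov_means, 5))
```

Note that the correlation and the regression coefficient share the numerator S_xy, which is why they always have the same sign.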
Units
A
B
c
Sum
Continued .
13.44 4.00
0.44 1.00
18.78 9.00
$E(\bar{x}) = 19 = \bar{X}$, that is, the sample mean $\bar{x}$ is unbiased for the population mean of the
auxiliary variable;
$E(s_y^2) = 22.00 = S_y^2$, that is, the sample variance $s_y^2$ is unbiased for the population $S_y^2$
of the study variable;
$E(s_x^2) = 13.00 = S_x^2$, that is, the sample variance $s_x^2$ is unbiased for the population $S_x^2$
of the auxiliary variable;
$E(s_{xy}) = 16.25 = S_{xy}$, that is, the sample covariance $s_{xy}$ is unbiased for the population
$S_{xy}$ of both variables;
$E(r_{xy}) = 0.964 \ne \rho_{xy}$, that is, the sample $r_{xy}$ is biased for the population $\rho_{xy}$, and
$B(r_{xy}) = E(r_{xy}) - \rho_{xy} = 0.964 - 0.960 = 0.004$;
and
$E(b) = 1.40 \ne \beta$, that is, the sample b is biased for the population $\beta$;
and
$B(b) = E(b) - \beta = 1.40 - 1.25 = 0.15$.
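( c ) The covariance between $\bar{y}$ and $\bar{x}$ is defined as:
$$Cov(\bar{y}, \bar{x}) = E[\bar{y} - E(\bar{y})][\bar{x} - E(\bar{x})] = E[\bar{y} - \bar{Y}][\bar{x} - \bar{X}] = \sum_{s} p_s (\bar{y}_s - \bar{Y})(\bar{x}_s - \bar{X})$$
Now we have
( d ) Now we have
$$\frac{N - n}{Nn} S_{xy} = \frac{(5 - 3)}{5 \times 3} \times 16.25 = 2.16667 .$$
Thus we have
$$Cov(\bar{y}, \bar{x}) = \frac{N - n}{Nn} S_{xy} = \frac{(1 - f)}{n} S_{xy}, \quad \text{where } f = n/N .$$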
If x and y are two random variables and c and d are two real constants, then
( a ) $V(cx) = c^2 V(x)$. (1.60.1)
( c ) If $x = \sum_{i=1}^{n} x_i$, where the $x_i$ are also random variables, then we have
( d ) If $x = \sum_{i=1}^{n} c_i x_i$, where $c_i$ are real constants, then we have
$$V(x) = V\left(\sum_{i=1}^{n} c_i x_i\right) = \sum_{i=1}^{n} c_i^2 V(x_i) .$$
( e ) If $x = \sum_{i=1}^{n} c_i x_i$ and $y = \sum_{i=1}^{n} d_i y_i$, where $c_i$ and $d_i$ are real constants, then we have
These are parameters which deal with arranging the data in ascending or descending
order, and we introduce a few of them here as follows:
It is a measure which divides the population into exactly two equal parts, and it is
denoted by $M_y$. Its analogue from the sample is called the sample median, and is
denoted by $\hat{M}_y$. A pictorial representation is given below:
[Figure: ordered data running from the minimum value to the maximum value, split into two equal parts by the median.]
( ii ) If the sample size n is even, then the average of the values at the $(n/2)$th and
$(n/2 + 1)$th positions from ordered data is called the sample median. As an illustration,
consider a sample consisting of n = 6 (even) observations as 50, 90, 30, 60, 70 and
20. The first step is to arrange the data in ascending order as: 20, 30, 50, 60, 70, 90.
The second step is to pick up two values: one at the $(n/2)$th = $(6/2)$th = 3rd position = 50,
and the second at the $(n/2 + 1)$th = $(6/2 + 1)$th = 4th position = 60. Then the average of these
These are three measures which divide the population into four equal parts. The $i$-th
quartile is represented by $Q_i$, i = 1, 2, 3. A pictorial representation is given below:
[Figure: ordered data running from the minimum value to the maximum value, divided into four equal parts by the quartiles.]
Note that the second quartile $Q_2$ is a median. The first quartile $Q_1$ is a median of
the data less than or equal to the second quartile $Q_2$, and the third quartile $Q_3$ is the
median of the data more than or equal to the second quartile $Q_2$. Thus finding the three
quartiles requires finding a median three times from the given ordered data. The
population interquartile range is defined as: $\delta = (Q_3 - Q_1)$. The sample analogues of
population quartiles are called sample quartiles and are denoted by $\hat{Q}_i$, i = 1, 2, 3, and
the sample interquartile range is defined as: $\hat{\delta} = (\hat{Q}_3 - \hat{Q}_1)$, which is a measure of
variation in the data set.
1.61.3 POPULATION PERCENTILES
These are 99 measures, which divide the population into 100 equal parts. The $i$-th
population percentile is represented by $P_i$, i = 1, 2, ..., 99 and its pictorial
representation is given below:
[Figure: ordered data divided into 100 equal parts of 1% each by the percentiles.]
and its sample analogue is called the sample mode and is denoted by $\hat{M}_0$. As an
illustration, for the data set 60, 70, 30, 60, 30, 30, 80, 30, the mode value is 30,
because it occurred most frequently.
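The two illustrations in this section, the median of an even-sized sample and the mode, can be reproduced with a short Python sketch. The function `sample_median` is an illustrative helper that follows the ordering-and-position rule described above; the standard library functions `statistics.median` and `statistics.mode` are used as a cross-check.

```python
from statistics import median, mode


def sample_median(values):
    """Median by position: middle value, or average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # Positions n/2 and n/2 + 1 in 1-based counting.
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]


even_sample = [50, 90, 30, 60, 70, 20]
med = sample_median(even_sample)

mode_data = [60, 70, 30, 60, 30, 30, 80, 30]
most_frequent = mode(mode_data)

print(med, median(even_sample), most_frequent)
```

For the six observations the ordered data are 20, 30, 50, 60, 70, 90, so the median is the average of 50 and 60, namely 55.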
1.62 DEFINITION(S) OF STATISTICS
There are several definitions of statistics and we list a few of them as follows:
A few people have the following types of views in their mind about statistics:
( a ) Statistics can prove anything;
( b ) There are three types of lies --- lies, damned lies, and statistics;
( c ) Statistics are like clay of which one can make a God or devil as he/she pleases;
( d ) It is only a tool, and cannot prove or disprove anything.
It has scope in almost every category into which our social setup divides this world,
for example, Trade, Industry, Commerce, Economics, Biology,
Botany, Astronomy, Physics, Chemistry, Education, Medicine, Sociology,
Psychology, Religious studies, Meteorology, National defence, and Business:
Production, Sale, Purchase, Finance, Accounting, Quality control, etc.
EXERCISES
Exercise 1.1. Define the terms population, parameter, sample, and statistic.
Exercise 1.3. Describe the relationship between the variance and mean squared
error of an estimator. Hence deduce the term relative efficiency.
Exercise 1.4. You are required to plan a sample survey to study the environment
activities of a business in the United States . Suggest a suitable survey plan on the
following points : ( a ) sampling units; ( b ) sampling frame ; ( c ) method of
sampling; and ( d ) method of collecting information. Prepare a suitable
questionnaire which may be used to collect the required information.
Exercise 1.5. Define population, sampling unit and sampling frame for conducting
surveys on each of the following subjects. Mention other possible sampling units,
if any, in each case and discuss their relative merits .
( a ) Housing conditions in the United States.
( b ) Study of incidence of lung cancer and heart attacks in the United States .
(c) Measurement of the volume of timber available in the forests of Canberra .
( d ) Study of the birth rate in India.
( e ) Study of nutrient contents of food consumed by the residents of California.
( f) Labour manpower of large businesses in Canada .
( g ) Estimation of population density in India.
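Exercise 1.8. Show that the sample variance $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ can be put in
different ways as
$$S_y^2 = \frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N-1}\sum_{i=1}^{N} Y_i^2 - \frac{\left(\sum_{i=1}^{N} Y_i\right)^2}{N(N-1)}$$
can be written as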
Exercise 1.10. Construct a sample space and tree diagram for each one of the
following situations:
( a) Toss a fair coin; ( b ) Toss two fair coins ; ( c ) Toss a fair die; ( d ) Toss a fair
coin and a fair die; (e) Toss two fair dice ; and (f) Toss a fair die and a fair coin .
Exercise 1.11. State what type of variable each of the following is. If a variable is
quantitative, say whether it is discrete or continuous; and if the variable is
qualitative say whether it is nominal or ordinal.
1 Religious preference.
2 Amount of water in a glass.
3 Master card number.
4 Number of students in a class of 32 who turn in assignments on time.
5 Brand of personal computer.
6 Amount of fluid dispensed by a machine used to fill cups with chocolate.
7 Number of graduate applications in statistics each year at the SCSU .
8 Amount of time required to drive a car for 35 miles.
9 Room temperature recorded every half hour.
10 Weight of letters to be mailed.
11 Taste of milk.
12 Occupation list.
13 Coded numbers to different colors, e.g., Red--1, Green--2, and Pink--3.
14 Average daily low temperature per year in the city of St. Cloud.
15 Nationality of the students in your University.
16 Phone number.
17 Rent paid by the tenant.
18 Frog jump in cms.
19 Colors of marbles.
PRACTICAL PROBLEMS
Practical 1.1. From a population of size 5 how many samples of size 2 can be
drawn by using ( a ) SRSWR and ( b ) SRSWOR sampling?
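Practical 1.2. Mr. Bean selects all possible samples of two units from a population
consisting of four units viz. 10, 15, 20, 25 by using SRSWOR sampling. He noted
that the harmonic mean of this population is given by
$$H_y = \frac{N}{\sum_{i=1}^{N} 1/Y_i} = \frac{4}{\frac{1}{10} + \frac{1}{15} + \frac{1}{20} + \frac{1}{25}} = 15.58442.$$
The total number of possible samples $= {}^{N}C_{n} = {}^{4}C_{2} = 6$ and these samples are given by
(10, 15), (10, 20), (10, 25), (15, 20), (15, 25) and (20, 25).
The harmonic means for these samples are, respectively,
$$\hat{H}_1 = \frac{n}{\sum_{i=1}^{n} 1/y_i} = \frac{2}{\frac{1}{10} + \frac{1}{15}} = 12, \qquad \hat{H}_2 = \frac{n}{\sum_{i=1}^{n} 1/y_i} = \frac{2}{\frac{1}{10} + \frac{1}{20}} = 13.33333,$$
and similarly $\hat{H}_3 = 14.28571$, $\hat{H}_4 = 17.14286$, $\hat{H}_5 = 18.75$ and $\hat{H}_6 = 22.22222$.
Mr. Bean took the harmonic mean of these six sample harmonic means, as follows: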
$$HM = \frac{6}{\sum_{t=1}^{6} 1/\hat{H}_t} = \frac{6}{\left\{\frac{1}{12} + \frac{1}{13.33333} + \frac{1}{14.28571} + \frac{1}{17.14286} + \frac{1}{18.75} + \frac{1}{22.22222}\right\}} = 15.58442 = H_y.$$
( a ) Sample harmonic mean is an unbiased estimator of population harmonic mean.
( b ) $E(\hat{H}) = s(\Omega)\Big/\sum_{t=1}^{s(\Omega)} \frac{1}{\hat{H}_t} = H_y$.
Hint: Expected value $E(\hat{H}_t) = \sum_{t=1}^{s(\Omega)} p_t \hat{H}_t$ with $p_t = \frac{1}{6}\ \forall\ t = 1, 2, \ldots, s(\Omega)$ and $s(\Omega) = {}^{N}C_{n}$.
( c ) Find the bias, variance, and mean square error in the estimator $\hat{H}_t$.
( d ) Does the relation $MSE(\hat{H}_t) = V(\hat{H}_t) + \{B(\hat{H}_t)\}^2$ hold?
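Mr. Bean's calculation can be checked numerically. The following Python sketch (standard library only; the variable names are illustrative and do not come from the text) enumerates the six SRSWOR samples, computes each sample harmonic mean, and compares the harmonic mean of those six values with the population harmonic mean. It also computes the ordinary expectation with $p_t = 1/6$, which is the quantity needed when discussing unbiasedness and bias.

```python
from itertools import combinations
from statistics import harmonic_mean

population = [10, 15, 20, 25]
n = 2

# All SRSWOR samples of size n (order ignored): C(4, 2) = 6 of them.
samples = list(combinations(population, n))
sample_hms = [harmonic_mean(s) for s in samples]

# Harmonic mean of the six sample harmonic means, as Mr. Bean computed it.
hm_of_hms = harmonic_mean(sample_hms)
population_hm = harmonic_mean(population)

# Ordinary expected value: every sample has probability 1/6.
expected_hm = sum(sample_hms) / len(sample_hms)

print(samples)
print([round(h, 5) for h in sample_hms])
print(round(hm_of_hms, 5), round(population_hm, 5), round(expected_hm, 5))
```

The two harmonic means agree because the reciprocal of a sample harmonic mean is the sample mean of the reciprocals $1/y_i$, and that sample mean is unbiased for the population mean of the reciprocals. The arithmetic expectation `expected_hm` is a different quantity and need not equal $H_y$.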
Practical 1.3. Suppose that a population consists of 5 units given by: 10, 15, 20, 25,
and 30. Select all possible samples of 3 units using SRSWR and SRSWOR
sampling.
( a ) Show that the sample mean is an unbiased estimator of the population mean in
each case.
( b ) Show that the sample variance is an unbiased estimator of the population variance
under SRSWR sampling, and of the population mean square under SRSWOR
sampling.
( c ) Also plot the sampling distribution of the sample mean and sample variance in
each situation.
( d ) Find the variance of the sample mean under SRSWOR sampling using the
definition of variance. Show all steps.
( e ) Also compute $\left(\frac{N-n}{Nn}\right)S_y^2$ and comment on it.
Practical 1.4. Repeat Mr. Bean's exercise with the geometric mean (GM) and
comment on the results.
$$GM = \left(\prod_{i=1}^{n} y_i\right)^{1/n}$$
Practical 1.6. Suppose an urn contains N balls of which Np are black and Nq are
white so that p + q = 1. The probability that if n balls are drawn (without
replacement), exactly x of them will be black, is given by
$$P(x) = \frac{{}^{Np}C_{x}\ {}^{Nq}C_{n-x}}{{}^{N}C_{n}}$$
such that $0 \le x \le Np$ and $0 \le n - x \le Nq$. Using the concept of c.d.f., select a
sample of three units by using without replacement sampling.
Hint: Hypergeometric distribution.
Use the first 6 columns multiplied by $10^{-6}$ as the values of the cumulative
distribution function (c.d.f.) F(x) of the random variable x, and select a random
sample of 15 units by using with replacement sampling.
Hint: $x = 100 + \tan[\pi\{F(x) - 0.5\}]$.
population is given by
with $\mu = 100$ and $\sigma = 2.5$. Use the first 6 columns multiplied by $10^{-6}$ as the values
of the cumulative distribution function (c.d.f.) F(x) of the random variable x, and
select a random sample of 15 units by using with replacement sampling.
Hint: $x = \mu + z\sigma$ and $z \sim N(0, 1)$.
Practical 1.12. In the hope of preventing ecological damage from oil spills, a
biochemical company is developing an enzyme to break up oil into less harmful
chemicals. The table below shows the time it took for the enzyme to break up oil
samples at different temperatures. The researcher plans to use these data in
statistical analysis:
( a ) As a consultant, which variable would you treat as dependent and which as
independent? Denote your dependent variable by Y and independent variable
by X.
( b ) Assuming that these six observations form a population, compute the following
parameters:
$$\bar{Y},\ \bar{X},\ S_y^2,\ S_x^2,\ S_y,\ S_x,\ C_y = \frac{S_y}{\bar{Y}},\ C_x = \frac{S_x}{\bar{X}},\ S_{xy},\ \rho_{xy} = \frac{S_{xy}}{S_x S_y},\ \beta = \frac{S_{xy}}{S_x^2}\ \text{and}\ K = \rho_{xy}\frac{C_y}{C_x}.$$
( g ) Construct a 95% confidence interval assuming that the population mean square
is unknown and the sample size is small. Does the population mean fall in it?
Interpret it.
( h ) Find the variance of the estimator of the proportion of countries having a suicide
rate of more than 25%.
Practical 1.15. Consider a population consisting of the following six units:
Assume that these 10 observations form a sample and compute the following statistics:
$$\bar{y},\ \bar{x},\ s_y^2,\ s_x^2,\ s_y,\ s_x,\ c_y = \frac{s_y}{\bar{y}},\ c_x = \frac{s_x}{\bar{x}},\ s_{xy},\ r_{xy} = \frac{s_{xy}}{s_x s_y}\ \text{and}\ b = \frac{s_{xy}}{s_x^2}.$$
Practical 1.17. The following data show the daily temperatures in New York over
a period of two weeks:
Find the following: sample size; sample mean; median; mode; first quartile ; second
quartile; third quartile; minimum value; maximum value; and interquartile range .
Practical 1.18. Construct scatter diagrams and find the linear correlation coefficient
in each one of the following five samples, each of five units, and comment on the
different situations that arise:
Practical 1.19. The following balloon is filled with five gases with their different
atomic number and atomic weights.
( a ) Find the average atomic weight of all the gases in the balloon;
( b ) Find the population variance $\sigma^2$ of atomic weight of all the gases in the
balloon;
( c ) Select all possible with replacement samples each consisting of two gases;
( d ) Estimate the average atomic weight from each one of the 25 samples;
( e ) Construct a frequency distribution table of all possible sample means;
( f ) Construct a histogram. Is it symmetric?
( g ) Find the expected value of all sample means of atomic weights from the
frequency distribution table you developed ;
( h ) Find the variance of all the sample means of atomic weights from the
frequency distribution table you developed.
Practical 1.20. Consider a sample $y_1, y_2, \ldots, y_n$ and let $\bar{y}_k$ and $s_k^2$ denote the sample
mean and variance, respectively, of the first $k$ observations.
( a ) Show that
$$s_{k+1}^2 = \frac{(k-1)}{k}\, s_k^2 + \frac{1}{k+1}\left(y_{k+1} - \bar{y}_k\right)^2 .$$
( b ) Suppose that a sample of 15 observations has sample mean and sample
standard deviation 12.60 and 0.50, respectively. If the 16th observation of
the data set is 10.2, what are the values of the sample mean and sample
standard deviation for all 16 observations?
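The recurrence in part ( a ) can be checked numerically before attempting a proof. The sketch below is only an illustration: the helper name `update` and the simulated data are assumptions, not part of the text. It applies the recurrence to go from 15 to 16 observations and compares the result with a direct computation using Python's `statistics` module.

```python
import random
from statistics import mean, variance

def update(k, mean_k, var_k, y_new):
    """Mean and variance of k + 1 observations from those of the first k."""
    mean_next = (k * mean_k + y_new) / (k + 1)
    var_next = (k - 1) / k * var_k + (y_new - mean_k) ** 2 / (k + 1)
    return mean_next, var_next

# Hypothetical data, used only to compare the recurrence with a direct computation.
random.seed(1)
data = [random.gauss(12.6, 0.5) for _ in range(16)]

m15, v15 = mean(data[:15]), variance(data[:15])
m16, v16 = update(15, m15, v15, data[15])
print(m16, mean(data))
print(v16, variance(data))

# The same helper applies to part ( b ): pass the variance 0.50 ** 2, not the s.d.
m_b, v_b = update(15, 12.60, 0.50 ** 2, 10.2)
```

The helper works with the variance, so the standard deviation for part ( b ) is the square root of `v_b`.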
2. SIMPLE RANDOM SAMPLING
2.0 INTRODUCTION
Simple Random Sampling (SRS) is the simplest and most common method of
selecting a sample, in which the sample is selected unit by unit, with equal
probability of selection for each unit at each draw. In other words, simple random
sampling is a method of selecting a sample $s$ of $n$ units from a population $\Omega$ of
size $N$ by giving equal probability of selection to all units. It is a sampling scheme
in which all possible combinations of $n$ units that may be formed from the population
of $N$ units have the same chance of selection.
As discussed in Chapter 1:
( a ) If a unit is selected, observed, and replaced in the population before the next
draw is made, and the procedure is repeated $n$ times, it gives rise to a simple
random sample of $n$ units. This procedure is known as simple random sampling
with replacement and is denoted by SRSWR.
( b ) If a unit is selected, observed, and not replaced in the population before
making the next draw, and the procedure is repeated until $n$ distinct units are
selected, ignoring all repetitions, it is called simple random sampling without
replacement and is denoted by SRSWOR.
Let us discuss the properties of the
estimators of population mean, variance, and proportion in each of these cases.
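The two selection schemes can be sketched in a few lines of Python. This is a minimal illustration using the standard `random` module in place of the random number table; the function names `srswr` and `srswor`, the seed, and the choice of 69 unit labels are assumptions made for the example.

```python
import random

def srswr(population, n, rng):
    """Draw n units one at a time, replacing each unit before the next draw."""
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, rng):
    """Draw n distinct units; a selected unit is not returned to the population."""
    return rng.sample(population, n)

rng = random.Random(2024)          # fixed seed so that the run is reproducible
labels = list(range(1, 70))        # unit labels 1..69 (N = 69 is only an illustration)

wr_sample = srswr(labels, 6, rng)    # repeated labels are possible
wor_sample = srswor(labels, 6, rng)  # all labels are distinct
print(wr_sample)
print(wor_sample)
```

Under SRSWR the same label may appear more than once in the sample, whereas under SRSWOR every label appears at most once.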
$$E(\bar{y}_n) = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(y_i). \tag{2.1.1}$$
Now $y_i$ is a random variable and each unit has been selected by SRSWR sampling,
therefore $y_i$ can take the values $Y_1, Y_2, \ldots, Y_N$ with probabilities $1/N, 1/N, \ldots, 1/N$. By
the definition of the expected value we have
$$E(y_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.$$
Thus (2.1.1) implies
$$E(\bar{y}_n) = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i\right] = \frac{1}{n}\sum_{i=1}^{n}\bar{Y} = \bar{Y}.$$
(2.1.2)
Proof. We have
$$E(\hat{Y}) = E[N\bar{y}_n] = N E(\bar{y}_n) = N\bar{Y} = Y \tag{2.1.3}$$
which proves the corollary.
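Theorem 2.1.2. The variance of the estimator $\bar{y}_n$ of the population mean $\bar{Y}$ is
$$V(\bar{y}_n) = \frac{1}{n}\sigma_y^2, \quad \text{where } \sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right]. \tag{2.1.4}$$
Proof. Because the $n$ draws are independent under SRSWR sampling, we have
$$V(\bar{y}_n) = V\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(y_i). \tag{2.1.5}$$
By the definition of variance we have
$$V(y_i) = E[y_i - E(y_i)]^2 = E(y_i^2) - \{E(y_i)\}^2 = \frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \sigma_y^2.$$
Using (2.1.5) we have $V(\bar{y}_n) = \sigma_y^2/n$. Hence the theorem.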
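For a small population the two results above can be verified by brute force. The sketch below is an illustration only: the population values are hypothetical and exact rational arithmetic is used so that equality can be tested without rounding. It lists every ordered SRSWR sample of size $n = 2$, each with probability $1/N^n$, and compares the mean and variance of the resulting sample means with $\bar{Y}$ and $\sigma_y^2/n$.

```python
from fractions import Fraction
from itertools import product

Y = [10, 15, 20, 25]               # hypothetical population values
N, n = len(Y), 2

pop_mean = Fraction(sum(Y), N)
sigma2 = sum((y - pop_mean) ** 2 for y in Y) / N   # divisor N, as in (2.1.4)

# Under SRSWR every ordered n-tuple is a possible sample with probability 1 / N**n.
means = [Fraction(sum(s), n) for s in product(Y, repeat=n)]
e_mean = sum(means) / len(means)
v_mean = sum((m - e_mean) ** 2 for m in means) / len(means)

print(len(means), e_mean, pop_mean)
print(v_mean, sigma2 / n)
```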
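where
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_n)^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right].$$
Note that
$$E(\bar{y}_n^2) = V(\bar{y}_n) + \bar{Y}^2 = \frac{\sigma_y^2}{n} + \bar{Y}^2$$
and
$$\begin{aligned}
E(s_y^2) &= E\left[\frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right)\right] = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n} E(y_i^2) - E(\bar{y}_n^2)\right] \\
&= \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{N}\sum_{i=1}^{N} Y_i^2\right) - \left(\frac{\sigma_y^2}{n} + \bar{Y}^2\right)\right] \\
&= \frac{n}{n-1}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 - \frac{\sigma_y^2}{n}\right] = \frac{n}{n-1}\left[\sigma_y^2 - \frac{\sigma_y^2}{n}\right] = \sigma_y^2.
\end{aligned}$$
Thus
$$E[v(\bar{y}_n)] = \frac{1}{n}E(s_y^2) = \frac{\sigma_y^2}{n} = V(\bar{y}_n).$$
Hence the theorem.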
Corollary 2.1.2. The variance of the estimator $\hat{Y} = N\bar{y}_n$ of the population total is
$$V(\hat{Y}) = N^2 V(\bar{y}_n).$$
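Theorem 2.1.4. Under SRSWR sampling, while estimating the population mean (or
total), the minimum sample size for which the relative standard error (RSE) is at
most $\phi$ is given by
$$n \ge \frac{\sigma_y^2}{\phi^2\bar{Y}^2}. \tag{2.1.8}$$
Proof. The relative standard error of the estimator $\bar{y}_n$ is given by
$$RSE(\bar{y}_n) = \sqrt{V(\bar{y}_n)/\bar{Y}^2} = \sqrt{\frac{\sigma_y^2}{n\bar{Y}^2}}.$$
We need an estimator $\bar{y}_n$ such that $RSE(\bar{y}_n) \le \phi$, which implies that
$$\sqrt{\frac{\sigma_y^2}{n\bar{Y}^2}} \le \phi, \quad\text{or}\quad \frac{\sigma_y^2}{n\bar{Y}^2} \le \phi^2, \quad\text{or}\quad n \ge \frac{\sigma_y^2}{\phi^2\bar{Y}^2}.$$
Hence the theorem.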
Remark 2.1: If $\phi = e/Z_{\alpha/2}$, with $e = Z_{\alpha/2}\,\sigma_y/(\bar{Y}\sqrt{n})$, then $P\left[\left|(\bar{y}_n - \bar{Y})/\bar{Y}\right| \le e\right] = 1 - \alpha$.
Example 2.1.1. In 1995, a fisherman selected an SRSWR sample of six kinds of
fish out of 69 kinds of fish available at the Atlantic and Gulf Coasts as given below:

Sample unit        $y_i$        $y_i^2$
1                  13859    192071881
2                   3489     12173121
3                   2319      5377761
4                   3688     13601344
5                  16238    263672644
6                   3688     13601344
Sum                43281    500498095

$$v(\bar{y}_n) = \frac{s_y^2}{n} = \frac{37658120.3}{6} = 6276353.38.$$
Using Table 2 from the Appendix the 95% confidence interval for the average
number of fish is given by
$\bar{y}_n \pm t_{0.05/2}(df = 6-1)\sqrt{v(\bar{y}_n)}$, or $7213.5 \pm 2.571\sqrt{6276353.38}$, or $[772.46, 13654.53]$.
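The arithmetic of this example can be reproduced as follows. This is a sketch using the six sampled values from the table and the tabulated value 2.571 quoted in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

y = [13859, 3489, 2319, 3688, 16238, 3688]   # sampled values (unit 4 is drawn twice)
n = len(y)
t_crit = 2.571                                # t value for df = 5 quoted in the example

y_bar = mean(y)
s2 = variance(y)              # sample variance with divisor n - 1
v_hat = s2 / n                # SRSWR: no finite population correction
half_width = t_crit * sqrt(v_hat)
ci = (y_bar - half_width, y_bar + half_width)

print(y_bar, round(s2, 1), round(v_hat, 2))
print(tuple(round(limit, 2) for limit in ci))
```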
Example 2.1.2. We wish to estimate the average number of fish in each one of the
species groups caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There are 69 species groups caught during 1995 as shown in the population
4 in the Appendix. What is the minimum number of species groups to be selected
by SRSWR sampling to attain the accuracy of relative standard error 30%?
Given: $S_y^2 = 37199578$ and $Y = 311528$.
Thus a sample of size n = 20 units is required to attain 30% relative standard error
of the estimator of population mean under SRSWR sampling.
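The sample size quoted above can be reproduced from the bound $n \ge \sigma_y^2/(\phi^2\bar{Y}^2)$. The sketch below makes two assumptions that the example does not state explicitly: the population variance is obtained from the given mean square as $\sigma_y^2 = (N-1)S_y^2/N$, and the bound is rounded up to the next integer. The function name `min_n_srswr` is illustrative.

```python
from math import ceil

def min_n_srswr(sigma2, pop_mean, phi):
    """Smallest integer n with sigma2 / (n * pop_mean**2) <= phi**2."""
    return ceil(sigma2 / (phi ** 2 * pop_mean ** 2))

N = 69
S2 = 37199578            # population mean square S_y^2 given in the example
total = 311528           # population total Y given in the example

sigma2 = (N - 1) / N * S2          # assumed conversion from S_y^2 to sigma_y^2
n_min = min_n_srswr(sigma2, total / N, phi=0.30)
print(n_min)
```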
Example 2.1.3. Select an SRSWR sample of twenty units from population 4 given
in the Appendix . Collect the information on the number of fish during 1995 in each
of the species group selected in the sample. Estimate the average number of fish in
each one of the species groups caught by marine recreational fishermen at Atlantic
and Gulf coasts during 1995. Construct the 95% confidence interval for the average
number of fish in each species group available in the United States.
Solution. The population size is N = 69, thus we used the first two columns of the
Pseudo-Random Number (PRN) Table 1 given in the Appendix to select 20 random
numbers between 1 and 69. The random numbers so selected are 58, 60, 54, 01, 69,
62, 23, 64, 46, 04, 32, 47, 57, 56, 57, 60, 33, 05, 22 and 38.
Example 2.1.4. The depth y of the roots of plants in a field is uniformly distributed
between 5 cm and 8 cm with the probability density function
$$f(y) = \frac{1}{3} \quad \forall\ 5 < y < 8.$$
We wish to estimate the average length of roots of the plants with an accuracy of
relative standard error of 5%. What is the required minimum with replacement
sample size n?
Example 2.1.5. The depth y of the roots of plants in a field is uniformly distributed
between 5 cm and 8 cm with the probability density function
$$f(y) = \frac{1}{3} \quad \forall\ 5 < y < 8.$$
A $(1 - \alpha)100\%$ confidence interval for the average depth of roots in the field is
$$\bar{y}_n \mp t_{\alpha/2}(df = n-1)\sqrt{v(\bar{y}_n)}.$$
Using Table 2 from the Appendix the 95% confidence interval estimate of the
average depth of the roots is given by
$\bar{y}_n \mp t_{0.025}(df = 7-1)\sqrt{v(\bar{y}_n)}$, or $6.8711 \mp 2.447\sqrt{0.1309}$, or $[5.9857, 7.756]$.
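Theorem 2.1.5. The covariance between two sample means $\bar{y}_n$ and $\bar{x}_n$ under
SRSWR sampling is:
$$Cov(\bar{y}_n, \bar{x}_n) = \frac{\sigma_{xy}}{n}, \tag{2.1.10}$$
where
$$\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}).$$
Proof. Because different draws are independent under SRSWR sampling, the cross
terms $Cov(y_i, x_j)$, $i \ne j$, vanish and we have
$$Cov(\bar{y}_n, \bar{x}_n) = Cov\left(\frac{1}{n}\sum_{i=1}^{n} y_i, \frac{1}{n}\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Cov(y_i, x_i). \tag{2.1.11}$$
Now
$$Cov(y_i, x_i) = E(y_i x_i) - E(y_i)E(x_i). \tag{2.1.12}$$
The random variables $(y_i x_i)$, $y_i$ and $x_i$, respectively, can take any one of the
values $(Y_i X_i)$, $Y_i$ and $X_i$ for $i = 1, 2, \ldots, N$ with probability $1/N$. Thus we have
$$Cov(y_i, x_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i X_i - \bar{Y}\bar{X} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}) = \sigma_{xy}.$$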
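For the sample covariance $s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i x_i - n\bar{y}_n\bar{x}_n\right]$ we have
$$\begin{aligned}
E(s_{xy}) &= \frac{1}{n-1}\left\{\sum_{i=1}^{n} E(y_i x_i) - nE(\bar{y}_n\bar{x}_n)\right\} = \frac{1}{n-1}\left\{\sum_{i=1}^{n}\frac{1}{N}\sum_{i=1}^{N} Y_i X_i - n\left(Cov(\bar{y}_n, \bar{x}_n) + \bar{Y}\bar{X}\right)\right\} \\
&= \frac{1}{n-1}\left\{\frac{n}{N}\sum_{i=1}^{N} Y_i X_i - n\left(\frac{\sigma_{xy}}{n} + \bar{Y}\bar{X}\right)\right\} = \frac{n}{n-1}\left\{\frac{1}{N}\sum_{i=1}^{N} Y_i X_i - \bar{Y}\bar{X} - \frac{\sigma_{xy}}{n}\right\} \\
&= \frac{n}{n-1}\left\{\sigma_{xy} - \frac{\sigma_{xy}}{n}\right\} = \sigma_{xy}.
\end{aligned}$$
Hence the theorem.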
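Theorem 2.2.1. The sample mean $\bar{y}_n = n^{-1}\sum_{i=1}^{n} y_i$ is an unbiased estimator of the
population mean $\bar{Y} = N^{-1}\sum_{i=1}^{N} Y_i$.
Proof. We have to show that $E(\bar{y}_n) = \bar{Y}$. It is interesting to note that this result can
be proved by using three different methods as shown below.
Method I. Write the sample mean as
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{N} t_i Y_i, \tag{2.2.1}$$
where $t_i = 1$ if the $i$th population unit is included in the sample, and $t_i = 0$ otherwise.
Note that $Y_i$ is a fixed value in the population for the $i$th unit, therefore, the
expected value of (2.2.1) is given by
$$E(\bar{y}_n) = \frac{1}{n}\sum_{i=1}^{N} Y_i E(t_i).$$
Note that ${}^{(N-1)}C_{(n-1)}$ is the number of samples in which a given population unit can
occur out of all ${}^{N}C_{n}$ SRSWOR samples, and therefore the probability that the $i$th
population unit is selected in the sample is ${}^{(N-1)}C_{(n-1)}/{}^{N}C_{n} = n/N$. So the random
variable $t_i$ takes the value 1 with probability $n/N$ and 0 with probability $(1 - n/N)$.
Thus the expected value of $t_i$ is
$$E(t_i) = 1 \times \frac{n}{N} + 0 \times \left(1 - \frac{n}{N}\right) = \frac{n}{N}, \quad\text{so that}\quad E(\bar{y}_n) = \frac{1}{n}\sum_{i=1}^{N} Y_i\,\frac{n}{N} = \bar{Y}.$$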
Method II. We can also prove the same result as follows.
In (2.2.4) the sample value $y_i$ is a random variable and can take any population
value $Y_i$, $i = 1, 2, \ldots, N$, with probability $1/N$.
Thus we have
$$E(y_i) = \frac{1}{N}\sum_{i=1}^{N} Y_i = \bar{Y}.$$
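Method III. To prove the above result by another method, let us consider
$(\bar{y}_n)_t$ = sample mean based on the $t$th sample selected from the population.
Note that there are ${}^{N}C_{n}$ possible samples, so the probability of selecting the $t$th sample
is
$$p_t = 1/({}^{N}C_{n}).$$
By the definition of expected value, we have
$$E(\bar{y}_n) = \sum_{t=1}^{{}^{N}C_{n}} p_t(\bar{y}_n)_t = \frac{1}{n({}^{N}C_{n})}\sum_{t=1}^{{}^{N}C_{n}}\left(\sum_{i=1}^{n} y_i\right)_t.$$
For example, for a population of N = 4 units A, B, C and D and samples of n = 2
units, the possible samples are:

Sample no.             1              2              3              4              5              6
Sampled units        (A, B)         (A, C)         (A, D)         (B, C)         (B, D)         (C, D)
Population units   $(Y_1, Y_2)$   $(Y_1, Y_3)$   $(Y_1, Y_4)$   $(Y_2, Y_3)$   $(Y_2, Y_4)$   $(Y_3, Y_4)$

The values of the units in the sample in all these cases are $y_1$ and $y_2$.
Thus we have
$$\begin{aligned}
\sum_{t=1}^{6}\left(\sum_{i=1}^{2} y_i\right)_t &= (y_1 + y_2)_1 + (y_1 + y_2)_2 + (y_1 + y_2)_3 + (y_1 + y_2)_4 + (y_1 + y_2)_5 + (y_1 + y_2)_6 \\
&= (Y_1 + Y_2) + (Y_1 + Y_3) + (Y_1 + Y_4) + (Y_2 + Y_3) + (Y_2 + Y_4) + (Y_3 + Y_4) \\
&= \sum_{i=1}^{N}\left\{{}^{(N-1)}C_{(n-1)}\right\} Y_i.
\end{aligned}$$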
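The counting argument of Method III can be checked by enumeration. In the sketch below the four values attached to the units A, B, C and D are hypothetical; only the structure (N = 4, n = 2) follows the illustration above. The code counts how often each unit occurs across all samples and confirms that the average of the sample means equals the population mean.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

units = {"A": 3, "B": 7, "C": 8, "D": 14}    # hypothetical values Y_1, ..., Y_4
N, n = len(units), 2

samples = list(combinations(units, n))        # (A, B), (A, C), ..., (C, D)
appearances = {u: sum(u in s for s in samples) for u in units}

# Each sample has probability 1 / C(N, n), so E(sample mean) is a plain average.
sample_means = [Fraction(sum(units[u] for u in s), n) for s in samples]
expected_mean = sum(sample_means) / len(samples)
pop_mean = Fraction(sum(units.values()), N)

print(samples)
print(appearances, comb(N - 1, n - 1))
print(expected_mean, pop_mean)
```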
Theorem 2.2.2. The probability for any population unit to get selected in the
sample at any particular draw is equal to the inverse of the population size, that is,
$$\text{Probability of selecting the } i\text{th unit at any particular draw} = \frac{1}{N}. \tag{2.2.5}$$
Proof. Let us consider that at the $r$th draw, the $i$th population unit $Y_i$ is selected. This
is possible only if this unit has not been selected in the previous $(r-1)$ draws. Let
us now consider the draws one by one.
First draw: The probability for the particular unit $Y_i$ to get selected on the first
draw out of N units is $1/N$. Note that the probability that $Y_i$ is not selected on
the first draw, from a population of N units, is $\{1 - 1/N\} = (N-1)/N$.
Second draw: The probability that a particular unit is selected on the second draw
(if it is not already selected on the first draw) is the product of two probabilities,
namely
(Probability that $Y_i$ is not selected on the first draw) x (Probability that $Y_i$ is
selected on the second draw).
Therefore the probability that $Y_i$ is selected on the second draw is equal to
$$\frac{(N-1)}{N} \times \frac{1}{(N-1)} = \frac{1}{N}.$$
Note that the probability that $Y_i$ is not selected on the second draw out of the
remaining $(N-1)$ population units is equal to
$$1 - \frac{1}{(N-1)} = \frac{N-2}{N-1}.$$
Third draw: The probability that a particular population unit is selected on the third
draw (if it is not selected on the first or second draw) is a product of three
probabilities:
(Probability that $Y_i$ is not selected on first draw) x (Probability that $Y_i$ is not
selected on second draw) x (Probability that $Y_i$ is selected on the third draw)
$$= \left(\frac{N-1}{N}\right) \times \left(\frac{N-2}{N-1}\right) \times \frac{1}{N-2} = \frac{1}{N}.$$
Note that the probability that $Y_i$ is not selected on the third draw out of $(N-2)$
population units is equal to
$$1 - \frac{1}{N-2} = \frac{N-3}{N-2}.$$
This procedure continues up to $(r-1)$ draws.
$r$th draw: The probability that $Y_i$ is not selected up to the $(r-1)$th draw is given by
$$\frac{(N-1)}{N} \times \frac{(N-2)}{(N-1)} \times \cdots \times \frac{N-(r-1)}{N-(r-2)} = \frac{N-r+1}{N}.$$
The probability that $Y_i$ is selected at the $r$th draw [assuming that it is not selected at any of
the previous $(r-1)$ draws] is equal to
$$\frac{1}{N-(r-1)} = \frac{1}{N-r+1}.$$
So the probability of a particular unit $Y_i$ being selected at the $r$th draw is
$$\frac{(N-r+1)}{N} \times \frac{1}{(N-r+1)} = \frac{1}{N}.$$
Hence the theorem.
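The telescoping product in the proof can be evaluated exactly for any draw. The sketch below uses rational arithmetic; the function name `prob_selected_at_draw` and the choice N = 10 are assumptions made for illustration.

```python
from fractions import Fraction

def prob_selected_at_draw(N, r):
    """P(a given unit is selected at draw r) under SRSWOR from N units."""
    not_before = Fraction(1)
    for k in range(r - 1):                       # draws 1, ..., r - 1
        not_before *= Fraction(N - k - 1, N - k) # unit escapes selection at this draw
    return not_before * Fraction(1, N - (r - 1)) # selected among the N - r + 1 left

probs = [prob_selected_at_draw(10, r) for r in range(1, 11)]
print(probs)
```

Every entry equals 1/N, and because the unit must be picked at exactly one of the N draws when the whole population is exhausted, the probabilities sum to one.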
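$$V(\bar{y}_n) = \left(\frac{1-f}{n}\right)S_y^2,$$
where $S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$ and $f = n/N$ denotes the sampling fraction, so that
$(1 - f)$ is the finite population correction factor (f.p.c.).
Proof. We have
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{N} t_i Y_i$$
where $t_i$ is a random variable that takes the value '1' if the $i$th unit is included in the
sample, otherwise it takes the value 0. Noting that $Y_i$ is fixed for the $i$th unit in the
population, we have
$$V(\bar{y}_n) = \frac{1}{n^2}\left[\sum_{i=1}^{N} Y_i^2\, V(t_i) + \sum_{i \ne j = 1}^{N} Y_i Y_j\, Cov(t_i, t_j)\right]. \tag{2.2.8}$$
In (2.2.8) we need to determine $V(t_i)$ and $Cov(t_i, t_j)$. Note that the distributions
of $t_i$ and $t_i^2$ are
$$t_i = \begin{cases} 1 & \text{with probability } n/N, \\ 0 & \text{with probability } 1 - (n/N), \end{cases} \quad\text{and}\quad t_i^2 = \begin{cases} 1 & \text{with probability } n/N, \\ 0 & \text{with probability } 1 - (n/N). \end{cases}$$
We have
$$V(t_i) = E[t_i - E(t_i)]^2 = E(t_i^2) - \{E(t_i)\}^2 = \frac{n}{N} - \left(\frac{n}{N}\right)^2 = \frac{n(N-n)}{N^2}. \tag{2.2.9}$$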
The probability that both the $i$th and $j$th units are included in the sample is
${}^{(N-2)}C_{(n-2)}/{}^{N}C_{n} = \frac{n(n-1)}{N(N-1)}$, and otherwise the probability is $1 - \frac{n(n-1)}{N(N-1)}$; therefore
$$t_i t_j = \begin{cases} 1 & \text{with probability } \frac{n(n-1)}{N(N-1)}, \\ 0 & \text{with probability } 1 - \frac{n(n-1)}{N(N-1)}. \end{cases}$$
Now we have
$$Cov(t_i, t_j) = E(t_i t_j) - E(t_i)E(t_j) = \frac{n(n-1)}{N(N-1)} - \left(\frac{n}{N}\right)\left(\frac{n}{N}\right) = \frac{n}{N}\left[\frac{n-1}{N-1} - \frac{n}{N}\right] = -\frac{n(N-n)}{N^2(N-1)}.$$
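$$V(\bar{y}_n) = \frac{1}{n^2}\left[\frac{1}{N^2}\sum_{i=1}^{N} Y_i^2\{n(N-n)\} + \frac{1}{N^2(N-1)}\sum_{i \ne j = 1}^{N} Y_i Y_j\{-n(N-n)\}\right] = \frac{(N-n)}{nN^2}\left[\sum_{i=1}^{N} Y_i^2 - \frac{1}{(N-1)}\sum_{i \ne j = 1}^{N} Y_i Y_j\right]. \tag{2.2.11}$$
Note that
$$\bar{Y}^2 = \left(\frac{1}{N}\sum_{i=1}^{N} Y_i\right)^2 = \frac{1}{N^2}\left[\sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j = 1}^{N} Y_i Y_j\right], \tag{2.2.12}$$
we obtain
$$\sum_{i \ne j = 1}^{N} Y_i Y_j = N^2\bar{Y}^2 - \sum_{i=1}^{N} Y_i^2.$$
On substituting (2.2.12) in (2.2.11) we obtain
$$\begin{aligned}
V(\bar{y}_n) &= \frac{(N-n)}{nN^2}\left[\sum_{i=1}^{N} Y_i^2 - \frac{1}{(N-1)}\left(N^2\bar{Y}^2 - \sum_{i=1}^{N} Y_i^2\right)\right] \\
&= \frac{(N-n)}{nN^2}\left[\left(1 + \frac{1}{N-1}\right)\sum_{i=1}^{N} Y_i^2 - \frac{N^2}{N-1}\bar{Y}^2\right] \\
&= \frac{(N-n)}{nN}\cdot\frac{1}{N-1}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right] = \left(\frac{1-f}{n}\right)S_y^2,
\end{aligned}$$
where
$$S_y^2 = \frac{1}{(N-1)}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right].$$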
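The variance formula can be confirmed by enumerating every SRSWOR sample of a small population. The sketch below uses the five values 10, 15, 20, 25 and 30 with n = 3 and exact rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

Y = [10, 15, 20, 25, 30]
N, n = len(Y), 3

pop_mean = Fraction(sum(Y), N)
S2 = sum((y - pop_mean) ** 2 for y in Y) / (N - 1)   # population mean square

# All C(5, 3) = 10 SRSWOR samples are equally likely.
means = [Fraction(sum(s), n) for s in combinations(Y, n)]
e_mean = sum(means) / len(means)
v_mean = sum((m - e_mean) ** 2 for m in means) / len(means)

f = Fraction(n, N)                                   # sampling fraction
print(e_mean, pop_mean)
print(v_mean, (1 - f) / n * S2)
```

The variance computed from its definition over all samples coincides with $(1-f)S_y^2/n$, which is the same quantity as $\frac{N-n}{Nn}S_y^2$.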
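Now
$$E(s_y^2) = E\left[\frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right\}\right] = \frac{1}{n-1}\left[E\left(\sum_{i=1}^{n} y_i^2\right) - nE(\bar{y}_n^2)\right] = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n} E(y_i^2) - E(\bar{y}_n^2)\right]. \tag{2.2.15}$$
Note that $E(\bar{y}_n^2) = V(\bar{y}_n) + \{E(\bar{y}_n)\}^2 = \frac{N-n}{Nn}S_y^2 + \bar{Y}^2$ and each unit $Y_i$ becomes selected
with probability $1/N$, thus (2.2.15) becomes
$$E(s_y^2) = \frac{n}{n-1}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i^2 - \bar{Y}^2 - \frac{N-n}{Nn}S_y^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S_y^2 - \frac{N-n}{Nn}S_y^2\right] = S_y^2.$$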
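Theorem 2.2.5. Under SRSWOR sampling, while estimating the population mean (or
total), the minimum sample size for which the relative standard error (RSE) is at
most $\phi$ is
$$n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1}. \tag{2.2.17}$$
Note that we need an estimator $\bar{y}_n$ such that $RSE(\bar{y}_n) \le \phi$, which implies that
$$\sqrt{\left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2}} \le \phi, \quad\text{or}\quad \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2} \le \phi^2, \quad\text{or}\quad n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1}.$$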
Remark 2.2: If $\phi = e/Z_{\alpha/2}$, with $e = Z_{\alpha/2}\sqrt{(1-f)/n}\; S_y/\bar{Y}$, then $P\left[\left|(\bar{y}_n - \bar{Y})/\bar{Y}\right| \le e\right] = 1 - \alpha$.
Example 2.2.2. A fishermen recruiting company, XYZ, selected an SRSWOR
sample of six kinds of fish out of 69 kinds of fish available at the Atlantic and Gulf
Coasts as below:

Sample unit        $y_i$        $y_i^2$
1                  16855    284091025
2                  10940    119683600
3                   4793     22972849
4                   2146      4605316
5                   3816     14561856
6                    935       874225
Sum                39485    446788871

$$s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n^{-1}\left(\sum_{i=1}^{n} y_i\right)^2\right] = \frac{1}{6-1}\left[446788871 - \frac{(39485)^2}{6}\right] = 37388933.4.$$
Thus
$$v(\bar{y}_n) = \left(\frac{1-f}{n}\right)s_y^2 = \left(\frac{1 - 0.0869}{6}\right) \times 37388933.4 = 5689972.5.$$
Using Table 2 from the Appendix the 95% confidence interval for the average
number of fish is given by
$\bar{y}_n \pm t_{0.025}(df = 6-1)\sqrt{v(\bar{y}_n)}$, or $6580.83 \pm 2.571\sqrt{5689972.5}$, or $[448.05, 12713.61]$.
( c ) An estimate of the total number of fish is given by
$$\hat{Y} = N\bar{y}_n = 69 \times 6580.83 = 454077.27.$$
( d ) The 95% confidence interval for the total number of fish is given by
$N \times [448.05, 12713.61]$, or $69 \times [448.05, 12713.61]$, or $[30915.45, 877239.09]$.
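The computations in this example can be reproduced with the short sketch below, which uses the six sampled values from the table and the tabulated value 2.571. The variable names are illustrative. The code uses the unrounded sampling fraction 6/69, so its variance estimate differs from the printed value (which uses $f = 0.0869$) in the fifth significant figure, and the interval limits move by a fraction of a unit.

```python
from math import sqrt
from statistics import mean, variance

y = [16855, 10940, 4793, 2146, 3816, 935]
N, n = 69, len(y)
t_crit = 2.571                       # t value for df = 5 quoted in the example

y_bar = mean(y)
s2 = variance(y)                     # divisor n - 1
v_hat = (1 - n / N) / n * s2         # finite population correction applied
half = t_crit * sqrt(v_hat)

ci_mean = (y_bar - half, y_bar + half)
total_hat = N * y_bar                # estimate of the population total
ci_total = (N * ci_mean[0], N * ci_mean[1])

print(round(y_bar, 2), round(s2, 1), round(v_hat, 1))
print(ci_mean, total_hat, ci_total)
```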
Example 2.2.3. We wish to estimate the average number of fish in each one of the
species groups caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There were 69 species groups caught during 1995 as shown in
population 4 in the Appendix. What is the minimum number of species groups to be
selected by SRSWOR sampling to attain the accuracy of relative standard error
30%?
Given: $S_y^2 = 37199578$ and $Y = 311528$.
$$n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1} = \left[\frac{1}{69} + \frac{0.3^2 \times 4514.898^2}{37199578}\right]^{-1} = 15.67 \approx 16.$$
Thus a minimum sample of size n = 16 units is required to attain 30% relative
standard error of the estimator of population total or mean under SRSWOR
sampling.
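The bound used in this example is easy to evaluate directly. The following sketch assumes that the bound is rounded up to the next integer; the function name `min_n_srswor` is illustrative.

```python
from math import ceil

def min_n_srswor(N, S2, pop_mean, phi):
    """Smallest integer n with (1/n - 1/N) * S2 / pop_mean**2 <= phi**2."""
    return ceil(1 / (1 / N + phi ** 2 * pop_mean ** 2 / S2))

N = 69
n_min = min_n_srswor(N, S2=37199578, pop_mean=311528 / N, phi=0.30)
print(n_min)
```

Because of the finite population correction, the required size is smaller than under SRSWR for the same population and the same $\phi$.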
Solution. The population size is N = 69, therefore we used the second and third
columns of the Pseudo-Random Number (PRN) Table 1 given in the Appendix to
select 16 random numbers between 1 and 69. The random numbers so selected are
01, 49, 25, 14, 20, 36, 42, 44, 65, 26, 40, 66, 17, 08, 33, and 53.

Random No.   Species group                $y_i$    $(y_i - \bar{y}_n)$    $(y_i - \bar{y}_n)^2$
01           Sharks, other                2016        -937.8130         879492.2852
08           Toadfishes                   1632       -1321.8100        1747188.2850
14           Sculpins                       71       -2882.8100        8310607.9100
17           Temperate basses, other        23       -2930.8100        8589661.9100
20           Sea basses, other            2068        -885.8130         784663.7852
25           Florida pompano               644       -2309.8100        5335233.7850
26           Jacks, other                 1625       -1328.8100        1765742.6600
33           Snappers, other               492       -2461.8100        6060520.7850
36           Grunts, other                3379         425.1875         180784.4102
40           Red porgy                     230       -2723.8100        7419154.5350
42           Spotted seatrout            24615       21661.1900      469207043.9000
44           Sand seatrout                4355        1401.1880        1963326.4100
49           Black drum                   1595       -1358.8100        1846371.4100
53           Barracuda                     908       -2045.8100        4185348.7850
65           Winter flounder              2324        -629.8130         396663.7852
66           Flounders, other             1284       -1669.8100        2788273.7850
             Sum                         47261           0.0000      521460078.4000

An estimate of the average number of fish in each species group during 1995 is
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{47261}{16} = 2953.813.$$
Now
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_n)^2 = \frac{521460078.4}{16-1} = 34764005.23$$
and the estimate of variance of the estimator $\bar{y}_n$ is
$$v(\bar{y}_n) = \left(\frac{1-f}{n}\right)s_y^2 = \left(\frac{1 - 16/69}{16}\right) \times 34764005.23 = 1668924.16.$$
A $(1 - \alpha)100\%$ confidence interval for the average number of fish in each one of the
species groups caught during 1995 by marine recreational fishermen in the United
States is
$$\bar{y}_n \mp t_{\alpha/2}(df = n-1)\sqrt{v(\bar{y}_n)}.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
Example 2.2.5. The distribution of yield (kg/ha) y of a crop in 1000 plots has a
Cauchy distribution:
$$f(y) = \frac{1}{\pi\{1 + (y - 10)^2\}}, \quad -\infty < y < +\infty.$$
We wish to estimate the average yield with an accuracy of relative standard error of
0.15%. What is the minimum sample size n required while using SRSWOR
sampling?
Solution. Since the mean and variance of a variable having a Cauchy distribution
do not exist, it is not possible to find the required sample size under such
a distribution.
Example 2.2.6. The distribution of yield (kg/ha) y of a crop in 1000 plots has a
logistic distribution
$$f(y) = \frac{1}{4\beta}\operatorname{sech}^2\left\{\frac{1}{2}\left(\frac{y - \alpha}{\beta}\right)\right\}$$
with $\alpha = 40$ and $\beta = 2.5$.
( a ) Find the value of the minimum sample size n required to estimate the average yield
with an accuracy of relative standard error of 5%.
( b ) Select a sample of the required size and construct a 95% confidence interval for
the average yield.
( c ) Does the true average yield lie in the 95% confidence interval?
Solution. ( a ) We know that the mean and variance of a logistic distribution are
given by
$$\text{Mean} = \alpha = 40$$
and
$$\text{Variance} = \sigma_y^2 = \frac{\beta^2\pi^2}{3} = \frac{2.5^2 \times 3.14159^2}{3} = 20.56.$$
Also we are given N = 1000, thus
$$S_y^2 = \frac{N}{N-1}\sigma_y^2 = \frac{1000}{1000-1} \times 20.56 = 20.5806.$$
Thus the minimum sample size required for $\phi = 0.05$ is given by
$$n \ge \left[\frac{1}{N} + \frac{\phi^2\bar{Y}^2}{S_y^2}\right]^{-1} = \left[\frac{1}{1000} + \frac{0.05^2 \times 40^2}{20.5806}\right]^{-1} = 5.11 \approx 5.$$
We know that the cumulative distribution function for the logistic distribution is
$$F(y) = \left[1 + \exp\{-(y - \alpha)/\beta\}\right]^{-1}.$$
Using the last three columns of the Pseudo-Random Number (PRN) Table 1 given
in the Appendix, multiplied by $10^{-3}$, we obtain five values of F(y) and the
corresponding values of y as given below:
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{195.375}{5} = 39.075.$$
We use the alternative method to find $s_y^2$ given by
$$s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right] = \frac{1}{5-1}\left[7661.202 - 5 \times 39.075^2\right] = 6.7309.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
( c ) Yes, the resultant 95% confidence interval estimate contains the true average
yield $\alpha = 40$.
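The step from a value of F(y) to the corresponding y is an inverse-transform calculation. For the logistic distribution with the density given in this example, solving $F(y) = u$ gives $y = \alpha + \beta\ln\{u/(1-u)\}$. The sketch below illustrates this; the function name `logistic_quantile` and the three values of u are hypothetical and are not the random numbers used in the example.

```python
from math import log

def logistic_quantile(u, alpha, beta):
    """Solve F(y) = u for y, where F(y) = 1 / (1 + exp(-(y - alpha) / beta))."""
    return alpha + beta * log(u / (1 - u))

alpha, beta = 40, 2.5
u_values = [0.5, 0.25, 0.9]      # hypothetical three-digit random numbers times 10**-3
y_values = [logistic_quantile(u, alpha, beta) for u in u_values]
print(y_values)
```

A value u = 0.5 maps to the median $\alpha$, values below 0.5 map to yields below $\alpha$, and values above 0.5 map to yields above it.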
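Theorem 2.2.6. The covariance between the two sample means $\bar{y}_n$ and $\bar{x}_n$ under
SRSWOR sampling is:
$$Cov(\bar{y}_n, \bar{x}_n) = \left(\frac{1-f}{n}\right)S_{xy}, \tag{2.2.18}$$
where
$$S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}).$$
Proof. We have
$$Cov(\bar{y}_n, \bar{x}_n) = Cov\left(\frac{1}{n}\sum_{i=1}^{n} y_i, \frac{1}{n}\sum_{i=1}^{n} x_i\right) = Cov\left(\frac{1}{n}\sum_{i=1}^{N} t_i Y_i, \frac{1}{n}\sum_{i=1}^{N} t_i X_i\right) \tag{2.2.19}$$
where $t_i$ is a random variable that takes the value '1' if the $i$th unit is included in the
sample, otherwise it takes the value 0. Note that the pair $Y_i$ and $X_i$ is fixed for the
$i$th unit in the population, thus
$$Cov(\bar{y}_n, \bar{x}_n) = \frac{1}{n^2}Cov\left(\sum_{i=1}^{N} t_i Y_i, \sum_{i=1}^{N} t_i X_i\right) = \frac{1}{n^2}\left[E\left\{\sum_{i=1}^{N} t_i Y_i\right\}\left\{\sum_{i=1}^{N} t_i X_i\right\} - E\left\{\sum_{i=1}^{N} t_i Y_i\right\}E\left\{\sum_{i=1}^{N} t_i X_i\right\}\right]. \tag{2.2.20}$$
In (2.2.20) we need to determine $E(t_i^2)$ and $E(t_i t_j)$. Note that the distributions
of $t_i$ and $t_i^2$ are:
$$t_i = \begin{cases} 1 & \text{with probability } n/N, \\ 0 & \text{with probability } 1 - (n/N), \end{cases} \quad\text{and}\quad t_i^2 = \begin{cases} 1 & \text{with probability } n/N, \\ 0 & \text{with probability } 1 - (n/N). \end{cases}$$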
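The probability that both the $i$th and $j$th units are included in the sample is
${}^{(N-2)}C_{(n-2)}/{}^{N}C_{n} = \frac{n(n-1)}{N(N-1)}$, and otherwise the probability is $1 - \frac{n(n-1)}{N(N-1)}$; therefore
$$t_i t_j = \begin{cases} 1 & \text{with probability } \frac{n(n-1)}{N(N-1)}, \\ 0 & \text{with probability } 1 - \frac{n(n-1)}{N(N-1)}. \end{cases}$$
Now we have
$$E(t_i t_j) = 1 \times \frac{n(n-1)}{N(N-1)} + 0 \times \left\{1 - \frac{n(n-1)}{N(N-1)}\right\} = \frac{n(n-1)}{N(N-1)}. \tag{2.2.22}$$
Thus
$$Cov(\bar{y}_n, \bar{x}_n) = \frac{1}{n^2}\left[\left\{\frac{n}{N}\sum_{i=1}^{N} Y_i X_i + \frac{n(n-1)}{N(N-1)}\sum_{i \ne j = 1}^{N} Y_i X_j\right\} - \left\{\frac{n}{N}\sum_{i=1}^{N} Y_i\right\}\left\{\frac{n}{N}\sum_{i=1}^{N} X_i\right\}\right]. \tag{2.2.23}$$
Note that $\sum_{i \ne j = 1}^{N} Y_i X_j = N^2\bar{Y}\bar{X} - \sum_{i=1}^{N} Y_i X_i$, so that
$$\begin{aligned}
Cov(\bar{y}_n, \bar{x}_n) &= \frac{1}{n}\left[\frac{1}{N}\left\{\frac{N-n}{N-1}\right\}\sum_{i=1}^{N} Y_i X_i - \bar{Y}\bar{X}\left\{\frac{N-n}{N-1}\right\}\right] \\
&= \left(\frac{1-f}{n}\right)\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}) = \left(\frac{1-f}{n}\right)S_{xy}.
\end{aligned}$$
Hence the theorem.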
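As with the variance, the covariance result can be checked by listing every SRSWOR sample. The five $(Y_i, X_i)$ pairs in the sketch below are hypothetical and exact rational arithmetic is used; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

pairs = [(10, 2), (15, 3), (20, 7), (25, 6), (30, 9)]   # hypothetical (Y_i, X_i)
N, n = len(pairs), 2

Y_bar = Fraction(sum(y for y, _ in pairs), N)
X_bar = Fraction(sum(x for _, x in pairs), N)
S_xy = sum((y - Y_bar) * (x - X_bar) for y, x in pairs) / (N - 1)

# Sample means (y-bar, x-bar) for each of the C(5, 2) = 10 equally likely samples.
sample_means = [
    (Fraction(sum(y for y, _ in s), n), Fraction(sum(x for _, x in s), n))
    for s in combinations(pairs, n)
]
# Both sample means are unbiased, so deviations are taken from Y_bar and X_bar.
cov = sum((yb - Y_bar) * (xb - X_bar) for yb, xb in sample_means) / len(sample_means)

f = Fraction(n, N)
print(cov, (1 - f) / n * S_xy)
```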
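$$\widehat{Cov}(\bar{y}_n, \bar{x}_n) = \left(\frac{1-f}{n}\right)s_{xy} \tag{2.2.24}$$
where
$$s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i x_i - n\bar{y}_n\bar{x}_n\right].$$
Proof. We have to show that
$$E[\widehat{Cov}(\bar{y}_n, \bar{x}_n)] = Cov(\bar{y}_n, \bar{x}_n).$$
We have
$$E[\widehat{Cov}(\bar{y}_n, \bar{x}_n)] = E\left[\left(\frac{1-f}{n}\right)s_{xy}\right] = \left(\frac{1-f}{n}\right)E(s_{xy}).$$
Now
$$E(s_{xy}) = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n} E(y_i x_i) - E(\bar{y}_n\bar{x}_n)\right].$$
Now
$$E(\bar{y}_n\bar{x}_n) = Cov(\bar{y}_n, \bar{x}_n) + E(\bar{y}_n)E(\bar{x}_n) = \frac{N-n}{Nn}S_{xy} + \bar{Y}\bar{X},$$
and each pair of units $Y_i$ and $X_i$ gets selected with probability $1/N$, therefore we
have
$$\begin{aligned}
E(s_{xy}) &= \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{N}\sum_{i=1}^{N} Y_i X_i\right) - \left\{\frac{(N-n)}{Nn}S_{xy} + \bar{Y}\bar{X}\right\}\right] \\
&= \frac{n}{n-1}\left[\frac{1}{N}\sum_{i=1}^{N} Y_i X_i - \bar{Y}\bar{X} - \frac{(N-n)}{Nn}S_{xy}\right] \\
&= \frac{n}{n-1}\left[\frac{N-1}{N}S_{xy} - \frac{(N-n)}{Nn}S_{xy}\right] = S_{xy}.
\end{aligned}$$
Thus we obtain
$$E[\widehat{Cov}(\bar{y}_n, \bar{x}_n)] = Cov(\bar{y}_n, \bar{x}_n).$$
Hence the theorem.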
Example 2.2.7. Consider the joint probability density function of two continuous
random variables x and y:
$$f(x, y) = \begin{cases} \frac{2}{3}(x + 2y) & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{otherwise.} \end{cases}$$
( a ) Select six pairs of observations (y, x) by using the Random Number Table
method.
( b ) Estimate the value of covariance between x and y.
Solution. ( a ) See Chapter 1.
( b ) Estimate of covariance:

$R_1$      $y$      $R_2$      $x$
0.992    0.995    0.622    0.423
0.588    0.722    0.771    0.514
0.601    0.732    0.917    0.600
0.549    0.691    0.675    0.456
0.925    0.954    0.534    0.368
0.014    0.039    0.513    0.355

So we obtain

$y$      $(y - \bar{y})$    $x$      $(x - \bar{x})$    $(y - \bar{y})(x - \bar{x})$
0.995     0.306167    0.423     -0.030       -0.009080
0.722     0.033167    0.514      0.061        0.002034
0.732     0.043167    0.600      0.147        0.006360
0.691     0.002167    0.456      0.003        0.000000
0.954     0.265167    0.368     -0.085       -0.022450
0.039    -0.649830    0.355     -0.098        0.063467
Sum 4.133  0.000000   2.716      0.000        0.040335

Thus an estimate of the covariance between two sample means is given by
$$\widehat{Cov}(\bar{y}_n, \bar{x}_n) = \left(\frac{1-f}{n}\right)s_{xy} = \left(\frac{1-f}{n}\right)\frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})$$
Let N be the total number of units in the population $\Omega$ and $N_a$ be the number of
units possessing a certain attribute, A (say). Then the population proportion is the ratio
of the number of units possessing the attribute A to the total number of units in the
population, i.e., $P_y = N_a/N$. Thus we have the following theorem:
Theorem 2.3.1. The population proportion $P_y$ is a special case of the population
mean $\bar{Y}$.
We will discuss the problem of estimation of population proportion using SRSWR
and SRSWOR sampling.
Case I. When the sample is drawn using simple random sampling with replacement
(SRSWR sampling), we have the following theorems.
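$$\sigma_y^2 = \frac{1}{N}\left[\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right].$$
Note that
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th unit possesses the attribute } A, \\ 0 & \text{otherwise,} \end{cases}$$
and
$$Y_i^2 = \begin{cases} 1 & \text{if the } i\text{th unit possesses the attribute } A, \\ 0 & \text{otherwise.} \end{cases}$$
So that
$$\sigma_y^2 = \frac{1}{N}\left[N_a - NP_y^2\right] = \frac{N_a}{N} - P_y^2 = P_y(1 - P_y) = P_y Q_y.$$
Thus we have
$$V(\hat{p}_y) = \frac{P_y Q_y}{n}.$$
Hence the theorem.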
Hence the theorem.
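Proof. We have to show that $E[v(\hat{p}_y)] = V(\hat{p}_y)$, or in other words we have to show
that
$$E\left(\frac{\hat{p}_y\hat{q}_y}{n-1}\right) = \frac{P_y Q_y}{n}.$$
Now we know that $s_y^2/n$ is an unbiased estimator of $\sigma_y^2/n$.
Defining
$$y_i = \begin{cases} 1 & \text{if the } i\text{th sampled unit} \in A, \\ 0 & \text{otherwise,} \end{cases} \quad\text{and}\quad y_i^2 = \begin{cases} 1 & \text{if the } i\text{th sampled unit} \in A, \\ 0 & \text{otherwise.} \end{cases}$$
Hence, with $r$ denoting the number of sampled units that belong to A, we will obtain
$$s_y^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}_n^2\right] = \frac{1}{n-1}\left[r - n\hat{p}_y^2\right] = \frac{n}{n-1}\left[\frac{r}{n} - \hat{p}_y^2\right] = \frac{n}{n-1}\hat{p}_y(1 - \hat{p}_y) = \frac{n\hat{p}_y\hat{q}_y}{n-1}.$$
So that
$$\frac{s_y^2}{n} = \frac{\hat{p}_y\hat{q}_y}{n-1}.$$
Hence the theorem.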
(2.3.4)
$$\sqrt{\frac{Q_y}{nP_y}} \le \phi, \quad\text{or}\quad \frac{Q_y}{nP_y} \le \phi^2, \quad\text{or}\quad n \ge \frac{Q_y}{\phi^2 P_y}.$$
Hence the theorem.
Example 2.3.1. We wish to estimate the proportion of the number of fish in the
group Herrings caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There are 30,027 fish out of a total of 311,528 fish caught during 1995 as shown
in the population 4 in the Appendix. What is the minimum number of fish to be
selected by SRSWR sampling to attain 5% relative standard error of the estimator
of population proportion?
Solution. We have
$$P_y = \frac{30027}{311528} = 0.0964 \quad\text{and}\quad Q_y = 1 - P_y = 1 - 0.0964 = 0.9036.$$
Thus for $\phi = 0.05$, we have
$$n \ge \frac{Q_y}{\phi^2P_y} = \frac{0.9036}{0.05^2 \times 0.0964} = 3749.4 \approx 3750.$$
Thus a minimum sample of size $n = 3750$ fish is required to attain 5% relative
standard error of the estimator of population proportion under SRSWR sampling.
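The sample size rule used in this example can be checked numerically. The following is only a minimal sketch: the function name `min_sample_size_srswr` is an illustrative assumption and not part of the text, and the inputs are the rounded values quoted in the solution.

```python
import math


def min_sample_size_srswr(p, phi):
    """Smallest n such that sqrt(Q / (n * P)) <= phi under SRSWR sampling.

    p   : population proportion P_y (assumed known or guessed in advance)
    phi : required relative standard error
    """
    q = 1.0 - p
    # n >= Q / (phi^2 * P); round up to the next whole unit
    return math.ceil(q / (phi ** 2 * p))


# Rounded values from the example: P_y = 0.0964 and phi = 0.05
n_required = min_sample_size_srswr(0.0964, 0.05)
print(n_required)
```

In practice $P_y$ is unknown before the survey, so a guessed or pilot value would have to be supplied.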
Example 2.3.2. A fisherman visited the Atlantic and Gulf coast and caught 4000
fish one by one. He noted the species group of each fish caught by him and put
back that fish in the sea before making the next catch. He observed that 400 fish
belong to the group Herrings.
( a ) Estimate the proportion of fish in the group Herrings living in the Atlantic and
Gulf coast.
( b ) Construct the 95% confidence interval.
Solution. We are given $n = 4000$ and $r = 400$.
( a ) An estimate of the proportion of the fish in the Herrings group is given by
$$\hat{p}_y = \frac{r}{n} = \frac{400}{4000} = 0.1.$$
( b ) Under SRSWR sampling an estimate of $V(\hat{p}_y)$ is given by
$$v(\hat{p}_y) = \frac{\hat{p}_y\hat{q}_y}{n-1} = \frac{0.1 \times 0.9}{4000 - 1} = 2.2505 \times 10^{-5}.$$
A $(1-\alpha)100\%$ confidence interval for the true proportion $P_y$ is given by
$$\hat{p}_y \pm Z_{\alpha/2}\sqrt{v(\hat{p}_y)}.$$
Thus the 95% confidence interval for the proportion of fish belonging to the
Herrings group is given by
$\hat{p}_y \pm 1.96\sqrt{v(\hat{p}_y)}$, or $0.1 \pm 1.96\sqrt{2.2505 \times 10^{-5}}$, or [0.0907, 0.1093].
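The point estimate, variance estimate and interval of this example can be reproduced with a few lines of code. This is a sketch under the large-sample normal approximation used in the solution; the function name `proportion_ci_srswr` is an assumption made for illustration.

```python
import math


def proportion_ci_srswr(r, n, z=1.96):
    """Estimate a proportion from an SRSWR sample of n units, r of which have the attribute.

    Returns the estimate, its estimated variance p*q/(n-1), and the
    normal-approximation confidence interval p +/- z*sqrt(v).
    """
    p_hat = r / n
    v_hat = p_hat * (1.0 - p_hat) / (n - 1)
    half_width = z * math.sqrt(v_hat)
    return p_hat, v_hat, (p_hat - half_width, p_hat + half_width)


p_hat, v_hat, (lower, upper) = proportion_ci_srswr(r=400, n=4000)
print(p_hat, v_hat, lower, upper)
```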
( a ) Select a sample of the required size, and estimate the proportion of plants with
height more than 15 cm.
( b ) Construct a 95% confidence interval estimate, assuming that your sample size
is large, and interpret your results.
Solution. We know that $y$ has the uniform distribution with density function
$$f(y) = \frac{1}{b-a} \quad \forall\; a < y < b.$$
Thus the proportion of plants with height more than 15 cm is given by
$$P_y = \int_{15}^{20} f(y)\,dy = \int_{15}^{20}\frac{1}{15}\,dy = \frac{1}{15}(20 - 15) = \frac{5}{15} = 0.3333,$$
and the variance
$$\sigma_y^2 = P_y(1 - P_y) = 0.3333(1 - 0.3333) = 0.2222.$$
( a ) We need $\phi = 0.45$, thus the required minimum sample size is given by
F(y)     y        y > 15 (yes = 1, no = 0)
         19.31    1
0.183     7.75    0
0.448    11.72    0
0.171     7.57    0
0.567    13.51    0
0.737    16.06    1
0.856    17.84    1
0.233     8.50    0
0.895    18.43    1
0.263     8.95    0
Thus an estimate of the proportion Py is given by,
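Case II. When a sample is drawn using SRSWOR sampling, we have the following
theorems.
Theorem 2.3.6. The unbiased estimator of the population proportion $P_y$ is given by
$$\hat{p}_y = \frac{r}{n}.$$
Under SRSWOR sampling the variance of the sample mean is
$$V(\bar{y}_n) = \frac{(N-n)}{Nn}S_y^2, \quad\text{where}\quad S_y^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right).$$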
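Again we define
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th unit possesses the attribute } A,\\ 0 & \text{otherwise,}\end{cases}$$
and
$$Y_i^2 = \begin{cases} 1 & \text{if the } i\text{th unit possesses the attribute } A,\\ 0 & \text{otherwise.}\end{cases}$$
So
$$S_y^2 = \frac{1}{N-1}\left(N_a - NP_y^2\right) = \frac{N}{N-1}\left(P_y - P_y^2\right) = \frac{NP_yQ_y}{N-1} = S_p^2.$$
Hence we have
$$V(\hat{p}_y) = \frac{N-n}{Nn}S_p^2 = \frac{N-n}{Nn} \times \frac{N}{N-1}P_yQ_y = \frac{(N-n)}{n(N-1)}P_yQ_y,$$
which proves the theorem.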
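Now we know that $\frac{(N-n)}{Nn}s_y^2$ is an unbiased estimator of $\frac{(N-n)}{Nn}S_y^2$.
Changing
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th population unit} \in A,\\ 0 & \text{otherwise,}\end{cases} \quad\text{and}\quad Y_i^2 = \begin{cases} 1 & \text{if the } i\text{th population unit} \in A,\\ 0 & \text{otherwise,}\end{cases}$$
makes
$$\frac{(N-n)}{Nn}S_y^2 = \frac{(N-n)}{n(N-1)}P_yQ_y.$$
Similarly, if we make the changes
$$y_i = \begin{cases} 1 & \text{if the } i\text{th sampled unit} \in A,\\ 0 & \text{otherwise,}\end{cases} \quad\text{and}\quad y_i^2 = \begin{cases} 1 & \text{if the } i\text{th sampled unit} \in A,\\ 0 & \text{otherwise,}\end{cases}$$
then
$$\frac{(N-n)}{Nn}s_y^2 = \left(\frac{N-n}{Nn}\right)\frac{n\hat{p}_y\hat{q}_y}{(n-1)} = \frac{(N-n)}{N(n-1)}\hat{p}_y\hat{q}_y.$$
Hence the theorem.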
(2.3.6)
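Note that we need an estimator $\hat{p}_y$ such that $\mathrm{RSE}(\hat{p}_y) \le \phi$, which implies that
$$\sqrt{\left(\frac{N}{n} - 1\right)\frac{Q_y}{(N-1)P_y}} \le \phi, \quad\text{or}\quad n \ge \frac{NQ_y}{\phi^2(N-1)P_y + Q_y}.$$
Hence the theorem.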
Example 2.3.4. We wish to estimate the proportion of the number of fish in the
group Herrings caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There are 30,027 fish out of a total of 311,528 fish caught during 1995 as
shown in the population 4 in the Appendix. What is the minimum number of fish to
be selected by SRSWOR sampling to attain a relative standard error of
5%?
Solution. We have
$$P_y = \frac{30027}{311528} = 0.0964 \quad\text{and}\quad Q_y = 1 - P_y = 1 - 0.0964 = 0.9036.$$
Thus for $\phi = 0.05$, we have
$$n \ge \frac{NQ_y}{\phi^2(N-1)P_y + Q_y} = \frac{311528 \times 0.9036}{0.05^2(311528 - 1) \times 0.0964 + 0.9036} = 3704.8 \approx 3705.$$
Thus a minimum sample of size $n = 3705$ fish is required to attain 5% relative
standard error of the estimator of population proportion under SRSWOR sampling.
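The effect of the finite population on the required sample size can be seen by coding the bound directly. This is a hedged sketch; `min_sample_size_srswor` is a hypothetical helper name, and the inputs are the rounded values from the solution.

```python
import math


def min_sample_size_srswor(p, phi, N):
    """Smallest n with RSE <= phi under SRSWOR: n >= N*Q / (phi^2*(N-1)*P + Q)."""
    q = 1.0 - p
    return math.ceil(N * q / (phi ** 2 * (N - 1) * p + q))


n_required = min_sample_size_srswor(p=0.0964, phi=0.05, N=311528)
print(n_required)
```

For a very large $N$ the bound approaches the SRSWR value $Q_y/(\phi^2P_y)$, so the two designs differ noticeably only when the sampling fraction is not negligible.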
Example 2.3.5. A fisherman visited the Atlantic and Gulf coast and caught 4000
fish. He noted the species group of each fish caught by him . He observed that 400
fish belong to the group Herrings.
( a) Estimate the proportion of fish in the group Herrings living in the Atlantic and
Gulf coast.
( b ) Construct the 95% confidence interval.
Given: Total number of fish living in the coast = 311528.
Solution. We are given $N = 311{,}528$, $n = 4{,}000$ and $r = 400$.
( a ) An estimate of the proportion of the fish in the Herrings group is
$$\hat{p}_y = \frac{r}{n} = \frac{400}{4000} = 0.1.$$
( b ) Under SRSWOR sampling, an estimate of $V(\hat{p}_y)$ is given by
$$v(\hat{p}_y) = \left(\frac{N-n}{N}\right)\frac{\hat{p}_y\hat{q}_y}{n-1} = \left(\frac{311528 - 4000}{311528}\right) \times \frac{0.1 \times 0.9}{4000 - 1} = 2.2216 \times 10^{-5}.$$
A $(1-\alpha)100\%$ confidence interval for the true proportion $P_y$ is given by
$$\hat{p}_y \pm Z_{\alpha/2}\sqrt{v(\hat{p}_y)}.$$
Thus the 95% confidence interval for the proportion of fish belonging to the Herrings
group is given by
$\hat{p}_y \pm 1.96\sqrt{v(\hat{p}_y)}$, or $0.1 \pm 1.96\sqrt{2.2216 \times 10^{-5}}$, or [0.0908, 0.1092].
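The same calculation with the finite population correction can be sketched as follows. The function name `proportion_ci_srswor` is an assumption for illustration, and the interval relies on the normal approximation used in the solution.

```python
import math


def proportion_ci_srswor(r, n, N, z=1.96):
    """Proportion estimate from an SRSWOR sample with finite population correction.

    Estimated variance: ((N - n) / N) * p*q / (n - 1).
    """
    p_hat = r / n
    v_hat = (N - n) / N * p_hat * (1.0 - p_hat) / (n - 1)
    half_width = z * math.sqrt(v_hat)
    return p_hat, v_hat, (p_hat - half_width, p_hat + half_width)


p_hat, v_hat, (lower, upper) = proportion_ci_srswor(r=400, n=4000, N=311528)
print(p_hat, v_hat, lower, upper)
```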
Example 2.3.6. In a field there are 1,000 plants and the distribution of their height
is given by the probability mass function
( a ) Select a random sample of $n = 10$ units and estimate the proportion of plants
with height more than or equal to 225 cm.
( b ) Construct a 95% confidence interval, assuming that it is a large sample.
Using the first three columns, multiplied by $10^{-3}$, of the Pseudo-Random Number
(PRN) Table 1 given in the Appendix, we obtain the 10 values of $F(y)$ and $y$ as:
F(y)     y      y >= 225 (yes = 1, no = 0)
0.992    225    1
0.588    100    0
0.601    150    0
0.549    100    0
0.925    225    1
0.014    Discard this number
0.697    150    0
0.872    200    0
0.626    150    0
0.236     50    0
0.884    225    1
( a ) An estimate of the proportion of plants with height more than or equal to 225 cm is
Theorem 2.4.1. The minimum mean squared error of the estimator, $\bar{y}_{\mathrm{Searl}}$, is
$$\mathrm{Min.MSE}(\bar{y}_{\mathrm{Searl}}) = V(\bar{y}_n)\big/\left\{1 + V(\bar{y}_n)/\bar{Y}^2\right\}. \tag{2.4.2}$$
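Proof. We have
$$\begin{aligned}
\mathrm{MSE}(\bar{y}_{\mathrm{Searl}}) &= E[\bar{y}_{\mathrm{Searl}} - \bar{Y}]^2 = E[\lambda\bar{y}_n - \bar{Y}]^2 = E[\lambda\bar{y}_n - E(\lambda\bar{y}_n) + E(\lambda\bar{y}_n) - \bar{Y}]^2 \\
&= E[\lambda\{\bar{y}_n - E(\bar{y}_n)\} + \lambda E(\bar{y}_n) - \bar{Y}]^2 = E[\lambda\{\bar{y}_n - E(\bar{y}_n)\} + (\lambda - 1)\bar{Y}]^2 \\
&= E[\lambda^2\{\bar{y}_n - E(\bar{y}_n)\}^2 + (\lambda - 1)^2\bar{Y}^2 + 2(\lambda - 1)\bar{Y}\lambda\{\bar{y}_n - E(\bar{y}_n)\}] \\
&= \lambda^2V(\bar{y}_n) + (\lambda - 1)^2\bar{Y}^2.
\end{aligned}$$
This is minimum for $\lambda = \bar{Y}^2/\{\bar{Y}^2 + V(\bar{y}_n)\}$, and on substituting this value we obtain
$$\begin{aligned}
\mathrm{Min.MSE}(\bar{y}_{\mathrm{Searl}}) &= \lambda^2V(\bar{y}_n) + (\lambda - 1)^2\bar{Y}^2 = \left\{\frac{\bar{Y}^2}{\bar{Y}^2 + V(\bar{y}_n)}\right\}^2V(\bar{y}_n) + \left\{\frac{\bar{Y}^2}{\bar{Y}^2 + V(\bar{y}_n)} - 1\right\}^2\bar{Y}^2 \\
&= \frac{\bar{Y}^4V(\bar{y}_n)}{\{\bar{Y}^2 + V(\bar{y}_n)\}^2} + \left\{\frac{\bar{Y}^2 - \bar{Y}^2 - V(\bar{y}_n)}{\bar{Y}^2 + V(\bar{y}_n)}\right\}^2\bar{Y}^2 \\
&= \frac{\bar{Y}^4V(\bar{y}_n) + \bar{Y}^2\{V(\bar{y}_n)\}^2}{\{\bar{Y}^2 + V(\bar{y}_n)\}^2} = \frac{\bar{Y}^2V(\bar{y}_n)\{\bar{Y}^2 + V(\bar{y}_n)\}}{\{\bar{Y}^2 + V(\bar{y}_n)\}^2} \\
&= \frac{\bar{Y}^2V(\bar{y}_n)}{\bar{Y}^2 + V(\bar{y}_n)} = \frac{V(\bar{y}_n)}{1 + V(\bar{y}_n)/\bar{Y}^2}.
\end{aligned} \tag{2.4.6}$$
Hence the theorem.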
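The algebra of this proof can be checked numerically. The sketch below assumes the Searls-type estimator is $\lambda\bar{y}_n$ with $E(\bar{y}_n) = \bar{Y}$, as in the proof; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def searls_mse(lam, y_bar, var_mean):
    """MSE of lam * y_bar_n when y_bar_n is unbiased: lam^2 * V + (lam - 1)^2 * Ybar^2."""
    return lam ** 2 * var_mean + (lam - 1.0) ** 2 * y_bar ** 2


def searls_optimum(y_bar, var_mean):
    """Return the minimising constant and the corresponding minimum MSE."""
    lam = y_bar ** 2 / (y_bar ** 2 + var_mean)
    return lam, searls_mse(lam, y_bar, var_mean)


# Hypothetical inputs: population mean 50 and V(y_bar_n) = 400
lam_opt, min_mse = searls_optimum(y_bar=50.0, var_mean=400.0)
closed_form = 400.0 / (1.0 + 400.0 / 50.0 ** 2)
print(lam_opt, min_mse, closed_form)
```

Because the optimum constant depends on the unknown $\bar{Y}$, the sketch is a check of the formula rather than a usable estimator.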
Theorem 2.4.2. Under SRSWR sampling, the minimum mean squared error of the
Searls' estimator is
$$\mathrm{Min.MSE}(\bar{y}_{\mathrm{Searl}}) = n^{-1}\sigma_y^2\big/\left\{1 + n^{-1}\sigma_y^2/\bar{Y}^2\right\}. \tag{2.4.7}$$
Proof. Obvious from (2.4.2) because under SRSWR sampling we have
$V(\bar{y}_n) = n^{-1}\sigma_y^2$.
Theorem 2.4.3. The relative efficiency of the Searls' estimator $\bar{y}_{\mathrm{Searl}}$ with respect
to the usual estimator $\bar{y}_n$, under SRSWR sampling, is given by
$$RE = 1 + \sigma_y^2/(n\bar{Y}^2). \tag{2.4.8}$$
Thus the relative gain in the Searls' estimator is inversely proportional to the
sample size, $n$. In other words, as $n \to \infty$, the value of $RE \to 1$.
Proof. It follows from the definition of the relative efficiency. Note that the relative
efficiency of Searls' estimator with respect to the usual estimator is given by
$$RE = \frac{\mathrm{MSE}(\bar{y}_n)}{\mathrm{MSE}(\bar{y}_{\mathrm{Searl}})} = \frac{V(\bar{y}_n)}{\mathrm{Min.MSE}(\bar{y}_{\mathrm{Searl}})} = \frac{n^{-1}\sigma_y^2}{n^{-1}\sigma_y^2\left(1 + n^{-1}\bar{Y}^{-2}\sigma_y^2\right)^{-1}} = 1 + n^{-1}\bar{Y}^{-2}\sigma_y^2. \tag{2.4.9}$$
Hence the theorem.
Theorem 2.4.4. Under SRSWOR sampling , the minimum mean squared error of
the Searls' estimator is
Theorem 2.4.5. The relative efficiency of the Searls' estimator $\bar{y}_{\mathrm{Searl}}$ with respect
to $\bar{y}_n$, under SRSWOR, is given by
$$RE = 1 + \left(\frac{1}{n} - \frac{1}{N}\right)C_y^2.$$
Thus the relative gain in efficiency of the Searls' estimator is inversely proportional
to the sample size $n$. In other words, as $n \to N$ the value of $RE \to 1$.
$$RE = \frac{V(\bar{y}_n)}{\mathrm{Min.MSE}(\bar{y}_{\mathrm{Searl}})} = 1 + \left(\frac{1}{n} - \frac{1}{N}\right)\frac{S_y^2}{\bar{Y}^2} = 1 + \left(\frac{1}{n} - \frac{1}{N}\right)C_y^2.$$
Example 2.4.1. We wish to estimate the average number of fish in each one of the
species groups caught by marine recreational fishermen at the Atlantic and Gulf
coasts. There were 69 species caught during 1995 as shown in the population 4 of
the Appendix. We selected a sample of 20 units by SRSWR sampling. What is the
gain in efficiency owed to the Searls' estimator over the sample mean?
Given: $S_y^2 = 37199578$ and $Y = 311528$.
Solution. We are given $N = 69$, $S_y^2 = 37199578$ and $Y = 311528$, thus
$$\bar{Y} = \frac{Y}{N} = \frac{311528}{69} = 4514.898$$
and
$$\sigma_y^2 = \frac{(N-1)}{N}S_y^2 = \frac{(69-1)}{69} \times 37199578 = 36660453.68.$$
The relative efficiency of the Searls' estimator $\bar{y}_{\mathrm{Searl}}$ with respect to the usual estimator
$\bar{y}_n$, under SRSWR, is given by
$$RE = \left[1 + \frac{\sigma_y^2}{n\bar{Y}^2}\right] \times 100 = \left[1 + \frac{36660453.68}{20 \times 4514.898^2}\right] \times 100 = 108.99\%.$$
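The arithmetic of this example is easy to reproduce. The following sketch uses the given values of $S_y^2$, $Y$, $N$ and $n$; the function name `searls_relative_efficiency_srswr` is an illustrative assumption.

```python
def searls_relative_efficiency_srswr(S2, total, N, n):
    """Relative efficiency 1 + sigma_y^2 / (n * Ybar^2) of the Searls' estimator under SRSWR.

    S2    : finite population variance with divisor N - 1
    total : population total Y
    """
    y_bar = total / N
    sigma2 = (N - 1) / N * S2          # sigma_y^2 = (N - 1) S_y^2 / N
    return 1.0 + sigma2 / (n * y_bar ** 2)


re_20 = searls_relative_efficiency_srswr(S2=37199578, total=311528, N=69, n=20)
print(round(100 * re_20, 2))
```

Calling the same function with a smaller `n` shows the gain growing as the sample size falls, in line with the remark following Theorem 2.4.3.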
Find the relative efficiency of Searls' estimator over the usual estimator based on a
sample of 5 or 20 units, respectively.
Solution. We are given $a = 200$ and $b = 500$, therefore the population mean
Searls (1967), Reddy (1978a) and Arnholt and Hebert (1995) studied the properties
of this estimator and found that it is useful if $C_y$ is large and the sample size is small.
We shall discuss the problem of estimation of finite population mean and variance
by using only distinct units from the SRSWR sample. However, before going
further we shall discuss some results, which will be helpful in deriving the results
from distinct units for an SRSWR sample. Basu (1958) introduced the concept of
sufficiency in sampling from finite populations. According to him, for every
ordered sample $s^{o}$ there exists an unordered sample $s^{uo}$ which is obtained from $s^{o}$
by ignoring information concerning the order in which the labels occur. The data
obtained from the sample $s^{uo}$ can be represented as
$$d^{uo} = (y_i : i \in s^{uo}) \tag{2.5.1}$$
and that obtained from the ordered sample $s^{o}$ as
$$d^{o} = (y_i : i \in s^{o}). \tag{2.5.2}$$
Then the probability of observing the ordered data $d^{o}$ given the unordered data $d^{uo}$ is
(2.5.3)
where $\sum$ is the summation over all those ordered samples $s^{o}$ which result in the
unordered sample $s^{uo}$. Since the probability $P(d^{o} \mid d^{uo})$ is independent of any
population parameters, the unordered statistic $d^{uo}$ is a sufficient statistic
for any population parameter.
Let us now first state the Rao-Blackwell theorem, which is based on the Rao (1945)
and Blackwell (1947) results.
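Theorem 2.5.1. Let $e_s^{o} = e(d^{o})$ be an estimator of $\theta$ constructed from ordered data
$$E(e_s) = \sum_{s^{uo}}\left\{\sum e_s^{o}(d^{o})\frac{p(s^{o})}{p(s^{uo})}\right\}p(s^{uo}) = \sum_{s^{o}} e_s^{o}(d^{o})p(s^{o}) = E(e_s^{o}).$$
( b ) We have
$$\begin{aligned}
\mathrm{MSE}(e_s^{o}) &= E[e_s^{o} - \theta]^2 = E[e_s^{o} - e_s + e_s - \theta]^2 \\
&= E(e_s^{o} - e_s)^2 + E(e_s - \theta)^2 + 2E(e_s^{o} - e_s)(e_s - \theta) \\
&= E(e_s^{o} - e_s)^2 + \mathrm{MSE}(e_s) + 0.
\end{aligned}$$
Hence the theorem.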
Now we will discuss the problem of estimation of mean and variance on the basis of
distinct units in the sample. Clearly a unit can be repeated only in WR sampling
schemes. Hence we are dealing only with the SRSWR sampling scheme. Suppose $\nu$
denotes the number of distinct units in the sample of size $n$ drawn from the
population of $N$ units by using the SRSWR scheme.
The distribution of distinct units in the sample was first developed by Feller (1957)
as follows
For estimating the population mean $\bar{Y}$ by using information only from the distinct
units we have the following theorems.
Proof. Following Raj and Khamis (1958), let $E_2$ and $E_1$ be the expected values
defined for a given sample (fixed number of distinct units) and for all possible
samples, respectively, then by taking expected value on both sides of (2.5.1.1), we
have
Theorem 2.5.1.2. The variance of the unbiased estimator $\bar{y}_\nu$ based on distinct units
is
(2.5.1.3)
Proof. Suppose $V_2$ and $V_1$ denote the variance for the given sample (fixed number
of distinct units) and over all possible samples, we have
It is interesting to note that as the sample size $n$ drawn with SRSWR sampling
approaches the population size $N$, the magnitude of the relative efficiency also
increases. The reason for the increase in the relative efficiency may be that the increase
in sample size also increases the probability of repetition of units in SRSWR
sampling.
The relative efficiency under the Feller (1957) distribution is given by
$$RE = \frac{V(\bar{y}_n)}{V(\bar{y}_\nu)} = \frac{(N-1)N^{(n-1)}}{n\left\{\sum_{j=1}^{N-1} j^{(n-1)}\right\}} \tag{2.5.1.6}$$
which is free from any population parameter but depends upon the population size and
sample size.
The following table shows the percent relative efficiency of distinct units based
estimators with respect to the estimators based on SRSWR sampling for different
values of sample size $n$ and population size $N = 10$.
Benefit of the use of distinct units: values of $j^{(n-1)}$ for $N = 10$

        Sample size ( n )
j       2    3     4      5       6        7         8          9
1       1    1     1      1       1        1         1          1
2       2    4     8      16      32       64        128        256
3       3    9     27     81      243      729       2187       6561
4       4    16    64     256     1024     4096      16384      65536
5       5    25    125    625     3125     15625     78125      390625
6       6    36    216    1296    7776     46656     279936     1679616
7       7    49    343    2401    16807    117649    823543     5764801
8       8    64    512    4096    32768    262144    2097152    16777216
9       9    81    729    6561    59049    531441    4782969    43046721
Sum     45   285   2025   15333   120825   978405    8080425    67731333
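The relative efficiency in (2.5.1.6) depends only on $N$ and $n$, so it can be tabulated directly from the column sums of powers. The following sketch does this for $N = 10$; the function name `re_distinct_units` is an assumption, and exact fractions are used only to avoid rounding in the intermediate sums.

```python
from fractions import Fraction


def re_distinct_units(N, n):
    """RE = (N - 1) * N^(n-1) / (n * sum_{j=1}^{N-1} j^(n-1)), as an exact fraction."""
    power_sum = sum(j ** (n - 1) for j in range(1, N))
    return Fraction((N - 1) * N ** (n - 1), n * power_sum)


# Percent relative efficiency for N = 10 and n = 2, ..., 9
percent_re = {n: round(100 * float(re_distinct_units(10, n)), 2) for n in range(2, 10)}
for n, value in percent_re.items():
    print(n, value)
```

For $n = 2$ the ratio is exactly one, and it increases with $n$, in line with the remark that the gain grows as the sample size approaches the population size.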
Theorem 2.5.1.4. ( a ) Show that an alternative estimator of the population mean
based on distinct units is
where $f_1(\nu)$ and $f_2(\nu)$ are suitably chosen constants such that $\bar{y}_s$ is an unbiased
estimator of $\bar{Y}$ and its variance is minimum. Now from the property of
unbiasedness we have
$$E(\bar{y}_s) = E[f_1(\nu)\bar{y}_\nu + f_2(\nu)] = f_1(\nu)\bar{Y} + f_2(\nu) = \bar{Y}. \tag{2.5.1.11}$$
This implies that
$$f_2(\nu) = [1 - f_1(\nu)]\bar{Y}. \tag{2.5.1.12}$$
Evidently the value of $f_2(\nu)$ contains the unknown value $\bar{Y}$, so the exact value of
$f_2(\nu)$ is not known unless $f_1(\nu) = 1$, which implies $f_2(\nu) = 0$. Thus we choose
$f_1(\nu) = 1$, then $f_2(\nu) = 0$, which means a better estimator of the population mean $\bar{Y}$
would be $\bar{y}_\nu = \nu^{-1}\sum_{i=1}^{\nu} y_i$. In practical situations, sometimes a priori information or
knowledge of $X$ (say) is available about the population mean $\bar{Y}$ from past surveys or
pilot surveys. In such situations, the value of $f_2(\nu)$ is given by
$$f_2(\nu) = [1 - f_1(\nu)]X. \tag{2.5.1.13}$$
Thus if we choose $f_2(\nu)$ as given in (2.5.1.13), then the bias in the estimator $\bar{y}_s$
will be minimum. Unfortunately, $f_2(\nu)$ depends upon the value of $f_1(\nu)$ too. The
best method to choose $f_1(\nu)$ is such that the variance of $\bar{y}_s$ is minimum. Now the
variance of the estimator $\bar{y}_s$ is given by
$$V(\bar{y}_s) = E_1V_2(\bar{y}_s) + V_1E_2(\bar{y}_s) = E_1V_2[f_1(\nu)\bar{y}_\nu + f_2(\nu)] + V_1E_2[f_1(\nu)\bar{y}_\nu + f_2(\nu)]$$
If no such information about $\bar{Y}$ is available, then we have $X = 0$ and the above
estimator reduces to
$$\bar{y}_2 = \frac{(N\nu)/(N-\nu)}{E[(N\nu)/(N-\nu)]}\,\bar{y}_\nu. \tag{2.5.1.18}$$
Pathak (1961) has shown that
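Theorem 2.5.1.5. Show that if the square of the population coefficient of variation
$C_y^2 = S_y^2/\bar{Y}^2$ exceeds $(n-1)$, then the estimator $\bar{y}_2 = (\nu/E(\nu))\bar{y}_\nu$ is more efficient
than $\bar{y}_\nu$.
Proof. We know that
$$\begin{aligned}
V(\bar{y}_2) ={}& \frac{S_y^2(N-1)\left[\left\{1 - \left(1 - \frac{1}{N}\right)^n\right\} - \left\{1 - 2\left(1 - \frac{1}{N}\right)^n + \left(1 - \frac{2}{N}\right)^n\right\}\right]}{N^2\left[1 - \left(1 - \frac{1}{N}\right)^n\right]^2} \\
&+ \frac{\bar{Y}^2\left[N\left(1 - \frac{1}{N}\right)^n - N^2\left(1 - \frac{1}{N}\right)^{2n} + N(N-1)\left(1 - \frac{2}{N}\right)^n\right]}{N^2\left[1 - \left(1 - \frac{1}{N}\right)^n\right]^2}.
\end{aligned} \tag{2.5.1.25}$$
Since $V(\bar{y}_\nu) = S_y^2N^{-n}\sum_{j=1}^{N-1} j^{(n-1)}$, we have
$$V(\bar{y}_\nu) - V(\bar{y}_2) = C_1S_y^2 - C_2\bar{Y}^2 \quad\text{(say)}. \tag{2.5.1.26}$$
Now the estimator $\bar{y}_2$ is better than $\bar{y}_\nu$ if $V(\bar{y}_\nu) - V(\bar{y}_2) > 0$ or if
$(S_y^2/\bar{Y}^2) > (C_2/C_1)$.
The approximate values of $C_1$ and $C_2$ for large populations, correct up to terms of
order $N^{-2}$, are given by
$$C_1 = \frac{1}{2nN} + \frac{5(n-1)}{12nN^2} \quad\text{and}\quad C_2 = \frac{(n-1)}{2nN} - \frac{(n-1)(n-2)}{3nN^2}$$
and thus, $(C_2/C_1) \approx (n-1)$. Hence the theorem.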
Theorem 2.5.1.6. If squared error be the loss function then show that $\bar{y}_\nu$ is
admissible amongst all functions of $\bar{y}_\nu$ and $\nu$.
Proof. Let $t = \bar{y}_\nu + f(\bar{y}_\nu, \nu)$ be a function of $\bar{y}_\nu$ and $\nu$. Suppose that the estimator
$t$ is uniformly better than $\bar{y}_\nu$. Suppose $R(t)$ be the quadratic loss function for the
estimator $t$. Then the estimator $t$ will be uniformly better than the estimator $\bar{y}_\nu$ if

$$\sigma_y^2 = N^{-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$$
using distinct units in a sample of $n$ units drawn by using SRSWR sampling. The
usual estimator of $\sigma_y^2$ is given by
$$s_\nu^2 = \left[1 - \frac{C_\nu(n-1)}{C_\nu(n)}\right]s_d^2 \tag{2.5.2.1}$$
where
$$s_d^2 = \begin{cases}(\nu - 1)^{-1}\sum_{i=1}^{\nu}(y_i - \bar{y}_\nu)^2 & \text{if } \nu > 1,\\ 0 & \text{otherwise,}\end{cases} \tag{2.5.2.2}$$
and
Proof. Suppose we have any convex loss function and $T$ be an ordered sufficient
statistic, then by the Rao-Blackwell theorem we have
To prove that the estimator at (2.5.2.1) is uniformly better than $s_y^2$ let us consider
the following cases:
If $\nu = 1$, i.e., only one unit has been selected in the sample of two units drawn by
SRSWR then (2.5.2.4) is obviously zero. Suppose $\sum_{I}$ and $\sum_{II}$ denote the
summations over all integral values of $\alpha(i)$ such that the following equalities hold:
v v
2:a(i) = n , a(i) > 0 for i = 1,2,..., v and 'I a()F (n - 2), a(j r:: 0, at) ) > 0 and
i=1 )=!
$$E[(y_1 - y_2)^2 \mid T] = \sum_{i \ne j = 1}^{\nu}\{y_{(i)} - y_{(j)}\}^2\,P[x_1 = x_{(i)}, x_2 = x_{(j)} \mid T].$$
On substituting the value of $P[x_1 = x_{(i)}, x_2 = x_{(j)} \mid T]$ from (2.5.2.8), we obtain
14 IN 1213.024
47 WA 1100.745
22 MI 323.028
42 TN 553.266
23 MN 1354.768
48 WV 99.277
06 CO 315.809
07 CT 7.130
21 MA 7.590
31 NM 140.582
36 OK 612.108
16 KS 1049.834
27 NE 1337.852
10 GA 939.460
18 LA 282.565
26 MT 292.965
( a ) Estimate the average real estate farm loans in the United States using
information from distinct units only.
( b ) Estimate the finite population variance of the real estate loans in the US using
information from distinct units only.
( c ) Estimate the average real estate loans and its finite population variance by
including repeated units in the sample . Comment on the results .
Solution. Here n = 20 and v = 17, and on the basis of distinct units information, we
have
( b ) An estimator of the finite population variance $\sigma_y^2$ based on distinct units
information is given by
$$s_\nu^2 = \left[1 - \frac{C_\nu(n-1)}{C_\nu(n)}\right]s_d^2.$$
Now
$$s_d^2 = \frac{1}{(\nu - 1)}\sum_{i=1}^{\nu}(y_i - \bar{y}_\nu)^2 = \frac{3911380}{17 - 1} = 244461.25,$$
and
$$C_\nu(n) = \nu^n - \binom{\nu}{1}(\nu - 1)^n + \cdots + (-1)^{(\nu-1)}\binom{\nu}{\nu-1}1^n.$$
$$C_\nu(n) = C_{17}(20) = 17^{20} - \binom{17}{1}(17-1)^{20} + \binom{17}{2}(17-2)^{20} - \cdots - \binom{17}{15}(17-15)^{20} + \binom{17}{16}(17-16)^{20} = 2.6366 \times 10^{20},$$
and
$$C_\nu(n-1) = C_{17}(19) = 17^{19} - \binom{17}{1}(17-1)^{19} + \binom{17}{2}(17-2)^{19} - \cdots + \binom{17}{16}(17-16)^{19} = 4.4805 \times 10^{18}.$$
Thus
$$s_\nu^2 = \left(1 - \frac{4.4805 \times 10^{18}}{2.6366 \times 10^{20}}\right) \times 244461.25 = 240307.004.$$
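The alternating sums $C_\nu(n)$ are tedious by hand but exact in integer arithmetic. The sketch below recomputes the quantities of part ( b ) from the summary values quoted in the solution ($\nu = 17$, $n = 20$, $s_d^2 = 244461.25$); the function names are illustrative assumptions.

```python
from math import comb


def c_nu(nu, n):
    """C_nu(n) = sum_{k=0}^{nu-1} (-1)^k * C(nu, k) * (nu - k)^n, computed exactly."""
    return sum((-1) ** k * comb(nu, k) * (nu - k) ** n for k in range(nu))


def variance_from_distinct_units(s_d2, nu, n):
    """s_nu^2 = [1 - C_nu(n - 1) / C_nu(n)] * s_d^2 for nu > 1, and 0 otherwise."""
    if nu <= 1:
        return 0.0
    return (1.0 - c_nu(nu, n - 1) / c_nu(nu, n)) * s_d2


c_20 = c_nu(17, 20)
c_19 = c_nu(17, 19)
s_nu2 = variance_from_distinct_units(244461.25, nu=17, n=20)
print(f"{c_20:.4e} {c_19:.4e} {s_nu2:.3f}")
```

Python integers are unbounded, so the large alternating terms cancel without the rounding problems a floating-point evaluation of the same sum would have.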
2.6 ESTIMATION OF TOTAL OR MEAN OF A SUBGROUP (DOMAIN) OF A POPULATION
Sometimes we are interested in estimating the total or mean value of a variable of
interest within a subgroup or part of the population. Such a part or subgroup of a
population is called the domain of interest. For example, in a state wide survey, a
district may be considered as a domain. After completing the survey sampling
process from the whole population, one may be interested in estimating the mean or
total of a particular subgroup of the population.
Let $D$ be the domain of interest and $N_D$ be the number of units in this domain.
Let $Y_D = \sum_{i=1}^{N_D} Y_i$ and $\bar{Y}_D = \frac{1}{N_D}\sum_{i=1}^{N_D} Y_i$ be the total and mean for the domain $D$,
respectively. Suppose we selected an SRSWOR sample $s$ of $n$ units from the
entire population $\Omega$ and $n_D \le n$ units out of the selected units are from the domain
$D$ of interest. In certain situations the value of $N_D$ is known and in other
situations the value of $N_D$ is unknown. We shall discuss both situations as
follows. Define a variable
$$Y_i' = \begin{cases} Y_i & \text{if } i \in D,\\ 0 & \text{if } i \notin D.\end{cases} \tag{2.6.1}$$
Then we have $\sum_{i=1}^{N} Y_i' = Y_D = Y'$ (say).
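Theorem 2.6.2. The variance of the estimator $\hat{Y}_D$ under SRSWOR sampling is:
$$V(\hat{Y}_D) = \frac{N^2(1-f)}{n}S_D^2, \quad\text{where}\quad S_D^2 = \frac{1}{N-1}\left\{\sum_{i=1}^{N} Y_i'^2 - N^{-1}\left(\sum_{i=1}^{N} Y_i'\right)^2\right\}. \tag{2.6.3}$$
Proof. Obvious from the results of SRSWOR sampling.
$$v(\hat{Y}_D) = \frac{N^2(1-f)}{n}s_D^2, \quad\text{where}\quad s_D^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i'^2 - n^{-1}\left(\sum_{i=1}^{n} y_i'\right)^2\right\}. \tag{2.6.4}$$
Proof. Obvious.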
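Lemma 2.6.1. If $i \in s_D$ indicates that the $i$th unit is in the sample subgroup of $D$
then, we have
$$\Pr(i \in s_D \mid n_D > 0) = \frac{n_D}{N_D}. \tag{2.6.5}$$
Proof. We have
$$\begin{aligned}
\Pr(i \in s_D \mid n_D > 0) &= \frac{\Pr(i \in s_D, n_D)}{\Pr(n_D)} = \frac{\text{Number of samples of size } n_D \text{ with } i \in s_D}{\text{Number of samples of size } n_D} \\
&= \frac{\text{Number of ways } (n_D - 1) \text{ can be chosen from } (N_D - 1) \text{ and } (n - n_D) \text{ from } (N - N_D)}{\text{Number of ways } n_D \text{ can be chosen from } N_D \text{ and } (n - n_D) \text{ from } (N - N_D)} = \frac{n_D}{N_D}.
\end{aligned}$$
$$= E_2\left[\frac{N_D}{n_D}\sum_{i \in D} I_iY_i \,\Big|\, n_D > 0\right], \quad\text{where}\quad I_i = \begin{cases} 1 & \text{if } i \in s_D,\\ 0 & \text{otherwise,}\end{cases}$$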
Proof. We have
$$\frac{1}{n_D} = \frac{1}{n\left(\frac{n_D}{n}\right)} = \frac{1}{n\left(\frac{n_D}{n} - P_D\right) + nP_D}$$
by
(2 .6.8)
Hence the lemma. For more details about the expected values of an inverse random
variable one can refer to Stephen (1945).
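Theorem 2.6.5. Show that the variance of the estimator $\hat{Y}_D$, when $N_D$ is known, is
$$V(\hat{Y}_D) \approx \Pr(n_D > 0)\left[\left\{P_DN^2\left(\frac{1-f}{n}\right) + \frac{N^2}{n^2}(1 - P_D)\right\}S_D^2 + \Pr(n_D = 0)Y_D^2\right]. \tag{2.6.9}$$
Proof. We have
$$V(\hat{Y}_D) = E_1[V_2(\hat{Y}_D \mid n_D)] + V_1[E_2(\hat{Y}_D \mid n_D)]. \tag{2.6.10}$$
Now
$$V_2(\hat{Y}_D \mid n_D) = \begin{cases}\dfrac{N_D^2}{n_D}\left(1 - \dfrac{n_D}{N_D}\right)S_D^2 & \text{if } n_D > 0,\\ 0 & \text{if } n_D = 0,\end{cases} \quad\text{and}\quad E_2(\hat{Y}_D \mid n_D) = \begin{cases} Y_D & \text{if } n_D > 0,\\ 0 & \text{if } n_D = 0.\end{cases}$$
Now
$$\begin{aligned}
V_1[E_2(\hat{Y}_D \mid n_D)] &= V_1[Y_DI(n_D > 0)] = Y_D^2V_1[I(n_D > 0)] = Y_D^2\Pr(n_D > 0)\{1 - \Pr(n_D > 0)\} \\
&= Y_D^2\Pr(n_D > 0)\Pr(n_D = 0),
\end{aligned} \tag{2.6.12}$$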
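and
$$\begin{aligned}
E_1[V_2(\hat{Y}_D \mid n_D)] &= \sum_{j=1}^{\min(N_D, n)}\Pr(n_D = j)\frac{N_D^2}{j}\left(1 - \frac{j}{N_D}\right)S_D^2 + \Pr(n_D = 0) \times 0 \\
&= \Pr(n_D > 0)\sum_{j=1}^{\min(N_D, n)}\Pr(n_D = j \mid n_D > 0)\frac{N_D^2}{j}\left(1 - \frac{j}{N_D}\right)S_D^2 \\
&\approx N_D^2S_D^2\left\{\frac{1}{nP_D} + \frac{1 - P_D}{n^2P_D^2} - \frac{1}{N_D}\right\}\Pr(n_D > 0), \quad\text{where } P_D = \frac{N_D}{N}, \\
&\approx \left\{P_D\frac{N^2}{n}\left(1 - \frac{n}{N}\right)S_D^2 + \frac{N_D^2(1 - P_D)}{n^2P_D^2}S_D^2\right\}\Pr(n_D > 0).
\end{aligned} \tag{2.6.13}$$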
$$\hat{p}_D = \frac{n_D}{n} \quad\text{and}\quad s_D^2 = \frac{1}{n_D - 1}\left\{\sum_{i=1}^{n} y_i^{*2} - n_D^{-1}\left(\sum_{i=1}^{n} y_i^{*}\right)^2\right\}$$
$$P(n = t) = \frac{\binom{NP}{m-1}\binom{NQ}{t-m}}{\binom{N}{t-1}} \times \frac{(NP - m + 1)}{(N - t + 1)}, \quad t = m, m+1, \ldots, m + NQ. \tag{2.7.1}$$
Such a distribution is called the negative hypergeometric distribution and we have
$\sum_{t=m}^{m+NQ} P(n = t) = 1$. Then we have the following theorem.
$$\text{estimate of } P^2 = \frac{(N-1)(m-1)(m-2)}{N(n-1)(n-2)} + \frac{(m-1)}{N(n-1)}.$$
$$v(\hat{p}) = \{\hat{p}\}^2 - \text{estimate of } P^2 = \frac{(m-1)^2}{(n-1)^2} - \frac{(N-1)(m-1)(m-2)}{N(n-1)(n-2)} - \frac{(m-1)}{N(n-1)}.$$
$$P(n; m, P) = \binom{n-1}{m-1}P^mQ^{n-m} \quad\text{for } n = m, m+1, m+2, \ldots$$
Then we have the following theorem.
Theorem 2.7.3. An unbiased estimator of the required proportion $P$ of a rare
attribute is
$$\hat{p} = \frac{m-1}{n-1} \tag{2.7.4}$$
and an estimator of $V(\hat{p})$ is given by
$$v(\hat{p}) = \frac{\hat{p}(1 - \hat{p})}{n-2}. \tag{2.7.5}$$
Proof. Obvious for large $N$ from the previous theorems.
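The inverse sampling estimators can be sketched in code as follows. The large-population branch implements (2.7.4) and (2.7.5); the finite-population branch assumes the variance estimator has the form $\hat{p}^2$ minus an unbiased estimate of $P^2$ under the negative hypergeometric distribution, which is a reconstruction rather than a formula quoted verbatim. The function name and the numerical inputs are hypothetical.

```python
from fractions import Fraction


def inverse_sampling_estimate(m, n, N=None):
    """Estimate a rare proportion when sampling stops at the m-th unit with the attribute.

    m : number of units with the attribute at which sampling stops
    n : total number of units drawn (n > 2)
    N : population size; None means the large-population formulas (2.7.4)-(2.7.5)
    """
    p_hat = Fraction(m - 1, n - 1)
    if N is None:
        v_hat = p_hat * (1 - p_hat) / (n - 2)
    else:
        # Assumed form: estimate of P^2 for a finite population of size N
        p2_hat = (Fraction((N - 1) * (m - 1) * (m - 2), N * (n - 1) * (n - 2))
                  + Fraction(m - 1, N * (n - 1)))
        v_hat = p_hat ** 2 - p2_hat
    return p_hat, v_hat


# Hypothetical survey: the 5th unit with the attribute appears at the 50th draw
p_large, v_large = inverse_sampling_estimate(m=5, n=50)
p_finite, v_finite = inverse_sampling_estimate(m=5, n=50, N=10 ** 9)
print(float(p_large), float(v_large), float(v_finite))
```

With a very large `N` the two variance estimates agree closely, which matches the statement that the theorem follows for large $N$ from the finite population results.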
While using the simple random sampling without replacement (SRSWOR) design,
the number of possible samples, $\binom{N}{n}$, is very large, even for moderate sample and
population sizes. For example:

Number of samples
            N = 30          N = 40
n = 5       142,506         658,008
n = 10      30,045,015      847,660,528
n = 15      155,117,520     40,225,345,056
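The counts in this table are binomial coefficients and can be regenerated with the standard library. This is a minimal sketch; the dictionary name `sample_counts` is an assumption.

```python
from math import comb

# Number of possible SRSWOR samples of size n from a population of size N
sample_counts = {(N, n): comb(N, n) for N in (30, 40) for n in (5, 10, 15)}

for n in (5, 10, 15):
    print(n, f"{sample_counts[(30, n)]:,}", f"{sample_counts[(40, n)]:,}")
```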
Sometimes in field surveys, all the possible samples are not equally preferable
from the operational point of view, because a few of them may be inaccessible,
expensive, or inconvenient. It is therefore advantageous if the sampling design
is such that the total number of possible samples is much less than $\binom{N}{n}$, while retaining
the unbiasedness properties of the sample mean and sample variance for their
respective population parameters.
The relevant background includes Neyman's (1923) notation for causal effects in
randomized experiments and Fisher's (1925) proposal to actually randomize
treatments to units. Neyman (1923) appears to have been the first to provide a
mathematical analysis for a randomized experiment with explicit notation for the
potential outcomes, implicitly making the stability assumption. This notation
became standard for work in randomized experiments (e.g., Pitman, 1937; Welch,
1937; McCarthy, 1939; Anscombe, 1948; Kempthorne, 1952; Brillinger, Jones, and
Tukey, 1978; Hodges and Lehmann, 1970, and dozens of other places, often
assuming constant treatment effects as in Cox, 1958, and sometimes being used
quite informally as in Freedman, Pisani, and Purves, 1978). Neyman's formalism
was a major advance because it allowed explicit probabilistic inferences to be
drawn from data, where the probabilities were explicitly defined by the randomized
assignment mechanism. Independently and nearly simultaneously, Fisher (1925)
invented a somewhat different method of inference for randomized experiments,
also based on the special class of randomized assignment mechanisms. Fisher's test
and the resulting 'significance levels' (i.e., p values) remained the accepted rigorous
standard for the analysis of randomized clinical trials at the end of the twentieth
century, the so-called 'intent to treat' analyses. The notion of the central role of
randomized experiments seems to have been 'in the air' in the 1920s, but Fisher was
the first to combine physical randomization with a theoretical analysis tied to it. A
review on randomization is also available by Fienberg and Tanur (1987). These
ideas were primarily associated with the notion of fairness and objectivity in their
earlier work. The role of the International Statistical Institute in the earlier work
related to sample surveys was reviewed by Smith and Sugden (1985). Fienberg and
Tanur (1987) explored some of the developments following from their earlier
pioneering work with an emphasis on the parallels between the methodologies in
the design of experiments and the design of sample surveys.
Chakrabarti (1963) initiated the idea that the results on the existence and
construction of balanced sampling designs can be easily translated to the language
of design theory by using the correspondence between sampling designs and block
designs. Bellhouse (1984a) also worked on these lines and has shown that a
systematic application of the treatments minimises the variance of the treatment
constant averaged over the application of the treatment. The lack of cross reference
in the review papers by Cox (1984) and Smith (1984) suggested that the
specialisation extends even to compartmentalisation within the minds and
professional lives of outstanding investigators, for both these authors have been
steeped in the tradition of parallels.
For example, consider a balanced incomplete block design (BIBD) with standard
parameters $(b, v, r, k, \lambda)$, where $v$ denotes the number of varieties, $b$ the number of
blocks, $k$ the block size, $r$ the number of times each treatment occurs and $\lambda$ the
number of times any pair of treatments occur together in a block. In practice
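$$I_{si} = \begin{cases} 1 & \text{if } i \in s\\ 0 & \text{if } i \notin s\end{cases} \quad\text{such that}\quad E(I_{si}) = \frac{r}{b},$$
and
$$I_{sij} = \begin{cases} 1 & \text{if } i, j \in s\\ 0 & \text{otherwise}\end{cases} \quad\text{such that}\quad E(I_{sij}) = \frac{\lambda}{b},$$
such that
$$E(\bar{y}) = E\left[\frac{1}{n}\sum_{i=1}^{N} Y_iI_{si}\right] = \frac{1}{n}\sum_{i=1}^{N} Y_iE(I_{si}) = \frac{1}{n}\sum_{i=1}^{N} Y_i\frac{r}{b} = \frac{r}{nb}\sum_{i=1}^{N} Y_i = \bar{Y}$$
because $vr = bk$.
Similarly using $r(k-1) = \lambda(v-1)$ and $bk = vr$ we have
$$\begin{aligned}
E\left(\sum_{i \ne j}^{n} y_iy_j\right) &= E\left(\sum_{i \ne j}^{N} Y_iY_jI_{sij}\right) = \frac{\lambda}{b}\sum_{i \ne j}^{N} Y_iY_j = \frac{r(k-1)}{b(v-1)}\sum_{i \ne j}^{N} Y_iY_j \\
&= \frac{k(k-1)}{v(v-1)}\sum_{i \ne j}^{N} Y_iY_j = \frac{n(n-1)}{N(N-1)}\sum_{i \ne j}^{N} Y_iY_j.
\end{aligned}$$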
Theorem 2.8.1. Under the controlled sampling design, the sample mean and sample
variance remain unbiased for their respective parameters.
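A small numerical check may help to see what this theorem claims. The sketch below takes the seven blocks of the BIBD with parameters $(b, v, r, k, \lambda) = (7, 7, 3, 3, 1)$ as the only admissible samples of size $n = 3$ from $N = 7$ units, each selected with probability $1/7$, and compares the design expectations with those under SRSWOR over all 35 samples. The population values and function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Blocks of the (7, 7, 3, 3, 1) design, units labelled 0..6: every unit
# appears in r = 3 blocks and every pair of units in exactly one block.
FANO_BLOCKS = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
               (1, 4, 6), (2, 3, 6), (2, 4, 5)]


def mean(values):
    return sum(values, Fraction(0)) / len(values)


def variance(values):
    """Variance with divisor (number of values - 1)."""
    m = mean(values)
    return sum(((y - m) ** 2 for y in values), Fraction(0)) / (len(values) - 1)


def design_expectations(population, samples):
    """Expected sample mean and sample variance when all samples are equally likely."""
    means = [mean([population[i] for i in s]) for s in samples]
    variances = [variance([population[i] for i in s]) for s in samples]
    return mean(means), mean(variances)


population = [Fraction(y) for y in (3, 8, 1, 12, 7, 5, 10)]  # hypothetical values
controlled = design_expectations(population, FANO_BLOCKS)
srswor = design_expectations(population, list(combinations(range(7), 3)))
print(controlled, srswor)
```

Only 7 samples are used instead of 35, yet both expectations coincide with the population mean and $S_y^2$, because the first and second order inclusion probabilities $r/b$ and $\lambda/b$ equal those of SRSWOR.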
Subramani and Tracy (1993) used the concept of incomplete block design in sample
surveys and introduced a new sampling scheme called determinant sampling. This
scheme totally ignores the units close to each other for selection in the sample. In
the preceding discussion, the units which are close to each other in some sense are
called contiguous units. Chakrabarti (1963) excluded contiguous units when
translating the results of sampling designs to experimental designs since these units
have a tendency to provide identical information which may be induced by factors
like time, category or location. As an example, in socio economic surveys people
have a tendency to exhibit similar expenditure patterns on household items during
different weeks of the month. Moreover people belonging to the same income
category class have a greater tendency to have similar expenditure patterns. With
regard to the factor location, residents of a specific area show similar symptoms of a
disease caused by environmental pollution as of some infectious disease. Similarly
in crop field surveys contiguous farms and fields should be avoided. Because of
this limitation, Rao (1975, 1987) has suggested that if contiguous units occur in any
observed sample, they may be collapsed into a single unit, with the corresponding
response as the average observed response over these units. An estimate of the
unknown parameter is then recommended on the basis of such a reduced sample.
The situations for getting more information on the population by avoiding pairs of
contiguous units in the observed sample are well summarised by Hedayat, Rao, and
Stufken (1988). Tracy and Osahan (1994a) further extended their work to other
sampling schemes.
EXERCISES
Exercise 2.1. Define simple random sampling. Is the sample mean a consistent or
unbiased estimator of the population mean? Derive the variance of the estimator
using ( a ) SRSWR sampling ( b ) SRSWOR sampling. Also derive an unbiased
estimator of variance in each situation.
Exercise 2.2. A population consists of $N$ units, the value of one unit being known to
be $Y_1$. An SRSWOR sample of $(n-1)$ units is drawn from the remaining $(N-1)$
population units. Show that the estimator
$$\hat{Y}_1 = Y_1 + (N-1)\bar{y}_{n-1},$$
where $\bar{y}_{n-1} = (n-1)^{-1}\sum_{i=1}^{n-1} y_i$, is an unbiased estimator of the population total, $Y = \sum_{i=1}^{N} Y_i$,
but the variance of the estimator $\hat{Y}_1$ is not less than the variance of the estimator
$\hat{Y}_2 = N\bar{y}_n$, where $\bar{y}_n = n^{-1}\sum_{i=1}^{n} y_i$ is an estimator of the population mean based on the sample
of size $n$ selected from the population of $N$ units. In other words, the estimator $\hat{Y}_1$
is no more efficient than $\hat{Y}_2$. Give reasons.
Hint: By setting $V(\hat{Y}_1) \ge V(\hat{Y}_2)$ we obtain $N > n$ which is always true for SRSWOR.
Exercise 2.3. Suppose in the list of $N$ businesses serially numbered, $k$ businesses
are found to be dead and $t$ new businesses came into existence making the total
number of businesses $(N - k + t)$. Give a simple procedure for selecting a business
with equal probability from the $(N - k + t)$ businesses, avoiding renumbering of the
original businesses, and show that the newly developed procedure achieves equal
probability for the new businesses too.
Hint: Using SRSWOR sampling the probability of selecting each unit will be
$(N - k + t)^{-1}$.
Exercise 2.4. Show that the bias in the Searls' estimator defined as, $\bar{y}_{\mathrm{Searl}} = \lambda\bar{y}_n$, is
$B(\bar{y}_{\mathrm{Searl}}) = -\bar{Y}\,V(\bar{y}_n)/\{\bar{Y}^2 + V(\bar{y}_n)\}$. Hence deduce its values under SRSWR and
SRSWOR.
Hint: Reddy (1978a).
Exercise 2.5. An analogue to the Searls' estimator for estimating the population
proportion is defined as, $\hat{p}_{\mathrm{Searl}} = \gamma\hat{p}_y$, where $\gamma$ is a constant. Find the minimum
mean square error of the estimator $\hat{p}_{\mathrm{Searl}}$ under SRSWR and SRSWOR sampling.
Also study the bias of the estimator in each situation.
Hint: Conti (1995).
i=[I+ 11
(N
N- I
)(s;,/y 2 )tJ
Show that $\hat{\lambda}$ is a consistent estimator of the optimum value of $\lambda$. Also calculate
the bias and mean squared error, to the first order of approximation, in the estimator
of population mean defined as, $\bar{y}_0 = \hat{\lambda}\bar{y}_n$. Deduce the results for estimating
population proportion with the estimator, $\hat{p}_{\mathrm{Searl}} = \hat{\gamma}\hat{p}_y$, where $\hat{\gamma}$ is a consistent
estimator of $\gamma$.
Hint: Mangat, Singh, and Singh (1991).
Exercise 2.7. Show that: ( a ) under SRSWR sampling $s_y^2$ is an unbiased estimator
of $\sigma_y^2$, ( b ) under SRSWOR sampling $s_y^2$ is an unbiased estimator of $S_y^2$.
Exercise 2.8. Define the Searls' estimator of population mean. Show that the
relative efficiency of the Searls' estimator is a decreasing function of sample size
under ( a ) SRSWR ( b ) SRSWOR sampling designs.
Exercise 2.9. Show that the probability of selecting the $i$th unit in the $s$th sample
remains the same under SRSWR and SRSWOR and is given by $1/N$.
Exercise 2.10. Why is the Searls' estimator not useful in actual practice? Suggest
some modifications to make it practicable.
Hint: Use $\hat{\lambda}$ in place of $\lambda$.
Exercise 2.11. In case of SRSWR sampling, if there are two characters Y and X,
the covariance between Y and X is defined as
$$\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X}).$$
Then the usual estimator of $\sigma_{xy}$ is given by
$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}).$$
Show that an estimator better than $s_{xy}$ based only on distinct units is:
Exercise 2.12. ( a ) Show that the usual estimator of the population total (namely
$N\bar{y}$) in SRSWOR has average minimum mean squared error, for permutations of
values attached to the units, in the general class of linear translation invariant
estimators of the population total Y.
( b ) Show that for SRSWOR sampling of size $n$, the estimator which minimises
the average mean squared error, for permutations of values attached to the units, in
the class of all linear estimators is given by
$$t_c = \frac{N}{n(1+\delta)} \sum_{i \in s} y_i,$$
where $\delta = \frac{(N-n)}{(N-1)\,n} C_y^2$ and $C_y$ is the known population coefficient of variation.
Hint: Ramakrishnan and Rao (1975).
Exercise 2.13. Let a finite population consist of N units. To every unit there is
attached a characteristic y. The characteristics are assumed to be measured on a
given scale with distinct points $y_1, y_2, \ldots, y_T$. Let $N_t$ be the number of units
associated with scale point $y_t$, with $N = \sum_t N_t$. A simple random sample of size $n$
where $n_t$ is the number of times the value $y_t$ is observed in the sample.
Hint: Hartley and Rao (1968).
Exercise 2.14. Suppose we selected a sample of size $n$ such that the $i$th unit of the
population occurs $f_i$ times in the sample. Assume that $n_1$ of these units ($n_1 < n$) are
selected with frequency one. Evidently $n = n_1 + \sum_{i=1}^{r} f_i$, where $r$ is the number of
units occurring $f_i$ times in the sample. Let $d\ (= n_1 + r)$ be the number of distinct
units in the sample. The $d$ units are measured by one set of investigators and the $r$
repeated units by another set, preferably by the supervising staff. Let the measurements
of the $d$ units be denoted by $x_1, x_2, \ldots, x_{n_1}$ for the non-repeated ones and
$x_{n_1+1}, x_{n_1+2}, \ldots, x_{n_1+r}$ for the repeated ones. Let the measurements of the $r$ repeated
units be denoted by $z_1, z_2, \ldots, z_r$. Using the above information and notation, study the
asymptotic properties of the following estimators of the population mean:
( a ) $\bar{x}_d = \frac{1}{d}\left(n_1 \bar{x}_{n_1} + r \bar{x}_r\right)$; ( b ) $\bar{z}_R = \bar{z}_r \left(\frac{\bar{x}_d}{\bar{x}_r}\right)$;
( c ) $\bar{z}_{lr} = \bar{z}_r + \beta(\bar{x}_d - \bar{x}_r)$;
Exercise 2.15. Discuss the problem of the estimation of domain total in survey
sampling. Derive the estimator of domain total and find its variance under different
situations.
Exercise 2.16. Under SRSWR sampling, show that the distinct units based unbiased
estimators of the finite population variance $\sigma_y^2$ are given by
(Nil) J (II- I)
( a) •
VI =
J= I
N"-I(N- I) Y
S
2•
, ( b) V2 =[(~ - ~ ) +NI-II(I - ~)}3;
. _ Cv_l(n - l) 2. ( d) ~ _
(N
il)J (II -1) [1- Cv(n- I)] 2.
( C ) v3 ~4 -
J- I
II-I ( )
- ( )
c, n
sd,
N N -I c, (n ) Sd ,
Exercise 2.17. Discuss the method and theory of the estimation of rare attributes in
survey sampling.
Exercise 2.18. Write a program in FORTRAN or SAS to find the values of the
coefficients $C_\nu(n-1)$ and $C_\nu(n)$. Test the results for $n = 5$ and $\nu = 3$ with all steps
using your desk calculator.
where $c_i$ is a constant depending on the $i$th draw and $y_i$ is the value of $y$ on the unit
selected at the $i$th draw.
( a ) Show that $\bar{y}_{new}$ is unbiased for the population mean $\bar{Y}$ if and only if
$\sum_{i=1}^{n} c_i = 1$.
( c ) Show that $V(\bar{y}_{new})$ is minimised subject to the condition $\sum_{i=1}^{n} c_i = 1$ if and only if
$c_i = 1/n$, $i = 1, 2, \ldots, n$.
Hint: $\sum_{i=1}^{n} c_i^2 \geq \left(\sum_{i=1}^{n} c_i\right)^2 / n = 1/n$ and equality holds if and only if $c_i = 1/n$.
( a ) Show that both estimators y" and YII/ are unbiased for the population mean $\bar{Y}$.
Exercise 2.21. Discuss controlled sampling. Show that the sample mean and sample
variance remain unbiased for their respective parameters.
Exercise 2.22. Discuss the concept of a rare attribute and give a possible solution
using inverse sampling.
PRACTICAL PROBLEMS
Practical 2.1. Consider the problem of estimation of the total number of fish caught
by marine recreational fishermen at the Atlantic and Gulf coasts. We know that there
were 69 species caught during 1992 as shown in population 4 in the Appendix.
What is the minimum number of species groups to be selected by SRSWR sampling
to attain a relative standard error of 12%?
Given: $S_y^2$ = 31,010,599 and Y = 291,882.
Practical 2.2. Your supervisor has suggested that you think about the problem of
estimating the total number of fish caught by marine recreational fishermen at the
Atlantic and Gulf coasts. He told you that there were 69 species caught during 1993
as shown in population 4 in the Appendix. He needs your help in deciding the
sample size using an SRSWOR design with a relative standard error of 25%. How can
your knowledge of statistics help him?
Given: $S_y^2$ = 39,881,874 and Y = 316,784.
Practical 2.3. The demand for bluefish has been found to be highest in certain
markets. In order to supply these types of fish the estimation of the proportion of
bluefish is an important issue. At the Atlantic and Gulf coasts, in a large sample of
311,528 fish there were shown to be 10,940 bluefish caught during 1995. What is
the minimum number of fish to be selected by SRSWR sampling to attain a
relative standard error of 12%?
Practical 2.4. John considers the problem of estimation of the total number of fish
caught by marine recreational fishermen at the Atlantic and Gulf coasts. There were 69
species caught during 1994 as shown in population 4 in the Appendix. John
selected a sample of 20 units by SRSWR sampling. What will be his gain in
efficiency if he considers the Searls' estimator instead of the usual estimator?
Given: $S_y^2$ = 49,829,270 and Y = 341,856.
Practical 2.5. Select an SRSWR sample of twenty units from population 4 given in
the Appendix. Collect the information on the number of fish during 1994 in each of
the species groups selected in the sample. Estimate the average number of fish
caught by marine recreational fishermen at the Atlantic and Gulf coasts during
1994. Construct a 95% confidence interval for the average number of fish in each
species group of the United States.
Practical 2.7. Select an SRSWR sample of 20 states using the Random Number Table
method from population 1 of the Appendix. Note the frequency of each state
selected in the sample. Construct a new sample by keeping only distinct states and
collect the information about the nonreal estate farm loans in these states. From the
information collected in the sample:
( a ) Estimate the average nonreal estate farm loans in the United States using
information from distinct units only.
( b ) Estimate the finite population variance of the nonreal estate loans in the United
States using distinct units only.
( c ) Estimate the average nonreal estate loans and its finite population variance by
including repeated units in the sample. Comment on the results.
Practical 2.8. A fisherman visited the Atlantic and Gulf coast and caught 6,000 fish
one by one. He noted the species group of each fish he caught and put the fish back
in the sea before making the next catch. He observed that 700 fish belonged
to the group Herrings.
( a ) Estimate the proportion of fish in the group Herrings living in the Atlantic and
Gulf coast.
( b ) Construct the 95% confidence interval.
Practical 2.9. During 1995 Michael visited the Atlantic and Gulf coast and caught
7,000 fish. He observed the species group of each one of the fish caught by him
using SRSWOR sampling and found that 1,068 fish belonged to the group Red
snapper.
( a ) Estimate the proportion of fish in the group Red snapper living in the Atlantic
and Gulf coast.
( b ) Construct the 95% confidence interval.
Given: Total number of fish living in the coast = 311,528.
Practical 2.10. Following the instructions of an ABC company, select an SRSWR
sample of 25 units from population 1 by using the 4th and 5th columns of the
Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix. Record the
states selected more than once in the sample. Reduce the sample size by keeping
each state only once in the sample and collect the information about the real estate
farm loans in these states. Use this information to:
( a ) Estimate the average real estate farm loans in the United States using
information from distinct units only.
( b ) Estimate the finite population variance of the real estate loans in the US using
information from distinct units only.
( c ) Estimate the average real estate loans and its finite population variance by
including repeated units in the sample. Comment on the results.
Practical 2.11. Think of a practical situation where you have to estimate a total
of a variable or characteristic of a subgroup (domain) of a population. Take a
sample of reasonable size from the population under study and collect the
information from the units selected in the sample. Apply the appropriate formulae
to construct the 95% confidence interval estimate.
Practical 2.12. A practical situation arises where you have to estimate a proportion
of a rare attribute in a population, e.g., extramarital relations. Collect the
information from the units selected in the sample through inverse sampling from the
population under study. Apply the appropriate formulae to construct the 95%
confidence intervals for the proportion of the rare attribute in the population.
Practical 2.13. A sample of 30 out of 100 managers was taken, and they were
asked whether or not they usually take work home. The responses of these
managers are given below where ' Yes' indicates they usually take work home and
'No' means they do not.
Construct 95% confidence intervals for the proportion of all managers who take
work home using the following sampling schemes:
( a ) Simple Random Sampling With Replacement;
( b ) Simple Random Sampling Without Replacement.
Practical 2.14. From a list of 80,000 farms in a state, a sample of 2,100 farms was
selected by SRSWOR sampling. The data for the number of cattle for the sample
were as follows:
$$\sum_{i=1}^{n} y_i = 38,000 \quad \text{and} \quad \sum_{i=1}^{n} y_i^2 = 920,000.$$
Estimate from the sample the total number of cattle in the state, the average number
of cattle per farm, along with their standard errors, coefficient of variation and 95%
confidence interval.
Practical 2.15. At St. Cloud State University, the length of hair, y, on the heads
of girls is assumed to be uniformly distributed between 5 cm and 25 cm with the
probability density function
$$f(y) = \frac{1}{20} \quad \forall \; 5 < y < 25.$$
( a ) We wish to estimate the average length of hair with a relative standard error
of 5%. What is the required minimum number of hairs to be taken from the girls?
( b ) Select a sample of the required size, and use it to construct a 95% confidence
interval for the average length of hair.
Practical 2.16. The distribution of weight y shipped to 1000 locations has a logistic
distribution
$$f(y) = \frac{1}{4\beta}\,\mathrm{sech}^2\left\{\frac{1}{2\beta}(y - \alpha)\right\}$$
with $\alpha = 10$ and $\beta = 0.5$.
( a ) Find the value of the minimum sample size n required to estimate the average
weight shipped with an accuracy of standard error of 0.05%.
( b ) Select a sample of the required size and construct a 95% confidence interval for
the average weight shipped.
( c ) Does the true average weight lie in the 95% confidence interval?
Practical 2.17. Assume that the life of every person is made of an infinite number
of good and bad events. Count the total number of good and bad events you
remember that have happened to you. Estimate the proportion of good events in
your life. Construct a 95% confidence interval estimate. Name the sampling
scheme you adopted to estimate the proportion of good happenings, and comment.
Practical 2.18. Assume that everyone dreams an infinite number of times during
sleeping hours in life. Count the number of good and bad dreams in your life
that you remember. Estimate the proportion of good dreams and construct a 95%
confidence interval estimate. Name the sampling scheme you followed to estimate
the proportion of good dreams, and comment.
Practical 2.19. Dr. Dreamer believes that if a person has good dreams during
sleeping hours then he/she is a mentally healthier and more pleasant person. You are
instructed to report stories of your dreams to the doctor until you have had 15
good dreams. Find Dr. Dreamer's 95% confidence interval estimate of the
proportion of good dreams in your life. Can you be considered a pleasant person?
Comment and name the sampling scheme used.
3. USE OF AUXILIARY INFORMATION: SIMPLE RANDOM
SAMPLING
3.0 INTRODUCTION
It is well known that suitable use of auxiliary information in probability sampling
results in considerable reduction in the variance of the estimators of population
parameters, viz. the population mean (or total), median, variance, regression coefficient,
and population correlation coefficient. In this chapter we will consider the
problem of estimation of different population parameters of interest to survey
statisticians using known auxiliary information under SRSWOR and SRSWR
sampling schemes only. Before proceeding further it is necessary to define some
notation and expected values, which will be useful throughout this chapter.
Assume that a simple random sample (SRS) of size $n$ is drawn from the given
population of $N$ units. Let the value of the study variable $Y$ and the auxiliary
variable $X$ for the $i$th unit ($i = 1, 2, \ldots, N$) of the population be denoted by $Y_i$ and $X_i$
and for the $i$th unit in the sample ($i = 1, 2, \ldots, n$) by $y_i$ and $x_i$, respectively. From the
sample observations we have
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2, \quad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$
and
$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}).$$
$$\mu_{rs} = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^r (X_i - \bar{X})^s, \quad \text{and} \quad \lambda_{rs} = \frac{\mu_{rs}}{\mu_{20}^{r/2}\,\mu_{02}^{s/2}}.$$
Note that $\mu_{20} = S_y^2$, $\mu_{02} = S_x^2$ and $\mu_{11} = S_{xy}$, so that
$C_y^2 = S_y^2/\bar{Y}^2 = \mu_{20}/\bar{Y}^2$, $C_x^2 = S_x^2/\bar{X}^2 = \mu_{02}/\bar{X}^2$, and
$\rho_{xy} = S_{xy}/(S_x S_y) = \mu_{11}/\left(\sqrt{\mu_{20}}\sqrt{\mu_{02}}\right)$.
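The moment notation above can be checked numerically. The following is a minimal sketch, not part of the original text: the five-unit population and the helper names `mu` and `lam` are hypothetical and chosen only for illustration.

```python
from math import sqrt

def mu(y, x, r, s):
    """Bivariate central moment mu_rs with divisor N - 1, following the definition above."""
    N = len(y)
    y_bar = sum(y) / N
    x_bar = sum(x) / N
    return sum((yi - y_bar) ** r * (xi - x_bar) ** s for yi, xi in zip(y, x)) / (N - 1)

def lam(y, x, r, s):
    """Standardised moment lambda_rs = mu_rs / (mu_20^(r/2) * mu_02^(s/2))."""
    return mu(y, x, r, s) / (mu(y, x, 2, 0) ** (r / 2) * mu(y, x, 0, 2) ** (s / 2))

# Hypothetical five-unit population (illustrative values only).
y = [9, 11, 13, 16, 21]
x = [14, 15, 19, 20, 24]

S_y2 = mu(y, x, 2, 0)                 # mu_20 = S_y^2
S_x2 = mu(y, x, 0, 2)                 # mu_02 = S_x^2
S_xy = mu(y, x, 1, 1)                 # mu_11 = S_xy
rho_xy = S_xy / sqrt(S_x2 * S_y2)     # population correlation coefficient
C_y2 = S_y2 / (sum(y) / len(y)) ** 2  # squared coefficient of variation of y

print(S_y2, S_x2, S_xy, round(rho_xy, 4), round(lam(y, x, 1, 1), 4))
```

By construction $\lambda_{11}$ coincides with $\rho_{xy}$, which is a quick sanity check on any implementation of these moments.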
Let us define
$$\varepsilon_0 = \frac{\bar{y}}{\bar{Y}} - 1, \quad \varepsilon_1 = \frac{\bar{x}}{\bar{X}} - 1,$$
The next section is devoted to estimating the population mean in the presence
of known auxiliary information.
Several estimators of the population mean are available in the literature and we will
discuss some of them.
3.2.1 RATIO ESTIMATOR
Cochran (1940) was the first to show the contribution of known auxiliary
information in improving the efficiency of the estimator of the population mean $\bar{Y}$
in survey sampling. Assuming that the population mean $\bar{X}$ of the auxiliary variable
is known, he introduced a ratio estimator of the population mean $\bar{Y}$ defined as
$$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right). \qquad (3.2.1.1)$$
Theorem 3.2.1.1. The bias in the ratio estimator $\bar{y}_R$ of the population mean $\bar{Y}$, to
the first order of approximation, is
Assuming $|\varepsilon_1| < 1$ and using the binomial expansion of the term $(1 + \varepsilon_1)^{-1}$ we have
where $O(\varepsilon_1)$ denotes the higher order terms of $\varepsilon_1$. Note that $|\varepsilon_1| < 1$, so $\varepsilon_1^g \to 0$ as
$g > 1$ increases. Therefore the terms in (3.2.1.4) with higher powers of $\varepsilon_1$ are
negligible and can be ignored. Now taking expected values on both sides of
(3.2.1.4) and using the results from section 3.1 we obtain
Thus the bias in the estimator $\bar{y}_R$ to the first order of approximation is given by
(3.2.1.2). Hence the theorem.
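The steps of this proof can be sketched as follows. This is a reconstruction under the standard first-order SRSWOR results, which are assumed here rather than quoted from section 3.1: $E(\varepsilon_0) = E(\varepsilon_1) = 0$, $E(\varepsilon_1^2) = \left(\frac{1-f}{n}\right)C_x^2$ and $E(\varepsilon_0\varepsilon_1) = \left(\frac{1-f}{n}\right)\rho_{xy}C_xC_y$.

```latex
\begin{align*}
\bar{y}_R &= \bar{Y}(1+\varepsilon_0)(1+\varepsilon_1)^{-1}
           = \bar{Y}\left(1 + \varepsilon_0 - \varepsilon_1 + \varepsilon_1^2
             - \varepsilon_0\varepsilon_1 + \cdots\right), \\
E(\bar{y}_R) &\approx \bar{Y}\left[1 + \left(\frac{1-f}{n}\right)
             \left(C_x^2 - \rho_{xy} C_x C_y\right)\right], \\
B(\bar{y}_R) &= E(\bar{y}_R) - \bar{Y}
             \approx \left(\frac{1-f}{n}\right)\bar{Y}
             \left(C_x^2 - \rho_{xy} C_x C_y\right).
\end{align*}
```

Under these assumptions the first-order bias vanishes when $\rho_{xy} C_y = C_x$ and shrinks at the rate $1/n$ otherwise.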
Theorem 3.2.1.2. The mean squared error of the ratio estimator $\bar{y}_R$ of the
population mean $\bar{Y}$, to the first order of approximation, is given by
$$MSE(\bar{y}_R) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy}C_yC_x\right]. \qquad (3.2.1.6)$$
Proof. By the definition of mean squared error (MSE) and using (3.2.1.4) we have
$$MSE(\bar{y}_R) = E\left[\bar{y}_R - \bar{Y}\right]^2 \approx E\left[\bar{Y}\left(1 + \varepsilon_0 - \varepsilon_1 + \varepsilon_1^2 - \varepsilon_0\varepsilon_1 + O(\varepsilon^2)\right) - \bar{Y}\right]^2$$
$$\approx \bar{Y}^2 E\left[\varepsilon_0 - \varepsilon_1 + \varepsilon_1^2 - \varepsilon_0\varepsilon_1\right]^2.$$
Again neglecting higher order terms and using results from section 3.1 the MSE to
the first order of approximation is given by
By substituting the values of $C_y$, $C_x$ and $\rho_{xy}$ in (3.2.1.6), one can easily see that
the mean squared error of the estimator $\bar{y}_R$, to the first order of approximation, can
be written as
Theorem 3.2.1.3. An estimator of the mean squared error of the ratio estimator $\bar{y}_R$,
to the first order of approximation, is
Theorem 3.2.1.4. Another form of the estimator of the mean squared error of the
ratio estimator $\bar{y}_R$, to the first order of approximation, is
Theorem 3.2.1.5. The ratio estimator $\bar{y}_R$ is more efficient than the sample mean $\bar{y}$ if
$$\rho_{xy}\frac{C_y}{C_x} > \frac{1}{2}. \qquad (3.2.1.10)$$
Proof. The proof follows from the fact that the ratio estimator $\bar{y}_R$ is more efficient
than the sample mean $\bar{y}$ if
$$MSE(\bar{y}_R) < V(\bar{y})$$
or if
In the condition (3.2.1.10), if we assume that $C_y \approx C_x$, then it holds for all values
of the correlation coefficient $\rho_{xy}$ in the range (0.5, 1.0]. A Monte Carlo study of
the ratio estimator is available from Rao and Beegle (1967). Thus we have the
following theorem.
Theorem 3.2.1.6. The ratio estimator $\bar{y}_R$ is more efficient than the sample mean
$\bar{y}$ if $\rho_{xy} > 0.5$, i.e., if the correlation between X and Y is positive and high.
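The step from the MSE comparison to condition (3.2.1.10) can be written out as below. This is a sketch that assumes $V(\bar{y}) = \left(\frac{1-f}{n}\right)S_y^2 = \left(\frac{1-f}{n}\right)\bar{Y}^2C_y^2$ under SRSWOR and $C_x > 0$.

```latex
\begin{align*}
MSE(\bar{y}_R) < V(\bar{y})
&\iff \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y\right]
      < \left(\frac{1-f}{n}\right)\bar{Y}^2 C_y^2 \\
&\iff C_x^2 - 2\rho_{xy}C_xC_y < 0 \\
&\iff \rho_{xy}\frac{C_y}{C_x} > \frac{1}{2}.
\end{align*}
```

Setting $C_y = C_x$ in the last line gives $\rho_{xy} > 0.5$, which is the statement of Theorem 3.2.1.6.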
Example 3.2.1.1. Mr. Bean was interested in estimating the average amount of real
estate farm loans (in $000) during 1997 in the United States. He took an SRSWOR
sample of eight states from population 1 given in the Appendix. From the states
selected in the sample he gathered the following information.

State                                  CA        GA       LA       MS       NM       PA       TX        VT
Nonreal estate farm loans (X), $000    3928.732  540.696  405.799  549.551  274.035  298.351  3520.361  19.363
Real estate farm loans (Y), $000       1343.461  939.460  282.565  627.013  140.582  756.169  1248.761  57.747

The average amount $878.16 of nonreal estate farm loans (in $000) for the year
1997 is known. Apply the ratio method of estimation for estimating the average
amount of the real estate farm loans (in $000) during 1997. Also find an estimator
of the mean squared error of the ratio estimator and hence deduce a 95% confidence
interval.
Thus we have $n = 8$,
$$s_x^2 = \frac{\sum_{i=1}^{8}(x_i - \bar{x})^2}{8-1} = \frac{17382362.33}{7} = 2483194.6, \quad \bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{9536.888}{8} = 1192.11,$$
$$s_y^2 = \frac{\sum_{i=1}^{8}(y_i - \bar{y})^2}{8-1} = \frac{1675478.89}{7} = 239354.1, \quad \bar{y} = \frac{\sum_{i=1}^{8} y_i}{8} = \frac{5395.758}{8} = 674.469,$$
$$s_{xy} = \frac{\sum_{i=1}^{8}(y_i - \bar{y})(x_i - \bar{x})}{8-1} = \frac{4474294.08}{7} = 639184.86 \quad \text{and} \quad r = \frac{\bar{y}}{\bar{x}} = \frac{674.469}{1192.11} = 0.5658.$$
We are given $\bar{X} = 878.16$, $N = 50$ and $f = 0.16$.
Thus the ratio estimate of the average amount of real estate farm loans during 1997,
$\bar{Y}$ (say), is given by
$$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right) = 674.469\left(\frac{878.16}{1192.11}\right) = 496.86.$$
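The arithmetic of this example can be reproduced from the raw sample values. The sketch below is illustrative only: the variable names are ours, and the MSE estimate uses the form $\left(\frac{1-f}{n}\right)\left[s_y^2 + r^2s_x^2 - 2rs_{xy}\right]$ that appears later in this section for the 21-state sample.

```python
from statistics import mean, variance

# Sample of eight states from Example 3.2.1.1 (values in $000).
x = [3928.732, 540.696, 405.799, 549.551, 274.035, 298.351, 3520.361, 19.363]  # nonreal estate farm loans
y = [1343.461, 939.460, 282.565, 627.013, 140.582, 756.169, 1248.761, 57.747]  # real estate farm loans

N, X_bar = 50, 878.16   # population size and known population mean of x
n = len(y)
f = n / N               # sampling fraction

x_bar, y_bar = mean(x), mean(y)
s_x2, s_y2 = variance(x), variance(y)   # sample variances with divisor n - 1
s_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n - 1)

r = y_bar / x_bar                       # sample ratio
y_ratio = y_bar * X_bar / x_bar         # ratio estimate of the population mean
mse_hat = (1 - f) / n * (s_y2 + r ** 2 * s_x2 - 2 * r * s_xy)  # first-order MSE estimate

print(f"x_bar = {x_bar:.3f}, y_bar = {y_bar:.3f}, r = {r:.4f}")
print(f"ratio estimate = {y_ratio:.2f}, estimated MSE = {mse_hat:.1f}")
```

Carrying full precision gives a ratio estimate of about 496.84; the value 496.86 in the text corresponds to rounding $r$ to 0.5658 before multiplying by $\bar{X}$.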
Using the Table 2 given in the Appendix the 95% confidence interval is given by
Example 3.2.1.2. After applying the ratio method of estimation, Mr. Bean wants to
know if he achieved any gain in efficiency by using the ratio estimator. The amount
of real and nonreal estate farm loans (in $000) during 1997 in 50 different states of
the United States has been presented in population 1 of the Appendix. Find the
relative efficiency of the ratio estimator, for estimating the average amount of real
estate farm loans during 1997 by using known information on nonreal estate farm
loans during 1997, with respect to the usual estimator of the population mean, given
that the sample size is eight units.
Solution. From the description of the population, we have $Y_i$ = Amount (in $000)
of real estate farm loans in different states during 1997, $X_i$ = Amount (in $000) of
nonreal estate farm loans in different states during 1997, $\bar{Y} = 555.43$, $\bar{X} = 878.16$,
$S_y^2 = 342021.5$, $C_x^2 = 1.5256$, $C_y^2 = 1.1086$, $\rho_{xy} = 0.8038$, and $N = 50$.
Thus we have
$$MSE(\bar{y}_R) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y\right]$$
$$= \left(\frac{1-0.16}{8}\right)(555.43)^2\left[1.1086 + 1.5256 - 2 \times 0.8038\sqrt{1.1086 \times 1.5256}\right] = 17606.39.$$
Also
$$V(\bar{y}) = \left(\frac{1-f}{n}\right)S_y^2 = \left(\frac{1-0.16}{8}\right) \times 342021.5 = 35912.26.$$
Thus the percent relative efficiency (RE) of the ratio estimator $\bar{y}_R$ with respect to
the usual estimator $\bar{y}$ is given by
$$RE = \frac{V(\bar{y}) \times 100}{MSE(\bar{y}_R)} = \frac{35912.26 \times 100}{17606.39} = 203.97\%$$
which shows that the ratio estimator is more efficient than the usual estimator of the
population mean. It should be noted that, to the first order of approximation, the
relative efficiency does not depend upon the sample size.
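The relative efficiency computation can be sketched in a few lines. The function names below are hypothetical; the formulas are the first-order expressions (3.2.1.6) and $V(\bar{y}) = \left(\frac{1-f}{n}\right)S_y^2$, and the inputs are the population 1 summary values quoted in the example.

```python
from math import sqrt

def mse_ratio_first_order(n, N, Y_bar, Cy2, Cx2, rho):
    """First-order MSE of the ratio estimator, equation (3.2.1.6)."""
    f = n / N
    return (1 - f) / n * Y_bar ** 2 * (Cy2 + Cx2 - 2 * rho * sqrt(Cx2 * Cy2))

def var_sample_mean(n, N, Sy2):
    """Variance of the sample mean under SRSWOR."""
    f = n / N
    return (1 - f) / n * Sy2

# Population 1 summary values quoted in Example 3.2.1.2.
N, Y_bar, Sy2 = 50, 555.43, 342021.5
Cy2, Cx2, rho = 1.1086, 1.5256, 0.8038

mse_r = mse_ratio_first_order(8, N, Y_bar, Cy2, Cx2, rho)
var_y = var_sample_mean(8, N, Sy2)
re_n8 = 100 * var_y / mse_r

# The factor (1 - f)/n cancels in the ratio, so another sample size gives the same RE.
re_n20 = 100 * var_sample_mean(20, N, Sy2) / mse_ratio_first_order(20, N, Y_bar, Cy2, Cx2, rho)

print(f"MSE(ratio) = {mse_r:.2f}, V(mean) = {var_y:.2f}, RE = {re_n8:.2f}%")
```

The second call with $n = 20$ is there only to illustrate the closing remark of the example: the common factor $\left(\frac{1-f}{n}\right)$ drops out of the ratio.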
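Theorem 3.2.1.7. The minimum sample size for the relative standard error (RSE) to
be less than or equal to a fixed value $\phi$ is given by
$$n \geq \left[\frac{\phi^2\bar{Y}^2}{S_y^2 + R^2S_x^2 - 2RS_{xy}} + \frac{1}{N}\right]^{-1}. \qquad (3.2.1.11)$$
Proof. The relative standard error $\sqrt{MSE(\bar{y}_R)}/\bar{Y}$ is less than or equal to $\phi$ if
$$\left(\frac{1}{n} - \frac{1}{N}\right)\left(C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y\right) \leq \phi^2$$
or
$$n \geq \left[\frac{\phi^2}{C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y} + \frac{1}{N}\right]^{-1} \quad \text{or} \quad n \geq \left[\frac{\phi^2\bar{Y}^2}{S_y^2 + R^2S_x^2 - 2RS_{xy}} + \frac{1}{N}\right]^{-1}.$$
Hence the theorem.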
Example 3.2.1.3. Mr. Bean wishes to estimate the average real estate farm loans in
the United States with the help of the ratio method of estimation by using known
information about the nonreal estate farm loans as shown in population 1 in the
Appendix. What is the minimum sample size required for the relative standard error
(RSE) to be equal to 12.5%?
Solution. From the description of population 1 given in the Appendix, we have
$N = 50$, $\bar{Y} = 555.43$, $\bar{X} = 878.16$, $S_y^2 = 342021.5$, $S_x^2 = 1176526$, $S_{xy} = 509910.41$,
$R = \bar{Y}/\bar{X} = 555.43/878.16 = 0.63249$ and $\phi = 0.125$; thus the minimum sample size is
$$n \geq \left[\frac{\phi^2\bar{Y}^2}{S_y^2 + R^2S_x^2 - 2RS_{xy}} + \frac{1}{N}\right]^{-1}$$
$$= \left[\frac{0.125^2 \times (555.43)^2}{342021.5 + 0.63249^2 \times 1176526 - 2 \times 0.63249 \times 509910.41} + \frac{1}{50}\right]^{-1} = 20.51 \approx 21.$$
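The sample size rule (3.2.1.11) is easy to wrap in a small helper. This is a sketch with a hypothetical function name; it rounds the real-valued bound up to the next integer, as the example does.

```python
from math import ceil

def min_sample_size_ratio(phi, N, Y_bar, X_bar, Sy2, Sx2, Sxy):
    """Smallest n with RSE of the ratio estimator <= phi, following (3.2.1.11)."""
    R = Y_bar / X_bar                                  # population ratio
    residual_var = Sy2 + R ** 2 * Sx2 - 2 * R * Sxy    # S_y^2 + R^2 S_x^2 - 2 R S_xy
    n_real = 1.0 / (phi ** 2 * Y_bar ** 2 / residual_var + 1.0 / N)
    return n_real, ceil(n_real)

# Population 1 values quoted in Example 3.2.1.3.
n_real, n_min = min_sample_size_ratio(
    phi=0.125, N=50, Y_bar=555.43, X_bar=878.16,
    Sy2=342021.5, Sx2=1176526, Sxy=509910.41,
)
print(f"n >= {n_real:.2f}, so take n = {n_min}")
```

Because of the $1/N$ term the bound can never exceed $N$, which reflects the finite population correction.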
Solution. Note that the population size is 50. Mr. Bean started with the first two
columns of the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix
and selected the following 21 distinct random numbers between 1 and 50: 01, 23,
46, 04, 32, 47, 33, 05, 22, 38, 29, 40, 03, 36, 27, 19, 14, 42, 48, 06, and 07.

Random No.  State  $x_i$      $y_i$      $(x_i-\bar{x})^2$  $(y_i-\bar{y})^2$  $(x_i-\bar{x})(y_i-\bar{y})$
01          AL      348.334    408.978    303627.6    21302.6      80424.302
03          AZ      431.439     54.633    218948.3   250299.3     234099.570
04          AR      848.317    907.700      2605.2   124445.1     -18005.653
05          CA     3928.732   1343.461   9177106.0   621777.6    2388748.500
06          CO      906.281    315.809        47.9    57179.9      -1655.427
07          CT        4.373      7.130    800998.3   300087.3     490274.840
14          IN     1022.782   1213.024     15233.5   433084.8      81224.255
19          ME       51.539      8.849    718797.2   298206.9     462979.800
22          MI      440.518    323.028    210534.2    53779.6     106406.960
23          MN     2466.892   1354.768   2457163.0   639737.2    1253769.700
27          NE     3585.406   1337.852   7214853.0   612963.4    2102960.000
29          NH        0.471      6.044    807998.0   301278.3     493388.550
Continued.
Given $N = 50$ and $\bar{X} = 878.16$. Now from the above table, $n = 21$, $\bar{y} = 554.93223$,
$\bar{x} = 899.35809$, $s_x^2 = 1282260$, $s_y^2 = 233397.6$, $s_{xy} = 442921.47$, $r = 0.617$, and
$f = 0.42$.
Thus the ratio estimate of the average real estate farm loans in the United States is
$$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right) = 554.93223\left(\frac{878.16}{899.35809}\right) = \$541.85.$$
An estimate of $MSE(\bar{y}_R)$ is given by
$$\widehat{MSE}(\bar{y}_R) = \left(\frac{1-f}{n}\right)\left[s_y^2 + r^2s_x^2 - 2rs_{xy}\right]$$
$$= \left(\frac{1-0.42}{21}\right)\left[233397.6 + (0.617)^2 \times 1282260 - 2 \times 0.617 \times 442921.47\right] = 4832.64.$$
Using Table 2 from the Appendix the 95% confidence interval for the average real
estate farm loans is given by
3.2.2 PRODUCT ESTIMATOR
Murthy (1964) considered another estimator of the population mean $\bar{Y}$ using the known
population mean $\bar{X}$ of the auxiliary variable as a product estimator
$$\bar{y}_P = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right). \qquad (3.2.2.1)$$
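Theorem 3.2.2.1. The exact bias in the product estimator $\bar{y}_P$ of the population
mean $\bar{Y}$ is given by
$$B(\bar{y}_P) = \left(\frac{1-f}{n}\right)\bar{Y}\rho_{xy}C_xC_y.$$
Proof. The product estimator can be written as
$\bar{y}_P = \bar{Y}(1 + \varepsilon_0)(1 + \varepsilon_1) = \bar{Y}(1 + \varepsilon_0 + \varepsilon_1 + \varepsilon_0\varepsilon_1)$. Taking expected values
and using the results from section 3.1 we have
$$E(\bar{y}_P) = \bar{Y}\left[1 + \left(\frac{1-f}{n}\right)\rho_{xy}C_xC_y\right].$$
Thus the bias in the product estimator $\bar{y}_P$ of the population mean is given by
$$B(\bar{y}_P) = E(\bar{y}_P) - \bar{Y} = \left(\frac{1-f}{n}\right)\bar{Y}\rho_{xy}C_xC_y.$$
Hence the theorem.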
Theorem 3.2.2.2. The mean squared error of the product estimator $\bar{y}_P$, to the first
order of approximation, is given by
$$MSE(\bar{y}_P) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + C_x^2 + 2\rho_{xy}C_yC_x\right]. \qquad (3.2.2.5)$$
Proof. By the definition of mean squared error (MSE), using (3.2.2.3), again
neglecting higher order terms and using results from section 3.1, we have
$$MSE(\bar{y}_P) = \bar{Y}^2E\left[\varepsilon_0^2 + \varepsilon_1^2 + 2\varepsilon_0\varepsilon_1\right].$$
Hence the theorem.
Theorem 3.2.2.3. An estimator of the MSE of the product estimator $\bar{y}_P$, to the first
order of approximation, is given by
$$\widehat{MSE}(\bar{y}_P) = \left(\frac{1-f}{n}\right)\left[s_y^2 + r^2s_x^2 + 2rs_{xy}\right]. \qquad (3.2.2.6)$$
Theorem 3.2.2.4. The product estimator $\bar{y}_P$ is more efficient than the sample mean $\bar{y}$
if
$$\rho_{xy}\frac{C_y}{C_x} < -\frac{1}{2}. \qquad (3.2.2.7)$$
Proof. The proof follows from the fact that the product estimator $\bar{y}_P$ is more
efficient than the sample mean $\bar{y}$ if
$$MSE(\bar{y}_P) < V(\bar{y})$$
or if
In the condition (3.2.2.7), if we assume that $C_y \approx C_x$, then it holds for all values of
the correlation coefficient $\rho_{xy}$ in the range [-1.0, -0.5). Thus we have the
following theorem.
Theorem 3.2.2.5. The product estimator $\bar{y}_P$ is more efficient than the sample mean
$\bar{y}$ if $\rho_{xy} < -0.5$, i.e., if the correlation between X and Y is negative and high.
Remark 3.2.2.1. We observed that the product and ratio estimators are better than the
sample mean if the value of $\rho_{xy}$ lies in the interval [-1.0, -0.5) and (+0.5, +1.0],
respectively. Thus the sample mean estimator remains better than both the ratio and
product estimators of the population mean if $\rho_{xy}$ lies in the range [-0.5, +0.5].
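The rule in this remark can be expressed as a small decision helper. The sketch below uses the general conditions (3.2.1.10) and (3.2.2.7), which reduce to the $\pm 0.5$ thresholds on $\rho_{xy}$ only when $C_y \approx C_x$; the function name is hypothetical.

```python
from math import sqrt

def preferred_estimator(rho_xy, Cy2, Cx2):
    """Pick the estimator with the smaller first-order MSE among ratio, product and mean."""
    k = rho_xy * sqrt(Cy2 / Cx2)   # rho_xy * C_y / C_x
    if k > 0.5:
        return "ratio"             # condition (3.2.1.10)
    if k < -0.5:
        return "product"           # condition (3.2.2.7)
    return "sample mean"           # neither condition holds

# Farm loans (population 1) and sleep duration versus age (population 2) summary values.
print(preferred_estimator(0.8038, 1.1086, 1.5256))    # farm loans
print(preferred_estimator(-0.8552, 0.0243, 0.0188))   # sleep versus age
print(preferred_estimator(0.2, 1.0, 1.0))             # weak correlation
```

At the boundary values $\rho_{xy}C_y/C_x = \pm 0.5$ the first-order MSEs coincide, and this sketch arbitrarily returns the sample mean.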
Assume that the average age 67.267 years of the subjects is known as shown in
population 2 in the Appendix. Assuming that sleeping hours decrease as the age of a
person increases, apply the product method of estimation for estimating
the average sleep time in the particular village under study. Also find an estimator
of the mean squared error of the product estimator and deduce a 95% confidence
interval.

Unit  $y_i$   $x_i$  $(y_i-\bar{y})^2$  $(x_i-\bar{x})^2$  $(y_i-\bar{y})(x_i-\bar{x})$
1      408     55     132.25     110.25     -120.75
2      420     67     552.25       2.25       35.25
3      456     56    3540.25      90.25     -565.25
4      345     78    2652.25     156.25     -643.75
5      360     71    1332.25      30.25     -200.75
6      390     66      42.25       0.25       -3.25
Sum   2379    393    8251.50     389.50    -1498.50

Here $y_i$ = Duration of sleep (in minutes), $x_i$ = Age of subjects ($\geq 50$ years), $n = 6$,
$\bar{y} = 396.5$, $\bar{x} = 65.5$, $s_x^2 = 77.9$, $s_y^2 = 1650.3$, $s_{xy} = -299.7$, and $r = \bar{y}/\bar{x} = 6.053$.
Also we are given $\bar{X} = 67.267$, $N = 30$ and $f = 0.20$.
Thus the product estimate of the average sleep time, $\bar{Y}$ (say), is given by
$$\bar{y}_P = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right) = 396.5\left(\frac{65.5}{67.267}\right) = 386.08,$$
and an estimate of $MSE(\bar{y}_P)$ is given by
$$\widehat{MSE}(\bar{y}_P) = \left(\frac{1-f}{n}\right)\left[s_y^2 + r^2s_x^2 + 2rs_{xy}\right]$$
$$= \left(\frac{1-0.20}{6}\right)\left[1650.3 + (6.053)^2 \times 77.9 - 2 \times 6.053 \times 299.7\right] = 116.83.$$
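The same computation can be carried out directly from the six sample pairs. This is an illustrative sketch with our own variable names; it uses the MSE estimator (3.2.2.6) and carries full precision in $r$, so the last digits differ slightly from the hand calculation.

```python
from statistics import mean, variance

# Six sampled subjects from the sleep example.
y = [408, 420, 456, 345, 360, 390]   # duration of sleep (minutes)
x = [55, 67, 56, 78, 71, 66]         # age (years)

N, X_bar = 30, 67.267                # population size and known mean age
n = len(y)
f = n / N                            # sampling fraction

y_bar, x_bar = mean(y), mean(x)
s_y2, s_x2 = variance(y), variance(x)   # divisor n - 1
s_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / (n - 1)

r = y_bar / x_bar
y_prod = y_bar * x_bar / X_bar                                  # product estimate
mse_hat = (1 - f) / n * (s_y2 + r ** 2 * s_x2 + 2 * r * s_xy)   # estimator (3.2.2.6)

print(f"product estimate = {y_prod:.2f}, estimated MSE = {mse_hat:.2f}")
```

Note that $s_{xy}$ is negative here, so the cross-product term reduces the estimated MSE, which is exactly the situation in which the product estimator is expected to help.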
A $(1 - \alpha)100\%$ confidence interval for the population mean $\bar{Y}$ is given by
Example 3.2.2.2. The duration of sleep (in minutes) and age of 30 people aged 50
and over living in a small village of the United States is given in population 2.
Suppose a psychologist selected an SRSWOR sample of six individuals to collect
the required information. Find the relative efficiency of the product estimator, for
estimating the average duration of sleep using age as an auxiliary variable, with respect
to the usual estimator of the population mean.
Solution. Using the description of population 2 given in the Appendix we have
$Y_i$ = Duration of sleep (in minutes), $X_i$ = Age of subjects ($\geq 50$ years), $N = 30$,
$X = 2018$, $Y = 11526$, $\bar{X} = 67.267$, $\bar{Y} = 384.2$, $S_y^2 = 3582.58$, $S_x^2 = 85.237$,
$C_y^2 = 0.0243$, $C_x^2 = 0.0188$, $S_{xy} = -472.607$, and $\rho_{xy} = -0.8552$.
Thus we have
$$MSE(\bar{y}_P) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + C_x^2 + 2\rho_{xy}C_xC_y\right]$$
$$= \left(\frac{1-0.20}{6}\right)(384.2)^2\left[0.0243 + 0.0188 - 2 \times 0.8552\sqrt{0.0243 \times 0.0188}\right] = 128.759.$$
Also
$$V(\bar{y}) = \left(\frac{1-f}{n}\right)S_y^2 = \left(\frac{1-0.20}{6}\right) \times 3582.58 = 477.677.$$
Thus the percent relative efficiency (RE) of the product estimator $\bar{y}_P$ with respect
to the usual estimator $\bar{y}$ is given by
Corollary 3.2.2.1. The minimum sample size for the relative standard error (RSE)
to be less than or equal to a fixed value $\phi$ is given by
$$n \geq \left[\frac{\phi^2\bar{Y}^2}{S_y^2 + R^2S_x^2 + 2RS_{xy}} + \frac{1}{N}\right]^{-1}. \qquad (3.2.2.8)$$
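3.2.3 REGRESSION ESTIMATOR
Thus the difference estimator $\bar{y}_{dif}$ is unbiased for the population mean $\bar{Y}$. The
variance of the estimator $\bar{y}_{dif}$ is given by
$$V(\bar{y}_{dif}) = E\left[\bar{y}_{dif} - \bar{Y}\right]^2 = E\left[\bar{Y}(1 + \varepsilon_0) - d\bar{X}\varepsilon_1 - \bar{Y}\right]^2 = E\left[\bar{Y}\varepsilon_0 - d\bar{X}\varepsilon_1\right]^2$$
$$= E\left[\bar{Y}^2\varepsilon_0^2 + d^2\bar{X}^2\varepsilon_1^2 - 2d\bar{Y}\bar{X}\varepsilon_0\varepsilon_1\right]$$
$$= \left(\frac{1-f}{n}\right)\left[\bar{Y}^2C_y^2 + d^2\bar{X}^2C_x^2 - 2d\bar{Y}\bar{X}\rho_{xy}C_xC_y\right]. \qquad (3.2.3.4)$$
Substituting the optimum value of $d$ in (3.2.3.4), the minimum variance is
$$\min V(\bar{y}_{dif}) = \left(\frac{1-f}{n}\right)\left[S_y^2 - \frac{S_{xy}^2}{S_x^2}\right] = \left(\frac{1-f}{n}\right)S_y^2\left[1 - \frac{S_{xy}^2}{S_x^2S_y^2}\right] = \left(\frac{1-f}{n}\right)S_y^2\left(1 - \rho_{xy}^2\right). \qquad (3.2.3.6)$$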
For the optimum value of $d = \rho_{xy}\frac{C_y\bar{Y}}{C_x\bar{X}} = \frac{S_{xy}}{S_x^2} = \beta$ (regression coefficient, say) the
difference estimator becomes
$$\bar{y}_{dif} = \bar{y} + \left(\frac{S_{xy}}{S_x^2}\right)(\bar{X} - \bar{x}). \qquad (3.2.3.7)$$
Thus the difference estimator becomes non-functional if the value of the regression
coefficient $\beta = S_{xy}/S_x^2$ is unknown. In such situations, Hansen, Hurwitz, and
Madow (1953) consider the linear regression estimator of the population mean
$\bar{Y}$ as
$$\bar{y}_{LR} = \bar{y} + \hat{\beta}(\bar{X} - \bar{x}), \qquad (3.2.3.8)$$
where $\hat{\beta} = s_{xy}/s_x^2$ denotes the estimator of the regression coefficient $\beta = S_{xy}/S_x^2$.
Then we have the following theorems:
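Theorem 3.2.3.1. The bias in the linear regression estimator $\bar{y}_{LR}$ of the population
mean $\bar{Y}$ is given by
Proof. The linear regression estimator $\bar{y}_{LR}$, in terms of $\varepsilon_0$, $\varepsilon_1$, $\varepsilon_3$ and $\varepsilon_4$, can
easily be written as
$$\bar{y}_{LR} = \bar{Y}(1 + \varepsilon_0) + \frac{S_{xy}(1 + \varepsilon_4)}{S_x^2(1 + \varepsilon_3)}\left[\bar{X} - \bar{X}(1 + \varepsilon_1)\right]$$
$$= \bar{Y}(1 + \varepsilon_0) + \beta(1 + \varepsilon_4)(1 + \varepsilon_3)^{-1}\left[\bar{X} - \bar{X}(1 + \varepsilon_1)\right].$$
Using the binomial expansion $(1 + \varepsilon_3)^{-1} = 1 - \varepsilon_3 + \varepsilon_3^2 + O(\varepsilon_3)$ we obtain
$$\bar{y}_{LR} = \bar{Y}(1 + \varepsilon_0) - \beta\bar{X}\left[\varepsilon_1 + \varepsilon_1\varepsilon_4 - \varepsilon_1\varepsilon_3 + O(\varepsilon)\right]. \qquad (3.2.3.10)$$
Taking expected values on both sides of (3.2.3.10) and neglecting higher order
terms, we obtain
$$E(\bar{y}_{LR}) = \bar{Y} - \beta\bar{X}\left(\frac{1-f}{n}\right)\left[C_x\frac{\lambda_{12}}{\rho_{xy}} - C_x\lambda_{03}\right] = \bar{Y} + \left(\frac{1-f}{n}\right)\beta\bar{X}C_x\left[\lambda_{03} - \frac{\lambda_{12}}{\rho_{xy}}\right].$$
Thus the bias is given by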
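Theorem 3.2.3.2. The mean squared error of the linear regression estimator $\bar{y}_{LR}$,
to the first order of approximation, is
$$MSE(\bar{y}_{LR}) = \left(\frac{1-f}{n}\right)S_y^2\left(1 - \rho_{xy}^2\right).$$
Proof. By the definition of mean squared error (MSE) and using (3.2.3.10) and
neglecting higher order terms we have
$$MSE(\bar{y}_{LR}) = E\left[\bar{y}_{LR} - \bar{Y}\right]^2 \approx E\left[\bar{Y}\varepsilon_0 - \beta\bar{X}\varepsilon_1\right]^2$$
$$= \left(\frac{1-f}{n}\right)\left[S_y^2 + \frac{S_{xy}^2}{S_x^2} - 2\frac{S_{xy}^2}{S_x^2}\right] = \left(\frac{1-f}{n}\right)\left[S_y^2 - \frac{S_{xy}^2}{S_x^2}\right]$$
$$= \left(\frac{1-f}{n}\right)S_y^2\left(1 - \rho_{xy}^2\right).$$
Hence the theorem.
Theorem 3.2.3.3. An estimator of the mean squared error of the linear regression
estimator $\bar{y}_{LR}$, to the first order of approximation, is given by
Theorem 3.2.3.4. The linear regression estimator $\bar{y}_{LR}$ is always more efficient than
the sample mean $\bar{y}$ if $\rho_{xy} \neq 0$.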
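The efficiency comparisons behind this theorem can be written out explicitly. The sketch below uses the first-order MSE expressions of this section in terms of $S_y^2$, $S_x^2$, $S_{xy}$ and $R = \bar{Y}/\bar{X}$; the comparisons with the ratio and product estimators are added for completeness and are not quoted from the text.

```latex
\begin{align*}
V(\bar{y}) - MSE(\bar{y}_{LR})
  &= \left(\frac{1-f}{n}\right)S_y^2\rho_{xy}^2 \;\geq\; 0, \\
MSE(\bar{y}_R) - MSE(\bar{y}_{LR})
  &= \left(\frac{1-f}{n}\right)\left[R^2S_x^2 - 2R\rho_{xy}S_xS_y + \rho_{xy}^2S_y^2\right]
   = \left(\frac{1-f}{n}\right)\left(RS_x - \rho_{xy}S_y\right)^2 \;\geq\; 0, \\
MSE(\bar{y}_P) - MSE(\bar{y}_{LR})
  &= \left(\frac{1-f}{n}\right)\left(RS_x + \rho_{xy}S_y\right)^2 \;\geq\; 0 .
\end{align*}
```

The first difference is strictly positive whenever $\rho_{xy} \neq 0$, which is the statement of Theorem 3.2.3.4. The second vanishes only when $R = \beta$, that is, when the regression line of $Y$ on $X$ passes through the origin.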
Remark 3.2.3.1. If $\hat{\beta} = \bar{y}/\bar{x}$ then the linear regression estimator $\bar{y}_{LR}$ reduces to the
usual ratio estimator $\bar{y}_R$, and if $\hat{\beta} = -\bar{y}/\bar{X}$ then the linear regression estimator $\bar{y}_{LR}$
reduces to the usual product estimator $\bar{y}_P$.
Apply the regression method of estimation for estimating the average amount of the
real estate farm loans (in $000) during 1997. Also find an estimator of the mean
squared error of the regression estimator and deduce a 95% confidence interval.
Assume that the average amount $878.16 of nonreal estate farm loans (in $000) for
the year 1997 is known.
Solution. From the sample information, we have $n = 8$, $\bar{y} = 348.3554$,
$\bar{x} = 531.4353$, $s_x^2 = 170082.85$, $s_y^2 = 131919.67$, $s_{xy} = 118102.55$,
$\hat{\beta} = s_{xy}/s_x^2 = 0.6943$ and $r_{xy} = s_{xy}/(s_xs_y) = 0.7884$. Also we are
given $\bar{X} = 878.16$, $N = 50$ and $f = 0.16$.
Thus the regression estimate of the average amount of real estate farm loans during
1997, $\bar{Y}$ (say), is given by
$$\bar{y}_{LR} = \bar{y} + \hat{\beta}(\bar{X} - \bar{x}) = 348.3554 + 0.6943 \times (878.16 - 531.4353) = 589.08.$$
Solution. From the description of population 1 given in the Appendix we have
$\bar{Y} = 555.43$, $\bar{X} = 878.16$, $S_y^2 = 342021.5$, $\rho_{xy} = 0.8038$, and $N = 50$.
Also from Example 3.2.1.2 we have
$$MSE(\bar{y}_R) = 17606.39.$$
Now
What is the minimum sample size required for the relative standard error (RSE) to be
equal to 12.5%? Use the data shown in population 1 of the Appendix.
Example 3.2.3.4. A bank manager selects an SRSWOR sample of eighteen states
from population 1 of the Appendix and collects information about real estate farm
loans and nonreal estate farm loans. Estimate the average real estate farm loans by
using the regression method of estimation, given that the average amount of nonreal
estate farm loans in the United States is known to be equal to $878.16.
Solution. The bank manager used the 19th and 20th columns of the Pseudo-Random
Numbers (PRN) given in Table 1 of the Appendix to select the following 18 distinct
random numbers between 1 and 50: 16, 31, 50, 29, 08, 33, 19, 28, 11, 07, 27, 37,
48, 22, 24, 46, 41, and 32.
[Table not recoverable: for each selected state it listed the random number, the state, $y_i$, $x_i$, $(y_i-\bar{y})^2$, $(x_i-\bar{x})^2$, and $(y_i-\bar{y})(x_i-\bar{x})$.]
Here $N = 50$ and $\bar{X} = 878.16$. The above table shows $n = 18$, $\bar{y} = 304.5265$,
$\bar{x} = 631.8754$, $s_x^2 = 982834.03$, $s_y^2 = 147134.11$, and $s_{xy} = 343672.33$. Thus
$\hat{\beta} = 0.3496$, $r_{xy} = 0.9037$ and $f = 0.36$.
Thus the regression estimate of the average real estate farm loans in the United
States is
$\bar{y}_{LR} = \bar{y} + \hat{\beta}(\bar{X} - \bar{x}) = 304.5265 + 0.3496(878.16 - 631.8754) = 390.627$.
Using Table 2 from the Appendix the 95% confidence interval for the average real
estate farm loans is given by
$390.627 \pm 2.120\sqrt{959.059}$ or $[324.973, 456.280]$.
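The arithmetic of Example 3.2.3.4 can be retraced with a short script. This is only a sketch: the function names are illustrative, the MSE estimate is assumed to take the form $\left(\frac{1-f}{n}\right)s_y^2(1-r_{xy}^2)$ (which reproduces the quoted value 959.059 up to rounding), and the t-value 2.120 is taken from the example rather than computed. Because the script uses unrounded $\hat{\beta}$ and $r_{xy}$, its output differs slightly from the printed figures.

```python
import math

def regression_estimate(y_bar, x_bar, X_bar, s_xy, s_x2):
    """Return (y_LR, beta_hat) for y_LR = y_bar + beta_hat * (X_bar - x_bar)."""
    beta_hat = s_xy / s_x2
    return y_bar + beta_hat * (X_bar - x_bar), beta_hat

def regression_mse_estimate(n, N, s_y2, s_x2, s_xy):
    """Assumed first-order MSE estimate: ((1 - f) / n) * s_y^2 * (1 - r_xy^2)."""
    f = n / N
    r2 = s_xy ** 2 / (s_x2 * s_y2)
    return (1 - f) / n * s_y2 * (1 - r2)

def confidence_interval(estimate, mse, t_value):
    """Symmetric interval estimate +/- t_value * sqrt(mse)."""
    half_width = t_value * math.sqrt(mse)
    return estimate - half_width, estimate + half_width

# Summary statistics quoted in Example 3.2.3.4 (n = 18, N = 50).
y_lr, beta_hat = regression_estimate(304.5265, 631.8754, 878.16, 343672.33, 982834.03)
mse_hat = regression_mse_estimate(18, 50, 147134.11, 982834.03, 343672.33)
lower, upper = confidence_interval(y_lr, mse_hat, t_value=2.120)
print(round(y_lr, 3), round(mse_hat, 3), round(lower, 3), round(upper, 3))
```

Passing the t-value in as an argument keeps the sketch free of external dependencies; the value 2.120 corresponds to df = n - 2 = 16 in the example.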
Units   A    B    C    D    E
y_i     9   11   13   16   21
x_i    14   18   19   20   24
Do the following:
( a ) Select all possible SRSWOR samples each of n = 3 units;
( b ) Find the variance of the sample mean estimator by definition;
( c ) Find the variance of the sample mean estimator using the formula. Comment;
( d ) Find the exact mean square error of the ratio estimator by definition;
( e ) Find the approximate mean square error of the ratio estimator using first order
approximation;
( f ) Find the ratio of approximate mean square error to that of exact mean square
error of the ratio estimator and comment;
( g ) Find the exact mean square error of the regression estimator using definition;
( h ) Find the approximate mean square error of the regression estimator using first
order approximation;
( i ) Find the ratio of approximate mean square error of the regression estimator to
that of the exact mean square error and comment;
( j ) Find the exact relative efficiency of the ratio estimator with respect to the sample
mean estimator;
( k ) Find the approximate relative efficiency of the ratio estimator with respect to
the sample mean estimator and comment;
( l ) Find the exact relative efficiency of the regression estimator with respect to the
sample mean estimator;
( m ) Find the approximate relative efficiency of the regression estimator with respect
to the sample mean and comment.
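Solution. ( a ) From Chapter 1 we have the following information for this population:
$\bar{Y} = 14$, $\bar{X} = 19$, $S_y^2 = 22$, $S_x^2 = 13$, $S_{xy} = 16.25$, $\rho_{xy} = 0.96$, and $\beta = 1.25$. Also,
there are 10 possible samples of $n = 3$ units that can be taken from the population of $N = 5$
units.
( b ) By definition, the exact variance of the sample mean is
Exact $V(\bar{y}_t) = \sum_{t=1}^{\binom{N}{n}} p_t\left(\bar{y}_t - \bar{Y}\right)^2 = 2.933$.
( c ) The variance of the sample mean $\bar{y}_t$ by the formula is given by
$V(\bar{y}_t) = \left(\frac{1-f}{n}\right) S_y^2 = \left(\frac{1-3/5}{3}\right) \times 22 = 2.933$.
We can see from ( b ) and ( c ) that the exact variance and the variance by the formula
are the same.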
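( d ) The exact mean square error of the ratio estimator $\bar{y}_R(t) = \bar{y}_t\left(\bar{X}/\bar{x}_t\right)$ is given by
Exact $\mathrm{MSE}(\bar{y}_R) = \sum_{t=1}^{\binom{N}{n}} p_t\left(\bar{y}_R(t) - \bar{Y}\right)^2 = 0.714$.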
( f ) The ratio of approximate mean square error to the exact mean square error is
given by
Ratio of Mean Square Errors $= \frac{\mathrm{Approx.MSE}(\bar{y}_R)}{\mathrm{Exact.MSE}(\bar{y}_R)} = \frac{0.681}{0.714} = 0.953$.
Note that this ratio of the mean square errors approaches unity if sample size and
population size are such that $f = n/N \to 0$.
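Parts ( b ) to ( f ) can be checked by brute-force enumeration of all ten SRSWOR samples. The following is a minimal sketch, not the book's own computation; the variable names are illustrative, and the unit values are those of the small population A to E used in this example.

```python
from itertools import combinations

# Population of N = 5 units (A, B, C, D, E) from the example.
y = [9, 11, 13, 16, 21]
x = [14, 18, 19, 20, 24]
N, n = len(y), 3
Y_bar = sum(y) / N
X_bar = sum(x) / N

samples = list(combinations(range(N), n))  # all SRSWOR samples of size n
p = 1 / len(samples)                       # each sample is equally likely

exact_var_mean = 0.0
exact_mse_ratio = 0.0
for s in samples:
    y_s = sum(y[i] for i in s) / n
    x_s = sum(x[i] for i in s) / n
    exact_var_mean += p * (y_s - Y_bar) ** 2
    exact_mse_ratio += p * (y_s * X_bar / x_s - Y_bar) ** 2

# Population quantities with divisor N - 1, as used in the formulas.
S_y2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
S_x2 = sum((v - X_bar) ** 2 for v in x) / (N - 1)
S_xy = sum((a - Y_bar) * (b - X_bar) for a, b in zip(y, x)) / (N - 1)

f = n / N
R = Y_bar / X_bar
formula_var_mean = (1 - f) / n * S_y2
approx_mse_ratio = (1 - f) / n * (S_y2 + R ** 2 * S_x2 - 2 * R * S_xy)

print(len(samples), round(exact_var_mean, 3), round(formula_var_mean, 3))
print(round(exact_mse_ratio, 3), round(approx_mse_ratio, 3))
```

The same loop can be extended to the regression estimator by computing the sample slope inside each iteration.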
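( h ) The approximate mean square error of the regression estimator is
$\mathrm{Approx.MSE}(\bar{y}_{LR}) = \left(\frac{1-f}{n}\right) S_y^2\left[1-\rho_{xy}^2\right] = \left(\frac{1-3/5}{3}\right) \times 22 \times \left[1 - 0.96^2\right] = 0.230$.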
( i ) The ratio of approximate mean square error to the exact mean square error of
the linear regression estimator is given by
Ratio of Mean Square Errors $= \frac{\mathrm{Approx.MSE}(\bar{y}_{LR})}{\mathrm{Exact.MSE}(\bar{y}_{LR})} = \frac{0.230}{0.596} = 0.386$.
Note that, for this particular example, the ratio of approximate mean square error to the
exact mean square error is far away from one, but if $f = n/N \to 0$ then this ratio
approaches unity.
( j ) The exact relative efficiency (RE) of the ratio estimator with respect to the sample
mean estimator is
Exact RE of the Ratio Estimator $= \frac{V(\bar{y}_t) \times 100}{\mathrm{Exact.MSE}(\bar{y}_R)} = \frac{2.933 \times 100}{0.714} = 410.78\%$.
( k ) The approximate relative efficiency of the ratio estimator with respect to the
sample mean estimator is
Approximate RE of the Ratio Estimator $= \frac{V(\bar{y}_t) \times 100}{\mathrm{Approx.MSE}(\bar{y}_R)} = \frac{2.933 \times 100}{0.681} = 430.69\%$.
It shows that the approximate relative efficiency expression for the ratio estimator
gives a slightly higher efficiency than in reality.
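( l ) The exact relative efficiency of the regression estimator with respect to the
sample mean estimator is
Exact RE of the Regression Estimator $= \frac{V(\bar{y}_t) \times 100}{\mathrm{Exact.MSE}(\bar{y}_{LR})} = \frac{2.933 \times 100}{0.596} = 492.11\%$.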
( m ) The approximate relative efficiency of the linear regression estimator with
respect to the sample mean estimator is
Approx. RE of the Regression Estimator $= \frac{V(\bar{y}_t) \times 100}{\mathrm{Approx.MSE}(\bar{y}_{LR})} = \frac{2.933 \times 100}{0.230} = 1275.21\%$.
It also shows that the approximate relative efficiency expression for the regression
estimator gives higher efficiency than in reality.
Caution: Be careful while using the approximate expression for the mean square error of
the linear regression estimator, or the approximate expression for estimating the mean
square error of the linear regression estimator. The true interval estimate of the
population mean may be wider than the one constructed with the approximate
results.
Note the following graphical situations in Figure 3.2.1 for the use of the ratio,
product, and regression estimators in actual practice.
[Figure 3.2.1: graphical situations for the use of the ratio, product, and regression estimators; the plot itself is not recoverable.]
The following table is used to collect some information about these three
estimators, which will be useful to the readers:
5  Ratio estimator: We have to estimate only one model parameter, so the degrees of freedom for constructing confidence interval estimates will be df = (n - 1).
   Product estimator: If both variables are positive (x > 0 and y > 0) but the correlation is negative, then we have both intercept and slope, and we must use df = (n - 2).
   Regression estimator: Here we have two unknown parameters, viz. intercept and slope, thus we must use df = (n - 2). Further justification is given in Section 3.6.
Srivastava (1967) considered another estimator of the population mean, $\bar{Y}$, using the
known population mean, $\bar{X}$, of the auxiliary variable, as a power transformation
estimator given by
$\bar{y}_{PW} = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right)^{\alpha}$. (3.2.4.1)
Theorem 3.2.4.1. The bias in the power transformation estimator $\bar{y}_{PW}$, to the first
order of approximation, is given by
Proof. The power transformation estimator $\bar{y}_{PW}$, in terms of $\epsilon_0$ and $\epsilon_1$, can easily
be written as
$\bar{y}_{PW} = \bar{Y}(1+\epsilon_0)(1+\epsilon_1)^{\alpha} = \bar{Y}(1+\epsilon_0)\left\{1 + \alpha\epsilon_1 + \frac{\alpha(\alpha-1)}{2}\epsilon_1^2 + O(\epsilon_1)\right\}$. (3.2.4.3)
Taking expected values on both sides of (3.2.4.3) and using results from Section
3.1, we obtain (3.2.4.2). Hence the theorem.
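Theorem 3.2.4.2. The minimum mean squared error of the power transformation
estimator $\bar{y}_{PW}$, to the first order of approximation, is given by
Proof. By the definition of mean squared error (MSE), using (3.2.4.3) and again
neglecting the higher order terms we have
$\mathrm{MSE}(\bar{y}_{PW}) = E\left[\bar{y}_{PW} - \bar{Y}\right]^2 = E\left[\bar{Y}\left(1+\epsilon_0+\alpha\epsilon_1+O(\epsilon_1^2)\right) - \bar{Y}\right]^2$
$= \bar{Y}^2E\left[\epsilon_0^2 + \alpha^2\epsilon_1^2 + 2\alpha\epsilon_0\epsilon_1\right]$.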
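The remaining steps of this proof can be sketched as follows. The expected values used here, $E(\epsilon_0^2) = \theta C_y^2$, $E(\epsilon_1^2) = \theta C_x^2$ and $E(\epsilon_0\epsilon_1) = \theta\rho_{xy}C_xC_y$ with $\theta = (1-f)/n$, are assumed to be the standard first-order results of Section 3.1, which is not reproduced in this part of the text.

```latex
\begin{align*}
\mathrm{MSE}(\bar{y}_{PW})
  &\approx \theta\,\bar{Y}^2\left[C_y^2 + \alpha^2 C_x^2 + 2\alpha\rho_{xy}C_xC_y\right],
  \qquad \theta = \frac{1-f}{n},\\
\frac{\partial\,\mathrm{MSE}(\bar{y}_{PW})}{\partial\alpha} = 0
  &\;\Longrightarrow\; \alpha = -\rho_{xy}\frac{C_y}{C_x},\\
\mathrm{Min.MSE}(\bar{y}_{PW})
  &= \theta\,\bar{Y}^2C_y^2\left(1-\rho_{xy}^2\right)
   = \left(\frac{1-f}{n}\right)S_y^2\left(1-\rho_{xy}^2\right).
\end{align*}
```

Under these assumptions the minimum coincides with the first-order mean squared error of the linear regression estimator, and the optimum $\alpha$ can be rewritten as $-\bar{X}S_{xy}/(\bar{Y}S_x^2)$.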
The optimum power $\alpha$ depends upon unknown parameters. Thus the
estimator $\bar{y}_{PW}$ is not practicable, and we have the following corollary.
A practicable estimator $\bar{y}_{PW(pract)}$ of the population mean $\bar{Y}$ is given by
$\bar{y}_{PW(pract)} = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right)^{\hat{\alpha}}$ (3.2.4.8)
where
$\hat{\alpha} = -\left(\bar{x}s_{xy}\right)/\left(\bar{y}s_x^2\right)$
is a consistent estimator of $\alpha$. Note that while making a confidence interval estimate
with the power transformation estimator the degrees of freedom will be (n - 2).
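A minimal sketch of the practicable estimator (3.2.4.8) is given here; the function name and argument names are illustrative, not from the text. The usage lines reuse the summary statistics of the eight-state sample quoted earlier in this chapter purely as an illustration.

```python
def power_transformation_estimate(y_bar, x_bar, X_bar, s_xy, s_x2):
    """Return (estimate, alpha_hat) for y_bar * (x_bar / X_bar) ** alpha_hat,
    with alpha_hat = -(x_bar * s_xy) / (y_bar * s_x2)."""
    alpha_hat = -(x_bar * s_xy) / (y_bar * s_x2)
    return y_bar * (x_bar / X_bar) ** alpha_hat, alpha_hat

# When s_xy / s_x2 equals y_bar / x_bar, alpha_hat is -1 and the estimator
# collapses to the usual ratio estimator y_bar * X_bar / x_bar.
ratio_like, alpha_ratio = power_transformation_estimate(10.0, 5.0, 6.0, 8.0, 4.0)

# Illustration with the n = 8 sample statistics quoted earlier.
estimate, alpha_hat = power_transformation_estimate(
    348.3554, 531.4353, 878.16, 118102.55, 170082.85
)
print(alpha_ratio, ratio_like, alpha_hat, estimate)
```

The special case in the first call shows how the ratio estimator sits inside this family.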
Remark 3.2.4.1. The difference estimator $\bar{y}_{dif}$ of the population mean, $\bar{Y}$, given as
$\bar{y}_{dif} = \bar{y} + d(\bar{X} - \bar{x})$ (3.2.4.9)
has a variance equal to the mean squared error of the linear regression
estimator for the optimum value of $d = S_{xy}/S_x^2 = \beta$. Again note that the degrees of
freedom for constructing confidence interval estimates will be df = (n - 1), because
the slope is assumed to be known, but we estimate the intercept.
or $\bar{y}_{nsu} = \bar{y}\left(\frac{N\bar{X} - n\bar{x}}{(N-n)\bar{X}}\right)$ (3.2.5.2)
auxiliary variable.
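$\bar{y}_{nsu} = \bar{Y}\left(1 + \epsilon_0 - \frac{n}{N-n}\epsilon_1 - \frac{n}{N-n}\epsilon_0\epsilon_1\right)$.
Taking expected values on both sides we have
$E(\bar{y}_{nsu}) = \bar{Y}E\left(1 + \epsilon_0 - \frac{n}{N-n}\epsilon_1 - \frac{n}{N-n}\epsilon_0\epsilon_1\right)$
$= \bar{Y} - \frac{n}{N-n} \times \frac{1-f}{n}\bar{Y}\rho_{xy}C_xC_y = \bar{Y} - \frac{\bar{Y}\rho_{xy}C_xC_y}{N}$.
Thus the bias in the estimator $\bar{y}_{nsu}$ is given by
$B(\bar{y}_{nsu}) = E(\bar{y}_{nsu}) - \bar{Y} = -\frac{\bar{Y}\rho_{xy}C_xC_y}{N} = -\frac{S_{xy}}{N\bar{X}}$
which proves the theorem.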
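Theorem 3.2.5.2. The mean squared error of the estimator $\bar{y}_{nsu}$ is given by
$\mathrm{MSE}(\bar{y}_{nsu}) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + g^2C_x^2 - 2g\rho_{xy}C_xC_y\right]$
where $g = \frac{n}{N-n}$ and $N^2 \to \infty$.
Proof. We have
$\mathrm{MSE}(\bar{y}_{nsu}) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + \left(\frac{n}{N-n}\right)^2C_x^2 - 2\left(\frac{n}{N-n}\right)\rho_{xy}C_xC_y\right]$.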
Theorem 3.2.5.3. The estimator $\bar{y}_{nsu}$ is more efficient than the ratio estimator $\bar{y}_R$ if
$n < \frac{N}{2}$, and $\rho_{xy} < \frac{N}{2(N-n)}$, (3.2.5.4)
assuming that the correlation coefficient $\rho_{xy}$ is positive.
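Proof. The estimator $\bar{y}_{nsu}$ will be more efficient than the ratio estimator $\bar{y}_R$ if
$\mathrm{MSE}(\bar{y}_{nsu}) < \mathrm{MSE}(\bar{y}_R)$
or $\left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + g^2C_x^2 - 2g\rho_{xy}C_xC_y\right] < \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y\right]$
or $(g^2-1)C_x^2 - 2(g-1)\rho_{xy}C_xC_y < 0$ or $(g-1)(g+1)C_x^2 - 2(g-1)\rho_{xy}C_xC_y < 0$
or $(g-1)\left[(g+1)C_x^2 - 2\rho_{xy}C_xC_y\right] < 0$. (3.2.5.5)
The inequality (3.2.5.5) holds if the two factors have opposite signs. First, it holds if
$\frac{n-N+n}{N-n} < 0$ and $\rho_{xy}\frac{C_y}{C_x} < \frac{(g+1)}{2}$
or $n < \frac{N}{2}$ and $\rho_{xy}\frac{C_y}{C_x} < \frac{n+N-n}{2(N-n)} = \frac{N}{2(N-n)}$.
For $C_y \approx C_x$ we have $n < \frac{N}{2}$ and $\rho_{xy} < \frac{N}{2(N-n)}$.
This condition holds in practice. For example, if N = 100 and n = 30 then $\rho_{xy}$
needs to be less than $100/140 = 0.714$.
Second, it holds if
$\frac{n-N+n}{N-n} > 0$ and $\rho_{xy}\frac{C_y}{C_x} > \frac{(g+1)}{2}$
or $n > \frac{N}{2}$ and $\rho_{xy}\frac{C_y}{C_x} > \frac{n+N-n}{2(N-n)} = \frac{N}{2(N-n)}$.
For $C_y \approx C_x$ we have $n > \frac{N}{2}$ and $\rho_{xy} > \frac{N}{2(N-n)}$.
This condition will not hold in practice. For example, if N = 100 and n = 70 then the
value of $\rho_{xy}$ needs to be more than 1.667, which is not possible. Hence the
theorem.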
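The two numerical cases in this proof can be reproduced with a few lines. This is a sketch with illustrative function names; the bracket functions compare only the terms inside the square brackets of the two mean squared errors, since the common factor $\left(\frac{1-f}{n}\right)\bar{Y}^2$ cancels.

```python
def correlation_bound(N, n):
    """Bound N / (2 (N - n)) on rho_xy from (3.2.5.4), taking C_y ~ C_x."""
    return N / (2 * (N - n))

def bracket_nsu(N, n, rho, c_x, c_y):
    """Bracketed part of MSE(y_nsu) with g = n / (N - n)."""
    g = n / (N - n)
    return c_y ** 2 + g ** 2 * c_x ** 2 - 2 * g * rho * c_x * c_y

def bracket_ratio(rho, c_x, c_y):
    """Bracketed part of MSE(y_R)."""
    return c_y ** 2 + c_x ** 2 - 2 * rho * c_x * c_y

print(round(correlation_bound(100, 30), 3))  # bound when n < N/2
print(round(correlation_bound(100, 70), 3))  # bound when n > N/2 (exceeds 1)

# With n = 30 and C_x = C_y, a correlation below the bound favours y_nsu.
print(bracket_nsu(100, 30, 0.5, 1.0, 1.0) < bracket_ratio(0.5, 1.0, 1.0))
```

Working with the brackets alone avoids having to specify $\bar{Y}$, which plays no role in the comparison.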
Note that $|u - 1| < 1$, thus the higher order terms can be neglected. Using (3.2.6.2)
and (3.2.6.3) in (3.2.6.1) we obtain
$t_g = \bar{y}\left[1 + (u-1)H_1 + (u-1)^2H_2 + \cdots\right]$ (3.2.6.4)
where $H_1 = \frac{\partial H}{\partial u}\Big|_{u=1}$ and $H_2 = \frac{1}{2}\frac{\partial^2 H}{\partial u^2}\Big|_{u=1}$ denote the first and second order partial
derivatives of H with respect to u and are known constants. Evidently the class
of estimators $t_g$ given at (3.2.6.4) can easily be written in terms of $\epsilon_0$ and $\epsilon_1$ as
Theorem 3.2.6.1. The bias in the general class of estimators $t_g$ defined at (3.2.6.1),
to the first order of approximation, is
(3.2.6.6)
Theorem 3.2.6.2. The minimum mean squared error of the general class of
estimators $t_g$ defined at (3.2.6.1), to the first order of approximation, is given by
$\mathrm{Min.MSE}(t_g) = \left(\frac{1-f}{n}\right)\bar{Y}^2C_y^2\left(1-\rho_{xy}^2\right)$. (3.2.6.7)
$H_1 = -\rho_{xy}\frac{C_y}{C_x}$. (3.2.6.9)
One may note here that regression estimator and difference estimator are not special
cases of the general class of estimators defined in (3.2.6 .1). Srivastava (1980)
defined another class of estimators and named a wider class of estimators as
tw = H[y, u] (3.2.7.1)
where H[y, u] is a function of y and u and satisfies the following regularity
conditions:
( a ) The point (y, u) assumes the value in a closed convex subset R2 of two-
dimensional real space containing the point (Y,I) ;
( b ) The function H(y, u) is continuous and bounded in R2 ;
( c ) H(Y, I) = Y and Ho(Y, I) = 1, where Ho(Y, I) denotes the first order partial
derivative of H with respect to y;
( d ) The first and second order partial derivatives of H (y, u) exist and are
continuous and bounded in R2 .
Expanding H(y, u) about the point (Y, I) in a second order Taylor series we have
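$t_w = H(\bar{y},u) = H\left[\bar{Y} + (\bar{y}-\bar{Y}), 1 + (u-1)\right]$
$= H(\bar{Y},1) + (\bar{y}-\bar{Y})\frac{\partial H}{\partial\bar{y}}\Big|_{\bar{y}=\bar{Y},u=1} + (u-1)\frac{\partial H}{\partial u}\Big|_{\bar{y}=\bar{Y},u=1} + (u-1)^2\frac{1}{2}\frac{\partial^2H}{\partial u^2}\Big|_{\bar{y}=\bar{Y},u=1}$
$+ (\bar{y}-\bar{Y})^2\frac{1}{2}\frac{\partial^2H}{\partial\bar{y}^2}\Big|_{\bar{y}=\bar{Y},u=1} + (\bar{y}-\bar{Y})(u-1)\frac{1}{2}\frac{\partial^2H}{\partial\bar{y}\,\partial u}\Big|_{\bar{y}=\bar{Y},u=1} + \cdots$ (3.2.7.2)
$H_4 = \frac{1}{2}\frac{\partial^2H}{\partial\bar{y}^2}\Big|_{\bar{y}=\bar{Y},u=1}$.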
Thus we have the following theorems.
Theorem 3.2.7.1. The asymptotic bias in the wider class of estimators $t_w$ of the
population mean $\bar{Y}$ is:
$B(t_w) = \left(\frac{1-f}{n}\right)\left[\bar{Y}\rho_{xy}C_xC_yH_3 + C_x^2H_2 + \bar{Y}^2C_y^2H_4\right]$. (3.2.7.4)
Proof. The wider class of estimators $t_w$, in terms of $\epsilon_0$ and $\epsilon_1$, can easily be
written as
(3.2.7.5)
Taking expected values on both sides of (3.2.7.5) and using the definition of bias,
we obtain (3.2.7.4). Hence the theorem.
Theorem 3.2.7.2. The minimum mean squared error of the wider class of
estimators, t w ' is given by
In this section, we will show that the known variance of the auxiliary variable can
also be used as a benchmark, in addition to the known population total or mean of
the auxiliary variable, to improve the estimators of the finite population mean of the
study variable under certain circumstances.
where $u = \bar{x}/\bar{X}$, $v = s_x^2/S_x^2$ and $H(u,v)$ is a function of u and v such that:
( a ) The point (u, v) assumes the value in a closed convex subset R2 of two-
dimensional real space containing the point (I, I);
( b ) The function H(u , v) is continuous and bounded in R2 ;
(c )H(I ,I) = I;
( d ) The first and second order partial derivatives of H(u ,v) exist and are continuous
and bounded in R2 .
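Thus all ratio and product type estimators of the population mean $\bar{Y}$ defined as
$\bar{y}_1 = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)\left(\frac{s_x^2}{S_x^2}\right)$, $\bar{y}_2 = \bar{y}\left(\frac{\bar{X}}{\alpha\bar{x} + (1-\alpha)\bar{X}}\right)\left(\frac{S_x^2}{\gamma s_x^2 + (1-\gamma)S_x^2}\right)$,
and $\bar{y}_3 = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)^{\alpha}\left(\frac{s_x^2}{S_x^2}\right)^{\gamma}$
are the special cases of the class of estimators defined in (3.2.8.1).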
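Expanding $H(u,v)$ about the point (1, 1) in a second order Taylor's series we obtain
$\bar{y}_{SJ} = \bar{y}H(u,v) = \bar{y}H\left[1+(u-1), 1+(v-1)\right]$
$\approx \bar{y}\left[H(1,1) + (u-1)\frac{\partial H}{\partial u}\Big|_{(1,1)} + (v-1)\frac{\partial H}{\partial v}\Big|_{(1,1)} + (u-1)^2\frac{1}{2}\frac{\partial^2H}{\partial u^2}\Big|_{(1,1)}\right.$
$\left. + (v-1)^2\frac{1}{2}\frac{\partial^2H}{\partial v^2}\Big|_{(1,1)} + (u-1)(v-1)\frac{1}{2}\frac{\partial^2H}{\partial u\,\partial v}\Big|_{(1,1)} + \cdots\right]$
$= \bar{Y}(1+\epsilon_0)\left[1 + \epsilon_1H_1 + \epsilon_3H_2 + \epsilon_1^2H_3 + \epsilon_3^2H_4 + \epsilon_1\epsilon_3H_5 + \cdots\right]$
$\approx \bar{Y}\left[1 + \epsilon_0 + \epsilon_1H_1 + \epsilon_3H_2 + \epsilon_1^2H_3 + \epsilon_3^2H_4 + \epsilon_1\epsilon_3H_5 + \epsilon_0\epsilon_1H_1 + \epsilon_0\epsilon_3H_2 + \cdots\right]$ (3.2.8.2)
where
$H_1 = \frac{\partial H}{\partial u}\Big|_{(1,1)}$, $H_2 = \frac{\partial H}{\partial v}\Big|_{(1,1)}$, $H_3 = \frac{1}{2}\frac{\partial^2H}{\partial u^2}\Big|_{(1,1)}$, $H_4 = \frac{1}{2}\frac{\partial^2H}{\partial v^2}\Big|_{(1,1)}$, and
$H_5 = \frac{1}{2}\frac{\partial^2H}{\partial u\,\partial v}\Big|_{(1,1)}$.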
Thus we have the following theorems:
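Theorem 3.2.8.1. The asymptotic bias in the class of ratio type estimators $\bar{y}_{SJ}$ is
Proof. Taking expected values on both sides of (3.2.8.2) we have
$E(\bar{y}_{SJ}) = \bar{Y}E\left[1 + \epsilon_0 + \epsilon_1H_1 + \epsilon_3H_2 + \epsilon_1^2H_3 + \epsilon_3^2H_4 + \epsilon_1\epsilon_3H_5 + \epsilon_0\epsilon_1H_1 + \epsilon_0\epsilon_3H_2 + \cdots\right]$
and $B(\bar{y}_{SJ}) = E(\bar{y}_{SJ}) - \bar{Y}$.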
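Theorem 3.2.8.2. The minimum MSE of the class of estimators $\bar{y}_{SJ}$ is given by
$\mathrm{MSE}(\bar{y}_{SJ}) = E\left[\bar{y}_{SJ} - \bar{Y}\right]^2 = \bar{Y}^2E\left[\epsilon_0 + \epsilon_1H_1 + \epsilon_3H_2 + O(\epsilon^2)\right]^2$
$= \bar{Y}^2E\left[\epsilon_0^2 + \epsilon_1^2H_1^2 + \epsilon_3^2H_2^2 + 2\epsilon_0\epsilon_1H_1 + 2\epsilon_0\epsilon_3H_2 + 2\epsilon_1\epsilon_3H_1H_2\right]$
$\cdots + 2C_x\lambda_{03}H_1H_2\big]$. (3.2.8.5)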
Srivastava and Jhajj (1981) also considered a wider class of estimators of the
population mean $\bar{Y}$ as
$\bar{y}_{SJ(w)} = H(\bar{y}, u, v)$ (3.2.8.8)
( a ) The point $(\bar{y}, u, v)$ assumes the value in a closed convex subset $R_3$ of three-
dimensional real space containing the point $(\bar{Y}, 1, 1)$;
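( d ) The first and second order partial derivatives of $H(\bar{y}, u, v)$ exist, and are
continuous and bounded in $R_3$.
Expanding $H(\bar{y}, u, v)$ about the point $(\bar{Y}, 1, 1)$ in a second order Taylor's series we
have
(3.2.8.9)
where, with all derivatives evaluated at $(\bar{Y}, 1, 1)$,
$\frac{\partial H}{\partial\bar{y}} = 1$, $H_1 = \frac{\partial H}{\partial u}$, $H_2 = \frac{\partial H}{\partial v}$, $H_3 = \frac{1}{2}\frac{\partial^2H}{\partial\bar{y}^2}$, $H_4 = \frac{1}{2}\frac{\partial^2H}{\partial u^2}$,
$H_5 = \frac{1}{2}\frac{\partial^2H}{\partial v^2}$, $H_6 = \frac{1}{2}\frac{\partial^2H}{\partial\bar{y}\,\partial u}$, $H_7 = \frac{1}{2}\frac{\partial^2H}{\partial u\,\partial v}$, and $H_8 = \frac{1}{2}\frac{\partial^2H}{\partial v\,\partial\bar{y}}$.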
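Theorem 3.2.8.4. The minimum MSE of the class of estimators $\bar{y}_{SJ(w)}$ is given by
$H_1 = -\frac{\bar{Y}C_y}{C_x}\frac{\left\{\rho_{xy}(\lambda_{04}-1) - \lambda_{12}\lambda_{03}\right\}}{\left(\lambda_{04} - 1 - \lambda_{03}^2\right)}$, and $H_2 = -\frac{\bar{Y}C_y\left\{\lambda_{12} - \rho_{xy}\lambda_{03}\right\}}{\left(\lambda_{04} - 1 - \lambda_{03}^2\right)}$. (3.2.8.15)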
Remark 3.2.8.1.
( b ) The asymptotic minimum mean squared error of the ratio type and the wider
class of estimators remains the same.
( c ) Note that $\lambda_{12}$ and $\lambda_{03}$ are odd ordered moments. In case X and Y follow the
bivariate normal distribution then both $\lambda_{12}$ and $\lambda_{03}$ are zero. In such situations the
minimum mean squared error of the class of estimators proposed by Srivastava and
Jhajj (1981) reduces to the mean squared error of the usual linear regression
estimator . Thus there is no advantage in using the known variance of auxiliary
variable for the construction of the estimator of the population mean Y if the joint
distribution of the study variable Y and auxiliary variable X is a bivariate normal
distribution .
( e ) There are a large number of estimators belonging to the same class of estimators
with the same minimum asymptotic mean square error, so it is difficult to select an
estimator for a particular survey, and there is no theoretical technique available in
the literature to select an estimator.
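Solution. From the description of the population 1 given in the Appendix we have
$\bar{Y} = 555.43$, $\bar{X} = 878.16$, $C_y^2 = 1.1086$, $\lambda_{03} = 1.5936$, $\rho_{xy} = 0.8038$, $N = 50$,
$\lambda_{12} = 1.0982$, and $\lambda_{04} = 4.5247$.
Now
$V(\bar{y}_{LR}) = 12709.55$, and $\mathrm{Min.MSE}(\bar{y}_{SJ}) = 11491.74$.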
Thus the percent relative efficiency (RE) of the general class of estimators, $\bar{y}_{SJ}$, with
respect to the linear regression estimator, $\bar{y}_{LR}$, is given by
$RE = V(\bar{y}_{LR}) \times 100/\mathrm{Min.MSE}(\bar{y}_{SJ}) = 12709.55 \times 100/11491.74 = 110.59\%$.
It should be noted that in this case the relative efficiency is independent of the sample
size n.
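The independence from the sample size can be seen by writing the relative efficiency in terms of the moments alone. The sketch below assumes the commonly quoted form of the minimum mean squared error, $\left(\frac{1-f}{n}\right)S_y^2\left[1 - \rho_{xy}^2 - \frac{(\lambda_{12} - \rho_{xy}\lambda_{03})^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right]$, because the displayed formula of the theorem is not legible in this copy; with the population 1 values it reproduces the 110.59% quoted in the example. The function name is illustrative.

```python
def relative_efficiency_sj(rho, lam12, lam03, lam04):
    """Percent RE of the Srivastava-Jhajj class relative to the regression
    estimator; the factor ((1 - f) / n) * S_y^2 cancels, so n does not appear."""
    regression_part = 1 - rho ** 2
    gain = (lam12 - rho * lam03) ** 2 / (lam04 - 1 - lam03 ** 2)
    return 100 * regression_part / (regression_part - gain)

# Population 1 values quoted in the solution.
re_percent = relative_efficiency_sj(0.8038, 1.0982, 1.5936, 4.5247)
print(round(re_percent, 2))

# Bivariate normal case: lambda_12 = lambda_03 = 0 and lambda_04 = 3, so no gain.
print(relative_efficiency_sj(0.8, 0.0, 0.0, 3.0))
```

The second call illustrates the remark that a known variance of the auxiliary variable brings no advantage under bivariate normality.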
The next section of this chapter is devoted to constructing unbiased ratio
and product type estimators of the population mean. We will discuss
Quenouille's method, the interpenetrating sampling method, exactly unbiased ratio and
product type estimators, and bias filtration techniques.
We have observed that the ratio and product type of estimators are biased . Several
researchers have attempted to reduce the bias from these estimators. We should also
like to discuss a few methods to construct unbiased ratio and product type
estimators of population mean before going on to the problems of estimation of
finite population variance, correlation coefficient, and regression coefficient.
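(a) $\bar{y}_{R1} = \bar{y}_1\left(\frac{\bar{X}}{\bar{x}_1}\right)$, where $\bar{y}_1 = n^{-1}\sum_{i=1}^{n}y_i$ and $\bar{x}_1 = n^{-1}\sum_{i=1}^{n}x_i$ are the first half sample
means for Y and X variables, respectively;
(b) $\bar{y}_{R2} = \bar{y}_2\left(\frac{\bar{X}}{\bar{x}_2}\right)$, where $\bar{y}_2 = n^{-1}\sum_{i=1}^{n}y_i$ and $\bar{x}_2 = n^{-1}\sum_{i=1}^{n}x_i$ are the second half sample
means for Y and X variables, respectively;
(c) $\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)$, where $\bar{y} = (2n)^{-1}\sum_{i=1}^{2n}y_i$ and $\bar{x} = (2n)^{-1}\sum_{i=1}^{2n}x_i$ are the sample means for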
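and
$a = -\frac{(N-2n)}{2N}$. (3.2.9.5)
Proof. We have
$E(\bar{y}_Q) = E\left[a(\bar{y}_{R1} + \bar{y}_{R2}) + (1-2a)\bar{y}_R\right] = a\left[E(\bar{y}_{R1}) + E(\bar{y}_{R2})\right] + (1-2a)E(\bar{y}_R)$.
To the first order of approximation, the bias in $\bar{y}_Q$ vanishes if
$2a\left(\frac{1}{n} - \frac{1}{N}\right) + (1-2a)\left(\frac{1}{2n} - \frac{1}{N}\right) = 0$,
or if $a = -\frac{(N-2n)}{2N}$.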
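The role of the weight $a$ can be illustrated numerically. The sketch below assumes the estimator $\bar{y}_Q = a(\bar{y}_{R1} + \bar{y}_{R2}) + (1-2a)\bar{y}_R$ that appears in the proof, with two half samples of size n each; all function names are illustrative.

```python
def quenouille_weight(N, n):
    """Weight a = -(N - 2n) / (2N) for two half samples of size n each."""
    return -(N - 2 * n) / (2 * N)

def first_order_bias_factor(a, N, n):
    """Coefficient multiplying the common first-order bias term of y_Q."""
    return 2 * a * (1 / n - 1 / N) + (1 - 2 * a) * (1 / (2 * n) - 1 / N)

def ratio_estimate(y, x, X_bar):
    """Usual ratio estimate (y_bar / x_bar) * X_bar."""
    return (sum(y) / len(y)) / (sum(x) / len(x)) * X_bar

def quenouille_estimate(y1, x1, y2, x2, X_bar, N):
    """y_Q = a (y_R1 + y_R2) + (1 - 2a) y_R from two half samples."""
    a = quenouille_weight(N, len(y1))
    y_r1 = ratio_estimate(y1, x1, X_bar)
    y_r2 = ratio_estimate(y2, x2, X_bar)
    y_r = ratio_estimate(y1 + y2, x1 + x2, X_bar)
    return a * (y_r1 + y_r2) + (1 - 2 * a) * y_r

a = quenouille_weight(50, 8)
print(a, first_order_bias_factor(a, 50, 8))
print(quenouille_estimate([2, 4], [1, 2], [6, 8], [3, 4], X_bar=2.5, N=50))
```

For a very large population the weight tends to -1/2, which gives the familiar form $2\bar{y}_R - (\bar{y}_{R1} + \bar{y}_{R2})/2$.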
[Figure not recoverable; horizontal axis: Sample Size (n).]
For more details, one can refer to Singh and Singh (1993), Murthy (1962) and Rao
(1965a) . The reduction in bias to the desired degree by using the method of
Quenouille (1956) has also been discussed by Singh (1979) .
Let us first present an idea about the interpenetrating samples. If we want to select
n units with SRSWOR sampling, we can select k independent samples each of
size m = n/k, where we assume that n/k is an integer. We draw m units out of N
units, then put back these m units so as to make the population size the same. To
make the k samples independent, each individual sample of m units is selected
with SRSWOR sampling. Now we have k samples each of size m. From the j-th
sample, a ratio type estimator to estimate the population mean $\bar{Y}$ is
$\bar{y}_{Rj} = \bar{y}_j\left(\frac{\bar{X}}{\bar{x}_j}\right)$
where $\bar{y}_j = m^{-1}\sum_{i=1}^{m}y_i$ and $\bar{x}_j = m^{-1}\sum_{i=1}^{m}x_i$ denote the j-th sample means for
the Y and X variables, respectively, for j = 1, 2, ..., k. Let us define a new estimator of
the population mean $\bar{Y}$ as
$\bar{y}_{RK} = \frac{1}{k}\sum_{j=1}^{k}\bar{y}_{Rj} = \frac{1}{k}\sum_{j=1}^{k}\bar{y}_j\left(\frac{\bar{X}}{\bar{x}_j}\right)$. (3.2.9.6)
Also from the full sample information, we have the usual ratio estimator of the
population mean $\bar{Y}$ given by
$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)$
and
Note that m units are drawn k times from a population of size N, which is equivalent
to a sample of size n = km being drawn from a population of size kN. Thus we have the
following theorem:
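Theorem 3.2.9.2. An unbiased estimator of the population mean $\bar{Y}$ is given by
$\bar{y}_u = \frac{k\bar{y}_R - \bar{y}_{RK}}{k-1}$. (3.2.9.9)
Proof. We have
$E(\bar{y}_u) = E\left[\frac{k\bar{y}_R - \bar{y}_{RK}}{k-1}\right] = \frac{kE(\bar{y}_R) - E(\bar{y}_{RK})}{k-1}$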
Theorem 3.2.9.3. The variance of the unbiased estimator $\bar{y}_u$ of the population mean
$\bar{Y}$ is
Note that k > 1, thus the unbiased estimator $\bar{y}_u$ is less efficient than the ratio
estimator $\bar{y}_R$ in the case of finite populations.
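The combination in (3.2.9.9) can be computed directly from k interpenetrating samples. This is a minimal sketch with illustrative names; it takes the pooled ratio estimate $\bar{y}_R$ from all km observations and the average $\bar{y}_{RK}$ of the k separate ratio estimates, and the small data sets in the usage lines are made up for illustration only.

```python
def ratio_estimate(y, x, X_bar):
    """Usual ratio estimate (y_bar / x_bar) * X_bar."""
    return (sum(y) / len(y)) / (sum(x) / len(x)) * X_bar

def interpenetrating_estimate(samples, X_bar):
    """y_u = (k * y_R - y_RK) / (k - 1) for k samples given as (y, x) pairs."""
    k = len(samples)
    pooled_y = [v for y, _ in samples for v in y]
    pooled_x = [v for _, x in samples for v in x]
    y_r = ratio_estimate(pooled_y, pooled_x, X_bar)                    # pooled sample
    y_rk = sum(ratio_estimate(y, x, X_bar) for y, x in samples) / k    # mean of k estimates
    return (k * y_r - y_rk) / (k - 1)

# Two hypothetical samples of two units each, with X_bar = 1.
samples = [([1, 3], [1, 1]), ([2, 2], [2, 2])]
y_u = interpenetrating_estimate(samples, X_bar=1.0)
print(y_u)
```

When y is exactly proportional to x in every sample, all the ratio estimates coincide and the combination returns that common value.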
Example 3.2.9.1. Select three different samples each of five units by using
SRSWOR sampling from the population 1 given in the Appendix. Collect the
information for the real and nonreal estate farm loans from the states selected in
each sample. The average nonreal estate farm loan is assumed to be known. Obtain
three different ratio estimates of the average real estate farm loans from the
information collected in the three samples. Pool the information collected in the three
samples to obtain a pooled ratio estimate of the average real estate farm loans.
( a ) Derive an unbiased estimate of the average real estate farm loans.
( b ) Construct a 95% confidence interval.
Given: Average nonreal estate farm loans $878.16.
Sample 1
Random Number (1 ≤ R_i ≤ 50)   State   Real estate farm loans, y_i   Nonreal estate farm loans, x_i
01                              AL       408.978                       348.334
23                              MN      1354.768                      2466.892
46                              VA       321.583                       188.477
04                              AR       907.700                       848.317
32                              NY       201.631                       426.274
Sum                                     3194.660                      4278.294
[The tables for the remaining samples are not recoverable; one surviving column reads 6.044, 1213.024, 1100.745, 323.028, 553.266, with sum 3196.107.]
Pooled Sample
State   y_i        x_i        (y_i - ȳ)   (x_i - x̄)   (y_i - ȳ)²   (x_i - x̄)²   (y_i - ȳ)(x_i - x̄)
AL      408.978    348.334    -198.1400   -343.722     39259.3      118144.9       68104.989
MN     1354.768   2466.892     747.6503   1774.836    558981.0     3150042.0     1326956.627
VA      321.583    188.477    -285.5350   -503.579     81530.1      253591.9      143789.300
AR      907.700    848.317     300.5823    156.261     90349.7       24417.5       46969.256
Continued .....
A ratio estimate of the average real estate farm loans from the pooled sample
information is given by
An unbiased estimate of the average real estate farm loans in the United States is
Using Table 2 from the Appendix the 95% confidence interval of the average
amount of the real estate farm loans in the United States is
Note that
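$\bar{R} = \frac{1}{N}\sum_{i=1}^{N}\frac{y_i}{x_i}$ and $R_i = \frac{y_i}{x_i}$.
Therefore the bias in the estimator $\bar{y}_R$ is given by
$B(\bar{y}_R) = E(\bar{y}_R) - \bar{Y} = \frac{\bar{X}}{N}\sum_{i=1}^{N}\frac{y_i}{x_i} - \bar{Y} = \frac{\bar{X}}{N}\sum_{i=1}^{N}\frac{y_i}{x_i} - \frac{1}{N}\sum_{i=1}^{N}\frac{y_i}{x_i}x_i = \bar{X}\bar{R} - \frac{1}{N}\sum_{i=1}^{N}R_ix_i$
$= -\left(\frac{N-1}{N}\right)S_{rx}$, with $S_{rx} = (N-1)^{-1}\sum_{i=1}^{N}(R_i - \bar{R})(x_i - \bar{X})$. An unbiased estimator of $S_{rx}$ is $s_{rx}$,
where
$s_{rx} = \left(\frac{n}{n-1}\right)\left[\bar{y} - \bar{r}\bar{x}\right]$
and hence an estimator of $B(\bar{y}_R)$ is
$\hat{B}(\bar{y}_R) = -\left(\frac{N-1}{N}\right)\frac{n}{(n-1)}\left(\bar{y} - \bar{r}\bar{x}\right)$. (3.2.9.14)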
Thus we have the following theorem :
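Theorem 3.2.9.4. An unbiased ratio type estimator of the population mean $\bar{Y}$ is
given by
$\bar{y}_{HR} = \bar{r}\bar{X} + \frac{(N-1)}{N}\frac{n}{(n-1)}\left(\bar{y} - \bar{r}\bar{x}\right)$. (3.2.9.15)
Proof. We have
$E(\bar{y}_{HR}) = \frac{\bar{X}}{N}\sum_{i=1}^{N}\frac{y_i}{x_i} + \frac{1}{N}\left[\sum_{i=1}^{N}y_i - \bar{X}\sum_{i=1}^{N}\frac{y_i}{x_i}\right] = \frac{\bar{X}}{N}\sum_{i=1}^{N}\frac{y_i}{x_i} + \frac{1}{N}\sum_{i=1}^{N}y_i - \frac{\bar{X}}{N}\sum_{i=1}^{N}\frac{y_i}{x_i} = \bar{Y}$.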
Hence the theorem .
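The unbiasedness claimed in Theorem 3.2.9.4 can be checked numerically by averaging (3.2.9.15) over every possible SRSWOR sample of a small population. The sketch below uses the five-unit population A to E from earlier in this chapter purely as a convenient test case; the function name is illustrative.

```python
from itertools import combinations

def hartley_ross(y, x, X_bar, N):
    """y_HR = r_bar * X_bar + ((N - 1) / N) * (n / (n - 1)) * (y_bar - r_bar * x_bar)."""
    n = len(y)
    r_bar = sum(yi / xi for yi, xi in zip(y, x)) / n  # mean of the unit ratios
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    return r_bar * X_bar + (N - 1) / N * n / (n - 1) * (y_bar - r_bar * x_bar)

# Small population used only to check unbiasedness by enumeration.
pop_y = [9, 11, 13, 16, 21]
pop_x = [14, 18, 19, 20, 24]
N, n = len(pop_y), 3
Y_bar = sum(pop_y) / N
X_bar = sum(pop_x) / N

estimates = [
    hartley_ross([pop_y[i] for i in s], [pop_x[i] for i in s], X_bar, N)
    for s in combinations(range(N), n)
]
expected_value = sum(estimates) / len(estimates)
print(Y_bar, expected_value)
```

The average over all ten samples agrees with the population mean up to floating point error, whereas the usual ratio estimator would show a small bias on the same population.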
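Theorem 3.2.9.5. For large samples, the variance of the estimator $\bar{y}_{HR}$ is given by
$V(\bar{y}_{HR}) = \frac{1}{n}\left[\sigma_y^2 + \bar{R}^2\sigma_x^2 - 2\bar{R}\sigma_{xy}\right]$, where $\bar{R} = \frac{1}{N}\sum_{i=1}^{N}\frac{y_i}{x_i}$. (3.2.9.16)
Proof. For large values of n and N, the estimator $\bar{y}_{HR}$ can be written as
$\bar{y}_{HR} = \bar{r}\bar{X} + \frac{(N-1)}{N}\frac{n}{(n-1)}\left(\bar{y} - \bar{r}\bar{x}\right) \approx \bar{r}\bar{X} + \left(\bar{y} - \bar{r}\bar{x}\right) = \bar{y} + \bar{r}\left(\bar{X} - \bar{x}\right)$.
Define $\tau = \frac{\bar{r}}{\bar{R}} - 1$ such that $E(\tau) \approx 0$. The estimator $\bar{y}_{HR}$ in terms of $\epsilon_0$, $\epsilon_1$ and $\tau$
can be written as
$\bar{y}_{HR} = \bar{Y}(1+\epsilon_0) - \bar{R}\bar{X}\epsilon_1(1+\tau)$.
Now we have
$V(\bar{y}_{HR}) = E\left[\bar{y}_{HR} - E(\bar{y}_{HR})\right]^2 \approx E\left[\bar{Y}(1+\epsilon_0) - \bar{R}\bar{X}\epsilon_1 - \bar{Y}\right]^2$
$= E\left[\bar{Y}^2\epsilon_0^2 + \bar{R}^2\bar{X}^2\epsilon_1^2 - 2\bar{R}\bar{X}\bar{Y}\epsilon_0\epsilon_1\right] \approx \frac{1}{n}\left[\bar{Y}^2C_y^2 + \bar{R}^2\bar{X}^2C_x^2 - 2\bar{R}\bar{X}\bar{Y}\rho_{xy}C_xC_y\right]$.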
Remark 3.2.9.1. Exact variance of the estimator YHR is available in Robson (1957).
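Theorem 3.2.9.6. In the case of an infinite population, the unbiased ratio estimator $\bar{y}_{HR}$
is more efficient than the usual ratio estimator $\bar{y}_R$ if either
$(\bar{R} - R) < 0$ and $\beta < \frac{\bar{R}+R}{2}$ or $(\bar{R} - R) > 0$ and $\beta > \frac{\bar{R}+R}{2}$.
Proof. For large values of n and N we have
$\mathrm{MSE}(\bar{y}_R) = \frac{1}{n}\left[\sigma_y^2 + R^2\sigma_x^2 - 2R\sigma_{xy}\right]$ and $V(\bar{y}_{HR}) = \frac{1}{n}\left[\sigma_y^2 + \bar{R}^2\sigma_x^2 - 2\bar{R}\sigma_{xy}\right]$.
Now the estimator $\bar{y}_{HR}$ is more efficient than $\bar{y}_R$ if
$V(\bar{y}_{HR}) < \mathrm{MSE}(\bar{y}_R)$
or if $\frac{1}{n}\left[\sigma_y^2 + \bar{R}^2\sigma_x^2 - 2\bar{R}\sigma_{xy}\right] < \frac{1}{n}\left[\sigma_y^2 + R^2\sigma_x^2 - 2R\sigma_{xy}\right]$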
Example 3.2.9.2. The estimation of the average amount of real estate farm loans (in
$000) is an important issue for the banks operating in the United States of America.
A statistician suggests that they consider the terminology of unbiased estimation
proposed by Hartley and Ross (1954) . Select an SRSWR sample of eight states
from the population I given in the Appendix and derive the appropriate estimate of
the real estate farm loans, given that the average amount $878.16 of nonreal estate
farm loans (in $000) for the year 1997 is known .
Solution. An SRSWR sample of eight states is selected by using the 3151 and 3th
columns of the Pseudo-Random Numbers (PRN) given in Table I of the Appendix
as: 17,36,50,50,31, 05,18, and 50. Note that the state WY has been selected
three times in the sample .
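$\bar{y}_{HR} = \bar{r}\bar{X} + \frac{(N-1)}{N}\frac{n}{(n-1)}\left(\bar{y} - \bar{r}\bar{x}\right)$
$\hat{v}(\bar{y}_{HR}) = \frac{1}{n}\left[s_y^2 + \bar{r}^2s_x^2 - 2\bar{r}s_{xy}\right]$
$\hat{B}(\bar{y}_P) = \left(\frac{1-f}{n}\right)\frac{s_{xy}}{\bar{X}}$. (3.2.9.19)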
Thus we have the following theorem:
Theorem 3.2.9.7. A product type unbiased estimator of the population mean is:
$\bar{y}_{pu} = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right) - \left(\frac{1-f}{n}\right)\frac{s_{xy}}{\bar{X}}$. (3.2.9.20)
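Proof. The estimator $\bar{y}_{pu}$ in terms of $\epsilon_0$, $\epsilon_1$ and $\epsilon_4$ can be written as
$\bar{y}_{pu} = \bar{y}\left(\frac{\bar{x}}{\bar{X}}\right) - \left(\frac{1-f}{n}\right)\frac{s_{xy}}{\bar{X}} = \bar{Y}(1+\epsilon_0)(1+\epsilon_1) - \left(\frac{1-f}{n}\right)\frac{S_{xy}(1+\epsilon_4)}{\bar{X}}$.
Taking expected values on both sides we have
$E(\bar{y}_{pu}) = \bar{Y} + \left(\frac{1-f}{n}\right)\bar{Y}\rho_{xy}C_xC_y - \left(\frac{1-f}{n}\right)\frac{S_{xy}}{\bar{X}}$
$= \bar{Y} + \left(\frac{1-f}{n}\right)\bar{Y}\frac{S_{xy}}{S_xS_y}\frac{S_x}{\bar{X}}\frac{S_y}{\bar{Y}} - \left(\frac{1-f}{n}\right)\frac{S_{xy}}{\bar{X}}$
$= \bar{Y} + \left(\frac{1-f}{n}\right)\frac{S_{xy}}{\bar{X}} - \left(\frac{1-f}{n}\right)\frac{S_{xy}}{\bar{X}} = \bar{Y}$.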
Hence the theorem.
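As with the Hartley and Ross estimator, the unbiasedness of (3.2.9.20) can be verified by enumerating all SRSWOR samples of a small population. This is a sketch only: the function name is illustrative, and the five-unit population is reused as a test case even though its positive correlation would not make a product estimator efficient in practice.

```python
from itertools import combinations

def unbiased_product(y, x, X_bar, N):
    """y_pu = y_bar * (x_bar / X_bar) - ((1 - f) / n) * s_xy / X_bar."""
    n = len(y)
    f = n / N
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_xy = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x)) / (n - 1)
    return y_bar * x_bar / X_bar - (1 - f) / n * s_xy / X_bar

pop_y = [9, 11, 13, 16, 21]
pop_x = [14, 18, 19, 20, 24]
N, n = len(pop_y), 3
Y_bar = sum(pop_y) / N
X_bar = sum(pop_x) / N

all_samples = list(combinations(range(N), n))
product_mean = sum(
    (sum(pop_y[i] for i in s) / n) * (sum(pop_x[i] for i in s) / n) / X_bar
    for s in all_samples
) / len(all_samples)
unbiased_mean = sum(
    unbiased_product([pop_y[i] for i in s], [pop_x[i] for i in s], X_bar, N)
    for s in all_samples
) / len(all_samples)
print(Y_bar, product_mean, unbiased_mean)
```

The plain product estimator averages to a value above the population mean here, while the corrected version averages to the population mean itself.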
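Theorem 3.2.9.8. The variance $V(\bar{y}_{pu})$, to the first order of approximation, is the same as
the mean squared error of the product estimator $\bar{y}_P$:
$V(\bar{y}_{pu}) = \left(\frac{1-f}{n}\right)\left[S_y^2 + R^2S_x^2 + 2RS_{xy}\right]$. (3.2.9.21)
Proof. We have
$V(\bar{y}_{pu}) = E\left[\bar{y}_{pu} - \bar{Y}\right]^2 = E\left[\bar{Y}\left(1 + \epsilon_0 + \epsilon_1 + \epsilon_0\epsilon_1\right) - \left(\frac{1-f}{n}\right)\frac{S_{xy}(1+\epsilon_4)}{\bar{X}} - \bar{Y}\right]^2$
$\approx \left(\frac{1-f}{n}\right)\left[S_y^2 + R^2S_x^2 + 2RS_{xy}\right] = \mathrm{MSE}(\bar{y}_P)$.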
Example 3.2.9.3. It is a well known phenomenon that as people become older their
sleeping time reduces. A psychologist wants to study the average duration of sleep
(in minutes) during the night for the persons 50 years of age or over in a small
village of the United States. Instead of asking everybody, the psychologist selects
an SRSWOR sample of six persons and records the information as given below
No.                               7    12    15    19    24    29
Age x (years)                    67    70    53    77    87    66
Duration of sleep y (minutes)   420   360   510   330   270   390
The average age of 67.267 years of the subjects is known as shown in the
population 2 in the Appendix. Apply the unbiased product method of estimation for
estimating the average sleep time in the particular village under study. Also find an
estimator of the variance of the unbiased product estimator and hence deduce a 95%
confidence interval.
7 420
12 360
15 510
19 330
24 270
29 390
Sum
$\hat{v}(\bar{y}_{pu}) = \left(\frac{1-f}{n}\right)\left[s_y^2 + r^2s_x^2 + 2rs_{xy}\right]$
Singh and Sahoo (1989) have considered the problem of estimation of the
parameter $D = \bar{Y}/\bar{X}^{\delta}$, and defined a generalized estimator $d_g$ of the form
$d_g = \bar{y}/\bar{x}^{\delta}$ (3.2.9.22)
where $\delta$ is a constant which takes the value +1 and -1 according as we wish to
estimate the population ratio or product. The estimator $d_g$ in terms of $\epsilon_0$ and $\epsilon_1$ can
be written as:
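$d_g = \frac{\bar{Y}(1+\epsilon_0)}{\bar{X}^{\delta}(1+\epsilon_1)^{\delta}} = D(1+\epsilon_0)(1+\epsilon_1)^{-\delta} = D(1+\epsilon_0)\left\{1 - \delta\epsilon_1 + \frac{\delta(\delta+1)}{2}\epsilon_1^2 + \cdots\right\}$. (3.2.9.23)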
Taking expected values on both sides of (3.2.9.23) we obtain
(3.2 .9.24)
Employing the techniques developed by Beale (1962), Tin (1965) and Sahoo
(1983), we may construct three almost unbiased estimators, up to terms of order
$O(n^{-1})$, for D as
$d_B = d\left[1 + \delta c_{xy}\right]\left[1 + \frac{\delta(\delta+1)}{2}c_x^2\right]^{-1}$, (3.2.9.25)
(3.2.9.26)
and
respectively, where $c_x^2 = s_x^2/\bar{x}^2$ and $c_{xy} = s_{xy}/(\bar{x}\bar{y})$ are the estimators of $C_x^2$ and
$C_{xy}$, respectively. Singh and Sahoo (1989) proposed a class of almost unbiased
ratio and product estimators. According to them, whatever the sample chosen, let
$t = (\delta, c_{xy}, c_x^2)$ assume values in a bounded closed convex subset S of the three
dimensional real space containing the point $T = (\delta, C_{xy}, C_x^2)$. Let f(t) be a function
of t satisfying the following conditions:
( a ) The function f(t) is continuous in S;
( b ) The first and second order partial derivatives of f(t) exist and are continuous in S;
( c ) After the expansion under the given conditions
Theorem 3.2.9.9. Let f(t) be any function of $\delta$, $c_{xy}$ and $c_x^2$ satisfying the above
conditions. Then a class of almost unbiased estimators of D may be defined as
$d_{ss} = d\,f(t)$. (3.2.9.28)
The class of estimators represented by $d_{ss}$ gives us an infinite number of almost
unbiased ratio and product estimators by substituting a proper choice of f(t). It is
easy to see that the estimators suggested by Beale (1962), Tin (1965), Sahoo
(1983), Robson (1957) and the estimators of the type:
$d_1 = d\left[1 + \delta c_{xy} - \frac{\delta(\delta+1)}{2}c_x^2\right]$; $d_2 = d\left(1 - \delta c_{xy}\right)^{-1}\left[1 - \frac{\delta(\delta+1)}{2}c_x^2\right]$;
$d_3 = d\left(1 - \delta c_{xy}\right)^{-1}\left[1 + \frac{\delta(\delta+1)}{2}c_x^2\right]^{-1}$; $d_4 = d\left[1 + \frac{\delta(\delta+1)}{2}c_x^2\right]^{-1}\exp\left(\delta c_{xy}\right)$;
Singh and Singh (1991, 1993a, 1993b, 1993c) have suggested a new method to
separate the bias precipitates from the ratio and product type estimators by using a
funnel connected with a filter paper. The apparatus consists of a linear variety of
estimators and linear constraints. Consider
$R_i = \bar{y}\left(\bar{X}/\bar{x}\right)^i$, such that $R_i \in G$ for i = 1,2,3
where G denotes the set of all possible ratio type estimators for estimating the
population mean $\bar{Y}$.
By definition the set G will be a linear variety if
$R_s = \sum_{i=1}^{3}g_iR_i \in G$ (3.2.9.29)
for
$\sum_{i=1}^{3}g_i = 1$ and $g_i \in \mathbb{R}$ (3.2.9.30)
where $g_i$ (i = 1,2,3) denotes the amount of chemicals used for separating bias
precipitates and $\mathbb{R}$ denotes the set of real numbers. Using (3.2.9.30), the relation
(3.2.9.29) in terms of $\epsilon_0$ and $\epsilon_1$ may be written as
Then the mean square error (MSE) to the first order of approximation is given by
$\mathrm{MSE}(R_s) = \left(\frac{1-f}{n}\right)\bar{Y}^2\left[C_y^2 + K^2C_x^2 - 2K\rho_{xy}C_xC_y\right]$. (3.2.9.33)
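From (3.2.9.30), (3.2.9.32) and (3.2.9.34), Singh and Singh (1991, 1993a, 1993b,
1993c) made a funnel consisting of two equations given by
$\sum_{i=1}^{3}g_i = 1$ (3.2.9.36)
and
$\sum_{i=1}^{3}i\,g_i = \rho_{xy}\frac{C_y}{C_x}$. (3.2.9.37)
$\sum_{i=1}^{3}g_iB(R_i) = 0$ (3.2.9.38)
where $B(R_i)$ denotes the bias in the estimator $R_i$, i = 1,2,3, of the population mean, and the above
three equations can be written as
$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ B(R_1) & B(R_2) & B(R_3) \end{bmatrix}\begin{bmatrix} g_1 \\ g_2 \\ g_3 \end{bmatrix} = \begin{bmatrix} 1 \\ \rho_{xy}C_y/C_x \\ 0 \end{bmatrix}$ or $A_{3\times3}\,g_{3\times1} = K_{3\times1}$, (3.2.9.39)
and
$B(R_3) = \left(\frac{1-f}{n}\right)\bar{Y}\left[6C_x^2 - 3\rho_{xy}C_yC_x\right]$. (3.2.9.44)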
Example 3.2.9.4. Ms. Stephanie Singh wishes to estimate the average amount of
real estate farm loans (in $000) during 1997 in the United States using known
information about the nonreal estate farm loans. She would like to apply the method
of filtration of bias from the ratio method of estimation. Suggest to her the values of
the required real constants $g_i$, i = 1,2,3.
Given: $\rho_{xy} = 0.8038$, $C_x^2 = 1.5256$ and $C_y^2 = 1.1086$.
Solution. Here $\rho_{xy}\frac{C_y}{C_x} = 0.8038\sqrt{\frac{1.1086}{1.5256}} = 0.6852$, thus the values of $g_1$, $g_2$ and $g_3$ are:
$g_1 = 3 - 3\rho_{xy}\frac{C_y}{C_x} + \rho_{xy}^2\frac{C_y^2}{C_x^2} = 3 - 3 \times 0.6852 + 0.6852^2 = 1.414$,
$g_2 = -3 + 5\rho_{xy}\frac{C_y}{C_x} - 2\rho_{xy}^2\frac{C_y^2}{C_x^2} = -3 + 5 \times 0.6852 - 2 \times 0.6852^2 = -0.513$,
and
$g_3 = 1 - 2\rho_{xy}\frac{C_y}{C_x} + \rho_{xy}^2\frac{C_y^2}{C_x^2} = 1 - 2 \times 0.6852 + 0.6852^2 = 0.099$.
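The constants of Example 3.2.9.4 can be recomputed and checked against the constraints of the funnel. The sketch below uses the closed forms from the solution; the function name is illustrative, and the third check relies on the first-order bias of $R_i = \bar{y}(\bar{X}/\bar{x})^i$ being proportional to $\frac{i(i+1)}{2}C_x^2 - i\rho_{xy}C_xC_y$, which is an assumption consistent with the bias of $R_3$ quoted in (3.2.9.44).

```python
import math

def filtration_constants(K):
    """Closed forms for g1, g2, g3 in terms of K = rho_xy * C_y / C_x."""
    g1 = 3 - 3 * K + K ** 2
    g2 = -3 + 5 * K - 2 * K ** 2
    g3 = 1 - 2 * K + K ** 2
    return g1, g2, g3

# Values given in the example: rho_xy, C_x^2 and C_y^2.
K = 0.8038 * math.sqrt(1.1086 / 1.5256)
g1, g2, g3 = filtration_constants(K)
print(round(K, 4), round(g1, 3), round(g2, 3), round(g3, 3))

# The constants satisfy the linear constraints of the funnel.
sum_g = g1 + g2 + g3                # should equal 1
sum_ig = g1 + 2 * g2 + 3 * g3       # should equal K
bias_sum = g1 + 3 * g2 + 6 * g3     # should equal K**2 (zero combined bias)
```

Solving the 3 by 3 linear system directly would give the same constants; the closed forms are simply its solution written out.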
Example 3.2.9.5. Ms. Stephanie Singh considers the problem of estimating the
average amount of real estate farm loans (in $000) during 1997 in the United States.
She takes an SRSWOR sample of eight states from the population 1 of the
Appendix and collects the following information:
NY       ME       IL         CT      CA         AZ        OH
16.710   51.539   2610.572   4.373   3928.732   431.439   635.774
The average amount $878.16 of nonreal estate farm loans (in $000) for the year
1997 is known. She wishes to use three estimators $R_i = \bar{y}\left(\bar{X}/\bar{x}\right)^i$, such that $R_i \in G$
for i = 1,2,3, where G denotes the set of all possible ratio type estimators for
estimating the population mean $\bar{Y}$. Discuss her estimate based on the estimator
$R_s = \sum_{i=1}^{3}g_iR_i \in G$ for $\sum_{i=1}^{3}g_i = 1$ and $g_i \in \mathbb{R}$.
Also find an estimator of the mean squared error of the estimator R s and hence
deduce a 95% confidence interval.
Given : gl = 1.414 , g 2 =-0.513 and g3 =0 .099 .
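Thus the unbiased ratio type estimate of the average amount of real estate farm loans during 1997, $\bar{Y}$ (for example), is given by
$$R_s = \sum_{i=1}^{3} g_i\,\bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)^i = 516.91$$
with
$$\widehat{\mathrm{MSE}}(R_s) = \left(\frac{1-f}{n}\right)s_y^2\left(1 - r_{xy}^2\right) = \left(\frac{1-0.16}{8}\right)\times 657650.11\times\left[1 - (0.8413)^2\right] = 20178.35.$$
A $(1-\alpha)100\%$ confidence interval for the population mean $\bar{Y}$ is given by
$$R_s \pm t_{\alpha/2}(df = n-1)\sqrt{\widehat{\mathrm{MSE}}(R_s)}.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
$$516.91 \pm 2.365\sqrt{20178.35} \quad\text{or}\quad [180.96,\ 852.86].$$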
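The arithmetic of this interval can be reproduced with a few lines of Python. This is only a sketch that plugs in the quantities quoted in the example (the point estimate 516.91, $s_y^2 = 657650.11$, $r_{xy} = 0.8413$ and the tabulated $t$ value 2.365); it does not recompute them from the sample.

```python
import math

n, N = 8, 50
f = n / N                      # sampling fraction, 0.16
s_y2 = 657650.11               # sample variance of y, as quoted in the example
r_xy = 0.8413                  # sample correlation, as quoted in the example
r_s = 516.91                   # point estimate R_s, as quoted in the example
t_crit = 2.365                 # t value with n - 1 = 7 df, from Table 2

mse_hat = (1 - f) / n * s_y2 * (1 - r_xy ** 2)
half_width = t_crit * math.sqrt(mse_hat)
lower, upper = r_s - half_width, r_s + half_width
print(round(mse_hat, 2), round(lower, 2), round(upper, 2))
```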
Unbiased ratio and product type estimators have also been discussed by Tukey (1958), Durbin (1959), Murthy and Nanjamma (1959), Nieto de Pascual (1961), Rao (1966), Rao and Webster (1966), Murthy (1967), Rao (1969), Hutchison (1971), Schucany, Gray, and Owen (1971), Sharot (1976) and Rao (1981). A complete review of the reduction of bias using auxiliary information can be found in Sahoo, Sahoo, and Wywial (1997). Williams (1958, 1961) compared the two unbiased regression estimators, one suggested by himself and the other by Mickey (1959). Williams (1963) compared the precision of some unbiased regression estimators. Rao (1967) investigated the performance of Mickey's estimator in a variety of natural populations and concluded that it is usually inferior to the standard ratio and regression estimators. Sahoo (1994) compared the efficiency of Williams' (1963) estimator with the standard regression estimator and found through an empirical study that the former is usually inferior to the latter. Rao and Rao (1971) investigated the exact efficiency of Mickey's unbiased ratio type estimator.
The next section has been devoted to estimating the finite population variance in the
presence of auxiliary information.
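Isaki (1983) proposed a ratio type estimator of the finite population variance $S_y^2$ as
$$s_1^2 = s_y^2\left(\frac{S_x^2}{s_x^2}\right). \qquad (3.3.1.1)$$
Theorem 3.3.1.1. The bias, up to terms of order $O(n^{-1})$, in the estimator $s_1^2$ is
$$B(s_1^2) = \left(\frac{1-f}{n}\right)S_y^2\left[\lambda_{04} - \lambda_{22}\right].$$
Theorem 3.3.1.2. The mean squared error of the estimator $s_1^2$, up to the first order of approximation, is
$$\mathrm{MSE}(s_1^2) = \left(\frac{1-f}{n}\right)S_y^4\left[\lambda_{40} + \lambda_{04} - 2\lambda_{22}\right].$$
Proof. We have
$$\mathrm{MSE}(s_1^2) = E\left[s_1^2 - S_y^2\right]^2 \approx E\left[S_y^2\left(1 + \varepsilon_2 - \varepsilon_3 + \varepsilon_3^2 - \varepsilon_2\varepsilon_3 + \cdots\right) - S_y^2\right]^2$$
$$\approx S_y^4E\left[\varepsilon_2 - \varepsilon_3\right]^2 = S_y^4E\left[\varepsilon_2^2 + \varepsilon_3^2 - 2\varepsilon_2\varepsilon_3\right]$$
$$= \left(\frac{1-f}{n}\right)S_y^4\left[(\lambda_{40} - 1) + (\lambda_{04} - 1) - 2(\lambda_{22} - 1)\right].$$
Hence the theorem.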
Theorem 3.3.1.3. A consistent estimator of the mean squared error of the estimator $s_1^2$, up to the first order of approximation, is given by
$$\widehat{\mathrm{MSE}}(s_1^2) = \left(\frac{1-f}{n}\right)s_y^4\left[\hat{\lambda}_{40} + \hat{\lambda}_{04} - 2\hat{\lambda}_{22}\right]. \qquad (3.3.1.5)$$

Units   A    B    C    D    E
y_i     9   11   13   16   21
x_i    14   18   19   22   24
( a ) Find the population mean squares $S_y^2$ and $S_x^2$ of the study variable ($Y$) and auxiliary variable ($X$), respectively.
( b ) Select all possible samples of three units ($n = 3$) with SRSWOR sampling.
( c ) Find the sample variances $s_y^2$ and $s_x^2$ from each sample.
( d ) Find the exact mean square error of the estimator $s_y^2$ using the definition.
( e ) Assuming that the population mean square $S_x^2$ of the auxiliary variable is known, find the ratio estimates of $S_y^2$ as
$$s_1^2 = s_y^2\left(\frac{S_x^2}{s_x^2}\right)$$
from each sample, and its exact mean square error by definition.
( f ) Find the relative efficiency of the ratio estimator $s_1^2$ with respect to the sample estimator $s_y^2$.
Solution. ( a ) The population mean squares of the $Y$ and $X$ variables are given by $S_y^2 = 22$ and $S_x^2 = 14.8$.
( b ) and ( c ) All the 10 possible samples, the estimates $s_y^2$, $s_x^2$ and $s_1^2$ from each sample, and related results are given as:

Sample  y values   x values    s_y^2    s_x^2    s_1^2   p(s)   (s_1^2 - S_y^2)^2   (s_y^2 - S_y^2)^2
1       9 11 13    14 18 19     4.000    7.000    8.457   0.1        183.409             324.000
2       9 11 16    14 18 22    13.000   16.000   12.025   0.1         99.501              81.000
3       9 11 21    14 18 24    41.333   25.333   24.147   0.1          4.611             373.778
4       9 13 16    14 19 22    12.333   16.333   11.176   0.1        117.170              93.444
Continued ......
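Since the table is only partly shown, the full enumeration can be sketched in Python. The snippet below lists all 10 SRSWOR samples of size 3 from the five units, computes $s_y^2$, $s_x^2$ and the ratio estimate $s_1^2$ for each, and then obtains the exact mean square errors by definition (each sample has probability 0.1). The helper name `mean_square` is illustrative.

```python
from itertools import combinations

y = [9, 11, 13, 16, 21]
x = [14, 18, 19, 22, 24]

def mean_square(values):
    """Mean square with divisor (number of values - 1)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

S_y2, S_x2 = mean_square(y), mean_square(x)   # population mean squares

rows = []
for idx in combinations(range(len(y)), 3):    # all 10 possible samples
    sy2 = mean_square([y[i] for i in idx])
    sx2 = mean_square([x[i] for i in idx])
    s1 = sy2 * S_x2 / sx2                     # ratio estimate of S_y^2
    rows.append((idx, sy2, sx2, s1))

# Exact MSEs by definition: equal-probability average over all samples.
mse_usual = sum((r[1] - S_y2) ** 2 for r in rows) / len(rows)
mse_ratio = sum((r[3] - S_y2) ** 2 for r in rows) / len(rows)
re_percent = 100 * mse_usual / mse_ratio      # relative efficiency of s_1^2

for idx, sy2, sx2, s1 in rows:
    print(idx, round(sy2, 3), round(sx2, 3), round(s1, 3))
print(round(mse_usual, 3), round(mse_ratio, 3), round(re_percent, 2))
```

The first four printed rows should agree with the rows of the table above.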
Example 3.3.1.2. Mr. Jack Allen wishes to estimate the finite population variance of real estate farm loans in the United States using the known benchmark as the variance of the auxiliary variable nonreal estate farm loans. Using the information given in population 1 of the Appendix, discuss the relative efficiency of the ratio type estimator
$$s_1^2 = s_y^2\left[\frac{S_x^2}{s_x^2}\right]$$
with respect to the usual estimator $s_y^2$.
Given: $\lambda_{40} = 3.5822$, $\lambda_{04} = 4.5247$ and $\lambda_{22} = 2.8411$.
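A quick numerical sketch of the comparison asked for in this example. It assumes the first-order results implied by the proof of Theorem 3.3.1.2, namely $V(s_y^2) \approx \frac{1-f}{n}S_y^4(\lambda_{40}-1)$ and $\mathrm{MSE}(s_1^2) \approx \frac{1-f}{n}S_y^4(\lambda_{40}+\lambda_{04}-2\lambda_{22})$, so the common factor cancels in the ratio.

```python
lam40, lam04, lam22 = 3.5822, 4.5247, 2.8411   # values given in the example

var_usual = lam40 - 1                  # bracket of V(s_y^2), first order
mse_ratio = lam40 + lam04 - 2 * lam22  # bracket of MSE(s_1^2), first order

# Percent relative efficiency of s_1^2 with respect to s_y^2.
re_percent = 100 * var_usual / mse_ratio
print(round(re_percent, 2))
```

Because the factor $\frac{1-f}{n}S_y^4$ cancels, this first-order relative efficiency does not depend on the sample size.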
given by
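Theorem 3.3.1.4. The minimum sample size for the relative standard error of the estimator $s_1^2$ to be less than or equal to $\phi$ is given by
$$n \ge \left[\frac{\phi^2}{\lambda_{40} + \lambda_{04} - 2\lambda_{22}} + \frac{1}{N}\right]^{-1}. \qquad (3.3.1.6)$$
Proof. We have
$$\left(\frac{1}{n} - \frac{1}{N}\right)\left(\lambda_{40} + \lambda_{04} - 2\lambda_{22}\right) \le \phi^2$$
or
$$\frac{1}{n} \le \frac{\phi^2}{\lambda_{40} + \lambda_{04} - 2\lambda_{22}} + \frac{1}{N}.$$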
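The bound (3.3.1.6) is easy to evaluate. The sketch below uses the $\lambda$ values quoted for population 1, $N = 50$ states, and $\phi = 0.45$ as an illustrative target; the function name `min_sample_size` is not from the text.

```python
import math

def min_sample_size(phi, N, lam40, lam04, lam22):
    """Evaluate the right-hand side of (3.3.1.6) and round up to an integer n."""
    d = lam40 + lam04 - 2 * lam22
    bound = 1.0 / (phi ** 2 / d + 1.0 / N)
    return bound, math.ceil(bound)

bound, n_min = min_sample_size(0.45, 50, 3.5822, 4.5247, 2.8411)
print(round(bound, 2), n_min)
```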
Example 3.3.1.3. Mr. Jack Allen is a bank consultant in the United States of America. He considers the problem of estimating the finite population variance of the real estate farm loans in the United States. Based on the information given in population 1 of the Appendix, what is the minimum sample size required for the estimator $s_1^2$ to have a relative standard error of at most 45%?
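Solution. Using the 23rd and 24th columns of the Pseudo-Random Numbers Table 1 given in the Appendix, we selected the 10 distinct random numbers between 1 and 50 as: 44, 32, 33, 09, 10, 50, 31, 38, 06, and 13. The results obtained are as given below.
$$\hat{\mu}_{04} = \frac{\sum_{i=1}^{10}(x_i - \bar{x})^4}{10-1} = 1.6198\times 10^{12}, \qquad \hat{\mu}_{20} = s_y^2 = \frac{\sum_{i=1}^{10}(y_i - \bar{y})^2}{10-1} = 392178.55,$$
$$\hat{\mu}_{40} = \frac{\sum_{i=1}^{10}(y_i - \bar{y})^4}{10-1} = 6.2244444\times 10^{11}, \qquad \hat{\mu}_{22} = \frac{1}{10-1}\sum_{i=1}^{10}(y_i - \bar{y})^2(x_i - \bar{x})^2 = 9.925244\times 10^{11},$$
$$\hat{\lambda}_{40} = \frac{\hat{\mu}_{40}}{\hat{\mu}_{20}^2} = \frac{6.2244444\times 10^{11}}{392178.55^2} = 4.04699, \qquad \hat{\lambda}_{04} = \frac{\hat{\mu}_{04}}{\hat{\mu}_{02}^2} = \frac{1.6198\times 10^{12}}{507610.07^2} = 6.28638,$$
and
$$\hat{\lambda}_{22} = \frac{\hat{\mu}_{22}}{\hat{\mu}_{20}\hat{\mu}_{02}} = \frac{9.925244\times 10^{11}}{392178.55\times 507610.07} = 4.985711.$$
Thus the ratio estimate of variance, $s_1^2$, of the real estate farm loans is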
Isaki (1983) also considered the difference type estimator of the finite population variance $S_y^2$ given by
$$s_2^2 = s_y^2 + k_0\left(S_x^2 - s_x^2\right) \qquad (3.3.2.1)$$
where $k_0$ is a real constant. The estimator $s_2^2$ in terms of $\varepsilon_2$ and $\varepsilon_3$ can be written as
$$s_2^2 = S_y^2(1 + \varepsilon_2) + k_0\left[S_x^2 - S_x^2(1 + \varepsilon_3)\right] = S_y^2(1 + \varepsilon_2) - k_0S_x^2\varepsilon_3. \qquad (3.3.2.2)$$
Thus we have the following theorems:
$$E(s_2^2) = S_y^2.$$
Hence the theorem.
Example 3.3.1.5. Using the information given in population 1 of the Appendix, find the relative efficiency of the regression type estimator $s_2^2 = s_y^2 + k_0(S_x^2 - s_x^2)$ with respect to the ratio type estimator $s_1^2 = s_y^2(S_x^2/s_x^2)$ while estimating the variance of real estate farm loans using the known variance of nonreal estate farm loans.
Given: $\lambda_{40} = 3.5822$, $\lambda_{04} = 4.5247$ and $\lambda_{22} = 2.8411$.
Solution. The percent relative efficiency of the estimator $s_2^2$ with respect to $s_1^2$ is
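$$s_3^2 = s_y^2\left(\frac{\bar{x}}{\bar{X}}\right)^{\alpha_0}. \qquad (3.3.3.1)$$
The estimator $s_3^2$ in terms of $\varepsilon_1$ and $\varepsilon_2$ can easily be written as
$$s_3^2 = S_y^2(1 + \varepsilon_2)(1 + \varepsilon_1)^{\alpha_0}$$
which, after ignoring the higher order terms, becomes
Theorem 3.3.3.1. The bias in the estimator $s_3^2$ of $S_y^2$ to the first order of approximation is given by
$$B(s_3^2) = \left(\frac{1-f}{n}\right)S_y^2\left[\frac{\alpha_0(\alpha_0 - 1)}{2}C_x^2 + \alpha_0\lambda_{21}C_x\right]. \qquad (3.3.3.2)$$
Hence the theorem.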
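Proof. We have
$$\mathrm{MSE}(s_3^2) = E\left[s_3^2 - S_y^2\right]^2 \approx E\left[S_y^2\left(1 + \varepsilon_2 + \alpha_0\varepsilon_1 + \alpha_0\varepsilon_1\varepsilon_2\right) - S_y^2\right]^2 = S_y^4E\left[\varepsilon_2 + \alpha_0\varepsilon_1\right]^2,$$
which is minimum for
$$\alpha_0 = -\frac{\lambda_{21}}{C_x}. \qquad (3.3.3.5)$$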
Remark 3.3.3.1. If the population follows the bivariate normal distribution, then $\lambda_{21} = 0$, hence the minimum mean square error of the proposed estimator becomes
The expression (3.3.3.6) tells survey statisticians that if the study variable and the auxiliary variable follow a bivariate normal distribution, then the use of the known population mean or total of the auxiliary variable does not improve on the usual estimator of variance.
3.3.4 GENERAL CLASS OF ESTIMATORS
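Srivastava and Jhajj (1980) proposed a general class of ratio type estimators for estimating the finite population variance $S_y^2$ as
(3.3.4.1)
The estimators
$$s_1^2 = s_y^2\left(\frac{\bar{X}}{\bar{x}}\right)\left(\frac{S_x^2}{s_x^2}\right), \qquad s_2^2 = s_y^2\left(\frac{\bar{X}}{\bar{x}}\right)^{\alpha}\left(\frac{S_x^2}{s_x^2}\right)^{\gamma}, \qquad\text{and}\qquad s_3^2 = s_y^2\left[\frac{\bar{X}}{\alpha\bar{x} + (1-\alpha)\bar{X}}\right]\left[\frac{S_x^2}{\gamma s_x^2 + (1-\gamma)S_x^2}\right]$$
are special cases of the class of estimators defined in (3.3.4.1). Now the class of estimators $s_{SJ}^2$ defined in (3.3.4.1) can be written as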
(3.3.4.2)
Theorem 3.3.4.2. The minimum mean squared error of the class of estimators $s_{SJ}^2$, to the first order of approximation, is
Proof. By the definition of mean squared error and by ignoring higher order terms of $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$ we have
Theorem 3.3.4.3. The minimum MSE, to the first order of approximation, of the wider class of estimators defined as
$$s_{SJ(w)}^2 = H\left(s_y^2, u, v\right) \qquad (3.3.4.9)$$
Proof. Expanding $H(s_y^2, u, v)$ around the point $(S_y^2, 1, 1)$ in a first order Taylor's series we have
$$s_{SJ(w)}^2 = H\left(s_y^2, u, v\right) = H\left[S_y^2 + \left(s_y^2 - S_y^2\right),\ 1 + (u-1),\ 1 + (v-1)\right]$$
$$= H\left(S_y^2, 1, 1\right) + \left(s_y^2 - S_y^2\right)\left.\frac{\partial H}{\partial s_y^2}\right|_{(S_y^2,1,1)} + (u-1)\left.\frac{\partial H}{\partial u}\right|_{(S_y^2,1,1)} + (v-1)\left.\frac{\partial H}{\partial v}\right|_{(S_y^2,1,1)} + \cdots.$$
The mean squared error of the wider class of estimators is minimum for the set of normal equations
$$= 1.940787669\times 10^{10}.$$
Thus the general class of estimators remains more efficient than both ratio and regression type estimators. It is noted that the relative efficiency, to the first order of approximation, in each case remains independent of the sample size.
We have seen that the minimum mean square errors of the class of estimators of population mean and variance depend upon the values of the unknown population parameters $H_1$ and $H_2$, etc. Srivastava and Jhajj (1983a) have shown that consistent estimators of the unknown parameters $H_1$ and $H_2$ can be used in place of the actual values. The minimum mean square error of the resultant class of estimators with estimated optimum values remains the same as that with the actual optimum values to the first order of approximation. We are not discussing this procedure in detail, but the interested reader may refer to Srivastava and Jhajj (1983a) and Singh and Singh (1984a). Singh and Zaidi (2000) estimated the square of the population mean and variance.
The next section has been devoted to studying the asymptotic properties of the various estimators of the regression coefficient
$$\beta = \frac{S_{xy}}{S_x^2}.$$
The meaning and purpose of the estimation of the regression coefficient is well known to survey statisticians. Some like to define it as the slope of the linear regression of $Y$ on $X$, or the change in the study variable per unit change in the auxiliary variable. Here we will discuss a few estimators for estimating the regression coefficient in survey sampling.
Srivastava, Jhajj, and Sharma (1986) considered the usual estimator of the regression coefficient $\beta$ as
$$b_1 = \frac{s_{xy}}{s_x^2}.$$
Theorem 3.4.1.1. The bias, to the first order of approximation, in the estimator $b_1$ of $\beta$ is
Theorem 3.4.1.2. The mean squared error of the usual estimator $b_1$ of the regression coefficient $\beta$ is given by
3.4.2 UNBIASED ESTIMATOR
Theorem 3.4.2.1. The variance of the unbiased estimator of the regression coefficient $\beta$ is given by
Example 3.4.2.1. For estimating the regression coefficient of the amount of the real estate farm loans (in $000) on the nonreal estate farm loans, Mr. Nelson used the 53rd and 54th columns of the Pseudo-Random Numbers given in Table 1 of the Appendix to select eight distinct random numbers between 1 and 50 as: 06, 08, 45, 15, 22, 39, 43 and 34. He collected the following sample information:

State                             CO       DE      VT      IA        MI       RI     TX        ND
Nonreal estate farm loans (X)$    906.281  43.229  19.363  3909.738  440.518  0.233  3520.361  1241.369
Real estate farm loans (Y)$       315.809  42.808  57.747  2327.025  323.028  1.611  1248.761  449.099

The population variance $S_x^2 = 1176526$ of nonreal estate farm loans (in $000) for the year 1997 is known from population 1 of the Appendix.
( a ) Estimate the regression coefficient $\beta$ with two different methods.
( b ) Also find an estimate of the mean squared errors in each case and hence derive the 95% confidence intervals.
Solution. From the sample information, we have

i     (y_i-ȳ)(x_i-x̄)   (x_i-x̄)^2      (y_i-ȳ)^2     (x_i-x̄)^4      (y_i-ȳ)^4      (x_i-x̄)^2(y_i-ȳ)^2   (y_i-ȳ)(x_i-x̄)^3
1        99053.71        125213.72      78359.12     15678474394    6140152522     9811637177            12402882820
2       672862.23       1480863.86     305729.37     2.19296E+12    93470449627    4.52744E+11           9.96417E+11
3       667522.49       1539518.88     289432.16     2.37012E+12    83770977628    4.45586E+11           1.02766E+12
4      4587225.90       7020388.11    2997361.60     4.92858E+13    8.98418E+12    2.10426E+13           3.22041E+13
5       223516.52        671774.48      74369.65     4.51281E+11    5530845327     49959635561           1.50153E+11
6       748540.17       1587356.83     352984.51     2.5197E+12     1.24598E+11    5.60312E+11           1.1882E+12
7      1475983.10       5108614.79     426441.65     2.60979E+13    1.81852E+11    2.17853E+12           7.54023E+12
8         2752.01           352.22      21502.41     124058.2636    462353625.9    7573558.476           969310.3289
Sum    8477456.20      17534082.89    4546180.49     8.29335E+13    9.48E+12       2.47396E+13           4.31192E+13

$$n = 8,\quad N = 50,\quad f = 0.16,\quad \bar{y} = \frac{1}{8}\sum_{i=1}^{8}y_i = 595.736,\quad \bar{x} = \frac{1}{8}\sum_{i=1}^{8}x_i = 1260.137,$$
$$\hat{\mu}_{02} = s_x^2 = \frac{1}{8-1}\sum_{i=1}^{8}(x_i - \bar{x})^2 = 2504868.98, \qquad \hat{\mu}_{04} = \frac{1}{8-1}\sum_{i=1}^{8}(x_i - \bar{x})^4 = 1.184764\times 10^{13},$$
The $(1-\alpha)100\%$ confidence interval for the regression coefficient $\beta$ using $b_2$ is
$$b_2 \pm t_{\alpha/2}(df = n-2)\sqrt{\widehat{\mathrm{MSE}}(b_2)}.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
$$1.029 \pm t_{0.05/2}(df = 8-2)\sqrt{0.1568}$$
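The two point estimates in this example can be recomputed from the sample. This is a sketch under two assumptions: the usual estimator is $b_1 = s_{xy}/s_x^2$, and the unbiased estimator is taken to be $b_2 = s_{xy}/S_x^2$ with the known $S_x^2 = 1176526$, a choice that reproduces the value 1.029 used above. The MSE estimate 0.1568 and the $t$ value 2.447 (6 df) are simply plugged in.

```python
import math

# Sample of Example 3.4.2.1: nonreal (x) and real (y) estate farm loans.
x = [906.281, 43.229, 19.363, 3909.738, 440.518, 0.233, 3520.361, 1241.369]
y = [315.809, 42.808, 57.747, 2327.025, 323.028, 1.611, 1248.761, 449.099]
S_x2 = 1176526.0               # known population variance of x

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
s_x2 = sum((a - x_bar) ** 2 for a in x) / (n - 1)

b1 = s_xy / s_x2               # usual estimator (sample variance in denominator)
b2 = s_xy / S_x2               # assumed form of the unbiased estimator

mse_b2 = 0.1568                # estimate quoted in the text
t_crit = 2.447                 # t value with n - 2 = 6 df
half_width = t_crit * math.sqrt(mse_b2)
ci_b2 = (b2 - half_width, b2 + half_width)
print(round(b1, 4), round(b2, 3), [round(v, 3) for v in ci_b2])
```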
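Singh and Singh (1988) introduced a general class of estimators to estimate the regression coefficient as
$$b_{SS} = \frac{s_{xy}}{s_x^2}H(u) \qquad (3.4.3.1)$$
where $H(\cdot)$ is a parametric function such that $H(1) = 1$ satisfying certain regularity conditions as defined earlier. Thus we have the following theorem:
Theorem 3.4.3.1. The minimum mean square error, to the first order of approximation, of the general class of estimators $b_{SS}$ is given by
$$\mathrm{Min.MSE}(b_{SS}) = \mathrm{MSE}(b_1) - \left(\frac{1-f}{n}\right)\beta^2\left(\frac{\lambda_{12}}{\rho_{xy}} - \lambda_{03}\right)^2. \qquad (3.4.3.2)$$
$$b_{SS} = \frac{s_{xy}}{s_x^2}H(u) = \frac{s_{xy}}{s_x^2}H[1 + (u-1)] = \frac{s_{xy}}{s_x^2}\left[H(1) + (u-1)\left.\frac{\partial H}{\partial u}\right|_{u=1} + \frac{(u-1)^2}{2}\left.\frac{\partial^2 H}{\partial u^2}\right|_{u=1} + \cdots\right]$$
$$= \frac{s_{xy}}{s_x^2}\left[1 + (u-1)H_1 + (u-1)^2H_2 + \cdots\right] = \frac{S_{xy}(1 + \varepsilon_4)}{S_x^2(1 + \varepsilon_3)}\left[1 + \varepsilon_1H_1 + \varepsilon_1^2H_2 + \cdots\right].$$
The mean squared error of the estimator $b_{SS}$, to the first order of approximation, is
(3.4.3.3)
On differentiating (3.4.3.3) with respect to $H_1$ and equating to zero, we obtain
$$H_1 = -C_x^{-1}\left[\frac{\lambda_{12}}{\rho_{xy}} - \lambda_{03}\right]. \qquad (3.4.3.4)$$
$$\mathrm{Min.MSE}(b_{SS}) = \mathrm{MSE}(b_1) - \left(\frac{1-f}{n}\right)\beta^2\left[\frac{\lambda_{12}}{\rho_{xy}} - \lambda_{03}\right]^2. \qquad (3.4.3.5)$$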
Sampath (1989) has shown that the resultant regression type estimators of the population mean have the same minimum mean squared error, to the first order of approximation, as that of the usual regression estimator. In other words, the construction of an improved estimator of the regression coefficient $\beta$ is not helpful in improving the estimator of the population mean $\bar{Y}$.
Example 3.4.3.1. The real and nonreal estate farm loans (in $000) during 1997 in the 50 states of the United States have been given in population 1 of the Appendix. If we select an SRSWOR sample of eight states, find the relative efficiency of the usual estimator $b_1$ with respect to the unbiased estimator $b_2$ of the regression coefficient for this population.
Solution. Using the results from the description of population 1 we have
(3.5.1)
Wakimoto (1971), Gupta, Singh, and Lal (1978, 1979), Rana (1989) and Biradar and Singh (1992a) studied the behaviour of the estimator $r_{xy}$ under SRSWOR sampling. Singh, Mangat, and Gupta (1996) have shown that the class of estimators, to estimate the population correlation coefficient, proposed by Srivastava and Jhajj (1986) as
(3.5.2)
where $u = \bar{x}/\bar{X}$, $v = s_x^2/S_x^2$ and $H(\cdot,\cdot)$ is a parametric function such that $H(1,1) = 1$ satisfying certain regularity conditions, can take an inadmissible value, i.e., a value outside the range $[-1.0, +1.0]$, from a given sample. In other words, the ratio type or regression type estimators cannot be made to estimate the correlation coefficient.
The estimator $r_{xy}$ can easily be written in terms of $\varepsilon_2$, $\varepsilon_3$ and $\varepsilon_4$ as follows:
$$r_{xy} = \rho_{xy}\left[1 + \varepsilon_4 - \frac{1}{2}\varepsilon_2 - \frac{1}{2}\varepsilon_3 + \frac{3}{8}\varepsilon_2^2 + \frac{3}{8}\varepsilon_3^2 + \frac{1}{4}\varepsilon_2\varepsilon_3 - \frac{1}{2}\varepsilon_2\varepsilon_4 - \frac{1}{2}\varepsilon_3\varepsilon_4 + \cdots\right]. \qquad (3.5.3)$$
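Thus we have the following theorems:
Theorem 3.5.1. The bias, up to terms of order $O(n^{-1})$, in the estimator $r_{xy}$ of $\rho_{xy}$ is
$$B(r_{xy}) = \left(\frac{1-f}{n}\right)\rho_{xy}\left[\frac{3}{8}\left(\lambda_{40} + \lambda_{04} - 2\right) - \frac{1}{2}\left(\frac{\lambda_{31}}{\rho_{xy}} + \frac{\lambda_{13}}{\rho_{xy}} - 2\right) + \frac{1}{4}\left(\lambda_{22} - 1\right)\right]. \qquad (3.5.4)$$
Proof. It follows by taking expected values on both sides of (3.5.3). Hence the theorem.
Theorem 3.5.2. The mean squared error, up to terms of order $O(n^{-1})$, of the usual estimator $r_{xy}$ is
$$\mathrm{MSE}(r_{xy}) = \left(\frac{1-f}{n}\right)\rho_{xy}^2\left[\left(\frac{\lambda_{22}}{\rho_{xy}^2} - 1\right) + \frac{1}{2}\left(\lambda_{22} - 1\right) + \frac{1}{4}\left(\lambda_{40} + \lambda_{04} - 2\right) - \left(\frac{\lambda_{31}}{\rho_{xy}} + \frac{\lambda_{13}}{\rho_{xy}} - 2\right)\right]. \qquad (3.5.5)$$
$$\mathrm{MSE}(r_{xy}) = E\left[r_{xy} - \rho_{xy}\right]^2 \approx \rho_{xy}^2E\left[\varepsilon_4^2 + \frac{1}{4}\varepsilon_2^2 + \frac{1}{4}\varepsilon_3^2 - \varepsilon_2\varepsilon_4 - \varepsilon_3\varepsilon_4 + \frac{1}{2}\varepsilon_2\varepsilon_3\right].$$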
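Example 3.5.1. The real and nonreal estate farm loans (in $000) during 1997 in the 50 states of the United States have been presented in population 1 of the Appendix. Suppose we select an SRSWOR sample of eight states to collect the required information. Study the relative bias of the usual estimator of the correlation coefficient.
Solution. Using the results from the description of population 1 given in the Appendix, we have
$$B(r_{xy}) = \left(\frac{1-f}{n}\right)\rho_{xy}\left[\frac{3}{8}\left(\lambda_{40} + \lambda_{04} - 2\right) - \frac{1}{2}\left(\frac{\lambda_{31}}{\rho_{xy}} + \frac{\lambda_{13}}{\rho_{xy}} - 2\right) + \frac{1}{4}\left(\lambda_{22} - 1\right)\right]$$
$$= \cdots + \frac{1}{4}(2.8411 - 1)\Big] = -0.00817.$$
Also
$$\mathrm{MSE}(r_{xy}) = \left(\frac{1-f}{n}\right)\rho_{xy}^2\left[\left(\frac{\lambda_{22}}{\rho_{xy}^2} - 1\right) + \frac{1}{2}\left(\lambda_{22} - 1\right) + \frac{1}{4}\left(\lambda_{40} + \lambda_{04} - 2\right) - \left(\frac{\lambda_{31}}{\rho_{xy}} + \frac{\lambda_{13}}{\rho_{xy}} - 2\right)\right]$$
$$= \left(\frac{1-0.16}{8}\right)(0.8038)^2\left[\left(\frac{2.8411}{(0.8038)^2} - 1\right) + \frac{(2.8411 - 1)}{2} + \frac{(4.5247 + 3.5822 - 2)}{4} - \cdots\right] = 0.01018.$$
Thus the relative bias is
$$\frac{|B(r_{xy})|}{\sqrt{\mathrm{MSE}(r_{xy})}} = \frac{|-0.00817|}{\sqrt{0.01018}} = 0.081.$$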
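The expressions (3.5.4) and (3.5.5) used in this example can be coded directly. The sketch below is checked on the bivariate normal case, where the standard moment values $\lambda_{40} = \lambda_{04} = 3$, $\lambda_{22} = 1 + 2\rho^2$ and $\lambda_{31} = \lambda_{13} = 3\rho$ reduce the formulas to the familiar $-\theta\rho(1-\rho^2)/2$ and $\theta(1-\rho^2)^2$ with $\theta = (1-f)/n$. The function name `corr_bias_mse` and the value $\rho = 0.6$ are illustrative only.

```python
def corr_bias_mse(n, N, rho, lam40, lam04, lam22, lam31, lam13):
    """First-order bias (3.5.4) and MSE (3.5.5) of the sample correlation r_xy."""
    theta = (1 - n / N) / n
    cross = lam31 / rho + lam13 / rho - 2
    bias = theta * rho * (3 / 8 * (lam40 + lam04 - 2)
                          - cross / 2
                          + (lam22 - 1) / 4)
    mse = theta * rho ** 2 * ((lam22 / rho ** 2 - 1)
                              + (lam22 - 1) / 2
                              + (lam40 + lam04 - 2) / 4
                              - cross)
    return bias, mse

# Illustrative bivariate normal moments for a hypothetical rho = 0.6.
rho, n, N = 0.6, 8, 50
bias, mse = corr_bias_mse(n, N, rho, 3.0, 3.0, 1 + 2 * rho ** 2, 3 * rho, 3 * rho)
theta = (1 - n / N) / n
print(bias, mse)
```

For population 1 the same function could be called with the tabulated $\lambda$ values; $\lambda_{31}$ and $\lambda_{13}$ are not reproduced in this section, so they are not filled in here.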
Real estate farm loans (Y)$   7.590   639.571   1579.686   825.748   1248.761   2327.025   553.266   6.044
( a ) Estimate the correlation coefficient $\rho_{xy}$ between the real estate farm loans and the nonreal estate farm loans.
( b ) Also find an estimate of the mean squared error and hence deduce the 80% confidence interval for the correlation coefficient.
$$\hat{\mu}_{02} = s_x^2 = \frac{1}{8-1}\sum_{i=1}^{8}(x_i - \bar{x})^2 = 2457239.19,$$
and
$$\hat{\mu}_{11} = \frac{1}{8-1}\sum_{i=1}^{8}(y_i - \bar{y})(x_i - \bar{x}) = 1072446.59.$$
$$\hat{\mu}_{31} = 1.376874\times 10^{12}.$$
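$$\widehat{\mathrm{MSE}}(r_{xy}) = \left(\frac{1-f}{n}\right)r_{xy}^2\left[\left(\frac{\hat{\lambda}_{22}}{r_{xy}^2} - 1\right) + \frac{1}{2}\left(\hat{\lambda}_{22} - 1\right) + \frac{1}{4}\left(\hat{\lambda}_{40} + \hat{\lambda}_{04} - 2\right) - \left(\frac{\hat{\lambda}_{31}}{r_{xy}} + \frac{\hat{\lambda}_{13}}{r_{xy}} - 2\right)\right]$$
$$= \left(\frac{1-0.16}{8}\right)(0.8620)^2\left[\left(\frac{1.59536}{0.8620^2} - 1\right) + \frac{1}{2}(1.59536 - 1) + \cdots\right]$$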
We shall first introduce the concept of a superpopulation model and its role in survey sampling. Estimation of the population mean from a sample is equivalent to predicting the mean of the non-sampled values of the study variable. Thus, in model-based sampling theory, the problem of estimating finite population parameters is expressed as a prediction problem. Model-based strategies have proved useful in bridging the gap between finite population problems and the rest of statistics. We discuss here two types of superpopulation models, for the regression and ratio estimators of the population mean.
Under a superpopulation model, a relation between the study variable $Y_i$ and the auxiliary variable $X_i$ is:
$$Y_i = \alpha + \beta X_i + \varepsilon_i \qquad (3.6.1.1)$$
where $\varepsilon_i$ is a random variable such that $E(\varepsilon_i) = 0$, $E(\varepsilon_i^2) = \sigma^2$ and $E(\varepsilon_i\varepsilon_j) = 0$ for $i \ne j$. In short, under the superpopulation model, the population itself becomes a large random sample. Royall (1970a, 1970b, 1970c) was the first to show that the linear regression estimator of the population mean defined as
$$\bar{y}_{LR} = \bar{y} + b\left(\bar{X} - \bar{x}\right) \qquad (3.6.1.2)$$
is the best linear unbiased predictor of the population mean $\bar{Y}$. Following Brewer (1963a), Royall (1970a, 1970b, 1970c), and Scott and Smith (1969), by imposing a superpopulation model on the actual finite population, inference about the characteristics of the finite population can be made via the structure of the model. In this section we shall only introduce the idea of a superpopulation model under the SRSWOR design. Thus we have the following theorems:
Theorem 3.6.1.1. Under the superpopulation model (3.6.1.1) the leading term of the variance of the linear regression estimator $\bar{y}_{LR}$ is given by
Proof. We have
$$Y_i = \alpha + \beta X_i + \varepsilon_i \quad\text{for } i = 1, 2, \ldots, N. \qquad (3.6.1.5)$$
Taking the sum on both sides of (3.6.1.5) and dividing by $N$ we obtain
$$\frac{1}{N}\sum_{i=1}^{N}Y_i = \alpha + \beta\frac{1}{N}\sum_{i=1}^{N}X_i$$
which in fact implies that
$$\alpha = \bar{Y} - \beta\bar{X}. \qquad (3.6.1.6)$$
On substituting the value of $\alpha$ in (3.6.1.5) and solving for $\varepsilon_i$, we obtain (3.6.1.4). Following the same steps as for the MSE in Theorem 3.2.3.2 we can easily see that the variance, up to terms of $O(n^{-1})$, of the usual linear regression estimator is
$$V(\bar{y}_{LR}) = \left(\frac{1-f}{n}\right)\frac{1}{(N-1)}\left[\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 + \beta^2\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2 - 2\beta\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)\left(X_i - \bar{X}\right)\right]$$
Remember that $E(e_i) = 0$ but $\bar{e} \ne 0$. In other words, in taking a very large number of samples we expect $e_i$ to have a mean value of zero by assumption, but in any particular sample $\bar{e}$ is not necessarily zero.
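Similarly we have
(3.6.1.12)
(3.6.1.13)
Now we have
$$E\left[\left(\hat{\beta} - \beta\right)^2\sum_{i=1}^{n}x_i^{*2}\right] = \sum_{i=1}^{n}x_i^{*2}E\left(\hat{\beta} - \beta\right)^2 = \sum_{i=1}^{n}x_i^{*2}\,\frac{\sigma^2}{\sum_{i=1}^{n}x_i^{*2}} = \sigma^2. \qquad (3.6.1.15)$$
Now $\hat{\beta} - \beta = \sum_{i=1}^{n}w_ie_i$, where $w_i = x_i^*\Big/\sum_{i=1}^{n}x_i^{*2}$, and using $\sum_{i=1}^{n}x_i^* = 0$, therefore we have
$$E\sum_{i=1}^{n}\left(e_i - \bar{e}\right)\left(\hat{\beta} - \beta\right)x_i^* = \frac{\sum_{i=1}^{n}x_i^{*2}E\left(e_i^2\right) + 2\sum_{i\ne j=1}^{n}x_i^*x_j^*E\left(e_ie_j\right)}{\sum_{i=1}^{n}x_i^{*2}} = \frac{\sigma^2\sum_{i=1}^{n}x_i^{*2} + 2\sum_{i\ne j=1}^{n}x_i^*x_j^*\times 0}{\sum_{i=1}^{n}x_i^{*2}} = \sigma^2. \qquad (3.6.1.16)$$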
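Using these expected values in (3.6.1.13) we have
$$E\left(\sum_{i=1}^{n}\hat{e}_i^2\right) = E\sum_{i=1}^{n}\left(e_i - \bar{e}\right)^2 + E\left\{\left(\hat{\beta} - \beta\right)^2\sum_{i=1}^{n}x_i^{*2}\right\} - 2E\sum_{i=1}^{n}\left(e_i - \bar{e}\right)\left(\hat{\beta} - \beta\right)x_i^*$$
$$= (n-1)\sigma^2 + \sigma^2 - 2\sigma^2 = (n-2)\sigma^2$$
which implies that
$$\sigma^2 = \frac{1}{n-2}E\left(\sum_{i=1}^{n}\hat{e}_i^2\right). \qquad (3.6.1.17)$$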
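The result (3.6.1.17) can be illustrated by simulation. The sketch below repeatedly generates data from the model $Y_i = \alpha + \beta X_i + \varepsilon_i$ with normal errors, fits the least squares line, and averages the residual sum of squares; dividing by $n - 2$ should give a value close to $\sigma^2$. All numerical settings (seed, $n$, $\alpha$, $\beta$, $\sigma$, number of replications) are hypothetical choices made only for this illustration.

```python
import random

def residual_sum_of_squares(x, y):
    """Fit y = a + b*x by least squares and return the sum of squared residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    b = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y)) / s_xx
    a0 = y_bar - b * x_bar
    return sum((c - a0 - b * a) ** 2 for a, c in zip(x, y))

random.seed(12345)                       # fixed seed so the run is repeatable
n, alpha, beta, sigma = 10, 5.0, 2.0, 2.0
x = [float(i) for i in range(1, n + 1)]  # fixed design points
reps = 20000

total = 0.0
for _ in range(reps):
    y = [alpha + beta * a + random.gauss(0.0, sigma) for a in x]
    total += residual_sum_of_squares(x, y)

sigma2_hat = total / reps / (n - 2)      # Monte Carlo version of (3.6.1.17)
print(round(sigma2_hat, 3))
```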
$$\hat{V}(\bar{y}_{LR}) = \left(\frac{1-f}{n}\right)\frac{1}{(n-2)}\sum_{i=1}^{n}\hat{e}_i^2. \qquad (3.6.1.21)$$
Hence the theorem.
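$$\frac{1}{(n-2)}\sum_{i=1}^{n}e_i^2 = \frac{1}{n\left(1 - \frac{2}{n}\right)}\sum_{i=1}^{n}e_i^2 = \frac{1}{n}\sum_{i=1}^{n}e_i^2\left[1 - \frac{2}{n}\right]^{-1} \approx \frac{1}{n}\sum_{i=1}^{n}z_i\left[1 + \frac{2}{n} + \frac{4}{n^2} + \cdots\right] \qquad (3.6.2.2)$$
where $z_i = e_i^2$, $i = 1, 2, \ldots, n$ and
$$\left(\frac{\bar{x}}{\bar{X}}\right)^g = \left[1 + \left(\frac{\bar{x}}{\bar{X}} - 1\right)\right]^g \approx \left[1 + g\left(\frac{\bar{x}}{\bar{X}} - 1\right) + \frac{g(g-1)}{2}\left(\frac{\bar{x}}{\bar{X}} - 1\right)^2 + \cdots\right]. \qquad (3.6.2.3)$$
Defining
$$\varepsilon_1 = \frac{\bar{x}}{\bar{X}} - 1 \quad\text{and}\quad \delta = \frac{\bar{z}}{\bar{Z}} - 1$$
where $\bar{Z} = \dfrac{1}{N}\sum_{i=1}^{N}Z_i$ for $Z_i = \varepsilon_i^2$, we have
$$= \left(\frac{1-f}{n}\right)^3\bar{Z}^2\left[C_z^2 + g^2C_x^2 + 2g\rho_{xz}C_xC_z\right]. \qquad (3.6.2.6)$$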
Example 3.6.2.1. Apply the regression method of estimation for estimating the average amount of the real estate farm loans (in $000) during 1997. Also find an estimator of the variance of the regression estimator assuming that the relationship between $Y$ and $X$ is given by $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$. An SRSWOR sample of eight states selected from population 1 of the Appendix is given below.
The average amount $878.16 of nonreal estate farm loans (in $000) for the year 1997 is known.
( a ) Derive the 95% confidence interval using the unbiased estimator of variance under the linear model.
( b ) Derive the 95% confidence interval using the estimator proposed by Deng and Wu (1987).
( c ) Compare both confidence interval estimates in ( a ) and ( b ), and comment.
Given: $Z_i = \varepsilon_i^2$, $C_z^2 = 1.5097$, $C_x^2 = 1.5256$ and $\rho_{xz} = 0.4762$.
( a ) Usual estimator of variance: The usual estimate of $V(\bar{y}_{LR})$ under the superpopulation model is given by
Using Table 2 from the Appendix the 95% confidence interval is given by
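( b ) Deng and Wu's estimator: The Deng and Wu (1987) estimate of $V(\bar{y}_{LR})$ under the superpopulation model is given by
$$\hat{v}(\bar{y}_{LR})_{DW} = \left(\frac{1-f}{n}\right)\frac{1}{(n-2)}\sum_{i=1}^{n}e_i^2\left(\frac{\bar{x}}{\bar{X}}\right)^g$$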
( c ) Interpretation: Note that in this case the length of the confidence interval based on the usual estimator is greater than that based on the Deng and Wu (1987) estimator, which suggests that their estimator performs better than the usual estimator of variance in this situation.
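The role of the constant $g$ can be explored numerically with the values given in this example. The sketch below takes the bracket of (3.6.2.6) exactly as displayed, $C_z^2 + g^2C_x^2 + 2g\rho_{xz}C_xC_z$, and minimises it over $g$. The sign of the minimising $g$ depends on whether the multiplier is written as $(\bar{x}/\bar{X})^g$ or $(\bar{X}/\bar{x})^g$, which is an assumption here; the minimum value $C_z^2(1 - \rho_{xz}^2)$ does not depend on that convention.

```python
import math

cz2, cx2, rho_xz = 1.5097, 1.5256, 0.4762   # values given in Example 3.6.2.1
cz, cx = math.sqrt(cz2), math.sqrt(cx2)

def bracket(g):
    """Bracket of (3.6.2.6) as displayed: Cz^2 + g^2 Cx^2 + 2 g rho_xz Cx Cz."""
    return cz2 + g * g * cx2 + 2 * g * rho_xz * cx * cz

g_opt = -rho_xz * cz / cx          # minimiser of the quadratic in g
gain = bracket(0.0) / bracket(g_opt)   # g = 0 corresponds to no adjustment
print(round(g_opt, 4), round(gain, 4))
```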
Deng and Wu (1987) have shown that the estimator at (3.6.2.1) remains better than the estimators proposed by Royall and Eberhardt (1975), Royall and Cumberland (1978, 1981a, 1981b, 1985) and Rao (1968a, 1969). It is to be noted that Deng and Wu (1987) have taken $(n-2)$ in the denominator of the usual estimator of variance of the regression estimator, which comes only from the Gauss-Markov Theorem discussed above. Following Deville and Särndal (1992) we have the following theorem.
Theorem 3.6.2.2. An estimator for estimating the approximate variance of the usual linear regression estimator is given by
$$\hat{V}(\bar{y}_{LR})_{DS} = \frac{(1-f)}{n(n-1)}\sum_{i=1}^{n}e_i^2 \qquad (3.6.2.8)$$
where $e_i = y_i - \hat{\beta}x_i$ with $\hat{\beta} = \sum_{i=1}^{n}x_iy_i\Big/\sum_{i=1}^{n}x_i^2$. This technique is called model assisted,
Method II. Again from the model (3.6.3.1), on setting $E(e_i) = 0$ we obtain the sum of squares due to errors (SSE) as
$$\mathrm{SSE} = \sum_{i=1}^{N}\left(Y_i - RX_i\right)^2. \qquad (3.6.3.7)$$
On setting $\dfrac{\partial\,\mathrm{SSE}}{\partial R} = 0$ we obtain $\sum_{i=1}^{N}\left(Y_i - RX_i\right)X_i = 0$, and it gives us
$$R = \frac{\sum_{i=1}^{N}Y_iX_i}{\sum_{i=1}^{N}X_i^2}. \qquad (3.6.3.8)$$
Note that we are not interested in this ratio, but the sample analogue of (3.6.3.8) will give us
$$r = \frac{\sum_{i=1}^{n}y_ix_i}{\sum_{i=1}^{n}x_i^2}. \qquad (3.6.3.9)$$
(3.6.3.10)
Note that $\sum_{i=1}^{n}y_ix_i = n\bar{x}\bar{y} + (n-1)s_{xy}$ and $\sum_{i=1}^{n}x_i^2 = n\bar{x}^2 + (n-1)s_x^2$, then (3.6.3.10) becomes
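$$\bar{y}_R = \frac{\bar{y}}{\bar{x}}\,\bar{X}\left[\frac{1 + \dfrac{(n-1)}{n}\dfrac{s_{xy}}{\bar{x}\bar{y}}}{1 + \dfrac{(n-1)}{n}\dfrac{s_x^2}{\bar{x}^2}}\right] \qquad (3.6.3.11)$$
which is called the Beale (1962) ratio estimator of the population mean $\bar{Y}$.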
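The algebraic step from (3.6.3.9) to (3.6.3.11) can be checked numerically. The sketch below evaluates $\bar{X}\sum y_ix_i/\sum x_i^2$ directly and through the form (3.6.3.11); the two should agree up to rounding error. The small data set and the value of `X_bar` are hypothetical and used only for this check.

```python
def direct_form(x, y, X_bar):
    """X_bar times the sample ratio r = sum(y*x) / sum(x^2) from (3.6.3.9)."""
    return X_bar * sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def beale_form(x, y, X_bar):
    """The same quantity written as in (3.6.3.11)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x2 = sum((a - x_bar) ** 2 for a in x) / (n - 1)
    c = (n - 1) / n
    numerator = 1 + c * s_xy / (x_bar * y_bar)
    denominator = 1 + c * s_x2 / x_bar ** 2
    return y_bar / x_bar * X_bar * numerator / denominator

# Hypothetical illustration data.
x = [2.0, 3.0, 5.0, 7.0]
y = [1.0, 4.0, 6.0, 9.0]
X_bar = 4.5
print(direct_form(x, y, X_bar), beale_form(x, y, X_bar))
```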
Quenouille's (1951) method of bias reduction, popularly known as the Jackknife procedure, has been successfully applied for estimating the variance of estimators. We shall discuss the idea of the Jackknife variance estimator for the ratio and regression estimators of the population mean, although it can be used to estimate the variance of any linear or non-linear estimator $\hat{\theta}$ of a parameter $\theta$.
$$\bar{y}_{Rg} = \frac{1}{g}\sum_{j=1}^{g}\bar{y}_R(j) = \frac{1}{g}\sum_{j=1}^{g}\bar{y}(j)\left(\frac{\bar{X}}{\bar{x}(j)}\right). \qquad (3.7.1.2)$$
$$\hat{v}_u = \frac{g-1}{g}\sum_{j=1}^{g}\left[\bar{y}_R(j) - \bar{y}_{Rg}\right]^2 \qquad (3.7.1.3)$$
Also from the full sample information we have the usual ratio estimator of the population mean defined as
$$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)$$
where $\bar{y} = n^{-1}\sum_{i=1}^{n}y_i$ and $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$ are the sample means based on the full sample information.
A modified Jackknife estimator of variance is given by
$$\hat{v}_m = \frac{g-1}{g}\sum_{j=1}^{g}\left[\bar{y}_R(j) - \bar{y}_R\right]^2. \qquad (3.7.1.4)$$
For $g = n$ it reduces to the situation of dropping one unit at a time while making groups.
Example 3.7.1.1. People Bank took an SRSWOR sample of eight states from population 1 given in the Appendix and collected the following information:

State                             AR       NY       WA        NC       CA        MI       PA       SD
Nonreal estate farm loans (X)$    848.317  426.274  1228.607  494.730  3928.732  440.518  298.351  1692.817
Real estate farm loans (Y)$       907.700  201.631  1100.745  639.571  1343.461  323.028  756.169  413.777

( a ) Apply the ratio method of estimation for estimating the average amount of the real estate farm loans (in $000) during 1997.
( b ) Estimate the variance of the ratio estimator using the usual Jackknife estimator of variance and derive the 95% confidence interval estimate.
( c ) Estimate the variance of the ratio estimator using the modified Jackknife estimator of variance and derive the 95% confidence interval estimate.
( d ) Which estimate of variance gives the smaller confidence interval estimate?
Given: The known average amount $878.16 of nonreal estate farm loans.
$$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right) = 710.7603\left(\frac{878.16}{1169.793}\right) = 533.5667.$$

Jackknife mechanism
x_j        y_j        x̄(j)       ȳ(j)       ȳ_R(j)     (ȳ_R(j) - ȳ_Rn)^2   (ȳ_R(j) - ȳ_R)^2
 848.317    907.700   1215.718   682.6260   493.0869     2440.14489          1638.50906
 426.274    201.631   1276.010   783.4930   539.2059       10.75108            31.81452
1228.607   1100.745   1161.391   655.0481   495.3000     2226.40154          1464.24376
 494.730    639.571   1266.231   720.9301   499.9815     1806.52439          1127.87878
3928.732   1343.461    775.659   620.3744   702.3549    25558.47260         28489.89390
 440.518    323.028   1273.975   766.1506   528.1128      206.55132            29.73054
 298.351    756.169   1294.285   704.2733   477.8427     4178.59394          3105.02188
1692.817    413.777   1095.076   753.1864   603.9932     3783.29055          4960.07229
Sum                                         4339.8780   40210.73030         40847.16470

where
$$\bar{x}(j) = \frac{n\bar{x} - x_j}{n-1}, \qquad \bar{y}(j) = \frac{n\bar{y} - y_j}{n-1}, \qquad\text{and}\qquad \bar{y}_R(j) = \frac{\bar{y}(j)}{\bar{x}(j)}\bar{X}.$$
$$\hat{v}_u = \frac{n-1}{n}\sum_{j=1}^{n}\left[\bar{y}_R(j) - \bar{y}_{Rn}\right]^2 = \frac{8-1}{8}\times 40210.7303 = 35184.38.$$
$$\hat{v}_m = \frac{n-1}{n}\sum_{j=1}^{n}\left[\bar{y}_R(j) - \bar{y}_R\right]^2 = \frac{8-1}{8}\times 40847.1647 = 35741.269.$$
( d ) In this particular example the usual Jackknife estimator of the variance of the ratio estimate provides the smaller confidence interval estimate at the same level of confidence.
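The Jackknife table of this example can be reproduced with a short script. The sketch below drops one state at a time, recomputes the ratio estimate, and forms the usual and the modified Jackknife variance estimators (3.7.1.3) and (3.7.1.4) with $g = n$; the variable names are illustrative.

```python
# Sample of Example 3.7.1.1 (AR, NY, WA, NC, CA, MI, PA, SD).
x = [848.317, 426.274, 1228.607, 494.730, 3928.732, 440.518, 298.351, 1692.817]
y = [907.700, 201.631, 1100.745, 639.571, 1343.461, 323.028, 756.169, 413.777]
X_bar = 878.16                      # known mean of nonreal estate farm loans

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
y_R = y_bar * X_bar / x_bar         # full-sample ratio estimate

# Leave-one-out ratio estimates y_R(j).
loo = []
for xj, yj in zip(x, y):
    x_j = (n * x_bar - xj) / (n - 1)
    y_j = (n * y_bar - yj) / (n - 1)
    loo.append(y_j * X_bar / x_j)

y_Rn = sum(loo) / n                 # mean of the leave-one-out estimates
v_u = (n - 1) / n * sum((e - y_Rn) ** 2 for e in loo)   # usual Jackknife
v_m = (n - 1) / n * sum((e - y_R) ** 2 for e in loo)    # modified Jackknife
print(round(y_R, 4), round(v_u, 2), round(v_m, 2))
```

Because the mean of the leave-one-out estimates minimises the sum of squared deviations, `v_u` can never exceed `v_m`, which is consistent with the conclusion in part ( d ).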
3.7.2 REGRESSION ESTIMATOR
Assume that $(y_i, x_i)$, $i = 1, 2, \ldots, n$ denotes a simple random sample of size $n$ from a bivariate infinite population with means $(\bar{Y}, \bar{X})$. Let $\bar{y} = n^{-1}\sum_{i=1}^{n}y_i$ and $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$ be the sample means based on $n$ observations. Let $b = s_{xy}/s_x^2$ be an estimator of the regression coefficient. Then the linear regression estimator of the population mean $\bar{Y}$ is
$$\bar{y}_{lr} = \bar{y} + b\left(\bar{X} - \bar{x}\right). \qquad (3.7.2.1)$$
Divide the sampled data into $n$ sub-samples each having $(n-1)$ observations.
Let
$$\bar{y}(j) = (n-1)^{-1}\sum_{i\ne j}y_i = \frac{n\bar{y} - y_j}{(n-1)} \qquad\text{and}\qquad \bar{x}(j) = (n-1)^{-1}\sum_{i\ne j}x_i = \frac{n\bar{x} - x_j}{(n-1)}$$
be the sample means after dropping the $j$-th unit from the sample. From the sub-sample obtained by dropping one unit a regression estimator to estimate the population mean $\bar{Y}$ is given by
(3.7.2.2)
$$\hat{v}_u(\bar{y}_{lr}) = \frac{n-1}{n}\sum_{j=1}^{n}\left[\bar{y}_{lr}(j) - \bar{y}_{lrn}\right]^2 \qquad (3.7.2.5)$$
For details see Miller (1974), Rao (1969, 1974, 1979), Rao and Rao (1971) and Krewski and Chakrabarty (1981).
Example 3.7.2.1. A key bank took an SRSWOR sample of eight states from population 1 given in the Appendix and collected the following information:

State                            CA        FL       MO        IN        NJ      MA      OK        ME
Nonreal estate farm loan (X)$    3928.732  464.516  1519.994  1022.782  27.508  56.471  1716.087  51.539
Real estate farm loan (Y)$       1343.461  825.748  1579.686  1213.024  39.860   7.590   612.108   8.849

State   x_i        y_i        (x_i - x̄)^2    (y_i - ȳ)^2    (y_i - ȳ)(x_i - x̄)
CA      3928.732   1343.461    8010476.0      409178.0       1810444.900
FL       464.516    825.748     401876.9       14873.6        -77313.289
MO      1519.994   1579.686     177696.3      767192.5        369225.210
IN      1022.782   1213.024       5726.2      259318.5        -38534.508
NJ        27.508     39.860    1146925.0      440804.0        711033.730
MA        56.471      7.590    1085728.0      484695.5        725429.090
OK      1716.087    612.108     381471.0        8405.7        -56626.326
ME        51.539      8.849    1096030.0      482944.0        727544.680
Sum     8787.629   5630.326   12305929.0     2867412.0       4171203.500

( a ) The regression estimate of the average amount of the real estate farm loans is
$$\bar{y}_{lr} = \bar{y} + b\left(\bar{X} - \bar{x}\right) = 703.7908 + 0.3389\times(878.16 - 1098.454) = 629.133.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
$$629.133 \pm 2.447\sqrt{36386.80}, \quad\text{or}\quad [162.36,\ 1095.90].$$
( d ) In this particular example, the usual Jackknife estimator of the variance of the regression estimate provides the smaller confidence interval estimate at the same level of confidence.
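The regression estimate and a drop-one Jackknife variance estimate of the form (3.7.2.5) can be sketched for this sample as follows. The code recomputes $b = s_{xy}/s_x^2$ in every sub-sample, which is an assumption about how the lost equation (3.7.2.2) is defined; the function name `regression_estimate` is illustrative. No expected value is quoted for the variance, since it is not clear from the surviving text which estimator produced the figure 36386.80.

```python
# Sample of Example 3.7.2.1 (CA, FL, MO, IN, NJ, MA, OK, ME).
x = [3928.732, 464.516, 1519.994, 1022.782, 27.508, 56.471, 1716.087, 51.539]
y = [1343.461, 825.748, 1579.686, 1213.024, 39.860, 7.590, 612.108, 8.849]
X_bar = 878.16                      # known mean of nonreal estate farm loans

def regression_estimate(xs, ys, X_bar):
    """Linear regression estimate y_bar + b * (X_bar - x_bar) with b = s_xy / s_x^2."""
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    s_xy = sum((a - x_bar) * (c - y_bar) for a, c in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    return y_bar + (s_xy / s_xx) * (X_bar - x_bar)

n = len(x)
y_lr = regression_estimate(x, y, X_bar)

# Drop one unit at a time and recompute the estimate.
loo = [regression_estimate(x[:j] + x[j + 1:], y[:j] + y[j + 1:], X_bar)
       for j in range(n)]
y_lrn = sum(loo) / n
v_u = (n - 1) / n * sum((e - y_lrn) ** 2 for e in loo)
print(round(y_lr, 2), round(v_u, 2))
```

The full-sample estimate differs from 629.133 only in the second decimal place because the text rounds $b$ to 0.3389.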
While using the known population mean of a single auxiliary variable at the estimation stage for the construction of ratio, product, and regression type estimators, a natural question arises of how to utilise the information if it is available on more than one auxiliary variable. The next section studies such situations.
Let us first consider the case where the information on two auxiliary variables is available. Suppose $Y_i$, $X_{1i}$ and $X_{2i}$ are respectively the values of the $i$-th unit of the study variable $Y$ and auxiliary variables $X_1$ and $X_2$ from a finite population $\Omega$. Let $\bar{Y}$, $\bar{X}_1$ and $\bar{X}_2$ be the population means of the study variable $Y$ and auxiliary variables $X_1$ and $X_2$. Let the auxiliary variables $X_1$ and $X_2$ be correlated with $Y$ with correlation coefficients $\rho_{yx_1}$ and $\rho_{yx_2}$, respectively. A simple random sample of size $n$ is drawn by SRSWOR sampling from the population $\Omega$ and let $\bar{y}$, $\bar{x}_1$ and $\bar{x}_2$ denote the corresponding sample means.
Now define
$$\varepsilon_0 = \frac{\bar{y}}{\bar{Y}} - 1, \qquad \eta_1 = \frac{\bar{x}_1}{\bar{X}_1} - 1 \qquad\text{and}\qquad \eta_2 = \frac{\bar{x}_2}{\bar{X}_2} - 1$$
such that
and
$$C_{x_1} = \frac{S_{x_1}}{\bar{X}_1}, \qquad C_{x_2} = \frac{S_{x_2}}{\bar{X}_2}, \qquad S_y^2 = (N-1)^{-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2, \qquad S_{x_1}^2 = (N-1)^{-1}\sum_{i=1}^{N}\left(X_{1i} - \bar{X}_1\right)^2,$$
We have redefined these terms only to avoid any kind of confusion in learning the estimation strategies with more than one auxiliary variable. Note that if we have $p$ auxiliary variables, say $X_1, X_2, \ldots, X_p$, and their population means $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p$ are known, then the above results can easily be extended.
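Olkin (1958) proposed a weighted ratio type estimator of the population mean $\bar{Y}$ as
(3.8.1.1)
where $\bar{y}_{Rj} = \bar{y}\left(\bar{X}_j/\bar{x}_j\right)$, for $j = 1, 2$, are the two usual ratio estimators of the population mean. Then the estimator $\bar{y}_{Ro}$ in terms of $\varepsilon_0$, $\eta_1$ and $\eta_2$ can easily be expressed as:
$$\bar{y}_{Ro} = \bar{Y}\left(1 + \varepsilon_0 - \eta_2 + \eta_2^2 - \varepsilon_0\eta_2 + \cdots\right) + w\bar{Y}\left(\eta_2 - \eta_1 + \eta_1^2 - \eta_2^2 + \varepsilon_0\eta_2 - \varepsilon_0\eta_1 + \cdots\right). \qquad (3.8.1.2)$$
$$B(\bar{y}_{Ro}) = \left(\frac{1-f}{n}\right)\bar{Y}\left[\left(C_{x_2}^2 - \rho_{yx_2}C_yC_{x_2}\right) + w\left(C_{x_1}^2 - C_{x_2}^2 + \rho_{yx_2}C_yC_{x_2} - \rho_{yx_1}C_yC_{x_1}\right)\right] \qquad (3.8.1.3)$$
Proof. It follows by taking expected values on both sides of (3.8.1.2). Hence the theorem.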
Theorem 3.8.1.2. Th e minimum mean square d error of the mult ivariate ratio
estim ator Y Ro is given by
= E [(&J + d - 2&01]2)+ w 2(1]i + 1]? - 21]11]2)+ 2W(&0 172 -1]i - &01]1+ 1]11]2)]
=c~f JYZ[c; 2
+ C;2 - 2pYX2CyCX2 + w (C; 1 + C;2 - 2PX\X2 C' 1CxJ
w=
~YX2CyC'2 - C;2 - PYXICy C'1 +PXIX2CXICt2) (3.8 . 1.6)
f~C XI2 + CX22 -2 P.'lX2 CXI CX2 )
On substituting this val ue of w in (3.8 .1.5) we obtain (3.8.104). Hence the theorem.
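The optimisation over the weight can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `olkin_optimum` and the illustrative input values are assumptions, and the formulas are the first-order expressions for a weighted combination $w\bar y_{R1} + (1-w)\bar y_{R2}$ of two ratio estimators under SRSWOR.

```python
def olkin_optimum(Ybar, n, N, Cy, Cx1, Cx2, r_y1, r_y2, r_12):
    """Return (w_opt, min_mse, mse) for the estimator w*yR1 + (1 - w)*yR2.

    All quantities are first-order approximations under SRSWOR.
    The name and signature are hypothetical, chosen for illustration.
    """
    k = (1 - n / N) / n                                   # (1 - f)/n
    a0 = Cy**2 + Cx2**2 - 2 * r_y2 * Cy * Cx2             # bracket at w = 0
    A = Cx1**2 + Cx2**2 - 2 * r_12 * Cx1 * Cx2            # coefficient of w^2
    B = (r_y2 * Cy * Cx2 - Cx2**2
         - r_y1 * Cy * Cx1 + r_12 * Cx1 * Cx2)            # half the coefficient of w

    def mse(w):
        # Quadratic in w: setting the derivative to zero gives w = -B/A.
        return k * Ybar**2 * (a0 + w**2 * A + 2 * w * B)

    w_opt = -B / A
    return w_opt, mse(w_opt), mse


# Illustrative (made-up) population values, not taken from the book.
w_opt, min_mse, mse = olkin_optimum(
    Ybar=10.0, n=20, N=100, Cy=0.5, Cx1=0.4, Cx2=0.6,
    r_y1=0.8, r_y2=0.7, r_12=0.5)
print(f"optimum w = {w_opt:.4f}, minimum MSE = {min_mse:.4f}")
print(f"MSE at w = 0 (ratio on X2): {mse(0):.4f}, at w = 1 (ratio on X1): {mse(1):.4f}")
```

Setting $w = 0$ or $w = 1$ recovers the first-order MSE of the single-variable ratio estimator based on $X_2$ or $X_1$, which gives a quick sanity check on the quadratic.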
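Raj (1965a) proposed a multivariate regression type estimator, which in the case of
two auxiliary variables can be written as
$$\bar y_{Raj} = \bar y + \beta_1(\bar X_1 - \bar x_1) + \beta_2(\bar X_2 - \bar x_2). \tag{3.8.2.1}$$
The above estimator $\bar y_{Raj}$, in terms of $\varepsilon_0$, $\eta_1$ and $\eta_2$, can easily be expressed as:
$$\bar y_{Raj} = \bar Y(1 + \varepsilon_0) - \beta_1\bar X_1\eta_1 - \beta_2\bar X_2\eta_2. \tag{3.8.2.2}$$
Theorem 3.8.2.1. The estimator $\bar y_{Raj}$ is an unbiased estimator of the population
mean $\bar Y$.
Proof. On taking expected values on both sides of (3.8.2.2) we have $E(\bar y_{Raj}) = \bar Y$.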
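The variance of the estimator $\bar y_{Raj}$ is
$$V(\bar y_{Raj}) = \left(\frac{1-f}{n}\right)\left[\bar Y^2C_y^2 + \beta_1^2\bar X_1^2C_{x_1}^2 + \beta_2^2\bar X_2^2C_{x_2}^2 - 2\beta_1\bar Y\bar X_1\rho_{yx_1}C_yC_{x_1} - 2\beta_2\bar Y\bar X_2\rho_{yx_2}C_yC_{x_2} + 2\beta_1\beta_2\bar X_1\bar X_2\rho_{x_1x_2}C_{x_1}C_{x_2}\right].$$
Differentiating with respect to $\beta_1$ and $\beta_2$ and equating to zero gives two normal equations, which can be solved by Cramer's rule with
$$\Delta = \det\begin{bmatrix}\bar X_1C_{x_1}, & \bar X_2C_{x_2}\rho_{x_1x_2}\\ \bar X_1C_{x_1}\rho_{x_1x_2}, & \bar X_2C_{x_2}\end{bmatrix} = \bar X_1\bar X_2C_{x_1}C_{x_2}\left(1 - \rho_{x_1x_2}^2\right),$$
$$\Delta_1 = \det\begin{bmatrix}\bar Y\rho_{yx_1}C_y, & \bar X_2C_{x_2}\rho_{x_1x_2}\\ \bar Y\rho_{yx_2}C_y, & \bar X_2C_{x_2}\end{bmatrix} = \bar Y\bar X_2C_yC_{x_2}\left(\rho_{yx_1} - \rho_{yx_2}\rho_{x_1x_2}\right),$$
and
$$\Delta_2 = \det\begin{bmatrix}\bar X_1C_{x_1}, & \bar Y\rho_{yx_1}C_y\\ \bar X_1C_{x_1}\rho_{x_1x_2}, & \bar Y\rho_{yx_2}C_y\end{bmatrix} = \bar Y\bar X_1C_yC_{x_1}\left(\rho_{yx_2} - \rho_{yx_1}\rho_{x_1x_2}\right).$$
Thus
$$\beta_1 = \frac{\Delta_1}{\Delta} = \frac{\bar YC_y\left(\rho_{yx_1} - \rho_{x_1x_2}\rho_{yx_2}\right)}{\bar X_1C_{x_1}\left(1 - \rho_{x_1x_2}^2\right)}, \quad \text{and} \quad \beta_2 = \frac{\Delta_2}{\Delta} = \frac{\bar YC_y\left(\rho_{yx_2} - \rho_{x_1x_2}\rho_{yx_1}\right)}{\bar X_2C_{x_2}\left(1 - \rho_{x_1x_2}^2\right)}.$$
On substituting these optimum values of $\beta_1$ and $\beta_2$ in the variance expression and simplifying, we obtain
$$\text{Min.}V(\bar y_{Raj}) = \left(\frac{1-f}{n}\right)S_y^2\left[1 - \frac{\rho_{yx_1}^2 + \rho_{yx_2}^2 - 2\rho_{yx_1}\rho_{yx_2}\rho_{x_1x_2}}{1 - \rho_{x_1x_2}^2}\right].$$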
Example 3.8.2.1. The season average price (in dollars) per pound of the
commercial apple crop in 36 different states of the United States has been given in
population 3. Suppose we selected an SRSWOR sample of nine states to collect the
required information from 1996. Find the relative efficiency of the regression type
estimator of average price in the United States that makes use of past information
from two years with respect to the estimator that makes use of past information only
from one year.
Solution. From the description of the population we have
$Y_i$ = Season average price per pound during 1996,
$X_{1i}$ = Season average price per pound during 1995,
$X_{2i}$ = Season average price per pound during 1994,
$N = 36$, $S_y^2 = 0.006488$, $\rho_{yx_1} = 0.8775$, $\rho_{yx_2} = 0.8577$, $\rho_{x_1x_2} = 0.8788$, $n = 9$ and
$f = n/N = 9/36 = 0.25$.
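Now we have
( a ) Use of one auxiliary variable
Regression estimator: $\bar y_{l1} = \bar y + \beta(\bar X_1 - \bar x_1)$.
Mean square error: $\text{MSE}(\bar y_{l1}) = \left(\frac{1-f}{n}\right)S_y^2\left(1 - \rho_{yx_1}^2\right)$.
( b ) Use of two auxiliary variables
Mean square error:
$$\text{MSE}(\bar y_{l2}) = \left(\frac{1-f}{n}\right)S_y^2\left[1 - \frac{\rho_{yx_1}^2 + \rho_{yx_2}^2 - 2\rho_{yx_1}\rho_{yx_2}\rho_{x_1x_2}}{1 - \rho_{x_1x_2}^2}\right]$$
$$= \left(\frac{1 - 0.25}{9}\right)(0.006488)\left[1 - \frac{0.8775^2 + 0.8577^2 - 2 \times 0.8775 \times 0.8577 \times 0.8788}{1 - 0.8788^2}\right]$$
$$= 0.0001066.$$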
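The arithmetic of this example can be reproduced with a short script. This is a sketch under the assumption that the one-variable regression estimator has first-order mean square error $\left(\frac{1-f}{n}\right)S_y^2(1-\rho_{yx_1}^2)$ and that the relative efficiency is the ratio of the one-variable MSE to the two-variable MSE, expressed as a percentage. The variable names are illustrative only.

```python
# Population quantities quoted in Example 3.8.2.1.
N, n = 36, 9
S2y = 0.006488
r_y1, r_y2, r_12 = 0.8775, 0.8577, 0.8788

f = n / N
k = (1 - f) / n

# (a) One auxiliary variable: usual linear regression estimator.
mse_one = k * S2y * (1 - r_y1**2)

# (b) Two auxiliary variables: squared multiple correlation of Y on (X1, X2).
R2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
mse_two = k * S2y * (1 - R2)

# Relative efficiency of the two-variable estimator over the one-variable one.
re_percent = 100 * mse_one / mse_two

print(f"MSE with one auxiliary variable : {mse_one:.7f}")
print(f"MSE with two auxiliary variables: {mse_two:.7f}")
print(f"Relative efficiency (percent)   : {re_percent:.2f}")
```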
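$$\begin{bmatrix}\sum_{i=1}^{n}(x_{1i} - \bar x_1)^2, & \sum_{i=1}^{n}(x_{1i} - \bar x_1)(x_{2i} - \bar x_2)\\ \sum_{i=1}^{n}(x_{1i} - \bar x_1)(x_{2i} - \bar x_2), & \sum_{i=1}^{n}(x_{2i} - \bar x_2)^2\end{bmatrix}\begin{bmatrix}\hat\beta_1\\ \hat\beta_2\end{bmatrix} = \begin{bmatrix}\sum_{i=1}^{n}(y_i - \bar y)(x_{1i} - \bar x_1)\\ \sum_{i=1}^{n}(y_i - \bar y)(x_{2i} - \bar x_2)\end{bmatrix} \tag{3.8.2.9}$$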
or
(3.8.2. 10)
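An estimator of the variance $V(\bar y_{Raj(p)})$ is given by
$$v(\bar y_{Raj(p)}) = \left(\frac{1-f}{n}\right)\frac{1}{n-3}\sum_{i=1}^{n}e_i^2. \tag{3.8.2.12}$$
Another estimator of the variance is
$$v(\bar y_{Raj(p)}) = \left(\frac{1-f}{n}\right)s_y^2\left(1 - \hat R_{y.x_1x_2}^2\right) \tag{3.8.2.13}$$
where
$$\hat R_{y.x_1x_2}^2 = \frac{r_{yx_1}^2 + r_{yx_2}^2 - 2r_{yx_1}r_{yx_2}r_{x_1x_2}}{1 - r_{x_1x_2}^2}. \tag{3.8.2.14}$$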
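Theorem 3.8.2.3. The minimum sample size required to achieve a relative
standard error of at most $\phi$ with the estimator $\bar y_{Raj}$ is given by
$$n \ge \left[\frac{\phi^2}{C_y^2\left(1 - R_{y.x_1x_2}^2\right)} + \frac{1}{N}\right]^{-1}. \tag{3.8.2.15}$$
Proof. The relative standard error of the estimator does not exceed $\phi$ if
$$\left(\frac{1}{n} - \frac{1}{N}\right)C_y^2\left(1 - R_{y.x_1x_2}^2\right) \le \phi^2,$$
or
$$\frac{1}{n} \le \frac{\phi^2}{C_y^2\left(1 - R_{y.x_1x_2}^2\right)} + \frac{1}{N},$$
or
$$n \ge \left[\frac{\phi^2}{C_y^2\left(1 - R_{y.x_1x_2}^2\right)} + \frac{1}{N}\right]^{-1}.$$
Hence the theorem.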
Example 3.8.2.2. We wish to estimate the season's average price per pound ( $Y$ ) of
the commercial apple crop during 1996 in the United States. The correlations
between the price during 1996 ( $Y$ ) and the prices during 1995 ( $X_1$ ) and 1994 ( $X_2$ ) are
assumed to be known. Find the minimum sample size, $n$, required to estimate the
average price with a relative standard error of 5.6%.
Given: $R_{y.x_1x_2}^2 = 0.8029$, $C_y^2 = 0.1563$, and $N = 69$.
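The sample size bound (3.8.2.15) is straightforward to evaluate. The sketch below is illustrative only: the function name `min_sample_size` is an assumption, the result is rounded up to the next integer, and the inputs are the values listed under "Given" in the example.

```python
import math


def min_sample_size(phi, cy2, r2, N):
    """Smallest integer n satisfying bound (3.8.2.15).

    phi : target relative standard error (as a fraction, e.g. 0.056)
    cy2 : squared coefficient of variation of Y
    r2  : squared multiple correlation of Y on the auxiliary variables
    N   : population size
    """
    n_real = 1.0 / (phi**2 / (cy2 * (1.0 - r2)) + 1.0 / N)
    return math.ceil(n_real)


def relative_standard_error(n, cy2, r2, N):
    """First-order relative standard error of the regression estimator."""
    return math.sqrt((1.0 / n - 1.0 / N) * cy2 * (1.0 - r2))


n_min = min_sample_size(phi=0.056, cy2=0.1563, r2=0.8029, N=69)
print("minimum sample size:", n_min)
print("relative standard error at n_min:",
      round(relative_standard_error(n_min, 0.1563, 0.8029, 69), 4))
```

Rounding up is a design choice: the bound is on a real number, and any integer below it would fail the precision requirement.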
Solution. We started with the first two columns of the Pseudo-Random Numbers
(PRN) given in Table 1 of the Appendix to select 14 distinct random numbers
$1 \le R \le 36$ as 01, 23, 04, 32, 33, 05, 22, 29, 03, 36, 27, 19, 14 and 06.
Sr. No.   State and Territory   Year 1994   Year 1995   Year 1996
                                 X_2i        X_1i        Y_i
01        AZ                    0.078       0.071       0.122
03        CA                    0.133       0.183       0.160
04        CO                    0.157       0.145       0.223
05        CT                    0.283       0.276       0.292
06        DE                    0.168       0.125       0.173
14        ME                    0.174       0.179       0.185
19        MO                    0.198       0.160       0.228
22        NM                    0.219       0.298       0.306
23        NY                    0.118       0.121       0.130
27        PA                    0.104       0.095       0.133
29        SC                    0.130       0.126       0.136
32        VT                    0.165       0.181       0.194
33        VA                    0.090       0.099       0.101
36        WI                    0.230       0.241       0.133
          Sum                   2.247       2.300       2.516
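The estimates of the partial regression coefficients $\beta_1$ and $\beta_2$ are
$$\hat\beta_1 = \frac{s_y\left(r_{yx_1} - r_{x_1x_2}r_{yx_2}\right)}{s_{x_1}\left(1 - r_{x_1x_2}^2\right)} = \frac{0.063063(0.7641 - 0.8893 \times 0.7722)}{0.068009\left(1 - 0.8893^2\right)} = 0.3431,$$
and
$$\hat\beta_2 = \frac{s_y\left(r_{yx_2} - r_{x_1x_2}r_{yx_1}\right)}{s_{x_2}\left(1 - r_{x_1x_2}^2\right)} = \frac{0.063063(0.7722 - 0.8893 \times 0.7641)}{0.057778\left(1 - 0.8893^2\right)} = 0.4837.$$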
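One can see that $\sum_{i=1}^{14}e_i = 0.0$ and $\sum_{i=1}^{14}e_i^2 = 0.019392408$. Thus an estimate of $V(\bar y_{Raj(p)})$ is
$$v(\bar y_{Raj(p)}) = \left(\frac{1-f}{n}\right)\frac{1}{n-3}\sum_{i=1}^{n}e_i^2 = \left(\frac{1 - 0.39}{14}\right) \times \frac{0.019392408}{14 - 3} = 0.00007694.$$
The estimate of the squared multiple correlation coefficient is
$$\hat R_{y.x_1x_2}^2 = \frac{r_{yx_1}^2 + r_{yx_2}^2 - 2r_{yx_1}r_{yx_2}r_{x_1x_2}}{1 - r_{x_1x_2}^2} = \frac{0.7641^2 + 0.7722^2 - 2 \times 0.7641 \times 0.7722 \times 0.8893}{1 - 0.8893^2} = 0.6249.$$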
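An estimator of $V(\bar y_{Raj(p)})$ is given by
$$v(\bar y_{Raj(p)}) = \left(\frac{1-f}{n}\right)s_y^2\left(1 - \hat R_{y.x_1x_2}^2\right) = 0.00006499.$$
A $(1 - \alpha)100\%$ confidence interval for estimating the population mean $\bar Y$ is given by
$$\bar y_{Raj(p)} \mp t_{\alpha/2}(df = n - 3)\sqrt{v(\bar y_{Raj(p)})},$$
or $0.1920 \mp 2.201\sqrt{0.00006499}$, or $[0.1743, 0.2097]$.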
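The partial regression coefficients and the residual-based variance estimate can be recomputed directly from the 14 sampled states. This is a minimal sketch, assuming the data in the table above and an ordinary least squares fit of $y$ on $x_1$ and $x_2$ obtained by solving the $2 \times 2$ normal equations with Cramer's rule. The helper names `mean` and `cross` are illustrative.

```python
# Sample of 14 states from the table above (apple prices, dollars per pound).
x2 = [0.078, 0.133, 0.157, 0.283, 0.168, 0.174, 0.198,
      0.219, 0.118, 0.104, 0.130, 0.165, 0.090, 0.230]   # 1994
x1 = [0.071, 0.183, 0.145, 0.276, 0.125, 0.179, 0.160,
      0.298, 0.121, 0.095, 0.126, 0.181, 0.099, 0.241]   # 1995
y = [0.122, 0.160, 0.223, 0.292, 0.173, 0.185, 0.228,
     0.306, 0.130, 0.133, 0.136, 0.194, 0.101, 0.133]    # 1996
N = 36
n = len(y)


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Corrected sum of cross-products: sum of (a_i - abar)(b_i - bbar)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


# Entries of the normal equations.
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
sy1, sy2 = cross(y, x1), cross(y, x2)

# Cramer's rule for the two partial regression coefficients.
det = s11 * s22 - s12**2
b1 = (sy1 * s22 - sy2 * s12) / det
b2 = (sy2 * s11 - sy1 * s12) / det

# Residuals from the fitted plane through the sample means.
ybar, x1bar, x2bar = mean(y), mean(x1), mean(x2)
residuals = [yi - ybar - b1 * (u - x1bar) - b2 * (v - x2bar)
             for yi, u, v in zip(y, x1, x2)]
sse = sum(e * e for e in residuals)

# Variance estimate with n - 3 degrees of freedom.
f = n / N
v_hat = (1 - f) / n * sse / (n - 3)

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
print(f"sum of residuals = {sum(residuals):.2e}, sum of squares = {sse:.6f}")
print(f"estimated variance = {v_hat:.8f}")
```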
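If there are $p$ auxiliary variables $X_1, X_2, \ldots, X_p$
and their population means $\bar X_1, \bar X_2, \ldots, \bar X_p$ are known, then the multivariate
regression type estimator proposed by Raj (1965) is
$$\bar y_m = \bar y + \sum_{i=1}^{p}\beta_i(\bar X_i - \bar x_i). \tag{3.8.2.16}$$
Proceeding as in the above theorem, the minimum variance of the multivariate
regression type estimator $\bar y_m$ is given by
$$\text{Min.}V(\bar y_m) = \left(\frac{1-f}{n}\right)\bar Y^2C_y^2\left(1 - R_{y.x_1x_2\cdots x_p}^2\right) \tag{3.8.2.17}$$
where $R_{y.x_1x_2\cdots x_p}$ denotes the multiple correlation coefficient between $Y$ and $X_1$,
$X_2, \ldots, X_p$.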
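Srivastava (1971) proposed a general class of ratio type estimators for estimating
the population mean $\bar Y$ as
$$\bar y_s = \bar y\,H(u_1, u_2, \ldots, u_p) = \bar y\,H(\underline{u}) \tag{3.8.3.1}$$
where
$u_j = \bar x_j/\bar X_j$, $j = 1, 2, \ldots, p$, and $H(\underline{u})$ is a parametric function such that $H(\underline{e}) = 1$ for
$\underline{e} = (1, 1, \ldots, 1)_{1 \times p}$, satisfying certain regularity conditions, such as that the first and second
order partial derivatives of $H$ with respect to $\underline{u}$ exist and are known. Expanding
$H(\underline{u})$ around the point $\underline{e}$ by using a second order Taylor's series we have
$$\bar y_s = \bar y\left[H(\underline{e}) + (\underline{u} - \underline{e})H_1 + (\underline{u} - \underline{e})'H_2(\underline{u} - \underline{e}) + \cdots\right] \tag{3.8.3.2}$$
$$= \bar Y\left[1 + \varepsilon_0 + (\underline{u} - \underline{e})H_1 + (\underline{u} - \underline{e})'H_2(\underline{u} - \underline{e}) + \varepsilon_0(\underline{u} - \underline{e})H_1 + \cdots\right] \tag{3.8.3.3}$$
where $H_1 = \left.\frac{\partial H}{\partial \underline{u}}\right|_{\underline{u} = \underline{e}}$ and $H_2 = \left.\frac{1}{2}\frac{\partial^2 H}{\partial \underline{u}\,\partial \underline{u}'}\right|_{\underline{u} = \underline{e}}$ are the matrices consisting of first and
second order partial derivatives of the function $H$ with respect to $\underline{u}$ and evaluated
at $\underline{u} = \underline{e}$. Thus we have the following theorems: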
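Theorem 3.8.3.1. The bias, to the first order of approximation, in the general class
of estimators of population mean is
$$B(\bar y_s) = \left(\frac{1-f}{n}\right)\bar Y\left[\sum_{t=1}^{p}\sum_{j=1}^{p}H_{2tj}\rho_{x_tx_j}C_{x_t}C_{x_j} + \sum_{j=1}^{p}\rho_{yx_j}C_yC_{x_j}H_{1j}\right],$$
where $H_{1j}$ and $H_{2tj}$ denote the elements of $H_1$ and $H_2$.
Theorem 3.8.3.2. The minimum mean squared error, up to the terms of $O(n^{-1})$, of
the general class of estimators of population mean is given by
$$\text{Min.MSE}(\bar y_s) = \left(\frac{1-f}{n}\right)\bar Y^2C_y^2\left(1 - R_{y.x_1x_2\cdots x_p}^2\right). \tag{3.8.3.5}$$
Proof. To the first order of approximation, the mean squared error of $\bar y_s$ is
$$\text{MSE}(\bar y_s) = \left(\frac{1-f}{n}\right)\bar Y^2\left[C_y^2 + \sum_{t=1}^{p}\sum_{j=1}^{p}H_{1t}H_{1j}\rho_{x_tx_j}C_{x_t}C_{x_j} + 2\sum_{j=1}^{p}\rho_{yx_j}C_yC_{x_j}H_{1j}\right]. \tag{3.8.3.6}$$
On differentiating (3.8.3.6) with respect to $H_1 = (H_{11}, \ldots, H_{1p})'$ and equating to zero,
we will obtain a set of $p$ equations, as
$$\sum_{t=1}^{p}H_{1t}\rho_{x_tx_j}C_{x_t}C_{x_j} = -\rho_{yx_j}C_yC_{x_j}, \quad j = 1, 2, \ldots, p,$$
or, in matrix notation, $A H_1 = C$. (3.8.3.8)
The set of equations given by (3.8.3.8) can easily be solved for the unknown
parameters as $H_1 = A^{-1}C$. As we saw in Theorem 3.8.2.2, one can easily see
that by substituting the optimum values so obtained in (3.8.3.6) we obtain (3.8.3.5).
Hence the theorem.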
Corollary 3.8.3.1. A wider class of estimators for estimating the population mean, $\bar Y$,
using $p$ auxiliary variables $X_1, X_2, \ldots, X_p$ can easily be defined as
$$\bar y_w = H(\bar y, \underline{u}) \tag{3.8.3.9}$$
where $H[\cdot, \cdot]$ is a parametric function such that $H(\bar Y, \underline{e}) = 1$, satisfying certain
regularity conditions. It is easy to show that the wider class of estimators $\bar y_w$ has
the same asymptotic mean squared error as that of the general class of ratio type
estimators of population mean defined at (3.8.3.1).
r21, 1,
(3.8.3.11)
Sl detl!!l,}) .
bl } = - - ( ), ) = 2,3,..., p (383 12)
s} det !!III .. .
and the estimator of the multiple correlation coefficient $R_{1.2,3,4,\ldots,p}$ is given by
$$\hat R_{1.2,3,\ldots,p}^2 = 1 - \frac{\det(\underline{R})}{\det(\underline{R}_{11})}. \tag{3.8.3.13}$$
Example 3.8.3.1. An estimate of the total number of fish at the Atlantic and Gulf Coasts
helps fishermen contractors in making decisions to recruit labour. The average
numbers of fish during 1994, 1993, and 1992 are known to be 4954.435, 4591.072,
and 4230.174, respectively. Apply the following estimator to estimate the total
number of fish in all 69 types of groups
- yr C C
YX I Y Xl
- yrYX CY Cx
2 2
- yrYX CY Cx
3 3
To estimate the number of fish during 1995, a consultant takes an SRSWOR sample
of 16 types of fish as given in the following table:
Sr. No.   Species group          1992      1993      1994      1995
                                 x_3i      x_2i      x_1i      y_i
3         Skates/rays            2152      1981      2939      2353
5         Saltwater catfishes    13466     12690     14441     13859
4         Eels                   138       222       186       152
16        Striped bass           3840      4799      8521      10758
21        Bluefish               11990     10301     12405     10940
43        Weakfish               1668      2219      4929      5739
55        Cunner                 1931      1876      1255      1375
58        Atlantic mackerel      1045      2307      4860      4008
33        Snappers, other        746       861       462       492
69        Other fishes           12249     14953     20488     14426
39        Sheepshead             5933      5593      4383      5118
10        Pollock                168       397       862       832
59        King mackerel          1289      1023      1148      1252
45        Silver perch           1198      1034      1729      2146
62        Summer flounder        11918     22919     17741     16238
66        Flounders, other       1103      999       918       897
Sample summary      x_3            x_2            x_1            y
Mean                4427.1250      5260.8750      6079.1875      5661.5625
Variance            24711755.32    43283631.72    44114004.70    31798379.33
Coeff. of var.      1.1228         1.2505         1.0925         0.9960
Correlations
  x_3               1.0000
  x_2               0.9133         1.0000
  x_1               0.9305         0.9347         1.0000
  y                 0.9176         0.9246         0.9723         1.0000
Here
$n = 16$, $p = 3$, $\bar y = 5661.56$, $\bar x_1 = 6079.18$, $\bar x_2 = 5260.87$, $\bar x_3 = 4427.12$,
$C_y = 0.9960$, $C_{x_1} = 1.0925$, $C_{x_2} = 1.2505$, $C_{x_3} = 1.1229$, $r_{yx_1} = 0.9723$,
$r_{yx_2} = 0.9246$, $r_{yx_3} = 0.9176$, $r_{x_1x_2} = 0.9347$, $r_{x_1x_3} = 0.9305$ and $r_{x_2x_3} = 0.9133$.
The set of normal equations becomes
1.2769,1.5637,1.2824 -6519.78
Hence a point estimate of the average number of fish during 1995 is given by
The next section has been devoted to study the general class of estimators to
estimate any population parameter (e.g., population mean, population variance,
population correlation coefficient, population coefficient of variation , population
regression coefficient, etc.) by making use of p auxiliary variables at the
estimation stage.
Singh, Mangat, and Mahajan (1995) considered the problem of estimation of any
population parameter $F_0$ of the study variable $Y$ by using known supplementary
information on $p$ auxiliary variables, $F_1, F_2, \ldots, F_p$. Let $f_0, f_1, \ldots, f_p$ be the unbiased
or consistent estimators of $F_0, F_1, \ldots, F_p$, respectively, each based on a sample of
size $n > p$. Let $\underline{u} = (u_1, u_2, \ldots, u_p)$, where $u_i = f_i/F_i$, $i = 1, 2, \ldots, p$, assume values in a
bounded, closed, convex subset $R_p$ of $p$-dimensional space containing the point
$\underline{e} = (1, 1, \ldots, 1)$. Let $\underline{b}' = (b_1, b_2, \ldots, b_p)$, where $b_i = \{F_0\,Cov(f_0, f_i)\}/\{F_i V(f_0)\}$,
$i = 1, 2, \ldots, p$, and $A = [a_{ij}]_{p \times p}$, where
$a_{ij} = \{F_0^2\,Cov(f_i, f_j)\}/\{F_iF_jV(f_0)\}$.
+ f (Ui -IX/o - Fo)h~ (Fo, .§)+Vo - Fo? h~o(Fo, .§)+ ..... (3.9.3)
i~ l
where hf (Fo, .§) and hJ (Fo, .§) denote the first and second order partial derivatives
of hVo, ~) with respect to 10 and Ui' respectively. Thus we have the following
theorem .
Theorem 3.9.1. The bias in the wider class of estimators of any population
parameter is of the order $O(n^{-1})$, i.e.,
$$E(t_h) = F_0 + O(n^{-1}). \tag{3.9.5}$$
Proof. Taking expected values on both sides of (3.9.4) we obtain
$$E(t_h) = F_0 + \sum_{i<j}^{p}C_{ij}h''_{ij}(F_0, \underline{e}) + \frac{1}{2}\sum_{i=1}^{p}C_{ii}h''_{ii}(F_0, \underline{e}) + \frac{1}{2}F_0C_{00}h''_{00}(F_0, \underline{e}). \tag{3.9.6}$$
Note that $f_0, f_1, f_2, \ldots, f_p$ are either unbiased or consistent estimators of
$F_0, F_1, F_2, \ldots, F_p$, respectively. By the definition of consistency from Gujarati
(1978), the terms $C_{ij}$ $(i, j = 0, 1, 2, \ldots, p)$, representing the variance-covariance terms, and
$B(f_i)$ will tend to zero as the sample size $n \to \infty$. Thus (3.9.6) can be expressed as
(3.9.5).
Hence the theorem.
Theorem 3.9.2. The minimum mean squared error, up to terms of order $O(n^{-1})$, of
the class of estimators $t_h$ is given by
$$\text{Min.MSE}(t_h) = V(f_0)\left(1 - R_{f_0.f_1,f_2,\ldots,f_p}^2\right)$$
where $R_{f_0.f_1,f_2,\ldots,f_p}$ denotes the multiple correlation coefficient between $f_0$ and
$f_1, f_2, \ldots, f_p$.
Proof. By the definition of mean squared error we have
Remark 3.9.1. ( a ) Since the multiple correlation coefficient increases with the
number of secondary variables, it follows from (3.9.10) that the minimum mean
square error of $t_h$ is a monotone decreasing function of the number of secondary
variables.
( b ) The value of $R_{f_0.f_1,f_2,\ldots,f_p}^2$ also increases if there is high correlation between
two auxiliary variables, and such high correlation between the auxiliary variables
may bring an artificial reduction in the variance of the estimator of population mean or
total. Such a problem can be addressed as a problem of multicollinearity in survey
sampling. For example, suppose there are three variables $Y$, $X_1$ and $X_2$. Then the
minimum variance of the linear regression estimator $\bar y_{lr}$ is
$$V(\bar y_{lr}) = \left(\frac{1-f}{n}\right)S_y^2\left[1 - \frac{\rho_{yx_1}^2 + \rho_{yx_2}^2 - 2\rho_{yx_1}\rho_{yx_2}\rho_{x_1x_2}}{1 - \rho_{x_1x_2}^2}\right].$$
Consider the following two situations:
Situation I. Let $\rho_{yx_1} = 0.6$, $\rho_{yx_2} = 0.8$, and $\rho_{x_1x_2} = 0.3$;
Situation II. Let $\rho_{yx_1} = 0.6$, $\rho_{yx_2} = 0.8$, and $\rho_{x_1x_2} = 0.95$.
Then the ratio of $V(\bar y_{lr})$ under situation I to situation II is given by
$$\text{Ratio} = \frac{V(\bar y_{lr})_{\text{situation I}}}{V(\bar y_{lr})_{\text{situation II}}} = \frac{\left[1 - \dfrac{\rho_{yx_1}^2 + \rho_{yx_2}^2 - 2\rho_{yx_1}\rho_{yx_2}\rho_{x_1x_2}}{1 - \rho_{x_1x_2}^2}\right]_{\text{situation I}}}{\left[1 - \dfrac{\rho_{yx_1}^2 + \rho_{yx_2}^2 - 2\rho_{yx_1}\rho_{yx_2}\rho_{x_1x_2}}{1 - \rho_{x_1x_2}^2}\right]_{\text{situation II}}} = 2.233.$$
Clearly the reduction in variance $V(\bar y_{lr})$ in situation II is due to high correlation
between $X_1$ and $X_2$.
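The ratio quoted for the two situations can be verified in a few lines. This is an illustrative sketch only: the helper name `one_minus_r2` is an assumption, and the common factor $\left(\frac{1-f}{n}\right)S_y^2$ cancels in the ratio, so it is omitted.

```python
def one_minus_r2(r_y1, r_y2, r_12):
    """1 - R^2 for the regression of Y on two auxiliary variables."""
    r2 = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return 1 - r2


# Same correlations with Y, different correlation between X1 and X2.
situation_1 = one_minus_r2(0.6, 0.8, 0.30)
situation_2 = one_minus_r2(0.6, 0.8, 0.95)
ratio = situation_1 / situation_2

print(f"1 - R^2, situation I : {situation_1:.5f}")
print(f"1 - R^2, situation II: {situation_2:.5f}")
print(f"variance ratio       : {ratio:.3f}")
```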
Biradar and Singh (1992a), Singh and Kataria (1990), Singh (1988), and Singh and
Upadhyaya (1986) have suggested some methods to improve the general class of
estimators. Srivastava (1992) has shown that those methods are not valid for
improving the general class of estimators. Biradar and Singh (1997, 1998)
considered a class of estimators based on a general sampling design for a
population parameter $\phi_0$ utilizing the information on two parameters $\phi_1$ and $\phi_2$ of
an auxiliary variable.
Suppose there are two variables $Y_1$ and $Y_2$ under study in a finite population $\Omega$ of
size $N$. Let $Y_{1i}$ and $Y_{2i}$ denote the values of the $i$th unit in the population. Suppose
we want to estimate the ratio of two population means defined as
$$R_{y_1y_2} = \frac{\bar Y_1}{\bar Y_2} \tag{3.10.1}$$
where $\bar Y_1 = N^{-1}\sum_{i=1}^{N}Y_{1i}$ and $\bar Y_2 = N^{-1}\sum_{i=1}^{N}Y_{2i}$ denote, respectively, the population means
of the two variables. Suppose a sample of $n$ units is drawn by using SRSWOR and
the paired observations $(y_{1i}, y_{2i})$, $i = 1, 2, \ldots, n$, are observed from the sample.
The usual estimator of the ratio is $\hat R_{y_1y_2} = \bar y_1/\bar y_2$. Let us define
$$\varepsilon_0 = \frac{\bar y_1}{\bar Y_1} - 1 \quad \text{and} \quad \delta_0 = \frac{\bar y_2}{\bar Y_2} - 1,$$
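such that
$$E(\varepsilon_0) = E(\delta_0) = 0$$
and
$$E(\varepsilon_0^2) = \left(\frac{1-f}{n}\right)C_{y_1}^2, \quad E(\delta_0^2) = \left(\frac{1-f}{n}\right)C_{y_2}^2, \quad \text{and} \quad E(\varepsilon_0\delta_0) = \left(\frac{1-f}{n}\right)\rho_{y_1y_2}C_{y_1}C_{y_2}.$$
Now the estimator $\hat R_{y_1y_2}$ in terms of $\varepsilon_0$ and $\delta_0$ can easily be written as
$$\hat R_{y_1y_2} = R_{y_1y_2}\left[1 + \varepsilon_0 - \delta_0 + \delta_0^2 - \varepsilon_0\delta_0 + \cdots\right]. \tag{3.10.3}$$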
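Theorem 3.10.1. The bias, to the first order of approximation, in the estimator
$\hat R_{y_1y_2}$ is given by
$$B(\hat R_{y_1y_2}) = \left(\frac{1-f}{n}\right)R_{y_1y_2}\left[C_{y_2}^2 - \rho_{y_1y_2}C_{y_1}C_{y_2}\right]. \tag{3.10.4}$$
Theorem 3.10.2. The mean square error, to the first order of approximation, of the
estimator $\hat R_{y_1y_2}$ is given by
$$\text{MSE}(\hat R_{y_1y_2}) = \left(\frac{1-f}{n}\right)R_{y_1y_2}^2\left[C_{y_1}^2 + C_{y_2}^2 - 2\rho_{y_1y_2}C_{y_1}C_{y_2}\right]. \tag{3.10.5}$$
Proof. By the definition of mean squared error, we have
$$\text{MSE}(\hat R_{y_1y_2}) = E\left[\hat R_{y_1y_2} - R_{y_1y_2}\right]^2 \approx R_{y_1y_2}^2E\left[\varepsilon_0 - \delta_0 + \delta_0^2 - \varepsilon_0\delta_0\right]^2$$
$$\approx R_{y_1y_2}^2E\left[\varepsilon_0^2 + \delta_0^2 - 2\varepsilon_0\delta_0\right] = \left(\frac{1-f}{n}\right)R_{y_1y_2}^2\left[C_{y_1}^2 + C_{y_2}^2 - 2\rho_{y_1y_2}C_{y_1}C_{y_2}\right].$$
Hence the theorem.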
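The first-order bias and mean square error of the ratio of two sample means are simple functions of the population summaries. The sketch below is illustrative: the function name `ratio_bias_mse` and the example inputs are assumptions, and the bias expression is the one obtained by taking expectations of the expansion (3.10.3) term by term.

```python
def ratio_bias_mse(Y1bar, Y2bar, Cy1, Cy2, rho, n, N):
    """First-order bias and MSE of ybar1/ybar2 under SRSWOR.

    Returns (R, bias, mse), where R is the population ratio Y1bar/Y2bar.
    """
    k = (1 - n / N) / n                       # (1 - f)/n
    R = Y1bar / Y2bar
    bias = k * R * (Cy2**2 - rho * Cy1 * Cy2)
    mse = k * R**2 * (Cy1**2 + Cy2**2 - 2 * rho * Cy1 * Cy2)
    return R, bias, mse


# Made-up population summaries, for illustration only.
R, bias, mse = ratio_bias_mse(Y1bar=50.0, Y2bar=25.0,
                              Cy1=0.4, Cy2=0.3, rho=0.7, n=30, N=300)
print(f"R = {R:.3f}, bias = {bias:.6f}, MSE = {mse:.6f}")

# Degenerate check: equal CVs and perfect correlation leave no first-order error.
_, bias0, mse0 = ratio_bias_mse(50.0, 25.0, 0.3, 0.3, 1.0, 30, 300)
print(f"degenerate case: bias = {bias0:.1e}, MSE = {mse0:.1e}")
```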
Remark 3.10.2. In the presence of known auxiliary information, general
classes of estimators of the form $\hat R_h = (\bar y_1/\bar y_2)H(u)$ and $\hat P_h = (\bar y_1\bar y_2)H(u)$, where $u = \bar x/\bar X$
and $H(\cdot)$ is a parametric function, can also be constructed to estimate the ratio
$R_{y_1y_2}$ and the product $P_{y_1y_2}$ of two population means. The interested reader may refer
to Singh (1982a).
The median is often regarded as a more appropriate measure of location than the
mean when variables with a highly skewed distribution, such as income, are
studied. As we have seen in the previous sections of this chapter, there is extensive
literature available on the estimation of means and totals in sample surveys.
Relatively few efforts have been made to develop an efficient estimator of the
median. Gross (1980), Sedransk and Meyer (1978), and Smith and Sedransk (1983)
have considered the problem of estimation of the median using simple random
sampling. Kuk and Mak (1989) were the first researchers to attempt the estimation of
the median using auxiliary information. Francisco and Fuller (1991) have also
considered the problem of estimation of the median as a part of the estimation of a finite
population distribution function. In this chapter we shall restrict ourselves to the
discussion of the ratio type estimator developed by Kuk and Mak (1989). Let $Y_i$
and $X_i$, $i = 1, 2, \ldots, N$, be the values of the population units for the study variable $Y$
and auxiliary variable $X$, respectively. Furthermore, let $y_i$ and $x_i$, $i = 1, 2, \ldots, n$, be
the values of the units included in an SRSWOR sample of size $n$. Assuming the
median $M_x$ of the variable $X$ is known, we have the following theorem:
Theorem 3.11.1. The ratio type estimator to estimate the median $M_y$ of the study
variable is given by
$$\hat M_R = \hat M_y\left(\frac{M_x}{\hat M_x}\right). \tag{3.11.1}$$
Suppose Y(l) ' Y(2)' . .. , Y(II) are the Y values of sample units in the ascending
order. Furthe rmore let p =!- be the proportion of Y values in the sample which are
1/
less than or equal to the median value $M_y$, which is an unknown parameter and is to
be estimated from sample observations, and so is the case of $p$. If $\hat p$ is an estimator
of $p$, the sample median $\hat M_y$ in terms of quantiles can be written as $\hat Q_y(\hat p)$, where
$\hat p = 0.5$. Kuk and Mak (1989) defined a matrix of proportions $[P_{ij}]$ as:
               Y <= M_y    Y > M_y    Total
X <= M_x       P_11        P_12       P_1.
X >  M_x       P_21        P_22       P_2.
Total          P_.1        P_.2       1
and
V(M x ) = C~fJ {Jt(~x)}-2 , (3 .11.3)
where f y and It are the density functions of y and x respect ively. The
covariance between My and Mx is given by
(3 .11.4)
My - My = {Jy(MJ rlfry( M
J-fry ( MJ ]+op(n- I 2
/ )
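Theorem 3.11.3. The variance of the ratio estimator $\hat M_R$ of the population median $M_y$
is given by
$$V(\hat M_R) = \left(\frac{1-f}{n}\right)\left[\frac{1}{4}\{f_y(M_y)\}^{-2} + \frac{1}{4}\left(\frac{M_y}{M_x}\right)^2\{f_x(M_x)\}^{-2} - 2\left(\frac{M_y}{M_x}\right)(P_{11} - 0.25)\{f_x(M_x)f_y(M_y)\}^{-1}\right]. \tag{3.11.5}$$
Proof. To the first order of approximation, we have
$$V(\hat M_R) = V(\hat M_y) + \left(\frac{M_y}{M_x}\right)^2V(\hat M_x) - 2\left(\frac{M_y}{M_x}\right)Cov(\hat M_y, \hat M_x). \tag{3.11.6}$$
On substituting the values of $V(\hat M_y)$, $V(\hat M_x)$ and $Cov(\hat M_y, \hat M_x)$ from Theorem
3.11.2 in (3.11.6), we have (3.11.5). Hence the theorem.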
where $\rho_c = 4(P_{11} - 0.25)$ goes from $-1$ to $+1$ as $P_{11}$ increases from 0.0 to 0.5. This
condition is analogous to the condition under which the ratio estimator of
population mean remains superior to the sample mean.
Proof. By setting $V(\hat M_R) < V(\hat M_y)$ we have, after simplification,
$$\rho_c > \frac{M_x^{-1}\{f_x(M_x)\}^{-1}}{2M_y^{-1}\{f_y(M_y)\}^{-1}}.$$
Therefore we have
and
r
V(M R) =C~f )[ {ry(~y)}-2 +(:: V,(~x)}-2 {::](1j -O.25){rAMJfy(MJ}-I]
I
= (1-0.16J[
8
~.354Ix 10- 4 r
4
+(322.305 J2 fu.4345 x 10-
452.517 4
4
r
-2(322.305J(0.42-0.25)~.354IXIO-4 X3.4345XIO-4}-' ]
452.517
= 61393.76 .
Thus the percent relative efficiency (RE) of the ratio estimator MR with respect to
the usual estimator My is given by
Remark 3.11.1. In the above example, for simplicity we have considered univariate
normal distributions for X and Y separately, but a more interesting example may
be considered by assuming bivariate joint normal distribution of X and Y .
Theorem 3.11.5. The minimum variance of the regression type estimator of median
defined as
$$\hat M_{lr} = \hat M_y + r(M_x - \hat M_x) \tag{3.11.8}$$
is given by
$$\text{Min.}V(\hat M_{lr}) = 2\left(\frac{1-f}{n}\right)P_{11}(1 - 2P_{11})\{f_y(M_y)\}^{-2}. \tag{3.11.9}$$
Proof. We have
$$V(\hat M_{lr}) = V(\hat M_y) + r^2V(\hat M_x) - 2r\,Cov(\hat M_y, \hat M_x)$$
$$= \left(\frac{1-f}{n}\right)\left[\frac{1}{4}\{f_y(M_y)\}^{-2} + \frac{r^2}{4}\{f_x(M_x)\}^{-2} - 2r(P_{11} - 0.25)\{f_x(M_x)f_y(M_y)\}^{-1}\right]. \tag{3.11.10}$$
On differentiating (3.11.10) with respect to $r$ and equating to zero we obtain
$$r = \frac{4(P_{11} - 0.25)\{f_y(M_y)f_x(M_x)\}^{-1}}{\{f_x(M_x)\}^{-2}}. \tag{3.11.11}$$
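The substitution of the optimum $r$ can be written out as a short worked derivation. This is a sketch that assumes the variance and covariance expressions of Theorem 3.11.2 and uses the shorthand $f_y = f_y(M_y)$, $f_x = f_x(M_x)$ and $\rho_c = 4(P_{11} - 0.25)$.

```latex
\begin{align*}
r_{opt} &= \frac{Cov(\hat M_y, \hat M_x)}{V(\hat M_x)}
         = 4(P_{11} - 0.25)\,\frac{f_x}{f_y}
         = \rho_c\,\frac{f_x}{f_y}, \\
\text{Min.}V(\hat M_{lr}) &= V(\hat M_y) - \frac{\{Cov(\hat M_y, \hat M_x)\}^2}{V(\hat M_x)}
         = V(\hat M_y)\left(1 - \rho_c^2\right), \\
1 - \rho_c^2 &= 1 - (4P_{11} - 1)^2 = 8P_{11}(1 - 2P_{11}), \\
\text{Min.}V(\hat M_{lr}) &= \left(\frac{1-f}{4n}\right) f_y^{-2}\cdot 8P_{11}(1 - 2P_{11})
         = 2\left(\frac{1-f}{n}\right)P_{11}(1 - 2P_{11})\,f_y^{-2}.
\end{align*}
```

The second line shows that $\rho_c$ plays the same role for medians that the correlation coefficient plays for the linear regression estimator of the mean.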
Units   A    B    C    D    E
y_i     9    11   13   16   21
x_i     14   18   19   20   24
( a ) Find the population medians $M_y$ and $M_x$ of the study variable and auxiliary
variable, respectively.
( b ) Select all possible samples of three units ($n = 3$) with SRSWOR sampling.
( c ) Find the estimates of the medians $M_y$ and $M_x$ from each sample.
( d ) Find the exact bias in the estimator $\hat M_y$ using the definition.
( e ) Find the exact mean square error of the estimator $\hat M_y$ using the definition.
( f ) Assuming that the median $M_x$ of the auxiliary variable is known, find the ratio
estimate of the median $\hat M_R = \hat M_y(M_x/\hat M_x)$ from each sample.
( g ) Find the exact bias in the ratio estimator $\hat M_R$ using the definition.
( h ) Find the exact mean square error of the ratio estimator $\hat M_R$ using the
definition.
( i ) Find the relative efficiency of the ratio estimator $\hat M_R$ with respect to $\hat M_y$.
Solution. ( a ) The population medians of the $Y$ and $X$ variables are given by
$M_y = 13$ and $M_x = 19$.
( b ) and ( c ) All 10 possible samples, the estimates of the medians from a given sample,
$\hat M_y|s$ and $\hat M_x|s$, and related results are given in the following table:
$$\text{MSE}(\hat M_R) = \sum_{s=1}^{10}p_s\left\{\hat M_R|s - M_y\right\}^2 = 2.031.$$
( i ) The relative efficiency of the ratio estimator $\hat M_R$ with respect to the usual
estimator $\hat M_y$ is given by
$$RE = \frac{\text{MSE}(\hat M_y)}{\text{MSE}(\hat M_R)} \times 100 = \frac{3.90}{2.031} \times 100 = 192.05\%.$$
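The exact bias and mean square error in this example come from enumerating every possible sample, which is easy to script. The following is a minimal sketch using the five-unit population above. It relies only on the standard library, and the variable names are illustrative.

```python
from itertools import combinations
from statistics import median

# Five-unit population from the example.
y = {"A": 9, "B": 11, "C": 13, "D": 16, "E": 21}
x = {"A": 14, "B": 18, "C": 19, "D": 20, "E": 24}

My = median(y.values())          # population median of the study variable
Mx = median(x.values())          # population median of the auxiliary variable

# All SRSWOR samples of size 3; each has the same selection probability.
samples = list(combinations(y, 3))
p = 1 / len(samples)

est_y = [median(y[u] for u in s) for s in samples]    # sample medians of y
est_x = [median(x[u] for u in s) for s in samples]    # sample medians of x
est_r = [my * Mx / mx for my, mx in zip(est_y, est_x)]  # ratio estimates

bias_y = sum(est_y) * p - My
bias_r = sum(est_r) * p - My
mse_y = sum(p * (e - My) ** 2 for e in est_y)
mse_r = sum(p * (e - My) ** 2 for e in est_r)
re_percent = 100 * mse_y / mse_r

print(f"bias: sample median {bias_y:.4f}, ratio estimator {bias_r:.4f}")
print(f"MSE : sample median {mse_y:.4f}, ratio estimator {mse_r:.4f}")
print(f"relative efficiency: {re_percent:.2f}%")
```

Full enumeration is only practical for very small populations, but it gives exact design-based results against which the large-sample variance formulas can be compared.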
Remark 3.11.1. It is not clear if the estimator $\hat M_{lr}$ of the median can work as
efficiently as the usual linear regression estimator of population mean, $\bar y_{LR}$. Kuk
and Mak (1989) studied two more estimators under the names of 'position
estimator' and 'stratification estimator'; these were found to be as efficient as
$\hat M_{lr}$ from the variance point of view. Graf (2002) also pointed out that the estimators
of median developed by Kuk and Mak (1989, 1994) and Ren (2000) deserve
practical investigation. A bootstrap method for smoothed estimators of the median has
been discussed by Brown, Hall, and Young (2001). Nelson and Meeden (1998)
used prior information about the population quartiles of the auxiliary variable to
improve the estimator of the median.
EXERCISES
Exercise 3.1. Under SRSWOR sampling, find the first order approximations of bias
and mean squared error in each of the following estimators:
( c)
and
Y3 = Y[ ax + (~a )x l (d)
Y _ y[(I+a)X+(I - a)x].
4- (I - a)X + (I + a)x '
(e) Y5 =(I-a)y+aY(;Y;
where a and r are suitably chosen constants such that $\text{MSE}(\bar y_t)$, $t = 1, 2, 3, 4, 5$, is
minimum. Show that $\text{Min.MSE}(\bar y_t)$, $t = 1, 2, 3$, to the first order of approximation, is
the same as that of the usual linear regression estimator.
Hint: ( a ) and ( b ) Chakrabarty (1968), Vos (1980), Adhvaryu and Gupta (1983);
( c ) Walsh (1970); ( d ) Sahai and Sahai (1985); ( e ) Sisodia and Dwivedi
(1981).
Exercise 3.2. Compare the following estimators with the usual ratio estimator under
SRSWOR design
- -( NX - nx
( b ) Y2=Y (N -n )X
Jor Y2=Y-=-'w
- - x* here x-* = -1 - N-/I
LX; denotesthemeano f
X N -n ; =1
-
yx +sxy In h
.£(x; -xf
d
£(x; -xXy;- y)
( d) Y4= x2 + s ; l n 2 1=1 ;-1
,w eresx==-'-(--n-_-I,-)-an Sxy= - (n-I)
- y[ 1+ (I--;; - N1)(sx
( e) Ys = i s;J] '
Y x - i2
y
and
(f)
Hint: ( a ) Sisodia and Dwivedi (1981); (b) Srivenkataramana (1980),
Srivenkataramana and Tracy (1980, 1981); (c) Prasad (1989); ( d ) Srivastava,
Dwivedi, Chaubey, and Bhatnagar (1983); ( e ) and ( f) Swain and Sahoo (1982).
Exercise 3.3. Find the minimum MSE of the estimator of population mean, given
by
$$\bar y_1 = \bar y\left(\frac{s_x^2}{S_x^2}\right)^{\beta}$$
and study the behaviour of the resultant MSE under bivariate normal distribution
and discuss your views.
Exercise 3.4. Show that the minimum MSE of the estimator of population mean $\bar Y$
defined as
$$\bar y_{s1} = \alpha\bar y + \beta(\bar X - \bar x)$$
is
Exercise 3.5. Suppose a class of ratio type estimators to estimate the population
mean $\bar Y$ is defined as
$$\bar y_c = \bar y\,f(u, v, w)$$
where $u = \bar x/\bar X$, $v = s_x^2/S_x^2$, and $w = r_{xy}/\rho_{xy}$, and $r_{xy} = s_{xy}/(s_xs_y)$ is an estimator of the
population correlation coefficient, $\rho_{xy}$.
Exercise 3.6. Consider A= y(xlxY, such that AE H,i = 1,2,3, where H denotes
the set of all possible product type estimators of population mean . Construct the
following terms:
( a ) Linear variety of estimators;
( b ) Funnel to filter the bias precipitates;
( c ) Filter paper to filter the bias precipitates;
( d ) Amount of chemicals to reduce the bias of first order of approximation.
Hint: Singh and Singh (1991, 1993a, 1993b, 1993c).
Exercise 3.7. Study the bias and MSE of the predictive product estimators, given by
( a) YI = Y( X~) , and (b) yz = ...!.- I YC i ,
ni=1 X
(a) - _(x)
Y1 =Y
(l -f)Sxy .
X --n- X '
( b) Yz- -( x
- - Y~
X
)[1(1- f) Sty ]-I
+--=
n xY
( c) Y3 = Y ~
X
(x) (l -f)Sxy
- - -n- =x- ; and (d)
where $a$ is the characterising scalar. Also show that $\bar y_1$ and $\bar y_2$ are the special
cases of $\bar y_4$ for certain choices of $a$, and that the estimator $\bar y_4$ remains better in the
sense of smaller MSE than the other estimators, $\bar y_t$, $t = 1, 2, 3$.
Hint: ( a ) Robson (1957); ( b ) Singh (1989); ( c ) Dubey (1993); ( d ) Srivastava
and Bhatnagar (1981), Bhatnagar (1996).
Exercise 3.9. Let $X_m$ and $X_M$ denote the minimum and maximum values of a
known positive variate $X$, respectively. Using these values, let us transform the
auxiliary variable $X$ to create two new variables $Z$ and $U$ such that
$$Z_i = \frac{X_i + X_m}{X_M + X_m} \quad \text{and} \quad U_i = \frac{X_i + X_M}{X_M + X_m} \quad \text{for } i = 1, 2, \ldots, N.$$
The same transformations are applied on the $x_i$ values in the sample as
$$z_i = \frac{x_i + X_m}{X_M + X_m} \quad \text{and} \quad u_i = \frac{x_i + X_M}{X_M + X_m}, \quad \text{for } i = 1, 2, \ldots, n.$$
Find the conditions under which
the estimators
- -(VJ
(b ) Y2 = Y Ii
_ -I -I -I N
= N I Zi and U = N- 1 ~Vi
11 _ 11 -
where Z = n I Zi, II =n I "i , Z , of population
i=1 i=1 i=1 i= 1
Exercise 3.11. Find the asymptotic bias and mean squared error expressions for
. f h
three estim ators 0 t e parameter K =
Cy X
S xy
P ty - = ~-2 '
.
given by
. Ct Y s;
X S ty k _ X Sxy
(a) k _ ~ SXY .
1 -- 2 ' ( b ) k 2 = -=---T ; and (c ) 3 - -- .
Y St Y Sx Y S}
Also study the properties of the six general classes of estimators defined as:
$k_{aJ} = k_JH(u)$ and $k_{bJ} = k_JH(u, v)$ for $J = 1, 2, 3$
where $u = \bar x/\bar X$ and $v = s_x^2/S_x^2$. Construct the wider classes of estimators to
estimate the parameter $K$ and study their properties. Comment on the results.
Hint: Singh and Singh (1988), Srivastava, Jhajj, and Sharma (1986), Reddy
(1978a).
Exercise 3.12. Study the properties of the almost unbiased product type estimators
of population mean, Y given by
(a )---(x)X
YI - Y ~ -(N-n)(sry)
- - - ~ ,.
nN X
(b)
- - n(N-I)_(X)_ (N-n)(p) .
Y2 - N(n-I)Y X N(n-I) X '
(0) Y3 '(n
11)[y(~)-m l end (d) y" ymH1~fXjJ-;n];
h sxy = (1)
were n- - 1 2:
n(
Yi - Y-XXi - -)
X an d -p = n- 1 2:
n
Yi Xi '
i=1 i=1
Hint: Shah and Shah (1979); Murthy (1964); Pandey and Dubey (1989) .
Exercise 3.13. Suppose a ratio type estimator of the finite population variance S;
given by
Sl2 = s;2(S';/ s.;)
where s;2 = AS; denotes the Searls (1964) type estimator. Find the minimum mean
squared error of the estimator sf and find the condition under which it is more
efficient than the ratio type estimator of variance proposed by Isaki (1983) for
A=l.
Hint: Prasad and Singh (1990).
Exercise 3.14. (a) Show that the power type of estimator of variance S; given by
s~ s; (s.; /s~ r
=
IS always more efficient than the ratio type estimator si s; (s.; / s.~ ),
= for the
optimum choice of real constant a .
( b ) In case of multi-auxil iary information (say, k-variables), study the asymptotic
properties of the estimator, defined as
.
correlation coefficient Pxy is given by
Sty
rs = - -
StSy
where s:y = AS xy denotes the Searls (1964) type estimator of Sxy .
Hint: Singh, Mangat, and Gupta (1996).
81 = ~, 82 = ~( ~J, and 83 = ~( ~; J
where cy = (Sy/Y), Cy = (Syjf) and such that 8i E H, i = 1,2,3, where H denotes
the set of all possible estimators for estimating the 'Inverse of population mean
0= I/f '.
Define the following terms in statistical language:
( a ) Linear Variety of estimators;
( b ) Funnel to filter the bias precipitates ;
( c ) Filter paper to filter the bias precipitates ;
( d ) Amount of chemicals to reduce the bias of first order of approximation.
Hint: Singh and Gangele (1995, 1997), Singh and Singh (1991, 1993a, 1993b,
1993c).
Exercise 3.17. ( a ) Show that the MSE, to the first order of approximation, of the
usual ratio estimator of population mean can be expressed as
$$\text{MSE}(\bar y_R) = \left(\frac{1-f}{n}\right)\frac{1}{N-1}\sum_{i=1}^{N}E_i^2,$$
where $E_i = (Y_i - \bar Y) - R(X_i - \bar X)$ and $R = \bar Y/\bar X$ have their usual meanings.
( b ) Study the asymptotic properties of an estimator of MSE(YR) defined as:
, (_) (I-f)
MSE YR = -
n
I n
-(-)L: ei ~
n -1 i=l X
2(X)g,
where g is a suitably chosen constant and ei = (Yi - n- r(xi - r) for r = YIX .
Hint: Wu (1982).
( c ) Obtain the mean square error of the almost unbiased estimator in (b) to the
first order of approximation and compare it with that of the estimator in ( a ).
( d ) Find the bias and MSE of the general class of estimators defined as:
where u = x/X and H(.) is a parametric function such that H(l)= I and define
certain regularity conditions required.
Hint: Expand the ratio Sy/Y in terms of &0 and &2 by using binomial expansion
and use the results from the section 3.1 to proceed further.
Exercise 3.19. Study the second order asymptotic properties of the estimators of
population mean Y given by
Exercise 3.20. ( a ) Show that the minimum mean squared error of the estimator of
population mean defined as:
$$\bar y_{gen} = \alpha\bar y + \beta(\bar X - \bar x)$$
where $\alpha + \beta = 1$, is given by
$$\text{MSE}(\bar y_{gen}) = \left(\frac{1-f}{n}\right)\frac{S_x^2S_y^2\left(1 - \rho_{xy}^2\right)}{S_x^2 + 2\rho_{xy}S_xS_y + S_y^2}.$$
Hint: Jain (1987).
( b ) Find the bias and mean square error of the estimator
$$\bar y_{ds} = w_1\bar y + w_2\bar x + (1 - w_1 - w_2)\bar X,$$
where $w_1$ and $w_2$ are suitably chosen constants, such that the MSE of $\bar y_{ds}$ is
minimum.
Hint: Dubey and Singh (2001).
Exercise 3.21. Consider an estimator of population mean $\bar Y$ defined as:
$$\bar y_{sk} = \bar y\left[\frac{(A + C)\bar X + fB\bar x_a}{(A + fB)\bar X + C\bar x_a}\right],$$
where
$\bar x_a = a\bar x + (1 - a)\bar X$, $a = n/(N + n)$, $f = n/N$, and $A$, $B$ and $C$ are functions of $k$
given by $A = (k-1)(k-2)$, $B = (k-1)(k-4)$, $C = (k-2)(k-3)(k-4)$, such that $k \in (0, \infty)$.
( a ) Show that several estimators are special cases of the class of estimators defined
as $\bar y_{sk}$ for different choices of $A$, $B$ and $C$.
( b ) Find the mean square error of $\bar y_{sk}$ and compare it with that of the sample mean estimator for
different choices of the parameters involved, and comment.
Hint: Singh and Shukla (1993).
Exercise 3.22. Suppose $Y$ is the variable under study and $X_1, X_2, \ldots, X_p$ are the $p$
auxiliary variables correlated with it. A sample of size $n$ is drawn from the finite
population of $N$ units with SRSWR. If prior information about the coefficient of
variation $C_y$ of $Y$ along with information about $p$ auxiliary variables is available,
study the properties of the estimator of population mean $\bar Y$ defined as
$$\bar y_m = w\left[\bar y + \underline{\beta}'(\underline{\bar X} - \underline{\bar x})\right],$$
where $\underline{\beta} = [\beta_1, \beta_2, \ldots, \beta_p]'_{p \times 1}$, $\underline{\bar X} = [\bar X_1, \bar X_2, \ldots, \bar X_p]'_{p \times 1}$, $\underline{\bar x} = [\bar x_1, \bar x_2, \ldots, \bar x_p]'_{p \times 1}$ and $w$ is
a suitably chosen positive constant such that the mean square error of the estimator
$\bar y_m$ is minimum.
Hint: Kothwala and Gupta (1989).
Exercise 3.23. Study the asymptotic proper ties of the multivariate estimator of
population mean, Y defined as:
( a) YI = YI~lwI( ~ J, I
(b) yz =fl[y( ~ J]WI
1;1 XI
and
P n
where LWI = 1, and WI are real constants, and XI = /l -I LXii denote the sample mean
1;\ i;1
- IN h
unbiased estimator of the known population mean XI = N- LXii of the t' auxiliary
i;1
variable, t = 1, 2, ..., p .
Hint: Singh (I 967b ), Tuteja and Bahl (1991) .
Exercise 3.24. Suppose there are two auxiliary variables X i and Z, on which
information is available and we wish to estimate the population mean, Y,of the
study variable, Y. Then study the behaviour of the estimators
yz = X t(YiJ(:!.J
n i;1 Xi Z
and show that an estimator after adjusting the bias is given by
Ys = y( ;)( ~),
_ _I n - _I N
wherell =n L.1I;, U = N L.1I; , f or ll;= L -x; andLis ascalartobechosen so
;= 1 ;= 1
that the mean squared error of Ys is minimum.
( f) Find the bias and variance of the product-cum-difference estimator, defined as
Ypd = ~ [x + k(z - z)]
X
for the optimum value of k .
r
( g ) Study the generalized regression ratio estimator
and
Hint: ( a ) and ( b ) Singh (1967a, 1967b, 1969), Tracy and Singh (1998); ( c )
Sahoo and Swan (1980); ( d ) Biradar and Singh (1992b), Tracy, Singh, and Singh
(1996); ( e ) Singh and Ray (1981); ( f ) and ( g ) Khare and Srivastava (1981);
( h ) Singh and Singh (1984b); ( i ) Agarwal (1980).
ifp = ify[~: J
remains more efficient than the sample median estimator ify.
Hint: Singh and Joarder (2002).
Exercise 3.26. Use the power transformation type estimator to estimate the
population median, $M_y$, defined as follows:
$\hat{M}_{pw} = \hat{M}_y (M_x / \hat{M}_x)^{\alpha}$
( a ) Show that the ratio and product type estimators are special cases of $\hat{M}_{pw}$.
if v> I,
otherwise,
_ I VM _ Iv
where rv =- .L and Xv =- .LX(i) .
V, =IX(i) V ,=I
Hint: Pathak (1962).
Exercise 3.28. Find choices of $\delta$, $\omega$, $\eta$ and G such that the class of estimators
defined as
r
reduces to the following estimators :
[ Gupta (1978) ]
[ Tripathi (1980) ]
_ _ x
[ Mohanty and Sahoo (1987) ]
(i ) YMSI = Y (1-UJ)x + UJX
-z
[ Mohanty and Sahoo (1987) ]
(j) YMSz = Y (I -UJx +UJx-X )~
( k ) Replace x by x = (I + d)X - dX for some real constant d in Ye and discuss
the different members of the resultant class of estimators.
Hint: ( a ) to (j ) Ceccon, Diana, and Salvan (1991); ( k) Diana (1992); David and
Sukhatme (1974).
$E(e_i \mid x_i) = 0$, $E(e_i e_j \mid x_i x_j) = 0$ $\forall\, i \neq j$ and $V(e_i \mid x_i) = n\delta$, where $\delta$ is a constant of
order $n^{-1}$ and the variates $x_i/n$ have a gamma distribution with the parameter
$m = nh$.
Hint: Singh and Singh (1997), Dalabehera and Sahoo (1995).
i=1
( b ) Show that the estimator Y2 = ,tr X + {I- ,t }rX is almost unb iased for population
mean for the optimum value of A given by
,topt = k + (1- ck { ~)
where c = (n -1)/ nand k is some constant.
Hint: Rao (1981) .
Exercise 3.32. Compare the predictive ratio estimator
where
s, = (NX - nx)/(N-n), n-
1I)
cxy =- ( I(~
i= 1 X
-I)(~Y -1), and 1)I(~ _ 1)2
c.~ =_(
n- 1 i= 1 X
with the usual ratio estimator and with the estimator, given by
1=1 Qix
where $Q_{ix}$, i = 1, 2, 3, denote the $i$-th known quartiles of the auxiliary variable and $\hat{Q}_{ix}$
denotes its sample analogue. Find the minimum MSE of $\hat{M}_{new}$ for the optimum
values of $\alpha_i$.
Hint: Singh, Singh, and Puetas (2003c).
Exercise 3.34. ( a ) Assume that the mode $M_o$ of the auxiliary variable is known,
and find the bias and variance of the estimator of median $M_y$ defined as
Mnew = My( ~ x + M 0
] •
M x +M o
Compare the mean squared error of $\hat{M}_{new}$ with that of the usual ratio estimator
$\hat{M}_R = \hat{M}_y (M_x / \hat{M}_x)$,
and develop a condition under which it is the more efficient estimator.
( b ) Study the following estimator for population median $M_y$ defined as
M(a)= My (A-M x]
y A -M
x
F1x
1= 1
where $P_{ix}$, i = 1, 2, ..., 99, denotes the $i$-th known percentile of the auxiliary variable
and $\hat{P}_{ix}$ denotes its sample analogue. Find the minimum MSE of $\hat{M}_{new}$ for the
optimum values of $\alpha_i$.
Hint: Singh (2002a).
Show that under this model, the variance of the linear regression estimator
$\bar{y}_{LR} = \bar{y} + (s_{xy}/s_x^2)(\bar{X} - \bar{x})$
can be written as
Exercise 3.37. Study the asymptotic properties of the multivariate estimator of
population mean $\bar{Y}$ given by
$\bar{y}_1 = \sum_{k=1}^{p} w_k [\bar{y} + a_k(\bar{X}_k - \bar{x}_k)]$.
Discuss the choice of $w_k$ and $a_k$ such that the estimator $\bar{y}_1$ reduces to the well
known multivariate ratio type of estimator proposed by Olkin (1958) as
$\bar{y}_o = \sum_{k=1}^{p} w_k (\bar{y}/\bar{x}_k) \bar{X}_k$.
Also discuss the nature of the estimator defined as
$\bar{y}_{HRM} = \sum_{k=1}^{p} w_k \bar{y}_{HR(k)}$,
where $\bar{y}_{HR(k)} = \bar{r}_k \bar{X}_k + \frac{n(N-1)}{N(n-1)} (\bar{y} - \bar{r}_k \bar{x}_k)$, $\sum_{k=1}^{p} w_k = 1$, and $\bar{r}_k$, $\bar{X}_k$ and
$\bar{x}_k$ have the same
Exercise 3.38. Study the asymptotic properties of the following two estimators of
population total Y defined as
Exercise 3.39. Show that under the transformation $u_i = a + b x_i$, where a and b are
where A, is a scalar. Obtain the minimum mean squared error of the estimator,
11, = '(WJ
y W
. {Io {I
Hint: Srivenkataramana and Tracy (1983 , 1986).
Exercise 3.42. Suppose $y_i = 1$ if $i \in A$ and 0 otherwise, and $x_i = 1$ if $i \in B$ and
0 otherwise, for all i, where A and B represent two groups in the population.
r
i=\
by omitting the $j$-th group and $\bar{y}_j$, $\bar{x}_j$ are the sample means based on the
sub-sample of size $m = n/g$, then show that a general class of almost unbiased ratio cum
product estimators is given by
Exercise 3.44. Study the asymptotic properties of an estimator of $\bar{Y}$ given by
Y; = Y( ~) - fi{j - ( ~)}
where fi is a suitably chosen constant, under the super popul ation model
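$y_i = \alpha + \beta x_i + \varepsilon_i$,
where $\alpha$ and $\beta$ are unknown real constants and the $\varepsilon_i$ are random errors distributed
with $E(\varepsilon_i \mid x_i) = 0$, $E(\varepsilon_i^2 \mid x_i) = \delta x_i^g$ and $E(\varepsilon_i \varepsilon_j \mid x_i x_j) = 0$ for every $i \neq j$. Also
assume that $0 < \delta < \infty$, $0 \le g \le 2$ and the $x_i$ are independently identically distributed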
Exercise 3.45. Let the variables y, x, z take real values $(y_i, x_i, z_i)$ on the $i$-th unit
(i = 1, 2, ..., N) in a finite population. Assume that the population means $\bar{X}$ and $\bar{Z}$
of the auxiliary variables are known and that we are to estimate the population mean $\bar{Y}$ of
the study variable y. Assuming that y is positively correlated with x and
negatively with z, study the asymptotic properties of the ratio cum product
estimator of population mean defined as
$\bar{y}_1 = \bar{y} (\bar{X}/\bar{x}) (\bar{z}/\bar{Z})$,
where $(\bar{y}, \bar{x}, \bar{z})$ are the unbiased estimators of the population means $(\bar{Y}, \bar{X}, \bar{Z})$,
respectively, based on a simple random sample of size n drawn without
replacement. Define $u_i = A - x_i$ and $v_i = B + z_i$, i = 1, 2, ..., N, where A and B are
suitably chosen scalars. Then $\bar{u} = A - \bar{x}$ and $\bar{v} = B + \bar{z}$ are the unbiased estimators
of $\bar{U} = A - \bar{X}$ and $\bar{V} = B + \bar{Z}$, respectively. Study the asymptotic bias and variance
of the estimator,
Y2 = y( g)( ~).
Compare the estimator $\bar{y}_2$ with the estimator $\bar{y}_1$ and comment.
Hint: Singh (1967a, 1967b), Tracy, Singh, and Singh (1998), Chang and Huang
(2001a).
$\bar{y}_R = \bar{y} (\bar{X}/\bar{x})$
based on an SRSWOR sample of n units, where $\bar{y}$, $\bar{x}$ and $\bar{X}$ have their usual
meanings. Under the model
$E_m(y_i) = \beta x_i$, $V_m(y_i) = \sigma^2 x_i$, and $C_m(y_i, y_j) = 0$,
show that:
( a ) The ratio estimator can be written as
$\bar{y}_R = \bar{y} + r(\bar{X} - \bar{x})$,
where $r = \bar{y}/\bar{x}$;
( b ) $\mathrm{cov}(\bar{x}, \bar{y}_R) = \frac{1-f}{n} S_x^2 [Q - R]$, where $f = n/N$ and $Q = S_{xy}/S_x^2$;
( c ) When the model holds true, we expect a negligible gain in efficiency from the
optimum estimator, $\bar{y}_{MU} = \bar{y} + Q(\bar{X} - \bar{x})$;
( d ) If the model is wrong, the relative gain in efficiency of $\bar{y}_{MU} = \bar{y} + Q(\bar{X} - \bar{x})$ over
$\bar{y}_R$ is expected to be substantial.
Hint: Montanari (1998, 1999).
Exercise 3.48. Suppose $(\bar{Y}, \bar{X})$ denote the population means of the two variables
y and x, and $(\bar{y}, \bar{x})$ denote the means of a random sample of size n. Let $\bar{y}_{(j)}$ and
$\bar{x}_{(j)}$ be the means obtained by deleting the $n/g$ observations of the $j$-th group.
( a ) Compare the classical estimator $\hat{R} = \bar{y}/\bar{x}$ and the Jackknife estimator
$\hat{R}^* = g\hat{R} - \frac{g-1}{g} \sum_{j=1}^{g} \frac{\bar{y}_{(j)}}{\bar{x}_{(j)}}$
with g = 2.
( b ) Study the following three estimators of population mean given by
$\bar{y}_1 = \hat{R}\bar{X} = \bar{y} + \hat{R}(\bar{X} - \bar{x})$, $\bar{y}_2 = \hat{R}^*\bar{X}$, and $\bar{y}_3 = \bar{y} + \hat{R}^*(\bar{X} - \bar{x})$
under the super-population model
$y_i = \alpha + \beta x_i + \varepsilon_i$,
where $\varepsilon_i$ has mean zero and variance $\delta x_i^t$ ($0 \le t \le 2$) and is uncorrelated with $x_i$.
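The Jackknife ratio in part ( a ) of Exercise 3.48 can be sketched numerically as below. This is only an illustration under stated assumptions: the function name `jackknife_ratio`, the contiguous grouping of the sample into g blocks, and the small data set are all hypothetical and not taken from the text.

```python
def jackknife_ratio(x, y, g):
    """Return (R_hat, R_star) for a sample split into g equal groups.

    R_hat  = ybar / xbar                      (classical ratio)
    R_star = g*R_hat - ((g-1)/g) * sum_j ybar_(j)/xbar_(j)
    where ybar_(j), xbar_(j) are means after deleting group j.
    Groups are taken as contiguous blocks (an assumption of this sketch).
    """
    n = len(x)
    if n % g != 0:
        raise ValueError("sample size must be a multiple of g")
    m = n // g
    r_hat = sum(y) / sum(x)
    delete_group_ratios = []
    for j in range(g):
        lo, hi = j * m, (j + 1) * m
        # Both means use the same n - m units, so the ratio of means
        # equals the ratio of the remaining totals.
        sx = sum(x) - sum(x[lo:hi])
        sy = sum(y) - sum(y[lo:hi])
        delete_group_ratios.append(sy / sx)
    r_star = g * r_hat - (g - 1) / g * sum(delete_group_ratios)
    return r_hat, r_star


# Hypothetical sample of n = 6 units, split into g = 2 groups.
x_demo = [10.0, 12.0, 15.0, 9.0, 20.0, 14.0]
y_demo = [21.0, 25.0, 29.0, 20.0, 41.0, 27.0]
r_hat_demo, r_star_demo = jackknife_ratio(x_demo, y_demo, g=2)
print(r_hat_demo, r_star_demo)
```

When y is exactly proportional to x, every delete-a-group ratio equals the full-sample ratio, so the two estimators coincide; they differ only through the curvature that causes the bias of the classical ratio.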
Vu
g g-I j~ l
f
=-(1_) [Ok)-O(rl
variance estimator
Vm =_(_1_)
g g -I
f [OJ(r)-Of
j ~l
where 0 = r.
Hint: Krewski and Chakrabarty (1981).
( b ) Show that if T = s; (s;(s) js;) and T* = Y+(s Y jsJxs -x) then the estimator
'2
Sy ' 2 = Sy2(s;2/ Sx2)
reduces to Sf
Hint: Biradar and Singh (1998) .
Exercise 3.52. Let $\hat{M}_y$, $\hat{M}_x$ and $\hat{M}_z$ denote, respectively, the median estimators
of $M_y$, $M_x$ and $M_z$ for the y, x, and z values. Study the asymptotic properties
of the multivariate ratio type estimator
Exercise 3.53. If the relationship between the two variates x and y is given by
$y = a + bx$, $b \neq 0$, then show that in SRSWOR sampling, the sample
estimator $\hat{Y} = N\bar{y}$ is more efficient than the ratio estimator $\hat{Y}_R = \hat{Y}(X/\hat{X})$, where
$\hat{X} = N\bar{x}$ and X denotes the known total, if
where $\bar{y} = g^{-1} \sum_{i=1}^{g} y_i$, $\bar{x} = g^{-1} \sum_{i=1}^{g} x_i$ and $\bar{r} = g^{-1} \sum_{i=1}^{g} (y_i / x_i)$.
( c ) An unbiased regression estimator based on splitting the sample into g groups,
each of size $m = n/g$, is given by
where $\bar{b}_g = g^{-1} \sum_{j=1}^{g} b_j'$, $b_j'$ is the sample regression coefficient computed from the
sample after omitting the $j$-th group and $\bar{x}_j$ is the sample mean for the $j$-th group.
Hint: Mickey (1959), Rao (1969).
Exercise 3.55. Let $\bar{y}_1 = \bar{y}$, $\bar{y}_2 = \bar{y}(\bar{X}/\bar{x})$ and $\bar{y}_3 = \bar{y}(\bar{x}/\bar{X})$ denote the usual, ratio and
product estimators of population mean $\bar{Y}$. Consider a new estimator $\bar{y}_{new} = \sum_{i=1}^{3} a_i \bar{y}_i$,
such that $\sum_{i=1}^{3} a_i = 1$ and $\sum_{i=1}^{3} a_i B(\bar{y}_i) = 0$, where $B(\bar{y}_i)$ denotes the bias in the $i$-th
estimator of population mean. Choose $a_i$ such that $MSE(\bar{y}_{new})$ is minimum.
Hint: Singh and Singh (1993a).
Exercise 3.57. Let $y_{ji}$ be the value of the $i$-th unit in the population
$\Omega = [U_1, U_2, \ldots, U_N]$ for the $j$-th variable $y_j$ (j = 0, 1, 2), of which $y_0$ is the variable under
study and $y_1$ and $y_2$ are the auxiliary variables defined over $\Omega$. Let $\bar{y}_j$ be the
(b) Consider Yo = Yo, YI = YO(~ /YI)' Y2 = YO(:Y2 /Y;) and Y3 = YO(~/YlxY2 /Y;) such
that Yt E H, for t = 0,1,2,3, where H denotes the set of all possible estimators of
_ • 3. 3
population mean Yo . The set H is a linear variety if Yh = "IhtYtEH for "Iht = 1 and
t=O t=O
ht ER , where h, (t = 0,1,2,3) denote the real constants used for reducing the bias and
R stands for the set of real numbers . Find the values of ht (t = 0,1,2,3) such that the
inverse of Fy is Fyl(a)= inf~ : fry(y) ~ a}= Ya ' Note that a = FY(Ya) and that
Ya = Fy 1{fry (ya )}. Define a new variable Z, = ~(Ya - Y;) , i = 1,2,.., N . If we est imate
Ya as z= a + b(w - w), where W= g(x) is correlated to Z variable. Find the
minimum variance of the resultant estimator.
Hint: Mak and Kuk (1993)
Exercise 3.59. Let Y and $X_{ki}$, k = 1, 2, ..., p, respectively, be the survey variable
and the auxiliary variables related to Y, and suppose the quantiles or distribution
functions of the auxiliary variables are known. From a sample of n
units from a population of size N we observe $(x_{ki}, y_i)$ where $i \in s$. Suppose
$Q_{x_k}(\alpha_k)$ for $\alpha_k \in (0.0, 0.5) \cup (0.5, 1.0)$ are known and we wish to estimate $Q_y(\beta)$ with
$\beta = 1/2$. Study the asymptotic properties of the following estimators of $Q_y(\beta)$:
. . ( ())p !FXk(QXk(ak)))
FR=Fy Qy,B "IWk • ~ ())
k=1 FXk QXk ak
and
P
FD=Fy(Qy(,B)}t fbk{FXk (Qxk (ak)~FXk (QXk (ak ))}, where "IWk = 1.
k=1 k=1
Hint: Rueda and Arcos (2002)
Exercise 3.60. Let y and x, respectively, be the survey variable and the auxiliary
variable related to each other, and suppose the quantiles or distribution function of
the auxiliary variable are known. From a sample of n units
from a population of size N we observe $(x_i, y_i)$ where $i \in s$. Suppose $F_x(M_x)$ is
known and we wish to estimate $F_y(M_y)$. Study the asymptotic properties of the
following estimators of $F_y(M_y)$:
$\hat{F}_R = \hat{F}_y(M_y) \{F_x(M_x) / \hat{F}_x(M_x)\}$
and
$\hat{F}_D = \hat{F}_y(M_y) + b\{F_x(M_x) - \hat{F}_x(M_x)\}$,
where b is a real constant.
Hint: Rueda, Arcos, and Artes (1998) .
Exercise 3.61. Show that under simple random sampling without replacement
(SRSWOR), the following estimator:
Exercise 3.62. For the following situations, discuss how you may use ratio,
product, difference or regression estimators:
( a ) Estimate the average number of fish caught per month by marine recreational
fishermen on the Atlantic and Gulf coasts, assuming that the number of employees per
month is known.
( b ) Estimate the average amount that graduate students in your class spent on
stationery, assuming that the weekly sale of stationery at a local shop is known.
( c ) Estimate the proportion of time devoted to politics on the national television
channel of your country, when no further auxiliary information is available.
( d ) Estimate the total weight of bones discarded at the time of shipment of usable
chicken meat, assuming that the number of shipments is known.
Exercise 3.63. Consider a population of N identifiable units on which a study
variable y is associated with p auxiliary variables $x_1, \ldots, x_p$ whose population
variances $\sigma_k^2$, k = 1, 2, ..., p, are assumed to be known. Let $(y_i, x_{ik})$, i = 1, 2, ..., n, be
the observed values in an SRSWR sample on the (p + 1) variables. Consider the
problem of estimation of the finite population variance
$\sigma_0^2 = \frac{1}{N} \sum_{i=1}^{N} (Y_i - \bar{Y})^2$, where $\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$.
Find the bias and variance of the following estimators of $\sigma_0^2$:
k = 1, 2, ..., p, $0 < w_k < 1$ and $\sum_{k=1}^{p} w_k = 1$.
( e ) $\hat{\sigma}_s^2 = \sum_{k=1}^{p} w_k (\hat{r}_k \sigma_k^2) + w_{p+1} s_0^2$, where $\sum_{k=1}^{p+1} w_k = 1$.
Hint: Isaki (1983), John (1969), Shukla (1996), Mohanty and Pattanaik (1984),
Singh and Singh (2001).
Exercise 3.64. Let $Y_i$ and $X_i$, i = 1, 2, ..., N, denote the values of the population
units for the study variable Y and the auxiliary variable X, respectively. Further,
let $y_i$ and $x_i$, i = 1, 2, ..., n, denote the values of the units included in a sample $s_n$ of
size n drawn by simple random sampling without replacement (SRSWOR). The
parameter of interest is the population interquartile range of the study variable
Y defined by
$\delta_y = Q_{3y} - Q_{1y}$
where $Q_{1y}$ and $Q_{3y}$ denote the first and third population quartiles of Y, respectively.
The conventional estimator of $\delta_y$ is
$\hat{\delta}_y = \hat{Q}_{3y} - \hat{Q}_{1y}$
where $\hat{Q}_{1x}$ and $\hat{Q}_{3x}$ are sample estimates of the first quartile $Q_{1x}$ and third quartile $Q_{3x}$
of X, respectively.
( a ) Study the asymptotic properties of the following estimators of $\delta_y$ defined as
$\hat{\delta}_y^{(1)} = \hat{\delta}_y (\delta_x / \hat{\delta}_x)$ and $\hat{\delta}_y^{(2)} = \hat{\delta}_y + \beta_{iqr} (\delta_x - \hat{\delta}_x)$
where $\beta_{iqr}$ is a suitably chosen constant such that the variance of $\hat{\delta}_y^{(2)}$ is minimum.
( b ) Study the asymptotic properties of the following estimators of $M_y$ defined as
$\hat{M}_y^{(1)} = \hat{M}_y (\delta_x / \hat{\delta}_x)$ and $\hat{M}_y^{(2)} = \hat{M}_y + \beta_{iqr} (\delta_x - \hat{\delta}_x)$
where $\beta_{iqr}$ is a suitably chosen constant such that the variance of $\hat{M}_y^{(2)}$ is minimum.
Hint: Singh and Singh (2002), Singh, Singh, and Puetas (2003b).
Exercise 3.65 . Study the asymptotic properties of the following estimators of the
population mean, Y, defined as
Exercise 3.66. Consider a finite population of N units $\Omega = \{U_1, U_2, \ldots, U_N\}$. Let y
and x be the variables taking values $y_i$ and $x_i$, respectively, on $U_i$ (i = 1, 2, ..., N).
For estimating $\bar{Y}$, Srivenkataramana (1980) and Bandyopadhyay (1980) proposed
a dual to product estimator as
$\bar{y}_r = \bar{y} (\bar{X} / \bar{x}^*)$, where $\bar{x}^* = (N\bar{X} - n\bar{x}) / (N - n)$.
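A short numerical sketch may help to see what the transformed mean $\bar{x}^* = (N\bar{X} - n\bar{x})/(N - n)$ represents: it is the mean of x over the N - n units not in the sample. The population values, the sample indices and the function names below are hypothetical and serve only to illustrate the dual to product estimator $\bar{y}(\bar{X}/\bar{x}^*)$ as written above.

```python
def dual_mean(pop_mean_x, sample_mean_x, N, n):
    """x-bar-star = (N * X-bar - n * x-bar) / (N - n)."""
    return (N * pop_mean_x - n * sample_mean_x) / (N - n)


def dual_to_product_estimate(sample_mean_y, pop_mean_x, sample_mean_x, N, n):
    """y-bar * (X-bar / x-bar-star), the dual to product form stated above."""
    return sample_mean_y * pop_mean_x / dual_mean(pop_mean_x, sample_mean_x, N, n)


# Hypothetical population of N = 5 units and a sample of n = 2 units.
x_pop = [3.0, 5.0, 8.0, 10.0, 14.0]
y_pop = [40.0, 35.0, 28.0, 22.0, 15.0]
sample_idx = [0, 2]

N, n = len(x_pop), len(sample_idx)
X_bar = sum(x_pop) / N
x_bar = sum(x_pop[i] for i in sample_idx) / n
y_bar = sum(y_pop[i] for i in sample_idx) / n

x_star = dual_mean(X_bar, x_bar, N, n)
# Mean of x over the units that were NOT sampled, computed directly.
x_nonsample = sum(v for i, v in enumerate(x_pop) if i not in sample_idx) / (N - n)

print(x_star, x_nonsample)
print(dual_to_product_estimate(y_bar, X_bar, x_bar, N, n))
```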
Using predictive approach advocated by Basu (1971), Srivastava (1983) envisaged
another estimator for Y as
Y.s == Y-JnX(+(N
_
-2n)xL
)
-[I _(X_*-x]]
Y
NX - nx x
Let a sample of size n be drawn without replacement from the population and let it
be split into g sub-samples, each of size $m = n/g$, where m is an integer. Let
$(\bar{x}_j, \bar{y}_j)$, j = 1, 2, ..., g, be the unbiased estimators of $(\bar{X}, \bar{Y})$ based on the $j$-th sub-sample
of size m. The Jackknife versions of the above estimators are given by
.:..J-:, . == Y}
- [X
-=* , and J s: - [ X - X}]
Yr ' == Y} 1--_-*- , where x} ==
-* NX - nx} .
, ; == 1,2,...,g .
J x- J .r . N-n
J J
Let };I == ~ f Yr', };2 == ~ f Y, ., };3 == Yr , };4 == y.. and };5 == y . Study the
g } ~l J g }~ I J
where
$y_{i0} = y_i$ if $i \in s$, and $y_{i0} = a + b x_i$ if $i \in (\Omega - s)$,
and a and b are constants to be chosen such that the mean square error of the
estimator is minimum.
( a ) Show that the estimator $\bar{y}_{ilu}$ can be written as
$\bar{y}_{ilu} = f\bar{y} + (1 - f)a + b(\bar{X} - f\bar{x})$,
where $f = n/N$.
( b ) Show that the minimum mean square error of the estimator $\bar{y}_{ilu}$ is given by
Exercise 3.68. Find two complex numbers such that the variance of the difference
estimator $\bar{y}_{dif} = \bar{y} + k(\bar{X} - \bar{x})$ is zero.
Hint: Set $V(\bar{y}_{dif}) = 0$ and solve the quadratic equation for two complex values of
k.
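As a sketch of the hint, assume the usual SRSWOR form $V(\bar{y}_{dif}) = \frac{1-f}{n}(S_y^2 + k^2 S_x^2 - 2k S_{xy})$, which is not restated in this exercise. Setting the bracket to zero gives $k = [S_{xy} \pm \sqrt{S_{xy}^2 - S_x^2 S_y^2}]/S_x^2$, and the discriminant is negative whenever the correlation is not perfect, so the roots are complex. The numerical values below are hypothetical.

```python
import cmath


def zero_variance_k(s_y2, s_x2, s_xy):
    """Roots of s_y2 + k^2 * s_x2 - 2 * k * s_xy = 0 (complex in general)."""
    root = cmath.sqrt(s_xy ** 2 - s_x2 * s_y2)
    return (s_xy + root) / s_x2, (s_xy - root) / s_x2


def variance_bracket(k, s_y2, s_x2, s_xy):
    """The factor of V(y_dif) that depends on k."""
    return s_y2 + k * k * s_x2 - 2 * k * s_xy


# Hypothetical population quantities: S_y^2 = 4, S_x^2 = 1, S_xy = 1 (rho = 0.5).
k1, k2 = zero_variance_k(4.0, 1.0, 1.0)
print(k1, k2)
print(abs(variance_bracket(k1, 4.0, 1.0, 1.0)), abs(variance_bracket(k2, 4.0, 1.0, 1.0)))
```

The two roots are complex conjugates of each other, which is why no real choice of k can drive the variance to zero unless x and y are perfectly correlated.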
Practical 3.1. A private company ABC was interested in estimating the average
amount of real estate farm loans (in $000) during 1997 in the United States. The
company collected information from six states included in an SRSWOR sample as
shown below:
State                              CT      ME       NE         NY        VA        WI
Nonreal estate farm loans (X) $    4.373   51.539   3585.406   426.274   188.477   1372.439
Real estate farm loans (Y) $       7.130   8.849    1337.852   201.631   321.583   1229.572
( a ) Given that the average amount of nonreal estate farm loans is $878.16 (in
$000) for the year 1997, obtain the ratio estimate for the average amount of the real
estate farm loans (in $000) during 1997. Develop an estimator of the mean squared
error of the ratio estimator and hence deduce the 95% confidence interval. Using the
information given in population 1 of the Appendix, verify whether the true mean lies in the
confidence interval you obtained. Interpret your findings in two lines.
( b ) Estimate the average of the real estate farm loans during 1997 by using
Beale's estimator defined as
Construct a 95% confidence interval for the average real estate farm loans
by assuming that Beale's estimator has the same mean squared error as the
usual ratio estimator.
( c ) Also apply Tin's estimator of the population mean given by
$\bar{y}_t = \bar{y} (\bar{X}/\bar{x}) [1 + \frac{1-f}{n} (c_{xy} - c_{xx})]$.
Construct a 95% confidence interval estimate of the average real estate farm loans
assuming that the mean squared error of Tin's estimator is the same as that of the usual
ratio estimator.
( d ) Estimate the average of the real estate farm loans using a new estimator of
population mean, defined as
$\bar{y}_{new} = [\sum_{i=1}^{n} x_i y_i / \sum_{i=1}^{n} x_i^2] \bar{X}$.
Construct a 95% confidence interval estimate for the real estate farm loans assuming
that the mean squared error of this new estimator is the same as that of the usual ratio
estimator.
( e ) Compare your results obtained in parts ( a ), ( b ), ( c ) and ( d ), and comment.
Answer:
( a ) 484.69, [120.86, 848.52]; ( b ) $c_{xy} = 1.5306$, $c_{xx} = 2.1997$, $\bar{y}_b = 389.30$;
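The following sketch works through part ( a ) of Practical 3.1 and the two coefficients quoted in the answer. It rests on several assumptions that are not stated in the practical itself: the first data row is taken as nonreal estate loans (x) and the second as real estate loans (y); N = 50 states, as population 1 is described elsewhere in the chapter; the mean squared error is estimated by $\frac{1-f}{n} \cdot \frac{1}{n-1}\sum (y_i - r x_i)^2$; and the interval uses the t table value 2.571 for 5 degrees of freedom. Under these assumptions the ratio estimate and interval agree with the printed answer to ( a ), and the coefficients agree with the printed $c_{xy}$ and $c_{xx}$.

```python
from math import sqrt

# Sample of six states (CT, ME, NE, NY, VA, WI); row roles are assumed.
x = [4.373, 51.539, 3585.406, 426.274, 188.477, 1372.439]    # nonreal estate loans
y = [7.130, 8.849, 1337.852, 201.631, 321.583, 1229.572]     # real estate loans

N = 50           # assumed population size (50 states in population 1)
X_BAR = 878.16   # known population mean of x
T_CRIT = 2.571   # t table value, 0.975 quantile, n - 1 = 5 df

n = len(x)
f = n / N
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ratio estimate of the population mean of y.
r = y_bar / x_bar
y_ratio = r * X_BAR

# Estimated MSE of the ratio estimator from the residuals y_i - r * x_i.
s_d2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
mse_hat = (1 - f) / n * s_d2
half_width = T_CRIT * sqrt(mse_hat)
ci = (y_ratio - half_width, y_ratio + half_width)

# Relative variance and covariance terms used by the bias-adjusted estimators.
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
c_xx = s_x2 / x_bar ** 2
c_xy = s_xy / (x_bar * y_bar)

print(round(y_ratio, 2), [round(v, 2) for v in ci])
print(round(c_xy, 4), round(c_xx, 4))
```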
Practical 3.2. An instructor suggested that the class use the ratio estimator for
estimating the average nonreal estate farm loans by making use of the known real estate
farm loans. Use the information given in population 1 of the Appendix to support
the instructor's statement.
Hint: Discuss the relative efficiency of the ratio estimator with respect to the usual
estimator.
Practical 3.3. A team of doctors wishes to estimate the average duration of sleep
(in minutes) during the night for persons aged 50 years and over in a small village
in the United States. It is known that there are 30 persons living in the village aged
50 years and over. Instead of asking everybody, the psychologist selects an
SRSWOR sample of five of these people and records the information as given
below:
It is a well-known fact that as the age of a person increases, the hours of sleep
decrease. Apply an appropriate method of estimation to estimate the average
sleep time in the village under study. Find an estimator of the mean
squared error of the estimator you used and derive the 95% confidence interval
estimate. Assume that the average age of the subjects, 67.267 years, is known, as
shown in population 2 of the Appendix.
Hint: Apply product method of estimation.
Practical 3.4. The age and sleeping time (in minutes) of 30 persons aged 50 and
over living in a small village in the United States is given in population 2 of the
Appendix. Discuss the relative efficiency of the product estimator under SRSWOR
design.
Practical 3.5. The regression method of estimation has been found to be the most
efficient method among others. Use it to estimate the average amount of the real
estate farm loans (in $000) during 1997 based on an SRSWOR sample of six states
selected from population 1 in the Appendix and given below:
State                              AL        FL        MD        OH        TX         VT
Nonreal estate farm loans (X) $    348.334   464.516   57.684    635.774   3520.361   19.363
Real estate farm loans (Y) $       408.978   825.748   139.628   871.720   1248.761   57.747
The average amount of nonreal estate farm loans, $878.16 ($000) for the year 1997,
is known. Construct a 95% confidence interval estimate and interpret it in
non-technical language.
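A minimal sketch of the regression method for the data of Practical 3.5 follows. The estimator is $\bar{y} + b(\bar{X} - \bar{x})$ with $b = s_{xy}/s_x^2$. The variance form $\frac{1-f}{n} s_y^2 (1 - r_{xy}^2)$, the use of a t value with n - 2 = 4 degrees of freedom (2.776), the value N = 50 and the function name `regression_estimate` are assumptions of this sketch and may differ from the formulas developed in the chapter.

```python
from math import sqrt


def regression_estimate(x, y, N, X_bar, t_crit):
    """Regression estimate of the mean of y, its estimated MSE, and a CI."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

    b = s_xy / s_x2                              # sample regression coefficient
    estimate = y_bar + b * (X_bar - x_bar)
    r2 = s_xy ** 2 / (s_x2 * s_y2)               # squared sample correlation
    mse_hat = (1 - n / N) / n * s_y2 * (1 - r2)  # assumed variance form
    half_width = t_crit * sqrt(mse_hat)
    return estimate, mse_hat, (estimate - half_width, estimate + half_width)


# Sample of six states (AL, FL, MD, OH, TX, VT) from the table above.
x = [348.334, 464.516, 57.684, 635.774, 3520.361, 19.363]    # nonreal estate loans
y = [408.978, 825.748, 139.628, 871.720, 1248.761, 57.747]   # real estate loans

estimate, mse_hat, ci = regression_estimate(x, y, N=50, X_bar=878.16, t_crit=2.776)
print(round(estimate, 2), round(sqrt(mse_hat), 2), [round(v, 2) for v in ci])
```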
Practical 3.6. Find the relative efficiency of the regression estimator for estimating
the average amount of the nonreal estate farm loans during 1997 by using data on
the real estate farm loans during 1997 as an auxiliary variable with respect to the
ratio estimator of population mean. The real and nonreal estate farm loans (in $000)
during 1997 in the 50 states of the United States have been presented in population
1 of the Appendix.
Practical 3.7. The amounts of the real and nonreal estate farm loans (in $000)
during 1997 in the 50 states of the United States have been given in population 1 in
the Appendix. If we select an SRSWOR sample of six states to collect the required
information, find the relative efficiency of the general class estimators which makes
use of the known variance of the auxiliary variable at the estimation stage, for
estimating average amount of nonreal estate farm loans during 1997 by using
information from real estate farm loans during 1997 as an auxiliary variable, with
respect to the regression estimator of population mean.
Practical 3.8. The study of the relationship between age and duration of sleep helps
a local hospital in developing future policies. If hospital researchers consider 10
patients to collect the information from population 2 of the Appendix, then what
will be the relative bias of the usual estimator of the correlation coefficient under
the SRSWOR design?
Practical 3.9. A bank manager raised the issue that the variation in the nonreal
estate farm loans affects their customers. The bank selected an SRSWOR sample of
six states. The manager decides to pick an estimator from the general class of
estimators. Discuss the relative efficiency of the general class of estimators which
makes use of the known variance of the auxiliary variable at the estimation stage, for
estimating the finite population variance of the amount of the nonreal estate farm
loans during 1997 by using known information about the real estate farm loans
during 1997, with respect to the ratio and regression estimators of finite population
variance.
Hint: Use information from population 1 of the Appendix.
Practical 3.10. A private company Kitty Management believes that real and nonreal
estate farm loans have a cause and effect relationship between them. They want to
know the effect of a unit change in real estate farm loans on nonreal estate farm
loans. A statistician suggests to them that there are two different measuring tools to
estimate the regression coefficient. Study the relative efficiency of the usual estimator
$b_1$ with respect to the unbiased estimator $b_2$ of the regression coefficient by using the
complete information available in population 1 of the Appendix.
Practical 3.11. Your instructor provided you an SRSWOR sample of six states
from population 3 of the Appendix as:
State        DE      MD      NC      VT      WA      WI
Year 1994    0.168   0.173   0.088   0.165   0.138   0.230
Year 1996    0.173   0.158   0.117   0.194   0.204   0.241
Find the mistake made by the instructor during the collection of data. Correct the
data accordingly. Apply the regression method of estimation for estimating the
average price of the apple crop during 1996, assuming that the average price during 1994
($0.1708) is known.
Practical 3.12. The Jackknife variance estimation technique has become popular due to
its simplicity. Suppose we took an SRSWOR sample of six states from
population 1 in the Appendix and gathered the following information:
State                              AR        KY         MN         OK         UT        WA
Nonreal estate farm loans (X) $    848.317   557.656    2466.892   1716.087   197.244   1228.607
Real estate farm loans (Y) $       907.700   1045.106   1354.768   612.108    56.908    1100.745
Apply the ratio method of estimation for estimating the average amount of the real
estate farm loans (in $000) during 1997. Also find an estimate of variance of the
ratio estimator using the Jackknife technique and deduce the 95% confidence intervals.
Assume that the average amount, $878.16, of nonreal estate farm loans (in $000) for
the year 1997 is known.
Answers: 95% CIs are [210.830, 1060.399] and [209.063, 1062.166].
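The delete-one Jackknife idea for the ratio estimator in Practical 3.12 can be sketched as follows. Several points are assumptions of this sketch rather than statements from the text: the Minnesota value of x is read as 2466.892 (the value consistent with the midpoint of the printed intervals); N = 50; the variance form is the common delete-one version $(1-f)\frac{n-1}{n}\sum_j(\hat\theta_{(j)} - \bar\theta_{(\cdot)})^2$; and the t value 2.571 (5 degrees of freedom) is used. The chapter's own Jackknife formula may differ in detail, so the interval produced here is illustrative and need not coincide with the printed answers; the point estimate does match their midpoint.

```python
from math import sqrt

# Sample of six states (AR, KY, MN, OK, UT, WA).
x = [848.317, 557.656, 2466.892, 1716.087, 197.244, 1228.607]   # MN value assumed
y = [907.700, 1045.106, 1354.768, 612.108, 56.908, 1100.745]

N = 50           # assumed population size
X_BAR = 878.16   # known population mean of x
T_CRIT = 2.571   # t table value, 0.975 quantile, 5 df


def ratio_mean(xs, ys, X_bar=X_BAR):
    """Ratio estimate of the population mean of y."""
    return sum(ys) / sum(xs) * X_bar


n = len(x)
theta_hat = ratio_mean(x, y)

# Recompute the ratio estimate with each unit left out in turn.
leave_one_out = [
    ratio_mean(x[:j] + x[j + 1:], y[:j] + y[j + 1:]) for j in range(n)
]
theta_dot = sum(leave_one_out) / n

# Delete-one Jackknife variance with a finite population correction.
v_jack = (1 - n / N) * (n - 1) / n * sum((t - theta_dot) ** 2 for t in leave_one_out)
half_width = T_CRIT * sqrt(v_jack)
ci = (theta_hat - half_width, theta_hat + half_width)

print(round(theta_hat, 2), round(sqrt(v_jack), 2), [round(v, 2) for v in ci])
```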
Practical 3.13. Sometimes the estimation of variance of the regression estimator is
difficult, but the Jackknife technique has been found to be the best solution in such
situations. Apply the regression method of estimation for estimating the average
amount of the real estate farm loans (in $000) during 1997. Also find estimates of
variance of the regression estimator using the Jackknife technique and hence deduce
the 95% confidence intervals by using the data given below:
State    KY    NC    SD    UT    AL    FL
The average amount, $878.16, of nonreal estate farm loans (in $000) for the year
1997 is known.
Practical 3.14. The season average price (in $) per pound of the commercial apple
crop in 36 states of the United States has been given in population 3 of the
Appendix. Suppose we selected an SRSWOR sample of six states to collect the
required information for 1996. Find the relative efficiency of the regression type
estimator of average price in the United States that makes use of past information
from two years with respect to the estimator that makes use of past information
from only one year.
Practical 3.15. The real and nonreal estate farm loans (in $000) during 1997 in 50
states of the United States have been presented in population 1 of the Appendix.
Suppose we selected an SRSWOR sample of six states to collect the required
information. Find the relative efficiency of the ratio estimator of median, for
estimating median of the amount of the nonreal estate farm loans during 1997 by
using information from real estate farm loans during 1997, with respect to the usual
estimator of population median. Assume that both real and nonreal estate farm loans
follow normal distributions.
Practical 3.16. Consider the problem of estimating the average nonreal estate farm
loan in the United States. We wish to apply the ratio method of estimation using
known information about the real estate farm loans as shown in population 1 in the
Appendix. What is the minimum sample size required for the relative standard error
(RSE) to be equal to 25%?
Practical 3.18. Suppose the population under study consists of the students
present in class today. Construct a list of names of the students and assign a number
to each of them. Use a random number table to select a sample of 20% of the students
present in the class. Collect information about their GPA from the students selected
in the sample. Assume that the average number of lectures attended by the whole
class is known (or can be found from the register). Also collect information about
the number of lectures attended by the students selected in the sample. Assuming
that the relationship between the number of lectures attended and the GPA of
the students is positive, apply the appropriate method(s) to estimate the average
GPA of the class, and derive 95% confidence interval estimate(s).
Practical 3.19. John needs an estimate of the average nonreal estate farm loan in
the United States . His supervisor has advised him to apply the regression method of
estimation using known information about the real estate farm loans as shown in
population 1 of the Appendix with minimum relative standard error equal to 10%.
What will be John's sample size to meet his supervisor's conditions?
Practical 3.20. Select an SRSWOR sample of the size obtained in Practical 3.19
from population 1 of the Appendix . Collect information about the real estate farm
loans and nonreal estate farm loans from the selected states. Apply the regression
method of estimation for estimating the average nonreal estate farm loans, assuming
that the average real estate farm loans in the United States is known .
Practical 3.21. For estimating the regression coefficient of the amount of the real
estate farm loans (in $000) on the nonreal estate farm loans during 1997 in the
United States, we took an SRSWOR sample of six states from the population 1 in
the Appendix. From the states selected in the sample, we collected the following
information :
State                              AK      CA         CT      ME       VA        WI
Nonreal estate farm loans (X) $    3.433   3928.732   4.373   51.539   188.477   1372.439
Real estate farm loans (Y) $       2.605   1343.461   7.130   8.849    321.583   1229.572
Practical 3.22. The amounts of the real and nonreal estate farm loans (in $000)
during 1997 in 50 states of the United States have been given in population 1 in the
Appendix. Suppose we selected an SRSWOR sample of six states to collect the
required information. Study the relative efficiency of the usual estimator $b_1$ with
respect to the unbiased estimator $b_2$ of the regression coefficient for this population.
Practical 3.23. A team of medical doctors claims that there is a strong negative
relationship between the age of a person and the hours of sleep required. Justify
their statement based on the information given in population 2 of the Appendix.
Also study the relative bias of the usual estimator of the correlation coefficient
based on a sample of 10 units.
Practical 3.24. A student in medical college studies the statement made by the team
of medical doctors about the negative relationship between age and duration of
sleep, and takes an SRSWOR sample of 6 persons from population 2 of the
Appendix as given below:
Practical 3.25. Select four different samples, each of four units, by using SRSWOR
sampling from population 1 of the Appendix. Collect the information for the
real and nonreal estate farm loans from the states selected in each sample. The
average nonreal estate farm loan is assumed to be known. Obtain four different ratio
estimates of the average real estate farm loans from the information collected in the
four samples. Pool the information collected in the four samples to obtain a pooled
ratio estimate of the average real estate farm loans.
( a ) Derive an unbiased estimate of the average real estate farm loans.
( b ) Construct a 95% confidence interval.
Given: Average nonreal estate farm loans $878.16.
Rules: Use the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix
to select four samples with starting columns as:
Sample number    Starting columns
1                3 and 4
2                8 and 9
3                5 and 6
4                4 and 5
Answer: The 95% CI is [545.36, 904.80].
Practical 3.26. Consider the problem of estimating the finite population variance of
the duration of sleep in the United States by using the known benchmark as the
variance of the auxiliary variable, age . Using information given in population 2 of
the Appendix, find the relative efficiency of the product type of estimator
s; = s;(s;/S;) with respect to the estimator s~ .
Practical 3.27. Estimate the finite population variance of the duration of sleep in
the United States based on the information given in the population 2 of the
Appendix. What is the minimum sample size required for the estimator
s; s;&
= ;/S;) to have minimum relative standard deviation 30%?
Practical 3.28. A pilot survey related to population 1 of the Appendix indicates that
the values of certain parameters of interest are given as
$\lambda_{40} = 3.5822$, $\lambda_{04} = 4.5247$ and $\lambda_{22} = 2.8411$. Use this information to study the
relative efficiency of the regression type estimator si = s; + k(S; - s;) with respect
to the ratio type of estimator s} = s;S; / s; under two situations:
( a ) estimate finite population variance of the nonreal estate farm loans using real
estate farm loans as known benchmark.
( b ) estimate finite population variance of the real estate farm loans using nonreal
estate farm loans as known benchmark.
Practical 3.29. A supermarket is worried about the average price of the commercial
apple crop during 1996. The correlation between the price during 1996 (Y ) with
that during 1995 (X 2 ) and 1994 ( Xl) is assumed to be known. Find the minimum
sample size, n, required to estimate the average price with relative standard
deviation 15% from the population 3 given in the Appendix.
Given: $R^2_{y.x_1,x_2} = 0.8029$.
Practical 3.30. Select an SRSWOR sample of the size developed in Practical 3.29
and select so many states from population 3 given in the Appendix. Collect the
information about the season's average price per pound of the apple crop during
1996, 1995, and 1994. Estimate the average price per pound ( $ ) of the commercial
apple crop during 1996 in the United States. Assume that the average price per
pound of the commercial apple crop during 1995 and 1994 are accurately known.
Apply the regression estimator of population mean with two auxiliary variables.
Derive the 95% confidence interval estimates using ( a ) the superpopulation model
approach, and ( b ) the design based approach.
Given: $\bar{X}_1 = 0.1856$ and $\bar{X}_2 = 0.1708$.
Practical 3.31. A private organisation PQR was interested in estimating the average
amount of real estate farm loans (in $000) during 1997 in the United States. The
organisation collected information from 15 states, the real estate (y) and nonreal
estate ( x ) farm loans, included in an SRSWOR sample taken from a list of 50
states and observed the following results :
$\sum_{i=1}^{n} x_i = 18867.089$, $\sum_{i=1}^{n} y_i = 12525.246$, $\sum_{i=1}^{n} x_i^2 = 48780336.98$, $\sum_{i=1}^{n} y_i^2 = 18001501.38$, and
$\sum_{i=1}^{n} x_i y_i = 26591710.56$.
The average amount of nonreal estate farm loans for the year 1997 is known to be
878.16 (in $000).
( a ) Obtain a ratio estimate for the average amount of the real estate farm loans (in
$000) during 1997. Develop an estimator of the mean squared error of the ratio
estimator and hence deduce the 95% confidence interval.
( b ) Obtain a regression estimate for the average amount of the real estate farm
loans (in $000) during 1997. Develop an estimator of the mean squared error of the
regression estimator and hence deduce the 95% confidence interval.
( c ) Comment on the interval estimates.
Answer: (a) [329.37 , 836.58] , (b) [460.08, 881.46] .
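The answer to part ( a ) can be checked numerically from the summary sums alone. The following is a minimal sketch, not the book's own solution: the function name `ratio_estimate_from_sums` is hypothetical, the mean squared error is estimated by the usual first-order formula with the finite population correction, and the multiplier 2.145 is assumed to be the t-value with n - 1 = 14 degrees of freedom.

```python
import math

def ratio_estimate_from_sums(n, N, sum_x, sum_y, sum_x2, sum_y2, sum_xy,
                             X_bar, t_value):
    """Ratio estimate of the population mean from sample sums (SRSWOR).

    Returns (estimate, estimated MSE, confidence interval).
    """
    f = n / N                                   # sampling fraction
    x_bar = sum_x / n
    y_bar = sum_y / n
    # Sample variances and covariance from the raw sums
    s_x2 = (sum_x2 - n * x_bar ** 2) / (n - 1)
    s_y2 = (sum_y2 - n * y_bar ** 2) / (n - 1)
    s_xy = (sum_xy - n * x_bar * y_bar) / (n - 1)
    r = y_bar / x_bar                           # sample ratio
    estimate = r * X_bar                        # ratio estimate of the mean
    # First-order estimator of the MSE of the ratio estimator
    mse = (1 - f) / n * (s_y2 + r ** 2 * s_x2 - 2 * r * s_xy)
    half_width = t_value * math.sqrt(mse)
    return estimate, mse, (estimate - half_width, estimate + half_width)

# Data of Practical 3.31; t_value = 2.145 is an assumed table value (14 df).
est, mse, ci = ratio_estimate_from_sums(
    n=15, N=50,
    sum_x=18867.089, sum_y=12525.246,
    sum_x2=48780336.98, sum_y2=18001501.38, sum_xy=26591710.56,
    X_bar=878.16, t_value=2.145,
)
print(round(est, 2), [round(v, 2) for v in ci])
```

Under these assumptions the interval agrees with the stated answer to within rounding.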
Practical 3.32. From a population consisting of 100 individuals with their weekly
income ( y ) and age ( x ), we have the following information:
$\sum_{i=1}^{N} X_i$ = 35,00, $\sum_{i=1}^{N} Y_i$ = 50,000, $\sum_{i=1}^{N} X_i^2$ = 85,0000, $\sum_{i=1}^{N} Y_i^2$ = 90,000000 and
$\sum_{i=1}^{N} X_i Y_i$ = 7500000.
Practical 3.33. An estimate of total fertility rate (TFR) in the world is helpful to the
policy makers in the world for each country. The fertility rate has been found to
have relationship with crude birth rate (CBR), crude death rate (CDR) and infant
mortality rate (IMR). The average CBR, CDR and IMR for 96 countries during
1997 have been found to be 26.011, 10.872 and 50.138, respectively. To estimate
the total fertility rate (TFR) in the world, a team of consultants takes an SRSWOR
sample of 20 countries out of a list of 90 countries as given in the following table:
Apply the following estimator to estimate the TFR of all 96 countries listed in
population 8 of the Appendix:
$\bar{y}_r = \bar{y} + H_{11}(u_1 - 1) + H_{12}(u_2 - 1) + H_{13}(u_3 - 1)$,
where $u_j = \bar{x}_j / \bar{X}_j$ and the estimates $H_{1j}$, $j = 1,2,3$, are obtained by solving the
normal equations:
' 2
CX \ ' rX\X2 C X\C X2'
' 2
rX2X\ :X\ : X2 ' CX2 '
( a ) Find the population medians M y and M x of the study variable and auxiliary
variables, respectively.
( b ) Select all possible samples of three units ( n = 3 ) with SRSWOR sampling.
( c ) Find the estimates of the medians $\hat{M}_y$ and $\hat{M}_x$ from each sample.
( d ) Find the exact bias in the estimator $\hat{M}_y$ using the definition.
( e ) Find the exact mean square error of the estimator $\hat{M}_y$ using the definition.
( f ) Assuming that the median $M_x$ of the auxiliary variable is known, find the
( g ) Find the exact bias in the product estimator $\hat{M}_p$ using the definition.
( h ) Find the exact mean square error of the product estimator $\hat{M}_p$ using the
definition.
( i ) Find the relative efficiency of the product estimator $\hat{M}_p$ with respect to the
sample estimator $\hat{M}_y$.
Units    P    Q    R    S    T    U    V
y_i      9   11   13   19   21   26   29
x_i     10   26   28   25   36   37   40
( a ) Find the population medians $M_y$ and $M_x$ of the study variable and auxiliary
variables, respectively.
( b ) Select all possible samples of three units ( n = 3 ) with SRSWOR sampling.
( c ) Find the estimates of the medians $\hat{M}_y$ and $\hat{M}_x$ from each sample.
Hint: $P_{11} = \int_{0}^{M_y} \int_{0}^{M_x} f(x,y)\,dx\,dy$, Singh and Joarder (2002)
Practical 3.37. From a population consisting of 200 individuals with their weekly
income ( y ) and age ( x ), we have the following information:
$\sum_{i=1}^{N} X_i = 18867.089$, $\sum_{i=1}^{N} Y_i = 12525.246$, $\sum_{i=1}^{N} X_i^2 = 48780336.98$, $\sum_{i=1}^{N} Y_i^2 = 18001501.38$ and
$\sum_{i=1}^{N} X_i Y_i = 26591710.56$.
Practical 3.38. The following map shows states in the USA that have beaches. We
wish to estimate the average of certain parameters of interest with the beaches
across the USA.
[Map: states in the USA that have beaches, including the Pacific and Atlantic islands.]
Source: Printed with permission from NOAA
( a ) Make a list of all the states that have beaches in the USA and arrange them in
alphabetical order. (Rule: Use two letter abbreviations for sorting the states)
( b ) Select an SRSWOR sample of 5 states from the sorted list of states. (Rule :
Start from the first two columns of the Pseudo-Random Number Table given in the
Appendix). Collect the information on the number of immigrants admitted during
1996 in these states from the population 9 given in the Appendix.
( c ) Estimate the total number of immigrants admitted to these states during 1996
and construct a 95% confidence interval estimate.
( d ) Assume that the number of immigrants admitted to all the states shown in the
above map are known for the year 1994. Using this information, apply the ratio
estimator to estimate the total number of immigrants during 1996. Also construct
95% confidence interval estimate for it.
( e ) Apply the regression estimator to estimate the total number of immigrants
during 1996, and compare the resultant 95% confidence interval estimate with the other
two cases.
( f ) Use Jackknife to estimate the variance of the ratio estimator, and construct
95% confidence interval estimate for the total number of immigrants during 1996.
( g ) Use Jackknife to estimate the variance of the regression estimator, and
construct 95% confidence interval estimate for the total number of immigrants
during 1996.
Optional:
( h ) Collect information from the internet about the current temperature in the
states selected in your sample. Estimate the average temperature on the beaches in
the USA , and construct a 95% confidence interval estimate.
( i ) Collect information about the precipitation (or any other auxiliary variable
related to temperature, for example number of visitors) of all the beaches in states
of the USA. Use this information to use the ratio (or product) estimator to find the
average temperature on the beaches, and construct a 95% confidence interval
estimate.
(j) Use the regression estimator to estimate the average temperature and develop
95% confidence interval estimate.
( k ) Use the Jackknife to estimate the variance of the ratio (or product) estimator,
and construct 95% confidence interval estimate of the average temperature across
all beaches in the USA.
( I ) Use the Jackknife to estimate the variance of the regression estimator, and
construct 95% confidence interval estimate of the average temperature across all
beaches in the USA.
( m ) Discuss your interval estimates in each case.
4. USE OF AUXILIARY INFORMATION: PROBABILITY
PROPORTIONAL TO SIZE AND WITH REPLACEMENT
(PPSWR) SAMPLING
4.0 INTRODUCTION
Through the previous chapter we have seen that the proper use of auxiliary
information at the estimation stage for estimating any population parameter results
in a gain in the efficiency of the resultant estimators. For example, the product and
ratio estimators of the population mean remain better than the sample mean when
the correlation between the study variable and the auxiliary variable lies in the
interval [-1.0, -0.5) and (+0.5, +1.0], respectively. In this chapter we shall show
that the auxiliary information can also be used to select a sample which can provide
better estimators of population parameters. In other words, the auxiliary
information can be used at the sample selection stage as well as at the estimation
stage. A sampling scheme with replacement in which each sampling unit has
unequal probability of selection, the probability being proportional to the size of the
auxiliary variable associated with the particular unit, is called a probability
proportional to size and with replacement (PPSWR) sampling scheme.
Let Y be a study variable and X be an auxiliary variable. For example, suppose
we want to estimate the population in the villages of a particular district. Then we
would choose as our auxiliary variable a variable on which we have information,
e.g.:
( a ) Area of each village of the district (correlation with the study variable = 0.70,
say);
( b ) Number of households in each village of the district (correlation with the study
variable = 0.85, say).
On the basis of the above information, we would choose the auxiliary variable
which has maximum correlation with the study variable. Thus the variable at ( b )
may be a more useful auxiliary variable when selecting a sample using PPSWR
sampling.
Let us explain the method of PPSWR sampling with the help of a simple example.
Consider a population consisting of N = 4 units, viz., A, B, C, and D. Suppose there
are two variables Y and X associated with each unit as follows:

Unit identifier                          A    B    C    D
Values of the study variable (Y_i)       2    4    5    6
Values of the auxiliary variable (X_i)   4    8   10   12

Obviously we have $X = \sum_{i=1}^{4} X_i = 34$ and $Y = \sum_{i=1}^{4} Y_i = 17$. Suppose we wish to draw a
sample of n = 2 units by using PPSWR sampling. Since we are using WR
sampling, the total number of possible samples will be $N^n = 4^2 = 16$; they are listed
in Table 4.1.1. Evidently we have
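$$E(\hat{Y}) = \sum_{j=1}^{N^n} p(j)\,\hat{Y}_j = \frac{16}{1156}\times 17 + \frac{32}{1156}\times 17 + \cdots + \frac{144}{1156}\times 17 = 17 = Y. \qquad (4.1.1)$$
Hence
$$\hat{Y} = \frac{X}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}$$
is an unbiased estimator of population total Y. Now the variance of this estimator is
given by
$$V(\hat{Y}) = E[\hat{Y} - E(\hat{Y})]^2 = \sum_{j=1}^{N^n} p(j)\,(\hat{Y}_j - E(\hat{Y}))^2 = \frac{16}{1156}(17-17)^2 + \cdots + \frac{144}{1156}(17-17)^2 = 0. \qquad (4.1.2)$$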
The possible 16 samples are shown in the following table:

Table 4.1.1. All possible PPSWR samples of n = 2 units (Y = X/2).
Sample  Units   y_1  x_1  y_1/x_1  y_2  x_2  y_2/x_2  p(s)                       Estimate
1       (A, A)  2    4    0.5      2    4    0.5      (4/34)(4/34) = 16/1156     17
2       (A, B)  2    4    0.5      4    8    0.5      (4/34)(8/34) = 32/1156     17
3       (A, C)  2    4    0.5      5    10   0.5      (4/34)(10/34) = 40/1156    17
4       (A, D)  2    4    0.5      6    12   0.5      (4/34)(12/34) = 48/1156    17
5       (B, A)  4    8    0.5      2    4    0.5      (8/34)(4/34) = 32/1156     17
6       (B, B)  4    8    0.5      4    8    0.5      (8/34)(8/34) = 64/1156     17
7       (B, C)  4    8    0.5      5    10   0.5      (8/34)(10/34) = 80/1156    17
8       (B, D)  4    8    0.5      6    12   0.5      (8/34)(12/34) = 96/1156    17
9       (C, A)  5    10   0.5      2    4    0.5      (10/34)(4/34) = 40/1156    17
10      (C, B)  5    10   0.5      4    8    0.5      (10/34)(8/34) = 80/1156    17
11      (C, C)  5    10   0.5      5    10   0.5      (10/34)(10/34) = 100/1156  17
12      (C, D)  5    10   0.5      6    12   0.5      (10/34)(12/34) = 120/1156  17
13      (D, A)  6    12   0.5      2    4    0.5      (12/34)(4/34) = 48/1156    17
14      (D, B)  6    12   0.5      4    8    0.5      (12/34)(8/34) = 96/1156    17
15      (D, C)  6    12   0.5      5    10   0.5      (12/34)(10/34) = 120/1156  17
16      (D, D)  6    12   0.5      6    12   0.5      (12/34)(12/34) = 144/1156  17
The following graph shows the relationship between Y and X as X = 2Y, with
$\theta = \tan^{-1}(1/2)$.
[Fig. 4.1.1: the line Y = X/2 passing through the origin; X on the horizontal axis, Y on the vertical axis.]
We observed that the variance of the estimator reduces to zero if the study
variable is exactly proportional to the auxiliary variable. As the relationship
between Y and X deviates from direct proportionality, the variance of the estimator
increases. If Y and X are perfectly correlated and the regression line passes
through the origin, then the relationship is of the type shown in Fig. 4.1.1.
Note that if Y and X are perfectly correlated but the regression line does not pass
through the origin then the accuracy of the estimators under a PPSWR design will
be decreased. Thus the conditions under which the PPSWR sampling
scheme is more efficient than SRSWR sampling are:
( a ) the study variable Y and the auxiliary variable X are highly positively correlated;
( b ) the regression line of Y on X passes through the origin.
[Figure: a regression line of Y on X with positive slope that does not pass through the origin.]
The above regression line shows a perfect positive correlation but it does not pass
through the origin and therefore the second condition is not satisfied. In this case,
the accuracy of estimators will decrease. To discuss the effect on accuracy, let us
again consider a population consisting of N = 4 units, viz., A, B, C and D.
Assume there are two variables Y and X associated with each other through the
relation Y = 2 + 0.5X as follows:

Unit identifier                          A    B    C    D
Values of the study variable (Y_i)       3    4    5    6
Values of the auxiliary variable (X_i)   2    4    6    8

Obviously we have $X = \sum_{i=1}^{4} X_i = 20$ and $Y = \sum_{i=1}^{4} Y_i = 18$. Suppose we wish to draw a
sample of n = 2 units by using PPSWR sampling. Since we are using with
replacement (WR) sampling, the total number of possible samples will be
$N^n = 4^2 = 16$. The possible 16 samples are shown in Table 4.1.2.
Evidently we have
$$E(\hat{Y}) = \sum_{j=1}^{N^n} p(j)\,\hat{Y}_j = \frac{4}{400}\times 30 + \frac{8}{400}\times 25 + \cdots + \frac{64}{400}\times 15 = 18 = Y. \qquad (4.1.3)$$
Hence $\hat{Y} = \frac{X}{n}\sum_{i=1}^{n}(y_i/x_i)$ is again an unbiased estimator of population total Y.
However the variance of this estimator is
$$V(\hat{Y}) = \frac{4}{400}(30-18)^2 + \frac{8}{400}(25-18)^2 + \cdots + \frac{64}{400}(15-18)^2 = 9.667, \qquad (4.1.4)$$
which is obviously greater than zero.
Table 4.1.2. All possible PPSWR samples of n = 2 units (Y = 2 + 0.5X).
Sample  Units   y_1  x_1  y_1/x_1  y_2  x_2  y_2/x_2  p(s)                   Estimate
1       (A, A)  3    2    1.5      3    2    1.5      (2/20)(2/20) = 4/400   30.00
2       (A, B)  3    2    1.5      4    4    1.0      (2/20)(4/20) = 8/400   25.00
3       (A, C)  3    2    1.5      5    6    0.833    (2/20)(6/20) = 12/400  23.33
4       (A, D)  3    2    1.5      6    8    0.75     (2/20)(8/20) = 16/400  22.50
5       (B, A)  4    4    1.0      3    2    1.5      (4/20)(2/20) = 8/400   25.00
6       (B, B)  4    4    1.0      4    4    1.0      (4/20)(4/20) = 16/400  20.00
7       (B, C)  4    4    1.0      5    6    0.833    (4/20)(6/20) = 24/400  18.33
8       (B, D)  4    4    1.0      6    8    0.75     (4/20)(8/20) = 32/400  17.50
9       (C, A)  5    6    0.833    3    2    1.5      (6/20)(2/20) = 12/400  23.33
10      (C, B)  5    6    0.833    4    4    1.0      (6/20)(4/20) = 24/400  18.33
11      (C, C)  5    6    0.833    5    6    0.833    (6/20)(6/20) = 36/400  16.67
12      (C, D)  5    6    0.833    6    8    0.75     (6/20)(8/20) = 48/400  15.83
13      (D, A)  6    8    0.75     3    2    1.5      (8/20)(2/20) = 16/400  22.50
14      (D, B)  6    8    0.75     4    4    1.0      (8/20)(4/20) = 32/400  17.50
15      (D, C)  6    8    0.75     5    6    0.833    (8/20)(6/20) = 48/400  15.83
16      (D, D)  6    8    0.75     6    8    0.75     (8/20)(8/20) = 64/400  15.00
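The two small populations above can be checked by brute-force enumeration of all ordered with-replacement samples. The sketch below is only an illustration: the function name `ppswr_enumeration` is hypothetical, and the estimator used is the one discussed above, the sample average of $y_i/p_i$ with $p_i = x_i/X$.

```python
from itertools import product

def ppswr_enumeration(y, x, n):
    """Enumerate all N**n ordered PPSWR samples.

    Returns the exact expectation and variance of the estimator
    (1/n) * sum(y_i / p_i), where p_i = x_i / sum(x).
    """
    X = sum(x)
    p = [xi / X for xi in x]
    results = []                      # (probability, estimate) per sample
    for sample in product(range(len(y)), repeat=n):
        prob = 1.0
        est = 0.0
        for i in sample:
            prob *= p[i]              # independent draws: probabilities multiply
            est += y[i] / p[i]
        results.append((prob, est / n))
    expectation = sum(pr * e for pr, e in results)
    variance = sum(pr * (e - expectation) ** 2 for pr, e in results)
    return expectation, variance

# Population with Y = X/2 (line through the origin)
print(ppswr_enumeration([2, 4, 5, 6], [4, 8, 10, 12], n=2))
# Population with Y = 2 + 0.5X (line not through the origin)
print(ppswr_enumeration([3, 4, 5, 6], [2, 4, 6, 8], n=2))
```

The first call should reproduce an expectation of 17 with zero variance, and the second an expectation of 18 with variance close to 9.667, up to floating point error.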
Remark 4.1.1. ( a ) The probability of selecting the i-th unit in a particular sample by
using PPSWR sampling is given by $P_i = X_i / X$, where $X = \sum_{i=1}^{N} X_i$, for i = 1,2,...,N.
( b ) The probability of selecting any particular sample s of n units from a
population $\Omega$ of N units by using PPSWR sampling is given by
$$p(s) = \prod_{i=1}^{n} p_i = \prod_{i=1}^{n} \frac{x_i}{X}. \qquad (4.1.5)$$
There are several methods of selecting a sample by using PPSWR sampling, but we
will discuss here only two methods:
4.1.1 CUMULATIVE TOTAL METHOD
The first step is to form the cumulative totals $T_i$ of the auxiliary variable, with $T_0 = 0$:

Unit    Size    Cumulative total
1       X_1     T_1 = X_1
2       X_2     T_2 = X_1 + X_2
...     ...     ...
N       X_N     T_N = X_1 + X_2 + ... + X_N
Total   $X = \sum_{i=1}^{N} X_i$     Remark: note that $X = T_N$

The second step is to select n random numbers ($R_i$, i = 1, 2, ..., n), say, between 0
and $T_N$. Suppose the first random number selected is $R_1$; if $T_{i-1} < R_1 \le T_i$,
then the i-th population unit is selected to be included in the sample. Then a
second random number $R_2$ is selected and again tested in the same manner, and the
process is continued till n units are selected. These n selected units will form a
sample s of size n with the PPSWR sampling scheme. A pictorial representation
of the method is shown below:

<--- X_1 ---> <--- X_2 ---> <--- X_3 ---> ... <--- X_i ---> ... <--- X_N --->
T_0          T_1           T_2           T_3 ... T_{i-1}       T_i ... T_{N-1}       T_N

Note that X is constant, thus the probability of selection of the i-th population unit is
proportional to the size of the i-th unit of the auxiliary variable. Hence this method is
called a method of probability proportional to size.
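The selection rule $T_{i-1} < R \le T_i$ maps directly onto a binary search over the cumulative totals. The following is a minimal sketch under that reading; the function name `cumulative_total_select` is hypothetical, and the random numbers are supplied by the caller (for example, read from a random number table) rather than generated inside the function.

```python
from bisect import bisect_left
from itertools import accumulate

def cumulative_total_select(sizes, random_numbers):
    """Select units by the cumulative total method.

    sizes          : auxiliary values X_1, ..., X_N (all positive)
    random_numbers : numbers R with 0 < R <= T_N, one per draw
    Returns the zero-based indices of the selected units (repeats allowed).
    """
    totals = list(accumulate(sizes))          # T_1, ..., T_N
    selected = []
    for r in random_numbers:
        if not 0 < r <= totals[-1]:
            raise ValueError("random number outside (0, T_N]")
        # bisect_left finds the first i with T_i >= r, i.e. T_{i-1} < r <= T_i
        selected.append(bisect_left(totals, r))
    return selected

# Small population A, B, C, D with sizes 4, 8, 10, 12 (T = 4, 12, 22, 34)
units = ["A", "B", "C", "D"]
picks = cumulative_total_select([4, 8, 10, 12], [4, 5, 22, 23, 34])
print([units[i] for i in picks])
```

Because each draw is looked up independently, the same unit can appear more than once, as with-replacement sampling requires.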
Example 4.1.1. Use the cumulative total method to select a sample of eight units
from population 1 given in the Appendix by using nonreal estate farm loans as an
auxiliary variable.
Solution. The cumulative totals of the auxiliary variable 'nonreal estate farm loans'
are given in column 2 of Table 4.1.1.1.
Table 4.1.1.1. Cumulative totals (C.T.), random numbers, and states selected.
Unit  C.T.  Random number  State selected  Unit  C.T.  Random number  State selected
1 348.334 26 25514.990
2 351.767 27 29100.390
3 783.206 28 29117.100
4 1631.523 01473 AR 29 29117.570
5 5560.255 04981,05365 CA,CA 30 29145.080
6 6466.536 31 29419.120
7 6470.909 32 29845.390
8 6514.138 33 30340.120
9 6978.654 34 31581.490
10 7519.350 35 32217.260 32063 OH
11 7557.417 36 33933.350 33313 OK
12 8563.453 37 34504.840
13 11174.030 38 34803.190
14 12196.810 39 34803.420
15 16106.550 40 34884.170
16 18686.850 41 36576.990
17 19244.510 42 36965.860
18 19650.300 43 40486.220 38107 TX
19 19701.840 44 40683.460
20 19759.530 45 40702.830
21 19816.000 46 40891.300
22 20256.520 47 42119.910
23 22723.410 22650 MN 48 42149.200
24 23272.960 49 43521.640
25 24792.950 23626 MO 50 43908.120
We used the first five columns of the Pseudo-Random Numbers (PRN) given in
Table 1 of the Appendix to select eight random numbers between 1 and
$T_N = 43908$. These selected random numbers came in the sequence 01473,
23626, 04981, 32063, 33313, 05365, 22650 and 38107. These random numbers
are shown in column 3 of Table 4.1.1.1. The last column of this table
shows the states selected in the sample from population 1 given in the
Appendix. Note that the state CA has been selected twice.
Remark 4.1.1.1. The difficulty in this method is in calculating the cumulative totals
when the population size N is too large. Thus for large populations we shall
discuss another method called Lahiri's method .
4.1.2 LAHIRI'S METHOD
Lahiri (1951) introduced a new method, which does not need cumulative totals, for
selecting a PPSWR sample, but in this method we need to know the maximum
value of $X_i$, which we denote by $X_0$. Sometimes it is not possible to know the
maximum value of $X_i$, e.g., X = number of errors in a book. In such cases, we
choose $X_0$ to be at least as large as the maximum among all values of $X_i$. In other words,
we choose $X_0$ such that $X_0 \ge \max(X_1, X_2, \ldots, X_N)$. Suppose we expect (depending
upon the quality of printing) that the maximum number of errors in a book may be
100. Then if we choose any value of $X_0$ greater than or equal to 100, for example
$X_0 = 200$, it will not affect the sampling procedure too much. The steps for
selecting a sample by using Lahiri's method are as follows:
Step 1. Select a random number $R_i$ between 1 and N.
Step 2. Select another random number $R_j$ between 1 and $X_0$.
Step 3. If $R_j \le X_i$, the i-th unit is selected; otherwise the pair $(R_i, R_j)$ is rejected.
Step 4. Repeat Steps 1 to 3 until n units are selected.
Remarks 4.1.2.1.
( i ) Any draw on which a unit is selected is called an effective draw; otherwise it is
called an ineffective draw.
( ii ) One should note that the larger the difference between the exact maximum and
$X_0$, the larger the number of rejections.
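Lahiri's acceptance and rejection scheme can be sketched in a few lines. The code below is an illustration under stated assumptions, not the book's procedure verbatim: the function name `lahiri_select` is hypothetical, and the second random number is drawn from a continuous uniform distribution on (0, X_0) instead of from integer random number tables, so that non-integer sizes are handled without rounding.

```python
import random

def lahiri_select(sizes, n, x0, rng):
    """Select n units with replacement by Lahiri's method.

    sizes : auxiliary values X_1, ..., X_N
    x0    : a bound with x0 >= max(sizes)
    rng   : a random.Random instance (assumed source of randomness)
    Returns (list of selected 1-based unit numbers, total number of draws).
    """
    if x0 < max(sizes):
        raise ValueError("x0 must be at least the largest size")
    N = len(sizes)
    selected = []
    draws = 0
    while len(selected) < n:
        draws += 1
        i = rng.randint(1, N)          # first random number: a unit label
        r = rng.uniform(0, x0)         # second random number: a size threshold
        if r <= sizes[i - 1]:          # effective draw: accept unit i
            selected.append(i)
        # otherwise the pair is rejected (ineffective draw)
    return selected, draws

rng = random.Random(2024)              # arbitrary seed, for reproducibility only
sample, draws = lahiri_select([4, 8, 10, 12], n=8, x0=12, rng=rng)
print(sample, draws)
```

Choosing a larger bound, for example `x0=100` for the same sizes, leaves the selection probabilities unchanged but increases the number of ineffective draws, which is the point of Remark ( ii ).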
Theorem 4.1.2.1. With Lahiri's method of PPSWR sampling, the probability
for the i-th population unit to get selected on the first effective draw is proportional to
the size of the i-th unit of the auxiliary variable. In other words, the probability of
selecting the i-th unit in the sample by using PPSWR sampling is given by
$$P_i = \frac{X_i}{X}, \quad \text{where } X = \sum_{i=1}^{N} X_i. \qquad (4.1.2.1)$$
Proof. We shall not take into account those draws which are ineffective because
units are selected only on the effective draws. There are two possibilities, i.e., either
the draw is effective or not effective. Thus if the first draw is effective then the
corresponding unit is selected and if the first draw is ineffective then we go to the
second draw and so on.
The probability for the i-th unit to be selected on the first draw, if it is effective, is
$$\frac{1}{N}\cdot\frac{X_i}{X_0} = \frac{X_i}{N X_0},$$
where $1/N$ is the probability for a particular unit to be selected and $X_i/X_0$ is the
probability of an effective draw.
The probability of the first draw not being effective, and consequently the
probability for the i-th unit not being selected on the first draw, is
$$1 - \frac{X_i}{N X_0}.$$
Therefore the probability that no unit among the N units is selected on the
first draw is
$$1 - \sum_{i=1}^{N}\frac{1}{N}\frac{X_i}{X_0} = 1 - \frac{\bar{X}}{X_0}, \quad \text{where } \bar{X} = N^{-1}\sum_{i=1}^{N} X_i.$$
Now the probability for the i-th unit to be selected on the second draw (in case the
first draw was ineffective and the second draw is effective) is
$$\left(1 - \frac{\bar{X}}{X_0}\right)\left(\frac{X_i}{N X_0}\right). \qquad (4.1.2.2)$$
Similarly, if the first and second draws are ineffective and the third is effective, then
the probability for the i-th unit to be selected on the third draw is
$$\left(1 - \frac{\bar{X}}{X_0}\right)^2\left(\frac{X_i}{N X_0}\right),$$
and so on.
Hence the probability that the i-th unit will be selected on the first effective draw is
$$\frac{X_i}{N X_0} + \left(1 - \frac{\bar{X}}{X_0}\right)\frac{X_i}{N X_0} + \left(1 - \frac{\bar{X}}{X_0}\right)^2\frac{X_i}{N X_0} + \cdots = \frac{X_i}{N X_0}\cdot\frac{1}{1 - (1 - \bar{X}/X_0)} = \frac{X_i}{N\bar{X}} = \frac{X_i}{X}. \qquad (4.1.2.3)$$
Hence the theorem.
Example 4.1.2.1. Use Lahiri's method to select a sample of eight units from
population 1 given in the Appendix by using nonreal estate farm loans as an
auxiliary variable .
Solution. The maximum value of nonreal estate farm loans is $3928.732. Therefore
we decided to choose $X_0 = 4000$. We started with the first two columns of the
Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix to choose a
random number $R_i$ between 1 and N = 50 and reported it in the second column of
Table 4.1.2.1. We started with the 7th to 10th columns to choose another random number
$R_j$ between 1 and 4000, and the random numbers so obtained have been presented
in the third column of Table 4.1.2.1. The value of the nonreal estate farm loans
($X_i$) corresponding to the first random number $R_i$ has been recorded in the 4th
column. The 5th column of Table 4.1.2.1 has been devoted to making the
decision following Lahiri's instructions:
( a ) If $R_j > X_i$ then the pair of selected random numbers $(R_i, R_j)$ is rejected and
marked 'R' in the 5th column.
( b ) If $R_j \le X_i$ then the pair of random numbers $(R_i, R_j)$ is selected and marked 'S'
in the 5th column.
Table 4.1.2.1. Lahiri's method of sample selection.
Draw  R_i  R_j   X_i       Decision
26    06   3849  906.281   R
27    07   3466  4.373     R
28    42   0270  388.869   S
29    21   3490  56.471    R
30    31   1064  274.035   R
31    31   1101  274.035   R
32    36   3770  1716.087  R
33    16   2300  2580.304  S
34    27   1036  3585.406  S
35    10   3688  540.696   R
36    18   3591  405.799   R
37    26   0747  722.034   R
38    48   2486  29.291    R
39    02   0949  3.433     R
40    44   0635  197.244   R
41    12   2000  1006.036  R
42    49   0905  1372.439  S
Thus this method will include the following states in the sample.
1 2 3 4 5 6 7 8
23 05 38 03 42 16 27 49
MN CA PA AZ TN KS NE WI
We have seen that the units within the sample have different probabilities of being
selected. Also it follows from (4.1.5) that in the PPSWR sampling scheme, unlike
SRSWR or SRSWOR sampling, different samples have different probabilities of selection
from a given population. Now we shall discuss the problem of estimation of the
population total using PPSWR sampling.
$$\hat{Y}_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}. \qquad (4.2.1)$$
Now $y_i/p_i$ is a random variable and can take the values $Y_1/P_1, Y_2/P_2, \ldots, Y_N/P_N$.
Since the draws are independent, the probability for the unit with value $Y_1/P_1$ to
be selected is $P_1$, for $Y_2/P_2$ it is $P_2$, and so on.
Therefore
$$E(y_i/p_i) = \sum_{i=1}^{N} P_i\,(Y_i/P_i) = \sum_{i=1}^{N} Y_i = Y. \qquad (4.2.3)$$
Using (4.2.3) in (4.2.2) we obtain
$$E(\hat{Y}_{HH}) = \frac{1}{n}\sum_{i=1}^{n} Y = Y. \qquad (4.2.4)$$
Hence the theorem.
$$E(\hat{Y}_{HH}) = E\left[\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}\right] = \frac{1}{n}E\left[\sum_{i=1}^{N} r_i\,(Y_i/P_i)\right] = \frac{1}{n}\sum_{i=1}^{N} E(r_i)\,(Y_i/P_i). \qquad (4.2.5)$$
Obviously $r_i$ is a binomial variate and it can take any value between 0 and n
depending upon how many times the i-th unit is selected in the sample under the
PPSWR sampling scheme with probability of success $P_i$.
Thus
$$E(r_i) = n P_i. \qquad (4.2.6)$$
Therefore (4.2.5) reduces to
$$E(\hat{Y}_{HH}) = \frac{1}{n}\sum_{i=1}^{N} E(r_i)\,(Y_i/P_i) = \frac{1}{n}\sum_{i=1}^{N} n P_i\,(Y_i/P_i) = \sum_{i=1}^{N} Y_i = Y. \qquad (4.2.7)$$
Hence the theorem.
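Proof. Again there are two methods to prove this theorem as follows:
Method I. Note that the draws are independent in PPSWR sampling, therefore
$$V(\hat{Y}_{HH}) = V\left(\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}\right) = \frac{1}{n^2}\sum_{i=1}^{n} V\left(\frac{y_i}{p_i}\right). \qquad (4.2.9)$$
Now we have
$$V\left(\frac{y_i}{p_i}\right) = E\left(\frac{y_i^2}{p_i^2}\right) - \left[E\left(\frac{y_i}{p_i}\right)\right]^2 = \sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2. \qquad (4.2.10)$$
Further note that $y_i^2/p_i^2$ takes the value $Y_i^2/P_i^2$ with probability $P_i$ and $y_i/p_i$ takes
the value $Y_i/P_i$ with probability $P_i$. On using (4.2.10) in (4.2.9) we have the theorem.
Method II. Again using the binomial variate $r_i$ defined in (4.2.6) we have
$$E(r_i r_j) = E_1 E_2[r_i r_j \mid r_i] = E_1[r_i E_2(r_j \mid r_i)] = E_1\left[r_i (n - r_i)\frac{P_j}{1 - P_i}\right] = \frac{P_j}{1 - P_i} E_1[n r_i - r_i^2]$$
$$= \frac{P_j}{1 - P_i}\left[n E_1(r_i) - E_1(r_i^2)\right] = \frac{P_j}{1 - P_i}\left[n \cdot n P_i - \{V(r_i) + (n P_i)^2\}\right] = n(n-1) P_i P_j. \qquad (4.2.14)$$
From (4.2.13) and (4.2.14) we have
$$Cov(r_i, r_j) = n(n-1) P_i P_j - n P_i \cdot n P_j = -n P_i P_j. \qquad (4.2.15)$$
Note that
$$Y^2 = \left(\sum_{i=1}^{N} Y_i\right)^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i \ne j = 1}^{N} Y_i Y_j,$$
and putting $V(r_i)$ and $Cov(r_i, r_j)$ in (4.2.11) we have
$$V(\hat{Y}_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i}(1 - P_i) - \sum_{i \ne j = 1}^{N} Y_i Y_j\right] = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right].$$
$$v(\hat{Y}_{HH}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n\hat{Y}_{HH}^2\right]. \qquad (4.2.16)$$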
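Proof. We have
$$E[v(\hat{Y}_{HH})] = \frac{1}{n(n-1)} E\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n\hat{Y}_{HH}^2\right] = \frac{1}{n-1}\left[\frac{1}{n}\sum_{i=1}^{n} E\left(\frac{y_i^2}{p_i^2}\right) - E(\hat{Y}_{HH}^2)\right]$$
$$= \frac{1}{n-1}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - \left\{V(\hat{Y}_{HH}) + (E(\hat{Y}_{HH}))^2\right\}\right]$$
$$= \frac{1}{n-1}\left[\left(\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right) - \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right)\right] = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right].$$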
Example 4.2.1. For estimating the total amount of the real estate farm loans (in
$000) during 1997 in the United States we took a PPSWR sample of eight states
from population 1 given in the Appendix by using Lahiri's method. From the
states selected in the sample we gathered the following information.

State                          AZ       CA        KS        MN        NE        PA       TN       WI
Nonreal estate farm loans, x   431.439  3928.732  2580.304  2466.892  3585.406  298.351  388.869  1372.439
Real estate farm loans, y      54.633   1343.461  1049.834  1354.768  1337.852  756.169  553.266  1229.572

Assume that the total amount $43908.12 of nonreal estate farm loans (in $000) for
the year 1997 is known. Apply the Hansen and Hurwitz (1943) estimator for
estimating the total amount of the real estate farm loans (in $000) during 1997 in
the United States. Also find an estimator of the variance of the estimator used and
hence deduce a 95% confidence interval.
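$$v(\hat{Y}_{HH}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n\hat{Y}_{HH}^2\right] = \frac{1}{8(8-1)}\left[19259722369 - 8\times(36503.7)^2\right] = 153563601.9.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
$$\hat{Y}_{HH} \mp t_{\alpha/2}(df = n-1)\sqrt{v(\hat{Y}_{HH})}, \quad \text{that is,} \quad 36503.7 \mp 2.365\sqrt{153563601.9}.$$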
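The figures in this example can be reproduced from the eight sampled states. The sketch below is illustrative only: the function name `hansen_hurwitz` is hypothetical, the selection probabilities are taken as $p_i = x_i/X$ with X = 43908.12, and the t-value 2.365 (7 degrees of freedom) is an assumed table value.

```python
import math

def hansen_hurwitz(y, x, X_total):
    """Hansen-Hurwitz estimate of a population total under PPSWR sampling.

    y, x    : study and auxiliary values of the sampled units
    X_total : known population total of the auxiliary variable
    Returns (estimate of the total, estimate of its variance).
    """
    n = len(y)
    z = [yi * X_total / xi for yi, xi in zip(y, x)]   # y_i / p_i, p_i = x_i / X
    total = sum(z) / n
    variance = (sum(zi ** 2 for zi in z) - n * total ** 2) / (n * (n - 1))
    return total, variance

# Sampled states AZ, CA, KS, MN, NE, PA, TN, WI (Example 4.2.1)
x = [431.439, 3928.732, 2580.304, 2466.892, 3585.406, 298.351, 388.869, 1372.439]
y = [54.633, 1343.461, 1049.834, 1354.768, 1337.852, 756.169, 553.266, 1229.572]

total, variance = hansen_hurwitz(y, x, X_total=43908.12)
half_width = 2.365 * math.sqrt(variance)   # assumed t-value with 7 df
print(round(total, 1), round(variance, 1))
print(round(total - half_width, 1), round(total + half_width, 1))
```

The wide interval reflects the small sample and the large spread of the ratios $y_i/x_i$ across the selected states.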
Example 4.2.2. The amounts of the real and nonreal estate farm loans (in $000)
during 1997 in different 50 states of the United States have been presented in
population 1 of the Appendix. Consider we selected a PPSWR sample of eight
states to collect the required information. Find the relative efficiency of the PPSWR
sampling estimator for estimating total amount of the real estate farm loans during
1997 by using information on the nonreal estate farm loans during 1997 with
respect to the ratio estimator of population total based on SRSWR sample.
The mean squared error of the ratio estimator
$$\hat{Y}_R = N\bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)$$
of population total Y is given by
$$MSE(\hat{Y}_R) = \frac{N^2\bar{Y}^2}{n}\left[C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y\right]$$
$$= \frac{(2500)(555.43)^2}{8}\left[1.1086 + 1.5256 - 2\times 0.8038\sqrt{1.1086\times 1.5256}\right] = 52399977.81.$$
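The variance of the estimator $\hat{Y}_{HH}$ of the population total under PPSWR sampling
is given by
$$V(\hat{Y}_{HH}) = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right)$$
under the constraint $\sum_{i=1}^{N} P_i = 1$.
The Lagrange function is given by
$$L = \sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2 + \lambda^2\left(\sum_{i=1}^{N} P_i - 1\right).$$
Now
$$\frac{\partial L}{\partial P_i} = 0$$
implies that
$$-\frac{Y_i^2}{P_i^2} + \lambda^2 = 0 \quad \text{or} \quad P_i = \frac{Y_i}{|\lambda|}, \quad Y_i \ge 0 \ \forall\, i.$$
Note that $\sum_{i=1}^{N} P_i = 1$, which implies that $P_i = Y_i/Y$ and
$$V(\hat{Y}_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right] = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{Y_i/Y} - Y^2\right] = \frac{1}{n}\left[Y\sum_{i=1}^{N} Y_i - Y^2\right] = \frac{1}{n}\left[Y^2 - Y^2\right] = 0.$$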
As noted, PPSWR sampling is not necessarily always more efficient than SRSWR
sampling; the next section is devoted to studying the efficiency conditions.
Here we shall discuss two different methods based on superpopulation and cost
aspects in survey sampling.
Theorem 4.3.1. PPSWR sampling remains more efficient than SRSWR
sampling if the regression line passes through or near the origin.
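Proof. In order to study the relative efficiency of PPSWR sampling with respect to
SRSWR sampling we consider the superpopulation model approach,
following Foreman and Brewer (1971). The superpopulation model is defined by
the linear relationship between Y and X as follows
$$Y_i = a + bX_i + e_i, \qquad (4.3.1.1)$$
where $e_i$ is an error term satisfying the following assumptions:
$$E(e_i \mid X_i) = 0, \quad E(e_i^2 \mid X_i) = a' X_i^g, \quad E(e_i e_j \mid X_i, X_j) = 0 \ (i \ne j).$$
Let $V_1 = N^2\sigma_y^2/n$ and $V_2 = \sigma_z^2/n$ denote the variances under SRSWR and PPSWR
sampling, respectively, where
$$\sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \quad \text{and} \quad \sigma_z^2 = \sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2.$$
Now we have
$$E(V_1) = \frac{N^2}{n}E(\sigma_y^2) = \frac{N^2}{n}E\left[\frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2\right] = \frac{N}{n}E\sum_{i=1}^{N}(Y_i - \bar{Y})^2.$$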
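Also
$$Y_i - \bar{Y} = b(X_i - \bar{X}) + (e_i - \bar{e}).$$
Thus we have
$$E(V_1) = \frac{N}{n}\sum_{i=1}^{N} E\left[b^2(X_i - \bar{X})^2 + (e_i - \bar{e})^2 + 2b(X_i - \bar{X})(e_i - \bar{e})\right]. \qquad (4.3.1.4)$$
Now we have
$$E(e_i - \bar{e})^2 = a' X_i^g + \frac{a'}{N^2}\sum_{j=1}^{N} X_j^g - \frac{2a'}{N} X_i^g$$
and
$$E(X_i - \bar{X})(e_i - \bar{e}) = E\left[e_i(X_i - \bar{X}) - \bar{e}(X_i - \bar{X})\right] = (X_i - \bar{X})\left[E(e_i \mid X_i) - E(\bar{e})\right] = 0.$$
Putting these values in (4.3.1.4) we obtain
$$E(V_1) = \frac{N}{n}\sum_{i=1}^{N}\left[b^2(X_i - \bar{X})^2 + a' X_i^g + \frac{a'}{N^2}\sum_{j=1}^{N} X_j^g - \frac{2a'}{N} X_i^g + 2b \cdot 0\right]$$
$$= \frac{N^2}{n}\left[b^2\sigma_x^2 + \frac{a'}{N}\sum_{i=1}^{N} X_i^g - \frac{a'}{N^2}\sum_{i=1}^{N} X_i^g\right], \qquad (4.3.1.5)$$
where $\sigma_x^2 = N^{-1}\sum_{i=1}^{N}(X_i - \bar{X})^2$.
Obviously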
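$$E(V_2) = \frac{1}{n}E\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right] = \frac{1}{n}\left[X\sum_{i=1}^{N}\frac{(a + bX_i)^2 + a' X_i^g}{X_i} - \left\{\sum_{i=1}^{N} a' X_i^g + \left(\sum_{i=1}^{N}(a + bX_i)\right)^2\right\}\right]$$
$$= \frac{1}{n}\left[a^2 X\sum_{i=1}^{N}\frac{1}{X_i} + a' X\sum_{i=1}^{N} X_i^{g-1} - a'\sum_{i=1}^{N} X_i^g - N^2a^2\right]. \qquad (4.3.1.7)$$
Let the harmonic mean of the $X_i$ be
$$HM = N\Big/\sum_{i=1}^{N}\frac{1}{X_i} = \tilde{X} \ \text{(say)},$$
which implies that
$$\sum_{i=1}^{N}\frac{1}{X_i} = \frac{N}{\tilde{X}}, \quad \text{and} \quad X = N\bar{X}.$$
Therefore (4.3.1.7) becomes
$$E(V_2) = \frac{1}{n}\left[N^2a^2\left(\frac{\bar{X}}{\tilde{X}} - 1\right) - a'\sum_{i=1}^{N} X_i^g + a' N\bar{X}\sum_{i=1}^{N} X_i^{g-1}\right]. \qquad (4.3.1.9)$$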
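From (4.3.1.9) and (4.3.1.5) the PPSWR sampling will be more efficient than
SRSWR if
$$E(V_1) - E(V_2) > 0 \qquad (4.3.1.10)$$
or if
$$\frac{N^2}{n}\left[b^2\sigma_x^2 + \frac{a'}{N}\sum_{i=1}^{N} X_i^g - \frac{a'}{N^2}\sum_{i=1}^{N} X_i^g\right] - \frac{1}{n}\left[N^2a^2\left(\frac{\bar{X}}{\tilde{X}} - 1\right) - a'\sum_{i=1}^{N} X_i^g + a' N\bar{X}\sum_{i=1}^{N} X_i^{g-1}\right] > 0$$
or if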
r=~~xrl .
N i=1
From (4.3 .1.11) and (4.3 .1.13) we find that the condition under which PPSWR
sampling is better than SRSWR is
In such situations PPSWR sampling will be less efficient than SRSWR sampling.
Hence the theorem.
4.3.2 COST ASPECT
Suppose the total cost of the survey depends upon the number of draws and the total
size of the sampled units. For simplicity, let us consider the following cost function
$$C = C_0 + nC_1 + C_2\sum_{i=1}^{n} x_i, \qquad (4.3.2.2)$$
where $C_0$ is the overhead cost, $C_1$ is the cost per sampled unit and $C_2$ is the cost of
collecting data per unit size of the sample. In (4.3.1.2) the values of $C_0$, $C_1$, $C_2$
( a ) For SRSWR sampling $E\left(\sum_{i=1}^{n} x_i\right) = E(n\bar{x}) = n\bar{X}$, thus the expected cost of the
survey is given by
$$E(C) = C^* = C_0 + nC_1 + n\bar{X}C_2. \qquad (4.3.2.3)$$
The Lagrange function is then given by
Ili
Setting oLz/olI = 0 implies
II =~p;( P;
1=1
Y; _ r)Z- Ji
A 1 N
Min.v(rpPSWR)=-IP;(---C..-r) =
Y. Z C1 +Cz .I X i
N
.1-1
z/ 1.I P;(---C..- r)z
X N Y,
(4.3.2.10)
P; P;
I
11 i=1 C - Co 1=1
Using (4.3.2.6) and (4.3.2.10) the relative efficiency of PPSWR sampling with
respect to SRSWR sampling for the fixed cost of the survey is given by
Chapter 4: Useof auxiliary information: PPSWR Sampling 317
RECost = . (,
Mm.V YpPSWR
(')
M in ,VYsRswR
) = (C] + C2
-
X { N
C] + C2 ~ Xi
I-I
2 IXJ-] x RE. (4.3.2.11)
where $C_x$ denotes the coefficient of variation of the auxiliary variable X. The larger the
variation in the auxiliary variable, the lower the efficiency of PPSWR sampling
for the expected cost. If there is no variation in the auxiliary variable, i.e., $C_x = 0$,
then $RE_{Cost} = RE$.
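The effect of the variation in X on the cost-adjusted efficiency can be illustrated numerically. The sketch below assumes the multiplier of RE has the form $(C_1 + C_2\bar{X})/(C_1 + C_2\sum X_i^2/X)$, which follows from the expected sample size under each design; the function name `cost_adjustment_factor` and the cost values used are hypothetical.

```python
def cost_adjustment_factor(sizes, c1, c2):
    """Multiplier applied to RE to obtain the fixed-cost relative efficiency.

    sizes : auxiliary values X_1, ..., X_N
    c1    : cost per sampled unit
    c2    : cost per unit of size
    """
    N = len(sizes)
    X = sum(sizes)
    mean_size = X / N                                  # expected size per draw, SRSWR
    pps_mean_size = sum(s * s for s in sizes) / X      # expected size per draw, PPSWR
    return (c1 + c2 * mean_size) / (c1 + c2 * pps_mean_size)

# No variation in X: the factor is 1, so RE_Cost = RE
print(cost_adjustment_factor([5, 5, 5, 5], c1=1.0, c2=1.0))
# Sizes 4, 8, 10, 12: the factor drops below 1
print(cost_adjustment_factor([4, 8, 10, 12], c1=1.0, c2=1.0))
```

Since $\sum X_i^2/X = \bar{X}(1 + C_x^2)$ when $C_x$ is computed with divisor N, the factor can never exceed one, and it shrinks as the sizes become more variable.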
'WRSAMPLIN
.! AILAnJ:,E:: e: :.' .0
We have observed in the previous chapter that for a simple random sample (SRS) of
size n drawn with replacement from a population of size N, the usual estimators
$\hat{R} = \bar{y}/\bar{x}$ and $\hat{P} = \bar{y}\,\bar{x}$ for estimating the ratio $R = \bar{Y}/\bar{X}$ and product $P = \bar{Y}\bar{X}$,
respectively, are well known. Prasad (1989) has increased the efficiency of the
estimator of R by certain scalar multiplication. The estimators of the product P
proposed by Beale (1962), Robson (1957), Tin (1965) and Sahoo (1983) are found
to be special cases of the class of estimators suggested by Singh and Sahoo (1989).
The efficiency of the estimators of the ratio and product available in the literature is
generally low. It is well known that suitable use of auxiliary information in
probability sampling results in considerable reduction in variance of the estimators
used for estimating a population parameter. Deng and Wu (1987) have increased
the efficiency of the estimator of variance of linear regression estimator by making
use of prior information about the population mean of the auxiliary variable .
Srivastava and Jhajj (1980, 1986) have shown that the efficiency of the estimators
for estimating finite population variance and the finite population correlation
coefficient, $\rho_{xy}$, may be increased by making use of some known population
parameters of the auxiliary variable at the estimation stage. If there is more than
one auxiliary variable, the problem remains as to how the entire information can be
utilised in a better way. Multi-variate ratio and regression methods of estimation
provide a solution to the problem. Singh (1967a, 1969) has also suggested a method
of using two auxiliary variates and the estimators suggested by him would be more
efficient than the usual ratio estimator under some conditions. Agarwal and Kumar
(1980) have used two auxiliary variables: one at the stage of selection of the sample
and the other at the estimation stage. They then took the best linear combination of the
probability proportional to size (PPS) estimator and the ratio estimator so obtained
to estimate the population mean of the study variable. Jhajj and Srivastava (1983)
have proposed a class of estimators of the population mean of the study variable
when the sampling is done with probability proportional to a suitable
size variable which is different from the auxiliary variable used at the estimation stage.
We first discuss the class of estimators proposed by Jhajj and
Srivastava (1983) and show that other estimators available in the literature are
special cases of their class of estimators.
4.4.1 NOTATION AND EXPECTATIONS
Assume that information on two auxiliary variables $X_i$ and $Z_i$ (say) for $i = 1,2,...,N$,
highly correlated with the study variable $Y$, is available. Let a sample of size $n$ be
drawn with PPSWR sampling. Let $P_i = Z_i/\sum_{i=1}^{N} Z_i$ denote the probability of selection
on the basis of the known auxiliary variable $Z_i$, $i = 1, 2,..., N$. Let $Y_i$ and $X_i$ denote the
value of the variable under study $Y$ and the auxiliary variable $X$ for the $i$th unit of
the population, and $\bar Y$ and $\bar X$ denote their population means respectively. Let
$$u_i = \frac{y_i}{NP_i}, \quad v_i = \frac{x_i}{NP_i}, \quad \bar u_n = n^{-1}\sum_{i=1}^{n} u_i \quad\text{and}\quad \bar v_n = n^{-1}\sum_{i=1}^{n} v_i.$$
Defining
$$\delta_0 = \frac{\bar u_n}{\bar Y} - 1 \quad\text{and}\quad \delta = \frac{\bar v_n}{\bar X} - 1$$
we have
$$E(\delta_0) = E(\delta) = 0$$
and
$$E(\delta_0^2) = n^{-1}C_u^2, \quad E(\delta^2) = n^{-1}C_v^2, \quad E(\delta_0\delta) = n^{-1}\rho_{uv}C_uC_v,$$
where
$C_u^2 = \sigma_u^2/\bar Y^2$ is the relative variance of $\bar u_n$ with $\sigma_u^2 = \sum_{i=1}^{N} P_i(u_i - \bar Y)^2$ (the quantity $C_v^2 = \sigma_v^2/\bar X^2$ is defined analogously from the $v_i$),
and
$$\rho_{uv} = \frac{\sigma_{uv}}{\sigma_u\sigma_v} = \frac{Cov(\bar u_n, \bar v_n)}{\sqrt{V(\bar u_n)V(\bar v_n)}}$$
is the correlation between $Y_i$ and $X_i$ for PPSWR sampling,
where $\sigma_{uv} = \sum_{i=1}^{N} P_i(u_i - \bar Y)(v_i - \bar X)$.
It may be noted that the results for SRSWR sampling are straightforward if
$P_i = 1/N\ \forall\ i$ (i.e., if $Z_i = 1\ \forall\ i$). It is worth noting that the same auxiliary variable
cannot be used at both the selection stage and the estimation stage. In other words, if
$Z_i = X_i\ \forall\ i = 1,2,..., N$, then $E(\delta^2) = 0$ and $\rho_{uv}$ will take an indeterminate form. Thus
it is compulsory to use one auxiliary variable at the selection stage and another at
the estimation stage.
Jhajj and Srivastava (1983) defined a class of ratio type estimators with PPSWR
sampling for estimating the population mean $\bar Y$ as
$$\bar y_{JS} = \bar u_n H(w), \qquad (4.4.2.1)$$
where $w = \bar v_n/\bar X$ and $H(\cdot)$ is a parametric function such that $H(1) = 1$, satisfying
certain regularity conditions. Expanding $H(w)$ around the point 1 by using Taylor's
second order series we have
$$\bar y_{JS} = \bar u_n H(w) = \bar u_n H[1 + (w-1)] = \bar u_n\left[H(1) + (w-1)\left.\frac{\partial H}{\partial w}\right|_{w=1} + (w-1)^2\,\frac{1}{2}\left.\frac{\partial^2 H}{\partial w^2}\right|_{w=1} + \cdots\right]. \qquad (4.4.2.2)$$
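Theorem 4.4.2.1. The bias, up to terms of order $O(n^{-1})$, in the class of ratio type
estimators in PPSWR sampling is
Theorem 4.4.2.2. The minimum mean squared error of the general class of ratio
type estimators $\bar y_{JS}$ for estimating population mean $\bar Y$ is given by
$$\text{Min.}MSE(\bar y_{JS}) = \frac{\bar Y^2}{n}C_u^2\left[1 - \rho_{uv}^2\right]. \qquad (4.4.2.5)$$
Proof. Writing $H_1$ for the first order derivative of $H$ at $w = 1$, we have, to the first order of approximation,
$$MSE(\bar y_{JS}) = \bar Y^2E\left[\delta_0^2 + \delta^2H_1^2 + 2\delta_0\delta H_1\right] = \frac{\bar Y^2}{n}\left[C_u^2 + H_1^2C_v^2 + 2\rho_{uv}C_uC_vH_1\right]. \qquad (4.4.2.6)$$
On differentiating (4.4.2.6) with respect to $H_1$ and equating to zero we have
$$H_1 = -\rho_{uv}\frac{C_u}{C_v}. \qquad (4.4.2.7)$$
On substituting the value of $H_1$ from (4.4.2.7) in (4.4.2.6) we obtain the minimum
mean squared error given by (4.4.2.5). Hence the theorem.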
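The substitution step can be written out explicitly. The following worked lines are a sketch that uses only the mean squared error $\frac{\bar Y^2}{n}[C_u^2 + H_1^2C_v^2 + 2\rho_{uv}C_uC_vH_1]$ and the optimum value $H_1 = -\rho_{uv}C_u/C_v$ stated in the proof.

```latex
\begin{aligned}
MSE(\bar y_{JS})\Big|_{H_1 = -\rho_{uv} C_u / C_v}
 &= \frac{\bar Y^2}{n}\left[C_u^2
    + \rho_{uv}^2\frac{C_u^2}{C_v^2}\,C_v^2
    - 2\rho_{uv}C_uC_v\,\rho_{uv}\frac{C_u}{C_v}\right] \\
 &= \frac{\bar Y^2}{n}\left[C_u^2 + \rho_{uv}^2C_u^2 - 2\rho_{uv}^2C_u^2\right]
  = \frac{\bar Y^2}{n}\,C_u^2\left[1 - \rho_{uv}^2\right].
\end{aligned}
```

The second derivative of the bracket with respect to $H_1$ is $2C_v^2 > 0$, so this stationary value is indeed a minimum.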
Remark 4.4.2.1. One can easily observe that the following estimators of the
population mean $\bar Y$ are special cases of the general class of ratio type estimators
under PPSWR sampling:
$$\bar y_1 = \alpha\bar u_n + (1-\alpha)\bar u_n\left(\frac{\bar X}{\bar v_n}\right), \quad \bar y_2 = \bar u_n\left(\frac{\bar v_n}{\bar X}\right)^{\alpha}, \quad\text{and}\quad \bar y_3 = \bar u_n\left[\frac{\alpha\bar v_n}{\bar X} + (1-\alpha)\right]^{-1}.$$
The estimator $\bar y_1$ was proposed by Agarwal and Kumar (1980), while the estimators
$\bar y_2$ and $\bar y_3$ are the analogues of the estimators proposed by Srivastava (1967) and
Walsh (1970) for PPSWR sampling strategies. Estimation of the population mean
using supplementary information has been considered by Unam (1995), which
compares ratio type estimators with PPSWR sampling through
simulation.
Thus all ratio type estimators are special cases of the class of estimators $\bar y_{JS}$. In
order to cover the regression type estimators, Jhajj and Srivastava (1983) also
proposed a wider class of estimators as discussed in the following section.
( a ) The first and second order partial derivatives of the function $H$ with respect to
$\bar u_n$ and $w$ exist and are known constants;
( b ) $\left.\dfrac{\partial H}{\partial \bar u_n}\right|_{(\bar Y, 1)} = 1$.
Expanding $H(\bar u_n, w)$ around the point $(\bar Y, 1)$ by Taylor's second order series we
have
$$\bar y_{JSW} = H\left[\bar Y + (\bar u_n - \bar Y),\ 1 + (w-1)\right],$$
whose expected value, up to terms of order $O(n^{-1})$, is
$$E(\bar y_{JSW}) = \bar Y + \frac{1}{n}\left[C_v^2H_{02} + \bar Y^2C_u^2H_{20} + \bar Y\rho_{uv}C_uC_vH_{11}\right].$$
Thus the required bias is given by
$$B(\bar y_{JSW}) = E(\bar y_{JSW}) - \bar Y = \frac{1}{n}\left[C_v^2H_{02} + \bar Y^2C_u^2H_{20} + \bar Y\rho_{uv}C_uC_vH_{11}\right].$$
Hence the theorem.
Theorem 4.4.3.2. The minimum mean squared error of the wider class of
estimators $\bar y_{JSW}$ is given by
rc, 2~
On substituting the value of H I in (4.4.3.6) we have
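Remark 4.4.3.1. It may be noted that estimators of population mean $\bar Y$ of the form
$$\bar y_{10} = \bar u_n + \alpha(\bar X - \bar v_n) \quad\text{and}\quad \bar y_{11} = \bar u_n + \hat\alpha(\bar X - \bar v_n),$$
where $\hat\alpha = \sum_{i=1}^{n} p_i(u_i - \bar u_n)(v_i - \bar v_n)\big/\sum_{i=1}^{n} p_i(v_i - \bar v_n)^2$ is an estimator of $\alpha$, are special cases
of the wider class of estimators.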
Example 4.4.3.1. Select a PPSWR sample of six units from population 3 given in
the Appendix by Lahiri's method using the season average price during 1994 of
the apple crop in the United States as a benchmark or auxiliary variable. Collect the
information on the prices of the apple crop during 1995 and 1996 from population 3
for the units selected in the sample. Use the estimator
$$\bar y_{11} = \bar u_n + \hat\alpha(\bar X - \bar v_n)$$
for estimating the average price of the apple crop during 1996 by assuming that the
average price $\bar X = 0.1856$ during 1995 is known. Also construct the 95% confidence
interval for the average price of the apple crop during 1996 in the United States.
Then the first six effective pairs to select a sample of six units are (01, .075),
(05, .155), (29, .012), (36, .099), (19, .027), and (29, .039). Ultimately we have the
following sampled information.
Unit             1           2           3           4           5           6           Sum
State            AZ          CT          MO          SC          SC          WI
z_i              0.0780000   0.2830000   0.1980000   0.1300000   0.1300000   0.2300000
p_i              0.0126788   0.0460013   0.0321847   0.0211313   0.0211313   0.0373862
x_i              0.0710000   0.2760000   0.1600000   0.1260000   0.1260000   0.2410000
y_i              0.1220000   0.2920000   0.2280000   0.1260000   0.1260000   0.1330000
u_i              0.2672877   0.1763235   0.1967811   0.1656308   0.1656308   0.0988184   1.0704723
v_i              0.1555527   0.1666620   0.1380920   0.1656308   0.1656308   0.1790618   0.9706301
p_i u_i* v_i*   -0.0000070  -0.0000005  -0.0000140  -0.0000010  -0.0000010  -0.0000515  -0.0000750
p_i u_i*^2       0.0001001   0.0000002   0.0000109   0.0000035   0.0000035   0.0002368   0.0003550
p_i v_i*^2       0.0000005   0.0000011   0.0000180   0.0000003   0.0000003   0.0000112   0.0000314
where
$u_i^* = (u_i - \bar u_n)$, $v_i^* = (v_i - \bar v_n)$, $p_i = z_i/\sum_{i=1}^{N} z_i$, with $\sum_{i=1}^{N} z_i = 6.152$ (given), $u_i = y_i/(Np_i)$,
and $v_i = x_i/(Np_i)$.
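Then we have
$$\bar u_n = n^{-1}\sum_{i=1}^{n} u_i = 0.178412, \quad \bar v_n = n^{-1}\sum_{i=1}^{n} v_i = 0.161772, \quad\text{and}\quad \hat\alpha = \frac{\sum_{i=1}^{n} p_iu_i^*v_i^*}{\sum_{i=1}^{n} p_iv_i^{*2}} = -2.38563.$$
Thus an estimator of the average price of the apple crop during 1996 is given by
$$\bar y_{11} = \bar u_n + \hat\alpha(\bar X - \bar v_n) = 0.178412 - 2.385636\times(0.1856 - 0.161772) = 0.121567.$$
Taking
$$s_u^2 = \sum_{i=1}^{n} p_i(u_i - \bar u_n)^2 = 0.000355$$
and
$$\hat\rho_{uv} = \frac{\sum_{i=1}^{n} p_i(u_i - \bar u_n)(v_i - \bar v_n)}{\sqrt{\sum_{i=1}^{n} p_i(u_i - \bar u_n)^2\,\sum_{i=1}^{n} p_i(v_i - \bar v_n)^2}} = \frac{-0.0000750}{\sqrt{0.0003550\times 0.0000314}} = -0.71036,$$
we have
$$MSE(\bar y_{11}) = \frac{0.000355}{6}\left[1 - (-0.71036)^2\right] = 0.00002931.$$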
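The arithmetic of this example can be retraced with a short script. The following is a minimal sketch: the data are the six sampled values of Example 4.4.3.1, the function name is hypothetical, and the population size $N = 36$ is an assumption inferred from the tabulated values of $u_i$ rather than stated in the example.

```python
import math

# Sampled values from Example 4.4.3.1 (states AZ, CT, MO, SC, SC, WI)
z = [0.078, 0.283, 0.198, 0.130, 0.130, 0.230]   # 1994 price, selection variable
x = [0.071, 0.276, 0.160, 0.126, 0.126, 0.241]   # 1995 price, auxiliary variable
y = [0.122, 0.292, 0.228, 0.126, 0.126, 0.133]   # 1996 price, study variable
Z_TOTAL = 6.152   # population total of z, given in the example
X_BAR = 0.1856    # known population mean of x
N_POP = 36        # ASSUMPTION: inferred from the tabulated u_i, not stated in the text


def regression_type_estimate(y, x, z, z_total, x_bar, n_pop):
    """Return (estimate, alpha_hat, rho_hat, mse) for the estimator u_bar + alpha_hat*(X_bar - v_bar)."""
    n = len(y)
    p = [zi / z_total for zi in z]
    u = [yi / (n_pop * pi) for yi, pi in zip(y, p)]
    v = [xi / (n_pop * pi) for xi, pi in zip(x, p)]
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    # Probability-weighted sums of squares and cross products
    s_uv = sum(pi * (ui - u_bar) * (vi - v_bar) for pi, ui, vi in zip(p, u, v))
    s_uu = sum(pi * (ui - u_bar) ** 2 for pi, ui in zip(p, u))
    s_vv = sum(pi * (vi - v_bar) ** 2 for pi, vi in zip(p, v))
    alpha_hat = s_uv / s_vv
    rho_hat = s_uv / math.sqrt(s_uu * s_vv)
    estimate = u_bar + alpha_hat * (x_bar - v_bar)
    mse = s_uu / n * (1 - rho_hat ** 2)
    return estimate, alpha_hat, rho_hat, mse


estimate, alpha_hat, rho_hat, mse = regression_type_estimate(y, x, z, Z_TOTAL, X_BAR, N_POP)
```

Because the script carries full precision throughout, the last digits of the correlation and the mean squared error may differ slightly from the rounded hand computation.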
Using Tab le 2 from the Appendix the 95% confidence interva l of the average price
of the apple crop during 1996 in the United States is
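Consider the situation in which $y$ and $x$ are highly negatively correlated and $X = \sum_{i=1}^{N} X_i > n\,\text{Max}(x_i)$.
Srivenkataramana (1980) and Sahoo, Sahoo, and Mohanty (1994) considered a
transformed auxiliary variable $z$ whose value for $x_i$ is defined as
$$z_i = \frac{X - nx_i}{N - n} \quad \forall\ x_i \in \Omega \qquad (4.4.4.1)$$
and proposed an unbiased estimator $\hat Y_{SM}$ of population total $Y$ with variance
$$V(\hat Y_{SM}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{p_i} - Y^2\right], \qquad (4.4.4.3)$$
where $p_i = z_i/\sum_{i=1}^{N} z_i$ denotes the selection probability based on the transformed variable.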
The limitation of this strategy is that in some situations the condition X > nMax(x;)
may not be satisfied, but this condition is reasonably satisfied in most of the
practical situations. A practical example in which this condition holds is given
below:
Example 4.4.4.1. Consider that the ranks of the auxiliary variable $x_i$ are used to
select the sample of $n$ units with PPSWR sampling from a population of size $N$.
The condition $X > n\,\text{Max}(x_i)$ will be satisfied if
$$\sum_{i=1}^{N} x_i > n\,\text{Max}(x_i). \qquad (4.4.4.4)$$
Evidently $\text{Max}(x_i) = N$ and $\sum_{i=1}^{N} x_i = N(N+1)/2$, therefore (4.4.4.4) reduces to the
condition on sample size
$$n < \frac{(N+1)}{2}. \qquad (4.4.4.5)$$
In other words, the condition $X > n\,\text{Max}(x_i)$ remains satisfied if the size of the
selected sample is less than 50% of the population size. Thus the condition
$X > n\,\text{Max}(x_i)$ may be satisfied in most practical situations. The literature related to
the use of ranks and its benefits in selecting the sample can be found in Wright
(1990).
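The transformation and the sample size condition for ranks can be illustrated with a few lines of code. This is a minimal sketch under the assumption that the selection probabilities are taken proportional to $z_i = (X - nx_i)/(N - n)$; the function name and the ten-unit population of ranks are hypothetical.

```python
def transformed_probabilities(x, n):
    """Selection probabilities proportional to z_i = (X - n*x_i) / (N - n)."""
    n_pop = len(x)
    total = sum(x)
    if total <= n * max(x):
        raise ValueError("condition X > n*Max(x_i) is not satisfied")
    z = [(total - n * xi) / (n_pop - n) for xi in x]
    z_total = sum(z)                  # algebraically equal to X
    return [zi / z_total for zi in z]


ranks = list(range(1, 11))            # hypothetical population: ranks 1..10
p = transformed_probabilities(ranks, n=4)
```

For ranks 1 to 10 the condition holds for $n \le 5$ and fails for $n = 6$, in line with the bound $n < (N+1)/2$. Units with larger $x_i$ receive smaller selection probabilities, which is the intended effect when $y$ and $x$ are negatively correlated.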
We observe that if the correlation between Y; and Xi is positive and high and the
regression line passes through the origin then PPSWR sampling remains better than
the SRSWR sampling. If the correlation between Y; and Xi is negative then the
transformation suggested by Sahoo, Sahoo, and Mohanty (1994) works well.
Example 4.4.4.2. In population 2 the age and duration (minutes) of sleep are
negatively correlated. Assuming that the whole population information is known,
discuss the gain in efficiency due to the Sahoo, Sahoo, and Mohanty (1994) method of
estimation over the Hansen and Hurwitz (1943) estimator. Assume that the sample
size consists of five units.
$$V(\hat Y_{HH}) = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{p_i} - Y^2\right) = \frac{1}{5}\left[143043805.25 - 11526^2\right] = 2039025.84,$$
and
The next section is devoted to large scale
multipurpose surveys in which, because the cost involved is quite high, information
is collected on several variables of interest. Some of these study variables
may be positively and highly correlated with the auxiliary information used at the
selection stage while others may be poorly correlated. Such a situation is called a
'multi-character survey'.
( a ) Study variables have poor positive correlation with the selection probabilities.
( b ) Study variables have poor positive as well as poor negative correlation with the
selection probabilities.
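$$\hat Y_{BS} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i^*} \qquad (4.5.1.1)$$
where
$$p_i^* = \left(1 + \frac{1}{N}\right)^{(1-\rho_{xy})}(1 + p_i)^{\rho_{xy}} - 1. \qquad (4.5.1.2)$$
( a ) If $\rho_{xy} = 0$ then (4.5.1.2) becomes $p_i^* = 1/N$ and the estimator (4.5.1.1) reduces
to the well known estimator due to Rao (1966a, 1966b), that is $\hat Y_{Rao} = \frac{N}{n}\sum_{i=1}^{n} y_i$.
( b ) If $\rho_{xy} = 1$ then (4.5.1.2) becomes $p_i^* = p_i$ and the estimator (4.5.1.1) reduces
to the well known estimator of Hansen and Hurwitz (1943), that is $\hat Y_{HH} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}$.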
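The two limiting cases can be verified numerically. The sketch below assumes the Bansal and Singh (1985) transformation $p_i^* = (1 + 1/N)^{(1-\rho_{xy})}(1 + p_i)^{\rho_{xy}} - 1$ that is used later in this section; the function names and the four-unit set of probabilities are hypothetical.

```python
def bansal_singh_probabilities(p, rho):
    """Transformed probabilities p*_i = (1 + 1/N)^(1 - rho) * (1 + p_i)^rho - 1."""
    n_pop = len(p)
    return [(1 + 1 / n_pop) ** (1 - rho) * (1 + pi) ** rho - 1 for pi in p]


def transformed_total_estimate(y_sample, p_star_sample):
    """Estimate of the population total: (1/n) * sum of y_i / p*_i over the sample."""
    n = len(y_sample)
    return sum(yi / pi for yi, pi in zip(y_sample, p_star_sample)) / n


p = [0.1, 0.2, 0.3, 0.4]                          # hypothetical selection probabilities
p_rho0 = bansal_singh_probabilities(p, 0.0)       # should be close to 1/N for every unit
p_rho1 = bansal_singh_probabilities(p, 1.0)       # should be close to the original p_i
p_half = bansal_singh_probabilities(p, 0.5)       # lies between 1/N and p_i
```

For intermediate correlations the transformed probability lies between $1/N$ and $p_i$, so the estimator moves smoothly between the Rao and the Hansen and Hurwitz forms.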
We now study some further properties of the estimator YBS in the following
theorems:
(4.5.1.3)
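Evidently $B(\hat Y_{BS})$ does not converge to zero as the sample size increases. Hence the
theorem.
$$V(\hat Y_{BS}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2p_i}{p_i^{*2}} - \left(\sum_{i=1}^{N}\frac{Y_ip_i}{p_i^*}\right)^2\right]. \qquad (4.5.1.5)$$
$$V(\hat Y_{HH}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right]. \qquad (4.5.1.7)$$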
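$$E\left[v(\hat Y_{BS})\right] = \frac{1}{n(n-1)}E\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^{*2}} - n\hat Y_{BS}^2\right] = \frac{1}{n(n-1)}\left[E\left(\sum_{i=1}^{n}\frac{y_i^2}{p_i^{*2}}\right) - nE\left(\hat Y_{BS}^2\right)\right]$$
$$= \frac{1}{n(n-1)}\left[n\sum_{i=1}^{N}\frac{Y_i^2p_i}{p_i^{*2}} - n\left\{V(\hat Y_{BS}) + \left(\sum_{i=1}^{N}\frac{Y_ip_i}{p_i^*}\right)^2\right\}\right]$$
$$= \frac{1}{(n-1)}\left[\sum_{i=1}^{N}\frac{Y_i^2p_i}{p_i^{*2}} - \left(\sum_{i=1}^{N}\frac{Y_ip_i}{p_i^*}\right)^2 - V(\hat Y_{BS})\right] = V(\hat Y_{BS}),$$
since by (4.5.1.5) the first two terms in the last bracket add up to $nV(\hat Y_{BS})$.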
Let $Y_i$ and $P_i$ denote respectively the value of variable $Y$ and the relative measure
of size $X$ for the $i$th ($i = 1,2,..., N$) unit in the population so that the values
$P_i = X_i/X$ will serve the purpose of selection probabilities with $\sum_{i=1}^{N} P_i = 1$. A general
superpopulation model following Cochran (1963) is given by
$$Y_i = \beta P_i + e_i, \quad i = 1,2,....,N, \qquad (4.5.1.11)$$
where $e_i$ are the error terms such that:
$$E_m[e_i \mid P_i] = 0; \quad E_m(e_i^2 \mid P_i) = aP_i^g,\ a > 0,\ g \ge 0; \qquad (4.5.1.12)$$
and $E_m(e_ie_j \mid P_i, P_j) = 0$, where $E_m$ denotes the expected value under the
superpopulation model. Here $E_m(e_i^2 \mid P_i)$ is the residual variance of $Y$ for $P = P_i$. The
expected value of this residual variance is given by
$$E(aP^g) = \frac{a}{N}\sum_{i=1}^{N} P_i^g,$$
when the infinite superpopulation is simulated by the finite population of $N$ units
having the same characteristics as that of the superpopulation. Also, this expected
value of residual variance is known to be given by
$$\sigma_y^2(1 - \rho_{xy}^2),$$
where $\rho_{xy}$ is the correlation coefficient between $Y$ and $P$. Thus we have
$$\frac{a}{N}\sum_{i=1}^{N} P_i^g = \sigma_y^2(1 - \rho_{xy}^2). \qquad (4.5.1.13)$$
Now we have the following theorems:
Theorem 4.5.1.5. The expected value of the variance $V(\hat Y_{Rao})$ under the
superpopulation model (4.5.1.11) is
n
Proof. The V(YRao)in (4.5.1.6) under the model (4.5.1.11) can easily be written as
N 22
-IF;
1=1
N F;Pje;er2f3 ( .IN F; 2)( IF;e;
e; - I
'''J=1 1=1
J] . (4.5.1.15)
On taking expected values on both sides of(4.5 .1.15) and using (4.5.1.12) we have
(4.5.1.14) . Hence the theorem .
Theorem 4.5.1.6. The expected value of the variance $V(\hat Y_{BS})$ under the
superpopulation model (4.5.1.11) is
Theorem 4.5.1.7. Under the superpopulation model (4.5.1.11) the estimator $\hat Y_{BS}$ of
population total will be more efficient than the Rao (1966a, 1966b) estimator if
P,;y > (l-ot l . (4.5.1.17)
where
(4.5.1.18)
Proof. It follows by setting Em[YBs]~ Em[YRao ] and using (4.5.1.14) and (4.5.1.16).
Hence the theorem .
Example 4.5.1.1. A team of medical doctors wishes to estimate the total of three
variables 'Crude Birth Rate (CBR)', 'Crude Death Rate (CDR)' and 'Infant
Mortality Rate (IMR)' in the world. We know that CBR and IMR have positive and
high correlation with the 'Total Fertility Rate (TFR)', whereas CDR has low
positive correlation with TFR. Select a sample of 10 countries from the list given in
population 8 of the Appendix by using PPSWR sampling and using TFR as known
auxiliary information. The correlation coefficients of TFR with CBR, CDR
and IMR are +0.9855, +0.5492 and +0.8525 respectively. Apply the appropriate
transformations on the selection probabilities using the known values of the correlation
coefficient to obtain estimates of total CBR, CDR and IMR. Also construct a 95%
confidence interval in each situation.
Solution. (a) Selection of PPSWR Sample:
The population 8 in the Appendix consists of $N = 96$ countries. The maximum
value of TFR is 7.17. Thus in Lahiri's method we selected random numbers
$1 \le R_i \le 96$ and $1 \le R_j \le 8$. We used the first two columns of the Pseudo-Random
Number (PRN) Table 1 given in the Appendix to select the random number $R_i$,
whereas the random number $R_j$ was selected using the 13th column. Then we
performed the following trials:
2 60 7 1.53
3 54 3 3.10
4 92 5 2.53
5 01 4 2.07
6 69 1 1.95
7 87 1 2.35
8 62 8 5.95
9 23 8 1.81
10 88 4 6.24
11 94 8 6.86
12 64 8 2.66
13 46 2 1.55
14 04 7 6.05
15 32 5 6.75
16 94 6 6.86
17 47 4 1.50 R
18 57 8 3.13 R
19 56 6 2.79 R
20 77 7 1.95 R
21 57 7 3.13
22 81 4 5.19
23 60 4 1.53
24 33 6 1.73
25 05 6 2.64
26 72 2 5.87
27 22 6 1.80
28 88 6 1.80
29 38 3 3.16
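The accept and reject trials listed above follow Lahiri's method, which can be sketched in code. This is a minimal illustration and not the exact procedure of the example: it draws the second random number from a continuous uniform distribution on $[0, M]$, whereas the worked example uses integers between 1 and 8 read from a random number table. The function name, the seed and the four-unit population are hypothetical.

```python
import random


def lahiri_ppswr(sizes, n, rng=None, upper=None):
    """Select n units with replacement by Lahiri's method; return (sample, trials).

    sizes : size measure of each population unit (units are labelled 1..N)
    upper : bound M >= max(sizes); defaults to max(sizes)
    """
    rng = rng or random.Random()
    n_pop = len(sizes)
    bound = upper if upper is not None else max(sizes)
    sample = []
    trials = 0
    while len(sample) < n:
        trials += 1
        unit = rng.randint(1, n_pop)        # first random number: candidate unit
        height = rng.uniform(0.0, bound)    # second random number: between 0 and M
        if height <= sizes[unit - 1]:       # accept the unit, otherwise reject the pair
            sample.append(unit)
    return sample, trials


demo_sample, demo_trials = lahiri_ppswr([1.0, 2.0, 3.0, 4.0], n=5, rng=random.Random(2024))
```

Each accepted unit is selected with probability proportional to its size, and a unit may appear more than once because the draws are made with replacement. The number of trials exceeds the sample size whenever some pairs are rejected.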
We have $N = 96$, $\rho_{xy(1)} = 0.9855$ and $p_{1i}^* = \left(1 + \frac{1}{N}\right)^{1-\rho_{xy(1)}}(1 + p_i)^{\rho_{xy(1)}} - 1$, thus we have
Using Table 2 from the Appendix the 95% Confidence Interval estimate of the total
CBR is given by
$2439.271 \pm 2.262\sqrt{4433.35}$, or [2288.66, 2589.88].
Here $N = 96$, $\rho_{xy(2)} = 0.5492$, and $p_{2i}^* = \left(1 + \frac{1}{N}\right)^{1-\rho_{xy(2)}}(1 + p_i)^{\rho_{xy(2)}} - 1$, thus we have:
$$\hat Y_{CDR} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_{2i}}{p_{2i}^*} = \frac{9652.267}{10} = 965.2267$$
and an estimate of $V(\hat Y_{CDR})$ is given by
$$\hat v(\hat Y_{CDR}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_{2i}^2}{p_{2i}^{*2}} - n\hat Y_{CDR}^2\right] = \frac{1}{10(10-1)}\left[11464058.39 - 10\times 965.2267^2\right] = 23860.36.$$
A $(1-\alpha)100\%$ confidence interval of total CDR is given by
$$\hat Y_{CDR} \pm t_{\alpha/2}(df = n-1)\sqrt{\hat v(\hat Y_{CDR})}.$$
Using Table 2 from the Appendix the 95% Confidence Interval estimate of the total
CDR is given by
$965.2267 \pm 2.262\sqrt{23860.36}$, or [615.82, 1314.63].
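The interval limits in this example can be checked with a small helper. This is a minimal sketch: the function name is hypothetical, and the tabulated value 2.262 (the $t$ value for 9 degrees of freedom at the 95% level) is passed in directly rather than computed.

```python
import math


def t_interval(estimate, variance_estimate, t_value):
    """Return (lower, upper) for estimate +/- t_value * sqrt(variance_estimate)."""
    half_width = t_value * math.sqrt(variance_estimate)
    return estimate - half_width, estimate + half_width


T_9DF = 2.262   # tabulated t value for df = n - 1 = 9, as used in the example

cbr_interval = t_interval(2439.271, 4433.35, T_9DF)
cdr_interval = t_interval(965.2267, 23860.36, T_9DF)
```

The half width is always the tabulated $t$ value multiplied by the estimated standard error, so using the standard error alone would give an interval that is too narrow.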
$$\hat v(\hat Y_{IMR}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_{3i}^2}{p_{3i}^{*2}} - n\hat Y_{IMR}^2\right]$$
Using Table 2 from the Appendix the 95% Confidence Interval estimate of the total
IMR is given by
$4589.557 \pm 2.262\sqrt{419190.45}$, or [3125.02, 6054.08].
The problem of estimation of population total using multi-character survey has also
been considered by Pathak (1966), Kumar and Herzel (1988), Amahia, Chaubey,
and Rao (1989) and Rao (1993a, 1993b). They considered the following types of
transformations on the selection probabilities.
$p_{i0}^* = \dfrac{1}{N}$   [ Rao (1966a, 1966b) ]
$p_i^* = \left(1 + \dfrac{1}{N}\right)^{(1-\rho_{xy})}(1 + p_i)^{\rho_{xy}} - 1$   [ Bansal and Singh (1985) ]
( i ) $H\left(\dfrac{1}{N}\right) = N$;
( ii ) The first and second order partial derivatives of $H$ with respect to $p_i$ exist and are
assumed to be known constants.
Expanding $H(p_i)$ around $1/N$ with Taylor's series and noting $|p_i - 1/N| < 1$ we have
$$\hat Y_g = \frac{1}{n}\sum_{i=1}^{n} y_i\left[N + \left(p_i - \frac{1}{N}\right)H' + \frac{1}{2}\left(p_i - \frac{1}{N}\right)^2H'' + \cdots\right], \qquad (4.5.1.1.2)$$
where
$$H' = \left.\frac{\partial H}{\partial p_i}\right|_{p_i = 1/N} \quad\text{and}\quad H'' = \left.\frac{\partial^2 H}{\partial p_i^2}\right|_{p_i = 1/N}$$
denote the first and second order partial derivatives of $H$ with respect to $p_i$ and
are known constants for $p_i = 1/N$.
Thus we have the following cases:
Case I. If we choose $H' = -\rho_{xy}N^2$ and $H'' = \{N^3/(N+1)\}\rho_{xy}[\rho_{xy}(1 + 2N) + 1]$ then the
class of estimators $\hat Y_g$ reduces to the estimator of Bansal and Singh (1985).
Case II. If we choose $H' = -\rho_{xy}N^2$ and $H'' = 2\rho_{xy}^2N^3$ then the class of estimators
$\hat Y_g$ reduces to the first estimator of Amahia, Chaubey, and Rao (1989).
Case III. If we choose $H' = -\rho_{xy}N^2$ and $H'' = 2\rho_{xy}N^3$ then the class of estimators
$\hat Y_g$ reduces to the second estimator of Amahia, Chaubey, and Rao (1989).
Thus several choices of $H'$ and $H''$ exist depending upon the form of the
function $H(p_i)$, but it is still not known which is the best or optimum.
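The constants quoted in Case I can be checked numerically. The sketch below assumes that $H(p_i) = 1/p_i^*$ with the Bansal and Singh (1985) transformation $p_i^* = (1 + 1/N)^{(1-\rho_{xy})}(1 + p_i)^{\rho_{xy}} - 1$, and approximates the derivatives at $p_i = 1/N$ by central finite differences. The function names, the step size and the chosen values of $N$ and $\rho_{xy}$ are illustrative assumptions.

```python
def h_function(p, rho, n_pop):
    """H(p) = 1 / p*, with p* = (1 + 1/N)^(1 - rho) * (1 + p)^rho - 1."""
    p_star = (1 + 1 / n_pop) ** (1 - rho) * (1 + p) ** rho - 1
    return 1 / p_star


def numerical_derivatives(rho, n_pop, step=1e-4):
    """Return H, H' and H'' at p = 1/N using central finite differences."""
    p0 = 1 / n_pop
    h_minus = h_function(p0 - step, rho, n_pop)
    h_zero = h_function(p0, rho, n_pop)
    h_plus = h_function(p0 + step, rho, n_pop)
    first = (h_plus - h_minus) / (2 * step)
    second = (h_plus - 2 * h_zero + h_minus) / step ** 2
    return h_zero, first, second


def case_one_constants(rho, n_pop):
    """Closed forms quoted in Case I: H' = -rho*N^2, H'' = N^3/(N+1)*rho*(rho*(1+2N)+1)."""
    first = -rho * n_pop ** 2
    second = n_pop ** 3 / (n_pop + 1) * rho * (rho * (1 + 2 * n_pop) + 1)
    return first, second


h0, d1, d2 = numerical_derivatives(rho=0.5492, n_pop=96)
c1, c2 = case_one_constants(rho=0.5492, n_pop=96)
```

The numerical and closed-form values should agree to within the accuracy of the finite differences, and the value of $H$ at $1/N$ should be close to $N$, as required by condition ( i ).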
Case III. If $\rho_{xy} = -1$ then $p_i^* = p_i^-$ and $\hat Y_p$ reduces to $\hat Y_{SM}$ due to Sahoo, Sahoo,
and Mohanty (1994).
The transformation at (4.5.2.2) makes use of the known correlation coefficient $\rho_{xy}$
between the study variable $y$ and the auxiliary variable $x$. The study variable $y$
can be any one among the $k$ variables of interest, say $y_1, y_2,..., y_k$, some of them
having low positive and others having low negative correlation with the auxiliary
variable $x$. In actual practice, the value of $\rho_{xy}$ is not known in most surveys.
Thus Singh and Horn (1998) advised using an estimator of $\rho_{xy}$ in (4.5.2.2).
Suppose $r_{xy}$ is an estimator of the correlation coefficient $\rho_{xy}$ defined as
Yi Xi - -I + (1
d Pi=aPi+-api
-I )-
h
were Ui=--'Vi=--,u=n
11 11
'L.vi,an
'L.ui,v=n
-
.' ( 2Y I) +2':ry (+
Pi = 1- rxy \ N
-) r;y (+ -) [+2 _2 ( 1)2 J.
Pi - Pi +2 Pi +Pi +0 Pi .r, ' N (4.5.2.5)
Let $E_1$ denote the expected value over all possible values of the correlation
coefficient, $r_{xy}$, then
$$E_1(\hat p_i^*) = p_i^* + O(n^{-1}). \qquad (4.5.2.6)$$
Hence the theorem.
Proof. Let $E_2$ denote the expected value for the given value of $r_{xy}$, then we have
1(
E 017) = II-\(~-IJ
Pxy
, EI(& 17 ) =/1-J (.3R- 1
Pxy
J, and E 1(&)= /1-1(~2 -I)
Let VI denote the variance of the estimator of the correlation coefficient, then we
have the following lemmas:
_
Vi ~<y )- /1
-I 2 2 +pxy
Pxy .1.22 - -
[ [
2-
2
2 ~,
1
J
+4"(.1.40+ -1.04)- - [J]
.1.13 .1.3 \
+-
~ ~,
. (4.5.2.9)
Theorem 4.5.2.4. The variance of the estimator $\hat Y_S$ to the first order of
approximation is given by
$$V(\hat Y_S) \approx \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2p_i}{p_i^{*2}} - \left(\sum_{i=1}^{N}\frac{Y_ip_i}{p_i^*}\right)^2\right]. \qquad (4.5.2.11)$$
=E -
I
ljN i po- [NL yopo j2)1 + V [NYopo]
L _'_I
*2
L_1
_ 1
,* I.
_1_1
,*
r P' 1=1 P. 1=1 P,
0 0
Il 1=1
i l l
_1[NL ~
Y;Pi - [NY i Pi ]2] Ny;p; (,*)
- - L -*- + L ~I PI '
Il i=1 Pi i=1 Pi i=1 Pi
which after using above expected values and on simplification reduces to (4.5.2.11).
Hence the theorem.
We observed that SRSWR will be better than PPSWR sampling if the line of
regression passes far away from the origin. Thus linearity between $Y$ and $X$ is not
a sufficient condition for the PPSWR scheme to be better than the SRSWR scheme. So in
practice it is very difficult to judge whether the PPSWR sampling scheme should be
preferred over the SRSWR scheme or not. In order to overcome this problem, Reddy
and Rao (1977) suggested that the sample be selected with probability proportional to
revised sizes and with replacement. The revised sizes are obtained through
a location shift in the auxiliary variable as
This can also be treated as a compromise selection probability between PPSWR and
SRSWR leading to new selection probabilities given by
$$P_i' = \frac{x_i'}{\sum_{i=1}^{N} x_i'} = LP_i + \frac{(1-L)}{N}. \qquad (4.6.1)$$
$$V(\hat Y_{RR}) = \frac{1}{n}\left[\sum_{i=1}^{N}\frac{Y_i^2}{P_i'} - Y^2\right]. \qquad (4.6.3)$$
Proof. Obvious by following Hansen and Hurwitz (1943).
For details one can refer to Reddy (1974), Singh, Kumar, and Chandak (1983) and
Bedi and Rao (1996).
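The effect of the compromise probabilities on the variance can be explored numerically. The following is a minimal sketch assuming revised probabilities of the form $P_i' = LP_i + (1-L)/N$ and the Hansen and Hurwitz type variance $\frac{1}{n}[\sum Y_i^2/P_i' - Y^2]$; the function names and the four-unit population, chosen so that the regression line of $y$ on $x$ passes far from the origin, are hypothetical.

```python
def revised_probabilities(x, weight):
    """Compromise probabilities P'_i = L*P_i + (1 - L)/N with P_i = x_i / X."""
    n_pop = len(x)
    total = sum(x)
    return [weight * xi / total + (1 - weight) / n_pop for xi in x]


def hansen_hurwitz_variance(y, p, n):
    """Variance (1/n) * [sum(Y_i^2 / p_i) - Y^2] of the PPSWR estimator of the total."""
    total_y = sum(y)
    return (sum(yi * yi / pi for yi, pi in zip(y, p)) - total_y ** 2) / n


x = [1.0, 2.0, 3.0, 4.0]        # hypothetical size measure
y = [10.0, 11.0, 12.0, 13.0]    # hypothetical study variable, y = 9 + x

variances = {
    weight: hansen_hurwitz_variance(y, revised_probabilities(x, weight), n=2)
    for weight in (0.0, 0.1, 0.5, 1.0)
}
```

Here $L = 1$ corresponds to PPSWR sampling and $L = 0$ to SRSWR sampling. In this artificial population a small positive $L$ gives a smaller variance than either extreme, which illustrates why an intermediate choice can be attractive.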
Example 4.6.1. One can observe for population 1, as the value of $L$ changes from
0.1 to 0.9 with a step of 0.2, the percent relative efficiency of the estimator $\hat Y_{RR}$
with respect to the estimator $\hat Y_{HH}$ changes as shown in the following table.
The next section is devoted to defining the estimator of the finite population
correlation coefficient $\rho_{xy}$ under PPSWR sampling.
Assume $(y_i, x_i)$, $i = 1,2,..., n$, denotes a pair of observations on the variables $Y$ and
$X$ for the sample of size $n$ drawn with varying probability $p_i$ and with
replacement from a population of size $N$. Following Gupta, Singh, and
Kashani (1993), we have the following theorem.
where
$$\hat\theta_1 = \sum_{i=1}^{n}\frac{x_iy_i}{p_i^2} - \left(\sum_{i=1}^{n}\frac{x_i}{p_i}\right)\left(\sum_{i=1}^{n}\frac{y_i}{p_i}\right) + N(n-1)\sum_{i=1}^{n}\frac{x_iy_i}{p_i}, \qquad (4.7.2)$$
$$\hat\theta_2 = \sum_{i=1}^{n}\frac{x_i^2}{p_i^2} - \left(\sum_{i=1}^{n}\frac{x_i}{p_i}\right)^2 + N(n-1)\sum_{i=1}^{n}\frac{x_i^2}{p_i}, \qquad (4.7.3)$$
and
$$\hat\theta_3 = \sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - \left(\sum_{i=1}^{n}\frac{y_i}{p_i}\right)^2 + N(n-1)\sum_{i=1}^{n}\frac{y_i^2}{p_i}. \qquad (4.7.4)$$
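The three statistics can be combined in code as follows. This is a sketch under a stated assumption: the estimator of the correlation coefficient is taken to be $\hat\theta_1/\sqrt{\hat\theta_2\hat\theta_3}$, a form that is assumed here because the theorem statement itself is not reproduced above. The function name and the small data set are hypothetical.

```python
import math


def ppswr_correlation(y, x, p, n_pop):
    """Correlation estimate theta1 / sqrt(theta2 * theta3) from a PPSWR sample.

    ASSUMPTION: the ratio form is inferred; theta1, theta2 and theta3 follow
    (4.7.2), (4.7.3) and (4.7.4).
    """
    n = len(y)
    sum_x = sum(xi / pi for xi, pi in zip(x, p))
    sum_y = sum(yi / pi for yi, pi in zip(y, p))
    theta1 = (sum(xi * yi / pi ** 2 for xi, yi, pi in zip(x, y, p))
              - sum_x * sum_y
              + n_pop * (n - 1) * sum(xi * yi / pi for xi, yi, pi in zip(x, y, p)))
    theta2 = (sum(xi * xi / pi ** 2 for xi, pi in zip(x, p))
              - sum_x ** 2
              + n_pop * (n - 1) * sum(xi * xi / pi for xi, pi in zip(x, p)))
    theta3 = (sum(yi * yi / pi ** 2 for yi, pi in zip(y, p))
              - sum_y ** 2
              + n_pop * (n - 1) * sum(yi * yi / pi for yi, pi in zip(y, p)))
    return theta1 / math.sqrt(theta2 * theta3)


x_sample = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical sample values
y_sample = [2.0, 1.0, 4.0, 3.0, 6.0]
equal_p = [1 / 20] * 5                   # equal probabilities, population of N = 20
r_equal = ppswr_correlation(y_sample, x_sample, equal_p, n_pop=20)
```

When all selection probabilities equal $1/N$, the three statistics reduce to $N^2$ times the usual corrected sums of products and squares, so the estimate coincides with the ordinary sample correlation coefficient.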
EXERCISES
Exercise 4.1. Define an estimator of the regression coefficient $\beta = s_{xy}(s_x^2)^{-1}$ in
PPSWR sampling.
Exercise 4.2. Write a short note on PPSWR sampling. Discuss at least one method
of selecting a sample by PPSWR sampling.
is an unbiased estimator of population total. Also show that an unbiased estimator
of the variance
$$V(\hat Y_{HH}) = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{P_i} - Y^2\right)$$
is given by
$$\hat v(\hat Y_{HH}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n(\hat Y_{HH})^2\right].$$
Exercise 4.4. Show that the probability of selection of the $i$th unit in the sample by
using Lahiri's PPSWR sampling scheme is given by
$$P_i = X_i\Big/\sum_{i=1}^{N} X_i.$$
What is the probability of selecting the $i$th sample of $n$ units using PPSWR sampling?
population total. Find its variance and deduce the unbiased estimator of variance.
Exercise 4.6. Write a short note on the relative efficiency of PPSWR sampling
with respect to SRSWR sampling. Also discuss the concept of revised selection
probabilities.
Exercise 4.7. Find the asymptotic bias and variance of the following estimators:
$$\bar y_1 = \alpha\bar u_n + (1-\alpha)\bar u_n\left(\frac{\bar X}{\bar v_n}\right), \qquad \bar y_2 = \bar u_n\left(\frac{\bar v_n}{\bar X}\right)^{\alpha},$$
where $\bar u_n = n^{-1}\sum_{i=1}^{n}\dfrac{y_i}{Np_i}$, $\bar v_n = n^{-1}\sum_{i=1}^{n}\dfrac{x_i}{Np_i}$ and $p_i = z_i/\sum_{i=1}^{N} z_i$ have their usual meaning.
Exercise 4.8. Is it possible to apply the PPSWR sampling scheme if the study variable
and auxiliary variable are negatively correlated?
Hint: Sahoo, Sahoo, and Mohanty (1994).
and the first and second order partial derivatives of $H$ with respect to $p_i$ exist and are
assumed to be known constants.
Show that the estimators in ( a ) and ( b ) are special cases of ( c ) for certain choice
of parameters in the function $H(p_i)$. Study the bias and variance of the general
class of estimators.
Hint: Singh, Grewal, and Joarder (2002), Espejo, Pineda, and Nadarajah (2003).
Exercise 4.10. ( a ) List a few situations where PPSWR sampling can be used in
actual practice.
( b ) Pick one situation of interest to you and collect information on a variable(s)
of your interest from a reasonably good number of respondents.
( c ) Later you found that a few of the variable(s) have very low correlation with the
selection probabilities. What kind of analysis might be helpful to you?
be the usual unbia sed estimator of Y based on a PPSWR sample of size n and let
'. X
• y.
Ypps = - I --'-=
II
n i=\ Xi +dX
be an alternative unbia sed estimator of Y based on a PPSWR sample of size n,
where the size measure is the transformed variate
X'= 2: (X i+dX )=X(I+ d),
i=\
show that
v(Y;ps )<- d()v(y)+
l +d
(I +d
1-)v(YP
Ps).
Hint: Reddy and Rao (1977), Agarwal, Singh , and Goel (1979) .
and $\rho_{xy}$ is the correlation between $y$ and $x$, $p_i^+ = x_i/X$ and $p_i^- = (X - nx_i)/\{(N-n)X\}$
with $X = \sum_{i=1}^{N} X_i$. Discuss the cases for $\rho_{xy} = 1$, $\rho_{xy} = 0$ and $\rho_{xy} = -1$.
( b ) If the value of $\rho_{xy}$ is unknown, then use its consistent estimator
where $u_i = y_i/(Np_i)$, $\bar u = n^{-1}\sum_{i=1}^{n} u_i$, $\bar v = n^{-1}\sum_{i=1}^{n} v_i$ and
Exercise 4.13. If $y$ is perfectly linearly correlated with the auxiliary variable $x$ so
that $y = \alpha + \beta x$, where $\alpha$, $\beta\,(\ne 0)$ are constants. Show that sample mean estimator
with PPSWR sampling. Let $p_i = z_i/\sum_{i=1}^{N} z_i$ denote the probability of selection on the
basis of known auxiliary variable $z_i$ for $i = 1,2,....,N$. Let $y_i$ and $x_i$ denote the value
of the variable under study $y$ and the second auxiliary variable $x$ to be used at
estimation stage. Let $u_i = y_i/(Np_i)$ and $v_i = x_i/(Np_i)$ for $i = 1,2,...,n$. Obviously
$\hat R_1 = \bar u_n/\bar v_n$, $\hat R_2 = (\bar u_n/\bar X)$ and $\hat R_3 = (\bar u_n/\bar v_n)(\bar X/\bar v_n)$ can be considered as three
estimators of the ratio $R = \bar Y/\bar X$. Find the values of the real constants $g_t$, $t = 1,2,3$
such that the linear variety of the estimators of $R$ defined as
$$\hat R_{ms} = \sum_{t=1}^{3} g_t\hat R_t, \quad\text{for}\quad \sum_{t=1}^{3} g_t = 1$$
remains unbiased with minimum variance.
Hint: Mahajan and Singh (1997).
PRACTICAL PROBLEMS
Practical 4.1. Use the cumulative total method to select a sample of ten units from
population 1 given in the Appendix by using nonreal estate farm loans as an
auxiliary variable.
Practical 4.2. Use Lahiri's method to select a sample of ten units from population
1 given in the Appendix by using nonreal estate farm loans as auxiliary variable.
Practical 4.3. Your supervisor used PPSWR sampling to select a sample of six
states from population 1 in the Appendix and collected the following
information.
State                             KY         NE         ME       OH        OK         TX
Nonreal estate farm loan (X) $    557.656    3585.406   51.539   635.774   1716.087   3520.361
Real estate farm loan (Y) $       1045.106   1337.852   8.849    870.720   612.108    1248.761
The total amount $43908.12 of nonreal estate farm loans (in $000) for the year
1997 is known. Use the Hansen and Hurwitz (1943) estimator for estimating the
total amount of the real estate farm loans (in $000) during 1997 in the United
States . Also find an estimator of variance of the estimator and hence derive the 95%
confidence interval.
Practical 4.4. Discuss the relative efficiency of the PPSWR sampling estimator for
estimating the total amount of the real estate farm loans during 1997 by using
information on nonreal estate farm loans during 1997 with respect to the usual and
ratio estimators of population mean based on an SRSWOR sample. Use the information
given in population 1 of the Appendix.
$$\hat\alpha = \left\{\sum_{i=1}^{n} p_i(u_i - \bar u_n)(v_i - \bar v_n)\right\}\Big/\sum_{i=1}^{n} p_i(v_i - \bar v_n)^2,$$
for estimating the average price of the apple crop during 1997 by assuming that the
average price during 1995 is known. Also construct 95% confidence interval for
estimating the average price of the apple crop during 1996 in the United States.
Practical 4.6. It is a well known fact that age and duration of sleep are negatively
correlated. Assuming that the whole population information is known as presented
in population 2 of the Appendix, discuss the gain in efficiency due to the Sahoo,
Sahoo, and Mohanty (1994) method of estimation over the Hansen and Hurwitz
(1943) estimator. Assume that the sample size consists of ten units.
Practical 4.7. Your boss Ms. Stephanie Singh used TFR to select a PPSWR sample
of 5 units from the population 8 given in the Appendix. Her interest was to estimate
the total 'Crude Birth Rate (CBR)', 'Crude Death Rate (CDR)', ' Infant Mortality
Rate (IMR)' and 'Expectation of Life at Birth (ELB)' in the world as shown in the
following table.
Practical 4.8. The following table gives the weekly wages and expenditure of all
the seven households on a drive way:
Practical 4.9. Consider a class with six girls, their height and response to the
question, 'Do you like Bob?' are given in the following table:
Practical 4.10. Consider a class with six students, their marks in the assignments
and final examination are given in the following table:
5.0 INTRODUCTION
Theorem 5.0.1. The sum of the probabilities of including the $i$th population unit in
the sample over the whole population is equal to the sample size, i.e.,
$$\sum_{i\in\Omega}\pi_i = n \qquad (5.0.1)$$
where $i \in \Omega$ implies that the $i$th unit belongs to the population $\Omega$ of size $N$.
Proof. Let us define a variable $t_i$ which takes the value 1 if the $i$th population unit is
included in the sample and zero otherwise, i.e.,
$$t_i = \begin{cases} 1 & \text{with probability } \pi_i, \\ 0 & \text{with probability } (1 - \pi_i). \end{cases} \qquad (5.0.2)$$
The expected value of the random variable $t_i$ is
$$E(t_i) = 1\times\pi_i + (1 - \pi_i)\times 0 = \pi_i. \qquad (5.0.3)$$
Note that we are using without replacement (WOR) sampling, so there is no chance
of getting any unit repeated. The $n$ selected units will be distinct and we have
$$\sum_{i\in\Omega} t_i = n. \qquad (5.0.4)$$
Taking expected values on both sides of (5.0.4) we have
$$\sum_{i\in\Omega} E(t_i) = n. \qquad (5.0.5)$$
On using (5.0.3) in (5.0.5) we have
$$\sum_{i\in\Omega}\pi_i = n. \qquad (5.0.6)$$
Hence the theorem.
Theorem 5.0.2. The sum of the probabilities of including different pairs of units from the whole population in the sample is given by the product of the probability of including the $i$th population unit and the remaining sample size. In other words,
$$\sum_{j(\neq i) \in \Omega} \pi_{ij} = (n-1)\pi_i. \tag{5.0.7}$$
Proof. Let us consider a random variable $t_{ij}$ which takes the value 1 if both the $i$th and $j$th population units are included in the sample and 0 otherwise, i.e.,
$$t_{ij} = \begin{cases} 1 & \text{with probability } \pi_{ij}, \\ 0 & \text{with probability } (1-\pi_{ij}). \end{cases} \tag{5.0.8}$$
Then
$$\sum_{j(\neq i) \in \Omega} \pi_{ij} = \sum_{j(\neq i) \in \Omega} P(j \in s \mid i \in s)\,P(i \in s) = \sum_{j(\neq i) \in \Omega} P(j \in s \mid i \in s)\,\pi_i = \pi_i \sum_{j(\neq i) \in \Omega} P(j \in s \mid i \in s), \tag{5.0.11}$$
where each $\pi_{ij}$ is the probability that both the $i$th and $j$th ($i \neq j$) units are included in the sample.
In Theorem 5.0.1 we proved that the sum of the probabilities of including the $i$th population unit in the sample is $\sum_{i \in \Omega} \pi_i = n$, where $\Omega$ denotes the set of all $N$ units in the population and $n$ is the number of units in the sample $s$. Now in $\sum_{j(\neq i) \in \Omega} P(j \in s \mid i \in s)$ the number of population units is $(N-1)$ (since the $i$th unit has already been selected) and a further $(n-1)$ units have to be selected in the sample. Thus
$$\sum_{j(\neq i) \in \Omega} P(j \in s \mid i \in s) = (n-1). \tag{5.0.12}$$
Hence the theorem.
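Theorems 5.0.1 and 5.0.2 can be checked numerically on any fixed-size design. The following is a minimal sketch, assuming a hypothetical design of all samples of size $n = 2$ from $N = 4$ units with unequal probabilities $p(s)$; the function name `inclusion_probabilities` and the numbers are illustrative only and do not come from the text.

```python
from itertools import combinations

def inclusion_probabilities(design):
    """design maps each sample (a frozenset of unit labels) to its probability p(s)."""
    units = sorted(set().union(*design))
    # First order: add p(s) over all samples containing unit i.
    first = {i: sum(p for s, p in design.items() if i in s) for i in units}
    # Second order: add p(s) over all samples containing both i and j.
    second = {(i, j): sum(p for s, p in design.items() if i in s and j in s)
              for i in units for j in units if i != j}
    return first, second

# Hypothetical design: every sample of size n = 2 from units 1..4, unequal p(s).
units = [1, 2, 3, 4]
n = 2
samples = [frozenset(c) for c in combinations(units, n)]
design = dict(zip(samples, [0.30, 0.20, 0.10, 0.10, 0.10, 0.20]))

pi, pij = inclusion_probabilities(design)
total_first = sum(pi.values())                      # Theorem 5.0.1: should equal n
row_sums = {i: sum(pij[(i, j)] for j in units if j != i) for i in units}

print("sum of pi_i:", total_first)
for i in units:
    print(i, "sum_j pi_ij =", row_sums[i], " (n-1)*pi_i =", (n - 1) * pi[i])
```

Both identities rely only on every sample having exactly $n$ distinct units, so any other fixed-size design could be substituted for the hypothetical one above.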
Chapter 5: Use of auxiliary information: PPSWOR Sampling
The Horvitz and Thompson (1952) estimator (or HT estimator) of the population total $Y$ is a linear estimator in the sample observations. The Horvitz and Thompson estimator (a universal estimator), on the basis of $n$ sample observations $y_i$, $i = 1, 2, \ldots, n$, can be defined as
$$\hat{Y}_{HT} = \sum_{i \in s} d_i y_i \tag{5.1.1}$$
where $d_i$, $i = 1, 2, \ldots, n$, are predetermined real constants or design weights. Thus we have the following theorems:
Theorem 5.1.1. For the estimator $\hat{Y}_{HT}$ to be unbiased for the population total, $Y$, the design weights are given by
$$d_i = 1/\pi_i. \tag{5.1.2}$$
Proof. Taking expected values on both sides of (5.1.1), we have
$$E(\hat{Y}_{HT}) = E\Big(\sum_{i \in s} d_i y_i\Big) = E\Big(\sum_{i \in \Omega} t_i d_i Y_i\Big)$$
where $t_i$ is a random variable which takes the value 1 if the $i$th population unit is included in the sample and 0 otherwise. Note that here both $d_i$ and $Y_i$ (the $i$th population value) are constants. Therefore we have
$$E(\hat{Y}_{HT}) = \sum_{i \in \Omega} E(t_i)\, d_i Y_i = \sum_{i \in \Omega} \pi_i d_i Y_i. \tag{5.1.3}$$
Now $\sum_{i \in \Omega} \pi_i d_i Y_i$ is equal to $Y = \sum_{i \in \Omega} Y_i$ if $d_i = 1/\pi_i$. Therefore the condition for $\hat{Y}_{HT}$ to be an unbiased estimator of the population total $Y$ is that the design weights $d_i$ are equal to $1/\pi_i$. Thus we have the following theorem:
Theorem 5.1.2. An unbiased estimator of the population total $Y$ is given by
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i}. \tag{5.1.4}$$
Theorem 5.1.3. Under SRSWOR sampling, the estimator (5.1.4) reduces to the estimator of the population total $Y$ given by
$$\hat{Y}_{HT} = N\bar{y}. \tag{5.1.5}$$
Proof. Under SRSWOR sampling, the probability of including the $i$th population unit in the sample $s$ is
$$\pi_i = \frac{n}{N}, \tag{5.1.6}$$
so that
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{n/N} = \frac{N}{n}\sum_{i=1}^{n} y_i = N\bar{y}. \tag{5.1.7}$$
Hence the theorem.
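As a small illustration of Theorems 5.1.2 and 5.1.3, the sketch below computes the estimator (5.1.4) and confirms that, with $\pi_i = n/N$, it coincides with $N\bar{y}$. The helper name `ht_total`, the population size and the sample values are hypothetical and chosen only for illustration.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over the sampled units."""
    return sum(yi / p for yi, p in zip(y, pi))

# Hypothetical SRSWOR sample of n = 5 units from a population of N = 50 units.
N, n = 50, 5
y_sample = [12.0, 7.5, 9.0, 14.5, 11.0]
pi_srswor = [n / N] * n                 # every unit has pi_i = n / N under SRSWOR

estimate = ht_total(y_sample, pi_srswor)
expansion = N * sum(y_sample) / n       # the usual expansion estimator N * ybar

print(estimate, expansion)
```

The same function can be used unchanged with unequal inclusion probabilities, which is the case of interest in the rest of this chapter.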
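Theorem 5.1.4. The variance of the Horvitz and Thompson (1952) estimator, $\hat{Y}_{HT} = \sum_{i \in s} y_i/\pi_i$, of the population total, $Y$, is
$$V(\hat{Y}_{HT}) = \sum_{i \in \Omega} \frac{(1-\pi_i)}{\pi_i} Y_i^2 + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j} Y_i Y_j. \tag{5.1.8}$$
Proof. Writing the estimator as $\hat{Y}_{HT} = \sum_{i \in \Omega} t_i Y_i/\pi_i$, where $t_i$ takes the value 1 if the $i$th population unit is included in the sample and 0 otherwise, its variance is
$$V(\hat{Y}_{HT}) = \sum_{i \in \Omega} \frac{Y_i^2}{\pi_i^2} V(t_i) + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \frac{Y_i Y_j}{\pi_i\pi_j} Cov(t_i, t_j). \tag{5.1.11}$$
Now we have
$$V(t_i) = E(t_i^2) - \{E(t_i)\}^2. \tag{5.1.12}$$
From (5.1.9) we have
$$t_i^2 = \begin{cases} 1 & \text{if the } i\text{th population unit is included in the sample,} \\ 0 & \text{otherwise,} \end{cases} \tag{5.1.13}$$
so that $E(t_i^2) = \pi_i$ and $V(t_i) = \pi_i(1-\pi_i)$. Similarly, the product $t_i t_j$ equals 1 only if both the $i$th and $j$th units are included in the sample. Thus
$$E(t_i t_j) = 1 \times \pi_{ij} + 0 \times (1-\pi_{ij}) = \pi_{ij}.$$
Therefore we get
$$Cov(t_i, t_j) = E(t_i t_j) - E(t_i)E(t_j) = \pi_{ij} - \pi_i\pi_j. \tag{5.1.20}$$
Now plugging the values of $V(t_i)$ and $Cov(t_i, t_j)$ in (5.1.11) we have
$$V(\hat{Y}_{HT}) = \sum_{i \in \Omega} \frac{Y_i^2}{\pi_i^2}\,\pi_i(1-\pi_i) + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \frac{Y_i Y_j}{\pi_i\pi_j}(\pi_{ij} - \pi_i\pi_j).$$
Hence the theorem.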
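Theorem 5.1.5. In the case of SRSWOR sampling, the variance of the Horvitz and Thompson (1952) estimator reduces to
$$V(\hat{Y}_{HT})_{SRSWOR} = \frac{N^2(1-f)}{n} S_y^2 \tag{5.1.21}$$
where $f = n/N$ denotes the sampling fraction and $(1-f)$ is the finite population correction (f.p.c.) factor.
Proof. We know that the probability of including the $i$th population unit in the sample by using SRSWOR sampling is
$$\pi_i = \frac{n}{N} \tag{5.1.22}$$
and the probability of including both the $i$th and $j$th units is
$$\pi_{ij} = \binom{N-2}{n-2}\Big/\binom{N}{n} = \frac{n(n-1)}{N(N-1)}. \tag{5.1.23}$$
On substituting the values of $\pi_i$ and $\pi_{ij}$ in (5.1.8) we have
$$V(\hat{Y}_{HT}) = \sum_{i \in \Omega} \frac{(1 - n/N)}{n/N} Y_i^2 + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \frac{\dfrac{n(n-1)}{N(N-1)} - \dfrac{n}{N}\dfrac{n}{N}}{\dfrac{n}{N}\dfrac{n}{N}}\, Y_i Y_j = \frac{N-n}{n}\sum_{i=1}^{N} Y_i^2 - \frac{N-n}{n(N-1)}\sum_{i=1}^{N}\sum_{j \neq i = 1}^{N} Y_i Y_j. \tag{5.1.24}$$
Note that
$$Y^2 = \Big[\sum_{i=1}^{N} Y_i\Big]^2 = \sum_{i=1}^{N} Y_i^2 + \sum_{i=1}^{N}\sum_{j \neq i = 1}^{N} Y_i Y_j. \tag{5.1.25}$$
Therefore
$$\sum_{i=1}^{N}\sum_{j \neq i = 1}^{N} Y_i Y_j = Y^2 - \sum_{i=1}^{N} Y_i^2. \tag{5.1.26}$$
On using (5.1.26) in (5.1.24), and writing $Y = N\bar{Y}$, we have
$$V(\hat{Y}_{HT}) = \frac{(N-n)}{n(N-1)}\Big[N\sum_{i=1}^{N} Y_i^2 - (N\bar{Y})^2\Big] = \frac{N(N-n)}{n} S_y^2 = \frac{N^2(1-f)}{n} S_y^2. \tag{5.1.27}$$
Hence the theorem.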
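The result in Theorem 5.1.5 can be verified by brute force on a small population. The sketch below, which assumes a hypothetical population of $N = 5$ values and samples of size $n = 2$, enumerates every SRSWOR sample, computes $N\bar{y}$ for each, and compares the variance obtained from the definition with $N^2(1-f)S_y^2/n$. All names and data are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean, variance

Y = [3.0, 7.0, 8.0, 12.0, 15.0]   # hypothetical population values
N, n = len(Y), 2
f = n / N                          # sampling fraction

# Under SRSWOR every sample of size n has the same probability,
# so the design expectation is a plain average over all samples.
samples = list(combinations(Y, n))
estimates = [N * mean(s) for s in samples]

var_by_enumeration = mean((e - sum(Y)) ** 2 for e in estimates)
var_by_formula = N ** 2 * (1 - f) / n * variance(Y)   # variance() uses divisor N - 1

print(var_by_enumeration, var_by_formula)
```

Here `statistics.variance` uses the divisor $N - 1$, which matches the definition of $S_y^2$ used in (5.1.27).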
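Theorem 5.1.6. Another form for $V(\hat{Y}_{HT})$, developed independently by Sen (1953) and by Yates and Grundy (1953), is given by
$$V(\hat{Y}_{HT})_{SYG} = \frac{1}{2}\sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} (\pi_i\pi_j - \pi_{ij})\Big(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\Big)^2. \tag{5.1.28}$$
Proof. We have
$$V(\hat{Y}_{HT})_{SYG} = \frac{1}{2}\sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} (\pi_i\pi_j - \pi_{ij})\Big(\frac{Y_i^2}{\pi_i^2} + \frac{Y_j^2}{\pi_j^2}\Big) + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} (\pi_{ij} - \pi_i\pi_j)\frac{Y_i Y_j}{\pi_i\pi_j}. \tag{5.1.29}$$
Note that, by the symmetry of the double sum in $i$ and $j$, the terms in $Y_i^2/\pi_i^2$ and $Y_j^2/\pi_j^2$ contribute equally, so each pair of them can be replaced by $2Y_i^2/\pi_i^2$ and (5.1.29) becomes
$$V(\hat{Y}_{HT})_{SYG} = \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} (\pi_i\pi_j - \pi_{ij})\frac{Y_i^2}{\pi_i^2} + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} (\pi_{ij} - \pi_i\pi_j)\frac{Y_i Y_j}{\pi_i\pi_j}. \tag{5.1.30}$$
Note that $\sum_{j(\neq i) \in \Omega} \pi_j = n - \pi_i$, because the sum of all the inclusion probabilities is $n$ and the $i$th unit, with probability $\pi_i$, is excluded from the sum. Also we know that $\sum_{j(\neq i) \in \Omega} \pi_{ij} = (n-1)\pi_i$. Using these results in (5.1.30) we have
$$V(\hat{Y}_{HT})_{SYG} = \sum_{i \in \Omega} \frac{Y_i^2}{\pi_i}(n - \pi_i) - \sum_{i \in \Omega} \frac{Y_i^2}{\pi_i^2}(n-1)\pi_i + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} (\pi_{ij} - \pi_i\pi_j)\frac{Y_i Y_j}{\pi_i\pi_j}$$
$$= \sum_{i \in \Omega} \frac{(1-\pi_i)}{\pi_i} Y_i^2 + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j} Y_i Y_j, \tag{5.1.31}$$
which is the variance of the Horvitz and Thompson (1952) estimator given in Theorem 5.1.4. Hence the theorem.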
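The equivalence of the two forms of the variance for a fixed sample size design can also be checked numerically. The following sketch assumes a hypothetical design with $N = 4$, $n = 2$ and unequal $p(s)$ (none of these numbers come from the text), and compares the Horvitz and Thompson form, the Sen--Yates--Grundy form, and the variance computed directly from its definition.

```python
from itertools import combinations

units = [0, 1, 2, 3]
Y = [10.0, 4.0, 2.0, 7.0]                       # hypothetical study values
samples = list(combinations(units, 2))           # fixed sample size n = 2
p = [0.30, 0.20, 0.10, 0.10, 0.10, 0.20]         # hypothetical p(s), summing to 1

pi = [sum(ps for s, ps in zip(samples, p) if i in s) for i in units]
pij = {(i, j): sum(ps for s, ps in zip(samples, p) if i in s and j in s)
       for i in units for j in units if i != j}
pairs = list(pij)                                # all ordered pairs (i, j), i != j

# Horvitz-Thompson form of the variance.
v_ht = (sum((1 - pi[i]) / pi[i] * Y[i] ** 2 for i in units)
        + sum((pij[i, j] - pi[i] * pi[j]) / (pi[i] * pi[j]) * Y[i] * Y[j]
              for i, j in pairs))

# Sen-Yates-Grundy form of the variance.
v_syg = 0.5 * sum((pi[i] * pi[j] - pij[i, j]) * (Y[i] / pi[i] - Y[j] / pi[j]) ** 2
                  for i, j in pairs)

# Variance from its definition: E(Y_hat - Y)^2 over the design (Y_hat is unbiased).
total = sum(Y)
v_def = sum(ps * (sum(Y[i] / pi[i] for i in s) - total) ** 2
            for s, ps in zip(samples, p))

print(v_ht, v_syg, v_def)
```

The agreement depends on the sample size being fixed, since the proof uses $\sum_{j \neq i}\pi_j = n - \pi_i$ and $\sum_{j \neq i}\pi_{ij} = (n-1)\pi_i$.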
Theorem 5.1.7. An unbiased estimator of the variance $V(\hat{Y}_{HT})$ of the Horvitz and Thompson (1952) estimator of the population total $Y$ is given by
$$\hat{v}(\hat{Y}_{HT}) = \sum_{i \in s} \frac{(1-\pi_i)}{\pi_i^2} y_i^2 + \sum_{i \in s}\sum_{j(\neq i) \in s} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_{ij}} \frac{y_i}{\pi_i}\frac{y_j}{\pi_j}. \tag{5.1.32}$$
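Proof. We know that the variance of the Horvitz and Thompson (1952) estimator of the population total, $Y$, is given by
$$V(\hat{Y}_{HT}) = \sum_{i \in \Omega} \frac{(1-\pi_i)}{\pi_i} Y_i^2 + \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j} Y_i Y_j. \tag{5.1.33}$$
To estimate the first term, we look for constants $a_i$ such that
$$E\Big[\sum_{i \in s} \Big(\frac{1-\pi_i}{\pi_i}\Big) y_i^2 a_i\Big] = \sum_{i \in \Omega} \Big(\frac{1-\pi_i}{\pi_i}\Big) Y_i^2. \tag{5.1.35}$$
Now
$$E\Big[\sum_{i \in s} \Big(\frac{1-\pi_i}{\pi_i}\Big) y_i^2 a_i\Big] = E\Big[\sum_{i \in \Omega} \Big(\frac{1-\pi_i}{\pi_i}\Big) Y_i^2 a_i t_i\Big]$$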
(5.1.38)
(5.1.39)
(5.1.40)
(5.1.41)
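$$\sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \Big(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\Big) Y_i Y_j\, a_{ij}\pi_{ij} = \sum_{i \in \Omega}\sum_{j(\neq i) \in \Omega} \Big(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\Big) Y_i Y_j \tag{5.1.42}$$
or in other words $a_{ij} = 1/\pi_{ij}$. Therefore an estimator of the second term of (5.1.33) is given by
$$\sum_{i \in s}\sum_{j(\neq i) \in s} \Big(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_i\pi_j}\Big) y_i y_j \frac{1}{\pi_{ij}} = \sum_{i \in s}\sum_{j(\neq i) \in s} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_{ij}} \frac{y_i y_j}{\pi_i\pi_j}. \tag{5.1.43}$$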
$$\hat{v}_{SYG}(\hat{Y}_{HT}) = \frac{1}{2}\sum_{i \in s}\sum_{j(\neq i) \in s} \frac{(\pi_i\pi_j - \pi_{ij})}{\pi_{ij}}\Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2 \tag{5.1.45}$$
Proof. We know that the Sen--Yates--Grundy (1953) form of the variance of the
Horvitz and Thompson (1952) estimator of the population total Y is given by
(5.1.46)
(5.1.47)
(5.1.48)
(5.1.49)
(5.1.50)
The condition for the estimator of variance of the Horvitz and Thompson (1952)
estimator of population total Y to be non-negative is given by
(5.1.51)
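The behaviour of the two variance estimators, of the forms (5.1.32) and (5.1.45), can be explored by enumerating a small design. The sketch below assumes a hypothetical fixed-size design with $N = 4$, $n = 2$ and every $\pi_{ij} > 0$; it averages each estimator over the design and compares the result with the true variance. It also lists the samples, if any, for which the Sen--Yates--Grundy estimate is negative. The function names and all numbers are illustrative assumptions.

```python
from itertools import combinations

units = [0, 1, 2, 3]
Y = [10.0, 4.0, 2.0, 7.0]                       # hypothetical study values
samples = list(combinations(units, 2))           # fixed sample size n = 2
p = [0.30, 0.20, 0.10, 0.10, 0.10, 0.20]         # hypothetical design, every p(s) > 0

pi = [sum(ps for s, ps in zip(samples, p) if i in s) for i in units]
pij = {(i, j): sum(ps for s, ps in zip(samples, p) if i in s and j in s)
       for i in units for j in units if i != j}

def ht_total(s):
    return sum(Y[i] / pi[i] for i in s)

def v_ht(s):
    """Variance estimate of the Horvitz-Thompson form, computed from sample s."""
    first = sum((1 - pi[i]) / pi[i] ** 2 * Y[i] ** 2 for i in s)
    second = sum((pij[i, j] - pi[i] * pi[j]) / pij[i, j] * Y[i] * Y[j] / (pi[i] * pi[j])
                 for i in s for j in s if i != j)
    return first + second

def v_syg(s):
    """Variance estimate of the Sen-Yates-Grundy form, computed from sample s."""
    return 0.5 * sum((pi[i] * pi[j] - pij[i, j]) / pij[i, j]
                     * (Y[i] / pi[i] - Y[j] / pi[j]) ** 2
                     for i in s for j in s if i != j)

total = sum(Y)
true_var = sum(ps * (ht_total(s) - total) ** 2 for s, ps in zip(samples, p))
mean_v_ht = sum(ps * v_ht(s) for s, ps in zip(samples, p))
mean_v_syg = sum(ps * v_syg(s) for s, ps in zip(samples, p))
negative_syg = [s for s in samples if v_syg(s) < 0]

print(true_var, mean_v_ht, mean_v_syg)
print("samples with negative SYG estimate:", negative_syg)
```

Both averages agree with the true variance only because every $\pi_{ij}$ is positive in this design; with some $\pi_{ij} = 0$ the division by $\pi_{ij}$ is undefined for the affected pairs.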
( a ) Find the first order and second order inclusion probabilities provided that all samples have an equal chance of selection, that is, $p(s_t) = 1/10$, $\forall\, t$.
( b ) Compare your results with the SRSWOR sampling scheme.
( c ) If $p(s_1) = 0.50$, $p(s_2) = 0.30$, $p(s_3) = 0.20$ and $p(s_t) = 0.00$ for $t = 4, 5, 6, 7, 8, 9, 10$, then find the new first order and second order inclusion probabilities.
Solution. ( a ) When all samples have an equal chance of selection, that is, $p(s_t) = 0.1$ $\forall\, t$, then the first order inclusion probabilities $\pi_i$, $i = 1, 2, \ldots, 5$, are given by
$\pi_1$ = P[First unit A from the population is included in the sample] = $p(s_1) + p(s_2) + p(s_3) + p(s_4) + p(s_5) + p(s_6)$ = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 = 0.6,
and similarly for the other units.
The second order inclusion probabilities $\pi_{ij}$, $i \neq j = 1, 2, 3, 4, 5$ (note that $\pi_{ij} = \pi_{ji}$), are given by
$\pi_{12}$ = P[First and Second units A and B from the population are included in the sample] = $p(s_1) + p(s_2) + p(s_3)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{13}$ = P[First and Third units A and C from the population are included in the sample] = $p(s_1) + p(s_4) + p(s_5)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{14}$ = P[First and Fourth units A and D from the population are included in the sample] = $p(s_2) + p(s_4) + p(s_6)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{15}$ = P[First and Fifth units A and E from the population are included in the sample] = $p(s_3) + p(s_5) + p(s_6)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{23}$ = P[Second and Third units B and C from the population are included in the sample] = $p(s_1) + p(s_7) + p(s_8)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{24}$ = P[Second and Fourth units B and D from the population are included in the sample] = $p(s_2) + p(s_7) + p(s_9)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{25}$ = P[Second and Fifth units B and E from the population are included in the sample] = $p(s_3) + p(s_8) + p(s_9)$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{34}$ = P[Third and Fourth units C and D from the population are included in the sample] = $p(s_4) + p(s_7) + p(s_{10})$ = 0.1 + 0.1 + 0.1 = 0.3,
$\pi_{35}$ = P[Third and Fifth units C and E from the population are included in the sample] = $p(s_5) + p(s_8) + p(s_{10})$ = 0.1 + 0.1 + 0.1 = 0.3,
and
$\pi_{45}$ = P[Fourth and Fifth units D and E from the population are included in the sample] = $p(s_6) + p(s_9) + p(s_{10})$ = 0.1 + 0.1 + 0.1 = 0.3.
Thus we observe that if all the samples have the same chance of selection then $\pi_i = 0.6$, $\forall\, i = 1, 2, \ldots, 5$, and $\pi_{ij} = 0.3$, $\forall\, i \neq j = 1, 2, \ldots, 5$.
( b ) Under SRSWOR sampling with $N = 5$ and $n = 3$ we have $\pi_i = n/N = 0.6$ and $\pi_{ij} = n(n-1)/\{N(N-1)\} = 0.3$. Thus if all samples have the same chance of selection then the PPSWOR and SRSWOR sampling schemes are equivalent.
( c ) In this case we have
$\pi_1$ = P[First unit A from the population is included in the sample] = $p(s_1) + p(s_2) + p(s_3) + p(s_4) + p(s_5) + p(s_6)$ = 0.5 + 0.3 + 0.2 + 0.0 + 0.0 + 0.0 = 1.0,
$\pi_2$ = P[Second unit B from the population is included in the sample] = $p(s_1) + p(s_2) + p(s_3) + p(s_7) + p(s_8) + p(s_9)$ = 0.5 + 0.3 + 0.2 + 0.0 + 0.0 + 0.0 = 1.0,
$\pi_3$ = P[Third unit C from the population is included in the sample] = $p(s_1) + p(s_4) + p(s_5) + p(s_7) + p(s_8) + p(s_{10})$ = 0.5 + 0.0 + 0.0 + 0.0 + 0.0 + 0.0 = 0.5,
$\pi_4$ = P[Fourth unit D from the population is included in the sample] = $p(s_2) + p(s_4) + p(s_6) + p(s_7) + p(s_9) + p(s_{10})$ = 0.3 + 0.0 + 0.0 + 0.0 + 0.0 + 0.0 = 0.3,
and
$\pi_5$ = P[Fifth unit E from the population is included in the sample] = $p(s_3) + p(s_5) + p(s_6) + p(s_8) + p(s_9) + p(s_{10})$ = 0.2 + 0.0 + 0.0 + 0.0 + 0.0 + 0.0 = 0.2.
The second order inclusion probabilities $\pi_{ij}$, $i \neq j = 1, 2, 3, 4, 5$ (note that $\pi_{ij} = \pi_{ji}$), are given by
$\pi_{12}$ = P[First and Second units A and B from the population are included in the sample] = $p(s_1) + p(s_2) + p(s_3)$ = 0.5 + 0.3 + 0.2 = 1.0,
$\pi_{13}$ = P[First and Third units A and C from the population are included in the sample] = $p(s_1) + p(s_4) + p(s_5)$ = 0.5 + 0.0 + 0.0 = 0.5,
$\pi_{14}$ = P[First and Fourth units A and D from the population are included in the sample] = $p(s_2) + p(s_4) + p(s_6)$ = 0.3 + 0.0 + 0.0 = 0.3,
$\pi_{15}$ = P[First and Fifth units A and E from the population are included in the sample] = $p(s_3) + p(s_5) + p(s_6)$ = 0.2 + 0.0 + 0.0 = 0.2,
$\pi_{23}$ = P[Second and Third units B and C from the population are included in the sample] = $p(s_1) + p(s_7) + p(s_8)$ = 0.5 + 0.0 + 0.0 = 0.5,
$\pi_{24}$ = P[Second and Fourth units B and D from the population are included in the sample] = $p(s_2) + p(s_7) + p(s_9)$ = 0.3 + 0.0 + 0.0 = 0.3,
$\pi_{25}$ = P[Second and Fifth units B and E from the population are included in the sample] = $p(s_3) + p(s_8) + p(s_9)$ = 0.2 + 0.0 + 0.0 = 0.2,
$\pi_{34}$ = P[Third and Fourth units C and D from the population are included in the sample] = $p(s_4) + p(s_7) + p(s_{10})$ = 0.0 + 0.0 + 0.0 = 0.0,
$\pi_{35}$ = P[Third and Fifth units C and E from the population are included in the sample] = $p(s_5) + p(s_8) + p(s_{10})$ = 0.0 + 0.0 + 0.0 = 0.0,
and
$\pi_{45}$ = P[Fourth and Fifth units D and E from the population are included in the sample] = $p(s_6) + p(s_9) + p(s_{10})$ = 0.0 + 0.0 + 0.0 = 0.0.
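The calculations in parts ( a ) and ( c ) can be reproduced with a few lines of code. The sketch below assumes that the ten samples $s_1, \ldots, s_{10}$ are the ten possible samples of three units from {A, B, C, D, E} listed in lexicographic order (ABC, ABD, ABE, ACD, ...); this ordering is inferred from the sums above and is an assumption, as is the helper name `inclusion`.

```python
from itertools import combinations

units = "ABCDE"
# Assumed ordering of s1..s10: ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE.
samples = ["".join(c) for c in combinations(units, 3)]

def inclusion(p):
    """Return first and second order inclusion probabilities for sample probabilities p."""
    pi = {u: sum(ps for s, ps in zip(samples, p) if u in s) for u in units}
    pij = {u + v: sum(ps for s, ps in zip(samples, p) if u in s and v in s)
           for u, v in combinations(units, 2)}
    return pi, pij

pi_a, pij_a = inclusion([0.1] * 10)                     # part (a): equal p(s_t)
pi_c, pij_c = inclusion([0.5, 0.3, 0.2] + [0.0] * 7)    # part (c): unequal p(s_t)

print("(a)", pi_a, pij_a)
print("(c)", pi_c, pij_c)
```

In part ( c ) three of the second order inclusion probabilities are zero, which matters later when the variance of the estimator has to be estimated from a single sample.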
Example 5.1.2. John and Michael were appointed to select three players ($n = 3$) out of five players ($N = 5$) from the list $\Omega$ = {Amy, Bob, Chris, Don, Eric} with their scores 125, 126, 128, 90 and 127, respectively.
( I ) John considers only the four samples $s_1$ = {Amy, Bob, Chris}, $s_2$ = {Bob, Chris, Don}, $s_3$ = {Chris, Don, Eric} and $s_4$ = {Amy, Don, Eric}
such that $p(s_1) = p(s_2) = p(s_3) = p(s_4) = 0.25$.
( a ) Find the first order inclusion probabilities for John's sampling scheme.
( b ) Find the estimates of total score from each sample using John's sampling.
( c ) Find the bias in John's sampling scheme.
( d ) Find the second order inclusion probabilities for John's sampling scheme.
( e ) Find the variance of John's sampling scheme using the Sen--Yates--Grundy formula.
( f ) Find the variance of John's sampling plan using the usual formula.
( g ) Find the variance of John's sampling plan using the definition of variance.
( h ) Are the three variances equal for John's sampling scheme?
( II ) Michael likes Amy and cleverly suggests the following changes in John's sampling scheme: $p(s_1) = 0.50$, $p(s_2) = 0.00$, $p(s_3) = 0.00$ and $p(s_4) = 0.50$.
( i ) Find the first order inclusion probabilities for Michael's sampling scheme.
( j ) Find the estimates of total score from each sample using Michael's sampling.
( k ) Find the bias in Michael's sampling scheme.
( l ) Find the second order inclusion probabilities for Michael's sampling scheme.
( m ) Find the variance of Michael's sampling scheme using the Sen--Yates--Grundy formula.
( n ) Find the variance of Michael's sampling plan using the usual formula.
( o ) Find the variance of Michael's sampling plan using the definition of variance.
( p ) Are the three variances equal for Michael's sampling scheme?
( III ) Discussion on John's and Michael's schemes:
( q ) Find the relative efficiency of Michael's sampling scheme over John's sampling.
( r ) Would you like to comment on the results?
( a ) The first order inclusion probabilities for John's sampling scheme are
$\pi_1$ = P[Amy is included in the sample] = $p(s_1) + p(s_4)$ = 0.25 + 0.25 = 0.50,
$\pi_2$ = P[Bob is included in the sample] = $p(s_1) + p(s_2)$ = 0.25 + 0.25 = 0.50,
$\pi_3$ = P[Chris is included in the sample] = $p(s_1) + p(s_2) + p(s_3)$ = 0.25 + 0.25 + 0.25 = 0.75,
$\pi_4$ = P[Don is included in the sample] = $p(s_2) + p(s_3) + p(s_4)$ = 0.25 + 0.25 + 0.25 = 0.75,
and
$\pi_5$ = P[Eric is included in the sample] = $p(s_3) + p(s_4)$ = 0.25 + 0.25 = 0.50.
Note that $\sum_{i=1}^{5} \pi_i = 3$.
( b ) Let $\hat{Y}_{HT(t)}$, $t = 1, 2, 3, 4$, be the estimates of total score based on the first, second, third and fourth sample respectively. Then for John's scheme we have
$$\hat{Y}_{HT(1)} = \sum_{i \in s_1} \frac{y_i}{\pi_i} = \frac{\text{Amy}}{\pi_1} + \frac{\text{Bob}}{\pi_2} + \frac{\text{Chris}}{\pi_3} = \frac{125}{0.50} + \frac{126}{0.50} + \frac{128}{0.75} = 672.667,$$
$$\hat{Y}_{HT(2)} = \sum_{i \in s_2} \frac{y_i}{\pi_i} = \frac{\text{Bob}}{\pi_2} + \frac{\text{Chris}}{\pi_3} + \frac{\text{Don}}{\pi_4} = \frac{126}{0.50} + \frac{128}{0.75} + \frac{90}{0.75} = 542.667,$$
$$\hat{Y}_{HT(3)} = \sum_{i \in s_3} \frac{y_i}{\pi_i} = \frac{\text{Chris}}{\pi_3} + \frac{\text{Don}}{\pi_4} + \frac{\text{Eric}}{\pi_5} = \frac{128}{0.75} + \frac{90}{0.75} + \frac{127}{0.50} = 544.667,$$
and
$$\hat{Y}_{HT(4)} = \sum_{i \in s_4} \frac{y_i}{\pi_i} = \frac{\text{Amy}}{\pi_1} + \frac{\text{Don}}{\pi_4} + \frac{\text{Eric}}{\pi_5} = \frac{125}{0.50} + \frac{90}{0.75} + \frac{127}{0.50} = 624.000.$$
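( c ) The bias in John's sampling scheme: Note that the true total score of the five players is
$$Y = \sum_{i=1}^{5} Y_i = \text{Amy} + \text{Bob} + \text{Chris} + \text{Don} + \text{Eric} = 125 + 126 + 128 + 90 + 127 = 596.$$
Now the bias in John's sampling scheme is given by
$$B(\hat{Y}_{HT})_{John} = \sum_{t=1}^{4} p(s_t)\hat{Y}_{HT(t)} - Y = 0.25(672.667 + 542.667 + 544.667 + 624.000) - 596 = 0.$$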
( d ) The second order inclusion probabilities for John's sampling scheme are
$\pi_{12}$ = P[Amy and Bob are included in the sample] = $p(s_1)$ = 0.25,
$\pi_{13}$ = P[Amy and Chris are included in the sample] = $p(s_1)$ = 0.25,
$\pi_{14}$ = P[Amy and Don are included in the sample] = $p(s_4)$ = 0.25,
$\pi_{15}$ = P[Amy and Eric are included in the sample] = $p(s_4)$ = 0.25,
$\pi_{23}$ = P[Bob and Chris are included in the sample] = $p(s_1) + p(s_2)$ = 0.25 + 0.25 = 0.50,
$\pi_{24}$ = P[Bob and Don are included in the sample] = $p(s_2)$ = 0.25,
$\pi_{25}$ = P[Bob and Eric are included in the sample] = 0.00,
$\pi_{34}$ = P[Chris and Don are included in the sample] = $p(s_2) + p(s_3)$ = 0.25 + 0.25 = 0.50,
$\pi_{35}$ = P[Chris and Eric are included in the sample] = $p(s_3)$ = 0.25,
and
$\pi_{45}$ = P[Don and Eric are included in the sample] = $p(s_3) + p(s_4)$ = 0.25 + 0.25 = 0.50.
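( e ) The variance of John's sampling plan using the Sen--Yates--Grundy formula is
$$V(\hat{Y}_{HT})_{SYG} = \sum_{i < j} (\pi_i\pi_j - \pi_{ij})\Big(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\Big)^2$$
$$= (0.50 \times 0.50 - 0.25)\Big(\frac{125}{0.50} - \frac{126}{0.50}\Big)^2 + (0.50 \times 0.75 - 0.25)\Big(\frac{125}{0.50} - \frac{128}{0.75}\Big)^2$$
$$+ (0.50 \times 0.75 - 0.25)\Big(\frac{125}{0.50} - \frac{90}{0.75}\Big)^2 + (0.50 \times 0.50 - 0.25)\Big(\frac{125}{0.50} - \frac{127}{0.50}\Big)^2$$
$$+ (0.50 \times 0.75 - 0.50)\Big(\frac{126}{0.50} - \frac{128}{0.75}\Big)^2 + (0.50 \times 0.75 - 0.25)\Big(\frac{126}{0.50} - \frac{90}{0.75}\Big)^2$$
$$+ (0.50 \times 0.50 - 0.00)\Big(\frac{126}{0.50} - \frac{127}{0.50}\Big)^2 + (0.75 \times 0.75 - 0.50)\Big(\frac{128}{0.75} - \frac{90}{0.75}\Big)^2$$
$$+ (0.75 \times 0.50 - 0.25)\Big(\frac{128}{0.75} - \frac{127}{0.50}\Big)^2 + (0.75 \times 0.50 - 0.50)\Big(\frac{90}{0.75} - \frac{127}{0.50}\Big)^2$$
$$= 3035.333.$$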
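( f ) The variance of John's sampling plan using the usual formula is given by
$$V(\hat{Y}_{HT})_{John} = \Big(\frac{1-\pi_1}{\pi_1}\Big)Y_1^2 + \Big(\frac{1-\pi_2}{\pi_2}\Big)Y_2^2 + \Big(\frac{1-\pi_3}{\pi_3}\Big)Y_3^2 + \Big(\frac{1-\pi_4}{\pi_4}\Big)Y_4^2 + \Big(\frac{1-\pi_5}{\pi_5}\Big)Y_5^2 + \sum_{i}\sum_{j \neq i} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j} Y_i Y_j$$
$$= \frac{(1-0.50)}{0.50}(125)^2 + \frac{(1-0.50)}{0.50}(126)^2 + \frac{(1-0.75)}{0.75}(128)^2 + \frac{(1-0.75)}{0.75}(90)^2 + \frac{(1-0.50)}{0.50}(127)^2 + \sum_{i}\sum_{j \neq i} \frac{(\pi_{ij} - \pi_i\pi_j)}{\pi_i\pi_j} Y_i Y_j$$
$$= 3035.333.$$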
( i ) The first order inclusion probabilities for Michael's sampling scheme are
$\pi_1$ = P[Amy is included in the sample] = $p(s_1) + p(s_4)$ = 0.50 + 0.50 = 1.00,
$\pi_2$ = P[Bob is included in the sample] = $p(s_1) + p(s_2)$ = 0.50 + 0.00 = 0.50,
$\pi_3$ = P[Chris is included in the sample] = $p(s_1) + p(s_2) + p(s_3)$ = 0.50 + 0.00 + 0.00 = 0.50,
$\pi_4$ = P[Don is included in the sample] = $p(s_2) + p(s_3) + p(s_4)$ = 0.00 + 0.00 + 0.50 = 0.50,
and
$\pi_5$ = P[Eric is included in the sample] = $p(s_3) + p(s_4)$ = 0.00 + 0.50 = 0.50.
Note that $\sum_{i=1}^{5} \pi_i = 3$.
( j ) Let $\hat{Y}_{HT(t)}$, $t = 1, 2, 3, 4$, be the estimates of total score based on the first, second, third and fourth sample respectively. Then for Michael's scheme we have
$$\hat{Y}_{HT(1)} = \sum_{i \in s_1} \frac{y_i}{\pi_i} = \frac{\text{Amy}}{\pi_1} + \frac{\text{Bob}}{\pi_2} + \frac{\text{Chris}}{\pi_3} = \frac{125}{1.00} + \frac{126}{0.50} + \frac{128}{0.50} = 633.000,$$
Thus Michael's sampling scheme is also unbiased for estimating the total score.
( l ) The second order inclusion probabilities for Michael's sampling scheme are
$\pi_{12}$ = P[Amy and Bob are included in the sample] = $p(s_1)$ = 0.50,
$\pi_{13}$ = P[Amy and Chris are included in the sample] = $p(s_1)$ = 0.50,
$\pi_{14}$ = P[Amy and Don are included in the sample] = $p(s_4)$ = 0.50,
$\pi_{15}$ = P[Amy and Eric are included in the sample] = $p(s_4)$ = 0.50,
$\pi_{23}$ = P[Bob and Chris are included in the sample] = $p(s_1) + p(s_2)$ = 0.50 + 0.00 = 0.50,
$\pi_{24}$ = P[Bob and Don are included in the sample] = $p(s_2)$ = 0.00,
$\pi_{25}$ = P[Bob and Eric are included in the sample] = 0.00,
$\pi_{34}$ = P[Chris and Don are included in the sample] = $p(s_2) + p(s_3)$ = 0.00 + 0.00 = 0.00,
$\pi_{35}$ = P[Chris and Eric are included in the sample] = $p(s_3)$ = 0.00,
and
$\pi_{45}$ = P[Don and Eric are included in the sample] = $p(s_3) + p(s_4)$ = 0.00 + 0.50 = 0.50.
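( m ) The variance of Michael's sampling scheme using the Sen--Yates--Grundy formula is
$$V(\hat{Y}_{HT})_{SYG} = (1.00 \times 0.50 - 0.50)\Big(\frac{125}{1.00} - \frac{126}{0.50}\Big)^2 + (1.00 \times 0.50 - 0.50)\Big(\frac{125}{1.00} - \frac{128}{0.50}\Big)^2$$
$$+ (1.00 \times 0.50 - 0.50)\Big(\frac{125}{1.00} - \frac{90}{0.50}\Big)^2 + (1.00 \times 0.50 - 0.50)\Big(\frac{125}{1.00} - \frac{127}{0.50}\Big)^2$$
$$+ (0.50 \times 0.50 - 0.50)\Big(\frac{126}{0.50} - \frac{128}{0.50}\Big)^2 + (0.50 \times 0.50 - 0.00)\Big(\frac{126}{0.50} - \frac{90}{0.50}\Big)^2$$
$$+ (0.50 \times 0.50 - 0.00)\Big(\frac{126}{0.50} - \frac{127}{0.50}\Big)^2 + (0.50 \times 0.50 - 0.00)\Big(\frac{128}{0.50} - \frac{90}{0.50}\Big)^2$$
$$+ (0.50 \times 0.50 - 0.00)\Big(\frac{128}{0.50} - \frac{127}{0.50}\Big)^2 + (0.50 \times 0.50 - 0.50)\Big(\frac{90}{0.50} - \frac{127}{0.50}\Big)^2$$
$$= 1369.000.$$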
( n ) The variance of Michael's sampling plan using the usual formula is given by
$$V(\hat{Y}_{HT})_{Michael} = \frac{(1-1.00)}{1.00}(125)^2 + \frac{(1-0.50)}{0.50}(126)^2 + \frac{(1-0.50)}{0.50}(128)^2 + \frac{(1-0.50)}{0.50}(90)^2 + \frac{(1-0.50)}{0.50}(127)^2$$
$$+ 2\Big[\frac{(0.50 - 1.00 \times 0.50)}{1.00 \times 0.50}(125 \times 126) + \frac{(0.50 - 1.00 \times 0.50)}{1.00 \times 0.50}(125 \times 128) + \cdots\Big]$$
If John did not consider all possible samples, we cannot do anything. If he had considered all possible samples, then we do not know how much probability he would have assigned to the different samples, or how Michael would have reacted. We may get an answer to this question by doing an unsolved practical at the end of this chapter, so let us think more here!
( ii ) What is the correlation between the inclusion probabilities and the study variable? This seems to be a good point. We observed that the correlation coefficient between John's first order inclusion probabilities and the study variable is negative, that is $\rho_{\pi y}(\text{John}) = -0.569$, and that for Michael's sampling scheme is positive, that is $\rho_{\pi y}(\text{Michael}) = +0.198$.
Caution: While using PPSWOR sampling, we should make sure that the inclusion probabilities have a positive correlation with the study variable. Note that estimation of the variance from each sample under John's sampling scheme is possible, but remains biased, because $\pi_{25} = 0$. Further note that estimation of the variance from each sample under Michael's sampling scheme is not possible, because certain useful second order inclusion probabilities are zero.
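The whole of Example 5.1.2 can be recomputed with a short script. The sketch below assumes that the four samples are {Amy, Bob, Chris}, {Bob, Chris, Don}, {Chris, Don, Eric} and {Amy, Don, Eric}, as implied by the inclusion probabilities worked out above; the function names `analyse` and `pearson` are illustrative. The relative efficiency is taken here as the ratio of John's variance to Michael's variance, in percent, which is an assumption about the intended definition.

```python
from itertools import combinations
from math import sqrt

names = ["Amy", "Bob", "Chris", "Don", "Eric"]
score = dict(zip(names, [125, 126, 128, 90, 127]))
# Samples s1..s4 as inferred from the inclusion probabilities in the solution.
samples = [("Amy", "Bob", "Chris"), ("Bob", "Chris", "Don"),
           ("Chris", "Don", "Eric"), ("Amy", "Don", "Eric")]
john = [0.25, 0.25, 0.25, 0.25]
michael = [0.50, 0.00, 0.00, 0.50]

def analyse(p):
    pi = {u: sum(ps for s, ps in zip(samples, p) if u in s) for u in names}
    pij = {(u, v): sum(ps for s, ps in zip(samples, p) if u in s and v in s)
           for u, v in combinations(names, 2)}
    est = [sum(score[u] / pi[u] for u in s) for s in samples]   # HT estimate per sample
    expected = sum(ps * e for ps, e in zip(p, est))
    bias = expected - sum(score.values())
    var_def = sum(ps * (e - expected) ** 2 for ps, e in zip(p, est))
    # Sen-Yates-Grundy form, summing over unordered pairs (no factor 1/2 needed).
    var_syg = sum((pi[u] * pi[v] - pij[u, v])
                  * (score[u] / pi[u] - score[v] / pi[v]) ** 2
                  for u, v in combinations(names, 2))
    zero_pairs = [pair for pair, val in pij.items() if val == 0]
    return {"pi": pi, "estimates": est, "bias": bias,
            "var_def": var_def, "var_syg": var_syg, "zero_pairs": zero_pairs}

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

res_john, res_michael = analyse(john), analyse(michael)
y = [score[u] for u in names]
corr_john = pearson([res_john["pi"][u] for u in names], y)
corr_michael = pearson([res_michael["pi"][u] for u in names], y)
rel_eff = 100 * res_john["var_def"] / res_michael["var_def"]

for label, res in (("John", res_john), ("Michael", res_michael)):
    print(label, res["estimates"], res["bias"], res["var_def"], res["var_syg"])
    print(label, "pairs with pi_ij = 0:", res["zero_pairs"])
print(corr_john, corr_michael, rel_eff)
```

The list of pairs with $\pi_{ij} = 0$ shows directly why the variance cannot be estimated without bias from a single sample under either scheme.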
Moral: Note that three of the candidates (Bob, Chris and Eric) have higher scores than Amy, and John did not consider them together. Michael took advantage because John was trying to break the merit. Message for future generations: "Do not attempt to break a merit, and be honest if you get a chance to be an administrator, as otherwise someone like Michael may take advantage of your limitation."
The next example has been taken from Ghosh (1998). It compares the two estimators of the variance of the Horvitz and Thompson (1952) estimator.
Example 5.1.3. A computer salesman wishes to estimate the number of left-handed students in a town having eight elementary schools. The salesman must make a decision to ensure that enough left-handed mice are ordered to accommodate his expected sales. He used prior information on the number of registered students to select a sample of $n = 3$ units by using PPSWOR sampling. The information collected by him is given in the following table:
Sampled unit i    y_i    pi_i    pi_ij (j ≠ i)
1                 10     0.45    0.20   0.20
2                  4     0.40    0.20   0.20
3                  2     0.50    0.20   0.20
Discuss two 75% confidence intervals based on the two estimators of variance of the Horvitz and Thompson estimator.
Solution: An estimate of the total number of left-handed students in all schools of the town is given by
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i} = \frac{y_1}{\pi_1} + \frac{y_2}{\pi_2} + \frac{y_3}{\pi_3} = \frac{10}{0.45} + \frac{4}{0.40} + \frac{2}{0.50} = 36.2 \approx 36.$$
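Now the variance of the estimator $\hat{Y}_{HT}$ can be estimated in two different ways:
$$\hat{v}_{SYG}(\hat{Y}_{HT}) = \frac{1}{2}\sum_{i \neq j \in s} \Big[\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\Big]\Big[\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big]^2 = \sum_{i < j \in s} \Big[\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\Big]\Big[\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big]^2$$
$$= \Big(\frac{0.45 \times 0.4 - 0.2}{0.2}\Big)\Big(\frac{10}{0.45} - \frac{4}{0.4}\Big)^2 + \Big(\frac{0.45 \times 0.5 - 0.2}{0.2}\Big)\Big(\frac{10}{0.45} - \frac{2}{0.5}\Big)^2 + \Big(\frac{0.4 \times 0.5 - 0.2}{0.2}\Big)\Big(\frac{4}{0.4} - \frac{2}{0.5}\Big)^2.$$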
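The two variance estimates for this sample can be evaluated as follows. The sketch assumes the sample values $y = (10, 4, 2)$, $\pi = (0.45, 0.40, 0.50)$ and $\pi_{ij} = 0.20$ for every pair, as read from the table and the calculations above; the variable names are illustrative.

```python
from itertools import combinations

y = [10, 4, 2]
pi = [0.45, 0.40, 0.50]
pij = {(0, 1): 0.20, (0, 2): 0.20, (1, 2): 0.20}   # assumed equal for all pairs
pairs = list(combinations(range(3), 2))

# Horvitz-Thompson estimate of the total.
y_ht = sum(yi / p for yi, p in zip(y, pi))

# Sen-Yates-Grundy estimator of variance (sum over unordered pairs).
v_syg = sum((pi[i] * pi[j] - pij[i, j]) / pij[i, j]
            * (y[i] / pi[i] - y[j] / pi[j]) ** 2 for i, j in pairs)

# Horvitz-Thompson estimator of variance (cross terms counted twice).
v_ht = (sum((1 - pi[i]) / pi[i] ** 2 * y[i] ** 2 for i in range(3))
        + 2 * sum((pij[i, j] - pi[i] * pi[j]) / (pij[i, j] * pi[i] * pi[j])
                  * y[i] * y[j] for i, j in pairs))

print(round(y_ht, 3), round(v_syg, 3), round(v_ht, 3))
```

With these inputs the two variance estimates differ considerably, so confidence intervals built from them would have quite different widths.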
the circus is $\hat{Y} = 50w$. Later a circus statistician suggests that the owner should apply random sampling in place of purposive sampling. Both the owner and the statistician decide to use a random sampling device which gives a 99/100 chance of selection to Sambo and a 1/4900 chance of selection to each of the remaining 49 elephants in the circus. Thus the first order inclusion probabilities are given by
$$\pi_i = \begin{cases} \dfrac{99}{100} & \text{if Sambo is included in the sample,} \\[2mm] \dfrac{1}{4900} & \text{otherwise.} \end{cases}$$
Clearly $\sum_{i=1}^{50} \pi_i = 1$ indicates selection of a sample of only one unit. Owing to the high probability of selection, Sambo is selected in the sample and the circus statistician reports an estimate of the total weight of the 50 elephants as
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i} = \frac{100}{99} w$$
which is approximately the weight of Sambo. Then the owner asks the statistician, "If a large elephant named Jumbo had been selected, what would have been the estimate of total weight?"
Let $W$ be the weight of Jumbo. Then the Horvitz and Thompson estimate of the total weight of all elephants is given by
$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i} = 4900W$$
which clearly is an overestimate of the true weight of the 50 elephants. The main mistake made by the circus statistician was to give a small selection probability to Jumbo and a large selection probability to Sambo. This ignores the fact that PPS sampling works only if the correlation between the selection probabilities and the study variable is positive and high. In fact the selection method used by the circus statistician shows that he might be a circus clown instead of a statistician, or that he may not have understood the meaning of Horvitz and Thompson (1952). Thus the aim of Basu (1971) is to show that if some wrong prior (or Bayes) information is used in the estimation process the results may be seriously misleading, which gives us a caution while using Bayes estimates. Let us make the circus statistician's problem clear with the help of the following examples.
Example 5.1.4. Suppose there are 10 elephants in a circus and their weights and diets are given in the following table:
Elephant No.     1      2      3      4      5     6     7     8     9     10
Name             Jumbo  Jumbo  Sambo  Sambo  Niko  Niko  Niko  Niko  Niko  Niko
Weight (kg)      5000   5000   1000   1000   500   500   500   500   500   500
Diet (kg)        300    250    75     75     50    50    50    50    50    50
Based on a sample of one elephant, estimate the total weight of the 10 elephants using the Horvitz and Thompson estimator.
Solution. In this case we are considering a sample of $n = 1$ unit. Then we can consider that the inclusion probabilities are the same as the selection probabilities because $\sum_{i \in \Omega} \pi_i = \sum_{i=1}^{N} P_i = n = 1$. Assuming that the diet of each elephant is known, the selection probabilities are given by
$$P_i = \frac{x_i}{X}, \quad i = 1, 2, \ldots, 10,$$
where $x_i$ is the diet of the $i$th elephant and $X = \sum_{i=1}^{N} x_i = 1000$ kg is the total diet, so that $P_1 = 0.30$, $P_2 = 0.25$, $P_3 = P_4 = 0.075$ and $P_5 = \cdots = P_{10} = 0.050$.
Now if the first elephant Jumbo is selected in the sample, the Horvitz and Thompson estimate of the total weight of the 10 elephants is given by
$$\hat{Y}_{HT(1)} = \frac{5000}{0.30} = 16666.67 \text{ kg}.$$
Similarly, if the second, third or fifth elephant, that is Jumbo, Sambo or Niko, is included in the sample, then the respective estimates of the weight of all elephants are given by
$$\hat{Y}_{HT(2)} = \frac{5000}{0.25} = 20000 \text{ kg}, \quad \hat{Y}_{HT(3)} = \frac{1000}{0.075} = 13333.33 \text{ kg}, \quad \text{and} \quad \hat{Y}_{HT(5)} = \frac{500}{0.050} = 10000 \text{ kg}.$$
The true weight of the 10 elephants is given by $Y = 15000$ kg. Thus one can easily observe that in Basu's (1971) example, the Horvitz and Thompson (1952) estimator was correct. Rather, it was misused by the circus statistician by assigning incorrect inclusion probabilities. The circus statistician's problem is illustrated in the next example.
Example 5.1.5. Suppose there are 10 elephants in a circus and their weights and diets are given in the following table:
Elephant No.     1      2      3      4      5     6     7     8     9     10
Name             Jumbo  Jumbo  Sambo  Sambo  Niko  Niko  Niko  Niko  Niko  Niko
Weight (kg)      5000   5000   1000   1000   500   500   500   500   500   500
Diet (kg)        1      1      81     1      1     1     1     1     1     1
Based on a sample of one elephant, estimate the total weight of the 10 elephants using the Horvitz and Thompson estimator.
Assuming that the diet of each elephant is known, the selection probabilities are $P_i = x_i/X$ with total diet $X = 90$ kg, so that $P_3 = 81/90$ and $P_i = 1/90$ for every other elephant.
Now if the first elephant Jumbo is selected in the sample, the Horvitz and Thompson estimate of the total weight of the 10 elephants is given by
$$\hat{Y}_{HT(1)} = \frac{5000}{1/90} = 450000 \text{ kg}.$$
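The contrast between Examples 5.1.4 and 5.1.5 can be seen by listing the estimate produced by every possible one-unit sample. The sketch below uses the weights and diets from the two tables; the function names `ht_estimates` and `summary` are illustrative, and the standard deviation over the design is added here only as a rough measure of spread.

```python
weights = [5000, 5000, 1000, 1000, 500, 500, 500, 500, 500, 500]
diet_514 = [300, 250, 75, 75, 50, 50, 50, 50, 50, 50]   # Example 5.1.4
diet_515 = [1, 1, 81, 1, 1, 1, 1, 1, 1, 1]               # Example 5.1.5

def ht_estimates(y, x):
    """Selection probabilities P_i = x_i / sum(x) and the HT estimate y_i / P_i
    obtained when unit i is the single sampled unit."""
    total_x = sum(x)
    probs = [xi / total_x for xi in x]
    return probs, [yi / pr for yi, pr in zip(y, probs)]

def summary(y, x):
    probs, est = ht_estimates(y, x)
    expected = sum(pr * e for pr, e in zip(probs, est))
    sd = sum(pr * (e - expected) ** 2 for pr, e in zip(probs, est)) ** 0.5
    return est, expected, sd

est_514, mean_514, sd_514 = summary(weights, diet_514)
est_515, mean_515, sd_515 = summary(weights, diet_515)

print("5.1.4:", [round(e, 2) for e in est_514], round(mean_514, 2), round(sd_514, 2))
print("5.1.5:", [round(e, 2) for e in est_515], round(mean_515, 2), round(sd_515, 2))
```

In both cases the estimator is unbiased for the true total of 15000 kg, but the individual estimates are close to the truth only when the selection probabilities are roughly proportional to the weights.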
The true weight of the 10 elephants is only $Y = 15000$ kg. As in Basu (1971), the total weight of the elephants is highly overestimated. The circus statistician never considered that Jumbo's diet was heavier than Sambo's diet. Thompson (1997) also accepted that the circus statistician lost his job and perhaps became a teacher of statistics, but he cannot be a teacher. Brewer (2002) attempted to weigh elephants under SRSWOR sampling by taking a sample of 5 elephants, but Basu's (1971) case is very serious. No doubt a sample of 5 units may represent a heterogeneous population of 50 units, and it may be possible to weigh an elephant on a spring balance, but how can a sample of one unit represent a heterogeneous population? There is a very basic assumption in sampling that the sample has to be random and representative of the population. If the circus statistician does not know about these two requirements of sampling theory, then it would have been better if he had become a clown rather than a teacher. Caution! The use of wrong Bayes information is more dangerous than using no information at all. A layman can understand this in simpler language as follows: Consider a policeman following a thief who is going north. On the way, the policeman receives a phone call from the police station saying that the thief went south, and so the policeman changes his direction towards the south. Owing to the inaccurate information from the police station, the policeman is now going away from the thief and will never reach him. Basu (1971) alerts survey statisticians to be careful while using Bayes estimates. More details about Basu's contribution to the foundations of survey sampling theory can be found in Meeden (1992) and in a monograph by Ghosh and Meeden (1997).
As we discussed in the previous chapters, ratio and regression type estimators under
simple random sampling have been studied by a number of researchers, including
Cochran (1963), Srivastava (1967), Reddy (1974), Gupta (1978), Vos (1980),
Srivenkataramana and Tracy (1980,1981), and Singh and Singh (1993a). Most of
them are special cases of the class of estimators proposed by Srivastava (1971) in
which the efficiency of the optimum estimators is the same as that of the linear
regression estimator. There are a number of estimators viz. Srivenkataramana and
Tracy (1980) and Ray and Singh (1981) estimators which do not belong to the
Srivastava (1971) class. These are more efficient than the optimum estimators in the
Srivastava (1971) class. Das and Tripathi (1980) have used the coefficient of
variation of the auxiliary variable to form a class of estimators which are more
efficient than linear regression estimator. Srivastava and Jhajj (1981) have proposed
a general class of estimators in which, along with the ratio of sample mean to
population mean of the auxiliary variable, the ratio of sample variance to population
variance of the auxiliary variable has also been used, and the optimum estimator of the proposed class was shown to be better than the linear regression estimator. For a general sampling design, an unbiased estimator of $S_x^2$ cannot be easily derived, but an unbiased estimator of $\sum_{i=1}^{N} X_i^r$ can be easily developed for any $r \geq 1$.
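Following Cassel, Sarndal, and Wretman (1977), under any sampling design, the population total can be estimated unbiasedly if and only if the first order inclusion probabilities $\pi_i$ are positive for all the units in the population. For any such sampling design, obviously
$$\hat{Y} = \sum_{i \in s} \frac{y_i}{\pi_i}, \quad \hat{X}_1 = \sum_{i \in s} \frac{x_i}{\pi_i} \quad \text{and} \quad \hat{X}_2 = \sum_{i \in s} \frac{x_i^2}{\pi_i}$$
are unbiased estimators of $Y = \sum_{i=1}^{N} Y_i$, $X_1 = \sum_{i=1}^{N} X_i$ and $X_2 = \sum_{i=1}^{N} X_i^2$, respectively.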
Theorem 5.2.1. ( a ) The lower bound of the asymptotic mean squared error (AMSE) of the general class of estimators $\hat{Y}_g$ defined in (5.2.1) is given by
(5.2.2)
where $g_1(1,1)$ and $g_2(1,1)$ denote the first order partial derivatives of the function $g$ with respect to $u_1$ and $u_2$, respectively. Then by the definition of MSE we have
$$MSE(\hat{Y}_g) = Y^2\Big[\frac{V_0}{Y^2} + \frac{V_1}{X_1^2} g_1^2(1,1) + \frac{V_2}{X_2^2} g_2^2(1,1) + 2\frac{C_{01}}{Y X_1} g_1(1,1) + \cdots\Big]$$
Before defining the model we should define some notation which will be helpful in understanding the model based estimation strategies. Suppose we wish to estimate the population mean, $\bar{Y} = N^{-1}\sum_{i=1}^{N} Y_i$, based on a sample $s$ of $n$ observations drawn with probability $p(s)$ from a population of $N$ units. The function $p(s)$, defined for all samples $s$, is called the sampling design. We shall consider the problem of estimation of the population mean using a fixed effective sample size, i.e., all the units in the sample are distinct. Let $\pi_i$ and $\pi_{ij}$ be the probabilities of including the $i$th unit, and both the $i$th and $j$th units, of the population in the sample; these are called the first and second order inclusion probabilities. These inclusion probabilities can also be defined as
$$\pi_i = \sum_{s \ni i} p(s) \tag{5.3.1}$$
and
$$\pi_{ij} = \sum_{s \ni i,j} p(s). \tag{5.3.2}$$
(5.3.3)
Definition 5.3.3. Under design based approach the mean squared error of the
estimator Os is defined as
(5.3.5)
(5.3.6)
( b ) The mean squared error of the estimator and finite population parameter over
the super population model M by Royall (1971) is defined as
For example :
( a ) Rao (1979) minimized (5.3.6) subject to (5.3.4) and found that it leads to the
conventional sampling strategies involving randomization;
( b ) Rao (1979) also noted that minimization of (5.3.7) subject to (5.3.3) provides purposive selection strategies;
Definition 5.3.4. A cost function is any linear or non-linear function of the costs of selecting a unit in the sample and the number of units in the sample.
Various strategies have been proposed for estimating a finite population mean or population total under a superpopulation model $M$ that relates the variable of interest to one or more auxiliary variables. Brewer (1963a) and Royall (1970a, 1970b, 1970c, 1971, 1976) have adapted linear model prediction theory to the finite population situation and have derived the best linear $M$ unbiased (BLU) predictor. Cassel, Sarndal, and Wretman (1976, 1977) and Sarndal (1980b) have proposed a generalized regression predictor that is asymptotically design unbiased (ADU). Brewer (1979) suggested a predictor that blends aspects of the BLU and generalized regression predictors and retains the ADU property by using a single auxiliary variable. Isaki and Fuller (1982) proposed some ADU predictors involving several auxiliary variables or characters. Wright (1983) examined strategies that are approximately design unbiased and nearly optimal, assuming a large sample survey and a regression superpopulation model, and suggested a new class of predictors to link certain features of optimal design-unbiased and model-unbiased predictors.
where ~ = ~I X x' z t
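$$X = \begin{bmatrix} 1, & x_{11}, & x_{21}, & \ldots, & x_{k1} \\ 1, & x_{12}, & x_{22}, & \ldots, & x_{k2} \\ \vdots & & & & \vdots \\ 1, & x_{1n}, & x_{2n}, & \ldots, & x_{kn} \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \text{and} \quad e = Y - X\beta.$$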
Royall (1971) showed that the choice of design $p(s)$ which minimizes $E_m E_p(\bar{y}_s - \bar{Y})^2$ leads to the purposive design. If $f(x_i)$ is a non-decreasing function of $x_i$ and $f(x_i)/x_i^2$ is non-increasing, then the $n$ units with the largest values provide the optimal sample selection. Brewer (1963a) noted that, under the purposive design which minimizes $E_m E_p(\bar{y}_s - \bar{Y})^2$, the ratio method of estimation is optimal when $f(x_i) = a x_i^g$ for $a = 1$ and $0 \leq g \leq 1$. Cassel and Sarndal (1974) pointed out that the study of Brewer (1963a) holds well even for continuous distributions of the auxiliary characters. Royall's (1970a, 1970b, 1970c) result and his related work have been the subject of much controversy among statisticians. The criticisms were noted by Royall (1971), Cox (1971), and Wynn (1977a, 1977b). Neyman (1971) was somewhat stronger in his criticism, saying that it would be dangerous to draw a sample based on an unverified model. The optimal result is dependent on the model assumed. In other words, different models could lead to different kinds of results such as bias, etc. Related criticism and interesting results can also be seen in Royall and Herson (1973a, 1973b). We will discuss the robust estimation procedure of Scott, Brewer, and Ho (1978), which is in fact an extension of the work of Royall and Herson (1973a, 1973b), who ensured the robustness of the standard ratio estimator against polynomial superpopulation models by choosing balanced samples, to the case of the more general regression estimator.
5.3.2 SCOTT, BREWER, AND HO'S ROBUST ESTIMATION STRATEGY
Scott, Brewer, and Ho (1978) have shown that the requirement for robustness is a
relationship between the moments of the sample units and those of the remainder of
the population and it can be achieved approximately by an unequal probability
sampling scheme. Let us first introduce the Royall and Herson (1973a) notations for
the purpose of clarity. Royall and Herson (1973a) found that efficiency and
robustness can be combined by choosing an optimal estimator of population total
under a superpopulation model and a selection procedure so that the resultant
estimator is a BLU estimator under a more general family of polynomial models.
They used the notation $\xi[\delta_0, \delta_1, \ldots, \delta_p : v(x)]$ to represent the superpopulation model,
given by
$$\bar{x}_s^{(j)} = n^{-1}\sum_{i\in s} x_i^{j} \quad \text{and} \quad \bar{x}^{(j)} = N^{-1}\sum_{i\in\Omega} x_i^{j}.$$
The superpopulation models of the form
';[0, 1, x 2r have been used by the several researchers including Smith (1938) ,
Jessen (1942), Raj (1958), Rao and Bayless (1969), and Bayless and Rao (1970). It
is interesting to note that a more general form of the superpopulation model given
by $\xi[0, 1 : v(x)]$ has been considered by Scott, Brewer, and Ho (1978). Following
them, regardless of the way the sample observations have been obtained, the BLU
estimator of population total $Y$ is given by
$$\hat{Y}_0 = \hat{Y}[0, 1 : v(x)] = \sum_{i\in s} y_i + \frac{\sum_{i\in s} v^{-1}(x_i)\, y_i x_i}{\sum_{i\in s} v^{-1}(x_i)\, x_i^{2}} \left(\sum_{i\in\Omega} x_i - \sum_{i\in s} x_i\right). \qquad (5.3.2.2)$$
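$$\cdots = \sum_{j=0}^{p} \delta_j\beta_j\left[\sum_{i\in(\Omega-s)} x_i\, \frac{\sum_{i\in s} x_i^{j+1}/v(x_i)}{\sum_{i\in s} x_i^{2}/v(x_i)} - \sum_{i\in(\Omega-s)} x_i^{j}\right]$$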
•
Since I
ieo.-s
( IP
j=O
8jfJjXijJ = °.
On simplifying and using the condition of balanced sampling we have
$E_\xi(\hat{Y}_0 - Y) = 0$ if $s = s^{*}(p)$.
Hence the theorem.
Theorem 5.3.2.2. If $s = s^{*}(p)$, then $\hat{Y}_0$ is the BLU estimator of the population total
under the model $\xi(\delta_0, \ldots, \delta_p : v^{*}(x))$ for any variance function of the form
$$v^{*}(x) = v(x)\sum_{j=0}^{p} \delta_j a_j x^{j-1}. \qquad (5.3.2.3)$$
Proof. Under the model $\xi(0, \ldots, 0, \delta_j = 1 : v(x)x^{j-1})$, the BLU estimator of
population total $Y$ is given by
$$\hat{Y}(j) = \sum_{i\in s} y_i + \left\{\sum_{i\in s}\frac{y_i x_i^{j}}{v(x_i)x_i^{j-1}} \Big/ \sum_{i\in s}\frac{x_i^{2j}}{v(x_i)x_i^{j-1}}\right\}\left(\sum_{i\in\Omega} x_i^{j} - \sum_{i\in s} x_i^{j}\right) = \hat{Y}_0. \qquad (5.3.2.4)$$
For a linear unbiased estimator $\hat{Y}$, $E_{\xi_j}(\hat{Y} - Y)^2$ depends only on the variance $v(x)x^{j-1}$
and not on the coefficients $\beta_0, \ldots, \beta_p$. Since $\hat{Y}_0$ is unbiased when $s = s^{*}(p)$ and is
the BLU estimator when $\beta_k = 0$ for $k \ne j$, this implies that $\hat{Y}_0$ is the BLU estimator
under $\xi_j$.
Thus if $\hat{Y}$ is an unbiased linear estimator under the model $\xi(\delta_0, \ldots, \delta_p : v^{*}(x))$ with
$$v^{*}(x) = \sum_{j=0}^{p} \delta_j a_j v(x) x^{j-1}$$
then its mean squared error is:
$$E_\xi(\hat{Y} - Y)^2 = \sum_{j=0}^{p} \delta_j a_j E_{\xi_j}(\hat{Y} - Y)^2. \qquad (5.3.2.5)$$
Hence the theorem.
If v(x) = x then the model considered by Scott, Brewer, and Ho (1978) reduces to
the Royall and Herson (1973a, 1973b) result leading to balanced sampling and the
estimator Yo reduces to the traditional ratio estimator of population total Y defined
as
$$\hat{Y}_R = \frac{\sum_{i\in s} y_i}{\sum_{i\in s} x_i}\sum_{i\in\Omega} x_i.$$
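The reduction of the BLU estimator (5.3.2.2) to the ratio estimator when v(x) = x can be checked numerically. The sketch below is only an illustration: the function names and the small data set are hypothetical, and the non-sampled x values are passed in directly instead of being obtained as a population total minus a sample total.

```python
def blu_total(y_s, x_s, x_rest, v):
    """BLU predictor of the total under xi[0, 1 : v(x)], as in (5.3.2.2).

    y_s, x_s : study and auxiliary values for the sampled units
    x_rest   : auxiliary values for the non-sampled units
    v        : variance function v(x) (a Python callable)
    """
    num = sum(yi * xi / v(xi) for yi, xi in zip(y_s, x_s))
    den = sum(xi * xi / v(xi) for xi in x_s)
    beta_hat = num / den
    # sum over the population minus sum over the sample = sum over the rest
    return sum(y_s) + beta_hat * sum(x_rest)


def ratio_total(y_s, x_s, x_rest):
    """Traditional ratio estimator of the population total."""
    return sum(y_s) / sum(x_s) * (sum(x_s) + sum(x_rest))


# Hypothetical numbers, used only to exercise the formulas.
y_s = [12.0, 20.0, 31.0]
x_s = [4.0, 7.0, 10.0]
x_rest = [3.0, 5.0, 8.0, 9.0]

blu_v_x = blu_total(y_s, x_s, x_rest, v=lambda x: x)
blu_v_x2 = blu_total(y_s, x_s, x_rest, v=lambda x: x * x)
ratio = ratio_total(y_s, x_s, x_rest)
```

With v(x) = x the two functions agree up to rounding; with v(x) = x^2 the slope becomes the mean of the sample ratios y_i/x_i, so the two estimates generally differ.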
In this case the reduction of bias through balanced sampling may be expensive in
terms of efficiency under the model $\xi[0, 1 : x]$. Thus the most efficient sampling
strategy is to choose the $n$ units with the largest $x_i$ values. The relative efficiency
of the balanced sampling is given by
M m.
· _ J.
(XQ-S
Xs
Royall and Herson (1973a) have given numerical values of relative efficiency for a
variety of distributions.
Ix · IV-I (xi)xl
ie(Q-s) ies
reduces to
$$\frac{1}{n}\sum_{i\in s} x_i^{j-1} = \frac{\sum_{i\in\Omega-s} x_i^{j}}{\sum_{i\in\Omega-s} x_i}, \qquad j = 0, 1, \ldots, p,$$
which is always true for $j = 1$. A sample satisfying this condition is called
'overbalanced'.
Thus (5.3.2.6) shows that if the sampling fraction is small and no single Xi
dominates the others, the MSE is affected very little by the choice of sample and
little efficiency is lost by choosing an overbalanced sample.
As suggested by Scott, Brewer, and Ho (1978), in many real life situations $v(x)$
increases more quickly than $x$ but less quickly than $x^2$, so that $\xi(\delta_0, \ldots, \delta_p : v(x))$
with $v(x) = \sigma_1^2 x + \sigma_2^2 x^2$ is often a fairly realistic model.
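Thus we have
$$MSE(\hat{Y}_1) = \frac{N(N-n)}{n}\left[\sigma_1^2\bar{x} + \sigma_2^2\bar{x}^{(2)}\right], \qquad (5.3.2.7)$$
and
$$MSE(\hat{Y}_2) = \frac{N(N-n)}{n}\,\bar{x}_{(\Omega-s)}\left[\sigma_1^2 + \sigma_2^2\bar{x}\right], \qquad (5.3.2.8)$$
where $\bar{x}_{(\Omega-s)}$ is the mean of the elements not included in the overbalanced sample,
$\bar{x}^{(2)} = N^{-1}\sum_{i=1}^{N} x_i^2$ and $\bar{x} = N^{-1}\sum_{i=1}^{N} x_i$. It follows from
$$\frac{1}{n}\sum_{i\in s} x_i^{j-1} = \frac{\sum_{i\in\Omega-s} x_i^{j}}{\sum_{i\in\Omega-s} x_i}$$
with $j = 0$ that $\bar{x}_{(\Omega-s)}$ is smaller than $\bar{x}_s$ and hence less than $\bar{x}$, which implies that
$$MSE(\hat{Y}_2) < MSE(\hat{Y}_1).$$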
In other words, the ratio estimator with balanced sampling is less efficient than $\hat{Y}_2$
with overbalanced sampling. It may be noted that the loss in efficiency will be less
if $\sigma_1^2$ dominates $\sigma_2^2$, but can be substantial if $\sigma_2^2$ is relatively large. Thus to form
an efficient estimator, we have to look for overbalanced sampling.
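Note that
$$E\Big(\sum_{i\in\Omega-s} x_i^{j}\Big) = \sum_{i=1}^{N}(1-\pi_i)\,x_i^{j} = \sum_{i=1}^{N} x_i^{j}/(1 + A x_i). \qquad (5.3.2.12)$$
Also we have
$$E\Big(\sum_{i\in\Omega-s} x_i^{j}\Big) = E\Big(\sum_{i\in s} x_i^{j-1}\Big)\Big/A. \qquad (5.3.2.14)$$
Thus an approximate overbalanced sample can be obtained if the sample size is
large enough.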
such that $E_m(e_i) = 0$, $E_m(e_i^2) = \sigma^2 x_i^{g}$ and $E_m(e_i e_j) = 0$ for $i \ne j$, and $g$ is any real
number. In the previous sections, we have seen that if the sample size is fixed at $n$
the design variance of the Horvitz and Thompson (1952) type estimator of
population total (called generalised linear regression estimator or GREG) is defined
as
(5.3.3.2)
(5.3.3 .3)
If the sample size $n$ is fixed and sampling design is $p$, then the design variance of
the estimator $\hat{Y}_g$ becomes
$$V_p(\hat{Y}_g) = \frac{1}{2}\sum\sum_{i\neq j\in\Omega}(\pi_i\pi_j - \pi_{ij})(d_i e_i - d_j e_j)^2. \qquad (5.3.3.4)$$
Consider the random variable $\hat{Y}_g - Y$. The variance of this random variable
under both the model $m$ and the sampling design $p$ is called the Anticipated Variance
(AV).
$$\cdots = \sigma^{2}\sum_{i\in\Omega}\left(\frac{1}{\pi_i} - 1\right)x_i^{g}$$
which is the same expression as was shown by Godambe (1955) to be the minimum
possible anticipated variance for any design unbiased estimator of population total.
Similar results can also be found in Brewer (1979) and Sarndal and Wright (1984).
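To see how the expression $\sigma^2\sum_{i\in\Omega}(1/\pi_i - 1)x_i^g$ reacts to the choice of inclusion probabilities, it can be evaluated for a few designs of the same size. The following is a minimal sketch on hypothetical size values; the function names are illustrative. That the expression is smallest when $\pi_i \propto x_i^{g/2}$, for fixed $\sum\pi_i = n$, follows from the Cauchy-Schwarz inequality and is a standard fact, not a statement taken from the text above.

```python
def av_lower_bound(x, pi, sigma2=1.0, g=1.0):
    """sigma^2 * sum over the population of (1/pi_i - 1) * x_i**g."""
    return sigma2 * sum((1.0 / p - 1.0) * xi ** g for xi, p in zip(x, pi))


def pi_proportional_to(size, n):
    """Inclusion probabilities n * size_i / sum(size); none may exceed one."""
    total = sum(size)
    pi = [n * s / total for s in size]
    if max(pi) > 1.0:
        raise ValueError("some inclusion probability exceeds one")
    return pi


x = [2.0, 3.0, 5.0, 8.0, 12.0, 20.0]   # hypothetical sizes, not from the text
n, g = 2, 1.5

pi_equal = [n / len(x)] * len(x)                            # equal probabilities
pi_x = pi_proportional_to(x, n)                             # pi_i proportional to x_i
pi_root = pi_proportional_to([xi ** (g / 2) for xi in x], n)

av_equal = av_lower_bound(x, pi_equal, g=g)
av_x = av_lower_bound(x, pi_x, g=g)
av_root = av_lower_bound(x, pi_root, g=g)
```

Comparing `av_equal`, `av_x` and `av_root` shows how much of the bound is attributable to the design alone.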
The Anticipated Variance (AV) can also be derived with an alternative method as
follows :
i1l'j}i
AV(Yg)=EmVp(Yg)+VmEp(Yg)",Em[I I (1l'ij-1l' ej]
ienj(;ei)en 1l'i1l'j
We have seen in Corollary 5.1.1 that if $\pi_i \propto Y_i$, the variance of the Horvitz and
Thompson (1952) estimator of the population total $Y$ reduces to zero.
Unfortunately the values of $Y_i$ are not usually known in practice, but the value $x_i$
of the auxiliary variable correlated with $Y$ may be known. Thus if $\pi_i$ is chosen
proportional to $x_i$, there may be a substantial reduction in the variance. Now we
will consider the construction of the first and second order inclusion probabilities in
the following section.
Theorem 5.4.1. The inclusion probability of selecting the $i$th population unit using
PPSWOR sampling scheme is given by
$$\pi_i = P_i\{1 + S - P_i/(1 - P_i)\}, \quad \text{where } S = \sum_{j=1}^{N} P_j/(1 - P_j). \qquad (5.4.1)$$
$$\pi_i = P_i\Big(1 + \sum_{j\neq i=1}^{N} P_j/(1-P_j)\Big) = P_i\Big(1 + \sum_{j=1}^{N} P_j/(1-P_j) - P_i/(1-P_i)\Big) = P_i\big(1 + S - P_i/(1-P_i)\big) \qquad (5.4.3)$$
where $P_i/(1 - P_i)$ is the probability of selecting the $i$th unit and $S = \sum_{j=1}^{N} P_j/(1 - P_j)$.
Now the factor $P_i/(1 - P_i)$ is not constant because it depends upon $P_i$, but $S$ is a
constant since it is a sum over all the units. Therefore $\pi_i$ is proportional not only to
$P_i$ but also to the factor $P_i/(1 - P_i)$. To achieve minimum variance, the probability
chosen should be such that $\pi_i$ becomes proportional to $P_i$. Narain (1951),
Yates and Grundy (1953), Fellegi (1963), and Hanurav (1967) have suggested
different methods to obtain the values of $\pi_i$ proportional to $P_i$. Chaudhuri (1981)
has shown that the application of the sampling method of Fellegi (1963) to the first
occasion units and the unmatched second occasion units provides the required
inclusion probabilities. Now we shall discuss in detail the methods for the
construction and optimal choice of inclusion probabilities.
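The formula of Theorem 5.4.1 can be checked by brute force. The sketch below assumes a sample of n = 2 units drawn without replacement, the second draw being made with probabilities proportional to the P values of the remaining units; this is the two-draw situation that the expression in the proof describes. The function names and the selection probabilities are hypothetical.

```python
from itertools import permutations


def pi_from_theorem(P):
    """pi_i = P_i * (1 + S - P_i/(1 - P_i)) with S = sum_j P_j/(1 - P_j)."""
    S = sum(p / (1.0 - p) for p in P)
    return [p * (1.0 + S - p / (1.0 - p)) for p in P]


def pi_by_enumeration(P):
    """Add up the probabilities of all ordered two-unit samples containing i."""
    pi = [0.0] * len(P)
    for i, j in permutations(range(len(P)), 2):
        # unit i at the first draw, then unit j among the remaining units
        prob = P[i] * P[j] / (1.0 - P[i])
        pi[i] += prob
        pi[j] += prob
    return pi


P = [0.1, 0.2, 0.3, 0.4]          # hypothetical selection probabilities
pi_formula = pi_from_theorem(P)
pi_enum = pi_by_enumeration(P)
```

Both routes give the same values, and the inclusion probabilities add up to the sample size 2, while the ratios pi_i / P_i differ from unit to unit, which is the lack of proportionality discussed above.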
Let $\pi_i$, $i = 1, 2, \ldots, N$, be the set of desired inclusion probabilities for a sample of
size $n$ drawn by using WOR sampling, say the target inclusion probabilities or $\pi$ps
probabilities. Note that $\sum_{i=1}^{N}\pi_i = n$; the $\pi$ps probabilities should be such that
The first problem with these inclusion probabilities is that some size variables lead
to problems, since one or more of the $\pi_i$ exceed one. This problem can partially
be solved by using a revised selection probability discussed in Section 5.4.5. The
second problem concerns the exhibition of a good, preferably a best, scheme. To
solve this problem, Rosen (1997a) has stipulated the following requirements:
( a ) The sample selection should be simple to implement.
( b ) The scheme should lead to good estimation precision.
( c ) The scheme should have good variance estimation properties.
We will now discuss some target inclusion probability sampling schemes:
( b ) If the trial results in failure, draw two units from the population with
replacement and with probabilities $\phi_i' = P_i/(1 - P_N + P_{N-1})$ for the $i$th unit,
$1 \le i \le N-1$, and $\phi_N' = (P_N - \phi/2)/(1 - \phi)$.
In case the selections do not coincide then accept the sample; otherwise reject the
sample and select two units from the population with replacement and with
probabilities proportional to $\phi_i'^{\,2}$. If the sample consists of distinct units, then accept
it, otherwise reject it and repeat the process. Hanurav (1967) listed the following
obviously desirable properties of a sampling design on which to base the Horvitz
and Thompson (1952) estimator:
( a ) $\pi_i = np_i$, $i = 1, 2, \ldots, N$; ( b ) No. of distinct units, $\nu = n$ $\forall\, s : p(s) > 0$;
Several other researchers have also worked on similar procedures including Fellegi
(1963), Vijayan (1968), Fuller (1971), Brewer (1967, 1975), and Asok and
Sukhatme (1975, 1976a).
Putting $S_i(n) = p_i(n)/(1 - p_i)$ and $S(n) = \sum_{j=1}^{N} S_j(n)$ and eliminating $p_i(n)$ from (5.4.3.3)
we have
$$S_i(n) = p_i[n - (n-1)S(n)]/(1 - np_i). \qquad (5.4.3.4)$$
Putting (5.4.3.4) in $S(n) = \sum_{j=1}^{N} S_j(n)$ we have
working probabilities as
$$p_i(n) = np_i(1 - p_i)/[(1 - np_i)\{1 + (n-1)T(n)\}]. \qquad (5.4.3.6)$$
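The working probabilities in (5.4.3.6) are easy to compute once T(n) is available. The definition of T(n) is not shown in the passage above; the sketch below assumes $T(n) = \sum_j p_j/(1 - np_j)$, which is what substituting (5.4.3.4) into $S(n) = \sum_j S_j(n)$ and solving for S(n) gives. The function name and the probabilities are hypothetical.

```python
def brewer_working_probabilities(p, n):
    """Working probabilities p_i(n) of (5.4.3.6) for the first of n draws.

    Assumes T(n) = sum_j p_j / (1 - n p_j); this definition is derived here
    from (5.4.3.4) and is an assumption as far as the text is concerned.
    """
    if any(n * pj >= 1.0 for pj in p):
        raise ValueError("need n * p_j < 1 for every unit")
    T = sum(pj / (1.0 - n * pj) for pj in p)
    return [n * pj * (1.0 - pj) / ((1.0 - n * pj) * (1.0 + (n - 1) * T))
            for pj in p]


p = [0.05, 0.10, 0.15, 0.20, 0.20, 0.30]   # hypothetical, sums to one
n = 3
p_work = brewer_working_probabilities(p, n)
```

Under that assumption the working probabilities are positive and add up to one, so they form a proper set of selection probabilities for the draw.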
It is obvious that $\sum_{i=1}^{N} p_i(n) = 1$ and $0 < p_i(n) < 1$, since $0 < \pi_i = np_i < 1$. Keeping in view the
Sen--Yates--Grundy estimator of variance, Brewer (1975) also suggested a formula
for calculating the second order inclusion probabilities given by
where $\pi_{jl}^{(k)}(n-1)$ denotes the joint probability of inclusion of the $j$th and $l$th units in
the remaining $(n-1)$ units given that the $k$th unit was selected first. Thus the joint
inclusion probabilities can also be calculated by using the above recursive formula.
Chromy (1974) found that the second order inclusion probabilities $\pi_{ij}$ given by this
procedure asymptotically minimized the expected variance of the Horvitz and
Thompson (1952) estimator when r = 1/2. Rao (1963a, 1963b) showed that the
Horvitz and Thompson (1952) estimator is always more efficient than the
corresponding Hansen and Hurwitz (1943) estimator for multinomial sampling, and
that its variance estimator was never negative. Thus in this case, the joint
probabilities of inclusion, and hence also the variance estimator, are simple
functions of size.
Sampford (1967) proposed a method for inclusion probabilities proportional to size
(IPPS) scheme for selecting a sample of $n$ units, which in fact is an extension of the
methods proposed by Brewer (1963a), Rao (1965b), and Durbin (1967). For this
scheme of sampling, the first order inclusion probability is given by
$\pi_i$ = prob[$i$th unit is selected at the first draw]
+ prob[$i$th unit is selected at the second draw]
$$= P_i + \sum_{j\neq i=1}^{N} P_j P_{i\cdot j} = 2P_i \qquad (5.4.4.1)$$
where $P_{i\cdot j}$ denotes the conditional probability $P(i \mid j)$. If $P_i$ denotes the probability
of selecting the $i$th unit at the first draw, then the probability of selecting the second
unit from the remaining $(N-1)$ units in the population is given by
It can be easily verified that $\pi_i\pi_j - \pi_{ij} \ge 0$ for Sampford's sampling procedure.
Midha (1980), Hartley and Rao (1962), and Asok and Sukhatme (1976b) have also
suggested some procedures to approximate these inclusion probabilities. The
Durbin (1967) procedure has also been used by Brewer and Hanif (1970) for
developing a new multistage estimator of variance.
Narain (1951) proposed another method of sample selection, which is free from any
restriction on the set of initial selection probabilities and leads to a more efficient
estimator of the population parameters than with replacement sampling. This
method consists of making revised selection probabilities $P_i'$ ($i = 1, 2, \ldots, N$) such that
the resulting inclusion probabilities $\pi_i$ are proportional to the original probabilities
of selection $P_i$, $i = 1, 2, \ldots, N$. For a sample of two units, the revised selection
probabilities are given by
Rao (1989), Yates and Grundy (1953) and Brewer and Undy (1962) have done
further work in this direction.
5.4.6 MIDZUNO--SEN METHOD
Midzuno (1952) and Sen (1952) introduced an interesting method for selecting the
sample, which is useful in many ways. Using this method, the unit at the first draw
is selected with unequal probability, while the rest of the units are selected with
equal probability and without replacement. Defining a random variable, $t_i$, such
that
$$t_i = \begin{cases} 1 & \text{if the } i\text{th unit is selected in the sample,} \\ 0 & \text{otherwise.} \end{cases} \qquad (5.4.6.1)$$
We have
$E(t_i) = \pi_i = P_i + P$[$i$th unit is not selected at the first draw and is selected at any of the
remaining $(n-1)$ draws]
$$= P_i + (1 - P_i)\left(\frac{n-1}{N-1}\right) = \left(\frac{N-n}{N-1}\right)P_i + \left(\frac{n-1}{N-1}\right) \qquad (5.4.6.2)$$
and
$E(t_i t_j) = \pi_{ij}$
$= P$[$i$th unit is selected at the first draw and $j$th unit is selected
at any of the remaining $(n-1)$ draws]
$+ P$[$j$th unit is selected at the first draw and $i$th unit is selected
at any of the remaining $(n-1)$ draws]
$+ P$[neither unit is selected at the first draw and both are selected
in the remaining $(n-1)$ draws]
$$= \left(\frac{n-1}{N-1}\right)\left[\left(\frac{N-n}{N-2}\right)(P_i + P_j) + \left(\frac{n-2}{N-2}\right)\right]. \qquad (5.4.6.3)$$
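Formulas (5.4.6.2) and (5.4.6.3) can be evaluated directly. The sketch below uses hypothetical function names; the values N = 69, n = 8, P_1 = 0.005026 and P_2 = 0.008123 are the ones used in Example 5.4.1, and the four-unit population is made up for a consistency check.

```python
def midzuno_pi(P_i, N, n):
    """First order inclusion probability (5.4.6.2)."""
    return (N - n) / (N - 1) * P_i + (n - 1) / (N - 1)


def midzuno_pi_ij(P_i, P_j, N, n):
    """Second order inclusion probability (5.4.6.3)."""
    return (n - 1) / (N - 1) * ((N - n) / (N - 2) * (P_i + P_j)
                                + (n - 2) / (N - 2))


# Values used in Example 5.4.1.
pi_1 = midzuno_pi(0.005026, N=69, n=8)
pi_12 = midzuno_pi_ij(0.005026, 0.008123, N=69, n=8)

# Consistency check on a small hypothetical population.
P = [0.1, 0.2, 0.3, 0.4]
n_small = 3
pis = [midzuno_pi(p, len(P), n_small) for p in P]
row_sums = [sum(midzuno_pi_ij(P[i], P[j], len(P), n_small)
                for j in range(len(P)) if j != i)
            for i in range(len(P))]
```

For a fixed sample size the first order probabilities add up to n, and for each unit the joint probabilities with all other units add up to (n - 1) times its first order probability; both identities hold for these formulas.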
For this sampling scheme, one can easily see that $\pi_i\pi_j - \pi_{ij} > 0$, which guarantees
non-negativity of the estimator of variance proposed by Sen (1953) and Yates and
Grundy (1953). At the same time the limitation is that the first order inclusion
probabilities are not proportional to the selection probabilities. This can be rectified
by finding a new set of revised selection probabilities $P_i^{*}$ by applying a suitable
transformation on the selection probabilities $P_i$ such that the resultant inclusion
probabilities $\pi_i$, $i = 1, 2, \ldots, N$, are proportional to the value of $P_i$. If $P_i^{*}$ are
the revised selection probabilities we have
Since the revised selection probabilities $P_i^{*}$ must always be positive, the initial
selection probabilities $P_i$ must satisfy
(5.4.6.6)
Thus the use of revised selection probabilities for deriving efficient estimates of
population total through the Horvitz and Thompson estimator will be possible only in
those cases where the original selection probabilities satisfy the above condition.
This condition on the initial probabilities usually does not hold, and hence limits the
use of the scheme in practice. Some more relevant work related to the discussion of
the validity of these inclusion probabilities can be found in Rao (1963a, 1963b),
Asok (1974, 1980), and Asok and Sukhatme (1978). A generalization of the Midzuno--
Sen sampling scheme has been given by Prasad and Srivenkataramana (1980),
Deshpande and Ajgaonkar (1987), and Kumar and Srivenkataramana (1994). Bedi
and Agarwal (1999) suggested a new set of revised probabilities under the Midzuno
(1952) sampling scheme. The revised probabilities are functions of the location
shift factor L between 0 and 1, and it is remarkable that the optimum value of L
does not require knowledge of any unknown population parameter.
Dey and Srivastava (1987) considered the following IPPS sampling scheme.
Consider a population of $N$ units with $y$ as the study variable and $x$, an auxiliary
variable, as the size. It is assumed that $x$ values are known for all the population
units. A sample of size $n$ ($> 2$) is to be selected. To start with, it is assumed that $n$ is
even. Divide the population into $m$ ($> n/2$) groups so that the $i$th group contains
$N_i$ ($> 2$) units ($i = 1, 2, \ldots, m$), and for each group
$$X_i/X > (n-2)/\{n(m-1)\} \qquad (5.4.8.1)$$
where $x_{iu}$ is the value of $x$ for the $u$th unit in the $i$th group,
$X_i = \sum_{u=1}^{N_i} x_{iu}$ and $X = \sum_{i=1}^{m} X_i$.
Equation (5.4.8.1) is satisfied if the $X_i$ ($i = 1, 2, \ldots, m$) are made nearly equal. It has
been seen in actual populations, considered by Rao and Bayless (1969), that this
condition is satisfied for quite a few values of $m$. Rao and Lanke (1984) suggested a
grouping procedure in which $N$ units are divided into $R$ groups so that group
totals $X_i$ are nearly equal and group sizes are either $[N/R]$ or $[N/R] + 1$, where $[\cdot]$
denotes the largest integer not exceeding its argument. Having formed the $m$ groups,
the suggested sampling procedure consists of the following steps:
Step I. Select $n/2$ groups out of $m$ groups using Midzuno (1952) sampling
procedure, i.e., select one group with probability
$$p_i' = \{n(m-1)p_i - (n-2)\}/(2m-n), \quad \text{with } p_i = X_i/X \qquad (5.4.8.2)$$
and the remaining $(n/2) - 1$ groups with equal probabilities without replacement.
Step II. From each of the selected groups, select two units by any IPPS procedure,
say by Durbin's (1967) procedure; that is, from the $i$th selected group
($i = 1, 2, \ldots, n/2$) select one unit with probability
$$P_{iu|i} = x_{iu}/X_i \qquad (5.4.8.3)$$
and the second unit with revised probability
$$P_{iu|iv} = x_{iv}\left[(X_i - 2x_{iv})^{-1} + (X_i - 2x_{iu})^{-1}\right]/D_i. \qquad (5.4.8.4)$$
For this sampling scheme the inclusion probability for the $(iu)$th unit is evidently given
by
$$\pi_{iu} = nP_{iu} \qquad (5.4.8.5)$$
where $P_{iu} = x_{iu}/X$ and the joint inclusion probabilities for a pair of units are given
by
Jr . .
'II
=nP'u P' v (p-p
I 'u
-PI v )lI/ lU,
f ~.(P_
I
2P'II XP-
I
2P)}
'v
(5.4 .8.6)
'V
Saxena, Singh, and Srivastava (1986) suggested the following IPPS sampling
strategy. Consider a population of size $N$ with $y_i$ and $x_i$ as the study and the
auxiliary variable values, respectively, for the $i$th unit. We further assume that
$x_i > 0$ for all units in the population. The following steps make up the IPPS sampling
strategy:
Step I. Select a sample, $s_1$ (say), of size $n$ from the population by simple random
sampling without replacement. Let $s_2$ be its complement, that is, $s_2 = \Omega - s_1$. Perform
independent Bernoulli trials on each unit of $s_1$ with probability of success $P_i$ for
the $i$th unit in the population. Let the number of successes be $r$.
Step II. If $r < n$, select $n - r$ units from $s_2$ by simple random sampling without
replacement.
The ultimate sample $s$ will consist of $r$ units selected at Step I and $(n - r)$ units
selected at Step II. For this sampling scheme, the first and second order inclusion
probabilities are given by
Cassel, Sarndal, and Wretman (1976) have shown that the average variance of any
estimator $\hat{\theta}_s$ satisfying the condition of unbiasedness may be written as:
(5.4.10.1)
are respectively the variance and bias in the estimator $\hat{\theta}_s$. Note that $V_m(\theta)$ is
constant, so it does not enter into the minimization process. Consider an estimator $\hat{\theta}_s^{*}$
and a set of inclusion probabilities that yield $E_p(\hat{\theta}_s^{*}) = \theta$. If $\hat{\theta}_s^{*}$ minimizes $E_p V_m(\hat{\theta}_s)$
subject to the condition $\sum_s p(s)\hat{\theta}_s = \theta$, then the choice of $\hat{\theta}_s^{*}$ also minimizes (5.4.10.1)
if $B_m(\hat{\theta}_s^{*}) = 0$. For example, a model unbiased estimator in the case of a linear
homogeneous class of estimators that satisfies the above criterion is defined as
$$\hat{Y}_s = \sum_{i\in s} d_i y_i \qquad (5.4.10.4)$$
which is the well known Horvitz and Thompson estimator. Hajek (1958) obtained
the optimal estimator and design pair for estimating population mean $\bar{Y}$ and
introduced a general cost constraint defined as
(5.4.10.5)
to obtain the optimal choice of the first order inclusion probabilities given by
$\pi_i \propto \sigma_i/\sqrt{c_i}$ under the super population model $m : y_i = a_i + e_i$, where $E_m(e_i) = 0$,
(5.4.10.6)
Several other researchers have also suggested methods to find the optimal first
order inclusion probabilities. For example Godambe (1955) also obtained results
similar to Hajek (1958). Rao (1975), Cassel, Sarndal, and Wretman (1976) and
Rao and Bellhouse (1978) have also suggested optimal choices of the inclusion
probabilities. Maxmin $\pi$ps sampling designs have been discussed by Hanurav
(1967), Rao and Bayless (1969), Sinha (1973), Chao (1982), Gabler (1984), Herzel
(1986), Chaudhuri and Vos (1988), and Herzel (1993). Comparisons of PPSWR
and Brewer's $\pi$ps WOR procedures have been done by Sampford (1967),
Chaudhuri (1974), Gabler (1981), and Sengupta (1986).
Example 5.4.1. We wish to estimate the total number of fish of all kinds caught by
marine recreational fishermen of the Atlantic and Gulf coasts during 1995.
Population 4 in the Appendix shows that information on the number of different
kinds of fish caught during 1992 is available. Use the known information on the
number of fish caught during 1992 to select a sample of eight units by using
PPSWOR sampling. Collect the required information from population 4 to estimate
the total number of fish caught during 1995. Apply the Sen--Yates--Grundy
estimator of variance to construct a 95% confidence interval.
From population 4 we have the total number of fish caught during 1992,
$X = 291882$. Thus the selection probabilities $P_i = x_i/X$ have been calculated as
shown in Table 5.4.1. The values of the first order inclusion probabilities based
on the Midzuno--Sen sampling scheme given by
$$\pi_i = \frac{N-n}{N-1}P_i + \frac{n-1}{N-1}$$
have also been presented in the above table. For example, the value of $\pi_1$ is
calculated as
$$\pi_1 = \frac{N-n}{N-1}P_1 + \frac{n-1}{N-1} = \frac{69-8}{69-1}\times 0.005026 + \frac{8-1}{69-1} = 0.10745$$
and so on. The values of the second order inclusion probabilities based on
Midzuno--Sen sampling scheme given by
$$\pi_{ij} = \frac{n-1}{N-1}\left[\frac{N-n}{N-2}(P_i + P_j) + \frac{n-2}{N-2}\right]$$
have been given in Table 5.4.2. For example the value of $\pi_{12}$ is calculated as
$$\pi_{12} = \frac{n-1}{N-1}\left[\frac{N-n}{N-2}(P_1 + P_2) + \frac{n-2}{N-2}\right] = \frac{8-1}{69-1}\left[\frac{69-8}{69-2}(0.005026 + 0.008123) + \frac{8-2}{69-2}\right] = 0.010451$$
and so on.
Table 5.4.2. Second order inclusion probabilities for the units selected in the
sample.
The values of Sen--Yates--Grundy weights, $(\pi_i\pi_j - \pi_{ij})/\pi_{ij}$, are given below:
 i \ j      1         2         3         4         5         6         7
   2    0.133289
   3    0.127726  0.127238
   4    0.137742  0.134317  0.127962
   5    0.123894  0.124520  0.125694  0.123590
   6    0.109511  0.114270  0.123288  0.107219  0.129638
   7    0.140723  0.136417  0.128445  0.142822  0.122973  0.102578
   8    0.136022  0.133105  0.127683  0.137442  0.123948  0.109925  0.140346
Using information from Table 5.4.1, the HT estimate of the total number of fish
caught during 1995 is:
$$\hat{Y}_{HT} = \sum_{i\in s}\frac{y_i}{\pi_i} = 271386.01.$$
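The HT estimate can be retraced from the sample values. In the sketch below the $y_i$ and $\pi_i$ values are those listed for the same eight sampled units in the table of Example 5.5.1, read here on the assumption that its second numeric column holds the 1995 catches; the function name is illustrative.

```python
def horvitz_thompson_total(y, pi):
    """HT estimate of the population total: sum of y_i / pi_i over the sample."""
    return sum(yi / p for yi, p in zip(y, pi))


# Values for the eight sampled units, as tabulated in Example 5.5.1
# (assumed column reading: y_i = 1995 catch, pi_i = Midzuno--Sen probability).
y = [2016, 2319, 3816, 4008, 2568, 16238, 163, 2324]
pi = [0.107450, 0.110228, 0.115834, 0.106153,
      0.120075, 0.139569, 0.103605, 0.107686]

y_ht = horvitz_thompson_total(y, pi)
```

The result agrees with the value 271386.01 reported above to within rounding of the tabulated inclusion probabilities.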
Now using information from Table 5.4.1 and Table 5.4.3, the Sen--Yates--Grundy
estimator of variance of the HT estimator of total is given by
$$\hat{v}_{SYG}(\hat{Y}_{HT}) = \sum_{i(<j)=1}^{8}\sum_{j=1}^{8}\left(\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\right)\left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2.$$
The above 28 values of $\left(\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\right)\left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2$ are given in the following table.
Table 5.4.4. The values of $\left(\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\right)\left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2$ for different values of $i$ and $j$.
 i \ j          1             2
   2        690440.4
   3      25687680.6    18035050.2
   ...
Thus we have
The next section is devoted to the calibration approach in sampling theory.
Statisticians are often interested in the precision of survey estimates. The most
commonly used estimator of population total or population mean is the generalized
linear regression (GREG) estimator. Let us consider the simplest case of the GREG
where information on only one auxiliary variable is available. Consider a
population $\Omega = \{1, 2, \ldots, i, \ldots, N\}$, from which a probability sample $s$ ($s \subset \Omega$) is drawn
with a given sampling design $p(\cdot)$. The inclusion probabilities $\pi_i = P(i \in s)$ and
$\pi_{ij} = P(i \in s, j \in s)$ are assumed to be strictly positive and known. Let $y_i$ be the value
of the study variable, $y$, for the $i$th population unit and let $x_i$ be the value of the
associated auxiliary variable for the $i$th unit. The population total $X = \sum_{i\in\Omega} x_i$ of the
auxiliary variable $x$ is assumed to be accurately known. The objective is to
survey data can be found in Bethlehem and Keller (1987). Deville and Sarndal
(1992) used calibration on the known population total, $X$, to modify the basic
sampling design weights, $d_i = 1/\pi_i$, that appear in the Horvitz and Thompson
(1952) estimator
(5.5.1)
A new estimator
(5.5.2)
was proposed by Deville and Sarndal (1992), with weights $w_i$ as close as possible
in an average sense to the $d_i$ for a given measure and subject to the calibration
constraint
$$\sum_{i\in s} w_i x_i = X. \qquad (5.5.3)$$
Theorem 5.5.1. Minimization of chi square (CS) distance between the new weights
$w_i$ and selection weights or design weights $d_i$ leads to a general regression type
Proof. Let us define the chi square (CS) type distance function $D$ as
$$D = \sum_{i\in s}(w_i - d_i)^2 (d_i q_i)^{-1} \qquad (5.5.5)$$
where $q_i$ are suitably chosen constants such that the form of the estimator depends
upon their choice. The Lagrange function $L$ for minimizing $D$ in (5.5.5) subject to the
constraint in (5.5.3) is then given by
$$w_i = d_i X\Big/\sum_{i\in s} d_i x_i$$
and the resultant estimator reduces to the ratio estimator of population total as
$$\hat{Y}_R = \sum_{i\in s} d_i y_i\left(X\Big/\sum_{i\in s} d_i x_i\right). \qquad (5.5.10)$$
Remark 5.5.1. Singh, Horn, and Yu (1998) reported that there is no choice of $q_i$
such that the resultant estimator (5.5.4) reduces to the product estimator of
population total discussed by Cochran (1963).
The main difficulty with the calibrated weights given in (5.5.9) is that they are not
guaranteed to be non-negative. Deville and Sarndal
(1992) have considered several distance functions which guarantee the
non-negativity of the weights. Let us discuss a new distance function, which also
guarantees the non-negativity of the weights, in the following theorem.
Theorem 5.5.2. The optimal weights obtained by minimizing the distance function
which implies
$$w_i = \exp[\ln(d_i) + \lambda x_i - 1], \qquad (5.5.14)$$
where the value of $\lambda$ can be obtained by solving
(5.5.15)
Thus (5.5.14) shows that the calibrated weights are always non-negative if the
distance function (5.5.11) is minimized for the calibration constraint (5.5.3). Hence
the theorem.
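Since (5.5.14) has no closed-form solution for $\lambda$, a numerical search is needed. The sketch below assumes that $\lambda$ is determined by substituting the weights (5.5.14) into the calibration constraint (5.5.3), and it uses bisection, which is valid when all $x_i > 0$ because the calibrated total is then increasing in $\lambda$. The function name and the data are hypothetical.

```python
import math


def exponential_calibration(d, x, X, n_iter=200):
    """Solve sum_i x_i * exp(ln d_i + lam * x_i - 1) = X for lam by bisection.

    Assumes x_i > 0 and d_i > 0, so the left-hand side increases with lam.
    Returns (lam, weights) with weights w_i = exp(ln d_i + lam * x_i - 1).
    """
    def calibrated_total(lam):
        return sum(xi * math.exp(math.log(di) + lam * xi - 1.0)
                   for di, xi in zip(d, x))

    lo, hi = -1.0, 1.0
    while calibrated_total(lo) > X:     # widen the bracket downwards
        lo *= 2.0
    while calibrated_total(hi) < X:     # widen the bracket upwards
        hi *= 2.0
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if calibrated_total(mid) < X:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    w = [math.exp(math.log(di) + lam * xi - 1.0) for di, xi in zip(d, x)]
    return lam, w


d = [10.0, 12.0, 8.0, 15.0]     # hypothetical design weights
x = [2.0, 5.0, 3.0, 4.0]        # hypothetical auxiliary values
X_known = 180.0                 # hypothetical known total of x
lam, w = exponential_calibration(d, x, X_known)
```

Because each weight is an exponential, it is strictly positive whatever value of $\lambda$ the search returns.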
Thus we conclude that the calibration approach can guarantee to yield non-negative
estimators of population total depending upon the choice of the distance function to
be minimized.
Example 5.5.1. Find the calibration weights for the units selected in the sample by
using PPSWOR sampling by making use of known information about the number
of fish caught during 1994 as auxiliary variable at estimation stage.
Use the chi square distance function between the design weights and calibrated
weights. Discuss two cases where these weights lead to the ratio and GREG
estimator for estimating the total number of fish during 1995. Deduce the value of
the estimate in each case, provided that the sample has been selected using
Midzuno--Sen's scheme of sampling as in Example 5.4.1 by using information
on the number of fish caught during 1992.
Solution. The chi square distance function gives the calibration weights as
,
IES
Kind of fish         x_i     y_i     π_i        d_i            d_i x_i        w_i        w_i y_i
Sharks, other        2001    2016    0.107450   9.306654258    18622.6152     9.73030    19616.3
Blue runner          5692    2319    0.110228   9.072105091    51638.4222     9.48508    21995.9
Tautog               2653    3816    0.115834   8.633043839    22903.4653     9.02603    34443.3
Atlantic mackerel    4860    4008    0.106153   9.420364945    45782.9736     9.84919    39475.6
Spanish mackerel     3850    2568    0.120075   8.328128253    32063.2938     8.70723    22360.2
Summer flounder      17741   16238   0.139569   7.164914845    127112.7543    7.49107    121640.0
Gulf flounder        776     163     0.103605   9.652043820    7489.9860      10.09141   1644.9
Winter flounder      2300    2324    0.107686   9.286258195    21358.3939     9.70898    22563.7
Sum                                                            326971.9042               283739.8
Note: To find $\pi_i$ refer to Example 5.4.1.
8
Thus we have Idi xli =340963 .6378 and the ratio estimate of the total number of
i=l
fish caught during 1995 is given by
$$\hat{Y}_R = \sum_{i\in s} w_i y_i = 283739.8.$$
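The ratio-type calibration weights of this example can be retraced as follows. The sketch uses the standard solution of the chi square calibration problem, $w_i = d_i + \lambda d_i q_i x_i$ with $\lambda = (X - \sum d_i x_i)/\sum d_i q_i x_i^2$, and the choice $q_i = 1/x_i$. The known 1994 total X is not stated in the passage above; the value used below is back-calculated from the tabulated ratio $w_i/d_i$ and is only an assumption, as are the function and variable names.

```python
def cs_calibrated_weights(d, x, q, X):
    """Minimise sum (w_i - d_i)^2 / (d_i q_i) subject to sum w_i x_i = X."""
    dx = sum(di * xi for di, xi in zip(d, x))
    dqx2 = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))
    lam = (X - dx) / dqx2          # Lagrange multiplier, up to a constant
    return [di + lam * di * qi * xi for di, qi, xi in zip(d, q, x)]


# Sample values as tabulated in Example 5.5.1.
x = [2001, 5692, 2653, 4860, 3850, 17741, 776, 2300]      # 1994 catches
y = [2016, 2319, 3816, 4008, 2568, 16238, 163, 2324]      # 1995 catches
pi = [0.107450, 0.110228, 0.115834, 0.106153,
      0.120075, 0.139569, 0.103605, 0.107686]
d = [1.0 / p for p in pi]

# ASSUMPTION: back-calculated from the tabulated w_i / d_i, not given in the text.
X_ASSUMED = 341856.0

q_ratio = [1.0 / xi for xi in x]                # q_i = 1/x_i gives ratio weights
w = cs_calibrated_weights(d, x, q_ratio, X_ASSUMED)
ratio_estimate = sum(wi * yi for wi, yi in zip(w, y))
```

With $q_i = 1/x_i$ every weight is the design weight multiplied by the same factor $X/\sum d_i x_i$, so the estimate is the ratio estimate; under the assumed X it agrees with the tabulated weights and with 283739.8 up to rounding.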
Thus a GREG estimate of the total number of fish caught during 1995 is
$$\hat{Y}_G = \sum_{i\in s} w_i y_i = 284206.90.$$
Now the question arises of how the calibration can be done if there are two or more
auxiliary variables. To answer this question we have the following theorem :
Theorem 5.5.3. Suppose $X_1$ and $X_2$ are the known totals of two auxiliary
characters $x_{1i}$ and $x_{2i}$, for $i = 1, 2, \ldots, N$. The minimization of the CS distance
function (5.5.5) subject to the two linear calibration constraints given by
$$\sum_{i\in s} w_i x_{1i} = X_1 \qquad (5.5.16)$$
and
(5.5.18)
and
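$$\begin{bmatrix}\sum_{i\in s} d_i q_i x_{1i}^2 & \sum_{i\in s} d_i q_i x_{1i}x_{2i} \\ \sum_{i\in s} d_i q_i x_{1i}x_{2i} & \sum_{i\in s} d_i q_i x_{2i}^2\end{bmatrix}\begin{bmatrix}\lambda_1 \\ \lambda_2\end{bmatrix} = \begin{bmatrix}X_1 - \sum_{i\in s} d_i x_{1i} \\ X_2 - \sum_{i\in s} d_i x_{2i}\end{bmatrix} \qquad (5.5.23)$$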
and
(5.5 .24)
$$\hat{Y}_G = \sum_{i\in s} d_i y_i + \hat{\beta}_1\Big(X_1 - \sum_{i\in s} d_i x_{1i}\Big) + \hat{\beta}_2\Big(X_2 - \sum_{i\in s} d_i x_{2i}\Big) \qquad (5.5.25)$$
where
and
Example 5.5.2. Find the calibration weights for the units selected in the sample by
using PPSWOR sampling by making use of known information about the number
of fish caught during 1993 and 1994 as auxiliary variables. Use the chi square
distance function between the design weights and calibrated weights. Discuss the
cases when these weights lead to the regression estimator for estimating the total
number of fish during 1995. Deduce the value of the estimate.
iX2i
+d f(.tdiX1~)(.tdiXii)-(.tdiXliX2i)2)
1 1=1 1=1 1=1
.
To calculate these weights we proceed as follows:
w_i            w_i y_i
10.62613538    21422.3
12.91002878    29938.4
 7.35385154    28062.3
14.07120429    56397.4
 9.43691815    24234.0
 5.39870687    87664.2
10.65755444     1737.2
 8.18806195    19029.1
Sum           268484.8
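The two-variable calibration of Theorem 5.5.3 can be sketched numerically. The following is a minimal illustration, assuming the usual chi square solution $w_i = d_i + d_i q_i (\lambda_1 x_{1i} + \lambda_2 x_{2i})$ with $\lambda_1, \lambda_2$ solving the 2 x 2 system of calibration equations. The function name `calibrate_two_aux` and all data values are hypothetical and are not taken from Population 4.

```python
def calibrate_two_aux(d, q, x1, x2, X1, X2):
    """Chi square calibration on two auxiliary variables (illustrative sketch)."""
    # Weighted cross-products that form the 2x2 coefficient matrix.
    s11 = sum(di * qi * a * a for di, qi, a in zip(d, q, x1))
    s22 = sum(di * qi * b * b for di, qi, b in zip(d, q, x2))
    s12 = sum(di * qi * a * b for di, qi, a, b in zip(d, q, x1, x2))
    # Shortfalls of the design-weighted totals from the known totals.
    r1 = X1 - sum(di * a for di, a in zip(d, x1))
    r2 = X2 - sum(di * b for di, b in zip(d, x2))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("auxiliary variables are collinear in the sample")
    # Cramer's rule for the two Lagrange multipliers.
    lam1 = (s22 * r1 - s12 * r2) / det
    lam2 = (s11 * r2 - s12 * r1) / det
    return [di + di * qi * (lam1 * a + lam2 * b)
            for di, qi, a, b in zip(d, q, x1, x2)]


# Hypothetical design weights, tuning constants and auxiliary values.
d = [10.0, 12.0, 8.0, 9.0]
q = [1.0, 1.0, 1.0, 1.0]
x1 = [3.0, 5.0, 2.0, 7.0]
x2 = [4.0, 1.0, 6.0, 2.0]
X1, X2 = 200.0, 150.0

w = calibrate_two_aux(d, q, x1, x2, X1, X2)
total_x1 = sum(wi * a for wi, a in zip(w, x1))
total_x2 = sum(wi * b for wi, b in zip(w, x2))
print(w)
print(total_x1, total_x2)  # both calibration constraints are reproduced
```

The calibrated weights reproduce both known totals up to floating point error, which is the defining property of the constraints in the theorem.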
Remark 5.5.2: Note that there is no choice of weights $q_i$ such that the estimator
$\hat{Y}_G$ in (5.5.25) will reduce to a multivariate ratio type estimator with two auxiliary
variables.
Now if there are p auxiliary variables $x_{ij}$, $j = 1, 2, \ldots, p$, and the population totals
Theorem 5.5.4. The optimal weights in case of p auxiliary variables are given by
w = d, + d .q.[I CX (5.5.30)
I I "
The resultant estimator of the population total in the presence of p auxiliary
variables is given by
YG = r.d;Yi + "IdiqiYi['CX . (5.5.31)
ie s ie s
Now we will study some properties of the above estimator in the following
theorems:
where $R_{y.x_1, x_2, \ldots, x_p}$ denotes the multiple correlation coefficient of y on $x_1, x_2, \ldots, x_p$.
In the following section we shall discuss the calibration of the estimator of variance
of the HT estimator of population total, Y, defined as
$$\hat{Y}_{HT} = \sum_{i \in s} d_i y_i$$
and then the estimation of variance of GREG defined as
in section 5.7.
Remark 5.5.3: The drawback of the above approach is that it may produce
negative or very large calibrated weights for some units. This can lead
to unstable parameter estimates in some domains. It may also produce implausible
estimates, such as a negative total for a variable which is strictly positive. To
alleviate these problems we would like to introduce restrictions on the values that
the calibrated weights can take. Suppose we specify a lower bound $L_i$ and an
upper bound $U_i$ for each unit $i \in s$, where the bounds may differ from unit to unit. Thus the
above problem can be restated as
The Sen--Yates--Grundy (1953) form of the variance of the estimator $\hat{Y}_{HT}$ for a
fixed sample size is given by
(5.6.1)
where $D_{ij} = (\pi_i \pi_j - \pi_{ij}) / \pi_{ij}$ denotes the design weights. Singh, Horn, Chowdhury
and Yu (1999) consider an estimator of variance of the Horvitz and Thompson
(1952) estimator
(5.6.4)
410 Advanced sampling theory with applications
adjusting the denominator of weights $D_{ij}$ has also been discussed by Fuller (1970),
but his method may have a limitation of not guaranteeing the non-negativity of the
estimates of variance.
For simplicity they restricted themselves to the two dimensional CS type of distance
D between two n x n grids formed by the weights wij and Dij for i, j = 1, 2, ..., n ,
defined as
(5.6.5)
In most situations $Q_{ij} = 1$ but other types of weights can also be used. It is shown
that the ratio type estimator proposed by Isaki (1983) is a special case for a
particular choice of $Q_{ij}$. Minimization of (5.6.5) subject to (5.6.4) leads to modified
optimal weights
$$w_{ij} = D_{ij} + \frac{D_{ij} Q_{ij} (d_i x_i - d_j x_j)^2}{\frac{1}{2} \sum\sum_{i \ne j \in s} D_{ij} Q_{ij} (d_i x_i - d_j x_j)^4} \left[ V_{SYG}(\hat{X}_{HT}) - \frac{1}{2} \sum\sum_{i \ne j \in s} D_{ij} (d_i x_i - d_j x_j)^2 \right] \qquad (5.6.6)$$
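Substitution of $w_{ij}$ from (5.6.6) in (5.6.3) leads to the following regression type
estimator:
where
$$\hat{B} = \frac{\sum\sum_{i \ne j \in s} D_{ij} Q_{ij} (d_i y_i - d_j y_j)^2 (d_i x_i - d_j x_j)^2}{\sum\sum_{i \ne j \in s} D_{ij} Q_{ij} (d_i x_i - d_j x_j)^4} = \hat{\mu}_{22} / \hat{\mu}_{04} \qquad (5.6.8)$$
and
$$v_{SYG}(\hat{X}_{HT}) = \frac{1}{2} \sum\sum_{i \ne j \in s} D_{ij} (d_i x_i - d_j x_j)^2 . \qquad (5.6.9)$$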
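The calibration of the weights $D_{ij}$ can be illustrated with a short numerical sketch. It assumes that the calibration constraint is $\frac{1}{2}\sum\sum w_{ij}(d_i x_i - d_j x_j)^2 = V_{SYG}(\hat{X}_{HT})$, which is the form implied by (5.6.6), and it uses an SRSWOR design with hypothetical data and a hypothetical known value `V_x`. The helper names are illustrative only.

```python
def squared_gaps(d, v):
    """(d_i v_i - d_j v_j)^2 for every ordered pair i != j."""
    n = len(d)
    return {(i, j): (d[i] * v[i] - d[j] * v[j]) ** 2
            for i in range(n) for j in range(n) if i != j}


def v_syg(D, d, v):
    """Sen--Yates--Grundy form: 0.5 * sum_{i != j} D_ij (d_i v_i - d_j v_j)^2."""
    return 0.5 * sum(D[i][j] * g for (i, j), g in squared_gaps(d, v).items())


def calibrated_syg(D, Q, d, y, x, V_x):
    """Return the calibrated weights w_ij and the regression type estimator."""
    gx = squared_gaps(d, x)
    gy = squared_gaps(d, y)
    # Denominator 0.5 * sum D_ij Q_ij (d_i x_i - d_j x_j)^4.
    denom = 0.5 * sum(D[i][j] * Q[i][j] * g ** 2 for (i, j), g in gx.items())
    shortfall = V_x - v_syg(D, d, x)
    w = {(i, j): D[i][j] + D[i][j] * Q[i][j] * g / denom * shortfall
         for (i, j), g in gx.items()}
    v1 = 0.5 * sum(w[p] * gy[p] for p in w)
    return w, v1


# Hypothetical SRSWOR setting: D_ij = (1 - f) / (n - 1), d_i = N / n.
N, n = 20, 4
f = n / N
D = [[(1 - f) / (n - 1)] * n for _ in range(n)]
Q = [[1.0] * n for _ in range(n)]
d = [N / n] * n
y = [12.0, 15.0, 9.0, 20.0]
x = [10.0, 14.0, 8.0, 19.0]
V_x = 900.0  # hypothetical known variance of the HT estimator of X

w, v1 = calibrated_syg(D, Q, d, y, x, V_x)
gx = squared_gaps(d, x)
gy = squared_gaps(d, y)
constraint = 0.5 * sum(w[p] * gx[p] for p in w)
B_hat = (sum(D[i][j] * gy[(i, j)] * gx[(i, j)] for (i, j) in gx)
         / sum(D[i][j] * gx[(i, j)] ** 2 for (i, j) in gx))
regression_form = v_syg(D, d, y) + B_hat * (V_x - v_syg(D, d, x))
print(constraint, v1, regression_form)
```

In this sketch the calibrated weights reproduce `V_x`, and the estimator built from them agrees with the regression form using $\hat{B}$ of (5.6.8) with $Q_{ij} = 1$.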
The leading term of the mean squared error of the regression type estimator (5.6.7)
is
$$MSE[v_1(\hat{Y}_{HT})] = V[v_{SYG}(\hat{Y}_{HT})] + B^2 V[v_{SYG}(\hat{X}_{HT})] - 2B\,Cov[v_{SYG}(\hat{Y}_{HT}), v_{SYG}(\hat{X}_{HT})] \qquad (5.6.10)$$
where
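$$B = \frac{\sum\sum_{i \ne j \in \Omega} Q_{ij} (\pi_i \pi_j - \pi_{ij}) (d_i y_i - d_j y_j)^2 (d_i x_i - d_j x_j)^2}{\sum\sum_{i \ne j \in \Omega} Q_{ij} (\pi_i \pi_j - \pi_{ij}) (d_i x_i - d_j x_j)^4}$$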
and
Here $\pi_{ijkl}$ denotes the positive probability of including four units in the sample, i.e.,
$\pi_{ijkl} = P(i, j, k, l \in s)$.
Expression (5.6.10) shows that the estimator $v_1(\hat{Y}_{HT})$ of variance is not always more
efficient than the estimator given by Sen--Yates--Grundy (1953), but the estimator
(5.6.7) is consistent because the ratio of modified weights to original weights, i.e.,
$w_{ij} / D_{ij}$, converges in design probability to unity. This condition of consistency is an
analogue of the condition given by Sarndal, Swensson, and Wretman (1989).
Ramakrishnan (1975a) has also modified the Sen--Yates--Grundy (1953) estimator to
suit varying sample size designs. It is also possible to calibrate the denominator
$d_{ij} = \pi_{ij}^{-1}$ of $D_{ij}$, similar to Fuller (1970), but this has the limitation of not guaranteeing
the non-negativity of estimates of variance. Sitter and Wu (2002) claim that the
two-dimensional chi square distance function can take negative values, and hence
calibrate only the design weights $d_{ij} = \pi_{ij}^{-1}$. Note that the calibration of $d_{ij} = \pi_{ij}^{-1}$ will
not guarantee a non-negative estimate of variance, but the calibration of $D_{ij}$ can
guarantee the non-negativity of the variance estimate if calibrated under bounded
conditions by using quadratic programming. Further note that a distance can be
negative, but the magnitude of a distance cannot be negative. Stukel, Hidiroglou, and
Sarndal (1996) have attempted to compare Jackknife and linearization forms of the
estimators of variance.
Here we would like to discuss the cases where the estimator $v_1(\hat{Y}_{HT})$ reduces to
usual estimators in simple designs as follows:
Case I. Under SRSWOR, $\pi_i = \pi_j = n/N$, $\pi_{ij} = n(n-1)/N(N-1)$ and if $Q_{ij} = 1$ then
(5.6.7) becomes
where
$\hat{Y}_{srswor}$ denotes the estimator of population total under SRSWOR design,
$s_y^2 = \frac{1}{2n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (y_i - y_j)^2$ is an unbiased estimator of $S_y^2$,
$s_x^2 = \frac{1}{2n(n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2$ is an unbiased estimator of $S_x^2$,
and
$\hat{b} = \hat{\mu}_{22} / \hat{\mu}_{04}$
where
$$\hat{\mu}_{22} = \frac{N^4 (1-f)}{n^4 (n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (y_i - y_j)^2 (x_i - x_j)^2, \quad \text{and} \quad \hat{\mu}_{04} = \frac{N^4 (1-f)}{n^4 (n-1)} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^4 .$$
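The SRSWOR simplifications used in Case I can be checked with exact arithmetic. The sketch below verifies, for one hypothetical sample, that $D_{ij} = (1-f)/(n-1)$ and that the Sen--Yates--Grundy form then equals $N^2 (1-f) s_y^2 / n$. The sample values are invented for illustration.

```python
from fractions import Fraction

N, n = 20, 4
f = Fraction(n, N)
pi_i = Fraction(n, N)                        # first order inclusion probability
pi_ij = Fraction(n * (n - 1), N * (N - 1))   # second order inclusion probability
D = (pi_i * pi_i - pi_ij) / pi_ij            # D_ij is constant under SRSWOR

y = [12, 15, 9, 20]      # hypothetical sample values
d = Fraction(N, n)       # design weight d_i = 1 / pi_i

# Sen--Yates--Grundy form over ordered pairs i != j.
v_syg = Fraction(1, 2) * sum(D * (d * y[i] - d * y[j]) ** 2
                             for i in range(n) for j in range(n) if i != j)
# Sample variance written as a double sum, as in Case I.
s2_y = Fraction(1, 2 * n * (n - 1)) * sum((y[i] - y[j]) ** 2
                                          for i in range(n) for j in range(n))

d_matches = (D == (1 - f) / (n - 1))
v_matches = (v_syg == N ** 2 * (1 - f) * s2_y / n)
print(D, v_syg, s2_y)
print(d_matches, v_matches)
```

Using `Fraction` avoids rounding, so the two identities can be compared with exact equality rather than a tolerance.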
The ratio
Case II. For an IPPS sampling scheme for which $\pi_i = n p_i$ and $\pi_{ij}$ is a second order
inclusion probability such that $\sum_{i \in \Omega} \pi_i = n$ and $\sum_{j (\ne i) \in \Omega} \pi_{ij} = (n-1)\pi_i$, then the Horvitz
and Thompson (1952) estimator becomes
$$\hat{Y}_{HT} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i}$$
with
$$v_1(\hat{Y}_{HT}) = v_y(\hat{Y}_{HT}) + \hat{b} \left[ V(\hat{X}_{HT}) - v_x(\hat{X}_{HT}) \right] \qquad (5.6.12)$$
where
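$$v_y(\hat{Y}_{HT}) = \frac{1}{2n^2} \sum\sum_{i \ne j \in s} k_{ij} \left( \frac{y_i}{p_i} - \frac{y_j}{p_j} \right)^2, \qquad v_x(\hat{X}_{HT}) = \frac{1}{2n^2} \sum\sum_{i \ne j \in s} k_{ij} \left( \frac{x_i}{p_i} - \frac{x_j}{p_j} \right)^2$$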
where
n 2p op 0 ,
Case III. If $Q_{ij} = (d_i x_i - d_j x_j)^{-2}$ then we have the ratio type estimator given by
$$v_1(\hat{Y}_{HT})_{Ratio} = v_{SYG}(\hat{Y}_{HT}) \left[ \frac{V_{SYG}(\hat{X}_{HT})}{v_{SYG}(\hat{X}_{HT})} \right] . \qquad (5.6.13)$$
Following Sarndal, Swensson, and Wretman (1989), Deville and Sarndal (1992),
Sarndal (1996), and Rao (1997), the GREG can be written as
$$\hat{Y}_G = \sum_{i \in s} \frac{e_i}{\pi_i} + \hat{\beta}_{ds} X \qquad (5.7.1)$$
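and the Sen--Yates--Grundy (1953) form of estimator of variance of the GREG is
where $D_{ij} = (\pi_i \pi_j - \pi_{ij}) / \pi_{ij}$, $i \ne j$, and $e_i = y_i - \hat{\beta}_{ds} x_i$. This estimator can easily be
written as
$$v_{SYG}(\hat{Y}_G) = \frac{1}{2} \sum\sum_{i \ne j \in s} D_{ij} (d_i e_i - d_j e_j)^2 + \hat{\psi}_{11} \Big( X - \sum_{i=1}^{n} d_i x_i \Big) + \hat{\psi}_{12} \Big( X - \sum_{i=1}^{n} d_i x_i \Big)^2 \qquad (5.7.3)$$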
where
IE S
1 ,*~
$$\hat{\psi}_{12} = 0.5 \Big\{ \sum_{i \in s} d_i q_i x_i^2 \Big\}^{-2} \sum\sum_{i \ne j \in s} D_{ij} (d_i q_i x_i e_i - d_j q_j x_j e_j)^2 .$$
The estimator in (5.7.2) covers a variety of estimators of variance. Let us consider
the SRSWOR design, i.e., $\pi_i = \pi_j = n/N$ and $\pi_{ij} = n(n-1)/N(N-1)$.
Case I. If $q_i = 1$, then $\hat{Y}_G$ reduces to the usual regression estimator of total. Now
if $w_i = d_i$ in (5.7.2), it reduces to
Case II. If $q_i = 1/x_i$ then the estimator $\hat{Y}_G$ reduces to the ratio estimator $\hat{Y}_R$ of the
population total. The estimator of variance (5.7.2) reduces to
$$v(\hat{Y}_R) = \frac{N^2 (1-f)}{n(n-1)} \sum_{i=1}^{n} e_i^2 \left\{ \frac{X}{\hat{X}} \right\}^2 \qquad (5.7.5)$$
where,
and
$$\hat{\psi}_{12} = \frac{(N-n) \sum\sum_{i \ne j = 1}^{n} (x_i e_i - x_j e_j)^2}{2N(n-1) \left( \sum_{i=1}^{n} x_i^2 \right)^2}$$
Deng and Wu (1987) have defined a general class of estimators of the variance of
the regression estimator:
$$v_{DW}(\hat{Y}_G) = \frac{N^2 (1-f)}{n(n-1)} \sum_{i=1}^{n} e_i^2 \left\{ \frac{X}{\hat{X}} \right\}^g \qquad (5.7.8)$$
The linear form of the class of estimators (5.7.6) takes the form
$$v_{DW}(\hat{Y}_G) = \frac{N^2 (1-f)}{n(n-1)} \sum_{i=1}^{n} e_i^2 \left[ 1 + g \left( \frac{X}{\hat{X}} - 1 \right) + \frac{g(g-1)}{2} \left( \frac{X}{\hat{X}} - 1 \right)^2 + \cdots \right] \qquad (5.7.9)$$
which is again similar to (5.7.3).
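The relation between the Deng and Wu class and its linear form can be sketched numerically. The code below assumes the class has the form $\frac{N^2(1-f)}{n(n-1)} \sum e_i^2 (X/\hat{X})^g$ and compares it with the expansion truncated after the quadratic term. The function names and all input values are hypothetical.

```python
def deng_wu_variance(N, n, residuals, X, X_hat, g):
    """N^2 (1 - f) / (n (n - 1)) * sum(e_i^2) * (X / X_hat)^g."""
    f = n / N
    base = N ** 2 * (1 - f) / (n * (n - 1)) * sum(e * e for e in residuals)
    return base * (X / X_hat) ** g


def deng_wu_variance_linearized(N, n, residuals, X, X_hat, g):
    """Same class with (X / X_hat)^g replaced by its second order expansion."""
    f = n / N
    base = N ** 2 * (1 - f) / (n * (n - 1)) * sum(e * e for e in residuals)
    t = X / X_hat - 1.0
    return base * (1.0 + g * t + 0.5 * g * (g - 1.0) * t ** 2)


# Hypothetical inputs.
N, n = 50, 5
residuals = [1.2, -0.7, 0.4, -1.1, 0.2]
X, X_hat = 1000.0, 1040.0

for g in (0.0, 0.5, 1.0, 2.0):
    exact = deng_wu_variance(N, n, residuals, X, X_hat, g)
    approx = deng_wu_variance_linearized(N, n, residuals, X, X_hat, g)
    print(g, exact, approx)
```

For g = 0, 1 and 2 the truncated expansion is exact, because the higher order binomial coefficients vanish; for other values of g it is an approximation whose quality depends on how close $X/\hat{X}$ is to one.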
Following Singh, Horn, and Yu (1998), the estimators of variance of the estimators
of total considered so far belong to the low level calibration approach. The
estimators studied by Chaudhuri and Mitra (1992) are also special cases of the low
level calibration approach. As noted earlier, there is no choice of $q_i$ which reduces
$\hat{Y}_G$ to the product method of estimation considered by Cochran (1963). Thus the
estimation of variance of the product estimator has not been discussed here. To
discuss the efficiency of such estimators consider an analogue of the general class
of estimators of the variance of GREG by following Srivastava (1971) given as
Example 5.7.1. We wish to estimate the total number of fish of all kinds caught by
recreational fishermen on the Atlantic and Gulf coasts during 1995. Population 4 in
the Appendix shows that information on the number of different kinds of fish
caught during 1992 and 1994 is available. Use the known information on the
number of fish caught during 1992 to select a sample of eight units by using
PPSWOR sampling. Collect the required information from population 4 to estimate
the total number of fish caught during 1995 using a regression estimator by using
information during 1994 as the auxiliary variable. Apply the Sen--Yates--Grundy
form of the estimator of variance of the regression estimator to construct a
95% confidence interval. Also use the estimator of variance of the GREG proposed
by Sarndal, Swensson, and Wretman (1989).
Solution. Again referring to Example 5.4.1, the ultimate sample is as shown below.
As before, the Sen--Yates--Grundy weights based on first and second order
inclusion probabilities are:
$\sum_{i=1}^{8} d_i x_{1i} = 326971.9042$ and $\sum_{i=1}^{8} d_i x_{1i}^2 = 3047945487$.
Thus the regression estimate of the total number of fish caught during 1995 is
Now we assume the following superpopulation model passing through the origin as
$y_i = \beta x_{1i} + e_i$.
[Figure: Residual plot]
$e_i = y_i - \hat{\beta}_{ds} x_{1i}$,
with $\hat{\beta}_{ds} = \sum_{i \in s} d_i x_{1i} y_i / \sum_{i \in s} d_i x_{1i}^2 = 0.861381359$,
therefore, the sum of the estimates of these residuals may or may not be equal to
zero.
$$v_{SYG}(\hat{Y}_G) = \frac{1}{2} \sum_{i=1}^{8} \sum_{j (\ne i) = 1}^{8} \left( \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \right) \left( \frac{e_i}{\pi_i} - \frac{e_j}{\pi_j} \right)^2$$
The 28 values of $\left( \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \right) \left( \frac{e_i}{\pi_i} - \frac{e_j}{\pi_j} \right)^2$ for different combinations of i and j are:
i \ j   1              2               3              4             5             6             7
2       91238120.44
3       14065768.94    170976306.70
4       2667685.08     63612617.02     28389310.20
5       9931124.16     36881063.74     47536545.93    2561215.89
6       1868160.75     104865300.70    4992906.747    7803347.57    22190860.32
7       8127085.79     47010740.63     42049622.94    1461282.77    225325.07     14113424.37
8       29093.82       94361304.94     12849024.92    3250757.81    10988465.09   1478773.16    9121878.27
Thus we have
$$v_{SYG}(\hat{Y}_G) = \sum\sum_{1 \le j < i \le 8} \left( \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \right) \left( \frac{e_i}{\pi_i} - \frac{e_j}{\pi_j} \right)^2 = 854647113.8 .$$
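As a check on the arithmetic, the 28 tabulated pairwise terms can be summed directly. The sketch below only reproduces that addition; the list layout (one inner list per row i = 2, ..., 8) and the variable names are illustrative choices.

```python
import math

# Pairwise terms from the lower triangular table, one row per unit i = 2, ..., 8.
rows = [
    [91238120.44],
    [14065768.94, 170976306.70],
    [2667685.08, 63612617.02, 28389310.20],
    [9931124.16, 36881063.74, 47536545.93, 2561215.89],
    [1868160.75, 104865300.70, 4992906.747, 7803347.57, 22190860.32],
    [8127085.79, 47010740.63, 42049622.94, 1461282.77, 225325.07, 14113424.37],
    [29093.82, 94361304.94, 12849024.92, 3250757.81, 10988465.09,
     1478773.16, 9121878.27],
]

terms = [value for row in rows for value in row]
v_syg = math.fsum(terms)          # sum over the 28 distinct pairs
standard_error = math.sqrt(v_syg)

print(len(terms))
print(round(v_syg, 1))
print(standard_error)
```

Summing each pair once is equivalent to taking half of the double sum over all ordered pairs with i not equal to j, which is why no factor of one half appears in the code.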
"j
i \ j   1              2               3              4             5             6             7
2       96029625.10
3       14456136.50    178738153.50
4       2748863.48     67239252.44     29199645.52
5       10252616.95    39207263.34     48956859.12    2648940.95
6       2416283.22     113663403.70    4352131.18     9005203.82    24667396.60
7       8223784.93     50272113.22     42935981.61    1441813.32    259376.84     15624764.01
8       30258.71       99296317.46     13198368.82    3352643.45    11348317.19   1962788.15    9121878.27
Thus we have
The next section is devoted to the higher level calibration approach introduced by
Singh, Horn, and Yu (1998), where the variance of the auxiliary variable is
assumed to be known. Several new estimators are shown as special cases of the
higher level calibration approach.
where $\Omega_{ij}$ are the modified weights attached to the quadratic expression by the
Sen--Yates--Grundy (1953) type of estimator and are as close as possible in an
average sense for a given measure to the $D_{ij}$ with respect to the calibration
equation, defined as
the right hand side of (5.8.2), we need either information on every unit of the
auxiliary variable in the population, or only $V_{SYG}(\hat{X}_{HT})$ obtained from a past survey
or pilot survey. Examples of situations where information on every unit of the
auxiliary variable is known are the establishment turnover recorded from census or
administrative records, the Business Register (BR), or the Internal Revenue Service, etc.
The use of a known variance of the auxiliary variable has also been supported by
Das and Tripathi (1978), Singh and Srivastava (1980), Srivastava and Jhajj (1980,
1981), Isaki (1983), Singh and Singh (1988), Swain and Mishra (1992), Shah and
Patel (1996), and Garcia and Cebrian (1996). Singh, Mangat, and Mahajan (1995)
have reviewed classes of estimators of unknown population parameters making use
of the known variance of an auxiliary variable. For simplicity, Singh, Horn, and Yu
(1998) have restricted themselves to the two-dimensional CS type distance,
D, between two $n \times n$ grids formed by the weights $\Omega_{ij}$ and $D_{ij}$ for $i, j = 1, 2, \ldots, n$,
given by
$$D = \frac{1}{2} \sum\sum_{i \ne j \in s} (\Omega_{ij} - D_{ij})^2 (D_{ij} Q_{ij})^{-1} . \qquad (5.8.3)$$
In most of the situations Qij = 1 but other types of weights can also be used. They
have shown that the ratio type adjustment using a known variance of the auxiliary
variable is a special case for a particular choice of Qij ' Minimization of (5.8.3)
subject to (5.8.2) leads to modified optimal weights as
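$$\Omega_{ij} = D_{ij} + \frac{D_{ij} Q_{ij} (d_i x_i - d_j x_j)^2}{\frac{1}{2} \sum\sum_{i \ne j \in s} D_{ij} Q_{ij} (d_i x_i - d_j x_j)^4} \left[ V_{SYG}(\hat{X}_{HT}) - \frac{1}{2} \sum\sum_{i \ne j \in s} D_{ij} (d_i x_i - d_j x_j)^2 \right] \qquad (5.8.4)$$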
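Substituting $\Omega_{ij}$ from (5.8.4) in (5.8.1) leads to the following regression type of
estimator
$$v(\hat{Y}_G) = v_{SYG}(\hat{Y}_G) + \hat{B}_1 \left[ V_{SYG}(\hat{X}_{HT}) - v_{SYG}(\hat{X}_{HT}) \right] \qquad (5.8.6)$$
where
$$\hat{B}_1 = \frac{\sum\sum_{i \ne j \in s} D_{ij} Q_{ij} (d_i x_i - d_j x_j)^2 (w_i e_i - w_j e_j)^2}{\sum\sum_{i \ne j \in s} D_{ij} Q_{ij} (d_i x_i - d_j x_j)^4} = \hat{\mu}_{22} / \hat{\mu}_{04} .$$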
The regression coefficient $\hat{B}_1$ makes use of the known total X of the auxiliary
variable and hence can be treated as an improved estimator of the regression
coefficient by following Singh and Singh (1988). Under the higher level calibration
approach, Singh, Horn, and Yu (1998) have discussed the following cases:
Case I. Under SRSWOR sampling design, if $q_i = x_i^{-1}$ and $Q_{ij} = (d_i x_i - d_j x_j)^{-2}$ are,
respectively, the weights attached in the low level and higher level calibration
approach, then the estimator (5.8.6) reduces to the estimator of the variance of the
ratio estimator as
(5.8.7)
(5.8.9)
Without loss of generality, the estimators of variance of ratio and GREG given in
(5.8.7) and (5.8.8) are neither members of a low level calibration nor of the class of
estimators by Deng and Wu (1987). These estimators are members of the
analogues of classes of estimators for estimating variance of GREG given by
Srivastava and Jhajj (1981) as
$$v_{SJ}(\hat{Y}_G) = \frac{N^2 (1-f)}{n(n-1)} \sum_{i=1}^{n} e_i^2 \, H\!\left( \frac{\hat{X}}{X}, \frac{s_x^2}{S_x^2} \right) \qquad (5.8.10)$$
where H(., .) is a parametric function such that H(1, 1) = 1 which satisfies certain
regularity conditions defined in Chapter 3. Following Srivastava and Jhajj (1981)
and Deng and Wu (1987), it is easy to see that the class of estimators in (5.8.10)
remains better than the class of estimators defined in (5.7.6) and hence (5.7.10). A
difficult issue in using (5.8.1) is how to get non-negative estimates of variance
using calibration. The simplest way is to optimise the CS distance function (5.8.3)
subject to calibration constraint (5.8.2) along with the conditions $\Omega_{ij} \ge 0$
for all $i, j = 1, 2, \ldots, n$. Constrained calibration weights can be obtained by following
Estevao (1994). While it is difficult to develop a solution to this problem
theoretically, well known quadratic programming techniques can yield useful
numerical results. A straightforward extension to the two-dimensional problem, using
other distance functions as discussed by Deville and Sarndal (1992) for instance,
is not possible owing to the unpredictable nature of the weights $D_{ij}$.
Padmawar (1994, 1998a) has suggested sampling strategies admitting the
non-negative unbiased estimators of the variance following the lines of Rao and
Vijayan (1977) and Rao (1979). Chaudhuri (1981) also discussed the methods for
constructing non-negative estimators of the variance by following Sharma (1970),
Rao (1972, 1977a), Vijayan (1975) and Chaudhuri (1976) and Bandopadhyaya,
Chattopadhyaya, and Kundu (1977). Arnab (1992) and Chaudhuri and Roy (1997a)
have also considered the problem of estimation of population total under a super
population model.
Example 5.8.1. We want to estimate the total real estate farm loans of all the
operating banks in the United States. Take an SRSWOR sample of 15 states from
population 1 and note the records of the real estate farm loans as well as nonreal
estate farm loans. Given that information on the nonreal estate farm loans is
available for all states, apply the ratio estimator for estimating the total real estate
farm loans in the United States. Construct the 95% confidence intervals using low
and higher level calibration estimators.
Given: $N = 50$, $X = 43908.12$ and $S_x^2 = 1176526$.
Solution. We used the first two columns of the Pseudo-Random Numbers (PRN)
given in Table 1 of the Appendix to select 15 distinct random numbers $1 \le R_i \le 50$.
We observed the random numbers in the sequence as 01, 23, 46, 04, 32, 47, 33, 05,
22, 38, 29, 40, 03, 36 and 27.
Selected sample and analysis
An estimate of the total nonreal estate farm loans in the United States during 1997
is given by
$\hat{X} = N \bar{x} = 50 \times 1098.892 = 54944.6$.
Also we are given $\bar{X} = 878.1624$ and $f = 0.3$.
Thus a ratio estimate of the total real estate farm loans in the United States during
1997 is given by
$$\hat{Y}_R = N \bar{y} \left( \frac{\bar{X}}{\bar{x}} \right) = 50 \times 630.414 \times \left( \frac{878.1624}{1098.892} \right) = 25189.276 .$$
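The arithmetic of this ratio estimate can be reproduced in a few lines. The sketch uses only the summary figures quoted in the example (N, the sample means and the known total X); the variable names are illustrative.

```python
N = 50                 # number of states in the population
n = 15                 # sample size
x_bar = 1098.892       # sample mean of nonreal estate farm loans
y_bar = 630.414        # sample mean of real estate farm loans
X_total = 43908.12     # known population total of the auxiliary variable

f = n / N                              # sampling fraction
X_bar = X_total / N                    # known population mean
X_hat = N * x_bar                      # expansion estimate of the auxiliary total
Y_ratio = N * y_bar * (X_bar / x_bar)  # ratio estimate of the total

print(f, X_bar, X_hat)
print(round(Y_ratio, 3))
```

Because the sample mean of the auxiliary variable exceeds its known population mean, the ratio adjustment scales the expansion estimate $N \bar{y}$ downward.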
= 12910667.19.
Making use of Table 2 from the Appendix the 95% confidence interval based on the
low level calibration estimator for estimating the total real estate farm loans in the
United States is given by
This example shows that the width of the 95% confidence interval obtained from a
higher level calibrated estimator is smaller than that obtained from a low level
calibration estimator.
Farrell and Singh (2002a) considered a new estimator of the variance of GREG as
(5.8.1.1)
where $\Omega_{ij}$ are recalibrated weights such that the chi square type of distance function
is minimized subject to the calibration constraint
or equivalently
where $c_{ij}$ are real constants, $e_i = y_i - \beta x_i$ such that $E_m(e_i) = 0$, $V_m(e_i) = \sigma^2 v(x_i)$ and
$E_m(e_i e_j) = \rho_e \sigma^2 \sqrt{v(x_i) v(x_j)}$ for $i \ne j$, $\sigma^2 > 0$. Here $\rho_e$ is the correlation coefficient
between successive error terms that are related according to $e_i = \rho_e e_{i-1} + u_i$ where
the $u_i \sim$ i.i.d. $N(0, 1)$.
Optimization of (5.8.1.2) subject to (5.8.1.3) yields the recalibrated weights
where $\gamma_1$ and $\gamma_2$ are sample dependent constants, is also a special case of (5.8.1.5).
We illustrate some special cases of (5.8.1.5) here, but a large number of estimators
can be derived as special cases.
If $q_i = 1/x_i$, $\pi_i = n/N$, $\pi_{ij} = n(n-1)/N(N-1)$, $v(x_i) = x_i^g$ for $0 \le g \le 2$ and if
'(r.' . ) = '
- Ix f -
N
I ;en
Pe I
N(N - I) ( ;e n
vxf J21
I;
v rn~ ~ 2 (5.8.1.7)
I ~Ixf-
n ~(INJ
n(n -I)
;e n ien
where $v_0 = \frac{N^2 (1-f)}{n(n-1)} \sum_{i \in s} e_i^2$ and $e_i = y_i - (\bar{y}/\bar{x}) x_i$. Assuming that $|(x_i - \bar{x})/\bar{x}| < 1$,
1 +~ 2: I (Xi=.X]j+l II (g- k)
N iEOj=O X k=O(j + I) (5.8. 1.8)
1
1+-2: 2:
00 (
Xi~X
)j+1 nj ~
- ( k)
n iEOj=O x k=OV+1)
;r
If j = 0 then (5.8.1.8) becomes
V(YratiO) = va(
which is the same as the class of estimators defined by Wu (1982).
However, if j =2 then
where
$\mu_{03} = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{X})^3$ and $\hat{\mu}_{03} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^3$ are the third order population and
sample moments of the auxiliary variable. This estimator does not belong to the
class of estimators proposed by Singh, Horn, and Yu (1998), implying that the
recalibration method is more general than that of these authors.
Consider the case of autocorrelation Pe not necessarily zero and note that if
(5.8.2.2)
where
L: L: !1ijcij(hl + hJ - 2peh;hj XWiei - wjej ~
PoPt= i* jes ( )
L: L: Cij!1ij III + IlJ - 2Pehihj
i* jes
This result illustrates that the Farrell and Singh (2002a) technique works to
recalibrate the Yates and Grundy (1953) form of the estimator of the variance of
GREG under the condition of minimum variance for the estimator of the total under
the true model. It is often the case that $c_{ij} = 1$. If this is so, then (5.8.2.2) is a new
estimator of variance of the GREG estimator. Three other special cases of (5.8.2.2)
are also worthy of note.
Case I. If $\rho_e = 0$ and $c_{ij} = (h_i^2 + h_j^2)^{-1}$, then the estimator (5.8.2.2) reduces to
. (.)
V2
I ( \2
YG = - L L Wij Wjej - Wjej J
j LL
" je fl 0 ij
--'-'-'~--,---
. 1 (5.8.2 .3)
2iv j es II; )
1
2
LL Wijh +
i e jes
Case III. If $\rho_e = +1$, then for $c_{ij} = (h_i - h_j)^{-2}$ the estimator of variance in (5.8.2.2)
becomes
$$v_2(\hat{Y}_G) = 0 . \qquad (5.8.2.5)$$
Note also that the condition $\pi_i \propto \sqrt{v(x_i)}$ corresponds to the Godambe and Joshi
(1965) lower bound of variance, so the variance for a fixed sample design under the
true model may be equal to zero. This demonstrates the usefulness of the
recalibration method of the estimator of the variance of the GREG estimator.
Although different choices of the function $v(x_i)$ lead to different estimators of the
variance of GREG, the most common form of the function is
$v(x_i) = x_i^g$, where g is any known model parameter.
where the basic weights $d_i(s)$ can depend both on s and i ($i \in s$) and satisfy the
design unbiasedness condition. The choice $h(y) = y$ in (5.9.3) gives Godambe's
(1955) class of estimators of total. If $d_i(s) = \pi_i^{-1}$ then (5.9.3) reduces to the Horvitz
and Thompson (1952) estimator of population total. If $d_i(s) = w_i$ and
$h(y_i) = I(y_i \le t)$, then (5.9.3) reduces to the estimator $\hat{F}(t)$ suggested by Silva and
Skinner (1995). Rao (1979) has suggested an estimator to estimate the variance of
the estimator $\hat{H}_y$ as
$$v(\hat{H}_y) = \sum\sum_{i < j; \; i, j \in s} D_{ij}(s) w_i w_j (z_i - z_j)^2 \qquad (5.9.4)$$
where $z_i = h(y_i)/w_i$ and weights $D_{ij}(s)$ can depend both on s and $(i, j) \in s$, and
satisfy the unbiasedness condition. For example, the Sen--Yates--Grundy (1953)
estimator of the variance of the Horvitz and Thompson estimator is a special case of
(5.9.4) with $w_i = \pi_i$ and $D_{ij}(s) = (\pi_i \pi_j - \pi_{ij}) / (\pi_{ij} \pi_i \pi_j)$ for any design with fixed
sample size n. Singh (2001) proposed an estimator of the variance of $\hat{H}_y$ as
$$v_1(\hat{H}_y) = \sum\sum_{i < j; \; i, j \in s} w_{ij}(s) w_i w_j (z_i - z_j)^2 , \qquad (5.9.5)$$
where $w_{ij}(s)$ are the modified weights and are as close as possible in an average
sense for a given measure to the $D_{ij}(s)$ with respect to the calibration equation:
$$\sum\sum_{i < j; \; i, j \in s} w_{ij}(s) w_i w_j (q_i - q_j)^2 = V(\hat{H}_x) , \qquad (5.9.6)$$
where $V(\hat{H}_x)$ denotes the known second order moment of the estimator $\hat{H}_x = \sum_{i \in s} d_i(s) h(x_i)$ of the
auxiliary parameter $H_x$.
The two-dimensional CS type distance D between two lower
triangular $n \times n$ grids formed by the weights $w_{ij}(s)$ and $D_{ij}(s)$ for $i, j = 1, 2, \ldots, n$ is
defined as
Minimization of (5.9 .7) subject to (5.9.6) leads to the modified optimal weights
given by
i< j t,J E S
i.jE S
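On substituting the value of $w_{ij}(s)$ from (5.9.8) in (5.9.5) we obtain a regression
type estimator for the variance of $\hat{H}_y$ as
(5.9.9)
where
$$\hat{B} = \frac{\sum\sum_{i < j; \; i, j \in s} w_i^2 w_j^2 D_{ij}(s) Q_{ij} (q_i - q_j)^2 (z_i - z_j)^2}{\sum\sum_{i < j; \; i, j \in s} w_i^2 w_j^2 D_{ij}(s) Q_{ij} (q_i - q_j)^4}$$
and
$$v(\hat{H}_x) = \sum\sum_{i < j; \; i, j \in s} D_{ij}(s) w_i w_j (q_i - q_j)^2 .$$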
The approximate leading term of the mean squared error of the regression type
estimator (5.9.9) is given by
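$$B = \frac{\sum\sum_{i < j; \; i, j \in \Omega} w_i^2 w_j^2 D_{ij}(\Omega) Q_{ij} (q_i - q_j)^2 (z_i - z_j)^2}{\sum\sum_{i < j; \; i, j \in \Omega} w_i^2 w_j^2 D_{ij}(\Omega) Q_{ij} (q_i - q_j)^4}$$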
and
Cov[v(Ji y1v(il.J]", Lc jL«kL« LDij(n)Dkl(nXJrijkl
i I
- JrijJrkJqi - qjY(Z; - Zj ~ .
i , j , k,lErl
Note that $D_{ij}$, $D_{ij}(s)$, and $D_{ij}(\Omega)$ have different meanings. Dorfman and Hall
(1993) consider the estimator of the distribution function of a variable over a finite
population, when a sample of units is available and the values of a related auxiliary
variable are known for the whole population, and developed several estimators
Consider $|w_{ij} - d_{ij}| < 1$. Let $g(w_{ij})$ be any function of the new weights $w_{ij}$ satisfying
the regularity conditions
(a) $g(d_{ij}) = 0$,
(b) The first and second order partial derivatives of the function g exist and are
known.
The weights $w_{ij}$ are obtained such that the function $g(w_{ij})$ has the minimum value
subject to the constraint
$$\sum\sum w_{ij} h(x_i, x_j) = V_t , \qquad (5.9.1.3)$$
Expanding the function $g(w_{ij})$ around the point $D_{ij}$ by using second order Taylor
series, we have
$g(w_{ij}) = g[D_{ij} + (w_{ij} - D_{ij})]$
(5.9.1.5)
where $\psi_{ij}$ and $\psi_{ij}^{*}$ denote the first and second order derivatives respectively of
the function g with respect to $w_{ij}$ and known constants. For example, if $\psi_{ij} = 0$
and $\psi_{ij}^{*} = 1/(D_{ij} Q_{ij})$, then $\sum\sum g(w_{ij})$ reduces to the CS distance function discussed
The Horvitz and Thompson (1952) estimator and the generalized regression
(GREG) predictor for the population total, Y, are given respectively
(5.10.1)
and
and
$$v_2 = \frac{1}{2} \sum\sum_{i \ne j} D_{ij} (g_{si} d_i e_i - g_{sj} d_j e_j)^2 I_{sij} \qquad (5.10.4)$$
$$I_{sij} = \begin{cases} 1 & \text{if } (i \text{ and } j) \in s, \\ 0 & \text{otherwise.} \end{cases}$$
Kott (1990) proposed the following estimators:
$$v_{kj} = w_j v_j , \quad j = 1, 2 \qquad (5.10.5)$$
where $w_j$ is the calibration weight that satisfies
$$E_m(\hat{Y}_g - Y)^2 = w_j E_m(v_j) . \qquad (5.10.6)$$
The estimators are practicable for the model M(j) with o} = a 2fi . Chaudhuri and
Roy (I 997a) have shown that Vy , the variance of the regression predictor, can be
written as
$$V_y = \sum_{i} a_i y_i^2 + \sum\sum_{i \ne j} a_{ij} y_i y_j , \qquad (5.10.7)$$
where
(5.10.8)
N N
V (r.d .x.f . ) r. d id k0ikxk r. d jdk0 jkXk
a ij = - d i d j 0- ij + Qi Q/ fi 7rj XiXj (
p I
\2 + QjXj 7rj
I 51 k- I
2 +
Q X 7r
i i i
k -I
2
(5 • 10• 9)
r.Q~~J r.Q~~ r.Q~ ~
with Qi > 0 being an assignable constant to form different estimators of regression
coefficient, and Vp( L diX/Si) =.!. L L 0ij(diXi - d jXj Y.They considered a class H of
iES 2 i~jE n
non-homogeneous quadratic unbiased estimator as:
$$v_y = a_s + \sum_{i} b_{si} y_i^2 I_{si} + \sum\sum_{i \ne j} b_{sij} y_i y_j I_{sij} \qquad (5.10.10)$$
where $a_s$, $b_{si}$ and $b_{sij}$ are constants free from the $y_i$ values and satisfy the
unbiasedness conditions as follows:
$E_p(a_s) = 0$, $E_p(b_{si} I_{si}) = a_i$, and $E_p(b_{sij} I_{sij}) = a_{ij}$.
Chaudhuri and Roy (1997a) derived the lower bound of the variance of an estimator
belonging to the class H under the following superpopulation model M:
$E_m(y_i) = \mu_i$, $V_m(y_i) = \sigma_i^2$ and $C_m(y_i, y_j) = 0$ for $i \ne j$, and showed that the variance
estimator
$$v_0 = \sum_{i} d_i a_i (y_i^2 - \sigma_i^2 - \mu_i^2) I_{si} + \sum\sum_{i \ne j} a_{ij} d_{ij} (y_i y_j - \mu_i \mu_j) I_{sij} + \sum_{i} a_i (\sigma_i^2 + \mu_i^2) + \sum\sum_{i \ne j} a_{ij} \mu_i \mu_j \qquad (5.10.11)$$
is optimal within H in the sense that its variance attains the lower bound. In the
next section, Arnab and Singh (2002a) have shown that the result concerning the
lower bound is incorrect and hence the optimality property of $v_0$ cannot be
accepted.
The estimator $v_0$ at (5.10.11) cannot be used in practice since it involves the unknown
parameters $\mu_i$ and $\sigma_i^2$. So Chaudhuri and Roy (1997a) proposed the following
alternative estimators when $\mu_i = \beta x_i$ and $\sigma_i^2 = \sigma^2 x_i^g$ by replacing the unknown
$\beta$ and $\sigma^2$ with their suitable estimators as:
(5 .10.12)
(5.10 .13)
where
The lower bound of the estimator of variance by Chaudhuri and Roy (1997a) has
been restated in the following theorem:
Theorem 5.10.1.1.Under model M, and vy E H
Vm(Vy)~ L:a?(di -lh? + L:L:aJ(lfIij -lhij (5 .10 .1.1)
h 2 s: {2 2 \2 d (2 2V 2 2) 2 2
were 77i = vi - \Ui + fli J an 77ij = \Ui + fli AUj + flj - fl i flj .
The equality is attained in the above if the estimator of variance vy takes the form
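$$v_0 = \sum_{i} d_i a_i (y_i^2 - \sigma_i^2 - \mu_i^2) I_{si} + \sum\sum_{i \ne j} a_{ij} \psi_{ij} (y_i y_j - \mu_i \mu_j) I_{sij} + \sum_{i} a_i (\sigma_i^2 + \mu_i^2) + \sum\sum_{i \ne j} a_{ij} \mu_i \mu_j . \qquad (5.10.1.2)$$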
Thus we have the following theorem:
Theorem 5.10 .1.2. By relaxing the assumptions of Chaudhuri and Roy (1997a), a
new expression for the variance of Vo is given by
M(vo) = EmEp(vo - vf
(5.10 .1.3)
2$ iYj - JJiJJj 12
+ 2II aijl)' f iYj - JJiJJj }{YkYI - JJkJJI }Cp( -ISij ,-ISk/ J
f Vp( -ISijJ + I I IIaiPkJl)'
i"j 1[ij i e j ek e j 1[ij 1[kJ
+ 4I I Iaijaik
~# k
~iYj - JJiJJj}{YiYk - JJiJJd c p(I Sij , Isik J
~ ~
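Further noting
(i) $V_p\left( \frac{I_{si}}{\pi_i} \right) = \frac{\pi_i - \pi_i^2}{\pi_i^2} = \frac{1 - \pi_i}{\pi_i}$,
(ii) $C_p\left( \frac{I_{si}}{\pi_i}, \frac{I_{sj}}{\pi_j} \right) = \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j}$ for $i \ne j$,
(iii) $V_p\left( \frac{I_{sij}}{\pi_{ij}} \right) = \frac{\pi_{ij} - \pi_{ij}^2}{\pi_{ij}^2} = \frac{1 - \pi_{ij}}{\pi_{ij}}$ for $i \ne j$,
(iv) $C_p\left( \frac{I_{si}}{\pi_i}, \frac{I_{sij}}{\pi_{ij}} \right) = \frac{\pi_{ij} - \pi_i \pi_{ij}}{\pi_i \pi_{ij}}$ for $i \ne j$.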
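The design moments (i) to (iv) can be verified by brute force for a small design. The sketch below enumerates every SRSWOR sample of size 3 from a hypothetical population of 5 units and compares both sides of each identity with exact rational arithmetic. The population size, sample size and helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

N, n = 5, 3
samples = list(combinations(range(N), n))   # all SRSWOR samples, equally likely
prob = Fraction(1, len(samples))


def indicator(s, *units):
    """1 if every listed unit belongs to sample s, else 0."""
    return 1 if all(u in s for u in units) else 0


def expect(fn):
    """Design expectation E_p of a function of the sample."""
    return sum(prob * fn(s) for s in samples)


def cov(fn, gn):
    """Design covariance C_p(fn, gn); cov(fn, fn) is the design variance."""
    return expect(lambda s: fn(s) * gn(s)) - expect(fn) * expect(gn)


i, j = 0, 1
pi_i = expect(lambda s: indicator(s, i))
pi_j = expect(lambda s: indicator(s, j))
pi_ij = expect(lambda s: indicator(s, i, j))


def t_i(s):
    return indicator(s, i) / pi_i


def t_j(s):
    return indicator(s, j) / pi_j


def t_ij(s):
    return indicator(s, i, j) / pi_ij


checks = {
    "(i)": cov(t_i, t_i) == (1 - pi_i) / pi_i,
    "(ii)": cov(t_i, t_j) == (pi_ij - pi_i * pi_j) / (pi_i * pi_j),
    "(iii)": cov(t_ij, t_ij) == (1 - pi_ij) / pi_ij,
    "(iv)": cov(t_i, t_ij) == (pi_ij - pi_i * pi_ij) / (pi_i * pi_ij),
}
print(pi_i, pi_ij)
print(checks)
```

The identities only rely on $E_p(I_{si}) = \pi_i$, $E_p(I_{sij}) = \pi_{ij}$ and $I_{si} I_{sij} = I_{sij}$, so the same check should go through for any design whose samples and probabilities can be listed.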
Hence we have
M(vo) =E",Vp(vo)=I alrJl(di - I)+2IIaJ17ij{lf/ij -1}r4 II IalaijaikJijJik (l!ijklf/ijlf/ik - I)
i i* j i*j*k
+4IIaiaijJij ~i - Jii(al + Jil )Xdi - I). (5.10. 1.5)
i*j
Note that the results in (viii), (ix), (x), and (xi) are derived under the assumption
$C_m(y_i, y_j) = 0$, $C_m(y_i^2, y_j^2) = 0$ and $C_m(y_i^2, y_j) = 0$ for $i \ne j$, but these assumptions
may also be relaxed. The following theorem states a new lower bound of variance
as:
and
Now from Chaudhuri and Roy (1997a) we note that $A_0$ attains a minimum when
$b_{si} = \pi_i^{-1}$, $b_{sij} = \pi_{ij}^{-1}$ and $a_s = r_s$, and the minimum value
$$A_0 = \sum_{i} a_i^2 \left( \frac{1}{\pi_i} - 1 \right) \eta_i^2 + \sum\sum_{i \ne j} a_{ij}^2 \left( \frac{1}{\pi_{ij}} - 1 \right) \eta_{ij}$$
which Chaudhuri and Roy (1997a) claimed as the lower bound of $M(v_y)$.
Obviously the claim is not justifiable under the new assumptions because $B_0$ does
not attain a minimum when $b_{si} = \pi_i^{-1}$ and $b_{sij} = \pi_{ij}^{-1}$.
The Horvitz and Thompson (1952) type estimator of the variance of the GREG
predictor is given by
where $q_i$ and $q_{ij}$ are suitably chosen constants to form different kinds of
estimators, subject to a model assisted calibration (MAC) constraint given by
$$E_m(v_c(\hat{Y})) = E_m(V_y) \qquad (5.10.2.1.4)$$
or equivalently
$$\sum_{i} a_i w_i E_m(y_i^2) I_{si} + \sum\sum_{i \ne j} a_{ij} w_{ij} E_m(y_i y_j) I_{sij} = \sum_{i} a_i E_m(y_i^2) + \sum\sum_{i \ne j} a_{ij} E_m(y_i y_j) \qquad (5.10.2.1.5)$$
M : Yi = fJxi+ ei (5.10.2.1.6)
2
such that E m(Yi ) = fJx i ' Vm( Yi)=a x f , and CmlYi'Yj )= O, the calibration
constraint (5.10.2.1.5) reduces to
I a iwi (a
2
xf + fJ 2x l )rSi + I I aij wijfJ2 XiXjI sij = Ia i(a2 xf + fJ2xl )+ I Ia ijfJ2 XiXj
i i* j i i* j
2'LP 2{Ia x
=a ixf + fJ i l + I Iaijxix j} . (5.10.2.1.7)
i i i* j
and
Ia iWi x f l si = Iai x f . (5.10.2.1.9)
i i
Note that a slightly new set of calibration constraints can also developed by
considering an autocorrelated model. Then we have the following theorems :
Theorem 5.10.2.1.1. For $V_x > 0$, that is the first and second order inclusion
probabilities are the functions of another auxiliary variable, then the calibrated
weights are given by
$$w_i a_i = d_i a_i + \frac{C \Delta_1 - B \Delta_2}{AC - B^2} d_i q_i a_i x_i^2 + \frac{A \Delta_2 - B \Delta_1}{AC - B^2} d_i q_i a_i x_i^g \qquad (5.10.2.1.10)$$
and
$$w_{ij} a_{ij} = \psi_{ij} a_{ij} + \frac{C \Delta_1 - B \Delta_2}{AC - B^2} \psi_{ij} q_{ij} a_{ij} x_i x_j , \qquad (5.10.2.1.11)$$
where
$A = \sum_{i} a_i d_i q_i x_i^4 I_{si} + \sum\sum_{i \ne j} a_{ij} \psi_{ij} q_{ij} x_i^2 x_j^2 I_{sij}$, $B = \sum_{i} a_i d_i q_i x_i^{g+2} I_{si}$, $\Delta_1 = V_x - v_{ht}(x)$,
-2)1,L;.aiwixllsi + '~"?-'i:aijW
J
ijXP:jISij} - 2J1{L;.aiWixf l Si } '
I
(5.10.2.1.12)
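Now
$$\frac{\partial \phi}{\partial w_i} = 0 \Rightarrow w_i a_i = d_i a_i + d_i a_i q_i (\lambda x_i^2 + \mu x_i^g) \qquad (5.10.2.1.13)$$
and
$$\frac{\partial \phi}{\partial w_{ij}} = 0 \Rightarrow w_{ij} a_{ij} = \psi_{ij} a_{ij} + \lambda \psi_{ij} a_{ij} q_{ij} x_i x_j . \qquad (5.10.2.1.14)$$
On substituting (5.10.2.1.13) and (5.10.2.1.14) in (5.10.2.1.8) and (5.10.2.1.9) we
have
$$\lambda = \frac{C \Delta_1 - B \Delta_2}{AC - B^2} , \quad \text{and} \quad \mu = \frac{A \Delta_2 - B \Delta_1}{AC - B^2} .$$
On substituting these values of $\lambda$ and $\mu$ in (5.10.2.1.13) and (5.10.2.1.14) we have
the theorem.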
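The expressions for $\lambda$ and $\mu$ are the Cramer's rule solution of a 2 x 2 linear system. The sketch below assumes that substituting (5.10.2.1.13) and (5.10.2.1.14) into the two constraints gives $\lambda A + \mu B = \Delta_1$ and $\lambda B + \mu C = \Delta_2$, which is the system implied by the stated solution. The numerical inputs and the function name `solve_multipliers` are hypothetical.

```python
from fractions import Fraction


def solve_multipliers(A, B, C, delta1, delta2):
    """Solve  lam*A + mu*B = delta1,  lam*B + mu*C = delta2  by Cramer's rule."""
    det = A * C - B * B
    if det == 0:
        raise ValueError("AC - B^2 is zero, so the multipliers are not unique")
    lam = (C * delta1 - B * delta2) / det
    mu = (A * delta2 - B * delta1) / det
    return lam, mu


# Hypothetical values for the sample quantities A, B, C and the two shortfalls.
A, B, C = Fraction(50), Fraction(12), Fraction(7)
delta1, delta2 = Fraction(9), Fraction(-2)

lam, mu = solve_multipliers(A, B, C, delta1, delta2)
first_ok = (lam * A + mu * B == delta1)
second_ok = (lam * B + mu * C == delta2)
print(lam, mu)
print(first_ok, second_ok)
```

Exact fractions are used so that both equations can be checked with equality; with floating point inputs a small tolerance would be needed instead.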
Theorem 5.10.2.1.2. The new calibrated estimator of the variance is given by
where
$$\hat{\beta}^2 = \frac{CP - BQ}{AC - B^2} \quad \text{and} \quad \hat{\sigma}^2 = \frac{AQ - BP}{AC - B^2}$$
with
$P = \sum_{i} a_i d_i q_i x_i^2 y_i^2 I_{si} + \sum\sum_{i \ne j} a_{ij} \psi_{ij} q_{ij} x_i x_j y_i y_j I_{sij}$, and $Q = \sum_{i} a_i d_i q_i x_i^g y_i^2 I_{si}$.
Proof. Using (5.10.2.1.10) and (5.10.2.1.11) in (5.10.2.1.2) we have the theorem.
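$$\hat{Y}_g = N \bar{y} \left( \frac{\bar{X}}{\bar{x}} \right) = \hat{Y}_{ratio} .$$
The conventional Horvitz and Thompson (1952) and the calibrated variance
estimators become
$$v_{ht}(\hat{Y}_{ratio})_{stage=1} = \frac{N}{n} \sum_{i \in s} a_{i\circ} y_i^2 + \frac{N(N-1)}{n(n-1)} \sum\sum_{i \ne j \in s} a_{ij\circ} y_i y_j . \qquad (5.10.2.1.18)$$
The second stage calibrated estimator of variance of the ratio estimator becomes
$$v_{ht}(\hat{Y}_{ratio})_{stage=2} = v_{ht}(\hat{Y}_{ratio})_{stage=1} + \hat{\gamma}_{1g\circ} \{ V_x - \hat{v}_x \} + \hat{\gamma}_{2g\circ} \Big\{ \sum_{i \in \Omega} a_{i\circ} x_i^g - \frac{N}{n} \sum_{i \in s} a_{i\circ} x_i^g \Big\} \qquad (5.10.2.1.19)$$
where
$$\hat{v}_x = \frac{N}{n} \sum_{i \in s} a_{i\circ} x_i^2 + \frac{N(N-1)}{n(n-1)} \sum\sum_{i \ne j \in s} a_{ij\circ} x_i x_j ,$$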
, IE S
N - I ~ I. a ijoYjYj ) - ( .I. a joXjg+I)(.I. a joxjg-IY 2)
( .I. a joXj2g-I)(I.I.ESa joXjYj2+ --
II - I '* JES IES IES
j
Ylgo = ( 3 N- I )( 2g-l) -
.I. aj oXj + - - I. I. a ijoXjXj .La jo Xj
( .L a jo'""';g+I)2
IES II - 1 ,* JES IES IES
and
."I.
$ 3
ai . Xj +-N -- I I ,I aij. xixj ) ( .I a ;.xj g-12
Y; J- ( ,La;.xjg+IJ(.I 2
ai . xjY; + -N --I I I aij.YiYj )
...
(
l ES n - I l~J ES rEs l ES lES n - I l 'i:-J ES
Y2g. = (.Ia;. Xi3+N--I- I I a ij.xi x j
)(
.Ia;. xi
2g-1J- (.I a j. x;g+1J2 .
l ES n - 1 1:1-JE S lES rEs
and
and
(5.10.2.1.23)
where
N 2 N(N -I)
L.aioxi + ( ) L. L. aijoxixj ,
A
VX =-
n ies n n- 1 i¢ jes
and
( 4N-I
,'L.a joXj +--'L. ,'L. 22)(
aijoXj Xj g 2)- ( ,'L.a joXjg+2)( ,'L. aj oXj22
,'L.a ;ox j Y j Y; + -N-l
- 'L. 'L. aijoXjXjYjYj )
... l Est n -I 1':1: )ES lES I ES lES n -I '"¢.jE S
5.10.2.2 CALIBRATION ESTIMATORS WHEN VARIANCE OF AUXILIARY VARIABLE IS KNOWN
Arnab and Singh (2002a) proposed an alternative calibrated estimator of the variance
of the regression predictor as
(5.10.2.2.1)
where $b^*_{si}$ and $b^*_{sij}$ are the modified weights. Now the variance of the regression
predictor of the auxiliary variable is obtained by replacing $y_i$ with $x_i$ in (5.10.7)
and is given by
$$V_x = \sum_{i} a_i x_i^2 + \sum_{i \ne j}\sum a_{ij} x_i x_j. \tag{5.10.2.2.2}$$
known. Let $\sum_i a_i x_i^2 = T_1(x)$ and $\sum_{i \ne j}\sum a_{ij} x_i x_j = T_2(x)$. The calibrated weights $b^*_{si}$ and
$b^*_{sij}$ are obtained by minimizing the CS distance functions
$$\sum_i \frac{(b^*_{si} - b_{si})^2 I_{si}}{b_{si} q_{si}} \quad\text{and}\quad \sum_{i \ne j}\sum \frac{(b^*_{sij} - b_{sij})^2 I_{sij}}{b_{sij} q_{sij}}$$
subject to $\sum_i b^*_{si} x_i^2 I_{si} = T_1(x)$ and $\sum_{i \ne j}\sum b^*_{sij} x_i x_j I_{sij} = T_2(x)$, respectively.
Minimisation yields
$$b^*_{si} = b_{si} + \frac{b_{si} q_{si} x_i^2}{\sum_i b_{si} q_{si} x_i^4 I_{si}}\left[\sum_i a_i x_i^2 - \sum_i b_{si} x_i^2 I_{si}\right], \tag{5.10.2.2.4}$$
(5.10.2.2.5)
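As an illustration of the first-order calibration step in (5.10.2.2.4), the following minimal sketch computes calibrated weights from chi-square distance minimisation and checks that the calibration constraint is met. The function name `calibrate_first_order` and all numerical values are hypothetical; the sums are taken over sampled units only, so the indicator $I_{si}$ is implicit.

```python
def calibrate_first_order(b, q, x, T1):
    """Chi-square calibration: adjust weights b so that sum(b* x^2) equals T1.

    b, q, x are lists over the sampled units (assumed inputs, not book data).
    """
    gap = T1 - sum(bi * xi ** 2 for bi, xi in zip(b, x))
    denom = sum(bi * qi * xi ** 4 for bi, qi, xi in zip(b, q, x))
    lam = gap / denom                      # Lagrange multiplier
    return [bi + lam * bi * qi * xi ** 2 for bi, qi, xi in zip(b, q, x)]

b = [2.0, 3.0, 1.5, 4.0]      # hypothetical starting weights b_si
q = [1.0, 1.0, 1.0, 1.0]      # hypothetical tuning constants q_si
x = [1.0, 2.0, 3.0, 1.5]      # hypothetical auxiliary values
T1 = 40.0                     # hypothetical known T1(x)

b_star = calibrate_first_order(b, q, x, T1)
constraint = sum(bs * xi ** 2 for bs, xi in zip(b_star, x))
```

After calibration, `constraint` reproduces `T1` up to floating point error, which is the defining property of the calibrated weights.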
On putting the values of $b^*_{si}$ and $b^*_{sij}$ from (5.10.2.2.4) and (5.10.2.2.5) in
(5.10.2.2.1), we obtain another calibrated estimator of the variance of the
regression predictor as
(5.10.2.2.7)
where
rIo = .IP ioxiYl / .'L. a iox! , r20 = 'L. 'L. aijoYiYj/'L. 'L. a ijoxiXj' 'L.aio xl = 1i(x)
IES IES 1* JE S 1* JE S iE[l
• [ T2 ()
+ Y20 N(N -1) 'L. 'L. aijoxi x j ]
x - -(--) (5.10.2 .2.8)
n n -1 i* j es
Arnab and Singh (2002a) further considered the situation when $V_x$, the variance of
the regression predictor for the auxiliary variable, is known but $T_1(x)$ and $T_2(x)$ are
unknown, and determined the calibrated weights $b^*_{si}$ and $b^*_{sij}$ by minimizing
(5.10.2.2.11)
and
" bSiqSijXiX/Si{VX -l\}
bsij = bsij + 4 2 2 (5.10.2.2.12)
'L.bsiqsixi l si + 'L.'L.bsijqsijXi xj I sij
i i* j
The resultant estimator of variance of the regression predictor is given by
(5.10.2.2.13)
If we assume $b_{sij}$ (and hence $b^*_{sij}$) equal to zero for all values of i and j in the
sample, then the proposed strategy is an improved version of the estimator studied
by Särndal (1996).
(5.10.2.2.14)
where
A (A ) N 2 N(N - 1)
v p 2 1)'greg = - IaiY; +- (- -) I I aij.Y;Yj
11 ie s n n- 1 i ~ je s
+ Y3. Vx -
{N
- Ea .x;2 +-N(N
(- - -1)) I I aij.x;xj }] (5.10.2.2.15)
A [
11 ie s 11 Il - 1 j~ j es
where
2 2 N- 1
I a ;.Yixi + - - I I aij.xixjYiYj
ies n - I tv j e s and Vx = Iajoxf + I I a ijoxjxj .
Y3. = 4 N- I 2 2 j EQ j* j EQ
I
ie s
«». + -- I I a ijx.xj
n - I i:l:je s
Brewer (1999) said, "It is appropriate to estimate the anticipated variance for
sample design purposes, but for the analysis of any particular sample the prediction
variance is more logical choice".
Thus the next section is devoted to finding the prediction variance of the
calibrated estimator of variance.
= Em[. I: wJsiyl
,eO
+ I: .I: wijIsijYiYJ - {I:aiY1 +
'''' JeO
~ .I: a ijYiYJ}]2
'''' JeO
+ I: I: I: apikm2ixJxk ] .
i"' J",keO
We will now discuss the estimators which take into account the order of the units
selected in the sample as well as those that ignore the order of the units. It is
remarkable here that:
( a ) For each ordered estimator, we can find an unordered estimator;
( b ) The unordered estimator is more efficient than its corresponding ordered
estimator.
Suppose that n units are selected in the sample in the order $y_1, y_2, \ldots, y_n$ with
selection probabilities $p_1, p_2, \ldots, p_n$, respectively. Then an estimator of
population total can be defined as
Thus we have n possible estimators of population total and each estimator is found
to be unbiased.
Ez [Ii I Results for first (i-I) draws are known]= iII Y j + EZ[ Y'.· {I _ iI I Pj }]
; =1 ~ ; =1
= .-:L Y . + Ez(Y')
j =1 ; P
--L.
j
-1
i- I
note that Pj = ~ I - L: Pj
( )
; =1
Therefore
Theorem 5.11.1.2. The ordered estimators $\hat{T}_i$ and $\hat{T}_j$ for $i \ne j$ are uncorrelated.
Proof. We have to show that $\mathrm{cov}(\hat{T}_i, \hat{T}_j) = 0\ \forall\ i \ne j$. For simplicity let us assume
that $i < j$ and results up to the $i$th draw are known. Now we know that
$$\mathrm{cov}(\hat{T}_i, \hat{T}_j) = E_1 C_2[\hat{T}_i, \hat{T}_j \mid A_{j-1}] + C_1[E_2(\hat{T}_i \mid A_{j-1}),\ E_2(\hat{T}_j \mid A_{j-1})] \tag{5.11.1.6}$$
where
$E_1, E_2$ = Conditional expected values,
$C_1, C_2$ = Conditional covariance terms,
$A_{j-1}$ = Results for the first $(j-1)$ draws which are known,
Theorem 5.11.1.3. The unbiased estimator for the variance $V(\hat{T})$ is given by
$$v(\hat{T}) = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n} \hat{T}_i^2 - n\hat{T}^2\right]. \tag{5.11.1.10}$$
Proof. The theorem is proved if we can show
$$E[v(\hat{T})] = V(\hat{T}), \tag{5.11.1.11}$$
where
Then we have
Now
For simplicity we restrict ourselves to the case of sample size of two units. Let $y_i$
and $y_j$ be the values of the units selected with varying probabilities and without
replacement in a sample of size two and let $p_i$ and $p_j$ be the corresponding initial
selection probabilities at the first and second draw respectively. If $y_i$ and $y_j$ are
the values of the units drawn at the first and second draw respectively, then Raj's
ordered estimator of population total, Y, corresponding to the ordered sample
$s_1 = (y_i, y_j)$ is given by
$$v(\hat{Y}_1^{o}) = \frac{1}{4}(1 - p_i)^2\left(\frac{y_i}{p_i} - \frac{y_j}{p_j}\right)^2. \tag{5.11.1.15}$$
If $y_j$ and $y_i$ are the values of the units drawn at the first and second draw
respectively, then Raj's ordered estimator of population total, Y, corresponding to
the ordered sample $s_2 = (y_j, y_i)$ is given by
$$\hat{Y}_2^{o} = \frac{1}{2}\left\{(1 + p_j)\frac{y_j}{p_j} + (1 - p_j)\frac{y_i}{p_i}\right\}. \tag{5.11.1.16}$$
The probabilities of the estimators $\hat{Y}_1^{o}$ and $\hat{Y}_2^{o}$ are respectively given by the
probabilities of selecting the corresponding ordered samples as
$p(s_1) = p_i p_j/(1 - p_i)$, and $p(s_2) = p_i p_j/(1 - p_j)$.
Then Raj's (1956) unordered estimator of population total, Y, is defined as
(5.11.1.18)
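The unbiasedness of the ordered estimator for samples of two units can be checked numerically by enumerating every ordered sample. The sketch below assumes the standard form of Raj's estimator for $n = 2$, namely the mean of $y_i/p_i$ and $y_i + y_j(1 - p_i)/p_j$, together with the variance estimator $\frac{1}{4}(1 - p_i)^2 (y_i/p_i - y_j/p_j)^2$. The population values and probabilities are hypothetical.

```python
from itertools import permutations

y = [12.0, 30.0, 7.0, 21.0]   # hypothetical study variable values
p = [0.1, 0.4, 0.2, 0.3]      # hypothetical initial selection probabilities (sum to 1)
Y = sum(y)

def prob_ordered(i, j):
    """Probability of drawing unit i first and unit j second (PPSWOR)."""
    return p[i] * p[j] / (1 - p[i])

def raj_ordered(i, j):
    """Raj's ordered estimator of the total for the ordered sample (i, j)."""
    return 0.5 * ((1 + p[i]) * y[i] / p[i] + (1 - p[i]) * y[j] / p[j])

def raj_var_est(i, j):
    """Variance estimator attached to the ordered sample (i, j)."""
    return 0.25 * (1 - p[i]) ** 2 * (y[i] / p[i] - y[j] / p[j]) ** 2

pairs = list(permutations(range(len(y)), 2))
total_prob = sum(prob_ordered(i, j) for i, j in pairs)
expectation = sum(prob_ordered(i, j) * raj_ordered(i, j) for i, j in pairs)
true_var = sum(prob_ordered(i, j) * (raj_ordered(i, j) - Y) ** 2 for i, j in pairs)
mean_var_est = sum(prob_ordered(i, j) * raj_var_est(i, j) for i, j in pairs)
```

Here `expectation` agrees with the population total `Y`, and `mean_var_est` agrees with `true_var`, which is what unbiasedness of the estimator and of its variance estimator requires.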
Theorem 5.11.1.4. An expression for the variance of the estimator $\hat{T}_{raj}$ for a sample
of two units is given by
A ) :::; -1 ,2N: f}
V (T,.aj (r, YJ2 - {1+ 0 (n)}(n-l)[N
-L. - - - ,2: f} 22: f} (r, YJ2 + 2:
N
Nf} 2(r, YJ2] .
-L. - -L. -
n 1=1 f} N 2n 1=1 1=1 f} 1=1 f}
Under the assumption of large population size, Mukhopadhyay (1977) showed that
v(fraJ =-21
ni*j=1
I &i/Pi - Yj/PjfPiPj[l- n- 1 (Pi + Pj)+ (n-IXn -2)(p; + P] + PiPj)
2 n
-
(n-l)(n-2)(
2 Pi + PJ' L
)N
If2] .
n /=1
The admissibility of the estimator $\hat{T}_{raj}$ for PPSWOR samples of two units within the
class of all unbiased estimators of population total was claimed to be proved by
Joshi (1970). While indicating that this claim is incorrect, Patel and Dharmadhikari
(1978) proved its admissibility when restricted to the class of linear unbiased
estimators only. Sengupta (1980, 1982b) generalized the results of Joshi (1966) by
showing that an estimator identical to $\hat{T}_{raj}$ remains admissible within the class of all
estimators of population total for any fixed size sampling design of size two.
Sengupta (1983) also provided a sufficient condition for the admissibility of
unbiased estimators of finite population parameters when the sample size is at most
two. The result is used to check admissibility of several unbiased estimators of
population total. Rosen (1997b) has provided an asymptotic theory for order
sampling and introduced a novel general class of varying probabilities sampling
schemes, called order sampling schemes. The main result concerns asymptotic
distributions of linear statistics. Even if the results are theoretical, they provide the
groundwork for applications of practical sampling interest. Rosen showed that
order sampling yields interesting contributions to the problem of finding simple and
good $\pi$ps schemes. Bhargava (1978) considered some applications of the technique
of combined unordering of different estimators which enables us to obtain various
new results in addition to those given by Basu (1958) and Pathak (1967a, 1967b).
Rosen (1998) has discussed in detail the methods for calculating the inclusion
probabilities for ordered sampling. Some order relations between the selection and
the inclusion probabilities for the PPSWOR sampling scheme have also been discussed
by Rao, Sengupta, and Sinha (1991). Das (1951) proposed an estimator based on
ordered samples; however, his estimator does not have a non-negative variance
estimator. Mukhopadhyay (1977) considered the comparison of ordered estimators
under Midzuno-Sen's scheme and probability proportional to size with replacement
sampling, based on samples of two units. Andreatta and Kaufman (1986) studied these
estimators under informative design, that is, their selection probabilities depend upon the
values of the study variable.
If we have a sample of n units then there can be n! arrangements of the units, that
is, there will be n! ordered samples. For example, if n = 2, we have two ordered
samples and one unordered sample. In other words, AB and BA are two ordered
samples, but if we ignore the order then there is only one sample of two units. For a
sample of n = 3 units, the number of ordered samples is 3! = 6, namely ABC, ACB,
BAC, BCA, CAB, and CBA. Suppose $s_u$ denotes the $u$th unordered sample; then the
number of unordered samples is $\binom{N}{n}$. If $s_o$ denotes the $o$th ordered sample of the
unordered sample $s_u$, then the number of ordered samples $s_o$ is n!.
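The counting argument can be illustrated with a small enumeration. The population labels below are hypothetical; the sketch lists the unordered samples of size n = 3 from N = 4 units and the ordered samples that correspond to the first unordered sample.

```python
from itertools import combinations, permutations
from math import comb, factorial

units = ["A", "B", "C", "D"]   # hypothetical population, N = 4
n = 3

unordered = list(combinations(units, n))            # all unordered samples
ordered_of_first = ["".join(s) for s in permutations(unordered[0])]

n_unordered = len(unordered)         # equals C(N, n)
n_ordered = len(ordered_of_first)    # equals n!
```

For the unordered sample {A, B, C}, `ordered_of_first` contains ABC, ACB, BAC, BCA, CAB, and CBA, matching the list given in the text.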
Defining
$\hat{T}(s_o, o)$ = Estimate of population total based on the $o$th ordered sample
corresponding to the $u$th unordered sample.
$\hat{T}(s_u)$ = Unordered estimator of population total based on the $u$th unordered sample.
$p(s_o, o)$ = Probability of selecting the $o$th ordered sample of the $u$th unordered
sample.
$p(s_u) = \sum_{o=1}^{n!} p(s_o, o)$ = Probability of selecting the $u$th unordered sample.
(5.11.2.1)
Proof. It follows from the result given in (5.11.2.10) based on the next two theorems.
For n = 2 the estimator (5.11.2.1) reduces to Murthy's (1957) unordered estimator of
population total, which we discuss in the following theorem.
Theorem 5.11.2.2. Murthy's (1957) unordered estimator of population total based
on two units is given by
$$\hat{T}_M = \frac{1}{2 - p_i - p_j}\left[\frac{y_i}{p_i}(1 - p_j) + \frac{y_j}{p_j}(1 - p_i)\right]. \tag{5.11.2.2}$$
Proof. Suppose $(u_i, u_j)$ and $(u_j, u_i)$ are the two orders in which the units can enter the sample and the
probabilities of selection attached to these units are $p_i, p_j$ and $p_j, p_i$,
respectively. Now if $p_i$ is the probability of selecting the $i$th unit at the first draw,
then the probability of selecting the $j$th unit in the second draw given that the $i$th unit has
already been selected in the first draw $= p_j/(1 - p_i)$.
Therefore the probability for selecting $u_i$ and then $u_j$ is $p_i p_j/(1 - p_i) = p(s_o, 1)$.
Similarly the probability for selecting $u_j$ and then $u_i$ is $p_i p_j/(1 - p_j) = p(s_o, 2)$.
Thus the sum of these probabilities is given by
$$p(s_u) = p(s_o, 1) + p(s_o, 2) = \frac{p_i p_j}{1 - p_i} + \frac{p_i p_j}{1 - p_j} = \frac{p_i p_j (2 - p_i - p_j)}{(1 - p_i)(1 - p_j)}. \tag{5.11.2.3}$$
Now the first ordered estimator based on the units Ui and Uj is given by
f( so, I) = Yi + i (1-p;)
}
(5.11.2.4)
with probability p(so, I), and the second ordered estimator based on the units Uj
and Ui is given by
f (so,2)= Yj + ~ (l-pJ
I
(5.11.2.5)
with probability P(so,2} Then we have
f (sJ = f p(so,o)f(so,o) = f p(so,o)f(so,o) = p(so,l)f( so,I)+P(so,2)f(so,2)
0=1 p(su) 0=1 p(su) p(su)
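The unordering step can be verified numerically. The sketch below assumes the standard form of Raj's ordered estimator for two units and a hypothetical population; it forms the probability-weighted average of the two ordered estimators for each pair, compares it with Murthy's closed form, and compares the variances of the ordered and unordered estimators.

```python
from itertools import combinations

y = [12.0, 30.0, 7.0, 21.0]   # hypothetical study variable values
p = [0.1, 0.4, 0.2, 0.3]      # hypothetical initial selection probabilities
Y = sum(y)

def prob_ordered(i, j):
    return p[i] * p[j] / (1 - p[i])

def raj_ordered(i, j):
    # assumed form of the ordered estimator for the ordered sample (i, j)
    return 0.5 * ((1 + p[i]) * y[i] / p[i] + (1 - p[i]) * y[j] / p[j])

def murthy(i, j):
    return ((1 - p[j]) * y[i] / p[i] + (1 - p[i]) * y[j] / p[j]) / (2 - p[i] - p[j])

max_gap = 0.0
mean_unordered = var_ordered = var_unordered = 0.0
for i, j in combinations(range(len(y)), 2):
    p1, p2 = prob_ordered(i, j), prob_ordered(j, i)
    weighted = (p1 * raj_ordered(i, j) + p2 * raj_ordered(j, i)) / (p1 + p2)
    max_gap = max(max_gap, abs(weighted - murthy(i, j)))
    mean_unordered += (p1 + p2) * murthy(i, j)
    var_ordered += p1 * (raj_ordered(i, j) - Y) ** 2 + p2 * (raj_ordered(j, i) - Y) ** 2
    var_unordered += (p1 + p2) * (murthy(i, j) - Y) ** 2
```

In this enumeration `max_gap` is zero up to rounding, `mean_unordered` equals `Y`, and `var_unordered` does not exceed `var_ordered`.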
Theorem 5.11.2.3. The ordered estimator of population total is always less efficient
than the corresponding unordered estimator.
where $V[\hat{T}(s_o, o)]$ is based on the $o$th ordered sample of the $u$th unordered sample; $E_2$, $V_2$
are the conditional expectation and variance for a given unordered sample $s_u$, and
$E_1$, $V_1$ are the expectation and variance over all the unordered samples $s_u$.
Therefore we have
$$V[\hat{T}(s_o, o)] = E_1 V_2[\hat{T}(s_o, o) \mid s_u] + V_1[Q], \tag{5.11.2.8}$$
where
$$Q = E_2[\hat{T}(s_o, o) \mid s_u]. \tag{5.11.2.9}$$
Now to find the value of Q in (5.11.2.9), we proceed as follows. The ordered
estimators $\hat{T}(s_o, o)$ for a given unordered sample $s_u$ have values corresponding to
each order, that is $\hat{T}(s_o, 1), \hat{T}(s_o, 2), \ldots, \hat{T}(s_o, n!)$, where
$\hat{T}(s_o, 1)$ = Estimator of the population total based on the first ordered sample from
the $u$th unordered sample $s_u$,
$\hat{T}(s_o, 2)$ = Estimator of the population total based on the second ordered sample from
the $u$th unordered sample $s_u$,
and
$\hat{T}(s_o, n!)$ = Estimator of population total based on the $(n!)$th ordered sample from the
$u$th unordered sample $s_u$,
with probabilities $p(s_o, 1), p(s_o, 2), \ldots, p(s_o, n!)$, respectively. Now to find the
ordered estimator $\hat{T}(s_o, o)$ we adjust the probabilities such that the sum of these
probabilities equals one.
Thus we use the adjusted probabilities given by $p(s_o, 1)/p(s_u)$, $p(s_o, 2)/p(s_u)$, ...,
$p(s_o, n!)/p(s_u)$, respectively, where $p(s_u) = \sum_{o=1}^{n!} p(s_o, o)$.
Thus we have
Therefore we have
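[Figure: random group scheme. The population of N units is divided by SRSWOR into n random groups (the nth random group consists of the remaining $N_n$ units), and one unit is selected with PPS from each group.]
First random group: Out of N units, select $N_1$ units by using SRSWOR sampling.
$$\sum_{i=1}^{n} N_i = N. \tag{5.12.1}$$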
The allocation of units to different groups is done randomly and we select one unit
from each of the n groups with probability proportional to size (PPS) and thus we
obtain a sample of size n .
Suppose $P_1, P_2, \ldots, P_N$ are the probabilities associated with the N units in the
population and $\sum_{i=1}^{N} P_i = 1$.
Further suppose that $P_{ij}$ denotes the probability corresponding to the $j$th
unit in the
$i$th group, $G_i$, $\forall\ i = 1, 2, \ldots, n$.
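The selection mechanism just described can be sketched in a few lines. The function name `rhc_estimate`, the group sizes, and the population values are hypothetical; the estimator used is the usual RHC form, the sum over groups of $Y_{i1}/(P_{i1}/\pi_i)$ with $\pi_i$ the total of the $P_{ij}$ within the $i$th group.

```python
import random

def rhc_estimate(y, p, group_sizes, rng):
    """One draw of the RHC scheme; returns (estimate of total, selected indices)."""
    if sum(group_sizes) != len(y):
        raise ValueError("group sizes must add up to N")
    units = list(range(len(y)))
    rng.shuffle(units)                      # random allocation of units to groups
    estimate, selected, start = 0.0, [], 0
    for size in group_sizes:
        group = units[start:start + size]
        start += size
        pi_i = sum(p[k] for k in group)     # pi_i = sum of P_ij within the group
        # one unit per group with probability proportional to P_ij
        k = rng.choices(group, weights=[p[j] for j in group], k=1)[0]
        selected.append(k)
        estimate += y[k] / (p[k] / pi_i)
    return estimate, selected

p = [0.05, 0.10, 0.15, 0.20, 0.25, 0.25]    # hypothetical probabilities, sum to 1
y = [100.0 * pk for pk in p]                # y exactly proportional to p
est, sel = rhc_estimate(y, p, [2, 2, 2], random.Random(1))
```

When the study variable is exactly proportional to the selection probabilities, as in this toy population, every possible sample gives the same estimate, equal to the population total.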
Thus the Rao, Hartley, and Cochran (1962) mechanism can be better understood
from the following table, which gives the structure of population units after making
random groups:
Proof. Suppose $E_2$ denotes the expected value for the given random group $G_i$ and
$E_1$ denotes the expected value over all possible random groups. Then we have
$$V(\hat{Y}_{RHC}) = \frac{\left(\sum_{i=1}^{n} N_i^2 - N\right)}{N(N-1)}\left[\sum_{j=1}^{N} \frac{Y_j^2}{P_j} - Y^2\right]. \tag{5.12.3}$$
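Proof. We have
$$V(\hat{Y}_{RHC}) = E_1 V_2(\hat{Y}_{RHC} \mid G_i) + V_1 E_2(\hat{Y}_{RHC} \mid G_i) = E_1 V_2\left[\sum_{i=1}^{n} \frac{Y_{i1}}{(P_{i1}/\pi_i)} \,\Big|\, G_i\right] + V_1 E_2\left[\sum_{i=1}^{n} \frac{Y_{i1}}{(P_{i1}/\pi_i)} \,\Big|\, G_i\right].$$
Note that we have selected independent samples from each group, therefore
$$E_1 V_2\left[\sum_{i=1}^{n} \frac{Y_{i1}}{(P_{i1}/\pi_i)} \,\Big|\, G_i\right] = E_1\left[\sum_{i=1}^{n} V_2\left(\frac{Y_{i1}}{P_{i1}/\pi_i} \,\Big|\, G_i\right)\right]. \tag{5.12.5}$$
Thus we have
$$V_2\left(\frac{Y_{i1}}{P_{i1}/\pi_i} \,\Big|\, G_i\right) = E_2\left[\frac{Y_{i1}}{(P_{i1}/\pi_i)}\right]^2 - \left\{E_2\left[\frac{Y_{i1}}{(P_{i1}/\pi_i)}\right]\right\}^2. \tag{5.12.6}$$
In a given random group of $N_i$ units, the random variable $\frac{Y_{i1}}{(P_{i1}/\pi_i)}$ can take any of
the values $\frac{Y_{i1}}{(P_{i1}/\pi_i)}, \frac{Y_{i2}}{(P_{i2}/\pi_i)}, \ldots, \frac{Y_{iN_i}}{(P_{iN_i}/\pi_i)}$ with probabilities $\frac{P_{i1}}{\pi_i}, \frac{P_{i2}}{\pi_i}, \ldots, \frac{P_{iN_i}}{\pi_i}$,
respectively.
Thus we have
$$E_2\left[\frac{Y_{i1}}{(P_{i1}/\pi_i)}\right] = \frac{Y_{i1}}{(P_{i1}/\pi_i)}\cdot\frac{P_{i1}}{\pi_i} + \frac{Y_{i2}}{(P_{i2}/\pi_i)}\cdot\frac{P_{i2}}{\pi_i} + \cdots + \frac{Y_{iN_i}}{(P_{iN_i}/\pi_i)}\cdot\frac{P_{iN_i}}{\pi_i} = Y_{i1} + Y_{i2} + \cdots + Y_{iN_i} = \sum_{j=1}^{N_i} Y_{ij} = Y_{i\cdot}. \tag{5.12.7}$$
Also we have
$$E_2\left[\frac{Y_{i1}}{(P_{i1}/\pi_i)}\right]^2 = \sum_{j=1}^{N_i} \frac{Y_{ij}^2}{(P_{ij}/\pi_i)} \tag{5.12.8}$$
because $\left[\frac{Y_{i1}}{(P_{i1}/\pi_i)}\right]^2$ takes the values $\left[\frac{Y_{i1}}{(P_{i1}/\pi_i)}\right]^2, \left[\frac{Y_{i2}}{(P_{i2}/\pi_i)}\right]^2, \ldots, \left[\frac{Y_{iN_i}}{(P_{iN_i}/\pi_i)}\right]^2$
with probabilities $\frac{P_{i1}}{\pi_i}, \frac{P_{i2}}{\pi_i}, \ldots, \frac{P_{iN_i}}{\pi_i}$, respectively. Using (5.12.7) and (5.12.8) in
(5.12.6) we have
$$V_2\left(\frac{Y_{i1}}{P_{i1}/\pi_i} \,\Big|\, G_i\right) = \sum_{j=1}^{N_i} \frac{Y_{ij}^2}{(P_{ij}/\pi_i)} - Y_{i\cdot}^2. \tag{5.12.9}$$
Note that
$$\pi_i = \sum_{j=1}^{N_i} P_{ij},$$
therefore (5.12.5) implies that
$$E_1 V_2\left[\sum_{i=1}^{n} \frac{Y_{i1}}{(P_{i1}/\pi_i)} \,\Big|\, G_i\right] = E_1\left[\sum_{i=1}^{n} V_2\left(\frac{Y_{i1}}{P_{i1}/\pi_i} \,\Big|\, G_i\right)\right] = E_1\left[\sum_{i=1}^{n}\left\{\sum_{j=1}^{N_i} \frac{Y_{ij}^2}{(P_{ij}/\pi_i)} - Y_{i\cdot}^2\right\}\right] = \sum_{i=1}^{n} E_1\left[\sum_{j=1}^{N_i} \frac{Y_{ij}^2}{(P_{ij}/\pi_i)} - Y_{i\cdot}^2\right].$$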
(5.12.10)
Furthermore
E{V (t V{/T;)IG)]
2
= ff N; ~ y} + N;(N; - N;)s; _
-1)[ ~ y}P _ ~ Y}]_ N?(NNN; N?y2)
;;\1N N(N -I)
); 1 j ;1 j );1
= In[N; 2
- IN y. + N;(N;-I)[N
I -y} - IN y . 2]- N;(N-N;) 1 {NI 2 -2} - N;2-2]
Y - NY Y
;;1 N ) ;1 J N(N -I) );1 Pj ); \ J N (N -1) j ;1 J
= L"[N; N 2
- L y. +
N; (N; - I) LN -YJ - N; (N;- I) LN y.2 - -
N;- L y. + -NN;
N 2
-Y
-2
;= N j =1 J N(N - I) j=1 Pj N(N - I) j= \ J (N - IL=I J (N - I)
2
+ N' N y2
L N~' _y_ 2 _N 2y
- 2 ]
N(N - IL=\ J (N- l) I
N Y}
- L:" [ N;(N; - I) L: N
- + L: y 2{N;
. - - N;(N -I) -N-
j j Nt
+- --} + -y {-
NNj Nt
2 - ---- N ·
2
2} ]
- j=1 N(N- I) j =1 j P j=1 J N N(N- I) N- I N{N - I) N N- I N- I '
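Theorem 5.12.3. The RHC scheme is more efficient than PPSWR sampling if
$N_i = N/n$, $\forall\ i = 1, 2, \ldots, n$.
Proof. We have
$$V(\hat{Y}_{PPSWR}) = \frac{1}{n}\left[\sum_{j=1}^{N} \frac{Y_j^2}{P_j} - Y^2\right], \quad\text{and}\quad V(\hat{Y}_{RHC}) = \frac{\left(\sum_{i=1}^{n} N_i^2 - N\right)}{N(N-1)}\left[\sum_{j=1}^{N} \frac{Y_j^2}{P_j} - Y^2\right].$$
Combining these results we have
$$V(\hat{Y}_{RHC}) = \frac{n\left(\sum_{i=1}^{n} N_i^2 - N\right)}{N(N-1)}\, V(\hat{Y}_{PPSWR}). \tag{5.12.15}$$
To find the minimum value of the variance $V(\hat{Y}_{RHC})$ with respect to $N_i$ we have the
Lagrange function
$$L = \sum_{i=1}^{n} N_i^2 - \lambda\left[\sum_{i=1}^{n} N_i - N\right]. \tag{5.12.16}$$
On differentiating (5.12.16) with respect to $N_i$ and equating to zero we have
$$N_i = \lambda/2. \tag{5.12.17}$$
Since the $N_i$ must add up to N, every group has the same size $N_i = N/n$, and hence
$$V(\hat{Y}_{RHC}) = \frac{\sum_{i=1}^{n} (N/n)^2 - N}{N(N-1)}\left[n V(\hat{Y}_{PPSWR})\right] = \frac{(N - n)}{N - 1}\, V(\hat{Y}_{PPSWR}). \tag{5.12.20}$$
Note that $(N - n)/(N - 1) < 1\ \forall\ n \ge 2$, so $V(\hat{Y}_{RHC}) < V(\hat{Y}_{PPSWR})$. Hence the theorem.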
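The effect of the group sizes on the ratio of the two variances in (5.12.15) can be seen with a short calculation. The helper name `rhc_factor` and the group sizes are hypothetical.

```python
def rhc_factor(group_sizes):
    """Ratio V(RHC) / V(PPSWR) = n (sum N_i^2 - N) / (N (N - 1))."""
    n, N = len(group_sizes), sum(group_sizes)
    return n * (sum(s * s for s in group_sizes) - N) / (N * (N - 1))

equal = rhc_factor([10, 10, 10, 10, 10])      # N = 50, n = 5, equal groups
unequal = rhc_factor([6, 8, 10, 12, 14])      # same N and n, unequal groups
bound = (50 - 5) / (50 - 1)                   # (N - n)/(N - 1)
```

With equal groups the ratio coincides with $(N - n)/(N - 1)$, while the unequal allocation gives a larger ratio, in line with the minimisation argument.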
J
± Y;~
n 2
"IN , -N [
v(r.RHC )= (
;=1 '
n , 2
Y'2
RHC ] • (5.12.21)
(N 2
- ;~IN? J '=I(/j1/1J
Proof. We know that
y2
Note that the estimator of I Yj is
N
j=J
I~
;=1(P;IIT;) ,
the estimator of
N
I
j=1
_J_
Pj
IS
n v(YPPSWR)= I -
, A n Yil
22 T; - [(I-T; J2 -V(YRHC).
;=I/jl
n Yil
i=I/j 1
A A
~ (5.12.25)
Therefore
or
Some improvements in the RHC strategy have also been suggested by Hartley, Rao,
and Kiefer (1969), Gabler and Horst (1995), Mangat (1993), Bansal and Singh
(1986) and Singh and Kishore (1975). Padmawar (1996) has considered an
interesting extension of the RHC strategy for the case of continuous populations. Its
comparison with Midzuno's scheme of sampling has been discussed by Chaudhuri
(1977) under a superpopulation model setup.
Example 5.12.1. From population 1, select a sample of five units by using the
RHC scheme. Estimate the total real estate farm loans using the RHC estimator,
making use of nonreal estate farm loans as an auxiliary variable. Also find a 95%
confidence interval for the total real estate farm loans in the United States.
Solution. We are to select a sample of size 5 by using the RHC scheme. Thus the
population must be divided into five random groups. To do this we selected 50
distinct random numbers between 1 and 50 by starting with the first two columns of
the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix.
The states bearing serial numbers corresponding to the first ten selected random
numbers constitute the first random group, whereas the next ten form the second
random group and so on.
Let
$Y_{ij}$ = Real estate farm loans ($000) for the $j$th state in the $i$th random group,
$X_{ij}$ = Nonreal estate farm loans ($000) for the $j$th state in the $i$th random group,
and
$P_{ij}$ = the initial selection probability of the $j$th unit in the $i$th random group.
We are given X = 43908.12, thus the following are the 5 random groups of units
along with initial selection probabilities.
State No.   State   Y_ij        P_ij
37          OR      114.899     0.034606
25          MO      1579.686    0.034618
50          WY      100.964     0.008802
30          NJ      39.860      0.000626
09          FL      825.748     0.010579
11          HI      40.775      0.000867
49          WI      1229.752    0.031258
43          TX      1248.761    0.080176
15          IA      2327.025    0.089044
39          RI      1.611       0.000005
Sum                             0.290580

State No.   State   Y_ij        X_ij        P_ij
35          OH      870.720     635.774     0.014480
20          MD      139.628     57.684      0.001314
34          ND      449.099     1241.369    0.028272
13          IL      2131.048    2610.572    0.059455
24          MS      627.013     549.551     0.012516
41          SD      413.777     1692.817    0.038554
17          KY      1045.106    557.656     0.012701
08          DE      42.808      43.229      0.000985
28          NY      5.860       16.710      0.000381
45          VT      57.747      19.363      0.000441
Sum                             7424.730    0.169100
Now apply Lahiri's method of selection in each group to select one unit
independently. The first random group consists of $N_1 = 10$ units and the maximum
value of the auxiliary variable $X_{1j}$ is 3928.732. Choosing $X_0 = 4000$ we select a
random number $1 \le R_i \le 10$ by starting with the first two columns and another
random number $1 \le R_j \le 4000$ by starting from the 7th to 10th columns of the Pseudo-
Random Numbers. Then the first effective pair of random numbers is (04, 0757).
Thus from the first random group the unit with Sr. No. 4, that is the state AR, will
be included in the sample.
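The acceptance and rejection step used within each group can be sketched as follows. The function name `lahiri_select` and the toy size measures are hypothetical, and a pseudo-random generator stands in for the printed table of Pseudo-Random Numbers.

```python
import random

def lahiri_select(x, x0, rng):
    """Lahiri's method: draw pairs (R_i, R_j) until R_j <= x[R_i]; return 0-based index."""
    if x0 < max(x):
        raise ValueError("x0 must be at least max(x)")
    while True:
        i = rng.randint(1, len(x))     # 1 <= R_i <= N_i picks a candidate unit
        r = rng.randint(1, x0)         # 1 <= R_j <= X_0
        if r <= x[i - 1]:              # effective pair: accept the candidate
            return i - 1

rng = random.Random(2024)
x = [1, 3]                             # hypothetical size measures
draws = [lahiri_select(x, 3, rng) for _ in range(4000)]
share_of_second = draws.count(1) / len(draws)
```

Over many repetitions the second unit, whose size measure is three times that of the first, is selected in roughly three quarters of the draws, which is the proportional-to-size behaviour the method is designed to give.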
The second random group consists of $N_2 = 10$ units and the maximum value of the
auxiliary variable $X_{2j}$ is 3586.406. Choosing $X_0 = 3600$ we select a random
number $1 \le R_i \le 10$ by starting with the 7th and 8th columns and another random
number $1 \le R_j \le 3600$ by starting from the 13th to 16th columns of the Pseudo-
Random Numbers. Then the first effective pair of random numbers is (07, 0536).
Thus from the second random group, the unit with Sr. No. 7, that is the state IN,
will be included in the sample.
The third random group consists of $N_3 = 10$ units and the maximum value of the
auxiliary variable $X_{3j}$ is 2580.304. Choosing $X_0 = 2600$ we select a random number
$1 \le R_i \le 10$ by starting with the 13th and 14th columns and another random number
$1 \le R_j \le 2600$ by starting from the 19th to 22nd columns of the Pseudo-Random
Numbers. Then the first effective pair of random numbers is (07, 0705). Thus from
the third random group the unit with Sr. No. 7, that is the state MT, will be included
in the sample.
The fourth random group consists of $N_4 = 10$ units and the maximum value of the
auxiliary variable $X_{4j}$ is 3909.738. Choosing $X_0 = 4000$ we select a random number
$1 \le R_i \le 10$ by starting with the 19th and 20th columns and another random number
$1 \le R_j \le 4000$ by starting from the 25th to 28th columns of the Pseudo-Random
Numbers. Then the first effective pair of random numbers is (08, 1230). Thus from
the fourth random group the unit with Sr. No. 8, that is the state TX, will be
included in the sample.
The fifth random group consists of $N_5 = 10$ units and the maximum value of the
auxiliary variable $X_{5j}$ is 2610.572. Choosing $X_0 = 2700$ we select a random number
$1 \le R_i \le 10$ by starting with the 25th and 26th columns and another random number
$1 \le R_j \le 2700$ by starting from the 31st to 34th columns of the Pseudo-Random
Numbers. Then the first effective pair of random numbers is (06, 0599). Thus from
the fifth random group the unit with Sr. No. 6, that is the state SD, will be included
in the sample.
After combining all above steps, the ultimate sample consists of the following
information.

State   pi_i        Y_i1/(P_i1/pi_i)   pi_i (Y_i1/P_i1)^2
AR      -           11416.090          536347692.4
IN      -           9749.003           507681529.1
MT      -           2349.463           41857340.6
TX      -           4525.876           70491971.3
SD      0.169100    1814.867           19478071.8
Sum                 29855.300          1175856605.0

Note that here $\pi_i = \sum_{j \in G_i} P_{ij}$, $i = 1, 2, 3, 4, 5$.
The estimate of the total real estate farm loans during 1997 in the US is given by
$$\hat{Y}_{RHC} = \sum_{i=1}^{n} \frac{Y_{i1}}{(P_{i1}/\pi_i)} = 29855.30$$
and an estimate of variance of the estimator $\hat{Y}_{RHC}$ is given by
We shall discuss a few sampling schemes under which the usual ratio type
estimators of population mean and variance become unbiased .
Let the $t$th sample $s_t$ consist of n units $(y_i, x_i)$, $i \in s_t$, $t = 1, 2, \ldots, \binom{N}{n}$, where $\binom{N}{n}$ is
the total number of samples using SRSWOR sampling. Our aim is to find the
probability $p(t)$ of selecting any given sample. Consider that the first unit, with value $x_1$, is
selected on the first draw.
Then the probability of selecting the first unit on the first draw and the rest of the sample afterwards is $\left(\frac{x_1}{X}\right)\binom{N-1}{n-1}^{-1}$,
where $\binom{N-1}{n-1}^{-1}$ denotes the probability of selecting the remaining $(n - 1)$ units out
of $(N - 1)$ units.
Similarly the probability when the second unit is selected on the first draw is
$\left(\frac{x_2}{X}\right)\binom{N-1}{n-1}^{-1}$, and so on.
The probability when the $n$th unit is selected on the first draw is $\left(\frac{x_n}{X}\right)\binom{N-1}{n-1}^{-1}$.
Hence the probability of selecting the $t$th sample is given by
$$p(t) = \frac{\sum_{i=1}^{n} x_i}{\binom{N-1}{n-1} X}. \tag{5.13.1.1}$$
Let $\bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_i$, $\bar{x}_t = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the sample means obtained from the $t$th sample for
$t = 1, 2, \ldots, \binom{N}{n}$.
Also assuming that the population mean $\bar{X}$ of the auxiliary variable is known, the
usual ratio estimator of population mean is given by
$$\bar{y}_R = \bar{y}_t\left(\frac{\bar{X}}{\bar{x}_t}\right). \tag{5.13.1.2}$$
Then we have the following theorem:
Theorem 5.13.1. The ratio estimator $\bar{y}_R$ is unbiased for the population mean.
Proof. We have
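The unbiasedness of the ratio estimator under this scheme can be confirmed by enumerating every possible sample of a small population. The data below are hypothetical; each sample is weighted by the selection probability $\sum_{i \in s} x_i / \{\binom{N-1}{n-1} X\}$.

```python
from itertools import combinations
from math import comb

y = [3.0, 8.0, 5.0, 12.0, 9.0]     # hypothetical study variable
x = [2.0, 5.0, 4.0, 7.0, 6.0]      # hypothetical auxiliary variable
N, n = len(y), 3
X = sum(x)
x_bar_pop, y_bar_pop = X / N, sum(y) / N

def sample_prob(s):
    """Probability of sample s when the first unit is drawn with probability x_i / X."""
    return sum(x[i] for i in s) / (comb(N - 1, n - 1) * X)

def ratio_estimate(s):
    y_bar = sum(y[i] for i in s) / n
    x_bar = sum(x[i] for i in s) / n
    return y_bar * x_bar_pop / x_bar

samples = list(combinations(range(N), n))
total_prob = sum(sample_prob(s) for s in samples)
expected = sum(sample_prob(s) * ratio_estimate(s) for s in samples)
```

The probabilities add to one and `expected` equals the population mean `y_bar_pop`, because the sample mean of x in the selection probability cancels the one in the denominator of the ratio estimator.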
Nanjamma, Murthy, and Sethi (1959), Singh and Srivastava (1980) and Swain and
Mishra (1992) have considered the use of a Midzuno type of sampling scheme as
follows:
Step I. Select two units, for example, the $i$th and $j$th, with the probability of their joint
selection being proportional to $(x_i - x_j)^2$;
Step II. Select $(n - 2)$ units from the remaining units of the population by simple
random sampling without replacement.
Note that the first step may be performed by considering all possible pairs of units
and selecting a pair with the assigned probability.
$$s_1^2 = s_y^2\left(\frac{S_x^2}{s_x^2}\right)$$
is unbiased for $S_y^2$.
Proof. We have
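A numerical check of this result is possible by enumeration. Under Steps I and II the probability of a sample is proportional to the sum of $(x_i - x_j)^2$ over the pairs it contains, which is in turn proportional to $s_x^2$. The sketch below uses hypothetical data and the variances with divisor one less than the number of units.

```python
from itertools import combinations
from statistics import variance

y = [3.0, 8.0, 5.0, 12.0, 9.0]     # hypothetical study variable
x = [2.0, 5.0, 4.0, 7.0, 6.0]      # hypothetical auxiliary variable (distinct values)
N, n = len(y), 3
S2_y, S2_x = variance(y), variance(x)          # population values, divisor N - 1

def pair_weight(s):
    """Sum of (x_i - x_j)^2 over pairs in s; proportional to the sample's probability."""
    return sum((x[i] - x[j]) ** 2 for i, j in combinations(s, 2))

samples = list(combinations(range(N), n))
norm = sum(pair_weight(s) for s in samples)

expected = 0.0
for s in samples:
    s2_y = variance([y[i] for i in s])
    s2_x = variance([x[i] for i in s])
    expected += (pair_weight(s) / norm) * s2_y * S2_x / s2_x
```

The resulting `expected` equals `S2_y`, so the ratio type estimator of variance is unbiased under this selection scheme for the toy population.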
Singh and Srivastava (1980) proposed two unbiased regression type strategies
which depend on only one auxiliary variable. Wywial (1999) has considered an
elegant extension of such sampling schemes to the case of multi-auxiliary
information. Further, Wywial (2000) considered a study of Horvitz and Thompson
(1952) type estimators under such sampling schemes. Chen (1998) has also
proposed weighted polynomial models and weighted sampling schemes for finite
populations.
$$\sum_{j \in \Omega} \phi_j(y_j, x_j, \theta) = 0 \tag{5.14.1}$$
where $y_j$ and $x_j$ are values of observed variables Y and X respectively. The $\phi_j$
are known real valued functions and $\theta$ is a real valued parameter of interest. Some
special cases of (5.14.1) are as follows:
( a ) the population mean $\mu_y$ defined by
(5.14.2)
where
$$\Delta(y_j \le y) = \begin{cases} 1 & \text{if } y_j \le y, \\ 0 & \text{otherwise;} \end{cases} \tag{5.14.5}$$
When a population function (or parameter) is defined by (5.14 .1), its estimator can
be defined as a solution of the sample estimating equation
(5.14.7)
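where $\pi_j$ denotes the probability of including the $j$th unit in the sample. An
estimator of the population mean can easily be obtained by solving
$\sum_{j \in s} (y_j - \theta)/\pi_j = 0$, that is,
$$\bar{y}_s = \left(\sum_{j \in s} \frac{y_j}{\pi_j}\right) \Big/ \left(\sum_{j \in s} \frac{1}{\pi_j}\right). \tag{5.14.8}$$
$$\hat{Y}_s = N\left(\sum_{j \in s} \frac{y_j}{\pi_j}\right) \Big/ \left(\sum_{j \in s} \frac{1}{\pi_j}\right). \tag{5.14.9}$$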
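The solution of the sample estimating equation for the mean is easy to compute directly. In the sketch below the function name `solve_mean` and the sample values are hypothetical; the check confirms that the returned value makes the weighted sum of residuals vanish.

```python
def solve_mean(y, pi):
    """Solve sum((y_j - theta) / pi_j) = 0 for theta over the sampled units."""
    num = sum(yj / pj for yj, pj in zip(y, pi))
    den = sum(1.0 / pj for pj in pi)
    return num / den

y_s = [14.0, 9.0, 22.0, 17.0]          # hypothetical sampled values
pi_s = [0.20, 0.10, 0.40, 0.25]        # hypothetical inclusion probabilities

theta_hat = solve_mean(y_s, pi_s)
residual = sum((yj - theta_hat) / pj for yj, pj in zip(y_s, pi_s))
equal_prob = solve_mean(y_s, [0.5] * 4)    # reduces to the simple sample mean
```

With equal inclusion probabilities the solution reduces to the ordinary sample mean, which is a quick sanity check on the weighting.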
Thus the underlying concept of the term 'target parameter' can most simply be
explicated by 'induced parameter'. To explain it, let $m_i(\theta)$ and $v_i(\theta)$ be the mean
and variance of the random variable $y_i$ associated with the $i$th unit of the survey
population $(i = 1, 2, \ldots, N)$. In equation (5.14.10), let
E. (0) = { Y; - m;(o)} 01ll;(0)
, V; (0) 00 (5.14.11)
or its trimmed version if one wants to exclude the tails. The score function of the
logistic model discussed by Skinner, Holt, and Smith (1989) can easily be seen as a
special case of (5.14.11) and hence (5.14.10). The robust optimality is as
compelling as the optimality of the sample mean, or its weighted version, for the
Gaussian linear models. The joint distribution of the study variable $Y_0$ and known
auxiliary variable $X_0$ is assumed to be such that $(y_i, x_i)$, $i = 1, 2, \ldots, N$, are independent
with density
$$f(y_i, x_i; \theta, \eta_i) = f_2(x_i; \theta, \eta_i)\, f_1(y_i \mid x_i; \theta), \quad i = 1, 2, \ldots, N \tag{5.14.12}$$
$$\mathrm{Prob}(d; \theta, \eta_0) = \left\{\prod_{i=1}^{N} f_2(x_i; \theta, \eta_i)\right\} p(s \mid X_0) \left\{\prod_{i \in s} f_1(y_i \mid x_i; \theta)\right\}. \tag{5.14.14}$$
The nuisance parameter $\eta_i$ can vary here. In (5.14.14), for every fixed $\theta$ the class
of distributions of $X_0$ obtained for the possible variations of the nuisance
parameter is called a complete class of distributions. Following Godambe (1991),
an estimator $g = g(d, \theta)$ of the parameter $\theta$ such that $E(g) = g^*$ is optimal if
(5.14.15)
is a minimum for $g = g^*$. Then the optimal estimating function for estimating $\theta$,
based on data d, is
$$g^*(d, \theta) = \sum_{i \in s} \frac{\partial \log f_1(y_i \mid x_i; \theta)}{\partial \theta} \tag{5.14.16}$$
that is, the optimal estimate for $\theta$ is the solution of the equation
$$g^*(d, \theta) = 0 \tag{5.14.17}$$
which, in fact, is a special case of the general result of Godambe (1976). The
interesting result is that the estimate of $\theta$ is free from the sampling design
$p(\cdot \mid X_0)$, which, in fact, supports the result of Royall (1970a, 1970b, 1970c). The
estimates of parameters of interest in general can be obtained by solving the
maximum likelihood equations given by
$$\frac{\partial \log[\mathrm{Prob}(d; \theta, \eta_0)]}{\partial \theta} = 0, \tag{5.14.18}$$
and
$$\frac{\partial \log[\mathrm{Prob}(d; \theta, \eta_0)]}{\partial \eta_0} = 0. \tag{5.14.19}$$
If $(\hat{\theta}, \hat{\eta}_0)$ is the solution of the equations (5.14.18) and (5.14.19), then the estimate
of $\mu$ is given by
$$\hat{\mu} = \mu(\hat{\theta}, \hat{\eta}_0). \tag{5.14.20}$$
It is to be noted that the equations (5.14.18) and (5.14.19) are independent of the
sampling design $p(\cdot \mid X_0)$, and so are the estimates $\hat{\theta}$, $\hat{\eta}_0$ and $\hat{\mu}$. The estimating
equations (5.14.18) and (5.14.19), and hence the implied estimate $\hat{\mu}$ of $\mu$, are
dependent on the full model (5.14.12). Godambe raised the following interesting
question: Can this dependence of the estimate on the entire model be reduced by
some alternative procedure of estimation? This query is particularly meaningful in
that, as remarked earlier, the modelling of the design or auxiliary variate X must in
practice be very tentative. An alternative estimating procedure that utilizes only the
conditional distribution in (5.14.12), namely $f_1$, would be very desirable. It would
be more helpful if this alternative procedure depends only on some semiparametric
relationship underlying the conditional distribution $f_1$. Such an alternative
procedure is given below. The main concept here is that of the 'induced parameter'
defined by (5.14.10). Since the variates $y_i$, $i = 1, \ldots, N_0$, are i.i.d. with the unknown
mean $\mu$, the induced parameter is given by the solution $\mu_0$ of the equation
$$\sum_{i=1}^{N_0} (y_i - \mu_0) = 0, \tag{5.14.21}$$
that is
$$\mu_0 = \frac{1}{N_0}\sum_{i=1}^{N_0} y_i. \tag{5.14.22}$$
Assume that in the conditional distribution $f_1$, $\theta$ is a regression parameter and
$$E_m\{(y_i - a - \theta x_i) \mid x_i\} = 0, \quad i = 1, \ldots, N_0, \tag{5.14.23}$$
where a is a known constant. The theory of estimating functions given by
Godambe and Thompson (1986) can be seen to provide optimal estimation of the
induced parameter $\mu_0$ in (5.14.22) by using (5.14.23) and the sampling design
$p(\cdot \mid X_n)$. Note that this optimality is both conditional on holding the design
variable $X_n$ fixed and unconditional. It should be noted that the unconditional
optimality is important here, for it is in the unconditional distribution that
$$\mu_0 \to \mu \text{ as } N_0 \to \infty. \tag{5.14.24}$$
In other words, for survey populations with large size $N_0$, we have $\mu_0 \approx \mu$. This
also provides justification for using the estimate which is optimal or approximately
optimal for the induced parameter $\mu_0$ for the parent parameter $\mu$. The conditional
and unconditional optimal results are given below. Let
(5.14.25)
It is remarkable here that for given $(a, \theta, X_n)$, H is a function of $\mu_0$. Using the data
d in (5.14.13) and parameter $\theta$, we can obtain an optimal estimating function
$h^*(d, \theta)$ for H. For the given sampling design $p(\cdot \mid X_n)$, let the first order inclusion
probabilities be
$$\pi_i = \sum_{s \ni i} p(s \mid X_n), \quad i = 1, \ldots, N_0. \tag{5.14.26}$$
For a fixed $\theta$, let $h(d, \theta)$ denote an estimating function based on data d, which is
design unbiased for the function H because $E_p(h) = \sum_s h\, p(s \mid X_n) = H$. Let $E_m$ denote
the expectation with respect to the model (5.14.23) both conditionally on $X_n$ and
(5.14.27)
Following Godambe and Thompson (1986), one such optimum estimating function
is defined as
which is also called a design weighted optimum estimating function. Similarly, for a
fixed $\theta$, the optimal estimating function or estimate $\hat{\mu}_0$ for $\mu_0$ is given by the
solution of the equation $h^* - H = 0$ or equivalently
$$\hat{\mu}_0 = \frac{1}{N_0} h^* + \left(a + \frac{\theta}{N_0}\sum_{i=1}^{N_0} x_i\right). \tag{5.14.29}$$
It is clear that $\hat{\mu}_0$ in (5.14.29) depends upon the unknown parameter $\theta$. Let $\hat{\theta}$ be
an estimator of $\theta$ obtained by setting $g^*(\hat{\theta}) = 0$. Then an approximation to $\hat{\mu}_0$ is
given by
$$\tilde{\mu}_0 = \frac{1}{N_0} h^*(d, \hat{\theta}) + \left(a + \frac{\hat{\theta}}{N_0}\sum_{i=1}^{N_0} x_i\right). \tag{5.14.30}$$
Consider $y_i$, $i = 1, \ldots, N$, drawn from a superpopulation model with a parameter
$\theta$, possibly a vector. Assume that the model is such that $y_i$, $i = 1, \ldots, N$, are
independent and that for some specified real functions $\phi_i(y_i, \theta)$ of the indicated
variables
(5.14.1.1)
where $i = 1, \ldots, N$, and the choice of $\phi_i$ depends upon the problem of estimation under
consideration. Godambe (1995) considered the problem of estimation of the general
function, defined as
$h^* = \sum_{i \in s} \frac{\phi_i}{\pi_i}$ (5.14.1.3)
where $\pi_i = \sum_{s \ni i} p(s \mid X_0)$ denotes the first order inclusion probability. The function $h^*$
(5.14.1.4)
where $f_i(y_i)$ and $a_i(\theta)$ are special functions of the indicated variables. Then it
follows from the optimality of $h^*$ that for any given $\theta$, the optimal estimate of
(5.14.1.5)
is given by
(5.14.1.6)
(5.14.1.7)
This will be true if $\theta$ is replaced by its estimate. We list here a few assumptions
about the functions $f_i$ and $a_i$, which have important implications for estimation of
robustness. For the superpopulation model satisfying the relations $E_\theta\, \phi_i(y_i, \theta) = 0$ and
$\phi_i(y_i, \theta) = f_i(y_i) - a_i(\theta)$, there exists a survey population based estimate $\theta_N$, with the
following properties:
(a) $\theta_N$ is consistent for $\theta$;
(5.14.1.8)
Using the first order Taylor series expansion of $a_i(\theta)$ around the point $\theta_N$, under
assumption 2 and for large $N$, if $N^{-1} \sum_i a_i$ is of $O(1)$, then the optimal estimating
function $h$ becomes
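If $f_i(y_i) = y_i$, $i = 1, 2, \ldots, N$, then the problem of estimation of $\sum_{i \in \Omega} f_i(y_i)$ reduces to the
problem of estimation of the population total, $Y = \sum_{i \in \Omega} y_i$. Assume that $a_i(\theta) = \beta x_i$,
$i = 1, 2, \ldots, N$, such that $\sum_{i \in \Omega} a_i(\theta) = \beta \sum_{i \in \Omega} x_i = \beta X$ is known.
Let $[f_i(y_i) = y_i,\ a_i(\theta) = \beta x_i;\ i = 1, 2, \ldots, n]$ be the $n$ sampled observed values on the
study and auxiliary variable.
Then the estimator
$h = \sum_{i \in s} \frac{f_i(y_i) - a_i(\theta)}{\pi_i} + \sum_{i \in \Omega} a_i(\theta) = \sum_{i \in s} \frac{y_i - \beta x_i}{\pi_i} + \beta X = \sum_{i \in s} \frac{y_i}{\pi_i} + \beta \left( X - \sum_{i \in s} \frac{x_i}{\pi_i} \right)$ (5.14.2.1)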
estimator. Godambe (1995) has also discussed ratio and regression type estimators
under stratified random sampling and over different occasions. For details on
stratified random sampling and on sampling over different occasions, one can refer to
Chapter 8 and Chapter 10. Godambe's paradox and the ancillary principle have also
been discussed by Bhave (1987), and the resolution of the paradox is discussed by Godambe
(1987). Godambe and Thompson (1999) have shown that the theory of estimating
functions can also be used to construct confidence intervals for population
parameters in survey sampling. Godambe (1998) also considered the problem of
estimation of parameters in survey sampling.
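The last form of the estimator in (5.14.2.1), a design weighted total of $y$ plus $\beta$ times the gap between the known total $X$ and its design weighted estimate, can be evaluated with a few lines of Python. The following is only a minimal sketch: the function name, the data, and the value of $\beta$ are hypothetical, and $\beta$ is assumed known rather than estimated.

```python
def regression_type_estimate(y_s, x_s, pi_s, beta, x_total):
    """Evaluate sum(y_i/pi_i) + beta * (X - sum(x_i/pi_i)) over the sampled units."""
    ht_y = sum(y / p for y, p in zip(y_s, pi_s))  # design weighted total of y
    ht_x = sum(x / p for x, p in zip(x_s, pi_s))  # design weighted total of x
    return ht_y + beta * (x_total - ht_x)


# Hypothetical illustration: N = 5 units, SRSWOR sample of n = 3 units.
x_pop = [1.0, 2.0, 3.0, 4.0, 5.0]
x_total = sum(x_pop)        # known population total X
y_s = [10.0, 20.0, 40.0]    # sampled study values (here y = 10 x exactly)
x_s = [1.0, 2.0, 4.0]       # matching auxiliary values
pi_s = [3 / 5] * 3          # pi_i = n/N under SRSWOR
beta = 10.0                 # assumed known slope

estimate = regression_type_estimate(y_s, x_s, pi_s, beta, x_total)
```

Because the hypothetical study values are exactly proportional to the auxiliary values, the residuals $y_i - \beta x_i$ vanish and the estimate equals $\beta X$ whichever units are sampled.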
which leads to the following new estimator of variance of the Horvitz and Thompson
(1952) estimator of total in the Sen--Yates--Grundy (1953) form, given by
$v_2 = \frac{1}{2} \sum_{i \in s} \sum_{j(\neq i) \in s} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \Big/ \left[ \{N(N-1)\}^{-1} \sum_{i \in s} \sum_{j(\neq i) \in s} \frac{1}{\pi_{ij}} \right]$. (5.14.3.5)
Under PPSWR sampling, $\pi_i = n p_i$, $\pi_{ij} = n(n-1) p_i p_j$ and therefore (5.14.3.5) reduces
to
$v_2 = \frac{1}{2n} \sum_{i \in s} \sum_{j(\neq i) \in s} \left( \frac{y_i}{p_i} - \frac{y_j}{p_j} \right)^2 \Big/ \left[ \{N(N-1)\}^{-1} \sum_{i \in s} \sum_{j(\neq i) \in s} \frac{1}{p_i p_j} \right]$. (5.14.3.7)
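Following Singh (2000a) one can easily get an improved estimator of variance of
GREG given by
$v_2(\hat{Y}_{greg}) = \frac{1}{2} \sum_{i \in s} \sum_{j(\neq i) \in s} \Omega_{ij} (w_i e_i - w_j e_j)^2 \Big/ \left[ \{N(N-1)\}^{-1} \sum_{i \in s} \sum_{j(\neq i) \in s} d_{ij} \right]$, (5.14.3.8)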
where
•
Oij=dij0 ij ll
dij0 ijQy(diXi- d jX) [ (' ) 1 ( \2]
V X HT -2" .I . L dij0 ij d.x, -dj xj)
-- I I d ..0 ..Q.. (d.x.-d .x .f les}(;<i)es
2 i e s j(;<i)es IJ IJ IJ I I }}
and
                 $Y \le M_y$   $Y > M_y$   Totals
$X \le M_x$      $P_{11}$      $P_{21}$    $P_{\cdot 1}$
$X > M_x$        $P_{12}$      $P_{22}$    $P_{\cdot 2}$
Totals           $P_{1\cdot}$  $P_{2\cdot}$  1
Kuk and Mak (1989) considered three estimators $M_R$, $M_S$ and $M_P$ under the
names of ratio, stratified and position estimators of the median of $Y$, respectively, in
the presence of a known median of the variable $x$. It is interesting to note that the
variances of all three estimators are a function of $P_{11}$ for which $X \le M_x$ and
$Y \le M_y$ as defined above. The value of $P_{11}$ can be determined by
Singh (2000a) suggested that this procedure would be useful for estimating the
variances of the estimators of median proposed by Kuk and Mak (1989). Finally he
considered the problem of poststratification in the two dimensional plane. Let $y$
and $x$ be the two variables used for poststratifying the given data set. Let us define
$y_i = 1$ if $a \le y \le b$, and $0$ otherwise; $x_i = 1$ if $c \le x \le d$, and $0$ otherwise,
where $a$, $b$, $c$ and $d$ are predefined limits used for defining the stratum
boundaries. For example, the strata defined as
$\frac{1}{2} \sum_{i \in \Omega} \sum_{j \in \Omega} [I(a \le y_i \le b,\ c \le x_j \le d) - W] = 0$, (5.14.3.18)
where
$I(a \le y_i \le b,\ c \le x_j \le d) = 1$ if $a \le y_i \le b$, $c \le x_j \le d$, and $0$ otherwise,
contains the number in the sample with $a \le y_i \le b$ and $c \le x_j \le d$. In case of
poststratification, the sample size $n$ is a random variable; therefore an estimator of
(5.14.3.18) is given by
5.14.4 GODAMBE'S STRATEGY FOR LINEAR BAYES AND OPTIMAL ESTIMATION
Godambe (1999) has discussed very thoroughly the optimal estimating function in
linear Bayes form. He followed the linear Bayes methodology introduced by
Hartigan (1969) to handle Bayesian semi-parametric models based on fewer
moments. As in the case of non-Bayesian statistics, it is common practice to replace
a full distributional assumption by a much weaker assumption about its first few
moments such as mean and variance. In the Bayesian approach one may similarly
consider the replacement of a completely specified prior distribution by an
assumption about just a few moments of the distribution. Interestingly, Godambe
(1999) proposed an alternative methodology based on the theory of optimum
estimating functions and showed that his strategy is more readily applicable and
efficient in common problems than the linear Bayes methodology. Godambe (1999)
generalized the results of Godambe and Thompson (1989) to extend the theory of
optimum estimating functions to semi-parametric Bayesian models. Following
Godambe (1999), let $X = \{x\}$ be an abstract sample space, $P = \{p\}$ be a class of
probability distributions on $X$, $\theta = (\theta_1, \theta_2, \ldots, \theta_m)$ be a real valued $m$ dimensional
parameter of interest defined on $P$, and $\Omega = \{\theta(p) : p \in P\}$. Here we wish to estimate
$\theta$ based on sample information $X = \{x\}$ through elementary estimating functions.
The elementary estimating function $h_j$ is a real valued function defined on $X \times \Omega$
such that, under the distribution $p \in P$,
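$g^* = (g_1^*, \ldots, g_m^*)$ (5.14.4.4)
where $g_r^* = \sum_{j=1}^{k} h_j a_{jr}^*$ with $a_{jr}^* = E\{\partial h_j / \partial \theta_r \mid X_j\} / E\{h_j^2 \mid X_j\}$ exists for $r = 1, 2, \ldots, m$. One
may note here that the estimating function $g^*$ is also a member of the class $G$.
Denoting $h_j a_{jr}^*$ by $h_{jr}$, the elementary estimating functions $h_j$, $j = 1, 2, \ldots, k$, are said
to be mutually orthogonal if
$E(h_{jr} h_{j'r'} \mid X_j) = 0$ (5.14.4.5)
for $j \neq j'$; $j, j' = 1, 2, \ldots, k$; $r, r' = 1, 2, \ldots, m$. Corresponding to the elementary
estimating function $g$, define the two matrices as:
$J = E \begin{bmatrix} g_1^2 & g_1 g_2 & \cdots & g_1 g_m \\ g_2 g_1 & g_2^2 & \cdots & g_2 g_m \\ \vdots & \vdots & & \vdots \end{bmatrix} = \| E(g_r g_{r'}) \|$ and $H = \| E(\partial g_r / \partial \theta_{r'}) \|$.
Similarly for the elementary estimating function $g^*$ define the two matrices $J^*$ and
$H^*$ as follows
$J^* = E \begin{bmatrix} g_1^{*2} & g_1^* g_2^* & \cdots & g_1^* g_m^* \\ g_2^* g_1^* & g_2^{*2} & \cdots & g_2^* g_m^* \\ \vdots & \vdots & & \vdots \end{bmatrix}$ and $H^* = E \begin{bmatrix} \partial g_1^* / \partial \theta_1 & \partial g_1^* / \partial \theta_2 & \cdots & \partial g_1^* / \partial \theta_m \\ \partial g_2^* / \partial \theta_1 & \partial g_2^* / \partial \theta_2 & \cdots & \partial g_2^* / \partial \theta_m \\ \vdots & \vdots & & \vdots \end{bmatrix}$.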
Theorem 5.14.4.1. If the elementary estimating functions $h_j$, $j = 1, 2, \ldots, k$, are
'mutually orthogonal' in the class $G$, then the estimating function $g^*$ is optimal in
the sense that the matrix
is positive semi-definite for all $g \in G$, where $A^t$ and $A^-$ denote the transpose and
generalized inverse of the corresponding matrix $A$. An estimate of $\theta$ is obtained
by solving the equation $g^* = 0$ for the observed value of $x$. The details about this
theorem can be found in Godambe and Thompson (1989).
Theorem 5.14.4.2. Assume the interpretation of estimating function $g^*$, the class
$G$, the orthogonality and the matrices $J_1$ and $H_1$. Now if the elementary
estimating functions $h_j$, $j = 1, \ldots, k$, are mutually orthogonal, then in the class $G$,
the estimating function $g^*$ is optimal in the sense that the matrix
Following Godambe and Thompson (1987), the above equality shows the positive
semi-definiteness of the matrix $D^*$. Hence the theorem.
Again we may note here that the estimate of $\theta$ is obtained by solving the equation
$g^* = 0$ for given $x$. In case of the scalar parameter $\theta = \theta_1$ the optimality of the
+!, +1E[ :~ r
corresponding estimating function g * in the above theorem is equivalent to the
In this section we consider the problem of estimating a population total or mean
through a unified approach of survey sampling. The concepts of uniform
admissibility, hyper-admissibility, 'uniformly minimum variance unbiased
estimator' and the concept of sufficiency in survey sampling are discussed.
We first discuss possible definitions of best and admissible estimators of the population
total $Y = \sum_{i \in \Omega} y_i$ for a given sampling design $P$. Let $s$ be a sample of units drawn
according to a probability scheme $P : \{p(s)\}$ from the totality of possible
samples $S$, where $p(s) \ge 0$ for all $s \in S$ and $\sum_{s \in S} p(s) = 1$. Let $y = (y_1, y_2, \ldots, y_N)$, a
vector of variate values associated with the different units in a population of size $N$, be
an element of the Euclidean space $R^N$.
5.15.2 ESTIMATOR
An estimator $e_1 \in C(p)$ is strictly admissible if, for every other estimator $e \in C(p)$,
$MSE(e_1) < MSE(e)$ for at least one $y$.
$MSE(e_1) = MSE(e_2)$ for all $y$ and $MSE(e_1) \le MSE(e_i)$, $i = 3, 4$, for all $y$ with strict
inequalities holding for some of the $y$ values. In this case both $e_1$ and $e_2$ are
admissible, but neither of them is strictly admissible. Now if $e_1$ is excluded then $e_2$
$i = 1, 2, \ldots, N$ and $\sum_{s \ni i}$ denotes the sum over all samples which include the $i$th unit.
Let us first explain the meaning of this sum with the help of a numerical example.
Example 5.15.1. Consider a population of N = 5 units with values 10, 20, 30, 40 and 50.
Select all possible samples of size n = 3 units by using SRSWOR sampling. Find
the following:
(i) $\sum_{s \ni 2} y_{s2}$; (ii) $\beta_{s2}$ such that $\sum_{s \ni 2} \beta_{s2} p_s = 1$.
Solution. Let A = 10, B = 20, C = 30, D = 40 and E = 50 be the units in the
population. Here N = 5 and n = 3. Therefore the total number of possible without
replacement samples is $n(s) = {}^N C_n = {}^5 C_3 = 10$ and the sample space $S$
is given below:
Table 5.15.1. Sample space.
Sample    Units included     Values of      Totals
number    in the samples     the units
1         A, B, C            10, 20, 30     60
2         A, B, D            10, 20, 40     70
3         A, B, E            10, 20, 50     80
4         A, C, D            10, 30, 40     80
5         A, C, E            10, 30, 50     90
6         A, D, E            10, 40, 50     100
7         B, C, D            20, 30, 40     90
8         B, C, E            20, 30, 50     100
9         B, D, E            20, 40, 50     110
10        C, D, E            30, 40, 50     120
(i) In the sum $\sum_{s \ni 2} y_{s2}$, the summation runs over all possible samples that include
the second unit, B, and $y_{s2}$ denotes the total of such a sample. Clearly the unit B is included in 6
samples, with numbers 1, 2, 3, 7, 8 and 9 in the above table.
Thus we have
$\sum_{s \ni 2} y_{s2} = 60 + 70 + 80 + 90 + 100 + 110 = 510$.
(ii) To find $\beta_{s2}$ such that $\sum_{s \ni 2} \beta_{s2} p_s = 1$, note that under SRSWOR each of the 10
samples has probability $p_s = 1/10$.
Keep in mind that the second unit belongs to six samples as shown above in Table
5.15.1. In other words we are looking for six constants such that their sum is 10.
The following table shows that such a choice of constants is not unique unless we
give equal weight to all samples in the sample space.
Table 5.15.2. Choice of weights.
Sample no.      1          2          3          7          8          9          Sum
$\beta_{s2}$    5/3        5/3        5/3        5/3        5/3        5/3        10
$\beta_{s2}$    2          2          3          2          0.5        0.5        10
$\beta_{s2}$    1.176471   1.372549   1.568627   1.764706   1.960784   2.156863   10
Thus all these choices of $\beta_{s2}$ provide unbiased estimators of the population total or
mean under SRSWOR sampling. We shall see in the next theorem that the equal
weight row of Table 5.15.2 provides the correct choice for unbiasedness, that is
$\beta_{s2} = N/n = 5/3$.
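The arithmetic of Example 5.15.1 can be checked by brute force. The sketch below enumerates the SRSWOR sample space, sums the totals of the samples containing unit B, and confirms that each row of weights in Table 5.15.2 satisfies $\sum_{s \ni 2} \beta_{s2} p_s = 1$. The variable names are illustrative, and writing the third row as sample total divided by 51 is an observation from the tabulated decimals, not a rule stated in the text.

```python
from fractions import Fraction
from itertools import combinations

population = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50}
n = 3

# All SRSWOR samples of size 3, in the same order as Table 5.15.1.
samples = list(combinations(population, n))
p_s = Fraction(1, len(samples))  # each sample is equally likely

# Samples that include the second unit, B, and their totals.
with_b = [s for s in samples if "B" in s]
totals_with_b = [sum(population[u] for u in s) for s in with_b]
sum_of_totals = sum(totals_with_b)

# The three rows of weights from Table 5.15.2 (third row: total / 51).
weight_rows = {
    "equal": [Fraction(5, 3)] * 6,
    "unequal": [Fraction(2), Fraction(2), Fraction(3),
                Fraction(2), Fraction(1, 2), Fraction(1, 2)],
    "proportional": [Fraction(t, 51) for t in totals_with_b],
}
constraint = {name: sum(w * p_s for w in row) for name, row in weight_rows.items()}
```

Exact fractions are used so that the constraint can be compared with 1 without rounding error.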
Now let $B_p$ denote the class of all $\beta$ for which the corresponding estimates are
unbiased. Then one would usually define an estimator $e_{sb}$ as the best linear
estimate of the population total such that the variance is minimum, which is not
possible. We have the following theorem:
Theorem 5.15.2. Prove that there does not exist any $\beta_0 \in B_p$ such that the variance of
the estimator $e(s, \beta_0)$ is always less than or equal to that of all other estimators $e(s, \beta)$.
Proof. If $e_s$ is unbiased then
$V(e_s) = \sum_{s \in S} e_s^2 p_s - Y^2$.
The Lagrange function is given by
$\pi_i = \sum_{s \ni i} p_s$ denoting the first order inclusion probability. Hence if $\beta_0$ exists it must
minimum. The choice of $\beta_0 = \beta_{s0} = 1 / \sum_{s \ni i} p_s = d_i$ holds but does not provide minimum
variance. Hence an optimum $\beta_0$ does not exist in the class of linear unbiased
estimators.
Thus Godambe (1955) has proven the non-existence of the uniformly best estimator
in the class of homogeneous linear unbiased estimators of the population total.
Godambe and Joshi (1965) removed the linearity restriction and proved
the non-existence of the best estimator in the entire unbiased class of estimators.
Murthy and Singh (1969) extended this non-existence result to wider classes
of estimators of the population total, and we have the following theorems:
Proof. Note that the estimator $e_1$ is admissible if, for any other estimator
$e \in C(p)$, either $MSE(e_1) = MSE(e)$ for all $y$ or $MSE(e_1) < MSE(e)$ for at least one $y$
holds. Further, the estimator $e_1$ is not at least as good as $e_2$; therefore
$MSE(e_1) > MSE(e_2)$ for at least one $y$. Hence the theorem by contradiction.
Theorem 5.15.4. If there exist two (or more) admissible estimators in a class $C(p)$,
with unequal MSEs for at least one $y$ for a given design $P$, then that class does
not contain a best estimator.
Proof. Let $e_1$ and $e_2$ be two admissible estimators in $C(p)$. Then the estimator $e_1$
is not at least as good as $e_2$ and $e_2$ is not at least as good as $e_1$ if either
$MSE(e_1) = MSE(e_2)$ for all $y$ or $MSE(e_1) \le MSE(e_2)$ for all $y$ with strict inequality for at
least one $y$. Clearly the relation $MSE(e_1) = MSE(e_2)$ does not hold if the relation
$MSE(e_1) \le MSE(e_2)$ holds. That is, neither $e_1$ nor $e_2$ is a best estimator. Hence the
theorem.
Let $y = (y_1, y_2, \ldots, y_N)$ be a vector of variate values associated with the different units in
a population of size $N$ and be an element of the Euclidean space $R^N$. Then a linear
estimator $e_b$ is a function on $S \times R^N$ such that
$e_b(s, y) = \sum_{i \in \Omega} b(s, i) y_i$ where $b(s, i) = b_{si}$ if $i \in s$, and $b(s, i) = 0$ if $i \notin s$,
for all $y \in R^N$.
Theorem 5.15.5. For any given design $P$, any constant is strictly admissible in the
class $L$ of linear estimators of the population total, except in trivial cases.
Proof. For a design $P$ consider an estimator $e_0$ as $e_0 = \delta$, for all $y$, where $\delta$ is a
constant not equal to zero. Let $e_1 \in L$ be any other estimator such that
$e_1 = a_s + \sum_{i \in s} b_{si} y_i$.
Suppose $Y(\delta)$ is the set of vectors in $R^N$ for which the population total $Y = \delta$ for
all $y \in Y(\delta)$. Clearly the mean squared error of the constant estimator $e_0$ is given by
Theorem 5.15.6. There does not exist a best estimator, and hence the best and the
uniformly best estimator, in the class of linear ($L$), linear unbiased ($L_u$),
homogeneous linear ($L_h$), homogeneous linear unbiased ($L_{hu}$), all (both linear and
non-linear) unbiased ($A_u$) and all ($A$) estimators of the population total for any
sampling design $P$.
Proof. In the previous theorem, we have seen that any constant belongs to the class
of linear estimators ($L$) and that it is strictly admissible in $L$; hence there does not
exist any best estimator in this class. The proof also follows from the fact that if there
are at least two strictly admissible estimators in $C(p)$ then a best estimator does not
exist in that class of estimators.
Godambe and Joshi (1965) have shown that the Horvitz and Thompson (1952)
estimator of the population total defined as
$e_{ht} = \sum_{i \in s} d_i y_i$,
where $d_i = 1/\pi_i$ and $\pi_i = \sum_{s \ni i} p_s$, is admissible in the class of all unbiased estimators $A_u$ and hence
where $X = \sum_{i \in \Omega} x_i$. We know that if there exist two (or more) admissible estimators
in a class $C(p)$, with unequal MSEs for at least one $y$, for a given design $P$, then
that class does not contain a best estimator. This shows the non-existence of a best
estimator in the class of all estimators ($A$). Hence the theorem.
Let $\Omega = \{U_1, U_2, \ldots, U_N\}$ denote a finite population. With each unit $U_i$, $i = 1, 2, \ldots, N$, is
associated a variate value $y_i$. Let $y = (y_1, y_2, \ldots, y_N)$ denote a point in the $N$
dimensional Euclidean space $R^N$. Let $s$ denote any non-empty subset of $\Omega$ and let $S$
denote the set of all possible samples $s$. Then a sampling design is defined by
attaching a selection probability $p(s)$ to each $s \in S$ so that $p(s) \ge 0$, $\sum_{s \in S} p(s) = 1$. The
size $n(s)$ of a sample $s$ denotes the number of units $U_i$ included in $s$. Let
$\pi_i = \sum_{s \ni i} p(s)$ and $\pi_{ij} = \sum_{s \ni i,j} p(s)$ denote the positive first and second order inclusion
probabilities.
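The definitions of $\pi_i$ and $\pi_{ij}$ as sums of $p(s)$ over the samples containing the unit, or the pair of units, translate directly into code. The sketch below is illustrative only: the helper name and the choice of an SRSWOR design with N = 5 and n = 3 are assumptions made for the example, and any design supplied as a mapping from samples to probabilities would work the same way.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_probabilities(design, units):
    """Return first and second order inclusion probabilities of a design.

    `design` maps each sample (a frozenset of unit labels) to p(s).
    """
    first = {i: sum(p for s, p in design.items() if i in s) for i in units}
    second = {
        (i, j): sum(p for s, p in design.items() if i in s and j in s)
        for i, j in combinations(units, 2)
    }
    return first, second


# Illustrative design: SRSWOR of n = 3 units from N = 5 units.
units = [1, 2, 3, 4, 5]
all_samples = [frozenset(c) for c in combinations(units, 3)]
srswor = {s: Fraction(1, len(all_samples)) for s in all_samples}

pi, pi2 = inclusion_probabilities(srswor, units)
```

For this design every $\pi_i$ equals $n/N = 3/5$ and every $\pi_{ij}$ equals $n(n-1)/\{N(N-1)\} = 3/10$, and the $\pi_i$ sum to the fixed sample size.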
The Horvitz and Thompson (1952) estimator of the population total, $Y$, defined as
$e_{ht}(s, y) = \sum_{i \in s} d_i y_i$
with $d_i = 1/\pi_i$, is unbiased if and only if the first order inclusion probabilities are known and
positive. Then the first form of the variance of the Horvitz and Thompson estimator, which is valid under any kind of
sampling design, is given by
$V(y) = \sum_{i \in \Omega} (d_i - 1) y_i^2 + \sum_{i \in \Omega} \sum_{j(\neq i) \in \Omega} (d_i d_j \pi_{ij} - 1) y_i y_j$
which we can refer to as a class of unbiased estimators of variance.
On the other hand, for a fixed sample size design the Sen--Yates--Grundy form of
variance is given by
$V_{SYG}(y) = \frac{1}{2} \sum_{i \in \Omega} \sum_{j(\neq i) \in \Omega} (\pi_i \pi_j - \pi_{ij}) (d_i y_i - d_j y_j)^2$.
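For a fixed sample size design the two forms of the variance must agree with each other and with the variance obtained directly from the sampling distribution of the Horvitz and Thompson estimator. The sketch below checks this numerically for an illustrative SRSWOR design with N = 5 and n = 3; the population values and the design are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import combinations

y = [10, 20, 30, 40, 50]  # illustrative population values
N, n = len(y), 3
samples = list(combinations(range(N), n))
p = Fraction(1, len(samples))  # SRSWOR: equal probability for every sample

# First and second order inclusion probabilities by enumeration.
pi = [sum(p for s in samples if i in s) for i in range(N)]
pij = [[sum(p for s in samples if i in s and j in s) for j in range(N)]
       for i in range(N)]
d = [1 / q for q in pi]  # design weights d_i = 1/pi_i

# Variance straight from the sampling distribution of the HT estimator.
Y = sum(y)
ht = [sum(d[i] * y[i] for i in s) for s in samples]
var_direct = sum(p * (e - Y) ** 2 for e in ht)

pairs = [(i, j) for i in range(N) for j in range(N) if i != j]

# First form: sum (d_i - 1) y_i^2 + sum sum (d_i d_j pi_ij - 1) y_i y_j.
var_first = (sum((d[i] - 1) * y[i] ** 2 for i in range(N))
             + sum((d[i] * d[j] * pij[i][j] - 1) * y[i] * y[j] for i, j in pairs))

# Sen--Yates--Grundy form: (1/2) sum sum (pi_i pi_j - pi_ij)(d_i y_i - d_j y_j)^2.
var_syg = Fraction(1, 2) * sum(
    (pi[i] * pi[j] - pij[i][j]) * (d[i] * y[i] - d[j] * y[j]) ** 2
    for i, j in pairs
)
```

All three quantities equal 2500/3 here, which is also the familiar SRSWOR value $N^2 (1 - n/N) S_y^2 / n$ with $S_y^2 = 250$.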
Assuming that the second order inclusion probabilities are known and positive, an
unbiased estimator of $V(y) = V_{SYG}(y)$ based on a fixed sample size design is given by
Theorem 5.15.6.1. For a fixed sampling design of size two the Sen--Yates--Grundy
estimator of variance is admissible in the entire unbiased class of estimators of
variance.
Proof. Suppose the theorem is not true. In this situation, there exists an unbiased and
admissible estimator $v_u\{e_{ht}(s, y)\}$ such that
$\sum_{s \in S} p(s) [v_u\{e_{ht}(s, y)\} - V(y)]^2 \le \sum_{s \in S} p(s) [v_{syg}\{e_{ht}(s, y)\} - V(y)]^2$
for all $y \in R^N$, with strict inequality for at least one $y \in R^N$. Suppose the new
unbiased estimator has a relationship with the Sen--Yates--Grundy estimator of
variance shown by
$v_u\{e_{ht}(s, y)\} = v_{syg}\{e_{ht}(s, y)\} + H(s, y)$
where $H(s, y)$ is any function of sample values.
Then we have
$\sum_{s \in S} p(s) [v_{syg}\{e_{ht}(s, y)\} - V(y) + H(s, y)]^2 \le \sum_{s \in S} p(s) [v_{syg}\{e_{ht}(s, y)\} - V(y)]^2$
or
$\sum_{s \in S} p(s) \{H(s, y)\}^2 \le -2 \sum_{s \in S} p(s) H(s, y) [v_{syg}\{e_{ht}(s, y)\} - V(y)]$.
Considering $y_i \propto \pi_i$, one can easily see that the above inequality does not hold,
which contradicts our assumption. Hence the theorem.
method is not applicable . They discussed the admissibility (within the class of linear
unbiased estimators) of Murthy's estimators with an effective sample of two units.
$V_i = \sum_{s \ni i} b^2(s, i) p(s) - 1$.
Let $k_1, k_2, \ldots, k_N$ be non-zero constants such that $\sum_{i \in \Omega} k_i = 1$ and let $q_1, q_2, \ldots, q_N$ be
positive numbers such that the weighted variance $\sum_{i \in \Omega} q_i V_i$ is minimum subject to
conditions of unbiasedness and zero variance at a given point $(k_1, k_2, \ldots, k_N)$ defined
as
$\sum_{i \in s} b(s, i) k_i = 1$, $s \in S$.
Consider a Lagrange function defined as
$L = \sum_{i \in \Omega} q_i V_i - 2 \sum_{i \in \Omega} \lambda_i \sum_{s \ni i} b(s, i) p(s) - 2 \sum_{s \in S} \mu_s \sum_{i \in s} b(s, i) k_i$
or
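$\sum_{i \in s} \xi_i k_i + \alpha_s \sum_{i \in s} \eta_i k_i = 1$,
which implies that
$\alpha_s = \frac{1}{\delta(s)} \left( 1 - \sum_{i \in s} \xi_i k_i \right)$.
Also we have
$\sum_{s \ni i} b(s, i) p(s) = 1$,
which implies that
$\sum_{s \ni i} (\xi_i + \alpha_s \eta_i) p(s) = 1$
or
$\sum_{s \ni i} \xi_i p(s) + \sum_{s \ni i} \eta_i \alpha_s p(s) = 1$
or
$\xi_i \pi_i + \eta_i \sum_{s \ni i} \frac{p(s)}{\delta(s)} \left( 1 - \sum_{j \in s} \xi_j k_j \right) = 1$
or
$\xi_i \pi_i - \eta_i \sum_{s \ni i} \frac{p(s)}{\delta(s)} \sum_{j \in s} \xi_j k_j = 1 - \eta_i \sum_{s \ni i} \frac{p(s)}{\delta(s)}$ for $i = 1, 2, \ldots, N$.
Remember that $k_i$ and $q_i$ are known, and hence the $\eta_i$ and $\delta(s)$ are known. For a
given design, $p(s)$ and $\pi_i$ are also known. Thus the only unknown quantities are the
values of $\xi_i$, which are, in fact, functions of the Lagrange multipliers. The above
system of equations can be written as
$1 - \eta_1 \sum_{s \ni 1} \frac{p(s)}{\delta(s)}$
$1 - \eta_2 \sum_{s \ni 2} \frac{p(s)}{\delta(s)}$
Clearly a solution $(\xi_1, \xi_2, \ldots, \xi_N)$ to the above system of equations can be obtained
to find the weights $b(s, i)$ subject to the condition of unbiasedness and admissibility.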
Consider a situation where the sample consists of only two units, that is, $s = \{i, j\}$.
Now if we take $q_i = k_i^2 / (1 - p_i)$, $\eta_i = (1 - p_i) / k_i$ and $\eta_i k_i = (1 - p_i)$, then
$\delta(s) = 2 - p_i - p_j$ and the weights $b(s, i)$ become
Hanurav (1966) starts with the basic concepts in sampling theory for finite
populations and considers a fundamental problem of optimum estimation
procedures to estimate the population total. For unicluster designs Hanurav (1966)
has shown that any estimator $e_0$ in $M^*(P)$, where $M^*(P)$ is the class of all
polynomial unbiased estimators of total, is admissible in $M^*(P)$ if and only if
where $Y_{HT}(P)$ denotes the Horvitz and Thompson (1952) type of estimator under
design $P$ and the values of $g_s$ are constants independent of the study variable; the
following condition is satisfied: $\sum_{s \in S} g_s p_s = 0$. A given parametric function $g(y)$ is
Theorem 5.15.7.1. A set of necessary and sufficient conditions for the estimability
of the quadratic parametric function
$Q = l_0 + \sum_{i \in \Omega} l_i y_i + \sum_{i \in \Omega} q_{ii} y_i^2 + \sum_{i \in \Omega} \sum_{j(\neq i) \in \Omega} q_{ij} y_i y_j$
in a design $P$ is given by
(i) $\pi_i > 0$ if $l_i^2 + q_{ii}^2 > 0$, (ii) $\pi_{ij} > 0$ if $q_{ij} + q_{ji} \neq 0$.
respect to any design $P$ such that for any two samples $s_1$ and $s_2$ the condition
$p_{s_1} > 0$, $p_{s_2} > 0$ gets violated, then the estimator $e^* = E(e \mid (s, y))$ is also unbiased
for $g(y)$ and $V(e^*) \le V(e)$ for all $y \in R^N$, with strict inequality occurring at least once. The
above theorem basically restates the well known Basu (1958) result.
While looking for an optimal estimator in survey sampling, we have to keep the
following criteria in mind:
(i) Bayesian approach; (ii) Linear invariance;
(iii) Regular estimators; (iv) Hyper-admissibility.
(i) Bayesian approach: This approach makes use of prior information about the
known distribution of the study variable. In this case we generally prefer to
minimize a loss function in place of variance or mean squared error.
(ii) Linear invariance: Roy and Chakravorty (1960) introduced the concept that
an estimator should remain invariant under linear transformation of the study
variable $y$.
(iii) Regular estimator: An estimator $e$ is said to be a regular estimator if
$V(e) = k \sigma^2$, where $k$ is a constant and $\sigma^2$ is a finite population variance.
For example, for any design $P$ which is not a unicluster design the class of
polynomial unbiased estimators, $M^*(P)$, of the population total $Y$ admits just one
estimator which is hyper-admissible. This optimum estimator is the Horvitz
and Thompson (1952) estimator. A class of hyper-admissible estimators for
unicluster designs is given by $e_0 = g_s + Y_{HT}(P)$ as defined earlier.
Then the statistic $e = e_s(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{n} y_i$ follows a binomial distribution with
parameters $n$ and $p$, and
population with normal density $N(\mu, \sigma^2)$. Then the statistic $e_s = \sum_{i=1}^{n} y_i$ is sufficient
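$\sum_{i=1}^{n} y_i^2$ is sufficient for $\sigma^2$ because the likelihood
$L = \prod_{i=1}^{n} f(y_i, \theta) = \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left[ -\frac{1}{2\sigma^2} \left( \sum_{i=1}^{n} y_i^2 - 2\mu \sum_{i=1}^{n} y_i + n\mu^2 \right) \right] = g_\theta(e(y))\, h(y)$,
where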
( i ) The conditional distribution of a sample when given the estimator e does not
depend on the parameter.
( ii ) The distribution of a sample can be reconstructed from that of the estimator e
through randomisation or mathematically using a stochastic kernel.
( iii) For every decision problem, given a decision function based on the sample ,
there exists a decision function based on e which is at least as good as the former.
( iv ) For any prior distribution of the parameter, the posterior is a function of the
sample through e .
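Definition (i) can be illustrated numerically for independent 0/1 observations, where the sample total is the binomial statistic mentioned earlier. The sketch below is a small illustration with assumed values of $n$ and $p$ and a hypothetical helper name: it computes the conditional probability of every sample given its total and shows that the result is the same for two different values of $p$.

```python
from fractions import Fraction
from itertools import product


def conditional_given_total(n, p):
    """Return P(sample | sample total) for every 0/1 sample of size n."""
    joint = {
        s: p ** sum(s) * (1 - p) ** (n - sum(s))
        for s in product((0, 1), repeat=n)
    }
    total_prob = {}
    for s, prob in joint.items():
        total_prob[sum(s)] = total_prob.get(sum(s), 0) + prob
    return {s: prob / total_prob[sum(s)] for s, prob in joint.items()}


# Two different parameter values give identical conditional distributions.
cond_a = conditional_given_total(3, Fraction(1, 4))
cond_b = conditional_given_total(3, Fraction(2, 3))
```

Given a total of $k$, each of the $\binom{n}{k}$ arrangements has conditional probability $1/\binom{n}{k}$, which is free of $p$; this is what sufficiency of the total means under the first definition.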
The first definition is due to Fisher (1920, 1922), the second and third are due to
Blackwell (1951) and the fourth is due to Kolmogorov (1942). Based on the first
definition of sufficiency, Halmos and Perlman (1974), and Bahadur (1954)
introduced its new definition as follows.
Further, Yamada and Morimoto (1992) have developed relationships between the above
definitions of sufficiency in a more descriptive way. Readers may consult the
lecture notes of Ghosh and Pathak (1992) for a better understanding of
sufficiency in survey sampling.
Following Tillé (1998) let $\eta = \eta(x_i, i \in s)$ be a statistic based on the auxiliary
information. Since the population is finite the statistic $\eta$ takes a finite number of
possible values denoted $\{\eta_1, \eta_2, \ldots, \eta_l\}$.
Define an indicator variable
$I_i = 1$ if $i \in s$, and $0$ otherwise. (5.16.1)
(5.16.3)
$I(\pi_{i|\eta} = 0) = 1$ if $\pi_{i|\eta} = 0$, and $0$ if $\pi_{i|\eta} > 0$.
$B[\bar{y}_{scw}] = E(\bar{y}_{scw} \mid \eta) - \bar{Y} = E\left[ \frac{1}{N} \sum_{i \in s} \frac{y_i}{\pi_{i|\eta}} \,\Big|\, \eta \right] - \bar{Y} = E\left[ \frac{1}{N} \sum_{i \in \Omega,\ \pi_{i|\eta} > 0} \frac{y_i I_i}{\pi_{i|\eta}} \,\Big|\, \eta \right] - \bar{Y}$
$= \frac{1}{N} \sum_{i \in \Omega,\ \pi_{i|\eta} > 0} \frac{y_i E(I_i \mid \eta)}{\pi_{i|\eta}} - \frac{1}{N} \sum_{i \in \Omega} y_i = -\frac{1}{N} \sum_{i \in \Omega} y_i\, I[\pi_{i|\eta} = 0]$.
where $h_i = E[I(\pi_{i|\eta} > 0)] = \Pr(\pi_{i|\eta} > 0)$ is an unbiased estimator of population mean.
Proof. The bias in the estimator $\bar{y}_{ccw}$ is given by
Note that Rao (1985) also discussed conditional inference in survey sampling.
Rao (1996) discussed some current topics in survey sampling at the Golden Jubilee
conference of the Indian Society of Agricultural Statistics, New Delhi. In particular,
inferential issues were studied and the advantages of a conditional design based
approach were demonstrated. Practically useful estimators for dual frame surveys
were presented. The jackknife method was shown to provide a unified, but
computer intensive, approach to variance estimation and analysis of survey data.
Finally, small area estimation was considered and model based indirect estimators
that borrow information from related small areas were introduced. Moors, Smeets,
and Boekema (1998) have considered an interesting problem of sampling with
probabilities proportional to the variable of interest. Brewer (1994) has discussed
the past and present prospects in survey sampling inference. Rao (1999a,
1999b) has reviewed some current trends in sample
survey theory and methods. He provides a brief discussion of developments in
survey design and data collection and processing, issues related to inference from
survey data, re-sampling methods for analysis of survey data, and small area
estimation. Following him, the principal steps in a sample survey are:
(a) Survey design,
(b) Data collection and processing,
(c) Estimation and analysis of data.
We would like to discuss these issues briefly here. They are covered in more depth
by Rao (1999a, 1999b).
Researchers have paid much attention to sampling errors, and have developed
numerous methods for optimal allocation of resources to minimize the sampling
variance associated with the estimators of total or mean. Much less attention has
been paid to reducing the total survey error arising from both sampling and
non-sampling errors. Many researchers, including Fellegi and Sunter (1974), Linacare
and Trewin (1993), and Smith (1995), have emphasized the need for a total survey
design approach in which resources are allocated to those sources of error where
error reduction is most effective, thus resulting in superior survey designs. Linacare
and Trewin (1993) applied this approach to the design of the Construction Industry
Surveys. Smith (1995) proposed the sum of component MSEs, rather than the MSE
of the estimated total, as a measure of the total error which may be written as the
sum of the errors from different sources.
While estimating the population mean or
total, it is customary to study the effect of measurement errors. It has been found
that the usual estimators are design unbiased and consistent under the assumption of
zero mean measurement errors. Following Mahalanobis (1946), the traditional
variance estimators remain valid provided the sample is in the form of
interpenetrating sub-samples. It is interesting to note that this useful feature no
longer holds in the case of distribution functions, quantiles, and some other
complex parameters, as shown by Fuller (1995). For such parameters the usual estimators are biased and
inconsistent and thus can lead to erroneous inferences. Fuller (1995) obtained bias
adjusted estimators under the assumption of independent and normally distributed
errors. Eltinge (1999) extended Fuller's results to the case of non-normal errors
using small standard deviation approximations. Singh, Gambino, and Mantel
(1994) illustrated the use of compromise allocation to redesign the Canadian
Labour Force Survey. As mentioned by Rao (1999a, 1999b), the interpenetrating
sub-samples provide a valid estimate of the total variance of an estimated total in
the presence of measurement errors, but such designs are not often used, at least in
North America, due to cost and operational considerations. Hartley and Rao (1968)
and Hartley and Biemer (1978) provided interviewer and coder assignment conditions
that permit the estimation of total variance and its components, such as sampling,
interviewer, and coder variances, directly from stratified multistage surveys that
satisfy the estimability conditions. Groves (1996) noted that the customary one way
random effect interviewer variance model may be unrealistic because it fails to
reflect 'fixed' effects of interviewer attributes such as race, age, and gender that can
affect the responses.
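The variance estimate available from interpenetrating sub-samples can be sketched in a few lines. The code below assumes, as an illustration rather than as the procedure of any author cited above, that $k$ independent sub-samples each yield an estimate of the same total; the combined estimate is their mean and its variance is estimated by the sample variance of the sub-sample estimates divided by $k$. The function name and the numbers are hypothetical.

```python
from statistics import mean, variance


def combine_subsample_estimates(estimates):
    """Combine k independent sub-sample estimates of one total.

    Returns the mean of the estimates and the usual replicate-based
    estimate of its variance, sum((t_j - t_bar)^2) / (k * (k - 1)).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("at least two sub-samples are needed")
    combined = mean(estimates)
    var_hat = variance(estimates) / k  # variance() divides by k - 1
    return combined, var_hat


# Hypothetical estimates of a total from four interpenetrating sub-samples.
subsample_totals = [980.0, 1010.0, 1025.0, 985.0]
total_hat, var_hat = combine_subsample_estimates(subsample_totals)
```

Because each sub-sample estimate carries its own sampling and measurement error, the spread among them reflects both sources, which is why no separate error model is needed in this sketch.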
methods of collecting data like random digit dialing (RDD), which provides
coverage of both listed and unlisted telephone households. The two stage
Mitofsky--Waksberg technique and its refinements are designed to increase the proportion of
eligible numbers in the sample and thus reduce data collection cost, following
Casady and Lepkowski (1993). Following Groves and Lepkowski (1986), the dual
frame approaches are also useful in obtaining more efficient estimates by
combining a sample selected from an incomplete directory list frame with another
sample selected by random digit dialing. Many large scale surveys, especially
surveys on family expenditure and health, use long questionnaires for data
collection. Such surveys can lead to high rates of non-response and decreases in the
quality of response, but this problem may be reduced by splitting the long
questionnaire into two or more parts. For example, Wretman (1995) splits his long
questionnaire into five non-overlapping parts. For surveys dealing with sensitive
questions, the quality of responses and response rates might depend on the ordering
of the questions in the list. It is also important to keep in mind that data on sensitive
characters or variables can also be collected through a Randomized Response
Technique, which makes use of a device to collect indirect answers to the questions
in the surveys. Data collected from a survey or census by any one of the above
methods need editing. The purpose of editing is to see which records are
unacceptable, outliers, or missing. Imputation of the missing records and
proper treatment of the outliers or unacceptable values are then most important, with care
taken to ensure that the editing procedure changes as few values as possible. For
example, Fellegi and Holt (1976) developed a method for automatic editing of
survey data with the help of computers under certain assumptions.
The basic idea of the inference is to obtain an estimator $\hat{Y}$, its standard error $s(\hat{Y})$
or coefficient of variation,
$c(\hat{Y}) = s(\hat{Y}) / \hat{Y}$,
and associated normal theory intervals
$\hat{Y} \pm z_{\alpha/2}\, s(\hat{Y})$
on $Y$ from the sampled data for large sample size $n$. As we noted in this chapter,
there are three approaches:
(a) Design based approach;
(b) Model based approach;
and
(c) Model assisted approach.
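The estimate, standard error, coefficient of variation and normal theory interval described above can be computed together. The sketch below takes the design based route under an assumed SRSWOR design, so the expansion estimator $N\bar{y}$ and its usual variance estimator $N^2 (1 - n/N) s_y^2 / n$ are used; the function name and the data are hypothetical, and the sample is kept small only for readability, whereas the interval is intended for large $n$.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def total_with_interval(sample, N, alpha=0.05):
    """Estimate a population total under SRSWOR with a normal theory interval.

    Returns (estimate, standard error, coefficient of variation, interval).
    """
    n = len(sample)
    y_hat = N * mean(sample)                          # expansion estimator
    se = N * sqrt((1 - n / N) * variance(sample) / n)  # estimated standard error
    cv = se / y_hat                                    # coefficient of variation
    z = NormalDist().inv_cdf(1 - alpha / 2)            # z_{alpha/2}
    return y_hat, se, cv, (y_hat - z * se, y_hat + z * se)


# Hypothetical SRSWOR sample of n = 5 values from a population of N = 100.
sample = [12.0, 15.0, 11.0, 14.0, 13.0]
y_hat, se, cv, (lower, upper) = total_with_interval(sample, N=100)
```

A model based or model assisted analysis would replace the estimator and the variance formula, but the interval would be assembled from $\hat{Y}$ and $s(\hat{Y})$ in the same way.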
Chapter 5: Use of auxiliary information: PPSWOR Sampling 497
The traditional work in sampling, including the Horvitz and Thompson estimator,
comes under the design based approach. The work related to Royall (1970a, 1970b, 1970c)
and Brewer (1963a) comes under the model based approach. The contribution
related to Sarndal, Swensson, and Wretman (1991) comes under the concept of the
model assisted approach. Following Rao (1999), the methods of re-sampling and
small area estimation, which we shall discuss in subsequent chapters, are also among
the current topics in survey sampling.
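The large-sample inference described above (estimate, standard error, coefficient of variation, and normal theory interval) can be sketched in a few lines. This is only an illustration: the function name and the numerical inputs are hypothetical, and the estimate and its standard error are assumed to have been obtained already from whichever of the three approaches is in use.

```python
from statistics import NormalDist


def normal_theory_interval(estimate, std_error, alpha=0.05):
    """Return (cv, lower, upper) for a large-sample normal theory interval.

    estimate  : value of the estimator of the total (hypothetical input)
    std_error : its estimated standard error (hypothetical input)
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    cv = std_error / estimate                # coefficient of variation
    return cv, estimate - z * std_error, estimate + z * std_error


# Illustrative numbers only; they do not come from any population in the text.
cv, lower, upper = normal_theory_interval(estimate=1000.0, std_error=50.0)
print(f"cv = {cv:.3f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

The interval is only as good as the normal approximation, which is why the text restricts it to large $n$.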
5.18 MISCELLANEOUS DISCUSSIONS/TOPICS
In this section we introduce some topics which exist in the literature and may be
useful for the research oriented readers.
Generalized $\pi ps$ ($G\pi PS$) designs were defined by Rao (1972). The term IPPS stands for
Inclusion Probabilities Proportional to Size. Working with a general
superpopulation model B(g), the strategy consisting of the $G\pi PS$ design together with
the associated HT estimator of population total has proved to be better than two
other well known strategies of Rao (1971, 1972). Following Rao (1972), if a design
is such that $\pi_i \propto x_i^{g/2}$ $(i = 1,2,\ldots,N)$ and $\sum_{i \in s} x_i^{1-(g/2)} = k$, a constant for any sample $s$
with $p_s > 0$, then that design is called a generalized $\pi ps$ design. For $g = 2$, the
$G\pi PS$ design is a $\pi ps$ design with fixed sample size. Ramachandran (1982) has
shown the g(B) optimality of the strategy consisting of the $G\pi PS$ design together with
the associated HT estimator in the entire class of design based unbiased strategies of
the population total with expected sample size fixed. Ramachandran generalized the
results of Rao (1971, 1972) by following Godambe and Joshi (1965). Pedgaonkar
and Prabhu-Ajgaonkar (1978) have shown that the $G\pi PS$ strategy is better than
the RHC strategy with fixed sample size. McLeod and Bellhouse (1983) gave a
simple and useful algorithm for drawing a simple random sample without
replacement in a single pass through a sampling frame for a finite population whose
size is unknown. Richardson (1989) extended their method to probability
proportional to size sampling. Korwar (1996) provides a method, which is an
adaptation of the method of McLeod and Bellhouse for the simple random sample
without replacement, for drawing a sample with probability proportional to
aggregate size in a single pass through a sampling frame of a finite population
whose size is unknown. Some discussion on the question of availability of a unique
best estimator in the Horvitz and Thompson (1952) class of estimators has been
given by Chaudhuri (1975a, 1975b).
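The single-pass idea mentioned above can be illustrated with the standard reservoir scheme. The sketch below is in the spirit of the McLeod and Bellhouse (1983) algorithm but is not claimed to reproduce their paper line by line; the function name `single_pass_srswor` and the toy frame are assumptions made for illustration.

```python
import random


def single_pass_srswor(stream, n, rng=random):
    """Draw an SRSWOR sample of size n in one pass over `stream`.

    The frame size need not be known in advance.  The first n units fill
    the sample; the k-th unit (k > n) enters with probability n/k and then
    replaces a uniformly chosen member of the current sample.
    """
    sample = []
    for k, unit in enumerate(stream, start=1):
        if k <= n:
            sample.append(unit)
        elif rng.random() < n / k:
            sample[rng.randrange(n)] = unit
    return sample


# Toy frame of ten labels; the generator hides the frame size from the sampler.
rng = random.Random(2024)
print(single_pass_srswor((label for label in range(10)), n=3, rng=rng))
```

At every stage $k \ge n$ each of the first $k$ units is in the sample with probability $n/k$, which is what makes the final sample a simple random sample without replacement.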
Tam (1984) has provided necessary and sufficient conditions for an estimator
design pair to be optimal under a regression superpopulation model with correlated
residuals. Tam (1986) has provided necessary and sufficient conditions for the
optimality of an arbitrary linear predictor of the total of a finite population in survey
sampling under a general linear model with a symmetric and positive definite
covariance matrix. He extended the work of Pfeffermann (1984) and Tallis (1978)
by following Cassel, Sarndal, and Wretman (1977), Royall (1970a, 1970b, 1970c,
1976), Royall and Herson (1973a, 1973b), Royall and Pfeffermann (1982), Sarndal
(1980a, 1980b), Scott, Brewer and Ho (1978), and Zyskind (1967).
Wright (1990) has shown that there can be a gain in estimation strategies over equal
probability sampling methods when one makes use of auxiliary information for
probability proportional to size with replacement sampling methods. When a
suitable variable X is not available, one may know how to rank units reasonably
well relative to the unknown y values before sample selection. When such ranking
is possible, Wright (1990) has introduced a simple and efficient sampling plan using
the ranks as the unknown X measure of size. He showed that the resultant
sampling plan is similar to, has the simplicity of, and has no greater sampling
variance than with replacement sampling, but is without replacement. Kumar,
Srivenkataramana, and Srinath (1996) have also shown the use of ranks in unequal
probability sampling for sample selection and stratification including determining
the strata boundaries. They also suggested a few sampling schemes. For samples of
size two, two sampling schemes and their IPPS versions have been discussed along
with their extension to large sample situations. Non-negative unbiased estimators of
the variance have also been suggested.
contributed some useful remarks on the use of models and sampling schemes while
using unequal probability sampling strategies.
Reddy and Rao (1990) have considered the problem of estimation of population
total of bottom (top) P percentiles of a finite population using the Horvitz and
Thompson (1952) type estimator.
where $T_{x_j} = n^{-1}\sum_{i=1}^{n} x_{ij}$, $j = 1,2,\ldots,p$, $T_y = n^{-1}\sum_{i=1}^{n} y_i$, and $g$ is a smooth function. It is
to be noted that estimation of mean, proportion and ratio of means is a special case
of the general estimator $\hat{Q}$, but the estimation of median, variance, or correlation is
not a member of this general estimator. The estimand $Q$ will be the same smooth
function $g$ of the expectations of the means, $Q = g\left(E(T_{x_1}), E(T_{x_2}), \ldots, E(T_{x_p}), E(T_y)\right)$,
where the expectations are taken over repeated sampling, and the estimator $\hat{Q}$ is called
an estimator based on the method of moments. Thus the general form of the
estimator of the variance $V(\hat{Q})$ is given by
$$v(\hat{Q}) = n^{-1}\left[\partial g(T)/\partial T\right]^{t} S \left[\partial g(T)/\partial T\right] \qquad (5.18.6.2)$$
where $T = (T_{x_1}, T_{x_2}, \ldots, T_{x_p}, T_y)^{t}$ and $S = (n-1)^{-1}\sum_{i=1}^{n}(z_i - \bar{z})(z_i - \bar{z})^{t}$ with $z = (x, y)$. Schafer
and Schenker (2000) have developed a model for missing data based on this
procedure.
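As a small illustration of the variance form (5.18.6.2), consider the ratio of means, $g(T_x, T_y) = T_y/T_x$ with $p = 1$. The sketch below assumes $n$ independent observations and reads $S$ as the usual sample covariance matrix with divisor $n-1$; the function name and the data are hypothetical.

```python
def ratio_of_means_variance(x, y):
    """Method of moments estimate Q = T_y / T_x and its variance estimate.

    Uses v(Q) = n^{-1} grad' S grad, where grad is the gradient of
    g(T_x, T_y) = T_y / T_x evaluated at the sample means and S is the
    sample covariance matrix of (x, y) with divisor n - 1.
    """
    n = len(x)
    t_x = sum(x) / n
    t_y = sum(y) / n
    q_hat = t_y / t_x

    s_xx = sum((a - t_x) ** 2 for a in x) / (n - 1)
    s_yy = sum((b - t_y) ** 2 for b in y) / (n - 1)
    s_xy = sum((a - t_x) * (b - t_y) for a, b in zip(x, y)) / (n - 1)

    g_x = -t_y / t_x ** 2   # partial derivative of g with respect to T_x
    g_y = 1.0 / t_x         # partial derivative of g with respect to T_y
    v_hat = (g_x * g_x * s_xx + 2 * g_x * g_y * s_xy + g_y * g_y * s_yy) / n
    return q_hat, v_hat


# Hypothetical data, for illustration only.
x = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0]
y = [30.0, 41.0, 22.0, 55.0, 37.0, 29.0]
q_hat, v_hat = ratio_of_means_variance(x, y)
print(q_hat, v_hat)
```

For this choice of $g$ the expression collapses to the familiar linearization form, the sample variance of the residuals $y_i - \hat{Q}x_i$ divided by $n T_x^2$.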
$$\hat{Y}_p = \sum_{i \in s} \frac{y_i}{\pi_i} \qquad (5.18.7.1)$$
with variance
$$V(\hat{Y}_p) = \sum_{i \in \Omega} \frac{(1-\pi_i)\,y_i^2}{\pi_i} \qquad (5.18.7.2)$$
and an unbiased estimator of $V(\hat{Y}_p)$ is given by
$$v(\hat{Y}_p) = \sum_{i \in s} \frac{(1-\pi_i)\,y_i^2}{\pi_i^2}. \qquad (5.18.7.3)$$
For more depth in Poisson sampling, one can refer to Ogus and Clark (1971) and
Brewer, Early, and Hanif (1984).
Cosmetic calibration was introduced by Sarndal and Wright (1984). Brewer (1995)
suggested a procedure for constructing a cosmetic estimator. Brewer (1999) has
shown that cosmetic estimators are by definition interpretable both as design based
and as prediction based estimators. Formulae for them can be obtained directly by
equating these two estimators or indirectly by a simple form of calibration. Note
that they constitute a subset of GREG estimators. Their design variances cannot be estimated
without knowing the relevant second order inclusion probabilities, but under the
prediction model to which they are calibrated those probabilities do not affect their
anticipated variances, so it is more appropriate to estimate these and/or their
prediction variances. Brewer (1999) has shown that cosmetic calibration is a simple
and effective method for eliminating negative and unacceptably small positive
sample weights. Interestingly he suggests here to estimate the anticipated variance
of any calibrated estimator of population total under superpopulation model
m : Yi = .axi +ei' such that Em(e;) =0 , Em(el)= u xf , and Emkej )=0 for i
2 '* j .
We have seen that a calibrated estimator of population total $Y$ can be written as
$$\hat{Y}_c = \sum_{i \in s} w_i y_i, \qquad (5.18.8.1)$$
where $w_i$ are the calibration weights. Then, by the definition of anticipated variance,
=U
2
[.I Wi (Wi -If xf +(Ixf - .I Jrj1xf )- (.I wixf - .I Jrj'xf)] . (5. I8.8.2)
I ES lEO IE S I ES lE S
( b ) Its anticipated variance and its prediction variance can both be estimated more
easily and more efficiently than the design variance of the standard GREG;
( c ) Design based estimation has a tendency to be more reliable for large samples,
and prediction based estimation for small samples and small domains. Thus the
estimators used for large domains are typically design based while those for small
domains are often purely prediction based or synthetic. If the large domain
estimators are calibrated, the estimates for their component small domains
automatically sum to them without forcing;
( d ) As an unexpected spin off, the elimination of negative and other unacceptably
small weights is streamlined by the use of cosmetic calibration.
In order to estimate the population total $t_y = \sum_{i \in \Omega} y_i$, the Horvitz and Thompson
(1952) estimator is given by
$$\hat{t}_y = \sum_{i \in s} \frac{y_i}{\pi_i}$$
where 0 ij=l" i" r " ij) for i s- j such that lim sup(n ) ~ax I0ijl <oo. Anestimator
N -w) l ,j eO
for (5.18.9.3) is
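the finite population, $e_r$ a vector with a 1 in the $r$th position and 0 elsewhere,
$$\varepsilon' = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N)_{1 \times N},$$
and
$$W_{\Omega i} = \mathrm{diag}\left\{\frac{1}{h}K\left(\frac{x_1 - x_i}{h}\right), \ldots, \frac{1}{h}K\left(\frac{x_N - x_i}{h}\right)\right\}_{N \times N}.$$
Minimization of $\varepsilon' W_{\Omega i}\,\varepsilon$ leads to the local polynomial kernel estimator of the
regression function at $x_i$ given by
$$m_i = e_1'\left(X_{\Omega i}' W_{\Omega i} X_{\Omega i}\right)^{-1} X_{\Omega i}' W_{\Omega i}\, y_{\Omega} = w_{\Omega i}'\, y_{\Omega}. \qquad (5.18.9.6)$$
This estimator is well defined if $(X_{\Omega i}' W_{\Omega i} X_{\Omega i})$
is non-singular. If the $m_i$ are known,
then a design unbiased estimator of $t_y$ would be the generalized difference
estimator
$$t_y^{*} = \sum_{i \in s} \frac{y_i - m_i}{\pi_i} + \sum_{i \in \Omega} m_i. \qquad (5.18.9.7)$$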
The Sen--Yates--Grundy form of the variance of (5.18.9.7) is
1/Ii X siWsiYs
If $l_{ij}$ denotes the $(i,j)$th element of the inverse $(X_{si}' W_{si} X_{si})^{-1}$, then (5.18.9.10) can be
written as
lIli =el L
' 0 . 11
J
.! / (I_I)j(XI -xYK(x-x.)YI
_ 1_ _'
(5.18.9.11)
1=1 Jrl h h
for i = 1,2,...11.
Note that since $x_i$ is known for the entire population, the $\hat{m}_i$ can be calculated for all
$i \in \Omega$. Using this result Breidt and Opsomer (2000) proposed the estimator
$$\tilde{t}_y^{\,0} = \sum_{i \in s} \frac{y_i - \hat{m}_i^{0}}{\pi_i} + \sum_{i \in \Omega} \hat{m}_i^{0} \qquad (5.18.9.12)$$
for tY" Following Sarndal, Swenson, and Wretman (1992), the variance of t"y0 can
be approximated as
•
VsygVy
('O)=~"''''D ..( Yi - ffll _Yj-ffli ]2
L. L. IJ •
(5.18.9.14)
2 i* jES Jri Jrj
(X~iWSiXSi + diag(RXq +I)x(q+l)t ' where R stands for a ridge constant. Fan (1993)
as an estimator for the population total, and an estimator for estimating the v(t"y) in
Sen--Yates--Grundy form is given by
Much still remains to be done along these lines. Note that, following
Singh, Horn and Yu (1998), it is clear that one can calibrate the
estimator of variance V
syg(7;,) by constructing model assisted calibration constraints
VSyg(7;,) under the model (5.18.9.5) such that
E; {VS yg(t;,)} =E; {vSyg(7;,)},
and the chi square distance is minimum between the design weights and calibrated
weights. Further note that Breidt and Opsomer (2000) have reported a second form
of the variance of the Horvitz and Thompson estimator.
Singh (2003c) discovered that the traditional linear regression estimator can also be
shown as a special case of calibration approach, and pointed out that all the papers
related to minimizing chi square distance function in survey sampling need
modifications. There is a series of such papers by many followers of Deville and
Sarndal (1992) and it seems that everyone skipped a very important point while
using the chi square distance function. The technique developed by Singh (2003c) is
logically more accurate than what has been done by survey statisticians during the last
decade. The traditional linear regression estimator due to Hansen, Hurwitz, and
Madow (1953) is shown to be unique in its class of estimators, and celebrates its
Golden Jubilee Year in 2003 for its outstanding performance. Singh (2003c) considers
an estimator of the population total $Y$ as
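$$\hat{Y}_s = \sum_{i \in s} w_i^{\oplus} y_i \qquad (5.19.1)$$
where $w_i^{\oplus}$
are called the calibrated $\oplus$ (read as plus) weights such that the chi
square distance function defined as
$$D = \frac{1}{2}\sum_{i \in s} \frac{\left(w_i^{\oplus} - d_i\right)^2}{d_i q_i^{\oplus}} \qquad (5.19.2)$$
is minimum subject to the two constraints
$$\sum_{i \in s} w_i^{\oplus} = \sum_{i \in s} d_i, \qquad (5.19.3)$$
and
$$\sum_{i \in s} w_i^{\oplus} x_i = X. \qquad (5.19.4)$$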
The choice of weights $q_i^{\oplus}$ leads to different forms of estimators. Note that the
condition (5.19.3) is a requirement of the chi square test given by Sir R.A. Fisher,
and is ignored by all the followers of Deville and Sarndal (1992). The
Lagrange function is given by
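$$L = \frac{1}{2}\sum_{i \in s} \frac{\left(w_i^{\oplus} - d_i\right)^2}{d_i q_i^{\oplus}} - \lambda_1\left\{\sum_{i \in s} w_i^{\oplus} - \sum_{i \in s} d_i\right\} - \lambda_2\left\{\sum_{i \in s} w_i^{\oplus} x_i - X\right\}.$$
On setting $\partial L/\partial w_i^{\oplus} = 0$ we have
$$w_i^{\oplus} = d_i + \lambda_1 d_i q_i^{\oplus} + \lambda_2 d_i q_i^{\oplus} x_i. \qquad (5.19.5)$$
On using (5.19.5) in (5.19.3) we have
$$\lambda_1\left(\sum_{i \in s} d_i q_i^{\oplus}\right) + \lambda_2\left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right) = 0. \qquad (5.19.6)$$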
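On using (5.19.5) in (5.19.4) we have
$$\lambda_1\left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right) + \lambda_2\left(\sum_{i \in s} d_i q_i^{\oplus} x_i^2\right) = X - \hat{X}_{HT},$$
where $\hat{X}_{HT} = \sum_{i \in s} d_i x_i$. Solving this equation together with (5.19.6) gives
$$\lambda_1 = \frac{-\left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)\left(X - \hat{X}_{HT}\right)}{\left(\sum_{i \in s} d_i q_i^{\oplus}\right)\left(\sum_{i \in s} d_i q_i^{\oplus} x_i^2\right) - \left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)^2}
\quad \text{and} \quad
\lambda_2 = \frac{\left(\sum_{i \in s} d_i q_i^{\oplus}\right)\left(X - \hat{X}_{HT}\right)}{\left(\sum_{i \in s} d_i q_i^{\oplus}\right)\left(\sum_{i \in s} d_i q_i^{\oplus} x_i^2\right) - \left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)^2}. \qquad (5.19.7)$$
On substituting these values in (5.19.5) the calibrated plus weights are
$$w_i^{\oplus} = d_i + \frac{d_i q_i^{\oplus} x_i\left(\sum_{i \in s} d_i q_i^{\oplus}\right) - d_i q_i^{\oplus}\left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)}{\left(\sum_{i \in s} d_i q_i^{\oplus}\right)\left(\sum_{i \in s} d_i q_i^{\oplus} x_i^2\right) - \left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)^2}\left(X - \hat{X}_{HT}\right). \qquad (5.19.8)$$
On substituting (5.19.8) in (5.19.1) the estimator becomes
$$\hat{Y}_s = \sum_{i \in s} d_i y_i + \hat{\beta}_{ols}\left(X - \hat{X}_{HT}\right) \qquad (5.19.9)$$
where
$$\hat{\beta}_{ols} = \frac{\left(\sum_{i \in s} d_i q_i^{\oplus} x_i y_i\right)\left(\sum_{i \in s} d_i q_i^{\oplus}\right) - \left(\sum_{i \in s} d_i q_i^{\oplus} y_i\right)\left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)}{\left(\sum_{i \in s} d_i q_i^{\oplus}\right)\left(\sum_{i \in s} d_i q_i^{\oplus} x_i^2\right) - \left(\sum_{i \in s} d_i q_i^{\oplus} x_i\right)^2} \qquad (5.19.10)$$
which is clearly the usual traditional linear regression estimator.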
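The calibrated plus weights and the resulting regression estimator can be checked numerically. The sketch below is an illustration of the weight formula (5.19.8) under stated assumptions: the function name `plus_weights`, the design weights, the $x$ and $y$ values, and the known total $X$ are all hypothetical and do not come from the populations in the Appendix.

```python
def plus_weights(d, x, X, q=None):
    """Calibrated plus weights from (5.19.8).

    d : design weights d_i, x : auxiliary values x_i in the sample,
    X : known population total of x, q : weights q_i (all 1 by default).
    """
    n = len(d)
    q = [1.0] * n if q is None else q
    a = sum(di * qi for di, qi in zip(d, q))                       # sum d q
    b = sum(di * qi * xi for di, qi, xi in zip(d, q, x))           # sum d q x
    c = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))      # sum d q x^2
    x_ht = sum(di * xi for di, xi in zip(d, x))                    # HT total of x
    denom = a * c - b * b
    return [di + (di * qi * xi * a - di * qi * b) / denom * (X - x_ht)
            for di, qi, xi in zip(d, q, x)]


# Hypothetical sample of five units and a hypothetical known total X.
d = [10.0, 8.0, 12.0, 9.0, 11.0]
x = [4.0, 7.0, 3.0, 6.0, 5.0]
y = [9.0, 15.0, 7.0, 13.0, 10.0]
X = 260.0

w = plus_weights(d, x, X)
y_reg = sum(wi * yi for wi, yi in zip(w, y))
print(w, y_reg)
```

By construction the weights reproduce both $\sum d_i$ and the known total $X$, and the weighted total of $y$ agrees with the regression form $\sum d_i y_i + \hat{\beta}_{ols}(X - \hat{X}_{HT})$ with the slope computed as in (5.19.10).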
Note that if $q_i^{\oplus} = 1$ and under SRSWOR sampling where $d_i = N/n$, the estimator
(5.19.9) reduces to
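A new estimator of the variance of the traditional linear regression estimator $\hat{Y}_s$ has
also been suggested by Singh (2003c) as
$$\hat{v}_s(\hat{Y}_s) = \frac{1}{2}\sum_{i \ne j \in s} D_{ij}\left(w_i^{\oplus} e_i^{\oplus} - w_j^{\oplus} e_j^{\oplus}\right)^2 \qquad (5.19.12)$$
where $e_i^{\oplus} = y_i - \hat{\alpha}_{ols} - \hat{\beta}_{ols} x_i$. Note that $\hat{\alpha}_{ols}$ and $\hat{\beta}_{ols}$ are the least square estimates
of $\alpha$ and $\beta$ in the model $y_i = \alpha + \beta x_i + e_i$ obtained by minimizing $\sum_{i \in s} d_i q_i^{\oplus} e_i^{\oplus 2}$.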
Singh (2003c) also studied a further calibrated estimator of variance given by
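$$D^{\oplus} = \frac{1}{2}\sum_{i \ne j \in s} \frac{\left(\Omega_{ij}^{\oplus} - D_{ij}\right)^2}{D_{ij} Q_{ij}}, \qquad (5.19.14)$$
is minimum subject to two calibration constraints given by
$$\sum_{i \ne j \in s} \Omega_{ij}^{\oplus} = \sum_{i \ne j \in s} D_{ij}, \qquad (5.19.15)$$
and
$$\frac{1}{2}\sum_{i \ne j \in s} \Omega_{ij}^{\oplus}\left(d_i x_i - d_j x_j\right)^2 = V(\hat{X}_{HT}). \qquad (5.19.16)$$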
Singh (2003c) suggested that the statistical package GES developed by Statistics
Canada could be modified to obtain the traditional linear regression type estimates
of population total, and to estimate its variance using the modified calibration
approach discussed in this section. Similar changes in other statistical packages
such as SUDAAN, CALMAR, SAS, and STATA are also suggested.
Example 5.19.1. Continuing from Example 5.5.1, find the calibrated plus weights
which lead to the traditional linear regression estimate of the number of fish caught
during 1995.
Solution. Continuing from Example 5.5.1 and for $q_i^{\oplus} = 1$ we have

  x_i     y_i     pi_i       d_i      d_i x_i       d_i x_i^2         w_i(+)   w_i(+) y_i
  2001    2016    0.107450   9.307    18622.615     37263852.955      9.071    18288.135
  5692    2319    0.110228   9.072    51638.422     293925899.046     9.167    21257.489
  2653    3816    0.115834   8.633    22903.465     60762893.451      8.469    32318.978
  4860    4008    0.106153   9.420    45782.974     222505251.853     9.443    37846.597
  3850    2568    0.120075   8.328    32063.294     123443681.033     8.267    21228.615
  17741   16238   0.139569   7.165    127112.754    2255107373.414    8.074    131111.675
  776     163     0.103605   9.652    7489.986      5812229.140       9.294    1514.894
  2300    2324    0.107686   9.286    21358.394     49124305.852      9.078    21098.349
  Sum                                 326971.904    3047945486.743             284664.732

Thus a traditional linear regression estimate of the number of fish caught during
1995 is given by
$$\hat{Y}_{lr} = \sum_{i \in s} w_i^{\oplus} y_i = 284664.732.$$
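Exercise 5.1. Find the bias and variance of the estimators of population total, $Y$,
defined as
$$\hat{Y}_{g1} = \hat{Y} u_1^{\alpha} u_2^{\beta}, \quad \hat{Y}_{g2} = \hat{Y}\{1 + \alpha(u_1 - 1) + \beta(u_2 - 1)\}, \quad \text{and} \quad \hat{Y}_{g3} = \hat{Y}[1 + \alpha(u_1 - 1) + \beta(u_2 - 1)]^{-1},$$
where $u_1 = \hat{X}_1/X_1$ and $u_2 = \hat{X}_2/X_2$ for $\hat{X}_t = \sum_{i=1}^{n} x_{ti}/\pi_i$ and $X_t = \sum_{i=1}^{N} x_{ti}$.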
i= IJri i=t
Exercise 5.2. Find the bias and mean squared error of the estimators of population
total Y defined as
Y,kg 1 = aY'(XI
XI + flY'(XI
XI J J2 and
where $\alpha$, $\beta$ and $\gamma$ are suitably chosen constants such that the MSEs of the estimators
are minimum.
Hint: Kapadia and Gupta (1984) .
Exercise 5.3. Show that for any sampling design with positive first order inclusion
probabilities for all the units in the population, the covariance between Y. = 'L.d;Yi
"
i=\
. = 'L.d;xj
and X; " is given by
i=1
where d, = Jrjl, for ,. = 1,2, ..., m. Derive its value under SRSWOR sampling
design.
Hint: Sampath and Chandra (1990).
Exercise 5.4. Study the asymptotic properties of the estimator of population total
Y defined as
'" n x~
where X r = I - ' has its usual meaning .
i=1 Jri
Exercise 5.5. Show that the minimization of $E_m E_p[\hat{Y}_s - Y]^2$ for any design $p(s)$
under the model $m: Y_i = \beta X_i + \varepsilon_i$, where $E_m(\varepsilon_i^2 \mid X_i) = \sigma^2 f(X_i)$, leads to the
estimator of total, $Y$, given by
$$\hat{Y}_s = \sum_{i \in s} Y_i + \left(\sum_{i \in s} \frac{X_i Y_i}{f(X_i)}\right)\Big/\left(\sum_{i \in s} \frac{X_i^2}{f(X_i)}\right)\sum_{i \notin s} X_i.$$
Hint: Royall (1970a, 1970b, 1970c, 1971), Bellhouse (1984).
Exercise 5.6. Consider $\bar{y}_s = \frac{1}{N}\sum_{i \in s} \frac{y_i}{\pi_i}$, $\bar{x}_{sj} = \frac{1}{N}\sum_{i \in s} \frac{x_{ij}}{\pi_i}$ and $\bar{X}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}$ for
$j = 1,2,\ldots,p$ have their usual meanings. Study the bias and variance of the estimator
of the population mean $\bar{Y}$ defined as
$$\bar{y}_r = \bar{y}_s + \sum_{j=1}^{p} \hat{\beta}_{sj}\left(\bar{X}_j - \bar{x}_{sj}\right)$$
where $\hat{\beta}_{sj}$ denotes the $j$th partial regression coefficient, under the superpopulation
model.
Hint: Sarndal (1980b).
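$$D_1 = \sum_{i=1}^{n} \frac{(w_i - d_i)^2}{2 d_i}; \quad D_2 = \sum_{i=1}^{n}\left[w_i \ln\left(\frac{w_i}{d_i}\right) - w_i + d_i\right]; \quad D_3 = 2\sum_{i=1}^{n}\left(\sqrt{w_i} - \sqrt{d_i}\right)^2;$$
$$D_4 = \sum_{i=1}^{n}\left[-d_i \ln\left(\frac{w_i}{d_i}\right) + w_i - d_i\right]; \quad \text{and} \quad D_5 = \sum_{i=1}^{n} \frac{(w_i - d_i)^2}{2 w_i}$$
subject to the calibration constraint
$$\sum_{i=1}^{n} w_i x_i = X,$$
where $d_i$ are the known design weights and $w_i$ are the calibrated weights to be
found. Discuss the nature of calibrated weights in each situation.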
( b ) Optimise the generalized distance function
$$D = \sum_{i=1}^{n} \frac{(w_i - d_i)^{a}}{d_i q_i}$$
subject to
$$\sum_{i=1}^{n} w_i x_i = X.$$
Find $a$ such that the variance of the resultant estimator is minimum. Is the optimum
$a = 2$?
( c) In each one of the above cases ( a ) and (b ), study the following two
estimators of the population total, Y, defined as
, n
Yds = l:wiYi and Yes = l:wi Yi + ei - l:diei
,n{}n
i= 1 i=1 i=1
where ei = Yi - PdsXi .
Hint: Deville and Sarndal (1992), Estevao and Sarndal (2000) .
Exercise 5.8. Find the bias and variance of the Rao, Hartley, and Cochran (1962)
strategy and the Horvitz and Thompson (1952) strategy for a multi-character survey
for estimating popu lation totals given by
, n y. , 1 n y.
YRHC = l: -t<i and YHT = - l:-.'-
i=1 p; ni=1p; Jri
where
$$p_i^{*} = \left(1 + \frac{1}{N}\right)^{1 - \rho_{xy}}\left(1 + p_i\right)^{\rho_{xy}} - 1$$
and $\rho_{xy}$ is the known correlation between the selection probabilities $p_i$ and the
variable under study, under the superpopulation model $y_i = \beta p_i + e_i$, where $e_i$ are
the error terms such that $E(e_i \mid p_i) = 0$, $E(e_i^2 \mid p_i) = a p_i^{g}$ with $a > 0$ and $g \ge 0$, and
$E(e_i e_j \mid p_i p_j) = 0$ $\forall\, i \ne j$. Can you suggest two more transformations $p_i^{*}$?
Hint: Bansal and Singh (1989, 1990), Bedi (1995), Mangat and Singh (1992-93).
Exercise 5.9. Let there be a population of $N$ units and we want to select a sample
of $n$ units. For this selection, the population is randomly divided into $(n + k)$
groups of sizes $N_1, N_2, \ldots, N_{n+k}$ such that $\sum_{i=1}^{n+k} N_i = N$. For the first group we select
$N_1$ units out of $N$ units with SRSWOR sampling. Then for the second group,
select $N_2$ units out of $(N - N_1)$ units with the same sampling scheme and so on.
From these $(n + k)$ groups we then select a sample of $n$ random groups using
SRSWOR sampling. Now for selecting the ultimate sample of $n$ units, we select
one unit with probability proportional to the original probabilities $p_i$ such that
$\sum_{i=1}^{N} p_i = 1$ from each of the groups independently. For this scheme, show that an
unbiased estimator of population total is given by
$$\hat{Y} = \frac{(n + k)}{n}\sum_{i \in s} \frac{y_i}{(p_i/T_i)}, \quad \text{where } T_i = \sum_{j \in G_i} p_{ij}.$$
Find its variance and compare with the usual RHC scheme.
Hint: Bansal and Singh (1986).
Step II. Select the second unit from the remaining $(N - 1)$ units with conditional
probability for the $j$th unit being proportional to $(x_j - x_i)^2$.
Step III. Select $(n - 2)$ units from the remaining units of the population by simple
random sampling and without replacement.
Show that under such a sampling scheme, the probability of selecting the $s$th sample is
$$p(s) = s_x^2\Big/\left\{\binom{N}{n} S_x^2\right\}.$$
Deduce that the regression estimator of population mean defined as
$$\bar{y}_{lr} = \bar{y}_s + \left(s_{xy}/s_x^2\right)\left(\bar{X} - \bar{x}_s\right)$$
and the ratio type estimator of finite population variance defined as
$$s_1^2 = s_y^2\left(S_x^2/s_x^2\right)$$
Exercise 5.11. Show that the difference between the variance of the estimator of
population total Y under PPSWR sampling and PPSWOR sampling schemes is:
V(YHT ) - V(YHH )= I
2
Difference = (Y; - ~YXYr Pj Y)-2.:- .
;,~j=l n ~Pj
Show that Difference z 0 for Midzuno (1952) and Sen (1952) sampling schemes.
Hint: Prabhu--Ajgaonkar (1975).
p(s) = m/ {(:)x}.
Show that a product type estimator of population mean, YI = (yx)/m , where m is
the equiprobable harmonic mean of x values, is unbiased.
Hint: Ruiz and Santos (1990).
Exercise 5.13. In the RHC sampling scheme, let the first random group be made by
using Midzuno --Sen's scheme of sampling while the remaining (n -1) groups are
constructed as usual. Show that if the random groups are of equal size then the
resultant strategy and the usual RHC strategy are equally efficient; otherwise, find
the condition under which the resultant strategy fares better than the usual RHC
strategy .
Hint: Singh and Lal (1978).
Exercise 5.14. Write the FORTRAN codes to generate the first and second order
inclusion probabilities to estimate the variance of the Horvitz and Thompson
estimator of population total.
Hint: Bandyopadhyay, Chattopadhyaya, and Kundu (1977).
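The exercise asks for FORTRAN, but the underlying computation can be sketched in Python by brute-force enumeration of all samples, which is feasible only for small $N$. As an assumed illustration, the design is taken to be the Midzuno-Sen scheme, for which the probability of a sample is proportional to its total of $x$; the function names and the size measures are hypothetical.

```python
from itertools import combinations
from math import comb


def midzuno_sen_design(x, n):
    """Sample probabilities p(s) = sum_{i in s} x_i / {C(N-1, n-1) X}."""
    N, X = len(x), sum(x)
    c = comb(N - 1, n - 1)
    return {s: sum(x[i] for i in s) / (c * X)
            for s in combinations(range(N), n)}


def inclusion_probabilities(design, N):
    """First order (pi) and second order (pij) inclusion probabilities."""
    pi = [0.0] * N
    pij = [[0.0] * N for _ in range(N)]
    for s, p in design.items():
        for i in s:
            pi[i] += p
            for j in s:
                if j != i:
                    pij[i][j] += p
    return pi, pij


# Hypothetical size measures for a population of N = 6 units, samples of n = 3.
x = [12.0, 30.0, 18.0, 25.0, 9.0, 16.0]
design = midzuno_sen_design(x, n=3)
pi, pij = inclusion_probabilities(design, N=len(x))
print([round(p, 4) for p in pi])
```

The resulting `pi` and `pij` are exactly the inputs needed by the Horvitz and Thompson variance expressions; for large populations, closed-form expressions for the specific scheme replace the enumeration.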
Exercise 5.15. In the RHC sampling scheme let the sample be selected in such a
way that:
( a ) The cost is proportional to the expected number of distinct units in the sample;
and
( b ) The cost is proportional to the total expected size of the sample where {Pj } are
taken as the relative measures of sizes for the respective units in the population .
Then show that the RHC scheme remains less efficient than the PPSWR sampling
scheme for the fixed cost of surveys.
Hint: Singh and Kishore (1975).
Exercise 5.16. Show that under Midzuno-Sen's scheme of sampling, the variance
of the Horvitz and Thompson (1952) estimator (HTE) of total may not generally
decrease with an increase in sample size or average effective sample size. Suggest
an improvement so that the resultant $\pi ps$ property yields a variance of HTE
which decreases with increasing sample size.
Hint: Chaudhuri and Arnab (1978).
Exercise 5.17. Discuss the efficiency of the usual ratio estimator under Midzuno-
Sen's scheme of sampling.
Hint: Singh (1975a).
Exercise 5.18. Show that the general formula for estimating the variance of the
estimator of population total under the RHC scheme is
n n
Vb = L: L: aijdij'
i=lj=1
2
.. QiQ j Yi Yj n n n
[
where aij=bij j(NiN j)c'i'}, dij=-z- ~--p, and ,L: L: bij = ,L:Ni(Ni-l) .
'i] J 1=1]=1 1=1
Exercise 5.19. Discuss the asymptotic properties of the two estimators of the
variance of the linear regression estimators:
$$v_1 = \sum\sum_{j \ne i = 1}^{n} D_{ij}\left(d_i e_i - d_j e_j\right)^2 \quad \text{and} \quad v_2 = \sum\sum_{j \ne i = 1}^{n} D_{ij}\left(d_i g_i e_i - d_j g_j e_j\right)^2$$
where $g_i = \pi_i w_i$ and the other symbols have their usual meanings.
Hint: Chaudhuri and Maiti (1994), Valliant (2002).
where b(s, kk)and b(s, kl) are suitably chosen constants. Find the bias and variance
of this estimator.
Hint: Mukhopadhyay (1982), Hanif, Mukhopadhyay, and Bhattacharyya (1993)
Exercise 5.21. Discuss the relative efficiencies of the strategies due to Horvitz and
Thompson (1952), Rao, Hartley, and Cochran (1962), Sen (1952) and Midzuno
(1952) sampling schemes in estimating a finite population total under the
assumptions of a superpopulation model.
Hint: Chaudhuri and Arnab (1979).
Exercise 5.22. Is there any model free evaluation technique to compare the bias
and the mean squared error of the regression estimator?
Hint: Konijn (1979).
Exercise 5.23. Assume that the sampling design is such that when the population
size N and sample size n are large enough, the joint probability sampling
. [V(Y),c(X.Y)]
v(z) = • • •
c(X,Y),v(x)
where v(Y) and v(X) are the sampling variance matrices of Y and x,
respectively, and c( X, Y) is the sampling covariance vector. Then show that the
where B=v(xfc(x.Y).
Also show that the conditional mean squared error of the estimator, Y, can be
written as
+! +1,{::,)}'.
Exercise 5.24. If the estimating function g * is optimal, that is if it satisfies the
E(ei)= 3cr4xi , where i = 1,2,... ,N, f3 and cr2 > 0 are unknown parameters. An
unb iased estimator of population mean f IS
_ 1 11 Yi 1 1 11 Yi
YIlT =-L:-=--L:-
N i; j !ri N n i;lPi
Assuming PPSWR sampl ing designs , compare the following estimators of variance
V(YHT 1 given by
(a) vWR = X
2
r.(Yi
n(n-1L;\ Xi
_..!.- r.
n i-i x,
Yi J2,
(c) v~=(I-nIP1)vwR'/;\
and
Exercise 5.27. Suppose cr 2 is known . Then the necessary and suffici ent conditions
, N
for the estimator Y = WOs + I WksYk of the linear function L:PkYk to be admissible
kes k;1
are that there exists As such that Wks = Asak + Pk (k E s), and one of the following
two conditions are satisfied:
( i) O ~ As~2 ,where cs'" L: Pkak and d s '" L:al;
ds k es kes
2
( ti ) As = 2 , and WOs = _2 L:akbk + L:Pkbk - acr [(2J2 d, + L:pl]
d, ds kes k es 2 ds k es
Exercise 5.28 . A necessary and sufficient condit ion for the estimator
, 1- f 2
Vs =-n- Sy
Exercise 5.29. Consider a design with p(:s) = (N: Ift Pi. Show that an unbiased
Exercise 5.30. For any sampling design P and any real number a, show that
{-a)= --;;I .~z:
E It ~ ~ (. . . )
. z: " ". z: Yil Yiz..·Yia" 1\ ,IZ,···,l a ,
n '1;\'2;) 'a ; \
where y = n- I IYi is the sample mean, and "(i\,iZ, ...,ia ) is the ath order inclusion
i ;)
probabil ity of including units (i\ ,iz,...,ia ) . Also for any real number fJ show that
E{( .L YifJJ{-a)~
IE S
_ I NN N
It ---;;.L
n
N fJ (. . . .)
.L.L ..... L YiOYi)YiZ" 'Yia" 10 ,/\,IZ ,·..,l
'a;) 'O;\,,;I'Z;I
a .
Deduce the result that for any non-negative real number fJ if M fJ = L(Yi - y)/3 then
ie s
show that
1
SOl }_ fJ- ( )1(fJ) II. {LN. LN .....LN YfJ-I Yi\ " 'Yi " (..
Kl!VlfJ - L -I
.) }
1;\ t n '0;) ,, ;\ /1;\ iO l 10,/\,...,11
I NN (. .) N fJ .
+ (-I)/3-fJ_\ { .L .... L Yi\ " 'YifJ" 1\, .. .,lfJ + .L Y; "(I)}
n ,,;\ IfJ ;\ /;\
Hint: Srivastava and Saleh (1985)
Exercise 5.31. Find the bias and variance of the following estimator of the
population total $Y$ as
$$\hat{Y}_H = \sum_{i \in s} d_i y_i\left[\frac{X}{\sum_{i \in s} d_i x_i}\right],$$
where $d_i = \pi_i^{-1}$.
Hint: Hajek (1959).
Exercise 5.32. Find the bias and variance of the estimator of the population total
$Y$ defined as
$$\hat{Y}_H = \sum_{i \in s} d_i y_i + b\left(X - \sum_{i \in s} d_i x_i\right)$$
Exercise 5.33. Justify the statement 'For each sampling design there exists a
rejective method of drawing a sample and terminates with selection of a sample
with probability one.' Also show that for Hajek's (1964) method the expected
Exercise 5.34. In the non-parametric model $y_i = m(x_i) + e_i$, where $E_m(e_i) = 0$ and
$E_m(e_i^2) = \sigma^2 x_i^{g}$, $g \ge 0$, discuss different methods of estimating $m(x_i)$. Show that if
$m(x_i) = \beta x_i$ the non-parametric model reduces to a parametric one.
Hint: Breidt and Opsomer (2000).
Exercise 5.35. Let y and x;, i = 1,2,...,1, respectively, be the survey variable and
the auxiliary variables related to y and the information about the quantiles of the
auxiliary characters or distribution functions are known . From the sample of n
units from a population of size N we observe lX;k ' Yk) where k E s. Consider
Qx.(a;)
1
fora; E (O.O,O.S)U(OS,l.O) are known and we wish to estimate Qy(fJ) with
fJ = 1/2. Study the asymptotic properties of the following estimators of Qy(fJ) as
FR = FHTy Qy
A A (
fJ ITV;
())'
1=1
A I t
jFHTx.(Qx.(a;)))
1 (
F HTXj Qx; a;
))'
. ,
With .I W; = 1
1=1
and
fro = frHTiQy(fJ))+ jIb;{F
:::::l
HTX'I (Qx.(a;))- frHTx I (Qx.(a;))}
I I
where
Ll(Qy(fJ)- Yk) LllQx.(a;)- x;k)
I I
A A
Exercise 5.36. Consider a penalized chi square distance function between design
weights $d_i$ and calibrated weights $w_i^{*}$ as
$$D = \frac{1}{2}\sum_{i \in s} \frac{\left(w_i^{*} - d_i\right)^2}{d_i q_i} + \frac{1}{2}\Phi^2 \sum_{i \in s} \frac{w_i^{*2}}{d_i q_i},$$
where $q_i$
are weights and $\Phi$ is a positive quantity that reflects a penalty to be
decided by the investigator based on prior knowledge, or the desire for certain
levels of efficiency and bias. Minimize $D$ in each of the following two situations:
( i ) No auxiliary information is available;
and
( ii ) Calibration constraint of Deville and Sarndal (1992).
Discuss the different estimators of $\hat{Y}_{new} = \sum_{i \in s} w_i^{*} y_i$ as special cases for different
sampling schemes and choices of weights $q_i$. What choice of penalty and which
situation leads to Searls' estimator?
Hint: Farrell and Singh (2002b).
$$\beta = \left\{N\sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i\right\}\Big/\left\{N\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2\right\}.$$
Show that the usual OLS estimator of $\beta$ defined as
$$\hat{\beta}_{ols} = \left\{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i\right\}\Big/\left\{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right\}$$
Consider the sample $s$ is taken from a finite population $\Omega$ with sampling design $p$
such that $\pi_i$ and $\pi_{ij}$ denote the first and second order inclusion probabilities. Show
that the variance of the design consistent estimator of $\beta$, given by
Exercise 5.38. Consider an auxiliary variable $x$ that has a negative correlation with the
study variable $y$. Let $X = \sum_{i=1}^{N} x_i$ and consider a transformed variable $x_i^{*} = (X - x_i)$,
$i = 1,2,\ldots,N$, so that $\sum_{i=1}^{N} x_i^{*} = (N - 1)X$ and the probabilities of selection are
$$p_i^{*} = \frac{(1 - p_i)}{(N - 1)}, \quad i = 1,2,\ldots,N.$$
•• 1 n y-
YHH = - I ---+ and
II i=IPi
where $\pi_i^{*}$ is the first order inclusion probability with probability set $p_i^{*}$.
Hint: Bedi and Rao (2001).
Exercise 5.39. Find the condition on the real constants $c_i$, $i = 1,2,\ldots,N$, such that
the estimator
$$\hat{Y}_B = \sum_{i \in s} \frac{y_i - c_i}{\pi_i} + C,$$
where $C = \sum_{i=1}^{N} c_i$, is unbiased for estimating population total $Y$.
Hint: Basu (1971).
( a ) Consider a model $m$ such that $E_m(e_i \mid x_i) = 0$, $E_m(e_i^2 \mid x_i) = \sigma^2 v(x_i)$ and
$E_m(e_i e_j \mid x_i x_j) = 0$. Assuming that $v(x_i)$ is known, show that the calibration
equation obtained by equating
$$E_m\left[v_1(\hat{Y}_{ds})\right] = E_m\left[V(\hat{Y}_{ds})\right]$$
is given by
$$\sum_{i \in s} w_i d_i (1 - \pi_i) v(x_i) = \sum_{i \in \Omega} d_i (1 - \pi_i) v(x_i).$$
( b ) Consider an estimator of the population total $Y$ as
$$\hat{Y}_{ss} = \sum_{i \in s} w_i^{*} y_i$$
where $w_i^{*}$
are recalibrated weights, and are obtained by the minimization of a
penalized chi square distance function
$$D_p = \frac{1}{2}\sum_{i \in s} \frac{\left(w_i^{*} - w_i\right)^2}{w_i q_i} + \frac{1}{2}\Phi^2 \sum_{i \in s} \frac{w_i^{*2}}{w_i q_i},$$
assuming $w_i > 0$ for all $i = 1,2,\ldots,n$ are already calibrated weights (or design
weights), and $\Phi$ is a known real constant called the penalty to the function.
$$\hat{Y}_{ss} = \left(\frac{1}{1 + \Phi^2}\right)\left[\sum_{i=1}^{n} w_i y_i + \hat{\beta}_{ss}\left\{\sum_{i=1}^{N} d_i (1 - \pi_i) v(x_i) - \sum_{i=1}^{n} w_i d_i (1 - \pi_i) v(x_i)\right\}\right].$$
Further deduce the estimator owed to Prasad (1989), $\hat{Y}_{p(r)} = N \bar{y}_{searl}(\bar{X}/\bar{x})$, as a
special case of it for a certain choice of $\Phi$.
( e ) Suggest some improved estimators of variance through calibration.
Hint: Singh (2002b), Singh and Horn (1999).
Exercise 5.41. ( a ) Let $p_i$ be the probability mass at $y_i$, for $i \in s$, and $\bar{X}$ be the
known population mean for the (vector valued) auxiliary variable $x$. Study the
asymptotic properties of the empirical maximum likelihood estimator of the
population mean, $\bar{Y}$, defined as
$$\hat{\bar{Y}}_{EL} = \sum_{i \in s} p_i y_i$$
where the values of $p_i$ maximize the empirical likelihood function $L(p) = \prod_{i \in s} p_i$
subject to $\sum_{i \in s} p_i = 1$, $p_i > 0$ and $\sum_{i \in s} p_i x_i = \bar{X}$.
Exercise 5.42. Study the asymptotic properties of an unbiased ratio estimator of the
population total defined as
,
TRao = --.£ .L (y.J
(T) I (B;Y;-BjYj {I~--IJ. - (I--I .L~IJ( L-'.-'
-'. +-LL B.Y.J
n 'E S X, N ' <J E S B, BJ N ' EO B, 'E S «.
Show that if B; = Tx/(NX;) , and L B;-I = N then TRao reduces to Hartley and Ross
;EO
(1954) unbiased ratio estimator under SRSWOR sampling.
Hint: Rao (2002).
Exercise 5.43. Suppose XI and X 2 are the known totals of two auxiliary
characters Xli and X 2; , for i = I, 2" 0" N . Consider an estimator of the population
total Yas
Ylr = LW?'y;
i es
Then show that the minimization of the CS distance function
D= ~Jw?'d -dJ
. $
' ES ;q;
subject to the three linear calibration constraints, given by
where $\hat{\beta}_{1(ols)}$ and $\hat{\beta}_{2(ols)}$ are the ordinary least square estimates.
Hint: Singh (2003c).
PRACTICAL PROBLEMS
Practical 5.1. John and Michael were appointed to select three players ($n = 3$) out
of five players ($N = 5$) from the list $\Omega$ = {Amy, Bob, Chris, Don, Eric} with their
scores 125, 126, 128, 90 and 127, respectively.
( a ) Find the first order inclusion probabilities for John's sampling scheme.
Chapter 5: Use of auxiliary information: PPSWOR Sampling 521
(b) Find the estimates of total score from each sample using John's sampling
scheme.
( c ) Find the bias in John 's sampling scheme.
( d ) Find the second order inclusion probabilities for John's sampling scheme.
( e ) Find the variance of John's sampling scheme using Sen--Yates--Grundy
formula .
(f) Find the variance of John's sampling plan using usual formula.
( g) Find the variance of John's sampling plan using definition of variance.
(h) Are three variances equal for John 's sampling scheme?
(II) Michael likes Amy and cleverly suggests the following changes in John's
sampling scheme as:
p(s/) = 1/6, '<I t = 1,2 ,3 ,4 ,5,6, and p(s/)= 0.00, '<It = 7,8,9 ,10.
( i ) Find the first order inclusion probabilities for Michael 's sampling scheme.
(j ) Find the estimates of total score from each sample using Michael 's sampling
scheme.
( k ) Find the bias in Michael's sampling scheme .
( I ) Find the second order inclusion probabilities for Michael's sampling scheme .
(m) Find the variance of Michael's sampling scheme using Sen--Yates--Grundy
formula .
( n ) Find the variance of Michael's sampling plan using usual formula .
(0) Find the variance of Michael's sampling plan using definition of variance.
(p ) Are three variances equal for Michael's sampling scheme?
Practical 5.2. Use the known information on the number of fish caught during 1992
to select a sample of seven units by using PPSWOR sampling. Collect the required
information from population 4 to estimate the total number of fish caught during
1995 through a regression estimator by using information during 1994 as the
auxiliary variable. Apply the Sen--Yates--Grundy form of the estimator of variance
of the regression estimator to construct a 95% confidence interval. Also use the
calibrated estimators of variance of the general linear regression estimator.
Population 4 in the Appendix shows that the information on the number of different
kinds of fish caught during 1992 and 1994 is available.
Practical 5.3. Develop the calibration weights for the units selected in the sample
using PPSWOR sampling by making use of known information about the number
of fish caught during 1994 as the auxiliary variable.
For simplicity use the chi square distance function between the design weights and
calibrated weights. Discuss the three cases where these weights lead to the ratio,
GREG and traditional linear regression estimator for estimating the total number of
fish caught during 1995. Derive the value of the estimate in each case.
Given : Total number of fish caught during 1994 is 341856.
Practical 5.4. Take an SRSWOR sample of 10 states from population 1 and note
the records of real estate farm loans as well as nonreal estate farm loans. Given that
information on the nonreal estate farm loans is available for all states, apply the
ratio estimator for estimating the total real estate farm loans in the United States.
Construct the 95% confidence intervals using the lower and higher level calibration
approach.
Given: $N = 50$, $X = 43908.12$ and $S_x^2 = 1176526$. Repeat the exercise by taking
a PPSWOR sample.
Practical 5.5. Select a sample of 4 units by using the RHC scheme from population
1 of the Appendix . Estimate the total real estate farm loans using the RHC
estimator and making use of nonreal estate farm loans as an auxiliary variable . Find
a 95% confidence interval for the total real estate farm loans in the United States.
Hint: Divide the population into four random groups of unequal sizes.
Practical 5.6. The demand of fish for human consumption creates the need to
estimate the total number of fish of all kinds caught by recreational fishermen of the
Atlantic and Gulf coasts during 1994. Population 4 in the Appendix shows that the
information on the number of different kinds of fish caught during 1992 is
available. Use the known information on the number of fish caught during 1992 to
select a sample of seven units by using PPSWOR sampling (Midzuno--Sen
Sampling Scheme). Collect the required information from population 4 to estimate
the total number of fish caught during 1994. Apply the Sen--Yates--Grundy
estimator of variance to construct a 95% confidence interval.
Practical 5.7. Mr. Mario was interested in estimating the total value of production of
the soybeans for beans in all the 30 states of the United States of America growing
this particular crop. Mr. Mario selected a sample of n = 5 states based on some prior
information and listed the first order ($\pi_i$) and second order ($\pi_{ij}$) inclusion
probabilities based on PPSWOR sampling as

Value   pi_i   i   pi_i1   pi_i2   pi_i3   pi_i4   pi_i5
5430    0.55   1   ?       ?       ?       ?       ?
7009    0.58   2   0.25    ?       ?       ?       ?
7180    0.60   3   0.27    0.29    ?       ?       ?
6062    0.63   4   0.28    0.30    0.32    ?       ?
4054    0.65   5   0.30    0.32    0.32    0.36    ?
( a ) Apply the Horvitz and Thompson estimator to estimate the total value of
production of the crop.
(b) Construct a 95% confidence interval of the total value of production based on
the Sen--Yates--Grundy estimator of variance.
( c ) Construct a 95% confidence interval of the total value of production based on
usual estimator of variance .
( d ) Comment on the confidence interval estimates .
Practical 5.8. Calibration weights are found to be useful in survey sampling. Find
the calibration weights for the units selected in the sample using PPSWOR
sampling by making use of known information about the number of fish caught
during 1992 and 1993 as auxiliary variables. Use the chi square distance function
between the design weights and calibrated weights. Discuss situations where these
weights lead to the GREG and traditional linear regression estimator for estimating
the total number of fish caught during 1995. Construct a 95% confidence interval.
Practical 5.9. The following table provides a set of first order ($\pi_i$) and second
order ($\pi_{ij}$) inclusion probabilities based on a population of size N = 5 units with
values X and selection probabilities $P_i$.

Data   Selection       First order inclusion        Second order inclusion probabilities pi_ij
X_i    probabilities   probabilities pi_i      i    j=1     j=2     j=3     j=4     j=5
       P_i
50     0.10            0.55                    1    d_1     ?       ?       ?       ?
75     0.15            0.58                    2    0.25    d_2     ?       ?       ?
100    0.20            0.60                    3    0.27    0.29    d_3     ?       ?
125    0.25            0.63                    4    0.28    0.30    0.32    d_4     ?
Practical 5.10. A psychologist wants to know if the birth rate of left handed girls
and left handed boys is the same in a town having eight elementary schools. The
psychologist used the prior information on the number of registered boys and girls
to select two samples (one of boys and another of girls), each consisting of n = 3 units,
by using PPSWOR sampling. The information collected by him is given in the
following table:
j
2
( a ) Derive a 75% confidence interval estimate for total number of left handed
boys using the Horvitz and Thompson estimator and usual estimator of its variance .
(b) Derive a 75% confidence interval estimate for total number of left handed girls
using the Horvitz and Thompson estimator and usual estimator of its variance .
( c ) Assuming that both samples are independent, use ( a ) and ( b ) to estimate the
difference between total number of left handed boys and girls, and derive 75%
confidence interval estimate for the same.
( d ) Derive a 75% confidence interval estimate for total number of left handed
boys using the Horvitz and Thompson estimator and the Sen--Yates--Grundy
estimator of its variance .
( e ) Derive a 75% confidence interval estimate for total number of left handed girls
using the Horvitz and Thompson estimator and the Sen--Yates-Grundy estimator
of its variance.
( f) Assuming that both samples are independent, use ( d ) and ( e ) to estimate the
difference between total number of left handed boys and girls, and derive 75%
confidence interval estimate for the same.
( g ) Comment on the confidence interval estimates obtained.
Hint: Ghosh (1998) .
Miscellaneous:
( h ) For each one of the sampling plans ( a ) to ( d ), from each sample estimate the
population total using the Horvitz and Thompson estimator and test for unbiasedness.
( i ) For each one of the sampling plans ( a ) to ( d ), from each sample estimate the
variance using the usual estimator $\hat{v}(\hat{Y}_{HT})$, and test for unbiasedness.
( j ) For each one of the sampling plans ( a ) to ( d ), from each sample estimate the
variance using the Sen--Yates--Grundy estimator $\hat{v}_{SYG}(\hat{Y}_{HT})$, and test for
unbiasedness.
( e ) Find $\hat{v}_{SYG}(\hat{Y}_G)$ using the low level calibration approach and construct 95%
confidence interval estimates.
( f ) Assume that $V_{SYG}(\hat{X}_{HT})$ is known and construct a 95% confidence interval
estimate using the higher order calibration approach.
( g ) Discuss the estimates.
such that
$P(S_1) = 0.25$, $P(S_2) = 0.25$, $P(S_3) = 0.25$, $P(S_4) = 0.25$
( a ) Find the first order inclusion probabilities for Melissa's sampling scheme.
( b ) Find the second order inclusion probabilities for Melissa's sampling
scheme.
( c ) Find the variance of Melissa's sampling plan using the Sen--Yates--Grundy
formula.
( d ) Find the variance of Melissa's sampling plan using the usual formula of the
variance.
( e ) Does the relation $V(\hat{Y}_{HT}) = V_{SYG}(\hat{Y}_{HT})$ hold for Melissa's sampling plan?
( II ) Stephanie likes Sarjinder and cleverly suggests the following changes in the
above sampling plan:
$P(S_1) = 0.20$, $P(S_2) = 0.25$, $P(S_3) = 0.24$, $P(S_4) = 0.31$
( f ) Find the first order inclusion probabilities for Stephanie's sampling scheme.
( g ) Find the second order inclusion probabilities for Stephanie's sampling scheme.
( h ) Find the variance of Stephanie's sampling plan using the usual formula of
the variance.
( i ) Does the relation $V(\hat{Y}_{HT}) = V_{SYG}(\hat{Y}_{HT})$ hold for Stephanie's sampling plan?
( j ) Find the relative efficiency of Stephanie's cleverness over Melissa's
sampling plan.
( k ) What is your opinion about Stephanie's sampling scheme?
( l ) Can you estimate the variance using either formula, $\hat{v}(\hat{Y}_{HT})$ or $\hat{v}_{SYG}(\hat{Y}_{HT})$,
for Melissa's (or Stephanie's) sampling scheme? Give your opinion.
Practical 5.15. Stephen and Sarjinder were appointed to select four students
(n = 4) out of a multicultural list of six students (N = 6) with different cultural
backgrounds as: $\Omega$ = {Poonam, Quang, Ryan, Stephanie, Tom, Udeesh} and GPA 3.70,
3.20, 3.80, 3.72, 3.90 and 3.92, respectively.
( I ) Stephen considers the following sampling scheme consisting of only four
possibilities:
$S_1$ = {Poonam, Quang, Ryan, Stephanie}, $S_2$ = {Quang, Ryan, Tom, Udeesh},
$S_3$ = {Ryan, Stephanie, Tom, Udeesh}, and $S_4$ = {Poonam, Stephanie, Tom, Udeesh},
such that
$p(S_t) = 0.25$ $\forall\, t = 1, 2, 3, 4$.
( a ) Find the first order inclusion probabilities for Stephen's sampling scheme, and
derive the estimates of average GPA from each sample.
( b ) Find the bias in Stephen's sampling scheme.
( c ) Find the second order inclusion probabilities for Stephen's sampling scheme,
and derive the variance using:
( i ) the Sen--Yates--Grundy formula, ( ii ) the usual formula, and ( iii ) the
definition.
( d ) Are the three variances equal for Stephen's sampling scheme?
( II ) Sarjinder likes both Poonam and Stephanie and suggests the following
changes in Stephen's sampling scheme as:
$p(S_t) = 0.50$ for $t = 1, 4$, and $p(S_t) = 0.00$ for $t = 2, 3$.
( h ) Find the first order inclusion probabilities for Sarjinder's sampling scheme,
and the estimates of average GPA from each sample.
( i ) Find the bias in Sarjinder's sampling scheme.
( j ) Find the second order inclusion probabilities for Sarjinder's sampling scheme,
and derive the variance using:
( i ) the Sen--Yates--Grundy formula, ( ii ) the usual formula, and ( iii ) the definition.
( k ) Are the three variances equal for Sarjinder's sampling scheme?
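The quantities asked for in Practical 5.15 can be checked mechanically once the design is written down. The sketch below is only an illustration: it takes the four samples and GPAs exactly as listed in the practical, reads "usual formula" as the Horvitz and Thompson variance form (an assumption), and applies the Sen--Yates--Grundy form in its fixed-sample-size version. The function name `analyse` and the dictionary layout are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

gpa = {"Poonam": 3.70, "Quang": 3.20, "Ryan": 3.80,
       "Stephanie": 3.72, "Tom": 3.90, "Udeesh": 3.92}
units = list(gpa)
samples = [
    ("Poonam", "Quang", "Ryan", "Stephanie"),   # S1
    ("Quang", "Ryan", "Tom", "Udeesh"),         # S2
    ("Ryan", "Stephanie", "Tom", "Udeesh"),     # S3
    ("Poonam", "Stephanie", "Tom", "Udeesh"),   # S4
]
stephen = [Fraction(1, 4)] * 4
sarjinder = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2)]


def analyse(probs):
    """Inclusion probabilities, HT estimates of the mean GPA, bias and variances."""
    N = len(units)
    pi = {u: sum(p for s, p in zip(samples, probs) if u in s) for u in units}
    pij = {(u, v): sum(p for s, p in zip(samples, probs) if u in s and v in s)
           for u, v in combinations(units, 2)}
    z = {u: gpa[u] / float(pi[u]) for u in units}          # y_i / pi_i
    est = [sum(z[u] for u in s) / N for s in samples]      # HT estimate of the mean
    expected = sum(float(p) * e for p, e in zip(probs, est))
    bias = expected - sum(gpa.values()) / N
    # Variance by definition: E[(estimate - E(estimate))^2] over the design.
    v_def = sum(float(p) * (e - expected) ** 2 for p, e in zip(probs, est))
    # Horvitz-Thompson ("usual") form of the variance of the estimated total.
    v_usual = sum(float(pi[u] * (1 - pi[u])) * z[u] ** 2 for u in units)
    v_usual += 2 * sum(float(pij[k] - pi[k[0]] * pi[k[1]]) * z[k[0]] * z[k[1]]
                       for k in pij)
    # Sen-Yates-Grundy form (valid here because every sample has size 4).
    v_syg = sum(float(pi[k[0]] * pi[k[1]] - pij[k]) * (z[k[0]] - z[k[1]]) ** 2
                for k in pij)
    return {"pi": pi, "pij": pij, "estimates": est, "bias": bias,
            "v_def": v_def, "v_usual": v_usual / N ** 2, "v_syg": v_syg / N ** 2}
```

Calling `analyse(stephen)` and `analyse(sarjinder)` gives the ingredients for parts (a) to (d) and (h) to (k). Under Sarjinder's design some second order inclusion probabilities are zero, which is worth keeping in mind when thinking about whether the variance can be estimated from a single sample.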
6.0 INTRODUCTION
In the previous chapters we have seen that the use of known auxiliary information at
the estimation stage as well as at the selection stage leads to improved estimation
strategies in survey sampling. When such information is not completely known or is
lacking, and it is relatively cheap to obtain information on the auxiliary
variable(s), one can consider taking a large preliminary sample for estimating the
population mean(s) of the auxiliary variable(s) to be used at the estimation or
selection stage of the ultimate estimation strategies. For example, in the case of a
single auxiliary variable x, since it is cheaper to obtain information on x, we
consider taking a large preliminary sample for estimating the population mean $\bar{X}$ or
the distribution of X, as the case may be, and only a small sample (sometimes a
sub-sample) for measuring the study variable Y.
[Figure: Population of N units; preliminary large sample of m units; sample of n units.]
This could mean devoting a part of the resources to this large preliminary sample
and, therefore, a reduction in sample size for measuring the study variable. This
sampling technique is called double sampling or two-phase sampling and was
invented by Neyman (1938). In cases in which the sample for the main survey is
selected in three or more phases, the sampling procedure is called three-phase or
multi-phase sampling. This procedure is advantageous when the gain in precision is
substantial as compared to the increase in cost owed to the collection of information on
the auxiliary variate for large samples.
(a) SRSWOR scheme at the first as well as second phases of the sample selection ;
( b ) SRSWOR scheme at the first phase and PPSWR sampling at the second phase ;
( c) PPSWOR scheme at first as well as second phases of the sample selection .
In the following section , we will discuss a situation when simple random without
replacement sampling is applied at both phases of the sample selection.
Under this strategy we would like to discuss the usual ratio and regression type
strategies using one and two auxiliary variables. Before proceeding further, it is
necessary to define notation and expected values, which will remain useful
throughout this chapter.
Let $(x_1^*, x_2^*, \ldots, x_m^*)$ be the first phase sample $s_1$ (say) drawn by simple random
sampling from the population of N units and let only the auxiliary variable X be
measured. Also, let $(y_1, y_2, \ldots, y_n)$ and $(x_1, x_2, \ldots, x_n)$ denote, respectively, the second
phase sample $s_2$ (say) drawn by simple random sampling from the first phase
sample for the study variable Y and auxiliary variable X.
Let
\[ \bar{y} = n^{-1}\sum_{i=1}^{n} y_i, \quad \bar{x} = n^{-1}\sum_{i=1}^{n} x_i, \quad \bar{x}^* = m^{-1}\sum_{i=1}^{m} x_i^*, \quad s_x^2 = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \]
\[ s_y^2 = (n-1)^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 \quad \text{and} \quad s_x^{*2} = (m-1)^{-1}\sum_{i=1}^{m}(x_i^* - \bar{x}^*)^2 . \]
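The notation above can be made concrete with a small sketch. It draws a first phase SRSWOR sample of size m, sub-samples n of those units by SRSWOR, and computes the six statistics just defined. The function name `two_phase_srswor`, the seed argument and the returned dictionary keys are illustrative assumptions, not part of the text.

```python
import random


def two_phase_srswor(x_pop, y_pop, m, n, seed=0):
    """Draw a two-phase SRSWOR sample and compute the summary statistics."""
    rng = random.Random(seed)
    first = rng.sample(range(len(x_pop)), m)   # first phase: only x is observed
    second = rng.sample(first, n)              # second phase: sub-sample, x and y observed
    x_star = [x_pop[i] for i in first]
    x = [x_pop[i] for i in second]
    y = [y_pop[i] for i in second]
    x_bar_star = sum(x_star) / m
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return {
        "first": first,
        "second": second,
        "x_bar_star": x_bar_star,
        "x_bar": x_bar,
        "y_bar": y_bar,
        "s2_x": sum((v - x_bar) ** 2 for v in x) / (n - 1),
        "s2_y": sum((v - y_bar) ** 2 for v in y) / (n - 1),
        "s2_x_star": sum((v - x_bar_star) ** 2 for v in x_star) / (m - 1),
    }
```

Because the second phase units are drawn from the first phase list, every second phase unit also contributes to $\bar{x}^*$, which is the nesting the later expected values rely on.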
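Defining
\[ \epsilon_0 = \frac{\bar{y}}{\bar{Y}} - 1, \quad \epsilon_1 = \frac{\bar{x}}{\bar{x}^*} - 1, \quad \epsilon_2 = \frac{\bar{x}}{\bar{X}} - 1, \quad \epsilon_3 = \frac{\bar{x}^*}{\bar{X}} - 1, \quad \delta_0 = \frac{s_y^2}{S_y^2} - 1 \quad \text{and} \quad \delta_1 = \frac{s_x^2}{s_x^{*2}} - 1 \]
such that
\[ E(\epsilon_0) = E(\epsilon_2) = E(\epsilon_3) = E(\delta_0) = 0, \]
\[ E(\epsilon_0^2) = \left(\frac{1}{n} - \frac{1}{N}\right)C_y^2, \quad E(\epsilon_1^2) = \left(\frac{1}{n} - \frac{1}{m}\right)C_x^2, \quad E(\epsilon_2^2) = \left(\frac{1}{n} - \frac{1}{N}\right)C_x^2, \quad E(\epsilon_3^2) = \left(\frac{1}{m} - \frac{1}{N}\right)C_x^2, \]
\[ E(\epsilon_0\epsilon_1) = \left(\frac{1}{n} - \frac{1}{m}\right)\rho_{xy}C_yC_x, \quad E(\epsilon_0\epsilon_2) = \left(\frac{1}{n} - \frac{1}{N}\right)\rho_{xy}C_yC_x, \quad E(\epsilon_0\epsilon_3) = \left(\frac{1}{m} - \frac{1}{N}\right)\rho_{xy}C_yC_x, \]
\[ E(\delta_0^2) = \left(\frac{1}{n} - \frac{1}{N}\right)(\lambda_{40} - 1), \quad E(\delta_1^2) = \left(\frac{1}{n} - \frac{1}{m}\right)(\lambda_{04} - 1), \]
and
\[ E(\epsilon_2\epsilon_3) = \left(\frac{1}{m} - \frac{1}{N}\right)C_x^2, \quad E(\epsilon_0\delta_1) = \left(\frac{1}{n} - \frac{1}{m}\right)C_y\lambda_{12}, \quad E(\epsilon_1\delta_1) = \left(\frac{1}{n} - \frac{1}{m}\right)C_x\lambda_{03}, \]
\[ E(\epsilon_0\delta_0) = \left(\frac{1}{n} - \frac{1}{N}\right)C_y\lambda_{30}, \quad E(\delta_0\epsilon_1) = \left(\frac{1}{n} - \frac{1}{m}\right)C_x\lambda_{21}, \quad \text{and} \quad E(\delta_0\delta_1) = \left(\frac{1}{n} - \frac{1}{m}\right)(\lambda_{22} - 1) \]
where
\[ \lambda_{rs} = \frac{\mu_{rs}}{\mu_{20}^{r/2}\,\mu_{02}^{s/2}}, \quad C_x^2 = \frac{\mu_{02}}{\bar{X}^2}, \quad \rho_{xy} = \frac{\mu_{11}}{\sqrt{\mu_{20}\,\mu_{02}}} \]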
for
Note that some of these expected values are true only up to the first order of
approximation.
The above results can easily be proven on the lines of the following two theorems :
Then we have
V(EO) = E(E6)- {E(EO)Y = E(E6)
= E,V2[EO I first phase sample] + VjE 2[EO I first phase sample]
= E'[(~-~)
nmy x
~;~.] '" (~-~)
nmYX
!x~ = (~-~)Px
nm
C CX'
YY
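The ratio estimator $\bar{y}_{rd}$ of $\bar{Y}$ in two-phase or double sampling takes the form
\[ \bar{y}_{rd} = \bar{y}\left(\frac{\bar{x}^*}{\bar{x}}\right). \]
Assuming that $|\epsilon_i| < 1$, $i = 0, 1$, and using the binomial expansion
\[ (1 + \epsilon_1)^{-1} = 1 - \epsilon_1 + \epsilon_1^2 - \cdots \]
the above estimator can easily be written as
\[ \bar{y}_{rd} = \bar{Y}(1 + \epsilon_0)(1 + \epsilon_1)^{-1} = \bar{Y}(1 + \epsilon_0)(1 - \epsilon_1 + \epsilon_1^2 - \cdots) = \bar{Y}\left[1 + \epsilon_0 - \epsilon_1 + \epsilon_1^2 - \epsilon_0\epsilon_1 + \cdots\right]. \quad (6.1.1.2) \]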
Thus we have the following theorems:
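Theorem 6.1.1.1. The bias in the ratio estimator $\bar{y}_{rd}$, to the first order of
approximation, is given by
\[ B(\bar{y}_{rd}) = \left(\frac{1}{n} - \frac{1}{m}\right)\bar{Y}\left(C_x^2 - \rho_{xy}C_xC_y\right). \]
Proof. Taking expected values on both sides of (6.1.1.2), we have
\[ E(\bar{y}_{rd}) = \bar{Y}\left[1 + 0 - 0 + \left(\frac{1}{n} - \frac{1}{m}\right)\left(C_x^2 - \rho_{xy}C_xC_y\right)\right]. \]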
Chapter 6.: Use of auxiliary information: Multi-Phase Sampling 533
Thus the bias in the estimator $\bar{y}_{rd}$, to the first order of approximation, is given by
\[ B(\bar{y}_{rd}) = E(\bar{y}_{rd}) - \bar{Y} = \left(\frac{1}{n} - \frac{1}{m}\right)\bar{Y}\left(C_x^2 - \rho_{xy}C_xC_y\right). \]
Hence the theorem.
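Theorem 6.1.1.2. The mean squared error of the estimator $\bar{y}_{rd}$, to the first order of
approximation, is
\[ MSE(\bar{y}_{rd}) = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left[S_y^2 + R^2S_x^2 - 2RS_{xy}\right]. \]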
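A sketch of how this mean squared error follows from the expansion (6.1.1.2) is given below. It keeps only first order terms and assumes the expected values $E(\epsilon_0^2) = (1/n - 1/N)C_y^2$, $E(\epsilon_1^2) = (1/n - 1/m)C_x^2$ and $E(\epsilon_0\epsilon_1) = (1/n - 1/m)\rho_{xy}C_xC_y$, the last two being the ones the bias in Theorem 6.1.1.1 relies on; $R = \bar{Y}/\bar{X}$.

```latex
\begin{aligned}
MSE(\bar{y}_{rd}) &= E\left(\bar{y}_{rd} - \bar{Y}\right)^2
  \approx \bar{Y}^2 E\left(\epsilon_0 - \epsilon_1\right)^2
  = \bar{Y}^2 E\left(\epsilon_0^2 + \epsilon_1^2 - 2\epsilon_0\epsilon_1\right) \\
 &= \bar{Y}^2\left[\left(\frac{1}{n} - \frac{1}{N}\right)C_y^2
   + \left(\frac{1}{n} - \frac{1}{m}\right)\left(C_x^2 - 2\rho_{xy}C_xC_y\right)\right] \\
 &= \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2
   + \left(\frac{1}{n} - \frac{1}{m}\right)\left[S_y^2 + R^2S_x^2 - 2RS_{xy}\right].
\end{aligned}
```

The last step uses $\bar{Y}^2C_y^2 = S_y^2$, $\bar{Y}^2C_x^2 = R^2S_x^2$, $\bar{Y}^2\rho_{xy}C_xC_y = RS_{xy}$ and the split $(1/n - 1/N) = (1/m - 1/N) + (1/n - 1/m)$.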
Corollary 6.1.1.1. An estimator of the mean squared error of the estimator $\bar{y}_{rd}$, to
the first order of approximation, is
\[ \widehat{MSE}(\bar{y}_{rd}) = \left(\frac{1}{m} - \frac{1}{N}\right)s_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left[s_y^2 + r^2s_x^2 - 2rs_{xy}\right]. \quad (6.1.1.5) \]
Example 6.1.1.1. From population 1 in the Appendix select a first phase sample of
10 units by SRSWOR sampling and note only the nonreal estate farm loans from
the selected units in the sample. From the selected first phase sample of 10 units,
select a sub-sample of 5 units and note the real estate farm loans as well as nonreal
estate farm loans. Estimate the average real estate farm loans by using the ratio
estimator in two-phase sampling. Deduce the 95% confidence interval.
Solution. We used the first two columns of the Pseudo-Random Number (PRN)
Table 1 given in the Appendix to select an SRSWOR sample of m = 10 units. The
following 10 distinct random numbers 01, 23, 46, 04, 32, 47, 33, 05, 22, and 38
between 1 and 50 resulted in the following first phase sample.

Sr. No.     1        2         3        4        5        6         7        8         9        10
Pop. unit   01       23        46       04       32       47        33       05        22       38
State       AL       MN        VA       AR       NY       WA        NC       CA        MI       PA
x_i*        348.334  2466.892  188.477  848.317  426.274  1228.607  494.730  3928.732  440.518  298.351

We used the 7th and 8th columns of the Pseudo-Random Numbers to select a second
phase sample of n = 5 units from the above list of selected first phase sample units.
The following five distinct random numbers between 1 and 10 were observed: 07,
09, 01, 02 and 03. Thus the second phase sample consists of the following
information:
Second phase sample information
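Note that
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{3938.951}{5} = 787.7902, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{3047.928}{5} = 609.5856, \]
\[ \bar{y}_{rd} = \bar{y}\left(\frac{\bar{x}^*}{\bar{x}}\right) = 609.5856\left(\frac{1066.9232}{787.7902}\right) = 825.576, \]
and the estimate of the mean squared error is
\[ \widehat{MSE}(\bar{y}_{rd}) = \left(\frac{1}{10} - \frac{1}{50}\right) \times 190375.025 + \left(\frac{1}{5} - \frac{1}{10}\right)\left[190375.025 + 0.7738^2 \times 894540.79 - 2 \times 0.7738 \times 400681.275\right] = 25820.171 . \]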
The $(1 - \alpha)100\%$ confidence interval estimate for the average real estate farm loans
in the US during 1997 is
\[ \bar{y}_{rd} \mp t_{\alpha/2}(df = n - 1)\sqrt{\widehat{MSE}(\bar{y}_{rd})} . \]
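The arithmetic of this example can be retraced with a few lines of code. The sketch below takes the first phase values and the second phase summaries quoted in the example, and assumes the tabulated value $t_{0.025}(4) = 2.776$ for the 95% interval; the variable names are illustrative. Small differences from the printed figures are expected because the text rounds the ratio to 0.7738.

```python
import math

# First phase sample (m = 10): nonreal estate farm loans, as listed in the example.
x_first = [348.334, 2466.892, 188.477, 848.317, 426.274,
           1228.607, 494.730, 3928.732, 440.518, 298.351]
m, n, N = 10, 5, 50

# Second phase summaries quoted in the example.
x_bar, y_bar = 787.7902, 609.5856
s2_y, s2_x, s_xy = 190375.025, 894540.79, 400681.275

x_bar_star = sum(x_first) / m                 # first phase mean of x
r = y_bar / x_bar                             # sample ratio
y_rd = y_bar * x_bar_star / x_bar             # two-phase ratio estimate of the mean

# Estimator of the MSE, to the first order of approximation.
mse_hat = (1 / m - 1 / N) * s2_y + (1 / n - 1 / m) * (s2_y + r ** 2 * s2_x - 2 * r * s_xy)

t_value = 2.776                               # assumed: t_{0.025} with n - 1 = 4 df
half_width = t_value * math.sqrt(mse_hat)
ci = (y_rd - half_width, y_rd + half_width)
```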
Using Table 2 from the Appendix, the 95% confidence interval for the average real
estate farm loans in the US during 1997 is given by
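Theorem 6.1.1.3. The minimum MSE of the ratio estimator $\bar{y}_{rd}$ for the fixed cost
C given by (6.1.1.6) is
\[ Min.MSE(\bar{y}_{rd}) = \left[\frac{\sqrt{C_1}\,D}{(C - C_0)\sqrt{S_y^2 - V_R}} - \frac{1}{N}\right]S_y^2 + \frac{D}{(C - C_0)}\left[\frac{\sqrt{C_2V_R\left(S_y^2 - V_R\right)} - \sqrt{C_1}\,V_R}{\sqrt{S_y^2 - V_R}}\right], \quad (6.1.1.7) \]
where
\[ V_R = S_y^2 + R^2S_x^2 - 2RS_{xy} \quad (6.1.1.8) \]
and
\[ D = \sqrt{C_1\left(S_y^2 - V_R\right)} + \sqrt{C_2V_R} . \]
Proof. With the cost function $C = C_0 + C_1m + C_2n$, consider the Lagrange function
\[ L = MSE(\bar{y}_{rd}) + \lambda\left(C_0 + C_1m + C_2n - C\right). \quad (6.1.1.10) \]
On differentiating (6.1.1.10) with respect to m and equating to zero we have
\[ \frac{\partial L}{\partial m} = -\frac{S_y^2}{m^2} + \frac{V_R}{m^2} + \lambda C_1 = 0, \]
which implies that
\[ m = \sqrt{S_y^2 - V_R}\Big/\left\{\sqrt{\lambda}\sqrt{C_1}\right\}. \quad (6.1.1.11) \]
On differentiating (6.1.1.10) with respect to n and equating to zero we have
\[ \frac{\partial L}{\partial n} = -\frac{V_R}{n^2} + \lambda C_2 = 0 \]
which implies that
\[ n = \sqrt{V_R}\Big/\left\{\sqrt{\lambda}\sqrt{C_2}\right\}. \quad (6.1.1.12) \]
On substituting these values of m and n in equation (6.1.1.6) we have
\[ \sqrt{\lambda} = \left\{\sqrt{C_1\left(S_y^2 - V_R\right)} + \sqrt{C_2V_R}\right\}\Big/(C - C_0) = D/(C - C_0) \ \text{(say)}. \quad (6.1.1.13) \]
On substituting this value of $\lambda$ in (6.1.1.11) and (6.1.1.12) we obtain the optimum
sample sizes
\[ m = \left\{(C - C_0)\sqrt{S_y^2 - V_R}\right\}\Big/\left\{\sqrt{C_1}\,D\right\} \quad (6.1.1.14) \]
and
\[ n = \left\{(C - C_0)\sqrt{V_R}\right\}\Big/\left\{\sqrt{C_2}\,D\right\}. \quad (6.1.1.15) \]
On substituting (6.1.1.14) and (6.1.1.15) in (6.1.1.4) we obtain (6.1.1.7). Hence the
theorem.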
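The optimum sample sizes are easy to evaluate numerically. The sketch below assumes the linear cost function $C = C_0 + C_1m + C_2n$ (overhead plus per-unit costs at each phase) and the two-phase MSE in the form $(1/m - 1/N)S_y^2 + (1/n - 1/m)V_R$; the function names are hypothetical.

```python
import math


def optimum_two_phase_sizes(C, C0, C1, C2, S2y, VR):
    """Optimum first phase (m) and second phase (n) sizes for a fixed total cost C."""
    D = math.sqrt(C1 * (S2y - VR)) + math.sqrt(C2 * VR)
    m = (C - C0) * math.sqrt(S2y - VR) / (math.sqrt(C1) * D)
    n = (C - C0) * math.sqrt(VR) / (math.sqrt(C2) * D)
    return m, n, D


def two_phase_mse(m, n, N, S2y, VR):
    """First order MSE of a two-phase estimator with second phase component VR."""
    return (1 / m - 1 / N) * S2y + (1 / n - 1 / m) * VR
```

For instance, with C = 5000, C0 = 2000, C1 = 50, C2 = 500, $S_y^2$ = 342021.5 and $V_R$ = 167661.41, `optimum_two_phase_sizes` returns sizes that exhaust the budget exactly, and at those sizes `two_phase_mse` equals $D^2/(C - C_0) - S_y^2/N$.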
Theorem 6.1.1.4. The minimum cost C for the fixed MSE(Yrd) = Vo (say) is
and
\[ V(\bar{y}) = \left(\frac{C_2}{C - C_0} - \frac{1}{N}\right)S_y^2 . \quad (6.1.1.23) \]
Solution. Under SRSWOR sampling we know that
\[ V(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right)S_y^2 . \quad (6.1.1.24) \]
Example 6.1.1.2. The amounts of the real and nonreal estate farm loans (in $000)
during 1997 in the 50 states of the United States have been presented in population
1 of the Appendix. Suppose we selected first phase and second phase samples
of size 10 and 5, respectively.
( a ) Find the relative efficiency of the ratio estimator, for estimating the average
amount of real estate farm loans during 1997 by using information selected in the
first phase sample only on the nonreal estate farm loans during 1997, with respect
to the usual estimator of population mean.
( b ) Suppose a budget of US$5000 is available to spend on the survey, $2000 of
which will be the overhead cost. Suppose selection, compilation, and analysis of
one unit in the first phase sample cost $50, whereas for the second phase unit is
$500. Find the optimum values of the first phase and second phase sample sizes.
Also find the relative efficiency of the ratio estimator over the sample mean for the
fixed cost.
( c ) What will be the minimum cost for attaining a 30% relative standard deviation?
Solution. From the description of the population we have
$Y_i$ = Amount ($000) of the real estate farm loans during 1997.
\[ MSE(\bar{y}_{rd}) = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left[S_y^2 + R^2S_x^2 - 2RS_{xy}\right] \]
\[ = \left(\frac{1}{10} - \frac{1}{50}\right) \times 342021.5 + \left(\frac{1}{5} - \frac{1}{10}\right) \times 167661.41 = 44127.86 . \]
Thus the relative efficiency of the ratio estimator over the sample mean is given by
\[ RE = \frac{V(\bar{y})}{MSE(\bar{y}_{rd})} \times 100 = \frac{61563.87}{44127.86} \times 100 = 139.51\% . \]
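(b) We have
$C = 5000$, $C_0 = 2000$, $C_1 = 50$, $C_2 = 500$,
\[ V_R = S_y^2 + R^2S_x^2 - 2RS_{xy}, \]
\[ m = \left\{(C - C_0)\sqrt{S_y^2 - V_R}\right\}\Big/\left\{\sqrt{C_1}\,D\right\} = \frac{(5000 - 2000)\sqrt{342021.5 - 167661.41}}{\sqrt{50} \times 12108.53} = 14.6 \approx 15, \]
and
\[ n = \left\{(C - C_0)\sqrt{V_R}\right\}\Big/\left\{\sqrt{C_2}\,D\right\} = \frac{(5000 - 2000)\sqrt{167661.41}}{\sqrt{500} \times 12108.53} = 4.5 \approx 5 . \]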
Thus the minimum cost of the survey for the fixed precision will be
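Theorem 6.1.2.2. The minimum variance of the difference estimator $\bar{y}_{dd}$ of
population mean $\bar{Y}$ is
\[ Min.V(\bar{y}_{dd}) = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)S_y^2\left(1 - \rho_{xy}^2\right). \quad (6.1.2.3) \]
Proof. We have
\[ V(\bar{y}_{dd}) = E\left[\bar{y}_{dd} - E(\bar{y}_{dd})\right]^2 = E\left[\bar{Y}(1 + \epsilon_0) + d\bar{X}(\epsilon_3 - \epsilon_2) - \bar{Y}\right]^2 = E\left[\bar{Y}\epsilon_0 + d\bar{X}(\epsilon_3 - \epsilon_2)\right]^2 \]
\[ = E\left[\bar{Y}^2\epsilon_0^2 + d^2\bar{X}^2\left(\epsilon_3^2 + \epsilon_2^2 - 2\epsilon_2\epsilon_3\right) + 2d\bar{Y}\bar{X}\left(\epsilon_0\epsilon_3 - \epsilon_0\epsilon_2\right)\right] \]
\[ = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2C_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left[d^2\bar{X}^2C_x^2 - 2d\bar{Y}\bar{X}\rho_{xy}C_xC_y\right]. \quad (6.1.2.4) \]
On differentiating (6.1.2.4) with respect to d and equating to zero we have
\[ d = \rho_{xy}\frac{C_y}{C_x}\left(\frac{\bar{Y}}{\bar{X}}\right) = \frac{S_{xy}}{S_x^2} . \quad (6.1.2.5) \]
On substituting (6.1.2.5) in (6.1.2.4) we obtain (6.1.2.3). Hence the theorem.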
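The two steps at the end of this proof can be written out explicitly. The sketch below assumes the variance has the form $(1/n - 1/N)\bar{Y}^2C_y^2 + (1/n - 1/m)[d^2\bar{X}^2C_x^2 - 2d\bar{Y}\bar{X}\rho_{xy}C_xC_y]$ and uses $\bar{Y}^2C_y^2 = S_y^2$.

```latex
\begin{aligned}
\frac{\partial V(\bar{y}_{dd})}{\partial d}
  &= \left(\frac{1}{n} - \frac{1}{m}\right)\left[2d\bar{X}^2C_x^2 - 2\bar{Y}\bar{X}\rho_{xy}C_xC_y\right] = 0
  \;\Longrightarrow\;
  d = \rho_{xy}\frac{C_y}{C_x}\frac{\bar{Y}}{\bar{X}} = \frac{S_{xy}}{S_x^2}, \\
Min.V(\bar{y}_{dd})
  &= \left(\frac{1}{n} - \frac{1}{N}\right)S_y^2 - \left(\frac{1}{n} - \frac{1}{m}\right)\rho_{xy}^2S_y^2
   = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)S_y^2\left(1 - \rho_{xy}^2\right).
\end{aligned}
```

The second line follows because, at the optimum, $d^2\bar{X}^2C_x^2 = \rho_{xy}^2\bar{Y}^2C_y^2$ and $2d\bar{Y}\bar{X}\rho_{xy}C_xC_y = 2\rho_{xy}^2\bar{Y}^2C_y^2$.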
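6.1.3 REGRESSION ESTIMATOR
The exact distribution of the regression estimator in two-phase sampling has been
derived by Causeur (1999). We consider the regression estimator of population
mean in two-phase sampling in the following theorem:
Theorem 6.1.3.1. Find the asymptotic MSE of the linear regression estimator of
population mean $\bar{Y}$ in two-phase sampling
\[ \bar{y}_{lrd} = \bar{y} + \hat{\beta}\left(\bar{x}^* - \bar{x}\right) \quad (6.1.3.1) \]
where $\hat{\beta} = s_{xy}/s_x^2$ is an estimator of the regression coefficient $\beta = S_{xy}/S_x^2$.
Proof. Let us define $\eta = \hat{\beta}/\beta - 1$, such that $E(\eta) \approx 0$. Then the estimator $\bar{y}_{lrd}$ of the
population mean $\bar{Y}$ can be written as
\[ \bar{y}_{lrd} = \bar{Y}(1 + \epsilon_0) + \beta\bar{X}(1 + \eta)(\epsilon_3 - \epsilon_2) = \bar{Y}(1 + \epsilon_0) + \beta\bar{X}(\epsilon_3 - \epsilon_2)(1 + \eta) . \]
Thus the mean squared error, to the first order of approximation, is
\[ MSE(\bar{y}_{lrd}) = E\left[\bar{y}_{lrd} - \bar{Y}\right]^2 \approx E\left[\bar{Y}^2\epsilon_0^2 + \beta^2\bar{X}^2\left(\epsilon_3^2 + \epsilon_2^2 - 2\epsilon_2\epsilon_3\right) + 2\beta\bar{Y}\bar{X}\left(\epsilon_0\epsilon_3 - \epsilon_0\epsilon_2\right)\right] \]
\[ = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left\{S_y^2 + \beta^2S_x^2 - 2\beta S_{xy}\right\} \]
\[ = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)S_y^2\left(1 - \rho_{xy}^2\right). \]
Hence the theorem.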
Example 6.1.3.1. The amounts of real and nonreal estate farm loans (in $000)
during 1997 in the United States have been presented in population 1 of the
Appendix. Suppose we selected first phase and second phase samples of size
10 and 5, respectively. Find the relative efficiency of the regression estimator, for
estimating the average amount of the real estate farm loans during 1997 using
information selected in the first phase sample only on the nonreal estate farm loans
during 1997, with respect to the ratio estimator of population mean.
Solution. Continuing from Example 6.1.1.2, we have $MSE(\bar{y}_{rd}) = 44127.86$. Also the
mean squared error of the regression estimator is given by
Theorem 6.1.4.1. Consider a general class of ratio type estimators Ygd of the
population mean Y in two-phase sampling is
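\[ Min.MSE(\bar{y}_{gd}) = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)S_y^2\left[1 - \rho_{xy}^2 - \frac{\left(\lambda_{12} - \rho_{xy}\lambda_{03}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right]. \quad (6.1.4.2) \]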
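Proof. Expanding the function $H(u, v)$ around the point (1, 1) up to the first order
Taylor's series we have
where
\[ H_{10} = \left.\frac{\partial H}{\partial u}\right|_{(1,1)} \quad \text{and} \quad H_{01} = \left.\frac{\partial H}{\partial v}\right|_{(1,1)} \]
denote the known first order partial
derivatives of the function H with respect to u and v, respectively. By definition,
we have
\[ MSE(\bar{y}_{gd}) = E\left[\bar{y}_{gd} - \bar{Y}\right]^2 = E\left[\bar{Y}\left\{1 + \epsilon_0 + \epsilon_1H_{10} + \delta_1H_{01} + \cdots\right\} - \bar{Y}\right]^2 \]
\[ = \bar{Y}^2E\left[\epsilon_0^2 + H_{10}^2\epsilon_1^2 + H_{01}^2\delta_1^2 + 2\epsilon_0\epsilon_1H_{10} + 2\epsilon_0\delta_1H_{01} + 2\epsilon_1\delta_1H_{10}H_{01}\right] \]
\[ = \bar{Y}^2\left[\left(\frac{1}{n} - \frac{1}{N}\right)C_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left\{H_{10}^2C_x^2 + H_{01}^2(\lambda_{04} - 1) + 2H_{10}\rho_{xy}C_yC_x + 2H_{01}C_y\lambda_{12} + 2H_{10}H_{01}C_x\lambda_{03}\right\}\right]. \quad (6.1.4.4) \]
On differentiating (6.1.4.4) with respect to $H_{10}$ and $H_{01}$, respectively, and equating
to zero we have
\[ H_{10}C_x + H_{01}\lambda_{03} = -\rho_{xy}C_y \quad (6.1.4.5) \]
and
\[ H_{10}C_x\lambda_{03} + H_{01}(\lambda_{04} - 1) = -C_y\lambda_{12} . \quad (6.1.4.6) \]
Solving (6.1.4.5) and (6.1.4.6) for $H_{10}$ and $H_{01}$ we have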
Remark 6.1.4.1. The optimum first phase and second phase sample sizes for the
fixed cost (or variance) can also be derived for the difference, regression, and
general class of estimators of mean. The optimum first phase and second phase
sample sizes from the above theorems can easily be obtained by replacing $V_R$ with
\[ V_R = S_y^2\left(1 - \rho_{xy}^2\right) \]
for the difference and regression estimator and with
\[ V_R = S_y^2\left[1 - \rho_{xy}^2 - \frac{\left(\lambda_{12} - \rho_{xy}\lambda_{03}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right] \]
for the general class of estimators.
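The remark amounts to saying that the three estimators share one MSE expression and differ only in $V_R$. The sketch below illustrates this with the population 1 parameter values quoted in the examples of this chapter ($S_y^2$ = 342021.5, $\rho_{xy}$ = 0.8038, $\lambda_{03}$ = 1.5936, $\lambda_{12}$ = 1.0982, $\lambda_{04}$ = 4.5247, and $V_R$ = 167661.41 for the ratio estimator) with m = 10, n = 5 and N = 50; the function and variable names are illustrative.

```python
def two_phase_mse(m, n, N, S2y, VR):
    """First order MSE: (1/m - 1/N) * S2y + (1/n - 1/m) * VR."""
    return (1 / m - 1 / N) * S2y + (1 / n - 1 / m) * VR


S2y, rho = 342021.5, 0.8038
lam03, lam12, lam04 = 1.5936, 1.0982, 4.5247
m, n, N = 10, 5, 50

VR_ratio = 167661.41                                   # S2y + R^2 S2x - 2 R Sxy, as quoted
VR_reg = S2y * (1 - rho ** 2)                          # difference / regression estimator
VR_gen = S2y * (1 - rho ** 2
                - (lam12 - rho * lam03) ** 2 / (lam04 - 1 - lam03 ** 2))  # general class

mse_ratio = two_phase_mse(m, n, N, S2y, VR_ratio)
mse_reg = two_phase_mse(m, n, N, S2y, VR_reg)
mse_gen = two_phase_mse(m, n, N, S2y, VR_gen)

# Percent relative efficiency of the regression estimator over the ratio estimator.
re_reg_vs_ratio = 100 * mse_ratio / mse_reg
# Percent relative efficiency of the general class over the regression estimator.
re_gen_vs_reg = 100 * mse_reg / mse_gen
```

Because the extra term subtracted in `VR_gen` is non-negative whenever $\lambda_{04} - 1 - \lambda_{03}^2 > 0$, the general class can never do worse than the regression estimator to this order of approximation.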
Example 6.1.4.1. The amounts of the real and nonreal estate farm loans (in $000)
during 1997 in the United States have been presented in population 1 of the
Appendix . Suppose we selected first phase and second phase samples each of size
10 and 5 respectively . Find the relative efficiency of the general class of estimators,
for estimating average amount of the real estate farm loans during 1997 by using
information selected in the first phase sample only on the nonreal estate farm loans
during 1997, with respect to the regression estimator in two-phase sampling of the
population mean.
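Solution. We have
$Y_i$ = Amount of the real estate farm loans in different states during 1997.
$X_i$ = Amount of the nonreal estate farm loans in different states during 1997.
$N = 50$, $\bar{Y} = 555.43$, $S_y^2 = 342021.5$, $\lambda_{03} = 1.5936$, $\rho_{xy} = 0.8038$, $\lambda_{12} = 1.0982$, and
$\lambda_{04} = 4.5247$.
So we have
\[ MSE(\bar{y}_{lrd}) = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)S_y^2\left(1 - \rho_{xy}^2\right) \]
\[ = \left(\frac{1}{10} - \frac{1}{50}\right) \times 342021.5 + \left(\frac{1}{5} - \frac{1}{10}\right) \times 342021.5\left(1 - 0.8038^2\right) = 39466.05 . \]
Also for the general class of estimators, we have
\[ MSE(\bar{y}_{gd}) = \left(\frac{1}{m} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)S_y^2\left[1 - \rho_{xy}^2 - \frac{\left(\lambda_{12} - \rho_{xy}\lambda_{03}\right)^2}{\lambda_{04} - 1 - \lambda_{03}^2}\right] \]
\[ = \left(\frac{1}{10} - \frac{1}{50}\right) \times 342021.5 + \left(\frac{1}{5} - \frac{1}{10}\right) \times 342021.5\left[1 - 0.8038^2 - \frac{\left(1.0982 - 0.8038 \times 1.5936\right)^2}{4.5247 - 1 - 1.5936^2}\right]. \]
Thus the percent relative efficiency (RE) of the general class of estimators $\bar{y}_{gd}$ with
respect to the regression estimator $\bar{y}_{lrd}$ is given by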
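Singh (1991) defined a general class of estimators for estimating the finite
population variance $S_y^2$ given by
\[ s_d^2 = s_y^2H(u, v) \quad (6.1.5.1) \]
Theorem 6.1.5.1. The minimum mean squared error of the general class of
estimators, $s_d^2$, is given by
Min.MSE($s_d^2$)
Proof. Expanding $H(u, v)$ around the point (1, 1) up to the first order Taylor's
series, we have
\[ s_d^2 = s_y^2H(u, v) = s_y^2H[1 + (u - 1), 1 + (v - 1)] \]
\[ = s_y^2\left[1 + (u - 1)H_{10} + (v - 1)H_{01} + (u - 1)^2H_{20} + (v - 1)^2H_{02} + (u - 1)(v - 1)H_{11} + \cdots\right] \]
where
\[ H_{10} = \left.\frac{\partial H}{\partial u}\right|_{(1,1)}, \quad H_{01} = \left.\frac{\partial H}{\partial v}\right|_{(1,1)}, \quad H_{20} = \left.\frac{\partial^2 H}{\partial u^2}\right|_{(1,1)}, \quad H_{02} = \left.\frac{\partial^2 H}{\partial v^2}\right|_{(1,1)}, \quad \text{and} \quad H_{11} = \left.\frac{\partial^2 H}{\partial u\,\partial v}\right|_{(1,1)} . \]
Thus the mean squared error of the general class of estimators $s_d^2$, to the first order
of approximation, is given by
\[ MSE(s_d^2) = E\left[s_d^2 - S_y^2\right]^2 = E\left[s_y^2\left\{1 + (u - 1)H_{10} + (v - 1)H_{01}\right\} - S_y^2\right]^2 \]
\[ = E\left[\left(s_y^2 - S_y^2\right) + s_y^2(u - 1)H_{10} + s_y^2(v - 1)H_{01} + \cdots\right]^2 \]
\[ = S_y^4E\left[\delta_0^2 + \epsilon_1^2H_{10}^2 + \delta_1^2H_{01}^2 + 2\delta_0\epsilon_1H_{10} + 2\delta_0\delta_1H_{01} + 2\epsilon_1\delta_1H_{10}H_{01}\right] \]
\[ = S_y^4\left[\left(\frac{1}{n} - \frac{1}{N}\right)(\lambda_{40} - 1) + \left(\frac{1}{n} - \frac{1}{m}\right)\left\{H_{10}^2C_x^2 + H_{01}^2(\lambda_{04} - 1) + 2C_x\lambda_{21}H_{10} + 2(\lambda_{22} - 1)H_{01} + 2C_x\lambda_{03}H_{10}H_{01}\right\}\right]. \]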
The optimum first phase and second phase sample sizes for the fixed cost (or
variance) can also be obtained for estimating the finite population variance .
We introduce here the concept of the two-phase sampling calibration approach and its
generalisation, which has been studied by Dupont (1995), Hidiroglou and Sarndal
(1995, 1998) and Estevao and Sarndal (2002). At the moment we are using only one
auxiliary variable to keep the procedure simple. Suppose a first phase probability
sample $s_1$ is drawn from the population $\Omega$ using a sampling design that generates
the selection probabilities $\pi_{1i}$. From the given first phase sample $s_1$, the second
phase sample $s_2$ (subset of $s_1$) is drawn with a sampling design with the selection
probabilities $\pi_{2i} = \pi_{i|s_1}$. Evidently from the first phase sample $s_1$ the Horvitz and
Thompson (1952) type estimator of population total X is given by
\[ \hat{X}^* = \sum_{i \in s_1} d_{1i}x_i \]
where $d_{1i} = 1/\pi_{1i}$. From the second phase sample unbiased estimators of Y and X
are, respectively, given by
\[ \hat{Y} = \sum_{i \in s_2} d_{2i}y_i, \quad \text{and} \quad \hat{X} = \sum_{i \in s_2} d_{2i}x_i \]
where $d_{2i} = (1/\pi_{1i}) \times (1/\pi_{2i})$.
\[ w_{2i} = d_{2i} + d_{2i}q_{2i}x_i\left(\sum_{i=1}^{n} d_{2i}q_{2i}x_i^2\right)^{-1}\left(\hat{X}^* - \sum_{i=1}^{n} d_{2i}x_i\right). \quad (6.1.6.8) \]
On putting these calibrated weights in (6.1.6.1) we have the estimator (6.1.6.4).
Hence the theorem .
\[ \hat{Y}_{rd} = \hat{Y}\left(\frac{\hat{X}^*}{\hat{X}}\right) \]
Again applying the Midzuno--Sen sampling scheme to the units selected in the first
phase sample, we selected one unit with probability proportional to the number of
fish during 1992 and the remaining 4 units with SRSWOR sampling. We used the
first two columns of the Pseudo-Random Numbers to select a random number
$1 \le R_i \le 10$ and another random number $1 \le R_j \le 28933$ by using the 7th to 11th
columns. The first effective pair of the random numbers is (01, 07572). Thus the
unit at serial number 01 from the given first phase sample is included as the first unit
in the second phase sample of n = 5 units. The remaining 4 units are selected by
SRSWOR sampling from the remaining 9 units in the given first phase sample. We
used the 13th column of the Pseudo-Random Numbers to draw 4 distinct random
numbers between 1 and 9. The random numbers came in the sequence 3, 7, 5, and
4. Thus the ultimate sample consists of the following information.
Under the chi square distance function the second phase calibration weights are
\[ w_{2i} = d_{2i}\left(\sum_{i \in s_2} d_{2i}x_i\right)^{-1} \times 622966.9 = \frac{622966.9}{501796.3}\,d_{2i} . \]

Second phase sample information
Sr. No.   d_2i      x_i     y_i     x_i d_2i    w_2i       w_2i y_i
1         11.4737   17741   16238   203554.91   14.24429   231298.78
2         16.4414   1008    859     16572.93    20.41156   17533.53
3         14.3885   4707    4793    67726.67    17.86673   85635.23
4         10.4936   18491   11567   194037.16   13.02752   150689.32
5         15.8588   1255    1375    19902.79    19.68828   27071.38
Sum                                 501794.46              512228.24
Thus a ratio estimate of the total number of fish during 1995 in the United States is
\[ \hat{Y}_{rd} = \sum_{i \in s_2} w_{2i}y_i = 512228.24 . \]
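The weights in this example can be recomputed from the general calibration formula. The sketch below uses the second phase values $d_{2i}$, $x_i$, $y_i$ and the first phase estimate 622966.9 as quoted in the example, and assumes the choice $q_{2i} = 1/x_i$, under which the chi square calibration weights reduce to the ratio form $w_{2i} = d_{2i}\hat{X}^*/\hat{X}$. The function name `calibrated_weights` is hypothetical, and small differences from the printed table are expected because the printed design weights are rounded.

```python
def calibrated_weights(d, x, q, x_total_first_phase):
    """Chi square distance calibration weights with a single auxiliary variable."""
    denom = sum(di * qi * xi ** 2 for di, qi, xi in zip(d, q, x))
    gap = x_total_first_phase - sum(di * xi for di, xi in zip(d, x))
    return [di + di * qi * xi * gap / denom for di, qi, xi in zip(d, q, x)]


d2 = [11.4737, 16.4414, 14.3885, 10.4936, 15.8588]   # second phase design weights
x = [17741, 1008, 4707, 18491, 1255]                 # auxiliary variable
y = [16238, 859, 4793, 11567, 1375]                  # study variable
X_star = 622966.9                                    # first phase estimate of the total of x

q = [1.0 / xi for xi in x]                           # assumed choice giving the ratio form
w2 = calibrated_weights(d2, x, q, X_star)
Y_rd = sum(wi * yi for wi, yi in zip(w2, y))
```

By construction the calibrated weights reproduce the first phase estimate, that is, `sum(w * x)` over the second phase sample equals `X_star`.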
Scheme II. The first phase sample is selected as in Scheme I but the smaller
sample of size n is selected independently from the whole population;
Scheme III. Often the auxiliary information may be collected by two different
agencies and hence two independent preliminary samples of sizes $m_1$ and $m_2$ are
selected for observing X and Z, and the small sample of size n is also selected
independently from the population by SRSWOR. For simplicity we will consider
$m_1 = m_2$.
Before proceeding further let us define
\[ \epsilon_0 = \frac{\bar{y}}{\bar{Y}} - 1, \quad \epsilon_1 = \frac{\bar{x}}{\bar{X}} - 1, \quad \epsilon_2 = \frac{\bar{x}^*}{\bar{X}} - 1, \quad \epsilon_3 = \frac{\bar{z}}{\bar{Z}} - 1, \quad \text{and} \quad \epsilon_4 = \frac{\bar{z}^*}{\bar{Z}} - 1, \]
such that
\[ E(\epsilon_j) = 0, \quad j = 0, 1, 2, 3, 4 . \]
Under Scheme I.
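Theorem 6.2.1. The difference type and unbiased estimator of population mean $\bar{Y}$
is given by
\[ \bar{y}_1 = \bar{y} + \beta_1\left(\bar{x}^* - \bar{x}\right) + \beta_2\left(\bar{z}^* - \bar{z}\right). \quad (6.2.1) \]
Under scheme I the minimum variance of $\bar{y}_1$ is given by
\[ V(\bar{y}_1)_I = \left(\frac{1}{n} - \frac{1}{N}\right)S_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)\left[\beta_1^2S_x^2 + \beta_2^2S_z^2 - 2\beta_1S_{xy} - 2\beta_2S_{yz} + 2\beta_1\beta_2S_{xz}\right] \quad (6.2.8) \]
\[ V(\bar{y}_1)_{II} = \frac{S_y^2}{n} + \left(\frac{1}{m} + \frac{1}{n}\right)\left(\beta_1^2S_x^2 + \beta_2^2S_z^2\right) - \frac{2\beta_1S_{xy}}{n} - \frac{2\beta_2S_{yz}}{n} + 2\beta_1\beta_2S_{xz}\left(\frac{1}{m} + \frac{1}{n}\right). \quad (6.2.12) \]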
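On differentiating (6.2.12) with respect to $\beta_1$ and $\beta_2$, respectively, and equating to
zero and solving for $\beta_1$ and $\beta_2$, we obtain
\[ \beta_1 = \left(\frac{m}{m + n}\right)\frac{S_y\left\{\rho_{xy} - \rho_{yz}\rho_{xz}\right\}}{S_x\left(1 - \rho_{xz}^2\right)}, \quad \text{and} \quad \beta_2 = \left(\frac{m}{m + n}\right)\frac{S_y\left\{\rho_{yz} - \rho_{xy}\rho_{xz}\right\}}{S_z\left(1 - \rho_{xz}^2\right)} . \quad (6.2.13) \]
On substituting $\beta_1$ and $\beta_2$ from (6.2.13) in (6.2.12), we have (6.2.3). Hence the
second part of the theorem.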
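The optimum coefficients can be checked numerically. The sketch below assumes the Scheme II variance has the form $S_y^2/n + (1/m + 1/n)(\beta_1^2S_x^2 + \beta_2^2S_z^2) - 2\beta_1S_{xy}/n - 2\beta_2S_{yz}/n + 2\beta_1\beta_2S_{xz}(1/m + 1/n)$, a reading of (6.2.12) that ignores finite population corrections, and evaluates the closed form solution for $\beta_1$ and $\beta_2$. The function names and the numbers used to exercise them are hypothetical.

```python
def variance_scheme2(b1, b2, m, n, Sy, Sx, Sz, rxy, ryz, rxz):
    """Assumed Scheme II variance of the two-variable difference estimator."""
    Sxy, Syz, Sxz = rxy * Sx * Sy, ryz * Sy * Sz, rxz * Sx * Sz
    a = 1 / m + 1 / n
    return (Sy ** 2 / n
            + a * (b1 ** 2 * Sx ** 2 + b2 ** 2 * Sz ** 2)
            - 2 * b1 * Sxy / n - 2 * b2 * Syz / n
            + 2 * b1 * b2 * Sxz * a)


def optimum_betas_scheme2(m, n, Sy, Sx, Sz, rxy, ryz, rxz):
    """Closed form minimisers of the assumed Scheme II variance."""
    k = m / (m + n)
    b1 = k * Sy * (rxy - ryz * rxz) / (Sx * (1 - rxz ** 2))
    b2 = k * Sy * (ryz - rxy * rxz) / (Sz * (1 - rxz ** 2))
    return b1, b2
```

Setting the two partial derivatives to zero gives the linear system $\beta_1S_x^2 + \beta_2S_{xz} = kS_{xy}$ and $\beta_1S_{xz} + \beta_2S_z^2 = kS_{yz}$ with $k = m/(m + n)$, which is where the factor $m/(m + n)$ in the closed form comes from.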
which in fact reduces to (6.2.4) for the optimum values of $\beta_1$ and $\beta_2$ obtained from
(6.2.14). Hence the theorem.
The estimation procedure based on two-phase sampling schemes, when none of the
auxiliary variable population means are known, has been considered by Khan and
Tripathi (1967), Tripathi (1970, 1976, 1987), and Adhvaryu (1978). However, in
many socio-economic and agricultural surveys, the population means (totals) of
some of the auxiliary variables may be known while those of the others may not be
readily available. For example, to estimate the total number of agricultural labourers
in a rural county, the information about the area and population of the village may
be known from the recent county records while the information about the number of
cultivators and cultivated areas of the village in the county may not be readily
available. The estimation of population mean of a survey variable under the partial
knowledge of the auxiliary means has been considered by Singh (1969), Chand
(1975), Kiregyera (1980, 1984), Mukerjee, Rao, and Vijayan (1987), and
Srivastava, Khare, and Srivastava (1990). These estimators and their modifications
are popular in survey sampling under the name 'Chain Ratio Type Estimators'
which will be discussed in the next section, but let us first do an example.
Example 6.2.1. The season average per pound prices (in $) of the commercial
apple crop in 36 different American states have been given in population 3.
Scheme I. Select a first phase sample of 10 units by SRSWOR for observing the
auxiliary variables:
$X_{1i}$ = Season average price per pound during 1995;
Scheme II. Select the first phase sample of 10 units as in Scheme I and collect
information on two variables. Select the second phase sample of 5 units
independently from the whole population and collect information on three
variables.
Scheme III. Suppose the information on 10 units selected by SRSWOR sampling
about 'Season average price per pound during 1994' is collected by a company
XYZ. The information about 'Season average price per pound during 1995' on an
independent sample of 10 units is collected by another company ABC in the United
States. A small sample of 5 units is also selected independently from the population
by SRSWOR sampling to collect information on the 'Season average price per
pound during 1996.'
We wish to estimate the average season price per pound during 1996 by making
proper use of information under different schemes. Which sampling scheme would
you prefer to recommend for the future?
Solution. From the description of the population we have $Y_i$ = Season average
price per pound during 1996, $X_{1i}$ = Season average price per pound during 1995,
and $X_{2i}$ = Season average price per pound during 1994. Here $Y = 7.317$,
$X_1 = 6.683$, $X_2 = 5.9222$, $N = 36$, $\bar{Y} = 0.2033$, $\bar{X}_1 = 0.1856$, $\bar{X}_2 = 0.1645$,
$S_y^2 = 0.15633$, $S_{x_1}^2 = 0.16406$, $S_{x_2}^2 = 0.17396$, $C_y^2 = 0.1563$, $C_{x_1}^2 = 0.1641$,
$C_{x_2}^2 = 0.1739$, $\rho_{yx_1} = 0.8775$, $\rho_{yx_2} = 0.8759$, and $\rho_{x_1x_2} = 0.74135$. Also we have
$m = 10$, $n = 5$.
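Under Scheme I,
$$\text{Min.}V(\bar{y}_1)_I \approx \frac{S_y^2}{n}\left[1 - \left(\frac{m-n}{m}\right)\left\{\frac{\rho_{yx_1}^2 + \rho_{yx_2}^2 - 2\rho_{yx_1}\rho_{yx_2}\rho_{x_1x_2}}{1 - \rho_{x_1x_2}^2}\right\}\right]$$
$$= \frac{0.15633}{5}\left[1 - \left(\frac{10-5}{10}\right)\left\{\frac{0.8775^2 + 0.8759^2 - 2 \times 0.8775 \times 0.8759 \times 0.7413}{1 - 0.7413^2}\right\}\right] = 0.017465.$$
Under Scheme II, the factor $(m-n)/m$ is replaced by $m/(m+n)$, so that the minimum variance is
$$= \frac{0.15633}{5}\left[1 - \left(\frac{10}{10+5}\right)\left\{\frac{0.8775^2 + 0.8759^2 - 2 \times 0.8775 \times 0.8759 \times 0.7413}{1 - 0.7413^2}\right\}\right] = 0.012865.$$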
= 0.15633 1- (~){ 2
0.8775 + 0.8759
2
- 2 x 0.8775 x 0.8759 x 0.7413)
5 10+5 ( 10
1- - -
)2 x 0.7413
2
10+5
= 0.021389.
In this situation the estimator of population mean under scheme II has minimum
variance. Hence from the efficiency point of view scheme II will be preferred.
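The first two minimum variances quoted in this example can be checked numerically. The following is a minimal sketch, assuming that the minimum variance has the form $(S_y^2/n)[1 - k R^2]$, where $R^2$ is the squared multiple correlation of $Y$ on the two auxiliary variables and the factor $k$ is $(m-n)/m$ under Scheme I and $m/(m+n)$ under Scheme II, as read from the worked figures. The function names are illustrative only.

```python
def multiple_r2(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two auxiliary variables."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


def min_variance(s2_y, n, factor, r2):
    """Assumed form of the minimum variance: (S_y^2 / n) * [1 - factor * R^2]."""
    return (s2_y / n) * (1.0 - factor * r2)


# Values taken from Example 6.2.1
s2_y, m, n = 0.15633, 10, 5
r2 = multiple_r2(0.8775, 0.8759, 0.7413)

v_scheme1 = min_variance(s2_y, n, (m - n) / m, r2)   # Scheme I factor (m - n) / m
v_scheme2 = min_variance(s2_y, n, m / (m + n), r2)   # Scheme II factor m / (m + n)

print(round(v_scheme1, 6), round(v_scheme2, 6))
```

Because both schemes share the same $R^2$, the comparison between them reduces to comparing the two factors: the larger the factor, the smaller the minimum variance.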
Following the notation of the previous section and assuming that the population mean
$\bar{Z}$ of the second auxiliary variable is known and $\rho_{xy} > \rho_{yz}$ (variable $Z$ is closely
related to $X$; however, it is not as closely related to $Y$ as $X$ is related to $Y$),
Chand (1975) proposed a chain ratio type estimator of population mean $\bar{Y}$ as
$$\bar{y}_c = \bar{y}\left(\frac{\bar{x}^*}{\bar{x}}\right)\left(\frac{\bar{Z}}{\bar{z}^*}\right). \tag{6.3.1}$$
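To make the chain ratio idea concrete, the following is a minimal sketch that assumes the estimator has the form $\bar{y}\,(\bar{x}^*/\bar{x})(\bar{Z}/\bar{z}^*)$, with $\bar{x}^*$ and $\bar{z}^*$ the first phase sample means, $\bar{x}$ and $\bar{y}$ the second phase sample means, and $\bar{Z}$ the known population mean. The function name and the numerical inputs are hypothetical.

```python
def chain_ratio_estimate(y_bar, x_bar, x_bar_first, z_bar_first, Z_bar):
    """Chain ratio type estimate: y_bar * (x_bar_first / x_bar) * (Z_bar / z_bar_first).

    y_bar, x_bar   : second phase sample means
    x_bar_first,
    z_bar_first    : first phase sample means
    Z_bar          : known population mean of the second auxiliary variable
    """
    if x_bar == 0 or z_bar_first == 0:
        raise ValueError("sample means in the denominators must be non-zero")
    return y_bar * (x_bar_first / x_bar) * (Z_bar / z_bar_first)


# Hypothetical illustration, not data from the text
estimate = chain_ratio_estimate(y_bar=20.0, x_bar=10.0, x_bar_first=12.0,
                                z_bar_first=30.0, Z_bar=33.0)
unchanged = chain_ratio_estimate(20.0, 10.0, 10.0, 30.0, 30.0)
print(estimate, unchanged)
```

When the first phase means agree with the second phase mean of $X$ and with the known $\bar{Z}$, both ratio adjustments equal one and the estimate reduces to $\bar{y}$.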
Theorem 6.3.1. Under scheme I the bias in the estimator $\bar{y}_c$ to the first order of
approximation is
Taking expected values on both sides of (6.3.3) and taking its deviation from
population mean $\bar{Y}$, we have (6.3.2). Hence the theorem.
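Theorem 6.3.2. Under scheme I the mean squared error of the estimator $\bar{y}_c$, to the
first order of approximation, is
$$MSE(\bar{y}_c) = \left(\frac{1}{n} - \frac{1}{N}\right)\bar{Y}^2\left[C_y^2 + C_x^2 - 2\rho_{xy}C_xC_y\right] + \left(\frac{1}{m} - \frac{1}{N}\right)\bar{Y}^2\left[C_z^2 - 2\rho_{yz}C_yC_z - C_x^2 + 2\rho_{xy}C_xC_y\right]. \tag{6.3.4}$$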
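Proof. We have
$$MSE(\bar{y}_c) = E[\bar{y}_c - \bar{Y}]^2 \approx E[\bar{Y}(\epsilon_0 - \epsilon_1 + \epsilon_2 - \epsilon_4)]^2$$
$$= \bar{Y}^2 E\left[\epsilon_0^2 + \epsilon_1^2 + \epsilon_2^2 + \epsilon_4^2 - 2\epsilon_0\epsilon_1 + 2\epsilon_0\epsilon_2 - 2\epsilon_0\epsilon_4 - 2\epsilon_1\epsilon_2 + 2\epsilon_1\epsilon_4 - 2\epsilon_2\epsilon_4\right].$$
On substituting the expected values and after simplification we obtain (6.3.4).
Hence the theorem.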
Khare and Srivastava (1998) and Tracy and Singh (1999) have considered the study
of chain ratio type estimators in more detail by proposing a few classes of
estimators of population mean.
In this case a first phase probability sample $s_1$ of size $m$ is drawn from the
population $\Omega$, using a sampling design that generates the selection probabilities
$\pi_{1i}$. Given that sample $s_1$ has been drawn, the second phase sample $s_2$ (subset of
$s_1$) of size $n$ is selected from $s_1$ using a sampling design with the selection
probabilities $\pi_{2i} = \pi_{i|s_1}$. The first phase sampling weight of the $i$th unit is denoted by
$d_{1i} = 1/\pi_{1i}$, and the second phase sampling weight by $d_{2i} = (\pi_{1i}\pi_{2i})^{-1}$.
Table 6.4.1. Relationship between set of units and available data at different levels.
Set of units | Data available | HT estimators | Calibrated estimators
Population | $\{z_i : i \in \Omega\}$ or $Z = \sum_{i=1}^{N} z_i$ is known | |
First phase | | $\hat{X} = \sum_{i \in s_1} d_{1i}x_i$ | $\hat{X}_c = \sum_{i \in s_1} w_{1i}x_i$
Table 6.4.1 summarises our assumptions on the auxiliary information available
for estimation.
Now we have the following theorems:
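Theorem 6.4.1. Under chi square (CS) type of distance function for the first phase
data set defined as
$$D_1 = \sum_{i \in s_1}(w_{1i} - d_{1i})^2(d_{1i}q_{1i})^{-1} \tag{6.4.1}$$
subject to the calibration equation
$$\sum_{i \in s_1} w_{1i}z_i = Z \tag{6.4.2}$$
the first phase calibrated estimator
$$\hat{X}_c = \sum_{i \in s_1} w_{1i}x_i \tag{6.4.3}$$
becomes
$$\hat{X}_c = \sum_{i \in s_1} d_{1i}x_i + \left(\frac{\sum_{i \in s_1} d_{1i}q_{1i}x_iz_i}{\sum_{i \in s_1} d_{1i}q_{1i}z_i^2}\right)\left(Z - \sum_{i \in s_1} d_{1i}z_i\right). \tag{6.4.4}$$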
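The calibration step in Theorem 6.4.1 can be illustrated numerically. The following is a minimal sketch, assuming the usual closed form of the chi square calibration weights, $w_{1i} = d_{1i} + d_{1i}q_{1i}z_i\,(Z - \sum d_{1i}z_i)/\sum d_{1i}q_{1i}z_i^2$. The function name and the small data set are hypothetical.

```python
def chi_square_calibration(d, z, Z_total, q=None):
    """Calibrated weights that minimise sum (w - d)^2 / (d q) subject to sum w z = Z_total."""
    if q is None:
        q = [1.0] * len(d)          # q_i = 1 for all i, a common default choice
    z_ht = sum(di * zi for di, zi in zip(d, z))
    denom = sum(di * qi * zi * zi for di, qi, zi in zip(d, q, z))
    if denom == 0:
        raise ValueError("sum of d*q*z^2 must be non-zero")
    lam = (Z_total - z_ht) / denom
    return [di + di * qi * zi * lam for di, qi, zi in zip(d, q, z)]


# Hypothetical first phase sample: design weights, z values, x values, known total of z
d = [4.0, 4.0, 5.0]
z = [10.0, 20.0, 30.0]
x = [12.0, 19.0, 33.0]
Z_total = 300.0

w = chi_square_calibration(d, z, Z_total)
x_calibrated = sum(wi * xi for wi, xi in zip(w, x))

# Regression form of the same estimator, for comparison
x_ht = sum(di * xi for di, xi in zip(d, x))
z_ht = sum(di * zi for di, zi in zip(d, z))
slope = sum(di * xi * zi for di, xi, zi in zip(d, x, z)) / sum(di * zi * zi for di, zi in zip(d, z))
x_regression_form = x_ht + slope * (Z_total - z_ht)
print(x_calibrated, x_regression_form)
```

The two printed values agree, which is the content of the theorem: calibrating the weights on $z$ is equivalent to a regression type adjustment of the Horvitz and Thompson type estimator of the total of $x$.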
Theorem 6.4.2. Under the CS type of distance function for the second phase data
set defined as
$$D_2 = \sum_{i \in s_2}(w_{2i} - d_{2i})^2(d_{2i}q_{2i})^{-1} \tag{6.4.9}$$
becomes
where
$$\hat{\beta}_{1(ds)} = \frac{\sum_{i \in s_1} d_{1i}q_{1i}x_iz_i}{\sum_{i \in s_1} d_{1i}q_{1i}z_i^2} \quad \text{and} \quad \hat{\beta}_{2(ds)} = \frac{\sum_{i \in s_2} d_{2i}q_{2i}x_iy_i}{\sum_{i \in s_2} d_{2i}q_{2i}x_i^2}.$$
$$L_2 = \sum_{i \in s_2}(w_{2i} - d_{2i})^2(d_{2i}q_{2i})^{-1} - 2\lambda\left(\sum_{i \in s_2} w_{2i}x_i - \hat{X}_c\right). \tag{6.4.13}$$
the unit at serial number 62 is included as a first unit in the preliminary large sample
of 15 units. The remaining 14 units are selected by SRSWOR sampling from the
remaining 68 units in the population. We used the 13th and 14th columns of the
Pseudo-Random Numbers to draw 14 distinct random numbers between 1 and 68.
The random numbers came in the sequence as 05, 34, 30, 55, 46, 07, 13, 19, 44, 25,
58, 68, 47, and 67.
with $N = 69$, $m = 15$, and $p_{1i} = x_i/X$ with $X = 291882$ (given, the total number of
fish caught during 1992).
Thus an estimate of the total number of species caught during 1993 is given by
$$\hat{Z} = \sum_{i \in s_1} d_{1i}z_i = 466620.5 .$$
For $q_{1i} = 1 \ \forall\, i$ and the total number of fish during 1993 known to be $Z = 316784$,
the first phase calibration weights are given by
Thus an estimate of the total number of fish based on first phase sample during
1994 in the US is given by
$$\hat{X}_c = \sum_{i \in s_1} w_{1i}x_i = 347262.1 .$$
Again applying the Midzuno-Sen sampling scheme on the units selected in the first
phase sample, we selected one unit with probability proportional to the number of
fish caught during 1992 and the remaining 9 units with SRSWOR sampling. We used
the first two columns of the Pseudo-Random Numbers (PRN) given in Table 1 of
the Appendix to select a random number $1 \le R_i \le 15$ and another random number
$1 \le R_j \le 28933$ by using the 7th to 11th columns. The first effective pair of random
numbers is (01, 07572). Thus the unit at serial number 01 in the given first phase
sample is included as a first unit in the second phase sample of $n = 10$ units. The
remaining 9 units are selected by SRSWOR sampling from the remaining 14 units
in the given first phase sample. We used the 13th and 14th columns of the
Pseudo-Random Numbers to draw 9 distinct random numbers between 1 and 14. The
random numbers came in the sequence as 05, 07, 13, 09, 06, 11, 12, 04, and 03.
Species | $p_{1i}$ | $\pi_{1i}$ | $p_{2i}$ | $\pi_{2i}$ | $d_{2i}$ | $x_i$ | $d_{2i}x_i$ | $d_{2i}x_i^2$ | $y_i$ | $w_{2i}$ | $w_{2i}y_i$
Summer flounder | | | | | | | | | | |
Lane snapper | 0.003 | 0.208 | 0.003 | 0.644 | 7.45 | 1088 | 8107.6 | 8821100.0 | 859 | 7.26 | 6243.62
Cunner | 0.007 | 0.211 | 0.006 | 0.645 | 7.34 | 1255 | 9212.4 | 11561586.8 | 1375 | 7.13 | 9806.76
Spot | 0.051 | 0.247 | 0.051 | 0.661 | 6.13 | 18491 | 113399.0 | 2096863507.0 | 11567 | 3.56 | 41266.91
Saltwater catfish | 0.046 | 0.243 | 0.046 | 0.659 | 6.25 | 14441 | 90312.1 | 1304197188.0 | 13859 | 4.21 | 58361.07
Searobins | 0.016 | 0.219 | 0.016 | 0.648 | 7.04 | 4707 | 33155.1 | 156061147.0 | 4793 | 6.29 | 30166.37
Sand seatrout | 0.013 | 0.216 | 0.013 | 0.647 | 7.14 | 5665 | 40474.7 | 229289141.0 | 4355 | 6.22 | 27128.08
Atlantic mackerel | 0.004 | 0.209 | 0.003 | 0.644 | 7.43 | 4860 | 36147.9 | 175678928.0 | 4008 | 6.62 | 26533.76
Other fish | 0.007 | 0.212 | 0.007 | 0.645 | 7.32 | 1141 | 8354.72 | 9532731.7 | | | 6669.63
Kingfish | 0.013 | 0.216 | 0.012 | 0.647 | 7.14 | 4805 | 34331.3 | 164961659.0 | | | 27594.05
Sum | | | | | | | | | | | 295822.10
In the above table we used
$$\pi_{2i} = \frac{m-n}{m-1}\,p_{2i} + \frac{n-1}{m-1} \quad \text{with} \quad p_{2i} = x_i \Big/ \sum_{i=1}^{m} x_i .$$
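The inclusion probabilities used in the table can be reproduced with a few lines of code. The following is a minimal sketch, assuming the standard Midzuno-Sen first order inclusion probability $\pi_i = \frac{M-k}{M-1}p_i + \frac{k-1}{M-1}$ for a sample of $k$ units from a pool of $M$ units. The function name is illustrative; the value $p = 0.003$ is the one shown for Lane snapper.

```python
def midzuno_sen_inclusion(p, pool_size, sample_size):
    """First order inclusion probability under the Midzuno-Sen scheme."""
    return ((pool_size - sample_size) / (pool_size - 1)) * p \
        + (sample_size - 1) / (pool_size - 1)


# First phase: m = 15 units out of N = 69; second phase: n = 10 units out of m = 15
pi_1 = midzuno_sen_inclusion(0.003, pool_size=69, sample_size=15)
pi_2 = midzuno_sen_inclusion(0.003, pool_size=15, sample_size=10)
print(round(pi_1, 3), round(pi_2, 3))

# Sanity check on hypothetical selection probabilities that sum to one:
# the inclusion probabilities must add up to the sample size.
p_all = [0.1, 0.2, 0.3, 0.4]
total_pi = sum(midzuno_sen_inclusion(p, pool_size=4, sample_size=2) for p in p_all)
print(total_pi)
```

Since only the first unit is drawn with unequal probability, the inclusion probabilities stay close to the SRSWOR value $k/M$ unless a unit has a large share $p_i$ of the size measure.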
Under the chi square distance function the second phase calibration weights are
Following Sarndal, Swensson, and Wretman (1992) the two required set of
residuals are given by
eli = Yi - PaZi ViE 52 (6.5.2)
and
(6.5.3)
Following Hidiroglou and Sarndal (1995, 1998), the low level calibrated estimator
of variance of yc is
, (, ) _*_* 1 __
vB Yc = L L W 2ijWi Wj e2i e2j + - L L WiijwliWljelielj (6.5.5)
2 iES2 j Es2
r
iES2 jEs2
hatio(YJ= .L .L
IES2JES2
W2ije2ie2j[i~ldl:Xi]gl[
L, Wi Xi
"d
L,
Z
liZi
]" + L L
i ES2 jES 2
"'ij,,,e,,[_z_]g3 Idliz i
iES2 iEsl iESI
Similarly from (6.5.5) a regression type estimator for $q_{1i} = q_{2i} = 1$ can be developed
for estimating the variance of the chain regression type estimators.
Singh (2000b) suggested a higher order calibration estimator of the variance in two-
phase sampling as
, (,)
Yc = '"
vho L,
'"
L,
_*_*
D 2ij wi wj e2i e2j
1
+- '"
L,
'"
L,
_ _
Dlijwliwljelielj ' (6.5.6)
iES2 jEs2 2 iES2 j Es2
where $\Omega_{1ij}$ and $\Omega_{2ij}$ are the weights such that the distance between $\Omega_{1ij}$ and $W_{1ij}$
and that between $\Omega_{2ij}$ and $W_{2ij}$ is minimum. Define two chi square type of
distance functions as
$$D_1 = \frac{1}{2}\sum_{i \in s_1}\sum_{j \in s_1}(\Omega_{1ij} - W_{1ij})^2(Q_{1ij}W_{1ij})^{-1} \tag{6.5.7}$$
and
$$D_2 = \frac{1}{2}\sum_{i \in s_2}\sum_{j \in s_2}(\Omega_{2ij} - W_{2ij})^2(Q_{2ij}W_{2ij})^{-1}. \tag{6.5.8}$$
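Also let us define the first calibration constraint as follows:
$$\sum_{i \in s_1}\sum_{j \in s_1}\Omega_{1ij}d_{1i}d_{1j}z_iz_j = V(\hat{Z}), \tag{6.5.9}$$
where $V(\hat{Z})$ denotes the known variance of the estimator of the total of the cheaper
auxiliary character, $Z$.
The second calibration constraint is
$$\sum_{i \in s_2}\sum_{j \in s_2}\Omega_{2ij}d_{2i}d_{2j}x_ix_j = \hat{v}(\hat{X}) \tag{6.5.10}$$
where
$$\hat{v}(\hat{X}) = \sum_{i \in s_1}\sum_{j \in s_1}\frac{(\pi_{1ij} - \pi_{1i}\pi_{1j})}{\pi_{1ij}\pi_{1i}\pi_{1j}}\,x_ix_j .$$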
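$$\Omega_{1ij} = W_{1ij} + \frac{Q_{1ij}W_{1ij}d_{1i}d_{1j}z_iz_j}{\sum_{i \in s_1}\sum_{j \in s_1}Q_{1ij}W_{1ij}d_{1i}^2d_{1j}^2z_i^2z_j^2}\left[V(\hat{Z}) - \sum_{i \in s_1}\sum_{j \in s_1}W_{1ij}d_{1i}d_{1j}z_iz_j\right]. \tag{6.5.11}$$
$$\Omega_{2ij} = W_{2ij} + \frac{Q_{2ij}W_{2ij}d_{2i}d_{2j}x_ix_j}{\sum_{i \in s_2}\sum_{j \in s_2}Q_{2ij}W_{2ij}d_{2i}^2d_{2j}^2x_i^2x_j^2}\left[\hat{v}(\hat{X}) - \sum_{i \in s_2}\sum_{j \in s_2}W_{2ij}d_{2i}d_{2j}x_ix_j\right]. \tag{6.5.12}$$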
Use of (6.5.11) and (6.5.12) in (6.5.6) forms the higher order calibration estimator
of variance. Several estimators can be shown as special cases of the higher order
calibration approach .
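The mechanics of a higher order calibration weight can be sketched in code. The following is a minimal sketch, assuming the closed form $\Omega_{ij} = W_{ij} + Q_{ij}W_{ij}a_{ij}\,(V - \sum\sum W_{ij}a_{ij})/\sum\sum Q_{ij}W_{ij}a_{ij}^2$ with $a_{ij} = d_id_jz_iz_j$, which is what minimising the chi square distance subject to a single variance constraint gives. The function name and all numerical inputs are hypothetical.

```python
def higher_order_weights(W, d, z, V_known, Q=None):
    """Second order weights calibrated so that sum_ij Omega_ij d_i d_j z_i z_j = V_known."""
    n = len(d)
    if Q is None:
        Q = [[1.0] * n for _ in range(n)]   # Q_ij = 1 for all pairs, an illustrative choice
    a = [[d[i] * d[j] * z[i] * z[j] for j in range(n)] for i in range(n)]
    base = sum(W[i][j] * a[i][j] for i in range(n) for j in range(n))
    denom = sum(Q[i][j] * W[i][j] * a[i][j] ** 2 for i in range(n) for j in range(n))
    if denom == 0:
        raise ValueError("calibration denominator must be non-zero")
    lam = (V_known - base) / denom
    return [[W[i][j] + Q[i][j] * W[i][j] * a[i][j] * lam for j in range(n)]
            for i in range(n)]


# Hypothetical two-unit sample
W = [[0.5, 0.1], [0.1, 0.5]]   # design based second order weights
d = [2.0, 2.0]                 # first order design weights
z = [3.0, 5.0]                 # auxiliary values
V_known = 200.0                # known variance of the estimator of the total of z

omega = higher_order_weights(W, d, z, V_known)
calibrated_variance = sum(omega[i][j] * d[i] * d[j] * z[i] * z[j]
                          for i in range(2) for j in range(2))
print(calibrated_variance)
```

The adjusted weights reproduce the known variance of the auxiliary estimator exactly, which is the sense in which the variance estimator of the study variable is calibrated.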
For example , if Qtij = 1/ldtidtjXti Xlj) and QZij = l/ld Zid ZjXZiXZj ), then an estimator
of the variance of the chain ratio type estimator becomes
Similarly higher order calibration estimators for estimating the variance of the chain
regression type estimators can be developed by choosing
$q_{1i} = q_{2i} = Q_{1ij} = Q_{2ij} = 1 \ \forall\, i$ and $j$.
Suppose a first phase sample $s_1$ of size $m$ is drawn from the population of $N$ units
with sampling design $p_1$ and values $x_i$ ($i \in s_1$) on an auxiliary variable $X$ are
ascertained. A sub-sample $s_2$ of size $n$ is drawn from the first phase sample $s_1$
with design $p_2$ and the values $y_i$, $x_i$ for $i \in s_2$ of the variables $Y$ and $X$. The
resulting two-phase sample is $s = (s_1, s_2)$, the two-phase sampling design is $p$, and
the sample selection probabilities are given by $p(s) = p_1(s_1)\,p_2(s_2 \mid s_1)$. Such a class of
design $p$ will be denoted by $D$.
The first phase inclusion probabilities will be denoted as $P_j$ and second phase
inclusion probabilities will be denoted as $Q_j$ and are assumed to be positive.
Define
$$P_j(s_1) = \sum_{s_2 \ni j} p_2(s_2 \mid s_1) .$$
Let $E_p$ denote the design expectation and $V_p$ denote the design variance.
(a) $h_1$ is free of $y_i$, $x_i$ for $i \notin s_2$ but may involve them and $i$ when $i \in s_2$;
(b) $h_2$ is free of $x_i$ for $i \notin s_1$ but may involve them and $i$ when $i \in s_1$;
(c) $E_{p_2}[h_1 + h_2] = 0$ if $p_1(s_1) > 0$.
Result 6.7.4. Suppose a prior $\xi = \xi(x)$ for $Y$ may be postulated prior to second
phase sampling. Let $E_\xi$ denote the prior mean operator and the posterior mean by
$E_\xi(\cdot \mid d)$, where $d = (s, y, x)$. Then under a squared error loss function, a Bayes
estimator of population total in two-phase sampling is given by
6.8 CONCEPT OF THREE-PHASE SAMPLING
Let $(x_1^{\#}, x_2^{\#}, \ldots, x_l^{\#})$ be the first phase sample $s_1$ (say) drawn by simple random
sampling from a population of $N$ units where only auxiliary variable $X$ is
measured. Let $(x_1^*, x_2^*, \ldots, x_m^*)$ be the second phase sample $s_2$ (say) drawn by simple
random sampling from the first phase sample $s_1$ units and again only auxiliary
variable $X$ is measured. Let $(y_1, y_2, \ldots, y_n)$ and $(x_1, x_2, \ldots, x_n)$ denote the third phase
sample $s_3$ (say) drawn by simple random sampling from the second phase sample
for the study variable $Y$ and auxiliary variable $X$.
Let
$$\bar{y} = n^{-1}\sum_{i=1}^{n} y_i, \quad \bar{x} = n^{-1}\sum_{i=1}^{n} x_i, \quad \bar{x}^* = m^{-1}\sum_{i=1}^{m} x_i^* \quad \text{and} \quad \bar{x}^{\#} = l^{-1}\sum_{i=1}^{l} x_i^{\#} .$$
Define
$$\epsilon_0 = \frac{\bar{y}}{\bar{Y}} - 1, \quad \epsilon_1 = \frac{\bar{x}}{\bar{X}} - 1, \quad \epsilon_2 = \frac{\bar{x}^*}{\bar{X}} - 1, \quad \epsilon_3 = \frac{\bar{x}^{\#}}{\bar{X}} - 1$$
so that
$$E(\epsilon_j) = 0, \quad j = 0, 1, 2, 3.$$
Theorem 6.8.1.
Proof. Let $E_1$, $E_2$, and $E_3$ denote the expected values over all possible first,
second and third phase samples, respectively, and let $V_1$, $V_2$ and $V_3$ denote the
variances over all possible first, second, and third phase samples, respectively. Then
we have
= (~ - ~ )c;.
Hence the theorem.
Theorem 6.8.2. Prove that under the concept of three phase sampling
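Proof. Let $C_1$, $C_2$ and $C_3$ denote the covariances over all possible first, second,
and third phase samples, respectively. Then
$$E(\epsilon_0\epsilon_1) = Cov(\epsilon_0, \epsilon_1) = E_1E_2\{C_3(\epsilon_0, \epsilon_1)\} + E_1C_2\{E_3(\epsilon_0), E_3(\epsilon_1)\} + C_1\{E_2E_3(\epsilon_0), E_2E_3(\epsilon_1)\}$$
$$= E_1E_2C_3\left(\frac{\bar{y}}{\bar{Y}} - 1, \frac{\bar{x}}{\bar{X}} - 1\right) + E_1C_2\left\{E_3\left(\frac{\bar{y}}{\bar{Y}} - 1\right), E_3\left(\frac{\bar{x}}{\bar{X}} - 1\right)\right\} + C_1\left\{E_2E_3\left(\frac{\bar{y}}{\bar{Y}} - 1\right), E_2E_3\left(\frac{\bar{x}}{\bar{X}} - 1\right)\right\}$$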
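Corollary 6.8.2. Following the above theorem we can easily prove that
$$E(\epsilon_1\epsilon_2) = \left(\frac{1}{m} - \frac{1}{N}\right)C_x^2, \quad E(\epsilon_1\epsilon_3) = \left(\frac{1}{l} - \frac{1}{N}\right)C_x^2, \quad \text{and} \quad E(\epsilon_2\epsilon_3) = \left(\frac{1}{l} - \frac{1}{N}\right)C_x^2 .$$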
Remark 6.8.1. Under three-phase sampling the following types of estimators can
be studied
6.9
Suppose that a first phase sample $s_1$ of fixed size $m$ is taken by SRSWOR from a
population of $N$ units and the auxiliary variable $X$ is observed for all $i \in s_1$. A
simple random sub-sample $s_2$ of size $n$ is taken by SRSWOR from sample $s_1$, and
the variable of interest $y_i$ and auxiliary variable $x_i$ are observed for all $i \in s_2$. The
simple linear regression estimator for two-phase sampling is given by
$$\bar{y}_{lr} = \bar{y} + b(\bar{x}^* - \bar{x}) \tag{6.9.1}$$
where $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$, $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ are the means for the second phase sample $s_2$,
$\bar{x}^* = m^{-1}\sum_{i=1}^{m} x_i$ is the mean for the first phase sample $s_1$ and $b = s_{xy}/s_x^2$. The variance
of the estimator (6.9.1) is given by
where
$$S_D^2 = (N-1)^{-1}\sum_{i=1}^{N}\left[(Y_i - \bar{Y}) - \beta(X_i - \bar{X})\right]^2 = (N-1)^{-1}\sum_{i=1}^{N} D_i^2$$
with $D_i = (Y_i - \bar{Y}) - \beta(X_i - \bar{X})$.
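$$\hat{v}(\bar{y}_{lr}) = \left(\frac{1}{m} - \frac{1}{N}\right)s_y^2 + \left(\frac{1}{n} - \frac{1}{m}\right)s_d^2 \tag{6.9.4}$$
where
$$s_d^2 = (n-1)^{-1}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - b(x_i - \bar{x})\right]^2 = (n-1)^{-1}\sum_{i=1}^{n} d_i^2 \quad \text{with} \quad d_i = (y_i - \bar{y}) - b(x_i - \bar{x}) .$$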
Following Sitter (1997) we have
$$S_y^2 = S_D^2 + \beta^2 S_x^2 . \tag{6.9.5}$$
A sample analogue of (6.9.5) leads to
$$s_y^2 = s_d^2 + b^2 s_x^2 . \tag{6.9.6}$$
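On the basis of the relationship (6.9.6) Sitter (1997) considered the following two
estimators of variance of the regression estimator in two-phase sampling as
$$\hat{v}_1(\bar{y}_{lr}) = \left(\frac{1}{n} - \frac{1}{N}\right)s_d^2 + \left(\frac{1}{m} - \frac{1}{N}\right)b^2 s_x^{*2} \tag{6.9.7}$$
and
$$\hat{v}_0(\bar{y}_{lr}) = \left(\frac{1}{n} - \frac{1}{N}\right)s_d^2 + \left(\frac{1}{m} - \frac{1}{N}\right)b^2 s_x^2 . \tag{6.9.8}$$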
Sitter (1997) also considered the problem of estimation of variance of the regression
estimator in two-phase sampling as follows.
Defining
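$$\bar{x}^*(j) = \frac{m\bar{x}^* - x_j}{m-1} \quad \text{if } j \in s_1, \tag{6.9.9}$$
$$\bar{y}(j) = \begin{cases} \dfrac{n\bar{y} - y_j}{n-1} & \text{if } j \in s_2, \\ \bar{y} & \text{if } j \in s_1 - s_2, \end{cases} \tag{6.9.11}$$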
and
bul = 1:-[x} - xk lit,-1).;(1- k} 11 if j
if j
E s2,
E sl -sz, (6.9.12)
Example 6.9.1. From population 1 in the Appendix , select a first phase sample of
10 units by SRSWOR sampling and note only the nonreal estate farm loans for the
units selected in the sample. From the selected 10 units select a sub-sample of 5
units and note the real estate farm loans as well as nonreal estate farm loans.
Estimate the average real estate farm loans by using the regression estimator.
Construct the 95% confidence intervals by using three different estimators of the
variance.
Solution. We used the first two columns of the Pseudo-Random Number (PRN)
Table 1 given in the Appendix to select an SRSWOR sample of $m = 10$ units. The
following 10 distinct random numbers 01, 23, 46, 04, 32, 47, 33, 05, 22, and 38
between 1 and 50 resulted in the following first phase sample.
where $x_i$ = Nonreal estate farm loans. From the first phase sample, $\bar{x}^* = 1066.9232$,
and $s_x^{*2} = 1470305.988$.
We used the 7th and 8th columns of the Pseudo-Random Numbers to select a second
phase sample of $n = 5$ units from the above list of selected first phase sample units.
The following five distinct random numbers between 1 and 10 were observed: 07,
09, 01, 02 and 03. Thus the second phase sample consists of the following
information.
Sr. no. | Random number | Population unit no. | State | $x_i$ | $y_i$
1 | 07 | 33 | NC | 494.730 | 639.571
2 | 09 | 22 | MI | 440.518 | 327.028
3 | 01 | 01 | AL | 348.334 | 408.978
4 | 02 | 23 | MN | 2466.892 | 1354.768
5 | 03 | 46 | VA | 188.477 | 321.583
Sum | | | | 3938.951 | 3051.928
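Now
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{3938.951}{5} = 787.7902, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{3051.928}{5} = 610.3856,$$
$$s_y^2 = (n-1)^{-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{759220.4}{4} = 189805.1,$$
$$b = s_{xy}/s_x^2 = 0.4475, \quad \text{and} \quad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} d_i^2 = \frac{42574.26}{4} = 10643.56 .$$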
Thus the regression estimate of the real estate farm loans is:
Case I. We have
$$\hat{v}_0(\bar{y}_{lr}) = \left(\frac{1}{n} - \frac{1}{N}\right)s_d^2 + \left(\frac{1}{m} - \frac{1}{N}\right)b^2 s_x^2$$
or [329.70, 1140.89] .
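The regression estimate and the Case I interval can be recomputed from the five second phase observations. The following is a minimal sketch, assuming $N = 50$, the first phase mean $\bar{x}^* = 1066.9232$ quoted in the solution, and a tabulated $t$ value of 3.182 for 95% confidence with $df = n - 2 = 3$. The variable names are illustrative.

```python
import math

# Second phase sample (x: nonreal estate farm loans, y: real estate farm loans)
x = [494.730, 440.518, 348.334, 2466.892, 188.477]
y = [639.571, 327.028, 408.978, 1354.768, 321.583]
N, m, n = 50, 10, 5
x_bar_first = 1066.9232   # first phase sample mean quoted in the solution
t_crit = 3.182            # assumed tabulated t value, 95% confidence, df = n - 2 = 3

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = s_xy / s_xx                                   # sample regression coefficient

y_lr = y_bar + b * (x_bar_first - x_bar)          # two-phase regression estimate

s_x2 = s_xx / (n - 1)
s_d2 = sum(((yi - y_bar) - b * (xi - x_bar)) ** 2 for xi, yi in zip(x, y)) / (n - 1)
v0 = (1 / n - 1 / N) * s_d2 + (1 / m - 1 / N) * b ** 2 * s_x2

half_width = t_crit * math.sqrt(v0)
ci = (y_lr - half_width, y_lr + half_width)
print(round(b, 4), round(y_lr, 2), round(s_d2, 2), [round(c, 2) for c in ci])
```

Small differences in the last decimal places relative to the printed figures are to be expected, since the text rounds $b$ to 0.4475 before substituting.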
Thus the Jackknife estimator of the variance of the linear regression estimator $\bar{y}_{lr}$ is
$$v_J(\bar{y}_{lr}) = \frac{(m-1)}{m}\sum_{j=1}^{m}\left[\bar{y}_{lr}(j) - \bar{y}_{lr}\right]^2 = \frac{(10-1)}{10} \times 36286.02 = 32657.42 .$$
A $(1 - \alpha)100\%$ confidence interval for the average real estate farm loans in the
United States during 1997 is given by
$$\bar{y}_{lr} \pm t_{\alpha/2}(df = n-2)\sqrt{v_J(\bar{y}_{lr})} .$$
Using Table 2 from the Appendix the 95% confidence interval for the average real
estate farm loans in the United States during 1997 is given by
Raj (1964) and Singh and Singh (1965) considered the problem of estimation of
population mean using the concept of probability proportional to size and with
replacement sampling of second phase sample of n units from the given first phase
sample of m units selected with SRSWOR sampling. A pictorial representation of
such a two-phase sampling is shown in Figure 6.10.1.
[Figure 6.10.1. Two-phase sampling with a PPSWR second phase. Box labels: Population of N units; Only auxiliary variable, x, is measured; Second phase sample of n units drawn with PPSWR sampling.]
It means that the probability of selecting one unit in the first phase sample of size
$m$ is given by $1/N$ and that of selecting a second phase sample from the given first
phase sample is given by $p_i = x_i/x_{\cdot}$, where $x_{\cdot} = \sum_{i=1}^{m} x_i$.
Then we have the following theorem :
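Proof. Defining $E_2$ and $E_1$ as the expected values for the given first phase sample
and over all possible first phase samples, respectively, we have
$$E(\bar{y}_{ppsd}) = E_1E_2\left[\bar{y}_{ppsd} \mid \text{first phase}\right] = E_1E_2\left[\frac{1}{mn}\sum_{i=1}^{n}\frac{y_i}{p_i} \,\Big|\, \text{first phase}\right] = E_1\left[\frac{1}{m}\sum_{i=1}^{m} y_i\right] = \bar{Y} .$$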
Hence the theorem.
V{-
I)'ppsd
)= (~_~)5Z + (m -I) 5z
m N Y mnN(N -I) Z
(6.10.2)
where
5 z =-1- IN(Y: - Y -f and 5 z =-1- IN(Z. - Z -f .
with Z , = y. /p. .
Y N - 1i= 1 I Z N _ 1i=1 I I I I
z
m(m-I) N m v. Y j lIz
mZnN(N -1)i=lj>i } Pi Pj[
I I PiP ' - - -
J
+ --- 5
(m N) Y
z
(m -I) Ip· Y' - lIz
(
N
= -L_y + --- 5
mnN(N -1)i=1 I Pi J (m N) Y
= (m-I) ~Pi(Zi-Yf+(~-~)5Z=(~-...!-)52+(m
-I)52 .
mnN(N-I) i=1 m N Y m N Y mnN Z
Hence the theorem.
,(_ ) (1 N1)mi~/iz1-
v Yppsd nm(m-l)
= -;;;-
s; (6.10.3)
where
$$s_z^2 = (n-1)^{-1}\left[\sum_{i=1}^{n} z_i^2 - n^{-1}\left(\sum_{i=1}^{n} z_i\right)^2\right]$$
with $z_i = y_i/(Np_i)$.
Proof. Obvious by taking expected values on both sides .
Example 6.10.1. From population 1 given in the Appendix select a first phase
sample of 10 units by SRSWOR sampling and note only the nonreal estate farm
loans from the units selected in the sample. From the selected 10 units select a sub-
sample of 5 units using PPSWR sampling and note the real estate farm loans as
well as nonreal estate farm loans. Estimate the average real estate farm loans by
using PPSWR two-phase sampling estimator. Construct the 95% confidence
intervals.
Solution. We used the first two columns of the Pseudo-Random Number (PRN)
Table 1 given in the Appendix to select an SRSWOR sample of $m = 10$ units. The
following 10 distinct random numbers 01, 23, 46, 04, 32, 47, 33, 05, 22, and 38
between 1 and 50 resulted in the following first phase sample.
where $x_i$ = Nonreal estate farm loans.
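Sr. no. | Population unit no. | State | $x_i$ | Cumulative total | Random numbers | Selected | $p_i$
1 | 01 | AL | 348.334 | 348.334 | 01473 | S | 0.032649
2 | 23 | MN | 2466.982 | 2815.316 | | | 0.231226
3 | 46 | VA | 188.317 | 3003.633 | | | 0.017651
4 | 04 | AR | 848.317 | 3851.950 | 03965 | S | 0.079511
5 | 32 | NY | 426.274 | 4278.224 | 04981, 05365 | S, S | 0.039954
6 | 47 | WA | 1228.607 | 5506.831 | | | 0.115155
7 | 33 | NC | 494.730 | 6001.561 | 07673 | S | 0.046370
8 | 05 | CA | 3928.732 | 9930.293 | | | 0.368233
9 | 22 | MI | 440.518 | 10370.810 | | | 0.041289
10 | 38 | PA | 298.351 | 10669.160 | | | 0.027964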
We used the cumulative total method to select the second phase sample as follows.
The cumulative totals of the auxiliary variable, nonreal estate farm loans, are given in
the fifth column of the above table. We used the first five columns of Pseudo-Random
Numbers given in Table 1 in the Appendix to select five random numbers
between 1 and 10,670. These random numbers came in the sequence as 01473,
04981, 05365, 03965, and 07673. These random numbers have been shown in the
sixth column of the above table. The seventh column shows the states selected in
the second phase sample with PPSWR sampling. Note that the state NY has been
selected twice. Now, from the ultimate sample selected in the second phase, we
have the following table:
275.86 76096.26
Sum 956.66 211406.30
$$s_z^2 = \frac{\sum_{i=1}^{n} z_i^2 - n^{-1}\left(\sum_{i=1}^{n} z_i\right)^2}{n-1} = \frac{211406.31 - \dfrac{956.66^2}{5}}{5-1} = 7091.66 .$$
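The arithmetic for $s_z^2$ can be verified from the two column totals alone. The following is a minimal sketch of the computational form of the sample variance, using the totals $\sum z_i = 956.66$ and $\sum z_i^2 = 211406.31$ with $n = 5$. The function name and the small check data are illustrative.

```python
from statistics import variance


def sample_variance_from_sums(sum_of_squares, total, n):
    """Computational form: [sum z^2 - (sum z)^2 / n] / (n - 1)."""
    return (sum_of_squares - total ** 2 / n) / (n - 1)


# Totals from the worked example
s2_z = sample_variance_from_sums(211406.31, 956.66, 5)
print(round(s2_z, 2))

# The computational form agrees with the definitional form on any data set
data = [2.0, 4.0, 4.0, 5.0, 10.0]
s2_check = sample_variance_from_sums(sum(v * v for v in data), sum(data), len(data))
print(s2_check, variance(data))
```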
Thus an estimate of the variance of Yppsd is given by
n 2 2
The generalized linear regression (GREG) estimator has been found to be most
commonly used as an estimator of population total/mean in survey sampling. Let
us consider the simplest case of the GREG where information on only one
auxiliary variable is available. Consider two populations $\Omega_t = \{1_t, 2_t, \ldots, i_t, \ldots, N_t\}$,
for $t = 1, 2$ from which two independent probability samples $s_t$ ($s_t \subset \Omega_t$) are drawn
with a given sampling design, $p_t(\cdot)$. The inclusion probabilities $\pi_{i_t} = \Pr(i_t \in s_t)$ and
$\pi_{ij(t)} = \Pr(i_t \,\&\, j_t \in s_t)$ are assumed to be strictly positive and known. Let $y_{i_t}$ be the
value of the variable of interest, $Y$, for the $i$th population element, with which also
is associated an auxiliary variable $x_{i_t}$ for the $t$th population. For the elements $i_1 \in s_1$,
we observe $(y_{i_1}, x_{i_1}, z_{i_1})$ in the main or first survey. In the second independent
survey, we observe $(x_{i_2}, z_{i_2})$, $i = 1, 2, \ldots, n_2$. The population total of the common
auxiliary variable, $X = \sum_{i_t=1}^{N_t} x_{i_t}$, $t = 1, 2$, is assumed to be accurately known in both
surveys. The objective is to estimate the population total $Y = \sum_{i_1=1}^{N} y_{i_1}$ using auxiliary
information available in both surveys.
6.11.1 COMMON VARIABLES USED FOR FURTHER CALIBRATION OF WEIGHTS
Suppose that two sample surveys have one variable in common. If the population
total of this variable is known, then it can be used as a control variable in GREG.
Suppose the population total of this common variable is unknown. Let it in the first
survey or sample $s_1$ of $n_1$ units take values $z_{i_1}$, $i = 1, 2, \ldots, n_1$. In the second
independent survey or sample of $n_2$ units it takes values $z_{i_2}$, $i = 1, 2, \ldots, n_2$, defining
estimators of the unknown common total as $\hat{Z}_1 = \sum_{i \in s_1} d_{i_1}z_{i_1}$ and $\hat{Z}_2 = \sum_{i \in s_2} d_{i_2}z_{i_2}$, where
$d_{i_t} = 1/\pi_{i_t}$ such that $E(\hat{Z}_t) = Z$, the unknown total for the common variable $z_i$. Let
us define $\hat{Z} = \sum_{t=1}^{2} a_t\hat{Z}_t$, where $\sum_{t=1}^{2} a_t = 1$, as an unbiased estimator of $Z$ based on the
estimators obtained in both surveys. Let us consider a new estimator of population
total, $Y$, as
$$\hat{Y}_p = \sum_{i_1 \in s_1} r_{i_1}y_{i_1} \tag{6.11.1}$$
with weights $r_{i_1}$ as close as possible for a given metric to the $w_{i_1}$, while respecting
the calibration equation
$$\sum_{i_1 \in s_1} r_{i_1}z_{i_1} = \hat{Z} . \tag{6.11.2}$$
A simple case is the minimization of chi square type of distance function given by
$$\sum_{i_1 \in s_1}(r_{i_1} - w_{i_1})^2(w_{i_1}h_{i_1})^{-1}, \tag{6.11.3}$$
Substitution of the value of $r_{i_1}$ from (6.11.4) in (6.11.1) leads to a new estimator of
population total from the first survey as
$$\hat{Y}_p = \sum_{i_1 \in s_1} d_{i_1}y_{i_1} + \hat{\beta}_{11}(X - \hat{X}_{HT}) + \hat{\beta}_{21}(\hat{Z} - \hat{Z}_{GREG}) \tag{6.11.5}$$
where
$$\hat{X}_{HT} = \sum_{i_1 \in s_1} d_{i_1}x_{i_1},$$
Now from Renssen and Nieuwenbroek (1997) we have the following corollary:
where $\hat{\beta}_{11}$ and $\hat{\beta}_{21}$ have their usual meanings. On comparing (6.11.5) with
(6.11.6), one can easily see that we are using the regression type estimator for
estimating unknown total, $Z$, given by $\hat{Z}_{GREG}$, whereas in (6.11.6), Renssen and
Nieuwenbroek (1997) used the Horvitz and Thompson (1952) type estimator, given
by $\hat{Z}_{HT}$.
6.11.2 ESTIMATION OF VARIANCE USING DUAL FRAME SURVEYS
(6.11.8)
Several estimators of variance can be shown as special cases of this estimator. Let
us consider the following case:
Case I. Suppose both surveys used the simple random sampling without
replacement (SRSWOR) scheme for selecting the same number, $n$, of the units.
Under such situations, $\pi_{i_t} = n/N$ and $\pi_{ij(t)} = n(n-1)/\{N(N-1)\}$. If we choose the
weights qi, =I/Xi, and hi, =I/=i" then the calibrated weights areri, = ~(i/iHT)
and the estimator of variance becomes
[T--d-,-)2].
N2(1_ f) ritei, ri,ei,
Vo (yp(t))=
{
,2: ,2: Dij(t J!ij(tXri, ei,-ri/i} + (6.11.9)
A A
n IES, JE S, I, J,
Hartley (1962,1974) pointed out that a dual frame design can result in considerable
cost savings over a single frame design with comparable design. For example, if the
frame $\Omega_1$ is an area frame and frame $\Omega_2$ is a list frame, the frame $\Omega_1$ may be
complete but expensive to sample, while the frame $\Omega_2$ may be incomplete but has a lower
cost per unit sampling. Bankier (1986), Fuller and Burmeister (1972), Kalton and
Anderson (1986), Skinner (1991), and Skinner and Rao (1996) have considered the
problem of estimation of population total using dual frame surveys. Lohr and Rao
(2000) have considered several estimators of population total in dual frame surveys
and compared them under a unified setup. They also considered the problem of
estimation of variance of the estimator of population total under dual frame survey
through the concept of Jackknifing. Deville and Goga (2002) studied Horvitz and
Thompson estimator when information is gathered on two samples.
6.12 ESTIMATION OF MEDIAN USING TWO-PHASE SAMPLING
In two-phase sampling, in the first phase we select a preliminary large sample $s'$ of
$n'$ units by simple random sampling without replacement (SRSWOR) and only the
auxiliary character $X$ is measured. Let $\hat{M}_x'$ be the estimator of median $M_x$ of the
auxiliary character $X$ based on the first phase sample. In the second phase a
sub-sample $s$ of $n$ units is drawn from the preliminary large sample by SRSWOR and
both the study variable $Y$ and the auxiliary variable $X$ are measured. Let $\hat{M}_x$ and
$\hat{M}_y$ denote the estimators of $M_x$ and $M_y$ respectively, based on the sample drawn
at the second phase.
Following Srivastava and Jhajj (1981) and Srivastava (1971), Singh, Joarder, and
Tracy (2001) proposed a general class of estimators for estimating the median as
$$\hat{M}_1 = H(\hat{M}_y, U) \tag{6.12.1}$$
where $U = \hat{M}_x/\hat{M}_x'$. Whatever may be the sample chosen, $H(\hat{M}_y, U)$ assumes the
value in a bounded closed convex sub-set $R_2$ of two-dimensional real space
containing the point $(M_y, 1)$ such that $H(M_y, 1) = M_y$. Using a first order Taylor's
series expansion around the point $(M_y, 1)$ we have
$$\hat{M}_1 = H(M_y, 1) + (\hat{M}_y - M_y)H_1(M_y, 1) + (U - 1)H_2(M_y, 1) + O(n^{-1}) \tag{6.12.2}$$
where $H_1(M_y, 1)$ and $H_2(M_y, 1)$ denote the first order partial derivatives of $H(\hat{M}_y, U)$
with respect to $\hat{M}_y$ and $U$, respectively.
Furthermore, under the assumptions that $H_1(M_y, 1) = 1$ and that $H_2(M_y, 1)$ is an
unknown constant, we have the following theorems:
and
1
given by
where $f_y(M_y)$ denotes the value of the marginal density $f_y(y)$ at the median value
of $y$.
Proof. Follows from Chapter 3.
(6.12.6)
Theorem 6.12.3. The estimators Mland Mllr have the same asymptotic variance
because
E(d)= d + O(n-I ).
Proof. Define
s =
o JAM J
JAMJ_ 1 s
' 1
= JY~Myl-l
Jy My '
and
such that
,M'
MJ=M y ---"L
A
Mx ( ' J, M A
4=M
'M
('Jd
y --,;-f
Mx
, and Ms=M y
A A
[
dMx"
A
+
M
d]
,
(~\,'I '
x
1 )lVl
which are in fact analogues of the ratio estimator and the estimators proposed by
Srivastava (1967) and Walsh (1968) in double sampling, respectively, are also
special cases of the class of estimators $\hat{M}_1$.
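The ratio type member of this class can be sketched in code. The following is a minimal sketch, assuming that the ratio analogue takes the form $\hat{M}_y\,\hat{M}_x'/\hat{M}_x$, that is $H(\hat{M}_y, U) = \hat{M}_y/U$ with $U = \hat{M}_x/\hat{M}_x'$, and that sample medians are used as the estimators of the medians. The function name and the data are hypothetical.

```python
from statistics import median


def ratio_median_estimate(x_first_phase, x_second_phase, y_second_phase):
    """Ratio type two-phase estimate of the median of y (assumed form M_y * M_x' / M_x)."""
    m_x_first = median(x_first_phase)    # median of x in the preliminary large sample
    m_x = median(x_second_phase)         # median of x in the second phase sample
    m_y = median(y_second_phase)         # median of y in the second phase sample
    if m_x == 0:
        raise ValueError("second phase median of x must be non-zero")
    return m_y * m_x_first / m_x


# Hypothetical data: 7 first phase units, 3 of which are observed at the second phase
x_first = [2, 4, 6, 8, 10, 12, 14]
x_second = [4, 6, 10]
y_second = [9, 12, 21]

estimate = ratio_median_estimate(x_first, x_second, y_second)
print(estimate)
```

Here the second phase sample under-represents large $x$ values (its median is 6 against 8 in the first phase sample), so the estimate of the median of $y$ is adjusted upwards by the factor 8/6.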
Theorem 6.12.4. The general class of estimators, $\hat{M}_1$, is always more efficient than
the ratio estimator $\hat{M}_3$.
Proof. If we set $V(\hat{M}_1) < V(\hat{M}_3)$, where $V(\hat{M}_3)$ denotes the variance of the ratio
estimator $\hat{M}_3$, then it reduces to
Suppose n.; is the number of units in the second phase sample SII with X s Mx'.
Thus if the values of Pij (i, j = 1,2) are known we can predict P by
(6. 12.10)
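because $P_{\cdot j} = P_{1j} + P_{2j} \approx 0.5$, $j = 1, 2$. If we replace $P_{ij}$ by $\hat{p}_{ij}$ from the sample in
(6.12.10) we obtain an estimator of $P$ as
$$\hat{P}' = n^{-1}\left[n_x'\hat{p}_{11}/\hat{p}_{\cdot 1} + (n - n_x')\hat{p}_{12}/\hat{p}_{\cdot 2}\right] \approx (2n^{-1})\left[n_x'\hat{p}_{11} + (n - n_x')(0.5 - \hat{p}_{11})\right] \tag{6.12.11}$$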
Theorem 6.12.6. The variance of the estimator M6 ' up to terms of order O~l -I ), is
V(M 6 )=(ry(MJ-Z[(~ - ~ )± - 4(~ - ~J~I-a.25f J (6.12.13)
So we have
M6 -My = (rAM y)}- I[F)M 6 )- FAMy)]+op(nO.5)
=try(My)}-I [p,'- Py1+0 p(nO.5).
Now we have
where p,', is defined by a cross classification of the first phase sample using if .~
and if ~ as the cut-offs. Covj denotes the covariance term for the given first phase
sample. Using (6.12.16) in (6.12.15) and then using the result in (6.12 .14), we have
the theorem .
Suppose $\tilde{F}_{y1}(y)$ and $\tilde{F}_{y2}(y)$ denote the proportion of units in the second phase
sample for which $X \le \hat{M}_x'$ and $X > \hat{M}_x'$, respectively, that have $y$
values less than or equal to $y$. Then $F_y(y)$ can be estimated by
(6.12.17)
where /I x is the number of units in the preliminary large sample with X ::; Mx ' .
0
(6.12.18)
Proof. We have
Fy(M 7 )=Fy[My+(M 7 -M y)] = Fy(My)+fy(M Y XM 7 -My]+0 p(n-O.5).
This implies
M7 -My= {rAMy)}-'[F)M7 )- FAMy)]+op(n-O.5)
= {rAM>.)}-'[0.5- p y]+op(n-O.5).
For large N we have FylMy)"" FytlMy)+Fy2lM y), so that
V[Fy(My)]= EtV2[FAMy)]+I'\E2[FAMJ (6.12.20)
Now we have
We obtain the optimum first phase and second phase sample sizes for the fixed cost
as well as for the fixed variance cases.
Let $C_1$ and $C_2$ denote the cost per unit in the second phase and first phase,
respectively, then the fixed cost $C$ is given by
$$C = nC_1 + n'C_2 . \tag{6.12.21}$$
The variance of $\hat{M}_i$, $i = 1, 2, 4, 5, 6, 7$ will be minimum for the fixed cost $C$ given by
(6.12.21) if the optimum values of $n$ and $n'$ are, respectively, given by
C~2~ ,(1- 2~ ,)
Also this variance Vo can be achieved by M; for i = 1,2,4,5 ,6,7 for the minimum
cost function (6.12 .21) if the optimum values of nand n' are, respectively
{rAMy)}-2 ~2~ ,(1- 2~ I)[~2~ 1(1- 2~ l)cl + 2~1~, - 0.251]
(6.12.24)
n = Je; ( Vo+ {{)M y )}-2 /(4N) ] ,
and
21~ I -0.251{ry(My)}-2[~2~ ,(1- 2~ l)cl + 2~1~, - 0.251] (6.12.25)
n' = --'---'-'-----'-'--""-'~=C~2('---Vo-'-'+- -'{f':"":"')r-M---,
y)""}-2'"'/(:-'--4N--:-)--3-]----'--'---'-'-------'-' .
Kuk and Mak (1994) proposed another ingenious method, which can also be extended
to double sampling, for estimating the finite population distribution function of $Y$
defined as:
$$F(y) = N^{-1}\sum_{i=1}^{N}\Delta(y - y_i),$$
where $\Delta(a) = 1$ when $a \ge 0$ and $\Delta(a) = 0$ otherwise. Note that $F$ simply puts
probability $N^{-1}$ at each $y_i$, $i = 1, 2, ..., N$, assuming them to be distinct. The naive
estimator of $F$ is the sample distribution function defined as:
where y and x are the second phase sample means of y and x, respectively, and x'
is the first phase sample mean of x variable . Also h(a)=(~'} is a function
satisfying h(x) = x' for the given first phase sample . Let G denote the population
distribution function of the auxiliary variable x. Also let (i' and G be the sample
distribution functions of the auxiliary variable X based on the first and second
phase sample information respectively. An obvious choice of h(-) is
$$h(\cdot) = G' \circ G^{-1}(\cdot), \qquad (6.12.28)$$
where $G^{-1}$ denotes the inverse of the function $G$ and $\circ$ denotes the composition
first phase sample evenly to the n points . Also B= {.L {I(ox; )}{oL 24 • is the
IES v Xi IES v ( Xi
weighted least square estimator of the regression coefficient. After the redistribution
of mass based on a given first phase sample, we obtain Q' and Q as
$$Q'(y) = (n'n)^{-1}\sum_{i\in s'}\sum_{j\in s}\Delta(y - y_{ij}) = n'^{-1}\sum_{i\in s'}P_i(y), \qquad (6.12.31)$$
where $P_i(y) = n^{-1}\sum_{j\in s}\Delta(y - y_{ij})$ is an estimator of $\Pr[Y \le y \mid X = x_i]$ under the model
(6.12.33)
where $x = G^{-1}(\alpha)$ with $\alpha = F(y)$ and $H(x, y)$ denotes the finite population joint
distribution of $X$ and $Y$.
.PI I (y)- F(y) '" hy)- F(y)}- {G(x)- G(x )}+ {G' (x)-G(x)}. (6.12.35)
Chen and Qin (1993) suggested an interesting empirical likelihood method for
quantile estimation in the presence of auxiliary information. Following their
notation, they considered the problem of estimation of $\theta = E\{T(X,Y)\}$ subject to the
constraint $0 = E\{w(X)\}$. An estimator of any population parameter $\theta$ based on the
method of empirical likelihood function is given by
$$\hat{\theta}_n = \frac{1}{n}\sum_{i\in s} T(x_i, y_i)/\{1 + \lambda w(x_i)\} \qquad (6.12.36)$$
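The estimator (6.12.36) can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: it takes the multiplier $\lambda$ to be the root of $\sum_{i\in s} w(x_i)/\{1+\lambda w(x_i)\} = 0$, which is the usual empirical likelihood condition and is assumed here rather than taken from the text, and the function names `el_multiplier` and `el_estimate` as well as the toy data are hypothetical.

```python
def el_multiplier(w, iterations=200):
    """Find lam with sum(w_i / (1 + lam * w_i)) = 0 by bisection.

    Assumption: this is the usual empirical likelihood condition; the
    w_i must take both negative and positive values.
    """
    lo_w, hi_w = min(w), max(w)
    if not lo_w < 0.0 < hi_w:
        raise ValueError("w must take both negative and positive values")
    # Keep every 1 + lam * w_i strictly positive.
    a = (-1.0 / hi_w) * (1.0 - 1e-9)
    b = (-1.0 / lo_w) * (1.0 - 1e-9)

    def g(lam):
        return sum(wi / (1.0 + lam * wi) for wi in w)

    # g is strictly decreasing on (a, b), positive near a and negative near b.
    for _ in range(iterations):
        mid = 0.5 * (a + b)
        if g(mid) > 0.0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def el_estimate(t_values, w):
    """Evaluate (1/n) * sum of T_i / (1 + lam * w_i)."""
    lam = el_multiplier(w)
    n = len(w)
    return sum(t / (1.0 + lam * wi) for t, wi in zip(t_values, w)) / n


# Toy data: second phase x values and a first phase mean of 3.0 (hypothetical).
x_second = [1.0, 2.0, 3.0, 4.0]
x_bar_first = 3.0
w = [xi - x_bar_first for xi in x_second]   # the choice w(x) = x - first phase mean
theta_x = el_estimate(x_second, w)          # T(x, y) = x
```

With $T(x,y) = x$ and this choice of $w$, the estimate reproduces the first phase mean, which is a convenient sanity check on the weights.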
Case I. If $w(x) = x - \bar{x}'$, where $\bar{x} = n^{-1}\sum_{i\in s}x_i$ and $\bar{x}' = n'^{-1}\sum_{i\in s'}x_i$ are the second phase
and first phase sample means, respectively, of the auxiliary character $X$, then
the estimator (6.12.36) will be an analogue of the Hartley and Rao
(1968) estimator in double sampling.
Case II. If $w(x) = I[x \le \hat{M}_x'] - 0.5$ and $\theta$ is the median of $X$, then the estimator
(6.12.36) will lead to the position estimator in double sampling.
Case III. If $w(x_i) = x_i - \bar{x}'$, $x_i$ is a 0-1 variable, and $\bar{x}'$ is the estimator of the
population proportion $\bar{X}$ (say) based on the first phase sample information, then
the estimator (6.12.36) leads to an analogue of the post-stratified estimator of Silva
and Skinner (1995) in double sampling.
Case IV. The ratio type estimator proposed by Garcia and Cebrian (1998) is also a
special case of it.
Example 6.12 .7.1. The amounts of the real and nonreal estate farm loans (in $000)
during 1997 in the United States have been presented in population I given in the
Appendix . Suppose we selected an SRSWOR first phase sample of ten states to
collect the information on nonreal estate farm loans only. From the given first phase
sample of ten units, we selected a second phase sample of eight states and both the
real and nonreal estate farm loans were observed . Find the relative efficiency of the
ratio estimator of median, for estimating median of the amount of the real estate
farm loans during 1997 by using information on the nonreal estate farm loans
during 1997 collected in the first phase sample, with respect to the usual estimator
of population median. Assume that both the real and nonreal estate farm loans
follow independent normal distributions.
Solution. From the description of the population, we have $Y_i$ = Amount (in $000)
of the real estate farm loans in different states during 1997, $X_i$ = Amount (in $000)
of the nonreal estate farm loans in different states during 1997, $N = 50$,
$M_y = 322.305$, $M_x = 452.517$, $\mu_y = 555.434$, $\mu_x = 878.162$, $\sigma_x = 1073.776$,
$\sigma_y = 578.948$, and $P_{11} = 0.42$.
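Under the normality assumption, the density functions are
$$f_x(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\,e^{-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2} \quad\text{and}\quad f_y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y}\,e^{-\frac{1}{2}\left(\frac{y-\mu_y}{\sigma_y}\right)^2},$$
which implies that
$$f_x(M_x) = \frac{1}{\sqrt{2\pi}\,(1073.776)}\,e^{-\frac{1}{2}\left(\frac{452.517-878.162}{1073.776}\right)^2}.$$
The ratio estimator of the median is
$$\hat{M}_{RD} = \hat{M}_y\left[\frac{\hat{M}_x'}{\hat{M}_x}\right]$$
with variance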
*
8 10 4 452.517 4
Thus the percent relative efficiency (RE) of the ratio estimator $\hat{M}_{RD}$ with respect to
the usual estimator $\hat{M}_y$ is given by
which shows that the ratio estimator is more efficient than the usual estimator of
population median.
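The density values that enter the example can be evaluated directly from the population constants quoted in the solution. The short Python sketch below does only that arithmetic; the helper name `normal_pdf` is hypothetical, and the normality of both variables is the assumption stated in the example, not something verified here.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Constants quoted in the solution of Example 6.12.7.1.
M_y, M_x = 322.305, 452.517
mu_y, mu_x = 555.434, 878.162
sigma_y, sigma_x = 578.948, 1073.776

f_x_at_median = normal_pdf(M_x, mu_x, sigma_x)   # density of X at its median
f_y_at_median = normal_pdf(M_y, mu_y, sigma_y)   # density of Y at its median
```

Both values are small (of the order of $10^{-4}$) because the loans are recorded in thousands of dollars; it is their reciprocals squared that drive the variance of a median estimator.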
6.13 DISTRIBUTION FUNCTION WITH TWO-PHASE SAMPLING
where $\hat{F}_x'(t) = \frac{1}{n'}\sum_{i=1}^{n'}\Delta(t - x_i)$, $\hat{F}_x(t) = \frac{1}{n}\sum_{i=1}^{n}\Delta(t - x_i)$ and $\hat{F}_y(t) = \frac{1}{n}\sum_{i=1}^{n}\Delta(t - y_i)$. The
preliminary large sample consists of $n'$ units selected by SRSWOR sampling, where only
the auxiliary variable $X$ is measured. In the second phase, a sub-sample of $n$ units is
drawn from the preliminary sample of $n'$ units through SRSWOR sampling and
both the study variable $Y$ and the auxiliary variable $X$ are measured on the
selected units. The following lemmas are needed to find the variance of the above
estimator.
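The sampling scheme and the three sample distribution functions just described can be sketched as follows. This is a minimal Python illustration under assumptions: the population of $(x, y)$ pairs is synthetic, and the names `delta`, `edf` and `two_phase_srswor` are hypothetical helpers, not notation from the text.

```python
import random


def delta(a):
    """Indicator used in the text: 1 when a >= 0, else 0."""
    return 1 if a >= 0 else 0


def edf(values, t):
    """Sample distribution function: share of values not exceeding t."""
    return sum(delta(t - v) for v in values) / len(values)


def two_phase_srswor(N, n_first, n_second, rng):
    """Return unit labels of the first phase sample and of its sub-sample."""
    first = rng.sample(range(N), n_first)    # SRSWOR from the population
    second = rng.sample(first, n_second)     # SRSWOR from the first phase sample
    return first, second


# Synthetic population of (x, y) pairs, for illustration only.
rng = random.Random(2003)
population = [(float(i), 2.0 * i + rng.gauss(0.0, 1.0)) for i in range(1, 51)]
first, second = two_phase_srswor(len(population), 10, 8, rng)

t = 25.0
F_x_first = edf([population[i][0] for i in first], t)     # x on all n' units
F_x_second = edf([population[i][0] for i in second], t)   # x on the n units
F_y_second = edf([population[i][1] for i in second], t)   # y is observed only here
```

Only `F_x_first` uses the extra first phase information; any two-phase estimator of the distribution function of $Y$ has to combine it with the two second phase quantities.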
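Proof. Let $E_1$ and $E_2$ denote the expected values over all possible first phase
samples and for a given first phase sample, respectively. Also let the variances $V_1$
and $V_2$ be similarly defined. Then we have
$$V[\hat{F}_y(t)] = E_1V_2\{\hat{F}_y(t)\} + V_1E_2\{\hat{F}_y(t)\}$$
$$= E_1V_2\left\{\frac{1}{n}\sum_{i=1}^{n}\Delta(t - y_i)\right\} + V_1E_2\left\{\frac{1}{n}\sum_{i=1}^{n}\Delta(t - y_i)\right\}$$
$$= \frac{(N-n)}{nN^2}\left[\sum_{i=1}^{N}\{\Delta(t - y_i)\}^2 - \frac{1}{N-1}\sum_{i\ne j=1}^{N}\Delta(t - y_i)\Delta(t - y_j)\right].$$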
Hence the lemma.
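The closed form of this variance can be checked by brute force on a small population. The Python sketch below rests on one known fact: a sub-sample of $n$ units drawn by SRSWOR from an SRSWOR sample of $n'$ units is itself an SRSWOR sample of $n$ units from the population, so it is enough to enumerate all samples of size $n$. The function names and the toy values are hypothetical.

```python
from itertools import combinations


def delta(a):
    """Indicator: 1 when a >= 0, else 0."""
    return 1.0 if a >= 0 else 0.0


def lemma_variance(y, n, t):
    """(N-n)/(n N^2) * [sum z_i^2 - (1/(N-1)) * sum_{i != j} z_i z_j], z_i = delta(t - y_i)."""
    N = len(y)
    z = [delta(t - yi) for yi in y]
    sum_sq = sum(zi * zi for zi in z)
    total = sum(z)
    cross = total * total - sum_sq          # sum over i != j of z_i * z_j
    return (N - n) / (n * N ** 2) * (sum_sq - cross / (N - 1))


def exact_variance(y, n, t):
    """Variance of the sample distribution function over all SRSWOR samples of size n."""
    estimates = [sum(delta(t - yi) for yi in s) / n for s in combinations(y, n)]
    mean = sum(estimates) / len(estimates)
    return sum((e - mean) ** 2 for e in estimates) / len(estimates)


y_values = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0]   # toy population, N = 6
v_formula = lemma_variance(y_values, 3, 4.5)
v_exact = exact_variance(y_values, 3, 4.5)
```

For this toy population the two numbers agree, which is a useful guard against sign or index slips when the formula is typed by hand.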
Proof. We have
$$\mathrm{Cov}\{\hat{F}_y(t), \hat{F}_x(t)\} = E_1\left[C_2\{\hat{F}_y(t), \hat{F}_x(t)\}\right] + C_1\left[E_2\{\hat{F}_y(t)\}, E_2\{\hat{F}_x(t)\}\right]$$
(n'-n)
= -(-,-)E1
{I---;-l:tl(t - Y;)l:tl(t - x;) {I---;-l:tl(t - x;),---;-l:tl(t
n' n' }
+ C,
n' 1 - y;)n' }
n n -I n i~1 i~' n i~' n i~'
which proves the lemma.
$$= \frac{(N-n')}{n'N^2}\left[\sum_{i=1}^{N}\Delta(t - y_i)\Delta(t - x_i) - \frac{1}{N-1}\sum_{i\ne j=1}^{N}\Delta(t - y_i)\Delta(t - x_j)\right]$$
which proves the lemma.
+2{ ~& n[cov {fry (t), fr;(t )}- Cov {fry(t ),frAt)}] .
Proof. The proof of this theorem follows directly from elementary concepts. The
algebraic expression for $V\{\hat{F}_{RD}(t)\}$ may be obtained by using the lemmas.
This section is designed to improve on the work of Hidiroglou and
Sarndal (1995, 1998), and hence that of Singh (2000b). It is based on the Golden
Jubilee Year 2003 celebration by Singh (2003c) of the traditional linear regression
estimator, owed to Hansen, Hurwitz, and Madow (1953), for its outstanding
performance in the literature. It is shown that the chain regression estimator is
unique in its class of estimators.
where $d_{1i}^{\oplus}$ are the calibrated plus weights obtained from the first phase sample
information. Although these weights can be chosen in many ways, we will
discuss here only the simple case. We choose the first phase calibrated weights $d_{1i}^{\oplus}$
such that the chi square distance defined as
The choice of $q_{1i}^{\oplus}$ decides the form of the estimator. The improved first phase
Lagrange function is then defined as
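$$L_1^{\oplus} = \sum_{i\in s_1}\frac{(d_{1i}^{\oplus} - d_{1i})^2}{d_{1i}q_{1i}^{\oplus}} - 2\lambda_{11}^{\oplus}\left[\sum_{i\in s_1}d_{1i}^{\oplus}x_{1i} - X_1\right] - 2\lambda_{12}^{\oplus}\left[\sum_{i\in s_1}d_{1i}^{\oplus} - \sum_{i\in s_1}d_{1i}\right]. \qquad (6.14.1.5)$$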
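On differentiating (6.14.1.5) with respect to $d_{1i}^{\oplus}$ and equating to zero we have
$$d_{1i}^{\oplus} = d_{1i} + \lambda_{11}^{\oplus}q_{1i}^{\oplus}d_{1i}x_{1i} + \lambda_{12}^{\oplus}d_{1i}q_{1i}^{\oplus}. \qquad (6.14.1.6)$$
On substituting (6.14.1.6) in (6.14.1.3) we have
$$\lambda_{11}^{\oplus}\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}x_{1i}\right) + \lambda_{12}^{\oplus}\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}\right) = 0, \qquad (6.14.1.7)$$
$$\lambda_{11}^{\oplus}\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}x_{1i}^2\right) + \lambda_{12}^{\oplus}\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}x_{1i}\right) = \left(X_1 - \sum_{i\in s_1}d_{1i}x_{1i}\right). \qquad (6.14.1.8)$$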
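On solving (6.14.1.7) and (6.14.1.8) for $\lambda_{11}^{\oplus}$ and $\lambda_{12}^{\oplus}$, and substituting back in
(6.14.1.6), we have the modified first phase calibrated weights as
$$d_{1i}^{\oplus} = d_{1i} + \frac{(x_{1i}d_{1i}q_{1i}^{\oplus})\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}\right) - (d_{1i}q_{1i}^{\oplus})\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}x_{1i}\right)}{\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}\right)\left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}x_{1i}^2\right) - \left(\sum_{i\in s_1}d_{1i}q_{1i}^{\oplus}x_{1i}\right)^2}\left[X_1 - \sum_{i\in s_1}d_{1i}x_{1i}\right]. \qquad (6.14.1.9)$$
The resulting estimator of $X_2$ is
$$\hat{X}_2^{\oplus} = \sum_{i\in s_1}d_{1i}x_{2i} + \hat{\beta}_{1(ols)}^{\oplus}\left[X_1 - \sum_{i\in s_1}d_{1i}x_{1i}\right], \qquad (6.14.1.10)$$
where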
(6.14.1.11)
Note that there is no choice of $q_{1i}^{\oplus}$ which reduces the estimator (6.14.1.10) into the ratio
or product method of estimation, and this leads to the following theorem.
Theorem 6.14.1.1. The traditional first phase calibrated linear regression estimator
of the population total $X_2$ is unique in its class of estimators.
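The closed form (6.14.1.9) can be checked numerically. The following is a minimal Python sketch, not part of the original development: the function name `first_phase_calibrated_weights` and the toy values of `d`, `x1`, `x2`, `q` and `X1` are illustrative assumptions. It evaluates the weights so that one can confirm the two constraints behind (6.14.1.7) and (6.14.1.8): the calibrated weights keep the same sum as the design weights, and they reproduce the known total of the first auxiliary variable.

```python
def first_phase_calibrated_weights(d, x1, q, X1):
    """Weights of the form (6.14.1.9) for design weights d, auxiliary values x1,
    tuning constants q and known population total X1 (all hypothetical inputs)."""
    B = sum(di * qi for di, qi in zip(d, q))                         # sum d q
    A = sum(di * qi * xi for di, qi, xi in zip(d, q, x1))            # sum d q x
    C = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x1))       # sum d q x^2
    denom = B * C - A * A
    shortfall = X1 - sum(di * xi for di, xi in zip(d, x1))           # X1 - sum d x
    return [
        di + ((xi * di * qi) * B - (di * qi) * A) / denom * shortfall
        for di, qi, xi in zip(d, q, x1)
    ]


# Toy first phase sample of four units (illustrative values only).
d = [5.0, 5.0, 5.0, 5.0]        # design weights
x1 = [1.0, 2.0, 3.0, 4.0]       # first auxiliary variable
x2 = [2.0, 3.0, 5.0, 6.0]       # second auxiliary variable
q = [1.0, 1.0, 1.0, 1.0]        # tuning constants
X1 = 60.0                       # known population total of x1

w = first_phase_calibrated_weights(d, x1, q, X1)
X2_hat = sum(wi * x2i for wi, x2i in zip(w, x2))   # calibrated estimate of the total of x2
```

With equal design weights and unit tuning constants the calibrated estimate coincides with the ordinary least squares regression adjustment, which is the sense in which the regression estimator arises from this calibration.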
where $d_i^{\oplus}$ are called the second phase calibrated plus weights. Let us choose the
second phase calibrated weights $d_i^{\oplus}$ such that the chi square distance function
defined as
$$D_2^{\oplus} = \sum_{i\in s_2}\frac{(d_i^{\oplus} - d_{1i}d_{2i})^2}{d_{1i}d_{2i}q_{2i}^{\oplus}}, \qquad (6.14.2.2)$$
is minimum subject to the two calibration constraints defined as
$$\sum_{i\in s_2}d_i^{\oplus} = \sum_{i\in s_2}d_{1i}d_{2i}, \qquad (6.14.2.3)$$
and
$$\sum_{i\in s_2}d_i^{\oplus}x_{2i} = \hat{X}_2^{\oplus}, \qquad (6.14.2.4)$$
where $\hat{X}_2^{\oplus}$ is given by (6.14.1.1) after fixing first phase calibration. The choice of
$q_{2i}^{\oplus}$ makes different forms of estimators in two-phase sampling. The modified
second phase Lagrange function is given by
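$$L_2^{\oplus} = \sum_{i\in s_2}\frac{(d_i^{\oplus} - d_{1i}d_{2i})^2}{d_{1i}d_{2i}q_{2i}^{\oplus}} - 2\lambda_{21}^{\oplus}\left[\sum_{i\in s_2}d_i^{\oplus}x_{2i} - \hat{X}_2^{\oplus}\right] - 2\lambda_{22}^{\oplus}\left[\sum_{i\in s_2}d_i^{\oplus} - \sum_{i\in s_2}d_{1i}d_{2i}\right], \qquad (6.14.2.5)$$
where $\lambda_{21}^{\oplus}$ and $\lambda_{22}^{\oplus}$ are Lagrange multipliers. On setting $\partial L_2^{\oplus}/\partial d_i^{\oplus} = 0$ we have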
(6.14.2.6)
From (6.14.2.3) and (6.14.2.6) we have
On solving (6.14.2.7) and (6.14.2.8) for $\lambda_{21}^{\oplus}$ and $\lambda_{22}^{\oplus}$, and substituting them in
(6.14.2.), we obtain the improved second phase calibrated weights, which lead to the estimator
$$\hat{Y}_c^{\oplus} = \sum_{i\in s_2}d_{1i}d_{2i}y_i + \hat{\beta}_{2(ols)}^{\oplus}\left[\hat{X}_2^{\oplus} - \sum_{i\in s_2}d_{1i}d_{2i}x_{2i}\right], \qquad (6.14.2.10)$$
where
Note that if $q_{1i}^{\oplus} = 1$ and $q_{2i}^{\oplus} = 1$, then the resultant calibrated estimator (6.14.2.10)
can be claimed as a traditional chain regression type estimator in survey sampling.
Theorem 6.14.2.1. The traditional chain regression estimator is unique in its class
of estimators.
Note that in the same way all the ten cases (See Exercise 6.27) considered by
Estevao and Sarndal (2002) can be improved and we are leaving it to the readers as
an exercise . The problem of estimation of variance with the help of modified two-
dimensional methodology as discussed in Chapter 5 can also be extended for two-
phase sampling, and will be discussed in the next volume of this book.
Exercise 6.1. Assume that a sample of size $m$ is selected using SRSWR sampling
out of $N$ units to observe the variable $X$, while a sub-sample of size $n$ is selected
out of $m$ units to observe the study variable $Y$ and auxiliary variable $X$ with the
same sampling strategy. Suppose $\bar{y}_n$, $\bar{x}_n$ denote the second phase sample means,
and $\bar{x}_m$ is the first phase sample mean.
( a ) Find the first order bias and mean squared error of the estimators of population
mean Y defined as
YB xn J
- = Y- n(XIII ' [Bose (1943) ]
[Patterson (1950)]
and
$$\bar{y}_{YR} = \bar{r}_n\bar{x}_m + \frac{n(m-1)}{m(n-1)}\left(\bar{y}_n - \bar{r}_n\bar{x}_n\right), \quad \text{[Yates (1949), Rao (1975)]}$$
where $\bar{r}_n = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{x_i}$.
( b ) Now suppose
CI = Cost per unit of observation on the auxiliary variable X ,
Exercise 6.2. Suppose two samples of sizes $n_1$ and $n_2$ are drawn independently
from a finite population of size $N$. Let $\bar{x}_1$ denote the sample mean of the auxiliary
variable $X$ of the first phase sample of size $n_1$, and let $\bar{y}_2$ and $\bar{x}_2$ denote the
sample means of the study variable $Y$ and auxiliary variable $X$, respectively, based on
the second phase sample of size $n_2$. Assuming the two samples are drawn
independently and the regression coefficient $\beta$ is known, consider the following
two estimators of the population mean $\bar{Y}$:
$$\bar{y}_0 = \bar{y}_2 + \beta(\bar{x}_1 - \bar{x}_2)$$
and
$$\bar{y}_1 = \bar{y}_2 + \beta(\bar{x} - \bar{x}_2), \quad \text{where } \bar{x} = a\bar{x}_1 + b\bar{x}_2 \text{ with } a + b = 1.$$
Show that
$$V(\bar{y}_0) - V(\bar{y}_1) \ge 0.$$
Hint: Shah and Gupta (1986).
Exercise 6.3. Suppose a sample of size $m$ is selected using SRSWOR sampling out
of $N$ units to observe the variates $X$ and $Z$, while a sub-sample of size $n$ is
selected out of $m$ units to observe the variates $Y$ and $X$ with the same sampling
strategy. Suppose $\bar{y}_n$ and $\bar{x}_n$ denote the second phase sample means, and $\bar{x}_m$ and
$\bar{z}_m$ are the first phase sample means for the associated variables. Assume that the
population mean $\bar{Z}$ of the second auxiliary variable $Z$ is known. Find the first
order bias and mean squared error of the estimators of population mean $\bar{Y}$ defined as
- _- [xm+b(Z-Zm)]
Y\ - Y n [Kiregyera (1980,1984)]
Xn
- - p[xzm Z-- - ]
Y2 = Yn +
m
XII [ Kiregyera (1980, 1984) ]
-
Y4
= -
Y II
(xmJ[ Z ++CCz ]a
-
Xn
-
Zm z
[Singh and Upadhyaya (1995)]
- _- [ Xm+b(Z-Zm)] ,
Ys - Yn
Xm +1l.(xlI - xm ) [Upadhyaya, Kushwaha, and Singh (1990)]
and
~ = q~ + ~~+ ~~ + ~~ + ~ ~+ ~Z
where $C_j$, $j = 1,2,3,4,5,6$, are suitably chosen constants such that the bias in $\bar{y}_6$ is
equal to zero.
Hint: Mishra and Rout (1997) .
Exercise 6.4. Using the concept of two-phase sampling study the asymptotic
properties of the following estimators of population mean Y defined as
Exercise 6.5. In a finite population $\Omega$ of size $N$, let the value of the variable
$y_j$ ($j = 0,1,2$) on the $i$-th unit be $Y_{ji}$. Let $\bar{Y}_j$ and $S_j^2$, respectively, denote the
population mean and mean square error of the $j$-th variable. Suppose we are
interested in estimating the ratio and product of the population means of the first two
variables, defined as $R = \bar{Y}_0/\bar{Y}_1$ and $P = \bar{Y}_0\bar{Y}_1$, respectively. Suppose a preliminary
large sample of $m$ units is drawn from the given population by the SRSWOR scheme
and only the auxiliary variable $y_2$ is measured on it. The main sample or second
phase sample of size $n$ is drawn either from the preliminary sample of size
$m$ ($> n$) or independently from the population using the SRSWOR scheme and variables
Exercise 6.6. Show that the following are unbiased estimators of the population
mean, Y ,
/1 = r(xI -x)+M(y-r x)+ Y, /z = r(x· -x)+aM(y-r x)+ Y,
and
r(xw- x)+ Mw(Y - r x)+ Y
/3 =
V(Yd)=(~_~J_l_I[(Yi-Y)-.BoIS(Xi-X)f + s; .
n m (n-3L=1 m
Hint: Tikkiwal (1960) .
fa
Yml = Y- j=1 Axr J1x}.)
is an unbiased estimator of the population mean.
( b ) Suppose that the sample elements are drawn one at a time, so the sample is an
ordered set, the order being the order of draw. Let $Z_a$ denote the ordered
set of observations on the first $a$ sample elements, $1 < a < n$, and let $a_j(Z_a)$
denote functions of these observations. Let $Y(a)$, $X_j(a)$ denote the sums and $\bar{y}(a)$,
$\bar{x}_j(a)$, $j = 1,2,...,p$, denote the means of the indicated observations on the first $a$
sample elements. Also let $E\{\cdot \mid Z_a\}$ denote the conditional expectation given $Z_a$.
Show that the following three estimators are unbiased estimators of the population
mean $\bar{Y}$, defined as
II = Y- f aj(Za~xr J1x.]
j=1 }
((N -)n) {y(a)- y - f aj(ZaXXj(a
n-a N j=1
)-xJ}
n-a )N s -v:«,
Iz=((N-a)n j=1 [Xj-J1x }' ]~ (a(N-n){_(
{- P (Za)f- -v- ( )
) - P a j (ZaAxja-xj
n-a )N ya-Y-'f.
j=1
-)t
and
where w(x i ) is some suitably chosen weight to form different kinds of estimators,
leads to the biased ratio estimator
iJy = J1xa(Ze)
where
y = n-
I
i~Y;/ Pi +.B[ m- i~X;/ Pi - n- i~ X;/ Pi] '
I I
Hint: Srivenkataramana and Tracy (1989), Sarndal and Swensson (1987) , Raj
(1964, 1965b).
Exercise 6.10. Consider a sequence of finite populations such that its $t$-th member
$U_t$ consists of $N_t$ units labelled $i = 1,2,...,N_t$, $t = 1,2,...$, and $N_t \to \infty$ as $t \to \infty$. Let
$o(1)$ be a quantity that tends to zero as $t \to \infty$. Let $w\ (> 0)$, $x$, $y$ denote
respectively a size measure, an auxiliary variable, and the variable of interest. Let
their respective values for the $i$-th unit ($i = 1,2,...,N_t$) of $U_t$ ($t = 1,2,...$) be $W_{it}$
(known for each $i$), $X_{it}$ and $Y_{it}$. The corresponding totals (means) over $U_t$ are
$W_t, X_t, Y_t$ ($\bar{W}_t, \bar{X}_t, \bar{Y}_t$). Let
$\mathbf{W}_t = (W_{1t},...,W_{N_tt})$, $\mathbf{X}_t = (X_{1t},...,X_{N_tt})$ and $\mathbf{Y}_t = (Y_{1t},...,Y_{N_tt})$.
From $U_t$ an initial sample $s_{1t}$ (each possible $s_{1t}$ with a fixed number $n_{1t}$ of distinct
units) is supposed to be drawn with a probability $p_{1t}(s_{1t})$ that may involve the $W_{it}$
for $i \in U_t$. For the units in $s_{1t}$ the $x$ values are supposed to be ascertained. From $s_{1t}$
a sub-sample $s_{2t}$ (say, each possible $s_{2t}$ with a given number $n_{2t} < n_{1t}$ of distinct
units) is to be chosen with a conditional probability $p_{2t}(s_{2t}) = p_{2t}(s_{2t} \mid s_{1t})$ given that
$p_{1t}(s_{1t}) > 0$. These probabilities may involve $W_{it}$ for $i \in U_t$ and also $X_{it}$ for $i \in s_{1t}$.
The overall two-phase sample $s_t = (s_{1t}, s_{2t})$ has the selection probability
$p_t(s_t) = p_{1t}(s_{1t})p_{2t}(s_{2t} \mid s_{1t})$, where $p_t$, $p_{1t}$ and $p_{2t}$ denote respectively the designs
corresponding to the double, first phase, and second phase sampling plans. Define
$E_{p_t}$, $E_{p_{1t}}$ and $E_{p_{2t}}$ to be the design expectation operators with respect to $p_t$, $p_{1t}$, and
$p_{2t}$, where $E_{p_{2t}}$ is a conditional expectation for each fixed $s_{1t}$ with $p_{1t} > 0$;
$I_{kit} = 1$ if $i \in s_{kt}$, and $= 0$ otherwise, for $k = 1,2$. Let $\pi_{1it}$ and $\pi_{1ijt}$ be the first
and second order inclusion probabilities according to design $p_{1t}$ and let
$r_{1it} = 1/\pi_{1it}$ be the design weight for every $s_{1t}$ with $p_{1t} > 0$. Let $\pi_{2it}$ and $\pi_{2ijt}$ be
the first and second order conditional inclusion probabilities with respect to design
$p_{2t}(\cdot \mid s_{1t})$ and let the design weights be $r_{2it} = 1/\pi_{2it}$.
Consider two estimators such as
$$e_{1t}(w) = \sum_{i\in s_{1t}}W_{it}r_{1it}$$
and
$$e_{2t}(w) = \sum_{i\in s_{2t}}W_{it}r_{1it}r_{2it}.$$
Similarly define $e_{1t}(x)$, $e_{2t}(x)$ and $e_{2t}(y)$ based on the information available from
survey data.
Find the bias and variance of the regression type estimator of population total
Y given by
( c) PPSWR sampling : Y3 = N xm I.
Yi .
m n i=IXi
Hint: Prabhu--Ajgaonkar (1975).
Exercise 6.12. Find the asymptotic bias and variance of the chain ratio type
r[;r, r[?; r,
estimators of finite population variance S; as
Yr = Y[ ;J
where Y and x are the means for 52 and x· is the mean for 51 . Define
_(.)=!IIX-Xj
X} II -I
if j E 5 2'
x if j E(51 - 52 ),
mx> x ,
and x·(j) = J if j E 51 . Show that the linear form of the modified Jackknife
m -I
estimator
• _ m -I ...{- (.) - }2
VJ - - - 4- Y r } - Y r
111 j e sl
is identical to usual estimator of variance under mild conditions.
Hint: Rao and Sitter (1995) , Sitter and Rao (1997), Rao (1996b).
Exercise 6.14. Suppose that two sample surveys have one variable in common, say
$Z$; that is, two agencies are collecting information on it. Suppose the population
total of this variable is unknown to both agencies. Let $X$ be the known auxiliary
variable and $Y$ be the variable under study. Suppose $\hat{Z}_1 = \sum_{i\in s_1}z_i/\pi_{1i}$ and
$\hat{Z}_2 = \sum_{i\in s_2}z_i/\pi_{2i}$ are the two different estimators of the unknown total $Z$ of the common
variable obtained by the two independent survey agencies. Study the properties of the
regression type estimators of the population total $Y$ defined as
Exercise 6.15. Consider the problem of estimating the population mean Y of the
study variable, Y, from a finite population of size N. Suppose the information on
an auxiliary variable, X, highly correlated with Y, is not available, but values of
X are assumed known over a large random sample of m units . Suppose that
information on another auxiliary variable , Z, is available on all units of the
population with population mean Z. Let (Yi' xi, Zi)for i = 1,2,..., n denote the
information collected on the second phase sample. Then study the asymptotic
properties of the following four estimators:
(a )/1 = Yn +byA(xm-xn)-bxJm-z)]; (b) 12 = Yn +byAxm-xn)+byAz -zn);
(c) 13 = Yn +byAxm-xn)+byz(Z -zn);
and
( d ) 14 = Yn +byAxm- xn)+byxbxz(Z - zm)+ byAz - zn).
Hint: Ahmed (1998), Mukerjee, Rao, and Vijayan (2000), Sahoo and Sahoo
(1999a, 1999b), Pradhan (2001).
( e) Develop a test statistic to test a hypothesis that known population mean Z can
be included or not in the regression type estimators using two-phase sampling.
Hint: Das and Bez (1995) .
( f) Study the asymptotic properties of these four estimators ( a ) to ( d ) under the
superpopulation models:
Yk =qlxk +elk
with
2k
E(elk I Xk) = 0, E(el I Xk)= axff ' a> °,g ~ 0, E(elkelj I Xk>xJ = 0, k '* } = 1,2,...,N,
and elk S are independent of x , and
Xk = q2 zk +e2k
with
E(e2klxk)=0, E(eiklxk)=CXZ,c>O,h~O,E(e2ke2jlxk,Xj)=0, k,*}=1,2, ...,N,
and elk S are independent of z . Assume el and e2 are also independent.
Hint: Sahoo and Sahoo (1999a, 1999b).
minimum.
Hint: Tripathi and Chaubey (1992).
Exercise 6.17. In the first phase we select a preliminary large sample s' of n' units
by SRSWOR and only the auxiliary variable X is measured . Let Mx be the I
where $U = \hat{M}_x/\hat{M}_x'$, $H(\hat{M}_y, U)$ assumes values in a bounded closed convex
subset $R_2$ of two-dimensional real space containing the point $(M_y, 1)$ such that
$H(M_y, 1) = M_y$ and satisfies certain regularity conditions.
( b ) Show that the following estimators
, _ -I n ,
where Yi = Y2 + ( Xi - x2 ) Pols for i = 1,2, ..., nand Y, = n LYi'
i=1
Hint: Kim (2001).
Exercise 6.19. Let the first phase sample $s_1$ of $n_1$ units be taken by SRSWOR
sampling, and the second phase sample $s_2$ of $n_2$ units be taken by PPSWR
sampling.
VY
, ) N2(1_
( =
Ji) 2
Sy+-- ,
V(Yp)
nl n2
and if the design is non-nested then:
v(y)= N2(I-Ji) R 2S; + V(Yp)[1 + (1- Ji) ~2 ],
nl n2 nl X
where
, ) 1 N ( \2 nl
V (Yp =-LPi Y;/Pi- Y ) , fl = - , Pli = x;/ L Xi' and Pi = X,! LXi '
nl i=1 N I iESI iEO
( C) Consider an estimator of Y as
Exercise 6.20. In the first phase sample $s_1$ consisting of $m$ units, measure the
values of the $p$ auxiliary variables $(x_{i1}, x_{i2}, ..., x_{ip})$, $i = 1,2,...,m$. In the second phase
sample $s_2 \subset s_1$ consisting of $n$ units, measure the study variable and auxiliary
variables as $(y_i, x_{i1}, x_{i2}, ..., x_{ip})$, $i = 1,2,...,n$. Let
$$\bar{x}_j^* = m^{-1}\sum_{i=1}^{m}x_{ij}, \quad j = 1,2,...,p$$
denote the means of the $p$ auxiliary variables based on the first phase sample. Also let
$$\bar{y} = n^{-1}\sum_{i=1}^{n}y_i \quad \text{and} \quad \bar{x}_j = n^{-1}\sum_{i=1}^{n}x_{ij}$$
denote the means of the study variable $Y$ and the $p$ auxiliary variates in the second phase
sample. Based on the $j$-th auxiliary variate define a difference estimator of the
population mean in two-phase sampling as
$$\bar{y}_{d(j)} = \bar{y} + b_j(\bar{x}_j^* - \bar{x}_j).$$
Now consider a weighted estimator of the population mean $\bar{Y}$ as
$$\bar{y}_{wd} = \sum_{j=1}^{p}w_j\bar{y}_{d(j)}, \quad \text{such that } \sum_{j=1}^{p}w_j = 1, \ 0 < w_j < 1.$$
where b 'k =
J m
(~_~)SZ
N Y
+(~-~)[b
n m J
·bkS ·k -b·S . -dkS ok ],
J J YJ >
SZ = (N -r)" I(Jj -
Y i=1
rt
Sjk=(N-ltl i%/Xji-XJXki-Xk} W=(Wt-WZ, ...,wp), B=(bjk)pxp' and
Exercise 6.21. Consider the problem of estimation of the population mean using two-
phase sampling. Suppose an initial first phase sample of $m$ units is selected at
random and information on the auxiliary variable $x$ is measured. A second phase
sample of $n$ units is selected by PPSWR sampling with probability proportional
to $x$. Find the variance of the unbiased estimator of the population total $Y$ given by
$$\hat{Y}_{uds} = \frac{N}{nm}\sum_{i=1}^{n}\frac{y_i}{p_i}, \quad \text{where } p_i = x_i\Big/\sum_{i=1}^{m}x_i.$$
Hint: Raj (1964).
Exercise 6.23. In the first phase we select a preliminary large sample $s'$ of $n'$
units by SRSWOR and only the auxiliary variables $X$ and $Z$ are measured. Let
$\hat{M}_x'$ and $\hat{M}_z'$ be the estimators of the medians $M_x$ (unknown) and $M_z$ (known) of the
auxiliary variables $X$ and $Z$ based on the first phase sample. In the second phase, a
sub-sample $s$ of $n$ units is drawn from the preliminary large sample by SRSWOR
and both the study variable $Y$ and the auxiliary variable $X$ are measured. Let $\hat{M}_x$
and $\hat{M}_y$ denote the estimators of $M_x$ and $M_y$, respectively, based on the sample
drawn at the second phase.
( a ) Study the asymptotic properties of a general class of estimators for estimating
the median $M_y$ as
$$\hat{M}_a = H(\hat{M}_y, U, V),$$
where $U = \hat{M}_x/\hat{M}_x'$, $V = \hat{M}_z'/M_z$ and $H(\hat{M}_y, U, V)$ assumes values in a
bounded closed convex subset $R_3$ of three-dimensional real space containing the
point $(M_y, 1, 1)$ such that $H(M_y, 1, 1) = M_y$ and satisfies certain regularity conditions.
( b ) Show that the following estimators
and
M, =M y
\JM<+~~d)M;) [~M, +~~g)M,l]
are the special cases of the general class of estimators.
Hint: Allen, Saxena, Singh , Singh, and Smarandache (2002).
Exercise 6.24. ( I ) Assume that a sample of size $m$ is selected using SRSWR
sampling out of $N$ units to observe the variable $X$, while a sub-sample of size $n$ is
selected out of $m$ units to observe the study variable $Y$ and auxiliary variable $X$
with the same sampling strategy. Suppose $\bar{y}_n$, $\bar{x}_n$ denote the second phase sample
means, and $\bar{x}_m$ is the first phase sample mean.
( a ) Find the first order bias and mean squared error of the estimators of population
mean $\bar{Y}$ defined as
[Bose (1943)]
$$\bar{y}_{HR} = \bar{r}\,\bar{x}_m + \frac{(m-1)}{m}s_{rx}, \quad \text{[Sukhatme (1962)]}$$
where
$$r_i = \frac{y_i}{x_i}, \quad \bar{r} = \frac{1}{n}\sum_{i=1}^{n}r_i, \quad \text{and} \quad s_{rx} = \frac{1}{n-1}\sum_{i=1}^{n}r_i(x_i - \bar{x}_n).$$
( II ) Consider that another cheaper auxiliary variable $Z$ is available for all the units in
the population, and hence $\bar{Z} = N^{-1}\sum_{i=1}^{N}Z_i$ is known; let $\bar{z}_m$ be its estimator obtained
from the first phase sample information.
( a ) Find the bias and mean square error of the following estimators:
$$\bar{y}_C = \bar{y}_n\left(\frac{\bar{x}_m}{\bar{x}_n}\right)\left(\frac{\bar{Z}}{\bar{z}_m}\right), \quad \text{[Chand (1975)]}$$
- - - -
Z , w here g- = - 1 m
Xi
Y ds = r g L- ;
mi=\ zi
1[ (-J 1
U
-(I ) _ -
YI - y+ byx Xm s; { ( -J}
- ~ _ - . -(2) _ - -
x"' YI - y+ byx x m 2
_ ~
i; \ ]
_-
x" '
.
-(3) - - b { - ( Z
YI - Y + yx x m Z + a l (zm _ Z )
J--}. x"'
and
(t;-J
U
-(4) _ - - _ - ~ _- )
1
YI - Y + byx a \x m + (I a \ ~\:m 2 x" '
where al and a2 are suitably chosen constants, are the special cases of the
estimator )it .
Hint: Das and Tripathi (1979), Singh, Singh, and Upadhyaya (2001) .
on the preliminary large sample of III units . Further, let s f = r' h=1I. (Xiii - xh" f ,
(n - 1
.
with -
X/Ill -I "
= n LXhi an
d So2 = ( n -1 )-1 L" (yiii - Yh" \2 .h -
J , Wit Yh" = II _ I LYhi
"
, be the
h=1 h=1 h=1
estimators of al, i = 0,1,2, .., k based on the second phase sample . Find the bias
*2 k *2 sJ k
(a)al = LWi'isi where F;=2,=1 ,2,...,k , O<wi <I and LWi=I ,
A A
i=1 Si i=1
k +1
where LWi = 1.
i=l
Hint: Singh and Singh (2001).
Exercise 6.26. Consider a two-phase design in which in each phase the sampling
scheme is as follows:
( i ) The first phase sample $s^*$ of $m$ ($m < N$) units is drawn from the population
$\Omega$ to observe two auxiliary variables $x$ and $z$.
( ii ) The second phase sample $s$ of size $n$ ($n < m$) is drawn from $s^*$ to observe
$y$, $x$, and $z$.
Assume that the population mean $\bar{Z}$ of the auxiliary variable $z$ is known.
Let
$$\bar{y} = n^{-1}\sum_{i=1}^{n}y_i, \quad \bar{x} = n^{-1}\sum_{i=1}^{n}x_i, \quad \bar{x}^* = m^{-1}\sum_{i=1}^{m}x_i, \quad \text{and} \quad \bar{z}^* = m^{-1}\sum_{i=1}^{m}z_i.$$
After decomposing the whole population $\Omega$ into three mutually exclusive domains
$s$, $r_2 = \bar{s} \cap s^*$ and $r_1 = \Omega - s^*$ of $n$, $(m - n)$ and $(N - m)$ units, respectively,
where $\bar{s} = \Omega - s$, the population mean can be written as
( b ) At the level of the first phase sample $s_1$ of size $m$: $x_i$ and $z_i$ are known for every $i \in s_1$;
( c ) At the level of the second phase sample $s_2$ of size $n$: $y_i$, $x_i$ and $z_i$ are known for
every $i \in s_2$.
Study the following calibrated estimator of the population total $Y$ defined as
$$\hat{Y}_{ES} = \sum_{i\in s_2}w_{2i}y_i,$$
where $w_{2i}$, $i = 1,2,...,n$, are the second phase calibrated weights obtained under the
following situations:
$\sum_{i\in s_1}w_{1i}x_i = \sum_{i\in\Omega}x_i$, $\sum_{i\in s_2}w_{2i}x_i = \sum_{i\in s_1}w_{1i}x_i$
and $\sum_{i\in s_2}w_{2i}z_i = \sum_{i\in s_1}w_{1i}z_i$
phase sample $s_1$ has been drawn, the second phase sample $s_2$ ($s_2 \subset s_1 \subset \Omega$) is
selected from $s_1$ with a sampling design with the selection probabilities $\pi_{2i} = \pi_{i|s_1}$.
Evidently the first phase and second phase sampling weights are defined as
$d_{1i} = 1/\pi_{1i}$ and $d_{2i} = 1/\pi_{2i}$, respectively. The overall sampling weight for the
$i$-th unit selected in the second phase sample $s_2$ will be $d_i^* = d_{1i}d_{2i}$. Consider the
problem of estimation of general parameters of interest such as
$$H_y = \sum_{i\in\Omega}H(y_i) \quad \text{and} \quad \bar{H}_y = N^{-1}\sum_{i\in\Omega}H(y_i)$$
for a specified function $H$.
( a ) Study the asymptotic properties the estimator of H y in two phase sampling,
defined as
HI = I'd;. H(y;),
i=1
where ;It are the ultimate calibrated weights obtained by minimizing chi square
distance between second phase design weights.
• 1/ i~ldliH(X;) Hz
H R = .'Ld lI·d 2IH (y I. nm ][ m ]
1=1
{
i~ldlid2iH(X;) i~ldliH (z;)
and the chain regression type of estimator is
HfA;) = Hf + wfAI- wfj t(Hf-H(zj))' HfA;) = Hfx + wfAI- wfj t(Hfx - H(Xj)) ,
Hzx(;) = Hzx+ W2j(1- W2j)1 (H zx- H(xj )), HZy(;) = HZy+ W2j(l- W2j )1 (H ZY- H(yj )),
,8\> ,81 (;), ,82, and,82(;) have their usual meanings.
( a ) Show that
E2 o» ,82 EI (;)+,82(/)d2(;)+ ,82°2(;)
H~(})-H~ =
j
,82 EI (;)
where
E2 (}) = (H ZY(;)- Hz) - ,82 (}XHzA;)- Hzx)-,81 (;),82 (}XiifA;)- HJ
EI (})= (HfA;)- Hfx)- ,81 (}XHfA;)- HJ d 2(; )= (HfA;)- HzA;)}
and
02(}) = (H zx(})- HfA;))- ,81 (}XH z- HfA;))- ,81 (Hz - Hfz) .
Exercise 6.29. Under the concept of two-phase sampling compare the mean
squared error of the ratio estimator MR defined as
.
MR=M y . (M*
M: J'
with the following estimator for population median My defined as
M(o)=M
y y
(A-N::J
A-M '
x
where A is a suitably chosen real constant.
Hint: Singh, Singh, and Puertas (2003a).
Practical 6.1. Select a first phase sample of 15 units by SRSWOR sampling from
population 1 of the Appendix and record only the nonreal estate farm loans from the
units selected in the preliminary sample. Select a sub-sample of 5 units from the
preliminary large sample and note the real and nonreal estate farm loans. Estimate
the average real estate farm loans by using the ratio estimator. Construct the 95%
confidence intervals by estimating the variance of the ratio estimator with two
different estimators: ( a ) jackknifing; ( b ) method of moments.
Practical 6.2. A key bank in the United States of America is interested in the
average of real estate farm loans. The bank manager has information about nonreal
estate farm loans in 15 states selected by SRSWOR sampling. From these selected
states select a sub-sample of 5 states and note the real estate farm loans as well as
nonreal estate farm loans from population 1 given in the Appendix. Use the
following estimators to estimate the average real estate farm loans
$$\bar{y}_1 = \bar{y}\left(\bar{x}'/\bar{x}\right)\left(s_x'^2/s_x^2\right) \quad \text{and} \quad \bar{y}_2 = \bar{y} + \alpha(\bar{x}' - \bar{x}) + \beta(s_x'^2 - s_x^2).$$
Suggest estimators for the optimum values of $\alpha$ and $\beta$. Construct the 95% confidence
interval in each case. Explain the difference in the estimates based on these two
estimators to the bank manager.
Practical 6.3. A private consultant selected first phase and second phase samples of
sizes 20 and 10 respectively. Discuss the relative efficiency of the general class of
estimators for estimating average amount of the real estate farm loans during 1997
by using information selected in the first phase sample only on the nonreal estate
farm loans during 1997, with respect to the regression estimator of population
mean.
Practical 6.4. People Bank has information about the real and nonreal estate farm
loans (in $000) during 1997 in the United States, as presented in population 1.
If the bank selects first phase and second phase samples of sizes 10 and 5,
respectively, then:
( a ) Find the relative efficiency of the ratio estimator, for estimating the average
amount of the real estate farm loans during 1997 by using information selected in
the first phase sample only on the nonreal estate farm loans during 1997, with
respect to the usual estimator of population mean;
( b ) Suppose a budget of US$5000 is available to spend on the survey, out of
which, $2000 will be the overhead cost. Suppose selection, compilation, and
analysing of one unit in the first phase sample costs $50, while for the second
phase unit the cost is $500. Find the optimum values of the first phase and second
phase sample sizes. Also find the relative efficiency of the ratio estimator over the
sample mean for the fixed cost;
( c ) What will be the minimum cost for attaining a 20% relative standard deviation?
Practical 6.5. A private company XYZ selected first phase and second phase
samples of size 20 and 10, respectively, from population 1 given in the Appendix.
Find the relative efficiency of the regression estimator, for estimating average
amount of real estate farm loans during 1997 by using information selected in the
first phase sample only on the nonreal estate farm loans during 1997, with respect
to the ratio estimator of population mean.
Practical 6.6. Mr. Nelson wishes to make a future strategy of selection of estimator
while estimating the average real estate farm loans. Suppose he selected first phase
and second phase samples of sizes 10 and 5, respectively. Suggest a few
estimators as a member of the general class of estimators. Find the relative
efficiency of these members for estimating average amount of real estate farm loans
during 1997 by using information selected in the first phase sample only on the
nonreal estate farm loans during 1997 with respect to the regression estimator in
two-phase sampling of the population mean. Has Mr. Nelson any hope in finding a
new estimator?
Practical 6.7. Ms. Stephanie Singh selects a preliminary large sample of 20 units
by PPSWOR sampling using the Midzuno--Sen sampling scheme, with the
number of species groups during 1992, given in population 4 of the Appendix,
as an auxiliary variable. Select a second-phase sample of 10 units from the
given first phase sample by using the Midzuno--Sen sampling scheme. Find the
calibration weights for the units selected in the ultimate sample by making use of
known information about the number of fish caught during 1994 as an auxiliary
variable. Use the chi square distance function between the design weights and
calibration weights. Discuss three cases in which these weights lead to the GREG,
ratio, and traditional linear regression estimators for estimating the total number of
fish caught during 1995. Deduce the estimates of the total number of fish in each
case.
Practical 6.8. Use the Midzuno--Sen sampling scheme to select a preliminary large
sample of 15 units by using the number of fish caught during 1992 in the United
States as a selection variable given in population 4. Collect the information on
the number of fish caught during 1993 and 1994 from the units selected in the
sample. Assuming that the total number of fish caught during 1993 is known, derive
the first phase calibration weights and hence estimate the total number of fish
caught during 1994 in the United States. Select a second phase sample of 10 units
from the given first phase sample by using the Midzuno--Sen sampling scheme. Collect
the information on the number of fish caught during 1994 and 1995 for the selected
units in the second phase sample. Derive the second phase calibration weights, and
hence deduce the estimate of the number of fish caught during 1995 in the United
States.
Practical 6.9. Take a first phase sample of 15 units by SRSWOR sampling and
note only the nonreal estate farm loans from the units selected in the sample given
in population 1 of the Appendix. Select a sub-sample of 10 units from the given
preliminary large sample and note the real estate farm loans as well as nonreal
estate farm loans. Estimate the average real estate farm loans by using regression
estimator. Deduce the 95% confidence interval.
Practical 6.10. The real and nonreal estate farm loans (in $000) during 1997 in
the 50 states of the United States have been presented in population 1 of the
Appendix. Suppose we selected an SRSWOR first phase sample of ten states to
collect the information on nonreal estate farm loans only. From the given first phase
sample of ten units, we selected a second phase sample of seven units and both the
real and nonreal estate farm loans are observed. Find the relative efficiency of the
ratio estimator of the median, for estimating the median of the amount of the real estate
farm loans during 1997 by using information on the nonreal estate farm loans during
1997 collected in the first phase sample, with respect to the usual estimator of the
population median. Assume that the real and nonreal estate farm loans follow
a bivariate normal distribution.
Hint: The joint p.d.f. for the bivariate normal distribution is given by
$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho_{xy}^2}} \exp\left\{-\frac{1}{2(1-\rho_{xy}^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 - 2\rho_{xy}\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right)\right]\right\}.$$
7. SYSTEMATIC SAMPLING
A sampling scheme in which only the first unit is selected at random, the rest being
automatically selected according to a predetermined pattern, is known as systematic
sampling. Systematic sampling provides a very simple sampling design in practice
to select a sample of size n from a population of size N. Systematic sampling is
both operationally convenient and efficient in sampling some natural populations,
such as forest areas for estimating the volume of timber, hardwood seedlings, etc.
The first step is to select a random number from 1 to k, that is, in the range of
integers listed in the first row. Suppose the selected random number is 2. Then the first
unit selected in the sample is number 2 in the sequential list. After selecting
unit number 2 from the population, every k-th unit is automatically included in the
sample. Thus the units in the sample of size n are at the serial numbers
2, k + 2, 2k + 2, ..., (n - 1)k + 2.
The random number selected from 1 to k is called a random start. The number k
is called the sampling interval. Corresponding to each random number from 1 to k,
there is only one possible sample of size n. Thus in systematic sampling the total
number of samples will be k. If r denotes the random start, then the systematic
sample consists of the units at the serial numbers given by the sequence
{r + ik, i = 0, 1, 2, ..., (n - 1)}.
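The selection rule above can be sketched in a few lines of Python. This is only an illustrative sketch under the assumption N = nk; the function names `systematic_sample` and `draw_systematic_sample` are hypothetical and do not come from the text.

```python
import random


def systematic_sample(N, n, r):
    """Serial numbers (1-based) r, r + k, ..., r + (n - 1)k for random start r."""
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k for an integer k")
    k = N // n  # sampling interval
    if not 1 <= r <= k:
        raise ValueError("the random start must satisfy 1 <= r <= k")
    return [r + i * k for i in range(n)]


def draw_systematic_sample(N, n, rng=random):
    """Pick the random start uniformly from 1..k and return (r, serial numbers)."""
    k = N // n
    r = rng.randint(1, k)
    return r, systematic_sample(N, n, r)


# With N = 50, n = 10 (so k = 5) and random start r = 2:
print(systematic_sample(50, 10, 2))  # [2, 7, 12, 17, 22, 27, 32, 37, 42, 47]
```

Only k distinct samples are possible, one per random start, and together they partition the population.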
The sequential list of population units can be made either by using the known magnitude
of auxiliary information or by just numbering the list of population units. Assume
$\bar{y}_r$ denotes the sample mean corresponding to random start r.
Then
$$\bar{y}_r = \frac{1}{n}\sum_{i=0}^{n-1} y_{r+ik}. \qquad (7.1.1)$$
Thus we have the following theorem.
$$V(\bar{y}_r) = \sigma_b^2 = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2. \qquad (7.1.3)$$
Proof. By the definition of variance we have
$$V(\bar{y}_r) = E(\bar{y}_r - \bar{Y})^2 = \frac{1}{k}\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2.$$
From (7.1.3) one can easily calculate the variance between the sample means if all
possible sample means and the population mean are known. Unfortunately, we
generally know only one sample mean, and a single sample mean
cannot be used to estimate the variance between all possible sample means. In
(7.1.3), $\sigma_b^2$ stands for the variance between all possible samples selected by using
systematic sampling. So far we have discussed the variation between different
column means of Table 7.1.1. A natural question is whether there is also
variation among the units within each column. Such variation is called within-sample
variation. Suppose $y_{ri}$ denotes the $i$-th ($i = 1, 2, 3, \ldots, n$) observation in the $r$-th sample.
Then the $r$-th sample mean can also be defined as
$$\bar{y}_r = \frac{1}{n}\sum_{i=1}^{n} y_{ri}. \qquad (7.1.4)$$
The variation within the $r$-th sample can be defined as
$$\sigma_r^2 = \frac{1}{n}\sum_{i=1}^{n}(y_{ri} - \bar{y}_r)^2. \qquad (7.1.5)$$
Thus the overall measure of within sample variation can be defined as the average
of these values across all k samples as
$$\sigma_w^2 = \frac{1}{k}\sum_{r=1}^{k}\sigma_r^2 = \frac{1}{k}\sum_{r=1}^{k}\frac{1}{n}\sum_{i=1}^{n}(y_{ri} - \bar{y}_r)^2 = \frac{1}{N}\sum_{i=1}^{n}\sum_{r=1}^{k}(y_{ri} - \bar{y}_r)^2. \qquad (7.1.6)$$
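Theorem 7.1.3. The total variance of the population is the sum of between and
within sample variances, that is
$$\sigma^2 = \sigma_b^2 + \sigma_w^2. \qquad (7.1.7)$$
Proof. Note that
$$N\sigma^2 = \sum_{i=1}^{n}\sum_{r=1}^{k}(y_{ri} - \bar{Y})^2 = \sum_{i=1}^{n}\sum_{r=1}^{k}(y_{ri} - \bar{y}_r + \bar{y}_r - \bar{Y})^2 = \sum_{i=1}^{n}\sum_{r=1}^{k}(y_{ri} - \bar{y}_r)^2 + n\sum_{r=1}^{k}(\bar{y}_r - \bar{Y})^2 = N\sigma_w^2 + N\sigma_b^2,$$
where the cross-product term vanishes because $\sum_{i=1}^{n}(y_{ri} - \bar{y}_r) = 0$ for every r.
Hence the theorem.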
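The decomposition of the total variance into between and within sample components can be checked numerically. The following Python sketch assumes N = nk and uses a small made-up population purely for illustration; the function name `variance_components` is hypothetical.

```python
def variance_components(y, n):
    """Return (total, between, within) variances for the k systematic samples.

    Assumes len(y) = N = n * k; all variances use divisors N, k and n as in
    the definitions of sigma^2, sigma_b^2 and sigma_w^2.
    """
    N = len(y)
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k")
    k = N // n
    pop_mean = sum(y) / N
    total = sum((v - pop_mean) ** 2 for v in y) / N
    # 0-based offset r corresponds to random start r + 1
    samples = [y[r::k] for r in range(k)]
    means = [sum(s) / n for s in samples]
    between = sum((m - pop_mean) ** 2 for m in means) / k
    within = sum(sum((v - m) ** 2 for v in s) / n
                 for s, m in zip(samples, means)) / k
    return total, between, within


# Hypothetical population of N = 12 values, samples of size n = 4 (k = 3)
total, between, within = variance_components([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8], 4)
print(total, between + within)  # the two numbers agree up to rounding error
```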
(7.1.11)
We now discuss the situations under which systematic sampling is better than
simple random sampling without replacement. The variance of the estimator of
population mean under systematic sampling is
$$V(\bar{y}_{sy}) = \sigma_b^2 = \sigma^2 - \sigma_w^2 \qquad (7.1.12)$$
and that under SRSWOR is
$$V(\bar{y}_{srs}) = \frac{(N-n)}{n(N-1)}\sigma^2. \qquad (7.1.13)$$
(7.1.15)
Evidently if DE < 1, i.e.,
$$\sigma_w^2 > \left(1 - \frac{1}{n}\right)\left(1 - \frac{1}{N}\right)^{-1}\sigma^2,$$
then systematic sampling for a sample of size n remains better than an SRSWOR
sample of the same size. The greater the variation $\sigma_w^2$ within samples, the greater will
be the gain due to systematic sampling. If $\sigma_w^2$ increases then
$\sigma_b^2$ decreases, since the total variation $\sigma^2$ remains fixed. Thus for a large gain in
efficiency due to systematic sampling, the units within each systematic sample
should be as heterogeneous as possible.
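As a rough numerical illustration of this comparison, the sketch below computes the systematic and SRSWOR variances from (7.1.12) and (7.1.13) for two made-up populations, one ordered by a trend and one whose period equals the sampling interval. The function name `compare_designs` and both populations are assumptions for illustration only.

```python
def compare_designs(y, n):
    """Variances of the mean under systematic sampling and SRSWOR (N = n * k)."""
    N = len(y)
    k = N // n
    pop_mean = sum(y) / N
    sigma2 = sum((v - pop_mean) ** 2 for v in y) / N
    means = [sum(y[r::k]) / n for r in range(k)]
    v_sys = sum((m - pop_mean) ** 2 for m in means) / k   # sigma_b^2
    sigma2_w = sigma2 - v_sys                             # within sample variance
    v_srswor = (N - n) / (n * (N - 1)) * sigma2
    # Systematic sampling wins when sigma_w^2 exceeds this threshold
    threshold = (1 - 1 / n) / (1 - 1 / N) * sigma2
    return v_sys, v_srswor, sigma2_w, threshold


trend = list(range(1, 13))   # heterogeneous within each systematic sample
periodic = [1, 2, 3] * 4     # period equals k = 3: every sample is constant
for name, pop in (("trend", trend), ("periodic", periodic)):
    v_sys, v_srswor, s2w, thr = compare_designs(pop, 4)
    print(name, v_sys < v_srswor, s2w > thr)
```

In the trend population the within sample variation is large and systematic sampling is the more efficient design; in the periodic population the opposite holds.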
Case II. When N ≠ nk, i.e., it is not possible to find k such that k = N/n is an
integer. For example, when N = 5 and n = 2, k = N/n = 5/2 = 2.5 is not an integer.
In such situations, we can take either k = 2 or k = 3. Suppose the population consists of
the following units.
Serial number:  1  2  3  4  5
Value:          2  3  2  5  6
If the selected random start is r = 2 then the units at serial numbers 2 and 4 will be
selected. The resultant sample will be {3, 5}. The mean of the second sample will
be
$$\bar{y}_2 = \frac{3 + 5}{2} = 4.$$
The average of the two possible sample means is
$$\frac{1}{2}\left(\frac{10}{3} + 4\right) = 3.667,$$
which is not equal to the true population mean $\bar{Y} = 3.6$.
( b ) If we take k = 3, then we have to select a random start r between 1 and 3.
Then we have the following possibilities:
( i ) If r = 1 then the sample is {2, 5} with mean $\bar{y}_1 = \frac{2 + 5}{2} = 3.5$,
Example 7.1.1. Case I. If k = 2, then the possible samples are {2, 2, 6} and {3, 5}.
Thus $y_1 = 2 + 2 + 6 = 10$ and $y_2 = 3 + 5 = 8$. Now the mean of $\bar{y}_u = \frac{k}{N} y_r$, r = 1, 2,
will be
$$E(\bar{y}_u) = \frac{1}{2}\sum_{r=1}^{2}\frac{k}{N}y_r = \frac{1}{2}\left(\frac{2}{5}\times 10 + \frac{2}{5}\times 8\right) = \frac{18}{5} = 3.6 = \bar{Y}.$$
Case II. If k = 3 then the possible samples are {2, 5}, {3, 6} and {2}. Obviously
In the case of modified systematic sampling, instead of selecting a random start r
from 1 to k, we select a random number from 1 to N, and then every k-th unit to
the right and left of it is included in the sample. Thus, for example, if k = 2 then
Table 7.2.1 shows the possible samples for the population given in Table 7.1.2 for
different random starts.
Hence under modified systematic sampling, the sample mean $\bar{y}_{ms}$ is an unbiased
estimator of the population mean $\bar{Y}$, but the sample size is still a random variable, as
shown in Table 7.2.1.
Murthy (1961), Sukhatme and Sukhatme (1970), and Konijn (1973) have suggested
using the circular systematic sampling (CSS) design in the situations when N is not
a multiple of n . In this sampling scheme, the sequential list of the population units
is first prepared on the circle as shown in Figure 7.3.1.
The main steps involved in selecting a sample using the CSS scheme are as follows:
( a ) Select a random number r from 1 to N and name it the 'random start';
( b ) Choose some integer value of k = N/n, or N/n rounded to the nearest integer, and name it
the skip;
( c ) Select all units in the sample with serial numbers
r + jk          if r + jk ≤ N,
(r + jk - N)    if r + jk > N;   j = 0, 1, 2, ..., (n - 1).
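Steps ( a ) to ( c ) can be sketched in Python as follows. The function name `circular_systematic_sample` is hypothetical, and the modulo form used here is an assumed generalisation of step ( c ) that also covers the case where r + jk wraps around the circle more than once.

```python
def circular_systematic_sample(N, n, k, r):
    """Serial numbers (1-based) selected by CSS with random start r and skip k."""
    if not 1 <= r <= N:
        raise ValueError("the random start must satisfy 1 <= r <= N")
    # ((s - 1) % N) + 1 maps s = r + j*k back onto the circle 1..N;
    # for N < s <= 2N it is the same as s - N in step (c).
    return [((r + j * k - 1) % N) + 1 for j in range(n)]


# N = 10 units on the circle, skip k = 3, sample size n = 3
print(circular_systematic_sample(10, 3, 3, 3))  # [3, 6, 9]
print(circular_systematic_sample(10, 3, 3, 8))  # [8, 1, 4]
```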
Ministers at serial no. 3, 6 and 9 are selected with r = 3, k = 3 and N = 10.
Ministers at serial no. 8, 1 and 4 are selected with r = 8, k = 3 and N = 10.
Theorem 7.3.2. A necessary and sufficient condition for all units of the sample of
size n selected by circular systematic sampling with random start r to be distinct,
for all r ≤ N and n ≤ N, is that N and k are coprime.
Proof. Let N and k be integers with k < N, r ≤ N and n < N. Also let the sample
s consist of the units with serial numbers $s = \{i_1, i_2, \ldots, i_n\}$, where $i_j = \{r + jk\} \bmod (N)$,
j = 0, 1, ..., (n - 1).
Then the necessary and sufficient (n.s.) conditions are proved as follows:
( b ) Necessity: Assume that for all r ≤ N and n < N all elements of the sample are
distinct, and that N and k are not coprime. Let the greatest common divisor (g.c.d.) of
(k, N) be a, with k = b·a and N = c·a, where b and c are both smaller than N. For any
random start r, let us take n ≥ c + 1.
Then we have
$$r + ck = r + c\,b\,a = r + bN \equiv r \pmod{N},$$
so the unit selected at j = c coincides with the unit selected at j = 0,
which again contradicts our assumption that all elements of the sample s are
distinct. Hence the theorem.
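The statement of Theorem 7.3.2 can be checked by brute force for small populations. The sketch below is an illustration only; the function name `css_all_distinct` is hypothetical. It relies on the observation that if the sample of size n = N has all distinct units, then so does every shorter sample with the same start.

```python
from math import gcd


def css_all_distinct(N, k):
    """True if CSS with skip k gives distinct units for every start r and every n <= N."""
    for r in range(1, N + 1):
        units = [((r + j * k - 1) % N) + 1 for j in range(N)]  # the case n = N
        if len(set(units)) != N:
            return False
    return True


# Compare the brute-force answer with the coprimality condition
for N in range(2, 31):
    for k in range(1, N):
        assert css_all_distinct(N, k) == (gcd(N, k) == 1)
print("distinct units for all r and n  <=>  gcd(N, k) = 1, checked for N up to 30")
```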
where $M = \max_{1 \le j \le N} X_j$, [x] = the largest integer contained in x, and l is the smallest
positive integer for which (l × k)/X is an integer.
Proof. Follows from Chaudhuri and Adhikary (1987).
Sengupta (1988) pointed out that condition (7.4.1) is not sufficient and resultant
sample will not contain all distinct units even if (7.4.1) holds. He provided the
following example .
Example 7.4.1. Suppose N = 10, X = 300, M = 65 and [X/M] = 4. The data are
given in the following table:
Let k = 120, n = 3. Here l = 5, $n_0$ = 4, so that (7.4.1) holds. But for r = 112 the
sample contains only two distinct units, 4 and 9. Furthermore, Sengupta (1988) gave the
following theorem.
Theorem 7.4.2. Let $M = \max_{1 \le j \le N}(X_j)$. Then a necessary and sufficient condition for a
PPS circular systematic sample of size n with sampling interval k to always
contain all distinct units is that $n \le n_1$, where $n_1$ is the smallest positive integer j
for which
j k mod(X) < M or > (X - M).
Brewer (1963a) suggested a method of selecting a systematic sample with unequal
probability sampling. Some new circular systematic sampling (CSS) schemes have
also been studied by Uthayakumaran (1998). Hartley (1966) studied PPSWOR
systematic sampling . Although we have discussed many difficulties and their
solutions in systematic sampling in the preceding sections, the more serious
difficulty is in the estimation of variance of the estimator of mean/total using
systematic sampling schemes .
We will discuss here certain methods to estimate the variance of the estimators of
mean (total) under systematic sampling scheme.
In this method, rather than choosing one systematic sample of size n, choose m
sub-samples each of size n/m by selecting m random starts from 1 to k = (Nm)/n
using a without replacement sampling scheme. Compute the m sample means $\bar{y}_j$,
j = 1, 2, ..., m. Also compute the full sample mean defined as
$$\bar{y}_p = \frac{1}{m}\sum_{j=1}^{m}\bar{y}_j. \qquad (7.5.1)$$
An unbiased estimator of $V(\bar{y}_p)$ is
$$v(\bar{y}_p) = \hat{\sigma}_b^2 = \frac{(1-f)}{m(m-1)}\sum_{j=1}^{m}(\bar{y}_j - \bar{y}_p)^2. \qquad (7.5.2)$$
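A minimal Python sketch of this sub-sampling method is given below. It assumes that k = Nm/n is an integer dividing N and that f = n/N is the overall sampling fraction, which is an assumption here because f is not defined in this passage. The function name `random_subsample_estimate` and the test population are hypothetical.

```python
import random


def random_subsample_estimate(y, m, n, seed=None):
    """Pooled mean and variance estimate from m systematic sub-samples.

    y holds the N population values in list order (unit 1 is y[0]).
    Assumes k = N*m/n is an integer that divides N, and f = n/N (assumption).
    """
    N = len(y)
    if (N * m) % n != 0 or N % ((N * m) // n) != 0:
        raise ValueError("this sketch needs k = N*m/n to be an integer dividing N")
    k = (N * m) // n
    f = n / N
    rng = random.Random(seed)
    starts = rng.sample(range(1, k + 1), m)      # m random starts without replacement
    sub_means = []
    for r in starts:
        values = [y[u - 1] for u in range(r, N + 1, k)]
        sub_means.append(sum(values) / len(values))
    pooled = sum(sub_means) / m                  # as in (7.5.1)
    var_hat = (1 - f) / (m * (m - 1)) * sum((yb - pooled) ** 2 for yb in sub_means)
    return starts, pooled, var_hat               # var_hat as in (7.5.2)


# Hypothetical population of N = 50 values, m = 3 sub-samples of 5 units each
population = [float(i) for i in range(1, 51)]
print(random_subsample_estimate(population, m=3, n=15, seed=1))
```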
Example 7.5.1. Select three sub-samples each consisting of 5 states from
population 1 having 50 states as given in the Appendix by using systematic
sampling. Collect the information on the real estate farm loans from the states
selected in the sample. Obtain a pooled estimate of the average real estate farm
loans in the United States. Use an appropriate method for estimating the variance of
the resultant pooled estimator.
Solution. We are given N = 50 , m = 3, and n = 5 x m = 5 x 3 = 15 is the total sample
07 09 FL 01
17 19 ME 11
27 29 NH 21
37 39 RI 31
47 49 WI 41
7.5.2 SUCCESSIVE DIFFERENCES
If $y_i$ denotes the i-th unit selected in the sample, then an estimator of the variance of
the sample mean can also be obtained as
$$\hat{\sigma}_b^2 = \frac{(1-f)}{2n(n-1)}\sum_{i=1}^{n-1}(y_i - y_{i+1})^2. \qquad (7.5.3)$$
The assumption here is that each successive pair of units, i.e., $y_i$ and $y_{i+1}$, is
drawn using SRSWOR sampling from the 2k eligible units.
Example 7.5.2. Select a sample of 10 states from population 1 consisting of 50
states by using the systematic sampling scheme. Collect the information on the real
estate farm loans from the states selected in the sample. Use an appropriate method
for estimating the variance of the estimator of the population mean.
Solution. We have N = 50 and n = 10, therefore k = N/n = 50/10 = 5. We used the
8th column of the Pseudo-Random Numbers (PRN) given in Table 1 of the
Appendix to select a random number between 1 and 5. We observed random
number 2. Thus the systematic sample consists of the following 10 distinct units:
02, 07, 12, 17, 22, 27, 32, 37, 42 and 47.
Unit   State   (y_i - y_{i+1})   (y_i - y_{i+1})^2
  2    AK            -4.525             20.475
  7    CT           -46.623           2173.704
 12    ID          -991.353         982780.771
 17    KY           722.078         521396.638
 22    MI         -1014.82         1029867.750
 27    NE          1136.221        1290998.160
 32    NY            86.732           7522.439
 37    OR          -438.367         192165.627
 42    TN          -547.479         299733.255
 47    WA
Sum                                4326658.820
Thus an estimate of the variance of the estimator of the mean for systematic sampling is
$$\hat{\sigma}_b^2 = \frac{(1-f)}{2n(n-1)}\sum_{i=1}^{n-1}(y_i - y_{i+1})^2 = \frac{(1-0.2)}{2\times 10\,(10-1)}\times 4326658.82 = 19229.595.$$
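The arithmetic of this example can be reproduced with a short Python sketch of the successive difference estimator. The function name `successive_difference_variance` is hypothetical. Because only the successive differences are printed in the example, the sketch rebuilds a series with those differences from an arbitrary starting level of zero, which is harmless since the estimator depends on the differences alone.

```python
def successive_difference_variance(y, N):
    """(1 - f) / (2 n (n - 1)) * sum of squared successive differences, f = n / N."""
    n = len(y)
    f = n / N
    ssd = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    return (1 - f) / (2 * n * (n - 1)) * ssd


# Successive differences y_i - y_{i+1} as printed in Example 7.5.2
diffs = [-4.525, -46.623, -991.353, 722.078, -1014.82,
         1136.221, 86.732, -438.367, -547.479]

# Rebuild a series with exactly these differences (arbitrary level y_1 = 0)
y = [0.0]
for d in diffs:
    y.append(y[-1] - d)

estimate = successive_difference_variance(y, N=50)
# Agrees with the reported 19229.595 up to the rounding of the printed differences
print(round(estimate, 3))
```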
Ray and Das (1995, 1997) have proposed circular systematic sampling schemes
which provide unbiased estimator of population mean and also estimator of
variance of the mean without putting any restriction on the population size.
Quite often a situation arises in which the study variable $Y_i$ is related to the random start
i through a linear relationship, usually known as a linear trend. We discuss below
some results when such a trend is present.
In the following theorem, we show that the usual estimator of the population mean
under the systematic sampling scheme remains more efficient than simple random
sampling in the presence of a linear trend.
Theorem 7.6.1. If $Y_i$ has a linear relation with the random start i, that is
$$Y_i = a + bi \qquad (7.6.1.1)$$
under the systematic sampling strategy while drawing a sample of n units, and if
another sample of n units is drawn by SRSWOR or SRSWR, then for large N we
have
$$V(\bar{y})_{SRS} / V(\bar{y})_{SYS} = n. \qquad (7.6.1.2)$$
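Proof. Under SRSWR sampling we have
$$V(\bar{y}_{WR}) = \frac{\sigma_y^2}{n}, \quad \text{where} \quad \sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 \quad \text{and} \quad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N}Y_i. \qquad (7.6.1.3)$$
Under the linear trend (7.6.1.1) we have $\bar{Y} = a + b(N+1)/2$, so that
$$\sigma_y^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\sum_{i=1}^{N}\left[a + bi - \left\{a + \frac{b(N+1)}{2}\right\}\right]^2 = \frac{b^2}{N}\sum_{i=1}^{N}\left[i - \frac{(N+1)}{2}\right]^2$$
$$= \frac{b^2}{N}\sum_{i=1}^{N}\left[i^2 + \left(\frac{N+1}{2}\right)^2 - 2\left(\frac{N+1}{2}\right)i\right] = \frac{b^2}{N}\left[\sum_{i=1}^{N}i^2 + N\left(\frac{N+1}{2}\right)^2 - (N+1)\sum_{i=1}^{N}i\right]$$
$$= \frac{b^2}{N}\left[\frac{N(N+1)(2N+1)}{6} + \frac{N(N+1)^2}{4} - \frac{N(N+1)^2}{2}\right] = \frac{b^2(N^2-1)}{12}. \qquad (7.6.1.5)$$
From (7.6.1.3) we have
$$V(\bar{y}_{WR}) = \frac{\sigma_y^2}{n} = \frac{b^2(N^2-1)}{12n}. \qquad (7.6.1.6)$$
Under SRSWOR sampling we have
$$V(\bar{y}_{WOR}) = \frac{(N-n)}{Nn}S_y^2 = \frac{(N-n)}{Nn}\frac{N}{(N-1)}\sigma_y^2 = \frac{(N-n)b^2(N+1)}{12n}. \qquad (7.6.1.7)$$
If a systematic sample of size n is selected with random start i and skip k, then
the units selected in the sample will be listed at serial numbers
i, k + i, 2k + i, ..., (n - 1)k + i. Using (7.6.1.1) we have
$$\bar{y}_i = \frac{1}{n}\sum_{j=0}^{n-1}Y_{i+jk} = \frac{1}{n}\left[Y_i + Y_{k+i} + Y_{2k+i} + \cdots + Y_{(n-1)k+i}\right] = a + b\left\{i + \frac{(n-1)k}{2}\right\}.$$
Thus we have
$$V(\bar{y}_i)_{SYS} = E[\bar{y}_i - \bar{Y}]^2 = \frac{1}{k}\sum_{i=1}^{k}(\bar{y}_i - \bar{Y})^2 = \frac{1}{k}\sum_{i=1}^{k}\left[a + b\left\{i + \frac{(n-1)k}{2}\right\} - \left\{a + \frac{b(N+1)}{2}\right\}\right]^2$$
$$= \frac{b^2}{k}\sum_{i=1}^{k}\left[i^2 + \left(\frac{k+1}{2}\right)^2 - 2\left(\frac{k+1}{2}\right)i\right] = \frac{b^2}{k}\left[\sum_{i=1}^{k}i^2 + \frac{k(k+1)^2}{4} - 2\left(\frac{k+1}{2}\right)\sum_{i=1}^{k}i\right] = \frac{b^2(k^2-1)}{12}.$$
For large N, with N = nk, (7.6.1.6) gives
$$V(\bar{y}_{WR}) = \frac{b^2(N^2-1)}{12n} = \frac{b^2N^2\left(1 - \frac{1}{N^2}\right)}{12n} \approx \frac{b^2n^2k^2}{12n} = \frac{nb^2k^2}{12}. \qquad (7.6.1.11)$$
Again from (7.6.1.7) for large N we have
$$V(\bar{y}_{WOR}) = \frac{b^2N^2\left(1 - \frac{n}{N}\right)\left(1 + \frac{1}{N}\right)}{12n} \approx \frac{nb^2k^2}{12}. \qquad (7.6.1.13)$$
Similarly,
$$V(\bar{y})_{SYS} = \frac{b^2}{12}\left[\frac{N^2}{n^2} - 1\right] = \frac{b^2}{12}\left[1 - \left(\frac{n}{N}\right)^2\right]\left[\frac{N^2}{n^2}\right] \approx \frac{b^2k^2}{12}. \qquad (7.6.1.14)$$
Dividing (7.6.1.11) or (7.6.1.13) by (7.6.1.14) gives (7.6.1.2).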
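The ratio in Theorem 7.6.1 can be illustrated numerically by enumerating every systematic sample from a population with an exact linear trend. The sketch below is illustrative only; the function name `variances_under_linear_trend` and the chosen values of a, b, n and k are assumptions.

```python
def variances_under_linear_trend(a, b, n, k):
    """Exact variances of the sample mean when Y_i = a + b*i, i = 1..N, N = n*k."""
    N = n * k
    y = [a + b * i for i in range(1, N + 1)]
    pop_mean = sum(y) / N
    sigma2 = sum((v - pop_mean) ** 2 for v in y) / N
    v_wr = sigma2 / n                                   # SRSWR
    v_wor = (N - n) / (n * (N - 1)) * sigma2            # SRSWOR
    # One systematic sample mean per random start 1..k
    means = [sum(y[r - 1::k]) / n for r in range(1, k + 1)]
    v_sys = sum((m - pop_mean) ** 2 for m in means) / k
    return v_wr, v_wor, v_sys


v_wr, v_wor, v_sys = variances_under_linear_trend(a=2.0, b=3.0, n=10, k=50)
# Both ratios are close to n = 10 when N = 500 is large relative to n
print(v_wr / v_sys, v_wor / v_sys)
```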
Here we adjust the weights given to the sampled observations in the estimator of
the population mean in such a way that the variance of the estimator is zero. The
estimator of the population mean with random start i and skip k in systematic
sampling is given by
$$\bar{y}_i = \frac{1}{n}\sum_{j=0}^{n-1}Y_{i+jk}. \qquad (7.6.2.1)$$
Let us change the weights to
$$\left(\frac{1}{n} - x\right), \frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}, \left(\frac{1}{n} + x\right). \qquad (7.6.2.2)$$
Clearly we have only changed the weights for the first and the last unit in the
sample. The value of x is determined such that the modified estimator $\bar{y}_i^*$
(with new weights) matches the population mean $\bar{Y}$ for all i. Then the systematic
sampling estimator with new weights is given by
$$\bar{y}_i^* = \left(\frac{1}{n} - x\right)Y_i + \frac{1}{n}Y_{k+i} + \frac{1}{n}Y_{2k+i} + \cdots + \frac{1}{n}Y_{(n-2)k+i} + \left(\frac{1}{n} + x\right)Y_{(n-1)k+i}. \qquad (7.6.2.3)$$
If k is even then the variance of the systematic sampling mean does not become
exactly zero, but if k is odd then it may be zero. We shall discuss each case
separately.
Case I. k is odd: We choose the $\left(\frac{k+1}{2}\right)$-th unit as the random start. The value of Y which
corresponds to the random start $\frac{k+1}{2}$ is $a + b\left(\frac{k+1}{2}\right)$. Thus the first unit in the
sample corresponds to serial number $\frac{k+1}{2}$, the second unit corresponds to
$k + \frac{k+1}{2}$, the third unit corresponds to $2k + \frac{k+1}{2}$, and so on. Thus the sample
mean is
$$\bar{y}_{\left(\frac{k+1}{2}\right)} = \frac{1}{n}\sum_{j=0}^{n-1}\left[a + b\left\{jk + \frac{(k+1)}{2}\right\}\right] = \frac{1}{n}\left[na + \frac{nb(k+1)}{2} + \frac{bkn(n-1)}{2}\right]$$
$$= a + \frac{bk}{2} + \frac{b}{2} + \frac{nbk}{2} - \frac{bk}{2} = a + \frac{b(nk+1)}{2} = a + \frac{b(N+1)}{2} = \bar{Y} = \text{Population mean}.$$
Thus with $\frac{k+1}{2}$ as random start we have the sample mean equal to the population
mean.
Fig. 7.6.3.1 Centrally located systematic sampling for odd random start.
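Case II. k is even: In this situation we choose two random starts, as shown in the
figure below.
Fig. 7.6.3.2 Centrally located systematic sampling for even random start.
In this case there are two possibilities for choosing the random start, i.e., either $\frac{k}{2}$
or $\left(\frac{k}{2} + 1\right)$. To decide between them,
perform an experiment with an unbiased coin, e.g., head for $\frac{k}{2}$ and tail for $\left(\frac{k}{2} + 1\right)$.
If we take $\frac{k}{2}$ as the random start then, from the linear trend, the Y value corresponding
to the first unit is $a + \frac{bk}{2}$, to the second unit $a + b\left(k + \frac{k}{2}\right)$, and to the n-th unit
$a + b\left[(n-1)k + \frac{k}{2}\right]$. Thus the sample mean with random start $\frac{k}{2}$ will be
$$\bar{y}_{\left(\frac{k}{2}\right)} = \frac{1}{n}\sum_{j=0}^{n-1}\left[a + b\left(jk + \frac{k}{2}\right)\right] = a + \frac{bk}{2} + \frac{nbk}{2} - \frac{bk}{2} = a + \frac{Nb}{2}. \qquad (7.6.3.1)$$
Similarly, the sample mean with random start $\left(\frac{k}{2} + 1\right)$ will be
$$\bar{y}_{\left(\frac{k}{2}+1\right)} = \frac{1}{n}\sum_{j=0}^{n-1}\left[a + b\left\{jk + \frac{(k+2)}{2}\right\}\right] = \frac{1}{n}\left[na + \frac{nb(k+2)}{2} + \frac{nbk(n-1)}{2}\right] = a + \frac{b(N+2)}{2}. \qquad (7.6.3.2)$$
Note that we are choosing each of the two random starts with probability 1/2, so
$$E(\bar{y}) = \frac{1}{2}\bar{y}_{\left(\frac{k}{2}\right)} + \frac{1}{2}\bar{y}_{\left(\frac{k}{2}+1\right)} = \frac{1}{2}\left[a + \frac{bN}{2} + a + \frac{b(N+2)}{2}\right] = a + \frac{b(N+1)}{2} = \bar{Y}.$$
Furthermore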
(7.6.3.3)
We assume that N = nk and n is even. Instead of taking the sampling span from 1
to k, we take the sampling span from 1 to 2k. In other words, instead of taking n
groups of k units each, we take n/2 groups of 2k units each and the sampling
span is 2k instead of k. A pictorial representation of Balanced Systematic
Sampling (BSS) is presented below:
Fig. 7.6.4.1 Balanced Systematic sampling.
Furthermore, as shown in the above diagram, we select two random starting points
i and 2k - i + 1 (say), one between 1 and k and another between k and 2k, such
that both points are at equal distance from k. Then we have the following data in
the sample. The first unit is selected at serial number i, the second unit is selected
at serial number 2k - i + 1, the third unit is selected at serial number i + 2k, the
fourth unit is selected at serial number 4k - i + 1, and so on. Hence using the linear
trend we have
$$\bar{y}_{BSS} = \frac{1}{n}\sum_{j=0}^{\frac{n}{2}-1}\left[a + b\{2kj + i\}\right] + \frac{1}{n}\sum_{j=0}^{\frac{n}{2}-1}\left[a + b(2kj + 2k - i + 1)\right]$$
$$= \frac{1}{n}\sum_{j=0}^{\frac{n}{2}-1}\left[a + 2bkj + bi + a + 2bkj + 2bk - bi + b\right] = \frac{1}{n}\sum_{j=0}^{\frac{n}{2}-1}\left[2a + 4bkj + 2kb + b\right]$$
$$= \frac{2}{n}\left[\frac{na}{2} + \frac{2bk}{2}\left(\frac{n}{2} - 1\right)\left(\frac{n}{2}\right) + \frac{nbk}{2} + \frac{nb}{4}\right] = a + bk\left(\frac{n}{2} - 1\right) + bk + \frac{b}{2}$$
$$= a + \frac{nbk}{2} - bk + bk + \frac{b}{2} = a + \frac{Nb}{2} + \frac{b}{2} = a + \frac{b(N+1)}{2} = \bar{Y}. \qquad (7.6.4.1)$$
Thus irrespective of the random start i, the balanced systematic sample (BSS)
mean is equal to the population mean and, equivalently, $V(\bar{y}_{BSS}) = 0$.
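The zero variance property of balanced systematic sampling under a linear trend can be verified by enumerating all k random starts. The following Python sketch is illustrative; the function name `balanced_systematic_sample` and the values of a, b, n and k are assumptions.

```python
def balanced_systematic_sample(n, k, i):
    """Serial numbers i, 2k-i+1, 2k+i, 4k-i+1, ... for even n and start 1 <= i <= k."""
    if n % 2 != 0:
        raise ValueError("balanced systematic sampling assumes n is even")
    units = []
    for j in range(n // 2):
        units.append(2 * k * j + i)               # unit in the first half of group j
        units.append(2 * k * j + 2 * k - i + 1)   # its mirror image in the second half
    return units


a, b, n, k = 5.0, 2.0, 6, 5
N = n * k
pop_mean = a + b * (N + 1) / 2
for i in range(1, k + 1):
    units = balanced_systematic_sample(n, k, i)
    sample_mean = sum(a + b * u for u in units) / n
    print(i, units, sample_mean == pop_mean)
```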
Singh and Garg (1979) have proposed a sampling scheme named as Balanced
Random Sampling Scheme, which has the advantage of both simple random
sampling and systematic sampling in the sense that a part of the sampling variance
depends upon the arrangements of the units in the population and the other part is
independent of the arrangements. The resultant sampling scheme performs the best
for the populations showing linear trend or periodicity.
From the above calculations the biased estimate of the average real estate farm
loans is 474.002. The Yates end corrected and unbiased estimate is 498.404.
Singh and Singh (1977) suggested a new systematic sampling scheme which
provides an unbiased estimator of the variance of the sample mean. Here we shall
discuss their technique briefly. Suppose a population consists of N distinct units and
a sample of size n has to be drawn. Let u (≤ n) and d be two predetermined
integers which are chosen in such a way that: (i) every sample contains distinct
units; (ii) the inclusion probability for each pair of units is non-zero. Starting
with a random number r (≤ N), select u units continuously, and thereafter the
remaining n - u = v (say) units with span d such that d ≤ u and u + vd ≤ N. With
these conditions, a sample of size n can be drawn in two or more phases. The
condition for the number of phases p (say) required for selecting a sample of n
units from the population of N units is given by
$$p \ge \frac{\log\{\log(N/2)\} - \log\{\log(n/2)\}}{\log(2)}. \qquad (7.7.1)$$
of units with indices (r + u - 1) + td, (t = 1, 2, ..., v). Assume that the first and
second order inclusion probabilities $\pi_i$ and $\pi_{ij}$ are known. Then we have the
following results.
Result 7.7.1. The Horvitz and Thompson (1952) type estimator of the population mean
is given by
$$\bar{y}_{sys} = \frac{1}{N}\sum_{i=1}^{n}\frac{y_i}{\pi_i}. \qquad (7.7.3)$$
$$V(\bar{y}_{sys}) = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j>i}\left(1 - \pi_{ij}\frac{N^2}{n^2}\right)(Y_i - Y_j)^2, \qquad (7.7.4)$$
$$\hat{V}(\bar{y}_{sys}) = \frac{1}{N^2}\sum_{i=1}^{n}\sum_{j>i}\left(\frac{1}{\pi_{ij}} - \frac{N^2}{n^2}\right)(y_i - y_j)^2. \qquad (7.7.5)$$
Thus an estimate of the average amount of the real estate farm loans in the United
States is
$$\bar{y}_{sys} = \frac{1}{N}\sum_{i=1}^{n}\frac{y_i}{\pi_i} = \frac{33504.64}{50} = 670.093.$$
$$\left(\frac{1}{\pi_{ij}} - \frac{N^2}{n^2}\right) = \frac{N(N-1)}{n(n-1)} - \frac{N^2}{n^2} = \frac{50\times 49}{10\times 9} - \frac{50^2}{10^2} = 2.2222.$$
Using the above information the estimator of the variance of the estimator $\bar{y}_{sys}$ is
Consider a finite population $\Omega$ of N distinct and identifiable units with variable of
interest Y. Then the Zinger (1980) sampling design exists if the following
conditions hold:
( a ) First a sample s of size m < N is selected from $\Omega$ using design $p_s$. The
strategy $(p_s, \bar{y}_s)$ is unbiased for the population mean $\bar{Y}$;
( b ) Once the sample s is selected and held fixed, another sample r of size
n (< N - m) is selected from $(\Omega - s)$ with probability design $p_r$. The strategy
$\{(p_r, t_r) \mid s \text{ fixed}\}$ is unbiased for $\bar{Y}_{(\Omega - s)}$, and in turn $(p_s, \bar{Y}_{\Omega - s})$ is unbiased for the
population mean $\bar{Y}$.
Espejo (1997) used the above defined Zinger design to
propose a Zinger statistic as
$$t_E = \beta\bar{y}_s + (1 - \beta)(t_r \mid s \text{ fixed}), \qquad (7.8.1)$$
where $\beta$ is a non-zero real constant. Then we have the following theorems.
Espejo (1997) has shown that the estimators proposed by Gautschi (1957), Heilbron
(1978), Wolter (1984), Wu (1984), Ruiz and Santos (1992) and Rana and Singh
(1989) are also special cases of the estimator (7.8.1). Systematic sampling has also
been discussed by Madow and Madow (1944), Madow (1949,1953), Finney (1948,
1995), Sukhatme, Panse, and Sastri (1958), and Reddy (1980) .
The population may consist of a periodic trend given by the sine curve
$$Y_i = a + \sin(\pi i/n + p), \qquad (7.9.1)$$
where i varies from 0 to an integral multiple of 2n. In Figure 7.9.1, we have taken
a = 0, p = 0.1, n = 10, $\pi$ = 22/7 and i = 0, 1, 2, 3, ..., 120. Here 2 × n = 20, which means
successive sampling units will repeat themselves after every 20th value. It can be
easily observed that a 5% systematic sample from such a population will consist of
sampling units drawn from the same position of each cycle. An estimate from such
a sample will be only as good as a single value. On the other hand, a 5% random sample
will contain units from different parts of the population, and estimates from such
samples will be more precise in the presence of a periodic trend. In Figure 7.9.1, the
height of the curve is equal to the value of the study variable Y. Now if the skip k
is equal to the period of the sine curve or an integral multiple of it, then the units
marked with the circles are in the sample. Since all circles are at the same height
from the x axis, every observation within the systematic sample is exactly
the same. In other words, the sample carries the same information as any single
unit of the population. Thus this case can be considered the least favourable for
systematic sampling. In contrast, the most favourable situation occurs when the
span k is an odd multiple of the half-period. In this case, every systematic sample
mean is equal to the population mean, which results in zero variance of the estimator of
the population mean. Such situations of half-periodicity are indicated in Figure 7.9.1
with squares. The choice between these two cases depends upon the relationship
between k and the wavelength. Increase in periodicity in cyclic populations, while
selecting the sample, increases the efficiency of the estimates. Madow (1949),
Finney (1948, 1950), and Milne (1959) have observed linear and periodic type
trends in natural populations.
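The least and most favourable cases for a periodic population can be reproduced with a short Python sketch. It departs from the figure in two labelled ways: it uses `math.pi` rather than 22/7, and it takes exactly N = 120 units (i = 0, ..., 119) so that the population holds six complete cycles. The function names are hypothetical.

```python
import math


def sine_population(a, p, n, N):
    """Y_i = a + sin(pi * i / n + p) for i = 0..N-1 (period 2n)."""
    return [a + math.sin(math.pi * i / n + p) for i in range(N)]


def systematic_samples(y, k):
    """All k systematic samples; the 0-based offset r plays the role of the random start."""
    return [y[r::k] for r in range(k)]


a, p, n, N = 0.0, 0.1, 10, 120
y = sine_population(a, p, n, N)

# Skip equal to the period (k = 20): every unit in a sample has the same value
spread = max(max(s) - min(s) for s in systematic_samples(y, 20))

# Skip equal to the half-period (k = 10): every sample mean equals a
worst_gap = max(abs(sum(s) / len(s) - a) for s in systematic_samples(y, 10))

print(spread < 1e-9, worst_gap < 1e-9)
```

With k = 20 each sample repeats one value six times, so its mean varies with the random start as much as a single observation does; with k = 10 consecutive sampled values cancel in pairs.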
The simplest way is to select a pair of random numbers (i, j) such that i ≤ r and
j ≤ s. Thus the random location of a grid is unique. For example, let us suppose
the population consists of N = 100 units and we wish to select a sample of n = 9
units. The simplest way to create 100 grid areas is to form a plane with r = 10 rows
and s = 10 columns as shown in Figure 7.10.2.
Fig. 7.10.2 Square grid or Aligned sample.
EXERCISES
Exercise 7.1. Give the circumstances under which systematic sampling is preferred
over simple random sampling. Discuss the difficulties in estimating the variance of
the estimator of the population mean under the systematic sampling design.
Exercise 7.2. What was the need for circular systematic sampling? Is it always
preferable to the usual systematic sampling? Give a practical example of using
circular systematic sampling.
Exercise 7.4. What is the Yates end correction in systematic sampling? How does it affect
the variance of the estimator of the population mean under systematic sampling? Is
there any difficulty in implementing this method in actual practice?
Exercise 7.5. Discuss the systematic sampling strategies under ( a ) a linear trend,
( b ) a cyclic trend.
Exercise 7.6. What is Balanced Systematic Sampling (BSS)? Show that the
variance of the estimator of the population mean under BSS reduces to zero.
Exercise 7.7. In a population with quadratic trend $Y_i = i^2$, i = 1, 2, ..., 25, as shown in
the following figure,
[Figure: quadratic trend, $i^2$ plotted against i]
compare the value of $V(\bar{y}_{sys}) = E(\bar{y}_{sys} - \bar{Y})^2$ given by the k-th systematic sample of size 5
by the Yates end correction method and the usual method of systematic sampling.
Exercise 7.8. Mr. Bean observed that the alternating current follows a sine curve as
shown in Figure 7.9.1. Mr. Bean's interest was to estimate the average amount
of current. He used systematic sampling to select a sample of n = 4 units with
random start r = 3 from a population of N = 40 as shown below.
Unit:      3      13      23      33
Current:  +0.86  -0.86   +0.86   -0.86
From the sample information, he obtained that the average current is zero. Do you
agree with him? If not, why?
Exercise 7.10. If the units in the population can be arranged in increasing (or
decreasing) order of magnitude of the study variable, show that the value of the
intraclass correlation coefficient is negative (refer to Chapter 9). Also show that
both balanced systematic sampling and modified systematic sampling remain
superior to the usual systematic sampling for estimating the population total or mean.
Hint: Reddy (1980) .
Given that balanced systematic sampling is due to Murthy (1977) and modified
systematic sampling is due to Singh, Jindal, and Garg (1968) .
Hint: Reddy (1980).
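Exercise 7.13. Consider a superpopulation model m: $Y_i = \alpha + \beta\left(i - \frac{N+1}{2}\right) + e_i$.
Let the expected mean squared error
of the estimator $\bar{y}_s$ be defined as $E_m E_p[\bar{y}_s - \bar{Y}]^2$ for s = sys, mss, css and bss over
the design p. Show that:
( a ) For systematic sampling
$$E_m[MSE(\bar{y}_{sys})] = \frac{\sigma^2(1-f)}{n} + \frac{\beta^2(k^2-1)}{12};$$
( b ) For modified systematic sampling (Singh, Jindal, and Garg, 1968)
$$E_m[MSE(\bar{y}_{mss})] = \begin{cases} \dfrac{\sigma^2(1-f)}{n} + \dfrac{\beta^2(k^2-1)}{12n^2} & \text{if } n \text{ is odd}, \\[2ex] \dfrac{\sigma^2(1-f)}{n} & \text{if } n \text{ is even}; \end{cases}$$
( c ) For centrally located systematic sampling
$$E_m[MSE(\bar{y}_{css})] = \begin{cases} \dfrac{\sigma^2(1-f)}{n} + \dfrac{\beta^2}{4} & \text{if } k \text{ is even}, \\[2ex] \dfrac{\sigma^2(1-f)}{n} & \text{if } k \text{ is odd}; \end{cases}$$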
- =y+ - - - - £ .1~ .J .
.,1
2 n ;=1
( a ) Show that Ylr is the best linear unbiased estimator of the population mean Y.
Also show the following relations :
2(1-
. [ (_ )
( b ) under SRSWOR samplmg Em MSE Y lr srswor =
] 00 f) + oo 2((k -1)) ;
n nk n-l
2(1- 2
. . [ ( _ ) ] _ 00 f) oo (k - lXk +l ) .
( c ) under systematIc samphng Em MSE Ylr sys -
n
+ 2( X )'
nk n -1 n + 1
if k is odd,
if k is even .
Exercise 7.15. ( a ) Under the model $m: Y_i = \alpha + \beta i + \gamma x_i + e_i$, where $\alpha$, $\beta$ and $\gamma$
are constants, $e_i \sim N(0, \sigma^2)$, $x_i$ is a periodic function of i with a period of
2f, and N = 2fQ = nk, compare the expected variances given by $E_m[V_{srswr}]$,
$E_m[V_{sys}]$, $E_m[V_{srswor}]$, $E_m[V_{bss}]$, and $E_m[V_{mss}]$.
Hint: Madow and Madow (1944).
( b ) After eliminating the linear trend from the model $m: Y_i = \alpha + \beta i + \gamma x_i + e_i$,
show that the following relations hold:
(i ) EmlVsysJ::; Em[vbssl
( ii ) Em[vbssl = EmlvsysJ::; Em [Vsrswor] for k/f and n both are even.
(iii) EmlvsysJ'" Em[Vsrswor] for k/f being odd.
and
( iv ) Em[Vsrswor ] '" Em[Vmss] for k/f, k and n/2 are odd.
Hint: Bellhouse and Rao (1975).
Exercise 7.16. Under the model $m: Y_i = \rho Y_{i-1} + e_i$, i = 1, 2, ..., N, where the $e_i \sim (0, \sigma^2)$
are independent and identically distributed, show that the centrally located
systematic sample (c.s.s.) is optimum.
Hint: Blight (1973).
Exercise 7.17. Let N (= kn, where $n \le k$) be the population size. The population
units $U_1, U_2, ..., U_N$ are arranged in an n x k matrix M (say) and the jth row of the
matrix M is denoted by $R_j$, j = 1, 2, ..., n. Note that the elements of the row are
$R_j = \{U_{(j-1)k+i},\ i = 1, 2, ..., k\}$. Consider a sampling scheme that selects n units from the
diagonal of the matrix M, so that they belong to different rows and columns. Let $y_{ij}$
be the value corresponding to the ith row and jth column; then the sample will consist of
the observations given by $S_r = \{y_{1r}, y_{2(r+1)}, ..., y_{n(r+n-1)}\}$, for r = 1, 2, ..., k. Note that
if r + n - 1 > k then it has to be reduced mod k.
( a ) In the hypothetical situation of a linear trend $Y_i = a + bX_i$, i = 1, 2, ..., N, show
that the variances of the simple random sample mean ($\bar{y}_r$), the systematic sample
mean ($\bar{y}_{sy}$) and the diagonal systematic sampling mean ($\bar{y}_{dsy}$) are given by
Practical 7.3. Use two-letter abbreviations to sort the 50 states listed in population
1 of the Appendix, and then write them on a circle in the clockwise direction as
shown below:
Select a circular systematic sample of 10 states out of 50 states, and collect the
information from population 1 given in the Appendix on the real estate farm loans
from the states selected in the CS sample.
( a ) Estimate the average real estate farm loans in the United States.
Practical 7.6. From population 1 given in the Appendix, apply the following
methods to select a sample of 10 units and compare the estimates.
( a ) Select a sample with random start $1 \le i \le N$ with the units:
\[ s(n) = \begin{cases} \{\, i + jk,\ N - i - jk + 1,\ j = 0, 1, 2, ..., (n/2 - 1) \,\} & \text{if } n \text{ is even}, \\ \{\, i + jk,\ N - i - jk + 1,\ i + (n-1)k/2,\ j = 0, 1, ..., \{(n-1)/2 - 1\} \,\} & \text{if } n \text{ is odd}, \end{cases} \]
which is called modified systematic sampling.
Hint: Singh, Jindal and Garg (1968).
( b ) Select a sample with random start $1 \le i \le N$ with the units:
\[ s(n) = \begin{cases} \{\, i + 2jk,\ 2(j+1)k - i + 1,\ j = 0, 1, 2, ..., (n/2 - 1) \,\} & \text{if } n \text{ is even}, \\ \{\, i + 2jk,\ 2(j+1)k - i + 1,\ i + (n-1)k,\ j = 0, 1, ..., \{(n-3)/2\} \,\} & \text{if } n \text{ is odd}, \end{cases} \]
which is called balanced systematic sampling.
Hint: Murthy (1977), Sethi (1965).
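The two selection rules in ( a ) and ( b ) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the population size is assumed to satisfy N = nk exactly, and the random start is assumed to lie between 1 and k so that every generated label falls in 1, ..., N.

```python
def modified_systematic_sample(N, n, i):
    """Unit labels (1-based) for modified systematic sampling with random start i.

    Assumes N = n * k and 1 <= i <= k (an assumption of this sketch).
    """
    k = N // n  # sampling interval
    if n % 2 == 0:
        pairs, extra = range(n // 2), []
    else:
        pairs, extra = range((n - 1) // 2), [i + (n - 1) * k // 2]
    units = []
    for j in pairs:
        units += [i + j * k, N - i - j * k + 1]
    return sorted(units + extra)


def balanced_systematic_sample(N, n, i):
    """Unit labels (1-based) for balanced systematic sampling with random start i."""
    k = N // n  # sampling interval
    if n % 2 == 0:
        pairs, extra = range(n // 2), []
    else:
        pairs, extra = range((n - 3) // 2 + 1), [i + (n - 1) * k]
    units = []
    for j in pairs:
        units += [i + 2 * j * k, 2 * (j + 1) * k - i + 1]
    return sorted(units + extra)


# Hypothetical illustration: N = 50 units, n = 10, random start i = 2.
print(modified_systematic_sample(50, 10, 2))
print(balanced_systematic_sample(50, 10, 2))
```

Both rules pair each unit taken from one end of a block with a unit taken from the opposite end, which is why they can balance out a linear trend.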
( c ) Apply the following formulae to estimate the variance in each situation and
construct 95% confidence intervals:
( i ) $v_Y = \dfrac{1}{6n(n-2)} \sum_{j=3}^{n} \left( y_{i,j} - 2y_{i,j-1} + y_{i,j-2} \right)^2$, called Yates' method;
( ii ) $v_{sd} = \dfrac{1}{2n(n-1)} \sum_{j=2}^{n} \left( y_{i,j} - y_{i,j-1} \right)^2$, called the method of successive differences.
Practical 7.7. Select all possible five samples each of 10 units with their respective
random start $1 \le i \le 5$ from population 1 given in the Appendix. Calculate the
following estimators of the variance, given by
\[ v_1 = \frac{(1-f)}{n} s_y^2, \quad \text{where } s_y^2 = \frac{1}{n-1} \sum_{j=1}^{n} (y_{i,j} - \bar{y}_i)^2 \text{ and } \bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} y_{i,j}, \]
\[ v_2 = \frac{1-f}{2n(n-1)} \sum_{j=2}^{n} (y_{i,j} - y_{i,j-1})^2, \qquad v_3 = \frac{1-f}{6n(n-2)} \sum_{j=3}^{n} (y_{i,j} - 2y_{i,j-1} + y_{i,j-2})^2, \]
and
\[ v_4 = \frac{1-f}{3.5\,n(n-4)} \sum_{j=5}^{n} \left( \frac{y_{i,j}}{2} - y_{i,j-1} + y_{i,j-2} - y_{i,j-3} + \frac{y_{i,j-4}}{2} \right)^2. \]
Also find their empirically expected values, biases, and mean square errors.
Suggest an efficient estimator based on the empirical results.
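A minimal Python sketch of the four variance estimators listed above, applied to one systematic sample taken in its order of selection. The function name and the test data are hypothetical, and the divisor 3.5 in $v_4$ is assumed here (it equals the sum of squared coefficients 0.25 + 1 + 1 + 1 + 0.25 of the contrast).

```python
def variance_estimators(y, f):
    """Return (v1, v2, v3, v4) for an ordered systematic sample y with sampling fraction f."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    # v1: treats the systematic sample as a simple random sample
    v1 = (1 - f) / n * s2
    # v2: successive differences
    v2 = (1 - f) / (2 * n * (n - 1)) * sum(
        (y[j] - y[j - 1]) ** 2 for j in range(1, n))
    # v3: second-order differences (Yates-type contrast)
    v3 = (1 - f) / (6 * n * (n - 2)) * sum(
        (y[j] - 2 * y[j - 1] + y[j - 2]) ** 2 for j in range(2, n))
    # v4: five-term contrast; the divisor 3.5 is an assumption of this sketch
    v4 = (1 - f) / (3.5 * n * (n - 4)) * sum(
        (y[j] / 2 - y[j - 1] + y[j - 2] - y[j - 3] + y[j - 4] / 2) ** 2
        for j in range(4, n))
    return v1, v2, v3, v4


# Hypothetical sample with a perfect linear trend, n = 10 and f = 10/50.
sample = [float(v) for v in range(1, 11)]
print(variance_estimators(sample, f=0.2))
```

For a perfectly linear sample the contrasts in $v_3$ and $v_4$ vanish, while $v_1$ and $v_2$ do not, which illustrates how the estimators react differently to trend.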
Practical 7.8. Select t = 2 systematic samples, each of m = n/t units, with two
different random starts from population 1 consisting of N = 50 units. Let k = 5, so
that the condition N = nk = 10 x 5 = 50 is satisfied. Also note that the sample size
n = 10 is divisible by t = 2. Take an SRS of t = 2 units from the first block of
tk = 2 x 5 = 10 units and select every tk = 2 x 5 = 10th unit thereafter. This method
divides the population into tk samples each of size m = N/(tk) = n/t units and
selects t = 2 systematic samples (or clusters) by SRSWOR sampling. From
the selected t samples, each consisting of m units, find the sample means
$\bar{y}_j$, j = 1, 2, ..., t. Estimate the population mean on the basis of the pooled estimator
\[ \bar{y}_{pooled} = \frac{1}{t} \sum_{j=1}^{t} \bar{y}_j, \]
and estimate its variance as
\[ v = \frac{(k-1)}{kt} \frac{1}{(t-1)} \sum_{j=1}^{t} \left( \bar{y}_j - \bar{y}_{pooled} \right)^2. \]
Derive a 95% confidence interval estimate.
Hint: Iachan (1982), Gautschi (1957), Shiue (1966).
Practical 7.9. The following figure shows nine trees on a portion of the Indian Grand
Trunk Road with their heights (feet) and timber weights (kg):
[Figure: Nine trees along the Indian Grand Trunk Road.]
Height: 10.5 11.5 12.2 8.4 15.9 12.7 14.3 11.7 10.5
Weight: 202 125 219 107 187 198 210 213 209
( a ) Select all the possible three samples each of three trees using systematic
sampling, and estimate the average height (and weight) from each sample.
( b ) Use these estimates to estimate the variance. (Rule: Use first column of the
Pseudo-Random Number Table 1 given in the Appendix .)
8. STRATIFIED AND POST-STRATIFIED SAMPLING
[Figure: A population of N units is divided into L homogeneous strata (groups) of sizes N_1, N_2, ..., N_L; samples of sizes n_1, n_2, ..., n_L are selected from the strata with design p.]
[Figure: A population of N units, with the n sampled units distributed among the post-strata.]
Thus the difference between stratified and post-stratified sampling schemes is that
in stratified sampling the sub-sample size $n_h$ is a fixed, predetermined number,
whereas in post-stratified sampling it is a random variable.
Using the concept of a weighted average, the true mean of the
whole population can be written as:
\[ \bar{Y} = \left(\frac{N_1}{N}\right)\bar{Y}_1 + \left(\frac{N_2}{N}\right)\bar{Y}_2 + \cdots + \left(\frac{N_L}{N}\right)\bar{Y}_L = W_1\bar{Y}_1 + W_2\bar{Y}_2 + \cdots + W_L\bar{Y}_L = \sum_{h=1}^{L} W_h \bar{Y}_h. \]
Suppose a sample of size $n_h$ is drawn using SRSWOR sampling from the hth
population stratum consisting of $N_h$ units such that $\sum_{h=1}^{L} n_h = n$, the required sample
size. Assume the value of the ith unit of the study variable selected from the hth
stratum is denoted by $y_{hi}$, where $i = 1, 2, ..., n_h$, and $W_h = N_h/N$ is the known
proportion of population units falling in the hth stratum.
where $\bar{y}_h = n_h^{-1} \sum_{i=1}^{n_h} y_{hi}$ denotes the hth stratum sample mean.
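Proof. We have
\[ E(\bar{y}_{st}) = E\left[\sum_{h=1}^{L} W_h \bar{y}_h\right] = \sum_{h=1}^{L} W_h E(\bar{y}_h) = \sum_{h=1}^{L} W_h \bar{Y}_h = \sum_{h=1}^{L} \frac{N_h}{N}\left[\frac{1}{N_h}\sum_{i=1}^{N_h} Y_{hi}\right] = \frac{1}{N}\sum_{h=1}^{L}\sum_{i=1}^{N_h} Y_{hi} = \frac{Y}{N} = \bar{Y}. \]
Hence the theorem.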
Theorem 8.1.2. Under SRSWOR sampling, the variance of the estimator $\bar{y}_{st}$ is
given by
\[ V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2 \left(\frac{1-f_h}{n_h}\right) S_{hy}^2. \qquad (8.1.2) \]
Proof. Note that the strata are independent, and under SRSWOR sampling we have
\[ V(\bar{y}_{st}) = V\left[\sum_{h=1}^{L} W_h \bar{y}_h\right] = \sum_{h=1}^{L} W_h^2 V(\bar{y}_h) = \sum_{h=1}^{L} W_h^2 \left(\frac{1-f_h}{n_h}\right) S_{hy}^2. \]
Hence the theorem.
Example 8.1.1. The following data show daily temperatures (in °F) in London and New
York (NY):
City   Temperature (°F)
NY 48
London 54
NY 52
NY 47
London 57
NY 54
NY 49
London 59
NY 53
NY 50
NY 52
NY 57
London 55
NY 54
London 68
NY 49
NY 51
London 61
NY 55
NY 53
London 50
Solution. ( a ) SRSWOR sampling: From the above data we are given N = 21,
n = 4, f = n/N = 4/21, and
NY 48 2304
London 54 2916
NY 52 2704
NY 47 2209
London 57 3249
NY 54 2916
NY 49 2401
London 59 3481
NY 53 2809
NY 50 2500
NY 52 2704
NY 57 3249
London 55 3025
NY 54 2916
London 68 4624
NY 49 2401
NY 51 2601
London 61 3721
NY 55 3025
NY 53 2809
London 50 2500
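which implies
\[ \sum_{i=1}^{N} Y_i = 1128, \qquad \sum_{i=1}^{N} Y_i^2 = 61064 \]
and
London (stratum 1)    New York (stratum 2)
Y_1i   Y_1i^2         Y_2i   Y_2i^2
54     2916           48     2304
57     3249           52     2704
59     3481           47     2209
55     3025           54     2916
68     4624           49     2401
61     3721           53     2809
50     2500           50     2500
                      52     2704
                      57     3249
                      54     2916
                      49     2401
                      51     2601
                      55     3025
                      53     2809
For stratum 1 we have
\[ N_1 = 7, \quad W_1 = \frac{N_1}{N} = \frac{7}{21} = 0.3333333, \quad f_1 = \frac{n_1}{N_1} = \frac{2}{7}, \quad \sum_{i=1}^{N_1} Y_{1i} = 404, \quad \sum_{i=1}^{N_1} Y_{1i}^2 = 23516 \]
and
\[ S_{1y}^2 = (N_1 - 1)^{-1}\left[\sum_{i=1}^{N_1} Y_{1i}^2 - N_1^{-1}\left(\sum_{i=1}^{N_1} Y_{1i}\right)^2\right] = \frac{23516 - (404)^2/7}{7-1} = 33.24. \]
Similarly, for stratum 2 we have
\[ N_2 = 14, \quad W_2 = \frac{N_2}{N} = \frac{14}{21} = 0.6666667, \quad f_2 = \frac{n_2}{N_2} = \frac{2}{14}, \quad \sum_{i=1}^{N_2} Y_{2i} = 724, \quad \sum_{i=1}^{N_2} Y_{2i}^2 = 37548 \]
and
\[ S_{2y}^2 = (N_2 - 1)^{-1}\left[\sum_{i=1}^{N_2} Y_{2i}^2 - N_2^{-1}\left(\sum_{i=1}^{N_2} Y_{2i}\right)^2\right] = \frac{37548 - (724)^2/14}{14-1} = 8.22. \]
Thus the variance of the estimator of the population mean in stratified random
sampling is given by
\[ V(\bar{y}_{st}) = \sum_{h=1}^{2} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2 = W_1^2\left(\frac{1-f_1}{n_1}\right)S_{1y}^2 + W_2^2\left(\frac{1-f_2}{n_2}\right)S_{2y}^2 \]
\[ = \left(\frac{7}{21}\right)^2\left(\frac{1-2/7}{2}\right) \times 33.24 + \left(\frac{14}{21}\right)^2\left(\frac{1-2/14}{2}\right) \times 8.22 \]
= 1.319246 + 1.565714 = 2.88496.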
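The stratum variances and the variance formula (8.1.2) used above can be checked with a short Python sketch. The function names are hypothetical; the data are the London and New York temperatures listed in this example, with $n_1 = n_2 = 2$.

```python
london = [54, 57, 59, 55, 68, 61, 50]
new_york = [48, 52, 47, 54, 49, 53, 50, 52, 57, 54, 49, 51, 55, 53]


def pop_variance(values):
    """Population mean square S^2 with divisor (N_h - 1)."""
    size = len(values)
    mean = sum(values) / size
    return sum((v - mean) ** 2 for v in values) / (size - 1)


def stratified_variance(strata, n_h):
    """V(ybar_st) = sum of W_h^2 (1 - f_h) / n_h * S_hy^2 under SRSWOR within strata."""
    N = sum(len(values) for values in strata)
    total = 0.0
    for values, n in zip(strata, n_h):
        N_h = len(values)
        W_h = N_h / N   # stratum weight
        f_h = n / N_h   # stratum sampling fraction
        total += W_h ** 2 * (1 - f_h) / n * pop_variance(values)
    return total


v_st = stratified_variance([london, new_york], n_h=[2, 2])
print(round(pop_variance(london), 2), round(pop_variance(new_york), 2), round(v_st, 3))
```

Working with unrounded stratum variances gives a value of about 2.885, which agrees with the hand calculation up to rounding of the intermediate results.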
( c ) Relative efficiency: The percent relative efficiency of the stratified random
sampling over SRSWOR sampling is given by
Thus the sample mean estimate of the population mean based on SRSWOR is
\[ \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{204}{4} = 51. \]
The sample variance is given by
A (1 - α)100% confidence interval estimate of the population mean is given by
\[ \bar{y} \pm t_{\alpha/2}(df = n-1)\sqrt{v_{srswor}(\bar{y})}. \]
Thus a 95% confidence interval estimate of the population mean is given by
\[ 51 \pm t_{0.025}(df = 4-1)\sqrt{4.182}. \]
Using Table 2 from the Appendix we have
\[ 51 \pm 3.182\sqrt{4.182}, \quad \text{or} \quad [44.49, 57.51]. \]
Note that the true population mean $\bar{Y} = 53.71$ lies in the 95% confidence interval.
The interpretation of this confidence interval estimate is that we are 95% sure that
the true population mean lies between 44.49 °F and 57.51 °F.
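( e ) 95% CI estimate using stratified random sampling: We selected two units
from Stratum 1 and two units from Stratum 2 using the lottery method:
y_1i   y_1i^2
57     3249
55     3025
\[ s_{1y}^2 = (n_1 - 1)^{-1}\left[\sum_{i=1}^{n_1} y_{1i}^2 - n_1^{-1}\left(\sum_{i=1}^{n_1} y_{1i}\right)^2\right] = \frac{6274 - (112)^2/2}{2-1} = 2.0. \]
From sample stratum 2
\[ \sum_{i=1}^{n_2} y_{2i} = 109, \qquad \bar{y}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} y_{2i} = \frac{109}{2} = 54.5, \qquad \sum_{i=1}^{n_2} y_{2i}^2 = 5953 \]
and
\[ s_{2y}^2 = (n_2 - 1)^{-1}\left[\sum_{i=1}^{n_2} y_{2i}^2 - n_2^{-1}\left(\sum_{i=1}^{n_2} y_{2i}\right)^2\right] = \frac{5953 - (109)^2/2}{2-1} = 12.5. \]
Thus an estimator of the variance of the estimator of the population mean in
stratified random sampling is
\[ \hat{v}(\bar{y}_{st}) = \sum_{h=1}^{2} W_h^2\left(\frac{1-f_h}{n_h}\right)s_{hy}^2 = W_1^2\left(\frac{1-f_1}{n_1}\right)s_{1y}^2 + W_2^2\left(\frac{1-f_2}{n_2}\right)s_{2y}^2 \]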
Example 8.1.2. Select two units by SRSWOR sampling from each stratum of the
population 5 given in the Appendix . Collect the information on the yield/hectare of
the tobacco crop from the countries selected in the sample. Assuming that the total
number of countries in each continent are known, estimate the average yield/hectare
of the tobacco crop in the world. Construct the 95% confidence interval.
Solution. Using Pseudo-Random Number (PRN) Table 1 given in the Appendix we
have the following sample information and some results.
Stratum  PRN column(s)  Random number  y_hi   W_h     ybar_h  s_hy^2
1        1              5              2.03   0.0566  2.015   0.00045
                        6              2.00
2        2              4              1.99   0.0566  1.750   0.11520
                        2              1.51
3        3              2              3.69   0.0755  3.240   0.40500
                        8              2.79
4        4 and 5        07             1.36   0.0943  1.850   0.48020
                        08             2.34
5        6 and 7        07             2.05   0.1132  1.155   1.60210
                        02             0.26
6        8              2              1.61   0.0377  1.785   0.06130
                        1              1.96
7        9 and 10       20             2.10   0.2830  2.305   0.08410
                        12             2.51
8        11 and 12      15             1.33   0.1604  1.090   0.11520
                        06             0.85
9        13 and 14      05             1.33   0.0944  1.240   0.01620
                        07             1.15
10       15             3              2.58   0.0283  1.765   1.32850
                        2              0.95
Then we have
where
\[ v_h = W_h^2\left(\frac{1-f_h}{n_h}\right)s_{hy}^2. \]
Thus a stratified estimate of the average yield/hectare of the tobacco crop in the
world is
\[ \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h = 1.8243 \]
and an estimate of the variance of the stratified estimator is given by
\[ \hat{v}(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)s_{hy}^2 = 0.015905. \]
Using Table 2 from the Appendix the 95% confidence interval estimate is
We now know that the variance of the estimator of the population mean under stratified
sampling is given by formula (8.1.2). Furthermore, we know that the
sample sizes $n_h$ for stratified sampling have to be decided once the sample size n is
chosen. A natural question arises: What choice of $n_h$ will make the variance of the
estimator $\bar{y}_{st}$ a minimum? There are several ways to answer this question, but we
will discuss only a few of them here.
As the name of the method suggests, the sub-sample sizes are equal, i.e., $n_h = n/L$.
Under this choice of sample allocation, the variance of the estimator $\bar{y}_{st}$ reduces to
\[ V(\bar{y}_{st})_E = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2 = \sum_{h=1}^{L} W_h^2\left(\frac{N_h - n_h}{N_h n_h}\right)S_{hy}^2 = \frac{1}{N^2}\sum_{h=1}^{L} \frac{N_h(N_h - n/L)}{n/L} S_{hy}^2 \]
\[ = \frac{1}{nN^2}\sum_{h=1}^{L} N_h\left(LN_h - n\right)S_{hy}^2. \qquad (8.2.1) \]
Under this allocation the sub-sample size from each stratum is proportional to the
size of the subpopulation in the stratum. That is
\[ n_h \propto N_h. \qquad (8.2.3) \]
To find the constant of proportionality we have
\[ n_h = K N_h. \qquad (8.2.4) \]
Taking the sum on both sides of (8.2.4) over all possible strata we have
\[ \sum_{h=1}^{L} n_h = K \sum_{h=1}^{L} N_h, \quad \text{or} \quad n = KN \]
which implies that
\[ K = n/N. \qquad (8.2.5) \]
On substituting (8.2.5) in (8.2.4) we have the proportional allocation in the hth
stratum as $n_h = nN_h/N$. Under this allocation the variance of the estimator $\bar{y}_{st}$ is
\[ V(\bar{y}_{st})_P = \left(\frac{1-f}{n}\right)\sum_{h=1}^{L} W_h S_{hy}^2. \qquad (8.2.8) \]
The unbiased estimator of $V(\bar{y}_{st})_P$ can easily be obtained from (8.2.7) and (8.2.8) by
replacing $S_{hy}^2$ with its sample estimator $s_{hy}^2$.
Example 8.2.1. Select a sample of 40 countries from population 5 using the method
of proportional allocation. Record the yield/hectare of the tobacco crop from the
selected countries. Estimate the average yield/hectare of the tobacco crop in the
world. Estimate the variance under the method of proportional allocation. Construct
a 95% confidence interval.
Solution. By the method of proportional allocation, the number of units to be
selected from the hth stratum is given by $n_h = nN_h/N$. Thus we have
Stratum  1  2  3  4   5   6  7   8   9   10  Total
N_h      6  6  8  10  12  4  30  17  10  3   106
n_h      3  3  3  3   4   2  11  6   3   2   40
1 1 5 Nicaragua 2.03
6 Panama 2.00
2 El Salvador 1.79
2 2 4 Jamaica 1.99
2 Dominican Rep 1.51
1 Cuba 0.63
3 3 2 Belgium-Lux 3.69
8 Spain 2.79
1 Austria 1.90
4 4 and 5 07 Macedonia 1.36
08 Poland 2.34
02 Albania 0.64
5 6 and 7 07 Moldova 2.05
02 Armenia 0.26
10 Turkmenistan 2.36
09 Tajikistan 2.48
6 8 2 Libya 1.61
1 Algeria 1.96
7 9 and 10 20 Nigeria 2.10
12 Kenya 2.51
30 Zambia 1.29
15 Malawi 1.22
07 Central African Rep 0.87
26 Togo 0.50
22 Zimbabwe 2.06
01 Angola 0.99
11 Cote d'ivoire 0.26
05 Zaire 1.11
06 Cameroon 1.62
8 11 and 12 15 Thailand 1.33
06 Indonesia 0.85
02 Burma 1.22
10 Korea, South 2.02
11 Laos 0.75
05 China 1.75
9 13 and 14 05 Lebanon 1.33
07 Syria 1.15
01 Cyprus 1.50
10 15 3 New Zealand 2.58
2 Solomon Islands 0.95
Thus we have
Thus an estimate of the yield/hectare of the tobacco crop during 1998 in the world is
\[ \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h = 1.5654 \]
and
\[ \hat{v}(\bar{y}_{st})_P = \left(\frac{1-f}{n}\right)\sum_{h=1}^{L} W_h s_{hy}^2 = \frac{1 - 40/106}{40} \times 0.4970 = 0.007736. \]
Using Table 2 from the Appendix the 95% confidence interval estimate of the
yield/hectare in the world during 1998 is
This method is based on the cost aspect of the survey. Let $C_h$ be the cost of
observing the variable y in the hth stratum and let $C_t$ be the total fixed cost of the
survey; then
\[ C_t = C_0 + \sum_{h=1}^{L} n_h C_h \qquad (8.2.9) \]
where $C_0$ stands for the known overhead cost. From (8.1.2), the variance of the
estimator $\bar{y}_{st}$ is
\[ V(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2. \qquad (8.2.10) \]
We now discuss two cases:
( i ) the total cost is fixed; ( ii ) the variance is fixed.
Case I. Total cost is fixed: Minimization of (8.2.10) subject to (8.2.9) leads to the
Lagrange function
\[ L_1 = \sum_{h=1}^{L}\left(\frac{1}{n_h} - \frac{1}{N_h}\right)W_h^2 S_{hy}^2 + \lambda\left[C_0 + \sum_{h=1}^{L} n_h C_h - C_t\right]. \qquad (8.2.11) \]
On differentiating (8.2.11) with respect to $n_h$ and equating to zero we have
\[ n_h = W_h S_{hy}\big/\left(\sqrt{\lambda}\sqrt{C_h}\right). \qquad (8.2.12) \]
Note that $\sum_{h=1}^{L} n_h = n$; thus from (8.2.12) we have
\[ \frac{1}{\sqrt{\lambda}} = n \Big/ \sum_{h=1}^{L}\left[\frac{W_h S_{hy}}{\sqrt{C_h}}\right]. \qquad (8.2.13) \]
On substituting (8.2.13) in (8.2.12) we have
\[ n_h = \frac{W_h S_{hy}}{\sqrt{\lambda}\sqrt{C_h}} = n\,\frac{W_h S_{hy}}{\sqrt{C_h}} \Big/ \sum_{h=1}^{L}\frac{W_h S_{hy}}{\sqrt{C_h}}. \qquad (8.2.14) \]
In the particular case $C_1 = C_2 = \cdots = C_L = C$, that is, when the cost of sampling in each
stratum is the same, (8.2.14) becomes
\[ n_h = n\left[\frac{W_h S_{hy}}{\sum_{h=1}^{L} W_h S_{hy}}\right]. \qquad (8.2.15) \]
In other words the optimum allocation reduces to the famous Neyman (1934)
allocation. On substituting (8.2.14) in (8.2.10), the variance of the estimator $\bar{y}_{st}$
under optimum allocation is given by
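\[ V(\bar{y}_{st})_{opt} = \sum_{h=1}^{L} W_h^2 S_{hy}^2\left[\sum_{h'=1}^{L}\frac{W_{h'} S_{h'y}}{\sqrt{C_{h'}}} \Big/ \left(n\,\frac{W_h S_{hy}}{\sqrt{C_h}}\right) - \frac{1}{N_h}\right] \]
\[ = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_{hy}\sqrt{C_h}\right)\left(\sum_{h=1}^{L}\frac{W_h S_{hy}}{\sqrt{C_h}}\right) - \sum_{h=1}^{L}\frac{W_h^2 S_{hy}^2}{N_h}. \qquad (8.2.16) \]
On substituting (8.2.15) in (8.2.10), the variance of the estimator $\bar{y}_{st}$ under Neyman
allocation is given by
\[ V(\bar{y}_{st})_N = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_{hy}\right)^2 - \sum_{h=1}^{L}\frac{W_h^2 S_{hy}^2}{N_h}. \qquad (8.2.17) \]
It can be easily shown that if $f_h$ is negligible then
\[ V(\bar{y}_{st})_{opt} = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_{hy}\sqrt{C_h}\right)\left(\sum_{h=1}^{L}\frac{W_h S_{hy}}{\sqrt{C_h}}\right) \qquad (8.2.18) \]
and
\[ V(\bar{y}_{st})_N = \frac{1}{n}\left(\sum_{h=1}^{L} W_h S_{hy}\right)^2. \qquad (8.2.19) \]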
Case II. Total variance is fixed: In this case we minimize the cost given by (8.2.9)
subject to the fixed variance $V_0$:
\[ V(\bar{y}_{st}) = \sum_{h=1}^{L}\left(\frac{1}{n_h} - \frac{1}{N_h}\right)W_h^2 S_{hy}^2 = V_0. \qquad (8.2.20) \]
In such situations, the Lagrange function $L_2$ is given by
\[ L_2 = C_0 + \sum_{h=1}^{L} n_h C_h + \lambda\left[\sum_{h=1}^{L}\left(\frac{1}{n_h} - \frac{1}{N_h}\right)W_h^2 S_{hy}^2 - V_0\right]. \qquad (8.2.21) \]
On differentiating (8.2.21) with respect to $n_h$ and equating to zero we have
\[ n_h = W_h S_{hy}\sqrt{\lambda}\big/\sqrt{C_h}. \qquad (8.2.22) \]
Now there are two possibilities.
( a ) Total sample size is fixed: Adding (8.2.22) over all possible strata we have
( b ) Minimum sample size for a fixed value of variance: From (8.2.20) we have
\[ \sum_{h=1}^{L} W_h^2 S_{hy}^2/n_h = V_0 + \sum_{h=1}^{L} W_h^2 S_{hy}^2/N_h. \qquad (8.2.26) \]
On substituting the value of $n_h$ from (8.2.22) we have
Summing over all possible strata we obtain the minimum sample size n for the
fixed variance as
\[ n = \sum_{h=1}^{L} n_h = \left(\sum_{h=1}^{L} W_h S_{hy}/\sqrt{C_h}\right)\left(\sum_{h=1}^{L} W_h S_{hy}\sqrt{C_h}\right) \Big/ \left[V_0 + \sum_{h=1}^{L} W_h^2 S_{hy}^2/N_h\right]. \qquad (8.2.29) \]
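The following Python sketch illustrates how (8.2.14), (8.2.16) and (8.2.29) fit together. All numbers are hypothetical (three strata with made-up sizes, standard deviations and unit costs), and the function names are not from the text. The check at the end uses the fact that, if the variance attained by the optimum allocation of n units is taken as $V_0$, formula (8.2.29) should return the same n.

```python
from math import sqrt


def optimum_allocation(n, W, S, C):
    """n_h proportional to W_h * S_hy / sqrt(C_h), as in (8.2.14)."""
    a = [w * s / sqrt(c) for w, s, c in zip(W, S, C)]
    total = sum(a)
    return [n * value / total for value in a]


def stratified_variance(n_h, W, S, N_h):
    """V(ybar_st) = sum of (1/n_h - 1/N_h) * W_h^2 * S_hy^2, as in (8.2.20)."""
    return sum((1 / m - 1 / M) * w ** 2 * s ** 2
               for m, w, s, M in zip(n_h, W, S, N_h))


def minimum_sample_size(V0, W, S, C, N_h):
    """Minimum n for a fixed variance V0, as in (8.2.29)."""
    a = sum(w * s / sqrt(c) for w, s, c in zip(W, S, C))
    b = sum(w * s * sqrt(c) for w, s, c in zip(W, S, C))
    correction = sum(w ** 2 * s ** 2 / M for w, s, M in zip(W, S, N_h))
    return a * b / (V0 + correction)


# Hypothetical population: sizes, standard deviations and per-unit costs.
N_h = [200, 300, 500]
S = [4.0, 6.0, 10.0]
C = [1.0, 4.0, 9.0]
W = [size / sum(N_h) for size in N_h]

n_h = optimum_allocation(60, W, S, C)
V0 = stratified_variance(n_h, W, S, N_h)
print([round(v, 2) for v in n_h], round(V0, 4),
      round(minimum_sample_size(V0, W, S, C, N_h), 2))
```

With equal costs the same `optimum_allocation` function returns the Neyman allocation (8.2.15).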
Case III. Total variance and sample size are fixed: Now the Lagrange function is
On substituting the value of $\lambda_2$ from (8.2.33) in (8.2.32) we obtain the allocation of
the given sample of n units into different strata for the fixed level of variance.
Stratum  N_h  s_hy^2   s_hy      N_h s_hy  n N_h s_hy / sum(N_h s_hy)  n_h
1        6    0.02682  0.163788  0.98270   0.5673                      1
2        6    0.21809  0.467001  2.80200   1.6177                      1
3        8    0.34699  0.589065  4.71250   2.7208                      3
4        10   0.23456  0.484314  4.84310   2.7962                      3
5        12   0.58214  0.762981  9.15570   5.2861                      5
6        4    0.15310  0.391280  1.56510   0.9036                      1
7        30   0.34385  0.586387  17.59220  10.1567                     10
8        17   0.37855  0.615264  10.45950  6.0389                      6
9        10   2.01830  1.420669  14.20670  8.2023                      8
10       3    0.97460  0.987218  2.96160   1.7099                      2
Sum                              69.28074
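The unrounded allocation column of the table can be reproduced, up to rounding, with the following Python sketch of the Neyman allocation $n_h = nN_hs_{hy}/\sum_h N_hs_{hy}$. The function name is hypothetical; the stratum sizes and sample variances are the ones tabulated above, with n = 40.

```python
from math import sqrt

N_h = [6, 6, 8, 10, 12, 4, 30, 17, 10, 3]
s2_hy = [0.02682, 0.21809, 0.34699, 0.23456, 0.58214,
         0.15310, 0.34385, 0.37855, 2.01830, 0.97460]
n = 40


def neyman_allocation(n, N_h, s2_hy):
    """Unrounded Neyman allocation: n_h proportional to N_h * s_hy."""
    weights = [size * sqrt(var) for size, var in zip(N_h, s2_hy)]
    total = sum(weights)
    return [n * w / total for w in weights]


alloc = neyman_allocation(n, N_h, s2_hy)
for stratum, value in enumerate(alloc, start=1):
    print(stratum, round(value, 4))
```

Simple rounding of these values would give 41 units in total, so the integer column of the table evidently includes an adjustment (stratum 2 is given one unit rather than two); the sketch therefore leaves the values unrounded.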
05 Zaire 1.11
8 11 and 12 15 Thailand 1.33
06 Indonesia 0.85
02 Burma 1.22
10 Korea, South 2.02
11 Laos 0.75
05 China 1.75
9 13 and 14 05 Lebanon 1.33
07 Syria 1.15
01 Cyprus 1.50
09 Turkey 0.91
06 Oman 1.11
04 Jordan 1.29
03 Iraq 1.09
02 Iran 1.39
10 15 3 New Zealand 2.58
2 Solomon Islands 0.95
This leads to the pooled strata and related results given below:
Pooled strata  N_h  n_h  ybar_h  s_hy^2  f_h     W_h      W_h ybar_h  v_h
1,2,6          16   3    1.8766  0.0537  0.1875  0.15094  0.2832      0.0003313
3              8    3    2.7933  0.8010  0.3750  0.07547  0.2108      0.0009505
4              10   3    1.4467  0.7281  0.3000  0.09434  0.1364      0.0015120
5              12   5    1.7560  0.8073  0.4166  0.11320  0.1987      0.0012070
7              30   10   1.2910  0.5251  0.3333  0.28301  0.3654      0.0028040
8              17   6    1.3200  0.2462  0.3529  0.16037  0.2116      0.0006829
9              10   8    1.2213  0.0363  0.8000  0.09434  0.1152      0.0000081
10             3    2    1.7650  1.3285  0.6666  0.02830  0.0499      0.0001773
Sum                                                       1.5717      0.0076734
where $v_h = W_h^2\left(\dfrac{1-f_h}{n_h}\right)s_{hy}^2$.
Thus an estimate of yield/hectare of the world tobacco crop during 1998 is
Using Table 2 from the Appendix the 95% confidence interval estimate of the
yield/hectare in the world during 1998 is
\[ \bar{y}_{st} \mp t_{\alpha/2}(df = n - L)\sqrt{\hat{v}(\bar{y}_{st})_N}, \quad \text{or} \quad \bar{y}_{st} \mp t_{0.025}(df = 40 - 8)\sqrt{\hat{v}(\bar{y}_{st})_N} \]
or $1.5717 \mp 2.037\sqrt{0.0076734}$, or [1.39306, 1.75014].
Example 8.2.3. Select a sample of 40 countries from population 5 using the method
of optimum allocation. Record the yield/hectare of the tobacco crop from the
selected countries. Estimate the average yield/hectare of the tobacco crop in the
world. Estimate the variance under the method of optimum allocation. Construct a
95% confidence interval.
Given: $C_1$ = $0.5, $C_2$ = $2.0, $C_3$ = $3.0, $C_4$ = $5.0, $C_5$ = $7.0, $C_6$ = $1.5, $C_7$ = $10.0,
$C_8$ = $5.0, $C_9$ = $5.0, and $C_{10}$ = $3.0.
Solution. By the method of optimum allocation, the number of units to be selected
from the hth stratum is
Using the Pseudo-Random Number (PRN) Table 1 from the Appendix we have :
1 1 5 Nicaragua 2.03
6 Panama 2.00
2 2 4 Jamaica 1.99
2 Dominican Rep 1.51
1 Cuba 0.63
3 3 2 Belgium-Lux 3.69
8 Spain 2.79
1 Austria 1.90
4 4 and 5 07 Macedonia 1.36
08 Poland 2.34
02 Albania 0.64
5 6 and 7 07 Moldova 2.05
02 Armenia 0.26
10 Turkmenistan 2.36
09 Tajikistan 2.48
6 8 2 Libya 1.61
1 Algeria 1.96
7 9 and 10 20 Nigeria 2.10
12 Kenya 2.51
30 Zambia 1.29
15 Malawi 1.22
07 Central African Rep 0.87
26 Togo 0.50
22 Zimbabwe 2.06
8 11 and 12 15 Thailand 1.33
06 Indonesia 0.85
02 Burma 1.22
10 Korea, South 2.02
11 Laos 0.75
05 China 1.75
9 13 and 14 05 Lebanon 1.33
07 Syria 1.15
01 Cyprus 1.50
09 Turkey 0.91
06 Oman 1.11
04 Jordan 1.29
03 Iraq 1.09
10 Yemen 1.73
10 15 3 New Zealand 2.58
2 Solomon Islands 0.95
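Thus we have the following results:
where $v_h = W_h^2\left(\dfrac{1-f_h}{n_h}\right)s_{hy}^2$. Thus an estimate of the yield/hectare of the tobacco crop
and an estimate of its variance are
\[ \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h = 1.6155 \quad \text{and} \quad \hat{v}(\bar{y}_{st})_{opt} = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)s_{hy}^2 = 0.010591932. \]
Using Table 2 from the Appendix the 95% confidence interval estimate of the
yield/hectare in the world during 1998 is
\[ \bar{y}_{st} \mp t_{\alpha/2}(df = n - L)\sqrt{\hat{v}(\bar{y}_{st})_{opt}}, \quad \text{or} \quad \bar{y}_{st} \mp t_{0.025}(df = 40 - 10)\sqrt{\hat{v}(\bar{y}_{st})_{opt}} \]
or $1.6155 \mp 2.042\sqrt{0.010591932}$, or [1.4053, 1.8256].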
Example 8.2.4. Find the minimum sample size from population 5 to obtain
estimates of the population mean with different levels of relative variance. Plot relative
variance versus sample size.
Given: $C_1$ = $0.5, $C_2$ = $2.0, $C_3$ = $3.0, $C_4$ = $5.0, $C_5$ = $7.0, $C_6$ = $1.5, $C_7$ = $10.0,
$C_8$ = $5.0, $C_9$ = $5.0, $C_{10}$ = $3.0 and $\bar{Y}$ = 1.5507.
Solution. The minimum sample size n for the fixed variance $V_0$ is given by
Thus we have the following table
\[ RV = \{v(\bar{y}_{st})/\bar{Y}^2\} \times 100. \]
Therefore we have the following table.
[Figure: Relative variance (vertical axis, 0 to 0.04) plotted against sample size (horizontal axis, 0 to 70).]
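\[ \sigma_y^2 = \sum_{h=1}^{L} W_h S_{hy}^2 + \sum_{h=1}^{L} W_h\left(\bar{Y}_h - \bar{Y}\right)^2. \qquad (8.2.40) \]
Thus we have
\[ \sigma_y^2 - \sum_{h=1}^{L} W_h S_{hy}^2 = \sum_{h=1}^{L} W_h\left(\bar{Y}_h - \bar{Y}\right)^2 \ge 0. \qquad (8.2.41) \]
This proves that the inequality (8.2.38) holds. If, however, $\bar{Y}_h = \bar{Y}$ for all h then
the two variances are equal.
Now we prove the second part of the inequality (8.2.34), that is
\[ V(\bar{y}_{st})_P \ge V(\bar{y}_{st})_N \qquad (8.2.42) \]
or
\[ \sum_{h=1}^{L} W_h S_{hy}^2 - \left(\sum_{h=1}^{L} W_h S_{hy}\right)^2 \ge 0 \quad \text{or} \quad \sum_{h=1}^{L} W_h\left[S_{hy} - \sum_{h=1}^{L} W_h S_{hy}\right]^2 \ge 0. \qquad (8.2.43) \]
\[ \cdots = \sum_{h=1}^{L} W_h S_{hy}^2 - \left(\sum_{h=1}^{L} W_h S_{hy}\right)^2 \qquad (8.2.44) \]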
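Theorem 8.2.2. Show that the estimator of gain in efficiency (GE) owed to
stratification with respect to SRSWOR sampling is
and
\[ s_y^2 = \frac{1}{N-1}\left[\sum_{h=1}^{L}\frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}^2 - N\left\{\bar{y}_{st}^2 - \hat{v}(\bar{y}_{st})\right\}\right]. \qquad (8.2.48) \]
Note that
\[ E\left[\sum_{h=1}^{L}\frac{N_h}{n_h}\sum_{i=1}^{n_h} y_{hi}^2\right] = \sum_{h=1}^{L} N_h E\left(\frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}^2\right) = \sum_{h=1}^{L} N_h \frac{1}{N_h}\sum_{i=1}^{N_h} Y_{hi}^2 = \sum_{h=1}^{L}\sum_{i=1}^{N_h} Y_{hi}^2 \]
and
\[ E\left[\bar{y}_{st}^2 - \hat{v}(\bar{y}_{st})\right] = E\left(\bar{y}_{st}^2\right) - E\left\{\hat{v}(\bar{y}_{st})\right\} = V(\bar{y}_{st}) + \left\{E(\bar{y}_{st})\right\}^2 - V(\bar{y}_{st}) = \bar{Y}^2. \qquad (8.2.49) \]
The percentage gain in efficiency (GE) owed to stratification can be defined as
\[ GE = \frac{\{V(\bar{y}) - V(\bar{y}_{st})\}}{V(\bar{y}_{st})} \times 100\%. \qquad (8.2.50) \]
Therefore the theorem follows by using the method of moments.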
\[ \widehat{GE} = \frac{\left[\dfrac{N-n}{Nn}s_y^2 - \hat{v}(\bar{y}_{st})\right]}{\hat{v}(\bar{y}_{st})} \times 100 = \frac{(0.024405 - 0.015905)}{0.015905} \times 100 = 53.44\%. \qquad (8.2.51) \]
Reddy (1978b) has considered the case of a finite population of size N divided into
L strata of sizes $N_h$, h = 1, 2, ..., L. Let $Y_{hi}$ denote the value of the variable Y for
the ith unit in the hth stratum. For estimating the population total $Y = \sum_{h=1}^{L}\sum_{i=1}^{N_h} Y_{hi}$ it is
shown in several books (e.g., Cochran 1963) that stratified random sampling with
proportional allocation is superior to unstratified random sampling provided the
finite population correction factor (f.p.c.) in each stratum is ignored. Reddy (1978b)
has shown that the above result is true even without ignoring the f.p.c., under
proportional allocation with the superpopulation model approach, if the variable Y
satisfies the following condition:
\[ \max_i (Y_{hi}) \le \min_i (Y_{h+1,i}) \quad \text{for } h = 1, 2, ..., L-1. \qquad (8.2.52) \]
Example 8.2.6. In a circus there are three types of elephants, viz., Light, Medium,
and Heavy in weight and some information about them is listed in the following
table.
( d ) Estimate the average weight , Y, of all elephants in the circus using usual
estimator in stratified sampling.
( d ) Estimate of the average weight of all the elephants using stratified sampling is
\[ \bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h = 0.50 \times 1800 + 0.30 \times 3200 + 0.20 \times 5200 = 2900. \]
( e ) Estimate of the variance of $\bar{y}_{st}$ is given by
\[ \hat{v}(\bar{y}_{st}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)s_{hy}^2 \]
\[ = 0.5^2 \times \left(\frac{1-0.08}{20}\right) \times 490^2 + 0.3^2 \times \left(\frac{1-0.08}{12}\right) \times 410^2 + 0.2^2 \times \left(\frac{1-0.08}{8}\right) \times 220^2 \]
= 4143.68.
( f ) Using Table 2 from the Appendix the 95% confidence interval estimate using
stratified sampling is given by
\[ \bar{y}_{st} \mp t_{\alpha/2}(df = n - L)\sqrt{\hat{v}(\bar{y}_{st})}, \quad \text{or} \quad 2900 \mp t_{\alpha/2}(df = 40 - 3)\sqrt{4143.68}, \]
\[ s_y^2 = \frac{\sum_{h=1}^{L}(n_h - 1)s_{hy}^2 + \sum_{h=1}^{L} n_h(\bar{y}_h - \bar{y})^2}{n-1} \]
\[ = \frac{(20-1)490^2 + (12-1)410^2 + (8-1)220^2 + 20(1800-2900)^2 + 12(3200-2900)^2 + 8(5200-2900)^2}{40-1} \]
= 1906405.128.
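The arithmetic in parts ( d ) and ( e ) and the pooled sample variance above can be verified with a few lines of Python. The variable names are hypothetical; the stratum weights, sample sizes, means, standard deviations and the common sampling fraction 0.08 are taken from the calculations shown in this example.

```python
W = [0.50, 0.30, 0.20]        # stratum weights W_h
n_h = [20, 12, 8]             # stratum sample sizes
ybar_h = [1800, 3200, 5200]   # stratum sample means
s_h = [490, 410, 220]         # stratum sample standard deviations
f_h = 0.08                    # sampling fraction used in every stratum

# (d) stratified estimate of the mean
ybar_st = sum(w * m for w, m in zip(W, ybar_h))

# (e) estimated variance of the stratified estimator
v_st = sum(w ** 2 * (1 - f_h) / m * s ** 2 for w, m, s in zip(W, n_h, s_h))

# pooled sample variance: within-stratum part plus between-stratum part
n = sum(n_h)
ybar = sum(m * yb for m, yb in zip(n_h, ybar_h)) / n  # overall sample mean
within = sum((m - 1) * s ** 2 for m, s in zip(n_h, s_h))
between = sum(m * (yb - ybar) ** 2 for m, yb in zip(n_h, ybar_h))
s2_pooled = (within + between) / (n - 1)

print(round(ybar_st, 2), round(v_st, 2), round(s2_pooled, 3))
```

Because the allocation here is proportional, the overall sample mean coincides with the stratified estimate of 2900.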
( k ) The pooled sample variance $s_y^2$ is given by
\[ \bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right) = 2900\left(\frac{145.00}{142.96}\right) = 2941.38 \text{ kg}. \]
( n ) Confidence interval estimate:
Note that $r = \dfrac{\bar{y}}{\bar{x}} = \dfrac{2900}{142.96} = 20.285$ and $f = \dfrac{n}{N} = \dfrac{40}{500} = 0.08$.
Thus an estimate of the mean square error of the ratio estimator is given by
\[ \widehat{MSE}(\bar{y}_R) = \left(\frac{1-f}{n}\right)\left[s_y^2 + r^2 s_x^2 - 2rs_{xy}\right] \]
= 6219.281.
Using Table 2 from the Appendix the 95% confidence interval estimate of the
average weight of elephants using the ratio estimator is given by
\[ \bar{y}_R \pm t_{0.025}(df = 40 - 1)\sqrt{\widehat{MSE}(\bar{y}_R)}, \quad \text{or} \quad 2941.38 \pm 2.023\sqrt{6219.281} \]
or [2781.841, 3100.918].
( 0 ) Comment on the CI estimates: Although we are losing an extra two degrees
of freedom in stratified sampling, still the length of the confidence interval estimate
obtained through stratified sampling is smaller than that of a ratio estimator at the
same level of confidence. Thus we conclude that stratified sampling performs better
than the ratio estimator in this particular situation.
$\sum_{h=1}^{L} N_h = N$. Assume that from the hth population stratum consisting of $N_h$ units, a sample
of size $n_h$ is drawn using SRSWOR sampling, such that $\sum_{h=1}^{L} n_h = n$. Let the ith
sample unit of the study variable and auxiliary variable in the hth stratum be denoted
by $y_{hi}$ and $x_{hi}$, respectively. Let $\bar{y}_h = n_h^{-1}\sum_{i=1}^{n_h} y_{hi}$ and $\bar{x}_h = n_h^{-1}\sum_{i=1}^{n_h} x_{hi}$ denote the hth
stratum sample means for the study and auxiliary variable. Then for the hth stratum
let us define
\[ \varepsilon_{h0} = \frac{\bar{y}_h}{\bar{Y}_h} - 1 \quad \text{and} \quad \varepsilon_{h1} = \frac{\bar{x}_h}{\bar{X}_h} - 1 \]
such that
\[ E(\varepsilon_{h0}) = E(\varepsilon_{h1}) = 0 \]
and define the separate ratio estimator as
\[ \bar{y}_{sr} = \sum_{h=1}^{L} W_h \bar{y}_h\left(\frac{\bar{X}_h}{\bar{x}_h}\right) \qquad (8.3.1.1) \]
where $W_h = N_h/N$.
Then we have the following theorems:
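Theorem 8.3.1.1. The bias in the estimator $\bar{y}_{sr}$, to the first order of approximation, is
\[ B(\bar{y}_{sr}) = \sum_{h=1}^{L} W_h \bar{Y}_h\left(\frac{1-f_h}{n_h}\right)\left(C_{hx}^2 - \rho_{hxy}C_{hx}C_{hy}\right). \qquad (8.3.1.2) \]
Proof. To the first order of approximation we have
\[ E(\bar{y}_{sr}) = \sum_{h=1}^{L} W_h \bar{Y}_h\left[1 + \left(\frac{1-f_h}{n_h}\right)\left(C_{hx}^2 - \rho_{hxy}C_{hx}C_{hy}\right)\right]. \qquad (8.3.1.4) \]
Taking the deviation of (8.3.1.4) from the population mean $\bar{Y} = \sum_{h=1}^{L} W_h \bar{Y}_h$ we have
(8.3.1.2). Hence the theorem.
Theorem 8.3.1.2. The variance of the separate ratio estimator $\bar{y}_{sr}$, to the first order
of approximation, is
\[ V(\bar{y}_{sr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\bar{Y}_h^2\left[C_{hy}^2 + C_{hx}^2 - 2\rho_{hxy}C_{hx}C_{hy}\right]. \qquad (8.3.1.5) \]
Proof. Noting that the strata are independent, we have
\[ V(\bar{y}_{sr}) = V\left[\sum_{h=1}^{L} W_h \bar{y}_h\left(\frac{\bar{X}_h}{\bar{x}_h}\right)\right] = \sum_{h=1}^{L} W_h^2 V\left[\bar{y}_h\left(\frac{\bar{X}_h}{\bar{x}_h}\right)\right]. \qquad (8.3.1.6) \]
Now we have
\[ V\left[\bar{y}_h\left(\frac{\bar{X}_h}{\bar{x}_h}\right)\right] = \bar{Y}_h^2\left(\frac{1-f_h}{n_h}\right)\left(C_{hy}^2 + C_{hx}^2 - 2\rho_{hxy}C_{hx}C_{hy}\right). \qquad (8.3.1.7) \]
On substituting (8.3.1.7) in (8.3.1.6) we have the theorem.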
Theorem 8.3.1.3. An estimator of the variance of the separate ratio
estimator, $\bar{y}_{sr}$, is given by
\[ \hat{v}_1(\bar{y}_{sr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left[s_{hy}^2 + r_h^2 s_{hx}^2 - 2r_h s_{hxy}\right] \qquad (8.3.1.8) \]
where
$s_{hxy} = (n_h - 1)^{-1}\sum_{i=1}^{n_h}(y_{hi} - \bar{y}_h)(x_{hi} - \bar{x}_h)$, and $r_h = \bar{y}_h/\bar{x}_h$ are the estimators of the respective
population parameters in the hth stratum.
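A minimal Python sketch of the separate ratio estimator (8.3.1.1) together with the variance estimator (8.3.1.8). The function names, dictionary keys and data are hypothetical, and the stratum population means of the auxiliary variable are assumed known.

```python
def sample_cov(a, b):
    """Sample covariance with divisor (n - 1); sample_cov(a, a) is the sample variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def separate_ratio_estimate(strata):
    """Return (ybar_sr, v1) for a list of strata.

    Each stratum is a dict with keys 'N_h', 'X_bar' (known stratum mean of x),
    and the sample lists 'y' and 'x'.
    """
    N = sum(s["N_h"] for s in strata)
    estimate, variance = 0.0, 0.0
    for s in strata:
        y, x = s["y"], s["x"]
        n_h = len(y)
        W_h = s["N_h"] / N
        f_h = n_h / s["N_h"]
        ybar, xbar = sum(y) / n_h, sum(x) / n_h
        r_h = ybar / xbar
        estimate += W_h * ybar * s["X_bar"] / xbar
        variance += W_h ** 2 * (1 - f_h) / n_h * (
            sample_cov(y, y) + r_h ** 2 * sample_cov(x, x)
            - 2 * r_h * sample_cov(x, y))
    return estimate, variance


# Hypothetical data in which y is exactly twice x in both strata.
strata = [
    {"N_h": 10, "X_bar": 5.0, "x": [4, 6, 8], "y": [8, 12, 16]},
    {"N_h": 30, "X_bar": 10.0, "x": [9, 12], "y": [18, 24]},
]
print(separate_ratio_estimate(strata))
```

When y is exactly proportional to x within every stratum, the estimator recovers the population mean and the estimated variance vanishes, which is the situation in which ratio estimation is most effective.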
Another estimator of the variance of the separate ratio estimator $\bar{y}_{sr}$ is
The third improved estimator of the variance of the separate ratio estimator $\bar{y}_{sr}$,
owed to Wu (1985), is
\[ \hat{v}_3(\bar{y}_{sr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left(\frac{\bar{X}_h}{\bar{x}_h}\right)^{g_h}\frac{1}{(n_h - 1)}\sum_{i=1}^{n_h} e_{hi}^2 \qquad (8.3.1.10) \]
where $g_h$ denotes a suitably chosen constant in the hth stratum such that the
variance of the estimator $\hat{v}_3(\bar{y}_{sr})$ is a minimum. It is shown by Wu (1985) that the
optimum value of $g_h$ leads to an efficient estimator.
Example 8.3.1. Select a sample of 40 countries from population 5 using the method
of proportional allocation. Record the yield/hectare and area under the tobacco crop
from the countries selected in the sample. Apply the ratio cum product estimator to
estimate the average yield/hectare of the tobacco crop in the world. Assuming that
the total area in each continent under the tobacco crop is known, estimate the
variance of the estimator used. Construct a 95% confidence interval.
Solution. By the method of proportional allocation, the number of units to be
selected from the hth stratum is given by $n_h = nN_h/N$. Thus we have
Stratum  N_h  n_h  f_h    W_h
1        6    3    0.500  0.0566
2        6    3    0.500  0.0566
3        8    3    0.375  0.0755
4        10   3    0.300  0.0943
5        12   4    0.333  0.1132
6        4    2    0.500  0.0377
7        30   11   0.367  0.2830
8        17   6    0.353  0.1604
9        10   3    0.300  0.0943
10       3    2    0.667  0.0283
Using the Pseudo-Random Number (PRN) Table 1 given in the Appendix we have
the following sample information and some results.
where
if Shxy> 0,
if Shxy< 0,
and
\[ v_h = \begin{cases} W_h^2\left(\dfrac{1-f_h}{n_h}\right)\left[s_{hy}^2 + r_h^2 s_{hx}^2 - 2r_h s_{hxy}\right] & \text{if } s_{hxy} > 0, \\[2mm] W_h^2\left(\dfrac{1-f_h}{n_h}\right)\left[s_{hy}^2 + r_h^2 s_{hx}^2 + 2r_h s_{hxy}\right] & \text{if } s_{hxy} < 0. \end{cases} \]
We have to use ratio and product estimators in different strata because the sign of the
correlation between yield/hectare and total number of hectares is uncertain. An
estimate of the average yield/hectare of the tobacco crop in the world is given by a
new separate ratio cum product estimator defined as
\[ \bar{y}_{srp} = \sum_{h=1}^{L} \bar{y}_h = 2.1687. \]
or $2.1687 \mp 2.052\sqrt{0.083673}$, or [1.5751, 2.7623].
Note that in three strata the estimates of correlation were negative, and in the
remaining seven strata the estimates of correlation were positive. Therefore, for the
strata where the ratio estimator was used the loss of degrees of freedom was one, and
where the product estimator was used the loss of degrees of freedom was taken as
two.
(8.3.2.1)
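Theorem 8.3.2.1. The bias in the separate regression estimator $\bar{y}_{slr}$, to the first
order of approximation, is
where
\[ \lambda_{hrs} = \frac{\mu_{hrs}}{\mu_{h20}^{r/2}\mu_{h02}^{s/2}} \quad \text{and} \quad \mu_{hrs} = (N_h - 1)^{-1}\sum_{i=1}^{N_h}\left(Y_{hi} - \bar{Y}_h\right)^r\left(X_{hi} - \bar{X}_h\right)^s \quad \text{for } h = 1, 2, ..., L. \]
Proof. Follows from the bias expression of the regression estimator in Chapter 3.
Theorem 8.3.2.2. The variance of the separate regression estimator $\bar{y}_{slr}$, to the first
order of approximation, is given by
\[ V(\bar{y}_{slr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2\left(1 - \rho_{hxy}^2\right). \qquad (8.3.2.3) \]
Proof. Noting that the strata are independent, we have
\[ V(\bar{y}_{slr}) = V\left[\sum_{h=1}^{L} W_h\left\{\bar{y}_h + \hat{\beta}_{hxy}\left(\bar{X}_h - \bar{x}_h\right)\right\}\right] = \sum_{h=1}^{L} W_h^2 V\left\{\bar{y}_h + \hat{\beta}_{hxy}\left(\bar{X}_h - \bar{x}_h\right)\right\}. \qquad (8.3.2.4) \]
Applying the concept of the usual linear regression estimator in each stratum, we
have
\[ V(\bar{y}_{slr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left[S_{hy}^2 + \beta_{hxy}^2 S_{hx}^2 - 2\beta_{hxy}S_{hxy}\right] = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2\left(1 - \rho_{hxy}^2\right). \]
Hence the theorem.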
\[ \hat{v}_1(\bar{y}_{slr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left[s_{hy}^2 + \hat{\beta}_{hxy}^2 s_{hx}^2 - 2\hat{\beta}_{hxy}s_{hxy}\right] \qquad (8.3.2.5) \]
where $\hat{\beta}_{hxy} = s_{hxy}/s_{hx}^2$ denotes the estimator of the regression coefficient in the hth
stratum.
Another estimator of the variance of the separate regression estimator $\bar{y}_{slr}$ is
\[ \hat{v}_2(\bar{y}_{slr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\frac{1}{(n_h - 1)}\sum_{i=1}^{n_h} e_{hi}^2 \qquad (8.3.2.6) \]
where $e_{hi} = (y_{hi} - \bar{y}_h) - \hat{\beta}_{hxy}(x_{hi} - \bar{x}_h)$ denotes the ith residual term in the hth stratum.
The third improved estimator of the variance of the separate regression estimator
$\bar{y}_{slr}$, proposed by Wu (1985), is given by
\[ \hat{v}_3(\bar{y}_{slr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left(\frac{\bar{X}_h}{\bar{x}_h}\right)^{g_h}\frac{1}{(n_h - 1)}\sum_{i=1}^{n_h} e_{hi}^2 \qquad (8.3.2.7) \]
where $g_h$ denotes a suitably chosen constant in the hth stratum such that the
variance of the estimator $\hat{v}_3(\bar{y}_{slr})$ is a minimum. It is shown by Wu (1985) that the
optimum value of $g_h$ leads to an efficient estimator of variance.
Example 8.3.2. Select a sample of 40 countries from population 5 using the method
of proportional allocation. Record the yield/hectare and area under the tobacco crop
from the countries selected in the sample. Apply the regression estimator to
estimate the average yield/hectare of the tobacco crop in the world. Assuming that
the total area in each continent under the tobacco crop is known, estimate the
variance of the estimator used for estimation purposes. Construct a 95% confidence
interval.
Solution. Using information from the previous Example 8.3.1, for the case of a
separate regression estimator we derive the following table.
where
Note that we are using a separate regression estimator in each stratum, so we are
losing two degrees of freedom in each stratum.
Sometimes it is not possible to know the population means $\bar{X}_h$, h = 1,2,...,L, of the
auxiliary variable in each stratum, but the combined population mean,
$\bar{X} = \sum_{h=1}^{L} W_h \bar{X}_h$, is known. In such situations it is not possible to use separate ratio and
regression type estimators, but we can use a combined ratio or combined regression
estimator.
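For deriving the expressions of bias and variance of the combined ratio or
regression estimator in stratified sampling, let us define
$$\delta_0 = \frac{\bar{y}_{st}}{\bar{Y}} - 1 \quad\text{and}\quad \delta_1 = \frac{\bar{x}_{st}}{\bar{X}} - 1$$
where $\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$ and $\bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h$ are the unbiased estimators of the population
means $\bar{Y}$ and $\bar{X}$, respectively.
Obviously we have
$$E(\delta_0) = E(\delta_1) = 0.$$
Assuming that the strata are independent, we have
$$E(\delta_0^2) = \frac{1}{\bar{Y}^2}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2, \quad E(\delta_1^2) = \frac{1}{\bar{X}^2}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hx}^2, \quad E(\delta_0\delta_1) = \frac{1}{\bar{X}\bar{Y}}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hxy}.$$
The combined ratio estimator of the population mean $\bar{Y}$ is
$$\bar{y}_{cr} = \bar{y}_{st}\left(\frac{\bar{X}}{\bar{x}_{st}}\right). \qquad (8.3.3.1)$$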
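Theorem 8.3.3.1. The bias in the combined ratio estimator $\bar{y}_{cr}$, to the first order of
approximation, is
$$B(\bar{y}_{cr}) = \frac{1}{\bar{X}}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left\{R\,S_{hx}^2 - S_{hxy}\right\} \qquad (8.3.3.2)$$
where $R = \bar{Y}/\bar{X}$.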
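Proof. The estimator $\bar{y}_{cr}$ in terms of $\delta_0$ and $\delta_1$ can be approximately written as
$$\bar{y}_{cr} = \bar{Y}(1+\delta_0)(1+\delta_1)^{-1} \approx \bar{Y}\left[1 + \delta_0 - \delta_1 + \delta_1^2 - \delta_0\delta_1\right]. \qquad (8.3.3.3)$$
Then we have
$$B(\bar{y}_{cr}) = E(\bar{y}_{cr}) - \bar{Y} = \bar{Y}\left[\frac{1}{\bar{X}^2}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hx}^2 - \frac{1}{\bar{X}\bar{Y}}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hxy}\right]$$
which on simplification reduces to (8.3.3.2). Hence the theorem.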
Theorem 8.3.3.2. The variance of the combined ratio estimator $\bar{y}_{cr}$, to the first
order of approximation, is
$$V(\bar{y}_{cr}) = \bar{Y}^2\left[\frac{1}{\bar{Y}^2}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2 + \frac{1}{\bar{X}^2}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hx}^2 - \frac{2}{\bar{X}\bar{Y}}\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hxy}\right]$$
where $r = \bar{y}_{st}/\bar{x}_{st}$ denotes the estimator of the population ratio $R = \bar{Y}/\bar{X}$ across the L
strata. Another estimator of the variance of the combined ratio estimator $\bar{y}_{cr}$ is
where g denotes a suitably chosen constant such that the variance of the
estimator $v_3(\bar{y}_{cr})$ is minimum. It has been shown by Wu (1985) that the optimum
value of g leads to an efficient estimator of variance.
Example 8.3.3. Select a sample of 40 countries from population 5 using the method
of proportional allocation. Record the production and area of the tobacco crop from
the countries selected in the sample. Apply the combined ratio estimator to estimate
the average production of the tobacco crop in the world. Assuming that the total
area in the world under the tobacco crop is known, estimate the variance of the
estimator used. Construct a 95% confidence interval.
Given: Total area is 3,650,492.66 hectares.
Solution. By the method of proportional allocation, the number of units to be
selected from the hth stratum is given by $n_h = n N_h / N$. Thus we have
Stratum No.    N_h    n_h    f_h      W_h
1              6      3      0.500    0.0566
2              6      3      0.500    0.0566
3              8      3      0.375    0.0755
4              10     3      0.300    0.0943
5              12     4      0.333    0.1132
6              4      2      0.500    0.0377
7              30     11     0.367    0.2830
8              17     6      0.353    0.1604
9              10     3      0.300    0.0943
10             3      2      0.667    0.0283
Sum            106    40              1.0000
Using the Pseudo-Random Number (PRN) Table 1 from the Appendix we have the
following sample information and some results.
We are given the average area under the tobacco crop in the world, $\bar{X} = 34438.61$.
Using Table 2 from the Appendix the 95% confidence interval of the average
production of the tobacco crop in the world is given by
$$\bar{y}_{cr} \pm t_{\alpha/2}(df = n-1)\sqrt{v(\bar{y}_{cr})}, \quad\text{or}\quad 54507.96 \pm 2.023\sqrt{50823164.5}$$
or
[40085.92, 68930.00].
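The interval arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `t_confidence_interval` is hypothetical, and the critical value 2.023 is taken as given from the text rather than computed.

```python
import math

def t_confidence_interval(estimate, variance_estimate, t_value):
    """Return (lower, upper) for estimate +/- t_value * sqrt(variance_estimate)."""
    half_width = t_value * math.sqrt(variance_estimate)
    return estimate - half_width, estimate + half_width

# Values quoted in Example 8.3.3 (combined ratio estimator, df = n - 1 = 39).
lower_cr, upper_cr = t_confidence_interval(54507.96, 50823164.5, 2.023)
print(round(lower_cr, 2), round(upper_cr, 2))
```

The printed limits agree with the interval quoted above up to rounding in the last decimal place.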
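defined as
$$\bar{y}_{clr} = \bar{y}_{st} + \hat{\beta}(\bar{X} - \bar{x}_{st}) \qquad (8.3.4.1)$$
where
$$\rho_{xy} = \frac{\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hxy}}{\sqrt{\left[\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hx}^2\right]\left[\sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2\right]}}$$
denotes the correlation coefficient in stratified sampling across all strata.
Proof. We have
$$V(\bar{y}_{clr}) = V\left[\bar{y}_{st} + \beta(\bar{X} - \bar{x}_{st})\right] \approx V(\bar{y}_{st}) + \beta^2 V(\bar{x}_{st}) - 2\beta\,Cov(\bar{y}_{st}, \bar{x}_{st}) \qquad (8.3.4.3)$$
where $\beta = Cov(\bar{y}_{st}, \bar{x}_{st})/V(\bar{x}_{st})$. Hence the theorem.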
Theorem 8.3.4.2. An estimator of the variance of the combined regression
estimator $\bar{y}_{clr}$ is
where $e_{hi} = (y_{hi} - \bar{y}_h) - \hat{\beta}(x_{hi} - \bar{x}_h)$ denotes the ith residual term in the hth stratum, but
using $\hat{\beta}$ obtained from all strata.
The third estimator of the variance of the combined regression estimator $\bar{y}_{clr}$, due
to Wu (1985), is
$$v_3(\bar{y}_{clr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\frac{1}{(n_h-1)}\sum_{i=1}^{n_h} e_{hi}^2\left(\frac{\bar{X}}{\bar{x}_{st}}\right)^{g} \qquad (8.3.4.6)$$
where g denotes a suitably chosen constant such that the variance of the estimator
$v_3(\bar{y}_{clr})$ is minimum. It has been shown by Wu (1985) that the optimum value of g
leads to an efficient estimator of variance.
Example 8.3.4. Select a sample of 40 countries from population 5 using the method
of proportional allocation. Record the production and area of the tobacco crop from
the countries selected in the sample. Apply the combined regression estimator to
estimate the average production of the tobacco crop in the world. Assuming that the
total area in the whole world under the tobacco crop is known, estimate the variance
of the estimator used. Construct a 95% confidence interval.
Given: Total area is 3,650,492.66 hectares.
Solution. Continuing information from the example 8.3.3 we have
We are given the average area under the tobacco crop in the world, X= 34438.61.
Thus an estimate of the average production of the tobacco crop based on the combined
regression estimator is given by
$$\bar{y}_{clr} = \bar{y}_{st} + \hat{\beta}(\bar{X} - \bar{x}_{st}) = 94666.01 + 1.7639(34438.61 - 59810.82) = 49911.96$$
and an estimate of its variance is given by
$$v(\bar{y}_{clr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)\left[s_{hy}^2 + \hat{\beta}^2 s_{hx}^2 - 2\hat{\beta} s_{hxy}\right] = 19078507.03.$$
The $(1-\alpha)100\%$ confidence interval of the average production of the tobacco crop in
the world is given by
$$\bar{y}_{clr} \pm t_{\alpha/2}(df = n-2)\sqrt{v(\bar{y}_{clr})}.$$
Using Table 2 from the Appendix the required 95% confidence interval estimate is
$49911.96 \pm 2.024\sqrt{19078507.03}$, or [41071.34, 58752.57].
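The point estimate and interval of this example can be reproduced from the summary figures quoted in the text. The sketch below is illustrative only; the function name `combined_regression_estimate` is hypothetical, and the slope 1.7639, the variance estimate and the critical value 2.024 are taken as given rather than recomputed from the sample.

```python
import math

def combined_regression_estimate(y_st, x_st, X_bar, beta_hat):
    """Combined regression estimator: y_st + beta_hat * (X_bar - x_st)."""
    return y_st + beta_hat * (X_bar - x_st)

# Summary values quoted in Example 8.3.4.
y_clr = combined_regression_estimate(94666.01, 59810.82, 34438.61, 1.7639)
half_width = 2.024 * math.sqrt(19078507.03)   # t-value for df = n - 2 = 38
lower_clr, upper_clr = y_clr - half_width, y_clr + half_width
print(round(y_clr, 2), round(lower_clr, 2), round(upper_clr, 2))
```

Small differences in the second decimal place relative to the printed interval come from rounding of the inputs.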
Stratum   1   2   3   4    5    6   7    8    9    10   Total
N_h       6   6   8   10   12   4   30   17   10   3    106
n_h       3   3   3   3    4    2   11   6    3    2    40
( a ) Using full information from the description of the population given in the
Appendix, find the relative efficiency of the separate ratio estimator with respect to
the combined ratio estimator while estimating average production using known area
under the crop.
( b ) Using full information from the description of the population given in the
Appendix, find the relative efficiency of the separate regression estimator with respect
to the combined regression estimator while estimating average production using known
area under the crop.
Solution. From the description of population 5 given in the Appendix we have
h    f_h    W_h     Ybar_h      Xbar_h      R_h     S_hy^2         S_hx^2         S_hxy
1    0.50   0.057   6515.83     3194.50     2.040   51856391.4     10899652.8     23619714.4
2    0.50   0.057   13545.66    14660.00    0.924   390206386.6    584984730.0    420480877.0
3    0.38   0.075   43629.38    18309.37    2.383   3172330153.0   635958094.9    1387271318.0
4    0.30   0.094   21928.10    14923.50    1.469   569396673.9    209817189.2    319262930.2
5    0.33   0.113   11788.00    5987.83     1.969   153854582.5    27842810.4     62846173.1
6    0.50   0.038   4653.00     3450.00     1.349   7232769.3      5876666.6      6066866.6
7    0.37   0.283   16862.27    11682.73    1.443   2049296094.0   760238523.4    1190767859.0
8    0.35   0.160   227371.50   145162.30   1.566   3.72E+11       1.24E+11       2.14E+11
9    0.30   0.094   32854.10    33976.10    0.967   6802998890.0   8340765245.0   7529100326.0
10   0.67   0.028   3548.33     13333.33    0.266   22819758.3     2963333.3      8223083.3
( a ) Separate ratio and combined ratio estimators: From the above table we
have
$$R = \sum_{h=1}^{L} W_h \bar{Y}_h \Big/ \sum_{h=1}^{L} W_h \bar{X}_h = 52444.56/34438.61 = 1.522838.$$
Let
$$V_h = W_h^2\left(\frac{1-f_h}{n_h}\right)\left[S_{hy}^2 + R_h^2 S_{hx}^2 - 2R_h S_{hxy}\right]$$
and
$$V_h(c) = W_h^2\left(\frac{1-f_h}{n_h}\right)\left[S_{hy}^2 + R^2 S_{hx}^2 - 2R S_{hxy}\right],$$
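$$V(\bar{y}_{clr}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2\left(1-\rho_{xy}^2\right) = 1060957560 \times (1 - 0.9903^2) = 20482751.17.$$
Let
$$V(\bar{y}_{slr}) = \sum_{h=1}^{L} V_h(l) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right)S_{hy}^2\left(1-\rho_{hxy}^2\right) = 8710127.62.$$
Thus the relative efficiency of the separate linear regression estimator with respect to
the combined linear regression estimator is given by
$$RE = \frac{V(\bar{y}_{clr})}{V(\bar{y}_{slr})} \times 100\% = \frac{20482751.17}{8710127.618} \times 100\% = 235.16\%.$$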
Thus the separate ratio and regression estimators remain better than the combined ratio and
regression estimators in this situation. If the population means are known for each
stratum, then one should use the separate ratio or regression estimator. If only the
overall population mean of the auxiliary variable is known, then we have no
choice and must use the combined ratio or regression estimators.
When we stratify the given population into L homogeneous strata or groups each
with $N_h$ units, then for each stratum we estimate the population mean $\bar{Y}_h$, h = 1,2,...,L,
as shown below:
Stratum No.               1           2           3           ...   h           ...   L
Stratum population mean   $\bar{Y}_1$   $\bar{Y}_2$   $\bar{Y}_3$   ...   $\bar{Y}_h$   ...   $\bar{Y}_L$
Then we select $n_h$, h = 1,2,...,L, units from the hth stratum using SRSWOR sampling
(or any other sampling scheme). There are five cases:
Case I. When no model or auxiliary information is available, then we estimate the
population mean in each stratum independently and every time we lose one degree
of freedom. Thus when we consider the simple pooled estimator of the population
mean $\bar{Y}$ as
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h \qquad (8.3.5.1)$$
then the degrees of freedom are df = (n - L).
Case II. When we consider the separate ratio estimator
$$\bar{y}_{sr} = \sum_{h=1}^{L} W_h \bar{y}_h\left(\frac{\bar{X}_h}{\bar{x}_h}\right) \qquad (8.3.5.2)$$
then on setting $\sum_{i=1}^{N_h} e_{hi} = 0$ we are estimating only one ratio $R_h = \bar{Y}_h/\bar{X}_h$ in the hth
stratum and we are losing one degree of freedom in each stratum. Thus for the
separate ratio estimator, the degrees of freedom should again be taken as df = (n - L),
because here we assume that the regression line in each stratum passes through the
origin, as shown in Figure 8.3.1.
Fig. 8.3.1 Separate ratio estimator (panels: Stratum 1, Stratum 2, ..., Stratum L; origin O).
minimizing $\sum_{i=1}^{N_h} e_{hi}^2$ in the hth stratum, and hence we are losing two degrees of
freedom in each stratum. Thus for the separate regression estimator the degrees of
freedom should be taken as df = (n - 2L), because here we assume that the
regression line in each stratum does not pass through the origin and in each stratum
we have two parameters, viz., intercept and slope, as shown in Figure 8.3.2.
Fig. 8.3.2 Separate regression estimator.
ratio $R = \bar{Y}/\bar{X}$ across all the strata in the population, and we are losing only one
degree of freedom across all strata. Thus for the combined ratio estimator, the
degrees of freedom should be taken as df = (n - 1). In practice one can think of
arranging all the strata in ascending order on the basis of auxiliary information,
so that the regression line passes across all the strata as shown in Figure 8.3.3, and
we have to estimate only one parameter across all strata.
Fig. 8.3.3 Combined ratio estimator.
degrees of freedom across all strata. Thus for the combined regression estimator,
the degrees of freedom should be taken as df = (n - 2). In practice one can think of
arranging all the strata in ascending order on the basis of auxiliary
information, so that the regression line passes across all the strata as shown in
Figure 8.3.4; then we have to estimate only two parameters across all strata.
Fig. 8.3.4 Combined regression estimator.
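The degrees-of-freedom rules for the five cases can be summarized in a small lookup. The sketch below is only an illustration of the rules stated above; the function name `degrees_of_freedom` and the string labels for the estimators are hypothetical.

```python
def degrees_of_freedom(estimator, n, L):
    """Degrees of freedom for a t-based interval: n minus the parameters estimated."""
    parameters_lost = {
        "pooled": L,                   # Case I: one mean per stratum
        "separate_ratio": L,           # Case II: one ratio per stratum
        "separate_regression": 2 * L,  # Case III: intercept and slope per stratum
        "combined_ratio": 1,           # Case IV: one ratio across all strata
        "combined_regression": 2,      # Case V: intercept and slope across all strata
    }
    return n - parameters_lost[estimator]

# With n = 40 sampled units in L = 10 strata, as in the examples of this section.
for name in ("pooled", "separate_ratio", "separate_regression",
             "combined_ratio", "combined_regression"):
    print(name, degrees_of_freedom(name, n=40, L=10))
```

For n = 40 and L = 10 this gives 39 and 38 degrees of freedom for the combined ratio and combined regression estimators, matching the df = n - 1 and df = n - 2 used in Examples 8.3.3 and 8.3.4.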
Singh, Horn, and Yu (1998) extended the higher level calibration approach for
stratified sampling design as follows. Assume the population consists of L strata
with $N_h$ units in the hth stratum from which a simple random sample of size $n_h$ is
taken without replacement such that the total population size is $N = \sum_{h=1}^{L} N_h$ and the sample
size is $n = \sum_{h=1}^{L} n_h$. Associated with the ith unit of the hth stratum there are two values
$y_{hi}$ and $x_{hi}$, with $x_{hi} > 0$ being the auxiliary variable. For the hth stratum, let
$W_h = N_h/N$ be the stratum weights, $f_h = n_h/N_h$ the sampling fraction, and $\bar{y}_h$, $\bar{x}_h$;
$\bar{Y}_h$, $\bar{X}_h$ the sample and population means, respectively. Assume that $\bar{X} = \sum_{h=1}^{L} W_h \bar{X}_h$
is known. The purpose is to estimate the population mean $\bar{Y} = \sum_{h=1}^{L} W_h \bar{Y}_h$ by using the
with new weights $W_h^{*}$ which are chosen such that the chi square (CS) type of
distance
$$\sum_{h=1}^{L} (W_h^{*} - W_h)^2 (W_h q_h)^{-1} \qquad (8.4.3)$$
is minimized subject to the calibration equation
$$\sum_{h=1}^{L} W_h^{*}\bar{x}_h = \bar{X}. \qquad (8.4.4)$$
Note that $q_h$ in (8.4.3) is a suitably chosen weight which determines the form of
the estimator. Minimization of (8.4.3), subject to the calibration equation (8.4.4),
leads to the combined generalized regression estimator (GREG), with calibrated weights given by
$$W_h^{*} = W_h + \left(W_h q_h \bar{x}_h \Big/ \sum_{h=1}^{L} W_h q_h \bar{x}_h^2\right)\left(\bar{X} - \sum_{h=1}^{L} W_h \bar{x}_h\right). \qquad (8.4.6)$$
If $q_h = 1/\bar{x}_h$ then the estimator (8.4.5) reduces to the well known combined ratio
estimator defined earlier in (8.3.3.1). Note that there is no choice of $q_h$ such that
the estimator (8.4.5) reduces to the well known combined linear regression estimator in
stratified sampling as discussed in Section (8.3.4).
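The reduction to the combined ratio estimator for $q_h = 1/\bar{x}_h$ can be checked numerically. The sketch below assumes the usual single-constraint chi-square calibration solution, $W_h^{*} = W_h + (W_h q_h \bar{x}_h / \sum W_h q_h \bar{x}_h^2)(\bar{X} - \sum W_h \bar{x}_h)$; the function name `calibrated_weights` and the three-stratum numbers are hypothetical and serve only as an illustration.

```python
def calibrated_weights(W, x_bar, X_bar, q):
    """Chi-square calibrated stratum weights under the single constraint sum(W* x_bar) = X_bar."""
    denom = sum(w * qh * x * x for w, qh, x in zip(W, q, x_bar))
    gap = X_bar - sum(w * x for w, x in zip(W, x_bar))
    return [w + (w * qh * x / denom) * gap for w, qh, x in zip(W, q, x_bar)]

# Hypothetical stratum weights, sample means, and known population mean of x.
W = [0.2, 0.5, 0.3]
x_bar = [10.0, 20.0, 40.0]
y_bar = [25.0, 45.0, 90.0]
X_bar = 22.0

# Choice q_h = 1 / x_bar_h.
W_star = calibrated_weights(W, x_bar, X_bar, [1.0 / x for x in x_bar])
y_calibrated = sum(ws * y for ws, y in zip(W_star, y_bar))

# Combined ratio estimator computed directly: y_st * X_bar / x_st.
y_st = sum(w * y for w, y in zip(W, y_bar))
x_st = sum(w * x for w, x in zip(W, x_bar))
y_combined_ratio = y_st * X_bar / x_st
print(y_calibrated, y_combined_ratio)
```

With this choice of $q_h$ the weights become $W_h \bar{X}/\bar{x}_{st}$, so the two printed values coincide up to floating-point error.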
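An estimator of variance of the combined generalized regression estimator (GREG) is
given by
$$v(\bar{y}_{st}) = \sum_{h=1}^{L} \frac{W_h^2(1-f_h)}{n_h}\, s_{eh}^2 \qquad (8.4.7)$$
where $s_{eh}^2 = (n_h - 1)^{-1}\sum_{i=1}^{n_h} e_{hi}^2$ is the hth stratum sample variance and
$$v_{st}(\bar{y}_{ratio}) = \sum_{h=1}^{L} W_h^2\left(\frac{1-f_h}{n_h}\right) s_{eh}^2 \left(\frac{\bar{X}}{\bar{x}_{st}}\right)^2\left(\frac{V(\bar{x}_{st})}{\hat{v}(\bar{x}_{st})}\right) \qquad (8.4.15)$$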
which is again a ratio type estimator proposed by Wu (1985) for estimating the
variance of the combined ratio estimator. Note that it makes use of the extra
knowledge of the known variance of the auxiliary variable at the estimation stage.
Several more new estimators can be constructed for new choices of weights $q_h$
and $Q_h$. Defining $u = \bar{X}/\sum_{i \in s} d_i x_i$ and $v = V(\bar{x}_{st})/\hat{v}(\bar{x}_{st})$, Singh, Horn, and Yu (1998)
where $H(u, v)$ is a parametric function of u and v such that $H(1, 1) = 1$, satisfying
certain regularity conditions. Then all estimators obtained from the functions
$H(u,v) = u^{\alpha}v^{\beta}$, $H(u,v) = \{1+\alpha(u-1)\}/\{1+\beta(v-1)\}$, $H(u,v) = 1+\alpha(u-1)+\beta(v-1)$ and
$H(u,v) = \{1+\alpha(u-1)+\beta(v-1)\}^{-1}$ are special cases of the higher level calibration
approach, where $\alpha$ and $\beta$ are unknown parameters involved in the function
$H(u, v)$. Replacement of these parameters with their respective consistent
estimators in the class of estimators at (8.4.16) yields estimators which possess the
same asymptotic variance as shown by Srivastava and Jhajj (1983a), Singh and
Singh (1984a), and Mahajan and Singh (1996).
Example 8.4.1. Select a sample of 40 countries from population 5 using the method
of proportional allocation. Record the production and area of the tobacco crop from
the countries selected in the sample. Assume that the area under the tobacco crop in
different continents is known. Obtain the calibration weights which form the
combined ratio and combined GREG type estimators. Obtain the values of the
estimates.
Solution. Using the proportional allocation, we have the same values of $n_h$ as in
the previous example. Further from the sample information, we have
If $q_h = 1/\bar{x}_h$ then the calibration weights which lead to the combined ratio estimator
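Singh (2003c) considered a new estimator of the population mean $\bar{Y}$ in
stratified sampling as
$$\bar{y}_{st}^{\circledast} = \sum_{h=1}^{L} W_h^{\circledast}\bar{y}_h \qquad (8.4.1.1)$$
where $W_h^{\circledast}$ are the calibrated weights such that the chi square distance function
$$D^{\circledast} = \frac{1}{2}\sum_{h=1}^{L}\frac{(W_h^{\circledast} - W_h)^2}{W_h Q_h} \qquad (8.4.1.2)$$
is minimum subject to two constraints defined as
$$\sum_{h=1}^{L} W_h^{\circledast} = \sum_{h=1}^{L} W_h \qquad (8.4.1.3)$$
and
$$\sum_{h=1}^{L} W_h^{\circledast}\bar{x}_h = \bar{X} \qquad (8.4.1.4)$$
where $Q_h$ are some suitably chosen weights. Minimization of (8.4.1.2) subject
to (8.4.1.3) and (8.4.1.4) leads to new calibrated weights given by
$$W_h^{\circledast} = W_h + \frac{(W_h Q_h \bar{x}_h)\left(\sum_{h=1}^{L} W_h Q_h\right) - (W_h Q_h)\left(\sum_{h=1}^{L} W_h Q_h \bar{x}_h\right)}{\left(\sum_{h=1}^{L} W_h Q_h\right)\left(\sum_{h=1}^{L} W_h Q_h \bar{x}_h^2\right) - \left(\sum_{h=1}^{L} W_h Q_h \bar{x}_h\right)^2}\left(\bar{X} - \sum_{h=1}^{L} W_h \bar{x}_h\right) \qquad (8.4.1.5)$$
where
Note that more work related to this topic will be discussed in the next volume.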
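The two-constraint calibrated weights can be computed directly from the Lagrangian solution of (8.4.1.2) under (8.4.1.3) and (8.4.1.4). The following sketch is illustrative only: the function name `two_constraint_weights` and the numerical inputs are hypothetical, and the formula coded is the closed-form solution obtained by solving the two constraint equations for the Lagrange multipliers.

```python
def two_constraint_weights(W, x_bar, X_bar, Q):
    """Calibrated weights minimizing the chi-square distance subject to
    sum(W*) = sum(W) and sum(W* x_bar) = X_bar."""
    s0 = sum(w * q for w, q in zip(W, Q))
    s1 = sum(w * q * x for w, q, x in zip(W, Q, x_bar))
    s2 = sum(w * q * x * x for w, q, x in zip(W, Q, x_bar))
    det = s0 * s2 - s1 * s1          # zero only if all x_bar values are equal
    gap = X_bar - sum(w * x for w, x in zip(W, x_bar))
    return [w + (w * q * x * s0 - w * q * s1) / det * gap
            for w, q, x in zip(W, Q, x_bar)]

# Hypothetical three-stratum illustration.
W = [0.2, 0.5, 0.3]
x_bar = [10.0, 20.0, 40.0]
X_bar = 22.0
W_cal = two_constraint_weights(W, x_bar, X_bar, Q=[1.0, 1.0, 1.0])
print(sum(W_cal), sum(w * x for w, x in zip(W_cal, x_bar)))
```

Both constraints hold for any positive choice of $Q_h$, which gives a quick sanity check on an implementation.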
Let us study how to form the strata boundaries which divide the scale of the study or
auxiliary variable into strata. The range of the auxiliary variable X is a to b,
subject to the condition that $(b - a) < \infty$. For example, let $X \in (a,b)$ and let the initial
rough four strata boundaries be given by a to $B_1$; $B_1$ to $B_2$; $B_2$ to $B_3$; and $B_3$ to b.
It is to be noted here that two boundary points $B_1$ and $B_2$ will result in three
strata, three boundary points $B_1$, $B_2$, and $B_3$ will result in four strata, and so on.
These boundary points can be obtained using information on either the study variable
Y or the auxiliary variable X. While doing stratification on the basis of the study
variable Y we assume that:
[Diagram: the range of y divided into strata by the boundary points ..., $y_3$, ..., $y_{h-1}$, ...]
(iv) The hth stratum weight, mean, and variance are defined as
$$W_h = \int_{y_{h-1}}^{y_h} f(y)\,dy, \quad \mu_{hy} = \frac{1}{W_h}\int_{y_{h-1}}^{y_h} y f(y)\,dy, \quad\text{and}\quad \sigma_{hy}^2 = \frac{1}{W_h}\int_{y_{h-1}}^{y_h} y^2 f(y)\,dy - \mu_{hy}^2;$$
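$$\frac{\partial \mu_{hy}}{\partial y_h} = \frac{y_h f(y_h)}{W_h} - \frac{f(y_h)}{W_h^2}\int_{y_{h-1}}^{y_h} y f(y)\,dy = \frac{f(y_h)}{W_h}\left(y_h - \mu_{hy}\right)$$
and
$$\frac{\partial \sigma_{hy}^2}{\partial y_h} = \frac{y_h^2 f(y_h)}{W_h} - \left(\sigma_{hy}^2 + \mu_{hy}^2\right)\frac{f(y_h)}{W_h} - 2\mu_{hy}\frac{f(y_h)}{W_h}\left(y_h - \mu_{hy}\right)$$
$$= \frac{f(y_h)}{W_h}\left[y_h^2 - \left(\sigma_{hy}^2 + \mu_{hy}^2\right) - 2\mu_{hy}\left(y_h - \mu_{hy}\right)\right] = \frac{f(y_h)}{W_h}\left[\left(y_h - \mu_{hy}\right)^2 - \sigma_{hy}^2\right]. \qquad (8.5.3)$$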
°
n oV (Yst) -_ --~fwhahy
p 2] +--~
0 Yh
°fw 2]iaiy
h) +W [oa 1yJ
-ahy
- 2 (OW
-- h - - +aiy
2(0W;)
- - +W;[ -
oa&J-
- -0. (8.5.1.2)
0 Yh 0Yh 0 Yh 0Yh
On using first order derivative results from the previous section in (8.5.1.2) we have
a1yf(Yh)+ Wh~~h)[~h - f.1hyr - a1y]+ ai~ {- f(Yh)} + W;{- f~h)}[~h - f.1iy r - ai~]
Chapter 8: Stratified and Post-Stratified Sampling 703
This set of equations is called the set of minimal equations. It is to be noted that
$\mu_{hy}$ and $\mu_{iy}$ themselves depend on the strata boundaries, therefore it is not possible to obtain
$y_h$ directly. In other words, the strata boundaries under the proportional allocation
technique are the averages of the mean values of the study variable in the
preceding and following strata.
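Because each boundary depends on the stratum means and the means depend on the boundaries, one natural way to solve the minimal equations numerically is a fixed-point iteration. The sketch below is an assumption of this edit, not a method given in the text: it works on a finite list of values instead of a density $f(y)$, starts from equally spaced boundaries, and alternates between computing stratum means and resetting each boundary to the average of the two adjacent means. The function name `midpoint_boundaries` and the data are hypothetical.

```python
from bisect import bisect_right

def midpoint_boundaries(values, L, max_iter=200, tol=1e-9):
    """Iterate y_h = (mean of stratum h + mean of stratum h+1) / 2 until stable."""
    ys = sorted(values)
    lo, hi = ys[0], ys[-1]
    # Starting boundaries: equally spaced over the range of the data.
    bounds = [lo + (hi - lo) * h / L for h in range(1, L)]
    means = []
    for _ in range(max_iter):
        groups = [[] for _ in range(L)]
        for y in ys:
            groups[bisect_right(bounds, y)].append(y)  # stratum index of y
        if any(not g for g in groups):
            raise ValueError("empty stratum; try different starting boundaries")
        means = [sum(g) / len(g) for g in groups]
        new_bounds = [(means[h] + means[h + 1]) / 2 for h in range(L - 1)]
        change = max((abs(a - b) for a, b in zip(bounds, new_bounds)), default=0.0)
        bounds = new_bounds
        if change < tol:
            break
    return bounds, means

# Hypothetical data with three visible clusters.
bounds, means = midpoint_boundaries([1, 2, 3, 10, 11, 12, 30, 31, 32], L=3)
print(bounds, means)
```

At convergence every boundary is the midpoint of the neighbouring stratum means, which is exactly the condition $y_h = (\mu_{hy} + \mu_{iy})/2$ derived for proportional allocation; the iteration may stop at a local solution that depends on the starting boundaries.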
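Under Neyman allocation, if the finite population correction factor is ignored then
the variance of the estimator $\bar{y}_{st}$ of the population mean $\bar{Y}$ is given by
$$V(\bar{y}_{st})_N = \frac{1}{n}\left(\sum_{h=1}^{L} W_h\sigma_{hy}\right)^2. \qquad (8.5.2.1)$$
Now minimization of (8.5.2.1) is equivalent to minimization of $\phi = \sum_{h=1}^{L} W_h\sigma_{hy}$. Thus
the minimal equations in this case will be
$$\frac{\partial \phi}{\partial y_h} = \frac{\partial}{\partial y_h}\left(W_h\sigma_{hy}\right) + \frac{\partial}{\partial y_h}\left(W_i\sigma_{iy}\right) = 0. \qquad (8.5.2.2)$$
Now we have $\frac{\partial \sigma_{hy}^2}{\partial y_h} = 2\sigma_{hy}\left(\frac{\partial \sigma_{hy}}{\partial y_h}\right)$, which implies that
$$\frac{\partial \sigma_{hy}}{\partial y_h} = \frac{1}{2\sigma_{hy}}\left[\frac{\partial \sigma_{hy}^2}{\partial y_h}\right] = \frac{f(y_h)}{2\sigma_{hy}W_h}\left[(y_h - \mu_{hy})^2 - \sigma_{hy}^2\right]. \qquad (8.5.2.3)$$
Similarly we have
$$\frac{\partial \sigma_{iy}}{\partial y_h} = -\frac{f(y_h)}{2\sigma_{iy}W_i}\left[(y_h - \mu_{iy})^2 - \sigma_{iy}^2\right]. \qquad (8.5.2.4)$$
Thus the set of minimal equations (8.5.2.2) will reduce to
$$\sigma_{hy}f(y_h) + \frac{W_h f(y_h)}{2\sigma_{hy}W_h}\left[(y_h - \mu_{hy})^2 - \sigma_{hy}^2\right] - \sigma_{iy}f(y_h) - \frac{W_i f(y_h)}{2\sigma_{iy}W_i}\left[(y_h - \mu_{iy})^2 - \sigma_{iy}^2\right] = 0$$
or
$$\frac{(y_h - \mu_{hy})^2 + \sigma_{hy}^2}{\sigma_{hy}} = \frac{(y_h - \mu_{iy})^2 + \sigma_{iy}^2}{\sigma_{iy}}. \qquad (8.5.2.5)$$
Solving these equations is not easy since $\mu_{hy}$ and $\sigma_{hy}$ are dependent upon $y_h$.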
The methods discussed so far for obtaining the strata boundary points are due to
Dalenius and Hodges (1957, 1959). In actual practice the following six steps are
needed for applying Dalenius and Hodges's method of finding strata boundaries.
Step 6. The stratum boundaries are obtained by finding the values of the cumulative
sum Cum$\{\sqrt{f_h}\}$, h = 1,2,...,L, that are closest to the multiples of k, and using the right
boundary point of the intermediate stratum giving that value of Cum$\{\sqrt{f_h}\}$.
Sometimes it may be difficult to use intermediate strata of equal width, for
example, when using employment. The Dalenius--Hodges technique proceeds as
before, except that $\sqrt{w_h f_h}$ has to be used instead of $\sqrt{f_h}$, where $w_h$ is the width of the
intermediate stratum h. The above method is also called the Cumulative Square
Root (CSR) method. Unnithan (1978) modified the Newton method used by
Shannon (1970), which seems to perform very satisfactorily when adopted for
minimizing the variance in the search for the best boundary points of stratification
for Neyman allocation. Shiledar--Baxi (1995) discussed a sequel to the modified
Kossack and Shiledar--Baxi (1971) procedure proposed to obtain an improved 'unit
stratified' design.
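The final step of the Cumulative Square Root method can be sketched in code. Since the first five steps are not reproduced here, the sketch assumes that the intermediate strata (classes), their right boundary points and their frequencies are already available, that k is the total of the cumulated square roots divided by the number of strata L, and that each boundary is the right end point of the class whose cumulative value is closest to a multiple of k. The function name `csr_boundaries` and the data are hypothetical.

```python
import math
from itertools import accumulate

def csr_boundaries(upper_limits, freqs, L, widths=None):
    """Cumulative square root rule: boundaries at classes whose Cum sqrt(w*f)
    is closest to k, 2k, ..., (L-1)k, where k = total / L."""
    if widths is None:
        widths = [1.0] * len(freqs)       # equal widths: plain sqrt(f)
    roots = [math.sqrt(w * f) for w, f in zip(widths, freqs)]
    cum = list(accumulate(roots))
    k = cum[-1] / L
    boundaries = []
    for h in range(1, L):
        target = h * k
        idx = min(range(len(cum)), key=lambda i: abs(cum[i] - target))
        boundaries.append(upper_limits[idx])
    return boundaries, cum

# Hypothetical classes with right end points 10, 20, 30, 40 and equal frequencies.
two_strata, cum_values = csr_boundaries([10, 20, 30, 40], [4, 4, 4, 4], L=2)
four_strata, _ = csr_boundaries([10, 20, 30, 40], [4, 4, 4, 4], L=4)
print(two_strata, four_strata, cum_values)
```

Passing `widths` switches from $\sqrt{f_h}$ to $\sqrt{w_h f_h}$ for intermediate strata of unequal width, as described above.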
0.233 RI 540.696 GA
0.471 NR 549.551 MS
7
3.433 AK 1 557.656 KY
4.373 CT 571.487 OR
16.710 NV 635.774 OR
19.363 VT 722.034 MT
8
27.508 NJ 848.317 AR
29.291 WV 906.281 CO
38.067 HI 2 1006.036 ID
43.229 DE 1022.782 IN
9
51.539 ME 1228.607 WA
56.471 MA 1241.369 ND
57.684 MD 1372.439 WI
80.750 SC 1519.944 MO
10
188.477 VA 1692.817 SD
3
197.244 UT 1716.087 OK
274.035 NM 2466.892 MN
4
298.351 PA 2580.304 KS 11
348.334 AL 2610.572 IL
386.479 WY 5 3520.361 TX
388.869 TN 3585.406 NE 12
405.799 LA 3909.738 IA
426.274 NY 3928.732 CA
431.439 AZ 6
440.518 MI
464.516 FL
494.730 NC
Note the use of $f_h$ as frequency. Also note that we need 6 strata; therefore we divide
the sum $\sum_{h=1}^{12}\sqrt{w_h f_h}$ by 6, and we have the first boundary point at a cumulative
frequency of 41.26502. Thus the five boundary points (or six strata) are given in the
following table.
Table 8.5.1. Final list of states in different strata obtained by cumulative square
root method.
Example 8.5.2. We wish to estimate the average death rate of the persons living in
the United States on the basis of year 2000 projections of the death rate. The
projected 191443 persons are to be grouped into five strata on the basis of their age
at the death time. The rough distribution of the death rate projected for the year
2000 in the United States has been listed in population 6 in 21 age groups with a
gap of four years. We wish to apply the Neyman method of sample allocation for
selecting the overall sample. Apply the Cumulative Square Root method to form 5
strata.
Solution. From the information given in the population 6 we have
For making five strata, we need to know four boundary points, say $B_1$, $B_2$, $B_3$, and
$B_4$, by using linear interpolation between the class intervals and the Cum $\sqrt{f}$
values. On dividing 1323.4630 by 5 and taking the cumulative totals, we get the
rough boundary points corresponding to the values 264.6926, 529.3853, 794.0779
and 1058.771 as given in the 6th column of the above table. Thus the four boundary
points obtained through interpolation are given by
$$B_3 = 94.5 + \frac{4(794.0779 - 680.4412)}{164.8878} = 97.25,$$
and
$$B_4 = 104.5 + \frac{4(1058.771 - 1055.217)}{268.2462} = 104.55.$$
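The interpolation used for these boundary points follows a single pattern: lower class limit plus class width times the fraction of the class's $\sqrt{f}$ needed to reach the target cumulative value. A minimal sketch, with the hypothetical helper name `interpolate_boundary` and the figures for $B_3$ and $B_4$ copied from the text:

```python
def interpolate_boundary(lower_limit, width, target, cum_before, sqrt_f):
    """Linear interpolation of a stratum boundary inside one class interval."""
    return lower_limit + width * (target - cum_before) / sqrt_f

# Figures quoted in Example 8.5.2 for the third and fourth boundary points.
b3 = interpolate_boundary(94.5, 4, 794.0779, 680.4412, 164.8878)
b4 = interpolate_boundary(104.5, 4, 1058.771, 1055.217, 268.2462)
print(b3, b4)
```

The computed values agree with the quoted 97.25 and 104.55 to within rounding in the second decimal place.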
Serfling (1968) suggested the use of the Cum $\sqrt{f}$ rule for stratification on the auxiliary
variable x when the regression of Y on X is linear with uncorrelated
homoscedastic errors and nearly perfect correlation. As the Cum $\sqrt{f}$ rule was
primarily proposed for stratification on the study variable Y, it does not take into
account the regression of Y on X and also the form of the conditional variance
$V(y|x)$. The Cum $\sqrt[3]{f}$ rule of Singh and Sukhatme (1969) and Singh (1971),
though it takes into account the regression of Y on X and also the form of the
conditional variance function $V(y|x)$, does not reduce to the rules recommended
for optimum stratification on Y when $V(y|x) = 0$. While deriving the Cum $\sqrt[3]{f}$ rule,
Singh (1971) made the assumption that $V(y|x) > 0$ for all x in the range (a,b) of x
with $(b - a) < \infty$. Singh (1975b) suggested an improvement in his Cum $\sqrt[3]{f}$ rule
such that it also reduces to a rule
for optimum stratification on Y when $V(y|x) = 0$ for all x in the range (a, b).
Schneeberger (1979) commented on the necessary conditions of Dalenius and
Hodges (1957, 1959) for optimal stratification points with Neyman's allocation.
Bankier (1988) suggested a power allocation method, under which the sample size
in the hth stratum takes the optimum value
$$n_h = n\left[\frac{S_{hy}X_h^q}{\bar{Y}_h}\right]\Big/\sum_{h=1}^{L}\left[\frac{S_{hy}X_h^q}{\bar{Y}_h}\right]. \qquad (8.5.3.1)$$
The value of q is called the power of this allocation and it can take any value in
[0, 1]. The choice of q between 0 and 1 can be viewed as a compromise between
the Neyman allocation and the almost equal coefficient of variation allocation.
Mandowara and Gupta (1999) have obtained optimum points of stratification for
two or more stage designs with unequal first stage units and the subsequent units.
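The power allocation can be illustrated with a short computation. The sketch below assumes the form $n_h \propto S_{hy}X_h^q/\bar{Y}_h$, with $X_h$ read as a measure of the size or importance of stratum h; that reading of $X_h$, the function name `power_allocation`, and the numbers are assumptions made for illustration only.

```python
def power_allocation(n, S_y, X, Y_bar, q):
    """Allocate n units with n_h proportional to S_hy * X_h**q / Ybar_h."""
    scores = [s * (x ** q) / y for s, x, y in zip(S_y, X, Y_bar)]
    total = sum(scores)
    return [n * score / total for score in scores]

# Hypothetical two-stratum illustration.
S_y = [2.0, 3.0]        # stratum standard deviations
Y_bar = [10.0, 10.0]    # stratum means
X = [100.0, 400.0]      # stratum size measures

alloc_q0 = power_allocation(10, S_y, X, Y_bar, q=0.0)  # driven by CVs only
alloc_q1 = power_allocation(10, S_y, X, Y_bar, q=1.0)  # size measure enters fully
print(alloc_q0, alloc_q1)
```

With q = 0 the allocation depends only on the stratum coefficients of variation, while larger q shifts the sample toward strata with larger $X_h$; the results are fractional and would be rounded in practice.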
For numerical illustrations related to the Cum $\sqrt{f}$ and Cum $\sqrt[3]{f}$ methods, one can also
refer to Singh and Mangat (1996). We would like to discuss the Singh (1971) method
of stratification by using auxiliary information. We have seen that an unbiased
estimator of the population mean $\bar{Y}$ in stratified random sampling is given by
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h\bar{y}_h \qquad (8.5.3.2)$$
with approximate variance of the estimator given by
$$V(\bar{y}_{st}) = \sum_{h=1}^{L}\frac{W_h^2}{n_h}\sigma_{hy}^2. \qquad (8.5.3.3)$$
$$L_g = \sum_{h=1}^{L}\frac{W_h^2}{n_h}\left(\sigma_{h\lambda}^2 + \mu_{h\eta}\right) + K\left[C - C_0 - \sum_{h=1}^{L} n_h\mu_{hc} - \psi(L)\right]. \qquad (8.5.3.8)$$
Example 8.5.3. We wish to estimate the average death rate of the persons living in
the United States on the basis of year 2000 projections of the death rate. The
projected 191443 persons are to be grouped into five strata on the basis of their age
at the time of death. The rough distribution of the death rate projected for the year
2000 in the United States has been listed in population 6 in 21 age groups with a
gap of four years. We wish to apply the method of proportional allocation for
selecting the overall sample required for estimation purpose. Apply the cumulative
cube root method to form 5 strata.
For making five strata we need to know four boundary points, say $B_1$, $B_2$, $B_3$, and
$B_4$, by using linear interpolation between the class intervals and the Cum $\sqrt[3]{f}$
values. On dividing 293.1782 by 5 and taking the cumulative totals we have the
rough boundary points corresponding to the values 58.6356, 117.2713, 175.9069
and 234.5425 in the 6th column of the above table. Thus the four boundary points
obtained through interpolation are
$$B_1 = 54.5 + \frac{4(58.6356 - 55.4500)}{9.4838} = 55.84,$$
$$B_2 = 74.5 + \frac{4(117.2713 - 103.0828)}{16.41493} = 77.95,$$
$$B_3 = 89.5 + \frac{4(175.9069 - 160.528)}{25.6699} = 91.89,$$
and
Nh
;=1
Nh (2 )
where Th = I X h; and Dh = Nh I X h; - Xff; . If g = 2 then Dh =
;=1
°and the optimum
allocation reduces to allocation proportional to stratum totals . Then we have the
following theorems .
Theorem 8.6.1. Under model (8.6.1) the Neyman optimum allocation reduces to
the allocation given by
nhocWh~'-~-;y-S-~x-+-r(I---PX-2y"T)S-;-X-h-gj-:'x-g . (8.6.3)
Proof. We have
$$\bar{X}_g = N^{-1}\sum_{h=1}^{L}\sum_{i=1}^{N_h} x_{hi}^g \quad\text{and}\quad \beta^2 = \rho_{xy}^2 S_y^2/S_x^2.$$
h; l i; )
Using these results in (8.6.5) we have
2 Sy 1 1
L 22 ( 2 -=-
=Pxy 2){L (- - - )WhS }+ I-pxy{2
Sy
)L -W -2-X-f . L
[s; nh N
h
-2 hx g (8.6.6)
h;1 X h;) nh
L
The Lagrange function subject to the condition n = L nh , is then given by
h;1
Use of one-way stratification for energy data in agriculture has been discussed by
Singh, Singh, Mittal, Pannu, and Bhangu (1994) and Singh, Singh, Pannu,
Bhangoo, and Singh (1994). Moses (1978) exhibited the information of multi-way
stratification technique in validation of energy data. Chernick and Wright (1983)
found that in data validation respondent surveys many variables are candidates for
stratification, and since the relationship between these variables and the response is
not well understood, two-way or multi-way stratification with proportional
allocation along with each variable seems appropriate . Frankel and Stock (1942)
discussed the use of multiple stratification techniques in gathering data related to
unemployment. They considered the possibility of using sample designs in which
the Latin square principle can be used to reduce the number of sample units
necessary to represent all strata. For example, suppose two criteria for
stratification are used, say A and B, such that p strata can be constructed from the
A characteristic, and, within each of these, p from the B characteristic. If one
be the population mean for the (i,j)th cell, $S_{ij}^2 = (N_{ij} - 1)^{-1}\sum_{k=1}^{N_{ij}}(Y_{ijk} - \bar{Y}_{ij})^2$ be the
population mean square error for the (i,j)th cell, $n_{ij}$ be the number of units in the
sample belonging to the (i,j)th cell, $\bar{y}_{ij} = n_{ij}^{-1}\sum_{k=1}^{n_{ij}} y_{ijk}$ be the sample mean in the
(i,j)th cell, $P_{i.} = \sum_{j=1}^{C} P_{ij}$, $P_{.j} = \sum_{i=1}^{R} P_{ij}$, $\bar{Y} = \sum_{i=1}^{R}\sum_{j=1}^{C} P_{ij}\bar{Y}_{ij}$, $n_{i.} = \sum_{j=1}^{C} n_{ij}$ and $n_{.j} = \sum_{i=1}^{R} n_{ij}$ are
Table 8.7.2. Yield in kg/ha of wheat crop in different six regions of the Punjab state
in India by different types of farmers and their proportions.
After n, $n_{i.}$, and $n_{.j}$ have been determined, construct a square matrix having
n sub-rows (s = 1,2,...,n) and n sub-columns (t = 1,2,...,n) forming $n^2$ sub-cells by
following Bryant, Hartley, and Jessen (1960). Combining $n_{i.}$ adjacent sub-rows
for i = 1,2,...,R, form the R rows, and by combining $n_{.j}$ adjacent sub-columns for
j = 1,2,...,C, form the C columns. Following these notations, the intersection of the
ith row and jth column will contain a single cell consisting of $n_{i.}n_{.j}$ squares or
sub-cells. This method of allocation is called random allocation.
Table 8.7.3. A 15 x 15 grid for allocation of total sample size in a two-way layout.
Steps for two-way random allocation: The following steps are needed for
allocating a sample of n units into two-way stratification:
Step 4. At the end of the marking process, each sub-row and sub-column will
contain one mark. This completes the allocation of the n observations to the RC
cells;
Step 5. The number of marks within the boundaries of the (i,j)th cell represents
the number of observations to be randomly selected from the cell, and $n_{ij}$ denotes
this number.
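The marking process can be sketched as a random permutation of the n sub-columns, which places exactly one mark in every sub-row and every sub-column as required by Step 4. The first three steps are not reproduced here, so treating the marking as a uniformly random permutation is an assumption of this sketch; the function name `random_two_way_allocation` and the marginal totals are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def random_two_way_allocation(row_totals, col_totals, seed=None):
    """Allocate n marks on an n x n grid, one per sub-row and sub-column,
    and count the marks falling in each (i, j) cell."""
    n = sum(row_totals)
    if sum(col_totals) != n:
        raise ValueError("row and column totals must both sum to n")
    rng = random.Random(seed)
    sub_columns = list(range(n))
    rng.shuffle(sub_columns)                 # sub-row s gets sub-column sub_columns[s]
    row_edges = list(accumulate(row_totals)) # cumulative sizes mark the row borders
    col_edges = list(accumulate(col_totals))
    n_ij = [[0] * len(col_totals) for _ in row_totals]
    for s, t in enumerate(sub_columns):
        i = bisect_right(row_edges, s)       # row containing sub-row s
        j = bisect_right(col_edges, t)       # column containing sub-column t
        n_ij[i][j] += 1
    return n_ij

# Hypothetical marginal sample sizes for R = 3 rows and C = 2 columns, n = 15.
n_ij = random_two_way_allocation([5, 4, 6], [7, 8], seed=1)
for row in n_ij:
    print(row)
```

Whatever the permutation, the row sums of the resulting table equal $n_{i.}$ and the column sums equal $n_{.j}$, so the marginal sample sizes are always respected.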
Properties of random allocation method: The random allocation method has the
following properties:
(8.7.2)
8.8 STRATUM BOUNDARIES FOR MULTIVARIATE POPULATIONS
Sadasivan and Aggarwal (1978) considered the problem of optimum points of
stratification with two study variates X and Y (say). The exact equations given by
Dalenius (1950) for the univariate case have been extended to the bivariate case.
Also for minimizing the generalized variance (variance--covariance matrix) under
Neyman allocation , a set of equations giving optimum points of stratification are
discussed for two cases: .
( a) When the correlation coefficient between the two variables is constant from
stratum to stratum ;
( b ) When the correlation coefficient is varying from stratum to stratum.
While extending the univariate procedure of Dalenius (1950) and Dalenius and
Gurney (1951, 1957) to the two-dimensional situation the following assumptions
are made:
( a ) the variates X and Y have a continuous joint p.d.f. $f(x, y)$ in the finite range
$x_0 \le x \le x_{L_1}$; $y_0 \le y \le y_{L_2}$;
( b ) the population is finite;
( c ) divide the population into $L_1 \times L_2$ strata by determining the strata boundaries
$x_1, x_2, \ldots, x_{(L_1-1)}$ for X and $y_1, y_2, \ldots, y_{(L_2-1)}$ for Y such that the generalized variance
of the means of the variables for this stratified sample under Neyman's allocation is
minimum;
( d ) the generalized variance can be set out as
$$GV = \begin{vmatrix}\sigma_x^2 & \sigma_{xy}\\ \sigma_{xy} & \sigma_y^2\end{vmatrix} = \sigma_x^2\sigma_y^2 - \sigma_{xy}^2. \qquad (8.8.1)$$
Before proceeding further let us first define some mathematical notation as follows.
I
I-Jihkx Yh = ~() 2
xr x f(X,Yh)dx·
Jhk Yh Xk-I
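Approximate variance of $\bar{x}$ under the Neyman allocation with cost per unit constant
$$\sigma_x^2 = \frac{1}{n}\left(\sum_{h=1}^{L_2}\sum_{k=1}^{L_1} W_{hk}\sigma_{hkx}\right)^2$$
Approximate variance of $\bar{y}$ under the Neyman allocation with cost per unit
constant
$$\sigma_y^2 = \frac{1}{n}\left(\sum_{h=1}^{L_2}\sum_{k=1}^{L_1} W_{hk}\sigma_{hky}\right)^2$$
Approximate covariance between $\bar{x}$ and $\bar{y}$ under the Neyman allocation with cost
per unit constant
$$\sigma_{xy} = \frac{1}{n}\left(\sum_{h=1}^{L_2}\sum_{k=1}^{L_1} W_{hk}\sqrt{\sigma_{hkxy}}\right)^2$$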
Thus, as in the case of univariate strata boundaries, we have the following
differentiation results for bivariate distributions to find the set of minimal
equations:
$$\frac{\partial W_{hk}}{\partial x_k} = \int_{y_{h-1}}^{y_h} f(x_k, y)\, dy = f_{hk}(x_k).$$
Thus
$$\frac{\partial \sigma_x}{\partial x_k} = \Psi(x,k,k) - \Psi(x,k,l),$$
where
Theorem 8.8.1. The set of minimal equations for minimizing the generalized
variance in (8.8.1) is given by
(8.8.2)
(8.8.3)
Using the above results (8.8.2) and (8.8.3) we have the set of minimal equations:
$\phi(x,k,k) = \phi(x,k,l)$, for $l = k+1$; $k = 1,2,\dots,L_1-1$ and $h = 1,2,\dots,L_2-1$ (8.8.4)
where
"'(x , k , I) = a xay
'I'
I I Xk + ahly
- -) [JlZhky
Z L -fhk(Xk Z - 2J1hkyJlhly I xk ]
Z + Jlhly
h ahly
g ( x, k, 1) = I
hk (x k ) [ ( / )
~ Xk VJ2h ky I Xk - f.lhly + f.lhlxf.lh/y + CThlxy - f.lhlxf.lhky I xk
]
h "O"h/xy
and
The approximate solution to the set of minimal equations given above can also be
obtained by following the method of Ekman (1959), which reduces to the set of simplified
minimal equations given by
Rizvi, Gupta, and Singh (2000) also studied stratification based on two auxiliary
variables.
Example 8.8.1. The death rates in the United States over the period 1990 to 2065 in
nine year groups and in twenty-two age groups are shown in a 22 x 9 contingency table
as given in population 6 of the Appendix. Construct three homogeneous columns
from the nine columns and five homogeneous rows from the 22 rows.
[Table: death rates by age group (22 rows) and year (1990, 1995, 2000, 2010, 2020, 2030, 2040, 2050 and 2065), with row totals and cum $\sqrt{f}$; the entries are not recoverable from the source.]
Divide $\sum_r \sqrt{f}$ by the number of rows ($R = 5$) required and $\sum_c \sqrt{f}$ by the number of
columns ($C = 3$) required.
Then the five points for grouping 22 rows into five homogeneous groups or rows
on the basis of cum $\sqrt{f}$ are given by
Age Group    Years 1990    Years 2000, 2010    Years 2030, 2040,
             and 1995      and 2020            2050 and 2065
0--69        14240         16615               13607
70--89       55245         68548               63126
90--99       96048         126595              134281
100--104     91888         129315              154393
105--109     144295        215707              287719
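The grouping step of the example can be illustrated with a minimal sketch of the cumulative $\sqrt{f}$ rule. The frequencies below are hypothetical (they are not the death-rate table), the function name `cum_sqrt_f_cuts` is an illustrative choice, and picking the class whose cumulative value is closest to each target is one common convention, assumed here rather than taken from the text.

```python
import math

def cum_sqrt_f_cuts(frequencies, n_groups):
    """Cumulative sqrt(f) rule: return the cumulative sqrt(f) values and, for
    each of the n_groups - 1 cut points, the index of the last class in the group."""
    cum = []
    running = 0.0
    for f in frequencies:
        running += math.sqrt(f)
        cum.append(running)
    step = cum[-1] / n_groups  # total of sqrt(f) divided by the number of groups
    cuts = []
    for g in range(1, n_groups):
        target = g * step
        # assumption: choose the class whose cumulative sqrt(f) is closest to the target
        idx = min(range(len(cum)), key=lambda i: abs(cum[i] - target))
        cuts.append(idx)
    return cum, cuts

# Hypothetical class frequencies, chosen so that the square roots are exact
freqs = [4, 9, 16, 25, 36, 49]
cum, cuts = cum_sqrt_f_cuts(freqs, 3)
print(cum)   # cumulative sqrt(f) per class
print(cuts)  # index of the last class in groups 1 and 2
```

With these hypothetical inputs the first group ends at the third class and the second group at the fifth class; the same routine could be applied separately to row totals and to column totals.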
Ahsan and Khan (1982) considered the problem of allocation to minimize the total
budgetary cost of a survey subject to the desired precision assigned to the posterior
variances of the population means when the sampling is multi-purpose, by
assuming that the overhead cost is proportional to the number of individuals
contacted in that stratum. Consider a population divided into $L$ strata and let $p$
variables be defined on each unit of the population under study. Let
$\bar{Y}_{hj}$ ($h = 1,2,\dots,L$; $j = 1,2,\dots,p$) be the unknown population means of the
observations on the $j$th variable within the $h$th stratum. Let $W_h$ ($h = 1,2,\dots,L$) be the
proportion of the population elements falling in the $h$th stratum. Let $\bar{y}_{hj}$ be the
sample means for the $j$th variable in the $h$th stratum. A simple way of representation
of the case under consideration is given below.
Let $W$ and $\mathbf{\bar{Y}}_j$ be the row vectors of the values of $W_h$ and $\bar{Y}_{hj}$, respectively,
defined as $W = (W_1, W_2, \dots, W_L)$ and $\mathbf{\bar{Y}}_j = (\bar{Y}_{1j}, \bar{Y}_{2j}, \dots, \bar{Y}_{Lj})$. The overall population mean
$\bar{Y}_j$ for the $j$th variable is given by
(8.9.1)
ml(I.I), · .. aI21Inll' .
.........,m2(2.2j.-·······..····..·· .. ,a'i.2/n22, .. (8.9.2)
Mj =
Raiffa and Schlaifer (1961) have shown that the posterior distribution of population
means $\bar{Y}_j$ for a given stratified sample with $n_h > 0$ and observed sample means $\bar{y}_j$
is $L$-variate normal with mean vector
and variance $\sigma_j = W V_j W'$.
Let $C_h$ be the overhead cost of approaching an individual of the $h$th stratum for
measurements and $C_{hj}$ be the cost associated with the measurement of the $j$th
variable of an individual in the $h$th stratum; then the total cost of the survey is
$$C = \sum_{h=1}^{L} C_h n_h + \sum_{h=1}^{L}\sum_{j=1}^{p} C_{hj} n_{hj}, \qquad (8.9.6)$$
where $n_h = \max_j n_{hj}$, $h = 1,2,\dots,L$; $j = 1,2,\dots,p$. Our aim is to draw a sample which
attains the desired precision assigned to the posterior variances of $\bar{Y}_j$. For this
purpose we require
(8.9.7)
where the values of $w_j$ are the required upper limits on the posterior variances of
$\bar{Y}_j$, $j = 1,2,\dots,p$. Note that we have
where $v_{j(h,h)}$ are the diagonal elements of $V_j$. Using (8.9.8) in (8.9.7) we have
ht 2/
Wh {Vj(h ,h) +(»lO't ) } s wj ' j = 1,2 , ..., p. (8.9.9)
L p Chj L
L = L. L. - + L. L.Ch - - vj(hh)
h=l j=l Y hj h=l jelj Ylif
P [I }2hi : (8.9.12)
where $J_j$ is such that the maximum of $n_{hj}$ for each $h \in J_j$ is attained for the $j$th
variable, subject to
The solution to these equations can be obtained by following Kuhn and Tucker (1952)
and Kokan and Khan (1967). Sekkappan (1981) also considered a problem of
optimum allocation in stratified sampling from a finite population with $p$ variables
under study using a superpopulation approach put forth by Ericson (1969). He also
studied allocation at the second phase using information obtained from the first phase.
The results obtained by Khan (1976) and Draper and Guttman (1968a, 1968b) are
shown to be special cases.
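The cost function (8.9.6), with the overhead cost in each stratum driven by $n_h = \max_j n_{hj}$, can be evaluated with a short sketch. The function name `total_survey_cost` and all numerical inputs are hypothetical and serve only to illustrate the structure of the formula.

```python
def total_survey_cost(overhead_cost, measurement_cost, n_hj):
    """Total cost C = sum_h C_h * n_h + sum_h sum_j C_hj * n_hj,
    where n_h = max_j n_hj is the number of individuals contacted in stratum h."""
    total = 0.0
    for C_h, C_hj_row, n_row in zip(overhead_cost, measurement_cost, n_hj):
        n_h = max(n_row)                      # individuals approached in stratum h
        total += C_h * n_h                    # overhead part
        total += sum(c * n for c, n in zip(C_hj_row, n_row))  # measurement part
    return total

# Hypothetical inputs: L = 2 strata, p = 2 variables
overhead = [2.0, 3.0]               # C_h
measure = [[1.0, 0.5], [2.0, 1.0]]  # C_hj
sizes = [[10, 20], [30, 15]]        # n_hj
cost = total_survey_cost(overhead, measure, sizes)
print(cost)
```

Here stratum 1 contributes 2(20) + 10 + 10 and stratum 2 contributes 3(30) + 60 + 15, so the overhead term depends only on the largest per-variable sample size in each stratum.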
We have seen that in stratified sampling the population has to be divided into $L$
strata, which are homogeneous within themselves, and whose means are widely
different. The strata weights $W_h$ can be used to obtain unbiased estimates of the
population mean or total. Sometimes these strata weights are not known; then the
technique of two-phase sampling can be used to obtain estimates of these weights.
Following the two-phase sampling scheme we have to select a preliminary large
sample of $m$ units by SRSWOR sampling to estimate the strata weights. The $m$
units in the first phase sample are then stratified into $L$ strata, with $m_h$ units falling in the
$h$th stratum, such that $\sum_{h=1}^{L} m_h = m$. A second phase sample of size $n_h < m_h$ is selected from the $h$th stratum
by SRSWOR sampling such that $\sum_{h=1}^{L} n_h = n$. Let $w_h = m_h/m$ be an estimator of the
original unknown weights $W_h = N_h/N$. In such situations an estimator for
estimating the population mean $\bar{Y}$ is given by
$$\bar{y}_{std} = \sum_{h=1}^{L} w_h \bar{y}_h. \qquad (8.10.1)$$
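The estimator (8.10.1) can be sketched in a few lines: the first phase supplies the estimated weights $w_h = m_h/m$ and the second phase supplies the stratum sample means. The function name `two_phase_stratified_mean` and the data below are hypothetical.

```python
def two_phase_stratified_mean(first_phase_counts, second_phase_samples):
    """Double sampling for stratification: sum_h w_h * ybar_h with w_h = m_h / m."""
    m = sum(first_phase_counts)              # first phase sample size
    estimate = 0.0
    for m_h, y_h in zip(first_phase_counts, second_phase_samples):
        w_h = m_h / m                        # estimated stratum weight
        ybar_h = sum(y_h) / len(y_h)         # second phase stratum mean
        estimate += w_h * ybar_h
    return estimate

# Hypothetical data: m = 100 first phase units classified into 3 strata
counts = [50, 30, 20]
samples = [[1.0, 2.0, 3.0], [4.0, 6.0], [10.0]]
est = two_phase_stratified_mean(counts, samples)
print(est)  # 0.5*2 + 0.3*5 + 0.2*10
```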
Then we have the following theorems:
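$$V(\bar{y}_{std}) = \sum_{h=1}^{L}\left(\frac{1-f_h}{n_h}\right)W_h^2 S_{hy}^2 + \left(\frac{N}{N-1}\right)\left(\frac{1}{m}-\frac{1}{N}\right)\left\{\sum_{h=1}^{L}(1-f_h)\frac{W_h(1-W_h)S_{hy}^2}{n_h} + \sum_{h=1}^{L}W_h\left(\bar{Y}_h-\bar{Y}\right)^2\right\}. \qquad (8.10.2)$$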
Proof. Note that the random sample size mh follows multinomial distribution with
parameters (m, WI ' W2, ..., WL), therefore in particular we have
thus we have
Corollary 8.10.1. For large $N$ and $n_h = nW_h$, the variance $V(\bar{y}_{std})$ reduces to
$$V(\bar{y}_{std}) \approx \frac{1}{n}\sum_{h=1}^{L}W_h S_{hy}^2 + \frac{1}{m}\sum_{h=1}^{L}W_h\left(\bar{Y}_h-\bar{Y}\right)^2. \qquad (8.10.3)$$
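$$\hat{v}(\bar{y}_{std}) = \frac{m}{m-1}\sum_{h=1}^{L}\left[\left\{w_h^2 - \left(\frac{N-m}{N-1}\right)\frac{w_h}{m}\right\}\frac{s_{hy}^2}{n_h} + \left(\frac{N-m}{N-1}\right)\frac{w_h\left(\bar{y}_h-\bar{y}_{std}\right)^2}{m}\right]. \qquad (8.10.4)$$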
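Theorem 8.10.3. The minimum variance of the estimator $\bar{y}_{std}$ under optimum
allocation is given by
$$\text{Min}.V(\bar{y}_{std}) = \left(\sqrt{C_1V_2}+\sqrt{C_2V_1}\right)^2\big/(C-C_0). \qquad (8.10.6)$$
$$V(\bar{y}_{std}) \approx \frac{V_1}{n}+\frac{V_2}{m}. \qquad (8.10.8)$$
Thus the Lagrange function is given by
$$L = \frac{V_1}{n}+\frac{V_2}{m}-\lambda\,(C-C_0-mC_1-nC_2). \qquad (8.10.9)$$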
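The step from the Lagrange function to the optimum sample sizes can be sketched as follows. The cost function $C = C_0 + mC_1 + nC_2$ is an assumption read off from the constraint inside the Lagrange function, and the variance form is the approximation (8.10.8).

```latex
\begin{align*}
\frac{\partial L}{\partial n} &= -\frac{V_1}{n^2} + \lambda C_2 = 0
  \;\Rightarrow\; n = \sqrt{\frac{V_1}{\lambda C_2}}, \\
\frac{\partial L}{\partial m} &= -\frac{V_2}{m^2} + \lambda C_1 = 0
  \;\Rightarrow\; m = \sqrt{\frac{V_2}{\lambda C_1}}, \\
C - C_0 &= mC_1 + nC_2
  = \frac{\sqrt{C_1V_2}+\sqrt{C_2V_1}}{\sqrt{\lambda}}
  \;\Rightarrow\;
  \frac{1}{\sqrt{\lambda}} = \frac{C-C_0}{\sqrt{C_1V_2}+\sqrt{C_2V_1}}, \\
n &= \frac{\sqrt{V_1}\,(C-C_0)}{\sqrt{C_2}\left(\sqrt{C_1V_2}+\sqrt{C_2V_1}\right)},
\qquad
m = \frac{\sqrt{V_2}\,(C-C_0)}{\sqrt{C_1}\left(\sqrt{C_1V_2}+\sqrt{C_2V_1}\right)}, \\
\frac{V_1}{n}+\frac{V_2}{m} &= \frac{\left(\sqrt{C_1V_2}+\sqrt{C_2V_1}\right)^2}{C-C_0}.
\end{align*}
```

The last line agrees with the minimum variance stated in Theorem 8.10.3.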
Example 8.10.1. We wish to estimate the yield/hectare of the tobacco crop in the
world during 2001. The number of countries on each continent growing this crop
during 2001 is unknown. Population 5 in the Appendix shows the yield/hectare of
the tobacco crop along with the number of countries on each continent growing this
crop. We wish to apply stratified random sampling to estimate the average
yield/hectare of the tobacco crop in the world. Using information from 1998 about
this crop in different countries, estimate the first phase and second phase sample
sizes for stratification.
Given: C = $1500, C_0 = $1000, C_1 = $2 and C_2 = $10.
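Hence the optimum second and first phase sample sizes, respectively, are given by
$$n = \frac{\sqrt{V_1}\,(C-C_0)}{\sqrt{C_2}\left(\sqrt{C_1V_2}+\sqrt{C_2V_1}\right)} = \frac{\sqrt{0.427182}\,(1500-1000)}{\sqrt{10}\left(\sqrt{2\times 0.16359}+\sqrt{10\times 0.42718}\right)} = 39.16 \approx 40$$
and
$$m = \frac{\sqrt{V_2}\,(C-C_0)}{\sqrt{C_1}\left(\sqrt{C_1V_2}+\sqrt{C_2V_1}\right)} = \frac{\sqrt{0.163597}\,(1500-1000)}{\sqrt{2}\left(\sqrt{2\times 0.16359}+\sqrt{10\times 0.42718}\right)} = 54.19 \approx 55.$$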
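The arithmetic of this example can be checked with a small sketch. The function name `optimum_two_phase_sizes` is illustrative; the inputs $V_1 = 0.42718$, $V_2 = 0.16359$ and the costs are those quoted in the example, and rounding up to the next integer is assumed to match the book's 40 and 55.

```python
import math

def optimum_two_phase_sizes(V1, V2, C, C0, C1, C2):
    """Optimum second phase (n) and first phase (m) sample sizes for the
    variance V1/n + V2/m under the cost C = C0 + m*C1 + n*C2."""
    denom = math.sqrt(C1 * V2) + math.sqrt(C2 * V1)
    n = math.sqrt(V1) * (C - C0) / (math.sqrt(C2) * denom)
    m = math.sqrt(V2) * (C - C0) / (math.sqrt(C1) * denom)
    return n, m

n, m = optimum_two_phase_sizes(V1=0.42718, V2=0.16359, C=1500, C0=1000, C1=2, C2=10)
n_rounded, m_rounded = math.ceil(n), math.ceil(m)  # round up to whole units
print(round(n, 2), round(m, 2), n_rounded, m_rounded)
```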
Holt and Smith (1979) showed that post-stratification is potentially more efficient than
stratification, since, in order to maximise the gain in precision, the stratification
factors can be chosen after sampling in different ways for different sets of variables,
e.g., age, sex, etc. This technique is found to be most practicable for surveys
where individual responses may be expected to vary with age, sex, occupation,
education, state, country, race, etc. Usually none of these variables is available
for stratification at the individual level prior to sampling. Such a situation is called
conditional post-stratification. In some situations, censuses may provide information
on all of these variables at the aggregate level. In some situations, this aggregate
level information may not be available. In such situations this is called
unconditional post-stratification. We will discuss both situations.
(8.11.2)
Espejo and Pineda (1997) proposed an estimator to estimate the variance of $\bar{y}_{pst}$ as
$$\hat{v}(\bar{y}_{pst}) = \frac{(N-1)}{(N-n)}\,\hat{v}(\bar{y}) - \sum_{h=1}^{L}\frac{W_h}{n_h}\sum_{i=1}^{n_h}y_{hi}^2 + \bar{y}_{pst}^2 \qquad (8.11.4)$$
where
$$\hat{v}(\bar{y}) = \sum_{h=1}^{L}\frac{W_h}{n_h}\sum_{i=1}^{n_h}y_{hi}^2 - \bar{y}_{pst}^2 + \sum_{h=1}^{L}W_h^2\frac{s_{hy}^2}{n_h}.$$
8.11.2 UNCONDITIONAL POST-STRATIFICATION
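Proof. We have
$$V(\bar{y}_{pst}) = E_1V_2(\bar{y}_{pst}\mid n_h) + V_1E_2(\bar{y}_{pst}\mid n_h) = E_1\left[\sum_{h=1}^{L}W_h^2\left(\frac{N_h-n_h}{N_hn_h}\right)S_{hy}^2\right] + V_1(\bar{Y})$$
$$= E_1\left[\sum_{h=1}^{L}W_h^2\left(\frac{1}{n_h}-\frac{1}{N_h}\right)S_{hy}^2\right] = \sum_{h=1}^{L}W_h^2\left\{E\left(\frac{1}{n_h}\right)-\frac{1}{N_h}\right\}S_{hy}^2. \qquad (8.11.6)$$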
Example 8.11.1. We wish to estimate the average yield/hectare of the tobacco crop
in the world. Select an SRSWOR sample of 40 countries from population 5.
Record the yield and area under the crop. Treat each continent as a separate
stratum and assume that the number of countries in each continent is known. Post-stratify the
countries selected in the SRSWOR sample into different continents. Merge the
post-strata if required using total area under crop as an auxiliary variable. Obtain a
95% confidence interval for the average yield of the tobacco crop in the world.
Solution. After applying the remainder approach to the first three columns (N = 106)
of the Pseudo-Random Number (PRN) Table 1 given in the Appendix we end with
the following 40 distinct random numbers:
038, 058, 071, 019, 077, 014, 061, 024, 096, 036, 094, 008, 041, 049, 002, 092,
053, 044, 030, 046, 075, 078, 015, 085, 034, 063, 066, 091, 009, 039, 086, 021,
074, 006, 101, 035, 090, 079, 054 and 065.
Thus the countries corresponding to these population unit numbers will be included
in the sample. Using information about population unit numbers from population 5,
we post-stratify the above sampled units as follows:
Stratum no.   Units in the sample
1             002, 006
2             008, 009
3             019, 014, 015
4             024, 030, 021
5             038, 036, 041, 034, 039, 035
6             044, 046
7             058, 071, 061, 049, 053, 075, 063, 066, 074, 054, 065
8             077, 092, 078, 085, 091, 086, 090, 079
9             096, 094, 101
10            (none)
Fig. 8.11.1 Example of post-stratification.
We observed after post-stratification that the 10th stratum remains empty. There are
several ways of merging this particular empty stratum with another one. We would
like to use the information on the total area under the tobacco crop in ten continents
or strata as follows:
Stratum No.   Total Area    N_h   n_h
1             19167.00      6     2
2             87960.00      6     2
3             146474.96     8     3
4             149235.00     10    3
5             65866.13      12    6
6             13800.00      4     2
7             350481.90     30    11
8             2467759.10    17    8
9             339761.00     10    3
10            3999.99       3     0
We also observed that each of the strata 1, 2, and 6 has only two units after post-stratification,
which is the minimum requirement for variance estimation. We would
prefer to merge stratum 10 with one of these three. We make use of the known
information on the total area under the tobacco crop in different continents or strata.
We observe that the total area in the 10th stratum is 3999.99 hectares, which is close to the
total area of 13800 hectares in the 6th stratum. Thus we shall prefer to merge the
empty 10th post-stratum with the 6th post-stratum. After merging these post-strata,
we have the following situation.
Stratum  N_h  n_h  W_h     f_h     y_hi                                  ybar_h  sum y_hi^2  s_hy^2
1        6    2    0.0566  0.3333  1.79, 2.00                            1.895   7.2041      0.0221
2        6    2    0.0566  0.3333  1.51, 1.29                            1.400   3.9442      0.0242
3        8    3    0.0755  0.3750  2.14, 3.69, 2.80                      2.876   26.0357     0.6050
4        10   3    0.0943  0.3000  1.64, 1.17, 2.09                      1.633   8.4266      0.2116
5        12   6    0.1132  0.5000  0.92, 1.82, 0.84, 1.63, 2.48, 2.50    1.698   19.9217     0.5231
6        7    2    0.0660  0.2857  1.61, 1.18                            1.395   3.9845      0.0924
7        30   11   0.2830  0.3666  2.51, 0.00, 1.22, 1.00, 0.87, 1.00,   1.293   23.3988     0.5016
                                   1.63, 2.10, 0.96, 2.00, 0.93
8        17   8    0.1604  0.4706  0.88, 2.50, 1.22, 1.25, 1.33, 2.02,   1.445   19.4782     0.3963
                                   1.80, 0.56
9        10   3    0.0943  0.3000  1.09, 1.50, 5.71                      2.767   36.0422     6.5390
Thus an estimate of the average yield/hectare of the tobacco crop in the world is
given by
$$\bar{y}_{pst} = \sum_{h=1}^{L} W_h\bar{y}_h \approx 1.7005.$$
Now we have
$$\hat{v}(\bar{y}) = \sum_{h=1}^{L}\frac{W_h}{n_h}\sum_{i=1}^{n_h}y_{hi}^2 - \bar{y}_{pst}^2 + \sum_{h=1}^{L}W_h^2\frac{s_{hy}^2}{n_h} = 3.868401079 - 1.7005^2 + 0.027478596 = 1.004179425.$$
Thus a 95% confidence interval of the average yield/hectare of the world tobacco
crop is
$\bar{y}_{pst} \pm 1.96\sqrt{\hat{v}(\bar{y}_{pst})}$, or $1.7005 \pm 1.96\sqrt{0.6209}$, or [0.15608, 3.24491].
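The figures of this example can be retraced from the merged post-strata with a short sketch. It assumes the Espejo and Pineda (1997) form $\hat{v}(\bar{y}_{pst}) = \frac{N-1}{N-n}\hat{v}(\bar{y}) - \sum_h \frac{W_h}{n_h}\sum_i y_{hi}^2 + \bar{y}_{pst}^2$ and uses unrounded weights $W_h = N_h/N$, so the results differ from the printed values in the fourth decimal place. Variable names are illustrative.

```python
import math

N = 106  # number of countries in population 5

# (N_h, sampled yields y_hi) for the nine post-strata after merging stratum 10 into 6
strata = [
    (6,  [1.79, 2.00]),
    (6,  [1.51, 1.29]),
    (8,  [2.14, 3.69, 2.80]),
    (10, [1.64, 1.17, 2.09]),
    (12, [0.92, 1.82, 0.84, 1.63, 2.48, 2.50]),
    (7,  [1.61, 1.18]),
    (30, [2.51, 0.00, 1.22, 1.00, 0.87, 1.00, 1.63, 2.10, 0.96, 2.00, 0.93]),
    (17, [0.88, 2.50, 1.22, 1.25, 1.33, 2.02, 1.80, 0.56]),
    (10, [1.09, 1.50, 5.71]),
]

def sample_variance(y):
    """Unbiased sample variance s^2 with divisor len(y) - 1."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / (len(y) - 1)

n = sum(len(y) for _, y in strata)  # total sample size

# Post-stratified mean: sum_h W_h * ybar_h
ybar_pst = sum(N_h / N * sum(y) / len(y) for N_h, y in strata)

# Building blocks of the variance estimator
term_a = sum(N_h / N / len(y) * sum(v * v for v in y) for N_h, y in strata)
term_b = sum((N_h / N) ** 2 * sample_variance(y) / len(y) for N_h, y in strata)

v_ybar = term_a - ybar_pst ** 2 + term_b
v_pst = (N - 1) / (N - n) * v_ybar - term_a + ybar_pst ** 2

half_width = 1.96 * math.sqrt(v_pst)
ci = (ybar_pst - half_width, ybar_pst + half_width)
print(round(ybar_pst, 4), round(v_ybar, 4), round(v_pst, 4), ci)
```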
In order to find the percent relative efficiency (PRE) of the post-stratified random
sampling with respect to SRSWOR sampling, we have
$$V(\bar{y}_{srswor}) = \frac{N-n}{Nn}S_y^2 = \frac{106-40}{106\times 40}\times 0.6323 = 0.0098424.$$
$$V(\bar{y}_{pst}) \approx \left(\frac{1}{n}-\frac{1}{N}\right)\sum_{h=1}^{L}W_hS_{hy}^2 + \frac{1}{n^2}\sum_{h=1}^{L}(1-W_h)S_{hy}^2 \approx \left(\frac{1}{40}-\frac{1}{106}\right)\times 0.510094 + \frac{4.149474}{40^2} = 0.0105336.$$
In this particular example the post-stratified sampling remains less efficient than
simple random sampling.
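The comparison can be condensed into a single percent relative efficiency figure. The sketch below assumes the usual definition PRE $= 100 \times V(\bar{y}_{srswor})/V(\bar{y}_{pst})$ and takes the population quantities 0.6323, 0.510094 and 4.149474 as quoted in the example; variable names are illustrative.

```python
N, n = 106, 40
S2_y = 0.6323               # population mean square S_y^2 (as quoted)
sum_W_S2 = 0.510094         # sum_h W_h * S_hy^2 (as quoted)
sum_one_minus_W_S2 = 4.149474  # sum_h (1 - W_h) * S_hy^2 (as quoted)

v_srswor = (N - n) / (N * n) * S2_y
v_pst = (1 / n - 1 / N) * sum_W_S2 + sum_one_minus_W_S2 / n ** 2

# Assumed definition: PRE of post-stratification with respect to SRSWOR
pre = 100 * v_srswor / v_pst
print(round(v_srswor, 7), round(v_pst, 7), round(pre, 2))
```

A value below 100 corresponds to the statement that post-stratification is less efficient here.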
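Define a variable
$$y_{hi} = \begin{cases} 1 & \text{if the } i\text{th unit in the } h\text{th stratum possesses attribute } A, \\ 0 & \text{otherwise,} \end{cases}$$
then
$$\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h}y_{hi} = \frac{N_{ha}}{N_h} = P_h \qquad (8.12.1)$$
denotes the proportion of units falling in the $h$th stratum possessing attribute $A$.
Obviously the proportion of the units belonging to group $A$ in the whole
population can be written as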
(8.12.2)
where Qh = 1- Ph .
Proof. It follows by using results from Chapter 2 because strata are independent, so
$$V(\hat{p}_{st}) = V\left[\sum_{h=1}^{L}W_h\hat{p}_h\right] = \sum_{h=1}^{L}W_h^2V(\hat{p}_h) = \sum_{h=1}^{L}W_h^2\left(\frac{1-f_h}{n_h}\right)\left(\frac{N_h}{N_h-1}\right)P_h(1-P_h).$$
Hence the theorem.
$$\hat{v}(\hat{p}_{st}) = \sum_{h=1}^{L}W_h^2\left(\frac{1-f_h}{n_h-1}\right)\hat{p}_h\hat{q}_h, \qquad (8.12.5)$$
where $\hat{q}_h = 1-\hat{p}_h$.
Proof. Again it follows by using results from Chapter 2 because the strata are
independent. Hence the theorem.
Note that the methods of equal, proportional , and optimum allocations are still valid
while estimating population proportion, and we are considering SRSWOR
sampling.
Example 8.12.1. A gardener has 60,000 mango trees in two orchards and over an
experience of 4000 years, it has been observed that each tree produces about 100
mangoes. The gardener wishes to estimate the proportion of trees producing more
than 100 mangoes out of all 60,000 trees in his two orchards.
( a ) If 10% trees are producing more than 100 mangoes, then find the variance of
the estimator of population proportion under SRSWOR sampling based on a sample
of 50 trees.
( b ) Later a statistician found that in the first orchard of 40,000 trees 10% of the
trees are producing more than 100 mangoes and in the second orchard of 20,000
trees also 10% of the trees are producing more than 100 mangoes, so he suggested
applying stratified random sampling by selecting 25 units from each orchard.
(c ) Do you agree with the statistician?
Solution. ( a ) Under SRSWOR sampling we have
N = 60,000, n = 50 and P = 0.1
so we have
$$V(\hat{p}) = \left(\frac{1-f}{n}\right)\left(\frac{N}{N-1}\right)P(1-P) = \left(\frac{1-50/60{,}000}{50}\right)\left(\frac{60{,}000}{60{,}000-1}\right)\times 0.1\times(1-0.1) = 0.001798.$$
Thus the use of stratified random sampling will be more beneficial than simple
random sampling . Yes, we agree with the statistician.
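A minimal numerical check of the two designs in this example is sketched below, using the SRSWOR variance of a proportion and the stratified variance $\sum_h W_h^2 \left(\frac{1-f_h}{n_h}\right)\left(\frac{N_h}{N_h-1}\right)P_h(1-P_h)$. The function names are illustrative, and the allocation of 25 trees per orchard is the one suggested in part ( b ).

```python
def var_p_srswor(N, n, P):
    """Variance of the sample proportion under SRSWOR."""
    return (1 - n / N) / n * (N / (N - 1)) * P * (1 - P)

def var_p_stratified(N_h, n_h, P_h):
    """Variance of the stratified estimator of a proportion, strata sampled independently."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * var_p_srswor(Nh, nh, Ph)
               for Nh, nh, Ph in zip(N_h, n_h, P_h))

v_a = var_p_srswor(60000, 50, 0.1)
v_b = var_p_stratified([40000, 20000], [25, 25], [0.1, 0.1])
print(round(v_a, 6), round(v_b, 6))
```

With these inputs the sketch gives about 0.0018 for SRSWOR and about 0.0020 for the equal-allocation stratified design, so the ranking of the two designs is worth re-checking against the figures used in part ( b ).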
Exercise 8.1. In some practical situations the correlation of the auxiliary variable x
with the study variable y is positive on some units and negative on other units.
Evidently, such auxiliary variables cannot be used directly as the auxiliary variate in
the ratio and product methods of estimation, since the population mean and/or the
sample mean of the variate may be close to zero and these may occur in the
denominator of the estimator. Can you resolve this difficulty by using the technique
of ( a ) Stratified sampling, ( b ) Post-stratified sampling, ( c ) Change of scale
method? Which method would you prefer and why?
Hint: Srivenkataramana and Tracy (1984).
Exercise 8.2. For stratified random sampling compare the estimator
$$\bar{y}_{st}^{*} = \sum_{h=1}^{L}\hat{c}v_h\,\bar{y}_h,$$
where $\hat{c}v_h = s_{hy}/\bar{y}_h$ denotes the estimator of the coefficient of variation in the $h$th
stratum, with the usual estimator
$$\bar{y}_{st} = \sum_{h=1}^{L}W_h\bar{y}_h$$
under ( a ) proportional allocation, ( b ) optimum allocation.
Hint: Bennett (1983).
Exercise 8.3. In case of stratified random sampling, assuming a common fixed
sample size $n$ and neglecting the finite population correction factor, the relative
precision (RP) of proportional allocation to optimum allocation is given by
$$RP = \left(\sum_{h=1}^{L}W_h\sqrt{P_hQ_h}\right)^2\Big/\sum_{h=1}^{L}W_hP_hQ_h,$$
where
$$P_h = N_h^{-1}\sum_{i=1}^{N_h}y_{hi}, \quad Q_h = 1-P_h \quad\text{and}\quad y_{hi} = \begin{cases} 1 & \text{if } i \in A \text{ in the } h\text{th stratum,} \\ 0 & \text{otherwise,} \end{cases}$$
and $A$ denotes the attribute of interest.
Hint: Bennett and Islam (1983).
$$\bar{y}_{cr} = \bar{y}_{st}\left(\frac{\bar{X}}{\bar{x}_{st}}\right)$$
is given by
CV(X) 2 2 SxXf
2- ]
st. + (1- PXy) xg
2
nh oc Wh[{ Pxy - CV(y) }
Exercise 8.5. Suppose there are three study variables satisfying the following
conditions:
( a ) the variates $x$, $y$, and $z$ have probability density function $f(x, y, z)$ in
the finite range $x_0 \le x \le x_{L_1}$, $y_0 \le y \le y_{L_2}$, and $z_0 \le z \le z_{L_3}$;
( b ) the population is finite;
( c ) divide the population into $L_1 \times L_2 \times L_3$ strata by determining the strata
boundaries $x_1, x_2, \dots, x_{(L_1-1)}$ for $x$; $y_1, y_2, \dots, y_{(L_2-1)}$ for $y$; and $z_1, z_2, \dots, z_{(L_3-1)}$ for $z$ such
that the generalized variance of the means of the variables for this stratified sample
under the Neyman allocation is minimum; and
( d ) the generalized variance can be set out as
$$GV = \begin{vmatrix} \sigma_x^2 & \sigma_{xy} & \sigma_{xz} \\ \sigma_{xy} & \sigma_y^2 & \sigma_{yz} \\ \sigma_{xz} & \sigma_{yz} & \sigma_z^2 \end{vmatrix},$$
where $\sigma_x^2$, $\sigma_y^2$, $\sigma_z^2$, $\sigma_{xy}$, $\sigma_{xz}$ and $\sigma_{yz}$ denote the approximate variances and
covariances of $x$, $y$, and $z$ under Neyman allocation with cost per unit constant. Find
the optimum points of stratification in this tri-variate population.
Exercise 8.6. Let $m$ be the size of the first phase sample from which $w_h$, an estimate
of $W_h$, has been obtained. If $n$ is the size of the second phase sample then show
that
$$\bar{y}_{st} = \sum_{h=1}^{L}w_h\bar{y}_h$$
is an unbiased estimator of the population mean $\bar{Y}$ with variance
$$V(\bar{y}_{st}) = \sum_{h=1}^{L}\left[\left\{W_h^2+\frac{gW_h(1-W_h)}{m}\right\}\frac{(1-f_h)S_{hy}^2}{n_h} + \frac{gW_h\left(\bar{Y}_h-\bar{Y}\right)^2}{m}\right]$$
where $g = (N-m)/(N-1)$, $f_h = n_h/N_h$ and the value of $n_h$ is independent of $w_h$.
Hint: Dayal (1979).
Exercise 8.7. For a given sample $s_h \in \Omega_h$ ($h = 1,2,\dots,L$) let $(t_{hy}, t_{hx})$ assume values
in a closed convex sub-space $R_2$ of the two-dimensional real space containing the
point $(\bar{Y}_h, \bar{X}_h)$. Find the bias and variance of the class of estimators defined as
h
' = gh~hY' thx)
where $g_h(t_{hy}, t_{hx})$ is a known function of $t_{hy}$ and $t_{hx}$ satisfying certain regularity
conditions such that
( a ) $g_h(\bar{Y}_h, \bar{X}_h) = \bar{Y}_h$;
( b ) the function gh is continuous in R2 ;
and
( c ) the first and second order derivatives of gh exist and are continuous in R2 .
Hint: Dalabehara and Sahoo (1997).
Exercise 8.8. Let $p_{ij} = X_{ij}/X$ be the selection probability for the $i$th unit of the $j$th
stratum.
( a ) Show that an unbiased estimator of population total $Y$ is given by
$$\hat{Y}_{pps} = \sum_{j=1}^{L}\frac{1}{n_j}\sum_{i=1}^{n_j}\frac{y_{ij}}{p_{ij}}$$
with variance
Exercise 8.9. For the $h$th stratum define $\bar{x}_h^{*} = (N_h\bar{X}_h - n_h\bar{x}_h)/(N_h-n_h)$. Find the bias
and variance of the dual to separate ratio estimator of population mean $\bar{Y}$ in
stratified sampling, defined as
$$\bar{y}_{dst} = \sum_{h=1}^{L}W_h\bar{y}_h\left[\frac{\bar{x}_h^{*}}{\bar{X}_h}\right].$$
Exercise 8.10. Show that the difference between the variances of the separate ratio
and the combined ratio estimator of population total in stratified sampling is
Discuss the effect of choice of Rh and R from stratum to stratum and across strata.
Exercise 8.11. Consider a population of size $N$ stratified into $L$ strata, the size
of the $h$th stratum being $N_h$ such that $\sum_{h=1}^{L}N_h = N$. A simple random sample of size
$n$ is drawn from the population and is classified amongst the $L$ strata such that
$n_h$ ($h = 1,2,\dots,L$) is the number of units in the sample that fall in the $h$th stratum, $n_h$
varying from sample to sample.
( a ) Compare the usual unbiased estimator
$$\bar{y}_{pst} = \sum_{h=1}^{L}W_h\bar{y}_h,$$
where $W_h = N_h/N$, with the estimator defined as
$$\bar{y}_{pst}^{*} = \sum_{h=1}^{L}W_{h\alpha}\bar{y}_h,$$
where $W_{h\alpha} = \alpha w_h + (1-\alpha)W_h$ for $w_h = n_h/n$.
Exercise 8.12. Divide the population of size $N$ into $L$ strata, such that the $h$th
stratum has $N_h$ units. From the $h$th stratum, draw a preliminary large sample of $m_h$
units with SRSWOR sampling and measure the auxiliary variable $x_{hi}$,
$i = 1,2,\dots,m_h$. Out of the $m_h$ units selected in the preliminary large sample from the $h$th
stratum select a second phase sample of $n_h$ units using SRSWOR sampling and
measure both the study variable $y_{hi}$ and auxiliary variable $x_{hi}$, $i = 1,2,\dots,n_h$.
( a ) Study the bias and variance properties of the ratio and regression type
estimators of the population mean Y defined as
and
where $W_h = N_h/N$, $\bar{x}_{hm} = m_h^{-1}\sum_{i=1}^{m_h}x_{hi}$, $\bar{x}_{hn} = n_h^{-1}\sum_{i=1}^{n_h}x_{hi}$ and $\bar{y}_{hn} = n_h^{-1}\sum_{i=1}^{n_h}y_{hi}$ are the first
phase and second phase sample means.
( b ) Find x, and k real constants such that the variances of the Y 2 and Y 4 are
mmimum.
Hint: Lindley and Deely (1993)
Exercise 8.13. Suppose we draw a preliminary large sample of $m$ units from the
population of $N$ units with SRSWOR sampling. The first step is to post-stratify the
selected units into the $L$ strata so that the $h$th stratum contains $m_h$ units and note the
associated auxiliary variable $x_{hi}$, $i = 1,2,\dots,m_h$. From the post-stratified sample with
$m_h$ units in the $h$th stratum, select $n_h$ units with SRSWOR sampling and measure
the study variable $y_{hi}$ and auxiliary variable $x_{hi}$, $i = 1,2,\dots,n_h$.
Study the asymptotic properties of the estimator of population mean $\bar{Y}$ defined as
$$\bar{y}_{pst} = \sum_{h=1}^{L}\frac{m_h}{m}\,\bar{y}_{hn}\left(\frac{\bar{x}_{hm}}{\bar{x}_{hn}}\right).$$
Exercise 8.14. Let $P_h$ be the proportion of units in the $h$th stratum possessing
an attribute of interest, say $A$. Evidently $P = \sum_{h=1}^{L}W_hP_h$ is the proportion of units in
the population possessing the attribute $A$. Assume $n_h$ units are selected from the $h$th
stratum using SRSWOR sampling. Suggest an unbiased estimator of $P$. Compare
the efficiency of optimal allocation with proportional allocation.
Exercise 8.17. Assume $\bar{Y} = \sum_{h=1}^{L}W_h\bar{Y}_h$, $W_h = N_h/N$ and $\bar{y}_h = \frac{1}{n_h}\sum_{j=1}^{n_h}y_{hj}$, $h = 1,2,\dots,L$.
Also let $\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i$ and $\bar{y}_p = \sum_{h=1}^{L}W_h\bar{y}_h$ denote the estimators of $\bar{Y}$ obtained from
simple random sampling and post-stratification ($n_h \ge 1$, $h = 1,2,\dots,L$), respectively. It
should be noted that the sample size $n$ of the first-stage simple random sampling is
a fixed number, while the classified sample sizes $n_h$ thereafter are random variables.
In case of small samples a commonly used estimation procedure for $\bar{Y}$ is to
collapse the empty post-strata, if there are any, with neighbouring strata. If $\bar{y}_c(L)$ is
the estimator of $\bar{Y}$ under such a procedure then define $\bar{y}_c(L)$, $L = 2,3,4$.
Hint: Chang, Liu, and Han (1998), Chang, Han, and Hawkins (1999).
j = I,...,L.
( a ) Show that the estimating function
N
h = L{t;(Yi)-ai(O)}/Jri + Lai(O)
ie s i=1
can be written as
• L ( ) L
YG = L L \Yi -OjXi VJri + L 0jXj,
j=liesj j=l
where S j denote the sample from the stratum n j and X j = LXi'
o.j
( b ) Also show that under SRSWOR sample of fixed size n the estimating
function reduces to
• N L t; _) N
YG=- L nj\Yj -OjXj + L OjX,
n j=l j=1
where Yj and Xj are the means of the yand X over the sample S j .
completely arbitrary , then show that the optimum value of these weights Wi over
the repeated sampling design is given by
N
with a = I TC jWjX j .
j; 1
( C ) Also show that it leads to an analogue of the Neyman allocation given by
where k is a constant.
Show that the estimator ~I is unbiased for population total, Y, and its variance is
given by
, ) 1 L Nh Nh ( V )2
V (Ygt := -- I I I Jrhij - JrhiJrhj !V'hd Jrhi - Yhj/ Jrhj .
2 h=l i=lj;<i=1
Hint: Padmawar (1998).
Exercise 8.22. Assuming that reliability ratio <5 IS known, a necessary and
sufficient condition for the estimator
1~ i '* j ~ L such that Wjsi := adni for al1 j E Si and wO Si := (ai -I)a for i:= 1,2, ..., L .
Hint: Zou and Wan (2000).
and
2L
2 '
L 4 (L 2)
h~IWhQhXh h~l WhQhShx - h~IWhQhShx
( b ) The minimum variance of the estimator Yst (new) is given by
where
Ahrs = ~hrsS/2 and J-Ihrs = (Nh -r): 1~(Yhi - f,,) (Xhi - X h)' .
J-Ih20J-102 i; 1
where
Ehi= (Yhi - f,,)- f31(X hi -Xh)- f32{(Xhi - x hf -o-kc} with a be = Nj,I%(Xhi -xhf .
nh
mean and variance . Also let xh = nj,1 L,Xhi , and
i ;1
$\bar{y}_h = n_h^{-1}\sum_{i=1}^{n_h}y_{hi}$, $s_{hy}^2 = (n_h-1)^{-1}\sum_{i=1}^{n_h}\left(y_{hi}-\bar{y}_h\right)^2$ denote the second phase sample mean
and variances for the auxiliary and study variables, respectively.
( a ) Consider an estimator of the population mean in stratified double sampling as
$$\bar{y}_{st}(d) = \sum_{h=1}^{L}W_h^{*}\bar{y}_h,$$
where the $W_h^{*}$ are the calibrated weights such that the chi square distance
$$\sum_{h=1}^{L}\frac{\left(W_h^{*}-W_h\right)^2}{W_hQ_h},$$
where $Q_h$ are prearranged weights, is minimized, subject to the constraints
$$\sum_{h=1}^{L}W_h^{*}\bar{x}_h = \sum_{h=1}^{L}W_h\bar{x}_h^{*}, \quad\text{and}\quad \sum_{h=1}^{L}W_h^{*}s_{hx}^2 = \sum_{h=1}^{L}W_hs_{hx}^{*2},$$
where the $W_h = N_h/N$ are known stratum weights.
Show that a calibrated estimator of the population mean In stratified double
sampling is given by
.[-stdlWh
vY ~ h*2(-1- - S
() *] =L..W 1 Jhy
2
h=1 nh Nh
where
nh
sly = (nh -ItI I,(Yhi - Yhf .
i=1
( d ) Show that the minimum variance of the stratified double sampling estimator
Yst (d), to the first order of approximation, is given by
v(Ys,(d))= I Wl~(_1
h=l
__I J s~y + (...!...nh __
~ mh N h
I J...!... Ie~i]'
»« nh i=1
where ehi = (Yhi - Yh)- PI(Xhi - Xh)- P2 tXhi - Xh)2 - s~; } denotes the estimate of the
v(yst(d))n = I W;2~(_1
h=1
__I Js~y + (...!...nh __
~ mh N h
I J-I-Ie~i]'
mh nh -I i=1
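Exercise 8.24. Under stratified random sampling, study the asymptotic properties
of the following estimators of the population mean, $\bar{Y}$, defined as
( a ) $$\bar{y}_{stSD} = \bar{y}_{st}\,\frac{\sum_{h=1}^{L}W_h\left(\bar{X}_h+C_{xh}\right)}{\sum_{h=1}^{L}W_h\left(\bar{x}_h+C_{xh}\right)}$$
( b ) $$\bar{y}_{stSK} = \bar{y}_{st}\,\frac{\sum_{h=1}^{L}W_h\left(\bar{X}_h+\beta_{2h}(x)\right)}{\sum_{h=1}^{L}W_h\left(\bar{x}_h+\beta_{2h}(x)\right)}$$
( c ) $$\bar{y}_{stUS1} = \bar{y}_{st}\,\frac{\sum_{h=1}^{L}W_h\left(\bar{X}_h\beta_{2h}(x)+C_{xh}\right)}{\sum_{h=1}^{L}W_h\left(\bar{x}_h\beta_{2h}(x)+C_{xh}\right)}$$
where $\bar{y}_{st} = \sum_{h=1}^{L}W_h\bar{y}_h$ has its usual meaning. Discuss the special cases when there is
only one stratum ($L = 1$) in the population.
Hint: Kadilar and Cingi (2003).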
Practical 8.1. A World Bank manager selects two units by SRSWOR sampling
from each stratum of population 5 given in the Appendix. Later on a statistician
collects the information on the production of the tobacco crop from the countries
selected in the sample. Assuming the total number of countries in each continent is known,
discuss the estimate of the average production of the tobacco crop in the world.
Derive the 95% confidence interval. Also find an estimate of the percentage gain in
efficiency (GE) due to stratification.
Practical 8.4. Owing to budget constraints the management of the World Bank has
suggested a condition of spending money in different continents while selecting a
sample. Keeping the instructions of the management in your mind, select a sample
of 40 countries from population 5 given in the Appendix using the method of
optimum allocation. Record the production of Tobacco crop from the selected
countries. Estimate the average production of the Tobacco crop in the world.
Estimate the variance under the method of Optimum allocation. Construct a 95%
confidence interval.
Given: C_1 = $0.5, C_2 = 2, C_3 = 3, C_4 = 5, C_5 = 7, C_6 = 1.5, C_7 = 10, C_8 = 5, C_9 = 5, and
C_10 = 3.
Practical 8.5. Find the minimum sample size from population 5 given in the
Appendix to obtain estimates of population mean with different levels of relative
standard deviations. Plot relative standard deviation versus sample size.
Given: C_1 = $0.5, C_2 = 2, C_3 = 3, C_4 = 5, C_5 = 7, C_6 = 1.5, C_7 = 10, C_8 = 5, C_9 = 5,
C_10 = 3 and $\bar{Y}$ = 1.5507.
Practical 8.6. Stratify the 50 states listed in population 1 given in the Appendix
using the real estate farm loans as an auxiliary variable. Modify the stratum
boundaries into six strata using cumulative square root method and cumulative cube
root method.
Practical 8.7. A team of doctors wish to estimate the average death rate of persons
living in the United States on the basis of year 2000 projections. The projected
191,443 persons are to be grouped into five strata on the basis of their age at the
time of death. The rough distribution of the death rate projected for the year 2000
in the United States has been listed in population 6 of the Appendix in 21 age
groups with a gap of four years. Apply the Neyman method of sample allocation for
selecting the overall sample and the cumulative square root method to form 4 strata.
Practical 8.8. Suppose there are eight big cities in a particular country. The
following table lists the number of persons and standard deviation of their income
in each city with different age groups. We wish to allocate a sample of 1000
persons to collect information for social survey . Suggest the number of persons to
be selected from each city with the help of Proportional, Neyman and Optimum
allocation.
Standard':'
Devi liH()l{
9500 76.44
25--34 29500 25--34 29500 45 .89
35--44 19500 35--44 32500 18.32
45--54 4500 45--54 22500 26.58
55--64 6500 55--64 20500 24.88
65+ 15500 65+ 19500 19.14
<24 4500 <24 4500 44.63
" 25--34 20500 25--34 27500 13.65
35--44 27500 35--44 41500 18.58
45--54 19500 45--54 26500 27.99
55--64 21500 55--64 16500 12.44
65+ 21500 65+ 23500 15.07
<24 6500 <24 5500 4.62
25--34 12500 25--34 30500 11.02
35--44 28500 35--44 33500 16.58
45--54 29500 45--54 21500 23.91
55--64 29500 55--64 14500 23.03
65+ 31500 65+ 14500 21.82
<24 3500 <24 3500 36.94
25--34 23500 25--34 20500 8.74
35--44 26500 35--44 13500 12.25
45--54 16500 45--54 9500 14.24
55--64 19500 55--64 10500 22.09
65+ 20500 65+ 16500 9.97
Which method would you prefer and why?
Given: Cost of processing one unit in the $i$th city is C_1 = $2, C_2 = $4, C_3 = $2.5,
C_4 = $3, C_5 = $4, C_6 = $4.3, C_7 = $2.8 and C_8 = $3.2.
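The three allocation rules named in this practical can be sketched as follows, using the standard forms $n_h \propto N_h$ (proportional), $n_h \propto N_hS_h$ (Neyman) and $n_h \propto N_hS_h/\sqrt{C_h}$ (optimum for a fixed total sample size). The function names and the two-stratum inputs are hypothetical and are not the city data of the table.

```python
import math

def proportional_allocation(n, N_h):
    """n_h proportional to the stratum size N_h."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

def neyman_allocation(n, N_h, S_h):
    """n_h proportional to N_h * S_h."""
    w = [Nh * Sh for Nh, Sh in zip(N_h, S_h)]
    total = sum(w)
    return [n * x / total for x in w]

def optimum_allocation(n, N_h, S_h, C_h):
    """n_h proportional to N_h * S_h / sqrt(C_h), for a fixed total sample size n."""
    w = [Nh * Sh / math.sqrt(Ch) for Nh, Sh, Ch in zip(N_h, S_h, C_h)]
    total = sum(w)
    return [n * x / total for x in w]

# Hypothetical two-stratum illustration
N_h, S_h, C_h, n = [100, 300], [3.0, 1.0], [4.0, 1.0], 40
prop = proportional_allocation(n, N_h)
ney = neyman_allocation(n, N_h, S_h)
opt = optimum_allocation(n, N_h, S_h, C_h)
print(prop, ney, opt)
```

In practice the resulting $n_h$ would be rounded to integers; the rounding convention is left to the reader.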
Practical 8.9. Consider a couple in your class consisting of a husband and wife.
Assume that the life of every couple consists of good and bad events. Ask both
of them to prepare separate lists of good and bad events in their lives. Estimate the
proportion of good events for each one of them. Also obtain a pooled estimate of
good events among the families, assuming 50% weight for each of them and that the
number of couples is infinitely large.
Practical 8.10. An estimate of the average death rate of persons living in the
United States has been found to be useful for making future strategies. The
projected 191443 persons are to be grouped into five strata on the basis of their age
at the time of death. The rough distribution of the death rate projected for the year
2000 in the United States has been listed in population 6 of the Appendix in 21 age
groups with a gap of four years. Apply the method of proportional allocation for
selecting the overall sample required for estimation purpose and apply the
cumulative cube root method to form 4 strata.
Practical 8.11. The estimation of the total production of the tobacco crop in the world is an
important issue for health departments. Select an SRSWOR sample of 45
countries from population 5 given in the Appendix and record the production
and area under the crop. Treat each continent as a different stratum and assume that the
number of countries in each continent is known. Post-stratify the countries selected in the
SRSWOR sample into different continents. Merge the post-strata if required using
total area under the crop as an auxiliary variable. Obtain a 95% confidence interval for
the total production.
Practical 8.12. The following map shows the rank of average temperature during
December 2001 in different states of the United States of America.
[Map legend: Record Coldest, Much Below Normal, Below Normal, Near Normal, Above Normal, Much Above Normal, Record Warmest.]
( 1 ) SRSWOR sampling
( a ) Count the number of states whose rank of temperature is shown in the map.
( b ) Compute the average rank temperature in the USA.
( c ) List the names of the states in alphabetic order A to Z (Rule: Use two letter
abbreviations, e.g., use NY for New York).
( d ) Select an SRSWOR sample of 15 states (Rule: Start with the first two columns
of the Pseudo-Random Number Table 1 given in the Appendix).
( e ) Construct a 95% confidence interval estimate of the average rank temperature
based on the SRSWOR sample information.
( f ) Find the population mean square $S_y^2$.
( g ) Find the variance of the estimator of the average rank of temperature based on
an SRSWOR sample of 15 units.
( h ) Divide the population into three strata such that
stratum 1 consists of all states having temperature Near Normal, stratum 2 consists
of all states having temperature Above Normal, and stratum 3 consists of all states
having temperature Much Above Normal or Record Warmest. Construct three
lists of the states falling in the three strata in alphabetical order A to Z. (Use
two letter abbreviations, e.g., NY for New York).
( i ) Select sub-samples of five units from each stratum using SRSWOR sampling.
(Rule: Always start with the first row and first column of the Pseudo-Random
Number Table 1 given in the Appendix, and make sure the states in each stratum
are in alphabetical order).
( j ) Construct a 95% confidence interval estimate of the average rank temperature
based on the stratified random sample of 15 units.
( k ) Find the population mean squares $S_{hy}^2$, $h = 1,2,3$, for each one of the three strata.
( l ) Find the variance of the estimator of the average rank of temperature based on
stratified sampling, when five units are selected from each stratum.
( m ) Find the relative efficiency of the stratified sampling over the SRSWOR
sampling.
( s ) Assuming that the population mean squares ($S_{hy}^2$, $h = 1,2,3$) for the three strata
are known, select a stratified random sample using Neyman allocation. (Rule:
Always start with the first row and first column of the Pseudo-Random Number
Table 1 given in the Appendix, and make sure the states in each stratum are in
alphabetical order).
( t ) Construct the 95% confidence interval estimate of the average rank of
temperature.
( u ) Find the variance of the estimator of the average rank of temperature using
Neyman allocation.
( v ) Find the relative efficiency of the Neyman allocation with respect to SRSWOR
sampling.
( w ) Find the relative efficiency of the Neyman allocation with respect to equal
allocation sampling.
( x ) Find the relative efficiency of the Neyman allocation with respect to the
proportional allocation sampling.
( 6 ) Post-stratified sampling
( ff ) Collect information for each state included in the SRSWOR sample of 15
units from the whole population as in part ( 1 ) and note their status as: Near
Normal, Above Normal, etc. (Rule: Start with the first two columns of the Pseudo-Random
Number Table 1 given in the Appendix, and make sure the states are listed
in alphabetical order A to Z).
( gg ) Post-stratify the sample of 15 units into three strata, viz., Stratum 1 with Near
Normal, Stratum 2 with Above Normal, and Stratum 3 with Much Above Normal
or Record Warmest temperatures. Is there any empty post-stratum? If so,
suggest a possible solution to the issue.
( hh ) Estimate the average rank temperature using post-stratification, and deduce
a 95% CI estimate.
( ii ) Find the relative efficiency of the post-stratified sampling with respect to
SRSWOR sampling.
Experience has shown that the statewide rank of temperature during Dec. 2001 is
correlated with the rank of statewide precipitation during Dec. 2000, and
complete information on the precipitation is known. Collect the statewise
information on the precipitation using the following map:
[Map legend: Record Driest, Much Below Normal, Below Normal, Near Normal, Above Normal, Much Above Normal, Record Wettest.]
( jj ) Collect information on precipitation for each state included in the SRSWOR
sample of 15 units from the whole population as in part ( 1 ). Also calculate the
overall average precipitation in the USA based on information from the 48 states
listed above.
( kk ) Obtain the ratio estimate of average rank temperature.
( ll ) Construct a 95% confidence interval estimate of average rank temperature
using the estimate of the mean square error of the ratio estimator based on an SRSWOR
sample of 15 units. Interpret it.
( mm ) Find the mean square error of the ratio estimator.
( nn ) Find the relative efficiency of the ratio estimator with respect to the sample
mean estimator.
Use the same stratification as used in the previous section and collect information
on the precipitation from each stratum. Assume that the information of precipitation
in each stratum is known as auxiliary variable.
( oo ) Construct a 95% confidence interval estimate of the average rank temperature
using the separate ratio estimate under equal allocation and under proportional allocation.
Compare the two intervals.
( pp ) Find the relative efficiency of the separate ratio estimator with respect to
stratified sampling estimator under equal and proportional allocation.
( bbb ) Select a sample of 15 units using Neyman allocation for separate ratio
estimate and construct a 95% confidence interval estimate, and find its relative
efficiency with respect to usual ratio estimator without stratification.
( ccc ) Select a sample of 15 units using Neyman allocation for separate regression
estimate and construct a 95% confidence interval estimate, and also find its relative
efficiency with respect to usual regression estimator without stratification.
( ddd ) Select a sample of 15 units using Neyman allocation for combined ratio
estimate and construct a 95% confidence interval estimate, and find its relative
efficiency with respect to usual ratio estimator without stratification.
( eee ) Select a sample of 15 units using Neyman allocation for combined regression
estimate and construct a 95% confidence interval estimate, and also find its relative
efficiency with respect to usual regression estimator without stratification.
( fff) Select a sample of 15 units using optimum allocation with same costs defined
earlier for separate ratio estimate and construct a 95% confidence interval estimate,
and find its relative efficiency with respect to usual ratio estimator without
stratification.
( ggg ) Select a sample of 15 units using optimum allocation with same costs for
separate regression estimate and construct a 95% confidence interval estimate, and
also find its relative efficiency with respect to usual regression estimator without
stratification.
( hhh ) Select a sample of 15 units using optimum allocation with same costs for
combined ratio estimate and construct a 95% confidence interval estimate, and find
its relative efficiency with respect to usual ratio estimator without stratification.
( iii ) Select a sample of 15 units using optimum allocation for combined regression
estimate and construct a 95% confidence interval estimate, and also find its relative
efficiency with respect to usual regression estimator without stratification.
( jjj ) Which estimator is best among all the above estimators based on
stratification? Give your views.
( kkk ) Apply the separate ratio estimator for the post-stratification of the sample of 15
units in ( 6 ) to construct a 95% confidence interval estimate, and find its relative
efficiency with respect to the post-stratified estimator without using auxiliary
information.
( lll ) Apply the separate regression estimator for the post-stratification of the sample of
15 units in ( 6 ) to construct a 95% confidence interval estimate, and find its relative
efficiency with respect to the post-stratified estimator without using auxiliary
information.
( mmm ) Apply the combined ratio estimator for the post-stratification of the sample of
15 units in ( 6 ) to construct a 95% confidence interval estimate, and find its relative
efficiency with respect to the post-stratified estimator without using auxiliary
information.
( nnn ) Apply the combined regression estimator for the post-stratification of the sample
of 15 units in ( 6 ) to construct a 95% confidence interval estimate, and find its
relative efficiency with respect to the post-stratified estimator without using auxiliary
information.
( ooo ) Which estimator is best among all the above estimators based on
post-stratification? Give your views.
( ppp ) Derive the calibrated weights using the known precipitation rank and deduce the
estimates of average rank temperature based on them using combined ratio,
combined GREG and combined linear regression estimators.
( f) Using full information from the description of the population, find the relative
efficiency of separate regression estimator with respect to combined regression
estimator .
St. No.   | 1 | 2 | 3 | 4       | 5       | 6 | 7        | 8         | 9         | 10
Nh        | 6 | 6 | 8 | 10      | 12      | 4 | 30       | 17        | 10        | 3
nh        | 3 | 3 | 3 | 3       | 4       | 2 | 11       | 6         | 3         | 2
PRN Col.  | 1 | 2 | 3 | 4 and 5 | 6 and 7 | 8 | 9 and 10 | 11 and 12 | 13 and 14 | 15
Practical 8.14. Suppose you are a circus statistician, and in the circus there are
three types of elephants, viz. Niko, Sambo, and Jumbo, which are light, medium,
and heavy in weight, respectively. The following table lists the number of elephants in each
category along with their average weights (kg) and population standard deviations.
150 100
3500 5000
400 200
( d ) If we select an SRSWOR sample of n = 40 elephants from the whole circus,
then what will be the variance of the estimator of the average weight of all
elephants in the circus?
( e ) How many elephants will you select from these strata using proportional
allocation to get a sample of n = 40 units?
( f) What will be variance of the estimator of the population mean under
proportional allocation?
( g ) What will be the relative efficiency of proportional allocation with respect to
SRSWOR sampling?
( h ) How many elephants will you select from these strata using the Neyman
allocation to get a sample of n = 40 units?
( i ) What will be the variance of the estimator of the population mean under the
Neyman allocation?
( j ) What will be the relative efficiency of the Neyman allocation with respect to
SRSWOR sampling?
( k ) What will be the relative efficiency of the Neyman allocation with respect to
proportional allocation sampling?
( l ) Let C1 = $4, C2 = $9, and C3 = $25 be the cost of weighing an elephant in the
first, second, and third strata. Find the optimum allocation of a sample of
n = 40 units over three strata.
( m ) What will be the variance of the estimator of the population mean under optimum
allocation?
( n ) What will be the relative efficiency of optimum allocation with respect to
SRSWOR sampling?
( o ) What will be the relative efficiency of optimum allocation with respect to
proportional allocation?
( p ) What will be the relative efficiency of optimum allocation with respect to the
Neyman allocation?
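The three allocation rules asked for in this practical can be sketched in a few lines of Python. This is only an illustrative sketch: the stratum sizes and standard deviations below are hypothetical and are not the values of the practical's table, the function names are our own, and the optimum rule shown is the usual fixed-sample-size form in which n_h is proportional to N_h S_h / sqrt(C_h). Only the costs $4, $9 and $25 are taken from part ( l ).

```python
from math import sqrt

def proportional_allocation(n, sizes):
    """n_h = n * N_h / N."""
    total = sum(sizes)
    return [n * N_h / total for N_h in sizes]

def neyman_allocation(n, sizes, sds):
    """n_h = n * N_h * S_h / sum_h (N_h * S_h)."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

def optimum_allocation(n, sizes, sds, costs):
    """n_h = n * (N_h * S_h / sqrt(C_h)) / sum_h (N_h * S_h / sqrt(C_h))."""
    weights = [N_h * S_h / sqrt(C_h) for N_h, S_h, C_h in zip(sizes, sds, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata (assumed values, not those of the practical)
sizes = [100, 60, 40]        # N_h
sds = [10.0, 20.0, 30.0]     # S_h
costs = [4.0, 9.0, 25.0]     # C_h in dollars, as in part ( l )
n = 40

prop = proportional_allocation(n, sizes)
neyman = neyman_allocation(n, sizes, sds)
optimum = optimum_allocation(n, sizes, sds, costs)
```

The functions return real-valued allocations; in practice they would be rounded to integers that still add up to n.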
Practical 8.15. In a circus there are three types of elephants, viz., Light, Medium,
and Heavy in weight. The following table lists the number of elephants in each
category along with their average weights (kg) and average weight of food (kg/day)
along with other information.
150 100
3500 5000
100 150 250
500 400 200
50 30 20
17500 8400 2800
$$S_x^2 = \frac{\sum_{h=1}^{3}(N_h-1)S_{hx}^2 + \sum_{h=1}^{3}N_h(\bar{X}_h-\bar{X})^2}{N-1},$$
and
( n ) What will be the variance of the separate ratio estimator of the average weight of
the elephants under optimum allocation?
( o ) What will be the relative efficiency of the separate ratio estimator under optimum
allocation with respect to the usual ratio estimator under SRSWOR sampling?
(p) What will be relative efficiency of the separate ratio estimator under optimum
allocation with respect to separate ratio estimator under proportional allocation?
( q) What will be relative efficiency of the separate ratio estimator under optimum
allocation with respect to the separate ratio estimator under the Neyman allocation?
( r ) What will be mean square error of the combined ratio estimator of the
population mean under proportional allocation?
( s ) What will be the relative efficiency of the combined ratio estimator under
proportional allocation with respect to the usual ratio estimator under SRSWOR
sampling?
( t ) What will be the mean square error of the combined ratio estimator of the
average weight of elephants under the Neyman allocation?
( u ) What is the relative efficiency of the combined ratio estimator under the
Neyman allocation with respect to the usual ratio estimator under SRSWOR
sampling?
( v ) What is the relative efficiency of the combined ratio estimator under the
Neyman allocation with respect to the combined ratio estimator under proportional
allocation?
( w) What will be variance of the combined ratio estimator of the average weight of
the elephants under optimum allocation?
( x ) What will be the relative efficiency of the combined ratio estimator under
optimum allocation with respect to the usual ratio estimator under SRSWOR
sampling?
( y ) What will be the relative efficiency of the combined ratio estimator under
optimum allocation with respect to combined ratio estimator under proportional
allocation?
( z ) What will be the relative efficiency of the combined ratio estimator under optimum
allocation with respect to the combined ratio estimator under the Neyman
allocation?
( aa ) If we select an SRSWOR sample of n = 40 elephants from the whole circus,
then what will be the mean square error of the regression estimator of the average
weight of all elephants in the circus?
( bb ) What will be the mean square error of the separate regression estimator of the
population mean under proportional allocation?
( cc ) What will be the relative efficiency of the separate regression estimator under
proportional allocation with respect to the usual regression estimator under
SRSWOR sampling ?
( dd ) What will be the mean square error of the separate regression estimator of the
average weight of elephants under Neyman allocation?
( ee ) What will be the relative efficiency of the separate regression estimator under
the Neyman allocation with respect to the usual regression estimator under
SRSWOR sampling ?
( ff) What is the relative efficiency of the separate regression estimator under the
Neyman allocation with respect to the separate regression estimator for proportional
allocation?
( gg ) Let C1 = $4, C2 = $9, and C3 = $25 be the cost of weighing an elephant in
the first, second, and third strata. Find the optimum allocation of a sample of
n = 40 units over three strata.
( hh ) What will be the variance of the separate regression estimator of the average
weight of the elephants under optimum allocation?
( ii ) What will be the relative efficiency of the separate regression estimator under
optimum allocation with respect to the usual regression estimator under SRSWOR
sampling?
( jj ) What will be the relative efficiency of the separate regression estimator under
optimum allocation with respect to the separate regression estimator under
proportional allocation?
( kk ) What will be the relative efficiency of the separate regression estimator under
optimum allocation with respect to the separate regression estimator for Neyman
allocation?
( ll ) What will be the mean square error of the combined regression estimator of the
average weight of elephants under proportional allocation?
Practical 8.16. In a circus there are three types of elephants, viz., Light, Medium,
and Heavy in weight and some information about them is listed below:
100
8
5200
244.8
250
490 410 220
45 25 30
17200 8000 5500
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$$
and
$$s_{xy} = \frac{\sum_{h=1}^{L}(n_h-1)s_{hxy} + \sum_{h=1}^{L}n_h(\bar{x}_h-\bar{x})(\bar{y}_h-\bar{y})}{n-1}.$$
Practical 8.17. Michael believes that when a soul is born it automatically falls into
one of the religious categories, say, Sikh, Hindu, Muslim, Christian etc., existing in
the world. Is it an example of stratification or post-stratification? Comment.
Practical 8.19. Divide your class into two groups based on gender. Decide on
a sample of reasonable size at the suggestion of the class instructor. Select
a sample of the required size using proportional allocation, and collect information
on the GPA of the students selected in the sample from both strata.
( a ) Estimate the average GPA of the class using the usual formula in stratified
sampling. Construct the 95% confidence interval estimate.
( b) Collect information about the number of classes attended by the students from
the register of the instructor if he/she permits. Use this information to improve your
estimates in ( a ) and comment.
9. NON-OVERLAPPING, OVERLAPPING, POST, AND
ADAPTIVE CLUSTER SAMPLING
In survey sampling the basic assumption is that the population consists of a finite
number of distinct and identifiable units. A group of such units is called a cluster.
If, instead of randomly selecting a unit for sample, a group of units is selected as a
single unit in the sample, it is called cluster sampling. If the entire area containing
the population under study is divided into smaller segments, and if each unit of the
population belongs to only one segment, the procedure is called area sampling or
non-overlapping cluster sampling. If one or a few units appear in more than one
segment or cluster, then such a procedure is called overlapping cluster sampling.
The main purpose of cluster sampling is to divide the population into small groups
with each group serving as a sample unit. Clusters are generally made up of
neighbouring elements; therefore the elements within a cluster tend to be
homogeneous. However, at some stage in the research we become interested in
heterogeneous clusters rather than homogeneous ones. More broadly, the concept of
forming strata in the previous chapter was to form homogeneous groups, whereas in
this chapter the concept of forming clusters will be to form groups of a
heterogeneous nature. After dividing the population into clusters, the sample of
clusters can be selected with either equal or unequal probability. The concept of
unequal probability may be based on the size of the cluster; that is, the larger the
cluster, the larger the probability of its being selected in the sample. All the units in
the selected cluster will be enumerated. As a simple rule, the number of units in a
cluster should be small and the number of clusters should be large. The main
advantage of cluster sampling is that it is cheaper, since the collection of data for
neighbouring units is easier and faster. It is also useful when the frame for selecting
the sample is not available at the unit level. For example, a list of persons may not
be available, whereas a list at a dwelling level may be available. For a given sample
size, cluster sampling is less efficient than simple random sampling. However, in
most situations the loss in efficiency can be balanced by the reduction in cost. Any
sampling procedure, for example simple random sampling, stratified sampling, or
systematic sampling, may be applied to cluster sampling by using the clusters
themselves as sampling units. Smith (1938) and Hansen and Hurwitz (1942) have
discussed the efficiency of cluster sampling.
[Figure: a population divided into Cluster 1, Cluster 2, ..., Cluster N; the clusters are heterogeneous.]
Fig. 9.0.1. Pictorial representation of cluster sampling.
Well-known examples of clusters and their units are given below:
Cluster: School, Dwelling, Day
Units: Students, Persons, Hours
There are two possibilities: ( a ) clusters of equal size or (b) clusters of different
sizes. It is obvious that if every cluster has the same number of units, then the
chance of selection of each cluster in the sample will be the same. On the other
hand, if the number of units in the clusters is different and known, then different
probabilities proportional to the number of units in the clusters can be assigned
before taking the sample. We will discuss both these situations in detail below.
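As a rough illustration of the second possibility, the sketch below assigns each cluster a selection probability proportional to its number of units and draws clusters with replacement. The cluster sizes are hypothetical, the function names are our own, and drawing with replacement through Python's random.choices is a simplifying assumption made only for this sketch.

```python
import random

def pps_probabilities(cluster_sizes):
    """Selection probability of each cluster, proportional to its size."""
    total = sum(cluster_sizes)
    return [m / total for m in cluster_sizes]

def select_clusters_pps_wr(cluster_sizes, n, rng):
    """Draw n cluster indices with replacement, with probability proportional to size."""
    indices = list(range(len(cluster_sizes)))
    return rng.choices(indices, weights=cluster_sizes, k=n)

# Hypothetical numbers of units in five clusters
cluster_sizes = [12, 30, 8, 25, 25]
probs = pps_probabilities(cluster_sizes)

rng = random.Random(2024)   # fixed seed so that the draw is reproducible
sample = select_clusters_pps_wr(cluster_sizes, 3, rng)
```

When all clusters have the same size, the probabilities reduce to 1/N for every cluster, which is the equal-probability case.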
Let N denote the number of clusters in the population and M the number of units
in each cluster. Evidently, the total number of units in the population is NM. We
select a sample of n clusters and hence we select nM units from the population. Let
us define
$Y_{ij}$ = the value of the study variable $y$ for the $j$th population element in the $i$th cluster,
$\bar{\bar{Y}} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij} = \frac{Y_{\cdot\cdot}}{NM}$, the population mean which we want to estimate from the sample information,
$\bar{Y} = Y_{\cdot\cdot}/N$, population mean per cluster,
$\bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{M} = \frac{1}{M}\sum_{j=1}^{M} Y_{ij}$, mean of $M$ population elements in the $i$th cluster,
$y_{\cdot\cdot} = \sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}$, the sample total of the study variable $y$,
$\bar{y} = y_{\cdot\cdot}/n$, sample mean per cluster,
$\bar{\bar{y}} = \frac{1}{nM}\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij} = \frac{\bar{y}}{M}$, sample mean per unit,
$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2$, the variance between the means of different clusters,
$S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_{i\cdot})^2$, the variance within the $i$th cluster,
$\bar{S}_w^2 = \frac{1}{N}\sum_{i=1}^{N} S_i^2 = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_{i\cdot})^2$, mean value of within cluster variances,
$S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2$, the total population variance,
and
$\rho = \frac{\sum_{i=1}^{N}\sum_{j \neq j'=1}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ij'}-\bar{\bar{Y}})}{NM(M-1)S^2}$, the intra-cluster correlation coefficient.
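The quantities defined above can be computed directly for a small population. The following is a minimal sketch, assuming the population is stored as a list of N clusters of M values each; the function name, the dictionary keys and the 2 x 2 toy population are hypothetical and serve only to illustrate the definitions.

```python
def cluster_summaries(Y):
    """Population summaries for N clusters of equal size M (Y is a list of lists)."""
    N, M = len(Y), len(Y[0])
    grand_mean = sum(sum(row) for row in Y) / (N * M)
    cluster_means = [sum(row) / M for row in Y]
    # Variance between cluster means
    Sb2 = sum((m - grand_mean) ** 2 for m in cluster_means) / (N - 1)
    # Mean of the within-cluster variances
    Sw2 = sum((y - m) ** 2
              for row, m in zip(Y, cluster_means) for y in row) / (N * (M - 1))
    # Total population variance
    S2 = sum((y - grand_mean) ** 2 for row in Y for y in row) / (N * M - 1)
    # Sum of cross-products over all ordered pairs j != j' within each cluster
    cross = sum((row[j] - grand_mean) * (row[k] - grand_mean)
                for row in Y for j in range(M) for k in range(M) if j != k)
    rho = cross / (N * M * (M - 1) * S2)
    return {"N": N, "M": M, "Sb2": Sb2, "Sw2": Sw2, "S2": S2,
            "cross": cross, "rho": rho}

# Hypothetical toy population: N = 2 clusters of M = 2 units each
toy = cluster_summaries([[1, 3], [2, 6]])
```

For this toy population the identity M^2 (N-1) S_b^2 = (NM-1) S^2 + (sum of cross-products) holds exactly, which is the decomposition used when the relative efficiency is expressed in terms of the intra-cluster correlation.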
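Proof. We have
$$V(\bar{\bar{y}}) = V\left[\frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i\cdot}\right] = \left(\frac{1}{n}-\frac{1}{N}\right)\frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2 = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2.$$
Hence the theorem.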
Example 9.1.1. Using the map of the United States of America, construct 10
clusters from the 50 states, each cluster consisting of 5 states.
Solution. There are several possible ways to construct 10 clusters based on their
locations. The number of possibilities may be decreased if we have some additional
information, for example, the distance to be travelled between states to collect data. In
such a situation, we may decide to minimise the distance to be travelled by the data
collectors.
Example 9.1.2. Select 6 clusters from Table 9.1.1 using SRSWOR sampling.
Record the values of the real estate farm loans for the selected states from
population 1 given in the Appendix. Estimate the average real estate farm loans in
the United States using cluster sampling. Estimate the variance of the estimator
used for estimating the average real estate farm loans. Construct a 95% confidence
interval.
Solution. Starting with the first two columns of the Pseudo-Random Number
(PRN) Table 1 given in the Appendix, we select the required 6 distinct random
numbers as 01, 04, 05, 03, 06 and 07. Thus the following clusters are included in an
SRSWOR sample of 6 units.
Note that $M = 5$ and $n = 6$. Thus an estimate of the average real estate farm loans
in the United States is given by
$$\bar{\bar{y}} = \frac{1}{nM}\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij} = \frac{1}{6\times 5}\times 23017.97 = 767.2656.$$
Now we have
Cluster | $\bar{y}_{i\cdot}$ | $(\bar{y}_{i\cdot}-\bar{\bar{y}})^2$
2 | 445.062 | 103815.16
3 | 820.193 | 2801.31
4 | 1388.066 | 385393.14
5 | 1058.798 | 84991.14
6 | 577.830 | 35885.85
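Thus we have
$$s_b^2 = \frac{\sum_{i=1}^{n}(\bar{y}_{i\cdot}-\bar{\bar{y}})^2}{n-1} = \frac{818659.16}{6-1} = 163731.832.$$
Thus an estimator of $V(\bar{\bar{y}})$ is given by
$$v(\bar{\bar{y}}) = \left(\frac{1}{n}-\frac{1}{N}\right)s_b^2 = \left(\frac{1}{6}-\frac{1}{10}\right)\times 163731.832 = 10915.455.$$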
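The arithmetic of this example can be checked with a short Python sketch. The sample total 23017.97 and the sum of squared deviations 818659.16 are taken from the example; the variable names are our own, and the critical value 2.064 is the standard t-table value for 24 degrees of freedom at the two-sided 5% level, entered here by hand as an assumption of the sketch.

```python
from math import sqrt

n, N, M = 6, 10, 5                 # sampled clusters, population clusters, cluster size
sample_total = 23017.97            # sum of y_ij over the 6 x 5 sampled states (from the example)
sum_sq_dev = 818659.16             # sum of squared deviations of cluster means (from the example)

mean_per_unit = sample_total / (n * M)       # estimate of the population mean
sb2 = sum_sq_dev / (n - 1)                   # between-cluster sample variance
v_mean = (1 / n - 1 / N) * sb2               # estimated variance of the estimator

t_value = 2.064                    # assumed t-table value for df = n(M - 1) = 24
half_width = t_value * sqrt(v_mean)
interval = (mean_per_unit - half_width, mean_per_unit + half_width)
```

The tuple interval then holds the lower and upper limits implied by these inputs.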
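A $(1-\alpha)100\%$ confidence interval for the population mean is given by
$$\bar{\bar{y}} \pm t_{\alpha/2}\left(df = n(M-1)\right)\sqrt{v(\bar{\bar{y}})}.$$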
Thus 95% confidence interval for the average real estate farm loans in the United
States is given by
Theorem 9.1.4. The relative efficiency of cluster sampling with respect to SRS is
$$RE = S^2/(M S_b^2). \quad (9.1.4)$$
Proof. We know that the variance of the estimator $\bar{\bar{y}}$ under cluster sampling is
$$V(\bar{\bar{y}}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_b^2. \quad (9.1.5)$$
Example 9.1.3. Suppose the United States has been divided into 10 neighbouring
clusters each consisting of 5 states as shown below.
We are interested in estimating the average real estate farm loans in the United
States. Is there any gain in efficiency due to clustering as opposed to simple random
sampling?
Solution. We have
[Table: cluster number; values of the real estate farm loans in the cluster; cluster mean $\bar{Y}_{i\cdot}$; squared deviation $(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2$.]
The relative efficiency of the estimator in cluster sampling with respect to the one in
SRSWOR sampling is given by
$$RE = S^2/(M S_b^2) = \frac{342021.5}{5\times 183773.444} = 0.3722.$$
In this case, cluster sampling is less efficient than simple random sampling. Thus
there is no gain in efficiency through cluster sampling in this example. Let us try to
find the reason.
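$$\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})\right]^2 = \sum_{i=1}^{N}\left[\sum_{j=1}^{M}Y_{ij}-M\bar{\bar{Y}}\right]^2 = \sum_{i=1}^{N}(Y_{i\cdot}-M\bar{\bar{Y}})^2 = \sum_{i=1}^{N}(M\bar{Y}_{i\cdot}-M\bar{\bar{Y}})^2$$
$$= M^2\sum_{i=1}^{N}(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2 = M^2(N-1)S_b^2. \quad (9.1.9)$$
Also we have
$$\sum_{i=1}^{N}\left[\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})\right]^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2 + \sum_{i=1}^{N}\sum_{j \neq j'=1}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ij'}-\bar{\bar{Y}}). \quad (9.1.10)$$
Now
$$S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2 \quad \text{and} \quad \rho = \frac{\sum_{i=1}^{N}\sum_{j \neq j'=1}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ij'}-\bar{\bar{Y}})}{NM(M-1)S^2}$$
which implies that
$$RE = \frac{S^2}{M S_b^2} = \frac{M(N-1)}{(NM-1)+NM(M-1)\rho} \approx \frac{1}{1+(M-1)\rho} = [1+(M-1)\rho]^{-1} \quad (9.1.12)$$
where the approximation holds for a large number of clusters $N$. This proves the first part of the theorem.
To prove the second part we have
$$RE > 1$$
which implies that
$$[1+(M-1)\rho]^{-1} > 1, \quad \text{or} \quad 1+(M-1)\rho < 1 \quad (9.1.13)$$
which is possible only if
$$\rho < 0.$$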
Remark 9.1.1. Cluster sampling is more efficient than SRS if the intraclass correlation
coefficient $\rho < 0$.
Remark 9.1.2. If $\rho = 0$, then cluster sampling and SRS are equally efficient.
Remark 9.1.3. In practice units which are near one another are more similar than
units which are apart, therefore p is positive and hence in general the efficiency of
cluster sampling is less than that of SRS. We will observe later on that cluster
sampling is more efficient than SRS for the fixed cost of a given survey.
Example 9.1.4. In Example 9.1.3, find the value of the intraclass correlation
coefficient. Use it to find the relative efficiency of cluster sampling over simple
random sampling.
Solution. Using information from Example 9.1.3 we have
Cross-products $(Y_{ij}-\bar{\bar{Y}})(Y_{ij'}-\bar{\bar{Y}})$, $j < j'$, within each cluster:
Cluster | Cross-products
1 | -273572.00, 221008.50, 131676.10, 275711.40, …
2 | -394645.00, -326915.00, -435644.00, -405565.00, 207758.80, 276857.90, 257742.30, 229342.70, 213507.80, -46239.30
3 | 226565.60, 119459.70, 70620.03, -390056.00, 108902.70, 64379.17, -355586.00, 33944.76, -187487.00, -110835.00
4 | 39293.22, 342780.30, -189188.00, 244235.00, 28019.34, -15464.50, 19964.11, -134907.00, 174159.90, -96122.50
5 | 1814554.00, 818718.50, -108914.00, 690670.70, 1416092.00, -188383.00, 1194614.00, -84997.50, 539004.50, -71703.90
6 | 1036107.00, -366183.00, 496768.10, 316280.00, -152828.00, 207328.40, 132000.90, -73274.40, -46652.00, 63288.67
7 | -10483.10, 27487.96, 19348.63, -33461.80, -56243.00, -39589.20, 68466.01, 103807.30, -179526.00, -126367.00
8 | -1061.86, 989.19, 507.11, -182.45, -223367.00, -114510.00, 41199.23, 106673.10, -38379.50, -19675.40
9 | 176083.60, 194376.30, 193383.90, 193829.30, 273424.80, 272028.80, 272655.40, 300288.90, 300980.60, 299443.90
10 | 303663.90, 285537.30, 283904.60, 230283.40, 282691.80, 281075.40, 227988.60, 264297.20, 214379.20, 213153.40
Sum of the cross-products over $j < j'$ in each cluster:
Cluster | Sum
1 | 173759.80
2 | -423798.80
3 | -420092.04
4 | 412769.87
5 | 6019655.30
6 | 1612835.67
7 | -226560.20
8 | -247807.58
9 | 2476495.50
10 | 2586974.80
The sum over all pairs $j \neq j'$ is twice the total of these sums, that is, $2 \times 11964232.32 = 23928464.6$. Thus we have
$$\rho = \frac{23928464.6}{10\times 5\times(5-1)\times 342021.5} = 0.349809.$$
Because the value of the intraclass correlation coefficient is positive,
cluster sampling is less efficient than simple random sampling in this particular situation.
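The computation of the intraclass correlation and the resulting relative efficiency can be sketched as follows. The inputs are the figures quoted in Examples 9.1.3 and 9.1.4; the variable names are our own. Both the finite-N expression and its large-N approximation from (9.1.12) are evaluated; because of rounding in the published figures and the approximation involved, the values need not coincide exactly with the direct ratio reported in Example 9.1.3.

```python
N, M = 10, 5                      # number of clusters and cluster size
S2 = 342021.5                     # total population variance (Example 9.1.3)
cross_sum = 23928464.6            # sum of cross-products over all j != j' (Example 9.1.4)

# Intraclass correlation coefficient from its definition
rho = cross_sum / (N * M * (M - 1) * S2)

# Relative efficiency of cluster sampling with respect to SRS
re_finite = M * (N - 1) / ((N * M - 1) + N * M * (M - 1) * rho)   # finite-N expression
re_large_n = 1 / (1 + (M - 1) * rho)                              # large-N approximation
```

A positive rho gives a relative efficiency below one under either expression, in line with Remark 9.1.3.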
(9 .1.14)
Thus we have the following graph which shows that the increase in cluster size $M$
decreases the efficiency of cluster sampling provided that $(1-g) > 0$.
[Figure: curves for g = 0.1, 0.3, 0.5, 0.7 and 0.9 plotted against cluster size.]
Theorem 9.1.6. An estimator of relative efficiency from the sample information for
a large number of clusters is given by
$$Est(RE) = \frac{1}{M}\left\{1+\left(1-M^{-1}\right)\frac{\bar{s}_w^2}{s_b^2}\right\}. \quad (9.1.17)$$
Proof. In cluster analysis, there are three sources of variation, namely
( a ) Total variation,
( b ) Between cluster variation,
and
( c ) Within cluster variation.
Thus cluster analysis may be expressed in an Analysis of Variance (ANOVA) table
form as below:
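$$SST = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2 = (NM-1)S^2.$$
The corrected between population sum of squares (SSB) due to cluster totals is
$$SSB = \sum_{i=1}^{N}\frac{Y_{i\cdot}^2}{M} - CF = \sum_{i=1}^{N}\frac{Y_{i\cdot}^2}{M} - \frac{(Y_{\cdot\cdot})^2}{NM} = M\sum_{i=1}^{N}\bar{Y}_{i\cdot}^2 - NM(\bar{\bar{Y}})^2 = M\sum_{i=1}^{N}(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2 = M(N-1)S_b^2.$$
The corrected within population sum of squares (SSW) due to cluster totals is
$$SSW = \sum_{i=1}^{N}\sum_{j=1}^{M}Y_{ij}^2 - \sum_{i=1}^{N}\frac{Y_{i\cdot}^2}{M} = \sum_{i=1}^{N}\sum_{j=1}^{M}Y_{ij}^2 - M\sum_{i=1}^{N}\bar{Y}_{i\cdot}^2 = \sum_{i=1}^{N}\left[\sum_{j=1}^{M}Y_{ij}^2 - M\bar{Y}_{i\cdot}^2\right] = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_{i\cdot})^2$$
$$= \sum_{i=1}^{N}(M-1)\frac{1}{M-1}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_{i\cdot})^2 = (M-1)\sum_{i=1}^{N}S_i^2 = N(M-1)\frac{1}{N}\sum_{i=1}^{N}S_i^2 = N(M-1)\bar{S}_w^2.$$
Source of variation | df | Sum of squares | Mean square | F
Between clusters | $N-1$ | $SSB = M\sum_{i=1}^{N}(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2$ | $MSB = \frac{M}{N-1}\sum_{i=1}^{N}(\bar{Y}_{i\cdot}-\bar{\bar{Y}})^2 = M S_b^2$ | $F = MSB/MSW$
Within clusters | $N(M-1)$ | $SSW = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_{i\cdot})^2$ | $MSW = \frac{1}{N(M-1)}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_{i\cdot})^2 = \bar{S}_w^2$ |
Total | $NM-1$ | $SST = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2$ | $MST = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2 = S^2$ |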
Note that $0 \le SSW/SST \le 1$, therefore the value of the intraclass correlation coefficient lies
in $-\frac{1}{M-1} \le \rho \le 1$. Again note the conditions of efficiency in terms of the intraclass
correlation coefficient. Further note that the intra-class correlation coefficient can
be easily defined only for clusters of equal size. Similarly the sample analogue of
the population looks as given below:
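Source of variation | df | Sum of squares | Mean square | F
Between clusters | $n-1$ | $ssb = M\sum_{i=1}^{n}(\bar{y}_{i\cdot}-\bar{\bar{y}})^2$ | $msb = \frac{M}{n-1}\sum_{i=1}^{n}(\bar{y}_{i\cdot}-\bar{\bar{y}})^2 = M s_b^2$ | $F = msb/msw$
Within clusters | $n(M-1)$ | $ssw = \sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{y}_{i\cdot})^2$ | $msw = \frac{1}{n(M-1)}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{y}_{i\cdot})^2 = \bar{s}_w^2$ |
Total | $nM-1$ | $sst = \sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{\bar{y}})^2$ | $mst = \frac{1}{nM-1}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{\bar{y}})^2 = s^2$ |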
Thus if the value of the sample F ratio is large, then we can guess that cluster
sampling may not be efficient. Put another way, for cluster sampling to be more efficient
than SRSWOR sampling, we need $s_b^2 < s^2/M$, in line with (9.1.4). An estimate of the
intra-class (or intra-cluster) correlation coefficient $\rho$ can be obtained from the sample
ANOVA method as:
$$\hat{\rho} = 1 - \left(\frac{M}{M-1}\right)\frac{ssw}{sst}.$$
Note that $0 \le ssw/sst \le 1$, therefore the estimate of the intraclass correlation coefficient lies
in $-\frac{1}{M-1} \le \hat{\rho} \le 1$.
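We know that $s_b^2$ and $\bar{s}_w^2$ are unbiased estimators of $S_b^2$ and $\bar{S}_w^2$, respectively. But
$\frac{1}{nM-1}\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{\bar{y}})^2$ is not an unbiased estimator of
$\frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2$. Let us try to put
$\frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2$ in terms of $S_b^2$ and $\bar{S}_w^2$. From the ANOVA for the
population we have
$$(NM-1)S^2 = M(N-1)S_b^2 + N(M-1)\bar{S}_w^2.$$
The formation of ANOVA helps us to find the multipliers of $S_b^2$ and $\bar{S}_w^2$, which, in
fact, are $M(N-1)$ and $N(M-1)$, respectively. Evidently an estimator of the relative
efficiency RE is given by
$$Est(RE) = \frac{MN\left(1-\frac{1}{N}\right)s_b^2 + MN\left(1-\frac{1}{M}\right)\bar{s}_w^2}{M^2N\left(1-\frac{1}{NM}\right)s_b^2}.$$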
Example 9.1.5. Mr. Stephen Horn and Mr. Ken Brewer were asked to construct
three clusters of 9 regions of the USA using the following two maps:
[Maps: precipitation by region. Legend: Record Driest, Much Below Normal, Below Normal, Near Normal, Above Normal, Much Above Normal, Record Wettest.]
( A ) Whose clustering plan is more efficient and why? Apply the ANOVA method.
( B ) Find the relative efficiency of cluster sampling with respect to SRSWOR in
each case.
( C ) Select two clusters out of Mr. Stephen Horn's clusters, and construct a 95%
confidence interval estimate for the population mean. Apply the sample ANOVA
approach to estimate the intraclass correlation. (Rule: Use the first row and first column of
the Pseudo-Random Number (PRN) Table 1 given in the Appendix)
( D ) Select two clusters out of Mr. Ken Brewer's clusters, and construct a 95%
confidence interval estimate for the population mean. Apply the sample ANOVA
approach to estimate the intraclass correlation. (Rule: Use the first row and first column of
the Pseudo-Random Number (PRN) Table 1 given in the Appendix)
( E ) Do both confidence interval estimates include the true average precipitation of
the USA?
Solution ( A ): ( I ) Mr. Stephen Horn's clustering: Using information from the
second map, we have

Cluster        Y_i1   Y_i2   Y_i3   Total
Cluster I         4     72     93     169
Cluster II        3     63    104     170
Cluster III      64     75     90     229
Grand total (GT)                      568

$$CF = \frac{(GT)^2}{NM} = \frac{568^2}{3\times 3} = 35847.11.$$
$$SST = \sum_{i=1}^{N}\sum_{j=1}^{M}Y_{ij}^2 - CF = 10616.89.$$

Source      df         SS         MS        F
Between      2     786.89     393.44   0.2401
Within       6    9830.00    1638.33
Total        8   10616.89    1327.11
The value of the intraclass correlation coefficient for Mr. Stephen Horn's clustering is
given by
$$\rho_{Stephen} = 1 - \left(\frac{M}{M-1}\right)\frac{SSW}{SST} = 1 - \left(\frac{3}{3-1}\right)\frac{9830}{10616.89} = -0.388 \quad \text{(which is negative)}.$$
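( II ) Mr. Ken Brewer's clustering: Using information from the second map we
have

Cluster        Y_i1   Y_i2   Y_i3   Total
Cluster I         4      3     63      70
Cluster II       72     93     64     229
Cluster III      75     90    104     269

$$SST = \sum_{i=1}^{N}\sum_{j=1}^{M}Y_{ij}^2 - CF = 10616.89.$$
$$SSB = \sum_{i=1}^{N}\frac{Y_{i\cdot}^2}{M} - CF = \frac{70^2 + 229^2 + 269^2}{3} - 35847.11 = 7386.89.$$

Source      df         SS         MS        F
Between      2    7386.89    3693.45   6.8609
Within       6    3230.00     538.33
Total        8   10616.89    1327.11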
The value of intraclass correlation coefficient for Mr. Ken Brewer's clustering is
given by
The value of the intraclass correlation coefficient for Mr. Stephen Horn's clustering
plan is negative, whereas that for Mr. Ken Brewer's plan is positive, which
indicates that Mr. Horn's clustering plan will perform better than Mr. Brewer's
clustering. The reason is that Mr. Horn's clusters have more variation within each
cluster and less variation between the clusters.
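The ANOVA arithmetic for the two clustering plans can be checked with a short Python sketch. The function name `cluster_anova` is a hypothetical helper, not part of the text; the value 72 in Mr. Brewer's second cluster is inferred from the cluster total 229 and the nine regional values, since it does not survive in the printed table.

```python
def cluster_anova(clusters):
    """One-way ANOVA for equal-sized clusters; returns SSB, SSW, SST and rho."""
    n_clusters = len(clusters)            # N
    size = len(clusters[0])               # M (all clusters assumed equal-sized)
    values = [y for cluster in clusters for y in cluster]
    cf = sum(values) ** 2 / (n_clusters * size)            # correction factor
    sst = sum(y * y for y in values) - cf                  # total sum of squares
    ssb = sum(sum(c) ** 2 for c in clusters) / size - cf   # between clusters
    ssw = sst - ssb                                        # within clusters
    rho = 1 - (size / (size - 1)) * ssw / sst              # intraclass correlation
    return {"ssb": ssb, "ssw": ssw, "sst": sst, "rho": rho}


horn = [[4, 72, 93], [3, 63, 104], [64, 75, 90]]
# Assumption: 72 is the missing value in the second cluster (total 229).
brewer = [[4, 3, 63], [72, 93, 64], [75, 90, 104]]

horn_result = cluster_anova(horn)
brewer_result = cluster_anova(brewer)
print("Horn:  ", {k: round(v, 4) for k, v in horn_result.items()})
print("Brewer:", {k: round(v, 4) for k, v in brewer_result.items()})
```

Under these inputs the sign of `rho` is negative for Mr. Horn's plan and positive for Mr. Brewer's plan, which is the comparison the solution relies on.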
( B ) The relative efficiency of Mr. Stephen Horn's cluster sampling over SRSWOR
sampling will be
$$RE(\text{Mr. Horn}) = \frac{S^2}{MS_b^2}\times 100 = \frac{MST}{MSB}\times 100 = \frac{1327.11}{393.44}\times 100 = 337.31\%$$
and the relative efficiency of Mr. Ken Brewer's cluster sampling over SRSWOR
sampling will be
$$RE(\text{Mr. Brewer}) = \frac{S^2}{MS_b^2}\times 100 = \frac{MST}{MSB}\times 100 = \frac{1327.11}{3693.45}\times 100 = 35.93\%.$$
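As a quick check of part ( B ), the ratio MST/MSB can be evaluated directly. This is a minimal sketch; `relative_efficiency` is a hypothetical helper name, and the mean squares are the ANOVA values quoted in the solution.

```python
def relative_efficiency(mst, msb):
    """Relative efficiency (in percent) of cluster sampling over SRSWOR: MST / MSB."""
    return 100.0 * mst / msb


re_horn = relative_efficiency(1327.11, 393.44)     # Mr. Horn's clustering
re_brewer = relative_efficiency(1327.11, 3693.45)  # Mr. Brewer's clustering
print(round(re_horn, 2), round(re_brewer, 2))
```

A value above 100 means the clustering plan is expected to beat SRSWOR for the same number of sampled elements; a value below 100 means it is expected to do worse.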
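( C ) Sample from Mr. Stephen Horn's clustering: Now using the information
from the second map we have

Cluster        y_i1   y_i2   y_i3   Total
Cluster II        3     63    104     170
Cluster III      64     75     90     229
Grand total (gt)                      399

$$cf = \frac{(gt)^2}{nM} = \frac{399^2}{2\times 3} = 26533.5.$$
$$sst = \sum_{i=1}^{n}\sum_{j=1}^{M}y_{ij}^2 - cf = 3^2 + 63^2 + 104^2 + 64^2 + 75^2 + 90^2 - 26533.51 = 6081.49.$$
$$ssb = \sum_{i=1}^{n}\frac{y_{i\cdot}^2}{M} - cf = \frac{170^2 + 229^2}{3} - 26533.51 = 580.16.$$

Source      df         SS         MS        F
Between      1     580.16     580.16   0.4218
Within       4    5501.33    1375.33
Total        5    6081.49    1216.29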
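Thus an estimate of the value of the intraclass correlation coefficient for Mr. Stephen
Horn's cluster sampling is given by
$$\hat{\rho} = 1 - \left(\frac{M}{M-1}\right)\frac{ssw}{sst} = 1 - \left(\frac{3}{3-1}\right)\frac{5501.33}{6081.49} = -0.357.$$
Thus we have
$$v(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right)s_b^2 = \left(\frac{1}{2} - \frac{1}{3}\right)\times 193.38 = 32.23.$$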
( D ) Sample from Mr. Ken Brewer's Clustering: Now using the information
from second map we have
1136.00 227.20
Thus 95% confidence interval estimate for the average precipitation in the USA is
given by
$$\bar{\bar{Y}} = \frac{GT}{NM} = \frac{568}{3\times 3} = 63.11.$$
Clearly the true average precipitation in the USA lies in the 95% confidence
interval estimate due to Mr. Stephen Horn's clustering, given by [50.72, 82.28], but
not in the 95% confidence interval estimate due to Mr. Ken Brewer's clustering,
given by [72.31, 93.68]. Although the estimate of the intraclass correlation coefficient
for Mr. Ken Brewer's sampled clusters is slightly negative, his plan does not perform
well because the true intraclass correlation coefficient in the population is
positive.
Theorem 9.1.7. Under the superpopulation model approach, the relative efficiency
is free from sample information and is inversely proportional to the size of clusters.
Proof. We know that under the superpopulation model proposed by Smith (1938)
the relative efficiency can be written as
$$RE = S^2/(MS_b^2) \qquad (9.1.21)$$
where $S_b^2 = (N-1)^{-1}\sum_{i=1}^{N}(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2$ and $S^2 = (NM-1)^{-1}\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij} - \bar{\bar{Y}})^2$.
Evidently if the cluster size changes then $S_b^2$ changes but $S^2$ does not, because $Y_{ij}$
and hence $\bar{\bar{Y}}$ remain the same. Note that $NM$ remains constant: $M$ increases if $N$
decreases, and vice versa. Thus with a change in cluster size, only $S_b^2$ gets altered,
because $\bar{Y}_{i\cdot}$ changes with cluster size. Smith (1938) suggested an empirical relation
between $S_b^2$ and $S^2$ through a superpopulation model parameter $g$ given by
$$S_b^2 = S^2/M^g. \qquad (9.1.22)$$
Thus the expression for RE reduces to
$$RE = \frac{S^2}{MS_b^2} = \frac{S^2M^g}{MS^2} = M^{g-1}. \qquad (9.1.23)$$
Now to find the value of $g$ we make use of data that we selected for one particular
value of cluster size $M$. Thus (9.1.23) implies that RE is a constant $K$ (say),
and its estimator is
$$\text{Est.(RE)} = \{M(N-1)s_b^2 + N(M-1)s_w^2\}/\{M(NM-1)s_b^2\}. \qquad (9.1.24)$$
Now we equate the value of Est.(RE) from the sampled data for a fixed cluster
size $M$ with the value of $K$. For the fixed value of $M$ we can find the value of $g$
from
$$\text{Est.(RE)} = M^{g-1}. \qquad (9.1.25)$$
Taking logs on both sides we have
$$g = 1 + \log[\text{Est.(RE)}]/\log(M). \qquad (9.1.26)$$
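The two steps of the proof, estimating RE through (9.1.24) and solving (9.1.25) for $g$ as in (9.1.26), can be sketched in Python. The function names and the numerical inputs below are illustrative assumptions only; they are not taken from any example in the text.

```python
import math


def estimate_re(s_b2, s_w2, N, M):
    """Est.(RE) as in (9.1.24): {M(N-1)s_b^2 + N(M-1)s_w^2} / {M(NM-1)s_b^2}."""
    numerator = M * (N - 1) * s_b2 + N * (M - 1) * s_w2
    denominator = M * (N * M - 1) * s_b2
    return numerator / denominator


def smith_g(re_hat, M):
    """Solve Est.(RE) = M**(g - 1) for g, as in (9.1.26)."""
    return 1 + math.log(re_hat) / math.log(M)


# Hypothetical inputs: N = 10 clusters of M = 5 units, with made-up variances.
re_hat = estimate_re(s_b2=1.0, s_w2=1.0, N=10, M=5)
g_hat = smith_g(re_hat, M=5)
print(round(re_hat, 5), round(g_hat, 5))
```

Any logarithm base gives the same $g$, because only the ratio of the two logarithms enters (9.1.26).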
Example 9.1.6. Select 6 clusters from Table 9.1.1 using SRSWOR sampling.
Record the values of the real estate farm loans for the selected states from
population 1 given in Appendix . Estimate the relative efficiency using the ANOVA
approach. From this calculate the value of parameter g.
Solution. Starting with the first two columns of the Pseudo-Random Number
(PRN) Table 1 given in the Appendix, we select the required 6 distinct random
numbers as 01, 04, 05, 03, 06 and 07. The following clusters are included in an
SRSWOR sample of 6 units.
Now we have
Also we have
4093293.70
$$\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij} - \bar{y}_{i\cdot})^2 = 10714231.64, \qquad s_w^2 = 446426.32,$$
$$\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij} - \bar{y})^2 = 10706420.62, \qquad s^2 = 369186.92.$$
We have seen that for a given sample size, the sampling variance increases with
increase in cluster size and decreases with the number of clusters . It may be noted
here that the cost of the survey decreases with the increase in cluster size and
increases with the increase in number of clusters. Thus a need arises to compromise
between the cluster size and the number of clusters in the sample so that the
sampling variance is minimum for the fixed cost of the survey, or the cost is
minimum for the fixed variance. Mahalanobis (1940, 1942) has considered the
problem of determination of optimum cluster size from the point of view of both
variance and cost. Singh (1956) studied cluster sizes in cluster sampling and sub-
sampling procedures . The best source of discussion of the construction of cost
functions in cluster sampling is that of Hansen, Hurwitz, and Madow (1953). In
their analysis, they postulate that the cost of the survey using cluster sampling
consists of two components in addition to the overhead cost:
( a ) Cost of enumerating the elements in the sample and travelling within the
cluster, which is proportional to the number of units in the sample.
( b) Cost of travelling between clusters, which is proportional to the distance to be
travelled between clusters. Empirical studies show that the expected value of
minimum distance between $n$ points located at random is proportional to $\sqrt{n}$.
(9.2.5)
Case I.. Cost is fixed: In this case the Lagrange function L I is given by
Case II. Variance is fixed : In this case the Lagrange function L z is given by
$$L_2 = C_1nM + C_2\sqrt{n} + \lambda_2\left[V_0 - \frac{1}{n}\left\{S^2 - (M-1)aM^{b-1}\right\}\right], \qquad (9.2.7)$$
where $\lambda_2$ is a Lagrange multiplier. Differentiating (9.2.7) with respect to $n$, $M$ and
$\lambda_2$ and equating to zero in each case gives three equations; one solves the
resultant equations for the optimum values of $n$ and $M$.
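$$S_w^2 = \frac{M}{N(M-1)}\sum_{i=1}^{N}P_i(1-P_i).$$
Note that we have
$$(N-1)MS_b^2 = (NM-1)S^2 - N(M-1)S_w^2$$
or
$$S_b^2 = \frac{(NM-1)}{M(N-1)}S^2 - \frac{N(M-1)}{M(N-1)}S_w^2 = \frac{(NM-1)}{M(N-1)}\times\frac{NMP(1-P)}{(NM-1)} - \frac{N(M-1)}{M(N-1)}\times\frac{M}{N(M-1)}\sum_{i=1}^{N}P_i(1-P_i)$$
$$= \frac{NP(1-P)}{(N-1)} - \frac{1}{(N-1)}\sum_{i=1}^{N}P_i(1-P_i) = P(1-P)\left[\frac{N}{(N-1)} - \frac{1}{(N-1)}\sum_{i=1}^{N}\frac{P_i(1-P_i)}{P(1-P)}\right]$$
$$= P(1-P)\left[\frac{N}{(N-1)} - 1 + 1 - \frac{1}{(N-1)}\sum_{i=1}^{N}\frac{P_i(1-P_i)}{P(1-P)}\right]$$
$$= P(1-P)\left[\frac{N-(N-1)}{(N-1)} + 1 - \frac{1}{(N-1)}\sum_{i=1}^{N}\frac{P_i(1-P_i)}{P(1-P)}\right]$$
$$= P(1-P)\left[\frac{1}{(N-1)} + \frac{M-1}{M-1}\left\{1 - \frac{1}{(N-1)}\sum_{i=1}^{N}\frac{P_i(1-P_i)}{P(1-P)}\right\}\right].$$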
Example 9.3.1. Select 6 clusters from table 9.1.1 using SRSWOR sampling.
Record the values of the real estate farm loans for the selected states from
population I given in the Appendix. Estimate the proportion of states having real
estate farm loans of more than $555.4345 in the United States using cluster
sampling. Estimate the variance of the estimator used for estimating the required
proportion. Construct a 95% confidence interval.
Solution. Starting with the first two columns of the Pseudo-Random Number
(PRN) Table 1 given in the Appendix, we select the required 6 distinct random
numbers as 01, 04, 05, 03, 06 and 07. Thus the following clusters are included in an
SRSWOR sample of 6 units.
Here M = 5 and n = 6. Distinguish the states having real estate farm loans of more
than $555.4345 in each one of the selected clusters.

Cluster   State 1   State 2   State 3   State 4   State 5   Total   Proportion   Squared deviation
   1         1         0         0         0         0        1        0.2           0.1225
   2         0         0         0         0         1        1        0.2           0.1225
   3         1         1         1         0         1        4        0.8           0.0625
   4         1         1         1         0         1        4        0.8           0.0625
   5         1         1         0         1         1        4        0.8           0.0625
   6         1         0         1         1         0        3        0.5           0.0025
Using Table 2 from the Appendix the 95% confidence interval for the proportion of
states having real estate farm loans of more than $555.4345 in the United States is
given by
Example 9.3.2. Suppose the United States has been divided into 10 neighbouring
clusters consisting of 5 states as shown earlier. We are interested in the proportion
of states having real estate farm loans of more than $555.4345 . Is there any
expectation for gain in efficiency due to clustering as opposed to simple random
sampling?
Solution. Let us test the gain due to cluster sampling through the concept of
intraclass correlation coefficient.
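From the above table we have
$$P = \frac{\sum_{i=1}^{N}a_i}{NM} = \frac{21}{50} = 0.42$$
and
$$P(1-P) = 0.42\times(1-0.42) = 0.2436.$$
The value of the intraclass correlation coefficient is given by
$$\rho = \frac{N-1}{M-1}\left\{1 - \frac{1}{(N-1)}\sum_{i=1}^{N}\frac{P_i(1-P_i)}{P(1-P)}\right\} = \frac{10-1}{5-1}\left\{1 - \frac{1}{(10-1)}\times\frac{1.44}{0.2436}\right\} = 0.7722,$$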
which again shows that the cluster sampling is less efficient than simple random
sampling. The relative efficiency of cluster sampling over simple random sampling
is given by
$$RE = \frac{(10-1)\times 10\times 0.4(1-0.4)}{(10\times 5-1)\{10\times 0.4(1-0.4) - 1.44\}} = \frac{21.6}{47.04} = 0.4592.$$
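The intraclass correlation and the relative efficiency for a proportion depend only on $N$, $M$, $P$ and $\sum P_i(1-P_i)$, so they are easy to recompute. The sketch below uses hypothetical function names; the inputs are passed exactly as they appear in the worked arithmetic ($P = 0.42$ for $\rho$, the rounded $P = 0.4$ for RE, and $\sum P_i(1-P_i) = 1.44$).

```python
def rho_for_proportion(N, M, P, sum_pq):
    """rho = (N-1)/(M-1) * {1 - sum_pq / ((N-1) P (1-P))}, sum_pq = sum of P_i(1-P_i)."""
    return (N - 1) / (M - 1) * (1 - sum_pq / ((N - 1) * P * (1 - P)))


def re_for_proportion(N, M, P, sum_pq):
    """RE = (N-1) N P(1-P) / [(NM-1) {N P(1-P) - sum_pq}]."""
    pq = P * (1 - P)
    return (N - 1) * N * pq / ((N * M - 1) * (N * pq - sum_pq))


rho = rho_for_proportion(N=10, M=5, P=0.42, sum_pq=1.44)
re = re_for_proportion(N=10, M=5, P=0.4, sum_pq=1.44)
print(round(rho, 4), round(re, 4))
```

A large positive `rho` together with `re` below 1 is the pattern that signals a loss of efficiency from clustering.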
We have assumed that all clusters are of equal size, which restricts the applicability
of cluster sampling in actual practice. For example, villages or suburbs which are
groups of households, or households which are groups of persons, or schools which
are groups of students and teachers, are usually taken as clusters for operational
convenience. We see that unequal cluster sampling is a more practical situation. We
will now discuss the cluster sampling scheme with clusters of different sizes.
Suppose the $i$-th cluster consists of $M_i$, $i = 1, 2, ..., N$, units and $M_0 = \sum_{i=1}^{N}M_i$ is the
total number of units in the population. The population mean $\bar{\bar{Y}}$ can be defined as
$$\bar{\bar{Y}} = \frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i}Y_{ij} = \frac{1}{M_0}\sum_{i=1}^{N}M_i\bar{Y}_{i\cdot} \qquad (9.4.1)$$
where $\bar{Y}_{i\cdot} = \frac{1}{M_i}\sum_{j=1}^{M_i}Y_{ij}$ denotes the $i$-th cluster mean. Suppose $n$ clusters of unequal
size are selected with SRSWOR sampling and $y_{ij}$ denotes the value of the $j$-th unit of
the variable under study in the $i$-th cluster. The following three estimators of the
population mean can be suggested.
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i\cdot}, \qquad (9.4.2)$$
where $\bar{y}_{i\cdot} = \frac{1}{M_i}\sum_{j=1}^{M_i}y_{ij}$,
$$\bar{y}_n^{*} = \frac{1}{m_0}\sum_{i=1}^{n}M_i\bar{y}_{i\cdot}, \qquad (9.4.3)$$
where $m_0 = \sum_{i=1}^{n}M_i$, and
$$\bar{y}_n^{**} = \frac{1}{n\bar{M}}\sum_{i=1}^{n}M_i\bar{y}_{i\cdot}, \qquad (9.4.4)$$
where $\bar{M} = \frac{1}{N}\sum_{i=1}^{N}M_i = \frac{M_0}{N}$.
Theorem 9.4.1. The simple arithmetic mean estimator
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i\cdot}, \qquad (9.4.5)$$
where $\bar{y}_{i\cdot} = \frac{1}{M_i}\sum_{j=1}^{M_i}y_{ij}$, is a biased estimator of the population mean.
Proof. Taking the expected value on both sides of (9.4.5) we have
$$E(\bar{y}_n) = E\left[\frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i\cdot}\right] = \frac{1}{n}\sum_{i=1}^{n}E(\bar{y}_{i\cdot}) = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i\cdot} = \bar{Y}_N \ne \bar{\bar{Y}}.$$
Thus the estimator $\bar{y}_n$ is a biased estimator and the bias can be written as
$$B(\bar{y}_n) = E(\bar{y}_n) - \bar{\bar{Y}} = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_{i\cdot} - \frac{1}{N\bar{M}}\sum_{i=1}^{N}M_i\bar{Y}_{i\cdot} = -\frac{1}{N\bar{M}}\sum_{i=1}^{N}(M_i - \bar{M})\bar{Y}_{i\cdot}$$
$$= -\frac{1}{N\bar{M}}\sum_{i=1}^{N}(M_i - \bar{M})(\bar{Y}_{i\cdot} - \bar{Y}_N) = -\frac{1}{\bar{M}}\mathrm{Cov}(\bar{Y}_{i\cdot}, M_i). \qquad (9.4.6)$$
Theorem 9.4.3. The bias and variance of the second estimator of the population mean
$$\bar{y}_n^{*} = \frac{1}{m_0}\sum_{i=1}^{n}M_i\bar{y}_{i\cdot}, \qquad (9.4.9)$$
where $m_0 = \sum_{i=1}^{n}M_i$, are respectively given by
$$B(\bar{y}_n^{*}) \approx \frac{1}{nM_0^2}\left[V(m_0) - \mathrm{Cov}\left\{m_0, \sum_{i=1}^{n}M_i\bar{y}_{i\cdot}\right\}\right] \qquad (9.4.10)$$
and
$$V(\bar{y}_n^{*}) = \left(\frac{1-f}{n}\right)S_b^{*2}, \qquad (9.4.11)$$
where $S_b^{*2} = \frac{1}{\bar{M}^2(N-1)}\sum_{i=1}^{N}M_i^2(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2$.
random variables and hence form a ratio of two random variables. From the
standard ratio method of estimation, the asymptotic bias in $\bar{y}_n^{*}$ is given by
$$B(\bar{y}_n^{*}) \approx \frac{1}{nM_0^2}\left[V(m_0) - \mathrm{Cov}\left\{m_0, \sum_{i=1}^{n}M_i\bar{y}_{i\cdot}\right\}\right], \qquad (9.4.12)$$
where $S_b^{*2} = \frac{1}{\bar{M}^2(N-1)}\sum_{i=1}^{N}M_i^2(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2$.
Theorem 9.4.4. The third estimator $\bar{y}_n^{**}$ of the population mean is unbiased and its
variance is given by
$$V(\bar{y}_n^{**}) = \left(\frac{1-f}{n}\right)S_b^{**2}. \qquad (9.4.15)$$
Proof. In unequal cluster sampling, the total number of units in the sample will be
$m_0 = \sum_{i=1}^{n}M_i$. The expected value of $m_0$ is
Example 9.4.1. We wish to estimate the average yield/hectare of the world tobacco
crop. Select four continents from population 5 by SRSWOR sampling. Collect
information about the yield/hectare from all the countries in the selected continents.
Estimate the average yield/hectare in the world using three different estimators.
Also construct the 95% confidence interval in each situation.
Solution. We used the first two columns of the Pseudo-Random Numbers (PRN)
Table 1 given in the Appendix to select the four distinct random numbers as 01, 04,
05 and 03. The continents ' Central America', 'Western & Eastern Europe', ' FSU-
12' and 'European Union' will be included in the sample. Thus we have the
following sample information.

$M_i$     $(\bar{y}_{i\cdot} - \bar{y}_n)^2$
  6       0.000020
  8       0.334662
 10       0.183612
 12       0.021170
Sum       0.539465
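Thus the first estimate of average yield/hectare of the world tobacco crop is
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n}\bar{y}_{i\cdot} = \frac{7.91}{4} = 1.9775.$$
Note that
$$s_b^2 = (n-1)^{-1}\sum_{i=1}^{n}(\bar{y}_{i\cdot} - \bar{y}_n)^2 = 0.179822.$$
Thus the estimator of the variance of the estimator $\bar{y}_n$ is given by
$$v(\bar{y}_n) = \left(\frac{1-f}{n}\right)s_b^2 = \frac{1-0.4}{4}\times 0.179822 = 0.0269733.$$
A $(1-\alpha)100\%$ confidence interval of the average yield/hectare of the world tobacco
crop is given by
$$\bar{y}_n \pm t_{\alpha/2}\left(df = \sum_{i=1}^{n}M_i - n\right)\sqrt{v(\bar{y}_n)}.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
$1.9775 \pm 2.037\sqrt{0.0269733}$, or [1.6429, 2.3117].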
Estimator 2. We have
$$\bar{m}_0 = \frac{1}{n}\sum_{i=1}^{n}M_i = \frac{36}{4} = 9$$
and an estimate of the average yield/hectare of the world tobacco crop is given by
$$\bar{y}_n^{*} = \frac{1}{m_0}\sum_{i=1}^{n}M_i\bar{y}_{i\cdot} = \frac{69.76}{36} = 1.9378.$$
Note that
$$s_b^{*2} = \frac{1}{\bar{m}_0^2(n-1)}\sum_{i=1}^{n}M_i^2(\bar{y}_{i\cdot} - \bar{y}_n^{*})^2 = \frac{41.232}{9^2(4-1)} = 0.1697.$$
Thus an estimate of the variance of the estimator of population mean is
$$v(\bar{y}_n^{*}) = \left(\frac{1-f}{n}\right)s_b^{*2} = \frac{1-0.4}{4}\times 0.1697 = 0.02545.$$
A $(1-\alpha)100\%$ confidence interval of the average yield/hectare of the world tobacco
crop is given by
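$M_i$    $\bar{y}_{i\cdot}$    $M_i\bar{y}_{i\cdot}$    $(M_i\bar{y}_{i\cdot}/\bar{M} - \bar{y}_n^{**})^2$
  6      1.973     11.838     0.2793202
  8      2.556     20.448     0.0805178
 10      1.549     15.490     0.0338484
 12      1.832     21.984     0.1522993
Sum      7.910     69.760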
Thus we have
$$s_b^{**2} = (n-1)^{-1}\sum_{i=1}^{n}\left(\frac{M_i}{\bar{M}}\bar{y}_{i\cdot} - \bar{y}_n^{**}\right)^2 = \frac{0.5459857}{3} = 0.1819952$$
and an estimate of variance is
$$v(\bar{y}_n^{**}) = \left(\frac{1-f}{n}\right)s_b^{**2} = \frac{1-0.4}{4}\times 0.1819952 = 0.027299.$$
A $(1-\alpha)100\%$ confidence interval of the average yield/hectare of the world tobacco
crop is given by
$$\bar{y}_n^{**} \mp t_{\alpha/2}\left(df = \sum_{i=1}^{n}M_i - n\right)\sqrt{v(\bar{y}_n^{**})}.$$
Using Table 2 from the Appendix the 95% confidence interval is given by
$1.6453 \mp 2.037\sqrt{0.027299}$ or [1.3087, 1.9818].
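The three point estimates of Example 9.4.1 can be reproduced with the following Python sketch. The function names are hypothetical. The cluster means are recovered here as $M_i\bar{y}_{i\cdot}/M_i$ from the tabulated products, and $N = 10$ and $\bar{M} = 106/10 = 10.6$ are inferred from the sampling fraction $f = 0.4$ and from the centre 1.6453 of the third interval; treat all of these as assumptions rather than quoted data.

```python
def three_estimators(sizes, means, mbar_population):
    """Return (simple mean, ratio-type, unbiased) estimates for unequal clusters."""
    n = len(sizes)
    weighted_total = sum(m * y for m, y in zip(sizes, means))  # sum of M_i * ybar_i
    est1 = sum(means) / n                            # (9.4.2): mean of cluster means
    est2 = weighted_total / sum(sizes)               # (9.4.3): divide by m_0
    est3 = weighted_total / (n * mbar_population)    # (9.4.4): divide by n * Mbar
    return est1, est2, est3


def variance_of_simple_mean(means, N):
    """v = ((1 - f) / n) * s_b^2 with s_b^2 the variance of the cluster means."""
    n = len(means)
    centre = sum(means) / n
    s_b2 = sum((y - centre) ** 2 for y in means) / (n - 1)
    return (1 - n / N) / n * s_b2


sizes = [6, 8, 10, 12]                 # M_i of the four sampled continents
means = [1.973, 2.556, 1.549, 1.832]   # cluster means (assumed, see lead-in)
est1, est2, est3 = three_estimators(sizes, means, mbar_population=10.6)
v1 = variance_of_simple_mean(means, N=10)
print(round(est1, 4), round(est2, 4), round(est3, 4), round(v1, 7))
```

The second estimator divides by the sampled number of units and is therefore a ratio of two random quantities, while the third divides by the fixed quantity $n\bar{M}$, which is why only the third needs the population value $\bar{M}$.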
Now we would like to discuss how ANOVA works for unequal cluster sampling.
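$$\sum_{i=1}^{N}M_i(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2 = (N-1)S_b'^2, \quad \text{where } P_i = M_i/M.$$
( d ) The population corrected within sum of squares (SSW) is given by

Source             df                 SS                                                                 MS                                                                                F
Between clusters   $N-1$              $SSB = \sum_{i=1}^{N}M_i(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2$      $MSB = \frac{1}{N-1}\sum_{i=1}^{N}M_i(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2 = S_b'^2$     $F = MSB/MSW$
Within clusters    $N(\bar{M}-1)$     $SSW = \sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_{i\cdot})^2$   $MSW = \frac{1}{N(\bar{M}-1)}\sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_{i\cdot})^2$
Total              $N\bar{M}-1$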
Remark 9.4.1. If clusters are not of equal size, then an alternative measure is the
adjusted value of the coefficient of determination given by
$$R_a^2 = 1 - \frac{MSW}{MST}.$$
Further note that the value of adjusted $R_a^2$ becomes negative if the cluster sampling
is efficient. However Zasepa (1962) has shown that the value of the intraclass
correlation coefficient for unequal clusters can be found as
Example 9.4.2. Mr. Ken Brewer learned from Mr. Stephen Horn that the clusters
are supposed to be heterogeneous and suggested the following clustering plan:
Verify whether Mr. Brewer's new clustering plan may result in efficient estimates. Apply
the ANOVA approach.
Solution. From the map of Dec. 2002 precipitation (refer to Example 9.1.5) we
have the following information:

Source             df         SS         MS
Within clusters     6    9645.92    1607.65
Total               8   10616.89    1327.11

Note that if the clusters are not of equal size then an alternative measure is the adjusted
value of the coefficient of determination given by
$$R_a^2 = 1 - \frac{MSW}{MST} = 1 - \frac{1607.65}{1327.11} = -0.2114,$$
which is negative, and it shows that Mr. Ken Brewer's new clustering plan will
perform better than SRSWOR sampling. Ken! Well done!
In most practical situations the study variable is found to be positively and
highly correlated with the cluster size. Under such circumstances, it is
recommended to select the sample with probability proportional to the size of
clusters. Let $P_i = Z_i/Z$ be the probability of selecting the $i$-th cluster in the sample.
Following Hansen and Hurwitz (1943), let us define a transformed variable,
$u_{ij} = M_iy_{ij}/(M_0P_i)$, $j = 1, 2, ..., M_i$; $i = 1, 2, ..., N$. Assuming that $n$ clusters are selected
by the PPSWR sampling scheme, define
$$\bar{u}_{i\cdot} = \frac{M_i\bar{y}_{i\cdot}}{M_0P_i} \quad \text{for } i = 1, 2, ..., n.$$
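Theorem 9.5.4. The relative efficiency of the estimator $\bar{y}_{pps(cl)}$ with respect to
SRSWR is given by
$$RE = \frac{V(\bar{y}_{srs})}{V(\bar{y}_{pps(cl)})},$$
where
$$V(\bar{y}_{srs}) = \frac{1}{n\bar{M}}\left[\frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{\bar{Y}})^2\right] = \frac{1}{n\bar{M}}\frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_{i\cdot} + \bar{Y}_{i\cdot} - \bar{\bar{Y}})^2$$
$$= \frac{1}{n\bar{M}}\frac{1}{M_0}\sum_{i=1}^{N}\left[\sum_{j=1}^{M_i}\left\{(Y_{ij} - \bar{Y}_{i\cdot})^2 + (\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2\right\}\right]$$
$$= \frac{1}{n\bar{M}}\frac{1}{M_0}\sum_{i=1}^{N}\left[\frac{M_i}{M_i}\sum_{j=1}^{M_i}(Y_{ij} - \bar{Y}_{i\cdot})^2 + M_i(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2\right]$$
$$= \frac{1}{n\bar{M}}\sum_{i=1}^{N}\left[\frac{M_i\sigma_i^2}{M_0} + \frac{M_i}{M_0}(\bar{Y}_{i\cdot} - \bar{\bar{Y}})^2\right] = \frac{1}{\bar{M}}\left[\frac{\sigma_w^2}{n} + V(\bar{y}_{pps(cl)})\right]$$
where
$$\sigma_w^2 = \sum_{i=1}^{N}\frac{M_i\sigma_i^2}{M_0}.$$
This implies
$$V(\bar{y}_{pps(cl)}) = \bar{M}V(\bar{y}_{srs}) - \frac{\sigma_w^2}{n} = \frac{\sigma^2}{n\bar{M}}\times\bar{M} - \frac{\sigma_w^2}{n} = \frac{1}{n}\left(\sigma^2 - \sigma_w^2\right).$$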
Example 9.5.1. We wish to estimate the production of the world tobacco crop.
Assume we form 10 clusters of the countries in the world based on the continents
listed in population 5. We wish to apply PPSWR sampling for selecting the cluster.
Should we expect any gain in efficiency over simple random sampling?
Solution. We have

$M_i$    $\sigma_i^2$    $M_i\sigma_i^2$
  6     0.022356      0.134133
  6     0.181742      1.090450
  8     0.303623      2.428986
 10     0.211104      2.111040
 12     0.533628      6.403540
  4     0.114825      0.459300
 30     0.332388      9.971650
 17     0.356282      6.056800
 10     1.816470     18.164700
  3     0.649733      1.949200

Thus we have
Also we have
$$\sigma^2 = \frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i}(Y_{ij} - \bar{\bar{Y}})^2 = \left(\frac{M_0-1}{M_0}\right)S_y^2 = \left(\frac{106-1}{106}\right)\times 0.6323 = 0.626335.$$
Thus the relative efficiency of PPS cluster sampling over simple random sampling
is given by
Thus for this case the PPS cluster sampling will be less efficient than simple
random sampling.
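The comparison in Example 9.5.1 can be retraced numerically from the tabulated cluster sizes and within-cluster variances. In the sketch below the second column of the table is read as $\sigma_i^2$ (an assumption, although it is consistent with the third column being $M_i\sigma_i^2$), and the relative efficiency is taken as $V(\bar{y}_{srs})/V(\bar{y}_{pps(cl)})$ with $V(\bar{y}_{srs}) = \sigma^2/(n\bar{M})$ and $V(\bar{y}_{pps(cl)}) = (\sigma^2 - \sigma_w^2)/n$, so that $n$ cancels.

```python
sizes = [6, 6, 8, 10, 12, 4, 30, 17, 10, 3]          # M_i for the 10 clusters
within_var = [0.022356, 0.181742, 0.303623, 0.211104, 0.533628,
              0.114825, 0.332388, 0.356282, 1.816470, 0.649733]  # assumed sigma_i^2
sigma2 = 0.626335                                     # overall variance from the text

m0 = sum(sizes)                                       # total number of units, 106
mbar = m0 / len(sizes)                                # average cluster size
sigma_w2 = sum(m * v for m, v in zip(sizes, within_var)) / m0   # weighted within variance

n_times_v_pps = sigma2 - sigma_w2      # n * V(ybar_pps(cl))
n_times_v_srs = sigma2 / mbar          # n * V(ybar_srs)
re = n_times_v_srs / n_times_v_pps     # relative efficiency of PPS cluster sampling
print(m0, round(sigma_w2, 6), round(re, 4))
```

With these inputs `re` comes out below 1, in line with the conclusion that PPS cluster sampling is less efficient than simple random sampling for this population.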
Raj (1954, 1958), Zarcovic (1960), and Foreman and Brewer (1971) have
considered the concept of a superpopulation model for comparison purposes. It is to
be remarked here that other sampling schemes like PPSWOR sampling, systematic
sampling, two-phase sampling and stratified sampling can also be used to construct
the estimation strategies under cluster sampling. Madow (1949), Sukhatme (1954),
and Sampford (1962) have also suggested estimation strategies under cluster
sampling . Singh and Singh (1999) suggested an unbiased class of estimators in
cluster sampling.
Following Royall (1992), Tam (1995) has assumed that the finite population of
interest $y = (y_1, y_2, ..., y_N)$ is a realization of a random vector $Y$ that is related to
the design matrix $X = (X_1, X_2, ..., X_N)'$ via a superpopulation model, defined as
(9.6.1)
(9.6.2)
where $1_{N\times 1}$ is a vector of ones. Let $s$ correspond to the sampled $n$ units and $r$ to
the remainder of the population units $(N-n)$. Without loss of generality, the
population information can be represented as
$$1 = \begin{bmatrix}1_s\\ 1_r\end{bmatrix}, \quad Y = \begin{bmatrix}Y_s\\ Y_r\end{bmatrix}, \quad X = \begin{bmatrix}X_s\\ X_r\end{bmatrix}, \quad V = \begin{bmatrix}V_s & V_{sr}\\ V_{rs} & V_r\end{bmatrix}.$$
Theorem 9.6.1. The best linear unbiased predictor (BLUP) of population total T is
given by
$$\hat{\beta} = \left(X_s'V_s^{-1}X_s\right)^{-1}X_s'V_s^{-1}Y_s$$
(9.6.5)
(9.6.6)
For studying the optimal sampling strategies, Tam (1995) has considered the
variance-covariance matrix of the form
$$V = \begin{bmatrix}\Delta_A & 0\\ 0 & \Delta_Z\end{bmatrix} \qquad (9.6.7)$$
where $\Delta_A$ is an $(N-z)\times(N-z)$ non-diagonal matrix that has the same correlation
coefficient in the off-diagonal elements, $\Delta_Z$ is a $z\times z$ diagonal matrix, and
$0 \le z \le N$. In other words, the matrix $V$ looks like
v?, po,oz, POI03, • ., PO,ON-z ' 0, 0, ••, °
•
•
°
(9.6.8)
0, 0, 0, z
• . , VN-z+l> 0, 0, . ,.,
0, 0, 0, .,., 0, V~-z+z, .,., °
•
•
0, 0, 0, . , ., 0, 0, 0, .,., 0, v~
where
$$\delta_k = \begin{cases}1 & \text{if there is correlation } \rho,\\ 0 & \text{otherwise.}\end{cases}$$
(9.6.9)
where
(I-OIP)v?, 0, 0, , ° O,V?, 0, 0, , °
0, (1- ozp)v~, 0, ° 0, 02V~, 0, °
and !::.3=
Letting », be the number of zeros in V3s' where V3 = (v;s, v;r) and defining
ns = Zs + (n(zs XI- )) , Tam (1995) extended Royall (1992) results to cluster
1+ n-zs-Ip
sampling as given in the following theorems:
Theorem 9.6.2. If !::.! = X~, and ~l = X ~2 for some ~l and ~2 under M(K, y)
defined by (9.6.1) and (9.6.10) we have
"( v
T \.K;V =
)
IN ~(I - 8 p)v. ns [
I I I 1
---rT'=~ (9.6.13)
- j=l n op l j=l ~(I - 8jp)
where
Zopt =: {
if z ~ n,
if z < n,
n-
1
( 7 )(vi
Vj
1
,•••••••••• ,v;llrs =!' K · (9.6.16)
Let the population under consideration consist of $N$ distinct and identifiable units.
Assume that these $N$ units are expressible in the form of $K$ overlapping clusters
with $N_i$ $(i = 1, 2, ..., K)$ units in the $i$-th cluster and $\sum_{i=1}^{K}N_i = M \ge N$. The equality sign
will hold only for non-overlapping clusters. For an overlapping cluster situation, a
population unit may be included in more than one cluster; let $F_j$ be the
frequency of the $j$-th $(j = 1, 2, ..., N)$ population unit occurring in $K$ clusters. Let $Y$ be
the variable of interest and suppose we are interested in estimating the population mean
$$\bar{Y} = N^{-1}\sum_{j=1}^{N}Y_j.$$
Define
$$Z_{ij} = \left(\frac{M}{NF_j}\right)y_{ij}, \quad i = 1, 2, ..., K \text{ and } j = 1, 2, ..., N_i,$$
where $y_{ij}$ denotes the value of $Y$ for the $j$-th unit in the $i$-th cluster. Then we have the
following schemes:
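Proof. Let $E_2$ denote the conditional expectation for a given sample of clusters and
$E_1$ denote the expectation over all such samples, then we have
where $\sigma_{bz}^2 = K^{-1}\sum_{i=1}^{K}(\bar{Z}_{i\cdot} - \bar{Z}_K)^2$ and $S_{iz}^2 = (N_i-1)^{-1}\sum_{j=1}^{N_i}(Z_{ij} - \bar{Z}_{i\cdot})^2$.
Proof. Let $V_2$ denote the conditional variance for a given sample of clusters and $V_1$
denote the variance over all such samples, then we have
$$= V_1\left[\frac{1}{k}\sum_{i=1}^{k}\bar{Z}_{i\cdot}\right] + E_1\left[\frac{1}{k^2}\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_{iz}^2\right]$$
$$= \frac{1}{kK}\sum_{i=1}^{K}(\bar{Z}_{i\cdot} - \bar{Z}_K)^2 + \frac{1}{kK}\sum_{i=1}^{K}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_{iz}^2.$$
Hence the theorem.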
Proof. We have
$$\frac{1}{N}\sum_{i=1}^{K}\sum_{j=1}^{N_i}\frac{y_{ij}}{F_j} = \bar{Y}.$$
Hence the theorem.
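The identity used in this proof, that dividing each value by its frequency of occurrence removes the double counting caused by overlap, can be illustrated with a small Python sketch. The population values and cluster memberships below are entirely hypothetical.

```python
import math
from collections import Counter

# Hypothetical population of N = 4 units and K = 3 overlapping clusters.
y = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}   # unit label -> value of Y
clusters = [[1, 2, 3], [3, 4], [2, 3]]      # unit labels in each cluster

N = len(y)
M = sum(len(c) for c in clusters)                     # sum of N_i, here 7 >= N
freq = Counter(u for c in clusters for u in c)        # F_j: clusters containing unit j

# Z_ij = (M / (N F_j)) y_ij for every (cluster, unit) pair.
z = [[M / (N * freq[u]) * y[u] for u in c] for c in clusters]

mean_y = sum(y.values()) / N                          # population mean of Y
mean_z = sum(sum(row) for row in z) / M               # mean of Z over all M pairs
print(mean_y, mean_z)
```

Here unit 3 appears in all three clusters, so each of its occurrences carries one third of its value, and the mean of the transformed values over the $M$ cluster positions equals the population mean of $Y$.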
where
$$\sigma_{bz}^2 = \sum_{i=1}^{K}P_i(\bar{Z}_{i\cdot} - \bar{Y})^2.$$
Proof. We have
=~[.!..k 1=1
.~Z;.]+E1[~ .~(..!... __I)s;;]
k 1=1 n; N;
$\sum_{i=1}^{K}N_i = M \ge N$ is satisfied, but the population size $N$ is unknown. When cluster-wise
data on units are available on the computer, the values of these frequencies for
overlapping clusters may be easily available. Under such situations, define
$Z_{ij} = y_{ij}/F_{ij}$ and $W_{ij} = 1/F_{ij}$ for $i = 1, 2, ..., K$; $j = 1, 2, ..., N_i$, where $y_{ij}$ is the value
of $y$ for the $j$-th unit in the $i$-th cluster and $F_{ij}$ is its frequency of occurring in $K$
clusters.
Then again we have two schemes as discussed below:
Scheme 1. ( a ) Select $k$ clusters out of $K$ clusters by SRSWR sampling.
( b ) From the $i$-th selected cluster of size $N_i$ $(i = 1, 2, ..., k)$, select $n_i$ units by
SRSWOR sampling.
Under such a sampling scheme, we have the following theorem:
Theorem 9.7.2.1. The ratio estimator under scheme 1 given by
RB(ZRS)'" K
k
[[O'l~
N NY
IJ[Si~N -SiZWJ]
- O'bZWJK + INl(~ __
ni N
i=1 NY i
(9.7.2.2)
K "[:.N
MSE (ZRS ) '" kN 2
K
i
2[(-z, - Y- -\2
W; (1 1J(s;2+ Y-2Siw2- 2YSizw
J + --:- - N - )~
,~ ~
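where
$\sigma_{bzw} = K^{-1}\sum_{i=1}^{K}(N_i\bar{Z}_i - K^{-1}Y)(N_i\bar{W}_i - K^{-1}N)$; $S_{izw} = (N_i-1)^{-1}\sum_{j=1}^{N_i}(Z_{ij} - \bar{Z}_i)(W_{ij} - \bar{W}_i)$;
$\bar{Z}_i = N_i^{-1}\sum_{j=1}^{N_i}Z_{ij}$, $\bar{z}_i = n_i^{-1}\sum_{j=1}^{n_i}z_{ij}$, $\bar{W}_i = N_i^{-1}\sum_{j=1}^{N_i}W_{ij}$, $\bar{w}_i = n_i^{-1}\sum_{j=1}^{n_i}w_{ij}$, etc.
have their usual meanings.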
k N
2
RB(ZRP)'" M [[O'l;. _ O'bZW'J+
YN i=1
IP;(~ni __n,IJ[Si~ _
N
SiZWJ]
YN
(9.7.2.4)
Example 9.7.1. Suppose there are three plots as shown in the figure below and nine
partners who are owners of these plots . A few partners have shares in only one plot,
a few have shares in two plots and others have shares in all three plots .
[Figure: the three plots drawn as overlapping clusters of the nine partners; only the label "Cluster III" and the partner numbers 2 to 9 are recoverable.]
We wish to estimate the average income of all partners using plots as overlapping
clusters. Which sampling scheme would you prefer and why?
Solution. We observed that partner 3 is the owner of all the three plots, partner 6 is
owner of two plots and the other partners are owners of only single plots. This
means these plots or clusters of the partners have overlapped with each other. Thus
we shall use the concept of overlapping cluster sampling. Let us compare here two
methods suggested by Tracy and Osahan (1994b) .
K K
MSE (IRS ) '" kN 2 ;~N;
2[(-z;- - W;)
y
-\2 + (1
-;;;- 1J 2]
N D;
3 4 5
2 2
3 5 7 3 4 7 2 3 6 8 9
6000 3000 2000 1000 6000 3000 2000 2000 6000 4000 3000 1000
3 1 2 1 3 1 1 1 3 2 1 1
2000 3000 1000 1000 2000 3000 2000 2000 2000 2000 3000 1000
Thus we shall prefer to use PPSWR sampling for selecting overlapping clusters in
the sample.
have considered a ratio of the expected value of the variable under study to the
expected value of the auxiliary variable. Consider a finite population consisting of
$N$ distinct units. Let $Y$ be the variable under study and let it take the values
$Y_1, Y_2, ..., Y_N$ in the population. Let $X$ be the auxiliary variable. The $N$ elements in
the population may, in principle, be grouped into $M$ clusters, the $i$-th cluster
consisting of $N_i$ elements. The population mean, $\bar{X} = N^{-1}\sum_{i=1}^{M}\sum_{j=1}^{N_i}X_{ij}$, is assumed to
be known. The ratio estimator to estimate the population mean $\bar{Y}$ under post-cluster
sampling is given by
$$\bar{y}_R = \frac{\left(\frac{M}{nm}\sum_{i=1}^{M}\eta_i\sum_{j=1}^{N_i}\theta_{ij}Y_{ij}\right)}{\left(\frac{M}{nm}\sum_{i=1}^{M}\eta_i\sum_{j=1}^{N_i}\theta_{ij}X_{ij}\right)}\bar{X} = \frac{\bar{y}}{\bar{x}}\bar{X} \qquad (9.8.1)$$
where $n$ and $m$ denote the size of the initial random sample and the number of post-clusters
selected, and $\eta_i$ and $\theta_{ij}$ are the random variables used for selection of post-clusters and
the elements from the population. Then we have the following corollary:
Corollary 9.8.1. The covariance between y and x is given by
2(M
__) M(N - n) M -mXn-l) (M -mXN -n)(- -\ (9.8.2)
COy (y, x = Vx + V + X Y j,
mn(N -I) Y nmN(N -IXM -I) bxY mn(N -I)
1M Ni ( _ _)
where Vxy = N- L: L: XijYij - X Y is the covariance between Y and X in the
i= lj=1
M
population, vbxy =M-IL:(Xi-X/MXY;-Y/M) is the covariance between cluster
i=1
totals for X and Y in the population. From (9.8.2), one can easily derive v(y) and
v(x) . Following standard ratio method of estimation, we have the following
theorems:
Theorem 9.8.1. The relative bias expression for the ratio estimator YR, to the first
order of approximation, is given by
RB(YR) = M~N-n~(c;_CXY)+ mn
nm N - I
(M(-mXX -I)N)[c;. -cx.yol
N- 1 M- 1 I I I
(9 .8.3)
where
C; = Vx / X2 = Square of coefficient of variation of X,
Cxy = Vxy /(r x)= Coefficient of co-variation between X and Y,
( b;/ )
II:
X /M Y M
= Coefficient of co-variation between X i and Y;.
Thompson (1990) reported that the use of adaptive cluster sampling for patchy
populations is an efficient design. Brown (1996) pointed out that adaptive cluster
sampling is efficient only for very patchy populations, but can be highly inefficient
for other less aggregate populations. Christman (1997) also supported Brown's
views about adaptive cluster sampling in case of non-patchy populations. In
general, adaptive cluster sampling can be done in two steps. In the first step a
preliminary sample of $n$ units is selected; for example, a random sample of $n$
quadrants from a study area divided into $N$ evenly sized quadrants. In the
second step, for any quadrant in the initial sample for which the variable of interest,
$y$, for example the number of plants in the quadrant, has a value at least as large as a
predefined critical value, the neighbourhood is sampled. The neighbourhood can be
defined in several ways, for example, as the four surrounding quadrants, that is, those on the
east, west, north, and south sides. If any of the quadrants in the neighbourhood has
a value at least as large as the critical value, the neighbourhood of that quadrant is sampled,
and so on. The sampling of the units continues until all the neighbourhoods of
sampled quadrants are sampled, similar to inverse sampling. The difference is that
here initial sample size remains fixed, but the final sample size is variable and
depends on the number of networks selected in the sample. The group of adjacent
quadrants whose values are all at least as great as the critical value is known as a
network. If we divide a population into $K$ networks then the Horvitz and
Thompson (1952) estimator of the population mean in adaptive cluster sampling
becomes
$$\hat{\bar{Y}}_{adcl} = \frac{1}{N}\sum_{i=1}^{K}\frac{y_iI_i}{\pi_i} \qquad (9.9.1)$$
where
$y_i$ = total of the $y$ values in the $i$-th network,
$I_i$ = 1 if any quadrant in the $i$-th network is in the initial sample, and 0 otherwise,
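A minimal Python sketch of an estimator of the form (9.9.1) is given below. The definition of $\pi_i$ does not survive in this excerpt, so the sketch assumes the usual intersection probability for an SRSWOR initial sample, $\pi_i = 1 - \binom{N-x_i}{n}/\binom{N}{n}$, where $x_i$ is the number of quadrants in the $i$-th network. The function names and the network data are hypothetical.

```python
from math import comb


def intersection_probability(N, n, network_size):
    """Assumed pi_i: probability that an SRSWOR sample of n of N units hits the network."""
    return 1 - comb(N - network_size, n) / comb(N, n)


def adaptive_cluster_mean(N, n, intersected_networks):
    """Estimate of the population mean per unit.

    intersected_networks: (y_total, size) pairs for the distinct networks that
    contain at least one unit of the initial sample (those with I_i = 1).
    """
    total = sum(y_total / intersection_probability(N, n, size)
                for y_total, size in intersected_networks)
    return total / N


# Hypothetical illustration: N = 20 quadrants, initial sample of n = 5.
# Two networks were intersected: one of 3 quadrants holding 12 plants in
# total and one single-quadrant network holding 0 plants.
estimate = adaptive_cluster_mean(N=20, n=5, intersected_networks=[(12, 3), (0, 1)])
print(round(estimate, 4))
```

Each network enters the sum once, however many of its quadrants fall in the initial sample, which is why the function takes distinct networks rather than sampled quadrants.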
Exercise 9.2. Let Yijk be the value of Y for the j(h element of the /h cluster in the /h
stratum. Define
_ I K Mij
Y;••= M ;. L L Y;jk = the per element cluster mean;
j=1k=1
n Mij
= the per element (large) sample mean in the /h stratum,
..0. •
Y. j . = M:) L L Yijk
;=l k= 1
• n ~h
where M. j = L M ij denotes the number of elements in the} stratum in the sample ;
;=1
where m.j= fmij denotes the number of sample elements in the /h cluster of thej"
;=1
stratum. Study the estimator RAS of the population ratio R =;~IM;Y; /;~IM;X; ,
defined as
.
R AS= I
K..
M. jY. j
IKI M.. j x.j
. .
j=1 j=1
Exercise 9.5. Suppose the overlapping clusters are selected such that if $U_i$ is
selected then $(M_i - 1)$ units are associated with it to form a cluster of $M_i$ units.
Then show that an unbiased estimator of the population mean $\bar{Y}$ is given by
$$\bar{y}_a = \frac{1}{n}\sum_{i=1}^{n}\sum_{j\in i}\frac{y_j}{M_j}.$$
Hint: Amdekar (1985).
Exercise 9.6. From a population consisting of $N$ clusters each containing $M$
elements, a simple random sample of $n$ clusters is selected to estimate the
population mean per element. Let $S_w^2 = aM^g$, with $g < 0$, be the variance
within clusters under the superpopulation model. Find the optimum cluster size
such that the variance of the estimator of population mean is minimum for the fixed
cost given by $C = nMc_1 + c_2\sqrt{n}$.
Exercise 9.10. Consider a large library with $N$ distinct titles, each title being
present in one or more volumes. A sample of $n$ books is to be drawn to estimate a
proportion $\pi$ (say), for instance, the proportion of Canadian books. Because
sampling the books directly proves physically difficult, the librarian develops a
scheme for choosing an SRS from the card catalogue. Since the catalogue contains
a card for each volume, not for each title, this is a sampling of titles with
probabilities proportional to size, the 'size' of the $i$-th title being the number of
volumes $M_i$. It is assumed that the $M_i$ are known at least for the sample units. Define
$z_i = M_i/M$, $i = 1, 2, ..., N$, where $M = \sum_{i=1}^{N}M_i$, and
$y_i$ = 1 if the $i$-th title is Spanish, and 0 otherwise;
then our interest is to estimate the population proportion $\pi = N^{-1}\sum_{i=1}^{N}y_i$. Let
$u_i = y_i/(Nz_i)$, $v_i = 1/(Nz_i)$, $i = 1, 2, ..., n$, $\bar{u} = n^{-1}\sum_{i=1}^{n}u_i$ and $\bar{v} = n^{-1}\sum_{i=1}^{n}v_i$. Find the bias and variance of
the following three estimators of population proportion, defined as
$\hat{\pi}_1 = \bar{u}$; $\hat{\pi}_2 = \bar{u} + (1 - \bar{v})$ and $\hat{\pi}_3 = \bar{u}/\bar{v}$.
Discuss their relative efficiencies with respect to one another.
Hint: Alalouf (1996), Mayor (2002).
Practical 9.1. Using the MAP of the USA, construct 10 clusters of the 50 states,
each cluster consisting of 5 states, listed in population 1 based on their locations .
Practical 9.2. Select 6 clusters from the list of clusters you made in practical 9.1 by
using SRSWOR sampling. Record the values of the nonreal estate farm loans for
the selected states from population 1 given in Appendix. Estimate the average
nonreal estate farm loans in the United States using cluster sampling. Estimate the
variance of the estimator used for estimating the average nonreal estate farm loans.
Construct a 95% confidence interval.
Practical 9.3. Suppose the United States has been divided into neighbouring 10
clusters each consisting of 5 states as shown below.
Practical 9.4. In practical 9.3 find the value of the intraclass correlation coefficient.
Use it to find the relative efficiency of the cluster sampling over the simple random
sampling.
Answer: The value of intraclass correlation coefficient p = 0.3451 .
Practical 9.5. Select 6 clusters from the table given in Practical 9.3 by using
SRSWOR sampling. Record the values of the nonreal estate farm loans for the
selected states from population 1 given in the Appendix. Estimate the relative
efficiency using the ANOVA approach. Hence, deduce the value of parameter g.
Practical 9.7. Divide the United States neighbouring 10 clusters each consisting of
5 states as shown in practical 9.3. Consider the problem of estimating the
proportion of states having nonreal estate farm loans of more than $878.16 in the
United States. Is there any expectation of gain in efficiency due to clustering rather
than simple random sampling?
Practical 9.8. A world level team of doctors believes that the estimation of total
tobacco use in the world is an important factor in the persistence of health
problems. Select four continents from population 5 by SRSWOR sampling. Collect
information about the yield/hectare from all the countries in the selected continents.
Estimate the yield/hectare in the world using three different estimators and report
your 95% confidence interval to the doctors .
Practical 9.9. A tobacco farmer wishes to find a strategy which provides a better
estimate of the production of tobacco crops in the world. He divides the world level
tobacco growing countries into 10 clusters as listed in population 5 of the
Appendix. He wishes to apply PPSWR sampling for selecting the cluster. Can he
expect any gain in efficiency over simple random sampling?
Practical 9.10. For the purpose of comparison in overlapping cluster sampling,
suppose the clusters are formed and selected by following two different sampling
schemes :
Scheme 1. ( a ) Select $k$ clusters out of $K$ clusters by SRSWR sampling.
( b ) From the $i$th selected cluster of size $N_i$, $(i = 1,2,\ldots,k)$, select $n_i$ units by using
SRSWOR sampling.
Scheme 2. ( a ) Select $k$ clusters out of $K$ clusters by PPSWR sampling, with
$P_i = N_i/M$. ( b ) From the $i$th selected cluster of size $N_i$, $(i = 1,2,\ldots,k)$, select $n_i$
units by SRSWOR sampling.
Assume that the population size is unknown and three possible clusters are formed
as shown in the figure below .
Overlapping clusters
Show that the relative efficiency of the estimator of the population mean based on
scheme 2 with respect to that based on scheme 1 is 114.21%.
Hint: Tracy and Osahan (1994b).
Practical 9.11. Ms. Stephanie Singh and Ms. Renee Hom were asked to construct
three clusters of 9 regions of the USA using the following two maps:
[Figure: two precipitation maps of the USA, with the legend categories Record Driest, Much Below Normal, Below Normal, Near Normal, Above Normal, Much Above Normal, and Record Wettest.]
( b ) Select two clusters using SRSWOR sampling from Ms. Stephanie Singh's
clusters and construct 95% confidence interval for the average precipitation. (Rule:
Always start from first row and first column of the Pseudo-Random Number Table
I given in the Appendix).
( c ) Select two clusters using SRSWOR sampling from Ms. Renee Hom's clusters
and construct 95% confidence interval for the average precipitation . (Rule: Always
start from first row and first column of the Pseudo-Random Number Table I given
in the Appendix)
Hint: Apply ANOVA approach.
Depending upon the quality and their standard the selling prices of one pipe of
these 15 suppliers are given below:
We wish to estimate the average selling price of the pipe by different suppliers.
Instead of asking all the suppliers, we wish to select all the suppliers working in few
states. At first-stage select two states and contact all the suppliers within the
selected states to collect information about their selling prices. Use the concept of
overlapping cluster sampling for estimating average selling price of a pipe.
[Figure: map of the states (labels NH, VT, NY, ME, MA) with numbered locations.]
( b ) Estimate the relative efficiency of the cluster sampling with respect to simple
random without replacement sampling.
( c ) Confirm your result in ( b ) based on the value of the coefficient of determination.
Hint: Apply ANOVA approach with unequal number of units in each cluster.
Practical 9.14. The population of interest is the students in your class today. Each
of the rows of students in the class will form a cluster.
( I ) We wish to estimate the proportion of students in the class who visited a theatre
last week.
( a ) Using the Pseudo-Random Number Table 1 given in the Appendix select two
rows and estimate the proportion of students who visited a theatre last week. Also
construct 95% confidence interval estimate.
( b ) Now ask the same question to everyone in the class and find the true
proportion of such students who visited a theatre last week. Does the true
proportion lie in your confidence interval estimate? If yes, it is fine, but if not, then
suggest a suitable reason for it.
( a) Using the Pseudo-Random Number Table 1 given in the Appendix, select one
row and estimate the average GPA of the class. Also construct 95% confidence
interval estimate for the same.
( b ) Now ask the GPA question to everyone in the class and find the true average
GPA. Does the true average value lie in your confidence interval estimate?
Give your opinion .
10. MULTI-STAGE, SUCCESSIVE, AND RE-SAMPLING
STRATEGIES
10.0 INTRODUCTION
The meaning of multi-stage sampling is clear from its name. Here we have several
stages for the sample selection. In fact it is an extension of the concept of cluster
sampling. Similar to cluster sampling first we divide the population into M clusters
or heterogeneous groups. We select m clusters and use the estimates of cluster
means or totals to form population estimate. For example, in two-stage sampling we
again divide our population into M groups and select a sample of groups, which
form the first stage sample. The units so selected are called first stage units (FSU).
[Fig. 10.0.1 Two-Stage Sampling Scheme: the first stage sample, followed by the second stage sample (list of selected villages from each district).]
First stage units are sometimes also called preliminary stage units (PSU). From
each selected group at the first stage, we select a sample of the population units forming
the group. The sub-sample of the first stage sample is called the second stage
sample and the units so selected are called the second stage units (SSU). Similarly
one can think of three-stage or four-stage sampling and hence multi-stage sampling. A
pictorial representation of two-stage sampling is given in Figure 10.0.1. Thus, a
scheme in which a selection is made at each stage, with smaller and smaller units
being selected as we go on, is called a multi-stage sampling scheme.
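The selection procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming SRSWOR at both stages; the function name `two_stage_srswor`, the frame `villages_per_district` and the sample sizes are hypothetical and are not taken from the text.

```python
import random

def two_stage_srswor(fsu_sizes, n, m, seed=None):
    """Select a two-stage sample by SRSWOR at both stages.

    fsu_sizes[i] is the number of SSUs in FSU i (i = 0, ..., N-1);
    n is the number of FSUs to select; m[i] is the number of SSUs
    to take from FSU i if it is selected.
    Returns {selected FSU index: sorted list of selected SSU indices}.
    """
    rng = random.Random(seed)
    # First stage: SRSWOR of n FSUs out of N.
    first_stage = sorted(rng.sample(range(len(fsu_sizes)), n))
    # Second stage: independent SRSWOR of SSUs within each selected FSU.
    return {i: sorted(rng.sample(range(fsu_sizes[i]), m[i])) for i in first_stage}

# Hypothetical frame: 5 districts (FSUs) with these numbers of villages (SSUs).
villages_per_district = [12, 8, 15, 6, 10]
sample = two_stage_srswor(villages_per_district, n=2, m=[3, 3, 3, 3, 3], seed=1)
```

The returned dictionary lists, for each selected district, the villages drawn from it; a third stage could be added by sampling again within each selected village.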
Let us define a few symbols, which will remain useful for understanding the
concept of multi-stage sampling, as follows:
$N$ = Total number of first stage units (FSUs) in the population;
$n$ = Number of FSUs selected in the sample;
$M_i$ = The number of second stage units (SSUs) in the $i$th first stage unit (FSU), $(i = 1,2,\ldots,N)$;
$m_i$ = The number of second stage units (SSUs) selected from the $i$th FSU in the sample of $n$ FSUs, $(i = 1,2,\ldots,n)$;
$T_{ij}$ = Total number of third stage units (TSUs) in the $j$th SSU of the $i$th FSU;
$t_{ij}$ = Number of TSUs selected in the sample from the $j$th SSU of the $i$th FSU in the sample, $i = 1,2,\ldots,n$ and $j = 1,2,\ldots,m_i$;
$Y_{ijk}$ = value of the $k$th TSU of the $j$th SSU of the $i$th FSU of the study variable $Y$ in the population, $i = 1,2,\ldots,N$; $j = 1,2,\ldots,M_i$; $k = 1,2,\ldots,T_{ij}$;
$y_{ijk}$ = value of the $k$th TSU of the $j$th SSU of the $i$th FSU of $Y$ in the sample for $i = 1,2,\ldots,n$; $j = 1,2,\ldots,m_i$; $k = 1,2,\ldots,t_{ij}$;
$Y_{ij\cdot} = \sum_{k=1}^{T_{ij}} Y_{ijk}$, total corresponding to all the TSUs in the $j$th SSU of the $i$th FSU in the population;
$S_i^2 = (M_i - 1)^{-1}\left[\sum_{j=1}^{M_i} Y_{ij\cdot}^2 - M_i^{-1}\left(\sum_{j=1}^{M_i} Y_{ij\cdot}\right)^2\right]$, the population mean square error between $M_i$ units of the $i$th FSU;
$s_i^2 = (m_i - 1)^{-1}\left[\sum_{j=1}^{m_i} y_{ij\cdot}^2 - m_i^{-1}\left(\sum_{j=1}^{m_i} y_{ij\cdot}\right)^2\right]$, the sample mean square error between $m_i$ units of the $i$th FSU;
$S_b^2 = (N - 1)^{-1}\left[\sum_{i=1}^{N} Y_{i\cdot\cdot}^2 - N^{-1}\left(\sum_{i=1}^{N} Y_{i\cdot\cdot}\right)^2\right]$, the population mean square error between the totals of FSUs;
$s_b^2 = (n - 1)^{-1}\left[\sum_{i=1}^{n} y_{i\cdot\cdot}^2 - n^{-1}\left(\sum_{i=1}^{n} y_{i\cdot\cdot}\right)^2\right]$, the sample mean square error between the
totals of FSUs.
Suppose we would like to estimate the population total $Y$. Let $\hat{Y}$ denote its unbiased
estimator. At the first stage of sampling, we have $N$ units in the population.
Therefore an estimate per first stage unit (FSU) is given by $\hat{Y}/N$. Since the $i$th first stage
unit (FSU) contains $M_i$, $i = 1,2,\ldots,N$, units at the second stage, the total number of
units at the second stage of sampling is given by $M = \sum_{i=1}^{N} M_i$, and hence the required
unbiased estimator per second stage unit (SSU) is $\hat{Y}/M$. Now let the total number of
third stage units (TSU) be $T = \sum_{i=1}^{N}\sum_{j=1}^{M_i} T_{ij}$. Then an unbiased estimator per third stage
unit (TSU) is $\hat{Y}/T$. Our objective is to find an estimator of the population total. For
this purpose we proceed as follows:
Selected FSUs: $1, 2, \ldots, n$
Selected SSUs in the $i$th selected FSU: $1, 2, \ldots, m_i$, $i = 1, 2, \ldots, n$
Selected TSUs in the $j$th selected SSU of the $i$th selected FSU: $t_{ij}$, $j = 1, 2, \ldots, m_i$
Corollary 10.2.1. Suppose SRS has been used at each stage of selection of the
sample. Then if the $Y_i$ values are known for FSUs an estimator of the population
total $Y$ is given by
$$\hat{Y}_{ms} = \frac{N}{n}\sum_{i=1}^{n} Y_i. \qquad (10.2.1)$$
In the case in which the $Y_i$ values are not known an estimator of the population
total can be obtained by replacing them by their estimators
$$\hat{Y}_i = \frac{M_i}{m_i}\sum_{j=1}^{m_i} Y_{ij}.$$
Thus
$$\hat{Y}_{ms} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} Y_{ij}. \qquad (10.2.2)$$
In the case in which the values of $Y_{ij}$ are not known they can be estimated as
$$\hat{Y}_{ij} = \frac{T_{ij}}{t_{ij}}\sum_{k=1}^{t_{ij}} y_{ijk}.$$
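The estimator (10.2.2) is easy to compute once the sample is arranged by first stage unit. The following is a minimal sketch; the function name `two_stage_total` and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def two_stage_total(N, sample):
    """Two-stage estimator of the population total under SRS at both stages.

    N is the number of FSUs in the population; sample is a list with one
    entry (M_i, [y_i1, ..., y_im_i]) per selected FSU, where M_i is the
    number of SSUs in that FSU and the list holds the observed SSU values.
    """
    n = len(sample)
    # Expand each FSU sample sum by M_i / m_i, then expand the sum by N / n.
    return N / n * sum(M_i / len(ys) * sum(ys) for M_i, ys in sample)

# Hypothetical data: N = 3 FSUs, n = 2 selected.
estimate = two_stage_total(3, [(4, [1.0, 3.0]), (2, [5.0])])
```

Here the first selected FSU contributes $(4/2)(1 + 3) = 8$ and the second contributes $(2/1)(5) = 10$, so the estimate is $(3/2)(8 + 10) = 27$.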
Corollary 10.2.2. Consider FSUs are selected using PPSWR sampling with $p_i$
being the probability of selecting the $i$th unit and $Y_i$ values are known for FSUs,
then an estimator of population total $Y$ is given by
$$\hat{Y}_{ms} = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{p_i}. \qquad (10.2.4)$$
In the case in which the $Y_i$ values are not known and SSUs are selected with SRS,
they can be replaced by their estimators
$$\hat{Y}_i = \frac{M_i}{m_i}\sum_{j=1}^{m_i} Y_{ij}.$$
Then an estimator of the population total $Y$ is given by
$$\hat{Y}_{ms} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{p_i}\frac{M_i}{m_i}\sum_{j=1}^{m_i}\frac{T_{ij}}{t_{ij}}\sum_{k=1}^{t_{ij}} y_{ijk}. \qquad (10.2.6)$$
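When the FSUs are drawn by PPSWR, each draw contributes its estimated FSU total divided by its selection probability, and the contributions are averaged over the $n$ draws. A minimal sketch of the two-stage version follows; the function name `ppswr_total` and the data are hypothetical.

```python
def ppswr_total(draws):
    """Estimator of the population total when FSUs are drawn by PPSWR.

    draws holds one entry (p_i, M_i, [y_i1, ..., y_im_i]) per draw:
    p_i is the selection probability of the drawn FSU, M_i its number
    of SSUs, and the list holds the SSU values observed under SRS.
    """
    n = len(draws)
    total = 0.0
    for p_i, M_i, ys in draws:
        fsu_total_hat = M_i / len(ys) * sum(ys)  # estimated FSU total
        total += fsu_total_hat / p_i             # expand by 1 / p_i
    return total / n

# Hypothetical data: two draws.
estimate = ppswr_total([(0.5, 4, [1.0, 3.0]), (0.25, 2, [5.0])])
```

In this illustration the two draws contribute $8/0.5 = 16$ and $10/0.25 = 40$, giving an estimate of $28$.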
Corollary 10.2.3. If SSUs are selected with PPSWR sampling, whereas first and
third stage units are selected with SRS, then an estimator of population total is
given by
$$\hat{Y}_{ms} = \frac{N}{n}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{j=1}^{m_i}\frac{1}{p_{ij}}\frac{T_{ij}}{t_{ij}}\sum_{k=1}^{t_{ij}} y_{ijk}. \qquad (10.2.7)$$
In the same fashion the estimator of population total can be obtained under any
finite number of stages. It is worth noting that the fewer the number of stages,
the more accurate will be the estimators based on a given sample size, although this
may not hold for a fixed cost.
Let us first discuss simple cases to find the variance of the estimators of population
total under multi-stage sampling. Let us first consider the situation of three-stage
sampling, assuming that SRSWOR sampling has been used at each stage of
selection of the sample. Evidently an unbiased estimator of population total is
$$\hat{Y}_{3s} = \frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i}\frac{T_{ij}}{t_{ij}}\sum_{k=1}^{t_{ij}} y_{ijk}. \qquad (10.3.1)$$
To find its variance let $E_3$ and $V_3$ denote the conditional expectation and variance
for the fixed first and second stage samples; $E_2$ and $V_2$ denote the conditional
expectation and variance for the fixed first stage samples; $E_1$ and $V_1$ denote the
expected value and variance for all possible first stage samples, and we have
$$V(\hat{Y}_{3s}) = E_1E_2V_3(\hat{Y}_{3s}) + E_1V_2E_3(\hat{Y}_{3s}) + V_1E_2E_3(\hat{Y}_{3s}) = I_1 + I_2 + I_3 \text{ (say)}. \qquad (10.3.2)$$
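Noting that the sampling has been done independently in each FSU and SSU, we have
$$I_1 = E_1E_2\left[\frac{N^2}{n^2}\sum_{i=1}^{n}\frac{M_i^2}{m_i^2}\sum_{j=1}^{m_i}\frac{T_{ij}(T_{ij} - t_{ij})}{t_{ij}} S_{ij}^2\right] = \frac{N}{n}\sum_{i=1}^{N}\frac{M_i}{m_i}\sum_{j=1}^{M_i}\frac{T_{ij}(T_{ij} - t_{ij})}{t_{ij}} S_{ij}^2, \qquad (10.3.3)$$
where $S_{ij}^2$ denotes the population mean square between the TSUs of the $j$th SSU of the $i$th FSU,
$$I_2 = E_1V_2\left[\frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} Y_{ij\cdot}\right] = \frac{N}{n}\sum_{i=1}^{N}\frac{M_i(M_i - m_i)}{m_i} S_i^2, \qquad (10.3.4)$$
and
$$I_3 = V_1E_2\left[\frac{N}{n}\sum_{i=1}^{n}\frac{M_i}{m_i}\sum_{j=1}^{m_i} Y_{ij\cdot}\right] = V_1\left[\frac{N}{n}\sum_{i=1}^{n} Y_{i\cdot\cdot}\right] = \frac{N(N - n)}{n} S_b^2. \qquad (10.3.5)$$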
Thus on using (10.3.3), (10.3.4), and (10.3.5) in (10.3.2) we have the resultant
variance of the estimator in three stage sampling given in the following theorem:
with variance
$$V(\hat{Y}_{2s}) = \frac{N(N - n)}{n} S_b^2 + \frac{N}{n}\sum_{i=1}^{N}\frac{M_i(M_i - m_i)}{m_i} S_i^2. \qquad (10.3.8)$$
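As a check on the two-stage variance (10.3.8), the following sketch enumerates every possible two-stage SRSWOR sample from a tiny population and compares the exact variance of the expansion estimator with the value given by the formula. The population values and sample sizes are hypothetical; exact rational arithmetic is used so that the two results can be compared without rounding error.

```python
from fractions import Fraction
from itertools import combinations, product

def mean_square(values):
    """Mean square with divisor (k - 1), as in S_b^2 and S_i^2."""
    k = len(values)
    mean = Fraction(sum(values), k)
    return sum((v - mean) ** 2 for v in values) / (k - 1)

def formula_variance(pop, n, m):
    """Variance of the estimated total from (10.3.8); pop[i] lists the SSU values of FSU i."""
    N = len(pop)
    between = Fraction(N * (N - n), n) * mean_square([sum(c) for c in pop])
    within = Fraction(N, n) * sum(
        Fraction(len(c) * (len(c) - m[i]), m[i]) * mean_square(c)
        for i, c in enumerate(pop))
    return between + within

def exact_variance(pop, n, m):
    """Variance of the estimated total obtained by enumerating all samples."""
    N = len(pop)
    Y = sum(sum(c) for c in pop)
    fsu_samples = list(combinations(range(N), n))
    second_moment = Fraction(0)
    for fsus in fsu_samples:
        # All second-stage samples, chosen independently within each selected FSU.
        draws = list(product(*[list(combinations(pop[i], m[i])) for i in fsus]))
        for draw in draws:
            est = Fraction(N, n) * sum(
                Fraction(len(pop[i]), m[i]) * sum(ys) for i, ys in zip(fsus, draw))
            second_moment += est ** 2 / (len(fsu_samples) * len(draws))
    return second_moment - Y ** 2  # the estimator is unbiased for Y

# Hypothetical population: N = 3 FSUs of sizes 2, 3 and 2.
pop = [[1, 4], [2, 3, 7], [5, 6]]
m = [1, 2, 1]
v_formula = formula_variance(pop, 2, m)
v_exact = exact_variance(pop, 2, m)
```

The enumeration is feasible only for very small populations, but it makes the two components of (10.3.8), between FSU totals and between SSUs within FSUs, easy to verify separately.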
Solution. For selecting first stage units, we used the 7th and 8th columns of the
Pseudo-Random Numbers (PRN) Table 1 from the Appendix to select four distinct
random numbers between 1 and 10 as 07, 09, 01 and 02. Obviously the four
continents selected in an SRSWOR first-stage sample of four units are Other
Africa, Middle East, Central America, and Caribbean.
Note that there are 30 countries in the continent Other Africa. To select two
countries out of these 30 countries, we selected two distinct random numbers
between 1 and 30 by using first two columns of the Pseudo-Random Numbers
(PRN) Table 1 from the Appendix as 01 and 23. Thus at the second-stage the
countries Angola and South Africa will be included in the sample.
Further note that there are 10 countries in the continent of the Middle East . To
select two countries out of these 10 countries, we selected two distinct random
numbers between 1 and 10 by using third and fourth columns of the Pseudo-
Random Numbers (PRN) Table 1 from the Appendix as 06 and 07. Thus at second
stage, the countries Oman and Syria will be included in the sample.
Similarly the second stage countries selected from Central America are Honduras
and Nicaragua, whereas those from the Caribbean are Haiti and Jamaica,
respectively.
The structure of sample information collected takes the following form. Here we are
given N = 10 and n = 4.

FSU   Continent         $M_i$   $m_i$   $\hat{Y}_i$   $s_i^2$
1     Other Africa      30      2
2     Middle East       10      2       11.30         0.0320
3     Central America   6       2       11.43         0.3750
4     Caribbean         6       2       9.84          2.9400
$$s_b^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n}\left(\frac{M_i}{m_i}\sum_{j=1}^{m_i} y_{ij}\right)^2 - n\left(\frac{\hat{Y}_{2s}}{N}\right)^2\right] = \frac{1}{4 - 1}\left[77.42^2 - 4\left(\frac{193.55}{10}\right)^2\right] = \frac{4495.3923}{3} = 1498.4641.$$
An unbiased estimate of $V(\hat{Y}_{2s})$ is given by
$$v(\hat{Y}_{2s}) = \frac{N(N - n)}{n} s_b^2 + \frac{N}{n}\sum_{i=1}^{n}\frac{M_i(M_i - m_i)}{m_i} s_i^2 = \frac{10(10 - 4)}{4}\times 1498.4641 + \frac{10}{4}\times 217.568 = 23020.8815.$$
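The last step of the computation can be reproduced directly from the printed quantities. The sketch below takes $s_b^2 = 1498.4641$ and the within-FSU sum $217.568$ as given in the solution and only repeats the final arithmetic; the variable names are hypothetical.

```python
N, n = 10, 4

s_b2 = 1498.4641       # printed value of s_b^2
within_sum = 217.568   # printed value of the sum of M_i (M_i - m_i) s_i^2 / m_i

# Estimated total: (N / n) times the sum of the estimated FSU totals (77.42).
y_hat = N / n * 77.42

# Estimated variance of the estimated total, following the printed expression.
v_hat = N * (N - n) / n * s_b2 + N / n * within_sum
```

The between-FSU component, $15 \times 1498.4641$, accounts for almost all of the estimated variance in this example, while the within-FSU component contributes only $2.5 \times 217.568 = 543.92$.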
A $(1 - \alpha)100\%$ confidence interval for estimating the population total $Y$ is given by
Suppose first stage $n$ units are selected by SRSWOR sampling out of $N$ units in
the population. The second stage $m$ units are selected from the given first-stage
units each consisting of $M$ units. If $y_{ij}$ denotes the value of the sample unit
corresponding to the $i$th first stage unit and $j$th second stage unit, then an unbiased
estimator of population mean is given by
$$\bar{y}_{eq} = \frac{1}{n}\sum_{i=1}^{n}\frac{M}{m}\sum_{j=1}^{m} y_{ij}, \qquad (10.3.10)$$
where $S_b^2 = \frac{1}{N - 1}\sum_{i=1}^{N}\left(\bar{Y}_{i\cdot} - \bar{Y}\right)^2$ and $S_w^2 = \frac{1}{N(M - 1)}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2$.
Let $C_1$ be the cost of selecting a first stage unit in a sample, $C_2$ be the cost of
selecting a second stage unit, and $C_0$ be the overhead cost; then the simplest cost
function in two-stage sampling can be written as
$$C = C_0 + nC_1 + mnC_2. \qquad (10.3.12)$$
One can easily observe that the optimum first stage and second stage sample sizes
for the fixed cost of survey are given by
(10.3.13)
and
For simplicity let us assume that $m$ is the number of SSUs selected from each of the
selected $n$ FSUs and $p$ is the number of TSUs selected from each of the selected
$m$ SSUs. As we saw earlier in three stage sampling, the variance of the estimator
consists of three components. Let us assume that it is given by
$$V(\bar{y}_{3s}) = \frac{\sigma_b^2}{n} + \frac{\sigma_w^2}{nm} + \frac{\bar{\sigma}_w^2}{mnp}, \qquad (10.4.1)$$
where
$$\sigma_b^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\bar{Y}_{i\cdot\cdot} - \bar{Y}\right)^2, \quad \sigma_w^2 = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(\bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot}\right)^2 \quad \text{and} \quad \bar{\sigma}_w^2 = \frac{1}{NMP}\sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{k=1}^{P}\left(Y_{ijk} - \bar{Y}_{ij\cdot}\right)^2$$
have their usual meanings.
Suppose $C_0$ is the overhead fixed cost for the survey and $C_1$, $C_2$, $C_3$ are the costs of
enumerating a unit at the first, second, and third stage of sampling. Obviously, the
simplest cost function is
$$C = C_0 + nC_1 + nmC_2 + mnpC_3. \qquad (10.4.2)$$
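$$\frac{\partial L}{\partial (nm)} = -\frac{\sigma_w^2}{(mn)^2} + \lambda C_2 = 0 \;\Rightarrow\; (mn) = \frac{\sigma_w}{\sqrt{\lambda C_2}}, \qquad (10.4.5)$$
and
$$\frac{\partial L}{\partial (mnp)} = -\frac{\bar{\sigma}_w^2}{(mnp)^2} + \lambda C_3 = 0 \;\Rightarrow\; (mnp) = \frac{\bar{\sigma}_w}{\sqrt{\lambda C_3}}. \qquad (10.4.6)$$
From (10.4.2), (10.4.4), (10.4.5), and (10.4.6) we have
$$n_{opt} = \frac{\sigma_b (C - C_0)}{\sqrt{C_1}}\left[\sigma_b\sqrt{C_1} + \sigma_w\sqrt{C_2} + \bar{\sigma}_w\sqrt{C_3}\right]^{-1}, \quad m_{opt} = \frac{\sigma_w}{\sigma_b}\sqrt{\frac{C_1}{C_2}}, \quad \text{and} \quad p_{opt} = \frac{\bar{\sigma}_w}{\sigma_w}\sqrt{\frac{C_2}{C_3}}. \qquad (10.4.8)$$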
The minimum variance can easily be obtained by substituting these optimum values
in (10.4.1). The three-stage sampling considered so far is called traditional three
stage sampling. If FSUs consist of districts, SSUs consist of villages, and TSUs
consist of bank accounts of customers, then from each selected village the bank
account holding customers are required to be sampled separately and independently
across the villages. Raj (1968) and Rao (1975) have discussed the estimation
strategies in greater detail.
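The optimum sample sizes in (10.4.8) can be evaluated numerically as in the following sketch. The function name `optimum_three_stage` and the standard deviations and costs used in the illustration are hypothetical; in practice the resulting values would be rounded to integers.

```python
from math import sqrt

def optimum_three_stage(sigma_b, sigma_w, sigma_ww, C, C0, C1, C2, C3):
    """Optimum n, m and p for the cost function C = C0 + n C1 + n m C2 + n m p C3.

    sigma_b, sigma_w and sigma_ww are the square roots of the three variance
    components in (10.4.1); sigma_ww stands for the third (barred) component.
    """
    bracket = sigma_b * sqrt(C1) + sigma_w * sqrt(C2) + sigma_ww * sqrt(C3)
    n_opt = sigma_b * (C - C0) / (sqrt(C1) * bracket)
    m_opt = (sigma_w / sigma_b) * sqrt(C1 / C2)
    p_opt = (sigma_ww / sigma_w) * sqrt(C2 / C3)
    return n_opt, m_opt, p_opt

# Hypothetical inputs.
n_opt, m_opt, p_opt = optimum_three_stage(
    sigma_b=4.0, sigma_w=2.0, sigma_ww=1.0,
    C=1000.0, C0=100.0, C1=16.0, C2=4.0, C3=1.0)

# The optimum sizes exhaust the available budget exactly.
spent = 100.0 + n_opt * 16.0 + n_opt * m_opt * 4.0 + n_opt * m_opt * p_opt * 1.0
```

Note that $m_{opt}$ and $p_{opt}$ do not depend on the total budget $C$; only the number of first stage units $n_{opt}$ grows with the money available.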
To draw a sample of $n$ FSUs, Chaudhuri (1997) has used the well known Rao,
Hartley, and Cochran (1962) scheme of sampling using $p_i$, $i \in \Omega$. First we divide the
population $\Omega$ at random into $n$ groups, taking $N_i$ FSUs in the $i$th group such that
$\sum_{i=1}^{n} N_i = N$, the population size. From each group, one unit is selected with
probability proportional to $p_i$, and selection is done independently across the
groups. Let $Q_i$ be the sum of the probabilities $p_i$ falling in the $i$th group. The $m_i$
SSUs are also selected using the RHC scheme from the $i$th FSU of size $M_i$. We split
the $M_i$ SSUs into $m_i$ groups at random, taking $N_{ij}$ SSUs in the $j$th group such that
$\sum_{j=1}^{m_i} N_{ij} = M_i$.
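$$E(\hat{Y}_{m3s}) = E_1E_2E_3\left[\sum_{i=1}^{n}\frac{Q_i}{p_i}\sum_{j=1}^{m_i}\frac{Q_{ij}}{p_{ij}} w_{ij}\right] = E_1\left[\sum_{i=1}^{n}\frac{Q_i}{p_i} E_2E_3\left\{\sum_{j=1}^{m_i}\frac{Q_{ij}}{p_{ij}} w_{ij}\right\}\right] = E_1\left[\sum_{i=1}^{n}\frac{Q_i}{p_i}\sum_{j=1}^{M_i} Y_{ij\cdot}\right] = E_1\left[\sum_{i=1}^{n}\frac{Q_i}{p_i} Y_{i\cdot\cdot}\right] = Y.$$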
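The Rao, Hartley, and Cochran selection described above can be sketched for a single stage as follows. This is only an illustration under simplifying assumptions: the random groups are formed with sizes as nearly equal as possible, and the function names `rhc_sample` and `rhc_total` and the data are hypothetical.

```python
import random

def rhc_sample(p, n, rng):
    """Rao-Hartley-Cochran selection of n units with normed size measures p.

    Returns a list of (selected unit index, Q) pairs, where Q is the sum of
    the p values falling in the random group that the unit was drawn from.
    """
    units = list(range(len(p)))
    rng.shuffle(units)                          # random split of the population
    groups = [units[g::n] for g in range(n)]    # n groups, sizes nearly equal
    picks = []
    for group in groups:
        Q = sum(p[i] for i in group)
        # One unit per group, with probability p_i / Q within the group.
        i = rng.choices(group, weights=[p[i] for i in group], k=1)[0]
        picks.append((i, Q))
    return picks

def rhc_total(y, p, picks):
    """Estimator of the population total: sum of Q * y_i / p_i over the picks."""
    return sum(Q * y[i] / p[i] for i, Q in picks)

# Hypothetical population in which y is exactly proportional to p.
p = [0.10, 0.20, 0.30, 0.15, 0.25]
y = [50 * pi for pi in p]
picks = rhc_sample(p, 2, random.Random(0))
estimate = rhc_total(y, p, picks)
```

When $y_i$ is exactly proportional to $p_i$, as in this illustration, every possible sample returns the population total, which is the situation in which sampling with unequal probabilities is most effective.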
Let $Y_i$ and $X_i$ be the totals of the $i$th FSU of the study variable $Y$ and auxiliary
variable $X$, respectively. Assume that the population total $X = \sum_{i=1}^{N} X_i$ of the
auxiliary variable is known and we are interested in the estimation of population
total $Y = \sum_{i=1}^{N} Y_i$ of the study variable. A sample $s$ of $n$ FSUs is selected according to
any design with $\pi_i$ and $\pi_{ij}$ as the known first and second order inclusion
probabilities, which are in fact functions of the first stage sample size $n$. The selected
FSUs are sub-sampled independently with suitable selection probabilities at the
second stage. When the $i$th unit is selected, it is assumed that from
sampling in the second stage, estimators $t_{iy}$ and $t_{ix}$, respectively, for $Y_i$ and $X_i$ are
available such that
$$C_2(t_{iy}, t_{ix}) = \sigma_{ixy} = M_i^{-1}\sum_{j=1}^{M_i}\left(x_{ij} - \bar{X}_i\right)\left(y_{ij} - \bar{Y}_i\right), \quad E_2(t_{iy}) = Y_i, \quad \text{and} \quad E_2(t_{ix}) = X_i.$$
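with
$$Cov(t_y, t_x) = \sigma_{xy} + \sum_{i=1}^{N}\frac{\sigma_{ixy}}{\pi_i}, \qquad (10.6.2)$$
where
$$\sigma_{xy} = \frac{1}{2}\sum_{i \neq j = 1}^{N}\left(\pi_i\pi_j - \pi_{ij}\right)\left(\frac{X_i}{\pi_i} - \frac{X_j}{\pi_j}\right)\left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right).$$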
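Proof. By the definition of covariance we have
$$Cov(t_y, t_x) = E_1C_2(t_y, t_x) + C_1\left[E_2(t_y), E_2(t_x)\right] = E_1\left[\sum_{i=1}^{n}\frac{\sigma_{ixy}}{\pi_i^2}\right] + C_1\left[\sum_{i=1}^{n}\frac{Y_i}{\pi_i}, \sum_{i=1}^{n}\frac{X_i}{\pi_i}\right] = \sigma_{xy} + \sum_{i=1}^{N}\frac{\sigma_{ixy}}{\pi_i}.$$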
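Corollary 10.6.1. From the above theorem the following results are obvious:
$$V(t_y) = \sigma_y^2 + \sum_{i=1}^{N}\frac{\sigma_{iy}^2}{\pi_i}, \qquad (10.6.3)$$
and
$$V(t_x) = \sigma_x^2 + \sum_{i=1}^{N}\frac{\sigma_{ix}^2}{\pi_i}, \qquad (10.6.4)$$
where $\sigma_y^2$, $\sigma_{iy}^2$ and $\sigma_x^2$, $\sigma_{ix}^2$ are obtained from $\sigma_{xy}$ and $\sigma_{ixy}$ on replacing both variables by $y$ and by $x$, respectively.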
Sahoo and Panda (1997) first defined a sub-class to estimate the $i$th unit population
total $Y_i$ (say) as
$$\hat{Y}_i = h_i(t_{iy}, t_{ix}), \qquad (10.6.5)$$
where $h_i(t_{iy}, t_{ix})$ is a function of $t_{iy}$ and $t_{ix}$, such that $h_i(Y_i, X_i) = Y_i$, satisfying
certain regularity conditions analogous to those defined by Srivastava (1980).
Evidently an estimator to estimate $Y$ can be defined as
$$t_y^{o} = \sum_{i=1}^{n}\frac{\hat{Y}_i}{\pi_i} = \sum_{i=1}^{n}\frac{h_i(t_{iy}, t_{ix})}{\pi_i} \qquad (10.6.6)$$
and that to estimate $X$ can be defined as
(10.6.7)
Using (10.6.6) and (10.6.7), Sahoo and Panda (1997) have defined a main class of
estimators of population total in two-stage sampling as
$$\hat{Y}_c = f(t_y^{o}, t_x^{o}), \qquad (10.6.8)$$
where $f(t_y^{o}, t_x^{o})$ denotes the function of $t_y^{o}$ and $t_x^{o}$ satisfying certain regularity
conditions.
Using the second order Taylor's series, the class of estimators $\hat{Y}_c$ can easily be
expanded around the point $(Y, X)$ as
$$\hat{Y}_c = f\left[Y + (t_y^{o} - Y),\; X + (t_x^{o} - X)\right]$$
$$= f(Y, X) + (t_y^{o} - Y)\left.\frac{\partial f}{\partial t_y}\right|_{(Y,X)} + (t_x^{o} - X)\left.\frac{\partial f}{\partial t_x}\right|_{(Y,X)} + \ldots$$
$$= t_y^{o} + (t_x^{o} - X) f_2(Y, X) + \ldots$$
$$= \sum_{i=1}^{n}\frac{\hat{Y}_i}{\pi_i} + (t_x^{o} - X) f_2(Y, X) + \ldots$$
$$= \sum_{i=1}^{n}\frac{t_{iy} + (t_{ix} - X_i) h_{2i}(Y_i, X_i)}{\pi_i} + (t_x^{o} - X) f_2(Y, X) + \ldots, \qquad (10.6.10)$$
where $f_2(Y, X)$ and $h_{2i}(Y_i, X_i)$ denote the known first order partial derivatives of
the respective functions used for the construction of the estimator. Then using the
definition of variance we have
$$V(\hat{Y}_c) = \sigma_y^2 + 2 f_2(Y, X)\sigma_{xy} + f_2^2(Y, X)\sigma_x^2 + \sum_{i=1}^{N}\frac{\sigma_{iy}^2 + 2 h_{2i}(Y_i, X_i)\sigma_{ixy} + h_{2i}^2(Y_i, X_i)\sigma_{ix}^2}{\pi_i}. \qquad (10.6.11)$$
Sahoo and Panda (1999a, 1999b) extended the results to the situation when two
auxiliary variables X and Z are available to estimate the population mean of the
study variable Y under two-stage sampling.
Bellhouse and Rao (1986) have pointed out that the prediction estimator may be
only marginally better than the classical estimator in PPS sampling under two stage
sampling design from the efficiency point of view. Following them, consider a
population of $N$ units consisting of $L$ FSUs with $M_i$ units in the $i$th FSU
$(i = 1,2,\ldots,L)$ such that $N = \sum_{i=1}^{L} M_i$. Let $Y_{ij}$ be the value of the $j$th unit in the $i$th FSU
by selecting a sample, $s$, of the FSUs and then choosing, for each $i \in s$, a sample
$s_i$ of units from the $i$th FSU. The probability $p(s)$ with which the composite sample
is $m_i$, where $v$ and $m_i$ are pre-specified. The first order inclusion probability for the $i$th
FSU is $\pi_i = \sum_{s \ni i} p(s)$ and that for the $j$th unit in the $i$th FSU is $\pi_{ij} = \sum_{s \ni (i,j)} p(s)$. Then
we have the following theorem:
where $d_{sij} = N b_{sij} - 1$ and $b_{sij}$ are defined for all $s$ and all $(i, j) \in s$.
Scott and Smith (1969) defined a model by assuming that Yij are uncorrelated, for
each i, with mean j.Jj and fixed variance a} and that j.Jj are uncorrelated with
mean j.J and variance (7z . They suggested a model as
(10 .7.2)
with
Emleij)=O, E(eJ)=02+ o } , Em (eijeij') = 02, J*J' and Em leJyeh'j') = 0, h e h",
(10,7.3)
h
were Yo
- = L: Mi s:
'<"' Yij /'<"'M
- - =- 1
z: i » Yb L: L: Yij an d IIf IS
' a cons tan t sue h th at the
iES jes] m iES m ViES jes]
variance of the estimator Yprl is minimum. It can be easily shown that vlY"prl) is
minimum if the optimum value of 'P is given by
2 1
'Popt = ~N LMi,
iES
with A= 02[02+ a
m
J- (10.7.4)
where A; = 02[02+ ~:
Bellhouse and Rao (1986) have proposed a two-stage permutation model :
Yij=Y+eij (10.7.6)
where Em leij)= °
,Em (eJ )=al +a;, Em (eijeij') = al- (Mi -lt1a;, J * J', and
Em(eyei'j') =- a;/(L - I), h * h' .
Theorem 10.7.4. The MSE of the estimator Ybs under model (10.7.6) is
)2]+Em(E)
M; 1
(
2 2 - 2
[
+o"w L - - L !Jf;r L - - L bsij
ie s M ; -1 ) es; ies M ; -1 )es;
(10.7.7)
Theorem 10.7.5. For any fixed size design, the optimal model unbiased predictor of
population mean r is
Ia·Iy··
{1-(A)N- {(A)N-
Ia .M·I y ··
Yopt = 1I a;M;} ; ' ) l) + 1 Ia;M;} ; I ' ) IJ (10.7.8)
L-l ies mIa; L-l ie s mIa;M;
; ;
-1
Lm. M ·-m·
where r =0"b2/0"2w and a,
"
= m,
(
-----.!L+
L-l
' ,
M ,.-l )
Y; Yij and
m; )=1 m; ) = \
are the simple expansion estimators of the $i$th cluster totals for $Y$ and $Z$
respectively. The coefficient $u_i$ is free from $y$ or $z$ values. Royall (1986) does not
guarantee that the model considered by him is correct, but he adopts it as a tool to
use in planning and inference. Now we have the following theorem:
Theorem 10.8.1. If variables in different clusters are uncorrelated, then the error
variance of the estimator under two-stage sampling design is given by
v(y-)= N B1 -2NB2 +NB 3, (10.8.1)
f
where
I N _ -1 mi
i cov Ysi'Yi ' y.
2 2(_)
Bt = - LUi M; V Ysi , Ysi =mi L Yij' B2 = n-1 i=NLujM 2 (- - )
= Y./M . and
n i=1 ) =1 l I I I
Royall (1986) has considered both the situations when the cluster sizes $M_i$ are
known and unknown, to estimate the variance. We will discuss only the situation
when the cluster size is known. Royall (1986) considered the following model
which depends upon three unknown parameters $\mu$, $\rho$, and $\sigma^2$, given by
$$E(Y_{ij}) = \mu, \quad Cov(Y_{ij}, Y_{kl}) = \begin{cases} \sigma^2 & \text{if } i = k,\ j = l, \\ \rho\sigma^2 & \text{if } i = k,\ j \neq l, \\ 0 & \text{if } i \neq k. \end{cases} \qquad (10.8.2)$$
The above model has also been used by Scott and Smith (1969) and Royall (1976).
Burdick and Sielken (1979) have studied the same variance estimation problem
under the same model but focused on obtaining unbiased variance estimators
having chi squared distribution, whereas Royall (1986) has suggested robustness in
the sense of consistency under broad conditions. In fact, this model describes
populations in which the cluster total $Y_i$ is roughly proportional to cluster size $M_i$,
that is, $E(Y_i) = \mu M_i$, with $Y_{ij}$ correlated within clusters.
Royall (1986) suggested the following steps to estimate the three different
components of the variance,
v(y)= N B,-2NB2+NB3 .
f
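( a ) Estimate the third component under the model defined as:
$$M_1: E(Y_{ij}) = \mu, \quad \text{and} \quad Cov(Y_{ij}, Y_{kl}) = \begin{cases} \sigma^2 & \text{if } i = k,\ j = l, \\ \rho\sigma^2 & \text{if } i = k,\ j \neq l, \\ 0 & \text{if } i \neq k. \end{cases} \qquad (10.8.5)$$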
"Robust = ;2 j~Uj(Uj - 1
f~j + ~ j~UjMj(l- .Ii{ J+ t~lMl - ~ j~UjMl P2, (10.8.8)
where
The theory of two-stage sampling when selected FSUs on an occasion alone are
partially replaced on subsequent occasions after the first and by observing the same
set of SSUs in the given FSU from one occasion to the other was first developed by
Jessen (1942), Yates (1960), Patterson (1950), Eckler (1955), and Tikkiwal (1953,
1958, 1965). Singh (1968) considered the theory of two-stage successive sampling
and built up linear unbiased estimators of the population mean on the second and
third occasions separately under certain restrictive assumptions when the partial
replacement of units is made among the FSUs only. Kathuria and Singh (1971a,
1971b) have shown that under certain circumstances, partial retention of SSUs may
be better than the partial retention of FSUs. Agarwal and Tikkiwal (1980) have
mentioned the following possibilities:
( a ) Replacement among FSUs only,
(b) Replacement among SSUs only,
( c ) Replacement among both FSUs and SSUs.
Surveys often have to be repeated on many occasions (over years or seasons or
months) for estimating the same characteristic at different points of time. The
information collected on previous occasions can be used to study the change or total
value for the most recent occasion. An investigator or owner of the industry of cold
drinks may be interested in the following type of problems:
( a ) The average or total sale of cold drinks for the current season;
( b ) The change in average sale of cold drinks for two different seasons;
To discuss Arnab's (1979a) successive sampling scheme let us first define some
notation. Let us consider
$Y_{ti}$ = Value of the study variable for the $i$th unit of a finite population $\Omega$ of size $N$ on the $t$th occasion,
$Y_T = \sum_{i=1}^{N} Y_{Ti}$, population total of the study variable $Y$ at time $T$, $p_i$ = The probability
of selecting the $i$th unit on any occasion,
$$\sum_{i=1}^{N} p_i\left(\frac{Y_{ti}}{p_i} - Y_t\right)\left(\frac{Y_{t'i}}{p_i} - Y_{t'}\right) = \delta^{|t - t'|} V_0$$
for every $t, t' = 1,2,\ldots,T$, such that $\delta$ and $V_0 = \sum_{i=1}^{N} p_i\left(Y_{ti}/p_i - Y_t\right)^2$ are known quantities.
( 1 ) On the first occasion, select a sample $s_{11}$ of size $m_{11}$ with PPSWR sampling.
( 3 ) The value of the total sample size mIl on each occasion is supposed to be fixed
and determined from cost consideration.
another sample mil = ( mIl - :~:mll') is selected by PPSWR method from the whole
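given by
$$\hat{Y}_T = \sum_{t=1}^{T} C_t \hat{Y}_T(t), \qquad (10.9.1)$$
where
$$\hat{Y}_T(t) = \begin{cases} \dfrac{1}{m_{Tt}}\sum_{s_{Tt}}\dfrac{Y_{Ti}}{p_i} - \beta_T(t)\left\{\dfrac{1}{m_{Tt}}\sum_{s_{Tt}}\dfrac{Y_{(T-1)i}}{p_i} - \hat{Y}_{T-1}(t)\right\}, & t = 1, 2, \ldots, (T - 1), \\ \dfrac{1}{m_{TT}}\sum_{s_{TT}}\dfrac{Y_{Ti}}{p_i}, & t = T, \end{cases} \qquad (10.9.2)$$
with $\beta_T(t)$ and $C_t$ being constants such that the estimator $\hat{Y}_T$ is unbiased and
has the minimum variance. It can be verified by the method of induction on
different occasions that the minimum variance of the estimator $\hat{Y}_T$ exists for $\beta_T(t) = \delta$
for all $t$.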
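Proof. For $T = 1$
Now for $T = 2$
$$\hat{Y}_2(1) = \frac{1}{m_{21}}\sum_{s_{21}}\frac{Y_{2i}}{p_i} - \beta_2(1)\left\{\frac{1}{m_{21}}\sum_{s_{21}}\frac{Y_{1i}}{p_i} - \hat{Y}_1(1)\right\}.$$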
m21 s2 1 Pi m21 s21 Pi
Note that s21 is selected from SII by SRSWOR sampling, therefore we have
Thus we have
So we have
E(YT(t))= EIE2 {YT(t) I ST_ I,I}
Note that
E( _ 1-
mT_I,1 sT- I,1
I
Pi
J
YTi = E( _ 1_ I YTi = YT .
ml,1 -u Pi
J
Thus we have
$$E(\hat{Y}_T) = \sum_{t=1}^{T} C_t Y_T = Y_T$$
whenever $\sum_{t=1}^{T} C_t = 1$.
Hence the theorem.
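Theorem 10.9.2. The variance $V[\hat{Y}_T(t)]$ with $\beta_T(t) = \delta$ for every $t = 1,2,\ldots,(T - 1)$ is
given by
$$V[\hat{Y}_T(t)] = \begin{cases} \left[\dfrac{1}{m_{Tt}} - \sum_{t'=1}^{T-t}\left(\dfrac{1}{m_{T+1-t',t}} - \dfrac{1}{m_{T-t',t}}\right)\delta^{2t'}\right] V_0, & \text{for } t = 1, 2, \ldots, (T - 1), \\ \dfrac{V_0}{m_{TT}}, & \text{for } t = T. \end{cases} \qquad (10.9.3)$$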
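Proof. Consider
$$V(\hat{Y}_1(1)) = V\left(\frac{1}{m_{11}}\sum_{s_{11}}\frac{Y_{1i}}{p_i}\right) = \frac{1}{m_{11}}\left(\sum_{i=1}^{N}\frac{Y_{1i}^2}{p_i} - Y_1^2\right) = \frac{V_0}{m_{11}}.$$
Now
$$\hat{Y}_2(1) = \frac{1}{m_{21}}\sum_{s_{21}}\frac{Y_{2i}}{p_i} - \beta_2(1)\left\{\frac{1}{m_{21}}\sum_{s_{21}}\frac{Y_{1i}}{p_i} - \hat{Y}_1(1)\right\}.$$
Then
$$V(\hat{Y}_2(1)) = E_1V_2\left[\hat{Y}_2(1) \mid s_{11}\right] + V_1E_2\left[\hat{Y}_2(1) \mid s_{11}\right],$$
that is,
$$V(\hat{Y}_2(1)) = E_1\left[\left(\frac{1}{m_{21}} - \frac{1}{m_{11}}\right)\frac{1}{m_{11} - 1}\sum_{s_{11}}\left\{\frac{Y_{2i} - \beta_2(1)Y_{1i}}{p_i} - \frac{1}{m_{11}}\sum_{s_{11}}\frac{Y_{2i} - \beta_2(1)Y_{1i}}{p_i}\right\}^2\right] + V_1\left[\frac{1}{m_{11}}\sum_{s_{11}}\frac{Y_{2i}}{p_i}\right].$$
The variance is minimum for $\beta_2(1) = \delta$, which leads to the following relation
$$V(\hat{Y}_2(1))\big|_{\beta_2(1)=\delta} = \left(\frac{1}{m_{21}} - \frac{1}{m_{11}}\right)\left(V_0 + \delta^2V_0 - 2\delta^2V_0\right) + \frac{V_0}{m_{11}}$$
$$= \left(\frac{1}{m_{21}} - \frac{1}{m_{11}}\right)\left(1 - \delta^2\right)V_0 + \frac{V_0}{m_{11}}$$
$$= \left[\left(\frac{1}{m_{21}} - \frac{1}{m_{11}}\right) - \delta^2\left(\frac{1}{m_{21}} - \frac{1}{m_{11}}\right) + \frac{1}{m_{11}}\right]V_0.$$
$$V[\hat{Y}_j(1)] = \left[\frac{1}{m_{j1}} - \sum_{t'=1}^{j-1}\left(\frac{1}{m_{j+1-t',1}} - \frac{1}{m_{j-t',1}}\right)\delta^{2t'}\right]V_0 \quad \text{for } j = 1, 2, \ldots, (T - 1).$$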
Now
\r. {I
- 2h (I.!'-'ov r«t, - I -r«' -
-I- I
mTl sTl Pi mTl sTl Pi
( )} .
YT- II (10.9.4)
V( -
I L -Yti ) = £1{V2(- 1 L -r« I sil )} + ~ {£2(- I L -Yti I SI I)} . '
mTl STI Pi mTl sTl Pi mTl sTl Pi
L Yti
2)+~
[ ]
V _1_ L Yti _£\ _1 1 I_ Yti _~ _I_ Y/i
( mTl STI Pi J- ( mTl
L
mi l) mil - 1-u Pi
L
{ mIl -u Pi }
1
mi l
( I -~I)Vo+~=
= mTl
o V Vo
mTl .
C ov -£..,--,
I '" YT-I ,i Y -I (I)J
T =cov{_I- I YIi, _1_ I Yli}
( mTl STl Pi
m21 s21 Pi mil -n Pi
I
COy -I- I - YT-2i'
-
mT-I,1 sT-I ,1 Pi
Then we have
', YT - 2 ())
I +V (Y,T - 2 ())
I .
= C1 £2 - I I -YT-I
[ ( mTl sTl Pi
J
-,iI sT_I,1 , £2(AYT- I (I) I ST_I,I )~
+£1[ C2( - I
mTl sTl
YT -I
I - -
Pi
'i , YT-1I
' ( ) IST-I J]
= CI[_I- I YT- I,i , _1_ I YT- I,i _ _0_ I YT-2,i + oY - (1)]+0
T 2
mT-I,1 sT- I,1 Pi mT_I,1 sT- I,1 Pi mT-I,1 sT-I,1 Pi
Noting that
!
+oCov -I- I - YT--
mT-I,1sT- I,1 Pi
' ,YT - 21( )) .
I i'
Yr -u Y' _ ( )) = V (,
COY( - 1 L--, ( ))•
Yr - 11
r 11
mTl sTl Pi
We have
v{_1_ L Yr-l ,i -Yr- 1(1)} = V(_1_ L Yr-l ,iJ+V(Yr_ I(1))-2COV(_1- L Yr-l ,i, Y
r - 1(1)J
mTl sTl Pi mTl sTl Pi mTl sTl Pi
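Proof. We have
$$\hat{Y}_T = \sum_{t=1}^{T} C_t \hat{Y}_T(t).$$
We note that $\hat{Y}_T(t)$ are independent due to the sampling scheme shown in Table
10.9.1 and $E\{\hat{Y}_T(t)\} = Y_T$ for every $t = 1,2,\ldots,T$, thus the optimum value of
$C_t \propto \{V(\hat{Y}_T(t))\}^{-1}$ and
$$V_{opt}(\hat{Y}_T) = \left[\sum_{t=1}^{T}\frac{1}{V(\hat{Y}_T(t))}\right]^{-1} = \left[\sum_{t=1}^{T}\left\{\frac{1}{m_{Tt}} - \sum_{t'=1}^{T-t}\left(\frac{1}{m_{T+1-t',t}} - \frac{1}{m_{T-t',t}}\right)\delta^{2t'}\right\}^{-1} V_0^{-1}\right]^{-1}$$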
where
t/J(t)=_I_- rf-I( 1
A. T/ 1'=1 Ar+1-1',1
and
t/J(T)=_I- for t=T.
ArT
l
Thus we have
we get
Following the proofs of the above theorems, the results given in the following
theorems can easily be proved, and the proofs are omitted to save space.
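The step used in the proof above, namely that independent unbiased estimators are best combined with weights proportional to the reciprocals of their variances, can be illustrated numerically. The following is a minimal sketch; the function name `optimum_weights` and the variances are hypothetical.

```python
def optimum_weights(variances):
    """Weights C_t proportional to 1 / V_t, normalised to sum to one.

    Also returns the variance of the combined estimator, which for
    independent components equals the reciprocal of the sum of 1 / V_t.
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [w / total for w in inverse], 1.0 / total

# Hypothetical variances of two independent unbiased estimators.
weights, combined_variance = optimum_weights([1.0, 3.0])
```

With variances 1 and 3 the weights are 0.75 and 0.25 and the combined variance is 0.75, which is smaller than the variance of either component on its own.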
Theorem 10.9.3. If fJr(l) = 0, 'v' 1=1,2, ...,(T-l), then the optimum value of V(Yr )
with variation of values of CI , I = 1,2, ...., H is
1= 1,2,..,(T -1),
(10.9.5)
1= T,
where
-1- r-I'(
L 1 -l- 2'y I,--1,2, ...,(T-1),
) f\0)
Arl = mTt and ~(t')= Arl' 1;1 Ar+I-I,I' Ar-I,I'
mll _1_ 1'= T.
1Arr
Theorem 10.9.4. The optimum values of ATt (and hence mn ) are given by
J1jb( 1
1+ 1- 0 2 tA/-I)opt
0)
I
1= 1,2,....,(T-1),
(10.9.6)
ArI(Opt) =
1 15b(~r~I)opJ I=T,
(1+~1-02 )g(r-l)opt
where~I)oPt(I)=l,g(h)oPt=(~----::::::::::=----:,....-~==;:----
~) ( ~) and g(I)opt = 1 .
I -vl-o- + l+vl-o~ g(r-l)opt
Theorem 10.9.5. The minimum variance of the estimator of population total on the
$T$th occasion is given by
\[ V_{\min}(\hat{Y}_T) = \frac{V_0}{m_{11}}\, g(T)_{\mathrm{opt}}. \tag{10.9.7} \]
(10 .9.8)
Ar(opt) = ~ [ 1
1+~1-02 g(T-I}opt
J. (10.9.9)
It can easily be shown that the strategies proposed by Tripathi and Srivastava
(1979), Ghangurde and Rao (1969), Chotai (1974), and Chaudhuri and Arnab
(1977) are closely related to Arnab's first strategy in successive sampling.
Arnab's Strategy II. The main steps of Arnab's second strategy are:
( 1 ) On the first occasion select a sample $s_{11}$ of size $m_{11}$ with the RHC scheme
with size measures $p_i$.
independently for $t' = 1, 2, \ldots, (t-1)$ following the RHC scheme using the normed size
measure ${}_{t'}Q_i(t)$ which, in fact, is the sum of the ${}_{t'}Q_i(t-1)$ values containing the $i$th unit
occurring in $s_{(t-1)t'}$ for $t' = 1, 2, \ldots, (t-1)$, and another independent sample $s_{tt}$ is
selected from the entire population using $p_i$ as normed size measures.
(10 .9.10)
with
I
L YTi CQ;(T + 1))- .8r(J L Yr-I,I {I Qi(T + 1)}- f;(t)J t = 1,2,..., (T-1),
f;(t) = iEST! Pi iEs rl Pil
L Yhi {rQi(T + I)} t = T,
iESTT Pi
where $C_t^{*}$ and $\beta_T(t)$ are constants chosen such that the estimator $\hat{Y}_T^{*}$ is unbiased and
has the minimum variance.
Theorem 10.9.7. The variance of the estimator $\hat{Y}_T^{*}(t)$ is minimum if $\beta_T(t) = 0$, and the
minimum variance of the estimator $\hat{Y}_T^{*}$ is given by
-1
,*) NVo
Vrnin (Yr = - -
~
1 -
1 1 - -1- - 1 - -1-- (10.9.11)
[N - 1 ml2Arr(opt) N )[ 1- 0 2 [ Arr(OpJ ]
for the optimum values of C1 given by
_ 1[ 1
r/Jr(t)OPI ml1Arr(opt)
~)[1
N
_ _ 1 [1 _ _
~1-02
1 )]-1 t=1 2 (T-l1
Arr(opt) , , ...,
*
C1(opt) =
",",(op,) = 1 n
mil 1- 1-0 B(T-l)
)[mI1-N~I-02{1-~1-02 }B(T-l)]
with
B(T-l)= ril~2Nr/Jr_1(t)-(1-02)J1 .
1=1
Arnab's Strategy III. The main steps of Arnab's third strategy are:
( 1 ) On the first occasion, select a sample $s_{11}$ of size $m_{11}$ with the RHC scheme
with size measures $p_i$.
Arnab (1979a) has shown that Strategy II is always superior to Strategy I
from the minimum variance point of view. For Strategies II and III, the
efficiency comparison depends upon the values of the parameters involved in the
variance expressions. Sekkappan and Thompson (1994) have considered
multi-phase and successive sampling for a stratified population with unknown
stratum sizes. Arnab (1998) has suggested two new sampling strategies for
estimating the most recent occasion total on the basis of two samples selected with
varying probability at two different occasions. Empirical results show that the Arnab
(1998) strategy is more efficient than the one described by Prasad and Graham
(1994).
Example 10.9.1. The data in population 9 of the Appendix relate to the number of
immigrants coming to 51 states in the United States during 1994-1996. Select a
sample ($s_1$) of 10 states by the PPSWR method using the number of immigrants in 1994
($z$) as the measure of size. From the selected sample $s_1$, select a sub-sample $s_m$ of size
$m = 4$ by the SRSWOR method, assuming all elements of $s_1$ are distinct.
Finally select an independent sample $s_u$ of size $u = (10 - m) = 6$ with PPSWR using
$x$ as the size measure. Estimate the total number of immigrants in 1996 using the
composite estimator.
Solution. We want to estimate the total number of immigrants in 1996 ($Y$) using the
composite estimator
\[ \hat{Y} = \phi \hat{Y}_m + (1 - \phi)\hat{Y}_u \]
with
g= L
sm
(
Yi ..'. L Yi
Pi m sm Pi J(!L _..!.- L !L JVl L (
Pi m sm Pi sm
Yi ..'. L Yi z
Pi m sm Pi J
L ( !L _..!.- L !L
sm Pi m sm Pi
JZ]
and an estimate of $Y$ based on unmatched units is given by
\[ \hat{Y}_u = \frac{1}{u} \sum_{i \in s_u} \frac{y_i}{p_i}. \]
No.  State  Immigrants in 1994  Cumulative total  Random numbers drawn (state noted)
 1   AL           1837               1937
 2   AK           1129               3066
 3   AZ           9141              12207
 4   AR           1031              13238         014737 AR, 049819 AR
 5   CA         208498             221736
 6   CO           6825             228561
 7   CT           9537             238098         236263 CT
 8   DE            984             239082
 9   DC           3204             242286
10   FL          58093             300379
11   GA          10032             310411
12   HI           7746             318157
13   ID           1559             319716         339922 ID
14   IL          42400             362116
15   IN           3725             365841
16   IA           2163             368004
17   KS           2902             370906
18   KY           2036             372942
19   LA           3366             376308
20   ME            829             377137
21   MD          15937             393074
22   MA          22882             415956
23   MI          12728             428684
24   MN           7098             435782
25   MS            815             436597
26   MO           4362             440959
27   MT            447             441406
28   NE           1595             443001
29   NV           4051             447052
30   NH           1144             448196         465225 NH
31   NJ          44083             492279
32   NM           2936             495215         588183 NM, 622048 NM, 601448 NM, 534272 NM,
                                                  549171 NM, 513080 NM, 626895 NM
33   NY         144354             639569         644818 NY
34   NC           6204             645773
35   ND            635             646408
36   OH           9184             655592
37   OK           2728             658320
38   OR           6784             665104         675048 OR
39   PA          15971             681075
40   RI           2907             683982
41   SC           2110             686092
42   SD            570             686662
43   TN           3608             690270         697856 TN
44   TX          56158             746428
45   UT           2951             749379
46   VT            658             750037
47   VA          15342             765379         771280 VA
48   WA          18180             783559
49   WV            663             784222
50   WI           5328             789550
51   WY            217             789767
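The cumulative total method behind a table of this kind can be sketched in Python as follows. This is an illustration only: it assumes the usual rule that a random number r between 1 and the grand total selects the first unit whose cumulative total is at least r, and the function names and the toy size measures are hypothetical rather than taken from the example.

```python
import bisect
import itertools
import random


def select_by_cumulative_total(cumulative, r):
    """Return the 0-based index of the first unit whose cumulative total is >= r."""
    return bisect.bisect_left(cumulative, r)


def ppswr_sample(sizes, n, rng):
    """Draw n units with replacement, with probability proportional to integer sizes."""
    cumulative = list(itertools.accumulate(sizes))
    total = cumulative[-1]
    # One random number in 1..total per draw; repeats are allowed (with replacement).
    draws = [rng.randint(1, total) for _ in range(n)]
    return [select_by_cumulative_total(cumulative, r) for r in draws]


# Toy size measures (hypothetical): cumulative totals are 10, 30, 100.
toy_sizes = [10, 20, 70]
toy_sample = ppswr_sample(toy_sizes, n=5, rng=random.Random(12345))
```

With these toy sizes, random numbers 1 to 10 select the first unit, 11 to 30 the second, and 31 to 100 the third, so the selection probabilities are 0.1, 0.2, and 0.7.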
We used the first two columns of the Pseudo-Random Numbers to select a matched
sample $s_m$ of $m = 4$ units from the given sample $s_1$ of $n = 10$ units by
SRSWOR sampling. The sample $s_m$ selected is given by
~ "'\'
'0 .,
737756225 .7
. J
f (XI I. X(X YI ~J 'f YI )
A A
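Also we have
\[ \phi = \frac{\left[ \left( \frac{1}{4} - \frac{1}{10} \right)\left( 1 - 0.3949^2 \right) + \frac{1}{10} \right]^{-1}}{\left[ \left( \frac{1}{4} - \frac{1}{10} \right)\left( 1 - 0.3949^2 \right) + \frac{1}{10} \right]^{-1} + 6} = \frac{4.412905}{10.412905} = 0.4238. \]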
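The arithmetic for the weight can be checked with a short Python sketch. It assumes, as a reading of the numbers above, that the matched part has relative variance (1/m - 1/n)(1 - delta^2) + 1/n, the unmatched part has relative variance 1/u, and the weight on the matched estimate is the share of the unmatched variance in the total; the function names are hypothetical.

```python
def composite_weight(m, n, u, delta):
    """Weight phi on the matched estimate when the two parts are combined by inverse variance."""
    matched = (1.0 / m - 1.0 / n) * (1.0 - delta ** 2) + 1.0 / n  # relative variance, matched part
    unmatched = 1.0 / u                                            # relative variance, unmatched part
    return unmatched / (matched + unmatched)


def composite_estimate(phi, y_matched, y_unmatched):
    """Composite estimate phi * Y_m + (1 - phi) * Y_u."""
    return phi * y_matched + (1.0 - phi) * y_unmatched


# Values used in the example: m = 4, n = 10, u = 6, estimated correlation 0.3949.
phi = composite_weight(m=4, n=10, u=6, delta=0.3949)
```

Dividing numerator and denominator by the matched relative variance gives the same ratio as in the display, so `phi` rounds to 0.4238.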
Hence the composite estimate of total immigration Y during 1996 in the United
States is given by
The last possibility, estimating the population mean on the second occasion using
information from the first occasion, has been discussed extensively by several
research workers, including Jessen (1942), Tikkiwal (1950, 1955), Rao and Mudholkar
(1967), Das (1982), and Artes and Garcia (2000a, 2000b). The ratio estimator in
successive sampling was first studied by Avdhani (1968), followed by Sen,
Seller, and Smith (1975) and Artes and Garcia (2001a, 2001b). Gupta (1970) and
Artes, Rueda, and Arcos (1998) studied the product estimator in successive sampling.
The problem of estimating the ratio of two population means in successive sampling
over two occasions has been studied by Rao and Mudholkar (1967), Okafor and
Arnab (1987), Okafor (1992), and Artes and Garcia (2001c, 2001d). Artes and
Garcia (2001e) developed an unbiased ratio cum product estimator of the
population mean in successive sampling. Singh and Yadav (1992) have discussed a
generalized estimation procedure for successive sampling. Sud, Srivastava, and
Sharma (2001a, 2001b) have considered the estimation of the population variance
over repeated surveys. Okafor (2001) discusses some successive sampling
estimation strategies in the presence of random non-response.
The basic idea of using least squares to incorporate information from a previous
occasion into the estimate for the current occasion is due to Jessen (1942). Patterson
(1950) considered the use of information from rotating samples. His ingenious idea
spawned a vast literature, including Eckler (1955), Rao and Graham
(1964), Raj (1965b), Smith (1978), Wolter (1979), Jones (1980), Huang and Ernst
(1981), Kumar and Lee (1983), Breau and Ernst (1983), Singh (1996), and
Yansaneh and Fuller (1998). Duncan and Kalton (1987), Schreuder, Gregoire, and
Wood (1993), Fuller (1990), and Lent, Miller, and Cantwell (1996) discussed different
kinds of repeated surveys and their objectives. Kasprzyk, Duncan, Kalton, and Singh
(1989) have discussed various aspects of panel surveys in an excellent way. A
rotation survey is one in which a unit is observed for a partial set of time points and
is not observed for the remaining time points in the study. The Canadian
Labour Force Survey and the U.S. Current Population Survey are well-known
examples of rotation surveys. The National Resources Inventory (NRI) is nearly
a pure panel survey of certain land areas with a five-year observation interval. A pure
panel survey can be defined as a survey in which the same units are observed at
each time point of a survey conducted at more than one time point. A longitudinal
survey can be defined as a survey conducted at more than two points in time with
multiple observations on some units planned as part of the survey design. Fuller and
Breidt (1999) introduced generalized least squares estimators for such surveys.
Following them, consider a simple three-period survey in which one fourth of the
units are observed in all three periods and each of the remaining three sets of one
fourth of the units is observed in exactly one of the three periods. In other words, if
$n$ is the total sample size, then $0.5n$ of the units are observed at each time point. Let
$(Y_1, Y_2, Y_3)$ denote the values of a characteristic observed at times one, two, and three,
respectively. Assume that the correlation between observations at time $i$ and time
$j$ on the same element is $\rho_{|i-j|}$. For simplicity assume SRS for the selection of all
samples. Let $(\bar{y}_{11}, \bar{y}_{21}, \bar{y}_{31})'$ denote the estimated means at times one, two, and three
of the sample elements that are observed in all three periods. Let $(\bar{y}_{12}, \bar{y}_{23}, \bar{y}_{34})'$
denote the sample means for the three periods for the sample elements that are
observed once. These six estimators can be called elementary estimators. We
wish to estimate the population means $\mu = (\mu_1, \mu_2, \mu_3)'$ for the three periods.
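\[ Y = X\mu + e, \tag{10.10.1} \]
\[ \Omega = 4n^{-1}\sigma^2 \begin{pmatrix} 1 & \rho_1 & \rho_2 & 0 & 0 & 0 \\ \rho_1 & 1 & \rho_1 & 0 & 0 & 0 \\ \rho_2 & \rho_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}. \tag{10.10.2} \]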
It is to be noted that the term $4n^{-1}\sigma^2$ is the variance of the mean of $n/4$
observations. By applying the weighted least squares method, the best linear
unbiased estimator of $\mu = (\mu_1, \mu_2, \mu_3)'$ is given by
\[ \hat{\mu} = (\hat{\mu}_1, \hat{\mu}_2, \hat{\mu}_3)' = \left( X'\Omega^{-1}X \right)^{-1} X'\Omega^{-1} Y, \tag{10.10.3} \]
and
\[ V(\hat{\mu}) = \left( X'\Omega^{-1}X \right)^{-1}. \tag{10.10.4} \]
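The generalized least squares step in (10.10.3) and (10.10.4) can be tried numerically with the following pure-Python sketch. It assumes that the design matrix stacks two 3 x 3 identity blocks, one for the panel means and one for the single-period means, since each elementary estimator is taken to estimate one period mean; that form of X, the helper names, and the numerical inputs are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def mat_inv(a):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rotation_omega(n, sigma2, rho1, rho2):
    """Covariance matrix (10.10.2) of the six elementary estimators."""
    c = 4.0 * sigma2 / n
    corr = [[1.0, rho1, rho2], [rho1, 1.0, rho1], [rho2, rho1, 1.0]]
    omega = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            omega[i][j] = c * corr[i][j]   # panel units: correlated over time
        omega[3 + i][3 + i] = c            # units observed once: independent
    return omega


def gls(x, omega, y):
    """Return (mu_hat, covariance) as in (10.10.3) and (10.10.4)."""
    xt_oinv = mat_mul(transpose(x), mat_inv(omega))
    cov = mat_inv(mat_mul(xt_oinv, x))
    mu_hat = mat_mul(mat_mul(cov, xt_oinv), [[v] for v in y])
    return [row[0] for row in mu_hat], cov


# Assumed design matrix: two stacked 3 x 3 identity blocks.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical elementary estimates and parameters.
mu_hat, cov = gls(X, rotation_omega(n=100, sigma2=1.0, rho1=0.8, rho2=0.6),
                  [1.0, 2.0, 3.0, 3.0, 4.0, 5.0])
```

When both correlations are zero the six elementary estimators are uncorrelated with equal variance, so each period mean is estimated by the simple average of its two elementary estimators.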
Fuller and Breidt (1999) compared the variance-covariance
matrix in (10.10.4) with the variance-covariance matrix of a pure panel survey in
which the same $n/2$ units are observed in all three periods. The best linear
unbiased estimator of $\mu$ under the pure panel design is
\[ \hat{\mu}_{\mathrm{panel}} = (\bar{y}_1, \bar{y}_2, \bar{y}_3)'. \tag{10.10.5} \]
The variance-covariance matrix for the pure panel design is given by
The major benefit of re-sampling methods is that a single standard error formula
can be used for all statistics, unlike the linearization method, which requires the
derivation of a separate formula for each statistic. Sometimes the linearization
method becomes too cumbersome, especially for post-stratification and
non-response adjustments. Such situations can be handled very
easily through re-sampling methods. Re-sampling methods are found to be more
applicable to stratified random sampling or stratified multi-stage sampling.
Establishment surveys are examples of stratified random sampling, whereas
large socio-economic surveys are examples of stratified multi-stage sampling. In
the case of stratified multi-stage sampling, re-sampling methods are valid if the
sample clusters are selected with replacement or the first stage sampling fraction is
negligible. Here we consider a stratified multistage design with a large number of
strata, $L$, and relatively few primary sampling units or clusters, $m_h \ge 2$, sampled
within each stratum, $h = 1, 2, \ldots, L$. We assume that sub-sampling within sampled
clusters $i = 1, 2, \ldots, m_h$ is performed to ensure unbiased estimation of cluster totals.
The basic design weights $w_{hik}$ attached to the sample elements $(hik)$ are adjusted for
post-stratification and unit non-response to get adjusted weights $w^{*}_{hik}$. The weights
$w^{*}_{hik}$ may also be calibrated weights. As we saw from Godambe's work, many
parameters of interest, such as the mean, total, median, and variance, can be derived
as a solution to the census equation
\[ S(\theta) = \sum_{(hik) \in \Omega} u(y_{hik}, \theta) = 0. \tag{10.11.1} \]
Obviously the GREG estimator of $S(\theta)$ is
\[ \hat{S}(\theta) = \sum_{(hik) \in s} w^{*}_{hik}\, u(y_{hik}, \theta). \tag{10.11.2} \]
In this section we discuss different methods of estimating the variance of the
estimator $\hat{S}(\theta)$, viz. the Jackknife, Balanced Repeated Replication (BRR), and Balanced
Half Sample (BHS) methods. Ahmad (1997) also suggested a re-sampling
technique for complex survey data. Willson, Kimos, Gallagher, and Wanger (2002)
considered variance estimation from calibrated samples, which improves on techniques
adopted by statistical packages such as SUDAAN, SAS, and STATA.
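At the first step, keep all the sampled units and solve the equation
\[ \hat{S}(\theta) = \sum_{(hik) \in s} w^{*}_{hik}\, u(y_{hik}, \theta) = 0 \tag{10.11.1.1} \]
to obtain an estimator $\hat{\theta}$ of $\theta$. Delete one sampled cluster, say $(lj)$, and adjust the
weights $w^{*}_{hik}$ according to the following two steps.
Step I. Define $w_{hik}(lj) = w_{hik}\, b_{hi}(lj)$, where
\[ b_{hi}(lj) = \begin{cases} 0 & \text{if } (hi) = (lj), \\ \dfrac{m_l}{m_l - 1} & \text{if } h = l \text{ and } i \ne j, \\ 1 & \text{if } h \ne l. \end{cases} \]
Step II. Replace $w_{hik}$ by $w_{hik}(lj)$ in the post-stratification process to get adjusted
Jackknife weights $w^{*}_{hik}(lj)$.
Now solve the equation
\[ \hat{S}_{(lj)}(\theta) = \sum_{(hik) \in s,\ (hi) \ne (lj)} w^{*}_{hik}(lj)\, u(y_{hik}, \theta) = 0 \]
to obtain the estimator $\hat{\theta}_{(lj)}$ when the sampled cluster $(lj)$ is deleted.
Then a Jackknife variance estimator of the estimator $\hat{\theta}$ is given by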
(10.11.1.2)
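As an illustration of the delete-one-cluster procedure, the following Python sketch applies it to an estimated total. It assumes a commonly used form of the Jackknife variance estimator, the sum over strata of (m_l - 1)/m_l times the sum over deleted clusters of the squared deviation of the replicate estimate from the full-sample estimate; this form, the function name, and the input values are assumptions, and post-stratification is not modelled.

```python
def jackknife_variance_total(strata):
    """Delete-one-cluster jackknife for a total.

    strata: list of strata, each a list of weighted cluster totals, so that the
    full-sample estimate is the sum of all entries.
    """
    theta_hat = sum(sum(clusters) for clusters in strata)
    variance = 0.0
    for clusters in strata:
        m = len(clusters)
        if m < 2:
            raise ValueError("each stratum needs at least two sampled clusters")
        stratum_total = sum(clusters)
        for deleted in clusters:
            # Remaining clusters in this stratum are re-weighted by m / (m - 1);
            # clusters in all other strata keep their weights.
            replicate_stratum = (m / (m - 1)) * (stratum_total - deleted)
            theta_lj = theta_hat - stratum_total + replicate_stratum
            variance += (m - 1) / m * (theta_lj - theta_hat) ** 2
    return theta_hat, variance


# Hypothetical weighted cluster totals for two strata.
theta_hat, v_jack = jackknife_variance_total([[10.0, 14.0], [5.0, 5.0, 8.0]])
```

For a linear statistic such as a total, this replicate form agrees with the usual between-cluster variance formula within strata, which gives a simple way to check an implementation.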
Example 10.11.1. Divide the United States of America into four independent strata:
Northeast, Midwest, South, and West. Suppose the first stratum, Northeast,
consists of two clusters, New England and Mid Atlantic; the second stratum, Midwest,
consists of two clusters, East North Central and West North Central; the third stratum,
South, consists of three clusters, South Atlantic, East South Central, and West South
Central; and the fourth stratum, West, consists of two clusters, Mountains and Pacific,
as given in population 7 in the Appendix. From each stratum (or region) select two
clusters (or divisions) by SRSWOR sampling and within each selected cluster (or
division) select two units (or states) and collect the information on the projected
population counts during 1995.
( a ) Suggest an estimator for estimating the population total in the US using a stratified
multi-stage design.
( b ) Estimate the variance of the estimator of population total by using the
Jackknife technique.
( c ) Find the 95% confidence interval for the total projected population during 1995 in
the USA.
Solution. Let $y_{hik}$ be the value of the study variable for the $k$th unit (projected count in a
state during 1995) in the $i$th cluster (division) of the $h$th stratum (region).
Stratum h   Cluster i   M_hi         N_h   n_h   m_hi
1           1           M_11 = 6     2     2     2
            2           M_12 = 3                 2
2           1           M_21 = 5     2     2     2
            2           M_22 = 7                 2
3           1           M_31 = 9     3     2     2
            2           M_32 = 4                 2
            3           M_33 = 4
4           1           M_41 = 8     2     2     2
            2           M_42 = 5                 2
Let $L$ be the total number of strata, $N_h$ be the total number of clusters in the $h$th
stratum, $n_h$ be the number of clusters selected from the $h$th stratum, $M_{hi}$ be the
total number of units in the $i$th cluster of the $h$th stratum, and $m_{hi}$ be the number of
units selected from the $i$th cluster of the $h$th stratum.
+~[2.(5180
22
+ 7197)+i(3740 + 4282)] +~[!(1018 +3407)+2(31749 + 1253)]
2 22 2
which is an estimate of the projected population of the United States during 1995 as
shown in the appendix .
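A minimal Python sketch of a stratified two-stage estimate of a total is given below. It assumes SRSWOR at both stages and the usual expansion form, the sum over strata of (N_h / n_h) times the sum over sampled clusters of (M_hi / m_hi) times the sum of the sampled unit values; the function name, data layout, and input values are illustrative assumptions.

```python
def stratified_two_stage_total(strata):
    """Expansion estimate of a population total under a stratified two-stage design.

    strata: list of dicts with keys
      "N_h"      - number of clusters in the stratum,
      "clusters" - list of (M_hi, sampled_values) for the sampled clusters.
    """
    total = 0.0
    for stratum in strata:
        clusters = stratum["clusters"]
        n_h = len(clusters)
        cluster_sum = 0.0
        for M_hi, values in clusters:
            m_hi = len(values)
            cluster_sum += (M_hi / m_hi) * sum(values)   # estimated cluster total
        total += (stratum["N_h"] / n_h) * cluster_sum    # estimated stratum total
    return total


# Illustrative input: one stratum with two clusters and two sampled units each.
example = [{"N_h": 2, "clusters": [(5, [5180.0, 7197.0]), (7, [3740.0, 4282.0])]}]
estimate = stratified_two_stage_total(example)
```

Each inner term is an unbiased estimate of a cluster total under SRSWOR within the cluster, and the outer factor expands the sampled clusters to the stratum.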
Thus, using the Jackknife, an estimate of the variance $v(\hat{Y})$ is given by
A half sample can be formed by selecting one of the two first stage sample clusters
from each stratum. A set of $R$ half samples can be defined by an $R \times L$ matrix
$H$ with the $(r, h)$th element defined as
\[ \varepsilon_{rh} = \begin{cases} +1 & \text{if the cluster of the } h\text{th stratum in the } r\text{th half sample is the first of the first stage sample clusters,} \\ -1 & \text{if the cluster of the } h\text{th stratum in the } r\text{th half sample is the second of the first stage sample clusters.} \end{cases} \]
Then a set of $R$ half samples is said to be balanced if $\sum_{r=1}^{R} \varepsilon_{rh}\varepsilon_{rh'} = 0$ for all $h \ne h'$. A
balanced $H$ can be made from an $R \times R$ Hadamard matrix by choosing any $L$
columns excluding the column of all values of $+1$, where $L + 1 \le R \le L + 4$. Let
$\hat{\theta}^{(r)}$ be the survey estimator of $\theta$ obtained by treating the $r$th half sample as the
original data set after adjusting the weights $w_{hik}$ as
\[ w^{(r)}_{hik} = \begin{cases} 2w_{hik} & \text{if } \varepsilon_{rh} = +1, \\ 0 & \text{if } \varepsilon_{rh} = -1. \end{cases} \tag{10.11.2.1} \]
Using these weights in place of the Jackknife weights, a standard Balanced Half
Sample (BHS) variance estimator of $\hat{\theta}$ is given by
\[ v_{BHS}(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} \left[ \hat{\theta}^{(r)} - \hat{\theta} \right]\left[ \hat{\theta}^{(r)} - \hat{\theta} \right]'. \tag{10.11.2.2} \]
A limitation of the BHS method is that some of the replicate estimators may become
extreme because only a half sample is used. In other words, the weights $w_{hik}$ are
sharply perturbed. One well-known method to solve this problem is to change the
weights $w_{hik}$ as
\[ w^{(r)}_{hik} = \begin{cases} 1.5\,w_{hik} & \text{if } \varepsilon_{rh} = +1, \\ 0.5\,w_{hik} & \text{if } \varepsilon_{rh} = -1. \end{cases} \tag{10.11.2.3} \]
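The half-sample weights in (10.11.2.1) and (10.11.2.3) can be compared with a small Python sketch for an estimated total with two sampled clusters per stratum. The replicate weights are written as (2 - k) and k times the original weight, which gives 2 and 0 at k = 0 and 1.5 and 0.5 at k = 0.5. The divisor R(1 - k)^2 is the usual adjustment for perturbed weights and is an assumption here, as are the function names, the use of a Sylvester-type Hadamard matrix (whose order is a power of two and so may exceed L + 4), and the input values.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of a power-of-two order by the Sylvester doubling construction."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def balanced_design(num_strata):
    """R x L matrix of +1/-1 entries with mutually orthogonal columns."""
    order = 1
    while order < num_strata + 1:
        order *= 2
    hadamard = sylvester_hadamard(order)
    # Column 0 is the all +1 column and is excluded.
    return [row[1:num_strata + 1] for row in hadamard]


def replicate_variance_total(pairs, k=0.0):
    """Half-sample variance of a total; pairs holds (t1, t2) weighted cluster totals per stratum."""
    if not 0.0 <= k < 1.0:
        raise ValueError("k must satisfy 0 <= k < 1")
    design = balanced_design(len(pairs))
    theta_hat = sum(t1 + t2 for t1, t2 in pairs)
    squared = 0.0
    for row in design:
        theta_r = 0.0
        for eps, (t1, t2) in zip(row, pairs):
            up, down = (t1, t2) if eps == 1 else (t2, t1)
            theta_r += (2.0 - k) * up + k * down   # perturbed replicate weights
        squared += (theta_r - theta_hat) ** 2
    return squared / (len(design) * (1.0 - k) ** 2)


# Hypothetical weighted cluster totals for three strata.
pairs = [(10.0, 14.0), (5.0, 9.0), (7.0, 4.0)]
v_bhs = replicate_variance_total(pairs, k=0.0)
v_perturbed = replicate_variance_total(pairs, k=0.5)
```

For a total, balance makes the cross-stratum terms cancel, so both calls return the sum of squared within-stratum differences; the two weighting schemes differ only for non-linear statistics.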
For practical use, we need a BRR with the number of replications ($R$) as small as
possible. A balance can be struck between the desire for a small $R$ and the need to
have a reasonably stable variance estimator by following Wolter (1985). It may not
be trivial to find a BRR, but some methods have been suggested by Gupta and
Nigam (1987), Gurney and Jewett (1975), Sitter (1993), and Wu (1981). One may
note that the standard BHS variance estimator can also use the weights as
The original BHS was proposed by McCarthy (1969) for $m_h = 2$ and was further
studied by Krewski and Rao (1979). It is difficult to construct balanced replicates
for an arbitrary value of $m_h$; the limitations are discussed by Valliant (1996).
Fortunately there is another way to deal with an arbitrary value of $m_h$, namely
the Bootstrap method.
A bootstrap sample can be obtained by drawing $(m_h - 1)$ clusters with replacement
from the $m_h$ sample clusters in each stratum independently. Select a large number,
$R$, of bootstrap samples independently, which can be represented in terms of the
bootstrap frequencies $m_{hi}(r)$ = number of times the $(h, i)$th sample cluster is selected
in the $r$th bootstrap sample, $r = 1, 2, \ldots, R$.
Then the bootstrap design weights are given by
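The drawing of bootstrap frequencies and the resulting replicate weights can be sketched in Python as follows. The sketch assumes the commonly used rescaling in which each design weight is multiplied by m_h / (m_h - 1) and by the bootstrap frequency of its cluster; that form, the function names, and the input values are assumptions for illustration.

```python
import random


def bootstrap_frequencies(m_h, rng):
    """Draw m_h - 1 clusters with replacement from m_h sampled clusters; return counts per cluster."""
    if m_h < 2:
        raise ValueError("at least two sampled clusters are needed in a stratum")
    counts = [0] * m_h
    for _ in range(m_h - 1):
        counts[rng.randrange(m_h)] += 1
    return counts


def bootstrap_weights(weights, counts):
    """Rescale design weights within one stratum.

    weights: list of clusters, each a list of element design weights.
    counts:  bootstrap frequency of each cluster in this replicate.
    """
    m_h = len(counts)
    factor = m_h / (m_h - 1)
    return [[w * factor * c for w in cluster] for cluster, c in zip(weights, counts)]


# Hypothetical stratum with three sampled clusters.
rng = random.Random(2024)
counts = bootstrap_frequencies(3, rng)
replicate = bootstrap_weights([[10.0, 10.0], [20.0], [5.0]], counts)
```

Clusters that are not drawn in a replicate receive zero weight, and the factor m_h / (m_h - 1) compensates for drawing one cluster fewer than the sample contains.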
Several research workers, including Kott and Stukel (1998), Lohr and Rao (1998),
Rao and Shao (1992), Rao (1996b), Rao and Shao (1999), Shao, Chen, and Chen
(1998), and Shao and Chen (1999), have considered applications of the above
standard techniques, with intelligent modifications, to estimate the variance in the
presence of non-response or imputed data through different techniques like hot
deck, cold deck, ratio imputation, and regression imputation, which we shall discuss
in detail in Chapter 12.
EXERCISES
Exercise 10.1. A sample of size $n$ is to be taken from a population of size $N$ to
estimate the total $Y$ of values $y_i$ $(i = 1, 2, \ldots, N)$ of a variable $y$ when 'normed' size
measures $p_i$ are available. Suppose the population is divided into $n$ random
Exercise 10.2. Let $Y_i$ and $X_i$ be the totals of the $i$th FSU for the study variable $Y$
and auxiliary variable $X$, respectively. Assume that the population total $X = \sum_{i=1}^{N} X_i$
of the auxiliary variable is known and we are interested in the estimation of the
population total of the study variable, $Y = \sum_{i=1}^{N} Y_i$. A sample $s$ of $n$ FSUs is selected
according to any design with $\pi_i$ and $\pi_{ij}$ as the known first and second order
inclusion probabilities, which are in fact functions of the first stage sample size $n$. The
selected FSUs are sub-sampled independently with suitable selection probabilities
in the second stage. When the $i$th unit is selected, it is assumed that from
sampling in the second stage estimators $t_{iy}$ and $t_{ix}$, respectively, for $Y_i$ and $X_i$ are
available such that $E_2(t_{iy}) = Y_i$ and $E_2(t_{ix}) = X_i$. Find the asymptotic properties of
the following estimators:
when the first and second stage units are selected with SRSWOR sampling .
when the first stage units are selected with PPSWOR sampling and second stage
units are selected with SRSWOR sampling.
(e ) t5 = [N
n
r. (M;yJM;X;)][
;=1 X;
N r. x;J/x
n ;= 1
[Sahoo and Swain (1986)]
Exercise 10.3. Show that Hartley and Ross (1954) unbiased ratio type estimator of
population mean in two-stage sampling is
tu -- +(N-l)(
=rX - - sb rx -(}sbrz )+- n Z
1 La; [M;-l
---- N-l(M
- ;-m;JJ(S;rx -(}S;rz )'
N n;=1 M; N M;m;
f
where
a; = NM; LM;,
N
I
() = XrtzZ, r;
;=1
L
= m; rij /
j =1
m;, r = La;r;
n
;=1
n,
Exercise 10.4. In two-stage sampling $n$ FSUs are selected with PPSWR sampling.
If the $i$th FSU occurs $r_i$ times in the sample, one of the following procedures may be
adopted for the second stage:
( a ) $r_i m_i$ SSUs are selected with SRSWOR sampling;
( b ) $r_i$ independent samples of $m_i$ SSUs are selected with SRSWOR sampling;
( c ) $m_i$ units are selected WOR and observations are weighted by $r_i$.
Propose unbiased estimators of the population total $Y$ and derive the expressions for
the variance in each situation. If $V_a$, $V_b$, and $V_c$ denote the variances of the
estimators under ( a ), ( b ), and ( c ), respectively, then show that the inequality
$V_a \le V_b \le V_c$ holds.
Hint: Rao (1961).
Exercise 10.5. A sample of n FSUs is selected with SRSWOR sampling and from
each selected FSU a constant fraction I: of SSU is selected. If If out of the mi
SSUs in the lh FSU possess an attribute, show that the estimator ratio to size
p = ~If/i~mi estimates the population proportion of the attribute. Find the
variance of the estimator p and suggest an estimator of variance.
Hint: Cochran (1977).
Exercise 10.6. ( a ) Suppose $n$ FSUs are selected with PPSWR sampling in two
stage sampling. From each sampled FSU, $m$ SSUs are selected with SRSWOR
sampling. Find the bias and variance of the estimator of population total defined as
$\hat{Y}_R = \sum_{i=1}^{n} a_i \hat{Y}_i$, where the $a_i$ are real numbers and $\hat{Y}_i$ is an unbiased estimator of the
$i$th FSU total. Deduce an unbiased estimator of variance.
Hint: Raj (1966).
( b ) In two stage sampling, n FSUs are selected from N FSUs in the population by
Midzuno--Sen's sampling scheme and the mi SSUs are selected from the lh
selected FSU by SRSWOR sampling. Derive an unbiased estimator of population
total, and expression for its variance and estimator of variance.
Hint: Raj (1954a, 1954b).
Exercise 10.7. Suppose $n$ FSUs are selected with PPSWR sampling. From the $i$th
selected FSU of size $M_i$, suppose $m_i$ SSUs are selected with SRSWR sampling.
For estimating the population total Y the sub-sampling number mi are to be fixed
in two ways:
( a ) Expected value of mi is m;
( b ) Total number of SSUs is mo.
Find the optimum value of mi in each situation such that the variance of the
estimator of population total is minimum. Hence deduce the efficiency comparison.
Hint: Rangarajan (1957).
Exercise 10.8. Show that in the following cases the sample mean is an unbiased
estimator of population mean.
( a ) Divide the population into N clusters each of size M units. n FSUs are
selected with SRSWR sampling. From each selected FSU, m SSUs are selected
with SRSWR sampling.
( b ) Divide the population into clusters each of m' units. Select a sample of n'
clusters with SRSWR sampling.
( c ) Deduce the condition such that the efficiency of both schemes will be the same.
Hint: Singh (1956).
Exercise 10.10. Suppose $n$ units are selected with SRSWR sampling on the first
occasion and a sub-sample of $n\pi$ units is selected by a similar method and is
retained for the second occasion. A supplementary sample of $n(1 - \pi)$ units is
selected independently by the same method. Find the constants $\alpha_i$,
$\beta_i$ $(i = 1, 2)$ and $\pi$ such that the variance of the estimator of population mean on
the second occasion defined as
is a minimum.
Hint: Hansen, Hurwitz, and Madow (1953).
Exercise 10.14. Let Pi> j = 1,2, ..., N , be the probability of selecting a unit U j
based on a variable z, N being the number of units in the population. On the first
N
occasion select a sample St of n units with probability P] such that L: P] = 1
j =t
using PPSWR sampling and observe the study variable y . On the hth occasion
of size U either from the entire population or from the non-sampled units at the first
occasion by any suitable sampling design Pu • Assuming that the initial sample SI is
selected by using the Rao--Hartley--Cochran (RHC) scheme, compare the following
two estimator
'0 = I (OJ
Y2 =t,6Y2m+ (1-t,6)Y2u, and Y2m r; p;0)-
P; + fJZ ,
iES m
where
Y2m = I~;jp;~ ; Y2u= I (y2d p; )P;' ; Y;; =Y2;lld p;; J}= Ip;;P;'=Ip; ,with
;esm ;esu om Ou
Om and 0u being the groups of those values associated with those units that
belong to the random group from which the t h unit was selected in Sm and SU '
respectively. In addition
Exercise 10.16. Study the bias and variance properties of the following estimators
of population total in two stage sampling:
13 = .f ~~ x ;(xj.fx;J,
,=1Xl ,=1
14 = [N t (M;y;XM;X;)][N tx;J/x ,
n ,=\ X, n ,=1
and
15 »» -
n ;=\
-
=-I[M;y;+d;(M;x;-X;) nX NIX;
n (j
Jg
;=1
Hint: Sahoo and Panda (1997).
Exercise 10.18. Let a general linear unbiased estimator of population total Y under
a multi-stage design be defined as
A N .
Yms = LtisYi = LtisYi
ies i=1
where
tis• = {tis if i E S,
. and E( tis) = LtisP ( S ) = 1.
o otherwise, ies
(ii) In the case of ordered sampling the well known Raj estimator for n = 2
becomes
Vd'l =.!..[(I_lId.:~LY2)2
\11 P2
_(I_lIY{V2S~I)+
11
V2S(;2)}
P2
l
4
Exercise 10.19. Consider a population 0 consists of N first stage units such that
N
the t h first stage unit O . (say) containing M; second stage units and M
I
= 'IM;
~
. Let
y;, X; and Z; be the totals of 0; for the study variable Y and two auxiliary
N N
variables x and z, respectively, with corresponding totals Y = IY;, X = IX; ,
;=1 ;=1
N
and Z = IZi . At the first stage a sample s e n of n first stage units is drawn from
i=l
n according to any design with lfi and lfij as the known and positive first and
second order inclusion probabilities. For every i Es, a sample Si of mi second
stage units is drawn from n I. with suitable selection probabilities at the second
stage. Let r;, Xi and Zi be the unbiased estimates of Y;, Xi and Z., respectively,
I
such that
Vz(r;)= (Ti~' V z(X i)= (Ti~' Vz(z;) = (Ti; ' Cz(r;, Z;)= (Tiyz ' cz(r;, X;) = (Tiyx' and
Cz(Xi' z;) = (Tixz .
,
Y=I-L, y:
X=I-',andZ =I-L
' -
From the first phase sample information define
- ' X Z· -
ies lfi ies lfi ieslfi
such that
E(Y)=Y, E(X)=X, and E(Z)=Z.
Also
,) N (T2
V (Y = (T; + I -2.:..,
i=1lfi
N(Tixy
, ,) (Txy+ L--, , ,) N (T.
C(Y,X = CXZ=(T
( , xz +~--lB..
L..J ,
i=l lfi i=1 lfi
where
(Txz =-I
1N IN ( {X
X')[Z.
lfilfj-lfij _, __J J
-L _ _ z.) •
2 i=lj(",i)l lfi Ifj lfi Ifj
where G(., .), H(., .), hi (., .) and gi(" .) are the parametric functions as defined by
Srivastava (1980).
Hint: Sahoo and Panda (1999a, 1999b).
Exercise 10.20. Let $K_h$ be the number of primary clusters in the $h$th stratum of the
finite population. At the first stage of sampling, $k_h = c_h(K_h)$ clusters are sampled
from stratum $h$ as a SRSWOR, where $c_h(1) = 1$ and $c_h(t) \ge 2$ for $t \ge 2$. At the
second stage of sampling, $n_{hi} = g_h(N_{hi})$ observations are sampled from a sampled cluster, where
$N_{hi}$ is the number of population units in this cluster. Let $\bar{y}_{hi}$ be the mean of these
sampled observations.
( a ) Show that an unbiased estimator of the population mean $\mu$ is given by
\[ \bar{y} = \sum_{h=1}^{L} \frac{K_h}{k_h} \sum_{i=1}^{k_h} N_{hi}\bar{y}_{hi} \Bigg/ \sum_{h=1}^{L} \frac{K_h}{k_h} \sum_{i=1}^{k_h} N_{hi}. \]
( b ) Also show that if the f.p.c. is ignored, then an estimator of the variance of the
estimator $\bar{y}$ under the concept of repeated sampling is given by
\[ v(\bar{y}) = \sum_{h=1}^{L} \frac{K_h^2}{k_h(k_h - 1)} \sum_{i=1}^{k_h} \left[ N_{hi}(\bar{y}_{hi} - \bar{y}) - \frac{1}{k_h} \sum_{j=1}^{k_h} N_{hj}(\bar{y}_{hj} - \bar{y}) \right]^2 \Bigg/ \left( \sum_{h=1}^{L} \frac{K_h}{k_h} \sum_{i=1}^{k_h} N_{hi} \right)^2. \]
Hint: Korn and Graubard (1998).
Exercise 10.21. In Exercise 10.20 select the first stage $k_h$ clusters with PPSWOR
sampling using a cluster level 'size' variable ($Z$, say) for constructing selection
probabilities. At the second stage of sampling, $n_{hi} = g_h(N_{hi}) \ge 2$ observations are
sampled as a SRSWOR from the $i$th sampled cluster from the $h$th stratum. Show that
the estimator of the variance of the estimator of population mean $\bar{Y}$ defined as
\[ \bar{y}_1 = \sum_{h=1}^{L} \sum_{i=1}^{k_h} \frac{N_{hi}\bar{y}_{hi}}{\pi_{hi}} \Bigg/ \sum_{h=1}^{L} \sum_{i=1}^{k_h} \frac{N_{hi}}{\pi_{hi}} \]
is given by
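\[ v(\bar{y}_1) = \sum_{h=1}^{L} \frac{k_h}{k_h - 1} \sum_{i=1}^{k_h} \left[ \left( \frac{N_{hi}\bar{y}_{hi}}{\pi_{hi}} - \bar{y}_1 \frac{N_{hi}}{\pi_{hi}} \right) - \frac{1}{k_h} \sum_{j=1}^{k_h} \left( \frac{N_{hj}\bar{y}_{hj}}{\pi_{hj}} - \bar{y}_1 \frac{N_{hj}}{\pi_{hj}} \right) \right]^2 \Bigg/ \left( \sum_{h=1}^{L} \sum_{i=1}^{k_h} \frac{N_{hi}}{\pi_{hi}} \right)^2. \]
Hint: Korn and Graubard (1998).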
Exercise 10.22. Consider a population consisting of $N$ FSUs, where the $i$th FSU contains
$M_i$ SSUs. Select a sample of $n$ FSUs by PPSWR (and by PPSWOR) sampling.
From the $i$th selected FSU select $m_i$ SSUs by SRSWOR sampling. Derive an
expression in each case for the optimum allocation of second stage units under the
constraint that the total number of sampled SSUs remains fixed.
Hint: Rangarajan (1957), Chaudhuri and Arnab (1982).
Exercise 10.25. Find the value of e such that the following estimator based on two
occasions is an unbiased estimator of population mean
2 _
Exercise 10.26. Derive an estimator of the parameter r = Iaj}j where a., i = 1,2
j=l
are constants. Assume that $s_1$ is an SRSWOR sample of $n$ units from population
$\Omega$, $s_{2m}$ is an SRSWOR sample of $m$ matched units, and $s_{2u}$ is an SRSWOR
sample of $n - m$ unmatched units from the rest of the population $\Omega - s_1$. Obtain the
optimum values of the sample sizes $n$ and $m$ such that the variance of the estimator
you suggest is minimum for the fixed cost of the survey given by
\[ C = C_0 + nC_1 + mC_2 + C_3(n - m). \]
Hint: Kulldorff (1963).
Exercise 10.27. Find the bias and variance of the following estimator of population
mean Y defined as
Y~ = 1Ym-r (I -a {Yr
a-_-+
zm-r
-=-+ p21[Xn xr
-=---=-
zr zn Zr
J])z-,
where 0 s a s I, Yr(x r) denote the means of the matched sub-sample on the second
(first) occasion, zr is the mean of the auxiliary variable of the matched sub-sample,
Exercise 10.28. Let (Xi' Yi : i=I,2, .... ,n) be the observed values of the variables
on the first and second occasion respectively, and the corresponding true values are
(17;, OJ; : i = 1,2, ...., N) . An SRS sample of n units is obtained on the first occasion .
A random sub-sample of m = nA., 0 < A. < 1 of units is retained for use on the
second occasion . An independent unmatched SRS sample of u = n - m = nu units is
selected . Consider the measurement errors are defined as e, = X; -17; and
e;=Y;-OJi such that Em(c:ili)=O, Em(eil i)=O, E m(c:lli)=o;7 and Em(elli)=oi~'
where Em denote the expectation over the model. Let the single prime indicates
units common to two occasions and a double prime indicates the units selected
independently, so that we have the following situation:
Exercise 10.29. Consider an SRS $s$ of $n$ units selected on the first occasion from
a universe $\Omega$ of $N$ units, with measurements taken on two variables $y$ and $x$
on each of the two occasions in a bivariate normal population. While selecting the
second sample, we assume that $m = pn$ $(0 < p < 1)$ of the units of the sample selected
on the first occasion are retained for the second occasion (matched sample) and the
remaining $u = n - m = nq$ $(q = 1 - p)$ units are replaced by a new selection from the
universe of $(N - m)$ units left after omitting the $m$ units. Let $x_i$ $(y_i)$ be the $x$ $(y)$
variables on the $i$th occasion, $i = 1, 2$. Consider the problem of estimation of the
ratio of two population means. Let $R_i = \bar{Y}_i / \bar{X}_i$, $i = 1, 2$, be the population ratio on
the $i$th occasion. Let $\hat{R}_i = \bar{y}_i / \bar{x}_i$, $i = 1, 2$, be the estimator of the ratio on the $i$th
occasion. Further let $\hat{R}_{im}$ and $\hat{R}_{iu}$ be the estimates of the ratio on the $i$th $(i = 1, 2)$
occasion based on matched and unmatched units, respectively. Consider the
estimator $\hat{R}_2^{*}$ of the population ratio on the second occasion, $R_2$, given by
"'* A A A A
Exercise 10.30. Suppose the size $N$ of a population remains the same over two
occasions, but the values of the units change over occasions. Let an SRSWOR
sample of $n_1$ units be selected on the first occasion. Out of this sample let $n_2'$ units
be retained on the second occasion, while a fresh sample of size $n_2''$ is drawn on the
second occasion from the remaining $(N - n_1)$ units of the population, so that the total
sample size on the second occasion becomes $n_2 = n_2' + n_2''$. Assume that
information on an auxiliary variable $x$, which is positively correlated with $y$, is
available on the second occasion.
Let
$\bar{Y}_i$ : the population mean of the study variable $y$ on the $i$-th occasion ($i = 1, 2$);
$S_{iy}^2$ : the population mean square of $y$ on the $i$-th occasion;
$\bar{y}_1$ : the sample mean based on $n_1$ units drawn on the first occasion;
$\bar{y}_2'$ : the sample mean based on $n_2'$ units observed on the second occasion and
common with the first occasion;
$\bar{y}_2''$ : the sample mean based on $n_2''$ units drawn afresh on the second occasion;
$\bar{y}_1'$ : the sample mean based on $n_2'$ units common to both occasions and observed
on the first occasion;
$\bar{X}_2$ : the population mean of the auxiliary variable $x$ on the second occasion;
$S_{2x}^2$ : the population mean square of $x$ on the second occasion;
$\bar{x}_2'$ : the sample mean of $x$ based on $n_2'$ units common to both occasions and
observed on the second occasion; and
$\bar{x}_2''$ : the sample mean of $x$ based on $n_2''$ units drawn afresh on the second
occasion.
Study the asymptotic properties of the estimator of the population mean on the
second occasion $\bar{Y}_2$ as
scheme using $P_i = x_i/X$, the second sample $s_2 = s_{2m} \cup s_{2u}$, $s_{2m}$ being an SRSWOR
sample of $m$ units taken from $s_1$, and $s_{2u}$ an independent sample drawn from
the population $\Omega$ by the RHC scheme using the same $P_i$ measures.
( 1 ) Study the bias and variance of the following estimator:
$$ \hat{Y}_2 = \varphi\,\hat{Y}_{2m} + (1 - \varphi)\,\hat{Y}_{2u}, $$
where
$$ \hat{Y}_{2m} = \sum_{i \in s_{2m}} \frac{(y_{2i} - y_{1i})\,\pi_{1i}}{(m/n)\,P_i} + \sum_{i \in s_1} \frac{y_{1i}\,\pi_{1i}}{P_i} $$
and
$$ \hat{Y}_{2u} = \sum_{i \in s_{2u}} \frac{y_{2i}\,\pi_{2i}}{P_i}, $$
with $\pi_{1i}$ and $\pi_{2i}$ being the totals of the $P_i$ values included in the $i$-th group while selecting
$s_1$ and $s_{2u}$ using the RHC scheme.
Practical 10.2. Divide the United States of America into four independent strata as
Northeast, Midwest, South and West. Suppose the first stratum Northeast consists
of two clusters New England and Mid Atlantic, the second stratum Midwest
consists of two clusters East North Central and West North Central, the third
stratum South consists of three clusters South Atlantic, East South Central, and
West South Central, and the fourth stratum West consists of two clusters Mountains
and Pacific as given in population 7 in the Appendix. From each stratum (or region)
select two clusters ( or divisions) by SRSWOR sampling and within each selected
cluster (or division) select two units ( or states) and collect the information on the
projected population counts during 2000.
Practical 10.3. The data in population 9 of the Appendix relate to the number of
immigrants coming to 51 states in the United States during 1994--1996. Select a
sample ($s_1$) of 20 states by the PPSWR method using the number of immigrants in 1994
($z$) as the measure of size. From the selected sample $s_1$, select a sub-sample $s_m$ of size
$m = 12$ by the SRSWOR method, assuming all elements of $s_1$ are distinct.
Finally select an independent sample $s_u$ of size $u = (20 - 12) = 8$ with PPSWR using
$x$ as the size measure. Estimate the total number of immigrants in 1996 using the
composite estimator. Compare your estimate with the estimate obtained in example
10.9.1 and comment on it.
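The selection steps of Practical 10.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the size measures below are hypothetical placeholders rather than the Appendix data, the function name `select_samples` is invented here, and a single size measure is used for both PPSWR draws.

```python
import random


def select_samples(sizes, n=20, m=12, seed=2024):
    """Return (s1, sm, su) as lists of unit indices.

    s1: PPSWR sample of n units, sm: SRSWOR sub-sample of m draws from s1,
    su: independent PPSWR sample of u = n - m units.
    """
    rng = random.Random(seed)
    units = list(range(len(sizes)))
    # PPSWR: n independent draws, unit i chosen with probability sizes[i] / sum(sizes).
    s1 = rng.choices(units, weights=sizes, k=n)
    # SRSWOR sub-sample of m of the n draws (the draws are treated as distinct).
    sm = rng.sample(s1, m)
    # Independent PPSWR sample of the remaining u = n - m units.
    su = rng.choices(units, weights=sizes, k=n - m)
    return s1, sm, su


# Hypothetical size measures for 51 units (NOT the Appendix data).
sizes = [100 + 37 * i for i in range(51)]
s1, sm, su = select_samples(sizes)
print(len(s1), len(sm), len(su))  # 20 12 8
```

With the real data, `sizes` would hold the 1994 immigrant counts, and the composite estimator would then be computed from the values observed on `sm` and `su`.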
Practical 10.4. The data in population 9 of the Appendix relates to the number of
immigrants coming to 51 states in the United States during 1994--1996 . Consider a
sampling on three occasions in which a constant number m such that n = mit = 2x 5
of sampled units is retained from each occasion to the next and a fresh sample of
u = 5 units is selected from the units not used up to that occasion . Suppose that
sampling is by SRSWOR and total sample size at each occasion is n. Estimate the
population mean $\bar{Y}$ at the third occasion by using the estimator
The randomized response technique (RRT) is useful for reducing response error
problems when potentially sensitive questions such as the illegal use of drugs,
sexual practice, illegal earning, or incidence of acts of domestic violence are
included in surveys of human populations. Direct questioning of respondents about
sensitive issues often results in either refusal or falsification of the answers. Social
stigma and fear of reprisals sometimes result in untruthful, exaggerated, or
misleading responses by respondents when approached with conventional survey
methods. Warner (1965) was the first to suggest an ingenious method of
counteracting fears in response to sensitive questions.
[Figure: Response from Warner's model, with outcomes 'Yes' and 'No'.]
$$ V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{P(1-P)}{n(2P-1)^2}. \tag{11.1.5} $$
Example 11.1.1. Michael believes that 'churches' and 'academic institutes' were
made for honest people, but after watching the movie 'Wonder Boys-2000', he
felt that due to the increase of unscrupulous people in the world that there may be
academic research cheaters (or academic thieves) in the institutes. He selected an
SRSWR sample of 20,000 researchers across the world from different institutes by
using a randomisation device, like the spinner, producing 70% of the statements,
'Have you ever stolen your colleagues research papers or books?' along with 30%
of the statements, ' Have you never stolen your colleagues research papers or
books?' Out of 20,000 selected researchers 6060 reported 'Yes' through the above
randomization device. Estimate the proportion of 'academic thieves' in the world,
and also construct a 95% confidence interval estimate.
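Solution. Here $P = 0.7$, $n = 20{,}000$ and $n_1 = 6060$.
Thus the observed proportion of 'Yes' answers is given by
$$ \hat{\theta}_w = \frac{6060}{20000} = 0.303, $$
so that an estimate of the proportion of 'academic thieves' is
$$ \hat{\pi}_w = \frac{\hat{\theta}_w - (1-P)}{2P-1} = \frac{0.303 - 0.3}{2(0.7) - 1} = 0.0075 . $$
The estimate of variance is computed as
$$ v(\hat{\pi}_w) = \frac{\hat{\theta}_w(1-\hat{\theta}_w)}{n-1} = \frac{0.303(1-0.303)}{20{,}000-1} = \frac{0.211191}{19999} = 0.0000106 . $$
Thus a 95% confidence interval estimate of the true proportion of 'academic
thieves' is given by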
Thus Michael claims with 95% confidence that these days 0.113% to
1.386% of the researchers in this world are 'academic thieves', but he is pleased
that this proportion is negligible.
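The point estimate in Example 11.1.1 can be reproduced with a short Python sketch. It assumes the usual Warner relation $\theta = P\pi + (1-P)(1-\pi)$ between the probability of a 'Yes' answer and $\pi$; the function names are illustrative and not part of the text. The second function simply evaluates the theoretical variance (11.1.5) for given $\pi$, $P$ and $n$.

```python
def warner_estimate(n_yes: int, n: int, p: float) -> float:
    """Method-of-moments estimate of pi: (theta_hat - (1 - P)) / (2P - 1)."""
    if abs(2 * p - 1) < 1e-12:
        raise ValueError("P must differ from 0.5")
    theta_hat = n_yes / n  # observed proportion of 'Yes' answers
    return (theta_hat - (1 - p)) / (2 * p - 1)


def warner_variance(pi: float, p: float, n: int) -> float:
    """Theoretical variance of the Warner estimator, equation (11.1.5)."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


pi_hat = warner_estimate(6060, 20000, 0.7)
print(round(pi_hat, 4))  # 0.0075
```

The two terms of (11.1.5) add up to $\theta(1-\theta)/\{n(2P-1)^2\}$, which is a convenient check on any implementation.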
Let $\pi$ represent the proportion of individuals in a population who belong to group
A (e.g., the proportion of women who have had an abortion). Franklin (1989a,
1989b) considers that a with-replacement simple random sample of $n > 1$
individuals is chosen. This is usually the case and eliminates concerns about finite
population correction factors. A total of $k > 1$ trials per respondent are conducted.
For respondent $i$ on trial $j$, random values are drawn from the densities $g_{ij}$ and
$h_{ij}$ (where we assume independence of all densities). The respondent, but not the
interviewer, sees both values and is asked to report the value from $g_{ij}$ if he/she
belongs to group A and the value from $h_{ij}$ otherwise.
[Figure: Response from Franklin's model, by status of the respondent: belongs to group A / does not belong to group A.]
The interviewer knows the exact form of $g_{ij}$ and $h_{ij}$ but sees only the value
reported by the respondent. This value we denote by $Z_{ij}$. The interviewer does not
know for certain from which of the two densities it comes. From the total of $kn$
observations of $Z_{ij}$ $(i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, k)$ inference can be made about $\pi$. The
conditional density of a $k$-tuple $(Z_{i1}, Z_{i2}, \ldots, Z_{ik})$ representing a random observation
of the $i$-th respondent given $\pi$ is
Chapter 11.: Randomized response sampling : Tools for social surveys 893
(11.2.1)
where $P$ is Bernoulli$(p)$ and $X_{ij} \sim g_{ij}$ and $Y_{ij} \sim h_{ij}$. The model can be
specialised by having $g_{ij} = g_j$ and $h_{ij} = h_j$ for all $i$. Thus all respondents
observe the same distributions, which allows us to perceive the $k$-tuple responses
$(Z_{i1}, Z_{i2}, \ldots, Z_{ik})$ as independent, identically distributed (i.i.d.) random variables with
form
(11.2.2)
from the density
$$ \pi \prod_{j=1}^{k} g_j(z_j) + (1-\pi) \prod_{j=1}^{k} h_j(z_j). \tag{11.2.3} $$
Although any density may be used for $g_j$ or $h_j$, the Franklin (1989a, 1989b) model
considers both only as normal densities with known means $\mu_{1j}$ and $\mu_{2j}$, and
known standard deviations $\sigma_{1j}$ and $\sigma_{2j}$, respectively. The choice of $g_j$ and $h_j$ as
Bernoulli$(p)$ with $h_j(z) = 1 - g_j(z)$ and $k = 1$ reduces this model to Warner's
(1965) original model, and the densities $h_j$ and $g_j$ are dependent. Also, if $g_j$
denotes the distribution generated by the first deck of cards with known proportion
$\theta_1$ of red cards and $h_j$ denotes the distribution generated by the second deck of
cards with known proportion $\theta_2$ $(\neq \theta_1)$ of red cards, then Franklin's model reduces
to the model suggested by Kuk (1990). It is well known that estimators derived
from the method of moments (MM) are usually not preferred over maximum
likelihood (ML) estimates, although under certain circumstances they might be.
Franklin has shown that this is one such instance: the MM estimators and their
associated variances can be found analytically, while those of the ML estimators
cannot be obtained. This allows confidence intervals to be formed and tests of
hypotheses to be conducted for these MM estimators. Two obvious MM estimators
can be derived, one by concentrating on the row averages of $Z_{ij}$ and the other on
its column averages.
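Let
$$ \bar{Z}_{i\cdot} = \frac{1}{k}\left[Z_{i1} + Z_{i2} + \cdots + Z_{ik}\right] \quad\text{and}\quad \bar{Z}_{\cdot j} = \frac{1}{n}\left[Z_{1j} + Z_{2j} + \cdots + Z_{nj}\right] \tag{11.2.4} $$
for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$, represent the row and column averages,
respectively. Concentrating on the row averages gives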
(11.2.5)
k j=l
i j=l k
i
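$$ E(\bar{Z}_{i\cdot}) = \frac{1}{k}\sum_{j=1}^{k}\left[\pi\mu_{1j} + (1-\pi)\mu_{2j}\right] = \frac{1}{k}\left[\pi m_1 + (1-\pi)m_2\right], \tag{11.2.7} $$
where $m_1 = \sum_{j=1}^{k}\mu_{1j}$ and $m_2 = \sum_{j=1}^{k}\mu_{2j}$.
From these the standard MM approach gives
$$ \hat{\pi}_1 = \frac{k\bar{Z} - m_2}{m_1 - m_2} = \frac{\sum_{j=1}^{k}\bar{Z}_{\cdot j} - m_2}{m_1 - m_2}. \tag{11.2.8} $$
Noting that the $k$-tuple responses are i.i.d., it can easily be shown by (11.2.6) that
$$ E(\hat{\pi}_1) = \pi $$
with the variance of $\hat{\pi}_1$ given by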
Now the observations $Z_1, Z_2, \ldots, Z_k$ from a single respondent are not
independent and need not even be identically distributed. In fact, for example, the
joint density of $(Z_i, Z_j)$ of two trials from a single respondent is given by
$(\mu_{1j} - \mu_{2j}) \to \infty$, then $\text{Cov}(Z_i, Z_j) \to 1$. This is apparent since knowledge of $Z_i$ for
a particular respondent under these circumstances gives perfect predictability for $Z_j$
given by the same respondent.
Equations (11.2.9), (11.2.11), and (11.2.12) give the variance of $\hat{\pi}_1$ as
(11.2.14)
Thus the variance of $\hat{\pi}_1$ can be thought of as the variance from ordinary
sampling plus an additional term due to the randomizing effect.
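$$ E(\bar{Z}_{\cdot j}) = \frac{1}{n}\sum_{i=1}^{n}E(Z_{ij}) = \frac{1}{n}\sum_{i=1}^{n}E(Z_j) = E(Z_j) = \pi\mu_{1j} + (1-\pi)\mu_{2j} \tag{11.2.15} $$
for $j = 1, 2, \ldots, k$.
Hence a second MM estimator can be formed from the average of the $j$-th column of
observations by setting $\bar{Z}_{\cdot j} = E(\bar{Z}_{\cdot j})$ and solving for $\pi$ to have
$$ \hat{\pi}_{\cdot j} = \frac{\bar{Z}_{\cdot j} - \mu_{2j}}{\mu_{1j} - \mu_{2j}}. \tag{11.2.16} $$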
Thus a total of $k$ such estimators (not independent of one another) for $j = 1, 2, \ldots, k$
can be formed. Each of these estimators is an unbiased estimator of $\pi$. The variance
of $\hat{\pi}_{\cdot j}$ is given, since the observations are i.i.d., by
$$ V(\hat{\pi}_{\cdot j}) = \frac{V(\bar{Z}_{\cdot j})}{(\mu_{1j} - \mu_{2j})^2} = \frac{V(Z_j)}{n(\mu_{1j} - \mu_{2j})^2}. \tag{11.2.17} $$
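Furthermore
$$ \text{Cov}(\hat{\pi}_{\cdot i}, \hat{\pi}_{\cdot j}) = \frac{\text{Cov}(\bar{Z}_{\cdot i}, \bar{Z}_{\cdot j})}{(\mu_{1i} - \mu_{2i})(\mu_{1j} - \mu_{2j})}. \tag{11.2.19} $$
Note that $Z_{pi}$ and $Z_{mj}$ are independent if $p \neq m$. But since the responses are i.i.d.
we have from (11.2.12)
$$ \text{Cov}(\hat{\pi}_{\cdot i}, \hat{\pi}_{\cdot j}) = \frac{n\,\text{Cov}(Z_i, Z_j)}{n^2(\mu_{1i} - \mu_{2i})(\mu_{1j} - \mu_{2j})} = \frac{\pi(1-\pi)}{n}. \tag{11.2.20} $$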
$$ w_j = \frac{|\mu_{1j} - \mu_{2j}|}{\sum_{j=1}^{k}|\mu_{1j} - \mu_{2j}|} = \frac{D_j}{D}, \tag{11.2.21} $$
where $D_j = |\mu_{1j} - \mu_{2j}|$ and $D = \sum_{j=1}^{k} D_j$ for $j = 1, 2, \ldots, k$.
(11.2.23)
Thus this second MM estimator has a variance similar to that of the first MM estimator. A
comparison of the variances of the two MM estimators $\hat{\pi}_1$ and $\hat{\pi}_2$ from (11.2.14)
and (11.2.23) reveals that they will be identical if and only if all the terms
$(\mu_{1j} - \mu_{2j})$ have the same sign for $j = 1, 2, \ldots, k$; then
(11.2.24)
In any other case the denominator of the right hand side of (11.2.14) will contain
both positive and negative terms and will result in $V(\hat{\pi}_1) \geq V(\hat{\pi}_2)$. In fact, when all
$(\mu_{1j} - \mu_{2j})$ for $j = 1, 2, \ldots, k$ have the same sign we have $\hat{\pi}_1 = \hat{\pi}_2$. Thus, to
conclude, the two MM estimators will be identical if the terms $(\mu_{1j} - \mu_{2j})$ have the
same sign for all $j$. In any other case, $\hat{\pi}_2$ will have a smaller variance. Keeping
the sign of all $(\mu_{1j} - \mu_{2j})$ the same is therefore a particularly important
design consideration in using the $\hat{\pi}_1$ estimator. It is seen that $(m_1 - m_2)$ must not
be equal to or near zero; if this condition is violated, the variance of $\hat{\pi}_1$ could be
extremely large. On the other hand, for the estimator $\hat{\pi}_2$ it is seen that the variance
can be decreased by making at least one absolute difference large. Singh and Singh
(1992a, 1993d) have suggested some improvements over Franklin's model. Arnab
(1996) developed a unified setup for Singh and Singh's models. More details about
randomized response sampling can be found in Chaudhuri and Mukherjee (1988),
Scheers (1992), and Tracy and Mangat (1996a).
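The two MM estimators can be compared numerically with a small Python sketch. It assumes normal $g_j$ and $h_j$ with a common, hypothetical standard deviation, and it takes the second estimator to be the weighted combination $\hat{\pi}_2 = \sum_j w_j \hat{\pi}_{\cdot j}$ with the weights of (11.2.21); that form, the function names and the parameter values are assumptions made for illustration.

```python
import random


def row_mm_estimate(z, mu1, mu2):
    """First MM estimator (11.2.8): (k * grand mean - m2) / (m1 - m2)."""
    n, k = len(z), len(mu1)
    m1, m2 = sum(mu1), sum(mu2)
    grand_mean = sum(sum(row) for row in z) / (n * k)
    return (k * grand_mean - m2) / (m1 - m2)


def column_mm_estimate(z, mu1, mu2):
    """Weighted mean of the per-trial estimates, weights w_j = D_j / D."""
    n, k = len(z), len(mu1)
    d = [abs(a - b) for a, b in zip(mu1, mu2)]
    total = sum(d)
    estimate = 0.0
    for j in range(k):
        col_mean = sum(row[j] for row in z) / n
        pi_j = (col_mean - mu2[j]) / (mu1[j] - mu2[j])
        estimate += (d[j] / total) * pi_j
    return estimate


def simulate_responses(n, pi, mu1, mu2, sd=1.0, seed=0):
    """Each respondent reports k normal draws: means mu1 if in group A, else mu2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        means = mu1 if rng.random() < pi else mu2
        data.append([rng.gauss(m, sd) for m in means])
    return data


# Hypothetical design: all differences mu1[j] - mu2[j] share the same sign.
mu1, mu2 = [5.0, 7.0, 9.0], [1.0, 2.0, 3.0]
z = simulate_responses(2000, 0.3, mu1, mu2)
print(row_mm_estimate(z, mu1, mu2), column_mm_estimate(z, mu1, mu2))
```

When every difference $\mu_{1j} - \mu_{2j}$ has the same sign the two printed values coincide up to rounding, in line with the remark above that $\hat{\pi}_1 = \hat{\pi}_2$ in that case.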
In Warner's model both questions refer to the same sensitive character $A$ or its
complement $A^c$. Greenberg, Abul-Ela, Simmons, and Horvitz (1969) felt that to
protect the privacy of the respondents, it is desirable that the two questions be
unrelated, and suggested an unrelated question model. In Greenberg et al.'s pioneering
unrelated question model the data gathering randomization device consists of two
questions or statements:
( i ) Are you a member of group $A$?
( ii ) Are you a member of group $Y$?
where the character $Y$ or its complement is innocuous and unrelated to $A$. For
example, in estimating the proportion of persons having extra marital relations in a
certain community the two questions may be:
Clearly the second question has nothing to do with extra marital relations. An
SRSWR sample of $n$ respondents is drawn from the population. Greenberg, Abul-Ela,
Simmons, and Horvitz (1969), in their theoretical development, dealt with the two
situations of $\pi_y$ (the proportion of the unrelated character, $Y$, say) being known and
unknown.
$\hat{\theta} = n_1/n$.
Then we have the following theorems:
Here $\pi_y$, the proportion of the neutral character (say) $Y$ in the population, is unknown.
In such a situation, Greenberg, Abul-Ela, Simmons, and Horvitz (1969) suggested
taking two independent SRSWR samples of sizes $n_1$ and $n_2$ from the population.
In the $i$-th sample, $P_i$ and $(1 - P_i)$ denote the probabilities of selecting the statements
regarding possession of the sensitive characteristic $A$ and the non-sensitive
characteristic $Y$ in the randomized response device $R_i$ used for the respondents in
the $i$-th sample, $i = 1, 2$, so that the probability of a 'Yes' answer in the $i$-th sample is
given by
$$ \theta_i = P_i\pi + (1 - P_i)\pi_y, \quad i = 1, 2. \tag{11.3.2.1} $$
Solving these two equations for $\pi$ gives the estimator
$$ \hat{\pi}_G = \frac{(1-P_2)\hat{\theta}_1 - (1-P_1)\hat{\theta}_2}{P_1 - P_2}, \quad P_1 \neq P_2, $$
where $\hat{\theta}_1$ and $\hat{\theta}_2$ are the observed proportions of 'Yes' answers in the first and
second sample, respectively.
Proof. Solve the two equations given by (11.3.2.1) for $\pi$ and use the method of
moments. The unbiasedness is clear from the fact that $n_i\hat{\theta}_i \sim B(n_i, \theta_i)$.
The variance of the estimator $\hat{\pi}_G$ is given by
$$ V(\hat{\pi}_G) = \frac{1}{(P_1-P_2)^2}\left[\frac{(1-P_2)^2\theta_1(1-\theta_1)}{n_1} + \frac{(1-P_1)^2\theta_2(1-\theta_2)}{n_2}\right], \tag{11.3.2.3} $$
where $\theta_i = P_i\pi + (1-P_i)\pi_y$, $i = 1, 2$, denotes the probability of a 'Yes' answer in the
first and second sample, respectively.
Proof. It follows from $n_i\hat{\theta}_i \sim B(n_i, \theta_i)$, so that
$$ V(\hat{\theta}_i) = \frac{\theta_i(1-\theta_i)}{n_i}, \quad i = 1, 2. $$
Hence the theorem.
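A minimal Python sketch of the two-sample unrelated question estimator and its variance (11.3.2.3) follows. The function names are illustrative, and the numerical check uses the values $\pi = 0.1$, $\pi_y = 0.2$, $P_1 = 0.7$, $P_2 = 0.3$ that appear in Example 11.3.1.

```python
def yes_probability(p, pi, pi_y):
    """Probability of a 'Yes' answer, theta_i = P_i*pi + (1 - P_i)*pi_y (11.3.2.1)."""
    return p * pi + (1 - p) * pi_y


def greenberg_estimate(theta1_hat, theta2_hat, p1, p2):
    """Solve the two equations (11.3.2.1) for pi (method of moments)."""
    if abs(p1 - p2) < 1e-12:
        raise ValueError("P1 and P2 must differ")
    return ((1 - p2) * theta1_hat - (1 - p1) * theta2_hat) / (p1 - p2)


def greenberg_variance(theta1, theta2, p1, p2, n1, n2):
    """Variance (11.3.2.3) of the estimator for sample sizes n1 and n2."""
    return (
        (1 - p2) ** 2 * theta1 * (1 - theta1) / n1
        + (1 - p1) ** 2 * theta2 * (1 - theta2) / n2
    ) / (p1 - p2) ** 2


theta1 = yes_probability(0.7, 0.1, 0.2)  # 0.13
theta2 = yes_probability(0.3, 0.1, 0.2)  # 0.17
print(round(greenberg_estimate(theta1, theta2, 0.7, 0.3), 6))  # 0.1
```

Feeding the true 'Yes' probabilities back into the estimator returns $\pi$ exactly, which is the unbiasedness argument of the proof in numerical form.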
where $n = n_1 + n_2$ and $n_1$ and $n_2$ are the Greenberg, Abul-Ela, Simmons, and Horvitz
(1969) optimum sample sizes. They suggested that one of the optimal choices of
$P_i$, $i = 1, 2$, should be close to one and the other close to zero. The value of $\pi_y$
should be chosen close to zero or one according as $\pi < 0.5$ or $\pi > 0.5$, respectively.
Moors (1971) claimed that if one of the values of $P_i$ (say, $P_2$) is chosen to be zero
then the model of Greenberg, Abul-Ela, Simmons, and Horvitz (1969) becomes
optimal so far as the choice of $P_i$ is concerned. The optimal choice of $P_2 = 0$
(i.e., Moors' model) discloses the sensitive character whenever a
respondent appears in both the independent samples and reports 'Yes' in the first
sample and 'No' in the second sample. The probability of respondents being
repeated in both independent samples is quite high when large samples are
required for gaining efficiency in the randomized response techniques. This
difficulty has been pointed out by Mangat, Singh, and Singh (1997). The
probability of a respondent being selected in both samples depends upon the size of
the stratum or the level at which estimates are required. Some organizations require
estimates at country, state, district, village, or school level. As the stratum size changes
from country to school level, the probability of respondents being selected in both
samples increases. Another practical situation where a respondent gets selected in
both samples is the well-known overlapping cluster sampling. If there is no
population list, then although a repeated respondent might not
be detected by the interviewer, the interviewee might reason that in the first sample
he replied 'Yes' to one of two possible questions, and now another interviewer is asking
him a direct non-sensitive question. If he understands RRT then he will refuse to
respond, as he becomes suspicious. In this case the problem arises from the
interviewee's side. Mahmood, Singh, and Horn (1998) observed that the method
suggested by Mangat, Singh, and Singh (1997) needs splitting of the population of
$N$ units into two random groups. Sometimes it is difficult to do so owing to the
population being large or to the non-availability of a list of individual units at population
level. Moreover, the variance expression of the estimator proposed by them is too
complicated to implement in actual practice. Mahmood, Singh, and Horn (1998)
have provided three simple alternative survey techniques which parallel the
optimality conditions suggested by Moors (1971) and are free from the difficulties
in the Moors (1971) as well as in the Mangat, Singh, and Singh (1997) model.
and
(iii) 'I belong to group $Y$',
with probabilities $P_1$, $P_3$ and $P_4$ respectively, such that $P_1 + P_3 + P_4 = 1$.
(11.3.2.5)
$$ \hat{\pi}_1 = \frac{\hat{\theta}_3 - P_3(1 - \hat{\pi}_y) - P_4\hat{\pi}_y}{P_1}, \tag{11.3.2.6} $$
where $\hat{\theta}_3$ and $\hat{\pi}_y$ are the observed proportions of 'Yes' answers in the first and
second independent samples.
Theorem 11.3.2.3. The minimum variance of the estimator $\hat{\pi}_1$ is given by
Proof. Note that $n_1\hat{\theta}_3 \sim B(n_1, \theta_3)$, $n_2\hat{\pi}_y \sim B(n_2, \pi_y)$ and the samples are independent, so
$$ V(\hat{\pi}_1) = \frac{V(\hat{\theta}_3) + (P_3 - P_4)^2 V(\hat{\pi}_y)}{P_1^2} = \frac{1}{P_1^2}\left[\frac{\theta_3(1-\theta_3)}{n_1} + \frac{(P_3 - P_4)^2\,\pi_y(1-\pi_y)}{n_2}\right]. \tag{11.3.2.8} $$
On differentiating (11.3.2.8) with respect to $n_1$, such that $n_1 + n_2 = n$, and equating
to zero we have
(11.3.2.9)
Technique II. The randomized response device here differs from that of technique I in
that the second statement 'I belong to $Y^c$' is simply replaced by the statement
'Try once again.' If this statement appears on the second trial, then the respondent
is requested to report 'Yes' irrespective of his actual status. Then the probability of
a 'Yes' answer in the first sample of $n_1$ respondents is given by
$$ \theta_4 = (1 + P_3)(P_1\pi + P_4\pi_y) + P_3^2 . \tag{11.3.2.10} $$
The second independent sample of $n_2$ respondents is asked the direct question on
$Y$ by following Moors (1971).
By the method of moments an unbiased estimator of $\pi$ is given by
$$ \hat{\pi}_2 = \frac{\hat{\theta}_4 - P_3^2 - P_4(1 + P_3)\hat{\pi}_y}{P_1(1 + P_3)}. \tag{11.3.2.11} $$
Thus we have the following theorem.
Theorem 11.3.2.4. The minimum variance of the estimator $\hat{\pi}_2$ is given by
Technique III. The randomized response device differs from that of technique I in
that the second statement 'I belong to $Y^c$' is simply replaced by the statement
'I belong to $A^c$'. Then the probability of a 'Yes' answer in the first sample of $n_1$
respondents is given by
$$ \theta_5 = P_1\pi + P_3(1 - \pi) + P_4\pi_y . \tag{11.3.2.13} $$
The second independent sample of $n_2$ respondents is treated by following Moors
(1971). By the method of moments, an unbiased estimator of $\pi$ is given by
$$ \hat{\pi}_3 = \frac{\hat{\theta}_5 - P_3 - P_4\hat{\pi}_y}{P_1 - P_3}. \tag{11.3.2.14} $$
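The three estimators can be checked by a round trip: compute the 'Yes' probability from known $\pi$ and $\pi_y$, then recover $\pi$. The Python sketch below does this. The expression used for $\theta_3$ is the one implied by inverting (11.3.2.6) and is therefore an assumption, as are the function names and the probabilities chosen for the check.

```python
def theta3(pi, pi_y, p1, p3, p4):
    """Technique I 'Yes' probability implied by (11.3.2.6) (assumed form)."""
    return p1 * pi + p3 * (1 - pi_y) + p4 * pi_y


def estimate_technique1(theta3_hat, pi_y_hat, p1, p3, p4):
    """Estimator (11.3.2.6)."""
    return (theta3_hat - p3 * (1 - pi_y_hat) - p4 * pi_y_hat) / p1


def theta4(pi, pi_y, p1, p3, p4):
    """Technique II 'Yes' probability (11.3.2.10)."""
    return (1 + p3) * (p1 * pi + p4 * pi_y) + p3 ** 2


def estimate_technique2(theta4_hat, pi_y_hat, p1, p3, p4):
    """Estimator (11.3.2.11)."""
    return (theta4_hat - p3 ** 2 - p4 * (1 + p3) * pi_y_hat) / (p1 * (1 + p3))


def theta5(pi, pi_y, p1, p3, p4):
    """Technique III 'Yes' probability (11.3.2.13)."""
    return p1 * pi + p3 * (1 - pi) + p4 * pi_y


def estimate_technique3(theta5_hat, pi_y_hat, p1, p3, p4):
    """Estimator (11.3.2.14); requires P1 != P3."""
    return (theta5_hat - p3 - p4 * pi_y_hat) / (p1 - p3)


# Round trip with hypothetical design probabilities P1 + P3 + P4 = 1.
pi, pi_y, p1, p3, p4 = 0.1, 0.2, 0.7, 0.15, 0.15
for theta, estimate in ((theta3, estimate_technique1),
                        (theta4, estimate_technique2),
                        (theta5, estimate_technique3)):
    print(round(estimate(theta(pi, pi_y, p1, p3, p4), pi_y, p1, p3, p4), 6))  # 0.1
```

In practice `theta*_hat` and `pi_y_hat` would be the observed 'Yes' proportions from the first sample and from the direct-question second sample, respectively.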
Example 11.3.1. Assume the true proportion of drug users in a city is 0.1.
Suppose we used a randomization device to collect information from a large
sample of persons, bearing two types of statements:
( i ) 'Are you a drug user?' with probability 0.70;
( ii ) 'Were you born in spring?' with probability 0.30.
Assume that the proportion of persons born in spring is 0.2 (in actual practice it
is unknown). Find the relative efficiencies of the above three techniques with
respect to the Greenberg, Abul-Ela, Simmons, and Horvitz (1969) technique.
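Solution. We are given $\pi = 0.1$, $\pi_y = 0.2$, $P_1 = 0.7$ and $P_2 = 0.3$. Thus we have
$$ \theta_1 = P_1\pi + (1-P_1)\pi_y = 0.7 \times 0.1 + (1-0.7) \times 0.2 = 0.13, $$
and
$$ \theta_2 = P_2\pi + (1-P_2)\pi_y = 0.3 \times 0.1 + (1-0.3) \times 0.2 = 0.17 . $$
Thus the relative efficiency of the estimator $\hat{\pi}_1$ with respect to $\hat{\pi}_G$ is given by
$$ \text{RE} = \frac{\left[(1-P_2)\sqrt{\theta_1(1-\theta_1)} + (1-P_1)\sqrt{\theta_2(1-\theta_2)}\right]^2 P_1^2}{\left[\sqrt{\theta_3(1-\theta_3)} + (P_3-P_4)\sqrt{\pi_y(1-\pi_y)}\right]^2 (P_1-P_2)^2} $$
$$ = \frac{\left[(1-0.3)\sqrt{0.13(1-0.13)} + (1-0.7)\sqrt{0.17(1-0.17)}\right]^2 0.7^2}{\left[\sqrt{0.14(1-0.14)} + (0.15-0.15)\sqrt{0.2(1-0.2)}\right]^2 (0.7-0.3)^2} = 3.0822 . $$
Similarly, the relative efficiency of the estimator $\hat{\pi}_2$ with respect to $\hat{\pi}_G$ is given by
$$ \text{RE} = \frac{\left[(1-P_2)\sqrt{\theta_1(1-\theta_1)} + (1-P_1)\sqrt{\theta_2(1-\theta_2)}\right]^2 \{P_1(1+P_3)\}^2}{\left[\sqrt{\theta_4(1-\theta_4)} + P_4(1+P_3)\sqrt{\pi_y(1-\pi_y)}\right]^2 (P_1-P_2)^2} $$
$$ = \frac{\left[(1-0.3)\sqrt{0.13(1-0.13)} + (1-0.7)\sqrt{0.17(1-0.17)}\right]^2 \{0.7(1+0.15)\}^2}{\left[\sqrt{0.07425(1-0.07425)} + 0.15(1+0.15)\sqrt{0.2(1-0.2)}\right]^2 (0.7-0.3)^2} = 4.4747 . $$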
r
RE = [(1- PZ'N01(1- OI) +(l-fj'NOz(l-Oz)]Z (fj-P3)\2
Singh, Joarder, and King (1996) considered the problem of estimating regression
coefficients in the traditional regression model. They assumed that the variable of
interest $Y_i$ is related to $k$ non-stochastic regressors via the classical linear
regression model
$$ Y = X\beta + \varepsilon, \tag{11.4.1} $$
where $Y$ is an $n \times 1$ vector of $Y_i$ values, $X$ is an $n \times k$ matrix of regressors, $\beta$ is
$k \times 1$ and $\varepsilon$ is an $n \times 1$ disturbance term such that $\varepsilon \sim N(0, \sigma^2 I_n)$.
Here $Y_i$ is a
sensitive variable whose observations have to be obtained by survey methods.
Because some respondents are unlikely to respond truthfully to questions about
behaviour which is immoral, unpopular, or unlawful, Eichhorn and Hayre's
(1983) scrambled response approach is applied as follows: Each respondent is
requested to report the product $Y_iS_i$, where $S_i$ is the value of the scrambling
variable drawn by the $i$-th respondent. The privacy of the respondent is protected by
the fact that $S_i$ is not known to the interviewer, but its distribution, and in particular
its mean $E(S_i) = \theta$ and variance $V(S_i) = \gamma^2$, are known. The scrambling device may
be a deck of cards, a spinner, etc., following some suitable distribution, e.g.,
Normal, Weibull, or any discrete distribution. The $Y_iS_i$ value obtained from the $i$-th
respondent can be standardized as $Z_i = Y_iS_i/\theta$ after collection. Singh, Joarder, and
King (1996) considered the estimation and testing of $\beta$ under the model
$$ Z = X\beta + \eta, \tag{11.4.2} $$
where $Z$ is the $n \times 1$ vector of $Z_i$ values and $\eta$ is an $n \times 1$ vector of errors whose
distribution is unknown.
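The scrambling mechanism can be illustrated with a small simulation in Python. The sketch assumes a single regressor without intercept, a normal scrambling variable, and hypothetical parameter values; the function names are illustrative. It applies ordinary least squares to the standardized scrambled responses $Z_i = Y_iS_i/\theta$.

```python
import random


def scramble(y, theta, gamma, rng):
    """Return Z_i = Y_i * S_i / theta with S_i drawn from N(theta, gamma^2)."""
    return [yi * rng.gauss(theta, gamma) / theta for yi in y]


def ols_slope(x, z):
    """Least-squares slope for one regressor without intercept: sum(xz) / sum(x^2)."""
    return sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)


rng = random.Random(7)
beta, sigma, theta, gamma = 2.0, 0.5, 10.0, 1.0  # hypothetical values
x = [1.0 + rng.random() for _ in range(20000)]
y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]   # true sensitive values
z = scramble(y, theta, gamma, rng)                    # what the interviewer sees
beta_hat = ols_slope(x, z)
print(beta_hat)  # close to beta = 2.0 for a sample this large
```

Because $E_R(Z_i) = Y_i$, the least-squares slope computed from the scrambled data remains centred on $\beta$; the price of privacy is the extra variance contributed by $S_i$.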
(11.4.5)
Proof. Let $V_M$ denote the variance-covariance matrix over the model (11.4.1).
Then we have
$$ V(\hat{\beta}_s) = E_M V_R(\hat{\beta}_s) + V_M E_R(\hat{\beta}_s) = E_M V_R\left[(X'X)^{-1}X'Z\right] + V_M E_R\left[(X'X)^{-1}X'Z\right] $$
Theorem 11.4.3. An estimator of $\sigma^2$ is given by
$$ = \left(n - k + C_\gamma^2\sum_{i=1}^{n} M_{ii}\right)\sigma^2 + C_\gamma^2\sum_{i=1}^{n}\left[\sum_{j=1}^{k}\beta_j x_{ij}\right]^2 M_{ii}. \tag{11.4.7} $$
By the method of moments, (11.4.7) with the unknown $\beta_j$ values replaced by estimates
gives (11.4.6). Hence the theorem.
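Theorem 11.4.4. The Wald test statistic to test the null hypothesis $H_0: \beta = \beta_0$
against the alternative hypothesis $H_a: \beta \neq \beta_0$ for the scrambled response model is
given by
$$ (\hat{\beta}_s - \beta_0)'\left[\hat{v}(\hat{\beta}_s)\right]^{-1}(\hat{\beta}_s - \beta_0) \overset{a}{\sim} \chi^2(p) $$
under $H_0$, assuming that $\hat{\beta}_s \sim N(\beta, V(\hat{\beta}_s))$, and an estimator of $V(\hat{\beta}_s)$ is given by
$$ \hat{v}(\hat{\beta}_s) = \hat{\sigma}_s^2\left[(X'X)^{-1} + C_\gamma^2 (X'X)^{-1}\right] + C_\gamma^2 (X'X)^{-1}X'\,\text{diag}\left[\Big(\sum_{j=1}^{k} x_{1j}\hat{\beta}_{sj}\Big)^2, \Big(\sum_{j=1}^{k} x_{2j}\hat{\beta}_{sj}\Big)^2, \ldots, \Big(\sum_{j=1}^{k} x_{nj}\hat{\beta}_{sj}\Big)^2\right]X(X'X)^{-1}. $$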
A detailed empirical study has been carried out to study the nature of the estimates of
regression coefficients under different levels of respondents lying in direct
surveying. Strachan, King, and Singh (1998) have considered likelihood based
estimation of the regression model with scrambled responses, where they compared
a Bayesian estimator obtained through a Markov Chain Monte Carlo (MCMC)
sampling scheme with a classical maximum likelihood estimator and the estimator
proposed by Singh, Joarder, and King (1996); the coefficient of determination
was later studied by Singh and King (1999).
A more serious problem arises when a character under study is sensitive in nature.
For example, researchers may be interested in estimating average income using
the expenditure and assets of households as independent variables. Obviously a high
correlation is expected between the expenditure and assets of a household, which will
cause the matrix $(X'X)$ to become ill conditioned. Singh and Tracy (1999) considered the
ridge regression estimator of the regression coefficient under scrambled responses
and found it to be more efficient than the ordinary least squares estimator.
Singh and Tracy (1999) considered a ridge regression estimator under scrambled
responses as
$$ \hat{\beta}_{R(sc)} = (X'X + R_cI)^{-1}X'Z, \tag{11.4.1.1} $$
where $R_c$ denotes the ridge constant.
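For intuition, the ridge estimator (11.4.1.1) can be written out for the special case of a single regressor, where $X'X$ reduces to $\sum x_i^2$ and no matrix inversion is needed. The Python sketch below is only this scalar special case, with an illustrative function name and made-up data.

```python
def ridge_slope(x, z, rc):
    """Scalar case of (11.4.1.1): sum(x*z) / (sum(x^2) + Rc)."""
    if rc < 0:
        raise ValueError("the ridge constant Rc must be non-negative")
    sxz = sum(xi * zi for xi, zi in zip(x, z))
    sxx = sum(xi * xi for xi in x)
    return sxz / (sxx + rc)


# Made-up scrambled responses lying exactly on the line z = 2x.
x = [1.0, 2.0, 3.0]
z = [2.0, 4.0, 6.0]
print(ridge_slope(x, z, 0.0))   # 2.0, the ordinary least squares slope
print(ridge_slope(x, z, 14.0))  # 1.0, shrunk toward zero
```

Setting $R_c = 0$ recovers ordinary least squares, and increasing $R_c$ shrinks the estimate toward zero, trading bias for a reduction in variance.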
$$ V_R(Z) = C_\gamma^2\,\text{Diag}\left(Y_1^2, Y_2^2, \ldots, Y_n^2\right). \tag{11.4.1.2} $$
$$ B(\hat{\beta}_{R(sc)}) = E(\hat{\beta}_{R(sc)}) - \beta $$
we get the theorem.
$$ V(\hat{\beta}_{R(sc)}) = \sigma^2 A(X'X)^{-1}A' + C_\gamma^2 (X'X + R_cI)^{-1}X'(D + \sigma^2 I)X(X'X + R_cI)^{-1}, \tag{11.4.1.3} $$
which is equivalent to trace of the right hand side of (11.4.1.3).
Proof. Let $V_M$ and $V_R$ denote, respectively, the variance-covariance matrix over
the model and the distribution of the randomization device. Thus we have
$$ = \sigma^2 A(X'X)^{-1}A' + C_\gamma^2 (X'X + R_cI)^{-1}X'(D + \sigma^2 I)X(X'X + R_cI)^{-1}. $$
Theorem 11.4.1.3. The mean square error of the estimator $\hat{\beta}_{R(sc)}$ is given by
The ridge regression estimator $\hat{\beta}_{R(sc)}$ under scrambled responses is more efficient
or if
$$ 0 < R_c < \frac{2(1 + C_\gamma^2)\sigma^2}{\beta'\beta}, \tag{11.4.1.6} $$
which can be derived by taking the trace of (11.4.1.5). It is interesting to note that the
range of the ridge constant $R_c$ is wider in the case of scrambled responses than in the
case of direct question survey sampling because $C_\gamma > 0$. Also (11.4.1.6) shows that
the ridge constant $R_c$ is directly proportional to $C_\gamma$.
Singh, Horn, and Chowdhury (1998) have introduced a new and interesting model
in survey sampling for social sciences. Suppose someone is interested in estimating
the proportionate strength (or size) and the average of a stigmatizing quantitative
character of a particular hidden gang $G$ (defined by that character or otherwise) in
the finite population $\Omega$. People are unwilling to admit membership of gang $G$ due
to fear of administrative retribution or social embarrassment. Singh, Horn, and
Chowdhury (1998) attempted a possible solution to the following types of problems
in survey sampling.
( 1 ) Estimation of the proportion of persons in the population $\Omega$ having income
greater than or equal to \$60,000 (say, gang $G_1$) along with their average income.
( 2 ) Estimation of the proportion of persons having extra marital relations
(say, gang $G_2$) in the whole population and their average income.
( 3 ) Estimation of the proportion of politically active persons in the country
(say, gang $G_3$) and their average income [or the average number of murders
committed by them].
( 4 ) More generally, estimation of the proportion of persons involved in a particular
crime (say, gang $G$) along with the average value, $\mu_x$, of any stigmatized quantitative
character (say, $X$) of the same gang.
In the first method Singh, Horn, and Chowdhury (1998) suggested drawing two
independent samples of sizes $n_1$ and $n_2$ from the population using SRSWR. The
persons appearing in the first sample are provided with a randomization device $R_1$
(say). The device may be a spinner, deck of cards, or computer grid which
generates a random variable taking a value greater than or equal to $\Psi_0$ (say) from a
given probability distribution. Each respondent selected in the sample is requested
to draw one random number (which is obviously greater than or equal to $\Psi_0$) from
the randomization device $R_1$, without showing it to the interviewer. He is then
requested to report the actual value of the stigmatized quantitative variable, say $X$,
if and only if he belongs to gang $G$; otherwise he is requested to report the random
number as drawn. The value of $\Psi_0$ depends upon the problem under consideration.
For example, in problem 1 the value of $\Psi_0$ is \$60,000; in problems 2 and 3 the
value of $\Psi_0$ can be taken as zero or any other suitable value. The choice of $\Psi_0$ can
be made such that the respondents' privacy will not be jeopardized if they respond
honestly. It is assumed that the distribution of the randomization device, $R_1$, is
known to the interviewer, but not the number drawn by the respondent. Let $\theta_1$ and
$\sigma_1^2$ denote the known mean and variance of the randomization device $R_1$. Let $\pi$
denote the proportion of persons belonging to the gang $G$ in the population $\Omega$. If
the respondents are reporting 100% truthfully then the distribution of the $i$-th
response, say $Z_{1i}$, is given by
$$ Z_{1i} = \begin{cases} X_i & \text{with probability } \pi, \\ R_{1i} & \text{with probability } (1-\pi). \end{cases} \tag{11.5.1.1} $$
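$$ V(\hat{\pi}_s) = \frac{1}{(\theta_1 - \theta_2)^2}\left[\frac{\sigma_{z_1}^2}{n_1} + \frac{\sigma_{z_2}^2}{n_2}\right], \tag{11.5.1.6} $$
where
$$ \sigma_{z_1}^2 = \pi\sigma_x^2 + (1-\pi)\sigma_1^2 + \pi(1-\pi)(\mu_x - \theta_1)^2 \tag{11.5.1.7} $$
and
$$ \sigma_{z_2}^2 = \pi\sigma_x^2 + (1-\pi)\sigma_2^2 + \pi(1-\pi)(\mu_x - \theta_2)^2 . \tag{11.5.1.8} $$
Proof. Follows from the independence of samples.
Theorem 11.5.1.4. An expression for the variance of the estimator $\hat{\mu}_x$ is given by
$$ \hat{\mu}_x = \Delta_1/\Delta_2 , $$
where
$$ \Delta_1 = \bar{Z}_2\theta_1 - \theta_2\bar{Z}_1 \quad\text{and}\quad \Delta_2 = (\bar{Z}_2 - \theta_2) - (\bar{Z}_1 - \theta_1). $$
By the ratio method of estimation, the variance of the estimator $\hat{\mu}_x$ is given by
(11.5.1.12)
On using (11.5.1.12) in (11.5.1.11) we have (11.5.1.10). Hence the theorem.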
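The method 1 estimators can be sketched in Python from the two sample means of the scrambled responses. The estimator of $\mu_x$ is the ratio $\Delta_1/\Delta_2$ given above; the estimator of $\pi$ used here, $\Delta_2/(\theta_1 - \theta_2)$, is the moment estimator implied by $E(\Delta_2) = \pi(\theta_1 - \theta_2)$ and is an assumption of this sketch, as are the function names and numerical values.

```python
def mixture_variance(pi, var_x, var_r, mu_x, theta):
    """Response variance as in (11.5.1.7) and (11.5.1.8)."""
    return pi * var_x + (1 - pi) * var_r + pi * (1 - pi) * (mu_x - theta) ** 2


def hidden_gang_estimates(z1_bar, z2_bar, theta1, theta2):
    """Return (pi_hat, mu_hat) from the two sample means of responses."""
    delta1 = z2_bar * theta1 - theta2 * z1_bar
    delta2 = (z2_bar - theta2) - (z1_bar - theta1)
    pi_hat = delta2 / (theta1 - theta2)   # assumed moment estimator of pi
    mu_hat = delta1 / delta2              # ratio estimator of mu_x
    return pi_hat, mu_hat


def variance_pi_hat(var_z1, var_z2, n1, n2, theta1, theta2):
    """Variance (11.5.1.6) of the estimator of pi."""
    return (var_z1 / n1 + var_z2 / n2) / (theta1 - theta2) ** 2


# Hypothetical check: plug in the exact expected means for pi = 0.2, mu_x = 50.
pi, mu_x, theta1, theta2 = 0.2, 50.0, 10.0, 30.0
z1_bar = pi * mu_x + (1 - pi) * theta1  # 18.0
z2_bar = pi * mu_x + (1 - pi) * theta2  # 34.0
print(hidden_gang_estimates(z1_bar, z2_bar, theta1, theta2))
```

With the exact expected means as inputs the function returns the true $\pi$ and $\mu_x$, which confirms that both estimators are moment-consistent; the closer $\theta_1$ is to $\theta_2$, the larger the variance (11.5.1.6) becomes.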
The next section discusses strategies for using method 1 in actual survey
sampling.
Decision Making Strategies: The decision maker may have any of the following
objectives:
Case I. When the investigator is interested in obtaining a more precise estimate of
the population proportion $\pi$ than of the mean $\mu_x$, he will seek to minimize the variance of
the estimator $\hat{\pi}_s$ at the expense of the variance of $\hat{\mu}_x$. The variance of the
estimator $\hat{\pi}_s$ will be minimal if the sample sizes $n_1$ and $n_2$ are given by
Case III. Let $\sigma_{z_1}^2 = \sigma_{z_2}^2$, as there is no prior reason to treat the two samples
asymmetrically. Then for a given $\theta = |\theta_1 - \theta_2|$ we can minimize $V(\hat{\pi}_s)$ and $V(\hat{\mu}_x)$
Singh, Horn, and Chowdhury (1998) proposed another method, according to which
a sample of $n$ respondents, instead of the $n_1$ respondents of the previous method, is
selected by SRSWR. Each respondent selected in the sample is asked two
questions. The first question, about the value of $X_i$, is asked by using the randomization
device $R_1$ as discussed earlier. The second question is asked about membership
of the gang using the Warner (1965) model. For this situation we define a random
variable $t_i$ such that
$$ t_i = \begin{cases} 1 & \text{with probability } P\pi + (1-P)(1-\pi), \\ 0 & \text{with probability } P(1-\pi) + \pi(1-P), \end{cases} \tag{11.5.2.1} $$
where $\bar{Z} = n^{-1}\sum_{i=1}^{n} Z_{1i}$ denotes the observed mean of the $n$ responses. In order to find the
It follows that
$$ \text{Cov}(\bar{Z}, \bar{t}) = \frac{E(t)\left[E(Z \mid t = 1) - E(Z)\right]}{n} \neq 0 $$
as
$$ E(Z \mid t = 1) = P(G \mid t = 1)\mu_x + P(G^c \mid t = 1)\theta_1 \neq E(Z) $$
since
$$ P(G \mid t = 1) = \frac{P\pi}{P\pi + (1-P)(1-\pi)} \neq \pi . $$
Hence the lemma.
We now have the following theorem, the proof of which is easily obtained by
proceeding on the lines of obtaining the variance for the ratio estimator.
Theorem 11.5.2.3. The variance of the estimator $\hat{\mu}_x^{*}$, to the first order of
approximation, is given by
$$ V(\hat{\mu}_x^{*}) \approx \frac{1}{\pi^2}\left[V(\bar{Z}) + (\mu_x - \theta_1)^2 V(\hat{\pi}_w) - 2(\mu_x - \theta_1)\,\text{Cov}(\bar{Z}, \hat{\pi}_w)\right]. \tag{11.5.2.6} $$
The result of Theorem 11.5.2.3 can also be expressed as follows:
The result of Theorem 11.5.2.3 can also be expressed as follows:
Let
1l = NG
N
be the proportion of persons in the finite population n belonging to the hidden
gang G . Let Y and X be the two quantitative sensitive characters of a hidden
gang G . We wish to estimate the population means
1 NG 1 NG
J1x =-N LXi' and J1 y =-N L Yi
G i=1 G i=l
value X and (1- 1l) is the probability that he will report value Tx '
Thus the distribution of the first response of the /h respondent is:
X with probability 1l ,
ZI - (11.5.3.1)
i - { T with probability (1 - 1l }
x
The third device $R_w$ is the same as that invented by Warner (1965), that is, it
consists of two outcomes. The statement 'Are you a member of gang G?' occurs
with probability $P$ and its complement 'Are you not a member of gang G?' occurs
with probability $(1-P)$. Each respondent is also requested to use the device $R_w$ and
report 'Yes' or 'No' according to his status and the statement drawn by him from
the device; this forms the third response of the $i$th respondent. Thus
the probability of a 'Yes' answer is given by
$$\theta = P\pi + (1-P)(1-\pi). \qquad (11.5.3.5)$$
$$\hat{\pi}_w = \frac{\hat{\theta} - (1-P)}{2P-1}, \qquad P \neq 0.5, \qquad (11.5.3.6)$$
where $\hat{\theta} = n_1/n$ is the observed proportion of 'Yes' answers in the sample.
Proof. Obvious from (11.5.3.5).
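As a small illustration of (11.5.3.5) and (11.5.3.6), the following sketch computes the 'Yes' probability and inverts it to recover the proportion. The function names `yes_probability` and `warner_estimate` are hypothetical labels chosen here, not notation from the text.

```python
def yes_probability(pi, P):
    """Probability of a 'Yes' answer under the Warner device, as in (11.5.3.5)."""
    return P * pi + (1 - P) * (1 - pi)


def warner_estimate(n_yes, n, P):
    """Estimate pi from the observed 'Yes' count, as in (11.5.3.6)."""
    if abs(2 * P - 1) < 1e-12:
        raise ValueError("P must differ from 0.5")
    theta_hat = n_yes / n  # observed proportion of 'Yes' answers
    return (theta_hat - (1 - P)) / (2 * P - 1)


if __name__ == "__main__":
    # With pi = 0.3 and P = 0.7 the 'Yes' probability is 0.42,
    # so 420 'Yes' answers out of 1000 lead back to an estimate near 0.3.
    print(yes_probability(0.3, 0.7))
    print(warner_estimate(420, 1000, 0.7))
```

The estimator is undefined at $P = 0.5$ because the 'Yes' probability then no longer depends on $\pi$.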
"w
Proof. Obtained by the method of moments.
$$Z_{1i}^2 = \begin{cases} X_i^2 & \text{with probability } \pi, \\ T_x^2 & \text{with probability } (1-\pi). \end{cases} \qquad (11.5.3.10)$$
V(X) = E(X 2)- {E(X)f = 0';1 - (l-Jr )~(Ox - f-lxf + r;} . (11.5.3.13)
Jr
Similarly
v(r) = 0;2 -(l-Jr)~(Oy-f-ly~+r;} (11.5.3.14)
Jr
Again we have
Pxy - Jr~O;1 -(l-Jr)~(Ox _f-lx)2 +r;} ~O;I -(I-Jr)~(Oy _f-ly~ + r;} . (11.5.3.18)
and the population mean of the desired sub-group or hidden gang given by
$$\mu_x = \frac{1}{N_G}\sum_{i \in G} X_i. \qquad (11.6.3)$$
The randomized response from the $i$th unit for the sample $s_k$, as given in (11.6.1), can
therefore be written as
$$z_{ki} = X_i I_i + (1 - I_i)R_{ki} = X_i I_i + R_{ki} I_i^c, \qquad (11.6.4)$$
where $I_i^c = 1 - I_i$. Denoting the expectation and variance with respect to the
randomization device by $E_R$ and $V_R$, respectively, we have
$$E_R(z_{ki}) = X_i I_i + E_R(R_{ki}) I_i^c = X_i I_i + \theta_k I_i^c = Y_i(k) \ \text{(say)} \qquad (11.6.5)$$
and
$$V_R(z_{ki}) = I_i^c V_R(R_{ki}) = I_i^c \sigma_k^2. \qquad (11.6.6)$$
Now consider the following linear homogeneous unbiased estimator for $E_R(\bar{Z}_k)$,
where $\bar{Z}_k = \frac{1}{N}\sum_{i=1}^{N} z_{ki}$, based on the sampling design $p_k$:
$$T_k = \frac{1}{N}\sum_{i \in s_k} b_{s_k i}\, z_{ki} \qquad (11.6.7)$$
with $b_{s_k i}$ being known constants satisfying the design unbiasedness ($p_k$-unbiasedness)
condition given by
(11.6.8)
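Proof. We have
$$E(T_k) = E_p E_R(T_k) = \frac{1}{N} E_p \sum_{i \in s_k} b_{s_k i} Y_i(k) = \frac{1}{N}\sum_{i=1}^{N} Y_i(k) = \frac{1}{N}\sum_{i=1}^{N}\{X_i I_i + I_i^c \theta_k\}$$
$$= \frac{1}{N}\sum_{i \in G} X_i + \frac{\theta_k}{N}\sum_{i=1}^{N} I_i^c = \frac{N_G}{N}\,\frac{1}{N_G}\sum_{i \in G} X_i + \theta_k\frac{(N - N_G)}{N} = \pi\mu_x + (1-\pi)\theta_k.$$
Hence the theorem.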
VAk = - 12 [ .L -zl;
() (a; ()) k -I )~ an d ll"A(k) = - 1 r. -I;().
Zk;Zkj (aij ()
k -I + ~ .r. -(-)
N 1Esk ll"; k l~jESk ll"ij k N ;EPk ll"; k
Proof. Consider
E~k)=_12 E [ r. rl(k)+alIf (a;(k)-I)+ r. r. Y;(k)Yj(k)(a;-(k)_I)l
N p ;ESk ll";(k) ;~j Esk ll"ij(k) v J
=~[r.rl(k Xa(k); -1)+ .r.. r. y(k );Yj(kXaij(k )-1)+ ah;: (a; (k )-I)If]
N 1 l~jESk 1
V = v; + ak (1- i(k)),
, 2
(11.6.11)
N
where
A,
Vk = -
1
r. r. (ll";(k)ll"j(k)-ll"ij(k)J(-Zk;- - -Zkj-J2 + --"-~--'--'-'-
al(l-i(k))
N2 i<jEsk ll"ij(k) ll";(k) ll"j(k) N
$$\hat{\pi} = 1 - \frac{T_1 - T_2}{\theta_1 - \theta_2} \qquad (11.6.12)$$
with variance
(11.6.13)
We have
(11.6 .15)
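$$\hat{\mu}_x = \frac{T_1 - (1-\hat{\pi})\theta_1}{\hat{\pi}} = \frac{T_1 - \dfrac{(T_1 - T_2)\theta_1}{\theta_1 - \theta_2}}{\dfrac{(\theta_1 - \theta_2) - (T_1 - T_2)}{\theta_1 - \theta_2}} = \frac{T_1\theta_1 - T_1\theta_2 - T_1\theta_1 + T_2\theta_1}{(T_2 - \theta_2) - (T_1 - \theta_1)} = \frac{T_2\theta_1 - T_1\theta_2}{(T_2 - \theta_2) - (T_1 - \theta_1)}. \qquad (11.6.16)$$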
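The two-sample estimators can be checked numerically. The sketch below assumes, as in the proof above, that each sample estimator $T_k$ has expectation $\pi\mu_x + (1-\pi)\theta_k$, and feeds those expected values into the closed forms of (11.6.12) and (11.6.16). The function name `two_sample_estimates` is a hypothetical label.

```python
def two_sample_estimates(T1, T2, theta1, theta2):
    """Return (pi_hat, mu_hat) from two sample estimators T1, T2 obtained
    with known randomization-device means theta1 and theta2."""
    if theta1 == theta2:
        raise ValueError("the two device means must differ")
    pi_hat = 1 - (T1 - T2) / (theta1 - theta2)
    mu_hat = (T2 * theta1 - T1 * theta2) / ((T2 - theta2) - (T1 - theta1))
    return pi_hat, mu_hat


if __name__ == "__main__":
    # Illustrative (made-up) values: pi = 0.2, mu_x = 50, theta1 = 10, theta2 = 30.
    pi, mu_x, theta1, theta2 = 0.2, 50.0, 10.0, 30.0
    T1 = pi * mu_x + (1 - pi) * theta1  # expected value of T_1
    T2 = pi * mu_x + (1 - pi) * theta2  # expected value of T_2
    print(two_sample_estimates(T1, T2, theta1, theta2))
```

Plugging in the exact expectations returns the assumed $\pi$ and $\mu_x$, which is a quick consistency check on the algebra rather than a statement about sampling behaviour.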
without revealing to the interviewer which mode has been followed for giving the
response. Here we should stress the fact that Ii should have the same support as Xi'
otherwise one immediately knows the answer is either from option (i ) or ( ii ). Let
$W$ be the probability that the respondent selects option (i), so that $(1 - W)$ is
the probability that he/she selects option (ii).
In order to find the variance of the estimator in (11.7.4) we have the following
lemma.
Proof. We have
On putting the value of a; from (11.7 .10) in (11.7.9) we have (11.7.8). Hence the
theorem.
$$\text{PRE} = \frac{C_x^2 + C_\gamma^2(1 + C_x^2)}{C_x^2 + C_\gamma^2(1 + C_x^2)(1 - W)} \times 100\%, \qquad (11.7.14)$$
where $C_x = \sigma_x/\mu_x$ and $C_\gamma = \gamma/\theta$ have their usual meanings. Note that $0 \le W \le 1$,
therefore (11.7.14) shows that the PRE is never less than 100%.
Theorem 11.7.4. The estimator Y1 is always more efficient than the estimator Yo.
Remark 11.7.1. The exact value of $W$ can never be known. In practice, the
investigator/interviewer does not need the actual value of $W$. The estimator
$\hat{Y}_1 = \frac{1}{n}\sum_{i=1}^{n} z_i$ at (11.7.4) and the estimator of its variance $s_z^2 = (n-1)^{-1}\sum_{i=1}^{n}(z_i - \hat{Y}_1)^2$ at
Remark 11.7.3. A rough guess can be made about $W$ from a past survey or pilot
survey. For example, if 20 out of 100 persons would like to report the actual value
truthfully, then $W$ can be taken as 0.2.
Remark 11.7.4. It is a fact that we have assumed W to be the same for all the
respondents in the sample, which is somewhat restrictive. It is worth mentioning
here that the case of unequal W for each respondent can be handled by a
hierarchical (or empirical) Bayes' approach.
Example 11.7.1. Show graphically that the percent relative efficiency (PRE) of the
optional randomized response technique is an increasing function of the proportion
of respondents revealing direct answers.
Given: $C_x = 0.1$ and $0.7$.
Solution. The graphical representation of the PRE of the optional randomized
response technique is as follows:
[Figure: PRE of the optional randomized response technique plotted against the proportion of respondents revealing direct answers, in two panels, C(x) = 0.1 and C(x) = 0.7, each with curves for C(s) = 0.1, 0.5 and 0.9.]
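The behaviour shown in the example can also be tabulated directly. The sketch below assumes that (11.7.14) has the form PRE $= 100\,[C_x^2 + C_\gamma^2(1+C_x^2)] / [C_x^2 + C_\gamma^2(1+C_x^2)(1-W)]$; the function name `pre_optional` and the grid of $W$ values are illustrative choices, not taken from the text.

```python
def pre_optional(W, c_x, c_gamma):
    """Percent relative efficiency of the optional technique, assuming the
    form of (11.7.14) stated in the lead-in above."""
    if not 0 <= W <= 1:
        raise ValueError("W must lie in [0, 1]")
    spread = c_gamma ** 2 * (1 + c_x ** 2)
    return 100.0 * (c_x ** 2 + spread) / (c_x ** 2 + spread * (1 - W))


if __name__ == "__main__":
    # Tabulate PRE against W for the two values of C_x used in the example.
    for c_x in (0.1, 0.7):
        for c_gamma in (0.1, 0.5, 0.9):
            row = [round(pre_optional(w / 10, c_x, c_gamma), 1) for w in range(10)]
            print(c_x, c_gamma, row)
```

Under this assumed form the denominator shrinks as $W$ grows, so each printed row is non-decreasing in $W$.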
Following Chaudhuri and Mukerjee (1988), Chaudhuri and Roy (1997b) assumed
that randomized response devices are available to produce a response $r_i$ from the $i$th
respondent in the sample such that
(11.8.1)
where $\alpha$, $\gamma$ and $\theta$ in (11.8.1) are constants and have their usual meanings. For
example, consider a practicable randomization device proposed by Chaudhuri and
Adhikary (1990). According to this device the $i$th respondent in the sample is
required to choose independently at random two tickets numbered $a_i$ and $b_i$ out of
boxes provided by the investigator containing the tickets numbered (i) $A_1, A_2, \ldots, A_m$
with known mean $\bar{A}$ and known variance $\sigma_A^2$, and (ii) $B_1, B_2, \ldots, B_l$ with
known mean $\bar{B}$ and known variance $\sigma_B^2$.
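The moments of such a two-ticket device can be verified by exact enumeration. The sketch below assumes that the respondent reports $z_i = a_i Y_i + b_i$ and that the transformed response is $r_i = (z_i - \bar{B})/\bar{A}$; this response rule is an assumption made for illustration, and the ticket values are invented. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def population_variance(tickets):
    """Variance of the ticket values, dividing by the number of tickets."""
    mean = Fraction(sum(tickets), len(tickets))
    return sum((Fraction(t) - mean) ** 2 for t in tickets) / len(tickets)


def device_moments(y, a_tickets, b_tickets):
    """Exact mean and variance of r = (a*y + b - B_bar) / A_bar over all
    equally likely ticket pairs (a, b)."""
    A_bar = Fraction(sum(a_tickets), len(a_tickets))
    B_bar = Fraction(sum(b_tickets), len(b_tickets))
    values = [(Fraction(a) * y + b - B_bar) / A_bar
              for a, b in product(a_tickets, b_tickets)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


if __name__ == "__main__":
    y, A, B = 7, [1, 2, 3], [0, 10]  # illustrative true value and ticket sets
    mean, var = device_moments(y, A, B)
    A_bar = Fraction(sum(A), len(A))
    closed_form = (population_variance(A) * y ** 2 + population_variance(B)) / A_bar ** 2
    print(mean, var, closed_form)
```

Under these assumptions the enumerated mean equals the true value and the enumerated variance matches $(\sigma_A^2 Y_i^2 + \sigma_B^2)/\bar{A}^2$.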
Thus
$$E_R(z_i) = \bar{A}Y_i + \bar{B}, \qquad r_i = (z_i - \bar{B})/\bar{A}, \qquad \text{and} \qquad V_R(r_i) = V_i = (\sigma_A^2 Y_i^2 + \sigma_B^2)/\bar{A}^2,$$
where $E_R$ and $V_R$ denote the expected value and variance corresponding to the
randomization device. Thus on comparing (11.8.1) with the above randomization
device, we get $\alpha = \sigma_A^2/\bar{A}^2$, $\gamma = 0$ and $\theta = \sigma_B^2/\bar{A}^2$. Chaudhuri and Mukerjee (1988)
have also given an estimator for $V_i$ as
Theorem 11.8.1. The regression predictor for estimating the population total $Y$,
under randomized response sampling, is
$$t_r = X\hat{\beta}_Q(r) + \sum_{i \in s} R_i\,[r_i - x_i\hat{\beta}_Q(r)], \qquad (11.8.4)$$
where $Q_i$ and $R_i$ are chosen subject to the condition
$$(1 - R_i\pi_i)/(Q_i\pi_i x_i) = \text{a constant} \quad \forall\, i \in \Omega.$$
Theorem 11.8.2. Two estimators for estimating the variance V(t r ) are given by
+ L(diaki~V;
ie s
where k = 1,2, frij are the sample analogue of the parameters obtained as
Fij = ER[e;(r)- e;][ej(r)- eJ, di = 1/Jri' dij = 1/ Jrij' eij = lJri Jr r: Jrij)' ali = 1, and
The problem of estimation of variance of the linear regression estimator under
randomized response sampling has also been considered by Chaudhuri (1993), and
Chaudhuri and Maiti (1994). Tracy and Singh (2000) considered the problem of
estimation of variance of the general linear regression estimator under scrambled
responses using low level and higher level calibration approaches.
(11.8.1.5)
n
subject to the condition $\sum_{i=1}^{n} w_i x_i = X$ yields $w_i = d_i(1 + q_i x_i \lambda)$, where $\lambda$ denotes
the Lagrange multiplier and the $q_i$ are suitably chosen weights which
result in different forms of the estimators. The resulting estimator $\hat{Y}_s$ at (11.8.1.1)
becomes
-) Inn
y(Ys =- I I
r,
-L - -
rj J2 + In y. = V- (Y-s ,lin
-!- (11.8.1.8)
2 i .. j;l TCi TC j i;( TCi
leading to the estimator of the variance of the usual Horvitz and Thompson (1952)
estimator under RR sampling.
Case III. If we choose $q_i = 1/x_i$ the strategy reduces to the usual ratio estimator of
total under scrambled responses, say $\hat{Y}_R$. Under SRSWOR sampling,
$$\hat{Y}_R = N\bar{r}\left(\frac{\bar{X}}{\bar{x}}\right),$$
where $\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$, and the estimator of variance takes the
form given by
$$\hat{v}(\hat{Y}_R \mid s) = \frac{N^2(1-f)}{n(n-1)}\sum_{i=1}^{n}\frac{\eta_i^2}{(1+\alpha_i)}\left(\frac{\bar{X}}{\bar{x}}\right)^2 + \frac{N}{n}\sum_{i=1}^{n}\hat{V}_i. \qquad (11.8.1.10)$$
Theorem 11.8.1.4. A class of estimators for estimating the variance of the ratio
estimator of population total Y, is given by
$$\hat{v}_g(\hat{Y}_R \mid s) = \frac{N^2(1-f)}{n(n-1)}\sum_{i=1}^{n}\frac{\eta_i^2}{(1+\alpha_i)}\left(\frac{\bar{X}}{\bar{x}}\right)^g + \frac{N}{n}\sum_{i=1}^{n}\hat{V}_i, \qquad (11.8.1.11)$$
where g is a suitably chosen constant such that the variance of the estimator of
variance is minimum.
Theorem 11.8.1.5. A general class of estimators for estimating the variance of the
general regression estimator under scrambled responses is given by
$$\hat{v}_g(\hat{Y}_G \mid s) = \frac{N^2(1-f)}{n(n-1)}\sum_{i=1}^{n}\frac{\eta_i^2}{(1+\alpha_i)}\left(\frac{\bar{X}}{\bar{x}}\right)^g + \frac{N}{n}\sum_{i=1}^{n}\hat{V}_i, \qquad (11.8.1.12)$$
$$\hat{v}(\hat{Y}_G \mid s) = \left[\frac{N^2(1-f)}{n(n-1)}\sum_{i=1}^{n}\frac{\eta_i^2}{(1+\alpha_i)} + \frac{N}{n}\sum_{i=1}^{n}\hat{V}_i\right] H\!\left(\frac{\bar{X}}{\bar{x}}\right), \qquad (11.8.1.14)$$
where $H(\cdot)$ is a parametric function such that $H(1) = 1$, satisfying certain regularity
conditions.
Following Singh, Horn, and Yu (1998) (refer to Chapter 5 for details) consider
here another estimator of variance
A
VN G S
(r. I )_I~ z:~
--£..,
n
ij J
(Wi'li- W/7jf
( )
~A.A
+ £"'Y'iVi, (11.8.2.1)
2 i;Ij("i~1 ,,(I + ai) 1+ a j i;1
where nij and ¢i are the new weights such that the distance between nij and
Dij = dl~ij' as well as that between ¢i and wl, respectively, are minimum. Tracy
and Singh (2000) considered two chi square type of distance functions
D) = -2
1
I I
i;Ij(";~1
(nij- Dijf(Dijt1ij)-1 (11.8.2 .2)
and
1 n{
D2 =2i~I\¢i -Wi J °iWi
2\2( 2)-1 . (11.8.2.3)
They assume that in many situations the variance of the estimator $\hat{X}_{HT}$ of
population total X given by
VSYG(XHT)=!I I
2i;Ij(*~1
0ij(diXi- djXjf
is known either from past surveys or can be calculated . The weights nij are
chosen such that the chi square distance (11.8.2.2) is minimum subject to the
second order calibration constraint
!i: i: nij(diXi-djxjf=vsYG(XHT)'
2 i;[j("i~l
(11.8.2.4)
(11.8.2.5)
where
A (A ) 1 n n ( \2
VSYG X HT = - L L Dij\diXi - djxj) .
2 i;[j(,,;~[
Tracy and Singh (2000) introduced a new calibration constraint
i:¢i[V[x[ +v2 xi +V3]=
i;)
I [u1xl +U 2X i +U 3]= Qx,
i;[
(11.8.2.6)
that E[Vlx? + VZXi + V3] == Qx . Minimization of (11.8.2 .3), subject to (11 .8.2.6), leads
to the new calibration weights given by
do _
n - Wi +
z b"iwl(UJ x? +vZxi +UJ)
~ )
[Qx - ~ Wiz{\UJ Xiz + VZXi + UJ )~ .
z: (11.8.2 .7)
n Z Z i-I
L b"iwi UJ Xi +VZXi +UJ -
i=1
(11.8.2.8)
where
and
n b". w~ (Z Y Z )
I -I-'-'- \ai'i + Yi'i + 0iAUJ Xi + VZXi + V3
" - i=1 +ai
BZ -
I OiWl(UJ xl + VZXi + V3)
i =1
A large number of estimators can be shown to be special cases of the estimator
(11.8.2.8).
Case I. Under the SRSWOR sampling, let qi == l/Xi ' d ij ==(di Xi-d j Xj)-Z, and
0i == ~l xl + VZXi +v3 t.
Also, for simplicity if we take VI == I and Vz == UJ == 0 then
an estimator for estimating the variance of the ratio estimator under scrambled
responses is given by
" (9R 1)
VN s == NZ(I-
(
f) n~
) L(
? ) -:- 2 ) +-N~"z(X)Z
X )Z(s; ""vi -:- , (11.8.2.9)
n 11 - I i=l I + ai X sx n i= l X
vAYG Is)= N
2(1_
f)f ryl F(~,
n(n-I) ;=I(I+a;) X
stJ+ NfVlG(~),
Sx n ;=1 X
(11.8.2.10)
where $F(\cdot,\cdot)$ and $G(\cdot)$ are parametric functions such that $F(1,1) = 1$ and $G(1) = 1$,
satisfying certain regularity conditions. Under SRSWOR sampling a more general
class of estimators has been suggested as
A( A
c YG
V
2(I
Is) = [N ( -f)n
n n -I
ryl Nn 2 ] (X
) L-(--)+-LV; F -;:-, -2
;=1 I + a ; n ;=1
A
X
s;J
sx
. (11.8.2.11 )
and
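$$P(A^c \mid R) = 1 - P(A \mid R). \qquad (11.9.1.2)$$
According to Leysieffer and Warner (1976), the response $R$ is regarded as
jeopardizing with respect to $A$ or $A^c$ if
$$P(A \mid R) > \pi \quad \text{or} \quad P(A^c \mid R) > 1 - \pi, \qquad (11.9.1.3)$$
respectively. Thus we have
$$\frac{P(A \mid R)}{P(A^c \mid R)} \cdot \frac{(1-\pi)}{\pi} = \frac{P(R \mid A)}{P(R \mid A^c)}. \qquad (11.9.1.4)$$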
It follows that if the right hand side of (11.9.1.4) is greater (less) than unity, then $R$
is jeopardizing with respect to $A$ ($A^c$) in the sense that, with this response, a
respondent genuinely of group $A$ ($A^c$) rather than of group $A^c$ ($A$) tilts the
scale against himself/herself if $A$ ($A^c$) is stigmatizing. From (11.9.1.3) and
(11.9.1.4), Leysieffer and Warner (1976) proposed the natural measures of jeopardy
carried by $R$ about $A$ and $A^c$, respectively, which are as follows:
$$g(R \mid A) = P(R \mid A)/P(R \mid A^c) \quad \text{and} \quad g(R \mid A^c) = 1/g(R \mid A). \qquad (11.9.1.5)$$
These functions are called jeopardy functions. The response $R$ is non-jeopardizing
if and only if
$$g(R \mid A) = 1. \qquad (11.9.1.6)$$
Clearly the probability of a 'Yes' response is given by
$$\lambda = P(Y \mid A)\pi + (1-\pi)P(Y \mid A^c) = [P(Y \mid A) - P(Y \mid A^c)]\pi + P(Y \mid A^c). \qquad (11.9.1.7)$$
If an SRSWR sample of size $n$ is taken and $\hat{\lambda}$ is the sample proportion of 'Yes'
answers, then following Warner (1965) an unbiased estimator of $\pi$ is given by
$$\hat{\pi} = \frac{\hat{\lambda} - P(Y \mid A^c)}{P(Y \mid A) - P(Y \mid A^c)}, \qquad (11.9.1.8)$$
which is defined if and only if
$$P(Y \mid A) - P(Y \mid A^c) \neq 0, \qquad (11.9.1.9)$$
that is, if and only if the condition $g(R \mid A) = 1$ is violated. In fact, we have proved
that the existence of an unbiased estimator for $\pi$ necessarily makes a response
jeopardizing with respect to either $A$ or $A^c$.
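The jeopardy functions and the estimator in (11.9.1.8) depend only on the two design probabilities $P(Y \mid A)$ and $P(Y \mid A^c)$. The following sketch computes them for any binary-response design; the function names are hypothetical, and an infinite jeopardy value is returned where a denominator vanishes.

```python
import math


def _ratio(num, den):
    """Ratio that returns infinity when the denominator is zero."""
    return math.inf if den == 0 else num / den


def jeopardy_functions(p_yes_A, p_yes_Ac):
    """Return (g(Y|A), g(N|A^c)) in the sense of (11.9.1.5)."""
    g_yes_A = _ratio(p_yes_A, p_yes_Ac)
    g_no_Ac = _ratio(1 - p_yes_Ac, 1 - p_yes_A)
    return g_yes_A, g_no_Ac


def estimate_pi(lambda_hat, p_yes_A, p_yes_Ac):
    """Estimator (11.9.1.8); undefined when the two design probabilities coincide."""
    denom = p_yes_A - p_yes_Ac
    if denom == 0:
        raise ValueError("non-jeopardizing design: pi is not estimable")
    return (lambda_hat - p_yes_Ac) / denom


if __name__ == "__main__":
    # Warner-type design with P = 0.7: P(Y|A) = 0.7 and P(Y|A^c) = 0.3.
    print(jeopardy_functions(0.7, 0.3))
    print(estimate_pi(0.4, 0.7, 0.3))
```

The `ValueError` branch mirrors condition (11.9.1.9): a design with $g(R \mid A) = 1$ carries no information about $\pi$.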
The variance of ir becomes
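$$\frac{\partial V(\hat{\pi})}{\partial g(Y \mid A)} < 0 \quad \text{and} \quad \frac{\partial V(\hat{\pi})}{\partial g(N \mid A^c)} < 0, \quad \text{respectively.}$$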
It indicates that for the sake of efficiency one needs as large magnitudes as possible
for $g(Y \mid A)$ and $g(N \mid A^c)$, both above unity. From a practical point of view,
regarding protection of privacy, one can fix some maximal allowable levels of
$g(Y \mid A)$ and $g(N \mid A^c)$ (say, $k_1$ and $k_2$), respectively. After fixing $g(Y \mid A)$ and
$g(N \mid A^c)$ at $k_1$ and $k_2$, the optimal choice of the design parameters for the particular
RR model can be worked out. In this way one can work out the variance
expressions for each RR model by substituting the values of the design parameters,
and the models can then be compared at the same level of protection of privacy.
Lanke (1976) assumed that the member of A may hesitate to reveal which group
he/she belongs to. On the other hand, a person who belongs to A C is supposed to be
quite willing to acknowledge the fact. It means the membership in A may be
embarrassing while membership in AC cannot be considered so. 'Embarrassment'
must mean 'suspicion of belonging to A '. A reasonable conclusion is, then, that
the larger the conditional probability of belonging to A given a certain answer, the
greater the embarrassment caused by giving that response.
summarised as follows. First derive the conditional probabilities $P(A \mid Y)$ and
$P(A \mid N)$ using the design probabilities and also check whether $P(A \mid Y) > P(A \mid N)$ or
vice-versa for both strategies. The two strategies, say $P_1$ and $P_2$, will then be considered
equivalent from the protection of privacy point of view if $P_1(A \mid Y) = P_2(A \mid Y)$. On
equating these two conditional probabilities a relationship is obtained between the
design parameter (say, $P_1$) of one strategy and the design parameter (say, $P_2$) of the
other strategy. Substituting this value of $P_1$ into the variance expression of the first
strategy and comparing it with the variance of the second strategy, we are in
a position to assess the efficiency of the first strategy with respect to the other strategy
at the same level of protection of privacy. Using the above two measures Bhargava
(1996) and Bhargava and Singh (2002) compared the Mangat and
Singh (1990) and Mangat (1994) strategies with the pioneering model of Warner
(1965).
with known probabilities $P_2$ and $(1 - P_2)$ respectively, is exactly the same as used
by Warner (1965). The respondent is instructed to use the
randomization device $R_1$ first. He/she is to use $R_2$ only if directed by the outcome of
$R_1$. The respondent is required to answer 'Yes' if the outcome points to the
attribute he/she possesses and 'No' if the complement of his/her status is
pointed out by the outcome. The whole procedure is completed by the respondent,
unobserved by the interviewer.
(11.9.3.1)
A flow chart of Mangat and Singh 's two-stage model is given below:
Corollary 11.9.3.1. For $P_2 = P_1$ (say) and $T = 0$ the Mangat and Singh (1990)
model reduces to the Warner (1965) model, or W model.
Corollary 11.9.3.2. We have $V(\hat{\pi}_1) < V(\hat{\pi}_w)$ if $T > (1 - 2P_1)/(1 - P_1)$, which shows that the
estimator $\hat{\pi}_1$ can always be made more efficient than the usual Warner estimator,
$\hat{\pi}_w$ (say), by suitably choosing the value of $T$ for any practicable value of $P_1$.
$P_1 = k/(1+k)$. If, however, $k_1 \neq k_2$, then as $g_w(Y \mid A) = g_w(N \mid A^c)$, different upper bounds
for them cannot be attained simultaneously. In that case if, without loss of
generality, $k_1 < k_2$ one should choose a design such that
$$P_1 = k_1/(k_1 + 1). \qquad (11.9.4.4)$$
For the Mangat and Singh (1990) model we have the design probabilities as
$$P(Y \mid A) = P(N \mid A^c) = T + (1-T)P_2 \quad \text{and} \quad P(N \mid A) = P(Y \mid A^c) = (1-T)(1-P_2). \qquad (11.9.4.5)$$
Theorem 11.9.4.1. The inequality $P(Y \mid A) > P(Y \mid A^c)$ holds if $P_2 > (1 - 2T)/\{2(1 - T)\}$.
Proof. On substituting the values of $P(Y \mid A)$ and $P(Y \mid A^c)$ in $P(Y \mid A) > P(Y \mid A^c)$ we
have
$$T + (1-T)P_2 > (1-T)(1-P_2) \quad \text{or} \quad 1 - 2P_2 < \frac{T}{1-T} \quad \text{or} \quad \frac{1-2T}{1-T} < 2P_2 \quad \text{or} \quad P_2 > \frac{1-2T}{2(1-T)}.$$
Hence the theorem.
Now the jeopardy functions for the Mangat and Singh model are given by
$$g_{ms}(Y \mid A) = \frac{T + (1-T)P_2}{(1-T)(1-P_2)} \quad \text{and} \quad g_{ms}(N \mid A^c) = \frac{T + (1-T)P_2}{(1-T)(1-P_2)}. \qquad (11.9.4.6)$$
Let $k_1$ and $k_2$ be the maximum allowable values of $g_{ms}(Y \mid A)$ and $g_{ms}(N \mid A^c)$. If
$k_1 = k_2 = k$ (say) maximization of these jeopardy functions leads to a design with
$$\frac{T + (1-T)P_2}{(1-T)(1-P_2)} = k \quad \text{or} \quad T + (1-T)P_2 = k(1-T)(1-P_2) \quad \text{or} \quad P_2 = \frac{k(1-T) - T}{(1-T)(1+k)}. \qquad (11.9.4.7)$$
Thus $P_1 = k_1/(k_1+1)$ and $P_2 = \{k_1(1-T) - T\}/\{(1-T)(1+k_1)\}$ are the optimal choices
for design parameters of the W model and MS model respectively.
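The effect of these optimal choices can be checked numerically. The sketch below assumes the usual Warner variance $\pi(1-\pi)/n + P(1-P)/\{n(2P-1)^2\}$ and treats the MS model as a Warner design with effective probability $T + (1-T)P_2$, which is $P(Y \mid A)$ in (11.9.4.5). The function names and the numerical inputs are illustrative.

```python
def warner_variance(pi, n, P):
    """Variance of Warner's estimator for design probability P."""
    return pi * (1 - pi) / n + P * (1 - P) / (n * (2 * P - 1) ** 2)


def optimal_designs(k, T):
    """Design parameters attaining jeopardy level k: P1 for the W model and
    P2 for the MS model with first-stage probability T."""
    P1 = k / (k + 1)
    P2 = (k * (1 - T) - T) / ((1 - T) * (1 + k))
    return P1, P2


def ms_variance(pi, n, P2, T):
    """MS-model variance via the effective Warner probability T + (1 - T) * P2."""
    return warner_variance(pi, n, T + (1 - T) * P2)


if __name__ == "__main__":
    pi, n, k, T = 0.3, 100, 3.0, 0.2
    P1, P2 = optimal_designs(k, T)
    print(P1, P2)
    print(warner_variance(pi, n, P1), ms_variance(pi, n, P2, T))
```

With the jeopardy level held at the same $k$, the effective probability of the MS design collapses to $k/(k+1)$, so the two printed variances agree up to rounding.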
Theorem 11.9.4.2. With the optimal choice $P_1 = k_1/(k_1 + 1)$ of the design parameter,
the variance of the unbiased estimator $\hat{\pi}_w$ is given by
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{k_1(k_1-1)^{-2}}{n}. \qquad (11.9.4.8)$$
Proof. On substituting the value of $P_1$ in the expression for the variance of the W model,
we have
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{P_1(1-P_1)}{n(2P_1-1)^2} = \frac{\pi(1-\pi)}{n} + \frac{k_1(k_1-1)^{-2}}{n}.$$
Hence the theorem.
Thus for the optimum choices of $P_1$ and $P_2$ we have $V(\hat{\pi}_1) = V(\hat{\pi}_w)$, hence we
conclude that both the MS model and the W model are equally efficient at the same level of
protection of privacy.
(11.9.4.14)
Theorem 11.9.4.5. The $\max[P_{ms}(A \mid Y), P_{ms}(A \mid N)]$ occurs if the conditional
probabilities satisfy the following inequalities:
(i) $P_{ms}(A \mid Y) > P_{ms}(A \mid N)$ if $P_2 > (1 - 2T)/\{2(1 - T)\}$; (11.9.4.15)
(ii) $P_{ms}(A \mid N) > P_{ms}(A \mid Y)$ if $P_2 < (1 - 2T)/\{2(1 - T)\}$. (11.9.4.16)
Proof. On using $P_{ms}(A \mid Y)$ and $P_{ms}(A \mid N)$ in the inequality $P_{ms}(A \mid Y) > P_{ms}(A \mid N)$,
we have
$$\frac{\pi[T + (1-T)P_2]}{\pi\{T + (1-T)P_2\} + (1-\pi)(1-T)(1-P_2)} > \frac{\pi(1-T)(1-P_2)}{\pi(1-T)(1-P_2) + (1-\pi)\{T + (1-T)P_2\}},$$
or
$$\pi(1-\pi)\{T + (1-T)P_2\}^2 > \pi(1-\pi)(1-T)^2(1-P_2)^2,$$
or
$$[T + (1-T)P_2] > (1-T)(1-P_2),$$
or
$$P_2 > \frac{1-2T}{2(1-T)},$$
which proves the first part of the theorem. The second part of the theorem can be
similarly proved. Hence the theorem. Therefore the measure of protection of
privacy in the MS model is given by
$$\psi_{ms} = \begin{cases} P_{ms}(A \mid Y) & \text{if } P_2 > (1 - 2T)/\{2(1 - T)\}, \\ P_{ms}(A \mid N) & \text{if } P_2 < (1 - 2T)/\{2(1 - T)\}. \end{cases} \qquad (11.9.4.17)$$
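The case distinction in the theorem can be explored numerically. The sketch below computes $P_{ms}(A \mid Y)$ and $P_{ms}(A \mid N)$ by Bayes' rule from the design probabilities $P(Y \mid A) = T + (1-T)P_2$ and $P(Y \mid A^c) = (1-T)(1-P_2)$, and takes the larger of the two as the measure. The function names and the input values are illustrative only.

```python
def ms_conditional_probs(pi, P2, T):
    """Return (P(A|Y), P(A|N)) for the two-stage (MS) design via Bayes' rule."""
    p_yes_A = T + (1 - T) * P2          # P(Y | A)
    p_yes_Ac = (1 - T) * (1 - P2)       # P(Y | A^c)
    p_yes = pi * p_yes_A + (1 - pi) * p_yes_Ac
    p_A_given_yes = pi * p_yes_A / p_yes
    p_A_given_no = pi * (1 - p_yes_A) / (1 - p_yes)
    return p_A_given_yes, p_A_given_no


def lanke_measure(pi, P2, T):
    """Larger of the two conditional probabilities of belonging to A."""
    return max(ms_conditional_probs(pi, P2, T))


if __name__ == "__main__":
    pi, T = 0.3, 0.2
    threshold = (1 - 2 * T) / (2 * (1 - T))  # equals 0.375 for T = 0.2
    for P2 in (0.7, 0.2):                    # one value on each side of the threshold
        print(P2, P2 > threshold, ms_conditional_probs(pi, P2, T))
```

For $P_2$ above the threshold the 'Yes' answer is the more revealing one, and below it the 'No' answer is, in line with parts (i) and (ii).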
Thus there are two measures of protection of privacy in each strategy. We shall
consider the following four cases for comparing these two strategies.
Case 1. On setting IfIw = IfIms we have
Thus the variance expression of the MS model and the W model under Leysieffer
and Warner's measure remains the same. Hence both the MS model (or two-stage
model) and the W model are equally efficient at the same level of protection of the
respondents . Nayak (1994) has also reported a similar conclusion .
Remark 11.9.1. Although the MS model and the W model perform the same
theoretically, they are psychologically different to the respondents in a
sample, and thus from a psychological point of view one may expect more
co-operation in the MS model than with the W model.
(11.9.5.1)
Again we shall consider here two measures for comparing the M model and the W model.
(a) Leysieffer and Warner's measure: For the M model we have the design
probabilities as follows:
$$P(Y \mid A) = 1, \quad P(N \mid A) = 0, \quad P(Y \mid A^c) = 1 - P_3, \quad \text{and} \quad P(N \mid A^c) = P_3.$$
Clearly the condition $P(Y \mid A) > P(Y \mid A^c)$ is true for $P_3 > 0$. Now the jeopardy
functions are given by
$$g_z(Y \mid A) = 1/(1 - P_3) \quad \text{and} \quad g_z(N \mid A^c) = \infty. \qquad (11.9.6.1)$$
Since the second jeopardy function is infinite, we can take the maximal allowable
limit for $g_z(Y \mid A)$ as $k_1$, that is,
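Theorem 11.9.6.1. With the optimal choice of design parameter $P_3 = (k_1 - 1)/k_1$ the
variance of the unbiased estimator $\hat{\pi}_z$ is given by
$$V(\hat{\pi}_z) = \frac{\pi(1-\pi)}{n} + \frac{(1-\pi)(k_1-1)^{-1}}{n}. \qquad (11.9.6.3)$$
Proof. On substituting $P_3 = (k_1 - 1)/k_1$ in $V(\hat{\pi}_z)$ we have
$$V(\hat{\pi}_z) = \frac{\pi(1-\pi)}{n} + \frac{(1-\pi)(1-P_3)}{nP_3} = \frac{\pi(1-\pi)}{n} + (1-\pi)\left(1 - \frac{k_1-1}{k_1}\right)\Big/\frac{n(k_1-1)}{k_1} = \frac{\pi(1-\pi)}{n} + \frac{(1-\pi)(k_1-1)^{-1}}{n}.$$
Hence the theorem.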
The variance expression given in the above theorem is different from the one for the W
model under the optimum choice of parameters. Thus we have the following
theorem:
Theorem 11.9.6.2. The M model is always more efficient than the W model at an equal
level of protection of the respondents.
Proof. For the optimum choice of parameters we have
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{k_1(k_1-1)^{-2}}{n},$$
and
$$V(\hat{\pi}_z) = \frac{\pi(1-\pi)}{n} + \frac{(1-\pi)(k_1-1)^{-1}}{n}.$$
The M model is more efficient if $V(\hat{\pi}_z) < V(\hat{\pi}_w)$, that is, if $(1-\pi)(k_1-1)^{-1} < k_1(k_1-1)^{-2}$,
which reduces to
(11.9.6.4)
Note that $k_1$ is always more than one, therefore the above inequality will always
hold. This completes the proof.
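A quick numerical comparison of the two optimal-design variances illustrates the theorem. The sketch below uses the two expressions from the proof, with the jeopardy level written as `k`; the function names and the grid of values are illustrative assumptions.

```python
def warner_variance_at_k(pi, n, k):
    """Variance of the W-model estimator at jeopardy level k > 1."""
    return pi * (1 - pi) / n + k / (n * (k - 1) ** 2)


def mangat_variance_at_k(pi, n, k):
    """Variance of the M-model estimator at jeopardy level k > 1."""
    return pi * (1 - pi) / n + (1 - pi) / (n * (k - 1))


if __name__ == "__main__":
    n = 100
    for k in (1.5, 2.0, 4.0, 9.0):
        for pi in (0.05, 0.3, 0.6):
            v_w = warner_variance_at_k(pi, n, k)
            v_m = mangat_variance_at_k(pi, n, k)
            print(k, pi, round(v_w, 5), round(v_m, 5), v_m < v_w)
```

The difference between the two variances is $\{k - (1-\pi)(k-1)\}/\{n(k-1)^2\}$, which is positive whenever $k > 1$, so the final column prints `True` throughout.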
( b ) Lanke's measure: The revealing probabilities for the M model are given by
Theorem 11.9.6.3. The variance of the estimator $\hat{\pi}_w$ for $P_1 = 1/(2 - P_3)$ is given by
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{1 - P_3}{nP_3^2}. \qquad (11.9.6.8)$$
Proof. Obvious from the variance expression of the W model.
Theorem 11.9.6.4 . The M model is always more efficient than the W model at an
equal level of protection of the respondents.
Proof. We have
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{1 - P_3}{nP_3^2}$$
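$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i \quad \text{and} \quad s_z^2 = (n-1)^{-1}\sum_{i=1}^{n}(z_i - \bar{z})^2,$$
respectively, such that
$$E(\bar{z}) = \mu_z \quad \text{and} \quad E(s_z^2) = \sigma_z^2.$$
Thus we have the following theorem:
$$V(\hat{\mu}_x) = \frac{V(\bar{z})}{P^2} = \frac{\sigma_z^2}{nP^2} = \frac{1}{nP^2}\left[P\sigma_x^2 + Q\sigma_y^2 + PQ(\mu_x - \mu_y)^2\right]$$
and an unbiased estimator of $V(\hat{\mu}_x)$ is
$$\hat{v}(\hat{\mu}_x) = s_z^2/(nP^2).$$
Proof. Obvious using above results.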
Pollock and Bek (1976) considered the additive model. In this model each
respondent in an SRSWR sample is asked to add to his/her sensitive attribute ($X$)
a random value ($Y$) taken from a known distribution. The observed response,
denoted by $Z$, is
$$Z = X + Y. \qquad (11.10.2.1)$$
Here the random variable $Y$ is distributed independently of the sensitive attribute $X$.
Then the mean and variance of $Z$ are
$$\mu_z = \mu_x + \mu_y \quad \text{and} \quad \sigma_z^2 = \sigma_x^2 + \sigma_y^2.$$
If $Z_1, Z_2, \ldots, Z_n$ are the observed responses in an SRSWR sample of size $n$, then an
unbiased estimator of $\mu_x$ is obtained as
$$\hat{\mu}_x = \bar{Z} - \mu_y, \qquad (11.10.2.2)$$
where $\bar{Z}$ denotes the sample mean of the observed responses. The variance of the
estimator $\hat{\mu}_x$ is given by
(11.10.2.3)
Pollock and Bek (1976) also considered the multiplicative model. This model was
further considered by Eichhorn and Hayre (1983). According to them, each
respondent in an SRSWR sample of $n$ units is asked to multiply his/her $X$ value
by a random number $Y$ taken from a known distribution, thus giving a
scrambled response to the interviewer. They also referred to this model as the
scrambled RR method. The observed response ($Z$) is
$$Z = XY. \qquad (11.10.3.1)$$
Here also the random variable $Y$ is independent of $X$. The mean and variance of
$Z$ are
$\mu_z = \mu_x\mu_y$ and
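$$J = \begin{vmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1/v & -u/v^2 \end{vmatrix} = -\frac{1}{v},$$
which implies that
$$|J| = \frac{1}{v}.$$
The joint distribution of $u$ and $v$ is given by
$$g(u, v) = f(x, y)|J| = [B(\alpha_1, \beta_1)]^{-1}[B(\alpha_2, \beta_2)]^{-1} v^{(\alpha_1-1)}(1-v)^{(\beta_1-1)}\left(\frac{u}{v}\right)^{(\alpha_2-1)}\left(1 - \frac{u}{v}\right)^{(\beta_2-1)}\frac{1}{v}$$
$$= [B(\alpha_1, \beta_1)]^{-1}[B(\alpha_2, \beta_2)]^{-1} v^{\alpha_1-(\alpha_2+\beta_2)}(1-v)^{(\beta_1-1)}u^{(\alpha_2-1)}(v-u)^{(\beta_2-1)}. \qquad (11.10.5.6)$$
If we assume
$$\alpha_1 = \alpha_2 + \beta_2$$
we obtain
$$g(u, v) = [B(\alpha_1, \beta_1)]^{-1}[B(\alpha_2, \beta_2)]^{-1}(1-v)^{(\beta_1-1)}u^{(\alpha_2-1)}(v-u)^{(\beta_2-1)}. \qquad (11.10.5.7)$$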
The region $(0 < x < 1,\ 0 < y < 1)$ in the $(x, y)$ plane transforms to the region
$(u < v < 1,\ 0 < u < 1)$ in the $(u, v)$ plane. Following Rao (1973), on integrating
$g(u, v)$ over $v$ from $u$ to 1, the density of $u$ is also beta with parameters
$(\alpha_2, \beta_1 + \beta_2)$ provided that the condition $\alpha_1 = \alpha_2 + \beta_2$ is satisfied. Thus the marginal
density of $u$ is given by
$$h(u) = [B(\alpha_2, \beta_1 + \beta_2)]^{-1} u^{(\alpha_2-1)}(1-u)^{(\beta_1+\beta_2-1)}, \quad 0 < u < 1, \qquad (11.10.5.8)$$
which reduces to
$$h(u) = B_1 u^{(\alpha_2-1)}(1-u)^{(\beta_1+\beta_2-1)}, \quad 0 < u < 1. \qquad (11.10.5.9)$$
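The beta form of the marginal density can be sanity-checked through moments. The sketch below assumes independent $X \sim \text{Beta}(\alpha_1, \beta_1)$ and $Y \sim \text{Beta}(\alpha_2, \beta_2)$ with $\alpha_1 = \alpha_2 + \beta_2$, and compares the exact raw moments of $U = XY$ with those of a $\text{Beta}(\alpha_2, \beta_1 + \beta_2)$ variable using rational arithmetic. The function names and parameter values are illustrative.

```python
from fractions import Fraction


def beta_raw_moment(a, b, r):
    """r-th raw moment of a Beta(a, b) variable: prod_{j<r} (a+j)/(a+b+j)."""
    moment = Fraction(1)
    for j in range(r):
        moment *= Fraction(a + j) / Fraction(a + b + j)
    return moment


def product_moment(a1, b1, a2, b2, r):
    """r-th raw moment of U = X*Y for independent beta variables X and Y."""
    return beta_raw_moment(a1, b1, r) * beta_raw_moment(a2, b2, r)


if __name__ == "__main__":
    a2, b2, b1 = 2, 2, 3
    a1 = a2 + b2  # the condition alpha_1 = alpha_2 + beta_2
    for r in range(1, 5):
        print(r, product_moment(a1, b1, a2, b2, r), beta_raw_moment(a2, b1 + b2, r))
```

Matching moments of every order is consistent with (11.10.5.8); the check covers only the first few orders and is not a proof.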
$$h^*(v \mid u) = B_2\frac{(1-v)^{(\beta_1-1)}(v-u)^{(\beta_2-1)}}{(1-u)^{(\beta_1+\beta_2-1)}}, \qquad (11.10.5.10)$$
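The general rule for working out the measure of privacy protection is given by
$$E(X \mid Z) = E(V \mid U) = \int v\,h^*(v \mid u)\,dv \qquad (11.10.5.11)$$
and
$$\text{Var}(V \mid U) = B_2\int_u^1 (v - E(V \mid U))^2\frac{(1-v)^{(\beta_1-1)}(v-u)^{(\beta_2-1)}}{(1-u)^{(\beta_1+\beta_2-1)}}\,dv, \qquad (11.10.5.14)$$
and the measure of privacy protection is
$$E[\text{Var}(V \mid U)] = B_1\int_0^1 \text{Var}(V \mid u)\,u^{(\alpha_2-1)}(1-u)^{(\beta_1+\beta_2-1)}\,du. \qquad (11.10.5.15)$$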
Note that the integral expressions (11.10.5.13) to (11.10.5.15) cannot be evaluated
exactly, therefore it is not possible to derive a simple expression for the efficiency
of the multiplicative model. Bhargava (1996) has resolved this issue through
numerical illustrations with different sets of parameters.
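$$J = \begin{vmatrix} \partial x/\partial u & \partial x/\partial v \\ \partial y/\partial u & \partial y/\partial v \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1,$$
therefore
$$|J| = 1.$$
Using (11.10.5.21) in (11.10.5.19) and using $|J| = 1$ we have the following joint
distribution of $u$ and $v$:
$$g(u, v) = [B(\alpha_1, \beta_1)]^{-1}[B(\alpha_2, \beta_2)]^{-1} v^{(\alpha_1-1)}(1-v)^{(\beta_1-1)}(u-v)^{(\alpha_2-1)}(1-u+v)^{(\beta_2-1)}. \qquad (11.10.5.22)$$
The region $(0 < x < 1,\ 0 < y < 1)$ in the $(x, y)$ plane transforms to the region
$(0 < v < u,\ 0 < u < 1$ and $u - 1 < v < 1,\ 1 < u < 2)$ in the $(u, v)$ plane. Hence by
integrating $g(u, v)$ over $v$, first from 0 to $u$ and then from $(u - 1)$ to 1, we obtain the
density of $u$ as
$$h_1(u) = [B(\alpha_1, \beta_1)]^{-1}[B(\alpha_2, \beta_2)]^{-1}\int_0^u v^{(\alpha_1-1)}(1-v)^{(\beta_1-1)}(u-v)^{(\alpha_2-1)}(1-u+v)^{(\beta_2-1)}\,dv \quad \text{for } 0 < u < 1,$$
$$h_2(u) = [B(\alpha_1, \beta_1)]^{-1}[B(\alpha_2, \beta_2)]^{-1}\int_{u-1}^1 v^{(\alpha_1-1)}(1-v)^{(\beta_1-1)}(u-v)^{(\alpha_2-1)}(1-u+v)^{(\beta_2-1)}\,dv \quad \text{for } 1 < u < 2.$$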
and
_ I" ii
2     2     4     2     0.0238    0.0133
2     2     5     3     0.0178    0.0155
2     2     3     1     0.0333    0.0217
1     1     3     2     0.0250    0.0222
1     1     4     3     0.0083    0.0155
1     1     2     1     0.0417    0.0321
0.5   0.5   2.5   2     0.0215    0.0191
0.5   0.5   3.5   3     0.0139    0.0117
0.5   0.5   1.5   1     0.0412    0.0352
The above table shows that if the sensitive variable follows a beta distribution , the
multiplicative model remains more protective than the additive model in most cases.
Let $\pi$ be the proportion of people belonging to the sensitive group $A$ and $(1 - \pi)$
be the proportion of persons belonging to the non-sensitive group $A^c$ such that
$A \cup A^c = \Omega$. Owing to the sensitive nature of the group $A$, people do not like to
disclose their status to the interviewers. An SRSWR sample of $n$ respondents will
be taken and each respondent will be given two random devices $R_1$ and $R_2$. Each
of the random devices will have two statements:
(i ) I belong to group A;
( ii) I do not belong to group A.
Using different probability mechanisms, under each device the respondent chooses
statement (i) or (ii) with probability $P$ or $(1 - P)$ and simply answers 'Yes' or
'No' depending upon his/her actual status. The responses under the two devices are
assumed to be independent. Let $I$ be the probability that a person gives an
untruthful answer whether he/she belongs to $A$ or $A^c$.
Thus we have
$$I = \Pr(\text{Untruthful answer} \mid A) = \Pr(\text{Untruthful answer} \mid A^c). \qquad (11.11.1)$$
Let
$$X_i = \begin{cases} 1 & \text{if the } i\text{th respondent answers 'Yes' with device } R_1, \\ 0 & \text{otherwise,} \end{cases}$$
and
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th respondent answers 'Yes' with device } R_2, \\ 0 & \text{otherwise.} \end{cases}$$
Then
$$\Pr(X_i = 1, Y_i = 0) = \Pr(X_i = 0, Y_i = 1) = [PI + (1-P)(1-I)][(1-P)I + P(1-I)].$$
Instead of assuming
$I = \Pr(\text{Untruthful answer} \mid A) = \Pr(\text{Untruthful answer} \mid A^c)$
one can assume
$\Pr(\text{Untruthful answer} \mid A) = I$, and $\Pr(\text{Untruthful answer} \mid A^c) = 0$.
Lakshmi and Raghavarao (1992) considered a $p$-variate normal distribution
$N_p(\mu, \Sigma)$ with mean vector $\mu$ and positive definite dispersion matrix $\Sigma$, and denoted by
$1_n$ a column vector of $n$ ones. They considered the
problem of testing the null hypothesis
$$H_0 : I = 0$$
against the alternative hypothesis
$$H_1 : I > 0.$$
Following Rao (1973), the asymptotic distribution of the bivariate random vector
$$\sqrt{n}\left[\left(\frac{n_{10}}{n}, \frac{n_{01}}{n}\right) - \theta_I 1_2'\right]$$
is $N_2(0_2, \Sigma_I)$, where $0_2$ is a zero vector of dimension 2 and
$$\Sigma_I = \begin{bmatrix} \theta_I(1-\theta_I) & -\theta_I^2 \\ -\theta_I^2 & \theta_I(1-\theta_I) \end{bmatrix}.$$
Under the null hypothesis $H_0 : I = 0$, the statistic
$$T = n\left(\frac{n_{10}}{n} - \theta_0,\ \frac{n_{01}}{n} - \theta_0\right)\Sigma_0^{-1}\left(\frac{n_{10}}{n} - \theta_0,\ \frac{n_{01}}{n} - \theta_0\right)' \qquad (11.11.2)$$
is distributed as a central chi square distribution with two degrees of freedom,
where $\theta_0 = P(1 - P)$ and
$$\Sigma_0 = \begin{bmatrix} \theta_0(1-\theta_0) & -\theta_0^2 \\ -\theta_0^2 & \theta_0(1-\theta_0) \end{bmatrix}.$$
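The statistic in (11.11.2) is a quadratic form in a 2 x 2 matrix and can be computed without a linear algebra library. In the sketch below `n10` and `n01` are taken to be the numbers of respondents answering ('Yes', 'No') and ('No', 'Yes') under the two devices, which is an interpretation of the notation assumed here; the function names and counts are illustrative. The p-value uses the fact that a chi square variable with two degrees of freedom has survival function $\exp(-t/2)$.

```python
import math


def untruthful_test_statistic(n10, n01, n, P):
    """Quadratic form T of (11.11.2) under H0, with theta_0 = P(1 - P)."""
    theta0 = P * (1 - P)
    a = theta0 * (1 - theta0)   # diagonal entry of Sigma_0
    b = -theta0 ** 2            # off-diagonal entry of Sigma_0
    d1 = n10 / n - theta0
    d2 = n01 / n - theta0
    det = a * a - b * b         # determinant of Sigma_0, positive for 0 < P < 1
    # Inverse of [[a, b], [b, a]] is (1/det) * [[a, -b], [-b, a]].
    quad = (a * d1 * d1 - 2 * b * d1 * d2 + a * d2 * d2) / det
    return n * quad


def chi2_two_df_pvalue(t):
    """Upper-tail probability of a chi square variable with 2 degrees of freedom."""
    return math.exp(-t / 2)


if __name__ == "__main__":
    T = untruthful_test_statistic(n10=20, n01=20, n=100, P=0.8)
    print(T, chi2_two_df_pvalue(T))
```

A large value of `T` (small p-value) is evidence against $H_0 : I = 0$; the exact form of the critical region is the one given by the authors.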
Lakshmi and Raghavarao (1992) developed the following result: A critical region
for an $\alpha$ level test of
$$H_0 : I = 0$$
against the alternative hypothesis
$$H_1 : I > 0$$
is
(11.11.3)
Singh (2002c) suggested another procedure which may result in a greater sense of
response confidentiality among the sampled individuals. The procedure can be used
in surveys where the respondents selected in the sample assemble at a common
place for the conduct of the survey . This could be a situation of collecting data
from a small town, community or organization. The procedure invokes K decks of
cards (named a stochastic randomization device) with different
proportions of cards carrying the statement, 'I belong to group A'. After explaining
to the respondents how the randomization device provides confidentiality to their
responses, the investigator asks one of the assembled respondents to randomly
select one deck of cards from the box containing K decks of cards . The deck is
then used to collect information on the sensitive attribute from the respondents.
Every sampled respondent draws one card from the selected deck of cards and reads
the statement on it. In this procedure every respondent is provided with two
identical slips of paper with 'Yes' or 'No' printed on them. According to his status
in relation to the statement printed on the card drawn, each respondent is requested
to put one of the two slips of paper into an empty box . After the survey is
completed the number of 'Yes' answers is counted from the box and the proportion
$p^*$ for the deck used in the survey is noted. Random selection of one randomization
device from several such devices may help in increasing the sense of confidentiality
among the respondents. The choice of values of p for preparing K decks of cards
for the survey is important in this procedure. These K values of p could either be
purposively selected by the investigator or they could be taken as a random sample
from a known discrete or continuous density function . Let this density function be
denoted by f(p) . The value of p corresponding to the deck used in the survey
will be selected from this random sample of p values with equal probabilities.
Thus the value of p * used in the survey is a random variable with f(p) as its
probability density function . When f(p) is a one point distribution then this
procedure reduces to Warner (1965). Assume nl persons in the sample answered
'Yes' and (n-nl) answered 'No'. Note that the probability of 'Yes' answer for a
particular choice of p * is given by
$$\theta = p^*\pi + (1-\pi)(1-p^*). \qquad (11.12.1)$$
Consider the following estimator of $\pi$:
$$\hat{\pi}_R = \frac{\hat{\theta} - (1-p^*)}{2p^* - 1}. \qquad (11.12.2)$$
Theorem 11.12.1. The estimator $\hat{\pi}_R$ is unbiased for the population proportion $\pi$.
Proof. We have
$$E(\hat{\pi}_R) = E_1E_2(\hat{\pi}_R) = E_1E_2\left\{\frac{\hat{\theta} - (1-p^*)}{2p^* - 1}\right\} = E_1(\pi) = \pi,$$
where $E_2$ denotes the expected value for the fixed value $p^*$ of $p$ and $E_1$ denotes
the expected value for all values of $p$ generated by its distribution. This
completes the proof of the theorem.
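The two-stage expectation in the proof can be evaluated exactly for a small example. The sketch below assumes that the number of 'Yes' answers is binomial with parameters $n$ and $\theta$ for the selected deck and that each of the $K$ decks is chosen with equal probability; the deck proportions used are invented for illustration, and the function name is hypothetical.

```python
from math import comb


def expected_estimate(pi, n, deck_ps):
    """Exact E_1 E_2 of the estimator (11.12.2) over equally likely decks."""
    total = 0.0
    for p_star in deck_ps:
        theta = p_star * pi + (1 - pi) * (1 - p_star)
        # Inner expectation E_2: sum over all possible 'Yes' counts k.
        inner = sum(
            comb(n, k) * theta ** k * (1 - theta) ** (n - k)
            * ((k / n - (1 - p_star)) / (2 * p_star - 1))
            for k in range(n + 1)
        )
        total += inner / len(deck_ps)  # outer expectation E_1
    return total


if __name__ == "__main__":
    # Four illustrative decks, none with p = 0.5.
    print(expected_estimate(pi=0.3, n=20, deck_ps=[0.6, 0.7, 0.8, 0.9]))
```

The printed value agrees with the assumed $\pi$ up to floating-point rounding, in line with the unbiasedness claimed in the theorem.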
n an 2p-lJ
where f(p) denotes the probability density function (p.d.f.) of p .
Proof. We have
$$V(\hat{\pi}_R) = E_1V_2(\hat{\pi}_R) + V_1E_2(\hat{\pi}_R),$$
where $E_1$ and $E_2$ have been defined earlier. Similarly, $V_2$ and $V_1$, respectively,
denote the variance for a fixed value $p^*$ of $p$ and for all values of $p$ generated by
its distribution. Thus we have
Corollary 11.12.1. For $a = P_0$ and $b = P_0 + g(1 - P_0)$, with $0 < g < 1$, the estimator
$\hat{\pi}_R$ always remains more efficient than Warner's estimator. Here it is possible to
find more acceptable choices for $a$ and $b$.
Theorem 11.12.4. The variance of the estimator $\hat{\pi}_R$ with $f(p)$ defined in (11.12.5)
is given by
$$V(\hat{\pi}_R) = \frac{\pi(1-\pi)}{n} + \frac{1}{n}\left\{\frac{\alpha\beta}{(\alpha - \beta)^2 + \alpha + \beta}\right\}. \qquad (11.12.7)$$
Proof. After replacing the value of f (p) from (11.12.5) in (11.12.3) we have
V(n-R) = "(1- ") +..!.- fp(a+'}-I(I - pj P+I}-l dp,
n no
which proves the theorem.
\[ V(\hat{\pi}_{ms}) = \frac{\pi(1-\pi)}{n} + \frac{(1-P_0)(1-T_0)\{1-(1-P_0)(1-T_0)\}}{n\{2P_0-1+2T_0(1-P_0)\}^2}, \]
making the proposed procedure more efficient than the Mangat and Singh
(1990) method. Hence in this case as well one could possibly find other efficient
and more acceptable values of $\alpha$ and $\beta$.
Exercise 11.1. ( I ) Consider a social survey conducted by an investigator
using randomized responses from a device consisting of two devices $R_1$ and
$R_2$ (say). The device $R_1$ consists of the following two statements:
\[ \hat{\pi}_1 = \frac{\hat{\theta}_1 - (1-P)(1-T)}{2P-1+2T(1-P)}. \]
( b ) Show that the estimator $\hat{\pi}_1$ reduces to the estimator proposed by Warner
(1965) for $T = 0$.
( c ) Show that the estimator $\hat{\pi}_1$ is more efficient than $\hat{\pi}_w$ if
$T > (1-2P)/(1-P)$.
( d ) Study the properties of the estimator $\hat{\pi}_1$ under the SRSWR and the SRSWOR
sampling designs.
Hint: Mangat and Singh (1990, 1991).
( III ) Modify the second randomization device $R_2$ of the Mangat and Singh (1990)
model with the statements:
( 1 ) 'Do you possess the sensitive attribute A?' with probability $P$;
( 2 ) 'Do you possess the unrelated attribute Y?' with probability $(1-P)$.
The probability of a 'Yes' answer is then given by
\[ \theta = T\pi + (1-T)[P\pi + (1-P)\pi_y]. \]
Develop an estimator of $\pi$ and discuss its properties: ( a ) when $\pi_y$ is known; ( b )
when $\pi_y$ is unknown.
Hint: Mangat (1992), Mangat, Singh, Singh, and Singh (1993).
( IV ) Compare the Mangat and Singh (1990) model with the Warner (1965) model at
an equal level of protection of the respondents. Discuss your views.
Hint: Nayak (1994), Bhargava (1996), Moors (1997).
if it is in $G_2$. We confront each person with two urns. There are red balls and black
balls in each urn. We ask each person to select $k$ balls from each urn (WR
sampling), mentally noting the number of red balls obtained from each. We do not
observe the drawing of the balls, and the subjects understand that we will not know
the results of the separate draws. Then each person is told to reveal the number of
red balls obtained from the urn corresponding to his/her group. Persons from $G_1$
will tell us the number of red balls obtained from urn 1, and persons from $G_2$ will
tell us the number of red balls obtained from urn 2. By such a mechanism,
confidentiality of individuals is preserved. Let $\theta_1$ and $\theta_2$ be the proportions of red
balls in the two urns. We control the values of $\theta_1$ and $\theta_2$, and we would never
consider the case $\theta_1 = \theta_2$. Let $r$ be the observed proportion of 'Yes' answers in a
sample of $n$ respondents. Assuming that $k = 1$, show that an unbiased
estimator of the population proportion $\pi$ is given by
\[ \hat{\pi}_1 = \frac{r-\theta_2}{\theta_1-\theta_2}. \]
Find its variance and discuss the results and your views.
Hint: Kuk (1990), Chatterjee and Simon (1993).
Exercise 11.3. Let the value $y_i$ of a sensitive variable $y$, defined on a finite survey
population of $N$ identifiable and labelled persons, be supposed to be unavailable
through a direct response survey when one intends to estimate the population total $Y$
on choosing a sample $s$ from the population with a probability $p(s)$ according to
design $p$. Instead, let a randomized response $r_i$ be available in an independent
manner from the respective persons $i$, on request if sampled, in such a way that
their expectations, variances, and covariances $(E_R, V_R, C_R)$ respectively satisfy
$E_R(r_i) = y_i$, $V_R(r_i) = \alpha_i y_i^2 + \beta_i y_i + \theta_i = \phi_i$ (say), $C_R(r_i, r_j) = 0$ for $i \ne j$, such that
$\alpha_i > 0$, $\beta_i$, and $\theta_i$ are known for every unit in the population.
l}; = (~ J(I-PXY)l}PXY , l}~ = [N(l- PXY)+ P:; r, and l}~ = (1-(P;;)jN +(P;;l}
with Pxy being the known correlation coefficient. Find the bias and variance for
each one of the estimators .
Hint: Bansal, Singh, and Singh (1994), Grewal, Bansal, and Singh (1999) .
( c ) Under PPSWOR sampling, an unbiased estimator of the population total is
\[ \hat{Y}_{WOR} = \sum_{i=1}^{n} \frac{r_i}{\pi_i}. \]
Find its variance and suggest an estimator of variance.
Hint: Godambe (1980b), Arnab (1994).
( d ) If the population total $X$ of an auxiliary character is known, then find the variance
of the generalized regression estimator (GREG) of the population total, defined as
\[ \hat{Y}_{GREG} = \sum_{i=1}^{n} \frac{r_i}{\pi_i} + \hat{\beta}_{ds}\left[X - \sum_{i=1}^{n} \frac{x_i}{\pi_i}\right]. \]
Suggest at least two estimators of its variance.
Hint: Chaudhuri, Maiti, and Roy (1996), Tracy and Singh (2000).
( e ) Assuming that $y_i$ is a qualitative variable, show that a linear homogeneous
unbiased estimator of the population proportion is given by
\[ \hat{Y}_p = \sum_{i \in s} b_{si}\, r_i, \]
where the $b_{si}$ are constants free from the $y_i$ values and satisfy the condition
\[ \sum_{s \ni i} b_{si}\, p(s) = N^{-1}. \]
Hint: Arnab (1996).
Exercise 11.4. Consider a finite population of $N$ first stage units (FSUs) and let the
$i$th, $i = 1,2,...,N$, FSU consist of $M_i$ second stage units (SSUs). Let a sample $s$ of
$n$ FSUs be selected with probability $p_1(s)$ following some sampling design $p_1$ and,
if the $i$th FSU is selected in the sample, we take a sub-sample $s_i$ of $m_i$ SSUs from the
$M_i$ SSUs of the $i$th FSU with probability $p_2(s_i)$ following a sampling design $p_2$.
The sub-samples $s_i$, $i \in s$, are selected independently. The overall sampling design
for selection of the sample $(s_i, i \in s)$ is denoted by $p$. Let $E_p(V_p)$, $E_1(V_1)$, and $E_2(V_2)$
denote expectation (variance) operators over the sampling designs $p$, $p_1$, and $p_2$,
respectively. Let $\pi_i$, $\pi_{ij}$, etc., denote the first and second order inclusion
probabilities of FSUs. Let $y_{ij}$ be the value of the character under study for the $j$th
SSU of the $i$th FSU ($j = 1,2,...,M_i$; $i = 1,2,...,N$). Let $r_{ij}$ denote the standardized
randomized response, such that
$E_R(r_{ij}) = y_{ij}$, $V_R(r_{ij}) = \sigma_{ij}^2 = \alpha_{ij}y_{ij}^2 + \beta_{ij}y_{ij} + \theta_{ij}$ and $C_R(r_{ij}, r_{kl}) = 0$ for $(i,j) \ne (k,l)$.
where $t_i(r) = \sum_{j \in s_i} b_j(s_i)\, r_{ij}$, and $b_i(s)$ and $b_j(s_i)$ are constants free from the $r_{ij}$ values such
that $\sum_{s_i \ni j} b_j(s_i)\, p_2(s_i) = 1$ and $\sum_{s \ni i} b_i(s)\, p_1(s) = 1$. Suggest an estimator of its variance.
Hint: Arnab (1992b).
'Yes' answers) are not reporting 'Yes'. Then show that an unbiased estimator of the
population proportion $\pi$ is given by
\[ \hat{\pi}_t = \frac{\hat{\theta}_1 - (1-P)}{2P-1}, \]
where $\hat{\theta}_1 = m/n$ follows an Inverse Binomial distribution. Is it possible to list
situations where Inverse Binomial Randomized Response (IBRR) can be
implemented in actual practice?
Hint: Mangat and Singh (1991b, 1995).
Exercise 11.9. In the first phase select a preliminary large sample of $m$ units from
the population of $N$ units by using SRSWOR, with only the auxiliary information $X$
measured on these $m$ units as $x_1, x_2, ..., x_m$. In the second phase a sub-sample
of $n$ units is drawn from the preliminary large sample of $m$ units using PPSWR
and then the scrambled responses $r_i$ are measured through a randomization device.
Study the asymptotic properties of an estimator of the population mean
\[ \bar{y}_p = \frac{1}{mn}\sum_{i=1}^{n} \frac{r_i}{p_i^*}, \]
where $p_i^* = x_i \big/ \sum_{i=1}^{m} x_i$
denotes the probability of selecting the $i$th unit from the given first phase sample.
Hint: Grewal, Bansal, and Singh (2002).
Exercise 11.10. In the direct question survey methods, if there are $u$ distinct units
in an SRSWR sample of size $n$ and $k_i$ is the frequency with which the $i$th distinct
unit occurs in the sample, then we have $E(\bar{y}_u) = \bar{Y}$ and $V(\bar{y}_u) \le V(\bar{y}_n)$, where
$\bar{y}_u = u^{-1}\sum_{i=1}^{u} y_i$ and $\bar{y}_n = n^{-1}\sum_{i=1}^{u} k_i y_i$. Show that this inequality oscillates in the case of
randomized response with replacement sampling. Deduce the results for qualitative
characters also.
Hint: Arnab (1995), Mangat, Singh, Singh, Bellhouse, and Kashani (1995), Singh,
Mahmood, and Tracy (2001).
be the total number of respondents who reply 'Yes' in the $i$th sample. Let $\pi_a$ and $\pi_b$
be the true proportions of persons with attributes A and B, respectively. Obviously,
the probability of a 'Yes' answer in the $i$th sample is $\theta_i = P_i\pi_a + (1-P_i)(1-\pi_b)$, $i = 1,2$.
Deduce the estimators of $\pi_a$ and $\pi_b$ and derive their minimum variances subject to
the total sample size remaining fixed at $n = n_1 + n_2$.
Hint: Chang and Liang (1996).
where $\mu_x$ and $\mu_y$ respectively denote the means for the sensitive variable $X$ in
the sub-group and for the variable $Y$ in the randomization device $R_1$. In addition to
the above, each respondent in the sample is also provided with the usual Warner
(1965) randomization device $R_2$ to estimate $\pi$. This device may consist of a deck
of cards having two types of statements:
( i ) 'Do you belong to the sub-group A?' with probability $P$;
and
( ii ) 'Do you belong to the sub-group not-A?' with probability $(1-P)$.
( . x ) = -1-
MSEll
[ 2 ( \-2
2 lrax + 1-lr JU y +(
p(1- p) t
\2'Px-lly }
\2] '
ntt 2p-1J
Hint: Singh, Singh, and Mangat (1996).
Exercise 11.13. Suppose $X$ denotes the sensitive variable. Let $S$ be the scrambling
variable, independent of $X$ and having finite mean and variance. For simplicity,
assume that $X \ge 0$ and $S > 0$. The respondent generates $S$ by using some
specified randomization device. Each respondent scrambles the response on $X$ by
multiplying it by the value taken by the scrambling variable $S$ in his/her case.
Only the scrambled result $y = xs$ is revealed to the interviewer. Note that the
particular value taken by $S$ is not known to the interviewer, thus the respondent's
privacy is not disclosed. Consider a sample of size $n$ drawn using simple random
sampling with replacement from a population of size $N$. Let $y_i$ denote the value
of the scrambled variable $Y$ for the $i$th respondent of the sample, $i = 1,2,...,n$. Show
that the square of the coefficient of variation of the sensitive character $X$ is given by
Exercise 11.14. In Franklin's (1989a, 1989b) model $k \ge 1$ responses are obtained from
each respondent of a simple random with replacement sample of size $n$. The response
$Z_{ij}$, $i = 1,2,...,n$; $j = 1,2,...,k$, is a random number drawn from the density $g_{ij}$ if the
respondent belongs to the sensitive group A; otherwise, it is drawn from the density
$h_{ij}$. The interviewer does not know the density used by the respondent for drawing the
random number. The model can be specialized by having $g_{ij} = g_j$ and $h_{ij} = h_j$ for all
$i = 1,2,...,n$. The densities $g_j$ and $h_j$, respectively, have known means $\mu_{1j}$ and $\mu_{2j}$
and known variances $\sigma_{1j}^2$ and $\sigma_{2j}^2$. Suppose an investigator modified the procedure
suggested by Franklin (1989b) by using the known proportion of the unrelated character
$\pi_y$ in the population and by suitably choosing known parameters of the proposed
( ii ) does not belong to both A and Y he is instructed to use the density $g_{2ij}$.
On the basis of the above information suggest an estimator of the finite population proportion
of interest. Derive its variance expression.
Hint: Singh (1994).
2p -1 2p-1
Also suggest an estimator of Jr in case of unrelated question model based on the
information collected from distinct unit.
Hint : Tracy and Mangat (1998), Arnab (1999).
Exercise 11.17. Consider a population under study consisting of $N$ units. We assume
that the population can be thought of as consisting of two strata, the first stratum
having $N_1$ respondents who will return the completed questionnaire without waiting
for any further communication from the investigator, while the $N_2$ members of the
second stratum (defined as the non-response stratum) will not do so. Thus $N_1 + N_2 = N$.
Let $\mu_1$ and $\mu_2$ denote the population means for the sensitive character $X$ in the first
and second strata. Let the sensitive character $X$ in the first and second strata of the
population be denoted by $X_1$ and $X_2$ respectively, so that
\[ E(X_1) = \mu_1 \quad\text{and}\quad E(X_2) = \mu_2. \]
Then the actual population mean of the sensitive variable $X$ is given by
\[ \mu = \sum_{i=1}^{2} W_i\mu_i, \]
where $W_i = N_i/N$, $i = 1, 2$. Assume we select a sample of size $n$ using the SRSWOR
method. Questionnaires will then be mailed to each of the selected respondents in the
sample with the request that they be returned after completion. The respondents
will also be required to scramble their response on the sensitive character $X$. For
scrambling the sensitive character $X$, each respondent is instructed to select, by SRSWOR, $K$
random natural numbers $S_j$ $(j = 1,2,...,K)$ out of the random sequence of the first $N_0$
natural numbers sent with the questionnaire. Then each respondent is
instructed to calculate the mean, $\bar{S} = K^{-1}\sum_{j=1}^{K} S_j$, of the $K$ selected natural numbers and
record only the scrambled response $Y = \bar{S}X$. The values of $K$ and $N_0$ are the same for
each respondent. Here $S_j$ is a random variable with $1 \le S_j \le N_0$, so that
\[ E(\bar{S}) = \frac{N_0+1}{2} \quad\text{and}\quad V(\bar{S}) = \left\{\frac{N_0}{K}-1\right\}\frac{N_0+1}{12}. \]
Let $n_1$ denote the number of respondents
who return the completed questionnaires, so that $n_2 = (n - n_1)$ is the number of
respondents who do not return the questionnaire. We select a sub-sample of $h_2$
respondents with SRSWOR from the $n_2$ respondents of the non-response stratum such that
$n_2 = h_2 g$, $(g \ge 1)$. These $h_2$ respondents will then be interviewed personally. Let
$Y_{1i} = X_{1i}S_1$ $(i = 1,2,...,n_1)$ and $Y_{2i} = X_{2i}S_2$ $(i = 1,2,...,h_2)$ denote the scrambled
responses given by the respondents at the first and second efforts, respectively. Note
that $X_{1i}$ and $X_{2i}$ are independent of $S_1$ and $S_2$ respectively, therefore
$E(X_{1i}) = E(Y_{1i})/E(S_1) = \mu_1$ and $E(X_{2i}) = E(Y_{2i})/E(S_2) = \mu_2$. On using the sample analogues of the
scrambled responses, unbiased estimators of $\mu_1$ and $\mu_2$ are $\hat{\mu}_1 = 2\bar{y}_1/(N_0+1)$ and
$\hat{\mu}_2 = 2\bar{y}_2/(N_0+1)$, where $\bar{y}_1 = n_1^{-1}\sum_{i=1}^{n_1} Y_{1i}$ and $\bar{y}_2 = h_2^{-1}\sum_{i=1}^{h_2} Y_{2i}$. Consider an estimator for
the population mean $\mu$ as:
\[ \hat{\mu}_w = \frac{n_1\hat{\mu}_1 + n_2\hat{\mu}_2}{n}. \]
Show that $\hat{\mu}_w$ is unbiased for $\mu$ and find its variance for the fixed cost.
Hint: Singh, Singh, and Mangat (1995).
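The moments of the scrambling mean used in Exercise 11.17 can be checked by brute force for small cases. The sketch below enumerates every SRSWOR sample of $K$ numbers from $1, ..., N_0$ and computes the exact mean and variance of the sample mean. The function name and the small values $N_0 = 6$, $K = 2$ are illustrative assumptions, not part of the exercise.

```python
from fractions import Fraction
from itertools import combinations


def scrambling_mean_moments(n0, k):
    """Exact mean and variance of the mean of k numbers drawn by SRSWOR from 1..n0."""
    # Each of the C(n0, k) samples is equally likely under SRSWOR.
    means = [Fraction(sum(c), k) for c in combinations(range(1, n0 + 1), k)]
    expectation = sum(means) / len(means)
    variance = sum((m - expectation) ** 2 for m in means) / len(means)
    return expectation, variance


# Hypothetical small case: N0 = 6 natural numbers, K = 2 selected.
n0, k = 6, 2
e_s, v_s = scrambling_mean_moments(n0, k)

# Closed forms to compare against: (N0 + 1)/2 and (N0/K - 1)(N0 + 1)/12.
e_formula = Fraction(n0 + 1, 2)
v_formula = (Fraction(n0, k) - 1) * Fraction(n0 + 1, 12)
print(e_s == e_formula, v_s == v_formula)
```

Exact fractions are used so that the comparison with the closed forms does not depend on floating point rounding.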
rl
where
Exercise 11.20. Divide the population into $(k+1)$ groups, of which the first
group consists of the responsive group and the remaining $k$ groups belong to the
non-responsive group of varying degrees; each of these non-responsive groups supplies
$a_1, a_2, ..., a_k$ levels of responses. Let $\pi_1, \pi_2, ..., \pi_{k+1}$ be the true probabilities of
responses in these $(k+1)$ groups and $P_1, P_2, ..., P_{k+1}$ be the probabilities that the
spinner points to the responsive group. The value of the response variable is $1,
a_1, a_2, ..., a_k$; that is, the responsive group supplies the full information while the
remaining $k$ groups reveal only the $a_1, a_2, ..., a_k$ levels of information.
Thus
\[ P(x_i = 1) = \pi_1P_1 + (1-P_1)(1-\pi_1), \]
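\[ L = \prod_{j=1}^{k+1} \left[P_j\pi_j + (1-P_j)(1-\pi_j)\right]^{n_j}, \]
where
\[ n = \sum_{j=1}^{k+1} n_j. \]
Show that the maximum likelihood estimate of $\pi_j$, $j = 1,2,...,(k+1)$, is the solution to
the set of singular simultaneous equations
\[ \begin{bmatrix} n_1(2P_1-1) & n_2(1-P_2) & \cdots & n_{k+1}(1-P_{k+1}) \\ n_1(1-P_1) & n_2(2P_2-1) & \cdots & n_{k+1}(1-P_{k+1}) \\ \vdots & \vdots & & \vdots \\ n_1(1-P_1) & n_2(1-P_2) & \cdots & n_{k+1}(2P_{k+1}-1) \end{bmatrix} \begin{bmatrix} \{\pi_1(2P_1-1)+(1-P_1)\}^{-1} \\ \{\pi_2(2P_2-1)+(1-P_2)\}^{-1} \\ \vdots \\ \{\pi_{k+1}(2P_{k+1}-1)+(1-P_{k+1})\}^{-1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \]
Hint: Mishra and Sinha (1999), Bourke (1981), Eriksson (1973).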
Exercise 11.21. ( a ) Assume two independent samples of sizes $n_i$, $i = 1,2$, are drawn
from the whole population by using an SRSWR method such that $n_1 + n_2 = n$, the total
sample size required. Provide a randomization device $S_i$ to the respondents in the $i$th
sample, $i = 1,2$, with two statements: ( i ) 'I belong to group A' and ( ii ) 'I belong to
group Y', represented with probabilities $P_i$ and $(1-P_i)$, $i = 1,2$, respectively. Then
for this model $\theta_i = P_i\pi + (1-P_i)\pi_y$ is the probability of obtaining a 'Yes' answer from a
person in the $i$th sample. If $\hat{\theta}_i$ denotes the proportion of 'Yes' answers obtained from
the respondents in the $i$th sample, show that an unbiased estimator of $\pi$ is
\[ \hat{\pi}_1 = \frac{(1-P_2)\hat{\theta}_1 - (1-P_1)\hat{\theta}_2}{P_1-P_2} \]
with variance
\[ V(\hat{\pi}_1) = \left[(1-P_2)^2\theta_1(1-\theta_1)/n_1 + (1-P_1)^2\theta_2(1-\theta_2)/n_2\right] \big/ (P_1-P_2)^2. \]
Show that the optimal choice is to take one of the $P_i$, $i = 1,2$, close to one and the other close to
zero. Show that the best choice of the value of $\pi_y$ (unknown) is close to zero or one according as
$\pi < 0.5$ or $\pi > 0.5$ and, if $\pi = 0.5$, the minimum variance occurs at the tails of $\pi_y$.
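A small numerical sketch may help with Exercise 11.21. It assumes the two-sample estimator takes the form implied by the variance expression in the exercise, namely $[(1-P_2)\hat{\theta}_1 - (1-P_1)\hat{\theta}_2]/(P_1-P_2)$, which eliminates the unknown $\pi_y$. The function names and all numerical values below are hypothetical.

```python
def unrelated_question_estimate(theta1_hat, theta2_hat, p1, p2):
    """Two-sample estimate of pi; the unknown pi_y cancels algebraically."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    return ((1 - p2) * theta1_hat - (1 - p1) * theta2_hat) / (p1 - p2)


def unrelated_question_variance(theta1, theta2, p1, p2, n1, n2):
    """Variance of the estimate for independent SRSWR samples of sizes n1 and n2."""
    numerator = ((1 - p2) ** 2 * theta1 * (1 - theta1) / n1
                 + (1 - p1) ** 2 * theta2 * (1 - theta2) / n2)
    return numerator / (p1 - p2) ** 2


# Hypothetical values: true pi = 0.2, unrelated attribute pi_y = 0.5.
pi_true, pi_y = 0.2, 0.5
p1, p2 = 0.8, 0.3
theta1 = p1 * pi_true + (1 - p1) * pi_y  # 'Yes' probability in sample 1
theta2 = p2 * pi_true + (1 - p2) * pi_y  # 'Yes' probability in sample 2

# Plugging in the exact 'Yes' probabilities recovers pi, illustrating unbiasedness.
estimate = unrelated_question_estimate(theta1, theta2, p1, p2)
variance = unrelated_question_variance(theta1, theta2, p1, p2, n1=500, n2=500)
print(estimate, variance)
```

Trying values of `p1` and `p2` that are further apart shows the variance shrinking, in line with the claim that one $P_i$ should be close to one and the other close to zero.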
Exercise 11.22. Let $\pi$ be the true proportion of persons belonging to group A. Let
$P_j$, $j = 1,2,3$, be known parameters of a randomization device used for eliciting
information in randomized response surveys. Let $y_i$ be the binary response of the
$i$th respondent in the sample consisting of $n$ respondents; then the probabilities for
the randomized response sensitive question are given by
$\Pr(y_i = 1) = P_1 + (1-P_1-P_2)\pi_{rr}$ and $\Pr(y_i = 0) = P_2 + (1-P_1-P_2)(1-\pi_{rr})$, where $\pi_{rr}$
denotes the proportion of observed 'Yes' answers through the randomization device.
Let $x_i$ be a vector of explanatory variables and $\beta$ a column vector of unknown
parameters, so that the probability for the $i$th respondent is given by
\[ \pi_{rr} = e^{\beta'x_i}\left(1+e^{\beta'x_i}\right)^{-1}. \]
Show that an estimate of $\beta$ can be obtained by maximizing the likelihood function
\[ L = \prod_{i:\, y_i=1}\left[P_1 + (1-P_1-P_2)\frac{e^{\beta'x_i}}{1+e^{\beta'x_i}}\right] \prod_{i:\, y_i=0}\left[P_2 + (1-P_1-P_2)\frac{1}{1+e^{\beta'x_i}}\right]. \]
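\[ \hat{\pi}_s = \frac{r-\theta_2^*}{\theta_1^*-\theta_2^*}, \qquad \theta_1^* \ne \theta_2^*, \]
where $\theta_1^*$ and $\theta_2^*$ are fixed for the given study and their values depend upon the
joint distribution of $\theta_1$ and $\theta_2$.
( a ) Show that the estimator $\hat{\pi}_s$ is unbiased for the population proportion, $\pi$.
( b ) The variance $V(\hat{\pi}_s)$ of the estimator $\hat{\pi}_s$ is given by
\[ V(\hat{\pi}_s) = \frac{\pi(1-\pi)}{n} + \frac{1}{n}\int_c^d\!\!\int_a^b \frac{(1-\pi)\theta_2(1-\theta_2)+\pi\theta_1(1-\theta_1)}{(\theta_1-\theta_2)^2}\, f(\theta_1,\theta_2)\, d\theta_1\, d\theta_2, \]
where $f(\theta_1,\theta_2)$ denotes the joint p.d.f. of $\theta_1$ and $\theta_2$ and $0 \le a < \theta_1 < b \le 1$,
$0 \le c < \theta_2 < d \le 1$.
Hint: Singh (2002c).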
\[ \hat{\pi}_0 = \sum_{i=1}^{2} W_i\hat{\pi}_i \]
such that $\sum_{i=1}^{2} W_i = 1$. Show that $\hat{\pi}_0$ is unbiased and find the optimum weights such
that the variance of $\hat{\pi}_0$ is minimum.
Hint: Folsom, Greenberg, Horvitz, and Abernathy (1973).
Exercise 11.25. Consider that in the Warner (1965) model the persons belonging to the
sensitive group are not all reporting truthfully, but only a proportion $\Delta$ of them are
honest. Assume that those who are not members of the group A are honest and
report truthfully. Show that the probability of a 'Yes' answer becomes
$\theta = \pi\Delta(2P-1)+(1-P)$. Find the variance of the unbiased estimator of $\pi$ defined as
\[ \hat{\pi}_u = \frac{\hat{\theta}-(1-P)}{\Delta(2P-1)}, \]
where $P \ne 0.5$.
Exercise 11.26. Show that an optimal estimator of the population total, $Y$, under
scrambled responses is given by
\[ e(s,r) = a_s + \sum_{i \in s} b_{si}\, r_i, \]
where $r_i$ denotes the scrambled responses, and $a_s$ and $b_{si}$ have their usual meanings.
Hint: Arnab (2002).
Exercise 11.27. Consider an unrelated question model (or U model) for the
situation when $\pi_y$ is known. Each sampled respondent is provided with a random
device consisting of two statements: ( i ) I belong to sensitive group A; and ( ii ) I
belong to non-sensitive group Y; represented with probabilities $P_1$ and $(1-P_1)$,
respectively. The respondent selects randomly one of these two statements,
unobserved by the interviewer, and reports 'Yes' or 'No' with respect to his/her
actual status. The probability of a 'Yes' answer is $\theta_1 = P_1\pi + (1-P_1)\pi_y$. Thus, an
unbiased estimator of the population proportion $\pi$ is given by
\[ \hat{\pi}_G = \frac{\hat{\theta}_1 - (1-P_1)\pi_y}{P_1}. \]
Consider a two-stage randomized response unrelated question model in which each
interviewee is provided with two randomization devices $R_1$ and $R_2$. The
randomization device $R_1$ consists of two statements, namely: ( i ) I belong to
sensitive group A; and ( ii ) Go to randomization device $R_2$; represented with
probabilities $T$ and $(1-T)$, respectively. The randomization device $R_2$ is the same
as used in the U model, represented with probabilities $P_2$ and $(1-P_2)$, respectively.
The probability of a 'Yes' answer is given by $\theta_2 = T\pi + (1-T)[P_2\pi + (1-P_2)\pi_y]$ and
an unbiased estimator of $\pi$ is given by
\[ \hat{\pi}_m = \frac{\hat{\theta}_2 - (1-P_2)(1-T)\pi_y}{T + P_2(1-T)}. \]
( a ) Find the variances of $\hat{\pi}_G$ and $\hat{\pi}_m$.
( b ) Show that at an equal level of protection of the respondents $V(\hat{\pi}_G) = V(\hat{\pi}_m)$.
Hint: Mangat (1992), Mangat, Singh, and Singh (1992), Bhargava and Singh
(2001).
Exercise 11.28. Let $X$ denote the response to the first sensitive question (e.g.,
income) and $Y$ denote the response to the second sensitive question (e.g.,
expenditure). Further assume $S_1$ and $S_2$ to be two scrambling random variables,
each independent of $X$ and $Y$ and having finite means and variances. For simplicity
also assume that $X \ge 0$, $Y \ge 0$, $S_1 > 0$ and $S_2 > 0$.
The interviewee multiplies his response $X$ to the first sensitive question by $S_1$ and
the response $Y$ to the second sensitive question by $S_2$. The interviewer thus
receives two scrambled answers $Z_1 = XS_1$ and $Z_2 = YS_2$. The particular values of
$S_1$ and $S_2$ are not known to the interviewer, but their joint distribution is known.
In this way the respondent's privacy is not violated. Let $E(S_1) = \theta_1$, $E(S_2) = \theta_2$,
$V(S_1) = \gamma_{20}$, $V(S_2) = \gamma_{02}$, $\mathrm{Cov}(S_1,S_2) = \gamma_{11}$, $E(X) = \mu_1$, $E(Y) = \mu_2$, $V(X) = \sigma_x^2 = m_{20}$,
$V(Y) = \sigma_y^2 = m_{02}$, $\gamma_{rs} = E[S_1-\theta_1]^r[S_2-\theta_2]^s$ and $m_{rs} = E[X-\mu_1]^r[Y-\mu_2]^s$, where $\theta_1$,
$\theta_2$, $\gamma_{20}$, $\gamma_{02}$, $\gamma_{11}$, and $\gamma_{rs}$ are known to the interviewer but $\mu_1$, $\mu_2$, $\sigma_x^2$, $\sigma_y^2$ and
$m_{rs}$ are unknown. Also let $\sigma_{Z_1}^2$, $\sigma_{Z_2}^2$ and $\sigma_{Z_1Z_2}$ denote the variances and covariance
of $Z_1$ and $Z_2$, respectively.
( a ) Show that the correlation coefficient between the two sensitive variables $X$ and
$Y$ is then given by
\[ \rho_{xy} = \frac{(\sigma_{Z_1Z_2} - \gamma_{11}\mu_1\mu_2)\sqrt{\gamma_{20}+\theta_1^2}\,\sqrt{\gamma_{02}+\theta_2^2}}{(\gamma_{11}+\theta_1\theta_2)\sqrt{\sigma_{Z_1}^2-\gamma_{20}\mu_1^2}\,\sqrt{\sigma_{Z_2}^2-\gamma_{02}\mu_2^2}}. \]
( b ) Develop an estimator of the correlation coefficient using scrambled responses.
Hint: Singh (1991b), Bellhouse (1995).
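The identity in part ( a ) of Exercise 11.28 can be checked numerically at the level of population moments. The sketch below assumes that $(X, Y)$ is independent of $(S_1, S_2)$ jointly, builds the moments of $Z_1 = XS_1$ and $Z_2 = YS_2$ from chosen moments of $X$, $Y$, $S_1$, $S_2$, and then recovers $\rho_{xy}$ from the scrambled moments alone. Function names and numerical inputs are hypothetical.

```python
import math


def scrambled_moments(mu1, mu2, var_x, var_y, m11, th1, th2, g20, g02, g11):
    """Variances and covariance of Z1 = X*S1 and Z2 = Y*S2 (forward step)."""
    var_z1 = var_x * (g20 + th1 ** 2) + g20 * mu1 ** 2
    var_z2 = var_y * (g02 + th2 ** 2) + g02 * mu2 ** 2
    cov_z = m11 * (g11 + th1 * th2) + g11 * mu1 * mu2
    return var_z1, var_z2, cov_z


def rho_from_scrambled(var_z1, var_z2, cov_z, mu1, mu2, th1, th2, g20, g02, g11):
    """Correlation of X and Y recovered from moments of the scrambled answers."""
    num = ((cov_z - g11 * mu1 * mu2)
           * math.sqrt(g20 + th1 ** 2) * math.sqrt(g02 + th2 ** 2))
    den = ((g11 + th1 * th2)
           * math.sqrt(var_z1 - g20 * mu1 ** 2)
           * math.sqrt(var_z2 - g02 * mu2 ** 2))
    return num / den


# Hypothetical population moments: the true correlation is 60 / (10 * 8) = 0.75.
mu1, mu2, var_x, var_y, m11 = 50.0, 40.0, 100.0, 64.0, 60.0
th1, th2, g20, g02, g11 = 1.0, 1.0, 0.04, 0.09, 0.01

var_z1, var_z2, cov_z = scrambled_moments(mu1, mu2, var_x, var_y, m11,
                                          th1, th2, g20, g02, g11)
rho = rho_from_scrambled(var_z1, var_z2, cov_z, mu1, mu2,
                         th1, th2, g20, g02, g11)
print(rho)
```

In a survey, $\mu_1$, $\mu_2$ and the moments of $Z_1$, $Z_2$ would be replaced by sample analogues computed from the scrambled responses, which is the route part ( b ) asks for.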
Exercise 11.29. A randomization device (say, a deck of cards) consists of three types
of cards bearing the statements: ( i ) I belong to group A; ( ii ) I belong to group Y;
and ( iii ) Draw one more card. The statements are represented with proportions $P$,
$P_1$, and $P_2$, respectively. Note that the characters A and Y are uncorrelated. The
respondent is required to draw one card randomly from the above deck and give an
answer in terms of 'Yes' or 'No' according to his/her actual status if statement
( i ) or ( ii ) is drawn. However, if statement ( iii ) is drawn, the respondent is
required to repeat the above process without replacing that card. If statement
( iii ) is drawn in the second phase, the respondent is directed to report 'No'. If $m$
is the total number of cards in the deck, then the probability of a 'Yes' answer is
\[ \theta = [\pi P + P_1\pi_y][1 + P_2 m/(m-1)]. \]
Construct an unbiased estimator of $\pi$ and study its properties for different values
of the parameters involved in it.
Hint: Singh, Singh, Mangat, and Tracy (1994).
\[ \theta_1 = \pi T + (1-T)\theta \]
Practical 11.2. Assume the true proportion of the extramarital relations in the world
is 0.3. Suppose we used a randomization device to collect information from a large
sample of persons using a randomization device with two types of statements:
Practical 11.3. Ms. Poonam Singh wishes to estimate the proportion of persons
having extramarital relations in the world. Suppose she selected an SRSWR sample
of 10,000 persons across the world, and took responses through a randomization
device producing 80% statements, 'Are you having an extramarital relation?', along
with 20% statements, 'Are you having no extramarital relation?' Out of the 10,000
selected persons 3,000 reported 'Yes' through the above randomization device.
Find her estimate of the proportion of extramarital relations in the world. Also
construct a 95% confidence interval estimate.
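One way to work through Practical 11.3 is with Warner's (1965) estimator, $\hat{\pi}_w = \{\hat{\theta} - (1-P)\}/(2P-1)$, with $P = 0.8$. The sketch below is a minimal illustration; the function name is hypothetical, the variance estimator $\hat{\theta}(1-\hat{\theta})/\{(n-1)(2P-1)^2\}$ is the usual one under SRSWR, and the interval uses a normal approximation, which is an assumption on our part.

```python
import math


def warner_estimate(n_yes, n, p):
    """Warner (1965) estimate of pi and an estimate of its variance under SRSWR."""
    theta_hat = n_yes / n                      # observed proportion of 'Yes'
    pi_hat = (theta_hat - (1 - p)) / (2 * p - 1)
    var_hat = theta_hat * (1 - theta_hat) / ((n - 1) * (2 * p - 1) ** 2)
    return pi_hat, var_hat


# Data from Practical 11.3: 3,000 'Yes' answers out of 10,000, with P = 0.8.
pi_hat, var_hat = warner_estimate(3000, 10000, 0.8)

# 95% interval from the normal approximation (assumed adequate for n = 10,000).
half_width = 1.96 * math.sqrt(var_hat)
ci = (pi_hat - half_width, pi_hat + half_width)
print(round(pi_hat, 4), tuple(round(x, 4) for x in ci))
```

The point estimate works out to $(0.3 - 0.2)/0.6$, about 0.167, with an interval of roughly 0.152 to 0.182.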
( I ) Each male in the sample was asked to respond to one of the two outcomes
using randomization device $R_1$ as follows:
Q1: 'Were you virgin before you got married?' with probability $P_1$;
Q2: 'Were you born during the first 6 months of the year?' with probability $(1-P_1)$.
If $\pi_m$ denotes the true proportion of virgin males before marriage, and $\phi_m$ is the
probability of males born during the first 6 months of the year, then the probability
of a 'Yes' answer for a male in the couple is given by
\[ \theta_m = \pi_m P_1 + (1-P_1)\phi_m. \]
( II ) Each female in the sample was asked to respond to one of the two outcomes
using randomization device $R_2$ as follows:
Q1: 'Were you virgin before you got married?' with probability $P_2$;
Q2: 'Were you born during the first 6 months of the year?' with probability $(1-P_2)$.
If $\pi_f$ denotes the true proportion of virgin females before marriage, and $\phi_f$ is the
probability of females born during the first 6 months of the year, then the probability
of a 'Yes' answer for a female in the couple is given by
\[ \theta_f = \pi_f P_2 + (1-P_2)\phi_f. \]
( III ) Assume $\phi_m = \phi_f = \phi$ (say), that is, the proportions of males and females born
during the first six months of a year are the same; then every couple was asked the
following question directly:
'Were either one (not both) of you born during the first 6 months of the year?'
Obviously the probability of a 'Yes' answer from a couple is given by
\[ \theta = \phi_f(1-\phi_m) + \phi_m(1-\phi_f). \]
( IV ) Using information from ( I ) to ( III ) estimate the proportion of the males and
females who were virgin before they were married. Also obtain a pooled estimate.
Given: $n = 5000$, $n_m = 2000$, $n_f = 4000$, $n_c = 2500$ and $P_1 = P_2 = 0.80$, where $n_m$,
$n_f$, and $n_c$ denote the number of observed 'Yes' responses from males, females,
and couples, respectively.
Practical 11.5. Michael visited hospitals and found that AIDS is a very serious
problem these days. He selected an SRSWOR sample of 70,000,000 persons
across the world and each respondent was asked to use the following two randomization
devices in sequence.
The randomization device $R_1$ consists of two statements, viz.:
Statement ( i ): 'Are you suffering from AIDS?' with probability 0.8;
Statement ( ii ): 'Use second randomization device, $R_2$' with probability 0.2.
The second randomization device $R_2$ consists of the following two statements:
Statement ( i ): 'Are you suffering from AIDS?' with probability 0.7;
Statement ( ii ): 'Are you not suffering from AIDS?' with probability 0.3.
Out of the 70,000,000 sampled persons he received 10,360,000 'Yes'
answers.
( a ) Make a flow chart of the randomization device.
( b ) Estimate the proportion of AIDS patients, and derive a 95% confidence interval
estimate.
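Part ( b ) of Practical 11.5 can be sketched with the two-stage estimator of Mangat and Singh (1990), $\hat{\pi} = \{\hat{\theta} - (1-T)(1-P)\}/\{2P-1+2T(1-P)\}$, here with $T = 0.8$ and $P = 0.7$. The function name is hypothetical. The variance estimate below treats the draws as if made with replacement and ignores any finite population correction, which is a simplifying assumption because the population size is not given.

```python
import math


def two_stage_estimate(n_yes, n, t, p):
    """Two-stage randomized response estimate of pi with a binomial variance estimate."""
    theta_hat = n_yes / n                         # observed proportion of 'Yes'
    denom = 2 * p - 1 + 2 * t * (1 - p)
    pi_hat = (theta_hat - (1 - t) * (1 - p)) / denom
    # Assumption: with-replacement variance form, no finite population correction.
    var_hat = theta_hat * (1 - theta_hat) / ((n - 1) * denom ** 2)
    return pi_hat, var_hat


# Data from Practical 11.5.
pi_hat, var_hat = two_stage_estimate(10_360_000, 70_000_000, t=0.8, p=0.7)

half_width = 1.96 * math.sqrt(var_hat)            # normal approximation
ci = (pi_hat - half_width, pi_hat + half_width)
print(round(pi_hat, 4), ci)
```

With these data $\hat{\theta} = 0.148$, so the point estimate is $(0.148 - 0.06)/0.88 = 0.1$; the interval is very narrow because of the large sample size.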
12. NON-RESPONSE AND ITS TREATMENTS
Missing at random (MAR): The data are MAR if the probability of the observed
absence pattern given the observed and unobserved data, does not depend on the
values of the unobserved data. We shall put all such cases where the data are
missing only owing to chance factors. It will therefore include cases where the
enumerator is not able to contact the respondents only by chance and had he been
able to contact, the data would have been collected. For example, when the
information is kept on punched cards the non-response owed to the accidental loss
of one or more cards is of this category. This type of non-response is called random
non-response.
Observed at random (OAR): The data are OAR if for every possible value of the
missing data the probability of the observed absence pattern, given the observed
and unobserved data, does not depend on the values of the observed data.
Parameter distinctness (PD): PD holds if there are no a priori ties between the
parameters of the absence model and those of the data model. In other words, we
can possibly classify the non-response with respect to its nature into two broad
categories.
Deliberate non-response (DNR): If the respondents are not willing to reveal their
response, such cases of non-response will be classified as deliberate non-response.
For example, the non-response in surveys in which information is being collected
on personal income or on certain personal habits such as drinking, gambling, etc.,
will come under this category.
Unit and total non-response: Kalton and Kasprzyk (1986) say that it is a
common practice to distinguish between total (or unit) non-response, when none
of the survey responses are available for a sampled element, and item non-response,
when some but not all of the responses are available. Total non-response arises
because of refusals, inability to participate, not-at-homes, and untraced elements.
Item non-response arises because of item refusals, 'do not know' answers, omissions, and
answers deleted in editing. For more details about the forms of non-response the
reader is referred to Rubin (1978). We will discuss here a few basic models
followed by their recent developments.
Hansen and Hurwitz (1946) proposed a model for mail survey designs to provide
unbiased estimators of the population mean or total. Their pioneering model consists of the
following steps:
( a ) Select a sample of respondents and mail a questionnaire to all of them;
( b ) After the deadline is over, identify the non-respondents and select a
sub-sample of the non-respondents;
( c ) Collect data from the non-respondents in the sub-sample by interview;
( d ) Combine the data from the two parts of the survey to estimate the population
parameters of interest.
Consider a population of $N$ units that can be divided into two classes:
( a ) those who will respond at the first attempt, forming the response class;
( b ) those who will not respond at the first attempt, forming the non-response class.
Assume that $N_1$ and $N_2$ are the numbers of units in the population that belong to the
response and non-response classes, respectively. We may regard the sample of $n_1$
respondents as a simple random sample from the response class and the sample of
$n_2$ as a simple random sample from the non-response class. Let $h_2$ denote the size
where $S_{2y}^2$ is the population mean square error for the non-response class.
Proof. By definition we have
\[ V(\bar{y}_{hh}) = V_1[E_2(\bar{y}_{hh} \mid n_1, n_2)] + E_1[V_2(\bar{y}_{hh} \mid n_1, n_2)]
= \left(\frac{1}{n}-\frac{1}{N}\right)S_y^2 + \frac{(g-1)}{n}\left(\frac{N_2}{N}\right)S_{2y}^2. \]
Corollary 12.1.1. The second term in the expression of $V(\bar{y}_{hh})$ will vanish if $g = 1$.
In fact, this is the case if it is possible to interview every non-respondent to collect
information.
The cost function for this model has been found to be made of three components:
( b ) Cost of collecting, editing, and processing per unit in the response class = $C_1$
(say);
( c ) Cost of interviewing and processing information per unit in the non-response
class = $C_2$ (say).
Thus it is reasonable to consider the cost function given by
\[ C^* = nC_0 + n_1C_1 + h_2C_2. \tag{12.1.3} \]
Note that $C^*$ varies from sample to sample, thus it is recommended to use the
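Theorem 12.1.3. The optimum values of $g$ and $n$ for the minimum expected cost
are, respectively, given by
\[ g = \sqrt{\frac{C_2\left[S_y^2 - \frac{N_2}{N}S_{2y}^2\right]}{S_{2y}^2\left(C_0 + \frac{N_1}{N}C_1\right)}} \tag{12.1.5} \]
and
\[ n = \left\{S_y^2 + \frac{N_2}{N}(g-1)S_{2y}^2\right\} \Big/ \left\{V_0 + S_y^2/N\right\}. \tag{12.1.6} \]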
Proof. Let the variance $V(\bar{y}_{hh})$ be fixed at $V_0$, that is, $V(\bar{y}_{hh}) = V_0$. Then the
Lagrange function is given by
(12 .1.9)
Note that
\[ V(\bar{y}_{hh}) = \left(\frac{1}{n}-\frac{1}{N}\right)S_y^2 + \frac{(g-1)}{n}\left(\frac{N_2}{N}\right)S_{2y}^2 = V_0. \]
On using (12.1.9) we have
which is the required optimum value of sample size n. Now differentiating (12.1.7)
with respect to g and equating to zero we have
\[ \frac{\partial L}{\partial g} = -\frac{nN_2C_2}{Ng^2} + \lambda\frac{N_2}{nN}S_{2y}^2 = 0, \tag{12.1.12} \]
which implies
\[ g^2 = \frac{n^2C_2}{\lambda S_{2y}^2}. \tag{12.1.13} \]
Using (12.1.8) in (12.1.13) we have
\[ g^2 = \frac{C_2\left[S_y^2 - \frac{N_2}{N}S_{2y}^2\right]}{S_{2y}^2\left(C_0 + \frac{N_1}{N}C_1\right)}, \]
which on taking the positive square root gives (12.1.5).
Example 12.1.1. Consider a city of 1000 persons. We wish to estimate the
average income of the persons living in the city. We selected an SRSWOR sample
of 100 persons and, at the beginning of a particular month, mailed a questionnaire
to each of them regarding their annual income. Out of the 100 questionnaires mailed
we received a reply from 70 people. The average annual income from the 70 responses
was $35,000. At the end of the month we selected 10 people out of the 30 who did not
respond to the mail survey and contacted them through personal interviews
to collect information regarding their income. The average income obtained
through the personal interview survey was $38,000.
( a ) If the questionnaire had been mailed to all the 1000 persons in the city,
estimate the number of persons who would be expected to respond to it and
the number who would not.
( b ) Apply the Hansen and Hurwitz technique to estimate the average income of the
persons living in the city.
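The arithmetic behind parts ( a ) and ( b ) can be sketched as follows. This is a minimal illustration, assuming the usual Hansen and Hurwitz form in which the respondent mean and the interviewed sub-sample mean are weighted by the sample counts $n_1$ and $n_2$; the function and variable names are hypothetical and are not taken from the text.

```python
def hansen_hurwitz_mean(n1, ybar1, n2, ybar_h2):
    """Weighted mean (n1*ybar1 + n2*ybar_h2) / n with n = n1 + n2.

    n1, ybar1   : number of mail respondents and their mean
    n2, ybar_h2 : number of mail non-respondents and the mean of the
                  sub-sample of them that was interviewed
    """
    n = n1 + n2
    return (n1 * ybar1 + n2 * ybar_h2) / n


# Figures quoted in Example 12.1.1
N, n = 1000, 100          # population size and SRSWOR sample size
n1, n2 = 70, 30           # mail respondents and non-respondents in the sample

# ( a ) scale the sample response counts up to the population
expected_respondents = N * n1 / n
expected_nonrespondents = N * n2 / n

# ( b ) Hansen and Hurwitz estimate of the average income
ybar_hh = hansen_hurwitz_mean(n1, 35000.0, n2, 38000.0)

print(expected_respondents, expected_nonrespondents, ybar_hh)
```

Under these assumptions the estimate sits between the two observed means and closer to the respondent mean, because 70% of the sample replied by mail.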
Following the Politz and Simmons (1950) model, the interviewer makes only one call
during a specific time (such as the morning) on six weekdays. The time of the call is
considered random within interviewing hours. If the respondent is at home, the
desired information is collected and he is asked how many times in the preceding
five days he was at home at the time of the visit. The information so obtained is
used to estimate the probability of the respondent's availability. Assume n
households have been selected by SRSWR and $\psi_i$ denotes the probability that the $i$th
household is available at the time of the first ring. Then we have the following
theorem:
Assuming $\hat{\psi}_i = (d + 1)/D$, where d denotes the number of times the respondent was
at home during the last D days (or hours, or months etc.), d = 0, 1, 2, ..., D - 1, show
that the bias in the estimator $\bar{y}_{sp}$ is given by
Proof. Let $\psi_{id}$, $i = 1, 2, \ldots, n$ and $d = 0, 1, 2, \ldots, D-1$, denote the probability that the $i$th
respondent will be at home $d$ times out of the $D-1$ attempts or calls fixed by the
investigator for collecting information. Assuming $\psi_i$ remains the same for every
day, the probability that the $i$th person will be available on $d$ days out of $D-1$ calls
is given by the binomial distribution
$P(d_i = d) = \binom{D-1}{d}\psi_i^{d}(1-\psi_i)^{D-1-d}$. (12.2.4)
Under this distribution we have
$E\left(\frac{y_i}{\hat{\psi}_i}\right) = \sum_{d=0}^{D-1}\frac{y_i D}{d+1}P(d_i = d) = \sum_{d=0}^{D-1}\frac{y_i D}{d+1}\binom{D-1}{d}\psi_i^{d}(1-\psi_i)^{D-1-d} = \frac{y_i}{\psi_i}\sum_{d=0}^{D-1}\binom{D}{d+1}\psi_i^{d+1}(1-\psi_i)^{D-(1+d)}$.
Note that
$E\left(\frac{y_i}{\hat{\psi}_i}\right) = \frac{y_i}{\psi_i}\left[1-(1-\psi_i)^{D}\right]$.
Note that the probability of the $i$th person being selected and found at home is given
by $\psi_i/N$, thus we have
$E(\bar{y}_{sp}) = E_1E_2(\bar{y}_{sp}) = \frac{1}{n}\sum_{i=1}^{n}E_1E_2\left(\frac{y_i}{\hat{\psi}_i}\right) = \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{N}\frac{\psi_t}{N}\,\frac{Y_t}{\psi_t}\left[1-(1-\psi_t)^{D}\right] = \bar{Y} - \frac{1}{N}\sum_{i=1}^{N}Y_i(1-\psi_i)^{D}$.
Hence the bias in $\bar{y}_{sp}$ is given by
$B(\bar{y}_{sp}) = -\frac{1}{N}\sum_{i=1}^{N}Y_i(1-\psi_i)^{D}$.
D-I(
I 1l.J2 P(i = d )= y[ D-l(
[( J2Ii ] = d;O
E 1l. I .E:)2(D-IJV';d(I -V';'f- 1- d
'1'; '1'; d;O d + 1 d
-_ Y;2 D~I
L. -
1( D )2(D-IJ'1';d+1(I- V';)vr-i-«
--
d;OV'; d + 1 d
1
= - [-
D IN-'-'-
a .y;2 - {I
-IN Ji (1-(I-V';'f )~2] .
n N ;; 1 '1'; N ;; 1
$v(\bar{y}_{sp}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{\hat{\psi}_i} - \bar{y}_{sp}\right)^{2}$. (12.2.5)
The Politz and Simmons (1950) estimator adjusts for the non-response bias due to the
non-availability of the selected respondents at home during the period of the survey by
classifying the available respondents into six groups according to their availability
at home during the previous week (D = 6) and employing appropriate weighting
procedures. Thus this estimator is based on the premise that the selected
respondents available at home necessarily co-operate with the enumerator, which
may not be true. Sharma and Sil (1996) studied the non-response bias in the
Politz-Simmons estimator, taking into account possible non-co-operation from
selected respondents who, though available at home, may be otherwise busy.
No.  Species group             y_i     d_i  \hat{\psi}_i  y_i/\hat{\psi}_i  (y_i/\hat{\psi}_i - \bar{y}_{sp})^2
01   Sharks, other             2016    6    1.000000      2016.00           34479613.30
04   Eels                      152     2    0.428571      354.67            56750122.37
05   Herrings                  30027   6    1.000000      30027.00          490138226.70
23   Blue runner               2319    6    1.000000      2319.00           31013030.07
32   Yellowtail snapper        1334    6    1.000000      1334.00           42954055.79
33   Snappers, others          492     6    1.000000      492.00            54699845.28
46   Spot                      11567   3    0.571429      20242.25          152629114.60
47   King fish                 4333    5    0.857143      5055.17           8024572.89
54   Tautog                    3816    5    0.857143      4452.00           11805645.03
56   Wrasses, other            185     6    1.000000      185.00            59335197.99
57   Little tunney/Atl bonito  782     6    1.000000      782.00            50494303.34
58   Atlantic mackerel         4008    6    1.000000      4008.00           15053890.75
60   Spanish mackerel          2568    4    0.714286      3595.20           18427568.41
62   Summer flounder           16238   6    1.000000      16238.00          69723595.94
64   Southern flounder         1446    6    1.000000      1446.00           41498518.49
69   Other fish                14426   2    0.428571      33660.67          664233729.80
Given: D = 7.
$v(\bar{y}_{sp}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{\hat{\psi}_i} - \bar{y}_{sp}\right)^{2} = \frac{1801261031}{16 \times 15} = 7505254.295$.
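The computation in the table can be retraced with a short script. This is only a sketch: it assumes that the fourth column of the table is the reported count $d$, that $\hat{\psi}_i = (d+1)/D$ with D = 7, and that the point estimator is the simple mean of the weighted values $y_i/\hat{\psi}_i$; the function name is hypothetical.

```python
# Catch counts y_i and availability counts d_i, copied from the table above.
y = [2016, 152, 30027, 2319, 1334, 492, 11567, 4333,
     3816, 185, 782, 4008, 2568, 16238, 1446, 14426]
d = [6, 2, 6, 6, 6, 6, 3, 5, 5, 6, 6, 6, 4, 6, 6, 2]
D = 7


def politz_simmons(y, d, D):
    """Return (ybar_sp, v) using psi_hat_i = (d_i + 1) / D."""
    n = len(y)
    psi_hat = [(di + 1) / D for di in d]
    weighted = [yi / pi for yi, pi in zip(y, psi_hat)]   # y_i / psi_hat_i
    ybar_sp = sum(weighted) / n
    # variance estimator of the form (12.2.5)
    v = sum((w - ybar_sp) ** 2 for w in weighted) / (n * (n - 1))
    return ybar_sp, v


ybar_sp, v_sp = politz_simmons(y, d, D)
print(round(ybar_sp, 3), round(v_sp, 3))
```

Under these assumptions the variance estimate should agree with the figure 7505254.295 quoted above up to rounding in the tabulated columns.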
A (1- a)100% confidence interval for the average number of fish caught during
1995 by marine recreational fishermen in the United States is given by
Proof. Let E3 be the expected value for a given sample of n units from which
(n - r) units (on which response has been received) can be treated as an SRSWOR
sample, E 2 be the expected value over all such samples for a given r, and E 1 be
the expectation over all possible values of r .
Then we have
where
, n y-
for YHT = L.a: .
i=I Jri
Proof. Defining the variances $V_1$, $V_2$ and $V_3$ in the same manner as the expectations
$E_1$, $E_2$, and $E_3$, we have
$V(\hat{Y}_{HTR}) = E_1E_2V_3(\hat{Y}_{HTR}) + E_1V_2E_3(\hat{Y}_{HTR}) + V_1E_2E_3(\hat{Y}_{HTR})$
-E N y;2
1 "
Li
1
..----
N Jr.. y.y .]( r) + Vy,(, )
"L.. -IJI
-- J -- HT
[ i=l Jri n -1 i;t j=1 JriJrj n- r
_
-
N V
"Ii2 1 ,N, 1Jr"Y
J I'Y J'] E
L . . - - - - L.. - - -
[ i=1 Jri
I --
(r) + V (, ) Y,liT
n - 1i;tj=1 JriJrj n- r
which proves the theorem .
E[v(YHTR)]
= EIE2E3[V(YHTR)]
-EE E II [n~rll-(n-r)ll"; i+ I n~r y {n(n-r-I)ll"ij-(n -rXn-I)lr;ll"j}y;Yj ]
- 1 2 3 (n-rY ;=1 ll"f ; (n -r-I) ;=1 j;<;=1 ll"ijll";lrj
_ [I
- E1E2 -(- ) I
n-r ;=1
n n-(n-r)ll";
ll";
2
2
Y; + (
I n n {n(n-r-l)ll"ij-(n-rXII-I)ll";ll"j}Y;Yj]
II-r Xn -I );=Ij;<;=l
I I ll"ijll";ll"j
= E n-(n -r)ll";lf 2+ I I IHn-r-I)ll"ij-(II-rXn-I)ll";ll"j}lfYj ]
1[_I_I
(II - r);=1 ll"; (n - rXn -I) ;=lj#=1 ll";ll"j
Tracy and Osahan (1994c) studied the effect of random non-response on the usual
ratio estimator of the population mean in two situations: ( i ) non-response in the
study variable as well as the auxiliary variable and ( ii ) non-response in the study
variable only. Singh, Joarder, and Tracy (2000) suggested three regression type estimators,
which are further studied by Singh and Tracy (2001), in the presence of random
non-response in different situations under the assumption that the number of
sampling units on which information cannot be obtained owing to random
non-response follows some distribution.
Let $\Omega = \{v_1, v_2, \ldots, v_N\}$ denote the population of N units from which an SRSWOR
sample of size n is drawn. If r (r = 0, 1, ..., (n - 2)) denotes the number of sampling
units on which information could not be obtained owing to random non-response,
then the remaining (n - r) units in the sample can be treated as an SRSWOR sample
from $\Omega$. Note that we are considering regression type estimators; therefore we
assume that r does not exceed (n - 2). We assume that if p denotes the
probability of non-response among the (n - 2) possible values of responses, then r
has the discrete distribution
$P(r) = \frac{n-r}{nq+2p}\binom{n-2}{r}p^{r}q^{n-2-r}$, (12.4.1.1)
where q = 1 - p and r = 0, 1, 2, ..., (n - 2).
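A quick numerical check helps to see why this particular distribution is convenient. The sketch below, with hypothetical names and the illustrative choice n = 20, p = 0.35, evaluates the probabilities in (12.4.1.1), confirms that they sum to one, and confirms that the expected value of $1/(n-r)$ equals $1/(nq+2p)$, which is the factor appearing in the variance expressions of this section.

```python
from math import comb


def nonresponse_pmf(n, p):
    """Probabilities P(r), r = 0, ..., n-2, of the form (12.4.1.1)."""
    q = 1.0 - p
    norm = n * q + 2.0 * p
    return [(n - r) / norm * comb(n - 2, r) * p ** r * q ** (n - 2 - r)
            for r in range(n - 1)]          # range(n - 1) gives r = 0..n-2


n, p = 20, 0.35                              # illustrative values only
pmf = nonresponse_pmf(n, p)

total = sum(pmf)                             # should be 1
mean_inverse = sum(pr / (n - r) for r, pr in enumerate(pmf))   # E[1/(n-r)]
target = 1.0 / (n * (1.0 - p) + 2.0 * p)     # 1/(nq + 2p)

print(total, mean_inverse, target)
```

The factor $(n-r)$ in the numerator cancels the $1/(n-r)$ inside the expectation, which is why the expected values under this model come out in closed form.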
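Let us define
$\epsilon = \frac{\bar{y}_{n-r}}{\bar{Y}} - 1$, $\delta = \frac{\bar{x}_{n-r}}{\bar{X}} - 1$, and $\eta = \frac{\bar{x}}{\bar{X}} - 1$,
where $\bar{y}_{n-r} = (n-r)^{-1}\sum_{i=1}^{n-r}y_i$, $\bar{x}_{n-r} = (n-r)^{-1}\sum_{i=1}^{n-r}x_i$ and $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$ have their usual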
meanings (Tracy and Osahan, 1994c). The probability model defined at (12.4.1.1)
is free from actual data values, hence it can be considered as a model suitable for
MAR situation. Then under the probability model given by (12.4.1.1), we have the
following results:
$E(\epsilon\delta) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right]\rho_{xy}C_yC_x$, $E(\epsilon\eta) = \left(\frac{1}{n} - \frac{1}{N}\right)\rho_{xy}C_yC_x$, and $E(\delta\eta) = \left(\frac{1}{n} - \frac{1}{N}\right)C_x^{2}$,
where $C_y = S_y/\bar{Y}$, $C_x = S_x/\bar{X}$ and $\rho_{xy} = S_{xy}/(S_xS_y)$. It is interesting to note that
under the model (12.4.1.1) the above expected values are exact, which allows a
valid comparison with the estimators in the absence of non-response. The logic and
the practical importance of the distribution defined at (12.4.1.1) have been discussed
by Singh and Joarder (1998). In place of (12.4.1.1) we can also use another
distribution (e.g., truncated binomial), but such distributions add an extra
approximation in the expected values.
Strategy I. Consider the situation when random non-response exists on both the
study variable y and the auxiliary variable x and population mean X of the
auxiliary variable is known. Thus we consider a regression type estimator as
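$\text{Min}.V(\bar{y}_1) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right]S_y^{2}\left(1-\rho_{xy}^{2}\right)$ (12.4.2.2)
for the optimum value of $\alpha_1$ given by
$\alpha_1 = S_{xy}/S_x^{2}$. (12.4.2.3)
If p = 0 then the variance in (12.4.2.2) reduces to the variance of the usual linear
regression estimator. In this situation, we can estimate $\alpha_1$ by $\hat{\alpha}_1 = s^{*}_{xy}/s^{*2}_{x}$, where
$s^{*}_{xy}$ and $s^{*2}_{x}$ are the sample covariance and variance computed from the $(n-r)$
responding units.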
If p = 0 the variance in (12.4.2.6) reduces to the variance of the usual linear
regression estimator. Here $\alpha_2$ can be estimated by $\hat{\alpha}_2 = s^{*}_{xy}/s_x^{2}$, where
$s_x^{2} = (n-1)^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^{2}$, which leads to the following theorems:
given by
for the optimum value of $\alpha_3 = S_{xy}/S_x^{2}$. Again $\alpha_3$ can be estimated by $\hat{\alpha}_2$. Thus we
have the following theorems:
Solution. We are given N = 50 and X = 878.16 . From the above table, we have
n = 20 , x = 942.8615 and s; = 1307911.82.
From the responding states we have
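Thus we have
$\bar{x}_{n-r} = 1068.292$, $\bar{y}_{n-r} = 580.177$, $s_x^{*2} = 1685838.731$, $s_y^{*2} = 295169.6416$,
$s^{*}_{xy} = 595984.887$ and $r^{*}_{xy} = \frac{595984.887}{\sqrt{1685838.731 \times 295169.6416}} = 0.84487$.
Estimator 1. We have
$\bar{y}_{1r} = \bar{y}_{n-r} + \left(s^{*}_{xy}/s_x^{*2}\right)\left(\bar{X} - \bar{x}_{n-r}\right) = 580.177 + \frac{595984.887}{1685838.731}(878.16 - 1068.292) = 512.96$
and
$\hat{V}(\bar{y}_{1r}) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right]s_y^{*2}\left(1 - r_{xy}^{*2}\right) = \left[\frac{1}{20 \times 0.65 + 2 \times 0.35} - \frac{1}{50}\right] \times 295169.6416 \times \left(1 - 0.84487^{2}\right) = 4476.61$.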
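The figures for Estimator 1 can be re-derived from the summary statistics quoted in the solution. The following is a sketch under the assumption that p = 0.35 and q = 0.65, as the numerical substitution above indicates; the variable names are hypothetical.

```python
# Summary statistics quoted in the solution
N, n = 50, 20
p, q = 0.35, 0.65
X_bar = 878.16                          # known population mean of x
x_bar_resp, y_bar_resp = 1068.292, 580.177
sx2_resp = 1685838.731                  # s_x^{*2}, responding units
sy2_resp = 295169.6416                  # s_y^{*2}, responding units
sxy_resp = 595984.887                   # s_xy^{*}, responding units

# Sample correlation among the responding units
r_xy = sxy_resp / (sx2_resp * sy2_resp) ** 0.5

# Regression type estimator 1 and its estimated variance
y1 = y_bar_resp + (sxy_resp / sx2_resp) * (X_bar - x_bar_resp)
v1 = (1.0 / (n * q + 2.0 * p) - 1.0 / N) * sy2_resp * (1.0 - r_xy ** 2)

print(round(r_xy, 5), round(y1, 2), round(v1, 2))
```

Small differences from the printed variance are expected, because the text substitutes the correlation rounded to five decimals.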
A (1- a)100% confidence interval for the true population mean Y is given by
Using Table 2 from the Appendix the 95% confidence interval for the average real
estate farm loans is given by
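Estimator 2. We have
$\bar{y}_{2r} = \bar{y}_{n-r} + \left(s^{*}_{xy}/s_x^{2}\right)\left(\bar{X} - \bar{x}\right) = 580.177 + \frac{595984.887}{1307911.82}(878.16 - 942.861) = 550.694$
and
$\hat{V}(\bar{y}_{2r}) = \left[\frac{1}{20 \times 0.65 + 2 \times 0.35} - \frac{1}{20}\right] \times 295169.6416 + \left(\frac{1}{20} - \frac{1}{50}\right) \times 295169.6416 \times \left(1 - 0.84487^{2}\right)$
Estimator 3. We have
$\bar{y}_{3r} = \bar{y}_{n-r} + \left(s^{*}_{xy}/s_x^{2}\right)\left(\bar{x}_{n-r} - \bar{x}\right) = 580.177 + \frac{595984.887}{1307911.82}(1068.292 - 942.8615) = 637.33$
and
$\hat{V}(\bar{y}_{3r}) = \left[\frac{1}{nq+2p} - \frac{1}{n}\right]s_y^{*2}\left(1 - r_{xy}^{*2}\right) + \left(\frac{1}{n} - \frac{1}{N}\right)s_y^{*2}$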
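Singh and Joarder (1998) consider the problem of estimation of the finite population
variance in the presence of non-response in survey sampling.
Let us define
$E(\epsilon) = E(\delta) = E(\eta) = 0$,
$E(\epsilon^{2}) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right](\lambda_{40} - 1)$, $E(\delta^{2}) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right](\lambda_{04} - 1)$,
$E(\eta^{2}) = \left[\frac{1}{n} - \frac{1}{N}\right](\lambda_{04} - 1)$, and
$E(\epsilon\delta) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right](\lambda_{22} - 1)$.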
Strategy I. Consider the situation when random non-response exists on both the
study variable $y$ and the auxiliary variable $x$, and the population variance $S_x^{2}$ of the
auxiliary character is known. An estimator of the finite population variance is
$v = s_y^{*2}\left(\frac{S_x^{2}}{s_x^{*2}}\right)$. (12.4.3.1)
$B(v) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right]S_y^{2}(\lambda_{04} - \lambda_{22})$. (12.4.3.2)
Proof. The estimator $v$ in terms of $\epsilon$ and $\delta$ can be written as
$v = S_y^{2}\left(1 + \epsilon - \delta + \delta^{2} - \epsilon\delta + \ldots\right)$. (12.4.3.3)
Taking expected value on both sides of (12.4.3.3) and using the results on the
expectations from previous section, we get (12.4.3.2). Hence the theorem.
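$MSE(v) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right]S_y^{4}(\lambda_{40} + \lambda_{04} - 2\lambda_{22})$. (12.4.3.4)
Proof. It is easy to check that
$MSE(v) = \left[\frac{1}{nq+2p} - \frac{1}{N}\right]S_y^{4}(\lambda_{40} + \lambda_{04} - 2\lambda_{22})$.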
Strategy II. Consider the situation when information on variable $y$ could not be
obtained for r units while information on variable $x$ is available and the population
variance $S_x^{2}$ of the auxiliary variable is known; then we have the estimator
$v_1 = s_y^{*2}\left(\frac{S_x^{2}}{s_x^{2}}\right)$. (12.4.3.6)
Thus we have the following theorems, whose proofs are obvious.
(12.4.3.7)
MSE(VI) =MSE{sJ)+[(nq+2p
1 )-~]s;
N (12.4.3.8)
where MSE(sJ) denotes the MSE of ratio type estimator of variance proposed by
Isaki (1983) as discussed in Chapter 3.
M§E(VI)=(~_~)S;4(~0+Ao4-2i;z)+[(
n N
.1 . )
~+~
1] *4
-s
n y (12.4.3.9)
If information on x is available for all the n units then we can obtain both s; and
s;2. Using this information consider another estimator as
Theorem 12.4.3.7. The bias in the estimator v2 is the same as in the estimator VI '
Theorem 12.4.3.8. The minimum mean square error of the estimator v2 is given by
S4{(_ 1 -~J(~2-1)-(~-~)(Ao4
y nq+2p N n N
_1)}2
Min.MSE(V2) ==MSE(VI) (12.4.3 .11)
Proof. We have
f
MSE(V2) == £(V2 - S; == £[S;(I+e -1] +1]2- &1])+ ao- S; f
== MSE(v\)+ a 2 { ( nq+2p
1 ) ~}(Ao4
N
-I)
+2aSy2{(_1 -~J(~2-1)-(~-~)(Ao4
nq+2p N n N
-I)} (12.4.3.12)
{ ( _nq+2p
I -~J(~2
N
-1)-(~-~)(A04
n N
-1)}S2
Y
a == (12.4.3.13)
{(nq+2p) -~}(Ao4
1
N
- I)
and then putting the optimum value of a III (12.4.3.12) we have (12.4.3 .11).
Hence the theorem.
(12.4.3.14)
a=s*2{(_.1
y
. -~)(~2-1)-(~-~)(io4-I)l/{
nq+2p N n N
.1 .
J (nq+2p) ~}(iorl)
N
denotes a consistent estimator of a. To find the mean square error of the estimator
v3 let us define K= (a/ a)-1, where E(K) = O(n -11 then the MSE of v3 is given by
MSE(V3)=E(V3- S;) =E[S;(I+8-1J+1J2- 81J)+a(I+K)o-s;f. (12.4 .3.16)
This is approximately the same as MSE(V2) . It may be noted here that estimators
V2 and v3 may take inadmissible value, i.e., a negative value. Thus an equally
efficient alternative estimator is given below
• _
V4- S y
2 [l.
'2~ '2)a (12.4.3.17)
Sx
2
s;2
for a 7' I. If a = 1 then it leads to the following strategy.
Strategy III. Consider the situation when information on variable y could not be
obtained for r units while information on the variable x is obtained for all the
sample units, but the difference is that the population variance S;
of the auxiliary
variable is not known. In this case consider another ratio estimator
(12.4.3.18)
Theorem 12.4.3.11. The asymptotic mean square error of the estimator vs, up to
terms of order n- 1 , is
MSE(vs)= [{nq+2p
1 ~}(A40
N
_I)+{_I --~}(-104 +1-2Az2)]S~.
nq+2p n
(124320)
. . .
05 CA 3928.732 34 1241.369
07 CT 4.373 36 1716.087 612.108
09 FL 464.516 40 80.750 87 .951
13 IL 2610.572 42 388.869 553 .266
19 ME 51.539 43 3520.361
24 MS 549.551 627.013 44 197 .244 56 .908
25 MO 1519.994 1579.686 46 188.477
27 NE 3585.406 1337.852 47 1228.607 1100.745
30 NJ 27.508 39.860 48 29 .291 99.277
31 NM 274.035 140.582 49 WI 1372.439 1229.752
x -- Nonreal estate farm loans, y -- Real estate farm loans .
Apply the ratio type estimator "5 = s;2(s; /s;2) for estimating the finite population
variance of the real estate farm loans in the United States. Construct 75%
confidence interval.
Solution. From the Table 12.4.2.1 for n = 20 and r = 4, we have p = 0.23355 and
q = 1- P= 1- 0.23355 = 0.76645. From the responding units in the sample we have
n-r nr-r
L Xi L Yi
xn- r = i=1 = 17977.97 = 1123.62 Yn-r = i=1 = 11772.39 = 735.77
n -r r 20-4 n- r 20 -4
13
1.2406 x 10 = 8.270667 x io"
20-4-1
and
, +
nf (Yi - Yn-r t
-"i=::-'.I _ 5.83837 x 10
12
= 3.8922 x 1011
J.l40 = n - r- l 20 -4-1
n
LXi I(Xi-xf
X= i=l = 22979.72 = 1148.986 iL02 = s; = i=1
n -1
= 32113454.88 = 1690181.84
20-1
n 20
14 " II
1.41818x10 =7.46411x10 12 l' = J.l40 = 3.8922 xl0 = 2.0781
20-1 40 iL;5 432778.2 2
, Oil " 8 6 11
..io =J.l04=7.46411x l =2.6128,&i;2= :22. = .2406 7 x l 0 =1.1775.
4 iLJ2 1690181.842 iL20iLo2 432778.2 xI622949.4
Thus an estimate of the finite population variance of the real estate farm loans is
given by
_[{ 1 ...!.-}(20781-1)
- (20xO.76645+2xO.23355) 50 .
The (1 - a)1 00% confidence interval for the finite population variance is given by
Vs ± la/2(df = n - r- 2)JM?m(vs) .
Using Table 2 from the Appendix the 75% confidence interval of the finite
population variance of the real estate farm loans is given by
The next section briefly discusses a few imputation techniques
for handling non-response in survey sampling.
Important note: Please keep in mind that in the following sections the notation r
and (n - r) have different meanings than in the preceding sections.
In almost all large scale surveys non-response is unavoidable. Several methods
are available for handling non-response problems in survey sampling; details are
given by Rubin (1987). We consider here the problem of variance estimation in the
presence of non-response in a unified setup. Several researchers have suggested
methods of estimating the variance of the estimators of total or mean in the
presence of non-response in survey sampling. In some cases the probability of
response of the $i$th individual may be known or may be estimated through a logistic
regression model. Consider a finite population $\Omega = \{1, 2, \ldots, i, \ldots, N\}$ from which a
sample $s$ of size $n$ is selected with probability $p(s)$ according to some sampling
design $p$. Let us denote the inclusion probabilities for the $i$th, and the $i$th and $j$th $(i \neq j)$
units by $\pi_i$ and $\pi_{ij}$, respectively. Let $s_r$ be the set of respondent units in $s$. Here we
assume that the response sample $s_r (\subset s)$ is selected from $s$ by nature with
probability $q(s_r)$. Let $S(r)$ denote the collection of all possible samples of size
$r$ that can be selected from $s$, that is, $S(r)$ consists of $\binom{n}{r}$ different samples.
Let $\tilde{q} = q(s_r)\left\{\sum_{s_r \in S(r)} q(s_r)\right\}^{-1}$; then $\tilde{q}$ forms a sampling design. Let $\psi_i(s,r)$ and
$\psi_{ij}(s,r)$ be the inclusion probabilities for the $i$th, and the $i$th and $j$th $(i \neq j)$ units. Obviously
$\psi_i(s,r)$ and $\psi_{ij}(s,r)$ will depend upon $s$ and $r$, the number of respondents in the
sample, for the sampling design $\tilde{q}$ defined on $S(r)$. Arnab and Singh (2001) consider
the problem of estimation of the population total $Y$ and its variance under two different
situations, viz., ( i ) absence of auxiliary information, and ( ii ) presence of auxiliary
information.
Let $y_i$ be the value of the variable of interest, $y$, for the $i$th population unit. The
well known Horvitz and Thompson (1952) estimator of the population total, $Y$, in
the presence of non-response is given by
$\hat{Y} = \sum_{i \in s_r}\frac{y_i}{\pi_i\psi_{irs}} = \sum_{i \in s_r}\frac{z_i}{\psi_{irs}}$, (12.5.1.1)
where $\psi_{irs} = \psi_i(r,s)$ and $z_i = y_i/\pi_i$. Let $E_s (V_s)$, $E_{r|s} (V_{r|s})$, and $E_{s_r|r,s} (V_{s_r|r,s})$
denote, respectively, the unconditional expectation (variance) over the initial sample
$s$, the conditional expectation (variance) over $r$ given $s$, and the conditional expectation
(variance) over $s_r$ given $r$ and $s$. Similarly $E_r (V_r)$ denotes the unconditional
expectation (variance) of $r$, the number of respondent units. Then we have the
following theorems.
( , ) =Er [y2
V}( 1
I~air+-I IOijr(Y ' _Yj
-1.. _ )2 +-I
1 I 0ij (y.
-1.. _Yj
_ )2] ,
ieD J!i 2 i* jeD . J!i J!j 2 i* jeD J!i J!j
Proof. Writing Er,srls (Vr,srls ) for the overall expectation (variance) for variation
over rand s, when s is fixed we have
and
[ 2 [J2]
-.s, v:
L~ir +-L L
I
0ijr
v. Yj
-1- _ _ (12.5.1.5)
iEn " i Z i* j En " i "j
since
L L ZiZjOijr=-~LLtZi-Z}-(zl+ZJ)~ijr=-~LL(Zi-Zj~ +LZl L 0ijr'
i* jEn i* j i* j i J(*i)
From (12.5.1.3) and (12.5.1.5), we have the theorem.
(12.5.1.6)
Note that here d, is slightly different than that defined in Chapter 5. Let us assume
that an auxiliary variable Xi is available and it is positive for every i, then we can
estimate IIfi through Xi ' So, in the presence of auxiliary information, we can choose
weights in different stages as follows:
When the auxiliary variable is available we can propose a calibrated estimator for
(12.5.2.1) as
$\hat{Y}_g = \sum_{i \in s_r}w_iy_i$. (12.5.2.2)
The calibrated weights $w_i$ are such that the chi-square distance function due to
Deville and Särndal (1992), defined as
$\sum_{i \in s_r}\frac{(w_i - d_i)^{2}}{d_iq_i}$, (12.5.2.3)
is minimum subject to the constraint
$\sum_{i \in s_r}w_ix_i = \sum_{i \in s}\frac{x_i}{\pi_i} = \hat{x}$ (say), (12.5.2.4)
where the $q_i$ are suitably chosen weights. The calibration equation (12.5.2.4) is
similar to that used by Dupont (1995) and Hidiroglou and Särndal (1995, 1998) for
two-phase sampling. The choice of $q_i$ leads to different forms of estimators of the
population total. Minimization of (12.5.2.3) subject to (12.5.2.4) leads to the
calibrated weights
$w_i = d_i + \frac{d_iq_ix_i}{\sum_{i \in s_r}d_iq_ix_i^{2}}\left[\hat{x} - \sum_{i \in s_r}d_ix_i\right]$. (12.5.2.5)
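The calibrated weights in (12.5.2.5) can be illustrated numerically. The sketch below uses made-up starting weights $d_i$, tuning weights $q_i$, auxiliary values $x_i$ and a made-up benchmark $\hat{x}$; none of these numbers come from the text, and the function name is hypothetical. It only checks that the resulting weights satisfy the calibration constraint (12.5.2.4).

```python
def calibrated_weights(d, q, x, x_hat):
    """Weights of the form w_i = d_i + d_i q_i x_i (x_hat - sum d x) / sum d q x^2."""
    denom = sum(di * qi * xi * xi for di, qi, xi in zip(d, q, x))
    gap = x_hat - sum(di * xi for di, xi in zip(d, x))
    return [di + di * qi * xi * gap / denom for di, qi, xi in zip(d, q, x)]


# Hypothetical respondent data (illustration only)
d = [2.0, 2.5, 4.0, 5.0]        # starting weights d_i
q = [1.0, 1.0, 1.0, 1.0]        # tuning weights q_i
x = [10.0, 12.0, 7.0, 9.0]      # auxiliary values x_i for the respondents
x_hat = 140.0                   # benchmark total from the full sample

w = calibrated_weights(d, q, x, x_hat)
calibrated_x = sum(wi * xi for wi, xi in zip(w, x))

print(w, calibrated_x)
```

Choosing $q_i = 1/x_i$ instead of $q_i = 1$ makes every weight a common multiple of $d_i$, which is the ratio-type special case.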
If 1/Jri = Wilfli then we consider first stage calibrated estimator of variance in the
presence of non-response as
The second stage weights wijs (r) are obtained such that the distance function
is minimum subject to
v2
, ( x ) -_ '"
a
i r xi2
£"'-3
1 '" '" 8··
+- £.., £..,
IJr
+ E> IJ..[ xi
- --
xJ. J2 (12.5.3.5)
iES J(i 2 i¢ j es J(ij J(i J(j
and
2
_ V2(X)-Vl(X) Xi Xj (12.5.3.7)
Wijs(r)- bij,(r)+ () bij,(r)Qijsr [- - -
R X J(i J(j J
with
, ~
21
v[(X) = .L bus(r)-T+- L .L bijs(r -1... _ _
IES r J(i 2 JESr J(i J(j
/¢
{
x· J2 ~
and
- 2A[ L wu,(r)
ies ;
X~ +.!-2 Li¢ L Wij, (J!l...-
J(i j es;
Xj J2 - V2(X )],
J(i J(j \
(12.5.3.8)
2 42
.I wiiAr) Xi2 = A.IbiiAr
IESr trj IESr
f i4 Qiisr + . I biiAr ) Hi\ .
1r; IE Sr
( 12.5.3.9)
·
Agam . l'res that
0 <1> = 0 Imp
--(-)
Owijs r
1 .I
-~ Wijs (r {Xi
- - Xj-J2 A .I
=-~ bijS()Q
r ijs({ -J4+ -1 ~. I bijS(r {Xi
r -xi -Xj -J2 .
- - Xj
2 ,* JESr Jr i Jr j 2 ,*JES r Jri Jr j 2 ,*JESr Jri Jr j
Adding (12.5.3.9) and (12.5.3.10) we have A = V2(xl(x~1 (x), which proves the
theorem.
If the sub-sample Sr is selected by SRSWOR sampling, that is, . Iffirs = r/n and
Iffijrs = {r(r -l)}/{n(n -I)}, then oijr = {(n - r)jr(n -l)}Jrij ' Ojr = {(n - r)/r}Jrj , air = 0 ,
_
b.. (r)-O, bijs()
r -_n(n-l){n-r 0 ij }_
- (- ) -(-) +- -t'1.ij (say) , , (x )_
VI
l I t'1.ij [-
--I Xi - Xj
-J2 ,
us r r-l r n-l Jrij 2 i*jEsr Jr j Jri
,()
V2 x = r(r -l ) I I t'1.ij[Xi
- - -Xi J2 , and R ()
x = -1 I I t'1.ijQijS(r { -Xi -XiJ4
-
2n(n - 1) i* JES Jri Jr i 2 i* jes; Jri Jr i
v, (,Y
HT Jstg=2 = - L: L: wijs r - - -
\ 1 ({ Yi Yi J2
2 i*i Esr Jri Jri
Wi=~{L~}-I{LXi}
Tri ies; Tri iesTri
, «<v»: L Yi{L~}-IXi'
ies; Tri iesTri
and
(
YHTR = !!.... L Yi
r ies; Tri J[ !!.... L ~
r ies; Tri J
L~.
(ie sTri J
The three estimators of variance of the ratio estimator YHTR are given by
vA(AYHTR.!stg=O
\
= -1 L L !',.ij (ei
---
ej J2,
2#jesr Tri Trj
b = L XiYi {L x~ }-I ,and ei «v,- bXi. In this case Yg becomes the regression
tes; Tri tes; Tri
estimator of the following form
YHT(lr)=!!.... L Yi
r ies; Tri
+b[L~-!!.... L ~J.
iesTri r ies; Tri
The three estimators of variance of the regression estimator are given by
2
AA 1 e. ej
V(YHT(lr)t =0 = - L L !',.ij -1.... - - ,
g 2 i* je s; ( Tri Trj J
Case I. If Wi = d i then
A N A
Yg = - LYi =Yu
r ies;
and
A( A\ 1 1 J 2 A( A\
2(-;-
v Yu.!stg;O = N N Syr = v Yu.!stg;l·
Now
W .·
ljS
n
(r)=--
2
---
r(r-l) r N
(1 1Js;r s2
--E!...
and hence
vA(A\
Yu.!stg;2 = N 2(1- - -IJ2S;n
Syr-2-'
r N Sxr
which is a similar ratio type of estimator studied by Isaki (1983), Garcia and
Cebrian (1996), and Singh and Joarder (1998).
type estimator studied by Isaki ( 1983), Garcia and Cebrian (1996), and Singh and
Joarder ( 1998).
ei = Yi - bxi and
Yg = ~r = N[yr +b(xn - xr )]
is a regression type estimator for the finite population total studied by Singh,
Joarder, and Tracy (2000). In this situation a few estimators of variance are
v(~rLg=o = N2(~_~) L(ei -ef ,
r N tes;
- 1
h
were e = - I ei .
r ies;
where
l)2 (r-l) I. xl
( I.I.
ES
x ies ;
-2 2
X· Xj n 2
If QijS(r) =
(-1.. _ _
Jr; Jrj
J
=-2 (x;-Xj)
N
,then
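$y_{\cdot i} = \begin{cases} y_i & \text{if } i \in A, \\ \tilde{y}_i & \text{if } i \in \bar{A}. \end{cases}$ (12.6.1)
$\bar{y}_s = \frac{1}{n}\left[\sum_{i=1}^{r}y_i + \sum_{i=1}^{n-r}\tilde{y}_i\right]$, (12.6.2)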
In the ratio method of imputation, we assume that imputation is carried out with the aid
of an auxiliary variable $x$, such that $x_i$, the value of $x$ for unit $i$, is known and
positive for every $i \in s$. In other words, the data $x_s = \{x_i : i \in s\}$ are known.
Following the notation of Lee, Rancourt, and Särndal (1994, 1995), in the case of
single value imputation, if the $i$th unit requires imputation, the value $\hat{b}x_i$ is imputed, so that
$y_{\cdot i} = \begin{cases} y_i & \text{if } i \in A, \\ \hat{b}x_i & \text{if } i \in \bar{A}. \end{cases}$ (12.6.3)
This method of imputation is called the ratio method of imputation. Thus we have
the following theorem:
Theorem 12.6.1. Under the ratio method of imputation the point estimator (12.6.2) of
the population mean becomes
$\bar{y}_{RAT} = \bar{y}_r\left(\frac{\bar{x}_n}{\bar{x}_r}\right)$, (12.6.4)
where $\bar{x}_n = n^{-1}\sum_{i=1}^{n}x_i$, $\bar{x}_r = r^{-1}\sum_{i=1}^{r}x_i$ and $\bar{y}_r = r^{-1}\sum_{i=1}^{r}y_i$.
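The identity in Theorem 12.6.1 can be checked on a toy data set. The sketch below assumes that the imputing slope is the ratio of the respondent totals, $\hat{b} = \sum_{i=1}^{r}y_i/\sum_{i=1}^{r}x_i$, which is the choice that reproduces (12.6.4); the data and the function name are hypothetical.

```python
def ratio_impute(x, y):
    """Fill missing y (None) with b_hat * x, b_hat = sum(y_resp) / sum(x_resp)."""
    resp = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    b_hat = sum(yi for _, yi in resp) / sum(xi for xi, _ in resp)
    return [yi if yi is not None else b_hat * xi for xi, yi in zip(x, y)]


# Hypothetical sample: x known for all units, y missing for two units
x = [4.0, 10.0, 6.0, 8.0, 12.0]
y = [3.0, None, 5.0, None, 9.0]

completed = ratio_impute(x, y)
mean_after_imputation = sum(completed) / len(completed)

# Closed form from Theorem 12.6.1: ybar_r * (xbar_n / xbar_r)
x_resp = [xi for xi, yi in zip(x, y) if yi is not None]
y_resp = [yi for yi in y if yi is not None]
ybar_r = sum(y_resp) / len(y_resp)
xbar_r = sum(x_resp) / len(x_resp)
xbar_n = sum(x) / len(x)
closed_form = ybar_r * xbar_n / xbar_r

print(mean_after_imputation, closed_form)
```

The two printed values coincide, so in this setting the mean of the completed data is exactly the ratio estimator built from the respondents.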
Under the HD method the data after imputation takes the form
$y_{\cdot i} = \begin{cases} y_i & \text{if } i \in A, \\ y_{g(i)} & \text{if } i \in \bar{A}, \end{cases}$ (12.6.7)
where $y_{g(i)}$ is the $y$ value given by the donor unit $g(i) \in R$, drawn at random (with
replacement) from the r responding units. Thus we have the following theorem:
Theorem 12.6.3. Under the HD method of imputation the point estimator (12.6.2)
of the population mean becomes
$\bar{y}_{HD} = \frac{1}{n}\left[\sum_{i=1}^{r}y_i + \sum_{i=1}^{n-r}y_{g(i)}\right]$. (12.6.8)
where $y_{g(i)}$ is the $y$ value given by the donor unit $g(i)$ such that $\min_{g \in R}|x_g - x_i|$
occurs for $g = g(i)$. If this results in more than one unit, a donor is randomly selected
from them. More details can be found in Chen and Shao (2001).
Thus we have the following theorem :
Theorem 12.6.4. Under the NN method of imputation the point estimator (12.6.2)
of the population mean becomes
There is 30% non-response about the real estate farm loans. Impute the missing
values with different methods of imputation .
8254 .538
;=1
thus the imputed values are given by
State   x_i         Imputed y_i
CT      4.373       2.95265
ME      51.539      34.79913
MO      1519.994    1026.30000
NV      16.710      11.28259
NJ      27.508      18.57340
WA      1228.607    829.55540

State   Imputed y_i
CT      398.11
ME      398.11
MO      398.11
NV      398.11
NJ      398.11
WA      398.11

State   Imputed y_i
CT      2.605
ME      282.565
MO      139.628
NV      40.775
NJ      1229.752
WA      323.028
State   x_i         Imputed y_i
CT
ME      51.539      139.628
MO      1519.994    1229.752
NV      16.710      39.860
NJ      27.508      99.277
WA      1228.607    1229.752

$\hat{y}_i = 76.89265 + 0.071471x_i$

State   x_i         Imputed y_i
CT      4.373       77.2051
ME      51.539      80.5761
MO      1519.994    185.5260
NV      16.710      78.0869
NJ      27.508      78.8586
WA      1228.607    164.7010
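The last set of imputed values can be reproduced directly from the fitted line quoted above. The sketch below simply evaluates that line at the six $x$ values with missing $y$; the dictionary and variable names are hypothetical, and the coefficients are used as printed, so agreement is only up to the rounding of the slope and intercept.

```python
# Fitted line quoted in the text: y_hat = 76.89265 + 0.071471 * x
intercept, slope = 76.89265, 0.071471

# x values (nonreal estate farm loans) of the six non-responding states
x_missing = {
    "CT": 4.373,
    "ME": 51.539,
    "MO": 1519.994,
    "NV": 16.710,
    "NJ": 27.508,
    "WA": 1228.607,
}

# Regression imputation: replace each missing y by its fitted value
imputed = {state: intercept + slope * x for state, x in x_missing.items()}

for state, value in imputed.items():
    print(state, round(value, 4))
```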
The imputation for missing data appeals to one or more model assumptions. The
imputed values are assumed, on the average, to be good substitutes for the missing
values. The difference between the true value, the unobserved value Yi' and its
imputed value Yi is assumed to be zero . These assumptions are useful to find the
estimator of the variance under different imputation mechanisms. Thus we have
three different model components here :
The variance of the estimator of population total based on the data obtained after
imputation consists of three components. An estimator of population total Y based
on the imputed data is given by
, N n N [ r n-r, ]
Yos = - L Yoi = - LYi + LYi (12.7.2)
n i= l n i=1 i=1
Now we have
A
Ye s - Y =Y, - Y + Yos -
A A '"
Y, . (12.7.4)
where E p and E q are the expectation operators with respect to the sampling design
p and the response mechanism q respectively . Thus we can say
, 2(
1- - Syoswlth Syos = n -l
where VORD =N - f) 2 . 2 ( )-1 ~n { va :»-I ~n Yoi }2
n 1=1 1=1
Note that an estimator of the sampling variance is also given by
VSAM=VORD+VDIF ' (12.7.8)
the correction term VDIF should be constructed such that
We shall now present the three components VDl F , V1MP and VM1X for each of the
four types of imputat ion methods . Let us define Sd = Ons, rd =0 n Rand
ld = On R C
• Then we have the following lemmas:
Lemma 12.7.1. For the ratio method of imputation, under model M the different
components of the variance estimators are given by
Lemma 12.7.2. For mean method of imputation, Xi =1 Vi, then under model M,
the different components of the variance estimators are given by
2
V,IM P = tNl d {ld
-;;; + 1 }'2(J" ,
and
2(1_
v;'MIX -- N f)l d {ld
--
I} , 2 (J",
n2 m
1 2 (m-l )-1,,(
witith , 2 =m-
(J" - - Syr an d Syr=
2 L.Yi - Y-)2
r which, in fact, is a special case of
m Rp
Lemma 12.7.3. Under the NN method of imputation and the model M we have
and
Lemma 12.7.4. Under Hot Deck method of imputation and the model M we have
VDIF = 0,
, N ld ,2
2 ( --;;;+2 ) a ,
V1MP =--;;'2ld
and
v.'MIX -_N2(I-f)l{md
2 d --
1}'2
a,
n m
. h a, 2 =--Syr
Wit
m -1 2 an d Syr
2
= (m-l )-1", ( - ) 2 which in fact is a special case of
L.. Y j -Yr
m R
NN method of imputation for Xj = 1.
where the superscript (j) denotes that the $j$th unit was deleted. This is performed for
all units $j \in s$. The modified Jackknife estimator of variance is
(12.8.2)
For data sets containing imputed values this estimator does not take the imputation
into account. Rao and Shao (1992) proposed a Jackknife variance estimator that
corrects the estimator by adjusting the imputed values when the deleted unit is in
the response set. For some imputation methods the adjusted values are the
re-imputed values based on the reduced response set after deletion of the $j$th unit. If the
$j$th unit deleted is a non-respondent the imputed values are unchanged. The data set
after adjusting the imputed values is
$y_{\cdot i}^{a(j)} = \begin{cases} y_i & \text{if } i \in R, \\ \tilde{y}_i + a_i^{(j)} & \text{if } i \in R^{c},\ j \in R, \\ \tilde{y}_i & \text{if } i \in R^{c},\ j \in R^{c}, \end{cases}$ (12.8.3)
where $y_{\cdot i}^{a(j)}$ is the adjusted imputed value and $a_i^{(j)}$ is called the adjustment. The
Jackknife variance estimator is then
$v_J = \frac{n-1}{n}\sum_{j \in s}\left\{\hat{Y}_{\cdot s}^{a(j)} - \hat{Y}_{\cdot s}^{(a)}\right\}^{2}$, (12.8.4)
where $\hat{Y}_{\cdot s}^{a(j)} = \frac{N}{n-1}\sum_{i \neq j \in s}y_{\cdot i}^{a(j)}$ and $\hat{Y}_{\cdot s}^{(a)} = \frac{N}{n}\sum_{i \in s}y_{\cdot i}^{(a)}$ are the estimators of the total
obtained after dropping the $j$th unit and from all the sample information,
respectively. In (12.8.3), the value of the adjustment factor $a_i^{(j)}$ changes from one
method of imputation to the other. For example, for the Ratio and NN methods of
imputation $a_i^{(j)} = \left[\frac{\bar{y}_r(j)}{\bar{x}_r(j)} - \frac{\bar{y}_r}{\bar{x}_r}\right]x_i$, and for the Mean and the HD methods of imputation
$a_i^{(j)} = \bar{y}_r(j) - \bar{y}_r$. Estimation of variance with the Jackknife method for data with imputed
values has also been discussed by Lee, Rancourt, and Särndal (1995a, 1995b).
Some advanced techniques to estimate the variance of an estimator of total in
multi-stage designs are also available from Rao and Shao (1999), Shao, Chen, and Chen
(1998), Rao (1996b), Chen, Rao, and Sitter (2000), and Lee and Kim (2002).
Rao and Shao (1992) considered the problem of missing data in a multi-stage
survey design. They consider the situation in which the first stage units or clusters
are selected with replacement, or so treated and in which independent sub-samples
are taken within those clusters selected more than once and followed Krewski and
Rao (1981) asymptotic set-up in studying the consistency of the adjusted Jackknife
variance estimator. Assume nh clusters are selected with probab ilities Phi and with
replacement independently from the hth stratum. Let
• 1 nh Y. (12.9.1)
Yh=-I-'
h
nh i=I P hi
Y= I WhikYh ik (12.9.2)
(hik )eSn
where Y hik and Whik (k = 1,2, .., nhi ; i=1 ,2, ..., nh ; h=1 , 2 , ... ,L) denote the value of
the variable under study and design weights, respectively. Let ~~ik} be the imputed
values for the non-respondents, Sm' using a hot deck single imputation class
mechanism. Then imputed estimator of population total Y is given by
where Sr denotes the sample of respondents. Under SRS sampling if we select the
imputed values Y ~ik as Ygjl Wgjl j Whik where (gjl)E s; denotes the selected donor,
then following Platek and Grey (1983) the estimator 1'1 becomes unbiased,
otherwise it remains biased. It is more appropriate for quantitative data, but may
not provide good estimates for qualitative data. Rao and Shao (1992) suggested a
simple method to select the donors (gjl) E Sr with replacement with probabil ities
Wgi// IWgi/ and use Y~ik =Ygjl ·
Sr
E.(~)= ~ {; (12.9.4)
T
where S= L WhikYh ik, i = L Whik and O= L Whik
(hik )esr (hik )esr (hik )esn
Proof. Taking expected values on both sides of
1'1= L WhikYh ik + L WhikY~ik
(hik )esr (hik )esm
we have
E. (~ ) = L WhikYhik + L WhikE'~~ik)
(hik )esr (hik )esm
L WhikYh ik
~ ~
(
(hik )esr
= +
J
L... WhikYh ik L... Whik
(hik )esr (hik )esm L Whik
(hik )esr
L WhikYhik
~ ~ ~
(
(hik)e sr
+
J
= L... WhikYhik L... Whik - L... Whik
(hik )esr (hik )esn (hik )esr L Whik
(hik )esr
I WhikYh ik
_ '" (hik )esr
- £..,Whik
( (hik )esn ) I Whik
(hik )esr
Hence the theorem.
= PE 1[ IWhikYhik] =pE1(f)=pY.
(hik )esn
= pEl ( IWhik) = pN .
(hik)esn
Now taking expected values on both sides of
U= I,wh ik
(hik )esn
we have
E(U) = E( I Whik) = N .
(hik)esn
Hence the theorem.
Defining
S
Eo=--I,
pY
such that
E(Ei) = 0 for i = 0,1,2.
Then we have the following theorem.
~ YE[(I+ EO + E2 - El +....)] = Y.
Hence the theorem.
(12.9.6)
"s
A
because E~~ik)= sif. Rao and Shao (1992) used synthetic imputation values to
construct an approximately unbiased estimator of population total Y as
(12.9.10)
Theorem 12.9.4. The estimator $\hat{Y}_I^{a}(-gj)$ is approximately unbiased for the population total $Y$.
Proof. Taking the expected value of $\hat{Y}_I^{a}(-gj)$ we have
Rao and Shao (1992) have also considered the situation of multiple imputation classes. They showed that $(\hat{Y}_I - Y)/\sqrt{v_J} \to N(0, 1)$. Yung and Rao (2000) have considered the post-stratified design for estimating the variance using the Jackknife estimator of variance.
Rubin (1978) introduced multiple imputation (MI) to account for the inflation in the variance owed to imputation. It requires the construction of $M\ (\ge 2)$ complete data sets by replacing each missing value by $M$ imputed values using the same imputation procedure. Let $\bar{y}_{I1}, \bar{y}_{I2}, \ldots, \bar{y}_{IM}$ denote the $M$ imputed estimators of the population mean $\bar{Y}$ under SRSWR sampling. The 'final' imputed estimator of the population mean $\bar{Y}$ is then given by
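$$\bar{y}_{I.} = \frac{1}{M} \sum_{J=1}^{M} \bar{y}_{IJ} \tag{12.10.1}$$
with variance estimator
$$\hat{v}(\bar{y}_{I.}) = \frac{1}{M}\Big(\frac{1}{n} - \frac{1}{N}\Big) \sum_{J=1}^{M} s_{IJ}^2 + \Big(\frac{M+1}{M}\Big)\Big\{\frac{1}{M-1} \sum_{J=1}^{M} \big(\bar{y}_{IJ} - \bar{y}_{I.}\big)^2\Big\}, \tag{12.10.2}$$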
where $s_{IJ}^2$ denotes the sample variance for the $J$th completed data set and $n$ is the sample size. Rubin and Schenker (1986) pointed out that the variance estimator (12.10.2) leads to valid inference at least when the number of imputations, $M$, is large and the imputations are 'proper' in the sense that the imputed values are drawn from the posterior distribution of the non-observed values given the respondent values. The traditional simple random hot deck imputation is not proper in this sense and may lead to an underestimate of the true variance of $\bar{y}_{I.}$; thus Rubin and Schenker (1986) suggested a new method of variance estimation called the Approximate Bayesian Bootstrap (ABB). Rao (1996b) considered the stratified random sampling design for estimating the variance with multiply imputed data. Rubin's method has been found to be applicable in situations in which the fraction of missing data is large and the user and imputer are the same individual who chooses multiple imputation because of its convenience; for example, see Little and Yao (1996), Paik (1997), Taylor, Muñoz, Bass, Sah, Chmiel, Kingsley et al. (1990), Tu, Meng, and Pagano (1993), and Clayton, Dunn, Pickles, and Spiegelhalter (1998). On the other hand, Fay (1992, 1994, 1996), Meng (1994) and Rubin (1996) have shown that the estimator of variance proposed by Rubin is biased upward and inconsistent in certain cases. Robins and Wang (2000) derived a general formula for the large sample bias in Rubin's estimator of variance, which not only confirms the findings of Fay (1992, 1994, 1996), Meng (1994) and Rubin (1996), but also indicates that there are other scenarios under which Rubin's estimator of variance is downwardly biased. Robins and Wang (2000) provided a formula which overcomes the deficiencies of Rubin's estimator of variance, and a variance estimator that, unlike Rubin's, remains consistent when the imputation and analysis models are mis-specified and incompatible with one another. Schafer and Schenker (2000) used imputed conditional means for drawing inference. Let us explain the concept of multiple imputation with the help of a simple example.
Example 12.10.1. Select an SRSWOR sample of twelve units from population 1 given in the Appendix. Record the values of the real estate farm loans for the states selected in the sample. Observe the random non-response in the selected sample. Impute the missing values three times with the help of the hot deck method of imputation. Apply the concept of multiple imputation for estimating the population mean and construct the 95% confidence interval.
Solution. We apply the remainder approach on the first two columns of the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix to select a sample of 12 states from population 1. The first 12 distinct random numbers between 1 and 50 were selected as: 49, 08, 10, 04, 42, 01, 19, 37, 12, 23, 38 and 44. The information so collected from the selected units in the sample is given below:

State no.   State   Real estate farm loans
01          AL       408.978
04          AR       907.700
08          DE        42.808
10          GA       939.460
12          ID       Missing
19          ME         8.849
23          MN      1354.768
37          OR       114.899
38          PA       756.169
42          TN       Missing
44          UT        56.908
49          WI       Missing
We observe that data are missing for the three states ID, TN and WI. In the following table we impute these missing values three times with the help of the hot deck method of imputation. On the first occasion we used the 3rd column of the PRN to select three random numbers between 1 and 9 to impute the missing values, as shown in the fourth column of Table 12.10.1. The first time, three distinct random numbers came in the sequence 2, 8 and 1.
We use the 7th column of the PRN to select three random numbers between 1 and 9 to impute the missing data a second time. We find the three distinct random numbers as: 6, 7 and 9. The corresponding imputed data are shown in the fifth column of Table 12.10.1. We use the 11th column of the PRN to select three random numbers between 1 and 9 to impute the missing data a third time. We find the three distinct random numbers as: 4, 8 and 1. The corresponding imputed data are shown in the sixth column of Table 12.10.1. Here $M = 3$, $n = 12$ and $N = 50$. An imputed estimate of the average real estate farm loans during 1997 in the United States is given by
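As an illustration only, the calculation for this example can be sketched in Python using (12.10.1) and (12.10.2). Two points are assumptions rather than facts from the text: the nine respondents are numbered 1 to 9 in the order in which they are listed in the sample above, and the 95% interval uses a normal quantile. The function name `combine_srs` is hypothetical.

```python
from statistics import NormalDist, mean, variance

# Respondent values from the sample; numbering them 1..9 in listed order
# is an assumption made for this sketch.
respondents = [408.978, 907.700, 42.808, 939.460, 8.849,
               1354.768, 114.899, 756.169, 56.908]
# Donor numbers quoted in the solution for the three imputations
donor_draws = [(2, 8, 1), (6, 7, 9), (4, 8, 1)]
n, N = 12, 50

def combine_srs(respondents, donor_draws, n, N):
    """Multiple-imputation mean (12.10.1) and variance estimate (12.10.2)."""
    M = len(donor_draws)
    means, s2 = [], []
    for draws in donor_draws:
        # Completed data set: observed values plus hot deck donor values
        completed = respondents + [respondents[d - 1] for d in draws]
        means.append(mean(completed))
        s2.append(variance(completed))  # sample variance, divisor n - 1
    y_bar = mean(means)
    within = (1 / n - 1 / N) * mean(s2)
    between = sum((m - y_bar) ** 2 for m in means) / (M - 1)
    return y_bar, within + (M + 1) / M * between, means

y_bar, v_hat, means = combine_srs(respondents, donor_draws, n, N)
z = NormalDist().inv_cdf(0.975)  # normal quantile: an assumption
ci = (y_bar - z * v_hat ** 0.5, y_bar + z * v_hat ** 0.5)
print(y_bar, v_hat, ci)
```

A Student's t quantile with the degrees of freedom discussed later in this section could replace the normal quantile; with only three imputations the choice matters.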
Multiple imputation is becoming very popular, and standard techniques for properly dealing with missing data are expected to become easily accessible on computers in the years to come. Let $Y$ be the complete data set. Let $Y_{obs}$ and $Y_{mis}$ denote the observed and missing components of the complete data, that is, $Y = (Y_{obs}, Y_{mis})$. Assume that with complete data, valid inference about a $k$ component quantity $\beta$, possibly a model parameter or a finite population characteristic, follows the standard large sample statement
$$(\hat{\beta} - \beta) \sim N(0, V) \tag{12.10.3}$$
where $\hat{\beta} = \hat{\beta}(Y)$ is an estimator of $\beta$ and $V = V(Y)$ is its associated variance. The basic idea of multiple imputation is to fill in the missing data multiple times with values drawn from some distribution that predicts the missing values given the observed data and other available information. Each draw of $Y_{mis}$ is an imputation. Thus in the case of multiple imputation we have $m$ imputations $Y_{mis}^{(1)}, Y_{mis}^{(2)}, \ldots, Y_{mis}^{(m)}$ as repeated independent draws of $Y_{mis}$ from a Bayesian prediction model. In general we do multiple imputation in three steps:
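Step I. Compute the complete data statistics $\hat{\beta}_{*l} = \hat{\beta}(Y^{(l)})$ and $V_{*l} = V(Y^{(l)})$ for each of the $m$ completed data sets $Y^{(l)} = (Y_{obs}, Y_{mis}^{(l)})$ for $l = 1, 2, \ldots, m$.
Step II. Compute an estimate of $\beta$ as $\bar{\beta}_m = \frac{1}{m} \sum_{l=1}^{m} \hat{\beta}_{*l}$ and an estimate of variance as
$$T_m = \bar{V}_m + \Big(1 + \frac{1}{m}\Big) B_m$$
where $\bar{V}_m = \frac{1}{m} \sum_{l=1}^{m} V_{*l}$ and $B_m = \frac{1}{m-1} \sum_{l=1}^{m} (\hat{\beta}_{*l} - \bar{\beta}_m)(\hat{\beta}_{*l} - \bar{\beta}_m)'$.
Step III. The $(1 - \alpha)100\%$ confidence intervals for $\beta$ are formed on the basis of the $k$ component Student's t distribution given by
$$(\beta - \bar{\beta}_m)/\sqrt{T_m} \sim t_{\nu_m} \tag{12.10.4}$$
where
$$\nu_m = (m - 1)\gamma_m^{-2} \quad \text{with} \quad \gamma_m = (1 + m^{-1})\,\mathrm{tr}(B_m T_m^{-1})/k. \tag{12.10.5}$$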
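The three steps above can be sketched for the scalar case $k = 1$, where the trace in (12.10.5) reduces to the ratio $B_m / T_m$. This is a minimal Python sketch under that assumption; the function name `rubin_combine` and the input numbers are hypothetical.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Combine m completed-data estimates of a scalar quantity (k = 1).

    Returns the pooled estimate, the total variance T_m and the degrees
    of freedom nu_m = (m - 1) / gamma_m**2.
    """
    m = len(estimates)
    beta_bar = mean(estimates)          # Step II: pooled point estimate
    v_bar = mean(variances)             # average within-imputation variance
    b = variance(estimates)             # between-imputation variance, divisor m - 1
    t = v_bar + (1 + 1 / m) * b         # total variance T_m
    gamma = (1 + 1 / m) * b / t         # scalar form of (1 + 1/m) tr(B T^-1) / k
    nu = (m - 1) / gamma ** 2 if gamma > 0 else float("inf")
    return beta_bar, t, nu

# Hypothetical completed-data estimates and their variances
beta_bar, t_m, nu_m = rubin_combine([10.0, 12.0, 11.0], [4.0, 5.0, 3.0])
print(beta_bar, t_m, nu_m)
```

When the completed-data estimates are identical the between-imputation variance is zero, and the sketch returns infinite degrees of freedom instead of dividing by zero.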
Barnard and Rubin (1999) pointed out that in small data sets, however, it can be unsatisfactory to set the degrees of freedom to infinity, especially when there is little missing information, because $\nu_m$ can then be many times the degrees of freedom available if there were no missing data. Thus Barnard and Rubin (1999) saw a need for a new expression for the multiple imputation degrees of freedom that does not rely on a large complete data sample. They provided a new expression as an adjusted degrees of freedom
where $\lambda(\nu) = (\nu + 1)/(\nu + 3)$ and $\nu_{obs} = \lambda(\nu_{com})\,\nu_{com}(1 - \gamma_m)$ denotes the observed degrees of freedom, with $\nu_{com}$ being the complete data degrees of freedom.
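A short Python sketch of the adjustment follows. The combining rule used here, $\tilde{\nu}_m = (1/\nu_m + 1/\nu_{obs})^{-1}$, is the form commonly attributed to Barnard and Rubin (1999); it is stated as an assumption because the displayed expression is not reproduced above. The function name and the input values are hypothetical.

```python
def barnard_rubin_df(nu_m, gamma_m, nu_com):
    """Adjusted degrees of freedom for small complete-data samples.

    Assumes the combining rule 1 / (1/nu_m + 1/nu_obs), with
    nu_obs = lambda(nu_com) * nu_com * (1 - gamma_m) and
    lambda(nu) = (nu + 1) / (nu + 3).
    """
    lam = (nu_com + 1) / (nu_com + 3)
    nu_obs = lam * nu_com * (1 - gamma_m)
    return 1 / (1 / nu_m + 1 / nu_obs)

# Hypothetical inputs: nu_m = 32 and gamma_m = 0.25 from three imputations,
# and nu_com = 11 as for a mean estimated from a sample of 12 units.
nu_adj = barnard_rubin_df(nu_m=32.0, gamma_m=0.25, nu_com=11.0)
print(nu_adj)
```

Under this rule the adjusted value never exceeds either $\nu_m$ or the complete data degrees of freedom, which is the property the text asks for.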
Singh and Horn (2000) suggested a compromised imputation, in which the data after imputation become
$$y_{\bullet i} = \begin{cases} \alpha\, n y_i / r + (1 - \alpha)\,\hat{b} x_i & \text{if } i \in A, \\ (1 - \alpha)\,\hat{b} x_i & \text{if } i \in \bar{A}, \end{cases} \tag{12.11.1}$$
where $\alpha$ is a suitably chosen constant, such that the variance of the resultant estimator is minimum. Note that Meeden (2000) has also suggested the idea of adjusting responding values in addition to non-responding values while doing imputation, using information from imputed values for the responding units in addition to the non-responding units. Thus we have the following theorem:
Theorem 12.11.1. The point estimator of the population mean $\bar{Y}$ under the compromised method of imputation becomes
Proof. We have
The main difficulty in using the compromised imputation procedure is the choice of $\alpha$. It is important to note that the optimum value of $\alpha$ depends only upon the well known parameter $K = \rho_{xy} C_y / C_x$. The value of $K$ is quite stable in repeated surveys, as shown by Reddy (1978a). Thus if the value of $K$ is known then the compromised imputation method can be easily implemented in actual surveys. Sometimes the value of $K$ is not known. In these situations, Singh and Horn (2000) have suggested two estimators of $\alpha$. The first estimator is given by
(12.11.10)
where $s_x^2 = (n - 1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x}_n)^2$. The choice between $\hat{\alpha}_1$ and $\hat{\alpha}_2$ is not very important for infinite populations, because the asymptotic mean squared error of the resultant estimators of the mean remains the same, following Sampath (1989). Singh and Horn (2000) have shown that the compromised imputation technique remains better than the ratio or mean methods of imputation.
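The effect of the compromised imputation (12.11.1) on the sample mean can be sketched in Python. The sketch assumes that the coefficient in (12.11.1) is the ratio $\hat{b} = \sum_{i \in A} y_i / \sum_{i \in A} x_i$, as in the ratio method of imputation; the data and the value of $\alpha$ are hypothetical, and `compromised_impute` is an illustrative name.

```python
def compromised_impute(y, x, alpha):
    """Data after compromised imputation; y[i] is None for non-respondents.

    Assumes b_hat = (sum of y over respondents) / (sum of x over respondents).
    """
    n = len(x)
    resp = [i for i in range(n) if y[i] is not None]
    r = len(resp)
    b_hat = sum(y[i] for i in resp) / sum(x[i] for i in resp)
    data = []
    for i in range(n):
        if y[i] is not None:   # responding unit: adjusted observed value
            data.append(alpha * n * y[i] / r + (1 - alpha) * b_hat * x[i])
        else:                  # non-responding unit: shrunken ratio value
            data.append((1 - alpha) * b_hat * x[i])
    return data

# Hypothetical sample of n = 5 units with r = 3 respondents
y = [4.0, None, 6.0, 5.0, None]
x = [2.0, 3.0, 3.0, 2.5, 1.5]
alpha = 0.3

data = compromised_impute(y, x, alpha)
mean_after = sum(data) / len(data)

# The mean of the completed data is a weighted combination of the
# respondent mean and the ratio estimator.
y_r = (4.0 + 6.0 + 5.0) / 3
x_r = (2.0 + 3.0 + 2.5) / 3
x_n = sum(x) / len(x)
closed_form = alpha * y_r + (1 - alpha) * y_r * x_n / x_r
print(mean_after, closed_form)
```

Setting `alpha = 1` reproduces the respondent mean and `alpha = 0` reproduces the ratio estimator, which is why the method is described as a compromise between the two.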
Under such situations the values of $\rho_{xy} \to 1$ and $C_y \approx C_x$; then the optimum value of $\alpha$, and hence its estimators $\hat{\alpha}_i$, $i = 1, 2$, will tend to zero. In other words, the imputed values, using the compromised technique, then remain close to the true values in $\bar{A}$. Also the actual values $y_i$ do not have any impact of imputation in $A$. It is remarkable that a bad guess of $\alpha$ may lead to bad results in the compromised imputation. Since the compromised imputation provides a better estimator of the population mean, it is recommended for future use.
Singh and Horn (2000) pointed out that this type of compromise can also be made between other types of imputation methods. For example, a compromise between the Hot deck and Cold deck methods of imputation may lead to the 'Warm Deck' method of imputation, defined as
$$y_{WD} = \alpha\, y_{CD} + (1 - \alpha)\, y_{HD}. \tag{12.11.13}$$
The correlation between estimates obtained via the cold deck and hot deck methods is expected to be high, and hence the resultant estimator (12.11.13), named the 'Warm Deck' method of imputation, is expected to be efficient for the optimum value of $\alpha$, given by
Example 12.11.1. Select an SRSWOR sample of twenty units from population 1 given in the Appendix. Record the values of the real and nonreal estate farm loans for the states selected in the sample. Observe the random non-response in the selected sample. Impute the missing values with the following two methods:
( a ) Ratio method of imputation; and ( b ) Compromised method of imputation.
Estimate the average real estate farm loans with each method and comment on your results.
Solution. We apply the remainder approach on the 3rd and 4th columns of the Pseudo-Random Numbers (PRN) given in Table 1 of the Appendix to select a sample of 20 states from population 1. The first 20 distinct random numbers between 1 and 50 were selected as: 29, 31, 14, 41, 05, 47, 28, 22, 18, 12, 42, 23, 48, 02, 06, 07, 11, 21, 25, and 39. The information so collected from the selected units in the sample is given below.
The imputed values for the ratio method are then given by
if the $i$th state selected in the sample is not responding to the value of real estate farm loans.
On the other hand, the imputed value for the compromised imputation method is
Thus by the ratio and compromised methods of imputation the data take the form shown in the 4th and 5th columns, respectively, of the following table.
Sr. No.  State  Real estate loans, y  Ratio imputation  Compromised imputation
 2       AK         2.605                 2.60500            2.339678
 5       CA      1343.461              1343.46100         2069.779000
 6       CO      Missing                502.23370          362.449400
 7       CT         7.130                 7.13000            4.394841
11       HI        40.775                40.77500           30.355770
12       ID        53.753                53.75300          422.292200
14       IN      1213.024              1213.02400          859.195300
18       LA       282.565               282.56500          267.151400
21       MA      Missing                 31.29453           22.584480
22       MI       323.028               323.02800          296.052400
23       MN      1354.768              1354.76800         1489.340000
25       MO      1579.686              1579.68600         1194.114000
28       NV         5.860                 5.86000            8.859487
29       NH         6.044                 6.04400            2.431297
31       NM       140.582               140.58200          161.765000
39       RI      Missing                  0.12912            0.093184
41       SD       413.777               413.77700          830.561900
42       TN       553.266               553.26600          360.837800
47       WA      Missing                680.85710          491.357400
48       WV      Missing                 16.23219           11.714360
Sum                                    8551.07100         8887.666700
Thus an estimate of the average of the real estate farm loans in the United States during 1997 by the ratio method of imputation is given by
$$\bar{y}_{ratio} = \frac{1}{n} \sum_{i=1}^{n} y_{\bullet i}(r) = \frac{8551.071}{20} = 427.55$$
and by the compromised method of imputation by
$$\bar{y}_{comp} = \frac{1}{n} \sum_{i=1}^{n} y_{\bullet i}(c) = \frac{8887.6667}{20} = 444.38.$$
From the description of population 1 given in the Appendix, the true average real estate farm loans is given by $\bar{Y} = 555.43$. One can see here that the estimate based on the compromised method of imputation is closer to the true value of the population mean than the estimate based on the ratio method of imputation.
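The arithmetic of this example can be rechecked with a few lines of Python. The two lists simply copy the ratio and compromised imputation columns of the table above; the variable names are illustrative.

```python
# Completed data after ratio imputation (4th column of the table)
ratio = [2.605, 1343.461, 502.2337, 7.130, 40.775, 53.753, 1213.024,
         282.565, 31.29453, 323.028, 1354.768, 1579.686, 5.860, 6.044,
         140.582, 0.12912, 413.777, 553.266, 680.8571, 16.23219]
# Completed data after compromised imputation (5th column of the table)
comp = [2.339678, 2069.779, 362.4494, 4.394841, 30.35577, 422.2922,
        859.1953, 267.1514, 22.58448, 296.0524, 1489.340, 1194.114,
        8.859487, 2.431297, 161.765, 0.093184, 830.5619, 360.8378,
        491.3574, 11.71436]
n = 20
true_mean = 555.43  # population mean quoted in the example

y_ratio = sum(ratio) / n
y_comp = sum(comp) / n
print(round(y_ratio, 2), round(y_comp, 2))
print(abs(y_ratio - true_mean), abs(y_comp - true_mean))
```

Both estimates fall below the population mean in this particular sample, with the compromised estimate showing the smaller absolute error.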
in the convenient interval of $x$ values centred at $x_k$ $(k \in s)$. Assuming $2l_k$ is the length of the interval centred at $x_k$, $p_k$ can be estimated as
I ikD~k -xJ {1 I
if Iz s;, lk>
II
• jES ()
Pk = ( ) ' where D z = (12.12.4)
IDxk-Xj 0 if z > lk>
jES
Y•c = I -.-
Y k + f3ds
' [X - I - Xk.-] (12.12 .5)
kES "kPk kES "kPk
where
2 ]-1
/3,
ds
=
[I Yk Xk
kES vk"kPk ][I~
kES vk "kPk
and $\pi_k$ denotes the probability of including the $k$th unit in the sample.
Example 12.12.1. We wish to estimate the average duration of sleep (in minutes) of the aged persons living in a small village of the United States, as shown in population 2 of the Appendix. Select an SRSWOR sample of eight persons from population 2. Collect the information about the duration of sleep and the age of the persons selected in the sample. Suppose the response probability of the $i$th person is inversely proportional to the age of the person selected, on the assumption that an old person will respond less quickly than a younger person. Estimate the average duration of sleep of the old persons in the particular village.
Solution. Population 2 consists of 30 old persons living in a small village. We used the first two columns of the Pseudo-Random Number (PRN) Table 1 given in the Appendix to select eight distinct random numbers between 1 and 30 as: 01, 23, 04, 05, 22, 29, 03, 27.
$$v_i = x_i^{-1} \Big/ \sum_{i=1}^{N} (1/x_i),$$
where $\sum_{i=1}^{30} 1/x_i = 0.454$ is known.
In almost all large scale surveys non-response is inevitable. Several methods for handling non-response problems in sample surveys are available in the literature. Good details are given by Rubin (1987). Särndal (1992) developed a method of estimation of the population total and its variance when a single imputation is used for estimating unobserved values under the superpopulation model described below.
Let $\Omega = \{1, 2, \ldots, N\}$ be a finite population of $N$ identifiable units and $y_i$ be the value of the variate under study for the $i$th unit of the population $\Omega$. It is assumed that the vector $\underline{y} = (y_1, \ldots, y_i, \ldots, y_N)$ is a random sample from a superpopulation $\xi$ having the following distribution: $E_\xi(y_i) = \beta x_i$, $V_\xi(y_i) = \sigma^2 x_i^g$ and $C_\xi(y_i, y_j) = 0$ for $i \ne j$, $i, j = 1, \ldots, N$, where $E_\xi$, $V_\xi$ and $C_\xi$ denote respectively the expectation, variance and covariance operators with respect to the model $\xi$; $\beta$, $\sigma^2 (> 0)$, and $g (> 0)$ are unknown model parameters, and $x_i (> 0)$ is the value of the auxiliary variable for the $i$th unit. The objective is to estimate the finite population total $Y = \sum_{i \in \Omega} y_i$ on the basis of a sample $s$, of size $n$, selected with probability $p(s)$ according to a sampling design $p$. Let $\pi_i$ and $\pi_{ij}$ be the inclusion probabilities for the $i$th, and the $i$th and $j$th $(i \ne j)$, units of $\Omega$. Let $s_r (\subset s)$ be the set of respondent units, of size $m$, from which the responses $y_i$ are observed, and let the complement $s - s_r$ (of size $n - m$) be the set of non-response units. Let $\hat{y}_j$ for $j \in s - s_r$ be the imputed value of the $j$th unit computed according to a certain rule depending on the superpopulation model $\xi$ (for details, see Särndal (1992)).
Let
$$t = \sum_{i \in s} w_i y_i$$
be an unbiased estimator used for estimating $Y$ in the case of 100% response (i.e., $m = n$), where the $w_i$ are suitably chosen weights independent of the $y_i$ values. In the presence of non-response $(m < n)$, Särndal (1992) modified the estimator $t$ as
$$\hat{t} = \sum_{i \in s} w_i y_{i0},$$
where
$$y_{i0} = \begin{cases} y_i & \text{for } i \in s_r, \\ \hat{y}_i & \text{for } i \in s - s_r. \end{cases}$$
A
VIOl =
m N ;ES
r
(12.13.4)
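For the superpopulation model
$$E_\xi(y_i) = \beta x_i, \quad V_\xi(y_i) = \sigma^2 x_i, \quad \text{and} \quad C_\xi(y_i, y_j) = 0, \tag{12.13.5}$$
Särndal (1992) used
$$y_{i0} = \begin{cases} y_i & \text{for } i \in s_r, \\ \hat{B} x_i & \text{for } i \in s - s_r, \end{cases} \quad \text{with} \quad \hat{B} = \sum_{i \in s_r} y_i \Big/ \sum_{i \in s_r} x_i,$$
and proposed an estimator for $Y$ under SRSWOR as
$$t_{02} = N \bar{x}_s\, \frac{\bar{y}_r}{\bar{x}_r}. \tag{12.13.6}$$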
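A minimal Python sketch of this ratio imputation follows. It assumes SRSWOR weights $w_i = N/n$ and uses hypothetical data; `ratio_imputed_total` is an illustrative name. The sketch fills in $\hat{B} x_i$ for the non-respondents and confirms that the expanded total of the completed data equals $N \bar{x}_s \bar{y}_r / \bar{x}_r$.

```python
def ratio_imputed_total(y, x, N):
    """Total estimated from data completed by ratio imputation.

    y[i] is None for non-respondents; x is observed for the whole sample.
    Assumes SRSWOR, so every sampled unit carries the weight N / n.
    """
    n = len(x)
    resp = [i for i in range(n) if y[i] is not None]
    B_hat = sum(y[i] for i in resp) / sum(x[i] for i in resp)
    completed = [y[i] if y[i] is not None else B_hat * x[i] for i in range(n)]
    return N * sum(completed) / n

# Hypothetical sample: n = 6 units, m = 4 respondents, N = 60
y = [12.0, None, 9.0, 15.0, None, 10.0]
x = [6.0, 5.0, 4.0, 8.0, 7.0, 5.0]
N = 60

t02 = ratio_imputed_total(y, x, N)

# Closed form N * x_bar_s * y_bar_r / x_bar_r
y_bar_r = (12.0 + 9.0 + 15.0 + 10.0) / 4
x_bar_r = (6.0 + 4.0 + 8.0 + 5.0) / 4
x_bar_s = sum(x) / len(x)
closed_form = N * x_bar_s * y_bar_r / x_bar_r
print(t02, closed_form)
```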
. () = N 2(-;;-
Vsam 2 1 N1)L2
r yos + CoO".2} , ftimp 1 N1)CtO".2
. ()2 = N 2(-;;-
where
S;o s = (n -1).t(Yi - Yso)2, Yso = .t In,
YiO 0- = .t~i -
2
EXi f/[(m -1)~r(l- cV;r 1m)}],
lES lE S ie s;
ie s;
and
te s; ies ;
Finally, in order to compare the relative efficiency of the estimator $\hat{t}_{02}$, Särndal (1992) conducted a Monte Carlo study with 100,000 repeated response sets $s_r$; $N = 100$, $n = 30$. Three different response mechanisms were used:
when the response probabilities $\theta_k$ are known or can be estimated from the available data. So Arnab and Singh (2002d) proposed some alternative estimation procedures assuming that the response probabilities $\theta_k$ for the $k$th unit are either known or estimated through the log linear models:
respectively, where Pi = zd2 and C\, C2 are unknown positive constants which are
appropriate when response probability increases or decreases with x.
Arnab and Singh (2002d) assumed that the response probabilities $\theta_i$ are independent. Under this assumption, they consider that the response sample $s_r$ (formed by the set of respondent units) is a sub-sample from $s$ selected by nature according to the Poisson sampling scheme with inclusion probabilities $\pi_{i|s} = \theta_i$ and $\pi_{ij|s} = \theta_{ij} = \theta_i \theta_j$ for $i \ne j$. The Horvitz and Thompson (1952) type
(12.13.1.1)
It can be easily checked that $\hat{Y}_{ht}$ is unbiased for the total $Y$. The expression for the variance of $\hat{Y}_{ht}$ is given in the following theorems:
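$$V_1 = \frac{1}{2} \sum\sum_{i \ne j} (\pi_i \pi_j - \pi_{ij}) \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2, \tag{12.13.1.2}$$
and
$$V_2 = \sum_{i} \frac{y_i^2}{\pi_i} \Big(\frac{1}{\theta_i} - 1\Big). \tag{12.13.1.3}$$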
Proof. We have
Now
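Theorem 12.13.2. If the sample size is large enough to ensure $\text{Prob}\{m \ge 2\} \approx 1$, then the following two estimators
$$\hat{V}_{ht}(1) = \hat{V}_{11} + \hat{V}_2 \quad \text{and} \quad \hat{V}_{ht}(2) = \hat{V}_{12} + \hat{V}_2$$
are unbiased for $V(\hat{Y}_{ht})$, where
$$\hat{V}_{11} = \frac{1}{2} \sum\sum_{i \ne j \in s_r} \frac{(\pi_i \pi_j - \pi_{ij})}{\pi_{ij}\, \theta_i \theta_j} \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2,$$
$$\hat{V}_2 = \sum_{i \in s_r} \frac{y_i^2}{\pi_i^2\, \theta_i} \Big(\frac{1}{\theta_i} - 1\Big)$$
and
$$\hat{V}_{12} = \sum_{i \in s_r} \frac{y_i^2}{\pi_i \theta_i} \Big(\frac{1}{\pi_i} - 1\Big) - \sum\sum_{i \ne j \in s_r} \frac{(\pi_i \pi_j - \pi_{ij})}{\pi_{ij}\, \pi_i \pi_j}\, \frac{y_i y_j}{\theta_i \theta_j}.$$
Proof. Noting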
$$V_1 = \frac{1}{2} \sum\sum_{i \ne j} (\pi_i \pi_j - \pi_{ij}) \Big(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\Big)^2 = \sum_{i} y_i^2 \Big(\frac{1}{\pi_i} - 1\Big) + \sum\sum_{i \ne j} (\pi_{ij} - \pi_i \pi_j)\, \frac{y_i y_j}{\pi_i \pi_j},$$
we can verify that both the estimators $\hat{V}_{11}$ and $\hat{V}_{12}$ are unbiased for $V_1$. It can be easily checked that $\hat{V}_2$ is unbiased for $V_2$.
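The estimators of this section can be sketched in Python. The sketch assumes that the Horvitz and Thompson type estimator has the form $\hat{Y}_{ht} = \sum_{i \in s_r} y_i / (\pi_i \theta_i)$, which agrees with the SRSWOR special case $N \bar{y}_r$ discussed in this section, and it computes the variance estimator $\hat{V}_{11} + \hat{V}_2$. The data, the function names and the SRSWOR set-up are hypothetical choices for illustration.

```python
def ht_total(y, pi, theta):
    """Assumed form of the Horvitz-Thompson type estimator under non-response."""
    return sum(yi / (p * t) for yi, p, t in zip(y, pi, theta))

def v_ht1(y, pi, theta, pi_ij):
    """Variance estimator V11_hat + V2_hat; pi_ij(i, j) returns the joint
    inclusion probability of respondents i and j."""
    m = len(y)
    v11 = 0.0
    for i in range(m):
        for j in range(m):
            if i != j:
                pij = pi_ij(i, j)
                v11 += ((pi[i] * pi[j] - pij) / (pij * theta[i] * theta[j])
                        * (y[i] / pi[i] - y[j] / pi[j]) ** 2)
    v2 = sum(yi ** 2 / (p ** 2 * t) * (1 / t - 1)
             for yi, p, t in zip(y, pi, theta))
    return 0.5 * v11 + v2

# Hypothetical SRSWOR set-up: N = 100, n = 10, m = 6 respondents,
# with a common response probability estimated by m / n.
N, n = 100, 10
y = [12.0, 7.5, 9.0, 14.2, 10.1, 8.8]
m = len(y)
pi = [n / N] * m
theta = [m / n] * m

total = ht_total(y, pi, theta)
v_hat = v_ht1(y, pi, theta, lambda i, j: n * (n - 1) / (N * (N - 1)))
print(total, v_hat)
```

With equal inclusion and response probabilities the estimated total reduces to $N$ times the respondent mean, which gives a quick sanity check on the implementation.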
Case II. Consider an SRSWOR design where $\pi_i = \pi_{i0}$, $\pi_{ij} = \pi_{ij0}$ and the response probabilities $\theta_i$ are equal to $\theta$ for every $i$. Further, if we estimate $\theta$ by $\hat{\theta} = m/n$, then
$$\hat{Y}_{ht} = \frac{N}{m} \sum_{i \in s_r} y_i = N \bar{y}_r = t_{01}. \tag{12.13.1.4}$$
The estimator $t_{01}$ was proposed by Särndal (1992) when the method of single imputation is used as described in (12.13.3). Putting $\theta_i = \hat{\theta} = m/n$, $\pi_i = \pi_{i0}$, $\pi_{ij} = \pi_{ij0}$ in Theorem 12.13.2 we have two approximate variance estimators for $t_{01}$ as
$$\hat{v}_{01} = \frac{N(N-n)(m-1)}{m(n-1)}\, s_{yr}^2 + \frac{N^2 (n-m)}{n m^2} \sum_{i \in s_r} y_i^2, \quad \text{and} \quad \hat{v}_{02} = \hat{v}_{01} + \frac{N(N-n)(n-m)}{n(n-1) m^2} \sum_{i \in s_r} y_i^2.$$
Finally, noting that $\hat{v}_0 = N^2 \Big(\frac{1}{m} - \frac{1}{N}\Big) s_{yr}^2$, the expression for the estimate of the
Theorem 12.13.3. ( i ) $\hat{v}_{02} \ge \hat{v}_{01}$ for all the $y_i$ values, and ( ii ) $\hat{v}_{02} \ge \hat{v}_0$ whenever all the $y_i$ values are positive.
Proof. Straightforward and hence omitted.
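Part ( i ) of the theorem can be illustrated numerically. The Python sketch below takes $\hat{v}_{01}$ as $N(N-n)(m-1)/\{m(n-1)\}\, s_{yr}^2 + N^2(n-m)/(n m^2) \sum_{s_r} y_i^2$ and $\hat{v}_{02}$ as $\hat{v}_{01} + N(N-n)(n-m)/\{n(n-1)m^2\} \sum_{s_r} y_i^2$, the forms obtained by putting SRSWOR inclusion probabilities and $\hat{\theta} = m/n$ into Theorem 12.13.2. The respondent values and the function name are hypothetical.

```python
from statistics import variance

def v01_v02(y_resp, n, N):
    """Two approximate variance estimators for t01 = N * ybar_r under SRSWOR."""
    m = len(y_resp)
    s2 = variance(y_resp)                 # s_yr^2, divisor m - 1
    sum_sq = sum(v * v for v in y_resp)   # sum of squared respondent values
    v01 = (N * (N - n) * (m - 1) / (m * (n - 1)) * s2
           + N ** 2 * (n - m) / (n * m ** 2) * sum_sq)
    v02 = v01 + N * (N - n) * (n - m) / (n * (n - 1) * m ** 2) * sum_sq
    return v01, v02

# Hypothetical respondent values, some negative, to show that part (i)
# does not depend on the sign of the y values.
v01, v02 = v01_v02([12.0, -7.5, 9.0, 14.2, -10.1, 8.8], n=10, N=100)
print(v01, v02)
```

The difference between the two estimators is a non-negative multiple of $\sum y_i^2$, which is why the inequality holds for any data.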
We shall discuss estimators of total and variance under the calibrated response probabilities mechanism in the following sections:
Consider that the $x_i$, $i \in s$, are known. Then following Deville and Särndal (1992), a calibrated estimator of the population total is given by
(12.13.2.1)
where the $w_i$ are the calibrated weights obtained by minimizing the chi square type distance function
D= I(Wi-Ijeyqjl
ies ;
Here the $q_i$ are suitably chosen weights. Minimization of $D$ leads to the calibrated weights
(12.13.2.2)
where
(12.13.2.4)
Now writing
Br=[EI
IES r
YiX~qiJ/[E.I
1(i IESr
xl;iJ=(.I YiXiq;O;]/( .I xlq;O;]
1(i lEU 1(i lEU 1(i
Case I. Consider an SRSWOR design where $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/\{N(N-1)\}$
(12.13.2.11)
VoC (3) -
_ »: n
(m -1) (is) (nXs -mXr ) 2 + N(N - n)(m -1) (is)2 2
m
2 -
x; x;
Ser
m(n-l)
-
Xr
Syr
and
+ (~sr )2 [N(N
X
-n) {(m -I)s~r +(~_.!-) L Yl}]
m(n-l) m n ies ;
where b= ,~ YiXi/,~xl.
ies ; IE Sr
Now writing (m - 1Fe; = L (Yi - b.x, and Y e; = Yi - bx-, we have the following
tes;
estimators for the variance of (12.13.2.12) as follows
2(n-m)(
-voC (1)-- N
2 m
_1)-2
Ser +
N(N-nXm-l) 2
( ) Syr'
n m m n-l
2(n-m)(
2 =N- - 2 - m-1J1::'
-voC () ser2 + N(N-n)[(
(_) m-l )s2yr + (1 1) ,L Yi2],
---
n m mn 1 m n ie s;
2
_ () N ( '1:::'2 N(N-n) 1 f. \2
VoC 3 = -
2 L WiO WiO-1Fi + 2( ) L L WiOWj Ol)'i- Y j} ,
n ies; n n-I 2 i "'j ESr
2
_ () N ( \::'2 N(N-n) 2 N (N- n )
V oC 4 = -
2 L WiO WiO - 1JCi + 2 LWiOYi - 2( )L L WiOWOjYiYj'
n ies; n ies; n n - 1 i", JES r
One can also refer to Lundström (1997) and Lundström and Särndal (1999), who suggested that calibration can be taken as a standard method for the treatment of non-response in survey sampling.
Exercise 12.1. Distinguish between 'Missing at random' and 'Missing completely at random'. Give examples of random non-response and deliberate non-response in interview surveys.
N
Exercise 12.2. Let t = t(s) = LbsiYi be an estimator of population total Y = LY; In
~s ~1
Exercise 12.3. Let there be a population of $N$ units from which a sample of size $n$ is to be drawn. Let the study variable be denoted by $y$ and the auxiliary variable be denoted by $x$. The selection probabilities $p_i$ of the population units are taken to be proportional to the corresponding $x$ values. Let $r$ $(r = 0, 1, 2, \ldots, (n-1))$ be the number of units (including repetitions in the case of PPSWR sampling) on which the information on $y$ could not be collected. The value of $r$ is supposed to be less than or equal to $(n - 2)$ when estimation of the variance of the estimator of the population total is concerned. Then show that an unbiased estimator of the population total $Y$ is given by
• n n-r
YHTR = - - LdiYi
n-r i;1
•• • / .z
where fJ = Sxy Sx , kl and k z are real constants.
Hint: Singh and Joarder (1998), Singh, Chandra, and Singh (2003).
Exercise 12.5. Let $(y_i, x_i)$ be the values of the study variable $y$ and the auxiliary variable $x$ for the $i$th unit of a population of size $N$ with population means $\bar{Y}$ and $\bar{X}$, respectively. Let $n$ be the size of the SRSWOR sample drawn from it. Suppose only $n_1$ selected persons respond and $n_2$ do not, such that $n_1 + n_2 = n$. From the $n_2$ non-respondents, $r = n_2/k$, $k > 1$, units are selected by making extra efforts. An estimator of the population mean $\bar{Y}$ is given by
$$\bar{y}^* = \frac{n_1}{n}\, \bar{y}_1 + \frac{n_2}{n}\, \bar{y}'_2,$$
where $\bar{y}_1$ and $\bar{y}'_2$ are the sample means based on $n_1$ and $r$ units, respectively, for the study character. Assuming that the population mean $\bar{X}$ of the auxiliary variable is known, study the asymptotic properties of the following estimators:
( b ) mz = y.(;), where x = n-I ;~lx; denote the sample means based on n units
(c) m3 _.(x+c)
=Y -=----
X +C
an d m4 _.(X+C
=Y -_--) ,
x+C1
I
with C and Cl suitably chosen constants such that the variances of m3 and m4
are minimum. Compare m3 and m4 with the estimator of population mean in the
presence of full response defined as
- =Y-(X+C x)
Ysd -=---C .
x+ x
Hint: Khare and Srivastava (1997).
Exercise 12.6. Let $y_{ij}$, $j = 1, 2, \ldots, N_i$, be the value of the study variable for the $j$th unit in the $i$th stratum, $i = 1, 2, \ldots, L$. The population of each stratum is divided into two classes, those who will respond at the first attempt and those who will not respond, thus creating the problem of an incomplete sample in the mail survey. In order to
Let Yi and Y~.1 be the sample means of the respondent groups in the z4h stratum,
obtained at the first attempt and second attempt (mail survey, say), respectively.
Exercise 12.7. Discuss the different methods of imputation and suggest estimators of variance.
Hint: Lee, Rancourt, and Särndal (1994, 1995a, 1995b).
Exercise 12.8. Discuss Rao-Shao adjustments for estimating the variance of the estimator of the population total using Jackknifing under different methods of imputation.
Exercise 12.9. Consider a finite population of size $N$. Let $y_i$ be the value of the variable under study, $y$, for the $i$th unit. Each unit of the population has a definite probability of providing the necessary information with respect to the variable of interest under the particular given field method. Let $p_i$ denote this probability of obtaining the required information from the $i$th unit and let $q_i = 1 - p_i$. Suppose we selected a sample of $n$ units by SRSWOR sampling and only $m$ units provided the required information. In repeated samples, the value of $m$ will be a random variable taking the values $0, 1, 2, \ldots, n$. Show that an unbiased estimator of the population total is given by
$$\hat{Y}_1 = \begin{cases} 0 & \text{if } m = 0, \\ \dfrac{N}{n} \sum_{i=1}^{N} \dfrac{y_i V_i e_i}{p_i} & \text{if } m > 0, \end{cases}$$
where $V_i = 1$ or $0$ according as the $i$th unit is or is not in the sample and $e_i = 1$ or $0$ according as the $i$th unit does or does not provide the required information. Find its variance.
Hint: Singh and Narain (1989).
Exercise 12.10. Let a finite population consist of N units . To every unit there is
attached a characteristic y . The characteristics are assumed to be measured on a
given scale with distinct points Yl,YZ" ",YT' Let N I be the number of units
associated with scale point YI' with N = INI . A simple random sample of size n
I
non-response is observed assume that response is obtained from n(r) units in the
sample and non-response from n(r) units, such that n(r)+ n(r) = n . Show that under
the likelihood function, defined as
[ (-)] (NYJJ(N12J
L Nrl ,NrZ'· ···,NrT(r);N r = nYJ n (NrT(r)](N(r)J!(NJ
nrT(r) n(r) n'
12
a maximum likelihood estimator of the population mean is given by
_ 1
Yr = - ()LnrlYrl .
nr I
Hint: Laake (1986).
where
1l-l1) = (Jrk vr, Jr12) = !....Jrk' Jr13) = Jrk Pr(k EtR I Ik)= I) , with UR being a particular
n Pr k E UR
set of units in the population that would respond to the survey question, and I k is
. diicator functi
an m nction d e fime d as I k = {I if. k E S, sue h th at Jrk = Pr[I k = I I k E U R 1;
o If k ~ S,
Pxy between x and Y ; and Jrls) = (1- rxy XJrk rill + rxy !....(Jrk). Study the bias and
n
variance properties of these five estimators.
Hint: Chaubey and Crisalli (1995).
( b ) Consider here an estimator of the population total Y as
~:n = IAv;iYi,
ier
where V;i are the new calibrated response weights obtained by minimizing a new
penalized chi square distance function , defined as
o, =! IV(U -I J . : I
( 0 02
<l>ivni,
2 ier qi 2 ie r qi
where qi are the suitably chosen weights to form different types of estimators, and
<l>i is a relaxed penalty, and it can take any value in the range (-1, 00) .
Note that the estimator, r:.o(o) , is free from qi and thus comment on its choice.
Case I. If <I> i = v(r:.o Yy 2 and is constant for each respondent in the sample, then
r:.o(o) reduces to the Searls (1964) estimator in the presence of non-response.
f.:o(O)CC(I) = I ( Yr/n)
IE r H i
Case III. If <l> i = (~- I) then the estimator f.:o (0) becomes
' . ()
YwQ n IdiYi
0 cc(z) =-
r ier
which is the second estimator considered by Chaubey and Crisalli (1995).
'. ( )
YwQ 0 cC(4) = I
ier
t y.
t
IdivsiXi = IXi
ier ie s
and study the resultant estimators.
( ii ) Suggest a few new calibration constraints and study the resultant estimators.
Hint: Singh and Arnab (2003).
Exercise 12.12. (I) Assume the data after imputation takes the form
~~ =~jYi
j r-l(r-ltl{(n-l)xi-nXn}i~~;
if i e A,
Yoi = if ieA ,
where A and A denote the responding and non-responding sets of the sample s
such that s = Au A .
where Yr = r-
I
LYi,
ie A
xr = r - I
ieA ies
±
L_xi' and xn = n- I LXi, zr = r- I Yi have the usual
i=1 Xi
meanings.
(b) Find the variance of the estimator e2 and estimate it by Jackknife technique.
Yoi = r(n-:+l)li~Yi]Xi if ie A ,
LXi
i=1
- -
where y,(i) = ryr - Yi and x,(i)= rXr -Xi.
r-l r-l
( a ) Show that the point estimator e2 =.!- i»: of the population mean Y under
n i=1
this method of imputation becomes
r(n-r+l)_ Xn (n-rXr-l)~- (,) xn
A
e3 =
n
Yr -=--
x
r rn
£...Yr l-=---:-.
i=1 X,(l)
( b ) Show that the estimator e3 is an unbiased estimator of the population mean.
( c ) Show that the variance of the estimator e3' to the first order of approximation,
is equivalent to that of the ratio estimator in two-phase sampling.
Hint: Singh, Horn, and Tracy (2001).
observations y~*, Y;* ,....,Y;* on Y variable in the sample, but the associated values
of the auxiliary variable x are missing. Defining
p-q
I n- p - q ( )In- * I P * ** I P **
x=n-p-q-
( ) IXi,y=n-p-q- IYi, X=P-IXi, and y =q-IYi'
i=1 i=1 i=1 i=1
Find the bias and variance expressions of the following estimators of R defined as
Y (n-q)y (n- p-q)y+qy**
1j=-=, r2= *' r3=
x (n - p - q)x + px (n - p) x
Exercise 12.14. Consider that the population of interest has been stratified into $L$ strata with $N_h$ clusters in the $h$th stratum. At the first stage of sampling, let $n_h \ge 2$ clusters be selected from the $h$th stratum. Let $p_{hi}$, $i = 1, 2, \ldots, N_h$; $h = 1, 2, \ldots, L$, denote the probability of selecting the $i$th cluster from the $h$th stratum. Assume that these clusters are selected independently across strata and without replacement and that the overall sampling fraction $\sum_h n_h / \sum_h N_h$ is negligible. Let $y_{hik}$ be the value for the ultimate population unit, where the indices $(h, i, k)$, $k = 1, \ldots, N_{hi}$; $i = 1, \ldots, N_h$; $h = 1, \ldots, L$, have their usual meanings. We wish to estimate the population total
$$Y = \sum_{h=1}^{L} \sum_{i=1}^{N_h} \sum_{k=1}^{N_{hi}} y_{hik}.$$
If there is no non-response and $w_{hik}$ are the design weights, then show that the unbiased estimator of the population total $Y$ is given by
$$\hat{Y} = \sum_{(hik) \in s} w_{hik}\, y_{hik}.$$
In a stratified multi-stage sampling design consider for the ratio method of imputation that the adjusted values are given by:
+
estimation is
2
VBRR(&) = f{e {r)(&)-e}
e R r=1
where e is an estimator of any parameter of interest, say (J, and e (r)(&) is
computed using the same formula for e but with the original weights Whik replaced
with the new weights, given by
otherwise.
Discuss your views by taking e = Yhik and e = 0.5. Also show that
VBRR (&)/ var(e)~ I .
Hint: Rao and Shao (1999) .
Exercise 12.15. Show that under the multi-stage post-stratified sampling design an
estimator of population total Y in the presence of unit non-response is
YN,ps = LL L dpebwhikahikeOhik17k;kYhik ,
e p (hik}es
where e M = post-stratum count, Yhik denotes the value of the Jlh unit of the study
variable in the lh cluster and h1h stratum,
L Whik17k;k
d = {hik}es I if (hik) responds,
ahik = { 0
p p
L. whik ahik17hik
'" otherwise,
{hik }es
eb =
eM p
p , 17hik =
{I
if (hik) E plh weighing class,
L L d pWhikahikeOhik17hik 0 otherwise,
p (hik)es
and Whik (> 0) denotes the design weights. Suggest a method to estimate the
variance of the estimator YN PS using the concept of Jackknife.
Hint: Yung and Rao (2000).
YI • at
YI
2 yz yz
• az
3 Y3 • a3
Y3
r Yr
Yr * ar
r +1 • ar+1
Yr+ 1
r+2 • ar+z
Yr+Z
n • an
Yn
Let y' and var(y') denote the sample mean and sample variance of the sample
which comprises the values making up y'. Let m and u > 0 be fixed real
numbers. Let a = (a_1, a_2, ..., a_n) be a vector of real numbers. Show that the
n-I ;=I
I:
var(a) = _1_ (a; - m = u f leads to the adjusted data set given by
O.5
a? = m + { ~} &; -Y'), i =1,2,...,n. Show that the mean method of imputation
is a special case of it. Extend the results for ratio method of imputation. Let V; be
another variable corresponding to the auxil iary variable. Show that the
minimization of I: (a; - Y; f
;=1
subject to the two constraints a =.!.. I: a; =m
n ;=1
and
imputation.
Hint: Meeden (2000).
Exercise 12.17. Suppose a census of the population has been undertaken and all
units can be classified, by the call back index number on which they respond, into
$L$ strata. Each call back stratum $h$ would be of size $N_h$, and the total
population size is $N = \sum_h N_h$. Let $y_{hi}$ be the (fixed) value of a variable of interest for
h h - -I L Nh
the l unit in the hI call back stratum. Defining Y =N I IYhi ,
h=li=1
Nh
-\2
2
a = N-
1L Nh(
I I Yhi - Y)\2 , Y- h = N h_I IYhi and 2
CYh
_I Nh(
= N h "i Yhi - Yh ) • Show that under
h=l i=1 i=1 i=1
the sub-sampling strategy a an unbiased estimator of Y is given by y(a) = IwfYh
h
where Yh=_I- IYhi for the set of interviewed units Sh in the stratum h , n~ isthe
nf i ESh
number of interviews obtained in hlh call back stratum under the a sub-sampling
strategy and wf =
nf /ah
( ) for ah =
{I if k < m,
. Show that the conditional
..
I\nf /ah alfk >m.
r
h
variance for the fixed number of attempts to be made is given by
2
Exercise 12.18. Consider for estimating the mean of a finite population, a random
sample of n units is distributed equally among m enumerators chosen randomly
from an infinite population of enumerators. Assume that Yij the value of Y on unit
i as enumerated by interviewer j , is given by the model Yij = Yi + a j + eijk , where
Exercise 12.19. Assume the data after imputation take the form
Yoi = j;:[(n_r)xn
ax; +
(ar(x~-xr)]~ i : iiE:~,
I-a n IXi
iER p
where a is a suitably chosen constant, such that the variance of the resultant
estimator is minimum. The sets A ( A) denote the sets of responding (non-
responding) units in the sample S such that s = AUA .
where b = L iesY;/Lies Xi .
( a ) Show that the point estimator of population mean given by
- - I",
Ys = n L.ies'Yoi
becomes
X n'
YR =Yn -=-
- -
Xn
with variance
V(YR) = (~ -
n' N
J...)S2+(~n - ~)SJ
Y n'
2 I ( - \2
Sy = - - I l j - YJ,
N -l ien
I I I d 2 ,Y
Sd2 =- N i
-
=N-I Iienlj, di =Yi - Rxi an d R= I lj / I X i ·
- ien ien ien
( b ) Show that the estimator $\bar{y}_R$ is design consistent for the population mean $\bar{Y}$.
( c ) A design consistent linearization estimator of the variance of YR is given by
the standard formula
Vo = (~-~)sJ
n n'
+(~n' _J...)s2
N Y
with sJ =(n -r): IIdl , s; =(n -r): IL (Yi - yy, where d, =(Yi - y)- k (Xi - z).
ies ies
where
f~~~;j j
nY n - Yj
if j if j
x(j)= E S,
y(j)= _ n-l
E S,
where
_**( .) __*( .)X'(j) x'(J.) = n'xn,-xj.
J' E S'
YR J - YR J x*(j)' n'-1 if
nxn - xj ·
j
if j E S, if j E S,
-* n-l
and x (j) = nX . _ x .
n j if j E S'-S .
if j E s'-s, n-l
n-l
( a) Show that the modified Jackknife method of variance estimation becomes
_** . _
YR(J)-YR=
j ~'(j) ~j -Rx
x(j) n-l
j
) +R(X'(j)-Xn.)+(_n_)R~(~)(Xn'-Xn)
n-l x(J)
ifj ES,
Exercise 12.21. For estimating the mean of a finite population, a random sample
of n units is distributed equally among m enumerators chosen randomly from an
infinite population of enumerators. Assume that $Y_{ij}$, the value of $Y$ on the $i$-th unit
as enumerated by the $j$-th interviewer, is
Yij = Yi+aj+ eijk> where E{eijk li,j )=O, V(eijk li ,j)=CT; and Cov{eijk> eijk·l i,j)=O.
( a) Find the expected value and variance of the sample mean.
( b ) Assuming that the cost function is of the form
C = nC1 + mC2 + ";;;;;;C3 •
Find the optimum number of enumerators for which the variance of the sample
mean will be minimum for the fixed cost.
Hint: Sukhatme (1953)
Exercise 12.22. (I) Consider the data after imputation take the form
Yi if i E A
where a is a suitably chosen constant, such that the variance of the resultant
estimator is minimum . The sets A ( A') denote the sets of responding (non-
responding) units in the sample s such that s = AUA' .
(a) Show that the point estimator )is =..!.- LYei of population mean Ybecomes:
n ies
_)a
(xr
- - Xn
Y s = Yr '
( II ) Let X = {xij L p
' (i = 1,2,...,n; j = 1,2,....,p ) be the n x p matrix of the p auxiliary
vectors associated with the study variable y. It is assumed that full information is
available on the auxiliary variables, but responses are missing only for the study
variable. Consider a method of imputation given by
Yi if iE A
Yei =
if iE A
p
where O Xi = x l x 2... .xp denote the product of p terms.
i=1
-
Ymult = Y r
- Ilp(XnjJa
-=-
j
j=\ Xrj
(b) Show that the minimum variance of the estimator Y mult is given by
Exercise 12.23 . Consider the data after imputation take the form
Yi if i E A,
Y.i = { -
a + wXi if i E A,
where a and w are suitably chosen constants, such that the variance of the
resultant estimator is minimum, and the method of imputation becomes optimum.
The sets A (A') denote the sets of responding (non-responding) units in the sample
s such that s = AUA' .
( a ) Show that the point estimator $\bar{y}_s = \frac{1}{n}\sum_{i=1}^{n} y_{\cdot i}$ of the population mean $\bar{Y}$ under the
above method of imputation becomes
h
were - = r -I
p = r/ n , Yr '" - =
L..Yi' Xr r -I '"
L..Xi' an d X- n = n-1",
L..Xi '
ieA ieA i es
( b ) Show that the estimator Ys is unbiased if either a = (Y - wx) for any real value
Exercise 12.24. Michael works in a private sector and his boss Harold Mantel
considers an imputation technique based on the mechanism of the ratio method of
imputation while estimating ratio R = Y/ X of two population means. Harold
Mantel shows data to Michael on a spread sheet as shown in the following table in
the first two columns, 'data before imputation', and suggests to him to use the
following two ratios given by
$$b_{yx} = \sum_{i=1}^{n-p-q} y_i \Big/ \sum_{i=1}^{n-p-q} x_i, \quad \text{and} \quad b_{xy} = \sum_{i=1}^{n-p-q} x_i \Big/ \sum_{i=1}^{n-p-q} y_i$$
for imputing the missing Y variable and the missing X variable , respectively, as
shown in the last two columns of the following table.
YI xI YI xI
yz Xz yz Xz
b yx X2*
Missing
Xz* Xz*
.Missin
Missin
*
byxx p*
Missing *
xp xp
*
YI
Missing
YI
*
bxyYl
*
yz*
Missing y z* b xyY2
*
Missin
Missin
Missing
Yq* Yq*
( a ) After imputation Michael found the sums Ys and x s ' and took the ratio of
these two as
and reported back to Harold Mantel that his imputation method is not going to
work. Justify Michael's claim by showing that
$$\hat{R} = \frac{\bar{y}_{n-p-q}}{\bar{x}_{n-p-q}} = \text{Ratio of observed responses}.$$
( b ) Michael suggests to his boss Harold Mantel the following efficient class of
estimators of population ratio as follows:
p
Rw= ["ilIYn_p_qHl_xn-q J+"il2Yq]/["il3Xn-p-qGl !n- J+ "il 4xP ]
x n- p-q Yn-p-q
where "ilk> k = 1,2,3,4 are real constants, and H(.) and G(.) are the parametric
functions such that they satisfy the following assumptions :
( a ) $H(1) = 1$ and $G(1) = 1$;
( b ) first ($G_1$ and $H_1$, say) and second order ($H_{11}$ and $G_{11}$) derivatives of $H$ and
$G$ exist and are known constants.
Justify Michael's claim by comparing the mean square errors of $\hat{R}$ and $\hat{R}_w$.
Hint: Singh, Singh, Tailor, and Allen (2002).
Practical 12.1. John selected an SRSWOR sample of twenty states from the
population 1. He collected information about the real estate farm loans and nonreal
estate farm loans from the selected states, but unfortunately the information on the
real estate farm loans was not available on ten states as marked in the table below.
Practical 12.2. Select an SRSWOR sample of sixteen units from population 4 given
in the Appendix. Suppose you made seven attempts to collect information about the
different species groups selected in the sample. Information about the number of
fish caught in different species is not available every time you contact the
fisherman. You collected the information on the number of fish during 1994 in
seven visits, and noted the number of visits (D), out of seven, on which the
information about each species was available. Estimate the average number of fish
caught by marine recreational fishermen at the Atlantic and Gulf coasts during 1994.
Construct a 95% confidence interval for the average number of fish in the United
States.
Practical 12.3. Michael selected an SRSWOR sample of twenty states from the
population 1, and tried to collect information about the real estate farm loans and
nonreal estate farm loans from the selected states, but unfortunately the information
on the real estate farm loans was not available on nine states as marked
Apply the ratio type estimator Vs =s;2{s;/s;2) for estimating the finite population
variance of the real estate farm loans and construct the 95% confidence interval.
Practical 12.4. Santa Singh and Banta Singh were appointed to select two
candidates from a list of four candidates $\Omega$ = {Anokha, Banto, Channa, Didar} with
respective scores 25, 35, 40, and 45. In the first phase the
administration suggested to Santa Singh and Banta Singh to select three candidates
for telephone interview, and in the second phase they decided to select two
candidates for face to face interview. Santa Singh likes everyone whereas Banta
Singh likes Banto, so they suggested the following first phase sampling plans:
First phase sample                      Santa Singh       Banta Singh
$s_1'$ = {Anokha, Banto, Channa}        $p(s_1') = 1/4$   $p(s_1') = 1/3$
$s_2'$ = {Anokha, Banto, Didar}         $p(s_2') = 1/4$   $p(s_2') = 1/3$
$s_3'$ = {Anokha, Channa, Didar}        $p(s_3') = 1/4$   $p(s_3') = 0.0$
$s_4'$ = {Banto, Channa, Didar}         $p(s_4') = 1/4$   $p(s_4') = 1/3$
( a ) Construct the first order and second order inclusion probabilities for the first
phase telephone interview.
In the second phase the administration decided to select two candidates for face to
face interview out of the selected three candidates during telephone interview.
Again Santa Singh and Banta Singh suggested the following possibilities
( b ) Construct the first order and second order inclusion probabilities for the second
phase face to face interview. Find difficulties in Banta Singh's sampling scheme.
( c ) Estimate the total score from each one of the second phase samples for the given
first phase sample (except $s_3'$ for Banta Singh's scheme).
( d ) Find the bias and variance of Santa Singh's and Banta Singh's selection schemes
by using the definitions of bias and variance.
( e ) Discuss the relative efficiency of Banta Singh's sampling scheme over
Santa Singh's sampling scheme and comment.
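The first and second order inclusion probabilities asked for in part ( a ) follow directly from a sampling design by summing $p(s)$ over the samples that contain a unit or a pair of units. The sketch below illustrates this for the first phase plans of Practical 12.4, reading the plan as probabilities 1/4 for every sample under Santa Singh and 1/3, 1/3, 0, 1/3 under Banta Singh. The function name `inclusion_probabilities` is a hypothetical helper introduced here, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations


def inclusion_probabilities(design):
    """design: list of (sample, probability) pairs.

    Returns (pi1, pi2): first order inclusion probabilities keyed by unit and
    second order inclusion probabilities keyed by an alphabetically sorted pair.
    """
    units = sorted({u for sample, _ in design for u in sample})
    pi1 = {u: Fraction(0) for u in units}
    pi2 = {pair: Fraction(0) for pair in combinations(units, 2)}
    for sample, p in design:
        for u in sample:
            pi1[u] += p                      # unit u is in this sample
        for pair in combinations(sorted(sample), 2):
            pi2[pair] += p                   # both units are in this sample
    return pi1, pi2


samples = [
    ("Anokha", "Banto", "Channa"),
    ("Anokha", "Banto", "Didar"),
    ("Anokha", "Channa", "Didar"),
    ("Banto", "Channa", "Didar"),
]
# Assumed reading of the first phase plans (see the lead-in).
santa = list(zip(samples, [Fraction(1, 4)] * 4))
banta = list(zip(samples, [Fraction(1, 3), Fraction(1, 3), Fraction(0), Fraction(1, 3)]))

santa_pi1, santa_pi2 = inclusion_probabilities(santa)
banta_pi1, banta_pi2 = inclusion_probabilities(banta)

for name in sorted(santa_pi1):
    print(name, santa_pi1[name], banta_pi1[name])
```

Exact fractions are used so that the probabilities can be compared with hand calculations without rounding.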
( a ) Estimate the average number of cattle per farm and derive 95% confidence
interval estimate using two-phase ratio estimator.
( b ) Estimate the average number of cattle per farm and derive 95% confidence
interval estimate using two-phase regression estimator .
( c) Comment on the confidence interval estimates obtained .
Practical 12.6. John and Michael were appointed to select two candidates from a
list of five candidates $\Omega$ = {Amy, Bob, Chris, Don, Eric} with their scores 125, 126,
128, 90 and 127, respectively. In the first phase, the administration suggested to
John and Michael to select three candidates for telephone interview, and in the
second phase they decided to select two candidates for face to face interview. Both
John and Michael suggested the following first-phase sampling plans for telephone
interview:
First phase sample (John and Michael)   Probability
$s_1'$ = {Amy, Bob, Chris}              $p(s_1') = 1/4$
$s_2'$ = {Amy, Chris, Don}              $p(s_2') = 1/4$
$s_3'$ = {Amy, Don, Eric}               $p(s_3') = 1/4$
( a ) Construct the first order and second order inclusion probabilities for the first
phase telephone interview .
In the second phase the administration decided to select two candidates for face to
face interview out of the selected three candidates during telephone interview. John
likes every one whereas Michael likes Amy, so they suggested the following
possibilities:
( b ) Construct the first order and second order inclusion probabilities for the second
phase face to face interview for both sampling schemes.
( c ) Estimate the total score from each one of the second phase sample for the given
first phase sample for both sampling schemes.
( d ) Find the bias and variance of John and Michael's selection schemes by using
the definitions of bias and variance.
( e ) Discuss the relative efficiency of Michael's sampling scheme over John's
sampling scheme and comment.
Practical 12.7. Select an SRSWOR sample of twenty units from the population 1
given in the Appendix. Record the values of the real estate farm loans for the states
selected in the sample. Assume 5% random non-response in the selected sample.
Impute the missing values four times with the help of hot deck method of
imputation. Apply the concept of multiple imputation for estimating population
mean and construct the 95% confidence interval.
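The steps of Practical 12.7 can be sketched as follows. This is only an illustration under stated assumptions: the twenty sample values are hypothetical (they are not taken from population 1), the population size `N = 50` is assumed, a simple random hot deck is used as the donor rule, and the interval uses the normal multiplier 1.96 in place of the t reference distribution usually attached to Rubin's combining rules. A simple random hot deck is not a "proper" multiple imputation method, so the resulting interval may be somewhat too narrow.

```python
import random
import statistics


def hot_deck_impute(values, rng):
    """Replace each None by a value drawn at random from the respondents."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]


def rubin_combine(estimates, variances):
    """Combine m completed-data estimates: mean, and W + (1 + 1/m) * B."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    return q_bar, within + (1 + 1 / m) * between


rng = random.Random(2024)
# Hypothetical sample of n = 20 values; None marks the one non-respondent (5%).
sample = [12.5, 40.1, 7.3, 88.0, 23.4, 15.2, None, 61.7, 9.9, 34.0,
          27.8, 54.3, 18.6, 72.1, 5.4, 46.9, 31.2, 20.0, 66.5, 11.1]
N = 50           # assumed population size
n = len(sample)

estimates, variances = [], []
for _ in range(4):                       # four imputations, as in the practical
    completed = hot_deck_impute(sample, rng)
    estimates.append(statistics.mean(completed))
    # SRSWOR variance estimator of the sample mean for the completed data
    variances.append((1 / n - 1 / N) * statistics.variance(completed))

mi_mean, mi_var = rubin_combine(estimates, variances)
half_width = 1.96 * mi_var ** 0.5
ci = (mi_mean - half_width, mi_mean + half_width)
print(mi_mean, ci)
```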
Practical 12.8. Select an SRSWOR sample of twenty five units from the
population 1 given in the Appendix. Record the values of the real and nonreal estate
farm loans for the states selected in the sample. Assume 5% random non-response
in the selected sample. Impute the missing values with the following two methods:
Practical 12.9. Professor Forgetful (e.g., refer to the film 'The Nutty Professor'
directed by Jerry Lewis) believes that the percentage of marks of students in an
examination depends upon the number of classes attended by them, and the number
of marks in the assignments. Professor Forgetful misplaced mid-term exams of 4
students, but has information about the number of classes attended and marks in the
assignments.
( c ) Assuming the information about the number of classes attended by the students
is known, impute the missing marks with the ratio method of imputation and again
find the average marks in the class.
( d) Impute the missing marks with the following method of imputation
Y.i=j~ =:lYi r
r-l (r - ltl {(n - l}xi - nXn }i~ ~;
if i
if iEA,
E A,
where $y_i$ and $x_i$ denote, respectively, the marks and number of classes attended by
the $i$-th student, $A$ and $\bar{A}$ denote the responding and non-responding sets of the
sample $s$ such that $s = A \cup \bar{A}$. Again find the average marks in the class after
imputation.
( e ) Repeat (c) and ( d ) using known marks in the assignments .
( f) Suggest a new method of imputation to use both of the variables viz., the
number of classes attended and marks in the assignments, to impute the missing
marks in the exam.
( g ) Give your views on the methods of imputation used by Professor Forgetful,
and your suggestion in ( f).
13. MISCELLANEOUS TOPICS
13.0. INTRODUCTION
The main purpose of this chapter is to keep this book open to new topics that have
emerged in recent years or that have not been touched upon by the author in the present
version of the book. In this chapter we shall introduce a few miscellaneous topics,
namely:
The statistical methods used for estimation of measurement errors for the tools
individually are studied below .
Grubbs (1948) was the first to suggest a sampling theory methodology for
estimating the variance of measurement error of any number of tools separately.
For the sample units all variables, including the variables used in the model, will be
denoted by lower case letters. Assuming the n units are drawn randomly using
equal probability with replacement sampling from a population of N units, the
usual unbiased estimators of the variances and covariance of the measurements of
Tool 1 and Tool 2 are
where $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$ and $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$ are the respective sample means.
Theorem 13.1.1.1. The variance of the measurement error of Tool 1 is estimated by
$$\hat{\sigma}_U^{2} = s_y^{2} - s_{yx}.$$
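Proof. If $E$ is the expected value over the units then
$$E(s_y^2) = E\left[\frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i^2 - n\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)^{2}\right\}\right] = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \frac{1}{n(n-1)}\sum_{i \ne j = 1}^{n} y_i y_j\right].$$
Under model (13.1.1.1) the expected value of $s_y^2$ becomes
$$E(s_y^2) = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(\psi_i^2 + u_i^2 + 2\psi_i u_i\right) - \frac{1}{n(n-1)}\sum_{i \ne j = 1}^{n}(\psi_i + u_i)(\psi_j + u_j)\right]$$
$$= \frac{1}{N}\sum_{i=1}^{N}\Psi_i^2 + \frac{1}{N}\sum_{i=1}^{N}U_i^2 - \frac{n(n-1)}{n(n-1)}\,\bar{\Psi}\,\bar{\Psi} = \sigma_\Psi^2 + \sigma_U^2$$
because $\bar{U} = 0$ and the measurement errors are taken to be uncorrelated with the true values, so that the cross-product term vanishes.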
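Similarly
$$E(s_{yx}) = E\left[\frac{1}{n-1}\left\{\sum_{i=1}^{n} y_i x_i - n\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)\right\}\right] = E\left[\frac{1}{n}\sum_{i=1}^{n} y_i x_i - \frac{1}{n(n-1)}\sum_{i \ne j = 1}^{n} y_i x_j\right]$$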
$$\hat{\sigma}_V^{2} = s_x^{2} - s_{yx}.$$
Proof. The proof is similar to the one given above for Theorem 13.1.1.1.
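As a small numerical illustration of the two Grubbs estimators $s_y^2 - s_{yx}$ and $s_x^2 - s_{yx}$, the following sketch simulates paired measurements under the additive model (true value plus an independent error for each tool). The simulated true values and error standard deviations are hypothetical choices, and `grubbs_estimates` is a helper name introduced here only for illustration.

```python
import random
import statistics


def grubbs_estimates(y, x):
    """Return (s_y^2 - s_yx, s_x^2 - s_yx) for paired measurements y and x."""
    n = len(y)
    y_bar, x_bar = statistics.mean(y), statistics.mean(x)
    s_yx = sum((a - y_bar) * (b - x_bar) for a, b in zip(y, x)) / (n - 1)
    return statistics.variance(y) - s_yx, statistics.variance(x) - s_yx


rng = random.Random(1)
# Hypothetical data: 30 true values, Tool 1 error sd 10, Tool 2 error sd 2.
true_values = [rng.gauss(40, 15) for _ in range(30)]
tool1 = [psi + rng.gauss(0, 10) for psi in true_values]
tool2 = [psi + rng.gauss(0, 2) for psi in true_values]
var_u_hat, var_v_hat = grubbs_estimates(tool1, tool2)
print(var_u_hat, var_v_hat)

# A tiny data set that can be checked by hand: s_y^2 = 4, s_x^2 = 1, s_yx = 2.
small_u, small_v = grubbs_estimates([2, 4, 6], [1, 2, 3])
print(small_u, small_v)
```

The hand-checkable case gives a negative value for the second estimator, which shows that these moment estimators are not guaranteed to be non-negative in a given sample.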
Since the variances of measurement errors of tools are estimated from the
measurements of sample units, $\hat{\sigma}_U^{2}$ and $\hat{\sigma}_V^{2}$ may differ from sample to sample.
Under certain conditions of normality
$$V\left(\hat{\sigma}_U^{2}\right) \to \frac{2\sigma_U^{4}}{n-1} + \frac{\sigma_\Psi^{2}\sigma_U^{2} + \sigma_\Psi^{2}\sigma_V^{2} + \sigma_U^{2}\sigma_V^{2}}{n-1}.$$
To determine $V\left(\hat{\sigma}_V^{2}\right)$ the $\sigma_U^{4}$ in the first term of the right hand side is replaced by
$\sigma_V^{4}$. For practical purposes the unknown parameters in the above expression are
replaced by their respective sample estimates.
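The plug-in step can be sketched as below, taking the large-sample expression to be $2\sigma_U^4/(n-1) + (\sigma_\Psi^2\sigma_U^2 + \sigma_\Psi^2\sigma_V^2 + \sigma_U^2\sigma_V^2)/(n-1)$. The numerical inputs are hypothetical values standing in for sample estimates, and `grubbs_variance` is an illustrative helper name.

```python
def grubbs_variance(var_own, var_other, var_true, n):
    """Large-sample variance of a Grubbs variance estimator.

    var_own   : error variance of the tool being assessed
    var_other : error variance of the other tool
    var_true  : variance of the true values
    """
    cross = var_true * var_own + var_true * var_other + var_own * var_other
    return (2 * var_own ** 2 + cross) / (n - 1)


# Hypothetical plug-in values: error variances 100 and 4, true-value variance 225.
n = 30
v_u = grubbs_variance(100.0, 4.0, 225.0, n)   # for Tool 1
v_v = grubbs_variance(4.0, 100.0, 225.0, n)   # for Tool 2: own and other swapped
print(v_u ** 0.5, v_v ** 0.5)                 # standard errors of the two estimators
```

Swapping the first two arguments implements the remark that only the first term changes when moving from Tool 1 to Tool 2, since the cross-product term is symmetric in the two error variances.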
The relationship between total scatter, represented by the root mean square
differential/error, and the estimates of variance of measurement error for individual
tool, including bias if present, is illustrated in Morrison, Mangat, Carroll, and
Riznic (2003).
14 35.80
15 38.20 57.70 42.30 ~ 30 30.30 36.10 36.80
These values yield estimates of the standard deviations of measurement errors as:
Grubbs' model is based on a single measurement made per unit by each instrument.
Consider the case where it is possible to measure the characteristic of a unit more
than once by each tool. Bhatia, Mangat, and Morrison (1998) presented estimators
for this situation using varying probability and equal probability with and without
replacement sampling. Here the results are presented for equal probability with
replacement sampling only.
Let each unit be measured repeatedly using Tool 1 and Tool 2. Using $t$ as a
subscript for the $t$-th measurement and $i$ for the $i$-th unit, the model is written as
$$Y_{it} = \Psi_i + U_{it} \quad \text{and} \quad X_{it} = \Psi_i + V_{it}, \quad (i = 1,2,\ldots,N) \qquad (13.1.2.1)$$
where
$Y_{it}$ = observation, subject to measurement error, recorded by Tool 1,
$X_{it}$ = observation, subject to measurement error, recorded by Tool 2,
$\Psi_i$ = true value for the $i$-th unit,
where $\bar{\Psi}$ is the mean of the true values for the population units, $\bar{U}$ is the mean
over measurements and units in the population, $\overline{U_i^2}$ represents the mean of squares
of the measurement errors over the infinitely large number of repeated measurements for the $i$-th
population unit, and $\sigma_U^2$ is the variance of the measurement errors.
2" 1'1 2 2 1 n 2 1 n_
ui = - LUiI , Sya=-L Yi- -LYi
Ij t~l n i~l n i~l
_ ()2 2 1 n-2 1 n_
, Sy=-- LYi-- L Yi
n -1 i~l n i~l
[ ()2] ,
Expressions for s;a' sff, and s; are obtained by substituting u for y in the
.
expressions f or Sya'
2 2 d 2
Sji , an SY '
Case I. $r_1 \ge 2$ and $r_2$ is sufficiently large so that the average of the $r_2$ measurements could be taken as
the true value for the $i$-th unit.
Case II. $r_1 \ge 2$ and $r_2 \ge 1$ but not so large that the average of the $r_2$ measurements could be taken as
the true value for the $i$-th unit.
Case III. $r_1 = 1$ and $r_2$ is sufficiently large so that the average of the $r_2$ measurements could be taken
as the true value for the $i$-th unit.
Case IV. $r_1 = 1$ and $r_2 \ge 2$ but not so large that the average of the $r_2$ measurements could be treated as
the true value for the $i$-th unit.
Case V. $r_1 = 1$ and $r_2 = 1$.
Note that Case V is equal to the Grubbs estimators for two inspection tools.
For equal probability with replacement sampling the estimators of the variance of
measurement error $\sigma_U^2$ of Tool 1 are given by
( Case I )
A 2
au
2 2
= Sya + -
Sji
-Sji x
( Case II )
n
( Case III )
(Case IV)
(Case V)
Theorem 13.1.2.1. For equal probability with replacement sampling the Case II
estimator of variance of the measurement error of Tool 1 is given by
A 2
2 2 Sji
au = Sya + - -Sjix '
n
Proof. Let $E_2$ be the conditional expectation over measurements for a given $i$-th unit
E
[
2 Sji
Sya +~ 2] = E1E2 Sya +~[ 2
2]
Sji
+ - -1
I Z n n_ n I n_
1
[
= E1E2 - LY; - - LY;
n ;=1 n ;=1 ( 2 I
J n(n -I) LX; - - (J2)]
n
LX;
-2
i=1 i=1
I 1
= E1E 2[ - LYi - - (
nz
I) L YiYj .
n __ ]
n ;=1 n n - r~ j=l
Now
E2[ yf ] = E2(..!-
'i
IY~ J= E2[..!-'i I ('IIi +Ui/ )2] ='Ill +ul + 2'11iUi
1=1 1=1
E1[s;a + s~]
n
= E1['!' t('IIl +ul +2'11iUiJ-_1- t ('II; +u;X'IIj +uj)l
n;=1 n(n-1);..j=1 J
=-I'Pi
I N 2 1 N 2 2 N n(n-1)(-
+-IU; +-I'PP;--(--) 'P+UA'P+U
-v.,- -)
N ;=1 N ;=1 N i=1 n n-1
=a~ +a&, because Cov('P;, U i )=o.
Now
[ n - 1 ;=1
1n 2
1 I'll;
= E1 -1- I 'IIi - -
n ;=1
(n J2) +--
1 {nI'II;u; - -
n- 1
n Iu;
1 I'll;
;=1
n }]
n ;=1 ;=1
E
[
2
Sya "t
Sy2
:» ] = a'¥2 +au2 -a,¥2 = au2
which proves the theorem .
Example 13.1.2.1. Thirty true values were selected at random from a normal
distribution of mean 40 and standard deviation 15. To these values simulated
measurement errors of standard deviations 10 and 2 were added three times for
Tool 1 and three times for Tool 2 respectively to generate three measurements for
each tool. The data is given in Table 13.1.2.1. Using this data estimate the standard
deviations of measurement error of Tool 1 and Tool 2 separately assuming the
average of neither of the two tools is the true value of the unit.
Table 13.1.2.1. Hypothetical replicated data for two tools.
Sr. No.   True Depth   Tool 1 r1   Tool 1 r2   Tool 1 r3   Tool 2 r1   Tool 2 r2   Tool 2 r3
13.2. RAKING RATIO USING CONTINGENCY TABLES
The use of the raking ratio estimator in survey sampling is quite old; it was first
introduced by Deming and Stephan (1940). This procedure uses an iterative method
of adjusting a two-way contingency table so that the row and column sums add to
certain preassigned values. The concept is basically similar to the calibration
approach. Let $\Omega$ be a population of units cross classified into an $R \times C$ table, and
let $\Omega_{rc}$ be the set of the $N_{rc}$ units in the $(r,c)$-th cell. We draw a sample $s$, with
$s_{rc} = s \cap \Omega_{rc}$, and let $d_i$ be the survey weight attached to the $i$-th unit. Assuming that
the variable of interest Y, taking the value y; for the /h unit and LW;y; is a
ies
c = 1,2,...,C. The raking ratio method adjusts the weight of
each observation so that the resulting estimators of the auxiliary totals $X_{r\cdot}$
and $X_{\cdot c}$ ($r = 1,2,\ldots,R$; $c = 1,2,\ldots,C$) correspond to their population values. For
a better understanding, let us consider the following $R \times C$ contingency
table.
-Rows [ i .. ..
'i coli.l:riIDs .... Totals
.1 2 C
1 1] I , 1]~ ,X II' X;I Y12 , YI*2 ' X 12 , X;2 }) C 'YI~ ,XIC 'X;C Yl . , XI.
2 Y21 ,Y;I ,X21>X;1 Y22,Y;2,X22,X;2 Yzc, Yz*c ,Xzc ,X;c Yz. , X z.
'LAYi if 1= 0,
*
iESrc
Formulas for the asymptotic variance of the raking ratio estimator are given by
Brackstone and Rao (1979) for up to four iterations. Following them we have the
following relations.
If 1=0, E(Yr~»)=E[.~diYil=y,
lE SrC J
if t = 2, E(r,(2)\
rc;= E[r,(I)[--&]]
rc • (I) '" Y .
X oc
In general, if t is odd then
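Now using the result that the bias in the usual ratio estimator
$$\bar{y}_R = \bar{y}\left(\frac{\bar{X}}{\bar{x}}\right)$$
is given by
$$B(\bar{y}_R) = \frac{1}{\bar{X}}\left[R\,V(\bar{x}) - \mathrm{Cov}(\bar{y}, \bar{x})\right]. \qquad (13.2.4)$$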
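The first order bias expression for the usual ratio estimator, $[R\,V(\bar{x}) - \mathrm{Cov}(\bar{y},\bar{x})]/\bar{X}$ with $R = \bar{Y}/\bar{X}$, can be compared with the exact bias by enumerating every SRSWOR sample of a small population. The sketch below does this for a hypothetical population of five units; the data and the helper name `ratio_bias_check` are assumptions made only for illustration.

```python
from itertools import combinations
from statistics import mean


def ratio_bias_check(y, x, n):
    """Return (exact bias, first order approximation) of the ratio estimator
    of the population mean under SRSWOR, by full enumeration of samples."""
    N = len(y)
    Y_bar, X_bar = mean(y), mean(x)
    R = Y_bar / X_bar
    samples = list(combinations(range(N), n))
    y_means = [mean(y[i] for i in s) for s in samples]
    x_means = [mean(x[i] for i in s) for s in samples]
    ratio_ests = [ym * X_bar / xm for ym, xm in zip(y_means, x_means)]
    exact_bias = mean(ratio_ests) - Y_bar
    # Design variance and covariance of the sample means over all samples
    var_x = mean((xm - X_bar) ** 2 for xm in x_means)
    cov_yx = mean((ym - Y_bar) * (xm - X_bar) for ym, xm in zip(y_means, x_means))
    approx_bias = (R * var_x - cov_yx) / X_bar
    return exact_bias, approx_bias


# Hypothetical population of N = 5 units, samples of size n = 2.
x = [2, 4, 6, 8, 10]
y = [3, 5, 8, 12, 14]
print(ratio_bias_check(y, x, 2))

# When y is exactly proportional to x the ratio estimator has no bias,
# and the approximation also vanishes.
exact_prop, approx_prop = ratio_bias_check([3 * v for v in x], x, 2)
print(exact_prop, approx_prop)
```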
x
Thus we have the following theorem:
Theorem 13.2.1. The bias in the raking ratio estimator, to the first order of
approximation, is given by
~ ~[y.(t-1)V(X(t-I))_ Cov(X(t-l) r,(t-I))~~ if t is odd
j
c:y\t- IJ r· r· r· ' r · ,
r=IXro
B= (13.2.5)
~ ~[y.(t-l)V(X(t-I
s: y\I - IJ .c .c
))_ Cov(X(t-l) y.(t-I))~~
' .C
if t is even
.
c=IXoc
. C
r
Similarly using the result that the variance of the usual ratio estimator YR IS
~[V(r,(t-I))+(
LJ r.
Yr~-I) )V(X(t-I))_(
~ r.
Yr~-I) )cov(X(t-l) r,(t-I))~
r . ' r.
~
if t is odd ,
r= Xro Xro
V= (13.2.7)
~[V(Y.(t-I)~
z: y'~-I) )V(x(t-1)\I 2(~
~
. C
Yo~-I) )cov(X(t-l)
.C .C '
Y.(t-I))~ . C
if t is even
.
c= X oc X oc
Deming and Stephan (1940) were the first to use the raking ratio method of
estimation for estimating the cell probabilities $\pi_{rc}$ in the $R \times C$ contingency table for
which the marginal probabilities $\pi_{r\cdot}$ and $\pi_{\cdot c}$ are known. Later on various iterative
procedures have been developed by several researchers including Smith (1947),
Friedlander (1961), Ireland and Kullback (1968), Fienberg (1970) and Causey
(1972) to find the solutions to this problem. The method originally proposed by
Deming and Stephan (1940) is called the Iterative Proportional Fitting Procedure
(IPFP) and it minimises the modified distance function
$$\chi^2 = \sum_{r=1}^{R}\sum_{c=1}^{C}\frac{(n_{rc} - n\pi_{rc})^2}{n_{rc}}, \qquad (13.2.8)$$
where $n_{rc} > 0$ denotes the sample size in the $(r,c)$-th cell and
$$n = \sum_{r=1}^{R}\sum_{c=1}^{C} n_{rc}.$$
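The iterative proportional fitting idea can be sketched in a few lines: rows are scaled to their known totals, then columns, and the two steps alternate until the margins agree. The starting table and the target margins below are hypothetical, and the function name `rake`, the tolerance and the iteration cap are illustrative choices rather than part of the procedure as published.

```python
def rake(table, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Alternately scale rows and columns of a positive two-way table
    until both sets of margins match the targets."""
    t = [row[:] for row in table]            # work on a copy
    for _ in range(max_iter):
        # Row step: force each row sum to its target
        for r, target in enumerate(row_targets):
            row_sum = sum(t[r])
            t[r] = [v * target / row_sum for v in t[r]]
        # Column step: force each column sum to its target
        for c, target in enumerate(col_targets):
            col_sum = sum(row[c] for row in t)
            for row in t:
                row[c] *= target / col_sum
        # Columns are exact after the column step; check the rows
        row_err = max(abs(sum(t[r]) - row_targets[r]) for r in range(len(t)))
        if row_err < tol:
            break
    return t


# Hypothetical weighted cell estimates and known margins (both sum to 700).
start = [[100.0, 150.0], [200.0, 250.0]]
fitted = rake(start, row_targets=[300.0, 400.0], col_targets=[350.0, 350.0])
for row in fitted:
    print([round(v, 3) for v in row])
```

The target margins must share the same grand total, otherwise the row and column steps cannot both be satisfied.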
Ireland and Kullback (1968) proved that the IPFP minimises a discrimination
function defined as:
Konijn (1981) has derived biases, variances and covariances for the estimators of
the cell and marginal totals and of the corresponding marginal averages in the
$R \times C$ contingency table. We now explain the raking ratio method with
the help of a numerical example given by Binder and Theberge (1988).
$$n_{21}^{*} = 20, \qquad n_{22}^{*} = 25.$$
Use $w_i = \frac{700}{80} = 8.75$ and take $y_i = x_i = 1$.
For $t = 0$ we have
$$\hat{Y}_{rc}^{(0)} = \sum_{i \in s_{rc}^{*}} w_i y_i = \sum_{i \in s_{rc}^{*}} 8.75 = 8.75 \times n_{rc}^{*} = \hat{N}_{rc}^{(0)} \ \text{(say)}. \qquad (13.2.10)$$
Evidently
$$\frac{X_{1\cdot}}{\hat{X}_{1\cdot}^{(0)}} = \frac{300}{218.75} = 1.371429$$
and
$$\frac{X_{2\cdot}}{\hat{X}_{2\cdot}^{(0)}} = \frac{400}{393.75} = 1.015873.$$
Thus we have the following table
The literature on the survey sampling of finite populations mainly deals with
populations that consist of sets of discrete units. Here we shall discuss
populations that may be considered as one continuous entity or a continuum of
points. Examples of natural continuous populations are air, soil temperature over
space and time, pollutant levels in a volume of material such as a river, percent
chemical contents along a strip of soil, inches of rainfall over a region, and
commodity prices over time. Thus a continuous population exists within a
support medium such as time or space. Let the function $y_a$ assign the value of a
characteristic of interest $y$ to point $a$ of the support region. In geo-statistics this
function is called a regionalized variable. In survey sampling, the entire set of
labelled pairs $y_P = \{(a, y_a)\}$ that is the subject of inference after sampling is called
the population parameter and any real function of $y_P$ is called a parametric function.
If $P$ denotes the support region, then we are interested in estimating the total value
of the $y$ characteristic defined as
$$Y = \int_{P} y_a \, da. \qquad (13.3.1)$$
In the fixed population sampling strategies, the parameter $\{(a, y_a)\}$ is regarded as
fixed but unknown. In the superpopulation model approach, the parameter
$\{(a, y_a)\}$ is treated as a realisation of a random vector or function $\{(a, Y_a)\}$ whose
stochastic distribution $\xi$ is specified or partially specified. Let $E_\xi$ denote the
expected value with respect to the distribution $\xi$. In the fixed population approach the
inference is based on a sample $s$ of $n$ units drawn with probability $p(s)$, called the
sampling design, from the population. Let $E$ denote the expected value with
respect to the sampling design.
E.;[O-Q(Yp)f·
The superpopulation value Ya can be partitioned into two components, viz.,
( a ) The first is a trend component, the values of which usually depend on the
location a of the population unit in the support region, defined as
p
f(a)= E.;(Ya) = l; cd k(a), a E P;
k=O
and cO,c\,...,c p are P + 1independent known functions ;
Then we have the following assumptions. The error structure $Z(t)$, $t \in P$, has
( a ) mean zero,
( b ) finite variance, $\sigma_z^2 = V_\xi\{Z(t)\}$,
( c ) the covariance, $v_z(t_i, t_j) = \mathrm{Cov}_\xi\{Z(t_i), Z(t_j)\}$,
( d ) second order stationarity, so that the covariance in one dimension can be written as a function of a
single variable $h = t_j - t_i$, that is, $v_z(h) = \mathrm{Cov}_\xi\{Z(t), Z(t + h)\}$,
and
( e ) linear operations of integration and expectation can be interchanged.
Then, following Bartlett (1986), our objective is to estimate the population total $Y$
as
$$t_c = \int_{0}^{T} y(t)\,dt \qquad (13.3.3)$$
over the support region $P = \{t : 0 \le t \le T\}$. Thus $t_c$ can be regarded as a realisation
of
$$T_c = \int_{0}^{T} Y(t)\,dt \qquad (13.3.4)$$
under the super-population model (13.3.2). Since we have only a finite number of
observations over the support region $P$, our sample will consist of the set
$s = \{t_i : i = 1,2,\ldots,n\}$.
Thus if $y_{t_1}, y_{t_2}, \ldots, y_{t_n}$ are the observed values then we will choose our estimator to
be a linear combination
$$\hat{t}_c = \sum_{i=1}^{n} w_{t_i}\, y_{t_i}, \qquad (13.3.5)$$
which is a realisation of
$$\hat{T}_c = \sum_{i=1}^{n} w_{t_i}\, Y_{t_i}, \qquad (13.3.6)$$
where $w_{t_i}$ are the weights depending upon the population point $t_i$ selected.
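A linear estimator of this kind is easy to illustrate numerically. The sketch below uses equally spaced sample points at the midpoints of $n$ equal sub-intervals of $[0, T]$ with equal weights $T/n$. This particular choice of points and weights is an assumption made for illustration only and is not derived from any of the optimality criteria; the function names are hypothetical.

```python
import math


def linear_total_estimate(y, times, weights):
    """Weighted sum of the observed values y(t_i): sum_i w_i * y(t_i)."""
    return sum(w * y(t) for t, w in zip(times, weights))


def midpoint_design(T, n):
    """Sample points at the midpoints of n equal sub-intervals of [0, T],
    each carrying the weight T / n (an illustrative choice)."""
    step = T / n
    times = [(i + 0.5) * step for i in range(n)]
    weights = [step] * n
    return times, weights


T, n = 10.0, 8
times, weights = midpoint_design(T, n)

# For a linear trend y(t) = t the true total is T^2 / 2 = 50.
est_linear = linear_total_estimate(lambda t: t, times, weights)
# For a wavy surface the estimate is only approximate.
est_sine = linear_total_estimate(lambda t: 2 + math.sin(t), times, weights)
print(est_linear, est_sine)
```

With these weights the estimator reproduces the total of any linear trend exactly, which is one simple way of seeing what "minimum model bias for a given class of trend functions" can mean in practice.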
Bartlett (1986) has discussed the following four criteria to find these weights:
( a ) Determine both the weights and the sample locations such that the estimator
(13.3.5) is model unbiased and has minimum predictive mean square error;
( b ) Determine the weights for which, for any sample s actually obtained, the estimator
(13.3.5) is model unbiased and has minimum predictive mean square error;
( c ) Determine, for preset weights, the sample s for which the estimator is model
unbiased and has minimum predictive mean square error;
( d ) Determine either the weights or the sample or both such that the estimator has
minimum model bias for a given class of trend functions.
p(t) =_1 It
nj.J i=1
i,
Padmawar (1996) compared the following four estimators of the population total of
a continuous distribution, defined as:
~y(t/T!~ti) ;
1=1 \ 1=1
I ~ti )f(ti)/;r(ti),
i=1
where
n
;r(t;) = L qj(ti), ;r(ti) is assumed to be positive for t > 0, and qi(ti) = fq(t) fI dtj ,
j=l j*i=1
1 :5 i :5 n;
Following Bogue (1950), the VRM method makes use of only birth and death data.
These data are used as symptomatic variables rather than as components of
population change. In the first step, the number of births $B_t$ and deaths $D_t$ in a
given $t$-th year are determined for a local area. Let $\beta_{10}$ denote the crude birth rate for
the local area in the latest census year ($t = 0$), $B_{1t}$ denote the crude birth rate in the
current year for a larger area containing the local area, and $B_{10}$ denote the crude
birth rate in the census year for a larger area containing the local area. Then an
estimator of the crude birth rate in the current year is given by
$$\hat{\beta}_{1t} = \beta_{10}\,(B_{1t}/B_{10}). \qquad (13.4.2.1)$$
Similarly, let $\delta_{10}$ denote the crude death rate for the local area in the latest census
year ($t = 0$), $D_{1t}$ denote the crude death rate in the current year for a larger area
containing the local area, and $D_{10}$ denote the crude death rate in the census year
for a larger area containing the local area. Then an estimator of the crude death rate
in the current year is given by
$$\hat{\delta}_{1t} = \delta_{10}\,(D_{1t}/D_{10}). \qquad (13.4.2.2)$$
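The two rate updates can be sketched as below. The final step, which converts the updated rates into a population figure by averaging births divided by the birth rate and deaths divided by the death rate, is a commonly quoted form of the vital rates method and is taken here as an assumption. All numerical inputs are hypothetical, and the function and argument names are introduced only for this illustration.

```python
def vital_rates_estimate(births, deaths, local_b0, local_d0,
                         large_bt, large_b0, large_dt, large_d0):
    """Update the local census-year rates by the larger-area rate ratios and
    (assumed final step) average the two implied population figures."""
    birth_rate_t = local_b0 * (large_bt / large_b0)   # updated local birth rate
    death_rate_t = local_d0 * (large_dt / large_d0)   # updated local death rate
    pop_from_births = births / birth_rate_t
    pop_from_deaths = deaths / death_rate_t
    return birth_rate_t, death_rate_t, 0.5 * (pop_from_births + pop_from_deaths)


# Hypothetical local area: 900 births and 520 deaths in the current year.
b_hat, d_hat, p_hat = vital_rates_estimate(
    births=900, deaths=520,
    local_b0=0.020, local_d0=0.010,      # local crude rates in the census year
    large_bt=0.0162, large_b0=0.018,     # larger-area birth rates, current and census year
    large_dt=0.008, large_d0=0.008,      # larger-area death rates, current and census year
)
print(b_hat, d_hat, p_hat)
```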
Then an estimator of total population $P_t$ for the local area in year $t$ is given by
This method takes into account an important factor, migration, while estimating
total population in a local area. If $M_t$ denotes the net migration in the local area
since the last census, then an estimator of total population $P_t$ is given by
$$\hat{P}_t = P_0 + B_t - D_t + M_t, \qquad (13.4.3.1)$$
where $P_0$ denotes the population count of the local area in the census year $t = 0$.
Migration can be estimated in several ways. For example, the net migration can be
subdivided into civilian and military migration. Evidently the military migration can
be taken from the administrative records and civilian migration can be taken from
school enrolments. If you estimate the net migration from the records for
individuals, as opposed to collective units like schools, then such a method is called
the administrative record method, and can be used for producing local area estimates.
Example 13.4.3.1. In a university let the total number of students be 15000 during 1999. During 2000 there is a recruitment of 1500 students according to the schedule, and later on 50 students left the university and stopped their studies. There was a net migration of +10 students (say, 20 students migrated to other universities and 30 students came from other universities during the academic year 2000). Apply the Census Component Method (CCM) to estimate the total number of students in the university during 2000.
Solution. We have
$\hat{P}_1 = P_0 + B_1 - D_1 + M_1 = 15000 + 1500 - 50 + 10 = 16460$.
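The arithmetic of the census component method in (13.4.3.1) can be sketched in a few lines of Python. The function name and argument names below are illustrative choices, not notation from the text; the inputs are the figures of Example 13.4.3.1.

```python
def census_component_estimate(p0, births, deaths, net_migration):
    """Return P_0 + B_t - D_t + M_t, the estimator in (13.4.3.1)."""
    return p0 + births - deaths + net_migration


# Example 13.4.3.1: 15000 students in 1999, 1500 recruited, 50 left,
# 30 arrivals and 20 departures give a net migration of +10.
arrivals, departures = 30, 20
estimate_2000 = census_component_estimate(15000, 1500, 50, arrivals - departures)
print(estimate_2000)
```

Here recruitment plays the role of births and students who stopped their studies play the role of deaths, as in the worked example.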
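$\hat{Y}_i^{S} = \sum_{j=1}^{K} (X_{ij}/X_{0j})\,\hat{Y}_{0j}'$, (13.4.5.1)
where $\hat{Y}_{0j}'$ may be a ratio estimator or difference type estimator or any other direct estimator. As shown in Table 13.4.5.1, the direct estimator $\hat{Y}_{0j}'$ as a ratio estimator has been discussed by Ghosh and Rao (1994) and a difference estimator has been discussed by Singh, Stukel, and Pfeffermann (1998). It is further investigated by Gershunskaya, Eltinge, and Huff (2002).
Table 13.4.5.1. Direct estimators for the large domains. For domain $j$ the table lists the values $Y_{j1}, Y_{j2}, \dots, Y_{jN_j}$, $X_{j1}, X_{j2}, \dots, X_{jN_j}$ and $N_{j1}, N_{j2}, \dots, N_{jN_j}$.
Domain   Ratio estimator                                         Difference estimator
1        $\hat{Y}_{01}' = \hat{Y}_{01} X_{01}/\hat{X}_{01}$      $\hat{Y}_{01}' = \hat{Y}_{01} + \hat{\beta}(X_{01} - \hat{X}_{01})$
2        $\hat{Y}_{02}' = \hat{Y}_{02} X_{02}/\hat{X}_{02}$      $\hat{Y}_{02}' = \hat{Y}_{02} + \hat{\beta}(X_{02} - \hat{X}_{02})$
...
K        $\hat{Y}_{0K}' = \hat{Y}_{0K} X_{0K}/\hat{X}_{0K}$      $\hat{Y}_{0K}' = \hat{Y}_{0K} + \hat{\beta}(X_{0K} - \hat{X}_{0K})$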
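Proof. We know that the true total for the $i$-th small area is given by $Y_i = \sum_{j=1}^{K} Y_{ij}$. Let $\hat{Y}_i^{*}$ denote a direct unbiased estimator of $Y_i$. For the general synthetic estimator
$\hat{Y}_i^{S} = \sum_{j=1}^{K} (X_{ij}/X_{0j})\,\hat{Y}_{0j}'$
we have
$MSE(\hat{Y}_i^{S}) = E[\hat{Y}_i^{S} - Y_i]^2 = E[\hat{Y}_i^{S} - \hat{Y}_i^{*} + \hat{Y}_i^{*} - Y_i]^2$
$= E(\hat{Y}_i^{S} - \hat{Y}_i^{*})^2 + E(\hat{Y}_i^{*} - Y_i)^2 + 2E(\hat{Y}_i^{S} - \hat{Y}_i^{*})(\hat{Y}_i^{*} - Y_i)$.
Since $\hat{Y}_i^{*}$ is unbiased, $E(\hat{Y}_i^{*} - Y_i)^2 = V(\hat{Y}_i^{*})$ and $E(\hat{Y}_i^{S} - \hat{Y}_i^{*})(\hat{Y}_i^{*} - Y_i) = cov(\hat{Y}_i^{S}, \hat{Y}_i^{*}) - V(\hat{Y}_i^{*})$. If $cov(\hat{Y}_i^{S}, \hat{Y}_i^{*}) \approx 0$, we have
$MSE(\hat{Y}_i^{S}) \approx E(\hat{Y}_i^{S} - \hat{Y}_i^{*})^2 - V(\hat{Y}_i^{*})$.
Hence the theorem by the method of moments.
It is to be noted here that the condition $cov(\hat{Y}_i^{S}, \hat{Y}_i^{*}) \approx 0$ may be realistic in practice, since the synthetic estimator is much more stable than the direct estimator in the small area estimation process.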
A class of estimators can be defined by combining the direct estimator and the synthetic estimator as
$\hat{Y}_i^{C} = \gamma_i \hat{Y}_i^{*} + (1 - \gamma_i)\hat{Y}_i^{S}$, (13.4.6.1)
where $\gamma_i$ is called a shrinkage factor and lies between 0 and 1. The estimator $\hat{Y}_i^{C}$ is expected to have a small prediction mean squared error if $\gamma_i$ provides a suitable trade-off between the instability of the direct estimator and the bias of the synthetic estimator. Singh, Stukel, and Pfeffermann (1998) reported that most of the work on small area estimation is on the determination of the shrinkage coefficient $\gamma_i$.
The main difficulty in using $\gamma_i$ is that it depends upon unknown population parameters. This difficulty was overcome by Purcell and Kish (1979) by suggesting an estimator of $\gamma_i$ as
(13.4.6.3)
Drew, Singh, and Choudhry (1982) suggested an interesting method to obtain the shrinkage coefficient $\gamma_i$ as
$\gamma_i = 1$ if $\hat{N}_i \ge \delta N_i$, and $\gamma_i = \hat{N}_i/(\delta N_i)$ otherwise, (13.4.6.4)
where
(13.4 .6.5)
otherwise,
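The composite estimator (13.4.6.1) with a sample-size-dependent weight can be sketched as follows. This is a minimal sketch, assuming the weight in (13.4.6.4) equals 1 when the estimated domain size reaches $\delta N_i$ and $\hat{N}_i/(\delta N_i)$ otherwise; the function names and the numbers used are hypothetical and only illustrate the mechanics.

```python
def sample_size_dependent_weight(n_hat, n_true, delta=1.0):
    """Weight assumed from (13.4.6.4): 1 if N_hat >= delta*N, else N_hat/(delta*N)."""
    threshold = delta * n_true
    return 1.0 if n_hat >= threshold else n_hat / threshold


def composite_estimate(direct, synthetic, gamma):
    """Combine direct and synthetic estimates as gamma*direct + (1-gamma)*synthetic."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical small area: estimated size 80 against a known size of 100.
gamma = sample_size_dependent_weight(n_hat=80.0, n_true=100.0, delta=1.0)
combined = composite_estimate(direct=10.0, synthetic=20.0, gamma=gamma)
print(gamma, combined)
```

With $\gamma_i = 0$ the function returns the synthetic estimate and with $\gamma_i = 1$ it returns the direct estimate, which are the two limiting cases of the composite estimator.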
Continent   Number of countries   Average area (hectares)   Number of countries selected
1           6                     3194.50                   2
2           6                     14660.00                  2
3           8                     18309.37                  3
4           10                    14923.50                  4
5           12                    5987.83                   4
6           4                     3450.00                   2
7           30                    11682.73                  11
8           17                    145162.30                 6
9           10                    33976.10                  4
10          3                     1333.33                   2
Solution. We selected sub-samples of the required size from each continent by SRSWOR sampling and collected the information on the area and yield of the tobacco crop as shown below.
Continent   n_i   Area x_ij   Mean area    Yield y_ij   Mean yield
1           2     9024        7090.50      2.21         1.995
                  5157                     1.78
2           2     27050       14112.50     1.51         1.750
                  1175                     1.99
3           3     9260        24003.30     2.80         2.780
                  47600                    2.77
                  15150                    2.79
4           4     24000       14025.00     0.63         1.450
                  7000                     1.67
                  19100                    2.34
                  6000                     1.17
5           4     4500        8226.00      2.33         1.615
                  4304                     0.26
                  5500                     1.82
                  18600                    2.05
6           2     2700        4700.00      1.96         1.570
                  6700                     1.18
7           11    705         21939.09     1.00         1.010
                  4000                     0.45
                  3400                     1.62
                  750                      0.87
                  10000                    0.26
                  10                       1.00
                  116700                   1.22
                  655                      1.63
                  103110                   2.06
                  0                        0.00
                  2000                     1.00
8           6     36000       325593.16    1.22         1.450
                  12165                    0.74
                  1445000                  1.75
                  445000                   1.43
                  11000                    1.05
                  4394                     2.50
9           4     18000       78800.00     1.39         1.175
                  2100                     1.29
                  1800                     1.11
                  293300                   0.91
10          2     3300        1700.00      2.73         1.840
                  100                      0.95
Synthetic Ratio (SR) estimators: The synthetic ratio estimates of average yield in the different continents, computed as $\hat{\bar{Y}}_i = (\bar{y}/\bar{x})\bar{X}_i$ with overall sample means $\bar{y} = 1.486$ and $\bar{x} = 68157.725$, are
Continent   Synthetic estimate
1           (1.486/68157.725) × 3194.50 = 0.06965
2           (1.486/68157.725) × 14660.00 = 0.31962
3           (1.486/68157.725) × 18309.37 = 0.39918
4           (1.486/68157.725) × 14923.50 = 0.32536
5           (1.486/68157.725) × 5987.83 = 0.13054
6           (1.486/68157.725) × 3450.00 = 0.07522
7           (1.486/68157.725) × 11682.73 = 0.25471
8           (1.486/68157.725) × 145162.30 = 3.16488
9           (1.486/68157.725) × 33976.10 = 0.74076
10          (1.486/68157.725) × 1333.33 = 0.02906
The estimate of the average yield in the eighth continent is negative, which is not possible and hence can be taken as zero. It appears that this estimator needs improvement, perhaps because there is no correlation between yield and area under the crop.
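The synthetic ratio computation can be reproduced from the sampled units. The sketch below assumes, as the tabulated figures suggest, that the common ratio is the overall sample mean yield divided by the overall sample mean area; the variable and function names are illustrative, and the data are the sample and average-area values listed in the example.

```python
# Average area per continent, as listed in the example.
AREA_MEAN = {1: 3194.50, 2: 14660.00, 3: 18309.37, 4: 14923.50, 5: 5987.83,
             6: 3450.00, 7: 11682.73, 8: 145162.30, 9: 33976.10, 10: 1333.33}

# Sampled (area, yield) pairs per continent.
SAMPLE = {
    1: [(9024, 2.21), (5157, 1.78)],
    2: [(27050, 1.51), (1175, 1.99)],
    3: [(9260, 2.80), (47600, 2.77), (15150, 2.79)],
    4: [(24000, 0.63), (7000, 1.67), (19100, 2.34), (6000, 1.17)],
    5: [(4500, 2.33), (4304, 0.26), (5500, 1.82), (18600, 2.05)],
    6: [(2700, 1.96), (6700, 1.18)],
    7: [(705, 1.00), (4000, 0.45), (3400, 1.62), (750, 0.87), (10000, 0.26),
        (10, 1.00), (116700, 1.22), (655, 1.63), (103110, 2.06), (0, 0.00),
        (2000, 1.00)],
    8: [(36000, 1.22), (12165, 0.74), (1445000, 1.75), (445000, 1.43),
        (11000, 1.05), (4394, 2.50)],
    9: [(18000, 1.39), (2100, 1.29), (1800, 1.11), (293300, 0.91)],
    10: [(3300, 2.73), (100, 0.95)],
}


def synthetic_ratio_estimates(sample, area_mean):
    """Return (x_bar, y_bar, estimates) using one pooled ratio y_bar / x_bar."""
    units = [unit for rows in sample.values() for unit in rows]
    x_bar = sum(x for x, _ in units) / len(units)
    y_bar = sum(y for _, y in units) / len(units)
    ratio = y_bar / x_bar
    return x_bar, y_bar, {i: ratio * big_x for i, big_x in area_mean.items()}


x_bar, y_bar, estimates = synthetic_ratio_estimates(SAMPLE, AREA_MEAN)
for continent, value in estimates.items():
    print(continent, round(value, 5))
```

Because every continent shares the same pooled ratio, the estimates differ only through the known average areas, which is the sense in which the synthetic estimator borrows strength across areas.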
The small area estimation techniques are model dependent, and to understand these techniques one must know the Henderson (1975) model. We give here a complete solution to the general Henderson (1975) model defined as
$Y = X\beta + Zu + e$, (13.4.7.1)
where $Y = \mathrm{col}_{1 \le i \le t}\,\mathrm{col}_{1 \le j \le n_i}(y_{ij})$ is an $n \times 1$ vector, $X = (x_{ijl})$ is an $n \times k$ matrix, $u = \mathrm{col}_{1 \le i \le t}(u_i)$, $e = \mathrm{col}_{1 \le i \le t}\,\mathrm{col}_{1 \le j \le n_i}(e_{ij})$, and $Z = (z_{ij})$ is an $n \times t$ matrix. Let us partition a matrix as
(13.4.7.2)
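From (13.4.7.7) and (13.4.7.9),
$R = -D^{-1}C(A - BD^{-1}C)^{-1}$. (13.4.7.10)
From (13.4.7.4) and (13.4.7.8),
$AQ + BD^{-1}(I - CQ) = 0$, or $AQ + BD^{-1} - BD^{-1}CQ = 0$, or $(A - BD^{-1}C)Q = -BD^{-1}$,
which implies
$Q = -(A - BD^{-1}C)^{-1}BD^{-1}$. (13.4.7.11)
Using (13.4.7.11) in (13.4.7.8) gives
$S = D^{-1}[I + C(A - BD^{-1}C)^{-1}BD^{-1}]$. (13.4.7.12)
From (13.4.7.9), (13.4.7.10), (13.4.7.11) and (13.4.7.12), the inverse of the matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ is given by
$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A - BD^{-1}C)^{-1} & D^{-1}[I + C(A - BD^{-1}C)^{-1}BD^{-1}] \end{bmatrix}$.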
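The partitioned-inverse formulas (13.4.7.10) to (13.4.7.12) can be checked numerically. The sketch below is a simplified illustration that assumes every block is a scalar, so that each matrix inverse reduces to a division; the function name and the test values are hypothetical.

```python
def block_inverse_scalar(a, b, c, d):
    """Return (p, q, r, s), the blocks of the inverse of [[a, b], [c, d]]."""
    m = 1.0 / (a - b * c / d)          # (A - B D^{-1} C)^{-1}
    p = m                              # upper-left block
    q = -m * b / d                     # -(A - B D^{-1} C)^{-1} B D^{-1}
    r = -c * m / d                     # -D^{-1} C (A - B D^{-1} C)^{-1}
    s = (1.0 + c * m * b / d) / d      # D^{-1}[I + C (A - B D^{-1} C)^{-1} B D^{-1}]
    return p, q, r, s


a, b, c, d = 4.0, 2.0, 1.0, 3.0
p, q, r, s = block_inverse_scalar(a, b, c, d)

# Multiply the original matrix by the computed inverse; the result should be the identity.
product = [[a * p + b * r, a * q + b * s],
           [c * p + d * r, c * q + d * s]]
print(product)
```

For matrix-valued blocks the same expressions apply with the divisions replaced by matrix inverses, provided $D$ and $A - BD^{-1}C$ are nonsingular.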
are given by
$\hat{\beta} = (A - BD^{-1}C)^{-1}X'R^{-1}y - (A - BD^{-1}C)^{-1}BD^{-1}Z'R^{-1}y$.
To simplify these estimates of $\beta$ and $u$ we have
$(A - BD^{-1}C)^{-1} = [(X'R^{-1}X) - (X'R^{-1}Z)(Z'R^{-1}Z + G^{-1})^{-1}Z'R^{-1}X]^{-1}$.
Now
r
= (Z'R-IZ +G- 1t Z'R-1y
l
+(Z'R-IZ +G-ItZ'R-1X[X'R-IX -x'rlz(z'r1z +G-ItZ'R-IX
Theorem 13.4.7.1. The best linear unbiased predictor (BLUP) of $\xi'\beta + \varsigma'u$ is
$\xi'\hat{\beta} + \varsigma'\hat{u} = \xi'\hat{\beta} + \varsigma'GZ'V^{-1}(y - X\hat{\beta})$. (13.4.7.22)
Proof. Replacing $\beta$ by $\hat{\beta}$ and $u$ by $\hat{u}$ in $\xi'\beta + \varsigma'u$, we have the theorem.
Theorem 13.4.7.2. The variance of the best linear unbiased predictor (BLUP), $\hat{\mu}_p = \xi'\hat{\beta} + \varsigma'\hat{u}$, is
-I;(x,v-Ixtx'rIZ(Z'R-IZ +G-Itt;
(13.4 .7.23)
Proof. Following Henderson (1975) let
r =:~ -\~1~-1
C12 IC 22
be a generalized inverse of the matrix
[~-i-~-]
then
CIl I C121
V{;Jp) = V(f,8 + C;'iJ) = [;K
{
-~ -1--- [~}T2
Cl 2 I C22
2
= a l;'CI I ; +C;'C;2; +;'C12C; +C;'C 22C;J
where =(x,v -Ixt, C 12 =-(x,v-IXtX'R-IZ(Z'R-IZ+G-lt ,and
CII
t
W= [dx'v-I X X'V- I + C;'GZ'V- 1( 1- xkv- IX X,V- I)] . t
We first discuss three well known models useful for small area estimation:
( a ) Nested error regression model; ( b ) Random regression coefficient model; and ( c ) Fay and Herriot model.
Battese, Harter and Fuller (1988) suggested a nested error regression model in the context of estimating mean acreage under a crop for counties (small areas) in Iowa using Landsat satellite data in conjunction with survey data, with the model
$y_{ij} = x_{ij}'\beta + v_i + e_{ij}$, $i = 1, 2, \dots, t$ and $j = 1, 2, \dots, n_i$,
is an $n \times k$ matrix, $\beta = (\beta_1, \beta_2, \dots, \beta_k)'$, $n = \sum_{i=1}^{t} n_i$, and
So the nested error regression model can be written as
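To find the variance under the nested error regression model, let us choose
$Z = \mathrm{diag}_{1 \le i \le t}\,\mathrm{col}_{1 \le j \le n_i}(1) = I_t \otimes 1_{n_i}$,
where $I_t$ is the $t \times t$ identity matrix and $1_{n_i}$ is the $n_i \times 1$ vector of ones, and use the identity
$(I_a + \alpha J_a)^{-1} = I_a - \dfrac{\alpha}{1 + a\alpha} J_a$,
with $J_a$ the $a \times a$ matrix of ones. One can easily observe that with $\varsigma' = (0, 0, \dots, 1, \dots, 0)$ we have
$\varsigma'GZ'V^{-1} = \left(\dfrac{\gamma_i}{n_i}, \dfrac{\gamma_i}{n_i}, \dots, \dfrac{\gamma_i}{n_i}\right)$, where $\gamma_i = \dfrac{\sigma_v^2}{\sigma_v^2 + n_i^{-1}\sigma_e^2}$,
which further implies that
$\varsigma'GZ'V^{-1}(Y - X\hat{\beta}) = \left(\dfrac{\gamma_i}{n_i}, \dfrac{\gamma_i}{n_i}, \dots, \dfrac{\gamma_i}{n_i}\right)(y_{i1} - \hat{\beta}x_{i1}, y_{i2} - \hat{\beta}x_{i2}, \dots, y_{in_i} - \hat{\beta}x_{in_i})' = \gamma_i(\bar{y}_{i\cdot} - \bar{x}_{i\cdot}\hat{\beta})$.
Again taking $\xi = (0, 0, \dots, \bar{X}_{i\cdot}, \dots, 0)$, the best linear unbiased predictor (BLUP) under the nested error regression model becomes
$\hat{\mu}(\text{nested}) = \bar{X}_{i\cdot}\hat{\beta} + \gamma_i(\bar{y}_{i\cdot} - \bar{x}_{i\cdot}\hat{\beta})$.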
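Theorem 13.4.7.2.1. The mean squared error under the nested error regression model is given by
$MSE[\hat{\mu}(\text{nested})] = (\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot})'(X'V^{-1}X)^{-1}(\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot}) + \sigma_v^2(1 - \gamma_i)$. (13.4.7.26)
Proof. The true mean under the nested error regression model in the $i$-th area is given by $\mu_i = \bar{X}_{i\cdot}\beta + v_i$; therefore, the mean squared error (MSE) is given by
$MSE[\hat{\mu}(\text{nested})] = (\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot})'(X'V^{-1}X)^{-1}(\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot}) + (1 - \gamma_i)^2\sigma_v^2 + \gamma_i^2\dfrac{\sigma_e^2}{n_i}$.
Note that $\gamma_i = \dfrac{\sigma_v^2}{\sigma_v^2 + n_i^{-1}\sigma_e^2}$, which implies $\gamma_i = (1 - \gamma_i)\dfrac{n_i\sigma_v^2}{\sigma_e^2}$, and thus
$MSE[\hat{\mu}(\text{nested})] = (\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot})'(X'V^{-1}X)^{-1}(\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot}) + (1 - \gamma_i)^2\sigma_v^2 + \dfrac{\sigma_e^2}{n_i}\gamma_i(1 - \gamma_i)\dfrac{n_i\sigma_v^2}{\sigma_e^2}$
$= (\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot})'(X'V^{-1}X)^{-1}(\bar{X}_{i\cdot} - \gamma_i\bar{x}_{i\cdot}) + \sigma_v^2(1 - \gamma_i)$.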
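The simplification used in the proof of (13.4.7.26) can be verified numerically. The sketch below assumes a single area with hypothetical variance components; it computes the shrinkage factor $\gamma_i = \sigma_v^2/(\sigma_v^2 + \sigma_e^2/n_i)$ and checks that $(1-\gamma_i)^2\sigma_v^2 + \gamma_i^2\sigma_e^2/n_i$ collapses to $\sigma_v^2(1-\gamma_i)$. Function and variable names are illustrative.

```python
def nested_error_gamma(sigma_v2, sigma_e2, n_i):
    """Shrinkage factor gamma_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i)."""
    return sigma_v2 / (sigma_v2 + sigma_e2 / n_i)


def area_effect_terms(sigma_v2, sigma_e2, n_i):
    """Return the long and short forms of the area-effect part of the MSE."""
    g = nested_error_gamma(sigma_v2, sigma_e2, n_i)
    long_form = (1.0 - g) ** 2 * sigma_v2 + g ** 2 * sigma_e2 / n_i
    short_form = sigma_v2 * (1.0 - g)
    return long_form, short_form


# Hypothetical variance components and area sample size.
gamma_i = nested_error_gamma(sigma_v2=4.0, sigma_e2=8.0, n_i=4)
long_form, short_form = area_effect_terms(sigma_v2=4.0, sigma_e2=8.0, n_i=4)
print(gamma_i, long_form, short_form)
```

As $n_i$ grows, $\gamma_i$ tends to 1 and the area-effect term tends to zero, so the predictor leans on the area's own sample mean.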
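Dempster, Rubin, and Tsutakawa (1981) proposed a more general model with a random regression coefficient $\beta$, which in the case of a single concomitant variable $x$ and regression through the origin takes the form
$y_{ij} = \beta_i x_{ij} + e_{ij} = \beta x_{ij} + v_i x_{ij} + e_{ij}$, $i = 1, 2, \dots, t$; $j = 1, 2, \dots, n_i$, (13.4.7.27)
where $\beta_i = \beta + v_i$, and $v_i$ and $e_{ij}$ are independent. The mean of the $i$-th area is given by
$\mu_i = \bar{X}_i\beta + \bar{X}_i v_i$, (13.4.7.28)
a linear combination of the fixed effect $\beta$ and the realized value of the random effect $v_i$.
Under the random regression model with $k = 1$ and $z_{ij} = x_{ij}$, $\gamma_i = \sigma_v^2\{\sigma_v^2 + \sigma_e^2/\sum_{j=1}^{n_i} x_{ij}^2\}^{-1}$, which implies $(1 - \gamma_i) = \gamma_i\sigma_e^2/(\sigma_v^2\sum_{j=1}^{n_i} x_{ij}^2)$, and, writing $\hat{e}_{ij} = y_{ij} - \hat{\beta}x_{ij}$ and taking $\varsigma' = (0, 0, \dots, 1, \dots, 0)$, we have
$\varsigma'GZ'V^{-1}(y - X\hat{\beta}) = \left(\dfrac{\sigma_v^2}{\sigma_e^2}\right)\left[\sum_{j=1}^{n_i} x_{ij}\hat{e}_{ij} - \gamma_i\sum_{j=1}^{n_i} x_{ij}\hat{e}_{ij}\right]$
$= \left(\dfrac{\sigma_v^2}{\sigma_e^2}\right)\left[(1 - \gamma_i)\sum_{j=1}^{n_i} x_{ij}\hat{e}_{ij}\right] = \left(\dfrac{\sigma_v^2}{\sigma_e^2}\right)\left(\dfrac{\gamma_i\sigma_e^2}{\sigma_v^2\sum_{j=1}^{n_i} x_{ij}^2}\right)\left[\sum_{j=1}^{n_i} x_{ij}(y_{ij} - \hat{\beta}x_{ij})\right]$
$= \gamma_i[\hat{\beta}_i - \hat{\beta}] = \gamma_i\left[\sum_{j=1}^{n_i} x_{ij}y_{ij}\Big/\sum_{j=1}^{n_i} x_{ij}^2 - \hat{\beta}\right]$. (13.4.7.29)
Further, if $\varsigma = (0, 0, \dots, \bar{X}_i, \dots, 0, 0)$ then the best linear unbiased predictor (BLUP) of $\xi'\beta + \varsigma'u$ is given by
$\hat{\mu}(\text{random}) = \bar{X}_i\hat{\beta} + \bar{X}_i\gamma_i(\hat{\beta}_i - \hat{\beta})$. (13.4.7.30)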
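The predictor (13.4.7.30) can be sketched for the single-covariate case as follows. This is a minimal sketch, assuming known variance components and assuming that the generalized least squares estimate of $\beta$ reduces to the $\gamma_i$-weighted average of the area slopes $\hat{\beta}_i$ (which follows from $X'V^{-1}X = \sum_i \gamma_i/\sigma_v^2$ under this model); the function name and the toy data are hypothetical.

```python
def random_coefficient_blup(x, y, big_x_bar, sigma_v2, sigma_e2):
    """Return area predictors X_bar_i * (beta_hat + gamma_i * (beta_i - beta_hat)).

    x, y: lists of per-area lists of sampled covariate and study values.
    big_x_bar: list of known area means of the covariate.
    """
    sxx = [sum(v * v for v in xi) for xi in x]
    sxy = [sum(a * b for a, b in zip(xi, yi)) for xi, yi in zip(x, y)]
    beta_i = [b / a for a, b in zip(sxx, sxy)]          # area-specific slopes
    gamma = [sigma_v2 / (sigma_v2 + sigma_e2 / a) for a in sxx]
    # Assumed GLS form: gamma-weighted mean of the area slopes.
    beta_hat = sum(g * b for g, b in zip(gamma, beta_i)) / sum(gamma)
    return [xb * (beta_hat + g * (b - beta_hat))
            for xb, g, b in zip(big_x_bar, gamma, beta_i)]


# Toy data for two areas (hypothetical values).
x = [[1.0, 2.0], [1.0, 1.0]]
y = [[2.0, 4.0], [3.0, 3.0]]
predictions = random_coefficient_blup(x, y, [1.5, 1.0], sigma_v2=1.0, sigma_e2=1.0)
print(predictions)
```

When $\sigma_v^2$ is very large each $\gamma_i$ approaches 1 and the predictor uses the area's own slope; when $\sigma_v^2$ is close to zero it falls back on the pooled slope.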
Theorem 13.4.7.3.1. The mean squared error of the BLUP under random
regression coefficient model with one auxiliary variable is given by
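$V(\hat{\beta}) = (X'V^{-1}X)^{-1} = \sigma_v^2\Big/\sum_{i=1}^{t}\gamma_i$,
$\hat{\beta}_i = \sum_{j=1}^{n_i} x_{ij}y_{ij}\Big/\sum_{j=1}^{n_i} x_{ij}^2 = \sum_{j=1}^{n_i} x_{ij}(\beta x_{ij} + v_i x_{ij} + e_{ij})\Big/\sum_{j=1}^{n_i} x_{ij}^2 = \beta + v_i + \sum_{j=1}^{n_i} x_{ij}e_{ij}\Big/\sum_{j=1}^{n_i} x_{ij}^2$.
Also
$E[(\hat{\beta}_i - \beta)v_i] = E\left[v_i^2 + v_i\sum_{j=1}^{n_i} x_{ij}e_{ij}\Big/\sum_{j=1}^{n_i} x_{ij}^2\right] = \sigma_v^2$.
Thus we have
$MSE[\hat{\mu}(\text{random})] = E[\hat{\mu}(\text{random}) - \mu_i]^2 = E[\bar{X}_i\hat{\beta} + \gamma_i\bar{X}_i(\hat{\beta}_i - \hat{\beta}) - \bar{X}_i\beta - \bar{X}_i v_i]^2$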
For estimating per capita income for small areas (population less than 1000), Fay and Herriot (1979) assumed that a $k$-vector of benchmark variables $x_i = (x_{i1}, x_{i2}, \dots, x_{ik})'$, related to $\mu_i$, is available for each area $i$, and that the $\mu_i$ are independent $N(x_i'\beta, A)$, where $\beta$ is a $k$-vector of unknown parameters. The sample mean vector $\bar{y} = (\bar{y}_1, \bar{y}_2, \dots, \bar{y}_t)' = \mathrm{col}_{1 \le i \le t}(\bar{y}_i)$, given $\mu = \mathrm{col}_{1 \le i \le t}(\mu_i)$, is $N(\mu, D)$, where $D = \mathrm{diag}_{1 \le i \le t}(D_i)$ with known diagonal elements. The model can be stated as a linear model
$\bar{y} = X\beta + v + e$,
where $e = (e_1, e_2, \dots, e_t)'$ and $v = (v_1, v_2, \dots, v_t)'$ are distributed independently as $N(0, D)$ and $N(0, AI)$, respectively. For the Fay and Herriot model, the best linear unbiased predictor (BLUP) of $\mu_i$ is obtained as
$\hat{\mu}_i(FH) = x_i'\hat{\beta} + \dfrac{A}{A + D_i}(\bar{y}_i - x_i'\hat{\beta})$, (13.4.7.33)
where $\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}\bar{y}$ with $V = \mathrm{diag}(A + D_1, A + D_2, \dots, A + D_t)$ and $X = \mathrm{col}_{1 \le i \le t}(x_i')$. Under normality the estimator (13.4.7.33) is also a Bayes estimator, as shown by Fay and Herriot (1979). Note that $\hat{\mu}_i(FH) \to \bar{y}_i$ if $D_i/(A + D_i) \to 0$, and $\hat{\mu}_i(FH) \to x_i'\hat{\beta}$ if $A/(A + D_i) \to 0$.
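The behaviour of the Fay and Herriot predictor at its two limits can be illustrated numerically. The sketch below is a simplification that assumes a single benchmark variable ($k = 1$), a known model variance $A$, and the shrinkage form $x_i\hat{\beta} + \{A/(A + D_i)\}(\bar{y}_i - x_i\hat{\beta})$; the function name and the data are hypothetical.

```python
def fay_herriot_blup(y_bar, x, d, a):
    """Return shrinkage predictors for each area with one covariate and known A."""
    weights = [1.0 / (a + di) for di in d]                # diagonal of V^{-1}
    numerator = sum(w * xi * yi for w, xi, yi in zip(weights, x, y_bar))
    denominator = sum(w * xi * xi for w, xi in zip(weights, x))
    beta_hat = numerator / denominator                    # weighted least squares slope
    return [xi * beta_hat + (a / (a + di)) * (yi - xi * beta_hat)
            for xi, yi, di in zip(x, y_bar, d)]


# Hypothetical area means, covariate values, and sampling variances.
y_bar = [12.0, 18.0, 33.0]
x = [1.0, 2.0, 3.0]
d = [1.0, 1.0, 1.0]

synthetic_limit = fay_herriot_blup(y_bar, x, d, a=0.0)    # A/(A + D_i) = 0
direct_limit = fay_herriot_blup(y_bar, x, d, a=1e12)      # D_i/(A + D_i) close to 0
print(synthetic_limit, direct_limit)
```

With $A = 0$ every area is predicted by the regression fit alone, while a very large $A$ returns essentially the direct sample means, matching the two limits noted in the text.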
Let $y_{ijk}$ denote the value of the $k$-th unit in the cell $(i, j)$, where the $\mu_j$ are fixed effects and the error terms $e_{ijk}$ are uncorrelated with zero means and variance $\sigma^2$. Holt, Smith, and Tomberlin (1979) obtain a best linear unbiased prediction (BLUP) estimator of $Y_i$ under the linear model for the finite population
$y_{ijk} = \mu_j + e_{ijk}$. (13.4.8.1)
Further, $N_{ij}$ denotes the number of population elements in the large domain $j$ that belong to the small area $i$. Let $n_{ij}$ elements in a sample of size $n$ fall in the cell $(i, j)$, and let $\bar{y}_{ij}$ and $\bar{y}_{0j}$ denote the sample means for $(i, j)$ and $j$, respectively. The best linear unbiased estimator of $\mu_j$ under the model (13.4.8.1) is $\hat{\mu}_j = \bar{y}_{0j}$, which in turn leads to the BLUP estimator of $Y_i$ given by
$\hat{Y}_i^{B} = \sum_{j} \hat{Y}_{ij}^{C}$, (13.4.8.2)
The use of estimates of proportions and their accuracy is well known in different disciplines such as economics, criminology, etc., and a practitioner always looks for an improved methodology. Fay and Herriot (1979) and their followers, as reviewed by Ghosh and Rao (1994), consider the problem of estimating the mean in small areas of a continuous study variable, such as income, tax return, or yield of a crop, based on standard normal theory. Unfortunately such a theory is not appropriate for estimating the proportion of an attribute in small areas. MacGibbon and Tomberlin (1989) note that, following classical logistic regression theory, it is the logit transform of the proportion, not the proportion itself, that has to be modelled in a linear way while estimating proportions in small areas. Dempster and Tomberlin (1980) were the first to consider the problem of inference from relatively thinly spread, complex, multi-stage surveys to small areas or domains not necessarily included in the survey by using a model based approach. They estimated proportions for small areas and the associated uncertainty by making use of random effects, a multiple logistic regression model, and empirical Bayes techniques. This explicitly model based method differs substantially from the implicitly model based approach of the synthetic estimation techniques of Gonzalez and Hoza (1976).
Farrell (1997, 2000) and Farrell, MacGibbon, and Tomberlin (1994, 1997a, 1997b, 1997c) pointed out that the importance of small area estimation as a facet of survey sampling cannot be overemphasised. Of late there has been an increasing demand for small area statistics in both the public and private sectors. They considered the gain in power of a Bayesian approach obtained by borrowing strength from an ensemble, while simultaneously obtaining desirable frequentist operating characteristics, and adopted empirical Bayes methodology for this purpose. Farrell, MacGibbon, and Tomberlin studied various adjustments to the empirical Bayes interval estimates of small area proportions by following different kinds of bootstrap methodology developed by Laird and Louis (1987) and the modifications suggested by Carlin and Gelfand (1990, 1991).
where $\pi_{ij}$ denotes the probability of a 'response' for the $j$-th unit in the $i$-th area, the subscript $i$ refers to a set of categorical variable covariates, and the subscript $j$ refers to a set of nested sampling characteristics, indicating primary stage units (PSU), second stage units (SSU), SSU within PSU, and so on. The parameter $\theta$ represents a sum of fixed classification effects, the parameter $\phi_i$ represents a sum of random effects associated with sampling characteristics, the vector $X_{ij}$ represents a vector of quantitative covariates, and the parameter $\beta$ is a vector of fixed logistic linear regression parameters. The random effects parameters are assumed to have some parametric distribution. The probabilities $\pi_{ij}$ are obtained by inversion as
$\pi_{ij} = [1 + \exp\{-(\theta + X_{ij}\beta + \phi_i)\}]^{-1}$. (13.4.9.2)
The Bayes estimate for the $i$-th small area is given by
$\hat{p}_i = \sum_{j}\hat{\pi}_{ij}\Big/N_i$, (13.4.9.3)
where
$\hat{\pi}_{ij} = [1 + \exp\{-(\hat{\theta} + X_{ij}\hat{\beta} + \hat{\phi}_i)\}]^{-1}$ (13.4.9.4)
such that
$\sum_{i,j} y_{ij}X_{ij} = \sum_{i,j}\hat{\pi}_{ij}X_{ij}$, (13.4.9.5)
$\sum_{i,j} y_{ij} = \sum_{i,j}\hat{\pi}_{ij}$, (13.4.9.6)
and
$\sum_{j}(y_{ij} - \hat{\pi}_{ij}) - \hat{\phi}_i/\sigma^2 = 0$. (13.4.9.7)
The equations (13.4.9.5), (13.4.9.6) and (13.4.9.7) can be solved by the Newton--Raphson algorithm. If $Z_{ij}$ represents a vector of predictor variables, both quantitative and qualitative, associated with the $(i, j)$-th individual and $\Gamma$ represents a vector of the parameters of the model
$Z_{ij}'\Gamma = \theta + X_{ij}\beta + \phi_i$, (13.4.9.8)
then
1\
and $\sigma^2$ is known. If $\sigma^2$ is unknown, then it can be estimated by the empirical Bayes EM algorithm of Dempster, Laird, and Rubin (1977).
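A Newton--Raphson step of the kind mentioned above can be illustrated on the random-effect equation alone. The sketch below is a simplification, not the full joint solution: it assumes the fixed part $\theta + X_{ij}\beta$ is already known for each unit, that $\sigma^2$ is known, and that the equation for one area reads $\sum_j (y_{ij} - \pi_{ij}) - \phi_i/\sigma^2 = 0$. The function names and data are hypothetical.

```python
import math


def inverse_logit(eta):
    """Return 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def solve_random_effect(y, fixed_part, sigma2, tol=1e-10, max_iter=50):
    """Solve sum_j (y_j - pi_j(phi)) - phi / sigma2 = 0 for phi by Newton-Raphson."""
    phi = 0.0
    for _ in range(max_iter):
        pi = [inverse_logit(f + phi) for f in fixed_part]
        score = sum(yj - pj for yj, pj in zip(y, pi)) - phi / sigma2
        # Derivative of the score with respect to phi (always negative).
        slope = -sum(pj * (1.0 - pj) for pj in pi) - 1.0 / sigma2
        step = score / slope
        phi -= step
        if abs(step) < tol:
            break
    return phi


# Hypothetical binary responses and fixed linear predictors for one small area.
y = [1, 1, 0, 1]
fixed_part = [0.2, -0.1, 0.0, 0.3]
phi_hat = solve_random_effect(y, fixed_part, sigma2=2.0)
pi_hat = [inverse_logit(f + phi_hat) for f in fixed_part]
print(phi_hat, sum(pi_hat) / len(pi_hat))
```

In the full problem the same iteration is applied jointly to $\theta$, $\beta$ and all the $\phi_i$, using the complete set of estimating equations.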
Exercise 13.1. What is the raking ratio estimation technique? Explain with the help
of a 3 x 3 contingency table.
Exercise 13.3. Show that RHC strategy remains design unbiased for estimating
population total of a continuous population .
Hint: Padmawar (1996) .
Exercise 13.5. Discuss the Vital Rate Method and Housing Unit Method techniques for small area estimation. Discuss two methods of choosing optimal weights, $\gamma_i$, in the case of the composite estimator, $\hat{Y}_i^{C}$, of small area estimation. Show that if $\gamma_i \to 0$, the composite estimator reduces to the synthetic estimator, and if $\gamma_i \to 1$, then the composite estimator reduces to the direct estimator $\hat{Y}_i^{*}$.
Hint: Ghosh and Rao (1994).
- -(XJ
x
YR =y
n;. = 21
Explain the raking ratio method of estimation with minimum four iterations.
Practical 13.2. The average area in hectares, the number of countries in each continent, and the number of countries to be selected from each one of the 10 continents are as given in the following table.
Continent   Number of countries   Average area (hectares)   Number of countries to be selected
1           6                     3194.50                   3
2           6                     14660.00                  3
3           8                     18309.37                  4
4           10                    14923.50                  5
5           12                    5987.83                   5
6           4                     3450.00                   3
7           30                    11682.73                  12
8           17                    145162.30                 8
9           10                    33976.10                  5
10          3                     1333.33                   2
Practical 13.4. In a university let the total number of students be 20,000 during 1999. During 2000 there is a recruitment of 2,500 students according to the schedule, and later on 150 students left the university and stopped their studies. There was a net migration of -10 students (say, 30 students migrated to other universities and 20 students came from other universities during the academic year 2000). Apply the Census Component Method (CCM) to estimate the total number of students in the university during 2000.
Practical 13.5. The following figure shows a plant growing area near the main road of a city. The upward arrow shows a good plant, whereas the two-headed arrow shows a defective plant. The crossing block arrow shows the possibility of footpaths across the plant growing area.
Now divide the plant growing area into four small areas using the four directions of the block arrow. Estimate the proportion of defective plants in each small area.
( b ) Consider that the information about the distance of each plant row from the road is known. Suggest if this can be used to improve the estimator of proportion developed above. Note that the growth is less near the footpaths.
( c ) Can you think of any information which can improve the estimates?
APPENDIX
df   α = 0.50   0.30   0.25   0.20   0.10   0.05   0.02   0.01
01 1.000 1.963 2.414 3.078 6.314 12.71 31.82 63.65
02 0.816 1.386 1.604 1.886 2.920 4.303 6.965 9.925
03 0.765 1.250 1.423 1.638 2.353 3.182 4.541 5.841
04 0.741 1.190 1.344 1.533 2.132 2.776 3.747 4.604
05 0.727 1.156 1.301 1.476 2.015 2.571 3.365 4.032
06 0.718 1.134 1.273 1.440 1.943 2.447 3.143 3.707
07 0.711 1.119 1.254 1.415 1.895 2.365 2.998 3.499
08 0.706 1.108 1.240 1.397 1.860 2.306 2.896 3.355
09 0.703 1.100 1.230 1.383 1.833 2.262 2.821 3.250
10 0.700 1.093 1.221 1.372 1.812 2.228 2.764 3.169
11 0.697 1.088 1.214 1.363 1.796 2.201 2.718 3.106
12 0.695 1.083 1.209 1.356 1.782 2.179 2.681 3.055
13 0.694 1.079 1.204 1.350 1.771 2.160 2.650 3.012
14 0.692 1.076 1.200 1.345 1.761 2.145 2.624 2.977
15 0.691 1.074 1.197 1.341 1.753 2.131 2.602 2.947
16 0.690 1.071 1.194 1.337 1.746 2.120 2.583 2.921
17 0.689 1.069 1.191 1.333 1.740 2.110 2.567 2.898
18 0.688 1.067 1.189 1.330 1.734 2.101 2.552 2.878
19 0.688 1.066 1.187 1.328 1.729 2.093 2.539 2.861
20 0.687 1.064 1.185 1.325 1.725 2.086 2.528 2.845
21 0.686 1.063 1.183 1.323 1.721 2.080 2.518 2.831
22 0.686 1.061 1.182 1.321 1.717 2.074 2.508 2.819
23 0.685 1.060 1.180 1.319 1.714 2.069 2.500 2.807
24 0.685 1.059 1.179 1.318 1.711 2.064 2.492 2.797
25 0.684 1.058 1.178 1.316 1.708 2.060 2.485 2.787
26 0.684 1.058 1.177 1.315 1.706 2.056 2.479 2.779
27 0.684 1.057 1.176 1.314 1.703 2.052 2.473 2.771
28 0.683 1.056 1.175 1.313 1.701 2.048 2.467 2.763
29 0.683 1.055 1.174 1.311 1.699 2.045 2.462 2.756
30 0.683 1.055 1.173 1.310 1.697 2.042 2.457 2.750
31 0.682 1.054 1.172 1.309 1.696 2.040 2.453 2.744
32 0.682 1.054 1.172 1.309 1.694 2.037 2.449 2.738
Table 3. Area under the standard normal curve for $0 \le z < \infty$.
z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1  0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2  0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3  0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4  0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5  0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6  0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7  0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8  0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9  0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0  0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1  0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2  0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3  0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4  0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5  0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6  0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7  0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8  0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9  0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0  0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1  0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2  0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3  0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4  0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5  0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6  0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7  0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8  0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9  0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0  0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1  0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2  0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3  0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4  0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.5  0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.6  0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
∞    0.5000
Source: Generated in Excel using the NORMSDIST(x) function.
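Entries of this table can also be regenerated outside a spreadsheet. The sketch below is an illustration that swaps in Python's standard `math.erf` for the Excel function named in the source line; it relies on the identity that the area between 0 and $z$ equals $\tfrac{1}{2}\,\mathrm{erf}(z/\sqrt{2})$. The function name is an illustrative choice.

```python
import math


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


# Reproduce a few table cells: row 1.0 column 0.00, row 1.9 column 0.06, row 2.5 column 0.00.
for z in (1.00, 1.96, 2.50):
    print(z, round(area_zero_to_z(z), 4))
```

Adding 0.5 to the returned area gives the cumulative probability up to $z$.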
1 AL 348.334 408.978
2 AK 3.433 2.605
3 AZ 431.439 54.633
4 AR 848.317 907.700
5 CA 3928.732 1343.461
6 CO 906.281 315.809
7 CT 4.373 7.130
8 DE 43.229 42.808
9 FL 464.516 825.748
10 GA 540.696 939.460
11 HI 38.067 40.775
12 ID 1006.036 53.753
13 IL 2610.572 2131.048
14 IN 1022.782 1213.024
15 IA 3909.738 2327.025
16 KS 2580.304 1049.834
17 KY 557.656 1045.106
18 LA 405.799 282.565
19 ME 51.539 8.849
20 MD 57.684 139.628
21 MA 56.471 7.590
22 MI 440.518 323.028
23 MN 2466.892 1354.768
24 MS 549.551 627.013
25 MO 1519.994 1579.686
26 MT 722.034 292.965
27 NE 3585.406 1337.852
28 NV 16.710 5.860
29 NH 0.471 6.044
30 NJ 27.508 39.860
31 NM 274.035 140.582
32 NY 426.274 201.631
33 NC 494.730 639.571
34 ND 1241.369 449.099
35 OH 635.774 870.720
36 OK 1716.087 612.108
37 OR 571.487 114.899
38 PA 298.351 756.169
39 RI 0.233 1.611
40 SC 80.750 87.951
41 SD 1692.817 413.777
42 TN 388.869 553.266
43 TX 3520.361 1248.761
44 UT 197.244 56.908
45 VT 19.363 57.747
46 VA 188.477 321.583
47 WA 1228.607 1100.745
48 WV 29.291 99.277
49 WI 1372.439 1229.752
50 WY 386.479 100.964
Source: Agricultural Statistics (1999) Washington, US.
1 60 492
2 72 384
3 55 408
4 56 465
5 82 312
6 78 315
7 67 420
8 74 381
9 84 276
10 56 465
11 68 420
12 70 360
13 59 435
14 64 405
15 53 510
16 66 420
17 78 345
18 63 405
19 77 330
20 73 285
21 55 438
22 71 360
23 63 390
24 87 270
25 61 375
26 58 375
27 60 390
28 70 360
29 66 390
30 72 345
1 Sharks, other 1467 1385 2001 2016
2 Sharks, dogfish 1039 1031 993 833
3 Skates/Rays 2152 1981 2939 2353
4 Eels 138 222 186 152
5 Herrings 28933 34060 38007 30027
6 Freshwater catfish 1100 1091 1377 666
7 Saltwater catfish 13466 12690 14441 13859
8 Toadfish 1784 2676 1781 1632
9 Atlantic cod 850 2693 1861 1942
10 Pollock 168 397 862 832
11 Red hake 559 216 369 184
12 Codfish/hakes, other 73 124 130 266
13 Searobins 4768 7726 4707 4793
14 Sculpins 54 698 136 71
15 White perch 3669 5281 4648 3489
16 Striped bass 3840 4799 8521 10758
17 Temperate bass, other 5 35 32 23
18 Black sea bass 11759 12758 11892 17723
19 Groupers 4661 4236 4583 4923
20 Sea bass, other 2797 2690 2138 2068
21 Bluefish 11990 10301 12405 10940
22 Crevalle jack 3542 2569 2978 3951
23 Blue runner 2371 3800 5692 2319
24 Greater amber jack 692 1141 332 164
25 Florida pompano 498 641 425 644
26 Jacks, other 4463 3802 1878 1625
27 Dolphins 1484 1926 2449 2613
28 Gray snapper 5363 5154 4845 4552
29 Red snapper 2024 2546 2011 1608
30 Lane snapper 919 1079 1088 859
31 Vermilion snapper 950 1228 826 1200
32 Yellowtail snapper 1649 2061 1247 1334
33 Snappers, others 746 861 462 492
34 Pigfish 2955 2691 4918 4199
35 White grunt 5593 5356 5784 5678
36 Grunts, other 3039 3521 3186 3379
1 United States 14.2 8.8 6.2 76.3 2.07
2 Afghanistan 41.6 16.6 137.5 47.8 5.87
3 Algeria 26.5 5.4 42.2 69.6 3.16
4 Angola 42.6 15.9 125.9 48.9 6.05
5 Argentina 19.9 7.6 17.8 75.0 2.64
6 Australia 13.0 6.9 5.0 80.4 1.80
7 Austria 10.3 10.3 5.8 77.3 1.52
8 Azerbaijan 21.1 8.6 72.3 65.4 2.61
9 Bangladesh 27.4 10.1 93.0 57.5 3.08
10 Belarus 14.0 12.9 12.0 69.4 1.92
11 Belgium 11.2 10.4 6.1 77.7 1.69
12 Bolivia 30.0 9.3 60.2 62.0 3.81
13 Brazil 19.2 10.1 47.7 60.9 2.14
14 Bulgaria 12.5 13.6 14.8 71.6 1.74
15 Burkina Faso 44.9 21.4 112.8 39.8 6.48
16 Burma 28.1 10.7 71.8 58.1 3.57
17 Burundi 41.0 15.3 95.2 48.1 6.25
18 Cambodia 41.0 14.3 100.5 51.5 5.81
19 Cameroon 41.4 13.9 74.1 51.3 5.73
20 Canada 12.3 7.2 5.5 80.0 1.80
21 Chad 42.8 16.3 113.6 48.9 5.64
22 Chile 15.9 5.7 11.9 75.5 2.00
23 China 15.0 6.8 32.6 71.1 1.81
24 Colombia 19.0 4.6 21.3 74.2 2.17
25 Congo (Kinshasa) 46.5 15.6 98.9 48.1 6.39
26 Cote d'Ivoire 41.3 17.6 94.9 43.7 5.80
27 Cuba 12.7 7.5 8.6 75.6 1.60
28 Czech Republic 13.3 11.0 8.0 74.2 1.74
29 Dominican Republic 21.1 5.4 40.8 70.4 2.42
30 Ecuador 23.0 5.1 29.3 72.5 2.59
31 Egypt 26.2 8.1 65.7 62.7 3.24
32 Ethiopia 44.3 17.6 117.7 46.0 6.75
33 France 12.1 9.0 5.6 79.1 1.73
34 Germany 10.5 10.8 5.7 76.7 1.56
35 Ghana 30.8 10.2 74.8 57.5 3.95
36 Greece 10.7 9.5 6.6 79.0 1.52
37 Guatemala 31.2 6.5 44.6 66.9 4.00
38 Guinea 40.0 16.9 123.7 47.0 5.46
39 Haiti 32.3 14.7 98.4 50.2 4.50
0.525913
0.882218 0.660721
-0.842360 -0.828100 -0.910310 1
0.985520 0.549261 0.852536 -0.815260
Appendix 1129
[ I ] Adhvary u, D. ( 1978). Successive sampling using multi-auxi liary info rmation. Sankhy a. C, 40,
167--17 3.
[ 2 ] Adhvary u, D. and Gupta, P.C. (1983) . On some alternative sampling strategies using auxiliary
information. Metrika, 30 , 2 17--226
[ 3 ] Agarwa l, C.L. and Tikkiwal, B.D. ( 1980). Two stage sampling on successive occasions. Sankhy a.
42, C, 31--44 .
[ 4 ] Agarw al, O.K. and Singh, P. (1982). On cluster sampling strategies using ancillary information.
Sankhy ii , B, 44, 184--192.
[ 5 ] Agarwal, M.C. and Jain , N. (1989). A new predi ctive product estimator. Biometrika. 76,822--823 .
[ 6] Agarwal, M.C. and Panda, K.B. (1993) . An efficient estimator in post stratification. Metron , 179--
188.
[ 7 ] Agarwal, M.C. and Roy, D.C. (1999). Efficient estimators for small domains. 1. Indian Soc. Agric.
Statist.., 52(3), 327--337.
[ 8] Aagrwa l, M.C. and Sthapit, A.B. ( 1996). Model assisted selection of a product strategy. J. Indian
Soc. Agric. Statist., 48(2), 207--215.
[ 9 ] Agarwal, S.K . (1980). Two auxiliary variates in ratio method of estimation. Biom. J., 22(7), 569--
573.
[ 10 1 Agarwa l, S.K. and Kumar, P. (1980) . Combination of ratio and PPS estima tor. 1. Indian Soc. Agric.
Statist.. 32, 8 1--86 .
[ I I ] Agarwal, S.K., Sharma, U.K. and Kashyap, S. (1997) . A new approach to use mult ivariate auxiliary
information in sample surveys. J. Statist. Planning Inf er.. 60, 261--267.
[ 12] Agarwal, S.K., Singh , M. and Goel, B.B.P .S. ( 1979). Use of p-auxiliary variates in PPS samplin g.
Biom. J., 21(8), 781--785.
[ 13 ] Ahmed, M.S. (1997). The general class of chain estimators for the ratio of two means using double
sampling. Commun . Statist.--Theory Meth., 26(9), 2247--2254.
[ 14] Ahmed, M.S. (199 8). A note on regression type estimators using multiple auxiliary informat ion.
Austral. & New Zealand J. Statist., 40(3 ), 373--376.
[ 15 ] Ahmad, T. (1997). A resamplin g techniqu e for complex survey data. J. Indian Soc. Agric. Statist.,
50(3),364--37 9.
[ 16] Ahsan, MJ. and Khan , S.U. (1982). Optimum allocation in multivariate stratified random sampling
with overhead cost. Metrika, 29, 71--78.
[ 17] Aires, N. (2000). Compariso ns between conditional Poisson sampling and Pareto tips sampl ing
design, J. Statist. Planning Infer.. 88, 133-- 147.
[ 18 1 Ajga onkar, S.G.P. ( 1975). The efficien t use of supplementary information in double sampl ing
a
procedure. Sankhy ,C,37, 18 1--189.
1132 Advanced sampling theory with applications
[ 19 ] Akar, I. and Sedransk, J. (1979). Post-stratified cluster sampling. Sankhy d ,C, 41, 76--83.
[ 20 ] Alalouf, I.S. (1996). The estimation of a proportion in cluster sampling. Commun. Statist »- Theory
Meth ., 25(2), 325--343.
[21] Allen, J., Saxena, S., Singh, H.P., Singh, S. and Smarandaehe, F. (2002). Randomness and optimal
estimation in data sampling . American Research Press. 26--43.
[22] Amahia, a .N., Chaubey, Y.P. and Rao, TJ. (1989). Efficiency ofa new estimator in PPS sampling
for multiple characteristics . J. Statist . Planning Infer., 21,75--84 .
[ 23 ] Amdekar, SJ. (1985). An unbiased estimator in overlapping clusters. Calcutta Statist. Assoc. Bull .,
15, 231--232.
[ 24 ] Anderson, H. (1976). Estimation of a proportion through randomized response . Int. Statist. Rev. ,
44,213--217.
[ 25 ] Anderson, H. (1977). Efficiency versus protection in the general randomized response model.
Scand. J. Stati st., 4,11--19.
[ 26 ] Andreatta, a . and Kaufman, a .M. (1986). Estimates of finite population when sampling is without
replacement and proportional to magnitude. J. Amer. Statist . Assoc., 81, 657--666.
[ 27 ] Anscombe, F.J. (1948). The validity of comparative experiments. 1.R. Statist . Soc.. A, 61, 181--
211.
[ 29] Amab, R. (l979b). An addendum to Singh and Singh's paper on random nonresponse in unequal
probability sampling. Sankhy a.
C, 41, 138--140.
[30] Amab, R. (1988). Variance estimation in multi-satge sampling. Aust. 1. Statist .,30, 107--110.
[ 36 ] Arnab, R. (1996). Randomized response trials : A unified approach for qualitative data. Commun .
Statist.i-Theory Meth.. 25(6), 1173--1183.
[37] Amab, R. (1998). Sampling on two occasions: Estimation of population total. Survey Methodology,
24,185--192.
[38] Arnab, R. (1999). On use of distinct respondents in randomized response surveys. Biom. J., 41(4),
507--513.
[ 39 ] Arnab, R. (2001). Estimation of a finite population total in varying probability sampling for multi-
character surveys. Metrika, 54(2), 159--177.
[ 40 ] Arnab, R. (2002). Optimum sampling strategies under randomized response surveys. Biom. J.,
44(4), 490--495.
[ 41 ] Arnab, R. and Singh, S. (2001). On the estimation of population total and variance in the presence
of non-response. Presented at the JSM--2001 conference, Atlanta, USA.
[ 42 ] Arnab, R. and Singh, S. (2002a). Calibration for variance estimator of generalized regression
predictor. Submitted for possible presentation at JSM--2003, California, USA.
[ 43 ] Arnab, R. and Singh, S. (2002b). Estimation of the size and mean value of a stigmatized
characteristic of a hidden gang in a finite population: a unified approach. Ann. Inst. Math. Stat., 54(3),
659--666.
[ 44 ] Arnab, R. and Singh, S. (2002c). On the estimation of size and mean value of a stigmatized
characteristic of a hidden gang in finite populations. Recent Advances in Statistical Methods--Proceedings
of Statistics 2001 Concordia University Conference, 1--11.
[ 45 ] Arnab, R. and Singh, S. (2002d). Estimation of variance from missing data. Presented at Statistical
Society of Canada Conference at Hamilton, Canada.
[ 46 ] Arnab, R. and Singh, S. (2002e). Jackknifing the imputed data in addition to observed data while
estimating variance of the ratio estimator. Working paper.
[ 47 ] Arnholt, A.T. and Hebert, J.L. (1995). Estimating the mean with known coefficient of variation.
American Statistician, 49(4), 367--369.
[ 48 ] Artes, E. and Garcia, A. (2000a). A note on successive sampling using auxiliary information.
Proceedings of the 15th International Workshop on Statistical Modelling, 376--379.
[ 49 ] Artes, E. and Garcia, A. (2000b). Sobre muestreo en ocasiones sucesivas. Actas del IX congreso
sobre ensenanza y aprendizaje de las Matematicas, 153--155.
[ 50 ] Artes, E. and Garcia, A. (2001a). Metodo diferencia multivariate en muestreo en dos ocasiones.
VIII Conferencia Espanola de Biometria, 199--200.
[ 51 ] Artes, E. and Garcia, A. (2001b). Successive sampling for the ratio of population parameters.
Journal of the Portuguese National Statistical Institute, in press.
[ 52 ] Artes, E. and Garcia, A. (2001c). Estimating the current mean in successive sampling using a
product estimate. Conference on Agricultural and Environmental Statist. Application in Rome,
XLIII-1--XLIII-2.
[ 54 ] Artes, E. and Garcia, A. (2001e). Estimation of current population ratio in successive sampling. J.
Indian Soc. Agric. Statist., 54(3), 342--354.
[55] Artes, E., Rueda, M. and Arcos, A. (1998). Successive sampling using a product estimate. Applied
sciences and the environment. Computational Mechanics Publications, 85--90.
[ 56 ] Asok, C. (1974). Contribution to the theory of unequal probability sampling without replacement.
Unpublished Ph.D. Thesis, Iowa State University, Ames, Iowa.
[ 57 ] Asok, C. (1980). A note on the comparison between simple mean and mean based on distinct units
in sampling with replacement. American Statistician, 34, 158.
[ 58 ] Asok, C. and Sukhatme, B.V. (1975). Unequal probability sampling with random stratification.
Proc. Amer. Statist. Assoc., 283--288.
[ 59 ] Asok, C. and Sukhatme, B.V. (1976a). On the efficiency comparison of two πps sampling
strategies. Proc. Amer. Statist. Assoc., 161--166.
[ 60 ] Asok, C. and Sukhatme, B.V. (1976b). On Sampford's procedure of unequal probability sampling
without replacement. J. Amer. Statist. Assoc., 71, 912--918.
[ 61 ] Asok, C. and Sukhatme, B.V. (1978). A note on Midzuno Scheme of sampling. Paper presented at
the 32nd Annual Conference of the Indian Soc. Agricul. Statist., New Delhi, India.
[ 62 ] Avdhani, M.S. (1968). Contribution to the theory of sampling from finite population and its
application. Ph.D. thesis, Delhi University.
[ 63 ] Bahadur, R.R. (1954). Sufficiency and statistical decision functions. Ann. Math. Statist., 25, 423--462.
[ 64 ] Bandyopadhyay, S. (1980). Improved ratio and product estimators. Sankhyā, C, 42, 45--49.
[ 65 ] Bandyopadhyay, S., Chattopadhyaya, A.K. and Kundu, S. (1977). On estimation of population
total. Sankhyā, C, 39, 28--42.
[ 66 ] Bankier, M.D. (1986). Estimators based on several stratified samples with application to multiple-
frame surveys. J. Amer. Statist. Assoc., 81, 1074--1079.
[ 67 ] Bankier, M.D. (1988). Power allocations: Determining sample sizes for sub-national areas.
American Statist., 42(3), 174--178.
[ 68 ] Bansal, M.L. and Singh, R. (1985). An alternative estimator for multiple characteristics in PPS
sampling. J. Statist. Planning Infer., 11, 313--320.
[ 69 ] Bansal, M.L. and Singh, R. (1986). On the generalization of Rao, Hartley and Cochran's scheme.
Metrika, 33, 307--314.
[ 70 ] Bansal, M.L. and Singh, R. (1989). An alternative estimator for multiple characteristics
corresponding to Horvitz and Thompson estimator in probability proportional to size and without
replacement sampling. Statistica, anno. XLIX, 3, 447--452.
[ 71 ] Bansal, M.L. and Singh, R. (1990). An alternative estimator for multiple characteristics in RHC
sampling scheme. Commun. Statist.--Theory Meth., 19(5), 1777--1784.
[ 72 ] Bansal, M.L., Singh, S. and Singh, R. (1994). Multi-character survey using randomized response
technique. Commun. Statist.--Theory Meth., 23(6), 1705--1715.
[ 73 ] Barnard, J. and Rubin, D.B. (1999). Small sample degree of freedom with multiple imputation.
Biometrika, 86(4), 948--955.
[ 74 ] Bartholomew, D.J. (1961). A method of allowing for "not at home" bias in sample surveys. Applied
Statist., 10, 52--59.
[ 75 ] Bartlett, R.F. (1986). Estimating the total of a continuous population. J. Statist. Planning Infer., 13,
51--66.
[76] Basawa, I.V., Godambe, V.P. and Taylor, R.L. (1997). Selected proceedings of the symposium on
estimating functions. Lecture Notes - Monograph Series, Institute of Mathematical Statistics, Hayward,
California.
[ 77 ] Basu, D. (1958). On sampling with and without replacement. Sankhyā, 20, 287--294.
[ 78 ] Basu, D. (1971). An essay on the logical foundations of survey sampling. Part one. In: V.P.
Godambe and D.A. Sprott (eds.) Foundations of statistical inferences. Holt, Rinehart and Winston,
Toronto, 203--242.
[ 79 ] Battese, G.E., Harter, R.M. and Fuller, W.A. (1988). An error components model for prediction of
county crop areas using surveys and satellite data. J. Amer. Statist. Assoc., 83, 28--36.
[ 80 ] Bayless, D.L. and Rao, J.N.K. (1970). An empirical study of stabilities of estimators and variance
estimators in unequal probability sampling (n=3 or 4). J. Amer. Statist. Assoc., 65, 1645--1667.
[81] Beale, E.M.L. (1962). Some use of computers in operational research. Industrielle Organ., 31, 27--28.
[ 82 ] Bedi, P.K. (1995). An alternative estimator in Midzuno scheme for multiple characteristics.
Commun. Statist.--Simula., 17--30.
[ 83 ] Bedi, P.K. and Agarwal, S.K. (1999). Modified Midzuno scheme of sampling. J. Statist. Planning
Infer., 76, 203--214.
[ 84 ] Bedi, P.K. and Rao, T.J. (1996). Probability proportional to revised sizes with replacement scheme.
Metron, 67--82.
[ 85 ] Bedi, P.K. and Rao, T.J. (2001). PPS method of estimation under a transformation. J. Indian Soc.
Agric. Statist., 54(2), 184--195.
[87] Bellhouse, D.R. (1984b). A review of optimal designs in survey sampling. Canadian J. Statist., 12,
53--65.
[ 89 ] Bellhouse, D.R. and Rao, J.N.K. (1975). Systematic sampling in the presence of a trend.
Biometrika, 62, 694--697.
[90] Bellhouse, D.R. and Rao, J.N.K. (1986). On the efficiency of prediction estimators in two-stage
sampling. J. Statist. Planning Infer., 13, 269--281.
[91] Bennett, B.M. (1983). Alternate estimates in stratified sampling. Metron, 77--82.
[ 92 ] Bennett, B.M. and Islam, M.A. (1983). On relative precision in stratified sampling for proportions.
Metron, 19--22.
[ 93 ] Bethlehem, J.G. and Keller, W.J. (1987). Linear weighting of sample survey data. J. Official Statist.,
141--153.
[ 94 ] Bhargava, M. (1996). An investigation into the efficiencies of certain randomized response
strategies. Unpublished Ph.D. thesis submitted to Punjab Agricultural University, Ludhiana, India.
[ 95 ] Bhargava, M. and Singh, R. (2000). A modified randomization device for Warner's model.
Statistica, 60, 315--321.
[ 97 ] Bhargava, M. and Singh, R. (2002). On the efficiency comparison of certain randomized response
strategies. Metrika, 55(3), 191--197.
[ 98 ] Bhargava, N.K. (1978). On some applications of the technique of combined unordering. Sankhyā,
C, 40, 74--83.
[99] Bhatia, A., Mangat, N.S., and Morrison, T. (1998). Estimation of measurement errors. Proceedings
of the International Pipeline Conference 1998, Calgary, Canada. American Society of Mechanical
Engineers, 1, 315--325.
[ 102 ] Binder, D.A. and Theberge, A. (1988). Estimating the variance of raking ratio estimators.
Canadian J. Statist., 16, 47--55.
[ 103 ] Binder, D.A. and Patak, Z. (1994). Use of estimating functions for estimation from complex
surveys. J. Amer. Statist. Assoc., 89, 1035--1043.
[ 104 ] Biradar, R.S. and Singh, H.P. (1992a). A class of estimators for finite population correlation
coefficient using auxiliary information. J. Indian Soc. Agril. Statist., 44, 271--285.
[ 105 ] Biradar, R.S. and Singh, H.P. (1992b). A note on an almost unbiased ratio cum product estimator.
Metron, 249--255.
[ 106 ] Biradar, R.S. and Singh, H.P. (1997-98). A class of estimators for population parameter using
supplementary information. Aligarh J. Statist., 17/18, 54--71.
[ 107 ] Biradar, R.S. and Singh, H.P. (1998). Predictive estimation of finite population variance. Calcutta
Statist. Assoc. Bull., 48, 229--235.
[ 108 ] Blackwell, D. (1947). Conditional expectation and unbiased sequential estimation. Ann. Math.
Statist., 18, 105--110.
[ 109 ] Blackwell, D. (1951). Comparison of experiments. Proc. 2nd Berkeley symp. Math. Stat. Prob.,
93--102.
[ 110 ] Blight, B.J.N. (1973). Sampling from an autocorrelated finite population. Biometrika, 60, 375--385.
[ 111 ] Bose, C. (1943). Note on the sampling error in the method of double sampling. Sankhyā, 6, 330.
[ 112 ] Bogue, D.J. (1950). A technique for making extensive postcensus estimates. J. Amer. Statist.
Assoc., 45, 149--163.
[ 113 ] Bourke, P.O. (1981). On the analysis of some multivariate randomized response designs for
categorical data. J. Statist. Planning Infer., 5, 165--170.
[ 114 ] Bourke, P.O. and Dalenius, T. (1976). Some new ideas in the realm of randomized enquiries. Int.
Statist. Rev., 44, 219--221.
[ 115] Bouza, C. (1994). The use of auxiliary information for solving non-response problems. Test, 3,
113--122.
[ 116 ] Brackstone, G.J. (1987). Small area data: policy issues and technical challenges. In Small Area
Statistics (R. Platek, J.N.K. Rao, C.E. Sarndal and M.P. Singh eds.) 3--20, Wiley, New York.
[ 117 ] Brackstone, G.J. and Rao, J.N.K. (1979). An investigation of raking ratio estimators. Sankhyā,
C, 41, 97--114.
[ 118 ] Bratley, P., Fox, B.L. and Schrage, L.E. (1983). A Guide to Simulation. NY: Springer--Verlag.
[ 119 ] Breau, P. and Ernst, L.R. (1983). Alternative estimators to the current composite estimator. Proc.
of the Section on Survey Research Methods, Amer. Statist. Assoc., 397--402.
[ 120 ] Breidt, F.J. and Opsomer, J.D. (2000). Local polynomial regression estimators in survey sampling.
Ann. Statist., 28(4), 1026--1053.
[ 121 ] Brewer, K.R.W. (1963a). Ratio estimation and finite populations: Some results deducible from the
assumption of an underlying stochastic process. Austral. J. Statist., 5, 93--105.
[ 122 ] Brewer, K.R.W. (1963b). A model of systematic sampling with unequal probabilities. Austral. J.
Statist., 5, 5--13.
[ 123 ] Brewer, K.R.W. (1967). A note on Fellegi's method of sampling without replacement with
probabilities proportional to size. J. Amer. Statist. Assoc., 62, 79--85.
[ 124 ] Brewer, K.R.W. (1975). A simple procedure for sampling πpswor. Austral. J. Statist., 17, 166--172.
[ 125 ] Brewer, K.R.W. (1979). A class of robust sampling designs for large scale surveys. J. Amer.
Statist. Assoc., 74, 911--915.
[ 126 ] Brewer, K.R.W. (1994). Survey sampling inference: Some past perspectives and present
prospects. Pak. J. Statist., A, 10, 213--233.
[ 127 ] Brewer, K.R.W. (1995). Combining design based and model based inference. Chapter 30 in Business
Survey Methods (Eds. B.G. Cox, D.A. Binder, B.N. Chinnappa, A. Christianson, M.J. Colledge and P.S.
Kott). New York: Wiley, 589--606.
[ 128 ] Brewer, K.R.W. (1999a). Cosmetic calibration with unequal probability sampling. Survey
Methodology, 25(2), 205--212.
[ 129 ] Brewer, K.R.W. (1999b). Design based or prediction based inference? Stratified random vs.
Stratified balanced sampling. Int. Statist. Rev., 67, 35--47.
[ 131 ] Brewer, K.R.W. and Hanif, M. (1970). Durbin's new multistage variance estimator. J. R. Statist.
Soc., B, 32, 302--311.
[ 132 ] Brewer, K.R.W. and Hanif, M. (1983). Sampling with unequal probabilities. New York:
Springer--Verlag.
[ 133] Brewer, K.R.W. and Undy, G.C. (1962). Samples of two units drawn with unequal probabilities
without replacement. Austral. J. Statist., 4, 89--100.
[ 134 ] Brewer, K.R.W., Early, L.J. and Hanif, M. (1984). Poisson, modified Poisson and collocated
sampling. J. Statist. Planning Infer., 10, 15--30.
[ 135] Brillinger, D.R., Jones, L.V. and Tukey, J.W. (1978). Report of the statistical task force for the
weather modification advisory board. The Management of Western Resources. Vol. II: The Role of
Statistics on Weather Resources Management. Government Printing Office, Washington, DC.
[ 136] Brown, B.M., Hall, P. and Young, G.A. (2001). The smoothed median and the bootstrap.
Biometrika, 88(2), 519--534.
[ 137 ] Brown, J.A. (1996). The relative efficiency of adaptive cluster sampling for ecological surveys.
Mathematical and Information Sciences Reports, series B, 96/08, Massey University.
[ 138 ] Brown, J.A. (1999). A comparison of two adaptive sampling designs. Austral. & New Zealand J.
Statist., 41(4), 395--403.
[ 139 ] Bryant, E.C., Hartley, H.O. and Jessen, R.J. (1960). Design and estimation in two-way
stratification. J. Amer. Statist. Assoc., 55, 105--124.
[ 140 ] Buckland, W.R. (1951). A review of the literature of the systematic sampling. J. R. Statist. Soc.,
B, 13, 208--215.
[ 141 ] Burdick, R.K. and Sielken, R.L. (1979). Variance estimation based on a superpopulation model
in two stage sampling. J. Amer. Statist. Assoc., 74, 438--440.
[ 142 ] Carlin, B.P. and Gelfand, A.E. (1990). Approaches for empirical Bayes confidence intervals. J.
Amer. Statist. Assoc., 85, 105--114.
[ 143 ] Carlin, B.P. and Gelfand, A.E. (1991). A sample re-use method for accurate parametric empirical
Bayes confidence intervals. J.R. Statist. Soc., B, 53, 189--200.
[ 144 ] Casady, R.J. and Lepkowski, J.M. (1993). Stratified telephone survey designs. Survey
Methodology, 19, 103--113.
[ 145 ] Cassel, C.M. and Sarndal, C.E. (1974). Evaluation of some sampling strategies using a
continuous variable framework. Commun. Statist.--Theory Meth., 3, 373--390.
[ 146 ] Cassel, C.M., Sarndal, C.E. and Wretman, J.H. (1976). Some results on generalized difference
estimation and generalized regression estimation for finite populations. Biometrika, 63, 615--620.
[ 147 ] Cassel, C.M., Sarndal, C.E. and Wretman, J.H. (1977). Foundations of Inference in Survey
Sampling. John Wiley and Sons, New York.
[ 148 ] Cassel, C.M., Sarndal, C.E. and Wretman, J.H. (1979). Some uses of statistical models in
connection with the non-response problem. Symposium on Incomplete Data, Preliminary Proc.,
Washington, D.C.
[ 149 ] Causey, B.D. (1972). Sensitivity of raked contingency table totals to changes in problem
conditions. Ann. Math. Statist., 43, 656--658.
[ 150 ] Causeur, D. (1999). Exact distribution of the regression estimator in double sampling. Statistics,
32, 297--315.
[ 151 ] Cebrian, A.A. and Garcia, M.R. (1997). Variance estimation using auxiliary information: An
almost unbiased multivariate ratio estimator. Metrika, 45, 171--178.
[ 152 ] Ceccon, C., Diana, G. and Salvan, A. (1991). Approccio classico al campionamento da
popolazioni finite: Alcuni risultati recenti, CLEUP, Padova.
[ 153 ] Chakrabarty, M.C. (1963). On the use of incidence matrices in sampling from finite populations.
J. Indian Statist. Assoc., 1, 78--85.
[ 154 ] Chakrabarty, R.P. (1968). Contribution to the theory of ratio type estimators. Ph.D. Thesis, Texas
A and M University.
[ 155 ] Chakrabarty, R.P. (1979). Some ratio type estimators. J. Indian Soc. Agril. Statist., 31, 49--62.
[ 156 ] Chand, L. (1975). Some ratio type estimators based on two or more auxiliary variables. Ph.D.
thesis submitted to Iowa State University, Ames, Iowa.
[ 157 ] Chang, H.J. and Liang, D.H. (1996). A randomized response procedure for two unrelated sensitive
questions. J. Information & Optimization Sci., 17(1), 185--198.
[ 158 ] Chang, H.J. and Huang, K.C. (2001a). On construction of almost unbiased estimators of finite
population mean using transformed auxiliary variable. Statistical Papers, 42(4), 505--515.
[ 159 ] Chang, H.J. and Huang, K.C. (2001b). Estimation of proportion and sensitivity of a qualitative
character. Metrika, 53(2), 269--280.
[ 160 ] Chang, K.C., Han, C.P. and Hawkins, D.L. (1999). Truncated multiple inverse sampling in post-
stratification. J. Statist. Planning Infer., 76, 215--234.
[ 161 ] Chang, K.C., Liu, J.F. and Han, C.P. (1998). Multiple inverse sampling in post-stratification. J.
Statist. Planning Infer., 69, 209--227.
[ 162 ] Chao, M.T. (1982). A general purpose unequal probability sampling plan. Biometrika, 69, 653--656.
[ 163 ] Chatterjee, S. and Simon, G. (1993). Confidentiality guaranteed: A non-invasive procedure for
collecting sensitive information. Commun. Statist.--Theory Meth., 22(6), 1629--1651.
[ 164 ] Chaubey, Y.P. and Crisalli, A.N. (1995). Adjustment of the inclusion probabilities in case of non-
response. Statistical Society of Canada, Proceedings of the Survey Methods Section, 75--79.
[ 165 ] Chaudhuri, A. (1974). On some properties of sampling scheme due to Midzuno. Calcutta Statist.
Assoc. Bull., 23, 1--9.
[ 166 ] Chaudhuri, A. (1975a). A simple method of sampling without replacement with inclusion
probabilities exactly proportional to size. Metrika, 22, 147--152.
[ 167 ] Chaudhuri, A. (1975b). Some results concerning Horvitz and Thompson's T_1-class of estimators.
Metrika, 217--223.
[ 168 ] Chaudhuri, A. (1976). A non-negativity criterion for a certain variance estimator. Metrika, 23,
201--205.
[ 169 ] Chaudhuri, A. (1977). On some properties of the Horvitz and Thompson estimator based on
Midzuno's πps sampling scheme. J. Indian Soc. Ag. Statist., 47--52.
[ 171 ] Chaudhuri, A. (1992). Small domain statistic: a review. Technical Report ASC/92/2, Indian
Statistical Institute, Calcutta.
[ 172 ] Chaudhuri, A. (1993). Mean square error estimation in randomized response surveys. Pak. J.
Statist., A, 9, 101--104.
[ 173 ] Chaudhuri, A. (1997). On a pragmatic modification of survey sampling in three stages. Commun.
Statist.--Theory Meth., 26(7), 1805--1810.
[ 174 ] Chaudhuri, A. (2001). Using randomized response from a complex survey to estimate a sensitive
proportion in a dichotomous finite population. J. Statist. Planning Infer., 94, 37--42.
[ 175 ] Chaudhuri, A. and Adhikary, A.K. (1983). On optimality of double sampling strategies with
varying probabilities. J. Statist. Planning Infer., 8, 257--265.
[ 176 ] Chaudhuri, A. and Adhikary, A.K. (1985). Some results on admissibility and uniform
admissibility in double sampling. J. Statist. Planning Infer., 12, 199--202.
[ 177 ] Chaudhuri, A. and Adhikary, A.K. (1987). Circular systematic sampling with varying
probabilities. Calcutta Statist. Assoc. Bull., 36, 193--195.
[ 178 ] Chaudhuri, A. and Adhikary, A.K. (1990). Variance estimation with randomized response.
Commun. Statist.--Theory Meth., 19(3), 1119--1125.
[ 179 ] Chaudhuri, A., Adhikary, A.K. and Dihidar, S. (2000). Mean square error estimation in multi-stage
sampling. Metrika, 52, 2, 115--131.
[ 180 ] Chaudhuri, A., Adhikary, A.K. and Seal, A.K. (1997). Small domain estimation by empirical
Bayes and Kalman filtering procedures--A case study. Commun. Statist.--Theory Meth., 26(7),
1613--1621.
[ 181 ] Chaudhuri, A. and Arnab, R. (1977). On the relative efficiencies of a few strategies of sampling
with varying probabilities on two occasions. Calcutta Statist. Assoc. Bull., 26, 25--38.
[ 182 ] Chaudhuri, A. and Arnab, R. (1978). On the role of sample size in determining efficiency of
Horvitz and Thompson estimators. Sankhyā, C, 40, 104--109.
[ 183 ] Chaudhuri, A. and Arnab, R. (1979). On the relative efficiencies of sampling strategies under a
superpopulation model. Sankhyā, C, 41, 40--43.
[ 184 ] Chaudhuri, A. and Arnab, R. (1982). On unbiased variance estimators with various multi-stage
sampling strategies. Sankhyā, B, 44, 92--101.
[ 185 ] Chaudhuri, A. and Maiti, T. (1994). Variance estimation in model assisted survey sampling.
Commun. Statist.--Theory Meth., 23(4), 1203--1214.
[ 186 ] Chaudhuri, A., Maiti, T. and Roy, D. (1996). A note on competing variance estimators in
randomized response surveys. Austral. J. Statist., 38(1), 35--42.
[ 187 ] Chaudhuri, A. and Mitra, J. (1992). A note on two variance estimators for Rao--Hartley--Cochran
estimator. Commun. Statist.--Theory Meth., 21(12), 3535--3543.
[ 188 ] Chaudhuri, A. and Mukerjee, R. (1988). Randomized response: Theory and techniques. Marcel
Dekker, New York.
[ 189 ] Chaudhuri, A. and Roy, D. (1997a). Optimal variance estimation for generalized regression
predictor. J. Statist. Planning Infer., 60, 139--151.
[ 190 ] Chaudhuri, A. and Roy, D. (1997b). Model assisted survey sampling strategies with randomized
response. J. Statist. Planning Infer., 60, 61--68.
[ 191 ] Chaudhuri, A. and Vos, J.W.E. (1988). Unified theory and strategies of survey sampling. North
Holland.
[ 192 ] Chen, J. and Shao, J. (2001). Jackknife variance estimation for nearest neighbour imputation. J.
Amer. Statist. Assoc., 96, 260--269.
[ 193 ] Chen, J. and Qin, J. (1993). Empirical likelihood estimation for finite populations and the
effective usage of auxiliary information. Biometrika, 80, 107--116.
[ 194 ] Chen, J., Rao, J.N.K. and Sitter, R.R. (2000). Efficient random imputation for missing data in
complex surveys. Statistica Sinica, 10(4), 1153--1169.
[ 195 ] Chen, J., Sitter, R.R., and Wu, C. (2002). Using empirical likelihood methods to obtain range
restricted weights in regression estimators for surveys. Biometrika, 89, 230--237.
[ 196 ] Chen, S.X. (1998). Weighted polynomial models and weighted sampling schemes for finite
population. Annals of Statistics, 26, 5, 1894--1915.
[ 197 ] Chernick, M.R. and Wright, T. (1983). Estimation of population mean with two-way stratification
using a systematic allocation scheme. J. Statist. Planning Infer., 7, 219--231.
[ 198 ] Chotai, J. (1974). A note on Rao--Hartley--Cochran method for PPS sampling over two occasions.
Sankhyā, C, 36, 173--180.
[ 199 ] Choudhry, G.H. and Singh, M.P. (1979). Sampling with unequal probabilities and without
replacement--A rejective method. Survey Methodology, 5(2), 162--177.
[ 200 ] Christman, M. (1997). Efficiency of some sampling designs for spatially clustered populations.
Environmetrics, 8, 145--166.
[201] Christofides, T.C. (2003). A generalized randomized response technique. Metrika, 57, 195--200.
[ 202 ] Chromy, J.R. (1974). Pairwise probabilities in probability non-replacement sampling. Presented
at ASA meeting at St. Louis, Missouri, USA.
[ 203 ] Clayton, D., Dunn, G., Pickles, A. and Spiegelhalter, D. (1998). Analysis of longitudinal binary
data from multi-phase sampling (with discussion). J. R. Statist. Soc., B, 60, 71--80.
[ 204 ] Cochran, W.G. (1940). Some properties of estimators based on sampling scheme with varying
probabilities. Austral. J. Statist., 17, 22--28.
[205] Cochran, W.G. (1963). Sampling Techniques. John Wiley and Sons, New York.
[207] Cochran, W.G. (1977). Sampling Techniques. 3rd Ed. John Wiley & Sons, New York.
[ 208 ] Conti, P.L. (1995). A note on the estimation of a proportion in sampling finite populations.
Metron, 35--41.
[209] Cox, D.R. (1958). The Planning of Experiments. Wiley, New York.
[ 210 ] Cox, D.R. (1971). Discussion of Royall (1971). Foundations of Statistical Inference (V.P.
Godambe and D.A. Sprott, eds). Holt, Rinehart & Winston, Toronto, 275.
[ 211 ] Cox, D.R. (1984). Present position and potential developments: some personal views, design of
experiments and regression. J. Roy. Statist. Soc., A, 147, 306--315.
[212] Dalabehera, M. and Sahoo, L.N. (1995). Efficiency of six almost unbiased ratio estimators under a
particular model. Statistical Hefte, 36, 61--67.
[ 213 ] Dalabehera, M. and Sahoo, L.N. (1997). A class of estimators in stratified sampling with two
auxiliary variables. J. Indian Soc. Agril. Statist., 50(2), 144--149.
[ 214 ] Dalabehera, M. and Sahoo, L.N. (2000). An unbiased estimator in two-phase sampling using two
auxiliary variables. J. Indian Soc. Agric. Statist., 53(2), 134--140.
[ 215 ] Dalenius, T. (1950). The problem of optimum stratification. Skand. Akt., 33, 203--213.
[218] Dalenius, T. and Gurney, M. (1951). The problem of optimum stratification II. Skand. Akt., 34,
133--148.
[219] Dalenius, T. and Gurney, M. (1957). The choice of stratification points. Skand. Akt., 40, 198--203.
[ 220 ] Dalenius, T. and Hodges, J.L. (1957). The choice of stratification points. Skandinavisk
Aktuarietidskrift.
[ 221 ] Dalenius, T. and Hodges, J.L. (1959). Minimum variance stratification. J. Amer. Statist. Assoc.,
54.
[ 222 ] Das, A.C. (1950). Two dimensional systematic sampling. Sankhyā, 10, 95--108.
[223] Das, A.C. (1951). On two-phase sampling and sampling with varying probabilities. Bull. Int.
Statist. Inst., 33(2), 105--112.
[224] Das, A.K. (1982). On the use of auxiliary information in estimating proportions. J. Indian Statist.
Assoc., 20, 99--108.
[ 225 ] Das, A.K. and Tripathi, T.P. (1978). Use of auxiliary information in estimating finite population
variance. Sankhyā, C, 40, 139--148.
[ 226 ] Das, A.K. and Tripathi, T.P. (1979). A class of estimators for population mean when mean of an
auxiliary character is known. Math. Tech. Report No. 22/79, ISI, Calcutta.
[227] Das, A.K. and Tripathi, T.P. (1980). Sampling strategies for population mean when the coefficient
of variation of an auxiliary character is known. Sankhyā, C, 42, 76--86.
[ 228 ] Das, G. and Bez, K. (1995). Preliminary test estimators in double sampling with two auxiliary
variables. Commun. Statist.--Theory Meth., 24(5), 1211--1226.
[229] Das, K. (1982). Estimation of population ratio on two occasions. J. Indian Soc. Agric. Statist.,
34(2), 1--9.
[230] Datta, G.S., Day, B. and Basawa, I.V. (1999). Empirical best linear unbiased and empirical Bayes
prediction in multivariate small area estimation. J. Statist. Planning Infer., 75, 269--279.
[ 231 ] Datta, G.S., Day, B. and Maiti, T. (1998). A nested error regression model for multivariate
hierarchical Bayes estimation of small area means. Sankhyā, A, 60, 344--362.
[ 232 ] Datta, G.S. and Ghosh, M. (1991). Bayesian prediction in linear models: applications to small area
estimation. Annals of Statistics, 19, 1748--1770.
[ 233 ] Datta, G.S. and Lahiri, P. (1995). Robust hierarchical Bayes estimation of small area
characteristics in presence of covariates and outliers. J. Multivariate Analysis, 54, 310--328.
[ 234 ] Datta, G.S. and Lahiri, P. (2000). A unified measure of uncertainty of estimated best linear
unbiased predictors in small area estimation problems. Statistica Sinica, 10, 613--627.
[ 235 ] Datta, G.S., Lahiri, P., Maiti, T. and Lu, K.L. (1999). Hierarchical Bayes estimation of
unemployment rates for the states of the U.S. J. Amer. Statist. Assoc., 94, 1074--1082.
[236] David, I.P. and Sukhatme, B.V. (1974). On the bias and mean square error of the ratio estimator.
J. Amer. Statist. Assoc., 69, 464--466.
[ 237 ] Dayal, S. (1979). Use of estimates of proportions of stratum sizes and standard deviations in
allocation of sample to different strata under stratified random sampling. Sankhyā, C, 41, 159--175.
[ 238 ] Dayal, S. (1985). Allocation of sample using values of auxiliary characteristic. J. Statist. Planning
Infer., 11, 321--328.
[ 239 ] Deming, W.E. (1953). On a probability mechanism to attain an economic balance between the
resulting error of response and bias of non-response. J. Amer. Statist. Assoc., 48, 743--772.
[ 240 ] Deming, W.E. and Stephan, F.F. (1940). On a least square adjustment of a sampled frequency
table when the expected marginal totals are known. Ann. Math. Statist., 11, 427--444.
[ 241 ] Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum likelihood from incomplete data
via the EM algorithm (with discussion). J. R. Statist. Soc., B, 39, 1--38.
[ 242 ] Dempster, A.P., Rubin, D.B., and Tsutakawa, R.K. (1981). Estimation in covariance component
models. J. Amer. Statist. Assoc., 76, 341--353.
[ 243 ] Dempster, A.P. and Tomberlin, T.J. (1980). The analysis of census undercount from a post-
enumeration survey. Proceedings of the Conference on Census Undercount, 88--94.
[ 244 ] Deng, L.Y. and Wu, C.F.J. (1987). Estimation of variance of the regression estimator. J. Amer.
Statist. Assoc., 82, 568--575.
[ 245 ] Deshpande, M.N. (1978). A new sampling procedure with varying probabilities. J. Indian Soc.
Agric. Statist., 30, 110--114.
[ 246 ] Deshpande, M.N. (1980). A note on the comparison between simple random sampling with and
without replacement. Metrika, 27, 277--279.
[ 247 ] Deshpande, M.N. and Ajgaonkar, S.G.P. (1977). On multitrial sampling methods. Biometrika, 64,
422--424.
[ 248 ] Deshpande, M.N. and Ajgaonkar, S.G.P. (1987). A generalization of Midzuno sampling
procedure. Aust. J. Stat., 29, 188--192.
[ 249 ] Deville, J.C. and Goga, C. (2002). The Horvitz--Thompson theory for two samples. International
Conference on Improving Surveys, Copenhagen.
[250] Deville, J.C. and Sarndal, C.E. (1992). Calibration estimators in survey sampling. J. Amer. Statist.
Assoc., 87, 376--382.
[ 251 ] Deville, J.C. and Tille, Y. (1998). Unequal probability sampling without replacement through a
splitting method. Biometrika, 85(1), 89--101.
[ 252 ] Deville, J.C. and Tille, Y. (2000). Selection of several unequal probability samples from the
same population. J. Statist. Planning Infer., 86, 215--227.
[ 253 ] Dey, A. and Srivastava, A.K. (1987). A sampling procedure with inclusion probabilities
proportional to size. Survey Methodology, 13(1), 85--92.
[ 254 ] Diana, G. (1992). A study of kth order approximation of some ratio type strategies. Metron,
19--32.
[255] Dorfman, A.H. and Hall, P. (1993). Estimators of the finite population distribution function using
non-parametric regression. Annals of Statistics, 21(3), 1452--1475.
[ 256 ] Doss, D.C., Hartley, H.O. and Somayajulu, G.R. (1979). An exact small sample theory for post-
stratification. J. Statist. Planning Infer., 3, 235--248.
[ 257 ] Dowling, T.A. and Shachtman, R.H. (1975). On the relative efficiency of randomized response
models. J. Amer. Statist. Assoc., 70, 84--87.
[ 258 ] Draper, N.R. and Guttman, 1. (1968a). Some Bayesian stratified two-phase sampling results.
Biometrika, 55, 131--139.
[ 259 ] Draper, N.R. and Guttman, 1. (1968b). Bayesian stratified two-phase sampling results: k
characteristics . Biometrika, 55,587--589.
[260] Drew, D., Singh, M.P. and Choudhry, G.H. (1982). Evaluation of small area estimation techniques
for the Canadian labour force surveys. Survey Methodology, 8, 17--47.
[ 261 ] Dubey, V. (1993). An almost unbiased product estimator. J. Indian Soc. Agril. Statist., 45, 226--
229.
[262] Dubey, V. and Singh, S.K. (2001). An improved regression estimator for estimating population
mean. J. Indian Soc. Agric. Statist., 54(2), 179--183.
[263] Duncan, G.J. and Kalton, G. (1987). Issue of design and analysis of survey across time. Int.
Statist. Rev., 55, 97--117.
[ 264 ] Dupont, F. (1995). Alternative adjustments where there are several levels of auxiliary information.
Survey Methodology, 21, 125--135.
[265] Durbin, J. (1959). A note on the application of Quenouille's method of bias reduction to the
estimation of ratios. Biometrika, 46, 477--480.
[ 266 ] Durbin, J. (1967). Design of multi-stage survey for the estimation of sampling error. Applied
Statist., 16, 152--164.
[ 267 ] Eckler, A.R. (1955). Rotation sampling. Ann. Math. Stat., 26, 664--685.
[268] Eichhorn, B.H. and Hayre, L.S. (1983). Scrambled randomized response methods for obtaining
sensitive quantitative data. J. Statist. Planning Infer., 7, 307--316.
[269] Ekman, G. (1959). An approximation useful in univariate stratification. Ann. Math. Stat., 30,
219--229.
[270] Elliott, M.R., Little, R.J.A. and Lewitzky, S. (2000). Subsampling callbacks to improve survey
efficiency. J. Amer. Statist. Assoc., 95, 730--738.
[271] Eltinge, J.L. (1999). Accounting for non-Gaussian measurement error in complex survey
estimators of distribution functions and quantiles. Statist. Sinica, 9, 425--450.
[272] Ericson, W.A. (1969). Subjective Bayesian models in sampling finite populations. J. R. Statist.
Soc., 55, 587--589.
[273] Eriksson, S.A. (1973). A new model for randomized response. Int. Stat. Rev., 41, 101--103.
[274] Espejo, M.R. (1997). Uniqueness of the Zinger strategy with estimable variance: Rana-Singh
estimator. Sankhyā, B, 59, 76--83.
[275] Espejo, M.R. and Pineda, M.D. (1997). On variance estimation for poststratification: a review.
Metron, 209--220.
[ 276] Espejo, M.R., Pineda, M.D. and Nadarajah, S. (2003). Estimation of finite population parameters
with several realizations. Statistical Papers, 44 (2), 267--278.
[277] Estevao, Y.M. (1994). Calibration of g weights under calibration and bound constraints. Report,
Statistics Canada.
[278] Estevao, Y.M. and Sarndal, C.E. (2000). A functional form approach to calibration. J. Official
Statist., 16(4), 379--399.
[279] Estevao, Y.M. and Sarndal, C.E. (2002). The ten cases of auxiliary information for calibration in
two-phase sampling. J. Official Statist., 18(2), 233--255.
[280] Farrell, P.J. (1997). Empirical Bayes estimation of small area proportions based on ordinal
outcomes variables. Survey Methodology, 23, 119--126.
[281] Farrell, P.J. (2000). Bayesian inference for small area proportions. Sankhyā, B, 62, 402--416.
[282] Farrell, P.J., MacGibbon, B. and Tomberlin, T.J. (1994). Protection against outliers in empirical
Bayes estimation. Canad. J. Statist., 22, 365--376.
[283] Farrell, P.J., MacGibbon, B. and Tomberlin, T.J. (1997a). Bootstrap adjustments for empirical
Bayes interval estimates of small-area proportions. Canad. J. Statist., 25(1), 75--89.
[284] Farrell, P.J., MacGibbon, B. and Tomberlin, T.J. (1997b). Empirical Bayes estimators of small
area proportions in multistage designs. Statistica Sinica, 7, 1065--1083.
[285] Farrell, P.J., MacGibbon, B. and Tomberlin, T.J. (1997c). Empirical Bayes small area estimation
using logistic regression models and summary statistics. J. Business and Econo. Statist., 15, 101--108.
[286] Farrell, P.J. and Singh, S. (2002a). Recalibration of higher order calibration weights. Presented
at the Conference of the Statistical Society of Canada, Hamilton, Canada.
[287] Farrell, P.J. and Singh, S. (2002b). Penalized chi square distance function in survey sampling.
Joint Statistical Meetings. NY-Section on survey research method, pp. 963--968.
[288] Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist.,
21, 196--216.
[289] Fay, R. (1992). When are inferences from multiple imputation valid? Proc. Survey Res. Meth.
Sect., Amer. Statist. Assoc., 227--232.
[290] Fay, R. (1994). Discussion of paper by X.L. Meng. Statist. Sci., 9, 558--560.
[291] Fay, R. (1996). Alternative paradigms for the analysis of imputed survey data. J. Amer. Statist.
Assoc., 91, 490--498.
[292] Fay, R.E. and Herriot, R.A. (1979). Estimates of income for small places: An application of
James--Stein procedures to census data. J. Amer. Statist. Assoc., 74, 269--277.
[293] Fellegi, I.P. (1963). Sampling with varying probabilities without replacement: rotating and
non-rotating samples. J. Amer. Statist. Assoc., 58, 183--201.
[294] Fellegi, I.P. and Holt, D. (1976). A systematic approach to automatic editing and imputation. J.
Amer. Statist. Assoc., 71, 17--35.
[295] Fellegi, I.P. and Sunter, A.B. (1974). Balance between different sources of survey errors--some
Canadian experiences. Sankhyā, C, 36, 119--142.
[296] Feller, W. (1957). An introduction to probability theory and its applications. Vol. I, John Wiley
and Sons, New York.
[297] Feng, S. and Zou, G. (1997). Sample rotation method with auxiliary variable. Commun. Statist. --
Theory Meth., 26(6), 1497--1509.
[298] Fienberg, S.E. (1970). An iterative procedure for estimation in contingency tables. Ann. Math.
Statist., 41, 907--917.
[299] Fienberg, S.E. and Tanur, J.M. (1987). Experimental and sampling structures: parallel diverging
and meeting. Int. Statist. Rev., 55, 75--96.
[300] Finney, D.J. (1948). Random and systematic sampling in timber surveys. Forestry, 22, 1--36.
[301] Finney, D.J. (1950). An example of periodic variation in forest sampling. Forestry, 23, 96--111.
[302] Fisher, R.A. (1920). A mathematical examination of the methods of determining the accuracy of
an observation by the mean error, and by the mean square error. Monthly Notices R. Astr. Soc., 80,
759--770.
[303] Fisher, R.A. (1922). On the mathematical foundations of theoretical statistics. Phil. Trans. R. Soc.
Lond., A, 222, 309--368.
[304] Fisher, R.A. (1925). Statistical Methods for Research Workers. 1st Edition. Oliver and Boyd,
Edinburgh.
[305] Folsom, R.E., Greenberg, B.G., Horvitz, D.G. and Abernathy, J.R. (1973). The two alternate
questions randomized response model for human surveys. J. Amer. Statist. Assoc., 68, 525--530.
[306] Foreman, E.K. and Brewer, K.W.R. (1971). The efficient use of supplementary information in
standard sampling procedures. J. R. Statist. Soc., B, 33, 391--400.
[307] Fountain, R.L. and Pathak, P.K. (1989). Systematic and non-random sampling in the presence of
linear trends. Commun. Statist. -- Theory Meth., 18, 2511--2526.
[308] Francis, R.I.C. (1984). An adaptive strategy for stratified random trawl surveys. New Zealand J.
Mar. Freshw. Res., 18, 59--71.
[309] Francisco, C.A. and Fuller, W.A. (1991). Quantile estimation with a complex survey design. Ann.
Statist., 19, 454--469.
[310] Frankel, L.R. and Stock, J.S. (1942). On the sample survey of unemployment. J. Amer. Statist.
Assoc., 37, 77--80.
[311] Franklin, L.A. (1989a). A comparison of estimators for randomized response sampling with
continuous distributions from a dichotomous population. Commun. Statist. -- Theory Meth., 18, 489--505.
[312] Franklin, L.A. (1989b). Randomized response sampling from dichotomous populations with
continuous randomization. Survey Methodology, 15, 225--235.
[313] Freedman, D., Pisani, R. and Purves, R. (1978). Statistics. Norton, New York.
[314] Freund, J.E. (2000). Mathematical Statistics. Fifth Edition. Prentice Hall of India, New Delhi.
[315] Friedlander, D. (1961). A technique for estimating a contingency table, given the marginal totals
and some supplementary data. J. R. Statist. Soc., A, 124, 412--420.
[316] Fuller, W.A. (1966). Estimation employing post strata. J. Amer. Statist. Assoc., 61, 1172--1183.
[317] Fuller, W.A. (1970). Sampling with random stratum boundaries. J. R. Statist. Soc., 32, 209--226.
[318] Fuller, W.A. (1971). A procedure for selecting non-replacement unequal probability samples.
Unpublished Manuscript, Department of Statistics, Iowa State University, Ames, Iowa.
[319] Fuller, W.A. (1987). Measurement error models. John Wiley and Sons, Inc., New York.
[320] Fuller, W.A. (1995). Estimation in the presence of measurement error. Int. Statist. Rev., 63,
121--147.
[321] Fuller, W.A. (1998). Replication variance estimation for two-phase samples. Statistica Sinica, 8,
1153--1164.
[322] Fuller, W.A. and Breidt, F.J. (1999). Estimation for supplemented panels. Sankhyā, 51, 58--70.
[323] Fuller, W.A. and Burmeister, L.F. (1972). Estimators for samples selected from two overlapping
frames. Proceedings of the Social Statistics Section, American Statistical Association, 245--249.
[324] Gabler, S. (1981). A comparison of Sampford's sampling procedure versus unequal probability
sampling with replacement. Biometrika, 68, 725--727.
[325] Gabler, S. (1984). On unequal probability sampling: sufficient conditions for the superiority of
sampling without replacement. Biometrika, 71, 171--175.
[326] Gabler, S. and Horst, S. (1995). Improving the RHC--strategy. Statistical Hefte, 36, 327--336.
[327] Garcia, M.R. and Cebrian, A.A. (1996). Repeated substitution method: The ratio estimator for the
population variance. Metrika, 43, 101--105.
[328] Garcia, M.R. and Cebrian, A.A. (1998). Quantile interval estimation in finite population using a
multivariate ratio estimator. Metrika, 47, 203--213.
[329] Garcia, M.R. and Cebrian, A.A. (2001). On estimating the median from survey data using
multi-auxiliary information. Metrika, 54(1), 59--76.
[330] Gautschi, W. (1957). Some remarks on systematic sampling. Ann. Math. Statist., 28, 385--394.
[ 331 ] Gershunskaya, J., Eltinge, J.L. and Huff, L. (2002). Use of auxiliary information to evaluate a
synthetic estimator in the U.S. current employment statistic program. Joint Statistical Meetings. NY-
Section on survey research methods, 1149--1154.
[332] Ghangurde, P.D. and Rao, J.N.K. (1969). Some results on sampling over two occasions.
Sankhyā, A, 31, 463--472.
[333] Ghosh, M. and Meeden, G. (1997). Bayesian methods for finite population sampling. Chapman
and Hall.
[334] Ghosh, M. and Pathak, P.K. (1992). Current Issues in Statistical Inference: Essays in Honor of D.
Basu. Lecture Notes -- Monograph Series, Institute of Mathematical Statistics, Hayward, California.
[335] Ghosh, M. and Rao, J.N.K. (1994). Small area estimation: An appraisal. Statistical Science, 9(1),
55--93.
[336] Ghosh, S. (1998). The Horvitz--Thompson vs. Sen--Yates--Grundy variance Estimators: Issues in
finite population sampling. J. Indian Soc. Agric. Statist., 50, 2&3, 343--348.
[337] Ghosh, S.P. (1963). Post-cluster sampling. Ann. Math. Statist., 34, 587--597.
[338] Giffard--Jones, W. (1993). The doctor game. The Windsor Star, April 15, 1993.
[339] Giommi, A. (1984). A simple method for estimating individual response probabilities in sampling
from finite populations. Metrika, 185--200.
[340] Godambe, V.P. (1995a). Estimation of parameters in survey sampling: Optimality. Canad. J.
Statist., 23(3), 227--243.
[341] Godambe, V.P. (1955b). A unified theory of sampling from finite populations. J. R. Statist. Soc.,
B, 17, 269--278.
[342] Godambe, V.P. (1960). An optimum property of regular maximum likelihood estimation. Ann.
Math. Statist., 31, 1208--1211.
[343] Godambe, V.P. (1969). Admissibility and Bayes estimation in sampling finite populations -- V.
Ann. Math. Statist., 40, 672--676.
[344] Godambe, V.P. (1976). Conditional likelihood and unconditional optimum estimating equations.
Biometrika, 63, 277--284.
[345] Godambe, V.P. (1980a). On the sufficiency and ancillarity in the presence of nuisance parameters.
Biometrika, 67, 269--276.
[346] Godambe, V.P. (1980b). Estimation in randomized response trials. Int. Statist. Rev., 48, 29--32.
[347] Godambe, V.P. (1984). On ancillarity and Fisher information in presence of a nuisance parameters.
Biometrika, 71, 626--629.
[348] Godambe, V.P. (1987). Resolution of Godambe's paradox. Statist. Probab. Lett., 5, 239--239.
[ 349 ] Godambe, V.P. (1989). Estimation of cumulative distribution of survey population. Technical
Report STAT : 89--117, University of Waterloo.
[ 350 ] Godambe, V.P. (1991). Orthogonality of estimating functions and nuisance parameters.
Biometrika, 78, 143--151.
[351] Godambe, V.P. (1995). Estimation of parameters in survey sampling: Optimality. Canad. J.
Statist., 23(3), 227--243.
[352] Godambe, V.P. (1998). Estimation of parameters in survey sampling. J. Indian Soc. Agric. Statist.,
51(2-3), 315--330.
[353] Godambe, V.P. (1999). Linear Bayes and optimal estimation. Ann. Inst. Statist. Math., 51(2),
201--215.
[ 354 ] Godambe, V.P. and Heyde, C.C. (1987). Quasi likelihood and optimal estimation. Int. Statist.
Rev., 55, 231--244.
[355] Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling finite
populations -- I. Ann. Math. Statist., 36, 1707--1722.
[356] Godambe, V.P. and Kale, B.K. (1991). Estimating functions: an overview. In Estimating
Functions (V.P. Godambe ed.), Clarendon Press, Oxford, 3--20.
[357] Godambe, V.P. and Thompson, M.E. (1984). Robust estimation through estimating equations.
Biometrika, 71, 115--125.
[358] Godambe, V.P. and Thompson, M.E. (1986). Parameters of superpopulation and survey
population, their relationship and estimation. Int. Statist. Rev., 54, 127--138.
[359] Godambe, V.P. and Thompson, M.E. (1989). An extension of quasi-likelihood estimation (with
discussion). J. Statist. Planning Infer., 22, 137--172.
[360] Godambe, V.P. and Thompson, M.E. (1996-97). Optimal estimation in a causal framework. J.
Indian Soc. Agril. Statist., 49, 21--46.
[ 361 ] Godambe, V.P. and Thompson, M.E. (1999). A new look at confidence intervals in survey
sampling. Survey Methodology, 25, 161--173.
[362] Goel, B.B.P.S. and Singh, D. (1977). On the formation of clusters. J. Indian Soc. Agril. Statist.,
29, 53--68.
[363] Gonzalez, M.E. (1973). Use and evaluation of synthetic estimators. Proceedings of the Amer.
Statist. Assoc., Social Statistics Section, 33--36.
[364] Gonzalez, M.E. and Hoza, C. (1976). Small area estimation of unemployment. Proceedings of the
Section on Social Statistics, American Statistical Association, 437--443.
[365] Gonzalez, M.E. and Hoza, C. (1978). Small area estimation with application to unemployment
and housing estimates. J. Amer. Statist. Assoc., 73, 7--15.
[366] Goodman, L.A. and Hartley, H.O. (1958). The precision of unbiased ratio type estimators. J.
Amer. Statist. Assoc., 53, 491--508.
[367] Graf, M. (2002). Assessing the accuracy of the median in a stratified double stage cluster
sampling by means of a nonparametric confidence interval: Application to the Swiss earnings structure
survey. Proc. Joint Statistical Meetings, NY--Section on Government Statistics, 1223--1228.
[368] Greenberg, B.G., Abul-Ela, A.L.A., Simmons, W.R. and Horvitz, D.G. (1969). The unrelated
question randomized response model -- theoretical framework. J. Amer. Statist. Assoc., 64, 520--539.
[369] Greenberg, B.G., Kuebler, R.R., Abernathy, J.R. and Horvitz, D.G. (1971). Application of the
randomized response technique in obtaining quantitative data. J. Amer. Statist. Assoc., 66, 243--250.
[370] Grewal, I.S., Bansal, M.L. and Singh, S. (1999). An alternative estimator for multiple
characteristics using randomized response technique in PPS sampling. Aligarh J. Statist., 51--65.
[371] Grewal, I.S., Bansal, M.L. and Singh, S. (2002). Estimation of population mean of a stigmatized
quantitative variable using double sampling. Statistica (Accepted).
[372] Gross, S.T. (1980). Median estimation in sample surveys. Proc. Surv. Res. Meth. Sect., Amer.
Statist. Assoc., 181--184.
[373] Groves, R.M. (1996). Non-sampling error in surveys: the journey toward relevance in practice.
Proc. Statist. Can. Symp., 96, 7--14.
[374] Groves, R.M. and Lepkowski, J.M. (1986). An experimental implementation of a dual frame
telephone sample design. Proc. Sec. Survey Res. Meth., American Statistical Association, 340--345.
[375] Grubbs, F.E. (1948). On estimating precision of measuring instruments and product variability. J.
Amer. Statist. Assoc., 43, 243--264.
[376] Gujarati, D. (1978). Basic econometrics (International Student Edition). McGraw-Hill
International Book Company, Tokyo.
[377] Gupta, B.K. and Rao, T.J. (1997). Stratified PPS sampling and allocation of sample size. J. Indian
Soc. Agril. Statist., 50(2), 199--208.
[378] Gupta, J.P. (2002). Estimation of the correlation coefficient in probability proportional to size
with replacement sampling. Statistical Papers, 43(4), 525--536.
[379] Gupta, J.P. and Singh, R. (1990). A note on usual correlation coefficient in systematic sampling.
Statistica, 50, 255--259.
[380] Gupta, J.P., Singh, R. and Kashani, H.B. (1993). An estimator of the correlation coefficient in
probability proportional to size with replacement sampling. Metron, 165--177.
[381] Gupta, J.P., Singh, R. and Lal, B. (1978). On the estimation of the finite population correlation
coefficient -- I. Sankhyā, C, 41, 38--59.
[382] Gupta, J.P., Singh, R. and Lal, B. (1979). On the estimation of the finite population correlation
coefficient -- II. Sankhyā, C, 42, 1--39.
[383] Gupta, P.C. (1970). Some estimation problems in sampling using auxiliary information.
Unpublished Ph.D. thesis submitted to IARS, New Delhi.
[384] Gupta, P.C. (1978). On some quadratic and higher degree ratio and product estimator. J. Indian
Soc. Agril. Statist., 30, 71--80.
[385] Gupta, P.C. and Kothwala, N.H. (1990). A study of second order approximation for some product
type estimators. J. Indian Soc. Agril. Statist., 42, 171--185.
[386] Gupta, R.K., Singh, S. and Mangat, N.S. (1992-93). Some chain ratio type estimators for
estimating finite population variance. Aligarh J. Statist., 12&13, 65--69.
[387] Gupta, V.K. and Nigam, A.K. (1987). Mixed orthogonal arrays for variance estimation with
unequal number of primary selections per stratum. Biometrika, 74, 735--742.
[388] Gupta, V.K., Nigam, A.K. and Kumar, P. (1982). On a family of sampling schemes with inclusion
probability proportional to size. Biometrika, 69, 191--196.
[389] Gurney, M. and Jewett, R.S. (1975). Constructing orthogonal replications for standard errors. J.
Amer. Statist. Assoc., 70, 819--821.
[390] Hajek, J. (1958). Some contribution to the theory of probability sampling. Bull. Int. Statist. Inst.,
36, 127--134.
[391 ] Hajek, J. (1959). Optimum strategies and other problems in probability sampling. Casopis Pest.
Mat., 84, 387--423.
[392] Hajek, J. (1964). Asymptotic theory of rejective sampling with varying probability from a finite
population. Ann. Math. Stat., 35, 1491--1525.
[393] Halmos, P.R. and Perlman, M.D. (1974). On the existence of a minimal sufficient sub-field.
Ann. Statist., 2, 1049--1055.
[394] Hanif, M., Mukhopadhyay, P. and Bhattacharyya, S. (1993). On estimating the variance of
Horvitz and Thompson estimator. Pak. J. Statist., A, 9, 123--136.
[ 395 ] Hansen, M.H. and Hurwitz, W.N. (1942). Relative efficiencies of various sampling units in
population enquiries. J. Amer. Statist. Assoc., 37, 89--94.
[396] Hansen, M.H. and Hurwitz, W.N. (1943). On the theory of sampling from finite populations. Ann.
Math. Stat., 14, 333--362.
[397] Hansen, M.H. and Hurwitz, W.N. (1946). The problem of non-response in sample surveys. J.
Amer. Statist. Assoc., 41, 517--529.
[ 398 ] Hansen, M.H., Hurwitz, W.N. and Madow, W.G. (1953). Sample survey methods and theory.
New York, John Wiley and Sons, 456--464.
[ 399 ] Hanurav, T.V. (1965). Optimum sampling strategies and some related problems. Ph.D. Thesis,
Indian Statistical Institute.
[400] Hanurav, T.V. (1966). Some aspects of unified sampling theory. Sankhyā, A, 28, 175--204.
[401] Hanurav, T.V. (1967). Optimum utilization of auxiliary information: πps sampling of two units
from a stratum. J. R. Statist. Soc., B, 29, 374--391.
[402] Hartigan, J.A. (1969). Linear Bayesian methods. J. R. Statist. Soc., B, 31, 440--454.
[403] Hartley, H.O. (1962). Multiple frame surveys. Proc. of the Social Statistics Section, American
Statistical Association, 203--206.
[404] Hartley, H.O. (1966). Systematic sampling with unequal probability and without replacement. J.
Amer. Statist. Assoc., 61, 739--748.
[405] Hartley, H.O. (1974). Multiple frame methodology and selected applications. Sankhyā, C, 36,
99--118.
[406] Hartley, H.O. and Biemer, P.P. (1978). The estimation of non-sampling variances in current
surveys. Proc. Sec. Survey Res. Meth., American Statistical Association, 257--262.
[407] Hartley, H.O. and Rao, J.N.K. (1962). Sampling with unequal probabilities and without
replacement. Ann. Math. Statist., 33, 350--374.
[408] Hartley, H.O. and Rao, J.N.K. (1968). A new estimation theory for sample surveys. Biometrika,
55, 547--557.
[409] Hartley, H.O., Rao, J.N.K. and Kiefer, G. (1969). Variance estimation with one unit per stratum.
J. Amer. Statist. Assoc., 64, 841--851.
[410] Hartley, H.O. and Ross, A. (1954). Unbiased ratio estimators. Nature, 174, 270--271.
[ 411 ] Hedayat, A.S., Rao, C.R. and Stufken, J. (1988). Sampling plans excluding contiguous units. J.
Statist. Planning Infer., 19, 159--170.
[412] Heilbron, D.C. (1978). Comparison of estimators of the variance of systematic sampling.
Biometrika, 65, 429--433.
[413] Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a selection model.
Biometrics, 31, 423--447.
[414] Hendricks, W.A. (1944). The relative efficiencies of groups of farms as sampling units. J. Amer.
Statist. Assoc., 39, 366--376.
[415] Hendricks, W.A. (1949). Adjustment for bias caused by non-response in mailed surveys. Agric.
Econo. Res., 1, 52--56.
[416] Herzel, A. (1986). Sampling without replacement with unequal probabilities: sample designs with
pre-assigned joint inclusion probabilities of any order. Metron, 49--68.
[418] Hidiroglou, M.A. (1995). Sampling and estimation for stage one of the Canadian survey of
employment, payrolls and hours survey redesign. Statistical Society of Canada. Proc. of the Survey
Methods Section, 123--128.
[419] Hidiroglou, M.A. (2001). Double Sampling. Survey Methodology, 27, 143--154.
[420] Hidiroglou, M.A. and Sarndal, C.E. (1995). Use of auxiliary information for two-phase sampling.
Proc. Sec. Survey Res. Meth., Amer. Statist. Assoc., Vol. II, 873--878.
[421] Hidiroglou, M.A. and Sarndal, C.E. (1998). Use of auxiliary information for two-phase sampling.
Survey Methodology, 24(1), 11--20.
[422] Hodges, J.L. and Lehmann, E. (1970). Basic Concepts of Probability and Statistics. 2nd ed.
Holden--Day, San Francisco.
[423] Holt, D. and Smith, T.M.F. (1979). Post-stratification. J. R. Statist. Soc., A, 142, 33--46.
[424] Holt, D., Smith, T.M.F. and Tomberlin, T.J. (1979). A model based approach to estimation for
small subgroups of a population. J. Amer. Statist. Assoc., 74, 405--410.
[425] Horvitz, D.G. and Thompson, D.J. (1952). A generalisation of sampling without replacement from
a finite universe. J. Amer. Statist. Assoc., 47, 663--685.
[426] Huang, L.R. and Ernst, L.R. (1981). Comparison of an alternative estimator to the current
composite estimator in the Current Population Surveys. Proc. of the Amer. Statist. Assoc., Section on
Survey Research Methods, 303--308.
[427] Hutchison, M.C. (1971). A Monte Carlo comparison of some ratio estimators. Biometrika, 58,
313--321.
[428] Iachan, R. (1982). Systematic sampling: A critical review. Int. Stat. Rev., 50, 293--303.
[429] Ireland, C.T. and Kullback, S. (1968). Contingency table with given marginals. Biometrika, 55,
179--188.
[430] Isaki, C.T. (1983). Variance estimation using auxiliary information. J. Amer. Statist. Assoc., 78,
117--123.
[431] Isaki, C.T. and Fuller, W.A. (1982). Survey design under a regression superpopulation model. J.
Amer. Statist. Assoc., 77, 89--96.
[432] Jaech, J.L. (1981). Constrained expected likelihood estimates of precisions using Grubbs'
technique for two dimensional methods. Nuclear Materials Management Journal, X(2), 34--39.
[433] Jaech, J.L. (1985). Statistical analysis of measurement errors. John Wiley and Sons, New York.
[434] Jagers, P. (1986). Post-stratification against bias in sampling. Int. Statist. Rev., 54, 159--167.
[435] Jagers, P., Oden, A. and Trulsson, L. (1985). Post-stratification and ratio estimation: usages of
auxiliary information in survey sampling and opinion polls. Internat. Statist. Rev., 53, 221--238.
[ 436 ] Jain, R.K. (1987). Properties of estimators in simple random sampling using auxiliary variable.
Metron, 265--271.
[437] Jessen, R.J. (1942). Statistical investigation of a sample survey for obtaining farm facts. Iowa
Agricultural Experiment Station Research Bulletin, 104.
[438] Jhajj, H.S. and Srivastava, S.K. (1983). A class of PPS estimators of population mean using
auxiliary information. J. Indian Soc. Agril. Statist., 35, 57--61.
[439] John, S. (1969). On multivariate ratio and product estimators. Biometrika, 56, 533--536.
[440] Jolly, G.M. and Hampton, I. (1990). A stratified random transect design for acoustic surveys of
fish stocks. Canad. J. Fish. Aquat. Sci., 47, 1282--1291.
[441] Jones, R.G. (1980). Best linear unbiased estimators for repeated surveys. J. R. Statist. Soc., B, 42,
221--226.
[442] Joshi, V.M. (1966). Admissibility and Bayes estimation in sampling finite populations IV. Ann.
Math. Statist., 37, 1658--1678.
[443] Joshi, V.M. (1970). Note on the admissibility of the Sen--Yates--Grundy estimator and Murthy's
estimator and its variance estimator for samples of size two. Sankhyā, A, 32, 431--438.
[ 444 ] Kadilar, C. and Cingi, H. (2003). Ratio estimators in stratified random sampling. Biom. J., 45(2),
218--225.
[445] Kalton, G. and Anderson, D.W. (1986). Sampling rare populations. J. R. Statist. Soc., A, 65--82.
[446] Kalton, G. and Kasprzyk, J.R. (1986). The treatment of missing data. Survey Methodology,
105--110.
[447] Kapadia, S.B. and Gupta, P.C. (1984). A quadratic and higher degree ratio, product estimators in
sampling with varying probabilities. J. Statist. Res., 18, 1--18.
[448] Karlheinz, F. (1990). Stratified sampling using double sampling. Statist. Hefte, 31, 55--63.
[449] Kasprzyk, D., Duncan, G.J., Kalton, G. and Singh, M.P. Panel Surveys. Wiley, New York.
[450] Kathuria, O.P. (1975). Some estimators in two-stage sampling on successive occasions with partial
matching at both stages. Sankhyā, C, 37, 147--162.
[451] Kathuria, O.P. and Singh, D. (1971a). Comparison of estimates in two-stage sampling on
successive occasions. J. Indian Soc. Agril. Statist., 23, 31--51.
[452] Kathuria, O.P. and Singh, D. (1971b). Relative efficiencies of some alternative replacement
procedures in two-stage sampling on successive occasions. J. Indian Soc. Agril. Statist., 23, 101--114.
[453] Kaur, P. and Singh, O. (1982). A note on estimating variance in a finite population. J. Statist.
Res., 16(1&2), 51--54.
[454] Kempthorne, O. (1952). The Design and Analysis of Experiments. Wiley, New York.
[455] Kerkvliet, J. (1994). Estimating a logit model with randomized data: The case of cocaine use.
Austral. J. Statist., 36, 9--20.
[456] Khan, S.U. and Tripathi, T.P. (1967). The use of multi-auxiliary information in double sampling.
J. Indian Statist. Assoc., 5, 42--48.
[457] Khan, Z. (1976). Optimum allocation in Bayesian stratified two-phase sampling. J. Indian Soc.
Agril. Statist., 14, 65--74.
[458] Khare, B.B. (1987). Allocation in stratified sampling in presence of non-response. Metron,
213--221.
[459] Khare, B.B. (1991). Determination of sample sizes for a class of two-phase sampling estimators
for ratio and product of two population means using auxiliary character. Metron, 185--197.
[460] Khare, B.B. and Srivastava, S. (1981). A generalized regression ratio estimator for the population
mean using two auxiliary variables. Aligarh J. Statist., 1(1), 43--51.
[461] Khare, B.B. and Srivastava, S. (1997). Transformed ratio type estimators for the population mean
in the presence of nonresponse. Commun. Statist. -- Theory Meth., 26(7), 1779--1791.
[462] Khare, B.B. and Srivastava, S.R. (1998). Combined generalised chain estimators for ratio and
product of two population means using auxiliary characters. Metron, 56, 109--116.
[463] Kim, J. (1978). Randomized response techniques for surveying human populations. Unpublished
Ph.D. dissertation, Temple University, Philadelphia, USA.
[464] Kim, J.K. (2001). Variance estimation after imputation. Survey Methodology, 27, 75--83.
[465] Kiregyera, B. (1980). A chain ratio type estimator in finite population: double sampling using two
auxiliary variables. Metrika, 27, 217--223.
[466] Kiregyera, B. (1984). Regression type estimators using two auxiliary variables and the model of
double sampling from finite populations. Metrika, 31, 215--226.
[467] Kish, L. and Hess, I. (1959). A replacement procedure for reducing the bias of non-response.
American Statistician, 13, 17--19.
[468] Kokan, A.R. and Khan, S.U. (1967). Optimum allocation in multivariate surveys: An analytical
solution. J. R. Statist. Soc., B, 2, 115--125.
[469] Kolmogorov, A.N. (1942). Sur l'estimation statistique des paramètres de la loi de Gauss. Izv.
Akad. Nauk SSSR Ser. Mat., 6, 3--32.
[470] Konijn, H.S. (1973). Statistical Theory of Sample Survey Design and Analysis. North-Holland
Publishing Company.
[471] Konijn, H.S. (1979). Model free evaluation of the bias and the mean square error of the regression
estimator. Sankhyā, C, 41, 69--75.
[ 472 ] Konijn, H.S. (1981). Biases, variances and co-variances of raking ratio estimators for marginal
and cell totals and averages of observed characteristics. Metrika, 28, 109--121.
[473] Koop, J.C. (1967). Replicated (or interpenetrating) samples of unequal sizes. Ann. Math. Statist.,
38, 1142--1147.
[474] Koop, J.C. (1971). On splitting a systematic sample for variance estimation. Ann. Math. Statist.,
42, 3, 1084--1087.
[ 475 ] Korn, E.L. and Graubard, B.I. (1998). Variance estimation for superpopulation parameters.
Statistica Sinica, 8, 1131--1151.
[476] Kossack, C.F. and Shiledar--Bax, H.R. (1971). On designing of a unit--stratified survey design for
discrete set of observations. Int. Statist. Rev., 39, 46--56.
[ 477 ] Kothwala, N.H. and Gupta, P.C. (1989). Estimation of population mean with knowledge of
coefficient of variation with p--auxiliary variables. Metron, 107--119.
[ 478 ] Kott, P.S. (1988). Model based finite population correction for the Horvitz and Thompson
estimator. Biometrika, 75, 797--799.
[ 479 ] Kott, P.S. and Stukel, D.M. (1997). Can the Jackknife be used with a two-phase sample? Survey
Methodology, 23, 81--89.
[ 480 ] Kowar, R.M. (1996). One pass selection of a sample with probability proportional to aggregate
size. Sankhy a.B, 58, 80--83.
[481] Krewski, D. and Chakrabarty, R.P. (1981). On the stability of the Jackkinfe variance estimator in
ratio estimation. J. Statist. Planning Infer., 5, 71--78.
[ 482 ] Krewski, D. and Rao, J.N.K. (1981). Inference from stratified samples: properties of the
linearization, Jackknife and balanced repeated replication methods. Ann. Statist., 9, 1010--1019.
[483] Kuhn, H.W. and Tucker, A.W. (1952). Non-linear programming. Proceedings of the Second
Berkeley Symposium on Mathematical Statistics and Probability.
[484] Kuk, A.Y.C. (1990). Asking sensitive questions indirectly. Biometrika, 77(2), 436--438.
[485] Kuk, A.Y.C. and Mak, T.K. (1989). Median estimation in the presence of auxiliary information. J.
R. Statist. Soc., B, 51, 261--269.
[ 486] Kuk, A.Y.C. and Mak, T.K. (1994). A functional approach to estimating finite population
distribution functions. Commun. Statist. -- Theory Meth., 23(3), 883--896.
[ 487 ] Kulldorff, G. (1963). Some problems of optimum allocation for sampling on two occasions. Rev.
Inter. Statist. Inst., 31, 24--57.
[ 488 ] Kumar, E.V. and Srivenkataramana, T. (1994). A generalization of Midzuno--Sen sampling
scheme for finite populations. Commun. Statist. -- Theory Meth., 23(9), 2541--2559.
[ 489] Kumar, E.V., Srivenkataramana, T. and Srinath, K.P. (1996). Use of ranks in probability
proportional to size sampling. Commun. Statist. -- Theory Meth., 25(12), 3195--3215.
[ 490 ] Kumar, P. and Agarwal, S.K. (1997). Alternative estimators for the population totals in multiple
characteristic survey. Commun. Statist. -- Theory Meth., 26(10), 2527--2537.
[ 491 ] Kumar, P., Gupta, V.K. and Nigam, A.K. (1985). On inclusion probability proportional to size
sampling scheme. J. Statist. Planning Infer., 12, 127--131.
[ 492 ] Kumar, P. and Herzel, A. (1988). Estimating population totals in surveys involving multi-
characters. Metron, 33--47.
[ 493 ] Kumar, P., Srivastava, A.K. and Agarwal, S.K. (1986). A general class of unequal probability
sampling schemes. Statistica, 46, 67--74.
[ 494 ] Kumar, S. and Lee, H. (1983). Evaluation of composite estimation for the Canadian Labour Force
Survey. Survey Methodology, 9, 1--24.
[ 495 ] Laake, P. (1986). Optimal estimates and optimal predictors of finite population characteristics in
the presence of non-response. Metrika, 33, 69--77.
[496] Lahiri, D.B. (1951). A method for sample selection providing unbiased ratio estimates. Bull. Int.
Statist. Inst., 33(2), 133--140.
[ 497 ] Lahiri, P. and Rao, J.N.K. (1995). Robust estimation of mean squared error of small area
estimators. J. Amer. Statist. Assoc., 90, 758--766.
[ 498 ] Laird, N.M. and Louis, T.A. (1987). Empirical Bayes confidence intervals based on bootstrap
samples. J. Amer. Statist. Assoc., 82, 739--750.
[ 499] Lakshmi, D.V. and Raghavarao, D. (1992). A test for detecting untruthful answering in
randomized response procedure. J. Statist. Planning Infer., 31, 387--390.
[500] Lanke, J. (1975). On the choice of the unrelated question in Simmons' version of randomised
response. J. Amer. Statist. Assoc., 70, 80--83.
[501] Lanke, J. (1976). On the degree of protection in randomized interviews. Int. Statist. Rev., 44,
197--203.
[ 502] Lee, H. and Kim, J.K. (2002). Jackknife variance estimation for two-phase samples with high
sampling fractions. Joint Statistical Meetings, NY -- Section on Survey Research Methods, 2024--2028.
[ 503 ] Lee, H., Rancourt, E. and Sarndal, C.E. (1994). Experiments with variance estimation from
survey data with imputed values. J. Official Statist., 10(3), 231--243.
[504] Lee, H., Rancourt, E. and Sarndal, C.E. (1995a). Variance estimation in the presence of imputed
data for the generalized estimation system. Proc. of the American Statist. Assoc. (Social Survey Research
Methods Section), 384--389.
[505] Lee, H., Rancourt, E. and Sarndal, C.E. (1995b). Jackknife variance estimation for data with
imputed values. Statistical Society of Canada. Proceedings of the Survey Methods Section, 111--115.
[ 506 ] Lent, J., Miller, S.M. and Cantwell, P.J. (1996). Effect of composite weight on some estimates
from the current population surveys. Proc. of the Section on Survey Research Methods, Amer. Statist.
Assoc., 130--139.
[ 507 ] Leysieffer, F.W. and Warner, S.L. (1976). Respondent jeopardy and optimal designs in
randomized response models. J. Amer. Statist. Assoc., 71, 649--656.
[ 508 ] Linacre, S.J. and Trewin, D.J. (1993). Total survey design application to a collection of the
construction industry. J. Official Statist., 9, 611--621.
[ 509 ] Lindley, D.V. and Deely, J.J. (1993). Optimum allocation in stratified sampling with partial
information. Test, 2(1), 147--160.
[510] Little, R.J.A. and Yao, L. (1996). Intent to treat analysis for longitudinal studies with drop outs.
Biometrics, 52, 1324--1333.
[ 511 ] Lohr, S.L. and Rao, J.N.K. (1998). Jackknife variance estimation in dual frame surveys. Tech.
Rep., Laboratory for Research in Statistics and Probability, Carleton University.
[ 512 ] Lohr, S.L. and Rao, J.N.K. (2000). Inference from dual frame surveys. J. Amer. Statist.
Assoc., 95, 271--280.
[ 513 ] Lundstrom, S. (1997). Calibration as a standard method for treatment of non-response. Ph.D.
Thesis.
[ 514 ] Lundstrom, S. and Sarndal, C.E. (1999). Calibration as a standard method for treatment of
non-response. J. Official Statist., 15(2), 305--327.
[ 515 ] MacGibbon, B. and Tomberlin, T.J. (1989). Small area estimates of proportion via empirical
Bayes techniques. Survey Methodology, 15(2), 237--252.
[516] Madow, W.G. (1949). On the theory of systematic sampling -- II. Ann. Math. Statist., 20, 333--
354.
[517] Madow, W.G. (1953). On theory of systematic sampling -- III. Ann. Math. Statist., 24, 101--106.
[ 518 ] Madow, W.G. and Madow, L.H. (1944). On the theory of systematic sampling -- I. Ann. Math.
Statist., 15, 1--24.
[ 519 ] Mahajan, P.K. and Singh, S. (1996). On estimation of total in two stage sampling. J. Statist. Res.,
30, 127--131.
[ 520 ] Mahajan, P.K. and Singh, S. (1997). Almost unbiased ratio and product type estimators: A new
approach. Biom. J., 39(3), 509--516.
[ 521 ] Mahalanobis, P.C. (1940). A sample survey of acreage under jute in Bengal. Sankhyā, 4, 511--
530.
[ 522 ] Mahalanobis, P.C. (1942). General report on the sample census of area under jute in Bengal.
Indian Central Jute Committee.
[ 523 ] Mahalanobis, P.C. (1944). On large scale sample surveys. Phil. Transac. Roy. Soc., London, B,
231, 324--351.
[ 524 ] Mahalanobis, P.C. (1946). Recent developments in statistical sampling in the Indian Statistical
Institute. J. R. Statist. Soc., 109, 326--378.
[ 525 ] Mahmood, M., Singh, S. and Horn, S. (1998). On the confidentiality guaranteed under
randomized response sampling: A comparison with several new techniques. Biom. J., 40(2), 237--242.
[526] Mak, T.K. and Kuk, A.Y.C. (1993). A new method for estimating finite population quantiles
using auxiliary information. Canad. J. Statist., 21(1), 29--38 .
[527] Malec, D., Sedransk, J., Moriarity, C.L. and LeClere, F.B. (1997). Small area inference for binary
variables in the National Health Interview Survey. J. Amer. Statist. Assoc., 92, 815--826.
[528] Mandowara, V.L. and Gupta, P.C. (1999). Contribution to optimum points of stratification for
multi-stage designs. Metron, 57, 51--66.
[ 529 ] Mangat, N.S. (1991). An optional randomized response sampling technique using
non-stigmatized attribute. Statistica, LI, 595--602.
[ 530 ] Mangat, N.S. (1992). Two stage randomized response sampling procedure using unrelated
question. J. Indian Soc. Agril. Statist., 44(1), 82--87.
[ 531 ] Mangat, N.S. (1993). Estimation of population total using an alternative estimator for RHC
scheme. Statistica, 53, 251--259.
[ 532 ] Mangat, N.S. (1994). An improved randomized response strategy. J. R. Statist. Soc., B, 56,
93--95.
[ 533] Mangat, N.S. and Singh, R. (1990). An alternative randomized response procedure. Biometrika,
77, 439--442.
[ 534 ] Mangat, N.S. and Singh, R. (1991a). An alternative randomized response procedure for sampling
without replacement. J. Indian Statist. Assoc., 29(2), 127--131.
[535] Mangat, N.S. and Singh, R. (1991b). An alternative approach to randomized response survey.
Statistica, 51(3), 327--332.
[ 536] Mangat, N.S. and Singh, R. (1992--93). Sampling with varying probabilities without replacement:
A review. Aligarh J. Statist., 12 & 13, 75--106.
[ 537 ] Mangat, N.S. and Singh, R. (1995). A note on the inverse binomial randomized response
procedure. J. Indian Soc. Agril. Statist., 47(1), 21--25.
[ 538 ] Mangat, N.S., Singh, R. and Singh, S. (1991). Alternative estimators in randomised response
technique. Aligarh J. Statist., 11, 75--80.
[539] Mangat, N.S., Singh, R. and Singh, S. (1992). An improved unrelated question randomized
response strategy. Calcutta Statist. Assoc. Bull., 42, 277--281.
[540] Mangat, N.S., Singh, R. and Singh, S. (1995). Unrelated question randomised response model
without randomization device. Estadistica, 47, 59--68.
[541 ] Mangat, N.S., Singh, R. and Singh, S. (1997). Violation of respondent's privacy in Moors' model
-- its rectification through a random group strategy. Commun. Statist. -- Theory Meth., 26(3), 743--754.
[ 542 ] Mangat, N.S., Singh, R., Singh, S., Bellhouse, D.R. and Kashani, H.B. (1995). On efficiency of
estimator using distinct respondents in randomised response survey. Survey Methodology, 21(1), 21--23.
[ 543 ] Mangat, N.S., Singh, R., Singh, S. and Singh, B. (1993). On Moors' randomised response model.
Biom. J., 35(6), 727--732.
[ 544 ] Mangat, N.S. and Singh, S. (1994). An optional randomized response sampling technique. J.
Indian Statist. Assoc ., 32, 71--75.
[ 545 ] Mangat, N.S., Singh, S. and Singh, R. (1993). On the use of a modified randomization device in
randomised response inquiries. Metron, 51(1), 211--216.
[ 546 ] Mangat, N.S., Singh, S. and Singh, R. (1995). On use of a modified randomization device in
Warner's model. J. Indian Soc. Statist. Opers. Res., 16, 65--69.
[ 547 ] Manisha and Singh, R.K. (2001). An estimation of population mean in the presence of
measurement errors. J. Indian Soc. Agric. Statist., 54(1), 13--18.
[ 548 ] Manwani, A.H. and Singh, K.B. (1978). Studies in systematic sampling for two-dimensional finite
population with special reference to survey for estimation of guavas. J. Indian Soc. Agric. Statist., 30, 1,
82--93.
[549] Marker, D.A. (1983). Organization of small area estimators. Proc. Survey Research Method
Section. Amer. Statist. Assoc., Washington, D.C., 409--414.
[ 550 ] Mayor, I.A. (2002). Optimum cluster selection probabilities to estimate the finite population
distribution function under PPS cluster sampling. Test, 11(1), 73--88.
[ 551 ] McCarthy, M.D. (1939). On the application of the z-test to randomized blocks. Ann. Math.
Statist., 10, 337.
[552] McCarthy, P.J. (1969). Pseudo replication: Half samples. Rev. Int. Statist. Inst., 37, 239--264.
[ 553 ] McLeod, A.I. and Bellhouse, D.R. (1983). A convenient algorithm for drawing a simple random
sample. Appl. Statist., 32, 182--184.
[554] Meeden, G. (1992). Basu's contribution to the foundations of sample survey. Current issues in
statistical inference: Essays in Honor of D. Basu by Ghosh and Pathak. Lecture Notes -- Monograph
Series. Institute of Mathematical Statistics, Hayward, California, 17, 178--186.
[ 555 ] Meeden, G. (2000). A decision theoretic approach to imputation in finite population sampling. J.
Amer. Statist. Assoc., 95, 586--595.
[556] Meeden, G. and Ghosh, M. (1981). Admissibility in finite problems. Ann. Statist., 9, 846--852.
[ 557 ] Meeden, G. and Ghosh, M. (1983). Choosing between experiments: applications to finite
population sampling. Ann. Statist., 11, 296--305.
[ 558 ] Meng, X.L. (1994). Multiple imputation inferences with uncongenial sources of input (with
discussion). Statist. Sci., 9, 538--573.
[ 559 ] Mickey, M.R. (1959). Some finite population unbiased ratio and regression estimators. J. Amer.
Statist. Assoc., 54, 594--612.
[ 560 ] Midha, C.K. (1980). Contribution to survey sampling and design of experiments. Unpublished
Ph.D. thesis, Iowa State University Press, Ames, Iowa.
[ 561 ] Midzuno, H. (1952). On the sampling system with probability proportional to sum of sizes. Ann.
Inst. Statist. Math., 3, 99--107.
[563] Milne, A. (1959). The centric systematic area sample treated as a random sample. Biometrics, 15,
270--297.
[ 564 ] Mishra, G. and Rout, K. (1997). A regression estimator in two-phase sampling in presence of two
auxiliary variables. Metron, 55, 177--186.
[ 565 ] Mishra, R.N. and Sinha, J.N. (1999). Randomized response procedure with multiple statement.
Aligarh J. Statist., 19, 1--9.
[566] Mohanty, S. (1977). Sampling with repeated units. Sankhyā, C, 39, 43--46.
[ 567 ] Mohanty, S. and Sahoo, L.N. (1987). A class of estimators based on mean per unit ratio
estimators. Statistica, 47, 473--477.
[ 568 ] Mohanty, S. and Sahoo, J. (1995). A note on improving the ratio method of estimation through
linear transformation using certain known population parameters. Sankhyā, B, 57, 93--102.
[569] Mohanty, S. and Pattanaik, L.M. (1984). Alternative multivariate ratio estimators using geometric
and harmonic means. J. Indian Soc. Agric. Statist., 36, 100--118.
[ 570 ] Montanari, G.E. (1998). On regression estimation of finite population means. Survey
Methodology, 24(1), 69--77.
[ 571 ] Montanari, G.E. (1999). A study on the conditional properties of finite population mean
estimators. Metron, 57, 21--35.
[ 572 ] Moors, J.J.A. (1971). Optimization of the unrelated question randomized response model. J.
Amer. Statist. Assoc., 66, 627--629.
[573] Moors, J.J.A. (1997). A critical evaluation of Mangat's two-step procedure in randomized
response. Discussion paper at Center for Economic Research, Tilburg University, The Netherlands.
[574] Moors, J.J.A., Smeets, R. and Boekema, F.W.M. (1998). Sampling with probabilities proportional
to the variable of interest. Statistica Neerlandica, 52, 129--140.
[ 575 ] Morrison, T., Mangat, N.S., Desjardins, G. and Bhatia, A. (2000). Validation of an in-line
inspection metal loss tool. Proceedings of the International Pipeline Conference 2000, Calgary, Alberta,
Canada. The American Society of Mechanical Engineers, New York, Vol 2, 839--844.
[ 576 ] Morrison, T., Mangat, N.S., Carroll, L.B. and Riznic, J. (2002). Statistical estimation of flaw size
measurement errors for steam generator inspection tools. Proceedings of the 4th international Steam
Generator Conference. Canadian Nuclear Society, Toronto, Ontario, May 5--8.
[ 577 ] Morrison, T., Mangat, N.S., Carroll, L.B. and Riznic, J. (2003). Statistical estimation of flaw size
measurement errors for steam generator tube inspection tools. Submitted to Nuclear Engineering and
Design.
[578] Moses, L.E. (1978). Energy information validation -- A status report. Proc. of the 1978 DOE
Statistical Symposium, 33--49.
[ 579 ] Moura, F.A.S. and Holt, D. (1999). Small area estimation using multilevel models. Survey
Methodology, 25(1), 73--80.
[ 580 ] Mukerjee, R. and Chaudhuri, A. (1990). Asymptotic optimality of double sampling plans
employing generalized regression estimators. J. Statist. Planning Infer., 26, 173--183.
[ 581 ] Mukerjee, R. and Sengupta, S. (1990). Optimal estimation of a finite population mean in the
presence of linear trend. Biometrika, 77, 625--630.
[ 582 ] Mukerjee, R., Rao, T.J. and Vijayan, K. (1987). Regression type estimator using multiple
auxiliary information. Austral. J. Statist., 29(3), 244--254.
[ 583 ] Mukerjee, R., Rao, T.J. and Vijayan, K. (2000). Rejoinder to Ahmed, M.S. (1998): A note on
regression type estimators using multiple auxiliary information. Austral. & New Zealand J. Statist., 42(2),
245.
[ 584 ] Mukhopadhyay, P. (1977). Further studies in sampling theory. Unpublished Ph.D. thesis
submitted to the University of Calcutta.
[585] Mukhopadhyay, P. (1982). Optimum strategies for estimating the variance of a finite population
under a superpopulation model. Metrika, 29, 143--158.
[ 586 ] Mukhopadhyay, P. (1994). Prediction in finite population under error in variables superpopulation
models. J. Statist. Planning Infer., 41, 151--161.
[587] Mukhopadhyay, P. and Bhattacharyya, S. (1990--91). Estimating a finite population variance under
some general linear models with exchangeable errors. Calcutta Statist. Assoc. Bull., 40, 219--228.
[ 588 ] Murthy, M.N. (1957). Ordered and unordered estimators in sampling without replacement.
Sankhyā, 18, 379--390.
[ 589 ] Murthy, M.N. (1961). Introduction to sampling theory: Lecture Notes. Indian Statistical
Institute.
[ 590 ] Murthy, M.N. (1962). Almost unbiased estimators based on interpenetrating sub-samples.
Sankhyā, 303--314.
[591 ] Murthy, M.N. (1963). Generalized unbiased estimation for finite populations. Sankhyā, B, 25,
245--261.
[592] Murthy, M.N. (1964). Product method of estimation. Sankhyā, A, 26, 69--74.
[ 593 ] Murthy, M.N. (1967). Sampling theory and methods. Statistical Publishing Society, Calcutta.
[ 594 ] Murthy, M.N. (1977). Sampling theory and methods. Second edition, Statistical Publication
Society, Calcutta.
[ 595 ] Murthy, M.N. and Nanjamma, N.S. (1959). Almost unbiased ratio estimates based on
interpenetrating sub-sample estimates. Sankhyā, 21, 381--392.
[ 596 ] Murthy, M.N. and Singh, M.P. (1969). On the concepts of best and admissible estimators in
sampling theory. Sankhyā, A, 31, 343--354.
[ 597 ] Nanjamma, N.S., Murthy, M.N. and Sethi, V.K. (1959). Some sampling systems providing
unbiased ratio estimators. Sankhyā, 21, 299--314.
[598] Narain, R.D. (1951). On sampling without replacement with varying probabilities. J. Indian Soc.
Agril. Statist., 3, 169--174.
[ 599 ] Nayak, T.K. (1994). On randomized response surveys for estimating a proportion. Commun.
Statist. -- Theory Meth., 23(11), 3303--3321.
[ 600 ] Nelson, D. and Meeden, G. (1998). Using prior information about population quantiles in finite
population sampling. Sankhyā, A, 60, 426--445.
[ 602 ] Neyman, J. (1934). On two different aspects of the representative methods, the method of
stratified sampling and the method of purposive selection. J. R. Statist. Soc., 97, 558--606.
[ 603 ] Neyman, J. (1938). Contribution to the theory of sampling human populations. J. Amer. Statist.
Assoc., 33, 101--116.
[ 604 ] Neyman, J. (1971). Discussion of Royall (1971): Foundations of Statistical Inference (V.P.
Godambe and D.A. Sprott, eds). Holt, Rinehart & Winston, Toronto, 276--278.
[ 605 ] Nieto de Pascual, J. (1961). Unbiased ratio estimates in stratified sampling. J. Amer. Statist.
Assoc., 56, 70--87.
[ 606 ] Ogus, J.K. and Clark, D.F. (1971). The annual survey of manufacturers: A report on
methodology. Technical Report No. 24, U.S. Bureau of Census, Washington, D.C.
[608] Okafor, F.C. (1992). The theory and application of sampling over two occasions for the estimation
of current population ratio. Statistica, I, 137--147.
[ 610 ] Okafor, F.C. and Arnab, R. (1987). Some strategies of two-stage sampling for estimating
population ratios over two occasions. Austral. J. Statist., 29(2), 128--142.
[ 611 ] Okafor, F.C. and Lee, H. (2000). Double sampling for ratio and regression estimation with sub-
sampling the non-respondents. Survey Methodology, 26, 183--188.
[612] Olkin, I. (1958). Multivariate ratio estimation for finite population. Biometrika, 43, 154--163.
[ 613 ] Padmawar, V.R. (1994). Strategies admitting non-negative unbiased variance estimators. J.
Statist. Planning Infer., 40, 81--95.
[615] Padmawar, V.R. (1998a). On estimating non-negative definite quadratic forms. Metrika, 49, 231--
244.
[ 616 ] Padmawar, V.R. (1998b). On πPS designs and stratification. J. Indian Statist. Assoc., 36, 99--
104.
[ 617 ] Paik, M.C. (1997). The generalized estimating equation approach when data are not missing
completely at random. J. Amer. Statist. Assoc., 92, 1320--1329.
[618] Panda, P. and Sahoo, L.N. (1999). Predictive estimation of finite population mean using a product
estimator for two-stage sampling. Biom. J., 41(1), 93--97.
[619] Pandey, B.N. and Dubey, V. (1989). On almost unbiased estimators. Metron, 333--338.
[ 620 ] Pandey, S.K. and Singh, R.K. (1984). On combination of ratio and PPS estimators. Biom. J.,
26(3), 333--336.
[ 621 ] Patel, H.C. and Dharmadhikari, S.W. (1978). Admissibility of Murthy and Midzuno's estimators
within the class of linear unbiased estimators of finite population totals. Sankhyā, C, 40, 21--28.
[ 622 ] Pathak, P.K. (1961). On the evaluation of moments of distinct units in a sample. Sankhyā, A, 23,
415--420.
[ 623 ] Pathak, P.K. (1962). On simple random sampling with replacement. Sankhyā, A, 24, 287--302.
[ 624 ] Pathak, P.K. (1966). An estimator in PPS sampling for multiple characteristics. Sankhyā, A, 28,
35--40.
[ 625 ] Pathak, P.K. (1967a). Asymptotic efficiency of Des Raj's strategy -- I. Sankhyā, A, 29, 283--298.
[ 626 ] Pathak, P.K. (1967b). Asymptotic efficiency of Des Raj's strategy -- II. Sankhyā, A, 29, 299--
304.
[627] Pathak, P.K. and Rao, T.J. (1967). Inadmissibility of customary estimators in sampling over two
occasions. Sankhyā, A, 29, 49--54.
[ 628 ] Patterson, H.D. (1950). Sampling on successive occasions with partial replacement units. J. R.
Statist. Soc., 241--255.
[ 630 ] Pedgaonkar, A.M. and Prabhu--Ajgaonkar, S.G. (1978). Comparison of sampling strategies.
Metrika, 25, 149--153.
[ 631 ] Pfeffermann, D. (1984). A note on large sample properties of balanced samples. J. R. Statist. Soc.,
B, 46, 38--41.
[ 632 ] Pitman, E.J.G. (1937). Significance tests which can be applied to samples from any population --
III: The analysis of variance test. Biometrika, 29, 322--335.
[ 633 ] Platek, R. and Grey, G.B. (1983). Imputation methodology: total survey error. In Incomplete Data
in Sample Surveys, 2, Ed. W.G. Madow, I. Olkin, and D.B. Rubin, 249--333. New York: Academic Press.
[ 634 ] Pokropp, F. (2001). Imposed linear structures in conventional sampling theory. Allgemeines
Statist. Archiv, 86, 333--352.
[ 635 ] Politz, A. and Simmons, W. (1950). Note on an attempt to get not at home into the sample without
callbacks. J. Amer. Statist. Assoc., 45, 136--137.
[ 636 ] Pollock, K.H. and Bek, Y. (1976). A comparison of three randomized response models for
quantitative data. J. Amer. Statist. Assoc., 71, 884--886.
[ 637 ] Prabhu--Ajgaonkar, S.G. (1975). The efficient use of supplementary information in double
sampling procedures. Sankhyā, C, 37, 181--189.
[ 638 ] Pradhan, B.K. (2001). Modified chain regression estimators using multi auxiliary information.
Statistica, no. 1, 249--258.
[ 639 ] Prasad, B. (1989). Some improved ratio type estimators of population mean and ratio in finite
population sample surveys. Commun. Statist. -- Theory Meth., 18(1), 379--392.
[ 640 ] Prasad, B. and Singh, H.P. (1990). Some improved ratio type estimators of finite population
variance in sample surveys. Commun. Statist. -- Theory Meth., 19, 1127--1139.
[ 641 ] Prasad, B. and Singh, H.P. (1992). Unbiased estimators of finite population variance using
auxiliary information in sample surveys. Commun. Statist. -- Theory Meth., 21(5), 1367--1376.
[ 642 ] Prasad, B., Singh, R.S. and Singh, H.P. (1996). Some chain ratio type estimators for ratio of two
population means using two auxiliary characters in two phase sampling. Metron, 95--113.
[ 643 ] Prasad, N.G.N. and Graham, J.B. (1994). PPS sampling over two occasions. Survey Methodology,
20, 59--64.
[ 644] Prasad, N.G.N. and Rao, J.N.K. (1999). On robust small area estimation using a simple random
effects model. Survey Methodology, 25, 67--72.
[646] Purcell, N.J. and Kish, L. (1979). Estimation for small domain. Biometrics, 35, 365--384.
[ 647 ] Purcell, N.J. and Kish, L. (1980). Postcensal estimates for local areas (or domains). Int. Statist.
Rev., 48, 3--18.
[ 648 ] Quenouille, M.H. (1949). Problems in plane sampling. Ann. Math. Statist., 20, 355--375.
[ 649 ] Quenouille, M.H. (1956). Notes on bias in estimation. Biometrika, 43, 353--360.
[ 650 ] Raghavarao, D. (1971). Constructions and combinatorial problems in design of experiments.
Wiley, New York.
[ 651 ] Raghunandanan, K. and Bryant, E.C. (1971). Variance in multi-way stratification. Sankhyā, A,
33, 221--226.
[ 652 ] Raiffa, H. and Schlaifer, R. (1961). Applied statistical decision theory. Boston.
[ 653 ] Raj, D. (1954a). On sampling probabilities proportional to size. Ganita, 52, 175--182.
[654 ] Raj, D. (1954b). Ratio estimator in sampling with equal and unequal probabilities. J. Indian Soc.
Agril. Statist., 6, 127--138.
[ 655 ] Raj, D. (1956). Some estimators in sampling with varying probabilities without replacement. J.
Amer. Statist. Assoc., 51, 269--284.
[ 656 ] Raj, D. (1958). On the relative accuracy of some sampling techniques. J. Amer. Statist. Assoc., 53,
98--101.
[ 657 ] Raj, D. (1964). On double sampling for pps estimation. Ann. Math. Statist., 35, 900--902.
[ 658 ] Raj, D. (1965a). On a method of using multi-auxiliary in sample surveys. J. Amer. Statist. Assoc.,
60, 270--277.
[ 659 ] Raj, D. (1965b). On sampling over two occasions with probability proportionate to size. Ann.
Math. Statist., 36, 327--330.
[660] Raj, D. (1966). Some remarks on a simple procedure of sampling without replacement. J. Amer.
Statist. Assoc., 61, 391--397.
[ 662 ] Raj, D. and Khamis, S.H. (1958). Some remarks on sampling with replacement. Ann. Math.
Statist., 29, 550--557.
[ 663 ] Ramachandran, G. (1982). Horvitz and Thompson estimator and generalized πPS designs. J.
Statist. Planning Infer., 7, 151--153.
[664] Ramachandran, G. and Rao, T.J. (1974). Allocation to strata and relative efficiencies of stratified
and unstratified πps sampling schemes. J. R. Statist. Soc., 97, 558--606.
[ 665 ] Ramachandran, V. and Pillai, S.S. (1976). Multivariate unbiased ratio type estimation for finite
sampling. J. Indian Soc. Agril. Statist., 28, 71--80.
[ 666 ] Ramakrishnan, M.K. (1969). Some remarks on the comparison of sampling with and without
replacement. Sankhyā, A, 31, 333--342.
[ 667 ] Ramakrishnan, M.K. (1975a). A generalisation of the Yates--Grundy variance estimator.
Sankhyā, C, 37, 204--206.
[668] Ramakrishnan, M.K. (1975b). Choice of an optimum sampling strategy -- I. Ann. Statist., 3, 669--
679.
[ 669 ] Ramakrishnan, M.K. and Rao, V.V.B. (1975). On the sample mean in simple random sampling
without replacement. Sankhyā, C, 37, 207--210.
[ 670 ] Rana, R.S. (1989). Concise estimator of bias and variance of the finite population correlation
coefficient. J. Indian Soc. Agril. Statist., 41, 69--76.
[ 671 ] Rana, R.S. and Singh, R. (1989). Note on systematic sampling with supplementary observations.
Sankhyā, B, 51, 205--211.
[ 672] Rangarajan, R. (1957). A note on two stage sampling. Sankhyā, 17, 373--376.
[ 673 ] Rao, C.R. (1945). Information and accuracy obtainable in an estimation of a statistical parameter.
Bull. Calcutta Math. Soc., 37, 81.
[674] Rao, C.R. (1973). Linear Statistical Inference and its Applications. Wiley, New York.
[675] Rao, C.R. (1975). Some problems of sample surveys. Proceedings of the conference on directions
for mathematical statistics. University of Alberta, Edmonton, Canada.
[ 676 ] Rao, C.R. (1987). Strategies of data analyst. Proceedings of 46th Session of the International
Statistical Institute, Tokyo.
[ 677 ] Rao, J.N.K. (1961). On sampling with varying probabilities in sub-sampling designs. J. Indian
Soc. Agric. Statist., 13, 211--217.
[678] Rao, J.N.K. (1963a). On two systems of unequal probability sampling without replacement. Ann.
Inst. Statist. Math., 15, 67--72.
[679] Rao, J.N.K. (1963b). On three procedures of unequal probability sampling without replacement. J.
Amer. Statist. Assoc., 58, 202--215.
[680] Rao, J.N.K. (1965a). A note on estimation of ratios by Quenouille's method. Biometrika, 52,
647--649.
[ 681 ] Rao, J.N.K. (1965b). On two simple schemes of unequal probability sampling without
replacement. J. Indian Soc. Agric. Statist., 3, 169--174.
[ 682 ] Rao, J.N.K. (1966a). Alternative estimators in PPSWR sampling for multiple characteristics.
Sankhyā, A, 28, 47--60.
[ 683] Rao, J.N.K. (1966b). On the relative efficiency of some estimators in PPS sampling for multiple
characteristics. Sankhyā, A, 28, 61--70.
[ 684 ] Rao, J.N.K. (1966c). On the comparison of sampling with and without replacement. Rev. Int.
Statist. Inst., 34, 125--138.
[ 685] Rao, J.N.K. (1967). The precision of Mickey's unbiased ratio estimator. Biometrika, 54, 93--108.
[ 686 ] Rao, J.N.K. (1968a). Some small sample results in ratio and regression estimation. J. Indian
Statist. Assoc., 6, 160--168.
[ 687 ] Rao, J.N.K. (1968b). Some non-response sampling theory when the frame contains an unknown
amount of duplication. J. Amer. Statist. Assoc., 63, 87--90.
[ 688 ] Rao, J.N.K. (1969). Ratio and regression estimators. In New Developments in Survey Sampling, eds.
N.L. Johnson and H. Smith, New York, John Wiley.
[ 689 ] Rao, J.N.K. (1975). Unbiased variance estimation for multi-stage designs. Sankhyā, C, 37, 133--
139.
[ 690 ] Rao, J.N.K. (1979). On deriving the mean square errors and their non-negative unbiased
estimators in finite population sampling. J. Indian Soc. Agric. Statist., 17, 125--136.
[ 691] Rao, J.N.K. (1985). Conditional inference in survey sampling. Survey Methodology, 11, 15--31.
[ 692 ] Rao, J.N.K. (1989). A note on Narain's necessary condition in sampling. J. Indian Soc. Agric.
Statist., 41, 316--317.
[ 693 ] Rao, J.N.K. (1994). Estimating totals and distribution functions using auxiliary information at the
estimation stage. J. Official Statist., 10(2), 153--165.
[ 694] Rao, J.N.K. (1996a). Some current topics in sample survey theory. J. Indian Soc. Agril. Statist., 50,
244--263.
[ 695 ] Rao, J.N.K. (1996b). On variance estimation with imputed survey data. J. Amer. Statist. Assoc.,
91, 499--506.
[ 696 ] Rao, J.N.K. (1997). Developments in sample survey theory: an appraisal. Canad. J. Statist., 25,
1--21.
[ 697 ] Rao, J.N.K. (1999a). Some current trends in sample survey theory and methods. Sankhyā, B, 61,
1--57.
[ 698 ] Rao, J.N.K. (1999b). Reply to comments on 'some current trends in sample survey theory and
methods'. Sankhyā, B, 61, 53--57.
[ 699 ] Rao, J.N.K. (1999c). Some recent advances in model based small area estimation. Survey
Methodology, 25, 175--186.
[ 700 ] Rao, J.N.K. (2000). Conditional inference for large and small areas. J. Indian Statist. Assoc.,
38(2), 383--398.
[ 701 ] Rao, J.N.K. (2002). Discussion of 'Exact linear unbiased estimation in survey sampling'. J. Statist.
Planning Infer., 102, 39--40.
[ 702] Rao, J.N.K. (2003). Small area estimation. John Wiley and Sons, NY.
[ 703 ] Rao, J.N.K. and Bayless, D.L. (1969). An empirical study of the stabilities of estimators and
variance estimators in unequal probability sampling of two units per stratum. J. Amer. Statist. Assoc., 64,
540--559.
[704] Rao, J.N.K. and Beegle, L.D. (1967). A Monte Carlo study of some ratio estimators. Sankhyā, B,
29, 47--56.
[ 705 ] Rao, J.N.K. and Bellhouse, D.R. (1978). Optimal estimation of a finite population mean under
generalized random permutation models. J. Statist. Planning Infer., 2, 125--141.
[ 706 ] Rao, J.N.K. and Graham, J.B. (1964). Rotation designs for sampling on repeated occasions. J.
Amer. Statist. Assoc., 59, 492--509.
[707] Rao, J.N.K., Hartley, H.O. and Cochran, W.G. (1962). A simple procedure of unequal probability
sampling without replacement. J. R. Statist. Soc., B, 24, 482--491.
[ 708 ] Rao, J.N.K. and Lanke, J. (1984). Simplified unbiased variance estimation for multistage designs.
Biometrika. 71,387--395.
[ 709 ] Rao, J.N.K and Rao, P.S.R.S. (1971). Small sample results for ratio estimators . Biometrika, 58,
625--630.
[710] Rao, J.N.K. and Shao, J. (1992). Jackknife variance estimation with survey data under hot deck
imputation. Biometrika, 79, 811--822.
[711] Rao, J.N.K. and Shao, J. (1999). Modified balanced repeated replication for complex survey data.
Biometrika, 86, 403--415.
[712] Rao, J.N.K. and Singh, M.P. (1973). On the choice of estimator in survey sampling. Austral. J.
Statist., 15(2), 95--104.
[713] Rao, J.N.K. and Sitter, R.R. (1995). Variance estimation under two-phase sampling with
application to imputation for missing data. Biometrika, 82, 453--460.
[714] Rao, J.N.K. and Vijayan, K. (1977). On estimating the variance in sampling with probability
proportional to aggregate size. J. Amer. Statist. Assoc., 72, 579--584.
[715] Rao, J.N.K. and Webster, J.T. (1966). On two methods of bias reduction in the estimation of ratios.
Biometrika, 53, 571--577.
[716] Rao, J.N.K. and Wu, C.F.J. (1985). Inference from stratified samples: second order analysis of
three methods for non-linear statistics. J. Amer. Statist. Assoc., 80, 620--630.
[717] Rao, J.N.K. and Yu, M. (1994). Small area estimation by combining time series and cross
sectional data. Canad. J. Statist., 22, 511--528.
[718] Rao, P.S.R.S. (1969). Comparison of four ratio type estimates under a model. J. Amer. Statist.
Assoc., 64, 574--580.
[719] Rao, P.S.R.S. (1972). On two phase regression estimator. Sankhyā, A, 34, 373--476.
[720] Rao, P.S.R.S. (1974). Jackknifing the ratio estimator. Sankhyā, 36, 84--97.
[721] Rao, P.S.R.S. (1975). Hartley--Ross type estimators with two-phase sampling. Sankhyā, 37,
140--146.
[722] Rao, P.S.R.S. (1979). On applying the jackknife procedure to the ratio estimator. Sankhyā, 41,
115--126.
[723] Rao, P.S.R.S. and Mudholkar, G.S. (1967). Generalized multivariate estimators for the mean of
finite population parameters. J. Indian Soc. Agric. Statist., 62, 1008--1012.
[724] Rao, T.J. (1966). On certain unbiased ratio estimators. Ann. Inst. Statist. Math., 18, 117--121.
[725] Rao, T.J. (1967). Contributions to the theory of sampling strategies. Ph.D. Thesis, I.S.I. Calcutta.
[726] Rao, T.J. (1968). On the allocation of sample size in stratified sampling. Ann. Inst. Statist. Math.,
20, 159--166.
[727] Rao, T.J. (1971). πps sampling designs and Horvitz and Thompson estimator. J. Amer. Statist.
Assoc., 66, 872--875.
[728] Rao, T.J. (1972). Horvitz and Thompson and Des Raj estimator revisited. Austral. J. Statist., 14,
227--230.
[729] Rao, T.J. (1977a). Estimating the variance of the ratio estimator for the Midzuno--Sen sampling
scheme. Metrika, 24, 203--208.
[730] Rao, T.J. (1977b). Optimum allocation of sample size and prior distributions: a review. Int. Statist.
Rev., 45, 173--179.
[731] Rao, T.J. (1981). On a class of almost unbiased ratio estimators. Ann. Inst. Statist. Math., A, 33,
225--231.
[732] Rao, T.J. (1993a). On certain problems of sampling design and estimation for multiple
characteristics. Sankhyā, B, 55, 372--381.
[733] Rao, T.J. (1993b). On certain alternative estimators for multiple characteristics in varying
probability sampling. J. Indian Soc. Agril. Statist., 45(3), 307--318.
[734] Rao, T.J. (1983c). Horvitz--Thompson strategy vs. stratified random sampling strategy. J. Statist.
Planning Infer., 8, 43--50.
[735] Rao, T.J. (1984). Allocation of sample size to strata and related problems. Biom. J., 26, 517--526.
[736] Rao, T.J., Sengupta, S. and Sinha, B.K. (1991). Some order relations between selection and
inclusion probabilities for PPSWOR sampling scheme. Metrika, 38, 335--343.
[737] Ray, S. and Das, M.N. (1995). On systematic sampling allowing estimation of variance of mean.
J. Indian Soc. Agril. Statist., 47(2), 192--196.
[738] Ray, S. and Das, M.N. (1997). Circular systematic sampling with drawback. J. Indian Soc. Agric.
Statist., 50(1), 70--74.
[739] Ray, S.K. and Sahai, A. (1979). A note on ratio and product type estimators. Ann. Inst. Statist.
Math., 31, 141--144.
[740] Ray, S.K. and Singh, K. (1981). Difference cum ratio type estimators. J. Indian Statist. Assoc., 19,
147--151.
[741] Reddy, V.N. (1973). On ratio and product method of estimation. Sankhyā, B, 35, 307--316.
[742] Reddy, V.N. (1974). On a transformed ratio method of estimation. Sankhyā, C, 36, 59--70.
[743] Reddy, V.N. (1978a). A study on the use of prior knowledge on certain population parameters in
estimation. Sankhyā, C, 40, 29--37.
[744] Reddy, V.N. (1978b). A comparison between stratified and unstratified random sampling.
Sankhyā, C, 40, 99--103.
[745] Reddy, V.N. (1980). Systematic sampling in monotone populations. Sankhyā, C, 42, 97--108.
[746] Reddy, V.N. and Rao, T.J. (1977). Modified PPS method of estimation. Sankhyā, C, 39,
185--197.
[747] Reddy, V.N. and Rao, T.J. (1990). On estimation of the population total of bottom (top) P
percentiles of a finite population. Metron, 309--320.
[748] Ren, R. (2000). Estimation de la fonction de repartition et des fractiles d'une population finite.
VIIemes Journees de Methodologies Statistique. Paris.
[749] Renssen, R.H. and Nieuwenbroek, N.J. (1997). Aligning estimates for common variables in two
or more sample surveys. J. Amer. Statist. Assoc., 92, 368--374.
[750] Richardson, S.C. (1989). One pass selection of a sample with probability proportional to size.
Appl. Statist., 38, 517--520.
[751] Rizvi, S.E.H., Gupta, J.P. and Singh, R. (2000). Approximately optimum stratification for two
study variables using auxiliary information. J. Indian Soc. Agric. Statist., 53(3), 287--298.
[752] Robins, J.M. and Wang, N. (2000). Inference for imputation estimators. Biometrika, 87, 113--124.
[753] Robson, D.S. (1957). Applications of multivariate polykays of the theory of unbiased ratio type
estimation. J. Amer. Statist. Assoc., 52, 511--522.
[754] Rosen, B. (1997a). On sampling with probability proportional to size. J. Statist. Planning Infer.,
62, 159--191.
[755] Rosen, B. (1997b). Asymptotic theory for order sampling. J. Statist. Planning Infer., 62,
135--158.
[756] Rosen, B. (1998). On inclusion probabilities for order sampling. R and D Report, Research
Methods, Development, 2, 1--23.
[757] Roy, J. and Chakravorty, I.M. (1960). Estimating the mean of a finite population. Ann. Math.
Statist., 31, 392--398.
[758] Royall, R.M. (1970a). On finite population sampling theory under certain linear regression
models. Biometrika, 57, 377--387.
[759] Royall, R.M. (1970b). Finite population sampling: on labels in estimation. Ann. Math. Statist., 41,
1774--1779.
[760] Royall, R.M. (1970c). On finite population sampling theory under certain linear regression
models. Biometrika, 57, 377--387.
[761] Royall, R.M. (1971). Linear regression models in finite population sampling theory. Foundations
of Statist. Infer. (V.P. Godambe and D.A. Sprott, eds). Holt, Rinehart & Winston, Toronto, 259--274.
[762] Royall, R.M. (1976). The linear least squares prediction approach to two-stage sampling. J. Amer.
Statist. Assoc., 71, 657--664.
[763] Royall, R.M. (1986). The prediction approach to robust variance estimation in two-stage cluster
sampling. J. Amer. Statist. Assoc., 81, 119--123.
[764] Royall, R.M. (1992). Robustness and optimal design under prediction models in finite population
sampling. Survey Methodology, 2, 179--195.
[765] Royall, R.M. and Cumberland, W.G. (1978). Variance estimation in finite population sampling. J.
Amer. Statist. Assoc., 73, 351--358.
[766] Royall, R.M. and Cumberland, W.G. (1981a). An empirical study of the ratio estimator and
estimators of its variance. J. Amer. Statist. Assoc., 73, 351--358.
[767] Royall, R.M. and Cumberland, W.G. (1981b). The finite population linear regression estimator
and estimators of its variance -- An empirical study. J. Amer. Statist. Assoc., 76, 924--930.
[768] Royall, R.M. and Cumberland, W.G. (1985). Conditional coverage properties of finite population
confidence intervals. J. Amer. Statist. Assoc., 80, 355--359.
[769] Royall, R.M. and Eberhardt, K.R. (1975). Variance estimates for the ratio estimator. Sankhyā, C,
37, 43--52.
[770] Royall, R.M. and Herson, J. (1973a). Robust estimation in finite populations -- I. J. Amer. Statist.
Assoc., 68, 880--889.
[771] Royall, R.M. and Herson, J. (1973b). Robust estimation in finite populations -- II: Stratification
on a size variable. J. Amer. Statist. Assoc., 68, 890--893.
[772] Royall, R.M. and Pfeffermann, D. (1982). Balanced samples and robust Bayesian inference in
finite population sampling. Biometrika, 69, 401--409.
[773] Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581--592.
[774] Rubin, D.B. (1978). Multiple imputation in sample surveys -- a phenomenological Bayesian
approach to nonresponse. In Proc. Sect. Survey Res. Meth., pp. 20--34. Washington D.C.: American
Statistical Association.
[775] Rubin, D.B. (1987). Multiple imputation for non-response in surveys. John Wiley, New York.
[776] Rubin, D.B. (1996). Multiple imputation after 18+ years. J. Amer. Statist. Assoc., 91, 473--490.
[777] Rubin, D.B. and Schenker, N. (1986). Multiple imputation for interval estimation from simple
random samples with ignorable non-response. J. Amer. Statist. Assoc., 81, 366--374.
[778] Rueda, M. and Arcos, A. (2002). The use of quantiles of auxiliary variable to estimate medians.
Biom. J., 44(5), 619--632.
[779] Rueda, M., Arcos, A. and Artes, E. (1998). Quantile interval estimation in finite population using
a multivariate ratio estimator. Metrika, 47, 203--213.
[780] Ruiz, M. and Santos, J. (1990). Sampling design providing unbiased new product estimator.
Statistica, 50, 285--288.
[781] Ruiz, M. and Santos, J. (1992). Variance estimation with systematic sampling. Rev. Acad. Cienc.
Zaragoza, 47, 121--124.
[782] Sadasivan, G. and Aggarwal, R. (1978). Optimum points of stratification in bivariate populations.
Sankhyā, C, 40, 84--97.
[783] Sadasivan, G. and Srinath, M. (1975). Some contributions to post-cluster sampling. Sankhyā, C,
37, 171--180.
[784] Sahai, A. and Ray, S.K. (1980). An efficient estimator using auxiliary information. Metrika, 27,
271--275.
[785] Sahai, A. and Sahai, A. (1985). On efficient use of auxiliary information. J. Statist. Planning
Infer., 12, 203--212.
[786] Sahoo, J. and Sahoo, L.N. (1999a). A comparative study of some regression type estimators in
double sampling procedures. Aligarh J. Statist., 19, 67--76.
[787] Sahoo, J. and Sahoo, L.N. (1999b). An alternative class of estimators in double sampling
procedures. Calcutta Statist. Assoc. Bull., 49, 79--83.
[788] Sahoo, J., Sahoo, L.N. and Mohanty, S. (1994). Unequal probability sampling using a transformed
auxiliary variable. Metron, 71--83.
[789] Sahoo, J., Sahoo, L.N. and Wywial, J. (1997). Some thoughts of reduction of estimation bias
using auxiliary information in sample. Statistics in Transition, 3(2), 383--401.
[790] Sahoo, L.N. (1983). On a method of bias reduction in ratio estimation. J. Statist. Res., 17, 1--6.
[791] Sahoo, L.N. (1986). On a ratio method of estimation using a transformed auxiliary variable.
Statistica, 46, 409--413.
[792] Sahoo, L.N. (1987). A regression type estimator in two-stage sampling. Calcutta Statist. Assoc.
Bull., 36, 97--100.
[793] Sahoo, L.N. (1991). An unbiased ratio cum product estimator in two-stage sampling. Metron,
213--217.
[794] Sahoo, L.N. (1994). Some estimation problems in finite population sampling using auxiliary
information. Ph.D. Thesis, Utkal University, Bhubaneswar, India.
[795] Sahoo, L.N. and Panda, P. (1997). A class of estimators in two-stage sampling with varying
probabilities. South African Statist. J., 31, 151--160.
[796] Sahoo, L.N. and Panda, P. (1999a). A class of estimators using auxiliary information in two-stage
sampling. Austral. & New Zealand J. Statist., 41(4), 405--410.
[797] Sahoo, L.N. and Panda, P. (1999b). A predictive regression type estimator in two-stage sampling.
J. Indian Soc. Agric. Statist., 52(3), 303--308.
[798] Sahoo, L.N., Sahoo, J. and Espejo, M.R. (1998). On some strategies using auxiliary information
for estimating finite population mean. Qüestiió, 22, 243--252.
[799] Sahoo, L.N., Sahoo, J. and Mohanty, S. (1995a). Empirical comparison of some regression and
regression type strategies. Statistical Hefte, 36, 337--347.
[800] Sahoo, L.N., Sahoo, J. and Mohanty, S. (1995b). A new predictive ratio estimator. J. Indian Soc.
Agric. Statist., 47(3), 240--242.
[801] Sahoo, L.N. and Sahoo, R.K. (2001). Predictive estimation of finite population mean in two-phase
sampling using two auxiliary variables. J. Indian Soc. Agric. Statist., 54(2), 258--264.
[802] Sahoo, L.N. and Swain, A.K.P.C. (1980). Unbiased ratio cum product estimator. Sankhyā, C, 42,
56--62.
[803] Sahoo, L.N. and Swain, A.K.P.C. (1986). Chain product estimators. Aligarh J. Statist., 6, 53--58.
[804] Sahoo, L.N. and Swain, A.K.P.C. (1987). Some modified ratio estimators. Metron, 285--293.
[805] Sahoo, L.N. and Swain, A.K.P.C. (1989). On two modified ratio estimators in two-phase
sampling. Metron, 261--266.
[806] Samiuddin, M. and Kattan, A.K.A. (1991). A procedure of unequal probability sampling. Pak. J.
Statist., A, 7, 1--7.
[807] Samiuddin, M., Kattan, A.K.A., Hanif, M. and Asad, H. (1992). Some remarks on models,
sampling schemes and estimators in unequal probability sampling. Pak. J. Statist., A, 8, 1--18.
[808] Sampath, S. (1989). On the optimal choice of unknowns in ratio type estimators. J. Indian Soc.
Agric. Statist., 41, 166--172.
[809] Sampath, S. and Chandra, S.K. (1990). General class of estimators for the population total under
unequal probability sampling schemes. Metron, 409--419.
[810] Sampath, S., Uthayakumaran, N. and Tracy, D.S. (1995). On an alternative estimator for
randomized response technique. J. Indian Soc. Agric. Statist., 47(3), 243--248.
[811] Sampford, M.R. (1962). Methods of cluster sampling with and without replacement for clusters of
unequal sizes. Biometrika, 49, 27--40.
[812] Sampford, M.R. (1967). On sampling without replacement with unequal probabilities of selection.
Biometrika, 54, 499--513.
[813] Sarndal, C.E. (1980a). On π-inverse weighting versus best linear unbiased weighting in
probability sampling. Biometrika, 67, 639--650.
[814] Sarndal, C.E. (1980b). Two model based inference arguments in survey sampling. Austral. J.
Statist., 22, 341--348.
[815] Sarndal, C.E. (1982). Implications of survey designs for generalized regression estimators of
linear functions. J. Statist. Planning Infer., 7, 155--170.
[816] Sarndal, C.E. (1992). Methods for estimating the precision of survey estimates when imputation is
used. Survey Methodology, 18, 241--252.
[817] Sarndal, C.E. (1996). Efficient estimators with simple variance in unequal probability sampling.
J. Amer. Statist. Assoc., 91, 1289--1300.
[818] Sarndal, C.E. and Hidiroglou, M.A. (1989). Small domain estimation: a conditional analysis. J.
Amer. Statist. Assoc., 84, 266--275.
[819] Sarndal, C.E. and Wright, R.L. (1984). Cosmetic form of estimators in survey sampling. Scand. J.
Statist., 11, 146--156.
[820] Sarndal, C.E. and Swensson, B. (1987). A general view of estimation for two phases of selection
with application to two-phase sampling and non-response. Int. Statist. Rev., 55, 279--294.
[821] Sarndal, C.E., Swensson, B. and Wretman, J.H. (1989). The weighted residual technique for
estimating the variance of the general regression estimator of the finite population total. Biometrika,
76(3), 527--537.
[822] Sarndal, C.E., Swensson, B. and Wretman, J.H. (1992). Model assisted survey sampling.
New York: Springer--Verlag.
[823] Saxena, R.R., Singh, P. and Srivastava, A.K. (1986). An unequal probability sampling scheme.
Biometrika, 73(3), 761--763.
[824] Saxena, S.K., Nigam, A.K. and Shukla, N.D. (1995). Variance estimation for combined ratio
estimator. Sankhyā, B, 57, 85--92.
[825] Schafer, J.L. and Schenker, N. (2000). Inference with imputed conditional means. J. Amer. Statist.
Assoc., 95, 141--154.
[826] Schneeberger, H. (1979). Saddle points of the variance of the sample mean in stratified sampling.
Sankhyā, C, 41, 92--96.
[827] Schreuder, H.T., Gregoire, T.G. and Wood, G.B. (1993). Sampling methods for multi-resource
forest inventory. Wiley, New York.
[828] Schucany, W.R., Gray, H.L. and Owen, D.B. (1971). Bias reduction in estimation. J. Amer.
Statist. Assoc., 66, 524--533.
[829] Scott, A. and Smith, T.M.F. (1969). Estimation in multi-stage surveys. J. Amer. Statist. Assoc., 64,
830--840.
[830] Scott, A.J., Brewer, K.W. and Ho, E.W. (1978). Finite population sampling and robust estimation.
J. Amer. Statist. Assoc., 73, 359--361.
[831] Searls, D.T. (1964). The utilization of a known coefficient of variation in the estimation
procedure. J. Amer. Statist. Assoc., 59, 1225--1226.
[832] Searls, D.T. (1967). A note on the use of an approximately known co-efficient of variation.
American Statistician, 21(2), 20--21.
[833] Sedransk, J. and Meyer, J. (1978). Confidence intervals for quantiles of a finite population: simple
random and stratified simple random sampling. J. R. Statist. Soc., B, 40, 239--252.
[834] Sekkappan, R.M. (1973). Bayes estimation and uniform admissibility for sampling from finite
population. Ph.D. Thesis, University of Waterloo.
[835] Sekkappan, R.M. (1981). Subjective Bayesian multivariate stratified sampling from finite
populations. Metrika, 28, 123--132.
[836] Sekkappan, R.M. and Thompson, M.E. (1975). On a class of uniformly admissible estimators for
finite populations. Ann. Statist., 3, 492--499.
[837] Sekkappan, R.M. and Thompson, M.E. (1994). Multi-phase and successive sampling for a
stratified population with unknown stratum sizes. Pak. J. Statist., A, 10, 131--142.
[838] Sen, A.R. (1952). Present status of probability sampling and its use in estimation of farm
characteristics. Economet., 27, 130.
[839] Sen, A.R. (1953). On the estimate of the variance in sampling with varying probabilities. J. Indian
Soc. Agril. Statist., 5, 119--127.
[840] Sen, A.R., Seller, S. and Smith, D.N. (1973). The use of ratio estimate in successive sampling.
Biometrics, 31, 673--683.
[841] Sengupta, S. (1980). On the admissibility of the symmetrized Des Raj estimator for PPSWOR
sample of size two. Calcutta Statist. Assoc. Bull., 29, 35--44.
[842] Sengupta, S. (1981a). On interpenetrating samples of equal and unequal sizes. Calcutta Statist.
Assoc. Bull., 30, 187--197.
[843] Sengupta, S. (1981b). Jackknifing the ratio and the product estimators in double sampling.
Metrika, 28, 245--256.
[844] Sengupta, S. (1982a). On interpenetrating samples of unequal sizes. Metrika, 29, 175--188.
[845] Sengupta, S. (1982b). Admissibility of the symmetrized Des Raj estimator for fixed size sampling
designs of size two. Calcutta Statist. Assoc. Bull., 31, 201--205.
[846] Sengupta, S. (1983). Admissibility of unbiased estimators in finite population sampling for
samples of size at most two. Calcutta Statist. Assoc. Bull., 32, 91--102.
[847] Sengupta, S. (1986). A comparison between PPSWR and Brewer's πps WOR procedures.
Calcutta Statist. Assoc. Bull., 35, 207--210.
[848] Sengupta, S. (1988). A note on PPS circulation systematic sampling. Calcutta Statist. Assoc. Bull.,
37, 111--112.
[849] Serfling, R.J. (1968). Approximate optimum stratification. J. Amer. Statist. Assoc., 63,
1298--1309.
[850] Seth, G.R. and Rao, J.N.K. (1964). On the comparison between simple random sampling with and
without replacement. Sankhyā, A, 26, 85--86.
[851] Sethi, V.K. (1965). On optimum pairing of units. Sankhyā, B, 27, 315--320.
[852] Shah, D.N. and Shah, S.M. (1979). Unbiased product type estimators. Gujarat Statist. Rev., 6(2),
34--43.
[853] Shah, D.N. and Gupta, M.R. (1986). Comparison of double sampling estimators. Metron,
417--419.
[854] Shah, D.N. and Patel, P.A. (1996). Asymptotic properties of a generalized regression type
predictor of a finite population variance in probability sampling. Canad. J. Statist., 24, 373--384.
[855] Shannon, D.F. (1970). Parameter selection for modified Newton methods for function
minimisation. SIAM J. Numerical Analysis, 7, 102--109.
[856] Shao, J. and Chen, Y. (1999). Approximate balanced half sample and related replication methods
for imputed survey data. Sankhyā, 61, 187--201.
[857] Shao, J., Chen, Y. and Chen, Y. (1998). Balanced repeated replication for stratified multistage
survey data under imputation. J. Amer. Statist. Assoc., 819--831.
[858] Sharma, S.D. and Sil, A. (1996). A study of Politz--Simmon estimator under non-cooperation. J.
Indian Soc. Statist., 48(2), 171--184.
[859] Sharma, S.S. (1970). On an estimation in T3-class of linear estimators in sampling with varying
probabilities from a finite population. Ann. Inst. Statist. Math., 22, 495--500.
[860] Sharma, Y.K., Singh, R., Rai, A. and Verma, S.S. (2000). Regression estimators from survey data
for small sample sizes. J. Indian Soc. Agric. Statist., 53(2), 115--124.
[862] Sheers, N. (1992). A review of randomized response technique. Measurement and Evaluation in
Counselling and Development, 25, 27--41.
[863] Shiledar-Baxi, H.R. (1995). Approximately optimum stratified design for a finite population--II.
Sankhyā, 57, 391--404.
[864] Shiue, C.J. (1966). Systematic sampling with multiple random starts. Forestry Science, 6,
142--150.
[865] Shukla, G.K. (1996). An alternative multivariate ratio estimate for finite population. Bull.
Calcutta Statist. Assoc., 15, 127--134.
[866] Shukla, D. and Dubey, J. (2001). Estimation in mail surveys under PSNR sampling scheme. J.
Indian Soc. Agric. Statist., 54(3), 288--302.
[867] Shukla, D. and Trivedi, M. (2001). Mean estimation in deeply stratified population under
post-stratification. J. Indian Soc. Agric. Statist., 54(2), 221--235.
[868] Silva, P.L.D.N. and Skinner, C.J. (1995). Estimating distribution functions with auxiliary
information using poststratification. J. Official Statist., 11, 277--294.
[869] Silverman, B.W. (1986). Density estimation for statistics and data analysis. London: Chapman
and Hall.
[870] Singh, A.C. (1996). Combining information in survey sampling by modified regression. Proc. of
the Section on Survey Research Methods, American Statistical Association, 120--129.
[871] Singh, A.C. and Mohl, C.A. (1996). Understanding calibration estimators in survey sampling.
Survey Methodology, 22, 107--115.
[872] Singh, A.C., Stukel, D.M. and Pfeffermann, D. (1998). Bayesian versus frequentist measures of
error in small area estimation. J. R. Statist. Soc., B, 60, 377--396.
[873] Singh, A.K. and Singh, H.P. (1997). A note on the efficiencies of three product type estimators
under a linear model. J. Indian Soc. Agric. Statist., 50(2), 130--134.
[874] Singh, A.K., Singh, H.P. and Upadhyaya, L.N. (2001). A generalized chain estimator for finite
population mean in two-phase sampling. J. Indian Soc. Agric. Statist., 370--375.
[875] Singh, D. (1956). On efficiency of cluster sampling. J. Indian Soc. Agric. Statist., 8, 45--55.
[876] Singh, D. (1968). Estimation in successive sampling using a multi-stage design. J. Amer. Statist.
Assoc., 63, 99--112.
[877] Singh, D. and Chaudhary, F.S. (1986). Theory and analysis of sample survey designs. Wiley
Eastern Limited.
[878] Singh, D., Jindal, K.K. and Garg, J.N. (1968). On modified systematic sampling. Biometrika, 55,
541--546.
[879] Singh, D. and Singh, B.D. (1965). Some contribution to two-phase sampling. Austral. J. Statist.,
2, 45--67.
[880] Singh, D. and Singh, P. (1977). New systematic sampling. J. Statist. Planning Infer., 1, 163--177.
[881] Singh, G.N. and Upadhyaya, L.N. (1995). A class of modified chain type estimators using two
auxiliary variables in two-phase sampling. Metron, 117--125.
[882] Singh, G.N. and Singh, V.K. (2001). On the use of auxiliary information in successive sampling.
J. Indian Soc. Agric. Statist., 54(1), 1--12.
[883] Singh, H.P. (1988). An improved class of estimators of population mean using auxiliary
information. J. Indian Soc. Agric. Statist., 96--104.
[884] Singh, H.P. (1989). A class of unbiased estimators of product of population means. J. Indian Soc.
Agric. Statist., 40, 113--118.
[885] Singh, H.P. and Biradar, R.S. (1992). Almost unbiased ratio cum product estimators for the finite
population mean. Test, 1, 19--29.
[886] Singh, H.P., Chandra, P. and Singh, S. (2003). Variance estimation using multi-auxiliary
information for random non-response in survey sampling. Statistica (Accepted).
[887] Singh, H.P. and Gangele, R.K. (1995). Almost separation of bias precipitates in the estimator of
'Inverse of population mean' with known coefficient of variation. J. Indian Soc. Agric. Statist., 47,
212--218.
[888] Singh, H.P. and Gangele, R.K. (1997). An approach for almost separation of bias precipitates. J.
Indian Soc. Agric. Statist., 50, 11--17.
[889] Singh, H.P. and Kakran, M.S. (1993). A modified ratio estimator using known coefficient of
kurtosis of an auxiliary character. (Unpublished manuscript).
[890] Singh, H.P., Katyar, N.P. and Gangwar, O.K. (1996). A class of almost unbiased regression type
estimators in two phase sampling applying Quenouille's method. J. Indian Soc. Agric. Statist., 48(1),
98--104.
[891] Singh, H.P. and Sahoo, L.N. (1989). A class of almost unbiased estimators for population ratio
and product. Calcutta Statist. Assoc. Bull., 38, 241--243.
[892] Singh, H.P. and Singh, R. (2001). Improved ratio type estimator for variance using auxiliary
information. J. Indian Soc. Agric. Statist., 54(3), 276--287.
[893] Singh, H.P. and Singh, S. (2002). Estimation of median of the study variable using known
interquartile range of the auxiliary variable. Working Paper.
[894] Singh, H.P., Singh, S. and Joarder, A.H. (2003). Estimation of median using known mode of the
auxiliary variable. J. Statist. Research (To appear).
[895] Singh, H.P., Singh, S. and Puetas, S.M. (2003a). Ratio type estimators for the median of finite
populations. Allgemeines Statistiches Archiv. (In press).
[896] Singh, H.P., Singh, S. and Puetas, S.M. (2003b). Estimation of interquartile range of the study
variable using known interquartile range of the auxiliary variable. Working Paper.
[897] Singh, H.P., Singh, S. and Puetas, S.M. (2003c). Estimation of median using three known
quartiles of the auxiliary variable. Working Paper.
[898] Singh, H.P. and Singh, V.P. (1993). A general class of unbiased estimators of a parameter.
Calcutta Statist. Assoc. Bull., 43, 169--170.
[899] Singh, H.P. and Singh, V.P. (1995). A class of unbiased dual to ratio estimator in stratified
sampling. J. Indian Soc. Agric. Statist., 47(2), 168--175.
[900] Singh, H.P. and Tracy, D.S. (2001). Estimation of population mean in presence of random
non-response in sample surveys. Statistica, LXI, no. 2, 231--248.
[901] Singh, H.P. and Upadhyaya, L.N. (1986). On a class of estimators of the population mean in
sampling using auxiliary information. J. Indian Soc. Agric. Statist., 38, 100--104.
[902] Singh, M. (1979). On the reduction of bias of ratio estimator to a desired degree. Biom. J., 21(7),
645--647.
[903] Singh, M., Kumar, P. and Chandak, R. (1983). Use of multi-auxiliary variables as a condensed
auxiliary variable in selecting a sample. Commun. Statist.--Theory Meth., 12, 1685--1697.
[904] Singh, M.P. (1967a). Ratio cum product method of estimation. Metrika, 12, 34--42.
[905] Singh, M.P. (1967b). Multivariate product method of estimation for finite populations. J. Indian
Soc. Agric. Statist., 19, 1--10.
[906] Singh, M.P. (1969). Comparison of some ratio-cum-product estimators. Sankhyā, B, 31,
375--378.
[907] Singh, M.P., Gambino, J. and Mantel, H.J. (1994). Issues and strategies for small area data.
Survey Methodology, 20, 3--22.
[908] Singh, P. (1978). A sampling scheme with inclusion probability proportional to size. Sankhyā,
C, 40, 122--128.
[909] Singh, P. and Garg, J.N. (1979). On balanced random sampling. Sankhyā, C, 41, 60--68.
[910] Singh, P. and Srivastava, A.K. (1980). Sampling scheme providing unbiased regression
estimators. Biometrika, 67, 205--209.
[911] Singh, P. and Yadav, R.J. (1992). Generalized estimation under successive sampling. J. Indian
Soc. Agric. Statist., 44, 27--36.
[912] Singh, R. (1971). Approximately optimum stratification on the auxiliary variable. J. Amer. Statist.
Assoc., 66, 829--833.
[913] Singh, R. (1972). A note on successive sampling over two occasions. Aust. J. Statist., 14(2),
120--122.
[914] Singh, R. (1975a). A note on the efficiency of ratio estimate with Midzuno's scheme of sampling.
Sankhyā, C, 37, 211--214.
[915] Singh, R. (1975b). An alternative method of stratification on the auxiliary variable. Sankhyā, C,
37, 100--108.
[916] Singh, R. (1984). Double sampling for two auxiliary characters. Calcutta Statist. Assoc. Bull., 33,
193--197.
[917] Singh, R. and Bansal, M.L. (1975). On the efficiency of interpenetrating sub-samples in simple
random sampling. Sankhyā, C, 37, 190--198.
[918] Singh, R. and Bansal, M.L. (1978). A note on the efficiency of interpenetrating sub-samples in
simple random sampling. Sankhyā, C, 40, 174--176.
[919] Singh, R. and Kathuria, O.P. (1995). Sampling without replacement in qualitative randomized
response model. J. Indian Soc. Agric. Statist., 47(2), 134--141.
[920] Singh, R. and Kishore, L. (1975). On Rao, Hartley and Cochran's method of sampling. Sankhyā,
37, 88--94.
[921] Singh, R. and Lal, M. (1978). On the construction of random groups in the RHC scheme.
Sankhyā, C, 40, 129--135.
[922] Singh, R. and Mangat, N.S. (1996). Elements of survey sampling. Kluwer Academic Publishers,
The Netherlands.
[923] Singh, R., Mangat, N.S. and Singh, S. (1993). A mail survey design for sensitive character
without using randomization device. Commun. Statist.--Theory Meth., 22(9), 2661--2668.
[924] Singh, R. and Narain, P. (1989). Method of estimation from samples with random sample sizes. J.
Statist. Planning Infer., 23, 217--225.
[925] Singh, R. and Singh, B. (1974). On replicated samples drawn with Rao, Hartley and Cochran's
Scheme. Sankhyā, C, 36, 147--150.
[926] Singh, R. and Singh, H.P. (1993). A Hartley--Ross type estimator for finite population mean when
the variables are negatively correlated. Metron, 205--216.
[927] Singh, R. and Singh, H.P. (1999). A class of unbiased estimators in cluster sampling. J. Indian
Soc. Agric. Statist., 52(3), 299--302.
[928] Singh, R., Singh, H.P. and Espejo, M.R. (1998). The efficiency of an alternative to ratio estimator
under a super population model. J. Statist. Planning Infer., 71, 287--301.
[929] Singh, R., Singh, S. and Mangat, N.S. (1995). Mail survey design for sensitive quantitative
variable. Metron, 53, 43--54.
[930 ] Singh, R., Singh, S., Mangat, N.S. and Tracy, D.S. (1995). An improved two stage randomized
response strategy. Statist ical Papers, 36, 265--271.
[931] Singh, R. and Sukhatme, B.Y. (1969). Optimum stratification . Ann. Inst. Statist. Math., 21, 515-
528.
[ 932] Singh, R.K. (1982a). On estimating ratio and product of population parameters. Calcutta Statist.
Assoc. Bull., 31, 69--76.
[ 933 ] Singh, R.K. (1982b). Generalized double sampling estimators for the ratio and product of
population parameters. J. Indian Statist. Assoc., 20, 39--49.
[ 934 ] Singh, R.K. and Ray, S.K. (1981). Product cum difference method of estimation using two
auxiliary variables. Biom. J., 23(6), 563--571.
[ 935 ] Singh, R.K. and Singh, G. (1984a). A class of estimators with estimated optimum values in
sample surveys. Statist. Prob. Lett., 2, 319--321.
[ 936 ] Singh, R.K. and Singh, G. (1984b). Improved generalized ratio cum product estimation. Biom. J.,
26(1), 57--61.
[ 937 ] Singh, R.K. and Zaidi, S.M.H. (2000). On estimating square of population mean and population
variance. J. Indian Soc. Agric. Statist., 53(3), 243--256.
[ 938 ] Singh, S. (1988). Estimation in overlapping clusters. Commun. Statist. -- Theory Meth., 17(2),
613--621.
[939] Singh, S. (1991a). Estimation of finite population variance using double sample. Aligarh J. Statist.,
11, 53--56.
[ 940 ] Singh, S. (1991b). On improved strategies in survey sampling. Unpublished Ph.D. thesis
submitted to Punjab Agricultural University, Ludhiana, India.
[ 942 ] Singh, S. (1999). An addendum to the confidentiality guaranteed under randomised response
sampling by Mahmood, Singh and Horn. Biom. J., 41(8), 955--966.
[944] Singh, S. (2000b). Estimation of variance of regression estimator in two phase sampling. Calcutta
Statist. Assoc. Bull., 50, 49--63.
[ 945 ] Singh, S. (2000c). A new method of imputation in survey sampling. Working paper.
[ 946 ] Singh, S. (2001). Generalized calibration approach for estimating variance in survey sampling.
Ann. Inst. Statist. Math., 53(2), 404--417.
[ 947 ] Singh, S. (2002a). Estimation of median of the study variable using 99 known percentiles of the
auxiliary variable. Working Paper at St. Cloud State University, MN, USA.
[ 948 ] Singh, S. (2002b). On Farrell and Singh's penalized chi square distance functions in survey.
Presented at the SSC--2003 conference at Halifax, Canada.
[ 949 ] Singh, S. (2002c). A new stochastic randomized response technique. Metrika, 56, 131--142.
[ 950 ] Singh, S. (2003a). Short note on the linear regression estimator in survey sampling. Working
paper at St. Cloud State University, St. Cloud, MN, USA.
[ 951 ] Singh, S. (2003b). On Jackknifing the two-phase calibration weights for estimating the variance
of estimator of distribution function using two auxiliary variables. Working paper at St. Cloud State
University, St. Cloud, MN, USA.
[ 952] Singh, S. (2003c). Golden Jubilee Year-2003 of the linear regression estimator. Working paper
at St. Cloud State University, St. Cloud, MN, USA.
[ 953 ] Singh, S. and Arnab, R. (2003). Penalized chi square distance function for non-response
adjustments. Working paper at St. Cloud State University and University of Durban-Westville (Submitted
for presentation at JSM-2003, California, USA).
[ 954 ] Singh, S. and Deo, B. (2002). Imputing with power transformation. Statistical Papers (To appear).
[ 955 ] Singh, S., Grewal, I.S. and Joarder, A.H. (2002). General class of estimators in multi-character
surveys. Statistical Papers (To appear).
[ 956 ] Singh, S. and Horn, S. (1998). An alternative estimator for multi-character surveys. Metrika, 48,
99--107.
[ 957 ] Singh, S. and Horn, S. (1999). An improved estimator of the variance of the regression estimator.
Biom. J., 41(3), 359--369.
[958] Singh, S. and Horn, S. (2000). Compromised imputation in survey sampling. Metrika, 51,
267--276.
[ 959 ] Singh, S., Horn, S. and Chowdhury, S. (1998). Estimation of stigmatized characteristics of a
hidden gang in finite population. Austral. & New Zealand J. Statist., 40(3), 291--297.
[960] Singh, S., Horn, S., Chowdhury, S. and Yu, F. (1999). Calibration of the estimators of variance.
Austral. & New Zealand J. Statist., 40(2), 199--212.
[ 961 ] Singh, S., Horn, S. and Tracy, D.S. (2001). Hybrid of calibration and imputation: estimation of
mean in survey sampling. Statistica, LXI(1), 27--41.
[ 962 ] Singh, S., Horn, S. and Yu, F. (1998). Estimation of variance of general regression estimator:
Higher level Calibration Approach. Survey Methodology, 24(1), 41--50.
[ 963 ] Singh, S., Horn, S., Singh, R. and Mangat, N.S. (2003). On the use of modified randomization
device for estimating the prevalence of a sensitive attribute. Statistics in Transition (To appear).
[ 964 ] Singh, S. and Joarder, A.H. (1997a). Optional randomized response technique for quantitative
sensitive character. Metron, LV, 151--157.
[ 965 ] Singh, S. and Joarder, A.H. (1997b). Unknown repeated trials in randomized response sampling.
J. Indian Soc. Agric. Statist., 50(1), 103--105.
[ 966 ] Singh, S. and Joarder, A. (1998). Estimation of finite population variance using random
nonresponse in survey sampling. Metrika, 241--249.
[ 967 ] Singh, S. and Joarder, A.H. (2002). Estimation of distribution function and median in two-phase
sampling. Pak. J. Statist., 18(2), 301--319.
[968] Singh, S., Joarder, A. H. and King, M. L. (1996). Regression analysis using scrambled responses.
Austral. J. Statist., 38(2), 201--211.
[ 969 ] Singh, S., Joarder, A.H. and Tracy, D.S. (2000). Regression type estimators for random
non-response in survey sampling. Statistica, LX, 39--44.
[ 970 ] Singh, S., Joarder, A.H. and Tracy, D.S. (2001). Median estimation using double sampling.
Austral. & New Zealand J. Statist., 43(1), 33--46.
[ 971 ] Singh, S. and Kataria, P. (1990). An estimator of finite population variance. J. Indian Soc. Agril.
Statist., 42, 186--188.
[ 972 ] Singh, S. and King, M.L. (1999). Estimation of coefficient of determination using scrambled
responses. J. Indian Soc. Agric. Statist., 52(3), 338--343.
[ 973 ] Singh, S., Mahmood, M. and Tracy, D.S. (2001). On the estimation of mean and variance of a
sensitive character using distinct units. Statistical Papers, 42, 403--411.
[ 974 ] Singh, S., Mangat, N.S. and Gupta, J.P. (1996). Improved estimator of finite population
correlation coefficient. J. Indian Soc. Agric. Statist., 48, 141--149.
[ 975 ] Singh, S., Mangat, N.S. and Mahajan, P.K. (1995). General class of estimators. J. Indian Soc.
Agril. Statist., 47(2), 129--133.
[ 976 ] Singh, S., Mangat, N.S. and Singh, R. (1994). On estimation of mean/total of stigmatised
quantitative variable. Statistica, 54(3), 383--386.
[ 977 ] Singh, S., Mangat, N.S. and Singh, R. (1997). Estimation of size and mean of a sensitive
quantitative variable for a sub-group of a population. Commun. Statist. -- Theory Meth., 26(7),
1793--1804.
[ 978 ] Singh, S., Pannu, C.J.S., Singh, S., Singh, J.P. and Kaur, S. (1996). Energy in Punjab Agriculture.
Department of Farm Power and Machinery, Punjab Agricultural University, Ludhiana, India.
[ 979 ] Singh, S. and Puetas, S.M. (2002). On the estimation of total, mean and distribution function
using two-phase sampling: calibration approach. J. Indian Soc. Agric. Statist. (Revised submitted).
[ 980 ] Singh, S., Singh, H.P., Tailor, R. and Allen, J. (2002). General class of estimators for estimating
ratio of two population means in the presence of random non-response. Working paper.
[ 981 ] Singh, S. and Singh, R. (1979). On random non-response in unequal probability sampling.
Sankhyā, C, 41, 127--137.
[ 982 ] Singh, S. and Singh, R. (1991). Almost bias precipitate filtration: A new technique. Aligarh J.
Statist., 11, 5--8.
[ 983 ] Singh, S. and Singh, R. (1992a). Improved Franklin's model for randomized response sampling.
J. Indian Statist. Assoc., 30, 109--122.
[984] Singh, S. and Singh, R. (1992b). An alternative estimator for randomised response technique. J.
Indian Soc. Agric. Statist., 44(2), 149--154.
[ 985 ] Singh, S. and Singh, R. (1993a). Almost filtration of bias precipitates: a new approach. J. Indian
Soc. Agril. Statist., 45, 214--218.
[986] Singh, S. and Singh, R. (1993b). A new method: almost separation of bias precipitates in sample
surveys. J. Indian Statist. Assoc., 31, 99--105.
[ 987 ] Singh, S. and Singh, R. (1993c). A class of almost unbiased ratio and product type estimators. J.
Indian Soc. Statist. Opers. Res., 14, 35--39.
[988] Singh, S. and Singh, R. (1993d). Generalised Franklin's model for randomised response sampling.
Commun. Statist. -- Theory Meth., 22(3), 741--755.
[ 989 ] Singh, S., Singh, R. and Mangat, N.S. (1996). Estimation of mean of a stigmatized quantitative
variable for a sub-group of the population. Metron, 54(3-4), 83--91.
[990] Singh, S., Singh, R. and Mangat, N.S. (1998). Estimation of coefficient of variation of a sensitive
character. Metron, 55, 59--67.
[ 991 ] Singh, S., Singh, R. and Mangat, N.S. (2000). Some alternative strategies to Moors' model in
randomized response sampling. J. Statist. Planning Infer., 83, 243--255.
[ 992 ] Singh, S., Singh, R., Mangat, N.S. and Tracy, D.S. (1994). An alternative device for randomised
responses. Statistica, 54(2), 233--243.
[ 993 ] Singh, S. and Singh, S. (1988). Improved estimators of K and B in finite populations. J. Indian
Soc. Agric. Statist., 40(2), 121--126.
[ 994 ] Singh, S., Singh, S., Mittal, J.P., Pannu, C.J.S. and Bhangoo, B.S. (1994). Energy inputs and crop
yield relationships for rice in Punjab. Energy-International Journal, 19(10), 1061--1065.
[995] Singh, S., Singh, S., Pannu, C.J.S., Bhangoo, B.S. and Singh, M.P. (1994). Energy inputs and crop
yield relationships for wheat in Punjab. Energy Convers. Mgmt, 35(6), 493--499.
[996] Singh, S. and Tracy, D.S. (1999). Ridge regression using scrambled responses. Metron, 57, 147--157.
[ 997 ] Singh, S. and Valdes, S.R. (2003). Optimum method of imputation in survey sampling. Working
paper at St. Cloud State University, St. Cloud, MN, USA.
[ 998] Singh, V.K. and Shukla, D. (1993). An efficient one parameter family of factor type estimators in
sample surveys. Metron, 139--159.
[999] Singh, V.K. and Singh, G.N. (1991). Chain type regression estimators with two auxiliary variables
under double sampling scheme. Metron, 279--289.
[ 1000 ] Singh, V.K., Singh, H.P. and Singh, H.P. (1994). Estimation of ratio and product of two finite
population means in two-phase sampling. J. Statist. Planning Infer., 41, 163--171.
[ 1001 ] Singh, V.P. and Singh, H.P. (1997--98). Chain estimators for population ratio in double
sampling. Aligarh J. Statist., 17/18, 85--100.
[ 1002] Sinha, B.K. (1973). On sampling schemes to realize pre-assigned sets of inclusion probabilities
of first two orders. Calcutta Statist. Assoc. Bull., 22, 89--100.
[ 1003 ] Sisodia, B.V.S. and Dwivedi, V.K. (1981). A modified ratio estimator using coefficient of
variation of auxiliary variable. J. Indian Soc. Agric. Statist., 33, 13--18.
[ 1004 ] Sisodia, B.V.S. and Singh, A. (2001). On small area estimation -- An empirical study. J. Indian
Soc. Agric. Statist., 54(3), 303--306.
[ 1005 ] Sitter, R.R. (1993). Balanced repeated replications based on orthogonal multi-arrays. Biometrika,
80, 211--221.
[ 1006] Sitter, R.R. (1997). Variance estimation for the regression estimator in two-phase sampling. J.
Amer. Statist. Assoc., 92, 780--787.
[ 1007] Sitter, R.R. and Rao, J.N.K. (1997). Imputation for missing values and corresponding variance
estimation. Canad. J. Statist., 25(1), 61--73.
[ 1008 ] Sitter, R.R. and Wu, C. (2002). Efficient estimation of quadratic finite population functions. J.
Amer. Statist. Assoc., 97, 535--543.
[ 1009] Skinner, C.J. (1991). On the efficiency of raking ratio estimation for multiple frame surveys. J.
Amer. Statist. Assoc., 86, 779--784.
[ 1010] Skinner, C.J., Holt, D. and Smith, T.M.F. (1989). Analysis of complex surveys. Wiley, New York.
[ 1011 ] Skinner, C.J. and Rao, J.N.K. (1996). Estimation in dual frame surveys with complex designs. J.
Amer. Statist. Assoc., 91, 349--356.
[ 1012] Smith, H.F. (1938). An experimental law describing heterogeneity in the yields of agricultural
crops. J. Amer. Statist. Assoc., 28, 1--23.
[ 1013 ] Smith, J.H. (1947). Estimation of linear functions of cell proportions. Ann. Math. Statist., 18,
231--254.
[ 1014 ] Smith, P. and Sedransk, J. (1983). Lower bounds for confidence coefficients for confidence
intervals for finite population quantiles. Commun. Statist. -- Theory Meth., 12, 1329--1344.
[ 1015] Smith, S.K. and Lewis, B.B. (1980). Some new techniques for applying the housing unit method
of local population estimation. Demography, 17, 323--340.
[ 1016] Smith, T.M.F. (1969). A note on ratio estimates in multi-stage sampling. J. R. Statist. Soc., A,
132, 426--430.
[ 1017 ] Smith, T.M.F. (1978). Principles and problems in the analysis of repeated surveys. In N.
Krishnan Namboodiri, ed. Survey Sampling and Measurement, Acad. Press, NY.
[ 1018] Smith, T.M.F. (1984). Present position and potential developments: some personal views, sample
surveys. J. R. Statist. Soc., A, 147, 208--221.
[ 1019 ] Smith, T.M.F. and Sugden, R.A. (1985). Inference and the ignorability of selection for
experiments and surveys. Bull. Int. Statist. Inst., 44th Session, Book II, 10.2--1 to 10.2--12.
[ 1020] Smith, T.M.F. (1995). Problems of resource allocation. Proc. Statist. Can. Symp., 95, Statistics
Canada, 107--114.
[ 1021 ] Srikantan, K.S. (1963). A note on interpenetrating sub-samples of unequal sizes. Sankhyā, B,
25, 345--350.
[ 1022] Srinath, K.P. (1971). Multiphase sampling in non-response problems. J. Amer. Statist. Assoc.,
66, 583--586.
[ 1023 ] Srinath, K.P. and Hidiroglou, M.A. (1980). Estimation of variance in multi-stage sampling.
Metrika, 27, 121--125.
[ 1024 ] Srivastava, J. and Ouyang, Z. (1992). Studies on a general estimator in sampling, utilizing
extraneous information through a sample weight function. J. Statist. Planning Infer., 31, 199--218.
[ 1025 ] Srivastava, J.N. and Saleh, F. (1985). Need of t design in sampling theory. Utilitas
Mathematica, 25, 5--17.
[ 1026 ] Srivastava, S.K. (1965). An estimator of the mean of a finite population using several auxiliary
variables. J. Indian Statist. Assoc., 3, 189--194.
[ 1027] Srivastava, S.K. (1967). An estimator using auxiliary information in sample surveys. Calcutta
Statist. Assoc. Bull., 16, 121--132.
[ 1028 ] Srivastava, S.K. (1971). A generalized estimator for the mean of a finite population using
multi-auxiliary information. J. Amer. Statist. Assoc., 66, 404--407.
[ 1029 ] Srivastava, S.K. (1980). A class of estimators using auxiliary information in sample surveys.
Canad. J. Statist., 8(2), 253--254.
[ 1030 ] Srivastava, S.K. (1981a). A generalized two-phase sampling estimator. J. Indian Soc. Agric.
Statist., 33, 38--46.
[ 1031 ] Srivastava, S.K. (1981b). A note on generalized RPO estimator in double sampling. J. Indian
Soc. Agric. Statist., 33, 89--93.
[ 1032] Srivastava, S.K. (1983). Predictive estimation of finite population mean using product estimator.
Metrika, 30, 93--99.
[ 1033 ] Srivastava, S.K. (1992). A note on improving classes of estimators in sample surveys. J. Indian
Soc. Agric. Statist., 44, 267--270.
[ 1034 ] Srivastava, S.K. and Jhajj, H.S. (1980). A class of estimators using auxiliary information for
estimating finite population variance. Sankhyā, C, 42, 87--96.
[ 1035 ] Srivastava, S.K. and Jhajj, H.S. (1981). A class of estimators of the population mean in survey
sampling using auxiliary information. Biometrika, 68, 341--343.
[ 1036 ] Srivastava, S.K. and Jhajj, H.S. (1983a). A class of estimators of the population mean using
multi-auxiliary information. Calcutta Statist. Assoc. Bull., 32, 47--56.
[ 1037 ] Srivastava, S.K. and Jhajj, H.S. (1983b). Class of estimators of mean and variance using
auxiliary information when correlation coefficient is also known. Biom. J., 25(4), 401--409.
[ 1038 ] Srivastava, S.K. and Jhajj, H.S. (1986). On the estimation of finite population correlation
coefficient. J. Indian Soc. Agric. Statist., 38, 82--91.
[ 1039 ] Srivastava, S.K. and Jhajj, H.S. (1987). Improved estimation in two-phase and successive
sampling. J. Indian Statist. Assoc., 25, 71--75.
[ 1040 ] Srivastava, S.K. and Jhajj, H.S. (1995). Classes of estimators of finite population mean and
variance using auxiliary information. J. Indian Soc. Agril. Statist., 47, 119--128.
[ 1041 ] Srivastava, S.K., Jhajj, H.S. and Sharma, M.K. (1986). Comparison of some estimators of K and
B in finite populations. J. Indian Soc. Agric. Statist., 38(2), 230--236.
[ 1042] Srivastava, S.R., Khare, B.B. and Srivastava, S.R. (1990). A generalized chain ratio estimator for
mean of finite population. J. Indian Soc. Agric. Statist., 42, 108--117.
[ 1043 ] Srivastava, V.K. and Bhatnagar, S. (1981). Ratio and product methods of estimation when X is
not known. J. Statist. Res., 15, 29--39.
[ 1044 ] Srivastava, V.K., Dwivedi, T.O., Chaubey, Y.P. and Bhatnagar, S. (1983). Finite sample
properties of Beale's ratio estimator. Commun. Statist. -- Theory Meth., 12(15), 1795--1805.
[ 1045 ] Srivenkataramana, T. (1978). Change of origin and scale in ratio and difference methods of
estimation in sampling. Canad. J. Statist., 6, 79--86.
[ 1046 ] Srivenkataramana, T. (1980). A dual to ratio estimator in sample surveys. Biometrika, 67,
199--204.
[ 1047] Srivenkataramana, T. and Tracy, D.S. (1979). On ratio and product methods of estimation in
sampling. Statistica Neerlandica, 33, 37--49.
[ 1048] Srivenkataramana, T. and Tracy, D.S. (1980). An alternative to ratio method in sample surveys.
Ann. Inst. Statist. Math., A, 32, 111--120.
[ 1049 ] Srivenkataramana, T. and Tracy, D.S. (1981). Extending product method of estimation to positive
correlation case in surveys. Austral. J. Statist., 23, 95--100.
[ 1050 ] Srivenkataramana, T. and Tracy, D.S. (1983). Interchangeability of the ratio and product methods
in sample surveys. Commun. Statist. -- Theory Meth., 12(18), 2143--2150.
[ 1051 ] Srivenkataramana, T. and Tracy, D.S. (1984). Positive and negative valued auxiliary variates in
surveys. Metron, 207--319.
[ 1052] Srivenkataramana, T. and Tracy, D.S. (1986). Transformations after sampling. Statistics, 17,
597--608.
[ 1053] Srivenkataramana, T. and Tracy, D.S. (1989). Two-phase sampling for selection with probability
proportional to size in sample surveys. Biometrika, 76(4), 818--821.
[ 1054 ] Strachan, R., King, M.L. and Singh, S. (1998). Likelihood-based estimation of the regression
model with scrambled responses. Austral. & New Zealand J. Statist., 40(3), 279--290.
[ 1055 ] Strauss, I. (1982). On the admissibility of estimators for the finite population variance. Metrika,
29, 195--202.
[ 1056 ] Stephan, F.F. (1945). The expected value and variance of the reciprocal and other negative
powers of a positive Bernoulli variate. Ann. Math. Statist., 16, 50--61.
[ 1057 ] Stroud, T.W.F. (1994). Bayesian analysis of binary survey data. Canad. J. Statist., 22, 33--45.
[ 1058] Stukel, D.M., Hidiroglou, M.A. and Sarndal, C.E. (1996). Variance estimation for calibration
estimators: A comparison of Jackknifing versus Taylor linearization. Survey Methodology, 22, 107--115.
[ 1059 ] Stukel, D.M. and Rao, J.N.K. (1997). Estimation of regression model with nested error structure
and unequal error variance under two and three stage cluster sampling. Statist. Prob. Lett., 35, 401--407.
[ 1060 ] Stukel, D.M. and Rao, J.N.K. (1999). On small-area estimation under two-fold nested error
regression models. J. Statist. Planning Infer., 78, 131--147.
[ 1061 ] Subramani, J. (2000). Diagonal systematic sampling scheme for finite populations. J. Indian Soc.
Agric. Statist., 53(2), 187--195.
[ 1062 ] Subramani, J. and Tracy, D.S. (1993). Determinant sampling scheme for finite populations.
Working paper at University of Windsor, Canada.
[ 1063] Sud, U.C. and Srivastava, A.K. (2000). Estimation of population mean in repeated surveys in the
presence of measurement errors. J. Indian Soc. Agric. Statist., 53(2), 125--133.
[ 1064 ] Sud, U.C., Srivastava, A.K. and Sharma, D.P. (2001a). On a biased estimator in repeated
surveys. J. Indian Soc. Agric. Statist., 54(1), 29--42.
[ 1065] Sud, U.C., Srivastava, A.K. and Sharma, D.P. (2001b). On the estimation of population variance
in repeated surveys. J. Indian Soc. Agric. Statist., 54(2), 355--369.
[ 1066] Sudakar, K. (1978). A note on 'Circular Systematic Sampling Design'. Sankhyā, C, 40, 72--73.
[ 1067] Sukhatme, B.V. (1962). Some ratio type estimators in two-phase sampling. J. Amer. Statist.
Assoc., 57, 628--632.
[ 1068] Sukhatme, P.V. (1944). Moments and product moments of moment statistics for samples of the
finite and infinite populations. Sankhyā, 6, 363--382.
[ 1069] Sukhatme, P.V. (1953). Sampling theory of surveys with applications. Iowa State College Press,
Ames, Iowa.
[ 1070 ] Sukhatme, P.V. (1954). Sampling theory of surveys with applications. Indian Society of
Agricultural Statistics, New Delhi.
[ 1071 ] Sukhatme, P.V., Panse, V.G. and Sastri, K.V.R. (1958). Sampling techniques for estimating the
catch of sea fish in India. Biometrics, 14, 78--96.
[ 1072 ] Sukhatme, P.V. and Sukhatme, B.V. (1970). Sampling theory of surveys with applications.
Second Edition, Asia Publishing House, Bombay, India.
[ 1073] Sukhatme, P.V., Sukhatme, B.V., Sukhatme, S. and Asok, C. (1984). Sampling theory of surveys
with applications. Iowa State University Press and Indian Society of Agricultural Statistics, New Delhi.
[ 1074] Sunter, A.B. (1977). List sequential sampling with equal or unequal probabilities without
replacement. Applied Statist., 26, 261--268.
[ 1075 ] Swain, A.K.P.C. and Mishra, G. (1992). Unbiased estimators of finite population variance using
auxiliary information. Metron, 201--215.
[ 1076 ] Swain, A.K.P.C. and Mishra, G. (1994). Limiting distribution of ratio estimator of finite
population variance. Sankhyā, B, 56, 11--17.
[ 1077] Swain, A.K.P.C. and Sahoo, L.N. (1982). Comparison of three almost unbiased ratio estimators
in a survey for quantitative characteristics. Statistica, 42(3), 397--401.
[ 1078] Tallis, G.M. (1978). Note on robust estimation in finite populations. Sankhyā, C, 40, 136--138.
[ 1079] Tallis, G.M. (1991). A note on balanced cluster sampling. Statist. Prob. Lett., 11, 169--172.
[ 1080] Tam, S.M. (1984). Optimal estimation in survey sampling under a regression superpopulation
model. Biometrika, 71, 645--647.
[ 1081 ] Tam, S.M. (1986). Characterization of best model based predictors in survey sampling.
Biometrika, 73, 232--235.
[ 1082] Tam, S.M. (1995). Optimal and robust strategies for cluster sampling. J. Amer. Statist. Assoc.,
90, 379--382.
[ 1083 ] Taylor, J.M., Muñoz, A., Bass, S.M., Sah, A.J., Chmiel, J., Kingsley, L. et al. (1990). Estimating
the distribution of times from HIV seroconversion to AIDS using multiple imputation. Statist. Med., 9,
505--514.
[ 1084] Tepping, B.J., Hurwitz, W.N. and Deming, W.E. (1943). On the efficiency of deep stratification
in block sampling. J. Amer. Statist. Assoc., 38, 93--100.
[ 1085] Thompson, M.E. (1997). Theory of sample surveys. Chapman & Hall, London, U.K.
[ 1086] Thompson, S.K. (1990). Adaptive cluster sampling. J. Amer. Statist. Assoc., 85, 1050--1059.
[ 1087 ] Thompson, S.K. and Seber, G.A.F. (1996). Adaptive sampling. New York: Wiley and Sons.
[ 1088] Tikkiwal, B.D. (1953). Optimum allocation in successive sampling. J. Indian Soc. Agric. Statist.,
5, 100--102.
[ 1089 ] Tikkiwal, B.D. (1955). Multiphase sampling on successive occasions. Unpublished Ph.D. thesis
submitted to North Carolina State University.
[ 1090] Tikkiwal, B.D. (1958). Theory of successive two-stage sampling. (Abstract). Ann. Math. Statist.,
29, 1291.
[ 1091 ] Tikkiwal, B.D. (1960). On the theory of classical regression and double sampling estimation. J.
R. Statist. Soc., B, 22, 131--138.
[ 1092] Tikkiwal, B.D. (1965). The theory of two-stage sampling on successive occasions. J. Indian Soc.
Agril. Statist., 125--136.
[ 1093] Tikkiwal, B.D. (1979). Successive sampling - a review. Bull. Int. Statist. Inst., 48, 367--383.
[ 1094] Tillé, Y. (1998). Estimation in surveys using conditional inclusion probabilities: Simple random
sampling. Int. Statist. Rev., 66, 303--322.
[ 1095] Tin, M. (1965). Comparison of some ratio estimators. J. Amer. Statist. Assoc., 60, 294--307.
[ 1096] Toutenburg, H. and Srivastava, V.K. (1998). Estimation of ratio of population means in survey
sampling when some observations are missing. Metrika, 48, 177--187.
[ 1097] Tracy, D.S. (1984). Moments of sample moments. Commun. Statist. -- Theory Meth., 3(5),
553--562.
[ 1098 ] Tracy, D.S. and Mangat, N.S. (1995). Respondent's privacy hazards in Moors' randomized
response model -- A remedial strategy. Int. J. Math. & Statist. Sci., 4(1), 121--130.
[ 1099] Tracy, D.S. and Mangat, N.S. (1996a). Some developments in randomized response sampling
during the last decade - A follow up of review by Chaudhuri and Mukerjee. J. Applied Statist. Sci.,
4(2/3), 147--158.
[ 1100 ] Tracy, D.S. and Mangat, N.S. (1996b). On respondent's jeopardy in two alternate question
randomized response model. J. Statist. Planning Infer., 55(1), 107--114.
[ 1101 ] Tracy, D.S. and Mangat, N.S. (1998). Comparisons of distinct units based estimators in unrelated
question randomized response model. Internat. J. Math. & Statist. Sci., 7, 229--240.
[ 1102] Tracy, D.S. and Osahan, S.S. (1994a). Determinant sampling versus some conventional sampling
schemes. Pak. J. Statist., 10(1), 99--121.
[ 1103 ] Tracy, D.S. and Osahan, S.S. (1994b). Estimation in overlapping clusters with unknown
population size. Survey Methodology, 20(1), 53--57.
[ 1104] Tracy, D.S. and Osahan, S.S. (1994c). Random nonresponse on study variable versus on study
as well as auxiliary variables. Statistica, 54, 163--168.
[ 1105] Tracy, D.S. and Osahan, S.S. (1999). A partial randomized response strategy. Test, 4(2),
315--322.
[ 1106] Tracy, D.S. and Singh, H.P. (1998). A modified ratio cum product estimator. Int. J. Math. &
Statist. Sci., 7, 201--212.
[ 1107 ] Tracy, D.S. and Singh, H.P. (1999). A general class of chain regression estimators in two-phase
sampling. J. Appl. Statist. Sci., 8, 205--216.
[ 1108] Tracy, D.S., Singh, H.P. and Singh, R. (1996). An alternative to ratio cum product estimator in
sample surveys. J. Statist. Planning Infer., 53, 375--387.
[ 1109] Tracy, D.S., Singh, H.P. and Singh, R. (1998). A class of unbiased estimators alternative to ratio
cum product estimator in sample surveys. Prisankhyan Samikkha, 5, 43--50.
[ 1110] Tracy, D.S., Singh, H.P. and Singh, S. (2001). An investigation on the bias reduction in linear
variety of ratio cum product estimator. Allgemeines Statist. Archive, 85, 323--332.
[ 1111 ] Tracy, D.S. and Singh, S. (2000). Calibration estimators in randomized response surveys.
Metron, 57, 47--68.
[ 1112 ] Tracy, D.S., Singh, S. and Arnab, R. (2003). Note on calibration in stratified and double
sampling. Survey Methodology, June issue (To appear).
[ 1113 ] Tripathi, T.P. (1970). Contributions to the sampling theory using multivariate information.
Ph.D. thesis submitted to Punjabi University, Patiala, India.
[ 1114 ] Tripathi, T.P. (1976). On double sampling for multivariate ratio and difference methods of
estimation. J. Indian Soc. Agric. Statist., 33, 33--54.
[ 1115 ] Tripathi, T.P. (1980). A general class of estimators for population ratio. Sankhyā, C, 42,
63--75.
[ 1116] Tripathi, T.P. (1987). A class of estimators for population mean using multi-variate auxiliary
information under general sampling designs. Aligarh J. Statist., 7, 49--62.
[ 1117 ] Tripathi, T.P. and Ahmed, M.S. (1995). A class of estimators for a finite population mean based
on multivariate information and general two-phase sampling. Calcutta Statist. Assoc. Bull., 45, 203--218.
[ 1118] Tripathi, T.P. and Chaubey, Y.P. (1992). Improved estimation of a finite population mean based
on paired observations. Commun. Statist. -- Theory Meth., 21, 3327--3333.
[ 1119 ] Tripathi, T.P. and Singh, H.P. (1992). A class of unbiased product type estimators for the mean
suitable for positive and negative correlation situations. Commun. Statist. -- Theory Meth., 21(2),
507--518.
[ 1120] Tripathi, T.P. and Srivastava, O.P. (1979). Estimation on successive occasions using PPSWR
sampling. Sankhyā, C, 41, 84--91.
[ 1121 ] Tu, X.M., Meng, X.L. and Pagano, M. (1993). The AIDS epidemic: Estimating survival after
AIDS diagnosis from surveillance data. J. Amer. Statist. Assoc., 88, 26--36.
[ 1122] Tukey, J.W. (1956). Keeping moments like sampling computations simple. Ann. Math. Statist.,
27, 37--54.
[ 1123 ] Tukey, J.W. (1958). Bias and confidence in not quite large samples. Ann. Math. Statist.
(Abstract), 29, 614.
[ 1124 ] Tuteja, R.K. and Bahl, S. (1991). Multivariate product estimators. Calcutta Statist. Assoc. Bull.,
42, 109--115.
[ 1125 ] Unam, I. (1995). Estimating the population mean using supplementary dephased information.
Commun. Statist. -- Simula., 24, 733--743.
[ 1126 ] Unnithan, V.K.G. (1978). The minimum variance boundary points of stratification. Sankhyā, C,
40, 60--72.
[ 1127] Upadhyaya, L.N., Kushwaha, K.S. and Singh, H.P. (1990). A modified chain ratio type estimator
in two-phase sampling using multi-auxiliary information. Metron, 381--393.
[ 1128] Upadhyaya, L.N. and Singh, H.P. (1999). Use of transformed auxiliary variable in estimating the
finite population mean. Biom. J., 41(5), 627--636.
[ 1129] Upadhyaya, L.N., Singh, H.P. and Singh, S. (2003). A family of almost unbiased estimators for
negatively correlated variables using jackknife technique. Statistica (Accepted).
[ 1130 ] Uthayakumaran, N. (1998). Additional circular systematic sampling methods. Biom. J., 40(4),
467--474.
[ 1132 ] Valliant, R. (2002). Variance estimation for the general regression estimator. Survey
Methodology, 28(1), 103--114.
[ 1133 ] Vardeman, S. and Meeden, G. (1983). Admissible estimators in finite population sampling
employing various types of prior information. J. Statist. Planning Infer., 7, 329--341.
[ 1135 ] Vijayan, K. (1975). On estimating variance in unequal probability sampling. J. Amer. Statist.
Assoc., 70, 713--716.
[ 1136] Vos, J.W.E. (1980). Mixing of direct, ratio and product method estimators. Statistica
Neerlandica, 34, 209--213.
[ 1137] Wakimoto, K. (1971). Stratified random sampling (III): Estimation of the correlation coefficient.
Ann. Inst. Statist. Math., 23, 339--355.
[ 1138] Walsh, J.E. (1970). Generalization of ratio estimator for population total. Sankhyā, A, 32,
99--103.
[ 1139] Warner, S.L. (1965). Randomized response: A survey technique for eliminating evasive answer
bias. J. Amer. Statist. Assoc., 60, 63--69.
[ 1140] Welch, B.L. (1937). On the z test in randomized blocks and Latin squares. Biometrika, 29,
21--52.
[ 1141 ] Williams, W.H. (1958). Unbiased regression estimator. Unpublished Ph.D. Dissertation, Iowa
State University, Ames, Iowa.
[ 1142 ] Williams, W.H. (1961). Generating unbiased ratio and regression estimators. Biometrics, 17,
267--274.
[ 1143] Williams, W.H. (1963). The precision of some unbiased regression estimators. Biometrics, 19,
352--361.
[ 1144 ] Willson, D., Kirnos, P., Gallagher, J. and Wagner, A. (2002). Variance estimation from
calibrated samples. Joint Statistical Meetings, NY-Section on survey research methods, 3727--3731.
[ 1145] Wolter, K.M. (1979). Composite estimation in finite populations. J. Amer. Statist. Assoc., 74,
604--613.
[ 1146] Wolter, K.M. (1984). An investigation of some estimators of variance for systematic sampling.
J. Amer. Statist. Assoc., 79, 781--790.
[ 1147] Wolter, K.M. (1985). Introduction to variance estimation. New York, Springer-Verlag.
[ 1148] Worthingham, R., Morrison, T., Mangat, N.S., and Desjardins, G. (2002). Bayesian estimates of
measurement error for in--line inspection and field tools. Paper IPC2002-27263, International Pipeline
Conference 2002, Calgary, Alberta, Canada. The American Society of Mechanical Engineers, New York.
Proceedings are on a CD-ROM.
[ 1149 ] Wretman, J.H. (1995). Split questionnaires. Presented at the Conference on Methodological
issues in Official Statistics, Stockholm.
[ 1150 ] Wright, R.L. (1983). Finite population sampling with multivariate auxiliary information. J.
Amer. Statist. Assoc., 78, 879--883.
[ 1151 ] Wright, T. (1990). Probability proportional to size (πps) sampling using ranks. Commun.
Statist. -- Theory Meth., 19(1), 347--362.
[ 1152 ] Wu, C. (2001). Empirical Likelihood method for finite populations. Proceedings of Statistics
2001 Canada. The 4th Conference in Applied Statistics, 339--350.
[ 1153 ] Wu, C. and Sitter, R.R. (2001). A model calibration approach to using complete auxiliary
information from survey data. J. Amer. Statist. Assoc., 96, 185--193.
[ 1154 ] Wu, C. and Sitter, R.R. (2001). Variance estimation for the finite population distribution
function with complete auxiliary information. Canad. J. Statist., 29(2), 289--307.
[ 1155 ] Wu, C.F.J. (1981). Balanced repeated replications based on mixed orthogonal arrays.
Biometrika, 78, 181--188.
[ 1156] Wu, C.F.J. (1982). Estimation of variance of the ratio estimator. Biometrika, 69, 183--189.
[ 1157 ] Wu, C.F.J. (1984). Estimation in systematic sampling with supplementary observations.
Sankhyā, B, 46, 306--315.
[ 1158] Wu, C.F.J. (1985). Variance estimation for combined ratio and combined regression estimators.
J. R. Statist. Soc., B, 47, 147--154.
[ 1159 ] Wynn, H.P. (1977a). Minimax purposive survey sampling design. J. Amer. Statist. Assoc., 72,
655--657.
[1160] Wynn, H.P. (1977b). Optimum designs for finite populations sampling. Statistical Decision Theory and Related Topics (S.S. Gupta and D.S. Moore, eds), Academic Press, New York, 471--492.
[1161] Wywial, J. (1999). Generalization of Singh and Srivastava's sampling scheme providing unbiased regression estimators. Statistics in Transition, 4(2), 259--281.
[1163] Yamada, S. and Morimoto, H. (1992). Sufficiency. Current issues in statistical inference: Essays in Honor of D. Basu by Ghosh and Pathak. Lecture Notes--Monograph Series, Institute of Mathematical Statistics, Hayward, California, 17, 86--98.
[1164] Yansaneh, I.S. and Fuller, W.A. (1998). Optimal recursive estimation for repeated surveys. Survey Methodology, 24, 31--40.
[1166] Yates, F. (1949). Sampling methods for censuses and surveys. London: Charles Griffin and Co.
[1167] Yates, F. (1960). Sampling methods for censuses and surveys. Charles Griffin & Co., London.
[1168] Yates, F. and Grundy, P.M. (1953). Selection without replacement from within strata with probability proportional to size. J. R. Statist. Soc., 15(B), 253--261.
[1169] You, Y. and Rao, J.N.K. (2000a). Small area estimation with unmatched sampling and linking models. Proceedings of Survey Methods Section, 191--196.
[1170] You, Y. and Rao, J.N.K. (2000b). Hierarchical Bayes estimation of small area means using multi-level models. Survey Methodology, 26(2), 173--181.
[1171] Yung, W. and Rao, J.N.K. (2000). Jackknife variance estimation under imputation for estimators using poststratification information. J. Amer. Statist. Assoc., 95, 903--915.
[1172] Zarcovic, S.S. (1960). On the efficiency of sampling with various probabilities and the selection of units with replacement. Metrika, 3, 53--60.
[1174] Zinger, A. (1980). Variance estimation in partially systematic sampling. J. Amer. Statist. Assoc., 75, 89--97.
[1175] Zou, G. (1997). Admissible estimation for finite population under the Linex loss function. J. Statist. Planning Infer., 61, 373--384.
[1176] Zou, G. (1999). Variance estimation for unequal probability sampling. Metrika, 50, 71--82.
[1177] Zou, G. and Liang, H. (1997). Admissibility of the usual estimators under error in variables superpopulation model. Statist. Prob. Lett., 32, 301--309.
[1178] Zou, G. and Wan, A.T.K. (2000). Simultaneous estimation of several stratum means under error-in-variables superpopulation models. Ann. Inst. Statist. Math., 52(2), 380--396.
[1179] Zyskind, G. (1967). On canonical forms, non-negative covariance matrices and best and simple least squares linear estimators in linear models. Ann. Math. Statist., 38, 1092--1109.
AUTHOR INDEX
Agarwal, M.C. 259, 731, 741, 1098, 1131
Agarwal, S.K. 265, 270, 317, 320, 342, 343, 391, 392, 1131, 1135, 1156
Aggarwal, R. 718, 1171
Ahmed, M.S. 563, 596, 601, 1131, 1188
Ahmed, T. 867, 1131
Ahsan, M.J. 723, 1131
Aires, N. 386, 1131
Ajgaonkar, S.G.P. 391, 516, 605, 1131, 1143, 1144
Arcos, A. 277, 278, 430, 516, 864, 1133, 1171
Arnab, R. 342, 422, 432, 436, 440, 442, 511, 512, 700, 748, 848, 857, 859, 864, 878, 880, 882, 896, 916, 919, 956, 957, 958, 961, 963, 968, 1000, 1035, 1041, 1046, 1054, 1132, 1133, 1140, 1162, 1180, 1188
Arnholt, A.T. 105, 1133
Artes, E. 278, 864, 885, 1133, 1171
Asad, H. 498, 1172
Asok, C. 269, 378, 389, 391, 1134, 1186
Avdhani, M.S. 864, 1134
Beale, E.M.L. 186, 187, 223, 317, 1135
Bedi, P.K. 340, 391, 509, 518, 1135
Beegle, L.D. 141, 1167
Bek, Y. 943, 1163
Bellhouse, D.R. 126, 395, 497, 508, 563, 630, 645, 842, 843, 915, 961, 970, 1135, 1159, 1167
Bogue, D.J. 1081, 1136
Boekema, F.W.M. 494, 1160
Bose, C. 594, 606, 1136
Bourke, P.D. 966, 968, 1137
Bouza, C. 1041, 1137
Brackstone, G.J. 1074, 1081, 1137
Brewer, K.R.W. 214, 312, 370, 373, 377, 378, 379, 380, 381, 384, 387, 388, 389, 390, 394, 443, 494, 497, 498, 500, 624, 744, 808, 880, 1137, 1138, 1146, 1173
Brillinger, D.R. 126, 1138
Brown, R.M. 257, 1138
Brown, J.A. 819, 1138
Bryant, E.C. 714, 715, 717, 1138, 1164
Burdick, R.K. 845, 1138
Chakrabarty, M.C. 126, 127, 1139
Chakrabarty, R.P. 227, 257, 267, 274, 569, 1025, 1139, 1155
Chakravorty, I.M. 490, 1170
Chand, L. 552, 554, 606, 1139
Chandak, R. 340, 1177
Chang, H.J. 272, 945, 961, 1139
Chang, K.C. 731, 742, 1139
Chandra, S.K. 374, 507, 508, 1172
Chen, J. 519, 586, 1011, 1017, 1028, 1141
Dalenius, T. 704, 708, 718, 817, 968, 980, 1137, 1142
Dharmadhikari, S.W. 448, 486, 1163
Diana, G. 267, 1139, 1144
Dubey, J. 731, 1175
Duncan, G.J. 865, 1144, 1154
Dunn, G. 1022, 1141
Dupont, F. 545, 563, 1003, 1144
Durbin, J. 191, 389, 393, 1144
Dwivedi, T.D. 258, 1184
Dwivedi, V.K. 257, 258, 280, 1182
Elliott, M.R. 980, 1051, 1145
Eltinge, J.L. 495, 1084, 1145, 1148
Farrell, P.J. 424, 427, 504, 517, 1099, 1145, 1146
Fan, J. 503, 1146
Fay, R. 1022, 1146
Fay, R.E. 1097, 1099, 1146
Fellegi, I.P. 385, 387, 495, 496, 1146
Feller, W. 107, 108, 1146
Feng, S. 884, 1146
Franklin, L.A. 892, 893, 962, 1147
Fuller, W.A. 250, 251, 377, 387, 410, 411, 495, 569, 578, 731, 865, 866, 1065, 1093, 1135, 1147, 1153, 1191
Ghosh, S. 368, 370, 525, 1148
Godambe, V.P. 384, 395, 427, 428, 465, 466, 467, 468, 469, 470, 472, 476, 477, 478, 480, 482, 484, 513, 563, 743, 879, 957, 1135, 1148, 1149
1176, 1177
Gregoire, T.G. 865, 1173
Gautschi, W. 638, 648, 1148
Grewal, I.S. 335, 343, 957, 960, 1150, 1180
Grey, G.B. 1018, 1163
Grundy, P.M. 354, 356, 357, 385, 387, 390, 391, 409, 411, 413, 419, 427, 428, 473, 1191
Gujarati, D. 246, 1150
Gupta, B.K. 740, 1150
Gupta, J.P. 209, 262, 340, 341, 643, 721, 1150, 1169, 1181
Gupta, M.R. 594, 1174
Han, C.P. 731, 742, 1139
Hanif, M. 389, 394, 498, 500, 512, 880, 1137, 1138, 1151, 1172
Hanurav, T.V. 385, 387, 395, 489, 713, 1151
Harter, R.M. 1093, 1135
Hartigan, J.A. 476, 1151
Hartley, H.O. 130, 180, 182, 266, 270, 291, 349, 389, 453, 458, 495, 509, 512, 578, 586, 624, 714, 715, 717, 731, 742, 838, 919, 1138, 1144, 1149, 1151, 1152, 1167
Hidiroglou, M.A. 411, 545, 561, 563, 590, 592, 603, 847, 1003, 1087, 1152, 1173, 1183, 1185
Hoza, C. 1099, 1149
Huang, K.C. 272, 945, 1139
Huang, L.R. 865, 1153
Huff, L. 1084, 1148
Hutchison, M.C. 191, 1153
Jhajj, H.S. 138, 167, 169, 171, 199, 203, 209, 259, 261, 317, 318, 319, 320, 373, 420, 421, 541, 578, 698, 1153, 1183
Jindal, K.K. 630, 643, 644, 647, 1176
Kashani, H.B. 340, 341, 961, 1150, 1159
Kashyap, S. 270, 1131
Kiregyera, B. 552, 595, 1054, 1055
King, M.L. 903, 905, 906, 926, 1180, 1181, 1185
Konijn, H.S. 512, 621, 1076, 1155
Koop, J.C. 177, 647, 1155
Lahiri, P. 1098, 1143, 1156
Laird, N.M. 1099, 1100, 1143, 1156
Mak, T.K. 250, 251, 257, 277, 474, 475, 579, 581, 582, 584, 1156, 1158
Malec, D. 1099, 1158
Mandowara, V.L. 708, 1158
Mangat, N.S. 129, 209, 245, 262, 335, 420, 458, 509, 600, 709, 896, 899, 901, 920, 933, 935, 939, 954, 955, 958, 959, 960, 961, 962, 963, 964, 967, 969, 970, 971, 1065, 1067, 1068, 1073, 1101, 1136, 1151, 1158, 1159, 1160, 1178,
Midzuno, H. 390, 391, 393, 511, 512, 1160
Miller, R.G. 227, 1160
Miller, S.M. 865, 1157
Milne, A. 639, 641, 1160
Mishra, G. 193, 420, 464, 510, 595, 1160, 1186
Mishra, R.N. 966, 1160
Mitra, J. 414, 512, 1140
Moors, J.J.A. 494, 899, 901, 930, 955, 1160
Moriarity, C.L. 1099, 1158
Morimoto, H. 492, 1191
Nadarajah, S. 343, 1145
Nanjamma, N.S. 191, 464, 1161
Nieto de Pascual, J. 191, 1162
Nieuwenbroek, N.J. 577, 600, 1169
Nigam, A.K. 391, 697, 872, 1151, 1156, 1173
Padmawar, V.R. 422, 458, 745, 1080, 1101, 1162
Rao, J.N.K. 117, 130, 141, 174, 191, 220, 227, 327, 330, 335, 336, 344, 349, 370, 376, 378, 389, 390, 391, 392, 395, 413, 422, 428, 453, 458, 484, 494, 495, 497, 509, 512, 520, 563, 578, 586, 600, 630, 645, 838, 842, 843, 847, 857, 864, 865, 872, 873, 875, 879, 887, 919, 980, 1016, 1017, 1018, 1020, 1021, 1026, 1049, 1053, 1074, 1081, 1084, 1098, 1099, 1101, 1135, 1137, 1141, 1148, 1152, 1155, 1156, 1157, 1164, 1165, 1166, 1167, 1168, 1174, 1183, 1185, 1191
Robins, J.M. 1022, 1169
Robson, D.S. 181, 187, 260, 317, 1169
Rosen, B. 386, 448, 1169
Ross, A. 180, 182, 266, 291, 1152
Rout, K. 595, 1160
Roy, D. 422, 432, 433, 435, 436, 924, 925, 926, 957, 1140, 1141
Roy, D.C. 1098, 1131
Sampath, S. 209, 374, 507, 508, 959, 1027, 1172
Scott, A.J. 387, 379, 380, 381, 498, 1173
Singh, D. 572, 630, 635, 639, 643, 644, 647, 791, 812, 847, 876, 883, 888, 1149, 1154, 1176
Singh, G. 199, 203, 265, 698, 1154, 1179
Singh, G.N. 595, 886, 1176, 1182
Singh, H.P. 174, 185, 186, 209, 248, 260, 261, 262, 265, 268, 269, 270, 271, 272, 274, 277, 279, 289, 317, 555, 563, 595, 596, 605, 607, 611, 740, 808, 1042, 1058, 1132, 1136, 1164, 1176, 1177, 1178, 1181, 1182, 1188, 1189
Singh, J.P. 715, 1181
Singh, K. 373, 1169
Singh, K.B. 639, 1159
Singh, M. 174, 340, 343, 1131, 1177
Singh, M.P. 264, 265, 272, 277, 317, 370, 394, 479, 482, 495, 552, 713, 865, 1087, 1141, 1144, 1154, 1161, 1167, 1177, 1182
Singh, P. 278, 393, 394, 420, 464, 510, 634, 635, 864, 1131, 1173, 1176, 1177
Singh, R. 129, 177, 187, 188, 209, 259, 260, 262, 263, 272, 276, 279, 326, 335, 340, 341, 373, 458, 509, 510, 511, 512, 517, 549, 607, 638, 643, 708, 709, 710, 721, 808, 896, 899, 901, 933, 935, 954, 955, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 969, 970, 971, 984, 1041, 1044, 1134, 1136, 1150, 1158, 1159, 1165, 1169, 1175, 1177, 1178, 1179, 1180, 1181, 1182, 1188
Singh, R.K. 203, 250, 265, 342, 596, 698, 1101, 1159, 1163, 1179
Singh, R.S. 596, 1164
Singh, S. 129, 187, 188, 207, 209, 245, 248, 259, 261, 262, 266, 268, 269, 276, 277, 279, 280, 281, 292, 335, 336, 337, 343, 344, 345, 346, 373, 401, 409, 414, 419, 420, 421, 424, 425, 426, 427, 428, 432, 436, 440, 442, 473, 474, 475, 503, 504, 505, 517, 519, 520, 544, 561, 578, 588, 590, 592, 600, 602, 605, 610, 611, 696, 698, 700, 713, 715, 748, 812, 874, 896, 899, 900, 901, 903, 905, 906, 907, 911, 915, 916, 919, 920, 925, 926, 927, 928, 951, 955, 957, 958, 959, 960, 961, 962, 963, 964, 965, 967, 969, 970, 971, 984, 986, 987, 993, 1000, 1007, 1008, 1025, 1027, 1028, 1035, 1041, 1042, 1046, 1047, 1052, 1054, 1056, 1058, 1132, 1133, 1134, 1145, 1146, 1150, 1151, 1157, 1158, 1159, 1176, 1177, 1178, 1179, 1180, 1181, 1182, 1185, 1188, 1189
Singh, S.K. 263, 265, 1144
Singh, V.K. 264, 595, 596, 886, 1176, 1182
Singh, V.P. 174, 563, 740, 1177, 1182
Sinha, B.K. 395, 449, 1168, 1182
Sinha, J.N. 966, 1160
Sisodia, B.V.S. 257, 258, 280, 1098, 1182
Sitter, R.R. 411, 425, 430, 504, 519, 568, 569, 600, 872, 1017, 1026, 1053, 1141, 1167, 1182, 1183, 1190
Skinner, C.J. 428, 466, 578, 586, 1175, 1183
Smarandache, F. 605, 1132
Smeets, R. 494, 1160
Srivastava, S.K. 138, 160, 164, 166, 167, 169, 171, 199, 203, 209, 239, 248, 259, 261, 266, 268, 276, 280, 317, 318, 319, 320, 373, 414, 420, 421, 541, 578, 580, 698, 840, 882, 1153, 1184
Srivastava, S.R. 552, 555, 1054, 1184
Sukhatme, P.V. 138, 269, 621, 638, 808, 822, 1051, 1055, 1134, 1186
Tikkiwal, B.D. 597, 847, 864, 877, 1131, 1187
Thompson, D.J. 349, 351, 352, 353, 355, 356, 357, 368, 371, 372, 383, 384, 387, 389, 400, 412, 428, 431, 436, 439, 440, 464, 473, 484, 485, 490, 491, 497, 499, 501, 509, 511, 512, 545, 577, 635, 713, 819, 919, 926, 1000, 1006, 1035, 1153
Thompson, M.E. 370, 373, 465, 466, 468, 469, 470, 472, 476, 477, 478, 563, 859, 1149, 1174, 1187
Tu, X.M. 1022, 1188
Tucker, A.W. 726, 1155
Tukey, J.W. 126, 191, 594, 1138, 1189
Tuteja, R.K. 264, 1189
Valdes, S.R. 1056, 1182
Vos, J.W.E. 257, 373, 395, 1025, 1141, 1189
Wakimoto, K. 209, 1189
Walsh, J.E. 257, 267, 320, 580, 1189
Wan, A.T.K. 745, 1191
Warner, S.L. 889, 892, 893, 911, 912, 914, 920, 930, 931, 933, 935, 937, 939, 952, 953, 954, 955, 958, 963, 968, 970, 1157, 1189
Webster, J.T. 191, 1167
Wretman, J.H. 374, 377, 394, 395, 411, 413, 415, 418, 496, 497, 498,
Wu, C. 411, 425, 430, 504, 519, 1141, 1183, 1190
Wu, C.F.J. 217, 220, 262, 317, 413, 414, 421, 425, 426, 561, 638, 678, 682, 685, 689, 697, 698, 872, 1143, 1167, 1190
Wynn, H.P. 378, 1190, 1191
Wywial, J. 191, 464, 1171, 1191
Yadav, R.J. 864, 1177
Yamada, S. 492, 1190
Yansaneh, I.S. 865, 1191
Yao, L. 1022, 1157
Yates, F. 354, 356, 357, 358, 387, 390, 391, 409, 410, 413, 419, 427, 428, 473, 629, 630, 641, 647, 847, 1191
Yu, F. 401, 409, 414, 419, 420, 421, 425, 426, 474, 503, 594, 696, 698, 700, 928, 1098, 1180, 1191
Zinger, A. 637, 1191
Zou, G. 514, 745, 884, 1146, 1191
Balanced half sample 871
Balanced sample 380
Basic concepts 1
Beale's estimator 223
Best estimator 479
Bootstrap 873
Confidence interval 35
Controlled sampling 125
Cosmetic calibration 500
Cumulative total method 300
Current topics 494
Determinant sampling 127
GREG 399
SRSWOR 5
Strata boundaries 701
Stratified sampling 649
Unbiased estimators 24
Warner's model 889
1 Barnett, V. (2002). Sample Survey Principles and Methods. 3rd Ed., Arnold, London.
2 Biemer, P., Groves, R., Lyberg, L., Mathiowetz, N. and Sudman, S. (1992). Measurement Errors in Surveys. Wiley.
3 Brewer, K. (2002). Combined survey sampling inference. Arnold.
4 Cassel, C., Sarndal, C.E. and Wretman, J.H. (1992). Foundations of inference in survey sampling. Krieger Publishing Company.
5 Chaudhuri, A. and Vos, J.W.E. (1988). Unified theory and strategies of survey sampling. N. Holland.
6 Chaudhuri, A. and Stenger, H. (1992). Survey sampling: Theory and Methods. Marcel Dekker, NY.
7 Cingi, H. (1994). Sampling Theory. Hacettepe University Press.
8 Cochran, W.G. (1977). Sampling Techniques. 3rd ed., Wiley.
9 Coleman, P.B. (1993). Practical sampling techniques for infrared analysis. Marcel Dekker, NY.
10 Cox, B.G., Colledge, M., Binder, D., Kott, P.S. and Christianson, A. (1995). Business Survey Methods. Wiley.
11 Deming, W.E. (1950). Some Theory of Sampling. Wiley.
12 Groves, R.M. (1988). Telephone Survey Methodology. Wiley.
13 Groves, R.M. (1989). Survey Errors and Survey Costs. Wiley.
14 Foreman, E.K. (1991). Survey Sampling Principles. Marcel Dekker.
15 Hajek, J. (1981). Sampling from a finite population. Marcel Dekker.
16 Hansen, M.H., Hurwitz, W.N. and Madow, G. (1953). Sample Survey Methods and Theory. Wiley.
17 Hansen, M.H., Hurwitz, W.N. and Madow, G. (1993). Sample Survey Methods and Theory. Wiley.
18 Hedayat, A.S. and Sinha, B.K. (1991). Design and inference in finite population sampling. Wiley.
19 Jessen, R.J. (1978). Statistical Survey Techniques. Wiley.
20 Kish, L. (1995). Survey Sampling. Wiley.
21 Lohr, S.L. (1999). Sampling: Design and Analysis. Duxbury Press.
22 Mendenhall, W. (1979). Elementary survey sampling. Duxbury.
23 Moser, C.A. and Kalton, G. (1971). Survey Methods in Social Investigation. London: Heinemann.
24 Mukhopadhyay, P. (1998). Small area estimation in survey sampling. Narosa Publishing House.
25 Mukhopadhyay, P. (1998). Theory and methods of survey sampling. Prentice-Hall of India.
26 Mukhopadhyay, P. (2002). Topics in survey sampling. Springer-Verlag, NY.
27 Murthy, M.N. (1967). Sampling theory and methods. Statistical Publishing Society, Calcutta.
28 Platek, R., Rao, J.N.K., Sarndal, C.E. and Singh, M.P. (1987). Small area statistics. Wiley.
29 Raj, D. (1968). Survey Sampling. McGraw Hill.
30 Rao, J.N.K. (2003). Small area estimation: Methods and applications. Wiley.
31 Rossi, P.H., Wright, J.D. and Anderson, A.S. (1983). Handbook of Survey Research. AP.
32 Sarndal, C.E., Swensson, B. and Wretman, J.H. (1992). Model Assisted Survey Sampling. Springer.
33 Scheaffer, R.L., Mendenhall, W. and Ott, R.L. (1996). Elementary Survey Sampling. Duxbury.
34 Seber, G.A.F. (1981). Estimation of Animal Abundance and Related Parameters. Griffin, London.
35 Singh, R. and Mangat, N.S. (1996). Elements of survey sampling. Kluwer Academic Publisher.
36 Som, R.K. (1995). Practical sampling techniques. Marcel Dekker, NY.
37 Stuart, A. (1984). The Ideas of Sampling. Griffin, London.
38 Sudman, S. (1976). Applied Sampling. AP.
39 Sukhatme, P.V., Sukhatme, B.V., Sukhatme, S. and Asok, C. (1984). Sampling Theory of Surveys With Applications. 3rd Ed., Iowa State University Press, Ames, Iowa.
40 Thompson, M.E. (1997). Theory of Sample Surveys. Chapman & Hall.
41 Thompson, S.K. (1992). Sampling. Wiley.
42 Tryfos, P. (1996). Sampling Methods for Applied Research. Wiley.
43 Valliant, R., Dorfman, A.H. and Royall, R. (2000). Finite Population Sampling and Inference: a Prediction Approach. Wiley.
44 Williams, B. (1978). A Sampler on Sampling. Wiley.
45 Wolter, K.M. (1985). Introduction to Variance Estimation. Springer.
46 Yates, F. (1960). Sampling Methods for Censuses and Surveys. Griffin Publishing Co.