Inference and Errors in Surveys - Groves


SURVEY METHODOLOGY

Second Edition

Robert M. Groves
Floyd J. Fowler, Jr.
Mick P. Couper
James M. Lepkowski
Eleanor Singer
Roger Tourangeau

©WILEY
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.


Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA
01923, (978) 750-8400, fax (978) 646-8600, or on the web at www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons,
Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008 or online at
http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to the
accuracy or completeness of the contents of this book and specifically disclaim any implied warranties
of merchantability or fitness for a particular purpose. No warranty may be created or extended by
sales representatives or written sales materials. The advice and strategies contained herein may not
be suitable for your situation. You should consult with a professional where appropriate. Neither the
publisher nor author shall be liable for any loss of profit or any other commercial damages,
including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact
our Customer Care Department within the U.S. at (800) 762-2974, outside the U.S. at (317) 572-3993
or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print
may not be available in electronic format. For information about Wiley products, visit our web site
at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Survey methodology / Robert Groves ... [et al.]. — 2nd ed.


p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-46546-2 (paper)
1. Surveys—Methodology. 2. Social surveys—Methodology. 3. Social sciences—Research—
Statistical methods. I. Groves, Robert M.
HA31.2.S873 2009
001.4'33-dc22 2009004196

Printed in the United States of America.

10 9 8 7 6 5 4

CHAPTER TWO

INFERENCE AND ERROR IN SURVEYS

2.1 Introduction

Survey methodology seeks to understand why error arises in survey statistics. Chapters 3 through 11 describe in detail strategies for measuring and minimizing error. In order to appreciate those chapters, and to understand survey methodology, it is first necessary to understand thoroughly what we mean by "error."
As the starting point, let us think about how surveys work to produce statistical descriptions of populations. Figure 2.1 provides the simplest diagram of how they work. At the bottom left is the raw material of surveys—answers to questions by an individual. These have value to the extent they are good descriptors of the characteristics of interest (the next-higher box on the left). Surveys, however, are never interested in the characteristics of individual respondents per se. They are interested in statistics that combine those answers to summarize the characteristics of groups of persons. Sample surveys combine the answers of individual respondents in statistical computing steps (the middle cloud in Figure 2.1) to construct statistics describing all persons in the sample. At this point, a survey is one step away from its goal—the description of characteristics of a larger population from which the sample was drawn.

Figure 2.1 Two types of survey inference.

The vertical arrows in Figure 2.1 are "inferential steps." That is, they use information obtained imperfectly to describe a more abstract, larger entity. "Inference" in surveys is the formal logic that permits description of unobserved phenomena based on observed phenomena. For example, inference about unobserved mental states, like opinions, is made based on answers to specific questions related to those opinions. Inference about population elements not measured is made based on observations of a sample of others from the same population. In the jargon of survey methodology, we use an answer to a question from an individual respondent to draw inferences about the characteristic of interest to the survey for that person. We use statistics computed on the respondents to draw inferences about the characteristics of the larger population.

These two inferential steps are central to the two needed characteristics of a survey:

1) Answers people give must accurately describe characteristics of the respondents.
2) The subset of persons participating in the survey must have characteristics similar to those of a larger population.

When either of these two conditions is not met, the survey statistics are subject to "error." The use of the term "error" does not imply mistakes in the colloquial sense. Instead, it refers to deviations of what is desired in the survey process from what is attained. "Measurement errors" or "errors of observation" will pertain to deviations between answers given to a survey question and the underlying attribute being measured. "Errors of nonobservation" will pertain to the deviations of a statistic estimated on a sample from that on the full population.

Let us give an example to make this real. The Current Employment Statistics (CES) program is interested in measuring the total number of jobs in existence in the United States during a specific month. It asks individual sample employers to report how many persons were on their payroll in the week of the 12th of that month. (An error can arise because the survey does not attempt to measure job counts in other weeks of the month.) Some employers' records are incomplete or out of date. (An error can arise from poor records used to respond.) These are problems of inference from the answers obtained to the desired characteristic to be measured (the leftmost vertical arrows in Figure 2.1).

The sample of the employers chosen is based on lists of units of state unemployment compensation rolls months before the month in question. Newly created employers are omitted. (An error can arise from using out-of-date lists of employers.) The specific set of employers chosen for the sample might not be a good reflection of the characteristics of the total population of employers. (An error can arise from sampling only a subset of employers into the survey.) Further, not all selected employers respond. (An error can arise from the absence of answers from some of the selected employers.) These are problems of inference from statistics on the respondents to statistics on the full population.

One's first reaction to this litany of errors may be that it seems impossible for surveys ever to be useful tools to describe large populations. Do not despair! Despite all these potential sources of error, carefully designed, conducted, and analyzed surveys have been found to be uniquely informative tools to describe the world. Survey methodology is the study of what makes survey statistics more or less informative.

Survey methodology has classified these various errors illustrated with the CES example above into separate categories. There are separate research literatures for each error because each seems to be subject to different influences and has different kinds of effects on survey statistics.

One way of learning about surveys is to examine each type of error in turn, or studying surveys from a "quality" perspective. This is a perspective peculiar to survey methodology. Another way of learning about surveys is to study all the survey design decisions that are required to construct a survey: identification of the appropriate population to study, choosing a way of listing the population, selecting a sampling scheme, choosing modes of data collection, and so on. This is an approach common to texts on survey research (e.g., Babbie, 1990; Fowler, 2001).

2.2 The Life Cycle of a Survey from a Design Perspective

In this and the next section, we will describe the two dominant perspectives about surveys: the design perspective and the quality perspective. From the design perspective, discussed in this section, survey designs move from abstract ideas to concrete actions. From the quality perspective, survey designs are distinguished by the major sources of error that affect survey statistics. First, we tackle the design perspective.

A survey moves from design to execution. Without a good design, good survey statistics rarely result. As the focus moves from design to execution, the nature of work moves from the abstract to the concrete. Survey results, therefore, depend on inference back to the abstract from the concrete. Figure 2.2 shows that there are two parallel aspects of surveys: the measurement of constructs and descriptions of population attributes. This figure elaborates the two dimensions of inference shown in Figure 2.1. The measurement dimension describes what data are to be collected about the observational units in the sample: what is the survey about? The representational dimension concerns what populations are described by the survey: who is the survey about? Both dimensions require forethought, planning, and careful execution.

Because Figure 2.2 contains important components of survey methods, we will spend some time discussing it. We will do so by defining and giving examples of each box in the figure.

2.2.1 Constructs

"Constructs" are the elements of information that are sought by the researcher. The Current Employment Statistics survey attempts to measure how many new
jobs were created in the past month in the United States, the National Assessment of Educational Progress measures knowledge in mathematics of school children, and the National Crime Victimization Survey (NCVS) measures how many incidents of crimes with victims there were in the last year. The last sentence can be understood by many; the words are simple. However, the wording is not precise; it is relatively abstract. The words do not describe exactly what is meant, nor exactly what is done to measure the constructs. In some sense, constructs are ideas. They are most often verbally presented.

For example, one ambiguity is the identity of the victim of the crime. When acts of vandalism occur for a household (say, a mailbox being knocked down), who is the victim? (In these cases, NCVS distinguishes crimes against a household from crimes against a person.) When graffiti is spray painted over a public space, who is the victim? Should "victimization" include only those crimes viewed as eligible for prosecution? When does an unpleasant event rise to the level of a crime? All of these are questions that arise when one begins to move from a short verbal description to a measurement operation. Some constructs more easily lend themselves to measurements than others.

Some constructs are more abstract than others. The Survey of Consumers (SOC) measures short-term optimism about one's financial status. This is an attitudinal state of the person, which cannot be directly observed by another person. It is internal to the person, perhaps having aspects that are highly variable within and across persons (e.g., those who carefully track their current financial status may have well-developed answers; those who have never thought about it may have to construct an answer de novo). In contrast, the National Survey on Drug Use and Health (NSDUH) measures consumption of beer in the last month. This is a construct much closer to observable behaviors. There are a limited number of ways this could be measured. The main issues are simply to decide what kinds of drinks count as beer (e.g., does nonalcoholic beer count?) and what units to count (12-ounce cans or bottles is an obvious choice). Thus, the consumer optimism construct is more abstract than the construct concerning beer consumption.

Figure 2.2 Survey lifecycle from a design perspective.

2.2.2 Measurement

Measurements are more concrete than constructs. "Measurements" in surveys are ways to gather information about constructs. Survey measurements are quite diverse: soil samples from the yards of sample households in surveys about toxic contamination, blood pressure measurements in health surveys, interviewer observations about housing structure conditions, electronic measurements of traffic flow in traffic surveys. However, survey measurements are often questions posed to a respondent, using words (e.g., "During the last 6 months, did you call the police to report something that happened to YOU that you thought was a crime?"). The critical task for measurement is to design questions that produce answers reflecting perfectly the constructs we are trying to measure. These questions can be communicated orally (in telephone or face-to-face modes) or visually (in paper and computer-assisted self-administered surveys). Sometimes, however, they are observations made by the interviewer (e.g., asking the interviewer to observe the type of structure of the sample housing unit or to observe certain attributes of the neighborhood). Sometimes they are electronic or physical measurements (e.g., electronic recording of prices of goods in a sample retail store, taking a blood or hair sample in a health-related survey, taking a sample of earth in a survey of toxic waste, taking paint samples). Sometimes questions posed to respondents follow their observation of visual material (e.g., streaming video presentation of commercials on a laptop, presentation of magazine covers).

2.2.3 Response

The data produced in surveys come from information provided through the survey measurements. The nature of the responses is determined often by the nature of the measurements. When questions are used as the measurement device, respondents can use a variety of means to produce a response. They can search their memories and use their judgment to produce an answer [e.g., answering the question, "Now looking ahead, do you think that a year from now you (and your family living there) will be better off financially, or worse off, or just about the same as now?" from the SOC]. They can access records to provide an answer (e.g., looking at the employer's personnel records to report how many nonsupervisory employees were on staff on the week of the 12th, as in the CES). They can seek another person to help answer the question (e.g., asking a spouse to recall when the respondent last visited the doctor).

Sometimes, the responses are provided as part of the question, and the task of the respondent is to choose from the proffered categories. Other times, only

the question is presented, and the respondents must generate an answer in their own words. Sometimes, a respondent fails to provide a response to a measurement attempt. This complicates the computation of statistics involving that measure.

2.2.4 Edited Response

In some modes of data collection, the initial measurement provided undergoes a review prior to moving on to the next. In computer-assisted measurement, quantitative answers are subjected to range checks, to flag answers that are outside acceptable limits. For example, if the question asks about year of birth, numbers less than 1890 might lead to a follow-up question verifying the stated year. There may also be consistency checks, which are logical relationships between two different measurements. For example, if the respondent states that she is 14 years old and has given birth to 5 children, there may be a follow-up question that clarifies the apparent discrepancy and permits a correction of any errors of data. With interviewer-administered paper questionnaires, the interviewer often is instructed to review a completed instrument, look for illegible answers, and cross out questions that were skipped in the interview.

After all of the respondents have provided their answers, further editing of data sometimes occurs. This editing may examine the full distribution of answers and look for atypical patterns of responses. This attempt at "outlier detection" often leads to more careful examination of a particular completed questionnaire.

To review, edited responses try to improve on the original responses obtained from measurements of underlying constructs. The edited responses are the data from which inference is made about the values of the construct for an individual respondent.

2.2.5 The Target Population

We are now ready to move to the right side of Figure 2.2, moving from the abstract to the concrete with regard to the representational properties of a survey. The first box describes the concept of a "target population." This is the set of units to be studied. As denoted in Figure 2.2, this is the most abstract of the population definitions. For many U.S. household surveys, the target population may be "the U.S. adult population." This description fails to mention the time extents of the group (e.g., the population living in 2004). It fails to note whether to include those living outside traditional households, fails to specify whether to include those who recently became adults, and fails to note how residency in the United States would be determined. The lack of specificity is not damaging to some discussions, but is to others. The target population is a set of persons of finite size, which will be studied. The National Crime Victimization Survey targets those aged 12 and over who are not in active military service and reside in noninstitutionalized settings (i.e., housing units, not hospitals, prisons, or dormitories). The time extents of the population are fixed for the month in which the residence of the sample person is selected.

2.2.6 The Frame Population

The frame population is the set of target population members that has a chance to be selected into the survey sample. In a simple case, the "sampling frame" is a listing of all units (e.g., people and employers) in the target population. Sometimes, however, the sampling frame is a set of units imperfectly linked to population members. For example, the SOC has as its target population the U.S. adult household population. It uses as its sampling frame a list of telephone numbers. It associates each person to the telephone number of his/her household. (Note that there are complications in that some persons have no telephone in their household and others have several different telephone numbers.) The National Survey on Drug Use and Health uses a sampling frame of county maps in the United States. Through this, it associates each housing unit with a unique county. It then associates each person in the target population of adults and children age 12 or older with the housing unit in which they live. (Note that there are complications for persons without fixed residence and those who have multiple residences.)

Illustration—Populations of Inference and Target Populations

Often, survey statistics are constructed to describe a population that cannot easily be measured. For example, the Surveys of Consumers attempts to estimate consumer sentiment among U.S. adults in a specific month. Each minute, households are being formed through family or rent-sharing arrangements; being dissolved through death, divorce, and residential mobility; being merged together, and so on. The household population of a month is different at the beginning of the month than at the end of the month. Sometimes, the phrase "population of inference" is used for the set of persons who at any time in the month might be eligible. The "target population" describes the population that could be covered, given that the frame is set at the beginning of the month and contact with sample households occurs throughout the month.

2.2.7 The Sample

A sample is selected from a sampling frame. This sample is the group from which measurements will be sought. In many cases, the sample will be only a very small fraction of the sampling frame (and, therefore, of the target population).

2.2.8 The Respondents

In almost all surveys, the attempt to measure the selected sample cases does not achieve full success. Those successfully measured are commonly called "respondents" ("nonrespondents" or "unit nonresponse" is the complement). There is usually some difficulty in determining whether some cases should be termed "respondents" or "nonrespondents," because they provide only part of the information that is sought. Decisions must be made when building a data file about when to include a data record with less than complete information and when to exclude a respondent altogether from the analytic file. "Item missing data" is the term used to describe the absence of information on individual data items for a sample case successfully measured on other items. Figure 2.3 is a visual portrayal of the type of survey and frame data and the nature of unit and item nonresponse.

The figure portrays a data file; each line is a data record of a different sample person. The left columns contain data from the sampling frame, on all sample
cases. Respondents have longer data records containing their answers to questions. The nonrespondents (at the end of the file) have data only from the sampling frame. Here and there throughout the respondent records are some individual missing data, symbolized by a "|". One example of item missing data from the CES is missing payroll totals for sample employers who have not finalized their payroll records by the time the questionnaire must be returned.

Figure 2.3 Unit and item nonresponse in a survey data file.

2.2.9 Postsurvey Adjustments

After all respondents provide data and a set of data records for them is assembled, there is often another step taken to improve the quality of the estimates made from the survey. Because of nonresponse and because of some coverage problems (mismatches of the sampling frame and the target population), statistics based on the respondents may depart from those of the full population the statistics are attempting to estimate. At this point, examination of unit nonresponse patterns over different subgroups (e.g., the finding that urban response rates are lower than rural response rates) may suggest an underrepresentation of some groups relative to the sampling frame. Similarly, knowledge about the type of units not included in the sampling frame (e.g., new households in the SOC or new employers in the CES) may suggest an underrepresentation of certain types of target population members. We will learn later that "weighting" up the underrepresented in our calculations may improve the survey estimates. Alternatively, data that are missing are replaced with estimated responses through a process called "imputation." There are many different weighting and imputation procedures, all labeled as "postsurvey adjustments."

2.2.10 How Design Becomes Process

The design steps described above typically have a very predictable sequence. It is most common to array the steps of a survey along the temporal continuum in which they occur, and this is the order of most texts on "how to do a survey."

Figure 2.4 A survey from a process perspective.

Figure 2.4 shows how the objectives of a survey help make two decisions, one regarding the sample and another regarding the measurement process. The decision on what mode of data collection to use is an important determinant of how the measurement instrument is shaped (e.g., "questionnaire" in Figure 2.4). The questionnaire needs a pretest before it is used to collect survey data. On the right-hand track of activities, the choice of a sampling frame, when married to a sample design, produces the realized sample for the survey. The measurement instrument and the sample come together during a data collection phase, during which attention is paid to obtaining complete measurement of the sample (i.e.,
avoiding nonresponse). After data collection, the data are edited and coded (i.e., placed into a form suitable for analysis). The data file often undergoes some postsurvey adjustments, mainly for coverage and nonresponse errors. These adjustments define the data used in the final estimation or analysis step, which forms the statistical basis of the inference back to the full target population. This book takes the perspective that good survey estimates require simultaneous and coordinated attention to the different steps in the survey process.

Figure 2.5 Survey life cycle from a quality perspective.

2.3 The Life Cycle of a Survey from a Quality Perspective

We used Figure 2.2 to describe key terminology in surveys. The same figure is useful to describe how survey methodologists think about quality. Figure 2.5 has added in ovals a set of quality concepts that are common in survey methodology. Each of them is placed in between successive steps in the survey process, to indicate that the quality concepts reflect mismatches between successive steps. Most of the ovals contain the word "error" because that is the terminology most commonly used. The job of a survey designer is to minimize error in survey statistics by making design and estimation choices that minimize the gap between two successive stages of the survey process. This framework is sometimes labeled the "total survey error" framework or "total survey error" paradigm.

There are two important things to note about Figure 2.5:

1) Each of the quality components (ovals in Figure 2.5) has verbal descriptions and statistical formulations.
2) The quality components are properties of individual survey statistics (i.e., each statistic from a single survey may differ in its qualities), not of whole surveys.

The next sections introduce the reader both to the concepts of different quality components and to the simple statistical notation describing them. Since the quality is an attribute not of a survey but of individual statistics, we could present the statistics for a variety of commonly used statistics (e.g., the sample mean, a regression coefficient between two variables, estimates of population totals). To keep the discussion as simple as possible, we will describe the error components for a very simple statistic, the sample mean, as an indicator of the average in the population of some underlying construct. The quality properties of the sample mean will be discussed as a function of its relationship to the population mean.

We will use symbols to present a compact form of description of the error concepts, because that is the traditional mode of presentation. The Greek letter μ (mu) will be used to denote the unobservable construct that is the target of the measurement. The capital letter Y will be used to denote the measurement meant to reflect μ (but subject to inevitable measurement problems). When the measurement is actually applied, we obtain a response called y (lower case).

The statistical notation will be

μ_i = the value of a construct (e.g., number of doctor visits) for the ith person in the population, i = 1, 2, ..., N
Y_i = the value of a measurement (e.g., reported number of doctor visits) for the ith sample person, i = 1, 2, ..., n
y_i = the value of the response to application of the measurement (e.g., an answer to a survey question)
y_ip = the value of the response after editing and other processing steps.
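The four symbols can be made concrete with one hypothetical person. The numbers below are invented, and the editing rule is a simple range check of the kind described in Section 2.2.4; it is a sketch of the notation, not of any real survey's processing.

```python
# One hypothetical person (all values invented). mu_i is the true number
# of doctor visits; Y_i is what the measurement as designed would yield;
# y_i is the response actually obtained; y_ip is the edited response.
mu_i = 4            # construct: true number of doctor visits
Y_i = mu_i + 1      # the measurement embeds an error of +1
y_i = -5            # the obtained response contains a recording slip
y_ip = max(0, y_i)  # editing: a range check replaces the impossible value

for name, value in [("mu_i", mu_i), ("Y_i", Y_i), ("y_i", y_i), ("y_ip", y_ip)]:
    print(f"{name:4} = {value:3d}  (deviation from construct: {value - mu_i:+d})")
```

Each stage can move the recorded value closer to or further from μ_i; the error concepts in the next sections describe those gaps one at a time.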
In short, the underlying target attribute we are attempting to measure is μ_i, but instead we use an imperfect indicator, Y_i, which departs from the target because of imperfections in the measurement. When we apply the measurement, there are problems of administration. Instead of obtaining the answer Y_i, we obtain instead y_i, the response to the measurement. We attempt to repair the weakness in the measurement through an editing step, and obtain as a result y_ip, which we call the edited response (the subscript p stands for "postdata collection").

2.3.1 The Observational Gap between Constructs and Measures

The only oval in Figure 2.5 that does not contain the word "error" corresponds to mismatches between a construct and its associated measurement. Measurement theory in psychology (called "psychometrics") offers the richest notions relevant to this issue. Construct "validity" is the extent to which the measure is related to the underlying construct ("invalidity" is the term sometimes used to describe the extent to which validity is not attained). For example, in the National Assessment of Educational Progress, when measuring the construct of mathematical abilities of 4th graders, the measures are sets of arithmetic problems. Each of these problems is viewed to test some component of mathematical ability. The notion of validity is itself conceptual; if we knew each student's true mathematical ability, how related would it be to that measured by the set of arithmetic problems? Validity is the extent to which the measures reflect the underlying construct.

In statistical terms, the notion of validity lies at the level of an individual respondent. It notes that the construct (even though it may not be easily observed or observed at all) has some value associated with the ith person in the population, traditionally labeled as μ_i, implying the "true value" of the construct for the ith person. When a specific measure of Y is administered (e.g., an arithmetic problem given to measure mathematical ability), simple psychometric measurement theory notes that the result is not μ_i but something else; the obtained measurement departs from the true value by an error, Y_i = μ_i + ε_i. For example, if the ith student's true math ability is 57 but the obtained score is 52, the error in math ability for the ith student is (52 − 57) = −5, because Y_i = 52 = μ_i + ε_i = 57 + (−5).

One added feature of the measurement is necessary to understand notions of validity: a single application of the measurement to the ith person is viewed as one of an infinite number of such measurements that could be made. For example, the answer to a survey question about how many times one has been victimized in the last six months is viewed as just one incident of the application of that question to a specific respondent. In the language of psychometric measurement theory, each survey is one trial of an infinite number of trials.

Thus, with the notion of trials, the response process becomes

Y_it = μ_i + ε_it

Now we need two subscripts on the terms: one to denote the element of the population (i) and one to denote the trial of the measurement (t). Any one application of the measurement (t) is but one trial from a conceptually infinite number of possible measurements. The response obtained for the one survey conducted (Y_it for the tth trial) deviates from the true value by an error that is specific to the one trial (ε_it). That is, each survey is one specific trial, t, of a measurement process, and the deviations from the true value for the ith person may vary over trials (requiring the subscript t, as in ε_it). For example, on the math ability construct, using a particular measure, sometimes the ith student may achieve a 52 as above, but on repeated administrations might achieve a 59 or a 49 or a 57, with the corresponding error being +2 or −8 or 0, respectively. We do not really administer the test many times; instead, we envision that the one test might have achieved different outcomes from the same person over conceptually independent trials.

The Notion of Trials

What does it mean when someone says that a specific response to a sur-

Now we are very close to defining validity for this simple case of response deviations from the true value. Validity is the correlation of the measurement, Y_it, and the true value, μ_i, measured over all possible trials and persons:

validity = E[(Y_it − Ȳ)(μ_i − μ̄)] / √( E[(Y_it − Ȳ)²] · E[(μ_i − μ̄)²] )

where the expectations are taken over all persons i and trials t, and Ȳ and μ̄ denote the means of the measurements and of the true values.
vey question is just “one trial of the Where p is merely the mean of the p u over all trials and all persons and Y is the
measurement process?” How can Y^pt+Si average of the Y.r The E at the beginning of the expression denotes an expected expected value
one really ask the same question of Or average value over all persons and all trials of measurement. When y and p
the same respondent multiple times aov&ry, moving up and down in tandem, the measurement has high construct
and learn anything valuable? The That is, the measurement equals the true value plus validity. A valid measure of an underlying construct is one that is perfectly corre-
answer is that “trials” are a concept, a some error term, e , the Greek letter epsilon, denot litlcd to the construct.
model of the response process. The ing a deviation from the true value. This deviation Later, we will become more sophisticated about this notion of validity, not
model posits that the response given is the basis of the notion of validity. For example, ing that two variables can be perfectly correlated but produce different values of
by one person to a specific question in the NAEP we might conceptualize mathematics mime of their univariate statistics. Two variables can be perfectly correlated but
is inherently variable. If one could ability as a scale from 0 to 100, with the average yield different mean values. For example, if all respondents underreport their
erase all memories of the first trial ability at 50. The model above says that on a par Weight by 5 pounds, then true weight and reported weight will be perfectly corre-
measurement and repeat the ques­ ticular measure of math ability, a student (z) who Inled, but the mean reported weight will be 5 pounds less than the mean of the true
tion, somewhat different answers has a true math ability of, say, 57, may achieve a weights. This is a point of divergence of psychometric measurement theory and
would be given. different score, say, 52. The error of that measure utirvey statistical error properties.
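The trial-based definition of validity above lends itself to a small simulation. The sketch below (all numbers are invented for illustration; the 0-100 ability scale echoes the NAEP example but the spreads are not NAEP parameters) draws true values mu_i, generates many conceptual trials Y_it = mu_i + e_it, and computes the validity correlation between measurement and true value over all persons and trials.

```python
import random
import math

random.seed(1)

N_PERSONS, N_TRIALS = 500, 200

# True values mu_i for each person (construct on a 0-100 scale, centered at 50).
mu = [random.gauss(50, 10) for _ in range(N_PERSONS)]

# Conceptual trials: Y_it = mu_i + e_it, with a trial-specific error e_it.
Y = [[m + random.gauss(0, 5) for _ in range(N_TRIALS)] for m in mu]

# Validity: correlation of Y_it with mu_i over all persons and trials,
# computed exactly as in the formula in the text.
pairs = [(Y[i][t], mu[i]) for i in range(N_PERSONS) for t in range(N_TRIALS)]
y_bar = sum(y for y, _ in pairs) / len(pairs)
mu_bar = sum(m for _, m in pairs) / len(pairs)
cov = sum((y - y_bar) * (m - mu_bar) for y, m in pairs) / len(pairs)
var_y = sum((y - y_bar) ** 2 for y, _ in pairs) / len(pairs)
var_mu = sum((m - mu_bar) ** 2 for _, m in pairs) / len(pairs)
validity = cov / math.sqrt(var_y * var_mu)

print(round(validity, 3))
```

With a true-value spread of 10 and a trial-error spread of 5, the theoretical validity is 10/sqrt(125), about 0.89, and the simulated value should land close to that; shrinking the trial error drives the correlation toward 1, a perfectly valid measure.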
INFERENCE AND ERROR IN SURVEYS | THE LIFE CYCLE OF A SURVEY: QUALITY PERSPECTIVE

2.3.2 Measurement Error: The Observational Gap between the Ideal Measurement and the Response Obtained

The next important quality component in Figure 2.5 is measurement error. By measurement error we mean a departure from the true value of the measurement as applied to a sample unit and the value provided. For example, imagine that the question from the National Survey on Drug Use and Health (NSDUH) is "Have you ever, even once, used any form of cocaine?" A common finding (see Sections 5.3.5 and 7.3.7) is that behaviors that are perceived by the respondent as undesirable tend to be underreported. Thus, for example, the true value for the response to this question for one respondent may be "yes," but the respondent will answer "no" in order to avoid the potential embarrassment of someone learning of his/her drug use.

To the extent that such response behavior is common and systematic across administrations of the question, there arises a discrepancy between the respondent mean response and the true sample mean. In the example above, the percentage of persons reporting any lifetime use of cocaine will be underestimated. In statistical notation, we need to introduce a new term that denotes the response to the question as distinct from the true value on the measure, Y_i for the ith person. We call the response to the question y_i, so we can denote the systematic deviation from true values as (y_i - Y_i). Returning to the CES example, the count for nonsupervisory employees might be 12 for some employer but the response to the question is 15 employees. In the terminology of survey measurement theory, a response deviation occurs to the extent that y_i differs from Y_i; in this case, y_i - Y_i = 15 - 12 = 3.

In our perspective on measurement, we again acknowledge that each act of application of a measure is but one of a conceptually infinite number of applications. Thus, we again use the notion of a "trial" to denote the single application of a measurement. If response deviations described above are systematic, that is, if there is a consistent direction of the response deviations over trials, then "response bias" might result. "Bias" is the difference between the expected value (over all conceptual trials) and the true value being estimated. Bias is a systematic distortion of a response process. There are two examples of response biases from our example surveys. In the NSDUH, independent estimates of the rates of use of many substances asked about, including cigarettes and illegal drugs, suggest that the reporting is somewhat biased; that is, that people on average tend to underreport how much they use various substances. Part of the explanation is that some people are concerned about how use of these substances would reflect on how they are viewed. It also has been found that the rates at which people are victims of crimes are somewhat underestimated from survey reports. One likely explanation is that some individual victimizations, particularly those crimes that have little lasting impact on victims, are forgotten in a fairly short period of time. Whatever the origins, research has shown that survey estimates of the use of some substances and of victimization tend to underestimate the actual rates. Answers to the survey questions are systematically lower than the true scores; in short, they are biased. In statistical notation, we note the average or expected value of the response over trials as E_t(y_it), where, as before, t denotes a particular trial (or application) of the measurement. Response bias occurs when

    E_t(y_it) is not equal to Y_i

In addition to systematic underreporting or overreporting that can produce biased reports, there can be an instability in the response behavior of a person, producing another kind of response error. Consider the case of a survey question in the SOC: "Would you say that at the present time, business conditions are better or worse than they were a year ago?" A common perspective on how respondents approach such a question is that, in addition to the words of the individual question and the context of prior questions, the respondent uses all other stimuli in the measurement environment. But gathering such stimuli (some of which might be memories generated by prior questions) is a haphazard process, unpredictable over independent trials. The result of this is variability in responses over conceptual trials of the measurement, often called "variability in response deviations." For this kind of response error, lay terminology fits rather well; this is an example of low "reliability" or "unreliable" responses. (Survey statisticians term this "response variance" to distinguish it from the error labeled above, "response bias.") The difference between response variance and response bias is that the latter is systematic, leading to consistent overestimation or underestimation of the construct in the question, but response variance leads to instability in the value of estimates over trials.

[Sidebar: The Notion of Variance or Variable Errors] Whenever the notion of errors that are variable arises, there must be an assumption of replication (or trials) of the survey process. When the estimates of statistics vary over those replications, they are subject to variable error. Variability at the response step can affect individual answers. Variability in frame development, likelihood of cooperation with the survey request, or characteristics of samples, can affect survey statistics. Usually, variance is not directly observed because the replications are not actually performed.

2.3.3 Processing Error: The Observational Gap between the Variable Used in Estimation and that Provided by the Respondent

What errors can be introduced after the data are collected and prior to estimation? For example, an apparent outlier in a distribution may have correctly reported a value. A respondent in the National Crime Victimization Survey may report being assaulted multiple times each day, an implausible report that, under some editing rules, may cause a setting of the value to missing data. However, when the added information that the respondent is a security guard in a bar is provided, the report becomes more plausible. Depending on what construct should be measured by the question, this should or should not be altered in an editing step. The decision can affect processing errors.

Another processing error can arise for questions allowing the respondent to phrase his or her own answer. For example, in the Surveys of Consumers, if a respondent answers "yes" to the question, "During the last few months, have you heard of any favorable or unfavorable changes in business conditions?" the interviewer then asks, "What did you hear?" The answer to that question is entered into a text field, using the exact words spoken by the respondent. For example, the respondent may say, "There are rumors of layoffs planned for my plant. I'm worried about whether I'll lose my job." Answers like this capture the rich diversity of situations of different respondents, but they do not lend themselves to quantitative summary, which is the main product of surveys. Hence, in a step often called "coding," these text answers are categorized into numbered classes. For example, this answer might be coded as a member of a class labeled "Possible layoffs at own work site/company." The sample univariate summary of this measure is a proportion falling into each class (e.g., "8% of the sample reported possible layoffs at own work site/company").

What errors can be made at this step? Different persons coding these text answers can make different judgments about how to classify the text answers. This generates a variability in results that is purely a function of the coding system (e.g., coding variance). Poor training can prompt all coders to misinterpret a verbal description consistently. This would produce a coding bias.

In statistical notation, if we were considering a variable like income, subject to some editing step, we could denote processing effects as the difference between the response as provided and the response as edited. Thus, y_i = response to the survey question, as before, but y_ip = the edited version of the response. The processing or editing deviation is simply (y_i - y_ip).
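The notation of the last few sections can be made concrete with a toy example (all values hypothetical, echoing the CES employee-count illustration): a true value Y_i, a reported response y_i, and an edited response y_ip produced by a simple top-code editing rule. The threshold and records below are assumptions for illustration, not an actual survey editing specification.

```python
# Hypothetical records: (true value Y_i, reported response y_i).
# True values are never observed in a real survey; they appear here
# only so the deviations defined in the text can be computed.
records = [
    (12, 15),   # overreport: response deviation y_i - Y_i = +3 (the CES example)
    (40, 40),   # accurate report
    (8, 4),     # underreport
    (30, 300),  # implausible outlier, e.g., a keying error
]

TOP_CODE = 100  # assumed editing rule: cap implausible values at a threshold

def edit(y):
    """Return the edited response y_ip under the top-code rule."""
    return min(y, TOP_CODE)

response_devs = [y - Y for Y, y in records]        # (y_i - Y_i), measurement step
editing_devs = [y - edit(y) for _, y in records]   # (y_i - y_ip), processing step

print(response_devs)  # [3, 0, -4, 270]
print(editing_devs)   # [0, 0, 0, 200]
```

Note how the editing step leaves most responses untouched but alters the outlier; whether that alteration reduces or introduces processing error depends, as the text says, on which construct the question is supposed to measure.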

2.3.4 Coverage Error: The Nonobservational Gap between the Target Population and the Sampling Frame

The big change of perspective when moving from the left side (the measurement side) of Figure 2.2 to the right side (the representation side) is that the focus becomes statistics, not individual responses. Notice that the terms in Figure 2.5 are expressions of sample means, simple statistics summarizing individual values of elements of the population. Although there are many possible survey statistics, we use the mean as our illustrative example.

Sometimes, the target population (the finite population we want to study) does not have a convenient sampling frame that matches it perfectly. For example, in the United States there is no updated list of residents that can be used as a sampling frame of persons. In contrast, in Sweden there is a population register, an updated list of names and addresses of almost all residents. Sample surveys of the target population of all U.S. residents often use sampling frames of telephone numbers. The error that arises is connected to what proportion of U.S. residents can be reached by telephone and how different they are from others on the statistics in question. Persons with lower incomes and in remote rural areas are less likely to have telephones in their homes. If the survey statistic of interest were the percentage of persons receiving unemployment compensation from the government, it is likely that a telephone survey would underestimate that percentage. That is, there would be a coverage bias in that statistic.

Figure 2.6 is a graphical image of two coverage problems with a sampling frame. The target population differs from the sampling frame. The lower and left portions of elements in the target population are missing from the frame (e.g., nontelephone households, using a telephone frame to cover the full household population). This is labeled "undercoverage" of the sampling frame with respect to the target population. At the top and right portions of the sampling frame are a set of elements that are not members of the target population but are members of the frame population (e.g., business telephone numbers in a telephone frame trying to cover the household population). These are "ineligible units," sometimes labeled "overcoverage" and sometimes the existence of "foreign elements."

Figure 2.6 Coverage of a target population by a frame.

In statistical terms for a sample mean, coverage bias can be described as a function of two terms: the proportion of the target population not covered by the sampling frame, and the difference between the covered and noncovered population. First, we note that coverage error is a property of a frame and a target population on a specific statistic. It exists before the sample is drawn and thus is not a problem arising because we do a sample survey. It would also exist if we attempted to do a census of the target population using the same sampling frame. Thus, it is simplest to express coverage error prior to the sampling step. Let us express the error in the mean of the sampling frame:

    Ybar   = Mean of the entire target population
    Ybar_C = Mean of the population on the sampling frame
    Ybar_U = Mean of the target population not on the sampling frame
    N = Total number of members of the target population
    C = Total number of eligible members of the sampling frame ("covered" elements)
    U = Total number of eligible members not on the sampling frame ("not covered" elements)

The bias of coverage is then expressed as

    Ybar_C - Ybar = (U/N)(Ybar_C - Ybar_U)

That is, the error in the mean due to undercoverage is the product of the noncoverage rate (U/N) and the difference between the mean of the covered and noncovered cases in the target population.
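The coverage bias identity can be checked numerically. The sketch below builds a small artificial target population (all values invented; the means loosely echo the education example in the text), splits it into covered and noncovered elements, and verifies that Ybar_C - Ybar equals (U/N)(Ybar_C - Ybar_U) exactly.

```python
import random

random.seed(7)

# Artificial target population: covered (on-frame) and noncovered values,
# e.g., years of education for telephone vs. nontelephone households.
covered = [random.gauss(14.3, 2.0) for _ in range(1900)]
noncovered = [random.gauss(11.2, 2.0) for _ in range(100)]

N = len(covered) + len(noncovered)   # target population size
U = len(noncovered)                  # number of noncovered elements

Y_bar = (sum(covered) + sum(noncovered)) / N   # target population mean
Yc_bar = sum(covered) / len(covered)           # frame (covered) mean
Yu_bar = sum(noncovered) / U                   # noncovered mean

bias = Yc_bar - Y_bar                          # left side of the identity
identity = (U / N) * (Yc_bar - Yu_bar)         # right side of the identity

print(round(bias, 4), round(identity, 4))
```

Because the identity is pure algebra, the two sides agree to floating-point precision; with a 5% noncoverage rate and a roughly 3-point gap between groups, the bias is on the order of 0.15, regardless of how large a sample (or even a census) is later drawn from the frame.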
The left side of the equation merely shows that the coverage error for the mean is the difference between the mean of the covered population and the mean of the full target population. The right side is the result of a little algebraic manipulation. It shows that the coverage bias is a function of the proportion of the population missing from the frame and the difference between those on and off the frame. For example, for many statistics on the U.S. household population, telephone frames describe the population well, chiefly because the proportion of nontelephone households is very small, about 5% of the total population. Imagine that we used the Surveys of Consumers, a telephone survey, to measure the mean years of education, and the telephone households had a mean of 14.3 years. Among nontelephone households, which were missed due to this being a telephone survey, the mean education level is 11.2 years. Although the nontelephone households have a much lower mean, the bias in the covered mean is

    Ybar_C - Ybar = 0.05(14.3 years - 11.2 years) = 0.16 years

or, in other words, we would expect the sampling frame to have a mean years of education of 14.3 years versus the target population mean of about 14.1 years.

Coverage error on sampling frames results in sample survey means estimating Ybar_C and not Ybar and, thus, coverage error properties of sampling frames generate coverage error properties of sample-based statistics.

2.3.5 Sampling Error: The Nonobservational Gap between the Sampling Frame and the Sample

One error is deliberately introduced into sample survey statistics. Because of cost or logistical infeasibility, not all persons in the sampling frame are measured. Instead, a sample of persons is selected; they become the sole target of the measurement. All others are ignored. In almost all cases, this deliberate "nonobservation" introduces deviations between the achieved sample statistics and the same statistics on the full sampling frame.

For example, the National Crime Victimization Survey sample starts with the entire set of 3067 counties within the United States. It separates the counties by population size, region, and correlates of criminal activity, forming separate groups or strata. In each stratum, giving each county a chance of selection, it selects sample counties or groups of counties, totaling 237. All the sample persons in the survey will come from those geographic areas. Each month of the sample selects about 8300 households in the selected areas and attempts interviews with their members.

As with all the other survey errors, there are two types of sampling error: sampling bias and sampling variance. Sampling bias arises when some members of the sampling frame are given no chance (or reduced chance) of selection. In such a design, every possible set of selections excludes them systematically. To the extent that they have distinctive values on the survey statistics, the statistics will depart from the corresponding ones on the frame population. Sampling variance arises because, given the design for the sample, by chance many different sets of frame elements could be drawn (e.g., different counties and households in the NCVS). Each set will have different values on the survey statistic.

Just like the notion of trials of measurement (see Section 2.3.1), sampling variance rests on the notion of conceptual replications of the sample selection. Figure 2.7 shows the basic concept. On the left appear illustrations of the different possible sets of sample elements that are possible over different samples. The figure portrays S different sample "realizations," or different sets of frame elements, with frequency distributions for each (the x axis is the value of the variable and the y axis is the number of sample elements with that value). Let us use our example of the sample mean as the survey statistic of interest. Each of the S samples produces a different sample mean. One way to portray the sampling variance of the mean appears on the right of the figure. This is the sampling distribution of the mean, a plotting of the frequency of specific different values of the sample mean (the x axis is the value of a sample mean and the y axis is the number of samples with that value among the S different samples). The dispersion of this distribution is the measure of sampling variance normally employed. If the average sample mean over all S samples is equal to the mean of the sampling frame, then there is no sampling bias for the mean. If the dispersion of the distribution on the right is small, the sampling variance is low. (Sampling variance is zero only in populations with constant values on the variable of interest.)

Figure 2.7 Samples and the sampling distribution of the mean.
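The repeated sample realizations of Figure 2.7 are easy to mimic: draw S simple random samples from a fixed frame, compute the mean of each, and examine the spread of those means. Frame values, S, and the sample size n below are arbitrary choices for illustration.

```python
import random

random.seed(42)

# A fixed sampling frame of C elements (values arbitrary).
C = 10_000
frame = [random.gauss(50, 12) for _ in range(C)]
Yc_bar = sum(frame) / C

S, n = 2_000, 400  # S sample realizations, each a simple random sample of size n

# Each realization s yields its own sample mean ybar_s.
sample_means = [sum(random.sample(frame, n)) / n for _ in range(S)]

# Sampling variance of the mean: squared deviations of the ybar_s
# about the frame mean, averaged over the S realizations.
sampling_variance = sum((m - Yc_bar) ** 2 for m in sample_means) / S

# The mean of the sample means lands near the frame mean: (near) zero
# sampling bias under equal-probability selection.
mean_of_means = sum(sample_means) / S

print(round(mean_of_means, 2), round(sampling_variance, 3))
```

For simple random sampling the spread of the realized means tracks (1 - n/C) * sigma^2 / n, so enlarging n visibly tightens the simulated sampling distribution, while the center stays at the frame mean.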

The extent of the error due to sampling is a function of four basic principles of the design:

1) Whether all sampling frame elements have known, nonzero chances of selection into the sample (called "probability sampling")
2) Whether the sample is designed to control the representation of key subpopulations in the sample (called "stratification")
3) Whether individual elements are drawn directly and independently or in groups (called "element" or "cluster" samples)
4) How large a sample of elements is selected

Using this terminology, the NCVS is thus a stratified, clustered probability sample of approximately 8500 households per month.

Sampling bias is mainly affected by how probabilities of selection are assigned to different frame elements. Sampling bias can be easily removed by giving all elements an equal chance of selection. Sampling variance is reduced with big samples, with samples that are stratified, and with samples that are not clustered.

In statistical terms,

    ybar_s = Mean of the specific sample draw, for samples s = 1, 2, ..., S
    Ybar_C = Mean of the total set of C elements in the sampling frame

These means (in simple sample designs) have the form:

    ybar_s = (sum of the y_i over the n_s sample members) / n_s,  and  Ybar_C = (sum of the Y_i over the C frame members) / C

"Sampling variance" measures how variable the ybar_s are over all sample realizations. The common measurement tool for this is to use squared deviations of the sample means about the mean of the sampling frame, so that the "sampling variance" of the mean is

    [ sum over s = 1, ..., S of (ybar_s - Ybar_C)^2 ] / S

When sampling variance is high, the sample means are very unstable. In that situation, sampling error is very high. That means that for any given survey with that kind of design, there is a larger chance that the mean from the survey will be comparatively far from the true mean of the population from which the sample was drawn (the sampling frame).

2.3.6 Nonresponse Error: The Nonobservational Gap between the Sample and the Respondent Pool

Despite efforts to the contrary, not all sample members are successfully measured in surveys involving human respondents. Sometimes, 100% response rates are obtained in surveys requiring sampling of inanimate objects (e.g., medical records of persons, housing units). Almost never does this occur in sample surveys of humans. For example, in the SOC, about 30-35% of the sample each month either eludes contact or refuses to be interviewed. In the main (nationwide) National Assessment of Educational Progress (NAEP), about 17% of the schools refuse participation, and within cooperating schools about 11% of the students are not measured, either because of absences or refusal by their parents. In even years, schools are required by law to participate in reading and mathematics components, grades 4 and 8. As a result, in 2007, there was 100% school participation.

Nonresponse error arises when the values of statistics computed based only on respondent data differ from those based on the entire sample data. For example, if the students who are absent on the day of the NAEP measurement have lower knowledge in the mathematical or verbal constructs being measured, then the NAEP scores suffer nonresponse bias; they systematically overestimate the knowledge of the entire sampling frame. If the nonresponse rate is very high, then the amount of the overestimation could be severe.

Most of the concern of practicing survey researchers is about nonresponse bias, and its statistical expression resembles that of coverage bias described in Section 2.3.4:

    ybar_s = Mean of the entire specific sample as selected
    ybar_r = Mean of the respondents within the sth sample
    ybar_m = Mean of the nonrespondents within the sth sample
    n_s = Total number of sample members in the sth sample
    r_s = Total number of respondents in the sth sample
    m_s = Total number of nonrespondents in the sth sample

The nonresponse bias is then expressed as an average over all samples of

    ybar_r - ybar_s = (m_s / n_s)(ybar_r - ybar_m)

Thus, nonresponse bias for the sample mean is the product of the nonresponse rate (the proportion of eligible sample elements for which data are not collected) and the difference between the respondent and nonrespondent mean. This indicates that response rates alone are not quality indicators. High response rate surveys can also have high nonresponse bias (if the nonrespondents are very distinctive on the survey variable). The best way to think about this is that high response rates reduce the risk of nonresponse bias.

2.3.7 Adjustment Error

The last step in Figure 2.5 on the side of errors of nonobservation is postsurvey adjustments. These are efforts to improve the sample estimate in the face of coverage, sampling, and nonresponse errors. (In a way, they serve the same function as the edit step on individual responses, discussed in Section 2.3.3.)

The adjustments use some information about the target or frame population, or response rate information on the sample.
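The nonresponse bias identity, and the kind of inverse-response-rate weighting adjustment discussed here, can both be sketched in a few lines. All values below are invented: two domains ("urban" and "other") get different means and different response propensities, respondents are kept, the bias ybar_r - ybar_s is compared with (m_s/n_s)(ybar_r - ybar_m), and a per-domain weight is applied.

```python
import random

random.seed(3)

# A drawn sample of n_s elements from two domains with different
# victimization-like means and different response rates (values invented).
sample = []
for _ in range(5_000):
    if random.random() < 0.3:                      # "urban" domain
        sample.append(("urban", random.gauss(10, 3), random.random() < 0.85))
    else:                                           # "other" domain
        sample.append(("other", random.gauss(6, 3), random.random() < 0.96))

ybar_s = sum(y for _, y, _ in sample) / len(sample)   # full-sample mean

resp = [(g, y) for g, y, responded in sample if responded]
nonresp = [(g, y) for g, y, responded in sample if not responded]
ybar_r = sum(y for _, y in resp) / len(resp)          # respondent mean
ybar_m = sum(y for _, y in nonresp) / len(nonresp)    # nonrespondent mean

# Identity: ybar_r - ybar_s = (m_s / n_s) * (ybar_r - ybar_m)
bias = ybar_r - ybar_s
identity = (len(nonresp) / len(sample)) * (ybar_r - ybar_m)

# One inverse-response-rate weight per domain, as in the NCVS illustration.
w = {"urban": 1 / 0.85, "other": 1 / 0.96}
ybar_w = sum(w[g] * y for g, y in resp) / sum(w[g] for g, _ in resp)

print(round(bias, 4), round(identity, 4), round(ybar_w, 3))
```

The two sides of the identity agree exactly (it is algebra, not an approximation); the weighted mean pulls the underrepresented urban respondents back up toward their share of the drawn sample, which is the intent of the adjustment described next.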

The adjustments give greater weight to sample cases that are underrepresented in the final dataset. For example, some adjustments pertain to nonresponse. Imagine that you are interested in the rate of personal crimes in the United States and that the response rate for urban areas in the National Crime Victimization Survey is 85% (i.e., 85% of the eligible sample persons provide data on a specific victimization), but the response rate in the rest of the country is 96%. This implies that urban persons are underrepresented in the respondent dataset. One adjustment weighting scheme counteracts this by creating two weights, w_i = 1/0.85 for urban respondents and w_i = 1/0.96 for other respondents. An adjusted sample mean is computed by

    ybar_w = (sum of the w_i y_i over the r respondents) / (sum of the w_i over the r respondents)

which has the effect of giving greater weight to the urban respondents in the computation of the mean. The error associated with the adjusted mean relative to the target population mean is

    (ybar_w - Ybar)

which would vary over different samples and applications of the survey. That is, adjustments generally affect the bias of the estimates and the variance (over samples and implementations of the survey). Finally, although postsurvey adjustments are introduced to reduce coverage, sampling, and nonresponse errors, they can also increase them, as we will learn later in Chapter 10.

2.4 Putting It All Together

The chapter started by presenting three perspectives on the survey. The first, portrayed in Figure 2.2, showed the stages of survey design, moving from abstract concepts to concrete steps of activities. Next, in Figure 2.4, we presented the steps of a survey project, from beginning to end. Finally, in Figure 2.5, we presented the quality characteristics of surveys that the field associates with each of the steps and the notation used for different quantities. The quality components focus on two properties of survey estimates: errors of observation and errors of nonobservation. Errors of observation concern successive gaps between constructs, measures, responses, and edited responses. Errors of nonobservation concern successive gaps between statistics on the target population, the sampling frame, the sample, and the respondents from the sample.

For some of these errors we described a systematic source, one that produced consistent effects over replications or trials (e.g., nonresponse). We labeled these "biases." For others, we described a variable or random source of error (e.g., validity). We labeled these as "variance." In fact, as the later chapters in the book will show, all of the error sources are both systematic and variable, and contain both biases and variances.

The reader now knows that the quantitative products of surveys have their quality properties described in quantitative measures. Look again at Figure 2.5, noticing how the notation for a simple sample mean varies over the development of the survey. This notation will be used in other parts of the text. Capital letters will stand for properties of population elements. Capital letters will be used for the discussions about measurement, when the sampling of specific target population members is not at issue. In discussions about inference to the target population through the use of a sample, capital letters will denote population elements and lowercase letters will denote sample quantities. The subscripts of the variables will indicate membership in subsets of the population (e.g., i for the ith person, or the existence of adjustment, such as w for weighting).

2.5 Error Notions in Different Kinds of Statistics

The presentation above focused on just one possible survey statistic, the sample mean, to illustrate error principles. There are many other statistics computed from surveys (e.g., correlation coefficients and regression coefficients). Two uses of surveys, linked to different statistics, deserve mention:

1) Descriptive uses (i.e., how prevalent an attribute is in the population, how big a group exists in the population, or the average value on some quantitative measure)
2) Analytic uses (i.e., what causes some phenomenon to occur or how two attributes are associated with one another)

Many surveys are done to collect information about the distribution of characteristics, ideas, experiences, or opinions in a population. Often, results are reported as means or averages. For example, the NCVS might report that 5% of the people have had their car stolen in the past year.

In contrast, statements such as "Women are more likely to go to a doctor than men," "Republicans are more likely to vote than Democrats," or "Young adults are more likely to be victims of crime than those over 65" are all statements about relationships. For some purposes, describing the degree of relationship is important. So a researcher might say that the correlation between a person's family income and the likelihood of voting is 0.23. Alternatively, the income rise associated with an investment in education might be described by an equation, often called a "model" of the income generation process:

    E(y_i) = beta_0 + beta_1 x_i + beta_2 x_i^2

where y_i is the value of the ith person's income, and x_i is the value of the ith person's educational attainment in years.

The hypothesis tested in the model specification is that the payoff in income of educational investments is large for the early years and then diminishes with additional years. If the coefficient beta_1 is positive and beta_2 is negative, there is some support for the hypothesis. This is an example of using survey data for causal analysis. In this case, the example concerns whether educational attainment causes income attainment.

Are statistics like a correlation coefficient and a regression coefficient subject to the same types of survey errors as described above? Yes.
62 INFERENCE AND ERROR IN SURVEYS NUMMARY 63

data are used to estimate the statistics, they can be subject to coverage, sam Nomctimes, there is gap between the construct measured by the survey and that
pling, nonresponse, and measurement errors, just as can simple sample means. IH'rdod by the user. For example, a user may wish to have a prevalence indicator
The mathematical expressions for the errors are different, however. For most Ml economic suffering, the extent to which emotional and physical discomfort
survey statistics measuring relationships, the statistical errors are properties of w Ucn from economic difficulties among persons. They may use the unemploy
crossproducts between the two variables involved (e.g., covariance and variance ment rate as an indicator of such suffering. The relevance of this indicator might
properties). be ct iticized by noting that some unemployment does not produce such suffering,
The literatures in analytic statistics and econometrics are valuable for under line to government support mechanisms. The reader may note a similarity
standing errors in statistics related to causal hypotheses. The language of error between notions of “construct validity” (see p. 49) and relevance. Relevance
used in these fields is somewhat different from that in survey statistics, but many ftH USes on differences among constructs; construct validity relates to differences
of the concepts are similar. be lween a construct and a given measurement.
Thus, the kind of analysis one is planning, and the kinds of questions one The third notion is “timeliness.” One key determinant of whether the survey timeliness
wants to answer can be related to how concerned one is about the various kinds intimate is fit for a user’s purpose is whether the estimate is available at a time
of error that we have discussed. Bias, either in the sample of respondents answer flmlod for the decision based on the information. For example, a survey estimate
ing questions or in the answers that are given, is often a primary concern for those Ol the Surveys of Consumers describing the confidence of the U.S. public in
focused on the goal of describing distributions. If the main purpose of a survey is March, 2009 is of little value to macroeconomists a year later in March, 2010.
to estimate the percentage of people who are victims of particular kinds of crime, I mieliness of an estimate is completely determined by its use.
and those crimes are systematically underreported, then that bias in reporting will Indeed, all three of these notions—credibility, relevance, and timeliness—are
have a direct effect on the ability of the survey to achieve those goals. In contrast, ■IK'N that are well defined only when specific to a particular use of a survey esti-
if the principal concern is whether old people or young people are more likely to :||Uilc. The notions lie outside the paradigm of total survey error (and we will not
be victims, it could be that the ability of the survey to accomplish that goal would dl*euss them further), and they have not influenced the field of survey methodol-
be unaffected if there were consistent underreporting of minor crimes. If the bias Itliy in the same way as others. Nonetheless, they are important notions when con-
was really consistent for all age groups, then the researcher could reach a valid itiloring how the same estimates might be used in different ways by different
conclusion about the relationship between age and victimization, even if all the titters.
estimates of victimization were too low.

2.6 No n st at ist ical No t io n s o f Su r v ey Qu al it y 2.7 Summar y

In addition to the components o f the total survey error perspective reviewed Nmnple surveys rely on two types of inference - from the questions to constructs,
above, there are three additional notions that are relevant to assessing the quality Hinl from the sample statistics to the population statistics. The inference involves
of a survey estimate. These three notions arise from the overall desire to maxi Iwo coordinated sets of steps; obtaining answers to questions constructed to mir-
fitness for use mize the “fitness for use” o f an estimate. “Fitness for use” acknowledges that dif for the constructs, and identifying and measuring sample units that form a micro
ferent users of the same estimate may have different purposes for the information. cosm of the target population.
For one, highly accurate estimates are necessary for a good decision based on the Despite all efforts, each of the steps is subject to imperfections, producing
information; for another, rough orders of magnitude of the population value arc Utmlstical errors in survey statistics. The errors involving the gap between
sufficient. This viewpoint implies that a “good” estimate for the second user may Ihc measures and the construct are issues of validity. The errors arising during
not be a “good” estimate for the first. High fitness for use means the indicator pro Hpplication of the measures are called “measurement errors.” Editing and
vides the information needed for the specific use. processing errors can arise during efforts to prepare the data for statistical
credibility The first notion is “credibility,” the extent to which the producer of the infor Bimlysis. Coverage errors arise when enumerating the target population using
mation is judged by the user to be free of any particular point of view, a perspec it Ntimpling frame. Sampling errors stem from surveys measuring only a subset
tive on the phenomena being measured that may influence an outcome of the sur of the frame population. The failure to measure all sample persons on all meas
vey in a known direction. Central government statistical agencies strive to ures creates nonresponse error. “Adjustment” errors arise in the construction
achieve the image of neutrality and objectivity. Scientists using the survey nf statistical estimators to describe the full target population. All of these
method (a) document each step in their design and implementation, to facilitate error sources can have varying effects on different statistics from the same sur
replication of their results, then they (b) explicitly note the weaknesses in the esti vey.
mates that may affect their conclusions. Both of these steps are intended to This chapter introduced the reader to these elementary building blocks of the
enhance the credibility of the estimates. held of survey methodology. Throughout the text, we will elaborate and add to
relevance The second notion is “relevance.” A survey estimate is relevant to a user if it the concepts in this chapter. We will describe a large and dynamic research liter-
measures a construct quite similar in meaning to the user’s main concern. itlurc that is discovering new principles of human measurement and estimation of
large population characteristics. We will gain insight into how the theories being developed lead to practices that affect the day-to-day tasks of constructing and implementing surveys. The practices will generally be aimed at improving the quality of the survey statistics (or reducing the cost of the survey). Often, the practices will provide new measurements of how good the estimates are from the survey.

Keywords

bias, cluster sample, coding, construct, construct validity, coverage bias, coverage error, credibility, element, error, errors of nonobservation, errors of observation, expected value, finite population, fitness for use, imputation, ineligible units, inference, item missing data, measurement, measurement error, mode of data collection, nonrespondents, nonresponse bias, nonresponse error, observation unit, outlier detection, overcoverage, postsurvey adjustment, probability sampling, processing error, realization, relevance, reliability, sample, respondents, response, response bias, response variance, sampling error, sampling bias, sampling frame, sampling variance, statistic, stratification, target population, timeliness, total survey error, true values, undercoverage, unit nonresponse, validity, weighting

For More In-Depth Reading

Biemer, P. and Lyberg, L. (2003), Introduction to Survey Quality, New York: Wiley.

Groves, R. M. (1989), Survey Errors and Survey Costs, New York: Wiley.

Lessler, J. and Kalsbeek, W. (1992), Nonsampling Error in Surveys, New York: Wiley.

Weisberg, H. (2005), The Total Survey Error Approach: A Guide to the New Science of Survey Research, Chicago: University of Chicago Press.

Exercises

1) A recent newspaper article reported that “sales of handheld digital devices (e.g., Blackberries and PDAs) are up by nearly 10% in the last quarter, while sales of laptops and desktop PCs have remained stagnant.” This report was based on the results of an on-line survey in which 9.8% of the more than 126,000 respondents said that they had “purchased a handheld digital device between January 1 and April 30 of this year.”

E-mails soliciting participation in this survey were sent to individuals using an e-mail address frame from the five largest commercial Internet service providers (ISPs) in the United States. Data collection took place over a 6-week period beginning May 1, 2002. The overall response rate achieved in this survey was 13%.

Assume that the authors of this study wanted to infer something about the expected purchases of U.S. adults (18 years old and older).

a) What is the target population? What is the population in the sampling frame?

b) Based on this chapter and your readings, briefly discuss how the design of this survey might affect the following sources of error:

• Coverage error
• Nonresponse error
• Measurement error

c) Without changing the duration or the mode of this survey (i.e., computer assisted or self-administered), what could be done to reduce the errors you outlined in 1b? For each source of error, suggest one change that could be made to reduce this error component, making sure to justify your answer based on readings and lecture material.

d) To lower the cost of this survey in the future, researchers are considering cutting the sample in half, using an e-mail address frame from only the two largest ISPs. What effect (if any) will these changes have on sampling error and coverage error?

2) Describe the difference between coverage error and sampling error in survey estimates.

3) Given what you have read about coverage, nonresponse, and measurement errors, invent an example of a survey design in which attempting to reduce one error might lead to another error increasing. After you have constructed the example, invent a methodological study design to investigate whether the reduction of the one error actually does increase the other.

4) This chapter described errors of observation and errors of nonobservation.

a) Name three sources of error that affect inference from the sample from which data were collected to the target population.

b) Name three sources of error that affect inference from the respondents’ answers to the underlying construct.
c) For each source of error you mentioned, state whether it potentially affects the variance of estimates, biases the estimates, or both.

5) For each of the following design decisions, identify which error sources described in your readings might be affected. Each design decision can affect at least two different error sources. Write short (2-4 sentences) answers to each point.

a) The decision to include or exclude institutionalized persons (e.g., residing in hospitals, prisons, or military group quarters) from the sampling frame in a survey of the prevalence of physical disabilities in the United States.

b) The decision to use self-administration of a mailed questionnaire for a survey of elderly Social Security beneficiaries regarding their housing situation.

c) The decision to use repeated calls persuading reluctant respondents in a survey of customer satisfaction for a household product manufacturer.

d) The decision to reduce costs of interviewing by using existing office personnel to interview a sample of patients of a health maintenance organization (HMO) and thereby increase the sample size of the survey. The topic of the survey is satisfaction with the medical care they receive.

e) The decision to increase the number of questions about assets and income in a survey of income dynamics, resulting in a lengthening of the interview.

f) The decision to extend interviewing on a survey of use of child care facilities by parents of young children from the originally scheduled period of January 1-May 1, to the new schedule of January 1-August 1.

g) The decision to include prisons and hospitals in the sampling frame for a study of consumer expenditures.

h) The decision to use an existing trained staff of female interviewers (instead of hiring and training some male interviewers) in a survey measuring attitudes toward an amendment to the constitution to provide equal rights under the law to females and males.

i) The decision to change from a face-to-face interview design to a mailed questionnaire mode in a household survey of illegal drug usage.

6) For each of the following questions, state briefly what the construct is that you think the question is most likely designed to measure. In some cases, there may be more than one plausible measurement goal.

a) How old are you?
b) Are you married?
c) Do you own a car?
d) What is your income?
e) Did you vote in the last election for U.S. President?
f) Do you consider yourself to be a Democrat, a Republican, or an Independent?
g) In the next 12 months, do you think that the economy will get better, get worse, or will it stay about the same as it is now?
h) Do you consider yourself to be a happy person?
i) Has a doctor ever told you that you have high blood pressure?
j) How would you rate your doctor in ability to diagnose and propose treatments for medical problems: excellent, good, fair or poor?
k) In the past week, have you prepared any meals for yourself?

7) From an inference perspective, what is the central concern one should have about those who are sampled but do not respond to surveys?
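As a numerical companion to the income model discussed in this chapter, E(yᵢ) = β₀ + β₁xᵢ + β₂xᵢ², the sketch below simulates education and income data with diminishing returns to schooling and fits the quadratic by ordinary least squares. This is illustrative only: the coefficient values, noise level, and use of NumPy are assumptions of the sketch, not anything taken from the book or from real survey data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented "true" coefficients: income rises with years of schooling
# (b1 > 0) but at a decreasing rate (b2 < 0).
b0, b1, b2 = 15_000.0, 4_000.0, -80.0

n = 5_000
education = rng.uniform(8, 20, size=n)      # years of schooling
noise = rng.normal(0, 5_000, size=n)        # unexplained variation
income = b0 + b1 * education + b2 * education**2 + noise

# Fit E(y_i) = beta0 + beta1*x_i + beta2*x_i^2 by ordinary least squares.
X = np.column_stack([np.ones(n), education, education**2])
beta_hat, *_ = np.linalg.lstsq(X, income, rcond=None)

print(beta_hat)                              # estimates of (b0, b1, b2)
assert beta_hat[1] > 0 and beta_hat[2] < 0   # diminishing-returns pattern
```

With the estimated β₁ positive and β₂ negative, the fitted model shows the pattern the text describes as support for the diminishing-returns hypothesis; estimates from real survey data would, of course, also carry the coverage, sampling, nonresponse, and measurement errors discussed above.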

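Exercise 2 asks for the difference between coverage error and sampling error. A minimal simulation can make the distinction concrete. All numbers below (population size, frame coverage rate, group means) are invented for illustration and do not come from any real survey:

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented population of 100,000 values (e.g., an attitude score),
# of which 20% are missing from the sampling frame.
covered = rng.normal(50, 10, size=80_000)    # units on the sampling frame
uncovered = rng.normal(60, 10, size=20_000)  # units missing from the frame
population = np.concatenate([covered, uncovered])

# Coverage bias: the frame mean differs from the target-population mean,
# and no amount of sampling from the frame can remove that gap.
coverage_bias = covered.mean() - population.mean()

# Sampling error: variability of the sample mean over repeated draws
# from the frame; it shrinks as the sample size n grows.
def sd_of_sample_means(n, draws=500):
    means = [rng.choice(covered, size=n, replace=False).mean()
             for _ in range(draws)]
    return float(np.std(means))

print(round(coverage_bias, 2))   # about -2: the frame understates the mean
print(sd_of_sample_means(100))   # roughly 10 / sqrt(100) = 1.0
print(sd_of_sample_means(1600))  # roughly 10 / sqrt(1600) = 0.25
```

Growing the sample shrinks the sampling-error component but leaves the roughly two-point coverage bias untouched, which is the heart of the distinction the exercise asks about.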