Reliability (statistics)
Types
There are several general classes of reliability estimates, including test-retest reliability, alternate-forms (parallel-forms) reliability, and internal consistency estimates such as split-half reliability and Cronbach's alpha.
Difference from validity
An example often used to illustrate the difference between reliability and validity in the
experimental sciences involves a common bathroom scale. If a person who weighs 200 pounds
steps on a scale 10 times and gets readings such as 15, 250, 95, and 140, the scale is not reliable.
If the scale consistently reads "150", then it is reliable, but not valid. If it reads "200" each
time, then the measurement is both reliable and valid. This is what is meant by the statement,
"Reliability is necessary but not sufficient for validity."
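The three scales in the example can be told apart numerically: spread of the readings reflects (un)reliability, while the distance of the mean from the true weight reflects (in)validity. A minimal sketch in Python, using hypothetical reading sets for each scale:

```python
import statistics

TRUE_WEIGHT = 200.0  # the person's actual weight, from the example

# Hypothetical readings for the three scales described above
erratic    = [15.0, 250.0, 95.0, 140.0, 180.0, 60.0, 210.0, 130.0, 90.0, 170.0]
consistent = [150.0] * 10   # always reads 150: reliable, but not valid
accurate   = [200.0] * 10   # always reads 200: both reliable and valid

def summarize(readings):
    """Return (mean, standard deviation) of a set of scale readings."""
    return statistics.mean(readings), statistics.pstdev(readings)

for name, readings in [("erratic", erratic),
                       ("consistent", consistent),
                       ("accurate", accurate)]:
    mean, sd = summarize(readings)
    # Low spread -> reliable; mean near TRUE_WEIGHT -> valid
    print(f"{name}: mean={mean:.1f}, sd={sd:.1f}, bias={mean - TRUE_WEIGHT:+.1f}")
```

The erratic scale has a large standard deviation (unreliable), the consistent scale has zero spread but a 50-pound bias (reliable, not valid), and the accurate scale has zero spread and zero bias.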
Estimation
Reliability may be estimated through a variety of methods that fall into two types: single-administration and multiple-administration. Multiple-administration methods require that two
assessments be administered. In the test-retest method, reliability is estimated as the Pearson
product-moment correlation coefficient between two administrations of the same measure:
see also item-total correlation. In the alternate forms method, reliability is estimated by the
Pearson product-moment correlation coefficient of two different forms of a measure, usually
administered together. Single-administration methods include split-half and internal
consistency. The split-half method treats the two halves of a measure as alternate forms. This
"halves reliability" estimate is then stepped up to the full test length using the Spearman–
Brown prediction formula. The most common internal consistency measure is Cronbach's
alpha, which is usually interpreted as the mean of all possible split-half coefficients.[3]
Cronbach's alpha is a generalization of an earlier form of estimating internal consistency,
Kuder-Richardson Formula 20.[3]
These measures of reliability differ in their sensitivity to different sources of error and so
need not be equal. Also, reliability is a property of the scores of a measure rather than of the
measure itself, and is thus said to be sample dependent. Reliability estimates from one
sample might differ from those of a second sample (beyond what might be expected due to
sampling variations) if the second sample is drawn from a different population because the
true variability is different in this second population. (This is true of measures of all types—
yardsticks might measure houses well yet have poor reliability when used to measure the
lengths of insects.)
In classical test theory, the observed score X is the sum of a true score T and an error score E, and reliability is defined as the ratio of true-score variance to observed-score variance:

ρxx′ = σ²T / σ²X = 1 − σ²E / σ²X,

where ρxx′ is the symbol for the reliability of the observed score, X, and σ²X, σ²T, and σ²E are the variances of the measured, true, and error scores, respectively. Unfortunately, there is
no way to directly observe or calculate the true score, so a variety of methods are used to
estimate the reliability of a test.
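A quick worked check of the identity, with illustrative variance values (assuming true and error scores are uncorrelated, so their variances add):

```python
# Illustrative classical-test-theory decomposition: X = T + E
var_true, var_error = 80.0, 20.0
var_observed = var_true + var_error      # T and E assumed uncorrelated

rho = var_true / var_observed            # reliability of the observed score
assert rho == 1 - var_error / var_observed   # the two forms of the identity agree
print(f"reliability = {rho:.2f}")
```

Here 80% of the observed-score variance is true-score variance, so the reliability is 0.80.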