RELIABILITY

NORMS
• are the test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores

NORMATIVE SAMPLE
• that group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual test takers

NORMING
• refers to the process of deriving norms

PARALLEL-FORMS AND ALTERNATE-FORMS RELIABILITY ESTIMATES
a. PARALLEL FORMS
• exist when, for each form of the test, the means and the variances of observed test scores are equal
b. ALTERNATE FORMS
• to evaluate the relationship between different forms of a measure
• are simply different versions of a test that have been constructed to be parallel
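In practice, a parallel- or alternate-forms reliability estimate is the Pearson correlation between examinees' scores on the two forms. A minimal sketch, with entirely hypothetical scores for six examinees:

```python
# Alternate-forms reliability as the Pearson correlation between scores
# on two forms of the same test. All scores below are hypothetical.
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

form_a = [12, 15, 9, 20, 17, 11]   # Form A scores for six examinees
form_b = [11, 16, 10, 19, 18, 12]  # the same examinees on Form B
r_ab = pearson_r(form_a, form_b)   # alternate-forms reliability estimate
```

A high r_ab (here about .97) suggests the two forms rank examinees consistently; for strictly parallel forms, one would additionally check that the two forms' means and variances are equal, as defined above.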
SPLIT-HALF RELIABILITY ESTIMATES
• obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

OTHER METHODS OF ESTIMATING INTERNAL CONSISTENCY
a. INTER-ITEM CONSISTENCY
• to evaluate the extent to which items on a scale relate to one another
• refers to the degree of correlation among all the items on a scale
o HOMOGENEITY – the degree to which a test measures a single factor
o HETEROGENEITY – the degree to which a test measures different factors
b. KUDER-RICHARDSON FORMULA 20 (KR-20)
• the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong (such as multiple-choice items)
c. COEFFICIENT ALPHA
• appropriate for use on tests containing non-dichotomous items
• typically ranges in value from 0 to 1
d. AVERAGE PROPORTIONAL DISTANCE (APD)
• a measure that focuses on the degree of difference that exists between item scores

MEASURES OF INTER-SCORER RELIABILITY
INTER-SCORER RELIABILITY
• to evaluate the level of agreement between raters on a measure
• the degree of agreement or consistency between two or more scorers (or judges or raters) regarding a particular measure

III. USING AND INTERPRETING A COEFFICIENT OF RELIABILITY
THREE (3) APPROACHES TO THE ESTIMATION OF RELIABILITY
1. TEST-RETEST
2. ALTERNATE OR PARALLEL FORMS
3. INTERNAL OR INTER-ITEM CONSISTENCY

THE NATURE OF THE TEST
1. HOMOGENEITY VERSUS HETEROGENEITY OF TEST ITEMS
2. DYNAMIC VERSUS STATIC CHARACTERISTICS
3. RESTRICTION OR INFLATION OF RANGE
4. SPEED TESTS VERSUS POWER TESTS
5. CRITERION-REFERENCED TESTS

THE TRUE SCORE MODEL OF MEASUREMENT AND ALTERNATIVES TO IT
TRUE SCORE
• a value that genuinely reflects an individual’s ability (or trait) level as measured by a particular test

DICHOTOMOUS TEST ITEMS
• test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions

POLYTOMOUS TEST ITEMS
• test items or questions with three or more alternative responses

IV. RELIABILITY AND INDIVIDUAL SCORES
STANDARD ERROR OF MEASUREMENT
• often abbreviated as SEM
• the tool used to estimate or infer the extent to which an observed score deviates from a true score

STANDARD ERROR OF THE DIFFERENCE
• a statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant
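The two standard errors above have standard closed forms in classical test theory: SEM = sd * sqrt(1 - r), and for comparing two scores, SED = sqrt(SEM1**2 + SEM2**2). A sketch using illustrative IQ-style numbers (sd = 15, reliability = .89; the numbers are examples, not from these notes):

```python
# Standard error of measurement (SEM) and standard error of the
# difference (SED) from classical test theory.

def sem(sd, r_xx):
    """SEM = sd * sqrt(1 - reliability): the likely error band around a score."""
    return sd * (1 - r_xx) ** 0.5

def sed(sd, r_1, r_2):
    """SED = sd * sqrt(2 - r1 - r2), equivalent to sqrt(SEM1**2 + SEM2**2)."""
    return sd * (2 - r_1 - r_2) ** 0.5

# Illustrative scale: sd = 15, reliability = .89.
error_band = sem(15, 0.89)      # roughly 5 points
min_diff = sed(15, 0.89, 0.89)  # roughly 7 points
# A ~68% confidence band is observed score +/- error_band; two scores
# should differ by comfortably more than min_diff before the difference
# is treated as statistically meaningful.
```

Note that SED is always larger than either SEM, which is why a difference between two scores must clear a higher bar than the error band around a single score.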
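The split-half estimate described earlier in these notes is commonly computed by splitting the test into odd- and even-numbered items and then stepping the half-test correlation up to full length with the Spearman-Brown formula. A sketch with a made-up 0/1 item matrix:

```python
# Split-half reliability: correlate odd-item vs. even-item half scores,
# then apply the Spearman-Brown correction. The item data are hypothetical.
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x)
                  * sum((b - my) ** 2 for b in y)) ** 0.5

# Rows = examinees, columns = items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, 8
r_half = pearson_r(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown stepped-up estimate
```

The correction is needed because each half is only half as long as the real test, and shorter tests are less reliable; r_full estimates what the whole test's reliability would be.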
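KR-20, noted above as the statistic of choice for right/wrong items, follows directly from its standard formula: (k / (k - 1)) * (1 - sum(pq) / variance of total scores). A sketch over the same kind of hypothetical 0/1 matrix:

```python
# KR-20 internal consistency for dichotomous (0/1) items.
from statistics import pvariance

def kr20(responses):
    """KR-20 = (k/(k-1)) * (1 - sum(p*q) / var(total scores))."""
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        pq_sum += p * (1 - p)                     # q = 1 - p
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

responses = [  # rows = examinees, columns = items (hypothetical data)
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
]
reliability = kr20(responses)
```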
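Coefficient alpha generalizes KR-20 to non-dichotomous items by replacing sum(pq) with the sum of the individual item variances. A sketch with invented 5-point Likert responses:

```python
# Coefficient alpha (Cronbach's alpha) for polytomous items.
# alpha = (k/(k-1)) * (1 - sum of item variances / variance of total scores)
from statistics import pvariance

def coefficient_alpha(responses):
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    item_var_sum = sum(pvariance([row[j] for row in responses])
                       for j in range(k))
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

likert = [  # rows = respondents, columns = items on a 1-5 scale (invented)
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [2, 1, 2, 2],
    [1, 2, 1, 1],
]
alpha = coefficient_alpha(likert)
```

As the notes state, alpha typically falls between 0 and 1; values near 1 indicate that the items vary together, i.e., a homogeneous scale.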
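Inter-scorer reliability can be indexed in several ways; one common choice for categorical ratings (not the only one, and not prescribed by these notes) is Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. The ratings below are fabricated:

```python
# Cohen's kappa: chance-corrected agreement between two scorers.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # raw agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Expected chance agreement from each rater's marginal proportions.
    p_chance = sum(count_a[c] * count_b[c]
                   for c in set(rater_a) | set(rater_b)) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)

rater_1 = ["pass", "pass", "fail", "pass", "fail",
           "pass", "fail", "pass", "pass", "fail"]
rater_2 = ["pass", "pass", "fail", "fail", "fail",
           "pass", "fail", "pass", "pass", "pass"]
kappa = cohens_kappa(rater_1, rater_2)
```

Here the raters agree on 8 of 10 cases (80%), but kappa is noticeably lower (about .58) because much of that agreement could occur by chance alone.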