05B-Reliability
What is Reliability?
The reliability measure is large (for example, 1) when there is no error and small (for example, 0) when there is error. A statistic that satisfies these conditions is called the reliability coefficient. The reliability coefficient (r) is the ratio of the variability of the true score component of the observed score to the variability of the observed score (which includes both true and error components).
The following table is a standard guide for interpreting the values of the reliability coefficient (De Guzman-Santos, 2007, p. 64):
Reliability        Interpretation
0.90 and above     Excellent reliability. At the level of the best standardized tests.
0.80 - 0.89        Very good. This is ideal for a classroom test.
0.70 - 0.79        Good. This is good for a classroom test. There are probably a few items which could be improved.
0.60 - 0.69        Somewhat low. This test needs to be supplemented by other measures to determine grades. There are probably some items which could be improved.
0.50 - 0.59        Needs revision.
0.49 and below     Questionable reliability.
Pearson r. This is used to describe the reliability estimates obtained from the test-retest, alternate-form, and split-half methods. The formula is as follows:
r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^{2} - (\Sigma X)^{2}][N\Sigma Y^{2} - (\Sigma Y)^{2}]}}
where:
r = the Pearson Product Moment Correlation Coefficient
ΣXY = sum of the product of X and Y scores
ΣX = sum of X-scores
ΣY = sum of Y-scores
ΣX² = sum of the squares of X-scores
ΣY² = sum of the squares of Y-scores
N = number of cases
For Example:
Mr. Tucmo administered his statistics test to ten (10) first year college students. After two weeks, the same test was given to the same group of students. Their scores on the first test (X) and the second test (Y) yield the summary values substituted below (N = 10, ΣX = 304, ΣY = 303, ΣXY = 9843, ΣX² = 9924, ΣY² = 9841). Compute the reliability of the test.
r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^{2} - (\Sigma X)^{2}][N\Sigma Y^{2} - (\Sigma Y)^{2}]}}
r = \frac{10(9843) - (304)(303)}{\sqrt{[10(9924) - (304)^{2}][10(9841) - (303)^{2}]}} = \frac{6318}{\sqrt{(6824)(6601)}}
r = 0.94
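As a quick check, the same computation can be done in a few lines of Python. This is a minimal sketch; the function name pearson_r and its argument names are illustrative rather than taken from the text.
```python
# Minimal sketch: Pearson r computed from the raw-score sums of the
# test-retest example above (N = 10 students, first test X, second test Y).
import math

def pearson_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson Product Moment Correlation Coefficient from raw-score sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Summary values from Mr. Tucmo's example.
r = pearson_r(n=10, sum_x=304, sum_y=303, sum_xy=9843, sum_x2=9924, sum_y2=9841)
print(round(r, 2))  # 0.94
```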
Spearman-Brown Prophecy Formula. This estimates the reliability of the whole test from the Pearson Product Moment Correlation Coefficient between the two halves of the test; it serves as the correction applied to the split-half reliability estimate. The formula is
r_t = \frac{2r_{oe}}{1 + r_{oe}}
where:
r_t = reliability of the whole test
r_oe = split-half (odd-even) reliability
For Example:
The scores of ten students on the odd-numbered items (X) and the even-numbered items (Y) of a test are shown below. Compute the split-half reliability and the reliability of the whole test.
Odd (X) Even (Y) XY X² Y²
14 19 266 196 361
19 18 342 361 324
17 18 306 289 324
15 13 195 225 169
20 15 300 400 225
11 9 99 121 81
24 20 480 576 400
16 15 240 256 225
15 15 225 225 225
15 13 195 225 169
ΣX = 166 ΣY = 155 ΣXY = 2648 ΣX² = 2874 ΣY² = 2503
r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^{2} - (\Sigma X)^{2}][N\Sigma Y^{2} - (\Sigma Y)^{2}]}}
r = \frac{10(2648) - (166)(155)}{\sqrt{[10(2874) - (166)^{2}][10(2503) - (155)^{2}]}} = \frac{750}{\sqrt{(1184)(1005)}}
r = 0.69
r_t = \frac{2r_{oe}}{1 + r_{oe}} = \frac{2(0.69)}{1 + 0.69} = \frac{1.38}{1.69}
rt = 0.82
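The whole split-half procedure can be verified with a short Python sketch using the odd/even scores from the table above. The variable names are illustrative, and r_oe is rounded to two decimals before the correction, as in the worked example.
```python
# Minimal sketch: split-half reliability (Pearson r between odd and even
# halves) corrected with the Spearman-Brown prophecy formula.
import math

odd = [14, 19, 17, 15, 20, 11, 24, 16, 15, 15]   # X: odd-numbered items
even = [19, 18, 18, 13, 15, 9, 20, 15, 15, 13]   # Y: even-numbered items

n = len(odd)
sum_x, sum_y = sum(odd), sum(even)
sum_xy = sum(x * y for x, y in zip(odd, even))
sum_x2 = sum(x * x for x in odd)
sum_y2 = sum(y * y for y in even)

# Split-half reliability r_oe, rounded to two decimals as in the worked example.
r_oe = round((n * sum_xy - sum_x * sum_y) /
             math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)), 2)

# Spearman-Brown correction for the reliability of the whole test.
r_t = 2 * r_oe / (1 + r_oe)

print(r_oe, round(r_t, 2))  # 0.69 0.82
```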
Kuder-Richardson Formula 21 (KR-21). This estimates the reliability of the whole test from the number of items, the mean of the scores, and the variance of the scores, without splitting the test in half. The formula is
KR_{21} = \frac{k}{k - 1}\left(1 - \frac{\bar{x}(k - \bar{x})}{k s^{2}}\right)
where:
x̄ = the mean of the obtained scores
s² = the variance of the obtained scores (the square of the standard deviation)
k = the total number of items
For Example:
Ms. Reyes administered a 50-item mathematics test to her Grade VI pupils. The summary statistics of her pupils' scores are given below. Find the reliability of her test by using Kuder-Richardson Formula 21.
x̄ = 28.8    s² = 89.07    k = 50
Solving for KR-21:
KR_{21} = \frac{k}{k - 1}\left(1 - \frac{\bar{x}(k - \bar{x})}{k s^{2}}\right)
KR_{21} = \frac{50}{50 - 1}\left(1 - \frac{28.8(50 - 28.8)}{50(89.07)}\right) = \frac{50}{49}\left(1 - \frac{610.56}{4453.5}\right)
𝐾𝑅21 = 0.88
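A minimal Python sketch of the KR-21 computation, using the summary values from Ms. Reyes's test (the function name kr21 and its argument names are illustrative):
```python
# Minimal sketch of KR-21 from summary statistics (number of items,
# mean of the scores, and variance of the scores).
def kr21(k, mean, variance):
    """Kuder-Richardson Formula 21 reliability estimate."""
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * variance))

# Values from Ms. Reyes's 50-item mathematics test.
print(round(kr21(k=50, mean=28.8, variance=89.07), 2))  # 0.88
```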
A number of factors have been shown to affect the conventional measures of reliability. If sound conclusions are to be drawn, these factors must be considered when interpreting reliability coefficients.
A. Length of Test
In general, the longer a test is, the higher its reliability will be. This is because a longer test provides a more adequate sample of the behavior being measured, and the scores are apt to be less distorted by chance factors such as guessing. Suppose that to measure spelling ability, we asked pupils to spell one word. The result would be patently unreliable. Pupils who were able to spell the word would appear to be perfect spellers, and pupils who were not would appear to be complete failures. If we happened to select a difficult word, most pupils would fail; if the word was an easy one, most pupils would appear to be perfect spellers. The fact that one word provides an unreliable estimate of a pupil's spelling ability is obvious. It should be equally apparent that as we add more spelling words to the list, we come closer and closer to a good estimate of each child's spelling ability. Scores based on a large number of spelling words are thus more apt to reflect real differences in spelling ability and therefore to be more stable. By increasing the size of the sample of spelling behavior, we increase the consistency of our measurement.
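This effect can be quantified with the general form of the Spearman-Brown prophecy formula, r_new = n·r / (1 + (n − 1)·r), of which the split-half correction given earlier is the n = 2 case. A minimal Python sketch using illustrative reliability values (not taken from the text):
```python
# Minimal sketch: how lengthening a test raises its reliability, using the
# general Spearman-Brown prophecy formula (the split-half correction above
# is the special case n = 2). The starting reliability 0.60 is illustrative.
def lengthened_reliability(r, n):
    """Predicted reliability when a test is lengthened by a factor of n."""
    return n * r / (1 + (n - 1) * r)

for n in (1, 2, 3, 4):
    print(n, round(lengthened_reliability(0.60, n), 2))
# 1 0.6
# 2 0.75
# 3 0.82
# 4 0.86
```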
B. Spread of Scores
The reliability coefficient is directly influenced by the spread of scores in the group tested. Other things being equal, the larger the spread of scores is, the higher the estimate of reliability will be. Because large reliability coefficients result when individuals tend to stay in the same relative position in the group from one administration to another, it naturally follows that anything that reduces the possibility of shifting positions in the group also contributes to larger reliability coefficients.
C. Difficulty of Test
Tests that are too easy or too difficult for the group taking them tend to produce scores of low reliability. This is because both easy and difficult tests result in a restricted spread of scores. For the easy test, the scores are grouped close together at the top end of the scale. For the difficult test, the scores are grouped together at the bottom end of the scale. For both, the differences among individuals are small and tend to be unreliable.
D. Objectivity
The objectivity of a test refers to the degree to which equally competent scorers obtain the same results. Most standardized tests, aptitude and achievement alike, are high in objectivity. The test items are of the objective type (e.g., multiple choice), and the resulting scores are not influenced by the scorers' judgment or opinion. In fact, such tests are usually constructed so that they can be scored by trained clerks and scoring machines. When such highly objective procedures are used, the reliability of the test results is not affected by the scoring procedures.
1. Sensitivity. It is the ability of the instrument to make the discriminations required for the problem. A single characteristic measured may yield variations within subjects, between subjects, and between groups. The degree of variation should be detectable by the instrument. If the reliability and validity of a test are high, the test is most likely also sensitive enough to make finer distinctions in the degree of variation of the characteristic being measured.
2. Objectivity. It is the degree to which the measure is independent of the personal opinions, subjective judgment, biases, and beliefs of the individual test user. Regardless of the sex, age, appearance, and gestures of the examiner, a respondent should obtain a score that is stable and accurate, free from any influence of the examiner's personal variables. Item analysis is a process that helps make test items objective.
b. The manner of interpreting individual test items by the examinees who took the test: a well-constructed test item should lend itself to one and only one interpretation by examinees who know the subject in question.
c. Interpretability: Test results can be useful only when they are properly
evaluated. However, they can only be evaluated after they are interpreted.
b. Economy
❖ They should be economical in expense. One way to economize on the cost of testing is to use separate answer sheets and reusable test booklets. However, test validity and reliability should not be sacrificed.
❖ They should be economical in time. Tests that can be given in a short period of time are more likely to gain the cooperation of the respondents and conserve the time of all those involved in test administration.
6. Interesting. Tests that are interesting and enjoyable help to gain the cooperation of the subjects. However, those that are dull or seem silly may discourage or antagonize the subjects. Under such unfavorable conditions, the test is not likely to yield useful results.