05B-Reliability

The document discusses the concept of reliability in measurement, defining it as the consistency of a test's results over time. It outlines various methods for measuring reliability, including test-retest, equivalence, and internal consistency, and provides a framework for interpreting reliability coefficients. Additionally, it highlights factors affecting reliability, such as test length, score spread, difficulty, and objectivity, as well as other test characteristics like sensitivity and usability.


TECHNIQUES IN ESTABLISHING RELIABILITY

What is Reliability?

Synonyms for reliability include dependability, stability, consistency, predictability, and accuracy. Reliable people, for instance, are those whose behavior is consistent, dependable, and predictable: what they will do tomorrow and next week will be consistent with what they do today and what they did last week. We say they are stable. Unreliable people, on the other hand, are those whose behavior is much more variable and unpredictable. Sometimes they do this, sometimes that. They lack stability. We say they are inconsistent.

Reliability refers to the consistency of a measure. A test is considered reliable if we get the same result repeatedly. For example, if a test is designed to measure a trait (such as introversion), then each time the test is administered to a subject, the results should be approximately the same. Unfortunately, it is impossible to calculate reliability exactly, but there are several different ways to estimate it.

How can reliability be measured?

The reliability measure is large (for example, 1) when there is no error and small (for example, 0) when there is error. A statistic which satisfies these conditions is called the reliability coefficient. The reliability coefficient (r) is the ratio of the variability of the true-score component of the observed score to the variability of the observed score (which includes both true and error components).

If r, the coefficient of correlation, is squared, it becomes a coefficient of determination; that is, it gives us the proportion or percentage of the variance shared by two variables. If r = 0.90, then the two variables share (0.90)² = 0.81, or 81%, of their total variance in common. The reliability coefficient is also a coefficient of determination. Theoretically, it tells how much of the total variance of the measured variable is true variance. If we had the true scores, correlated them with the scores on the measured variable, and squared the resulting coefficient of correlation, we would obtain the reliability coefficient.

The following table is a standard in interpreting the values of the reliability coefficient
(De Guzman-Santos, 2007 p.64):

Reliability       Interpretation
0.90 and above    Excellent reliability. At the level of the best standardized tests.
0.80 - 0.89       Very good. This is ideal for a classroom test.
0.70 - 0.79       Good for a classroom test. There are probably a few items which could be improved.
0.60 - 0.69       Somewhat low. This test needs to be supplemented by other measures to determine grades. There are probably some items which could be improved.
0.50 - 0.59       Needs revision.
0.49 and below    Questionable reliability.
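As an illustrative sketch (not part of the source), the interpretation bands above can be encoded as a small lookup function; the labels follow the table as quoted from De Guzman-Santos (2007):

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the interpretation bands above."""
    if r >= 0.90:
        return "Excellent reliability"
    elif r >= 0.80:
        return "Very good"
    elif r >= 0.70:
        return "Good"
    elif r >= 0.60:
        return "Somewhat low"
    elif r >= 0.50:
        return "Needs revision"
    else:
        return "Questionable reliability"

print(interpret_reliability(0.94))  # Excellent reliability
```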

Types of Reliability and sources of error

Test-Retest
  Measured by: correlation between two administrations of the same test
  Sources of error: different occasions as well as within test
  Potential problems: confounded by memory and real development

Equivalence/Parallel Forms
  Measured by: correlation between administrations of the two forms of the test
  Sources of error: different forms as well as within test
  Potential problems: confounded by real development

Test-Retest with Equivalence
  Measured by: correlation between administrations of the two forms of the test
  Sources of error: different occasions and forms as well as within test
  Potential problems: confounded by real development

Internal Consistency (Split-half, KR formulas, Cronbach's alpha)
  Measured by: correlation among items within the test
  Sources of error: within test
  Potential problems: may not generalize across raters, occasions, or forms

Inter-rater/Inter-judge Consistency
  Measured by: correlation between raters as well as within the items of the test
  Sources of error: differences between raters as well as within test
  Potential problems: beware of inter-rater agreement percentages, which can be quite different from reliability coefficients

How can the reliability coefficient be computed?

The following examples illustrate how to arrive at reliability coefficients of different types:

Pearson r. This is used to describe the reliability estimates from test-retest, alternate-forms, and split-half reliabilities. The formula is as follows:

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

where:
r = the Pearson Product Moment Correlation Coefficient
ΣXY = sum of the product of X and Y scores
ΣX = sum of X-scores
ΣY = sum of Y-scores
ΣX2 = sum of the squares of X-scores
ΣY2 = sum of the squares of Y-scores
N = number of cases

For Example:

Mr. Tucmo administered his statistics test to ten (10) first year college students. After
two weeks, the same test was given to the same group of students. Their scores in the first and
second tests are shown below. Compute the reliability of the Test.

First Testing (X)   Second Testing (Y)   XY     X²     Y²
40                  41                   1640   1600   1681
35                  40                   1400   1225   1600
30                  25                   750    900    625
20                  20                   400    400    400
19                  20                   380    361    400
20                  23                   460    400    529
37                  34                   1258   1369   1156
38                  35                   1330   1444   1225
40                  40                   1600   1600   1600
25                  25                   625    625    625
ΣX = 304            ΣY = 303             ΣXY = 9843   ΣX² = 9924   ΣY² = 9841

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

r = [10(9843) − (304)(303)] / √{[10(9924) − (304)²][10(9841) − (303)²]}

r = 0.94
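The computation above can be checked with a short script (a sketch, not part of the source; the score lists are taken from Mr. Tucmo's example):

```python
from math import sqrt

# Scores from the test-retest example above
x = [40, 35, 30, 20, 19, 20, 37, 38, 40, 25]  # first testing
y = [41, 40, 25, 20, 20, 23, 34, 35, 40, 25]  # second testing

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Pearson product-moment correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 2))  # 0.94
```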

Spearman-Brown Prophecy Formula. This estimates the reliability of the whole test from the Pearson Product Moment Correlation Coefficient between the two halves of the test. It serves as a correction for the split-half reliability estimate. The formula is

rt = 2roe / (1 + roe)

where:
rt = reliability of the whole test
roe = split-half (odd-even) reliability

For Example:
Odd (X)   Even (Y)   XY     X²     Y²
14        19         266    196    361
19        18         342    361    324
17        18         306    289    324
15        13         195    225    169
20        15         300    400    225
11        9          99     121    81
24        20         480    576    400
16        15         240    256    225
15        15         225    225    225
15        13         195    225    169
ΣX = 166  ΣY = 155   ΣXY = 2648   ΣX² = 2874   ΣY² = 2503

r = [NΣXY − (ΣX)(ΣY)] / √{[NΣX² − (ΣX)²][NΣY² − (ΣY)²]}

r = [10(2648) − (166)(155)] / √{[10(2874) − (166)²][10(2503) − (155)²]}

r = 0.69

The r = 0.69 obtained above is the reliability of one half of the test (roe). Using the Spearman-Brown Prophecy Formula:

rt = 2roe / (1 + roe)

rt = 2(0.69) / (1 + 0.69)

rt = 0.82
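The split-half correction can be sketched in code as follows (not from the source; note that, like the worked example, it rounds the half-test r to two decimals before applying the formula):

```python
from math import sqrt

# Odd- and even-item half scores from the split-half example above
odd = [14, 19, 17, 15, 20, 11, 24, 16, 15, 15]
even = [19, 18, 18, 13, 15, 9, 20, 15, 15, 13]

n = len(odd)
num = n * sum(o * e for o, e in zip(odd, even)) - sum(odd) * sum(even)
den = sqrt(
    (n * sum(o * o for o in odd) - sum(odd) ** 2)
    * (n * sum(e * e for e in even) - sum(even) ** 2)
)
r_half = round(num / den, 2)  # split-half reliability: 0.69

# Spearman-Brown correction to full-test length
r_total = 2 * r_half / (1 + r_half)
print(round(r_total, 2))  # 0.82
```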

Kuder-Richardson Formula 21 (KR-21). This is a method of establishing the reliability of a test using the mean and the variance of the scores. The formula is

KR21 = [k / (k − 1)] [1 − x̄(k − x̄) / (k·s²)]

where:
x̄ = the mean of the obtained scores
s² = the variance of the scores
k = the total number of items

For Example:

Ms. Reyes administered a 50-item mathematics test to her Grade VI pupils. The scores of her
pupils are shown below. Find the reliability of her test by using Kuder-Richardson Formula 21.

Pupil   Score (X)   X − x̄    (X − x̄)²
A       32          3.2      10.24
B       36          7.2      51.84
C       36          7.2      51.84
D       22          −6.8     46.24
E       38          9.2      84.64
F       15          −13.8    190.44
G       43          14.2     201.64
H       25          −3.8     14.44
I       18          −10.8    116.64
J       23          −5.8     33.64
Total   288                  801.60

x̄ = 28.8    s² = 801.60 / 9 = 89.07 (the sample variance)    k = 50
Solving for KR-21:

KR21 = [k / (k − 1)] [1 − x̄(k − x̄) / (k·s²)]

KR21 = [50 / (50 − 1)] [1 − 28.8(50 − 28.8) / (50 × 89.07)]

KR21 = 0.88
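The KR-21 computation can be sketched as follows (not from the source; it uses the sample variance with n − 1 in the denominator, which reproduces the 89.07 value in the example):

```python
scores = [32, 36, 36, 22, 38, 15, 43, 25, 18, 23]  # Ms. Reyes's ten pupils
k = 50  # total number of items on the test

n = len(scores)
mean = sum(scores) / n                                # 28.8
var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # ~89.07 (sample variance)

# KR-21 reliability estimate
kr21 = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))
print(round(kr21, 2))  # 0.88
```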

FACTORS WHICH AFFECT RELIABILITY

A number of factors have been shown to affect the conventional measures of reliability. If sound conclusions are to be drawn, these factors must be considered when interpreting reliability coefficients.

A. Length of Test
In general, the longer the test, the higher its reliability. This is because a longer test provides a more adequate sample of the behavior being measured, and the scores are apt to be less distorted by chance factors such as guessing. Suppose that to measure spelling ability we asked pupils to spell one word. The result would be patently unreliable. Pupils who were able to spell the word would be perfect spellers, and pupils who could not would be complete failures. If we happened to select a difficult word, most pupils would fail; if the word was an easy one, most pupils would appear to be perfect spellers. The fact that one word provides an unreliable estimate of a pupil's spelling ability is obvious. It should be equally apparent that as we add more spelling words to the list, we come closer and closer to a good estimate of each child's spelling ability. Scores based on a large number of spelling words are thus more apt to reflect real differences in spelling ability and therefore to be more stable. By increasing the size of the sample of spelling behavior, therefore, we increase the consistency of our measurement.

B. Spread of Scores
The reliability coefficient is directly influenced by the spread of scores in the group tested. Other things being equal, the larger the spread of scores, the higher the estimate of reliability. Because large reliability coefficients result when individuals tend to stay in the same relative position in the group from one administration to another, it naturally follows that anything that reduces the possibility of shifting position in the group also contributes to larger reliability estimates.

C. Difficulty of Test
Tests that are too easy or too difficult for the group taking them will tend to produce scores of low reliability. This is because both easy and difficult tests result in a restricted spread of scores. For the easy test, the scores are clustered at the top end of the scale; for the difficult test, they are grouped at the bottom end. In both cases, the differences among individuals are small and tend to be unreliable.

D. Objectivity
The objectivity of a test refers to the degree to which equally competent scorers obtain the same results. Most standardized tests, aptitude and achievement alike, are high in objectivity. The test items are of the objective type (e.g., multiple choice), and the resulting scores are not influenced by the scorers' judgment or opinion. In fact, such tests are usually constructed so that they can be scored by trained clerks and scoring machines. When such highly objective procedures are used, the reliability of the test results is not affected by the scoring procedures.

OTHER TEST CHARACTERISTICS

1. Sensitivity. It is the ability of the instrument to make the discriminations required for the problem. A single characteristic measured may yield variations within subjects, between subjects, and between groups. The degree of variation should be detected by the instrument. If the reliability and validity of the test are high, most likely the test is also sensitive enough to make finer distinctions in the degree of variation of the characteristic being measured.
2. Objectivity. It is the degree to which the measure is independent of the personal
opinions, subjective judgment, biases, and beliefs of the individual test user.
Regardless of sex, age, and appearance and gestures of the examiner, a respondent
should obtain a score that is stable and accurate, free from any influence of the
personal variables of the examiner. Item Analysis is a process that makes the test
items objective.

Objectivity has two aspects:

a. The manner of scoring the test: the element of subjectivity or personal judgment has been eliminated in the correction of the test. This means that the score is independent of the personal judgment of the scorer.

b. The manner of interpreting individual test items by the examinees who took the test: a well-constructed test item should lend itself to one and only one interpretation by examinees who know the subject in question.

3. Usability. It is the degree to which the measuring instrument can be satisfactorily used by teachers, researchers, supervisors, etc. without undue expenditure of time, money, and effort. In other words, usability means practicality.

Some important aspects of usability are the following:

a. Administrability: A good test can be administered with ease, clarity, and uniformity. To ensure administrability, directions are made simple, clear, and concise.

b. Scorability: A good test is easy to score. Test results should be easily available to both the students and the teacher so that proper remedial and follow-up measures and curricular adjustments can be made.

c. Interpretability: Test results can be useful only when they are properly
evaluated. However, they can only be evaluated after they are interpreted.

d. Proper Mechanical Make-Up: Tests should be printed clearly on an appropriate size of paper. Careful attention should be given to the quality of pictures and illustrations. The format should also be planned so that it takes the respondents a minimum amount of time to write their answer to each item. The answer sheet, if one is used at all, should not be so confusing as to lead the respondents to write answers in the wrong blank or column. If the answers are to be written on the test papers themselves, there should be a definite system for placing the blanks for the answers.

4. Feasibility. It is concerned with the aspects of skills, cost, and time.

a. Skills: Certain tests require minimal skill to develop and may also require minimal training to administer, score, analyze, and interpret.

b. Economy
❖ Tests should be economical in expense. One way to economize on testing costs is to use answer sheets and reusable test booklets. However, test validity and reliability should not be sacrificed.
❖ Tests should be economical in time. Tests that can be given in a short period of time are likely to gain the cooperation of the respondents and conserve the time of all those involved in test administration.

5. Comprehensiveness. It is not possible to include all objectives of the course or all aspects of the subject matter in a test. The best approach is to sample the objectives as systematically as possible. It should be remembered that the longer and more comprehensive the test, the better the chances that it will be valid and reliable.

6. Interest. Tests that are interesting and enjoyable help to gain the cooperation of the subject. Those that are dull or seem silly may discourage or antagonize the subject. Under such unfavorable conditions, the test is not likely to yield useful results.
