Reliability of Measures

The Concept of Reliability


• The reliability of a measurement procedure is the
stability or consistency of the measurement.
• A measurement procedure is said to have reliability if
it produces identical (or nearly identical) results when
it is used repeatedly to measure the same individual
under the same conditions.
• For example, if we use an IQ test to measure a
person’s intelligence today, then use the same test for
the same person under similar conditions next week,
we should obtain nearly identical IQ scores.
• The inconsistency in a measurement comes from error.
• Observer error: The individual who makes the
measurements can introduce simple human error into the
measurement process.
• Environmental changes: there are small changes in the
environment (such as time of day, temperature, weather
conditions, and lighting) from one measurement to another,
and these small changes can influence the measurements.
• Participant changes: The participant can change between
measurements. As noted earlier, a person’s degree of focus
and attention can change quickly and can have a dramatic
effect on measures of reaction time (e.g., a hungry person
may score differently on an IQ test).
• In summary, any measurement procedure involves an
element of error and the amount of error determines the
reliability of the measurements. When error is large,
reliability is low, and when error is small, reliability is high.
Reliability Types
1. Inter-Rater Reliability
• When measurements are obtained by direct observation
of behaviors, it is common to use two or more separate
observers who simultaneously record measurements.
Inter-rater reliability is the degree of agreement or
consistency between two or more scorers (or judges or
raters) with regard to a particular measure.
• For example, two psychologists may watch a group of
preschool children and observe social behaviors. Each
individual records (measures) what she observes, and
the degree of agreement between the two observers is
called inter-rater reliability.
• Inter-rater reliability can be measured by computing the
correlation between the scores from the two observers
or by computing a percentage of agreement between
the two observers, as in the sketch below.
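As a minimal sketch (not from the original slides), the two
computations might look like this in Python, assuming NumPy is
available and using made-up ratings from two hypothetical
observers:

import numpy as np

# Hypothetical behavior codes assigned by two observers to the
# same ten children (1 = solitary, 2 = parallel, 3 = social play).
rater_a = np.array([1, 2, 2, 3, 1, 3, 2, 1, 3, 2])
rater_b = np.array([1, 2, 3, 3, 1, 3, 2, 1, 2, 2])

# Percentage of agreement: share of observations coded identically.
percent_agreement = np.mean(rater_a == rater_b) * 100
print(f"Percent agreement: {percent_agreement:.0f}%")   # 80%

# For numerical ratings, the correlation between the two
# observers' scores can serve as the reliability estimate.
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"Pearson r between raters: {r:.2f}")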
2. Test-retest Reliability
• The reliability estimate obtained by comparing the
scores obtained from two successive measurements is
commonly called test-retest reliability.
• A researcher may use exactly the same measurement
procedure for the same group of individuals at two
different times; the reliability of the test scores is
estimated by repeating the identical test on a second
occasion.
• The reliability coefficient in this case is simply the
correlation between the scores obtained by the same
persons on the two administrations of the test (see the
sketch below).
• Of course, sometimes poor test-retest correlations do
not mean that a test is unreliable.
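As a small illustrative sketch (hypothetical scores; NumPy
assumed), the test-retest coefficient is simply the Pearson
correlation between the two administrations:

import numpy as np

# Hypothetical IQ scores for the same five people, tested one
# week apart under similar conditions.
week_1 = np.array([102, 115, 98, 124, 107])
week_2 = np.array([104, 112, 99, 121, 110])

# Test-retest reliability = correlation between administrations.
r_test_retest = np.corrcoef(week_1, week_2)[0, 1]
print(f"Test-retest reliability: {r_test_retest:.2f}")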
Limitations of Test-retest reliability
● Carryover Effect: This effect occurs when the first
testing session influences scores from the second session.
For example, test takers sometimes remember their
answers from the first time they took the test. Carryover
problems are of concern only when the changes over
time are random, not systematic.
● Practice effects: a type of carryover effect. Some skills
improve with practice. When a test is given a second
time, test takers score better because they have
sharpened their skills by having taken the test the first
time.
● The time interval between testing sessions must be
selected and evaluated carefully. If the two
administrations of the test are close in time, there is a
relatively great risk of carryover and practice effects.
● Motivation level: a test taker’s motivation can differ
between the two sessions, changing scores for reasons
unrelated to the characteristic being measured.
3. Parallel-Forms Reliability
• A researcher may use modified versions of the measurement
instrument (such as alternative versions of an IQ test) to
obtain two different measurements for the same group of
participants.
• The same persons can be tested with one form on the first
occasion and with another, comparable form on the second.
The correlation between the scores obtained on the two forms
represents the reliability coefficient of the test.
• When different versions of the instrument are used for the
test and the retest, the reliability measure is often called
parallel-forms reliability.
• To control for order effects, both groups take both forms:
group A takes form A first, and group B takes form B first.
If the scores on the two forms then turn out to be nearly
identical, the test shows high parallel-forms reliability,
as illustrated in the sketch below.
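A brief sketch of this comparison (hypothetical scores; NumPy
assumed): whichever form a person took first, the reliability
coefficient is the correlation between each person’s form-A and
form-B scores.

import numpy as np

# Hypothetical scores for eight people; group A (first four) took
# form A first, group B (last four) took form B first.
form_a = np.array([78, 85, 90, 66, 72, 88, 81, 75])
form_b = np.array([80, 83, 92, 64, 70, 86, 84, 73])

# Parallel-forms reliability = correlation between the two forms.
r_parallel = np.corrcoef(form_a, form_b)[0, 1]
print(f"Parallel-forms reliability: {r_parallel:.2f}")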
Just think!
You missed the midterm examination and have
to take a makeup exam. Your classmates tell you
that they found the midterm impossibly difficult.
Your instructor tells you that you will be taking
an alternate form, not a parallel form, of the
original test. How do you feel about that?
Limitations
● Test scores may be affected by factors such as
motivation, fatigue, or intervening events
such as practice, learning, or therapy.
● The order of administration is usually
counterbalanced to avoid practice effects.
● Developing alternate forms of tests can be
time-consuming and expensive.
4. Internal Consistency Estimates of Reliability
Two common methods:
1. Split-Half Reliability
2. KR-20 Formula and Cronbach’s Alpha
1. Split-Half Reliability
To measure the degree of consistency, researchers commonly
split the set of items in half and compute a separate score for
each half. The degree of agreement between the two scores is
then evaluated, usually with a correlation. This general process
results in a measure of split-half reliability. Only a single
administration of a single form is required.
Steps:
1. Divide the test into two equivalent halves, using one of
these methods: random assignment of items, an odd-even
split, or matching the halves on content and difficulty.
2. Compute the Pearson r between scores on the two halves
of the test (see the sketch below).
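A sketch of the odd-even method (hypothetical item responses;
NumPy assumed): score the odd-numbered and even-numbered items
separately, then correlate the two half-scores.

import numpy as np

# Hypothetical responses of 6 people to a 10-item test
# (1 = correct, 0 = incorrect).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
])

# Odd-even split: columns 0, 2, 4, ... hold items 1, 3, 5, ...
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# The Pearson r between half-scores is the split-half reliability.
r_half = np.corrcoef(odd_half, even_half)[0, 1]
print(f"Split-half reliability: {r_half:.2f}")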
2. KR-20 Formula and Cronbach’s Alpha
• Cronbach’s Alpha and the Kuder-Richardson formula
are two statistical techniques for estimating internal
consistency without depending on one arbitrary split
of the items.
• KR-20, first published in 1937, is a measure of
internal consistency reliability for measures with
dichotomous choices,
e.g., right/wrong, true/false, correct/incorrect.
• It shouldn’t be used for questions where partial
credit is possible or for scales like the Likert scale.
If you have a test with more than two answer
possibilities (or opportunities for partial credit),
use Cronbach’s Alpha instead.
• Cronbach’s alpha refers to the degree of
correlation among all the items on a scale. It
is a measure of inter-item consistency, is
calculated from a single administration of a
single form of a test, and can be used to estimate
the internal consistency of homogeneous as
well as heterogeneous tests (most preferable).
A sketch of both statistics follows.
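A minimal Python sketch of both formulas (hypothetical data;
population variances are used here so the two estimates agree
exactly on 1/0 items, though some texts use sample variances):

import numpy as np

def cronbach_alpha(items):
    # alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
    k = items.shape[1]
    item_vars = items.var(axis=0)          # variance of each item
    total_var = items.sum(axis=1).var()    # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def kr20(items):
    # KR-20 for right/wrong items:
    # (k / (k - 1)) * (1 - sum(p * q) / total-score variance)
    k = items.shape[1]
    p = items.mean(axis=0)                 # proportion passing each item
    q = 1 - p                              # proportion failing each item
    total_var = items.sum(axis=1).var()
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical right/wrong (1/0) responses: 6 people x 5 items.
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
# For dichotomous items, p * q equals the item variance, so the
# two estimates coincide (about 0.84 for this made-up data).
print(f"KR-20:            {kr20(responses):.2f}")
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")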
How Reliability can be improved
• An alpha coefficient of about .70 to .90 is generally
considered sufficient.
• Increase the number of items.
• Reliability estimates are also affected by sample size.

• Factor and item analysis: compute the correlation between
each single item score and the total scale score; a low
correlation indicates the item is too hard, irrelevant, or
measuring something different, and such items can be
omitted to improve overall reliability (see the sketch
after this list).
• Correction for attenuation: low reliability reduces
the chances of finding significant correlations
between measures. If a test is unreliable,
information obtained with it is of little or no value.
Thus, we say that potential correlations are
attenuated, or diminished, by measurement error;
the correction is also sketched below.
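Two of these ideas lend themselves to a short sketch
(hypothetical data; NumPy assumed; the attenuation correction
shown is the standard r_xy / sqrt(r_xx * r_yy) formula):

import numpy as np

# Item analysis: correlate each item with the total scale score.
# A low (or negative) correlation flags an item as too hard,
# irrelevant, or measuring something different.
items = np.array([          # hypothetical 6 people x 4 items
    [4, 5, 2, 4],
    [3, 4, 5, 3],
    [5, 5, 1, 5],
    [2, 2, 4, 1],
    [4, 4, 3, 4],
    [1, 2, 5, 2],
])
total = items.sum(axis=1)
for i in range(items.shape[1]):
    r_item = np.corrcoef(items[:, i], total)[0, 1]
    print(f"Item {i + 1} vs. total: r = {r_item:.2f}")

# Correction for attenuation: estimate what the correlation
# between two measures would be if both were perfectly reliable.
r_xy = 0.40               # observed correlation (hypothetical)
r_xx, r_yy = 0.70, 0.80   # reliabilities of the measures (hypothetical)
r_corrected = r_xy / np.sqrt(r_xx * r_yy)
print(f"Corrected for attenuation: {r_corrected:.2f}")   # ~0.53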
Thank You ☺
