
Reliability

Dr. Münevver İLGÜN DİBEK


Nature of Reliability

▪ Reliability refers to the consistency of measurement, that is, how consistent test scores or other assessment results are from one measurement to another.

Example: a weight scale provides reliable readings if it reports the same weight every time.

Questions a teacher might ask about the consistency of her assessment results:
▪ How similar would the students' scores have been had she assessed them yesterday, or tomorrow, or next week?
▪ How much would the scores have differed had a different teacher scored the assessment?
▪ How much would the scores have differed had the teacher used a different sample of tasks?
Sources of score inconsistency between two assessments:

▪ Administered in close succession: attention, fatigue, guessing, memory, effort
▪ Administered after a long period: learning experience, health, forgetting
Characteristics of Reliability

▪ Reliability refers to the results obtained with an assessment instrument, not to the instrument itself.
▪ An estimate of reliability always refers to a particular type of consistency.
▪ If you want to measure what individuals will be like at some future time, consistency of scores over time is important.
▪ If you want to measure individuals' current understanding of certain scientific principles, consistency of performance across different tasks is important.
Characteristics of Reliability

▪ Reliability is assessed primarily with statistical indices: the correlation coefficient (reliability coefficient) and the standard error of measurement are computed.
▪ Reliability is a necessary but not sufficient condition for validity.
▪ Low reliability indicates a low degree of validity, but high reliability does not ensure a high degree of validity.
RELIABILITY - VALIDITY RELATION

▪ Reliability: if something has high reliability, it may have either high validity or low validity.
▪ Validity: if something has high validity, it also has high reliability.
▪ Correlation coefficient: a statistic that indicates the degree of relationship between any two sets of scores obtained from the same group of individuals.

▪ Reliability coefficient: a correlation coefficient that indicates the degree of relationship between two sets of scores intended to be measures of the same characteristic.

▪ Validity coefficient: a correlation coefficient that indicates the degree to which a measure predicts or estimates performance on some criterion measure (e.g., the correlation between scholastic aptitude scores and grades in school).
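To make these definitions concrete, here is a minimal Python sketch of how a reliability coefficient is obtained: a Pearson correlation between two sets of scores from the same group (e.g., two administrations of the same test). The score lists are hypothetical illustrations, not data from the lecture.

    # Pearson correlation between two sets of scores from the same group.
    # The scores below are hypothetical.
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    scores_time1 = [52, 47, 61, 55, 48, 70, 66, 59]  # first administration
    scores_time2 = [50, 49, 63, 54, 50, 68, 64, 57]  # second administration

    r = pearson(scores_time1, scores_time2)
    print(f"reliability coefficient r = {r:.2f}")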
Methods of Estimating Reliability

1. Test-retest (measure of stability): give the same test twice to the same group, with a time interval between administrations.
2. Equivalent forms (measure of equivalence): give two forms of the test to the same group in close succession.
3. Test-retest with equivalent forms (measure of stability and equivalence): give two forms of the test to the same group at different times.
4. Split-half (measure of internal consistency): give the test once; score two equivalent halves of the test.
5. Kuder-Richardson and coefficient alpha (measure of internal consistency): give the test once; apply the KR or Cronbach's alpha formula.
6. Interrater (measure of consistency of ratings): give a set of student responses to two or more raters.
TEST-RETEST METHOD

Time 1: Construct X, Instrument A, Form 1, Sample n → SCORE 1
Time 2: Construct X, Instrument A, Form 1, Sample n → SCORE 2

The correlation coefficient between Score 1 and Score 2 indicates how stable the assessment results are over a period of time.

Coefficient close to 1: high reliability
Coefficient close to 0: lower reliability
TEST-RETEST METHOD…

▪ If the time interval between the two tests is too short, the constancy of the results will be distorted because students will remember the tasks and their responses to the first test.

▪ If the time interval between the two tests is too long, the constancy of the results will be distorted because actual changes in the students will have occurred.
Equivalent/alternative/parallel forms method

Time 1: Construct X, Instrument A, Form 1, Sample n → SCORE 1
Time 1: Construct X, Instrument A, Form 2, Sample n → SCORE 2

The correlation coefficient between Score 1 and Score 2 indicates the degree to which the two assessments are measuring the same aspects of behaviour.

Coefficient close to 1: high reliability
Coefficient close to 0: lower reliability
Equivalent/alternative/parallel forms method

▪ It reflects short-term consistency of the students' performance (not long-term consistency).
▪ Equivalent forms have the same content and difficulty.
▪ This is the easiest way to determine whether an assessment measures an adequate sample of the content: different versions of the assessment covering the same domain of content are constructed, and the results are correlated.
Test-Retest with Equivalent/alternative/parallel forms Method

Time 1: Construct X, Instrument A, Form 1, Sample n → SCORE 1
Time 2: Construct X, Instrument A, Form 2, Sample n → SCORE 2

The correlation coefficient between Score 1 and Score 2 indicates both the stability of the results over time and the degree to which the two forms measure the same aspects of behaviour.

Coefficient close to 1: high reliability
Coefficient close to 0: lower reliability
Internal-Consistency Methods - Split-half reliability

▪ There are several internal-consistency methods that require only one administration of an instrument.
▪ Split-half procedure: score two halves of the test separately for each subject and calculate the correlation coefficient between the two scores. It indicates the degree to which consistent results are obtained from the two halves of the test.
▪ Methods of splitting the items: the first half versus the second half; odd-numbered versus even-numbered items; a random selection.
Internal-Consistency Methods - Split-half reliability…

Spearman-Brown formula:

Reliability of full assessment = (2 × correlation between half assessments) / (1 + correlation between half assessments)

Suppose the correlation between the half assessments is 0.60:

Reliability of full assessment = (2 × 0.60) / (1 + 0.60) = 1.20 / 1.60 = 0.75
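A minimal Python sketch of the full split-half procedure follows: split the items into odd- and even-numbered halves, correlate the half scores, and step the correlation up with the Spearman-Brown formula. The item matrix is hypothetical (rows = students, columns = items, 1 = correct, 0 = incorrect).

    # Split-half reliability with the Spearman-Brown correction.
    def pearson(x, y):
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    items = [            # hypothetical 0/1 item scores, one row per student
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [0, 0, 1, 0, 0, 1, 0, 0],
        [1, 0, 1, 1, 0, 1, 1, 1],
    ]

    # Score the odd- and even-numbered items separately for each student.
    odd_scores = [sum(row[0::2]) for row in items]
    even_scores = [sum(row[1::2]) for row in items]

    r_half = pearson(odd_scores, even_scores)
    r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown correction
    print(f"half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")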
Internal-Consistency Methods - KR-20, KR-21, Alpha coefficient…

Kuder-Richardson approaches (KR-20 and KR-21):

• The test is administered to the group only once.
• When students' responses are scored dichotomously, the KR-20 and KR-21 formulas are used.

Alpha coefficient:
• It is the generalization of KR-20 for assessments that have more than dichotomous scores (e.g., each task is scored on a 5-point scale).
• Both coefficients provide information about the degree to which the items or tasks in the assessment measure similar characteristics.
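As a sketch of the computation, the following Python snippet implements the standard coefficient alpha formula, alpha = (k / (k − 1)) × (1 − Σ item variances / total-score variance); with 0/1 item scores it reduces to KR-20. The rating data are hypothetical.

    # Coefficient alpha (Cronbach's alpha) from an item-score matrix.
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)  # population variance

    items = [            # hypothetical data: rows = students, items on a 0-4 scale
        [4, 3, 4, 2],
        [2, 2, 3, 1],
        [3, 3, 3, 2],
        [1, 0, 2, 1],
        [4, 4, 4, 3],
    ]

    k = len(items[0])                     # number of items
    totals = [sum(row) for row in items]  # each student's total score
    item_vars = [variance([row[i] for row in items]) for i in range(k)]

    alpha = (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))
    print(f"coefficient alpha = {alpha:.2f}")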
Limitations of Internal Consistency Methods

▪ They are not appropriate for speeded assessments, that is, assessments with time limits that prevent students from attempting every task.
▪ They do not indicate the constancy of student responses from day to day, because there is only one administration.
INTER-RATER RELIABILITY / scorer agreement method

▪ Relevant for open-ended questions, essays, lab experiment exercises, and similar tasks.
▪ Whenever students' work is scored judgmentally, it is reasonable to have more than one judge.
INTER-RATER RELIABILITY / scorer agreement method

Rater 1: Construct X, Instrument A, Form 1, Sample n → SCORE 1
Rater 2: Construct X, Instrument A, Form 1, Sample n → SCORE 2

The correlation coefficient between Score 1 and Score 2 indicates the degree to which the relative ordering of responses is consistent from one rater to another.

Coefficient close to 1: high reliability
Coefficient close to 0: lower reliability
Percentage of agreement

Percentage of exact agreement = 100 × (number of responses on which the two raters agree exactly / total number of responses)

For example, with exact-agreement counts of 3, 7, 5, 4, 2, and 3 across the score categories and 50 responses in total:

Percentage of exact agreement = 100 × [(3+7+5+4+2+3)/50] = 100 × (24/50) = 48%
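A minimal Python sketch of the same computation from raw ratings (the two rating lists are hypothetical; each element is one student's score from the corresponding rater):

    # Percentage of exact agreement between two raters.
    rater1 = [3, 4, 2, 5, 1, 3, 4, 2, 5, 3]  # hypothetical ratings
    rater2 = [3, 4, 3, 5, 1, 2, 4, 2, 4, 3]

    agreements = sum(a == b for a, b in zip(rater1, rater2))
    pct = 100 * agreements / len(rater1)
    print(f"exact agreement = {pct:.0f}%")   # 7 of 10 -> 70%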
Standard Error of Measurement (SEM)

Suppose that we assess a student over and over again with the same assessment procedure. We will obviously get different scores each time.

The amount of variation in the scores is directly related to reliability:

▪ Low reliability → large variation
▪ High reliability → small variation
Standard Error of Measurement (SEM)

• Although it is impossible to administer the same set of assessment tasks many, many times to the same students, we can calculate an estimate of that variation.
• SEM is the amount of error that must be considered in interpreting a score. It provides the limits within which we can reasonably expect to find the true score.
• The true score is the score that would be obtained if the test were perfectly reliable.
If a student were tested repeatedly under identical conditions and there were no memory, learning, practice, or fatigue effects:

- We could be 68% sure that his/her true score would fall within one SEM of his/her obtained score.

- We could be 95% sure that his/her true score would fall within two SEMs of his/her obtained score.

- We could be 99% sure that his/her true score would fall within three SEMs of his/her obtained score.
• Each obtained score has a confidence band/interval.

For example:
• Tuğrul has a score of 52, and the standard error of measurement is 4.
• What does this mean?

• Tuğrul's true score is between (52 − 4) and (52 + 4) with 68% confidence. In other words, we are 68% confident that his true score is between 48 and 56.

• Tuğrul's true score is between (52 − 4×2) and (52 + 4×2) with 95% confidence. In other words, we are 95% confident that his true score is between 44 and 60.

• Tuğrul's true score is between (52 − 4×3) and (52 + 4×3) with 99% confidence. In other words, we are 99% confident that his true score is between 40 and 64.
Relationship between SEM and Reliability

SEM = SD × √(1 − r)

where SD = standard deviation of the scores and r = reliability coefficient.

As the reliability coefficient increases for any given standard deviation, the standard error of measurement decreases. Conversely, small reliability coefficients are associated with large measurement errors.
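The Python sketch below ties the formula to the confidence bands above. The SD = 8 and r = 0.75 are hypothetical values chosen so that SEM = 4, matching the Tuğrul example.

    # SEM = SD * sqrt(1 - r), then 68/95/99% confidence bands for a score.
    import math

    def sem(sd, reliability):
        return sd * math.sqrt(1 - reliability)

    sd, r = 8.0, 0.75   # hypothetical test statistics (give SEM = 4)
    e = sem(sd, r)
    score = 52          # Tuğrul's obtained score

    for k, conf in [(1, 68), (2, 95), (3, 99)]:
        print(f"{conf}% band: {score - k*e:.0f} to {score + k*e:.0f}")
    # prints 48-56, 44-60, 40-64, as in the example above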
Factors Influencing Reliability

▪ Number of assessment tasks: the larger the number of tasks, the higher the reliability.
▪ A longer assessment provides a more adequate sample of the behaviour being measured.
▪ Scores are less affected by chance factors such as familiarity with a given task.

▪ Objectivity: the degree to which equally competent scorers obtain the same results.
▪ Raters are important; they should be trained on how to use rubrics.
▪ Rubrics should be clearly established.
USABILITY

▪ Ease of administration
▪ Directions should be simple and clear.
▪ The time needed for administration should not be too great.
▪ Ease of interpretation and application of results
▪ If results are interpreted correctly and applied effectively, they contribute to more intelligent educational decisions.
