Validity and Reliability
I. Validity
1. Definitions
- Valid (adj): based on truth or reason; able to be accepted
- Validity (n): the degree to which a test measures what it is intended to measure.
- Term confusion: validity vs. construct validity
o In recent years, ‘construct validity’ has been increasingly used to refer to the
general notion of validity
o Construct validity (n): the meaningfulness and appropriateness of the interpretations we make on the basis of test scores.
- Face validity (n): subjective validity; the test seems reasonable to lay people
o Still important because it addresses stakeholders’ social purposes (the social consequences of the test)
2. Evidence
a. Content validity
- The test content must constitute a representative sample of the language skills, structures, etc. that it is meant to cover.
- A specification of the skills or structures to be tested must be written at an early stage of test construction (see the sketch after this list).
- Example: an intermediate-level grammar achievement test should consist of items testing knowledge or control of grammar at that level, so it should not contain items aimed at advanced learners.
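The idea of checking a draft test against its specification can be made concrete with a small script. Below is a minimal sketch, assuming an invented specification for an intermediate grammar test; the grammar points, proportions, and item counts are illustrative only and not taken from any real test.

```python
# Minimal sketch: checking draft test content against a specification.
# The grammar points, intended proportions, and draft items are invented.
specification = {            # intended share of items per grammar point
    "past simple": 0.25,
    "present perfect": 0.25,
    "conditionals": 0.25,
    "passive voice": 0.25,
}

# Grammar point targeted by each item in the draft version of the test
draft_items = ["past simple"] * 6 + ["present perfect"] * 2 + ["passive voice"] * 2

total = len(draft_items)
for structure, intended in specification.items():
    actual = draft_items.count(structure) / total
    flag = "" if abs(actual - intended) <= 0.05 else "  <-- not representative"
    print(f"{structure:15} intended {intended:.0%}, actual {actual:.0%}{flag}")
```

With the draft above, past simple is flagged as over-represented and conditionals as missing entirely, which is exactly the kind of imbalance a content specification is meant to catch.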
b. Criterion-related validity (agreement with an external criterion measure)
- Test results must agree with the results provided by an independent and highly
dependable assessment of the candidate’s ability
- Concurrent validity: established when the test and criterion are administered at about
the same time.
- Predictive validity: the degree to which a test can predict a candidate’s future performance.
c. How is level of agreement measured?
- Correlation coefficient (validity coefficient): a mathematical measure of the degree of agreement between two sets of scores (a worked sketch follows this list).
o Perfect agreement: 1 (Perfectly valid)
o No agreement: 0 (Invalid)
- Scoring validity
o A reading test with short-answer questions is meant to measure reading ability; if the scoring also takes spelling and grammar into account, it may not be valid.
o For a writing test, if we place too much emphasis on mechanical features (e.g. spelling and punctuation), the scoring, and therefore the test, may be invalid.
- Face validity
o The test looks as if it measures what it is supposed to measure.
o Indirect testing should be introduced slowly, carefully, and reasonably
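As a concrete illustration of the validity coefficient, the sketch below computes a Pearson correlation between scores on the test being validated and scores on an independent criterion measure. The score lists are invented for illustration; they are not data from the source.

```python
# Minimal sketch: a validity coefficient as the Pearson correlation between
# test scores and scores on an independent criterion. All scores are invented.
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

test_scores      = [55, 62, 70, 48, 85, 90, 66, 74]   # test being validated
criterion_scores = [58, 60, 75, 50, 88, 86, 64, 78]   # trusted external assessment

r = pearson(test_scores, criterion_scores)
print(f"Validity coefficient: {r:.2f}")   # near 1 = strong agreement, near 0 = none
```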
3. How to make tests more valid
- Validate the test before it is put into operation.
- For teacher-made tests, full validation is usually not feasible, but the steps below still help.
- Write a test specification.
- Use direct testing.
- Use relevant scoring.
- Ensure reliability.
II. Reliability
- Definition: the consistency with which a test measures individuals, usually expressed as a reliability coefficient; a reliable test gives results you can trust over time.
o Perfectly reliable: 1
o Unreliable: 0
- Without reliability, there is no way to trust your results.
o Validity: whether you are measuring what you intend to measure
o Reliability: whether you are measuring it consistently
- The standard error of measurement estimates how repeated measures of a person on
the same instrument tend to be distributed around his or her true score. The true score
is always unknown.
- The SEM is based on the reliability coefficient and a measure of the spread of all the scores on the test: SEM = SD × √(1 − reliability).
- => A way to estimate the range within which a person’s true score is likely to lie (see the sketch below).
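A minimal sketch of these two ideas, assuming a test-retest design (the same candidates take the same test twice): the reliability coefficient is the correlation between the two sittings, and the SEM follows from that coefficient and the spread of the scores. All numbers are invented for illustration.

```python
# Minimal sketch: test-retest reliability coefficient and the standard error
# of measurement (SEM). All score lists are invented for illustration.
from math import sqrt
from statistics import mean, stdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

first_sitting  = [55, 62, 70, 48, 85, 90, 66, 74]
second_sitting = [57, 60, 72, 50, 83, 91, 65, 75]   # same candidates, same test, later date

reliability = pearson(first_sitting, second_sitting)   # 1 = perfectly reliable, 0 = unreliable
sem = stdev(first_sitting) * sqrt(1 - reliability)     # SEM = SD * sqrt(1 - reliability)

print(f"Reliability coefficient: {reliability:.2f}")
print(f"SEM: {sem:.2f} score points")
# A candidate's true score falls within about +/- 2 SEM of the observed score
# roughly 95% of the time, assuming normally distributed measurement error.
```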