Validity and Reliability
4. Face Validity- A test has face validity if it looks valid to test users, examiners, and especially the examinees. It is really a matter of social acceptability and not technically a form of validity. For instance, if a test is prepared to measure whether students can perform multiplication, it has face validity when the people to whom it is shown all agree that it looks like a good test of multiplication ability.
1. Test-Retest Reliability- The most straightforward method for determining the reliability of test scores is to administer the identical test twice to the same group of representative subjects. If the test is perfectly reliable, each person’s second score will be completely predictable from his or her first score. As long as the second score is strongly correlated with the first score, the existence of practice, maturation, or treatment effects does not cast doubt on the reliability of the psychological test.
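As a rough sketch of how this coefficient is typically computed (the calculation is not spelled out in these notes), the two sets of scores can be correlated with a Pearson correlation; the score values below are hypothetical.

```python
# Minimal sketch: estimating test-retest reliability as the Pearson
# correlation between scores from two administrations of the same test.
# All score values are hypothetical illustrations, not real data.
import numpy as np

first_administration = np.array([21, 34, 28, 45, 39, 30, 25, 42])
second_administration = np.array([23, 33, 30, 44, 41, 28, 27, 43])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the test-retest reliability estimate.
r_test_retest = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"Test-retest reliability estimate: {r_test_retest:.2f}")
```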
2. Alternate Form Reliability- The test developers produce two forms of the same test. These alternate forms are independently constructed to meet the same specifications, often on an item-by-item basis. Thus, alternate forms of a test incorporate similar content and cover the same range and level of difficulty in items. Alternate form reliability is derived by administering both forms to the same group and correlating the two sets of scores. This approach is similar to test-retest reliability in that both involve two test administrations to the same subjects with an intervening time interval. There is, however, a fundamental difference between the two approaches: the alternate-forms methodology introduces item-sampling differences as an additional source of error variance. Some test takers may do better or worse on one form of the test than on the other. Even though the two forms may be equally difficult on average, some subjects may find one form quite a bit harder than the other. Alternate forms also nearly double the cost of publishing the test.
3. Split-Half Reliability- Split-half reliability coefficients are obtained by correlating the pairs of scores obtained from equivalent halves of a test administered only once to a representative sample of examinees. The logic here is straightforward: if scores on two half-tests from a single test administration show a strong correlation, then scores on two whole tests from two separate test administrations should also reveal a strong correlation. This approach is especially useful when excessive cost would render it impractical to obtain a second set of test scores from the same examinees.
4. Interscorer Reliability- A sample of tests is independently scored by two or more examiners, and the scores for pairs of examiners are then correlated. Test manuals typically report the training and experience required of examiners and then list representative interscorer correlation coefficients. This form of reliability supplements other reliability estimates but does not replace them. It is appropriate for any test that involves subjectivity of scoring.
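As an illustrative sketch (the scorer names and score values below are hypothetical), interscorer reliability can be estimated by correlating the scores each pair of examiners assigns to the same sample of tests; with more than two scorers, each pairwise coefficient can be reported.

```python
# Minimal sketch: interscorer reliability as the correlation between scores
# that examiners independently assign to the same sample of tests.
# Rater names and score values are hypothetical.
from itertools import combinations
import numpy as np

scores_by_rater = {
    "rater_a": np.array([12, 18, 9, 15, 20, 11, 14]),
    "rater_b": np.array([13, 17, 10, 14, 19, 12, 15]),
    "rater_c": np.array([11, 18, 8, 16, 20, 10, 13]),
}

# Correlate every pair of raters and report each coefficient.
for (name_1, s_1), (name_2, s_2) in combinations(scores_by_rater.items(), 2):
    r = np.corrcoef(s_1, s_2)[0, 1]
    print(f"{name_1} vs {name_2}: r = {r:.2f}")
```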
2. Speed and Power Tests- A speed test typically contains items of uniform and generally simple levels of difficulty. If time permitted, most subjects should be able to complete most or all of the items on such a test, but the test has a restrictive time limit. An examinee’s score on a speeded test therefore largely reflects speed of performance. A power test, by contrast, allows enough time for test takers to attempt all items but is constructed so that no test taker is able to obtain a perfect score. The reliability of a speed test should be based on the test-retest method or on split-half reliability computed from two separately timed half-tests.