
Validity and Reliability

Validity and its types


The validity of a test is the extent to which it measures what it claims to measure. A test is
valid when the inferences made from it are appropriate, meaningful, and useful. For example, a
test of intelligence should measure intelligence and not something else (such as memory).
Internal validity refers to whether the effects observed in a study are due to the manipulation
of the independent variable and not some other factor. Internal validity can be improved by
controlling extraneous variables, using standardized instructions, counterbalancing, and
eliminating demand characteristics and investigator effects. External validity refers to the
extent to which the results of a study can be generalized to other settings (ecological
validity), other people (population validity), and over time (historical validity). External
validity can be improved by setting experiments in a more natural setting and using random
sampling to select participants.
1. Content Validity- It is determined by the degree to which the questions, tasks, or
items on a test are representative of the universe of behaviour the test was designed
to sample. If the specific items of the test are representative of the population of
all possible items, then the test possesses content validity. For example, a
depression scale may lack content validity if it only assesses the affective dimension
of depression but fails to consider the behavioral dimension.

2. Construct Validity- A construct is an attribute, skill, or ability that is based on
established theories and is assumed to exist in the human brain. Intelligence, anxiety, and
depression are all examples of constructs. Construct validity is the degree to which a
test measures the construct that it is supposed to measure. There are two necessary
components of construct validity: convergent and discriminant validity.

a. Convergent Validity- Convergent validity is the degree to which a measurement
agrees with other measurements that assess the same construct. Let's say that
while reading the manual you found that the BAI (Beck Anxiety Inventory) is highly
correlated with the Hamilton Rating Scale (HRS) and the State-Trait Anxiety Inventory
(STAI), two previously validated measures of anxiety. This establishes
convergent validity.
b. Discriminant Validity- It is the degree to which a measurement does not correlate
with measurements that assess different constructs, so that it differentiates
between the two constructs. You also found that the BAI has a low correlation with
the Beck Depression Inventory, which is meant to measure depression. This
establishes discriminant validity.
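The convergent/discriminant logic above can be sketched numerically: the instrument under validation should correlate highly with a same-construct measure and weakly with a different-construct measure. A minimal Python example with made-up score vectors standing in for BAI, HRS, and BDI results (all data here are hypothetical, purely for illustration):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five examinees on three instruments.
bai = [10, 25, 14, 30, 8]   # anxiety inventory under validation
hrs = [12, 27, 15, 28, 9]   # established anxiety measure (same construct)
bdi = [22, 9, 30, 11, 25]   # depression measure (different construct)

print(pearson(bai, hrs))  # high correlation -> evidence of convergent validity
print(pearson(bai, bdi))  # low/negative correlation -> evidence of discriminant validity
```

In practice these correlations would be reported in the test manual from large validation samples; the point here is only the pattern of high versus low coefficients.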
3. Criterion Validity- A test has criterion validity when it is shown to be effective in
estimating an examinee’s performance on some outcome measure. The variable of
primary interest is this outcome measure, called the criterion. For example, a college
entrance exam that is reasonably accurate in predicting the subsequent grade point
average of examinees would possess criterion-related validity.
a. Concurrent Validity- It indicates the extent to which test scores accurately estimate
an individual’s present position on the relevant criterion. For example, a personality
inventory would possess concurrent validity if diagnostic classifications derived from
it roughly matched the opinions of psychiatrists or clinical psychologists.
b. Predictive Validity- In this validation, test scores are used to estimate outcome
measures obtained at a later date. An example of predictive validity is a comparison
of scores on the SAT with first semester grade point average (GPA) in college; this
assesses the degree to which SAT scores are predictive of college performance.

4. Face Validity- A test has face validity if it looks valid to test users, examiners, and
especially the examinees. It is really a matter of social acceptability and not
technically a form of validity. For instance, if a test is prepared to measure whether
students can perform multiplication, and the people to whom it is shown all agree
that it looks like a good test of multiplication ability, then the test has face validity.

Factors influencing Validity

 Inappropriateness of the test items- Measuring understanding, thinking skills,
and other complex types of achievement with test forms that are appropriate only
for measuring factual knowledge will invalidate the results.
 Directions of the test items- Directions that do not clearly state how students
should respond to the items and record their answers will tend to lessen the
validity of the test items.
 Vocabulary and sentence structure- Vocabulary and sentence structures that do not
match the level of the students will result in the test measuring reading
comprehension or intelligence rather than what it intends to measure.
 Difficulty of the test items- When the test items are too easy or too difficult, they
cannot discriminate between the bright and the poor students. Thus, they lower the
validity of the test.
 Poorly constructed test items- Test items which unintentionally provide clues to the
answer will tend to measure students' alertness in detecting clues, and the
important aspects of student performance that the test is intended to measure will
be affected.
 Length of the test- A test should have a sufficient number of items to measure
what it is supposed to measure. If a test is too short to provide a representative
sample of the performance that is to be measured, validity will suffer accordingly.
 Arrangement of the test items- Test items should be arranged in increasing
difficulty. Difficult items early in the test may cause mental blocks and may take up
too much of the students' time, preventing them from reaching items they could
easily answer. Improper arrangement may therefore also affect validity by having a
detrimental effect on students' motivation.
 Pattern of the answers- A systematic pattern of correct answers (for example,
alternating true and false) enables students to guess the answers, and this will lower
the validity of the test.
 Ambiguity- Ambiguous statements in test items contribute to misinterpretation and
confusion. Ambiguity sometimes confuses the bright students more than the poor
students, causing the items to discriminate in a negative direction.

Reliability and its types


Reliability refers to the attribute of consistency in measurement. A measure is said to have a
high reliability if it produces similar results under consistent conditions. Reliability is best
viewed as a continuum ranging from minimal consistency of measurement to near-perfect
repeatability of results. Most psychological tests fall somewhere between these extremes.

1. Test-Retest Reliability- The most straightforward method for determining the
reliability of test scores is to administer the identical test twice to the same group of
representative subjects. If the test is perfectly reliable, each person’s second score
will be completely predictable from his or her first score. As long as the second
score is strongly correlated with the first score, the existence of practice, maturation,
or treatment effects doesn’t cast doubt on the reliability of the psychological test.
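As a sketch of the computation, the test-retest reliability coefficient is simply the correlation between the two administrations. The scores below are hypothetical, invented only to show the calculation:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for six examinees tested twice with the same instrument.
first_administration = [85, 92, 78, 88, 95, 70]
second_administration = [87, 90, 80, 85, 96, 73]

r_test_retest = pearson(first_administration, second_administration)
print(f"test-retest reliability: {r_test_retest:.2f}")
```

A coefficient near 1.0 indicates that examinees kept nearly the same relative standing across the two administrations, which is exactly what a highly reliable test should produce.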

2. Alternate Form Reliability- Here the test developers produce two forms of the same test.
These alternate forms are independently constructed to meet the same
specifications, often on an item-by-item basis. Thus, alternate forms of a test
incorporate similar content and cover the same range and level of item difficulty.
Alternate form reliability is derived by administering both forms to the same group
and correlating the two sets of scores. This method is similar to test-retest in that
both involve two test administrations to the same subjects with a time interval in
between. There is, however, a fundamental difference between the two approaches:
the alternate-forms methodology introduces item-sampling differences as an
additional source of error variance. Some test takers may do better or worse on one
form of the test; even though the two forms may be equally difficult on average,
some subjects may find one form quite a bit harder than the other. Alternate forms
also nearly double the cost of publishing the test.

3. Split-Half Reliability- These estimates are obtained by correlating the pairs of scores
obtained from equivalent halves of a test administered only once to a representative
sample of examinees. The logic is straightforward: if scores on two half tests from a
single test administration show a strong correlation, then scores on two whole tests
from two separate test administrations should also reveal a strong correlation. The
approach is useful when excessive cost would render it impractical to obtain a second
set of test scores from the same examinees.
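A common way to form the two halves is to total the odd-numbered and even-numbered items separately. Because the half-test correlation describes a test only half as long as the real one, it is conventionally stepped up with the Spearman-Brown formula, a standard correction not detailed in the text above. A minimal sketch with hypothetical 0/1 item scores:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

def split_half_reliability(item_scores):
    """Correlate odd- and even-item totals, then apply the Spearman-Brown
    correction to estimate the reliability of the full-length test."""
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson(odd_totals, even_totals)
    return (2 * r_half) / (1 + r_half)  # Spearman-Brown step-up

# Hypothetical right/wrong (1/0) scores: five examinees x eight items.
scores = [
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 0],
]
print(f"estimated full-test reliability: {split_half_reliability(scores):.2f}")
```

The odd/even split is only one of many possible splits; different splits can yield somewhat different estimates, which is a known limitation of the method.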
4. Interscorer Reliability- A sample of tests is independently scored by two or more
examiners, and the scores for pairs of examiners are then correlated. Test manuals
typically report the training and experience required of examiners and then list
representative interscorer correlation coefficients. This reliability estimate
supplements other reliability estimates but does not replace them. It is appropriate
for any test that involves subjectivity of scoring.

Reliability under special circumstances


Traditional approaches to estimating reliability may be misleading or inappropriate for some
applications.
1. Unstable characteristics- Some characteristics are presumed to change in
reaction to situational or physiological variables. Emotional reactivity as measured by
electrodermal (galvanic skin) response is a good example. Such a measure
fluctuates quickly in reaction to loud noises, underlying thought processes, and
stressful environmental events. Because the true amount of emotional reactivity
changes so quickly, test and retest must be nearly instantaneous in order to provide
an accurate index of reliability.

2. Speed and Power Tests- A speed test typically contains items of uniform and
generally simple levels of difficulty but has a restrictive time limit; if time permitted,
most subjects should be able to complete most or all of the items. An examinee’s
score on a speeded test therefore largely reflects speed of performance. A power
test, in contrast, allows enough time for test takers to attempt all items but is
constructed so that no test taker is able to obtain a perfect score. The reliability of a
speed test should be based on the test-retest method or on split-half reliability
computed from two separately timed half tests.

3. Restriction of Range- Test-retest reliability will be spuriously low if it is based on a
sample of homogeneous subjects for whom there is a restriction of range on the
characteristic being measured. For example, it would be inappropriate to estimate
the reliability of an intelligence test by administering it twice to a sample of college
students.
