CH 4 Reliability


Chapter 4

Psychometric Properties of a Test:


Reliability and Validity 
Reliability

• Reliability refers to the consistency of a measure


• A measure is said to have a high reliability if it produces similar
results under consistent conditions
• If the same result can be consistently achieved by using the
same methods under the same circumstances, the measurement
is considered reliable.
• A test is considered reliable if we get the same result repeatedly.
Types of reliability

Type of reliability    Measures the consistency of …
Test-retest            The same test over time
Interrater             The same test conducted by different people
Parallel forms         Different versions of a test that are designed to be equivalent
Internal consistency   The individual items of a test

Internal consistency
To calculate, administer the test once and then compute the reliability
index with the Kuder-Richardson Formula 20 (KR-20) or the Spearman-Brown
formula.
• KR-20 formula: used for dichotomous items (e.g., right-wrong answers)
and multiple choice
• Spearman-Brown formula (split-half reliability) or Cronbach's alpha:
used if you have a Likert scale or other types of items
Kuder-Richardson Formula 20 (KR-20)

• Used if each item has only one right answer

Formula:

rKR20 = (k / (k − 1)) × (1 − Σpq / σ²)

 rKR20 is the Kuder-Richardson Formula 20 reliability coefficient
 k is the total number of test items
 Σ indicates to sum
 p is the proportion of the test takers who pass an item (right answer)
 q is the proportion of test takers who fail an item (wrong answer)
 σ² is the variance of the entire test
KR-20
A teacher administered a 10-item math test to 15 students.
1.5+3 = ?
2.7+2= ?
3. 6+3= ?
4.9+1= ?
5.8+6= ?
6.7+5= ?
7.4+7= ?
8.9+2=?
9.8+4= ?
10.5+6= ?
The result of the test for the 15 students is given in the
following table.
The first column lists each student. In the remaining columns, a 1 marks
an item the student answered correctly and a 0 an item answered
incorrectly.

Student Math Problem

Name 1. 5+3 2. 7+2 3. 6+3 4. 9+1 5. 8+6 6. 7+5 7. 4+7 8. 9+2 9. 8+4 10. 5+6

Selam 1 1 1 1 1 1 1 1 1 1

Mulugeta 1 0 0 1 0 0 1 1 0 1

Linda 1 0 1 0 0 1 1 1 1 0

Lois 1 0 1 1 1 0 0 1 0 0

Ayuba 0 0 0 0 0 1 1 0 1 1

Andrea 0 1 1 1 1 1 1 1 1 1

Thomas 0 1 1 1 1 1 1 1 1 1

Anna 0 0 1 1 0 1 1 0 1 0

Amos 0 1 1 1 1 1 1 1 1 1

Martha 0 0 1 1 0 1 0 1 1 1

Sabina 0 0 1 1 0 0 0 0 0 1

Augustine 1 1 0 0 0 1 0 0 1 1

Priscilla 1 1 1 1 1 1 1 1 1 1

Tunde 0 1 1 1 0 0 0 0 1 0

Daniel 0 1 1 1 1 1 1 1 1 1
Solution

rKR20 = (k / (k − 1)) × (1 − Σpq / σ²)
k = 10

 The first value is k, the number of items.


 Next we need to calculate p for each item, the
proportion of the sample who answered each item
correctly.
Student Math Problem

Name 1. 5+3 2. 7+2 3. 6+3 4. 9+1 5. 8+6 6. 7+5 7. 4+7 8. 9+2 9. 8+4 10. 5+6

Selam 1 1 1 1 1 1 1 1 1 1

Mulugeta 1 0 0 1 0 0 1 1 0 1

Linda 1 0 1 0 0 1 1 1 1 0

Lois 1 0 1 1 1 0 0 1 0 0

Ayuba 0 0 0 0 0 1 1 0 1 1

Andrea 0 1 1 1 1 1 1 1 1 1

Thomas 0 1 1 1 1 1 1 1 1 1

Anna 0 0 1 1 0 1 1 0 1 0

Amos 0 1 1 1 1 1 1 1 1 1

Martha 0 0 1 1 0 1 0 1 1 1

Sabina 0 0 1 1 0 0 0 0 0 1

Augustine 1 1 0 0 0 1 0 0 1 1

Priscilla 1 1 1 1 1 1 1 1 1 1

Tunde 0 1 1 1 0 0 0 0 1 0

Daniel 0 1 1 1 1 1 1 1 1 1
Number of
1's 6 8 12 12 7 11 10 10 12 11
Proportion
Passed (p) 0.40 0.53 0.80 0.80 0.47 0.73 0.67 0.67 0.80 0.73

To calculate the proportion of the sample who answered each item correctly,
first count the number of 1's for the item; this gives the total number of
students who answered it correctly. Second, divide that count by the number
of students who took the test, 15 in this case.
• Next we need to calculate q for each item, the proportion
of the sample who answered each item incorrectly.
• Since students either passed or failed each item, the sum p
and q is 1 i.e p + q = 1.

Student Math Problem

Name 1. 5+3 2. 7+2 3. 6+3 4. 9+1 5. 8+6 6. 7+5 7. 4+7 8. 9+2 9. 8+4 10. 5+6

Number of 1's 6 8 12 12 7 11 10 10 12 11

Proportion Passed (p) 0.40 0.53 0.80 0.80 0.47 0.73 0.67 0.67 0.80 0.73

Proportion Failed (q) 0.60 0.47 0.20 0.20 0.53 0.27 0.33 0.33 0.20 0.27

The proportion who failed is calculated by the formula 1 − p, or 1 minus
the proportion who passed the item. You will get the same answer if you
count up the number of 0's for each item and then divide by 15.
• Now that we have p and q for each item, multiply p by q for
each item.
Student Math Problem

Name 1. 5+3 2. 7+2 3. 6+3 4. 9+1 5. 8+6 6. 7+5 7. 4+7 8. 9+2 9. 8+4 10. 5+6

Number of 1's 6 8 12 12 7 11 10 10 12 11

Proportion Passed (p) 0.40 0.53 0.80 0.80 0.47 0.73 0.67 0.67 0.80 0.73

Proportion Failed (q) 0.60 0.47 0.20 0.20 0.53 0.27 0.33 0.33 0.20 0.27

pxq 0.24 0.25 0.16 0.16 0.25 0.20 0.22 0.22 0.16 0.20

Once we have p x q for every item, we sum up these values.


.24 + .25 + .16 + … + .20 = 2.05
Σpq = 2.05
Finally, we have to calculate σ2, or the variance of the total test scores.
For each student, calculate their total exam score by counting the number
of 1’s
Student Math Problem

Name 1. 5+3 2. 7+2 3. 6+3 4. 9+1 5. 8+6 6. 7+5 7. 4+7 8. 9+2 9. 8+4 10. 5+6 Total Exam Score

selam 1 1 1 1 1 1 1 1 1 1 10

mulugeta 1 0 0 1 0 0 1 1 0 1 5

Linda 1 0 1 0 0 1 1 1 1 0 6

Lois 1 0 1 1 1 0 0 1 0 0 5

Ayuba 0 0 0 0 0 1 1 0 1 1 4

Andrea 0 1 1 1 1 1 1 1 1 1 9

Thomas 0 1 1 1 1 1 1 1 1 1 9

Anna 0 0 1 1 0 1 1 0 1 0 5

Amos 0 1 1 1 1 1 1 1 1 1 9

Martha 0 0 1 1 0 1 0 1 1 1 6

Sabina 0 0 1 1 0 0 0 0 0 1 3

Augustine 1 1 0 0 0 1 0 0 1 1 5

Priscilla 1 1 1 1 1 1 1 1 1 1 10

Tunde 0 1 1 1 0 0 0 0 1 0 4

Daniel 0 1 1 1 1 1 1 1 1 1 9
Calculation of variance
Student    Exam score    Average    Deviation    Deviation squared
Selam 10 6.6 3.4 11.56
Mulugeta 5 6.6 -1.6 2.56
Linda 6 6.6 -0.6 0.36
Lois 5 6.6 -1.6 2.56
Ayuba 4 6.6 -2.6 6.76
Andrea 9 6.6 2.4 5.76
Thomas 9 6.6 2.4 5.76
Anna 5 6.6 -1.6 2.56
Amos 9 6.6 2.4 5.76
Martha 6 6.6 -0.6 0.36
Sabina 3 6.6 -3.6 12.96
Augustine 5 6.6 -1.6 2.56
Priscilla 10 6.6 3.4 11.56
Tunde 4 6.6 -2.6 6.76
Daniel 9 6.6 2.4 5.76
  Total = 99        Total = 83.6
  Average = 6.6
  variance = (sum of squared deviations) / n
  variance = 83.6 / 15 = 5.57
rKR20 = (k / (k − 1)) × (1 − Σpq / σ²)

k = 10
Σpq = 2.05
σ² = 5.57

• Now that we know all of the values in the equation, we can


calculate rKR20.

rKR20 = (10 / (10 − 1)) × (1 − 2.05 / 5.57)
rKR20 = 1.11 × 0.63
rKR20 ≈ 0.70
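The whole KR-20 calculation can be reproduced in a few lines. The sketch below (plain Python, not part of the original slides) keys in the score table above and recomputes Σpq, σ², and rKR20 with unrounded arithmetic:

```python
# KR-20 for the 15-student, 10-item test above. Rows are students,
# columns are items; 1 = correct, 0 = incorrect.

scores = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # Selam
    [1, 0, 0, 1, 0, 0, 1, 1, 0, 1],  # Mulugeta
    [1, 0, 1, 0, 0, 1, 1, 1, 1, 0],  # Linda
    [1, 0, 1, 1, 1, 0, 0, 1, 0, 0],  # Lois
    [0, 0, 0, 0, 0, 1, 1, 0, 1, 1],  # Ayuba
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # Andrea
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # Thomas
    [0, 0, 1, 1, 0, 1, 1, 0, 1, 0],  # Anna
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # Amos
    [0, 0, 1, 1, 0, 1, 0, 1, 1, 1],  # Martha
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 1],  # Sabina
    [1, 1, 0, 0, 0, 1, 0, 0, 1, 1],  # Augustine
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # Priscilla
    [0, 1, 1, 1, 0, 0, 0, 0, 1, 0],  # Tunde
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # Daniel
]

n = len(scores)      # 15 students
k = len(scores[0])   # 10 items

# p = proportion passing each item; q = 1 - p, so p*q = p*(1 - p)
p = [sum(row[i] for row in scores) / n for i in range(k)]
sum_pq = sum(pi * (1 - pi) for pi in p)               # ≈ 2.05

# sigma^2 = population variance of the total scores
totals = [sum(row) for row in scores]
mean = sum(totals) / n                                # 6.6
variance = sum((t - mean) ** 2 for t in totals) / n   # ≈ 5.57

r_kr20 = (k / (k - 1)) * (1 - sum_pq / variance)
print(round(r_kr20, 2))  # 0.7
```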
Split-Half Reliability - Spearman Brown formula

• Used for a Likert scale or another measure that does not have just one
correct answer
• The preferable statistic to calculate the split-half reliability is
coefficient alpha (otherwise called Cronbach’s alpha).
• However, coefficient alpha is difficult to calculate by hand.
Hence we can use the Spearman Brown formula instead of alpha
coefficient because it is much easier to calculate.

Spearman-Brown Formula

rSB = 2rxy / (1 + rxy)

where rxy = the Pearson correlation between scores on the two half-tests.
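As a quick illustration (not from the slides), the step-up formula translates directly into a one-line Python function:

```python
def spearman_brown(r_xy):
    # Step up the correlation between two half-tests to an estimate
    # of the full-length test's reliability.
    return 2 * r_xy / (1 + r_xy)

print(round(spearman_brown(0.15), 2))  # 0.26
```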
Example: self-esteem scale
A 10-item scale that measures a participant’s self-esteem at a given point in time. It is
designed to measure what you are thinking at this moment.  
•NB: All items are answered using a 5-point scale (1= not at all, 2= a little bit, 3=
somewhat, 4= very much, 5= extremely).
1. I feel confident about my abilities
2. I am worried about whether I am regarded as a success or failure
3. I feel satisfied with the way my body looks right now.
4. I feel frustrated or rattled about my performance.
5. I feel that I am having trouble understanding things that I read.
6. I feel that others respect and admire me.
7. I am dissatisfied with my weight.
8. I feel self-conscious.
9. I feel as smart as others.
10. I feel displeased with myself
Assume that fourteen participants took the test.
Questionnaire Item Number
Students 1 2 3 4 5 6 7 8 9 10
1 5 4 1 4 3 4 5 4 5 4
2 5 3 5 3 5 3 4 2 4 4
3 5 3 5 3 5 3 4 2 4 4
4 5 3 3 3 4 3 4 4 5 3
5 5 3 3 3 4 3 4 4 5 3
6 5 5 1 2 3 3 3 3 3 1
7 5 4 3 3 4 4 4 3 4 3
8 5 5 3 4 5 3 4 4 4 3
9 5 3 2 3 5 5 4 5 5 4
10 5 3 3 4 4 4 4 4 4 4
11 4 4 3 4 5 4 3 4 4 4
12 4 3 1 3 4 3 3 3 5 3
13 5 3 2 4 3 5 3 3 4 3
14 3 2 1 3 3 2 3 4 4 4
• The first step is to split the questions into half. The recommended
procedure is to assign every other item to one half of the test.
– If you simply take the first half of the items, the participants may
have become tired by the end of the questionnaire, and the reliability
estimate will be artificially low.
1 Half 2 Half
S/No 1 2 3 4 5 Total 6 7 8 9 10 Total
1 5 4 1 4 3 17 4 5 4 5 4 22
2 5 3 5 3 5 21 3 4 2 4 4 17
3 5 3 5 3 5 21 3 4 2 4 4 17
4 5 3 3 3 4 18 3 4 4 5 3 19
5 5 3 3 3 4 18 3 4 4 5 3 19
6 5 5 1 2 3 16 3 3 3 3 1 13
7 5 4 3 3 4 19 4 4 3 4 3 18
8 5 5 3 4 5 22 3 4 4 4 3 18
9 5 3 2 3 5 18 5 4 5 5 4 23
10 5 3 3 4 4 19 4 4 4 4 4 20
11 4 4 3 4 5 20 4 3 4 4 4 19
12 4 3 1 3 4 15 3 3 3 5 3 17
13 5 3 2 4 3 17 5 3 3 4 3 18
14 3 2 1 3 3 12 2 3 4 4 4 17

The first half total was calculated by adding up the scores for the first
5 items; the second half total by adding up the scores for the last 5 items.
• Now that we have our two halves of the test, we have to calculate the
Pearson Product-Moment Correlation between them.

rxy = Σ(X − X̄)(Y − Ȳ) / √{[Σ(X − X̄)²] [Σ(Y − Ȳ)²]}

where,
X = one person's score on the first half of items
X̄ = the mean score on the first half of items
Y = one person's score on the second half of items
Ȳ = the mean score on the second half of items.
rxy = Σ(X − X̄)(Y − Ȳ) / √{[Σ(X − X̄)²] [Σ(Y − Ȳ)²]}

      1 Half  2 Half
S/No  Total   Total   X − X̄   Y − Ȳ
1     17      22      -1.1    3.6
2     21      17      2.9     -1.4
3     21      17      2.9     -1.4
4     18      19      -0.1    0.6
5     18      19      -0.1    0.6
6     16      13      -2.1    -5.4
7     19      18      0.9     -0.4
8     22      18      3.9     -0.4
9     18      23      -0.1    4.6
10    19      20      0.9     1.6
11    20      19      1.9     0.6
12    15      17      -3.1    -1.4
13    17      18      -1.1    -0.4
14    12      17      -6.1    -1.4
Mean  18.1    18.4

To get X − X̄, we take each person's first half total minus the average,
which is 18.1. For example, 21 − 18.1 = 2.9.
To get Y − Ȳ, we take each person's second half total minus the average,
which is 18.4. For example, 18 − 18.4 = −0.4.
Σ(X − X̄)(Y − Ȳ) = 12.66

Next we multiply (X − X̄) by (Y − Ȳ) for each person. After we have
multiplied, we sum up the products.

      1 Half  2 Half
S/No  Total   Total   X − X̄   Y − Ȳ   (X − X̄)(Y − Ȳ)
1     17      22      -1.1    3.6     -3.96
2     21      17      2.9     -1.4    -4.06
3     21      17      2.9     -1.4    -4.06
4     18      19      -0.1    0.6     -0.06
5     18      19      -0.1    0.6     -0.06
6     16      13      -2.1    -5.4    11.34
7     19      18      0.9     -0.4    -0.36
8     22      18      3.9     -0.4    -1.56
9     18      23      -0.1    4.6     -0.46
10    19      20      0.9     1.6     1.44
11    20      19      1.9     0.6     1.14
12    15      17      -3.1    -1.4    4.34
13    17      18      -1.1    -0.4    0.44
14    12      17      -6.1    -1.4    8.54
Mean  18.1    18.4            Sum     12.66
rxy = Σ(X − X̄)(Y − Ȳ) / √{[Σ(X − X̄)²] [Σ(Y − Ȳ)²]}

√{[Σ(X − X̄)²] [Σ(Y − Ȳ)²]} = 82.72


To calculate the denominator, we square (X − X̄) and (Y − Ȳ) for each person.

      1 Half  2 Half
S/No  Total   Total   X − X̄   Y − Ȳ   (X − X̄)²   (Y − Ȳ)²
1     17      22      -1.1    3.6     1.21       12.96
2     21      17      2.9     -1.4    8.41       1.96
3     21      17      2.9     -1.4    8.41       1.96
4     18      19      -0.1    0.6     0.01       0.36
5     18      19      -0.1    0.6     0.01       0.36
6     16      13      -2.1    -5.4    4.41       29.16
7     19      18      0.9     -0.4    0.81       0.16
8     22      18      3.9     -0.4    15.21      0.16
9     18      23      -0.1    4.6     0.01       21.16
10    19      20      0.9     1.6     0.81       2.56
11    20      19      1.9     0.6     3.61       0.36
12    15      17      -3.1    -1.4    9.61       1.96
13    17      18      -1.1    -0.4    1.21       0.16
14    12      17      -6.1    -1.4    37.21      1.96
Sum                                   90.94      75.24

Next we sum the squares across the participants, then multiply the sums:
90.94 × 75.24 = 6842.33. Finally, √6842.33 = 82.72.
rxy = Σ(X − X̄)(Y − Ȳ) / √{[Σ(X − X̄)²] [Σ(Y − Ȳ)²]}

Σ(X − X̄)(Y − Ȳ) = 12.66
√{[Σ(X − X̄)²] [Σ(Y − Ȳ)²]} = 82.72

• Now that we have calculated the numerator and denominator, we can
calculate rxy:

rxy = 12.66 / 82.72
rxy = 0.15
rSB = 2rxy / (1 + rxy)

• Now that we have calculated the Pearson correlation between our two
halves (rxy = 0.15), we substitute this value for rxy and calculate rSB:

rSB = (2 × 0.15) / (1 + 0.15)
rSB = 0.3 / 1.15
rSB = 0.26

The measure did not have good reliability in my sample!
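The split-half calculation can be checked with a short script. The sketch below (plain Python, not from the slides) uses the first-5/last-5 split from the tables above; carrying unrounded values through gives rxy ≈ .153 and rSB ≈ .265, matching the SPSS output rather than the hand-rounded .15 and .26.

```python
# Split-half reliability with the Spearman-Brown step-up for the
# 14-participant self-esteem data above.
import math

responses = [
    [5, 4, 1, 4, 3, 4, 5, 4, 5, 4],
    [5, 3, 5, 3, 5, 3, 4, 2, 4, 4],
    [5, 3, 5, 3, 5, 3, 4, 2, 4, 4],
    [5, 3, 3, 3, 4, 3, 4, 4, 5, 3],
    [5, 3, 3, 3, 4, 3, 4, 4, 5, 3],
    [5, 5, 1, 2, 3, 3, 3, 3, 3, 1],
    [5, 4, 3, 3, 4, 4, 4, 3, 4, 3],
    [5, 5, 3, 4, 5, 3, 4, 4, 4, 3],
    [5, 3, 2, 3, 5, 5, 4, 5, 5, 4],
    [5, 3, 3, 4, 4, 4, 4, 4, 4, 4],
    [4, 4, 3, 4, 5, 4, 3, 4, 4, 4],
    [4, 3, 1, 3, 4, 3, 3, 3, 5, 3],
    [5, 3, 2, 4, 3, 5, 3, 3, 4, 3],
    [3, 2, 1, 3, 3, 2, 3, 4, 4, 4],
]

x = [sum(row[:5]) for row in responses]   # first-half totals
y = [sum(row[5:]) for row in responses]   # second-half totals
n = len(responses)
mx, my = sum(x) / n, sum(y) / n           # ≈ 18.1 and 18.4

# Pearson correlation between the two halves
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                sum((yi - my) ** 2 for yi in y))
r_xy = num / den                   # half-test correlation, ≈ .153

r_sb = 2 * r_xy / (1 + r_xy)       # Spearman-Brown estimate, ≈ .265
```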
The steps for conducting Spearman-Brown in SPSS
1. The data is entered in a within-subjects fashion.
2. Click Analyze.
3. Drag the cursor over the Scale drop-down menu.
4. Click on Reliability Analysis.
5. Click on the first item to highlight it.
6. Click on the arrow to move the item into the Items: box.
7. Repeat Steps 5 and 6 until all of the survey items are in
the Items: box.
8. Click on the Statistics button.
9. In the Descriptive for table, click on the Item, Scale, and Scale if
item deleted boxes to select them.
10. In the Model drop-down menu, select Split-half.
11. Click Continue.
12. Click OK. 
SPSS Result

Reliability Statistics
Correlation Between Forms            .153
Spearman-Brown Coefficient
  Equal Length                       .265
  Unequal Length                     .265
Guttman Split-Half Coefficient       .264
Total N of Items                     10

Test-Retest Reliability

• The same test is administered on two occasions to the same individuals
under the same conditions.
• This yields two scores for each person and the correlation
between these two sets of scores is the test-retest reliability
coefficient.
• If the test is reliable, there will be a high positive
association between the scores.
• Assumes there is no change in the underlying trait between
time 1 and time 2.
EXAMPLE
The following self-esteem test scores of 9 students were obtained in two
successive test administrations. Determine the test-retest reliability
based on the given data.
Student ID Result of 1st test Result of 2nd test
1 75 74
2 50 53
3 93 94
4 80 79
5 67 69
6 88 89
7 56 54
8 71 72
9 66 65
How to compute the simple correlation coefficient (r)

r = [Σxy − (Σx)(Σy)/n] / √{[Σx² − (Σx)²/n] × [Σy² − (Σy)²/n]}
Solution
Test 1st Test 2nd
Student ID (x) (y) XY X^2 Y^2
1 75 74 5550 5625 5476
2 50 53 2650 2500 2809
3 93 94 8742 8649 8836
4 80 79 6320 6400 6241
5 67 69 4623 4489 4761
6 88 89 7832 7744 7921
7 56 54 3024 3136 2916
8 71 72 5112 5041 5184
9 66 65 4290 4356 4225
 
Sum 646 649 48143 47940 48369
r = [Σxy − (Σx)(Σy)/n] / √{[Σx² − (Σx)²/n] × [Σy² − (Σy)²/n]}

r = [48143 − (646)(649)/9] / √{[47940 − (646)²/9] × [48369 − (649)²/9]}

r = 1559.22 / 1570.22

r = .993
Steps for conducting test-retest reliability in SPSS
1.The data is entered in a within-subjects fashion.
2. Click Analyze.
3. Drag the cursor over the Correlate drop-down menu.
4.Click on Bivariate.
5.Click on the baseline observation, pre-test administration, or survey
score to highlight it.
6.Click on the arrow to move the variable into the Variables: box.
7.Click on the second observation, post-test administration, or survey
score to highlight it
8. Click on the arrow to move the variable into the Variables: box.
9.Click OK.
Spss output

 If the p-value is LESS THAN .05, and the Pearson correlation


coefficient is above 0.7, then researchers have evidence of test-retest
reliability.
 As can be seen from the SPSS result output, the correlation r is 0.993
(p < .0001), which is very high! This indicates that the test is highly
reliable.
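The hand calculation above can be reproduced with the computational formula directly. A minimal sketch in plain Python (not from the slides), using the nine score pairs from the example:

```python
# Test-retest reliability: Pearson r via the computational formula.
import math

x = [75, 50, 93, 80, 67, 88, 56, 71, 66]  # 1st administration
y = [74, 53, 94, 79, 69, 89, 54, 72, 65]  # 2nd administration
n = len(x)

sum_x, sum_y = sum(x), sum(y)                   # 646 and 649
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 48143
sum_x2 = sum(xi ** 2 for xi in x)               # 47940
sum_y2 = sum(yi ** 2 for yi in y)               # 48369

num = sum_xy - sum_x * sum_y / n
den = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = num / den
print(round(r, 3))  # 0.993
```

The same Pearson computation serves for parallel-forms and inter-rater reliability; only the two score columns change.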
Parallel-Forms Reliability
 Develop two equivalent forms of the test.
 The tests should have same mean and variance.
 Administer both forms to the same people
 Compute the correlation between paired scores.
 That correlation is the reliability estimate of each
form
These are the two forms of the self-esteem test (Likert scale)

Form one
1. I feel confident about my abilities.
2. I am worried about whether I am regarded as a success or failure.
3. I feel satisfied with the way my body looks right now.
4. I feel frustrated or rattled about my performance.
5. I feel that I am having trouble understanding things that I read.
6. I feel that others respect and admire me.
7. I am dissatisfied with my weight.
8. I feel self-conscious.
9. I feel as smart as others.
10. I feel displeased with myself.

Form two
1. I think that overall, people find me boring to talk to.
2. I mess up everything I touch.
3. I feel devastated when someone criticizes me.
4. If someone ever falls in love with me, I better do my best to prove
myself worthy, because it may never happen again.
5. I could disappear from the surface of the earth and nobody would
notice or care.
6. I feel as though I let those I care about down.
7. I will never amount to anything or anyone significant.
8. When someone criticizes me, I can't help but feel that I really am
incompetent.
9. I have what it takes to socialize with other people.
10. I think I am a failure.
Example two

 Total scores for English, forms A and B, are listed for each participant,
as are total scores for math, forms A and B.
• Calculate the Pearson product-moment correlation between English A and
English B. This is the parallel-forms reliability coefficient for English.
• Calculate the Pearson product-moment correlation between math A and
math B. This is the parallel-forms reliability coefficient for math.
Inter-Rater Reliability 

• Is used to calculate how consistent two "raters" are when they


score the same test.
• The purpose of inter-rater reliability is not to correlate test scores,
as the previous three types of reliability do. Instead, the purpose is to
determine how consistent the raters are in their marks.
• Therefore, give the exact same tests to two raters, meaning that the
same test will be marked by two different people.
Example

To calculate: correlate the two markings from the different


raters.
Validity

 Denotes the extent to which an instrument is measuring


what it is supposed to measure.
 Refers to the degree to which a measuring instrument or
test measures what is to be measured.
 Validity refers to how accurately a method measures what
it is intended to measure. If research has high validity,
that means it produces results that correspond to real
properties, characteristics, and variations in the physical
or social world.
What can be said of the validity of the following?

 A spelling test with the following item: 2 + 5 = __
 A thermometer used to measure volume
 A math test with the following item: 2 + 5, used to test the math
ability of high school students
 The JSI used to measure job satisfaction
Linking reliability and validity

 A measure can be reliable but not valid, if it is measuring


something very consistently but is consistently measuring
the wrong construct.
 Likewise, a measure can be valid but not reliable if it is
measuring the right construct, but not doing so in a
consistent manner.
 Hence, reliability and validity are both needed to assure
adequate measurement of the constructs of interest.

Consider the African Geography quiz below:
1. The lake known by the nickname "golden lake" is:
  a. Tana  b. Ziway  c. Langano
2. The most populated Ethiopian city is:
  a. Hawassa  b. Adama  c. AA
3. The largest region in Ethiopia is:
  a. Oromia  b. Tigray  c. Amhara
4. The highest mountain in Ethiopia is:
  a. Mt. Ras Dashen  b. Mt. Tulu Dimtu  c. Mt. Batu
5. The largest river in Ethiopia is:
  a. Abay  b. Awash  c. Wabe Shebelle

Is it reliable? If you don't know the answers, are you going to suddenly
know them tomorrow or next week? Probably not. We'll say it's reliable.

Is it valid? This doesn't look like a measure of African Geography; it
focuses on Ethiopia. It's not a valid measure of African Geography.
Types of validity

Face validity
Content validity
Criterion-related validity
•Predictive validity and
•Concurrent validity
Construct validity
•Convergent
•Divergent validation
Face Validity

 Face validity, also called logical validity, is a simple form


of validity where you apply a subjective assessment
of whether or not your test measures what it is supposed
to measure
• Face validity is simply whether the test appears (at face
value) to measure what it claims to. This is the least
sophisticated measure of validity.
• Tests wherein the purpose is clear, even to naïve
respondents, are said to have high face validity.
E.g., using a meter stick to measure weight would lack face validity.
Content validity

 Is the degree to which the content of a measure truly


reflects the full domain of the construct for which it is
being used, no more and no less.
 Content validity can be evaluated only by those who have a
deep understanding of the construct in question.
 Occurs when the experiment provides adequate coverage
of the subject being studied.
e.g., a comprehensive math achievement test would lack
content validity if it only had questions about one aspect of
math (e.g., algebra)
Criterion-related validity
• Degree to which test scores can predict specific criterion
variable
• Examines the ability of the measure to predict a variable
that is designated as a criteria
• The success of the measure in predicting some behaviors
• The key to validity is the empirical association between test
scores and scores on the relevant criterion variable, such as
"job performance
• Concurrent validity
• Predictive validity
Predictive Validity
 This measures the extent to which a future level of a
variable can be predicted from a current measurement.
 Predictive validity is the degree to which a measure
successfully predicts a future outcome that it is
theoretically expected to predict. 
 This is appropriate for tests designed to assess a person's
future status on a criterion.
For example,
 A political poll intends to measure future voting intent
 Standardized entrance test scores correctly predict the
academic success in university ( As measured by GPA)
Concurrent Validity
 Occurs when the criterion measures are obtained at the
same time as the test scores
 This indicates the extent to which the test scores
accurately estimate an individual’s current state with
regard to the criterion.
 Examines how well one measure relates to other concrete
criterion that is presumed to occur simultaneously
 For example, on a test that measures levels of depression,
the test would be said to have concurrent validity if it
measured the current levels of depression experienced by
the test taker.
Construct validity

 Test has construct validity if it accurately measures a


theoretical, non-observable construct or trait
 Construct validity occurs when the theoretical
constructs of cause and effect accurately represent the
real-world situations they are intended to model.
Two methods of establishing a test’s construct validity are
• Convergent
• Divergent validation
Convergent validity

• Convergent validity refers to the degree to which a


measure is correlated with other measures that it is
theoretically predicted to correlate with.
• E.g., theoretically, happiness and motivation are positively related
to self-esteem, while depression is related to self-esteem in a
negative direction.
• Thus, if someone takes a self-esteem test and the score is
positively correlated with measures of happiness and
motivation and negatively correlated with measures of
depression, then we have obtained convergent evidence.
Discriminant/divergent validity

 Is the degree to which test scores are uncorrelated with


tests of unrelated constructs.
 E.g., theoretically, intelligence is unrelated to self-esteem.
Thus, if someone takes a self-esteem test and the score
is uncorrelated (or only weakly correlated) with
measures of intelligence, we can say the self-esteem test
has discriminant validity.
