APM Tech Manual
Pearson, TalentLens, Raven's Progressive Matrices and the logo are trademarks in
the US and/or other countries, of Pearson Education, Inc., or its affiliate(s).
Introduction ........................................................................................ 5
References ....................................................................................... 26
List of Figures
Figure 1. Item difficulties for Raven’s APM fixed form ........................... 7
List of Tables
Table 1. Item difficulties for Raven’s APM fixed form ............................. 8
Table 2. Item selection structure ...................................................... 11
Table 3. Descriptive statistics based on APM raw total scores, reliability,
and SEM for country-specific standardization samples......................... 15
Table 4. Reliability and descriptive statistics based on
Raven’s APM Item Bank theta scores (N = 466) ................................. 16
Table 5. Descriptive statistics of APM raw total scores and
comparisons by sex—international sample......................................... 18
Copyright © 2015 Pearson, Inc. All rights reserved.
Table 6. Descriptive statistics of APM raw total scores and
comparisons by sex—Australia/New Zealand, India, and US ................ 18
Table 7. Descriptive statistics of APM raw total scores and
age group comparisons—international sample ................................... 19
Table 8. Descriptive statistics of APM raw total scores and
age group comparisons—Australia/NZ ............................................... 19
Table 9. Descriptive statistics of APM raw total scores and
age group comparisons—India ......................................................... 19
Table 10. Descriptive statistics of APM raw total scores and
age group comparisons—US ............................................................ 20
Table 11. Comparisons by education level—international sample.......... 21
Table 12. Descriptive statistics of APM raw total scores and
education group comparisons—Australia/NZ ...................................... 21
Table 13. Descriptive statistics of APM raw total scores and
education group comparisons—India ................................................ 21
Table 14. Descriptive statistics of APM raw total scores and
education group comparisons—US .................................................... 22
Table 15. Descriptive statistics of APM raw total scores and
occupational group comparisons ...................................................... 22
Introduction
The Raven’s Progressive Matrices have been used in many countries for decades as a
measure of problem-solving and reasoning ability (Raven, Raven, & Court, 1998a). The
various editions of the Raven’s Progressive Matrices (Standard, Advanced, and Coloured)
have been studied in more than 48 countries on samples totaling more than 240,000
participants (Brouwers, Van de Vijver, & Van Hemert, 2009; Wongupparaj, Kumari, &
Morris, 2015).
This manual, Part 2 of the Raven’s International Manual, describes the development
and standardization of the Raven’s Advanced Progressive Matrices (APM) fixed form (also
known as APM short form or APM 2.0) and the subsequent item-banked version designed
for use within the domain of work and organizational psychology. The adaptation and
standardization process for the language versions across countries is outlined. Information on
group differences regarding age, sex, and ethnicity is also presented.
The APM fixed form, consisting of 23 items, was developed to provide customers with a
shorter assessment that maintains the nature of the construct being measured and the
psychometric integrity of the assessment.
Data on the English version were initially collected in the United States from May
2006 through October 2007. In total, data from 929 applicants and employees were
collected, representing a number of positions across various occupations. Information
regarding the respondents’ current occupation and organizational level is presented in
Appendix A, Table A.1.
Classical Test Theory (CTT) and Item Response Theory (IRT) methodologies were
used in the APM data analysis for item selection. Specifically, for each of the 36 items in
the previous APM version, the CTT item difficulty index (p value), the IRT item
discrimination (a) and item difficulty (b) parameters, and the corrected item–total
correlation were examined in the selection process. The analyses suggested two main
revisions, driven by the underlying aim of the APM to become increasingly difficult as the
items progress. First, less discriminating items were dropped when selecting items for
the 23-item fixed form version, so that less efficient items were excluded. Second,
because the test proved operationally more difficult than intended, two of the initial
items were replaced with two less difficult items so that respondents would experience
the first five items as easier than the subsequent items.
Item difficulties of the fixed form are presented in Figure 1 and Table 1. The item
difficulties were calculated using a sample of 663 diverse respondents (see
Appendix A, Table A.2 for additional sample descriptive information). As shown in Figure
1 and Table 1, the items increase in difficulty (e.g., item 1 = .94, meaning 94% of
respondents answer the item correctly) through item 23, which yields an item difficulty of
.09 (only 9% of respondents answer the item correctly). This indicates that the fixed
form version of the Raven’s APM is neither too easy nor too difficult, and that it is able to
measure cognitive ability levels across a continuum. The consistency in range and levels
of difficulty was examined further by drawing a subsample of Executives, Directors, and
Managers (N = 264). As shown in Figure 1 and Table 1, the item difficulties for this group
do not differ appreciably from the item difficulties for the entire sample.
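For illustration, the item difficulty (p value) used here is simply the proportion of respondents answering an item correctly; it can be sketched as follows (hypothetical response data, not the standardization sample):

```python
def item_difficulties(responses):
    """Compute the p value (proportion correct) for each item.

    responses: list of per-respondent item score lists, 1 = correct, 0 = incorrect.
    Returns one p value per item column.
    """
    n = len(responses)
    n_items = len(responses[0])
    return [sum(r[i] for r in responses) / n for i in range(n_items)]

# Hypothetical data: 4 respondents x 3 items of increasing difficulty.
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
print(item_difficulties(data))  # -> [1.0, 0.5, 0.25]
```

A p value near 1.0 indicates an easy item (almost everyone passes), while a value near 0 indicates a difficult one, matching the .94 to .09 progression reported above.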
Figure 1. Item difficulties for Raven’s APM fixed form
Table 1. Item difficulties for Raven’s APM fixed form
International Adaptation
The U.S. standardized 23-item fixed form version was translated, adapted, and
standardized in a number of countries and their respective languages. To ensure
consistent measurement of the construct across language versions and countries, each
country-specific adaptation followed a uniform process. Instructions were translated from
U.S. English into the target language either by an independent translator (the
Netherlands) or by test development experts at the local Pearson TalentLens office.
Then, local test development experts reviewed the translation—including the original U.S.
English version adapted for other English-speaking countries—and further adapted and
refined the final translation as needed. The translated online version was then
administered to manager-level respondents across various industries.
The majority of respondents completed the APM under timed (40 minutes) and
proctored (i.e., supervised) conditions between June 2008 and March
2011. The only exception concerned respondents from Singapore, who completed the
test between August 2013 and November 2014. Testing period, language version used,
descriptive statistics, and internal consistency reliability estimates for all country-specific
standardization samples are presented in Table 3.
Development of the Item-Banked Version
Due to the popularity of the fixed form, combined with the need for unsupervised
administration, development of an item-banked version was initiated in January 2014.
The aim was to create an item bank of matrices containing items with equivalent
characteristics that could be randomly selected to provide unique item sets for each
administration.
The initial step of the project was conducted in collaboration with the Psychometrics
Centre at the University of Cambridge, and entailed the creation of 230 new Raven’s APM
items. The goal was to retain four parallel items for each of the 23 original operational
items, for a final bank of 92 items.
A total of 1720 participants were recruited worldwide via the internet. Participants
answered the items online with English instructions via the Concerto platform, hosted by
the Psychometrics Centre in Cambridge. Appendix D provides demographic information on
the age, sex, ethnicity, and educational level of the sample.
Each participant was administered a set of 36 items: 12 operational items and 24
new items. The 12 operational items were selected from the full set of 23 original
operational items. Original items were selected randomly within pairs of items.
For example, original operational items 1 and 2 constituted a pair, and either item 1 or
item 2 was randomly selected for administration. The structure of pairs and the logic of
selection are described in Table 2.
Table 2. Item selection structure
The 24 new items were pulled from the parallel items: for each of the 12 selected
original items, 2 of its 4 parallel items were randomly selected (24 in total). After item
selection according to this structure, the order of the 36 selected items was randomized
before administration to participants.
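The trial-form assembly described above can be sketched as follows. This is a simplified illustration only: the exact pair structure is given in Table 2 and is not reproduced here, so the assumption that items 1–22 form 11 pairs and item 23 is always administered (giving 12 operational selections) is hypothetical.

```python
import random

def assemble_form(parallels_per_item=4, rng=None):
    """Assemble one 36-item trial form.

    Assumed (illustrative) structure: items 1-22 form 11 pairs and item 23
    is always included, giving 12 operational items; for each selected
    original, 2 of its 4 parallel items are drawn (24 new items); the 36
    items are then shuffled before administration.
    """
    rng = rng or random.Random()
    # One original from each pair (1,2), (3,4), ..., (21,22), plus item 23.
    operational = [rng.choice((i, i + 1)) for i in range(1, 22, 2)] + [23]
    # Two of the four parallel items for each selected original.
    new_items = []
    for orig in operational:
        parallels = [f"{orig}p{k}" for k in range(1, parallels_per_item + 1)]
        new_items.extend(rng.sample(parallels, 2))
    form = [str(i) for i in operational] + new_items
    rng.shuffle(form)  # randomize administration order
    return form

form = assemble_form(rng=random.Random(42))
print(len(form))  # -> 36
```

Because both the operational and the parallel selections are random, two administrations rarely share an identical item set.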
Performance on the original operational items was then used to calibrate the new
items, obtain item parameters, and ensure item parallelism, reliability, and validity. For
each of the original 23 Raven’s APM items, the four most statistically equivalent items
were chosen for final inclusion in the new item bank (currently standing at 92 items).
Statistical examination of the trial data (N = 1720) found the original 23 operational
items to correlate at least .97 with the new parallel items, suggesting that the new and
original items are highly equivalent. Respondents completing the online Raven’s APM test
are administered one item from each set of four available parallel items. This
randomization reduces the chance that any two administrations will contain an identical
set of items, thereby protecting the integrity of the assessment.
Scoring
Each item in the bank is given a precise difficulty value on a finely
incremented scale ranging from –3.000 (very easy) to +3.000 (very difficult). This
value is based on data on the number of test takers answering the item correctly
and allows the precise difficulty of each item to be examined. Items are
also coded in terms of item discrimination, the extent to which an item can differentiate
between high and low scorers on the test. This value also lies on a finely incremented
scale, ranging from around .300 (reasonable levels of discrimination) to 3.000 (high
levels of discrimination).
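The difficulty (b) and discrimination (a) values enter scoring through an item response function. A minimal sketch under the two-parameter logistic (2PL) model is shown below; the exact model and scaling constant used operationally are assumptions, not documented here.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that a test taker of
    ability theta answers an item with discrimination a and
    difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A difficult, discriminating item (a = 2.0, b = 1.5) is rarely passed
# at average ability but usually passed at high ability.
print(round(p_correct(0.0, 2.0, 1.5), 3))  # -> 0.047
print(round(p_correct(2.5, 2.0, 1.5), 3))  # -> 0.881
```

Higher discrimination steepens the curve around b, which is what makes such items useful for separating high and low scorers.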
Traditional scoring, where a raw score is calculated from the number of correct
items, does not take into account the difficulty or discrimination values of the items
presented, which vary slightly from test to test. To solve this problem, a scoring
algorithm is used that takes into account the difficulty and discrimination of each
item. Test scores are presented as ‘theta scores’, which range from –4.000 (low ability)
to +4.000 (high ability). Theta scores can be treated the same as classic raw scores
when computing statistics such as differences between candidates. Because theta is not
an easy-to-interpret scale, theta scores are converted to percentiles for the purpose of
reporting results. Theta scores are available via the data report function on the online
test platform.
As mentioned previously, theta scores take into account the difficulty level of the
items presented to test takers. In practice, this means that a test taker who answers
more of the harder items correctly will achieve a higher score than someone correctly
answering the same number of easier items. Answering a number of easier items
incorrectly will negatively impact a score, whilst answering a number of harder items
correctly will have a positive impact.
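This behavior can be illustrated with maximum-likelihood estimation of theta under the 2PL model. The sketch below is a simplified illustration on hypothetical item parameters; the operational scoring algorithm, priors, and bounds are not documented here.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=8001):
    """Grid-search maximum-likelihood theta on [-4, 4].

    items: list of (a, b) parameter pairs; responses: 1/0 per item.
    """
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for (a, b), u in zip(items, responses):
            p = p_correct(theta, a, b)
            ll += math.log(p) if u else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Hypothetical item parameters (a, b), two easy and two hard items.
items = [(1.2, -1.0), (1.0, 0.0), (1.5, 1.0), (1.8, 2.0)]
hard_right = estimate_theta(items, [0, 0, 1, 1])  # correct on the hard pair
easy_right = estimate_theta(items, [1, 1, 0, 0])  # correct on the easy pair
print(hard_right > easy_right)  # -> True: same raw score, different theta
```

Both patterns have a raw score of 2, yet the pattern with the harder items correct yields the higher theta, which is exactly the property described above.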
Evidence of Reliability
Reliability refers to the consistency of measurements when the measurement procedure
is repeated on a population of individuals or groups an infinite number of times; that is,
the extent to which two people of the same ability, or the same person tested on
different occasions, will receive the same score (Anastasi & Urbina, 1997). This
characteristic is important because the usefulness of behavioral measurements
presupposes that individuals and groups exhibit some degree of stability in their
behavior.
Even under highly standardized and controlled conditions, however, successive samples
of behavior from the same person are rarely identical in all pertinent respects. An
individual’s performances and responses to sets of test questions inevitably vary in
quality or character from one occasion to another because the examinee may try harder,
make luckier guesses, be more alert, feel less anxious, or enjoy better health on one
occasion than another. Some individuals exhibit less variation in their scores than
others, but no examinee is completely consistent. Because of this variation, a
respondent’s obtained score and the average score of a group will always contain at
least a small amount of measurement error.
Classical Test Theory (CTT) posits that a test score is an estimate of an individual’s
hypothetical true score, the score an individual would receive if the test were perfectly
reliable. In actual practice, administrators do not have the luxury of administering a test
an infinite number of times, so some measurement error is to be expected. A reliable test
has a relatively small measurement error.
The internal consistency reliability estimate (split half) for the total score of the 23-
item fixed form version of the APM was r = .85 in the U.S. standardization sample of N =
929. When tests are used in employment contexts, reliabilities above r = .89 are
generally considered “excellent”, r = .80–.89 “good”, r = .70–.79 “adequate”, and
below r = .70 “may have limited applicability” (U.S. Department of Labor, 1999, p. 3-3,
for guidelines on interpreting reliability coefficients). A reliability coefficient alone,
however, provides limited information about the implications of measurement error for
the interpretation of test scores.
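A split-half estimate of the kind reported above is typically obtained by correlating odd- and even-item half scores and applying the Spearman-Brown correction. A generic sketch on hypothetical data (the actual split used for the APM is not documented here):

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores, then apply the
    Spearman-Brown correction to estimate full-length reliability.

    responses: list of per-respondent 0/1 item score lists.
    """
    odd = [sum(r[0::2]) for r in responses]
    even = [sum(r[1::2]) for r in responses]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown step-up

# Hypothetical data: 5 respondents x 6 items.
data = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(data), 2))  # -> 0.97
```

The correction compensates for the fact that each half is only half the test length, since shorter tests are less reliable.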
Evidence of the equivalence between the 36-item and 23-item versions is
further supported by the fact that the reliability of the two assessments was virtually the
same; the internal consistency reliability of the 36-item version was r = .83. In addition, the internal
consistency reliability of the shorter version was essentially the same (r = .82) when
calculated using the independent sample of N = 663 respondents.
The SEM is the standard deviation (SD) of the measurement error distribution and
it is used to calculate confidence intervals. The confidence interval is a score range that,
at a specified level of probability, includes the respondent’s hypothetical “true” score that
represents the respondent’s actual ability. Because the true score is a hypothetical value
that can never be obtained—because all measurement, including testing, always involves
some measurement error—any obtained score is considered an estimate of the
respondent’s “true” score. Approximately 68% of the time, the observed score will be
within +1.0 and –1.0 SEM of the “true” score; 95% of the time, the observed score will
be within +1.96 and –1.96 SEM of the “true” score. The SEM (68% and 95%) for all
country-specific standardization samples is presented in Table 3.
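The SEM and confidence interval computations follow directly from the reliability estimate via SEM = SD × sqrt(1 − r). The sketch below uses the US standardization values from Table 3 (SD = 4.1, r split = .81); the result agrees with the tabled SEM values to within rounding of the published SD and r.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Score band of +/- z SEM around an observed score."""
    half_width = z * sem(sd, reliability)
    return observed - half_width, observed + half_width

# US standardization sample values from Table 3: SD = 4.1, r_split = .81.
s = sem(4.1, 0.81)
lo, hi = confidence_interval(14, 4.1, 0.81)
print(round(s, 2))   # close to the tabled 68% value of 1.80
print(round(lo, 1), round(hi, 1))  # -> 10.5 17.5
```

With z = 1.0 the band is a 68% interval; with z = 1.96 it is a 95% interval, matching the two SEM columns of Table 3.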
Table 3. Descriptive statistics based on APM raw total scores, reliability, and SEM for country-specific standardization samples

Country                Period               Language                  N    Mean  SD   Min  Max  Skew   Kurtosis  r(alpha)  r(split)  SEMsplit 68%  SEMsplit 95%
Australia/New Zealand  Nov 2009–Sept 2010   English                   128  12.0  4.2  3    23    0.02  –0.39     .77       .78       1.96          3.83
France                 Nov 2009–Mar 2010    French                    106  14.3  4.1  3    22   –0.34  –0.09     .74       .79       1.87          3.67
India                  Feb–April 2010       English                   100   9.5  4.2  2    19    0.19  –0.80     .79       .82       1.79          3.51
Netherlands            Sept–Oct 2010        Dutch                     103  13.0  4.5  2    22   –0.10  –0.64     .81       .83       1.87          3.67
UK                     June 2009–Mar 2010   English                   101  12.4  4.7  4    23    0.04  –0.71     .83       .85       1.83          3.58
US                     Apr–Aug 2008         English                   175  12.2  4.1  2    23   –0.03  –0.13     .77       .81       1.80          3.54
Singapore              Aug 2013–Nov 2014    English                   229  15.0  4.7  1    23   –0.50   0.21     .84       .70       2.56          5.02
Sweden                 Mar–Apr 2011         Swedish                   105   9.7  4.4  0    20    0.16  –0.14     .79       .84       1.77          3.47
Norway                 Mar–Apr 2011         Norwegian                 102   9.5  4.2  1    20    0.41  –0.18     .77       .80       1.88          3.68
Denmark                Mar–Apr 2011         Danish                    112  10.3  5.0  0    21    0.28  –0.65     .84       .88       1.74          3.40
Total Scandinavia      Mar–Apr 2011         Swedish/Norwegian/Danish  319   9.8  4.5  0    21    0.28  –0.32     .80       .84      1.84          3.61
Evidence of Item-Bank Reliability Based on International Data
In March 2015, a sample of graduate applicants to a higher education course in the UK
(N = 466) completed the final version of the Raven’s APM item bank assessment online.
Traditional calculation of classical test theory reliability indices such as Cronbach’s
alpha requires all test takers to complete all items. This type of analysis is not possible
with item-banked tests, in which each test taker completes a different set of items.
Within IRT models, the information function provides estimates of accuracy of
measurement conditional on theta (ability). A single reliability coefficient from IRT theta
scores was estimated using the Standard Error of the theta score for each individual
based on the information function (Raju, Price, Oshima, & Nering, 2007). The results are
presented in Table 4 and indicate reliability for the Raven’s APM item bank to be above
the desired value of .70.
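One common way to obtain such a single coefficient is to compare the observed variance of theta scores with the average information-based error variance; the sketch below uses hypothetical theta/SE values, and the exact formulation of Raju et al. (2007) may differ in detail.

```python
from statistics import pvariance

def empirical_reliability(thetas, std_errors):
    """IRT-based reliability estimate: the proportion of observed theta
    variance not attributable to measurement error, using each
    respondent's information-based standard error of theta."""
    error_var = sum(se ** 2 for se in std_errors) / len(std_errors)
    return 1 - error_var / pvariance(thetas)

# Hypothetical theta scores and standard errors for six test takers;
# SEs are larger at the extremes, where test information is lower.
thetas = [-1.2, -0.5, 0.0, 0.3, 0.8, 1.6]
ses = [0.42, 0.38, 0.35, 0.35, 0.38, 0.45]
print(round(empirical_reliability(thetas, ses), 2))  # -> 0.81
```

Because each respondent's SE comes from the information function at their own theta, the estimate works even though every respondent saw a different item set.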
Table 4. Reliability and descriptive statistics based on Raven’s APM Item Bank
theta scores (N = 466)
Country UK
Period March 2015
Language English
Mean 0.3
SD 0.7
Min –2.14
Max 2.63
Skewness –0.04
Kurtosis 0.68
Reliability 0.73
Critical Value for 68% CI 0.35
Critical Value for 95% CI 0.69
Group Differences
It is important to consider issues of fairness, discrimination, and equal opportunity for
legal and ethical reasons in any assessment process. These issues are closely intertwined
and affect the meaning and impact of group differences in assessment scores. Local
legislation also may be a complicating factor in understanding the impact and meaning of
group differences in applied practice.
The group difference estimates presented are not sufficient to conclude whether the
Raven’s APM is fair, unfair, or discriminatory. Such a conclusion depends on several factors.
1. Groups differ, and score differences observed in one circumstance will not
necessarily occur in others.
2. The groups compared are likely not drawn from identical populations. For
example, a female-led company may be more attractive to women applicants and
therefore may receive stronger women candidates than a company with few high-level
women and a poor track record on equal opportunities. High test scores from women in
this situation may reflect the fact that the company attracts the best women rather than
any unfairness in the test.
Results of comparing performance on the fixed 23-item version of Raven’s APM for
a range of samples and variables indicate whether group differences are likely and the
size of the differences. The difference between groups is expressed as Cohen’s d. This
statistic expresses the difference in standard deviation units and can be compared to
results from different forms of the test or to scores expressed on different scales. Raw
score differences are not comparable in the same way. Values of Cohen’s d above 0.8 are
considered large effects, above 0.5 moderate effects, and above 0.2 small effects (Cohen,
Cohen, West, & Aiken, 2003). Below this level, values can be considered negligible.
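The Cohen's d values in the tables that follow can be reproduced from group means and SDs with the pooled-SD formula; the sketch below uses the US sex comparison from Table 6.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# US sex comparison from Table 6: males 12.8 (SD 4.4, n 2566),
# females 11.7 (SD 4.3, n 534).
print(round(cohens_d(12.8, 4.4, 2566, 11.7, 4.3, 534), 2))  # -> 0.25
```

Because d is in standard deviation units, the same function applies whether the inputs are raw scores or theta scores.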
The most extensive analysis of group differences for the fixed form version of Raven’s
APM was conducted on a sample of 1836 respondents who completed the online version
in English in the UK. The means and SDs presented for this study, and for the
subsequent studies in this section on group differences, are based on traditional
number-correct raw scores. Respondents completed the test for a number of reasons;
applying for a job outside the respondent’s current company was the most common
reason (63.8%), followed by professional development (10.0%). This sample can be
classified as international because it consists of individuals born in 82 different
countries across all continents, the most frequent nationalities being British (33.2%),
Norwegian (9.3%), and Swedish (6.6%). Regarding sex, 1358 respondents (74.0%)
were male and 456 respondents (24.8%) were female; 22 respondents (1.2%) chose not to
declare. Overall, the sample has a relatively high level of education: 37.3% reported
having a Master’s degree, followed by 32.7% with a Bachelor’s degree. Regarding ethnic
composition, 72.6% of the sample reported being White, followed by 14.6% Asian; Black
respondents account for less than 1% of the sample, and other ethnicities or those
unwilling to declare for 12%.
The total scores of the 1836 respondents ranged from 0 to 23 points, with a mean of
13.2 and a standard deviation of 4.4. For this sample, the scale showed evidence of
adequate reliability (Cronbach’s α = .80) as well as an appropriate dimensional structure.
No problems related to missing data or low response rates on any of the items were found.
Group differences on sex, age, and educational level are also presented based on an
additional set of data collected using the TalentLens digital platforms for online
administration. All data were collected using the English version of the Raven’s APM,
gathered in Australia/New Zealand, India, and the US.
Sex
For the international sample, the comparison between males and females showed no
difference. This is in line with the classical literature, which reports an absence of sex
differences in fluid intelligence.
Table 5. Descriptive statistics of APM raw total scores and comparisons by sex—
international sample
Group     Mean  SD   n
Females   13.2  4.6  456
Males     13.2  4.4  1358
d = 0.00
For the country-level data, the results show only negligible effects based on sex for
Australia/New Zealand and India, and a small effect for the US.
Table 6. Descriptive statistics of APM raw total scores and comparisons by sex—
Australia/New Zealand, India, and US
Sample         Male: Mean  SD   n     Female: Mean  SD   n    d
Australia/NZ         11.8  4.2  556           11.2  4.1  260  .15
India                14.4  3.9  2894          13.8  4.0  343  .14
US                   12.8  4.4  2566          11.7  4.3  534  .25
Age
The literature indicates that fluid reasoning ability tends to decrease somewhat with
increasing age. Table 7 shows that effect sizes for pairwise comparisons of age
groups in the international sample vary from negligible to small, and the effect size between
the youngest age group (16–24 years) and the oldest age group (50–59 years)
represents the largest difference. This difference of d = .85 is similar in magnitude to
differences reported between 20 and 60 years of age in previous research on matrix
reasoning tests (Salthouse, 2009).
Table 7. Descriptive statistics of APM raw total scores and age group
comparisons—international sample
Age Group  Mean  SD   n    Comparison      d
16–24      15.9  4.5  173  16–24 vs 25–29  .17
25–29      15.1  4.4  210  25–29 vs 30–34  .31
30–34      13.7  4.2  220  30–34 vs 35–39  .17
35–39      13.0  4.2  313  35–39 vs 40–44  .08
40–44      12.7  4.1  346  40–44 vs 45–49  .13
45–49      12.2  4.3  285  45–49 vs 50–59  .21
50–59      11.3  4.0  247  16–24 vs 50–59  .85
For the country-level samples, the effect of age on Raven’s APM test scores is based on
data from n = 9233 respondents. Most of these data are identical to the data used for
the analysis of sex effects. The effects of age are presented by country sample. The
results generally show that performance declines somewhat by age group, even though
the pairwise comparisons show only negligible or small effects. The comparison between
the youngest and oldest age groups shows performance approximately 1 SD lower for
the older group, similar to previous research findings (Salthouse, 2009).
Table 8. Descriptive statistics of APM raw total scores and age group
comparisons—Australia/NZ
Age Group  Mean  SD   n    Comparison      d
21–29      13.7  4.4  31   21–29 vs 30–39  .21
30–39      12.8  4.2  132  30–39 vs 40–49  .15
40–49      12.2  4.1  310  40–49 vs 50–59  .34
50–59      10.8  4.0  292  50–59 vs 60+    .33
60+         9.6  3.6  50   21–29 vs 60+    1.02
Table 9. Descriptive statistics of APM raw total scores and age group
comparisons—India
Table 10. Descriptive statistics of APM raw total scores and age group
comparisons—US
Age Group  Mean  SD   n     Comparison      d
16–20      14.8  4.1  398   16–20 vs 21–24  .24
21–24      13.8  4.2  410   21–24 vs 25–29  .04
25–29      13.6  4.2  577   25–29 vs 30–34  .27
30–34      12.5  4.3  821   30–34 vs 35–39  .08
35–39      12.1  4.3  1452  35–39 vs 40–49  .14
40–49      11.5  4.2  1206  40–49 vs 50+    .30
50+        10.2  4.1  385   16–20 vs 50+    1.11
Education
Effect sizes among seven education-level groups in the international sample were
compared. Education levels represented by only a small number of respondents were
excluded due to the risk of bias in the results. The results generally show that
respondents with a higher education level scored higher on the APM than respondents
with a lower education level. The effect sizes indicate negligible to small differences
between groups.
Table 11. Comparisons by education level—international sample
Education                                Mean  SD   n    Comparison  d
(1) A-Level, Scottish Highers,
    or equivalent                        12.4  4.7  91   (1) vs (2)  –.05
(2) Higher education or Diploma          12.6  4.3  163  (2) vs (3)   .02
(3) Bachelor of Arts                     12.4  4.0  237  (3) vs (4)  –.26
(4) Bachelor of Science                  13.5  4.6  272  (4) vs (5)  –.20
(5) Bachelor of Engineering              14.4  4.1  71   (5) vs (6)   .10
(6) Master’s degree                      14.0  4.4  685  (6) vs (7)  –.14
(7) Doctorate degree                     14.6  4.2  67   (1) vs (7)  –.49
The effect of level of education on Raven’s APM test scores for the country-level
samples is based on data from 9298 respondents. Most of these data are identical to the
data used for the analysis of sex and age effects, and the effects of education are
presented by sample. Overall, the results show that performance on the Raven’s APM
increases with level of education. This is an expected finding, in concordance with
previous research.
Table 12. Descriptive statistics of APM raw total scores and education group
comparisons—Australia/NZ
Table 13. Descriptive statistics of APM raw total scores and education group
comparisons—India
Table 14. Descriptive statistics of APM raw total scores and education group
comparisons—US
Occupation
To analyze occupations in the international sample, an inclusion criterion was applied:
only occupations with n greater than or equal to 80 were analyzed, to avoid bias due to
small sample sizes. The results are presented in Table 15.
Results showed the student group to have the highest mean, followed by engineers
and consultants. Human resource professionals had the lowest mean score, followed by
marketing professionals. Students and human resource professionals showed a large
group difference of approximately 1 SD, while the other effect sizes were negligible to
small.
The high mean score of the student group could be explained by two factors. First,
the student sample is composed mostly of younger individuals, who research suggests
may perform better on the test than older respondents. Second, approximately 40% of
the students in the sample were at the Master’s degree level of education; as shown in
Table 11, the higher the level of education, the greater the chance of achieving a high
score on the APM.
Table 15. Descriptive statistics of APM raw total scores and occupational
group comparisons—international sample

Occupation             Mean  SD   n    Comparison  d
(1) HR Professional    11.7  4.0  80   (1) vs (2)  –.30
(2) Marketing          13.0  4.1  81   (2) vs (3)  –.07
(3) Accountant         13.3  4.4  170  (3) vs (4)   .00
(4) IT Professional    13.3  4.4  124  (4) vs (5)  –.19
(5) Financial analyst  14.1  4.3  111  (5) vs (6)  –.03
(6) Consultant         14.2  4.6  152  (6) vs (7)  –.03
(7) Engineer           14.4  3.7  108  (7) vs (8)  –.46
(8) Student            16.1  4.1  173  (1) vs (8)  –1.09
Evidence of Validity
Validity is a unitary concept and refers to the degree to which all the accumulated
evidence and theory support the intended interpretation of test scores for the proposed
use. Thus, there are different types of validity evidence (which can be collected from
different sources), rather than distinct types of validity, and it is the interpretations of
test scores required by proposed uses that are evaluated, not the test itself. In addition,
if there is a decision to be made, it is the decision itself that should be validated rather
than a single test score. Please see the Raven’s International Manual, Part 1
(Interpretation: Predictive Information), for a more detailed discussion. In the following,
relevant evidence in support of the validity of the Raven’s APM intended for application
in a work and organizational setting is presented.
Within the work and organizational domain, however, the primary purpose of testing
is often to predict external behaviors such as job performance, rather than to describe
an individual fully with respect to g. Fortunately, convincing research has shown that a
“full” measure of g is unnecessarily extensive and expensive when the aim is to predict,
for example, job performance (Postlethwaite, 2011). In fact, factor-analytic studies have
repeatedly demonstrated that matrix items of the kind used in Raven’s APM are among
the best single indicators of g (e.g., Llabre, 1984; Snow et al., 1984; Spearman, 1927a,
1927b; Vernon, 1942). Therefore, Raven’s APM is positioned as an indicator of g rather
than a full measure of g, and the empirical support for this position is substantial.
Matrices require the individual to perform mental operations that are needed when
facing new tasks and cannot be performed automatically. Examples include recognition,
concept formation, understanding implications, problem solving, extrapolation, and the
reorganization and transformation of information (Flanagan & Ortiz, 2001). Matrices
require both inductive and deductive problem solving and demand that the individual
can mentally manipulate patterns and symbols within a logical context. In addition, the
non-verbal content of the items minimizes the effect of prior knowledge and verbal
ability on the test scores, to the benefit of assessing the individual’s intellectual potential.
The standardized instructions, which are kept at a low readability level, and the
practice items, which include the rationale for each response alternative, serve to
ensure that individuals have equal opportunities to perform on the Raven’s APM when it
is administered online under supervised conditions.
Evidence of convergent validity for the current version of the APM is supported by
several findings. In a subset of 41 respondents from the standardization sample, the
revised APM scores correlated .54 with scores on the Watson-Glaser Critical Thinking
Appraisal®–Short Form (Watson & Glaser, 2006). Further, in a sample of N = 276 (see
Appendix A, Table A.3), Raven's APM correlated r = .51 (p < .001) with the total score on
the Advanced Numerical Reasoning Appraisal (ANRA; a cognitive ability assessment
measuring quantitative reasoning), and in a sample of N = 307 (demographics are
presented in Appendix A, Table A.4), it correlated r = .37 (p < .001) with the short
version of the WGCTA.
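The convergent validity coefficients reported above are Pearson product-moment correlations between examinees' scores on the two instruments. As a sketch of that computation only, with invented score vectors that are not taken from the standardization data:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical APM and ANRA raw totals for five examinees (illustrative only).
apm = [14, 18, 9, 21, 16]
anra = [22, 27, 15, 30, 24]
r = pearson_r(apm, anra)
```

The manual's coefficients were of course computed on the full samples (N = 276 and N = 307); a library routine such as `scipy.stats.pearsonr` would also return the associated p value.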
There is abundant evidence that measures of general mental ability, such as the
APM, are significant predictors of overall performance across jobs. For example, the
Society for Industrial and Organizational Psychology's Principles for the Validation and
Use of Personnel Selection Procedures (2003) establishes that validity generalization is
well-supported for cognitive ability
tests. Schmidt and Hunter (2004) provide evidence that general mental ability “predicts
both occupational level attained and performance within one's chosen occupation and
does so better than any other ability, trait, or disposition and better than job experience”
(p. 162). Prien, Schippmann, and Prien (2003) observe that decades of research “present
incontrovertible evidence supporting the use of cognitive ability across situations and
occupations with varying job requirements” (p. 55). In addition, many other studies
provide evidence of the relationship between general mental ability and job performance
(e.g., Kolz, McFarland, & Silverman, 1998; Kuncel, Hezlett, & Ones, 2004; Ree &
Carretta, 1998; Salgado et al., 2003; Schmidt & Hunter, 1998; Schmidt & Hunter,
2004).
References
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River,
NJ: Prentice Hall.
Brouwers, S. A., Van de Vijver, F., & Van Hemert, D. A. (2009). Variation in Raven’s
Progressive Matrices scores across time and place. Learning and Individual
Differences, 19(3), 330–338.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/
correlation analysis for the behavioral sciences. London: Lawrence Erlbaum.
Fay, D., & Frese, M. (2001). The concept of Personal Initiative: An overview of validity
studies. Human Performance, 14 (1), 97–124.
Gonzalez, C., Thomas, R.P., & Vanyukov, P. (2005). The relationship between cognitive
ability and dynamic decision making. Intelligence, 33, 169–186.
Koenig, K. A., Frey, M. C., & Detterman, D. K. (2007). ACT and general cognitive ability.
Intelligence, 1–8. doi:10.1016/j.intell.2007.03.005
Kolz, A. R., McFarland, L. A., & Silverman, S. B. (1998). Cognitive ability and job
experience as predictors of work performance. The Journal of Psychology, 132, 539–
548.
Kuncel, N. A., Hezlett, S. A., & Ones, D. S. (2004). Academic performance, career
potential, creativity, and job performance: Can one construct predict them all?
Journal of Personality and Social Psychology, 86, 148–161.
Prien, E. P., Schippmann, J. S., & Prien, K. O. (2003). Individual assessment as practiced in
industry and consulting. Mahwah, NJ: Lawrence Erlbaum.
Raju, N. S., Price, L. R., Oshima, T. C., & Nering, M. L. (2007). Standardized conditional
SEM: A case for conditional reliability. Applied Psychological Measurement, 31(3),
169–180.
Raven, J., Raven, J. C., & Court, J. H. (1998). Raven Manual: Section 4, Advanced
Progressive Matrices, 1998 edition. Oxford, UK: Oxford Psychologists Press.
Ree, M. J., & Carretta, T. R. (1998). General cognitive ability and occupational
performance. In C. L. Cooper & I. T. Robertson (Eds.), International review of
industrial and organizational psychology (Vol. 13, pp. 159–184). Chichester, UK:
Wiley.
Salgado, J., Anderson, N., Moscoso, S., Bertua, C., & de Fruyt, F. (2003). International
validity generalisation of GMA and cognitive abilities. Personnel Psychology, 56,
573–605.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in
personnel psychology: Practical and theoretical implications of 85 years of research
findings. Psychological Bulletin, 124, 262–274.
Schmidt, F. L., & Hunter, J. E. (2004). General mental ability in the world of work:
Occupational attainment and job performance. Journal of Personality and Social
Psychology, 86, 162–173.
Society for Industrial and Organizational Psychology (Division 14 of the American
Psychological Association). (2003). Principles for the validation and use of personnel
selection procedures (4th ed.). Bowling Green, OH: Author.
Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and
learning correlations. In R. J. Sternberg (Ed.), Advances in the psychology of human
intelligence (Vol. 2, pp. 47–103). New York: MacMillan.
Spearman, C. (1927b). The nature of “intelligence” and the principles of cognition (2nd
ed.). London, United Kingdom: Macmillan.
U.S. Department of Labor (1999). Testing and assessment: An employer’s guide to good
practices. Washington DC: U.S. Department of Labor.
Vernon, P. E. (1942). The reliability and validity of the Progressive Matrices Test. London,
United Kingdom: Admiralty Report, 14(b).
Watson, G., & Glaser, E. M. (2006). Watson-Glaser Critical Thinking Appraisal, Short
Form manual. San Antonio, TX: Pearson.
Appendix A Demographics of U.S. Sample
Table A.2 Demographics of U.S. Sample Used to Calculate Item
Difficulties
Industry N % of Sample
Aerospace, Aviation 2 0.30
Arts, Entertainment, Media 6 0.90
Construction 13 1.96
Education 40 6.03
Energy, Utilities 7 1.06
Financial Services, Banking, Insurance 194 29.26
Government, Public Service, Defense 7 1.06
Health Care 23 3.47
Hospitality, Tourism 6 0.90
Information Technology, High-Tech, Telecommunications 36 5.43
Manufacturing & Production 96 14.48
Natural Resources, Mining 1 0.15
Pharmaceuticals, Biotechnology 64 9.65
Professional, Business Services 32 4.83
Publishing, Printing 3 0.45
Real Estate 7 1.06
Retail & Wholesale 35 5.28
Transportation Warehousing 10 1.51
Other 68 10.26
Not Applicable 13 1.96
Total 663 100.00
Position
Not Applicable 79 11.92
Executive 65 9.80
Director 65 9.80
Manager 134 20.21
Professional/Individual Contributor 264 39.82
Supervisor 10 1.51
Self-Employed/Business Owner 4 0.60
Administrative/Clerical 21 3.17
Skilled Trades 3 0.45
Customer Service/Retail Sales 16 2.41
General Labor 2 0.30
Total 663 100.00
Sex
Male 392 59.13
Female 169 25.49
No Response 102 15.38
Total 663 100.00
Table A.2—continued
Ethnicity N % of Sample
White (Non-Hispanic) 334 50.38
Black, African American 34 5.13
Hispanic, Latino/a 25 3.77
Asian/Pacific Islander 142 21.42
Other 6 0.90
Multiracial 10 1.51
No Response 112 16.89
Total 663 100.00
Age N % of Sample
21–24 57 8.60
25–29 149 22.47
30–34 108 16.29
35–39 71 10.71
40–49 128 19.31
50–59 32 4.83
60–69 5 0.75
No Response 113 17.04
Total 663 100.00
Education Level N % of Sample
HS/GED 3 0.45
1–2 yrs college 4 0.60
Assoc. 4 0.60
3–4 yrs college 14 2.11
Bachelor’s degree 234 35.29
Master’s degree 261 39.37
Doctorate 35 5.28
No Response 108 16.29
Total 663 100.00
Table A.3 Demographics of U.S. Sample Used to Calculate
Correlations Between APM and ANRA (RANRA in UK)
Industry N % of Sample
Aerospace, Aviation 1 0.36
Arts, Entertainment, Media 1 0.36
Construction 1 0.36
Education 31 11.23
Energy, Utilities 1 0.36
Financial Services, Banking, Insurance 152 55.07
Government, Public Service, Defense 5 1.81
Health Care 5 1.81
Hospitality, Tourism 2 0.72
Information Technology, High-Tech, Telecommunications 22 7.97
Manufacturing & Production 5 1.81
Pharmaceuticals, Biotechnology 6 2.17
Professional, Business Services 12 4.35
Real Estate 3 1.09
Retail & Wholesale 8 2.90
Transportation Warehousing 3 1.09
Other 11 3.99
Not Applicable 1 0.36
No Response 6 2.17
Total 276 100.00
Position
Not Applicable 36 13.04
Executive 5 1.81
Director 6 2.17
Manager 36 13.04
Professional/Individual Contributor 164 59.42
Supervisor 5 1.81
Self-Employed/Business Owner 3 1.09
Administrative/Clerical 10 3.62
Skilled Trades 2 0.72
Customer Service/Retail Sales 3 1.09
No Response 6 2.17
Total 276 100.00
Sex
Male 160 57.97
Female 63 22.83
No Response 53 19.20
Total 276 100.00
Table A.3—continued
Ethnicity N % of Sample
White (Non-Hispanic) 70 25.36
Black, African American 16 5.80
Hispanic, Latino/a 8 2.90
Asian/Pacific Islander 116 42.03
Other 1 0.36
Multiracial 5 1.81
No Response 60 21.74
Total 276 100.00
Age
21–24 31 11.23
25–29 79 28.62
30–34 61 22.10
35–39 20 7.25
40–49 18 6.52
50–59 5 1.81
60–69 2 0.72
No Response 60 21.74
Total 276 100.00
Education Level
HS/GED 1 0.36
3–4 yrs college 6 2.17
Bachelor’s degree 69 25.00
Master’s degree 128 46.38
Doctorate 18 6.52
No Response 54 19.57
Total 276 100.00
Table A.4 Industries Used to Calculate Correlations
Between APM and Watson–Glaser Short Form (U.S.
Sample)
Industry N % of Sample
Aerospace, Aviation 1 0.33
Arts, Entertainment, Media 1 0.33
Construction 1 0.33
Education 31 10.10
Energy, Utilities 1 0.33
Financial Services, Banking, Insurance 159 51.79
Government, Public Service, Defense 5 1.63
Health Care 6 1.95
Hospitality, Tourism 2 0.65
Information Technology, High-Tech, Telecommunications 23 7.49
Manufacturing & Production 26 8.47
Pharmaceuticals, Biotechnology 6 1.95
Professional, Business Services 15 4.89
Real Estate 3 0.98
Retail & Wholesale 12 3.91
Transportation Warehousing 3 0.98
Other 11 3.58
Not Applicable 1 0.33
Total 307 100.00
Position
Not Applicable 36 11.73
Executive 24 7.82
Director 14 4.56
Manager 36 11.73
Professional/Individual Contributor 171 55.70
Supervisor 5 1.63
Self–Employed/Business Owner 3 0.98
Administrative/Clerical 11 3.58
Skilled Trades 2 0.65
Customer Service/Retail Sales 5 1.63
Total 307 100.00
Sex
Male 186 60.59
Female 71 23.13
No Response 50 16.29
Total 307 100.00
Appendix B Demographics of International Sample
Table B.2 France Sample Demographic Information
N % of Sample
Education Level 100 100.00
11 years (1ière, CAP–BEP) 1 1.00
13–14 years (Bac +1 et 2) 7 7.00
15–16 years (Bac +2 et 3) 11 11.00
17–18 years (Bac + 4 et 5) 74 74.00
More than 18 years (Doctorate) 4 4.00
Not Reported 3 3.00
Sex
Female 51 51.00
Male 49 49.00
Age
21–24 13 13.00
25–29 25 25.00
30–34 17 17.00
35–39 10 10.00
40–49 18 18.00
50–59 15 15.00
60–69 1 1.00
Not Reported 1 1.00
Table B.3 India Sample Demographic Information
N % of Sample
Education Level 100 100.00
Bachelor’s degree 49 49.00
Master’s degree 39 39.00
Doctorate 2 2.00
Other 6 6.00
Not Reported 4 4.00
Sex
Female 22 22.00
Male 78 78.00
Age
20–24 2 2.00
25–29 27 27.00
30–34 33 33.00
35–39 17 17.00
40–44 10 10.00
45–49 4 4.00
Not Reported 7 7.00
Table B.4 Netherlands Sample Demographic Information
N % of Sample
Education Level 103 100.00
<12 (mbo niveau 4) 4 3.88
12 (havo) 2 1.94
13 (vwo) 4 3.88
14 (hbo) 50 48.54
15 (wo) 31 30.10
Not Reported 12 11.65
Sex
Female 33 32.04
Male 59 57.28
Not Reported 11 10.68
Age
21–24 1 0.97
25–29 7 6.80
30–34 6 5.83
35–39 23 22.33
40–49 44 42.72
50–59 11 10.68
Not Reported 11 10.68
Years in Occupation
1 year to less than 2 years 1 0.97
2 years to less than 4 years 5 4.85
4 years to less than 7 years 2 1.94
7 years to less than 10 years 9 8.74
10 years to less than 15 years 19 18.45
≥15 years 56 54.37
Not Reported 11 10.68
Table B.5 Scandinavian Sample Demographic Information
N % of Sample
Education Level 319 100.00
Elementary School 17 5.33
Gymnasium/High School 77 24.14
University (up to 3 years) 109 34.17
University (more than 3 years) 116 36.36
Sex
Female 154 48.28
Male 165 51.72
Age
21–24 4 1.25
25–29 36 11.29
30–34 18 5.64
35–39 18 5.64
40–49 86 26.96
50–59 79 24.76
60–69 68 21.32
Years in Occupation
<1 year 7 2.19
1 year to less than 2 years 44 13.79
2 years to less than 4 years 53 16.61
4 years to less than 7 years 75 23.51
7 years to less than 10 years 33 10.34
10 years to less than 15 years 49 15.36
≥15 years 58 18.18
Table B.6 Singapore Sample Demographic Information
N % of Sample
Education Level 229 100.00
A-level, Scottish Highers or equivalent 51 22.27
BA 24 10.48
BEng 7 3.06
BSc 34 14.85
Doctorate 2 0.87
GCSE or equivalent 57 24.89
Higher Education Certificate or Diploma 13 5.68
LLB 4 1.75
Master’s degree 21 9.17
No formal qualification 1 0.44
Other 11 4.80
Not Reported 4 1.75
Sex
Female 90 39.30
Male 137 59.83
Not Reported 2 0.87
Age
16–19 years 79 34.50
20–24 years 63 27.51
25–29 years 28 12.23
30–34 years 15 6.55
35–39 years 13 5.68
40–44 years 12 5.24
45–49 years 12 5.24
50–54 years 1 0.44
55–59 years 4 1.75
Not Reported 2 0.87
Years in Occupation
Less than 1 year 43 18.78
1 to 2 years 45 19.65
3 to 4 years 20 8.73
5 to 7 years 15 6.55
8 to 10 years 13 5.68
11 to 15 years 63 27.51
16 to 20 years 18 7.86
21 to 25 years 4 1.75
Not Reported 8 3.49
Table B.7 UK Sample Demographic Information
N % of Sample
Education Level 101 100.00
GCSE equivalent 3 2.97
A-level, Scottish Highers or equivalent 1 0.99
Higher Education Certificate or Diploma 12 11.88
BA 29 28.71
BSc 16 15.84
BEd 1 0.99
LLB 3 2.97
Master’s degree 26 25.74
Doctorate 2 1.98
Other 7 6.93
Not Reported 1 0.99
Sex
Female 46 45.54
Male 46 45.54
Not Reported 9 8.91
Age
16–19 4 3.96
20–24 1 0.99
25–29 10 9.90
30–34 28 27.72
35–39 31 30.69
40–44 12 11.88
45–49 5 4.95
50–54 2 1.98
55–59 2 1.98
Not Reported 6 5.94
Years in Occupation
<1 year 8 7.92
1–2 years 18 17.82
3–4 years 17 16.83
5–7 years 19 18.81
8–10 years 16 15.84
11–15 years 15 14.85
16–20 years 6 5.94
20+ years 2 1.98
Table B.8 U.S. Sample Demographic Information
N % of Sample
Education Level 342 100.00
8th–11th Grade 1 0.29
HS/GED 17 4.97
1–2 yrs of College 31 9.06
Associates 16 4.68
3–4 yrs of College 15 4.39
Bachelor’s degree 168 49.12
Master’s degree 88 25.73
Doctorate 4 1.17
Not Reported 2 0.58
Sex
Female 103 30.12
Male 238 69.59
Not Reported 1 0.29
Age
16–20 2 0.58
21–24 7 2.05
25–29 21 6.14
30–34 54 15.79
35–39 67 19.59
40–49 121 35.38
50–59 62 18.13
60–69 4 1.17
Not Reported 4 1.17
Years in Occupation
<1 year 18 5.26
1 year to less than 2 years 29 8.48
2 years to less than 4 years 51 14.91
4 years to less than 7 years 43 12.57
7 years to less than 10 years 42 12.28
10 years to less than 15 years 73 21.35
>15 years 83 24.27
Not Reported 3 0.88
Appendix C Raven’s APM Item Analyses for International
Sample
Table C.2 France
Item #  Difficulty (b) Parameter (IRT)  Item–Ability Correlation (IRT)  Discrimination (a) Parameter (IRT)  Item Difficulty Index (p value, CTT)  Item–Total Correlation (CTT)
1 –3.38 0.05 0.90 0.97 0.00
2* — — — — —
3 –1.04 0.34 0.97 0.80 0.25
4 –1.97 0.32 1.03 0.90 0.26
5 –0.76 0.39 1.03 0.76 0.32
6 –0.45 0.15 0.45 0.71 0.02
7 –0.70 0.32 0.90 0.75 0.23
8 –0.70 0.45 1.13 0.75 0.37
9 –0.07 0.44 1.10 0.64 0.34
10 –0.63 0.49 1.21 0.74 0.43
11 0.34 0.38 0.88 0.56 0.26
12 –0.17 0.48 1.21 0.66 0.40
13 –0.07 0.25 0.55 0.64 0.13
14 0.49 0.50 1.32 0.53 0.41
15 0.98 0.40 0.92 0.43 0.27
16 1.18 0.42 0.99 0.39 0.30
17 0.04 0.52 1.35 0.62 0.44
18 0.34 0.57 1.56 0.56 0.50
19 0.78 0.24 0.31 0.47 0.13
20 0.93 0.45 1.08 0.44 0.32
21 0.88 0.43 1.03 0.45 0.32
22 0.93 0.48 1.20 0.44 0.38
23 3.06 0.32 0.98 0.12 0.24
*All respondents (100 percent of the sample) answered Item 2 correctly, making the IRT
and CTT values inestimable.
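The IRT columns in these tables can be read through a two-parameter logistic (2PL) item response function, under which the probability of a correct response rises with ability θ, equals .50 when θ matches the item's difficulty b, and rises more steeply for items with larger discrimination a. A minimal sketch, assuming the common 2PL form with scaling constant D = 1.0 (the manual does not state the exact calibration model or scaling used):

```python
import math

def p_correct_2pl(theta, a, b, D=1.0):
    """2PL item response function: probability of a correct response at
    ability theta for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

# Item 1 of Table C.2 (France): b = -3.38, a = 0.90. For an examinee of
# average ability (theta = 0) the model predicts a high probability of
# success, broadly in line with the CTT p value of .97 for that item.
p = p_correct_2pl(theta=0.0, a=0.90, b=-3.38)
```

This also illustrates the relationship between the IRT and CTT columns: very negative b values correspond to high p values (easy items), and large a values correspond to high item-total correlations.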
Table C.3 India
Item #  Difficulty (b) Parameter (IRT)  Item–Ability Correlation (IRT)  Discrimination (a) Parameter (IRT)  Item Difficulty Index (p value, CTT)  Item–Total Correlation (CTT)
Table C.4 The Netherlands
Item #  Difficulty (b) Parameter (IRT)  Item–Ability Correlation (IRT)  Discrimination (a) Parameter (IRT)  Item Difficulty Index (p value, CTT)  Item–Total Correlation (CTT)
Table C.5 Scandinavia
Item #  Item Difficulty (b) Parameter (IRT)  Item–Ability (θ) Correlation (IRT)  Item Difficulty Index (p value, CTT)  Item–Total Correlation (CTT)
Table C.6 United Kingdom
Item #  Difficulty (b) Parameter (IRT)  Item–Ability Correlation (IRT)  Discrimination (a) Parameter (IRT)  Item Difficulty Index (p value, CTT)  Item–Total Correlation (CTT)
1 –3.12 0.26 1.03 0.94 0.22
2 –2.94 0.33 1.09 0.93 0.31
3 –2.27 0.36 1.06 0.88 0.32
4 –1.06 0.40 0.93 0.73 0.34
5 –1.47 0.32 0.86 0.79 0.25
6 –1.06 0.36 0.83 0.73 0.29
7 –0.93 0.38 0.85 0.71 0.31
8 0.03 0.55 1.25 0.54 0.50
9 –0.40 0.58 1.37 0.62 0.54
10 –0.08 0.42 0.78 0.56 0.33
11 0.45 0.53 1.17 0.46 0.47
12 0.51 0.39 0.61 0.45 0.31
13 0.51 0.54 1.18 0.45 0.46
14 0.56 0.58 1.34 0.44 0.52
15 1.00 0.54 1.16 0.36 0.46
16 1.42 0.48 0.99 0.29 0.39
17 0.45 0.52 1.14 0.46 0.45
18 0.24 0.52 1.13 0.50 0.46
19 0.67 0.45 0.86 0.42 0.38
20 1.35 0.37 0.68 0.30 0.26
21 1.35 0.42 0.82 0.30 0.30
22 1.06 0.52 1.11 0.35 0.44
23 3.72 0.39 0.99 0.06 0.29
Table C.7 United States
Item #  Difficulty (b) Parameter (IRT)  Item–Ability Correlation (IRT)  Discrimination (a) Parameter (IRT)  Item Difficulty Index (p value, CTT)  Item–Total Correlation (CTT)
1 –3.11 0.31 1.06 0.94 0.26
Appendix D Demographics of Concerto Sample (N = 1720)
N % of Sample
Age 1720 100.00
<18 8 0.47
18–20 194 11.28
21–30 1014 58.95
31–40 265 15.41
41–50 135 7.85
51–60 81 4.71
>60 23 1.34
Sex
Female 830 48.26
Male 856 49.77
Other 7 0.41
Unspecified 27 1.57
Ethnicity
Asian/Pacific Islander 1111 64.59
Black/African American 33 1.92
Hispanic/Latino 36 2.09
Native American/American Indian 4 0.23
White 489 28.43
Others 47 2.73
Education Level
No schooling completed 31 1.80
Nursery school to 8th grade 19 1.10
Some high school, no diploma 106 6.16
High school graduate, diploma or equivalent 303 17.62
Associate degree 84 4.88
Bachelor’s degree 727 42.27
Master’s degree 316 18.37
Professional degree 42 2.44
Doctorate degree 92 5.35