
RAVEN’S™ Advanced Progressive Matrices (APM-III)

MANUAL

PART II
Copyright © 2015 NCS Pearson, Inc. All rights reserved

Warning: No part of this publication may be reproduced or transmitted in any form or


by any means, electronic or mechanical, including photocopy, recording, or any
information storage and retrieval system, without permission in writing from the
copyright owner.

Pearson, TalentLens, Raven’s Progressive Matrices, and the logo are trademarks, in
the US and/or other countries, of Pearson Education, Inc., or its affiliate(s).

Adobe® and Adobe Acrobat® are trademarks of Adobe Systems, Inc.


Table of Contents

Introduction ........................................................................................ 5

Development of the Fixed Form ........................................................ 5

International Adaptation ................................................................... 9


Development of the Item-Banked Version ........................................ 10
Scoring .......................................................................................... 11

Evidence of Reliability ...................................................................... 12

Standard Error of Measurement............................................................ 14


Evidence of Item-Bank Reliability Based on International Data ......................... 16
Group Differences ............................................................................. 16
Sex ....................................................................................... 18
Age ....................................................................................... 18
Education Level ...................................................................... 20
Occupation ............................................................................. 22

Evidence of Validity ......................................................................... 23


Evidence of Content Validity ............................................................ 23
Evidence of Convergent Validity ......................................................... 24
Evidence of Criterion-Related Validity ................................................... 24

References ....................................................................................... 26

Appendix A Demographics of U.S. Sample (N = 929) ........................ 29

Appendix B Demographics of International Samples ....................... 36

Appendix C Raven’s APM Item Analyses for International Samples 44

Appendix D Demographics of Concerto Sample (N = 1720) ............. 51

List of Figures
Figure 1. Item difficulties for Raven’s APM fixed form ........................... 7

List of Tables
Table 1. Item difficulties for Raven’s APM fixed form ............................. 8
Table 2. Item selection structure ...................................................... 11
Table 3. Descriptive statistics based on APM raw total scores, reliability,
and SEM for country-specific standardization samples......................... 15
Table 4. Reliability and descriptive statistics based on
Raven’s APM Item Bank theta scores (N = 466) ................................. 16
Table 5. Descriptive statistics of APM raw total scores and
comparisons by sex—international sample......................................... 18
Table 6. Descriptive statistics of APM raw total scores and
comparisons by sex—Australia/New Zealand, India, and US ................ 18
Table 7. Descriptive statistics of APM raw total scores and
age group comparisons—international sample ................................... 19
Table 8. Descriptive statistics of APM raw total scores and
age group comparisons—Australia/NZ ............................................... 19
Table 9. Descriptive statistics of APM raw total scores and
age group comparisons—India ......................................................... 19
Table 10. Descriptive statistics of APM raw total scores and
age group comparisons—US ............................................................ 20
Table 11. Comparisons by education level—international sample.......... 21
Table 12. Descriptive statistics of APM raw total scores and
education group comparisons—Australia/NZ ...................................... 21
Table 13. Descriptive statistics of APM raw total scores and
education group comparisons—India ................................................ 21
Table 14. Descriptive statistics of APM raw total scores and
education group comparisons—US .................................................... 22
Table 15. Descriptive statistics of APM raw total scores and
occupational group comparisons ...................................................... 22

Introduction
The Raven’s Progressive Matrices have been used in many countries for decades as a
measure of problem-solving and reasoning ability (Raven, Raven, & Court, 1998a). The
various editions of the Raven’s Progressive Matrices (standard, advanced and colored)
have been studied in more than 48 countries on samples totaling more than 240,000
participants (Brouwers, Van de Vijver, & Van Hemert, 2009; Wongupparaj, Kumari, &
Morris, 2015).

This manual, Part 2 of the Raven’s International Manual, describes the development
and standardization of the Raven’s Advanced Progressive Matrices (APM) fixed form (also
known as APM short form or APM 2.0) and the subsequent item-banked version designed
for use within the domain of work and organizational psychology. The adaptation and
standardization process of language versions across countries is outlined. Information on
group differences regarding age, sex, and ethnicity also is presented.

Several enhancements were made to facilitate cross-country score comparisons and
to standardize the testing experience for administrators and participants. These
enhancements include:

—Implementation of a common set of instructions, items, and administration time
across countries (i.e., 23 items; 40 minutes)

—Use of a uniform test format for delivery (online)

—Uniform scoring and reporting of scores across countries

—Availability of local manager norms for each country, based on a common
definition of “manager” across countries

Development of the Fixed Form

The APM fixed form, consisting of 23 items, was developed to provide customers with a
shorter assessment that maintains the nature of the construct being measured and the
psychometric integrity of the assessment.

Raven’s APM is a power assessment rather than a speeded assessment. Power
assessments are characterized by a wide range of item difficulty and a relatively
generous time limit, while speeded assessments typically are composed of relatively easy
items and rely on the number of correct responses within restrictive time limits to
differentiate performance among candidates. The 40-minute administration time for the
shorter fixed form version maintains the APM as an assessment of cognitive reasoning
power rather than speed.

Data on the English version was initially collected in the United States from May
2006 through to October 2007. In total, data from 929 applicants and employees were
collected, representing a number of positions across various occupations. Information
regarding the respondents’ current occupation and organizational level is presented in
Appendix A, Table A.1.

Classical Test Theory (CTT) and Item Response Theory (IRT) methodologies were
used in the APM data analysis for item selection. Specifically, for each of the 36 items in
the previous APM version, the classical item difficulty index (p value), item discrimination
(a), IRT item difficulty (b), and corrected item–total correlation were examined in the
selection process. The analyses led to two main revisions, driven by the design aim that
the APM become increasingly difficult as the items progress. First, less discriminating
items were dropped when selecting items for the 23-item fixed form version, so that less
efficient items were excluded. Second, because the test was shown to be operationally
more difficult than intended, two of the initial items were replaced with two less difficult
items so that respondents would experience the first five items as easier than the
subsequent items.
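As an illustration of the classical statistics named above, the p value and corrected item–total correlation can be computed from a scored response matrix as follows. This is a minimal sketch, not the analysis code actually used in the standardization; the function name is illustrative.

```python
import numpy as np

def item_statistics(responses):
    """Classical item statistics of the kind used in item selection.

    responses: (n_respondents, n_items) matrix of 0/1 item scores.
    Returns the p value of each item (proportion answering correctly)
    and the corrected item-total correlation (correlation of the item
    with the total score excluding that item).
    """
    responses = np.asarray(responses, dtype=float)
    p_values = responses.mean(axis=0)
    total = responses.sum(axis=1)
    corrected = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]  # total score with item j removed
        corrected[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return p_values, corrected
```

Items with low corrected item–total correlations are the “less discriminating” items referred to above.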

Item difficulties of the fixed form are presented in Figure 1 and Table 1. The item
difficulties were calculated using a drawn sample of 663 diverse respondents (see
Appendix A, Table A.2 for additional sample descriptive information). As shown in Figure
1 and Table 1, the items increase in difficulty (e.g., item 1 = .94, meaning 94% of
respondents answer the item correctly) through item 23, which yields an item difficulty of
.09 (only 9% of respondents answer the item correctly). This indicates that the fixed
form version of the Raven’s APM is neither too easy nor too difficult, and that it is able to
measure cognitive ability levels across a continuum. The consistency in range and levels
of difficulty was examined further by drawing a subsample of Executives, Directors, and
Managers (N = 264). As shown in Figure 1 and Table 1, the item difficulties for this group
do not differ appreciably from the item difficulties for the entire sample.

Figure 1. Item difficulties for Raven’s APM fixed form

Table 1. Item difficulties for Raven’s APM fixed form

Item    Entire Sample (N = 663)    Execs/Dirs/Mgrs (N = 264)
1 .94 .96
2 .91 .90
3 .85 .85
4 .80 .77
5 .76 .78
6 .72 .71
7 .81 .80
8 .55 .55
9 .66 .68
10 .66 .68
11 .51 .53
12 .55 .56
13 .46 .42
14 .44 .45
15 .34 .33
16 .32 .29
17 .45 .41
18 .50 .46
19 .39 .32
20 .38 .34
21 .37 .33
22 .32 .28
23 .09 .09

International Adaptation
The U.S. standardized 23-item fixed form version was translated, adapted, and
standardized in a number of countries and their respective languages. To ensure
consistent measurement of the construct across language versions and countries, each
country-specific adaptation followed a uniform process. Instructions were translated from
U.S. English into the target language either by an independent translator (the
Netherlands) or by test development experts at the local Pearson TalentLens office.
Then, local test development experts reviewed the translation—including the original U.S.
English version adapted for other English speaking countries—and further adapted and
refined the final translation as needed. The translated online version was then
administered to manager-level respondents across various industries.

English version standardization data was collected in Australia/New Zealand, India,
Singapore, and the United Kingdom. Standardization data for the respective language
version was also collected in France, the Netherlands, Sweden, Norway, and Denmark.
Demographic information for each country-specific sample is presented in Appendix B,
Tables B1–B7.

The majority of respondents completed the APM under timed (40 minutes) and
proctored (i.e., supervised) conditions within the period of June 2008 through to March
2011. The only exception concerned respondents from Singapore, who completed the
test between August 2013 and November 2014. Testing period, language version used,
descriptive statistics, and internal consistency/reliability estimates for all country-specific
standardization samples are presented in Table 3.

Development of the Item-Banked Version

Due to the popularity of the APM, combined with the need for unsupervised administration,
development of an item-banked version was initiated in January 2014. The aim was to
create an item bank of matrices containing items with equal characteristics that could be
randomly selected to provide unique item sets for each administration.
The initial step of the project was conducted in collaboration with the Psychometrics
Centre at the University of Cambridge, and entailed the creation of 230 new Raven’s APM
items. Four parallel items were trialed for each of the 23 original operational items,
totaling 92 items.
A total of 1720 participants were recruited worldwide via the internet. Participants
answered the items online with English instructions via the platform Concerto, hosted by
the Psychometrics Centre in Cambridge. Appendix D provides demographic information on
the age, sex, ethnicity, and educational level of the sample.
Each participant was administered a set of 36 items: 12 operational items and 24
new items. The 12 operational items were selected from the full set of 23 original
operational items. Original item selections were done randomly between pairs of items.
For example, original operational items 1 and 2 constituted a pair and either item 1 or
item 2 was randomly selected for administration. The structure of pairs and logic of
selection is described in Table 2.

Table 2. Item selection structure

Original item numbers    Administer
1 2 1 or 2
3 4 3 or 4
5 6 5 or 6
7 8 7 or 8
9 10 9 or 10
11 12 11 or 12
13 14 13 or 14
15 16 15 or 16
17 18 17 or 18
19 20 19 or 20
21 22 21 or 22
23 Random between 1–22 23 or any between 1–22
Total number of items selected 12

The 24 new items were drawn from the parallel items: for each of the 12 selected original
items, 2 of its 4 parallel items were randomly selected (24 in total). After item selection
according to the described structure, the order of the 36 selected items was randomized
before administration to participants.
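The selection scheme in Table 2 can be sketched as follows. This is an illustrative reconstruction of the pairing logic described above, not the operational code; `parallel_bank` and the function name are hypothetical.

```python
import random

def select_administered_items(parallel_bank, rng=None):
    """Sketch of the trial selection scheme described in Table 2.

    parallel_bank: dict mapping each original item number (1-23) to the
    list of its four parallel item IDs. Returns the 12 original
    operational items used for scoring and the 24 new trial items.
    """
    rng = rng or random.Random()
    pairs = [(i, i + 1) for i in range(1, 22, 2)]   # (1,2), (3,4), ..., (21,22)
    pairs.append((23, rng.randrange(1, 23)))        # item 23 vs any of 1-22
    operational = [pair[rng.randrange(2)] for pair in pairs]  # one per pair -> 12
    new_items = []
    for item in operational:
        new_items.extend(rng.sample(parallel_bank[item], 2))  # 2 of 4 parallels
    administered = operational + new_items          # 36 items in total
    rng.shuffle(administered)                       # randomized presentation order
    return operational, new_items
```

The shuffle at the end corresponds to the randomization of presentation order mentioned above.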

Performance on the 12 original operational items in each administration became the
basis for score estimation of respondents with the IRT model. Estimated scores were
provided in feedback to the respondents.

Performance on the original operational items was then used to calibrate the new
items and obtain the item parameters and demographic comparisons needed to ensure
item parallelism, reliability, and validity. For each of the original 23 Raven’s APM items,
the four most statistically equivalent items were chosen for final inclusion in the new item
bank (currently standing at 92 items). Statistical examination of the trial data (N = 1720)
found the original 23 operational items to correlate with the new parallel items at .97 or
higher, suggesting that the new and original items are highly equivalent.

Respondents completing the online Raven’s APM test are administered one item out of
the four available parallel items. This randomization reduces the chance that any two
administrations will contain an identical set of items, thereby protecting the integrity of
the assessment.

Scoring
Each item in the bank is given a precise difficulty value on a finely incremented scale
ranging from –3.000 (very easy) to +3.000 (very difficult). This value is derived from
data on the number of test takers answering the question correctly. Items are
also coded in terms of item discrimination, the extent to which an item differentiates
between high and low scorers on the test. Discrimination values range from around
.300 (reasonable discrimination) to 3.000 (high discrimination).

Traditional scoring, where a raw score is calculated based on the number of correct
items, does not take into account the difficulty or discrimination values of the items
presented, which will vary slightly from test to test. To solve this problem, a scoring
algorithm is used that takes into account the difficulty and discrimination level of each
item. Test scores are presented as ‘theta scores’, which range from –4.000 (low ability)
to +4.000 (high ability). Theta scores can be treated the same as classic raw scores
when running statistics such as comparing differences between candidates. As this is not
an easy-to-interpret scale, theta scores are converted to percentiles for the purpose of
reporting results. Theta scores are available via the data report function on the online
test platform.

As mentioned previously, theta scores take into account the difficulty level of the
items presented to test takers. In practice, this means that a test taker who answers
more of the harder items correctly will achieve a higher score than someone correctly
answering the same number of easier items. Answering a number of easier items
incorrectly will negatively impact a score, whilst answering a number of harder items
correctly will have a positive impact.

Percentiles based on the traditional number-correct method depend on the specific
set of items that have been administered. Theta scores can be calculated based on any
combination of test items. This means that for item-bank tests, where test takers are
randomly presented with 23 items from the bank, theta scores must be used.
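A minimal sketch of how a theta score can be estimated under a two-parameter IRT model is shown below. The operational scoring algorithm is not published in this manual, so this grid-search maximum-likelihood version is an assumption for illustration only; the item parameters in the usage example are invented.

```python
import numpy as np

def estimate_theta(responses, a, b):
    """Grid-search maximum-likelihood theta under a 2PL IRT model.

    responses: 0/1 vector of item scores; a, b: item discrimination and
    difficulty parameters. P(correct) = 1 / (1 + exp(-a * (theta - b))).
    Searches the reported theta range of -4.000 to +4.000.
    """
    grid = np.linspace(-4, 4, 801)
    x = np.asarray(responses, dtype=float)[None, :]
    a = np.asarray(a, dtype=float)[None, :]
    b = np.asarray(b, dtype=float)[None, :]
    theta = grid[:, None]
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))      # item response probabilities
    loglik = (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1)
    return float(grid[np.argmax(loglik)])
```

With invented parameters where the harder items also discriminate more strongly, a respondent who answers the harder items correctly obtains a higher theta than one answering the same number of easier items, as the text describes.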

Evidence of Reliability
Reliability refers to the consistency of measurements when the measurement procedure
is repeated on a population of individuals or groups an infinite number of times; that is,
the extent to which two people of the same ability, or the same person tested on
different occasions, will receive the same score (Anastasi & Urbina, 1997). This
characteristic is important because the usefulness of behavioral measurements
presupposes that individuals and groups exhibit some degree of stability in their
behavior.

Standardized testing conditions and administration settings—including online
administration with standardized instructions, scoring, example items, and a uniform
display of items—help to ensure the reliability of test scores. However, even under

standardized and controlled conditions, successive samples of behavior from the same
person are rarely identical in all pertinent respects. An individual’s performances and
responses to sets of test questions inevitably vary in their quality or character from one
occasion to another, due to an examinee trying harder, making luckier guesses, being
more alert, feeling less anxious, or enjoying better health on one occasion than another,
etc. Some individuals may exhibit less variation in their scores than others, but no
examinee is completely consistent. Because of this variation, a respondent’s obtained
score and the average score of a group will always contain at least a small amount of
measurement error.

Classical Test Theory (CTT) posits that a test score is an estimate of an individual’s
hypothetical true score, or the score an individual would receive if the test were perfectly
reliable. In actual practice, administrators do not have the luxury of administering a test
an infinite number of times, so some measurement error is to be expected. A reliable test
has a relatively small measurement error.

Reliability is traditionally expressed as a coefficient that ranges from zero to one.
The closer the reliability coefficient is to one (1.00), the more reliable the test score and
the less measurement error there is associated with it. Reliability coefficients estimated
within the CTT framework are sample-dependent, with higher reliability in heterogeneous
samples and lower reliability in groups at similar ability levels, and are affected by the
number of items; more items generate higher levels of reliability.

Evidence of reliability can be indicated by test–retest reliability (stability of test
scores over time), internal consistency (correlation between scores on individual
questions or groups of questions in a test, e.g., Cronbach’s alpha and split-half), and
parallel test or alternate form reliability (the relationship between administrations of
parallel versions of the same test).

The internal consistency reliability estimate (split half) for the total score of the
23-item fixed form version of the APM was r = .85 in the U.S. standardization sample of
N = 929. When tests are used in employment contexts, reliabilities above r = .89 are
generally considered “excellent”, r = .80–.89 “good”, r = .70–.79 “adequate”, and
below r = .70 “may have limited applicability” (U.S. Department of Labor, 1999, p. 3-3,
for guidelines on interpreting reliability coefficients). A reliability coefficient alone,
however, provides limited information about the implications that measurement error
has for the interpretation of test scores.

Evidence of the equivalence between the 36-item version and the 23-item version is
also supported in that the reliability of the two assessments was virtually the same; the
internal consistency reliability of the 36-item version was r =.83. In addition, the internal

consistency reliability of the shorter version was essentially the same (r = .82) when
calculated using the independent sample of N = 663 respondents.
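A split-half estimate of the kind reported above can be computed as follows. The manual does not state its splitting rule, so the odd-even split and Spearman-Brown step-up shown here are assumptions.

```python
import numpy as np

def split_half_reliability(responses):
    """Odd-even split-half reliability with the Spearman-Brown step-up.

    responses: (n_respondents, n_items) matrix of 0/1 item scores.
    Correlates the two half-test scores, then projects the correlation
    back to the full test length.
    """
    responses = np.asarray(responses, dtype=float)
    odd_half = responses[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
    even_half = responses[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return 2 * r_half / (1 + r_half)             # full-length estimate
```

The step-up correction is needed because the raw half-test correlation estimates the reliability of a test only half as long.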

Standard Error of Measurement


Estimates of reliability have limited applicability in the interpretation of single test scores;
therefore, the standard error of measurement (SEM) is generally more relevant than the
reliability coefficient once a measurement procedure has been adopted and interpretation
of test scores has become the user’s primary concern.

The aggregated magnitude of the SEM can be estimated, summarized, and considered in
interpretation and decision-making.

The SEM is the standard deviation (SD) of the measurement error distribution and
it is used to calculate confidence intervals. The confidence interval is a score range that,
at a specified level of probability, includes the respondent’s hypothetical “true” score that
represents the respondent’s actual ability. Because the true score is a hypothetical value
that can never be obtained—because all measurement, including testing, always involves
some measurement error—any obtained score is considered an estimate of the
respondent’s “true” score. Approximately 68% of the time, the observed score will be
within +1.0 and –1.0 SEM of the “true” score; 95% of the time, the observed score will
be within +1.96 and –1.96 SEM of the “true” score. The SEM (68% and 95%) for all
country-specific standardization samples is presented in Table 3.
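The tabled SEM values are closely reproduced by the standard CTT formula SEM = SD × √(1 − r); that this is the formula used is an assumption, supported by the match with the tabled figures. A minimal sketch:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(observed_score, sd, reliability, z=1.96):
    """Score range of +/- z * SEM around an observed score (z=1.0 for 68%)."""
    half_width = z * sem(sd, reliability)
    return observed_score - half_width, observed_score + half_width

# Australia/New Zealand row of Table 3: SD = 4.2, r_split = .78
# sem(4.2, 0.78) -> about 1.97, matching the tabled 1.96 within rounding
```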

Table 3. Descriptive statistics based on APM raw total scores, reliability, and SEM for country-specific standardization samples

Country | Period | Language | N | Mean | SD | Min | Max | Skew | Kurtosis | r_alpha | r_split | SEM_split 68% | SEM_split 95%
Australia/New Zealand | Nov 2009–Sept 2010 | English | 128 | 12.0 | 4.2 | 3 | 23 | 0.02 | –0.39 | .77 | .78 | 1.96 | 3.83
France | Nov 2009–Mar 2010 | French | 106 | 14.3 | 4.1 | 3 | 22 | –0.34 | –0.09 | .74 | .79 | 1.87 | 3.67
India | Feb–April 2010 | English | 100 | 9.5 | 4.2 | 2 | 19 | 0.19 | –0.80 | .79 | .82 | 1.79 | 3.51
Netherlands | Sept–Oct 2010 | Dutch | 103 | 13.0 | 4.5 | 2 | 22 | –0.10 | –0.64 | .81 | .83 | 1.87 | 3.67
UK | June 2009–Mar 2010 | English | 101 | 12.4 | 4.7 | 4 | 23 | 0.04 | –0.71 | .83 | .85 | 1.83 | 3.58
US | Apr–Aug 2008 | English | 175 | 12.2 | 4.1 | 2 | 23 | –0.03 | –0.13 | .77 | .81 | 1.80 | 3.54
Singapore | Aug 2013–Nov 2014 | English | 229 | 15.0 | 4.7 | 1 | 23 | –0.50 | 0.21 | .84 | .70 | 2.56 | 5.02
Sweden | Mar–Apr 2011 | Swedish | 105 | 9.7 | 4.4 | 0 | 20 | 0.16 | –0.14 | .79 | .84 | 1.77 | 3.47
Norway | Mar–Apr 2011 | Norwegian | 102 | 9.5 | 4.2 | 1 | 20 | 0.41 | –0.18 | .77 | .80 | 1.88 | 3.68
Denmark | Mar–Apr 2011 | Danish | 112 | 10.3 | 5.0 | 0 | 21 | 0.28 | –0.65 | .84 | .88 | 1.74 | 3.40
Total Scandinavia | Mar–Apr 2011 | Swedish/Norwegian/Danish | 319 | 9.8 | 4.5 | 0 | 21 | 0.28 | –0.32 | .80 | .84 | 1.84 | 3.61

Evidence of Item-Bank Reliability Based on International Data
In March 2015, a sample of graduate applicants to a higher education course in the UK
(N = 466) completed the final version of the Raven’s APM item bank assessment online.
Traditional calculation of classical-test-theory-based reliability indices such as Cronbach’s
alpha requires all test takers to complete all items. This type of analysis is not possible
with item-banked tests, because each test taker completes a different set of items.
Within IRT models, the information function provides estimates of accuracy of
measurement conditional on theta (ability). A single reliability coefficient from IRT theta
scores was estimated using the Standard Error of the theta score for each individual
based on the information function (Raju, Price, Oshima, & Nering, 2007). The results are
presented in Table 4 and indicate reliability for the Raven’s APM item bank to be above
the desired value of .70.
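One common way to obtain such a single coefficient, sketched below, is to compare the mean squared standard error of the theta scores with their observed variance; whether this matches the exact computation used here is an assumption.

```python
import numpy as np

def irt_reliability(theta_scores, standard_errors):
    """Single reliability coefficient from IRT theta scores and their SEs.

    A sketch in the spirit of Raju, Price, Oshima, & Nering (2007):
    one minus the ratio of mean error variance to observed score
    variance.
    """
    theta = np.asarray(theta_scores, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    return 1.0 - np.mean(se ** 2) / np.var(theta, ddof=1)

# With the rounded Table 4 values (SD = 0.7, SE near the 68% critical
# value of 0.35): 1 - 0.35**2 / 0.7**2 = 0.75, in the region of the
# reported 0.73.
```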

Table 4. Reliability and descriptive statistics based on Raven’s APM Item Bank
theta scores (N = 466)
Country UK
Period March 2015
Language English
Mean 0.3
SD 0.7
Min –2.14
Max 2.63
Skewness –0.04
Kurtosis 0.68
Reliability 0.73
Critical Value for 68% CI 0.35
Critical Value for 95% CI 0.69

Group Differences
It is important to consider issues of fairness, discrimination, and equal opportunity for
legal and ethical reasons in any assessment process. These issues are closely intertwined
and affect the meaning and impact of group differences in assessment scores. Local
legislation also may be a complicating factor in understanding the impact and meaning of
group differences in applied practice.

The group difference estimates presented here are not sufficient to conclude whether the
Raven’s APM is fair, unfair, or discriminatory. Any such conclusion depends on several factors:

1. All groups differ, and even if score differences appear in one circumstance, this
does not mean they will always occur.

2. The groups compared are likely not drawn from identical populations. For
example, a female-led company may be more attractive to women applicants and
therefore may receive stronger women candidates than a company with few high-level
women and a poor track record on equal opportunities. High test scores from women in
this situation may reflect the fact that the company attracts the best women rather than
any unfairness in the test.

3. Although many countries apply similar legislation regarding discrimination, there
are differences; a variable covered by anti-discrimination law in one country may not be
included in another.

4. In most countries, discrimination from a legal perspective concerns the final
decision made based on the assessment scores. Equality in single predictors (such as a
score from the Raven’s APM) does not guarantee lack of discrimination. (See Part 1,
Interpretation–Descriptive Information, for more information.)

Also consider that, in comparison to many other assessment methods, psychological
testing has advantages, such as standardized administration and scoring procedures that
minimize the impact of variation between individuals.

Results of comparing performance on the fixed 23-item version of Raven’s APM for
a range of samples and variables indicate whether group differences are likely and the
size of the differences. The difference between groups is expressed as a Cohen’s d
statistic, which expresses the difference in standard deviation units and can be
compared to results from different forms of the test or scores expressed on different
scales. Raw score differences are not comparable in the same way. Values of Cohen’s d
above 0.8 are considered large effects, above 0.5 are moderate effects, and above 0.2
are small effects (Cohen, Cohen, West, & Aiken, 2003). Below this level, values can be
considered negligible.
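Assuming the pooled-standard-deviation form of Cohen’s d, which reproduces the tabled values in this section, the statistic can be computed as:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of the two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# US row of Table 6: males 12.8 (SD 4.4, n = 2566) vs females 11.7
# (SD 4.3, n = 534): cohens_d(12.8, 4.4, 2566, 11.7, 4.3, 534) -> ~.25
```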

The most extensive analysis of group differences for the fixed form version of Raven’s
APM was done on a sample of 1836 respondents who completed the online version in
English in the UK. Respondents completed the fixed form assessment, and the means
and SDs presented for this study and the subsequent studies reported in this Group
Differences section are based on traditional number-correct raw scores. Respondents
completed the test for a number of reasons; applying for a job outside the respondent’s
current company was the most common reason (63.8%), followed by professional
development (10.0%). This sample can be classified as international because it consists
of individuals born in 82 different countries across all continents, the most frequent
being British (33.2%), Norwegian (9.3%), and Swedish (6.6%). Regarding sex, 1358
respondents (74.0%) were male and 456 respondents (24.8%) were female; 22
respondents (1.2%) chose not to

declare. Overall, the sample has a relatively high level of education: 37.3% reported
having a Master’s degree, followed by 32.7% with a Bachelor’s degree. Regarding
ethnic composition, 72.6% of the sample reported being White, followed by 14.6%
Asian; Black respondents account for less than 1% of the sample, and other ethnicities
or those unwilling to declare for 12%.

The total scores of 1836 respondents ranged from 0 to 23 points, with a mean of
13.2 and a standard deviation of 4.4. For this sample, the scale showed evidence of
adequate reliability (Cronbach’s α = 0.80) and of the expected dimensional structure. No
problems related to missing data or low response rates on any of the items were found.

Group differences on sex, age, and educational level are also presented based on an
additional set of data collected using the TalentLens digital platforms for online
administration. All data was collected using the English version of the Raven’s APM,
gathered in Australia/New Zealand, India, and the US.

Sex
For the international sample, the comparison between males and females showed no
differences. This is in line with the classical literature, which postulates the absence of
differences in fluid intelligence by sex.

Table 5. Descriptive statistics of APM raw total scores and comparisons by sex—
international sample
Females Males
Mean SD n Mean SD n d
13.2 4.6 456 13.2 4.4 1358 0.00

For the country-level data, the results show only a negligible effect based on sex for
Australia/New Zealand and a small effect for the US.

Table 6. Descriptive statistics of APM raw total scores and comparisons by sex—
Australia/New Zealand, India, and US

Male Female
Sample Mean SD n Mean SD n d
Australia/NZ 11.8 4.2 556 11.2 4.1 260 .15
India 14.4 3.9 2894 13.8 4.0 343 .14
US 12.8 4.4 2566 11.7 4.3 534 .25

Age
The literature claims that fluid reasoning ability tends to decrease somewhat with
increased age. Table 7 shows that effect sizes between pairwise comparisons of age
groups in the international sample vary from negligible to small, and effect size between
the youngest age group (16–24 years) and the oldest age group (50–59 years)
represents the largest difference. The difference of d = .85 between the youngest and
oldest age groups is consistent with previous research on matrix reasoning tests
(Salthouse, 2009).

Table 7. Descriptive statistics of APM raw total scores and age group
comparisons—international sample

Age Group Mean SD n Age Groups d
16–24 15.9 4.5 173 16–24 vs 25–29 .17
25–29 15.1 4.4 210 25–29 vs 30–34 .31
30–34 13.7 4.2 220 30–34 vs 35–39 .17
35–39 13.0 4.2 313 35–39 vs 40–44 .08
40–44 12.7 4.1 346 40–44 vs 45–49 .13
45–49 12.2 4.3 285 45–49 vs 50–59 .21
50–59 11.3 4.00 247 16–24 vs 50–59 .85

For the country-level samples, the effect of age on Raven’s APM test scores is based on
data from n = 9,233 respondents. Most of these data are the same as those used for the
analysis of sex effects. The effects of age are presented by country sample. The results
generally show that performance declines somewhat across age groups, even though the
pairwise comparisons show only negligible or small effects. The comparison between the
youngest and oldest age groups shows, on average, about 1 SD lower performance for the
oldest group, similar to previous research findings (Salthouse, 2009).

Table 8. Descriptive statistics of APM raw total scores and age group
comparisons—Australia/NZ

Age Group   Mean   SD    n     Comparison       d
21–29       13.7   4.4   31    21–29 vs 30–39   .21
30–39       12.8   4.2   132   30–39 vs 40–49   .15
40–49       12.2   4.1   310   40–49 vs 50–59   .34
50–59       10.8   4.0   292   50–59 vs 60+     .33
60+         9.6    3.6   50    21–29 vs 60+     1.02

Table 9. Descriptive statistics of APM raw total scores and age group
comparisons—India

Age Group   Mean   SD    n      Comparison       d
20–24       15.5   5.0   68     20–24 vs 25–29   .14
25–29       14.9   3.8   1163   25–29 vs 30–34   .22
30–34       14.1   3.9   1710   30–34 vs 35–39   .44
35–39       12.4   4.1   175    35–39 vs 40+    –.07
40+         12.6   4.2   53     20–24 vs 40+     .63

Table 10. Descriptive statistics of APM raw total scores and age group
comparisons—US

Age Group   Mean   SD    n      Comparison       d
16–20       14.8   4.1   398    16–20 vs 21–24   .24
21–24       13.8   4.2   410    21–24 vs 25–29   .04
25–29       13.6   4.2   577    25–29 vs 30–34   .27
30–34       12.5   4.3   821    30–34 vs 35–39   .08
35–39       12.1   4.3   1452   35–39 vs 40–49   .14
40–49       11.5   4.2   1206   40–49 vs 50+     .30
50+         10.2   4.1   385    16–20 vs 50+     1.11

Education
Effect sizes among Education levels of seven different groups among the international
sample were compared. Groups representing levels of education with a small number of
respondents were excluded due to the risk of bias in the results. The results generally
show that respondents with a higher education level scored higher on the APM compared
to respondents with a lower education level. The effect sizes indicate negligible to small
differences between groups.

Table 11. Comparisons by education level—international sample

Education                                    Mean   SD    n     Comparison    d
(1) A-level, Scottish Highers or equivalent  12.4   4.7   91    (1) vs (2)   –.05
(2) Higher Education Certificate or Diploma  12.6   4.3   163   (2) vs (3)    .02
(3) Bachelor of Arts                         12.4   4.0   237   (3) vs (4)   –.26
(4) Bachelor of Science                      13.5   4.6   272   (4) vs (5)   –.20
(5) Bachelor of Engineering                  14.4   4.1   71    (5) vs (6)    .10
(6) Master’s degree                          14.0   4.4   685   (6) vs (7)   –.14
(7) Doctorate degree                         14.6   4.2   67    (1) vs (7)   –.49

The effect of level of education on Raven’s APM test scores for the country-level
samples is based on data from 9,298 respondents. Most of these data are the same as
those used for the analyses of sex and age effects, and the effects of education are
presented by sample. Overall, the results show that performance on the Raven’s APM
increases with level of education. This is an expected finding, in line with previous
research.

Table 12. Descriptive statistics of APM raw total scores and education group
comparisons—Australia/NZ

Education                          Mean   SD    n     Comparison    d
(1) Bachelor degree                11.3   4.1   280   (1) vs (2)   –.07
(2) Graduate/Postgraduate Diploma  11.6   3.8   94    (2) vs (3)   –.17
(3) Master’s degree                12.3   4.1   280   (1) vs (3)   –.22

Table 13. Descriptive statistics of APM raw total scores and education group
comparisons—India

Education            Mean   SD    n      Comparison    d
(1) Other            12.8   4.0   228    (1) vs (2)    .20
(2) Bachelor degree  13.6   4.1   424    (2) vs (3)   –.24
(3) Master’s degree  14.6   3.8   2465   (1) vs (3)   –.45

Table 14. Descriptive statistics of APM raw total scores and education group
comparisons—US

Education                 Mean    SD     n      Comparison    d
(1) H.S. Diploma or GED   11.43   4.774  238    (1) vs (2)    0.30
(2) 1–2 years of college  10.09   4.110  146    (2) vs (3)   –0.00
(3) Associate degree      13.31   4.525  319    (3) vs (4)    0.21
(4) 3–4 years of college  12.37   4.276  1417   (4) vs (5)   –0.01
(5) Bachelor degree       12.41   4.361  1996   (5) vs (6)   –0.10
(6) Master’s degree       12.82   4.218  951    (6) vs (7)   –0.12
(7) Doctorate degree      13.34   4.459  117    (1) vs (7)    0.41

Occupation
To analyze occupations in the international sample, an exclusion criterion was applied:
only occupational groups with n ≥ 80 were analyzed, to avoid bias due to small sample
sizes. The results are presented in Table 15.

The results showed that the student group had the highest mean score, followed by
the engineers and the consultants. Human resource professionals had the lowest mean
score, followed by marketers. Students and human resource professionals showed a large
group difference of approximately 1 SD, while the other effect sizes were negligible
to small.

The high mean score of the student group could be explained by two factors. First,
the student sample is mostly composed of younger individuals, who research suggests may
perform better on the test than older respondents. Second, approximately 40% of the
students in the sample were at the Master’s degree level of education. As shown in
Table 11, the more years of schooling, the greater the chance of achieving a high score
on the APM.

Table 15. Descriptive statistics of APM raw total scores and occupational
group comparisons—international sample

Occupation                 Mean   SD    n     Comparison     d
(1) HR Professional        11.7   4.0   80    (1) vs (2)    –.30
(2) Marketing              13.0   4.1   81    (2) vs (3)    –.07
(3) Accountant             13.3   4.4   170   (3) vs (4)    –.00
(4) IT Professional        13.3   4.4   124   (4) vs (5)    –.19
(5) Financial analyst      14.1   4.3   111   (5) vs (6)    –.03
(6) Consultant             14.2   4.6   152   (6) vs (7)    –.03
(7) Engineer               14.4   3.7   108   (7) vs (8)    –.46
(8) Student                16.1   4.1   173   (1) vs (8)    –1.09
Evidence of Validity
Validity is a unitary concept and refers to the degree to which all the accumulated
evidence and theory support the intended interpretation of test scores for the
proposed use. Thus, there are different types of validity evidence (which can be
collected from different sources), rather than distinct types of validity, and it is the
interpretation of test scores required by the proposed uses that is evaluated, not the
test itself. In addition, if there is a decision to be made, it is the decision itself
that should be validated rather than a single test score. Please see the Raven’s
International Manual, Part 1, Interpretation: Predictive Information, for a more
detailed discussion. In the following, relevant evidence in support of the validity of
the Raven’s APM intended for application in work and organizational settings is
presented.

Evidence of Content Validity

Content validity refers to the relationship between test content and the construct it is
intended to measure. Test content refers to the themes, wording, and format of the
items, tasks, or questions, as well as the guidelines and procedures for administration
and scoring (Standards for Educational and Psychological Testing, 1999).

Traditionally, the APM is considered a measure of g, general intelligence, referring to
the ability to make meaning out of confusion, develop new insights, go beyond the given
to perceive what is not immediately obvious, evaluate complex information, and find
solutions to novel problems where prior knowledge cannot be applied. Jensen (1998),
however, has empirically shown that at least nine different item types (subtests) need
to be represented in a good measure of g. Developing a short test to measure this broad
and general ability can thus be problematic.

Within the work and organizational domain, however, the primary purpose of testing
is often to predict external behaviors such as job performance, rather than to fully
describe an individual with regard to g. Fortunately, convincing research has shown
that a “full” measure of g is unnecessarily extensive and expensive when the aim is to
predict, for example, job performance (Postlethwaite, 2011). In fact, factor-analytic
studies have repeatedly demonstrated that matrix items of the kind used in the Raven’s
APM are one of the best single indicators of g (e.g., Llabre, 1984; Snow et al., 1984;
Spearman, 1927a, 1927b; Vernon, 1942). The Raven’s APM therefore aims to be an
indicator of g rather than a full measure of g, and the empirical support for this aim
is substantial.

Matrices require the individual to perform mental operations that are needed when
facing new tasks and that cannot be carried out automatically. Examples include
recognition, concept formation, understanding implications, problem solving,
extrapolation, and reorganization and transformation of information (Flanagan & Ortiz,
2001). Matrices require both inductive and deductive problem solving and demand that
the individual can mentally manipulate patterns and symbols within a logical context.
In addition, the non-verbal content of the items minimizes the effect of prior knowledge
and verbal ability on the test scores, to the benefit of the individual’s intellectual
potential.

The standardized instructions, which are kept at a low readability level, and the
practice items, which include a rationale for each response alternative, administered
online under supervised conditions, serve to ensure that individuals have equal
opportunities to perform on the Raven’s APM.

Evidence of Convergent Validity


Evidence of convergent validity is provided when scores on an assessment relate to
scores on other assessments that claim to measure similar traits or constructs. Years of
previous studies on the 36-item APM version support its convergent validity (Raven,
Raven, & Court, 1998b). In a sample of 149 college applicants, APM scores correlated .56
with math scores on the American College Test (Koenig, Frey, & Detterman, 2007).
Furthermore, in a study using 104 university students, Frey and Detterman (2004)
reported that scores from the APM correlated .48 with scores on the Scholastic
Assessment Test (SAT).

Evidence of convergent validity for the current version of the APM is supported by
several findings. In a subset of 41 respondents from the standardization sample, the
revised APM scores correlated .54 with scores on the Watson-Glaser Critical Thinking
Appraisal®–Short Form (Watson & Glaser, 2006). Further, in a sample of N = 276 (see
Appendix A, Table A.3), Raven’s APM scores correlated r = .51 (p < .001) with the total
score on the Advanced Numerical Reasoning Appraisal (ANRA; a cognitive ability
assessment measuring quantitative reasoning), and in a sample of N = 307 (demographics
are presented in Appendix A, Table A.4), they correlated r = .37 (p < .001) with the
short version of the WGCTA.
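The significance levels reported for these correlations can be checked with the standard t test for a Pearson correlation, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. This is a generic sketch of that check, not the analysis the authors ran:

```python
import math

def corr_t(r, n):
    """t statistic testing H0: rho = 0 for a Pearson correlation
    computed from n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r**2))

# r = .51 with N = 276 (APM vs. ANRA total score)
t = corr_t(0.51, 276)
print(round(t, 2))  # → 9.81
# With df = 274, |t| ≈ 9.8 far exceeds the two-tailed critical
# value of about 3.33 for p < .001, consistent with the reported p.
```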

Evidence of Criterion-Related Validity


Criterion-related validity addresses the inference that individuals who score better on
an assessment will be more successful on some criterion of interest (e.g., job
performance). Criterion-related validity for general mental ability tests like the APM
is strongly supported by validity generalization. Validity generalization refers to the
extent to which inferences from accumulated criterion-related validity evidence in
previous research, both on measures of the same underlying construct and across versions
of a measure, can be generalized to a new situation.

There is abundant evidence that measures of general mental ability, such as the
APM, are significant predictors of overall performance across jobs. For example, the
Society for Industrial and Organizational Psychology’s Principles for the Validation and
Use of Personnel Selection Procedures (2003) establishes that validity generalization is
well supported for cognitive ability tests. Schmidt and Hunter (2004) provide evidence
that general mental ability “predicts
both occupational level attained and performance within one's chosen occupation and
does so better than any other ability, trait, or disposition and better than job experience”
(p. 162). Prien, Schippmann, and Prien (2003) observe that decades of research “present
incontrovertible evidence supporting the use of cognitive ability across situations and
occupations with varying job requirements” (p. 55). In addition, many other studies
provide evidence of the relationship between general mental ability and job performance
(e.g., Kolz, McFarland, & Silverman, 1998; Kuncel, Hezlett, & Ones, 2004; Ree &
Carretta, 1998; Salgado, et al., 2003; Schmidt & Hunter, 1998; Schmidt & Hunter,
2004).

In addition to inferences based on validity generalization, studies using the APM in
the past 70 years provide evidence of its criterion-related validity. For example, in a
validation study of assessment centres, Chan (1996) reported that scores on the Raven’s
Progressive Matrices correlated with ratings of participants on “initiative/creativity” (r =
.28, p < .05). Another group of researchers (Gonzalez, Thomas, & Vanyukov, 2005)
reported a positive relationship between scores on the Raven’s APM and performance in
decision-making tasks. Fay and Frese (2001) found that APM scores were “consistently
and positively associated with an increase of personal initiative over time” (p. 120).
Recently, Pearson (2010) conducted a study of 106 internal applicants for management
positions in which APM scores were positively correlated with trained assessor ratings of
“thinking, influencing, and achieving.” In addition, manager applicants scoring in the top
30% of APM scores were two to three times more likely to receive above average ratings
for the “Case Study/Presentation Exercise”, “Thinking Ability”, and “Influencing Ability”
than applicants in the bottom 30% of APM scores. In addition, the APM Manual and
Occupational User’s Guide (Raven, 1994; Raven, Raven, & Court, 1998b) provide further
information indicating that the APM predicts the ability to attain and retain jobs that
require high levels of general mental ability.

References
Anastasi, A., & Urbina, S. (1997). Psychological testing (7th ed.). Upper Saddle River,
NJ: Prentice Hall.

Brouwers, S. A., Van de Vijver, F., & Van Hemert, D. A. (2009). Variation in Raven’s
Progressive Matrices scores across time and place. Learning and Individual
Differences, 19(3), 330–338.

Chan, D. (1996). Criterion and construct validation of an assessment centre. Journal of
Occupational and Organizational Psychology, 69, 167–181.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences. London: Lawrence Erlbaum.

Fay, D., & Frese, M. (2001). The concept of Personal Initiative: An overview of validity
studies. Human Performance, 14 (1), 97–124.

Flanagan, D. P., & Ortiz, S. O. (2001). Essentials of cross-battery assessment. New
York: Wiley.

Frey, M. C., & Detterman, D. K. (2004). Scholastic assessment or g? The relationship
between the Scholastic Assessment Test and general cognitive ability. Psychological
Science, 15(6), 373–378.

Gonzalez, C., Thomas, R.P., & Vanyukov, P. (2005). The relationship between cognitive
ability and dynamic decision making. Intelligence, 33, 169–186.

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT:
Praeger.

Koenig, K. A., Frey, M. C., & Detterman, D.K. (2007). ACT and general cognitive ability.
Intelligence, doi:10.1016/j.intell.2007.03.005, 1–8.

Kolz, A. R., McFarland, L. A., & Silverman, S. B. (1998). Cognitive ability and job
experience as predictors of work performance. The Journal of Psychology, 132, 539–
548.

Kuncel, N. A., Hezlett, S. A., & Ones, D. S. (2004). Academic performance, career
potential, creativity, and job performance: Can one construct predict them all?
Journal of Personality and Social Psychology, 86, 148–161.

Llabre, M. M. (1984). Standard Progressive Matrices. In D. J. Keyser & R. C. Sweetland
(Eds.), Test critiques (Vol. 1, pp. 595–602). Kansas City, MO: Test Corporation of
America.

Postlethwaite, B. E. (2011). Fluid ability, crystallized ability, and performance across
multiple domains: A meta-analysis (Doctoral thesis, University of Iowa).

Prien, E.P., Schippmann, J.S., & Prien, K.O. (2003). Individual assessment as practiced in
industry and consulting. Mahwah, NJ: Lawrence Erlbaum.

Raju, N. S., Price, L. R., Oshima, T. C., & Nering, M. L. (2007). Standardized conditional
SEM: A case for conditional reliability. Applied Psychological Measurement, 31(3),
169–180.

Raven, J., Raven, J. C., & Court, J. H. (1998). Raven Manual: Section 4, Advanced
Progressive Matrices, 1998 edition. Oxford, UK: Oxford Psychologists Press.

Ree, M. J., & Carretta, T. R. (1998). General cognitive ability and occupational
performance. In C. L. Cooper & I. T. Robertson (Eds.), International review of
industrial and organizational psychology (Vol. 13, pp. 159–184). Chichester, UK:
Wiley.

Salgado, J., Anderson, N., Moscoso, S., Bertua, C., & de Fruyt, F. (2003). International
validity generalisation of GMA and cognitive abilities. Personnel Psychology, 56,
573–605.

Salthouse, T. A. (2009). When does age-related cognitive decline begin? Neurobiology of
Aging, 30, 507–514.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in
personnel psychology: Practical and theoretical implications of 85 years of research
findings. Psychological Bulletin, 124, 262–274.

Schmidt, F. L., & Hunter, J. E. (2004). General mental ability in the world of work:
Occupational attainment and job performance. Journal of Personality and Social
Psychology, 86, 162–173.

Society for Industrial and Organizational Psychology (Division 14 of the American
Psychological Association). (2003). Principles for the validation and use of personnel
selection procedures (4th ed.). Bowling Green, OH: Society for Industrial and
Organizational Psychology.

Snow, R. E., Kyllonen, P. C., & Marshalek, B. (1984). The topography of ability and
learning correlations. In R. J. Sternberg (Ed.), Advances in the psychology of human
intelligence (Vol. 2, pp. 47–103). New York: Macmillan.

Spearman, C. (1927a). The abilities of man. London, United Kingdom: Macmillan.

Spearman, C. (1927b). The nature of “intelligence” and the principles of cognition (2nd
ed.). London, United Kingdom: Macmillan.

Standards for educational and psychological testing (1999). American Educational
Research Association, American Psychological Association, & National Council on
Measurement in Education. Washington, DC: American Educational Research Association.

U.S. Department of Labor (1999). Testing and assessment: An employer’s guide to good
practices. Washington DC: U.S. Department of Labor.

Vernon, P. E. (1942). The reliability and validity of the Progressive Matrices Test. London,
United Kingdom: Admiralty Report, 14(b).

Watson, G., & Glaser, E. M. (2006). Watson-Glaser Critical Thinking Appraisal, Short
Form manual. San Antonio, TX: Pearson.

Wongupparaj, P., Kumari, V., & Morris, R. G. (2015). A cross-temporal meta-analysis of
Raven’s Progressive Matrices: Age groups and developing versus developed countries.
Intelligence, 49, 1–9.

Appendix A Demographics of U.S. Sample

Table A.1 U.S. Sample by Occupation and Organization Level (N = 929)


Occupation N % of Sample
Accountant; Auditor 8 0.86
Admin Assistant; Secretary; Office support 8 0.86
Architect 1 0.11
Attorney 2 0.22
Bank Teller 1 0.11
Consultant 42 4.52
Customer Service Representative 5 0.54
Engineer 45 4.84
Financial Analyst 23 2.48
Food Service 2 0.22
Human Resources Occupations 28 3.01
Information Technology Occupations 74 7.97
Installation/Maintenance/Repair 2 0.22
Loan Officer 1 0.11
MD, DO, DDS, etc. 2 0.22
Medical Dental Assistant 1 0.11
Nurse 1 0.11
Psychologist 4 0.43
Sales Representative (Non-Retail) 35 3.77
Sales Representative (Retail) 4 0.43
Skilled Tradesperson 1 0.11
Teaching Occupations 5 0.54
Other 170 18.30
Not Applicable 295 31.75
No Response 169 18.19
Total 929 100.00
Organizational Level
Executive; Director 147 15.82
Manager 167 17.98
Supervisor 14 1.51
Professional/Individual Contributor 143 15.39
Hourly/Entry Level 31 3.34
Blue Collar 5 0.54
Self-Employed/Business Owner 2 0.22
Not Applicable 251 27.02
No Response 169 18.19
Total 929 100.00

Table A.2 Demographics of U.S. Sample Used to Calculate Item
Difficulties
Industry N % of Sample
Aerospace, Aviation 2 0.30
Arts, Entertainment, Media 6 0.90
Construction 13 1.96
Education 40 6.03
Energy, Utilities 7 1.06
Financial Services, Banking, Insurance 194 29.26
Government, Public Service, Defense 7 1.06
Health Care 23 3.47
Hospitality, Tourism 6 0.90
Information Technology, High-Tech, Telecommunications 36 5.43
Manufacturing & Production 96 14.48
Natural Resources, Mining 1 0.15
Pharmaceuticals, Biotechnology 64 9.65
Professional, Business Services 32 4.83
Publishing, Printing 3 0.45
Real Estate 7 1.06
Retail & Wholesale 35 5.28
Transportation Warehousing 10 1.51
Other 68 10.26
Not Applicable 13 1.96
Total 663 100.00
Position
Not Applicable 79 11.92
Executive 65 9.80
Director 65 9.80
Manager 134 20.21
Professional/Individual Contributor 264 39.82
Supervisor 10 1.51
Self-Employed/Business Owner 4 0.60
Administrative/Clerical 21 3.17
Skilled Trades 3 0.45
Customer Service/Retail Sales 16 2.41
General Labor 2 0.30
Total 663 100.00
Sex
Male 392 59.13
Female 169 25.49
No Response 102 15.38
Total 663 100.00

(Table continues on next page.)

Table A.2—continued
Ethnicity N % of Sample
White (Non-Hispanic) 334 50.38
Black, African American 34 5.13
Hispanic, Latino/a 25 3.77
Asian/Pacific Islander 142 21.42
Other 6 0.90
Multiracial 10 1.51
No Response 112 16.89
Total 663 100.00
Age N % of Sample
21–24 57 8.60
25–29 149 22.47
30–34 108 16.29
35–39 71 10.71
40–49 128 19.31
50–59 32 4.83
60–69 5 0.75
No Response 113 17.04
Total 663 100.00
Education Level N % of Sample
HS/GED 3 0.45
1–2 yrs college 4 0.60
Assoc. 4 0.60
3–4 yrs college 14 2.11
Bachelor’s degree 234 35.29
Master’s degree 261 39.37
Doctorate 35 5.28
No Response 108 16.29
Total 663 100.00

Table A.3 Demographics of U.S. Sample Used to Calculate
Correlations Between APM and ANRA (RANRA in UK)
Industry N % of Sample
Aerospace, Aviation 1 0.36
Arts, Entertainment, Media 1 0.36
Construction 1 0.36
Education 31 11.23
Energy, Utilities 1 0.36
Financial Services, Banking, Insurance 152 55.07
Government, Public Service, Defense 5 1.81
Health Care 5 1.81
Hospitality, Tourism 2 0.72
Information Technology, High-Tech, Telecommunications 22 7.97
Manufacturing & Production 5 1.81
Pharmaceuticals, Biotechnology 6 2.17
Professional, Business Services 12 4.35
Real Estate 3 1.09
Retail & Wholesale 8 2.90
Transportation Warehousing 3 1.09
Other 11 3.99
Not Applicable 1 0.36
No Response 6 2.17
Total 276 100.00
Position
Not Applicable 36 13.04
Executive 5 1.81
Director 6 2.17
Manager 36 13.04
Professional/Individual Contributor 164 59.42
Supervisor 5 1.81
Self-Employed/Business Owner 3 1.09
Administrative/Clerical 10 3.62
Skilled Trades 2 0.72
Customer Service/Retail Sales 3 1.09
No Response 6 2.17
Total 276 100.00
Sex
Male 160 57.97
Female 63 22.83
No Response 53 19.20
Total 276 100.00

(Table continues on next page.)

Table A.3—continued
Ethnicity N % of Sample
White (Non-Hispanic) 70 25.36
Black, African American 16 5.80
Hispanic, Latino/a 8 2.90
Asian/Pacific Islander 116 42.03
Other 1 0.36
Multiracial 5 1.81
No Response 60 21.74
Total 276 100.00
Age
21–24 31 11.23
25–29 79 28.62
30–34 61 22.10
35–39 20 7.25
40–49 18 6.52
50–59 5 1.81
60–69 2 0.72
No Response 60 21.74
Total 276 100.00
Education Level
HS/GED 1 0.36
3–4 yrs college 6 2.17
Bachelor’s degree 69 25.00
Master’s degree 128 46.38
Doctorate 18 6.52
No Response 54 19.57
Total 276 100.00

Table A.4 Industries Used to Calculate Correlations
Between APM and Watson–Glaser Short Form (U.S. Sample)

Industry N % of Sample
Aerospace, Aviation 1 0.33
Arts, Entertainment, Media 1 0.33
Construction 1 0.33
Education 31 10.10
Energy, Utilities 1 0.33
Financial Services, Banking, Insurance 159 51.79
Government, Public Service, Defense 5 1.63
Health Care 6 1.95
Hospitality, Tourism 2 0.65
Information Technology, High-Tech, Telecommunications 23 7.49
Manufacturing & Production 26 8.47
Pharmaceuticals, Biotechnology 6 1.95
Professional, Business Services 15 4.89
Real Estate 3 0.98
Retail & Wholesale 12 3.91
Transportation Warehousing 3 0.98
Other 11 3.58
Not Applicable 1 0.33
Total 307 100.00
Position
Not Applicable 36 11.73
Executive 24 7.82
Director 14 4.56
Manager 36 11.73
Professional/Individual Contributor 171 55.70
Supervisor 5 1.63
Self–Employed/Business Owner 3 0.98
Administrative/Clerical 11 3.58
Skilled Trades 2 0.65
Customer Service/Retail Sales 5 1.63
Total 307 100.00
Sex
Male 186 60.59
Female 71 23.13
No Response 50 16.29
Total 307 100.00

(Table continues on next page.)


Table A.4—continued
Ethnicity N % of Sample
White (Non-Hispanic) 95 30.94
Black, African American 22 7.17
Hispanic, Latino/a 9 2.93
Asian/Pacific Islander 117 38.11
Other 1 0.33
Multiracial 5 1.63
No Response 58 18.89
Total 307 100.00
Age Range
21–24 32 10.42
25–29 82 26.71
30–34 64 20.85
35–39 24 7.82
40–49 36 11.73
50–59 9 2.93
60–69 3 0.98
No Response 57 18.57
Total 307 100.00
Education Level
HS/GED 2 0.65
3–4 yrs college 7 2.28
Bachelor’s degree 82 26.71
Master’s degree 144 46.91
Doctorate 21 6.84
No Response 51 16.61
Total 307 100.00

Appendix B Demographics of International Sample

Table B.1 Australia/New Zealand Sample Demographic Information


N % of Sample
Education Level 128 100.00
Year 11 or equivalent 2 1.56
Year 12 or equivalent 4 3.13
Certificate III / IV 2 1.56
Diploma 3 2.34
Advanced Diploma 2 1.56
Bachelor’s degree 28 21.88
Graduate Certificate 1 0.78
Graduate/Postgraduate Diploma 20 15.63
Master’s degree 47 36.72
Doctorate 3 2.34
Other 4 3.13
Not Reported 12 9.38
Sex
Female 61 47.66
Male 55 42.97
Not Reported 12 9.38
Age
21–24 1 0.78
25–29 9 7.03
30–34 19 14.84
35–39 25 19.53
40–49 28 21.88
50–59 29 22.66
60–69 5 3.91
Not Reported 12 9.38
Years in Occupation
<1 year 18 14.06
1 year to less than 2 years 6 4.69
2 years to less than 4 years 27 21.09
4 years to less than 7 years 17 13.28
7 years to less than 10 years 14 10.94
10 years to less than 15 years 14 10.94
≥15 years 18 14.06
Not Reported 14 10.94

Table B.2 France Sample Demographic Information
N % of Sample
Education Level 100 100.00
11 years (1ière, CAP–BEP) 1 1.00
13–14 years (Bac +1 et 2) 7 7.00
15–16 years (Bac +2 et 3) 11 11.00
17–18 years (Bac + 4 et 5) 74 74.00
More than 18 years (Doctorate) 4 4.00
Not Reported 3 3.00
Sex
Female 51 51.00
Male 49 49.00
Age
21–24 13 13.00
25–29 25 25.00
30–34 17 17.00
35–39 10 10.00
40–49 18 18.00
50–59 15 15.00
60–69 1 1.00
Not Reported 1 1.00

Table B.3 India Sample Demographic Information
N % of Sample
Education Level 100 100.00
Bachelor’s degree 49 49.00
Master’s degree 39 39.00
Doctorate 2 2.00
Other 6 6.00
Not Reported 4 4.00
Sex
Female 22 22.00
Male 78 78.00
Age
20–24 2 2.00
25–29 27 27.00
30–34 33 33.00
35–39 17 17.00
40–44 10 10.00
45–49 4 4.00
Not Reported 7 7.00

Table B.4 Netherlands Sample Demographic Information
N % of Sample
Education Level 103 100.00
<12 (mbo niveau 4) 4 3.88
12 (havo) 2 1.94
13 (vwo) 4 3.88
14 (hbo) 50 48.54
15 (wo) 31 30.10
Not Reported 12 11.65
Sex
Female 33 32.04
Male 59 57.28
Not Reported 11 10.68
Age
21–24 1 0.97
25–29 7 6.80
30–34 6 5.83
35–39 23 22.33
40–49 44 42.72
50–59 11 10.68
Not Reported 11 10.68
Years in Occupation
1 year to less than 2 years 1 0.97
2 years to less than 4 years 5 4.85
4 years to less than 7 years 2 1.94
7 years to less than 10 years 9 8.74
10 years to less than 15 years 19 18.45
≥15 years 56 54.37
Not Reported 11 10.68

Table B.5 Scandinavian Sample Demographic Information

N % of Sample
Education Level 319 100.00
Elementary School 17 5.33
Gymnasium/High School 77 24.14
University (up to 3 years) 109 34.17
University (more than 3 years) 116 36.36
Sex
Female 154 48.28
Male 165 51.72
Age
21–24 4 1.25
25–29 36 11.29
30–34 18 5.64
35–39 18 5.64
40–49 86 26.96
50–59 79 24.76
60–69 68 21.32
Years in Occupation
<1 year 7 2.19
1 year to less than 2 years 44 13.79
2 years to less than 4 years 53 16.61
4 years to less than 7 years 75 23.51
7 years to less than 10 years 33 10.34
10 years to less than 15 years 49 15.36
≥15 years 58 18.18

Table B.6 Singapore Sample Demographic Information

N % of Sample
Education Level 229 100.00
A-level, Scottish Highers or equivalent 51 22.27
BA 24 10.48
BEng 7 3.06
BSc 34 14.85
Doctorate 2 0.87
GCSE or equivalent 57 24.89
Higher Education Certificate or Diploma 13 5.68
LLB 4 1.75
Master’s degree 21 9.17
No formal qualification 1 0.44
Other 11 4.80
Not Reported 4 1.75
Sex
Female 90 39.30
Male 137 59.83
Not Reported 2 0.87
Age
16–19 years 79 34.50
20–24 years 63 27.51
25–29 years 28 12.23
30–34 years 15 6.55
35–39 years 13 5.68
40–44 years 12 5.24
45–49 years 12 5.24
50–54 years 1 0.44
55–59 years 4 1.75
Not Reported 2 0.87
Years in Occupation
Less than 1 year 43 18.78
1 to 2 years 45 19.65
3 to 4 years 20 8.73
5 to 7 years 15 6.55
8 to 10 years 13 5.68
11 to 15 years 63 27.51
16 to 20 years 18 7.86
21 to 25 years 4 1.75
Not Reported 8 3.49

Table B.7 UK Sample Demographic Information
N % of Sample
Education Level 101 100.00
GCSE equivalent 3 2.97
A-level, Scottish Highers or equivalent 1 0.99
Higher Education Certificate or Diploma 12 11.88
BA 29 28.71
BSc 16 15.84
BEd 1 0.99
LLB 3 2.97
Master’s degree 26 25.74
Doctorate 2 1.98
Other 7 6.93
Not Reported 1 0.99
Sex
Female 46 45.54
Male 46 45.54
Not Reported 9 8.91
Age
16–19 4 3.96
20–24 1 0.99
25–29 10 9.90
30–34 28 27.72
35–39 31 30.69
40–44 12 11.88
45–49 5 4.95
50–54 2 1.98
55–59 2 1.98
Not Reported 6 5.94
Years in Occupation
<1 year 8 7.92
1–2 years 18 17.82
3–4 years 17 16.83
5–7 years 19 18.81
8–10 years 16 15.84
11–15 years 15 14.85
16–20 years 6 5.94
20+ years 2 1.98

Table B.8 U.S. Sample Demographic Information
N % of Sample
Education Level 342 100.00
8th–11th Grade 1 0.29
HS/GED 17 4.97
1–2 yrs of College 31 9.06
Associates 16 4.68
3–4 yrs of College 15 4.39
Bachelor’s degree 168 49.12
Master’s degree 88 25.73
Doctorate 4 1.17
Not Reported 2 0.58
Sex
Female 103 30.12
Male 238 69.59
Not Reported 1 0.29
Age
16–20 2 0.58
21–24 7 2.05
25–29 21 6.14
30–34 54 15.79
35–39 67 19.59
40–49 121 35.38
50–59 62 18.13
60–69 4 1.17
Not Reported 4 1.17
Years in Occupation
<1 year 18 5.26
1 year to less than 2 years 29 8.48
2 years to less than 4 years 51 14.91
4 years to less than 7 years 43 12.57
7 years to less than 10 years 42 12.28
10 years to less than 15 years 73 21.35
>15 years 83 24.27
Not Reported 3 0.88

Appendix C Raven’s APM Item Analyses for International
Sample

Table C.1 Australia/New Zealand


Item #   Difficulty (b) Parameter (IRT)   Item–Ability Correlation (IRT)   Discrimination (a) Parameter (IRT)   Difficulty Index (p value, CTT)   Item–Total Correlation (CTT)
1 –3.38 0.25 1.03 0.95 0.22
2 –3.05 0.37 1.11 0.94 0.33
3 –1.82 0.43 1.12 0.83 0.37
4 –1.51 0.40 1.05 0.80 0.34
5 –1.24 0.30 0.85 0.76 0.22
6 –0.55 0.44 1.07 0.64 0.36
7 –1.24 0.44 1.10 0.76 0.36
8 0.33 0.41 0.91 0.46 0.31
9 0.05 0.46 1.12 0.52 0.38
10 –0.14 0.46 1.13 0.56 0.38
11 0.05 0.41 0.91 0.52 0.31
12 0.36 0.31 0.53 0.46 0.21
13 0.48 0.43 1.00 0.43 0.33
14 0.68 0.42 0.99 0.39 0.33
15 1.15 0.39 0.93 0.31 0.28
16 1.01 0.35 0.80 0.33 0.24
17 0.80 0.49 1.21 0.37 0.42
18 0.17 0.42 0.96 0.50 0.32
19 1.06 0.37 0.86 0.32 0.27
20 1.28 0.41 0.99 0.28 0.32
21 0.64 0.42 0.98 0.40 0.33
22 1.24 0.42 1.00 0.29 0.31
23 3.63 0.43 1.09 0.05 0.31
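The IRT columns above come from a two-parameter logistic (2PL) model, in which an item's difficulty (b) and discrimination (a) jointly determine the probability of a correct response for an examinee at a given ability level θ. As a minimal illustrative sketch (the function name is ours; the item values are Item 23 from Table C.1):

```python
import math

def p_correct(theta, a, b):
    """2PL IRT model: probability that an examinee with ability theta
    answers an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Item 23 (Australia/New Zealand): b = 3.63, a = 1.09.
# An examinee of average ability (theta = 0) rarely succeeds:
print(round(p_correct(0.0, 1.09, 3.63), 2))  # 0.02
# When ability equals item difficulty, success probability is 0.5
# by definition of the b parameter:
print(p_correct(3.63, 1.09, 3.63))  # 0.5
```

Note how the model prediction at average ability (about .02) is of the same order as the item's observed p value of .05 in the CTT column.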
Table C.2 France
Item #   Difficulty b (IRT)   Item–Ability Correlation (IRT)   Discrimination a (IRT)   Difficulty Index p (CTT)   Item–Total Correlation (CTT)
1 –3.38 0.05 0.90 0.97 0.00
2* — — — — —
3 –1.04 0.34 0.97 0.80 0.25
4 –1.97 0.32 1.03 0.90 0.26
5 –0.76 0.39 1.03 0.76 0.32
6 –0.45 0.15 0.45 0.71 0.02
7 –0.70 0.32 0.90 0.75 0.23
8 –0.70 0.45 1.13 0.75 0.37
9 –0.07 0.44 1.10 0.64 0.34
10 –0.63 0.49 1.21 0.74 0.43
11 0.34 0.38 0.88 0.56 0.26
12 –0.17 0.48 1.21 0.66 0.40
13 –0.07 0.25 0.55 0.64 0.13
14 0.49 0.50 1.32 0.53 0.41
15 0.98 0.40 0.92 0.43 0.27
16 1.18 0.42 0.99 0.39 0.30
17 0.04 0.52 1.35 0.62 0.44
18 0.34 0.57 1.56 0.56 0.50
19 0.78 0.24 0.31 0.47 0.13
20 0.93 0.45 1.08 0.44 0.32
21 0.88 0.43 1.03 0.45 0.32
22 0.93 0.48 1.20 0.44 0.38
23 3.06 0.32 0.98 0.12 0.24
*All examinees in the French sample answered Item 2 correctly; with no variance in the item responses, the IRT and CTT values could not be estimated.
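The footnote above reflects a general limitation of both frameworks: when every examinee answers an item identically, the item score has zero variance, so the item–total correlation has a zero denominator, and the IRT difficulty has no incorrect responses to anchor it. A small sketch of the variance check (toy data, not the French sample):

```python
import statistics

# Toy responses from five examinees (1 = correct, 0 = incorrect).
# The first item is answered correctly by everyone, as Item 2 was in
# the French sample; the second item shows normal spread.
constant_item = [1, 1, 1, 1, 1]
varying_item = [1, 0, 1, 0, 1]

# A Pearson item-total correlation divides by the item's standard
# deviation, so a zero-variance item makes the statistic undefined.
print(statistics.pstdev(constant_item))  # 0.0 -- correlation undefined
print(statistics.pstdev(varying_item))   # ~0.49 -- estimable
```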
Table C.3 India
Item #   Difficulty b (IRT)   Item–Ability Correlation (IRT)   Discrimination a (IRT)   Difficulty Index p (CTT)   Item–Total Correlation (CTT)
1 –2.85 0.39 1.05 0.86 0.30
2 –2.27 0.53 1.21 0.79 0.43
3 –1.85 0.54 1.21 0.73 0.46
4 –2.05 0.43 1.01 0.76 0.31
5 –1.49 0.53 1.17 0.67 0.43
6 –1.43 0.52 1.13 0.66 0.42
7 –1.21 0.46 0.96 0.62 0.34
8 0.12 0.40 0.83 0.37 0.31
9 –0.57 0.57 1.29 0.50 0.48
10 –0.57 0.52 1.14 0.50 0.43
11 0.64 0.55 1.26 0.28 0.48
12 0.01 0.38 0.73 0.39 0.27
13 0.52 0.43 0.98 0.30 0.35
14 1.42 0.29 0.88 0.17 0.22
15 1.34 0.33 0.93 0.18 0.25
16 1.59 0.23 0.83 0.15 0.15
17 0.17 0.40 0.82 0.36 0.30
18 0.76 0.30 0.74 0.26 0.20
19 0.52 0.51 1.16 0.30 0.43
20 0.83 0.45 1.07 0.25 0.37
21 1.42 0.32 0.94 0.17 0.26
22 1.03 0.35 0.91 0.22 0.26
23 3.94 0.01 0.90 0.02 –0.02
Table C.4 The Netherlands
Item #   Difficulty b (IRT)   Item–Ability Correlation (IRT)   Discrimination a (IRT)   Difficulty Index p (CTT)   Item–Total Correlation (CTT)
1 –2.95 0.29 1.02 0.94 0.24
2 –2.76 0.38 1.09 0.93 0.31
3 –1.77 0.50 1.19 0.85 0.44
4 –1.59 0.40 1.03 0.83 0.34
5 –1.13 0.24 0.68 0.78 0.16
6 –0.46 0.47 1.06 0.67 0.39
7 –1.00 0.33 0.82 0.76 0.23
8 0.32 0.43 0.87 0.52 0.34
9 0.17 0.50 1.12 0.55 0.42
10 –0.58 0.51 1.19 0.69 0.44
11 0.17 0.53 1.23 0.55 0.46
12 0.67 0.44 0.89 0.46 0.35
13 0.17 0.48 1.03 0.55 0.39
14 0.37 0.48 1.04 0.51 0.39
15 1.42 0.47 1.04 0.32 0.37
16 0.88 0.36 0.66 0.42 0.26
17 0.98 0.50 1.11 0.40 0.41
18 –0.14 0.51 1.15 0.61 0.43
19 0.77 0.59 1.38 0.44 0.52
20 1.30 0.40 0.84 0.34 0.30
21 0.98 0.45 0.94 0.40 0.38
22 1.19 0.40 0.84 0.36 0.30
23 2.99 0.28 0.92 0.12 0.20
Table C.5 Scandinavia
Item #   Difficulty b (IRT)   Item–Ability Correlation (IRT)   Difficulty Index p (CTT)   Item–Total Correlation (CTT)
1 –2.62 0.47 0.85 0.22
2 –2.47 0.48 0.84 0.33
3 –1.13 0.55 0.64 0.37
4 –1.06 0.53 0.62 0.34
5 –0.86 0.33 0.59 0.22
6 –0.07 0.44 0.43 0.36
7 –0.99 0.37 0.61 0.36
8 –0.07 0.49 0.43 0.31
9 –0.15 0.58 0.45 0.38
10 –0.33 0.45 0.48 0.38
11 0.73 0.37 0.29 0.31
12 –0.30 0.36 0.48 0.21
13 0.01 0.47 0.42 0.33
14 0.31 0.48 0.36 0.33
15 0.77 0.36 0.28 0.28
16 0.87 0.27 0.27 0.24
17 0.09 0.44 0.40 0.42
18 0.28 0.47 0.37 0.32
19 0.79 0.44 0.28 0.27
20 0.69 0.39 0.29 0.32
21 1.45 0.39 0.18 0.33
22 1.16 0.43 0.22 0.31
23 2.89 0.25 0.06 0.31
Table C.6 United Kingdom
Item #   Difficulty b (IRT)   Item–Ability Correlation (IRT)   Discrimination a (IRT)   Difficulty Index p (CTT)   Item–Total Correlation (CTT)
1 –3.12 0.26 1.03 0.94 0.22
2 –2.94 0.33 1.09 0.93 0.31
3 –2.27 0.36 1.06 0.88 0.32
4 –1.06 0.40 0.93 0.73 0.34
5 –1.47 0.32 0.86 0.79 0.25
6 –1.06 0.36 0.83 0.73 0.29
7 –0.93 0.38 0.85 0.71 0.31
8 0.03 0.55 1.25 0.54 0.50
9 –0.40 0.58 1.37 0.62 0.54
10 –0.08 0.42 0.78 0.56 0.33
11 0.45 0.53 1.17 0.46 0.47
12 0.51 0.39 0.61 0.45 0.31
13 0.51 0.54 1.18 0.45 0.46
14 0.56 0.58 1.34 0.44 0.52
15 1.00 0.54 1.16 0.36 0.46
16 1.42 0.48 0.99 0.29 0.39
17 0.45 0.52 1.14 0.46 0.45
18 0.24 0.52 1.13 0.50 0.46
19 0.67 0.45 0.86 0.42 0.38
20 1.35 0.37 0.68 0.30 0.26
21 1.35 0.42 0.82 0.30 0.30
22 1.06 0.52 1.11 0.35 0.44
23 3.72 0.39 0.99 0.06 0.29
Table C.7 United States
Item #   Difficulty b (IRT)   Item–Ability Correlation (IRT)   Discrimination a (IRT)   Difficulty Index p (CTT)   Item–Total Correlation (CTT)
1 –3.11 0.31 1.06 0.94 0.26
2 –2.47 0.39 1.10 0.90 0.34
3 –1.90 0.47 1.17 0.85 0.42
4 –1.05 0.41 1.03 0.74 0.33
5 –1.52 0.29 0.89 0.80 0.20
6 –0.84 0.31 0.79 0.70 0.21
7 –1.39 0.36 0.98 0.79 0.29
8 0.04 0.51 1.32 0.53 0.43
9 –0.61 0.50 1.25 0.66 0.44
10 –0.46 0.43 1.04 0.63 0.35
11 0.04 0.45 1.09 0.53 0.36
12 0.18 0.31 0.53 0.51 0.21
13 0.64 0.42 0.98 0.41 0.33
14 0.61 0.48 1.21 0.42 0.40
15 1.19 0.42 1.00 0.31 0.31
16 1.29 0.42 0.99 0.29 0.31
17 0.73 0.42 0.98 0.40 0.34
18 0.53 0.38 0.80 0.44 0.27
19 1.23 0.48 1.16 0.30 0.40
20 1.13 0.33 0.74 0.32 0.23
21 1.10 0.43 1.03 0.33 0.33
22 1.39 0.32 0.78 0.28 0.19
23 3.24 0.36 1.00 0.07 0.23
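The CTT columns in these tables are straightforward to compute from a 0/1 response matrix: the difficulty index p is the proportion of examinees answering the item correctly, and the item–total correlation is the Pearson r between item scores and total scores. A minimal sketch on a hypothetical five-examinee, three-item matrix (this uses the uncorrected item–total correlation; a corrected form excluding the item from the total is also common):

```python
# Hypothetical 0/1 response matrix: rows are examinees, columns items.
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 1],
]

n = len(responses)
n_items = len(responses[0])

# CTT difficulty index: proportion of examinees answering correctly.
p_values = [sum(row[j] for row in responses) / n for j in range(n_items)]
print([round(p, 2) for p in p_values])  # [0.8, 0.8, 0.4]

# Item-total correlation: Pearson r between item and total scores.
totals = [sum(row) for row in responses]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = [round(pearson([row[j] for row in responses], totals), 2)
     for j in range(n_items)]
print(r)  # [0.56, 0.56, 0.91]
```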
Appendix D Demographics of Concerto Sample (N = 1720)

N % of Sample
Age 1720 100.00
<18 8 0.47
18–20 194 11.28
21–30 1014 58.95
31–40 265 15.41
41–50 135 7.85
51–60 81 4.71
>60 23 1.34
Sex
Female 830 48.26
Male 856 49.77
Other 7 0.41
Unspecified 27 1.57
Ethnicity
Asian/Pacific Islander 1111 64.59
Black/African American 33 1.92
Hispanic/Latino 36 2.09
Native American/American Indian 4 0.23
White 489 28.43
Others 47 2.73
Education Level
No schooling completed 31 1.80
Nursery school to 8th grade 19 1.10
Some high school, no diploma 106 6.16
High school graduate, diploma or equivalent 303 17.62
Associate degree 84 4.88
Bachelor’s degree 727 42.27
Master’s degree 316 18.37
Professional degree 42 2.44
Doctorate degree 92 5.35