33 - Selection Tests
Learning outcomes
On completing this chapter you should be able to define these key concepts.
You should also know about:
Introduction
Selection tests are used to provide valid and reliable evidence of levels of abilities, intelligence,
personality characteristics, aptitudes and attainments. They typically supplement the information
obtained from an interview.
Selection tests can be divided into two broad categories: measures of typical performance such
as personality inventories that do not have right or wrong answers, and measures of maximum
performance that measure how well people can do things, how much they know and the level
of their ability, and ask questions for which there are right or wrong or good or bad answers.
The latter category can focus on what people are capable of knowing or doing (ability tests) or
what they actually know or can do (aptitude or attainment tests).
In this chapter, a distinction is made between psychological or psychometric tests, which
measure or assess intelligence or personality, and aptitude tests, which are occupational or job-related
tests that assess the extent to which people can do the work. These are dealt with in the
first two sections of this chapter. Before using any type of test it is necessary to be aware of the
characteristics of a good test and methods of interpreting test results, and these considerations
are examined in the following two sections. The chapter concludes with sections dealing with
choosing tests, using them in a selection procedure and guidelines on their use.
Psychological tests
Psychological tests use systematic and standardized procedures to measure differences in individual
characteristics such as intelligence and personality. They enable selectors to gain a
greater understanding of candidates to help in predicting the extent to which they will be successful
in a job. Psychological tests are measuring instruments, which is why they are often
referred to as psychometric tests. ‘Psychometric’ literally means mental measurement. For
selection purposes, the main types of tests are those used for measuring intelligence and ability
and those concerned with assessing personality characteristics.
Intelligence tests
Intelligence tests measure a range of mental abilities which enable a person to succeed at a
variety of intellectual tasks using the faculties of abstract thinking and reasoning. They are
concerned with general intelligence (termed ‘g’ by Spearman, 1927, one of the pioneers of
intelligence testing) and are sometimes called ‘general mental ability’ (GMA) tests. Intelligence
tests measure abilities while cognitive tests measure an individual’s learning in a specific subject
area. The meta-analysis conducted by Schmidt and Hunter (1998) showed that intelligence
tests had high predictive validity. In fact, when combined with a structured interview, they had
the highest predictive value of all the methods of selection they studied.
Intelligence tests contain questions, problems and tasks. The outcome of a test can be expressed
as a score that can be compared with the scores of members of the population as a whole or
the population of the whole or part of the organization using the test (norms).
The outcome of an intelligence test may sometimes be recorded as an intelligence quotient
(IQ), which is the ratio of an individual’s mental age, as measured by an intelligence test, to
the individual’s actual age, multiplied by 100. When the mental and actual age correspond, the
IQ is 100. Scores above 100 indicate that the individual’s level of intelligence is above the norm
for his or her age, and vice versa. It is usual now for IQs to be directly computed as an IQ test
score. It is assumed that intelligence is distributed normally throughout the population; that is,
the frequency distribution of intelligence corresponds with the normal curve shown in Figure 33.1.
[Figure 33.1: normal curve; horizontal axis: score, marked at 60, 100 and 140]
The normal curve describes the relationship between a set of observations and measures and
the frequency of their occurrence. It indicates that for characteristics such as intelligence that
can be measured on a scale, a few people will produce extremely high or low scores and there
will be a large proportion of people in the middle. Its most important characteristic is that it
is symmetrical – there are an equal number of cases on either side of the mean, the central axis.
The normal curve is a way of expressing how scores will typically be distributed; for example,
that 60 per cent of the population are likely to get scores between x and y, 20 per cent are likely
to get scores below x and 20 per cent are likely to get more than y.
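The x and y in this example can be worked out for any normal distribution. The sketch below uses Python's standard library and assumes, purely for illustration, a scale with a mean of 100 and a standard deviation of 15. The standard deviation is not given in the text, so treat it as a placeholder.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15 (illustrative values only).
iq_scale = NormalDist(mu=100, sigma=15)

# x is the score below which 20 per cent of the population fall;
# y is the score above which 20 per cent fall.
x = iq_scale.inv_cdf(0.20)
y = iq_scale.inv_cdf(0.80)

# Proportion of the population expected to score between x and y.
middle_share = iq_scale.cdf(y) - iq_scale.cdf(x)

print(f"x = {x:.1f}, y = {y:.1f}, share between x and y = {middle_share:.2f}")
```

Under these assumptions x and y fall at roughly 87 and 113, the same distance either side of the mean, which is the symmetry the normal curve implies.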
Intelligence tests can be administered to a single individual or to a group. They can also be
completed online.
Ability tests
Ability tests establish what people are capable of knowing or doing. Although the term can
refer primarily to reasoning ability, the British Psychological Society (2007) refers to ability
tests as measuring the capacity for:
• verbal reasoning – the ability to comprehend, interpret and draw conclusions from oral
or written language;
• numerical reasoning – the ability to comprehend, interpret and draw conclusions from
numerical information;
• spatial reasoning – the ability to understand and interpret spatial relations between
objects;
• mechanical reasoning – understanding of everyday physical laws such as force and
leverage.
Personality tests
Personality tests attempt to assess the personality of candidates in order to make predictions
about their likely behaviour in a role. Personality is an all-embracing and imprecise term that
refers to the behaviour of individuals and the way it is organized and coordinated when they
interact with the environment. There are many different theories of personality and, consequently,
many different types of personality tests. These include self-report personality questionnaires
and other questionnaires that measure interests, values or work behaviour.
One of the most generally accepted ways of classifying personality is the five-factor model,
which defines the ‘big five’ key personality characteristics: openness, conscientiousness,
extraversion, agreeableness and neuroticism.
As noted by Schmidt and Hunter (1998), integrity and conscientiousness tests have fairly high
predictive validity (0.41 and 0.31 respectively).
Self-report personality questionnaires are commonly used. They usually adopt a ‘trait’
approach, defining a trait as a fairly independent but enduring characteristic of behaviour that
all people display but to differing degrees. Trait theorists identify examples of common behaviour,
devise scales to measure these, and then obtain ratings on these behaviours by people
who know each other well. These observations are analysed statistically, using the factor analysis
technique to identify distinct traits and to indicate how associated groups of traits might be
grouped loosely into ‘personality types’.
‘Interest’ questionnaires are sometimes used to supplement personality tests. They assess the
preferences of respondents for particular types of occupation and are therefore most applicable
to vocational guidance but can be helpful when selecting apprentices and trainees.
‘Value’ questionnaires attempt to assess beliefs about what is ‘desirable or good’ or what is
‘undesirable or bad’. The questionnaires measure the relative prominence of such values as
conformity, independence, achievement, decisiveness, orderliness and goal-orientation.
Personality tests can provide interesting supplementary information about candidates that is
free from the biased reactions that frequently occur in face-to-face interviews, but they have to
be used with great care. The tests should have been developed by a reputable psychologist or
test agency on the basis of extensive research and field testing and they must meet the specific
needs of the user. Advice should be sought from a member of the British Psychological Society
on what tests are likely to be appropriate.
Aptitude tests
Aptitude tests are job-specific tests designed to predict the potential an individual has to
perform tasks within a job. They typically take the form of work sample tests, which replicate
an important aspect of the actual work the candidate will have to do, such as using a keyboard
or carrying out a skilled task such as repair work. Work sample tests can be used only with
applicants who are already familiar with the task through experience or training.
Aptitude tests should be properly validated. This will be the case if a test or a ‘test battery’ (an
associated group of tests) has been obtained from a reputable test agency. Alternatively, a
special test can be devised by or for the organization to determine the aptitudes required by
means of job and skills analysis. The test is then given to employees already working on the job
and the results compared with a criterion, usually managers’ or team leaders’ ratings. If the
correlation between test and criterion is sufficiently high, the test is then given to applicants.
To validate the test further, a follow-up study of the job performance of the applicants selected
by the test is usually carried out. This is a lengthy procedure, but without it no real confidence
can be attached to the results of any aptitude test. Properly validated work sample tests have a
high level of predictive validity (0.54 according to Schmidt and Hunter, 1998). The operative
words are ‘properly validated’ – many do-it-yourself tests are worse than useless because this
has not happened.
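The comparison of test results with a criterion described above comes down to calculating a correlation. The following is a minimal sketch, assuming the Pearson correlation is used as the validity coefficient; the scores and ratings are invented for illustration and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: test scores of employees already in the job and the
# ratings given to the same employees by their managers (the criterion).
test_scores = [12, 15, 9, 18, 14, 11, 16, 10]
manager_ratings = [3, 4, 2, 5, 4, 2, 4, 3]

validity = pearson_r(test_scores, manager_ratings)
print(f"correlation between test and criterion: {validity:.2f}")
```

A sample of eight is far too small for real validation; it is used here only to keep the arithmetic visible. The follow-up study of selected applicants would repeat the same calculation with later job performance as the criterion.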
Types of validity
There are five types of validity:
1. Predictive validity – the extent to which the test correctly predicts future behaviour. To
establish predictive validity it is necessary to conduct extensive research over a period of
time. It is also necessary to have accurate measures of performance so that the prediction
can be compared with actual behaviour.
2. Concurrent validity – the extent to which a test score differentiates individuals in relation to
a criterion or standard of performance external to the test. This means comparing the test
scores of high and low performers as indicated by the criteria and establishing the degree
to which the test indicates who should fit into the high or low performance groups.
3. Content validity – the extent to which the test is clearly related to the characteristics of the
job or role for which it is being used as a measuring instrument.
4. Face validity – the extent to which the test ‘looks’ or ‘feels’ right in the sense that it is measuring
what it is supposed to measure.
5. Construct validity – the extent to which the test measures a particular construct or characteristic.
Construct validity is in effect concerned with looking at the test itself. If it is
meant to measure numerical reasoning, is that what it measures?
Measuring validity
A criterion-related approach is used to assess validity. This means selecting criteria against
which the validity of the test can be measured. These criteria must reflect ‘true’ performance at
work as accurately as possible. A single criterion is inadequate. Multiple criteria should be
used. The extent to which criteria can be contaminated by other factors should also be considered
and it should be remembered that criteria are dynamic – they will change over time.
Test validity can be expressed as a predictive validity coefficient in which 1.0 would equal
perfect correlation between test results and subsequent behaviour, while 0.0 would equal no
relationship between the test and performance. The following rule of thumb guide was produced
by Smith (1984) on whether a validity coefficient is big enough:
over 0.50, excellent
0.40–0.49, good
0.30–0.39, acceptable
less than 0.30, poor.
On the basis of the research conducted by Schmidt and Hunter (1998), only work sample tests
and intelligence tests, with coefficients of 0.54 and 0.51 respectively, are excellent.
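Smith's rule of thumb can be written as a simple lookup. This is a sketch only: the function name is hypothetical, and because the guide says 'over 0.50' without saying how a coefficient of exactly 0.50 should be rated, the code assumes it counts as excellent.

```python
def rate_validity(coefficient):
    """Label a validity coefficient using Smith's (1984) rule of thumb."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient expected between 0.0 and 1.0")
    # Assumption: exactly 0.50 is treated as 'excellent'; the guide says 'over 0.50'.
    if coefficient >= 0.50:
        return "excellent"
    if coefficient >= 0.40:
        return "good"
    if coefficient >= 0.30:
        return "acceptable"
    return "poor"


# Coefficients quoted in this chapter from Schmidt and Hunter (1998).
reported = {
    "work sample tests": 0.54,
    "intelligence tests": 0.51,
    "integrity tests": 0.41,
    "conscientiousness tests": 0.31,
}

for method, coefficient in reported.items():
    print(f"{method}: {coefficient:.2f} ({rate_validity(coefficient)})")
```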
Norms
An individual’s score in a test is not meaningful on its own. It needs to be compared with the
scores achieved by the population on whom the test was standardized – the norm or reference
group. A normative score is read from a norms table and might, for example, indicate that
someone has performed the test at a level equivalent to the top 30 per cent of the relevant
population.
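A norms table is in effect a lookup of where a raw score falls within the reference group. The sketch below assumes a small, invented norm group and a hypothetical helper function; a published test would supply its own norms table based on a much larger sample.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring at or below the given score."""
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical norm group of ten raw scores.
norm_group = [8, 10, 11, 12, 13, 14, 15, 16, 18, 20]

rank = percentile_rank(16, norm_group)

# A candidate is in the top 30 per cent if fewer than 30 per cent of the
# norm group scored higher.
in_top_30_per_cent = (100.0 - rank) < 30.0

print(f"percentile rank: {rank:.0f}, in top 30 per cent: {in_top_30_per_cent}")
```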
Criterion scores
Norms simply tell us how someone has performed a test relative to other people. A more powerful
approach is to use the relationship between test scores and an indication of what the test
is designed to measure, such as job success. This is described as a criterion measure. For
example, when the test is validated it might be established that for scores of less than 10 on a
test, 50 per cent of people would fail in the job, while the failure rate may be 35 per cent for
those who score between 10 and 15 and 20 per cent for those scoring more than 15. The score
achieved by the individual would therefore enable a prediction to be made of the likelihood of
success.
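The example above amounts to a small expectancy table. A minimal sketch follows, using the failure rates from the example; the function name is hypothetical, and real bands would come from the validation study for the particular test.

```python
def expected_failure_rate(score):
    """Failure rate for a test score, using the bands from the chapter's example."""
    if score < 10:
        return 0.50
    if score <= 15:
        return 0.35
    return 0.20


for score in (8, 12, 17):
    chance_of_success = 1 - expected_failure_rate(score)
    print(f"score {score}: estimated chance of success {chance_of_success:.0%}")
```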
Choosing tests
It is essential to choose tests that meet the four criteria of sensitivity, standardization, reliability
and validity. It is very difficult to achieve the standards required if an organization tries to
develop its own test batteries, unless it employs a qualified psychologist or obtains professional
advice from a member of the British Psychological Society. This organization, with the support
of the reputable test suppliers, exercises rigorous control over who can use what tests and the
standard of training required and given. Particular care should be taken when selecting personality
tests – there are a lot of charlatans about.
Do-it-yourself tests are suspect unless they have been properly validated and realistic
norms have been established. Without that, they should not be used.
Aptitude tests are most useful for jobs where specific and measurable skills are required, such
as word-processing and skilled repair work. Personality tests can complement structured interviews
and intelligence and aptitude tests. Some organizations use them for jobs such as selling
where they believe that ‘personality’ is important, and where it is not too difficult to obtain
quantifiable criteria for validation purposes. They may be used to assess integrity and conscientiousness
where these characteristics are deemed to be important.
Tests should be administered only by people who have been trained in what the tests are measuring,
how they should be used, and how they should be interpreted.
It is essential to evaluate all tests by comparing the results at the interview stage with later
achievements. To be statistically significant, these evaluations should be carried out over a reasonable
period of time and cover as large a number of candidates as possible.
In some situations a battery of tests may be used, including various types of intelligence, personality
and aptitude tests. These may be a standard battery supplied by a test agency, or a
custom-built battery may be developed. The biggest pitfall to avoid is adding extra tests just for
the sake of it, without ensuring that they make a proper contribution to the success of the predictions
for which the battery is being used.
The CIPD (2007c) has noted that online testing is growing in popularity (25 per cent of
respondents to its survey made some use of it). Online tests are most used for recruiting
graduates and when high volumes of applicants have to be dealt with.
4. Administer, score and interpret tests in accordance with the instructions provided by the
test distributor and to the standards defined by the British Psychological Society.
5. Store test materials securely and ensure that no unqualified person has access to them.
6. Ensure test results are stored securely, are not accessible to unauthorized or unqualified
persons and are not used for any purposes other than those agreed with the test taker.
7. Obtain the informed consent of potential test takers, making sure that they understand
why the tests will be used, what will be done with their results and who will be provided
with access to them.
8. Ensure that all test takers are well informed and well prepared for the test session, and that
all have had access to practice or familiarization materials where appropriate.
9. Give due consideration to factors such as gender, ethnicity, age, disability and special
needs, educational background and level of ability in using and interpreting the results of
tests.
10. Provide the test taker and other authorized persons with feedback about the results in a
form that makes clear the implications of the results, is clear and in a style appropriate to
their level of understanding.
Questions
1. What does the term ‘validity’ mean when applied to selection tests? How can it be
measured?
2. What are the advantages and disadvantages of personality tests as a method of
selection?
3. From a colleague: ‘I have just come back from a spell in our French associated company
where they swear by graphology as a method of selection. Is there anything in it for
us?’
References
British Psychological Society (2007) Psychological Testing: A user’s guide, Psychological Testing Centre,
Leicester
CIPD (2007c) Psychological Testing, CIPD Fact Sheet, www.cipd.co.uk
International Test Commission (2005) International Guidelines on Computer-based and Internet Delivered
Testing, British Psychological Society, Leicester
Schmidt, F L and Hunter, J E (1998) The validity and utility of selection methods in personnel psychology:
practical and theoretical implications of 85 years of research findings, Psychological Bulletin, 124 (2),
pp 262–74
Smith, M (1984) Survey Item Bank, MCB Publications, Bradford
Spearman, C (1927) The Abilities of Man, Macmillan, New York