Wynne Harlen*
Faculty of Education, University of Cambridge
This article concerns the use of assessment for learning (formative assessment) and assessment of
learning (summative assessment), and how one can affect the other in either positive or negative
ways. It makes a case for greater use of teachers’ judgements in summative assessment, the reasons
for this being found in the research that is reviewed in the first sections of the article. This research,
concerning the impact of summative assessment, particularly high-stakes testing and examinations,
on students’ motivation for learning and on teachers and the curriculum, reveals some seriously
detrimental effects. Suggestions for changes that would reduce the negative effects include making
greater use of teachers’ summative assessment. However, this raises other issues, about the
reliability and validity of teachers’ assessment. Research on ways of improving the dependability of
teachers’ summative assessment suggests actions that would equally support more effective use of
assessment to help learning. The later sections of the article address the issues and opportunities
relating to the possibility of assessment that serves both formative and summative purposes, with
examples of what this means in practice, leading to the conclusion that the distinction between
formative and summative purposes of assessment should be maintained, while assessment systems
should be planned and implemented to enable evidence of students’ ongoing learning to be used for
both purposes.
Introduction
All assessment in the context of education involves making decisions about what is
relevant evidence for a particular purpose, how to collect the evidence, how to
interpret it and how to communicate it to intended users. Such decisions follow from
the purpose of conducting the assessment. These purposes include helping learning,
*Haymount Coach House, Bridgend, Duns, Berwickshire, TD11 3DJ, UK. Email:
wynne@torphin.freeserve.co.uk
teachers of using teachers’ judgements, are outlined. These point to actions that need
to be taken to improve the dependability of teachers’ assessments, actions that
coincide with the key features of using assessment formatively. This leads to the
discussion of how to bring about synergy between the processes of formative and
summative assessment.
The reference here to a threat to validity of the assessment is but one of several. High-
stakes tests are inevitably designed to be as ‘objective’ as possible, since there is a
premium on reliable marking in the interests of fairness. This has the effect of
reducing what is assessed to what can be readily and reliably marked. Generally this
excludes many worthwhile outcomes of education such as problem-solving and
critical thinking.
The review not only identified the negative impacts of testing, but also gave clues as to what actions could be taken to reduce these impacts. Suggested actions included, at the class level: explaining to students the purpose of tests and other assessments of their learning, and involving them in decisions about tests; using assessment to convey to students a sense of progress in their learning; providing feedback that helps further learning; and developing students’ self-assessment skills and their use of criteria relating to learning rather than to test performance. It is noteworthy that these actions refer to several of the key features of assessment used to help learning.
Implications for assessment policy were drawn from the findings by convening a
consultation conference of experts representing policy-makers, practitioners, teacher
educators and researchers. The policy implications included steps that should be taken
to reduce the high stakes of summative assessment, by using a wider range of indicators
of school performance, and by using a more valid approach to tracking standards at the
national level, through testing a sample of students rather than a whole age group. It was
also emphasized that more valid information about individual student performance was
needed than could be obtained through testing alone, and that more use should be made
of teachers’ judgements as part of summative assessment. We now turn to the potential
advantages and disadvantages of this latter course of action.
Two further systematic reviews of research (Harlen, 2004a, 2004b) were carried out
to bring together relevant evidence to answer these questions. The definition of
summative assessment by teachers adopted in the reviews was
The process by which teachers gather evidence in a planned and systematic way in order
to draw inferences about their students’ learning, based on their professional judgement,
and to report at a particular time on their students’ achievements.
This excludes the role of teachers as markers or examiners in the context of external
examinations, where they do not mark their own students’ work.
In addition to defining reliability and validity it was found useful to discuss
approaches in terms of dependability. The interdependence between the concepts of
reliability and validity means that increasing one tends to decrease the other.
Dependability is a combination of the two, defined in this instance as the extent to
which reliability is optimized while ensuring validity. This definition prioritizes
validity, since a main reason for using teachers’ assessment rather than depending
entirely on tests for external summative assessment is to increase the construct
validity of the assessment.
The main findings from the two systematic reviews of research on the use of
teachers’ assessment for summative purposes are given in Box 2.
• The extent to which the assessment tasks, and the criteria used in judging them, are specified are key variables affecting dependability. Where neither tasks nor criteria are well specified, dependability is low.
• Detailed criteria, describing progressive levels of competency, have been shown to be capable of supporting reliable assessment by teachers.
• Tightly specifying tasks does not necessarily increase reliability and is likely to reduce validity by reducing the opportunity for a broad range of learning outcomes to be included.
• Greater dependability is found where there are detailed, but generic, criteria that allow evidence to be gathered from the full range of classroom work.
• Bias in teachers’ assessments is generally due to teachers taking into account information about non-relevant aspects of students’ behaviour or being apparently influenced by gender, special educational needs, or the general or verbal ability of a student in judging performance in a particular task.
• Researchers claim that bias in teachers’ assessment is susceptible to correction through focused workshop training.
Opportunities
There is considerable similarity in some of the implications from the research
evidence in the three reviews, relating particularly to: the importance of providing
non-judgemental feedback that helps students know where they are in relation to
learning goals; the need for teachers to share with students the reasons for, and goals
of, assessment; the value to teachers of using assessment to learn more about their
students and to reflect on the adequacy of the learning opportunities being provided;
teachers and students placing less emphasis on comparisons among students and
more on individual development; and helping students to take responsibility for their
learning and work towards learning goals rather than performance goals. All these
points are ones that favour formative assessment as well as improving the
dependability and positive impact of summative assessment by teachers. It follows
that the actions teachers need to take in developing their assessment for summative
purposes overlap to a great extent with the actions required for practising formative
assessment.
The next section explores the extent to which assessment information can be used
for both summative and formative purposes, without the use for one purpose
endangering the effectiveness of use for the other. Some of those involved in
developing assessment have argued that the distinction is not helpful and that we
should simply strive for ‘good assessment’. Good formative assessment will support
good judgements by teachers about student progress and levels of attainment
(Hutchinson, 2001) and good summative assessment will provide feedback that can
be used to help learning. Maxwell (2004) describes progressive assessment, which we
consider below, as blurring the boundary between formative and summative
assessment. However, it remains the case that formative and summative are different
purposes of assessment and while the same information may be used for both, it is
necessary to ensure that the information is used in ways that serve these purposes. In practice, under current arrangements, information seems to be gathered initially with one of these purposes in mind and may or may not be used for the other. These are arguments to return to after looking at current practices and considering the possibility of collecting information designed for both purposes.
materials and opportunities for learning available and, most importantly, making
clear the purposes and goals of the work.
Some examples of using assessment in this way are provided by Maxwell (2004)
and Black et al. (2003). Maxwell describes the approach to assessment used in the
Senior Certificate in Queensland, in which evidence is collected over time in a
student portfolio, as ‘progressive assessment’. He states that
All progressive assessment necessarily involves feedback to the student about the quality
of their (sic) performance. This can be expressed in terms of the student’s progress
towards desired learning outcomes and suggested steps for further development and
improvement…
For this approach to work, it is necessary to express the learning expectations in terms of
common dimensions of learning (criteria). Then there can be discussion about whether
the student is on-target with respect to the learning expectations and what needs to be
done to improve performance on future assessment where the same dimensions appear.
As the student builds up the portfolio of evidence of their performance, earlier assessment
may be superseded by later assessment covering the same underlying dimensions of
learning. The aim is to report ‘where the student got to’ in their learning journey, not
where they started or where they were on the average across the whole course. (Maxwell,
2004, pp. 2, 3)
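The way later evidence supersedes earlier evidence on the same dimension can be pictured with a short sketch. The following Python fragment is purely illustrative: the dimension names, levels and dates are invented, and it does not represent the Queensland system’s actual machinery. It simply keeps, for each dimension of learning, the most recent judgement in the portfolio, so that what is reported is ‘where the student got to’.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Evidence:
        dimension: str   # common dimension of learning (criterion)
        level: str       # judged level on that dimension
        when: date       # when the piece of work was assessed

    def latest_by_dimension(portfolio: list[Evidence]) -> dict[str, Evidence]:
        """Keep only the most recent judgement for each dimension."""
        latest: dict[str, Evidence] = {}
        for item in sorted(portfolio, key=lambda e: e.when):
            latest[item.dimension] = item  # later evidence overwrites earlier evidence
        return latest

    portfolio = [
        Evidence('argumentation', 'developing', date(2004, 3, 1)),
        Evidence('use of evidence', 'secure', date(2004, 4, 20)),
        Evidence('argumentation', 'secure', date(2004, 6, 15)),  # supersedes the March judgement
    ]

    for dimension, evidence in latest_by_dimension(portfolio).items():
        print(dimension, '->', evidence.level)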
innovation was to ask students to set test questions and devise marking schemes. This
helped them ‘both to understand the assessment process and to focus further efforts
for improvement’ (Black et al., 2003, p. 54). The third change was for the teachers to
use the outcome of tests diagnostically and to involve students in marking each
other’s tests, in some cases after devising the mark scheme. This has some similarity
to the approach reported by Carter (1997), which she called ‘test analysis’. In this the
teacher returned test papers to students after indicating where there were errors, but
leaving the students to find and correct these errors. The students’ final mark
reflected their response to the test analysis as well as the initial answers. Carter
described this as shifting the responsibility for learning to the students, who were
encouraged to work together to find and correct their errors.
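Carter does not give a formula for how the final mark was arrived at, so the sketch below is only an assumption made for illustration: the mark for the first attempt and the mark obtained after the students have found and corrected their own errors are combined with an arbitrary equal weighting.

    def final_mark(initial_score: float, corrected_score: float,
                   weight_on_correction: float = 0.5) -> float:
        """Combine the first attempt with the mark after 'test analysis'
        (students finding and correcting their own errors); the weighting
        is an assumption, not taken from Carter (1997)."""
        return ((1 - weight_on_correction) * initial_score
                + weight_on_correction * corrected_score)

    # A student scoring 60% initially and 90% after correcting errors:
    print(final_mark(60, 90))  # 75.0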
These approaches are ones that teachers can use in the context of classroom tests
over which they have complete control. Black et al. (2003) noted that when external
tests are involved, the process can move ‘from developing understanding to "teaching to the test". More generally, the pressures exerted by current external testing and assessment requirements are not fully consistent with good formative practices’
(Black et al., 2003, p. 56). These teachers used their creativity to graft formative value
on to summative procedures. A more fundamental change is needed if assessment is
to be designed to serve both purposes from the start.
There is the potential for such change in the use of computers for assessment,
which provide the opportunity for assessment to serve both formative and
summative purposes. In the majority of studies of the use of ICT for assessment
of creative and critical thinking, reviewed by Harlen & Deakin Crick (2003b), the
assessment was intended to help development of understanding and skills as well as
to assess attainment in understanding and skills. The effectiveness of computer
programs for both these purposes was demonstrated by those studies where
computer-based assessment was compared with assessment by paper and pencil
(Jackson, 1989; Kumar et al., 1993). The mechanism for the formative impact was
the feedback that students received from the program. In some cases this was no
more than reflecting back to the students the moves or links they made between
concepts or variables as they attempted to solve a problem. In others (e.g. Osmundson et al., 1999), the feedback took the form of a ‘score’ for a concept map
that they created on the screen by dragging concepts and links. The score compared
the students’ maps with an ‘expert map’ and required a much greater degree of
analysis than could be provided in any other way. In other studies (Schacter et al.,
1997) the computer program used a record of all mouse clicks in order to provide feedback to the students, and information to the teacher, about the processes used in
reaching a solution. Schacter et al. referred to this as ‘bridging the gap between
testing and instruction’.
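A toy example may make the concept-map scoring more concrete. The sketch below is not the procedure used in the studies cited; it simply treats a map as a set of propositions (concept, link, concept) and reports the proportion of an assumed ‘expert map’ that the student’s map reproduces, the kind of comparison a program can carry out instantly for every student.

    # Illustrative only: the propositions and the scoring rule are assumptions,
    # not the scheme used by Osmundson et al. (1999).
    Proposition = tuple[str, str, str]  # (concept, link, concept)

    def map_score(student: set[Proposition], expert: set[Proposition]) -> float:
        """Fraction of the expert map's propositions present in the student's map."""
        return len(student & expert) / len(expert)

    expert_map = {
        ('rain', 'forms from', 'water vapour'),
        ('water vapour', 'rises by', 'evaporation'),
        ('clouds', 'contain', 'water droplets'),
    }
    student_map = {
        ('rain', 'forms from', 'water vapour'),
        ('clouds', 'contain', 'water droplets'),
        ('rain', 'causes', 'evaporation'),   # not in the expert map
    }

    print(round(map_score(student_map, expert_map), 2))  # 0.67, fed back to student and teacher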
In order for assessment to have a formative purpose it is necessary to be able to
report not only the students’ final performance, but also what processes students need
to improve in order to raise their performance. The collection of information about
processes, even if feasible in a non-computer-based assessment, is immensely time-
consuming and would not be a realistic approach to meeting the need for information
for improving learning. The use of computers makes this information available, in
some cases instantly, so that it provides feedback for the learner and the teacher that
can be used both formatively and summatively. In these cases the process of
assessment itself begins to impact on performance; teaching and assessment begin to
coalesce. Factors identified as benefits of using computers for learning then become equally valuable for assessment. These include: speed of processing, which
supports speed of learning; elements of motivation such as confidence, autonomy,
self-regulation and enthusiasm, which support concentration and effort; ease of
making revisions and improved presentation, which support quality of writing and
other products; and information handling and organization, which support understanding (NCET, 1994).
Figure 1. Formative and summative assessment using the same evidence but different criteria
reporting levels. In this process the change over time can be taken into account so
that, as in the Queensland portfolio assessment, preference is given to evidence that
shows progress during the period covered by the summative assessment. This process
is similar to the one teachers are advised to use in arriving at their teacher assessment for reporting at the end of key stages in the National Curriculum
assessment. The difference is that in the approach suggested here teachers have
gathered information in ways suggested above (incorporating the key features of
formative assessment) over the whole period of students’ learning, and used it to help
students with their learning.
The detailed indicators will map onto the broader criteria, as suggested in
Figure 1. The mapping will smooth out any misplacement of the detailed
indicators. But it is important not to see this mapping as a summation of
judgements about each indicator. Instead the evidence is re-evaluated against the
broader reporting criteria.
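As a minimal sketch of this re-evaluation (the criterion, indicators and pieces of evidence below are invented, not taken from any prescribed scheme), the detailed indicators can be thought of as a way of collecting and organizing evidence, while the reported level comes from judging the pooled evidence as a whole against the broader criterion rather than from averaging indicator-by-indicator judgements.

    from typing import Callable

    # Hypothetical mapping from a broad reporting criterion to detailed indicators.
    CRITERION_TO_INDICATORS = {
        'scientific enquiry': ['plans fair tests', 'interprets evidence', 'evaluates methods'],
    }

    def report_level(criterion: str,
                     evidence_by_indicator: dict[str, list[str]],
                     judge: Callable[[list[str]], str]) -> str:
        """Pool the classroom evidence linked to a criterion's indicators and
        judge it as a whole against the broad criterion (no summing of scores)."""
        pooled: list[str] = []
        for indicator in CRITERION_TO_INDICATORS[criterion]:
            pooled.extend(evidence_by_indicator.get(indicator, []))
        return judge(pooled)  # a holistic judgement stands in for the teacher's re-evaluation

    evidence = {
        'plans fair tests': ['investigation plan, June'],
        'interprets evidence': ['graph interpretation task, May'],
    }
    print(report_level('scientific enquiry', evidence, judge=lambda items: 'level 4'))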
Conclusion
What do the research evidence reviewed and the arguments presented here have to say about whether teachers’ summative assessment and assessment for learning need to be considered as distinct from each other, and about how they can be harmonized? There seems to be value in maintaining the distinction between
formative and summative purposes of assessment while seeking synergy in relation to
the processes of assessment. These different purposes are real. One can conduct the
same assessment and use it for different purposes just as one can travel between two
places for different purposes. Just as the purpose of a journey is the basis for judging its success, so the purpose of an assessment is the basis for judging whether it has been achieved. If we fuse, or confuse, formative and summative
purposes, experience strongly suggests that ‘good assessment’ will mean good
assessment of learning, not for learning.
It is suggested here that the synergy of formative and summative assessment
comes from making use of the same evidence for the two purposes. This can be done, as in the Queensland example, where work collected in the portfolio is used to
provide feedback to the students at the time it is completed as well as being used
later in assessing overall attainment. Here the procedures for using the assessment
to help learning are less well defined than in the approach that starts from the
formative use. Possibly different emphases are appropriate at different stages of
education, the detailed indicators being particularly suited to the primary level where teachers have the opportunity to gather evidence frequently but, at the same
time, need more structured help in deciding next steps across the range of subjects
they teach.
Synergy also comes from having the same person responsible for using the evidence
for both purposes. All assessment involves judgement and will therefore be subject to
some error and bias, as noted in the research findings. While this aspect has been
given attention in the context of teachers’ assessment for summative uses, it no doubt
exists in teachers’ assessment for formative purposes. Although it is not necessary to
be over-concerned about the reliability of assessment for this purpose (because it
occurs regularly and the teacher will be able to use feedback to correct a mistaken
judgement), the more carefully the assessment is made, the more value it will have in
helping learning. Thus the procedures for ensuring more dependable summative
assessment will benefit the formative use and, as noted, the teacher’s understanding
of the learning goals and the nature of progression in achieving them. For example,
experience shows that moderation of teachers’ judgements, necessary for external
uses of summative assessment, can be conducted so that it not only serves a quality
control function, but also has an impact on the process of assessment by teachers,
having a quality assurance function as well (ASF, 2004b). This will improve the
collection and use of evidence for a formative purpose as well as a summative
purpose.
The procedures that will most help both the effectiveness of formative assessment
and the reliability of summative assessment are those that involve teachers in planning
assessment and developing criteria. Through this involvement they develop owner-
ship of the procedures and criteria and understand the process of assessment,
including such matters as what makes an adequate sample of behaviour, as well as the
goals and processes of learning. This leads to the position that synergy between
formative and summative assessment requires that systems should be designed with
these two purposes in mind and should include arrangements for using evidence for
both purposes.
References
Ames, C. (1990) Motivation: what teachers need to know, Teachers College Record, 91, 409–21.
ARG (Assessment Reform Group) (2002) Testing, motivation and learning (Cambridge, Cambridge University Faculty of Education). Available from the ARG website: www.assessment-reform-group.org
ASF (2004a) ASF Working Paper 2. Available from the ARG website.
ASF (2004b) ASF Working Paper 1. Available from the ARG website.
Black, P. & Wiliam, D. (1998) Assessment and classroom learning, Assessment in Education, 5(1), 7–74.
Black, P., Harrison, C., Lee, C., Marshall, B. & Wiliam, D. (2002) Working inside the black box
(London, King’s College London).
Black, P., Harrison, C., Lee, C., Marshall, B. & Wiliam, D. (2003) Assessment for learning: putting it
into practice (Maidenhead, Open University Press).
Broadfoot, P., Pollard, A., Osborn, M., McNess, E. & Triggs, P. (1998) Categories, standards and
instrumentalism: theorizing the changing discourse of assessment policy in English primary
education, paper presented at the Annual Meeting of the American Educational Research
Association, 13–17 April, San Diego, California, USA.
Carter, C. R. (1997) Assessment: shifting the responsibility, Journal of Secondary Gifted Education,
9(2), Winter 1997/8, 68–75.
Crooks, T. J. (1988) The impact of classroom evaluation practices on students, Review of
Educational Research, 58, 438–81.
Cumming, J. & Maxwell, G. S. (2004) Assessment in Australian schools: current practice and
trends, Assessment in Education, 11(1), 89–108.
Dweck, C. S. (1992) The study of goals in psychology, Psychological Science, 3, 165–7.
Gordon, S. & Rees, M. (1997) High-stakes testing: worth the price?, Journal of School Leadership, 7,
345–68.
Harlen, W. (2004a) A systematic review of the evidence of reliability and validity of assessment by teachers used for summative purposes (EPPI-Centre Review), Research Evidence in Education Library, issue 3 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available on the website at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_three.htm
Harlen, W. (2004b) A systematic review of the evidence of the impact on students, teachers and the curriculum of the process of using assessment by teachers for summative purposes (EPPI-Centre Review), Research Evidence in Education Library, issue 4 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available on the website at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_four.htm
Harlen, W. & Deakin Crick, R. (2002) A systematic review of the impact of summative assessment and tests on students’ motivation for learning (EPPI-Centre Review), Research Evidence in Education Library, issue 1 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available on the website at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_one.htm
Harlen, W. & Deakin Crick, R. (2003a) Teaching and motivation for learning, Assessment in Education, 10(2), 169–208.
Harlen, W. & Deakin Crick, R. (2003b) A systematic review of the impact on students and teachers of the use of ICT for assessment of creative and critical thinking skills (EPPI-Centre Review), Research Evidence in Education Library, issue 2 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available on the website at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_two.htm
Harlen, W. & James, M. J. (1997) Assessment and learning: differences and relationships between
formative and summative assessment, Assessment in Education, 4(3), 365–80.
Harlen, W. & Qualter, A. (2004) The teaching of science in primary schools (4th edn) (London, David
Fulton).
Hutchinson, C. (2001) Assessment is for learning: the way ahead (Internal Policy Paper, Scottish
Executive Education Department (SEED)).
Jackson, B. (1989) A comparison between computer-based and traditional assessment tests, and
their effects on pupil learning and scoring, School Science Review, 69, 809–15.
Johnston, J. & McClune, W. (2000) Selection project sel 5.1: pupil motivation and attitudes–self-
esteem, locus of control, learning disposition and the impact of selection on teaching and
learning, in: The effects of the selective system of secondary education in Northern Ireland, Research
Papers, Vol. II (Bangor, Co. Down, Department of Education), 1–37.
Kellaghan, T., Madaus, G. F. & Raczek, A. (1996) The use of external examinations to improve student
motivation (Washington, DC, American Educational Research Association).
Kohn, A. (2000) The case against standardized testing (Portsmouth, NH, Heinemann).
Koretz, D. (1988) Arriving at Lake Wobegon: are standardized tests exaggerating achievement and distorting instruction?, American Educator, 12(2), 8–15, 46–52.
Koretz, D., Linn, R. L., Dunbar, S. B. & Shepard, L. A. (1991) The effects of high-stakes testing on
achievement: preliminary findings about generalization across tests, paper presented at the
annual meeting of the American Educational Research Association, 11 April, Chicago.
Kumar, D. (1993) Effect of HyperCard and traditional performance assessment methods on expert-novice
chemistry problem solving, annual meeting of the National Association for Research in Science
Teaching, Atlanta, GA.
Linn, R. (2000) Assessments and accountability, Educational Researcher, 29, 4–16.
Linn, R., Dunbar, S., Harnisch, D. & Hastings, C. (Eds) (1982) The validity of the Title I evaluation and reporting systems (Beverley Hills, CA, Sage).
Masters, G. & Forster, M. (1996) Progress maps (Victoria, Australian Council for Educational
Research).
Maxwell, G. S. (2004) Progressive assessment for learning and certification: some lessons from
school-based assessment in Queensland, paper presented at the third conference of the
Association of Commonwealth Examination and Assessment Boards, March, Nadi, Fiji.
National Council for Educational Technology (NCET) (1994) Integrated learning systems: a report of
the pilot evaluation of ILS in the UK (Coventry, NCET).
Osborn, M., McNess, E., Broadfoot, P., Pollard, A. & Triggs, P. (2000) What teachers do: changing
policy and practice in primary education (London, Continuum).
Osmundson, E., Chung, G., Herl, H. & Klein, D. (1999) Knowledge mapping in the classroom: a tool
for examining the development of students’ conceptual understandings, Research Report (Los
Angeles, CA, Centre for Research on Evaluation, Standards and Student Testing).
Pollard, A., Triggs, P., Broadfoot, P., McNess, E. & Osborn, M. (2000) What pupils say: changing
policy and practice in primary education (London, Continuum), chaps 7 and 10.
Reay, D. & Wiliam, D. (1999) ‘I’ll be a nothing’: structure, agency and the construction of identity
through assessment, British Educational Research Journal, 25, 345–54.
Schacter, J., Herl, H. E., Chung, G. K. W. K., O’Neil, H. F. O., Dennis, R. & Lee, J. J. (1997)
Feasibility of a web-based assessment of problem solving, annual meeting of the American
Educational Research Association, April, Chicago.
Shepard, L. (1991) Will national tests improve student learning?, Phi Delta Kappan, 72(4), 232–8.
Stiggins, R. J. (1999) Assessment, student confidence and school success, Phi Delta Kappan, 81(3),
191–8.
Wilson, M. (1990) Measurement of developmental levels, in: T. Husen & T. N. Postlethwaite
(Eds) International encyclopedia of education: research and studies. Supplementary vol. 2 (Oxford,
Pergamon), 152–8.
Wilson, M., Kennedy, C. & Draney, K. (2004) GradeMap (Version 4.0) [computer program]
(Berkeley, University of California, BEAR Center).
Wood, R. (1991) Assessment and testing: a survey of research (Cambridge, Cambridge University
Press).