Effects of Differential Feedback On Students' Examination Performance
The effects of feedback on performance and factors associated with it were examined in a large
introductory psychology course. The experiment involved college students (N = 464) working on an
essay examination under 3 conditions: no feedback, detailed feedback that was perceived by participants
to be provided by the course instructor, and detailed feedback that was perceived by participants to be
computer generated. Additionally, these conditions were crossed with factors of grade (receiving a
numerical grade or not) and praise (receiving a statement of praise or not). The task under consideration
was a single-question essay examination administered at the beginning of the course. Detailed feedback
on the essay, specific to individuals' work, was found to be strongly related to student improvement in
essay scores, with the influence of grades and praise being more complex. Generally, receipt of a
tentative grade depressed performance, although this effect was ameliorated if accompanied by a
statement of praise. Overall, detailed, descriptive feedback was found to be most effective when given
alone, unaccompanied by grades or praise. It was also found that the perceived source of the feedback
(the computer or the instructor) had little impact on the results. These findings are consistent with the
research literature showing that descriptive feedback, which conveys information on how one performs
the task and details ways to overcome difficulties, is far more effective than evaluative feedback, which
simply informs students about how well they did.
Anastasiya A. Lipnevich, Educational Testing Service, Princeton, New Jersey; and Jeffrey K. Smith, Department of Education, University of Otago, Dunedin, New Zealand. Correspondence concerning this article should be addressed to Anastasiya A. Lipnevich, Educational Testing Service, Rosedale Road, R-18, Princeton, NJ 08551. E-mail: a.lipnevich@gmail.com

Students in university courses typically receive one or more of three types of responses to the work that they produce: a grade, a statement of praise or concern, and some level of feedback on the specifics related to their performance (Orrell, 2006). The response that students receive often serves as a summary of their performance and provides information on how they can improve. These two different functions of the response are known as the summative and formative functions of assessment (Scriven, 1967). The use of formative assessment to enhance student achievement has undergone a renaissance in recent years, leading to a variety of studies examining aspects of the relationship between formative assessment and students' ability to profit academically from such assessment (Shute, 2007; Symonds, 2004; Wiliam & Thompson, 2007). The formative function of assessment in university courses is the focus of this research.

Black and Wiliam (1998) proposed that the core of formative assessment comprises two types of information: (a) learners' current knowledge set and (b) the desired knowledge set. The discrepancy between the two represents a gap that is to be closed by the learner (Black & Wiliam, 1998; Ramaprasad, 1983). In order for assessment to facilitate learning, students need to receive information about the discrepancy between the actual and the desired state and effectively process that information. This information is commonly referred to as feedback (Ilgen & Davis, 2000; Kluger & DeNisi, 1996), and formative assessment can be conceptualized as a process through which learners receive feedback. However, not all feedback is the same, and not all feedback is equally effective in promoting learning (Hattie & Timperley, 2007; Kluger & DeNisi, 1996). The action taken by a learner in response to feedback depends heavily on the nature of the message, the way in which it was received, and the working contexts in which that action may be carried out (Black & Wiliam, 1998).

Effects of Feedback

Three comprehensive meta-analyses have been conducted over the past 20 years on the effects of feedback on achievement. Bangert-Drowns, Kulik, and Morgan (1991) found that although feedback was positively related to greater achievement in most settings, there was wide variability of feedback effects on performance. Overall, they concluded that the key feature in effective use of feedback is that it must encourage "mindfulness" in students' responses to the feedback. Kluger and DeNisi's (1996) meta-analysis demonstrated that although feedback typically improved performance, in one third of cases presentation of feedback resulted in decreased performance. They contended that when feedback was accompanied by praise or critical judgments, the effectiveness of the feedback decreased, and that feedback that showed participants how to reach correct solutions was more effective than were simple judgments of right or wrong responses. Similarly, Hattie and Timperley's (2007) analysis found substantial variability in the effects of feedback. They reported that feedback about a particular task and how to do it is more effective than feedback that focuses on praise or on punishments and rewards. Hattie and Timperley (2007) emphasized that feedback needs to address the questions of what the goals are, where the student currently stands in relation to those goals, and what the next steps should be for reaching the goals. They also noted that feedback focused on the level of the task, the processes required to complete the task, and self-regulatory task-related activities is more effective than is feedback focused on the person (typically, praise). Finally, Hattie and Timperley (2007) argued that feedback has "to prompt active information processing on the part of the learners" (p. 104).
We argue that the key to understanding the effects of feedback as it occurs via formative assessment in formal learning settings has to do with what Bangert-Drowns et al. (1991) call the mindfulness with which it is received or what Hattie and Timperley (2007) call actively processing the information. Unless students successfully process the feedback that they receive, there is little reason to believe that the feedback will have a positive effect on learning. But most research on feedback links feedback directly to subsequent achievement without considering the degree to which the feedback is successfully interpreted and processed. In this research, we examine how students in a university introductory psychology course use feedback on an essay exam to improve their work. This allows us to assess the degree to which the feedback was used to improve performance under different conditions, representing a tighter link between the intervention and the outcome.

Computer as Source of Feedback

Feedback has to have an origin. In some aspects of life, this origin may be inanimate, such as the direction and distance of a golf ball's flight after having been struck by a golf club. Other times, the source is an individual, typically a teacher. Advances in technology allow for feedback to come from a computer rather than from a teacher. The ability of a computer to provide feedback on objective assessments has existed for some time, but more recent advances have allowed for computer-based scoring of essays that includes individualized feedback (Attali, 2004; Attali & Burstein, 2006; Landauer, Latham, & Foltz, 2003). The potential instructional benefit of computer-based feedback on such a labor-intensive task as marking essays is clear, but a serious question arises as to whether such feedback will be taken seriously by students.

One perspective on the question is that individuals will see computer-based feedback as essentially neutral (like the flight of a golf ball), but also not particularly accurate or helpful (Kluger & DeNisi, 1996; Lepper, Woolverton, Mumme, & Gurtner, 1993). A second perspective views computers as social actors (Nass, Moon, & Carney, 1999), with people attributing human characteristics to computers (Ferdig & Mishra, 2004; Nass, Fogg, & Moon, 1996; Nass, Moon, & Green, 1997). According to this perspective, students will respond to computer-provided feedback in the same way that they respond to human-provided information. Kluger and DeNisi (1996) summarized findings (albeit sparse) on computer versus instructor feedback and concluded that computer feedback was perceived as more helpful and accurate because of its tendency to bypass issues of attitude, affect, and stereotypes that are characteristic of human interactions.

Our goal in comparing instructor-based feedback with computer-based feedback is to provide information that directly speaks to this issue and comes from an experimental intervention. Although the research base at this point is not strong enough to make definite statements about how computer-based essay feedback will be received, it is our anticipation that participants will not take such feedback at a personal level and therefore will not have a negative reaction to it.

Grades as a Component of Feedback

The most common type of feedback that students receive is a grade, often with little or no additional commentary (Marzano, 2000; Oosterhof, 2001). Grades provide a convenient summary of student performance (Airasian, 1994), but how do grades perform in terms of a formative function? One of the main conclusions that Black and Wiliam (1998) drew from their review of the literature on formative assessment is that descriptive feedback, rather than letter grades or scores, leads to the highest improvements in performance. Moreover, several studies have suggested that grades are actively detrimental and may hinder students' performance. For example, Butler and Nisan (1986) found that grades emphasized quantitative aspects of learning, depressed creativity, fostered fear of failure, and weakened students' interest. Butler (1988) found that students receiving comments specifically tailored to their performance showed a significant increase in scores on a task. Students receiving only grades showed a significant decline in scores, as did a group that received both grades and comments. Interestingly, high achievers in all three feedback conditions sustained a high level of interest, whereas low achievers in the graded groups evidenced dramatic declines (Butler, 1988). It seemed that the presentation of a grade was particularly disconcerting when it indicated that performance was in some sense inadequate. The impact of receiving a grade may well depend on whether this grade is fundamentally good news or bad news (Black & Wiliam, 1998). It may be the case that no news is much better than bad news.

The idea that feedback delivered in different ways might have a differential impact on students of different abilities has not been extensively studied. The design of the present study allowed for a critical examination of this issue by taking students' scores on initial drafts of their essays and splitting the sample into three subsamples based on those scores.

Explanations for the negative effects of grades on students' performance vary. Butler and Nisan (1986) and Butler (1988) proposed that grades inform students about proficiency in relation to others, whereas individualized comments create standards for self-evaluation specific to the task. They posited that even if feedback comments are helpful for students' work, their effect can be undermined by the negative motivational effects of giving grades and scores (Butler, 1988). Hattie and Timperley's (2007) model would see this as focusing the student at the level of the self rather than on the task or the processes that produced the performance on the task.

The empirical base for these arguments is not uniformly consistent. Although Butler's (1988) research found a negative effect of grades, Smith and Gorard (2005) found that students receiving grades and comments on their work outperformed students who received comments only. The research on the influence of grades is inconclusive, especially with university students. Because most university assessment practices involve the assignation of grades, it is particularly important to investigate the impact of grades on student use of feedback information. We hypothesize that the presence of grades on assessments at the university level has a negative impact on students' productive utilization of assessment feedback. The design of the current study allowed for a direct investigation into this issue, as well as into the question of how grades work in combination with praise and source of the feedback.
Praise as a Component of Feedback

Praise has been defined as "favorable interpersonal feedback" (Baumeister, Hutton, & Cairns, 1990, p. 131) or "positive evaluations made by a person of another's products, performances or attributes" (Kanouse, Gumpert, & Canavan-Gumpert, 1981, p. 98). Meta-analytic studies examining the effects of praise on motivation have shown that positive statements have a tendency to increase motivation across a variety of dependent measures (Cameron & Pierce, 1994; Deci, Koestner, & Ryan, 1999). This effect is not always strong, varies for different age groups, and often has been derived in the course of methodologically flawed studies (Henderlong & Lepper, 2002; Lepper, Henderlong, & Gingras, 1999).

The literature also includes examples of the negative impact of praise on students' learning. Baumeister et al. (1990) presented evidence that praise can both impede and facilitate performance. They argued that when praise focuses attention on the self as opposed to the task, cognitive resources are directed toward the self and not the task, hindering performance on more cognitively complex tasks. This argument is consistent with Hattie and Timperley's (2007) and Kluger and DeNisi's (1996) position that feedback focused on the self is not productive. We include praise as a factor in the design of the study, allowing for a direct investigation of the effects of praise, both in isolation and in combination with the factors of source of feedback and grades. We anticipate that praise will negatively influence students' performance.

Examining the Affective Outcomes of Formative Assessment Feedback

We have argued thus far the following:

• that feedback holds the potential to positively influence learning by prompting active involvement with the material to be learned;

• that feedback coming from a computer may be viewed differently from feedback coming from a course instructor;

• that including grades as a component of feedback may have a negative influence on how feedback is received by students;

• that the effects of feedback may be different for students of differing levels of initial achievement on a task;

• that including praise as a component of feedback may have a negative influence on how feedback is received by students; and

• that these influences might more effectively be studied by relating them to the productive use of feedback by students than by the gains in learning.

Research in formative assessment frequently uses affective variables (such as mood, motivation, and self-efficacy) to explain the reactions that individuals have toward different feedback conditions (see, e.g., Butler, 1987; Butler & Nisan, 1987; Ilies & Judge, 2005). But there is little empirical research that actually examines how feedback influences affective response. For example, does the receipt of a grade actually result in a negative mood being induced, or a decrease in self-efficacy? We do not propose to explicate the exact nature of the workings of affective variables as moderator variables in this research, although we believe that this would be an excellent theoretical development. Instead, we propose to provide baseline information on whether differential feedback conditions actually result in differential affective responses in an experimentally controlled setting. Thus, although we believe that affective variables function as moderator variables in the feedback–response process, we use them as dependent variables here so that we can directly address the issue of whether they are influenced by different feedback conditions.

Several studies have shown that feedback containing praise leads to increased motivation (Delin & Baumeister, 1994; Ilies & Judge, 2005). Henderlong and Lepper (2002) argued that favorable feedback cues would motivate children to work hard to sustain the approval of the evaluator, but that such behavior was transient, fading when the evaluator was no longer present. If praise is hypothesized to elicit positive affect, grades are often thought to lead to negative affect. Kluger, Lewinsohn, and Aiello (1994) argued that feedback received by individuals gets cognitively evaluated with respect to potential benefit or harm and the need to take action. More often than not, frustration, or other negative affective responses, followed by a sense of helplessness, prevents students from effectively carrying out a task and succeeding on it.

The negative effect of grades on students' performance can also be explained through the influence on students' self-efficacy. Generally, self-efficacy, or beliefs about one's competence, is known to be influenced by prior outcomes (Bandura & Locke, 2003; Vancouver, More, & Yoder, 2008). Although self-efficacy is typically conceptualized as a causal factor in educational and psychological research (Boekaerts, Maes, & Karoly, 2005; Vancouver, Thompson, & Williams, 2001), it is reasonable to consider it as an outcome of receiving feedback. A grade that causes students to question their sense of efficacy has the potential to negatively affect performance or to spur students to increased effort. Although there is evidence of the influence of feedback on motivation, mood, and self-efficacy beliefs, the research base is not extensive. We include measures of these three variables in the design as dependent variables to examine the degree to which they hold potential to help understand differences seen in the degree to which student work improves as a result of differential feedback.

Summary and Aims of the Present Research

The purpose of the study presented here was to systematically examine how feedback is received and used by university students under different conditions. We did this by investigating students' productive use of various forms of feedback. There have been a number of studies focused on aspects of grades, praise, and other feedback practices in higher education, but none that look specifically at all three of these aspects in combination, with experimental control, in a course setting in which the grades count for the students. Because the consequence of the assessment for students is known to affect student performance (Wise & DeMars, 2005; Wolf, Smith, & Birnbaum, 1995), conducting the study as part of the grading system within a university course greatly adds to the ecological validity and generalizability of the findings. With the advent of computer-based essay scoring, it is now possible to provide computer-based feedback to students regarding their efforts. However, there is little to no research on how students react to feedback coming from the computer as opposed to feedback coming from the professor in the course. Finally, there is not extensive research on issues related to students' affective response to detailed feedback, praise, and grades. In this research, we sought to examine the relationship between the feedback that students receive and their sense of self-efficacy, motivation, and mood.
Additionally, most prior research links feedback to ultimate learning, as opposed to successful engagement with the task and the feedback presented. This study examines improvement in performance on a specific, complex task through productive engagement with feedback delivered in differing conditions. The design of the study also allows us to estimate the magnitude of the influence of various conditions of feedback at differing levels of initial performance on the task studied, in a fashion similar to that of Butler (1988). Finally, we are able to examine the interactions among the independent variables.

Method

In this experimental study, we investigated what happened when students were given the opportunity to revise an essay examination in an introductory psychology course on the basis of the receipt (or lack of receipt) of feedback on their first efforts. We also systematically varied whether students were told that the feedback came from the professor or from a computer essay-scoring program, whether students received a tentative, preliminary grade on their work, and whether they received a statement of praise and encouragement. This allowed us to study how important aspects of feedback influenced participants' subsequent behavior in their efforts to improve their work. The basic design of the study was a 3 × 2 × 2 analysis of covariance: 3 levels of feedback (no feedback, feedback perceived to be from the professor, or feedback perceived to be from the computer program) × 2 levels of preliminary grade (presence or absence) × 2 levels of praise (presence or absence). The primary dependent measure was the score on the revised essay examination, and the covariate was the score on the first draft of the examination. In addition to the examination scores, we used motivation, sense of self-efficacy, positive and negative affect, and perceived accuracy and helpfulness of feedback as additional outcome measures. These additional measures helped us to understand how the independent variables influenced improvement in performance.

Participants

We conducted the study with university students for several reasons. First, the format of a large, introductory university class taught by a single instructor allowed for the statistical power we were after without the confounding effects of multiple instructors. Second, we felt that university students were old enough and had enough experience in assessment settings to effectively process the information with which they were provided in the study. Third, the posttest measures we wanted to use could be administered to students at this level. Finally, there is a void in the research literature concerning this kind of controlled investigation with university students. Although there is ample research using university students in general, the use of formative assessment is not widespread in university-level courses, and as a result, there is hardly any research on the topic at this level.

Participants for the experiment were students enrolled in introductory psychology courses at two public northeastern universities taught by the same instructor. The sample size for the experiment was 464 students, with 409 students attending University 1 and 55 students attending University 2. Separate analyses were run for the two samples to compare the distributions of key variables (i.e., the essay scores and affective measures) included in the current study; these variables were distributed normally for both samples, with nearly identical means and standard deviations. There were no differences in the basic findings of the study for the two samples; therefore, the samples were merged.

The participants ranged in age from 17 to 51 years, with a mean age of 18.9 years (SD = 2.5). Two hundred forty-one (51.9%) participants were women, and 223 (48.1%) were men. Three hundred fifteen (68%) students were freshmen, 85 (18%) were sophomores, and 64 (14%) were juniors. The majority of the participants identified themselves as White (54.7%), with an additional 24.6% Asian, 6.9% Hispanic, 3.9% Black, and 6.0% other, and with 3.4% choosing not to respond. Of the 464 participants, 382 (82.3%) were born in the United States, and 82 (17.7%) were not. Students also provided information about their native language. Three hundred seventy-one students (80%) reported being native English speakers and 93 (20%) native speakers of a language other than English.

Instrumentation

Examination. As a part of course requirements, students were asked to write a 500-word expository essay demonstrating their understanding of theories of motivation that were part of their readings and class discussions. Their score on this essay served as a component of their overall grade in the course. Before the topic was presented, students received the following instructions:

Now you're ready to write the essay. Below is the rubric which explains how the essay will be evaluated. The rubric will be available to you while writing. YOUR ESSAY MUST NOT EXCEED 500 WORDS. You must type your essay directly into the page. You may not cut and paste from Microsoft Word or any other software. Good luck!

The prompt for this examination was a modification of an Educational Testing Service (ETS) prompt developed for their E-Rater essay scoring system (Attali & Burstein, 2006; Burstein, 2003), deemed appropriate for first-year students, that incorporated a reference to theories of motivation:

Sometimes we choose to do things that we do not really enjoy—studying hard, eating the right foods, and so on. Describe something you do by choice that you really do not enjoy. Using theories of motivation, explain why you might continue to do it. Discuss the changes that might occur in your life if you were to stop this activity. Support your claims with specific examples from your life and the course reading.
Students were presented with an extensive rubric describing the criteria for evaluation. The rubric was available during the task and could be consulted at any point in the writing process. To make sure that students wrote essays of comparable length, a real-time indicator displayed a word count. The primary dependent measure used in the analyses was students' final score on the examination. Their preliminary score, prior to receiving feedback, served as a covariate in the design. A detailed description of the scoring procedures is presented below.1

Test motivation measure. The Post-Test Index of Test Motivation (Wolf & Smith, 1995) was used to test how motivated students were to do well on the task in question. This measure is different from other motivation measures in an important respect: it is test specific, in that the items refer specifically to the test that has just been taken. The scale consists of eight 7-point Likert-type items bounded by (1) strongly disagree and (7) strongly agree. A sample item typical of the measure is, "Doing well on this exam was important to me." High scores on the scale indicate that students had a strong desire to do well on the exam that they just took and exerted all the necessary effort to ensure success. Lower scores suggest a lack of interest in the process or the outcome of the exam. Reliability coefficients reported in the literature are .89 (Spencer, 2005) and .87 (Wolf et al., 1995), which are similar to the α = .85 found in the current study.

Test self-efficacy measure. The Post-Test Self-Efficacy Scale is modeled on the Post-Test Index of Test Motivation (Wolf & Smith, 1995), in that it focuses on an individual's sense of self-efficacy on a test that has just been completed. It consists of eight Likert-type items (Spencer, 2005). The answers were based on a 7-point response scale ranging from (1) strongly disagree to (7) strongly agree. A sample item typical of the measure is, "I am not competent enough to have done well on this exam" (scoring reversed). This measure assesses students' judgment of their own capabilities for the test they have completed. Higher scores on the measure indicate students' confidence in their performance on the test, and lower scores suggest doubt in their ability to have done well on the test in question. The reported alpha coefficient of the instrument is .86 (Spencer, 2005), identical to the α = .86 found in the present inquiry.

Measure of affect. The Positive and Negative Affect Scale (PANAS) is a 20-item self-report measure of positive and negative affect (Watson, Clark, & Tellegen, 1988). The scale is accompanied by instructions for measuring students' current affective state. The participants were asked to indicate the extent to which they experienced the affective states described by the PANAS adjectives on a 5-point scale ranging from (1) slightly/not at all to (5) extremely. Two additive indices were computed, resulting in separate positive affect and negative affect scores for each participant. The reported alpha coefficients of the positive affect scale range from .86 to .95, and those of the negative affect scale from .84 to .92 (Crawford & Henry, 2004; Ilies & Judge, 2005; Jolly, Dyck, Kramer, & Wherry, 1994; Roesch, 1998). We obtained coefficients of α = .89 and α = .86, respectively.

Helpfulness and accuracy of feedback. Two items were used to gauge participants' perceptions of the accuracy and helpfulness of the feedback: "How accurate was the feedback?" and "How helpful was the feedback?" The answers were based on a 7-point response scale ranging from (1) not at all accurate (helpful) to (7) very accurate (helpful).

1 Exploratory and confirmatory factor analyses of the three measures have been conducted. The results replicated previous findings reported in the literature and demonstrated the theoretical and psychometric soundness of the three measures. Because of space limitations, the results of the analyses have been excluded from this article. They are available upon request from Anastasiya A. Lipnevich.

Procedure

The experiment involved computer administration and was conducted in two sessions separated by 1 week. A data collection program and an interactive Web site were created to satisfy specific requirements of the study. Students were informed of the nature of the study and told that participation in the study would satisfy their psychology subject pool requirement. They were also told that all final test scores would be adjusted so that the means of all groups would equal the mean of the highest scoring group in the experiment. Thus, there would be no detriment to their grade for having participated in the study. Students were reminded that they could choose not to allow their responses to be used for research purposes. If they chose to do so, they were asked to complete the requirements of the exam and not fill out the additional assessments.

First session. All students who were enrolled in the two introductory psychology courses were scheduled to come to a computer lab to take their examination. Students were presented with the test instructions and the grading rubric and were then asked to begin their essay. Students submitted their work—which was saved in the system—were thanked for their participation, and reminded to return in 1 week for the second part of the study.

Scoring of the examination. ETS allowed the use of their proprietary software package E-Rater for this study. E-Rater (Attali & Burstein, 2006) extracts linguistically based features from an essay and uses a statistical model of how these features are related to overall writing quality to assign a holistic score to the essay. Additionally, it assesses and provides feedback for errors in grammar, usage, and mechanics, identifies the essay's structure, recognizes undesirable stylistic features, and provides diagnostic annotations within each essay (Attali, 2004; Burstein, 2003). The total examination score presented to the students comprised two separate components: the E-Rater score (ranging from 0 to 6) and the content score provided by the instructor and the researcher (ranging from 0 to 6, including half points). The final score was calculated as a weighted average of the two scores and converted to a scale of 100. The E-Rater score contributed 30% to the total score, and the content score contributed 70% to the total score. E-Rater was customized to rate the essays written on the prompt selected for the present study. Students' essays were scored on all of the aforementioned characteristics, including mechanics, grammar, spelling, and stylistic features, and a holistic score was assigned to every student. For several experimental conditions, the feedback provided by E-Rater was modified to satisfy the requirements of specific feedback conditions described below. A portion of the detailed feedback screen is presented in Figure 1.
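To make the weighting above concrete, the calculation can be written as a short script. This is an illustrative sketch rather than the authors' scoring code, and it assumes a simple linear conversion from the 0-6 composite to the 100-point scale, a detail the text does not spell out.

```python
def composite_exam_score(e_rater_score: float, content_score: float) -> float:
    """Combine the E-Rater (0-6) and content (0-6) scores into a 0-100 score.

    Weights follow the description above: 30% E-Rater, 70% content.
    The linear rescaling to 100 points is an assumption for illustration.
    """
    if not (0 <= e_rater_score <= 6 and 0 <= content_score <= 6):
        raise ValueError("Both component scores must lie in the 0-6 range.")
    weighted = 0.30 * e_rater_score + 0.70 * content_score  # weighted mean on the 0-6 scale
    return round(weighted / 6 * 100, 2)                     # express on a 100-point scale


# Example: an essay rated 4 by E-Rater and 5.0 on content receives
# 0.30*4 + 0.70*5.0 = 4.7 on the 0-6 scale, or about 78.33 out of 100.
print(composite_exam_score(4, 5.0))
```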
Figure 1. Detailed feedback screen with a pop-up message for a specific feedback item.
Additionally, two raters (the course instructor and the researcher) scored the content aspect of the examination. Prior to scoring the main experiment, a series of calibration sessions was held to ensure interrater reliability between the two raters. We developed a detailed rubric that provided criteria for evaluating the content of students' essays (see the Appendix). The interrater reliability was .96 for the first-session examination score and .98 for the final examination score. In case of a discrepancy in ratings, the average of the two raters' scores was taken. There were no differences in ratings larger than 1 point. The instructor and the researcher were unaware of the students' identities and experimental conditions.

To provide feedback on the content of students' essays in a consistent fashion, a number of standard comments were written. These comments were slightly modified depending on the experimental condition, so that some comments sounded as if they came from a computer and others from the professor. The comments presented to each individual student reflected their particular mistakes and omissions and therefore were highly specific to each individual's work. The combination of the E-Rater essay feedback and the content feedback generated by the instructor and the researcher is referred to hereinafter as detailed feedback. By detailed feedback, we mean feedback that is extensive and that relates to sentence- and phrase-level writing as well as commentary on the quality of the content of the essay. After the initial essays were scored, blocking was used to assign participants to the three experimental conditions so that the resulting groups had equivalent numbers of students with high, medium, and low first-session scores.

Each student was assigned to one of the three detailed feedback conditions:

1. No-Feedback Condition. This group received no detailed feedback.

2. Instructor-Feedback Condition. This group received a combination of the E-Rater–generated feedback regarding mechanics and style and content-related comments and suggestions. Students were informed that the feedback came from the course instructor. All comments were written in a reserved and neutral fashion but in a way that made clear that they came from a person rather than a computer. To make sure that the source of feedback was clear to the participants, a clip-art picture of a typical college professor was displayed in the corner of every exam screen, and the following instructions were provided: "During this session, you will be able to edit and improve the essay you wrote the first time, based on detailed feedback I have given you on content, grammar, punctuation, spelling, sentence structure, and the overall quality of your essay. PLEASE READ MY COMMENTS CAREFULLY and do your best to use them — it should really help you get a better score."

3. Computer-Feedback Condition. Students in this group received feedback equivalent in nature to that in the previous condition (i.e., all of the comments were work specific and directly linked to students' essays). In this condition, students were told that all the comments were generated by the computer. The following instructions were provided: "During this session, you will be able to edit and improve the essay you wrote the first time, based on detailed feedback generated by an intelligent computer system designed to read and critique essays. The computer will give you feedback on content, grammar, punctuation, spelling, sentence structure, and the overall quality of your essay. PLEASE READ THE COMPUTER'S COMMENTS CAREFULLY and do your best to use them — it should really help you get a better score." A picture of a computer was displayed on every screen. The E-Rater comments were taken in their original form. The additional comments concerning the content and the adequacy of the use of course-related constructs matched the style of the computer comments and were impersonal and neutral. A comparative table of the comments received by students in the computer and instructor conditions is presented in Table 1.

Additionally, the three feedback conditions were crossed with two factors of grade (grade or no grade) and praise (praise or no praise), resulting in a 3 × 2 × 2 experimental design.

Numeric grades for the first draft of the essay were presented only to those students in the "grade" condition. Students to whom their first-session score was revealed were informed that this preliminary score was for information only and that it was the final score on the revised essay that would be counted as their outcome.

Praise was provided in the form of a comment preceding the rest of the feedback. There were three levels of praise that differed depending on the score that students received for the draft of their essay (whether they were presented with this score or not). These levels were used to avoid having students receive a praise statement clearly incongruous to their level of performance. See Table 2 for the three levels of praise used.
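The blocked assignment to conditions described in this section can be sketched as follows. The sorting into consecutive blocks of three and the fixed random seed are illustrative assumptions; the authors do not report the exact mechanics of their blocking.

```python
import random

CONDITIONS = ["no_feedback", "instructor_feedback", "computer_feedback"]

def blocked_assignment(first_session_scores: dict[str, float], seed: int = 0) -> dict[str, str]:
    """Assign students to the three feedback conditions within score blocks.

    Students are sorted by their first-session score and cut into consecutive
    blocks of three; each block is split across the three conditions at random,
    so every condition receives a similar mix of high, medium, and low scorers.
    """
    rng = random.Random(seed)
    ranked = sorted(first_session_scores, key=first_session_scores.get, reverse=True)
    assignment: dict[str, str] = {}
    for i in range(0, len(ranked), len(CONDITIONS)):
        block = ranked[i:i + len(CONDITIONS)]
        shuffled = CONDITIONS[:]
        rng.shuffle(shuffled)
        for student, condition in zip(block, shuffled):
            assignment[student] = condition
    return assignment

# Example with six hypothetical students
scores = {"s1": 92, "s2": 85, "s3": 78, "s4": 74, "s5": 66, "s6": 58}
print(blocked_assignment(scores))
```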
Table 1
Comparison of Comments Received by Students in the Instructor and Computer Conditions

Mechanics
  Instructor condition: Name, please break your essay into paragraphs so I can see the structure.
  Computer condition: Please break your essay into paragraphs so that the structure can be detected.

  Instructor condition: Name, this sentence is a fragment. Proofread the sentence to be sure that it has correct punctuation and that it has an independent clause with a complete subject and predicate.
  Computer condition: This sentence may be a fragment. Proofread the sentence to be sure that it has correct punctuation and that it has an independent clause with a complete subject and predicate.

  Instructor condition: Name, these sentences begin with coordinating conjunctions. Try to combine the sentence that begins with but with the sentence that comes before it.
  Computer condition: These sentences begin with coordinating conjunctions. A sentence that begins with and, but, and or can sometimes be combined with the sentence that comes before it.

Content
  Instructor condition: Name, a good essay usually contains three main ideas, each developed in a paragraph. Use examples, explanations, and details to support and extend your main ideas. Try to center them around the theories of motivation I discussed in class. Include details and theory-specific terminology.
  Computer condition: A good essay usually contains three main ideas, each developed in a paragraph. Use examples, explanations, and details to support and extend your main ideas. Center them around the theories of motivation. Include details and theory-specific terminology.

  Instructor condition: Name, please discuss all of the components of the Drive reduction theory: need, drive, action, and homeostasis. You are missing two of the components.
  Computer condition: You may need to discuss all of the components of the Drive reduction theory: need, drive, action, and homeostasis.

  Instructor condition: Name, discuss all of the components of Atkinson's theory: expectancy, value and the need for achievement. You are missing one of the components.
  Computer condition: Discuss all of the components of Atkinson's theory: expectancy, value and the need for achievement. You may be missing some of the components.
Second Session

Participants were asked to return to the computer lab 1 week after taking the initial examination. They logged into the system and were shown their essays with corresponding feedback. What appeared to students on the computer screen (detailed feedback, praise, etc.) depended on the condition to which they had been randomly assigned. After viewing their combination of detailed feedback, praise, and grade (or lack thereof), but prior to moving to the essay revision screen, students were asked to fill out the Positive Affect Scale and the Negative Affect Scale. The participants were then prompted to make revisions and resubmit their essay on the basis of the feedback they received. Students could refer to the grading rubric and to their feedback comments at any point of the session by hovering the mouse over hotspots in the feedback text. A portion of the detailed feedback screen is presented in Figure 1.

Students who did not receive detailed feedback, praise, or grades were encouraged to reread their essays, consult the rubric, and work on improving their work. After participants submitted their revised essays, they were asked to make a judgment concerning the accuracy and helpfulness of the feedback. They were then asked to complete the Post-Test Index of Test Motivation and the Post-Test Self-Efficacy Scale.

Scoring of the revised essay followed the rules of the first-draft scoring. The final numeric grade was computed as a weighted mean of the E-Rater (30%) and the content (70%) scores. The scorers were blind to student identity and experimental condition.
Table 2
Levels of Praise for the Instructor, Computer, and No-Feedback Conditions

First-draft score 80 to 100
  Instructor condition: Name, you made an excellent start with this essay! I still see room for improvement, so take some time and make it even better.
  Computer condition: You made an excellent start with this essay. The data indicate there is still room for improvement, so take some time and make it even better.
  No-feedback condition: You made an excellent start with this essay! There is still room for improvement, so take some time and make it even better.

First-draft score 70 to 79
  Instructor condition: Name, you made a very good start with this essay! I still see room for improvement, so take some time and make it better.
  Computer condition: You made a very good start with this essay. The data indicate there is still room for improvement, so take some time and make it better.
  No-feedback condition: You made a very good start with this essay! There is still room for improvement, so take some time and make it better.

First-draft score 69 and below
  Instructor condition: Name, you made a good start with this essay! I still see room for improvement, so take some time and make it better.
  Computer condition: You made a good start with this essay. The data indicate there is still room for improvement, so take some time and make it better.
  No-feedback condition: You made a good start with this essay! There is still room for improvement, so take some time and make it better.
After the completion of the study, a series of focus groups was held with 50 students to explore their reactions to the experiment and to ensure that all students understood the nature of the conditions they were in. The results indicated that all participants understood that they were either getting feedback from a computer program or from their course instructor. A schematic representation of the steps in the study is presented in Figure 2.

Figure 2. Schematic of procedures and administration of measures.

Results

Means, standard deviations, and intercorrelations of all major variables in the study are presented for purposes of reference in Table 3.

Analyses of the Effects of Treatments on the Final Exam Score

A 3 × 2 × 2 analysis of covariance (ANCOVA), with detailed feedback (3 levels), grade (2 levels), and praise (2 levels) conditions as factors and the grade for the first essay draft (before revisions) as a covariate, examined differences in the final numeric grades for the essay exam. A Bonferroni adjustment was used to control for Type I error (the criterion used was .0083). Significant main effects were found for detailed feedback and for grade but not for praise. Also, there were significant interaction effects found for grade and praise, as well as for grade and detailed feedback. No other interactions were significant. The effect of detailed feedback was strong; the effect of grade was moderate and needs to be examined in light of the two small, but significant, interactions involving grade. We examine the main effect of detailed feedback first and then the intriguing combination of effects involving presentation of grades.

There was a strong significant main effect of detailed feedback on students' final score, F(2, 450) = 69.23, p < .001, η² = .24. Post hoc analyses show that students who did not receive detailed feedback obtained substantially lower final exam scores than did those who received detailed feedback from either the computer (p < .01) or the instructor (p < .01), and there were no differences in students' performance between the computer and instructor conditions (p > .05; see Table 4 for means). Differences between the no-detailed-feedback condition and the two detailed-feedback conditions showed effect sizes (Cohen's d) of between .30 and 1.25, depending on the presence of grade and praise.

There was also a significant difference in the final exam score between students in the grade condition and those in the no-grade condition, F(1, 450) = 4.07, p < .05, η² = .04. Students who were shown the grade they received for their first draft performed less well on the revision than did those who were not shown their grade. This effect needs to be viewed, however, in the context of two significant interaction terms involving grade.

The analysis revealed a significant disordinal interaction between grade and praise, F(1, 450) = 6.00, p < .05, η² = .04. Students in the no-grade/no-praise condition received the highest scores (M = 79.82, SD = 5.12). The lowest scores were observed in the grade/no-praise condition (M = 77.69, SD = 5.12). The grade/praise condition also produced fairly high scores (M = 79.26, SD = 5.12), as did the no-grade/praise condition (M = 79.06, SD = 5.13). Means and standard deviations are presented in Table 5. Although the cell means cannot be directly compared given that they are interaction terms, the simplest explanation of this interaction appears to be that the presentation of grades depressed performance unless ameliorated by the presence of a statement of praise.

There was also a significant interaction between grade and detailed feedback, F(2, 450) = 5.54, p < .01, η² = .08. In the no-detailed-feedback condition, scores were fairly similar for students who received a grade (M = 75.37, SD = 5.12) in comparison with those who did not receive a grade (M = 74.65, SD = 5.12). Under the computer detailed-feedback condition, students' scores were again similar (M = 80.44, SD = 5.12, for the no-grade condition, vs. M = 80.93, SD = 5.12, for the grade condition), but under the instructor detailed-feedback condition, a distinct difference was observed. Students' final exam scores were relatively high when their grade was not presented (M = 82.74, SD = 5.13) and substantially lower when their grade was presented (M = 79.63, SD = 5.12). Means and standard deviations are presented in Table 6.

In summary, the analysis of the performance scores supported the first hypothesis about the overall positive effect of detailed feedback on students' improvement. There were no differences for perceived source of the feedback. Therefore, the hypothesis about the differential effect of computer feedback was not supported. Receipt of a numeric grade led to a substantial decline in performance, especially for students who thought the grade had come from the instructor. However, a praise statement appeared to lessen that effect. The hypothesis positing that presentation of a grade hinders improvement was supported, whereas the hypothesis about the negative effect of praise was not, because no main effect of praise was found. But presenting praise appeared to lessen the negative effect of presenting a grade.

Analysis of Differences in the Final Exam Score by Students' Initial Performance

Following Butler (1988), we decided to investigate whether the differences found for the overall analysis would be replicated if we examined students at varying levels of performance on the initial drafts. To that end, a frequency analysis was run for the initial draft score. The analysis revealed a mean of 74.42, SD = 8.28, and a range from 50 to 96. The analysis of frequency tables showed that 25% of the sample scored at or below 69 (equivalent to letter grades D and F), about 50% received a score between 70 and 79 (equivalent to the letter grade C), and the remaining 25% obtained a score at or above 80 (equivalent to letter grades B and A). On the basis of these cut points, students were identified as having low (N = 116), medium (N = 217), and high (N = 130) initial draft scores. We split the dataset on the first exam score grouping variable and ran a series of 3 × 2 × 2 ANCOVAs with detailed feedback (3 levels), grade (2 levels), and praise (2 levels) as factors and with the first-session grade as a covariate. These analyses examined differences in the final exam scores for students in each initial performance group.
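A reader who wants to mirror the analyses just described could form the three score bands and run the per-band 3 × 2 × 2 ANCOVAs roughly as follows. The data frame and its column names (first_score, final_score, feedback, grade, praise) are hypothetical placeholders; this is a sketch of the general approach, not the authors' analysis code.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

def ancova_by_initial_band(df: pd.DataFrame) -> dict:
    """Cut students into low/medium/high bands on the first-draft score and
    fit a 3 x 2 x 2 ANCOVA (first-draft score as covariate) within each band.

    Assumes columns: first_score, final_score, feedback, grade, praise.
    """
    bands = pd.cut(df["first_score"], bins=[0, 69, 79, 100],
                   labels=["low", "medium", "high"])
    tables = {}
    for band, group in df.groupby(bands, observed=True):
        model = smf.ols(
            "final_score ~ C(feedback) * C(grade) * C(praise) + first_score",
            data=group,
        ).fit()
        # Type II sums of squares; the defaults of the authors' software may differ.
        tables[band] = anova_lm(model, typ=2)
    return tables
```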
Table 3
Descriptive Statistics and Intercorrelations of Study Variables

Variable M SD 1 2 3 4 5 6

Note. For the Self-Efficacy and Positive Affect Scales, N = 462. For the remaining measures, N = 463.
* p < .05. ** p < .01. *** p < .001.
Students With Low Initial Draft Score

For students who received low scores on their initial draft, the analysis revealed a significant Grade × Detailed Feedback interaction, F(2, 103) = 5.27, p < .01, η² = .10. In the no-detailed-feedback condition, scores were higher for students who received a grade (M = 67.85, SD = 6.64) than for those who did not receive a grade (M = 64.15, SD = 6.75). The overall scores were quite low for these groups. Under the computer detailed-feedback condition, students' scores were higher when the grade was presented (M = 75.50, SD = 6.71) than when no grade was presented (M = 72.07, SD = 6.64). Under the instructor detailed-feedback condition, students' final exam scores were relatively high for the no-grade condition, but they were lower when the grade was presented (M = 77.24, SD = 6.86, when no grade was presented; M = 72.07, SD = 6.65, when a grade was presented). See Table 7 for means and standard deviations.

There was also a significant effect for the detailed feedback, F(2, 103) = 18.78, p < .001, η² = .28, with students in the control condition (who received no detailed feedback) scoring significantly lower (M = 65.46, SD = 6.06) than those in either the instructor (M = 75.11, SD = 6.94, p < .01) or computer conditions (M = 73.88, SD = 8.04, p < .01). No differences were revealed between the computer and instructor detailed-feedback conditions (p > .05), and no significant effects were found for grade, praise, or the other interaction terms.

Students With Medium Initial Draft Score

For students who received a medium draft score (between 70 and 79), a significant effect for the detailed feedback, F(2, 204) = 34.87, p < .001, η² = .26, was found. Pairwise comparisons revealed that students in the control condition scored significantly lower (M = 74.23, SD = 4.79) than did those in either the instructor condition (M = 80.23, SD = 6.33, p < .01) or the computer condition (M = 79.54, SD = 5.29, p < .01). No differences were found between the instructor and the computer conditions (p > .05). Additionally, significant differences were found between participants in the grade and no-grade conditions, F(1, 204) = 7.9, p < .001, η² = .09. Students who were shown their initial draft grade scored lower than did those who were not shown their grade (M = 76.06, SD = 5.54, for the grade condition; M = 78.88, SD = 6.03, for the no-grade condition). The Grade × Detailed Feedback interaction was not significant for this group of students.
Table 4
Means and Standard Deviations of the Final Exam Scores by Detailed Feedback, Grade, and Praise

                        No grade                        Grade                           Total
Condition     No praise   Praise    Total     No praise   Praise    Total     No praise   Praise    Total
No feedback
  M           73.80       74.38     74.09     75.11       76.24     75.67     74.44       75.27     74.85
  SD          8.57        9.21      8.84      8.56        7.60      8.07      8.54        8.47      8.49
  N           40          40        80        38          37        75        78          77        155
Computer
  M           81.15       79.75     80.44     79.80       80.28     80.04     80.47       80.01     80.24
  SD          8.43        8.97      8.68      7.07        8.36      7.70      7.75        8.62      8.18
  N           39          40        79        40          40        80        79          80        159
Instructor
  M           83.85       83.26     83.57     78.41       81.74     80.09     81.20       82.47     81.82
  SD          7.60        7.56      7.53      7.84        7.92      8.01      8.14        7.74      7.94
  N           39          35        74        37          38        75        76          73        149
Total
  M           79.55       78.95     79.25     76.80       79.16     78.63     78.69       79.20     78.94
  SD          9.20        9.32      9.24      8.02        8.24      8.15      8.66        8.78      8.71
  N           118         115       233       115         115       230       233         230       463
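The Cohen's d values reported for the feedback contrasts can be approximated from cell means and standard deviations such as those in Table 4. The helper below uses the standard pooled-standard-deviation formula; which specific cells the authors contrasted is not stated, so the pairing in the example is only illustrative.

```python
def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / pooled_var**0.5

# Instructor feedback vs. no feedback, no-grade/no-praise cells from Table 4:
# d = (83.85 - 73.80) / pooled SD of 7.60 and 8.57, roughly 1.24, which is
# close to the upper end of the .30-1.25 range reported in the text.
print(round(cohens_d(83.85, 7.60, 39, 73.80, 8.57, 40), 2))
```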
Table 5
Estimated Marginal Means and Standard Deviations of the Final Exam Score by Grade and Praise

Condition       M       SD      N
No grade
  No praise     79.82   5.12    118
  Praise        79.06   5.13    115
Grade
  No praise     77.69   5.12    115
  Praise        79.26   5.12    115

Note. Adjusted means after controlling for the first exam score.

Table 7
Estimated Marginal Means and Standard Deviations of the Final Exam Score by Grade and Source of Feedback for Low-Ability Students

Condition       M       SD      N
No grade
  No feedback   64.15   6.75    19
  Computer      72.07   6.64    21
  Instructor    77.24   6.86    18
Grade
  No feedback   67.85   6.64    18
  Computer      75.50   6.71    21
  Instructor    72.07   6.65    19

Note. Adjusted means after controlling for the first exam score.
34.87, p ⬍ .001, 2 ⫽ .26, was found. Pairwise comparisons
revealed that students in the control condition scored significantly
lower (M ⫽ 74.23, SD ⫽ 4.79) than did those in either instructor
condition (M ⫽ 80.23, SD ⫽ 6.33, p ⬍ .01) or computer condition Overall, the analyses showed that students who scored low on
(M ⫽ 79.54, SD ⫽ 5.29, p ⬍ .01). No differences were found the first draft responded favorably to detailed feedback and were
between the instructor and the computer conditions ( p ⬎ .05). able to improve upon it. However, when presented with a grade
Additionally, significant differences were found between partici- from the instructor, these students did not do as well as when they
pants in the grade and no-grade conditions, F(1, 204) ⫽ 7.9, p ⬍ were oblivious to their draft grade. At the same time, we found that
.001, 2 ⫽ .09. Students who were shown their initial draft grade low-scoring students did not react negatively to a grade if they
scored lower than did those who were not shown their grade (M ⫽ believed it had come from the computer or when a grade was the
76.06, SD ⫽ 5.54, for the grade condition; M ⫽ 78.88, SD ⫽ 6.03, only feedback they received. Both medium and high scorers were
for the no-grade condition). Grade ⫻Detailed Feedback was found shown to respond well to detailed feedback coming from either the
not to be significant for this group of students. computer or the instructor. Their performance, however, depended
on whether a grade was presented, with those who received a grade
Students With High Initial Draft Score scoring lower than did those who did not. It did not matter whether
the grade came from the computer or the instructor, as students’
For the high-scoring group (80 and above), the ANCOVA response to it was comparably unfavorable.
revealed a significant effect for the detailed feedback, F(2, 117) ⫽
18.13, p ⬍ .001, 2 ⫽ .24, with students in the control condition Analyses of Differences in Motivation, Self-Efficacy,
scoring significantly lower (M ⫽ 84.49, SD ⫽ 4.88) than did those
and Affect
in either the instructor condition (M ⫽ 88.49, SD ⫽ 5.14, p ⬍ .01)
or computer condition (M ⫽ 88.76, SD ⫽ 4.35, p ⬍ .01). No The relationships among detailed feedback, praise, and grades,
differences were found between the computer and instructor and students’ motivation, self-efficacy, and negative and positive
detailed-feedback conditions ( p ⬎ .05). Additionally, significant affect were investigated via two 3 ⫻ 2 ⫻ 2 multivariate analyses
differences were found between the grade and no-grade conditions, of variance (MANOVAs). The first MANOVA included self-
F(1, 117) ⫽ 3.72, p ⬍ .01, 2 ⫽ .05. High-scoring students in the efficacy and motivation as dependent variables, and grade, praise,
grade condition scored significantly lower than did those in the and detailed feedback as independent variables. The second
no-grade condition (M ⫽ 86.54, SD ⫽ 4.95, for the grade condi- MANOVA was run with Positive Affect Scale and Negative
tion; M ⫽ 88.25, SD ⫽ 5.18, for the no- grade condition). Affect Scale scores as dependent variables, and with grade, praise,
and detailed feedback as independent variables. We ran the two
analyses separately because the data for them were gathered at
Table 6 different points in the experiment.
Overall, the analyses showed that students who scored low on the first draft responded favorably to detailed feedback and were able to improve upon it. However, when presented with a grade from the instructor, these students did not do as well as when they were oblivious to their draft grade. At the same time, we found that low-scoring students did not react negatively to a grade if they believed it had come from the computer or when a grade was the only feedback they received. Both medium and high scorers were shown to respond well to detailed feedback coming from either the computer or the instructor. Their performance, however, depended on whether a grade was presented, with those who received a grade scoring lower than did those who did not. It did not matter whether the grade came from the computer or the instructor, as students' response to it was comparably unfavorable.

Analyses of Differences in Motivation, Self-Efficacy, and Affect

The relationships among detailed feedback, praise, and grades, and students' motivation, self-efficacy, and negative and positive affect were investigated via two 3 × 2 × 2 multivariate analyses of variance (MANOVAs). The first MANOVA included self-efficacy and motivation as dependent variables, and grade, praise, and detailed feedback as independent variables. The second MANOVA was run with Positive Affect Scale and Negative Affect Scale scores as dependent variables, and with grade, praise, and detailed feedback as independent variables. We ran the two analyses separately because the data for them were gathered at different points in the experiment.
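As an illustration of how a 3 × 2 × 2 MANOVA of this kind, followed by univariate tests, could be set up (a minimal sketch under assumed, hypothetical file and column names, not the authors' actual code), statsmodels provides both the multivariate tests and the univariate follow-ups:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data set with self_efficacy and motivation scores plus the three factors.
df = pd.read_csv("survey_scores.csv")

# Multivariate tests (Wilks' lambda among them) for the two dependent variables.
manova = MANOVA.from_formula(
    "self_efficacy + motivation ~ C(feedback) * C(grade) * C(praise)", data=df
)
print(manova.mv_test())

# Univariate follow-up ANOVAs for each dependent variable on the factors of interest.
for dv in ("self_efficacy", "motivation"):
    fit = smf.ols(f"{dv} ~ C(grade) + C(praise)", data=df).fit()
    print(dv)
    print(sm.stats.anova_lm(fit, typ=2))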
For self-efficacy and motivation, multivariate tests were significant for the grade factor (Wilks' lambda, F(2, 449) = 5.42, p < .01) and for the praise factor (Wilks' lambda, F(2, 449) = 4.02, p < .01), but not for the detailed feedback or any of the interactions. To test the differences for each of the dependent variables, univariate analyses were performed for motivation and self-efficacy.

For motivation, the univariate results indicated significant differences in motivation levels between students who received praise on their initial performance and those who did not, F(1, 450) = 7.58, p < .01, η² = .04. Interestingly, students in the praise condition reported lower motivation (M = 47.29, SD = 7.66) than did students in the no-praise condition (M = 49.06, SD = 5.71).
For self-efficacy, the results indicated a significant grade effect, F(1, 450) = 10.80, p < .01, η² = .08, with students who received a grade for their initial draft exhibiting lower self-efficacy levels (M = 43.38, SD = 7.03) than did those who were unaware of their draft grade (M = 45.47, SD = 6.36).

For positive and negative affect, multivariate tests were significant only for the grade factor (Wilks' lambda, F(2, 450) = 7.03, p < .01). Univariate analyses were therefore performed for both the positive and the negative affect variables. As with self-efficacy, there was a significant difference in negative affect depending on the presence or absence of a grade, F(1, 450) = 14.09, p < .01, η² = .08. Students who received a grade for their draft reported higher levels of negative affect (M = 25.27, SD = 7.68) than did those who did not receive their grade (M = 22.72, SD = 7.12). For positive affect, there were no significant effects for any of the independent variables or their interactions.

Overall, the presence of a grade was shown to have a significant effect on students' reported self-efficacy and negative affect. Students who received a grade had more negative affect and reported lower levels of self-efficacy than did their counterparts whose grade remained unknown to them. Praise affected motivation, but in an unexpected fashion, with students who were presented with a laudatory statement reporting lower levels of motivation than did those who were not.

Analyses of Differences in Perceived Helpfulness and Accuracy of Feedback

To examine the final issue addressed in the study, the perceived helpfulness and perceived accuracy of detailed feedback, a 3 × 2 × 2 MANOVA was used. Perceived helpfulness and accuracy of detailed feedback were the dependent variables, and grade, praise, and detailed feedback were the independent variables. Multivariate analyses revealed significant effects only for detailed feedback (Wilks' lambda, F(4, 900) = 87.10, p < .001).

Subsequent univariate analyses with perceived accuracy of detailed feedback as the dependent variable revealed a significant effect for the detailed feedback factor, F(2, 451) = 130.98, p < .001, η² = .37. A post hoc analysis yielded significant differences in accuracy ratings between the instructor and computer conditions (p < .01), between the instructor and no-detailed-feedback conditions (p < .01), and between the computer and no-feedback conditions (p < .01). Students who received feedback perceived to come from the instructor rated it as more accurate (M = 5.95, SD = 1.07) than did those who received feedback perceived to be from a computer (M = 5.33, SD = 1.42) or those who did not receive detailed feedback (M = 3.30, SD = 1.91). Of course, those receiving no detailed feedback had little basis for making a judgment.

Univariate analysis with perceived helpfulness of feedback as the dependent variable revealed a significant effect for detailed feedback, F(2, 451) = 206.12, p < .001, η² = .48. A post hoc analysis (p < .01) indicated significant differences in helpfulness ratings between the instructor and computer conditions, between the instructor and no-feedback conditions, and between the computer and no-feedback conditions. Students who received feedback from the instructor rated it as more helpful (M = 6.06, SD = 1.07) than did those who believed that the feedback was computer generated (M = 5.44, SD = 1.56) or those who did not receive detailed feedback (M = 2.79, SD = 1.76).
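The article does not specify which post hoc procedure was used for these pairwise contrasts. Purely as a sketch of how analogous pairwise comparisons of the three feedback conditions could be carried out (Tukey's HSD is shown here as one common choice, and the data file and column names are hypothetical), one might write:

import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data set: each row holds one student's accuracy (or helpfulness) rating
# and the feedback condition ("instructor", "computer", or "none").
df = pd.read_csv("feedback_ratings.csv")

tukey = pairwise_tukeyhsd(endog=df["accuracy"], groups=df["feedback"], alpha=0.01)
print(tukey.summary())  # instructor vs. computer vs. no-feedback contrasts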
Overall, students rated detailed feedback from the instructor as more helpful and accurate than did students in the computer feedback condition. The presence of a grade or praise did not affect perceptions of the accuracy or helpfulness of the feedback.

Discussion

The strongest and most consistent finding of the study was that written, detailed feedback specific to individual work was strongly related to improvement. The effects of grades and praise on performance were more complex. Students in the instructor feedback group who also received a grade for their draft had lower scores than did those who did not receive a grade. However, if they received a grade and a statement of praise, the negative effect was ameliorated. It is interesting to note that the highest-performing group in the study was the one receiving detailed feedback perceived to come from the instructor, with no grade and no praise. These findings are consistent with the research showing that descriptive feedback that conveys information on how one performs the task and details ways to overcome difficulties is far more effective than evaluative feedback, which simply informs students about how well they did (Hattie & Timperley, 2007; Kluger & DeNisi, 1998). Indeed, across the entire sample, for students of all writing ability levels, detailed feedback led to the greatest improvement. The importance of detailed feedback is especially clear for tasks that are loosely framed and do not have a simple right or wrong answer (Bangert-Drowns et al., 1991; Roos & Hamilton, 2005).

Differences in Responses Depending on the Perceived Source of Feedback

We found no significant differences due to the source of feedback. This finding provides partial support for the "computers as social actors" paradigm, which suggests that people may unconsciously perceive computers as "intentional social agents" and that, because of this, computer-provided feedback tends to elicit the same or very similar responses from individuals (Mishra, 2006; Nass et al., 1996, 1999). The support for this paradigm is only partial, because although students' exam scores were quite similar for the computer and instructor conditions, interactions between the source of feedback and grade and praise were consistently found.

The competing paradigm, which proposes that computers are generally perceived as neutral tools (Earley, 1988; Lepper et al., 1993), was not supported here. According to this perspective, computers tend to be viewed as neutral and unbiased sources of information, and feedback received from computers is more trusted by individuals. Quite contrary to this viewpoint, participants in our study rated the instructor's feedback as more accurate and helpful than computer-generated feedback.

Effects of Grades on Student Performance

The effect of receiving a grade in this study was particularly interesting. There was a main effect for grade and two notable interactions. Among those students who believed that they
received their detailed feedback from the instructor, those who were given a grade for their draft showed substantially lower scores than did those who were not. Receiving a grade was also generally associated with lower self-efficacy and more negative affect. One explanation for these findings comes from the feedback intervention theory of Kluger and DeNisi (1996). They suggested that optimal feedback should direct individuals' attention toward the task and toward the specific strategies that would lead to achievement of desired outcomes. Letter grades or numeric scores, being evaluative in nature, tend to turn students' attention away from the task and toward the self, leading to negative effects on performance (Kluger & DeNisi, 1996; Siero & Van Oudenhoven, 1995; Szalma, Hancock, Warm, Dember, & Parsons, 2006). The findings are also consistent with Hattie and Timperley's (2007) argument that feedback focused on the task and not the individual is more effective.

Similarly, attention to the self, elicited by the presentation of a grade, could activate affective reactions. According to Kluger, Lewinsohn, and Aiello (1994), feedback is cognitively evaluated with respect to two dimensions: harm versus benefit and the need to take action. The appraisal of harm or benefit potential for the self is reflected in the primary dimension of mood (pleasantness), whereas the need to take action is reflected in a secondary dimension of mood (arousal). The affective measure administered in this study addressed the arousal dimension of mood. High positive affect was indicative of high arousal, and high negative affect was indicative of depression and behavior inhibition (Crawford & Henry, 2004). The results indicated that students who were shown their draft grade scored significantly higher on the Negative Affect Scale than did their counterparts who did not receive their draft grade. Thus, the effect of the grade may have led students to become depressed about their performance, leaving them less disposed to put forth the effort necessary to improve their work. This effect may have been particularly strong if the grade was perceived to be coming from the instructor (as opposed to being computer generated), hence the large negative impact of a grade on performance in that condition.

The negative effect of grades on students' performance can also be explained through their influence on students' self-efficacy. Self-efficacy has been shown to be influenced by prior outcomes (Bandura & Locke, 2003). Feedback, therefore, has the potential to affect self-efficacy. The current study revealed that presentation of a grade resulted in decreased levels of self-efficacy with regard to the exam. Students who were not shown their draft grade reported higher levels of exam-specific self-efficacy than did those to whom a grade was provided.

Effects of Praise on Student Performance

Our study attempted to clarify the effect of praise on students' performance, motivation, self-efficacy, and affect. Praise is a controversial topic, with some researchers arguing that praise promotes learning by raising positive affect and self-efficacy (Alber & Heward, 2000), whereas others stipulate that it leads to depletion of cognitive resources by taking attention away from the task and focusing it on aspects of the self (Baumeister et al., 1990; Kluger & DeNisi, 1996). This study did not reveal any consistent overall differences in performance among students who did or did not receive praise on their performance. Comments and grades had a stronger influence on students' performance, with praise adding to and modifying their effects. Specifically, we found that praise mitigated the adverse effect of grades on students' performance.

The only outcome measure directly affected by praise was motivation. The effect of praise here was quite interesting, if not surprising. Students presented with praise reported slightly lower levels of motivation than did their counterparts who were not praised on their initial performance (effect size of .27). The only study that somewhat agrees with the finding here was conducted by Butler (1987). The researcher demonstrated that students receiving praise on their performance reported high levels of ego involvement, decreased levels of task involvement, and higher perceptions of success while exhibiting modest performance on a task in comparison with students who were not praised on their work.

The motivation measure administered in our study did not gauge different types of motivation. It is possible that this general motivation measure corresponded to the task-involvement measure used by Butler (1987) and therefore elicited similar responses. Students presented with praise were not as interested in the task and were not as motivated to try harder, believing perhaps that they had achieved enough. This supposition could be confirmed if students' performance reflected it; however, praise appears to have a less direct, mitigating effect on students' performance. Further research is needed.

Differences in Responses to Feedback as Dependent on Students' Draft Score

Several researchers propose that students' responses to feedback messages may depend on their ability or typical performance levels (Black & Wiliam, 1998). Very few studies have examined the differential effects of feedback on the performance of students with different levels of past performance. In the present study, low-, medium-, and high-scoring students on the initial essay draft showed a significant increase in scores when presented with detailed feedback. It did not matter what level their original performance was; students who were offered feedback specific to their own work found ways to incorporate it into their essay and improve their results. After covariate adjustment for pretest performance, feedback accounted for 28% of the variance in the final exam score for students within the low-achievement group, and for 26% and 24% for those in the medium and high groups, respectively. Thus, the positive effect of personalized feedback was observed irrespective of students' initial writing scores.

Although detailed feedback was conducive to learning in students of all performance levels, some differences in students' responses to feedback were found between the low-scoring group on one hand and the medium- and high-scoring groups on the other. Butler (1988) showed that presentation of a grade on its own or in combination with any other information leads to a significant decline of interest in performing the task for low-achieving students. In the current study, students who received high or medium initial scores and were shown a grade performed less well on the revision than did students in the no-grade condition. As was suggested in preceding sections, a grade appeared to undermine the effort that students were willing to put forward to improve their work. However, no overall differences between the grade and no-grade conditions were found for the low-scoring students. Instead, there was a strong Grade ×
Detailed Feedback interaction. Specifically, students receiving grades for their draft performed better in the no-detailed-feedback and computer feedback conditions but worse in the instructor feedback condition. It may be the case that the computer-based grade was viewed as less judgmental or personally directed than was the instructor-based grade.

Limitations

Some limitations of the study should be noted. One of the feedback conditions in the study involved presentation of praise. The decision was made to use standard laudatory comments differentiated according to three levels of the quality of students' initial work. No main effects were found for the praise factor. It is possible that none of the levels of praise were strong enough to induce the responses that are commonly reported in the literature (Baumeister et al., 1990; Delin & Baumeister, 1994; Henderlong & Lepper, 2002). Comments that were more detailed and personal might have induced more positive responses from the participants. At the same time, interaction effects were found between praise and grade, as well as between praise and feedback source, which indicate that the praise manipulation was successful at least to a degree.

The effects of the various conditions examined in this study may well not operate in the same fashion for all individuals. In this study, we did not directly address individual differences among participants, with the exception of considering the influence of initial scores. A systematic investigation into how individuals differ in their responses to feedback would be one particularly fruitful area for further investigation.

The sample of the present study comprised college students who were relatively uniform in age, with the majority of the participants being first-year students. Generalizing the results of the study to wider populations should be approached with caution. Conversely, the fact that the main experimental task was a part of a normal learning experience, and was approached by participants seriously as a regular course exam, contributed to the robustness of the findings.

Finally, the experimental task involved students working on an essay and then coming back a week later to revise their work on the basis of the feedback provided at that time. In other words, the feedback was used to monitor and improve performance on an assignment carried out over a relatively brief period. The students were not assessed later, and they were not given a similar task at a later time. Therefore, the present study does not allow for inferences concerning the long-term effect of feedback on students' performance.

Conclusions and Directions for Future Research

The findings of this study show that detailed, specific, descriptive feedback that focuses students' attention on their work, rather than on the self, is the most advantageous approach to formative feedback. The benefits of such feedback occur at all levels of performance. Evaluative feedback in the form of grades may be helpful if no other options are available and can beneficially be accompanied by some form of encouragement. At the same time, grades were shown to decrease the effect of detailed feedback. It appears that this occurs because a grade reduces a sense of self-efficacy and elicits negative affect around the assessment task.

Although the present study was strengthened by conducting the research in an actual university course, we do not know whether students receiving detailed feedback on the task at hand would perform better in a subsequent task, or whether presentation of a grade led to less learning or simply to less effort on the revision of the work. One clear avenue for future research would be to study how differential feedback influences subsequent learning in a course. It is, of course, difficult to conduct research that would vary the nature of the feedback that students receive on a randomized basis throughout an entire course, for both practical and ethical reasons. Yet, unless we find ways to conduct rigorous research into these issues, and their many elaborations and permutations, we will not learn the most effective approaches to providing feedback and utilizing formative assessment.

References

Airasian, P. W. (1994). Classroom assessment. New York, NY: McGraw-Hill.
Alber, S. R., & Heward, W. L. (2000). Teaching students to recruit positive attention: A review and recommendations. Journal of Behavioral Education, 10, 177–204.
Attali, Y. (2004). Exploring the feedback and revision features of the criterion service. Paper presented at the National Council on Measurement in Education annual meeting, San Diego, CA.
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater v. 2. Journal of Technology, Learning, and Assessment, 4, 123–212.
Bandura, A., & Locke, E. A. (2003). Negative self-efficacy and goal effects revisited. Journal of Applied Psychology, 88, 87–99.
Bangert-Drowns, R. L., Kulik, J. A., & Morgan, M. T. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213–238.
Baumeister, R. F., Hutton, D. G., & Cairns, K. J. (1990). Negative effects of praise on skilled performance. Basic and Applied Social Psychology, 11, 131–148.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5, 7–68.
Boekaerts, M., Maes, S., & Karoly, P. (2005). Self-regulation across domains of applied psychology: Is there an emerging consensus? Applied Psychology: An International Review, 54, 149–154.
Burstein, J. (2003). The e-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 107–116). Hillsdale, NJ: Erlbaum.
Butler, R. (1987). Task-involving and ego-involving properties of evaluation: Effects of different feedback conditions on motivational perceptions, interest, and performance. Journal of Educational Psychology, 79, 474–482.
Butler, R. (1988). Enhancing and undermining intrinsic motivation: The effects of task-involving and ego-involving evaluation on interest and performance. British Journal of Educational Psychology, 58, 1–14.
Butler, R., & Nisan, M. (1986). Effects of no-feedback, task-related comments and grades on intrinsic motivation and performance. Journal of Educational Psychology, 78, 210–216.
Cameron, J., & Pierce, D. P. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis. Review of Educational Research, 64, 363–423.
Crawford, J. R., & Henry, J. D. (2004). The Positive and Negative Affect Schedule (PANAS): Construct validity, measurement properties and normative data in a large non-clinical sample. British Journal of Clinical Psychology, 43, 245–265.
Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125, 627–668.
Delin, C. R., & Baumeister, R. F. (1994). Praise: More than just social reinforcement. Journal for the Theory of Social Behaviour, 24, 219–241.
Earley, P. C. (1988). Computer-generated performance feedback in the subscription-processing industry. Organizational Behavior and Human Decision Processes, 41, 50–64.
Ferdig, R. E., & Mishra, P. (2004). Emotional responses to computers: Experiences in unfairness, anger and spite. Journal of Educational Multimedia and Hypertext, 13, 143–161.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational Research, 77, 81–113.
Henderlong, J., & Lepper, M. R. (2002). The effects of praise on children's intrinsic motivation: A review and synthesis. Psychological Bulletin, 128, 774–795.
Ilgen, D. R., & Davis, C. A. (2000). Bearing bad news: Reactions to negative performance feedback. Applied Psychology: An International Review, 49, 550–565.
Ilies, R., & Judge, T. A. (2005). Goal regulation across time: The effects of feedback and affect. Journal of Applied Psychology, 90, 453–467.
Jolly, J. B., Dyck, M. J., Kramer, T. A., & Wherry, J. N. (1994). Integration of positive and negative affectivity and cognitive content specificity: Improved discrimination of anxious and depressive symptoms. Journal of Abnormal Psychology, 103, 544–552.
Kanouse, D. E., Gumpert, P., & Canavan-Gumpert, D. (1981). The semantics of praise. In J. H. Harvey, W. Ickes, & R. F. Kidd (Eds.), New directions in attribution research (Vol. 3, pp. 97–115). Hillsdale, NJ: Erlbaum.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback interventions on performance: Historical review, a meta-analysis, and a preliminary feedback intervention theory. Psychological Bulletin, 119, 254–284.
Kluger, A. N., Lewinsohn, S., & Aiello, J. (1994). The influence of feedback on mood: Linear effects on pleasantness and curvilinear effects on arousal. Organizational Behavior and Human Decision Processes, 60, 276–299.
Landauer, T. K., Latham, D., & Foltz, P. (2003). Automatic essay assessment. Assessment in Education, 10, 124–135.
Lepper, M. R., Henderlong, J., & Gingras, I. (1999). Understanding the effects of extrinsic rewards on intrinsic motivation: Uses and abuses of meta-analysis: Comment on Deci, Koestner, and Ryan (1999). Psychological Bulletin, 125, 669–676.
Lepper, M. R., Woolverton, M., Mumme, D. L., & Gurtner, J. (1993). Motivational techniques of expert human tutors: Lessons for the design of computer-based tutors. In S. P. Lajoie & S. J. Derry (Eds.), Computers as cognitive tools (pp. 75–106). Hillsdale, NJ: Erlbaum.
Marzano, R. (2000). Transforming classroom grading. Alexandria, VA: Association for Supervision and Curriculum Development.
Mishra, P. (2006). Affective feedback from computers and its effect on perceived ability and affect: A test of the computers as social actors hypothesis. Journal of Educational Multimedia and Hypermedia, 15, 107–131.
Nass, C., Fogg, B. J., & Moon, Y. (1996). Can computers be teammates? International Journal of Human–Computer Studies, 45, 669–678.
Nass, C., Moon, Y., & Carney, P. (1999). Are respondents polite to computers? Social desirability and direct responses to computers. Journal of Applied Social Psychology, 29, 1093–1110.
Nass, C., Moon, Y., & Green, N. (1997). Are computers gender-neutral? Gender stereotypic responses to computers. Journal of Applied Social Psychology, 27, 864–876.
Oosterhof, A. (2001). Classroom applications of educational measurement. Upper Saddle River, NJ: Merrill Prentice Hall.
Orrell, J. (2006). Feedback on learning achievement: Rhetoric and reality. Teaching in Higher Education, 11, 441–456.
Ramaprasad, A. (1983). On the definition of feedback. Behavioral Science, 28, 4–13.
Roesch, S. C. (1998). The factorial validity of trait positive affect scores: Confirmatory factor analyses of unidimensional and multidimensional models. Educational and Psychological Measurement, 58, 451–466.
Roos, B., & Hamilton, D. (2005). Formative assessment: A cybernetic viewpoint. Assessment in Education, 12, 7–20.
Scriven, M. (1967). The methodology of curriculum evaluation. In R. Taylor, R. Gagne, & M. Scriven (Eds.), AERA monograph series on curriculum evaluation (Vol. 1, pp. 39–83). Chicago, IL: Rand McNally.
Shute, V. J. (2007). Focus on formative feedback (Rep. No. RR-07-11). Princeton, NJ: Educational Testing Service.
Siero, F., & Van Oudenhoven, J. P. (1995). The effects of contingent feedback on perceived control and performance. European Journal of Psychology of Education, 10, 13–24.
Smith, E., & Gorard, S. (2005). They don't give us our marks: The role of formative feedback in student progress. Assessment in Education: Principles, Policy & Practice, 12, 21–38.
Spencer, S. (2005). Stereotype threat in mathematics in undergraduate women. Unpublished doctoral dissertation, Rutgers University, New Brunswick, NJ.
Symonds, K. W. (2004). After the test: Closing the achievement gap with data. Naperville, IL: Learning Point Associates.
Szalma, J. L., Hancock, P. A., Warm, J. S., Dember, W. N., & Parsons, K. S. (2006). Training for vigilance: Using predictive power to evaluate feedback effectiveness. Human Factors, 48, 682–692.
Vancouver, J. B., More, K. M., & Yoder, R. J. (2008). Self-efficacy and resource allocation: Support for a nonmonotonic discontinuous model. Journal of Applied Psychology, 93, 35–47.
Vancouver, J. B., Thompson, C. M., & Williams, A. A. (2001). The changing signs in the relationships between self-efficacy, personal goals, and performance. Journal of Applied Psychology, 86, 605–620.
Watson, D., Clark, L. A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: The PANAS Scales. Journal of Personality and Social Psychology, 54, 1063–1070.
Wiliam, D., & Thompson, M. (2007). Integrating assessment with instruction: What will it take to make it work? In C. A. Dwyer (Ed.), The future of assessment: Shaping teaching and learning. Mahwah, NJ: Erlbaum.
Wise, S. L., & DeMars, C. E. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10, 1–17.
Wolf, L. F., & Smith, J. K. (1995). The consequence of consequence: Motivation, anxiety, and test performance. Applied Measurement in Education, 8, 227–242.
Wolf, L. F., Smith, J. K., & Birnbaum, M. E. (1995). Consequence of performance, test motivation, and mentally taxing items. Applied Measurement in Education, 8, 341–351.
Appendix