Students are often encouraged to generate and answer their own questions on to-be-remembered material,
because this interactive process is thought to enhance memory. But does this strategy actually work? In three
experiments, all participants read the same passage, answered questions, and took a test to get accustomed to
the materials in a practice phase. They then read three passages and did one of three tasks on each passage:
reread the passage, answered questions set by the experimenter, or generated and answered their own
questions. Passages were 575-word (Experiments 1 and 2) or 350-word (Experiment 3) texts on topics such
as Venice, the Taj Mahal, and the singer Cesaria Evora. After each task, participants predicted their
performance on a later test, which followed the same format as the practice phase test (a short-answer test in
Experiments 1 and 2, and a free recall test in Experiment 3). In all experiments, best performance was
predicted after generating and answering questions. We show, however, that generating questions led to no
improvement over answering comprehension questions, but that both of these tasks were more beneficial than
rereading. This was the case on an immediate short-answer test (Experiment 1), a short-answer test taken 2
days after study (Experiment 2), and an immediate free recall test (Experiment 3). Generating questions took
at least twice as long as answering questions in all three experiments, so although it is a viable alternative to
answering questions in the absence of materials, it is less time-efficient.
Much recent work has focused on optimizing students' study strategies. Some strategies are more successful than others in producing long-term retention. The most consistently effective technique seems to be self-testing (Carpenter, Pashler, & Vul, 2006; Karpicke & Roediger, 2007; McDaniel, Roediger, & McDermott, 2007; see Roediger & Karpicke, 2006a, for a review); taking a test on material sometimes more than doubles retention compared to control conditions involving unrelated tasks or restudy of the material. This effect has been demonstrated with a wide range of materials such as paired associates (Carrier & Pashler, 1992; Karpicke & Roediger, 2008), as well as more complex materials such as passages (Kang, McDermott, & Roediger, 2007; Nungester & Duchastel, 1982; Roediger & Karpicke, 2006b) and lectures (Butler & Roediger, 2007). It has also been successfully implemented in a real-world classroom setting (McDaniel et al., 2007). However, despite the demonstrated benefits of self-testing, students do not tend to implement this technique when left to their own devices. Karpicke, Butler, and Roediger (2009) asked college students about their study strategies and found that rereading was reported as a strategy over eight times more often than self-testing (84% and 11% of students surveyed, respectively). Kornell and Son (2009) also found that when self-testing does occur, the motivation is diagnostic rather than an appreciation of the direct benefits of testing.

Why does self-testing occur so infrequently? One probable explanation is that students are not aware of the benefits of self-testing. Evidence for this explanation comes from predictions made by students about future memory performance following testing or restudy. Karpicke and Roediger (2008) found that students did not expect any change in retention following retrieval practice, and Agarwal, Karpicke, Kang, Roediger, and McDermott (2008) found that students predicted the same level of performance after testing as after restudy. However, another reason why students may not engage in self-testing is that they may not have access to the resources required to implement this technique. Testing schedules and materials used in laboratory studies are produced by the experimenter, and when these studies are extended to classroom settings, students are provided with practice tests rather than asked to develop their own (e.g., McDaniel et al., 2007). Testing as retrieval practice rather than a diagnostic tool has not yet been accepted in mainstream education, so students may simply not have sets of questions to use for self-testing. In this article, we set out to investigate whether an alternative, more easily implemented technique could yield comparable benefits to taking a practice test. In the absence of practice test questions, could generating and answering one's own practice questions lead to similar benefits for retention?

Yana Weinstein, Kathleen B. McDermott, and Henry L. Roediger III, Department of Psychology, Washington University in St. Louis.

Support for this research was provided by a James S. McDonnell Foundation 21st Century Science Initiative grant: Bridging Brain, Mind and Behavior/Collaborative Award. Thanks to Kristy Duprey for assistance with data collection.

Correspondence concerning this article should be addressed to Yana Weinstein, Department of Psychology, Box 1125, Washington University, One Brookings Drive, St. Louis, MO 63130. E-mail: y.weinstein@wustl.edu
A COMPARISON OF STUDY STRATEGIES FOR PASSAGES 309
Another reason why students may be reluctant to engage in self-testing is the mental effort involved in retrieving information from memory. Whereas it has been argued that this act of effortful retrieval is crucial to promoting long-term retention (e.g., Gardiner, Craik, & Bleasdale, 1973; Jacoby, 1978), there is also some evidence that self-testing is beneficial to later memory even when the effort during the retrieval process is minimized. Agarwal et al. (2008) showed that the benefits of testing for later retention extend to open-book practice tests—that is, when students practice questions with the material in front of them, but later take a traditional closed-book test. Open-book tests presumably require less retrieval effort and are also preferred by students (Ben-Chaim & Zoller, 1997).

In the present experiments, an additional self-testing technique is introduced. In this technique, students generate and also answer their own questions after reading a passage. Performance on a subsequent closed-book test for passages studied in this manner was compared with two control conditions: one in which participants answered questions set by the experimenter (analogous to an open-book test), and one in which they reread the passage. The effects of these three study strategies were examined on a short-answer test taken 30 to 45 min after initial study (Experiment 1), a short-answer test taken 2 days after study (Experiment 2), and a free recall test taken 30 to 45 min after study (Experiment 3).

The main aim of this article was to determine whether generating questions could provide benefits equal to those of self-testing using prepared materials, which may not be available to all students. However, there is some evidence to suggest that this condition might produce benefits over and above self-testing using materials prepared by a third party. Searching the text for information to generate questions could provide benefits via three processes: generation, elaboration, and synthesis. First, the generation effect reveals that self-generated information is remembered better than information that is passively encoded (Slamecka & Graf, 1978), at least in a mixed-list design (McDaniel & Bugg, 2008). Of course, the answer-questions condition also involves generation, but it could be that generating both the questions and the answers will produce additional benefits to retention. Second, one account of the testing effect ascribes the benefits of testing to the increased elaboration that results from having to retrieve an answer (Carpenter, 2009; Carpenter & DeLosh, 2006). In our task, selecting information for the questions could produce greater elaboration leading to better retention of the material. Third, preparing to relay information to a peer has been found to improve retention because it promotes synthesis of the material (Nestojko, Bui, Kornell, & Bjork, 2009). In our task, scanning the text with a view to finding appropriate material for questions could produce a similar effect.

An important motivation for testing this technique is that it is often recommended by educators, who may believe that generating questions promotes deeper engagement with and understanding of the material, thus promoting retention (see, for instance, Robinson [1970], who proposed a study technique called SQ3R: survey, question, read, recite, review). Up until now, this recommendation has seldom been tested, apart from in large-scale studies involving lengthy training procedures to get students accustomed to the technique (e.g., Martin, 1985; see McDaniel, Howard, & Einstein, 2009). Our primary goal was to empirically test the effectiveness of the advice in a simple paradigm.

In addition to this primary goal, the article also addressed two additional questions: metacognition and efficiency of the study strategies. First, if generating and answering one's own questions is beneficial to later memory, this is only helpful insofar as students are aware of the benefits and choose to use the technique. We thus collected participants' predictions of how they would perform in each of the three conditions (rereading the passage, answering questions set by the experimenter, and generating and answering their own questions). These judgments allowed us to determine whether participants could differentiate among the three tasks when predicting their performance on the final test and whether predictions followed the same pattern as performance. Second, as time is a limited resource, it was also important to determine how long each task would take. To measure the efficiency of each of the three tasks, we let participants spend as much time as they needed on each task. Any task that could potentially aid later test performance also takes time, and there is a trade-off between these benefits and the time taken to achieve them. In the event that differences in performance are found among tasks, these must be qualified by their efficiency—that is, improvements in performance as a function of additional time taken.

Experiment 1

Method

Participants. Twenty-nine participants volunteered for the experiment and were reimbursed $10 for 1 hour of their time. Participants were recruited over the summer from the Psychology Department Subject Pool at Washington University in St. Louis. Participants were thus current students, recent graduates, and members of the local community. The age range was 18 to 32, with a mean age of 21.4 years (SD = 2.8 years). Participants were predominantly female (23 women and 6 men). The sample was ethnically diverse, with 12 Caucasians, 19 Asian or Pacific Islanders, and 7 African Americans. Twenty-three participants were current undergraduates: 6 had completed 1 year of college, 7 had completed 2 years, and 10 had completed 3 years. Of the remaining participants, three held a bachelor's degree, two held a master's degree, and one had completed grade school. Three of the participants were not native English speakers, but had spoken English for 8, 9, or 16 years each.

Materials. Four passages were created by adapting Wikipedia pages on Salvador Dalí, the KGB, Venice, and the Taj Mahal (see Supplemental Materials for the passages reproduced in full; these materials were designed and previously used by Butler, Marsh, & Roediger, 2005, but are reproduced here for the first time). Passages were approximately 575 words long and were divided into four paragraphs. For the final test, two questions per paragraph were devised (see Supplemental Materials). All questions could be answered by a single word or short phrase.

In addition to the eight final test questions, eight comprehension questions were also devised per passage (see Supplemental Materials). The comprehension questions were used in the encoding task in the answer questions condition, whereas the final test questions were given to all participants at the end of the experiment to test their memory of the passages. To equate the quality and content of comprehension questions set by the experimenter with those generated by participants, we conducted a pilot study. A
group of 26 participants from the same pool as those in the current study were first shown the practice passage and comprehension questions written by the experimenter and asked to generate eight questions of a similar type for one of the three passages. They were instructed to generate two questions for each of the four paragraphs. From this bank of questions, we picked the two most frequently generated questions for each of the four paragraphs of each passage (eight questions per passage in total), with the constraint that four questions per passage probed information that was later tested, and the other four questions probed information that was not tested. As a result, the final test for each passage consisted of four questions that tested information from the comprehension questions, and four questions that tested new information. Note that these overlapping questions did not necessarily consist of the same wording, but tested the same material. For instance, one of the comprehension questions for the Dalí passage was: "What painting movement was Dalí a part of?" In the final test, the same information was probed with the following question: "Salvador Dalí created some of the most widely recognized images to come out of what artistic movement?"

Design. We used a within-subjects design with study condition (reread/answer questions/generate questions) as the only manipulated variable. The order of conditions and the assignment of passages to conditions were randomly determined by the program for each participant.

Procedure. Participants were tested individually with the experimenter present in the room during the session. Participants were told that they would study some passages for a later test. They were also told that for each passage, they would either be reading the text twice, or answering questions after initial reading, or generating questions after initial reading. Participants were not told how many passages to expect, or the order of conditions, to avoid anticipation of the third condition once two passages had been studied. Instructions were presented on the computer, and each passage was handed to participants printed on a single sheet of paper.

Participants initially took part in a practice phase so that they could familiarize themselves with the format of the passages, comprehension questions, and test. They were handed the practice passage and read it at their own pace. Once they were done reading the practice passage, the experimenter handed them a sheet of paper with eight comprehension questions, which participants answered while keeping the practice passage in front of them. Again, there were no time constraints on this task. Following completion of the questions, the passage and comprehension questions were removed and participants took the practice test, which consisted of eight short-answer questions (including four questions on material tested in the comprehension questions, and four questions on untested material). These questions appeared on the screen, and participants typed their responses. Following completion of the practice phase, participants were reminded that the comprehension questions they had answered should serve as an example of the types of questions they would be expected to generate later on in the experiment, and the main experiment began. Note that the task participants performed in the practice phase was equivalent to that performed in the answer questions condition in the main part of the experiment.

Upon completion of the practice phase, participants were handed their first passage to read at their own pace. Which passage they got was determined randomly by the program. At this stage, participants did not know which task they would be performing on the passage they were currently reading. After reading the passage, participants pressed a key to continue and were then instructed to do one of the three tasks: read the passage again at their own pace (reread condition); answer eight comprehension questions with the passage in front of them (answer questions condition); or generate eight comprehension questions and answers with the passage in front of them (generate questions condition). There were no time constraints on any of these tasks, and time taken was measured. For the answer questions condition, participants were handed a sheet with eight comprehension questions and blanks to fill in their responses. Questions were arranged in the order information appeared in the passage. For the generate questions condition, participants were handed a sheet of paper with eight long blanks for questions and eight shorter blanks for responses. Participants were instructed to generate two questions for each of the four paragraphs in the passage, and fill in the answers. Once participants completed the appropriate task for one passage, they were asked to estimate how much of the information from that passage they thought they would remember at the end of the experiment. Responses were given as a number from 0 ("you don't think you're going to remember anything at all") to 100 ("you think you're going to remember the passage perfectly"). The whole process was then repeated for the other two passages. Every participant read one passage twice, answered comprehension questions on another passage, and generated comprehension questions and answers on a third passage.

Following a 15-min retention interval during which participants played Tetris, they were tested on the material from each passage. A total of 24 questions were answered, eight from each of the three passages. Questions were blocked by passage, and the order of blocks and questions within the blocks was randomized.

Results and Discussion

Below we present the results as a function of condition for predictions (how much of the information participants thought they would remember on the test); performance (what proportion of test questions participants answered correctly); and time on task (both to read the passage initially, and then to complete the task appropriate to each condition). All three dependent measures were subjected to within-subjects analyses of variance (ANOVAs), with study condition (reread/answer questions/generate questions) as the within-subjects variable. All reported effects were significant at p < .05. Follow-up t-tests were only performed for significant main effects.

Predictions. Participants predicted how much information they would remember from each passage on a scale from 0 to 100 after completing the appropriate task (rereading the passage, answering questions, or generating questions). Predictions are presented in the left panel of Figure 1. Participants felt that they would do better on the test after having generated their own comprehension questions and answers (Mprediction = 72.4; SD = 17.3) than after having read a passage twice (Mprediction = 63.0; SD = 21.2) or answered comprehension questions set by the experimenter (Mprediction = 63.4; SD = 21.8). There was a significant difference
in predictions among study conditions, F(2, 56) = 5.37, ηp² = .16. In particular, predictions for the generate condition were significantly higher than predictions made in the reread condition, t(28) = 3.16, d = 0.59, and predictions made in the answer questions condition, t(28) = 2.41, d = 0.45. Predictions made in the reread and answer questions conditions did not differ (p = .91).

[Figure 1, left panel: predicted accuracy (0–100) in the Read, Answer, and Generate conditions. The scale ran from 0 = "you don't think you're going to remember anything at all" to 100 = "you think you're going to remember the passage perfectly."]

Performance. Performance was measured in terms of the proportion of questions participants answered correctly on the final test (out of a total of eight per passage). Each answer was scored as either correct or incorrect (there were no half-points awarded) by a research assistant who was blind to the experimental conditions and hypotheses. Questions varied somewhat in the amount of information required. For questions that required only one word, a correct point was awarded when that word was included in the answer. Points were not deducted for incorrect spellings. For instance, a question with a one-word answer is "Between the 9th and 12th centuries, Venice flourished as the result of trade between Western Europe and what empire?" and the correct answer is "Byzantine." Participants would get a point for answers such as "Bysantine," "Byzantin," and "Byzantine Empire," but not for the answer "Roman." For questions that required a short sentence, a correct point was awarded when the answer was judged to contain at least two-thirds of the correct information. An example of such a question is "For what action did Salvador Dalí praise Francisco Franco?" and the correct response is "Signing death warrants for political prisoners." Participants would get a point for answers such as "Signing death orders for political prisoners," "Killing […]

[Figure: proportion correct (0–1) on the final test in the Read, Answer, and Generate conditions.]

[…] (performance in the answer questions and generate questions conditions was practically identical). Instead, the effect was driven by a significant difference in performance between the reread condition and the two question conditions. […] In the generate questions condition, however, the number of test questions that referred to material that participants had not included in their generated questions ranged from 3 to 8 (Muntested = 5.3, SD = 1.1). Whereas performance on the untested questions was numerically higher when participants generated questions (Mperformance = .64; SD = .31) than when participants answered questions (Mperformance = .53; SD = .36), this difference did not reach significance (p = .17).

To check for order effects, we examined whether performance in the answer and generate questions conditions differed depending on which condition came first. Because of the random nature of condition order, 11 participants answered questions before generating questions, and 18 participants generated questions before answering them (although recall that all participants took part in a practice phase in which they answered example questions, so all had been exposed to questions written by the experimenter). A 2 × 2 mixed ANOVA on performance with study condition (answer/generate) as the within-subjects variable and order as the between-subjects variable produced no significant effects (ps > .09).

Time on task. Two sets of data were analyzed in relation to time on task. First, the time taken to read the initial passage was compared between conditions. No differences were expected here because participants in fact did not know which task was coming up while reading the passage. As predicted, no such differences were found (p = .80); the mean time spent initially reading each passage was 181 seconds across all conditions. More importantly, we also looked at the time spent on each of the three tasks (rereading, answering questions, and generating questions) after the initial reading of the passage. The time spent on each of those three tasks is presented in the left panel of Figure 3. Time on task differed by study condition, F(2, 48) = 111.25, ηp² = .82. Answering questions took on average 23 seconds longer than rereading the passage, although this difference was not significant, t(24) = 1.74, p = .09.¹ However, generating questions took more than three times longer than either rereading the passage, t(24) = 11.45, d = 2.29 (Mdifference = 279 s, SD = 120), or answering questions, t(24) = 11.16, d = 2.23 (Mdifference = 256 s, SD = 115). Thus, although the two question conditions produced comparable recall, the condition in which the questions were provided to students was much more efficient.

Experiment 2

Method

Participants. […] Islanders, 2 African Americans, 1 Hispanic, and 1 person who ticked "Other." Eighteen participants were current undergraduates: 5 had completed 1 year of college, 6 had completed 2 years, and 7 had completed 3 years. Two participants were current graduate students who had completed 1 and 2 years of graduate school, respectively. Of the remaining participants, two held a bachelor's degree and two held a master's degree. Four of the participants were not native English speakers, but had spoken English for 8, 13, 16, or 16 years each.

Design and procedure. The materials and procedure for this experiment were identical to those in Experiment 1, except that the study and test phases were separated by a retention interval of two days. Participants in this experiment also took a free recall test prior to a short answer test in the same format as that of Experiment 1.
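The one-way within-subjects (repeated-measures) ANOVAs reported for these experiments can be sketched concretely. The following is an illustrative reconstruction, not the authors' analysis code; the proportion-correct scores below are invented for the example, and only the standard sums-of-squares partition is shown.

```python
import numpy as np

# Hypothetical proportion-correct scores: rows = participants,
# columns = reread / answer questions / generate questions.
# Values are invented for illustration only.
scores = np.array([
    [0.50, 0.75, 0.75],
    [0.38, 0.62, 0.75],
    [0.62, 0.75, 0.62],
    [0.50, 0.88, 0.75],
    [0.38, 0.62, 0.62],
    [0.62, 0.75, 0.88],
])
n, k = scores.shape

grand_mean = scores.mean()
cond_means = scores.mean(axis=0)   # mean per study condition
subj_means = scores.mean(axis=1)   # mean per participant

# Sums-of-squares partition for a one-way repeated-measures ANOVA:
# total variability splits into condition, subject, and residual terms.
ss_cond = n * ((cond_means - grand_mean) ** 2).sum()
ss_subj = k * ((subj_means - grand_mean) ** 2).sum()
ss_total = ((scores - grand_mean) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj  # condition-by-subject residual

df_cond = k - 1
df_error = (k - 1) * (n - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)

# Partial eta squared, the effect size reported in the text.
eta_p2 = ss_cond / (ss_cond + ss_error)

print(f"F({df_cond}, {df_error}) = {F:.2f}, partial eta^2 = {eta_p2:.2f}")
```

With 29 participants and 3 conditions, as in Experiment 1, the degrees of freedom would be (2, 56), matching the F(2, 56) statistic reported for the predictions analysis.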
well after answering and generating questions, but worse after this issue is to use a criterial test that does not involve cues that
rereading the passage. Study condition had a significant effect on could overlap differentially with the encoding tasks. In Experiment
performance F(2, 46) ⫽ 2.86, 2p ⫽ .18. As in Experiment 1, 3 we gave participants a free recall test in which they were simply
performance did not differ between the answer questions and asked to recall as much information as possible from each passage.
generate questions conditions, but rereading the passage produced This design avoided issues of overlap between the questions in the
worse performance than both the answer questions condition, task and the test questions.
t(23) ⫽ 2.51, d ⫽ 0.51, and the generate questions condition,
t(23) ⫽ 3.63, d ⫽ 0.74. As in Experiment 1, there was no
additional benefit of generating questions over answering ques- Method
tions, although both tasks led to better performance than simply
Participants. Thirty-three participants volunteered for the ex-
rereading the passage. Performance across the three conditions was
periment and were reimbursed $10 for 1 hour of their time. As in
10% worse than in Experiment 1, as a result of the 2-day retention
Experiments 1 and 2, participants were recruited over the summer
interval. A cross-experiment comparison yielded a main effect of
study condition, F(2, 102) ⫽ 15.00, 2p ⫽ .22, and a main effect of from the Psychology Department Subject Pool at Washington
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
retention interval, F(1, 51) ⫽ 4.97, 2p ⫽ .09, on performance, but University in St Louis. Participants were thus current students,
This document is copyrighted by the American Psychological Association or one of its allied publishers.
no interaction between the two ( p ⫽ .85). recent graduates, and members of the local community. The age
Looking only at test questions that were not probed in the range was 18 to 31, with a mean age of 21.0 years (SD ⫽ 2.5
answered and generated questions (the number of which ranged years). There were more women than men in the sample (21
from 2 to 7; M ⫽ 4.7, SD ⫽ 1.2), as in Experiment 1, perfor- women and 12 men). There were 16 Caucasians, 12 Asian or
mance was numerically higher in the generate questions condi- Pacific Islanders, 3 African Americans, and 2 people who ticked
tion (Maccuracy ⫽ .45; SD ⫽ .28) than in the answer questions “Other.” Twenty-six participants were current undergraduates: 10
condition (Maccuracy ⫽ .42; SD ⫽ .27), but this difference did had completed 1 year of college, 4 had completed 2 years, 10 had
not approach significance ( p ⫽ .66). Eleven participants an- completed 3 years, and 2 had completed 4 years. Three participants
swered questions before generating questions, and 13 participants were current graduate students who had completed 1 and 2, and 7
generated questions before answering. A 2 ⫻ 2 mixed ANOVA on years of graduate school respectively. Of the remaining partici-
performance with study condition (answer/generate) as the within- pants, three held a bachelor’s degree and one held a master’s
subjects variable and order as the between-subjects variable pro- degree. Five of the participants were not native English speakers,
duced no significant effects ( ps ⬎ .38). but had spoken English for 6, 11, 12, 14, or 16 years each.
Time on task. The mean time spent initially reading each Materials. Four shorter passages were created for this exper-
passage was 182 seconds (no differences among condition, p ⫽ iment. Passages were created by adapting Wikipedia pages on the
.80). The time spent on each of the three study tasks is presented film director Pedro Almodovar (this was used as the practice
in the middle panel of Figure 3. Time on task differed by study passage), the singer Cesaria Evora, the archipelago Svalbard, and
condition F(2, 40) ⫽ 88.06, 2p ⫽ .82. Answering questions took the TV show Top Gear (see Supplemental Materials). Passages
on average 17 seconds longer than rereading the passage, and this were approximately 350 words long, and contained approximately
difference was not significant, ( p ⫽ .17). However, generating 40 idea units each, as defined by two raters. An idea unit was
questions took more than three times longer than either rereading the passage, t(21) = 9.67, d = 2.11 (Mdifference = 303 s, SD = 120), or answering questions, t(22) = 10.70, d = 2.23 (Mdifference = 286 s, SD = 71).

Experiment 3

In Experiments 1 and 2 we showed that generating answers to questions, whether self- or other-generated, improved performance on a cued-recall test relative to rereading, both after a 15-min and after a 2-day retention interval. We also showed that generating questions did not lead to any benefits over and above answering questions set by someone else. However, performance on the final test may in part have been affected by the overlap between the questions that participants interacted with in the task and the questions on the final cued-recall test. More specifically, whereas the overlap between the questions set by the experimenter and the final test questions was controlled, this level of control was not possible in the generate questions condition. The questions set by the experimenter were taken from the same pool as those generated by participants (from a pilot study; see Experiment 1). However, the generated questions overlapped on average less with the final test questions than did the questions set by the experimenter, and were, by design, more variable. One way to get around …

… identified as one self-contained fact, and there could be multiple idea units in each sentence. An example of a sentence containing two units is: "Top Gear is an award-winning BBC TV series about motor vehicles, primarily cars." The two idea units in this sentence are: "Top Gear is an award-winning BBC TV series" and "Top Gear is about motor vehicles, primarily cars." Passages were split into four paragraphs, and two comprehension questions per paragraph were devised by the experimenter (see Supplemental Materials).

Design and procedure. As in Experiments 1 and 2, we used a within-subjects design with study condition (reread/answer questions/generate questions) as the only manipulated variable. The assignment of passages to conditions was counterbalanced, whereas the order of conditions was randomly determined by the program for each participant.

The procedure was identical to that of Experiment 1, except that at test participants had 5 min per passage to recall as much information as they could without any cues. Responses were typed by participants. Instructions stated that the order in which the information was recalled did not matter and that participants should try their best to recall as much content as they could. This free recall test was also given for the practice passage, so participants had experience of the type of test they would be getting.
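As an illustration of the idea-unit scoring scheme used for the free recall test, the following is a minimal sketch (not the authors' actual procedure, which was done by hand by two raters blind to condition); the rater judgments, variable names, and function are invented for the example, and the two idea units are taken from the Top Gear sentence above:

```python
# Hypothetical sketch of idea-unit scoring for a free recall protocol.
# Each idea unit is one self-contained fact. Each rater awards 1 point
# for a unit judged to contain at least 2/3 of its information, else 0;
# the two raters' scores are averaged, and performance is the
# proportion of idea units credited as recalled.

idea_units = [
    "Top Gear is an award-winning BBC TV series",
    "Top Gear is about motor vehicles, primarily cars",
]

# Invented judgments for one participant's recall protocol.
rater_a = [1, 0]
rater_b = [1, 1]

def proportion_recalled(scores_a, scores_b):
    """Average the two raters' scores per idea unit, then return
    the proportion of units credited as recalled."""
    per_unit = [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
    return sum(per_unit) / len(per_unit)

print(proportion_recalled(rater_a, rater_b))  # 0.75 for this example
```

In the actual study, per-participant proportions computed this way would then feed the within-subjects ANOVA across the three study conditions.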
314 WEINSTEIN, McDERMOTT, AND ROEDIGER
Below we present the results as a function of condition for predictions (how much of the information participants thought they would remember on the test), performance (what proportion of idea units participants recalled on the free recall test), and time on task (both to read the passage initially and then to complete the task appropriate to each condition). All three dependent measures were subjected to within-subjects ANOVAs with study condition (reread/answer questions/generate questions) as the within-subjects variable.

Predictions. Participants predicted how much information they would remember from each passage on a scale from 0 to 100 after completing the appropriate task (rereading the passage, answering questions, or generating questions). Predictions are presented in the right panel of Figure 1 and produced a pattern similar to those of Experiments 1 and 2. Predictions were numerically higher for the generate questions condition than for the other two conditions, but the main effect of task was not significant (p = .10).

Performance. Scoring was done by two raters blind to the conditions. Participants received one point for each idea unit for which they correctly recalled 2/3 of the information. For instance, for the idea unit "She became an international star at the age of 47," "She became a star at the age of 47" would receive one point. Scores were averaged across those given by the two raters. The two raters' scores were highly correlated, r = .88. Performance on the final test, in terms of the proportion of idea units recalled in each condition, is presented in the right panel of Figure 2. These scores are much lower than the short-answer test results from the previous experiments because the present experiment involved a free recall test. As in the previous experiments, there was a significant difference in performance between study conditions, F(2, 64) = 4.72, ηp² = .13. Performance did not differ between the answer questions and generate questions conditions (p = .36), but rereading the passage resulted in fewer idea units being recalled than both generating and answering one's own questions, t(32) = 3.18, d = 0.87, and answering questions set by the experimenter, t(23) = 1.95, d = 0.33, p = .06, although the latter difference did not reach significance. As in Experiments 1 and 2, there was no additional benefit of generating questions over answering questions, but both tasks led to better performance than simply rereading the passage. Looking only at idea units that were not probed in the answer questions and generate questions conditions, performance was identical across the two conditions (Maccuracy = .39, SD = .15). Fourteen participants answered questions before generating questions, and 19 participants generated questions before answering them. A 2 × 2 mixed ANOVA on performance with study condition (answer/generate) as the within-subjects variable and order as the between-subjects variable produced no significant effects (ps > .24).

Time on task. The mean time spent initially reading each passage was 141 s (no differences between conditions, p = .95). The time spent on each of the three study tasks is presented in the right panel of Figure 3. Time on task differed by study condition, F(2, 50) = 140.35, ηp² = .85; in particular, answering questions took double the amount of time that it took to reread the passage, t(26) = 8.23, d = 1.58, and generating questions took double the amount of time that it took to answer questions, t(28) = 11.37, d = 2.11.

General Discussion

The main aim of the present research was to test the effectiveness of an often-recommended study technique designed to aid retention of information presented in a passage. The technique requires students to create their own comprehension questions while reading the passage in front of them and then to answer those questions. This study technique was pitted against two alternative techniques: rereading the passage with no other task, and answering questions prepared by the experimenter. Rereading is a technique commonly employed by students preparing for a test (Karpicke et al., 2009), even though experimental findings demonstrate that additional readings of a text sometimes do not produce much improvement in performance (Callender & McDaniel, 2009a). Answering comprehension questions provided by the experimenter in preparation for a test has been shown to yield performance superior to that after rereading (Agarwal et al., 2008; Nungester & Duchastel, 1982). The advantage of the study technique we tested is that it does not require any material other than the passage itself. Thus, if this technique produces performance comparable to one in which students answer questions prepared by someone else, it could be useful in situations in which such materials are not available. Indeed, in three experiments we found that generating and answering one's own questions in preparation for a memory test produced performance comparable to answering the experimenter's questions, and always represented a significant improvement in performance over rereading. However, we did not find generating one's own questions to be any more beneficial to retention than answering questions set by the experimenter. This pattern of data was found on an immediate cued-recall test (Experiment 1), a delayed cued-recall test (Experiment 2), and an immediate free recall test (Experiment 3).

It is important to consider the role of individual differences in the effectiveness of a task that puts the onus on the student to generate his or her own material for study. One issue that arises is that higher ability students may be more adept at selecting information that will later be tested. Some relevant preliminary work came to our attention after the experiments presented here were completed. Callender and McDaniel (2009b) had low- and high-ability readers highlight key information in a passage or generate questions about that information. One week later, participants returned and restudied the highlighted information or answered the questions they had generated, and then took a cued-recall test. In addition to performing quantitatively better on all tasks, high-ability readers were also qualitatively different from low-ability readers in that they selected more important information for highlighting or generating questions. Low-ability readers appeared to benefit from the generation task only insofar as their generated questions overlapped with the test questions. We did not design our study in a way that would permit an analysis of individual differences, but we were able to conduct rudimentary post hoc analyses to see whether overall performance mediated the differences between conditions that we report. For each experiment we calculated mean differences between each pair of conditions (read-generate, read-answer, and answer-generate) and correlated these difference scores with overall performance. None of the correlations reached significance in any of the three experiments, suggesting that the pattern of results we report was not driven by a subset of the sample. In addition, in Experiment 3 we eliminated the issue of overlap between the questions answered in
A COMPARISON OF STUDY STRATEGIES FOR PASSAGES 315
the task and the test questions by giving participants a free recall test in which all information recalled was scored equally (i.e., we did not assign differential value to central and peripheral information). Thus, the choice of comprehension questions was less likely to affect performance on the criterial test, but generating questions still did not produce a significant advantage over answering questions. There are two caveats to this conclusion. First, all participants were given a practice phase and thus had an idea of the types of questions they were expected to generate. In the absence of such information, stronger individual differences and/or differences between conditions may have emerged. Second, we only examined the effectiveness of self-testing as a technique for memorizing information. In other domains, such as those requiring original thought or creativity, self-generated testing may lead to bigger improvements than experimenter-led testing, because the ability to generate appropriate questions for self-testing is an inherent component of the task. We would, however, expect our findings to replicate in other situations involving factual information.

Two other issues were of import in the present article: that of performance predictions, and that of time on task. First, in order for the technique to be adopted by students, they need to be mindful of its advantages over the popular method of rereading, as they tend not to be when it comes to self-testing (e.g., Dunlosky & Nelson, 1992; Roediger & Karpicke, 2006b). We replicated Agarwal et al.'s (2008) finding that students do not predict any improvement on a later closed-book test from answering questions (i.e., taking an open-book self-test) as compared with rereading the text. However, participants did predict better performance in the condition in which they generated and then answered their own questions. This could have both positive and negative consequences. On the one hand, this task appears to promote better metacognition than answering preset questions, in that participants recognize its superiority over rereading. So, whereas they do not seem to be aware of the benefits of answering questions generated by someone else, they do seem to recognize the advantage of generating their own questions.

A likely reason for the difference in predictions between the generate questions condition and the other two tasks is the large difference in time taken to complete the tasks, with much greater time used for question generation. There were no time limits set for the completion of the encoding tasks, so that the time taken on each task in the absence of constraint could be measured. Whereas rereading and answering comprehension questions took roughly the same amount of time, generating and answering one's own questions was a far more time-consuming endeavor. Generating and answering questions turned out to be a time-consuming task both because of the additional effort required to search out appropriate information to write questions about and because of the amount of additional writing involved as compared with answering preset cued-recall questions. It is possible that this difference in time investment, rather than an awareness of the benefits of the task, drove students' predictions of later recall. The discrepancy in the amount of time taken to complete each task raises issues with regard to the efficiency of the tasks. Whereas the answer questions and generate questions conditions produced equivalent performance, the former was far more time-efficient. Clearly, if testing materials are available, students should opt to use them to maximize the efficiency of their time spent studying. However, in the absence of such materials, generating and answering questions is a viable alternative to rereading that appears to be just as beneficial to later retention, even though it is much more time consuming.

References

Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger, H. L., & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861–876.
Ben-Chaim, D., & Zoller, U. (1997). Examination-type preferences of secondary school students and their teachers in the science disciplines. Instructional Science, 25, 347–367.
Butler, A. C., Marsh, E. J., & Roediger, H. L., III. (2005, May). Distractor … the annual meeting of the Midwestern Psychological Society, Chicago.
Butler, A. C., & Roediger, H. L. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514–527.
Callender, A. A., & McDaniel, M. A. (2009a). The limited benefits of rereading educational texts. Contemporary Educational Psychology, 34, 30–41.
Callender, A. A., & McDaniel, M. A. (2009b, November). Self-generated questions and the testing effect. Poster presented at the annual meeting of the Psychonomic Society, Boston.
Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1563–1569.
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34, 268–276.
Carpenter, S. K., Pashler, H., & Vul, E. (2006). What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review, 13, 826–830.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 632–642.
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374–380.
Gardiner, J. M., Craik, F. I. M., & Bleasdale, F. A. (1973). Retrieval difficulty and subsequent recall. Memory & Cognition, 1, 213–216.
Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning and Verbal Behavior, 17, 649–667.
Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modulate the effect of testing on memory retention. European Journal of Cognitive Psychology, 19, 528–558.
Karpicke, J. D., Butler, A. C., & Roediger, H. L. (2009). Metacognitive strategies in student learning: Do students practice retrieval when they study on their own? Memory, 17, 471–479.
Karpicke, J. D., & Roediger, H. L. (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704–719.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science, 319, 966–968.
Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. (2004). Predicting one's own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643–656.
Kornell, N., & Son, L. K. (2009). Learners' choices and beliefs about self-testing. Memory, 17, 493–501.
Martin, M. A. (1985). Students' applications of self-questioning study techniques: An investigation of their efficacy. Reading Psychology, 6, 69–83.
McDaniel, M. A., & Bugg, J. M. (2008). Instability in memory phenomena: A common puzzle and a unifying explanation. Psychonomic Bulletin & Review, 15, 237–255.
McDaniel, M. A., Howard, D., & Einstein, G. O. (2009). The read-recite-review study strategy: Effective and portable. Psychological Science, 20, 516–522.
McDaniel, M. A., Roediger, H. L., & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200–206.
Nestojko, J. F., Bui, D. C., Kornell, N., & Bjork, E. L. (2009, November). Preparing to teach improves the processing and retention of information. Poster presented at the annual meeting of the Psychonomic Society, Boston.
Nungester, R. J., & Duchastel, P. C. (1982). Testing versus review: Effects on retention. Journal of Educational Psychology, 74, 18–22.
Robinson, F. P. (1970). Effective study (4th ed.). New York: Harper & Row.
Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181–210.
Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249–255.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning & Memory, 4, 592–604.

Received February 25, 2010
Revision received June 10, 2010
Accepted July 2, 2010