Mixing Grammar Exercises Facilitates Long-Term Retention: Effects of Blocking, Interleaving, and Increasing Practice
Cognitive psychology research has shown that interleaving, wherein learners practice multiple skills or
concepts at once, facilitates learning more than does blocking, wherein learners practice only one skill
or concept at a time. Despite the advantage of interleaving over blocking observed across a number
of domains, limited attention has been devoted to the effects of interleaving on second language (L2)
learning. This study examined the effects of blocking and interleaving on L2 grammar learning. In this
study, 115 Japanese learners studied 5 English grammatical structures under 1 of 3 conditions: blocking,
interleaving, and increasing (i.e., blocking followed by interleaving). Learning was measured using a
grammaticality judgment test administered immediately and 1 week after the treatment. Although
interleaving led to the highest number of incorrect responses during training, it was more effective than
blocking in the 1-week delayed posttest. These results indicate that the advantage of interleaving extends
to L2 grammar learning. Furthermore, learners’ levels of prior knowledge were found to moderate the
effects of interleaving. Specifically, participants with lower pretest scores benefited more from
interleaving compared to those with higher pretest scores. Pedagogically, the findings suggest that grammar
learning may be enhanced by incorporating interleaved practice.
Keywords: interleaving; blocking; increasing practice; grammar acquisition; contextual interference;
practice distribution
FIGURE 2
Example of a Blocked Schedule With Temporal Spacing
A f f A f f A B f f B f f B C f f C f f C
Note. The direction reads, “Choose the most appropriate word or phrase from the four options to complete the
sentence.” The Japanese translation for the sentence I bought a car for my daughter last Christmas is given above the
English sentence. The correct answer is “bought.”
FIGURE 4
Example of Feedback Given After a Learner’s Response
Note. The metalinguistic explanation reads, “The past tense is used to refer to an event that happened at a particular
point in the past.”
three verb tenses were always used as incorrect options. For instance, for the questions targeting the simple past tense (e.g., bought), three incorrect options were always the simple present (e.g., “buy”), present perfect (e.g., “have bought”), and future tense (e.g., “will buy”) of the target verb. To reflect real-life study situations, the treatment was self-paced and participants were able to spend as much time as they needed on each question. Since the study time was not controlled, it was treated as a covariate in the data analysis (see Results). After each response, the correct answer and metalinguistic explanation of the target structure were provided as feedback for 12 seconds (Figure 4; also see Online Supplement A for details).
Note that the treatment used in this study (multiple-choice fill-in-the-blank questions) is
Tatsuya Nakata and Yuichi Suzuki 9
categorized as a type of “instruction that expects learners to focus on forms in isolation” (Norris & Ortega, 2000, p. 420). This type of controlled practice alone is insufficient for L2 acquisition (Ellis & Shintani, 2014). Nonetheless, it was chosen for the present study for three reasons. First, although controlled practice is not sufficient in isolation, it can be an efficient technique for acquiring explicit, declarative knowledge about target structures. This kind of knowledge can serve as a basis for developing procedural and automatic knowledge (DeKeyser, 2015), which allows learners to use the L2 more fluently and efficiently. Second, controlled practice including fill-in-the-blank questions is still used widely in L2 English textbooks (Nitta & Gardner, 2005) and in many English-as-a-foreign-language contexts. Third, using controlled practice enables experimenters to have strict control over the treatment and manipulate the practice schedule relatively easily.
Although all three groups answered the same 50 multiple-choice questions during the treatment, the item order was different in each case. In the blocked group, the 50 questions were blocked by category. Specifically, for approximately half of the participants in this group (18 out of 39), the 50 questions were arranged as follows: 10 simple past → 10 present perfect → 10 first conditional → 10 second conditional → 10 third conditional (see Figure 5, top). For the remaining participants (21 out of 39), the 50 questions were arranged as follows: 10 first conditional → 10 second conditional → 10 third conditional → 10 simple past → 10 present perfect. These two item orders were used to minimize the potential order effects. In both cases, questions on the simple past were immediately followed by those on the present perfect (10 simple past → 10 present perfect). This was done to ensure that questions related to the simple past and present perfect, which are similar and somewhat confusable, would not be interleaved by questions pertaining to the conditionals. For the same reason, questions on the 3 types of conditionals were presented in sequence (10 first conditional → 10 second conditional → 10 third conditional) in both cases.
In the interleaved group, questions from five categories were mixed, such as simple past, present perfect, first conditional, second conditional, third conditional, second conditional, simple past, first conditional, present perfect, third conditional, etc. Questions from the same grammatical category never appeared twice in a row (see Figure 5, middle). In the increasing group, the first 25 questions were blocked by category as in the blocked group. For the remaining 25 questions, questions related to the aforementioned 5 grammatical categories were mixed as in the interleaved group (see Figure 5, bottom).
Pretest and Posttest. The grammaticality judgment test was administered as the pretest and posttest (immediate and 1-week delayed). The grammaticality judgment test was chosen as a dependent measure because it is a commonly used tool for assessing L2 linguistic knowledge (Gass, 2018). Participants were presented with 40 sentences sequentially and were instructed to press the left arrow on the keyboard for a grammatical sentence and the right arrow for an ungrammatical sentence. The test targeted declarative-explicit knowledge of grammatical features developed via controlled practice in rule identification and form comparison. Although the grammaticality judgment test was not timed, the participants were instructed to respond as quickly as possible. The 40 items consisted of eight items from each of the five target structures. Half of the items were grammatical (e.g., “My father came to my school last week”), and the other half contained errors related to the verb tense and were ungrammatical (e.g., * “Mary has bought a nice bag three days ago”). To ensure that participants would not judge the sentence as grammatically incorrect for reasons other than the verb tense, they were informed that no sentences would contain errors unrelated to the verb tense. At the beginning of the test, four practice items were given to familiarize the participants with the test format. The four practice items did not involve any of the target structures (e.g., “He plays basketball well” or * “This is mine pen”) to eliminate the potential influence on student performance on the critical items.
To reduce a potential practice effect, three forms of the test that differed in terms of the noun phrase and adverb phrase were used, as illustrated by the examples in (1).
(1) Form A: Daniel built a house last September.
    Form B: Paul built a house last October.
    Form C: John built a house last summer.
The test items appeared in a different randomized order in each form. To control for the test form effects, the administration of the three forms was counterbalanced across participants. Specifically, a subgroup of participants (20 out of 115) had Form A for the pretest, Form B for the immediate posttest, and Form C for the delayed posttest, while another subgroup (18 out of 115) had Form C for the pretest, Form A for the
10 The Modern Language Journal 00 (2019)
FIGURE 5
Sample Item Orders in the Blocked, Interleaved, and Increasing Conditions
Blocked condition
1. Simple past 2. Simple past 3. Simple past 4. Simple past 5. Simple past
6. Simple past 7. Simple past 8. Simple past 9. Simple past 10. Simple past
11. Present perfect 12. Present perfect 13. Present perfect 14. Present perfect 15. Present perfect
16. Present perfect 17. Present perfect 18. Present perfect 19. Present perfect 20. Present perfect
21. 1st conditional 22. 1st conditional 23. 1st conditional 24. 1st conditional 25. 1st conditional
26. 1st conditional 27. 1st conditional 28. 1st conditional 29. 1st conditional 30. 1st conditional
31. 2nd conditional 32. 2nd conditional 33. 2nd conditional 34. 2nd conditional 35. 2nd conditional
36. 2nd conditional 37. 2nd conditional 38. 2nd conditional 39. 2nd conditional 40. 2nd conditional
41. 3rd conditional 42. 3rd conditional 43. 3rd conditional 44. 3rd conditional 45. 3rd conditional
46. 3rd conditional 47. 3rd conditional 48. 3rd conditional 49. 3rd conditional 50. 3rd conditional
Interleaved condition
1. Simple past 2. Present perfect 3. 1st conditional 4. 2nd conditional 5. 3rd conditional
6. 2nd conditional 7. Simple past 8. 1st conditional 9. Present perfect 10. 3rd conditional
11. Simple past 12. 2nd conditional 13. 1st conditional 14. Present perfect 15. 3rd conditional
16. Simple past 17. Present perfect 18. 3rd conditional 19. 2nd conditional 20. 1st conditional
21. Present perfect 22. Simple past 23. 3rd conditional 24. 2nd conditional 25. 1st conditional
26. 3rd conditional 27. 2nd conditional 28. Simple past 29. 1st conditional 30. Present perfect
31. Simple past 32. 1st conditional 33. 2nd conditional 34. Present perfect 35. 3rd conditional
36. Present perfect 37. 3rd conditional 38. Simple past 39. 2nd conditional 40. 1st conditional
41. Simple past 42. 1st conditional 43. Present perfect 44. 3rd conditional 45. 2nd conditional
46. Simple past 47. Present perfect 48. 1st conditional 49. 2nd conditional 50. 3rd conditional
Increasing condition
1. Simple past 2. Simple past 3. Simple past 4. Simple past 5. Simple past
6. Present perfect 7. Present perfect 8. Present perfect 9. Present perfect 10. Present perfect
11. 1st conditional 12. 1st conditional 13. 1st conditional 14. 1st conditional 15. 1st conditional
16. 2nd conditional 17. 2nd conditional 18. 2nd conditional 19. 2nd conditional 20. 2nd conditional
21. 3rd conditional 22. 3rd conditional 23. 3rd conditional 24. 3rd conditional 25. 3rd conditional
26. 3rd conditional 27. 2nd conditional 28. Simple past 29. 1st conditional 30. Present perfect
31. Simple past 32. 1st conditional 33. 2nd conditional 34. Present perfect 35. 3rd conditional
36. Present perfect 37. 3rd conditional 38. Simple past 39. 2nd conditional 40. 1st conditional
41. Simple past 42. 1st conditional 43. Present perfect 44. 3rd conditional 45. 2nd conditional
46. Simple past 47. Present perfect 48. 1st conditional 49. 2nd conditional 50. 3rd conditional
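The three item orders in Figure 5 can be approximated with a short Python sketch. It is an illustration only, not the authors' procedure: the function names are hypothetical, and the interleaved order is built from shuffled rounds of the five categories (a pattern visible in the sample order of Figure 5 but not stated as a rule in the text). The only constraint taken directly from the article is that the same category never appears twice in a row in interleaved practice.

```python
import random

# Category labels as they appear in Figure 5.
CATEGORIES = ["simple past", "present perfect", "1st conditional",
              "2nd conditional", "3rd conditional"]


def blocked(per_category=10):
    """All items of one category, then all items of the next, and so on."""
    return [c for c in CATEGORIES for _ in range(per_category)]


def interleaved(rounds=10, rng=None):
    """Shuffled rounds of the five categories with no immediate repeats.

    Assumption: each round contains every category exactly once, and a
    round is redrawn if it would start with the category that ended the
    previous round.
    """
    if rng is None:
        rng = random.Random()
    order, previous = [], None
    for _ in range(rounds):
        while True:
            candidate = rng.sample(CATEGORIES, len(CATEGORIES))
            if candidate[0] != previous:
                break
        order.extend(candidate)
        previous = candidate[-1]
    return order


def increasing(rng=None):
    """First 25 items blocked (5 per category), last 25 items interleaved."""
    return blocked(per_category=5) + interleaved(rounds=5, rng=rng)


if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed so the sketch is repeatable
    for name, seq in [("blocked", blocked()),
                      ("interleaved", interleaved(rng=rng)),
                      ("increasing", increasing(rng=rng))]:
        print(name, seq[:7], "...")
```

The sketch produces only the first of the two counterbalanced blocked orders described in the text. As in Figure 5, it does not prevent the last blocked item and the first interleaved item of the increasing order from sharing a category.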
immediate posttest, and Form B for the delayed posttest, etc.
The reliability of the grammaticality judgment test indexed by Cronbach alpha was .688 (pretest), .851 (immediate), and .817 (delayed). As the pretest reliability is slightly lower than .70, it requires some caution in interpreting the results, whereas the two posttests showed moderate reliability (Brown, 2014).
Procedure
The study was conducted during two regular classes. The participants received explanations about the study before taking part in the pretest. In the pretest, participants were presented with 40 English sentences one by one and were asked to judge whether each sentence was grammatically correct or not. The pretest was followed by the treatment, where the participants answered 50 multiple-choice questions and studied 5 target grammatical structures. After the treatment, the participants were required to answer 10 two-digit additions (e.g., 17 + 75 = ?) as a filler task. This task was included to minimize the potential order effects. Following the filler task, the participants took the grammaticality judgment test as the immediate posttest. After the immediate posttest, participants were asked to evaluate the effectiveness of the treatment for learning the target structures on a 7-point Likert scale, where 1 meant not effective at all and 7 corresponded to very effective. Although the participants were not specifically asked to focus on item distribution, as only learning schedules differed across the three groups, they would be the source of any potential differences among these groups. One week after the immediate posttest, an unannounced delayed posttest was given.
Analysis
Responses to the grammaticality judgment tests were analyzed in terms of accuracy rate (proportion of correct responses) and d-prime score. d-prime scores provide an unbiased index of grammatical sensitivity, namely the ability to distinguish grammatical from ungrammatical items. They are considered a more sensitive index than accuracy rates, because they take response bias into account (Macmillan & Creelman, 2004). Specifically, d-prime scores are calculated by subtracting the z-score of false alarm rate (e.g., responding “yes” to ungrammatical items) from the z-score related to hit rate (responding “yes” to grammatical items). For instance, the grammaticality judgment test used in this study consisted of 20 grammatical items and 20 ungrammatical items. Suppose Person A answered 16 grammatical items correctly and 16 ungrammatical items correctly, while Person B answered 14 grammatical items correctly and 18 ungrammatical items correctly. Both individuals successfully answered 32 items out of 40, and their accuracy rate is 80%. However, Person A rated more ungrammatical items as correct (4) than did Person B (2). The d-prime score takes this type of response bias into account by penalizing participants who have a high false alarm rate. As a result, the d-prime score will be higher for Person B (1.81) than Person A (1.68).
RESULTS
Performance During Treatment
During the treatment, the participants answered 50 multiple-choice questions, comprising 10 questions for each of the 5 target grammatical structures. Figure 6 illustrates the proportion of correct responses as a function of question position during the treatment when collapsed across the five grammatical structures. For instance, Figure 6 shows that, in the blocked group, the average accuracy rate was 44.10% for the first question in each grammatical structure. It also shows different learning curves for the three groups. First, in the blocked group, although the accuracy rate initially increased as a function of question position, there were no measurable gains in the latter half of the treatment. Second, in the interleaved group, the accuracy rate continued to increase toward the end of the treatment, in contrast to the blocked group. Third, for the first half of the questions, the learning curve for the increasing group was similar to that of the blocked group. This finding can be attributed to the identical practice schedule for the first five questions in each grammatical structure for the blocked and increasing groups (see Figure 5). The accuracy rate of the increasing group, however, decreased when responding to the latter half of the questions, where the questions were interleaved rather than blocked (see Online Supplement 2 for detailed statistical analyses).
When collapsed across all 50 questions, the average accuracy rate was 87.23% (95% CI [84.33%, 90.13%], SD = 8.94%) for the blocked group, 78.83% (95% CI [74.76%, 82.90%], SD = 12.03%) for the increasing group, and 77.00% (95% CI [72.21%, 81.79%], SD = 14.98%) for the interleaved group. Results yielded by a one-way
FIGURE 6
Proportion of Correct Responses During the Treatment as a Function of Question Position
ANOVA revealed significant differences among the three groups, F(2,114) = 7.72, p = .001, η² = .121. The Bonferroni method of multiple comparisons showed that the blocked practice led to significantly better performance than increasing (p = .010, d = 0.80) and interleaved practice (p = .001, d = 0.83). The effect sizes (d) are considered medium according to the guidelines (small: d = 0.4; medium: d = 0.7; large: d = 1.0) proposed by Plonsky and Oswald (2014). However, no significant difference was found between the increasing and interleaved groups (p = .792), and only a very small effect size was noted (d = 0.13).
Since participants were allowed to spend as much time as they needed on answering the multiple-choice questions, the treatment duration was different for each participant. The average treatment duration (with standard deviations given in parentheses) was 17.56 (3.84), 18.16 (1.98), and 20.26 (4.48) minutes for the blocked, increasing, and interleaved groups, respectively. Due to the statistically significant difference in the treatment duration among the three groups, F(2,112) = 6.01, p = .003, treatment duration was modeled as a covariate in the subsequent analyses.
Performance on the Pretest and Posttests
Table 1 presents accuracy rates and d-prime scores of the grammaticality judgment tests. According to the Shapiro–Wilk test, all d-prime scores were normally distributed (ps > .05). A mixed ANCOVA was conducted on the d-prime scores with condition (blocked, increasing, and interleaved) as a between-participant variable and time (immediate and delayed posttest) as a within-participant variable. The covariates were d-prime pretest scores, treatment duration (log transformed, in order to reduce skewness), and jMET scores. The results of the mixed ANCOVA revealed no significant main effect of condition (see Table 2). However, a marginally significant interaction between time and condition was found, indicating that the effects of condition varied depending on the timing of posttests (immediate or delayed). Furthermore, the three-way interaction among time, condition, and pretest was also marginally significant, suggesting that the pretest scores may further moderate the interaction between time and condition.²
To probe the interaction between time and condition further, two univariate ANCOVAs were conducted separately for the immediate and delayed posttests. For the immediate posttest, neither the main effect of condition nor the interaction between condition and pretest was significant. Table 1 and Figure 7 show the adjusted means of d-prime scores for the three conditions. Adjusted d-prime scores refer to d-prime scores adjusted for the following three covariates: d-prime pretest score, treatment duration, and jMET score. The interleaved (M = 2.14) and increasing conditions (M = 2.12) yielded higher adjusted d-prime scores than the blocked condition (M = 1.73) at the descriptive level. However, the 95% confidence intervals overlapped among the three conditions, producing small effect sizes (0.02 ≤ d ≤ 0.51).
In contrast, on the delayed posttest, the main effect of condition was significant, F(2,107) = 4.512, p = .013, ηp² = .078. As shown in Table 1 and Figure 7, the adjusted mean of d-prime
TABLE 1
Accuracy and d-Prime Scores on the Grammaticality Judgment Tests
                    Pretest          Immediate posttest   Delayed posttest
                    M       SD       M       SD           M       SD
Accuracy (%)
  Blocked           70.51   11.21    80.51   13.51        78.53   13.00
  Increasing        70.90   12.72    83.54   12.81        80.14   14.64
  Interleaved       67.88   13.06    80.00   17.02        79.63   14.75
d-Prime
  Blocked           1.17    0.72     1.87    0.97         1.76    0.97
  Increasing        1.16    0.77     2.18    1.04         1.92    1.09
  Interleaved       1.04    0.84     1.95    1.21         1.89    1.09
Adjusted d-Prime
  Blocked                            1.73    0.80         1.60    0.72
  Increasing                         2.12    0.77         1.86    0.70
  Interleaved                        2.14    0.80         2.07    0.73
Note. M = mean; SD = standard deviation. n = 39, 36, and 40 for the blocked, increasing, and interleaved groups,
respectively. Adjusted d-prime scores refer to d-prime scores adjusted for the following three covariates: d-prime pretest
score, treatment duration, and junior Minimal English Test (jMET) score.
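The d-prime scores in Table 1 follow the definition given in the Analysis section: the z-score of the hit rate minus the z-score of the false alarm rate. The Python sketch below is a minimal illustration of that definition, assuming the standard normal quantile function as the z-transform. The function name `d_prime` is hypothetical, and the article does not say how hit or false alarm rates of exactly 0 or 1 were handled, so no correction is applied here.

```python
from statistics import NormalDist


def d_prime(hits, n_grammatical, false_alarms, n_ungrammatical):
    """d' = z(hit rate) - z(false alarm rate).

    hits: grammatical items judged grammatical.
    false_alarms: ungrammatical items judged grammatical.
    Rates of exactly 0 or 1 raise an error, because this sketch applies
    no correction for extreme rates (the article does not describe one).
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    hit_rate = hits / n_grammatical
    false_alarm_rate = false_alarms / n_ungrammatical
    return z(hit_rate) - z(false_alarm_rate)


# Worked example from the Analysis section (20 grammatical, 20 ungrammatical items).
# Person A: 16 hits; 16 ungrammatical items correct, so 4 false alarms.
# Person B: 14 hits; 18 ungrammatical items correct, so 2 false alarms.
person_a = d_prime(16, 20, 4, 20)
person_b = d_prime(14, 20, 2, 20)
print(round(person_a, 2), round(person_b, 2))  # prints: 1.68 1.81
```

Both hypothetical respondents have 80% accuracy, yet Person B receives the higher d-prime because of the lower false alarm rate, which matches the values given in the text.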
TABLE 2
Results of Mixed ANCOVA on the Grammaticality Judgment Tests
Effect df MS F p ηp²
score was the highest for the interleaved group (M = 2.07), followed by the increasing (M = 1.86) and blocked groups (M = 1.60). The Bonferroni method of multiple comparisons showed that the interleaved group significantly outperformed the blocked group (p = .021), and a close-to-medium effect size was observed (d = 0.64). However, there were no statistically significant differences between the blocked and the increasing group (p = .339, d = 0.37) or between the increasing and the interleaved group (p = .667, d = 0.28), yielding small effect sizes. Furthermore, the interaction between condition and pretest was marginally significant on the delayed posttest, F(2,107) = 2.542, p = .083, ηp² = .045. This interaction was examined further to establish whether the benefit of interleaving differed depending on the pretest scores. This part of the analysis is more exploratory in nature, and the results should be interpreted with caution. After segregating the effects of treatment duration and jMET score, correlations (partial correlations) between the pretest score and the absolute score gain from the pretest to the delayed posttest were computed. No significant correlation was found in the blocked (r = −.103, p = .543) or increasing group (r = .192, p = .276). However, the correlation was negative and moderate in the interleaved group (r = −.420, p = .009). This result suggests that the participants with lower pretest scores benefited
more from interleaved practice compared to those with higher pretest scores.
FIGURE 7
Adjusted Mean of d-Prime Scores on the Immediate and Delayed Posttests
Note. Error bars indicate 95% confidence intervals. Mean d-prime scores were adjusted for the following three covariates: d-prime pretest score, treatment duration, and junior Minimal English Test (jMET) score.
Judgments of Learning
After the immediate posttest, participants were asked to evaluate the treatment effectiveness for learning the target structures on a 7-point scale, anchored at 1 = not effective at all and 7 = very effective. The average rating was 5.31 (SD = 1.28, 95% CI [4.95, 5.67]), 5.61 (SD = 1.20, 95% CI [5.23, 5.99]), and 5.63 (SD = 1.21, 95% CI [5.27, 5.98]) for the blocked, increasing, and interleaved groups, respectively. No significant differences were found among the ratings given by the three groups, F(2,112) = 0.820, p = .443, η² = .014. These findings suggest that the participants considered the three practice schedules equally effective.
DISCUSSION
The aim of the current study was to establish whether interleaving would facilitate L2 grammar learning more than blocking (RQ1). Although no statistically significant difference was found between the interleaved and blocked groups on the immediate posttest, the interleaved group significantly outperformed the blocked group on the 1-week delayed posttest. These findings are consistent with those reported in non-L2 research (see Literature Review section) but are somewhat inconsistent with those obtained in the L2 grammar study conducted by Pan et al. (2018). Although these researchers found benefits of interleaving over blocking when the treatment was conducted across multiple sessions (Experiments 3 and 4), no significant difference emerged between blocking and interleaving when the treatment was delivered during a single session (Experiments 1 and 2). The results obtained in the present study, however, indicate the presence of the interleaving effect, even though the treatment was conducted in a single-session format. The advantage of interleaved practice observed in this study may be partly attributed to the relatively high level of learners’ prior knowledge. Although participants in Pan et al.’s (2018) study had no prior knowledge of the target grammatical structures (preterite and imperfect past tenses in Spanish), the average pretest score in this study was 69.76% (d-prime = 1.12), which is above the chance level of 50%. Earlier research suggests that interleaving is especially effective for experienced learners (e.g., Guadagnoli et al., 1999; Rey et al., 1982). As a result, the benefits of interleaving were perhaps more pronounced in this study.
In an L2 pronunciation study, Carpenter and Mueller (2013) found that blocked practice led to better retention than interleaved practice. The present study, in contrast, showed that interleaving was more beneficial than blocking. The incongruence in the reported findings may be partially due to three factors. First, the target grammatical structures used in this study were somewhat similar to each other (e.g., three types of conditionals), while the French ortho-phonological rules used by Carpenter and Mueller were not (e.g., eau, s, er). In other words, the target features in this study had lower between-category discriminability than those employed by Carpenter and Mueller (2013). Previous research suggests that interleaving tends to be beneficial for target features with low between-category discriminability because, according to the Discriminative Contrast Hypothesis, by mixing exemplars from different categories, interleaving helps learners to distinguish between similar categories (Kang & Pashler, 2012). This may be one of the reasons behind the advantage of interleaving found in this study, but not in the study conducted by Carpenter and Mueller. Second, while participants in Carpenter and Mueller’s study had no prior knowledge of the target pronunciation rules, the participants in this study had a relatively high level of prior knowledge. As a result, the benefits of interleaving were perhaps observed in this study,
whereas blocking was more effective than interleaving in Carpenter and Mueller’s study. Third, whereas Carpenter and Mueller examined the learning of form–form connections (i.e., spelling and sound), the present investigation focused on the learning of form–meaning connections for morphosyntactic features. The difference in the nature of the target features might also be responsible for the inconsistent results.
In addition to the Discriminative Contrast Hypothesis, the benefits of interleaving in this study might also be explained by the spacing effect. While the questions targeting a particular grammatical category were studied sequentially in the blocked condition (which corresponds to massed learning), questions from a particular grammatical structure were separated by those pertaining to other structures in the interleaved condition (which corresponds to spaced learning). Because spaced learning leads to better long-term retention than would be achieved by massed learning (spacing effect), interleaving was perhaps more effective than blocking in this study. However, some researchers have failed to find the advantage of interleaving over blocking, despite the predictions of the spacing effect (e.g., Carpenter & Mueller, 2013). The current study findings suggest that the effects of the moderating factors, such as the ones mentioned previously (between-category discriminability and prior knowledge), may sometimes outweigh the benefits of spacing.
The second research question addressed the extent to which increasing practice would facilitate L2 grammar learning as opposed to blocked or interleaved practice alone. We hypothesized that increasing practice would facilitate L2 grammar learning more than blocking or interleaving would, for two reasons. First, a combination of blocking and interleaving is expected to facilitate learning because it may allow learners to detect the commonalities within each category while at the same time helping them distinguish among different categories. Second, increasing practice may be beneficial for learning because difficulty is increased gradually by using blocking first and then interleaving. This strategy provides a continuous match between the proficiency level of the learner and task difficulty and helps introduce the appropriate level of difficulty throughout the treatment (Porter et al., 2007; Porter & Magill, 2010), which enhances retention, according to the desirable difficulty framework (Bjork, 1999). This hypothesis, however, was not supported by the study findings, as on both immediate and delayed posttests, the increasing group failed to outperform the blocked or interleaved group. At the same time, although the posttest scores achieved by the increasing and interleaved groups were not statistically significantly different, the former group required a shorter treatment duration (M = 18.16 minutes) than the latter (M = 20.26 minutes). These findings suggest that, while both treatments appear equally effective, the increasing format could potentially be more efficient.
One of the reasons why increasing practice was not very effective in this study might be a relatively high level of prior knowledge on the part of learners. The theoretical underpinning for the advantage of increasing practice is based on two assumptions. First, while interleaving tends to be effective for experienced learners, blocking is posited to benefit novices (e.g., Guadagnoli et al., 1999; Rey et al., 1982). Second, as training progresses, learners’ proficiency is expected to improve. These two assumptions suggest that, in the early stages of learning (when the proficiency level is low), blocking should be used, whereas interleaving should be introduced in the later stages of learning (when the proficiency level of the learner is relatively high). In the present study, however, since the participants had a relatively high level of prior knowledge, blocking in the early stage might not have been particularly effective, and it is possible that interleaving should have been used at the outset. This may be one of the factors that reduced the effectiveness of the increasing practice. This interpretation can be supported by the relative effectiveness of the three practice schedules used in this study. The interleaved condition, which included more interleaved questions (50) than the increasing condition (25), was the most effective on the delayed posttest, whereas the blocked condition, which had no interleaved questions, turned out to be the least effective. These findings suggest that the amount of interleaved practice might have been the key factor for facilitating grammar learning in this study because of the relatively high level of prior knowledge the participants possessed.
The third research question concerned the role of prior knowledge in the effects of blocked, interleaved, and increasing practice. Based on the extant research on motor skill acquisition, we hypothesized that blocking would be effective for learners with a low level of prior knowledge, whereas interleaving would be more beneficial for learners with a high level of prior knowledge. The current study findings revealed a marginally significant interaction between condition and pretest score on the delayed posttest, suggesting that prior knowledge may moderate the practice schedule effects. While no statistically significant
correlation was found between the pretest score and the score gain on the delayed posttest in the blocked (r = −.103) or increasing group (r = .192), a statistically significant, moderate, and negative correlation was detected in the interleaved group (r = −.420).

These findings suggest that prior knowledge might play a role especially in interleaved practice. Contrary to our prediction, the participants with lower pretest scores benefited more from interleaving when compared to those with higher pretest scores. This finding may seem somewhat inconsistent with the results reported by other researchers indicating that interleaving is especially effective for higher-level learners, whereas blocking is sometimes beneficial for novices (e.g., Guadagnoli et al., 1999; Rey et al., 1982). However, recall that the participants with lower pretest scores in this study were not complete novices and can be regarded as relatively advanced learners. Consequently, the findings of this study may not necessarily be at odds with those yielded by prior research. The participants with lower pretest scores benefited from interleaving possibly because the level of difficulty was appropriate for them (Bjork, 1999). The participants with higher pretest scores, however, might have found interleaved practice relatively easy, which resulted in smaller benefits of interleaved practice. This is probably the reason behind the negative correlation between the pretest scores and score gains for the interleaved group. Note that, because the level of prior knowledge can only be assessed as high or low in relative terms, this interpretation is only speculative, and more rigorous research is needed to scrutinize the role of prior knowledge in the interleaving effects.

The correlation between the pretest scores and absolute score gains was not significant in the blocked group possibly because blocking was not challenging enough for most participants (the accuracy rate during learning reached a plateau in the blocked condition, as shown in Figure 6 and Online Supplement 2). Blocked practice is deemed less demanding than interleaved practice because it allows multiple practice items from the same grammatical category to be presented in sequence. This might have allowed learners who took part in the present study to induce grammatical rules regardless of their L2 proficiency, neutralizing the effects of prior knowledge. This interpretation is consistent with L2 acquisition and educational psychology research demonstrating that individual difference factors, such as prior knowledge and aptitude, exert little influence under a treatment with low information processing demand (DeKeyser, 2013; Sanz et al., 2016). Similarly, no significant correlation was found in the increasing group possibly because increasing practice, where the first half of the questions were blocked by category and learning difficulty was gradually increased, helped lessen the learning burden and neutralized the role of prior knowledge.

CONCLUDING REMARKS

The findings yielded by the present study suggest that the benefits of interleaving for L2 grammar learning observed in earlier research (Pan et al., 2018) may extend to the learning of partially familiar grammatical structures. Pedagogically, the findings suggest that grammar learning may be enhanced by incorporating interleaved practice. These benefits can be derived by, for example, including multiple features that the learners have studied in the past in a language-focused task (e.g., a dictogloss task that contains mixed use of the simple past and present perfect; see Wajnryb, 1990). Both the present investigation and Pan et al.’s recent study demonstrated the advantage of interleaved practice over blocked practice, suggesting that the interleaving effect on grammar learning is reliable. Furthermore, the magnitude of effect sizes found in the present study (d = 0.64) as well as those reported by Pan et al. (2018) for Experiments 3 and 4 (0.53 ≤ d ≤ 0.79) indicates that incorporating interleaving into the curriculum is desirable (Hattie, 2008) for grammar learning.

A questionnaire administered after the immediate posttest showed that the learners considered blocking as effective as interleaving, although the latter schedule led to better long-term retention. These findings highlight the importance of raising awareness about the effects of interleaved practice. The participants in the present study were unaware of the benefits of interleaving potentially because it led to a lower proportion of correct responses during learning (M = 77.00%) compared with blocking (M = 87.23%). Since learners tend to assess long-term retention based on performance during the learning phase (e.g., Bjork, 1999), the participants in the interleaved group perhaps felt that learning was not progressing smoothly, potentially resulting in judgments of learning that were similar to those of the blocked group. The results reported here also demonstrate the value of highlighting that conditions that initially confuse learners and induce a low level of performance during training can be beneficial over time, whereas conditions
Tatsuya Nakata and Yuichi Suzuki 17
that increase learning phase performance can be harmful in the long term (desirable difficulty framework; Bjork, 1999).

Although the findings presented in this work are valuable, this study is not without limitations. For instance, the present findings suggested that learners’ prior knowledge might interact with the interleaving effect. At the same time, it is possible that the interleaving effect is moderated by individual differences in other learner-related variables, such as working memory capacity or language analytic ability (e.g., Suzuki & DeKeyser, 2017b). Further research examining the effects of cognitive aptitudes on the interleaving effect would thus be a useful follow-up to this study.

Another limitation stems from the rather narrow concept of practice. The treatment in this study comprised solely multiple-choice fill-in-the-blank questions, which is a form of controlled grammar practice. Although the use of controlled practice may offer some benefits (e.g., it facilitates the acquisition of explicit, declarative knowledge, while allowing experimenters to have strict control over the treatment), it is not sufficient for L2 acquisition (e.g., Ellis & Shintani, 2014). Future researchers should thus investigate the interleaving effects as a part of less controlled grammar practice, such as picture description tasks. Moreover, in this study, an untimed grammaticality judgment test was employed as the posttest that measured only declarative-explicit knowledge. In future research, it may be useful to employ posttest measures that investigate learners’ ability to use target structures fluently, such as oral elicited imitation tasks (Suzuki & Sunada, 2018).

Furthermore, the practice schedule was confounded with spacing in the current study. Specifically, while interleaved practice corresponded to spaced learning, blocked practice corresponded to massed learning. Although this design reflects authentic learning, separating the effects of interleaving and spacing would allow us to better understand the mechanisms of interleaving effects (e.g., Kang & Pashler, 2012; Taylor & Rohrer, 2010). Further research isolating the effects of interleaving and spacing for L2 grammar learning is thus warranted.

Cognitive psychology research has long demonstrated that learning can be increased significantly by manipulating the practice schedule through spacing and interleaving. Although there has been a growing interest in the effects of spacing on L2 learning in recent years (e.g., Bird, 2010; Muñoz, 2012; Nakata & Suzuki, 2019; Nakata & Webb, 2016; Rogers, 2017; Suzuki, 2017; Suzuki & DeKeyser, 2017a), the interleaving effects have received relatively little attention in extant research. The present study demonstrated that interleaving can potentially enhance L2 grammar acquisition. Because interleaving allows instructors and curriculum designers to significantly improve learning by simply rearranging practice questions, further investigations into the effects of interleaving on L2 development would be valuable for both researchers and practitioners.

ACKNOWLEDGMENTS

This research was supported in part by Grant-in-Aid for Young Scientists (A) (#16H05943) awarded to the first author from Japan Society for the Promotion of Science. The authors are very grateful to two anonymous reviewers and the editor, Marta Antón, for their invaluable advice. We extend our gratitude also to Tomohiro Tsuchiya for his cooperation with data collection.

NOTES

1 It is true that several researchers found massing superior to spacing, a phenomenon known as the Peterson paradox. However, this phenomenon has been observed under very limited conditions, that is, when spacing intervals are very short (4−8 seconds), and learning is measured after a less-than-8-second delay following the treatment (Cepeda et al., 2006).
2 These marginally significant interactions are worth exploring because the current study is one of the early attempts to examine the effects of interleaving on L2 grammar learning. Furthermore, the magnitude of effect sizes (.04 < ηp² < .05) was non-negligible, approaching medium size (Cohen, 1988).

REFERENCES

Bird, S. (2010). Effects of distributed practice on the acquisition of second language English syntax. Applied Psycholinguistics, 31, 635–650.
Birnbaum, M. S., Kornell, N., Bjork, E. L., & Bjork, R. A. (2013). Why interleaving enhances inductive learning: The roles of discrimination and retrieval. Memory & Cognition, 41, 392–402.
Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII: Cognitive regulation of performance: Interaction of theory and application (pp. 435–459). Cambridge, MA: The MIT Press.
Brown, J. D. (2014). Classical theory reliability. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 1165–1181). Oxford, UK: Wiley–Blackwell.
Carpenter, S. K., & Mueller, F. E. (2013). The effects of interleaving versus blocking on foreign language pronunciation learning. Memory & Cognition, 41, 671–682.
Carvalho, P. F., & Goldstone, R. L. (2014). Putting category learning in order: Category structure and temporal arrangement affect the benefit of interleaved over blocked study. Memory & Cognition, 42, 481–495.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132, 354–380.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
DeKeyser, R. (2007). Practice in a second language: Perspectives from applied linguistics and cognitive psychology. Cambridge: Cambridge University Press.
DeKeyser, R. (2013). Aptitude. In P. Robinson (Ed.), The Routledge encyclopedia of second language acquisition (pp. 27–31). New York: Routledge.
DeKeyser, R. (2015). Skill acquisition theory. In B. VanPatten & J. Williams (Eds.), Theories in second language acquisition: An introduction (pp. 94–112). New York: Routledge.
Eglington, L. G., & Kang, S. H. K. (2017). Interleaved presentation benefits science category learning. Journal of Applied Research in Memory and Cognition, 6, 475–485.
Ellis, R., & Shintani, N. (2014). Exploring language pedagogy through second language acquisition research. New York: Routledge.
Ferguson, G. (2001). If you pop over there: A corpus-based study of conditionals in medical discourse. English for Specific Purposes, 20, 61–82.
Finkbeiner, M., & Nicol, J. (2003). Semantic category effects in second language word learning. Applied Psycholinguistics, 24, 369–383.
Gass, S. (2018). SLA elicitation tasks. In A. Phakiti, P. De Costa, L. Plonsky, & S. Starfield (Eds.), The Palgrave handbook of applied linguistics research methodology (pp. 313–337). London: Palgrave Macmillan.
Goto, K., Maki, H., & Kasai, C. (2010). The Minimal English Test: A new method to measure English as a second language proficiency. Evaluation & Research in Education, 23, 91–104.
Guadagnoli, M. A., Holcomb, W. R., & Weber, T. J. (1999). The relationship between contextual interference effects and performer expertise on the learning of a putting task. Journal of Human Movement Studies, 37, 19–36.
Hattie, J. (2008). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York: Routledge.
Kang, S. H. (2016). The benefits of interleaved practice for learning. In J. C. Horvath, J. M. Lodge, & J. Hattie (Eds.), From the laboratory to the classroom: Translating science of learning for teachers (pp. 79–93). New York: Routledge.
Kang, S. H., & Pashler, H. (2012). Learning painting styles: Spacing is advantageous when it promotes discriminative contrast. Applied Cognitive Psychology, 26, 97–103.
Lyster, R., & Sato, M. (2013). Skill acquisition theory and the role of practice in L2 development. In M. P. García Mayo, J. Gutierrez–Mangado, & M. M. Adrián (Eds.), Contemporary approaches to second language acquisition (pp. 71–92). Philadelphia, PA/Amsterdam: John Benjamins.
Macmillan, N. A., & Creelman, C. D. (2004). Detection theory: A user’s guide (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Miles, S. W. (2014). Spaced vs. massed distribution instruction for L2 grammar learning. System, 42, 412–428.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
Muñoz, C. (2012). Intensive exposure experiences in second language learning. Bristol, UK: Multilingual Matters.
Nakata, T. (2015). Effects of expanding and equal spacing on second language vocabulary learning: Does gradually increasing spacing increase vocabulary learning? Studies in Second Language Acquisition, 37, 677–711.
Nakata, T., & Suzuki, Y. (2019). Effects of massing and spacing on the learning of semantically related and unrelated words. Studies in Second Language Acquisition, 41, 287–311.
Nakata, T., & Webb, S. (2016). Does studying vocabulary in smaller sets increase learning? The effects of part and whole learning on second language vocabulary acquisition. Studies in Second Language Acquisition, 38, 523–552.
Nitta, R., & Gardner, S. (2005). Consciousness-raising and practice in ELT coursebooks. ELT Journal, 59, 3–13.
Norris, J., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417–528.
Pan, S. C., Tajrana, J., Loveletta, J., Osuna, J., & Rickard, T. (2018). Does interleaved practice enhance foreign language learning? The effects of training schedule on Spanish verb conjugation skills. Journal of Educational Psychology. Advance online publication. https://doi.org/10.1037/edu0000336
Plonsky, L., & Oswald, F. L. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64, 878–912.
Porter, J. M., Landin, D., Hebert, E. P., & Baum, B. (2007). The effects of three levels of contextual interference on performance outcomes and movement patterns in golf skills. International Journal of Sports Science & Coaching, 2, 243–255.
Porter, J. M., & Magill, R. A. (2010). Systematically increasing contextual interference is beneficial for learning sport skills. Journal of Sports Sciences, 28, 1277–1285.
Rey, P. D., Wughalter, E. H., & Whitehurst, M. (1982). The effects of contextual interference on females with varied experience in open sport skills. Research Quarterly for Exercise and Sport, 53, 108–115.
Rogers, J. (2017). The spacing effect and its relevance to second language acquisition. Applied Linguistics, 38, 906–911.
Rohrer, D., & Taylor, K. M. (2007). The shuffling of mathematics problems improves learning. Instructional Science, 35, 481–498.
Sanz, C., Lin, H.–J., Lado, B., Stafford, C. A., & Bowden, H. W. (2016). One size fits all? Learning conditions and working memory capacity in Ab initio language development. Applied Linguistics, 37, 669–692.
Schneider, V. I., Healy, A. F., & Bourne, L. E. (1998). Contextual interference effects in foreign language vocabulary acquisition and retention. In A. F. Healy & L. E. Bourne (Eds.), Foreign language learning: Psycholinguistic studies on training and retention (pp. 77–90). Mahwah, NJ: Erlbaum.
Schneider, V. I., Healy, A. F., & Bourne, L. E. (2002). What is learned under difficult conditions is hard to forget: Contextual interference effects in foreign vocabulary acquisition, retention, and transfer. Journal of Memory and Language, 46, 419–440.
Suzuki, Y. (2017). The optimal distribution of practice for the acquisition of L2 morphology: A conceptual replication and extension. Language Learning, 67, 512–545.
Suzuki, Y., & DeKeyser, R. (2017a). Effects of distributed practice on the proceduralization of morphology. Language Teaching Research, 21, 166–188.
Suzuki, Y., & DeKeyser, R. (2017b). Exploratory research on second language practice distribution: An aptitude × treatment interaction. Applied Psycholinguistics, 38, 27–56.
Suzuki, Y., & Sunada, M. (2018). Automatization in second language sentence processing: Relationship between elicited imitation and maze tasks. Bilingualism: Language and Cognition, 21, 32–46.
Taylor, K. M., & Rohrer, D. (2010). The effects of interleaved practice. Applied Cognitive Psychology, 24, 837–848.
Wajnryb, R. (1990). Grammar dictation. Oxford: Oxford University Press.
Wong, A. W.–K., Whitehill, T. L., Ma, E. P.–M., & Masters, R. (2013). Effects of practice schedules on speech motor learning. International Journal of Speech-Language Pathology, 15, 511–523.
Zulkiply, N., & Burt, J. S. (2013). The exemplar interleaving effect in inductive learning: Moderation by the difficulty of category discriminations. Memory & Cognition, 41, 16–27.

SUPPORTING INFORMATION

Additional supporting information may be found online in the Supporting Information section at the end of the article.