Jelle R. Dalenberg
1338730
May 2011
Master Thesis
Human-Machine Communication
University of Groningen, The Netherlands
Internal supervisor:
Jelmer Borst, MSc (Artificial Intelligence, University of Groningen)
External supervisor:
Dr. Hedderik van Rijn (Experimental Psychology, University of Groningen)
Contents

1 Summary
2 Introduction
4.1 Introduction
4.3.1 Method
6 References
1 Summary
Fact learning plays an important role in a variety of learning contexts. Previous studies indicate the
benefit of repeated testing and spacing learning sessions over time (Cepeda, Pashler, Vul, Wixted &
Rohrer, 2006; Karpicke & Roediger, 2008). These phenomena were previously incorporated in a
cognitive model by Van Woudenberg (2008), Van Thiel (2010), Koelewijn (2010) and Nijboer (2011)
that improved the learning of word pairs. The current study extends this work in two ways: 1) we investigated how repeated spaced testing improves fact learning, and 2) we investigated whether the spacing model improves the learning of facts that are not only stored phonetically (word pairs) but also visuo-spatially (topography).
Memory models indicate that by increasing the time between learning sessions, the relative
strength of memory traces weakens. These memory traces will then be harder to retrieve from
memory. We therefore hypothesized that increased effort on memory retrieval is responsible for the
benefit of spaced learning. Mental effort can be measured by pupil dilation. A previous pupillary
study by Magliero (1983) showed that encoding effort reacts to the recency effect but no studies
have linked effort as measured by pupillary dilation to the frequency effect. The first experiment in
the current study investigated the relation between retrieval effort and the relative memory strength
of mentally stored information. In the experiment phasic pupil dilation of 15 participants was
measured and analyzed during retrieval tasks while they were learning topographical facts. The facts
were studied once and tested during four repetitions in one of two repetition-interval conditions. We
hypothesized that retrieval effort will decrease as the relative strength of a memory trace increases.
This hypothesis accounts for recency effects as well as for frequency effects. Analysis of the phasic
pupil response in the experiment shows a significant main effect for the repetition interval condition.
Furthermore an interaction effect between the number of repetitions and repetition interval was
found, indicating that the difference in effort between short and long repetition intervals decreased
as the number of rehearsals increased. These findings largely confirm our hypotheses and the assumptions of theories stating that increased retrieval effort increases learning gains. The
results also show that phasic pupil dilation is an excellent predictor of the correctness on the
upcoming trial, indicating that an increase in effort results in a decreased chance of retrieving the
memory trace, which is associated with lower memory strength.
The cognitive model described above, initially implemented for word pair learning, was
adapted for an implementation in topographic learning. Previous implementations of the spacing
model indicated learning gains of approximately 10%. We therefore hypothesized that learning with
the spacing method improves the final test scores. To test this, the second experiment of the current
study investigated whether an adapted version of the spacing model increased learning gains in
topographic learning. Response times and correctness of 48 participants on a final test were
measured and analyzed after learning topographical facts with either a spacing or flashcard method.
Although the experiment failed to find a difference between both learning methods, the results did
show that the spacing method was more efficient by presenting fewer rehearsals per item while the
performance of the participants remained constant on the final test. In contrast, the flashcard
method presented on average more rehearsals and extra rehearsals were correlated with a lower
performance on the final test. Further analysis showed that the spacing model encountered
problems with the type of stimuli that were used. Furthermore previous studies also show difficulties
with finding effects after similar retention intervals. If the known issues with the current spacing
model are addressed and when the retention interval is increased, future experiments will more
likely show the learning gains that were expected.
Thus, this work shows that the spacing model improves learning gains by increasing retrieval
effort and that it still has potential to improve learning facts that are associated with different
memory systems.
2 Introduction
2.1 Improving topographical learning
Let us suppose that you are a high school student in the Netherlands and have to learn a map of 50
Dutch cities for your geography course. Time is short and you really need to learn this efficiently
because you also have to invest time in homework for other courses and, of course, social activities.
This is an example of a common problem that a lot of teenage students face during their education.
Two common strategies to learn the cities are spacing learning sessions over time or cramming everything just before the test. Teachers and researchers often recommend the former, but 25-50% of students choose to start learning just before the test and mass everything in one learning session (McIntyre & Munson, 2008).
The benefit of spaced learning has been replicated many times, as shown by the meta-analysis of Cepeda, Pashler, Vul, Wixted & Rohrer (2006). Cepeda et al. give a clear indication that, in general, spaced learning improves learning results. By combining the results of 254 studies with retention intervals (i.e. the time between the last learning session and the test) between 1 second and 31 days (and longer), they show that spaced learning scored 47.3% whereas massed learning scored 36.7% on the
final recall tests. Does this mean that you have to distribute learning sessions with as much time as
possible in between, to be able to score high on the Dutch cities test? The answer to this question is
no. Cepeda et al (2006) also conclude that for every retention interval there is an optimal interval
between learning sessions (spacing intervals). This optimal spacing interval increases as the retention
interval increases.
Whether the spacing interval should be fixed or expanding remains unclear. Cepeda et al
(2006) argue that an expanding spacing interval either improves learning or shows similar effects to
the fixed spacing interval. An improvement to the idea of an expanding interval is to dynamically
change the interval per item. This can be done by estimating how well you remember a Dutch city by
means of response time and/or correctness. Paragraph 2.4 and Chapter 4 will discuss this in further
detail.
While testing themselves, students typically judge whether they have or have not learned something. If they successfully recall the information from memory, the
information is considered to be learned and the student drops it from further practice. This allows
him to focus on new material. The experiment of Karpicke & Roediger (2008) shows that this strategy
is wrong: eliminating items from further practice greatly impairs long term retention. Therefore it is
more efficient to keep learning all items at every learning session. Moreover, the main finding of the
Karpicke & Roediger (2008) study, is that repeated testing (memory retrieval) greatly improves
learning compared to repeated studying (memory encoding). This finding is also known as the testing
effect. This is an important effect to take into account while learning and will be further discussed in
Chapter 3.
To repeatedly test your knowledge on the Dutch cities you can test yourself, but it might be
more practical to use an aid. You can, for example, ask your parents or a friend to rehearse the Dutch
cities with you and let them rehearse the cities you tend to forget more often and vice versa. There
are also computer programs that help you rehearse, but in general they only repeat your mistake
once and do not take this mistake into account any further. It would be more helpful if such a
computer program could take into account how to space learning the Dutch cities and how often
they should be rehearsed individually. Fortunately, there are already such systems for learning word
pairs. These systems make use of cognitive models.
Van Woudenberg (2008) improved the PA model (the model by Pavlik & Anderson; see Chapter 4) by using two measures: the correctness of
the user’s input and the response time of the user. For each fact that will be rehearsed, the model is
able to make a prediction for the response time of the participant. If this prediction deviates from
the monitored response time of the user, internal model parameters change in order to adjust to the
personal memory decay speed of the user. By using correctness and response times, Van
Woudenberg (2008) found that participants who learned with the adaptive model scored
significantly better at learning word pairs than both the PA model and a control flashcard method.
Recently two improvements for Van Woudenberg’s 'spacing model' have been proposed. First,
Van Thiel (2010) introduced a model that increased the speed of adaptation and was able to
personalize the rate of forgetting. A model that incorporated this adjustment also proved to score
significantly better than a control flashcard method. The second improvement was based on the
users’ response times. To account for a delay in motor response, the original model used a standard
response time of 300 ms. To personalize this response time, Koelewijn (2010) added a response time
test prior to the study session. The results of this experiment indicated a positive effect on the test
score but no significant results were found.
A more detailed description of these models and how to optimize topographical learning with
a similar model will be described in Chapter 4.
The current study addresses two questions. First, we investigated whether spacing learning sessions increases retrieval effort, and we set up an experiment to test this hypothesis.
Chapter 3 will discuss the first part of this study in further detail by addressing the topics forgetting,
the testing effect and mental effort. Chapter 3 furthermore discusses pupil dilation as a measure of
mental effort. To investigate whether spacing learning sessions indeed increases retrieval effort we
conducted a pupil dilation experiment that shows how spacing and multiple repetitions affect mental
effort and performance.
Second, we investigated whether the spacing model improved learning topographical facts.
In previous work the spacing model was tested in word pair learning paradigms. To investigate
whether the spacing model improves learning of factual information that is not only stored
phonetically but also visuo-spatially, we set up a second experiment in which the spacing model was
adapted to a topographical learning setting and tested against a flashcard method. Chapter 4 will
discuss how the current cognitive models for learning word pairs work and the experiment in which
we tested how well the implementation of the spacing model in topographical learning performs.
3 The role of effort on memory retrieval
3.1 Forgetting and the benefit of rehearsing
Before we start discussing effort and memory retrieval, we will first explain how quickly you tend to forget facts such as the Dutch cities. Ebbinghaus (1885) was the first to show the forgetting curve. Although this curve differs per individual and per type of learning material, it always behaves like a power function, as shown in Figure 3.1. The curve shows how the chance of remembering a fact that is encountered only once declines over time. Let's say that you learn a Dutch city once and you are required to remember it after one day. According to the graph in this example, your probability of still remembering it has dropped to around 35%.
Figure 3.1 – The forgetting curve showing the decline in retention as time passes.
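As an illustration (not part of the thesis materials), such a power-law forgetting curve can be computed in a few lines of Python; the decay exponent below is an arbitrary value, chosen only so that retention after one day is roughly the 35% used in the example above.

```python
import numpy as np

def retention(hours_since_study, decay=0.33):
    """Illustrative power-law forgetting curve: P(recall) ~ t^(-decay).

    decay=0.33 is a made-up exponent giving ~35% retention after 24 hours;
    real curves differ per person and per type of material.
    """
    t = np.maximum(hours_since_study, 1.0)   # clamp so P(recall) stays <= 1
    return t ** -decay

for hours in [1, 6, 24, 72]:
    print(f"after {hours:3d} h: P(recall) = {retention(hours):.2f}")
```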
Over the years, many different theories have been proposed to explain the effect of
forgetting (Byrnes, 2000). Regardless of the proposed underlying mechanisms, all theories assume
that the relative memory strength of an item decreases over time, which can be countered by
rehearsal. Although the theories propose different mechanisms underlying the dynamics of memory
strength, one of the more constant mechanisms is the notion that rehearsals strengthen memory
traces, which increase retrieval performance. A neuronal explanation of the beneficial effect of
rehearsing was proposed by Hebb (1949), who stated that neurons strengthen their connection
when they show repeated temporal electrical activation. More support for the beneficial effect of
rehearsing was found in studies that showed that recall performance decreased when rehearsal was
prevented (e.g., Brown, 1958; Peterson & Peterson 1959). Since these initial findings, many memory
models have been proposed to explain the constructs of human memory. All these models
incorporate the importance of rehearsing.
One example is the modal model proposed by Atkinson and Shiffrin (1968) and Shiffrin and
Atkinson (1969). They stated that control processes initiate rehearsal in the short term memory store
(STS). The proportion of information that is transferred to the long term memory store (LTS) is a
function of how long the information resided in the STS or the number of cycles the information made through the STS (Shiffrin and Atkinson, 1969). Another model in which rehearsal plays a
prominent role is the three-component model of memory, initially proposed by Baddeley and Hitch (1974), which has evolved further since (Baddeley et al., 2009). One of the three components in this
model is the phonological loop. This component actively rehearses auditory memory traces. Both
these models assume that non-rehearsed items will drop out of short-term memory. However,
neither model provides a detailed account of this process.
More recent theories are more explicit about how information becomes less accessible. Decay
based models (e.g., ACT-R, Anderson et al, 1998) assume that information becomes less accessible as
a function of time, whereas other models assume that the interaction with other information causes
the decreased performance (e.g, association-based models, SAM, Raaijmakers, 2003; and
interference-based models, such as SOB, Farrell & Lewandowsky, 2002). This decrease in
performance has to be countered by rehearsals. Rehearsal either strengthens a memory trace by
increasing the activation of an information chunk (i.e., the strength in memory; ACT-R), by creating
and strengthening associative connections between cues and information (SAM), or by adjusting the
vector weights of the new and all other learned information (SOB).
Therefore, regardless of the underlying mechanisms, all theories account for recency and
frequency effects by means of relative memory strength. Such effects are demonstrated by, for example, Rundus (1971). He conducted an experiment in which participants listened to a list of 20
words and were instructed to remember them for a recall test after the learning phase. The
participants were instructed to talk aloud while learning the words. Figure 3.2 shows the results of
the experiment. The figure shows that the recall probability per serial word position has a U-shape. This shape reflects two effects: the frequency effect and the recency effect. The
frequency effect is the effect of a higher recall probability on facts that are learned more often. In
Figure 3.2, this effect is shown by the increased recall probability of the first words. The first words
exist longer in the experiment and the extra time allows for more rehearsals, as shown in the figure.
These extra rehearsals, in turn, increase the recall probability of the first words.
The other effect that can be seen in Figure 3.2 is the recency effect. This effect shows that
facts which are learned more recently are still more active in memory. This effect disappears when a
delay in recall is used (Glanzer & Cunitz, 1966).
Figure 3.2 – Results of the Rundus (1971) experiment. The dotted line shows the recall probability per serial position of each to-be-remembered word immediately after the study phase. The squared line shows the number of rehearsals per serial position of each to-be-remembered word.
The theories and findings discussed above clearly show that rehearsal is very important to
increase memory strength and improve later retention. Rehearsal can be done by restudying
(encoding) or by repeated testing oneself (retrieval). As has been stated in paragraph 2.3, Karpicke &
Roediger (2008) argued that repeated testing shows a large improvement in learning gains compared to
repeated studying. The next paragraph will further discuss this testing effect.
3.2 The Testing Effect
The first large scale studies that show learning gains due to the testing effect are by Gates (1917) and
Spitzer (1939). Gates (1917) investigated the testing effect in groups of children from grades 1, 3, 4,
5, 6 and 8. The children were instructed to learn a set of nonwords (taken from Ebbinghaus, 1885) in an initial study phase and then recite the words during a second learning phase (reciting the words
was considered testing your knowledge). The time spent on reciting in the second phase was
manipulated by allowing the children to recite in a time period that was shorter than the time it took
them to study the items in the first phase. The results of Gates’ (1917) experiment showed that the
children scored higher on both an immediate and a delayed test when they spent more time on
reciting (see Figure 3.3). For the children in grade 1 there was no effect, which could indicate that the testing effect only occurs after a certain point in development (Roediger & Karpicke, 2006).
Figure 3.3 – The results of the Gates experiment. The graphs show the proportion of non-words recalled after a learning phase
and a reciting phase that lasted for 0, 20, 40, 60 or 80% of the learning phase. The left graph shows the result after an immediate test and
the right graph shows the results after a delayed test 3-4 hours later (adapted from Roediger & Karpicke, 2006).
Figure 3.4 – The results of the Spitzer (1939) experiment. The graph shows the proportion correct of groups of 6th grade students who completed tests about 600-word stories at different time points. The groups were tested 1, 2 or 3 times (adapted from Roediger & Karpicke, 2006).
Spitzer (1939) let 3605 sixth-grade students study a short 600 word story about either bamboo
or peanuts. During the next 63 days, the students took retention tests about the stories on various
moments. The tests consisted of 25 multiple choice questions with 5 alternatives. To investigate the
effect of testing, Spitzer started testing groups of students on different time points. Figure 3.4 shows
the results of the Spitzer (1939) experiment. The graph shows the test performance of 8 groups of
students who were tested during the 63 day period. The results show a clear forgetting curve that
indicates how retention declined until the moment of the first test of each group. Completing this
first test greatly reduced forgetting. Furthermore, the results show that the sooner a test was given,
the better the students performed on subsequent tests.
After these initial findings, the testing effect has been replicated numerous times and it has
been shown that this effect is powerful and long lasting (e.g. Carrier & Pashler, 1992; Roediger &
Karpicke, 2006; McDaniel, Anderson, Derbish & Morisette, 2007; Karpicke & Roediger 2008;
Carpenter, Pashler, Wixted & Vul, 2008).
So it is clear that if you repeatedly test yourself, you improve your retention on later tests.
Also, the sooner you test yourself, the better your retention on subsequent self-tests. But how does this work mentally? Dempster (1996) distinguished two theories from the literature that try to explain this effect: the amount-of-processing hypothesis (Glover, 1989; Thompson, Wenger & Bartling, 1978; Slamecka & Katsaiti, 1988) and the retrieval hypothesis (Gardiner, Craik, & Bleasdale, 1973). The amount-of-processing hypothesis states that the learning gain of the testing effect is only
because testing gives additional exposure to the material that you learn. The retrieval hypothesis
states that every self test reactivates and operates on memory traces by elaborating them or
creating new retrieval routes. It was argued by Gardiner et al (1973) and Jacoby (1978) that retrieval
effort causes this process.
Evidence from experiments shows that the amount-of-processing hypothesis is the less likely of the two. Roediger & Karpicke (2006) and Wheeler, Ewers and Buonanno (2006), for example, show
that additional studying only enhances retention on a short term test while the testing condition still
shows better learning gains on the long term. Therefore the retrieval hypothesis seems more
favorable.
If we translate these results to our Dutch city example it means that when you test yourself,
the cities that are hard but successfully retrieved from memory are better remembered on
subsequent tests. Thus, increasing retrieval effort seems to improve learning gains.
How do we increase retrieval effort and how do we measure it? So far, retrieval effort has been associated with response times: the time to retrieve mentally stored information increases when the relative strength of this information decreases (e.g., Sternberg, 1969; Stanners, Meunier, Headley,
1969; Jolicœur & Dell'Acqua, 1998).
A different measure for retrieval effort is pupil dilation. Pupil size not only reacts to
differences in light intensity, but was also found to strongly correlate with mental effort (e.g. Hess
and Polt, 1964; Kahneman and Beatty, 1966; Beatty, 1982; Fish and Granholm, 2008; Jones, Siegle,
Muelly, Haggerty & Ghinassi, 2010). An advantage of pupil dilation over response times is indicated by Porter, Troscianko and Gilchrist (2007), who found effects of effort on pupil dilation in a search task in which difficulty was matched for response times.
3.4 Pupil dilation
3.4.1 Measuring effort
Although changes in pupil size in response to mental activities were actively studied from the late 19th century onwards (e.g., Schiff and Fao, 1874; Heinrich, 1896), renewed interest in this measure stems from the early 1960s (e.g. Hess & Polt, 1964). In 1966, Kahneman and Beatty conducted an
experiment in which participants had to memorize a list of items and later report it. As the pupil
dilation increased with each additional presentation, and decreased after each successful report, this
study is taken as a prime example of the link between pupil dilation and effort. During the two
decades after these first findings, numerous studies indicated that pupil dilation increases in
response to increased cognitive processing demands (for a review see Beatty, 1982). These results
were found in tasks that involve memory, language processing, complex reasoning, perception and
attention. Furthermore, pupil dilation is also used to capture inter-individual differences in
processing load (Beatty, 1982; Fish and Granholm, 2008; Jones et al, 2010).
Two effects can be distinguished in pupil behavior: tonic pupil size and phasic dilation of the pupil (Aston-Jones & Cohen, 2005; Rajkowski et al, 1994; Gilzenrat et al, 2010). Tonic pupil size indicates to what extent the organism is exploring the environment and reflects general alertness (baseline pupil dilation). Phasic dilations indicate exploitation and reflect focused attention and processing of interesting and/or rewarding stimuli (task-evoked pupil dilation).
Another property of pupil dilation that needs consideration is its response latency. Previous
studies indicated that the pupil typically responds to differences in light in a 200 to 1000 millisecond
time window (Beatty & Lucero-Wagoner, 2000). Furthermore, peak dilations typically appear in a 1 to
2 second time window (Krinsky & Nelson, 1981; Briesemeister, Hofmann, Tamm, Kuchinke, Braun &
Jacobs, 2009).
Although pupil dilation proved to be a reliable measure, previous studies have also shown
disadvantages of this method. The dilation, for example, only increases until the cognitive resource
limits are reached; when participants have to hold more information in their memory than they can
remember, the dilation stays at its maximum or starts to decline (Granholm, Asarnow, Sarkin, &
Dykes, 1996). Furthermore, pupil dilation is not a suitable measure for every type of setting.
Iqbal et al. (2004) as well as Schultheis and Jameson (2004) found that pupil diameter did not show
an effect of cognitive load in continuous reading tasks. Conati and Merten (2007) used eye tracking
data in an exploratory learning environment in which subjects perform mathematical tasks. They also
measured pupil dilation continuously and showed that this measure could not predict whether the
subject could self-explain the instructional material. These findings show that measuring dilation
during continuous tasks and tasks that reach the cognitive limits of the participant should be
avoided.
3.5 Effects of spacing and repeated testing on pupil dilation
The previously discussed literature indicates the importance of repeated testing. Furthermore it was
hypothesized that increasing the mental effort of retrieving information improves learning gains.
Spaced learning tries to accomplish exactly this: by increasing the time between repetitions of a Dutch city, memory strength decreases and more effort should be needed to retrieve the Dutch city from memory. If this is true, pupil dilation should demonstrate measurable differences, supporting the retrieval hypothesis. Finding these differences will furthermore indicate
that pupil dilation could potentially be used as an additional measure in an adaptive learning model
next to correctness and response time.
The study by Magliero (1983) indicated that there is a link between memory strength and
effort. However, the participants in this study were not explicitly instructed to learn the words in the
list; the participants were told at the start of the experiment that they would participate in a memory
task after the list of words was presented. Therefore we set up an experiment to test the effects of
memory strength on pupil dilation during the retrieval process. As relative memory strength is
assumed to decrease over time and increase with the number of rehearsals, we manipulated both
the number of intervening items (and thus the time between repetitions) and the number of
repetitions of a to-be-learned item.
We hypothesized that less effort is needed to retrieve mentally stored information when the
relative strength of the information increases. To test this hypothesis an experiment was performed
in which participants had to learn facts. To investigate recency and rehearsal effects independently,
the facts were repeatedly tested at two different repetition intervals. We predicted that (1) an
increased number of repetitions would result in a decreased dilation of the pupil; (2) a longer time between two repetitions would result in an increased dilation. Both these predictions are based on the
notion that increased relative memory strength is reflected in lower effort as estimated by pupil
dilation. With respect to the interaction, if effort is linked to relative memory strength in a similar
way as retrieval latency (as discussed in paragraph 3.3), an interaction is to be expected in that
the decrease of dilation is stronger in the more difficult condition. The next paragraph shows the
method, results and discussion of our experiment.
3.6.1 Method

Stimuli - Participants had to learn brain topography. The cross-section of the brain used in this
experiment is shown in Figure 3.5. A total of 26 areas were presented, indicated by the 26 circles
shown. The areas largely correspond to Brodmann areas, although some Brodmann areas were
combined into a single aggregate. Each of the areas was indicated throughout the experiment by its
topographical full name (e.g., “Inferior Temporal Gyrus”, or “Dorsal Anterior Cingulate Cortex”). Two
types of trials were presented: study trials and test trials. During the study trials, the name of the to-be-identified area was shown above the cross-section in Courier New 26 point font, and the corresponding area was indicated by an arrow. During a test trial, participants were first presented the name of a previously learned area in Courier New 26 point font in black on a white background, centered on the screen. After this presentation, the cross-section as shown in Figure 3.5 appeared.
Participants indicated which area they thought corresponded to the presented name by clicking on
one of the 26 circles.
Figure 3.5 – Display as presented to the participants during the answer-part of a test-trial. Circles mark the 26 areas used in this study.
Design - Every participant was presented all 26 areas, randomly distributed over five learning blocks.
Three learning blocks contained four areas each, and two learning blocks contained seven areas.
Every block was presented five times, in consecutive runs, before the next block commenced. All
items of a block were presented in each run. The first run consisted of study trials; the four
subsequent runs of test trials. When a run was completed, the order of areas within that block was
randomized to avoid learning the areas in a fixed order, while taking care that an area was never
presented twice in a row.
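A minimal sketch of this within-block randomization (illustrative Python, not the experiment's actual presentation code): each run reshuffles the block while making sure that the first item of a run differs from the last item of the previous run, so an area is never presented twice in a row.

```python
import random

def shuffle_run(items, previous_last=None, max_tries=1000):
    """Return a random order of `items` whose first item differs from the last
    item of the previous run, so no area is presented twice in a row."""
    order = items[:]
    for _ in range(max_tries):
        random.shuffle(order)
        if previous_last is None or order[0] != previous_last:
            return order
    raise RuntimeError("could not find a valid order")

# Example: one short-interval block of four areas, presented in five runs
block = ["area_A", "area_B", "area_C", "area_D"]
last = None
for run in range(5):              # run 0 = study trials, runs 1-4 = test trials
    order = shuffle_run(block, last)
    last = order[-1]
    print(f"run {run}: {order}")
```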
As the repetitions of each block were presented consecutively, the average time or distance between two presentations of the same area is a function of the number of items in a block. The four-area blocks constitute the short repetition interval condition, and the two seven-area blocks constitute the long repetition interval condition. Because the order of areas within each run was randomized, the interval between two repetitions of the same area in the short repetition interval blocks was one to six intervening areas. For the long blocks the repetition interval was one to twelve intervening areas. The first block was a short repetition interval block, and subsequent blocks alternated between long and short repetition intervals (i.e., S, L, S, L, S). A total of 130 trials were presented.
Procedure - Participants were seated in front of a 22" (20" viewable) Iiyama Vision Master Pro 513 CRT monitor (set at a resolution of 1280 x 1024) and were asked to rest their chin on a head mount in
front of the screen. Distance from head mount to the screen was approximately 60 cm. Pupil dilation
of the right eye was measured at 500 Hz using a SR Research EyeLink 1000 eye tracker which was
placed immediately below the computer screen. Presentation of all stimuli was controlled using
PsychToolBox (Brainard, 1997; Kleiner et al, 2007) with the Eyelink extensions (Cornelissen et al,
2002).
Participants were instructed that they were to learn brain topography, and that they would
get a set of study trials that presented the areas and the associated names, followed by four runs of
test trials in which they had to indicate the answer by clicking on the circle of the correct region. All
instructions were presented on-screen.
A study trial started with the string “Study trial…” presented centered on the screen for three
seconds, after which the study screen appeared. This screen showed an area name together with an arrow that indicated the corresponding position. Although the correct answer was indicated, the
participant was still free to choose any desired answer. After the participant clicked on a circle to
indicate his or her answer, feedback was provided. The selected circle turned green for 1 second if
correct, or red for 2 seconds if incorrect. If an incorrect answer was given, an arrow highlighted the
correct area.
A test trial started with a fixation cross that was presented centred on the screen for 4
seconds, followed by the presentation of the area name for 6 seconds. After this period, the cross-
section of the brain was presented. The participant had 10 seconds to provide an answer by clicking
on a circle associated with an area using a standard computer mouse. Feedback was identical to the
feedback presented during the study trials.
The slow pace of the experiment allowed for accurately measuring the relatively slow
fluctuations in pupil dilation. The long presentation of the fixation cross at the start of each test trial
provided the baseline to which later measures were scaled. The long presentation of the area name
allowed for measuring a complete phasic pupil response. The complete experiment, including setup
and debriefing, lasted about 25 minutes.
3.6.2 Results
Four participants were excluded because of technical measuring problems or not following
instructions. The data from 15 participants (5 male; average age 21.5 years; SD = 2.01) were used for
further analysis. The first short repetition interval block was considered training, and was not
analyzed. We will report data of Run 2 to 5 for Blocks 2 to 5, as in Run 1 only study trials were
presented. We will refer to these runs as Repetition 1 to 4. All trials with a response time longer than
8 s or shorter than 500 ms were considered outliers, and removed from further analyses (.9% of all
trials).
Figure 3.6 - Percentage Correct and Response Time data for Short and Long Repetition Intervals.
Behavioral Results - Figure 3.6 shows the main behavioural data. The two lines in black indicate the percentage of correct responses over the four repetitions, which was submitted to a repeated measures ANOVA after an arcsine transform. As expected, correctness is higher for the short repetition interval condition (F(1,14)=26.8, ηp²=.66, p<0.001) and increases with the number of repetitions (F(1,14)=24.4, ηp²=.64, p<0.001). The figure also shows an interaction between repetition and condition: the advantage of the short repetition interval condition decreases over time (F(1,74)=7.1, ηp²=.09, p=0.009).
An inverse, but qualitatively similar pattern of results can be observed for the response times, with main effects for condition (long repetition intervals result in increased response times, F(1,14)=16.9, ηp²=.55, p=0.001) and repetition (response times decrease with an increased number of repetitions, F(1,14)=26.7, ηp²=.66, p<0.001). As for the percentage correct data, the interaction between repetition and condition shows that the initial response time advantage for the short repetition interval blocks decreases with repetitions (F(1,74)=38.2, ηp²=.34, p<0.001). These results are in line with previous studies: longer repetition intervals are associated with lower performance than shorter repetition intervals, and an increasing number of repetitions improves performance, with a stronger effect for the long repetition interval condition.
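As an illustration of this analysis step (not the thesis' actual analysis code), the arcsine transform and a repeated-measures ANOVA could be run in Python as sketched below; the data-frame layout and the column names participant, repetition, interval and p_correct are assumptions.

```python
import numpy as np
from statsmodels.stats.anova import AnovaRM

def arcsine_transform(p):
    """Variance-stabilizing arcsine-square-root transform for proportions."""
    return np.arcsin(np.sqrt(np.clip(p, 0.0, 1.0)))

def rm_anova(df):
    """Repeated-measures ANOVA on arcsine-transformed accuracy.

    df -- long-format pandas DataFrame with one row per participant x cell,
          containing the columns participant, repetition, interval, p_correct.
    """
    df = df.copy()
    df["acc_asin"] = arcsine_transform(df["p_correct"])
    model = AnovaRM(df, depvar="acc_asin", subject="participant",
                    within=["repetition", "interval"])
    return model.fit()   # .summary() lists F values for both factors and the interaction
```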
Pupillary Results - The pupil diameter as reported by the SR Research Eyelink 1000 eye tracker was
cleaned from saccade and eye blink induced artefacts by linear interpolation of 25 samples before
and after a saccade, and 50 samples before and after a blink. Any remaining artefacts were manually
selected and the associated dilation was replaced by linear interpolation. A total of 58 trials (3.9%)
were excluded because of either too fast or slow responses or too many artefacts. The development
of the relative dilation during the presentation of the area name for the first and last repetition is
plotted for the short and long repetition intervals separately in Figure 3.7. The figure shows the difference in phasic pupil dilation between the first and fourth repetition for both the short and long repetition intervals. As can be seen, the phasic dilation on the first repetition is higher in the long
repetition interval. Furthermore, the phasic dilation decreased to similar levels after four repetitions
in both repetition intervals.
Figure 3.7 - Lowess filtered (f=.05) relative dilation for the first and last repetition, plotted for the short (left) and long (right) repetition intervals (RI). Zero ms is the onset of the screen.
The phasic pupil response was calculated per trial as the difference in dilation between the
constriction and peak (see, e.g., Bradley, Miccoli, Escrig & Lang, 2008). For both estimates, the
average of a window of 400 ms around the extreme was calculated. Mean phasic pupil response and
(between-subject ANOVA-type) standard errors are depicted in Figure 3.8 for the correct trials in all
conditions. As the resulting distribution was heavily right skewed (Shapiro-Wilk test: W=0.88,
p<0.001), the data were log-transformed (W=0.99, p>.9). Nine (.07%) outliers (> 2.5 SD) were
removed.
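The artefact handling and phasic-response computation described above can be sketched as follows (illustrative Python, not the actual analysis scripts of this thesis; the array layout, event format and function names are assumptions).

```python
import numpy as np

FS = 500  # sampling rate in Hz (SR Research EyeLink 1000, as described above)

def interpolate_events(pupil, events, pad):
    """Replace pupil samples around blink/saccade events by linear interpolation.

    pupil  -- 1-D numpy array of pupil diameter samples for one trial
    events -- list of (start_idx, end_idx) sample indices of detected events
    pad    -- samples to pad on each side (25 for saccades, 50 for blinks above)
    """
    cleaned = pupil.astype(float)
    for start, end in events:
        lo, hi = max(start - pad, 0), min(end + pad, len(cleaned) - 1)
        cleaned[lo:hi + 1] = np.linspace(cleaned[lo], cleaned[hi], hi - lo + 1)
    return cleaned

def phasic_response(pupil, baseline):
    """Phasic pupil response: peak minus constriction of the relative dilation,
    each averaged over a 400 ms window centred on the extreme."""
    rel = 100.0 * (pupil - baseline) / baseline   # relative dilation in %
    half = int(0.2 * FS)                          # 200 ms on each side of an extreme
    i_min, i_max = int(np.argmin(rel)), int(np.argmax(rel))
    constriction = rel[max(i_min - half, 0):i_min + half + 1].mean()
    peak = rel[max(i_max - half, 0):i_max + half + 1].mean()
    return peak - constriction
```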
We tested the effects of repetition interval and number of repetitions on phasic pupil
response in correct trials using linear mixed effect models (Baayen, Davidson, Bates, 2008) with
crossed, independent random effects. Repetition interval and number of repetitions were entered as
fixed factors, whereas area and participant were entered as random factors1.
Figure 3.8 - Phasic pupil response (in %) for the short (left) and long (right) repetition intervals per repetition
Table 3.1 shows the results of this analysis. In line with the hypotheses, the analysis shows that the long repetition interval is associated with increased pupil dilation (β=.14). This effect is mainly caused by the first repetition, as the interaction effect decreases the difference between the short and long repetition interval by .06 for each additional repetition. Post-hoc analyses indicated that the interaction was indeed caused by the decrease in dilation in the long repetition interval (β=-0.05, p<0.001) and a lack of a repetition effect in the short repetition interval condition (β=0.02, p=0.393).
Table 3.1 - Overview of the estimates (β), the upper and lower 95% Bayesian highest posterior density (HPD95) confidence intervals, and p-values based on the MCMC posterior distribution (determined using pvals.fnc with 10000 samples, Baayen et al, 2008) of the fixed factors entered in the linear mixed-effect model.
We also tested the effect of the phasic pupil response on the probability of a correct answer (P(correct)) in the same trial, using a similar linear mixed effect model with crossed, independent random effects.
Phasic pupil response, condition and rehearsal were entered as fixed factors, whereas area and
participant were entered as random factors.
1 We also fitted more complex models, including, for example, trial number and block. As these models did not qualitatively change the outcomes, we decided against reporting the more complex models here.
Table 3.2 shows the results of this second analysis. The analysis shows that the phasic pupil dilation was a strong predictor of correctness, indicating that a larger phasic pupil dilation is correlated with a lower P(correct) (β=-.02, p=.014). Furthermore, the long repetition interval condition resulted in a lower P(correct) (β=-1.49, p<.001), whereas the number of rehearsals was positively correlated with P(correct) (β=-.74, p<.001). Thus, performance is lower in the longer repetition interval condition and rehearsals increase performance.
Table 3.2 - Overview of the estimates (β) and p-values of the fixed factors entered in the linear mixed-effect model predicting P(correct). The β values are on the logarithmic scale used by binomial linear mixed-effect models (Baayen et al, 2008).
3.6.3 Discussion
We conducted an experiment in which pupil dilation was measured to assess effort during fact
learning. We predicted a reverse relation between relative memory strength and effort. To test this
hypothesis, we manipulated relative memory strength by repeating all information multiple times at
one of two repetition intervals. The behavioural results of the experiment indicate that our
manipulations were successful: performance increased with increased repetitions and decreased
when the repetition interval was longer, an effect that diminished with increased repetitions. These
effects are in line with the assumption that repeated presentations increase relative memory
strength and an increased interval between two repetitions results in a decreased memory strength
(compared to a shorter interval).
Although slightly different, the pupil dilation effects also indicate effects of relative memory
strength. First of all, the long repetition interval – in which a lower relative memory strength is
assumed – is associated with increased pupil dilation. Second, the interaction between repetition
interval and number of repetitions indicates that the pupil dilation decreases with an increased
number of repetitions in the long repetition interval condition. Both these effects argue in favor of an
effect of relative memory strength on effort. However, we also predicted an effect of the number of
repetitions in the short repetition interval conditions, which could not be found. There are multiple
explanations possible for the lack of an effect in this condition. One explanation is that the short
repetition interval condition resulted in very strong memory traces that were retrieved in a fraction
of the time available to the participants. After retrieving the fact, participants might have engaged in other mental activities, which artificially raised the measured pupil dilation. Another possible
explanation is that the phasic pupil response as shown in Figure 3.8 has a floor effect of around 18% in
our setup. This response could, for example, reflect the effort associated with reading and processing
the presented area name. If these components of the process already invoke a large pupillary
response, small effects during high levels of relative memory strength are difficult to identify.
Although not all our predictions were confirmed, the results show that phasic pupil dilation is
an excellent predictor of the correctness on the upcoming trial; the higher the phasic dilation, the
lower the chance to be correct. This indicates that an increase in effort is associated with a decreased
chance of retrieving the memory trace.
Our findings can be explained by the leading memory theories. According to these theories,
introduced above, the measured increases in retrieval effort can be explained by extra memory decay after
longer repetition intervals (ACT-R), weakening of associative connections between cues and mentally
stored information (SAM) or greater interference from the additional stimuli in the long repetition
interval (SOB).
Porter et al (2007) indicated that effort effects are not always captured during the retrieval
process. However, the effects from the current experiment are in line with previous response time
memory studies that were able to find these effort effects. By using pupil dilation as an additional
measure of effort, our study gives a stronger indication that manipulations in repetitions and
repetition intervals affect retrieval effort (e.g., Sternberg, 1969; Stanners et al., 1969; Jolicoeur &
Dell'Acqua, 1998). Furthermore, the finding that retrieval effort is higher for facts with a lower
relative strength is in line with an fMRI study by Buckner, Koutstaal, Schacter, Wagner, and Rosen
(1998). They conducted a word recognition task and found that during a successful retrieval of
shallow encoded words, activation in the bilateral anterior insular regions and a left dorsal prefrontal
region increased. Buckner et al. argue that this increased activation is indicative of increased effort.
Thus, the current study extends these and Magliero’s (1983) findings by focusing on retrieval effort
instead of on encoding or recognition effort.
Although memory theories can explain our findings, it still remains unclear what biological
mechanism exactly causes the increased retrieval effort. A number of recent theories on the causes
of pupillary effects might help in unraveling this question. One explanation for the pupil response to
mental effort can be derived from the Adaptive Gain Theory, which states that activation in the
cortex is strongly dependent on the Locus Coeruleus (LC), a nucleus in the brainstem regulating
arousal and behavior (Aston-Jones & Cohen, 2005). A high correlation between activation in the LC
and pupil dilation was found in monkeys, and later the effect was confirmed in humans (Rajkowski et
al, 1994; Gilzenrat et al, 2010). By linking effort during memory retrieval via pupil dilation with
activation in specific brain regions, more precise hypotheses can be formulated.
Regardless of the underlying mechanisms, the results of the current study can have an
extensive impact on learning theories. Many studies have shown the beneficial effect of deeper
encoding on later retention, whether it is by an implicit learning task or by mnemonics (Krinsky &
Nelson, 1981; McDaniel et al, 1986; Byrnes, 2000). According to the retrieval effort hypothesis by Pyc
and Rawson (2009), this effect is partly dependent on the amount of effort required, also during
successful retrievals. Although they confirmed their hypothesis by manipulating retrieval difficulty
through changes in repetition interval and the number of repetitions, they did not test whether
effort was indeed increased. The current study confirmed this assumption by showing that pupil
dilation decreases when the repetition interval decreases and when the number of repetitions increases.
Over repetitions, the phasic pupil response decreased faster in the longer repetition interval, showing that increasing the memory strength of mentally
stored information decreases the effort to retrieve this information.
The results of the experiment also show that phasic pupil dilation is a suitable predictor for
trial performance. This indicates that pupil dilation as a measure can be used in a cognitive model.
However, errors are typically associated with longer response times and response times are
measurable with less error (no averaging, artifact removal or baseline corrections are needed).
Therefore, pupil dilation was not further used for the current study and is better suited in future
implementations that aid learning of, for example, physically impaired students.
4 A spacing model in topographic learning paradigm
4.1 Introduction
Chapter 2 mentioned two previously implemented cognitive spacing models: a model by Pavlik & Anderson (2005, 2008; the PA model) and a model that was constructed by Van Woudenberg (2008), Van Thiel (2010) and Koelewijn (2010) (the WTK model). These authors showed that a spacing model is able to increase learning performance on word pairs. Learning word pairs is typically associated with phonological memory systems (Baddeley, 2009). Topography, however, can be associated with visuo-spatial memory systems (Baddeley, 2009). To investigate whether the spacing model is suited for learning facts that involve both types of memory systems, an experiment was set up in which the spacing model was tested against a flashcard method during topographic learning. Before discussing this experiment, this chapter will start with discussing the PA and WTK models in further detail and propose an adapted model that was implemented in the topographic learning paradigm.
In the PA model, which is based on the ACT-R declarative memory equations, the activation A_i of a memory chunk i that has been encountered at times t_0 ... t_n is given by:

A_i = \ln\left(\sum_{j=0}^{n} (t - t_j)^{-d_{i,j}}\right)    (Eq. 4.1)
Figure 4.1 shows an example of this additive effect of additional rehearsals. The red line shows
the activation of a memory trace while it was encountered twice, while the blue line shows the effect
from three additional encounters. The graph shows how the activation of the memory trace benefits
from these extra encounters after 15 seconds.
The activation Equation accounts for both frequency and recency effects; the activation of
chunk i increases with the number of encounters and the activation of a recently encountered chunk
is higher. To also account for the spacing effect, Pavlik & Anderson (2005) proposed an Equation in which the decay of every encounter was made dependent on the activation at the time of that encounter. Equation 4.2 shows how the decay value will increase when the activation is high and decrease when the last encounter was longer ago (therefore resulting in a lower activation). In Equation 4.2, the decay parameter has an intercept α and a scaling parameter c.
d_{i,j} = c \cdot e^{A_i(t_j)} + \alpha    (Eq. 4.2)
Figure 4.2 shows how the activation is affected by adding the spacing formula. The red line shows the activation based on decay that dynamically changes according to Equation 4.2, and the blue line the activation with a static decay parameter. The graph shows that for an early rehearsal the
dynamic decay decreases activation while for a later rehearsal the dynamic decay increases
activation, compared to the static decay. This change results in a higher activation when the time
between rehearsals increases. Pavlik & Anderson (2005, 2008) showed that by fitting these
parameters, they could model data sets of multiple spaced learning experiments.
Figure 4.1 The difference in activation of a memory trace between 2 encounters (red) and 5 encounters (blue).
Figure 4.2 The activation difference after 3 encounters at 0, 2 and 20 seconds, between static decay (blue, d=0.9) and decay
based on the spacing formula (red, c=0.25, α=0.17).
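A minimal Python sketch of Equations 4.1 and 4.2 (function names are illustrative; the values c = 0.25 and α = 0.17 are taken from the Figure 4.2 caption, and the example mirrors that figure's three encounters at 0, 2 and 20 seconds):

```python
import math

def activation_at(t, encounter_times, decays):
    """Eq. 4.1: activation at time t, given past encounter times and their decays."""
    terms = [(t - t_j) ** -d_j for t_j, d_j in zip(encounter_times, decays) if t_j < t]
    return math.log(sum(terms)) if terms else float("-inf")

def encounter_decays(encounter_times, c=0.25, alpha=0.17):
    """Eq. 4.2: each encounter's decay depends on the activation just before it."""
    decays = []
    for j, t_j in enumerate(encounter_times):
        a = activation_at(t_j, encounter_times[:j], decays)  # -inf before the first encounter
        decays.append(c * math.exp(a) + alpha)               # exp(-inf) = 0, so d_0 = alpha
    return decays

# Example mirroring Figure 4.2: encounters at 0, 2 and 20 s, evaluated at 60 s
times = [0.0, 2.0, 20.0]
ds = encounter_decays(times)
print(ds, activation_at(60.0, times, ds))
```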
To account for adaptation to the user, Pavlik and Anderson (2008) changed the activation
Equation in their model. The Equation was expanded with three β parameters as shown in Equation
4.3. These parameters are intended to capture deviations of the overall model from the data that
could not be accounted for with random noise. βs represents differences at the participant level, βi at
the item level and βsi at the individual learning per item level. Pavlik and Anderson (2008) updated
the βs and βi every 300 trials. The update of βsi was based on the correctness of the users’ response
with a Bayesian algorithm and changed after every trial. It remains unclear how this was done in
further detail.
A_i = \beta_s + \beta_i + \beta_{si} + \ln\left(\sum_{j=0}^{n} (t - t_j)^{-d_{i,j}}\right)    (Eq. 4.3)
The user-adaptive model by Pavlik & Anderson (2008) was able to improve recall and recall latencies of Japanese-English word pairs when compared to a flashcard method.
The rate of adaptation used by Van Woudenberg (2008) is shown in Equation 4.5. This Equation shows that the α value is changed by the difference between the predicted and observed latency in milliseconds, with a maximum change of 0.01. Incorrect answers were translated to a response time of 15 seconds, resulting in the maximum α increase. Van Woudenberg (2008) showed that this adaptation method outperformed a binary adaptation method as proposed in the PA model as well as a flashcard method.
\Delta\alpha = \max\left(-0.01,\; \min\left(0.01,\; \frac{L_{observed} - L_{predicted}}{1000}\right)\right)    (Eq. 4.5)
Although Van Woudenberg improved the PA model with the new adaptation method, the adaptation was slow. With a step size maximized at 0.01, convergence to the right α value takes a considerable amount of time and might not even be possible for short learning sessions. Therefore
Van Thiel (2010) changed the algorithm to speed up the adaptation by a binary search of the best
fitting α value based on a set of previous observed and predicted response times. Van Thiel (2010)
also showed that this version of the model was able to outperform a flashcard learning method in
word-pair learning.
Koelewijn (2010) addressed the fixed time in the latency of retrieval Equation (Equation 4.4). Koelewijn (2010) personalized the model by estimating individual fixed time values and by removing noisy observed values from the latency Equation. Although Koelewijn (2010) was not able to show significant effects on the test scores in a word-pair learning paradigm, his changes did show a positive trend in the scores.
The last refinements to the model were made by Nijboer (2011): an α adaptation based on the last 5 differences between the predicted and observed response times, and limiting the binary search for the best fitting α value to a maximum range of Δ 0.025. This last version of the WTK model, which was faster and computationally less heavy, was used for the implementation in the topographic learning paradigm in the current study.
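The α adaptation just described could be sketched as follows (illustrative Python; the WTK model's exact search procedure and error criterion are not specified here, so this is a generic interval-halving sketch using the last few correctly answered encounters, a 5-step search and a ±0.025 range; the predict_latency callable is an assumed stand-in for the model's latency prediction):

```python
def best_alpha(alpha, observations, predict_latency, radius=0.025, steps=5):
    """Search for the α (within ±radius of the current α) that minimizes the
    prediction error on the last (at most 5) correctly answered encounters.

    observations    -- list of (encounter_history, observed_latency) pairs
    predict_latency -- assumed callable (alpha, encounter_history) -> predicted latency
    """
    def error(a):
        return sum((predict_latency(a, h) - obs) ** 2 for h, obs in observations)

    lo, hi = alpha - radius, alpha + radius
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        # keep the half of the interval that predicts the observations best
        if error((lo + mid) / 2.0) < error((mid + hi) / 2.0):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Toy usage with a made-up latency function: the best α is found near 0.31
def toy_predict(a, _history):
    return 2.0 + 10.0 * a

observations = [(None, 2.0 + 10.0 * 0.31)] * 3
print(best_alpha(0.30, observations, toy_predict))   # ~0.31, within the ±0.025 range
```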
The spacing method has been compared to the flashcard method multiple times for word-
pair learning (e.g. Pavlik & Anderson, 2008; Van Thiel, 2010; Van Rijn, Van Maanen, & Van
Woudenberg, 2010). To indicate whether the spacing method also outperforms a flashcard method
in topographic learning, the current study will also make this comparison.
Figure 4.3 shows a flowchart of the flashcard method that was implemented in the current study. Before the schedule started, the to-be-learned brain areas were randomly divided into blocks, each containing 4 or 5 areas. Identical to the first experiment, the very first encounter of an
area was a study trial in which the area name together with the corresponding location (highlighted
with an arrow) was given. Further repetitions of an area were test trials in which the participant was
asked to find the location of the indicated area name. After studying the areas in a block once, the
order was shuffled and the participant was tested on his knowledge by a repetition of each area in
that block. Areas that were incorrectly remembered during testing were repeated again at the end of
each block in random order. When all areas within a block were correctly remembered once the next
block was studied and rehearsed. After all blocks were presented, the model repeated its sequence
by starting to retest the areas of the first group in random order. This sequence repeated until the
time limit of the learning phase was met.
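A compact sketch of this flashcard schedule (illustrative Python; present_trial and session_time_left are assumed callbacks standing in for the actual trial presentation and session timer):

```python
import random

def flashcard_session(areas, present_trial, session_time_left, block_size=5):
    """Sketch of the flashcard schedule described above.

    present_trial(area, study) -- assumed callback: runs one trial and returns
                                  True if the answer was correct
    session_time_left()        -- assumed callback: True while learning time remains
    """
    # the actual experiment used blocks of 4 or 5 areas; simple chunking here
    blocks = [areas[i:i + block_size] for i in range(0, len(areas), block_size)]
    for block in blocks:
        for area in block:                            # first pass: study trials
            if not session_time_left():
                return
            present_trial(area, study=True)
        remaining = block[:]
        while remaining and session_time_left():      # retest incorrectly answered areas
            random.shuffle(remaining)
            remaining = [a for a in remaining
                         if not present_trial(a, study=False)]
    while session_time_left():                        # then keep cycling through all blocks
        for block in blocks:
            for area in random.sample(block, len(block)):
                if not session_time_left():
                    return
                present_trial(area, study=False)
```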
To correct the observed response times for the time needed to move the mouse to the answer location, Fitts' Law was used, which relates movement time (MT) to the distance to the target (D) and the width of the target (W):

MT = a + b \cdot \log_2\left(\frac{D}{W} + 1\right)    (Eq. 4.6)
The Fitts' Law Equation was used in the model to estimate the movement time and subtract this time from the observed response latency. To get an estimate of the parameters a and b, a small task was constructed in which the participant had to click on 20 circles that were shown in succession. Every new circle had an increased distance from the current mouse position. The parameters a and b were then calculated from the observed movement times. Table 4.1 shows the Fitts' Law parameters along with the other model parameters that were used in the model.
Table 4.1 – Parameter settings of the spacing model (columns: Parameter, Value).
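As a sketch of how the parameters a and b can be estimated from such a clicking task (illustrative Python, not the task's actual code; the default values in movement_time are the fixed settings a = 0.3 and b = 0.15 adopted after the pilot described below):

```python
import numpy as np

def fit_fitts(distances, widths, movement_times):
    """Estimate Fitts' Law parameters a and b (Eq. 4.6) by least squares from
    observed mouse movements. All inputs are 1-D arrays of equal length."""
    index_of_difficulty = np.log2(np.asarray(distances) / np.asarray(widths) + 1)
    design = np.column_stack([np.ones_like(index_of_difficulty), index_of_difficulty])
    (a, b), *_ = np.linalg.lstsq(design, np.asarray(movement_times), rcond=None)
    return a, b

def movement_time(distance, width, a=0.3, b=0.15):
    """Predicted movement time in seconds; a and b as fixed after the pilot."""
    return a + b * np.log2(distance / width + 1)
```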
To test the implementation of the spacing model and the changes that were applied, a pilot was conducted with 18 participants who learned for 15 minutes and took a final test on the areas after a 15 minute break. Nine participants used the spacing method and the other nine used a flashcard method as control. The results of this pilot led to three changes in the method. Firstly, the estimations of the parameters in Fitts' Law were subject to large outliers in the observed movement times. Because of this problem, the values that were assigned to the parameters became highly unlikely (e.g., negative or very large) for some participants. Therefore the clicking task was removed from the experiment and the parameters a and b were set to the fixed values 0.3 (corresponding to the standard response time of 300 ms) and 0.15 (based on the average increase of DI in the pilot), respectively.
Secondly, the participants in the pilot scored close to the maximum score on the test. Therefore an attempt was made to increase the difficulty by removing the colour coding of the cross-section of the brain. The cross-section was presented in black and white, as shown in Figure 4.3.
Figure 4.3 – Display as presented to the participants during the answer-part of a study-trial. Circles mark the 26 areas used in this study.
Thirdly, the pilot showed that the rate of adaptation was incorrect. The experiment showed that several incorrect answers on an area rapidly increased α to very high values. In previous implementations of the WTK model the α value decreased if the observed response time was faster than the predicted response time. During the pilot study the response times of the participants were not sufficiently faster than the predicted response times to decrease the α values, even though the answers were always correct. Because the α values remained high, new rehearsals appeared rapidly and very frequently. Therefore a change was made to the α adaptation. After each incorrect response the α value was increased by 0.01 (instead of setting the latency to 15 seconds). Furthermore, during the binary search for the optimal α value, which was initiated after correct answers only, the model ignored any preceding trial that was incorrectly answered.
Figure 4.4 shows a flowchart of the spacing algorithm that was used as the learning schedule in Experiment 2. Within the session time the model checked whether the activation of any area was expected to drop below threshold 15 seconds ahead in time. This 15 second look-ahead ensured that no area would be below threshold by the time the upcoming learning trial finished. If an area was expected to drop below threshold, a test (repetition) trial was given. If none of the areas were expected to drop below the threshold, the model presented a new area as a study trial, unless all areas had already been studied; in that case a test trial was given for the area with the lowest expected activation after 15 seconds.
Figure 4.4 – Flowchart of the schedule in the spacing method that was used in the topographical learning experiment.
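A minimal sketch of the scheduling decision in Figure 4.4 is given below. The threshold value, the activation helper and the function names are assumptions that follow the verbal description above, not the thesis software itself.

LOOKAHEAD = 15.0     # seconds: activations are evaluated 15 s into the future
THRESHOLD = -0.8     # retrieval threshold (assumed value, for illustration only)

def choose_next_trial(now, studied, unstudied, activation_at):
    """Return (area, trial_kind) according to the spacing schedule in Figure 4.4.

    studied and unstudied are lists of area identifiers; activation_at(area, t)
    is an assumed helper that returns the activation predicted by the memory
    equations for an area at time t."""
    future = now + LOOKAHEAD
    weakest = None
    if studied:
        # the area expected to be weakest 15 seconds from now
        weakest = min(studied, key=lambda area: activation_at(area, future))
        if activation_at(weakest, future) < THRESHOLD:
            return weakest, "test"        # rehearse it before it is forgotten
    if unstudied:
        return unstudied[0], "study"      # nothing at risk: introduce a new area
    return weakest, "test"                # everything studied: test the weakest area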
Each test trial started with an estimation of the movement time (MT) and a prediction of the response time (RT). Subsequently the test trial was presented to the participant, and the response time and the correctness of the answer were observed. If the answer was wrong, the α value of the tested area was increased by 0.01, representing faster decay and resulting in an earlier future rehearsal. For a correct answer, however, the α value was changed based on the observed and predicted response times of the last (at most) five encounters (incorrectly answered encounters were ignored). Equation 4.3 showed that the α value is an offset that directly changes the historical activation behaviour of each area. Changes in the activation during the previous encounters result in new response time predictions (see Equation 4.5). By estimating which change in the α value results in the smallest error between the predicted and observed response times, an approximation of the best fitting α value can be made. This was done with a five-step binary search for the best α value within a ±0.025 range around the current α.
When the model finished either presenting a study trial or changing the α value after a test trial, it repeated the entire process until the study session time expired.
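The α update described above can be sketched as follows, as a minimal illustration rather than the actual implementation; predict_rt stands in for the response time prediction of Equation 4.5, the history bookkeeping is assumed, and the bisection below is only one way to realize the five-step search over the ±0.025 range.

def update_alpha(alpha, correct, history, predict_rt,
                 step=0.01, search_range=0.025, search_steps=5):
    """Adapt the decay offset alpha of an area after a test trial.

    history is a list of (encounter, observed_rt, was_correct) tuples for this
    area; predict_rt(candidate_alpha, encounter) returns the response time
    predicted under a candidate alpha (assumed interface, cf. Eq. 4.5)."""
    if not correct:
        return alpha + step          # incorrect: faster decay, earlier rehearsal

    # correct: use (at most) the last five correctly answered encounters
    recent = [(enc, rt) for enc, rt, ok in history if ok][-5:]

    def error(candidate):
        return sum((predict_rt(candidate, enc) - rt) ** 2 for enc, rt in recent)

    low, high = alpha - search_range, alpha + search_range
    for _ in range(search_steps):    # binary search for the best-fitting alpha
        mid = (low + high) / 2.0
        if error((low + mid) / 2.0) < error((mid + high) / 2.0):
            high = mid               # the better fit lies in the lower half
        else:
            low = mid
    return (low + high) / 2.0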
4.3.1 Method
Participants- A total of 47 participants (13 male) took part in this experiment. The participants were
students of the University of Groningen who volunteered to participate in this experiment in
exchange for study credit. Informed consent, as approved by the Ethical Committee Psychology of
the University of Groningen, was obtained before testing. All participants were naïve to the study
material.
Stimuli- The stimuli were slightly different from those in the first experiment. In the current experiment the name of the to-be-identified area was shown above the cross-section during both study and test trials. Furthermore, because the experiment took place on different computers, there were some changes in font and text size. The area names were presented in a 23 point Geneva font in black on a white background. To increase learning difficulty, the color coding in the cross-section of the brain (shown in Figure 4.1) was removed.
Design- The design was different from the first experiment. In the current experiment the participants were divided into two groups: an experimental group that learned the areas with the spacing method (n=23), and a control group that learned the areas with a flashcard method (n=24). The method determined the order of the study and test trials as described in the previous paragraphs. In short: the spacing method presented study trials until the activation of a learned brain area was expected to drop below threshold within the next 15 seconds. The calculation of the activation was based on correctness and response times. When the activation of a learned brain area was expected to be below threshold, the area was repeated as a test trial. If all areas had been studied once and no area was expected to be below the activation threshold, the area with the lowest expected activation after the next 15 seconds was retested.
In the flashcard method every block of areas was studied once and its areas were tested until all of them were responded to correctly, before moving on to the next block. Areas that were responded to incorrectly were repeated in shuffled order at the end of the block. After all blocks were presented, the model repeated its sequence by starting to retest the areas of the first block in random order.
After the study phase every participant started on a filler task for 10 minutes. The filler task was used to prevent recency effects; because memory decay should be slower in the spacing condition, this condition was expected to show stronger retention after a break. After the filler task the participants were retested on all areas during a final test.
Procedure- The participants were seated in front of a Macbook with a 15.4” screen (set to a
resolution of 1440 x 900). The distance to the screen was approximately 50 cm. The experimenter
explained the informed consent form and asked the participants to sign it. The participants were then
informed about the time schedule of the experiment (a learning task for 25 minutes, a picture
judgment task for 10 minutes and a final task related to the first task for 10 minutes) and were
instructed that they were to learn brain topography and that their response times would be
measured during learning. The experimenter stressed that participants should learn at their own pace. After the experimenter left the room, the program presented the remaining instructions on how the participant could interact with it. The program showed an example study trial, an example test trial, and how the program would react to correct and incorrect answers.
A study trial was indicated with the string “Learn:” presented at the top of the screen. This
screen also showed an area name below the string “Learn:” and an arrow that indicated the right
corresponding position on the cross-section of the brain. Although the right answer was indicated,
the participant was still free to choose any desired answer. After the participant clicked on a circle to
indicate his or her answer, feedback was provided. The selected circle turned green for 1 second if
correct, or red for 2 seconds if incorrect. If an incorrect answer was given, an arrow highlighted the
correct area.
A test trial was indicated with the string “Rehearse:” and an area name. This information was
presented in the same spot as in a study trial. The participant then had 10 seconds to provide an
answer by clicking on a circle associated with an area using a standard computer mouse. Feedback
was identical to the feedback presented during the study trials.
The study phase ended after 25 minutes and the second task commenced. This was a filler task, unrelated to the study, in which the participants made judgements about pictures for around 10 minutes.
After the participants finished the filler task, a final test was given in which they were tested on all the areas from the learning phase. The test used the same setup as the learning phase and worked exactly the same, apart from the schedule. The schedule during the final test consisted of 26 test trials, one for each area. At the end of the test, the participant was shown his or her test result.
Participants were then asked to fill in an evaluation form in which they rated, on a scale from 1 to 5, their motivation during learning, how difficult they perceived learning the brain areas to be, how much they liked the learning, and whether they would use the current program in the future. An open field for further comments was also provided. The experimenter stressed that filling in the form had no effect on the study credits that the participant would receive for the experiment.
The complete experiment, including setup and debriefing, lasted for about 50 minutes.
4.3.2 Results
One participant was excluded for not following the instructions. The data from 46 participants (12 male; average age 20.9 years; SD = 1.9) were used for further analysis. The participants had been randomly divided over the spacing method (n = 22; 5 male; average age 20.7; SD = 1.8) and the flashcard method (n = 24; 7 male; average age 21.0; SD = 2.0).
Figure 4.5 shows a box plot of the difference between the final test scores of the participants who learned with the spacing or the flashcard method. Contrary to our prediction, no significant difference was found (t = 0.44).
Figure 4.5 – Box plot of the difference in test scores between participants that learned with the spacing and flashcard method.
Despite the changes to the α adaptation in the model, the box plot in Figure 4.6 shows that
there were still areas with large α values at the end of the learning phase. This caused a large number
of repetitions of certain areas for some participants. Therefore areas with a large number of
repetitions (>34) were considered outliers and removed from the data (4%) for further analysis.
Figure 4.6 – Box plot of αs per area in the spacing condition at the end of the learning phase. The α started at 0.4 and adapted to the user
based on correctness and response times.
The effects of method, number of repetitions, motivation and difficulty on the probability of a correct response, P(correct), for each area during the final test were tested using binomial linear mixed-effect models with crossed, independent random effects¹. Method, number of repetitions, motivation and difficulty were entered as fixed factors, whereas area and participant were entered as random factors.
Fixed Effects on P(correct)
                                            β        p
Method (flashcard) x Total Repetitions    -0.24    <0.001 ***

Table 4.2 - Overview of the estimates (β) and p-values of the fixed factors entered in the binomial linear mixed-effect model (n = 1096).
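The mixed-effect models were fitted with the tools described by Baayen (2008). Purely to illustrate the structure of the model above, a comparable binomial model with crossed random intercepts for participant and area could be specified as in the sketch below; the column names, the file name and the use of the statsmodels library are assumptions, not the analysis code that was actually used.

import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# one row per area per participant on the final test (assumed column names)
data = pd.read_csv("final_test_trials.csv")

model = BinomialBayesMixedGLM.from_formula(
    "correct ~ method * total_repetitions + motivation + difficulty",  # fixed part
    vc_formulas={"participant": "0 + C(participant)",  # crossed random intercepts
                 "area": "0 + C(area)"},
    data=data,
)
result = model.fit_vb()   # approximate (variational Bayes) fit
print(result.summary())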
Figure 4.7 – The subjective effects of difficulty and motivation on P(Correct) per area on the final test. The values are calculated from the β-values provided by the binomial linear mixed-effect model.
Table 4.2 shows the results of this analysis. The flashcard condition (β = 3.10, p<0.001) and motivation (β = 0.75, p<0.01) showed an overall positive effect on P(correct), indicating that participants scored better with the flashcard method and scored higher when they were more motivated during the experiment. Difficulty showed an overall negative effect on P(correct) (β = -0.65, p<0.05), indicating that participants who experienced the learning phase as more difficult scored lower on the final test. The effects of the subjective measures difficulty and motivation are shown in Figure 4.7. The two measures were uncorrelated (r = -0.07). The inverse log of the β values for motivation shows a curvilinear effect of motivation on P(Correct): participants with low motivation already scored high, and as motivation increased, its additional influence on P(Correct) gradually decreased.

¹ Adding interactions between the current predictors, as well as adding new predictors such as 'exact exam delay', 'correctness on the last trial during the learning session', or subjective measures such as 'enjoyment while using the program' and 'willingness to use the program in the future', did not qualitatively improve the model.
Neither difficulty nor motivation differed significantly between the two methods, although a small trend suggested that motivation was higher after learning with the spacing method (t = 1.24, p = 0.22).
Although no beneficial main effect was found for the spacing method, the results show a positive effect for this condition in the interaction between method and number of repetitions. In the flashcard condition, increases in the number of repetitions per area had a negative effect on P(correct), while in the spacing condition the probability remained constant regardless of the number of rehearsals. This effect shows that the areas were sufficiently rehearsed in the spacing condition to keep retention at a constant level, whereas in the flashcard condition a larger number of repetitions indicated a more difficult area. This interaction is shown in Figure 4.8, which shows P(correct) on an area as a function of the total number of repetitions for both the flashcard and the spacing method. The grey lines indicate the density of the number of rehearsals per method. However, the main effect of method still indicated that the flashcard method was better, which can be explained by the large outliers in the spacing condition. These outliers received a large number of unnecessary repetitions and greatly reduced the effective learning time. As a result, participants received fewer repetitions per area in the spacing condition (n = 472; µ = 12.61; σ = 6.89) than in the flashcard condition (n = 624; µ = 13.69; σ = 3.91) (t = 3.04, p<0.01).
Figure 4.8 – Two plots showing the probability of a correct response on a brain area during the final test as a function of the total encodings during the learning phase for both the flashcard (left, green) and the spacing (right, blue) methods. The values are the inverse logarithm calculated from the β-values provided by the binomial mixed linear model analysis. The graph is plotted for encodings 1 to 20. The grey lines indicate the densities of the number of total encounters per area for both methods.
In a second analysis the effects of method, number of repetitions and difficulty on the response time for each area during the final test were tested using linear mixed-effect models with crossed, independent random effects. Method, number of repetitions and difficulty were entered as fixed factors, whereas area and participant were entered as random factors². Response times on incorrect trials (12.1%) were excluded from this analysis, leaving n = 963 observations. Table 4.3 shows the results of this analysis. An inverse, but qualitatively similar, pattern of results to P(correct) can be observed for the response times. The flashcard condition reduced response times (β = -1.09, p<0.01), and a nearly significant effect of difficulty indicated an increase in response times (β = 0.26, p = 0.07). Again a highly significant interaction was found between method and the number of repetitions: response times in the flashcard method gradually increased as the number of repetitions increased, while the response times in the spacing method remained constant. This effect also supports the conclusion that the areas were sufficiently rehearsed in the spacing condition to keep retention at a constant level. The interaction effect is shown in Figure 4.9, which shows the response time per area on the final test as a function of the total number of repetitions for both the spacing and the flashcard methods.

² Adding extra interactions, the exact exam delay, the correctness on the last trial during the learning session, or subjective measures such as motivation during learning, enjoyment while using the program and willingness to use the program in the future as extra predictors did not qualitatively improve the model.
Table 4.3 - Overview of the estimates (β), the upper and lower 95% Bayesian highest posterior density (HPD) confidence intervals, and p-values based on the MCMC posterior distribution (determined using pvals.fnc with 1000 samples; Baayen, 2008) of the fixed factors entered in the linear mixed-effect model (n = 963).
Figure 4.9 - A plot showing the response time on a brain area during the final test as a function of the total encodings during the learning phase for both the flashcard (blue) and the spacing (green) methods. The values are calculated from the β-values provided by the mixed linear model analysis. RT is plotted from 9 (end of the 1st quartile of observed total encodings) to 16 (end of the 3rd quartile of observed total encodings).
Figure 4.6 showed that there were still areas with high α values. The outliers that were removed from the analysis were repeated very often. Further analysis was done to investigate which areas caused problems with the α values in the spacing model. A subset of the areas contained multiple concepts (MC's). 'Anterior Cingulate Cortex', for example, indicates both a frontal position (anterior) and a name (Cingulate Cortex). We therefore hypothesized that MC areas could be composed of multiple chunks and/or require extra reasoning to determine the right position on the cross-section of the brain. The equations used in the spacing method model the decay of a single chunk. Extra memory processes during the retrieval of MC areas could therefore explain the problems in the α adjustment. To investigate whether MC areas indeed affected the results, an extra factor was added to code the presence of an MC area. Based on the area names, 9 of the 26 areas were coded as MC areas (see Appendix A). The MC areas corresponded remarkably well to the areas with high α values, as can be seen in Figure 4.10.
Figure 4.10 – Box plot of α values per area in the spacing condition at the end of the learning phase. The α started at 0.4 and adapted to
the user based on correctness and response times. Areas that were indicated as MC area are highlighted in orange.
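As a simple illustration of how this predictor could be added to the trial data, the sketch below flags MC areas by name; the file and column names are assumptions, and only the one MC area named above is listed explicitly (the full set of 9 is given in Appendix A).

import pandas as pd

# Areas whose names combine a position and a name were coded as MC areas.
mc_areas = {
    "Anterior Cingulate Cortex",
    # ... the remaining MC areas listed in Appendix A
}

data = pd.read_csv("final_test_trials.csv")            # assumed file/column names
data["mc"] = data["area"].isin(mc_areas).astype(int)   # 1 = multiple-concept area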
The addition of the predictor MC qualitatively improved both previously described mixed models for P(correct) and for the response times per area on the test. Tables 4.4 and 4.5 show the effects of adding MC to the models of P(Correct) and response times respectively. The presence of an MC brain area shows negative effects on performance by decreasing P(Correct) (β = -2.49, p<0.01) and increasing response time (β = 0.89, p<0.01). Furthermore, the interaction between the factor MC and the number of repetitions also shows negative effects on performance by decreasing P(Correct) (β = -0.15, p<0.05) and increasing response time (β = 0.07, p<0.01) as the number of rehearsals increases. In the case of P(Correct) the interaction effect arose in the flashcard method (β = -0.19, p<0.05). The addition of the extra interaction MC x Method x Total Repetitions did not explain enough variance in the model describing the effects on response time.
Interestingly, the addition of MC in the model with P(Correct) as dependent variable reduced the variance in the random effect of differences between areas from 1.25 to 0. This random effect adjusts the intercept for specific areas in the mixed linear model (Baayen et al., 2008). The reduction to 0 indicates that the addition of the predictor MC completely explained the random variability caused by area differences. In the model with response time as dependent variable, the variance for area was reduced by more than half (from 0.28 to 0.11). This indicates that although the predictor MC explained a large proportion of the variance caused by area differences, some unexplained variance still remained.
Because the new models explained more variance due to the presence of MC areas, the difference between the flashcard and spacing method decreased for both P(Correct) (β = 1.89, p<0.05) and response times (β = -1.00, p<0.01).
Random Effects on P(correct)
         Var    SD
Area      0      0

Fixed Effects on P(correct)
                                            β        p
Method (flashcard) x Total Repetitions    -0.12    0.051

Table 4.4 - Overview of the estimates (β) and p-values of the fixed factors entered in the binomial linear mixed-effect model (n = 1096).
Table 4.5 - Overview of the estimates (β), the upper and lower 95% Bayesian highest posterior density (HPD) confidence intervals, and p-values based on the MCMC posterior distribution (determined using pvals.fnc with 1000 samples; Baayen, 2008) of the fixed factors entered in the linear mixed-effect model (n = 963).
Further analysis was done to investigate whether any differences in P(correct) and response times during the final test between the spacing and flashcard method remained after removing the MC areas from the data. Tables 4.6 and 4.7 show the results of the previously used models for P(correct) (n = 741) and response times (n = 678) respectively, with the MC areas removed from the data. The results show that the difference between the flashcard and spacing method did not reach significance for either P(correct) (β = 1.52, p = 0.11) or response times (β = -0.5, p = 0.20). The effects of the predictors motivation (β = 0.75, p<0.01) and difficulty (β = -0.67, p<0.05) on P(correct), as well as of difficulty on response times (β = 0.23, p = 0.12), remained approximately the same.
Fixed Effects on P(correct)
                                            β       p
Method (flashcard) x Total Repetitions    -0.1    0.12

Table 4.6 - Overview of the estimates (β) and p-values of the fixed factors entered in the binomial linear mixed-effect model (n = 741).
Table 4.7 - Overview of the estimates (β), the upper and lower 95% Bayesian highest posterior density (HPD) confidence intervals, and p-values based on the MCMC posterior distribution (determined using pvals.fnc with 1000 samples; Baayen, 2008) of the fixed factors entered in the linear mixed-effect model (n = 678).
Finally, after the removal of the MC areas, both the final test score and the total number of learning trials were corrected per participant. To demonstrate that the reduced effective learning time in the spacing method indeed affected the total scores on the final test, the factors method, corrected number of learning trials and motivation were submitted to a linear model with the corrected test score as dependent variable. In this model, motivation (β = 0.74, p = 0.025) and the total number of trials in the spacing method (β = 0.03, p = 0.037) positively influenced performance on the final test. There was no significant difference between learning with either the spacing or the flashcard method. Thus, any differences between the final test scores could only be explained by differences in motivation and effective learning time (expressed in the corrected number of learning trials).
Fixed Effects on the corrected final test scores
        β    t    p

Table 4.7 - Overview of the estimates (β), t-values and p-values of the fixed factors entered in the linear model (n = 46).
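To illustrate the structure of this per-participant analysis, such a linear model could be specified as in the sketch below; the column and file names and the use of statsmodels are assumptions for illustration, not the analysis code actually used.

import pandas as pd
import statsmodels.formula.api as smf

# one row per participant: corrected test score, corrected number of learning
# trials, method (spacing vs. flashcard) and self-reported motivation
subjects = pd.read_csv("corrected_scores.csv")       # assumed file/column names

fit = smf.ols("corrected_score ~ method * corrected_trials + motivation",
              data=subjects).fit()
print(fit.summary())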
4.3.3 Discussion
In the reported experiment we tested whether a spacing method outperforms a flashcard method in
topographic learning. Unfortunately the experiment failed to find a beneficial effect for the spacing
method. However, the results showed that more repetitions of an area during the learning phase had
a negative effect on performance in the flashcard method by decreasing P(correct) and increasing
response time. For the spacing method P(correct) and response times remained constant, regardless
of how many repetitions per area were given to the participant during the learning phase. The
interaction between method and the number of repetitions therefore showed that the spacing method sufficiently rehearsed every area to maintain a constant performance on the final test across all areas, while the flashcard method did not.
One reason for the lack of a main effect between the methods could be indirectly related to
the problems in the α adjustment of the spacing method. Some areas received a very high α value
which resulted in a large number of unnecessary repetitions. This greatly reduced the effective
learning time of other areas in the spacing method. We hypothesized that the problems in the α
adjustment were generated by the presence of stimuli that contained multiple concepts (MC's). Since
the spacing algorithm specifically models the decay of single facts, this property is not satisfied by
MC stimuli. We therefore analyzed whether the presence of MC's affected the correctness and response times on the final test, because both these measures determine the adaptation in the spacing algorithm. The addition of MC in the statistical analysis indeed showed negative effects on the final test performance, by increasing response times and decreasing P(correct). Furthermore, the addition of MC's explained a large portion of the random variance caused by the differences between the to-be-learned brain areas. After removing the MC areas from the data, the analysis showed that the performance differences between the spacing and flashcard method per individual area did not reach significance. The removal of MC areas furthermore showed that only differences in learning time (expressed by the effective total number of learning trials) and motivation could explain the total scores on the final test.
Previous studies showed that spacing repetitions and implementing a spacing method in the learning schedule is beneficial for learning gains (Ebbinghaus, 1895; Brown & Huda, 1961; Bahrick & Phelps, 1987; Cepeda et al., 2006; Pavlik & Anderson, 2005; Pavlik & Anderson, 2008; Van Woudenberg, 2008; Van Thiel, 2010). After removing the MC areas, the current study could neither confirm nor deny whether a similar effect can be found for an implementation of the spacing method in a computer program that aids topographic learning. The meta-analysis by Cepeda et al. (2006) indicated that the combined results of 10 previous spacing experiments, in which spacing was compared to massed learning with a retention interval of 10 minutes to 1 day, also failed to reach significance. This could indicate that the final test in our study was conducted too early to find any effects between the methods. The studies mentioned above indicate that the effects of spacing are long-lasting, and it would be interesting for a future study to test the participants at a later point in time to find any long-term effects.
Furthermore, future studies should take the effects of MC's in the learning material into account, by either avoiding MC's and focusing on single facts, or by explicitly modelling MC's. If retrieving an MC fact from memory is considered as retrieving multiple facts from memory, each fact could be modelled separately. Subsequently, the model could judge which concept was incorrectly retrieved from memory based on the position where the participant clicked (e.g. responding to a posterior part while the program asked for an anterior part). Furthermore, single concepts could also be added as single facts (e.g. an interactive circle at the front of the cross-section that corresponds to anterior, and vice versa for posterior). The model could then adjust its parameters both for the area of the current trial and for the areas that are linked to it.
Another shortcoming of the current model for topographic learning is that it adjusts the α value only for the stimulus of the trial that is being assessed. However, when giving an incorrect answer, the participant not only responds incorrectly to the area of the current trial, but also incorrectly responds with another area. This indicates a failure to retrieve the correct memory trace as well as an incorrect retrieval of the memory trace of another area, given that both areas were encountered previously. The model should therefore adjust the α values of both memory traces to adjust their decay. As a result, this change should provide a faster repetition of both the area of the current trial and the area that was incorrectly responded to.
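A minimal sketch of this proposed change, assuming a dictionary of α values per area and the same fixed increment used for errors elsewhere in the model:

def penalize_confusion(alphas, target_area, chosen_area, seen, step=0.01):
    """Proposed extension: after an error, speed up the decay of both the area
    that was asked for and the area that was (wrongly) clicked, provided the
    latter has been encountered before. alphas maps areas to their current
    alpha value; seen is the set of areas already presented (assumed bookkeeping)."""
    alphas[target_area] += step            # failed retrieval of the correct trace
    if chosen_area != target_area and chosen_area in seen:
        alphas[chosen_area] += step        # wrongly retrieved trace decays faster too
    return alphas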
The topographical learning of stimuli from the current experiment is comparable to learning
other overviews that contain learning material from psychology, biology, geography and medicine.
Most of these topographical overviews contain categorical groups. The current experiment, for
example, contained several stimuli that were parts of the Cingulate Cortex. To aid the learning
process, the program should help the participant learn to recognize the brain areas categorically (e.g.
also include stimuli that summarize a subset of more specific facts). Multiple participants of the
current study recommended such a feature in a future release of the study program.
The WTK spacing model (including the changes by Nijboer, 2011), on which this study was based, predicted the response time based on the time it takes participants to initiate the first key press of their typed answer. The current implementation required participants to find the indicated area, while Fitts' Law was used to model the extra movement time. To avoid problems with incorrect estimations of movement time, the learning of topographical facts could also be reversed by asking the participant to type in the area name that corresponds to an indicated area position. This setup would also be more similar to the more successful setup used to investigate word pair learning.
5 Final discussion
The questions addressed in this study were how repeated spaced testing improves learning and how a model containing this method could improve topographical learning, such as learning the locations of Dutch cities. It was hypothesized that repeated spaced testing of the learning material improves learning because this method is able to increase the retrieval effort of mentally stored facts. The first experiment in this study manipulated both the number of repetitions and the time between repetitions (ISI) and showed that pupil dilation (associated with mental effort) increased when the ISI increased and decreased with every new repetition. This finding indicates that spacing repetitions increases retrieval effort. Learning gains associated with spacing are therefore explained by increased retrieval effort during successful retrievals.
Previous studies indicated that an implementation of such a spacing model improves learning
gains for word-pair learning. The current study continued the work on the spacing model that has
been created by Van Woudenberg (2008), Van Thiel (2010), Koelewijn (2010) and Nijboer (2011) by
adapting it to topographical learning. The second experiment in this study investigated the learning
gains of this topographical spacing model by comparing it to a flashcard method. Unfortunately, no
difference could be found between the performance measures of the two methods. However, problems with the stimuli were identified: the presence of multiple concepts in the stimuli was shown to have significant effects on the behavioural results during the final test.
The spacing method did sufficiently rehearse every area to maintain a constant performance on the final test across all areas, while the flashcard method did not. Based on this result, and provided the known issues with the stimuli, the retention interval and the model are addressed, the spacing method still has potential for topographical learning. Future experiments might therefore show the learning gains that were expected. An adjusted spacing method could then show that the model is suitable for more learning contexts than word pair learning alone. It would then offer wider implementation opportunities for learning software that adjusts to individual differences. This is a very important goal for teachers and a great aid to their classroom instruction, because adjusting to individual differences is harder to accomplish when teaching a full classroom (Suhre & Harskamp, 2002). Furthermore, the learning efficiency provided by the method also offers a cost-effective alternative for companies and government agencies, because the learner is able to learn more in less time.
The results of the first experiment showed that pupil dilation was able to predict the probability of a correct response on the next trial. Although measuring pupil dilation is still cumbersome, the measure could also aid instruction at special schools. Students at special schools follow a strongly individualized learning program, because their physical handicaps or diseases (e.g. Duchenne muscular dystrophy) cause a great variability in disabilities. Their physical handicap often impairs their communication and/or writing skills. The current study indicates that using pupil dilation as a measure of effort during learning could be a viable option to help develop customized learning software for this target group.
6 References
Anderson, J. R., Bothell, D., Lebiere, C., & Matessa, M. (1998). An integrated theory of list memory.
Journal of Memory and Language, 38, 341-380.
Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological
Science, 2(6), 396.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control
processes. In K. W. Spence, & J. T. Spence (Eds.), The psychology of learning and motivation:
Advances in research and theory (pp. 89-195). New York: Academic.
Atkinson, R. C., & Shiffrin, R. M. (1971). The control of short-term memory. Scientific American, 224,
82-89.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390-412.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York: Academic Press.
Bahrick, H. P. (1979). Maintenance of knowledge: Questions about memory we forgot to ask. Journal
of Experimental Psychology: General, 108(3), 296-308.
Bahrick, H. P., & Phelps, E. (1987). Retention of Spanish vocabulary over 8 years. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(2), 344-349.
Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing
resources. Psychological Bulletin, 91(2), 276-292.
Beatty, J., & Lucero-Wagoner, B. (2000). The pupillary system. In J. T. Cacioppo, L. G. Tassinary & G.
G. Berntson (Eds.), Handbook of psychophysiology (2nd ed., pp. 142-162). New York:
Cambridge University Press.
Bradley, M. M., Miccoli, L., Escrig, M. A., & Lang, P. J. (2008). The pupil as a measure of emotional
arousal and autonomic activation. Psychophysiology, 45(4), 602-607.
Briesemeister, B. B., Hofmann, M. J., Tamm, S., Kuchinke, L., Braun, M., & Jacobs, A. M. (2009). The
pseudohomophone effect: Evidence for an orthography-phonology-conflict. Neuroscience
Letters, 455(2), 124-128.
Brown, J. (1958). Some tests of the decay theory of immediate memory. The Quarterly Journal of
Experimental Psychology, 10(1), 12-21.
Brown, J., & Huda, M. (1961). Response latencies produced by massed and spaced learning of a
paired-associates list. Journal of Experimental Psychology, 61(5), 360-364.
Buckner, R. L., Koutstaal, W., Schacter, D. L., Wagner, A. D., & Rosen, B. R. (1998). Functional-anatomic study of episodic retrieval using fMRI: I. Retrieval effort versus retrieval success. NeuroImage, 7(3), 151-162.
Byrnes, J. P. (2000). Cognitive development and learning in instructional contexts (2nd ed.) Allyn and
Bacon Boston.
Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The effects of tests on learning and
forgetting. Memory & Cognition, 36(2), 438-448.
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20(6),
633-642.
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., & Rohrer, D. (2006). Distributed practice in verbal
recall tasks: A review and quantitative synthesis. Psychological Bulletin, 132(3), 354.
Conati, C., & Merten, C. (2007). Eye-tracking for user modeling in exploratory learning environments:
An empirical evaluation. Knowledge-Based Systems, 20(6), 557-574.
Cornelissen, F. W., Peters, E. M., & Palmer, J. (2002). The eyelink toolbox: Eye tracking with MATLAB
and the psychophysics toolbox. Behavior Research Methods, Instruments, & Computers, 34(4),
613-617.
Dempster, F. N. (1996). Distributing and managing the conditions of encoding and practice. In E. L.
Bjork, & R. A. Bjork (Eds.), Memory, handbook of perception and cognition (2nd ed., pp. 317-
344). San Diego, US: Academic Press.
Farrell, S., & Lewandowsky, S. (2002). An endogenous distributed model of ordering in serial recall.
Psychonomic Bulletin & Review, 9(1), 59-79.
Fish, S. C., & Granholm, E. (2008). Easier tasks can have higher processing loads: Task difficulty and
cognitive resource limitations in schizophrenia. Journal of Abnormal Psychology, 117(2), 355-
363.
Gardiner, F. M., Craik, F. I. M., & Bleasdale, F. A. (1973). Retrieval difficulty and subsequent recall.
Memory & Cognition, 1(3), 213-216.
Gates, A. I. (1917). Recitation as a factor in memorizing. Archives of Psychology, 6(40).
Gilzenrat, M. S., Cohen, J. D., Rajkowski, J., & Aston-Jones, G. (2003). Pupil dynamics predict changes
in task engagement mediated by locus coeruleus. Society for Neuroscience Abstracts, Program,
Gilzenrat, M. S., Nieuwenhuis, S., Jepma, M., & Cohen, J. D. (2010). Pupil diameter tracks changes in
control state predicted by the adaptive gain theory of locus coeruleus function. Cognitive,
Affective, & Behavioral Neuroscience, 10(2), 252-269.
Glanzer, M., & Cunitz, A. R. (1966). Two storage mechanisms in free recall. Journal of Verbal Learning and Verbal Behavior, 5(4), 351-360.
Glover, J. A. (1989). The "testing" phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81(3), 392-399.
Granholm, E., Asarnow, R. F., Sarkin, A. J., & Dykes, K. L. (1996). Pupillary responses index cognitive
resource limitations. Psychophysiology, 33(4), 457-461.
Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. Wiley, New York.
Heinrich, W. (1896). Die aufmerksamkeit und die funktion der sinnesorgane. Zeitschrift Für
Psychologie Und Physiologie Der Sinnesorgane, 9, 342-388.
Hess, E. H., & Polt, J. M. (1964). Pupil size in relation to mental activity during simple problem-solving.
Science, 143(3611), 1190-1192.
Iqbal, S. T., Zheng, X. S., & Bailey, B. P. (2004). Task-evoked pupillary response to mental workload in
human-computer interaction. CHI'04 Extended Abstracts on Human Factors in Computing
Systems, 1477-1480.
Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering
a solution. Journal of Verbal Learning and Verbal Behavior, 17(6), 649-667.
Jacoby, L. L., Shimizu, Y., Daniels, K. A., & Rhodes, M. G. (2005). Modes of cognitive control in
recognition and source memory: Depth of retrieval. Psychonomic Bulletin & Review, 12(5), 852-
857.
Jolicœur, P., & Dell'Acqua, R. (1998). The demonstration of short-term consolidation. Cognitive
Psychology, 36(2), 138-202.
Jones, N. P., Siegle, G. J., Muelly, E. R., Haggerty, A., & Ghinassi, F. (2010). Poor performance on
cognitive tasks in depression: Doing too much or not enough? Cognitive, Affective, & Behavioral
Neuroscience, 10(1), 129-140.
Kahneman, D., & Beatty, J. (1966). Pupil diameter and load on memory. Science, 154(3756), 1583-
1585.
Karpicke, J. D., & Roediger, H. L. (2008). The critical importance of retrieval for learning. Science,
319(5865), 966.
Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., & Broussard, C. (2007). What’s new in
psychtoolbox-3? [Abstract]. Perception, 36 ECVP Abstract Supplement.
Koelewijn, L. (2010). Optimizing fact learning gains: Using personal parameter settings to improve the
learning schedule. (Unpublished MSc). University of Groningen, Groningen.
Krinsky, R., & Nelson, T. O. (1981). Task difficulty and pupillary dilation during incidental learning.
Journal of Experimental Psychology: Human Learning and Memory, 7(4), 293-298.
Magliero, A. (1983). Pupil dilations following pairs of identical and related to-be-remembered words.
Memory & Cognition, 11(6), 609-615.
McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in
the classroom. European Journal of Cognitive Psychology, 19(4), 494-513.
McDaniel, M. A., Einstein, G. O., Dunay, P. K., & Cobb, R. E. (1986). Encoding difficulty and memory:
Toward a unifying theory. Journal of Memory and Language, 25(6), 645-656.
McIntyre, S. H., & Munson, J. M. (2008). Exploring cramming. Journal of Marketing Education, 30(3),
226.
Murdock, B. B. (1962). Direction of recall in short-term memory. Journal of Verbal Learning & Verbal
Behavior, 1, 119-124.
Nijboer, M. (2011). Optimal fact learning: Applying presentation scheduling to realistic conditions.
(Unpublished MSc). University of Groningen, Groningen.
Pavlik, P. I., & Anderson, J. R. (2005). Practice and forgetting effects on vocabulary memory: An
activation-based model of the spacing effect. Cognitive Science, 29(4), 559-586.
Pavlik, P. I., & Anderson, J. R. (2008). Using a model to compute the optimal schedule of practice.
Journal of Experimental Psychology: Applied, 14(2), 101-117.
Peterson, L., & Peterson, M. J. (1959). Short-term retention of individual verbal items. Journal of
Experimental Psychology, 58(3), 193-198.
Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty
correctly recalling information lead to higher levels of memory? Journal of Memory and
Language, 60(4), 437-447.
Raaijmakers, J. G. W. (2003). Spacing and repetition effects in human memory: Application of the
SAM model. Cognitive Science, 27(3), 431-452.
Rajkowski, J., Kubiak, P., & Aston-Jones, G. (1994). Locus coeruleus activity in monkey: Phasic and
tonic changes are associated with altered vigilance. Brain Research Bulletin, 35(5-6), 607-616.
Roediger, H. L., & Karpicke, J. D. (2006). The power of testing memory: Basic research and
implications for educational practice. Perspectives on Psychological Science, 1(3), 181-210.
Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of Experimental Psychology,
89(1), 63.
Schultheis, H., & Jameson, A. (2004). Assessing cognitive load in adaptive hypermedia systems:
Physiological and behavioral methods. Adaptive Hypermedia and Adaptive Web-Based Systems,
225-234.
Schiff, J. M., & Fao, P. (1874). La pupille considérée comme esthésiomètre. Marseille Médical, 2, 736-
741.
Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory.
Psychological Review, 76(2), 179-193.
Stanners, R. F., Meunier, G. F., & Headley, D. B. (1969). Reaction time as an index of rehearsal in
short-term memory. Journal of Experimental Psychology, 82(3), 566-570.
Suhre, C., & Harskamp, E. (2002). Praktijkbrochure ICT in het onderwijs: Een onderzoek naar de mogelijkheden van het ICT-gebruik in het basis- en voortgezet onderwijs [Practical brochure on ICT in education: A study of the possibilities of ICT use in primary and secondary education]. Groningen: GION, Gronings Instituut voor Onderzoek van Onderwijs, Opvoeding en Ontwikkeling, Rijksuniversiteit Groningen.
Thompson, C. P., Wenger, S. K., & Bartling, C. A. (1978). How recall facilitates subsequent recall: A
reappraisal. Journal of Experimental Psychology: Human Learning and Memory, 4(3), 210-221.
Van Rijn, H., Van Maanen, L., & Van Woudenberg, M. (2010). Spacing-based optimization for short-session fact learning. Manuscript under revision.
Van Thiel, W. (2010). Optimize learning with reaction time based spacing: By modifying the order of
items in a learning session. (Unpublished MSc). University of Groningen, Groningen.
Van Woudenberg, M. (2008). Optimal word pair learning in the short term: Using an activation based
spacing model. (Unpublished MSc). University of Groningen, Groningen.
Wheeler, M., Ewers, M., & Buonanno, J. (2003). Different rates of forgetting following study versus
test trials. Memory, 11(6), 571-580.
7 Appendix A - Stimuli
Stimuli

Name                        Brodmann Area    Marked as MC area
Premotor Cortex                  6
Orbitofrontal Cortex            11
Visual Cortex                   17
Subgenual Cortex                25
Ectosplenial Cortex             26
Piriform Cortex                 27
Agranular Retrolimbic           30
Perirhinal Cortex               35
Parahippocampal Cortex          36
Fusiform Gyrus                  37
Temporopolar Area               38
44