Article
EMPIRICAL STUDY
(a) University of Birmingham; (b) University of Suffolk
Abstract: Since its first adoption as a computational model for language learning, ev-
idence has accumulated that Rescorla–Wagner error-correction learning (Rescorla &
Wagner, 1972) captures several aspects of language processing. Whereas previous stud-
ies have provided general support for the Rescorla–Wagner rule by using it to explain
the behavior of participants across a range of tasks, we focus on testing predictions gen-
erated by the model in a controlled natural language learning task and model the data
at the level of the individual learner. By adjusting the parameters of the model to fit the
trial-by-trial behavioral choices of participants, rather than fitting a one-size-fits-all model
using a single set of default parameters, we show that the model accurately captures
participants’ choices, response latencies, and levels of response agreement. We also show that
gender and working memory capacity affect the extent to which the Rescorla–Wagner
model captures language learning.
This is an open access article under the terms of the Creative Commons Attribution License,
which permits use, distribution and reproduction in any medium, provided the original work is
properly cited.
Introduction
We humans share with other species many core learning mechanisms that
allow us to adapt to our environment (Rescorla, 1988). These mechanisms
include, among others, classical conditioning (i.e., Pavlovian conditioning;
Pavlov, 1927), instrumental conditioning (also operant conditioning; Skinner,
1938), and forms of social learning, such as vicarious learning (Bandura,
1962). Arguably, the learning ability that most distinctly defines humans is
language learning, which also involves efficient transgenerational transmission
and is foundational for social inclusion and cohesion. However, whereas core
learning mechanisms are relatively well understood, language learning remains
much of a mystery (Ambridge & Lieven, 2011). An early attempt by Skinner
(1957) to account for language learning using the same principles as those gov-
erning lower-level cognitive tasks was quashed by Chomsky (1959). For much
of the remainder of the 20th century, language was seen as a by-and-large innate
system, governed by rules and handled by a uniquely human and specialized
cognitive structure. This structure was initially conceptualized as the language
acquisition device, and later extended to become universal grammar.
This dominant view was challenged from two sides simultaneously. The
emergence of usage-based linguistics in the 1980s (Langacker, 1987) promoted
a view of language as a dynamic and probabilistic system, resulting from
general cognitive capacities acting on language input (Dąbrowska & Divjak,
2015). This view meshed well with connectionist frameworks, which showed
that rulelike behavior can emerge from exposure to usage alone and that lan-
guage knowledge is sensitive to properties of the input (Plaut & Gonnerman,
2000; Seidenberg & McClelland, 1989). Connectionism, arguably, paved the
way for changes in theorizing too, toward a view of language as being learned
like any other skill, and the early 2000s witnessed the start of a reintegration
of the basic principles of learning into the body of work on language (e.g., see
Bybee & McClelland, 2005; for more recent work, see Ellis et al., 2016,
which addresses both first and second language learning, as well as Chuang
et al., 2021, which addresses lexical acquisition in second and third languages).
Language was now seen as being amenable to the same general-purpose
cognitive capacities and learning mechanisms that humans and animals use
to navigate and adapt to their environment (cf. Ellis, 2006a; Ellis & Sagarra,
2010, 2011; Sturdy & Nicoladis, 2017).
Among these learning mechanisms, Rescorla and Wagner’s (1972) model of
classical conditioning stands out for its simplicity and its ability to explain a
range of empirical learning phenomena (Siegel & Allan, 1996). This model
has been shown to be biologically plausible (Chen et al., 2008) and to have an
evolutionary advantage over other, more powerful learning mechanisms, in the
sense that it is more likely to be naturally selected and to persist in the
evolutionary process (for more details, see Trimmer et al., 2012).
Background Literature
The Rescorla–Wagner Model
As a model of classical conditioning, the Rescorla–Wagner (R–W) model is
concerned with situations where an entity (a human, an animal, or a machine)
has to learn the predictive relationship between objects and/or events (i.e., cues
and outcomes) in an environment, and where cues compete for their predictive
value for an outcome while iteratively (re)calibrating the learning (or associa-
tion) weights. More specifically, an association weight reflects the tendency of
an outcome to occur in the presence of a certain cue. A higher positive associ-
ation weight value for a particular outcome corresponds to a higher likelihood
of occurrence of that outcome in the presence of the cue; conversely, a highly
negative value corresponds to a greater likelihood of nonoccurrence of that
outcome (the cue is said to be inhibitory in this case). Values close to zero
indicate that the cue is only weakly predictive of the occurrence (if the weight
is positive) or the nonoccurrence (if the weight is negative) of the outcome.
The R–W model assumes that the organism applies a simple error-correcting
learning rule to update the association weights in each new learning event
(e.g., each trial in a behavioral experiment). The general idea behind the
correction rule is that the association between a cue and an outcome is
(a) strengthened if both cue and outcome are present in the learning event,
(b) weakened if the cue is present but the outcome is not, and (c) left unchanged
if the cue itself is absent. The updating of the association weights is driven by
the discrepancy (the prediction error) between the expected and the obtained
outcome: The magnitude of the update (how much the association weights are
adjusted) is determined by the size of this discrepancy, scaled by two parameters
called learning rates, and the direction of the update (whether it increases or
decreases the weight) depends on the sign of the difference between the
expected and the observed outcome. Most broadly, then, learning in the R–W
model is about the outcomes, and this sets it apart from related models in which
learning is about the input cues (e.g., Pearce & Hall, 1980).
Another feature of the R–W model is that, although the outcomes are up-
dated independently from each other, input cues compete for the predictivity
of outcomes. In other words, the adjustment of the weights depends not only
on the single cue being updated but on all the cues present in the learning
event through their sum of association weights. This cue competition principle
is what allowed the R–W model to explain many of the puzzling phenomena
of classical conditioning, some of which were also shown to be valuable for
understanding the mechanics of language learning (see the next section for a
discussion).1 One of the best-known examples of such learning phenomena is
the blocking effect (Kamin, 1969). This effect occurs when a cue is trained in
compound with a second cue that is already a good predictor of the outcome.
In such cases, the first cue cannot form a strong association with the outcome
(i.e., the first cue is blocked by the second cue), because the outcome is already
well predicted and little prediction error remains to drive learning. More
generally, the cue competition principle often results in the best cues for an
outcome preventing other cues from developing a strong association with that
same outcome.
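To make cue competition concrete, the following Python sketch simulates the classic blocking design with the R–W update for a single outcome. The cue names `A` and `B`, the combined learning rate `alpha_beta = 0.1`, λ = 1, and the 50 trials per phase are illustrative assumptions, not values from this study.

```python
def rw_update(weights, cues, outcome_present, alpha_beta=0.1, lam=1.0):
    """One Rescorla-Wagner update of the present cues' weights to a single outcome."""
    # Cues compete: the prediction error uses the summed weight of all present cues
    prediction = sum(weights.get(c, 0.0) for c in cues)
    delta = (lam if outcome_present else 0.0) - prediction
    for c in cues:  # absent cues are left unchanged
        weights[c] = weights.get(c, 0.0) + alpha_beta * delta
    return weights


def run(pretrain_a, n_trials=50):
    weights = {}
    if pretrain_a:
        for _ in range(n_trials):
            rw_update(weights, {"A"}, True)       # phase 1: A alone -> outcome
    for _ in range(n_trials):
        rw_update(weights, {"A", "B"}, True)      # phase 2: A + B -> outcome
    return weights


blocked = run(pretrain_a=True)
control = run(pretrain_a=False)
print(f"B after pretraining A: {blocked['B']:.3f}; B in control: {control['B']:.3f}")
```

With these settings, `B` ends up with a weight of roughly 0.003 after pretraining on `A`, against roughly 0.5 in the control condition: because `A` already predicts the outcome, the shared prediction error in the compound phase is close to zero and leaves little for `B` to learn.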
Method
Participants
Sixty-six participants (Mdn age = 20 years; range = 18–65; 41 females) took
part in the experiment in exchange for a £7 Amazon voucher. Participants were
university students and staff. All of them were native English speakers without
knowledge of Polish or any other Slavic languages, had normal or corrected-
to-normal hearing and vision, and did not declare any learning disabilities.
Participants had different educational backgrounds, and many of them could
speak other languages in addition to English (the distributions of education
and language backgrounds are presented in Appendix S1 in the Supporting
Information online).
Stimuli
Each event in our learning task consisted of a scene that represented a joint
action performed by a group of human and/or animal characters, and for each
learning event, participants saw a picture that depicted the scene (Figure 1),
along with an audio recording of a Polish clause describing it. A new trial
started with a fixation dot shown at the center of the screen for about
500 ms, followed by the display of the picture of the scene. Participants heard
the audio recording of the clause describing the scene 250 ms after the onset
of the picture, while the picture remained on display. The next trial was
presented after about 1 s.
We used the verb chodzić (“walk”), with its two possible plural past tense
forms chodziły (nonmasculine plural form) and chodzili (masculine plural
form), as the common action in all learning events. An example of a clause
heard by participants is Chłopiec i kaczka chodzili (“The boy and the duck
were walking”). The first three columns in Table 1 provide a list of all charac-
ters used in the experiment, along with their linguistic categories in terms of
gender and animacy; the last two columns concern the design of the task and
will become relevant in the next section.
The images representing the different human and animal characters were
extracted from Adobe Stock (https://stock.adobe.com) and then edited using
Adobe Photoshop CC 2018. The audio recordings of both the character labels
and the two verb forms were prepared using the speech synthesizer software
Speech2Go (Harpo Software, 2018).
Table 1 Information about the characters used in the language learning task
Design
First, participants were taught the Polish labels of the different animal and
human characters used in the learning task. Specifically, participants were
presented with the images of all the characters along with their correspond-
ing labels, first individually and then in combination, as they appear later in
the learning task (e.g., a dog; a boy, a dog, and a monkey). There were eight
such character combinations, and participants were required to remember at
least seven of them (i.e., to reach a retention accuracy of 87.5%) before they
could proceed to the main task (see Appendix S2 in the Supporting Informa-
tion online for more details). Participants were allowed up to 10 attempts to
reach the required accuracy level.
The main task consisted of a training and a test phase. The design of the
training part of the task is summarized in Table 2. The task contained 12 cues
and two outcomes. The “+” sign indicates that the cues were presented in
compound, and the arrow symbol “→” indicates that the outcome on the right-
hand side followed the cues. Thus, for example, “FP1 + FP2 + FP3 → np”
represents a clause such as Dziewczyna, kobieta i babcia chodziły (“The girl,
the woman, and the grandma were walking”), where the subject of the clause is
made up of three female characters and the verb is in the nonmasculine plural
(np) past form, as opposed to the masculine plural (mp) past form. There were
two training blocks, each containing four events that were each repeated
15 times. The order of the events was fully randomized within each block. The
events in the first block were composed of cue pairs, whereas those in the
second block were composed of cue triples.
We structured our task in this way to create blocking-like effects as usually
seen in Pavlovian learning experiments. For example, the addition of cues FA3
and FP3 to the compounds “FA1 + FA2” and “FP1 + FP2,” respectively, in
the second block should reduce the association strength that can be gained by
FA3 and FP3 for outcome np. Likewise, training MP1 and MP2 with outcome
mp in the first block should block FA4 from acquiring a positive association
with mp. Besides predicting that FA4 could get blocked, we also predicted that
it could become inhibitory for mp, that is, gain a negative association weight
with mp, as will be seen when we present the model fit simulation results.4 We
thus refer to FA3 and FP3 as blocked cues, and to FA4 as an inhibitory blocked
cue.
We categorized the cues into seven categories based on their linguistic
properties and the blocking-like effects predicted for them (see the rightmost
column in Table 1). Specifically, the seven categories were based on whether
the cue is masculine or feminine, whether it is personal or animate, whether it
is predicted to be blocked or unblocked, and whether it is predicted to be an
inhibitory blocked cue. The similarity between the cues within each of these
categories is reinforced by the fact that they share the same association weights
with each outcome, according to the R–W theory, as will be shown in the Re-
sults section on learned noun–verb form association weights.
After training, participants moved on to the test phase, which consisted
of two components. By using a randomly selected cue from each category,
we tested learning once on all possible pairs combining either cues from the same
cue category (e.g., FP1 + FP2 from the uFP group) or cues from different cate-
gories (e.g., MA1 + FP3 from the uMA and bFP groups). We also included the
four combinations consisting of cue triples presented in the training phase as
a sanity check for participants’ recall (these combinations were excluded from
our main analyses). Overall, in the test phase, each learner encountered in total
29 cue combinations, which were randomly selected from a total of 70 possible
cue combinations. (The exact format and instructions used while administer-
ing the task are provided in Appendix S2 in the Supporting Information online,
and the list of all test cue combinations is provided in Appendix S3.)
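The counts of 29 and 70 items can be reproduced under one assumed assignment of the 12 cues to the seven categories, sketched below in Python. Only the labels uFP, bFP, and uMA appear in the text; the other labels (uFA, bFA, ibFA, uMP) and the cue-to-category mapping are hypothetical stand-ins for Table 1. Under this mapping, 21 cross-category pairs plus 4 within-category pairs (one for each category with at least two cues), plus the 4 training triples, give 29 items per learner, and the 66 possible cue pairs plus the same 4 triples give 70.

```python
import itertools
import random

# Hypothetical cue-to-category mapping (assumption; the real one is in Table 1)
CATEGORIES = {
    "uFA": ["FA1", "FA2"],         # unblocked feminine animate
    "bFA": ["FA3"],                # blocked feminine animate
    "ibFA": ["FA4"],               # inhibitory blocked feminine animate
    "uFP": ["FP1", "FP2"],         # unblocked feminine personal
    "bFP": ["FP3"],                # blocked feminine personal
    "uMP": ["MP1", "MP2"],         # unblocked masculine personal
    "uMA": ["MA1", "MA2", "MA3"],  # unblocked masculine animate
}
N_TRAINING_TRIPLES = 4  # training triples re-tested as a recall check


def category_pairs(categories):
    """All unordered category pairs; a same-category pair needs at least two cues."""
    names = list(categories)
    cross = list(itertools.combinations(names, 2))
    within = [(name, name) for name in names if len(categories[name]) >= 2]
    return cross + within


def sample_test_pairs(categories, rng):
    """Instantiate each category pair with randomly chosen, distinct cues."""
    pairs = []
    for a, b in category_pairs(categories):
        if a == b:
            pairs.append(tuple(rng.sample(categories[a], 2)))
        else:
            pairs.append((rng.choice(categories[a]), rng.choice(categories[b])))
    return pairs


all_cues = [cue for cues in CATEGORIES.values() for cue in cues]
n_items_per_learner = len(category_pairs(CATEGORIES)) + N_TRAINING_TRIPLES
n_possible_items = len(list(itertools.combinations(all_cues, 2))) + N_TRAINING_TRIPLES
print(n_items_per_learner, n_possible_items)  # 29 70
test_items = sample_test_pairs(CATEGORIES, random.Random(1))
```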
Finally, let us return to the question of why we adopted Kiełkiewicz-
Janowiak and Pawelczyk’s (2014) rule, whereby any subject combination that
contains a masculine referent takes the masculine personal plural form. First,
having the combination “MA1 + MA2 + MA3” associated with “mp” rather
than “np” made it possible to have a balanced number of mp and np events both
within the full task and within each block. This reduced the likelihood of any
bias towards np emerging purely due to the design. Second, this allowed us to
have more challenging combinations that better probe participants’ learning,
notably combinations intermixing feminine and masculine cues.
Analysis
From the learning task, data from three participants were discarded because
they persistently chose the same response across the test phase (27 or more
out of 29 responses; i.e., rate > 93%).5 To analyze participants’ choices and
response times, we used generalized mixed-effects modeling. The data con-
tained repeated measurements from the same participants and items on multi-
ple trials, hence we added random effects for both participants and items (i.e.,
cue combinations in the test phase). We selected the random effects structure
of the models by using a top-down strategy starting with all random intercepts
and slopes and then removing higher-order random effects step by step based
on Akaike information criterion scores. We ran the mixed-effects models in
R (R Core Team, 2019) using the lme4 package; the p values were obtained
using the lmerTest package based on Satterthwaite’s approximations, and the
model summary tables were generated using the sjPlot package. To determine
statistical significance, we used an alpha level of .05. In the analysis of
(2021). In each trial, participants were asked to retain a list of digits (between
1 and 9) presented one at a time. Each digit presentation lasted for 1 s and was
followed by a simple mathematical operation that could be either correct or
incorrect (50% of the mathematical operations were correct). Participants had
to verify the veracity of the mathematical operation before the next digit could
be displayed. At the end of each trial, they had to type in the digits in the same
order in which they had been presented to them. The length of the digit lists
increased gradually from two to eight, with each length repeated three times.
The task, thus, consisted of 21 trials.
Analysis
We calculated each participant’s WM span by first summing the number of
correct items they recalled in the correct order and then z-transforming the ob-
tained score. We excluded one participant whose WM score was discontinuous
from the rest of the sample (their WM score was −4.3 standard deviations from
the mean, whereas the second furthest WM score was −1.8 standard deviations
from the mean).
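A minimal Python sketch of this scoring follows, assuming that "correct items in the correct order" means digits recalled in the same serial position as presented (the exact scoring convention is an assumption here). The function names and the input layout, a list of (presented, recalled) digit lists per participant, are hypothetical. Outlier exclusion is left to inspection, as in the text, rather than applied with a fixed cutoff.

```python
import statistics


def serial_recall_score(presented, recalled):
    """Digits recalled in their original serial position (assumed scoring).
    zip() stops at the shorter list, so omitted digits simply score 0."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def wm_z_scores(trials_by_participant):
    """Sum each participant's scores over trials, then z-transform across participants."""
    raw = {
        pid: sum(serial_recall_score(p, r) for p, r in trials)
        for pid, trials in trials_by_participant.items()
    }
    scores = list(raw.values())
    mean, sd = statistics.mean(scores), statistics.stdev(scores)
    return {pid: (score - mean) / sd for pid, score in raw.items()}


# Toy input for illustration only (not study data)
data = {
    "p1": [([4, 7], [4, 7])],
    "p2": [([4, 7], [4, 7]), ([1, 9], [1, 9])],
    "p3": [([4, 7, 2], [4, 7, 2]), ([1, 9, 5], [1, 9, 5])],
}
z = wm_z_scores(data)
print(z)  # {'p1': -1.0, 'p2': 0.0, 'p3': 1.0}
```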
Computational Modeling
The Rescorla–Wagner Equations
The R–W model (Rescorla & Wagner, 1972) describes computationally how
the associations between cues and outcomes are established. In the context of
our experiment, a cue is the Polish label and image of one of the human or an-
imal characters appearing in the scene on a given trial, and an outcome is the
verb form describing their common action. For example, the clause Chłopiec,
mężczyzna i małpa chodzili (“The boy, the man, and the monkey were walk-
ing”) has as cues chłopiec, mężczyzna, and małpa, and as outcome chodzili. In
our case, the association weight (or strength) measures the tendency of a verb
form to occur in the presence of a certain noun.
After encountering a clause, the learner updates the association weight between
a cue $c_i$ and an outcome $o$, depending on whether the cue and the outcome
appear in the clause, using a delta-type correction rule:

$$
w_t(c_i, o) = w_{t-1}(c_i, o) + \alpha\beta\,\delta_t,
$$

where

$$
\delta_t =
\begin{cases}
0, & \text{if } c_i \text{ is absent},\\
\lambda - \sum_{c_j \text{ present}} w_{t-1}(c_j, o), & \text{if } c_i \text{ and } o \text{ are present},\\
0 - \sum_{c_j \text{ present}} w_{t-1}(c_j, o), & \text{if } c_i \text{ is present and } o \text{ is absent}.
\end{cases}
$$

The subscript $t$ refers to the present trial; thus, $w_t(c_i, o)$ is the association
strength between $c_i$ and $o$ at trial $t$. $\alpha$ and $\beta$ denote the learning rates
for the cue $c_i$ and the outcome $o$, respectively. $\lambda$ refers to the maximum
associability of an outcome (the asymptote of learning) and is almost always set to 1.
Based on the equation, three cases determine how an association weight is
adjusted:
1. If the cue is absent, we make no adjustment to the weight.
2. If both the cue and outcome are present, then this provides positive evi-
dence that should strengthen the association weight, and the sum of the
weights of the cues present in the current event is adjusted towards the
maximum associability value.
3. If the cue is present but the outcome is not observed, then this provides
negative evidence that should weaken the association weight, and the
sum of weights is adjusted towards 0.
For the implementation of the model, we used the package that was developed
as part of the study by Milin et al. (2020).
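The three cases can be illustrated with a short Python sketch of the equation above. It is not the package by Milin et al. (2020), only an independent illustration under stated assumptions: the learning rates (α = 0.1 for every cue, β = 0.5 for every outcome) are hypothetical, and each of the two verb forms is treated as a separate outcome whose weights are updated from the summed prediction computed before any weight changes on that trial.

```python
from collections import defaultdict

OUTCOMES = ("chodzili", "chodziły")  # masculine plural (mp), nonmasculine plural (np)


def rw_trial(weights, cues, observed, alpha, beta, lam=1.0):
    """Apply one Rescorla-Wagner update for a learning event.

    weights: defaultdict(float) keyed by (cue, outcome).
    cues: cues present in the event; absent cues are never touched (case 1).
    observed: the verb form heard on this trial.
    alpha, beta: hypothetical per-cue and per-outcome learning rates.
    """
    for outcome in OUTCOMES:
        # Summed prediction from all present cues, using the weights from trial t-1
        prediction = sum(weights[(c, outcome)] for c in cues)
        target = lam if outcome == observed else 0.0  # case 2 vs. case 3
        delta = target - prediction
        for c in cues:
            weights[(c, outcome)] += alpha[c] * beta[outcome] * delta
    return weights


cues = ["chłopiec", "mężczyzna", "małpa"]
alpha = {c: 0.1 for c in cues}     # hypothetical cue learning rates
beta = {o: 0.5 for o in OUTCOMES}  # hypothetical outcome learning rates
w = rw_trial(defaultdict(float), cues, "chodzili", alpha, beta)
print(round(w[("chłopiec", "chodzili")], 4), w[("małpa", "chodziły")])  # 0.05 0.0
```

After one presentation of Chłopiec, mężczyzna i małpa chodzili, each cue has a weight of αβλ = 0.05 to chodzili, while the weights to chodziły stay at 0 because their summed prediction was already 0. A second presentation would add 0.05 × (1 − 0.15) = 0.0425 to each cue's weight to chodzili.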
Model Evaluation
To help explain participants’ behavioral data, we derived an activation-based
measure from the fitted R–W model, which we call activation support for an
outcome. The measure aims to explain participants’ form choices and response
times, and is defined as the difference between the activation of the outcome of
interest and the activation of the remaining outcome. For example, the activa-
tion support for the nonmasculine plural form (np) is given by the following:
activation support (np) = activ (np) − activ (mp)
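Activation is not defined explicitly in this section; the Python sketch below assumes the usual convention in discrimination learning that the activation of an outcome is the sum of the association weights from the cues present in the event. The weights are made-up values for illustration, not fitted estimates.

```python
def activation(weights, cues, outcome):
    """Assumed definition: summed weights from the present cues to `outcome`."""
    return sum(weights.get((cue, outcome), 0.0) for cue in cues)


def activation_support(weights, cues, target="np", other="mp"):
    """Activation of the target verb form minus activation of the other form."""
    return activation(weights, cues, target) - activation(weights, cues, other)


# Made-up weights keyed by (cue, outcome), for illustration only
weights = {
    ("FA1", "np"): 0.4, ("FA1", "mp"): -0.1,
    ("MP1", "np"): -0.2, ("MP1", "mp"): 0.5,
}
support_np = activation_support(weights, ["FA1", "MP1"])
print(round(support_np, 2))  # -0.2
```

Here the masculine form receives more activation (0.4 versus 0.2), so the support for np is negative.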
We hypothesized that the higher the activation support for a verb form (i.e.,
the stronger the evidence from the model supporting the verb form relative to
the other possible form), the higher the likelihood of that form being selected
by participants. We also expected that the magnitude of the activation support
would negatively correlate with participants’ response times. In other words,
the higher the magnitude of this measure, the quicker the participant’s response
would be. This should translate into a quadratic relationship between activation
support and response times, with the slowest responses expected when activa-
tion support values are near zero, and the fastest responses expected for high
positive or negative values.
Results
This section evaluates the extent to which the R–W model explains our par-
ticipants’ behavior by fitting a separate model to each participant’s data, and
tests whether the model fit quality is affected by individual differences such as
WM span, age, and gender. We first present some descriptive results on the
repeated only 15 times. This hypothesis was confirmed by rerunning the sim-
ulations presented in Figure 2, now with 1,000 repetitions per event, as shown
in Appendix S7 in the Supporting Information online; blocking and inhibitory
blocking effects occurred for all participants regardless of their learning rate or
event ordering. These results confirm what we pointed out before: Biases and
differences in learning are more likely to manifest early on in learning (Ellis,
2006a).
Figure 3 Proportion of participants that each model fitted the best. R–W = Rescorla–Wagner.
and whereby a participant always chooses the nonmasculine verb form except
when a masculine personal cue is present (we also refer to this strategy as
the “feminine-biased” strategy). The normative strategy is the one generally
adopted by native speakers of Polish, whereby the masculine verb form is al-
ways selected except when all cues are feminine (referred to as the “masculine-
biased” strategy). We also included two basic strategies, whereby a participant
either always chooses the masculine verb form (referred to as the “masculine-
only” strategy) or always chooses the nonmasculine verb form (referred to as
the “feminine-only” strategy). The latter two strategies were included to cap-
ture participants’ behavior at the extremes.
Figure 3 displays the proportion of participants best fitted by each of
the five resulting models (R–W and our four decision strategies); we consid-
ered the model(s) with the highest participant–model match rate among the
five models as the best-fit model(s). The R–W model was the model that
best explained participants’ responses (31 out of 63 participants), followed
closely by the normative strategy (26 participants). The other three strategies
explained participants’ choices substantially less well than those two strate-
gies (< 12 participants). The fact that the R–W model and the normative
strategy were close in capturing participants’ behavior is not very surprising,
since the verb forms used in the training events were selected based on the
normative rules and the predictions of the R–W model were largely in ac-
cordance with the normative strategy (Figure 3). It is interesting, though, that
the R–W model learned this strategy implicitly, from a simple general learning
rule and without any prior experience. The average percentage of response
matches per participant between the R–W model and the normative strategy
was above 90%, and the average percentage of response matches between the
R–W model and the prescriptive strategy was above 85%.
Table 3 Fixed effects structures of the (generalized) linear mixed-effects models ex-
plaining participants’ nonmasculine plural form choices (left) and response times
(right) based on activation support from the fitted Rescorla–Wagner models
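The participant–model match rate used to pick the best-fitting model(s) can be sketched as follows in Python. The sketch assumes that cue labels start with F or M (gender) followed by P or A (personal or animate), as in the category names above; the four decision strategies follow the definitions in the text, and the responses are invented for illustration. A fitted R–W model could be added to the `models` dictionary as one more function, for example one that chooses np whenever activation support for np is positive (an assumed decision rule).

```python
def is_feminine(cue):
    return cue.startswith("F")


def is_masculine_personal(cue):
    return cue.startswith("MP")


STRATEGIES = {
    # np unless a masculine personal cue is present ("feminine-biased")
    "prescriptive": lambda cues: "mp" if any(is_masculine_personal(c) for c in cues) else "np",
    # mp unless all cues are feminine ("masculine-biased")
    "normative": lambda cues: "np" if all(is_feminine(c) for c in cues) else "mp",
    "masculine-only": lambda cues: "mp",
    "feminine-only": lambda cues: "np",
}


def match_rate(predict, trials):
    """Share of test trials on which the model's choice equals the participant's."""
    return sum(predict(cues) == choice for cues, choice in trials) / len(trials)


def best_fit(models, trials):
    """Return every model tied for the highest match rate, plus all rates."""
    rates = {name: match_rate(predict, trials) for name, predict in models.items()}
    top = max(rates.values())
    return sorted(name for name, rate in rates.items() if rate == top), rates


# Invented responses for one hypothetical participant: (cues, chosen form)
trials = [(("FA1", "FP2"), "np"), (("MA1", "FP3"), "mp"), (("MP1", "FA4"), "mp")]
winners, rates = best_fit(STRATEGIES, trials)
print(winners)  # ['normative']
```

Returning all tied models mirrors the text's "model(s)": a participant can be counted as best fitted by more than one model.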
Figure 4 Relationship between the Rescorla–Wagner activation support for the non-
masculine plural form and the proportion of nonmasculine plural choices made by par-
ticipants (left), and relationship between the Rescorla–Wagner activation support for
the nonmasculine plural form and participants’ response times (right).
choices, OR = 6.78, p < .001, 95% CI [3.82, 12.03]. Figure 4 (left panel)
also shows that this relationship is asymmetrical around 0, reflecting a strong
bias towards the masculine verb form that can still lead to a preference for the
masculine form even when the (activation-based) evidence supports the
nonmasculine form. Also, and in line with our hypotheses, the second-order
polynomial term of activation support was a significant predictor of response
time: There was a quadratic relationship between activation support and
response time, with the slowest responses recorded for the least supported
events, b = −0.20, p = .012, 95% CI [−0.35, −0.04]; see also Figure 4, right
panel.
These results suggest that the fitted models performed well in predicting
participants’ form choices, and that the information encoded in the associa-
tion weights—the basic currency of a R–W model—is a good predictor of
both the likelihood of choosing a particular verb form and the speed with
which the response is made. Participants’ level of agreement regarding the
choice of a certain verb form thus differed depending on the activation support
of that particular form, with a high level of agreement expected and attested
for strongly positive or strongly negative activation support values, and a high
level of disagreement expected and attested for activation support values
around zero.
was more challenging for the model to capture their effect on participants’
choices.
Table 4 Summary of the linear regression model assessing the effect of working mem-
ory (WM) span and gender on the proportion of participant–model matches
Predictors b 95% CI p
choosing verb forms in accordance with the R–W model in our language learn-
ing task.
Discussion
Summary of Findings
Our findings show that a R–W mechanism captures well how participants learn
subject–verb agreement in a morphologically complex language and, by ex-
tension, how they might learn language through mere exposure to it. With an
average fit accuracy of 68%, based on a simple activation-based decision strat-
egy, the model explained the verb form choices made by a large proportion
of participants rather well.7 More interestingly, an activation-based measure
extracted from the best-fitting models correlated strongly with both the like-
lihood of a particular verb form choice and the time required to make that
choice.
The model also provided insights as to why participants might display high
or low agreement levels when choosing a verb form, depending on the nature
of the subject of the clause. According to the model, this is due to the associ-
ation strengths that the participants acquire, which are used to calculate the
activation support for each of the possible verb forms. These association
strengths are mostly affected by (a) the learner’s learning rate for the cues (the
learning rate determines the magnitude of the correction of the weights, based
on the estimated error in each trial) and (b) the distribution of the learning
events they encountered during the learning stage (this would include the fre-
quency of each learning event and the order of the learning events, among
other things). Thus, one prediction from our study is that changing the order or
the relative frequencies of the learning events during the training might lead to
different choice patterns from those we observed here.
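The prediction about event order can be illustrated with a minimal Python sketch for a single cue that is present on every trial (a simplifying assumption), with a hypothetical combined learning rate αβ = 0.5 and λ = 1. The same three events, two with the outcome and one without, are presented in two different orders.

```python
def final_weight(events, alpha_beta=0.5, lam=1.0):
    """Weight of a single always-present cue after a sequence of events.

    events: booleans, True if the outcome occurred on that trial.
    """
    w = 0.0
    for outcome_present in events:
        target = lam if outcome_present else 0.0
        w += alpha_beta * (target - w)  # R-W update with one present cue
    return w


late_negative = final_weight([True, True, False])
early_negative = final_weight([False, True, True])
print(late_negative, early_negative)  # 0.375 0.75
```

Presenting the negative event last halves the final weight relative to presenting it first, although the event frequencies are identical: the iterative rule gives more weight to recent events.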
We also found that both gender and WM capacity were significantly related to
the participant–model match rates, which sheds light on what might
have driven the observed differences in the quality of model fit. The fact that
in our experiment a larger proportion of women than men acted in accordance
with a R–W mechanism is in line with findings from several previous stud-
ies that highlighted the association between gender and classical condition-
ing for both humans (Lonsdorf et al., 2015; Merz et al., 2018) and animals
(e.g., Velasco et al., 2019). This suggests that men and women may differ in
how they learn, with women’s behavior being better captured by the R–W
error-correction learning rule. Women are generally known to have a small language
advantage over men (see Kimura, 1999, for an extensive assessment), specifi-
cally in areas pertaining to lexical retrieval (Balling & Baayen, 2008, 2012). It
has been suggested that this might be due to women having a superior declara-
tive memory, which they could use to generalize over stored neighboring forms
(Hartshorne & Ullman, 2006).
The finding that the likelihood of a language learner behaving accord-
ing to the R–W mechanism increases with WM capacity provides evidence
that WM can play a role in classical conditioning by affecting the adoption
of a classical conditioning mechanism such as the R–W rule. Sasaki (2009)
and Baetu et al. (2018) previously provided evidence of disruption of classi-
cal conditioning performance when WM is loaded using dual-task paradigms.
The present finding adds to the mounting evidence that, contrary to the
predominant belief, WM may be implicated in low-level cognitive processes such as
instrumental learning, more commonly referred to as reinforcement learning
within the neuroscience and machine learning communities (Collins & Frank,
2012; Ez-zizi, 2016) and in some forms of implicit learning (Medimorec et al.,
2021).
Blocking and inhibitory blocking-like effects did not emerge from the R–W
model for all participants. As shown, this was mainly due to the short duration
of the training phase. Increasing the number of training trials not only resulted
control via WM. This has led to the development of new learning frameworks
where WM is explicitly modeled as a key component that supports learning by
retaining information from previous trials (e.g., see Collins & Frank, 2012; Ez-
zizi et al., 2015). This could be the approach to take for R–W and other clas-
sical conditioning models, especially because in large simulation-based lan-
guage studies, learning events typically contain a large number of cues (e.g.,
all trigraphs or words in one sentence), which cannot be processed at once by
a human learner—as is required in the updates of the R–W model—due to
known WM capacity limitations (see Glautier, 2013, and Baayen et al., 2016,
for early attempts in this direction).
Another direction for future extension of our work is to collect partici-
pants’ responses over time while they are trained on the cue–outcome associ-
ations rather than having a separate postlearning test phase. This would have
the potential to improve the model fit further and to provide a broader picture
of the behavior of participants while they are learning the task. In addition, this
could allow the extraction of a learning measure based on time slope for the
language learning task, such as we did for the implicit learning task, and thus
would increase the likelihood of finding a link between implicit learning and
the quality of fit of the R–W model (see also our discussion in Appendix S5
in the Supporting Information online). A link between the two measures can
also be probed by fitting the R–W model to the response times collected in the
implicit learning task, as was done in Notaro et al. (2018), rather than using
time slopes only or a mixture of the two.
Finally, the particular structure of our language learning task favored the
normative (masculine-biased) strategy, but an interesting question that remains
unanswered is whether we can use the R–W model to predict the emergence of
different strategies as we vary the structure of the language input and control
for individual differences among language learners. The approach of using the
R–W model to explain or predict the level of agreement among language users
can be extended beyond Polish subject–verb agreement in the plural past tense
to cover other facets of language where a lack of consensus in language use
has been observed (e.g., see Geeraert et al., 2020; Milin, Divjak, & Baayen,
2017).
Conclusion
The R–W model is a very simple learning model, yet it has multiple sources of
variation that can be used to explain participants’ behavior in language learn-
ing experiments. These include the model’s learning rate, the order of pre-
sentation of learning examples, and the relative frequencies of cue–outcome
This article has earned Open Data and Open Materials badges for making
publicly available the digitally-shareable data and the components of the
research methods needed to reproduce the reported procedure and results. All
data and materials that the authors have used and have the right to share
are available at
https://github.com/ooominds/Error-correction-mechanisms-in-language-learning
and https://doi.org/10.25500/edata.bham.00000911. All proprietary materials
have been precisely identified in the manuscript.
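Two sources of variation named in the Conclusion are presentation order and the relative frequencies of cue–outcome events. Note 3 makes the related point that a noniterative solution removes order effects. The contrast can be sketched as follows. The iterative R–W rule updates weights trial by trial, so the same set of trials can leave different weights depending on their order. The equilibrium of Danks (2003) depends only on the conditional co-occurrence probabilities estimated from those trials. This is a toy illustration under stated assumptions: λ = 1, one learning rate shared by all cues, a hypothetical design with cues A and B and outcomes X and Y, and an invertible coefficient matrix. A general implementation would need a pseudoinverse for singular designs. The helper names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Event = Tuple[FrozenSet[str], FrozenSet[str]]  # (cues present, outcomes present)
Weights = Dict[Tuple[str, str], float]


def rw_iterative(events: Sequence[Event], cues: Sequence[str],
                 outcomes: Sequence[str], rate: float = 0.1) -> Weights:
    """Trial-by-trial Rescorla-Wagner learning with lambda = 1."""
    w = {(c, o): 0.0 for c in cues for o in outcomes}
    for present, shown in events:
        for o in outcomes:
            activation = sum(w[(c, o)] for c in present)
            delta = rate * ((1.0 if o in shown else 0.0) - activation)
            for c in present:
                w[(c, o)] += delta
    return w


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting; a must be nonsingular."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= f * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def danks_equilibrium(events: Sequence[Event], cues: Sequence[str],
                      outcomes: Sequence[str]) -> Weights:
    """Solve sum_j P(c_j | c_i) * w[c_j, o] = P(o | c_i) for every cue c_i and outcome o."""
    n_cue = {c: sum(c in p for p, _ in events) for c in cues}
    a = [[sum(ci in p and cj in p for p, _ in events) / n_cue[ci] for cj in cues]
         for ci in cues]
    w: Weights = {}
    for o in outcomes:
        b = [sum(ci in p and o in s for p, s in events) / n_cue[ci] for ci in cues]
        for c, value in zip(cues, solve(a, b)):
            w[(c, o)] = value
    return w


A, AB = frozenset({"A"}), frozenset({"A", "B"})
X, Y = frozenset({"X"}), frozenset({"Y"})
cues, outcomes = ["A", "B"], ["X", "Y"]
blocked = [(A, X)] * 5 + [(AB, Y)] * 5    # all A -> X trials first
reversed_ = [(AB, Y)] * 5 + [(A, X)] * 5  # same trials, opposite order

print(rw_iterative(blocked, cues, outcomes)[("B", "X")] < 0.0)   # → True
print(rw_iterative(reversed_, cues, outcomes)[("B", "X")])       # → 0.0
print(danks_equilibrium(blocked, cues, outcomes))
# → {('A', 'X'): 1.0, ('B', 'X'): -1.0, ('A', 'Y'): 0.0, ('B', 'Y'): 1.0}
```

With blocked training, B acquires a negative association with X. With the reversed order, it never does. The equilibrium weights are identical for both orders because the trial counts are the same. In this toy design an exact solution exists, so long interleaved training also drives the iterative weights to the equilibrium values. That convergence is a property of this example, not a general guarantee.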
Notes
1 The idea of cue competition is also at the core of the competition model of Bates
and MacWhinney (1987) for language acquisition. Their model, however, uses
mainly symbolic/linguistic cues such as word order or morphological features of
words and is based on a connectionist approach requiring a much more complex
architecture than the R–W model.
2 The contents of any corpus are, at best, a very rough approximation of the input that
language users receive. Conversely, artificial languages are illustrative and
informative for understanding natural languages but hardly a realistic reflection of
the complexity found in any given natural language.
3 The early implementations of the R–W rule as the naive discriminative learning
model relied on a noniterative version of the algorithm, based on the equilibrium
equations of Danks (2003), which eliminates the possibility of any order effects
emerging.
4 It is important to note that here we were not interested in testing the blocking effects
per se as is typically done in behavioral experiments of classical conditioning. In
those experiments, only the events relevant to blocking are included (blocking is
tested separately from the other effects), and blocking is tested on a second cue
rather than a third cue as in our case (e.g., Kamin, 1969). Also, the learner is
usually trained for long enough to ensure that the “blocking” cue becomes a good
predictor of the outcome of interest. Such a clean experimental setup would not
References
Adani, S., & Cepanec, M. (2019). Sex differences in early communication
development: Behavioral and neurobiological indicators of more vulnerable
communication system development in boys. Croatian Medical Journal, 60(2),
141–149. https://doi.org/10.3325/cmj.2019.60.141
Ambridge, B., & Lieven, E. V. M. (2011). Child language acquisition: Contrasting
theoretical approaches. Cambridge University Press.
Baayen, R. H. (2011). Corpus linguistics and naive discriminative learning. Revista
Brasileira de Linguística Aplicada, 11(2), 295–328.
https://doi.org/10.1590/S1984-63982011000200003
Baayen, R. H., Endresen, A., Janda, L. A., Makarova, A., & Nesset, T. (2013). Making
choices in Russian: Pros and cons of statistical methods for rival forms. Russian
Linguistics, 37(3), 253–291. https://doi.org/10.1007/s11185-013-9118-6
Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of
Psychological Research, 3(2), 12–28. https://doi.org/10.21500/20112084.807
Baayen, R. H., Milin, P., Filipović Ðurđević, D., Hendrix, P., & Marelli, M. (2011). An
amorphous model for morphological processing in visual comprehension based on
naive discriminative learning. Psychological Review, 118(3), 438–481.
https://doi.org/10.1037/a0023851
Baayen, R. H., Shaoul, C., Willits, J., & Ramscar, M. (2016). Comprehension without
segmentation: A proof of concept with naive discriminative learning. Language,
Cognition and Neuroscience, 31(1), 106–128.
https://doi.org/10.1080/23273798.2015.1065336
Baetu, I., Burns, N., & Child, B. (2018). Individual differences in working memory
capacity predict performance on an associative learning task [Paper presentation].
Australian Psychologist, Sydney, Australia. https://doi.org/10.1111/ap.12372
Balleine, B. W., & O’Doherty, J. P. (2010). Human and rodent homologies in action
control: Corticostriatal determinants of goal-directed and habitual action.
Neuropsychopharmacology, 35(1), 48–69. https://doi.org/10.1038/npp.2009.131
Balling, L. W., & Baayen, R. H. (2008). Morphological effects in auditory word
recognition: Evidence from Danish. Language and Cognitive Processes, 23(7–8),
1159–1190. https://doi.org/10.1080/01690960802201010
Balling, L. W., & Baayen, R. H. (2012). Probability and surprisal in auditory
comprehension of morphologically complex words. Cognition, 125(1), 80–106.
https://doi.org/10.1016/j.cognition.2012.06.003
Bandura, A. (1962). Social learning through imitation. In M. R. Jones (Ed.), Nebraska
Symposium on Motivation (pp. 211–274). University of Nebraska Press.
Bates, E., & MacWhinney, B. (1987). Competition, variation, and language learning.
In B. MacWhinney (Ed.), Mechanisms of language acquisition (pp. 157–193).
Lawrence Erlbaum.
Burke, D. M., & Shafto, M. A. (2004). Aging and language production. Current
Directions in Psychological Science, 13(1), 21–24.
https://doi.org/10.1111/j.0963-7214.2004.01301006.x
Bybee, J., & McClelland, J. L. (2005). Alternatives to the combinatorial paradigm of
linguistic theory based on domain general principles of human cognition. The
Linguistic Review, 22(2–4), 381–410. https://doi.org/10.1515/tlir.2005.22.2-4.381
Chen, Z., Haykin, S., Eggermont, J. J., & Becker, S. (2008). Correlative learning: A
basis for brain and adaptive systems. Wiley.
Chomsky, N. (1959). A review of B. F. Skinner’s Verbal behavior. Language, 35(1),
26–58. https://doi.org/10.4159/harvard.9780674594623.c6
Chuang, Y.-Y., Bell, M. J., Banke, I., & Baayen, R. H. (2021). Bilingual and
multilingual mental lexicon: A modeling study with linear discriminative learning.
Language Learning, 71(S1), 219–292. https://doi.org/10.1111/lang.12435
Collins, A. G. E., & Frank, M. J. (2012). How much of reinforcement learning is
working memory, not reinforcement learning? A behavioral, computational, and
neurogenetic analysis. The European Journal of Neuroscience, 35(7), 1024–1035.
https://doi.org/10.1111/j.1460-9568.2011.07980.x
Dąbrowska, E. (2018). Experience, aptitude and individual differences in native
language ultimate attainment. Cognition, 178, 222–235.
https://doi.org/10.1016/j.cognition.2018.05.018
Dąbrowska, E., & Divjak, D. (2015). Introduction. In E. Dąbrowska & D. Divjak
(Eds.), Handbook of cognitive linguistics (pp. 1–9). Walter de Gruyter.
Danks, D. (2003). Equilibria of the Rescorla–Wagner model. Journal of Mathematical
Psychology, 47(2), 109–121. https://doi.org/10.1016/S0022-2496(02)00016-0
Divjak, D. (2018). Binding scale dynamics. In D. Van Olmen, T. Mortelmans, & F.
Brisard (Eds.), Aspects of linguistic variation (pp. 9–42). De Gruyter Mouton.
Divjak, D. (2019). Frequency in language: Memory, attention, and learning.
Cambridge University Press.
Divjak, D., & Gries, S. T. (Eds.). (2012). Frequency effects in language representation.
De Gruyter.
Divjak, D., Milin, P., Ez-zizi, A., Józefowski, J., & Adam, C. (2021). What is learned
from exposure: An error-driven approach to productivity in language. Language,
Cognition and Neuroscience, 36(1), 60–83.
https://doi.org/10.1080/23273798.2020.1815813
Ellis, N. C. (2006a). Language acquisition as rational contingency learning. Applied
Linguistics, 27(1), 1–24. https://doi.org/10.1093/applin/ami038
Ellis, N. C. (2006b). Selective attention and transfer phenomena in L2 acquisition:
Contingency, cue competition, salience, interference, overshadowing, blocking, and
perceptual learning. Applied Linguistics, 27(2), 164–194.
https://doi.org/10.1093/applin/aml015
Ellis, N. C., Römer, U., & O’Donnell, M. B. (2016). Usage-based approaches to
language acquisition and processing: Cognitive and corpus investigations of
construction grammar. Wiley-Blackwell.
Ellis, N. C., & Sagarra, N. (2010). The bounds of adult language acquisition: Blocking
and learned attention. Studies in Second Language Acquisition, 32(4), 553–580.
Ellis, N. C., & Sagarra, N. (2011). Learned attention in adult language acquisition: A
replication and generalization study and meta-analysis. Studies in Second Language
Acquisition, 33(4), 589–624. https://doi.org/10.1017/S0272263111000325
Ez-zizi, A. (2016). Reinforcement learning in partially observable tasks: State
uncertainty and memory dependence [Doctoral thesis, University of Bristol].
EThOS. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.707711
Ez-zizi, A., Farrell, S., & Leslie, D. (2015). Bayesian reinforcement learning in
Markovian and non-Markovian tasks. IEEE Symposium Series on Computational
Intelligence (pp. 579–586). https://doi.org/10.1109/SSCI.2015.91
Geeraert, K., Newman, J., & Baayen, R. H. (2020). Variation within idiomatic
variation: Exploring the differences between speakers and idioms. East European
Journal of Psycholinguistics, 7(2), 9–27.
https://doi.org/10.29038/eejpl.2020.7.2.gee
Milin, P., Divjak, D., & Baayen, R. H. (2017). A learning perspective on individual
differences in skilled reading: Exploring and exploiting orthographic and semantic
discrimination cues. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 43(11), 1730–1751. https://doi.org/10.1037/xlm0000410
Milin, P., Feldman, L. B., Ramscar, M., Hendrix, P., & Baayen, R. H. (2017).
Discrimination in lexical decision. PLOS ONE, 12(2), Article e0171935.
https://doi.org/10.1371/journal.pone.0171935
Milin, P., Madabushi, H. T., Croucher, M., & Divjak, D. (2020). Keeping it simple:
Implementation and performance of the proto-principle of adaptation and learning
in the language sciences. arXiv:2003.03813.
https://doi.org/10.48550/arXiv.2003.03813
Muñoz, C., & Singleton, D. (2011). A critical review of age-related research on L2
ultimate attainment. Language Teaching, 44(1), 1–35.
https://doi.org/10.1017/S0261444810000327
Mutter, S. A., Atchley, A. R., & Plumlee, L. M. (2012). Aging and retrospective
revaluation of causal learning. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 38(1), 102–117. https://doi.org/10.1037/a0024851
Notaro, G., van Zoest, W., Melcher, D., & Hasson, U. (2018). Prediction and
information integration determine subtle anticipatory fixation biases. bioRxiv.
https://doi.org/10.1101/252809
Olejarczuk, P., Kapatsinski, V., & Baayen, R. H. (2018). Distributional learning is
error-driven: The role of surprise in the acquisition of phonetic categories.
Linguistics Vanguard, 4(s2), Article 20170020.
https://doi.org/10.1515/lingvan-2017-0020
Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity
of the cerebral cortex. Oxford University Press.
Pearce, J. M., & Hall, G. (1980). A model for Pavlovian learning: Variations in the
effectiveness of conditioned but not of unconditioned stimuli. Psychological
Review, 87(6), 532–552.
Pirrelli, V., Marzi, C., Ferro, M., Cardillo, F. A., Baayen, R. H., & Milin, P. (2020).
Psycho-computational modelling of the mental lexicon. In V. Pirrelli, I. Plag, & W.
U. Dressler (Eds.), Word knowledge and word usage (pp. 23–82). De Gruyter
Mouton.
Plaut, D. C., & Gonnerman, L. M. (2000). Are non-semantic morphological effects
incompatible with a distributed connectionist approach to lexical processing?
Language and Cognitive Processes, 15(4–5), 445–485.
https://doi.org/10.1080/01690960050119661
R Core Team. (2019). R: A language and environment for statistical computing
(Version 3.6.2) [Computer software]. R Foundation for Statistical Computing.
http://www.R-project.org
Supporting Information
Additional Supporting Information may be found in the online version of this
article at the publisher’s website:
Accessible Summary
Appendix S1. Distributions of Participants’ Educational and Language Backgrounds.
Appendix S2. Experimental Procedure for the Language Learning Task.
Appendix S3. Cue Combinations Used in the Test Phase of the Language Learning Task.
Appendix S4. Explicit Knowledge and Demographic Questionnaire.
Appendix S5. Implicit Learning Task.
Appendix S6. Effect of Learning Rate Parameter on Model Fit Accuracy.
Appendix S7. Long-Run Simulations to Assess Blocking Effects.
Appendix S8. Generalized Linear Mixed-Effects Models Explaining Participants’ Choices and Response Times.
Appendix S9. Correlations Between the Different Individual Difference Measures.