Tilsen Articulatory Phonology
Abstract
Studies of the control of complex sequential movements have dissociated two aspects of movement planning: control over
the sequential selection of movement plans, and control over the precise timing of movement execution. This distinction is
particularly relevant in the production of speech: utterances contain sequentially ordered words and syllables, but
articulatory movements are often executed in a non-sequential, overlapping manner with precisely coordinated relative
timing. This study presents a hybrid dynamical model in which competitive activation controls selection of movement plans
and coupled oscillatory systems govern coordination. The model departs from previous approaches by ascribing an
important role to competitive selection of articulatory plans within a syllable. Numerical simulations show that the model
reproduces a variety of speech production phenomena, such as effects of preparation and utterance composition on
reaction time, and asymmetries in patterns of articulatory timing associated with onsets and codas. The model furthermore
provides a unified understanding of a diverse group of phonetic and phonological phenomena which have not previously
been related.
Citation: Tilsen S (2013) A Dynamical Model of Hierarchical Selection and Coordination in Speech Planning. PLoS ONE 8(4): e62800. doi:10.1371/journal.pone.0062800
Editor: Ramesh Balasubramaniam, McMaster University, Canada
Received September 24, 2012; Accepted March 25, 2013; Published April 24, 2013
Copyright: © 2013 Sam Tilsen. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This author has no support or funding to report.
Competing Interests: The author has declared that no competing interests exist.
* E-mail: tilsen@cornell.edu
Introduction
Coordination and selection in motor planning
Many of the motor activities we perform from day-to-day involve sequences of movements in which some of the movements are initiated with precise timing. Past research on motor control has established an important distinction between the sequencing of movements and the coordination of their timing [1–4]. For example, consider the task of making a 90° turn in an automobile at an intersection. The driver of the vehicle will press the brake pedal to decelerate, then rotate the wheel and release the brake pedal. At reasonable speeds, the deceleration and rotation are typically performed in sequence, whereas the release of the brake is usually contemporaneous with the rotation of the wheel, i.e. these movements are initiated at approximately the same time. When we observe a precisely controlled temporal relation between movements, we say that they are coordinated. This precision control has important consequences for accomplishing the goals of the task. If the driver were to substantially delay one or the other movement, they could easily crash or damage the automobile. Hence we distinguish between two aspects of motor control: sequencing, the control of the order in which movements are performed, and coordination, precision control of the relative timing of movement initiation.
Several models of motor planning have understood sequencing with the concept of competitive selection: movement plans are activated in parallel prior to initiation, and then they are selected in order through mutually exclusive competitive processes [5,6]. This understanding works well as long as motor plans are viewed at a sufficiently abstract level of a planning hierarchy. For example, we can consider rotating the wheel and releasing the brake to be subcomponents of a more abstract “turn” program. At this abstract level, we can view the task as consisting of two sequentially selected motor programs: brake and turn. In this case, the precisely coordinated movements of rotating the wheel and releasing the brake are co-selected subcomponents of the turning program.
One puzzle for theories of motor planning is how selection and coordination interact. While there currently exist models of motor planning based on competitive selection and models of coordination, no model integrating both of these mechanisms has yet been developed. This paper presents a dynamical model of motor planning and execution – the activation-spin model – that integrates both coordination and selection mechanisms. The model focuses on the articulatory movements of speech and more abstract speech units such as syllables and words, which together can be viewed as parts of a hierarchical structure of motor plans. The model is an integration and elaboration of two different frameworks, articulatory phonology and competitive queuing. Articulatory phonology is a theory of phonological representation and phonetic implementation that utilizes phase-coupling forces between oscillatory systems to govern articulatory timing. Competitive queuing is a general dynamical framework for sequential selection of simultaneously active motor plans, which can be readily applied to the word and syllable components of utterances.
A key innovation of the model developed here is that it attributes an important role to selection in the control of articulatory timing within a syllable. Selection has been investigated primarily with regard to the word and syllable levels of the speech motor hierarchy, and coordination with regard to the level of
articulatory gestures. The most obvious approach to integrating coordination and selection would apply these mechanisms to the levels of the motor planning hierarchy with which they have traditionally been associated. In other words, a null hypothesis would hold that competitive selection governs the sequencing of words and syllables, whereas coordinative mechanisms govern the timing of articulatory gestures. In such a view articulatory gestures would be selected automatically when their associated syllable is selected. In contrast, the activation-spin model proposes that competitive selection plays a crucial role in the control of articulation within a syllable. Specifically, the model holds that onset consonantal gestures and vocalic gestures are co-selected (i.e. not competitively selected) and coordinated, while coda consonantal gestures are competitively selected relative to a preceding vocalic or consonantal gesture, rather than coordinated. This constitutes a substantial departure from the coupled oscillators model of articulatory phonology, because it restricts the involvement of coordination in the control of articulatory timing. Hence the proposal here is not merely a trivial combination of previously proposed mechanisms, but an integration that attributes a novel role to selection in controlling articulatory timing.
The proposed integration is significant because it offers a unified account for a variety of phonetic and phonological asymmetries between onset and coda consonants. Most phonological theories take for granted a hierarchical organization of sounds into syllables. A syllable can be analyzed to consist of an onset (consonants preceding the vowel), a nucleus (the vowel itself), and a coda (consonants following the vowel). When multiple consonants occur in onset or coda position, the syllable is said to have a complex onset or complex coda. There is a diverse collection of cross-linguistic typological differences between onsets and codas, which include the following: (1) onsets combine relatively freely with vowels, whereas certain combinations of codas and vowels are more commonly restricted [7,8]; (2) codas often function to fulfill templates for morpheme or syllable structure, whereas onsets do not [9]; (3) in languages with lexical tone, onset consonants rarely influence the capacity of a syllable to bear certain types of complex or contour tones, whereas the presence of a coda consonant can influence the tone-bearing capacity of a syllable [10,11]; (4) in languages with stress, the presence of a coda consonant can influence the location of stress within a word, whereas onsets do not exhibit this influence [12,13]; (5) on diachronic timescales, the reduction of articulatory gestures in a coda consonant can induce a preceding vowel to be lengthened – a pattern known as compensatory lengthening, whereas the loss of an onset consonant never results in lengthening of an adjacent vowel [12,14]; (6) in accelerating syllable repetition tasks, VC (vowel-coda) syllables spontaneously reorganize into CV (onset-vowel) syllables, whereas the reverse does not occur [15,16]. These are just some of the numerous differences between onset and coda consonants, all of which beg for a unified explanation. Below we evaluate how the standard coupled oscillators model and the proposed activation-spin model account for these differences. We conclude that the proposed integration of coordination and selection mechanisms offers a more comprehensive explanation of the above asymmetries.
Before presenting the activation-spin model, we review below basic aspects of its precursors – the coupled oscillators model of articulatory phonology and the competitive queuing model of sequential selection – along with the key behavioral phenomena these account for. It should be noted that, despite integrating two important theoretical frameworks, the scope of the model presented here is relatively limited: the current implementation focuses on the planning and initiation of speech movements in a hierarchical structure of syllables and words; the model does not address a number of equally important issues, such as motor learning, the role of feedback in control, or the control of effector trajectories. Despite these limitations, the activation-spin model is an important development because it explains a broad range of behavioral phenomena related to sequencing and movement timing.
Articulatory phonology and the coupled oscillators model of coordination
Many theories of phonology have assumed that the basic cognitive units of speech are discrete, symbolic units such as segments or features. However, attempts to identify invariant acoustic correlates of these units in recordings of speech have been unsuccessful. Articulatory phonology addresses this problem by proposing that articulatory gestures, rather than segments, are the basic units of speech [17,18]. In this framework, articulatory gestures are conceptualized as dynamical events in which organs of the vocal tract move to achieve constriction targets. The targets themselves are defined in coordinates of vocal tract geometry [19], such as the aperture between the lips or the location and degree of a constriction between the tongue and the palate. For example, in the word “spa” the consonant /s/ is associated with a narrow constriction gesture near the anterior of the palate made by raising the tongue tip; the consonant /p/ is associated with a bilabial closure gesture, in which the aperture between the lips is closed to prevent airflow; and the vowel /a/ is associated with a lowering and retraction of the tongue body to create a narrowing along the rear wall of the pharynx, thereby influencing the resonances of the vocal tract in a distinctive way.
A key tenet of articulatory phonology is that gestures can overlap in time. This possibility allows for a large amount of contextual variation in the acoustic signal to be explained. One concrete example is the observation that the location of the closure between the tongue and the palate in /ki/ is more anterior than it is in /ku/ – this pattern can be readily understood to result from the coproduction of the consonant and vowel, which can be modeled as the simultaneous activation of both gestures [19]. This sort of variation is not easy to accommodate in models of phonology based on linear sequences of units. In contrast, articulatory phonology can more readily account for overlap because it incorporates the relative timing (i.e. coordination) of gestures directly into lexical representations.
An important observation in studies of speech articulation is that the relative timing of consonantal and vocalic gestures depends on the position of the consonant within the syllable. Onset consonants are consonants that occur before a vowel in a syllable, and coda consonants occur after the vowel. Articulation studies have shown that onset consonantal gestures and vocalic gestures tend to be initiated closely together in time, whereas coda consonantal gestures are initiated at the offset of the preceding vocalic gesture. This onset/coda asymmetry is illustrated in Figure 1, which shows articulatory movements recorded with electromagnetic articulography [20]. Figure 1A shows the case in which consonantal gestures precede a vowel, here the consonant-vowel (CV) form /pa/ and the CCV form /spa/. The panels show the following from top to bottom: distance between upper and lower lip sensors, i.e. lip aperture (LA); vertical position of the tongue tip/blade (TTy); and vertical position of the tongue body/dorsum (TBy). Below the movement trajectories are gestural scores, which represent periods of gestural activation. In the task-dynamic model of speech production [19], gestural activation corresponds to a period of time in which a movement intention is present in the form of the displacement of a vocal tract equilibrium from its
Figure 1. Illustration of articulatory coordination and sequencing. (A) onset consonantal gestures are precisely coordinated with the vocalic
gesture and overlap substantially. (B) coda consonantal gestures are selected sequentially. Arrows show movement onsets. Gestural scores aligned to
movement trajectories are shown for consonantal and vocalic gestures.
doi:10.1371/journal.pone.0062800.g001
neutral value. The articulatory movements shown here were produced in response to a go-cue given in the midst of a sustained vowel /i/, which is produced with a relatively high position of the tongue dorsum.
To observe the temporal proximity of the initiation of the onset consonant and vowel gestures, notice that in the CV form /pa/, the bilabial closure for /p/ (a decrease of LA) begins nearly at the same time as the vocalic /a/ gesture that lowers and retracts the body of the tongue (TBy) – the delay between these movement onsets is only about 50 ms. Moreover, the periods of time in which these two movements occur overlap extensively. The same is true for /spa/, where the raising of the tongue tip (TTy) for /s/ precedes the initiation of the vocalic gesture by about 30 ms and the initiation of the bilabial closure follows the vocalic gesture by about 80 ms. These patterns indicate that consonant and vowel articulatory gestures are precisely coordinated when the consonants are onsets of a syllable.
The onset consonant timing pattern contrasts markedly with the pattern observed for coda consonants in the forms /ap/ and /asp/. Figure 1B shows that in a coda consonant, the articulatory movement is initiated near the offset of a preceding gesture. In the form /ap/, the bilabial closure is initiated as the preceding vocalic movement reaches its target, and in the complex coda form /asp/, this gesture is initiated near the offset of the preceding consonantal gesture for /s/. Hence the coda /s/ and /p/ gestures appear to be sequentially timed relative to a preceding gesture: first the /a/ gesture is selected and executed, and then the following /s/ consonantal gesture is selected and executed, and then the following /p/ gesture is selected and executed. In other words, the coda articulations are not precisely coordinated with the initiation of the vowel articulation as they are in the CV and CCV forms.
Articulatory phonology accounts for the onset/coda asymmetry by conceptualizing the syllable as a system of coupled gestural planning oscillators. In early versions of the framework intergestural timing relations were specified directly in lexical representations. Building upon the dynamical model of movement coordination in [21], the innovation in [22] introduced a dynamics of planning that computes intergestural timing from a network of phase-coupled oscillators. In this system, each articulatory gesture is associated with a planning oscillator, and the planning oscillators may be coupled to one another in one of two ways: they may be in-phase coupled, such that coupling forces minimize the phase difference of the oscillations, or anti-phase coupled, such that coupling forces maximize the phase difference. By hypothesizing that onset consonantal gestures are in-phase coupled to a vocalic gesture, this model accounts for the empirical observation of nearly synchronous movement initiation. As schematized in Figure 2, the dynamic phase variable of each planning oscillator can be viewed as an angle made by a point moving counterclockwise on a unit circle. The oscillators are assumed to maintain a constant amplitude, exhibit 1:1 frequency-locking of their intrinsic phase velocities, and depending on their initial conditions, require time to evolve toward a stable relative phase pattern prior to the triggering process. The gesture corresponding to each planning oscillator is triggered (initiated) at an arbitrary phase angle, here the top of the circle. For a simplex onset as in /pa/, the consonantal gesture is initiated at nearly the same time as the vocalic gesture because their associated planning oscillators are nearly in-phase.
Figure 2. Coupled oscillators model of coordination for CV (/pa/), CCV (/spa/), and VC (/ap/) forms. For each form, a coupling graph is
shown with in-phase coupling (solid green lines) and anti-phase coupling (dashed red lines). Oscillator phases proceed counter-clockwise until they
reach an arbitrary phase value that triggers movement initiation, i.e. activation of the corresponding gesture in a gestural score.
doi:10.1371/journal.pone.0062800.g002
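The coupling graphs of Figure 2 can be explored numerically. The following is a minimal sketch under stated assumptions, not the implementation used in [22] or in this paper: it assumes a sinusoidal (Kuramoto-style) coupling function, unit coupling strengths, and plain Euler integration, and the names settle_phases and rel_phase are hypothetical. Positive entries in a coupling matrix stand for in-phase coupling and negative entries for anti-phase coupling.

```python
import math

def settle_phases(coupling, theta0, omega=2 * math.pi, dt=0.001, steps=20000):
    """Euler-integrate d(theta_i)/dt = omega + sum_j coupling[i][j] * sin(theta_j - theta_i)."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        rates = [
            omega + sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        theta = [t + dt * r for t, r in zip(theta, rates)]
    return theta

def rel_phase(a, b):
    """Phase of oscillator a relative to oscillator b, wrapped to [-pi, pi]."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

# Assumed coupling matrices: +1 = in-phase, -1 = anti-phase, 0 = uncoupled.
CV = [[0, 1], [1, 0]]                      # oscillator order: C, V
VC = [[0, -1], [-1, 0]]                    # oscillator order: V, C
CCV = [[0, -1, 1], [-1, 0, 1], [1, 1, 0]]  # oscillator order: C1, C2, V

# Arbitrary initial phases (radians); the planning system needs time to settle.
cv = settle_phases(CV, [1.0, 0.0])
vc = settle_phases(VC, [0.0, -1.0])
ccv = settle_phases(CCV, [0.3, -0.2, 0.0])

cv_c = rel_phase(cv[0], cv[1])      # C relative to V in /pa/
vc_c = rel_phase(vc[1], vc[0])      # C relative to V in /ap/
ccv_c1 = rel_phase(ccv[0], ccv[2])  # C1 relative to V in /spa/
ccv_c2 = rel_phase(ccv[1], ccv[2])  # C2 relative to V in /spa/

for label, value in [("CV  C-V ", cv_c), ("VC  C-V ", vc_c),
                     ("CCV C1-V", ccv_c1), ("CCV C2-V", ccv_c2)]:
    print(label, round(math.degrees(value), 1), "degrees")
```

Under these assumptions the CV pair settles in phase, the VC pair settles 180° apart, and in the CCV graph the two consonants are displaced symmetrically to either side of the vowel (about 60° each with equal coupling strengths), so that their mean phase coincides with the vowel's. Which consonant leads is determined here only by the arbitrary initial phases.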
Furthermore, by hypothesizing that in complex onsets (i.e. CCV or CCCV) consonantal gestures are anti-phase coupled to one another while simultaneously in-phase coupled to the vocalic gesture, the model can account for another important empirical pattern known as the c-center effect [23]. Studies in a number of languages have shown that as additional consonants are added to a syllable onset, the midpoint of the sequence of consonantal articulations maintains a fixed temporal relation to the vowel [24–27]. Accordingly, the timing of the rightmost (immediately pre-vocalic) consonant will shift closer to and overlap more with the vocalic gesture. The c-center effect is evident in the movement trajectories and gestural score of Figure 1A, and is portrayed schematically in Figure 2. When two onset gestures are present (e.g. in “spa”), the anti-phase coupling force between the consonantal gestures results in a temporal displacement of their respective triggering events relative to the vowel. In a sense, the co-existing in-phase and anti-phase coupling forces compete, and the stable relative phase of the system reflects a compromise between them.
One compelling basis for hypothesizing different coupling modes for consonant-vowel and consonant-consonant coordination relates to the interaction between the perception of articulatory gestures and mechanical coupling inherent in articulation. Constrictions made with the front and back of the tongue can interfere with one another in a mechanical sense, and also can interfere with labial constrictions through mutual mechanical coupling to the jaw. Moreover, when two consonantal constrictions are made contemporaneously, the more anterior one is liable to mask the acoustic consequences of the more posterior one. Hence if two consonantal articulations are produced simultaneously, they are likely to interfere and jeopardize the perceptual recoverability of the articulations. Hence from the standpoint of a listener, anti-phase coordination of consonants is preferable, because it minimizes temporal overlap between consonantal articulations. In contrast, vowel gestures achieve their targets more slowly than consonantal ones and involve a lesser degree of constriction; hence a vocalic gesture can be initiated simultaneously with a consonantal gesture without rendering the consonantal articulation perceptually unrecoverable [7]. In contrast, the achievement of a vocalic target, which occurs well after initiation of the movement, is susceptible to mechanical interference from a simultaneously produced consonant. Although coda consonants could in theory be in-phase coordinated with vowels and be produced even more slowly than either onset consonants or vowels, this would result in mechanical interference between the gesture for attaining the vocalic target and hinder its perceptibility, hence in-phase coordination between a vowel and coda consonant is undesirable.
The coupled oscillators model accounts for coda coordination by hypothesizing that coda consonantal gestures are anti-phase coupled to a preceding gesture. Hence in a VC1C2 syllable, the C1 gesture is anti-phase coupled to the preceding vowel, and the C2 gesture is anti-phase coupled to the preceding consonant. Figure 2 illustrates the case of a VC syllable /ap/, where the coda gesture is triggered 180° out of phase with the vocalic planning oscillator. This hypothesis receives some indirect support in the observation that coda gestural timing is more variable than onset timing, which could be due to the relative instability of anti-phase coupling compared to in-phase coupling [28]. However, because gestural planning oscillations are not directly observable in articulatory or acoustic data, and because the model predicts no analogue of the c-center effect for complex codas, only indirect evidence of this sort can be provided for the coda coordination hypothesis.
Furthermore, the articulatory phonology treatment of codas encounters a dilemma in explaining the observation that coda movements appear to be initiated at the offset of vocalic movements (see Figure 1B). First, consider that the model explicitly dissociates gesture durations from planning system periods. The planning systems serve the purpose of determining when gestures become active, but the durations of those activation intervals are independently specified. This is necessarily the case, given that all oscillators have a uniform intrinsic frequency and yet consonantal and vocalic movements can differ substantially in duration [29,30]. Now consider the observation that there exists a close relation between the attainment of the vocalic target and the onset of the coda consonantal gesture, as is evident in the movement patterns for coda gestures shown in Figure 1B. Under the hypothesis of anti-phase coda coupling, in which the coda consonantal gesture is triggered 180° out of phase with the vowel, one might be tempted to assert that the duration of the vocalic gesture is precisely half of the period of its corresponding planning oscillator. This would naturally suggest that gestural durations are
equal to half-periods of planning oscillators. Yet we know that this cannot hold true because of the aforementioned variation in gestural durations across segment types and syllable positions. Hence the phase-coupling model of coda coordination is left with no straightforward way to explain the basic observation that coda gestures appear to be initiated at the offset of vocalic gestures.
The coupled oscillators model of articulatory phonology does well in accounting for the coordination of onset consonants and vowels, but has a number of limitations in the scope of phenomena it explains. For one, there is the aforementioned dilemma in explaining why coda consonantal gestures are initiated near the attainment of vowel movement targets. More importantly, there are a number of phenomena involving larger units of linguistic organization – syllables, words, etc. – that a speech production model would ideally accommodate. As we discuss below, it is known that the number of words in an utterance and the number of syllables in those words have effects on how quickly the utterance can be initiated. Although there have been several extensions of the coupled oscillatory systems approach to incorporate planning dynamics associated with syllables, feet, and phrases [31–36], it is not clear how these models can account for such phenomena. Below we consider a different class of model that is designed to account for factors influencing utterance initiation.
Sequential selection and competitive queuing
Early theories of the control of sequential movement were based on the concept of an associative chain. In this conception, motor units correspond to groups of motor neurons. One unit activates, effecting a motor response; sensory feedback from that response would activate the next unit in the chain, which would in turn effect a motor response and more feedback, activating the next unit, and so on. In a classic paper, Lashley argued that this view is untenable to explain behaviors involving serially ordered movements [1], taking particular objection to the assumption that units sequentially go from a quiescent state to an activated state and back again without substantial overlap in periods of activation. His arguments were based on the free combinability of units, the potential for associations to develop between non-adjacent elements in a sequence, and the occurrence of sequential errors. Lashley suggested an alternative view in which units are hierarchically organized by a central planning mechanism. Crucially, this mechanism would excite units prior to their execution, and it is the pattern of excitation which directs the control of serially ordered movements. In modern terms, this means that units in a sequence of movements are planned in parallel.
Effects of sequence length and unit complexity on temporal aspects of movement provide an important body of evidence for parallel planning [37–39]. In a series of experiments, Sternberg and colleagues demonstrated that the reaction time to initiate an utterance or typed sequence increases with the length of the sequence (e.g. the number of words) and complexity of the units (e.g. the number of syllables in each word). These effects are linear and additive, suggesting that the number of units within a given level of a motor response hierarchy (e.g. within word- and syllable-levels) influences response behavior. Moreover, the durations of the elements in the sequence are likewise increased by length and complexity. These findings are readily understood if movements are simultaneously planned and interact competitively so as to slow the time-course of selection.
Errors in sequencing provide a second body of evidence for parallel planning. Studies of speech errors suggest that units are active prior to, during, and subsequent to their production [40–42]. Regarding errors involving segmental units, these studies have shown that both anticipatory and perseveratory errors occur. Anticipatory errors involve the errorful production of a segment or group of segments prior to its expected location in a sequence, e.g. “[pl]eech planning”. Perseveratory errors involve the errorful production of a segment subsequent to its expected location in a sequence, e.g. “speech [sp]anning”. Sometimes these co-occur, resulting in an exchange (spoonerism): “[pl]eech [sp]anning”. A reasonable conclusion is that error patterns of this sort could only arise if errorful units are activated in planning well prior to and subsequent to their temporal location in a sequence.
Models of sequential movement planning have been developed to account for the effects of sequence length and complexity on reaction times, as well as the commonly observed anticipatory and perseveratory error patterns. A key idea in many approaches is the concept of motor plan selection, which is a process that dissociates anticipatory activation of motor programs from mechanisms of selection and execution. One example is the sequential selection model of Sternberg et al. [39]. Motor programs are stored in a memory buffer, then subsequently selected by a search mechanism that takes longer if there are more programs in the buffer. Subsequent to selection a motor program is executed and then re-enters the memory buffer. The tripartite division between activation, selection, and execution is a common theme in other approaches, including models which incorporate continuous activation dynamics.
An important advance in modeling sequential movement planning was developed in [5], which introduced the concept of competitive-queuing of action units. In this approach (see Figure 3), motor units are associated with activation variables. In a pre-selection planning stage, the motor plans become activated to varying degrees. The intention to initiate the movement sequence results in growth of activation and selection of the most highly active plan, which triggers its execution. The selected plan is subsequently suppressed (either directly through recurrent self-inhibition or indirectly through sensory-feedback mechanisms), and this allows the next most highly active plan to be selected, suppressed, and so on. A consequence of this design is that the relative activation of motor plans determines the order in which they are selected, and hence such models can simulate sequential selection. A number of models with a similar activation-dependent selection mechanism have been developed [6,43,44].
Competitive queuing dynamics are successful in accounting for effects of sequence length and unit complexity, as well as common error patterns. In such models, latency to select a motor plan depends upon the time for activation levels of that motor plan to win the competition for selection. Having more plans simultaneously active decreases the activation level of each plan, either due to inhibitory interactions between active plans or due to normalization of total activation. When a sequence contains more movements, the activation of each movement plan will be diminished by the presence of more inhibitory interactions and hence take longer to be selected. This predicts the aforementioned empirical observations: longer RTs and unit durations in longer movement sequences. This effect can be observed by comparing Figure 3A and 3B: the latency of initiation of the first unit is greater when there are more plans, and the intervals between selections of plans likewise increase. It is also straightforward to generate common error patterns in such models. If for some reason the relative activation pattern is altered early on, perhaps by noise or external influences, selection will proceed in an errorful way (Figure 3C). Dell showed that anticipatory errors can arise in this way, and that perseveratory errors can occur when a plan fails to be fully suppressed and is subsequently reselected [43].
Figure 3. Competitive selection model dynamics. (A) competitive selection of three units. (B) competitive selection of four units, with
movement initiation delayed relative to (A). (C) Sequencing error in which unit 3 is selected early.
doi:10.1371/journal.pone.0062800.g003
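The selection cycle summarized in Figure 3 (parallel activation, selection of the most active plan, suppression, repeat) can be sketched in a few lines. This is a toy illustration under stated assumptions rather than the model of [5] or the model developed in this paper: the function competitive_queue is hypothetical, activation grows linearly after the go-signal, total initial activation is normalized to 1, and a plan is selected when it crosses a fixed threshold. The sketch reproduces only the activation-dependent ordering, the longer first-selection latency with more plans, and an ordering error; it does not attempt to reproduce the increase in intervals between selections.

```python
def competitive_queue(weights, rate=1.0, threshold=1.0, dt=0.001):
    """Return [(plan_index, selection_time), ...] in order of selection.

    Assumptions (illustrative only): initial activations are the weights
    normalized to sum to 1, so each plan starts lower when more plans are
    active; all unselected plans grow at the same linear rate; a selected
    plan is suppressed to zero and leaves the competition.
    """
    total = sum(weights)
    act = [w / total for w in weights]
    pending = set(range(len(weights)))
    t = 0.0
    selections = []
    while pending:
        t += dt
        for i in pending:
            act[i] += rate * dt
        winner = max(pending, key=lambda i: act[i])
        if act[winner] >= threshold:
            selections.append((winner, round(t, 3)))
            act[winner] = 0.0        # suppression after selection
            pending.remove(winner)
    return selections

three = competitive_queue([3, 2, 1])      # cf. Figure 3A: three plans
four = competitive_queue([4, 3, 2, 1])    # cf. Figure 3B: four plans
error = competitive_queue([3, 1, 2])      # cf. Figure 3C: altered activation pattern

print("three plans:", three)
print("four plans: ", four)
print("perturbed:  ", error)
```

In this sketch the first plan is selected later with four plans than with three because normalization lowers every starting activation, and swapping the relative activation of the second and third plans makes the third plan win the competition early.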
Another phenomenon amenable to explanation in a competitive selection model is the dependence of RT on response preparation. In delayed naming tasks, the target response is known well in advance of the response cue, and hence response plans are active prior to their selection. In contrast, in immediate response tasks, response plans are given by the response cue itself, and hence the latency to initiate the response includes both retrieval and selection of the response plans. Part of this effect is due to competition in retrieval: more frequent syllables and more phonotactically probable segmental sequences are initiated more rapidly [45–47]. Reaction times based on articulatory movement initiation in a prepared response paradigm have been reported to be approximately 180 ms for stop-initial CV syllables [48]. Crucially, that study controlled pre-response articulatory posture to prevent speakers from configuring their vocal tract to facilitate rapid reaction time, which [49] found to be a common strategy. In contrast, RTs in an unprepared response task are substantially longer. While no studies have directly compared prepared and unprepared CV response RTs with the appropriate controls on pre-response posture, there are several studies which suggest that the additional delay in an immediate response task is on the order of 150–250 ms [49–51]. Sequential selection models can readily account for preparation effects by associating preparation with heightened activation prior to the intention to initiate a response. In unprepared responses, activation takes longer to reach the selection threshold because activation values begin relatively low; in prepared responses, activation values begin relatively high, and hence the selection threshold can be reached more rapidly.
The timing of articulatory movements, however, does not always conform to predictions made by strictly sequential models of selection. As observed above, onset consonantal gestures in CV and CCV forms are initiated quite closely in time with respect to the vocalic gesture, and these consonantal movements overlap extensively with the vocalic movement. This suggests that selection of the vocalic gesture is not delayed until the deselection of a preceding consonantal gesture. Studies of the effects of syllable structure on the reaction time to initiate a speech response also support the notion that onset consonants and vowels are not sequentially selected. Recall that a key prediction of selection models is the length effect: the reaction time to initiate a motor program increases with the number of units in the program. For example, if V, CV, and CCV responses are held to consist of one, two, and three units, respectively, a sequential selection model would predict that the RT to initiate a response should increase from V to CV to CCV. Yet this prediction has not been upheld. Experiments reported in [48,52] have found either no difference between latencies in CV and CCV responses, or have found certain CCV responses to be initiated more rapidly than CV responses (this latter, unexpected effect is observed in /sC/ onset clusters and can probably be explained by conditional probability in orthographic stimuli [52]). Hence the absence of length effects suggests that competitive selection is not sufficient for understanding movement planning at the level of articulatory gestures.
The activation-spin model: integrating selection and coordination
The available evidence indicates that there is an important distinction between how onset and coda consonantal gestures are produced: movements associated with onset consonants are co-selected and tightly coordinated with vocalic movements; in contrast, movements associated with coda consonants are sequentially selected relative to a preceding movement. Furthermore, the
number of words and syllables in an utterance has an effect on the RT to initiate the utterance, as does the extent to which the response has been prepared. Neither a strictly sequential selection mechanism nor a coupled oscillators coordination mechanism can fully account for all of these patterns. To address this problem, a hybrid model – the activation-spin model – is developed here, which integrates both selection and coordination mechanisms.
The model utilizes two main dynamical variables, activation and spin, and accordingly two types of coupling forces, activation-coupling and spin-coupling. The former regulates how activation variables interact, and the latter how spin variables interact. Each articulatory gesture, syllable, and word in the hierarchical structure of an utterance is associated with a distinct planning system, and for each planning system there is an activation variable and a spin variable. Previous efforts to integrate selection and coordination attempted to model these mechanisms using just a single variable that exhibited both oscillatory and competitive-selection dynamics [36,53,54]. However, strong interactions between oscillation- and selection-related changes in activation created a variety of difficulties. The use of two distinct variables is a key innovation of the approach presented here. Because there are two distinct variables in the model, there are two types of coupling forces: spin-coupling and activation-coupling, both of which can have a positive or negative valence. The generic effects of these coupling interactions are illustrated schematically in Figure 4 below. When two systems are attractively spin-coupled, their spin variables experience a force that acts to minimize their relative phase difference; when they are repulsively spin-coupled, the force acts to maximize their relative phase difference. When two systems are excitatorily activation-coupled, the systems act mutually to increase one another’s activation; when they are inhibitorily coupled, they act to mutually decrease one another’s activation. By hypothesis, attractive spin-coupling entails excitatory activation-coupling, and vice versa. The same entailments are not hypothesized to obtain between repulsive spin-coupling and inhibitory activation coupling, but excitatory coupling between two systems cannot co-occur with repulsive coupling, and inhibitory coupling cannot co-occur with attractive coupling. A further innovation is the use of a gating variable associated with each planning system, although the effects of this variable were implicitly present in [5]. The gating variables of inhibitorily coupled planning systems interact to prevent those systems from being co-selected.
The dynamics of activation in the model are responsible for selection of movement plans: inhibitory activation-coupling between plans, in combination with competitive gating variables, determines which plans are competitively selected. An important insight is that movement plans associated with onset consonants and vowels can be co-selected: their activation variables are not inhibitorily coupled and their gating variables are not competitive. In contrast, coda consonants are sequentially selected, because they exhibit inhibitory activation-coupling with other gestures and are competitively gated. A distinction is maintained from previous approaches [22,34] between planning systems, which correspond to pre-motor plans for units such as gestures, syllables, and words, and gestural (driving) systems, which correspond to lower-level articulatory goals and can be represented as intervals in a gestural score in which motor commands drive articulator movement [19]. Furthermore, phase-coupled spin variables accomplish coordinative timing by mediating between the selection of planning systems and the activation of driving systems. The spin variables are intrinsically oscillatory, and the model maps the relative phases of co-selected systems into post-selection delays of movement initiation. In other words, after co-selection, spin determines precisely when driving systems are activated, and hence governs the precision control of movement initiation.
Methods
Selection with activation variables
Each gestural plan is associated with an activation variable x, indexed by i. The index i corresponds to a gestural or prosodic planning system, as determined by the lexical content of an utterance. For current purposes, the prosodic planning systems
included in model simulations are syllables and words. There is a
hierarchical relation between planning systems, such that each
word system is associated with some number of syllable systems,
and likewise each syllable system is associated with some number
of gestural planning systems. These associations are determined by
the lexical content of the utterance. Although both prosodic and
gestural planning systems are selected, only gestural planning
systems drive movement.
The dynamics of each activation variable are governed by a
potential function Vx and corresponding vector field -dV/dx, such
that the time-derivative of x is equal to the negative of the
derivative of its potential function with respect to x, plus a
Gaussian stochastic noise term η_x (Eq. 1). The activation potential
(Eq. 2) is a function of three unit-specific variables, activation-coupling
(v), gating (w), and suppression (y), as well as activation (x) and
intrinsic decay. The activation potential function can be conceptualized
as a composite function that is the combination of several
forces acting upon the activation variable. First, there is an
intrinsic activation decay (cd). Second, there is a gating potential
(w). Third, there is an activation coupling potential (v) which
reflects inhibitory and excitatory interactions between units.
Fourth, there is a suppression potential whose contribution to
the vector field takes the form of a suppression-scaled exponential function of x, which is translated so that suppression is zero when x = 0.
Figure 4. Schematic illustration of effects of spin-coupling and activation coupling. Spin-coupling forces act to decrease or increase the phase difference between oscillatory systems. Activation-coupling forces act mutually to increase or decrease activation.
doi:10.1371/journal.pone.0062800.g004
Figure 5. Activation potential functions and corresponding vector fields from three stages of articulatory production: (A) prior to
response intention, (B) after response intention before selection, (C) after selection. Composite (black lines) and component functions
(colored lines) are shown.
doi:10.1371/journal.pone.0062800.g005
$$\dot{x}_i = c_{x_i}\left(-\frac{dV_i}{dx_i}\right) + c_{\eta x_i}\,\eta_{x_i} \tag{1}$$

$$V_i(v_i, w_i, x_i, y_i) = (c_d - w_i - v_i)\,\frac{x_i^2}{2} + y_i\left(e^{x_i} - x_i\right) \tag{2}$$

The gating variables w_i are limited to the range [−1, 1] and exhibit a constant growth or decay depending upon the selection of competing units (Eq. 3). When units i and j compete the sign of their activation coupling parameters is negative, i.e. X_ij < 0. The signs of the matrix X are symmetric but the magnitudes of the coupling strengths need not be. The variable q formalizes the notion of having a competitor selected: q_i is the number of currently selected units which compete with i. Selection occurs when x is greater than an arbitrary threshold τ, here assumed to have a constant value, τ = 1. Hence gating variables w will decrease to a value of −1 (fully closed) when competing systems are selected, and will increase to a value of 1 (fully open) when no other competing systems are selected. The growth/decay rates of the gating variables (c_gate) are large so that gates open and close quickly.

$$\dot{w}_i = \begin{cases} -c_{gate}, & q_i > 0 \\ c_{gate}, & q_i = 0 \end{cases}, \qquad -1 \le w \le 1 \tag{3}$$

$$v_i = \sum_j X_{ij}\,\frac{1}{1 + e^{-10(x_j - 0.5)}} \tag{4}$$

$$\dot{y}_i = c_y\,y_i, \qquad \begin{cases} y_i = 0 \wedge x_i > \tau: & y_i \leftarrow 0.1 \\ x_i < 0.1: & y_i \leftarrow 0 \end{cases} \tag{5}$$

The activation coupling term is the summation of the influence of other units on i (Eq. 4). If the sign of the coupling strength X_ij is negative, unit j exerts an inhibitory effect on i; conversely, if the sign of X_ij is positive, j exerts an excitatory effect on i. A sigmoid function is used to describe the magnitude of the activation coupling force exerted by j on i; the input to the sigmoid is the activation of unit j, which is translated and rescaled so that an activation of 0 has a negligible influence and an activation of 1 has a maximal influence. The suppression variable is governed by Eq. (5); prior to plan selection suppression is initially 0 and hence exhibits no growth. When the activation of a unit first surpasses the threshold, the value of its suppression variable is set to 0.1, and begins to rise with a growth rate of c_y. When the activation of that system falls below 0.1, the suppression variable is reset to 0.
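The interplay of gating, decay, and suppression in Eqs. 1, 2, and 5 can be illustrated with a minimal single-unit sketch. It is not the Simulink implementation used for the simulations reported here. It assumes a noise-free unit with no competitors, so the gate is held fully open (w = 1) and the coupling term is zero (v = 0); the parameter values `c_x`, `c_d`, and `c_y` are placeholders rather than the values in Table S1, and the function name `simulate_unit` is hypothetical. Integration is forward Euler with a 1 ms step, and the run stops at the first deselection.

```python
import math


def simulate_unit(x0, c_x=10.0, c_d=0.5, c_y=20.0, tau=1.0, dt=0.001, t_max=5.0):
    """Euler integration of one activation variable after its gate opens at t = 0.

    Assumptions (not from Table S1): w = 1, v = 0, no noise, placeholder gains.
    Returns (t_select, t_deselect) in seconds; either may be None.
    """
    x, y = x0, 0.0
    w, v = 1.0, 0.0
    steps = 0
    t_select = t_deselect = None
    while steps * dt < t_max:
        # dV/dx for V = (c_d - w - v) x^2 / 2 + y (e^x - x)   (Eq. 2)
        dV_dx = (c_d - w - v) * x + y * (math.exp(x) - 1.0)
        x += dt * c_x * (-dV_dx)   # Eq. 1 without the noise term
        y += dt * c_y * y          # Eq. 5: no growth while y == 0
        steps += 1
        if y == 0.0 and x > tau:   # first threshold crossing: selection
            y = 0.1
            t_select = steps * dt
        elif y > 0.0 and x < 0.1:  # activation has collapsed: reset suppression
            y = 0.0
            t_deselect = steps * dt
            break
    return t_select, t_deselect


prepared = simulate_unit(x0=0.5)    # relatively high activation at gate opening
unprepared = simulate_unit(x0=0.1)  # relatively low activation at gate opening
print(prepared, unprepared)
```

With the gate open and suppression at zero, activation grows exponentially at rate c_x(1 − c_d), so under these assumptions the time to reach threshold is approximately ln(τ/x0) / (c_x(1 − c_d)). This makes explicit why a higher initial activation shortens the latency to selection.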
Figure 5 shows example potential functions in the top row, with functions representing the magnitudes and signs of the corresponding vector fields in the bottom row. The composite (black lines) and component (colored) functions are shown in each example. Figure 5A depicts a state in which the gating variable is closed, which would characterize a system prior to the intention to initiate movement. Since the slope of the gating potential is positive, the corresponding vector field is negative, and hence the potential exerts a force that diminishes activation. The same is true for the intrinsic decay of activation and in this case activation coupling forces. Figure 5B shows the potential when the gating variable opens as the result of an intention to produce a response. For the gating potential, x = 0 becomes an unstable equilibrium. The magnitude of the force exerted by the gating variable contributes more to the composite potential than inhibitory coupling forces or intrinsic decay, and hence activation will rise. Figure 5C shows the potential after recurrent inhibition triggered by selection has induced suppression and the gating variable has closed, so that activation decays rapidly.
Coordination with spin variables
The spin variables exert additional control over the precise timing of movement initiation by imposing delays on the activation of driving variables. The delays depend upon the phases of gestural systems relative to their associated syllable. The spin variables are modeled with a phase angle θ_i which is 2π-periodic. The phase velocity (Eq. 6) is the sum of three components: an intrinsic frequency (ω), phase-coupling forces (derived from the relative phase potential V_φ), and a Gaussian noise term η_ω. Phase coupling forces are the negative derivative of the relative phase potential, with relative phase defined as φ_ij = θ_i − θ_j, following [55,56]. The parameter α_ij describes the phase-coupling force exerted by the spin variable of unit j on the spin variable of unit i. The signs of the matrix α are symmetric, but their magnitudes need not be. If α_ij is positive, an attractive spin-interaction exists between i and j, and the stable equilibria of V_φij are located at mod_2π(φ_ij) = 0. Conversely, if α_ij is negative, a repulsive spin-interaction exists between i and j, and mod_2π(φ_ij) = π are the stable equilibria.

$$\dot{\theta}_i = \omega - c_{\theta}\sum_j \frac{dV_{\varphi_{ij}}(\varphi_{ij})}{d\varphi_{ij}} + c_{\eta\omega}\,\eta_{\omega}, \qquad \theta = \mathrm{mod}_{2\pi}\,\theta \tag{6}$$

$$V_{\varphi_{ij}}(\varphi_{ij}) = -\alpha_{ij}\cos(\varphi_{ij}), \qquad \varphi_{ij} = \theta_i - \theta_j \tag{7}$$

Generating gestural scores
Gestural activation functions (which are distinct from planning activation) and a corresponding gestural score (see [19]) are obtained from driving variables (D). Only gestural planning systems are associated with driving variables, and furthermore, the control of driving variables is entirely feed-forward: they do not influence the activation, suppression, or gating variables of planning systems. The dynamics of the driving variables are grossly approximated by assuming that D is limited to the range [0,1] and receives a step increase when a delay variable d surpasses 0. When a unit has not been selected, the value of its delay variable d is 0 and the driving variable is quiescent. Upon selection (i.e. when activation becomes suprathreshold), d is set to the linearized relative phase upon selection (φ_sel) and grows at a constant rate. The linearized relative phase upon selection φ_sel is the difference between the phase of a unit and its corresponding anchoring unit. The anchoring unit for a gesture is assumed to be its associated syllable. The linearization is such that it maps phase differences in the interval [−π, π] to the interval [−π, 0], preserving the signs of unwrapped phase differences. When d exceeds 0, the driving variable D is activated with a unit step and subsequently decays at a rate of c_D. Hence selection of a unit induces activation of the driving system after a brief delay, with the timing of movement initiation delayed from selection in proportion to φ_sel. In this way, the coordinative interactions associated with spin exert an influence on precisely when movements are initiated.

$$\dot{d}_i = \begin{cases} \omega, & \varphi_{i\text{-}sel} \neq 0 \\ 0, & \varphi_{i\text{-}sel} = 0 \end{cases}$$

$$\varphi_{i\text{-}sel} = \begin{cases} \left(\mathrm{mod}_{\pi}(\theta_i - \theta_{anch}) - 2\pi\right)/2, & \varphi_{i,anch} > \pi \\ \left(\mathrm{mod}_{\pi}(\theta_i - \theta_{anch}) - \pi\right)/2, & \varphi_{i,anch} < -\pi \\ \left(\theta_i - \theta_{anch} - \pi\right)/2, & -\pi \le \varphi_{i,anch} \le \pi \end{cases} \tag{8}$$

$$\dot{D}_i = -c_D; \qquad d_i > 0:\ D_i \leftarrow 1; \qquad D \in [0,1] \tag{9}$$

Simulations and parameters
Differences between response forms (e.g. between a /CV/ and /CCVCC/ syllable, or between a one- and two-word response) correspond to systematic differences in the patterns of coupling interactions between component systems. A compact representation of these patterns is known as a coupling graph [22,34,36]. Coupling graphs shown in Figure 6 represent activation- and spin-interactions between planning systems, where attractive and excitatory coupling relations are indicated by solid lines, and repulsive and inhibitory relations are indicated by dashed lines. Figure 6A shows activation and spin coupling graphs for a variety of monosyllabic forms.
The coupling graphs are best understood in the context of a two-part generalization: when a pair of systems on the same level of the speech planning hierarchy interact, their activation and spin interactions will be inhibitory and/or repulsive, and when a pair of systems on different levels of the hierarchy interact, their interactions will be excitatory and attractive. An early version of this generalization has been called the principle of like interaction in [36]. For current purposes, we can view the speech planning hierarchy as consisting of three levels: words, syllables, and gestures. Each unit in the speech plan is associated with its own planning system, which has both a spin variable and an activation variable. Not all the variables of simultaneously active systems will interact, yet those interactions that do occur are important for planning and execution. Crucially, the nature of the interaction between a given pair of systems is constrained by the levels of the hierarchy with which those systems are associated.
According to the principle of like interaction, all between-level interactions are both excitatory and attractive. It follows that for a given syllable, all of its associated gestures will become more highly
Figure 6. Examples of activation and spin coupling graphs. Attractive/excitatory coupling relations (solid lines) and repulsive/inhibitory
coupling relations (dashed lines) are shown. (A) monosyllables; (B) multisyllabic and multi-word responses.
doi:10.1371/journal.pone.0062800.g006
activated when the syllable becomes more highly active, and their spin variables will experience attractive forces acting to minimize their phase angles relative to the syllable’s phase angle. It likewise follows that all within-level interactions will be inhibitory and/or repulsive. For articulatory gestures it is hypothesized that interacting consonantal and vocalic systems are repulsively spin-coupled, while only coda consonant gestures are inhibitorily activation-coupled (to each other and to vocalic gestures). Because inhibitory activation-coupling entails competitive gating, it follows that coda and vowel gestures cannot be co-selected, whereas onset and vowel gestures can be co-selected and their spin-coupling interactions can influence the timing of their initiation.
The principle of like interaction also obtains for prosodic systems such as syllables and words. Figure 6B shows coupling graphs for a disyllable /CV.CV/ and a two-word sequence. Syllable systems associated with the same word are inhibitorily coupled, and all word systems are inhibitorily coupled. In contrast, systems from different levels are excitatorily and attractively coupled if they are lexically associated. It is furthermore assumed that there are relatively low magnitude competitive coupling forces between heterosyllabic gestures and syllables associated with different words (these are not shown in the figure for purposes of clarity).
Since our primary interests are in the timing of response-initial movement initiation (i.e. reaction time) and the relative timing of consonant and vowel movements within syllables (i.e. onset/coda asymmetry), the values for parameters influencing movement durations are, for current purposes, of secondary concern. Moreover, the duration of time in which a gesture drives movement for a given speech sound is subject to many sources of systematic variation in natural speech. These include characteristics of the speech sound itself, speech rate, stress and boundary-adjacency [57], various forms of pragmatic and paralinguistic emphasis, as well as language-, dialect-, and speaker-specific variation. Hence there exists no "normal" movement duration for a given gesture. For practical purposes, the movement trajectories shown in Figure 1 were used as a guide. In those tokens the onset and coda consonantal movements last about 100–150 ms. The durations of vocalic gestural movements are longer and more variable (250–400 ms), since they depend to a larger extent on the preceding and following articulation(s). Note that the model controls the period of time in which gestural driving variables are active (greater than zero), whereas movement durations observable in kinematic data are commonly truncated due to gestural blending, which is determined in the task-dynamic model of [19] by a function that weights the contributions of simultaneously active gestures in the control of articulator movements. Here however, the decay rates of driving variables were selected by making the simplifying assumption that the durations of articulatory movements equate to the duration of time in which driving variables are active. To obtain these approximated movement durations, c_D = 10 for consonants and c_D = 5 for vowels (see Table S1 for further details).
Model equations were implemented as libraries in the Matlab Simulink environment, and simulations were run using the ode4 numerical solver and a fixed time step of 1 ms. Instantaneous frequencies ω of spin variables were set to 4 Hz, reflecting a typical syllable duration of 250 ms [58,59]. The values of all other simulation parameters are reported in Table S1.
Results and Discussion
This section presents the results of model simulations and compares them to empirical patterns. The empirical data considered here are speech reaction times and the relative timing of movement initiation in word onsets. It is shown that the model is able to simulate onset/coda timing asymmetries, relative timing of onset consonant and vowel gestures, effects of response preparation on reaction time, and effects of utterance composition (number of words and syllables per word) on reaction time. It should be noted that articulatory timing and reaction times exhibit variation from speaker to speaker and utterance to utterance; moreover, numerous experimental design factors or design-induced biases may contribute to this variation. The space of such factors is exceedingly large and mostly unexplored. Because of this, speech production models aim primarily to capture qualitative patterns in data. As long as a model is capable of simulating qualitative patterns, close fits to quantitative values of data can usually be easily obtained by selecting model parameters which optimize those fits. Hence even though the model can be parameterized to fit the data closely, assessment of the model
should be based on its ability to simulate a variety of empirical patterns in a qualitative sense.
Second, all of the existing empirical data related to the effects of utterance composition on reaction time are derived from acoustic measurements. The large sample sizes required to detect such effects are generally prohibitive for articulatory studies, which are more time-consuming and expensive than acoustic studies. Acoustic reaction times are measured using an algorithm which detects the onset of acoustic energy associated with vocal fold vibration during vowels. This response detection scheme does not detect word-initial voiceless consonants that precede a vowel. Furthermore, it is well-known that articulatory movements in a syllable precede the onset of vocal fold vibration by a substantial amount that depends on the composition of the syllable onset [48,49,52]. Because the model simulates the initiation of articulator movements, there is necessarily a discrepancy between the model predictions and the empirical data derived from acoustic reaction times. This is not problematic because we are primarily concerned with qualitative similarity between the model and empirical data.
Although the activation-spin model is relatively more complex than a simple coupled oscillators model or competitive selection model, one benefit of this complexity is that the model can simulate a wider range of empirical patterns involving articulatory timing and movement initiation; an additional benefit is that the model offers a coherent explanation for a diverse group of phonetic and phonological asymmetries between onset and coda consonants, which neither coordination nor selection mechanisms alone can account for. In spite of the complexity of the model, there are strong constraints on the hypothesized coupling parameters that govern model behavior: when between-level interactions are present, they are excitatory and attractive, and when within-level interactions are present, they are inhibitory and/or repulsive. This suggests there exists a more fundamental connection between the activation and spin variables and their
Figure 7. Simulations of onset/coda asymmetry and complex onsets/codas. (A) a CVC form ‘‘pot’’. (B) a CCVCC form ‘‘spots’’. Top rows of
panels show potential functions and phases from a sequence of five stages in the planning of the utterance. Bottom rows of panels show activation
variables and gestural driving functions.
doi:10.1371/journal.pone.0062800.g007
coupling interactions, though the nature of that fundamental gesture’s spin variable relative to the spin variable of its associated
connection currently remains elusive. syllable. Hence the coordinative interactions that influence spin
variables determine the precise timing of movement initiation. In a
Simulation of onset/coda asymmetry complex onset, syllable-gesture spin interactions are attractive,
The activation-spin model accounts for differences in how onset acting to minimize the relative phase of consonantal and vocalic
and coda consonantal movements are timed relative to vocalic spin variables; these attractive forces oppose the action of the
movements. In the model, onset consonantal gestures are not repulsive spin-coupling between gestures, which act to maximize
inhibitorily coupled to each other or to vocalic gestures, and hence their relative phase. Hence the equilibrium compromise between
onset consonantal gestures and vocalic gestures can be co-selected. these opposing forces depends on the ratio of the magnitude of
Furthermore, the timing of the initiation of onset and consonantal attractive syllable-gesture coupling forces to repulsive gesture-
gestures is precisely coordinated by spin dynamics. In contrast, gesture coupling forces.
coda consonantal gestures are competitively selected relative to a
preceding vowel or consonantal gesture. Figure 7A illustrates the Dependence of RT on preparation
selection and spin dynamics of articulatory plans in a CVC The activation-spin model accounts for effects of preparation on
response, /pat/ ‘‘pot’’. The top row of panels shows activation reaction time through differences in initial activation between
potential functions and planning system phases from five stages in prepared and unprepared responses. As discussed above, reaction
the planning and production of the response. The stages were time to initiate an utterance depends on whether response plans
chosen for illustrative purposes and their times are indicated in the have been prepared. The difference between RT in prepared and
bottom left panel, which shows planning system activation. Prior unprepared response tasks in the model is reflected by a difference
to the initiation of movement (stage 1), the articulatory plans for in activation when selection gates are first opened. The reaction
each consonant are active but below-threshold; gating variables time to initiate a response is defined as the time from when gating
are closed, and hence activation exhibits a gradual decay (cf. the variables are opened (representing an intention to initiate the
positive slope of the activation potentials). When the intention to response) to when the first articulatory driving variable rises above
produce the response is manifested, all gating variables are opened zero. For prepared responses, activation levels are relatively high
and activation levels rapidly increase (stage 2; cf. the negative slope when the gates are opened, and hence the initial plan reaches the
of the activation potentials). When the/p/ and/a/ plans rise above selection threshold rapidly. For unprepared responses, activation
the selection threshold, the gating variable of/t/ is closed (stage 3). levels are relatively low when gates are opened, and hence it takes
The relative phase of the/p/ and/a/ spin variables at this time longer for activation to grow to exceed the selection threshold.
determines precisely when their articulatory plans will be initiated This additional delay in reaction time (DRT) can be conceptual-
by imposing a relative-phase dependent delay on the activation of ized as a ‘‘retrieval period’’, in which motor plans are being
driving variables. In this same stage, suppression of/p/ and/a/ retrieved from long-term memory. Along with the activation level
plans begins to grow and their activation levels decrease. When/ prior to gate-opening, the activation potential gain cx also
p/ and/a/ activation fall below threshold, the gating variables are influences how quickly activation grows and hence the time
reopened (stage 4), and/t/ activation rises until this plan is also between the gate-opening and selection.
selected and subsequently suppressed (stage 5). The periods of time Initial activation and the activation potential gain cx were varied
in which the driving variables are active is shown in the bottom in simulations of CV syllables in order to determine parameter
right panel of Figure 7A.
The co-selection of the/p/ and/a/ allows for their associated
gestures to be initiated with a delay of around 40 ms, whereas the
competitive selection of the/t/ and/a/ leads to a more substantial
delay between movement initiation (about 180 ms). Note that the
order in which competitively interacting units are selected is
determined by relative activation levels when gates are opened:
the/p/ and/a/ plans were more highly active than the/t/ prior to
the initiation of the response. This activation difference is assumed
to be lexically specified, i.e. part of the long-term memory
associated with the form. Figure 7B illustrates a more complex
CCVCC form, in this case/spats/ ‘‘spots’’. Here it can be
observed that the onset consonant and vowel plans are co-selected,
while both coda consonant plans are sequentially selected after the
preceding plan is deselected. The difference between co-selection
and competitive selection of a pair of plans arises from the setting
of just a single parameter, activation-coupling, which in turn
determines whether gating variables are competitive. Inhibitorily
coupled plans are competitively selected, un-coupled plans can be
co-selected.
Because the model allows for co-selected movement plans to be
coordinated, it can simulate patterns of articulatory timing in
complex onsets, in particular the c-center effect. Figure 7B shows
that the initiations of the onset consonants in “spots” are displaced
in opposite directions from the vocalic gesture: this timing pattern
is attributable to the manner in which selection and coordination
interact: once a gesture is co-selected, the precise moment of its
initiation is delayed. The delay is determined by the phase of the
Figure 8. Dependence of RT on the activation potential gain (cx)
and initial activation (x0). Bold line shows prepared-response RT
= 180 ms; dashed lines show the corresponding range of ΔRT = [150, 250] ms
for a given value of cx.
doi:10.1371/journal.pone.0062800.g008
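The trade-off between gain and initial activation summarized in the Figure 8 caption can be illustrated with a toy calculation. This is a minimal sketch under an assumption made purely for illustration: after gate-opening, activation rises toward a ceiling of 1 as x(t) = 1 - (1 - x0) exp(-gain t), and the plan is selected when x reaches a threshold. The model's own activation equations are not reproduced here, and the names `reaction_time`, `gain_for_rt`, `x0_for_rt` and the threshold value 0.9 are hypothetical.

```python
import math

THRESHOLD = 0.9  # assumed selection threshold (hypothetical value)


def reaction_time(gain, x0, threshold=THRESHOLD):
    """Time (s) for x(t) = 1 - (1 - x0) * exp(-gain * t) to reach threshold."""
    if x0 >= threshold:
        return 0.0
    return math.log((1.0 - x0) / (1.0 - threshold)) / gain


def gain_for_rt(x0, target_rt, threshold=THRESHOLD):
    """Gain (1/s) for which a plan starting at x0 is selected after target_rt."""
    if x0 >= threshold:
        raise ValueError("x0 must lie below the selection threshold")
    return math.log((1.0 - x0) / (1.0 - threshold)) / target_rt


def x0_for_rt(gain, target_rt, threshold=THRESHOLD):
    """Initial activation that yields target_rt at the given gain.

    A negative result means no non-negative initial activation is slow enough.
    """
    return 1.0 - (1.0 - threshold) * math.exp(gain * target_rt)


# Hold the prepared RT at 180 ms while lowering the prepared initial activation,
# then ask which unprepared x0 values give a difference of 150 to 250 ms.
for x0_prep in (0.8, 0.7, 0.6):
    gain = gain_for_rt(x0_prep, 0.180)
    upper = x0_for_rt(gain, 0.180 + 0.150)
    lower = x0_for_rt(gain, 0.180 + 0.250)
    print(f"x0_prep={x0_prep:.2f} gain={gain:.2f} "
          f"x0_unprep between {lower:.3f} and {upper:.3f}")
```

Under these toy dynamics a lower prepared x0 requires a higher gain to hold RT at 180 ms, and at a high enough gain the required unprepared x0 becomes negative. That loosely mirrors the qualitative pattern described for Figure 8, although the numerical values have no empirical standing.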
values whose RT outcomes are consistent with behavioral findings
for simple CV syllables in both prepared (delayed) and unprepared
(immediate) response paradigms. Gating variables were opened
one time step (0.001 s) into the simulation, so that the effects of
activation decay are negligible and thus the initial activation is
equivalent to the activation level at gate-opening. There are two
parameters of interest: cx (activation gain) and x0-prep (initial
activation), and one outcome variable, RT. Hence the parameters
resulting in a given RT form a trajectory in the parameter space.
Furthermore, we are interested in x0-unprep (initial activation in
unprepared responses), which should result in ΔRT in the range of
150–250 ms. Figure 8 shows the dependence of RT on the
activation potential gain (cx) and initial activation (x0). As x0
decreases, a constant RT of 180 ms (the empirical value in [48])
can be achieved by increasing cx. For a given cx, there is a range of
unprepared-response x0 that result in ΔRT of 150–250 ms. Notice
that this range expands slightly as prepared-x0 decreases.
However, when prepared-x0 falls below about 0.45, it is not
possible to achieve a ΔRT of 150–250 ms before initial activation
of unprepared plans reaches zero.
Hence the model can successfully simulate the difference
between prepared- and unprepared-response reaction times by
manipulation of initial activation. However, the above results only
obtain specifically for single-word, monosyllabic CV responses
independent of variation in word structure. The RT values will
also depend on the excitatory activation coupling between gestures
and syllables, on coupling between syllables and words, on initial
activations of those systems, and on inhibitory activation coupling
forces that increase with utterance length and word complexity. In
the following section we consider two of these sources of variation:
effects of utterance length (number of words) and effects of word
complexity (number of syllables in each word).
Hierarchical effects on RT
The activation-spin model can be readily extended to incorporate
hierarchical effects of syllables and words in utterance
planning and execution. As discussed previously, it has been
shown that the RT in prepared responses depends on both the
number of syllables in the initial word of an utterance and on the
number of words in the utterance. Moreover, these effects were
observed to be linear and additive [39]. Simulations were
conducted in which the number of words in a prepared utterance
was varied from one to three and likewise the number of syllables
in each word was varied from one to three. Hence a two-word,
disyllabic utterance would have the form /CV.CV # CV.CV/,
while a three-word, trisyllabic utterance would have the form
/CV.CV.CV # CV.CV.CV # CV.CV.CV/. All parameters for a
given type of system were identical across simulations and
independent of position in the utterance (see Table S1), with the
exception of initial activations. Initial activations of syllables were
set to a constant percentage of the initial activation of their
associated word; likewise, initial activations of gestures were a
constant percentage of the initial activation of their associated
syllable – this ensures that in the absence of large noise
fluctuations, words, syllables, and gestures are selected in the
intended order.
Figure 9. Dependence of RT on the number of words (utterance
length) and the number of syllables per word (word complexity).
The magnitudes of length and complexity effects are comparable
in model simulations (squares) and empirical results (circles).
doi:10.1371/journal.pone.0062800.g009
Model-simulated effects of utterance length and word complexity
on RT are quite comparable to the empirical pattern. Figure 9
shows simulated RT as a function of the number of words in the
utterance and the number of syllables in each word (squares),
along with empirical data from [39] for comparison (filled circles).
The large discrepancy between simulated and empirical baseline
values follows from the fact that the empirical data are based on
acoustically measured RTs, whereas the simulated RTs are based
on the initiation of articulatory movement. The key correspondence
between model and empirical data pertains to the sizes of
the length and complexity effects. Excepting the relatively short
RT of a one-word monosyllable, the effects between other forms
are quite in line with the empirical results. For example, the
empirical effect of word length has a slope of about 10 ms/word,
and the simulated results are nearly the same for disyllabic and
trisyllabic words. The empirical effect of word complexity is such
that RT increased by 12 ms between monosyllabic and trisyllabic
forms. In simulations this value is about 16 ms and 18 ms for two-
and three-word utterances, respectively. Hence model-simulated
RTs accord fairly well with empirical observations. This behavior
of the model is due to the inhibitory activation coupling between
planning systems on the same level of the hierarchy. Syllables
inhibit each other, resulting in a word complexity effect; the same
holds for words, which results in the utterance length effect. In
both cases, the presence of a greater number of inhibitory
interactions diminishes activation of prosodic systems and thereby
reduces the extent to which they augment gestural activation; this
in turn results in a longer period of time for gestural activation to
exceed the selection threshold.
Accounting for onset/coda asymmetries
The success of a model can be judged by its ability to provide a
unified understanding of a group of apparently unrelated
phenomena. Phonetic and phonological differences between
onsets and codas are good candidates for a group of phenomena
which a theory of speech motor control should explain. A number
of such differences were listed in section 1, and are presented again
in Table 1 below. The coupled oscillators model hypothesizes that
coda consonantal gestures are anti-phase coordinated with a
preceding gesture, and hence attributes some of these patterns to
the relative instability of anti-phase coordination. In contrast, the
Table 1. Comparison of the standard coupled oscillators model and the activation-spin model in accounting for onset/coda
asymmetries.
Asymmetry                      Coupled oscillators   Activation-spin
restricted combinatoriality    ✓                     ✓
templatic constraints          -                     ✓
tonal capacity influence       -                     ✓
stress location influence      -                     ✓
compensatory lengthening       -                     ✓
instability in repetition      ✓                     ?
doi:10.1371/journal.pone.0062800.t001
activation-spin model associates coda consonantal gestures with
selection events that are distinct from selection of preceding
gestures. Here we evaluate how the two approaches – anti-phase
coordination and competitive selection – compare in accounting
for the patterns, and conclude that the concept of competitive
selection in the activation-spin model offers a more comprehensive
understanding of differences between onsets and codas.
In many languages, the distribution of coda consonants is more
restricted than that of onset consonants: onset consonants combine
relatively freely with vowels, while coda consonants may be
prohibited from occurring with certain vowels or may be
altogether disallowed in coda position. The coupled oscillators
model attributes the greater propensity for restrictions on codas
and dependency between codas and vowels to the relative
instability of anti-phase coupling of coda consonants [7], allowing
for individual combinations to be made more stable through
learning. In contrast, the activation-spin model attributes these
patterns to a greater likelihood of overlap between a coda
consonantal gesture and onset gestures from a subsequent syllable,
which is ultimately attributable to the independence of coda
gestural selection. When performing multiple selection events in
association with a syllable, gestures associated with the last
selection process are more likely to be obscured by gestures associated
with a following syllable. This can occur because the following
syllable is selected relative to the preceding syllable, not preceding
gestures.
Selection events proposed by the activation-spin model can be
related in a fundamental way to moraic theory and constraints on
the minimal forms of morphemes/syllables [9,12]. Many phonological
analyses posit the existence of a level of subsyllabic structure
known as the “mora,” where the number of moras associated with
a syllable often has consequences for a diverse range of patterns
involving word/syllable shape, tone, and stress. A short vowel in
the syllable nucleus constitutes one mora, long vowels and
diphthongs constitute two moras, and in many languages coda
consonants are held to be associated with a mora. Prohibitions on
codas are more common when the preceding syllable nucleus is
bimoraic, and this is typically viewed as a constraint against
trimoraic syllables. It is not clear in the context of the coupled
oscillators model why codas are substantially more restricted after
bimoraic syllable nuclei. But, if we hypothesize that a bimoraic
vowel involves two selection events, then the prohibition on codas
after bimoraic vowels can be seen as a constraint against
producing three selection events in association with a syllable. In
a sense, this hypothesis entails that a diphthong is bimoraic
because the component vowel gestures each occupy a selection
event, and a long vowel is bimoraic because the vocalic gesture is
selected twice. The trimoraic syllable constraint follows naturally
from the model, since gestures associated with the last selection
event in a syllable are more likely to be obscured by gestures
associated with the following syllable. Indeed, exceptions to
constraints against three tautosyllabic selection events are most
commonly observed in word-final syllables, where coda consonantal
articulations are least prone to being obscured by those of a
following syllable.
For another example of how selection can inform our
understanding of moraic theory, consider that some languages
require a morpheme to have at least two moras, such that
bimoraic CVV, CVC, and CV.CV morphemes are allowed while
a monomoraic CV is not [9]. This sort of requirement can be
interpreted as a requirement that a word involve at least two
selection events. It is not clear how the distinction between
in-phase and anti-phase coupling can account for this sort of
minimality constraint – it provides no obvious way to draw an
inherent connection between a CVC morpheme and a CVCV
morpheme. In contrast, the activation-spin model provides a
straightforward way of understanding such restrictions and
relating them to restrictions on coda combinatoriality.
Another set of phenomena that can be readily understood in
relation to selection events involves prosodic features such as tone
and pitch accents, which have recently been conceptualized as
gestures and have been shown to interact with consonantal and
vocalic articulatory gestures [60,61]. In languages with lexical
tone, syllables with codas or long vowels (i.e. bimoraic syllables)
often have a greater capacity to bear complex or contour tones, i.e.
tones which consist of multiple pitch targets. The activation-spin
model straightforwardly accounts for this observation because it
holds that a bimoraic syllable has an additional selection event in
which an additional tonal gesture can be co-selected and
coordinated. Along the same lines, in languages with stress, the
presence of a bimoraic syllable often influences the location of
stress, such that stress will occur on the first bimoraic syllable
relative to the beginning or end of a word. Because intonational
pitch accents always associate with stressed syllables, the attraction
of stress to bimoraic syllables can be viewed to result from the
availability of an additional selection event in which a component
of an intonational pitch accent gesture can be selected. In contrast,
the hypothesis that codas are anti-phase coordinated with a
preceding vowel does not seem to offer a straightforward
explanation for why stress can be influenced by syllable shape.
Compensatory lengthening is likewise another phenomenon
that falls out naturally from the activation-spin model, but which is
not easily amenable to explanation on the basis of coordination
alone. When a coda consonant is lost in the course of diachronic
sound change, it is sometimes observed that the preceding vowel
lengthens, apparently preserving the bimoraicity of the syllable;
this same compensatory lengthening does not occur when an onset
consonant is lost [14]. This asymmetry in phonological behavior
has been taken as evidence that onset consonants always share a
mora with the following vowel. The distinction can be related as
above to the hypothesis that coda consonantal gestures are
associated with an independent selection event, whereas onset
consonantal gestures are co-selected with the following vowel. If it
is hypothesized that the selection event associated with a coda can
be replaced by re-selection of the vocalic gesture, then the
preservation of moraic structure observed in compensatory
lengthening can be understood via selection mechanisms. In
contrast, the anti-phase coordination mechanism proposed by the
coupled oscillators model would be forced to interpret compensatory
lengthening as a situation in which a vocalic gesture
becomes anti-phase coordinated with itself, a situation which is
somewhat undesirable theoretically.
The last asymmetry listed in Table 1 less conclusively
differentiates the two models considered here. In accelerating
repetition tasks, a VC syllable often reorganizes into a CV syllable
at a sufficiently high repetition rate [15,16]. The coupled
oscillators model offers a conceptually appealing account of this
reorganization: the variation of a control parameter (rate) causes
the relatively less stable anti-phase coordination employed for VC
to become unstable, inducing a phase-transition to the in-phase
coordination of CV. This can moreover be related to similar
observations in nonspeech domains [21]. To account for this, the
activation-spin model must stipulate that the phase-transition is
accompanied by a loss of inhibitory interaction between the
consonantal and vocalic gesture, allowing for co-selection. Exactly
why rate acceleration would induce this loss of inhibitory
interaction is not immediately evident, although one possible
mechanism might involve habituation.
In sum, the activation-spin model offers a more comprehensive
unified explanation for a variety of phonetic and phonological
patterns involving onset/coda asymmetry. In particular, it
provides an intuitive explanation for morpheme/syllable minimality
and maximality constraints, interactions between tone/stress
and syllable structure, and compensatory lengthening. A purely
coordination-based model does not readily account for these
patterns. Moreover, the activation-spin model provides an equally
plausible account of the relative prevalence of restrictions on
codas, while maintaining a coordination-based account of
articulatory timing in syllable onsets.
Conclusion
The activation-spin model was shown to successfully account for
important behavioral phenomena involving the timing of speech
articulation. These include: (1) onset-/coda-vowel timing asymmetry,
(2) coordinative timing patterns in simplex and complex
onsets, (3) the dependence of reaction time on utterance
preparation, and (4) effects of utterance length and word
complexity on reaction time. More importantly, the model offers
a unified explanation for a diverse collection of phonetic and
phonological asymmetries. Here these findings and their implications
are discussed further. In addition, we consider potential
neurological grounding and future elaborations of the model.
The model was successful in replicating the asymmetry in
consonant-vowel timing between onset and coda consonants.
Onset consonantal gestures and vocalic gestures are known to be
initiated close together in time. The model simulates subtle
temporal effects involving the coordination of syllable-initial
gestures by means of a compromise between repulsive spin
interactions among consonantal gestures and attractive spin
interactions between consonantal gestures and syllables. Coordination
of word-initial consonants and vowels is observed in a
number of languages where there is a c-center effect such that the
consonantal gestures in a C1C2V form appear to be temporally
displaced in opposite directions from the vocalic gesture. For
example, evidence of this effect has been observed in English
[23,24], French [62], Georgian [63], and for most clusters in
Italian [26]. Here we have shown that this pattern can be
understood to arise from the absence of inhibitory coupling/
competitive gating among word-initial consonants and the
following vowel. In contrast, there are some languages in which
the immediately prevocalic consonant of a complex onset is not
displaced when additional consonants occur initially, i.e. the
timing of the prevocalic consonant relative to the vowel does not
differ between CCCV, CCV, and CV forms. This has been found
to be the case in Moroccan Arabic [64], Slovak [65], Tashlhiyt
Berber [66,67], and for /sC/ clusters in Italian [26], where
non-prevocalic word-initial consonants are not syllabified as onsets.
This pattern can be understood to arise when all consonants
except for the prevocalic one are sequentially selected and hence
do not influence the coordinative interactions between the
prevocalic consonant and vowel.
The notion that the selection and coupling interactions of onsets
and codas differ is important because of its explanatory potential
for a number of phonological patterns related to syllable structure.
Like the coupled oscillators model, the activation-spin model can
account for the restricted combinatoriality of coda consonants. In
addition, its use of competitive selection for coda consonants
allows for a number of phonological patterns to be understood.
These include minimality/maximality constraints on morpheme/
syllable shapes, influences of syllable structure on tone and stress,
and compensatory lengthening patterns.
In addition to accounting for gestural timing patterns, the model
captures reaction time phenomena related to utterance preparation,
utterance length, and word complexity. In all of these cases,
the role of activation is key. Reaction times in prepared utterances
are faster than those in unprepared responses because initial
activation levels are higher in the former, and hence it takes less
time for the activation of the initial movement plan to reach the
selection threshold. The effects of utterance length and word
complexity arise from inhibitory interactions between activation
variables: when there are more plans, each plan experiences more
inhibition and so it takes longer for the initial plan to reach the
selection threshold. This holds for word systems and for syllable
systems within words. A similar explanation is feasible for related
effects not specifically modeled here. For example, reaction times
are shorter in production tasks for more frequent words and
syllables [46,47,68]. This could arise if the frequency of a word or
syllable is associated with a higher level of initial activation.
It should be noted that the activation-spin model is highly
nonlinear and incorporates many parameters; thus there is a risk
that the model may be overparameterized, allowing for spuriously
accurate fits of the data. However, the quantitative values of
empirical speech data depend on factors such as task design and
measurement procedure, the possibilities of which have been only
sparsely sampled experimentally. For this reason, a more
appropriate assessment of the model is with regard to its
qualitative correspondence with empirical patterns. In this regard,
overparameterization is less of a concern, because a model cannot
produce a good qualitative correspondence with a range of data if
it does not have an adequate structure. This of course raises the
question of whether the model structure is motivated in any way.
On that note, many aspects of the model can potentially be
related to neural mechanisms or other aspects of motor control.
The intent in describing these relations here is not to make
definitive claims about neural mechanisms, but rather, to shed
light on elements of the model which might otherwise seem
entirely arbitrary, and to suggest ways in which the activation-spin
model might be further extended or grounded in neurophysiology.
For one, the intrinsic frequency of planning systems (in simulations
above: v = 4 Hz, Tv = 0.250 s) was chosen for consistency with
typical syllable durations in speech, but notably this value also
corresponds to the low end of the range associated with cortical
theta rhythms, 4–8 Hz. Theta-band power has been observed in
MEG and EEG recordings in premotor cortex and has been
argued to provide a coordinative mechanism for motor execution
[69–72], to facilitate the maintenance of plans in working memory
[73], and to regulate sensorimotor integration [74,75]. Although
theta oscillations have been ascribed diverse functions and their
exact role in motor planning remains unclear, it is possible that
spin dynamics are instantiated through neural mechanisms that
give rise to these low-frequency oscillations.
Another aspect of the activation-spin model that can be related
to neural systems involves the relation between activation, gating,
and selection. The gating variables serve to prevent the
simultaneous selection of competing active plans. In this regard,
the relation between activation and gating is analogous to the
relation hypothesized to exist between the frontal cortex and the
basal ganglia: motor plans are actively maintained in pre-frontal/
pre-motor cortex and are selectively disinhibited by the basal
ganglia, which results in selection of motor responses [76–78]. The
structure of the model implies that articulatory plans associated
with different segmental units need not be competitively gated.
This is certainly true within a given segment, where several
gestures are often coordinated. For example, the segment [n] in
the word “nap” involves the co-selection of three gestures: a
tongue-tip constriction gesture to create a closure against the
palate, a velum lowering gesture to allow air to flow through the
nasal cavity, and a glottal adduction gesture to enable vibration of
the vocal folds. The model asserts that the same co-selection can
occur between gestures associated with different segments.
The role of suppression variables in the activation-spin model is
to decrease activation after a plan has been selected. This relation
between activation and suppression may be associated with models
of feedback control in which motor activation is modulated by a
mismatch between sensory target representations and a motor
efference copy [79–83]. For example, in the hierarchical state
feedback control model of [79], activation of abstract lexical and
conceptual representations activates corresponding pre-motor
representations as well as sensory representations of expected
somatosensory and auditory feedback corresponding to the motor
actions. When active, the sensory representations exert excitatory
influences upon the motor representations, but the motor
representations inhibit corresponding sensory ones. These
motor-to-sensory inhibitory connections are hypothesized to serve the
function of a forward model, i.e. a representation of the expected
consequences of motor actions expressed in sensory coordinates.
As the motor plans become highly active, they inhibit the sensory
representations and hence diminish their own activation. In other
words, when the state feedback model observes a match between
target sensory feedback and anticipated sensory feedback, it turns
off motor activity. The dynamics of the suppression variables in
the activation-spin model may be seen as a manifestation of this
process. Furthermore, the model can potentially be extended to
operate with state feedback control by incorporating sensory target
representations and sensory-motor interactions for each
articulatory gesture.
One open issue in the model regards the nature of articulatory
targets and how those targets are mapped to movements of
individual articulators. Here we assumed the task-dynamic model
[19], in which the activation of a gesture displaces an equilibrium
expressed in coordinates of vocal tract geometry; the motion of a
gestural system to the new equilibrium drives the movement of
individual articulators through a weighted inverse model that
relates tract geometry to articulator positions. However, there are
viable alternative control frameworks, such as threshold control
theory [84–86], or forward-inverse neuromotor computation
[87,88] that can accomplish the control of movement trajectories.
Since the activation-spin model addresses the relatively high-level
processes of motor selection and initiation, it could potentially be
used to govern any of these control schemes.
Future development of the activation-spin model should also
attempt to model the effects of diverse sources of variation on
kinematic properties such as movement durations, velocities, and
targets. One challenge in addressing this is that empirically
observed movement durations do not correspond precisely to the
durations of time in which the model’s gestural driving variables
are active. Specifically, due to gestural blending and the overlap of
gestures sharing the same articulatory effectors, the durations of
physical movements will be truncated relative to the period of time
in which their associated gestures are active [19]. In the model
presented here, movement durations follow simply from the decay
rates of driving variables, but there are many factors which
interact with movement plans to influence durations. Among these
are the articulatory context in which a movement occurs, speech
rate, rhythm, and proximity to prosodic boundaries. One
possibility that warrants consideration is that suprathreshold
gestural planning activation may modulate gestural driving
variables, such that driving variables decay more slowly when
gestural planning activation is higher, as might be the case for
gestural planning systems associated with rhythmic or prosodic
prominence. Such ideas have been considered in [53,54], but have
not yet been integrated into the present model.
In sum, this paper investigated a dynamical model of sequential
speech movement planning. It was argued that sequencing and
coordination of movement arise from distinct cognitive mechanisms
associated with two dynamical variables, activation and spin.
In combination with a competitive gating mechanism and a
selection threshold, activation regulates the ordered selection of
movements. Spin variables oscillate and their relative phases exert
precision control over the timing of the execution of co-selected
movements, such as tautosyllabic onset consonants. Because
selection and coordination are general phenomena, this model
can be applied to non-speech domains of motor control. The
model presents the first unified dynamical approach to investigating
timing phenomena across several levels of the speech planning
hierarchy, from words to syllables to articulatory gestures.
Moreover, the model draws a novel connection between moraic
theory and selection mechanisms, associating moras with selection
events. This enables the model to provide a unified explanation for
a diverse body of linguistic phenomena.
Supporting Information
Table S1 Model parameters used for simulations of
hierarchical coupling shown in Figure 9.
(DOCX)
References
1. Lashley K (1951) The problem of serial order in behavior. L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley. 112–135.
2. Martin JG (1972) Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review 76(6): 487–509.
3. Keele SW, Ivry RI (1987) Modular analysis of timing in motor skill. The psychology of learning and motivation 21: 183–228.
4. Summers JJ (1977) The relationship between the sequencing and timing components of a skill. Journal of Motor Behavior 9(1): 49–59.
5. Grossberg S (1978) A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. Progress in theoretical biology 5: 233–374.
6. Bullock D, Rhodes B (2002) Competitive queuing for planning and serial performance. CAS/CNS Technical Report Series 3: 1–9.
7. Goldstein L, Byrd D, Saltzman E (2006) The role of vocal tract gestural action units in understanding the evolution of phonology. Action to language via the mirror neuron system: 215–249.
8. Zec D (2007) The syllable. The Cambridge handbook of phonology: 161–194.
9. McCarthy JJ, Prince A (1986) Prosodic Morphology I: constraint interaction and satisfaction. Technical Report #32, Rutgers University for Cognitive Science 1996.
10. Hyman LM (2007) Universals of tone rules: 30 years later. Tones and tunes: Studies in word and sentence prosody: 1–34.
11. Gordon M (2002) A typology of contour tone restrictions. Studies in Language 25: 423–462.
12. Hyman LM (1985) A theory of phonological weight. Dordrecht: Foris Publications.
13. Gordon M (2006) Syllable weight: phonetics, phonology, typology. Routledge.
14. Kavitskaya D (2002) Compensatory lengthening: phonetics, phonology, diachrony. Routledge.
15. Stetson RH (1951) Motor phonetics; a study of speech movements in action.
16. Tuller B, Kelso JA (1990) Phase transitions in speech production and their perceptual consequences. Attention and performance 13.
17. Browman C, Goldstein L (1989) Articulatory gestures as phonological units. Phonology 6: 201–251.
18. Browman C, Goldstein L (1992) Articulatory phonology: An overview. Phonetica 49: 155–180.
19. Saltzman E, Munhall K (1989) A dynamical approach to gestural patterning in speech production. Ecological Psychology 1: 333–382.
20. Hoole P, Nguyen N (1999) Electromagnetic articulography. Coarticulation–Theory, Data and Techniques, Cambridge Studies in Speech Science and Communication: 260–269.
21. Haken H, Kelso JAS, Bunz H (1985) A theoretical model of phase transitions in human hand movements. Biological cybernetics 51: 347–356.
22. Browman C, Goldstein L (2000) Competing constraints on intergestural coordination and self-organization of phonological structures. Bulletin de la Communication Parlée 5: 25–34.
23. Browman C, Goldstein L (1988) Some notes on syllable structure in articulatory phonology. Phonetica 45: 140–155.
34. Saltzman E, Nam H, Krivokapic J, Goldstein L (2008) A task-dynamic toolkit for modeling the effects of prosodic structure on articulation. Proceedings of the 4th international conference on speech prosody. Brazil: Campinas. 175–184.
35. Tilsen S (2009) Multitimescale dynamical interactions between speech rhythm and gesture. Cognitive science 33: 839–879.
36. Tilsen S (2009) Toward a dynamical interpretation of hierarchical linguistic structure. UC Berkeley Phonology Lab Annual Report: 462–512.
37. Henry FM, Rogers DE (1960) Increased response latency for complicated movements and a “memory drum” theory of neuromotor reaction. Research Quarterly of the American Association for Health, Physical Education, & Recreation 31: 448–458.
38. Sternberg S, Monsell S, Knoll RL, Wright CE (1978) The latency and duration of rapid movement sequences: Comparisons of speech and typewriting. Information processing in motor control and learning: 117–152.
39. Sternberg S, Knoll R, Monsell S, Wright C (1988) Motor programs and hierarchical organization in the control of rapid speech. Phonetica 45: 175–197.
40. Fromkin VA (1971) The non-anomalous nature of anomalous utterances. Language: 27–52.
41. Shattuck-Hufnagel S (1979) Speech errors as evidence for a serial-ordering mechanism in sentence production. Sentence processing: Psycholinguistic studies presented to Merrill Garrett. 295–342.
42. Stemberger JP (1985) An interactive activation model of language production. Progress in the psychology of language 1: 143–186.
43. Dell GS (1986) A spreading-activation theory of retrieval in sentence production. Psychological review 93: 283–321.
44. Bohland JW, Bullock D, Guenther FH (2010) Neural representations and mechanisms for the performance of simple speech sequences. Journal of cognitive neuroscience 22: 1504–1529.
45. Levelt W, Wheeldon L (1994) Do speakers have access to a mental syllabary? Cognition 50: 239–269.
46. Cholin J, Levelt W, Schiller NO (2006) Effects of syllable frequency in speech production. Cognition 99: 205–235.
47. Cholin J, Levelt W (2009) Effects of syllable preparation and syllable frequency in speech production: Further evidence for syllabic units at a post-lexical level. Language and cognitive processes 24: 662–684.
48. Mooshammer C, Goldstein L, Nam H, McClure S, Saltzman E, et al. (2012) Bridging planning and execution: Temporal planning of syllables. Journal of Phonetics 40: 347–389.
49. Kawamoto AH, Liu Q, Mura K, Sanchez A (2008) Articulatory preparation in the delayed naming task. Journal of Memory and Language 58: 347–365.
50. Klapp ST (2003) Reaction time analysis of two types of motor preparation for speech articulation: Action as a sequence of chunks. Journal of Motor Behavior 35: 135–150.
51. Wheeldon L, Lahiri A (1997) Prosodic Units in Speech Production. Journal of Memory and Language 37: 356–381.
52. Kawamoto AH, Kello CT (1999) Effect of onset cluster complexity in speeded naming: A test of rule-based approaches. Journal of Experimental Psychology: Human Perception and Performance 25: 361–375.
24. Marin S, Pouplier M (2010) Temporal organization of complex onsets and codas
in American English: Testing the predictions of a gestural coupling model. 53. Tilsen S (2011) Metrical regularity facilitates speech planning and production.
Motor Control 14: 380–407. Laboratory Phonology 2: 185–218.
25. Honorof DN, Browman C (1995) The center or edge: How are consonant 54. Tilsen S (2011) Effects of syllable stress on articulatory planning observed in a
clusters organized with respect to the vowel. Proceedings of the XIIIth stop-signal experiment. Journal of Phonetics 39: 642–659.
international congress of phonetic sciences. Vol. 3. 552–555. 55. Kuramoto Y (1975) Self-entrainment of a population of coupled non-linear
26. Hermes A, Grice M, Mücke D, Niemann H (2008) Articulatory indicators of oscillators. International symposium on mathematical problems in theoretical
syllable affiliation in word initial consonant clusters in Italian. Proceedings of the physics. 420–422.
8th International Seminar on Speech Production. Strasbourg, Vol. 151. 433– 56. Acebrón JA, Bonilla LL, Vicente CJP, Ritort F, Spigler R (2005) The Kuramoto
436. model: A simple paradigm for synchronization phenomena. Rev Mod Phys 77:
27. Kuhnert B, Hoole P, Mooshammer C (2006) Gestural overlap and C-center in 137–185.
selected French consonant clusters. Proc. 7th International Seminar on Speech 57. Byrd D, Saltzman E (2003) The elastic phrase: Modeling the dynamics of
Production. 327–334. boundary-adjacent lengthening. Journal of Phonetics 31: 149–180.
28. Nam H, Saltzman E (2003) A competitive, coupled oscillator model of syllable 58. Peterson GE, Lehiste I (1960) Duration of syllable nuclei in English. The Journal
structure. Proceedings of the 15th International Conference on Phonetic of the Acoustical Society of America 32: 693–703.
Sciences. Barcelona, Spain. 2253–2256. 59. Crystal TH, House AS (1990) Articulation rate and the duration of syllables and
29. Öhman SE. (1967) Numerical model of coarticulation. The Journal of the stress groups in connected speech. The Journal of the Acoustical Society of
Acoustical Society of America 41: 310. America 88: 101–112.
30. Browman C, Goldstein L (1990) Gestural specification using dynamically- 60. Goldstein L (2012) Coupling of tone and constriction gestures in pitch accents
defined articulatory structures. Journal of Phonetics 18: 299–320. Doris Mucke, Hosung Nam, Anne Hermes and. Consonant Clusters and
31. O’Dell M, Nieminen T (1999) Coupled oscillator model of speech rhythm. Structural Complexity 26: 205.
Proceedings of the XIVth international congress of phonetic sciences. Vol. 2. 61. Gao M (2009) Gestural coordination among vowel, consonant and tone gestures
1075–1078. in Mandarin Chinese. Chinese Journal of Phonetics 2: 43–50.
32. Cummins F, Port R (1998) Rhythmic constraints on stress timing in English. 62. Kühnert B, Hoole P, Mooshammer C, others (2006) Gestural overlap and C-
Journal of Phonetics 26: 145–171. center in selected French consonant clusters. Proc. 7th International Seminar on
33. Port RF (2003) Meter and speech. Journal of Phonetics 31: 599–611. Speech Production. 327–334.
63. Goldstein L, Chitoran I, Selkirk E (2007) Syllable structure as coupled oscillator 75. Fries P, others (2005) A mechanism for cognitive dynamics: neuronal
modes: evidence from Georgian vs. Tashlhiyt Berber. Proceedings of the XVIth communication through neuronal coherence. Trends in cognitive sciences 9:
International Congress of Phonetic Sciences. 241–244. 474–480.
64. Shaw JA, Gafos AI, Hoole P, Zeroual C (2011) Dynamic invariance in the 76. Frank MJ, Loughry B, O’Reilly RC (2001) Interactions between frontal cortex
phonetic expression of syllable structure: a case study of Moroccan Arabic and basal ganglia in working memory: a computational model. Cognitive,
consonant clusters. Phonology 28: 455–490. Affective, & Behavioral Neuroscience 1: 137–160.
65. Pouplier M, Beňuš Š (2011) On the phonetic status of syllabic consonants: 77. Hazy TE, Frank MJ, O’Reilly RC (2007) Towards an executive without a
Evidence from Slovak. Laboratory phonology 2: 243–273. homunculus: computational models of the prefrontal cortex/basal ganglia
66. Chitoran I, Goldstein L (2006) Testing the phonological status of perceptual system. Phil Trans R Soc B 362: 1601–1613. doi:10.1098/rstb.2007.2055.
recoverability: Articulatory evidence from Georgian. Proc. of the 10th 78. DeLong MR, Georgopoulos AP (2011) Motor Functions of the Basal Ganglia.
Conference on Laboratory Phonology, Paris, June 29th–July 1st. 69–70. Comprehensive Physiology. John Wiley & Sons, Inc. 1017–1061.
67. Hermes A, Ridouane R, Muücke D, Grice M (2011) Kinematics of syllable 79. Hickok G (2012) Computational neuroanatomy of speech production. Nature
structure in Tashlhiyt Berber: The case of vocalic and consonantal nuclei. 9th Reviews Neuroscience 13: 135–145.
International Seminar on Speech production. Montreal, Canada. 401–408.
80. Tian X, Poeppel D (2010) Mental imagery of speech and movement implicates
68. Balota DA, Chumbley JI (1985) The locus of word-frequency effects in the
the dynamics of internal forward models. Front Psychol 1: 166. doi:10.3389/
pronunciation task: Lexical access and/or production? Journal of Memory and
fpsyg.2010.00166.
Language 24: 89–106.
69. Giraud A-L, Kleinschmidt A, Poeppel D, Lund TE, Frackowiak RSJ, et al. 81. Guenther FH (2006) Cortical interactions underlying the production of speech
(2007) Endogenous Cortical Rhythms Determine Cerebral Specialization for sounds. J Commun Disord 39: 350–365. doi:10.1016/j.jcomdis.2006.06.013.
Speech Perception and Production. Neuron 56: 1127–1134. doi:10.1016/ 82. Houde JF, Jordan MI (1998) Sensorimotor Adaptation in Speech Production.
j.neuron.2007.09.038. Science 279: 1213–1216. doi:10.1126/science.279.5354.1213.
70. Fukai T (1999) Sequence generation in arbitrary temporal patterns from theta- 83. Hickok G, Houde J, Rong F (2011) Sensorimotor Integration in Speech
nested gamma oscillations: a model of the basal ganglia–thalamo-cortical loops. Processing: Computational Basis and Neural Organization. Neuron 69: 407–
Neural Networks 12: 975–987. doi:10.1016/S0893-6080(99)00057-X. 422. doi:10.1016/j.neuron.2011.01.019.
71. Popivanov D, Mineva A, Krekule I (1999) EEG patterns in theta and gamma 84. Feldman AG (1986) Once more on the equilibrium-point hypothesis (lambda
frequency range and their probable relation to human voluntary movement model) for motor control. Journal of motor behavior 18: 17–54.
organization. Neuroscience Letters 267: 5–8. doi:10.1016/S0304- 85. Ostry DJ, Feldman AG (2003) A critical evaluation of the force control
3940(99)00271-2. hypothesis in motor control. Experimental Brain Research 153: 275–288.
72. Jantzen K, Kelso J (2007) Neural coordination dynamics of human sensorimotor 86. Feldman AG, Levin MF (2009) The equilibrium-point hypothesis–past, present
behavior: A Review. Handbook of brain connectivity: 421–461. and future. Progress in Motor Control: 699–726.
73. Raghavachari S, Kahana MJ, Rizzuto DS, Caplan JB, Kirschen MP, et al. 87. Kawato M (1999) Internal models for motor control and trajectory planning.
(2001) Gating of Human Theta Oscillations by a Working Memory Task. Current opinion in neurobiology 9: 718–727.
J Neurosci 21: 3175–3183. 88. Haruno M, Wolpert DM, Kawato M (2001) Mosaic model for sensorimotor
74. Donner TH, Siegel M (2011) A framework for local cortical oscillation patterns. learning and control. Neural computation 13: 2201–2220.
Trends in Cognitive Sciences 15: 191–199. doi:10.1016/j.tics.2011.03.007.