Active Inference: A Process Theory
Karl Friston
k.friston@ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.
Thomas FitzGerald
thomas.fitzgerald@ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.,
and Max Planck–UCL Centre for Computational Psychiatry and Ageing Research,
London WC1B 5BE, U.K.
Francesco Rigoli
f.rigoli@ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.
Philipp Schwartenbeck
philipp.schwartenbeck.12@alumni.ucl.ac.uk
Wellcome Trust Centre for Neuroimaging, UCL, London WC1N 3BG, U.K.;
Max Planck–UCL Centre for Computational Psychiatry and Ageing Research,
London, WC1B 5BE, U.K.; Centre for Neurocognitive Research, University
of Salzburg, 5020 Salzburg, Austria; and Neuroscience Institute,
Christian-Doppler-Klinik, Paracelsus Medical University Salzburg,
A-5020 Salzburg, Austria
Giovanni Pezzulo
giovanni.pezzulo@gmail.com
Institute of Cognitive Sciences and Technologies, National Research Council,
00185 Rome, Italy
This article describes a process theory based on active inference and be-
lief propagation. Starting from the premise that all neuronal processing
(and action selection) can be explained by maximizing Bayesian model
evidence—or minimizing variational free energy—we ask whether neu-
ronal responses can be described as a gradient descent on variational free
energy. Using a standard (Markov decision process) generative model, we
derive the neuronal dynamics implicit in this description and reproduce
a remarkable range of well-characterized neuronal phenomena. These in-
clude repetition suppression, mismatch negativity, violation responses,
place-cell activity, phase precession, theta sequences, theta-gamma cou-
pling, evidence accumulation, race-to-bound dynamics, and transfer of
dopamine responses. Furthermore, the (approximately Bayes’ optimal)
behavior of simulated agents provides a formal account of both epistemic
(explorative) and pragmatic (exploitative) behavior.
1 Introduction
There has been a paradigm shift in the cognitive neurosciences over the
past decade toward the Bayesian brain and predictive coding (Ballard, Hin-
ton, & Sejnowski, 1983; Rao & Ballard, 1999; Knill & Pouget, 2004; Yuille &
Kersten, 2006; De Bruin & Michael, 2015). At the same time, there has been
a resurgence of enactivism, emphasizing the embodied aspect of percep-
tion (O’Regan & Noë, 2001; Friston, Mattout, & Kilner, 2011; Ballard, Kit,
Rothkopf, & Sullivan, 2013; Clark, 2013; Seth, 2013; Barrett & Simmons, 2015;
Pezzulo, Rigoli, & Friston, 2015). Even in consciousness research and phi-
losophy, related ideas are finding traction (Clark, 2013; Hohwy, 2013, 2014).
Many of these developments have informed (and have been informed by) a
variational principle of least free energy (Friston, Kilner, & Harrison, 2006;
Friston, 2012), namely, active (Bayesian) inference.
However, the enthusiasm for Bayesian theories of brain function is ac-
companied by an understandable skepticism about their usefulness, par-
ticularly in furnishing testable process theories (Bowers & Davis, 2012).
Indeed, one could argue that many current normative theories fail to pro-
vide detailed and physiologically plausible predictions about the processes
that might implement them. And when they do, their connection with
a normative or variational principle is often obscure. In this work, we
show that process theories can be derived in a relatively straightforward
way from variational principles. The level of detail we consider is fairly
coarse; however, the explanatory scope of the resulting process theory is
remarkable—and provides an integrative (and simplifying) perspective on
many phenomena that are studied in systems neuroscience. The aim of this
article is to describe the basic ideas and illustrate the emergent processes
using simulations of neuronal responses. We anticipate revisiting some is-
sues in depth: in particular, a companion paper focuses on learning and
the emergence of habits as a natural consequence of observing one’s own
behavior (Friston et al., 2016).
This article has three sections. The first describes active inference, com-
bining earlier formulations of planning as inference (Botvinick & Tous-
saint, 2012; Friston et al., 2014) with Bayesian model averaging (FitzGerald,
Dolan, & Friston, 2014) and learning (FitzGerald, Dolan, & Friston, 2015).
Importantly, action (i.e., policy selection), perception (i.e., state estimation),
and learning (i.e., reinforcement learning) all minimize the same quantity:
variational free energy. This refinement of previous schemes considers an
explicit representation of past and future states, conditioned on competing
policies. This leads to Bayesian belief updates that are informed by beliefs
about the future (prediction) and context learning that is informed by beliefs
about the past (postdiction). Technically, these updates implement a form
of Bayesian smoothing, with explicit representations of states over time,
which include future (i.e., counterfactual) states. Furthermore, the implicit
variational updates have some biological plausibility in the sense that they
eschew neuronally implausible computations. For example, expectations
about future states are sigmoid functions of linear mixtures of the pre-
ceding and subsequent states. An alternative parameterization, which did
not appeal to explicit representations over time, would require recursive
matrix multiplication, for which no neuronally plausible implementation
has been proposed. Under this belief parameterization, learning is medi-
ated by classical associative (synaptic) plasticity. The remaining sections
use simulations of foraging in a radial maze to illustrate some key aspects
of inference and learning, respectively.
The inference section describes the behavioral and neuronal correlates
of belief updating during inference or planning, with an emphasis on elec-
trophysiological correlates and the encoding of precision by dopamine. It
illustrates a number of phenomena that are ubiquitous in empirical stud-
ies. These include repetition suppression (de Gardelle, Waszczuk, Egner,
& Summerfield, 2013), violation and omission responses (Bendixen, San-
Miguel, & Schroger, 2012), and neuronal responses that are characteris-
tic of the hippocampus, namely, place cell activity (Moser, Rowland, &
Moser, 2015), theta-gamma coupling, theta sequences and phase precession
(Burgess, Barry, & O’Keefe, 2007; Lisman & Redish, 2009). We also touch on
dynamics seen in parietal and prefrontal cortex, such as evidence accumula-
tion and race-to-bound or threshold (Huk & Shadlen, 2005, Gold & Shadlen,
2007; Hunt et al., 2012; Solway & Botvinick, 2012; de Lafuente, Jazayeri, &
Shadlen, 2015; FitzGerald, Moran, Friston, & Dolan, 2015; Latimer, Yates,
Meister, Huk, & Pillow, 2015).
The final section considers context learning and illustrates the transfer
of dopamine responses to conditioned stimuli, as agents become familiar
with experimental contingencies (Fiorillo, Tobler, & Schultz, 2003). We con-
clude with a brief demonstration of epistemic foraging. The aim of these
simulations is to illustrate how all of the phenomena emerge from a sin-
gle imperative (to minimize free energy) and how they contextualize each
other.
2 Active Inference

This section describes active inference, in which action, perception, and
learning all minimize the same quantity, namely,
variational free energy (Friston, 2013). This leads to some surprisingly sim-
ple update rules for action, perception, policy selection, learning, and the
encoding of uncertainty or its complement, precision. Although some of
the intervening formalism looks complicated, what comes out at the end
are update rules that will be familiar to many readers (e.g., integrate-and-
fire dynamics with sigmoid activation functions and plasticity with asso-
ciative and decay terms). This means that the underlying theory can be
tied to neuronal processes in a fairly straightforward way. Furthermore, the
formalism accommodates a number of established normative approaches,
thereby providing an integrative framework.
In principle, the scheme described in this section can be applied to any
paradigm or choice behavior. Indeed, earlier versions have been used to
model waiting games (Friston et al., 2013), the urn task and evidence accu-
mulation (FitzGerald, Schwartenbeck, Moutoussis, Dolan, & Friston, 2015),
trust games from behavioral economics (Moutoussis, Trujillo-Barreto, El-
Deredy, Dolan, & Friston, 2014; Schwartenbeck, FitzGerald, Mathys, Dolan,
Kronbichler et al., 2015), addictive behavior (Schwartenbeck, FitzGerald,
Mathys, Dolan, Wurst et al., 2015), two-step maze tasks (Friston, Rigoli
et al., 2015), and engineering benchmarks such as the mountain car prob-
lem (Friston, Adams, & Montague, 2012). It has also been used in the setting
of computational fMRI (Schwartenbeck, FitzGerald, Mathys, Dolan, & Fris-
ton, 2015).
In brief, active inference separates the problems of optimizing action
and perception by assuming that action fulfills predictions based on in-
ferred states of the world. Optimal predictions are therefore based on (sen-
sory) evidence that is evaluated using a generative model of (observed)
outcomes. This allows one to frame behavior as fulfilling optimistic pre-
dictions, where the optimism is prescribed by prior preferences or goals
(Friston et al., 2014). In other words, action realizes predictions that are
biased toward preferred outcomes. More specifically, the generative model
entails beliefs about future states and policies, where policies that lead to
preferred outcomes are more likely. This enables action to realize the next
(proximal) outcome predicted by the policy that leads to (distal) goals. This
behavior emerges when action and inference maximize the evidence or
marginal likelihood of the model generating predictions. Note that action
is prescribed by predictions of the next outcome and is not itself part of
the inference process. This separation of action and perceptual inference or
state estimation can be understood by associating action with peripheral
reflexes in the motor system that fulfill top-down motor predictions about
how we move (Feldman, 2009; Adams, Shipp, & Friston, 2013).
The models considered in this article include states of the world in
the past and the future. This enables agents to select policies that will
maximize model evidence in the future by minimizing expected free en-
ergy. Furthermore, it enables learning about contingencies based on state
transitions that are inferred retrospectively. We will see that this leads to a
Bayes-optimal arbitration between epistemic (explorative) and pragmatic
(exploitative) behavior.
Expression and description (from Table 1):
$\mathbf{B}^\pi_\tau = \mathbf{B}(u = \pi(\tau)) \in [0, 1]$ and $\bar{\mathbf{B}}^\pi_\tau = \ln \mathbf{B}^\pi_\tau$: transition probability for hidden states
under each action prescribed by a policy at a particular time, and its logarithm.
2.1 The Generative Model. The generative model is at the heart of (ac-
tive) Bayesian inference. In simple terms, the generative model is just a way
of formalizing beliefs about the way outcomes are caused. Usually a genera-
tive model is specified in terms of the likelihood of outcomes, given their
causes and the prior probability of those causes. Inference then corresponds
to inverting the model, which means computing the posterior probability
of (unknown or hidden) causes, given observed outcomes. In approximate
Bayesian inference, this entails optimizing an approximate posterior so that
it minimizes variational free energy. In other words, the difficult problem
of exact Bayesian inference is converted into an easy optimization prob-
lem, where the approximate posterior minimizes a (variational free energy)
functional of observed outcomes, under a given generative model. We will
see later that when variational free energy is minimized, it approximates
the (negative) log evidence or marginal likelihood of the outcomes, namely,
the probability of the outcomes under the generative model.
In our case, the generative model can be parameterized in a general way
as follows, where the model parameters are η = {a, b, d, β}:
$$
\begin{aligned}
P(\tilde{o}, \tilde{s}, \pi, \eta) &= P(\pi)P(\eta)\prod_{t=1}^{T} P(o_t \mid s_t)\,P(s_t \mid s_{t-1}, \pi) \\
P(o_t \mid s_t) &= \mathrm{Cat}(\mathbf{A}) \\
P(s_{t+1} \mid s_t, \pi) &= \mathrm{Cat}(\mathbf{B}(u = \pi(t))) \\
P(s_1 \mid s_0) &= \mathrm{Cat}(\mathbf{D}) \\
P(\pi) &= \sigma(-\gamma \cdot \mathbf{G}(\pi)) \\
P(\mathbf{A}) &= \mathrm{Dir}(a) \\
P(\mathbf{B}) &= \mathrm{Dir}(b) \\
P(\mathbf{D}) &= \mathrm{Dir}(d) \\
P(\gamma) &= \Gamma(1, \beta)
\end{aligned} \tag{2.1}
$$
The corresponding approximate posterior is factorized into the following
marginals, whose sufficient statistics (expectations) are denoted in boldface:

$$
\begin{aligned}
Q(s_t \mid \pi) &= \mathrm{Cat}(\mathbf{s}^\pi_t) \\
Q(\pi) &= \mathrm{Cat}(\boldsymbol{\pi}) \\
Q(\mathbf{A}) &= \mathrm{Dir}(\mathbf{a}) \\
Q(\mathbf{B}) &= \mathrm{Dir}(\mathbf{b}) \\
Q(\mathbf{D}) &= \mathrm{Dir}(\mathbf{d}) \\
Q(\gamma) &= \Gamma(1, \boldsymbol{\beta})
\end{aligned} \tag{2.2}
$$
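For readers who prefer a concrete data structure, the following sketch (in Python/NumPy, not part of the original scheme) assembles the arrays that parameterize a generative model of this form for an arbitrary discrete problem; the dimensions, random contingencies, and concentration values are hypothetical placeholders.

```python
import numpy as np

def normalize(x, axis=0):
    """Normalize columns (or a vector) so that probabilities sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=axis, keepdims=True)

# Hypothetical dimensions: outcomes, hidden states, actions, and time steps
n_outcomes, n_states, n_actions, T = 4, 3, 2, 3

# Likelihood P(o_t | s_t) = Cat(A): one column per hidden state
A = normalize(np.random.rand(n_outcomes, n_states))

# Transitions P(s_{t+1} | s_t, u) = Cat(B(u)): one matrix per action
B = np.stack([normalize(np.random.rand(n_states, n_states)) for _ in range(n_actions)])

# Initial state prior P(s_1) = Cat(D)
D = normalize(np.ones(n_states))

# Dirichlet (concentration) priors over A, B, D and the prior rate beta on precision
a, b, d, beta = A * 64.0, B * 64.0, D * 8.0, 1.0

# Policies: each policy is a sequence of actions, pi(t) in {0, ..., n_actions - 1}
policies = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
```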
Action is then selected to realize the outcome predicted under posterior
beliefs about the next state, by minimizing the expected outcome prediction
error under each allowable action:

$$
\begin{aligned}
\boldsymbol{\varepsilon}^u_{t+1} &= \mathbf{o}_{t+1} - \mathbf{o}^u_{t+1} \\
\mathbf{o}_{t+1} &= \mathbf{A}\,\mathbf{s}_{t+1} \\
\mathbf{o}^u_{t+1} &= \mathbf{A}\,\mathbf{B}(u)\,\mathbf{s}_t \\
\mathbf{s}_t &= \sum_\pi \pi_\pi \cdot \mathbf{s}^\pi_t
\end{aligned} \tag{2.3}
$$
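A minimal sketch of how the action update in equation 2.3 could be realized, assuming arrays shaped as in the previous listing; the use of a squared error to score the discrepancy between predicted outcomes is an illustrative choice rather than the paper's specification.

```python
import numpy as np

def select_action(A, B, pi_post, s_pi_t, s_pi_next):
    """Choose the action whose predicted outcome best matches the outcome
    predicted under the Bayesian model average of policies (cf. equation 2.3).

    pi_post   : (n_policies,) posterior expectations over policies
    s_pi_t    : (n_policies, n_states) expected states now, under each policy
    s_pi_next : (n_policies, n_states) expected states at the next time step
    """
    # Bayesian model averages over policies
    s_t = pi_post @ s_pi_t            # s_t = sum_pi pi_pi * s^pi_t
    s_next = pi_post @ s_pi_next
    o_next = A @ s_next               # outcome predicted by the model average

    errors = []
    for u in range(B.shape[0]):
        o_u = A @ (B[u] @ s_t)        # outcome expected if action u were taken
        errors.append(np.sum((o_next - o_u) ** 2))  # outcome prediction error
    return int(np.argmin(errors))
```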
2.3 Free Energy and Expected Free Energy. In active inference, all the
heavy lifting is done by minimizing free energy with respect to expectations
about hidden states, policies, and parameters. Variational free energy can
be expressed as a function of these posterior beliefs in a number of ways:
$$
\begin{aligned}
F &= E_Q[\ln Q(x) - \ln P(x, \tilde{o})] \\
&= \underbrace{D[Q(x)\,\|\,P(x \mid \tilde{o})]}_{\text{divergence}} - \underbrace{\ln P(\tilde{o})}_{\text{log evidence}} \\
&= \underbrace{D[Q(x)\,\|\,P(x)]}_{\text{complexity}} - \underbrace{E_Q[\ln P(\tilde{o} \mid x)]}_{\text{accuracy}},
\end{aligned} \tag{2.4}
$$

where $x = (\tilde{s}, \pi, \mathbf{A}, \mathbf{B}, \mathbf{D}, \gamma)$ collects the hidden states, policies, and parameters.
Because the divergence can never be less than zero, minimizing free energy makes the
approximate posterior $Q(x) \approx P(x \mid \tilde{o})$ and renders free energy an upper bound on
negative log evidence.
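As a toy numerical illustration of this decomposition (not taken from the paper), the following sketch evaluates the complexity and accuracy terms for a two-state, two-outcome model and confirms that, at the exact posterior, free energy equals the negative log evidence; all numbers are hypothetical.

```python
import numpy as np

def variational_free_energy(q, prior, A, o_idx, eps=1e-16):
    """F = D[Q(s)||P(s)] - E_Q[ln P(o|s)]  (complexity minus accuracy)."""
    q = np.asarray(q, dtype=float)
    complexity = np.sum(q * (np.log(q + eps) - np.log(prior + eps)))
    accuracy = np.sum(q * np.log(A[o_idx, :] + eps))
    return complexity - accuracy

A = np.array([[0.9, 0.1],      # P(o | s): columns index hidden states
              [0.1, 0.9]])
prior = np.array([0.5, 0.5])
o_idx = 0                       # the first outcome was observed

# The exact posterior minimizes F, at which point F equals -ln P(o)
posterior = A[o_idx] * prior / (A[o_idx] @ prior)
print(variational_free_energy(posterior, prior, A, o_idx))  # ~0.693 = -ln 0.5
print(-np.log(A[o_idx] @ prior))                            # the bound is tight here
```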
By analogy, the expected free energy of a policy is a functional of beliefs about
future states and outcomes, $G(\pi) = \sum_\tau G(\pi, \tau)$ with
$G(\pi, \tau) = E_{\tilde{Q}}[\ln Q(s_\tau \mid \pi) - \ln P(o_\tau, s_\tau \mid \tilde{o}, \pi)]$, where
$\tilde{Q} = Q(o_\tau, s_\tau \mid \pi) = P(o_\tau \mid s_\tau)\,Q(s_\tau \mid \pi) \approx P(o_\tau, s_\tau \mid \tilde{o}, \pi)$ and $Q(o_\tau \mid s_\tau, \pi) = P(o_\tau \mid s_\tau)$.
In the expected free energy, the relative entropy becomes the mutual
information between hidden states and the outcomes they cause (and vice
versa), while the log evidence becomes the log evidence expected under
predicted outcomes. By associating the log-prior over outcomes with utility
or prior preferences, U(oτ ) = ln P(oτ ), the expected free energy can also
be expressed in terms of epistemic and extrinsic value (the penultimate
equality in equation 2.5). This means that extrinsic value is the (log) evidence
for a generative model expected under a particular policy. In other words,
because our model of the world entails prior preferences, any outcomes that
provide evidence for our model (and implicit preferences) have pragmatic
or extrinsic value. In practice, utilities are defined only to within an additive
constant, such that the prior probability of an outcome is a softmax function
of utility: P(oτ ) = σ (U(oτ )). This means prior preferences depend only on
utility differences and are inherently context sensitive (Rigoli, Friston, &
Dolan, 2016).
Epistemic value is the expected information gain (i.e., mutual informa-
tion) afforded to hidden states by future outcomes and vice-versa.1 We
will see below that epistemic value can be thought of as driving curiosity
and novelty-seeking behavior, by which we resolve uncertainty and ig-
norance. A final rearrangement shows that complexity becomes expected
cost—namely, the KL divergence between the posterior predictions and
prior preferences—while accuracy becomes the accuracy expected under
predicted outcomes (i.e., negative ambiguity). This last equality in equa-
tion 2.5 shows how expected free energy can be evaluated relatively easily;
it is just the divergence between the predicted and preferred outcomes plus
the ambiguity (i.e., entropy) expected under predicted states.
In summary, expected free energy is defined in relation to prior beliefs
about future outcomes. These define the expected cost or complexity and
complete the generative model. It is these priors that lend inference and
action a purposeful or goal-directed aspect because they represent prefer-
ences or goals. These preferences define agents in terms of characteristic
states they expect to occupy and, through action, tend to frequent.
There are several interpretations of expected free energy that appeal to,
and contextualize, established constructs. For example, maximizing epis-
temic value is equivalent to maximizing (expected) Bayesian surprise
(Schmidhuber, 1991; Itti & Baldi, 2009), where Bayesian surprise is the KL
divergence between posterior and prior beliefs. This can also be interpreted
in terms of the principle of maximum mutual information or minimum re-
dundancy (Barlow, 1961; Linsker, 1990; Olshausen & Field, 1996; Laughlin,
2001). This is because epistemic value is the mutual information between
hidden states and observations: I(Sτ , Oτ |π ) = H[Q(sτ |π )] − H[Q(sτ |oτ , π )].
In other words, it reports the reduction in uncertainty about hidden states
afforded by observations. Because the KL divergence or information gain
can never be negative, this epistemic value is itself nonnegative.
1 Note that the negative mutual information (which is never positive) is not an expected
KL divergence (which is never negative). This is because the expectation is under the joint
distribution over outcomes and hidden states. Furthermore, extrinsic value is never
positive, which means that the best one can do is to have an extrinsic value of zero; in
other words, a preferred outcome is expected with probability one.
In terms of risk and ambiguity, the expected free energy for each policy and
time point can be written as

$$
G(\pi, \tau) = \underbrace{\mathbf{o}^\pi_\tau \cdot (\bar{\mathbf{o}}^\pi_\tau - \mathbf{U}_\tau)}_{\text{risk}} + \underbrace{\mathbf{s}^\pi_\tau \cdot \mathbf{H}}_{\text{ambiguity}},
$$

where $\mathbf{H}$ is the vector of entropies of $P(o_\tau \mid s_\tau)$ over hidden states and

$$
\begin{aligned}
\mathbf{o}^\pi_\tau &= \breve{\mathbf{A}} \cdot \mathbf{s}^\pi_\tau, \qquad \bar{\mathbf{o}}^\pi_\tau = \ln \mathbf{o}^\pi_\tau \\
\bar{\mathbf{A}} &= E_Q[\ln \mathbf{A}] = \psi(\mathbf{a}) - \psi(\mathbf{a}_0) \\
\breve{\mathbf{A}} &= E_Q[\mathbf{A}_{ij}] = \mathbf{a} \times \mathbf{a}_0^{-1}, \qquad \mathbf{a}_{0ij} = \sum_i \mathbf{a}_{ij}.
\end{aligned}
$$
The two terms in the first expression for expected free energy represent risk-
and ambiguity-sensitive contributions, respectively, where utility is a vector
of preferences over outcomes. This decomposition lends a formal meaning
to risk and ambiguity: risk is the relative entropy or uncertainty about
outcomes, in relation to preferences, while ambiguity is the uncertainty
about outcomes given the state of the world. This is largely consistent with
the use of risk and ambiguity in economics (Kahneman & Tversky, 1979;
Zak, 2004; Knutson & Bossaerts, 2007; Preuschoff, Quartz, & Bossaerts,
2008), where ambiguity reflects uncertainty about the context (e.g., which
lottery is currently in play).
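The risk plus ambiguity form above is straightforward to evaluate. The following is a sketch for a single policy and future time point, assuming a likelihood matrix A (its posterior expectation), expected states s, and a vector of utilities U as defined in the text.

```python
import numpy as np

def expected_free_energy(A, s_pi_tau, U, eps=1e-16):
    """G(pi, tau) = risk + ambiguity for one policy at one future time point.

    A        : (n_outcomes, n_states) expected likelihood mapping
    s_pi_tau : (n_states,) expected hidden states under the policy at time tau
    U        : (n_outcomes,) utilities, i.e. log prior preferences (relative values)
    """
    o_pi = A @ s_pi_tau                                  # predicted outcomes under the policy
    risk = o_pi @ (np.log(o_pi + eps) - U)               # divergence from prior preferences
    H = -np.sum(A * np.log(A + eps), axis=0)             # outcome entropy for each hidden state
    ambiguity = s_pi_tau @ H                             # entropy expected under predicted states
    return risk + ambiguity
```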
In summary, the above formalism suggests that expected free energy can
be carved in two complementary ways. First, it can be decomposed into
a mixture of epistemic and extrinsic value, promoting explorative, novelty
seeking, and exploitative, reward-seeking behavior, respectively (Friston,
Rigoli et al., 2015). Equivalently, minimizing expected free energy can be
formulated as minimizing a mixture of expected cost or risk and ambiguity.
This completes our description of free energy. We now turn to belief updat-
ing that is based on minimizing free energy under the generative model we
have described.
$$
\left.
\begin{aligned}
\mathbf{s}^\pi_\tau &= \sigma(\bar{\mathbf{A}} \cdot o_\tau + \bar{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1} + \bar{\mathbf{B}}^\pi_\tau \cdot \mathbf{s}^\pi_{\tau+1}) \\
\boldsymbol{\pi} &= \sigma(-\mathbf{F} - \gamma \cdot \mathbf{G}) \\
\boldsymbol{\beta} &= \beta + (\boldsymbol{\pi} - \boldsymbol{\pi}_0) \cdot \mathbf{G}
\end{aligned}
\;\right\}\ \text{Inference}
$$

$$
\left.
\begin{aligned}
&\bar{\mathbf{A}} = \psi(\mathbf{a}) - \psi(\mathbf{a}_0), \qquad \mathbf{a} = a + \textstyle\sum_\tau o_\tau \otimes \mathbf{s}_\tau \\
&\bar{\mathbf{B}} = \psi(\mathbf{b}) - \psi(\mathbf{b}_0), \qquad \mathbf{b}(u) = b(u) + \textstyle\sum_{\pi(\tau)=u} \pi_\pi \cdot \mathbf{s}^\pi_\tau \otimes \mathbf{s}^\pi_{\tau-1} \\
&\bar{\mathbf{D}} = \psi(\mathbf{d}) - \psi(\mathbf{d}_0), \qquad \mathbf{d} = d + \mathbf{s}_1
\end{aligned}
\;\right\}\ \text{Learning} \tag{2.7}
$$
For notational simplicity, we have used Bπτ = B(π (τ )), D = Bπ0 sπ0 , γ = 1/β,
and π0 = σ (−γ · G). Usually one would iterate the equalities in equation
2.7 until convergence. However, we can also obtain the solution in a robust
and biologically more plausible fashion using a gradient descent on free
energy (see appendixes B and C):
$$
\begin{aligned}
\dot{\bar{\mathbf{s}}}^\pi_\tau &= \partial_s \mathbf{s}^\pi_\tau \cdot \boldsymbol{\varepsilon}^\pi_\tau \\
\mathbf{s}^\pi_\tau &= \sigma(\bar{\mathbf{s}}^\pi_\tau) \\
\dot{\boldsymbol{\beta}} &= \gamma^2 \varepsilon^\gamma \\
\boldsymbol{\varepsilon}^\pi_\tau &= (\bar{\mathbf{A}} \cdot o_\tau + \bar{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1} + \bar{\mathbf{B}}^\pi_\tau \cdot \mathbf{s}^\pi_{\tau+1}) - \bar{\mathbf{s}}^\pi_\tau \\
\varepsilon^\gamma &= (\beta - \boldsymbol{\beta}) + (\boldsymbol{\pi} - \boldsymbol{\pi}_0) \cdot \mathbf{G}.
\end{aligned} \tag{2.8}
$$
This converts the discrete updates above into dynamics for inference that
minimize state and precision prediction errors ετπ = −∂s F and ε γ = ∂γ F,
where these prediction errors are free energy gradients.
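As a rough sketch of these dynamics, the following fragment performs one (forward Euler) step of the state update for a single policy and epoch; the packaging of the likelihood and transition messages, and the use of a transpose for the backward message, are implementation assumptions rather than prescriptions of the scheme.

```python
import numpy as np

def softmax(v):
    v = v - v.max()
    return np.exp(v) / np.exp(v).sum()

def state_gradient_step(lnA, lnB_prev, lnB_next, o, s_prev, s_next, v, dt=0.25):
    """One Euler step of the state dynamics in equation 2.8 for one (policy, epoch) pair.

    lnA      : (n_outcomes, n_states) expected log likelihood
    lnB_prev : (n_states, n_states) expected log transition into this epoch
    lnB_next : (n_states, n_states) expected log transition out of this epoch
    o        : (n_outcomes,) one-hot observed outcome
    s_prev, s_next : (n_states,) expectations at the previous and subsequent epochs
    v        : (n_states,) current log expectations (depolarization)
    """
    s = softmax(v)                                          # expected states (firing rates)
    message = lnA.T @ o + lnB_prev @ s_prev + lnB_next.T @ s_next
    eps = message - v                                       # state prediction error
    dv = (np.diag(s) - np.outer(s, s)) @ eps                # postsynaptic current (free energy gradient)
    return v + dt * dv, s
```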
Solving these equations produces posterior expectations that minimize
free energy to provide Bayesian estimates of hidden variables. This means
that expectations change over several timescales: a fast timescale that up-
dates posterior beliefs about hidden states after each observation (to min-
imize free energy over peristimulus time) and a slower timescale that
updates posterior beliefs as new observations are sampled (to mediate
evidence accumulation over observations; see also Penny, Zeidman, &
Burgess, 2013). Finally, at the end of each sequence of observations (i.e.,
trial of observation epochs), the expected (concentration) parameters are
updated to mediate learning over trials (FitzGerald, Dolan, & Friston, 2015).
These updates are remarkably simple and have intuitive (neurobiological)
interpretations: in effect, the state updates combine
prior expectations about hidden states with the likelihood of the current
observation (Kass & Steffey, 1989). However, the scheme does not use con-
ventional forward and backward sweeps (Penny et al., 2013; Pezzulo, Rigoli,
& Chersi, 2013), because all future and past states are encoded explicitly.
In other words, representations always refer to the same hidden state at
the same time in relation to the start of the trial, not in relation to the cur-
rent time. This may seem counterintuitive, but this form of spatiotemporal
(place and time) encoding finesses belief updating considerably and, as we
will see later, has a degree of plausibility in relation to empirical findings.
The formulation in equation 2.8 is important because it describes dy-
namics that can be related to neuronal processes. In other words, we move
a variational Bayesian scheme toward a process theory that can predict neu-
ronal responses during state estimation and action selection (e.g., Solway
& Botvinick, 2012). This process theory associates the expected probabil-
ity of a state with the probability of a neuron (or population) firing and
the logarithm of this probability with postsynaptic membrane potential.
This fits comfortably with theoretical proposals and empirical work on the
accumulation of evidence (Kira, Yang, & Shadlen, 2015) and the neuronal
encoding of probabilities (Deneve, 2008), while rendering the softmax func-
tion a (sigmoid) activation function that converts membrane potentials to
firing rates. The postsynaptic depolarization caused by afferent input can
now be interpreted in terms of free energy gradients (i.e., state prediction
errors) that are linear mixtures of firing rates in other neurons (or pop-
ulations). These prediction errors play the role of postsynaptic currents,
which drive changes in membrane potential and subsequent firing rates.
This means that when there are no prediction errors, postsynaptic currents
disappear and depolarizations (and firing rates) converge to the free energy
minimum. Note that the above expressions imply a self-inhibition because
prediction errors decrease when log expectations increase.
Technically, replacing the explicit solutions, equation 2.7, with a gradi-
ent ascent, equation 2.8, is exactly the same generalization of variational
Bayes found in variational Laplace (Friston et al., 2007), namely, a gen-
eralized coordinate descent. This is nice, because it means one can think
about process theories for variational treatments of Markov decision pro-
cesses as formally similar to equivalent process theories for state-space
models, such as predictive coding (Rao & Ballard, 1999; Bastos et al., 2012).
There are some finer, neurobiologically plausible details of the dynamics of
expectations about hidden states that we will consider elsewhere. For ex-
ample, the modulation by ∂s sπτ implies activity-dependent (e.g., NMDA-R
dependent) depolarization that enforces an excitation-inhibition balance
(see appendix B).
2.6 Action Selection, Precision, and Dopamine. The policy updates are
just a softmax function of their log probability, which has two components:
the free energy based on past outcomes and the expected free energy based
on preferences about future outcomes. In other words, prior beliefs about
policies in the generative model are supplemented or informed by the free
energy based on outcomes. Policy selection also entails the optimization
of expected uncertainty or precision. This is expressed above in terms of
the temperature (inverse precision), which encodes posterior beliefs about
precision: β = 1/γ .
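A sketch of how these coupled updates could be iterated, assuming vectors F and G of free energies and expected free energies over policies; the fixed number of iterations and the prior rate beta_prior = 1 are illustrative assumptions.

```python
import numpy as np

def softmax(v):
    v = v - v.max()
    return np.exp(v) / np.exp(v).sum()

def update_policies_and_precision(F, G, beta_prior=1.0, n_iter=16):
    """Iterate the policy posterior and the temperature (inverse precision) update.

    F : (n_policies,) free energy of each policy, based on past outcomes
    G : (n_policies,) expected free energy of each policy, based on preferences
    """
    beta = beta_prior
    for _ in range(n_iter):
        gamma = 1.0 / beta                        # expected precision
        pi_0 = softmax(-gamma * G)                # prior over policies (before outcomes)
        pi_post = softmax(-F - gamma * G)         # posterior over policies
        beta = beta_prior + (pi_post - pi_0) @ G  # temperature update (cf. equation 2.7)
    return pi_post, 1.0 / beta
```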
Interestingly, the updates for temperature are determined by the differ-
ence between the expected free energy under posterior and prior beliefs
about policies, that is, the prediction error based on expected free energy.
This endorses the notion of reward prediction errors as an update signal that
the brain might use, in the sense that if posterior beliefs based on current
observations reduce the expected free energy, relative to prior beliefs, then
precision will increase (FitzGerald, Dolan et al., 2015). This can be related
to dopamine discharges that have been interpreted in terms of changes
in expected reward (Schultz & Dickinson, 2000; Fiorillo et al., 2003) and
marginal utility (Stauffer, Lak, & Schultz, 2014). We have previously con-
sidered the intimate (monotonic) relationship between expected precision
and expected utility in this context (see Friston et al., 2014, for a fuller dis-
cussion). The role of the neuromodulator dopamine in encoding precision
is also consistent with its multiplicative effect in equation 2.7, to nuance the
selection among competing policies (Fiorillo et al., 2003; Frank, Scheres, &
Sherman, 2007; Humphries, Wood, & Gurney, 2009; Humphries, Khamassi,
& Gurney, 2012; Solway & Botvinick, 2012; Mannella & Baldassarre, 2015).
We will return to this later.
2.7 Learning and Associative Plasticity. Finally, the updates for the
parameters bear a marked resemblance to classical Hebbian plasticity (Ab-
bott & Nelson, 2000). The parameter updates for state transitions comprise
two terms: an associative term that is a digamma function of the accumu-
lated coincidence of past (postsynaptic) and current (presynaptic) states
(or observations under hidden causes) and a decay term that reduces each
connection as the total afferent connectivity increases. The associative and
decay terms are strictly increasing but saturating functions of the concen-
tration parameters. Note that the updates for the connectivity parameters
accumulate coincidences over time, because parameters are time invariant
(in contrast to states that change over time). Furthermore, the parameters
encoding state transitions have associative terms that are modulated by
policy expectations.
In addition to learning contingencies through the parameters of the tran-
sition matrices, the vectors encoding beliefs about initial states accumulate
evidence by simply counting the number of times an initial state occurs. In
other words, if a particular state is encountered frequently, it will come to
dominate posterior expectations. This mediates context learning in terms of
the initial state. In practice, the parameters are updated at the end of each
trial.
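The following sketch illustrates these accumulation rules at the end of a trial, assuming one-hot outcome vectors and the state expectations produced by inference; the indexing conventions (e.g., which epoch an action is assigned to) are assumptions made for illustration.

```python
import numpy as np

def learn_parameters(a, b, d, outcomes, s_bma, s_pi, pi_post, policies):
    """Accumulate Dirichlet concentration parameters at the end of a trial (cf. equation 2.7).

    a        : (n_outcomes, n_states) concentrations for the likelihood A
    b        : (n_actions, n_states, n_states) concentrations for the transitions B(u)
    d        : (n_states,) concentrations for the initial state D
    outcomes : list of one-hot outcome vectors o_tau over the trial
    s_bma    : list of Bayesian-model-averaged state expectations s_tau
    s_pi     : (n_policies, T, n_states) state expectations under each policy
    pi_post  : (n_policies,) posterior expectations over policies
    policies : (n_policies, T-1) action sequences
    """
    for tau, o in enumerate(outcomes):                        # associative (Hebbian) terms
        a = a + np.outer(o, s_bma[tau])                       # a += o_tau (outer) s_tau
    for p, policy in enumerate(policies):
        for tau, u in enumerate(policy, start=1):
            b[u] = b[u] + pi_post[p] * np.outer(s_pi[p, tau], s_pi[p, tau - 1])
    d = d + s_bma[0]                                          # count the initial state
    return a, b, d
```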
Figure 1: Schematic overview of belief updates for active inference under dis-
crete Markovian models. The left panel lists the solutions in the main text,
associating various updates with action, perception, policy selection, precision,
and learning. It assigns the variables (sufficient statistics or expectations) that
are updated to various brain areas. This attribution should not be taken too se-
riously but serves to illustrate a rough functional anatomy, implied by the form
of the belief updates. In this simplified scheme, we have assigned observed
outcomes to visual representations in the occipital cortex and state estimation
to the hippocampal formation. The evaluation of policies, in terms of their
(expected) free energy, has been placed in the ventral prefrontal cortex. Ex-
pectations about policies per se and the precision of these beliefs have been
attributed to striatal and ventral tegmental areas to indicate a putative role for
dopamine in encoding precision. Finally, beliefs about policies are used to create
Bayesian model averages over future states that are fulfilled by action. The blue
arrows denote message passing, and the solid red line indicates a modulatory
weighting that implements Bayesian model averaging. The broken red lines
indicate the updates for parameters or connectivity (in blue circles) that depend
on expectations about hidden states. This scheme is described heuristically in
Figure 2. See the appendixes and Table 1 for an explanation of the equations and
variables.
In the next section, we use equation 2.8 to simulate neuronal
responses and show that many familiar electrophysiological phenomena
emerge.
3 Simulations of Inference
This section illustrates inference using simulations of foraging in a maze,
in which the agent has to infer the context, namely,
the location of rewards. The basic structure of this problem can be trans-
lated to any number of scenarios (e.g., saccadic eye movements to visual
targets). The simulations use the same setup as in Friston et al. (2015), which
is as simple as possible while illustrating some fairly complicated behav-
iors. This example can also be interpreted in terms of responses elicited in
reinforcement learning paradigms by unconditioned (US) and conditioned
(CS) stimuli. Strictly speaking, our paradigm is instrumental, and the cue is
a discriminative stimulus; however, we retain the Pavlovian nomenclature
when relating precision updates to dopaminergic discharges.
3.1 The Setup. An agent, such as a rat, starts in the center of a T-maze,
where either the right or the left arm is baited with a reward (US). The lower
arm contains a discriminative cue (CS) that tells the animal whether the
reward is in the upper right or left arm. Crucially, the agent can make only
two moves. Furthermore, the agent cannot leave the baited arms after they
are entered. This means that the optimal behavior is to first go to the lower
arm to find where the reward is located and then retrieve the reward at the
cued location.
In terms of a Markov decision process, there are four control states that
correspond to visiting, or sampling, the four locations (the center and three
arms). For simplicity, we assume that each control state takes the agent to
the associated location, as opposed to moving in a particular direction from
the current location. This is analogous to place-based navigation strategies
mediated by the hippocampus (e.g., Moser, Kropff, & Moser, 2008). There
are eight hidden states (four locations by two contexts) and seven possible
outcomes. The outcomes correspond to being in the center of the maze plus
the (two) outcomes at each of the (three) arms that are determined by the
context (the right or left arm is more rewarding).
Having specified the state-space, it is now necessary to specify the (A,B)
matrices encoding contingencies. These are shown in Figure 3, where the
A matrix maps from hidden states to outcomes, delivering an ambiguous
cue at the center (first) location and a definitive cue at the lower (fourth)
location. The remaining locations provide a reward with probability p =
98% depending on the context. The B(u) matrices encode action-specific
transitions, with the exception of the baited (second and third) locations,
which are absorbing hidden states that the agent cannot leave.
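To make these contingencies concrete, here is a sketch of how the A and B(u) arrays could be specified; the ordering of locations, contexts, and outcomes is a hypothetical convention (the paper's Figure 3 fixes the actual one), and the helper s_index is introduced only for this illustration.

```python
import numpy as np

n_locations, n_contexts = 4, 2            # center, right arm, left arm, lower (cue) arm
n_states = n_locations * n_contexts       # hidden state = (location, context)
n_outcomes = 7                            # center, R-reward, R-null, L-reward, L-null, cue-right, cue-left
p = 0.98                                  # probability of reward in the baited arm

def s_index(location, context):
    return location * n_contexts + context

# Likelihood A: P(outcome | location, context)
A = np.zeros((n_outcomes, n_states))
for context in (0, 1):                    # 0: right arm baited, 1: left arm baited
    A[0, s_index(0, context)] = 1.0                             # center outcome is the same in both contexts
    A[1, s_index(1, context)] = p if context == 0 else 1 - p    # right arm: reward
    A[2, s_index(1, context)] = 1 - p if context == 0 else p    # right arm: no reward
    A[3, s_index(2, context)] = p if context == 1 else 1 - p    # left arm: reward
    A[4, s_index(2, context)] = 1 - p if context == 1 else p    # left arm: no reward
    A[5 + context, s_index(3, context)] = 1.0                   # lower arm: cue identifies the context

# Transitions B(u): action u moves the agent to location u, except from the absorbing arms
B = np.zeros((n_locations, n_states, n_states))
for u in range(n_locations):
    for location in range(n_locations):
        for context in (0, 1):
            new_location = location if location in (1, 2) else u   # baited arms cannot be left
            B[u, s_index(new_location, context), s_index(location, context)] = 1.0
```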
In general treatments, we would consider learning contingencies by up-
dating the prior concentration parameters (a,b) of the transition matrices,
but we will assume the agent knows (i.e., has very precise beliefs about) the
contingencies. This corresponds to making the prior concentration param-
eters very large. Conversely, we will use small values of d to enable context
learning. Preferences in the vector Uτ = ln P(oτ ) ≤ 0 encode the utility of
outcomes. Here, the (relative) utilities of a rewarding and unrewarding out-
come were 3 and −3, respectively (and zero otherwise). This means that
the agent expects to be rewarded exp(3) ≈ 20 times more than experiencing
a neutral outcome. Note that utility is always relative because the proba-
bilities over outcomes must sum to one. As noted above, this means the
prior preferences are a softmax function of utility P(oτ ) = σ (Uτ ). Associat-
ing utility with log probabilities is important because it endows utility with
the same measure as information, namely, nats (i.e., units of information or
entropy based on natural logarithms). This highlights the close connection
between value and information (Howard, 1966).
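A quick numerical check of this mapping from utilities to prior preferences, using the hypothetical outcome ordering of the earlier T-maze sketch:

```python
import numpy as np

def softmax(v):
    v = v - v.max()
    return np.exp(v) / np.exp(v).sum()

# Utilities: reward = +3, no reward = -3, all other outcomes = 0 (relative values)
U = np.array([0.0, 3.0, -3.0, 3.0, -3.0, 0.0, 0.0])
P_o = softmax(U)                 # prior preferences P(o) = sigma(U)
print(P_o[1] / P_o[0])           # ~exp(3) ≈ 20: reward preferred ~20 times over a neutral outcome
```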
Having specified the state-space and contingencies, one can solve the
belief updating equations in equation 2.8 to simulate behavior. Prior beliefs
about the initial state were initialized to d = 8 for the central location under
each of the two contexts.
3.3 Theta-Gamma Coupling and Place Cell Activity. The lower right
panel of Figure 5 shows the same firing rate responses above but highlights
units encoding the three locations visited (the thick green, blue, and red
lines). These responses reflect increases in activity (during the second theta
epoch) in the same sequence that the locations are visited. Empirically,
this phenomenon is called a theta sequence: short (3–5) sequences of place
cells that fire sequentially within each theta cycle, as if they were encoding
time-compressed trajectories (Lisman & Redish, 2009).
In our setting, theta-gamma coupling is a straightforward consequence
of belief updating every 250 ms (i.e., theta), where each observation induces
phasic updates that necessarily possess high-frequency (i.e., gamma) com-
ponents. This is illustrated in the middle left panel of Figure 5.
Figure 5: Simulated electrophysiological responses for the first trial. This figure
reports the belief updating described in the text. It presents responses in sev-
eral formats that emulate empirical characterizations of spatial navigation and
decision-making responses. The upper left panel shows the activity (firing rate)
of all units encoding hidden states in image (raster) format. There are eight hid-
den states for each of the three epochs in this trial, where each (250 ms or theta)
epoch starts with an observation and ends with an action. These responses are
organized such that the upper rows encode the probability of the eight states
in the first epoch, with subsequent epochs in the middle and lower rows. Note
the fluctuations in activity after each new outcome is observed. The upper right
panel plots the same information highlighting two units (in solid lines), en-
coding the upper left (rewarded and chosen state) and upper right location on
the third epoch (unrewarded and unchosen state). The simulated local field
potentials for these units (i.e., their rate of change of neuronal firing) are shown
in the middle right panel. This pattern of firing reflects a saltatory evidence
accumulation (stepping dynamics), in which expectations about occupying the
chosen and unchosen states diverge as the trial progresses. The simulated local
field potentials also show that responses in units encoding locations later in the
trial peak earlier, as successive outcomes are observed. This necessarily results
in a phase precession that is also illustrated in the middle left panel. This panel
shows the response of the rewarded hidden state unit before (dotted line) and
after (solid line) filtering at 4 Hz, superimposed on a time-frequency decompo-
sition of the local field potential (averaged over all units). The key observation
here is that depolarization in the 4 Hz range coincides with induced responses,
including gamma activity. The lower left panel illustrates simulated dopamine
responses in terms of a mixture of precision and its rate of change. Finally, the
lower right panel reproduces the upper right panel but highlights responses
in units encoding the states visited (green, first; blue, second; and red, final
state).
This panel shows the response of the second (rewarded hidden state) unit before (dotted line)
and after (solid line) filtering at 4 Hz. These responses are superimposed on
a time frequency decomposition of the local field potential averaged over
all units. The key observation here is that depolarization in the theta range
coincides with induced responses, including gamma activity. The implicit
theta-gamma coupling during navigation can be seen more clearly in Fig-
ure 6. This figure reports simulated electrophysiological responses over the
first eight trials, with the top panel showing the responses of units encoding
hidden states and the second panel showing the associated time frequency
response (and depolarization of the first unit, after filtering at 4 Hz). The
final two panels show the simulated local field potentials and dopamine
responses using the same format as the previous figure. The key observa-
tion here is that fluctuations in gamma power (averaged over all
units) are tightly coupled to the depolarization in the theta range (of single
units).
Phase precession and theta-gamma coupling are typically observed in
the context of place cell activity, in which units respond selectively when an
animal passes through particular locations. This sort of response is easy to
demonstrate under the current scheme. Figure 7 (upper right panel) plots
the activity of two units encoding the rewarded locations at the right (green
dots) and left (red dots) arms as a function of the location in the maze
over the first eight trials. The trajectories (dotted lines) were constructed
by adding random displacements (with a standard deviation of an eighth)
to the trajectory prescribed by action. The dots indicate times at which
the unit approached its maximal firing rate (i.e., greater than 80%) and
illustrate place cell activity that is specific to the locations they encode.
However, this response profile is unique to the units encoding the final
location: units encoding the location in the second epoch fire maximally
at both the target location and the preceding (cue) location (lower right
panel).
We present these results to address an interesting question. Hitherto, we
have assumed that units encode states (location) in a frame of reference
that is locked to the beginning of a trial or trajectory. The alternative is that
each unit encodes the state in relation to the current time, in a moving time
frame. This distinction is shown schematically in the lower left panel of
Figure 7. If we use a fixed frame of reference, the successive activities of the
two units are described by rows of the raster, indicated with white numbers.
Conversely, if the encoding uses a moving frame of reference, these units
would show the activity along the leading diagonal of the raster, indicated
by the red numbers. Crucially, in a moving frame of reference, all units
would show classical place cell responses, whereas in a fixed frame of
reference, some units will encode the location of states that will be visited
in the future. This would lead to a more complicated relationship between
neuronal firing and the location of the animal.
Figure 7: Place cell responses. The upper right panel plots the activity of two
units encoding the rewarded locations at the right (green dots) and left (red
dots) arms, as a function of the location in the maze, over the first eight trials.
The trajectories (dotted lines) were constructed by adding (smooth) random
displacements (with a standard deviation of an eighth) to the trajectory pre-
scribed by action. The dots indicate times at which the unit exceeded 80% of its
maximum activity and illustrate place cell activity that is specific to the locations
encoded. However, this response profile is unique to the units encoding the final
location: units encoding the location in the second epoch fire maximally at both
the target location and the preceding (cue) location (lower right panel). The left
panel reproduces the neural activity in raster format for two trials to indicate
expectations about hidden states that are plotted.
In a fixed and a moving frame of reference, respectively, the updates take the form

$$
\begin{aligned}
\mathbf{s}^\pi_\tau(t + \Delta t) &= \sigma\big(\bar{\mathbf{s}}^\pi_\tau(t) - \Delta t \cdot (\bar{\mathbf{s}}^\pi_\tau(t) - \ldots - \bar{\mathbf{B}}(\pi(\tau)) \cdot \mathbf{s}^\pi_{\tau+1}(t))\big), \\
\mathbf{s}^\pi_\tau(t + \Delta t) &= \sigma\big(\bar{\mathbf{s}}^\pi_\tau(t) - \Delta t \cdot (\bar{\mathbf{s}}^\pi_\tau(t) - \ldots - \bar{\mathbf{B}}(\pi(t + \tau)) \cdot \mathbf{s}^\pi_{\tau+1}(t))\big).
\end{aligned} \tag{3.1}
$$
The key difference between these formulations is that in the moving frame
of reference, the connectivity changes from epoch to epoch, whereas in a
fixed frame of reference, the connectivity remains the same. In light of this,
we have elected to simulate responses assuming a fixed frame of reference,
which suggests that a subset of hippocampal (or parietal) units should
show extraclassical place cell activity, encoding trajectories over multiple
locations (Grosmark & Buzsaki, 2016).
4 Context Learning
4.3 Foraging for Information. One might ask what would happen if
rewards were devalued by setting their (relative) utility to zero. Figure 10
shows the results of a simulation, using the same setup as in Figure 4.
The only difference here was that there were no explicit preferences or
utilities. However, the resulting behavior is still structured and purposeful
because it is driven by epistemic value. In every trial, the agent moves to
the cue location to resolve ambiguity about the context (see lower panels).
After the cue is sampled, uncertainty cannot be reduced further, and the
agent either stays where it is or returns to the central location, avoiding the
baited arms. It avoids the baited arms because they are mildly ambiguous
(given our partial reinforcement schedule). This sort of simulation can, in
Figure 9: Violation responses and simulated P300 waveforms. This figure uses
the same format as the previous figure but focuses on consecutive trials in-
dicated by the arrows above the inset. The first trial is an epistemic trial in
which the agent interrogates the cue location and then acquires the reward. In
the subsequent trial, we forced the agent to stay where it was, thereby induc-
ing protracted and high-amplitude belief updating about hidden states. This is
most evident in the hidden states encoding the (cue) location in the third (final)
epoch (cyan circles). Assuming each epoch lasts 250 ms, these responses reach
peak amplitude at about 150 ms—or 250 ms in peristimulus time (allowing for
100 ms conduction delays).
Figure 10: Epistemic foraging. This figure reports the (behavioral and physio-
logical) responses over the 32 trials as in Figure 4. However, in this simulation,
all outcomes were assigned the same utility. This means there is no extrinsic
value, and the agent maximizes epistemic value by first resolving its uncertainty
about the context (by going to the cue location) and then avoiding (the mildly
ambiguous) upper arms. This behavior is shown schematically, and in terms of
place cell firing, in the lower panels.
5 Conclusion
$$
\begin{aligned}
F &= \ldots + D[Q(\mathbf{A})\,\|\,P(\mathbf{A})] + \ldots \\
&= \boldsymbol{\pi} \cdot (\ln \boldsymbol{\pi} + \mathbf{F} + \gamma \cdot \mathbf{G}) + \ln Z + \beta\gamma - \ln \gamma + \sum_i \big[(\mathbf{a}_i - a_i) \cdot \bar{\mathbf{A}}_i - \ln \mathcal{B}(\mathbf{a}_i)\big] + \ldots
\end{aligned}
$$
The free energy of hidden states and the expected free energy are given by

$$
\begin{aligned}
\mathbf{F}_\pi &= F(\pi), \qquad F(\pi) = \sum_\tau F(\pi, \tau) \\
F(\pi, \tau) &= \underbrace{E_{\tilde{Q}}\big[D[Q(s_\tau \mid \pi)\,\|\,P(s_\tau \mid s_{\tau-1}, \pi)]\big]}_{\text{complexity}} - \underbrace{E_{\tilde{Q}}[\ln P(o_\tau \mid s_\tau)]}_{\text{accuracy}} \\
&= \mathbf{s}^\pi_\tau \cdot (\ln \mathbf{s}^\pi_\tau - \bar{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1} - \bar{\mathbf{A}} \cdot o_\tau) \\
\mathbf{G}_\pi &= G(\pi), \qquad G(\pi) = \sum_\tau G(\pi, \tau).
\end{aligned}
$$
Here, $\bar{\mathbf{B}}^\pi_\tau = \bar{\mathbf{B}}(\pi(\tau))$, $\bar{\mathbf{B}}^\pi_0\mathbf{s}^\pi_0 = \bar{\mathbf{D}}$, $Z = \sum_\pi \exp(-\gamma \cdot \mathbf{G}_\pi)$, and $\bar{\mathbf{A}} = \psi(\mathbf{a}) - \psi(\mathbf{a}_0)$.
The beta function of the column vector $\mathbf{a}_i$ is denoted by $\mathcal{B}(\mathbf{a}_i)$. Us-
ing the standard result $\partial_a \mathcal{B}(a) = \mathcal{B}(a)\bar{\mathbf{A}}$, we can differentiate the variational
free energy with respect to the sufficient statistics (with a slight abuse of
notation, using $\partial_s F := \partial F(\pi, \tau)/\partial \mathbf{s}^\pi_\tau$):

$$
\begin{aligned}
\partial_s F &= \bar{\mathbf{s}}^\pi_\tau - \bar{\mathbf{A}} \cdot o_\tau - \bar{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1} - \bar{\mathbf{B}}^\pi_\tau \cdot \mathbf{s}^\pi_{\tau+1} \\
\partial_\pi F &= \ln \boldsymbol{\pi} + \mathbf{F} + \gamma \cdot \mathbf{G} \\
\partial_\gamma F &= \beta + \boldsymbol{\pi} \cdot \mathbf{G} + \tfrac{1}{Z}\partial_\gamma Z - \boldsymbol{\beta} \\
&= \beta + (\boldsymbol{\pi} - \boldsymbol{\pi}_0) \cdot \mathbf{G} - \boldsymbol{\beta} \\
\partial_\gamma Z &= -\sum_\pi \exp(-\gamma \cdot \mathbf{G}_\pi) \cdot \mathbf{G}_\pi \\
\boldsymbol{\pi}_0 &= \sigma(-\gamma \cdot \mathbf{G})
\end{aligned}
$$
$$
\begin{aligned}
\partial_a F &= \partial_a \bar{\mathbf{A}} \cdot \Big(\mathbf{a} - a - \textstyle\sum_\tau o_\tau \otimes \mathbf{s}_\tau\Big) \\
\partial_b F &= \partial_b \bar{\mathbf{B}} \cdot \Big(\mathbf{b}(u) - b(u) - \textstyle\sum_{\pi(\tau)=u} \pi_\pi \cdot \mathbf{s}^\pi_\tau \otimes \mathbf{s}^\pi_{\tau-1}\Big) \\
\partial_d F &= \partial_d \bar{\mathbf{D}} \cdot (\mathbf{d} - d - \mathbf{s}_1) \\
\mathbf{s}_\tau &= \textstyle\sum_\pi \pi_\pi \cdot \mathbf{s}^\pi_\tau
\end{aligned}
$$
Finally, the solutions to these equations give the variational updates in the
main text (see equation 2.7). The equivalent gradient descent is

$$
\begin{aligned}
\dot{\bar{\mathbf{s}}}^\pi_\tau &= -\partial_s F = \partial_s \mathbf{s}^\pi_\tau \cdot \boldsymbol{\varepsilon}^\pi_\tau \\
\dot{\boldsymbol{\beta}} &= -\partial_\beta F = \gamma^2 \varepsilon^\gamma \\
\partial_\beta F &= \partial_\beta \gamma \cdot \partial_\gamma F = -\gamma^2 \cdot \varepsilon^\gamma \\
\partial_s Z &= Z \cdot \mathbf{s}^\pi_\tau \\
\partial_s \mathbf{s}^\pi_\tau &= \mathrm{diag}(\mathbf{s}^\pi_\tau) - \mathbf{s}^\pi_\tau \otimes \mathbf{s}^\pi_\tau \\
-\partial_s F &= \boldsymbol{\varepsilon}^\pi_\tau = (\bar{\mathbf{A}} \cdot o_\tau + \bar{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1} + \bar{\mathbf{B}}^\pi_\tau \cdot \mathbf{s}^\pi_{\tau+1}) - \bar{\mathbf{s}}^\pi_\tau, \\
\partial_\gamma F &= \varepsilon^\gamma = (\beta - \boldsymbol{\beta}) + (\boldsymbol{\pi} - \boldsymbol{\pi}_0) \cdot \mathbf{G}.
\end{aligned}
$$
Practically, one can solve these equations using the discrete updates:
In the simulations, we used $\Delta t = 1/4$ but continued iterating for 16 iterations
(over each 250 ms epoch).
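A sketch of such a discrete scheme, assuming a forward (Euler) discretization of equation 2.8 in which the likelihood and transition messages have been precomputed (as in the earlier state-update sketch); returning the trajectory of expectations is what licenses the interpretation in terms of responses over peristimulus time.

```python
import numpy as np

def softmax(v):
    v = v - v.max()
    return np.exp(v) / np.exp(v).sum()

def iterate_state_updates(message, v0, dt=0.25, n_iter=16):
    """Discretized gradient descent on free energy for one (policy, epoch) pair.

    message : (n_states,) sum of likelihood, forward, and backward messages (cf. equation 2.8)
    v0      : (n_states,) initial log expectations
    """
    v = v0.copy()
    trajectory = []
    for _ in range(n_iter):                   # 16 iterations per 250 ms epoch
        s = softmax(v)
        eps = message - v                     # state prediction error
        v = v + dt * (np.diag(s) - np.outer(s, s)) @ eps
        trajectory.append(s)                  # simulated firing rates over peristimulus time
    return np.array(trajectory)
```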
The mean field assumption approximates the posterior with the product of
marginals over the current state, lending free energy the following form:
$$
\begin{aligned}
F(\pi) &= \sum_\tau F(\pi, \tau) \\
F(\pi, \tau) &= \mathbf{s}^\pi_\tau \cdot \big(\ln \mathbf{s}^\pi_\tau - \tfrac{1}{2}\ln(\breve{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1}) - \tfrac{1}{2}\ln(\breve{\mathbf{B}}^{\pi\dagger}_\tau\mathbf{s}^\pi_{\tau+1}) - \bar{\mathbf{A}} \cdot o_\tau\big),
\end{aligned}
$$

where the first three terms in parentheses constitute the relative complexity (including a
forward divergence) and the last term the accuracy; when minimized, this free energy
approximates the negative log evidence $-\ln P(o_\tau)$.
Here, we have omitted (uniform) priors over hidden states P(sτ ). Note that
this marginal free energy retains the same form but uses the log of expec-
tations, as opposed to expectations of logs. Furthermore, it uses backward
transitions, B(u)† = Dir(b(u)T ), such that the free energy gradients become
$$
\partial_s F(\pi, \tau) = \bar{\mathbf{s}}^\pi_\tau - \bar{\mathbf{A}} \cdot o_\tau - \tfrac{1}{2}\ln(\breve{\mathbf{B}}^\pi_{\tau-1}\mathbf{s}^\pi_{\tau-1}) - \tfrac{1}{2}\ln(\breve{\mathbf{B}}^{\pi\dagger}_\tau\mathbf{s}^\pi_{\tau+1}).
$$
The free energy under any policy now becomes a tight or exact bound on
log evidence, comprising complexity and accuracy terms of the form

$$
\cdots = \sum_{\tau=1}^{T} \mathrm{diag}(\mathbf{s}^\pi_\tau \cdot \bar{\boldsymbol{\pi}}_\tau)\,\mathbf{s}^\pi_{\tau-1}\cdots\mathbf{s}^\pi_1 - o_\tau \cdot \bar{\mathbf{A}}\,\mathbf{s}^\pi_\tau\cdots\mathbf{s}^\pi_1, \qquad \bar{\boldsymbol{\pi}}_\tau = \bar{\mathbf{s}}^\pi_\tau - \bar{\mathbf{B}}^\pi_{\tau-1}.
$$
For simplicity, prior beliefs about policies can then be expressed in terms of
the expected free energy following each (fictive) future outcome:

$$
\begin{aligned}
\ln P(\pi) &= -G(\pi \mid o_t) - G(\pi \mid o_{t+1}) - \ldots \\
G(\pi \mid o_{t+1}) &= E_{Q(o_{t+1} \mid \pi)P(\pi_{t+1})}\big[G(\pi_{t+1} \mid o_{t+1})\big] \\
Q(o_{t+1} \mid \pi) &= E_{Q(s_{t+1} \mid \pi)}\big[Q(o_{t+1} \mid s_{t+1})\big] = \breve{\mathbf{A}}\,\mathbf{s}^\pi_{t+1} \\
P(\pi_{t+1}) &= \sigma\big(-G(\pi_{t+1} \mid o_{t+1})\big)
\end{aligned}
$$
In this case, the expected free energy after the next outcome, $G(\pi_{t+1} \mid o_{t+1})$,
is evaluated in the same way as the expected free energy at the current time,
$G(\pi_t \mid o_t) \triangleq G(\pi)$, for each (fictive) outcome $o_{t+1}$, by using the posterior over
current hidden states as the prior: $\mathbf{D} = \mathbf{s}^\pi_{t+1}$. Clearly, this scheme is com-
putationally more involved than the naive scheme and calls on recursive
variational updating. This means that sophisticated agents are metacogni-
tive in some sense because they perform belief updating (based on fictive
outcomes) to optimize their belief updating.
Heuristically, the difference between naive and sophisticated schemes
can be seen in terms of the first choice in the current paradigm. For the naive
agent, the best policy is to sample the cue location and stay there, because
moving to a baited arm has, on average, no extrinsic value (and provides
ambiguous outcomes). Conversely, the expected free energy of retrieving
a reward after observing the cue is low for both (fictive) outcomes. This
means the best policies are to behave epistemically on the first move and
then pragmatically on the second move. Note that the sophisticated agent,
unlike the naive agent, can entertain future switches between policies.
Acknowledgments
References
Abbott, L. F., & Nelson, S. B. (2000). Synaptic plasticity: Taming the beast. Nat.
Neurosci., 3(Suppl.), 1178–1183.
Adams, R. A., Shipp, S., & Friston, K. J. (2013). Predictions not commands: Active
inference in the motor system. Brain Struct. Funct., 218(3), 611–643.
Attias, H. (2003). Planning by probabilistic inference. In Proc. of the 9th Int. Workshop
on Artificial Intelligence and Statistics.
Ballard, D. H., Hinton, G. E., & Sejnowski, T. J. (1983). Parallel visual computation.
Nature, 306, 21–26.
Ballard, D. H., Kit, D., Rothkopf, C. A., & Sullivan, B. (2013). A hierarchical modular
architecture for embodied cognition. Multisensory Research, 26, 177.
Barlow, H. (1961). Possible principles underlying the transformations of sensory
messages. In W. Rosenblith (Ed.), Sensory communication (pp. 217–234). Cam-
bridge, MA: MIT Press.
Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nat.
Rev. Neurosci., 16(7), 419–429.
Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or surprise? Frontiers in
Psychology, 4.
Barto, A., Singh, S., & Chentanez, N. (2004). Intrinsically motivated learning of
hierarchical collections of skills. In Proceedings of the 3rd International Conference
on Development and Learning. Cambridge, MA: MIT Press.
Bastos, A. M., Usrey, W. M., Adams, R. A., Mangun, G. R., Fries, P., & Friston,
K. J. (2012). Canonical microcircuits for predictive coding. Neuron, 76(4), 695–
711.
Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. Doc-
toral dissertation, University College London.
Bendixen, A., SanMiguel, I., & Schroger, E. (2012). Early electrophysiological indi-
cators for predictive processing in audition: A review. Int. J. Psychophysiol, 83(2),
120–131.
Bengtson, C. P., Tozzi, A., Bernardi, G., & Mercuri, N. B. (2004). Transient receptor
potential-like channels mediate metabotropic glutamate receptor EPSCs in rat
dopamine neurones. J. Physiol., 555(Pt. 2), 323–330.
Botvinick, M., & An, J. (2009). Goal-directed decision making in prefrontal cortex:
A computational framework. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K.
I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems,
22. Cambridge, MA: MIT Press.
Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends Cogn. Sci., 16(10),
485–488.
Bowers, J. S., & Davis, C. J. (2012). Bayesian just-so stories in psychology and neuro-
science. Psychol. Bull., 138(3), 389–414.
Braun, D. A., Ortega, P. A., Theodorou, E., & Schaal, S. (2011). Path integral control
and bounded rationality. In IEEE symposium on adaptive dynamic programming and
reinforcement learning. Piscataway, NJ: IEEE.
Bromberg-Martin, E. S., & Hikosaka, O. (2009). Midbrain dopamine neurons signal
preference for advance information about upcoming rewards. Neuron, 63(1), 119–
126.
Burgess, N., Barry, C., & O’Keefe, J. (2007). An oscillatory interference model of grid
cell firing. Hippocampus, 17(9), 801–812.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future
of cognitive science. Behav. Brain. Sci., 36(3), 181–204.
De Bruin, L., & Michael, J. (2015). Bayesian predictive coding and social cognition.
Consciousness and Cognition, 36, 373–375.
de Gardelle, V., Waszczuk, M., Egner, T., & Summerfield, C. (2013). Concurrent
repetition enhancement and suppression responses in extrastriate visual cortex.
Cereb. Cortex, 23(9), 2235–2244.
de Lafuente, V., Jazayeri, M., & Shadlen, M. N. (2015). Representation of accumulat-
ing evidence for a decision in two parietal areas. J. Neurosci., 35(10), 4306–4318.
Deneve, S. (2008). Bayesian spiking neurons I: Inference. Neural Comput., 20(1), 91–
117.
Feldman, A. G. (2009). New insights into action-perception coupling. Exp. Brain. Res.,
194(1), 39–58.
Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward proba-
bility and uncertainty by dopamine neurons. Science, 299(5614), 1898–1902.
FitzGerald, T., Dolan, R., & Friston, K. (2014). Model averaging, optimal inference,
and habit formation. Front. Hum. Neurosci., 8, 457. doi:10.3389/fnhum.2014.00457
FitzGerald, T. H., Dolan, R. J., & Friston, K. (2015). Dopamine, reward learning, and
active inference. Front. Comput. Neurosci., 9, 136.
FitzGerald, T. H., Moran, R. J., Friston, K. J., & Dolan, R. J. (2015). Precision and
neuronal dynamics in the human posterior parietal cortex during evidence accu-
mulation. Neuroimage, 107, 219–228.
FitzGerald, T. H., Schwartenbeck, P., Moutoussis, M., Dolan, R. J., & Friston, K. (2015).
Active inference, evidence accumulation, and the urn task. Neural Comput., 27(2),
306–328.
Frank, M. J., Scheres, A., & Sherman, S. J. (2007). Understanding decision-making
deficits in neurological conditions: Insights from models of natural action selec-
tion. Philos. Trans. R. Soc. Lond. B Biol. Sci., 362(1485), 1641–1654.
Friederici, A. D. (2005). Neurophysiological markers of early language acquisition:
From syllables to sentences. Trends Cogn. Sci., 9(10), 481–488.
Friston, K. (2012). A free energy principle for biological systems. Entropy, 14, 2100–
2121.
Friston, K. (2013). Life as we know it. J. R. Soc. Interface., 10(86), 20130475.
Friston, K., Adams, R., & Montague, R. (2012). What is value—accumulated reward
or evidence? Frontiers in Neurorobotics, 6, 11.
Friston, K., & Buzsaki, G. (2016). The functional anatomy of time: What and when
in the brain. Trends Cogn. Sci., 20, 500–511. doi:10.1016/j.tics.2016.05.001
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., & Pezzulo,
G. (2016). Active inference and learning. Neurosci. Biobehav. Rev., 68, 862–879.
doi:10.1016/j.neubiorev.2016.06.022
Friston, K., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. J.
Physiol. Paris., 100(1–3), 70–87.
Friston, K., Mattout, J., & Kilner, J. (2011). Action understanding and active inference.
Biol. Cybern., 104, 137–160.
Friston, K., Mattout, J., Trujillo-Barreto, N., Ashburner, J., & Penny, W. (2007). Vari-
ational free energy and the Laplace approximation. NeuroImage, 34(1), 220–234.
Friston, K., Rigoli, F., Ognibene, D., Mathys, C., Fitzgerald, T., & Pezzulo, G. (2015).
Active inference and epistemic value. Cogn. Neurosci., 6, 187–214.
Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., & Dolan,
R. J. (2014). The anatomy of choice: Dopamine and decision-making. Philos. Trans.
R. Soc. Lond.—B. Biol. Sci., 369, 20130481.
Friston, K., Schwartenbeck, P., FitzGerald, T., Moutoussis, M., Behrens, T., Raymond,
R. J., & Dolan, J. (2013). The anatomy of choice: Active inference and agency. Front.
Hum. Neurosci., 7, 598.
George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-
circuits. PLoS Comput. Biol., 5(10), e1000532.
Gold, J. I., & Shadlen, M. N. (2007). The neural basis of decision making. Annu. Rev.
Neurosci., 30, 535–574.
Grosmark, A. D., & Buzsaki, G. (2016). Diversity in neural firing dynamics sup-
ports both rigid and learned hippocampal sequences. Science, 351(6280), 1440–
1443.
Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., & Garrod, S.
(2013). Speech rhythms and multiplexed oscillatory sensory coding in the human
brain. PLoS Biol. 11(12), e1001750. doi:10.1371/journal.pbio.1001752
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Hohwy, J. (2014). The self-evidencing brain. Noûs, 50(2). doi:10.1111/nous.1262
Howard, R. (1966). Information value theory. IEEE Transactions on Systems, Science
and Cybernetics, SSC-2(1), 22–26.
Huk, A. C., & Shadlen, M. N. (2005). Neural activity in macaque parietal cortex
reflects temporal integration of visual motion signals during perceptual decision
making. J. Neurosci., 25(45), 10420–10436.
Humphries, M. D., Khamassi, M., & Gurney, K. (2012). Dopaminergic control of the
exploration-exploitation trade-off via the basal ganglia. Front. Neurosci., 6, 9.
Humphries, M. D., Wood, R., & Gurney, K. (2009). Dopamine-modulated dynamic
cell assemblies generated by the GABAergic striatal microcircuit. Neural Netw.,
22(8), 1174–1188.
Hunt, L. T., Kolling, N., Soltani, A., Woolrich, M. W., Rushworth, M. F., & Behrens, T.
E. (2012). Mechanisms underlying cortical activity during value-guided choice.
Nat. Neurosci., 15(3), 470–476, s471–s473.
Itti, L., & Baldi, P. (2009). Bayesian surprise attracts human attention. Vision Res.,
49(10), 1295–1306.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review
Series II, 106(4), 620–630.
Jensen, O., Gips, B., Bergmann, T. O., & Bonnefond, M. (2014). Temporal coding
organized by coupled alpha and gamma oscillations prioritize visual processing.
Trends Neurosci., 37(7), 357–369.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under
risk. Econometrica, 47(2), 263–291.
Kass, R. E., & Steffey, D. (1989). Approximate Bayesian inference in conditionally
independent hierarchical models (parametric empirical Bayes models). J. Am.
Stat. Assoc., 407, 717–726.
Kira, S., Yang, T., & Shadlen, M. N. (2015). A neural implementation of Wald’s
sequential probability ratio test. Neuron, 85(4), 861–873.
Klyubin, A. S., Polani, D., & Nehaniv, C. I. (2005). Empowerment: A universal
agent-centric measure of control. In Proc. CEC 2005. IEEE (vol. 1, pp. 128–135).
Piscataway, NJ: IEEE.
Knill, D. C., & Pouget, A. (2004). The Bayesian brain: The role of uncertainty in neural
coding and computation. Trends Neurosci., 27(12), 712–719.
Knutson, B., & Bossaerts, P. (2007). Neural antecedents of financial decisions. Journal
of Neuroscience, 27(31), 8174–8177.
Krebs, R. M., Schott, B. H., Schütze, H., & Düzel, E. (2009). The novelty exploration
bonus and its attentional modulation. Neuropsychologia, 47, 2272–2281.
Latimer, K. W., Yates, J. L., Meister, M. L., Huk, A. C., & Pillow, J. W. (2015). Neuronal
Modeling: Single-trial spike trains in parietal cortex reveal discrete steps during
decision-making. Science, 349(6244), 184–187.
Laughlin, S. B. (2001). Efficiency and complexity in neural coding. Novartis Found.
Symp., 239, 177–187.
Linsker, R. (1990). Perceptual neural organization: Some approaches based
on network models and information theory. Annu. Rev. Neurosci., 13, 257–
281.
Lisman, J., & Buzsaki, G. (2008). A neural coding scheme formed by the combined
function of gamma and theta oscillations. Schizophr. Bull., 34(5), 974–980.
Lisman, J., & Redish, A. D. (2009). Prediction, sequences and the hippocampus.
Philos. Trans. R. Soc. Lond. B. Biol. Sci., 364(1521), 1193–1201.
Maffongelli, L., Bartoli, E., Sammler, D., Kolsch, S., Campus, C., Olivier, . . . D’Ausilio,
A., (2015). Distinct brain signatures of content and structure violation during
action observation. Neuropsychologia, 75, 30–39.
Mannella, F., & Baldassarre, G. (2015). Selection of cortical dynamics for motor
behaviour by the basal ganglia. Biological Cybernetics, 109(6), 575–595.
Mirza, M. B., Adams, R. A., Mathys, C. D., & Friston, K. J. (2016). Scene construction,
visual foraging and active inference. Frontiers in Computational Neuroscience, 10,
56. doi:10.3388/fncom.2016.00056
Moser, E. I., Kropff, E., & Moser, M. B. (2008). Place cells, grid cells, and the brain’s
spatial representation system. Annu. Rev. Neurosci., 31, 69–89.
Moser, M. B., Rowland, D. C., & Moser, E. I. (2015). Place cells, grid cells, and
memory. Cold Spring Harb. Perspect. Biol., 7(2), a021808.
Moutoussis, M., Trujillo-Barreto, N. J., El-Deredy, W., Dolan, R. J., & Friston, K.
J. (2014). A formal model of interpersonal inference. Front. Hum. Neurosci., 8,
160.
Mushiake, H., Saito, N., Sakamoto, K., Itoyama, Y., & Tanji, J. (2006). Activity in the
lateral prefrontal cortex reflects multiple steps of future events in action plans.
Neuron, 50, 631–641.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field
properties by learning a sparse code for natural images. Nature, 381, 607–609.
O’Regan, J., & Noë, A. (2001). A sensorimotor account of vision and visual con-
sciousness. Behav. Brain. Sci., 24, 939–973.
Ortega, P. A., & Braun, D. A. (2013). Thermodynamics as a theory of decision-making
with information-processing costs. Proc. R. Soc. A, 469, 2153.
Penny, W., Zeidman, P., & Burgess, N. (2013). Forward and backward inference in
spatial cognition. PLoS Comput. Biol., 9(12), e1003383.
Pezzulo, G., Rigoli, F., & Chersi, F. (2013). The mixed instrumental controller: Using
value of information to combine habitual choice and mental simulation. Front.
Psychol., 4, 92.
Pezzulo, G., Rigoli, F., & Friston, K. (2015). Active Inference, homeostatic regulation
and adaptive behavioural control. Prog. Neurobiol., 134, 17–35.
Pezzulo, G., van der Meer, M. A., Lansink, C. S., & Pennartz, C. M. (2014). Internally
generated sequences in learning and executing goal-directed behavior. Trends
Cogn. Sci., 647–657.
Preuschoff, K., Quartz, S. R., & Bossaerts, P. (2008). Human insula activation re-
flects risk prediction errors as well as risk. Journal of Neuroscience, 28(11), 2745–
2752.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional
interpretation of some extra-classical receptive-field effects. Nat. Neurosci., 2(1),
79–87.
Rigoli, F., Friston, K. J., & Dolan, R. J. (2016). Neural processes mediating contextual
influences on human choice behaviour. Nat. Commun., 7, 12416.
Santangelo, V. (2015). Forced to remember: When memory is biased by salient infor-
mation. Behav. Brain Res., 283, 1–10.
Schmidhuber, J. (1991). Curious model-building control systems. In Proc. Interna-
tional Joint Conference on Neural Networks (vol. 2, pp. 1458–1463). Piscataway, NJ:
IEEE.
Schultz, W., Apicella, P., & Ljungberg, T. (1993). Responses of monkey dopamine
neurons to reward and conditioned stimuli during successive steps of learning a
delayed response task. Journal of Neuroscience, 13, 900–913.
Schultz, W., & Dickinson, A. (2000). Neuronal coding of prediction errors. Annu. Rev.
Neurosci., 23, 473–500.
Schwartenbeck, P., FitzGerald, T., Dolan, R. J., & Friston, K. (2013). Exploration,
novelty, surprise, and free energy minimization. Front. Psychol., 4, 710.
Schwartenbeck, P., FitzGerald, T. H., Mathys, C., Dolan, R., & Friston, K. (2015). The
dopaminergic midbrain encodes the expected certainty about desired outcomes.
Cereb. Cortex, 25(10), 3434–3445.
Schwartenbeck, P., FitzGerald, T. H., Mathys, C., Dolan, R., Kronbichler, M., & Fris-
ton, K. (2015). Evidence for surprise minimization over value maximization in
choice behavior. Sci. Rep., 5, 16575.
Schwartenbeck, P., FitzGerald, T. H., Mathys, C., Dolan, R., Wurst, F., Kronbichler,
M., & Friston, K. (2015). Optimal inference with suboptimal models: Addiction
and active Bayesian inference. Med. Hypotheses, 84(2), 109–117.
Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends
Cogn. Sci., 17(11), 565–573.
Solway, A., & Botvinick, M. (2012). Goal-directed decision making as probabilistic
inference: A computational framework and potential neural correlates. Psychol.
Rev., 119, 120–154.
Specht, K. (2014). Neuronal basis of speech comprehension. Hear. Res., 307, 121–135.
Srihasam, K., Bullock, D., & Grossberg, S. (2009). Target selection by the frontal
cortex during coordinated saccadic and smooth pursuit eye movements. J. Cogn.
Neurosci., 21(8), 1611–1627.
Stam, C. J. (2005). Nonlinear dynamical analysis of EEG and MEG: Review of an
emerging field. Clin. Neurophysiol., 116(10), 2266–2301.
Stauffer, W. R., Lak, A., & Schultz, W. (2014). Dopamine reward prediction error
responses reflect marginal utility. Curr. Biol., 24(21), 2491–2500.
Still, S., & Precup, D. (2012). An information-theoretic approach to curiosity-driven
reinforcement learning. Theory Biosci., 131(3), 139–148.
van den Broek, J. L., Wiegerinck, W. A. J. J., & Kappen, H. J. (2010). Risk-sensitive
path integral control. UAI, 6, 1–8.
van der Meer, M., Kurth-Nelson, Z., & Redish, A. D. (2012). Information processing
in decision-making systems. Neuroscientist, 18(4), 342–359.
Wittmann, B. C., Daw, N. D., Seymour, B., & Dolan, R. J. (2008). Striatal activity
underlies novelty-based choice in humans. Neuron, 58(6), 967–973.
Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2005). Constructing free-energy approx-
imations and generalized belief propagation algorithms. IEEE Transactions on
Information Theory, 51(7), 2282–2312.
Yuille, A., & Kersten, D. (2006). Vision as Bayesian inference: Analysis by synthesis?
Trends Cogn. Sci., 10(7), 301–308.
Zak, P. J. (2004). Neuroeconomics. Philos. Trans. R. Soc. Lond. B. Biol. Sci., 359(1451),
1737–1748.
Zelinsky, G. J., & Bisley, J. W. (2015). The what, where, and why of priority maps
and their interactions with visual working memory. Ann. N. Y. Acad. Sci., 1339,
154–164.
Zhang, H., & Maloney, L. T. (2012). Ubiquitous log odds: A common representation of
probability and frequency distortion in perception, action, and cognition. Frontiers
in Neuroscience, 6, 1.