Neural Computation: Lecture Notes for the MSc/DTC Module
Mark van Rossum
Version 06/07
Acknowledgement
My sincere thanks to David Sterratt for providing the old course, figures, and tutorials on which this
course is based. Typeset using LYX, of course.
January 5, 2007
More reading
• Although no knowledge of neuroscience is required, there are numerous good neuroscience
textbooks which might be helpful: (Kandel, Schwartz, and Jessel, 2000; Shepherd, 1994;
Johnston and Wu, 1995). Finally, (Bear, Connors, and Paradiso, 2000) has nice pictures.
• There is now also a decent number of books dealing with neural computation:
– (Dayan and Abbott, 2002) High level, yet readable text, not too much math. Wide
range of up-to-date subjects.
– (Hertz, Krogh, and Palmer, 1991) Neural networks book; the biological relevance is
speculative. Fairly mathematical. Still superb in its treatment of abstract models of
plasticity.
– (Rieke et al., 1996) Concentrates on coding of sensory information in insects and what
is coded in a spike and its timing.
– (Koch, 1999) Good for the biophysics of single neurons.
– (Arbib (editor), 1995) Interesting encyclopedia of computational approaches.
– (Churchland and Sejnowski, 1994) Nice, non-technical book. Good on population codes
and neural nets.
• Journal articles cited can usually be found via www.pubmed.org
• If you find typos, errors, or unclear passages in these lecture notes, please tell me so they
can be corrected.
Contents

1 Important concepts
1.1 Anatomical structures in the brain
1.1.1 The neocortex
1.1.2 The cerebellum
1.1.3 The hippocampus
1.2 Cells
1.2.1 Cortical layers
1.3 Measuring activity
1.4 Preparations
4 Synaptic Input
4.1 AMPA receptor
4.2 The NMDA receptor
4.2.1 LTP and memory storage
4.3 GABAa
4.4 Second messenger synapses and GABAb
4.5 Release statistics
4.6 Synaptic facilitation and depression
4.7 Markov description of channels
4.7.1 General properties of transition matrices
4.7.2 Measuring power spectra
4.8 Non-stationary noise analysis
8 Coding
8.1 Rate coding
8.2 Population code
8.3 Fisher information
8.4 Information theory
8.5 Correlated activity and synchronisation
11 Spiking neurons
11.1 Many layers of spiking neurons: syn-fire or not
11.2 Spiking recurrent networks
11.3 Spiking working memory
11.4 Spiking neurons: Attractor states
12 Making decisions
12.1 Motor output
Bibliography
Chapter 1
Important concepts
Figure 1.1: Left: Dissected human brain, seen from below. Note the optic nerve, and the radiation
to the cortex.
Right: The cortex in different animals. Note the relatively constant thickness of the cortex across
species. From (Abeles, 1991).
distributed over the brain (unlike a conventional computer, where most computations take place
in the CPU). Similarly, long-term memory seems distributed over the brain.
Figure 1.2: Pre-frontal damage. Patients are asked to draw a figure; command in upper line,
response below. From (Luria, 1966).
1.2 Cells
Like most other biological tissue, the brain consists of cells. One cell type in the brain is the
so-called glial cell. Glial cells don't do any computation, but provide support to the neurons:
some take up the spilt-over neurotransmitter, others provide the myelin sheaths around the axons
of neurons.
More important for us are the neurons. There are some 10^11 neurons in a human brain. The
basic anatomy of a neuron is shown in Fig. 1.4: every neuron has a cell body, or soma, which contains
the nucleus of the cell. The nucleus is essential for the cell, as this is where protein synthesis takes
place, making it the central factory of the cell. The neuron receives its input through synapses on
its dendrites (dendrite: Greek for branch). The dendritic trees can be very elaborate and often
receive more than 10,000 synapses.
Neurons mainly communicate using spikes: brief (about 1 ms), stereotypic excursions of
the neuron's membrane voltage. Spikes are thought to be mainly generated at the axon hillock,
a small bump at the start of the axon. From there the spike propagates along the axon. The
axon can be long (up to one metre or so when it goes to a muscle). To increase the speed of
signal propagation, long axons have a myelin sheath around them. The cortical regions are
connected to each other with axons. This is the white matter; the layer of fat gives the
axons a white appearance, see Fig. 1.1. The axon ends in many axon terminals (about 10,000),
where the connections to the next neurons in the circuit are formed, Fig. 1.4.
Figure 1.3: Amnesia in a patient whose hippocampus was damaged due to a viral infection. From
(Blakemore, 1988).
The action potential also propagates back into the dendrites. This provides the synapses with the
signal that an action potential was fired.
A basic distinction is between excitatory and inhibitory neurons, depending
on whether they release excitatory or inhibitory neurotransmitter. (The inhibitory neurons I
personally call "neuroffs", but nobody else uses this term. Yet...) There are multiple sub-types of
both excitatory and inhibitory neurons; how many there are is not well known. As genetic markers
become more refined, more and more subtypes of neurons are expected to appear. It is not clear if
and how these different subtypes have different computational roles.
Finally, in reading biology one should remember that there are very few hard rules in biology: there
are neurons which release both excitatory and inhibitory neurotransmitter, there are neurons
without axons, not all neurons spike, etc.
Figure 1.4: Sketch of the typical morphology of a pyramidal neuron. Right: Electron micrograph
of a neuron’s cell body. From (Shepherd, 1994).
not only feed-forward input from lower areas, but also many lateral and feedback connections. The
feedback connections are usually excitatory and non-specific. The role of the feedback connections
is not clear. They might be involved in attentional effects.
Figure 1.5: Left: Layers in the cortex made visible with three different stains. The Golgi stain
(left) labels the soma and the thicker dendrites (only a small fraction of the total number of cells is
labelled). The Nissl stain shows the cell bodies. The Weigert stain labels axons. Note the vertical
dimension is one or two millimetres. From (Abeles, 1991). Right: approximate circuitry in the
layers. From (Shepherd, 1994).
• EEG (electro-encephalogram) and ERP (event related potential) measure the potential on
the skull. This has very poor spatial resolution, but good temporal resolution. Non-invasive.
Good for behavioural reaction times, e.g. (Thorpe, Fize, and Marlot, 1996).
• fMRI (functional magnetic resonance imaging) measures the increased blood oxygenation level.
The signal is related to neural activity in a not fully understood manner. It seems to correlate
better with synaptic activity than with spike activity (Logothetis et al., 2001). Resolution: about
1 mm, 1 minute. Non-invasive. Good for global activity and localization studies in humans.
• Extracellular electrodes: When a tungsten electrode is held close enough to a firing cell,
the cell's spikes can be picked up. Alternatively, the slow components of the voltage can
be analysed; this is called the field potential and corresponds to the signal from ensembles
of synapses in the local neighbourhood of cells. Extracellular recordings can be done
chronically in awake animals. A newer trend is to use many electrodes at once (either in an
array, or arranged in tetrodes). Limitations: no access to precise synaptic currents, difficult
to control stimulation, and the need for spike sorting (an electrode usually receives signals from a
couple of neurons, which is undesirable; the un-mixing is called spike sorting).
Figure 1.6: A setup to measure extracellular activity. After a band-pass filter the spikes can be
clearly extracted from the small electrical signal. Electrodes can remain implanted for years.
Figure 1.7: Patch-clamp recording. The inside of the cell is connected to the pipette. This allows
the measurement of single channel openings (bottom trace).
• Intracellular: Most intracellular recordings are now done using the patch clamp technique.
A glass pipette is connected to the intracellular medium, Fig. 1.7. Currents from very small
(single channel) to large (synaptic inputs and spikes) can be measured. Second, the voltage
and current in the cell can be precisely controlled. Access to the intracellular
medium allows one to wash in drugs that work from the inside of the cell. However, the access
can also lead to washout (the dilution of the cell's content). Limitations: hard in vivo (due
to small movements of the animal, even under anaesthesia), and limited recording time (up to 1
hour).
• A relatively new method is optical imaging. The reflectance of the cortical tissue
changes slightly with activity, and this can be measured. Alternatively, dyes sensitive to Ca
or voltage changes can be added, and small activity changes can be measured.
1.4 Preparations
In order to study the nervous system under controlled conditions, various preparations have been
developed. Most realistic would be to measure the nervous system in vivo without anaesthesia.
However, this poses both technical and ethical problems. Under anaesthesia the technical problems
are smaller, but reliable intracellular recording is still difficult. And, of course, the anaesthesia changes
the functioning of the nervous system.
A widely used method is to prepare slices of brain, about 1/2 mm thick. Slices allow for
accurate measurements of single-cell or few-cell properties. However, some connections will be
severed, and it is not clear how well the in vivo situation is reproduced, as the environment
(transmitters, energy supply, temperature, modulators) will be different.
Finally, it is possible to culture cells from young brains. The neurons will by themselves
form little networks. These cultures can be kept alive for a long time. Here too, the relevance to
the in vivo situation is not always clear.
Chapter 2
Passive properties of cells
A living neuron maintains a voltage drop across its membrane. One commonly defines the voltage
outside the cell as zero. At rest the inside of the cell will then be at about -70 mV (range -90 to
-50 mV). This voltage difference exists because the ion concentrations inside and outside the cell are
different. The main ions are K+ (potassium, or kalium in many languages), Cl− (chloride), Na+
(sodium), and Ca2+ (calcium).¹
Consider for instance Na: the concentration outside is 440 mM, while inside it is only 50 mM
(squid axon). If the cell membrane were permeable to Na, it would flow in: first, because of the
concentration gradient (higher concentration outside than inside); second, because of the attraction
of the negative membrane to the positive Na ions. Because of these two forces, Na influx does
not stop when the voltage across the membrane is zero. Only if the voltage across the membrane
were +55 mV would the net Na inflow stop. This +55 mV is called the reversal potential
of Na. Likewise K has a reversal potential of -75 mV (outside 20 mM and inside 400 mM), and Cl
of -60 mV (outside 500 mM and inside 50 mM). The reversal potential can be calculated from the
Nernst equation

Vrev = (58 mV / z) log10([X]outside / [X]inside)
which follows from the condition that the diffusive and electrical forces should cancel at equilibrium.
The valency of the ion is represented by z.
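As a sanity check, the Nernst equation is easy to evaluate numerically. The following Python sketch (the helper name and the rounding of the 58 mV prefactor are illustrative choices) reproduces the reversal potentials quoted above from the squid-axon concentrations:

import math

def nernst(z, conc_out_mM, conc_in_mM):
    # Nernst equation as given above: Vrev = (58 mV / z) * log10([X]out / [X]in)
    return (58.0 / z) * math.log10(conc_out_mM / conc_in_mM)

print(nernst(+1, 440, 50))   # Na: about +55 mV
print(nernst(+1, 20, 400))   # K:  about -75 mV
print(nernst(-1, 500, 50))   # Cl: about -58 mV (text rounds to -60 mV)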
However, at rest the Na channels are largely closed and only very little Na will flow into the cell.²
The K and Cl channels are somewhat open, together yielding a resting potential of about -70 mV.
By definition no net current flows at rest (otherwise the potential would change). The concentration
gradient of ions is actively maintained by ion pumps and exchangers. These proteins move ions
from one side of the membrane to the other at the expense of energy.
¹ The ions in solution carry charge, so we have at least 4 different charge carriers, all of them contributing to
the total current in the cell.
² When many ions contribute to the potential, the Goldman-Hodgkin-Katz voltage equation should be used to
calculate the resting potential.
Figure 2.1: The typical ion concentration inside and outside neurons. The concentrations listed
here are for mammals, whereas those in the text are for squid.
Figure 2.2: Left: RC circuit to model a single compartment cell. Middle: schematic model. The
injected current is Iinj, the total capacitance is C, and R is the total membrane resistance. Right:
response of the voltage to a step in the stimulus current.
Kirchhoff's law states that the sum of the currents at any point in the circuit should be zero. What
are the different contributions to the current? The current through the resistor is given by Ohm's law³

Iresistor = ΔV/R = (Vmem − Vrest)/Rm
Similarly, there is a current associated with the capacitance. This current flows, for instance, when
initially the voltage across the capacitor is zero, but suddenly a voltage is applied across it. Like
a battery, a current flows only until the capacitor is charged up (or discharged). The current into
the capacitor is

Icap = C dVmem/dt
³ Ohm's law says that current and voltage are linearly related. As soon as the linearity is lost, Ohm's law is
broken. This happens for instance in diodes or neon tubes; in that case we have a non-Ohmic conductance.
It is important to note that no current flows when the voltage across the capacitor does not change
over time. (Alternatively, one can describe the capacitor using a complex impedance.)
Finally, we assume an external current is injected. As stated, the sum of the currents should
be zero. We first have to fix the signs of the currents: we define currents flowing away from the
point as negative. Now we have −Iresistor − Icap + Iext = 0. The circuit diagram thus leads to
the following differential equation for the membrane voltage:

C dVm(t)/dt = −[Vm(t) − Vrest]/Rm + Iext(t)
In other words, the membrane voltage is given by a first-order differential equation. It is always
a good idea to study the steady-state solutions of differential equations first. This means that we
assume Iext to be constant and set dV/dt = 0. We find for the membrane voltage V∞ = Vrest + Rm Iext.
If the current increases the membrane voltage (Iext > 0) it is called de-polarising; if it lowers
the membrane potential it is called hyper-polarising.
How rapidly is this steady state approached? If the voltage at t = 0 is V0, one finds by substitution
that V(t) = V∞ + [V0 − V∞] exp(−t/τm). So, the voltage settles exponentially. The product
τm = Rm C is the time constant of the cell. For most cells it is between 20 and 50 ms, but we will
see later how it can be smaller under spiking conditions. The time constant determines how fast
the subthreshold membrane voltage reacts to fluctuations in the input current. The time constant
is independent of the area of the cell. The capacitance is proportional to the membrane area
(1 µF/cm²), namely, the bigger the membrane area the more charge it can store.
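The exponential settling is easy to verify numerically. Below is a minimal forward-Euler sketch of the RC membrane; the parameter values (tau_m = 20 ms, Rm = 100 MOhm, a 50 pA current step) are assumed round numbers, not taken from the text:

import numpy as np

tau_m  = 20e-3      # membrane time constant Rm*C (s)
r_m    = 100e6      # membrane resistance (Ohm)
v_rest = -70e-3     # resting potential (V)
dt     = 0.1e-3     # time step (s)

t = np.arange(0, 0.2, dt)
i_ext = np.where(t > 0.05, 50e-12, 0.0)       # 50 pA step at t = 50 ms

v = np.empty_like(t)
v[0] = v_rest
for n in range(1, len(t)):
    # forward-Euler step of C dV/dt = -(V - Vrest)/Rm + I
    v[n] = v[n-1] + dt / tau_m * (-(v[n-1] - v_rest) + r_m * i_ext[n-1])

# the trace settles exponentially to V_inf = Vrest + Rm*I (about -65 mV here)
print(v[-1], v_rest + r_m * i_ext[-1])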
It is useful to define the specific resistance, or resistivity, rm as

rm = A Rm

The units of rm are therefore Ω·cm². The resistance is inversely proportional to the membrane area
(some 50 kΩ·cm²), namely, the bigger the membrane area the more leaky it will be. The product
of membrane resistance and capacitance is independent of area. It is also useful to introduce the
conductance through the channel; the conductance is the inverse of the resistance, g = 1/R.
The larger the conductance, the larger the current. Conductance is measured in siemens (symbol
S).
Note that this section dealt with just an approximation of the behaviour of the cell. Such an
approximation has to be tested against data. It turns out to be valid for small perturbations of
the potential around the resting potential, but at high frequencies corrections to the simple RC
behaviour exist (Stevens, 1972).
cm dVm(x,t)/dt = −Vm(x,t)/rm + (d/(4h²ri)) [V(x+h,t) − 2V(x,t) + V(x−h,t)] + Iext(x,t)
Figure 2.3: Top: Electrical equivalent of a small cable segment. Bottom: branched cables
can also be calculated in this formalism. From (Koch, 1999).
(where the external current is now defined per area). Now we take the limit of small h, i.e. we
split the cable into very many small elements, and get the passive cable equation. Use that the
derivative is defined as df(x)/dx = lim(h→0) (1/h)[f(x+h) − f(x)]; applying this twice turns the
bracketed difference term into the second spatial derivative d²V/dx².
The cable equation describes how a current injected locally into the cable propagates and decays.
First, consider the steady state. Suppose a constant current is injected at x = 0, and dV/dt
is set to zero. The current injection at x = 0 can be written as a delta function I = I0 δ(x).⁴ The
steady-state cable equation is then

0 = (d/(4ri)) d²V(x)/dx² − Vm(x)/rm + I0 δ(x)

Integrate this over a narrow region around x = 0 (i.e. apply lim(ε→0) ∫ from −ε to ε on both sides)
and you find

0 = (d/(4ri)) [dV(ε)/dx − dV(−ε)/dx] + I0

In other words, the spatial derivative of V makes a jump at x = 0, hence V itself has a cusp there,
Fig. 2.4. The steady-state solution decays exponentially with distance, Vm(x) ∝ exp(−|x|/λ),
where λ = √(d rm/(4ri)) is the space constant, Fig. 2.4A. For a brief current pulse injected at
x = 0 and t = 0 the solution is

Vm(x,t) = (I0 rm/√(4πt/τm)) exp(−τm x²/(4λ²t)) exp(−t/τm)     (2.1)
The solution is plotted in Fig. 2.4B. Mathematically, the equation is a diffusion equation with
an absorption term. If we inject current into the cable, it is as if we drop some ink into a tube filled
with water and some bleach: the ink diffuses and spreads out; in the long run it is neutralised
by the bleach. Note that in the passive cable there is no wave propagation; the input only
blurs. Therefore it is a bit tricky to define velocities and delays. One way is to determine when
⁴ A delta function is zero everywhere, except at zero, where it is infinite. Its total area is one. You can think of
it as the limit of ever narrower and higher pulses of unit area.
Figure 2.4: Solution to the cable equation in an infinite cable. A) Steady state solution when a
constant current is injected at x=0. B) Decaying solution when a brief current pulse is injected
at x=0 and t=0. Taken from (Dayan and Abbott, 2002).
the response reaches a maximum at different distances from the injection site. For instance, in
Fig. 2.4 (right) the voltage at x/λ = 1 reaches its maximum around t = τm. In general one finds
with Eq. 2.1 that

tmax = (τm/4) [√(4x²/λ² + 1) − 1]

For large x this means a "speed" v = 2λ/τm.
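A small sketch evaluating this delay formula; the values of λ and τm below are assumed for illustration:

import numpy as np

tau_m = 20e-3              # membrane time constant (s), assumed
lam   = 1e-3               # space constant lambda (m), assumed

def t_max(x):
    # time of the voltage peak at distance x after a brief injection at x = 0
    return tau_m / 4 * (np.sqrt(4 * x**2 / lam**2 + 1) - 1)

for x in (1e-3, 2e-3, 5e-3):       # 1, 2 and 5 space constants away
    print(x / lam, t_max(x) / tau_m)

# for x >> lambda, t_max ~ x*tau_m/(2*lam): an effective "speed" of
# v = 2*lam/tau_m = 0.1 m/s with these numbers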
It is important to realise that because of the blurring, high frequencies do not reach as far in
the dendrite. In other words, the response far away from the stimulation site is a low-pass filtered
version of the stimulus. This is called cable filtering.
In the derivation we assumed a long cable and homogeneous properties. Real neurons have
branches and varying thicknesses. The voltage can be analytically calculated to some extent, but
often it is easier to use a computer simulation in those cases. Under the assumption that the
properties of the cables are homogeneous, this is fairly straightforward using the cable equation.
However, the validity of assuming homogeneity is not certain. This important question has to be
answered experimentally.
More reading: (Koch, 1999)
Chapter 3
Active properties and spike generation
We have modelled a neuron with passive elements. This is a reasonable approximation for sub-
threshold effects and might be useful to describe the effect of far-away dendritic inputs on the
soma. However, an obvious property of most neurons is that they produce action potentials (also
called spikes). Suppose we inject current into a neuron which is at rest (-70 mV). The voltage will
start to rise. When the membrane reaches a threshold voltage (about -50 mV), it will rapidly
depolarise to about +10 mV and then rapidly hyper-polarise back to about -70 mV. This whole
process takes only about 1 ms. The spike travels down the axon. At the axon terminals it causes
the release of neurotransmitter, which excites or inhibits the next neuron in the pathway.
From the analysis of the passive properties, it seems that in order to allow such fast events
as spikes, the time constant of the neuron must be reduced. One way would be to reduce the
membrane capacitance, but this is biophysically impossible. The other way is to dramatically
increase the conductance through the membrane; this turns out to be the basis for spike
generation. The magnificent series of papers by Hodgkin and Huxley in 1952 explains how this
works (Hodgkin and Huxley, 1952).
Figure 3.1: Voltage gated channels populate the cell membrane. The pores let through certain
ions selectively, and open and close depending on the membrane voltage.
reversal potential is +40 mV), sodium starts to flow in, depolarising the cell. 3) This positive
feedback loop will open even more Na channels and the spike is initiated. 4) However, rapidly
after the spike starts, the sodium channels close again and now K channels open. 5) The K ions
start to flow out of the cell, hyper-polarising it, roughly bringing it back to the resting potential.
We now describe this in detail. Consider a single compartment, or a small membrane patch.
As before, in order to calculate the membrane potential we collect all currents. In addition to
the leak current and capacitive current, we now have to include Na and K currents. Let's first
consider the sodium current. The current (per area) through the sodium channels is

INa = gNa(V,t) [V − VNa^rev]     (3.1)

The current is proportional to the difference between the membrane potential and the Na reversal
potential: the current flow will try to make the membrane potential equal to the reversal
potential.¹ ²
The total conductance through the channels is given by the number of open channels,

gNa(V,t) = gNa⁰ ρNa Popen(V,t)

where gNa⁰ is the open conductance of a single Na channel (about 20 pS), and ρNa is the density
of Na channels per area. The Na channel's open probability turns out to factorise as

Popen(V,t) = m³(V,t) h(V,t)
where m and h are called gating variables. Microscopically, the gates are like little binary switches
that switch on and off depending on the membrane voltage. The Na channel has 3 switches labelled
m and one labelled h. For the sodium channel to conduct, all three m gates and the h gate have to
be switched on. The gating variables describe the probability that a gate is in the 'on' or 'off'
state. Note that the gating variables depend both on time and voltage; their values range between
0 and 1. The gating variables evolve as
dm(V,t)/dt = αm(V)(1 − m) − βm(V) m     (3.2)

dh(V,t)/dt = αh(V)(1 − h) − βh(V) h
Intermezzo: Consider a simple reversible chemical reaction in which substance A is turned into
substance B and back,

[A] ⇌ [B]

with forward rate β and backward rate α. The rate equation for the reaction is d[A]/dt = −β[A] + α[B].
Normalising without loss of generality such that [A] + [B] = 1, we have d[A]/dt = α(1 − [A]) − β[A].
This is very similar to what we have for the gating variables. The solution to this differential equation
is exponential, like for the RC circuit. If at time 0 the concentration of A is [A]0, it will settle to its
equilibrium value [A]∞ = α/(α + β) as [A](t) = [A]∞ + ([A]0 − [A]∞) exp(−(α + β)t).
¹ The influx of Na will slightly change the reversal potential. Yet the amount of Na that flows in during a single
action potential causes only a very small change in the concentrations inside and outside the cell. In the long run,
ion pumps and ion exchangers maintain the correct concentrations.
² There are small corrections to Eq. 3.1 due to the physical properties of the channels, given by the Goldman-
Hodgkin-Katz current equation.
Figure 3.2: Left: The equilibrium values of the gating variables. Note that they depend on the
voltage. Also note that the inactivation variable h switches off with increasing voltage, whereas m
switches on. Right: The time constants by which the equilibrium is reached. Note that m is by
far the fastest variable. From (Dayan and Abbott, 2002).
The interesting part for the voltage-gated channel is that the rate constants depend on the
voltage across the membrane. Therefore, as the voltage changes, the equilibrium shifts and the
gating variables will try to establish a new equilibrium. The equilibrium value of the gating
variable is

m∞(V) = αm(V) / (αm(V) + βm(V))

and the time constant is

τm(V) = 1 / (αm(V) + βm(V))
Empirically, the rate constants are (for the squid axon, as determined by Hodgkin and Huxley)

αm(V) = (25 − V) / (10 [exp(0.1(25 − V)) − 1])     (3.3)
βm(V) = 4 exp(−V/18)
αh(V) = 0.07 exp(−V/20)
βh(V) = 1 / (1 + exp(0.1(30 − V)))
In Fig. 3.2 the equilibrium values and time-constants are plotted.
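These rate functions are straightforward to code. A small sketch (assuming, as in the original papers, V in mV relative to rest and rates in 1/ms) that evaluates the equilibrium value and time constant of m; note the removable 0/0 in αm at V = 25 mV, which a numerical implementation has to sidestep:

import numpy as np

def alpha_m(V): return 0.1 * (25 - V) / (np.exp(0.1 * (25 - V)) - 1)
def beta_m(V):  return 4.0 * np.exp(-V / 18)

def m_inf(V): return alpha_m(V) / (alpha_m(V) + beta_m(V))
def tau_m(V): return 1.0 / (alpha_m(V) + beta_m(V))

for V in (0.0, 25.0 + 1e-9, 50.0):    # small offset avoids the 0/0 at V = 25
    print(V, m_inf(V), tau_m(V))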
Importantly, m opens with increasing voltage, but h closes with increasing voltage. m is called
an activation variable and h an inactivation variable. The inactivation causes
the termination of the Na current. Because the inactivation is much slower than the activation,
spikes can grow before they are killed.
We can write down a Markov state diagram for a single Na channel. There are 4 gates in total
(3 m's and 1 h), each of which can independently be in the up or down state. So in total there are
2⁴ = 16 states, one of which is the open state, Fig. 3.3 (top). However, in the diagram
it makes no difference which of the m gates is activated, so it can be reduced to 8
distinct states, Fig. 3.3 (bottom).
[Figure 3.3 (diagram): Top: the 16 states, with the Rest, Inactivated and Open states marked and
transitions labelled αm, βm, αh, βh. Bottom: the reduced 8-state scheme

m0h0 ⇄ m1h0 ⇄ m2h0 ⇄ m3h0
m0h1 ⇄ m1h1 ⇄ m2h1 ⇄ m3h1

with rightward rates 3αm, 2αm, αm, leftward rates βm, 2βm, 3βm, and transitions between the
rows at rates αh (h switching on) and βh (h switching off); the open state is m3h1.]
Figure 3.3: Upper: The 16 possible states of the Na channel. The 4 gating variables can each be
in off state (white symbols) or on state (grey symbols). There are 3 ’m’ gates (squares) and 1
’h’ gate (circle). In the transitions between the states, one symbol changes color. For clarity all
possible transitions of only one state are shown. All states but one correspond to a closed channel.
Bottom: The Markov diagram can be simplified to 8 distinct states. The rate-constants have to
be changed as indicated (check for yourself).
Equivalently, the macroscopic conductance can be written as

gNa(V,t) = ḡNa m³(V,t) h(V,t)

where ḡNa is the total maximal sodium conductance per area (0.12 S/cm² in HH). Indeed, this is the
way the original HH theory was formulated. In the original HH model, all gating variables are
continuous real quantities, not switches. In contrast, in the stochastic version one has a discrete
number of channels, each with activation variables (h, m, n) that flip between on and off states;
this in turn leads to a flickering of the conductances. The rate constants give the probabilities that
they flip. In the limit of a large number of channels, the stochastic description of course matches
the original one.
The importance of channel noise for spike-timing reliability is not fully known. It is not
a big effect, but it is probably not completely negligible (van Rossum, O'Brien, and Smith, 2003).
As seen in Fig. 3.4, already for 100 channels the noise is quite small.
The K channel works in a similar way as the Na channel. Its conductance can be written as

gK(V,t) = ḡK n⁴(V,t)
The difference is that the K current does not inactivate. As long as the membrane potential
remains high, the K channels will stay open, tending to hyper-polarise the membrane. The
gating variable n also obeys dn/dt = αn(1 − n) − βn n. The rate constants are

αn(V) = (10 − V) / (100 [exp(0.1(10 − V)) − 1])
βn(V) = 0.125 exp(−V/80)
Figure 3.4: Top: State diagram for the K channel. Bottom: the K current for a limited number
of channels. Left: 1 channel; right 100 channels. The smooth line is the continuous behaviour.
Taken from (Dayan and Abbott, 2002).
Figure 3.5: Spike generation in a single compartment Hodgkin-Huxley model. The cell is stimu-
lated starting from t = 5ms. Top: voltage trace during a spike. Next: Sum of Na and K current
through membrane (stimulus current not shown). Bottom 3: the gating variables. Note how m
changes very rapidly, followed much later by h and n. From (Dayan and Abbott, 2002).
As can be seen in Fig. 3.2, the K conductance is slow. This allows the Na to raise the membrane
potential before the K kicks in and hyper-polarizes the membrane.
3.1.2 A spike
To obtain the voltage equation we apply Kirchhoff's law again and collect all currents: the
capacitive, leak, Na, and K currents. The Hodgkin-Huxley equation for the voltage is therefore

cm dV(t)/dt = −gleak[V(t) − Vleak] − gNa(V,t)[V(t) − VNa^rev] − gK(V,t)[V(t) − VK^rev] + Iext

The full Hodgkin-Huxley model consists of this equation, the equations for the Na current,
Eqs. (3.4), (3.2) and (3.3), and the similar equations for the K current. The Hodgkin-Huxley model is
complicated, and no analytical solutions are known. It is a four-dimensional, coupled equation
(the dimensions are V, h, m, n). To solve it one has to integrate the equations numerically. The
simplest, but not the most efficient, way is: initialise the voltage and the gating
variables at time 0. Next, we calculate the voltage a little bit later: calculate the rate constants
at the current voltage; from these calculate the change in the gating variables; from these follow
the Na and K conductances. Now the new value of the potential can be calculated. Repeat this
for the next time step.
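A minimal sketch of this integration loop for a single compartment is given below (forward Euler; voltage in mV relative to rest and time in ms, with the classic squid-axon parameters; the small time step is needed for Euler to be stable). This is an illustration of the procedure, not an efficient implementation:

import numpy as np

# classic squid-axon parameters (uF/cm2, mS/cm2, mV relative to rest)
c_m, gbar_na, gbar_k, g_leak = 1.0, 120.0, 36.0, 0.3
e_na, e_k, e_leak = 115.0, -12.0, 10.6

def a_m(v): return 0.1 * (25 - v) / (np.exp(0.1 * (25 - v)) - 1)
def b_m(v): return 4.0 * np.exp(-v / 18)
def a_h(v): return 0.07 * np.exp(-v / 20)
def b_h(v): return 1.0 / (1 + np.exp(0.1 * (30 - v)))
def a_n(v): return 0.01 * (10 - v) / (np.exp(0.1 * (10 - v)) - 1)
def b_n(v): return 0.125 * np.exp(-v / 80)

dt, t_end = 0.01, 50.0                # ms
v = 0.0                               # start at rest
m, h, n = 0.053, 0.596, 0.318         # resting values of the gating variables

for k in range(int(t_end / dt)):
    t = k * dt
    i_ext = 10.0 if t > 5.0 else 0.0  # uA/cm2, switched on at t = 5 ms
    # 1) rates at the present voltage -> 2) update gates -> 3) update voltage
    m += dt * (a_m(v) * (1 - m) - b_m(v) * m)
    h += dt * (a_h(v) * (1 - h) - b_h(v) * h)
    n += dt * (a_n(v) * (1 - n) - b_n(v) * n)
    i_ion = gbar_na * m**3 * h * (v - e_na) + gbar_k * n**4 * (v - e_k) \
            + g_leak * (v - e_leak)
    v += dt * (i_ext - i_ion) / c_m
    if k % 500 == 0:
        print(f"t = {t:5.1f} ms   V = {v:7.2f} mV")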
Fig. 3.5 shows a simulated spike. Note the sequence of the processes: first m, later followed by n
and h. The passive membrane is slow; the spike generation can be so fast because the membrane
is made very leaky during a short time. The currents through the voltage-gated channels are much
larger than the stimulus current. At rest small currents are sufficient to change the voltage, but
during the spike the membrane is leaky and much larger currents flow.
The above description is for a single compartment. If one wants to describe propagation of
the spike along the axon, one has to couple the compartments as we did in the cable equation. The
equation holds for each compartment of the axon. In the experiments of Hodgkin and Huxley, a
small silver wire was inserted all along the axon; this causes the voltage to be equal across the full
axon. That way the spatial dependence was eliminated (called a space-clamp). Furthermore,
the voltage of the axon was controlled, a so-called voltage-clamp. Without the regenerative
mechanism in place, the rate constants could be measured accurately.
When the silver wire was removed, the axon was allowed to run "free". The spikes now travel
along the axon. The equations predict the speed of propagation of the spike. This speed closely
matched the observed speed, which provided an important check of the model.
Final comments:
• The kinetics of the HH model depend strongly on temperature. The original experiments
were carried out at 6.3°C. Usually temperature dependence is expressed with a Q10, which
describes how much faster a certain reaction goes when the temperature is 10°C higher.
A Q10 of 3 means that when going from 6°C to 36°C the reaction goes 3³ = 27 times faster. To
understand the effect on spike generation, consider the voltage component of the space-
dependent Hodgkin-Huxley equations:

cm dV(x,t)/dt = (d/(4ri)) d²V(x,t)/dx² − Σk gk(V,t)[V(x,t) − Vk^rev] − gleak[V(x,t) − Vleak^rev] + Iext     (3.5)

where ri is the axial resistivity and the gk are the ionic conductance densities. At 36°C the
channel conductances speed up to g′k(V,t) = gk(V, qt) with q = 27; that is, the spikes are faster.
But in terms of this scaled time qt the capacitance term becomes larger, C dV/dt → qC dV/d(qt),
and as a result the spikes are damped more (Huxley, 1959). In other words, at high temperature
there is no time to charge the membrane. This means that higher channel densities are
required for a model that has to work both at 6°C and at 35°C than for a model adjusted to work
just at 6°C. Consistent with this, the firing rate increases at higher temperatures. In addition the
spike amplitude decreases at higher temperatures. This can easily be checked in simulations.
• We have approximated the gates as operating independently. There is no a priori reason
why this should be the case, but it works reasonably well. Nevertheless, more accurate models of
the channel kinetics have been developed. Readable accounts can be found in Hille's book
(Hille, 2001).
Figure 3.6: F/I curve (firing rate F in Hz versus injected current I in pA) for the Hodgkin-Huxley
model, with no noise, Na-channel noise only, K-channel noise only, and both. This is a small
(100 µm²) patch, so channel noise is substantial.
Figure 3.8: Spike frequency adaptation. Left: Normal situation, a step current is injected into
the soma, after some 100ms spiking stops. Right: Noradrenaline reduces the KCa current, thus
limiting the adaptation.
3.2.1 KA
The KA current (IA) is a K current that inactivates at higher voltages. This seems a bit counter-
productive, as one could expect that the higher the membrane voltage, the more important it is to
counteract it with a K current.
The effect of the KA current on firing is as follows: suppose the neuron is at rest and a
stimulus current is switched on. At first the KA currents are still active and keep the membrane
relatively hyper-polarised. As the KA channels inactivate, the voltage increases and the spike is
generated. The KA current can thus delay spiking. Once repetitive spiking has started, the
same mechanism will lower the spike frequency for a given amount of current.
the cell should be taken into account. More details in (Koch, 1999; de Schutter and Smolen, 1998).
strongly (see (Liu and Wang, 2001) for a mathematical model). The precise consequences remain
to be examined.
Most excitatory neurons have strong spike-frequency adaptation (inhibitory ones much less
so). On many levels one can observe that the brain likes change: when an image is projected
steadily onto the retina, the percept disappears after a few minutes. (Because the eyes continuously
make small movements, known as saccades, this effect does not occur in daily life.) Also on the
single-cell level, changes usually have a larger effect on the output than steady-state conditions.
3.2.4 Bursting
Some cells, for example in the thalamus, show bursting behaviour. In a burst, a few spikes are
generated in a short time. The burst often continues after the stimulation has stopped. Ca channels
are thought to be important for burst generation. The computational role of bursts is not clear.
3.4 Myelination
Axons that have to transmit their spikes over large distances are covered with a sheath of myelin.
This highly insulating layer reduces both the leak and the capacitance by a factor of about 250,
speeding up propagation. The myelin covering is interrupted every so often along the axon.
At these nodes one finds a very high density of sodium and potassium channels that boost the
spike. For unmyelinated axons the propagation speed is about 1 m/s and proportional to √d,
where d is the axon diameter, while for myelinated fibres the speed is about 50 m/s and proportional
to d. The actual speed is a complicated function of the parameters, but can be found numerically.
The proportionality to the diameter can be found using dimensional considerations.
Chapter 4
Synaptic Input
So far we have studied artificially stimulated neurons, which responded because we injected a
current into them. But except for primary sensory neurons, input comes from other cells in the
brain. Synaptic contacts communicate the spikes in the axon terminals of the pre-synaptic cell to
the dendrites of the post-synaptic cell. The most important are chemical synapses, in which
transmission is mediated by a chemical called a neurotransmitter.
Another type of connection between neurons are the so-called gap junctions, also called
electrical synapses. These synapses are pore proteins, i.e. channels, and provide a direct
coupling of small molecules and ions (and hence voltage!) between cells. Although gap junctions
are abundant in some brain areas, their function is less clear. One clear role for gap junctions
exists in the retina, where they determine the spatial spread of electrical activity. At different light
levels, dopamine regulates the gap-junction conductivity and hence the spatial filtering of the
signal (see below for adaptation). Another role has been implicated in the synchronisation of spikes
in hippocampal inter-neurons (ref). We will not consider gap junctions any further and follow the
common convention that by synapse we actually mean a chemical synapse.
Figure 4.1: Left: Schematic of a chemical synapse. Right top: Electron micrograph of a (chemical)
synapse. Right bottom: Electron micrograph of an electrical synapse or gap-junction.
Because synapses are the main way neurons communicate and the main way networks change
dynamically, it is important to know their behaviour.
where VAMPA^rev is the AMPA reversal potential, which is about 0 mV, and g0 is the synapse's peak
conductance. The peak conductance depends on a number of factors we discuss below.
Like the voltage-gated channels, the channel openings and closings are stochastic events; they
occur randomly. Using patch-clamp recordings these fluctuations can be measured, and the single-
channel conductance has been determined to be about 10-100 pS for most channel types (Hille,
2001). The number of receptors is usually somewhere between 10 and a few hundred per synapse
for central synapses.
¹ The distinction between pre- and post-synaptic is useful when talking about synapses. However, note that these
terms do not distinguish cells; most neurons are post-synaptic at one synapse, and pre-synaptic at another synapse.
Figure 4.2: State diagram for the AMPA receptor. Its response is shown in the next figure.
Parameters for the state diagram are: Rb = 13×10⁶ M⁻¹s⁻¹, Rd = 900 s⁻¹, Rr = 64 s⁻¹,
Ro = 2.7×10³ s⁻¹ and Rc = 200 s⁻¹. T denotes the transmitter. From (Destexhe, Mainen, and
Sejnowski, 1998).
Figure 4.3: Comparison of the most important types of synaptic responses. Note the difference in
time scales (both NMDA and GABAb are slow). From (Destexhe, Mainen, and Sejnowski, 1998).
Figure 4.4: The NMDA response is non-linear when Mg is present (which is normally the case).
From (Koch, 1999).
2) The time course of the NMDA current is much longer than that of AMPA, Fig. 4.3. The NMDA
response comes on after the AMPA response has largely decayed, and it decays back with a time
constant of some 100 ms. This long time constant has helped modellers to build networks which
have slow dynamics. The slower dynamics help to stabilise attractor states, see Chap. 10.
3) The difference in dynamics is caused by a much slower unbinding of Glu from the receptor
(the binding of Glu to the receptor is similar). As a result, some NMDA receptors can still
be occupied when the next Glu pulse arrives, which can cause saturation. Another consequence is
that the NMDA receptor is much more sensitive than AMPA when long puffs of Glu are applied.
4.3 GABAa
GABA is the main inhibitory transmitter. The transmitter binds to both GABAA and GABAB
receptors. The GABAA time constant is about 5 ms. GABA synapses are often found close to the
cell body, whereas excitatory (AMPA) synapses are usually distributed all over the dendritic tree.
Counter-intuitively, the inhibitory reversal potential can be close to, or even above, the resting
potential and still inhibit the cell. This is called shunting inhibition: the inhibitory conductance
effectively increases the leak conductance. Because this makes it harder for the neuron to
reach the threshold voltage, it has an inhibitory effect.
Figure 4.5: Schematic of GABAa and GABAb mechanism. For GABAa the transmitter binds
directly to the ion channel which it then opens (left), GABAb uses a second messenger cascade in
which the transmitter first binds to a GABAb receptor, which activates a G-protein, which opens
a channel (right).
Second messengers are common in neurobiology. Many neuromodulators act through second
messenger systems. Although second messenger systems don't give fast responses, they are very
flexible and can also trigger events other than channel openings. Because they trigger a cascade
of reactions, they can also be very non-linear. The GABAb response is an example of that: longer
stimuli give a disproportionately large response, Fig. 4.6. Another important second messenger
receptor is the metabotropic glutamate receptor mGluR. For reasons not fully clarified, mGluR
can often be found in AMPA/NMDA synapses.
• The content and diameter of the vesicles vary. The radii of the vesicles vary (COV² on the
order of 10%), and hence the amount of transmitter per vesicle varies (assuming a constant
concentration). Exercise: try to figure out the COV of the volume from the COV of the radius.
• Per event a varying number of vesicles is released.
• The binding of transmitter post-synaptically and the consequent opening of the channels is
stochastic.
The release of a discrete number of vesicles is called the quantum hypothesis. The quantum
hypothesis is one of the classic cornerstones of synaptic transmission. Consistent with it, the
distribution of amplitudes can have multiple peaks, each associated with a number of vesicles,
Fig. 4.7. The easiest model is to assume a large number of independent release sites. The number
of vesicles released per spike, k, is then given by the Poisson distribution

PPoisson(k) = m^k exp(−m) / k!     (4.1)
² COV = coefficient of variation = standard deviation/mean. See chapter 6.
Figure 4.6: Comparison of response linearity of the major synapses. Note that this represents just
one aspect of the dynamics. The full dynamics are given by the state diagrams.
where m is the average number of vesicles released, or quantal content. This can be used to
fit the distribution of amplitudes. Note that sometimes no vesicle is released at all; this is
called a failure. The Poisson distribution predicts that this happens with probability exp(−m).
If the number of release sites, or active zones, is limited, one needs to replace the Poisson
distribution with a binomial distribution

Pbin(k) = (n choose k) p^k (1 − p)^(n−k)

where at most n vesicles can be released per spike.
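Both release models are easy to sample; in the sketch below the quantal content and release parameters are made-up illustrative values:

import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

# Poisson model: many independent release sites, quantal content m
m = 1.5
k_poisson = rng.poisson(m, trials)
print("failure rate:", np.mean(k_poisson == 0), "predicted:", np.exp(-m))

# binomial model: n release sites, each releasing with probability p
n, p = 5, 0.3
k_binom = rng.binomial(n, p, trials)
print("mean quantal content:", k_binom.mean(), "predicted:", n * p)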
Figure 4.7: Distribution of synaptic response amplitudes averaged over many trials. The different
humps are thought to correspond to the release of different numbers of vesicles. This illustrates
the quantum hypothesis. From (Stricker, 2002).
Most of these models were developed for the squid and neuromuscular junctions, where the distri-
butions are clear. In central synapses (such as in hippocampus or cortex) results are less clear;
for review see (Stricker, 2002). It has been suggested that a single vesicle could fully saturate the
post-synaptic receptors, which would render the above analysis invalid.
A simple model of synaptic depression assumes that between releases the release probability
recovers to its resting value with time constant τdepress:

τdepress dprel(t)/dt = prel^∞ − prel(t)

In addition, every time a vesicle is released the release probability is reduced by a fixed amount.
Figure 4.8: Schematic of the effect of synaptic depression. Two excitatory neurons P1 (only firing
after a delay) and P3 (continuously firing) provide input to P2. Because of depression, only the
transients of the input are transmitted. Synapses onto inhibitory neurons (I) are thought to be
more likely to facilitate. The response of I is therefore strongly supra-linear in the firing rate of
P3. From a comment in Science on (Abbott et al., 1997).
This should be supplemented with the requirement that the release probability is not
allowed to go below zero. For more models, see (Varela et al., 1997; Dayan and Abbott, 2002). The
result is a strong non-linearity in the transfer function, dependent on the history of the synapse.
Recently, the functional roles of depression have started to be realised, e.g. (Carandini, Heeger,
and Senn, 2002).
The unreliable transmission also opens the possibility that the release probability is sys-
tematically regulated. There was some evidence that LTP changes not only the synaptic strength
but also the temporal pattern of release (Markram and Tsodyks, 1996), which inspired a lot of
models; however, further experimental evidence has failed to accumulate.
Final remark: two neurons might use more than one synaptic contact to connect to each other.
In the cortex a typical number seems to be between 1 and 10 synaptic contacts per connection.
Of course, such connections are proportionally more reliable and stronger.
4.7 Markov description of channels
As an example, consider a channel with two closed states and one open state, C ⇌ C′ ⇌ O.
The first binding of transmitter T moves the channel from closed state C to a second closed state
C′; binding of another T opens the channel. We can write the state of the channel as a vector
s(t) = (C, C′, O). The entries in the vector are the probabilities to find a certain channel in that
state or, when measuring populations of channels, the average number of channels in that
state. The dynamics of the channel can then be written as

ds/dt = W s
where W is a transition matrix between the different states. (This formalism is also called a master
equation and is closely related to the Chapman-Kolmogorov equation.) For our example it is

      ( −kT       α        0  )
W =   (  kT    −k′T − α    α′ )
      (  0       k′T      −α′ )
The W matrix always has at least one zero eigenvalue, and the corresponding eigenvector is the
steady state. In this case s∞ ∝ (αα′, α′kT, kk′T²). The open probability in this steady state is
found by normalising such that the sum of the probabilities is 1 (i.e. Σ(i=1..3) si = 1). In this case it
is O∞ = kk′T²/(αα′ + α′kT + kk′T²). For small concentrations of transmitter, the open probability
is proportional to the transmitter concentration squared. Then there is an intermediate regime,
and for large concentrations O∞ = 1, i.e. the receptor is saturated.
The other two eigenvalues of the transition matrix tell us how quickly the system settles to
the steady state. The general solution can be written as

s(t) = s∞ + c1 e^(λ1 t) s1 + c2 e^(λ2 t) s2

where λi is eigenvalue i and si is the corresponding eigenvector. The constants c1 and c2 are
determined by the solution at time 0, such that s(0) = s∞ + c1 s1 + c2 s2.
The average of many channels will tend to the steady state, but a single channel will keep
fluctuating randomly around the equilibrium.
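The steady state and the relaxation eigenvalues follow directly from an eigen-decomposition of W; a small numpy sketch, with illustrative (not fitted) rate values:

import numpy as np

# rates for the C <-> C' <-> O scheme: kT, k'T (binding), alpha, alpha' (unbinding)
kT, kpT = 4.0, 2.0          # illustrative values of k*T and k'*T
al, alp = 1.0, 0.5          # illustrative values of alpha and alpha'

W = np.array([[-kT,       al,       0.0],
              [ kT, -kpT - al,      alp],
              [0.0,      kpT,      -alp]])

vals, vecs = np.linalg.eig(W)
i0 = np.argmin(np.abs(vals))           # the zero eigenvalue
s_inf = np.real(vecs[:, i0])
s_inf /= s_inf.sum()                   # normalise so the probabilities sum to 1
print("steady state (C, C', O):", s_inf)
print("relaxation eigenvalues:", np.sort(np.real(vals))[:2])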
In thermal equilibrium the transition rates between any two states obey detailed balance:

wij sj^equil = wji si^equil

You can easily check that this holds for the Na-channel state diagram shown above by calculating
clockwise and counter-clockwise loop probabilities. Detailed balance is a requirement from statis-
tical physics. Without it, we still have a proper Markov process, but one which cannot describe
a physical process.
1. The probability density function (PDF), also called the probability distribution. In this
case it is simple: the channel is either open or closed, and the probability that the channel
is open is determined by the amount of transmitter present. With multiple channels the
probability distribution becomes a binomial distribution.
2. The other thing to know is how fast the fluctuations change over time: how often does the
channel switch per second? This is expressed in the autocorrelation function. As we
shall see, it is closely related to the power spectrum.
For most processes c(t′) decays for long t′; in other words, the process loses its memory. For the
simplest Markov scheme with two states (C ⇌ O), c(t′) decays exponentially: c(t′) =
c e^(−|t′|/τ), and we call τ the correlation time. In the case of our three-state synaptic channel, the
correlation function is the sum of two exponentials with time constants 1/|λ1| and 1/|λ2|.
The Fourier transform of the signal is defined as s(f) = ∫(0 to T) S(t) e^(−2πift) dt. The power
spectrum gives how much power the fluctuations generate at each frequency. The power spectrum
w is given by

w(f) = lim(T→∞) (2/T) |s(f)|²

The Wiener-Khinchin theorem says that the power spectrum and the autocorrelation are related
through a Fourier transform:

w(f) = 4 ∫(0 to ∞) c(t) cos(2πft) dt

c(t) = ∫(0 to ∞) w(f) cos(2πft) df
Figure 4.9: Left: Non-stationary noise analysis as applied to the sodium channels. A repeated
step stimulus leads to an average response. The size of the fluctuations around it depends on
the unitary conductance.
Right: The resulting parabola when the variance is plotted versus the mean. From (Hille, 2001).
protocol determines the open probability of the channel as a function of time. Let's call the open
probability at any time p(t). The stimulus can be either a voltage step for voltage-gated channels,
or a transmitter jump for synaptic channels. The average current, averaged over many trials, is

⟨I(t)⟩ = i0 N p(t)

where i0 is the unitary channel current and N the number of channels. The variance will be

σI²(t) = i0² N p(t)[1 − p(t)] = i0 ⟨I(t)⟩ − ⟨I(t)⟩²/N

The last equation shows that the channel's unitary current can be extracted by plotting the variance
versus the mean, Fig. 4.9. In addition the number of channels can be extracted.
Good models for synaptic kinetics can be found in (Destexhe, Mainen, and Sejnowski, 1998);
more on Markov diagrams in (Johnston and Wu, 1995).
Chapter 5
Integrate-and-fire models
In the integrate-and-fire model the sub-threshold membrane voltage obeys the familiar passive
equation

C dVm(t)/dt = −[Vm(t) − Vrest]/Rm + Iext(t)

supplemented by a threshold: when Vm reaches the threshold voltage a spike is registered and the
voltage is reset to Vreset.
This is the so-called leaky integrate-and-fire model. The simplification without the Rm term
is called the leak-less I&F model. Usually Vreset is chosen equal to Vrest, but this is not
necessary. Observe that there is no real spike in the membrane voltage; instead the spike is an event.
Figure 5.1: Left: The circuit diagram of the integrate-and-fire neuron. Right: example voltage
trace. The integrate-and-fire neuron does not generate the nice spike shown, it only generates
’spike-events’. The spike shape was added artificially by hand.
Although the integrate-and-fire model appears linear and simple, it actually is not. This
is due to the threshold and subsequent reset.
The F/I curve of the I&F neuron is easily calculated. Suppose Vreset = Vrest, and the neuron
is driven by a constant current Istim. Let's assume the neuron has fired at t = 0, so its membrane
potential is Vrest. When will the next spike occur, i.e. when is the threshold voltage Vthr reached?
The voltage will behave as V(t) = Vrest + Istim Rm [1 − exp(−t/τm)]. Solving for t gives the
inter-spike interval

tisi = −τm ln[1 − (Vthr − Vrest)/(Istim Rm)]

so the firing rate is f = 1/tisi, provided Istim Rm > Vthr − Vrest; for smaller currents the threshold
is never reached.
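The resulting F/I curve is easy to tabulate; the parameter values below are assumed round numbers:

import numpy as np

tau_m, r_m = 20e-3, 100e6         # s, Ohm; assumed
v_thr = 20e-3                     # threshold, 20 mV above rest

def rate(i_stim):
    # analytic F/I curve of the leaky integrate-and-fire neuron
    if i_stim * r_m <= v_thr:
        return 0.0                # below rheobase: threshold is never reached
    return 1.0 / (-tau_m * np.log(1 - v_thr / (i_stim * r_m)))

for i_stim in (0.1e-9, 0.3e-9, 0.5e-9, 1e-9):
    print(f"I = {i_stim * 1e9:.1f} nA  ->  f = {rate(i_stim):6.1f} Hz")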
A popular extension adds spike-frequency adaptation through a calcium-dependent K current,
where gKCa sets the strength of the feedback and [Ca] is the calcium concentration.
These extensions to the integrate-and-fire model are useful and easy to implement in simulations,
but they often make analytical treatment much harder. In the case of adaptation, the description
of the neuron becomes two-dimensional (both V and [Ca] are needed to specify the state of a
neuron) instead of one-dimensional (just V).
Figure 5.2: Effect of an exponentially decaying synaptic current on the membrane potential. Mem-
brane potential vs. time in ms. Parameters: τsyn = 2 ms. Upper curve: τm = 50 ms; lower curve:
τm = 10 ms.
Figure 5.3: Effect of inhibition. Left: a passive model. The membrane voltage as a function of
excitatory input in a non-spiking, passive model for three levels of inhibition, gi = 0.1, 1, and 3
(upper curve to lower curve). Parameters: Vrest = Vinh = −70, Vexc = 0, gl = 0.1. Right: how
perfect divisive and subtractive inhibition would look in theory. Right plot from (Holt, 1998).
C dV(t)/dt = −gl (V(t) − Vrest) − ge (V(t) − Vexc) − gi (V(t) − Vinh)
Figure 5.4: In a spiking model cell the effect of shunting inhibition can be more subtractive than
divisive. From (Holt, 1998).
where τm = Rm Cm. In code:

v += dt/tau * (-v + v_rest + i_total*r_m)   # forward-Euler step of the membrane equation
Next, we have to detect spiking. A spike will reset the potential but will also provide input to
other neurons receiving input from it, labelled m.
if (v > vthr){
v=vreset
forall m {
input_g(l,m) += synaptic_weight(l,m) % a simple update model for input to other cells
}
}
We also have to update (decay) the synaptic conductances. When the synaptic conductances are
modelled as exponential synapses, the corresponding differential equation for the conductance is
τsyn dg(t)/dt = −g(t).
Given that in a network there are usually many more synapses than neurons, the synaptic
calculations take the most time. If all synapses have the same time constant, a simple but
effective speed-up is to collect all incoming synaptic conductances into a single conductance. Then
one only needs to decay the sum of the conductances instead of the individual ones.
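Putting the pieces of this section together, a complete network loop might look like the sketch below. All numbers, the random connectivity, and the crude treatment of the synaptic current are illustrative assumptions; it uses the summed-conductance speed-up just described:

import numpy as np

rng = np.random.default_rng(3)
n_cells = 100
dt, t_end = 0.1e-3, 0.5                     # time step and duration (s)
tau_m, tau_syn, r_m = 20e-3, 2e-3, 100e6    # assumed constants
v_rest, v_thr, v_reset = -70e-3, -50e-3, -70e-3
i_drive = 0.21e-9                           # constant drive, just above rheobase

# w[l, m]: conductance step (S) that a spike of neuron l adds to neuron m
w = 0.1e-9 * (rng.random((n_cells, n_cells)) < 0.1)

v = np.full(n_cells, v_rest)
g = np.zeros(n_cells)           # one summed synaptic conductance per neuron
n_spikes = 0
for step in range(int(t_end / dt)):
    g -= dt / tau_syn * g                     # decay the summed conductances
    i_syn = g * (0.0 - v)                     # excitatory reversal at 0 mV
    v += dt / tau_m * (-(v - v_rest) + r_m * (i_drive + i_syn))
    spiked = v > v_thr
    v[spiked] = v_reset
    g += w[spiked].sum(axis=0)                # each spike updates its targets
    n_spikes += spiked.sum()

print("mean firing rate:", n_spikes / n_cells / t_end, "Hz")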
Chapter 6
Firing statistics and noise
6.1 Variability
In the previous chapters we have developed models for the firing of neurons. Now one can measure
the firing properties of neurons in vivo and see how they match the models, or lead to refinements
of the models. One of the most obvious aspects of neural activity not present in the models so far
is its large variability, so here we look at computational considerations regarding that.
Suppose one has found a cell which reacts to a certain visual feature, such as a white bar on a
black background, or maybe a face. When we present the same stimulus over and over again to
the neuron, its spike train shows quite a bit of variability, see Fig. 6.1. One common explanation
is that this variability is due to noise. Another possibility is that for every repetition the nervous
system perceives the stimulus differently, leading to a different response. Something like: "Hey, a
new face", "Here it is again", "And yet again", and finally "It's boring". A more formal way to
state this is to say that the internal state of the organism is important. This could potentially be
a serious concern. By checking whether firing rates and patterns remain similar throughout the
experiment, one can control for this. In early sensory cells, such as in the retina, one would not
expect such effects. Anaesthesia might also partly relieve this problem, although it introduces
other problems.
Alternatively, the neural code might not be well stimulus-locked, but could use very precise
timing between neurons. In that case, too, fluctuations might not necessarily correspond to noise.
This could easily happen in neurons further removed from the sensory system.
Although these possibilities deserve more study, both experimentally and theoretically, one usually interprets all variability as noise, and a certain portion of the variability certainly is noise. Therefore responses are commonly averaged over trials; binning the spikes produces the Post Stimulus Time Histogram (PSTH). The PSTH gives the average temporal response profile to the stimulus. Note that although signals might be obvious in the PSTH, the nervous system itself does not have access to the many trials and has to rely on other information, such as combining the responses from multiple neurons.
Figure 6.1: Example PSTH of a visual cortex neuron that is motion sensitive (area MT). The dots indicate the spikes from the neuron, each row corresponding to a different trial; the firing frequency averaged over trials is shown above. The three panels correspond to different spatio-temporal stimuli. The top stimulus is a constant motion signal, the lower two are fluctuating stimuli. These sorts of firing statistics are seen throughout the cortex. The figure also shows that despite fluctuations, the firing can be more (bottom) or less (top) stimulus locked. From (Dayan and Abbott, 2002) after (Bair and Koch, 1996).
⟨δt²⟩ = σt² = −⟨t⟩² + ∫ dt PISI(t) t²     (6.1)

CV = σt / ⟨t⟩
The simplest category of spike interval models are the so-called renewal processes. In a renewal process each inter-spike interval is 1) drawn from the same interval distribution and 2) independent of the other intervals. The interval distribution therefore completely determines the spike times. This rules out effects such as spike frequency adaptation, which is known to be present in biology. Despite this lack of realism, the description as a renewal process opens up many mathematical results and techniques. The spike time models in the next two sections are renewal models.
Intermezzo: as we are talking statistics anyway, the following information is useful when you are reading experimental papers. In experiments it is often important to see if a certain treatment changes
1 Especially for complicated problems, many mistakes are made in calculating the variance; use Eq. 6.1 and you will make no such mistakes.
some mean property of the cell. It is then important to know how reliable the estimate of the mean is. This quantity is called the standard error σM and is given by σM = σ/√N. By measuring enough data, the standard error can usually be made arbitrarily small, whereas σ itself tends to a fixed value when enough data are measured. The means from the two conditions can be compared using a t-test. The resulting p value indicates how likely it is that the effect occurred due to a random coincidence (testing a hypothesis). The lower the standard errors, the more significant the result; values below 5% (p < 0.05) are considered significant and publishable. Note that the lack of a significant result does not mean that there is no effect...
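As a hedged illustration of the standard error and t-test described above, the following sketch compares two made-up sets of firing rates using scipy's two-sample t-test; the rates and condition names are fabricated for the example only.

    import numpy as np
    from scipy import stats

    control = np.random.normal(10.0, 2.0, size=50)   # fake rates, condition 1
    treated = np.random.normal(11.0, 2.0, size=50)   # fake rates, condition 2

    sem = control.std(ddof=1) / np.sqrt(len(control))   # sigma_M = sigma / sqrt(N)
    t, p = stats.ttest_ind(control, treated)
    print(f"standard error {sem:.2f}, t={t:.2f}, p={p:.3f}")   # p < 0.05 -> 'significant'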
PPoisson(tisi) = (1/τ) exp(−tisi/τ)
In other words, the intervals are exponentially distributed. The mean interval is τ and the variance
is τ 2 . In the Poisson process the probability for an event is constant (there is no time-scale intrinsic
to the Poisson process). Therefore the autocorrelation function is a constant plus a δ function at
zero. As a result the power-spectrum is flat as well with a δ function at zero.
Of course, the spike rate of a neuron usually changes when a stimulus is presented/removed
or the stimulus fluctuates, as in Fig. 6.1. A rate-modulated Poisson process can be used to model
this. In that case at every instant the firing probability is calculated from a rate λ(t). This will
of course destroy the exponential distribution of inter-spike intervals.
The Poisson process is only a rough approximation of the firing of real neurons. Although reasonably accurate at low firing rates, the Poisson approximation goes wrong at higher rates. When the instantaneous rate is high, a Poisson model can fire twice in rapid succession, whereas a real neuron's membrane potential has to climb from reset to threshold. In addition, refractory mechanisms prevent a rapid succession of spikes in real neurons. As a result real neurons fire less fast and also more reliably (Berry and Meister, 1998). Nevertheless, the Poisson model is an easy benchmark for neural firing.
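A minimal sketch of generating such (rate-modulated) Poisson spike trains: in each small bin the firing probability is λ(t)·dt, valid for λ·dt ≪ 1. The bin size and the sinusoidal rate profile are assumptions chosen for illustration.

    import numpy as np

    dt = 0.001                                        # s
    t = np.arange(0.0, 2.0, dt)
    lam = 20.0 + 15.0 * np.sin(2*np.pi*2.0*t)         # time-varying rate in Hz (assumed)

    # in each bin the spike probability is lam(t)*dt
    spikes = np.random.rand(t.size) < lam * dt

    isi = np.diff(t[spikes])
    print("mean ISI:", isi.mean(), "CV:", isi.std() / isi.mean())

For a constant rate the CV comes out near 1, as expected for exponentially distributed intervals; rate modulation distorts the interval distribution, as noted above.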
Figure 6.2: Noisy integrate-and-fire neuron. Different sample paths from rest to threshold. For strong stimuli (right 2 traces) the slope is steep and the variations in the spike timing are small. For weak stimuli (left 2 traces) the slope is shallow and noise accumulates. The shape of the inter-spike interval histograms sketched on top is like those observed in physiology.
Figure 6.3: Cartoon of the Fokker-Planck approach. Left: Situation shortly after the neuron has fired. The Fokker-Planck equation collects all loss and gain terms to P(V) at each voltage (solid arrows loss terms, dashed arrows gain terms). Right: Later situation: the distribution has spread due to the noise, and its mean has shifted toward the threshold due to the stimulus current. When the probability reaches threshold, the neuron fires (open arrow).
By adding a noise current to the integrate-and-fire neuron, one obtains reasonably realistic firing statistics. Simplifying such that Vreset = Vrest = 0 and C = 1, the governing equation is

dV/dt = Istim + Inoise(t)
and of course, as before, the neuron spikes and resets when it reaches the threshold.
The problem is now to determine when the neuron will fire, given that V(t = 0) = 0. The noise term is easy to simulate: from trial to trial the neuron will take different paths to the threshold, Fig. 6.2. Analytical treatment is possible: the voltage obeys a diffusion equation, as we will see shortly (Gerstein and Mandelbrot, 1964). We assume that the noise term has zero mean, ⟨Inoise⟩ = 0, and that its variance is ⟨Inoise(t)Inoise(t′)⟩ = σ²δ(t − t′).3
The spike-timing can be calculated using a Fokker-Planck equation. Consider the distribution
of possible voltages P (V, t). The Fokker-Planck equation simply collects the contributions which
change this distribution, as is shown schematically in Fig. 6.3.
P(V, t + Δt) = P(V, t) + ∫ dV′ T(V′, V) P(V′, t) − P(V, t) ∫ dV′ T(V, V′)
3 This is so-called white noise; its power-spectrum is flat, i.e. it contains all possible frequencies.
where T(V′, V) is the transition probability for the voltage to jump from V′ to V. Consider a small time interval Δt; the voltage can change due to two effects: 1) due to the stimulus, V → V + Δt Istim; 2) due to the noise, V → V + noise. In other words, T(V′, V) is a Gaussian distribution with mean Istim Δt and variance σ²Δt.
If one Taylor expands P(V′, t) one obtains the Fokker-Planck equation

∂P(V, t)/∂t = −∂/∂V [A(V) P(V, t)] + (1/2) ∂²/∂V² [B(V) P(V, t)]

A(V) = ∫ dW W T(V, V + W)

B(V) = ∫ dW W² T(V, V + W)

where both integrals run over W from −∞ to ∞.
In this case we simply have A(V) = Istim and B(V) = σ². In general Fokker-Planck equations these so-called jump moments A(V) and B(V) can depend on V, but here they are constants.
The distribution of voltages obeys a diffusion equation, just like the cable equation:
∂P(V, t)/∂t = −Istim ∂P(V, t)/∂V + (1/2) σ² ∂²P(V, t)/∂V²
Suppose that at time 0 the neuron has just fired, so at time t = 0 the potential is distributed as P(V, 0) = δ(V). From the discussion of the cable equation, we know the solution in infinite space, namely

P∞,0(V, t) = 1/√(2πσ²t) exp(−V²/2σ²t)
The voltage moves around like a diffusing particle, Fig. 6.2.
When will the voltage reach the threshold? This is the problem of the meandering drunk near a ditch. There are two complications: First, there is a drift term, the term proportional to Istim (the drunk walks on a down-hill slope). Secondly, the boundary conditions are important. Once the threshold is reached the neuron fires, and that particular voltage trace needs to be discounted. This means that we have to impose an absorbing boundary at the threshold: P(VT, t) = 0. The neuron that spiked will re-enter the population at the rest voltage, but here we are interested only in the timing of the first spike. The probability turns out to be
P(V, t) = 1/√(2πσ²t) [exp(−V²/2σ²t) − exp(−(2VT − V)²/2σ²t)] exp(Istim V/σ² − Istim² t/2σ²)
(Check that P(VT, t) = 0.) Next, we need the firing rate of the neuron. This is given by the fraction of neurons that pass the threshold, Fig. 6.3. Fick's law relates the spatial gradient of the density to the probability current: PFire(t) = −(1/2) σ² ∂P/∂V |V=Vthr. Thus we get

PFire(t) = VT/(t√(2πσ²t)) exp(−(VT − Istim t)²/2σ²t)
This is called an inverse Gaussian distribution. Its moments are

⟨t⟩ = VT / Istim

⟨δt²⟩ = VT σ² / Istim³

so the coefficient of variation (CV) of the interspike interval is CV ≡ √⟨δt²⟩/⟨t⟩ = σ/√(VT Istim).
From this last equation we see that when the stimulus is strong, the firing frequency is high and the fluctuations in the interspike interval are small. When the stimulus is weak, the interspike interval has a wide distribution, Fig. 6.2. This is in reasonable agreement with cortical neurons (Gerstein and Mandelbrot, 1964).
The noise typically has other effects as well. The sharp firing threshold seen in vitro is smeared out, and the input-output relation becomes more or less threshold-linear (Anderson et al., 2000).
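The derivation above can be checked numerically. The following sketch (with assumed units and parameter values) simulates first-passage times of dV/dt = Istim + Inoise and compares their mean and variance to the inverse Gaussian moments derived above.

    import numpy as np

    dt, v_thr, i_stim, sigma = 0.01, 1.0, 0.5, 0.5   # assumed units
    n_trials = 5000
    times = np.empty(n_trials)

    for k in range(n_trials):
        v, t = 0.0, 0.0
        while v < v_thr:
            # Euler step; the white-noise term scales with sqrt(dt)
            v += i_stim * dt + sigma * np.sqrt(dt) * np.random.randn()
            t += dt
        times[k] = t

    print("mean:", times.mean(), "theory:", v_thr / i_stim)
    print("var:", times.var(), "theory:", v_thr * sigma**2 / i_stim**3)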
Figure 6.4: Left: Noisy integrate-and-fire neuron stimulated with a DC current (bottom panel); responses (top) are shown trial-by-trial. After a few spikes, the spikes are no longer stimulus locked. Right: Stimulation with a fluctuating current leads to much more precise spiking; the firing locks to the stimulus.
The Fano factor is the variance of the spike count in a window of length T divided by its mean. For a Poisson process, see Eq. 4.1, the mean count is T/τ, and the variance is T/τ as well, so the Fano factor is 1 for all T. (It is important to distinguish count statistics from interval statistics, especially for the Poisson process.) The behaviour for large T is revealing: most simple neuron models converge to a fixed value for long T, but real neurons often show a steady increase in the Fano factor at long times (seconds), Fig. 6.5. This could either mean that slow variations in, for instance, excitability occur, or that the firing has a fractal structure (Lowen et al., 2001). (Exercise: Try to create a cell model with a Fano factor as in Fig. 6.5.)
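A sketch of the count-statistics calculation: the Fano factor as a function of the counting window T, computed from a homogeneous Poisson train (it should stay near 1 for all T; the rate and duration are assumptions).

    import numpy as np

    dt, rate, duration = 0.001, 20.0, 1000.0                 # s, Hz, s (assumed)
    spikes = np.random.rand(int(duration / dt)) < rate * dt

    for T in [0.1, 1.0, 10.0]:
        bins = int(T / dt)
        counts = spikes[:(spikes.size // bins) * bins].reshape(-1, bins).sum(axis=1)
        print(f"T={T:5.1f}s  Fano={counts.var() / counts.mean():.2f}")

A model reproducing Fig. 6.5 would instead show a Fano factor that keeps rising with T, for instance by slowly modulating the rate across the recording.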
Figure 6.5: The Fano factor of the spike count for an experiment like in Fig. 6.1. The flat line
corresponds to a Poisson process. From (Buracas et al., 1998).
• Because there are so many synapses, the fluctuations in the input are averaged out. Therefore one would expect the post-synaptic neuron to fire very regularly. Note that this will be the case despite all the synaptic and channel noise sources we have encountered.
• A single input depolarises the cell by about 0.1 mV. Maybe some 20..200 inputs are required to fire the cell. The total excitation will cause the cell to fire at very high rates, even though it just receives background input.
The solution to the riddle is not fully clear yet, but the following factors are believed to be
important:
Two final remarks on this issue: 1) With some simple estimates the approximate behaviour of a neuron was calculated. The realisation that this is not consistent with the data is a nice example of neural computational thinking (Shadlen and Newsome, 1998; Holt et al., 1996).
2) It shows that it is not only necessary to have an accurate model of the cell at rest, but that
an effective model for the cell in vivo is as important.
More reading on statistical description of neurons: (Tuckwell, 1988)
Chapter 7
A visual processing task: retina and V1
In the following few lectures we discuss a typical laboratory task that a monkey might have to perform. We will follow the neural signal through the different brain areas and discuss the neural systems involved in performing this task. The monkey sits in a chair in front of a monitor. A set of pictures is flashed on the monitor. One of these pictures contains a banana. The monitor goes blank and after a delay a 'Go' command is shown. Now the monkey has to move a joystick to the point where the banana was. If the response is correct and within time, it will get a reward.
Such a simple task highlights the various subtasks the nervous system has to perform.
Figure 7.2: Left: Diagram of the retina and its different cell types. Note that the light passes through all cells before reaching the photoreceptors (rods and cones). Right: Drawing by Ramón y Cajal, the famous 19th century anatomist, who said of the retina: "I must not conceal the fact that in the study of this membrane I for the first time felt my faith in Darwinism weakened, being amazed and confounded by the supreme constructive ingenuity revealed not only in the retina and in the dioptric apparatus of the vertebrates but even in the meanest insect eye."
Figure 7.3: Adapting the neuron's response to one feature of the input statistics, namely the probability distribution of the pixel intensities. Left: a (mismatched) input-output relation (top) and the distribution of the input signal (bottom). Middle: Adaptation to the average signal. Right: Adaptation to the variance in the signal (contrast adaptation).
As we analyze this task in neural terms, we will encounter many properties of neurons and their
computation, but also many unknown aspects.
7.1 Retina
The description of the processing in our task starts with the retina. The retina is a thin layer of cells in the back of the eye-ball which converts the light into neural signals, Fig. 7.2. In the centre is the fovea, which is a high-resolution area. The resolution decreases with distance from the fovea. Except for the ganglion cells, which have as one of their tasks to send the signal to the thalamus, the neurons in the retina do not spike. Nevertheless, most retinal neurons do have voltage-gated channels, which help in the adaptation process.
The ganglion cells come in both ON and OFF types. The ON cell responds maximally to a white spot on a black background, whereas the OFF cell likes a dark spot on a white background. (This is an interesting symmetry, one that is not encountered often in the brain.) The second distinction
is between the cells in the M- and P-pathway. The P cells have a fine spatial resolution, are slow, and some are colour selective. The M cells have larger receptive fields and therefore a lower spatial resolution, but they are faster. The M and P pathways are processed more or less in parallel in the retina, LGN and cortex.
7.1.1 Adaptation
The retina works over a very large range of possible light intensities: a sunny day is about 10^10 times brighter than a starlit night. The dynamic range of the output (defined as the ratio between the strongest and weakest perceptible signal) is about 100 (assuming rate coding with firing rates ranging from a few to 200 Hz). So adaptation is required. (Opening and closing the pupil contributes only a small bit to the total adaptation.)
The principle of adaptation is shown in Fig. 7.3. Adaptation can be understood as changing the input/output relation of a cell such that the dynamic range of the cell is optimally used. Adaptation seems to be implemented at many levels in the retina, from the photoreceptors to the ganglion circuit. There are both global mechanisms (mediated through neuromodulators such as dopamine) and local mechanisms.
Contrast adaptation also occurs in the retina, but its mechanisms are less well known. Apart from biophysical mechanisms in the cells, adaptation can be achieved with a centre-surround circuit (a Mexican hat profile). The output of such a circuit subtracts the surround from the centre. It is easy to see that this circuit will not react to changes in mean light level; instead it is a local contrast detector. The centre-surround structure is ubiquitous in the nervous system, not only in the visual system, but also in the somatosensory and auditory systems.
Figure 7.4: Left: Receptive fields of retinal ganglion cells can be modelled by a difference of Gaussians. This yields bandpass spatial filtering. Right: Spatial transfer function of the retina at different mean light intensities (lowest curve: lowest light level). At low light levels the transfer function is low-pass; at higher light levels it becomes band-pass. From (Atick and Redlich, 1992).
However, when the signal is corrupted with photon noise, this filtering would be catastrophic, as the noise has a flat power-spectrum. Multiplying the signal plus noise with f²spat would cause most of the bandwidth to be taken up by noise. Instead there is a trade-off between whitening and admitting too much noise, Fig. 7.5.
Therefore at normal light levels the resulting spatial filter has a Mexican hat, or centre-surround, profile. Its centre collects signals from neighbouring cells, thus acting as a low-pass filter that reduces high-frequency noise. The inhibition partly removes low-frequency components in the input. The result is a bandpass filter. In contrast, at low light levels the optimal filter is purely low-pass, Fig. 7.4.
The idea of removing correlations (redundancies) in the visual signal before sending it on to the cortex dates back to Barlow (Barlow, 1958). Typical images are also redundant in time and colour-space (Buchsbaum and Gottschalk, 1983). Redundancy reduction is a compression algorithm, a bit like JPEG compression. This sort of information reduction is probably going on at many stages in the brain. Our retina sends some 10^6 axons, with maybe some 10 bits/sec each. Yet consciously we process much less information (estimates are about 100 bits/sec).
These properties of the input are so-called second order statistics, as the calculation of the correlation uses terms such as ⟨I(x)I(x + Δx)⟩. It is as yet unclear if similar principles can be used for higher level (smarter) stages of processing. In principle, higher order statistics could yield additional compression tricks. Natural images might have a 1/f²spat power-spectrum, but this does not fully characterise them, as we will see below. (The resolution of the eye is determined by lens scatter and the number of ganglion cells in the retina.)
Figure 7.5: A) Original image. B) Edge-enhanced version (high-pass filtered). C) With noise added. D) The same edge enhancement now does not gain anything. E) Low-pass filtering works better. Note: made with GIMP for illustration purposes only; the filters were not exact.
Figure 7.6: An artificially generated image with a roughly 1/f²spat power spectrum. Despite the correct statistics it does not look very natural. Higher order statistics are important.
Figure 7.7: Left: Simple cell model according to Hubel and Wiesel. Middle: Corresponding
feed-forward model of a complex cell. Right: Complex cell response.
That just having 1/f²spat statistics does not yield a natural looking image is shown in Fig. 7.6. This means that higher order statistics are important.
Figure 7.8: A) The ocular dominance columns in visual cortex. Left eye is dark, right eye is white (note the scale bar; comparable to your fingerprints!). B) The orientation selectivity has a columnar layout in the primary visual cortex. C) The primary visual cortex has a retinotopic map: nearby retinal cells project to nearby cortical cells. The fovea is over-represented compared to the periphery. D) The somatosensory cortex also has a regular mapping. From (Trappenberg, 2002).
Figure 7.9: Left: Principle of reverse correlation. A random stimulus is given and every time a spike is produced the preceding stimulus is recorded. The average gives the spike triggered average. Right: the spike triggered average thus constructed for simple cells in V1. From (Rieke et al., 1996) and from (Jones and Palmer, 1987).
If both stimuli cause the same response, the reverse correlate averages out.
A very new development is to use ’higher order’ techniques in this case (Touryan, Felsen, and
Dan, 2005).
Figure 7.10: Stimulus reconstruction. The lower graph shows the measured spike train and the
reconstruction. The other panels show higher order reconstructions. Fly motion sensitive neuron.
From (Rieke et al., 1996).
Chapter 8
Coding
In this chapter we look into the question of how stimuli and actions are coded in single neurons and networks, and we formalize some of the interpretations of the data encountered in the previous chapter. We present a few coding schemes and methods to test hypotheses about coding experimentally.
Figure 8.2: Population coding. Left: four cells with their tuning curves. The centres of the tuning curves are different, but they overlap. Middle: When a single bar is presented, all neurons respond, but with different amplitudes. Right: Hyper-acuity. The location of the bar can be estimated with an accuracy better than the distance between the neurons. The stimulus must have been located slightly to the right of the second neuron (because the third neuron is more active than the first neuron).
A somewhat unrelated characteristic of the data shown is that the relation between firing rate and force is not linear. Instead, the firing rate codes roughly the log of the force. As a result, errors in the force estimate will be proportional to the force itself (assuming additive, fixed noise). In other words, ΔF/F is constant; the percentage error, rather than the absolute error, is fixed. This is called Weber's law.
Figure 8.3: Population coding in the motor cortex. The neurons shown here code for a 2D arm movement. The actual movement directions of the arm are along the dashed lines. The population vector response is drawn with the thick arrows. There is a reasonable match between the actual response and the population vector. The line bundles give the responses of the individual neurons, the sum of which gives the population vector. From (Dayan and Abbott, 2002) after (Kalaska, Caminiti, and Georgopoulos, 1983).
can employ a population code.1 It is important to note that the resolution of our vision is not enhanced by hyper-acuity. Hyper-acuity does not improve our ability to see a very fine pattern; only the position estimate of isolated stimuli is improved. Otherwise blurring (incorrect glasses) would improve vision.
Population coding occurs in many, if not all, locations in the brain: for instance in primary visual cortex, in higher visual areas, in working memory, in hippocampal place cells, and in motor cortex, Fig. 8.3. An interesting question when using a population code is: how does one estimate the encoded signal given the population response? A decent (but not optimal) read-out is given by the population vector. In the population vector, every neuron votes with a weight proportional to its activity for its optimal stimulus. In the case of angles, the sum over all votes is best seen as a vector, Fig. 8.3. In experiments the population vector thus constructed gives a reasonable description of the actual angle, see Fig. 8.3. The population vector will not be a very good estimate when the neurons have a high background rate and narrow tuning curves. In that case a lot of nonsense votes are included. These votes average out when many trials are included, but worsen the trial-to-trial variations in the estimate, as we discuss below.
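A minimal sketch of the population-vector read-out for an angle, using the cosine tuning curves of the form introduced below; the number of neurons, tuning width, and noise level are assumptions chosen for illustration.

    import numpy as np

    n = 32
    theta_pref = np.linspace(-np.pi, np.pi, n, endpoint=False)   # preferred angles
    stimulus = 0.4                                               # true angle (rad)

    rates = np.exp(np.cos(stimulus - theta_pref) / 0.5**2)       # tuning-curve responses
    rates += 0.5 * np.random.randn(n)                            # trial-to-trial noise

    # each neuron votes for its preferred direction, weighted by its activity
    x = np.sum(rates * np.cos(theta_pref))
    y = np.sum(rates * np.sin(theta_pref))
    print("estimate:", np.arctan2(y, x), "true:", stimulus)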
If the neurons are noise-free, the accuracy of the estimate is in principle infinite. The question is therefore: given a certain noise level in the neurons, what is the highest accuracy one can achieve? Suppose we have a large population of N neurons that encode an angle θ. For convenience we suppose that they all have tuning curves which differ only in their centre: fi(θ) = A exp[cos(θ − θi)/w²].
The error in the angle estimate can be calculated using the Fisher information,2 which is given by

IF = −∫ dr P(r|θ) ∂²log P(r|θ)/∂θ² = ∫ dr P(r|θ) (∂log P(r|θ)/∂θ)²
The Cramer-Rao bound says that it is not possible to estimate the original stimulus with a variance less than the inverse Fisher information (for a so-called unbiased estimator3), that is

σθ² ≥ 1/IF
This inequality holds not only for angles, but for any encoded quantity. Suppose the noise on the neurons is Gaussian; the probability for a certain response ri from neuron i is

P(ri|θ) = 1/√(2πσ²) exp[−(ri − fi(θ))²/2σ²]

When the neurons are independent, the full probability is P(r|θ) = Π_{i=1..N} P(ri|θ). Hence
IF = −∫ dr1 dr2 ... drn P(r1|θ)P(r2|θ)...P(rn|θ) Σ_{i=1..n} ∂²[−(ri − fi(θ))²/2σ²]/∂θ²

   = −(1/σ²) Σ_{i=1..n} ∫ dri P(ri|θ) ( ri fi″(θ) − fi(θ)fi″(θ) − fi′²(θ) )

   = (1/σ²) Σ_i { fi′²(θ) [∫ dri P(ri|θ)] − fi″(θ) ∫ dri P(ri|θ)[ri − fi(θ)] }

   = (1/σ²) Σ_{i=1..n} fi′²(θ)
There are various things to note: the more noise (σ²), the less Fisher information; the more neurons, the more information. The information is proportional to the square of the derivative of the tuning function. This means that there is no information when the derivative is zero, i.e. when the stimulus is right in the centre of the neuron's receptive field (!) or when the neuron hardly fires at all. Instead, most information is provided by neurons in the flanks. This makes sense: if we move the stimulus slightly, the neurons which show a big change in response are the most informative. This does not mean we can simply remove the un-informative neurons, because with another stimulus these neurons might become important.
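A sketch of evaluating the final expression, IF = Σi fi′(θ)²/σ², for the tuning curves fi(θ) = A exp[cos(θ − θi)/w²] used above; the parameter values are illustrative assumptions.

    import numpy as np

    n, A, w, sigma = 50, 10.0, 1.0, 1.0
    theta_i = np.linspace(-np.pi, np.pi, n, endpoint=False)
    theta = 0.3                                    # stimulus angle

    # derivative of f_i(theta) = A exp[cos(theta - theta_i)/w^2]
    f_prime = -A * np.exp(np.cos(theta - theta_i) / w**2) \
              * np.sin(theta - theta_i) / w**2
    i_fisher = np.sum(f_prime**2) / sigma**2
    print("Cramer-Rao bound on sigma_theta:", 1.0 / np.sqrt(i_fisher))

Note how the neurons with θi near θ contribute nothing (f′ = 0 there), while the flank neurons dominate the sum, as argued above.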
The calculation can be extended to include correlations between the noise sources. Correlations in general reduce the accuracy, as fluctuations in one neuron will not average out against the fluctuations in another neuron. One basically finds that correlations with a range comparable to the tuning curve are the most detrimental; short-range and infinite-range correlations can be filtered out (Sompolinsky et al., 2002). Another interesting extension is to see what happens if the neural responses have Poisson noise instead of Gaussian noise (Snippe, 1996).
The next question is how to read out the code, such that one reaches the Cramer-Rao bound.
If the estimator is able to reach the Cramer-Rao bound one has done the best possible. Various
methods have been developed such as Bayesian estimates and maximum likelihood methods, see
(Dayan and Abbott, 2002) for a recent overview.
However, it is good to realise that, except at the motor output, the nervous system itself does not need to 'read out' the population code. Instead, it computes with it. How that differs from regular
2 The Fisher information is not an information measure in the strict sense (bits). It is related though, for details
3 Compare an estimator which always estimates the angle to be 0; of course the variance is in that case very low (but the estimate is useless).
computation is not fully clear. One proposed scheme is to use Radial Basis Functions (RBF) (Hertz, Krogh, and Palmer, 1991; Pouget and Sejnowski, 1997). One variant of this coding scheme is as follows: Suppose one wants to calculate a function f(A, B) of the variables A and B (A and B might for instance represent the head direction and the eye direction). A typical transformation is to calculate the sum of these angles. When quantities are population coded, even a simple sum is actually a non-linear operation. Using radial basis functions such a sum is calculated as follows: one first creates a 2-dimensional layer with an activity equal to the (outer) product of the population activities in A and B, i.e. f(x, y) = fA(x).fB(y). Next, a projection from this layer to an output layer implements the output function. Under quite general conditions this allows for the calculation of arbitrary functions.
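A sketch of this scheme under simplifying assumptions: two population codes are combined in an outer-product layer, and the sum of the encoded angles is read out. A real implementation would train the projection from the product layer to the output layer; here we simply locate the peak of the product layer instead.

    import numpy as np

    angles = np.linspace(-np.pi, np.pi, 40, endpoint=False)

    def pop(theta):                          # population code for one angle
        return np.exp(np.cos(theta - angles) / 0.3)

    f_a, f_b = pop(0.5), pop(-0.2)           # e.g. head direction and eye direction
    product_layer = np.outer(f_a, f_b)       # 2-D basis-function layer

    i, j = np.unravel_index(product_layer.argmax(), product_layer.shape)
    print("decoded sum:", angles[i] + angles[j])   # ~0.3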
When dealing with a spiketrain, the response will be a discretized spiketrain, such as r = (0, 1, 0, 0, 1). When we split the total time T into bins of length Δt, there are n = 2^(T/Δt) possible responses. If the response pattern is always the same, one p(r) is one while the others are zero. In that case the information is zero. Indeed, such a response would not be very informative. Instead, when all probabilities are equal (p(r) = 1/n), the information is maximal, H = log2(n). In that case the response itself is very informative: we could not have predicted the response with much certainty. Information is also called entropy. High entropy corresponds to having a rich response ensemble.
However, the richness in the response could just be noise. We want to know the information that the response provides about the stimulus; this is called the mutual information. We need to subtract the noise part. The noise part can be measured by repeating the stimulus over and over and measuring the fluctuations in the neural response (see our discussion of spike statistics for problems with stimulus repetition). We write the stimulus as s and the response as r. The noise entropy for a given s and its average are, respectively,4
Hnoise(s) = −Σ_r P(r|s) log2 P(r|s)

Hnoise = Σ_s Hnoise(s) P(s) = −Σ_{r,s} P(s)P(r|s) log2 P(r|s)
4 With P (r|s) we denote the probability for response r given a stimulus s.
Figure 8.4: Step in the calculation of the information in a spiketrain: counting 'words'. The probability of the words enters the information calculation. From (de Ruyter van Steveninck et al., 1997).
The mutual information is defined as the difference between total and noise entropy
Im = H − Hnoise
   = −Σ_r P(r) log2 P(r) + Σ_{r,s} P(s)P(r|s) log2 P(r|s)
   = Σ_{r,s} P(s)P(r|s) log2 [P(r|s)/P(r)]

where we used that P(r) = Σ_s P(s)P(r|s).
In practice the information calculation goes as follows: pick a neuron and give it a stimulus from a chosen ensemble. Measure the response and repeat many times. The output spike-train is written as a string of '1's and '0's. Response r is a possible 'word' in the response, such as r = (1, 0, 0, 1, 0, 1). Next, one can calculate the information, Fig. 8.4. (It is important to choose the right stimulus ensemble and time-discretization.) It is easy to understand that, given the many possible words, an accurate information calculation requires a lot of data.
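A toy sketch of this word-counting estimate: binary words are drawn for two stimuli, the word probabilities are estimated by counting, and the formulas above are applied. The stimulus ensemble, word length, and response probabilities are fabricated assumptions.

    import numpy as np
    from collections import Counter

    def entropy(counts):
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))

    rng = np.random.default_rng(0)
    # toy responses: 6-bit words whose bit probability depends on the stimulus s
    words_by_s = {s: [tuple(rng.random(6) < 0.2 + 0.3 * s) for _ in range(2000)]
                  for s in (0, 1)}

    h_noise = np.mean([entropy(Counter(w)) for w in words_by_s.values()])  # P(s) uniform
    h_total = entropy(Counter(words_by_s[0] + words_by_s[1]))
    print("I_m =", h_total - h_noise, "bits per word")

With only 2000 samples per stimulus the estimate is already biased for 6-bit words, which illustrates why accurate information calculations require a lot of data.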
Typical values for the mutual information are such that a spike carries 1-2 bits of information. Note that this is not because a spike is binary. When temporal precision is high, the information content can be much higher (see our example in the beginning of this section). When precision is low or the cell is not tuned to the stimulus, the information content can be virtually zero. See (Rieke et al., 1996).
5 There are very recent theoretical studies which challenge this view...
Figure 8.5: Left: A. Autocorrelation of two cells. B: The cross-correlation shows a peak, despite the neurons being millimetres apart. Right: Data from the same paper, showing that the correlation depends on the stimulus: disjunct stimuli do not cause synchrony. Left figure from (Dayan and Abbott, 2002) after (Engel, Konig, and Singer, 1991)(?). Right from (Engel, Konig, and Singer, 1991).
Chapter 9
Higher visual processing
In this chapter we discuss how the visual cortex further processes the image that is presented to the monkey. How does the monkey recognise the banana? The answer to this question is not yet known, but we discuss some interesting properties of the visual cortex.
Figure 9.1: Line drawings by Van Dyck. Top: originals. Bottom: inverted images. A lot of the subtlety is lost. If the cortex interpreted the image using only complex cell responses, it would be hard to explain such effects.
Figure 9.2: Attention increases the neuron’s response. Upper curves are with attention (the neuron
fires more), lower curves without attention. Left for area MST (a higher motion area) neurons,
right for V4 (higher visual area). From (Treue, 2001).
In a line rendering, junctions of edges are of special importance. Junctions tell you which object is in front of which. In contrast, straight or slightly curved lines can be interpolated and are therefore somewhat less informative.
Figure 9.3: Contextual effects in V1. Left: non-linear interaction between a centre stimulus and its context. The response to centre and co-linear segment together is much stronger than to either part alone. Right: The interactions in the vertical direction of the grey scale plots are excitatory; in the percept they lead to line completion. The interactions in the horizontal direction are inhibitory (parallel lines). Perceptually, such interactions can cause pop-out of connected structures (B and C). From (Kapadia, Westheimer, and Gilbert, 2000).
In the early visual system the receptive fields are simple, and in the higher visual areas they become more complex. This suggests a hierarchical stack of processing layers (perhaps some 10 stages) in which the image is processed more and more until in the end a grandmother cell pops up. Visual images are processed very quickly. Even complicated natural scenes are processed fast: categorisation of whether an image contains an animal or not is done in some 150 ms (Thorpe, Fize, and Marlot, 1996). Of course, we knew this result approximately from daily life, but this figure stresses the computational demands of visual processing.
The anatomy indicates that there is no strict hierarchy in the processing stages; instead many cross and feedback connections exist. On the other hand, man-made simple object recognition models often operate exclusively in a feed-forward mode (Fukushima, 1980; Riesenhuber and Poggio, 1999). There is in principle no operation that a recurrent net could do that a feed-forward net could not do (given an arbitrary set of time delays). Therefore the question is perhaps: 'how do recurrent connections make the network more efficient?'
Figure 9.4: Top: Schematic of the different visual areas. Bottom: activity as it sweeps through
the visual system. Both from (Lamme and Roelfsema, 2000).
• k = 1: grandmother cell. This has a low coding capacity; there are only N possible outputs (for binary units)1. Furthermore, it would be hard to generalize between stimuli, as each stimulus gives a completely different response. However, as an advantage, this code can represent N stimuli at once and it is relatively fault-insensitive.
• 1 ≪ k ≪ N: sparse code. This has a decent capacity, N!/(k!(N−k)!). It can represent a few stimuli at once, and it can help with generalization (depending on the details of the representation).
1 For rate coding units the number of possible outputs per cell depends on the signal-to-noise ratio: (ratemax − ratemin)/σrate.
Figure 9.5: Neural responses to complex objects in higher visual areas. Left: single cell recordings,
right: optical imaging (scale bar 1mm). Left from (Tanaka, 1996), right from (Tsunoda et al.,
2001). (The right picture is hard to see in gray-scale. Consult pdf-file or original paper).
• k ≈ 0.5N: dense code (e.g. a binary computer). This has the highest capacity, namely 2^N (for binary units). However, it can represent only one stimulus at a time. Furthermore it can be hard to extract the information, as every bit counts. For the same reason it is hard to combine with learning. In this code a single bit error can give strange results, and finally, it consumes more energy than sparse codes.
Note that we are mixing two quantities: how many stimuli activate a certain neuron, and how many neurons a single stimulus activates. On average both these sparseness measures are the same.
In experiments, sampled across many images, neurons commonly have an exponential distribution of firing rates, i.e. few are very active, and most have no or low activity when stimulated by an arbitrary stimulus. This means that the coding is sparse. One way to define sparseness is

a = ⟨r⟩² / ⟨r²⟩ = (Σ_μ r^μ / M)² / (Σ_μ (r^μ)² / M)

where the average ⟨·⟩ is taken over a set of M relevant stimuli. When the code does not have any variation a ≈ 1, while computers have a = 1/2. In monkey visual cortex a ≈ 0.3 and the firing rates follow roughly an exponential distribution (Rolls and Deco, 2002). Of course, the choice
of stimulus ensemble is crucial here. In these experiments the code was also independent: each neuron contributes a comparable number of bits. There was not much redundancy, i.e. each extra neuron helped to further identify the face.
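A quick sketch of the sparseness measure a = ⟨r⟩²/⟨r²⟩ applied to two toy rate distributions; the distributions themselves are illustrative assumptions. The dense binary code gives a = 1/2, as stated above.

    import numpy as np

    r_dense = (np.random.rand(10000) < 0.5).astype(float)               # dense binary code
    r_sparse = np.random.exponential(5.0, 10000) * (np.random.rand(10000) < 0.1)

    for name, r in [("dense", r_dense), ("sparse", r_sparse)]:
        print(name, "a =", round(r.mean()**2 / (r**2).mean(), 2))       # ~0.5 vs ~0.05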
This representation of sensory information can be compared to a dictionary in our written language. There is a trade-off in language between having a large dictionary of very precise words, where a single word can say it all, and a smaller dictionary with less precise words, in which more words are needed. In this analogy the receptive fields of the neurons are the words in the dictionary, and the stimuli (sentences or stories) are represented by combinations of neurons (words). (Also compare to our discussion of hyper-acuity.)
Another reason why sparse representations are used might be the minimisation of energy consumption. The brain demands some 20% of the body's total oxygen and energy at rest (about 20 watts). Some models include this constraint to help explain the observed representations (Stemmler and Koch, 1999).
So the consensus is that the coding in the brain is sparse and distributed. The fact that the code is sparse and distributed still leaves many possibilities (check for yourself). It would be possible for the neurons to have some random, yet sparse code. This appears not to be the case:
1) for each stimulus, the neurons represent some stimulus feature consistently (although this is harder to verify in higher areas);
2) nearby neurons have similar tuning properties;
3) the information in the neurons is not redundant (see above), because the responses are independent and partly overlapping.
• Intracellular recording in vivo. This in principle allows one to measure both the excitatory and inhibitory inputs to a certain cell. However, technical problems, such as a bad space-clamp, make this difficult.
• Multi-unit extracellular recording. In principle cross-correlations between neurons can be used to figure out connections. However, without more information, correlations are hard to interpret. Suppose two cells are found to have a strong correlation. Is this because they receive common input, or does one provide input to the other? In addition, non-stationarity can be a problem and can cause spurious correlations.
9.4 Plasticity
The visual cortex is remarkably plastic and has an enormous storage capacity. After passively watching thousands of images, people can tell, when tested later, which ones they have seen and which ones they have not. All this information must be stored somewhere, most likely in visual cortex.
Indeed, the response of the neurons changes as monkeys become familiar with images, even when the monkeys have to do nothing but passively view them (Erickson, Jagadeesh, and Desimone, 2000). In another experiment monkeys were trained to do image pairing tasks. It was found that cells pick up the pairing between the images rather than similarities in the shapes of the images (Sakai and Miyashita, 1991). These experiments show that the representation of objects is in a constant state of flux. It is surprising that the changes can be large, yet do not lead to degradation of older memories.
More reading: (Rolls and Deco, 2002)
Chapter 10
General approach to networks
After concentrating on single neurons, an obvious question is how a network consisting of many neurons will behave. This is where the fun starts. The final dynamics will of course to a large extent depend on the connections between the neurons. Here we study very simple connectivity patterns, which in some cases already give very rich and interesting behaviour. We will model the neurons with I&F neurons or similar simplified models.
τsyn dv(t)/dt = −v + [W.u]+     (10.1)

where [x]+ = x if x > 0, [x]+ = 0 otherwise. Eq. (10.1) is called the rate approximation.
• The firing rate sums the excitatory inputs, subtracts the inhibitory inputs and rectifies the result. This is not exact, but seems reasonable: above we have seen that inhibition can indeed act in a subtractive manner, and that the F/I curve can be more or less linear.
• The rectification can form the basis for doing computations. Without the rectification, even stacks of layers could only do linear computations. In that case the transformation between two layers can be written as a matrix, and the computation of three layers in series is the product of the two matrices, which is again a matrix. (This is the famous argument of Minsky and Papert against perceptrons.)
Figure 10.1: A) A simple two-layer feed-forward network, input layer u is connected with a weight
matrix W to output layer v. B) A two-layer network with lateral connections that feedback the
output activity to other neurons in the layer.
Figure 10.2: Left: A single neuron with a recurrent connection of strength w. Middle: the rate as a function of the input, and the line r = in/w. The only solution is where the curves intersect. Right: For stronger w there are multiple solutions; the middle one is unstable, the two outer ones are stable.
But with the non-linearity we can easily create a network that solves the XOR problem. (Try it!) More generally, one can also have τsyn dv(t)/dt = −v + g(W.u), where g is a monotonic function. Unfortunately, analysis is not so straightforward once non-linearities are included.
• Unlike more abstract neural nets, the dynamics of Eq. (10.1) do not saturate, but saturation can be built in easily, for instance by replacing [.]+ with min(1, [.]+) in Eq. (10.1). One does expect some mechanism to be present in biology to prevent too high levels of activation.
• The time-constant of the dynamics is approximately the synaptic time-constant (assuming equal excitatory and inhibitory time-constants and no extra delays). This is fast. The synaptic time-constant will be the time-constant of the computation. This means our brain has some 300 clock cycles per second (τAMPA ≈ 3 ms). When one simulates integrate-and-fire neurons, one finds that there are some corrections which slow down the dynamics, but these are usually smaller effects, see (Gerstner, 2000; Brunel, 2000; Dayan and Abbott, 2002) for details. The answer critically depends on the distribution of membrane potentials when the stimulus arrives: when the membrane potentials are near rest, it takes longer to fire than when they are close to threshold.
• The dynamics are assumed to be very simple: there is no adaptation or synaptic depression. All synapses have the same time-constant and are linear.
The rate approximation is often used in artificial neural networks, perceptrons and cognitive models. In some respects the rate approximation seems a decent description of cortical activity, especially when the firing rate describes not a single neuron but a group of neurons firing independently (without synchrony). This could describe a column.
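A minimal sketch of simulating the rate approximation of Eq. (10.1) by Euler integration; the layer sizes and weights are illustrative assumptions.

    import numpy as np

    tau, dt = 3.0, 0.1                   # ms; tau ~ tau_AMPA
    u = np.array([1.0, 0.5])             # input layer rates
    W = np.array([[0.8, -0.4],           # mixed excitatory/inhibitory weights
                  [0.6,  0.2]])
    v = np.zeros(2)

    for _ in range(500):
        drive = np.maximum(W @ u, 0.0)   # rectification [.]_+
        v += dt / tau * (-v + drive)
    print(v)                             # converges to the rectified drive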
Figure 10.3: Stability of two connected nodes. Left (w12 = w21 = −0.8) Stable fixed point at
(0.59, 0.59). Right (w12 = w21 = −2). Unstable fixed point at (1/3, 1/3).
When w is even larger, the only fixed point is near r = 1. You can see this by imagining an
even shallower line in Fig. 10.2.
τ du1/dt = −u1 + [w12 u2 + in1]+

τ du2/dt = −u2 + [w21 u1 + in2]+     (10.2)
where [x]+ is a rectification with [x]+ = x if x > 0 and [x]+ = 0 otherwise. This describes two neurons that provide input to each other. If w12 > 0 (w12 < 0) then neuron 2 has an excitatory (inhibitory) influence on neuron 1. Apart from that they receive external inputs (in1, in2), which are assumed to be constant.
Let's assume (to be confirmed post hoc) that w12 u2 + in1 and w21 u1 + in2 are much larger than 0, so that we can ignore the rectification. In that case we have

τ du/dt = ( −1  w12 ; w21  −1 ) u(t) + in = W.u(t) + in
Let's find the fixed points, that is, the u for which τ du/dt = 0, i.e. ufp = −W⁻¹.in. In Fig. 10.3 these fixed points are right in the middle of the plot (indicated with a cross and a circle).
Next, we perform a stability analysis to see if these fixed points are stable. To this end we look at what happens if we perturb the system away from the fixed point, i.e. u = ufp + δu. Now τ du/dt = W.(ufp + δu) + in = W.δu, where δu is a small vector. The only thing we need to know is whether such a perturbation grows or shrinks over time. An easy way is to perturb in the direction of the eigenvectors of W. Along eigenvector i the perturbation behaves as τ dsi/dt = λi si, and it will therefore develop as si(t) = c exp(λi t/τ). The sign of λi determines whether the system runs away from the fixed point or returns to it. We can distinguish a few possibilities.
• λ1,2 < 0. The dynamics are stable; the system converges to the fixed point. This is illustrated in Fig. 10.3 left, which shows the system's evolution. We simulated Eq. 10.2 and followed the system over time; different sets of initial conditions were taken, all along the edges of the graph.
Figure 10.4: Hopfield net with 8 nodes. The arrows denote the (symmetric) weights between them.
• λ1 > 0, λ2 < 0. Saddle point. Although the dynamics are stable in one direction, in the other direction they are unstable. Therefore the fixed point as a whole is unstable. This is illustrated in Fig. 10.3 right. Along the diagonal the dynamics move towards the fixed point (1/3, 1/3), but then bend off towards (0, in2) or (in1, 0). The intuition is that in this case, because the inhibition is stronger, the nodes strongly inhibit each other and there can only be one winner.
• λ1,2 > 0. The dynamics are unstable. This means that a minuscule fluctuation will drive the solution further and further from the equilibrium. Like in the previous case, the solution will grow, either to infinity or until the linear approximation breaks down.
• If the eigenvalues are complex the system will oscillate. Remember: e^(x+iy) = e^x [cos(y) + i sin(y)]. Stability is determined by the real part of the eigenvalue, Re(λ). When the real part is < 0 the oscillations die out; otherwise they get stronger over time.
The strength of the above technique is that it can also be applied when the equations are non-linear. In that case the fixed points usually have to be determined numerically, but around a fixed point one can make a Taylor expansion, so that for small perturbations τ du/dt ≈ W.δu, and one can study the eigenvalues of W again.1
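The eigenvalue analysis is easy to carry out numerically. A sketch for the two-node network above (with w12 = w21 = −2, the saddle of Fig. 10.3 right):

    import numpy as np

    w12 = w21 = -2.0
    W = np.array([[-1.0,  w12],
                  [ w21, -1.0]])
    eigvals = np.linalg.eigvals(W)
    print("eigenvalues:", eigvals)        # here: 1 and -3 -> saddle point
    print("stable" if np.all(eigvals.real < 0) else "unstable")

With w12 = w21 = −0.8 instead, both eigenvalues are negative and the fixed point is stable, matching Fig. 10.3 left.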
In Fig. 10.3 left, the system will always go to the same fixed point. The basin of attraction in this case encompasses all possible initial conditions. In Fig. 10.3 right we have two basins of attraction: starting above the line in1 = in2 the system will go to the upper left fixed point; starting below the line the system will go to the lower right fixed point.
Figure 10.5:
Left: Whenever the network starts in a state close enough to an attractor, it will 'fall into the hole' and reach the attractor state.
Middle: Multiple attractors are present in the network, each with their own basin of attraction. Each corresponds to a different memory.
Right: Pattern completion in a Hopfield network. The network is trained on the rightmost images. Each time the leftmost (distorted) input is given, the network evolves via the intermediate state to the stored state. These different images can all be stored in the same network. From Hertz.
The Hopfield network can store multiple binary patterns with a simple learning rule. The weight between nodes i and j should be set according to the rule wij = Σ_μ p_i^μ p_j^μ, where p_i^μ is a binary bit (±1) representing entry i of pattern μ.
Each stored pattern will correspond to a stable fixed point, or attractor state, see Fig. 10.5. However, the storage capacity of the network is limited. As we increase the number of stored patterns, the memory states become unstable and stable mixed states appear. At the critical amount of storage, performance decreases suddenly. The network will still equilibrate, but it will end up in spurious attractor states rather than in the memory states it was supposed to find. The number of patterns we can store is proportional to the number of nodes, nstored = αn, where α is called the capacity. Simulation and statistical physics give nstored = αc n = 0.138 n. Hence a network with 100 nodes can store about 14 patterns. Many more details can be found in (Hertz, Krogh, and Palmer, 1991).
This is called an auto-associative memory: presenting a partial stimulus leads to recall of the full memory, see Fig. 10.5. Auto-associative memories are very different from computer memory (a bit like Google ...).
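A minimal sketch of a Hopfield network: Hebbian storage with the rule above and asynchronous recall from a corrupted pattern. The network size and number of patterns are illustrative (well below the 0.138 n capacity).

    import numpy as np

    rng = np.random.default_rng(1)
    n, n_pat = 100, 5
    patterns = rng.choice([-1, 1], size=(n_pat, n))
    W = patterns.T @ patterns / n                  # Hebbian rule w_ij = sum_mu p_i p_j
    np.fill_diagonal(W, 0)

    state = patterns[0].copy()
    state[:20] *= -1                               # corrupt 20 of the 100 bits
    for _ in range(5):                             # a few asynchronous update sweeps
        for i in rng.permutation(n):
            state[i] = 1 if W[i] @ state >= 0 else -1
    print("overlap with stored pattern:", (state @ patterns[0]) / n)   # ~1.0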
Figure 10.6: Recurrent model of simple cells in V1. Left: input to the network, which can be very
weakly tuned. Middle: The network output is strongly tuned. Right: experimental data from V1.
Figure from (Dayan and Abbott, 2002).
The model describes a population of simple cells with preferred angles θ = −π/2...π/2. Rather than specifying u and W, the network receives an input h (i.e. h = W.u), which is parameterised as

h(θ) = Ac[1 + ε(−1 + cos(2θ))]

where A describes the amplitude, c the stimulus contrast, and ε the amount of tuning of the input. We label the neurons v with the angular variable θ. The lateral interaction is described with two parameters: λ0, which describes uniform inhibition, and λ1, which gives the strength of tuned excitation. So the full equation is (the sum can be written as an integral):

τ dv(θ)/dt = −v(θ) + [ h(θ) + ∫ (dθ′/π) {−λ0 + λ1 cos(2θ − 2θ′)} v(θ′) ]+

where the integral runs over θ′ from −π/2 to π/2.
If the network were linear, the output of the network would depend smoothly on the input parameters. However, in this model, because of the rectification, excitation is not always counteracted by the uniform inhibition: a silent neuron cannot inhibit... The effect is that when the input is weakly tuned (or not tuned at all!), the output of the network can be much sharper, amplifying the differences in the input, Fig. 10.6.
Importantly, the width of the tuning curves becomes almost independent of the input 'sharpness'. In this way the recurrent model explains the iceberg effect.2
2 Note, that the firing threshold in this model is zero, unlike in the discussion of the iceberg effect above, section
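A sketch of simulating this ring model; the parameter values are assumptions in the spirit of (Dayan and Abbott, 2002), not necessarily the ones used for the figure.

    import numpy as np

    n = 64
    theta = np.linspace(-np.pi/2, np.pi/2, n, endpoint=False)
    A, c, eps = 1.0, 1.0, 0.1                   # weakly tuned input (assumed)
    lam0, lam1 = 7.3, 11.0                      # uniform inhibition, tuned excitation
    h = A * c * (1 + eps * (-1 + np.cos(2*theta)))

    v, dt, tau = np.zeros(n), 0.1, 5.0
    for _ in range(2000):
        # discretized integral over theta' with the 1/pi measure (spacing pi/n)
        recur = (-lam0 + lam1*np.cos(2*(theta[:, None] - theta[None, :]))) @ v / n
        v += dt/tau * (-v + np.maximum(h + recur, 0.0))
    print(v.round(2))   # the output is much more sharply tuned than h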
Chapter 11
Spiking neurons
Above we dealt with a rate approximation of the network. What happens when one uses spiking neurons? Suppose we have a population of unconnected I&F neurons and we present some external stimulus to all neurons in the population. How does the firing change? This is not a trivial question, especially not when noise is present (Gerstner, 2000). The response will depend on the different states the neurons in the population are in, that is, the distribution of the membrane potentials. If the neurons all have the same potential, they will all fire simultaneously. But right after that all neurons are in the refractory period, and the population will show much less activity. This can cause damped oscillations in the firing rate, Fig. 11.1. Note that in this figure the neurons have slightly different initial conditions; otherwise they would all spike at precisely the same time. We can also see that the neurons can follow transients without any sluggishness, despite the presence of a slow membrane time-constant (here taken to be 20 ms). Compare to Fig. 6.4.
Figure 11.1: Response of spiking neuron simulation to a step current in the input (note, not
synaptic input). Spiking neurons can follow transients rapidly. From (Gerstner, 2000)
Figure 11.2: Propagation of a current stimulus through a layered feed-forward network. a) Archi-
tecture of the network: 10 layers with 20 integrate-and-fire neurons per layer (5 neurons shown),
all-to-all connectivity. b) Syn-fire behaviour: without noise the neurons strongly synchronise. c)
But if the neurons are noisy, firing rates can propagate rapidly through layered networks without
much distortion. From (van Rossum, Turrigiano, and Nelson, 2002).
The speed of the propagation helps to explain why one can do such fast processing of, for instance,
visual stimuli. The tendency to synchronise is counteracted by the noise and by the synapses which
act as a filter between the layers, removing sharp transients.
It might be possible that in vivo a subset of neurons propagates information using a syn-fire
mode, whereas the population as a whole appears to be asynchronous. This could have interesting
computational effects. The existence of such situations would be very hard to verify as one has to
measure the possibly very small subset of neurons.
Figure 11.4: Working memory in pre-frontal cortex. A stimulus was presented (onset and offset marked by the two leftmost vertical bars). The monkey had to remember this location for two seconds. After this the monkey had to make an eye-movement (final bar). The neurons have (in this case) a spatial tuning function. From (Funahashi, Bruce, and Goldman-Rakic, 1998).
Because NMDA has a long synaptic time-constant, using a lot of NMDA will help to stabilise the network. See (Compte et al., 2000) for a recent model of a working memory network.
As is, the task the monkey performs could in principle be done by an advanced computer vision system, if we programmed it correctly. One of the interesting issues is how the monkey learns to pay attention to the banana and remember its location. If we changed the task to reward only movements to red objects, the monkey could learn this, and the cells in the working memory would encode different information. How this interaction between reward, attention, and the dynamic change in pathways works is not known. It is good to look back at Chapter 1, Fig. 1.2 for examples of pre-frontal damage.
The working memory network shown in Fig. 11.4 also has an attractor, but in that network an arbitrary stimulus angle can be encoded. The memory state is therefore a continuous or line attractor. Unlike with point attractors, the network state can move between the attractor states without much effort (like a ball in a Mexican hat). This means that the memory state will only be stable for a few minutes; the memory will start to drift away, both in model and data. Continuous attractor networks are thought to be important for more mundane tasks as well, such as maintaining eye position and remembering body position.
Chapter 12
Making decisions
In many instances a task involves making a decision: an eye-movement, a bar press, or a report to the researcher. In real life, too, decisions need to be made; our monkey, for example, has to decide where to move the joystick. It is therefore important to know how the decision is made and how errors in the decision relate to the underlying quantities.
The simplest case is a decision with noise: a yes/no decision between Gaussian distributed quantities with equal standard deviation. The errors in the decision are called false positives (saying yes when the answer was no) and false negatives (saying no when the answer was yes). The error rates will depend on the width of the distributions and their distance. By changing the threshold, the trade-off between false positives and false negatives can be changed. This way one creates the so-called receiver operating characteristic, or ROC curve.
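A sketch of tracing out such an ROC curve for two equal-width Gaussians by sweeping the decision threshold; the means, width, and sample counts are assumptions.

    import numpy as np

    mu_no, mu_yes, sigma = 0.0, 1.0, 1.0
    noise = np.random.normal(mu_no, sigma, 10000)
    signal = np.random.normal(mu_yes, sigma, 10000)

    for thr in np.linspace(-2, 3, 6):
        alpha = np.mean(noise > thr)       # false positive rate
        beta = np.mean(signal <= thr)      # false negative rate
        print(f"thr={thr:+.1f}  FP={alpha:.2f}  FN={beta:.2f}")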
The variability in the quantities can either come from the stimulus (for instance when the stimulus is very dim and photon noise matters) or from the nervous system itself (for instance when you try to decide whether to go left or right at a forgotten road). Many stimuli have been developed in which the error rate can be systematically varied; for instance, one can make the stimulus very dim.
Another interesting stimulus is the random moving dots stimulus, Fig. 12.2. In area MT of the monkey there are neurons which respond to the (local) motion in their receptive field. For instance, the stimulus in the right panel can be a very effective stimulus for such a neuron when the motion is in the direction of the neuron's preferred motion, whereas the stimulus in the left panel causes a much less vigorous response. By changing the coherence of the stimulus, the difficulty
Figure 12.1: Left: principle of deciding between two noisy quantities: two distributions $P(x)$ along the decision variable $x$, whose overlap gives the false negatives and false positives. Right: ROC curve of the random moving dots stimulus, described below (the numbers indicate the coherence of the stimulus). The curve plots the hit rate $1-\beta$ against the false positive rate $\alpha$ ($\beta$ is the false negative rate).
Figure 12.2: Moving random dot stimulus. Left: no net motion is apparent. Right: the net motion is apparent. In the middle panel the nervous system is more challenged.
of the task can be changed, and the neural response varies accordingly. By recording from the neuron while at the same time measuring the actual performance, it was found that the performance of a single (!) neuron already matched the behaviour of the monkey (Newsome, Britten, and Movshon, 1989; Britten et al., 1992). (Note that in these studies it is interesting to study both correct and false responses.)
How do a certain stimulus and the corresponding firing of a neuron lead to a decision? Neurons in the higher brain regions seem to accumulate evidence until they reach a certain threshold, Fig. 12.3. In psychology, too, there are models for the integration of evidence (Ratcliff and Rouder, 1998). How do noise, error rate, and reaction time relate? Integration of a signal in the presence of noise is something we have encountered before: the noisy integrate-and-fire neuron. Indeed, such models give reasonable fits to the distribution of response times. This is one of the few cases where human psychology and neuroscience have become closely entangled.
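A minimal accumulator of this kind, essentially the noisy integrate-and-fire neuron re-used as a decision maker, can be sketched as follows (drift, noise, and threshold values are invented for illustration). The hitting times give the skewed, long-tailed response-time distributions seen in the data.

    import numpy as np

    rng = np.random.default_rng(1)
    dt, drift, noise, threshold = 1e-3, 1.0, 1.0, 1.0   # s, evidence/s, noise strength, bound

    def reaction_time():
        """Integrate noisy evidence until the decision threshold is crossed."""
        x, t = 0.0, 0.0
        while x < threshold:
            x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        return t

    rts = np.array([reaction_time() for _ in range(2000)])
    print(f"mean RT {1e3 * rts.mean():.0f} ms, median {1e3 * np.median(rts):.0f} ms, "
          f"95th percentile {1e3 * np.percentile(rts, 95):.0f} ms")   # long right tail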
Making decisions is also closely related to reward. If a decision leads to punishment, the monkey had better try something else next time. Recommended reading on the relation to reward is (Schall, Stuphorn, and Brown, 2002) and the rest of the special issue of Neuron, Oct. 2002.
The activity of the decision neurons (if there are such neurons) reflects not only the integration of evidence but also the size and probability of the rewards (Platt and Glimcher, 1999). This means that when multiple alternatives for action are available, the neurons can directly compete with each other and the one with the highest (expected) reward will win, without the need for a deciding arbiter.
It seems reasonable to assume that decision making involves a close to optimal weighting of the evidence. Exactly how this is done is not clear. Many computational challenges remain: how to make good integrators from neurons, how to implement the learning, and how the threshold is implemented are but a few.
It is also remarkable that the simplest rate decoding mechanism gives good performance, so a rate coding scheme accounts for a major part of the information. One could argue that in motor cortex there is no place for more complicated coding schemes, as the output is sent fairly directly to the muscles. At least it seems we are out of the woods. These experiments have potentially important clinical applications: paralysed patients could control artificial limbs with brain signals through chronically implanted electrodes.
Figure 12.3: Left: climbing of a neural response to threshold. This neuron (in the so-called frontal eye field) initiates an eye-movement. The time at which the neuron reaches threshold has a fixed latency to the eye-movement, but not to the stimulus onset. This shows that the reaching of threshold is strongly coupled to making the movement. From (Schall, Stuphorn, and Brown, 2002). Right: from (Platt and Glimcher, 1999).
Figure 12.4: Predicting 1D hand movement from neural recordings in the motor cortex. Even a simple linear model predicts the movement fairly well. (The 'observed' curve in c,d is the one with the flat regions.) From (Wessberg et al., 2000).
Chapter 13

Hebbian learning: rate based
One of the characteristics of the nervous system is that it is not a fixed structure, but is in a constant state of change. This plasticity takes place at many different levels and timescales:
1. Development: young brains develop, partly by themselves, partly helped by sensory signals. This development often goes one way, and damage can only partly be repaired.
2. Throughout life memories are stored, new procedures are invented and stored, and new motor tasks are learnt. One assumes that such memories are stored in the synaptic weights.
3. Finally, on short time scales (milliseconds to minutes) neurons adapt, especially in the sensory systems. This is generally done by biophysical mechanisms and feedback loops.
All three adaptation mechanisms are hallmarks of neural computation. Here we focus on the second form of plasticity, long-term synaptic plasticity. It is generally believed that there is not enough genetic information to specify all $10^{14}$ synaptic connections and their strengths; instead, some limited set of rules might be at work. The most famous rule for learning is Hebbian learning. In 1949 Hebb, who was mainly thinking of reverberating loops, stated that:
“When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
This remained an unproven theory for a long time, and changes in neural excitability and genes were also considered as possible memory storage sites. But evidence for Hebbian learning has now been found in many places in the brain. What remains less clear is whether Hebbian learning, as we shall shortly describe it, explains all forms of learning. Neuro-psychological case studies show that different memory systems exist. The different situations briefly mentioned above under 2) are all processed in different 'memory systems'. It is not clear if all use the same Hebbian mechanism. In human memory one distinguishes:
Implicit memory Examples: motor learning, pattern completion. These tasks can be learned by amnesiacs: their performance improves, but they will not remember learning it. Implicit memory has different modalities, like explicit memory has (motor, visual, etc.).
Explicit memory The opposite of implicit memory; this is the more intuitive type of memory. It can be divided into two sorts.
Semantic memory: factual knowledge about the world.
Episodic memory: memory for personal events. This is the answer to the question “Where were you when JFK was killed?” The medial temporal lobe is thought to be the storage site for these episodic memories.
In all learning networks we present input and look at how the output of the network develops. We can distinguish supervised from unsupervised learning. In supervised learning we have a certain task
Figure 13.1: Hippocampal plasticity. First, LTP is induced by high frequency extracellular stimulation; 10 minutes later it is partly reversed by low frequency stimulation. From (Dayan and Abbott, 2002).
that we want the network to perform, and feedback is given to the network on whether the performance was correct. A good example is a feed-forward network with back-propagation. Such networks can in principle be taught any function between input and output. Essential are the presence of a hidden layer and some non-linearity in the transfer function. One can argue that supervised algorithms are nothing more than a fancy way of curve fitting an input-output relation. This is true, but the supervised algorithms remain an interesting and inspiring subject.
On an abstract level the brain is certainly able to do supervised learning. However, we will concentrate mainly on unsupervised learning; Hebbian learning is unsupervised when cell B does not receive a feedback signal. Here we study variants of Hebb's learning rule and see how the network and the synaptic weights develop under these rules.
Consider a single output neuron receiving $N$ inputs:
$$y = \sum_{j=1}^{N} w_j x_j = \vec{w}\cdot\vec{x}$$
where $\vec{w}$ are the weights of the inputs, which will be modified in the learning. Note that we have a fully linear neuron, for which both the activity and the weights can be negative.
The experiments seem to indicate that combining high pre- and post-synaptic activity leads to a strengthening of the synapse. One of the simplest ways to interpret the experimental results is to write
$$\Delta w_i = \epsilon\, y\, x_i = \epsilon\, x_i \sum_{j=1}^{N} w_j x_j$$
where $\epsilon$ is a small constant, the learning rate. It is usually chosen small so that a single presentation gives only a small change in the weights. This is mathematical convenience more than an effort to describe biology; 'one-shot learning' certainly seems to exist, as we all know from daily life.
Suppose we have $M$ patterns, labelled $\mu = 1 \ldots M$. Every so many milliseconds we change the pattern. The total weight change after all patterns have occurred for an equal amount of time is
$$\Delta w_i = \epsilon \sum_{\mu=1}^{M} x_i^\mu \sum_{j=1}^{N} w_j x_j^\mu$$
Define the correlation matrix of the inputs as $Q_{ij} = \sum_\mu x_i^\mu x_j^\mu$. Writing the weights $w_i$ as a vector $\vec{w}$, we find that
$$\Delta\vec{w} = \epsilon\, Q\vec{w}$$
As a last step one can write this as a differential equation describing the evolution of the weights:
$$\tau \frac{d\vec{w}}{dt} = Q\vec{w}$$
Note the assumptions we have made:
• The output neuron is linear. It can have negative activity, and the activity does not saturate.
• The synaptic weights can have both positive and negative values.
Figure 13.2: Hebbian learning of a neuron with two inputs. A) Two inputs which have strong
negative correlation (each point represents a sample from the input distribution). The inputs are
zero-mean. After learning with the plain Hebb rule the weight vector aligns with the data. B)
When the data are not zero-mean, the weight vector aligns with the mean. C) In that case the
covariance rule aligns with the data. From (Dayan and Abbott, 2002) and (Hertz, Krogh, and
Palmer, 1991).
It is a good exercise to investigate variants of the above learning rule for which these assumptions no longer hold; the effect of many of these assumptions is unclear.
The correlation matrix $Q$ is a special matrix: it is symmetric and positive semi-definite, which means that its eigenvalues are greater than or equal to zero. From our discussion of Markov diagrams we know how the weights will evolve:
$$\vec{w}(t) = \sum_k c_k \vec{w}_k\, e^{\lambda_k t/\tau}$$
where the $\vec{w}_k$ are the eigenvectors of $Q$ with eigenvalues $\lambda_k$. For long times the eigenvector with the largest eigenvalue dominates the weight vector.
As stated, the learning picks out the inputs with the strongest correlations. These inputs are able to fire the post-synaptic cell most and will therefore be enhanced. The unsupervised learning rules perform a principal component analysis on the data (below we discuss how to extract more than one principal component). Principal component analysis projects the data along the eigenvectors of the correlation matrix with the largest eigenvalues, Fig. 13.2. Describing the data this way is an efficient compression method: we can have many more input dimensions than output neurons and still have a decent description of the original input. This situation could occur in vision. An image has a very high dimension: each pixel is a separate dimension (to describe an image you need a vector with as many elements as there are pixels). As we have seen in our discussion of visual processing, there are many regularities in natural images which can be used to describe them more succinctly. For instance, neighbouring pixels will be positively correlated. The unsupervised learning rules pick up such regularities because they are driven by the input correlations.
13.3 Normalisation
As one simulates the above rules, it soon becomes clear that the synaptic weights and the post-synaptic activity run off to un-biologically large values. One can stop learning after a certain time, or a normalisation condition has to be imposed. The precise way one normalises the learning, however, turns out to be of crucial importance for the final weights.
The easiest way is to impose a hard limit on each synaptic weight, for instance $0 < w < w_{max}$. This can already have interesting effects on the dynamics, see Fig. 13.3.
Two other simple ways to constrain the synaptic weights are multiplicative and subtractive scaling. In the first we normalise with a second term which is proportional to the weight itself; in the second we subtract a constant vector:
$$\tau\frac{d\vec{w}}{dt} = Q\vec{w} - \gamma(\vec{w})\,\vec{w} = Q\vec{w} - \left[\frac{\vec{n}\cdot Q\vec{w}}{\vec{n}\cdot\vec{w}}\right]\vec{w} \qquad \text{(multiplicative)}$$
$$\tau\frac{d\vec{w}}{dt} = Q\vec{w} - \epsilon(\vec{w})\,\vec{n} = Q\vec{w} - \left[\frac{\vec{n}\cdot Q\vec{w}}{\vec{n}\cdot\vec{n}}\right]\vec{n} \qquad \text{(subtractive)}$$
where $\vec{n}$ is the unit vector $\vec{n} = (1, 1, 1, \ldots)$. You can easily check that in both cases $d(\vec{n}\cdot\vec{w})/dt = 0$, that is, the sum of the weights $\sum_i w_i$ is constant. Normalisation schemes automatically cause competition between the synaptic weights: one synapse can only win because another one gets reduced. The subtractive scheme is more competitive than the multiplicative scheme (see Practical). See (Miller and MacKay, 1994) for a full analysis of these schemes.
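The difference in competitiveness is easy to demonstrate numerically. In the sketch below (the correlation matrix, initial weights, and hard limits are arbitrary choices) the multiplicative scheme ends with graded weights, whereas the subtractive scheme lets one weight take everything.

    import numpy as np

    Q = np.array([[1.0, 0.4],
                  [0.4, 1.0]])           # positively correlated inputs (an arbitrary choice)
    n = np.ones(2)                       # the vector (1, 1, ...)

    def evolve(rule, w0=(0.6, 0.4), dt=0.01, steps=5000, w_max=1.0):
        w = np.array(w0)
        for _ in range(steps):
            drive = Q @ w
            if rule == "multiplicative":
                dw = drive - (n @ drive) / (n @ w) * w
            else:                                   # subtractive
                dw = drive - (n @ drive) / (n @ n) * n
            w = np.clip(w + dt * dw, 0.0, w_max)    # hard limits as in Fig. 13.3
        return w

    print("multiplicative:", evolve("multiplicative"))   # graded: both weights survive
    print("subtractive:   ", evolve("subtractive"))      # winner-take-all: one weight dies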
In biology, evidence for homeostasis of synaptic weights has been found (Turrigiano et al., 1998). On long time scales a neuron will try to make sure that its activity is neither too low nor too high. It does this both by scaling its synaptic weights (in a multiplicative manner, it seems) and by changing the densities of voltage-gated channels.
The terms hetero-synaptic and homo-synaptic LTD have been introduced. In hetero-synaptic LTD, synapses depress in the absence of input to that synapse (maybe concurrently with the strengthening of a stimulated input). In homo-synaptic LTD, low pre-synaptic activity is required. With the inclusion of the above normalisation mechanisms the learning is no longer homo-synaptic (affecting a single synapse only), but hetero-synaptic (a stimulus causes change not only in the associated synapse, but in many synapses of the cell).
Figure 13.3: Weight development of two weights which have hard limits. Plotted in weight space, the arrows show how the weights develop under continuous stimulation. The inputs are anti-correlated, with correlation matrix $\begin{pmatrix} 1 & -0.4 \\ -0.4 & 1 \end{pmatrix}$; the largest eigenvector is $(1,-1)$. However, because of the limits on the growth, different final weights are reached depending on the initial conditions. There are three stable fixed points (filled circles) where the weights can end up. (With positively correlated inputs this does not happen and $(1,1)$ is the only fixed point.)
A more elegant solution is Oja's rule (Oja, 1982), which adds a quadratic decay term to the plain Hebb rule:
$$\Delta w_i = \epsilon \left[ \sum_{\mu,j} w_j x_i^\mu x_j^\mu - w_i \sum_{\mu,j,k} w_j w_k x_j^\mu x_k^\mu \right]$$
Note the quadratic term, which will constrain the weights. It is not difficult to show that the steady state, which obeys
$$0 = Q\vec{w} - (\vec{w}\cdot Q\vec{w})\,\vec{w},$$
needs to be an eigenvector of $Q$ (of unit length).
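The following sketch uses the per-pattern (online) version of Oja's rule, $\Delta\vec{w} = \epsilon(y\vec{x} - y^2\vec{w})$, on invented zero-mean correlated data; the weight vector indeed converges to the unit-length principal eigenvector.

    import numpy as np

    rng = np.random.default_rng(2)

    # Zero-mean inputs, strongly correlated along the (1, 1) direction (invented data).
    C = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
    samples = rng.multivariate_normal([0.0, 0.0], C, size=20000)

    eps = 0.005                           # learning rate (assumed)
    w = 0.1 * rng.standard_normal(2)
    for x in samples:
        y = w @ x
        w += eps * (y * x - y**2 * w)     # Hebbian growth minus Oja's quadratic decay

    eigvals, eigvecs = np.linalg.eigh(C)
    print("learned weights (normalised):", w / np.linalg.norm(w))
    print("principal eigenvector:       ", eigvecs[:, -1])   # agrees up to sign
    print("weight norm (tends to 1):    ", np.linalg.norm(w))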
A different way to constrain the weights is the BCM rule (Bienenstock, Cooper, and Munro, 1982), in which the weight change depends non-linearly on the post-synaptic activity:
$$\Delta w_i = x_i\, \phi(y, \theta) = x_i \left[ y (y - \theta) \right]$$
Again, this learning rule alone would lead to runaway excitation. But the trick is to continuously change the modification curve depending on the recent activity of the post-synaptic cell. We can do this by varying the threshold parameter $\theta$. A good choice is to set $\theta = \langle y^2 \rangle$, that is, to take a slowly varying average of the recent activity, $\tau_\theta\, d\theta/dt = -\theta + y^2$. Now when, for instance, activity is high for a long time, the threshold is raised, leading to depression for most patterns, and thus in turn lowering activity again. This way the learning rule has negative feedback built into it.
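The feedback loop is easy to simulate. In this sketch (a single weight, constant input, dimensionless constants, all invented) the sliding threshold halts the runaway growth that the plain rule would produce, and the activity settles at the fixed point $y = 1$.

    import numpy as np

    x = 1.0                    # constant pre-synaptic rate (invented units)
    w, theta = 0.5, 0.0        # initial weight and sliding threshold
    eps, tau_theta = 0.01, 50  # learning rate; threshold time constant in steps (assumed)

    for step in range(20000):
        y = w * x                                # linear post-synaptic activity
        theta += (y * y - theta) / tau_theta     # slow running average of y^2
        w += eps * x * y * (y - theta)           # BCM weight update

    # The negative feedback stabilises the activity: runaway growth is prevented.
    print(f"final weight {w:.3f}, activity {w * x:.3f}, threshold {theta:.3f}")   # all near 1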
Figure 13.4: BCM rule. Left: the weight change as a function of the instantaneous post-synaptic activity (measured through the calcium concentration $[\mathrm{Ca}^{2+}]$); between $\theta_-$ and $\theta_+$ the synapse depresses (LTD), above $\theta_+$ it potentiates (LTP). In BCM, $\theta_+$ shifts slowly with activity changes ($\theta_-$ is often set to zero).
Right: an application of the BCM rule. Development of orientation and direction selectivity in primary visual cortex modelled using the BCM rule. After (Blais, Cooper, and Shouval, 2000).
The shape of the function $\phi$, see Fig. 13.4, is roughly what has been seen in experiments. BCM has homo-synaptic LTD: the modification is proportional to $x_i$, hence there is no modification without pre-synaptic input (in contrast to the covariance rule).
What really distinguishes BCM from other learning rules is the change in the threshold. Some evidence for this has been found (Kirkwood, Rioult, and Bear, 1996). It seems a powerful learning model, but its mathematical analysis is trickier than for the above models. Apparently, BCM can do a PCA-like decomposition like the above rules (Intrator and Cooper, 1992). It has also been applied to the development of the visual system, e.g. Fig. 13.4.
where $M$ is the number of output neurons. Now the population of output neurons represents the first $M$ principal components. Another method is Sanger's rule, which roughly does the same thing (Hertz, Krogh, and Palmer, 1991).
Introducing lateral interactions between the neurons to induce competition is of much more general use. Inhibitory interactions reduce the redundancy between the neurons; this is called competitive learning.
In contrast, excitatory interactions help to form populations of neurons coding the same stimulus properties. This decreases the noise associated with a single neuron's spike train, increasing reliability and reducing integration times. Excitatory interactions do not seem necessary for noiseless, rate-based model neurons. However, when the neurons are noisy, such interactions help to average the signal. In addition, they might be useful if one wants to create population codes with rate-based neurons.
13.7 ICA
PCA uses only first and second order statistics, which means it contains only terms such as $\langle x_i x_j \rangle$ and $\langle x_i^2 \rangle$. This is because it is based on the correlation matrix. PCA therefore works best on Gaussian distributed variables. Now, the sum of many random numbers tends to a Gaussian, so this seems not a bad assumption (see also the remarks about Fig. 7.6).
However, suppose that we are dealing with a signal in which many variables are mixed and we want to disentangle them. One way is to make linear combinations of the inputs that are explicitly not Gaussian distributed, as such combinations are likely to be independent. This can be formulated in terms of information theory, and leads to an extension of PCA called independent component analysis (ICA), see (Bell et al., 1997) (a nice read).
The derivation (for noiseless neurons) goes as follows. Suppose we have a set of neurons with a sigmoid non-linearity. We can maximise the mutual information of the output about the input by changing the synaptic weights. For a single neuron this leads to weights (and an offset) such that the full output range of the neuron is used (much like the retinal adaptation we described). For multiple neurons the learning rule will de-correlate the inputs. A learning rule which does this is
$$\Delta w_{ij} = \frac{\mathrm{cof}(w_{ji})}{\det W} + x_i (1 - 2 y_j)$$
where $\mathrm{cof}(w_{ij})$ is $(-1)^{i+j}$ times the determinant of the weight matrix $W$ with row $i$ and column $j$ removed. However, it is not so clear how this could be implemented biologically, as it is a very non-local rule.
ICA is a more powerful technique than PCA; for instance, it allows the de-mixing of mixed signals. Suppose that three microphones record in a room in which three persons are all speaking simultaneously. The signals of the microphones will be strongly correlated. How can we disentangle the voices? This is called the cocktail party problem. Applying the ICA algorithm will de-mix the signals and split the output into the three distinct sound sources. Whether this resembles what the brain does in auditory processing is not clear, but it is a useful algorithm for signal processing, for example of EEGs. It can also predict reasonably realistic V1 receptive fields, Fig. 13.5.
Figure 13.5: Receptive fields that result from doing ICA on natural images. From (van Hateren and van der Schaaf, 1998).
It is not so clear how to proceed to construct learning models that do higher-level vision. Not many models have reached beyond V1 and explained the emergence of, say, a face cell. In addition, most models have built-in rules and constraints which make sense when formulating the model, but have little experimental support from biology.
Chapter 14

Spike timing dependent plasticity
In the previous chapter we looked at plasticity using average firing rates. Recent experiments with other plasticity protocols show that the timing of the spikes is of critical importance in the induction of LTP and LTD. Using patch clamp recordings, the pre- and post-synaptic events can be precisely timed. Such an experiment is shown in Fig. 14.1. If the pre-synaptic activity precedes the post-synaptic spike (and the input thus helps to induce the post-synaptic spike), the synapse is strengthened, but if the input lags behind the post-synaptic spike, the synapse is depressed. Thus the modification depends on the precise spike sequence. Note that this is in remarkable agreement with Hebb's statement, which implies that the pre-synaptic activity helps to cause the post-synaptic activity. The surprising part is the LTD condition. It is not yet clear whether all forms of rate-based plasticity can be analysed in terms of this timing dependent plasticity.
Suppose we have one pre-synaptic spike from neuron $i$ and one post-synaptic spike from neuron $j$. The simplest way to model this plasticity is to describe the plasticity window with exponential functions, and to update the synaptic weight according to
$$\Delta w_{ij} = A_+ \exp(-|t_i - t_j|/\tau_+) \qquad \text{(if } t_i < t_j\text{)}$$
$$\Delta w_{ij} = A_- \exp(-|t_i - t_j|/\tau_-) \qquad \text{(if } t_j < t_i\text{)}$$
where $t_i$ is the pre-synaptic spike-time, $t_j$ the post-synaptic spike-time, and $A_+$, $A_-$, $\tau_+$, and $\tau_-$ are constants to be extracted from the data. Typically $\tau_+$ is some tens of milliseconds; $\tau_-$ can be longer (around 100 ms).

Figure 14.1: Left: protocol for measuring spike timing dependent plasticity: pairs of pre- and post-synaptic spikes at an interval $\Delta t$, repeated every second; one ordering gives LTP, the reverse ordering LTD. Right: experimental data. Changes in synaptic strength after pairing pre- and post-synaptic spikes with 60 repetitions of pairings at 1 Hz. From (Bi and Poo, 1998).

Figure 14.2: Spike timing dependent plasticity in cortical cells. From (Sjöström, Turrigiano, and Nelson, 2001).
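In code the pair-based rule is only a few lines (the amplitudes and time constants below are merely plausible values; the all-pairs summation is one of several possible conventions for spike trains, as discussed next):

    import numpy as np

    A_plus, A_minus = 0.01, -0.005        # amplitudes; A_minus negative for depression (assumed)
    tau_plus, tau_minus = 0.020, 0.100    # s; the LTD window taken longer, as in the text

    def delta_w(t_pre, t_post):
        """Pair-based STDP window: pre-before-post potentiates, post-before-pre depresses."""
        dt = t_post - t_pre
        if dt > 0:
            return A_plus * np.exp(-dt / tau_plus)     # LTP branch
        return A_minus * np.exp(dt / tau_minus)        # LTD branch

    def delta_w_all_pairs(pre_spikes, post_spikes):
        """One possible multi-spike convention: sum the window over all spike pairs."""
        return sum(delta_w(tp, tq) for tp in pre_spikes for tq in post_spikes)

    print(delta_w(0.000, 0.010))   # pre leads post by 10 ms: potentiation
    print(delta_w(0.010, 0.000))   # post leads pre by 10 ms: depression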
But how should we cope with multiple spikes? Suppose pre- and post-synaptic activity are not well-separated pairs as in Fig. 14.1, but contain trains of many spikes. Should the total strengthening be found by summing over all possible pairs, or should we perhaps take only the closest pairs into account? In other words, interactions beyond simple pairing might play a role. Fig. 14.1 does not tell us how to deal with that situation. Such effects need to be measured experimentally, and are currently actively researched (Froemke and Dan, 2002; Sjöström, Turrigiano, and Nelson, 2001).
Closely related to this, the potentiation is stronger for higher pairing frequencies, whereas the depression does not depend much on pairing frequency; see Fig. 14.2D and (Markram et al., 1997).
Figure 14.3: Spike timing dependent plasticity prefers early and reliable input. A) Post-synaptic spike train (membrane voltage versus time) in response to a periodic input burst before training. B) After training the neuron responds earlier to the same input. C) Correspondingly, inputs with the shortest latencies are potentiated most (steady-state weight $w_a/w_{max}$ against relative latency). D) In another simulation different amounts of jitter were given to the inputs; the inputs with the least jitter were potentiated ($w_a/w_{max}$ against time jitter).
Figure 14.4: Formation of columns and maps under STDP. A) Network diagram. Two layers are simulated. The input layer contains neurons that fire Poisson trains at different rates, and the network layer contains integrate-and-fire neurons. Every neuron in the input layer is randomly connected to one fifth of the neurons in the network layer, and all the neurons in the network layer are recurrently connected. All synaptic connections are governed by STDP. Input correlations are introduced by moving a Gaussian hill of activity along the input layer. The Gaussian hill is centred on a random input neuron for a period of time and then shifted to another random location at the end of the period. The length of the period is chosen from an exponential distribution, and the time constant is similar to the time window of STDP. B) First stage of column formation: a seed placed in the forward connections creates a correlated group of network neurons. C) Second stage of column formation: the correlated group of network neurons sends out connections to other neurons in the network. D) Third stage of column formation: transfer of information from the recurrent layer to the feed-forward layer. E) Last stage of column formation: recurrent connections weaken as feed-forward connections become well formed. F) Receptive fields of two network neurons. G) Feed-forward synaptic strengths define a column. Dark dots represent strong synapses. The horizontal stripe indicates that the network neurons have similar input connections, i.e. receptive fields. H) When short range excitatory and global inhibitory recurrent connections are introduced in the network layer, a map forms in the feed-forward connections. The diagonal bar reflects the progression of receptive field centres as we move across the sheet of network neurons. After (Song and Abbott, 2001).
The presumed sequence of events is as follows: 1) the pre-synaptic spike releases glutamate, which binds to the fast AMPA and the slow NMDA receptors; 2) the EPSP propagates sub-threshold to the soma; 3) a spike is generated; 4) the spike travels not only down the axon, but also back-propagates into the dendrites, and the back-propagating spike arriving at the synapse releases the voltage-dependent Mg block; 5) Ca flows into the synapse (both through NMDA channels and through voltage-dependent Ca channels); 6) the Ca influx leads to the induction of LTP. (The buildup of calcium provides a possible reason why higher pairing frequencies give more LTP.)
If this were the full story one might expect a longer LTP window than is typically observed, namely a time window roughly similar to the NMDA time-constant (check for yourself). A recent study suggests that the EPSP from the synaptic input inactivates the KA current in the dendrite (Watanabe et al., 2002). When the back-propagating spike follows the input rapidly enough, it can easily propagate back, as the KA current is still inactivated. But when the back-propagating spike has a longer delay, the KA current is de-inactivated and can stop the back-propagation. This could help to explain the short LTP window ($\tau_+$).
The next question is of course how the Ca influx leads to LTP. As we briefly mentioned in our discussion of synapses, there are multiple ways the synapse could potentially be modified to change its strength. There is some evidence that right after LTP induction the AMPA receptors are put in a phosphorylated state, which gives them a higher conductance than normal, un-phosphorylated AMPA receptors. Some 40 minutes (?) after induction, new receptors are inserted and the phosphorylated ones go back to the un-phosphorylated state. Now the synapse is in a naive state again, but with a higher conductance, as there are more receptors. However, the sequence of events causing LTP is not known yet, and different systems might show different LTP mechanisms.
Less is known about LTD: it is mostly believed that a low level of Ca influx causes LTD. But the spatio-temporal profile of the Ca concentration might very well matter as well.
Note that the weights each correspond to a different delay from the stimulus, so a neural implementation requires something like delay lines. A very efficient algorithm to learn the weight vector is the temporal difference rule:
$$\Delta w(t_0) = \epsilon \sum_t \delta(t)\, u(t - t_0)$$
$$\delta(t) = r(t) + v(t+1) - v(t)$$
where $u$ is the stimulus, $r$ the reward, and $v$ the reward prediction.
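A toy implementation of this rule (trial length, stimulus and reward times, and learning rate all invented) shows the characteristic behaviour described next: over trials the prediction error δ moves from the time of the reward to the time of the stimulus.

    import numpy as np

    T, stim_t, reward_t, eps = 25, 5, 20, 0.2   # trial length, stimulus/reward steps, learning rate
    u = np.zeros(T); u[stim_t] = 1.0            # stimulus indicator
    r = np.zeros(T); r[reward_t] = 1.0          # reward, delivered 15 steps later
    w = np.zeros(T)                             # one weight per delay since the stimulus

    for trial in range(500):
        v = np.convolve(u, w)[:T]               # prediction v(t) = sum_s w(s) u(t - s)
        v_next = np.append(v[1:], 0.0)
        delta = r + v_next - v                  # delta(t) = r(t) + v(t+1) - v(t)
        for t0 in range(T):                     # Delta w(t0) = eps * sum_t delta(t) u(t - t0)
            w[t0] += eps * sum(delta[t] * u[t - t0] for t in range(t0, T))

    # After learning, the prediction error has moved from the reward to the stimulus:
    print("delta around stimulus:", delta[stim_t - 1].round(2))   # ~1
    print("delta at reward:      ", delta[reward_t].round(2))     # ~0: reward fully predicted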
Recently, experimental evidence for reward prediction cells has been found, Fig. 14.5. The
neurons release dopamine in response to reward, but like Pavlov’s dogs, after learning they are
already active at stimulus presentation. Precisely how these neurons interact with learning and
change the behaviour of the animal is not yet clear.
We have discussed a monkey that mostly sits passively and gets rewards (or not). But temporal
difference learning can also be used to guide behaviour as it predicts the reward of a certain action,
even if the reward follows much later. This learning method has been successfully applied to teach
computers (and perhaps humans) to play backgammon and similar games.
Figure 14.5: Left: during learning the δ-signal shifts to earlier times. After learning it responds to the stimulus, not to the reward. Right: A) Cells which before learning respond to the actual reward, respond after learning to the stimulus. B) Absence of the expected reward reduces the firing rate. From (Dayan and Abbott, 2002); experiments from Schultz and co-workers.
Bibliography

Abbott, L. F., J. A. Varela, K. Sen, and S. B. Nelson (1997). Synaptic depression and cortical gain control. Science 275: 220–224.
Abeles, M. (1991). Corticonics: neural circuits of the cerebral cortex. Cambridge University Press, Cambridge.
Anderson, J. S., I. Lampl, D. C. Gillespie, and D. Ferster (2000). The contribution of noise to contrast
invariance of orientation tuning in cat visual cortex. Science 290: 1968–1972.
Arbib(editor), M. (1995). The handbook of brain theory and neural networks. MIT press, Cambridge, MA.
Atick, J. J. and A. N. Redlich (1992). What does the retina know about natural scenes. Neural
Comp. 4: 196–210.
Bair, W. and C. Koch (1996). Temporal precision of spike trains in extrastriate cortex of the behaving macaque
monkey. Neural Comp. 8: 1185–1202.
Barlow, H. B. (1958). Temporal and spatial summation in human vision at different background intensities.
J. Physiol. 141: 337–350.
Barlow, H. B. and W. R. Levick (1965). The mechanism of directionally selective units in rabbit’s retina. J.
Physiol. 178: 477–504.
Battro, A. M. (2000). Half a brain is enough. The story of Nico. Cambridge University Press, Cambridge.
Bear, M., B. W. Connors, and M. A. Paradiso (2000). Neuroscience: Exploring the Brain. Lippincott, Williams
and Wilkins.
Bell, C. C., V. Z. Han, Y. Sugawara, and K. Grant (1997). Synaptic plasticity in cerebellum-like structure
depends on temporal order. Nature 387: 278–281.
Ben-Yishai, R., R. L. Bar-Or, and H. Sompolinsky (1995). Theory of orientation tuning in visual cortex. Proc.
Natl. Acad. Sci. 92: 3844–3848.
Berry, M. J. and M. Meister (1998). Refractoriness and neural precision. J. Neurosci. 18: 2200–2211.
Bi, G.-q. and M.-m. Poo (1998). Synaptic modifications in cultured hippocampal neurons: dependence on
spike timing, synaptic strength, and postsynaptic cell type. J. Neurosci. 18: 10464–10472.
Bialek, W. (1987). Physical limits to sensation and perception. Annu Rev Biophys Biophys Chem 16: 455–78.
Bienenstock, E. L., L. N. Cooper, and P. W. Munro (1982). Theory for the development of neuron selectivity:
orientation specificity and binocular interaction in visual cortex. J. Neurosci. 2: 32–48.
Blais, B. A., L. N. Cooper, and H. Shouval (2000). Formation of direction selectivity in natural scene
environments. Neural Comp. 12: 1057–1066.
Blakemore, C. (1988). The mind machine. BBC books.
Bliss, T.V.P. and T. Lomo (1973). Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. J. Physiol. 232: 331–56.
Blum, K. I. and L. F. Abbott (1996). A model of spatial map formation in the hippocampus of the rat. Neural
Comp. 8: 85–93.
Braitenberg, V. and A. Schüz (1998). Statistics and geometry of neuronal connectivity. Springer.
Britten, K. H., M. N. Shadlen, W. T. Newsome, and J. A. Movshon (1992). The analysis of visual motion:
a comparison of neuronal and psychophysical performance. J. Neurosci. 12: 4745–4765.
Brunel, N. (2000). Dynamics of networks of randomly connected excitatory and inhibitory spiking neurons.
Journal of Physiology-Paris 94: 445–463.
Brunel, N. and J.-P. Nadal (1998). Mutual information, Fisher information, and population coding. Neural
Comp. 10: 1731–1757.
Bryant, H. L. and J. P. Segundo (1976). Spike initiation by transmembrane current: a white-noise analysis.
J. Physiol. 260: 279–314.
Buchsbaum, G. and A. Gottschalk (1983). Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proc. R. Soc. London B 220: 89–113.
Buracas, G. T., A. M. Zador, M. R. DeWeese, and T. D. Albright (1998). Efficient discrimination of temporal patterns by motion sensitive neurons in primate visual cortex. Neuron 20: 959–969.
Carandini, M., D. J. Heeger, and W. Senn (2002). A synaptic explanation of suppression in visual cortex. J.
Neurosci. 22: 10053–10065.
Churchland, P. S. and T. J. Sejnowski (1994). The Computational Brain. MIT Press.
Colbert, C. M. and E. Pan (2002). Ion channel properties underlying axonal action potential initiation in
pyramidal neurons. Nat. Neurosci. 5: 533–548.
Compte, A., N. Brunel, P. S. Goldman-Rakic, and X. J. Wang (2000). Synaptic mechanisms and network
dynamics underlying spatial working memory in a cortical network model. Cerebral Cortex 10: 910–923.
Dayan, P. and L. F. Abbott (2002). Theoretical Neuroscience. MIT press, Cambridge, MA.
de Ruyter van Steveninck, R. R., G. D. Lewen, S. P. Strong, R. Koberle, and W. Bialek (1997). Reproducibility
and variability in neural spike trains. Science 275: 1805–1809.
de Schutter, E. and Paul Smolen (1998). Calcium dynamics in large neuronal models. In Koch, C. and
I. Segev, editors, Methods in neuronal modeling, 2nd ed., pp. 211–250. MIT Press, Cambridge.
Desai, N. S., L. C. Rutherford, and G. G. Turrigiano (1999). Plasticity in the intrinsic electrical properties of
cortical pyramidal neurons. Nat. Neurosci. 2: 515–520.
Destexhe, A., Z. F. Mainen, and T. J. Sejnowski (1998). Kinetic models of synaptic transmission. In Koch,
C. and I. Segev, editors, Methods in neuronal modeling (2nd ed.). MIT Press, Cambridge.
Engel, A. K., P. König, and W. Singer (1991). Direct physiological evidence for scene segmentation by temporal coding. Proc. Natl. Acad. Sci. 88: 9136–9140.
Erickson, C. A., B. Jagadeesh, and R. Desimone (2000). Clustering of perirhinal neurons with similar properties
following visual experience in adult monkeys. Nat. Neurosci. 3: 1143–8.
Froemke, R.C. and Y. Dan (2002). Spike-timing-dependent synaptic modification induced by natural spike
trains. Nature 416: 433–8.
Fukushima, K. (1980). Neocognitron: A self-organising multi-layered neural network. Biol. Cybern. 20: 121–136.
Funahashi, S., C. J. Bruce, and P. S. Goldman-Rakic (1998). Mnemonic encoding of visual space in the
monkey’s dorsolateral prefrontal cortex. J. Neurophysiol. 61: 1464–1483.
Gerstein, G. L. and B. Mandelbrot (1964). Random walk models for the spike activity of a single neuron.
Biophys. J. 4: 41–68.
Gerstner, W. (2000). Population dynamics of spiking neurons: Fast transients, asynchronous state, and
locking. Neural Comp. 12: 43–89.
Hansel, D., G. Mato, C. Meunier, and L. Neltner (1998). On numerical simulations of integrate-and-fire neural
networks. Neural Comp. 10: 467–483.
Hertz, J., A. Krogh, and R. G. Palmer (1991). Introduction to the theory of neural computation. Perseus,
Reading, MA.
Hille, B. (2001). Ionic Channels of excitable membranes. Sinauer, Sunderland, MA.
Hodgkin, A. L. and A. F. Huxley (1952). A quantitative description of membrane current and its application
to conduction and excitation in nerve. J. Physiol. 117: 500–544.
Hoffman, D. A., J. C. Magee, C. M. Colbert, and D. Johnston (1997). K+ channel regulation of signal propagation in dendrites of hippocampal pyramidal neurons. Nature 387: 869–875.
Holt, G. (1998). A Critical Reexamination of Some Assumptions and Implications of Cable Theory in Neuro-
biology. Ph.D. diss., Caltech.
Holt, G. and C. Koch (1997). Shunting inhibition does not have a divisive effect on firing rates. Neural
Comp. 9: 1001–1013.
Holt, G. R., W. R. Softky, C. Koch, and R. J. Douglas (1996). Comparison of discharge variability in vitro
and in vivo in cat visual neurons. J. Neurophysiol. 75: 1806–1814.
Huxley, A. F. (1959). Ion movements during nerve activity. Annals of the New York Academy of Sciences 81: 221–246.
Intrator, N. and L. N. Cooper (1992). Objective function formulation of the BCM theory of visual cortical plasticity. Neural Netw. 5: 3–17.
Johnston, D. and S. Wu (1995). Foundations of cellular Neurophysiology. MIT Press, Cambridge, MA.
Jones, J. and L. Palmer (1987). The two-dimensional spatial structure of simple receptive fields in cat striate
cortex. J. Neurophysiol. 58: 1187–1211.
Kalaska, J. F., R. Caminiti, and A. P. Georgopoulos (1983). Cortical mechanisms related to the direction of
two-dimensional arm movements: Relations in parietal area 5 and comparison with motor cortex. Exp. Brain
Res. 51: 247–260.
Kandel, E. R., J. H. Schwartz, and T. M. Jessel (2000). Principles of Neural science. McGraw Hill, New York.
Kapadia, M. K., G. Westheimer, and C. D. Gilbert (2000). Spatial distribution of contextual interactions in
primary visual cortex and in visual perception. J. Neurophysiol. 84: 2048–2062.
Kirkwood, A., M. C. Rioult, and M. F. Bear (1996). Experience-dependent modification of synaptic plasticity
in visual cortex. Nature 381: 526–528.
Koch, C. (1999). Biophysics of computation. Oxford University Press, New York.
Lamme, V. A. and P. R. Roelfsema (2000). The distinct modes of vision offered by feedforward and recurrent
processing. Trends Neurosci. 23: 571–9.
Lapicque, L. (1907). Recherches quantitatives sur l'excitation electrique de nerfs traitee comme une polarization. J. Physiol. Pathol. Gen. 9.
Linsker, R. (1992). Local synaptic learning rules suffice to maximise mutual information in a linear network.
Neural Comp. 4: 691–704.
Liu, Y. H. and X. J. Wang (2001). Spike-frequency adaptation of a generalized leaky integrate-and-fire model
neuron. J. Comp. Neurosci. 10: 25–45.
Logothetis, N.K., J. Pauls, M. Augath, T. Trinath, and A. Oeltermann (2001). Neurophysiological investigation of the basis of the fMRI signal. Nature 412: 150–7.
Lowen, S. B., T. Ozaki, E. Kaplan, B. E. Saleh, and M. C. Teich (2001). Fractal features of dark, maintained,
and driven neural discharges in the cat visual system. Methods 24: 377–94.
Luria, A. (1966). Higher Cortical function in man. Plenum.
Mack, V., N. Burnashev, K.M. Kaiser, A. Rozov, V. Jensen, O. Hvalby, P. H. Seeburg, B. Sakmann, and R. Sprengel (2001). Conditional restoration of hippocampal synaptic potentiation in GluR-A-deficient mice. Science 292: 2501–4.
Mainen, Z. F. and T. J. Sejnowski (1995). Reliability of spike timing in neocortical neurons. Science 268: 1503–1506.
Markram, H., J. Lübke, M. Frotscher, and B. Sakmann (1997). Regulation of synaptic efficacy by coincidence
of postsynaptic APs and EPSPs. Science 275: 213–215.
Markram, H. and M Tsodyks (1996). Redistribution of synaptic efficacy between neocortical pyramidal
neurons. Nature 382: 807–810.
Martin, S. J. and R. G. Morris (2002). New life in an old idea: the synaptic plasticity and memory hypothesis
revisited. Hippocampus 12: 609–36.
Miller, K. D. and D. J. C. MacKay (1994). The role of constraints in Hebbian learning. Neural
Comp. 6: 100–126.
Mitchell, S. J. and R. A. Silver (2003). Shunting inhibition modulates neuronal gain during synaptic excitation. Neuron 38: 433–445.
Newsome, W. T., K. H. Britten, and J. A. Movshon (1989). Neuronal correlates of a perceptual decision.
Nature 341: 52–54.
Oja, E. (1982). A simplified neuron model as a principal component analyzer. J. Math. Biol. 15: 267–273.
Oram, M. W., N. G. Hatsopoulos, B. J. Richmond, and J. P. Donoghue (2000). Excess synchrony in the motor cortical neurons provides redundant direction information with that from coarse temporal measures. J. Neurophysiol. 86: 1700–1716.
Paradiso, M. A. (1988). A theory for the use of visual orientation information which exploits the columnar
structure of striate cortex. Biol. Cybern. 58: 35–49.
Paré, D., E. Shink, H. Gaudreau, A. Destexhe, and E. J. Lang (1998). Impact of spontaneous synaptic activity
on the resting properties of cat neocortical pyramidal neurons in vivo. J. Neurophysiol. 79: 1450–1460.
Platt, M. L. and P. W. Glimcher (1999). Neural correlates of decision variables in parietal cortex. Nature 400: 233–238.
Pouget, A. and T. J. Sejnowski (1997). Spatial transformations in the parietal cortex using basis functions. J. Cogn. Neurosci. 9: 222–237.
Ratcliff, R. and J. N. Rouder (1998). Modeling response times for two-choice decisions. Psychological Science 9: 347–356.
Riedel, G., J. Micheau, A. G. Lam, E. Roloff, S.J Martin, H. Bridge, L. Hoz, B. Poeschel, J. McCulloch,
and R.G. Morris (1999). Reversible neural inactivation reveals hippocampal participation in several memory
processes. Nat. Neurosci. 2: 898–907.
Rieke, F., D. Warland, R. Steveninck, and W. Bialek (1996). Spikes: Exploring the neural code. MIT Press,
Cambridge.
Riesenhuber, M. and T. Poggio (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci. 2: 1019–1025.
Rolls, E. T. and G. Deco (2002). Computational neuroscience of vision. Oxford.
Sacks, O. (1985). The man who mistook his wife for a hat and other clinical tales. Summit Books, New York.
Sakai, K. and Y. Miyashita (1991). Neural organization for the long-term memory of paired associates.
Nature 354: 152–157.
Schall, J. D., V. Stuphorn, and J. W. Brown (2002). Monitoring and control of action by the frontal lobes.
Neuron 36: 309–322.
Sejnowski, T. J. (1976). Statistical constraints on synaptic plasticity. J. theor. Biol. 69: 385–398.
Seung, H. S. and H. Sompolinsky (1993). Simple models for reading neuronal population codes. Proc. Natl.
Acad. Sci. 90: 10749–10753.
Shadlen, M. N. and W. T. Newsome (1998). The variable discharge of cortical neurons: Implications for
connectivity, computation, and information coding. J. Neurosci. 18: 3870–3896.
Shannon, C. E. and W. Weaver (1949). The mathematical theory of communication. University of Illinois Press, Illinois.
Shepherd, G. M. (1994). Neurobiology. Oxford, New York.
Sjöström, P.J., G.G. Turrigiano, and S.B. Nelson (2001). Rate, timing, and cooperativity jointly determine
cortical synaptic plasticity. Neuron 32: 1149–1164.
Snippe, H. P. (1996). Parameter extraction from population codes: a critical assessment. Neural
Comp. 8: 511–529.
Sompolinsky, H., H. Yoon, K. Kang, and M. Shamir (2002). Population coding in neuronal systems with
correlated noise. Phys. Rev E 64: 51904.
Song, S. and L.F. Abbott (2001). Column and map development and cortical re-mapping through spike-timing
dependent plasticity. Neuron 32: 339–350.
Song, S., K. D. Miller, and L. F. Abbott (2000). Competitive Hebbian learning through spike-timing-dependent
synaptic plasticity. Nat. Neurosci. 3: 919–926.
Stemmler, M. and C. Koch (1999). How voltage-dependent conductances can adapt to maximize the infor-
mation encoded by neuronal firing rate. Nat. Neurosci. 2: 512–527.
Stevens, C. F. (1972). Inferences about membrane properties from electrical noise measurements. Biophys. J. 12: 1028–1047.
Stevens, C. F. and A. M. Zador (1998). Input synchrony and the irregular firing of cortical neurons. Nat.
Neurosci. 1: 210–217.
Stricker, C. (2002). Central synaptic integration: Linear after all? News Physiol. Sci 17: 138–143.
Sutton, R. S. and A. G. Barto (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge.
Tanaka, K. (1996). Inferotemporal cortex and object vision. Annu Rev Neurosci 19: 109–139.
Taylor, D. M., S. I. Helms Tillery, and A. B. Schwartz (2002). Direct cortical control of 3d neuroprosthetic
devices. Science 296: 1829.
Thorpe, S., D. Fize, and C. Marlot (1996). Speed of processing in the human visual system. Nature 381: 520–522.
Touryan, J., G. Felsen, and Y. Dan (2005). Spatial structure of complex cell receptive fields measured with
natural images. Neuron 45: 781–791.
Trappenberg, T. P. (2002). Fundamentals of computational neuroscience. Oxford.
Treue, S. (2001). Neural correlates of attention in primate visual cortex. Trend in Neurosc. 24: 295–300.
Tsunoda, K., Y. Yamane, M. Nishizaki, and M. Tanifuji (2001). Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nat. Neurosci. 4: 832–838.
Tuckwell, H. C. (1988). Introduction to theoretical neurobiology. Cambridge University Press, Cambridge.
Turrigiano, G. G., K. R. Leslie, N. S. Desai, L. C. Rutherford, and S. B. Nelson (1998). Activity-dependent
scaling of quantal amplitude in neocortical neurons. Nature 391: 892–896.
van Hateren, J. H. (1992). Real and optimal neural images in early vision. Nature 360: 68–70.
van Hateren, J. H. and A. van der Schaaf (1998). Independent component filters of natural images compared
with simple cells in primary visual cortex. Proc. Roc. Soc. B 265: 359–366.
van Rossum, M. C. W., G.-q. Bi, and G. G. Turrigiano (2000). Stable Hebbian learning from spike timing
dependent plasticity. J. Neurosci. 20: 8812–8821.
van Rossum, M. C. W., B. J. O’Brien, and R. G. Smith (2003). The effects of noise on the timing precision
of retinal ganglion cells. J. Neurophysiol. 89: 2406–2419.
van Rossum, M. C. W., G. G. Turrigiano, and S. B. Nelson (2002). Fast propagation of firing rates through
layered networks of noisy neurons. J. Neurosci. 22: 1956–1966.
van Vreeswijk, C. and H. Sompolinsky (1996). Chaos in neuronal networks with balanced excitatory and
inhibitory activity. Science 274: 1724–1726.
Varela, J. A., K. Sen, J. Gibson, J. Fost, L. F. Abbott, and S.B. Nelson (1997). A quantitative description of
short-term plasticity at excitatory synapses in layer 2/3 of rat primary visual cortex. J. Neurosci. 17: 7926–7940.
von der Heydt, R., E. Peterhans, and G. Baumgartner (1984). Illusory contours and cortical neuron responses.
Science 224: 1260–2.
Watanabe, S., D. A. Hoffman, M. Migliore, and D. Johnston (2002). Dendritic k+ channels contribute
to spike-timing dependent long-term potentiation in hippocampal pyramidal neurons. Proc. Natl. Acad.
Sci. 99: 8366–8371.
Wessberg, J., C. R. Stambaugh, J.D. Kralik, P. D. Beck, M. Laubach, J. K. Chapin, J. Kim, S. J. Biggs,
M. A. Srinivasan, and M. A. Nicolelis (2000). Real-time prediction of hand trajectory by ensembles of cortical
neurons in primates. Nature 408: 361–368.
Zohary, E., M. N. Shadlen, and W. T. Newsome (1994). Correlated neuronal discharge rate and its implications
for psychophysical performance. Nature 370: 140–144.