Exercise 2: Hopfield Networks
Artificiella Neuronnät och Andra Lärande System, 2D1432, 2004
1 Objectives
This exercise is about recurrent networks, especially the Hopfield network and
different forms of associative networks. When you are finished you should
understand:
2 Tasks
We will use Matlab to do all calculations in this exercise. Most of the operations
of Hopfield-type networks can be seen as vector-matrix operations which
are easy to express in Matlab.
We will look at some simple networks using the Hebbian learning principle,
which is often used in recurrent networks. You will construct an auto-associative
memory of the Hopfield type, and explore its capabilities, capacity and limitations.
Most of the tasks consist of observing the dynamics and analysing why
they occur.
3 Background
A neural network is called recurrent if it contains connections allowing output
signals to enter again as input signals. They are in some sense more general
than feedforward networks: a feedforward network can be seen as a special case
of a recurrent network where the weights of all non-forward connections are set
to zero.
One of the most important applications for recurrent networks is associative
memory, storing information as dynamically stable configurations. Given
a noisy or partial pattern, the network can recall the original version it has
been trained on (auto-associative memory) or another learned pattern (hetero-associative memory).
The most well-known recurrent network is the Hopfield network, which is
a fully connected auto-associative network with two-state neurons, asynchronous
updating and a Hebbian learning rule. One reason that the Hopfield
network has been so well studied is that it is possible to analyse it using methods
from statistical mechanics, enabling exact calculation of its storage capacity,
convergence properties and stability. This exercise will deal mainly with this
network and some variants of it.
> cd ~/ann04
> cp -r /info/ann04/labbar/lab2 .
> cd lab2
> matlab
Figure 1: A simple Hebbian fully connected auto-associative network. When
three of the units are activated by an outside stimulus their mutual connections
are strengthened. The next time some of them are activated they will activate
each other.
We denote the stored patterns x̄^µ (vectors with components x^µ_i, where µ is the number of the pattern).
We could use for instance 0 and 1 for the activities, but the calculations become
easier if we choose -1 and 1.
To measure the correlated activities we can use the outer product W = x̄^T x̄
of the activity vectors we intend to learn; if the components x_i and x_j are
correlated w_ij will be positive, if they are anticorrelated w_ij will be negative.
Note that W is a symmetric matrix; each pair of units will be connected to each
other with the same strength.
The coefficients of the weight matrix can then be written as:

    w_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} x_i^\mu x_j^\mu

where N is the number of units and P the number of stored patterns.
The state of each unit is then updated according to the rule

    x_i \leftarrow \mathrm{sign}\left( \sum_{j} w_{ij} x_j \right)

where

    \mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ -1 & x < 0 \end{cases}
The function is undefined for zero; the built-in Matlab function sign(x) returns
zero for x = 0, while the utility function sgn(x) returns 1. It does not really
matter which function you use, but in order to avoid confusing neural states
between -1 and 1, sgn is more practical in this lab.
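The provided sgn.m is not reproduced here, but a minimal implementation matching the behaviour described above could look like this:

function y = sgn(x)
% Like sign(x), but maps 0 to +1 so that the
% neural states always stay in {-1, +1}.
y = sign(x);
y(y == 0) = 1;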
The above dynamics and learning rule form a Hopfield network.
Actually, this variant of the Hopfield network, where all states are updated
synchronously, is known as the Little model. In the original Hopfield model
the states are updated one at a time, allowing each to be influenced by other
states that might have changed sign in the previous steps. It can be seen as
an asynchronous parallel system. This has some effects on the convergence
properties (see section 5.3), but is otherwise very similar in behavior to the
synchronous model. The Little model is very easy to implement in Matlab, so
we will use it for most of the lab.
In Matlab these operations can be implemented as vector-matrix manipulations.
To make the pattern vectors as easy as possible to read and write we
define them as row vectors.
• Translate the calculation of the weight matrix and the update rule into
Matlab expressions.
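One possible translation, assuming the training patterns x1, x2 and x3 have been defined as row vectors (the 1/N factor is dropped since it does not affect the sign):

% Hebbian weight matrix from the outer products of the stored patterns
w = x1'*x1 + x2'*x2 + x3'*x3;

% One synchronous update of a state vector x (the Little model)
x = sgn(x*w);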
Can the memory recall the stored patterns from distorted input patterns? Define
a few new patterns which are distorted versions of the original ones: let x1d have
a one-bit error, and x2d and x3d two-bit errors each.
• Apply the update rule repeatedly until you reach a stable fixed point. Did
all the patterns converge towards stored patterns?
You'll probably find that x1, x2 and x3 are attractors in this network.
• How many attractors are there in this network? Hint: automate the
search (one possible approach is sketched after this list).
• What happens when you make the starting pattern even more dissimilar
to the stored ones (e.g. more than half is wrong)?
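A possible way to automate the search, assuming the 8-unit weight matrix w from above: start the synchronous dynamics from each of the 2^8 = 256 possible states and collect the distinct fixed points.

attractors = [];
for k = 0:255
    x = 2*double(dec2bin(k, 8) == '1') - 1;   % state number k as a row of -1/+1
    for iter = 1:100                          % iterate towards a fixed point
        xnew = sgn(x*w);
        if all(xnew == x), break; end
        x = xnew;
    end
    if all(sgn(x*w) == x)                     % keep only true fixed points
        attractors = unique([attractors; x], 'rows');
    end
end
size(attractors, 1)                           % number of distinct attractors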
So far we have only used a very small 8-neuron network. Now we will switch to a
1024-neuron network and picture patterns. Load the file pict.m, which contains
nine patterns named p1, p2, p3, p4, p5, p6, p7, p8 and p9, and learn the first
three.
>> pict
>> w = p1'*p1 + p2'*p2 + p3'*p3;
Since large patterns are hard to read as rows of numbers, we have prepared
a function vis(x) which displays a 1024 unit pattern as a 32 × 32 image.
>> vis(p1);
(one could also use imagesc(reshape(p1,32,32))).
• Can the network complete a degraded pattern? Try the pattern p11, which
is a degraded version of p1, or p22, which is a mixture of p2 and p3.
• Clearly convergence is practically instantaneous. What happens if we
select units randomly, calculate their new state and then repeat the process
(the original sequential Hopfield dynamics)? Write a Matlab script that
does this, showing the image every hundredth iteration or so.
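A minimal sketch of such a script, assuming the weight matrix w from above and starting from the degraded pattern p11 (the number of iterations is an arbitrary choice):

x = p11;                         % start from a degraded pattern
for iter = 1:3000
    i = ceil(1024*rand);         % pick a random unit
    x(i) = sgn(w(i,:)*x');       % update only that unit
    if mod(iter, 100) == 0
        vis(x); drawnow;         % show the image every hundredth iteration
    end
end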
5.3 Energy
Can we be sure that the network converges, or will it cycle between different
states forever?
For networks with a symmetric connection matrix it is possible to define
an energy function or Lyapunov function, a finite-valued function of the state
that always decreases as the states change. Since it has to have a minimum at
least somewhere, the dynamics must end up in an attractor.[1] A simple energy
function with this property is:
    E = -\sum_{i} \sum_{j} w_{ij} x_i x_j
[1] In the Little model it can actually end up alternating between two states with the same
energy; in the Hopfield model with asynchronous updates the attractor will always be a single
state.
• How do you express this calculation in Matlab? (Note: you do not need
to use any loops! One possibility is sketched after this list.)
• Follow how the energy changes from iteration to iteration when you use
the sequential update rule to approach an attractor.
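One way to compute the energy without loops, and to follow it during sequential updating; the starting pattern p11 and the number of iterations below are arbitrary choices (x is assumed to be a row vector):

% Energy of the current state x -- no loops needed
x = p11;
E = -x*w*x';

% Trace the energy while updating randomly chosen units one at a time
energy = zeros(1, 1000);
for iter = 1:1000
    i = ceil(1024*rand);
    x(i) = sgn(w(i,:)*x');
    energy(iter) = -x*w*x';      % record the energy after each update
end
plot(energy);                    % the curve should never increase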
How resistant are the patterns to noise or distortion? You can use the flip(x,n)
function, which flips n units randomly. The first argument is a row vector, the
second the number of flips.
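One possible way to test this, assuming p1 has been stored in the weight matrix w and flip.m is the utility provided with the lab files:

% Distort p1 with increasing amounts of noise and check recovery
for n = 0:50:500
    x = flip(p1, n);             % flip n randomly chosen units
    for iter = 1:10
        x = sgn(x*w);            % a few synchronous updates
    end
    fprintf('%4d flipped units: %d bits still wrong\n', n, sum(x ~= p1));
end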
5.5 Capacity
Now add more and more memories to the network to see where the limit is.
Start by adding p4 into the weight matrix and check if moderately distorted
patterns can still be recognized. Then continue by adding others such as p5, p6
and p7 in some order, checking the performance after each addition.
• How many patterns could safely be stored? Was the drop in performance
gradual or abrupt?
• Try to repeat this with learning a few random patterns instead of the
pictures and see if you can store more. You can use sgn(randn(1,1024))
to easily generate the patterns.
Create 300 random patterns (sign(randn(300,100)) is a quick way) and
train a 100-unit (or larger) network with them. After each new pattern has been
added to the weight matrix, calculate how many of the earlier patterns remain
stable (i.e. a single iteration does not cause them to change) and plot it.
• What happens with the number of stable patterns as more are learned?
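A sketch of this experiment for a 100-unit network; here a pattern counts as stable if one synchronous update leaves it unchanged:

pat = sign(randn(300, 100));          % 300 random patterns, 100 units each
w = zeros(100, 100);
nstable = zeros(1, 300);
for p = 1:300
    w = w + pat(p,:)'*pat(p,:);       % add pattern p to the weight matrix
    stable = 0;
    for q = 1:p                       % how many stored patterns are still fixed points?
        if all(sgn(pat(q,:)*w) == pat(q,:))
            stable = stable + 1;
        end
    end
    nstable(p) = stable;
end
plot(nstable);
xlabel('patterns learned'); ylabel('stable patterns');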
The self-connections w_ii are always positive and quite strong; they always
push units to remain in their current state. If you remove them (a simple
trick is to use w=w-diag(diag(w))) and compare the curves from pure and noisy
patterns for large numbers of patterns, you will see that the difference goes away.
In general it is a good idea to remove the self-connections, even though it may
look as if this makes performance worse: in fact, the self-connections promote
the formation of spurious patterns and make noise removal worse.
A bias in the patterns reduces the capacity, which is troublesome since real data
usually isn't evenly balanced.
Here we will use binary (0,1) patterns, since they are easier to use than
bipolar (±1) patterns in this case, and it makes sense to view the ground
state as zero and differing neurons as active. If the average activity

    \rho = \frac{1}{NP} \sum_{\mu} \sum_{i} x_i^\mu

is known, the learning rule can be adjusted to deal with this imbalance:
    w_{ij} = \sum_{\mu=1}^{P} (x_i^\mu - \rho)(x_j^\mu - \rho)
This produces weights that are still zero on average. When updating, we use
the slightly modified rule

    x_i \leftarrow 0.5 + 0.5 \, \mathrm{sign}\left( \sum_{j} w_{ij} x_j - \theta \right)

where θ is a bias (threshold) term.
• Try generating sparse patterns with just 10% activity and see how many
can be stored for different values of θ (use a script to check different values
of the bias).
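A possible script for this experiment; the network size, number of patterns and range of θ values below are arbitrary choices:

N = 100; P = 50;
pat = double(rand(P, N) < 0.1);                  % sparse binary (0/1) patterns, ~10% activity
rho = mean(pat(:));                              % average activity
w = (pat - rho)'*(pat - rho);                    % learning rule adjusted for the bias

for theta = 0:0.5:10                             % try different values of the bias
    stable = 0;
    for mu = 1:P
        % one update of the whole pattern (sgn avoids states of 0.5 when the sum equals theta)
        x = 0.5 + 0.5*sign(pat(mu,:)*w - theta);
        if all(x == pat(mu,:)), stable = stable + 1; end
    end
    fprintf('theta = %4.1f: %d of %d patterns stable\n', theta, stable, P);
end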
Good luck!