Neural Network Machines: Back Then
Howard C. Anderson
Published in the February 1989 issue of IEEE Potentials

The Von Neumann machine architecture, remarkably successful at handling many problems, is now viewed by many as limited. Von Neumann machines can do things, such as compute missile trajectories, that humans have great difficulty doing. But we can do things, such as take dictation, that we have great difficulty making Von Neumann machines do. Thus, neural network machine architectures are currently under intensive investigation. These new architectures are based on knowledge about biological computers such as the human brain. At present, most researchers are using simulation on Von Neumann machines to investigate alternative neural network machine architectures. Some neural network machines, however, have already been manufactured as experimental silicon chips.

A Von Neumann machine typically has a very complex instruction-processing unit that sequentially accesses a memory bank to perform a sequence of instructions. A neural network machine is a very complex network of very simple processing units that operate asynchronously but in parallel. Von Neumann machines evolved primarily from A. M. Turing's theoretical work on David Hilbert's 23rd problem. Neural network machines are evolving primarily from practical studies of neurobiological structures. The dream of making a thinking machine is the driving force for investigating alternative machine architectures based on neurophysiological models. It is also the driving force for the field of study known as artificial intelligence.
Back then
The thinking in the late sixties was that artificial intelligence could be achieved on Von Neumann machines and that there was no practical reason to pursue investigation of alternative machine architectures. In fact, the artificial intelligence community effectively closed the door on research in this area for nearly 20 years when Minsky and Papert published their book Perceptrons. The book analyzed the limitations of what is now recognized as one of the most elementary and limited forms of neural network machine architectures. In addition, the mainstream artificial intelligence community paid little attention to alternative machine architectures. We presume this is because Turing had shown that a digital computer is a universal machine capable of simulating any other digital or analog computer and can ... carry out any task that is explicitly described. Turing also wrote, "... considerations of speed apart, it is unnecessary to design various new machines to do various computing processes. They can all be done with one digital computer, suitably programmed for each case." Clearly then, the universal machine is sufficient for all processes which can be explicitly described and for which processing time is immaterial.

Unfortunately, many of the processes we wish our machines to perform must be done in a timely, efficient manner. Also, we have been unable to explicitly describe many of the processes we want our machines to perform, even though some of the best minds have been working on such problems for years. Processes that for us are relatively simple and accomplished quickly, e.g., recognizing a face in a photograph or taking dictation, seem to be incredibly difficult for the universal machine.
The brain appears to be an immense network of communication lines linking small processing units (neurons) together. The communication lines transfer electrical signals to the processing units via small resistors (synaptic junctions). Memory in the brain appears to consist primarily of the values assigned to the resistors once the topology is established. The processing units receive signals from many other processing units, add the values, and then produce a somewhat proportional but amplitude-limited output signal if the summed values exceed a threshold value. The output signal is then transmitted across the communication lines to other resistors connected to other processing units. The process of learning seems to be primarily a matter of changing the values of the resistors. (There are inhibitory and excitatory neurons, so the simplified electronic analogy referred to here must be biased in some way.)

Amplitude in a biological network refers to the rate at which a neuron fires. In a simulated neural network, the amplitude can be represented by a real number. In a silicon implementation, the amplitude can be represented by a voltage or a current. Amplitude limiting in a simulation can be performed by applying a sigmoid function, an S-shaped curve that asymptotically approaches a maximum positive output value for large positive sums and asymptotically approaches a maximum negative output value for large negative sums. The sigmoid function seems to simulate reasonably well the observed behavior of real biological neurons.
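To make the summing-and-squashing behavior concrete, here is a minimal Python sketch of such a processing unit. The tanh sigmoid, the gain, the threshold value, and the example weights are illustrative assumptions rather than values taken from the article.

```python
import math

def sigmoid(x, gain=1.0):
    """S-shaped squashing function: maps any real sum into (-1, +1).
    tanh is used here as one common choice of sigmoid."""
    return math.tanh(gain * x)

def neuron_output(inputs, weights, threshold=0.01):
    """Sum the weighted inputs (incoming signals times their "synapse" values);
    emit an amplitude-limited signal only if the sum exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    if total <= threshold:
        return 0.0
    return sigmoid(total)

# Example: three incoming signals with mixed excitatory (+) and inhibitory (-) weights.
print(neuron_output([1.0, 0.5, 1.0], [0.8, -0.3, 0.4]))
```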
Associative recall
The power inherent in associative memory neural network machines appears to stem primarily from their ability to perform non-linear vector transformations. The exclusive-or (XOR) function in electronics can be stated as a vector transformation problem. The problem is to find a transformation operator T that transforms a set of x vectors into a set of y vectors as shown in the following table:

      x          y
    (0,0)  ->   (0)
    (1,0)  ->   (1)
    (0,1)  ->   (1)
    (1,1)  ->   (0)
We can rewrite these relationships as:

    (0,0)T = (0)
    (1,0)T = (1)
    (0,1)T = (1)
    (1,1)T = (0)                                      (1)
Now the definition of a linear transformation T over the set of real numbers is the following. Let V and U be vector spaces over the set of real numbers, and let T: V -> U satisfy

    (i)  (v1 + v2)T = (v1)T + (v2)T, for all v1, v2 in V, and
    (ii) (av)T = a(vT), for all real numbers a and all v in V;

then T is a linear transformation from V to U. So let's test the T given in the set of equations (1) to see if it could be a linear transformation. Applying the first condition of the definition to the second and third of equations (1), we must have

    [(0,1) + (1,0)]T = (0,1)T + (1,0)T, or
    (1,1)T = (0,1)T + (1,0)T, or
    (0) = (1) + (1), or
    (0) = (2),

which is not true; therefore, T cannot be a linear transformation! It is somewhat surprising that the simple exclusive-or function, so important and fundamental to digital processing, involves a non-linear operator. It is also surprising, in view of this fact, that our mathematics texts say so much about linear transformations and so little about non-linear transformations.

A neural network machine that performs the XOR function is shown in figure 1. The green triangles represent neurons and the red circles represent synapses. Each of the neurons in the example has a threshold value of .01; i.e., a neuron fires only if the sum of its input values multiplied by their respective synapse coefficients exceeds .01. There are two input neurons, neurons one and two, with values of one or zero representing the true or false values of logical variables A and B. There are two more neurons, neurons three and four, each of which has two synapses connected to the input neurons. The values of the synapses are as shown in figure 1. Let the symbol ~ denote NOT and let ^ denote EXCLUSIVE OR. Note that the synapse values of neurons three and four have been chosen so that neuron three produces (~A)(B) and neuron four produces (A)(~B). The fifth neuron is an output neuron with two synapses that receive input from neurons three and four; the values of its synapses are as shown in figure 1. The output C of the output neuron is the exclusive-or of A and B. (Recall that (~A)(B) + (A)(~B) = A ^ B.)

For example, if A is 1 and B is 0, then neuron three's output will be 0 (since the sum of its weighted input values is -1, which is below the threshold value of .01), neuron four's output will be +1, and neuron five's output will then be the sum of +1 and 0, or +1. If, however, A is 1 and B is 1, then neuron three's internal sum will be 0, its output will be 0, neuron four's internal sum will be 0, its output will be 0, and neuron five's output will then be the sum of 0 and 0, or zero.
Note that the two neurons on the left are called input neurons because they are connected directly to real-world input signals. The neuron on the right is called an output neuron since it provides an output to the real world. The two neurons in the middle are called hidden units. Hidden units must be present in order to perform the XOR function; they are the source of the non-linearity that allows the network to act as a non-linear vector operator.
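A minimal Python sketch of the figure 1 machine follows. The exact synapse values are an assumption (the article shows them only in the figure); they are chosen, as the text describes, so that neuron three computes (~A)(B) and neuron four computes (A)(~B).

```python
THRESHOLD = 0.01  # each neuron fires only if its weighted input sum exceeds this

def step_neuron(inputs, weights):
    """Sum the weighted inputs; output +1 if the sum exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > THRESHOLD else 0

def xor_network(a, b):
    """Two hidden units plus one output unit, as in figure 1 (synapse values assumed)."""
    hidden3 = step_neuron([a, b], [-1, 1])            # fires for (~A)(B)
    hidden4 = step_neuron([a, b], [1, -1])            # fires for (A)(~B)
    return step_neuron([hidden3, hidden4], [1, 1])    # fires if either hidden unit fires

for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b} -> C={xor_network(a, b)}")
```

Running the sketch reproduces the truth table of equations (1), including the walkthrough in the text: for A = 1 and B = 0, neuron three's sum is -1 and it stays silent while neuron four fires.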
Auto-associative recall
Now let's describe mathematically the relatively simple neural network model shown in figure 2: a one-layer, fully connected neural network that behaves as an auto-associative memory.
We will be using the term activation rules. Activation rules refer to the mathematical formulae that determine the current value, or activation value, of a particular neuron. Assume we have N neurons, u_i, whose activity values at time t are a_i(t). Assume that each neuron u_i has a set of input or synaptic weights, W_ij, so that the output a_j of some neuron u_j is first multiplied by the weight W_ij before being given to neuron u_i. Assume that the set of weighted values provided to neuron u_i are added together and then amplitude limited by a sigmoid function, and that this result becomes the new activity value, a_i(t+1), of neuron u_i at time t+1. Then we have described a one-layer, fully connected neural network. The activity value a_i at time t of neuron u_i is given by:
$$a_i(t) = \sigma\left(\sum_{j=1}^{N} W_{ij}\, a_j(t-1)\right) \qquad (2)$$

where $\sigma$ is the sigmoid amplitude-limiting function described earlier.
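As a sketch of how equation (2) might be computed in a simulation, the following Python fragment updates all N activity values for one time step. The use of tanh as the sigmoid and the particular weight values are illustrative assumptions.

```python
import math

def update_activities(W, a_prev):
    """One time step of equation (2): each neuron's new activity is the sigmoid
    of the weighted sum of all activities from the previous time step."""
    n = len(a_prev)
    return [math.tanh(sum(W[i][j] * a_prev[j] for j in range(n)))
            for i in range(n)]

# Example: three fully connected neurons with small assumed weights.
W = [[0.0, 0.5, -0.2],
     [0.3, 0.0, 0.4],
     [-0.1, 0.2, 0.0]]
a = [1.0, 0.0, 0.0]
a = update_activities(W, a)
print(a)
```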
Note that we could have chosen a more complicated function in place of the summation of the products of the W_ij and the a_j. Also, some other function could have been chosen for the sigmoid. Many different models of neural networks using different functions exist within the current literature; the model just described is a particularly simple one. Now we may select some of these neurons to be input neurons and assign their values from real-world sources or terminals rather than let the equations set their values. Similarly, we may select specific neurons to be output neurons and pass their output values to real-world output terminals. Note that the W_ij values contain all of the knowledge embedded in the network.

Learning rules refer to the mathematical formulae that determine how the W_ij values are modified when the neural network learns. There are many learning rules described in the current literature. We will choose a particularly simple learning rule here for illustrative purposes, known as the delta rule. The delta rule takes its name from the Greek delta symbol often used in mathematics to signify the amount of change of a variable. In the case of neural networks, an input vector is provided to the neural network and it produces an output vector. The output vector is compared with the desired or correct output vector. The difference between these two vectors, or some function of the difference between these two vectors, is the delta referred to. This delta can be used as input to a function which adjusts the W_ij so that the difference between the output vector and the desired output vector will be reduced. The equations for the delta rule are:

$$\Delta W_{ij}(t) = \eta\,\bigl(T_i - a_i(t-1)\bigr)\,a_j(t-1) \quad \text{for all } i \text{ and } j \qquad (3)$$

and

$$W_{ij}(t) = W_{ij}(t-1) + \Delta W_{ij}(t) \quad \text{for all } i \text{ and } j \qquad (4)$$

where $\eta$ is the learning rate, a real number that is usually in the range [0,1], and T is a training or target vector which the machine is trying to learn. Now, depending upon the learning rate $\eta$, it may take several iterations of equations (3) and (4) before the network learns the vector T. A training interval refers to one of the time steps from time t-1 to time t during which the training vector is injected into the network, the W_ij are adjusted, and the new a_i are computed. Note that we may inject the training vector T at the beginning of each training interval by setting

$$a_i'(t-1) = a_i(t-1) + \beta\, T_i \qquad (5)$$

where $a_i'(t-1)$ is to be used in place of $a_i(t-1)$ in equation (3) and $\beta$ is a real number in the range [0,1]. (The injection of the training vector may be accomplished by sensory or dummy neurons that feed into synapses of the real neurons.)
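The following Python sketch combines equations (2) through (5) into one training interval. The learning rate, the injection coefficient, the zero starting weights, and the reading that the injected activities a_i'(t-1) replace the previous activities throughout equation (3) are assumptions made for illustration.

```python
import math

def train_interval(W, a_prev, target, eta=0.1, beta=0.5):
    """One training interval: inject the training vector (equation 5), adjust the
    weights with the delta rule (equations 3 and 4), then compute the new
    activities (equation 2). eta and beta are illustrative values."""
    n = len(a_prev)
    a_inj = [a_prev[i] + beta * target[i] for i in range(n)]        # equation (5)
    for i in range(n):
        for j in range(n):
            W[i][j] += eta * (target[i] - a_inj[i]) * a_inj[j]      # equations (3) and (4)
    return [math.tanh(sum(W[i][j] * a_inj[j] for j in range(n)))    # equation (2)
            for i in range(n)]

# Example: repeatedly present one 3-element training vector, starting from zero weights.
W = [[0.0] * 3 for _ in range(3)]
a = [0.0, 0.0, 0.0]
T = [1.0, -1.0, 1.0]
for _ in range(50):
    a = train_interval(W, a, T)
print(a)  # the activities settle into the sign pattern of T
```

Turning learning off, as described next, amounts to skipping the weight-update loop and applying only equations (5) and (2).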
The response of the network to the injected training vector is computed by equation (2). Once the network has learned to distinguish several different training vectors, we may turn learning off by not performing equations (3) and (4). Injecting part or all of a pattern will then cause the machine, via equations (2) and (5), to attempt to reproduce the nearest training vector. The results are the a_i values produced by equation (2). This type of machine is most similar to the auto-associative kind.

The interest in neural network machines is growing rapidly at this point. Neural network machines appear to be required to solve some of the most pressing problems in artificial intelligence. These machines will not replace Von Neumann machines but will probably be introduced as hybrids. There is plenty of work ahead, however, before the dream is achieved. Work is needed in non-linear mathematics, chip design (there is a three-dimensional interconnect problem that will need to be solved), and neurophysiology, to name a few areas. These machines will be an important part of future computing systems.