Learning Law in Neural Networks
Learning implies that the processing element somehow changes its input/output behavior in response to the environment. For example, if the processing element originally gives an output of +1 in response to a particular input pattern, it might give an output of -1 to that same input pattern after learning takes place: the processing element has changed its mind about what the correct response to that input should be. What does the processing element do to make this change? Its output is computed by applying a transfer function to the weighted input. The net input in this simple case is computed by multiplying the value of each individual input by its corresponding weight and summing the products, or equivalently, by taking the dot product of the input and weight vectors. The processing element then applies the transfer function to this net input to compute the resulting output. Learning therefore changes the output by changing the weights.
activation function - A function by which the new output of the basic unit is derived from a combination of the net inputs and the current state of the unit (the total input).
axon - The part of a nerve cell through which impulses travel away from the cell body; the electrically active part of a nerve cell.
back-propagation - A learning algorithm for a multilayer network in which the weights are modified via the propagation of an error signal "backward" from the outputs to the inputs.
connection - A pathway between processing elements, either positive or negative, that links the processing elements into a network.
dendrite - The branched part of a nerve cell that carries impulses toward the cell body; the electrically passive part of a nerve cell.
learning - The phase in a neural network when new data is introduced into the network, causing the weights on the processing elements to be adjusted.
neuron - The structural and functional unit of the nervous system, consisting of the nerve cell body and all its processes, including an axon and one or more dendrites.
perceptron - A large class of simple neuron-like networks with only an input layer and an output layer. Developed in 1957 by Frank Rosenblatt, this class of neural network has no hidden layer.
summation function - A function that combines the various input activations into a single activation.
synapse - The point of contact between adjacent neurons where nerve impulses are transmitted from one to another.
threshold - A minimum level of excitation energy.
training - A process whereby a network learns to associate an input pattern with the correct answer.
weight - The strength of an input connection, expressed as a real number.
Processing elements receive input via interconnects, and each interconnect has a weight attached to it. The weighted sum of the inputs makes up the net input that updates the processing element. The output value of a processing element is described by a level of excitation that causes interconnects to be either on (i.e. excitatory output) or off (i.e. inhibitory output).
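As a minimal sketch of the processing element described above, the Python below computes the net input as a dot product and applies a sign-style transfer function. The function names `transfer` and `pe_output`, the choice of a sign function, and all numeric values are illustrative assumptions; the point is only that changing the weights, with the input held fixed, can flip the output from +1 to -1.

```python
import numpy as np

def transfer(net: float) -> int:
    # Assumed sign-style transfer function: +1 for non-negative net input, else -1
    return 1 if net >= 0 else -1

def pe_output(x: np.ndarray, w: np.ndarray) -> int:
    # Net input = dot product of the input and weight vectors
    net = float(np.dot(x, w))
    return transfer(net)

x = np.array([1.0, 0.5, -1.0])          # one fixed input pattern (hypothetical values)
w_before = np.array([0.4, 0.2, 0.1])    # weights before learning (hypothetical)
w_after = np.array([-0.3, 0.2, 0.4])    # weights after learning (hypothetical)

print(pe_output(x, w_before))  # net = 0.4, so the output is +1
print(pe_output(x, w_after))   # net = -0.6, so the output is -1
```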
Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. A variant of Hebbian learning, competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and self-organising maps (Kohonen maps).
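Competitive learning is usually implemented with neural networks that contain a hidden layer, commonly known as the competitive layer [2]. Every competitive neuron $i$ is described by a vector of weights $\mathbf{w}_i = (w_{i1}, \ldots, w_{id})^T$, $i = 1, \ldots, M$, and calculates the similarity measure between the input data $\mathbf{x}^n = (x_{n1}, \ldots, x_{nd})^T \in \mathbb{R}^d$ and the weight vector $\mathbf{w}_i$.
For every input vector, the competitive neurons compete with each other to see which one of them is the most similar to that particular input vector. The winner neuron $m$ sets its output $o_m = 1$, and all the other competitive neurons set their output $o_i = 0$, $i = 1, \ldots, M$, $i \neq m$.
Usually, in order to measure similarity, the inverse of the Euclidean distance $\lVert \mathbf{x}^n - \mathbf{w}_i \rVert$ between the input vector $\mathbf{x}^n$ and the weight vector $\mathbf{w}_i$ is used, so the winner is the neuron whose weight vector lies closest to the input.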
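The winner-take-all rule above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not a reference implementation: the function name `competitive_outputs` and the layout of `W` (one row per competitive neuron) are assumptions. Because the similarity is the inverse of the Euclidean distance, picking the largest similarity is the same as picking the smallest distance, which also avoids dividing by zero when an input coincides with a weight vector.

```python
import numpy as np

def competitive_outputs(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Winner-take-all outputs for input x; W holds one weight vector per neuron (row)."""
    distances = np.linalg.norm(W - x, axis=1)   # ||x - w_i|| for every neuron i
    m = int(np.argmin(distances))               # smallest distance = largest similarity
    outputs = np.zeros(W.shape[0], dtype=int)
    outputs[m] = 1                              # o_m = 1, every other o_i = 0
    return outputs

W = np.array([[0.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
print(competitive_outputs(np.array([0.9, 0.8]), W))  # [0 1 0]
```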
Example
Here is a simple competitive learning algorithm to find three clusters within some input data.
1. (Set-up.) Let a set of sensors all feed into three different nodes, so that every node is connected to every sensor. Let the weights that each node gives to its sensors be set randomly between 0.0 and 1.0. Let the output of each node be the sum of all its sensors, each sensor's signal strength being multiplied by its weight.
2. When the net is shown an input, the node with the highest output is deemed the winner. The input is classified as being within the cluster corresponding to that node.
3. The winner updates each of its weights, moving weight from the connections that gave it weaker signals to the connections that gave it stronger signals.
Thus, as more data are received, each node converges on the centre of the cluster that it has come to represent and activates more strongly for inputs in this cluster and more weakly for inputs in other clusters.
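The three steps can be sketched as follows. The update rule in step 3 is an assumption, since the text gives no formula: this sketch uses one common form, $\mathbf{w}_m \leftarrow \mathbf{w}_m + g\,(\mathbf{x}/\sum_i x_i - \mathbf{w}_m)$, which shifts weight from weakly active sensors to strongly active ones while keeping each node's total weight at 1 (so the random initial weights are also normalised to sum to 1). The cluster prototypes, noise level, learning rate `g`, and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_nodes: int, n_sensors: int) -> np.ndarray:
    # Step 1: random weights in [0, 1), each node's row normalised to sum to 1 (assumption)
    W = rng.uniform(0.0, 1.0, size=(n_nodes, n_sensors))
    return W / W.sum(axis=1, keepdims=True)

def winner(W: np.ndarray, x: np.ndarray) -> int:
    # Step 2: each node's output is the weighted sum of its sensors; the highest output wins
    return int(np.argmax(W @ x))

def update(W: np.ndarray, x: np.ndarray, g: float = 0.1) -> int:
    # Step 3: only the winner learns. Moving its weights toward x / sum(x) takes weight
    # from weakly active sensors and gives it to strongly active ones; the row sum stays 1.
    m = winner(W, x)
    W[m] += g * (x / x.sum() - W[m])
    return m

# Three hypothetical clusters over six non-negative sensors
prototypes = np.array([[1, 1, 0, 0, 0, 0],
                       [0, 0, 1, 1, 0, 0],
                       [0, 0, 0, 0, 1, 1]], dtype=float)
W = init_weights(3, 6)
for epoch in range(50):
    for p in rng.permutation(3):
        x = prototypes[p] + rng.uniform(0.0, 0.2, size=6)  # noisy sensor readings
        update(W, x)

print([winner(W, p) for p in prototypes])  # node assigned to each prototype
```

With an unlucky random initialisation, basic competitive learning can leave a node that never wins (a "dead" node), in which case two clusters share one node, so the printed indices are not guaranteed to be distinct.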
LEARNING TYPES
Supervised or active learning - Learning with an external teacher or supervisor who presents a training set to the network.
Unsupervised or self-organized learning - Learning that does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories. Unsupervised learning can be used in real time.
Reinforcement learning - Learning guided by a feedback signal that evaluates the output: if the output matches the desired result, the feedback is +1; otherwise it is 0 or -1.
Unsupervised Learning Types: Adaptive Resonance Theory (ART)
Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. ART networks consist of an input layer and an output layer, and they perform completely unsupervised learning. Main characteristics (Carpenter and Grossberg, 1987):
On-line clustering algorithm
Recurrent ANN
Competitive output layer
Used in data clustering applications
Addresses the stability-plasticity dilemma:
Stability: system behaviour doesn't change after irrelevant events.
Plasticity: the system adapts its behaviour according to significant events.
Dilemma: how to achieve stability without rigidity and plasticity without chaos?
Ongoing learning capability
Preservation of learned knowledge
ART Architecture
[Figure: ART architecture. Labels: input nodes; output nodes; bottom-up weights $b_{ij}$; top-down weights $t_{ij}$ (store the class template); vigilance test; input normalisation; forward matching; long-term memory (the ANN weights).]
ART Types
ART1: unsupervised clustering of binary input vectors.
ART2: unsupervised clustering of real-valued input vectors.
ART3: incorporates "chemical transmitters" to control the search process in a hierarchical ART structure.
ARTMAP: supervised version of ART that can learn arbitrary mappings of binary patterns.
Fuzzy ART: synthesis of ART and fuzzy logic.
Fuzzy ARTMAP: supervised fuzzy ART.
dART and dARTMAP: distributed code representations in the F2 layer (an extension of the winner-take-all approach).
Gaussian ARTMAP
The basic ART structure consists of:
an input processing layer (F1), made up of an input portion (F1(a)) and an interface portion (F1(b))
the cluster units (the F2 layer)
a mechanism to control the degree of similarity of patterns placed on the same cluster (a reset mechanism)
weighted bottom-up connections between the F1 and F2 layers
weighted top-down connections between the F2 and F1 layers
Reset Module
Fixed connection weights
Implements the vigilance test
Excitatory connection from F1(b)
Inhibitory connection from F1(a)
Output of the reset module is inhibitory to the output layer
Disables the firing output node if its match with the pattern is not close enough
The reset signal lasts as long as the pattern is present
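Gain module
Fixed connection weights
Controls the activation cycle of the input layer
Excitatory connection from the input lines
Inhibitory connection from the output layer
Output of the gain module is excitatory to the input layer
Implements the 2/3 rule for the input layer: an input-layer node becomes active only when at least two of its three inputs (the input signal, the gain signal, and the top-down signal from the output layer) are active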
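To make the vigilance test and reset concrete, here is a simplified ART1 sketch with fast learning, written in the style of common textbook presentations rather than as a model of every module above. The gain module and the 2/3 rule are not simulated explicitly; the F1(b) activity is computed directly as the logical AND of the input and the winner's top-down template. The class name `ART1`, the parameter names `rho` (vigilance) and `L` (bottom-up learning constant), the initial weight values, and the fixed number of cluster units are assumptions of this sketch.

```python
import numpy as np

class ART1:
    """Simplified ART1 with fast learning (illustrative sketch)."""

    def __init__(self, n_inputs: int, max_clusters: int, rho: float = 0.7, L: float = 2.0):
        self.rho = rho    # vigilance parameter, 0 < rho <= 1
        self.L = L        # bottom-up learning constant, L > 1
        # Bottom-up weights b_ij (F1 -> F2): small, equal initial values
        self.b = np.full((max_clusters, n_inputs), 1.0 / (1.0 + n_inputs))
        # Top-down weights t_ji (F2 -> F1): all ones, so an unused unit accepts any pattern
        self.t = np.ones((max_clusters, n_inputs))

    def train_pattern(self, s: np.ndarray) -> int:
        """Present one binary pattern; return the index of the cluster unit that learns it."""
        s = np.asarray(s, dtype=float)
        norm_s = s.sum()
        if norm_s == 0:
            raise ValueError("ART1 needs at least one active input")
        inhibited = np.zeros(len(self.b), dtype=bool)
        while not inhibited.all():
            y = self.b @ s                    # bottom-up activation of each F2 unit
            y[inhibited] = -np.inf            # units that were reset cannot compete again
            J = int(np.argmax(y))             # winner-take-all in F2
            x = s * self.t[J]                 # F1(b) activity: s AND top-down template
            if x.sum() / norm_s >= self.rho:  # vigilance test passed: resonance
                self.b[J] = self.L * x / (self.L - 1.0 + x.sum())
                self.t[J] = x                 # fast learning: template becomes s AND t_J
                return J
            inhibited[J] = True               # vigilance test failed: reset unit J
        raise RuntimeError("no cluster unit can accept this pattern")

patterns = np.array([[1, 1, 0, 0],
                     [1, 1, 1, 0],
                     [0, 0, 1, 1],
                     [0, 0, 0, 1]])
net = ART1(n_inputs=4, max_clusters=3, rho=0.7)
labels = [net.train_pattern(p) for p in patterns]
print(labels)  # [0, 1, 2, 2]
```

Here the second pattern first wins unit 0, but only 2 of its 3 active bits match that unit's template (2/3 < 0.7), so the reset inhibits unit 0 and the pattern is placed on an unused unit; raising or lowering `rho` changes how finely the patterns are split.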
Fast Learning
Weights reach equilibrium in each learning trial.
The resulting weights have some of the same characteristics as the weights found by ART1.
More appropriate for data in which the primary information is contained in the pattern of components that are small or large.
Slow Learning
Only one weight update iteration is performed on each learning trial.
Needs more epochs than fast learning.
More appropriate for data in which the relative size of the nonzero components is important.
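The contrast between the two regimes can be illustrated with a toy update rule. This is a deliberately generic sketch, not ART2's actual weight equations: it assumes a simple rule $w \leftarrow w + \beta\,(t - w)$ that pulls the weights toward a target vector. In fast mode one trial repeats the update until the weights stop changing (equilibrium); in slow mode one trial performs a single update, so more trials are needed to get close to the same weights. The function name `learn_trial` and all parameter values are hypothetical.

```python
import numpy as np

def learn_trial(w: np.ndarray, target: np.ndarray, beta: float, fast: bool,
                tol: float = 1e-6, max_iter: int = 10_000) -> np.ndarray:
    """One learning trial of the toy rule w <- w + beta * (target - w)."""
    w = w.copy()
    n_iter = max_iter if fast else 1      # slow learning: exactly one update iteration
    for _ in range(n_iter):
        step = beta * (target - w)
        w += step
        if np.abs(step).max() < tol:      # fast learning: stop at (near) equilibrium
            break
    return w

target = np.array([0.8, 0.1, 0.0])

w_fast = learn_trial(np.zeros(3), target, beta=0.5, fast=True)   # a single trial

w_slow = np.zeros(3)
trials = 0
while not np.allclose(w_slow, target, atol=1e-3):
    w_slow = learn_trial(w_slow, target, beta=0.5, fast=False)
    trials += 1

print(np.round(w_fast, 4), trials)
```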
ART Applications
Image segmentation
Character recognition
Data mining
Data set partitioning
Detection of emerging clusters
Hebbian learning
In 1949, Donald Hebb proposed one of the key ideas in biological learning, commonly known as Hebb's Law. Hebb's Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i. Hebb's Law can be represented in the form of two rules:
1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.
Hebb's Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.
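Hebb's two rules are often written as the product rule $\Delta w_{ij} = \alpha\, x_i\, y_j$, where $x_i$ is the activity of the input (presynaptic) neuron, $y_j$ that of the output (postsynaptic) neuron, and $\alpha$ a learning rate. The sketch below assumes bipolar activities (+1 active, -1 inactive), under which the product is positive when the two neurons agree and negative when they disagree, reproducing rules 1 and 2. One side effect of this encoding is that two neurons that are both inactive also strengthen their connection. The function name and values are illustrative.

```python
import numpy as np

def hebbian_update(W: np.ndarray, x: np.ndarray, y: np.ndarray, alpha: float = 0.1) -> np.ndarray:
    """Return W + alpha * x_i * y_j for every connection i -> j (bipolar activities assumed)."""
    return W + alpha * np.outer(x, y)

W = np.zeros((2, 1))       # two input neurons, one output neuron
x = np.array([1, -1])      # input neuron 0 active, input neuron 1 inactive
y = np.array([1])          # output neuron active
W = hebbian_update(W, x, y)
print(W.ravel())           # [ 0.1 -0.1]: synchronous pair strengthened, asynchronous weakened
```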
Competitive learning
In competitive learning, neurons compete among themselves to be activated. Whereas in Hebbian learning several output neurons can be activated simultaneously, in competitive learning only a single output neuron is active at any time. The output neuron that wins the competition is called the winner-takes-all neuron. The basic idea of competitive learning was introduced in the early 1970s. In the late 1980s, Teuvo Kohonen introduced a special class of artificial neural networks called self-organising feature maps. These maps are based on competitive learning.
SOM
What is a self-organising feature map? Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses. The cortex includes areas that are responsible for different human activities (motor, visual, auditory, somatosensory, etc.), and associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organising computational map in the human brain.
The Kohonen network
The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer. Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
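A one-dimensional Kohonen layer trained with a shrinking neighbourhood can be sketched as follows. Assumptions of this sketch: the neurons sit on a chain, the neighbourhood is a Gaussian of width `sigma` around the winner, and both `sigma` and the learning rate decrease linearly over training; the function name `train_som` and all default values are hypothetical. Every neuron moves toward the input in proportion to its neighbourhood value, so early updates drag large parts of the map while late updates are nearly local to the winner.

```python
import numpy as np

def train_som(data: np.ndarray, n_neurons: int = 10, epochs: int = 100,
              lr0: float = 0.5, sigma0: float = 3.0, seed: int = 0) -> np.ndarray:
    """Train a 1-D Kohonen layer; returns one weight vector per neuron (one row each)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(data.min(), data.max(), size=(n_neurons, data.shape[1]))
    positions = np.arange(n_neurons)                # neuron coordinates along the chain
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)     # neighbourhood shrinks, with a floor
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(W - x, axis=1))   # closest weight vector
            # Gaussian neighbourhood around the winner's position on the chain
            h = np.exp(-((positions - winner) ** 2) / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (x - W)          # neighbours move toward the input too
    return W

data = np.random.default_rng(1).uniform(0.0, 1.0, size=(200, 2))
W = train_som(data)
print(W.shape)  # (10, 2)
```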
The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition. The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This is achieved by the use of a Mexican hat function which describes synaptic weights between neurons in the Kohonen layer.
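The Mexican hat profile can be sketched with the Ricker wavelet, $(1 - r^2)\,e^{-r^2/2}$ with $r = d/\sigma$, one common function with this shape (it is also often modelled as a difference of two Gaussians). This choice, the name `mexican_hat`, and the width `sigma` are assumptions. The value is positive (excitatory) for neurons closer than `sigma` to the winner, negative (inhibitory) at intermediate distances, and fades to zero far away.

```python
import numpy as np

def mexican_hat(d, sigma: float = 1.0) -> np.ndarray:
    """Lateral connection strength versus distance d from the winning neuron."""
    r2 = (np.asarray(d, dtype=float) / sigma) ** 2
    return (1.0 - r2) * np.exp(-r2 / 2.0)

distances = np.arange(0, 5)                   # 0, 1, ..., 4 neurons away from the winner
weights = mexican_hat(distances, sigma=1.5)
print(np.round(weights, 3))
```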
PERCEPTRON
In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: it assigns an input to one of two classes. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector describing a given input. The learning algorithm for perceptrons is an online algorithm, in that it processes elements in the training set one at a time. The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt.
The perceptron is a binary classifier which maps its input $\mathbf{x}$ (a real-valued vector) to an output value $f(\mathbf{x})$ (a single binary value):
$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $\mathbf{w}$ is a vector of real-valued weights, $\mathbf{w} \cdot \mathbf{x}$ is the dot product (which here computes a weighted sum), and $b$ is the bias, a constant term that does not depend on any input value. The value of $f(\mathbf{x})$ (0 or 1) is used to classify $\mathbf{x}$ as either a positive or a negative instance, in the case of a binary classification problem. If $b$ is negative, then the weighted combination of inputs must produce a positive value greater than $|b|$ in order to push the classifier neuron over the 0 threshold. Spatially, the bias alters the position (though not the orientation) of the decision boundary.
The perceptron learning algorithm does not terminate if the training set is not linearly separable: learning will never reach a point where all vectors are classified properly. The most famous example of the perceptron's inability to solve problems with linearly nonseparable vectors is the Boolean exclusive-or (XOR) problem. In the context of artificial neural networks, a perceptron is an artificial neuron using the Heaviside step function as the activation function. The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from a multilayer perceptron, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest feedforward neural network.
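A minimal sketch of the perceptron and its learning rule, assuming the standard online update $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(d - f(\mathbf{x}))\,\mathbf{x}$ and $b \leftarrow b + \eta\,(d - f(\mathbf{x}))$, where $d$ is the desired output and $\eta$ a learning rate. The function names, `eta`, and the epoch cap `max_epochs` are assumptions; the cap is needed precisely because the algorithm never terminates on its own when the training set is not linearly separable. Trained on the Boolean AND function it stops with every pattern correct, while on XOR it runs out of epochs.

```python
import numpy as np

def predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    """Heaviside step: f(x) = 1 if w . x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(X: np.ndarray, d: np.ndarray, eta: float = 0.1, max_epochs: int = 100):
    """Online perceptron rule; returns (w, b, converged)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            err = target - predict(w, b, x)
            if err != 0:
                w += eta * err * x
                b += eta * err
                errors += 1
        if errors == 0:          # every pattern classified correctly: stop
            return w, b, True
    return w, b, False           # gave up: the set may not be linearly separable

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(train_perceptron(X, np.array([0, 0, 0, 1]))[2])  # AND, linearly separable: True
print(train_perceptron(X, np.array([0, 1, 1, 0]))[2])  # XOR, not separable: False
```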
ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer neural network and the name of the physical device that implemented this network [1]. It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch-Pitts neuron and consists of a weight, a bias and a summation function. The difference between Adaline and the standard (McCulloch-Pitts) perceptron is that in the learning phase, the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function, and the function's output is used for adjusting the weights. There also exists an extension known as Madaline.
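The difference from the perceptron lies in which quantity the error is computed from. Below is a minimal sketch of the ADALINE (Widrow-Hoff, or LMS) rule, $\mathbf{w} \leftarrow \mathbf{w} + \eta\,(d - net)\,\mathbf{x}$, where the error uses the net input itself rather than the thresholded output; a sign function is applied only afterwards, for classification. The bipolar AND data set, the learning rate `eta`, the epoch count, and the function name are illustrative assumptions.

```python
import numpy as np

def train_adaline(X: np.ndarray, d: np.ndarray, eta: float = 0.05, epochs: int = 200):
    """Online LMS (Widrow-Hoff) training of a single ADALINE unit."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, d):
            net = np.dot(w, x) + b      # weighted sum of the inputs (the net)
            err = target - net          # error on the net itself, before any thresholding
            w += eta * err * x
            b += eta * err
    return w, b

# Bipolar AND: inputs and targets in {-1, +1}
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, -1, -1, 1], dtype=float)
w, b = train_adaline(X, d)
print(w, b)                  # close to the least-squares solution w = (0.5, 0.5), b = -0.5
print(np.sign(X @ w + b))    # thresholding the net only after training classifies the inputs
```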
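Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
$\mathbf{x}$ is the input vector
$\mathbf{w}$ is the weight vector
$n$ is the number of inputs
$\theta$ is some constant
$y$ is the output
then the output is
$$y = \sum_{j=1}^{n} x_j w_j + \theta$$
If we further assume that $x_0 = 1$ and $w_0 = \theta$, the output reduces to the dot product of $\mathbf{x}$ and $\mathbf{w}$:
$$y = \sum_{j=0}^{n} x_j w_j$$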
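A quick numerical check of this bias-folding step, with made-up values for `x`, `w` and `theta`: prepending $x_0 = 1$ to the input and $w_0 = \theta$ to the weights turns the sum-plus-constant into a single dot product.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])      # input vector (illustrative values)
w = np.array([0.2, 0.4, -0.1])      # weight vector (illustrative values)
theta = 0.3                          # the constant term (illustrative value)

y_sum = float(np.sum(x * w)) + theta          # y = sum_{j=1}^{n} x_j w_j + theta
x_aug = np.concatenate(([1.0], x))            # x_0 = 1
w_aug = np.concatenate(([theta], w))          # w_0 = theta
y_dot = float(np.dot(x_aug, w_aug))           # y = sum_{j=0}^{n} x_j w_j
print(y_sum, y_dot)
```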