DL Unit-II
Unsupervised learning is a type of machine learning that looks for previously undetected
patterns in a data set with no pre-existing labels and with a minimum of human supervision. In
contrast to supervised learning, which usually makes use of human-labeled data, unsupervised
learning (also known as self-organization) allows for modeling of probability densities over
inputs. It forms one of the three main categories of machine learning, along
with supervised and reinforcement learning. Semi-supervised learning, a related variant,
makes use of both supervised and unsupervised techniques.
Two of the main methods used in unsupervised learning are principal component analysis
and cluster analysis. Cluster analysis is used in unsupervised learning to group, or segment,
datasets with shared attributes in order to extrapolate algorithmic relationships. Cluster
analysis is a branch of machine learning that groups data that has not been labelled,
classified, or categorized. Instead of responding to feedback, cluster analysis identifies
commonalities in the data and reacts based on the presence or absence of such commonalities
in each new piece of data. This approach also helps detect anomalous data points that do not
fit into any group.
1. Maxnet
Maxnet Algorithm
Step 1: Initialize the activations with the given external input signals, aj(0) = sj for
j = 1 to n, and fix the weights (with 0 < є < 1/n):
wij = 1 for i = j
wij = −є for i ≠ j
Step 2: While more than one node has a nonzero output, do steps 3 to 5.
Step 3: Update the activation (output) at each node, for j = 1, 2, ..., n:
aj(t+1) = f [ aj(t) − є ∑i≠j ai(t) ]
Step 4: Save the activations for use in the next iteration.
Step 5: Test for the stopping condition: if more than one node has a nonzero output,
go to step 3; else stop.
Example: A Maxnet with three nodes has inhibitory weights є = 0.25. The net is initially
activated by the input signals [0.1 0.3 0.9]. The activation function of the neurons is the
ramp function:
f(x) = x for x > 0; f(x) = 0 otherwise
Solution:
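As a check on the arithmetic, here is a minimal Python sketch of the Maxnet recall loop for
this example, assuming the ramp activation above (the function name is illustrative):

def maxnet(a, eps, max_iter=100):
    # Repeat a_j <- f(a_j - eps * sum of the other activations),
    # with the ramp activation f(x) = max(x, 0), until at most one
    # node remains active.
    a = list(a)
    for _ in range(max_iter):
        total = sum(a)
        a = [max(aj - eps * (total - aj), 0.0) for aj in a]  # simultaneous update
        if sum(1 for aj in a if aj > 0) <= 1:
            break
    return a

print(maxnet([0.1, 0.3, 0.9], eps=0.25))   # -> approximately [0.0, 0.0, 0.7875]

The competition settles after two updates: the third node (initial activation 0.9) is the
winner, with final activation 0.7875.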
2. Hamming Net
The Hamming net selects the stored exemplar vector that is closest, in Hamming distance, to a
given bipolar input vector; a Maxnet is used as a subnet to pick the winning class.
[Figure: Hamming net architecture. Input units X1 ... Xn feed net-input units Y1 (Class 1)
and Y2 (Class 2); a Maxnet resolves the competition between them.]
The weights are set from the stored exemplars:
wij = ei(j)/2
where ei(j) is the i-th component of the j-th exemplar vector.
Terminology:
M: number of exemplar vectors
N: number of input nodes (input vector components)
E(j): j-th exemplar vector
Algorithm:
Step 1: For each input vector x, compute the net input to each category unit yj:
y_inj = N/2 + ∑i xi wij
(this equals N minus the Hamming distance between x and the exemplar e(j)).
Step 2: Initialize the Maxnet activations with these net inputs: aj(0) = y_inj.
Step 3: Maxnet iterations are used to find the best-match exemplar.
Example: Given the exemplar vectors e(1) = (−1 1 1 −1) and
e(2) = (1 −1 1 −1), use a Hamming net to find the exemplar vector closest to each of the
bipolar input patterns
(1 1 −1 −1), (1 −1 −1 −1), (−1 −1 −1 1) and (−1 −1 1 1).
Solution:
The weight matrix, with column j holding e(j)/2, is:
W = [ −0.5   0.5
       0.5  −0.5
       0.5   0.5
      −0.5  −0.5 ]
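As an illustrative check (reusing the maxnet() sketch from the Maxnet section), each unit's
net input y_inj = N/2 + ∑i xi wij counts the components on which the input agrees with
exemplar e(j), and the Maxnet then suppresses the smaller value:

n = 4
exemplars = [(-1, 1, 1, -1), (1, -1, 1, -1)]
W = [[e[i] / 2 for e in exemplars] for i in range(n)]   # w_ij = e_i(j)/2

for x in [(1, 1, -1, -1), (1, -1, -1, -1), (-1, -1, -1, 1), (-1, -1, 1, 1)]:
    # y_in(j) = n/2 + sum_i x_i * w_ij = number of agreements with e(j)
    y_in = [n / 2 + sum(x[i] * W[i][j] for i in range(n)) for j in range(2)]
    print(x, "net inputs:", y_in, "-> Maxnet:", maxnet(y_in, eps=0.25))

For (1, −1, −1, −1) the net inputs are [1.0, 3.0], so the Maxnet leaves only unit 2 active
and the input is assigned to e(2). An input that agrees with both exemplars equally produces
tied net inputs, which the Maxnet cannot break.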
3. Kohonen Self-Organizing Feature Maps (KSOM)
Rectangular Grid Topology
This topology has 24 nodes in the distance-2 grid, 16 nodes in the distance-1 grid, and 8 nodes
in the distance-0 grid; each successive rectangular grid thus adds 8 nodes. The winning unit is
indicated by #.
Hexagonal Grid Topology
This topology has 18 nodes in the distance-2 grid, 12 nodes in the distance-1 grid, and 6 nodes
in the distance-0 grid; each successive hexagonal grid thus adds 6 nodes. The winning unit is
indicated by #.
Architecture
The architecture of KSOM is similar to that of the competitive network. With the help of
neighborhood schemes, discussed earlier, the training can take place over the extended region of
the network.
Algorithm for training
Step 1 − Initialize the weights, the learning rate α and the neighborhood topological scheme.
Step 2 − Perform steps 3-9 while the stopping condition is false.
Step 3 − Perform steps 4-6 for every input vector x.
Step 4 − Calculate Square of Euclidean Distance for j = 1 to m
D(j) = ∑i=1..n (xi − wij)²
Step 5 − Obtain the winning unit J for which D(J) is minimum.
Step 6 − Calculate the new weights of the winning unit J (and, with a nonzero radius, of all
units j in its topological neighborhood) by the following relation −
wij(new) = wij(old) + α[xi − wij(old)]
Step 7 − Update the learning rate α, for example α(t+1) = 0.5 α(t).
Step 8 − Reduce the radius of the topological neighborhood scheme.
Step 9 − Test for the stopping condition of the network.
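For concreteness, here is a minimal NumPy sketch of steps 4 to 6 using a radius-0 neighborhood
(the winner only); the function name and the training data are hypothetical:

import numpy as np

def ksom_step(x, W, alpha):
    # Step 4: squared Euclidean distance from x to every cluster unit j
    D = ((x[:, None] - W) ** 2).sum(axis=0)
    # Step 5: the winning unit J is the one at minimum distance
    J = int(np.argmin(D))
    # Step 6: move the winner's weight vector toward the input
    W[:, J] += alpha * (x - W[:, J])
    return J

# Hypothetical usage: n = 4 input components, m = 2 cluster units
X = np.array([[1., 1., 0., 0.], [0., 0., 0., 1.],
              [1., 0., 0., 0.], [0., 0., 1., 1.]])
W = np.random.default_rng(0).random((4, 2))
alpha = 0.6
for epoch in range(10):
    for x in X:
        ksom_step(x, W, alpha)
    alpha *= 0.5          # Step 7: decay the learning rate

With a nonzero radius, step 6 would apply the same update to every unit in the current
neighborhood of J, and step 8 would shrink that radius over time.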
4. Learning Vector Quantization (LVQ)
Learning Vector Quantization applies competitive learning in a supervised manner: each output
unit represents a known class.
Architecture:
The following figure shows the architecture of LVQ, which is quite similar to the architecture
of KSOM. As we can see, there are “n” input units and “m” output units. The layers are fully
interconnected, with a weight on each connection.
Parameters Used:
The following parameters are used in the LVQ training process:
• x = training vector (x1,...,xi,...,xn)
• T = class for training vector x
• wj = weight vector for jth output unit
• Cj = class associated with the jth output unit
Training Algorithm:
Step 1 − Initialize the reference vectors wj; for example, the first “m” training vectors may
be taken as the initial reference vectors, with their known classes as Cj.
Step 2 − Initialize the learning rate α.
Step 3 − Perform steps 4-9 while the stopping condition is false.
Step 4 − Perform steps 5-7 for every training input vector x.
Step 5 − Calculate the square of the Euclidean distance for j = 1 to m:
D(j) = ∑i=1..n (xi − wij)²
Step 6 − Obtain the winning unit J for which D(J) is minimum.
Step 7 − Calculate the new weight of the winning unit J by the following relation −
if T = CJ then wJ(new) = wJ(old) + α[x − wJ(old)]
if T ≠ CJ then wJ(new) = wJ(old) − α[x − wJ(old)]
Step 8 − Reduce the learning rate α.
Step 9 − Test for the stopping condition. It may be, for example, that a maximum number of
epochs has been reached or that the learning rate has been reduced to a negligible value.
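The supervised update of step 7 takes only a few lines of NumPy; the names and example data
below are hypothetical:

import numpy as np

def lvq_step(x, t, W, C, alpha):
    # Steps 5-6: winner J minimizes the squared Euclidean distance
    D = ((W - x) ** 2).sum(axis=1)
    J = int(np.argmin(D))
    # Step 7: pull the winner toward x if the classes agree, push it away otherwise
    if t == C[J]:
        W[J] += alpha * (x - W[J])
    else:
        W[J] -= alpha * (x - W[J])

# Hypothetical usage: two reference vectors with known classes 0 and 1
W = np.array([[1., 1., 0., 0.], [0., 0., 0., 1.]])
C = [0, 1]
lvq_step(np.array([0., 1., 1., 0.]), 0, W, C, alpha=0.25)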
5. Counterpropagation Network (CPN)
The counterpropagation network (CPN) was proposed by Hecht-Nielsen in 1987. It is a
multilayer network built from a combination of input, clustering, and output layers.
Applications of counterpropagation nets include data compression, function approximation, and
pattern association. The counterpropagation network is basically constructed from an
instar-outstar model. This model is a three-layer neural network that performs input-output
data mapping, producing an output vector y in response to an input vector x, on the basis of
competitive learning. The three layers in an instar-outstar model are the input layer, the
hidden (competitive) layer, and the output layer.
There are two stages involved in the training process of a counterpropagation net. The
input vectors are clustered in the first stage. In the second stage of training, the weights
from the cluster-layer units to the output units are tuned to obtain the desired response.
There are two types of counterpropagation net:
1. Full counterpropagation network
2. Forward-only counterpropagation network
Training Algorithm for Full CPN:
For k = 1 to m, uJk(new) = uJk(old) + a[yk − uJk(old)]
Step 14: Reduce the learning rates a and b.
a(t+1)=0.5a(t); b(t+1)=0.5b(t)
Step 15: Test stopping condition for phase II training.
Training Algorithm for Forward-only CPN:
Step 0: Initialize the weights and learning rates.
Step 1: Perform steps 2 to 7 while the stopping condition for phase I training is false.
Step 2: Perform steps 3 to 5 for each training input X.
Step 3: Set the X-input layer activations to vector X.
Step 4: Compute the winning cluster unit J. If dot product method is used, find the cluster unit zJ
with the largest net input:
zinj = ∑i xi vij
If Euclidean distance is used, find the cluster unit zJ whose squared distance from the input
pattern is smallest:
D(j) = ∑i (xi − vij)²
If there exists a tie in the selection of winner unit, the unit with the smallest index is chosen as
the winner.
Step 5: Perform weight updation for unit zJ. For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
Step 6: Reduce learning rate α:
α (t+1)=0.5α(t)
Step 7: Test the stopping condition for phase I training.
Step 8: Perform steps 9 to 15 when the stopping condition for phase II training is false.
Step 9: Perform step 10 to 13 for each training input pair x:y.
Step 10: Set X-input layer activations to vector X. Set Y-output layer activation to vector Y.
Step 11: Find the winning cluster unit J.
Step 12: Update the weights into unit zJ. For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
Step 13: Update the weights from unit zJ to the output units.
For k=1 to m, wJk(new)=wJk(old) + β[yk-wJk(old)]
Step 14: Reduce learning rate β,
β(t+1)=0.5β(t)
Step 15: Test the stopping condition for phase II training.
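The two phases of the forward-only CPN can be sketched as follows, assuming Euclidean-distance
winner selection; all names are illustrative, not a definitive implementation:

import numpy as np

def train_forward_only_cpn(X, Y, m, alpha=0.5, beta=0.5, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    V = rng.random((X.shape[1], m))    # input-to-cluster weights v_iJ
    W = rng.random((m, Y.shape[1]))    # cluster-to-output weights w_Jk

    for _ in range(epochs):            # Phase I: cluster the inputs
        for x in X:
            J = int(np.argmin(((x[:, None] - V) ** 2).sum(axis=0)))
            V[:, J] += alpha * (x - V[:, J])          # step 5
        alpha *= 0.5                   # step 6: alpha(t+1) = 0.5 alpha(t)

    for _ in range(epochs):            # Phase II: learn the desired outputs
        for x, y in zip(X, Y):
            J = int(np.argmin(((x[:, None] - V) ** 2).sum(axis=0)))
            V[:, J] += alpha * (x - V[:, J])          # step 12
            W[J] += beta * (y - W[J])                 # step 13
        beta *= 0.5                    # step 14: beta(t+1) = 0.5 beta(t)
    return V, W

After training, recall passes an input through the winning cluster unit and reads out that
unit's row of W as the network output.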
Types of Adaptive Resonance Theory (ART)
Adaptive resonance theory describes a family of neural networks that are self-organizing and
competitive. ART networks can be of both types, unsupervised (ART1, ART2, ART3, etc.) or
supervised (ARTMAP). Generally, the supervised algorithms are named with the suffix
“MAP”.
But the basic ART model is unsupervised in nature and consists of:
• The F1 layer, which accepts the inputs, performs some processing, and transfers them to the
F2 layer unit that best matches the classification factor. There exist two sets of weighted
interconnections for controlling the degree of similarity between the units in the F1 and F2
layers.
• The F2 layer, which is a competitive layer. The cluster unit with the largest net input
becomes the candidate to learn the input pattern first, and the remaining F2 units are ignored.
• The reset unit, which decides whether or not the cluster unit is allowed to learn the input
pattern, depending on how similar its top-down weight vector is to the input vector. This
comparison is called the vigilance test.
Thus we can say that the vigilance parameter helps to incorporate new memories or new
information. Higher vigilance produces more detailed memories; lower vigilance produces more
general memories.
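To make the vigilance test concrete, here is an illustrative ART1-style check for binary
vectors: the candidate cluster passes if the match ratio |x AND w| / |x| reaches the vigilance
parameter ρ, and is reset otherwise. The function name and data are hypothetical:

def vigilance_test(x, w_topdown, rho):
    # Match ratio: how much of the input is covered by the top-down weights
    matched = sum(xi & wi for xi, wi in zip(x, w_topdown))
    return matched / sum(x) >= rho

# High rho demands a close match (detailed memories); low rho is lenient.
print(vigilance_test([1, 1, 0, 1], [1, 0, 0, 1], rho=0.9))   # False (2/3 < 0.9)
print(vigilance_test([1, 1, 0, 1], [1, 0, 0, 1], rho=0.5))   # True  (2/3 >= 0.5)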
Generally, two types of learning exist: slow learning and fast learning. In fast learning, the
weight update during resonance occurs rapidly; it is used in ART1. In slow learning, the weight
change occurs slowly relative to the duration of a learning trial; it is used in ART2.
Advantages of ART:
• It exhibits stability and is not disturbed by a wide variety of inputs provided to the
network.
• It can be integrated and used with various other techniques to give better results.
• It can be used for various fields such as mobile robot control, face recognition, land cover
classification, target recognition, medical diagnosis, signature verification, clustering web
users, etc.
• It has advantages over standard competitive learning networks: competitive learning lacks
the capability to add new clusters when deemed necessary.
Limitations of ART:
Some ART networks are inconsistent (such as Fuzzy ART and ART1), as their results depend upon
the order in which the training data are presented, or upon the learning rate.
****