
UNIT – II

UNSUPERVISED LEARNING NETWORK

Unsupervised learning is a type of machine learning that looks for previously undetected
patterns in a data set with no pre-existing labels and a minimum of human supervision. In
contrast to supervised learning, which usually makes use of human-labeled data, unsupervised
learning (also known as self-organization) allows for modeling of probability densities over
inputs. It forms one of the three main categories of machine learning, along
with supervised and reinforcement learning. Semi-supervised learning, a related variant,
makes use of both supervised and unsupervised techniques.
Two of the main methods used in unsupervised learning are principal component analysis
and cluster analysis. Cluster analysis is used in unsupervised learning to group, or segment,
datasets with shared attributes in order to extrapolate algorithmic relationships. It is
a branch of machine learning that groups data that has not been labelled,
classified or categorized. Instead of responding to feedback, cluster analysis identifies
commonalities in the data and reacts based on the presence or absence of such commonalities
in each new piece of data. This approach also helps detect anomalous data points that do not
fit into any group.

➢ Fixed weight competitive nets


These are additional structures included in networks with multiple outputs in order
to force the output layer to decide which single neuron will fire.
This mechanism is called competition. When competition is complete, only one
output neuron has a nonzero output. The fixed (symmetric) weight competitive nets are
the Maxnet and the Hamming net.

1- Maxnet

• Maxnet is based on the winner-take-all policy.
• The n nodes of the Maxnet are fully connected.
• There is no need to train the network, since the weights are fixed.
• The Maxnet operates as a recurrent recall network.
• Activation function:

f(net) = net, if net > 0
         0,   otherwise

where the mutual inhibition weight ε is usually a positive number less than 1.

[Figure: Maxnet architecture — n fully interconnected nodes with mutual inhibitory weights −ε.]

Maxnet Algorithm

Step 1: Set activations and weights:

aj(0) = the starting input value to node Aj

wij = 1,  if i = j
      −ε, if i ≠ j

Step 2: If more than one node has a nonzero output, do steps 3 to 5.
Step 3: Update the activation (output) at each node, for j = 1, 2, ..., n:

aj(t+1) = f [ aj(t) − ε ∑i≠j ai(t) ]

where ε < 1/m and m is the number of competing neurons.

Step 4: Save activations for use in the next iteration.


aj (t+1) → aj (t)

Step 5: Test for the stopping condition: if more than one node has a nonzero output,
go to step 3; else stop.
Example: A Maxnet has three nodes with mutual inhibitory weights of 0.25 (ε = 0.25). The net
is initially activated by the input signals [0.1 0.3 0.9]. The activation function of the neurons is:

f(net) = net, if net > 0
         0,   otherwise

Find the final winning neuron.

Solution:

First iteration: The net values are:


a1 (1) = f [0.1 - 0.25(0.3+0.9)] = 0
a2 (1) = f [0.3 - 0.25(0.1+0.9)] = 0.05
a3 (1) = f [0.9 - 0.25(0.1+0.3)] = 0.8

Second iteration:

a1 (2) = f [0 − 0.25(0.05 + 0.8)] = 0
a2 (2) = f [0.05 − 0.25(0 + 0.8)] = 0
a3 (2) = f [0.8 − 0.25(0 + 0.05)] = 0.7875

Then the 3rd neuron is the winner.
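As a check, the worked example above can be reproduced with a short sketch (illustrative code; the function name and stopping rule follow the algorithm steps, not any library):

```python
# Minimal Maxnet sketch: repeatedly apply a_j(t+1) = f(a_j(t) - eps * sum_{i != j} a_i(t)),
# with f(net) = max(net, 0), until at most one activation remains nonzero.
def maxnet(a, eps):
    a = list(a)
    while sum(x > 0 for x in a) > 1:
        # sum(a) - x is the sum of the *other* nodes' activations
        a = [max(0.0, x - eps * (sum(a) - x)) for x in a]
    return a

print(maxnet([0.1, 0.3, 0.9], 0.25))  # third neuron wins with activation ~0.7875
```

Note that the new list is built from the old activations, so all nodes are updated simultaneously, as the algorithm requires.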

2. Hamming Net:

The Hamming net is a maximum likelihood classifier net. It is used to determine the
exemplar vector that is most similar to an input vector. The measure of similarity is
obtained from the formula:

x·y = a − D = 2a − n, since a + D = n

where D is the Hamming distance (the number of components in which the vectors differ),
a is the number of components in which the vectors agree, and n is the number of
components in each vector.

When the weight vector of a class unit is set to one half of the exemplar vector,
and the bias to n/2, the net finds the unit with the closest exemplar by finding the unit
with the maximum net input. A Maxnet is used for this purpose.
[Figure: Hamming net — the input units feed class units Y1 and Y2; their net inputs net1 and net2 go to a Maxnet that selects the winning class.]

wij = ei(j)/2

where ei(j) is the i-th component of the j-th exemplar vector.

Terminology:

m : number of exemplar vectors
n : number of input nodes (input vector components)
e(j) : j-th exemplar vector

Algorithm:

Step 1: Initialize the weights:

wij = ei(j)/2 = half the i-th component of the j-th exemplar (i = 1, 2, ..., n and j = 1, 2, ..., m)

Initialize the bias values: bj = n/2
For each input vector x, do steps 2 and 3.

Step 2: Compute the net input to each output unit Yj:

Yinj = bj + ∑ xi wij    (i = 1, 2, ..., n; j = 1, 2, ..., m)

Step 3: Maxnet iterations are used to find the best match exemplar.
Example: Given the exemplar vectors e(1) = (-1 1 1 -1) and e(2) = (1 -1 1 -1), use a
Hamming net to find the exemplar vector closest to each of the bipolar input patterns
(1 1 -1 -1), (1 -1 -1 -1), (-1 -1 -1 1) and (-1 -1 1 1).


Solution:

Step 1: Store the exemplars in the weights:

wij = ei(j)/2 = i-th component of the j-th exemplar, halved:

W = | -0.5   0.5 |
    |  0.5  -0.5 |
    |  0.5   0.5 |
    | -0.5  -0.5 |

since e(1) = (-1 1 1 -1) and e(2) = (1 -1 1 -1).

bj = n/2 = 2
Step 2: Apply the 1st bipolar input (1 1 -1 -1):
Yin1 = b1 + ∑ xi wi1 = 2 + (1 1 -1 -1)·(-0.5 0.5 0.5 -0.5) = 2
Yin2 = b2 + ∑ xi wi2 = 2 + (1 1 -1 -1)·(0.5 -0.5 0.5 -0.5) = 2
Hence the first input pattern has the same Hamming distance (D = 2)
to both exemplar vectors.

Step 3: Apply the second input vector (1 -1 -1 -1)


Yin1 = 2 + (1 -1 -1 -1)·(-0.5 0.5 0.5 -0.5) = 1
Yin2 = 2 + (1 -1 -1 -1)·(0.5 -0.5 0.5 -0.5) = 3
Since Yin2 > Yin1, the second input best matches the second
exemplar e(2).

Step 4: Apply input pattern no. 3 (-1 -1 -1 1)


Yin1 = 2 + (-1 -1 -1 1)·(-0.5 0.5 0.5 -0.5) = 1
Yin2 = 2 + (-1 -1 -1 1)·(0.5 -0.5 0.5 -0.5) = 1
Hence the third input is equally similar to both exemplars (a tie).

Step 5: Consider the last input vector (-1 -1 1 1)


Yin1 = 2 + (-1 -1 1 1)·0.5(-1 1 1 -1) = 2
Yin2 = 2 + (-1 -1 1 1)·0.5(1 -1 1 -1) = 2
Hence the last input is also equally similar to both exemplars (a tie).
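The example can be verified with a short sketch (illustrative code, not from the original text). With wij = ei(j)/2 and bj = n/2, each class unit's net input equals a, the number of components in which the input agrees with that exemplar:

```python
# Hamming net sketch: w_ij = e_i(j)/2, b_j = n/2, net input = b_j + x . w_j.
def hamming_scores(exemplars, x):
    n = len(x)
    return [n / 2 + sum(xi * ei / 2 for xi, ei in zip(x, e)) for e in exemplars]

exemplars = [(-1, 1, 1, -1), (1, -1, 1, -1)]
print(hamming_scores(exemplars, (1, -1, -1, -1)))  # [1.0, 3.0] -> e(2) wins
print(hamming_scores(exemplars, (1, 1, -1, -1)))   # [2.0, 2.0] -> a tie
```

A full implementation would then feed these scores to a Maxnet to pick the winner; here the largest score can simply be read off.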

Kohonen self-organizing feature maps:


There can be various topologies; however, the following two are used most often:

Rectangular Grid Topology

In this topology the winning unit, indicated by #, sits at the centre. The ring of units at
distance 1 around it contains 8 nodes, the ring at distance 2 contains 16 nodes, and the ring
at distance 3 contains 24 nodes; each successive rectangular ring thus adds 8 nodes.

Hexagonal Grid Topology

In this topology the winning unit is again indicated by #. The ring at distance 1 contains
6 nodes, the ring at distance 2 contains 12 nodes, and the ring at distance 3 contains
18 nodes; each successive hexagonal ring thus adds 6 nodes.

Architecture

The architecture of KSOM is similar to that of the competitive network. With the help of
neighborhood schemes, discussed earlier, the training can take place over the extended region of
the network.

Algorithm for training

Step 1 − Initialize the weights wij, the learning rate α and the neighborhood topological scheme.
Step 2 − Continue steps 3-9 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every input vector x.
Step 4 − Calculate the squared Euclidean distance for j = 1 to m:

D(j) = ∑i=1..n (xi − wij)²

Step 5 − Obtain the winning unit J for which D(J) is minimum.
Step 6 − Calculate the new weights of the winning unit (and, under the neighborhood
scheme, of the units around it) by the following relation:

wij(new) = wij(old) + α[xi − wij(old)]

Step 7 − Update the learning rate α by the following relation:

α(t+1) = 0.5 α(t)

Step 8 − Reduce the radius of the topological scheme.
Step 9 − Check the stopping condition for the network.
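The steps above can be sketched in plain Python. This is a toy winner-take-all version with made-up data: a full SOM would also update the winner's neighbors and shrink the radius over time, per steps 6 and 8.

```python
import random

# Toy SOM training sketch covering steps 1-9 with a neighborhood radius of zero.
def train_som(data, m, epochs=20, alpha=0.5):
    n = len(data[0])
    w = [[random.random() for _ in range(n)] for _ in range(m)]   # step 1
    for _ in range(epochs):                                       # steps 2-3
        for x in data:
            # step 4: squared Euclidean distance D(j) to every unit
            d = [sum((x[i] - w[j][i]) ** 2 for i in range(n)) for j in range(m)]
            J = d.index(min(d))                                   # step 5: winner
            for i in range(n):                                    # step 6: move winner
                w[J][i] += alpha * (x[i] - w[J][i])
        alpha *= 0.5                                              # step 7
    return w

random.seed(0)
w = train_som([(0.0, 0.0), (0.1, 0.1), (0.9, 0.9), (1.0, 1.0)], m=2)
```

With two well-separated clusters, the two weight vectors typically settle near the cluster centres.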

Learning Vector Quantization (LVQ)


Learning Vector Quantization (LVQ), different from vector quantization (VQ) and Kohonen
self-organizing maps (KSOM), is basically a competitive network which uses supervised
learning. We may define it as a process of classifying patterns where each output unit
represents a class. Because it uses supervised learning, the network is given a set of
training patterns with known classifications, along with an initial distribution of the
output classes. After the training process is complete, LVQ classifies an input vector by
assigning it to the same class as that of the winning output unit.

Architecture:
The following figure shows the architecture of LVQ, which is quite similar to that of
KSOM. There are n input units and m output units, and the layers are fully interconnected,
with weights on the connections.

Parameters Used:

The following parameters are used in the LVQ training process:
• x = training vector (x1,...,xi,...,xn)
• T = class for training vector x
• wj = weight vector for jth output unit
• Cj = class associated with the jth output unit

Training Algorithm:

Step 1 − Initialize the reference vectors, which can be done as follows:

• Step 1(a) − From the given set of training vectors, take the first m training vectors
(one per cluster) and use them as weight vectors; the remaining vectors are used for
training.
• Step 1(b) − Assign the initial weights and classifications randomly.
• Step 1(c) − Apply the K-means clustering method.

Step 2 − Initialize the learning rate α.
Step 3 − Continue with steps 4-9 if the stopping condition is not met.
Step 4 − Follow steps 5-6 for every training input vector x.
Step 5 − Calculate the squared Euclidean distance for j = 1 to m:

D(j) = ∑i=1..n (xi − wij)²

Step 6 − Obtain the winning unit J for which D(J) is minimum.

Step 7 − Calculate the new weights of the winning unit J by the following relation:

if T = CJ, then wJ(new) = wJ(old) + α[x − wJ(old)]
if T ≠ CJ, then wJ(new) = wJ(old) − α[x − wJ(old)]

Step 8 − Reduce the learning rate α.
Step 9 − Test for the stopping condition. It may be:

• the maximum number of epochs has been reached, or
• the learning rate has been reduced to a negligible value.
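A minimal sketch of this procedure follows (illustrative data; the initial weights are taken from the training set in the spirit of step 1(a)):

```python
# LVQ-1 sketch: move the winner toward x if the classes match, away otherwise.
def train_lvq(data, labels, w, classes, alpha=0.3, epochs=10):
    for _ in range(epochs):
        for x, t in zip(data, labels):
            # steps 5-6: squared Euclidean distance, winning unit J
            d = [sum((xi - wji) ** 2 for xi, wji in zip(x, wj)) for wj in w]
            J = d.index(min(d))
            sign = 1 if t == classes[J] else -1        # step 7
            w[J] = [wji + sign * alpha * (xi - wji) for xi, wji in zip(x, w[J])]
        alpha *= 0.5                                   # step 8
    return w

def classify(x, w, classes):
    d = [sum((xi - wji) ** 2 for xi, wji in zip(x, wj)) for wj in w]
    return classes[d.index(min(d))]

data = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.8, 0.9)]
labels = [0, 0, 1, 1]
# One reference vector per class, seeded from the data (step 1a).
w = train_lvq(data, labels, [[0.0, 0.0], [1.0, 1.0]], classes=[0, 1])
print(classify((0.1, 0.1), w, [0, 1]))  # -> 0
```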

CPN (COUNTERPROPAGATION NETWORK):

Counter propagation networks (CPN) were proposed by Hecht-Nielsen in 1987. They are
multilayer networks based on a combination of input, clustering and output layers.
Applications of counter propagation nets include data compression, function approximation
and pattern association. The counter propagation network is basically constructed from an
instar-outstar model. This model is a three-layer neural network that performs input-output
data mapping, producing an output vector y in response to an input vector x, on the basis of
competitive learning. The three layers in an instar-outstar model are the input layer, the
hidden (competitive) layer and the output layer.
There are two stages involved in the training of a counter propagation net. In the
first stage the input vectors are clustered. In the second stage, the weights from the
cluster-layer units to the output units are tuned to obtain the desired response. There are
two types of counter propagation net:
1. Full counter propagation network
2. Forward-only counter propagation network

1. Full counter propagation network:


Full CPN efficiently represents a large number of vector pairs x:y by adaptively constructing
a look-up table. The full CPN works best if the inverse function exists and is continuous. The
vectors x and y propagate through the network in a counterflow manner to yield the output
vectors x* and y*.

Architecture of Full CPN:


The four major components of the instar-outstar model are the input layer, the instar, the
competitive layer and the outstar. For each node in the input layer there is an input value xi.
All the instars are grouped into a layer called the competitive layer, and each instar responds
maximally to a group of input vectors in a different region of space. The outstar model has
all its nodes in the output layer and a single node in the competitive layer; the outstar
looks like the fan-out of a node.

Training Algorithm for Full CPN:

Step 0: Set the weights and the initial learning rate.


Step 1: Perform step 2 to 7 if stopping condition is false for phase I training.
Step 2: For each training input vector pair x:y presented, perform steps 3 to 6.
Step 3: Make the X-input layer activations to vector X.
Make the Y-input layer activation to vector Y.
Step 4: Find the winning cluster unit.
If the dot product method is used, find the cluster unit zj with the largest net input: for j = 1 to p,
zinj = ∑ xi vij + ∑ yk wkj
If the Euclidean distance method is used, find the cluster unit zj whose squared distance from
the input vectors is smallest:
Dj = ∑(xi − vij)² + ∑(yk − wkj)²
If a tie occurs in the selection of the winner unit, the unit with the smallest index is the
winner. Take the winner unit index as J.
Step 5: Update the weights of the calculated winner unit zJ.
For i=1 to n, viJ(new)=viJ(old) + α[xi-viJ(old)]
For k =1 to m, wkJ(new)=wkJ(old) + β[yk-wkJ(old)]
Step 6: Reduce the learning rates.
α (t+1)=0.5α(t); β(t+1)=0.5β(t)
Step 7: Test stopping condition for phase I training.
Step 8: Perform step 9 to 15 when stopping condition is false for phase II training.
Step 9: Perform step 10 to 13 for each training input vector pair x:y. Here α and β are small
constant values.
Step 10: Make the X-input layer activations to vector x. Make the Y-input layer activations to
vector y.
Step 11: Find the winning cluster unit (Using the formula from step 4). Take the winner unit
index as J.
Step 12: Update the weights entering into unit zJ.
For i=1 to n, viJ(new)=viJ(old) + α[xi-viJ(old)]
For k =1 to m, wkJ(new)=wkJ(old) + β[yk-wkJ(old)]
Step 13: Update the weights from unit zJ to the output layers.
For i=1 to n, tJi(new)=tJi(old) + b[xi-tJi(old)]

For k =1 to m, uJk(new)=uJk(old) + a[yk-uJk(old)]
Step 14: Reduce the learning rates a and b.
a(t+1)=0.5a(t); b(t+1)=0.5b(t)
Step 15: Test stopping condition for phase II training.
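Phase I of the algorithm above can be sketched as follows (Euclidean-distance variant; all data and sizes are illustrative, and phase II would analogously tune the outstar weights tJi and uJk):

```python
# Full-CPN phase-I sketch: cluster the (x, y) pairs jointly (steps 2-7).
def cpn_phase1(xs, ys, v, w, alpha=0.5, beta=0.5, epochs=10):
    # v[j]: weights from the X-input layer to cluster unit j
    # w[j]: weights from the Y-input layer to cluster unit j
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # step 4: winner = smallest joint squared distance Dj
            d = [sum((xi - vji) ** 2 for xi, vji in zip(x, v[j])) +
                 sum((yk - wjk) ** 2 for yk, wjk in zip(y, w[j]))
                 for j in range(len(v))]
            J = d.index(min(d))
            # step 5: move the winner's weights toward the pair
            v[J] = [vji + alpha * (xi - vji) for xi, vji in zip(x, v[J])]
            w[J] = [wjk + beta * (yk - wjk) for yk, wjk in zip(y, w[J])]
        alpha *= 0.5   # step 6
        beta *= 0.5
    return v, w

v, w = cpn_phase1([(0.0, 0.0), (1.0, 1.0)], [(0.0,), (1.0,)],
                  v=[[0.2, 0.2], [0.8, 0.8]], w=[[0.0], [1.0]])
```

After training, each cluster unit's weights have moved toward one of the x:y pairs.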

2. Forward-only Counter propagation network:


A simplified version of the full CPN is the forward-only CPN, which uses only
the x vectors to form the clusters on the Kohonen units during phase-I training. The input
vectors are first presented to the input units; the weights between the input layer and the
cluster layer are trained first, and then the weights between the cluster layer and the
output layer. This is a specific competitive network in which the targets are known.
Architecture of forward-only CPN:
It consists of three layers: an input layer, a cluster layer and an output layer. Its
architecture resembles that of the back-propagation network, but in CPN there exist
interconnections between the units in the cluster layer.

Training Algorithm for Forward-only CPN:
Step 0: Initialize the weights and learning rates.
Step 1: Perform step 2 to 7 when stopping condition for phase I training is false.
Step 2: Perform step 3 to 5 for each of training input X.
Step 3: Set the X-input layer activation to vector X.
Step 4: Compute the winning cluster unit J. If the dot product method is used, find the
cluster unit zJ with the largest net input:
zinj = ∑ xi vij

If the Euclidean distance method is used, find the cluster unit zJ whose squared distance
from the input pattern is smallest:
Dj=∑(xi-vij)^2
If there exists a tie in the selection of winner unit, the unit with the smallest index is chosen as
the winner.
Step 5: Perform weight updation for unit zJ. For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
Step 6: Reduce learning rate α:
α (t+1)=0.5α(t)
Step 7: Test the stopping condition for phase I training.
Step 8: Perform steps 9 to 15 when the stopping condition for phase II training is false.
Step 9: Perform step 10 to 13 for each training input pair x:y.
Step 10: Set X-input layer activations to vector X. Set Y-output layer activation to vector Y.
Step 11: Find the winning cluster unit J.
Step 12: Update the weights into unit zJ. For i=1 to n,
viJ(new)=viJ(old) + α[xi-viJ(old)]
Step 13: Update the weights from unit zJ to the output units.
For k=1 to m, wJk(new)=wJk(old) + β[yk-wJk(old)]
Step 14: Reduce learning rate β,
β(t+1)=0.5β(t)
Step 15: Test the stopping condition for phase II training.
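Both phases can be sketched together (illustrative code with made-up data and initial weights; Euclidean winner selection):

```python
# Forward-only CPN sketch: phase I clusters the x vectors (steps 1-7),
# phase II maps each cluster to an output vector y (steps 8-15).
def train_forward_cpn(xs, ys, v, w, alpha=0.5, beta=0.5, epochs=10):
    def winner(x):
        d = [sum((xi - vji) ** 2 for xi, vji in zip(x, vj)) for vj in v]
        return d.index(min(d))
    for _ in range(epochs):                 # phase I: input -> cluster weights
        for x in xs:
            J = winner(x)
            v[J] = [vji + alpha * (xi - vji) for xi, vji in zip(x, v[J])]
        alpha *= 0.5
    for _ in range(epochs):                 # phase II: cluster -> output weights
        for x, y in zip(xs, ys):
            J = winner(x)
            w[J] = [wjk + beta * (yk - wjk) for yk, wjk in zip(y, w[J])]
        beta *= 0.5
    return v, w

def cpn_predict(x, v, w):
    d = [sum((xi - vji) ** 2 for xi, vji in zip(x, vj)) for vj in v]
    return w[d.index(min(d))]

v, w = train_forward_cpn([(0.0, 0.0), (1.0, 1.0)], [(0.0,), (1.0,)],
                         v=[[0.2, 0.2], [0.8, 0.8]], w=[[0.5], [0.5]])
```

At prediction time, an input x simply activates its nearest cluster unit, whose outgoing weights serve as the output vector.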

Adaptive Resonance Theory (ART):

Adaptive resonance theory (ART) is a type of neural network technique developed by Stephen
Grossberg and Gail Carpenter in 1987. The basic ART uses an unsupervised learning technique.
The terms "adaptive" and "resonance" suggest that these networks are open to new learning
(i.e., adaptive) without discarding previously learned information (i.e., resonance). ART
networks are known to address the stability-plasticity dilemma: stability refers to their
ability to retain what they have learned, and plasticity refers to their flexibility in
acquiring new information.

Types of Adaptive Resonance Theory(ART)

Carpenter and Grossberg developed different ART architectures as a result of 20 years of


research. The ARTs can be classified as follows:
• ART1 – It is the simplest and the basic ART architecture. It is capable of clustering
binary input values.
• ART2 – It is an extension of ART1 that is capable of clustering continuous-valued input
data.
• Fuzzy ART – It is the augmentation of fuzzy logic and ART.
• ARTMAP – It is a supervised form of ART learning where one ART learns based on the
previous ART module. It is also known as predictive ART.
• FARTMAP – This is a supervised ART architecture with Fuzzy logic included.

Basic of Adaptive Resonance Theory (ART) Architecture

The adaptive resonance theory network is a self-organizing and competitive neural
network. It can be of either type: unsupervised (ART1, ART2, ART3, etc.) or supervised
(ARTMAP). Generally, the supervised algorithms are named with the suffix "MAP".
But the basic ART model is unsupervised in nature and consists of:
• The F1 layer, which accepts the inputs, performs some processing and transfers them to
the F2 layer. Two sets of weighted interconnections control the degree of similarity
between the units in the F1 and F2 layers.
• The F2 layer, which is a competitive layer. The cluster unit with the largest net input
becomes the candidate to learn the input pattern, and the rest of the F2 units are
inhibited.
• The reset unit, which decides whether or not the candidate cluster unit is allowed to
learn the input pattern, depending on how similar its top-down weight vector is to the
input vector. This comparison is called the vigilance test.
Thus the vigilance parameter controls how new memories or new information are
incorporated: higher vigilance produces more detailed memories, lower vigilance
produces more general memories.
Generally two types of learning exist: slow learning and fast learning. In fast learning, the
weight update during resonance occurs rapidly; it is used in ART1. In slow learning, the
weight change occurs slowly relative to the duration of a learning trial; it is used in ART2.
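The vigilance test described above can be illustrated for binary (ART1-style) inputs: the match ratio |x ∧ w| / |x| is compared against the vigilance parameter ρ. The vectors and ρ values below are made up for illustration:

```python
# ART1-style vigilance test sketch: the candidate cluster passes only if the
# overlap between the input x and its top-down weights w is large enough.
def vigilance_pass(x, top_down_w, rho):
    overlap = sum(xi & wi for xi, wi in zip(x, top_down_w))  # |x AND w|
    return overlap / sum(x) >= rho                           # compare with rho

print(vigilance_pass([1, 1, 0, 1], [1, 0, 0, 1], rho=0.5))  # 2/3 >= 0.5 -> True
print(vigilance_pass([1, 1, 0, 1], [1, 0, 0, 1], rho=0.8))  # 2/3 <  0.8 -> False
```

When the test fails, the reset unit would inhibit that cluster and the search would continue with the next-best F2 unit.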

➢ Advantage of Adaptive Resonance Theory (ART)

• It exhibits stability and is not disturbed by a wide variety of inputs provided to the
network.
• It can be integrated with various other techniques to give better results.
• It can be used in various fields such as mobile robot control, face recognition, land cover
classification, target recognition, medical diagnosis, signature verification, clustering web
users, etc.
• It has advantages over plain competitive learning, which lacks the capability to add new
clusters when deemed necessary.

➢ Limitations of Adaptive Resonance Theory:

Some ART networks (such as Fuzzy ART and ART1) are inconsistent, in that their results
depend upon the order in which the training data are presented and upon the learning rate.

****
