Artificial Neural Networks (ANN)
Historical Background
In 1943, two scientists, Warren McCulloch and Walter Pitts, invented the first
computational model for artificial neural networks based on mathematics and algorithms,
called the Threshold Logic Model. This model paved the way for artificial
neural network research, which split into two branches: the first focused on biological
processes in the brain, and the second on applications of artificial neural networks
in the field of artificial intelligence.
At the end of the 1940s, Donald Hebb proposed a learning hypothesis based on
the mechanism of neural plasticity, known as (Hebbian Learning). It is a typical
model of learning without a supervisor (Unsupervised Learning). These ideas were
applied to computational models in 1948 with Turing's B-type machines.
In 1954, Farley and Clark used a computing machine (i.e., a digital calculator)
for the first time to simulate a Hebbian network at the Massachusetts Institute
of Technology (MIT). Several other scientists, such as Rochester, Holland, Habit,
and Duda, worked on neural network computing machines in 1956.
In 1958, Rosenblatt invented the (Perceptron), a pattern recognition
algorithm in which a two-layer (Two-Layer) computing network learns using
addition and subtraction. It was subsequently extended to include operations
more complicated than the original ones. Paul Werbos later developed it further
into what became known as the back propagation algorithm in 1975.
It is worth noting that research on artificial neural networks experienced a
recession between 1960 and 1975 for several reasons, including the failure of
the proposed algorithms and models to address non-trivial problems. Another
reason was the lack of computers fast enough at that time to provide the
execution speed such an advanced technology needs to achieve the desired
learning; these reasons were outlined in a report by the scientists Marvin Minsky
and Seymour Papert. The eighties and nineties of the twentieth century then saw
the emergence of many artificial neural network algorithms, and this steady
growth was helped by the rapid evolution of computing, the increase in available
memory, and especially the developments in the field of parallel computing
(Parallel Computers).
Biological Neuron
Artificial Neuron
ANN Components:
- Inputs (Int., real, binary).
- Output (Int., real, binary).
- Activation function (linear, non-linear).
- Weights (fixed, variable).
- ANN topology.
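To make the components listed above concrete, here is a minimal sketch (not from the slides) of a single artificial neuron in Python: real-valued inputs, variable weights, a bias term, and a non-linear (sigmoid) activation function.

import math

def sigmoid(v):
    # Non-linear activation function: squashes any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-v))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias (threshold) term.
    v = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Apply the activation function to produce the neuron's output.
    return sigmoid(v)

# Example: 3 real inputs, 3 weights, one bias value.
print(neuron([1.0, 0.0, 0.5], [0.2, 0.3, 0.8], bias=0.1))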
Prof. Dr. Ahmed Tariq Saadeq
ANN Properties
1. Parallelism or synchronization at work: a fundamental feature of the advantages of
artificial neural networks, ie if promised each cell represents a processing unit
(Processor Element) itself linked to turn a large number of units similar treatment,
they will we increase the processing calculator for orders speed tremendously and
this is what you did mainframes, which produced tremendous speed using artificial
neural network technology to computers able to record time to solve very complex
problems companies.
2. The ability to adapt (Adaptation): a very important property of artificial neural
networks. They solve a problem through a particular learning algorithm rather than by
programming the problem itself; they adapt themselves to solve the problem through the
problem's own data. In other words, there is no programming of the solution; instead, a
specific learning algorithm is applied whose function is to cope with the requirements
of the problem and adapt to them.
3. Distributed Memory: when the artificial neural network learns the problem's data, it
certainly stores that data, or keeps it in its own way, as a learned base; it effectively
becomes a memory for the problem's data, distributed according to the type of artificial
neural network. There are also types of artificial neural networks that are used for
storage only and serve as a working memory.
4. The ability to generalize (Generalization): if an artificial neural network is trained on a
particular set of samples in a particular application, and the network is then tested on other
samples that it may not have been trained on, and the test results are successful, this means
the network has generalized the rule it extracted during training on the given samples to the
new samples. This is a very important property of artificial neural networks. In other words,
the generalization feature replaces the rigid rule concept of "If ... Then ...".
5. Fault tolerance (Fault Tolerance): as is known, the Von Neumann architecture relies on
sequential execution of operations (Sequential Processing), meaning that errors occurring in
the statements (steps) of a program affect everything that follows. In the architecture of
artificial neural networks, by contrast, errors occurring in the inputs are tolerated or
corrected, so they do not disrupt the work of the artificial neural network; the same holds
for software solutions to problems built with artificial neural network technology.
6. Definition of the problem: as we know, the basis of the work of artificial neural networks is
learning a set of input samples or problem cases through certain algorithms, so the step of
defining the problem is lighter than what has to be done in traditional programmed problem
solving. This is very important in many applications, because it is often difficult to specify
exactly what the problem mechanism is.
Some Parameters
η: learning rate (0 < η < 1).
Bias or Threshold (θ): an added value to the weights, or a constant node, used to
increase the convergence or decrease the learning time.
α: momentum term (0 < α < 1); it is also used to increase the convergence or
decrease the learning time during the weight updates.
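These parameters are not attached to a specific update rule on this slide; as an illustration only, a standard form (assumed here, not taken from the slides) in which all three appear is:

\[
net_j = \sum_i w_{ij}\, x_i + \theta_j, \qquad
\Delta w_{ij}(t) = \eta\, \delta_j\, x_i + \alpha\, \Delta w_{ij}(t-1)
\]

where $\delta_j$ is the error term of node $j$.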
Hopfield NN
Invented by the physicist John Hopfield in 1982.
Features:
- Unsupervised learning.
- Associative memory.
- Full connection.
- Single layer.
- Feedback.
- Fixed weight.
- Bipolar.
- Linear activation function.
- Input Nodes equal to Output Nodes.
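The slides do not write out the storage rule; a minimal sketch, assuming the standard outer-product (Hebbian) rule over the $P$ stored bipolar patterns $s^{(p)}$, is:

\[
W_{ij} = \sum_{p=1}^{P} s_i^{(p)} s_j^{(p)} \;\; (i \neq j), \qquad W_{ii} = 0
\]

Recall then repeatedly applies $y_j \leftarrow f\!\big(\sum_i W_{ij}\, y_i\big)$ with a sign-like activation until the state stops changing.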
Hopfield NN Topology
Example
Assume N = 4, P = 3, with the stored patterns given in binary form and converted to bipolar form as below:

Binary          Bipolar
0 1 1 1        -1  1  1  1
0 0 0 0        -1 -1 -1 -1
0 0 1 1        -1 -1  1  1

Test input: [1 1 1 0]
Relatively, the recall result is good.
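A minimal runnable sketch of this example (not from the slides), assuming the outer-product storage rule above and one common discrete-Hopfield update convention (binary unit states, the external input added to the net, asynchronous updates); the slides may use a different convention.

import numpy as np

# Stored patterns in bipolar form (the converted rows above).
S = np.array([[-1,  1,  1,  1],
              [-1, -1, -1, -1],
              [-1, -1,  1,  1]])

# Assumed outer-product storage rule with a zero diagonal.
W = S.T @ S
np.fill_diagonal(W, 0)
print(W)

# Recall from the binary test input [1 1 1 0]:
# y_in_j = x_j + sum_i y_i * W[i, j];  y_j -> 1 if y_in_j > 0, 0 if y_in_j < 0, else kept.
x = np.array([1, 1, 1, 0])
y = x.copy()
for _ in range(10):                      # a few asynchronous sweeps suffice here
    for j in range(len(y)):
        y_in = x[j] + y @ W[:, j]
        if y_in > 0:
            y[j] = 1
        elif y_in < 0:
            y[j] = 0
print(y)   # under these assumptions: [1 1 1 1], one bit away from the stored pattern 0 1 1 1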
Back-Propagation NN
Paul Werbos developed the back propagation algorithm in 1975. It is
one of the most widely used artificial neural network algorithms in applications.
Features:
- Supervised learning.
- Multi Layers.
- Non-Linear Activation Function.
- Full connection.
- Based on GBDR.
- Variable weights.
- Process Forward-Backward.
BP-NN topology
BP-NN Algorithm
- Initialize the parameters (A: no. of input nodes), (B: no. of hidden nodes),
  (C: no. of output nodes), (P: no. of samples), (η: learning rate) and the
  activation function.
- Generate the weight matrices (W1[A×B], W2[B×C]) randomly.
- While the error is not acceptable Do
-   For each sample Do
-     Compute the hidden nodes as below:
        $H_j = \dfrac{1}{1+e^{-V}}$, where $V = \sum_{i=1}^{A} W1_{i,j}\, X_i$, for $j = 1, 2, \ldots, B$
-     Compute the output nodes as below:
        $O_j = \dfrac{1}{1+e^{-V}}$, where $V = \sum_{i=1}^{B} W2_{i,j}\, H_i$, for $j = 1, 2, \ldots, C$
- End for
- Compute the error as
    $error = \sum_{k=1}^{C} (Y_k - O_k)^2$
- End If;
- End While;
- Save the last weight values;
- End Algorithm.
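A minimal runnable sketch of this algorithm (not from the slides): the forward pass and the squared error follow the formulas above, while the backward weight update uses the standard generalized delta rule for sigmoid units, which these slides do not spell out, so that part should be read as an assumption.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_bp(X, Y, B, eta=0.5, epochs=1000, target_error=0.01):
    """X: P x A inputs, Y: P x C targets, B: number of hidden nodes."""
    rng = np.random.default_rng(0)
    A, C = X.shape[1], Y.shape[1]
    W1 = rng.uniform(-1, 1, (A, B))           # input -> hidden weights
    W2 = rng.uniform(-1, 1, (B, C))           # hidden -> output weights
    for _ in range(epochs):
        error = 0.0
        for x, y in zip(X, Y):
            H = sigmoid(x @ W1)               # hidden nodes
            O = sigmoid(H @ W2)               # output nodes
            error += np.sum((y - O) ** 2)     # squared error for this sample
            # Backward pass (assumed standard delta rule, not given on the slides):
            delta_o = (y - O) * O * (1 - O)
            delta_h = (delta_o @ W2.T) * H * (1 - H)
            W2 += eta * np.outer(H, delta_o)
            W1 += eta * np.outer(x, delta_h)
        if error < target_error:              # "error is acceptable" -> stop
            break
    return W1, W2

# For instance, trained on the 3 samples of the example that follows:
X = np.array([[1, 0, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
W1, W2 = train_bp(X, Y, B=2)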
Example: Assume we have 3 input nodes, 2 output nodes and 3 samples as below:
X1 X2 X3 Y1 Y2
1 0 0 1 0
0 0 1 0 1
1 1 1 1 1
Suppose we have 2 hidden nodes with the following weight matrices & BP-NN topology:

        H1    H2                  Y1    Y2
X1  [  0.2   0.1 ]         H1  [  0.1   0.7 ]
X2  [  0.3   0.7 ]         H2  [  0.6   0.8 ]
X3  [  0.8   0.5 ]
         W1                          W2
Solution:
First sample [1 0 0 1 0]
Sum = W1 · X
Sum1 = 1*0.2 + 0*0.3 + 0*0.8 = 0.2        h1 = 1/(1 + e^(-0.2)) = 0.5498
Sum2 = 1*0.1 + 0*0.7 + 0*0.5 = 0.1        h2 = 1/(1 + e^(-0.1)) = 0.525

Sum = W2 · H
Sum1 = 0.5498*0.1 + 0.525*0.6 = 0.37      o1 = 1/(1 + e^(-0.37)) = 0.5915
Sum2 = 0.5498*0.7 + 0.525*0.8 = 0.805     o2 = 1/(1 + e^(-0.805)) = 0.691
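As a quick numerical check (a small sketch using NumPy), the same forward pass reproduces the numbers above:

import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

W1 = np.array([[0.2, 0.1],
               [0.3, 0.7],
               [0.8, 0.5]])    # input -> hidden
W2 = np.array([[0.1, 0.7],
               [0.6, 0.8]])    # hidden -> output

x = np.array([1.0, 0.0, 0.0])  # first sample's inputs
H = sigmoid(x @ W1)            # [0.5498, 0.525]
O = sigmoid(H @ W2)            # [0.5915, 0.691]
print(H, O)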
Kohonen - NN
Teuvo Kohonen developed this NN in 1982. The algorithm is of the kind that does
not need a supervisor and is of the competitive type; it is also used for
clustering purposes. The structure of this network includes two layers: the
first concerns the inputs (features) and the second the outputs (clusters or
groups), with a weight matrix determining which output an input fits. The weight
vectors are given random initial values, then the inputs are examined one by one
and their bias toward each output is measured through the Euclidean distance
equation; the output with the smallest Euclidean distance wins, and the input in
question is assigned to that category. The same is done with the second and
third inputs and so on to the end; that is, the network reorganizes itself
through mathematical equations so that each input becomes biased toward a
particular output or class. In other words, there is competition among the
outputs over the inputs. Certainly, similar inputs will side with the same
classes; in other words, the network works on the principle of neighboring
(Neighborhood).
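Written out in the notation used in the worked example below, the distance of an input x to cluster j and the update of the winning cluster's weight column are:

\[
D(j) = \sum_{i=1}^{N} \big(x_i - w_{i,j}\big)^2, \qquad
w_{i,j}^{new} = w_{i,j}^{old} + \eta\,\big(x_i - w_{i,j}^{old}\big)
\]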
Update of η (the learning rate):
- Linear: η = η - d, (d > 0)
- Non-Linear: η = η · a, (0 < a < 1)
a= / , (>1)
= log()
Example: Assume we have the following samples:
1 1 0 0 0
0 0 0 1 1
1 0 1 0 0
0 0 0 0 1
Suppose there are 2 clusters, with initial η = 0.6, min-η = 0.1, d = 0.1, and the
initial weight matrix W[i,j] (rows X1..X5, columns C1 and C2) as below:

         C1     C2
X1  [   0.3    0.7 ]
X2  [   0.4    0.2 ]
X3  [   0.3    0.6 ]
X4  [   0.1    0.5 ]
X5  [   0.5    0.4 ]
For the first sample 1 1 0 0 0, compute D(j) as below:
D(1) = (1-0.3)² + (1-0.4)² + (0-0.3)² + (0-0.1)² + (0-0.5)² = 1.2
D(2) = (1-0.7)² + (1-0.2)² + (0-0.6)² + (0-0.5)² + (0-0.4)² = 1.5
The minimum is at j = 1, so update column C1:
W[1,1] = 0.3 + 0.6*(1-0.3) = 0.72
W[2,1] = 0.4 + 0.6*(1-0.4) = 0.76
W[3,1] = 0.3 + 0.6*(0-0.3) = 0.12
W[4,1] = 0.1 + 0.6*(0-0.1) = 0.04
W[5,1] = 0.5 + 0.6*(0-0.5) = 0.2

W = [ 0.72   0.7  ]
    [ 0.76   0.2  ]
    [ 0.12   0.6  ]
    [ 0.04   0.5  ]
    [ 0.2    0.4  ]

For the second sample 0 0 0 1 1, compute D(j) as below:
D(1) = (0-0.72)² + (0-0.76)² + (0-0.12)² + (1-0.04)² + (1-0.2)² = 2.672
D(2) = (0-0.7)² + (0-0.2)² + (0-0.6)² + (1-0.5)² + (1-0.4)² = 1.5
The minimum is at j = 2, so update column C2:
W[1,2] = 0.7 + 0.6*(0-0.7) = 0.28
W[2,2] = 0.2 + 0.6*(0-0.2) = 0.08
W[3,2] = 0.6 + 0.6*(0-0.6) = 0.24
W[4,2] = 0.5 + 0.6*(1-0.5) = 0.8
W[5,2] = 0.4 + 0.6*(1-0.4) = 0.76

W = [ 0.72   0.28 ]
    [ 0.76   0.08 ]
    [ 0.12   0.24 ]
    [ 0.04   0.8  ]
    [ 0.2    0.76 ]

The third sample 1 0 1 0 0: D(1) = 1.472, D(2) = 2.32, so the minimum is at j = 1.

W = [ 0.888  0.28 ]
    [ 0.304  0.08 ]
    [ 0.648  0.24 ]
    [ 0.016  0.8  ]
    [ 0.08   0.76 ]

The fourth sample 0 0 0 0 1: D(1) = 2.1475, D(2) = 0.84, so the minimum is at j = 2.

W = [ 0.888  0.112 ]
    [ 0.304  0.032 ]
    [ 0.648  0.096 ]
    [ 0.016  0.32  ]
    [ 0.08   0.904 ]
Update the learning rate η as: η = η - d = 0.6 - 0.1 = 0.5.
Then start again from the first sample: D(1) = 0.9235, D(2) = 2.6544, so the minimum is at j = 1.

W = [ 0.944  0.112 ]
    [ 0.652  0.032 ]
    [ 0.324  0.096 ]
    [ 0.008  0.32  ]
    [ 0.04   0.904 ]

The second sample: D(1) = 3.3269, D(2) = 0.4944, so the minimum is at j = 2.

W = [ 0.944  0.056 ]
    [ 0.652  0.016 ]
    [ 0.324  0.048 ]
    [ 0.008  0.66  ]
    [ 0.04   0.952 ]

The third sample: D(1) = 0.8869, D(2) = 3.1396, so the minimum is at j = 1.

W = [ 0.972  0.056 ]
    [ 0.326  0.016 ]
    [ 0.662  0.048 ]
    [ 0.004  0.66  ]
    [ 0.02   0.952 ]

The fourth sample: D(1) = 2.4497, D(2) = 0.4436, so the minimum is at j = 2.

W = [ 0.972  0.028 ]
    [ 0.326  0.008 ]
    [ 0.662  0.024 ]
    [ 0.004  0.33  ]
    [ 0.02   0.976 ]
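A minimal runnable sketch reproducing this worked example (not from the slides; it hard-codes the same samples and initial weights, uses the linear learning-rate update η = η - d once per epoch, a plain winner-take-all update with no neighborhood, and stops when η falls below min-η):

import numpy as np

# The four samples and the initial 5x2 weight matrix from the example above.
X = np.array([[1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1],
              [1, 0, 1, 0, 0],
              [0, 0, 0, 0, 1]], dtype=float)
W = np.array([[0.3, 0.7],
              [0.4, 0.2],
              [0.3, 0.6],
              [0.1, 0.5],
              [0.5, 0.4]])

eta, eta_min, d = 0.6, 0.1, 0.1
while eta >= eta_min:
    for x in X:
        D = ((x[:, None] - W) ** 2).sum(axis=0)   # squared Euclidean distance per cluster
        j = int(np.argmin(D))                     # winning cluster
        W[:, j] += eta * (x - W[:, j])            # move the winner toward the sample
        print(D.round(4), j)                      # first two epochs match the slides
    eta -= d                                      # linear learning-rate update
print(W.round(3))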