
MACHINE LEARNING

UNIT-II
ARTIFICIAL NEURAL
NETWORKS

P. Swaroopa
Dept. of CSE
UNIT-II

Syllabus:
Artificial Neural Networks-1– Introduction, neural
network representation, appropriate problems for neural
network learning, perceptrons, multilayer networks and
the back-propagation algorithm.
Artificial Neural Networks-2- Remarks on the Back-
Propagation algorithm, An illustrative example: face
recognition, advanced topics in artificial neural networks.
Evaluating Hypotheses – Motivation, estimating
hypothesis accuracy, basics of sampling theory, a general
approach for deriving confidence intervals, difference in
error of two hypotheses, comparing learning algorithms.
INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS

Neural network learning methods provide a robust approach to approximating:
 real-valued target functions (e.g., the percentage marks obtained by a student),
 discrete-valued target functions (e.g., play tennis / enjoy sport or not), and
 vector-valued target functions (e.g., a series of outputs).
Artificial neural networks (ANNs) provide a general, practical
method for learning real-valued, discrete-valued, and vector-valued
functions from examples.
The BACKPROPAGATION algorithm has proven successful on practical
problems such as learning to recognize:
 handwritten characters
 spoken words
 faces
ANN learning is robust to errors in the training data and has been
successfully applied to problems such as interpreting visual scenes,
speech recognition, and learning robot control strategies.
Biological Motivation

Artificial neural networks (ANNs) are inspired by the
information-processing model of the human mind/brain.
The human brain consists of billions of neurons that
link with one another in an intricate pattern.
Every neuron receives information from many other
neurons, processes it, gets excited or not, and passes
its state information on to other neurons.

[Figure: Inputs → ANN → Output]
Biological Motivation

Artificial neural networks are built out of a densely
interconnected set of simple units, where each unit takes a
number of real-valued inputs (possibly the outputs of other
units) and produces a single real-valued output (which may
become the input to many other units).
For example, the human brain:
 is estimated to contain a densely interconnected network of approximately
10^11 neurons, each connected, on average, to 10^4 others. Neuron activity
is typically excited or inhibited through connections to other neurons.
 The fastest neuron switching times are known to be on the order of 10^-3
seconds, quite slow compared to computer switching speeds of 10^-10 seconds.
 Yet humans are able to make surprisingly complex decisions surprisingly
quickly. For example, it requires approximately 10^-1 seconds to visually
recognize your mother.
One motivation for ANN systems is to capture this
kind of highly parallel computation based on
distributed representations.
Most ANN software runs on sequential machines
emulating distributed processes, although faster
versions of the algorithms have also been
implemented on highly parallel machines and on
specialized hardware designed specifically for ANN
applications.



Business Applications of ANN

They are used in stock price prediction, where the
rules of the game are extremely complicated and a
lot of data needs to be processed very quickly.
They are used for character recognition, as in
recognizing handwritten text, or damaged or
mangled text.
They are used in recognizing fingerprints. These
are complicated patterns, unique to each person.
Layers of neurons can progressively clarify the pattern.

NEURAL NETWORK REPRESENTATIONS

A prototypical example of ANN learning is provided by Pomerleau's
(1993) system ALVINN (Autonomous Land Vehicle In a Neural
Network), which uses a learned ANN to steer an autonomous vehicle
driving at normal speeds on public highways.
The input to the neural network is a 30 x 32 grid of pixel
intensities obtained from a forward-pointed camera mounted on the
vehicle.
The network output is the direction in which the vehicle is
steered.
The ANN is trained to mimic the observed steering commands of a
human driving the vehicle for approximately 5 minutes.
 ALVINN has used its learned networks to successfully drive at speeds
up to 70 miles per hour and for distances of 90 miles on public
highways (driving in the left lane of a divided public highway, with
other vehicles present).
Neural network representation used in one version of the
ALVINN system
Each node (i.e., circle)
in the network diagram
corresponds to the
output of a single
network unit, and the
lines entering the node
from below are its
inputs.



Neural network learning to steer an autonomous vehicle

The ALVINN system uses BACKPROPAGATION to learn
to steer an autonomous vehicle (photo at top)
driving at speeds up to 70 miles per hour.
The diagram on the left shows
how the image of a forward-
mounted camera is mapped to
960 neural network inputs,
which are fed forward to 4
hidden units, connected to 30
output units.
Network outputs encode the
commanded steering direction.
The figure on the right shows
weight values for one of the hidden
units in this network.
The 30 x 32 weights into the hidden
unit are displayed in the large
matrix, with
 white blocks indicating
positive and
 black indicating negative
weights.
The weights from this hidden unit
to the 30 output units are depicted
by the smaller rectangular block
directly above the large block.
As can be seen from these output
weights, activation of this particular
hidden unit encourages a turn
toward the left.
APPROPRIATE PROBLEMS FOR NEURAL
NETWORK LEARNING

ANN learning is well suited to problems in which the
training data corresponds to noisy, complex sensor data,
such as inputs from cameras and microphones.
It is also applicable to problems for which more
symbolic representations are often used, such as
decision tree learning tasks.
The BACKPROPAGATION algorithm is the most
commonly used ANN learning technique.



APPROPRIATE PROBLEMS FOR NEURAL NETWORK
LEARNING(cont..)

It is appropriate for problems with the following
characteristics:
Instances are represented by many attribute-value pairs.
The target function output may be discrete-valued, real-
valued, or a vector of several real- or discrete-valued
attributes.
The training examples may contain errors.
Long training times are acceptable.
Fast evaluation of the learned target function may be
required.
The ability of humans to understand the learned target
function is not important.
PERCEPTRONS

One type of ANN system is based on a unit called a
perceptron (also called a linear classifier).
A perceptron takes a vector of real-valued inputs,
calculates a linear combination of these inputs, then
outputs
 1 if the result is greater than some threshold and
 -1 otherwise.



PERCEPTRONS(Cont..)

Given inputs x1 through xn, the output o(x1, ..., xn)
computed by the perceptron is
 o(x1, ..., xn) = 1 if w0 + w1x1 + w2x2 + ... + wnxn > 0
                = -1 otherwise
where each wi is a real-valued constant, or weight, that
determines the contribution of input xi to the perceptron output.
To simplify notation, imagine an additional constant input
x0 = 1; the condition above then becomes, in vector form, w · x > 0.

Perceptron function: o(x) = sgn(w · x), where sgn(y) = 1 if y > 0
and -1 otherwise.
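
A minimal Python sketch of this computation (the function name and the
NumPy dependency are my own choices, not from the slides):

    import numpy as np

    def perceptron_output(w, x):
        # Add the constant input x0 = 1 so w[0] plays the role of the
        # threshold weight w0.
        x = np.concatenate(([1.0], x))
        # Output 1 if w . x > 0, else -1.
        return 1 if np.dot(w, x) > 0 else -1

    # Example call with hypothetical weights:
    print(perceptron_output([-0.8, 0.5, 0.5], [1, 1]))   # prints 1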
Perceptron

1. Representational Power of Perceptrons


2. The Perceptron Training Rule
3. Gradient Descent and the Delta Rule
a) VISUALIZING THE HYPOTHESIS SPACE
b) DERIVATION OF THE GRADIENT DESCENT RULE
c) STOCHASTIC APPROXIMATION TO GRADIENT DESCENT
4. Remarks



1. Representational Power of Perceptrons

 The perceptron outputs a 1 for instances lying on one side of
the hyperplane and outputs a -1 for instances lying on the
other side.
 The decision hyperplane is w · x = 0.
 Some sets of positive and negative examples cannot be
separated by any hyperplane. Those that can be separated
are called linearly separable sets of examples.
 A single perceptron can be used to represent many boolean
functions. For example, if we assume boolean values of 1 (true) and
-1 (false), then one way to use a two-input perceptron to implement the
 AND function is to set the weights w0 = -0.8 and w1 = w2 = 0.5
 OR function instead is to alter the threshold weight to w0 = 0.3
(both settings are verified in the sketch below)
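
The following check (my own illustration) verifies that these weights
reproduce the AND and OR truth tables under the 1/-1 encoding:

    import itertools
    import numpy as np

    def perceptron_output(w, x):
        return 1 if w[0] + np.dot(w[1:], x) > 0 else -1

    w_and = np.array([-0.8, 0.5, 0.5])   # w0 = -0.8, w1 = w2 = 0.5
    w_or = np.array([0.3, 0.5, 0.5])     # same weights, threshold w0 = 0.3

    for x1, x2 in itertools.product([1, -1], repeat=2):
        assert perceptron_output(w_and, [x1, x2]) == (1 if (x1 == 1 and x2 == 1) else -1)
        assert perceptron_output(w_or, [x1, x2]) == (1 if (x1 == 1 or x2 == 1) else -1)
    print("AND and OR truth tables reproduced")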



The decision surface represented by a two-input
perceptron.



2. The Perceptron Training Rule
How to learn the weights for a single perceptron?
The precise learning problem is to determine a weight vector that causes
the perceptron to produce the correct ±1 output for each training example.
Algorithms to solve this learning problem:
 The perceptron rule
 The delta rule
The perceptron rule modifies the perceptron weights whenever it
misclassifies an example. This process is repeated, iterating through the
training examples as many times as needed, until the perceptron classifies
all training examples correctly.
Weights are modified at each step according to the perceptron training rule,
which revises the weight wi associated with input xi as
 wi ← wi + Δwi
 Δwi = η(t - o)xi
Here,
 t is the target output for the current training example,
 o is the output generated by the perceptron, and
 η is a positive constant called the learning rate.
The role of the learning rate is to moderate the degree to which
weights are changed at each step.
It is usually set to some small value (e.g., 0.1) and is
sometimes made to decay as the number of weight-tuning
iterations increases.
Why should this update rule converge toward successful weight values?
 If the training example is already correctly classified by the
perceptron, then (t - o) = 0 and Δwi = 0, so no weights are updated.
 If the perceptron outputs -1 when the target output is +1, then to
make the perceptron output +1 instead of -1, the weights must be
altered to increase the value of w · x. Here (t - o) = 2, so each
Δwi = 2ηxi pushes w · x in the required direction. A training-loop
sketch follows.
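
A minimal sketch of the full perceptron training procedure (names and
the max_epochs cutoff are my own choices); it is guaranteed to terminate
with all examples correct only when the data are linearly separable:

    import numpy as np

    def train_perceptron(X, t, eta=0.1, max_epochs=100):
        X = np.hstack([np.ones((len(X), 1)), X])  # prepend constant input x0 = 1
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            errors = 0
            for x, target in zip(X, t):
                o = 1 if np.dot(w, x) > 0 else -1
                if o != target:
                    w += eta * (target - o) * x   # wi <- wi + eta*(t - o)*xi
                    errors += 1
            if errors == 0:                       # every example classified correctly
                break
        return w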



3. Gradient Descent and the Delta Rule

The perceptron rule finds a successful weight vector when the
training examples are linearly separable, but it can fail to converge
if the examples are not linearly separable.
The delta rule is designed to overcome this difficulty.
The key idea behind the delta rule is to use gradient descent to
search the hypothesis space of possible weight vectors to find
the weights that best fit the training examples.
Gradient descent provides the basis for the
BACKPROPAGATION algorithm, which can learn networks
with many interconnected units.
The delta training rule is best understood by considering the
task of training an unthresholded perceptron; that is, a linear
unit for which the output o is given by
 o(x) = w · x
To derive a weight learning rule for linear units, let us begin
by specifying a measure for the training error of a hypothesis
(weight vector), relative to the training examples.
The measure of the training error:
 E(w) = (1/2) Σ_{d∈D} (td - od)²
where
 D is the set of training examples,
 td is the target output for training example d, and
 od is the output of the linear unit for training example d.
 By this definition, E(w) is simply half the squared difference between the
target output td and the linear unit output od, summed over all training
examples.
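
As a small illustration (function and variable names are my own choices),
E(w) can be computed directly from this definition:

    import numpy as np

    def training_error(w, X, t):
        X = np.hstack([np.ones((len(X), 1)), X])       # constant input x0 = 1
        o = X @ w                                      # linear unit outputs o_d
        return 0.5 * np.sum((np.asarray(t) - o) ** 2)  # E(w) = (1/2) Σ (t_d - o_d)²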
3.1 Visualizing The Hypothesis Space

For a linear unit, the error surface E(w) is parabolic with a single
global minimum. Gradient descent starts with an arbitrary initial weight
vector and repeatedly alters it in the direction of steepest descent
along the error surface, until the global minimum error is reached.


3.2 Derivation Of The Gradient Descent Rule

How can we calculate the direction of steepest descent along
the error surface?
The direction can be found by computing the derivative of E with
respect to each component of the vector w. This vector derivative
is called the gradient of E with respect to w, written
 ∇E(w) ≡ [ ∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn ]


Differentiating E for the linear unit gives each gradient component:
 ∂E/∂wi = ∂/∂wi (1/2) Σ_{d∈D} (td - od)² = Σ_{d∈D} (td - od)(-xid)
where xid is the value of input xi for training example d.
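
A quick sanity check (my own illustration, on made-up random data) that
this analytic gradient matches a finite-difference estimate:

    import numpy as np

    def error(w, X, t):
        return 0.5 * np.sum((t - X @ w) ** 2)     # E(w) for a linear unit

    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # x0 = 1 included
    t = rng.normal(size=5)
    w = rng.normal(size=3)

    analytic = -X.T @ (t - X @ w)   # Σ_d (t_d - o_d)(-x_id), from the derivation
    eps = 1e-6
    numeric = np.array([
        (error(w + eps * np.eye(3)[i], X, t) - error(w - eps * np.eye(3)[i], X, t))
        / (2 * eps)
        for i in range(3)
    ])
    print(np.allclose(analytic, numeric))  # expected: True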
weight update rule for gradient descent

 w ← w + Δw, where Δw = -η ∇E(w); in component form,
 Δwi = η Σ_{d∈D} (td - od) xid
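
A minimal sketch of batch gradient descent for a linear unit built from
this rule (names and the learning-rate value are my own choices; η must
be small enough for the iterates to converge):

    import numpy as np

    def gradient_descent(X, t, eta=0.01, epochs=1000):
        X = np.hstack([np.ones((len(X), 1)), X])  # constant input x0 = 1
        t = np.asarray(t, float)
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            o = X @ w                  # linear unit outputs for ALL examples
            w += eta * X.T @ (t - o)   # Δwi = η Σ_d (t_d - o_d) x_id
        return w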


3.3 STOCHASTIC APPROXIMATION TO GRADIENT DESCENT

Gradient descent is an important general paradigm for learning.
It is a strategy for searching through a large or
infinite hypothesis space that can be applied
whenever
 (1) the hypothesis space contains continuously
parameterized hypotheses (e.g., the weights in a
linear unit), and
 (2) the error can be differentiated with respect to
these hypothesis parameters.



The key practical difficulties in applying gradient
descent are
 (1) converging to a local minimum can sometimes be quite
slow (i.e., it can require many thousands of gradient descent
steps), and
 (2) if there are multiple local minima in the error surface, then
there is no guarantee that the procedure will find the global
minimum.
One common variation on gradient descent intended
to alleviate these difficulties is called incremental
gradient descent, or alternatively stochastic gradient
descent
The key differences between standard gradient descent and
stochastic gradient descent:
 In standard gradient descent, the error is summed over all examples
before updating weights, whereas in stochastic gradient descent weights
are updated upon examining each training example.
 Summing over multiple examples requires more computation per
weight-update step; on the other hand, because it uses the true gradient,
standard gradient descent is often used with a larger step size per update.
 In cases where there are multiple local minima with respect to E(w),
stochastic gradient descent can sometimes avoid falling into these
local minima.
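
A minimal sketch of the stochastic (incremental) version of the delta
rule (names are my own choices); note that the weight update now sits
inside the per-example loop:

    import numpy as np

    def stochastic_gradient_descent(X, t, eta=0.05, epochs=100):
        X = np.hstack([np.ones((len(X), 1)), X])  # constant input x0 = 1
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, target in zip(X, t):           # update after EACH example
                o = np.dot(w, x)                  # linear unit output
                w += eta * (target - o) * x       # Δwi = η (t_d - o_d) x_id
        return w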


MULTILAYER NETWORKS AND THE
BACKPROPAGATION ALGORITHM

Single perceptrons can only express linear decision surfaces.
Multilayer networks learned by the BACKPROPAGATION
algorithm are capable of expressing a rich variety of
nonlinear decision surfaces.
Example: a speech recognition task that involves distinguishing
among 10 possible vowels, all spoken in the context of "h_d"
(e.g., "hid", "had", "head", "hood").
• A Differentiable Threshold Unit

We need a unit whose output is a nonlinear function
of its inputs, but whose output is also a differentiable
function of its inputs.
One solution is the sigmoid unit, a unit very
much like a perceptron, but based on a smoothed,
differentiable threshold function.
The sigmoid unit is illustrated in the following
figure:



The sigmoid unit first computes a linear
combination of its inputs, then applies a
threshold to the result.
In the case of the sigmoid unit, however, the threshold
output is a continuous function of its input. More
precisely, the sigmoid unit computes its output o as
 o = σ(w · x), where σ(y) = 1 / (1 + e^(-y))
σ is often called the sigmoid function or, alternatively, the
logistic function. Note that its output ranges between 0 and
1, increasing monotonically with its input, and its derivative
has the convenient form σ'(y) = σ(y)(1 - σ(y)).
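
A minimal sketch of a sigmoid unit (names are my own choices):

    import numpy as np

    def sigmoid(y):
        return 1.0 / (1.0 + np.exp(-y))     # output in (0, 1), monotonic in y

    def sigmoid_unit_output(w, x):
        x = np.concatenate(([1.0], x))      # constant input x0 = 1
        return sigmoid(np.dot(w, x))        # o = sigmoid(w . x)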



The BACKPROPAGATION Algorithm

BACKPROPAGATION learns the weights for a multilayer network by using
gradient descent to minimize the squared error between the network
output values and the target values. In the stochastic version, for
each training example: (1) the input is propagated forward through the
network to compute every unit's output; (2) an error term δ is computed
for each output unit and propagated backward to each hidden unit; and
(3) every weight is updated by wji ← wji + η δj xji.
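
A minimal sketch of this stochastic version for a network with one
hidden layer of sigmoid units (the array layout, names, and
hyperparameter values are my own choices, not the slides'):

    import numpy as np

    def sigmoid(y):
        return 1.0 / (1.0 + np.exp(-y))

    def backpropagation(X, T, n_hidden, eta=0.05, epochs=5000, seed=0):
        X, T = np.asarray(X, float), np.asarray(T, float)
        rng = np.random.default_rng(seed)
        n_in, n_out = X.shape[1], T.shape[1]
        # Initialize all weights to small random values; column 0 is the bias.
        W_h = rng.uniform(-0.05, 0.05, (n_hidden, n_in + 1))
        W_o = rng.uniform(-0.05, 0.05, (n_out, n_hidden + 1))
        for _ in range(epochs):
            for x, t in zip(X, T):
                # 1. Propagate the input forward through the network.
                x1 = np.concatenate(([1.0], x))       # add constant input x0 = 1
                h = sigmoid(W_h @ x1)                 # hidden unit outputs
                h1 = np.concatenate(([1.0], h))
                o = sigmoid(W_o @ h1)                 # output unit outputs
                # 2. Propagate the errors backward.
                delta_o = o * (1 - o) * (t - o)                   # output units
                delta_h = h * (1 - h) * (W_o[:, 1:].T @ delta_o)  # hidden units
                # 3. Update each weight: w_ji <- w_ji + eta * delta_j * x_ji
                W_o += eta * np.outer(delta_o, h1)
                W_h += eta * np.outer(delta_h, x1)
        return W_h, W_o

For instance, with X = [[0,0],[0,1],[1,0],[1,1]] and T = [[0],[1],[1],[0]],
backpropagation(X, T, n_hidden=2) can learn the XOR function (given enough
epochs and a suitable learning rate), which no single perceptron can represent.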


ADDING MOMENTUM

A common variation makes the weight update on the nth iteration depend
partially on the update that occurred during the (n-1)th iteration:
 Δwji(n) = η δj xji + α Δwji(n-1)
where 0 ≤ α < 1 is a constant called the momentum. Momentum can carry
the search through small local minima and along flat regions of the
error surface, and it gradually increases the step size in regions
where the gradient does not change.
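
Since a worked momentum update needs its surrounding loop, here is a
small self-contained sketch applying momentum to gradient descent on a
linear unit (names and constant values are my own choices):

    import numpy as np

    def gradient_descent_momentum(X, t, eta=0.01, alpha=0.9, epochs=1000):
        X = np.hstack([np.ones((len(X), 1)), X])   # constant input x0 = 1
        t = np.asarray(t, float)
        w = np.zeros(X.shape[1])
        dw = np.zeros_like(w)                      # previous update, Δw(n-1)
        for _ in range(epochs):
            grad_step = eta * X.T @ (t - X @ w)    # η Σ_d (t_d - o_d) x_d
            dw = grad_step + alpha * dw            # Δw(n) = η(...) + α Δw(n-1)
            w += dw
        return w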


Derivation of the Backpropagation Algorithm

Case 1: training rule for output unit weights. For each output unit k,
the error term is
 δk = ok(1 - ok)(tk - ok)

Case 2: training rule for hidden unit weights. A hidden unit h sums the
error terms of the output units it influences, weighted by the
connecting weights:
 δh = oh(1 - oh) Σ_{k∈outputs} wkh δk

In both cases the factor o(1 - o) is the derivative of the sigmoid
function, and each weight is then updated by Δwji = η δj xji.


Remarks on the Backpropagation Algorithm

Key remarks: gradient descent over the network weights may converge to a
local rather than global minimum of E, though BACKPROPAGATION works well
in practice; feedforward networks of sigmoid units can represent a rich
space of nonlinear functions; hidden units learn intermediate
representations not explicit in the inputs; and training too long can
overfit the training data, so a separate validation set is commonly used
to decide when to stop.


