Neural Networks Question Bank
Figure 1
2. A 3-input neuron is trained to output a zero when the input is 110 and a one when the input is
111. After generalisation, the output will be zero when and only when the input is:
(a) 000 or 110 or 011 or 101
(b) 010 or 100 or 110 or 101
(c) 000 or 010 or 110 or 100
Training data (the output for inputs marked $ is not specified):

Input   Output
000     $
001     $
010     $
011     $
100     $
101     $
110     0
111     1

After generalisation, the neuron reproduces the last bit of the input:

Input   Output
000     0
001     1
010     0
011     1
100     0
101     1
110     0
111     1
Therefore, the output will be zero when the input is 000 or 010 or 110 or 100
3. A perceptron is:
(a) a single layer feed-forward neural network with preprocessing
(b) an autoassociative neural network
(c) a double layer autoassociative neural network
10. The network shown in Figure 1 is trained to recognize the characters H and T as shown below:
[The character patterns and answer options (a)-(c) were given as figures and are not reproduced.]
11. With a supervised learning algorithm, we can specify target output values, but we may never
get close to those targets at the end of learning. Give two reasons why this might happen.
Answer:
(i) the data may be valid, but the inconsistency results from a stochastic aspect of the task (or some aspect of
the task is not modelled by the input data collected);
(ii) the data may contain errors, e.g. measurement errors or typographical errors.
12. Describe the architecture and the computational task of the NetTalk neural network.
Answer:
Each group of 29 input units represents a letter, so the seven groups of inputs together represent seven letters.
The computational task is to output the representation of the phoneme corresponding to the middle letter of the
seven. (In the original NetTalk of Sejnowski and Rosenberg, the 7 × 29 = 203 input units fed a single hidden
layer of about 80 units, which fed 26 output units encoding phoneme features.)
13. Why does a time-delay neural network (TDNN) have the same set of incoming weights for each
column of hidden units?
Answer:
To provide temporal translation invariance, or so that the TDNN will be able to identify the input sound
no matter which frame the input sound begins in.
15. Draw the weight matrix for a feedforward network, showing the partitioning. You can assume
that the weight matrix for connections from the input layer to the hidden layer is Wih, and that the
weight matrix for connections from the hidden layer to the output layer is Who.
Answer:
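The answer figure is missing in the source; as a sketch, assuming the common convention that rows and columns are both ordered as (input, hidden, output) units and that entry (j, i) holds the weight from unit i to unit j, the partitioned weight matrix of a strictly feedforward network would be:

$$
W = \begin{pmatrix} 0 & 0 & 0 \\ W_{ih} & 0 & 0 \\ 0 & W_{ho} & 0 \end{pmatrix}
$$

The zero blocks reflect that a strictly feedforward network has no intra-layer, backward, or input-to-output connections.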
16. In a Jordan network with i input neurons, h hidden layer neurons, and o output neurons:
(a) how many neurons will there be in the state vector, and
(b) if i = 4, h = 3, and o = 2, draw a diagram showing the connectivity of the network. Do not forget
the bias unit.
Answer:
(a) o neurons in the state vector (the same as the output vector; that's the letter o, not zero)
(b)
17. Draw a diagram illustrating the architecture of Elman’s simple recurrent network that performs
a temporal version of the XOR task. How are the two inputs to XOR provided to this network?
Answer:
The two inputs are presented sequentially, one bit per time step, to the single input unit of the temporal XOR net.
18. Briefly describe the use of cluster analysis in Elman’s lexical class discovery experiments, and
one of his conclusions from this.
Answer:
Elman clustered hidden unit activation patterns corresponding to different input vectors and different
sequences of input units. He found that the clusters corresponded well to the grammatical contexts in
which the inputs (or input sequences) occurred, and thus concluded that the network had in effect learned
the grammar.
19. Draw an architectural diagram of a rank 2 tensor product network where the dimensions of the
input/output vectors are 3 and 4. You do not need to show the detailed internal structure of the
binding units.
Answer:
20. Draw a diagram of a single binding unit in a rank 2 tensor product network illustrating the
internal operation of the binding unit in teaching mode.
Answer:
21. Define the concepts of dense and sparse random representations. How do their properties
compare with those of an orthonormal set of representation vectors.
Answer:
In a dense random representation, each vector component is chosen at random from a uniform
distribution over say [–1, +1]. In a sparse random representation, the non-zero components are chosen in
this way, but most components are chosen (at random) to be zero. In both cases, the vectors are
normalised so that they have length 1.
Members of orthonormal sets of vectors have length one, and are orthogonal to one another. Vectors in
dense and sparse random representations are “orthogonal on average” – their inner products have a
mean of zero.
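A minimal sketch of generating such vectors (the [-1, +1] interval and length-1 normalisation follow the text; function names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng()

def dense_random(m):
    """Dense random representation: every component uniform on [-1, +1]."""
    v = rng.uniform(-1.0, 1.0, m)
    return v / np.linalg.norm(v)        # normalise to length 1

def sparse_random(m, k):
    """Sparse random representation: only k randomly chosen components are non-zero."""
    v = np.zeros(m)
    idx = rng.choice(m, size=k, replace=False)
    v[idx] = rng.uniform(-1.0, 1.0, k)
    return v / np.linalg.norm(v)

# "Orthogonal on average": inner products of independent vectors cluster around 0.
print(np.mean([dense_random(100) @ dense_random(100) for _ in range(1000)]))
```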
22. What is a Hadamard matrix? Describe how a Hadamard matrix can be used to produce suitable
distributed concept representation vectors for a tensor product network. What are the properties
of the Hadamard matrix that makes the associated vectors suitable?
Answer:
A Hadamard matrix H is a square matrix of side n, all of whose entries are ±1, which satisfies H·Hᵀ = n·Iₙ,
where Iₙ is the identity matrix of side n. The rows of a Hadamard matrix, once normalised (divided by √n), can
be used as distributed representation vectors in a tensor product network. This is because the rows are
orthogonal to each other and have no zero components.
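A quick numerical check, as a sketch using SciPy's Hadamard matrices (Sylvester construction, so the side must be a power of two):

```python
import numpy as np
from scipy.linalg import hadamard

n = 8
H = hadamard(n)                              # entries are +1 / -1
assert np.allclose(H @ H.T, n * np.eye(n))   # H H^T = n I_n
V = H / np.sqrt(n)                           # normalised rows
print(V @ V.T)                               # identity: rows are orthonormal
```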
23. In a 2-D self-organising map with input vectors of dimension m, and k neurons in the map,
(a) how many weights are there, and (b) how is the winning neuron for an input pattern determined?
Answer:
(a) mk.
(b) The weight vector for each of the neurons in the SOM also has dimension m. So for neuron j, the weight
vector will be:
wj = [wj1, wj2, …, wjm]ᵀ
For an input pattern x, compute the inner product wj·x for each neuron, and choose the largest inner
product. Let i(x) denote the index of the winning neuron (and also the output of a trained SOM).
Answer:
Given a set of vectors X, the Voronoi cells about those vectors are the cells that partition the space they
lie in, according to the nearest-neighbour rule. That is, the Voronoi cell that a vector lies in is the one
belonging to the x ∈ X to which it is closest.
26. Briefly explain the term code book in the context of learning vector quantisation.
Answer:
When compressing data by representing vectors by the labels of a relatively small set of reconstruction
vectors, the set of reconstruction vectors is called the code book.
27. Describe the relationship between the Self-Organising Map algorithm, and the Learning Vector
Quantisation algorithm.
Answer:
In order to use Learning Vector Quantisation (LVQ), a set of approximate reconstruction vectors is first
found using the unsupervised SOM algorithm. The supervised LVQ algorithm is then used to fine-tune the
vectors found using SOM.
Answer:
An attractor is a bounded subset of space to which non-trivial regions of initial conditions converge as time
passes. Pick two of …
• chaotic attractor: stays within a bounded region of space, but follows no predictable cyclic path
29. Write down the energy function of a BSB network with weight matrix W, feedback constant β,
and activation vector x.
Answer:
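The answer formula is missing in the source; the standard energy function of the BSB (brain-state-in-a-box) model, as given for example in Haykin's treatment, is:

$$
E = -\frac{\beta}{2}\, x^{\mathsf{T}} W x
$$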
30. Compute the weight matrix for a 4-neuron Hopfield net with the single fundamental memory ξ1
= [1,–1, –1,1] stored in it.
Answer:
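The answer matrix is missing; as a sketch of the computation under the standard Hopfield storage prescription W = ξ1·ξ1ᵀ with the diagonal zeroed (some texts additionally scale by 1/N):

```python
import numpy as np

xi = np.array([1, -1, -1, 1])
W = np.outer(xi, xi)        # xi * xi^T
np.fill_diagonal(W, 0)      # no self-connections
print(W)
# [[ 0 -1 -1  1]
#  [-1  0  1 -1]
#  [-1  1  0 -1]
#  [ 1 -1 -1  0]]
```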
● Classification
► Pattern recognition, feature extraction, image matching
● Noise Reduction
► Recognize patterns in the inputs and produce noiseless outputs
● Prediction
► Extrapolation based on historical data
● Ability to learn
► NN’s figure out how to perform their function on their own
► Determine their function based only upon sample inputs
● Ability to generalize
► i.e. produce reasonable outputs for inputs it has not been taught how to deal with
34. How do Neural Networks Work?
• A neuron receives signals from many other neurons, combines them, and sends one output signal to many
other neurons, possibly including its input neurons (recurrent network)
• In biological systems, one neuron can be connected to as many as 10,000 other neurons.
• Usually, a neuron receives its information from other neurons in a confined area, its so-called
receptive field.
• NNs are able to learn by adapting their connectivity patterns so that the organism improves its
behavior in terms of reaching certain (evolutionary) goals.
The output of a neuron is a function of the weighted sum of the inputs plus a bias
● The function of the entire neural network is simply the computation of the outputs of all the neurons
► An entirely deterministic calculation
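As a sketch in code, the whole computation of a single neuron (the choice of tanh as the function f is illustrative):

```python
import numpy as np

def neuron_output(w, x, b, f=np.tanh):
    """Output of one neuron: f(weighted sum of inputs plus bias)."""
    return f(np.dot(w, x) + b)

print(neuron_output(np.array([0.5, -0.2]), np.array([1.0, 2.0]), b=0.1))
```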
35. Explain Gaussian Neurons

fi(neti(t)) = e^((neti(t) − 1) / σ²)
36. Explain Sigmoidal Neurons

Sigmoidal neurons accept any vectors of real numbers as input, and they output a real number between 0
and 1. Sigmoidal neurons are the most common type of artificial neuron, especially in learning networks.
A network of sigmoidal units with m input neurons and n output neurons realizes a network function
f: Rᵐ → (0, 1)ⁿ

fi(neti(t)) = 1 / (1 + e^(−(neti(t) − θ) / τ))
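A minimal sketch of this output function (parameter names follow the formula above):

```python
import numpy as np

def sigmoid(net, theta=0.0, tau=1.0):
    """Sigmoidal output in (0, 1); theta shifts the curve, tau sets its steepness."""
    return 1.0 / (1.0 + np.exp(-(net - theta) / tau))

print(sigmoid(0.0), sigmoid(5.0), sigmoid(-5.0))   # 0.5, ~0.993, ~0.007
```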
37. Explain Correlation Learning

Hebbian Learning (1949):
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in
firing it, some growth process or metabolic change takes place in one or both cells such that A's
efficiency, as one of the cells firing B, is increased."
Δwi,j = c⋅xi⋅xj
Eventually, the connection strength will reflect the correlation between the neurons’ outputs.
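As a sketch, one Hebbian update for the weight between units i and j, following the rule above (the learning constant c = 0.01 is an illustrative choice):

```python
def hebbian_update(w_ij, x_i, x_j, c=0.01):
    """Strengthen the connection in proportion to correlated activity."""
    return w_ij + c * x_i * x_j
```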
• Winner neuron adapts its tuning (pattern of weights) even further towards the current input
Obviously, the fact that threshold units can only output the values 0 and 1 restricts their applicability to
certain problems.
We can overcome this limitation by eliminating the threshold and simply turning fi into the identity function,
so that we get:

xi(t) = neti(t)
With this kind of neuron, we can build feedforward networks with m input neurons and n output neurons
that compute a function f: Rm → Rn
Linear neurons are quite popular and useful for applications such as interpolation.
However, they have a serious limitation: Each neuron computes a linear function, and therefore the
overall network function f: Rm → Rn is also linear.
This means that if an input vector x results in an output vector y, then for any factor φ the input φ⋅x will
result in the output φ⋅y.
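This limitation is easy to demonstrate numerically; a sketch, with the matrix W standing for the whole linear network function:

```python
import numpy as np

W = np.array([[1.0, -2.0, 0.5],
              [0.3,  0.7, -1.0]])   # linear network function f(x) = W x
x = np.array([0.2, -0.4, 1.0])
phi = 3.0
# scaling the input by phi scales the output by phi
assert np.allclose(W @ (phi * x), phi * (W @ x))
```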
Gradient descent is a very common technique to find the absolute minimum of a function.
It is especially useful for high-dimensional functions. We will use it to iteratively minimize the network's
(or neuron's) error by finding the gradient of the error surface in weight-space and adjusting the
weights in the opposite direction.
Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x):

xi+1 = xi − η·f′(xi)

Repeat this iteratively until, for some xi, f′(xi) is sufficiently close to 0.
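A sketch of this iteration (the step size eta and the stopping tolerance are assumed parameters):

```python
def minimize_1d(f_prime, x, eta=0.1, tol=1e-6, max_iter=10_000):
    """Repeat x <- x - eta * f'(x) until the slope is close to zero."""
    for _ in range(max_iter):
        g = f_prime(x)
        if abs(g) < tol:
            break
        x -= eta * g
    return x

# Example: f(x) = (x - 3)^2 has f'(x) = 2(x - 3) and its minimum at x = 3.
print(minimize_1d(lambda x: 2 * (x - 3), x=0.0))   # ~3.0
```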
Gradients of two-dimensional functions: [illustration not reproduced]
Algorithm Perceptron;
Start with a randomly chosen weight vector w0;
Let k = 1;
while there exist input vectors that are misclassified by wk−1, do
    Let ij be a misclassified input vector;
    Let xk = class(ij)·ij, implying that wk−1·xk < 0;
    Update the weight vector to wk = wk−1 + η·xk;
    Increment k;
end-while;
and i⋅i is the square of the length of vector i and is thus positive.
If class(i) = 1, things are the same but with opposite signs; we introduce x to unify these two cases.
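A runnable sketch of the algorithm above (classes are ±1; the AND data set, the zero initial weights, and η = 1 are illustrative choices, and each input includes a constant bias component):

```python
import numpy as np

def train_perceptron(inputs, classes, eta=1.0, max_epochs=1000):
    """Perceptron learning: update w whenever x = class(i)*i has w.x <= 0."""
    w = np.zeros(inputs.shape[1])
    for _ in range(max_epochs):
        updated = False
        for i, c in zip(inputs, classes):
            x = c * i                 # unify the class(i) = +1 / -1 cases
            if np.dot(w, x) <= 0:     # misclassified
                w = w + eta * x
                updated = True
        if not updated:
            break                     # all vectors classified correctly
    return w

# AND function, with the bias input fixed at 1:
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([-1, -1, -1, 1])
print(train_perceptron(X, y))
```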
The Adaline uses gradient descent to determine the weight vector that leads to minimal error.
Error is defined as the MSE between the neuron’s net input netj and its desired output dj (= class(ij))
across all training samples ij.
The idea is to pick samples in random order and perform (slow) gradient descent in their individual error
functions.
This technique allows incremental learning, i.e., refining of the weights as more training samples are
added.

For a single training sample ij, the error is

E = (dj − netj)²

Its partial derivative with respect to weight wk is

∂E/∂wk = 2(dj − netj) · ∂/∂wk ( dj − Σl=0..n wl·il,j ) = −2(dj − netj)·ik,j

The gradient is then given by

[∂E/∂w0, …, ∂E/∂wn]ᵀ = −2(dj − netj)·ij
For gradient descent, Δw should be a negative multiple of the gradient:

Δw = −η·∇E = 2η·(dj − netj)·ij
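One incremental Adaline step, as a sketch of the update above (the factor 2 is often folded into the learning rate):

```python
import numpy as np

def adaline_step(w, i_j, d_j, eta=0.01):
    """Stochastic gradient descent on E = (d_j - net_j)^2 for one sample."""
    net_j = np.dot(w, i_j)
    return w + 2 * eta * (d_j - net_j) * i_j
```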
43. Explain the difference between Internal Representation Issues and External Interpretation
Issues?
As we said before, in all network types, the amplitude of input signals and internal signals is limited:
Without this limitation, patterns with large amplitudes would dominate the network’s behavior.
A disproportionately large input signal can activate a neuron even if the relevant connection weight is very
small.
From the perspective of the embedding application, we are concerned with the interpretation of input and
output signals.
These signals constitute the interface between the embedding application and its NN component.
Often, these signals only become meaningful when we define an external interpretation for them.
This is analogous to biological neural systems: the same signal takes on a completely different meaning
when it is interpreted by different brain areas (motor cortex, visual cortex, etc.).
Without any interpretation, we can only use standard methods to define the difference (or similarity)
between signals.
• … treat them as vectors and use the cosine of the angle between them as a measure of similarity,
for example for two long binary vectors:
x = 00010…
y = 10000…
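A sketch of this standard similarity measure (the example bit vectors are illustrative, not those from the source):

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between two signal vectors, in [-1, 1]."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
y = np.array([1, 0, 0, 0, 0, 1, 0, 1], dtype=float)
print(cosine_similarity(x, y))
```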
44. Explain the process of data representation?
• Most networks process information in the form of input pattern vectors.
• These networks produce output pattern vectors that are interpreted by the embedding application.
• All networks process one of two types of signal components: analog (continuously variable)
signals or discrete (quantized) signals.
We are going to consider internal representation and external interpretation issues as well as specific
methods for creating appropriate representations.
45. Explain the process of Multiclass Discrimination?
A four-node perceptron for a four-class problem in n-dimensional input space: [figure not reproduced]
In production mode, the network decides that its current input is in the k-th class if and only if ok = 1, and
for all j ≠ k, oj = 0, otherwise it is misclassified.
For units with real-valued output, the neuron with maximal output can be picked to indicate the class of
the input.
This maximum should be significantly greater than all other outputs, otherwise the input is misclassified.
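A sketch of this decision rule for real-valued outputs (the margin threshold is an assumed parameter):

```python
import numpy as np

def decide_class(outputs, margin=0.1):
    """Return the index of the clearly maximal output, or None (misclassified)."""
    k = int(np.argmax(outputs))
    runner_up = np.partition(outputs, -2)[-2]   # second-largest output
    return k if outputs[k] - runner_up >= margin else None

print(decide_class(np.array([0.1, 0.9, 0.2, 0.3])))    # 1
print(decide_class(np.array([0.5, 0.52, 0.1, 0.2])))   # None: too close to call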
• Supervised learning:
An archaeologist determines the gender of a human skeleton based on many past examples of
male and female skeletons.
• Unsupervised learning:
The archaeologist determines whether a large number of dinosaur skeleton fragments belong to
the same species or multiple species. There are no previous data to guide the archaeologist, and
no absolute criterion of correctness.
47. Explain different ways of representing data in a neural network system? 10
As you know, there is no equation that would tell you the ideal number of neurons in a multi-layer
network.
Ideally, we would like to use the smallest number of neurons that allows the network to do its task
sufficiently accurately, because of:
• the small number of weights in the system,
• fewer training samples being required,
• faster training,
• typically, better generalization for new test samples.
So far, we have determined the number of hidden-layer units in BPNs by “trial and error.”
However, there are algorithmic approaches for adapting the size of a network to a given task.
Some techniques start with a large network and then iteratively prune connections and nodes that
contribute little to the network function.
Other methods start with a minimal network and then add connections and nodes until the network
reaches a given performance level.
Finally, there are algorithms that combine these “pruning” and “growing” approaches.
51. Explain Covariance and Correlation

cov(x, y) = (1/n) Σi=1..n (xi − x̄)(yi − ȳ)
Covariance tells us something about the strength and direction (directly vs. inversely proportional) of the
linear relationship between x and y.
For many applications, it is useful to normalize this variable so that it ranges from -1 to 1.
The result is the correlation coefficient r, which for a dataset (xi, yi) with i = 1, …, n is given by:

r = corr(x, y) = Σi=1..n (xi − x̄)(yi − ȳ) / √( Σi=1..n (xi − x̄)² · Σi=1..n (yi − ȳ)² )
In the case of high (close to 1) or low (close to -1) correlation coefficients, we can use one variable as a
predictor of the other one.
To quantify the linear relationship between the two variables, we can use linear regression.
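A sketch of both statistics, following the formulas above:

```python
import numpy as np

def covariance(x, y):
    """cov(x, y) = mean of (x_i - x_bar)(y_i - y_bar)."""
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation(x, y):
    """Pearson correlation coefficient r, normalized to [-1, 1]."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum())

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
print(covariance(x, y), correlation(x, y))   # r close to +1
```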
52. What are the benefits of having the smallest number of neurons in the network? 4
53. Describe the cascade correlation algorithm. What is it used for? What are its advantages?
We start with a minimal network consisting of only the input neurons (one of them should be a constant
offset = 1) and the output neurons, completely connected as usual.
The output neurons (and later the hidden neurons) typically use output functions that can also produce
negative outputs; e.g., we can subtract 0.5 from our sigmoid function for a (-0.5, 0.5) output range.
Then we successively add hidden-layer neurons and train them to reduce the network error step by step:
Weights to each new hidden node are trained to maximize the covariance S of the node's output with the
current network error, where
wnew: vector of weights to the new node
Ek,p: error of the k-th output node for the p-th input sample before the new node is added

Since we want to maximize S (as opposed to minimizing some error), we use gradient ascent:

Δwi = η·∂S/∂wi = η Σk=1..K Σp=1..P Sk·(Ek,p − Ēk)·f′p·Ii,p

η: learning rate
Sk: sign of the covariance between the new node's output and the k-th output-node error
f′p: derivative of the node's activation function with respect to its net input, evaluated at the p-th pattern
Ii,p: i-th input for the p-th pattern
If we can find weights so that the new node’s output perfectly covaries with the error in each output node,
we can set the new output node weights and offsets so that the new error is zero.
More realistically, there will be no perfect covariance, which means that we will set each output node
weight so that the error is minimized.
To do this, we can use gradient descent or linear regression for each individual output node weight.
The next added hidden node will further reduce the remaining network error, and so on, until we reach a
desired error threshold.
This learning algorithm is much faster than backpropagation learning, because only one neuron is trained
at a time.
On the other hand, its inability to retrain neurons may prevent the cascade correlation network from
finding optimal weight patterns for encoding the given function.
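A sketch of the gradient-ascent step for one candidate node, following the formula above (the array shapes and helper name are assumptions):

```python
import numpy as np

def candidate_step(I, E, v, f_prime, eta=0.1):
    """Gradient-ascent step for a cascade-correlation candidate node.
    I: (P, n) inputs, E: (P, K) residual output errors,
    v: (P,) candidate outputs, f_prime: (P,) activation derivatives."""
    E_c = E - E.mean(axis=0)          # E_{k,p} - mean error per output node
    # sign of the covariance between the candidate's output and each error
    S = np.sign(((v - v.mean())[:, None] * E_c).sum(axis=0))
    # dS/dw_i = sum_k sum_p S_k (E_{k,p} - mean_k) f'_p I_{i,p}
    grad = np.einsum('k,pk,p,pi->i', S, E_c, f_prime, I)
    return eta * grad                 # weight change for the candidate node
```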
54. What are input space clusters and radial basis functions (RBFs)? 6
To achieve such local "receptive fields," we can use radial basis functions, i.e., functions whose output
only depends on the Euclidean distance μ between the input vector and another ("weight") vector.

A typical choice is a Gaussian function: ρg(μ) ∝ e^(−(μ/c)²)
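As a sketch:

```python
import numpy as np

def gaussian_rbf(x, mu, c=1.0):
    """Output depends only on the distance between input x and center mu."""
    d = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(mu, dtype=float))
    return np.exp(-(d / c) ** 2)

print(gaussian_rbf([0, 0], [0, 0]), gaussian_rbf([2, 0], [0, 0]))  # 1.0, e^-4
```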
55. Explain linear interpolation for one dimensional and multidimensional case? 5
For function approximation, the desired output for new (untrained) inputs could be estimated by linear
interpolation.
Example: what is the desired output for a new input x0 that is located between known data points x1 and x2?

f(x0) = f(x1) + (x0 − x1)·(f(x2) − f(x1)) / (x2 − x1)

which simplifies to:

f(x0) = ( D1⁻¹·f(x1) + D2⁻¹·f(x2) ) / ( D1⁻¹ + D2⁻¹ )

In the multi-dimensional case, hyperplane segments connect neighboring points so that the desired
output for a new input x0 is:

f(x0) = ( D1⁻¹·f(x1) + D2⁻¹·f(x2) + … + DP0⁻¹·f(xP0) ) / ( D1⁻¹ + D2⁻¹ + … + DP0⁻¹ )

where Dp is the Euclidean distance between x0 and xp, and f(xp) is the desired output value for input xp.

Example for f: R² → R¹ (with desired outputs indicated): [figure not reproduced]
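A sketch of the multi-dimensional inverse-distance estimate above (the sample grid is an illustrative choice):

```python
import numpy as np

def interpolate(x0, xs, fs):
    """Estimate f(x0) as a D^-1-weighted average of the sample outputs fs."""
    d = np.linalg.norm(np.asarray(xs, dtype=float) - x0, axis=1)
    if np.any(d == 0):                  # x0 coincides with a training point
        return float(np.asarray(fs)[d == 0][0])
    w = 1.0 / d
    return float(np.dot(w, fs) / w.sum())

print(interpolate(np.array([0.5, 0.5]),
                  [[0, 0], [1, 0], [0, 1], [1, 1]],
                  [0.0, 1.0, 1.0, 2.0]))   # 1.0 by symmetry
```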
56. Explain different types of learning methods? What are counter propagation networks?
Unsupervised/Supervised Learning ….
If we are using such linear interpolation, then our radial basis function (RBF) ρ0, which weights an input
vector based on its distance to a neuron's reference (weight) vector, is ρ0(D) = D⁻¹.

For the training samples xp, p = 1, …, P0, surrounding the new input x, we find for the network's output o:

o ∝ (1/P0) Σp=1..P0 dp·ρ0(‖x − xp‖), where dp = f(xp)

(In the following, to keep things simple, we will assume that the network has only one output neuron.
However, any number of output neurons could be implemented.)

Since it is difficult to define what "surrounding" should mean, it is common to consider all P training
samples and use any monotonically decreasing RBF ρ:

o = (1/P) Σp=1..P dp·ρ(‖x − xp‖)

This, however, implies a network that has as many hidden nodes as there are training samples. This is
unacceptable because of its computational complexity and likely poor generalization ability; the network
would resemble a look-up table.

It is more useful to have fewer neurons and accept that the training set cannot be learned 100%
accurately:

o = (1/N) Σi=1..N ϕi·ρ(‖x − μi‖)
Here, ideally, each reference vector μi of these N neurons should be placed in the center of an input-
space cluster of training samples with identical (or at least similar) desired output ϕi.
To learn near-optimal values for the reference vectors and the output weights, we can – as usual –
employ gradient descent.
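A sketch of the N-neuron output formula o = (1/N) Σ ϕi·ρ(‖x − μi‖) above (the Gaussian default for ρ is an illustrative choice):

```python
import numpy as np

def rbf_net_output(x, mus, phis, rho=lambda d: np.exp(-d**2)):
    """Average of output weights phi_i, each weighted by the RBF response."""
    d = np.linalg.norm(np.asarray(mus, dtype=float) - x, axis=1)
    return float(np.mean(np.asarray(phis) * rho(d)))
```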
58. Write a note on distance and similarity functions with respect to the counterpropagation network? 5
In the hidden layer, the neuron whose weight vector is most similar to the current input vector is the
“winner.”
There are different ways of defining such maximal similarity, for example:
(1) Maximal cosine similarity (same as net input): s(w, x) = w·x
(2) Minimal Euclidean distance: d(w, x) = Σi (wi − xi)²
(no square root is necessary for determining the winner)
A simple CPN with two input neurons, three hidden neurons, and two output neurons can be described as
follows:
The CPN learning process (general form for n input units and m output units):
1. Randomly select a vector pair (x, y) from the training set.
2. If you use the cosine similarity function, normalize (shrink/expand to "length" 1) the input vector x
by dividing every component of x by the magnitude ‖x‖, where

‖x‖ = √( Σj=1..n xj² )
3. Initialize the input neurons with the resulting vector and compute the activation of the hidden-layer
units according to the chosen similarity measure.
4. In the hidden (competitive) layer, determine the unit W with the largest activation (the winner).
5. Adjust the connection weights between W and all N input-layer units according to the formula:

w^H_Wn(t + 1) = w^H_Wn(t) + α·(xn − w^H_Wn(t))
6. Repeat steps 1 to 5 until all training patterns have been processed once.
7. Repeat step 6 until each input pattern is consistently associated with the same competitive unit.
8. Select the first vector pair in the training set (the current pattern).
9. Repeat steps 2 to 4 (normalization, competition) for the current pattern.
10. Adjust the connection weights between the winning hidden-layer unit and all M output-layer units
according to the equation:

w^O_mW(t + 1) = w^O_mW(t) + β·(ym − w^O_mW(t))
11. Repeat steps 9 and 10 for each vector pair in the training set.
12. Repeat steps 8 through 11 for several epochs.
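A sketch of one pass through the key updates for a single training pair (matrix shapes are assumptions: W_h is hidden-by-n, W_o is m-by-hidden; in the actual algorithm the hidden-layer phase, steps 1-7, finishes before the output-layer phase, steps 8-12, begins):

```python
import numpy as np

def cpn_step(x, y, W_h, W_o, alpha=0.1, beta=0.1):
    """One counterpropagation update: competition, then both weight layers."""
    x = x / np.linalg.norm(x)              # step 2: normalize the input
    W = int(np.argmax(W_h @ x))            # steps 3-4: winner by cosine similarity
    W_h[W] += alpha * (x - W_h[W])         # step 5: move prototype toward x
    W_o[:, W] += beta * (y - W_o[:, W])    # step 10: supervised output update
    return W_h, W_o
```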
The assumption underlying Quickprop is that the network error as a function of each individual weight can
be approximated by a paraboloid.
Based on this assumption, whenever we find that the gradient for a given weight switched its sign
between successive epochs, we should fit a paraboloid through these data points and use its minimum as
the next weight value.
Newton's method: approximate the error as a parabola, E = aw² + bw + c. Then

∂E(t)/∂w = E′(t) = 2a·w(t) + b and ∂E(t−1)/∂w = E′(t−1) = 2a·w(t−1) + b

⇒ 2a = (E′(t) − E′(t−1)) / (w(t) − w(t−1)) = (E′(t) − E′(t−1)) / Δw(t−1)

⇒ b = E′(t) − (E′(t) − E′(t−1))·w(t) / Δw(t−1)

For the minimum of E we must have:

∂E(t+1)/∂w = 2a·w(t+1) + b = 0 ⇒ w(t+1) = −b / (2a)

⇒ w(t+1) = ( [E′(t) − E′(t−1)]·w(t) − E′(t)·Δw(t−1) ) / (E′(t) − E′(t−1))
         = w(t) + E′(t)·Δw(t−1) / (E′(t−1) − E′(t))
Notice that this method cannot be applied if the error gradient has not decreased in magnitude and has
not changed its sign at the preceding time step.
In that case, we would ascend in the error function or make an infinitely large weight modification.
In most cases, Quickprop converges several times faster than standard backpropagation learning.
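The resulting update, as a one-line sketch of the formula derived above:

```python
def quickprop_delta(g_now, g_prev, dw_prev):
    """Jump to the minimum of the fitted parabola:
    w(t+1) - w(t) = E'(t) * dw(t-1) / (E'(t-1) - E'(t))."""
    return g_now * dw_prev / (g_prev - g_now)
```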
The basic idea is that if the error gradient for a given weight wij had the same sign in two consecutive
epochs, we increase its step size Δij, because the weight's optimal value may be far away.
If, on the other hand, the sign switched, we decrease the step size.
Weights are always changed by adding or subtracting the current step size, regardless of the absolute
value of the gradient.
This way we do not "get stuck" with extreme weights that are hard to change because of the shallow
slope in the sigmoid function.

Formally, the step size update rules are:

Δij(t) = η⁺·Δij(t−1),  if (∂E(t−1)/∂wij)·(∂E(t)/∂wij) > 0
Δij(t) = η⁻·Δij(t−1),  if (∂E(t−1)/∂wij)·(∂E(t)/∂wij) < 0
Δij(t) = Δij(t−1),     otherwise

Weight updates are then performed as follows:

Δwij(t) = −Δij(t),  if ∂E(t)/∂wij > 0
Δwij(t) = +Δij(t),  if ∂E(t)/∂wij < 0
Δwij(t) = 0,        otherwise

It is important to remember that, like in Quickprop, in Rprop the gradient needs to be computed
across all training samples (batch learning).
The performance of Rprop is comparable to Quickprop; it also considerably accelerates backpropagation
learning. Compared to both the standard backpropagation algorithm and Quickprop, Rprop has one
advantage:
Rprop does not require the user to estimate or empirically determine a step size parameter and its
change over time. Rprop will determine appropriate step size values by itself and can thus be applied “as
is” to a variety of problems without significant loss of efficiency.
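A vectorized sketch of the two Rprop rules above (η⁺ = 1.2 and η⁻ = 0.5 are common choices, not mandated by the source):

```python
import numpy as np

def rprop_step(g_now, g_prev, step, eta_plus=1.2, eta_minus=0.5):
    """Adapt per-weight step sizes by gradient sign agreement, then update."""
    agree = g_now * g_prev
    step = np.where(agree > 0, step * eta_plus,
           np.where(agree < 0, step * eta_minus, step))
    dw = np.where(g_now > 0, -step, np.where(g_now < 0, step, 0.0))
    return dw, step
```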
Network structure: all pairs of nodes have inhibitory connections with the same weight −ε, where typically
ε ≤ 1/(# nodes). In addition, each node has a self-excitatory connection to itself, whose weight θ is
typically 1. The winner will be the node with the greatest initial input value. The nodes update their net
input and their output by the following equations:

net = Σi wi·xi
Their competitive learning algorithm is similar to the first (unsupervised) phase of CPN learning.
However, ART networks are able to grow additional neurons if a new input cannot be categorized
appropriately with the existing neurons.
A greater value of the vigilance parameter ρ leads to more, smaller clusters (a cluster = the input samples
associated with the same winner neuron).
We will only discuss ART-1 networks, which receive binary input vectors.
Bottom-up weights are used to determine output-layer candidates that may best match the current input.
Top-down weights represent the “prototype” for the cluster defined by each output neuron.
A close match between input and prototype is necessary for categorizing the input.
Finding this match can require multiple signal exchanges between the two layers in both directions until
“resonance” is established or a new neuron is added.
Stability: Existing clusters are not deleted by the introduction of new inputs (new clusters will just be
created in addition to the old ones).
3. Repeat
a) Let j* be a node in A with largest yj, with ties being broken arbitrarily;
end-while
Answer: B
Answer: B
67. For a minimum distance classifier with one input variable, what is the decision boundary
between two classes?
A. A line.
B. A curve.
C. A plane.
D. A hyperplane.
E. A discriminant.
Answer: E
68. For a Bayes classifier with two input variables, what is the decision boundary between two
classes?
A. A line.
B. A curve.
C. A plane.
D. A hypercurve.
E. A discriminant.
Answer: B
69. Design a minimum distance classifier with three classes using the following training data:
Then classify the test vector [0.5,−1]T with the trained classifier. Which class does this vector
belong to?
A. Class 1.
B. Class 2.
C. Class 3.
Answer: B
70. The decision function for a minimum distance classifier is dj(x) = xᵀmj − ½mjᵀmj, where mj is
the prototype vector for class ωj. What is the value of the decision function for each of the three
classes in the above question for the test vector [0, −0.5]ᵀ?
Answer: A
71. Is the following statement true or false? “An outlier is an input pattern that is very different
from the typical patterns of the same class”.
A. TRUE.
B. FALSE.
Answer: A
72. Which of the following statements is the best description of generalization?
A. The ability of a pattern recognition system to approximate the desired output values for pattern vectors
which are not in the test set.
B. The ability of a pattern recognition system to approximate the desired output values for pattern vectors
which are not in the training set.
C. The ability of a pattern recognition system to extrapolate on pattern vectors which are not in the
training set.
D. The ability of a pattern recognition system to interpolate on pattern vectors which are not in the test
set.
Answer: B
73. Is the following statement true or false? “In the human brain, roughly 70% of the neurons are
used for input and output. The remaining 30% are used for information processing.”
A. TRUE.
B. FALSE.
Answer: B
74. Which of the following statements is the best description of supervised learning?
A. “If a particular input stimulus is always active when a neuron fires then its weight should be increased.”
B. “If a stimulus acts repeatedly at the same time as a response then a connection will form between the
neurons involved. Later, the stimulus alone is sufficient to activate the response.”
C. “The connection strengths of the neurons involved are modified to reduce the error between the
desired and actual outputs of the system.”
Answer: C
75. Is the following statement true or false? “Artificial neural networks are parallel computing
devices consisting of many interconnected simple processors.”
A. TRUE.
B. FALSE.
Answer: A
76. Is the following statement true or false? "Knowledge is acquired by a neural network from its
environment through a learning process, and this knowledge is stored in the connection
strengths (weights) between processing units (neurons)."
A. TRUE.
B. FALSE
Answer: A
77. A neuron with 4 inputs has the weight vector w = [1, 2, 3, 4]ᵀ and a bias θ = 0 (zero). The
activation function is linear, where the constant of proportionality equals 2; that is, the
activation function is given by f(net) = 2 × net. If the input vector is x = [4, 8, 5, 6]ᵀ then the output
of the neuron will be
A. 1.
B. 56.
C. 59.
D. 112.
E. 118.
Answer: E
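A quick check of the arithmetic:

```python
import numpy as np
w = np.array([1, 2, 3, 4]); x = np.array([4, 8, 5, 6])
print(2 * np.dot(w, x))   # 2 * (4 + 16 + 15 + 24) = 118
```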
78. Which of the following types of learning can be used for training artificial neural networks?
A. Supervised learning.
B. Unsupervised learning.
C. Reinforcement learning.
D. All of the above.
Answer: D
C. Hopfield network.
Answer: A
80. Which of the following algorithms can be used to train a single-layer feedforward network?
C. A genetic algorithm.
Answer: D
81. What is the biggest difference between Widrow & Hoff’s Delta Rule and the Perceptron
Learning Rule for learning in a single-layer feedforward network?
A. There is no difference.
B. The Delta Rule is defined for step activation functions, but the Perceptron Learning Rule is defined for
linear activation functions.
C. The Delta Rule is defined for sigmoid activation functions, but the Perceptron Learning Rule is defined
for linear activation functions.
D. The Delta Rule is defined for linear activation functions, but the Perceptron Learning Rule is defined for
step activation functions.
E. The Delta Rule is defined for sigmoid activation functions, but the Perceptron Learning Rule is defined
for step activation functions.
Answer: D
82. Why are linearly separable problems interesting to neural network researchers?
A. Because they are the only problems that a neural network can solve successfully.
B. Because they are the only mathematical functions that are continuous.
C. Because they are the only mathematical functions that you can draw.
D. Because they are the only problems that a perceptron can solve successfully.
Answer: D
83. A perceptron with a unipolar step function has two inputs with weights w1 = 0.5 and w2 = −0.2,
and a threshold θ = 0.3 (θ can therefore be considered as a weight for an extra input which is
always set to −1).
For a given training example x = [0, 1]T , the desired output is 1. Does the perceptron give the
correct answer (that is, is the actual output the same as the desired output)?
A. Yes.
B. No.
Answer: B
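Checking the answer (treating the threshold as stated, i.e., as a weight on a constant −1 input; the choice of net ≥ 0 as the firing condition does not affect the result here):

```python
w1, w2, theta = 0.5, -0.2, 0.3
x = [0, 1]
net = w1 * x[0] + w2 * x[1] - theta      # -0.5
output = 1 if net >= 0 else 0            # unipolar step
print(output)                            # 0, but the desired output is 1 -> "No"
```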
84. The perceptron in the previous question is trained using the learning rule Δw = η·(d − y)·x, where x is the
input vector, η is the learning rate, w is the weight vector, d is the desired output, and y is the
actual output.
What are the new values of the weights and threshold after one step of training with the input
vector x = [0, 1]ᵀ and desired output 1, using a learning rate η = 0.5?
Answer: C
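The (missing) answer options aside, the update can be checked directly, with the threshold treated as a weight on a constant −1 input and y = 0 from the previous question:

```python
eta, d, y = 0.5, 1, 0
x = [0, 1, -1]                       # two inputs plus the constant -1
w = [0.5, -0.2, 0.3]                 # w1, w2, theta
w_new = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
print(w_new)                         # [0.5, 0.3, -0.2]
```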
85. The Perceptron Learning Rule states that “for any data set which is linearly separable, the
Perceptron Convergence Theorem is guaranteed to find a solution in a finite number of steps.”
A. TRUE.
B. FALSE.
Answer: B
86. Is the following statement true or false? “The XOR problem can be solved by a multi-layer
perceptron but a multi-layer perceptron with bipolar step activation functions cannot learn to do
this.”
A. TRUE.
B. FALSE.
Answer: A
87. The Adaline neural network can be used as an adaptive filter for echo cancellation in
telephone circuits. For the telephone circuit given in the above figure, which one of the following
signals carries the corrected message sent from the human speaker on the left to the human
listener on the right? (Assume that the person on the left transmits an outgoing voice signal and
receives an incoming voice signal from the person on the right.)
Answer: E
88. What is the credit assignment problem in the training of multi-layer feedforward networks?
Answer: E
89. Is the following statement true or false? “The generalized Delta rule solves the credit
assignment problem in the training of multi-layer feedforward networks.”
A. TRUE.
B. FALSE.
Answer: A
90. A common technique for training MLFF networks is to calculate the generalization error on a
separate data set after each epoch of training. Training is stopped when the generalization error
starts to increase. This technique is called
A. Boosting.
B. Momentum.
C. Hold-one-out.
D. Early stopping.
Answer: D
91. Which of the following statements is NOT true for an autoassociative feedforward network with
a single hidden layer of neurons?
A. During training, the target output vector is the same as the input vector.
C. The network could be trained using the backpropagation algorithm, but care must be taken to deal with
the problem of local minima.
D. After training, the hidden units give a representation that is equivalent to the principal components of
the training data, removing non-redundant parts of the input data.
E. The trained network can be split into two machines: the first layer of weights compresses the input
pattern (encoder), and the second layer of weights reconstructs the full pattern (decoder).
Answer: D
92. Which of the following statements is NOT true for a simple recurrent network (SRN)?
A. The training examples must be presented to the network in the correct order.
B. The test examples must be presented to the network in the correct order.
C. This type of network can predict the next chunk of data in the series from the past history of data.
D. The hidden units encode an internal representation of the data in the series that precedes the current
input.
E. The number of context units should be the same as the number of input units.
Answer: E
93. How many hidden layers are there in an autoassociative Hopfield network?
A. None (0).
B. One (1).
C. Two (2).
Answer: A
94. A Hopfield network has 20 units. How many adjustable parameters does this network contain?
A. 95
B. 190
C. 200
D. 380
E. 400
Answer: B
95. Is the following statement true or false? “Patterns within a cluster should be similar in some
way.”
A. TRUE.
B. FALSE.
Answer: A
96. Is the following statement true or false? “Clusters that are similar in some way should be far
apart.”
A. TRUE.
B. FALSE.
Answer: B
97. Which of the following statements is NOT true for hard competitive learning (HCL)?
C. The input vectors are often normalized to have unit length, that is, ‖x‖ = 1.
D. The weights of the winning unit k are adapted by Δwk = η·(x − wk), where x is the input vector.
E. The weights of the neighbours j of the winning unit are adapted by Δwj = ηj·(x − wj), where
ηj < η and j ≠ k.
Answer: E
98. Which of the following statements is NOT true for a self-organizing feature map (SOFM)?
C. The network can grow during training by adding new cluster units when required.
D. The cluster units are arranged in a regular geometric pattern such as a square or ring.
E. The learning rate is a function of the distance of the adapted units from the winning unit.
Answer: C
Answer: A
C. copying the fittest member of each population into the mating pool.
D. preventing too many similar individuals from surviving to the next generation.
Answer: B
102. Is the following statement true or false? “A genetic algorithm could be used to search the
space of possible weights for training a recurrent artificial neural network, without requiring any
gradient information.”
A. TRUE.
B. FALSE.
Answer: A
103. Is the following statement true or false? “Learning produces changes within an agent that
over time enables it to perform more effectively within its environment.”
A. TRUE.
B. FALSE.
Answer: A
104. Which application in intelligent mobile robots made use of a single-layer feedforward
network?
A. Goal finding.
B. Path planning.
C. Wall following.
D. Route following.
E. Gesture recognition.
Answer: C
105. Which application in intelligent mobile robots made use of a self-organizing feature map?
A. Goal finding.
B. Path planning.
C. Wall following.
D. Route following.
E. Gesture recognition.
Answer: D
106. Which application in intelligent mobile robots made use of a genetic algorithm?
A. Goal finding.
B. Path planning.
C. Wall following.
D. Route following.
E. Gesture recognition.
Answer: B