UNIT3
DECISION TREE
NAÏVE BAYESIAN
MULTILAYER PERCEPTRON
Decision trees
1. Classification trees
Tree models where the target variable can take a
discrete set of values are called classification trees. In
these tree structures, leaves represent class labels
and branches represent conjunctions of features that
lead to those class labels.
2. Regression trees
Decision trees where the target variable can take
continuous values (real numbers) like the price of a
house, or a patient’s length of stay in a hospital, are
called regression trees.
Classification trees
Consider the data given in Table 8.1 which specify the features of certain
vertebrates and the class to which they belong. For each species, four features have
been identified: “gives birth”, ”aquatic animal”, “aerial animal” and “has legs”.
There are five class labels, namely, “amphibian”, “bird”, “fish”, “mammal” and
“reptile”. The problem is how to use this data to identify the class of a newly
discovered vertebrate.
Construction of the tree
Step 1
We split the set of examples given into disjoint subsets according to
the values of the feature “gives birth”. Since there are only two
possible values for this feature, we have only two subsets: One
subset consisting of those examples for which the value of “gives
birth” is “yes” and one subset for which the value is “no”.
Step 2
We now consider the examples in the first subset, for which “gives
birth” is “yes”. We split these examples based on the values of the
feature “aquatic animal”. There are three possible values for this
feature, but only two of these appear in Table 8.2. Accordingly, we need
consider only two subsets. These are shown in Tables
8.4 and 8.5.
• Table 8.4 contains only one example and hence no further splitting is required. It
leads to the assignment of the class label “fish”.
• The examples in Table 8.5 need to be split into subsets based on the values of
“aerial animal”. It can be seen that these subsets immediately lead to unambiguous
assignment of class labels: The value of “no” leads to “mammal” and the value “yes”
leads to ”bird”.
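The branch of the tree built in Steps 1 and 2 (for “gives birth” = “yes”) can be sketched as nested conditionals. The “no” branch is not detailed in this excerpt, so it is left open here:

```python
def classify_vertebrate(gives_birth, aquatic, aerial):
    """Sketch of the 'gives birth = yes' branch of the tree from Steps 1-2.
    The 'gives birth = no' branch is not covered in this excerpt."""
    if gives_birth == "yes":
        if aquatic == "yes":
            return "fish"      # Table 8.4: a single example, labelled "fish"
        elif aerial == "no":
            return "mammal"    # Table 8.5, split on "aerial animal"
        else:
            return "bird"
    return "unknown"           # branch not described in this excerpt

print(classify_vertebrate("yes", "no", "yes"))   # bird
```

Each path from the root to a leaf corresponds to one conjunction of feature values, exactly as the definition of a classification tree describes.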
• Entropy is measured in bits. If there are only two possible classes, entropy
values can range from 0 to 1.
• In each case, the minimum value indicates that the sample is completely
homogeneous, while the maximum value indicates that the data are as
diverse as possible, and no group has even a small plurality.
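As a concrete illustration, the entropy of a sample can be computed directly from its class counts:

```python
import math

def entropy(class_counts):
    """Entropy in bits: -sum of p_i * log2(p_i) over non-zero class proportions."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

# With two classes, entropy ranges from 0 (homogeneous) to 1 (maximally mixed).
print(entropy([10, 0]))   # 0.0 -> completely homogeneous sample
print(entropy([5, 5]))    # 1.0 -> the two classes are equally represented
```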
Example
Problem
Use ID3 algorithm to construct a decision tree for the
data in Table 8.9.
Solution
Note that, in the given data, there are four features but only
two class labels (that is, values of the target variable),
namely, “yes” and “no”.
Step 1
We first create a root node for the tree (see Figure 8.7).
Step 2
Note that not all examples are positive (class label “yes”) and not all examples are
negative (class label “no”). Also the number of features is not zero.
Step 3
We have to decide which feature is to be placed at the root node. For this, we
have to calculate the information gains corresponding to each of the four features.
The computations are shown below.
(i) Calculation of Entropy (S)
(ii) Calculation of Gain (S, outlook)
The values of the attribute “outlook” are “sunny”, “overcast” and “rain”.
We have to calculate
Entropy (Sv) for v = sunny, v = overcast and v = rain.
(iii) Calculation of Gain (S, temperature)
The values of the attribute “temperature” are “hot”,
“mild” and “cool”. We have to calculate
Entropy (Sv) for v = hot, v = mild and v = cool.
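Each gain computation follows the same pattern: Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) Entropy(Sv), summed over the values v of attribute A. A minimal sketch in Python; since Table 8.9 is not reproduced here, the class counts below are hypothetical:

```python
import math

def entropy(counts):
    """Entropy in bits of a sample with the given class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(total_counts, subsets):
    """Information gain: Entropy(S) - sum over v of (|Sv|/|S|) * Entropy(Sv)."""
    n = sum(total_counts)
    return entropy(total_counts) - sum(
        (sum(sv) / n) * entropy(sv) for sv in subsets)

# Hypothetical [yes, no] counts for 14 examples; Table 8.9 may differ.
S = [9, 5]
outlook_subsets = [[2, 3], [4, 0], [3, 2]]   # v = sunny, overcast, rain
print(round(gain(S, outlook_subsets), 4))
```

ID3 computes this quantity for every remaining feature and places the one with the largest gain at the current node.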
• The classifier is also known as the “naive Bayes algorithm”, where the
English word “naive” means simple, unsophisticated, or primitive.
• By Bayes’ theorem, P(A | B) = P(B | A) P(A) / P(B). Here P(A) is called
the prior probability of the proposition and P(B) is called the
prior probability of the evidence.
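Bayes’ theorem itself is a one-line computation; the numbers below are purely illustrative:

```python
def bayes_posterior(prior_A, likelihood_B_given_A, prior_B):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_B_given_A * prior_A / prior_B

# Illustrative values: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.4.
print(round(bayes_posterior(0.3, 0.8, 0.4), 3))   # 0.6
```

The naive Bayes classifier applies this rule with the simplifying (“naive”) assumption that the features are conditionally independent given the class.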
Artificial neurons
Definition
• An artificial neuron is a mathematical function conceived as a model of
biological neurons.
• Artificial neurons are elementary units in an artificial neural network.
• The artificial neuron receives one or more inputs (representing excitatory
postsynaptic potentials and inhibitory postsynaptic potentials at neural
dendrites) and sums them to produce an output.
• Each input is separately weighted, and the sum is passed through a function
known as an activation function or transfer function.
• The small circles in the schematic representation of the artificial neuron shown in
Figure 9.3 are called the nodes of the neuron.
• The circles on the left side, which receive the values of x0, x1, . . . , xn,
are called the input nodes, and the circle on the right side, which outputs
the value of y, is called the output node.
• The squares represent the processes that take place before the result is
output.
Activation function
Definition
In an artificial neural network, the function which takes the incoming signals as input
and produces the output signal is known as the activation function.
Representation of x1 AND x2
Let x1 and x2 be two boolean variables. Then the boolean function x1 AND x2
can be represented by a perceptron: it can be easily verified that the
perceptron shown in Figure 9.13 represents this function.
Representations of OR, NAND and NOR
The functions x1 OR x2, x1 NAND x2 and x1 NOR x2 can also be represented by
perceptrons. Table 9.2 shows the values to be assigned to the weights w0, w1, w2 for
getting these boolean functions.
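These representations are easy to verify by enumerating the truth tables. The weights below are one common choice; Table 9.2 may list different (equally valid) values:

```python
def perceptron(w0, w1, w2, x1, x2):
    """Threshold unit: output 1 if w0 + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

# One common choice of weights (w0, w1, w2) for each boolean function.
gates = {
    "AND":  (-1.5,  1,  1),
    "OR":   (-0.5,  1,  1),
    "NAND": ( 1.5, -1, -1),
    "NOR":  ( 0.5, -1, -1),
}
for name, (w0, w1, w2) in gates.items():
    # Outputs for inputs (0,0), (0,1), (1,0), (1,1).
    print(name, [perceptron(w0, w1, w2, a, b) for a in (0, 1) for b in (0, 1)])
```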
Learning a perceptron
By “learning a perceptron” we mean the process of assigning values to the weights
and the threshold such that the perceptron produces correct output for each of the
given training examples.
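One standard way to do this is the perceptron learning rule, w ← w + η(t − o)x, applied repeatedly over the training examples. The text does not spell out a particular procedure at this point, so the following is a generic sketch:

```python
def train_perceptron(examples, lr=0.1, epochs=20):
    """Perceptron learning rule: w <- w + lr * (target - output) * x.
    The bias weight w0 is treated as having a fixed input of 1."""
    w0 = w1 = w2 = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            out = 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0
            err = target - out
            w0 += lr * err          # bias input fixed at 1
            w1 += lr * err * x1
            w2 += lr * err * x2
    return w0, w1, w2

# Learn AND from its truth table.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, w2 = train_perceptron(and_data)
print([1 if w0 + w1 * a + w2 * b > 0 else 0 for (a, b), _ in and_data])
```

Since AND is linearly separable, the rule converges to weights that classify all four examples correctly.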
Artificial neural networks
• An artificial neural network (ANN) is a computing system inspired by the
biological neural networks that constitute human brains.
• An ANN is based on a collection of connected units called artificial
neurons. Each connection between artificial neurons can transmit a
signal from one to another.
• The artificial neuron that receives the signal can process it and then send
it to the artificial neurons connected to it.
• Each connection between artificial neurons has a weight attached to it
that gets adjusted as learning proceeds.
• Artificial neurons may have a threshold such that a signal is sent
onward only if the aggregate signal crosses that threshold.
• Artificial neurons are organized in layers. Different layers may perform
different kinds of transformations on their inputs. Signals travel from the
input layer to the output layer, possibly after traversing the layers
multiple times.
Characteristics of an ANN
An ANN can be defined and implemented in several different
ways. The way the following characteristics are defined
determines a particular variant of an ANN.
• The activation function
This function defines how a neuron’s combined input signals
are transformed into a single output signal to be broadcasted
further in the network.
• The network topology (or architecture)
This describes the number of neurons in the model as well as
the number of layers and manner in which they are
connected.
• The training algorithm
This algorithm specifies how connection weights are set in
order to inhibit or excite neurons in proportion to the input
signal.
Activation functions
The activation function is the mechanism by which the artificial
neuron processes incoming information and passes it throughout the
network. Just as the artificial neuron is modeled after the biological
version, so is the activation function modeled after nature’s design.
Let x1, x2, . . . , xn be the input signals, w1, w2, . . . , wn the
associated weights, and −w0 the threshold. Let
x = w0 + w1x1 + ⋯ + wnxn.
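With the threshold folded into w0 as above, the combined input x and two common activation functions applied to it can be sketched as follows:

```python
import math

def weighted_sum(weights, inputs):
    """x = w0 + w1*x1 + ... + wn*xn, with the threshold -w0 folded into w0."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * xi for w, xi in zip(rest, inputs))

def step(x):
    """Threshold (Heaviside) activation: fires if x exceeds the threshold."""
    return 1 if x > 0 else 0

def sigmoid(x):
    """A smooth, differentiable alternative to the hard threshold."""
    return 1 / (1 + math.exp(-x))

x = weighted_sum([-0.5, 1.0, 1.0], [0, 1])   # w0 = -0.5, inputs x1 = 0, x2 = 1
print(step(x), round(sigmoid(x), 3))
```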
Network topology
By “network topology” we mean the patterns and structures in the collection
of interconnected nodes.
The topology determines the complexity of tasks that can be learned by the
network. Generally, larger and more complex networks are capable of
identifying more subtle patterns and complex decision boundaries. However,
the power of a network is not only a function of the network size,
but also the way units are arranged.
Different forms of network architecture can be differentiated by the
following characteristics:
• The number of layers
• Whether information in the network is allowed to travel backward
• The number of nodes within each layer of the network
The number of layers
• In an ANN, the input nodes are those nodes which
receive unprocessed signals directly from the input
data.
• The output nodes (there may be more than one) are
those nodes which generate the final predicted
values.
• A hidden node is a node that processes the signals
from the input nodes (or other such nodes) prior to
reaching the output nodes.
• The nodes are arranged in layers. The set of nodes
which receive the unprocessed signals from the input
data constitute the first layer of nodes.
• The set of hidden nodes which receive the outputs
from the nodes in the first layer of nodes constitute
the second layer of nodes.
• In a similar way we can define the third, fourth, etc.
layers. Figure 9.14 shows an ANN with only one layer
of nodes. Figure 9.15 shows an ANN with two layers.
The direction of information travel
• Networks in which the input signal is fed
continuously in one direction from connection to
connection until it reaches the output layer are
called feedforward networks.
• Networks which allows signals to travel in both
directions using loops are called recurrent
networks (or, feedback networks).
• In spite of their potential, recurrent networks are
still largely theoretical and are rarely used
in practice.
• On the other hand, feedforward networks have
been extensively applied to real-world problems.
In fact, the multilayer feedforward network,
sometimes called the Multilayer Perceptron (MLP),
is the de facto standard ANN topology.
The number of nodes in each layer
• The number of input nodes is predetermined by
the number of features in the input data.
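Putting the pieces together, a forward pass through a small multilayer feedforward network can be sketched as below. The weights are purely hypothetical; the point is only that each layer applies the same weighted-sum-plus-activation computation to the outputs of the layer before it:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer(inputs, weight_rows):
    """One layer: each row is [w0, w1, ..., wn] for one neuron in the layer."""
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
            for row in weight_rows]

# Hypothetical weights: 2 input nodes -> 2 hidden nodes -> 1 output node.
hidden_w = [[-0.5,  1.0, 1.0],
            [ 0.5, -1.0, 1.0]]
output_w = [[ 0.0,  1.0, -1.0]]

hidden = layer([0.0, 1.0], hidden_w)   # number of inputs = number of features
output = layer(hidden, output_w)
print([round(h, 3) for h in hidden], round(output[0], 3))
```

Signals here travel strictly from the input layer through the hidden layer to the output layer, which is exactly the feedforward pattern described above.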