06 Learning Systems
Learning Systems : AI Course Lecture 31-34, notes, slides
www.myreaders.info/, RC Chakraborty, e-mail rcchak@gmail.com, June 01, 2010
www.myreaders.info/html/artificial_intelligence.html
Learning systems, topics : Definition, learning agents, components
of learning system, paradigms of machine learning. Rote Learning :
learning by memorization, learning something by repeating.
Learning from example : Induction, Winston's learning, Version
spaces - learning algorithm (generalization and specialization tree),
Decision trees - ID3 algorithm. Explanation Based Learning (EBL) :
general approach, EBL architecture, EBL system, generalization
problem, explanation structure. Discovery : theory driven - AM
system, data driven - BACON system. Clustering : distance
functions, K-means clustering algorithm. Learning by analogy;
Neural Net Perceptron; Genetic Algorithm. Reinforcement
Learning : RL problem, agent - environment interaction, RL tasks,
Markov system, Markov decision processes, agent's learning task,
policy, reward function, maximize reward, value functions.
Topics (Lectures 31, 32, 33, 34 : 4 hours)
1. What is Learning : Definition, learning agents, components of learning system; Paradigms of machine learning. (Slides 03-09)
2. Rote Learning : Learning by memorization, Learning something by repeating. (Slide 10)
3. Learning from Example - Induction : Winston's learning, Version spaces - learning algorithm (generalization and specialization tree), Decision trees - ID3 algorithm. (Slides 11-38)
4. Explanation Based Learning (EBL) : General approach, EBL architecture, EBL system, Generalization problem, Explanation structure. (Slides 39-43)
5. Discovery : Theory driven - AM system, Data driven - BACON system. (Slides 44-52)
6. Clustering : Distance functions, K-means clustering algorithm. (Slides 53-62)
7. Analogy. (Slide 63)
8. Neural Net and Genetic Learning : Neural Net Perceptron; Genetic learning - Genetic Algorithm. (Slides 64-67)
9. Reinforcement Learning : RL Problem - Agent-environment interaction, key features; RL tasks, Markov system, Markov decision processes, Agent's learning task, Policy, Reward function, Maximize reward, Value functions. (Slides 68-80)
10. References. (Slide 81)
Machine Learning
What is learning ? Some Quotations
Herbert Simon, 1983
Learning denotes changes in a system that enable a system to do the same
task more efficiently the next time.
Marvin Minsky, 1986
Learning is making useful changes in the workings of our minds.
Ryszard Michalski, 1986
Learning is constructing or modifying representations of what is being
experienced.
Mitchell, 1997
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E.
1. What is Learning
Learning denotes changes in a system that enable the system to do the
same task more efficiently next time.
Learning is an important feature of Intelligence.
1.1 Definition
A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P, if its performance at
tasks in T, as measured by P, improves with experience E. (Mitchell 1997)
This means :
Given : A task T
A performance measure P
Some experience E with the task
Goal : Generalize the experience in a way that allows the system to
improve its performance on the task.
Why is Machine Learning required ?
Understand and improve efficiency of human learning.
Discover new things or structure that is unknown to humans.
Fill in skeletal or incomplete specifications about a domain.
1.2 Learning Agents
An agent is an entity that is capable of perceiving its environment and performing actions.
An agent can be viewed as perceiving its environment through sensors and
acting upon that environment through actuators.
    Agent            Sensors                      Actuators
    Human agent      Eyes, ears, etc.             Legs, hands, mouth
    Robotic agent    Cameras, IR range finders    Motors
    Software agent   Keystrokes, file contents    Displays to screen, writes files
In computer science an agent is a software agent that assists users
and acts on their behalf in performing computer-related tasks.
Intelligent Agent (Learning Agent)
An agent is an entity that is capable of perceiving its environment and performing actions.
In computer science an agent is a software agent.
In artificial intelligence, the term used for an agent is an intelligent agent.
Learning is an important feature of intelligence.
Percept : the agent's perceptual inputs
Percept sequence : the history of everything the agent has perceived
Agent function : describes the agent's behavior
Agent program : implements the agent's function
A Learning Agent consists of four main components :
Learning element,
Performance element,
Critic, and
Problem generator.
Components of a Learning System
The components are described in the next slide.
[Figure : A learning agent in its environment. Sensors provide percepts from the environment and effectors carry out actions. The performance element selects actions; the critic compares behaviour against a fixed performance standard and sends feedback to the learning element; the learning element makes changes to the performance element and passes learning goals to the problem generator.]
Components of a Learning System
Performance Element: The performance element is the agent itself that
acts in the world. It takes in percepts and decides on external actions.
Learning Element: It is responsible for making improvements; it takes
knowledge about the performance element and some feedback on how the
agent is doing, and determines how to modify the performance element.
Critic: Tells the learning element how well the agent is doing (success or failure)
by comparing its behaviour with a fixed standard of performance.
Problem Generator: Suggests problems or actions that will generate
new examples or experiences and so aid in training the system further.
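The way these four components fit together can be pictured with a short skeletal sketch. The class below is an assumed structure written for illustration only; the slides describe the components but give no code, so the method names and call signatures here are hypothetical.

# Skeletal sketch of the four components wired together (an assumed structure for
# illustration; the slides describe the components but give no code).
class LearningAgent:
    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element   # decides external actions
        self.learning_element = learning_element         # improves the performance element
        self.critic = critic                             # compares behaviour with a standard
        self.problem_generator = problem_generator       # proposes exploratory actions

    def step(self, percept):
        feedback = self.critic(percept)                               # success or failure?
        self.learning_element(self.performance_element, feedback)     # make improvements
        action = self.performance_element(percept)                    # choose an action
        return self.problem_generator(action)                         # maybe suggest an experiment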
Example : Automated taxi on city roads
Performance Element: Consists of the knowledge and procedures for
driving actions, e.g., turning, accelerating and braking on the roads.
Learning Element: Formulates goals,
e.g., learn rules for braking and accelerating, learn the geography of the city.
Critic: Observes the world and passes information to the learning element,
e.g., after a quick right turn across three lanes of traffic, it observes the reaction
of other drivers.
Problem Generator: Suggests experiments to try, e.g., try the south city road.
1.3 Paradigms of Machine Learning
Rote Learning: Learning by memorization; One-to-one mapping from
inputs to stored representation; Association-based storage and
retrieval.
Induction: Learning from examples; A form of supervised learning,
uses specific examples to reach general conclusions; Concepts are
learned from sets of labeled instances.
Clustering: Discovering similar groups; unsupervised, inductive
learning in which natural classes are found for data instances, as
well as ways of classifying them.
Analogy: Determining a correspondence between two different
representations; a form of inductive learning in which a system
transfers knowledge from one database into another database of a
different domain.
Discovery: Learning without help from a teacher; learning is
both inductive and deductive. It is deductive if it proves
theorems and discovers concepts about those theorems. It is
inductive when it raises conjectures (guesses). It is unsupervised;
a specific goal is not given.
Genetic Algorithms: Inspired by natural evolution; In the natural
world, the organisms that are poorly suited for an environment die off,
while those well-suited for it prosper. Genetic algorithms search the
space of individuals for good candidates. The "goodness" of an
individual is measured by some fitness function. Search takes place in
parallel, with many individuals in each generation.
Reinforcement: Learning from feedback (positive or negative reward) given
at the end of a sequence of steps. Unlike supervised learning,
reinforcement learning takes place in an environment where the
agent cannot directly compare the results of its actions to a
desired result. Instead, it is given some reward or punishment that
relates to its actions. It may win or lose a game, or be told it has made
a good move or a poor one. The job of reinforcement learning is to
find a successful function using these rewards.
2. Rote Learning
The rote learning technique avoids understanding the inner complexities of the
material and focuses instead on memorizing it so that it can be recalled by the
learner exactly the way it was read or heard.
Learning by memorization : avoids understanding the inner complexities
of the subject that is being learned; rote learning instead focuses on
memorizing the material so that it can be recalled exactly the way it was
read or heard.
Learning something by repeating it over and over again : saying the same thing
and trying to remember how to say it. It does not help us to understand; it helps
us to remember, the way we learn a poem, a song, or something similar by rote.
5. Discovery
Simon (1966) first proposed the idea that we might explain scientific discovery
in computational terms and automate the processes involved on a computer.
Project DENDRAL (Feigenbaum 1971) demonstrated this by inferring structures
of organic molecules from mass spectra, a problem previously solved only by
experienced chemists.
Later, a knowledge based program called AM, the Automated Mathematician
(Lenat 1977), discovered many mathematical concepts.
After this, an equation discovery system called BACON (Langley, 1981)
discovered a wide variety of empirical laws, such as the ideal gas law. The
research continued during the 1980s and 1990s but slowed, because
computational biology, bioinformatics and scientific data mining convinced
many researchers to focus on domain-specific methods. But the need for research
on general principles for scientific reasoning and discovery very much exists.
Discovery system AM relied strongly on theory-driven methods of discovery.
BACON employed data-driven heuristics to direct its search for empirical laws.
These two discovery programs are illustrated in the next few slides.
5.1 Theory Driven Discovery
Simon's theory driven science means AI modeling for theory building.
It starts with an existing theory, represented in some or all aspects in the form
of a symbolic model, and one tries to transform the theory into a runnable
program. One important reason for modeling a theory is scientific discovery;
in the theory driven approach this means the discovery of new theoretical
conclusions, gaps, or inconsistencies.
Many computational systems have been developed for modeling different
types of discoveries. The Logic Theorist (1956) was designed to prove
theorems in logic, at a time when the field of AI barely existed. Among the more
recent systems, the Automated Mathematician AM (Lenat, 1979) is a good
example of modeling mathematical discovery.
AM (Automated Mathematician)
AM is a heuristic driven program that discovers concepts in elementary
mathematics and set theory. AM has 2 inputs:
(a) description of some concepts of set theory: e.g. union, intersection;
(b) information on how to perform mathematics. e.g. functions.
AM has successively rediscovered concepts such as :
(a) Integers , Natural numbers, Prime Numbers;
(b) Addition, Multiplication, Factorization theorem ;
(c) Maximally divisible numbers, e.g. 12 has six divisors 1, 2, 3, 4, 6, 12.
[AM is described in the next slide.]
How does AM work ?
AM employs many general-purpose AI techniques.
The system has around 115 basic elements such as sets, lists, elementary
relations. The mathematical concepts are represented as frames. Around
250 heuristic rules are attached to slots in the concepts. The rules provide
hints on how to employ functions, create new concepts, generalize, and so on;
these are activities that might lead to interesting discoveries.
The system operates from an agenda of tasks. It selects the most
interesting task, as determined by a set of over 50 heuristics, and then
applies all the heuristics it can find that should help in executing that task. The
heuristics, represented as operators, are used to generalize, to specialize, or
to combine the elementary concepts or relations to make more complex
ones. Heuristics can fill in concept slots, check the contents of slots,
create new concepts, modify the task agenda, adjust interestingness levels, etc.
Because it selects the most interesting task to perform at all times, AM is
performing the best-first search in a space of mathematical concepts.
However, its numerous heuristics (over 200) guide its search very
effectively, limiting the number of concepts it creates and improving their
mathematical quality.
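The agenda-driven, best-first control loop described above can be sketched generically. The snippet below is illustrative only: the tasks, interestingness scores and apply_heuristics function are stand-ins supplied by the caller, not Lenat's actual heuristics.

import heapq
import itertools

# Illustrative agenda loop in the style described above: always expand the most
# "interesting" task next (best-first). Tasks, scores and heuristics are stand-ins.
_tiebreak = itertools.count()

def run_agenda(initial_tasks, apply_heuristics, max_steps=100):
    """initial_tasks: iterable of (interestingness, task);
    apply_heuristics(task) -> iterable of (interestingness, new_task)."""
    agenda = [(-score, next(_tiebreak), task) for score, task in initial_tasks]
    heapq.heapify(agenda)                        # highest interestingness comes out first
    explored = []
    for _ in range(max_steps):
        if not agenda:
            break
        _, _, task = heapq.heappop(agenda)       # most interesting pending task
        explored.append(task)
        for score, new_task in apply_heuristics(task):   # heuristics may create new tasks
            heapq.heappush(agenda, (-score, next(_tiebreak), new_task))
    return explored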
5.2 Data Driven Discovery
Data driven science, in contrast to theory driven, starts with empirical data
or the input-output behavior of the real system without an explicitly given
theory. The modeler tries to write a computer program which generates the
empirical data or input-output behavior of the system. Typically, models are
produced by a generate-and-test procedure. Generate-and-test means
writing program code that tries to model the input-output behavior of the real
system, first approximately, and then improving it as long as that behavior
does not yet correspond to the real system. A family of such discovery models
is known as the BACON programs.
BACON System
Equation discovery is the area of machine learning that develops methods
for automated discovery of quantitative laws, expressed in the form of
equations, in collections of measured data.
BACON is a pioneer among equation discovery systems. BACON is a family of
algorithms for discovering scientific laws from data.
BACON.1 discovers simple numeric laws.
BACON.3 is a knowledge based system that has discovered simple empirical
laws the way physicists do, and has shown its generality by rediscovering the
ideal gas law, Kepler's third law, Ohm's law and more.
The next few slides show how BACON.1 rediscovers Kepler's third law and how
BACON.3 rediscovers the ideal gas law.
BACON.1 : Discovers simple numeric laws.
Given a set of observed values of two variables X and Y, BACON.1 finds
a function Y = f(X) using four heuristics :
Heuristic 1 : If Y has value V in several observed cases, make the
hypothesis that Y = V in all cases.
Heuristic 2 : If X and Y are linearly related with slope S and intercept I
in several observed cases, then make the hypothesis that Y = S X + I
in all cases.
Heuristic 3 : If X increases as Y decreases, and X and Y are not
linearly related, define a new term T as the product of X and Y,
i.e., T = X Y.
Heuristic 4 : If X increases as Y increases, and X and Y are not
linearly related, define a new term T as the ratio of X and Y,
i.e., T = X / Y.
Note : BACON.1 iteratively applies these 4 heuristics until a scientific law is
discovered. Heuristics 1 and 2 detect linear relationships; heuristics 3 and 4
detect simple non-linear relationships. Heuristics 1 and 2 produce scientific laws;
heuristics 3 and 4 are intermediate steps.
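The loop these heuristics drive can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the original program: heuristic 2 (the linear fit) is omitted, "constant" is taken to mean within 5% of the mean, and new terms are formed by pairing the newest term with earlier ones; that is enough to rediscover the D^3/P^2 invariant from the planetary data used in the example that follows.

# Illustrative sketch of BACON.1's heuristic loop (not the original program).
# Assumptions: heuristic 2 is omitted; "constant" means within 5% of the mean;
# new terms pair the newest term with earlier ones, skipping duplicates.

def nearly_constant(vals, tol=0.05):
    m = sum(vals) / len(vals)
    return all(abs(v - m) <= tol * abs(m) for v in vals)

def same_values(a, b, tol=1e-6):
    return all(abs(x - y) <= tol * max(abs(x), abs(y)) for x, y in zip(a, b))

def trend(xs, ys):
    """+1 if ys increases with xs, -1 if it decreases."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    return 1 if ys[order[-1]] > ys[order[0]] else -1

def bacon1(data, names, max_rounds=6):
    """data: dict term-name -> list of observations; names: list of term names."""
    for _ in range(max_rounds):
        for name in names:                               # Heuristic 1: constant term = law
            if nearly_constant(data[name]):
                return name, sum(data[name]) / len(data[name])
        newest = names[-1]
        for other in reversed(names[:-1]):               # Heuristics 3 and 4: new term
            a, b = data[other], data[newest]
            if trend(a, b) > 0:                          # increase together -> ratio
                cand_name, cand = f"({other})/({newest})", [x / y for x, y in zip(a, b)]
            else:                                        # opposite trends -> product
                cand_name, cand = f"({other})*({newest})", [x * y for x, y in zip(a, b)]
            if not any(same_values(cand, data[n]) for n in names):   # skip duplicates
                names.append(cand_name)
                data[cand_name] = cand
                break
    return None

# Planetary data from the Kepler example below (D in AU, P in years):
data = {"D": [0.382, 0.724, 1.0, 1.524, 5.199, 9.539],
        "P": [0.241, 0.616, 1.0, 1.881, 11.855, 29.459]}
print(bacon1(data, ["D", "P"]))   # finds a term equivalent to D^3/P^2, value close to 1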
Example : Rediscovering Kepler's third law
Kepler's third law is stated below. Assume the law is not yet discovered or known.
"The square of the orbital period T is proportional to the cube of the mean distance
a from the Sun," i.e., T^2 = k a^3, where k is a constant, the same for all planets.
If we measure T in years and all distances in astronomical units (AU), with 1 AU
the mean distance between the Earth and the Sun, then for a = 1 AU, T is one
year, and with these units k equals 1, i.e., T^2 = a^3.
Input : Planets, Distance from Sun ( D ) , orbit time Period ( P )
Planet D P
Mercury 0.382 0.241
Venus 0.724 0.616
Earth 1.0 1.0
Mars 1.524 1.881
Jupiter 5.199 11.855
Saturn 9.539 29.459
[continued in next slide]
Apply heuristics 1 to 4 : (Iteration 1)
Try heuristic 1 : not applicable, neither D nor P is constant
Try heuristic 2 : not applicable, no linear relationship
Try heuristic 3 : not applicable, D increasing, P not decreasing
Try heuristic 4 : applicable, D increases as P increases,
so add the new variable D/P to the data set.
Adding the new variable D/P to the data set :
    Planet    D      P        D/P
    Mercury   0.382  0.241    1.607
    Venus     0.724  0.616    1.175
    Earth     1.0    1.0      1.0
    Mars      1.524  1.881    0.810
    Jupiter   5.199  11.855   0.439
    Saturn    9.539  29.459   0.324
Apply heuristics 1 to 4 : (Iteration 2)
Try heuristic 1 : not applicable, D/P is not constant
Try heuristic 2 : not applicable, no linear relationship between D/P and D or P
Try heuristic 3 : applicable, D/P decreases as D increases and as P increases,
so the system could add two new variables : D (D/P) = D^2/P or P (D/P) = D;
but D already exists, so add the new variable D^2/P.
Adding the new variable D^2/P to the data set :
    Planet    D      P        D/P     D^2/P
    Mercury   0.382  0.241    1.607   0.622
    Venus     0.724  0.616    1.175   0.851
    Earth     1.0    1.0      1.0     1.0
    Mars      1.524  1.881    0.810   1.234
    Jupiter   5.199  11.855   0.439   2.280
    Saturn    9.539  29.459   0.324   3.088
Apply heuristics 1 to 4 : (Iteration 3)
Try heuristic 1 : not applicable, D^2/P is not constant
Try heuristic 2 : not applicable, D^2/P is not linearly related with any other variable
Try heuristic 3 : applicable, D^2/P decreases as D/P increases, so add the
new variable : (D^2/P) (D/P) = D^3/P^2
Adding the new variable D^3/P^2 to the data set :
    Planet    D      P        D/P     D^2/P   D^3/P^2
    Mercury   0.382  0.241    1.607   0.622   1.0
    Venus     0.724  0.616    1.175   0.851   1.0
    Earth     1.0    1.0      1.0     1.0     1.0
    Mars      1.524  1.881    0.810   1.234   1.0
    Jupiter   5.199  11.855   0.439   2.280   1.0
    Saturn    9.539  29.459   0.324   3.088   1.0
Apply heuristics 1 to 4 : (Iteration 4)
Try heuristic 1 : applicable, D^3/P^2 is constant
Conclusion : D^3/P^2 = constant. This is Kepler's third law. (It took about 20 years to discover it !)
A limitation of BACON.1 :
It works only for target equations relating at most two variables.
BACON.3 :
BACON.3 is a knowledge based production system that discovers
empirical laws. The main heuristics detect constancies and trends in data,
and lead to the formulation of hypotheses and the definition of theoretical
terms. The program represents information at varying levels of description.
The lowest levels correspond to direct observations, while the highest
correspond to hypotheses that explain everything observed so far. BACON.3
is built on top of BACON.1.
It starts with a set of variables for a problem. For example, to derive
the ideal gas law, it starts with four variables p, V, n, T :
p - gas pressure,
V - gas volume,
T - gas temperature,
n - the number of moles.
Values from experimental data are input.
BACON holds some variables constant and tries to notice trends in the data,
and finally draws inferences. Recall pV/nT = k, where k is a constant.
BACON has also been applied to Kepler's third law, Ohm's law,
conservation of momentum and Joule's law.
Example :
Rediscovering the ideal gas law pV/nT = 8.32, where p is the pressure on a
gas, n is the number of moles, T is the temperature and V the volume of
the gas. [The complete step-by-step algorithm is not shown as in the previous
example, but the procedure is explained below.]
At the first level of description we hold n = 1 and T = 300 and vary p
and V. Choose V to be the dependent variable.
At this level, BACON discovers the law pV = 2496.0.
Now the program examines this phenomenon :
when n = 1 and T = 310 then pV = 2579.2. Similarly,
when n = 1 and T = 320 then pV = 2662.4.
At this point, BACON has enough information to relate the values of pV
and the temperature T. These terms are linearly related with an
intercept of 0, making the ratio pV/T equal to 8.32.
Now the discovery system can vary its third independent term :
when n = 2, pV/T is found to be 16.64;
when n = 3, pV/T is found to be 24.96.
When it compares the values of n and pV/T, BACON finds another linear
relation with a zero intercept. The resulting equation, pV/nT = 8.32, is
equivalent to the ideal gas law.
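The layered procedure just described can be imitated with a small script. In the sketch below the observations are simulated from the gas law itself (an assumption made purely to have data to work with; the slides do not list the raw measurements), so the point is only to show the three levels of description BACON.3 moves through.

# Illustrative sketch of BACON.3's layered strategy on simulated ideal-gas data.
# The observations are generated from pV = nRT with R = 8.32 (the value used on this
# slide; the accepted value is about 8.314 J/(mol K)) purely to have data to work with.

def slope_through_origin(xs, ys):
    # least-squares slope of y = s * x with the intercept fixed at zero
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

R = 8.32
level1 = []                                    # level 1: hold n and T fixed, vary p and V
for n in (1, 2, 3):
    for T in (300, 310, 320):
        pressures = [1000.0, 2000.0, 4000.0]
        volumes = [n * R * T / p for p in pressures]          # simulated observations
        pV = sum(p * v for p, v in zip(pressures, volumes)) / len(pressures)
        level1.append((n, T, pV))              # BACON finds pV constant at this level

level2 = []                                    # level 2: pV is linear in T, zero intercept
for n in (1, 2, 3):
    Ts  = [T for (m, T, _) in level1 if m == n]
    pVs = [v for (m, _, v) in level1 if m == n]
    level2.append((n, slope_through_origin(Ts, pVs)))         # ratio pV/T for this n

# level 3: pV/T is linear in n with zero intercept, so pV/(nT) = 8.32
print(slope_through_origin([n for n, _ in level2], [s for _, s in level2]))   # -> 8.32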
6. Clustering
Clustering is a way to form natural groupings or clusters of patterns.
Clustering is often called unsupervised learning.
Example : Three natural groups of data points, i.e., 3 natural clusters.
Clustering is one of the most utilized data mining techniques.
The data have no target attribute.
The data set is explored to find some intrinsic structures in them.
Unlike Classification where the task is to learn to assign instances to
predefined classes, in Clustering no predefined classification is required.
The task is to learn a classification from the data.
6.1 Distance Functions
Euclidean geometry is the study of relationships between angles and
distances in space. Mathematical spaces may be extended to any
dimension; such a space is called an n-dimensional Euclidean space, or n-space.
Let R denote the field of real numbers. The space of all n-tuples of real
numbers forms an n-dimensional vector space over R, denoted by R^n.
An element of R^n is written as X = (x1, x2, . . , xi, . . , xn), where each xi is a
real number; similarly another element is Y = (y1, y2, . . , yi, . . , yn).
The vector space operations on R^n are defined by
    X + Y = (x1 + y1, x2 + y2, . . , xn + yn)   and   aX = (a x1, a x2, . . , a xn)
The standard inner product (i.e., dot product) on R^n, given by
    X . Y = x1 y1 + x2 y2 + . . . + xn yn = Σ i=1..n xi yi
is a real number. This product defines a distance function (or metric) on R^n by
    d(X, Y) = ||X - Y|| = sqrt( Σ i=1..n (xi - yi)^2 )
The (interior) angle θ between X and Y is then given by
    θ = cos^-1 ( X . Y / (||X|| ||Y||) )
The inner product of X with itself is always non-negative, and
    ||X|| = sqrt( X . X ) = sqrt( Σ i=1..n xi^2 )
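These definitions translate directly into a few lines of Python; the sketch below is illustrative, and the example vectors are arbitrary choices.

import math

# Dot product, norm, Euclidean metric and angle, as defined above.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def angle(x, y):
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

# Example with assumed vectors:
X, Y = (1.0, 2.0, 2.0), (2.0, 0.0, 0.0)
print(dot(X, Y), norm(X), dist(X, Y), math.degrees(angle(X, Y)))   # 2.0 3.0 3.0 70.5...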
Euclidean distance or Euclidean metric is the "ordinary" distance
between two points that one would measure with a ruler. The Euclidean
distance between two points P = (p1, p2, . . , pi, . . , pn) and Q = (q1, q2, . . , qi, . . , qn)
in Euclidean n-space is defined as :
    d(P, Q) = sqrt( (p1 - q1)^2 + (p2 - q2)^2 + . . + (pn - qn)^2 ) = sqrt( Σ i=1..n (pi - qi)^2 )
Example : 3-dimensional distance
For two 3D points P = (px, py, pz) and Q = (qx, qy, qz), the Euclidean distance
in 3-space is computed as :
    d(P, Q) = sqrt( (px - qx)^2 + (py - qy)^2 + (pz - qz)^2 )
Manhattan distance, also known as city block distance, between two
points in a Euclidean space with a fixed Cartesian coordinate system is
the sum of the lengths of the projections of the line segment between
the points onto the coordinate axes.
Example : In a plane, the Manhattan distance between the point P1
with coordinates (x1, y1) and the point P2 with coordinates (x2, y2) is
    |x1 - x2| + |y1 - y2|
Manhattan versus Euclidean distance :
[Figure : the red, blue and yellow lines represent Manhattan distances; they all have the same length, 12. The green line represents the Euclidean distance, of length 6 sqrt(2), approximately 8.49.]
Minkowski Metric
Let Xi and Xj be data points (vectors); then the Minkowski distance
between these two data points is :
    dist(Xi, Xj) = ( |xi1 - xj1|^h + |xi2 - xj2|^h + . . + |xir - xjr|^h )^(1/h)
where h is a positive integer.
Euclidean distance, if h = 2 :
    dist(Xi, Xj) = ( (xi1 - xj1)^2 + (xi2 - xj2)^2 + . . + (xir - xjr)^2 )^(1/2)
Weighted Euclidean distance :
    dist(Xi, Xj) = ( w1 (xi1 - xj1)^2 + w2 (xi2 - xj2)^2 + . . + wr (xir - xjr)^2 )^(1/2)
Manhattan distance, if h = 1 :
    dist(Xi, Xj) = |xi1 - xj1| + |xi2 - xj2| + . . + |xir - xjr|
Squared Euclidean distance : places progressively greater weight
on data points that are further apart.
    dist(Xi, Xj) = (xi1 - xj1)^2 + (xi2 - xj2)^2 + . . + (xir - xjr)^2
Chebychev distance : defines two data points as "different" if they
differ on any one of the attributes.
    dist(Xi, Xj) = max( |xi1 - xj1| , |xi2 - xj2| , . . , |xir - xjr| )
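For reference, the distance functions above can be written as short Python helpers. The sketch below is illustrative; the example points (0, 0) and (6, 6) are an assumed pair chosen to match the Manhattan-versus-Euclidean figure values of 12 and 6 sqrt(2).

# Plain-Python sketches of the distance functions defined above.
def minkowski(x, y, h):
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1.0 / h)

def euclidean(x, y):                  # Minkowski with h = 2
    return minkowski(x, y, 2)

def manhattan(x, y):                  # Minkowski with h = 1
    return minkowski(x, y, 1)

def weighted_euclidean(x, y, w):
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)) ** 0.5

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def chebychev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

# Assumed example points matching the figure: Manhattan 12, Euclidean 6*sqrt(2).
print(manhattan((0, 0), (6, 6)), euclidean((0, 0), (6, 6)))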
6.2 K-Means Clustering
K-means clustering is an algorithm to classify or group objects, based on their
attributes/features, into K groups, where K is a positive integer.
The grouping is done by minimizing the sum of squared distances
between the data points and the corresponding cluster centroids.
K-Means clustering algorithm
The algorithm consists of three steps. First take K objects (e.g., at random) as
the initial centroids; then the steps below are iterated until the grouping converges.
Step 1 : Determine the centroid coordinates.
Step 2 : Determine the distance of each object to the centroids.
Step 3 : Group the objects based on minimum distance.
A short sketch in Python is given below, and the next slides explain the
algorithm step by step with an example.
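The sketch below is illustrative only: it assumes 2-D points and caller-supplied initial centroids (and math.dist, available from Python 3.8), and it reproduces the medicine example worked out on the following slides.

import math

# Minimal K-means sketch: assign each point to its nearest centroid, recompute
# centroids, and stop when no point changes its group.
def kmeans(points, centroids, max_iter=100):
    assignment = None
    for _ in range(max_iter):
        # Steps 2 and 3: distance to each centroid, then group by minimum distance.
        new_assignment = [min(range(len(centroids)),
                              key=lambda c: math.dist(p, centroids[c]))
                          for p in points]
        if new_assignment == assignment:          # no object moved: converged
            break
        assignment = new_assignment
        # Step 1 (next round): each centroid becomes the mean of its group members.
        for c in range(len(centroids)):
            members = [p for p, g in zip(points, assignment) if g == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assignment, centroids

# The medicine data A, B, C, D from the example below, with A and B as initial centroids:
data = [(1, 1), (2, 1), (4, 3), (5, 4)]
groups, centres = kmeans(data, [(1.0, 1.0), (2.0, 1.0)])
print(groups)    # [0, 0, 1, 1] : A, B in group 1 and C, D in group 2
print(centres)   # [(1.5, 1.0), (4.5, 3.5)]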
[Flow chart : Start → choose the number of clusters K and the initial centroids → compute each object's distance from the centroids → group objects by minimum distance → if no object moved to a different group, end; otherwise recompute the centroids and repeat.]
Example : K-Means Clustering
Objects : 4 medicines A, B, C, D.
Attributes : 2, where X is the weight index and Y is the pH.
    Objects    X    Y
    A          1    1
    B          2    1
    C          4    3
    D          5    4
Initial value of the centroids :
Suppose medicine A and medicine B are taken as the first centroids. If P1 and P2
denote the coordinates of the centroids, then P1 = (1, 1) and P2 = (2, 1).
Let centroid P1 be for cluster group-1 and centroid P2 be for cluster group-2.
[Plot : the four objects A(1, 1), B(2, 1), C(4, 3), D(5, 4) shown as points in the X-Y plane.]
1. Iteration 0
(a) Object cluster centers stated before :
    Group-1 has center P1 = (1, 1) ; Group-2 has center P2 = (2, 1).
    Objects : A, B, C, D ; Attributes : X and Y
         A    B    C    D
    X    1    2    4    5
    Y    1    1    3    4
(b) Calculate the distances between each cluster center and each object.
    1st, calculate the Euclidean distances from centroid P1 to each point A, B, C, D;
    they form the 1st row of the distance matrix.
    2nd, calculate the Euclidean distances from centroid P2 to each point A, B, C, D;
    they form the 2nd row of the distance matrix.
    The way to calculate two of the distance matrix elements, D13 and D23, is :
    D13 = sqrt( (Cx - P1x)^2 + (Cy - P1y)^2 ) = sqrt( (4 - 1)^2 + (3 - 1)^2 ) = 3.61
    D23 = sqrt( (Cx - P2x)^2 + (Cy - P2y)^2 ) = sqrt( (4 - 2)^2 + (3 - 1)^2 ) = 2.83
    Similarly calculate the other elements D11, D12, D14, D21, D22, D24.
(c) The distance matrix becomes
              A     B     C      D
    D0 = [    0     1     3.61   5    ]   1st row : distances from the group-1 center
         [    1     0     2.83   4    ]   2nd row : distances from the group-2 center
(d) Clustering the objects into groups :
    Assign each object to the group whose center is at minimum distance. Thus,
    medicine A is assigned to group 1;
    medicine B is assigned to group 2;
    medicine C is assigned to group 2; and
    medicine D is assigned to group 2.
    Group matrix : an element is 1 if the object is assigned to that group
              A   B   C   D
    G0 = [    1   0   0   0  ]   1st row : group-1 cluster
         [    0   1   1   1  ]   2nd row : group-2 cluster
2. Iteration 1 :
The cluster groups have new members, so compute the new centroid of each
group and repeat the steps of the previous iteration.
Group-1 has one member, A, so its centroid remains P1 = (1, 1).
Group-2 now has 3 members B, C, D, so its centroid is the average of their coordinates :
    P2 = ( (2 + 4 + 5)/3 , (1 + 3 + 4)/3 ) = (11/3 , 8/3) = (3.67, 2.67)
(a) Object cluster centers stated above :
    Group-1 has center P1 = (1, 1) ; Group-2 has center P2 = (3.67, 2.67).
    Objects : A, B, C, D ; Attributes : X and Y
         A    B    C    D
    X    1    2    4    5
    Y    1    1    3    4
(b) Calculate the distances between each cluster center and each object.
    [The method is the same as in Iteration 0 (b).]
    1st, calculate the Euclidean distances from centroid P1 to each point A, B, C, D;
    you get D11, D12, D13, D14 as the 1st row of the distance matrix.
    2nd, calculate the Euclidean distances from centroid P2 to each point A, B, C, D;
    you get D21, D22, D23, D24 as the 2nd row of the distance matrix.
(c) The distance matrix becomes
              A      B      C      D
    D1 = [    0      1      3.61   5    ]   1st row : group-1 cluster
         [    3.14   2.36   0.47   1.89 ]   2nd row : group-2 cluster
(d) Clustering the objects into groups :
    Assign each object to the group whose center is at minimum distance.
    medicine A : no change, remains in group 1;
    medicine B : moves to group 1;
    medicine C : no change, remains in group 2;
    medicine D : no change, remains in group 2.
    Group matrix : an element is 1 if the object is assigned to that group
              A   B   C   D
    G1 = [    1   1   0   0  ]   1st row : group-1 cluster
         [    0   0   1   1  ]   2nd row : group-2 cluster
3. Iteration 2 :
The cluster groups again have new members, so compute the new centroid of
each group and repeat the steps.
Group-1 has 2 members, A and B, so its new centroid is the average of their coordinates :
    P1 = ( (1 + 2)/2 , (1 + 1)/2 ) = (1.5, 1)
Group-2 has 2 members, C and D, so its new centroid is the average of their coordinates :
    P2 = ( (4 + 5)/2 , (3 + 4)/2 ) = (4.5, 3.5)
(a) Object cluster centers stated above :
    Group-1 has center P1 = (1.5, 1) ; Group-2 has center P2 = (4.5, 3.5).
    Objects : A, B, C, D ; Attributes : X and Y
         A    B    C    D
    X    1    2    4    5
    Y    1    1    3    4
(b) Calculate the distances between each cluster center and each object.
    [The method is the same as in Iteration 0 (b) or Iteration 1 (b).]
    1st, calculate the Euclidean distances from centroid P1 to each point A, B, C, D;
    you get D11, D12, D13, D14 as the 1st row of the distance matrix.
    2nd, calculate the Euclidean distances from centroid P2 to each point A, B, C, D;
    you get D21, D22, D23, D24 as the 2nd row of the distance matrix.
(c) The distance matrix becomes
              A      B      C      D
    D2 = [    0.5    0.5    3.20   4.61 ]   1st row : group-1 cluster
         [    4.30   3.54   0.71   0.71 ]   2nd row : group-2 cluster
(d) Clustering the objects into groups :
    Assign each object to the group whose center is at minimum distance.
    medicine A : no change, remains in group 1;
    medicine B : no change, remains in group 1;
    medicine C : no change, remains in group 2;
    medicine D : no change, remains in group 2.
    Group matrix : an element is 1 if the object is assigned to that group
              A   B   C   D
    G2 = [    1   1   0   0  ]   1st row : group-1 cluster
         [    0   0   1   1  ]   2nd row : group-2 cluster
4. Finally, the result shows that the group matrix G2 = G1.
This means the grouping of objects in this last iteration and the one
before it does not change any more. Thus the computation of the K-means
clustering has reached its stability and no more iterations are needed.
Results of the final grouping :
    Objects       Feature 1 (X) : weight index   Feature 2 (Y) : pH   Cluster group
    Medicine A    1                              1                    1
    Medicine B    2                              1                    1
    Medicine C    4                              3                    2
    Medicine D    5                              4                    2
7. Analogy
Learning by analogy means acquiring new knowledge about an input entity by
transferring it from a known similar entity.
This technique transforms the solutions of problems in one domain to the
solutions of the problems in another domain by discovering analogous states and
operators in the two domains.
Example : Infer by analogy the hydraulic law that is similar to Kirchhoff's first law.
[Figure : a hydraulic junction with inflows Qa = 3 and Qb = 9 and an unknown outflow Qc, shown beside an electrical junction where Kirchhoff's first law gives I3 = I1 + I2; by analogy, Qc = Qa + Qb = 12.]
The other similar examples are :
Pressure Drop is like Voltage Drop
Hydrogen Atom is like our Solar System :
The Sun has a greater mass than the Earth and attracts it, causing the Earth to
revolve around the Sun. The nucleus also has a greater mass than the electron
and attracts it. Therefore it is plausible that the electron also revolves around the
nucleus.
Reinforcement Learning : notation
R_t : return
π : policy, a decision-making rule
π(s) : action taken in state s under deterministic policy π
π(s, a) : probability of taking action a in state s under stochastic policy π
S : set of all non-terminal states
S+ : set of all states, including the terminal state
A(s) : set of actions possible in state s
p^a_ss' : probability of transition from state s to state s' under action a
r^a_ss' : expected immediate reward on transition from state s to state s' under action a
[Figure : agent-environment interaction, shown as the trajectory s_t, a_t, r_t+1, s_t+1, a_t+1, r_t+2, s_t+2, a_t+2, r_t+3, s_t+3, a_t+3, . . .]
10. References : Textbooks
1. "Artificial Intelligence", by Elaine Rich and Kevin Knight, (2006), McGraw Hill Companies Inc., Chapter 17, pages 447-484.
2. "Artificial Intelligence: A Modern Approach", by Stuart Russell and Peter Norvig, (2002), Prentice Hall, Chapters 18-21, pages 649-788.
3. "Computational Intelligence: A Logical Approach", by David Poole, Alan Mackworth, and Randy Goebel, (1998), Oxford University Press, Chapter 11, pages 397-438.
4. "Artificial Intelligence: Structures and Strategies for Complex Problem Solving", by George F. Luger, (2002), Addison-Wesley, Chapters 10-13, pages 385-570.
5. "AI: A New Synthesis", by Nils J. Nilsson, (1998), Morgan Kaufmann Inc., Chapter 10, pages 163-178.
6. "Artificial Intelligence: Theory and Practice", by Thomas Dean, (1994), Addison-Wesley, Chapter 5, pages 179-254.
7. Related documents from open source, mainly internet. An exhaustive list is being prepared for inclusion at a later date.