KE Unit 5 Notes
The previous chapters introduced the main knowledge elements from the knowledge base
of an agent, which are all based on the notion of concept. This chapter presents the basic
operations involved in learning, including comparing the generality of concepts, general-
izing concepts, and specializing concepts. We start with a brief overview of several
machine learning strategies that are particularly useful for knowledge-based agents.
“Learning denotes changes in the system that are adaptive in the sense that they
enable the system to do the same task or tasks drawn from the same population more
efficiently and more effectively the next time” (Simon, 1983, p. 28).
“‘Learning’ is making useful changes in the workings of our minds” (Minsky, 1986,
p. 120).
“Learning is constructing or modifying representations of what is being experienced”
(Michalski, 1986, p. 10).
“A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E.” (Mitchell, 1997, p. 2).
Given the preceding definitions, we may characterize learning as denoting the way in
which people and computers acquire new knowledge and skills and improve their existing ones through experience.
There are two complementary dimensions of learning: competence and efficiency. A system
is improving its competence if it learns to solve a broader class of problems and to make
fewer mistakes in problem solving. The system is improving its efficiency if it learns to solve
the problems from its area of competence faster or by using fewer resources.
Machine learning is the domain of artificial intelligence that is concerned with building
adaptive computer systems that are able to improve their performance (competence and/or
efficiency) through learning from input data, from a user, or from their own problem-solving
experience.
Research in machine learning has led to the development of many basic learning
strategies, each characterized by the type of inference and knowledge representation it employs. These include:
Rote learning
Version space learning
Decision tree induction
Clustering
Rule induction (e.g., Learning rule sets, Inductive logic programming)
Instance-based strategies (e.g., K-nearest neighbors, Locally weighted regression, Col-
laborative filtering, Case-based reasoning and learning, Learning by analogy)
Bayesian learning (e.g., Naïve Bayes learning, Bayesian network learning)
Neural networks and Deep learning
Model ensembles (e.g., Bagging, Boosting, ECOC, Stacking)
Support vector machines
Explanation-based learning
Abductive learning
Reinforcement learning
Genetic algorithms and evolutionary computation
Apprenticeship learning
Multistrategy learning
In the next sections, we will briefly introduce four learning strategies that are particularly
useful for agent teaching and learning.
[Figure 8.2. Learning a cup-recognition rule by explanation-based learning.
An example of a cup:
  cup(o1): color(o1, white), made-of(o1, plastic), light-mat(plastic),
           has-handle(o1), has-flat-bottom(o1), up-concave(o1), ...
The proof that o1 is a cup (left) is generalized into a proof that any object x is a cup (right);
for instance, made-of(o1, plastic), light-mat(plastic), and has-handle(o1) are generalized to
made-of(x, y), light-mat(y), and has-handle(x).
Learned rule:
  ∀x ∀y made-of(x, y) ∧ light-mat(y) ∧ has-handle(x) ∧ ... ⇒ cup(x)]
Notice also the type and amount of information that the agent needs. For instance, the agent does not require any prior knowledge to perform
this type of learning.
The result of this learning strategy is the increase of the problem-solving competence of
the agent. Indeed, the agent will learn to perform tasks it was not able to perform before,
such as recognizing the cups from a set of objects.
Notice that the agent used the fact that o1 has a handle in order to prove that o1 is a cup.
This means that having a handle is an important feature. On the other hand, the agent did
not use the color of o1 to prove that o1 is a cup. This means that color is not important.
Notice how the agent reaches the same conclusions as in inductive learning from
examples, but through a different line of reasoning, and based on a different type of
information (i.e., prior knowledge instead of multiple examples).
The next step in the learning process is to generalize the proof tree from the left-hand
side of Figure 8.2 into the general tree from the right-hand side. This is done by using the
agent’s prior knowledge of how to generalize the individual inferences from the specific tree.
While the tree from the left-hand side proves that the specific object o1 is a cup, the tree
from the right-hand side proves that any object x that satisfies the leaves of the general tree
is a cup. Thus the agent has learned the general cup recognition rule from the bottom of
Figure 8.2.
To recognize that another object, o2, is a cup, the agent needs only to check that it
satisfies the rule, that is, to check for the presence of these features discovered as
important (i.e., light-mat, has-handle, etc.). The agent no longer needs to build a complex
proof tree. Therefore, cup recognition is done much faster.
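To make the payoff of the learned rule concrete, here is a minimal Python sketch (not from the book; the fact encoding and names are assumptions) of applying the learned cup-recognition rule to a new object o2 by checking the rule's leaf conditions directly:

```python
# A minimal sketch (assumed fact encoding) of applying the learned rule
# forall x, y: made-of(x, y) & light-mat(y) & has-handle(x) & ... => cup(x).
facts = {
    ("made-of", "o2", "ceramic"),
    ("light-mat", "ceramic"),
    ("has-handle", "o2"),
    ("has-flat-bottom", "o2"),
    ("up-concave", "o2"),
}

def is_cup(obj, facts):
    """Check the learned rule's leaf conditions instead of building a proof tree."""
    materials = [f[2] for f in facts
                 if len(f) == 3 and f[0] == "made-of" and f[1] == obj]
    has_light_material = any(("light-mat", m) in facts for m in materials)
    return has_light_material and ("has-handle", obj) in facts

print(is_cup("o2", facts))  # True: o2 satisfies the learned conditions
```

Checking a handful of feature conditions replaces the construction of a full proof tree, which is why recognition becomes much faster.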
Finally, notice that the agent needs only one example from which to learn. However, it
needs a lot of prior knowledge to prove that this example is a cup. Providing such prior
knowledge to the agent is a very complex task.
[Figure (fragment): Analogy between the Solar System and the hydrogen atom. The Sun (which has a mass, a temperature, and the color yellow, and whose greater mass causes a behavior B) is similar to the nucleus (which has a mass); the question is whether the corresponding causal relationship B′ also holds for the hydrogen atom.]
The students may then infer that other features of the Solar System are also features of
the hydrogen atom. For instance, in the Solar System, the greater mass of the sun and its
attraction of the planets cause the planets to revolve around it. Therefore, the students may
hypothesize that this causal relationship is also true in the case of the hydrogen atom: The
greater mass of the nucleus and its attraction of the electrons cause the electrons to revolve
around the nucleus. This is indeed true and represents a very interesting discovery.
The main problem with analogical reasoning is that not all the features of the Solar
System are true for the hydrogen atom. For instance, the sun is yellow, but the nucleus is
not. Therefore, the information derived by analogy has to be verified.
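A small Python sketch (the relation encoding and names are assumptions, not the book's) of this kind of analogical transfer: the relations of the Solar System are mapped onto the hydrogen atom through an object mapping, and every transferred relation is treated only as a hypothesis to be verified:

```python
# A toy sketch of analogical transfer: source relations are mapped to the
# target and flagged for verification. Names are illustrative only.
source_relations = [
    ("greater-mass-attracts", "sun", "planets"),
    ("revolves-around", "planets", "sun"),
    ("has-color", "sun", "yellow"),
]

# Mapping between source and target objects established by the analogy.
mapping = {"sun": "nucleus", "planets": "electrons"}

def transfer(relations, mapping):
    """Map each source relation onto the target; every result is only a hypothesis."""
    return [(rel, mapping.get(a, a), mapping.get(b, b)) for rel, a, b in relations]

for hypothesis in transfer(source_relations, mapping):
    print("hypothesis to verify:", hypothesis)
# ('has-color', 'nucleus', 'yellow') is transferred too, and verification rejects it.
```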
8.2 CONCEPTS
[Figure (fragment): A concept and its instances. The figure shows a positive example, a negative example, a positive exception, and a negative exception; instances range from those that are most likely positive examples, through more likely positive and more likely negative examples, to those that are most likely negative examples.]
In the next sections, we will describe in detail the basic learning operations dealing with
concepts: the generalization of concepts, the specialization of concepts, and the comparison of
the generality of concepts.
A concept was defined as representing a set of instances. In order to show that a concept
P is more general than a concept Q, this definition would require the computation and
comparison of the (possibly infinite) sets of the instances of P and Q. In this section, we
will introduce generalization and specialization rules that will allow one to prove that a
concept P is more general than another concept Q by manipulating the descriptions of
P and Q, without computing the sets of instances that they represent.
A generalization rule is a rule that transforms (the description of) a concept into (the
description of) a more general concept. The generalization rules are usually inductive
transformations. The inductive transformations are not truth preserving but falsity pre-
serving. That is, if P is true and is inductively generalized to Q, then the truth of Q is not
guaranteed. However, if P is false, then Q is also false.
A specialization rule is a rule that transforms a concept into a less general concept. The
reverse of any generalization rule is a specialization rule. Specialization rules are deduct-
ive, truth-preserving transformations.
A reformulation rule transforms a concept into another, logically equivalent concept.
Reformulation rules are also deductive, truth-preserving transformations.
If one can transform concept P into concept Q by applying a sequence of generalization
rules, then Q is more general than P.
Consider the phrase, “Students who have majored in computer science at George
Mason University between 2007 and 2008.” The following are some of the phrases that
are obvious generalizations of this phrase:
“Students who have majored in computer science between 2007 and 2008”
“Students who have majored in computer science between 2000 and 2012”
“Students who have majored in computer science at George Mason University”
“Students who have majored in computer science”
Some of the phrases that are specializations of the preceding phrase follow:
“Graduate students who have majored in computer science at George Mason Univer-
sity between 2007 and 2008”
“Students who have majored in computer science at George Mason University in 2007”
“Undergraduate students who have majored in both computer science and mathemat-
ics at George Mason University in 2008”
The following are some of the main generalization rules:
Turning constants into variables
Dropping conditions
Extending intervals
Extending ordered sets of intervals
Extending discrete sets
Using feature definitions
Using inference rules
By replacing 55 with the variable ?N1, which can take any value, we generalize this
concept to the one shown in [8.2]: “The set of professors with any number of publications.”
In particular, ?N1 could be 55. Therefore the second concept includes the first one.
Conversely, by replacing ?N1 with 55, we specialize the concept [8.2] to the concept
[8.1]. The important thing to notice here is that by a simple syntactic operation (turning a
number into a variable), we can generalize a concept. This is one way in which an agent
generalizes concepts.
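The following is a minimal Python sketch (the dictionary encoding and names are assumptions, not the book's representation) of this "turning constants into variables" operation and of the simple coverage test it enables:

```python
# A minimal sketch (representation assumed) of the
# "turning constants into variables" generalization rule.
concept_8_1 = {"type": "professor", "number-of-publications": 55}

def turn_constant_into_variable(concept, feature, var_name):
    """Generalize a concept by replacing a constant feature value with a variable."""
    generalized = dict(concept)
    generalized[feature] = var_name          # ?N1 may take any value, including 55
    return generalized

concept_8_2 = turn_constant_into_variable(concept_8_1, "number-of-publications", "?N1")

def covers(general, specific):
    """A description covers another if every non-variable feature value matches."""
    return all(str(v).startswith("?") or specific.get(f) == v
               for f, v in general.items())

print(covers(concept_8_2, concept_8_1))   # True: [8.2] is more general than [8.1]
print(covers(concept_8_1, concept_8_2))   # False
```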
E1 may be interpreted as representing the concept: “the papers ?O1 and ?O2 authored
by the professor ?O3.” E2 may be interpreted as representing the concept: “the papers ?O1
and ?O2 authored by the professors ?O31 and ?O32, respectively.” In particular, ?O31 and
?O32 may represent the same professor. Therefore, the second set includes the first one,
and the second expression is more general than the first one.
Figure 8.7. Ordered set of intervals as an ordered generalization hierarchy (human age):
(0.0, 1.0), [1.0, 4.5), [4.5, 12.5), [12.5, 19.5), [19.5, 65.5), [65.5, 150.0].
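As an illustration of the "extending intervals" and "extending ordered sets of intervals" rules, here is a small Python sketch (the interval representation is an assumption) that tests whether one interval is more general than another and extends an interval over the ordered set from Figure 8.7:

```python
# A sketch (assumed representation) of generalization over intervals.

# The ordered set of intervals from Figure 8.7 (human age), as (low, high) pairs.
HUMAN_AGE = [(0.0, 1.0), (1.0, 4.5), (4.5, 12.5),
             (12.5, 19.5), (19.5, 65.5), (65.5, 150.0)]

def more_general_interval(a, b):
    """Interval a is at least as general as interval b if a includes b."""
    return a[0] <= b[0] and b[1] <= a[1]

def extend_interval(interval, ordered_set):
    """Generalize an interval to the union of the ordered-set intervals it touches."""
    touched = [iv for iv in ordered_set
               if not (iv[1] <= interval[0] or iv[0] >= interval[1])]
    return (min(iv[0] for iv in touched), max(iv[1] for iv in touched))

print(more_general_interval((0.0, 150.0), (1.0, 12.5)))   # True
print(extend_interval((5.0, 15.0), HUMAN_AGE))            # (4.5, 19.5)
```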
Up to this point we have only defined when a concept is more general than another
concept. Learning agents, however, would need to generalize sets of examples and
concepts. In the following we define some of these generalizations.
To show that [8.23] is more general than [8.22] it is enough to show that [8.22] can be
transformed into [8.23] by applying a sequence of generalization rules. The sequence is the
following one:
[Figure (fragment): a generalization hierarchy in which person is more general than employee and student, and graduate research assistant and graduate teaching assistant are the most specific concepts.]

S1: ?O1 instance of graduate research assistant
        is interested in ?O2
    ?O2 instance of area of expertise

S2: ?O1 instance of graduate teaching assistant
        is interested in ?O2
    ?O2 instance of area of expertise

C1: ?O1 instance of graduate research assistant
        is interested in ?O2
    ?O2 instance of area of expertise
        requires programming

C2: ?O1 instance of graduate teaching assistant
        is interested in ?O2
    ?O2 instance of area of expertise
        requires fieldwork
Notice, however, that there may be more than one minimal generalization of two
expressions. For instance, according to the generalization hierarchy from the middle of
Figure 8.8, there are two minimal generalizations of graduate research assistant and
graduate teaching assistant. They are university employee and graduate student. Conse-
quently, there are two minimal generalizations of S1 and S2 in Figure 8.11: mG1 and
mG2. The generalization mG1 was obtained by generalizing graduate research assistant and
graduate teaching assistant to university employee. mG2 was obtained in a similar fashion,
except that graduate research assistant and graduate teaching assistant were generalized to
graduate student. Neither mG1 nor mG2 is more general than the other. However, G3 is
more general than each of them.
Disciple agents employ minimal generalizations, also called maximally specific gener-
alizations (Plotkin, 1970; Kodratoff and Ganascia, 1986). They also employ maximal
generalizations, also called maximally general generalizations (Tecuci and Kodratoff,
1990; Tecuci, 1992; Tecuci, 1998).
[Figure 8.11 (fragment): the minimal generalizations mG1 and mG2 of S1 and S2.]

mG1: ?O1 instance of university employee
         is interested in ?O2
     ?O2 instance of area of expertise

mG2: ?O1 instance of graduate student
         is interested in ?O2
     ?O2 instance of area of expertise

S1: ?O1 instance of graduate research assistant
        is interested in ?O2
    ?O2 instance of area of expertise

S2: ?O1 instance of graduate teaching assistant
        is interested in ?O2
    ?O2 instance of area of expertise
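A hypothetical Python sketch of computing such minimal generalizations by climbing a generalization hierarchy; the hierarchy below is a simplified encoding modeled on Figure 8.8, and the fact that two minimal generalizations can result is visible in the output:

```python
# A sketch (hierarchy encoding assumed) of computing the minimal generalizations
# of two concepts in a hierarchy with multiple inheritance.
PARENTS = {
    "graduate research assistant": {"university employee", "graduate student"},
    "graduate teaching assistant": {"university employee", "graduate student"},
    "university employee": {"employee"},
    "graduate student": {"student"},
    "employee": {"person"},
    "student": {"person"},
    "person": set(),
}

def ancestors(concept):
    """All concepts more general than or equal to the given concept."""
    result = {concept}
    for parent in PARENTS.get(concept, set()):
        result |= ancestors(parent)
    return result

def minimal_generalizations(c1, c2):
    """Common generalizations with no other common generalization below them."""
    common = ancestors(c1) & ancestors(c2)
    return {g for g in common
            if not any(other != g and g in ancestors(other) for other in common)}

print(minimal_generalizations("graduate research assistant",
                              "graduate teaching assistant"))
# {'university employee', 'graduate student'}: two minimal generalizations
```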
The minimal specialization of two clauses consists of the minimal specialization of the
matched feature-value pairs, and of all the unmatched feature-value pairs. This procedure
assumes that no new clause feature can be made explicit by applying theorems. Otherwise,
one has first to make all the features explicit.
The minimal specialization of two conjunctions of clauses C1 and C2 consists of the
conjunction of the minimal specializations of each of the matched clauses of C1 and C2,
and of all the unmatched clauses from C1 and C2.
Figure 8.12 shows several specializations of the concepts G1 and G2. mS1 and mS2 are
two minimal specializations of G1 and G2 because graduate research assistant and graduate
teaching assistant are two minimal specializations of university employee and graduate
student.
Notice that in all the preceding definitions and illustrations, we have assumed that the
clauses to be generalized correspond to the same variables. If this assumption is not
satisfied, then one would need first to match the variables and then compute the general-
izations. In general, this process is computationally expensive because one will need to try
different matchings.
Inductive concept learning from examples has already been introduced in Section 8.1.2. In
this section, we will discuss various aspects of this learning strategy that are relevant to
agent teaching and learning. The problem of inductive concept learning from examples
can be more precisely defined as indicated in Table 8.2.
The bias of the learning agent is any basis for choosing one generalization over another,
other than strict consistency with the observed training examples (Mitchell, 1997). In the
following, we will consider two agents that employ two different preference biases: a
cautious learner that always prefers minimal generalizations, and an aggressive learner
that always prefers maximal generalizations.
Let us consider the positive examples [8.28] and [8.29], and the negative example [8.30]
of a concept to be learned by these two agents in the context of the generalization
hierarchies from Figure 8.13.
[Figure 8.12 (fragment): the minimal specializations mS1 and mS2 of G1 and G2.]

mS1: ?O1 instance of graduate research assistant
         is interested in ?O2
     ?O2 instance of area of expertise

mS2: ?O1 instance of graduate teaching assistant
         is interested in ?O2
     ?O2 instance of area of expertise

C1: ?O1 instance of graduate research assistant
        is interested in ?O2
    ?O2 instance of area of expertise
        requires programming

C2: ?O1 instance of graduate teaching assistant
        is interested in ?O2
    ?O2 instance of area of expertise
        requires fieldwork
Given
A language of instances.
A language of generalizations.
A set of positive examples (E1, . . ., En) of a concept.
A set of negative (or counter) examples (C1, . . ., Cm) of the same concept.
A learning bias.
Other background knowledge.
Determine
A concept description that is a generalization of the positive
examples and that does not cover any of the negative examples.
Purpose of concept learning
Predict if an instance is a positive example of the learned concept.
[Figure 8.13 (fragment): generalization hierarchies including person; university, with the subconcepts state university and private university; and the subconcepts computer technician, full professor, associate professor, and assistant professor.]

Positive example [8.28]:
  Mark White instance of assistant professor
             is employed by George Mason University

Negative example [8.30]:
  George Dean instance of computer technician
              is employed by Stanford University
What concept might be learned by the cautious learner from the positive examples
[8.28] and [8.29], and the negative example [8.30]? The cautious learner would learn a
minimal generalization of the positive examples, which does not cover the negative
example. Such a minimal generalization might be the expression [8.31], “an assistant
professor employed by a state university,” obtained by minimally generalizing George
Mason University and University of Virginia to state university.
The concept learned by the cautious learner is represented in Figure 8.14 as the
minimal ellipse that covers the positive examples without covering the negative example.
Assuming a complete ontology, the learned concept is included into the actual concept.
How will the cautious learner classify each of the instances represented in Figure 8.14
as black dots? It will classify the dot covered by the learned concept as a positive example,
and the two dots that are not covered by the learned concept as negative examples.
How confident are you in the classification, when the learner predicts that an instance
is a positive example? When a cautious learner classifies an instance as a positive example
of a concept, this classification is correct because an instance covered by the learned
concept is also covered by the actual concept.
But how confident are you in the classification, when the learner predicts that an
instance is a negative example? The learner may make mistakes when classifying an
instance as a negative example, such as the black dot that is covered by the actual concept
but not by the learned concept. This type of error is called “error of omission” because
some positive examples are omitted – that is, they are classified as negative examples.
Let us now consider the concept that might be learned by the aggressive learner from
the positive examples [8.28] and [8.29], and the negative example [8.30]. The aggressive
learner will learn a maximal generalization of the positive examples that does not cover the
negative example. Such a maximal generalization might be the expression [8.32], “a
professor employed by a university.” This is obtained by generalizing assistant professor
to professor (the most general generalization that does not cover computer technician from
the negative example) and by maximally generalizing George Mason University and Univer-
sity of Virginia to university. Although university covers Stanford University, this is fine
because the obtained concept [8.32] still does not cover the negative example [8.30].
Figure 8.14. Learning and classifications by a cautious learner (the concept learned by the cautious learner lies inside the actual concept, covering the positive examples and excluding the negative example).
The concept learned by the aggressive learner is represented in Figure 8.15 as the
maximal ellipse that covers the positive examples without covering the negative example.
Assuming a complete ontology, the learned concept includes the actual concept.
How will the aggressive learner classify each of the instances represented in Figure 8.15
as black dots? It will classify the dot that is outside the learned concept as a negative
example, and the other dots as positive examples.
How confident are you in the classification when the learner predicts that an instance is
a negative example? When the learner predicts that an instance is a negative example, this
classification is correct because that instance is not covered by the actual concept, which is
itself covered by the learned concept.
But, how confident are you in the classification when the learner predicts that an
instance is a positive example? The learner may make mistakes when predicting that an
instance is a positive example, as is the case with the dot covered by the learned concept,
but not by the actual concept. This type of error is called “error of commission” because
some negative examples are committed – that is, they are classified as positive examples.
Notice the interesting fact that the aggressive learner is correct when it classifies
instances as negative examples (they are indeed outside the actual concept because they
are outside the concept learned by the aggressive learner) while the cautious learner is
correct when it classifies instances as positive examples (they are inside the actual concept
because they are inside the concept learned by the cautious learner). How could one
synergistically integrate these two learning strategies to take advantage of their comple-
mentariness? An obvious solution is to use both strategies, learning both a minimal and a
maximal generalization from the examples, as illustrated in Figure 8.16.
What class will be predicted by a dual-strategy learner for the instances represented as
black dots in Figure 8.16? The dot covered by the concept learned by the cautious learner
Figure 8.16. Learning and classifications by a dual-strategy learner (the concept learned by the cautious learner lies inside the concept learned by the aggressive learner, with the actual concept between them).
will be classified, with high confidence, as a positive example. The dot that is not covered
by the concept learned by the aggressive learner will be classified, again with high
confidence, as a negative example. The dual-strategy learner will indicate that it cannot
classify the other two dots.
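The dual-strategy classification described above can be sketched as follows (concepts are modeled here as plain instance sets purely for illustration; this is not Disciple's representation):

```python
# A sketch of dual-strategy classification: answer only where the cautious and
# the aggressive learner agree with certainty.
def classify(instance, cautious_concept, aggressive_concept):
    if instance in cautious_concept:
        return "positive (high confidence)"      # inside the minimal generalization
    if instance not in aggressive_concept:
        return "negative (high confidence)"      # outside the maximal generalization
    return "unknown"                             # between the two bounds

cautious = {"i1", "i2"}                  # minimal generalization of the positives
aggressive = {"i1", "i2", "i3", "i4"}    # maximal generalization of the positives

for inst in ["i1", "i3", "i9"]:
    print(inst, "->", classify(inst, cautious, aggressive))
# i1 -> positive, i3 -> unknown, i9 -> negative
```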
Let us consider the ontology from Figure 8.17. What is the maximal generalization of the
positive examples John Doe and Jane Austin that does not cover the given negative example
Bob Sharp, in the case where graduate research assistant is included into the ontology? The
maximal generalization is faculty member.
But what is the maximal generalization in the case where graduate research assistant is
missing from the ontology? In this case, the maximal generalization is employee, which is,
in fact, an overgeneralization.
What is the minimal specialization of person that does not cover Bob Sharp in the case
where graduate research assistant is included into the ontology? It is faculty member.
But what is the minimal specialization in the case where graduate research assistant is
missing from the ontology? In this case, the minimal specialization is employee, which
is an underspecialization.
[Figure 8.17. Plausible generalizations and specializations due to ontology incompleteness. The ontology fragment includes person with subconcepts employee and student; employee with subconcept university employee; university employee with subconcepts staff member and faculty member; student with subconcepts graduate student and undergraduate student; faculty member with subconcepts instructor, professor, and PhD advisor; graduate student with subconcepts graduate research assistant and graduate teaching assistant; undergraduate student with subconcept BS student; and the instances John Smith, John Doe (+), Jane Austin (+), Bob Sharp (−), and Joan Dean.]
Notice that the incompleteness of the ontology causes the learner both to overgeneral-
ize and underspecialize. In view of the preceding observations, what can be said about the
relationships between the concepts learned using minimal and maximal generalizations
and the actual concept when the ontology and the representation language are incom-
plete? The minimal and maximal generalizations are only approximations of the actual
concept, as shown in Figure 8.18.
Why is the concept learned with an aggressive strategy more general than the one
learned with a cautious strategy? Because they are based on the same ontology and
generalization rules.
Consider a set of variables. For convenience in identifying variables, their names start
with “?,” as in, for instance, ?O1. Variables are used to denote unspecified instances of
concepts.
Consider a set of constants. Examples of constants are numbers (such as “5”), strings
(such as “programming”), symbolic probability values (such as “very likely”), and
instances (such as “John Doe”). We define a term to be either a variable or a constant.
Consider a set of features. This set includes the domain-independent features
“instance of,” “subconcept of,” and “direct subconcept of,” as well as other domain-
specific features, such as “is interested in.”
Consider an object ontology consisting of a set of concepts and instances defined using the
clause representation [7.2] presented in Section 7.2, where the feature values (vi1 . . . vim)
are constants, concepts, instances, or intervals (numeric or symbolic). That is, there are no
variables in the definition of a concept or an instance from the ontology, such as the following one:
The concepts and the instances from the ontology are related by the generalization relations
“instance of” and “subconcept of.” The ontology includes the concept object, which represents all
the instances from the application domain and is therefore more general than any
other object concept.
Consider also the set of the theorems and the properties of the features, variables, and
constants.
Two properties of any feature are its domain and its range. Other features may have
special properties. For instance, the relation subconcept of is transitive (see Section
5.7). Also, a concept or an instance inherits the features of the concepts that are more
general than it (see Section 5.8).
Consider, finally, a set of connectors, which includes the logical connectors AND (∧), OR (∨), and
NOT (Except–When), the connectors “{” and “}” for defining alternative values of a
feature, the connectors “[” and “]” as well as “(” and “)” for defining a numeric or a
symbolic interval, the delimiter “,” (a comma), and the symbols “Plausible Upper
Bound” and “Plausible Lower Bound.”
In the preceding expression, each of concept-i . . . concept-n is either an object concept from
the object ontology (such as PhD student), a numeric interval (such as [50, 60]), a set of
numbers (such as {1, 3, 5}), a set of strings (such as {white, red, blue}), a symbolic probability
interval (such as [likely - very likely]), or an ordered set of intervals (such as [youth - mature]).
?Oj, . . . , ?Ol,. . ., ?Op, . . . , ?Ot are distinct variables from the sequence (?O1, ?O2, . . . , ?On).
When concept-n is a set or interval such as “[50, 60],” we use “is in” instead of “instance of.”
A more complex concept is defined as a conjunctive expression “BRU ∧ not BRU1 ∧ . . . ∧
not BRUp,” where “BRU” and each “BRUk (k = 1, . . . , p)” is a conjunction of clauses. This is
illustrated by the following example, which represents the set of instances of the tuple
(?O1, ?O2, ?O3), where ?O1 is a professor employed by a university ?O2 in a long-term
position ?O3, such that it is not true that ?O1 plans to retire from ?O2 or to move to some
other organization:
Not
?O1 instance of professor
plans to retire from ?O2
?O2 instance of university
Not
?O1 instance of professor
plans to move to ?O4
?O4 instance of organization
C1 = v1 instance of b1
f11 v11
...
f1m v1m
C2 = v2 instance of b2
f21 v21
...
f2n v2n
We say that the clause C1 is more general than the clause C2 if there exists a substitution
σ such that:
σv1 = v2
b1 = b2
∀i ∈ {1, . . . , m}, ∃j ∈ {1, . . . , n} such that f1i = f2j and σv1i = v2j.
C1 = ?X instance of student
enrolled at George Mason University
C2 = ?Y instance of student
enrolled at George Mason University
has as sex female
Indeed, let σ be the substitution that replaces ?X with ?Y. As one can see, σC1 is a part of C2, that is, each feature of
σC1 is also a feature of C2. The first concept represents the set of all students enrolled at
George Mason University, while the second one represents the set of all female students
enrolled at George Mason University. Obviously the first set includes the second one, and
therefore the first concept is more general than the second one.
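The substitution-based generality test for clauses can be sketched in Python as follows (the clause encoding is an assumption; C1 and C2 are the two student clauses above):

```python
# A sketch (clause encoding assumed) of the substitution-based generality test:
# C1 is more general than C2 if some substitution sigma maps C1's variable onto
# C2's and every feature-value pair of sigma(C1) occurs in C2.
C1 = {"var": "?X", "concept": "student",
      "features": [("enrolled at", "George Mason University")]}

C2 = {"var": "?Y", "concept": "student",
      "features": [("enrolled at", "George Mason University"),
                   ("has as sex", "female")]}

def more_general_clause(c1, c2):
    if c1["concept"] != c2["concept"]:
        return False
    sigma = {c1["var"]: c2["var"]}              # sigma maps ?X to ?Y
    substituted = [(f, sigma.get(v, v)) for f, v in c1["features"]]
    return all(pair in c2["features"] for pair in substituted)

print(more_general_clause(C1, C2))   # True
print(more_general_clause(C2, C1))   # False: "has as sex female" is not in C1
```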
Let us notice, however, that this definition of generalization does not take into account
the theorems and properties of the representation language ℒ. In general, one needs to
use these theorems and properties to transform the clauses C1 and C2 into equivalent
clauses C′1 and C′2, respectively, by making explicit all the properties of these clauses.
Then one shows that C′1 is more general than C′2. Therefore, the definition of the more
general than relation in ℒ is the following one:
A clause C1 is more general than another clause C2 if and only if there exist C′1, C′2,
and a substitution σ, such that:
C′1 =ℒ C1
C′2 =ℒ C2
C′1 is more general than C′2 according to the preceding substitution-based definition (in particular, σv′1 =ℒ v′2).
In the following sections, we will always assume that the equality is in ℒ and we will no
longer indicate this.
A = A1 ∧ A2 ∧ . . . ∧ An
B = B1 ∧ B2 ∧ . . . ∧ Bm
A is more general than B if and only if there exist A′, B′, and σ such that:
A′ = A, A′ = A′1 ∧ A′2 ∧ . . . ∧ A′p
B′ = B, B′ = B′1 ∧ B′2 ∧ . . . ∧ B′q
Otherwise stated, one transforms the concepts A and B, using the theorems and the
properties of the representation language, so as to make each clause from A’ more general
than a corresponding clause from B’. Notice that some clauses from B’ may be “left over,”
that is, they are not matched by any clause of A’.
A is more general than B if and only if there exist A′, B′, and σ, such that:
∀i ∈ {1, . . . , p}, ∃j ∈ {1, . . . , q} such that BRU′bj is more general than σBRU′ai.
[Figure 8.19. Determining concepts that satisfy given “subconcept of” relationships (among concepts such as P1, P2, and P5).]
the bodies is not relevant because they can move inside the cell. For example,
((1 green) (2 yellow)) is the same as ((2 yellow) (1 green)) and represents a cell
where one body has one nucleus and is green, while the other body has two nuclei
and is yellow. You should also assume that any generalization of a cell is also
described as a pair of pairs ((s t) (u v)).
(a) Indicate all the possible generalizations of the cell from Figure 8.20 and the
generalization relations between them.
(b) Determine the number of the distinct sets of instances and the number of the
concept descriptions from this problem.
(c) Consider the cell descriptions from Figure 8.21 and determine the following
minimal generalizations: g(E1, E2), g(E2, E3), g(E3, E1), g(E1, E2, E3).
8.14. Consider the ontology fragment from the loudspeaker manufacturing domain,
shown in Figure 5.23 (p. 171), and the following expressions:
8.15. Consider the ontology fragment from the loudspeaker manufacturing domain,
shown in Figure 5.24 (p. 172). Notice that each most specific concept, such as dust
or air press, has an instance, such as dust1 or air press1.
Consider also the following two expressions:
E1: ((1 green) (1 green)) E2: ((1 yellow) (2 green)) E3: ((1 green) (2 green))
Figure 8.21. The descriptions of three cells.
Use the generalization rules to show that E1 is more general than E2.
8.16. Determine a generalization of the following two expressions in the context of the
ontology fragment from Figure 5.24 (p. 172):
E: ?O instance of object
color yellow
shape circle
radius 5
Indicate five different generalization rules. For each such rule, determine an
expression Eg that is more general than E according to that rule.
[Figure (fragment): ontology fragments including the concepts polygon, round, warm color, and cold color.]
8.19. Consider the following two concepts G1 and G2, and the ontology fragments in
Figure 8.23. Indicate four specializations of G1 and G2 (including a minimal
specialization).
8.20. Illustrate the clause generalization defined in Section 8.7.3 with an example from
the PhD Advisor Assessment domain.
8.21. Illustrate the BRU generalization defined in Section 8.7.4 with an example from the
PhD Advisor Assessment domain.
8.22. Illustrate the generalization of concepts with negations defined in Section 8.7.5 by
using an example from the PhD Advisor Assessment domain.
8.23. Use the definition of generalization based on substitution to prove that each of the
generalization rules discussed in Section 8.3 transforms a concept into a more
general concept.
9 Rule Learning
In this and the next chapter on rule refinement, we will refer to both problems and
hypotheses, interchangeably, to emphasize the fact that the learning methods presented
are equally applicable in the context of hypothesis analysis and problem solving.
Figure 9.1 summarizes the interactions between the subject matter expert and the learning
agent that involve modeling, learning, and problem solving.
The expert formulates the problem to be solved (or the hypothesis to be analyzed), and
the agent uses its knowledge to generate a (problem-solving or argumentation) tree to be
verified by the expert.
Several cases are possible. If the problem is not completely solved, the expert will extend
the tree with additional reductions and provide solutions for the leaf problems/hypotheses.
[Figure 9.1. Modeling, learning, and problem solving: the expert formulates a problem P1; through mixed-initiative problem solving, the agent generates a reasoning tree in which problems are reduced to subproblems and solutions, guided by question/answer pairs; the expert extends the tree or accepts reasoning steps; the agent relies on its ontology and rules and produces learned rules.]
From each new reduction provided by the expert, the agent will learn a new rule, as will be
presented in the following sections.
If the expert rejects any of the reasoning steps generated by the agent, then an explan-
ation of why that reduction is wrong needs to be determined, and the rule that generated it
will be refined to no longer generate the wrong reasoning step.
If the expert accepts a reasoning step as correct, then the rule that generated it may be
generalized. The following section illustrates these interactions.
As will be discussed in the following, the subject matter expert helps the agent to learn
by providing examples and explanations, and the agent helps the expert to teach it by
presenting attempted solutions.
First, as illustrated in Figure 9.2, the expert formulates the problem to solve or the
hypothesis to analyze which, in this illustration, is the following hypothesis:
In this case, we will assume that the agent does not know how to assess this hypothesis.
Therefore, the expert has to teach the agent how to assess it. The expert will start by
developing a reduction tree, as discussed in Chapter 4 and illustrated in the middle of
Figure 9.2. The initial hypothesis is first reduced to three simpler hypotheses, guided
by a question/answer pair. Then each of the subhypotheses is further reduced, either
to a solution/assessment or to an elementary hypothesis to be assessed based on
evidence. For example, the bottom part of Figure 9.2 shows the reduction of the first
subhypothesis to an assessment.
After the reasoning tree has been developed, the subject matter expert interacts with
the agent, helping it “understand” why each reduction step is correct, as will be discussed
in Section 9.5. As a result, from each reduction step the agent learns a plausible version
[Figure 9.2 (fragment): 1. Modeling (the expert explains how to solve a specific problem); 2. Learning (the agent learns general reduction rules, Rule1 and Rule2).]
space rule, as a justified generalization of it. This is illustrated in the right-hand side of
Figure 9.2 and discussed in Section 9.7. These rules are not shown to the expert, but they
may be viewed with the Rule Browser.
The agent can now use the learned rules to assess by itself similar hypotheses formu-
lated by the expert, as illustrated in Figure 9.3, where the expert formulated the following
hypothesis:
The reduction tree shown in Figure 9.3 was generated by the agent. Notice how the agent
concluded that Bob Sharp is interested in an area of expertise of Dan Smith, which is
Information Security, by applying the rule learned from John Doe and Bob Sharp, who share
a common interest in Artificial Intelligence.
The expert has to inspect each reduction generated by the agent and indicate whether it
is correct or not. Because the reductions from Figure 9.3 are correct, the agent generalizes
the lower bound conditions of the applied rules, if the reductions were generated based on
the upper bound conditions of these rules.
The bottom part of Figure 9.4 shows a reduction generated by the agent that is rejected
by the expert. While Dan Smith has indeed a tenured position, which is a long-term faculty
position, he plans to retire. It is therefore wrong to conclude that it is almost certain that he
will stay on the faculty of George Mason University for the duration of the dissertation of
Bob Sharp.
Such failure explanations are either proposed by the agent and accepted by the expert,
or are provided by the expert, as discussed in Section 9.5.2.
Based on this failure explanation, the agent specializes the rule that generated this
reduction by adding an Except-When plausible version space condition, as illustrated in
the right-hand side of Figure 9.4. From now on, the agent will check not only that the
faculty member has a long-term position (the main condition of the rule), but also that he
or she does not plan to retire (the Except-When condition). The refined rule is not shown
to the expert, but it may be viewed with the Rule Browser.
[Figure 9.3 (fragment): 3. Solving; 5. Refinement (the expert accepts the reasoning).]
[Figure 9.4 (fragment): 1. Solving; 2. Critiquing (incorrect because Dan Smith plans to retire); 3. Refinement (the agent refines the rule with the negative example).]
[Figure 9.5 (fragment): 1. Solving; 2. Critiquing (incorrect because Jane Austin plans to move); 3. Refinement.]
Figure 9.5 shows another reasoning tree generated by the agent for an expert-
formulated hypothesis. Again the expert rejects one of the reasoning steps: Although
Jane Austin has a tenured position and does not plan to retire, she plans to move from
George Mason University and will not stay on the faculty for the duration of the dissertation
of Bob Sharp.
Based on this failure explanation, the agent specializes the rule that generated the
reduction by adding an additional Except-When plausible version space condition, as
shown in the right-hand side of Figure 9.5. From now on, the agent will check not only that
the faculty member has a long-term position, but also that he or she does not plan to retire
or move from the university.
The refined rule is shown in Figure 9.6. Notice that this is a quite complex rule that was
learned based only on one positive example, two negative examples, and their explan-
ations. The rule may be further refined based on additional examples.
The following sections describe in more detail the rule-learning and refinement pro-
cesses. Before that, however, let us notice a significant difference between the develop-
ment of a knowledge-based learning agent and the development of a (nonlearning)
knowledge-based agent. As discussed in Sections 1.6.3.1 and 3.1, after the knowledge base
of the (nonlearning) agent is developed by the knowledge engineer, the agent is tested
with various problems. The expert has to analyze the solutions generated by the agent, and
the knowledge engineer has to modify the rules manually to eliminate any identified
problems, testing the modified rules again.
In the case of a learning agent, both rule learning and rule refinement take place as part
of agent teaching. Testing of the agent is included into this process. This process will also
continue as part of knowledge base maintenance. If we would like to extend the agent
to solve new problems, we simply need to teach it more. Thus, in the case of a learning
agent, such as Disciple-EBR, there is no longer a distinction between knowledge base
development and knowledge base maintenance. This is very important because it is well
known that knowledge base maintenance (and system maintenance, in general) is much
more challenging and time consuming than knowledge base (system) development.
Thus knowledge base development and maintenance are less complex and much faster
in the case of a learning agent.
The rule-learning problem is defined in Table 9.1 and is illustrated in Figures 9.7 and 9.8.
The agent receives an example of a problem or hypothesis reduction and learns a plausible
version space rule that is an analogy-based generalization of the example. There is no
restriction with respect to what the example actually represents. However, it has to be
described as a problem or hypothesis that is reduced to one or several subproblems,
elementary hypotheses, or solutions. Therefore, this example may also be referred to as a
problem-solving episode. For instance, the example shown in the top part of Figure 9.8
reduces a specific hypothesis to its assessment or solution, guided by a question and
its answer.
The expert who is training the agent will interact with it to help it understand why the
example is a correct reduction. The understanding is done in the context of the agent’s
ontology, a fragment of which is shown in Figure 9.7.
The result of the rule-learning process is a general plausible version space rule that will
allow the agent to solve problems by analogy with the example from which the rule was
learned. The plausible version space rule learned from the example at the top of Figure 9.8
is shown at the bottom part of the figure. It is an IF-THEN structure that specifies the
conditions under which the problem from the IF part has the solution from the THEN part.
The rule is only partially learned because, instead of a single applicability condition, it has
two conditions:
GIVEN
A knowledge base that includes an ontology and a set of (previously learned) rules
An example of a problem reduction expressed with the concepts and instances from the agent’s
knowledge base
An expert who will interact with the agent to help it understand why the example is correct
DETERMINE
A plausible version space rule, where the upper bound is a maximal generalization of the
example, and the lower bound is a minimal generalization that does not contain any specific
instance
An extended ontology, if any extension is needed for the understanding of the example
A plausible upper bound condition that is a maximal generalization of the instances and
constants from the example (e.g., Bob Sharp, certain), in the context of the agent’s ontology
A plausible lower bound condition that is a minimal generalization that does not
contain any specific instance
The relationships among the variables ?O1, ?O2, and ?O3 are the same for both conditions
and are therefore shown only once in Figure 9.8, under the conditions.
Completely learning the rule means learning an exact condition, where the plausible
upper bound is identical with the plausible lower bound.
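A minimal sketch (illustrative names and a toy ontology, not Disciple's actual data structures) of a plausible version space condition with an upper and a lower bound, and of testing whether a set of variable bindings satisfies each bound:

```python
# A sketch of a plausible version space condition over concept constraints.
# ANCESTORS gives, for each instance, the concepts that cover it (toy data).
ANCESTORS = {
    "Bob Sharp": {"Bob Sharp", "PhD student", "person"},
    "John Doe": {"John Doe", "PhD advisor", "associate professor", "person"},
    "Jane Austin": {"Jane Austin", "professor", "person"},
}

rule_condition = {
    "upper": {"?O1": "person", "?O2": "person"},            # maximal generalization
    "lower": {"?O1": "PhD student", "?O2": "PhD advisor"},   # minimal generalization
}

def matches(bound, bindings):
    """Do the bound's concept constraints cover the proposed variable bindings?"""
    return all(bound[var] in ANCESTORS[value] for var, value in bindings.items())

bindings = {"?O1": "Bob Sharp", "?O2": "Jane Austin"}
print(matches(rule_condition["upper"], bindings))  # True: covered by the upper bound
print(matches(rule_condition["lower"], bindings))  # False: not covered by the lower bound
# A reduction generated only from the upper bound is plausible but may be rejected
# by the expert; the rule is fully learned when the two bounds become identical.
```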
During rule learning, the agent might also extend the ontology with new features or
concepts, if they are needed for understanding the example.
An overview of the rule-learning method is presented in Figure 9.9 and in Table 9.2. As in
explanation-based learning (DeJong and Mooney, 1986; Mitchell et al., 1986), it consists of
two phases: an explanation phase and a generalization phase. However, in the explanation
phase the agent does not automatically build a deductive proof tree but an explanation
structure through mixed-initiative understanding. Also, the generalization is not a deduct-
ive one, but an analogy-based one.
In the following, we will describe this learning method in more detail and illustrate it.
First we will present the mixed-initiative process of explanation generation and example
[Figure 9.9. Overview of rule learning: an example of a reduction step is first processed through mixed-initiative understanding (using natural language processing and analogy), producing an explanation; the explanation is then generalized through analogy-based generalization into a plausible version space rule; both phases rely on the knowledge base.]
understanding, which is part of the first phase. Then we will present and justify the
generalization method, which is based on analogical reasoning.
Because the agent’s ontology is incomplete, sometimes the explanation includes only
an approximate representation of the meaning of the question/answer (natural language)
sentences.
The next section presents the explanation generation method, which is based on the
following two observations:
It is easier for an expert to understand sentences in the formal language of the agent
than it is to produce such formal sentences
It is easier for the agent to generate formal sentences than it is to understand sentences
in the natural language of the expert
In essence, the agent will use basic natural language processing, various heuristics,
analogical reasoning, and help from the expert in order to identify and propose a set of
plausible explanation pieces, ordered by their plausibility of being correct explanations.
Then the expert will select the correct ones from the generated list.
The left-hand side of Figure 9.11 shows an example to be understood, and the upper-
right part of Figure 9.11 shows all the instances and constants from the example. The agent
will look for plausible explanation pieces of the types from Table 9.3, involving those
instances and constants. The most plausible explanation pieces identified, in plausibility
order, are shown in the bottom-right of Figure 9.11. Notice that the two most plausible
explanation pieces from Figure 9.11 are the correct explanation pieces shown in Figure 9.10.
The expert will have to select each of them and click on the Accept button. As a result, the
agent will move them in the Explanations pane from the left side of Figure 9.11.
Notice in the upper-right of Figure 9.11 that all the objects and constants from the example
are selected. Consequently, the agent generates the most plausible explanation pieces
related to all these objects and displays those with the highest plausibility. The expert may
click on the See More button, asking the agent to display the next set of plausible explanations.
The expert may also deselect some of the objects and constants, asking the agent to
generate only plausible explanations involving the selected elements. For example,
Figure 9.12 illustrates a situation where only the constant “certain” is selected. As a result,
the agent generated only the explanation “The value is specifically certain,” which means
that this value should be kept as such (i.e., not generalized) in the learned rule.
The expert may also provide a new explanation, even using new instances, concepts, or
features. In such a case, the expert should first define the new elements in the ontology.
After that, the expert may guide the agent to generate the desired explanations.
If the example contains any generic instance, such as “Artificial Intelligence,” the agent
will automatically select the explanation piece “Artificial Intelligence is Artificial Intelligence”
(see Explanation pane on the left side of Figure 9.11), meaning that this instance will
appear as such in the learned rule. If the expert wants Artificial Intelligence to be general-
ized, he or she should simply remove that explanation by clicking on it and on the Remove
button at its right.
The expert may also define explanations involving functions and comparisons, as will
be discussed in Sections 9.12.3 and 9.12.4.
Notice, however, that the explanation of the example may still be incomplete for at least
three reasons:
The ontology of the agent may be incomplete, and therefore the agent may not be able
to propose all the explanation pieces of the example simply because they are not
present in the ontology
The agent shows the plausible explanation pieces incrementally, as guided by the
expert, and if one of the actual explanation pieces is not among the first ones shown, it
may not be seen and selected by the expert
It is often the case that the human expert forgets to provide explanations that corres-
pond to common-sense knowledge that also is not represented in the question/
answer pair
The incompleteness of the explanation is not, however, a significant problem because the
explanation may be further extended during the rule refinement process, as discussed
in Chapter 10.
To conclude, Table 9.4 summarizes the mixed-initiative explanation generation method.
Once the expert is satisfied with the identified explanation pieces, the agent will generate the
rule, as discussed in the following sections.
As indicated in Table 9.2 (p. 260), once the explanation of the example is found, the agent
generates a very specific IF-THEN rule with an applicability condition that covers only that
example. The top part of Figure 9.13 shows an example, and the bottom part shows the
generated specific rule that covers only that example. Notice that each instance (e.g., Bob
Sharp) and each constant (e.g., certain) is replaced with a variable (i.e., ?O1, ?SI1).
However, the applicability condition restricts the possible values of these variables to
those from the example (e.g., “?O1 is Bob Sharp”). The applicability condition also includes
the properties and the relationships from the explanation. Therefore, the rule from the
bottom of Figure 9.13 will cover only the example from the top of Figure 9.13. This rule will
be further generalized to the rule from Figure 9.8, which has a plausible upper bound
condition and a plausible lower bound condition, as discussed in the next section. In
particular, the plausible upper bound condition will be obtained as the maximal general-
ization of the specific condition in the context of the agent’s ontology. Similarly, the
plausible lower bound condition will be obtained as the minimal generalization of the
specific condition that does not contain any specific instance.
Let E be an example.
Repeat
The expert focuses the agent’s attention by selecting some of the instances and constants from
the example.
The agent proposes what it determines to be the most plausible explanation pieces related to
the selected entities, ordered by their plausibility.
The expert chooses the relevant explanation pieces.
The expert may ask for the generation of additional explanation pieces related to the selected
instances and constants, may select different ones, or may directly specify explanation pieces.
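The proposal step of this method can be sketched as follows (the relation encoding and the plausibility heuristic are assumptions; the real agent uses richer heuristics, natural language processing, and analogy):

```python
# A sketch of proposing plausible explanation pieces: enumerate ontology
# relations that touch the instances selected by the expert, then rank them with
# a simple (assumed) plausibility heuristic. Names and scores are illustrative.
ONTOLOGY_RELATIONS = [
    ("Bob Sharp", "is interested in", "Artificial Intelligence"),
    ("John Doe", "is expert in", "Artificial Intelligence"),
    ("John Doe", "is employed by", "George Mason University"),
]

def propose_explanations(selected, relations, mentioned_in_question=()):
    """Return relations touching the selected entities, most plausible first."""
    candidates = [r for r in relations if r[0] in selected or r[2] in selected]
    # Heuristic: relations whose arguments also appear in the question/answer
    # text are considered more plausible.
    return sorted(candidates,
                  key=lambda r: (r[0] in mentioned_in_question)
                              + (r[2] in mentioned_in_question),
                  reverse=True)

selected = {"Bob Sharp", "John Doe"}
for piece in propose_explanations(selected, ONTOLOGY_RELATIONS,
                                  mentioned_in_question={"Artificial Intelligence"}):
    print(piece)
```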
[Figure (fragment): Analogical reasoning. The explanation explains the initial example (I need to: “Bob Sharp is interested in an area of expertise of John Doe.” Therefore I conclude that: “It is certain that Bob Sharp is interested in an area of expertise of John Doe.”), and the question is whether it also explains the similar example involving Peter Jones and Dan Smith (I need to: “Peter Jones is interested in an area of expertise of Dan Smith.” Therefore I conclude that: “It is certain that Peter Jones is interested in an area of expertise of Dan Smith.”).]
Two expressions are analogous when they are both less general than a given expression that
represents the analogy criterion. Consequently, the preceding question may be rephrased as:
Given the explanation EX of an example E, which generalization of EX should be
considered an analogy criterion, enabling the agent to generate reductions that are
analogous to E?
There are two interesting answers to this question, one given by a cautious learner, and
the other given by an aggressive learner, as discussed in the next sections.
Figure 9.16. Maximal generalization of the specific applicability condition.
Specific condition:
  ?O1  is Bob Sharp
       is interested in ?O3
  ?O2  is John Doe
       is expert in ?O3
  ?O3  is Artificial Intelligence
  ?SI1 is exactly certain
Most general generalization (plausible upper bound):
  ?O1  is person
       is interested in ?O3
  ?O2  is person
       is expert in ?O3
  ?O3  is area of expertise
  ?SI1 is in [certain – certain]
(The ontology fragment in the figure defines: is interested in, with domain person and range area of expertise; is expert in, with domain person and range area of expertise.)
Now consider John Doe. Its most general generalization is object ∩ domain(is expert in) =
object ∩ person = person.
Consider now Artificial Intelligence. It appears as a value of the features “is interested in”
and “is expert in.” Therefore, its maximal generalization is: object ∩ range(is interested in) ∩
range(is expert in) = object ∩ area of expertise ∩ area of expertise = area of expertise.
On the other hand, the maximal generalization of “certain” is the interval with a single
value “[certain – certain]” because ?SI1 is restricted to this value by the feature “is exactly.”
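The maximal-generalization computation illustrated above (intersecting object with the domains and ranges of the features in which an instance participates) can be sketched as follows, over toy concept extensions that are assumptions for illustration only:

```python
# A sketch (toy concept extensions assumed) of the maximal-generalization
# computation: intersect "object" with the domains/ranges of the features in
# which the instance participates.
EXTENSION = {   # each concept mapped to the set of instances it covers (toy data)
    "object": {"John Doe", "Bob Sharp", "Artificial Intelligence", "Computer Science"},
    "person": {"John Doe", "Bob Sharp"},
    "area of expertise": {"Artificial Intelligence", "Computer Science"},
}

FEATURES = {
    "is expert in":     {"domain": "person", "range": "area of expertise"},
    "is interested in": {"domain": "person", "range": "area of expertise"},
}

def maximal_generalization(constraining_concepts):
    """Intersect the extensions of all constraining concepts, starting from object."""
    result = set(EXTENSION["object"])
    for concept in constraining_concepts:
        result &= EXTENSION[concept]
    # Return the named concept whose extension equals the intersection, if any.
    for name, ext in EXTENSION.items():
        if ext == result:
            return name
    return result

# John Doe appears as the subject of "is expert in":
print(maximal_generalization([FEATURES["is expert in"]["domain"]]))       # person
# Artificial Intelligence appears as a value of both features:
print(maximal_generalization([FEATURES["is interested in"]["range"],
                              FEATURES["is expert in"]["range"]]))        # area of expertise
```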
Let us consider again the example from the top part of Figure 9.13, but now let us
assume that “The value is specifically certain” was not identified as an explanation piece.
That is, the explanation of the example consists only of the following pieces:
In this case, the generated specific condition is the one from the bottom of Figure 9.17, and
its maximal generalization is the one from the top of Figure 9.17. Notice that the maximal
generalization of “certain” is the entire interval [no support – certain] because there is no
restriction on the possible values of ?SI1.
Figure 9.17. Maximal generalization of a symbolic probability value when no explanation is identified.
[Figure (fragment): Analogy criterion involving PhD student (?O1), PhD advisor / associate professor (?O2), and Artificial Intelligence (?O3), with ?O1 is interested in ?O3 and ?O2 is expert in ?O3; the probability of the solution is always certain.]
Similarly, the specific instance John Doe is minimally generalized to PhD advisor or
associate professor, because these are the minimal generalizations of John Doe and neither
is more specific than the other. Additionally, both these concepts are subconcepts of
person, the domain of is expert in.
Because Artificial Intelligence is a generic instance, it can appear in the learned rule (as
opposed to the specific instances Bob Sharp and John Doe). Therefore, its minimal general-
ization is Artificial Intelligence itself. Similarly, the constants (such as certain) can also appear
in the learned rule, and they are kept as such in the minimal generalization.
Notice that if you want an instance to appear in the condition of a learned rule, it needs
to be defined as a generic instance. Specific instances are always generalized to concepts
and will never appear in the condition.
The partially learned rule is shown in the bottom part of Figure 9.8 (p. 259). Notice that the
features are listed only once under the bounds because they are the same for both bounds.
The generated rule is analyzed to determine whether there are any variables in the
THEN part that are not linked to some variable from the IF part. If such an unlinked
variable exists, then it can be instantiated to any value, leading to solutions that make no
sense. Therefore, the agent will interact with the expert to find an additional explanation
that will create the missing link and update the rule accordingly.
The generated rule is also analyzed to determine whether it covers too many instances in
the knowledge base, which is also an indication that its explanation is incomplete and
needs to be extended.
The rule learned from an example and its explanation depends on the ontology of the
agent at the time the rule was generated. If the ontology changes, the rule may need to be
updated, as will be discussed in Chapter 10. For example, the minimal generalization of a
specific instance will change if a new concept is inserted between that instance and the
concept above it. To enable the agent to update its rules automatically when relevant
changes occur in the ontology, minimal generalizations of the examples and their explan-
ations are associated with the learned rules.
Why is the agent maintaining minimal generalizations of examples instead of the
examples themselves? Because the examples exist only in Scenario KBs, where the specific
instances are defined, while the rules are maintained in the Domain KB. If a scenario is no
longer available, the corresponding examples are no longer defined. However, generalized
examples (which do not contain specific instances) will always be defined in the Domain
KB. Thus the generalized examples represent a way to maintain a history of how a rule was
learned, independent of the scenarios. They are also a compact way of preserving this
history because one generalized example may correspond to many actual examples.
Figure 9.20 shows the minimal generalization of the example and its explanation from
which the rule in Figure 9.8 (p. 259) was learned.
One should notice that the minimal generalization of the example shown at the top part
of Figure 9.20 is not the same as the plausible lower bound condition of the learned rule
from Figure 9.8. Consider the specific instance John Doe from the example.
In the ontology, John Doe is both a direct instance of PhD advisor and of associate
professor (see Figure 9.16). In the lower bound condition of the rule, John Doe is general-
ized to PhD advisor or associate professor, indicated as (PhD advisor, associate professor),
because each of these two concepts is a minimal generalization of John Doe in the
ontology. Thus the agent maintains the two concepts as part of the lower bound of the
rule’s version space: one corresponding to PhD advisor, and the other corresponding to
associate professor. During further learning, the agent will choose one of these generalizations
or a more general generalization that covers both of them.

Figure 9.20 (content). The example and its explanation (bottom) and its minimal generalization (top):

Generalized Example
?O1  is PhD student
     is interested in ?O3
?O2  is PhD advisor
     is associate professor
     is expert in ?O3
?O3  is Artificial Intelligence
?SI1 is in [certain – certain]
Covered positive examples: 1
Covered negative examples: 0

Example and its explanation
?O1  is Bob Sharp
     is interested in ?O3
?O2  is John Doe
     is expert in ?O3
?O3  is Artificial Intelligence
?SI1 is exactly certain
In the minimal generalization of the example, John Doe is generalized to PhD advisor
and associate professor because this is the best representation of the minimal generaliza-
tion of the example that can be used to regenerate the rule, when changes are made to the
ontology. This minimal generalization is expressed as follows:
?O2 is PhD advisor
is associate professor
Initially, the generalized example shown at the top of Figure 9.20 covers only one
specific example. However, when a new (positive or negative) example is used to refine
the rule, the agent checks whether it is already covered by an existing generalized example
and records this information. Because a generalized example may cover any number of
specific positive and negative examples, its description also includes the number of
specific examples covered, as shown in the top part of Figure 9.20.
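The bookkeeping described in this paragraph can be sketched as follows; the class, the attribute names, and the covers test passed as an argument are assumptions of this sketch, not the Disciple-EBR representation.

class GeneralizedExample:
    def __init__(self, bindings):
        self.bindings = bindings            # e.g., {"?O1": "PhD student", ...}
        self.covered_positive = 0
        self.covered_negative = 0

def record_coverage(generalized_examples, new_example, is_positive, covers):
    """Update the coverage counts of the generalized example that covers the new
    (positive or negative) example used to refine the rule, if there is one."""
    for g in generalized_examples:
        if covers(g, new_example):
            if is_positive:
                g.covered_positive += 1
            else:
                g.covered_negative += 1
            return g
    return None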
Cases of using generalized examples to regenerate previously learned rules are pre-
sented in Section 10.2.
9.10. Hypothesis Learning

In addition to learning a general reduction rule from a specific reduction example, Disciple-
EBR also learns general hypotheses (or problems). The left-hand side of Figure 9.21 shows
the specific hypothesis reduction (the modeling step), and the right-hand side shows the
reduction rule and the hypotheses learned from it.
Figure 9.21. A reduction rule and four hypotheses learned from a specific hypothesis reduction.
The hypothesis learning method, shown in Table 9.5, is very similar to the rule-learning
method.
Figure 9.22 illustrates the automatic learning of a general hypothesis from the specific
hypothesis, “John Doe would be a good PhD advisor for Bob Sharp,” when no explanation is
provided.
The specific instances, John Doe and Bob Sharp, are replaced with the variables ?O1 and
?O2, respectively, as in the reduction rule.
The lower bounds of these variables are obtained as the minimal generalizations of
John Doe and Bob Sharp, according to the agent’s ontology from the left-hand side of
Figure 9.22, because both of them are specific instances. Notice that there are two minimal
generalizations of John Doe: PhD advisor and associate professor. The minimal generaliza-
tion of Bob Sharp is PhD student.
The upper bounds are obtained as the maximum generalizations of John Doe and
Bob Sharp, according to the agent’s ontology from the left-hand side of Figure 9.22. They
are both object.
During the explanation generation process, the user may wish to restrict the general-
ization of the hypothesis by providing additional explanations.
Most specific generalization:
?O1 is (PhD advisor, associate professor)
?O2 is PhD student

Most general generalization:
?O1 is object
?O2 is object
In this case, the lower bound condition of the learned hypothesis remains the same, but
the upper bound condition is restricted accordingly.
A hypothesis may be learned in several ways, including by using the Learn
Hypothesis Pattern or the Learn Tree Patterns commands that were introduced in
Section 4.10. Finally, it may be learned by specifically invoking hypothesis (problem)
learning when working with the Mixed-Initiative Reasoner. But it is only this last situation
that also allows the definitions of explanations, as will be discussed later in this section. In
all the other situations, a hypothesis is automatically learned, with no explanations, as was
illustrated in Figure 9.22.
The overall user–agent interactions during the hypothesis explanation process are
illustrated in Figure 9.23 and described in Operation 9.1. Here it is assumed that the
reasoning tree was already formalized and thus a hypothesis pattern was already learned.
If it was not learned, the pattern will be automatically learned before the explanations are
identified.
9.11. Hands On: Rule and Hypotheses Learning

This case study will guide you in using Disciple-EBR to learn rules and hypotheses from
examples.
The overall user–agent interactions during the rule- (and hypotheses-) learning process
are illustrated in Figure 9.24 and described in Operation 9.2. It is assumed that the
reasoning tree is formalized.
Figure 9.23. Overview of the user–agent interactions during the hypothesis explanation process. The numbered callouts indicate the steps: (1) work in the “Reasoning Hierarchy” pane to select the hypothesis; (2) click on a hypothesis to select it; (3) click on “Modify Explanations”; (4) work in the “Reasoning Step” pane to accept explanation pieces; (5) select a relevant explanation piece; (6) click on “Accept.”
Figure 9.24. Overview of the user–agent interactions during rule (and hypotheses) learning. The numbered callouts indicate the steps: (1) work in the “Reasoning Hierarchy” pane to select the reasoning step; (2) click on the question/answer to select the reasoning step; (3) click on “Learn Condition”; (4) work in the “Reasoning Step” pane to accept explanation pieces; (5) if necessary, select and remove any automatically selected explanation piece; (6) select a relevant explanation piece; (7) click on “Accept.”
If the reasoning tree is not formalized, one can easily formalize it in the Evidence workspace
before invoking rule learning by simply right-clicking on the top node and selecting
Learn Tree.
You are now ready to perform a rule-learning case study. There are two of them, a shorter
one and a longer one. In the shorter case study, you will guide the agent to learn the rule
from Figure 9.8, as discussed in the previous sections. In the longer case study, you will
guide the agent to learn several rules, including the rule from Figure 9.8.
Start Disciple-EBR, select one of the case study knowledge bases (either “11-Rule-Learning-
short/Scen” or “11-Rule-Learning/Scen”), and proceed as indicated in the instructions at the
bottom of the opened window.
A learned rule can be displayed as indicated in the following operation.
At the top of the Rule Viewer, notice the name of the rule (e.g., DDR.00018). You may
also display or delete this rule with the Rule Browser, as described in Operations 10.4
and 10.5.
Click on the X button of the Rule Viewer to close it.
Figure 9.25 (callouts): (1) select an object; (2) click on “Search.”
Select one or several entities from the “Elements to search for” pane, such as certain in
Figure 9.25. You may also need to deselect some entities by clicking on them.
Click on the Search button, asking the agent to generate explanation pieces related to
the selected entities.
Select an explanation piece and click on the Accept button.
Click on the See More button to see more of the generated explanation pieces.
Repeat the preceding steps until all the desired explanations are generated and selected.
Notice that these types of explanation pieces are generated when only constants or generic
instances are selected in the “Elements to search for” pane. For example, only “certain”
was selected in the “Elements to search for” pane in the upper-right part of Figure 9.25,
and therefore the potential explanation piece “The value is specifically certain” was
generated.
Notice also that explanations such as “Artificial Intelligence is Artificial Intelligence” can be
generated only for generic instances. They cannot be generated for specific instances
because these instances are always generalized in the learned rules.
Other explanation pieces relate the elements of the example through relationships from
the agent’s ontology; they are shown in the left-hand side of Figure 9.27.
Additionally, you have to teach the agent how the price is actually computed. You invoke
the Expression Editor by clicking on the Edit Expression button, which displays a pane to
define the expression (see the bottom right of Figure 9.27). Then you fill in the left side of
the equality with the price, and the right side with the expression that leads to this price, by
using the numbers from the example.
Figure 9.28. Learned rule with a learned function in the applicability condition.
Additionally, you have to indicate that 650.35 is greater than 519.75. You click on the
Create New… button, which opens a window allowing you to define a new explanation as
an object-feature-value triplet (see the bottom of Figure 9.29). In the left editor, you
start typing the amount of money Mike has (i.e., 650.35) and select it from the completion
pop-up. In the center editor, you type >=. Then, in the right editor, you start typing the
actual cost (519.75) and select it from the completion pop-up. Finally, you click on the OK
button in the Create explanation window to select this explanation: 650.35 >= 519.75.
In the center editor, type the comparison operator (<, <=, =, !=, >=, or >).
In the right editor, type the number corresponding to the right side of the comparison.
Click on the OK button in the Create explanation window to accept the explanation.
9.13. Guidelines for Rule and Hypothesis Learning

Guideline 9.1. Properly identify all the entities in the example before starting rule learning
Before starting rule learning, make sure that all the elements are properly recognized as
instances, numbers, symbolic intervals, or strings. This is important because only the
entities with one of these types will be replaced with variables as part of rule learning, as
shown in the top part of Figure 9.31. Recognizing concepts is also recommended, but it is
optional, since concepts are not generalized. However, recognizing them helps the agent
in explanation generation.
Notice the case from the middle part of Figure 9.31. Because “the United States”
appears as text (in black) and not as an instance (in blue), it will not be replaced with a
variable in the learned rule. A similar case is shown at the bottom of Figure 9.31. Because
600.0 is not recognized as a number (in green), it will appear as such in the learned rule,
instead of being generalized to a variable.
Guideline 9.2. Avoid learning from examples that are too specific
It is important to teach the agent with good examples from which it can learn general
rules. A poor example is illustrated in the upper-left part of Figure 9.32. In this case, the
amount of money that Mike has is the same as the price of the Apple iPad 16GB. As a result,
both occurrences of 519.75 are generalized to the same variable ?N1, and the agent will
learn a rule that will apply only to cases where the amount of money of the buyer is exactly
the same as the price of the product (see the upper-right part of Figure 9.32).
You need instead to teach the agent with an example where the numbers are different,
such as the one from the bottom-left part of Figure 9.32. In this case, the agent will
generalize the two numbers to two different variables. Notice that the learned rule will
also apply to cases where ?N1 = ?N2.
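The effect described in this guideline can be seen in the following sketch, which mimics how the distinct entities of an example are mapped to variables (the ?N naming is an assumption of the sketch).

def variabilize(numbers):
    """Each distinct entity receives one variable, so equal numbers collapse into the
    same variable and over-constrain the learned rule."""
    var_of, result = {}, []
    for n in numbers:
        if n not in var_of:
            var_of[n] = f"?N{len(var_of) + 1}"
        result.append(var_of[n])
    return result

print(variabilize([519.75, 519.75]))  # ['?N1', '?N1'] -- money and price forced to be equal
print(variabilize([620.25, 519.75]))  # ['?N1', '?N2'] -- the intended, more general rule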
Before starting rule learning, also review the modeling and check that you have
defined the features suggested by the Q/A pair in the ontology.
What are the other explanation pieces for this reduction step? You will need to define
two additional explanation pieces involving comparisons, as well as explanation pieces
fixing the values 41, 53, and very likely, as shown in the left-hand side of Figure 9.34. The
learned rule is shown in the right-hand side of Figure 9.34.
Figure 9.32 (content). Learning from an example that is too specific versus a good example:

Poor example: “Does Mike have enough money to buy Apple iPad 16GB? Yes, because Mike has 519.75 dollars and Apple iPad 16GB costs 519.75 dollars.” Learned pattern: “Does ?O1 have enough money to buy ?O2? Yes, because ?O1 has ?N1 dollars and ?O2 costs ?N1 dollars.”

Good example: “Does Bob have enough money to buy Apple iPad 16GB? Yes, because Bob has 620.25 dollars and Apple iPad 16GB costs 519.75 dollars.” Learned pattern: “Does ?O1 have enough money to buy ?O2? Yes, because ?O1 has ?N1 dollars and ?O2 costs ?N2 dollars.”
When you define a new feature, make sure you define both the domain and the range
rather than keeping the ones generated automatically. The automatically generated range (Any
Element) is too general, and features with that range are not even used in explanations.
Therefore, make sure that you select a more specific domain and range, such as a concept,
a number interval, or a symbolic interval.
If you have already defined facts involving that feature, you need to remove them first
before you can change the domain and the range of the feature.
Learn rules from the reasoning trees developed in the previous project assignments.

9.15. Review Questions
9.3. Consider the following expression, where both Jane Austin and Bob Sharp are
specific instances:
Find its minimal generalization that does not contain any instance, in the context
of the ontological knowledge from Figure 9.35. Find also its maximal generalization.
9.4. Consider the following explanation of a reduction:
9.6. Consider the ontological knowledge from Figure 9.37, where Dana Jones, Rutgers
University, and Indiana University are specific instances.
(a) What are the minimal generalization and the maximal generalization of the
following expression?
[Figure 9.35. Ontology fragment for question 9.3: object subsumes actor and research area; actor subsumes organization and person; person subsumes employee (faculty member, staff member) and student (graduate and undergraduate students, including graduate research assistant, graduate teaching assistant, and BS student); faculty member subsumes instructor, professor, and PhD advisor; John Smith, John Doe, Jane Austin, Bob Sharp, and Joan Dean are instances; the feature has as advisor has domain student and range faculty member. A further fragment shows George Mason University and Indiana University as instances of university, with professor and PhD advisor as subconcepts of faculty member.]
[Figure 9.37. Ontology fragment for question 9.6: professor and PhD advisor are subconcepts of faculty member; Dana Jones is an instance of both professor and PhD advisor; Rutgers University and Indiana University are instances of university.]
(b) What are the minimal generalization and the maximal generalization of the
following expression?
9.7. Consider the example problem reduction and its explanation from Figure 9.38.
Which is the specific rule condition covering only this example? What rule will be
learned from this example and its explanation, assuming the ontology fragment
from Figure 9.39? What general problem will be learned from the specific IF
problem of this reduction?
Notice that some of the instances are specific (e.g., Aum Shinrikyo and Masami
Tsuchiya), while others are generic (e.g., chemistry).
9.8. Consider the problem reduction example and its explanation from Figure 9.40.
Which is the specific rule covering only this example? What rule will be learned
from this example and its explanation, assuming the ontology fragment from
Figure 9.41? What general problems will be learned from this example? Assume
that all the instances are specific instances.
[Ontology feature definition: has as member — domain organization, range person.]
[Figure 9.41. Ontology fragment for question 9.8: object subsumes actor, source, and evidence; person subsumes author and terrorist, with Hamid Mir and Osama bin Laden as instances; evidence subsumes testimonial evidence and item of evidence, with subconcepts such as direct testimonial evidence, testimonial evidence obtained at second hand, testimonial evidence based on direct observation, and elementary and non-elementary pieces of evidence; EVD-Dawn-Mir-01-01 and EVD-Dawn-Mir-01-01c are instances; the features is testimony by (domain evidence, range source) and is testimony about (domain evidence, range evidence) are defined.]
9.9. Compare the rule-learning process with the traditional knowledge acquisition
approach, where a knowledge engineer defines such a rule by interacting with
a subject matter expert. Identify as many similarities and differences as possible,
and justify the relative strengths and weaknesses of the two approaches, but be as
concise as possible.
10 Rule Refinement
Regardless of the origin of the example, the goal of the agent is to refine the rule to be
consistent with the example. A possible effect of rule refinement is the extension of the
ontology.
The rule refinement problem is defined in Table 10.1 and an overview of the rule
refinement method is presented in the next section.
GIVEN
A plausible version space reduction rule
A positive or a negative example of the rule (i.e., a correct or an incorrect reduction)
A knowledge base that includes an ontology and a set of (previously learned) reduction rules
An expert who will interact with the agent, helping it understand why the example is positive
(correct) or negative (incorrect)
DETERMINE
An improved rule that covers the example if it is positive, and does not cover the example if it is
negative
An extended ontology, if this is needed for rule refinement
10.1. Incremental Rule Refinement

Figure 10.1. Multistrategy rule refinement: correct and incorrect examples of problem reductions generated by the agent (IF we have to solve <Problem>, THEN solve <Subproblem 1>, …, <Subproblem m>) are used, through learning from examples, learning from explanations, and learning by analogy and experimentation with the knowledge base, to refine the rule's main plausible version space (PVS) condition and its Except-When PVS conditions.
From a negative example and its failure explanation, an Except-When plausible
version space condition may be learned. This plausible version space Except-When condition is represented by the red
ellipses at the top of Figure 10.1.
The refined rule is shown in the right-hand side of Figure 10.1. The applicability
condition of a partially learned rule consists of a main applicability condition and zero,
one, or more Except-When conditions. The way the rule is refined based on a new
example depends on the type of the example (i.e., positive or negative), on its position
with respect to the current conditions of the rule, and on the type of the explanation of
the example (if identified). The refinement strategies will be discussed in more detail in
the next sections by considering a rule with a main condition and an Except-When
condition, as shown in Figure 10.2. We will consider all the possible nine positions of the
example with respect to the bounds of these conditions. Notice that the presented
methods will similarly apply when there is no Except-When condition or more than
one Except-When condition.
We will first illustrate rule refinement with a positive example and then we will present
the general method.
Figure 10.2. Partially learned condition and various positions of a new example: the plausible lower and upper bounds of the main condition (ML, MU) and of the Except-When condition (XL, XU) are drawn in the universe of instances, and the possible positions of a new example (numbered 1 through 9) are marked with respect to these bounds.
Positive example that satisfies the upper bound but not the lower bound
Figure 10.4. Minimal generalization of the rule’s plausible lower bound condition.
Let R be a plausible version space rule, U its main plausible upper bound condition, L its main plausible
lower bound condition, and P a positive example of R covered by U and not covered by L.
…
Determine Pg, the minimal generalization of the example P (see Section 9.9).
Return the generalized rule R with the updated conditions U and L, and Pg in the list of
generalized examples of R.
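A minimal sketch of this refinement step is given below. The dictionary-based rule representation and the functions passed as arguments (covers, minimally_generalize, minimal_example_generalization) are assumptions of the sketch, not the Disciple-EBR implementation.

def refine_with_positive_example(rule, example, covers, minimally_generalize,
                                 minimal_example_generalization):
    """The positive example P is covered by the upper bound U but not by the lower bound L,
    so L is minimally generalized to cover it; the example's minimal generalization Pg is
    stored with the rule for later regeneration (see Section 10.2)."""
    assert covers(rule["U"], example) and not covers(rule["L"], example)
    rule["L"] = minimally_generalize(rule["L"], example)
    rule["generalized_examples"].append(minimal_example_generalization(example))
    return rule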
This problem, however, is trivial in Disciple-EBR because both the plausible lower bound
condition and the condition corresponding to the example have exactly the same struc-
ture, and the corresponding variables have the same names, as shown in Figure 10.5. This
is a direct consequence of the fact that the example is generated from the plausible upper
bound condition of the rule.
Based on this failure explanation, the agent generates an Except-When plausible version
space condition by applying the method described in Sections 9.6 and 9.7. First it reformu-
lates the explanation as a specific condition by using the corresponding variables from the
rule, or by generating new variables (see also the bottom-left part of Figure 10.7):

?O1 is Dan Smith
     plans to retire from ?O2
?O2 is George Mason University
Then the agent generates a plausible version space by determining maximal and minimal
generalizations of the preceding condition. Finally, the agent adds it to the rule as an
Except-When plausible version space condition, as shown in the bottom-right part of
Figure 10.7. The Except-When condition should not be satisfied to apply the rule. Thus, in
order to conclude that a professor will stay on the faculty for the duration of the disserta-
tion of a student, the professor should have a long-term position (the main condition) and
it should not be the case that the professor plans to retire from the university (the Except-
When condition).
Figure 10.8 shows the further refinement of the rule with an additional negative
example. This example satisfies the rule in Figure 10.7. Indeed, Jane Austin has a long-term
position and she does not plan to retire from George Mason University. Nevertheless, the
expert rejects the reasoning represented by this example because Jane Austin plans to
move to Indiana University. Therefore, she will not stay on the faculty of George Mason
University for the duration of the dissertation of Bob Sharp.
Rule refinement with a positive example (fragment of the method):
2. If E is covered by MU, is not covered by ML, and is not covered by XU (case 2), then minimally generalize ML to cover E (and remain less general than MU).
3. If E is not covered by MU (cases 3 and 5), or if E is covered by XL (cases 5, 6, and 7), then keep E as a positive exception.
[Figure content: the negative example generated by the rule, together with the rule that generated it, and the failure explanation: Dan Smith plans to retire from George Mason University. In Figure 10.7, this failure explanation is rewritten as the specific Except-When condition:

?O1 is Dan Smith
     plans to retire from ?O2
?O2 is George Mason University

which is then minimally and maximally generalized. In Figure 10.8, the failure explanation of the second negative example is rewritten as the specific Except-When condition:

?O1 is Jane Austin
     plans to move to ?O5
?O5 is Indiana University

which is likewise minimally and maximally generalized.]
Notice that the agent has introduced a new variable ?O5 because Indiana University does
not correspond to any entity from the previous form of the rule (as opposed to Jane Austin
who corresponds to ?O1).
Then the agent generates a plausible version space by determining maximal and
minimal generalizations of the preceding condition. Finally, the agent adds it to the rule
as an additional Except-When plausible version space condition, as shown at the bottom-
right part of Figure 10.8.
Then the agent developed a partial reasoning tree, but it was unable to assess one of the
subhypotheses:
Jill Knox will stay on the faculty of George Mason University for the duration of the
dissertation of Peter Jones.
Let R be a plausible version space rule, N an instance of R rejected by the expert as an incorrect
reasoning step (a negative example of R), and EX an explanation of why N is incorrect (a failure
explanation).
(1) Reformulation of the Failure Explanation
Generate a new variable for each instance and each constant (i.e., number, string, or symbolic
probability) that appears in the failure explanation EX but does not appear in the negative
example N. Use the new variables and the rule’s variables to reformulate the failure explanation EX
as an instance I of the concept EC representing an Except-When condition of the rule R.
(2) Analogy-based Generalizations of the Failure Explanation
Generate the plausible upper bound XU of the concept EC as the maximal generalization of I in the
context of the agent’s ontology.
Generate the plausible lower bound LU of the concept EC as the minimal generalization of I that
does not contain any specific instance.
(3) Rule Refinement with an Except-When Plausible Version Space Condition
Add an Except-When plausible version space condition (XU, LU) to the existing conditions of the
rule R. This condition should not be satisfied for the rule to be applicable in a given situation.
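Step (1) of this method can be sketched as follows. The triple representation of the failure explanation, the assumed known-variable bindings, and the ?O-style numbering are illustrative assumptions, not the Disciple-EBR implementation.

def reformulate_failure_explanation(triples, known_vars):
    """Entities already bound to rule variables keep them (e.g., Jane Austin -> ?O1);
    entities that occur only in the failure explanation get new variables
    (e.g., Indiana University -> ?O5)."""
    var_of = dict(known_vars)
    next_index = 1 + max((int(v[2:]) for v in var_of.values()), default=0)

    def var(entity):
        nonlocal next_index
        if entity not in var_of:
            var_of[entity] = f"?O{next_index}"
            next_index += 1
        return var_of[entity]

    return [(var(subj), feature, var(obj)) for (subj, feature, obj) in triples]

known = {"Jane Austin": "?O1", "George Mason University": "?O2",   # assumed bindings
         "Bob Sharp": "?O3", "tenured position": "?O4"}
print(reformulate_failure_explanation(
    [("Jane Austin", "plans to move to", "Indiana University")], known))
# [('?O1', 'plans to move to', '?O5')]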
Therefore, the expert defined the reduction of this hypothesis, which includes its assess-
ment, as shown at the bottom of Figure 10.9.
Based on this example, the agent learned a general rule, as illustrated in the right-hand
side of Figure 10.9 and as discussed in Chapter 9. The rule is shown in Figure 10.10.
[Figure 10.9 (labels): 1. Solving — the agent applies learned rules to solve new problems; 2. Modeling — the expert defines the reduction that the agent could not generate; 3. Learning — the agent learns a new rule.]
This and the other learned rules enabled the agent to develop the reasoning tree from
Figure 10.11 for assessing a new hypothesis.
However, the expert rejected the bottom reasoning step as incorrect. Indeed, the correct
answer to the question, “Is Bill Bones likely to stay on the faculty of George Mason University
for the duration of the PhD dissertation of June Allison?” is “No,” not “Yes,” because there
is no support for Bill Bones getting tenure.
The user–agent interaction during example understanding is illustrated in Figure 10.12.
The agent identified an entity in the example (the symbolic probability “no support”) that
would enable it to specialize the upper bound of the main condition of the rule to no
longer cover the negative example. Therefore, it proposed the failure explanation shown at
the right-hand side of Figure 10.12.
The expert accepted this explanation by clicking on OK, and the rule was automatically
specialized as indicated in Figure 10.13. More precisely, the upper bound of the main
condition for the variable ?Sl1 was minimally specialized from the interval [no support –
certain] to the interval [likely – certain], in order to no longer cover the value no support,
while continuing to cover the interval representing the lower bound, which is [almost
certain – almost certain].
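The interval specialization just described can be sketched as follows; the ordered symbolic probability scale used below is an assumption of the sketch.

SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def specialize_interval(upper, lower, blamed):
    """Minimally specialize the upper-bound interval so that it no longer covers the
    blamed value while still covering the lower-bound interval."""
    lo, hi = (SCALE.index(v) for v in upper)
    llo, lhi = (SCALE.index(v) for v in lower)
    b = SCALE.index(blamed)
    if b < llo:
        lo = max(lo, b + 1)   # raise the minimum just above the blamed value
    elif b > lhi:
        hi = min(hi, b - 1)   # lower the maximum just below the blamed value
    else:
        return None           # cannot exclude it without uncovering the lower bound
    return (SCALE[lo], SCALE[hi])

print(specialize_interval(("no support", "certain"),
                          ("almost certain", "almost certain"), "no support"))
# ('likely', 'certain'), matching the specialization in Figure 10.13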
[Figure 10.12 (labels): 3. Solving — the agent applies learned rules to solve new problems; 4. Critiquing — the reasoning step is incorrect because of the “no support” probability; 5. Refinement — the agent refines the rule with the negative example.]

Figure 10.13. Specialization of the upper bound of a plausible version space condition.
Let R be a plausible version space rule, U the plausible upper bound of the main condition, L the
plausible lower bound of the main condition, N a negative example covered by U and not covered
by L, and C an entity from N that is blamed for the failure.
1. Let ?X be the variable from the rule’s conditions that corresponds to the blamed
entity C.
Let UX and LX be the classes of ?X in the two bounds.
If each concept from LX covers C
then Continue with step 2.
else Continue with step 3.
2. The rule cannot be specialized to uncover the current negative example.
The negative example N is associated with the rule as a negative exception.
Return the rule R.
3. There are concepts in LX that do not cover C. The rule can be specialized to uncover
N by specializing UX, which is known to be more general than C.
3.1. Remove from LX any element that covers C.
3.2. Repeat for each element ui of UX that covers C
Remove ui from UX.
Add to UX all minimal specializations of ui that do not cover C and are more general than
or at least as general as a concept from LX.
Remove from UX all the concepts that are less general than or as general as other
concepts from UX.
end
4. Return the specialized rule R.
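Step 3 of this method can be sketched as follows. The helpers children(c) (the direct subconcepts, i.e., the minimal specializations of c) and subsumes(a, b) (a is the same as or more general than b) are assumed to be provided by the ontology; this is only an illustrative sketch.

def specialize_upper_bound(UX, LX, C, children, subsumes):
    """Replace every upper-bound concept that covers the blamed entity C with its minimal
    specializations that exclude C but still cover some lower-bound concept."""
    LX = [l for l in LX if not subsumes(l, C)]                     # step 3.1
    for u in [u for u in UX if subsumes(u, C)]:                    # step 3.2
        UX.remove(u)
        UX.extend(s for s in children(u)
                  if not subsumes(s, C) and any(subsumes(s, l) for l in LX))
    # keep only the maximally general elements of UX
    UX = [u for u in UX if not any(v != u and subsumes(v, u) for v in UX)]
    return UX, LX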
Rule refinement with a negative example (fragment of the method):
… keep N as a negative exception.
… the rule does not need to be refined because the example is correctly classified as negative by the current rule. If N is not covered by MU, is not covered by XL, and is covered by XU (case 4), then minimally generalize XL to cover N and remain less general than XU.
4. If N is covered by ML and by XU, but it is not covered by XL (case 8), or N is covered by MU and by XU, but …
10.2. Learning with an Evolving Ontology

When the ontology is changed, the previously learned rules can be regenerated from the
minimal generalizations of their examples and their explanations in the context of the
updated ontology. This is, in fact, the reason why the generalized examples are maintained
with each rule, as discussed in Section 9.9.
The rule regeneration problem is presented in Table 10.7. Notice that not all the
changes of an ontology lead to changes in the previously learned rules. For example,
adding a new concept that has no instance, or adding a new instance, will not affect the
previously learned rules. Also, renaming a concept or a feature in the ontology automatic-
ally renames it in the learned rules, and no additional adaptation is necessary.
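This distinction can be captured by a simple test; the change kinds below are assumptions of the sketch, not an exhaustive classification.

def affects_learned_rules(change_kind):
    """Adding an instance, adding a concept without instances, or renaming a concept or
    feature does not require updating previously learned rules; other ontology changes
    (such as deleting or moving a concept) may."""
    return change_kind not in {"add instance", "add concept without instances", "rename"}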
The agent associates a version with the ontology and, each time a rule is learned or refined,
it records the current version of the ontology with the rule. The version of the ontology is incremented each time a
significant change – that is, a change that may affect the conditions of the previously
learned rules – is made. Then, before using a rule in problem solving, the agent checks
the rule’s ontology version with the current version of the ontology. If the versions are
the same, the rule is up to date and can be used. Otherwise, the agent regenerates the
rule based on the current ontology and also updates the rule’s ontology version to the
current version of the ontology. The on-demand rule regeneration method is presented
in Table 10.8.
GIVEN
A plausible version space reduction rule R corresponding to a version v of the ontology
Minimal generalizations of the examples and explanations from which the rule R was learned, in
the context of the version v of the ontology
An updated ontology with a new version v’
DETERMINE
An updated rule that corresponds to the same generalized examples, but in the context of the
new version v’ of the ontology
Updated minimal generalizations of the specific examples from the current scenario, if any
[Figure 10.14. Updated ontology fragment: object now subsumes, among others, expert, employee, student, organization, faculty position, and Computer Science.]
We will first illustrate the regeneration of the rules presented in the previous sections
and then provide the general regeneration method.
Let R be a plausible version space rule, and O the current ontology with version v.
If R’s ontology version is v, the same as the version of the current ontology O
then Return R (no regeneration is needed).
else Regenerate rule R (see Table 10.9).
Set R’s ontology version to v.
Return R
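The version check can be sketched as follows; the attribute names and the regenerate function are assumptions of this sketch.

def rule_for_problem_solving(rule, ontology, regenerate):
    """Regenerate a rule on demand, only when it was learned under an older ontology
    version, and stamp it with the current version afterward."""
    if rule.ontology_version == ontology.version:
        return rule                              # up to date: use as is
    updated = regenerate(rule, ontology)         # recompute the bounds from the generalized examples
    if updated is not None:                      # the rule may no longer be valid
        updated.ontology_version = ontology.version
    return updated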
Figure 10.15. The updated conditions of the rule from Figure 10.5 in the context of the ontology from Figure 10.14.
The generalized examples of the rule were computed as described in Section 9.9. The top part of Figure 10.15 shows the updated bounds of the rule, in the
context of the updated ontology from Figure 10.14. The new plausible lower bound condi-
tion is the minimal generalization of the generalized examples, in the context of the updated
ontology. Similarly, the new plausible upper bound condition is the maximal generalization
of the generalized examples in the context of the updated ontology.
Notice that in the updated plausible lower bound condition (shown in the upper-left
part of Figure 10.15), ?O2 is now a professor, instead of a professor or PhD advisor. Indeed,
note the following expression from the first generalized example shown in the lower-left
part of Figure 10.15:

?O2 is PhD advisor
     is associate professor          [10.1]

Based on the updated ontology, where associate professor is a subconcept of PhD advisor,
this expression is now equivalent to the following:

?O2 is associate professor           [10.2]

Similarly, note the following expression from the second generalized example shown in
the lower-right part of Figure 10.15:

?O2 is PhD advisor
     is full professor               [10.3]

Because full professor is now a subconcept of PhD advisor, this expression is now equivalent
to the following:

?O2 is full professor                [10.4]

Then the minimal generalization of the expressions [10.2] and [10.4] is the following
expression, because professor is the minimal generalization of associate professor and full
professor:

?O2 is professor                     [10.5]
Also, in the updated plausible upper bound condition (shown in the upper-right part of
Figure 10.15), ?O2 is an expert instead of a person, because the maximal generalization of
associate professor and full professor must be included in the domain of the is expert in
feature of ?O2, and this domain is now expert.
Let us now consider the rule from the right-hand part of Figure 10.8 (p. 304). The
minimal generalizations of the examples from which this rule was learned are shown
under the updated conditions in Figure 10.16. They were determined based on the
ontology from Figure 9.7. You remember that this rule was learned from one positive
example and two negative examples. However, each of the negative examples was used as
a positive example of an Except-When plausible version space condition. That is why each
of the generalized examples in Figure 10.16 has a positive example.
Figure 10.16 (content). The updated conditions of the rule from Figure 10.8 in the context of the ontology from Figure 10.14, and the generalized examples from which they were recomputed:

Main Condition
Plausible Lower Bound Condition (LB):
?O1  is associate professor
     has as position ?O4
?O2  is university
?O3  is PhD student
?O4  is tenured position
?SI1 is in [almost certain – almost certain]
Plausible Upper Bound Condition (UB):
?O1  is employee
     has as position ?O4
?O2  is actor
?O3  is actor
?O4  is long-term faculty position
?SI1 is in [almost certain – almost certain]
Example generalization (covered positive examples: 1; covered negative examples: 0):
?O1  is associate professor
     is PhD advisor
     has as position ?O4
?O2  is university
?O3  is PhD student
?O4  is tenured position
?SI1 is exactly almost certain

Except-When Condition 1
Plausible Lower Bound Condition (LB):
?O1  is full professor
     plans to retire from ?O2
?O2  is university
Plausible Upper Bound Condition (UB):
?O1  is employee
     plans to retire from ?O2
?O2  is organization
Example generalization (covered positive examples: 1; covered negative examples: 0):
?O1  is full professor
     is PhD advisor
     plans to retire from ?O2
?O2  is university

Except-When Condition 2
Plausible Lower Bound Condition (LB):
?O1  is full professor
     plans to move to ?O5
?O5  is university
Plausible Upper Bound Condition (UB):
?O1  is person
     plans to move to ?O5
?O5  is organization
Example generalization:
?O1  is full professor
     is PhD advisor
     plans to move to ?O5
?O5  is university
The lower and upper bounds of the rule in Figure 10.8 (p. 304) were updated by
computing the minimal and maximal generalizations of these generalized examples in
the context of the updated ontology from Figure 10.14. Let us first consider the updated
version space of the main condition. Notice that in the lower bound, ?O1 is now associate
professor, instead of PhD advisor or associate professor (see Figure 10.8). Also, in the upper
bound, ?O1 is employee instead of person. The version spaces of the Except-When condi-
tions have also been updated. In the first Except-When condition, ?O1 is now full professor
in the lower bound and employee in the upper bound, instead of PhD advisor or full
professor and person, respectively. Similarly, in the second Except-When condition, ?O1
is now full professor in the lower bound, instead of PhD advisor or full professor.
Finally, let us consider the rule from Figure 10.13 (p. 308), which was learned based on
the ontology from Figure 9.7, as discussed in Section 10.1.4. The minimal generalizations
of the positive and negative examples from which this rule was learned are shown at the
bottom of Figure 10.17. They were determined based on the ontology from Figure 9.7.
Notice that these generalized examples include all the explanations from which the rule
was learned. In particular, the explanation that fixed the value of “tenure-track position” is
represented as “?O4 is exactly tenure-track position” and that which excluded the value “no
support” is represented as “?SI1 is-not no support in main condition.”
Figure 10.17. The updated conditions of the rule from Figure 10.13 in the context of the ontology from Figure 10.14.
The new lower and upper bounds of the rule in the context of the updated ontology
from Figure 10.14 are shown at the top of Figure 10.17. Notice that in this case, the
regenerated rule is actually the same as the previous rule. The changes made to the
ontology did not affect this rule. However, the agent did recompute it because it cannot
know, a priori, whether the rule will be changed or not. The only change made to the rule
is to register that it was determined based on the new version of the ontology.
Hypothesis refinement is performed using methods that are very similar to the preceding
methods for rule refinement, as briefly summarized in this section.
Remember that general hypotheses are automatically learned as a byproduct of reduc-
tion rule learning, if they have not been previously learned. When a reduction rule is
refined with a positive example, each included hypothesis is also automatically refined
with its corresponding positive example. Indeed, when you say that a reduction is correct,
you are also implicitly saying that each of the included hypotheses is correct.
However, when a reduction rule is refined with a negative example, the hypotheses
are not affected. Indeed, a negative reduction example means that the corresponding
reduction is not correct, not that any of the involved hypotheses is incorrect. For this
reason, an explanation of why a specific reduction is incorrect does not automatically
apply to the hypotheses from that reduction. If you want to say that a specific hypothesis
is incorrect, you have to select it and click on the Incorrect problem button. Then the
Let O be the current ontology with version v, and R a plausible version space rule with a different
version.
1. Recompute the formal parameters P for rule R (see Table 10.10).
2. Refresh examples for rule R (see Table 10.11).
3. If R is no longer valid (i.e., the rule no longer has any generalized positive example)
then Return null
4. Recompute plausible version space for rule R (see Table 10.12).
5. Repeat for each specific example EX of R
If the upper bound of the main plausible version space condition (PVS) of R does not cover
EX and EX is a positive example
then make EX a positive exception.
If the upper bound of PVS of R does cover EX and EX is a negative example
then make EX a negative exception.
end
6. Return R
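Step 5 of this method can be sketched as follows; the attribute names and the coverage test passed as an argument are assumptions of this sketch.

def reclassify_specific_examples(rule, covered_by_main_upper_bound):
    """Specific examples that the recomputed main condition no longer classifies
    correctly become positive or negative exceptions of the rule."""
    for ex in rule.specific_examples:
        covered = covered_by_main_upper_bound(rule, ex)
        if ex.is_positive and not covered:
            rule.positive_exceptions.append(ex)    # positive example no longer covered
        elif (not ex.is_positive) and covered:
            rule.negative_exceptions.append(ex)    # negative example now covered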
Then the selected hypothesis will be refined basically using the same methods as those for rule
refinement.
Just as a refined rule, a refined hypothesis may include, in addition to the plausible
version space of the main condition, one or several plausible version spaces of Except-When
conditions, and generalized positive and negative examples. When the ontology is changed,
the hypotheses can be automatically regenerated based on the associated generalized
examples. They are actually regenerated when the corresponding rules are regenerated.
10.4. Characterization of Rule Refinement

The presented rule learning and refinement methods have several distinguishing characteristics.
Table 10.12 Method for Recomputing the Plausible Version Space of a Rule
Let R be a plausible version space rule having the main condition M and the list of the Except-
When conditions LX.
1. Let MP be the list of the parameters from the main condition M.
Let IP be the list of parameters from the natural language part of the rule R, referred to as
informal parameters.
MP = IP
2. Repeat for each explanation of a positive example EP in the rule R
MP = MP ∪ the new parameters from EP
end
3. Let LGS be the list of the generalized examples that have at least one specific positive
example.
Let LEP be the list of the explanations EP of the positive examples in the rule R.
Create the multivariable condition MC based on MP, LGS, LEP, IP (see Table 10.13).
4. Let LX be the list of Except-When conditions.
LX = [ ]
5. Repeat for each group EEp of Except-When explanations in R.
Let EP be the list of parameters used in EEp.
Let EX be the list of the generalized negative examples associated with EEp that have
at least one specific negative example.
Create the multivariable condition XC based on MP = EP, LGS = EX, LEP = EEp, and
IP = ∅ (see Table 10.13).
LX = LX ∪ XC
end
6. Return R
Let R be a plausible version space rule, MP be the list of the parameters from the main condition,
LGS be the list of generalized examples that have at least one specific positive example, LEP be the
list of the explanations EP of the positive examples in the rule R, and IP be the list of informal
parameters of R.
1. Let A be the list of the generalized explanations fragments (such as “?Oi is interested in
?Oj”) from the generalized examples of R.
Compute A based on MP and LEP.
2. Let D be the domains of the variables from MP, each domain consisting of a lower bound
and an upper bound.
D=[]
3. Repeat for each parameter ?Oi in MP
Determine the list GE = {ge1, . . ., geg} of the concepts from LGS corresponding to ?Oi
(e.g., “assistant professor” from “?Oi is assistant professor”).
Determine the list PC = {PC1, . . ., PCp} of the concepts to which ?Oi must belong,
corresponding to positively constraining explanations
(e.g., “?Oi is exactly tenured position”).
Determine the list NC = {NC1, . . ., NCn} of the concepts to which ?Oi must not belong,
corresponding to negatively constraining explanations (e.g., “?SI1 is-not no support”).
Create the domains Do from the examples GE and the constraints PC and NC
(see Table 10.14).
D = D ∪ Do
end
4. Create the multivariable condition MVC from MP, D, and A.
Return MVC
10.5. Hands On: Rule Refinement

There are two rule refinement case studies, a shorter one and a longer one. In the shorter
case study, you will guide the agent to refine the rule from Figure 9.8 (p. 259), as discussed
in the previous sections. In the longer case study, you will guide the agent to refine several
rules, including the rule from Figure 9.8. You may perform the short case study, or the long
one, or both of them.
Start Disciple-EBR, select the case study knowledge base (either “13-Rule-Refinement-
short/Scen” or “13-Rule-Refinement/Scen”), and proceed as indicated in the instructions
at the bottom of the opened window.
The following are the basic operations for rule refinement, as well as additional operations
that are useful for knowledge base refinement, such as changing a generated reasoning step
into a modeling step, visualizing a rule with the Rule Editor, and deleting a rule.
Table 10.14 Method for Creating Domains from Examples and Constraints
Refine the learned reduction rules by assessing hypotheses that are similar to the ones
considered in the previous assignments.

10.8. Review Questions

10.3. Consider the version space from Figure 10.19. In light of the refinement strategies
studied in this chapter, how will the plausible version space be changed as a result
of a new negative example labeled 1? Draw the new version space(s).
10.4. Consider the version space from Figure 10.20. In light of the refinement strategies
studied in this chapter, what are three alternative ways in which this version space
may be changed as a result of the negative example 2?
Figure 10.19. Version space and a negative example (labeled 1) covered by the lower bound of the main condition.
Figure 10.20. Version space and a negative example (labeled 2) covered by the upper bound of the main condition.
Figure 10.21 (content). Positive Example 1 and its explanation:
We need to: Determine a strategic center of gravity for a member of Allied Forces 1943.
Which is a member of Allied Forces 1943? US 1943.
Explanation: Allied Forces 1943 has as member US 1943.
Therefore we need to: Determine a strategic center of gravity for US 1943.

[Figure 10.22. Ontology fragment from the center of gravity analysis domain, including object and its subconcepts multistate alliance and multistate coalition. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships.]
10.5. (a) Consider the example and its explanation from Figure 10.21. What rule will
be learned from them, assuming the ontology from Figure 10.22, where all
the instances are considered specific instances?
(b) Consider the additional positive example from Figure 10.23. Indicate the
refined rule.
(c) Consider the negative example, its failure explanation, and the additional
ontological knowledge from Figure 10.24. Indicate the refined rule.
Figure 10.23 (content). Positive Example 2:
We need to: Determine a strategic center of gravity for a member of European Axis 1943.
(Germany 1943 is a member of European Axis 1943.)
Therefore we need to: Determine a strategic center of gravity for Germany 1943.

Figure 10.24. Negative example, failure explanation, and additional ontology fragment.
10.6. Consider the example and its explanation shown in Figure 10.25. Find the plaus-
ible version space rule that will be learned based on the ontology fragments from
Figures 10.26, 10.27, and 10.28, where all the instances are defined as generic
instances.
[Figure 10.26. Ontology of economic factors, including industrial factor and transportation factor; subconcepts include raw material, strategic raw material (with oil, chromium, copper, and bauxite), industrial authority, industrial center, industrial capacity, farm implement industry (with the instance farm implement industry of Italy 1943), transportation center, and transportation network or system (with the instance transportation network or system of Germany 1943); the feature is critical to the production of is also shown. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships.]

[Figure 10.27. An ontology of forces: force, a subconcept of object, subsumes multigroup force, opposing force, multistate force, single-state force, and single-group force, among others. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships.]
10.7. Minimally generalize the rule from the left side of Figure 10.29 in order to cover
the positive example from the right side of Figure 10.29, considering the back-
ground knowledge from Figures 10.26, 10.27, and 10.28.
[Figure 10.28. An ontology of resources: object subsumes resource or infrastructure element, which subsumes resource and product; subconcepts include strategic raw material, war material and fuel, war material and transports, and farm implements. Dotted links indicate instance of relationships while continuous unnamed links indicate subconcept of relationships.]
10.8. Minimally specialize the rule from the left side of Figure 10.30, in order to no longer
cover the negative example from the right side of Figure 10.30, considering the back-
ground knowledge from Figures 10.26, 10.27, and 10.28.
Table 10.15. Rule Refinement with Learning Agent versus Rule Refinement by Knowledge Engineer (rows to be filled in: Description, highlighting differences and similarities; Strengths; Weaknesses).

Figure 10.30 (content). A rule and a negative example that satisfies its upper bound:

Rule:
IF: Identify each strategic COG candidate with respect to the industrial civilization of ?O1.
Plausible Upper Bound Condition:
?O1 is force
     has as industrial factor ?O2
?O2 is industrial factor
     is a major generator of ?O3
?O3 is product

Negative example that satisfies the upper bound:
IF the task to accomplish is: Identify each strategic COG candidate with respect to the industrial civilization of Italy 1943.
THEN accomplish the task: farm implement industry of Italy 1943 is a strategic COG candidate for Italy 1943.
10.9. Consider the problem reduction step and its explanation from Figure 10.25, as
well as the ontology of economic factors from Figure 10.26. Show the correspond-
ing analogy criterion generated by a cautious learner, and an analogous reduction
made by that cautious learner.
10.11. Compare the learning-based rule refinement process discussed in this chapter
with the traditional knowledge acquisition approach discussed in Section 3.1.4 by
filling in Table 10.15. Identify similarities and differences and justify the relative
strengths and weaknesses of the two approaches.
[Figure 10.31. A partially learned concept and four instances, I1, I2, I3, and I4, at different positions with respect to its bounds.]
10.12. Consider the partially learned concept and the four instances from Figure 10.31.
Order the instances by the plausibility of being positive examples of this concept
and justify the ordering.