ML UNIT-3 Notes PDF
COLLEGE
(Approved by AICTE, Affiliated to JNTUK) (An ISO 9001:2008 Certified Institution)
Prepared by Dr.M.BHEEMALINGAIAH
UNIT-III
Content
3.1 Computational Learning Theory:
3.2 Probably approximately correct (PAC) learning
3.3 Sample Complexity for Infinite Hypothesis Spaces
3.4 Vapnik-Chervonenkis dimension
3.5 Rule Learning
3.6 Translating decision trees into rules
3.7 Sequential Covering Algorithms (Separate and Conquer Approach)
3.8 Learning First Order Rules
3.9 First Order Rule Inductive Learning (FOIL)
3.9.1 FOIL Algorithm
3.9.2 FOIL: Explanation
3.9.3 FOIL: Specializing the Current Rule
3.9.4 FOIL: Performance Evaluation Measure
3.9.5 Foil-Gain
3.9.6 Summary/Observations of FOIL
3.10 Induction as Inverted Deduction
3.11 Inverse resolution
3.12 PROGOL
3.1 Computational Learning Theory
Computational learning theory, or CoLT for short, is a field of study concerned with applying formal
mathematical methods to learning systems. It uses the tools of theoretical computer science to
quantify learning problems, including characterizing the difficulty of learning specific tasks. Computational
learning theory can be viewed as an extension or close relative of statistical learning
theory (SLT), which uses formal methods to quantify learning algorithms.
When studying machine learning it is natural to wonder what general laws may govern machine (and
nonmachine) learners. Is it possible to identify classes of learning problems that are inherently difficult or
easy, independent of the learning algorithm? Can one characterize the number of training examples
necessary or sufficient to assure successful learning? How is this number affected if the learner is allowed
to pose queries to the trainer, versus observing a random sample of training examples? Can one
characterize the number of mistakes that a learner
will make before learning the target function? Can one characterize the inherent computational complexity
of classes of learning problems?
Although general answers to all these questions are not yet known, fragments of a computational
theory of learning have begun to emerge. This chapter presents key results from this theory, providing
answers to these questions within particular problem settings. We focus here on the problem of inductively
learning an unknown target function, given only training examples of this target function and a space
of candidate hypotheses. Within this setting, we will be chiefly concerned with questions such as how
many training examples are sufficient to successfully learn the target function, and how many mistakes
the learner will make before succeeding. As we shall see, it is possible to set quantitative bounds on these
measures, depending on attributes of the learning problem such as:
• the size or complexity of the hypothesis space considered by the learner
• the accuracy to which the target concept must be approximated
• the probability that the learner will output a successful hypothesis
• the manner in which training examples are presented to the learner
For the most part, we will focus not on individual learning algorithms, but rather on broad classes of
learning algorithms characterized by the hypothesis spaces they consider, the presentation of training
examples, etc. Our goal is to answer questions such as:
• Sample complexity. How many training examples are needed for a learner to converge (with
high probability) to a successful hypothesis?
• Computational complexity. How much computational effort is needed for a learner to
converge (with high probability) to a successful hypothesis?
• Mistake bound. How many training examples will the learner misclassify before converging
to a successful hypothesis?
3.2 Probably Approximately Correct (PAC) Learning
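A standard result in PAC learning, stated here as context for the sample-complexity questions above: for a finite hypothesis space H, any consistent learner needs at most m ≥ (1/ε)(ln |H| + ln(1/δ)) training examples to output, with probability at least 1 − δ, a hypothesis whose true error is at most ε. The short Python sketch below simply evaluates this bound; the function name and the example numbers are illustrative only, not taken from these notes.

import math

def pac_sample_bound(hypothesis_space_size, epsilon, delta):
    """Number of examples sufficient for a consistent learner over a finite
    hypothesis space to be probably (prob. >= 1 - delta) approximately
    (error <= epsilon) correct: m >= (1/epsilon)(ln|H| + ln(1/delta))."""
    return math.ceil((1.0 / epsilon) *
                     (math.log(hypothesis_space_size) + math.log(1.0 / delta)))

# Example: conjunctions of up to 10 Boolean literals give |H| = 3^10 hypotheses.
print(pac_sample_bound(hypothesis_space_size=3 ** 10, epsilon=0.1, delta=0.05))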
3.3 Sample Complexity for Infinite Hypothesis Spaces
3.5 Rule Learning
Rule-based learning represents knowledge in the form of IF-THEN rules, which proves useful
in Artificial Intelligence (AI) systems. This type of learning is well suited to analysing data
that contains a mixture of numerical and qualitative attributes. Rules are useful and easy for
humans to understand. The main aim of this kind of learning is to discover interesting relations
between variables and patterns in large data sets. Several algorithms have proven useful for
automatically inducing rules from data to build more accurate AI systems.
Format of a rule:
A → B
The LHS A is called the antecedent and the RHS B is called the consequent; A is a conjunction of
attribute-value pairs and B is the class label (the target class prediction, e.g. Yes or No).
Evaluation metrics of a rule: two evaluation metrics for a rule, accuracy and coverage, are defined
as follows:
Accuracy = (number of instances that satisfy both the antecedent and the consequent of the rule)
           / (number of instances that satisfy the antecedent of the rule)
Coverage = (number of instances that satisfy the antecedent of the rule) / (total number of instances)
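The illustrative Python sketch below (not part of the original notes; the dataset and attribute names are made up) computes coverage and accuracy for a single IF-THEN rule whose antecedent is given as a dictionary of attribute-value pairs:

def rule_metrics(rule_antecedent, rule_class, dataset, class_attr="Class"):
    """Compute (coverage, accuracy) of a single IF-THEN rule.

    coverage = |instances satisfying the antecedent| / |dataset|
    accuracy = |instances satisfying antecedent AND consequent|
               / |instances satisfying the antecedent|
    """
    covered = [row for row in dataset
               if all(row.get(attr) == val for attr, val in rule_antecedent.items())]
    correct = [row for row in covered if row[class_attr] == rule_class]
    coverage = len(covered) / len(dataset) if dataset else 0.0
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

data = [
    {"Outlook": "Sunny", "Humidity": "Normal", "Class": "Yes"},
    {"Outlook": "Sunny", "Humidity": "High",   "Class": "No"},
    {"Outlook": "Rain",  "Humidity": "Normal", "Class": "Yes"},
]
print(rule_metrics({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes", data))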
Types of rule-based learning methods
Indirect methods: the rules are extracted from other classification models; decision trees and
genetic algorithms belong to this category.
Direct methods: the rules are extracted directly from the training data. Examples include the
OneR algorithm and the sequential covering algorithm.
3.6 Translating Decision Trees into Rules
In this method, a decision tree is first constructed from the training data and then translated into
rules, one rule for each path from the root to a leaf. An example decision tree for PlayTennis is as follows.
R1: IF (Outlook = Sunny ∧ Humidity = Normal) THEN PlayTennis = Yes
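As a small illustrative sketch (assumed helper code, not from the notes), the rules read off the paths of the usual PlayTennis tree, with Outlook at the root, can be stored as antecedent/consequent pairs and applied directly to classify new instances; the default class below is an assumption:

# Each rule: (antecedent as attribute-value pairs, predicted class).
# R1 corresponds to the tree path Outlook=Sunny -> Humidity=Normal.
RULES = [
    ({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes"),
    ({"Outlook": "Sunny", "Humidity": "High"},   "No"),
    ({"Outlook": "Overcast"},                    "Yes"),
    ({"Outlook": "Rain", "Wind": "Weak"},        "Yes"),
    ({"Outlook": "Rain", "Wind": "Strong"},      "No"),
]

def classify(instance, rules=RULES, default="No"):
    """Return the prediction of the first rule whose antecedent the instance satisfies."""
    for antecedent, label in rules:
        if all(instance.get(attr) == val for attr, val in antecedent.items()):
            return label
    return default  # assumed fallback when no rule fires

print(classify({"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes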
3.7 Sequential Covering Algorithms (Separate and Conquer Approach)
• Learn a rule from the training dataset that covers as many of the positive examples as possible,
and add this new rule to the rule set.
• Update the training dataset by removing the positive examples that are covered by the
selected rule.
• Repeat this process until the training dataset no longer contains positive examples (finally it
contains only negative examples), as shown in the figure below.
[Figure: sequential covering of the training data – (i) original data, (ii) step 1; rules R1 and R2 successively cover groups of positive examples.]
The sequential covering algorithm uses a subroutine (function) called LEARN-ONE-RULE to learn each individual rule.
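A minimal Python sketch of this separate-and-conquer loop is shown below. LEARN-ONE-RULE is passed in as a stub, and the rule attributes accuracy and covers() are assumed helper interfaces, so this is an outline of the outer loop only, not the full algorithm from the notes:

def sequential_covering(positives, negatives, learn_one_rule, min_accuracy=0.9):
    """Separate-and-conquer: repeatedly learn one rule covering many of the
    remaining positive examples, then remove the positives that rule covers."""
    learned_rules = []
    remaining_pos = list(positives)
    while remaining_pos:
        # Greedy general-to-specific search for a single rule (stubbed out here).
        rule = learn_one_rule(remaining_pos, negatives)
        if rule is None or rule.accuracy < min_accuracy:  # assumed attribute
            break  # no acceptable rule can be found any more
        learned_rules.append(rule)
        # "Separate": drop the positive examples this rule already covers.
        remaining_pos = [p for p in remaining_pos if not rule.covers(p)]  # assumed method
    return learned_rules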
3.8 Learning First Order Rules
Lecture Outline:
• Why Learn First Order Rules?
• First Order Logic: Terminology
• The FOIL Algorithm
Propositional logic allows the expression of individual propositions and their truth-functional
combination.
E.g. propositions like Tom is a man or All men are mortal may be represented by single
proposition letters such as P or Q (so, proposition letters may be viewed as variables
which range over propositions)
Truth functional combinations are built up using connectives, such as ∧, ∨, ¬, → – e.g.
P ∧ Q
Inference rules are defined over propositional forms – e.g. modus ponens: from P → Q and P, infer Q
– Note that if P is Tom is a man and Q is All men are mortal, then the inference that Tom is
mortal does not follow in propositional logic
First order logic allows the expression of propositions and their truth functional combination,
but it also allows us to represent propositions as assertions of predicates about individuals or
sets of individuals
Example : propositions like Tom is a man or All men are mortal may be represented by
Predicate-argument representations such as man (tom) or ∀x (man(x) →mortal(x))
(So, variables range over individuals)
Inference rules permit conclusions to be drawn about sets/individuals – e.g. mortal (tom)
First order logic is much more expressive than propositional logic – i.e. it allows a finer
grain of specification and reasoning when representing knowledge
In the context of machine learning, consider learning the relational concept daughter(x,
y) defined over pairs of persons x, y, where
o persons are represented by attributes: (Name,Mother,Father,Male,Female)
Training examples then have the form: (person1, person2, target attribute value)
E.g. ⟨(Name1 = Ann, Mother1 = Sue, Father1 = Bob, Male1 = F, Female1 = T),
(Name2 = Bob, Mother2 = Gill, Father2 = Joe, Male2 = T, Female2 = F),
Daughter1,2 = T⟩
From such examples, a propositional rule learner such as ID3 or CN2 can only learn rules like:
IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = T) THEN Daughter1,2 = T
Such a rule cannot generalize beyond the specific constant values it mentions, whereas a first order
rule learner can learn the general rule Daughter(x, y) ← Father(y, x) ∧ Female(y).
First order logic terminology: well-formed expressions are built from
o constants, variables, predicate symbols and function symbols
o connectives – e.g. ∧, ∨, ¬, →
o quantifiers – e.g. ∀, ∃
A term is
o any constant – e.g. bob
o any variable – e.g X
o any function applied to any term – e.g. age(bob)
A literal is any predicate or negated predicate applied to any terms – e.g. female(sue),
¬father(X,Y)
– A ground literal is a literal that contains no variables – e.g. female(sue)
– A positive literal is a literal that does not contain a negated predicate – e.g. female(sue)
– A negative literal is a literal that contains a negated predicate – e.g ¬father(X,Y)
A clause is any disjunction of literals L1 ∨ · · · ∨ Ln whose variables are universally quantified
(with wide scope)
A Horn clause is a clause containing at most one positive literal, i.e. a clause of the form:
H ∨ ¬L1 ∨ · · · ∨ ¬Ln
where H is the single positive literal (the head of the clause).
Since ¬L1 ∨· · ·∨¬Ln ≡ ¬ (L1 ∧· · ·∧Ln)
and (A∨¬B) ≡ (A←B) (read A←B as “if B then A”)
then a Horn clause can be equivalently written:
H ←L1 ∧· · ·∧Ln
Note: the equivalent form in Prolog: H :- L1, ..., Ln.
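To make the terminology above concrete, the hypothetical sketch below represents literals and Horn clauses as plain Python data structures (the representation and naming conventions are illustrative only; upper-case argument names stand for variables, lower-case for constants):

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Literal:
    predicate: str          # e.g. "man", "father"
    args: Tuple[str, ...]   # constants lower-case (e.g. "tom"), variables upper-case (e.g. "X")
    negated: bool = False

    def is_ground(self):
        """A ground literal contains no variables."""
        return all(not a[0].isupper() for a in self.args)

@dataclass(frozen=True)
class HornClause:
    head: Literal                    # H
    body: Tuple[Literal, ...] = ()   # L1 ... Ln, read as H <- L1 AND ... AND Ln

# man(tom)            -- a positive ground literal
# mortal(X) <- man(X) -- "all men are mortal" as a Horn clause
rule = HornClause(head=Literal("mortal", ("X",)), body=(Literal("man", ("X",)),))
print(Literal("man", ("tom",)).is_ground())   # True
print(rule.head.is_ground())                  # False (X is a variable)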
FOIL learns first order rules which are similar to Horn clauses, with two exceptions:
literals may not contain function symbols (this reduces the complexity of the hypothesis space)
literals in the body of a clause may be negated (hence, more expressive than Horn clauses)
Like SEQUENTIAL-COVERING, FOIL learns one rule at a time and removes the positive
examples covered by the learned rule before attempting to learn a further rule.
• The inner loop works out the detail of each specific rule, adding conjunctive constraints to the
rule precondition on each iteration.
This loop may be viewed as a general-to-specific search
starting with the most general precondition (empty)
stopping when the hypothesis is specific enough to exclude all negative examples
• In its inner loop search to generate each new rule, FOIL needs to cope with variables in the
rule preconditions
• The performance measure used in FOIL is not the entropy measure used in LEARN-ONE-RULE, since
the performances of distinct bindings of rule variables need to be distinguished
FOIL only tries to discover rules that cover positive examples
• Suppose we are learning a rule of the form: P(x1, x2, . . . , xk) ← L1 ∧ . . . ∧ Ln
• Then candidate specializations add a new literal of one of the following forms (a sketch of generating such candidates is given after this list):
Q(v1, . . . , vr), where
• Q is any predicate name occurring in the rule or in the training data;
• at least one of the vi in the created literal must already exist as a variable in the rule
Equal(xj, xk), where xj and xk are variables already present in the rule; or
The negation of either of the above forms of literals
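The hypothetical sketch below enumerates candidate literals of the first form, Q(v1, . . . , vr), for a given partial rule; Equal literals and negations are omitted for brevity, and the predicate list and variable names are illustrative. It enforces the constraint that at least one argument of each new literal is a variable already present in the rule:

from itertools import product

def candidate_literals(rule_vars, predicates):
    """Generate candidate literals Q(v1, ..., vr) for specializing a rule.

    rule_vars:  variables already occurring in the rule, e.g. ["X", "Y"]
    predicates: mapping predicate name -> arity, e.g. {"father": 2, "female": 1}
    New variables are named V1, V2, ... (fresh names are assumed not to clash).
    """
    candidates = []
    for name, arity in predicates.items():
        new_vars = [f"V{i}" for i in range(1, arity + 1)]  # fresh variables
        for args in product(rule_vars + new_vars, repeat=arity):
            # Keep only literals that reuse at least one existing rule variable.
            if any(a in rule_vars for a in args):
                candidates.append((name, args))
    return candidates

print(candidate_literals(["X", "Y"], {"father": 2, "female": 1}))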
How do we decide which is the best literal to add when specializing a rule?
• To do this FOIL considers each possible binding of variables in the candidate rule
specialization to constants in the training examples.
• For example, suppose we have the training data:
granddaughter(bill, joan)   father(joan, joe)   father(tom, joe)
female(joan)   father(joe, bill)
and we also assume (the "closed world assumption") that any literals
– involving the predicates granddaughter, father, and female
– involving the constants bill, joan, joe, and tom
– not present in the training data
can be assumed to be false.
3.9.5 Foil-Gain
Foil_Gain(L, R) = t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
Where
L is the candidate literal to add to rule R
p0 = number of positive bindings of R
n0 = number of negative bindings of R
p1 = number of positive bindings of R+L
n1 = number of negative bindings of R+L
t is the number of positive bindings of R also covered by R+L
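A direct transcription of this measure into Python (illustrative only; computing the binding counts themselves is assumed to happen elsewhere):

import math

def foil_gain(p0, n0, p1, n1, t):
    """Foil_Gain of adding candidate literal L to rule R.

    p0, n0: positive / negative bindings of R
    p1, n1: positive / negative bindings of R + L
    t:      positive bindings of R that are still covered after adding L
    """
    if p0 == 0 or p1 == 0:
        return float("-inf")  # R or R + L covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: R has 4 positive and 4 negative bindings; after adding L there are
# 2 positive and 0 negative bindings, and both of those positives were covered by R.
print(foil_gain(p0=4, n0=4, p1=2, n1=0, t=2))   # 2 * (log2(1.0) - log2(0.5)) = 2.0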
3.9.6 Summary/Observations of FOIL
3.10 Induction as Inverted Deduction
A second, quite different approach to inductive logic programming is based on the simple
observation that induction is just the inverse of deduction! In general, machine learning involves
building theories that explain the observed data. Given some data D and some partial background
knowledge B, learning can be described as generating a hypothesis h that, together with B, explains
D. Put more precisely, assume as usual that the training data D is a set of training examples,
each of the form ⟨xi, f(xi)⟩. Here xi denotes the ith training instance and f(xi) denotes its target
value. Then learning is the problem of discovering a hypothesis h such that the classification f(xi)
of each training instance xi follows deductively from the hypothesis h, the description of xi and
any other background knowledge B known to the system:
(∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)
The expression X ⊢ Y is read "Y follows deductively from X," or alternatively "X entails Y."
The expression above describes the constraint that must be satisfied by the learned hypothesis h;
namely, for every training instance xi, the target classification f(xi) must follow deductively
from B, h, and xi.
As an example, consider the case where the target concept to be learned is "pairs of people (u, v)
such that the child of u is v," represented by the predicate Child(u, v).
Assume we are given a single positive example Child(Bob, Sharon), where the instance is
described by the literals Male(Bob), Female(Sharon), and Father(Sharon, Bob).
Furthermore, suppose we have the general background knowledge Parent(u, v) ← Father(u, v). In the notation above:
xi:    Male(Bob), Female(Sharon), Father(Sharon, Bob)
f(xi): Child(Bob, Sharon)
B:     Parent(u, v) ← Father(u, v)
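For instance, two hypotheses h that, together with B and the instance description, entail the observed classification (and hence satisfy the constraint above) are:
h1: Child(u, v) ← Father(v, u)
h2: Child(u, v) ← Parent(v, u)
Note that h2 entails Child(Bob, Sharon) only with the help of the background clause B, illustrating how background knowledge enlarges the set of hypotheses that explain the data.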
3.11 Inverse Resolution
It is easiest to introduce the resolution rule in propositional form, though it is readily extended
to first-order representations. Let L be an arbitrary propositional literal, and let P and R be
arbitrary propositional clauses. The resolution rule is: from the clauses P ∨ L and ¬L ∨ R,
conclude the resolvent P ∨ R.
1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬ L occurs in clause
C2.
2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L. More
precisely, the set of literals occurring in the conclusion C is
C = (C1 − {L}) ∪ (C2 − {¬L})
For example, resolving C1: PassExam ∨ ¬KnowMaterial with C2: KnowMaterial ∨ ¬Study yields
the resolvent C: PassExam ∨ ¬Study.
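A hypothetical Python sketch of this propositional resolution step, with clauses represented as sets of string literals and a leading "~" marking negation (the clause contents simply repeat the PassExam example above):

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Propositional resolution: for a literal L in C1 with ~L in C2,
    the resolvent is (C1 - {L}) | (C2 - {~L})."""
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None  # the clauses do not resolve

C1 = {"PassExam", "~KnowMaterial"}   # PassExam v ~KnowMaterial
C2 = {"KnowMaterial", "~Study"}      # KnowMaterial v ~Study
print(resolve(C1, C2))               # {'PassExam', '~Study'}, i.e. PassExam v ~Study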
Inverse resolution (propositional form): given the resolvent C and the clause C1, the missing
clause C2 can be recovered as
C2 = (C − (C1 − {L})) ∪ {¬L}
The resolution rule extends to first order logic as follows:
1. Find a literal L1 from clause C1, a literal L2 from clause C2, and substitutions θ1 and θ2
such that L1θ1 = ¬L2θ2.
2. Form the resolvent C by including all literals from C1θ1 and C2θ2, except for L1θ1 and ¬L2θ2.
More precisely, the set of literals occurring in the conclusion is
C = (C1 − {L1})θ1 ∪ (C2 − {L2})θ2
Inverting this step gives the first order inverse resolution rule:
C2 = (C − (C1 − {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
Example: the clauses GrandChild(Bob, x) ∨ ¬Father(x, Tom) and Father(Shannon, Tom) resolve,
under the unifying substitution {Shannon/x}, to GrandChild(Bob, Shannon). Inverse resolution
runs this step backwards, recovering GrandChild(Bob, x) ∨ ¬Father(x, Tom) from the resolvent
GrandChild(Bob, Shannon) and the clause Father(Shannon, Tom).
3.12 PROGOL
PROGOL uses inverse entailment to generate hypotheses h that, together with the background
knowledge B and the description of each instance xi, entail the observed classification:
(B ∧ h ∧ xi) ⊢ f(xi)