MLT Unit 5 12m
PART-C
1. Elaborate in detail about the learning sets of rules and state how it differs from other algorithms.
Learning sets of rules is a method in machine learning and data mining that focuses on deriving
a set of human-readable rules from a dataset. These rules are typically in the form of "if-then"
statements, such as "If condition A and condition B are true, then class C." This method is valued for its
interpretability and ease of understanding compared to other complex models like neural networks or
ensemble methods. The process, advantages, and differences from other algorithms are elaborated below.
### Process of Learning Sets of Rules
1. Data Preparation:
- The dataset is prepared, often involving cleaning, normalization, and feature selection.
- The dataset is typically divided into a training set and a test set.
2. Rule Generation:
- Rule Induction: Use an algorithm to find patterns in the training data and create rules. Common
algorithms include:
- Decision Trees: Rules can be extracted from decision trees. Each path from the root to a leaf in a
decision tree represents a rule.
- Association Rule Learning: Techniques like the Apriori algorithm find frequent itemsets and derive
rules from them.
- Pruning: Simplify the rules by removing redundant conditions or merging similar rules to avoid
overfitting.
3. Rule Evaluation:
- **Accuracy:** Measure how well the rules classify the training data.
- **Coverage:** Evaluate the proportion of instances covered by each rule.
4. **Rule Application:**
- Apply the final rule set to the test set or to new instances; the first rule whose conditions match an instance (or a vote over all matching rules) determines the predicted class.
### Advantages
1. **Interpretability:**
- Rules are easy to understand and interpret by humans, making them useful in domains where
transparency is crucial, such as healthcare or finance.
2. **Transparency:**
- The decision-making process is clear, which is beneficial for debugging and explaining the model's
predictions.
3. **Flexibility:**
- Expert knowledge can be directly incorporated into the rule set, improving model accuracy and
relevance.
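The rule format and the evaluation measures described above (accuracy and coverage) can be sketched in a few lines of Python. This is a minimal illustration, not a specific library's API; the rule representation and the toy dataset are assumptions made for the example:

```python
# Evaluate a single "if-then" rule against a labelled dataset.
# A rule is a (conditions, predicted_class) pair, where `conditions`
# maps attribute names to required values (hypothetical format).

def rule_covers(conditions, instance):
    """True if the instance satisfies every condition of the rule."""
    return all(instance.get(attr) == value for attr, value in conditions.items())

def evaluate_rule(conditions, predicted_class, dataset):
    """Return (coverage, accuracy) of the rule on the dataset."""
    covered = [(x, y) for x, y in dataset if rule_covers(conditions, x)]
    coverage = len(covered) / len(dataset)
    if not covered:
        return coverage, 0.0
    correct = sum(1 for _, y in covered if y == predicted_class)
    return coverage, correct / len(covered)

# Toy dataset: (attributes, class label), invented for illustration.
data = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "sunny", "windy": True}, "stay"),
    ({"outlook": "rain", "windy": False}, "stay"),
    ({"outlook": "sunny", "windy": False}, "play"),
]

cov, acc = evaluate_rule({"outlook": "sunny", "windy": False}, "play", data)
print(cov, acc)  # the rule covers 2 of 4 instances, both classified correctly
```

Coverage and accuracy pull in opposite directions during pruning: dropping a condition raises coverage but may lower accuracy, which is exactly the trade-off rule-evaluation metrics arbitrate.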
### Differences from Other Algorithms
1. **Interpretability:**
- **Neural Networks:** These models can capture complex patterns in data but are often considered
"black boxes" due to their lack of interpretability. In contrast, rule-based systems provide explicit
reasoning paths.
- **Ensemble Methods (e.g., Random Forests, Boosting):** These methods combine multiple models
to improve accuracy but at the cost of interpretability. Each decision tree in a random forest might be
interpretable individually, but the overall model is not.
2. **Model Structure:**
- **Linear Models (e.g., Linear Regression, Logistic Regression):** These models assume a linear
relationship between input features and the target variable. Rule-based systems do not assume any
specific form of relationship and can model non-linear relationships easily.
- **Support Vector Machines (SVMs):** SVMs use hyperplanes to separate data into classes and can
be difficult to interpret. Rule-based systems use logical conditions, which are more intuitive.
3. **Computational Efficiency:**
- **Training Time:** Learning rules can be faster for smaller datasets but may become slow for very
large datasets. Algorithms like decision trees or random forests can handle larger datasets more
efficiently.
- **Inference Time:** Applying rules to classify a new instance is usually fast since it involves checking
a few conditions. Neural networks and SVMs can have longer inference times depending on their
complexity.
4. **Data Handling:**
- Rule-based systems can naturally handle categorical and numerical data. Other algorithms may
require more preprocessing, such as encoding categorical variables for neural networks or SVMs.
5. **Robustness to Noise:**
- Rule-based systems can be sensitive to noise, as noisy data can lead to overfitting in rule induction.
Techniques like pruning are essential to mitigate this issue. Ensemble methods and neural networks
often handle noise better through averaging and regularization techniques.
### Conclusion
Learning sets of rules is a powerful technique for creating interpretable models that are easy to
understand and modify. While they may not always achieve the highest accuracy compared to more
complex models, their transparency and simplicity make them valuable in many applications where
interpretability is essential. The choice between rule-based systems and other algorithms depends on
the specific requirements of the task, such as the need for interpretability versus the need for accuracy
and handling complex patterns in data.
2. (i) Illustrate the diagram for the search for rule preconditions as learn-one-rule proceeds from general
to specific.
### (i) Diagram for the Search for Rule Preconditions in Learn-One-Rule
The process of learning one rule typically involves starting with a very general rule and progressively
specializing it to better fit the data. Here's a diagrammatic illustration of this process:
```
                         IF (true)
                  THEN PlayTennis = yes          <-- most general rule
               /             |               \
   IF Wind = weak     IF Wind = strong     IF Humidity = normal    ...
   THEN PT = yes      THEN PT = no         THEN PT = yes
                                             /                \
                          IF Humidity = normal       IF Humidity = normal
                            AND Wind = weak            AND Outlook = sunny
                          THEN PT = yes              THEN PT = yes
```
This diagram illustrates the progression from a general rule to more specific rules by adding conditions.
Each branch point represents the addition of a new condition that makes the rule more specific.
### (ii) The Learn-One-Rule Algorithm
The Learn-One-Rule algorithm involves creating a single rule that covers a subset of the instances in the
dataset. This is usually part of a larger rule-learning system that will create multiple rules to cover the
entire dataset. Here's a step-by-step outline of the algorithm:
1. **Initialize the Rule:**
- Start with the most general rule possible, usually something like "IF True THEN Class = X," where X is
the majority class in the dataset.
2. **Generate Candidate Preconditions:**
- Generate candidate conditions that can be added to the rule. Each candidate condition is a test on an
attribute, such as "Attribute1 = Value1".
3. **Evaluate Preconditions:**
- For each candidate condition, temporarily add it to the current rule and evaluate the rule's
performance on the training data. Common evaluation metrics include accuracy, coverage, and
precision.
4. **Select the Best Precondition:**
- Choose the condition that, when added to the rule, results in the best improvement according to the
chosen evaluation metric.
5. **Check Stopping Criteria:**
- Determine if the rule should stop being specialized. Common stopping criteria include:
- The rule's performance does not significantly improve with additional conditions.
- The rule reaches a minimum coverage threshold (e.g., it covers too few instances to be useful).
- The rule becomes too complex (e.g., exceeds a maximum number of conditions).
6. **Post-Prune the Rule:**
- Simplify the rule by removing any conditions that do not significantly affect its performance, which
helps to avoid overfitting.
```python
def learn_one_rule(training_data, candidate_conditions, evaluate):
    """Greedy general-to-specific search for a single rule (a sketch of
    the steps above; `evaluate` is an assumed scoring helper returning,
    e.g., the rule's accuracy on the data)."""
    rule = []  # Step 1: most general rule -- no preconditions yet
    performance = evaluate(rule, training_data)
    while True:
        # Steps 2-3: evaluate each candidate added to the current rule
        best_condition, best_performance = None, performance
        for condition in candidate_conditions:
            score = evaluate(rule + [condition], training_data)
            if score > best_performance:
                best_condition, best_performance = condition, score
        # Step 5: stop when no candidate improves the rule (further
        # criteria such as minimum coverage or a maximum rule length
        # could also be checked here)
        if best_condition is None:
            break
        # Step 4: commit to the best condition and keep specializing
        rule.append(best_condition)
        performance = best_performance
    return rule
```
3. Refine the LEARN-ONE-RULE algorithm so that it can learn rules whose preconditions include
constraints such as nationality ∈ {Canadian, Brazilian}, where a discrete-valued attribute is allowed to
take on any value in some specified set. Your modified program should explore the hypothesis space
containing all such subsets. Specify your new algorithm as a set of editing changes to the algorithm.
To refine the LEARN-ONE-RULE algorithm to handle preconditions that include constraints like
"nationality ∈ {Canadian, Brazilian}", where a discrete-valued attribute can take any value in a specified
set, we need to update the algorithm to explore the hypothesis space containing all such subsets. Here's
how you can modify the algorithm:
### Editing Changes to the Algorithm
- Update the candidate generation process so that, for every discrete-valued attribute, it generates candidate preconditions of the form "Attribute ∈ S" for each non-empty proper subset S of the attribute's values.
```python
from itertools import chain, combinations

def power_set(values):
    """All non-empty proper subsets of a set of attribute values."""
    values = sorted(values)
    return [set(s) for s in chain.from_iterable(
        combinations(values, r) for r in range(1, len(values)))]

def learn_one_rule(dataset, attributes, evaluate):
    """LEARN-ONE-RULE extended with subset-valued preconditions.
    `attributes` maps each discrete attribute to its set of possible
    values; `evaluate` scores a rule on the dataset (assumed helper).
    A precondition (attr, S) is read as "attr's value is in S"."""
    rule = []
    best_score = evaluate(rule, dataset)
    while True:
        best_precondition, best_improvement = None, 0.0
        for attribute, values in attributes.items():
            # EDIT: instead of single-value tests "attribute = v",
            # generate one candidate per subset S: "attribute in S".
            for subset in power_set(values):
                improvement = evaluate(rule + [(attribute, subset)], dataset) - best_score
                if improvement > best_improvement:
                    best_precondition = (attribute, subset)
                    best_improvement = improvement
        if best_precondition is None:
            break
        rule.append(best_precondition)
        best_score += best_improvement
    return rule

def rule_covers(rule, instance):
    """EDIT: a subset precondition (attr, S) is satisfied when the
    instance's value for the attribute lies in the set S."""
    return all(instance[attr] in subset for attr, subset in rule)
```
1. **Candidate Generation:**
- For discrete-valued attributes, the algorithm generates all possible subsets of values using the
`power_set` function.
2. **Precondition Evaluation:**
- The `evaluate_rule` function is updated to consider rules of the form `Attribute ∈ {Value1, Value2,
...}`.
3. **Instance Updating:**
- After a rule is learned, the instances it covers are removed from the dataset before the next rule is learned, as in standard sequential covering.
4. **Subset Handling:**
- Conditions and rules are modified to include set notation for discrete-valued attributes where
applicable.
This refined algorithm now explores a hypothesis space that includes rules with constraints on subsets
of attribute values, allowing for more flexible and potentially more accurate rule generation.
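The size of the enlarged candidate space can be checked directly: a discrete attribute with k values contributes 2^k − 2 non-trivial subset preconditions (the empty set and the full set are excluded, since "attribute ∈ {all values}" is always true). The attribute values below are illustrative:

```python
from itertools import chain, combinations

def subset_preconditions(attribute, values):
    """All preconditions "attribute in S" for non-empty proper subsets S."""
    values = sorted(values)
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(1, len(values)))
    return [(attribute, set(s)) for s in subsets]

# A 3-valued attribute yields 2^3 - 2 = 6 candidate subset preconditions.
cands = subset_preconditions("nationality", {"Canadian", "Brazilian", "US"})
for attr, subset in cands:
    print(attr, "in", sorted(subset))
```

The exponential growth in candidates is the price of the richer hypothesis space; a beam search or evaluation-metric cutoff keeps the search tractable in practice.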
4. Consider a sequential covering algorithm such as CN2 and a simultaneous covering algorithm such as
ID3. Both algorithms are to be used to learn a target concept defined over instances represented by
conjunctions of n boolean attributes. If ID3 learns a balanced decision tree of depth d, it will contain 2^d −
1 distinct decision nodes, and therefore will have made 2^d − 1 distinct choices while constructing its
output hypothesis. How many rules will be formed if this tree is re-expressed as a disjunctive set of
rules? How many preconditions will each rule possess? How many distinct choices would a sequential
covering algorithm have to make to learn this same set of rules? Which system do you suspect would be
more prone to overfitting if both were given the same training data?
### Analysis of ID3 and CN2 in Terms of Rule Formation and Overfitting
The ID3 algorithm constructs a decision tree by recursively splitting the data based on the attribute that
provides the maximum information gain at each node. For a balanced decision tree of depth \(d\):
1. **Number of Decision Nodes:**
- A balanced decision tree of depth \(d\) will contain \(2^d - 1\) decision nodes.
2. **Number of Rules:**
- Each path from the root to a leaf node represents a rule. In a balanced tree of depth \(d\), there are
\(2^d\) leaf nodes, hence \(2^d\) distinct rules.
3. **Preconditions per Rule:**
- Each rule corresponds to a path from the root to a leaf, and thus contains \(d\) preconditions (one for
each level of the tree).
The CN2 algorithm, a sequential covering algorithm, learns one rule at a time. It attempts to cover as
many positive instances as possible with each rule before removing the covered instances and repeating
the process on the remaining data.
1. **Number of Choices:**
- To learn the same \(2^d\) rules, a sequential covering algorithm like CN2 must make choices
iteratively. Each choice involves selecting a precondition to add to the current rule being formed.
- If each rule has \(d\) preconditions, and there are \(2^d\) rules, CN2 will need to make choices to
form each of these rules.
2. **Number of Distinct Choices:**
- In the worst case, assuming no overlap in the conditions (which is an overestimate for practical
scenarios), CN2 might have to make \(d\) choices for each of the \(2^d\) rules. Therefore, the total
number of choices can be approximated as \(d \times 2^d\).
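These counts can be verified numerically for a small depth (a sanity check of the formulas, with d chosen arbitrarily):

```python
def id3_counts(d):
    """Decision nodes, leaf rules, and preconditions per rule for a
    balanced binary decision tree of depth d."""
    nodes = 2**d - 1          # internal decision nodes
    rules = 2**d              # one rule per root-to-leaf path
    preconds_per_rule = d     # one precondition per tree level
    return nodes, rules, preconds_per_rule

def cn2_choices(d):
    """Worst-case choice count for sequential covering: d precondition
    choices for each of the 2^d rules."""
    return d * 2**d

nodes, rules, k = id3_counts(3)
print(nodes, rules, k, cn2_choices(3))  # 7 8 3 24
```

Already at d = 3 the sequential coverer makes 24 independent choices against ID3's 7, and the gap d·2^d versus 2^d − 1 widens by roughly a factor of d as the tree deepens.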
### Comparison
- **ID3:**
- ID3 makes \(2^d - 1\) distinct choices to construct the decision tree. Each choice is made to maximize
information gain locally.
- When re-expressed as rules, ID3 produces \(2^d\) rules with \(d\) preconditions each.
- **CN2:**
- CN2 potentially makes \(d \times 2^d\) distinct choices, as it forms each rule sequentially and each
rule has \(d\) preconditions.
Given that CN2 potentially makes more distinct choices and constructs rules sequentially, it might be
more prone to overfitting. Overfitting occurs when a model is excessively complex and captures noise in
the training data as if it were a true pattern.
- **Reasoning:**
- **ID3** might be less prone to overfitting because it globally considers the best attribute splits at
each node based on information gain, constructing a more structured and balanced model.
- **CN2** constructs rules sequentially and could overfit by creating overly specific rules to cover
exceptions or noise in the data.
### Conclusion
- **ID3** is likely less prone to overfitting due to its global approach to splitting nodes based on
information gain.
- **CN2** is more prone to overfitting because it constructs rules sequentially and may create very
specific rules that fit the noise in the training data.
5. Apply inverse resolution to the clauses C = R(B, x) ∨ P(x, A) and C1 = S(B, y) ∨ R(z, x). Give at least four
possible results for C2. Here A and B are constants, x and y are variables.
Inverse resolution derives a missing parent clause by inverting a resolution step. Given the resolvent \( C = R(B, x) \lor P(x, A) \) and one parent \( C_1 = S(B, y) \lor R(z, x) \), we must find clauses \( C_2 \) such that resolving \( C_1 \) with \( C_2 \) yields \( C \). Since \( S(B, y) \) does not appear in \( C \), the resolution step must have been on the \( S \) literal, with \( \theta_1 = \{z/B\} \) mapping the leftover literal \( R(z, x) \) onto the \( R(B, x) \) in \( C \). Different choices of the (inverse) substitution for \( C_2 \) then give different results:
1. **Result 1:** \( C_2 = P(x, A) \lor \neg S(B, y) \)
   - The simplest choice: resolving it with \( C_1 \) on \( S \) gives \( R(z, x) \lor P(x, A) \), which equals \( C \) under \( \{z/B\} \).
2. **Result 2:** \( C_2 = P(x, A) \lor \neg S(B, x) \)
   - The \( S \) literals are unified by binding \( y \) to \( x \); the resolvent is again \( R(z, x) \lor P(x, A) \).
3. **Result 3:** \( C_2 = P(x, A) \lor \neg S(w, y) \)
   - The constant \( B \) in the \( S \) literal is generalized to a new variable \( w \), with the substitution \( \{w/B\} \) restoring it during resolution.
4. **Result 4:** \( C_2 = R(B, x) \lor P(x, A) \lor \neg S(B, y) \)
   - \( C_2 \) may also retain a copy of \( R(B, x) \); after resolution the duplicated \( R \) literals factor together.
In each of these results, the key steps are to identify the literal resolved upon (here \( S \)) and then to choose a substitution; the more constants are replaced by variables, the more general the resulting \( C_2 \).
6. Consider the bottom-most inverse resolution step, and derive at least two different outcomes that could
result given different choices for the substitutions θ1 and θ2. Derive a result for the inverse resolution
step if the clause Father(Tom, Bob) is used in place of Father(Shannon, Tom).
This question refers to the inverse-resolution trace for the GrandChild example, where the resolvent at the
bottom of the tree is \( C = GrandChild(Bob, Shannon) \) and one parent is \( C_1 = Father(Shannon, Tom) \).
Inverse resolution must construct the other parent \( C_2 \) such that resolving \( C_1 \) with \( C_2 \)
under substitutions \( \theta_1 \) and \( \theta_2 \) reproduces \( C \).
### Two Outcomes for Different Substitutions
#### Outcome 1:
- With the empty inverse substitution, \( C_2 \) simply adds the negated parent literal:
  \( C_2 = GrandChild(Bob, Shannon) \lor \neg Father(Shannon, Tom) \)
#### Outcome 2:
- With the inverse substitution \( \{Shannon/x\} \), the constant Shannon is generalized to a variable:
  \( C_2 = GrandChild(Bob, x) \lor \neg Father(x, Tom) \),
  i.e. the Horn clause \( GrandChild(Bob, x) \leftarrow Father(x, Tom) \).
### Using Father(Tom, Bob) in Place of Father(Shannon, Tom)
If the clause \( Father(Tom, Bob) \) is used as \( C_1 \) instead, the same step yields, for example:
- \( C_2 = GrandChild(Bob, Shannon) \lor \neg Father(Tom, Bob) \), or, generalizing Bob with \( \{Bob/x\} \):
- \( C_2 = GrandChild(x, Shannon) \lor \neg Father(Tom, x) \), i.e. \( GrandChild(x, Shannon) \leftarrow Father(Tom, x) \).
These examples show how different substitutions lead to different, more or less general, outcomes of the
inverse resolution process.
7. Consider the problem of learning the target concept "pairs of people who live in the same house,"
denoted by the predicate Housemates(x, y). Below is a positive example of the concept.
Housemates(Joe, Sue), Person(Joe), Person(Sue), Sex(Joe, Male), Sex(Sue, Female), HairColor(Joe, Black),
HairColor(Sue, Brown), Height(Joe, Short), Height(Sue, Short), Nationality(Joe, US), Nationality(Sue, US),
Mother(Joe, Mary), Mother(Sue, Mary), Age(Joe, 8), Age(Sue, 6). The following domain theory is helpful
for acquiring the Housemates concept: Housemates(x, y) ← InSameFamily(x, y); Housemates(x, y) ←
FraternityBrothers(x, y); InSameFamily(x, y) ← Married(x, y); InSameFamily(x, y) ← Youngster(x) ∧
Youngster(y) ∧ SameMother(x, y); SameMother(x, y) ← Mother(x, z) ∧ Mother(y, z); Youngster(x) ←
Age(x, a) ∧ LessThan(a, 10). Apply the PROLOG-EBG algorithm to the task of generalizing from the above
Instance, using the above domain theory. In particular, (a) Show a hand-trace of the PROLOG-EBG
algorithm applied to this problem; that is, show the explanation generated for the training instance,
show the result of regressing the target concept through this explanation, and show the resulting Horn
clause rule. (b) Suppose that the target concept is "people who live with Joe" instead of "pairs of people
who live together." Write down this target concept in terms of the above formalism. Assuming the same
training instance and domain theory as before, what Horn clause rule will PROLOG-EBG produce for this
new target Concept?
To apply the PROLOG-EBG algorithm to the task of generalizing the concept "pairs of people
who live in the same house" from the provided training instance and domain theory, we'll follow these
steps:
```
Housemates(Joe, Sue)
Person(Joe)
Person(Sue)
Sex(Joe, Male)
Sex(Sue, Female)
HairColor(Joe, Black)
HairColor(Sue, Brown)
Height(Joe, Short)
Height(Sue, Short)
Nationality(Joe, US)
Nationality(Sue, US)
Mother(Joe, Mary)
Mother(Sue, Mary)
Age(Joe, 8)
Age(Sue, 6)
```
The explanation of the training instance chains the following domain-theory clauses:
1. Housemates(Joe, Sue) ← InSameFamily(Joe, Sue)
2. InSameFamily(Joe, Sue) ← Youngster(Joe) ∧ Youngster(Sue) ∧ SameMother(Joe, Sue)
3. SameMother(Joe, Sue) ← Mother(Joe, Mary) ∧ Mother(Sue, Mary)
4. Youngster(Joe) ← Age(Joe, 8) ∧ LessThan(8, 10), and Youngster(Sue) ← Age(Sue, 6) ∧ LessThan(6, 10)
We regress the target concept through this explanation, replacing the constants introduced only by the
training instance (Joe, Sue, Mary, 8, 6) with variables while keeping the structure of the proof. The
resulting Horn clause rule is:
```
Housemates(x, y) ← Age(x, a) ∧ LessThan(a, 10) ∧ Age(y, b) ∧ LessThan(b, 10) ∧ Mother(x, z) ∧ Mother(y, z)
```
### (b) Target Concept: "People who live with Joe"
The target concept "people who live with Joe" can be expressed in the same formalism by fixing one
argument of Housemates to the constant Joe:
```
LivesWithJoe(x) ← Housemates(Joe, x)
```
Given the same training instance and domain theory, the constant Joe is now part of the target concept,
so it is not generalized during regression; only Sue and the other instance-specific constants become
variables. The Horn clause rule PROLOG-EBG will produce for this new target concept is:
```
LivesWithJoe(x) ← Age(Joe, a) ∧ LessThan(a, 10) ∧ Age(x, b) ∧ LessThan(b, 10) ∧ Mother(Joe, z) ∧ Mother(x, z)
```
This rule states that any person \( x \) who is in the same family as Joe is considered to be living with Joe.
8. Compose the following Horn clauses: (i) First-Order Horn Clauses (6M) (ii) Basic terminology in Horn
clauses. (6M)
### (i) First-Order Horn Clauses
- If there exists a z such that z is a parent of both x and y, and x and y are not the same individual, then
x and y are siblings:
  Sibling(x, y) ← Parent(z, x) ∧ Parent(z, y) ∧ ¬Equal(x, y)
- If there exists a z such that x is the parent of z and z is the parent of y, then x is a grandparent of y:
  Grandparent(x, y) ← Parent(x, z) ∧ Parent(z, y)
- If there exists a z such that x and z are siblings, z is the parent of y, and x is male, then x is the uncle of
y:
  Uncle(x, y) ← Sibling(x, z) ∧ Parent(z, y) ∧ Male(x)
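These family clauses can be checked against a small fact base in Python. The facts below (Ann, Bob, Carl, Dave) are invented purely for illustration; each function body mirrors one Horn clause, with the existential variable z realized as a search over known people:

```python
# Facts: parent(P, C) means P is a parent of C (hypothetical family).
parents = {("Ann", "Bob"), ("Ann", "Carl"), ("Bob", "Dave")}
males = {"Bob", "Carl", "Dave"}
people = {p for pair in parents for p in pair}

def sibling(x, y):
    # Sibling(x, y) <- Parent(z, x) ^ Parent(z, y) ^ x != y
    return x != y and any((z, x) in parents and (z, y) in parents for z in people)

def grandparent(x, y):
    # Grandparent(x, y) <- Parent(x, z) ^ Parent(z, y)
    return any((x, z) in parents and (z, y) in parents for z in people)

def uncle(x, y):
    # Uncle(x, y) <- Sibling(x, z) ^ Parent(z, y) ^ Male(x)
    return x in males and any(sibling(x, z) and (z, y) in parents for z in people)

print(sibling("Bob", "Carl"))      # True: both are children of Ann
print(grandparent("Ann", "Dave"))  # True: Ann -> Bob -> Dave
print(uncle("Carl", "Dave"))       # True: Carl is Bob's male sibling
```

A Prolog engine would perform the same existential search by unification; the Python version just makes the quantifier over z explicit.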
### (ii) Basic Terminology in Horn Clauses
1. **Clause:**
- A clause is a disjunction of literals. In Horn clauses, at most one positive literal is allowed.
2. **Horn Clause:**
- A Horn clause is a clause that contains at most one positive literal. It is often represented in the form
\( H \leftarrow B_1, B_2, \ldots, B_n \), where H is the head (positive literal) and \( B_1, B_2, \ldots, B_n
\) are the body (negative literals or conjunction of literals).
3. **Positive Literal:**
- A positive literal is a predicate applied to terms without negation. It states a positive fact or
condition.
4. **Negative Literal:**
- A negative literal is a predicate applied to terms with negation. It states a negative fact or condition.
5. **Head:**
- The head of a Horn clause is the positive literal on the left side of the arrow (←).
6. **Body:**
- The body of a Horn clause consists of the negative literals or conjunction of literals on the right side
of the arrow (←).
These terms are fundamental to understanding and working with Horn clauses, which are widely used in
logic programming and knowledge representation.
10. Consider again the search trace of FOCL. Suppose that the hypothesis selected at the first level in the
search is changed to Cup ← HasHandle. Describe the second-level candidate hypotheses that will be
generated by FOCL as successors to this hypothesis. You need only include those hypotheses generated
by FOCL's second search operator, which uses its domain theory. Don't forget to post-prune the
sufficient conditions.
In the FOCL algorithm, second-level candidate hypotheses are generated from the current hypothesis by
two search operators: a purely inductive operator that adds single literals, and a second operator that
uses the domain theory. Only the second operator is required here.
FOCL's second search operator selects a domain-theory clause whose head matches the target concept,
operationalizes its body (repeatedly rewriting non-operational predicates using the domain theory until
only operational literals remain), adds the resulting conjunction as preconditions, and then post-prunes
the added literals. Using the Cup domain theory from the standard FOCL example:
1. Cup ← Stable ∧ Liftable ∧ OpenVessel
2. Stable ← BottomIsFlat
3. Liftable ← Graspable ∧ Light, with Graspable ← HasHandle
4. OpenVessel ← HasConcavity ∧ ConcavityPointsUp
Operationalizing the body of clause 1 yields the sufficient condition BottomIsFlat ∧ HasHandle ∧ Light ∧
HasConcavity ∧ ConcavityPointsUp. Adding it to the current hypothesis Cup ← HasHandle gives the
second-level candidate:
- Cup ← HasHandle ∧ BottomIsFlat ∧ Light ∧ HasConcavity ∧ ConcavityPointsUp
(the duplicated HasHandle literal appears only once). Post-pruning then removes any of the newly added
literals whose deletion does not hurt performance on the training data; for example, if BottomIsFlat does
not improve classification accuracy, the pruned candidate is:
- Cup ← HasHandle ∧ Light ∧ HasConcavity ∧ ConcavityPointsUp
The hypotheses that survive post-pruning are the second-level candidates generated by the domain-theory
operator.
11. Consider playing Tic-Tac-Toe against an opponent who plays randomly. In particular, assume the
opponent chooses with uniform probability any open space, unless there is a forced move (in which case
it makes the obvious correct move). (a) Formulate the problem of learning an optimal Tic-Tac-Toe
strategy in this case as a Q-learning task. What are the states, transitions, and rewards in this non-
deterministic Markov decision process? (b) Will your program succeed if the opponent plays optimally
rather than randomly?
### (a) Q-Learning Formulation
#### States:
The states represent the current configurations of the Tic-Tac-Toe board. Each state corresponds to a
different arrangement of Xs, Os, and empty spaces on the board.
#### Transitions:
Transitions occur when the agent (player) makes a move. The agent selects an action (placing its symbol
in a specific position), which results in a transition to a new state (the updated board configuration).
#### Actions:
Actions represent the possible moves the agent can make. In Tic-Tac-Toe, actions involve placing an X or
O symbol in one of the empty spaces on the board.
#### Rewards:
- **Winning (+1):** If the agent wins the game by placing three of its symbols in a row, column, or
diagonal, it receives a reward of +1.
- **Losing (-1):** If the opponent wins the game, the agent receives a reward of -1.
- **Draw (0):** If the game ends in a draw (tie), both players receive a reward of 0.
#### Non-Determinism:
- The environment is non-deterministic because the opponent plays randomly unless there is a forced
move.
- The state transitions depend on the agent's actions and the opponent's moves.
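The formulation above pairs with the standard Q-learning backup. The snippet below is a sketch of a single update; the board encoding (a 9-character string), the cell index as action, and the parameter values alpha and gamma are illustrative assumptions, not a fixed convention:

```python
# Q-table maps (state, action) to an estimated value. A state is the
# board as a 9-character string of 'X', 'O', ' '; an action is a cell
# index 0-8 (assumed encoding for this sketch).
Q = {}
alpha, gamma = 0.5, 0.9  # learning rate and discount factor (assumed)

def q(state, action):
    return Q.get((state, action), 0.0)

def update(state, action, reward, next_state, next_actions):
    """One Q-learning backup for the non-deterministic game:
    Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a'))."""
    best_next = max((q(next_state, a) for a in next_actions), default=0.0)
    Q[(state, action)] = ((1 - alpha) * q(state, action)
                          + alpha * (reward + gamma * best_next))

# Example: from the empty board the agent plays cell 4 (centre) and the
# episode later terminates with a win, credited here as reward +1.
update("         ", 4, +1.0, "    X    ", next_actions=[])
print(Q[("         ", 4)])  # 0.5*0 + 0.5*(1 + 0.9*0) = 0.5
```

The decayed learning rate alpha (rather than alpha = 1) is what makes the update sound in this non-deterministic setting: it averages over the opponent's random replies instead of overwriting the estimate with each sampled outcome.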
### (b) Success Against an Optimal Opponent:
If the opponent plays optimally rather than randomly, the agent can no longer expect to win: with optimal
play by both sides, Tic-Tac-Toe always ends in a draw, so the best achievable return against an optimal
opponent is the draw reward of 0.
The Q-learning program will still succeed in the sense that it converges to an optimal policy for this new
environment: an optimal opponent is a fixed, stationary policy, so given sufficient exploration the learned
Q-values will reflect the opponent's best responses, and the greedy policy will avoid losses and secure a
draw. What the program cannot do is recover the winning lines that exploited random play, because an
optimal opponent never allows them.
In summary, the agent's win rate drops to zero against an optimal opponent, but Q-learning still provides
a sound framework: the agent learns to play for the guaranteed draw rather than to lose.