Knowledge Representation and Inference
Various techniques have evolved that can be applied to a wide range of AI tasks; these will be the focus of this course. They are concerned with how we represent, manipulate and reason with knowledge in order to solve problems. Two themes run throughout: knowledge representation and search.
Knowledge Representation
Knowledge representation is crucial. One of the clearest results of artificial intelligence research so far is that solving even apparently simple problems requires lots of knowledge. Really understanding a single sentence requires extensive knowledge of both the language and the context. For example, the December 29th 2002 headline "It's President Kibaki'' can only be interpreted reasonably if you know it appeared the day after Kenya's general elections. Really understanding a visual scene similarly requires knowledge of the kinds of objects in the scene. Solving problems in a particular domain generally requires knowledge of the objects in the domain and knowledge of how to reason in that domain; both types of knowledge must be represented.

Knowledge must be represented efficiently, and in a meaningful way. Efficiency is important, as it would be impossible (or at least impractical) to explicitly represent every fact that you might ever need. There are so many potentially useful facts, most of which you would never even think of. You have to be able to infer new facts from your existing knowledge as and when needed, and to capture general abstractions which represent general features of sets of objects in the world.

Knowledge must be meaningfully represented so that we know how it relates back to the real world. A knowledge representation scheme provides a mapping from features of the world to a formal language. (The formal language will capture only certain aspects of the world, which we believe are important to our problem; we may of course miss out crucial aspects and so fail to really solve the problem, like ignoring friction in a mechanics problem.) When we manipulate that formal language using a computer we want to be sure that we still have meaningful expressions, which can be mapped back to the real world. This is what we mean when we talk about the semantics of representation languages.
Search
Another crucial general technique required when writing AI programs is search. Often there is no direct way to find a solution to some problem. However, you do know how to generate possibilities. For example, in solving a puzzle you might know all the possible moves, but not the sequence that would lead to a solution. When working out how to get somewhere you might know all the roads/buses/trains, just not the best route to get you to your destination quickly. Developing good ways to search through these possibilities for a good solution is therefore vital. Brute force techniques, where you generate and try out every possible solution, may work, but are often very inefficient, as there are just too many possibilities to try. Heuristic techniques are often better: you only try the options which you think (based on your current best guess) are most likely to lead to a good solution.
Broad approaches to knowledge representation: Logic, Structured objects and Production systems
Predicate logic is the best-known knowledge representation language, but there are many other logics out there, such as default logics, temporal logics and modal logics. However, another approach is to abandon the constraints that the use of a logic imposes and use a less clean, but more flexible knowledge representation language. Two such "languages'' are structured objects and production systems.

The idea of structured objects is to represent knowledge as a collection of objects and relations, the most important relations being the subclass and instance relations. The subclass relation (as you might expect) says that one class is a subclass of another, while the instance relation says that some individual belongs to some class. We'll use them so that "X subclass Y'' means that X is a subclass of Y, not that X has a subclass Y. (Some books/approaches use the relation is-a to refer to the subclass relation.) So Mutua is an instance of the class representing ICS611 students (not all 1st year MSc. students are studying/attending this course now), while the class of ICS611 students is a subclass of the class of 1st year MSc. students. We can then define property inheritance, so that, by default, Mutua inherits all the typical attributes of ICS611 students, and ICS611 students inherit typical attributes of 1st year MSc. students. We'll go into this in much more detail later.

Production systems consist of a set of if-then rules and a working memory. The working memory represents the facts that are currently believed to hold, while the if-then rules typically state that if certain conditions hold (e.g., certain facts are in the working memory), then some action should be taken (e.g., other facts should be added or deleted). If the only action allowed is to add a fact to working memory then rules are essentially logical implications, but generally greater flexibility is allowed. Production rules capture (relatively) procedural knowledge in a simple, modular manner.
In the next few parts we will describe these different knowledge representation languages in more detail. We'll start with structured objects, as these are fairly easy to understand. Then we'll talk about logic, and then production rules. The discussion of production rules should naturally exploit what you have already learnt in the topic: problem solving using search.
Structured Objects
[Note: In this section the notation and terminology may not be the same as your textbooks, though the underlying ideas should be the same! Try to stick with the notation used here, but don't view it as THE correct one. Terms like "instance'', "subclass'' and the particular representation of frames (e.g., use of "*'') will vary across different texts.] We will discuss the following: Semantic Nets Frames
Semantic Nets
The simplest kind of structured object is the semantic net originally developed in the early 1960s to represent the meaning of English words. They are important both historically, and in introducing the basic ideas of class hierarchies and inheritance. A semantic net is really just a graph, where the nodes in the graph represent concepts, and the arcs (or links) represent binary relationships between concepts. The most important relations between concepts are subclass relations between classes and subclasses, and instance relations between particular objects and their parent class. However, any other relations are allowed, such as has-part, is-a, colour, etc. So, to represent some knowledge about animals (as AI people so often do) we might have the following network:
This network represents the facts that mammals and reptiles are animals, that mammals have heads, that an elephant is a large, grey mammal, that Clyde and Nellie are both elephants, and that Nellie likes apples. The subclass relations define a class hierarchy (in this case a very simple one). The subclass and instance relations may be used to derive new information which is not explicitly represented: we should be able to conclude that Clyde and Nellie both have a head, and are large and grey. They inherit information from their parent classes. Semantic networks normally allow efficient inheritance-based inferences using special purpose algorithms.

Semantic nets are fine at representing relationships between two objects - but what if we want to represent a relation between three or more objects? Say we want to represent the fact that "John gives Mary the book''. This might be represented in logic as gives(John, Mary, book2), where book2 represents the particular book we are talking about. In semantic networks, however, we have to view the fact as representing a set of binary relationships between a "giving'' event and some objects.

When semantic networks became popular in the 1970s there was much discussion about what the nodes and relations really meant. People were using them in subtly different ways, which led to much confusion. For example, a node such as elephant might be used to represent the class of all elephants or just a typical elephant. Saying that an elephant has-part head could mean that every elephant has some particular head, that there exists some elephant that has a head, or (more reasonably in this case) that each elephant has some object belonging to the class head. Depending on what interpretation you choose for your nodes and links, different inferences are valid. For example, if it's just a typical elephant, then Clyde may have properties different from general elephant properties (such as being pink and not grey).
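The network and its inheritance-based inference can be sketched in a few lines of code. This is a minimal illustration, not a real semantic net system: facts are stored as (node, relation, node) triples matching the animal example above, and a lookup follows instance/subclass links upward to inherit properties.

```python
# A minimal semantic net: facts are (node, relation, node) triples.
triples = [
    ("mammal", "subclass", "animal"),
    ("reptile", "subclass", "animal"),
    ("mammal", "has_part", "head"),
    ("elephant", "subclass", "mammal"),
    ("elephant", "colour", "grey"),
    ("elephant", "size", "large"),
    ("Clyde", "instance", "elephant"),
    ("Nellie", "instance", "elephant"),
    ("Nellie", "likes", "apples"),
]

def value(node, relation):
    """Look up a relation on a node, following instance/subclass
    links upward so properties are inherited from parent classes."""
    for (s, r, o) in triples:
        if s == node and r == relation:
            return o
    # no direct fact: climb to the parent class, if any, and retry
    for (s, r, o) in triples:
        if s == node and r in ("instance", "subclass"):
            return value(o, relation)
    return None

print(value("Clyde", "colour"))     # inherited from elephant: grey
print(value("Nellie", "has_part"))  # inherited via elephant -> mammal: head
```

Note how Clyde's colour and Nellie's head are not stored anywhere explicitly; they are derived from the parent classes, which is exactly the inheritance inference described above.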
The simplest way to interpret the class nodes is as denoting sets of objects. So, an elephant node denotes the set of all elephants. Nodes such as Clyde and Nellie denote individuals, so the instance relationship can be defined in terms of set membership (Nellie is a member of the set of all elephants), while the subclass relation can be defined in terms of a subset relation: the set of all elephants is a subset of the set of all mammals. Saying that elephants are grey means (in this simple model) that every individual in the set of elephants is grey (so Clyde can't be pink). If we interpret networks in this way we have the advantage of a clear, simple semantics, but the disadvantage of a certain lack of flexibility - maybe Clyde is pink!

In the debate about semantic nets, people were also concerned about their representational adequacy (i.e., what sort of facts they were capable of representing). Things that are easy to represent in logic (such as "every dog in town has bitten the constable'') are hard to represent in nets (at least, in a way that has a clear and well-defined interpretation). Techniques were developed to allow such things to be represented, which involved partitioning the net into sections and introducing a special "for all" relationship. These techniques didn't really catch on, so we won't go into them here, but they can be found in many AI textbooks.

To summarize, nets allow us to simply represent knowledge about an object that can be expressed as binary relations. Subclass and instance relations allow us to use inheritance to infer new facts/relations from the explicitly represented ones. However, early nets didn't have a very clear semantics (i.e., it wasn't clear what the nodes and
links really meant). It was difficult to use nets in a fully consistent and meaningful manner, and still use them to represent what you wanted to represent. Techniques evolved to get round this, but they are quite complex, and seem to partly remove the attractive simplicity of the initial idea.
Frames
Frames are a variant of nets, and are one of the most popular ways of representing non-procedural knowledge in an expert system. All the information relevant to a particular concept is stored in a single complex entity, called a frame. Superficially, frames look pretty much like record data structures. However, frames, at the very least, support inheritance. They are often used to capture knowledge about typical objects or events, such as a typical bird, or a typical restaurant meal. We could represent some knowledge about elephants in frames as follows:
Mammal
    subclass:       Animal
    warm_blooded:   yes

Elephant
    subclass:       Mammal
    * colour:       grey
    * size:         large

Clyde
    instance:       Elephant
    colour:         pink
    owner:          Fred

Nellie
    instance:       Elephant
    size:           small
A particular frame (such as Elephant) has a number of attributes or slots, such as colour and size, and these slots may be filled with particular values, such as grey. We have used a "*'' to indicate those attributes that are only true of a typical member of the class, and not necessarily every member. Most frame systems will let you distinguish between typical attribute values and definite values that must be true. [Rich & Knight in fact also distinguish between attribute values that are true of the class itself, such as the number of members of that class, and typical attribute values of members.] In the above frame system we would be able to infer that Nellie is small, grey and warm blooded, while Clyde is large, pink, warm blooded and owned by Fred. Objects and classes inherit the properties of their parent classes UNLESS they have an individual property value that conflicts with the inherited one.

Inheritance is simple where each object/class has a single parent class, and where slots take single values. If slots may take more than one value it is less clear whether to block inheritance when you have more specific information. For example, if you know that a mammal has_part head, and that an elephant has_part trunk, you may still want to infer that an elephant has a head. It is therefore useful to label slots according to whether they take single values or multiple values. If objects/classes have several parent classes (e.g., Clyde is both an elephant and a circus-animal), then you may have to decide which parent to inherit from (maybe elephants are by default wild, but circus animals are by default tame). There are various mechanisms for making this choice, based on choosing the most specific parent class to inherit from.
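The default-with-override behaviour described above can be sketched directly. In this illustration (a simplification, not a full frame system) each frame is a dictionary, a leading "*" on a key marks a default value, and lookup prefers the object's own value before inheriting from ancestors:

```python
# Frames as dicts. Keys starting with "*" are default (typical) values;
# plain keys are definite values for that object/class.
frames = {
    "Mammal":   {"subclass": "Animal", "warm_blooded": "yes"},
    "Elephant": {"subclass": "Mammal", "*colour": "grey", "*size": "large"},
    "Clyde":    {"instance": "Elephant", "colour": "pink", "owner": "Fred"},
    "Nellie":   {"instance": "Elephant", "size": "small"},
}

def fetch(name, slot):
    """Return a slot value, preferring the object's own value and
    otherwise inheriting a definite or default value from ancestors."""
    frame = frames[name]
    if slot in frame:
        return frame[slot]
    if "*" + slot in frame:
        return frame["*" + slot]
    parent = frame.get("instance") or frame.get("subclass")
    return fetch(parent, slot) if parent else None

print(fetch("Nellie", "colour"))       # grey (default from Elephant)
print(fetch("Clyde", "colour"))        # pink (own value overrides the default)
print(fetch("Clyde", "warm_blooded"))  # yes (inherited from Mammal)
```

Clyde's own colour blocks the inherited default, which is exactly the UNLESS clause in the inheritance rule above.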
In general, both slots and slot values may themselves be frames. Allowing slots to be frames means that we can specify various attributes of a slot. We might want to say, for example, that the slot size must always take a single value of type size-set (where size-set is the set of all sizes). The slot owner may take multiple values of type person (Clyde could have more than one owner). We could specify this in the frames:
Size
    instance:       Slot
    single-valued:  yes
    range:          Size-Set

Owner
    instance:       Slot
    single-valued:  no
    range:          Person
The attribute value Fred (and even large and grey etc) could be represented as a frame, e.g.,:
Fred
    instance:       Person
    occupation:     Elephant-breeder
One final useful feature of frame systems is the ability to attach procedures to slots. So, if we don't know the value of a slot, but know how it could be calculated, we can attach a procedure to be used, if needed, to compute the value of that slot. Maybe we have slots representing the length and width of an object and sometimes need to know the object's area: we would write a (simple) procedure to calculate it, and put that in place of the slot's value. Such mechanisms of procedural attachment are useful, but perhaps should not be overused, or else our nice frame system would consist mainly of lots of procedures interacting in an unpredictable fashion.

Frame systems, in all their full glory, are pretty complex and sophisticated things. More details are available in AI textbooks. The main idea to get clear is the notion of inheritance and default values. Most of the other features were developed to support inheritance reasoning in a flexible but principled manner. As we saw for nets, it is easy to get confused about what slots and objects really mean. In frame systems we partly get round this by distinguishing between default and definite values, and by allowing users to make slots "first class citizens'', giving a slot particular properties by writing a frame-based definition.
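The area example above can be sketched as follows. This is a toy illustration of procedural attachment: a slot holds a procedure instead of a value, and the accessor runs it on demand.

```python
# Procedural attachment: a slot's "value" can be a procedure that is
# run on demand when the slot is read (an if-needed demon).
box = {
    "length": 4,
    "width": 3,
    "area": lambda f: f["length"] * f["width"],  # computed when needed
}

def get(frame, slot):
    v = frame[slot]
    return v(frame) if callable(v) else v

print(get(box, "area"))  # 12, computed by the attached procedure
```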
Predicate Logic
The most important knowledge representation language is arguably predicate logic (or strictly, first order predicate logic - there are lots of other logics out there to distinguish between). Predicate logic allows us to represent fairly complex facts about the world, and to derive new facts in a way that guarantees that, if the initial facts were true then so are the conclusions. It is a well understood formal language, with well-defined syntax, semantics and rules of inference. Here we will discuss the following: Review of Propositional Logic Predicate Logic: Syntax Predicate Logic: Semantics Proving Things in Predicate Logic Representing Things in Predicate Logic Logic and Frames
In order to infer new facts in a logic we need to apply inference rules. The semantics of the logic will define which inference rules are universally valid. One useful inference rule is the following (called modus ponens), though many others are possible:

    a, a → b ⊢ b

This rule just says that if a → b is true, and a is true, then b is necessarily true. We could prove that this rule is valid using truth tables.
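The truth-table proof mentioned above is small enough to mechanize. This snippet checks that in every row of the truth table where both a and a → b are true, b is also true, which is what it means for modus ponens to be valid:

```python
from itertools import product

def implies(p, q):
    # material implication: p -> q is false only when p is true and q is false
    return (not p) or q

# valid iff b holds in every row where both premises (a and a -> b) hold
valid = all(b for a, b in product([False, True], repeat=2)
            if a and implies(a, b))
print(valid)  # True: true premises can never give a false conclusion
```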
Terms in predicate logic include constant symbols such as "Alison''; variable symbols such as "X'' (for consistency with Prolog we'll use capital letters to denote variables); and function expressions such as "father(Alison)''. Function expressions consist of a functor followed by a number of arguments, which can be arbitrary terms. This should all seem familiar from our description of Prolog syntax. However, although Prolog is based on predicate logic, the way we represent things is slightly different, so the two should not be confused. Atomic sentences in predicate logic include the following:

    friends(Alison, Richard)
    friends(father(Fred), father(Joe))
    likes(X, Richard)
Sentences in predicate logic are constructed (much as in propositional logic) by combining atomic sentences with logical connectives, so the following are all sentences in predicate calculus:

    friends(Alison, Richard) → likes(Alison, Richard)
    likes(Alison, Richard) ∨ likes(Alison, Waffles)
    ((likes(Alison, Richard) ∨ likes(Alison, Waffles)) ∧ ¬likes(Alison, Waffles)) → likes(Alison, Richard)
Sentences can also be formed using quantifiers to indicate how any variables in the sentence are to be treated. The two quantifiers in predicate logic are ∀ (for all) and ∃ (there exists), so the following are valid sentences:
    ∃X (bird(X) ∧ ¬flies(X))   i.e., there exists some bird that doesn't fly.
    ∀X (person(X) → ∃Y loves(X, Y))   i.e., every person has something that they love.
A sentence should have all its variables quantified. So strictly, an expression like "∀X loves(X, Y)'', though a well formed formula of predicate logic, is not a sentence. Formulae with all their variables quantified are also called closed formulae.
This only gives a flavour of how we can give a semantics to expressions in predicate logic. The details are best left to logicians. The important thing is that everything is very precisely defined, so if we use predicate logic we should know exactly where we are and what inferences are valid.
    ∀X (macintosh(X) → ¬realcomputer(X))
        "No macintosh is a real computer'' or "If something is a macintosh then it's not a real computer''
    ∀X (glaswegian(X) → (supports(X, rangers) ∨ supports(X, celtic)))
        "All Glaswegians support either Celtic or Rangers''
    ∃X (small(X) ∧ on(X, table))
        "There is something small on the table''

Try out the following:

    "All elephants are grey''
    "Every apple is either green or yellow''
    "There is some student who is intelligent''
    ∃X (red(X) ∧ on(X, table) ∧ small(X))
    ∀X (grapes(X) → tasty(X))
[Note: When asked to translate English statements into predicate logic you should NOT use set expressions. The following expression is wrong: ∀X: X ∈ carrots: orange(X).]
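Possible answers to the practice sentences above, in LaTeX notation (one reasonable reading each; other formulations are equally acceptable):

```latex
\forall X\,(elephant(X) \rightarrow grey(X))
    % "All elephants are grey"
\forall X\,(apple(X) \rightarrow (green(X) \lor yellow(X)))
    % "Every apple is either green or yellow"
\exists X\,(student(X) \land intelligent(X))
    % "There is some student who is intelligent"
% And reading the two symbolic exercises back into English:
% \exists X\,(red(X) \land on(X, table) \land small(X)):
%     "There is something small and red on the table"
% \forall X\,(grapes(X) \rightarrow tasty(X)):
%     "All grapes are tasty"
```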
Rule-Based Systems
Instead of representing knowledge in a relatively declarative, static way (as a bunch of things that are true), rule-based systems represent knowledge in terms of a bunch of rules that tell you what you should do or what you could conclude in different situations. A rule-based system consists of a bunch of IF-THEN rules, a bunch of facts, and some interpreter controlling the application of the rules, given the facts.

There are two broad kinds of rule system: forward chaining systems and backward chaining systems. In a forward chaining system you start with the initial facts, and keep using the rules to draw new conclusions (or take certain actions) given those facts. In a backward chaining system you start with some hypothesis (or goal) you are trying to prove, and keep looking for rules that would allow you to conclude that hypothesis, perhaps setting new subgoals to prove as you go. Forward chaining systems are primarily data-driven, while backward chaining systems are goal-driven. We'll look at both, and at when each might be useful. [Note: Previously the term production system was used to refer to rule-based systems, and some books will use this term. However, it is a non-intuitive term so we will avoid it.]

Here we will look at: Forward Chaining Systems; Backward Chaining Systems; Forwards vs Backwards Reasoning; Uncertainty in Rules.
Here we use capital letters to indicate variables. In other representations variables may be indicated in different ways, such as by a ? or a ^ (e.g., ?person, ^person). Let us assume that initially we have a working memory with the following elements:

    (month February) (happy Alison) (researching Alison)

Our system will first go through all the rules, checking which ones apply given the current working memory. Rules 2 and 3 both apply, so the system has to choose between them using its conflict resolution strategies. Let us say that rule 2 is chosen, so (lecturing Alison) is added to the working memory, which is now:

    (lecturing Alison) (month February) (happy Alison) (researching Alison)

Now the cycle begins again. This time rules 3 and 6 have their preconditions satisfied. Let's say rule 3 is chosen and fires, so (marking-practicals Alison) is added to the working memory. On the third cycle rule 1 fires, so, with X bound to Alison, (overworked Alison) is added to working memory, which is now:

    (overworked Alison) (marking-practicals Alison) (lecturing Alison) (month February) (happy Alison) (researching Alison)

Now rules 4 and 6 can apply. Suppose rule 4 fires, and (bad-mood Alison) is added to the working memory. In the next cycle rule 5 is chosen and fires, with (happy Alison) removed from the working memory. Finally, rule 6 will fire, and (researching Alison) will be removed from working memory, to leave:

    (bad-mood Alison) (overworked Alison) (marking-practicals Alison) (lecturing Alison) (month February)

The order in which rules fire may be crucial, especially when rules may result in items being deleted from working memory. (Systems which allow items to be deleted are known as nonmonotonic.) Now suppose we have the following further rule in the rule set:

    7. IF (happy X) THEN (gives-high-marks X)

If this rule fires BEFORE (happy Alison) is removed from working memory, then the system will conclude that Alison gives high marks.
However, if rule 5 fires first then rule 7 will no longer apply. Of course, if we fire rule 7 and then later remove its preconditions, it would be nice if its conclusions could then be automatically removed from working memory. Special systems called truth maintenance systems have been developed to allow this.

A number of conflict resolution strategies are typically used to decide which rule to fire. These include:

- Don't fire a rule twice on the same data. (We don't want to keep on adding (lecturing Alison) to working memory.)
- Fire rules on more recent working memory elements before older ones. This allows the system to follow through a single chain of reasoning, rather than keeping on drawing new conclusions from old data.
- Fire rules with more specific preconditions before ones with more general preconditions. This allows us to deal with non-standard cases. If, for example, we have a rule "IF (bird X) THEN ADD (flies X)'' and another rule "IF (bird X) AND (penguin X) THEN ADD (swims X)'', and a penguin called Tweety, then we would fire the second rule first and start to draw conclusions from the fact that Tweety swims.
These strategies may help in getting reasonable behaviour from a forward chaining system, but the most important thing is how we write the rules. They should be carefully constructed, with the preconditions specifying as precisely as possible when different rules should fire. Otherwise we will have little idea or control of what will happen. Sometimes special working memory elements are used to help to control the behaviour of the system. For example,
we might decide that there are certain basic stages of processing in doing some task, and certain rules should only be fired at a given stage - we could have a special working memory element (stage 1) and add (stage 1) to the preconditions of all the relevant rules, removing the working memory element when that stage was complete.
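The forward chaining cycle can be sketched as follows. The rule bodies here are reconstructed from the Alison trace above and simplified by dropping the variable argument, so they are assumptions rather than the original rule set; the point is the match-resolve-fire loop and the "don't fire a rule twice" strategy:

```python
# A small forward chainer. Rules are (preconditions, add-list, delete-list);
# the numbering in the comments follows the trace in the text.
rules = [
    (["lecturing", "marking-practicals"], ["overworked"], []),  # rule 1
    (["month-february"], ["lecturing"], []),                    # rule 2
    (["month-february"], ["marking-practicals"], []),           # rule 3
    (["overworked"], ["bad-mood"], []),                         # rule 4
    (["bad-mood"], [], ["happy"]),                              # rule 5
    (["overworked"], [], ["researching"]),                      # rule 6
]

def forward_chain(memory):
    memory = set(memory)
    fired = set()
    while True:
        for i, (pre, add, delete) in enumerate(rules):
            if i not in fired and all(p in memory for p in pre):
                fired.add(i)        # conflict resolution: never refire a rule
                memory |= set(add)
                memory -= set(delete)
                break               # restart the match cycle
        else:
            return memory           # no rule applies: stop

wm = forward_chain(["month-february", "happy", "researching"])
print(sorted(wm))
```

Running it reproduces the final working memory of the trace: happy and researching have been deleted, and overworked and bad-mood have been concluded.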
First we check whether the goal state is in the initial facts. As it isn't there, we try matching it against the conclusions of the rules. It matches rules 4 and 5. Let us assume that rule 4 is chosen first: it will try to prove (overworked Alison). Rule 1 can be used, and the system will try to prove (lecturing Alison) and (marking-practicals Alison). Trying to prove the first goal, it will match rule 2 and try to prove (month February). This is in the set of initial facts. We still have to prove (marking-practicals Alison). Rule 3 can be used, and we have proved the original goal (bad-mood Alison).

One way of implementing this basic mechanism is to use a stack of goals still to satisfy. You repeatedly pop a goal off the stack and try to prove it. If it's in the set of initial facts then it's proved. If it matches a rule which has a set of preconditions, then the goals in the precondition are pushed onto the stack. Of course, this doesn't tell us what to do when there are several rules which may be used to prove a goal. If we were using Prolog to implement this kind of algorithm we might rely on its backtracking mechanism: it'll try one rule, and if that results in failure it will go back and try the other. However, if we use a programming language without a built-in search procedure we need to decide explicitly what to do. One good approach is to use an agenda, where each item on the agenda represents one alternative path in the search for a solution. The system should try `expanding' each item on the agenda, systematically trying all possibilities until it finds a solution (or fails to). The particular method used for selecting items off the agenda determines the search strategy - in other words, it determines which options you try, and in what order, when solving your problem. We'll go into this in much more detail in the section on search.
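The goal-directed mechanism just described can be sketched recursively (recursion plays the role of the goal stack, and `any` plays the role of backtracking over alternative rules). As before, the rule bodies are reconstructed assumptions based on the Alison example, with the variable argument dropped for brevity:

```python
# A minimal backward chainer: to prove a goal, either find it in the
# facts, or find a rule concluding it and prove its preconditions.
rules = [
    (["lecturing", "marking-practicals"], "overworked"),  # rule 1
    (["month-february"], "lecturing"),                    # rule 2
    (["month-february"], "marking-practicals"),           # rule 3
    (["overworked"], "bad-mood"),                         # rule 4
]

def prove(goal, facts):
    if goal in facts:
        return True
    # try every rule whose conclusion matches the goal (backtracking)
    return any(all(prove(p, facts) for p in pre)
               for pre, conclusion in rules if conclusion == goal)

print(prove("bad-mood", {"month-february"}))  # True
```

Starting from only the fact month-february, the chainer works backwards through rules 4, 1, 2 and 3, mirroring the proof traced in the text.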
Uncertainty in Rules
So far, we have assumed that if the preconditions of a rule hold, then the conclusion will certainly hold. In fact, most of our rules have looked pretty much like logical implications, and the ideas of forward and backward reasoning also apply to logic-based approaches to knowledge representation and inference. Of course, in practice you rarely conclude things with absolute certainty. Usually we want to say things like "If Alison is tired then there's quite a good chance that she'll be in a bad mood''. To allow for this sort of reasoning in rule-based systems we often add certainty values to rules, and attach certainties to any new conclusions. We might conclude that Alison is probably in a bad mood (maybe with certainty 0.6). The approaches used are generally loosely based on probability theory, but are much less rigorous, aiming just for a good guess rather than precise probabilities. We'll talk about this more in a later lecture.
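One common (MYCIN-style) sketch of how such certainties might combine: a rule's conclusion gets the minimum of the precondition certainties scaled by the rule's own certainty. The particular rule and numbers here are illustrative assumptions, not taken from any specific system:

```python
# MYCIN-style certainty propagation (a rough sketch, not rigorous
# probability): scale the weakest precondition by the rule certainty.
def conclude(rule_cf, *precondition_cfs):
    return rule_cf * min(precondition_cfs)

# IF (tired Alison) THEN (bad-mood Alison), with rule certainty 0.75,
# and suppose we believe (tired Alison) with certainty 0.8:
cf = conclude(0.75, 0.8)
print(round(cf, 2))  # 0.6: Alison is probably in a bad mood
```

With these (assumed) numbers the conclusion comes out at certainty 0.6, matching the figure quoted above.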
References:
[Rich & Knight] Rich, E. and Knight, K., Artificial Intelligence, McGraw-Hill.
[Russell & Norvig] Russell, S. and Norvig, P., Artificial Intelligence: A Modern Approach, Prentice Hall.