Knowledge Representation and Inference


Overview of AI Techniques

There are various techniques that have evolved that can be applied to a variety of AI tasks - these will be the focus of this course. These techniques are concerned with how we represent, manipulate and reason with knowledge in order to solve problems. Two of them are introduced below:

- Knowledge Representation
- Search

Knowledge Representation
Knowledge representation is crucial. One of the clearest results of artificial intelligence research so far is that solving even apparently simple problems requires lots of knowledge. Really understanding a single sentence requires extensive knowledge both of language and of the context. For example, the December 29th 2002 headline "It's President Kibaki" can only be interpreted reasonably if you know it appeared just after Kenya's general elections. Really understanding a visual scene similarly requires knowledge of the kinds of objects in the scene. Solving problems in a particular domain generally requires knowledge of the objects in the domain and knowledge of how to reason in that domain - both these types of knowledge must be represented.

Knowledge must be represented efficiently, and in a meaningful way. Efficiency is important, as it would be impossible (or at least impractical) to explicitly represent every fact that you might ever need. There are just so many potentially useful facts, most of which you would never even think of. You have to be able to infer new facts from your existing knowledge, as and when needed, and capture general abstractions which represent general features of sets of objects in the world.

Knowledge must be meaningfully represented so that we know how it relates back to the real world. A knowledge representation scheme provides a mapping from features of the world to a formal language. (The formal language will just capture certain aspects of the world, which we believe are important to our problem - we may of course miss out crucial aspects and so fail to really solve our problem, like ignoring friction in a mechanics problem.) When we manipulate that formal language using a computer we want to make sure that we still have meaningful expressions, which can be mapped back to the real world. This is what we mean when we talk about the semantics of representation languages.

Search
Another crucial general technique required when writing AI programs is search. Often there is no direct way to find a solution to some problem. However, you do know how to generate possibilities. For example, in solving a puzzle you might know all the possible moves, but not the sequence that would lead to a solution. When working out how to get somewhere you might know all the roads/buses/trains, just not the best route to get you to your destination quickly. Developing good ways to search through these possibilities for a good solution is therefore vital. Brute force techniques, where you generate and try out every possible solution may work, but are often very inefficient, as there are just too many possibilities to try. Heuristic techniques are often better, where you only try the options which you think (based on your current best guess) are most likely to lead to a good solution.

These are edited and compiled notes. Source:

http://www.cee.hw.ac.uk/~alison/ai3notes/chapter2_4.html by Alison Cawsey

AI notes on Knowledge Representation and Inference

Intro' to Knowledge Representation and Inference


As mentioned in our lectures, one of the assumptions underlying work in Artificial Intelligence is that intelligent behaviour can be achieved through the manipulation of symbol structures (representing bits of knowledge). These symbols can be represented on any medium - in principle, we could develop a (very slow) intelligent machine made out of empty soda cans (plus something to move the soda cans around). However, computers provide the representational and reasoning powers whereby we might realistically expect to make progress towards automating intelligent behaviour.

So, the main question now is how we can represent knowledge as symbol structures and use that knowledge to intelligently solve problems. The next few parts will concentrate on how we represent knowledge, using particular knowledge representation languages. These are high-level representation formalisms, and can in principle be implemented using a whole range of programming languages. The remaining parts will concentrate more on how we solve problems, using general knowledge of problem solving and domain knowledge.

In AI, the crucial thing about knowledge representation languages is that they should support inference. We can't represent explicitly everything that the system might ever need to know - some things should be left implicit, to be deduced by the system as and when needed in problem solving. For example if we were representing facts about a particular MSc Info. Sys. student (say Mutua) we don't want to have to explicitly record the fact that Mutua is studying AI. All MSc. students are, so we should be able to deduce it. Similarly, you probably wouldn't explicitly represent the fact that on Sundays the University is closed, but the Nairobi Hilton is open. You can deduce these things from your general knowledge about the world.

Representing everything explicitly would be extremely wasteful of memory. For our MSc. example, we'd have 70 statements representing the fact that each student studies AI. Most of these facts would never be used. However, if we DO need to know if Mutua studies AI we want to be able to get at that information efficiently. We also would like to be able to make more complex inferences - maybe that Mutua should be attending a lecture at 5:30 pm on Tuesday Feb 18th, so he won't be able to have a lab session then. However, there is a tradeoff between inferential power (what we can infer) and inferential efficiency (how quickly we can infer it), so we may choose to have a language where simple inferences can be made quickly, though complex ones are not possible.

In general, a good knowledge representation language should have at least the following features:

- It should allow you to express the knowledge you wish to represent in the language. For example, suppose you want to represent the fact that "Richard knows how old he is". This turns out to be difficult to express in some languages.
- It should allow new knowledge to be inferred from a basic set of facts, as discussed above.
- It should be clear, and have a well defined syntax and semantics. We want to know what the allowable expressions are in the language, and what they mean. Otherwise we won't be sure if our inferences are correct, or what the results mean. For example, if we have a fact grey(elephant) we want to know whether it means all elephants are grey, some particular one is grey, or what.

Some of these features may be present in recent non-AI representation languages, such as deductive and object oriented databases. In fact, these systems have been influenced by early AI research on knowledge representation, and there is some promise of further cross-fertilization of ideas, to allow robust, multi-user knowledge/data bases with well defined semantics and flexible representation and inference capabilities.
However, at present the fields are still largely separate, and we will only be discussing basic AI approaches here.

Broad approaches to knowledge representation: Logic, Structured objects and Production systems

Broadly speaking, there are three main approaches to knowledge representation in AI. The most important is arguably the use of logic. A logic, almost by definition, has a well defined syntax and semantics, and is concerned with truth preserving inference. However, using logic to represent things has problems. On the one hand, it may not be very efficient - if we just want a very restricted class of inferences, we may not want the full power of a logic-based theorem prover, for example. On the other hand, representing some common-sense things in a logic can be very hard. For example in first order predicate logic we can't conclude that something is true one minute, and then later decide that it isn't true after all. If we did this it would lead to a contradiction, from which we could prove anything at all! We could decide to use more complex logics which allow this kind of reasoning - there are all sorts of logics out there, such as default logics, temporal logics and modal logics. However, another approach is to abandon the constraints that the use of a logic imposes and use a less clean, but more flexible knowledge representation language. Two such "languages" are structured objects and production systems.

The idea of structured objects is to represent knowledge as a collection of objects and relations, the most important relations being the subclass and instance relations. The subclass relation (as you might expect) says that one class is a subclass of another, while the instance relation says that some individual belongs to some class. We'll use them so that "X subclass Y" means that X is a subclass of Y, not that X has a subclass Y. (Some books/approaches use the relation is-a to refer to the subclass relation.) So Mutua is an instance of the class representing ICS611 students (not all 1st year MSc. students are studying/attending this course now), while the class of ICS611 students is a subclass of the class of 1st year MSc. students. We can then define property inheritance, so that, by default, Mutua inherits all the typical attributes of ICS611 students, and ICS611 students inherit typical attributes of 1st year MSc. students. We'll go into this in much more detail later.

Production systems consist of a set of if-then rules, and a working memory. The working memory represents the facts that are currently believed to hold, while the if-then rules typically state that if certain conditions hold (e.g., certain facts are in the working memory), then some action should be taken (e.g., other facts should be added or deleted). If the only action allowed is to add a fact to working memory then rules may be essentially logical implications, but generally greater flexibility is allowed. Production rules capture (relatively) procedural knowledge in a simple, modular manner.
In the next few parts we will describe these different knowledge representation languages in more detail. We'll start with structured objects, as these are fairly easy to understand. Then we'll talk about logic, and then production rules. The discussion of production rules should naturally exploit what you have already learnt in the topic: problem solving using search.
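The production-system idea (if-then rules acting on a working memory by adding or deleting facts) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tea-making facts and the `run_production_system` function are invented here, and the loop simply fires every applicable rule until working memory stops changing, which is not guaranteed to terminate for arbitrary rule sets.

```python
def run_production_system(rules, facts):
    """Fire rules whose conditions all hold until working memory stops changing."""
    memory = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, additions, deletions in rules:
            if conditions <= memory:  # every condition is in working memory
                new_memory = (memory | additions) - deletions
                if new_memory != memory:
                    memory = new_memory
                    changed = True
    return memory


# Each rule is (conditions, facts to add, facts to delete). All facts are hypothetical.
RULES = [
    ({"kettle_on", "water_cold"}, {"water_hot"}, {"water_cold"}),
    ({"water_hot", "have_teabag"}, {"tea_made"}, set()),
]

final_memory = run_production_system(RULES, {"kettle_on", "water_cold", "have_teabag"})
print(sorted(final_memory))
```

The first rule deletes a fact as well as adding one, which is the extra flexibility that distinguishes production rules from plain logical implications.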

Structured Objects
[Note: In this section the notation and terminology may not be the same as your textbooks, though the underlying ideas should be the same! Try to stick with the notation used here, but don't view it as THE correct one. Terms like "instance", "subclass" and the particular representation of frames (e.g., use of "*") will vary across different texts.]

We will discuss the following:

- Semantic Nets
- Frames

Semantic Nets
The simplest kind of structured object is the semantic net originally developed in the early 1960s to represent the meaning of English words. They are important both historically, and in introducing the basic ideas of class hierarchies and inheritance. A semantic net is really just a graph, where the nodes in the graph represent concepts, and the arcs (or links) represent binary relationships between concepts. The most important relations between concepts are subclass relations between classes and subclasses, and instance relations between particular objects and their parent class. However, any other relations are allowed, such as has-part, is-a, colour, etc. So, to represent some knowledge about animals (as AI people so often do) we might have the following network:


This network represents the fact that mammals and reptiles are animals, that mammals have heads, an elephant is a large grey mammal, Clyde and Nellie are both elephants, and that Nellie likes apples. The subclass relations define a class hierarchy (in this case very simple).

The subclass and instance relations may be used to derive new information which is not explicitly represented. We should be able to conclude that Clyde and Nellie both have a head, and are large and grey. They inherit information from their parent classes. Semantic networks normally allow efficient inheritance-based inferences using special purpose algorithms.

Semantic nets are fine at representing relationships between two objects - but what if we want to represent a relation between three or more objects? Say we want to represent the fact that "John gives Mary the book". This might be represented in logic as gives(John, Mary, book2) where book2 represents the particular book we are talking about. However, in semantic networks we have to view the fact as representing a set of binary relationships between a "giving" event and some objects.

When semantic networks became popular in the 1970s there was much discussion about what the nodes and relations really meant. People were using them in subtly different ways, which led to much confusion. For example, a node such as elephant might be used to represent the class of all elephants or just a typical elephant. Saying that an elephant has-part head could mean that every elephant has some particular head, that there exists some elephant that has a head, or (more reasonably in this case) that they all have some object belonging to the class head. Depending on what interpretation you choose for your nodes and links, different inferences are valid. For example, if it's just a typical elephant, then Clyde may have properties different from general elephant properties (such as being pink and not grey).
The simplest way to interpret the class nodes is as denoting sets of objects. So, an elephant node denotes the set of all elephants. Nodes such as Clyde and Nellie denote individuals. So the instance relationship can be defined in terms of set membership (Nellie is a member of the set of all elephants), while the subclass relation can be defined in terms of a subset relation - the set of all elephants is a subset of the set of all mammals. Saying that elephants are grey means (in the simple model) that every individual in the set of elephants is grey (so Clyde can't be pink). If we interpret networks in this way we have the advantage of a clear, simple semantics, but the disadvantage of a certain lack of flexibility - maybe Clyde is pink!

In the debate about semantic nets, people were also concerned about their representational adequacy (i.e., what sort of facts they were capable of representing). Things that are easy to represent in logic (such as "every dog in town has bitten the constable") are hard to represent in nets (at least, in a way that has a clear and well-defined interpretation). Techniques were developed to allow such things to be represented, which involved partitioning the net into sections, and introducing a special ∀ (for all) relationship. These techniques didn't really catch on, so we won't go into them here, but they can be found in many AI textbooks.

To summarize, nets allow us to simply represent knowledge about an object that can be expressed as binary relations. Subclass and instance relations allow us to use inheritance to infer new facts/relations from the explicitly represented ones. However, early nets didn't have a very clear semantics (i.e., it wasn't clear what the nodes and links really meant). It was difficult to use nets in a fully consistent and meaningful manner, and still use them to represent what you wanted to represent. Techniques evolved to get round this, but they are quite complex, and seem to partly remove the attractive simplicity of the initial idea.
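The animal network described in this section can be written down as a list of labelled arcs, and inheritance then becomes a walk up the instance and subclass links. The sketch below is a minimal illustration, not a standard algorithm from the notes: the triple encoding and the `lookup` function are assumptions made here, and it assumes each node has at most one parent.

```python
# Arcs of the network described in the text, as (node, relation, node) triples.
ARCS = [
    ("mammal", "subclass", "animal"),
    ("reptile", "subclass", "animal"),
    ("mammal", "has-part", "head"),
    ("elephant", "subclass", "mammal"),
    ("elephant", "size", "large"),
    ("elephant", "colour", "grey"),
    ("Clyde", "instance", "elephant"),
    ("Nellie", "instance", "elephant"),
    ("Nellie", "likes", "apples"),
]


def lookup(node, relation):
    """Return the values of `relation` for `node`, climbing instance/subclass arcs."""
    while node is not None:
        values = [t for (s, r, t) in ARCS if s == node and r == relation]
        if values:
            return values
        parents = [t for (s, r, t) in ARCS if s == node and r in ("instance", "subclass")]
        node = parents[0] if parents else None  # assumes a single parent per node
    return []


print(lookup("Clyde", "has-part"))  # ['head'], inherited from mammal
print(lookup("Nellie", "colour"))   # ['grey'], inherited from elephant
```

Neither fact about Clyde or Nellie is stored explicitly; both are derived from the class hierarchy, which is the point of the representation.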

Frames
Frames are a variant of nets that are one of the most popular ways of representing non-procedural knowledge in an expert system. In a frame, all the information relevant to a particular concept is stored in a single complex entity, called a frame. Superficially, frames look pretty much like record data structures. However frames, at the very least, support inheritance. They are often used to capture knowledge about typical objects or events, such as a typical bird, or a typical restaurant meal. We could represent some knowledge about elephants in frames as follows:

Mammal
    subclass: Animal
    warm_blooded: yes

Elephant
    subclass: Mammal
    * colour: grey
    * size: large

Clyde
    instance: Elephant
    colour: pink
    owner: Fred

Nellie:
    instance: Elephant
    size: small

A particular frame (such as Elephant) has a number of attributes or slots such as colour and size where these slots may be filled with particular values, such as grey. We have used a "*" to indicate those attributes that are only true of a typical member of the class, and not necessarily every member. Most frame systems will let you distinguish between typical attribute values and definite values that must be true. [Rich & Knight in fact distinguish between attribute values that are true of the class itself, such as the number of members of that class, and typical attribute values of members.]

In the above frame system we would be able to infer that Nellie is small, grey and warm blooded. Clyde is large, pink and warm blooded and owned by Fred. Objects and classes inherit the properties of their parent classes UNLESS they have an individual property value that conflicts with the inherited one.

Inheritance is simple where each object/class has a single parent class, and where slots take single values. If slots may take more than one value it is less clear whether to block inheritance when you have more specific information. For example, if you know that a mammal has_part head, and that an elephant has_part trunk you may still want to infer that an elephant has a head. It is therefore useful to label slots according to whether they take single values or multiple values.

If objects/classes have several parent classes (e.g., Clyde is both an elephant and a circus-animal), then you may have to decide which parent to inherit from (maybe elephants are by default wild, but circus animals are by default tame). There are various mechanisms for making this choice, based on choosing the most specific parent class to inherit from.
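Inheritance with overriding can be sketched with ordinary dictionaries. The following is an illustrative sketch only: the `FRAMES` table and `get_slot` function are names invented here, it assumes a single parent per frame and single-valued slots (the simple case described above), and it treats every class-level value as a default that a more specific frame may override.

```python
# Frames as dictionaries; "instance" and "subclass" name the parent frame.
FRAMES = {
    "Mammal":   {"subclass": "Animal", "warm_blooded": "yes"},
    "Elephant": {"subclass": "Mammal", "colour": "grey", "size": "large"},
    "Clyde":    {"instance": "Elephant", "colour": "pink"},
    "Nellie":   {"instance": "Elephant", "size": "small"},
}


def get_slot(frame_name, slot):
    """Look for `slot` on the frame itself, then on its parents, most specific first."""
    while frame_name in FRAMES:
        frame = FRAMES[frame_name]
        if slot in frame:
            return frame[slot]
        frame_name = frame.get("instance") or frame.get("subclass")
    return None  # no frame in the chain fills this slot


print(get_slot("Nellie", "colour"))       # grey (inherited default)
print(get_slot("Clyde", "colour"))        # pink (own value blocks inheritance)
print(get_slot("Clyde", "warm_blooded"))  # yes (inherited from Mammal)
```

Because the search stops at the first frame that fills the slot, Clyde's own colour hides the Elephant default, while slots he does not fill are still inherited.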


In general, both slots and slot values may themselves be frames. Allowing slots to be frames means that we can specify various attributes of a slot. We might want to say, for example, that the slot size always must take a single value of type size-set (where size-set is the set of all sizes). The slot owner may take multiple values of type person (Clyde could have more than one owner). We could specify this in the frames:

Size:
    instance: Slot
    single_valued: yes
    range: Size-set

Owner:
    instance: Slot
    single_valued: no
    range: Person

The attribute value Fred (and even large and grey etc) could be represented as a frame, e.g.,:

Fred:
    instance: Person
    occupation: Elephant-breeder

One final useful feature of frame systems is the ability to attach procedures to slots. So, if we don't know the value of a slot, but know how it could be calculated, we can attach a procedure to be used if needed, to compute the value of that slot. Maybe we have slots representing the length and width of an object and sometimes need to know the object's area - we would write a (simple) procedure to calculate it, and put that in place of the slot's value. Such mechanisms of procedural attachment are useful, but perhaps should not be overused, or else our nice frame system would consist mainly of just lots of procedures, interacting in an unpredictable fashion. Frame systems, in all their full glory, are pretty complex and sophisticated things. More details are available in AI textbooks. The main idea to get clear is the notion of inheritance and default values. Most of the other features are developed to support inheritance reasoning in a flexible but principled manner. As we saw for nets, it is easy to get confused about what slots and objects really mean. In frame systems we partly get round this by distinguishing between default and definite values, and by allowing users to make slots `first class citizens', giving the slot particular properties by writing a frame-based definition.
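Procedural attachment, as in the length/width/area example, can be illustrated by letting a slot hold a function instead of a stored value. This is a minimal sketch under assumptions made here: the `table_top` frame and the `get_value` helper are hypothetical, and real frame systems typically offer richer mechanisms than this.

```python
def get_value(frame, slot):
    """Return a slot's value, running the attached procedure if the slot holds one."""
    value = frame[slot]
    return value(frame) if callable(value) else value


# Hypothetical frame: `area` holds a procedure in place of a stored value.
table_top = {
    "length": 2.0,
    "width": 0.5,
    "area": lambda frame: get_value(frame, "length") * get_value(frame, "width"),
}

print(get_value(table_top, "area"))  # 1.0
```

The caller does not need to know whether a slot is stored or computed, which is convenient but is also why heavy use of attached procedures can make a frame system hard to predict.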

Predicate Logic
The most important knowledge representation language is arguably predicate logic (or strictly, first order predicate logic - there are lots of other logics out there to distinguish between). Predicate logic allows us to represent fairly complex facts about the world, and to derive new facts in a way that guarantees that, if the initial facts were true then so are the conclusions. It is a well understood formal language, with well-defined syntax, semantics and rules of inference.

Here we will discuss the following:

- Review of Propositional Logic
- Predicate Logic: Syntax
- Predicate Logic: Semantics
- Proving Things in Predicate Logic
- Representing Things in Predicate Logic
- Logic and Frames


Review of Propositional Logic


Predicate logic is a development of propositional logic, which should be familiar to you. In propositional logic a fact such as "Alison likes waffles" would be represented as a simple atomic proposition. Let's call it P. We can build up more complex expressions (sentences) by combining atomic propositions with the logical connectives ∧, ∨, ¬, → and ↔. So if we had the proposition Q representing the fact "Alison eats waffles" we could have the facts:

    P ∨ Q : "Alison likes waffles or Alison eats waffles"
    P ∧ Q : "Alison likes waffles and Alison eats waffles"
    ¬Q : "Alison doesn't eat waffles"
    P → Q : "If Alison likes waffles then Alison eats waffles"

In general, if X and Y are sentences in propositional logic, then so are X ∧ Y, X ∨ Y, ¬X, X → Y, and X ↔ Y. So the following are valid sentences in the logic:

    P ∨ ¬Q
    P ∧ (P → Q)
    (Q ∨ ¬R) → P

Propositions can be true or false in the world. An interpretation function assigns, to each proposition, a truth value (i.e., true or false). This interpretation function says what is true in the world. We can determine the truth value of arbitrary sentences using truth tables which define the truth values of sentences with logical connectives in terms of the truth values of their component sentences. The truth tables provide a simple semantics for expressions in propositional logic. As sentences can only be true or false, truth tables are very simple, for example:

    X    Y    X ∧ Y
    T    T      T
    T    F      F
    F    T      F
    F    F      F

In order to infer new facts in a logic we need to apply inference rules. The semantics of the logic will define which inference rules are universally valid. One useful inference rule is the following (called modus ponens) but many others are possible:

    a, a → b
    --------
       b

This rule just says that if a → b is true, and a is true, then b is necessarily true. We could prove that this rule is valid using truth tables.
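The truth-table check of modus ponens mentioned above is small enough to run by brute force. The sketch below enumerates all four assignments to a and b and confirms that b is true whenever both premises are true. The `implies` helper and the variable names are choices made here for illustration. For contrast it also tests an invalid rule (concluding a from b and a → b).

```python
from itertools import product


def implies(a, b):
    """Truth table of a -> b: false only when a is true and b is false."""
    return (not a) or b


# All four rows of the truth table for two propositions.
rows = list(product([True, False], repeat=2))

# Modus ponens is valid if b is true in every row where both premises are true.
modus_ponens_valid = all(b for a, b in rows if a and implies(a, b))

# "Affirming the consequent": from b and a -> b, conclude a. This is not valid.
affirming_consequent_valid = all(a for a, b in rows if b and implies(a, b))

print(modus_ponens_valid, affirming_consequent_valid)  # True False
```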

Predicate Logic: Syntax


The trouble with propositional logic is that it is not possible to write general statements in it, such as "Alison eats everything that she likes". We'd have to have lots of rules, for every different thing that Alison liked. Predicate logic makes such general statements possible.

Sentences in predicate calculus are built up from atomic sentences (not to be confused with Prolog atoms). Atomic sentences consist of a predicate name followed by a number of arguments. These arguments may be any term. Terms may be:

- Constant symbols such as "Alison".
- Variable symbols such as "X". For consistency with Prolog we'll use capital letters to denote variables.
- Function expressions such as "father(Alison)". Function expressions consist of a functor followed by a number of arguments, which can be arbitrary terms.

This should all seem familiar from our description of Prolog syntax. However, although Prolog is based on predicate logic the way we represent things is slightly different, so the two should not be confused. So, atomic sentences in predicate logic include the following:

    friends(Alison, Richard)
    friends(father(Fred), father(Joe))
    likes(X, Richard)

Sentences in predicate logic are constructed (much as in propositional logic) by combining atomic sentences with logical connectives, so the following are all sentences in predicate calculus:

    friends(Alison, Richard) → likes(Alison, Richard)
    likes(Alison, Richard) ∨ likes(Alison, Waffles)
    ((likes(Alison, Richard) ∨ likes(Alison, Waffles)) ∧ ¬likes(Alison, Waffles)) → likes(Alison, Richard)

Sentences can also be formed using quantifiers to indicate how any variables in the sentence are to be treated. The two quantifiers in predicate logic are ∀ and ∃, so the following are valid sentences:

    ∃X bird(X) ∧ ¬flies(X)
    i.e., there exists some bird that doesn't fly.

    ∀X (person(X) → ∃Y loves(X,Y))
    i.e., every person has something that they love.

A sentence should have all its variables quantified. So strictly, an expression like "∀X loves(X, Y)", though a well formed formula of predicate logic, is not a sentence. Formulae with all their variables quantified are also called closed formulae.

Predicate Logic: Semantics


The semantics of predicate logic is defined (as in propositional logic) in terms of the truth values of sentences. Like in propositional logic, we can determine the truth value of any sentence in predicate calculus if we know the truth values of the basic components of that sentence. An interpretation function defines the basic meanings/truth values of the basic components, given some domain of objects that we are concerned with.

In propositional logic we saw that this interpretation function was very simple, just assigning truth values to propositions. However, in predicate calculus we have to deal with predicates, variables and quantifiers, so things get much more complex.

Predicates are dealt with in the following way. If we have, say, a predicate P with 2 arguments, then the meaning of that predicate is defined in terms of a mapping from all possible pairs of objects in the domain to a truth value. So, suppose we have a domain with just three objects in: Fred, Jim and Joe. We can define the meaning of the predicate father in terms of all the pairs of objects for which the father relationship is true - say Fred and Jim.

The meaning of ∀ and ∃ is defined again in terms of the set of objects in the domain. ∀X S means that for every object X in the domain, S is true. ∃X S means that for some object X in the domain, S is true. So, ∀X father(Fred, X), given our world (domain) of 3 objects (Fred, Jim, Joe), would only be true if father(Fred, X) was true for each object. In our interpretation of the father relation this only holds for X=Jim, so the whole quantified expression will be false in this interpretation.


This only gives a flavour of how we can give a semantics to expressions in predicate logic. The details are best left to logicians. The important thing is that everything is very precisely defined, so if we use predicate logic we should know exactly where we are and what inferences are valid.
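Over a finite domain, the meaning of the two quantifiers can be checked directly by testing every object. The sketch below encodes the three-object domain from this section, with the father relation true only for the pair (Fred, Jim). The names `DOMAIN`, `FATHER` and `father` are illustrative choices made here, and Python's built-in `all` and `any` stand in for "for every" and "for some".

```python
DOMAIN = ["Fred", "Jim", "Joe"]
FATHER = {("Fred", "Jim")}  # the pairs for which the father relation is true


def father(x, y):
    """Interpretation of the predicate: true exactly for the listed pairs."""
    return (x, y) in FATHER


# "For every X in the domain, father(Fred, X)" fails, since it holds only for Jim.
forall_result = all(father("Fred", x) for x in DOMAIN)

# "For some X in the domain, father(Fred, X)" succeeds, witnessed by Jim.
exists_result = any(father("Fred", x) for x in DOMAIN)

print(forall_result, exists_result)  # False True
```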

Proving Things in Predicate Logic


To prove things in predicate calculus we need two things. First we need to know what inference rules are valid - we can't keep going back to the formal semantics when trying to draw a simple inference! Second we need to know a good proof procedure that will allow us to prove things with the inference rules in an efficient manner.

When discussing propositional logic we noted that a much used inference rule was modus ponens:

    A, A → B
    --------
       B

This rule is a sound rule of inference for predicate logic. Given the semantics of the logic, if the premises are true then the conclusions are guaranteed to be true. Other sound inference rules include modus tollens (if A → B is true and B is false then conclude ¬A), and-elimination (if A ∧ B is true then conclude both A is true and B is true), and lots more.

In predicate logic we need to consider how to apply these rules if the expressions involved have variables. For example we would like to be able to use the facts ∀X (man(X) → mortal(X)) and man(Socrates) and conclude mortal(Socrates). To do this we can use modus ponens, but allow universally quantified sentences to be matched with other sentences (like in Prolog). So, if we have a sentence ∀X A → B and a sentence C then if A and C can be matched or unified then we can apply modus ponens.

The most well known general proof procedure for predicate calculus is resolution. Resolution is a sound proof procedure for proving things by refutation - if you can derive a contradiction from ¬P then P must be true. In resolution theorem proving, all statements in the logic are transformed into a normal form involving disjunctions of atomic expressions or negated atomic expressions (e.g., ¬dog(X) ∨ animal(X)). This allows new expressions to be deduced using a single inference rule. Basically, if we have an expression A1 ∨ A2 ∨ ... ∨ An ∨ C and an expression B1 ∨ B2 ∨ ... ∨ Bm ∨ ¬C then we can deduce a new expression A1 ∨ A2 ∨ ... ∨ An ∨ B1 ∨ B2 ∨ ... ∨ Bm. This single inference rule can be applied in a systematic proof procedure.
[This is all described in tedious detail in Rich & Knight, pgs 143-160.] Resolution is a sound proof procedure. If we prove something using it we can be sure it is a valid conclusion. However, there are many other things to worry about when looking at a proof procedure. It may not be complete (i.e., we may not be able to always prove something is true even if it is true) or decidable (the procedure may never halt when trying to prove something that is false). Variants of resolution may be complete, but no proof procedure based on predicate logic is decidable. And of course, it may just not be computationally efficient. It may eventually prove something, but take such a long time that it is just not usable. The efficiency of a proof will often depend as much on how you formulate your problem as on the general proof procedure used, but it is still an important issue to bear in mind.
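The resolution rule and proof by refutation can be illustrated on variable-free clauses. The sketch below is a simplification under stated assumptions: clauses are sets of literal strings, "~" marks negation, and the Socrates example has been instantiated by hand, so no unification is performed. The function names `negate`, `resolve` and `refutes` are invented here, and the naive saturation loop is for illustration rather than an efficient prover.

```python
def negate(literal):
    """Flip a literal: p <-> ~p."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolve(clause_a, clause_b):
    """Return every resolvent of two clauses (clauses are frozensets of literals)."""
    resolvents = []
    for literal in clause_a:
        if negate(literal) in clause_b:
            resolvents.append((clause_a - {literal}) | (clause_b - {negate(literal)}))
    return resolvents


def refutes(clauses):
    """Saturate with resolution; True if the empty clause (a contradiction) appears."""
    known = set(clauses)
    while True:
        new = set()
        for a in known:
            for b in known:
                for resolvent in resolve(a, b):
                    if not resolvent:  # empty clause derived
                        return True
                    new.add(resolvent)
        if new <= known:  # nothing new can be derived
            return False
        known |= new


# Ground version of the Socrates example, in clause form.
kb = [
    frozenset({"~man(Socrates)", "mortal(Socrates)"}),  # man(Socrates) -> mortal(Socrates)
    frozenset({"man(Socrates)"}),
]
goal_negated = frozenset({"~mortal(Socrates)"})

print(refutes(kb + [goal_negated]))  # True: the negated goal leads to a contradiction
```

Adding the negated goal and deriving the empty clause shows that mortal(Socrates) follows from the knowledge base. Without the negated goal, `refutes(kb)` returns False because the knowledge base is consistent.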

Representing Things in Predicate Logic


Your average AI programmer/researcher may not need to know the details of predicate logic semantics or proof theory, but they do need to know how to represent things in predicate logic, and what expressions in predicate logic mean. Formally we've already gone through what expressions mean, but it may make more sense to give a whole bunch of examples. This section will just give a list of logical expressions paired with English descriptions, then some unpaired logical or English expressions - you should try and work out for yourself how to represent the English expressions in Logic, and what the Logic expressions mean in English.

    ∃X table(X) ∧ ¬numberoflegs(X,4)
    "There is some table that doesn't have 4 legs"

    ∀X (macintosh(X) → ¬realcomputer(X))
    "No macintosh is a real computer" or "If something is a macintosh then it's not a real computer"

    ∀X glaswegian(X) → (supports(X,rangers) ∨ supports(X,celtic))
    "All Glaswegians support either Celtic or Rangers"

    ∃X small(X) ∧ on(X,table)
    "There is something small on the table"

Try out the following:

    "All elephants are grey"
    "Every apple is either green or yellow"
    "There is some student who is intelligent"
    ∀X red(X) ∧ on(X,table) → small(X)
    ¬∃X grapes(X) ∧ tasty(X)

[Note: When asked to translate English statements into predicate logic you should NOT use set expressions. The following expression is wrong: ∀X: X ∈ carrots: orange(X).]

Logic and Frames


Representation languages such as frames often have their semantics defined in terms of predicate (or other) logics. Once we have defined precisely what all the expressions and relations mean in terms of a well understood logic then we can make sure that any inferences that are drawn are sound, according to that logic. For example, we could decide that if an object elephant has a definite slot colour with value grey then this means that:

    ∀X elephant(X) → ∃Y (colour(X,Y) ∧ grey(Y))

(and similarly for any other definite slot). If we have slots that take default values then we will need a more powerful logic to represent their meaning, such as a default logic.

Using a representation language with a logic-based semantics has the advantage that we can deal (on the surface) with a simple, natural representation language such as frames, while underneath we can be quietly confident that the inferences drawn by the system are all sound. Of course, we have to understand the semantics of the language to be able to represent things meaningfully in it, but this may not be as awkward as dealing directly with the logic.

Another possible "advantage" of this approach is that something like a frame system typically has restricted representational power compared with full predicate (or default) logic. This may sound like a disadvantage, as it will mean there are some things we can't represent. However, the gain in efficiency you get by reasoning with a restricted subset usually makes this tradeoff worthwhile.

In fact, new logics (called terminological logics) have been developed, which have the expressive power needed to perform inheritance type inferences on simple properties of classes of objects (as in frame systems), but which do not allow some of the things (deductions etc) possible in predicate logic. These allow you to reason directly in the logic, rather than using the special inferences of a frame system, which are only indirectly validated by a logical semantics.
Terminological logics have more restricted expressive power than predicate logic, but greater efficiency. So, you can choose between a logic with (fairly) great expressive power, but rather inefficient (and undecidable) inference and proof procedures, or a logic with slightly weaker representational power, but which you can reason with efficiently. Or you can use something like a frame system, which may (or may not) have a well-defined semantics, and which uses special purpose inference procedures to perform class related deductions (such as inheritance of slot values from parent classes).
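
As a rough illustration of the special purpose inference a frame system performs, the sketch below looks up a slot value by walking up the parent classes. It is only a sketch under stated assumptions: the frame names mammal and clyde, the FRAMES table and the slot_value function are all invented for this example, and only definite slots are handled (no defaults). The answer it gives for the colour of any frame below elephant is the one licensed by the logical reading of a definite slot given above.

```python
# Hypothetical frame store: each frame names its parent (is_a) and its
# definite slots. The frames and slot names are illustrative assumptions.
FRAMES = {
    "mammal":   {"is_a": None,       "slots": {"blood": "warm"}},
    "elephant": {"is_a": "mammal",   "slots": {"colour": "grey"}},
    "clyde":    {"is_a": "elephant", "slots": {"name": "Clyde"}},
}


def slot_value(frame, slot):
    """Return the slot value, inheriting from parent frames if it is not set locally."""
    while frame is not None:
        entry = FRAMES[frame]
        if slot in entry["slots"]:
            return entry["slots"][slot]
        frame = entry["is_a"]   # not found here, so try the parent class
    return None                 # no frame in the chain defines the slot


print(slot_value("clyde", "colour"))  # grey (inherited from elephant)
print(slot_value("clyde", "blood"))   # warm (inherited from mammal)
print(slot_value("clyde", "size"))    # None
```

The lookup is cheap because it only follows one chain of parents, which is the kind of efficiency gain that comes from restricting what can be represented.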

Rule-Based Systems
Instead of representing knowledge in a relatively declarative, static way (as a bunch of things that are true), rule-based systems represent knowledge in terms of a bunch of rules that tell you what you should do or what you could conclude in different situations. A rule-based system consists of a bunch of IF-THEN rules, a bunch of facts, and some interpreter controlling the application of the rules, given the facts.

There are two broad kinds of rule system: forward chaining systems, and backward chaining systems. In a forward chaining system you start with the initial facts, and keep using the rules to draw new conclusions (or take certain actions) given those facts. In a backward chaining system you start with some hypothesis (or goal) you are trying to prove, and keep looking for rules that would allow you to conclude that hypothesis, perhaps setting new subgoals to prove as you go. Forward chaining systems are primarily data-driven, while backward chaining systems are goal-driven. We'll look at both, and when each might be useful.

[Note: Previously the term production system was used to refer to rule-based systems, and some books will use this term. However, it is a non-intuitive term so we will avoid it.]

Here we will look at:

Forward Chaining Systems
Backward Chaining Systems
Forwards vs Backwards Reasoning
Uncertainty in Rules

Forward Chaining Systems


In a forward chaining system the facts in the system are represented in a working memory which is continually updated. Rules in the system represent possible actions to take when specified conditions hold on items in the working memory - they are sometimes called condition-action rules. The conditions are usually patterns that must match items in the working memory, while the actions usually involve adding or deleting items from the working memory.

The interpreter controls the application of the rules, given the working memory, thus controlling the system's activity. It is based on a cycle of activity sometimes known as a recognise-act cycle. The system first checks to find all the rules whose conditions hold, given the current state of working memory. It then selects one and performs the actions in the action part of the rule. (The selection of a rule to fire is based on fixed strategies, known as conflict resolution strategies.) The actions will result in a new working memory, and the cycle begins again. This cycle will be repeated until either no rules fire, or some specified goal state is satisfied.

Rule-based systems vary greatly in their details and syntax, so the following examples are only illustrative. First we'll look at a very simple set of rules:

1. IF (lecturing X) AND (marking-practicals X) THEN ADD (overworked X)
2. IF (month february) THEN ADD (lecturing alison)
3. IF (month february) THEN ADD (marking-practicals alison)
4. IF (overworked X) OR (slept-badly X) THEN ADD (bad-mood X)
5. IF (bad-mood X) THEN DELETE (happy X)
6. IF (lecturing X) THEN DELETE (researching X)

Here we use capital letters to indicate variables. In other representations variables may be indicated in different ways, such as by a ? or a ^ (e.g., ?person, ^person).

Let us assume that initially we have a working memory with the following elements:

(month february)
(happy alison)
(researching alison)

Our system will first go through all the rules checking which ones apply given the current working memory. Rules 2 and 3 both apply, so the system has to choose between them, using its conflict resolution strategies. Let us say that rule 2 is chosen. So, (lecturing alison) is added to the working memory, which is now:

(lecturing alison)
(month february)
(happy alison)
(researching alison)

Now the cycle begins again. This time rule 3 and rule 6 have their preconditions satisfied. Let's say rule 3 is chosen and fires, so (marking-practicals alison) is added to the working memory. On the third cycle rules 1 and 6 both apply. Suppose rule 1 fires, so, with X bound to alison, (overworked alison) is added to working memory, which is now:

(overworked alison)
(marking-practicals alison)
(lecturing alison)
(month february)
(happy alison)
(researching alison)

Now rules 4 and 6 can apply. Suppose rule 4 fires, and (bad-mood alison) is added to the working memory. In the next cycle rule 5 is chosen and fires, and (happy alison) is removed from the working memory. Finally, rule 6 will fire, and (researching alison) will be removed from working memory, to leave:

(bad-mood alison)
(overworked alison)
(marking-practicals alison)
(lecturing alison)
(month february)

The order in which rules fire may be crucial, especially when rules may result in items being deleted from working memory. (Systems which allow items to be deleted are known as nonmonotonic.) Suppose we have the following further rule in the rule set:

7. IF (happy X) THEN ADD (gives-high-marks X)

If this rule fires BEFORE (happy alison) is removed from working memory then the system will conclude that Alison will give high marks. However, if rule 5 fires first then rule 7 will no longer apply. Of course, if we fire rule 7 and then later remove its preconditions, then it would be nice if its conclusions could then be automatically removed from working memory. Special systems called truth maintenance systems have been developed to allow this.

A number of conflict resolution strategies are typically used to decide which rule to fire. These include:

Don't fire a rule twice on the same data. (We don't want to keep on adding (lecturing alison) to working memory.)

Fire rules on more recent working memory elements before older ones. This allows the system to follow through a single chain of reasoning, rather than keeping on drawing new conclusions from old data.

Fire rules with more specific preconditions before ones with more general preconditions. This allows us to deal with non-standard cases. If, for example, we have a rule "IF (bird X) THEN ADD (flies X)" and another rule "IF (bird X) AND (penguin X) THEN ADD (swims X)" and a penguin called tweety, then we would fire the second rule first and start to draw conclusions from the fact that tweety swims.
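
The recognise-act cycle can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular rule-based system: the tuple encoding of facts, the RULES table and the function names are assumptions made for this example. Of the strategies listed above it implements only the first (never fire a rule twice on the same data); in place of recency and specificity it simply picks the lowest-numbered eligible rule, which happens to reproduce the order of firing in the worked example.

```python
# Rules 1 to 6 from the text. Each rule is (number, alternatives, action, item);
# "alternatives" is a list of conjunctions, so the OR in rule 4 becomes two entries.
RULES = [
    (1, [[("lecturing", "X"), ("marking-practicals", "X")]], "ADD", ("overworked", "X")),
    (2, [[("month", "february")]], "ADD", ("lecturing", "alison")),
    (3, [[("month", "february")]], "ADD", ("marking-practicals", "alison")),
    (4, [[("overworked", "X")], [("slept-badly", "X")]], "ADD", ("bad-mood", "X")),
    (5, [[("bad-mood", "X")]], "DELETE", ("happy", "X")),
    (6, [[("lecturing", "X")]], "DELETE", ("researching", "X")),
]
FACTS = [("month", "february"), ("happy", "alison"), ("researching", "alison")]


def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None on failure."""
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.isupper():                       # assumption: upper-case terms are variables
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def match_all(patterns, memory, bindings=None):
    """Yield every set of bindings that satisfies all the patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in memory:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from match_all(patterns[1:], memory, extended)


def substitute(pattern, bindings):
    return tuple(bindings.get(term, term) for term in pattern)


def run(rules, facts):
    """Run the recognise-act cycle until no rule can fire."""
    memory, fired, trace = list(facts), set(), []
    while True:
        # Recognise: build the conflict set of rule instances not yet fired.
        conflict_set = []
        for number, alternatives, action, item in rules:
            for conditions in alternatives:
                for b in match_all(conditions, memory):
                    instance = (number, tuple(substitute(c, b) for c in conditions))
                    if instance not in fired:
                        conflict_set.append((number, instance, action, substitute(item, b)))
        if not conflict_set:
            return memory, trace
        # Conflict resolution (assumed): the lowest rule number wins.
        number, instance, action, item = min(conflict_set, key=lambda c: c[0])
        fired.add(instance)
        trace.append(number)
        # Act: update working memory, newest element first.
        if action == "ADD" and item not in memory:
            memory.insert(0, item)
        elif action == "DELETE" and item in memory:
            memory.remove(item)


memory, trace = run(RULES, FACTS)
print(trace)   # [2, 3, 1, 4, 5, 6]
print(memory)
```

Changing the key passed to min is all it takes to try a different conflict resolution strategy, for example one that prefers rule instances matching the most recently added elements.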

These strategies may help in getting reasonable behaviour from a forward chaining system, but the most important thing is how we write the rules. They should be carefully constructed, with the preconditions specifying as precisely as possible when different rules should fire. Otherwise we will have little idea of, or control over, what will happen.

Sometimes special working memory elements are used to help to control the behaviour of the system. For example, we might decide that there are certain basic stages of processing in doing some task, and certain rules should only be fired at a given stage - we could have a special working memory element (stage 1) and add (stage 1) to the preconditions of all the relevant rules, removing the working memory element when that stage was complete.

Backward Chaining Systems


[Rich & Knight, 6.3]

So far we have looked at how rule-based systems can be used to draw new conclusions from existing data, adding these conclusions to a working memory. This approach is most useful when you know all the initial facts, but don't have much idea what the conclusion might be.

If you DO know what the conclusion might be, or have some specific hypothesis to test, forward chaining systems may be inefficient. You COULD keep on forward chaining until no more rules apply or you have added your hypothesis to the working memory. But in the process the system is likely to do a lot of irrelevant work, adding uninteresting conclusions to working memory. For example, suppose we are interested in whether Alison is in a bad mood. We could repeatedly fire rules, updating the working memory, checking each time whether (bad-mood alison) is in the new working memory. But maybe we had a whole batch of rules for drawing conclusions about what happens when Alison is lecturing, or what happens in February - we really don't care about this, so would rather only have to draw the conclusions that are relevant to the goal.

This can be done by backward chaining from the goal state (or on some hypothesized state that we are interested in). This is essentially what Prolog does, so it should be fairly familiar to you by now. Given a goal state to try and prove (e.g., (bad-mood alison)) the system will first check to see if the goal matches the initial facts given. If it does, then that goal succeeds. If it doesn't, the system will look for rules whose conclusions (previously referred to as actions) match the goal. One such rule will be chosen, and the system will then try to prove any facts in the preconditions of the rule using the same procedure, setting these as new goals to prove. Note that a backward chaining system does NOT need to update a working memory. Instead it needs to keep track of what goals it needs to prove in order to prove its main hypothesis.

In principle we can use the same set of rules for both forward and backward chaining. However, in practice we may choose to write the rules slightly differently if we are going to be using them for backward chaining. In backward chaining we are concerned with matching the conclusion of a rule against some goal that we are trying to prove. So the 'then' part of the rule is usually not expressed as an action to take (e.g., add/delete), but as a state which will be true if the premises are true. So, suppose we have the following rules:

1. IF (lecturing X) AND (marking-practicals X) THEN (overworked X)
2. IF (month february) THEN (lecturing alison)
3. IF (month february) THEN (marking-practicals alison)
4. IF (overworked X) THEN (bad-mood X)
5. IF (slept-badly X) THEN (bad-mood X)
6. IF (month february) THEN (weather cold)
7. IF (year 1993) THEN (economy bad)

and initial facts:

(month february)
(year 1993)

and we're trying to prove:

(bad-mood alison)

First we check whether the goal state is in the initial facts. As it isn't there, we try matching it against the conclusions of the rules. It matches rules 4 and 5. Let us assume that rule 4 is chosen first - it will try to prove (overworked alison). Rule 1 can be used, and the system will try to prove (lecturing alison) and (marking-practicals alison). Trying to prove the first goal, it will match rule 2 and try to prove (month february). This is in the set of initial facts. We still have to prove (marking-practicals alison). Rule 3 can be used, which again only requires (month february), and so we have proved the original goal (bad-mood alison).

One way of implementing this basic mechanism is to use a stack of goals still to satisfy. You should repeatedly pop a goal off the stack, and try to prove it. If it's in the set of initial facts then it's proved. If it matches a rule which has a set of preconditions then the goals in the precondition are pushed onto the stack. Of course, this doesn't tell us what to do when there are several rules which may be used to prove a goal. If we were using Prolog to implement this kind of algorithm we might rely on its backtracking mechanism - it'll try one rule, and if that results in failure it will go back and try the other. However, if we use a programming language without a built-in search procedure we need to decide explicitly what to do. One good approach is to use an agenda, where each item on the agenda represents one alternative path in the search for a solution. The system should try `expanding' each item on the agenda, systematically trying all possibilities until it finds a solution (or fails to). The particular method used for selecting items off the agenda determines the search strategy - in other words, determines how you decide on which options to try, in what order, when solving your problem. We'll go into this in much more detail in the section on search.
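
The stack-and-agenda idea can be sketched as follows. This is a minimal sketch under stated assumptions, not a full implementation: the tuple encoding, the RULES table (the seven backward chaining rules, numbered 1 to 7 in the order listed) and the prove function are invented for this example. It assumes that goals contain no variables and that every variable in a rule's preconditions also appears in its conclusion, and it makes no attempt to detect rule sets that loop.

```python
# Each rule is (number, conclusion, preconditions).
RULES = [
    (1, ("overworked", "X"), [("lecturing", "X"), ("marking-practicals", "X")]),
    (2, ("lecturing", "alison"), [("month", "february")]),
    (3, ("marking-practicals", "alison"), [("month", "february")]),
    (4, ("bad-mood", "X"), [("overworked", "X")]),
    (5, ("bad-mood", "X"), [("slept-badly", "X")]),
    (6, ("weather", "cold"), [("month", "february")]),
    (7, ("economy", "bad"), [("year", "1993")]),
]
FACTS = {("month", "february"), ("year", "1993")}


def match(conclusion, goal):
    """Bind the variables in a rule conclusion to a goal, or return None."""
    if len(conclusion) != len(goal):
        return None
    bindings = {}
    for c, g in zip(conclusion, goal):
        if c.isupper():                       # assumption: upper-case terms are variables
            if bindings.setdefault(c, g) != g:
                return None
        elif c != g:
            return None
    return bindings


def prove(goal, rules, facts):
    """Return the rule numbers used in the first proof found, or None."""
    # Each agenda item is one alternative path in the search:
    # (stack of goals still to prove, rules used so far).
    agenda = [([goal], [])]
    while agenda:
        goals, used = agenda.pop()            # take the newest item: depth-first search
        if not goals:
            return used                       # nothing left to prove on this path
        current, rest = goals[0], goals[1:]
        if current in facts:
            agenda.append((rest, used))
            continue
        # Expand: one new agenda item per rule whose conclusion matches the goal.
        alternatives = []
        for number, conclusion, preconditions in rules:
            bindings = match(conclusion, current)
            if bindings is not None:
                subgoals = [tuple(bindings.get(t, t) for t in p) for p in preconditions]
                alternatives.append((subgoals + rest, used + [number]))
        agenda.extend(reversed(alternatives))  # so the first matching rule is tried first
    return None


print(prove(("bad-mood", "alison"), RULES, FACTS))   # [4, 1, 2, 3]
print(prove(("bad-mood", "bob"), RULES, FACTS))      # None
```

Because items are taken from the end of the agenda, the search is depth-first and behaves much like backtracking. Taking items from the front instead would give a breadth-first search without changing anything else.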

Forwards vs Backwards Reasoning


Whether you use forward or backward reasoning to solve a problem depends on the properties of your rule set and initial facts. Sometimes, if you have some particular goal (to test some hypothesis), then backward chaining will be much more efficient, as you avoid drawing conclusions from irrelevant facts. However, sometimes backward chaining can be very wasteful - there may be many possible ways of trying to prove something, and you may have to try almost all of them before you find one that works.

Forward chaining may be better if you have lots of things you want to prove (or if you just want to find out in general what new facts are true); when you have a small set of initial facts; and when there tend to be lots of different rules which allow you to draw the same conclusion. Backward chaining may be better if you are trying to prove a single fact, given a large set of initial facts, and where, if you used forward chaining, lots of rules would be eligible to fire in any cycle.

Uncertainty in Rules
So far we have assumed that if the preconditions of a rule hold, then the conclusion will certainly hold. In fact, most of our rules have looked pretty much like logical implications, and the ideas of forward and backward reasoning also apply to logic-based approaches to knowledge representation and inference.

Of course, in practice you rarely conclude things with absolute certainty. Usually we want to say things like "If Alison is tired then there's quite a good chance that she'll be in a bad mood". To allow for this sort of reasoning in rule-based systems we often add certainty values to a rule, and attach certainties to any new conclusions. We might conclude that Alison is probably in a bad mood (maybe with certainty 0.6). The approaches used are generally loosely based on probability theory, but are much less rigorous, aiming just for a good guess rather than precise probabilities. We'll talk about this more in a later lecture.
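
To make the idea concrete, here is a tiny sketch of one way certainties might be propagated. The notes do not fix a particular scheme, so the formula is an assumption made for illustration: the certainty of a conclusion is taken to be the certainty of the rule multiplied by the certainty of its weakest precondition. The conclude function and the (tired X) rule are hypothetical.

```python
def conclude(rule_certainty, precondition_certainties):
    """Certainty of a conclusion under the assumed scheme:
    rule certainty times the certainty of the weakest precondition."""
    return rule_certainty * min(precondition_certainties)


# Hypothetical rule: IF (tired X) THEN (bad-mood X), with certainty 0.6.
print(conclude(0.6, [1.0]))   # 0.6: we are sure Alison is tired
print(conclude(0.6, [0.5]))   # 0.3: we are only half sure she is tired
```

With this scheme a conclusion can never be more certain than the rule that produced it, which fits the aim of a reasonable guess rather than a precise probability.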

Advantages and Disadvantages of KR Languages


So far we have discussed three approaches to knowledge representation and inference: structured objects, logic, and rules.

Structured objects are useful for representing declarative information about collections of related objects/concepts, and in particular where there is a clear class hierarchy, and where you want to use inheritance to infer the attributes of objects in subclasses from the attributes of objects in the parent class. Early approaches tended to have poorly specified semantics but there are now some practical systems with a clear underlying semantics.

Structured objects are no good if you want to draw a wide range of different sorts of inferences, maybe using IF-THEN rules. For this you could use a logic-based approach, along with a theorem prover, or you could use a rule-based system. Logic-based approaches allow you to represent fairly complex things (involving quantification etc), and have a well-defined syntax, semantics and proof theory. However, no hacking is allowed in logic! If you can't represent something in your logic then that's too bad. General purpose theorem provers may also be very inefficient, especially once you get to more powerful logics.

Rule-based systems tend to allow only relatively simple representations of the underlying facts in the domain, but may be more flexible, and often allow certainty values to be associated with rules and facts. While logic is primarily used in a declarative way, saying what's true in the world, rule-based systems (especially forward chaining systems) are concerned more with procedural knowledge - what to do when.

In all the approaches it should be possible to add new facts (and rules) in a simple, incremental fashion, without rewriting the whole system. This is an important advantage compared with just writing a Pascal program which implicitly captures the knowledge. AI programs typically have a distinct knowledge base, to which new facts can be added as needed, capturing the rules and facts of the domain in question. Separate problem solving procedures may then access and possibly update that knowledge.

Whatever "language" you represent the knowledge in, certain things are important. First, it should be represented at the right level of abstraction - you want to be able to write a few general purpose facts/rules, not a whole lot of very specific ones. Next, it is helpful to write things in a way which allows new facts/rules to be added without radically changing the behaviour of the whole system. Preconditions of rules, for example, should be specified sufficiently precisely so that they won't inappropriately fire when new facts are added. Lots more issues are discussed in [Rich & Knight, 4.3].

References:
[Rich & Knight]
[Russell & Norvig]
