AI Unit-3
Non-monotonic Reasoning
• In Non-monotonic reasoning, some conclusions may be invalidated if we add some more information to our
knowledge base.
• Non-monotonic reasoning deals with incomplete and uncertain models.
• Non-monotonic Reasoning is the process that changes its direction or values as the knowledge base increases.
Example: Suppose the knowledge base contains the following knowledge:
o Birds can fly
o Pitty is a bird
So from the above sentences, we can conclude that Pitty can fly.
However, if we add another sentence to the knowledge base, "Pitty is a penguin", which concludes "Pitty cannot
fly", then the above conclusion is invalidated.
Advantages of Non-monotonic reasoning:
o For real-world systems such as Robot navigation, we can use non-monotonic reasoning.
Probabilistic Reasoning: Example
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What
percentage of the students who like English also like mathematics?
Solution:
P(English) = 0.7 and P(English ⋀ Mathematics) = 0.4. By the conditional probability rule:
P(Mathematics | English) = P(English ⋀ Mathematics) / P(English) = 0.4 / 0.7 ≈ 0.57
Hence, about 57% of the students who like English also like Mathematics.
Types of Probabilistic Reasoning
o Bayes' rule
o Bayesian Statistics
Bayes' theorem
• Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the
probability of an event with uncertain knowledge.
• It is a way to calculate the value of P(A|B) with the knowledge of P(B|A).
• Bayes' theorem allows updating the probability prediction of an event by observing new information of the real
world.
Example: If the probability of cancer is related to a person's age, then using Bayes' theorem we can determine the
probability of cancer more accurately given the person's age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A given event B:
As from product rule we can write:
P(A ⋀ B)= P(A|B) P(B) or P(A ⋀ B)= P(B|A) P(A)
The mathematical formula of Bayes' Rule follows by equating the two expressions and dividing by P(B):
P(A|B) = [P(B|A) P(A)] / P(B) ...... (i)
Here,
• P(A|B) is the posterior probability: the probability of event A occurring, given that event B has occurred.
• P(B|A) is the likelihood: the probability of event B occurring, given that event A has occurred.
• P(A) is the prior probability: the probability of A before any new evidence is considered.
• P(B) is the marginal probability: the probability of the evidence B itself, used to normalize the result.
Example:
From a standard deck of playing cards, a single card is drawn. The probability that the card is a king is 4/52.
Calculate the posterior probability P(King|Face), i.e., the probability that a drawn face card is a king.
Solution:
P(King): probability that the card is a king = 4/52 = 1/13
P(Face): probability that the card is a face card = 12/52 = 3/13
P(Face|King): probability that the card is a face card, given that it is a king = 1
Putting all values in equation (i), we get:
P(King|Face) = [P(Face|King) P(King)] / P(Face) = (1 × 1/13) / (3/13) = 1/3
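The same calculation can be checked with a short Python sketch, using exact fractions:

```python
# Checking the card example with Bayes' rule:
# P(King|Face) = P(Face|King) * P(King) / P(Face)

from fractions import Fraction

p_king = Fraction(4, 52)           # 4 kings in a 52-card deck
p_face = Fraction(12, 52)          # 12 face cards (J, Q, K of each suit)
p_face_given_king = Fraction(1)    # every king is a face card

print(p_face_given_king * p_king / p_face)   # 1/3
```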
Application of Bayes' theorem in Artificial intelligence:
• It is used to calculate the next step of the robot when the already executed step is given.
• Bayes' theorem is helpful in weather forecasting.
• It can solve the Monty Hall problem.
Bayesian Belief Network
Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems
that involve uncertainty. We can define a Bayesian network as:
• "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network, or Bayesian model.
• Bayesian networks are probabilistic, because these networks are built from a probability distribution, and also
use probability theory for prediction and anomaly detection.
• Real world applications are probabilistic in nature, and to represent the relationship between multiple events, we
need a Bayesian network.
• It can also be used in various tasks including prediction, anomaly detection, diagnostics, automated insight,
reasoning, time series prediction, and decision making under uncertainty.
Bayesian network graph:
A Bayesian network graph is made up of nodes and arcs (directed links), where:
o Each node corresponds to a random variable, which can be continuous or discrete.
o Arcs (directed arrows) represent causal relationships or conditional dependencies between the random
variables. A directed link means that one node directly influences the other; if there is no directed link
between two nodes, they are independent of each other.
o In the example diagram, A, B, C, and D are random variables represented by the nodes of the network graph.
o If node B is connected to node A by a directed arrow from A to B, then node A is called the parent of
node B (a small numerical sketch follows).
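As an illustration, a two-node network A → B can be stored as plain probability tables; all numbers below are made up for the sketch:

```python
# A two-node Bayesian network A -> B stored as plain probability
# tables. All probability values are invented for this sketch.

p_a = {True: 0.3, False: 0.7}                  # prior P(A)
p_b_given_a = {True:  {True: 0.9, False: 0.1}, # P(B | A=true)
               False: {True: 0.2, False: 0.8}} # P(B | A=false)

def joint(a, b):
    """Chain rule over the DAG: P(A, B) = P(A) * P(B | A)."""
    return p_a[a] * p_b_given_a[a][b]

# Marginal P(B=true), summing the joint over the parent A:
p_b_true = sum(joint(a, True) for a in (True, False))
print(p_b_true)   # 0.3*0.9 + 0.7*0.2 = 0.41
```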
Applications of Certainty Factor:
• Medical diagnosis: In medical diagnosis systems, certainty factors are used to evaluate the probability of a
patient having a particular disease based on the presence of specific symptoms.
• Fraud detection: In financial institutions, certainty factors can be used to evaluate the likelihood of fraudulent
activities based on transaction patterns and other relevant factors.
• Customer service: In customer service systems, certainty factors can be used to evaluate customer requests or
complaints and provide appropriate responses.
• Risk analysis: In risk analysis applications, certainty factors can be used to assess the likelihood of certain events
occurring based on historical data and other factors.
• Natural language processing: In natural language processing applications, certainty factors can be used to
evaluate the accuracy of language models in interpreting and generating human language.
Limitations of Certainty Factor:
• Difficulty in assigning accurate certainty values: Assigning accurate certainty values to propositions or
hypotheses can be challenging, especially when dealing with complex or ambiguous situations. This can lead to
faulty results and outcomes.
• Difficulty in combining certainty values: Combining certainty values from multiple sources can be complex and
difficult to achieve accurately (a standard combination rule is sketched after this list). Different sources may have
different levels of certainty and reliability, which can lead to inconsistent or conflicting results.
• Inability to handle conflicting evidence: In some cases, conflicting evidence may be presented, making it difficult
to determine the correct certainty value for a proposition or hypothesis.
• Limited range of values: The numerical range of the certainty factor is limited to -1 to 1, which may not be
sufficient to capture the full range of uncertainty in some situations.
• Subjectivity: The Certainty factor relies on human judgment to assign certainty values, which can introduce
subjectivity and bias into the decision-making process.
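For concreteness, here is a sketch of the classic MYCIN-style rule for combining two certainty factors that bear on the same hypothesis; the rule is the standard MYCIN formulation, while the input values are illustrative:

```python
# MYCIN-style combination of two certainty factors, each in [-1, 1].

def combine_cf(cf1, cf2):
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)                 # both supporting
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)                 # both opposing
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))  # conflicting

print(combine_cf(0.6, 0.4))    # 0.76  -- agreeing evidence strengthens
print(combine_cf(0.6, -0.4))   # ~0.33 -- conflicting evidence weakens
```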
Rule-based systems
• Rule-based systems are AI systems that employ a set of predefined rules to derive conclusions from given
data. These rules are typically represented in the form of IF-THEN statements, where:
• IF represents the conditions
• THEN represents the actions or conclusions.
Components of a rule-based system:
• Knowledge base: It stores the rules, facts, and domain-specific knowledge used by the rule-based system to
make decisions, providing the necessary information for logical reasoning and rule matching.
• Explanation facilities: It generates justifications or explanations for the system's decisions, enhancing
transparency and helping users understand the reasoning behind the system's outputs, increasing trust and
interpretability.
• Database: It holds relevant data used by the rule-based system, such as input data or historical records,
providing a source of information for the inference process and enabling data-driven decision-making.
• User interface: It allows users to interact with the rule-based system, providing a means to input data, modify
rules, and receive outputs or recommendations, facilitating user engagement and system usability.
• External interface: It enables communication and integration with external systems or services, allowing data
exchange, interaction with other software components, or integration with external sources for obtaining inputs
or delivering outputs.
• Inference engine: It processes the rules and data from the knowledge base, applying logical reasoning and rule
matching to determine the appropriate actions or conclusions based on the given inputs (a minimal sketch follows
this list).
• Working memory: It temporarily holds the system's current state during the inference process, storing input data,
intermediate results, and inferred conclusions, providing the necessary context for rule matching and facilitating the decision-
making process.
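A minimal sketch of how the knowledge base, inference engine, and working memory interact, using forward chaining; the rules themselves are illustrative, not from the notes:

```python
# Minimal forward-chaining sketch: the rule base holds IF-THEN rules
# over simple string facts; the working memory is the 'facts' set.

rules = [
    ({"has_fever", "has_cough"}, "flu_suspected"),
    ({"flu_suspected"}, "recommend_rest"),
]

def infer(facts):
    """Fire rules repeatedly until no new fact can be derived."""
    facts = set(facts)              # working memory
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # IF all conditions hold THEN add the conclusion
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"has_fever", "has_cough"}))
# -> {'has_fever', 'has_cough', 'flu_suspected', 'recommend_rest'}
```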
Examples of rule-based systems
Rule-based systems are widely applied in diverse domains for intelligent decision-making, for example in medical
expert systems (such as MYCIN), fraud detection, loan approval, and configuration checking.
Advantages:
• Transparency: Rule-based systems operate on explicit rules, making their decision-making process transparent
and auditable.
• Flexibility: Rules can be easily modified or updated, allowing rule-based systems to adapt to changing
requirements and new knowledge.
• Scalability: Rule-based systems can handle large amounts of data and complex rule sets, making them suitable
for managing intricate decision-making processes.
• Explainability: The explicit representation of rules enables rule-based systems to explain their decisions,
enhancing trust and understanding.
Limitations:
• Complexity: As rule-based systems grow in size and complexity, managing and maintaining the rule base can
become challenging.
• Incomplete knowledge: Rule-based systems heavily rely on the availability and accuracy of predefined rules,
limiting their ability to handle unforeseen or uncertain scenarios.
• Lack of learning: Unlike machine learning-based approaches, traditional rule-based systems cannot learn from
data and improve their performance over time.
Dempster-Shafer Theory
Dempster-Shafer Theory (DST) was introduced by Arthur P. Dempster in 1967 and extended by his student Glenn
Shafer in 1976. The theory was developed for the following reasons:
• Bayesian theory is only concerned about single evidence.
• Bayesian probability cannot describe ignorance.
DST is a theory of evidence: it combines all possible outcomes of the problem. Hence it is used to solve
problems where different pieces of evidence may lead to different results.
The uncertainty in this model is handled as follows:
• Consider all possible outcomes.
• Belief (Bel) measures how strongly the available evidence directly supports a possibility.
• Plausibility (Pl) measures how far the evidence fails to rule a possibility out, i.e., how compatible the
evidence is with that possibility.
Example: Let us consider a room where four people are present, A, B, C, and D. Suddenly the lights go out and when
the lights come back, B has been stabbed in the back by a knife, leading to his death. No one came into the room
and no one left the room. We know that B has not committed suicide. Now we have to find out who the murderer
is.
To solve this, there are the following possibilities:
• Either {A} or {C} or {D} has killed him.
• Either {A, C} or {C, D} or {A, D} have killed him.
• All three of them have killed him, i.e., {A, C, D}.
• None of them has killed him: the empty set ∅.
By collecting evidence we can narrow down the murderer using the measures of belief and plausibility.
Using the above example we can say:
Set of possible conclusions (P): {p1, p2, ..., pn}
where P is the set of possible conclusions (the frame of discernment): its elements must be mutually exclusive,
and at least one of them must be true.
The power set contains 2^n elements, where n is the number of elements in the possible set.
For example:
If P = {a, d, c}, then the power set is
{∅, {a}, {d}, {c}, {a, d}, {d, c}, {a, c}, {a, c, d}} = 2^3 = 8 elements.
Mass function m(K):
The mass m(K) is the portion of evidence that supports exactly the set K and cannot be divided among any
more specific (smaller) subsets. For example, m({K, B}) represents evidence for "K or B" that cannot be split
between K and B individually.
Belief in K: The belief Bel(K) in an element K of the power set is the sum of the masses of all subsets of K.
For example, if K = {a, d, c}:
Bel(K) = m({a}) + m({d}) + m({c}) + m({a, d}) + m({a, c}) + m({d, c}) + m({a, d, c})
Plausibility of K: Pl(K) is the sum of the masses of all sets that intersect K. For K = {a, d, c}, the whole frame,
this coincides with Bel(K); the difference shows up on proper subsets, e.g. for K = {a}:
Bel({a}) = m({a}), whereas Pl({a}) = m({a}) + m({a, d}) + m({a, c}) + m({a, d, c}).
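These definitions translate directly into code; the mass values below are invented for the sketch:

```python
# Belief and plausibility over the frame {A, C, D} of the murder example.

from itertools import combinations

frame = {"A", "C", "D"}

def powerset(s):
    """All 2**n subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

print(len(powerset(frame)))   # 2**3 = 8 subsets

# Example mass function (must sum to 1); mass on the whole frame
# represents ignorance that the evidence cannot subdivide.
m = {frozenset({"A"}): 0.3,
     frozenset({"C"}): 0.2,
     frozenset({"A", "D"}): 0.1,
     frozenset(frame): 0.4}

def bel(K):
    """Sum of masses of all subsets of K."""
    return sum(v for s, v in m.items() if s <= K)

def pl(K):
    """Sum of masses of all sets that intersect K."""
    return sum(v for s, v in m.items() if s & K)

K = frozenset({"A"})
print(bel(K), pl(K))   # Bel({A}) = 0.3, Pl({A}) = 0.3 + 0.1 + 0.4 = 0.8
```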
Characteristics of Dempster-Shafer Theory:
• Ignorance is represented explicitly: mass may be assigned to sets of outcomes (including the full frame),
and the masses of all assigned sets aggregate to 1.
• Ignorance is reduced in this theory by adding more and more evidence.
• A combination rule (Dempster's rule of combination) is used to combine evidence from different sources.
Advantages:
• As we add more information, the uncertainty interval reduces.
• It makes the remaining level of ignorance explicit, instead of hiding it inside a single probability value.
• Diagnostic hierarchies can be represented using it.
• A person dealing with such problems is free to reason about the evidence itself rather than being forced to
assign exact probabilities.
Disadvantages:
• The computational effort is high, as we have to deal with 2^n sets.
Fuzzy logic
• Fuzzy logic contains multiple logical values: the truth value of a variable or proposition can be any value
between 0 and 1.
• In the Boolean system, only two possibilities (0 and 1) exist, where 1 denotes the absolute truth value and 0
denotes the absolute false value. In the fuzzy system, however, there are multiple possibilities between 0
and 1, which are partially false and partially true.
• Fuzzy logic can be implemented in systems such as microcontrollers and workstation-based or large
network-based systems to achieve a definite output. It can be implemented in both hardware and software.
Architecture of a Fuzzy Logic System:
In the architecture of the Fuzzy Logic system, each component plays an important role. The architecture
consists of four components: the Rule Base, the Fuzzification module, the Inference Engine, and the
Defuzzification module.
A 5-level fuzzifier maps a crisp input x to the following linguistic levels:
• LP: x is Large Positive
• MP: x is Medium Positive
• S: x is Small
• MN: x is Medium Negative
• LN: x is Large Negative
1. Union Operation: The union operation of fuzzy sets is defined by:
μA∪B(x) = max (μA(x), μB(x))
Example (element-wise over the membership values of A and B; X1 is computed in the same way):
For X2
μA∪B(X2) = max (μA(X2), μB(X2)) = max (0.2, 0.8) = 0.8
For X3
μA∪B(X3) = max (μA(X3), μB(X3)) = max (1, 0) = 1
For X4
μA∪B(X4) = max (μA(X4), μB(X4)) = max (0.4, 0.9) = 0.9
2. Intersection Operation: The intersection operation of fuzzy sets is defined by:
μA∩B(x) = min (μA(x), μB(x))
Example:
Let's suppose A is a set which contains the following elements:
A = {(X1, 0.3), (X2, 0.7), (X3, 0.5), (X4, 0.1)}
And B is a set which contains the following elements:
B = {(X1, 0.8), (X2, 0.2), (X3, 0.4), (X4, 0.9)}
then,
A∩B = {(X1, 0.3), (X2, 0.2), (X3, 0.4), (X4, 0.1)}
For X1
μA∩B(X1) = min (μA(X1), μB(X1)) = min (0.3, 0.8) = 0.3
For X2
μA∩B(X2) = min (μA(X2), μB(X2)) = min (0.7, 0.2) = 0.2
For X3
μA∩B(X3) = min (μA(X3), μB(X3)) = min (0.5, 0.4) = 0.4
For X4
μA∩B(X4) = min (μA(X4), μB(X4)) = min (0.1, 0.9) = 0.1
3. Complement Operation: The complement operation of a fuzzy set is defined by:
μĀ(x) = 1 − μA(x)
Example:
Let's suppose A is a set which contains the following elements:
A = {( X1, 0.3 ), (X2, 0.8), (X3, 0.5), (X4, 0.1)}
then,
Ā= {( X1, 0.7 ), (X2, 0.2), (X3, 0.5), (X4, 0.9)}
For X1
μĀ(X1) = 1 − μA(X1) = 1 − 0.3 = 0.7
For X2
μĀ(X2) = 1 − μA(X2) = 1 − 0.8 = 0.2
For X3
μĀ(X3) = 1 − μA(X3) = 1 − 0.5 = 0.5
For X4
μĀ(X4) = 1 − μA(X4) = 1 − 0.1 = 0.9
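All three operations reduce to element-wise max, min, and 1 − μ. A short sketch reproducing the intersection example above (note that the complement example uses slightly different memberships for A):

```python
# Fuzzy union, intersection, and complement as element-wise operations.
# A and B are the sets from the intersection example.

A = {"X1": 0.3, "X2": 0.7, "X3": 0.5, "X4": 0.1}
B = {"X1": 0.8, "X2": 0.2, "X3": 0.4, "X4": 0.9}

union        = {x: max(A[x], B[x]) for x in A}       # mu_A∪B(x)
intersection = {x: min(A[x], B[x]) for x in A}       # mu_A∩B(x)
complement_A = {x: round(1 - A[x], 2) for x in A}    # mu_Ā(x)

print(intersection)   # {'X1': 0.3, 'X2': 0.2, 'X3': 0.4, 'X4': 0.1}
print(complement_A)   # {'X1': 0.7, 'X2': 0.3, 'X3': 0.5, 'X4': 0.9}
```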
Membership Function
• The membership function is a function which represents a fuzzy set graphically and allows users to quantify
a linguistic term. It is a graph that maps each element x to a value between 0 and 1.
• This function is also known as the indicator or characteristic function.
• For a fuzzy set B on a universe X, the membership function is defined as μB: X → [0, 1].
o Each element of X is mapped to a value between 0 and 1.
o This value is called the degree of membership or membership value.
• There can be multiple membership functions applicable to fuzzify a numerical value.
• Simple membership functions are used as use of complex functions does not add more precision in the output.
All membership functions for LP, MP, S, MN, and LN span the input range as overlapping curves (figure omitted).
The triangular membership function shapes are most common among various other membership function shapes
such as trapezoidal, singleton, and Gaussian.
Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding output also changes.
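The triangular shape can be written as a small function; the placement of the 'S' (Small) level below is an assumption for illustration:

```python
# Sketch of a triangular membership function of the kind used for
# the five levels (LN, MN, S, MP, LP) over the -10..+10 volt range.

def triangular(x, a, b, c):
    """Membership rises linearly from a to the peak b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical placement of 'S' (Small), centred on 0 volts:
for v in (-6, -3, 0, 3, 6):
    print(v, triangular(v, -5, 0, 5))
# -6 -> 0.0, -3 -> 0.4, 0 -> 1.0, 3 -> 0.4, 6 -> 0.0
```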
Example of a Fuzzy Logic System
Let us consider an air conditioning system with a 5-level fuzzy logic system. This system adjusts the temperature of
the air conditioner by comparing the room temperature with the target temperature value.
Step 1 − Define linguistic variables and terms
Linguistic variables are input and output variables in the form of simple words or sentences. For room temperature,
cold, warm, hot, etc., are linguistic terms.
Temperature (t) = {very-cold, cold, warm, very-warm, hot}
Every member of this set is a linguistic term and it can cover some portion of overall temperature values.
Step 2 − Construct membership functions for them
The membership functions of temperature variable are as shown −
Step 3 − Construct knowledge base rules
• Create a matrix of room temperature values versus target temperature values that an air conditioning system is
expected to provide.
• Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
(Rule matrix: the columns are the room temperature levels Very_Cold, Cold, Warm, Hot, and Very_Hot, and the
rows are the target temperature levels; each cell holds the air-conditioner command for that combination.)
Step 4 − Obtain fuzzy values by applying the rules to the fuzzified inputs.
Step 5 − Perform defuzzification to convert the resulting fuzzy value into a crisp output.
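One way to realise the Step 3 rule matrix is as a lookup table; a minimal sketch, where the specific levels and commands are assumptions:

```python
# Sketch of the rule matrix as a lookup table mapping
# (room temperature level, target level) to an AC command.

rules = {
    ("Very_Cold", "Warm"): "Heat",
    ("Hot", "Warm"): "Cool",
    ("Warm", "Warm"): "No_Change",
    # ... one IF-THEN rule per (room, target) combination
}

def command(room_level, target_level):
    """Look up the IF-THEN rule for the given input combination."""
    return rules.get((room_level, target_level), "No_Change")

print(command("Hot", "Warm"))   # Cool
```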
Natural Language Processing (NLP)
• NLP is the technology used by machines to understand, analyse, manipulate, and interpret human
languages.
Components of NLP:
1. Natural Language Understanding (NLU)
Natural Language Understanding (NLU) helps the machine to understand and analyse human language by extracting
the metadata from content such as concepts, entities, keywords, emotion, relations, and semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both spoken and written
language.
2. Natural Language Generation (NLG)
Natural Language Generation (NLG) acts as a translator that converts the computerized data into natural language
representation. It mainly involves Text planning, Sentence planning, and Text Realization.
Advantages of NLP:
• NLP helps users to ask questions about any subject and get a direct response within seconds.
• NLP offers exact answers to questions; it does not return unnecessary or unwanted information.
• NLP helps computers to communicate with humans in their languages.
• It is very time efficient.
• Many companies use NLP to improve the efficiency and accuracy of documentation processes and to
identify information in large databases.
Disadvantages of NLP:
• NLP may not capture context.
• NLP can be unpredictable.
• NLP may require more keystrokes.
• NLP systems often cannot adapt to a new domain; they have limited functionality, which is why an NLP
system is usually built for a single, specific task.
Applications of NLP:
1. Question Answering
Question Answering focuses on building systems that automatically answer
the questions asked by humans in a natural language.
2. Spam Detection
Spam detection is used to detect unwanted e-mails getting to a user's
inbox.
3. Sentiment Analysis
Sentiment Analysis is also known as opinion mining. It is used on the web
to analyse the attitude, behaviour, and emotional state of the sender. This
application is implemented through a combination of NLP (Natural
Language Processing) and statistics: values are assigned to the text
(positive, negative, or neutral), and the mood of the context (happy, sad,
angry, etc.) is identified.
4. Machine Translation
Machine translation is used to translate text or speech from one natural language to another.
Example: Google Translator
5. Spelling correction
Word-processing software such as Microsoft Word and PowerPoint uses NLP for spelling correction.
6. Speech Recognition
Speech recognition is used for converting spoken words into text. It is used in applications, such as mobile, home
automation, video recovery, dictating to Microsoft Word, voice biometrics, voice user interface, and so on.
7. Chatbot
Implementing a chatbot is one of the important applications of NLP. Chatbots are used by many companies to
provide chat-based customer services.
Syntactic Processing
• Syntactic analysis is used to check grammar and word arrangement, and it shows the relationships among
the words. Researchers have developed a number of algorithms for syntactic analysis, but here we consider
only the following simple methods −
• Context-Free Grammar
• Top-Down Parser
Context-Free Grammar:
It is a grammar whose rewrite rules have a single non-terminal symbol on the left-hand side.
The parse tree breaks down the sentence into structured parts so that the computer can easily understand and
process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which describe what
tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols.
For example, if there are two constituents, a Noun Phrase (NP) and a Verb Phrase (VP), then the string formed
by an NP followed by a VP is a sentence. The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
Using these rules, a parse tree can be constructed for a sentence such as "The bird pecks the grains" (tree
diagram omitted).
Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks", sentences such as "The
bird peck the grains" are wrongly permitted, i.e., a subject-verb agreement error is accepted as correct.
Merit −
• It is the simplest style of grammar and is therefore widely used.
Demerits −
• They are not highly precise. For example, "The grains peck the bird" is syntactically correct according to
the parser, and even though it makes no sense, the parser accepts it as a correct sentence.
• To achieve high precision, multiple sets of grammar rules need to be prepared. Completely different rule
sets may be required for parsing singular and plural variations, passive sentences, etc., which can lead to a
huge, unmanageable set of rules.
Top-Down Parser:
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that
matches the classes of the words in the input sentence until it consists entirely of terminal symbols.
The generated terminals are then checked against the input sentence to see if they match. If not, the process
starts over again with a different set of rules. This is repeated until a rule sequence is found that describes the
structure of the sentence.
Merit −
• It is simple to implement.
Demerits −
• It can be inefficient: many candidate expansions may be tried and rejected through backtracking before the
correct structure is found, so parsing long sentences can be slow.
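A minimal sketch of this expand-and-match process in Python, using the rewrite rules from the Context-Free Grammar section above:

```python
# Top-down parsing sketch: start from S, rewrite symbols, match
# terminals against the input words, and backtrack on failure.

grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP":  [["V", "NP"]],
    "DET": [["a"], ["the"]],
    "ADJ": [["beautiful"], ["perching"]],
    "N":   [["bird"], ["birds"], ["grain"], ["grains"]],
    "V":   [["peck"], ["pecks"], ["pecking"]],
}

def parse(symbols, words):
    """True if the symbol sequence can rewrite to exactly the words."""
    if not symbols:
        return not words                       # success when both empty
    head, rest = symbols[0], symbols[1:]
    if head not in grammar:                    # terminal symbol
        return bool(words) and words[0] == head and parse(rest, words[1:])
    return any(parse(expansion + rest, words)  # try each rewrite rule
               for expansion in grammar[head])

print(parse(["S"], "the bird pecks the grains".split()))  # True
print(parse(["S"], "the bird the grains".split()))        # False
print(parse(["S"], "the bird peck the grains".split()))   # True --
# the subject-verb agreement error noted above is accepted
```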