DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
II YEAR / IV SEM
CS3491 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
UNIT II PROBABILISTIC REASONING
SYLLABUS:
Acting under uncertainty – Bayesian inference – Naïve Bayes models.
Probabilistic reasoning – Bayesian networks – exact inference in BN – approximate inference in BN – causal networks.
PART A
1. Define uncertainty and list the causes of uncertainty.
Uncertainty:
The knowledge representation A→B means that if A is true then B is true. But in a situation where we are not sure whether A is true or not, we cannot express this statement; this situation is called uncertainty.
So to represent uncertain knowledge, uncertain reasoning or probabilistic reasoning is used.
Causes of uncertainty:
1. Information occurred from unreliable sources.
2. Experimental errors
3. Equipment fault
4. Temperature variation
5. Climate change
7. In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percent of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes Mathematics and B be the event that a student likes English.
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57
Hence, 57% of the students who like English also like Mathematics.
11. Consider two events: A (it will rain tomorrow) and B (the sun will shine tomorrow).
Use Bayes' theorem to compute the posterior probability of each event occurring, given the resulting weather conditions for today:
P(A|sunny) = P(sunny|A) * P(A) / P(sunny)
P(B|sunny) = P(sunny|B) * P(B) / P(sunny)
where sunny is our evidence (the resulting weather condition for today).
PART B
1. Explain the concept of uncertainty and acting under uncertainty with a suitable example. Explain in detail about probabilistic reasoning.
UNCERTAINTY & PROBABILISTIC REASONING
Uncertainty:
Causes of uncertainty
Probabilistic reasoning:
Need of probabilistic reasoning in AI
Ways to solve problems with uncertain knowledge
Probability
Conditional probability
Example
Handling uncertain knowledge
In this section, we look more closely at the nature of uncertain knowledge. We will use a simple diagnosis example to illustrate the concepts involved. Diagnosis, whether for medicine, automobile repair, or whatever, is a task that almost always involves uncertainty. Let us try to write rules for dental diagnosis using first-order logic, so that we can see how the logical approach breaks down. Consider the following rule:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity)
The problem is that this rule is wrong. Not all patients with toothaches have cavities; some of them have gum disease, an abscess, or one of several other problems:
∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) ∨ Disease(p, GumDisease) ∨ Disease(p, Abscess) ...
Unfortunately, in order to make the rule true, we have to add an almost unlimited list of possible causes. We could try turning the rule into a causal rule:
∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache)
But this rule is not right either; not all cavities cause pain. The only way to fix the rule is to make it logically exhaustive: to augment the left-hand side with all the qualifications required for a cavity to cause a toothache. Even then, for the purposes of diagnosis, one must also take into account the possibility that the patient might have a toothache and a cavity that are unconnected. Trying to use first-order logic to cope with a domain like medical diagnosis thus fails for three main reasons:
Laziness: it is too much work to list the complete set of antecedents or consequents needed to ensure an exceptionless rule.
Theoretical ignorance: medical science has no complete theory for the domain.
Practical ignorance: even if we know all the rules, we might be uncertain about a particular patient because not all the necessary tests have been or can be run.
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance. We might not know for sure what afflicts a particular patient, but we believe that there is, say, an 80% chance - that is, a probability of 0.8 - that the patient has a cavity if he or she has a toothache.
That is, we expect that out of all the situations that are indistinguishable from the current situation as far as the agent's knowledge goes, the patient will have a cavity in 80% of them. This belief could be derived from statistical data - 80% of the toothache patients seen so far have had cavities - or from some general rules, or from a combination of evidence sources.
The 80% summarizes those cases in which all the factors needed for a cavity to cause a toothache are present and other cases in which the patient has both toothache and cavity but the two are unconnected. The missing 20% summarizes all the other possible causes of toothache that we are too lazy or ignorant to confirm or deny.
Design for a decision-theoretic agent
The below algorithm sketches the structure of an agent that uses decision theory to select actions. The agent is identical, at an abstract level, to the logical agent. The primary difference is that the decision-theoretic agent's knowledge of the current state is uncertain; the agent's belief state is a representation of the probabilities of all possible actual states of the world. As time passes, the agent accumulates more evidence and its belief state changes. Given the belief state, the agent can make probabilistic predictions of action outcomes and hence select the action with highest expected utility.
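A minimal Python sketch of this agent loop (not part of the original notes; update_belief, outcome_probs, and utility are hypothetical helpers standing in for a real probabilistic model):

def decision_theoretic_agent(percept, belief_state, actions,
                             update_belief, outcome_probs, utility):
    """Select the action with the highest expected utility."""
    # Accumulate evidence: revise the belief state with the new percept.
    belief_state = update_belief(belief_state, percept)

    best_action, best_eu = None, float("-inf")
    for action in actions:
        # Probabilistic prediction of outcomes for this action:
        # outcome_probs returns (probability, outcome_state) pairs.
        eu = sum(p * utility(outcome)
                 for p, outcome in outcome_probs(belief_state, action))
        if eu > best_eu:
            best_action, best_eu = action, eu
    return best_action, belief_state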
Uncertainty:
The knowledge representation A→B means that if A is true then B is true, but in a situation where we are not sure whether A is true or not, we cannot express this statement; this situation is called uncertainty.
So to represent uncertain knowledge, uncertain reasoning or probabilistic reasoning is used.
Causes of uncertainty:
1. Information occurred from unreliable sources.
2. Experimental errors
3. Equipment fault
4. Temperature variation
5. Climate change
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation in which the concept of probability is applied to indicate the uncertainty in knowledge.
Need of probabilistic reasoning in AI:
o When there are unpredictable outcomes.
o When specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.
Ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
Probability:
Probability can be defined as a chance that an uncertain event will occur.
The value of probability always remains between 0 and 1 that represent ideal uncertainties.
o 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
o P(A) = 0, indicates total uncertainty in an event A.
o P(A) = 1, indicates total certainty in an event A.
Formula to find the probability of an uncertain event:
P(A) = Number of desired outcomes / Total number of outcomes
o Event: Each possible outcome of a variable is called an event.
o Sample space: The collection of all possible events is called sample space.
o Random variables: Random variables are used to represent the events and objects in the real world.
o Prior probability: The prior probability of an event is probability computed before observing new information.
o Posterior Probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of prior probability and new information.
Conditional probability:
Conditional probability is the probability of an event occurring when another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the conditions of B"; it is:
P(A|B) = P(A⋀B) / P(B)
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percent of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes Mathematics and B be the event that a student likes English.
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 0.57
Hence, 57% of the students who like English also like Mathematics.
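A quick Python check of this worked example, using the formula P(A|B) = P(A⋀B) / P(B) with the numbers given above:

p_english = 0.70           # P(B): student likes English
p_english_and_math = 0.40  # P(A and B): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(f"P(A|B) = {p_math_given_english:.2%}")  # -> 57.14%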
2. Explain in detail about Bayesian inference and Naive Bayes Model or Naive Bayes Theorem or Bayes Rule.
Naive Bayes Model or Naive Bayes Theorem or Bayes Rule
Bayesian Inference
Bayes Theorem or Bayes Rule
Example - Applying Bayes' rule
Application of Bayes' theorem in Artificial intelligence
Bayesian Inference
Bayesian inference is a probabilistic approach to machine learning that provides estimates of the probability of specific events.
Bayesian inference is a statistical method for understanding the uncertainty inherent in prediction problems.
The Bayesian inference algorithm can be viewed as a Markov Chain Monte Carlo algorithm that uses prior probability distributions to optimize the likelihood function.
The basis of Bayesian inference is the notion of a priori and a posteriori probabilities.
o The a priori probability is the probability of an event before any evidence is considered.
o The a posteriori probability is the probability of an event after taking into account all available evidence.
For example, if we want to know the probability that it will rain tomorrow, our a priori probability would be based on our knowledge of the weather patterns in our area.
Bayes Theorem or Bayes Rule
Bayes' theorem can be derived using the product rule and conditional probability of event A with known event B:
Product Rule:
P(A⋀B) = P(A|B) P(B) or
P(A⋀B) = P(B|A) P(A)
Conditional Probability:
Let A and B be events,
P(A|B) is the conditional probability of A given B,
P(B|A) is the conditional probability of B given A.
Equating the right hand sides of both equations gives:
P(A|B) = P(B|A) P(A) / P(B)
Hence the Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(B|Ak) P(Ak)
where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.
Example 1 - Applying Bayes' rule:
Suppose we want to perceive the effect of some unknown cause, and want to compute that cause; then the Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Question: What is the probability that a patient has the disease meningitis with a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck, and it occurs 80% of the time. He is also aware of some more facts, which are given as follows:
The known probability that a patient has meningitis disease is 1/30,000.
The known probability that a patient has a stiff neck is 2%.
Solution:
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis.
So, calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133 ≈ 1/750
Hence, we can assume that 1 patient out of 750 patients has meningitis disease with a stiff neck.
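The same computation in Python, directly from the numbers given in the example:

# Verifying the meningitis example with Bayes' rule:
# P(b|a) = P(a|b) * P(b) / P(a)
p_stiff_given_men = 0.8    # P(a|b)
p_meningitis = 1 / 30000   # P(b)
p_stiff_neck = 0.02        # P(a)

p_men_given_stiff = p_stiff_given_men * p_meningitis / p_stiff_neck
print(p_men_given_stiff)      # 0.00133..., i.e. 1/750
print(1 / p_men_given_stiff)  # ~750 patients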
Example 2 - Applying Bayes' rule:
Consider two events: A (it will rain tomorrow) and B (the sun will shine tomorrow).
Use Bayes' theorem to compute the posterior probability of each event occurring, given the resulting weather conditions for today:
P(A|sunny) = P(sunny|A) * P(A) / P(sunny)
P(B|sunny) = P(sunny|B) * P(B) / P(sunny)
where sunny is our evidence (the resulting weather condition for today).
From these equations,
o if event A is more likely to result in sunny weather than event B, then the posterior probability of A occurring, given that the resulting weather condition for today is sunny, will be higher than the posterior probability of B occurring.
o Conversely, if event B is more likely to result in sunny weather than event A, then the posterior probability of B occurring, given that the resulting weather condition for today is sunny, will be higher than the posterior probability of A occurring.
Application of Bayes' theorem in Artificial intelligence:
It is used to calculate the next step of the robot when the already executed step is given.
Bayes' theorem is helpful in weather forecasting.
Naive Bayes Theorem
The dentistry example illustrates a commonly occurring pattern in which a single cause directly influences a number of effects, all of which are conditionally independent, given the cause. The full joint distribution can be written as
P(Cause, Effect1, ..., Effectn) = P(Cause) Πi P(Effecti | Cause)
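A minimal naive Bayes sketch built on this factorization. The prior and per-effect conditional values below are made-up numbers for illustration only, not figures from the notes:

import math

prior = {"cavity": 0.2, "no_cavity": 0.8}       # P(Cause): assumed values
cond = {                                        # P(effect=true | cause): assumed
    "cavity":    {"toothache": 0.6, "catch": 0.9},
    "no_cavity": {"toothache": 0.1, "catch": 0.2},
}

def posterior(observed_effects):
    """P(cause | effects) via P(cause) * prod_i P(effect_i | cause), normalized."""
    scores = {}
    for cause in prior:
        log_p = math.log(prior[cause])
        for effect, value in observed_effects.items():
            p = cond[cause][effect]
            log_p += math.log(p if value else 1.0 - p)
        scores[cause] = math.exp(log_p)
    z = sum(scores.values())                    # normalization constant
    return {c: s / z for c, s in scores.items()}

print(posterior({"toothache": True, "catch": True}))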
3. Explain in detail about Bayesian Network
Bayesian Network
Bayesian Network
Joint probability distribution
Constructing Bayesian Network
Example
The semantics of Bayesian Network
Applications of Bayesian networks in AI
Bayesian Network
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian Network can be used for building models from data and experts' opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities
The generalized form of Bayesian network that represents and solves decision problems under uncertain knowledge is known as an Influence diagram.
It is used to represent conditional dependencies.
It can also be used in various tasks including prediction, anomaly detection, diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.
A Bayesian network graph is made up of nodes and Arcs (directed links).
Figure 2.1 – Example for Bayesian Network
Each node corresponds to the random variables, and a variable can be continuous or discrete.
Joint probability distribution:
If variables are x1, x2, x3, ..., xn, then the probabilities of a different combination of x1, x2, x3, ..., xn are known as the joint probability distribution.
P[x1, x2, x3, ..., xn] can be written in the following way in terms of the joint probability distribution:
= P[x1|x2, x3, ..., xn] · P[x2, x3, ..., xn]
= P[x1|x2, x3, ..., xn] · P[x2|x3, ..., xn] ... P[xn-1|xn] · P[xn]
In general, for each variable Xi:
P(Xi|Xi-1, ..., X1) = P(Xi|Parents(Xi))
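A small Python sketch of this factorization, P(x1, ..., xn) = Πi P(xi | parents(Xi)). The network data structure and the Fire/Smoke numbers below are illustrative assumptions, not a library API:

def joint_probability(network, order, assignment):
    """Multiply each variable's CPT entry given its parents' values.
    network: var -> (parent list, CPT keyed by tuple of parent values,
    giving P(var=True | parents)). order must be topological."""
    p = 1.0
    for var in order:  # parents are assigned before their children
        parents, cpt = network[var]
        parent_vals = tuple(assignment[q] for q in parents)
        p_true = cpt[parent_vals]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# Example: Fire -> Smoke with made-up numbers.
net = {"Fire": ([], {(): 0.01}),
       "Smoke": (["Fire"], {(True,): 0.9, (False,): 0.05})}
print(joint_probability(net, ["Fire", "Smoke"],
                        {"Fire": True, "Smoke": True}))  # 0.01 * 0.9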
Constructing Bayesian Network
Global Semantics
The global semantics defines the full joint distribution as the product of the local conditional distributions: P(x1, ..., xn) = Πi P(xi | parents(Xi)).
Local Semantics
The local semantics states that each node is conditionally independent of its nondescendants given its parents.
Markov Blanket
Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents.
Example:
Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds at detecting a burglary but also responds for minor earthquakes. Harry has two neighbors David and Sophia, who have taken a responsibility to inform Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls at that time too. On the other hand, Sophia likes to listen to loud music, so sometimes she misses hearing the alarm. Here we would like to compute the probability of the Burglary Alarm.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and David and Sophia both called Harry.
Solution:
The Bayesian network for the above problem is given in figure 2.2. The network structure shows that burglary and earthquake are the parent nodes of the alarm and directly affect the probability of the alarm's going off, but David's and Sophia's calls depend on alarm probability.
Variables:
Burglary, Earthquake, Alarm, DavidCalls, SophiaCalls
Network topology reflects "causal" knowledge:
A burglar can set the alarm off
An earthquake can set the alarm off
The alarm can cause David to call
The alarm can cause Sophia to call
Figure 2.2 - The Bayesian network for the example problem
All events occurring in this network:
o Burglary (B)
o Earthquake (E)
o Alarm (A)
o David Calls (D)
o Sophia Calls (S)
Write the events of the problem statement in the form of probability: P[D, S, A, ¬B, ¬E].
Rewrite the probability statement using the joint probability distribution:
P[D, S, A, ¬B, ¬E] = P[D|A] · P[S|A] · P[A|¬B, ¬E] · P[¬B] · P[¬E]
Let's take the observed probability for the Burglary and earthquake components:
P(B=True) = 0.002, which is the probability of burglary.
P(B=False) = 0.998, which is the probability of no burglary.
P(E=True) = 0.001, which is the probability of a minor earthquake.
P(E=False) = 0.999, which is the probability that an earthquake has not occurred.
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake:
B | E | P(A=True) | P(A=False)
Conditional probability table for David Calls:
The conditional probability that David will call depends on the probability of the Alarm.
A | P(D=True) | P(D=False)
Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node "Alarm":
A | P(S=True) | P(S=False)
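A Python sketch of the queried joint probability, following the factorization P(D, S, A, ¬B, ¬E) = P(D|A) P(S|A) P(A|¬B,¬E) P(¬B) P(¬E). The CPT rows above are blank in these notes, so the alarm/David/Sophia values below are illustrative placeholders; only P(B) and P(E) come from the text:

P_B = 0.002                      # P(Burglary = true), from the notes
P_E = 0.001                      # P(Earthquake = true), from the notes
P_A = {(True, True): 0.94,       # P(Alarm=true | B, E): assumed values
       (True, False): 0.95,
       (False, True): 0.31,
       (False, False): 0.001}
P_D = {True: 0.91, False: 0.05}  # P(DavidCalls=true | Alarm): assumed
P_S = {True: 0.75, False: 0.02}  # P(SophiaCalls=true | Alarm): assumed

# P(D, S, A, ~B, ~E) = P(D|A) P(S|A) P(A|~B,~E) P(~B) P(~E)
p = (P_D[True] * P_S[True] * P_A[(False, False)]
     * (1 - P_B) * (1 - P_E))
print(p)  # ~0.00068 with these placeholder numbers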
The semantics of Bayesian Network:
There are two ways to understand the semantics of the Bayesian network, which are given below:
1. To understand the network as the representation of the joint probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence statements.
It is helpful in designing inference procedures.
Applications of Bayesian networks in AI
Bayesian networks find applications in a variety of tasks such as:
1. Spam filtering:
a. A spam filter is a program that helps in detecting unsolicited and spam mails. Bayesian spam filters check whether a mail is spam or not.
2. Biomonitoring:
a. This involves the use of indicators to quantify the concentration of chemicals in the human body.
3. Information retrieval:
a. Bayesian networks assist in information retrieval for research, which is a constant process of extracting information from databases.
4. Image processing:
a. A form of signal processing, image processing uses mathematical operations to convert images into digital format.
5. Gene regulatory network:
a. A Bayesian network is an algorithm that can be applied to gene regulatory networks in order to make predictions about the effects of genetic variations on cellular phenotypes.
b. Gene regulatory networks are a set of mathematical equations that describe the interactions between genes, proteins, and metabolites.
c. They are used to study how genetic variations affect the development of a cell or organism.
6. Turbo code:
a. Turbo codes are a type of error correction code capable of achieving very high data rates and long distances between error correcting nodes in a communications system.
b. They have been used in satellites, space probes, deep-space missions, military communications systems, and civilian wireless communication systems, including WiFi and 4G LTE cellular telephone systems.
7. Document classification:
a. The main issue is to assign a document multiple classes. The task can be achieved manually and algorithmically. Since manual effort takes too much time, algorithmic documentation is done to complete it quickly and effectively.
4. Explain in detail about Bayesian Inference and its type Exact Inference with a suitable example.
Exact inference in Bayesian networks
The basic task for any probabilistic inference system is to compute the posterior probability distribution for a set of query variables, given some observed event - that is, some assignment of values to a set of evidence variables. We will use
the notation X to denote the query variable; E denotes the set of evidence variables E1, ..., Em, and e is a particular observed event; Y will denote the nonevidence variables Y1, ..., Yl (sometimes called the hidden variables). Thus, the complete set of variables X = {X} ∪ E ∪ Y. A typical query asks for the posterior probability distribution P(X|e).
Inference by enumeration
Conditional probability can be computed by summing terms from the full joint distribution. More specifically, a query P(X|e) can be answered using the equation, which we repeat here for convenience:
P(X|e) = α P(X, e) = α Σy P(X, e, y)
That is, the chance of a burglary, given calls from both neighbors, is about 28%. The evaluation process for the expression in the equation is shown as an expression tree in the figure.
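A Python enumeration sketch for P(Burglary | JohnCalls=true, MaryCalls=true). The figure with the CPTs is not reproduced in these notes, so the numbers below are the standard textbook values, taken as an assumption here; they reproduce the "about 28%" answer:

import itertools

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls=true | Alarm)

def p(prob_true, value):
    return prob_true if value else 1.0 - prob_true

posterior = {}
for b in (True, False):
    total = 0.0
    # Sum over the hidden variables Earthquake and Alarm.
    for e, a in itertools.product((True, False), repeat=2):
        total += (p(P_B, b) * p(P_E, e) * p(P_A[(b, e)], a)
                  * P_J[a] * P_M[a])
    posterior[b] = total

z = sum(posterior.values())  # the normalization constant alpha
print(posterior[True] / z)   # ~0.284, i.e. about 28%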
The variable elimination algorithm
The enumeration algorithm can be improved substantially by eliminating repeated calculations of the kind illustrated in the figure. The idea is simple: do the calculation once and save the results for later use. This is a form of dynamic programming. There are several versions of this approach; we present the variable elimination algorithm, which is the simplest. Variable elimination works by evaluating expressions such as the equation in right-to-left order (that is, bottom-up in the figure). Intermediate results are stored, and summations over each variable are done only for those portions of the expression that depend on the variable. Let us illustrate this process for the burglary network. We evaluate the expression:
P(B|j, m) = α P(B) Σe P(e) Σa P(a|B, e) P(j|a) P(m|a)
The complexity of exact inference
The burglary network of the figure belongs to the family of networks in which there is at most one undirected path between any two nodes in the network. These are called singly connected networks or polytrees, and they have a particularly nice property: the time and space complexity of exact inference in polytrees is linear in the size of the network. Here, the size is defined as the number of CPT entries; if
the number of parents of each node is bounded by a constant, then the complexity will also be linear in the number of nodes. These results hold for any ordering consistent with the topological ordering of the network.
5. Explain Causal Network or Causal Bayesian Network in Machine Learning
Causal Network or Causal Bayesian Network
A causal network is an acyclic digraph arising from an evolution of a substitution system, and representing its history.
In an evolution of a multiway system, each substitution event is a vertex in a causal network.
Two events which are related by causal dependence, meaning one occurs just before the other, have an edge between the corresponding vertices in the causal network.
More precisely, the edge is a directed edge leading from the past event to the future event.
Refer Figure 2.3 for an example causal network.
A CBN is a graph formed by nodes representing random variables, connected by links denoting causal influence.
Figure 2.3 – Causal Network Example
Some causal networks are independent of the choice of evolution, and these are called causally invariant.
Structural Causal Models (SCMs)
SCMs consist of two parts: a graph, which visualizes causal connections, and equations, which express the details of the connections. A graph is a mathematical construction that consists of vertices (nodes) and edges (links).
SCMs use a special kind of graph, called a Directed Acyclic Graph (DAG), for which all edges are directed and no cycles exist.
DAGs are a common starting place for causal inference.
Bayesian and causal networks are completely identical. However, the difference lies in their interpretations.
Fire → Smoke
A network with 2 nodes (fire icon and smoke icon) and 1 edge (arrow pointing from fire to smoke).
This network can be both a Bayesian and a causal network.
The key distinction, however, is when interpreting this network.
For a Bayesian network, we view the nodes as variables and the arrow as a conditional probability, namely the probability of smoke given information about fire.
When interpreting this as a causal network, we still view nodes as variables; however, the arrow indicates a causal connection.
In this case, both interpretations are valid. However, if we were to flip the
edge direction, the causal network interpretation would be invalid, since
smoke does not cause fire.
Implementing Causal Inference
1. The do-operator
The do-operator is a mathematical representation of a physical intervention.
If the model starts with Z → X → Y, we simulate an intervention on X by deleting all the incoming arrows to X and manually setting X to some value x_0. Refer Figure 2.4 for an example of the do-operator.
Figure 2.4 – do-operator Example
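A simulation sketch of this graph surgery on Z → X → Y. The structural equations (coefficients and noise terms) are made up purely for illustration; the point is that the intervention cuts the Z → X arrow and pins X:

import random

def observe():
    z = random.gauss(0, 1)
    x = 2.0 * z + random.gauss(0, 0.1)   # X listens to its parent Z
    y = 3.0 * x + random.gauss(0, 0.1)
    return x, y

def intervene(x0):
    z = random.gauss(0, 1)               # Z still varies...
    x = x0                               # ...but the Z -> X arrow is deleted:
    y = 3.0 * x + random.gauss(0, 0.1)   # X is pinned to x0 by the intervention
    return x, y

ys_obs = [observe()[1] for _ in range(10000)]     # Y when X follows Z
ys_do = [intervene(1.0)[1] for _ in range(10000)] # Y under do(X = 1)
print(sum(ys_obs) / len(ys_obs))  # ~0, with large spread (X varies with Z)
print(sum(ys_do) / len(ys_do))    # ~3.0: Y concentrates around 3 * x0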
2. Confounding
A simple example of confounding is shown in Figure 2.5 below.
Figure 2.5 – Confounding Example
3. Estimating Causal Effects
Treatment effect = (Outcome under E) − (Outcome under C), that is, the difference between the outcome a child would receive if assigned to treatment E and the outcome that same child would receive if assigned to treatment C. These are called potential outcomes.
6. Explain approximate inference in Bayesian network (BN)
Direct sampling methods
The primitive element in any sampling algorithm is the generation of samples from a known probability distribution. For example, an unbiased coin can be thought of as a random variable Coin with values ⟨heads, tails⟩ and a prior distribution P(Coin) = ⟨0.5, 0.5⟩. Sampling from this distribution is exactly like flipping the coin: with probability 0.5 it will return heads, and with probability 0.5 it will return tails.
Given a source of random numbers r uniformly distributed in the range [0,1], it is a simple matter to sample any distribution on a single variable, whether discrete or continuous. This is done by constructing the cumulative distribution for the variable and returning the first value whose cumulative probability exceeds r.
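A direct Python sketch of this cumulative-distribution procedure for a discrete variable:

import random

def sample_discrete(distribution):
    """distribution: list of (value, probability) pairs summing to 1.
    Returns the first value whose cumulative probability exceeds r."""
    r = random.random()          # r uniform in [0, 1)
    cumulative = 0.0
    for value, prob in distribution:
        cumulative += prob
        if r < cumulative:
            return value
    return distribution[-1][0]   # guard against floating-point round-off

print(sample_discrete([("heads", 0.5), ("tails", 0.5)]))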
We begin with a random sampling process for a Bayes net that has no evidence associated with it. The idea is to sample each variable in turn, in topological order. The probability distribution from which the value is sampled is conditioned on the values already assigned to the variable's parents. (Because we sample in topological order, the parents are guaranteed to have values already.) This algorithm is shown in the figure. Applying it to the network with the ordering Cloudy, Sprinkler, Rain, WetGrass, we might produce a random event as follows:
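A prior-sampling sketch in Python for this Cloudy/Sprinkler/Rain/WetGrass network. The figure with the CPTs is not reproduced in these notes, so the numbers below are the usual textbook values, taken here as an assumption:

import random

def prior_sample():
    # Variables are sampled in topological order, so each parent
    # already has a value when its child is sampled.
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.00}[(sprinkler, rain)]
    wet_grass = random.random() < p_wet
    return {"Cloudy": cloudy, "Sprinkler": sprinkler,
            "Rain": rain, "WetGrass": wet_grass}

print(prior_sample())  # one random event from the prior distribution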
RejectionsamplinginBayesiannetworks
Rejectionsampling isa general methodforproducing samplesfroma hard- to-
sampledistributiongivenaneasy-to-sampledistribution.Initssimplestform,it can be
used to compute conditional probabilities thatis, to determineP(X|e).The
REJECTION-SAMPLINGalgorithmisshowninFigure.First,itgeneratessamples
from the prior distribution specified by the network. Then, it rejects all those that
donotmatchtheevidence.Finally,theestimateˆP(X=x|e)isobtainedbycounting how
often X=xoccursin the remainingsamples.
Let ˆP(X|e) be the estimated distribution that the algorithm returns; this distribution is computed by normalizing N_PS(X, e), the vector of sample counts for each value of X where the sample agrees with the evidence e:
ˆP(X|e) = α N_PS(X, e) = N_PS(X, e) / N_PS(e)
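A rejection-sampling sketch in Python, reusing the prior_sample() function from the sketch above (same assumed CPT numbers):

def rejection_sample(query, evidence, n=100000):
    """Estimate P(query | evidence) by discarding inconsistent samples."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        # Keep only samples that agree with the evidence e.
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[query]] += 1
    total = counts[True] + counts[False]
    return {v: c / total for v, c in counts.items()}  # normalize N(X, e)

print(rejection_sample("Rain", {"Sprinkler": True}))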
Inference by Markov chain simulation
In this section, we describe the Markov chain Monte Carlo (MCMC) algorithm for inference in Bayesian networks. We will first describe what the algorithm does, then we will explain why it works and why it has such a complicated name.
The MCMC algorithm
Consider the query P(Rain | Sprinkler=true, WetGrass=true) applied to the network in the figure. The evidence variables Sprinkler and WetGrass are fixed to their observed values and the hidden variables Cloudy and Rain are initialized randomly - let us say to true and false respectively. Thus, the initial state is [true, true, false, true]. Now the following steps are executed repeatedly:
1. Cloudy is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Cloudy | Sprinkler=true, Rain=false). Suppose the result is Cloudy=false. The new current state is [false, true, false, true].
2. Rain is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Rain | Cloudy=false, Sprinkler=true, WetGrass=true). Suppose this yields Rain=true. The new current state is [false, true, true, true].
Each state visited during this process is a sample that contributes to the estimate for the query variable Rain. If the process visits 20 states where Rain is true and 60 states where Rain is false, then the answer to the query is NORMALIZE(⟨20, 60⟩) = ⟨0.25, 0.75⟩.
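A Gibbs-sampling sketch of this process in Python, using the same assumed CPT numbers as the prior_sample() sketch earlier; each visited state is counted as a sample for Rain:

import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}            # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}            # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.00}  # P(WetGrass=true | S, R)

def bern(p):
    return random.random() < p

def gibbs_rain(n=100000):
    sprinkler, wet = True, True          # evidence, held fixed throughout
    cloudy, rain = bern(0.5), bern(0.5)  # hidden variables start random
    rain_true = 0
    for _ in range(n):
        # Sample Cloudy from its Markov blanket: P(c|s,r) ∝ P(c) P(s|c) P(r|c)
        def pc(c):
            ps = P_S[c] if sprinkler else 1 - P_S[c]
            pr = P_R[c] if rain else 1 - P_R[c]
            return (P_C if c else 1 - P_C) * ps * pr
        cloudy = bern(pc(True) / (pc(True) + pc(False)))
        # Sample Rain from its Markov blanket: P(r|c,s,w) ∝ P(r|c) P(w|s,r)
        def pr(r):
            base = P_R[cloudy] if r else 1 - P_R[cloudy]
            pw = P_W[(sprinkler, r)]
            return base * (pw if wet else 1 - pw)
        rain = bern(pr(True) / (pr(True) + pr(False)))
        rain_true += rain                # every visited state is a sample
    return rain_true / n

print(gibbs_rain())  # ~0.32 with these assumed numbers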