System Identification
Sometimes called individualization/personalization.

Simple Medical Decision Making
First simple applications
• Probability Basics
• Medical decision making (simple approach)
• Therapeutic decision making (applied utility theory)
Probability Basics
Probabilistic assertions summarize effects of
– laziness: failure to enumerate exceptions, qualifications, etc.
– ignorance: lack of relevant facts, initial conditions, etc.
Subjective probability:
• Probabilities relate propositions to agent's own state of
knowledge
e.g., P(A25 | no reported accidents) = 0.06
– 0 ≤ P(A) ≤ 1
– P(true) = 1 and P(false) = 0
– P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Prior probability
• Prior or unconditional probabilities of propositions
e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72 correspond to belief prior to arrival
of any (new) evidence
• P(toothache ∨ cavity) =
0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.008 = 0.28
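The sum above can be checked by enumerating the full joint distribution. A minimal sketch, assuming the standard three-variable Toothache/Catch/Cavity joint table (an assumption; its entries match the six addends above and sum to 1):

```python
# Full joint distribution over worlds (cavity, toothache, catch).
# The six entries with "toothache or cavity" are the addends above.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    """Sum the joint entries of all worlds in which `event` holds."""
    return sum(p for world, p in joint.items() if event(*world))

p = prob(lambda cavity, toothache, catch: toothache or cavity)
print(round(p, 3))  # 0.28
```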
Inference by enumeration
• Start with the joint probability distribution:
[Table: full joint distribution over Toothache, Catch, Cavity]
Conditional independence
• Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Conditional independence contd.
• Write out full joint distribution using chain rule:
P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
• or in distribution form
P(Y|X) = P(X|Y) P(Y) / P(X) = αP(X|Y) P(Y)
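The factorization derived above can be verified numerically. A sketch, assuming the standard Toothache/Catch/Cavity joint values (the table itself is an assumption, not shown on this slide):

```python
# Joint distribution over worlds (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    return sum(p for world, p in joint.items() if event(*world))

p_cavity = prob(lambda c, t, k: c)                       # P(cavity) = 0.2
p_t_given_c = prob(lambda c, t, k: c and t) / p_cavity   # P(toothache | cavity)
p_k_given_c = prob(lambda c, t, k: c and k) / p_cavity   # P(catch | cavity)

lhs = prob(lambda c, t, k: c and t and k)                # P(toothache, catch, cavity)
rhs = p_t_given_c * p_k_given_c * p_cavity
print(lhs, rhs)  # both ≈ 0.108
```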
Where do features come from?
[Figure: patient data → feature f(blood pressure) → diagnosis/classification decision at the practitioner]
Decision: + if f > Θ, − if f ≤ Θ

           Decision +   Decision −
Truth +       TP           FN
Truth −       FP           TN

TP = True Positive
FP = False Positive
TN = True Negative
FN = False Negative
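From such a 2×2 table the usual quality measures follow directly. A minimal sketch (the function name and the TP/FN counts are illustrative assumptions; only FP = 29 and TN = 73 appear on the following slides):

```python
def quality_measures(tp, fn, fp, tn):
    """Standard quality measures derived from a 2x2 confusion matrix."""
    return {
        "TPR (sensitivity)": tp / (tp + fn),
        "TNR (specificity)": tn / (tn + fp),
        "FPR":               fp / (fp + tn),
        "PPV (precision)":   tp / (tp + fp),
        "NPV":               tn / (tn + fn),
        "accuracy":          (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical counts for the Truth = + row; FP = 29, TN = 73 are from the slides.
m = quality_measures(tp=46, fn=8, fp=29, tn=73)
print(round(m["TNR (specificity)"], 3))  # 73 / 102 ≈ 0.716
```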
Simple Decision Model

Gallbladder sonography:

           Test +     Test −     Sum
Truth +      TP         FN
Truth −    FP = 29    TN = 73    102
Quality measures
Cholecystitis sonography:

           Test +     Test −     Sum
Truth +      TP         FN
Truth −    FP = 29    TN = 73    102
odds = p / (1 − p)

p = odds / (1 + odds)
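The two conversions can be sketched directly (the function names are illustrative):

```python
def prob_to_odds(p):
    """Convert a probability to odds: odds = p / (1 - p)."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

print(prob_to_odds(0.75))  # 3.0
print(odds_to_prob(3.0))   # 0.75
```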
Likelihood ratio

p[D | R] / p[−D | R] = p[D] / p[−D] × p[R | D] / p[R | −D]

i.e., posttest odds = pretest odds × likelihood ratio.
Example
• Calculate posttest probability for a positive exercise test
(TPR=0.65; FPR=0.2) of a 60 year old man whose
pretest probability is 0.75.
• Pretest odds =…
• LR+=…
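The worked answer can be sketched as follows (all numbers come from the slide; nothing else is assumed):

```python
# Positive exercise test: TPR = 0.65, FPR = 0.2; pretest probability 0.75.
tpr, fpr, p_pre = 0.65, 0.2, 0.75

pretest_odds = p_pre / (1 - p_pre)      # 0.75 / 0.25 = 3.0
lr_plus = tpr / fpr                     # 0.65 / 0.2 = 3.25
posttest_odds = pretest_odds * lr_plus  # 9.75
p_post = posttest_odds / (1 + posttest_odds)
print(round(p_post, 3))  # 0.907
```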
Implications
TPR=0.9
TNR=0.9
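With TPR = TNR = 0.9 (so LR+ = 0.9/0.1 = 9), the posttest probability still depends strongly on the pretest probability. A small sketch of that implication (the chosen pretest values are illustrative):

```python
tpr, tnr = 0.9, 0.9
lr_plus = tpr / (1 - tnr)  # = 9.0

def posttest(p_pre):
    """Posttest probability via odds: posttest odds = pretest odds * LR+."""
    odds = p_pre / (1 - p_pre) * lr_plus
    return odds / (1 + odds)

# Even a test with 90% sensitivity and specificity gives a low posttest
# probability when the pretest probability (prevalence) is low.
for p_pre in (0.01, 0.1, 0.5):
    print(p_pre, round(posttest(p_pre), 3))
```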
• Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents:
P(Xi | Parents(Xi))
• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution
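The linear-vs-exponential claim can be illustrated by counting CPT entries, assuming Boolean variables and, as an example, the standard five-node burglary network (Burglary, Earthquake, Alarm, JohnCalls, MaryCalls — an assumption; these slides introduce it only implicitly):

```python
# Parents of each node in the burglary network.
parents = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

# A Boolean node needs one independent number per parent configuration.
cpt_entries = sum(2 ** len(ps) for ps in parents.values())
full_joint = 2 ** len(parents) - 1  # independent numbers in the full joint

print(cpt_entries, full_joint)  # 10 vs 31
```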
Example

Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Example contd.
• Approximate Inference
– Direct sampling
– Rejection sampling
– Likelihood weighting
– Markov Chain Monte Carlo (MCMC)
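One of these methods can be sketched compactly. A likelihood-weighting sketch for the burglary network, estimating P(B | j, m); the CPT numbers below are the standard textbook values and are an assumption, not given on these slides:

```python
import random

# Assumed textbook CPTs for the burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm)

def likelihood_weighting(n, rng):
    """Estimate P(Burglary=true | JohnCalls=true, MaryCalls=true)."""
    num = den = 0.0
    for _ in range(n):
        b = rng.random() < P_B
        e = rng.random() < P_E
        a = rng.random() < P_A[(b, e)]
        w = P_J[a] * P_M[a]  # weight each sample by the fixed evidence j, m
        num += w * b
        den += w
    return num / den

print(likelihood_weighting(200_000, random.Random(0)))  # exact posterior ≈ 0.284
```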
Typical Inference Tasks
Inference by enumeration revisited

Usually, our interest is in P(X | e),
the posterior distribution of a query variable X
given specific values e for the evidence variables E.
Let the remaining hidden (nonevidence) variables be Y.
• But:
for polytrees, i.e. networks in which there is at most one
undirected path between any two nodes,
time and space complexity is linear in n.
The Bad News
• Exact inference is feasible only in small to
medium-sized networks; in general it is
computationally intractable.
Query: P(m|a)
Try n=10, n=100 and n=1000 samples
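The sample-size experiment can be sketched with rejection sampling, again assuming the standard textbook CPTs for the burglary network (an assumption; by the CPT, the exact answer is P(M=true | Alarm=true) = 0.7):

```python
import random

# Assumed textbook CPTs for the burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls=true | Alarm)

def rejection_sample_m_given_a(n, rng):
    """Estimate P(MaryCalls=true | Alarm=true) by rejection sampling."""
    accepted = hits = 0
    for _ in range(n):
        b = rng.random() < P_B
        e = rng.random() < P_E
        a = rng.random() < P_A[(b, e)]
        if not a:          # reject samples inconsistent with the evidence a
            continue
        accepted += 1
        hits += rng.random() < P_M[a]
    return hits / accepted if accepted else None

rng = random.Random(42)
for n in (10, 100, 1000):
    print(n, rejection_sample_m_given_a(n, rng))
```

Because P(a) ≈ 0.0025, almost all samples are rejected: for small n the estimate is often undefined (`None`), and even at n = 1000 only a handful of samples survive, which is exactly the weakness of rejection sampling this experiment is meant to expose.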