Information Retrieval: Venkatesh Vinayakarao


https://vvtesh.sarahah.com/

Information Retrieval
Venkatesh Vinayakarao
Term: Aug – Sep, 2019
Chennai Mathematical Institute

So much of life, it seems to me, is determined by pure randomness.


– Sidney Poitier.
God does not play dice with the universe.
– Einstein.

Love Tarun Venkatesh Vinayakarao (Vv)


The Law – Robert M. Coates
From the book, “The World of Mathematics – Volume IV”.

Triborough Bridge (aka Robert F. Kennedy Bridge), NY, USA. And then, one day…

Late 1940s, NY: No other bridge or highway was affected, and though the two preceding nights had been equally balmy and moonlit, on both of these the bridge traffic had run close to normal.

It just looked as if everybody in Manhattan who owned a car had decided to drive out to Long Island that evening.

No Reason!

Sergeant: “I kept askin’ them,” he said. “Is there night football somewhere that we don’t know about? Is it the races you’re goin’ to?”

But the funny thing was, half the time they’d be askin’ me. “What’s the crowd for, Mac?” they would say. And I’d just look at them.
If normal things stop happening, if
we lose regularities in life, our
planet could become unlivable!
Time for Action
• At this juncture, it was inevitable that Congress should
be called on for action.

• A senator said, “You can control it.” Re-education and reforms were decided upon. He said we need to lead people back to “the basic regularities, the homely averageness of the American way of life.”
The Law of Large Numbers
Known as the fundamental theorem of probability:

The average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
Expectation
• Roll a die. Assume you may see 1 to 6 with equal probability.
• Expected Value = ?

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) is likely to be close to 3.5, with the precision increasing as more dice are rolled. – Wikipedia.
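The law is easy to check empirically. A minimal simulation sketch (Python, standard library only; the function name is my own):

```python
import random

def sample_mean(n_rolls, seed=0):
    """Average of n_rolls of a fair six-sided die."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

# The sample mean drifts toward the expected value 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
```

With small n the average swings widely; by 100,000 rolls it sits very close to 3.5.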
A Minor Digression
• What is the fundamental theorem of algebra?
Quiz
• What is the fundamental theorem of algebra?
• Loosely, “Every polynomial has root(s)”
• More precisely, “every non-constant single-variable polynomial with complex coefficients has at least one complex root.” [Source: Wikipedia]
Conditional Probability
• P(A) = 0.52,
• P(B1) = 0.1, and so on as shown below
• What is P(A|B1)?
• P(A|B1) = 1, since B1 lies entirely within A.

Euler Diagram
Quiz: Conditional Probability
• What is P(A|B2)?

Euler Diagram
Revisiting Probability
• Developers in two companies are distributed as
follows. Compute Joint Probabilities.
Counts:
            Java   C    Total
Company-X     1    17     18
Company-Y    37    20     57
Total        38    37     75

Joint probabilities:
            Java    C       Total
Company-X   0.013   0.227   0.24
Company-Y   ??      ??      ??
Total       0.506   ??      ??

P(Company-X, Java) = 1/75 = 0.013


Revisiting Probability
• Developers in two companies

Counts:
            Java   C    Total
Company-X     1    17     18
Company-Y    37    20     57
Total        38    37     75

Joint probabilities:
            Java    C       Total
Company-X   0.013   0.227   0.24
Company-Y   0.493   0.267   0.76
Total       0.506   0.494   1

• Joint Probability P(Company-X,Java) = 0.013.


• P(Company-Y, Java) = 0.493
• Sometimes written as P(AB) or P(A ∩ B)
Revisiting Probability
• Developers in two companies

Counts:
            Java   C    Total
Company-X     1    17     18
Company-Y    37    20     57
Total        38    37     75

Joint probabilities:
            Java    C       Total
Company-X   0.013   0.227   0.24
Company-Y   0.493   0.267   0.76
Total       0.506   0.494   1

• P(Company-Y|Java) = ??
• Is P(Company-Y|Java) == P(Java|Company-Y) ?
Revisiting Probability
• Developers in two companies

Counts:
            Java   C    Total
Company-X     1    17     18
Company-Y    37    20     57
Total        38    37     75

Joint probabilities:
            Java    C       Total
Company-X   0.013   0.227   0.24
Company-Y   0.493   0.267   0.76
Total       0.506   0.494   1

• P(Company-Y|Java) = 37/38 = 0.974


• P(Java|Company-Y) = 37/57 = 0.649
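These joint and conditional probabilities can be recomputed directly from the raw counts in the table. A small sketch (Python; the dictionary layout and function names are my own):

```python
# Counts from the developer table (rows: company, cols: language).
counts = {
    ("Company-X", "Java"): 1,  ("Company-X", "C"): 17,
    ("Company-Y", "Java"): 37, ("Company-Y", "C"): 20,
}
total = sum(counts.values())  # 75

def joint(company, lang):
    """Joint probability P(company, lang)."""
    return counts[(company, lang)] / total

def cond_company_given_lang(company, lang):
    """P(company | lang) = P(company, lang) / P(lang)."""
    p_lang = sum(v for (c, l), v in counts.items() if l == lang) / total
    return joint(company, lang) / p_lang

print(round(joint("Company-X", "Java"), 3))                    # 0.013
print(round(cond_company_given_lang("Company-Y", "Java"), 3))  # 0.974
```

Note the asymmetry: P(Company-Y|Java) = 37/38, while P(Java|Company-Y) = 37/57.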
Odds
• Odds: O(A) = P(A)/P(A′) = P(A)/(1 − P(A))
Quiz
• What is the probability of getting a 5 when rolling a
six sided die? Assume a fair die.
• What is the odds of the same event?
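One way to check your quiz answer, using the odds definition from the previous slide (a Python sketch):

```python
def odds(p):
    """O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)

p_five = 1 / 6           # probability of rolling a 5 with a fair die
print(p_five)            # ≈ 0.167
print(odds(p_five))      # ≈ 0.2, i.e. odds of 1 to 5
```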
Quiz: Conditional Probability
• What is P(A|B2)?
P(A|B2) = 0.12/(0.12 + 0.04) = 0.75.

Euler Diagram
Reading

Probability and Computing – Eli Upfal and Michael Mitzenmacher
Fooled by Randomness – Nassim Nicholas Taleb
Thomas Bayes, 1701 to 1761

Bayesian Data Analysis and Beta Distribution

Venkatesh Vinayakarao
Agenda
Updating Beliefs using Probability Theory
Role of Beta Distributions

Will Discuss: Concepts, Illustrations, Intuitions, Purpose, Properties
Will Not Discuss: Details, Definitions, Formalism, Derivations, Proofs
The Case of Coin Flips
General Assumption: If a coin is fair, Heads (H) and Tails (T) are equally likely. But a coin need not be fair.

Experiment with Coin-1: HHHTTTHTHT
Experiment with Coin-2: HHHHHHHHHT

Coin-1 is more likely to be fair than Coin-2.


Our Beliefs
• Can we find a structured way to determine coin’s
nature?
Coin-1:
Data observed: None, H, T, H, T
Belief: Fair, Skewed, Fair, Skewed, Fair

Coin-2:
Data observed: None, H, H, H, H
Belief: Fair, Skewed, More Skewed, Even More…, Even More…
Prior Belief
Belief Updates
Priors
• Priors can be strong or weak
Weak Prior
A new coin. A few observations are sufficient to change our belief significantly.

Strong Prior
The coin is lab tested for 1 million tosses; 50% H and 50% T observed. One more observation will not change our belief significantly.
HyperParameter
• Prior probability (of Heads) could be anything:
• 0.5 → Fair Coin
• 0.25 → Skewed towards Tails
• 0.75 → Skewed towards Heads
• 1 → Head is guaranteed!
• 0 → Both sides are Tails.

We use θ as a hyperparameter to visualize what happens for different values.
World of Distributions
Discrete Distribution of Prior. Since I typically
perceive coins as fair, Prior belief peaks at 0.5.
Another Possibility
I may also choose to be unbiased! i.e., θ may take
any value equally likely.

A Continuous Uniform Distribution!


Observations
Let’s flip the coin N = 5 times. We observe z = 3 Heads.
Impact of Data
Belief is influenced by Observations. But, note that:

Belief ≠ observation

Bayes’ Rule

p(θ|D) = p(D|θ) p(θ) / p(D)

Eeks… what’s in the denominator?


Numerator is easy
• p(θ) was uniform. So, nothing to calculate.
• How to calculate p(D|θ)?

p(D|θ) = θ^z (1 − θ)^(n−z)

If D observed is HHHTT and θ is 0.5, we have:

p(D|θ) = (0.5)^3 (1 − 0.5)^(5−3)
Jacob Bernoulli
1655 – 1705. Remember, two things: 1)we are interested in the distribution
2) Order of H,T does not matter.
Quiz
• Calculate p(D|θ) for the observation TTHHH and θ = 0.3.
• (0.3)^3 (0.7)^2 ≈ 0.013
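The Bernoulli likelihood used in both examples above is a one-liner; a minimal sketch (Python, function name my own):

```python
def likelihood(data, theta):
    """p(D|θ) = θ^z (1-θ)^(n-z), where z = #Heads out of n flips.
    The order of H and T does not matter."""
    z = data.count("H")
    return theta**z * (1 - theta)**(len(data) - z)

print(likelihood("HHHTT", 0.5))   # (0.5)^3 (0.5)^2 = 0.03125
print(likelihood("TTHHH", 0.3))   # (0.3)^3 (0.7)^2 ≈ 0.013
```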
Bayesian Update
Painful Denominator
• Recall, for discrete distributions: p(D) = Σ_θ′ p(D|θ′) p(θ′)

• And, for continuous distributions: p(D) = ∫ p(D|θ′) p(θ′) dθ′

A Simpler Way
Form, Functions and Distributions
Normal (or Gaussian) Poisson
What form will suit us?
Beta Distribution

Vadivelu, a famous Tamil comedian. This is one of his great expressions – terrified and confused.
Beta Distribution
• Takes two parameters Beta(a,b)
• For now, assume a = #H + 1 and b = #T + 1.
• After 10 flips, we may end up in one of these three:
Prior and Posterior
Let’s say we have a Strong Prior – Beta (11,11). What
should happen if we see 10 more observations with
5H and 5T?
Prior and Posterior…
What if we have not seen any data?
Conjugate Prior
So, we see that, after observing z Heads in N trials:

Prior: beta(a, b) → Posterior: beta(a+z, b+N−z)

Priors that yield a posterior of the same form are called Conjugate Priors.
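Because of conjugacy, the Bayesian update never needs the painful denominator; it is just parameter arithmetic. A sketch (Python, function names my own):

```python
def beta_update(a, b, z, n):
    """Beta(a,b) prior + z Heads in n trials -> Beta(a+z, b+n-z) posterior."""
    return a + z, b + n - z

def beta_mean(a, b):
    """Mean of Beta(a,b): a point summary of the belief about θ."""
    return a / (a + b)

# Strong prior Beta(11,11); observe 10 more flips with 5 Heads and 5 Tails.
post = beta_update(11, 11, z=5, n=10)
print(post)              # (16, 16)
print(beta_mean(*post))  # 0.5 — the belief barely moves, as expected
```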
Summary
Prior
Likelihood
Posterior
Bayes’ Rule
Bernoulli
Distribution
Beta Distribution
References
Book: Doing Bayesian Data Analysis – John K. Kruschke.
YouTube video: Statistics 101: The Binomial Distribution – Brandon Foltz.
Maximum Likelihood Estimation
• An Observation: HHTT TTHH TTTT TTTT
• What values of P(H) and P(T) will maximize the probability of the above observation?
• With x = P(H), P(Observation) = x^4 (1 − x)^12
• For what value of x is this function maximized?
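A quick numeric check of this maximization. Analytically, the maximum likelihood estimate for a Bernoulli parameter is the relative frequency of Heads, 4/16 = 0.25; a grid-search sketch (Python) confirms it:

```python
def obs_prob(x):
    """x^4 (1-x)^12 for the observation HHTT TTHH TTTT TTTT (4 H, 12 T)."""
    return x**4 * (1 - x)**12

# Grid search over x in [0, 1]; the maximum sits at 4/16 = 0.25.
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=obs_prob)
print(best)   # 0.25
```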
Probabilistic Retrieval
• Information Need: Taj Mahal
• Let a query q be “Taj”
• Let the results be:
• d1: Taj
• d2: Taj Mahal
• d3: Taj Tea
• Two judges were asked to provide relevance judgments:
Document Judge 1 Judge 2
Taj R N
Taj Mahal R R
Taj Tea N N
Probability of Relevance
• Documents can have probability of being relevant
and of being non-relevant at the same time.
• Example:
• Documents in our collection :
R = 0 ➔ Non-Relevant; R = 1 ➔ Relevant

Document    P(R=0|d,q)   P(R=1|d,q)
Taj         0.5          0.5
Taj Mahal   ?            ?
Taj Tea     ?            ?
Probability of Relevance
• Documents can have probability of being relevant
and of being non-relevant at the same time.
• Example:
• Documents in our collection :
R = 0 ➔ Non-Relevant; R = 1 ➔ Relevant

Document    P(R=0|d,q)   P(R=1|d,q)
Taj         0.5          0.5
Taj Mahal   0            1
Taj Tea     1            0
Probability Ranking Principle

Rank documents by the probability of relevance, P(R=1|q,d), where R ∈ {0,1}.
Probability Ranking Principle

Rank documents by the probability of relevance, P(R=1|q,d), where R ∈ {0,1}.

R = 0 ➔ Non-Relevant; R = 1 ➔ Relevant

Document    P(R=0|d,q)   P(R=1|d,q)
Taj         0.5          0.5
Taj Mahal   0            1
Taj Tea     1            0

Search Result:
1. Taj Mahal
2. Taj
3. Taj Tea
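Under the Probability Ranking Principle, ranking is just sorting by P(R=1|d,q). A minimal sketch using the values from this slide (Python):

```python
# P(R=1|d,q) for each document, taken from the relevance table.
p_rel = {"Taj": 0.5, "Taj Mahal": 1.0, "Taj Tea": 0.0}

# Probability Ranking Principle: sort by probability of relevance, descending.
ranking = sorted(p_rel, key=p_rel.get, reverse=True)
print(ranking)   # ['Taj Mahal', 'Taj', 'Taj Tea']
```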
Bayes Optimal Decision Rule

d is relevant if P(R=1|d,q) > P(R=0|d,q)

Document    P(R=0|d,q)   P(R=1|d,q)
Taj         0.5          0.5
Taj Mahal   0            1
Taj Tea     1            0

Search Result:
1. Taj Mahal
Predicting Relevance

Document    P(R=0|d,q)   P(R=1|d,q)
Taj         0.5          0.5
Taj Mahal   0            1
Taj Tea     1            0

This is user-given relevance. Can we estimate/predict relevance based on term occurrence?
Predicting Relevance
• You may use a labeled set from judges (or mined from click logs).
• You may treat the query and the document as sets of words.

Query            Document            Relevance
q1 = (x1,x2,…)   d1 = (..xi, xj,…)   1
q1               d2                  1
q1               d3                  0
q2               d1                  0
q2               d2                  0
q2               d3                  1

This is user-given relevance. Can we estimate/predict relevance?
Binary Independence Model (BIM)
• Each document is a binary vector of terms.
• Occurrence of terms is mutually independent.

P(R=1|d,q) = P(d|R=1,q) · P(R=1|q) / P(d|q)

Bayes’ Rule
Quiz
P(R=1|d,q) = P(d|R=1,q) · P(R=1|q) / P(d|q)

Bayes’ Rule
• P(d=Taj|R=1,q) = ?
• P(R=1|q) = ?
• P(d|q) = ?

Document P(R=0|d,q) P(R=1|d,q)


Taj 0.5 0.5
Taj Mahal 0 1
Taj Tea 1 0
Quiz
P(R=1|d,q) = P(d|R=1,q) · P(R=1|q) / P(d|q)

Bayes’ Rule
• P(d=Taj|R=1,q) = 1/3
• P(R=1|q) = 1/2
• P(d=Taj|q) = 1/3
• So, P(R=1|d=Taj,q) = (1/3)(1/2)/(1/3) = 1/2

Document    P(R=0|d,q)   P(R=1|d,q)
Taj         0.5          0.5
Taj Mahal   0            1
Taj Tea     1            0

Joint probabilities:
Document    R=0    R=1
Taj         1/6    1/6
Taj Mahal   0      1/3
Taj Tea     1/3    0
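The quiz can be verified mechanically from the joint probabilities P(d, R) on this slide, reading every term of Bayes’ rule off the joint table. A sketch (Python, names my own):

```python
# Joint probabilities P(d, R) from the slide's table.
joint = {
    ("Taj", 0): 1/6,       ("Taj", 1): 1/6,
    ("Taj Mahal", 0): 0.0, ("Taj Mahal", 1): 1/3,
    ("Taj Tea", 0): 1/3,   ("Taj Tea", 1): 0.0,
}

def p_rel_given_doc(d):
    """P(R=1|d,q) = P(d|R=1,q) P(R=1|q) / P(d|q)."""
    p_d = joint[(d, 0)] + joint[(d, 1)]            # marginal over R
    p_r1 = sum(v for (_, r), v in joint.items() if r == 1)  # 1/2
    p_d_given_r1 = joint[(d, 1)] / p_r1
    return p_d_given_r1 * p_r1 / p_d

print(p_rel_given_doc("Taj"))        # ≈ 0.5
print(p_rel_given_doc("Taj Mahal"))  # ≈ 1.0
```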
Predicting Relevance
• Odds of relevance is easier to calculate, since the prior odds term is constant for a query.
• In BIM, we assume that term occurrences are mutually independent.
Retrieval Status Value
This factor is not document specific; it is constant for a query.

“We can manipulate this expression by including the query terms found in the document into the right product, but simultaneously dividing through by them in the left product, so the value is unchanged” - CPS.

RSV = Σ over query terms t present in the document of log [ pt(1 − ut) / (ut(1 − pt)) ],

where pt = P(term t present | relevant) and ut = P(term t present | non-relevant).

RSV is used for ranking documents.

Read Section 11.3.1 of CPS.
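A sketch of how RSV could be computed, assuming the Section 11.3.1 form of the formula with per-term estimates pt and ut; the probability values below are made up for illustration (Python):

```python
import math

def rsv(doc_terms, query_terms, p, u):
    """RSV = sum over query terms present in the document of
    log( p_t (1 - u_t) / ( u_t (1 - p_t) ) )."""
    return sum(
        math.log(p[t] * (1 - u[t]) / (u[t] * (1 - p[t])))
        for t in query_terms & doc_terms
    )

# Hypothetical estimates: "taj" is common in relevant documents, "tea" is not.
p = {"taj": 0.9, "tea": 0.1}   # P(term present | R=1)
u = {"taj": 0.5, "tea": 0.5}   # P(term present | R=0)

print(rsv({"taj", "mahal"}, {"taj"}, p, u))        # log 9 ≈ 2.197
print(rsv({"taj", "tea"}, {"taj", "tea"}, p, u))   # ≈ 0: "tea" cancels "taj"
```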


Thank You
