Conditional Probability and Probability Updating

Rev. Real Acad. Cienc. Exactas Fis. Nat. Ser. A-Mat.
(2023) 117:144
https://doi.org/10.1007/s13398-023-01462-2
ORIGINAL PAPER
Conditional probability and probability updating
José Manuel Gutiérrez1
Received: 4 October 2022 / Accepted: 3 June 2023

© The Author(s) 2023
Abstract
The conditional probability formula is supposed to reflect the correct updating of probability
assignments when new information is incorporated. Starting from a non-atomic probability
measure, it is proved that the conditional probability formula provides the only transformed
probability measure satisfying a “minimum requirement” relational assumption. This result
applies to the standard Bayesian parametric model.
Keywords Conditional probability · Probability updating · Bayesian inference
Mathematics Subject Classification 62A05 · 60A05 · 62F15
1 Introduction
Conditional probability is, on the one hand, an intuitive concept, which captures the change
in the original probability assignment when new information is known. On the other hand,
the axiomatic definition of conditional probability is given by a formula that determines it
from the original probability. Often both concepts are identified, and it is postulated that the
incorporation of new information alters the original probability assignment according to this
formula.
As always when an axiomatic definition is applied, it is worth discussing its applicability
in each case. Indeed, when considering the frequentist interpretation of probability, there
are plausible reasons for such applicability. In the case of the subjective interpretation of
probability, as a degree of belief, typical of Bayesian statistical inference, arguments have
been constructed to justify that the change in the assignment of probabilities when new
information is incorporated must follow the conditional probability formula. These arguments
start from a qualitative relation of the form $A|B \succsim C|D$, meaning “A given B is qualitatively
at least as probable as C given D”, satisfying certain elaborate assumptions (see [8]). Then
it is proved that there is one and only one probability $P$ such that
\[ A|B \succsim C|D \quad \text{iff} \quad \frac{P(A \cap B)}{P(B)} \ge \frac{P(C \cap D)}{P(D)}. \]
José Manuel Gutiérrez: jmgut@usal.es
1 Facultad de Economía y Empresa, Universidad de Salamanca, Salamanca 37007, Spain
This result is to be understood within measurement theory, where the representation by
probabilities of qualitative probability orderings of events is discussed; usually finitely additive
probabilities have been considered, although completely additive probabilities have also
been studied (see [11]).
We consider in this paper a different starting point to justify the applicability of the
axiomatic definition of conditional probability (i.e. the conditional probability formula). The
original probability measure is taken as given, and an assumption on the relation between this
original probability and a possible updated conditional probability is imposed (Aristotelian
Assumption, (A.A) for short). Provided that the original probability is non-atomic, it is proved
that there is one and only one transformed probability measure satisfying the assumption
(Theorem 7).
This result applies to Bayesian statistics. We recall that Bayesian inference relies on
the use of the conditional probability formula to update probability assignments when new
information is incorporated. For simplicity, we take momentarily all probability distributions
to be representable in terms of densities. Suppose that $Y = (Y_1, \ldots, Y_n)$ is a random vector of
$n$ observations taking values on a sample space $S$. The parameter $\theta = (\theta_1, \ldots, \theta_k)$ with values
in a parameter space $\Theta \subseteq \mathbb{R}^k$ indexes the various possible density functions $p(y|\theta)$ for $Y$;
so $p(y|\theta)$ denotes the distribution of $Y$ when $\theta$ is known. Bayesian statistics postulates that
$p(y|\theta)$ represents a conditional distribution following the conditional probability formula.
Thus $(Y, \theta)$ has a probability distribution (say with joint density $p(y, \theta)$; $p(y)$ and $p(\theta)$
stand for the density marginals) and
\[ p(y|\theta)\, p(\theta) = p(y, \theta). \tag{1} \]
On the other hand, given the observed data $y = (y_1, \ldots, y_n)$, let $p(\theta|y)$ denote the distribution
of the parameter $\theta$ when $y$ is known. Bayesian statistics now postulates that $p(\theta|y)$ represents
a conditional distribution following the conditional probability formula. Thus
\[ p(\theta|y)\, p(y) = p(y, \theta). \tag{2} \]
Equating (1) and (2), Bayes’ formula for the posterior distribution follows:
\[ p(\theta|y) = \frac{p(y|\theta)\, p(\theta)}{p(y)}. \tag{3} \]
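As a purely numerical illustration of formulas (1) to (3), the following Python sketch approximates a posterior density on a grid. The Bernoulli likelihood, the uniform prior, the data values `n` and `k` and the grid size `m` are all assumptions made here for illustration and do not come from the paper; the grid is only a discretisation device for the integrals.

```python
# Hypothetical data: k successes in n Bernoulli trials (illustrative values).
n, k = 10, 7

# Midpoint grid on the parameter space (0, 1); m is an arbitrary resolution.
m = 10_000
dtheta = 1.0 / m
theta = [(i + 0.5) * dtheta for i in range(m)]

prior = [1.0] * m                                         # p(theta): uniform density
likelihood = [t**k * (1.0 - t)**(n - k) for t in theta]   # p(y | theta) for one fixed y

joint = [l * p for l, p in zip(likelihood, prior)]        # p(y, theta), as in (1)
marginal = sum(joint) * dtheta                            # p(y): theta integrated out
posterior = [j / marginal for j in joint]                 # p(theta | y), as in (3)

posterior_mean = sum(t * p for t, p in zip(theta, posterior)) * dtheta
print(posterior_mean)
```

Under these assumed inputs the exact posterior is a Beta(k + 1, n − k + 1) distribution, so the printed mean should be close to (k + 1)/(n + 2).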
In general, Bayes’ formula for the posterior distribution is certainly the basis of Bayesian
statistics. Two hypotheses are underlying this formula:
(H1) There is a joint probability measure $P$ on $S \times \Theta$.¹
(H2) If $P(A|C)$ is given the interpretation “probability of event A when event C is known”,
then the conditional probability formula applies:
\[ P(A|C) = \frac{P(A \cap C)}{P(C)} \]
for $P$-measurable $A$ and $C$, with $P(C) > 0$.
In the Bayesian parametric model, the joint probability P is shown to be non-atomic
(Proposition 9). Taking (A.A) for granted, it follows from Theorem 7 that, at least in the
parametric case, condition (H2) is redundant, and only (H1) is needed for Bayes’
formula for the posterior distribution.
1 The existence of a suitable joint probability is far from being a foregone conclusion from that of the marginals.
The case of quantum mechanics is to the point. In that theory both P(A) and P(B) may exist and yet P(A ∩ B)
need not (think of A referring to the position of a particle and B to its momentum).
2 The formula of conditional probability
In this section $(\Omega, \mathcal{A}, P)$ is a probability space, where $\Omega$ is a set, $\mathcal{A}$ is a σ-algebra in $\Omega$ and
$P$ is a (σ-additive) probability measure. Let $C \in \mathcal{A}$, with $P(C) > 0$.
Definition 1 Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $C \in \mathcal{A}$ with $P(C) > 0$. The
probability space $(\Omega, \mathcal{A}, P')$ is called a pre-conditional probability space given $C$ iff $P'(C) = 1$
and the following assumption holds:
(A.A) If $A, B \in \mathcal{A}$ and $A, B \subseteq C$, then
$P(A) = P(B)$ implies $P'(A) = P'(B)$.
This definition arguably captures obvious requirements for any reassignment of probabilities
when we have the added information that the outcome is one of the elements of the
event $C$. The requirement $P'(C) = 1$ says simply that “the outcome is one of the elements of
the event C”. Besides, the original assignment of probabilities has to have an influence on the
new assignment, and not merely be thrown away. It has to be re-worked in an even-handed
way, and (A.A) is in this sense a minimum requirement, expressing some sort of Aristotelian
“treat like cases alike” principle.²
Assumption (A.A) may even impose no constraint at all.
Example 2 Consider that $\Omega := \{1, 2, 3, 4\}$, $\mathcal{A}$ is the set of all subsets of $\Omega$, $P(1) := \frac{1}{10}$,
$P(2) := \frac{3}{10}$, $P(3) := \frac{5}{10}$, $P(4) := \frac{1}{10}$, and $C := \{1, 2, 3\}$. Then any probability space $(\Omega, \mathcal{A}, P')$
is a pre-conditional probability space given $C$, provided that $C$ is a support of $P'$ (i.e. $P'(C) = 1$).
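The claim of Example 2 can be checked by brute force. The sketch below reads the probabilities of the example as 1/10, 3/10, 5/10 and 1/10, enumerates all events contained in C, and looks for pairs of distinct events with equal probability; the helper name `prob` and the variable names are hypothetical choices made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Original probabilities of Example 2 (exact arithmetic via Fraction).
P = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(5, 10), 4: Fraction(1, 10)}
C = (1, 2, 3)

# All events contained in C, including the empty set.
events = [frozenset(s) for r in range(len(C) + 1) for s in combinations(C, r)]

def prob(event):
    """Probability of a finite event under P."""
    return sum((P[w] for w in event), Fraction(0))

# (A.A) only constrains pairs of distinct events inside C with equal probability.
tied_pairs = [(A, B) for A, B in combinations(events, 2) if prob(A) == prob(B)]
print(tied_pairs)
```

Since no two distinct events inside C share a probability, the list is empty and (A.A) places no restriction on the new measure beyond its being supported on C.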
The set function $P(\cdot|C)$ on $\mathcal{A}$ defined by
\[ P(A|C) := \frac{P(A \cap C)}{P(C)} \tag{4} \]
makes $(\Omega, \mathcal{A}, P(\cdot|C))$ into a pre-conditional probability space given $C$. We are interested in
the question of its uniqueness as pre-conditional probability space given $C$.
Remark 3 It is immediate that, if a probability space $(\Omega, \mathcal{A}, P')$ satisfies $P'(C) = 1$, then
the following three conditions are equivalent:
(i) $P' = P(\cdot|C)$, as defined in (4).
(ii) If $A \in \mathcal{A}$ is such that $A \subseteq C$, then
\[ P'(A) = \frac{P(A)}{P(C)}. \tag{5} \]
(iii) If $A, B \in \mathcal{A}$ are such that $A, B \subseteq C$ and $P(B) > 0$, then $P'(B) > 0$ and
\[ \frac{P'(A)}{P'(B)} = \frac{P(A)}{P(B)}. \]
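The equivalences in Remark 3 can be verified in a few lines. The following is one possible sketch, using only $P'(C) = 1$, $P(C) > 0$ and the additivity and monotonicity of $P'$; the order in which the implications are taken is a choice made here and is not from the paper.

```latex
\begin{align*}
\text{(i)} \Rightarrow \text{(ii)}:\quad
  & A \subseteq C \ \text{gives}\ P'(A) = P(A|C) = \frac{P(A \cap C)}{P(C)} = \frac{P(A)}{P(C)}. \\
\text{(ii)} \Rightarrow \text{(iii)}:\quad
  & P'(B) = \frac{P(B)}{P(C)} > 0 \ \text{and}\
    \frac{P'(A)}{P'(B)} = \frac{P(A)/P(C)}{P(B)/P(C)} = \frac{P(A)}{P(B)}. \\
\text{(iii)} \Rightarrow \text{(ii)}:\quad
  & \text{take } B = C:\ P'(A) = \frac{P'(A)}{P'(C)} = \frac{P(A)}{P(C)}. \\
\text{(ii)} \Rightarrow \text{(i)}:\quad
  & P'(A \setminus C) \le P'(\Omega \setminus C) = 0,\ \text{so}\
    P'(A) = P'(A \cap C) = \frac{P(A \cap C)}{P(C)} = P(A|C)
    \ \text{for every } A \in \mathcal{A}.
\end{align*}
```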
Recall the following definition.
2 The idea of “treat like cases alike” can be found in several places of the Corpus Aristotelicum. See, e.g.,
Politics III.9 (“for instance, it is
thought that justice is equality, and so it is, though not for everybody but only for those who are equals”).
Definition 4 $A \in \mathcal{A}$ is an atom for $P$ iff: (a) $P(A) > 0$ and (b) for every $B \in \mathcal{A}$ with $B \subseteq A$,
either $P(B) = 0$ or $P(B) = P(A)$. A probability measure $P$ which has no atoms is called
non-atomic.
A probability measure $P$ is called atomic iff every $E \in \mathcal{A}$ such that $P(E) > 0$ contains
an atom. If $P$ is a probability measure, then there exist unique probability measures $P_1$ and
$P_2$ and $\alpha \in [0, 1]$ such that $P = \alpha P_1 + (1 - \alpha) P_2$ and such that $P_1$ is atomic and $P_2$ is
non-atomic (see [7] for further discussion in the general context of measures).
The following result is a particular case of a theorem of Sierpinski [10].
Theorem 5 Let $(\Omega, \mathcal{A}, P)$ be a probability space with $P$ non-atomic. If $E \in \mathcal{A}$ and $P(E) > 0$,
then for every $\alpha \in [0, P(E)]$ there is an element $F \in \mathcal{A}$ with $F \subseteq E$ and $P(F) = \alpha$.
Induction on k gives directly the next corollary of Theorem 5 (see [9]).
Corollary 6 Let $P$ be non-atomic, and suppose $E \in \mathcal{A}$ is such that $P(E) > 0$. Let $\alpha_i$ for
$i = 1, \ldots, k$ be real numbers with $\alpha_i > 0$ and $\sum_{i=1}^{k} \alpha_i = P(E)$. Then $E$ can be decomposed
as a union of disjoint sets $E_i \in \mathcal{A}$ with $P(E_i) = \alpha_i$ for $i = 1, \ldots, k$.
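For a concrete picture of Corollary 6, consider Lebesgue measure on [0, 1), which is non-atomic. The Python sketch below splits an interval E into consecutive disjoint subintervals of prescribed measures. The function name `split_interval`, the choice of E and the values in `alphas` are assumptions made for this illustration; the corollary itself covers arbitrary measurable sets, not only intervals.

```python
from fractions import Fraction

def split_interval(a, b, alphas):
    """Split [a, b) into consecutive disjoint intervals with lengths alphas."""
    if any(x <= 0 for x in alphas) or sum(alphas) != b - a:
        raise ValueError("alphas must be positive and sum to the length of [a, b)")
    pieces, left = [], a
    for x in alphas:
        pieces.append((left, left + x))   # half-open piece [left, left + x)
        left += x
    return pieces

# E = [1/4, 1) has Lebesgue measure 3/4 (illustrative choice).
E = (Fraction(1, 4), Fraction(1))
alphas = [Fraction(1, 4), Fraction(1, 3), Fraction(1, 6)]
pieces = split_interval(E[0], E[1], alphas)
print(pieces)
```

Exact rational arithmetic is used so that the requirement that the prescribed measures sum to P(E) can be checked without rounding error.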
Provided that a probability measure is non-atomic, we are going to see that any pre-conditional
probability is determined by the conditional probability formula.
Theorem 7 Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $C \in \mathcal{A}$ with $P(C) > 0$. Suppose
that $(\Omega, \mathcal{A}, P')$ is a pre-conditional probability space given $C$. If $P$ is non-atomic, then
$P' = P(\cdot|C)$ as defined in (4).
Proof Let $A \in \mathcal{A}$ be such that $A \subseteq C$. In order to prove (5), it can be assumed, without loss of
generality, that $P(A) > 0$. The proof will be divided into three steps.
(a) Consider the case $\frac{P(A)}{P(C)} = \frac{1}{q}$, where $q \in \mathbb{N}$, $q > 0$.
Applying Corollary 6 to $C$, with $\alpha_i = \frac{1}{q} P(C)$ for $i = 1, \ldots, q$, there exist disjoint sets
$C_1, \ldots, C_q \in \mathcal{A}$ such that $\bigcup_{i=1}^{q} C_i = C$ and $P(C_i) = \frac{1}{q} P(C) = P(A)$ for $i = 1, \ldots, q$. By
(A.A), $P'(C_i) = P'(A)$ for $i = 1, \ldots, q$, and thus $P'(A) = \frac{1}{q} P'\left(\bigcup_{i=1}^{q} C_i\right) = \frac{1}{q}$. Therefore
$P'(A) = \frac{P(A)}{P(C)}$, which is our claim.
(b) Consider the case $\frac{P(A)}{P(C)} = \frac{p}{q} \in \mathbb{Q}$, where $p, q \in \mathbb{N}$, $p, q > 0$, $p \le q$.
Applying Corollary 6 to $A$, with $\alpha_i = \frac{1}{p} P(A)$ for $i = 1, \ldots, p$, there exist disjoint sets
$A_1, \ldots, A_p \in \mathcal{A}$ such that $\bigcup_{i=1}^{p} A_i = A$ and $P(A_i) = \frac{1}{p} P(A) = \frac{1}{q} P(C)$ for $i = 1, \ldots, p$.
Since $\frac{P(A_i)}{P(C)} = \frac{1}{q}$ for $i = 1, \ldots, p$, it follows from case (a) that $P'(A_i) = \frac{P(A_i)}{P(C)}$ for
$i = 1, \ldots, p$. Therefore $P'(A) = P'\left(\bigcup_{i=1}^{p} A_i\right) = \frac{p}{q} = \frac{P(A)}{P(C)}$.
(c) Consider the general case $\frac{P(A)}{P(C)} = \beta \in \,]0, 1]$.
There is a strictly increasing sequence $(\beta_n)$ in $]0, \beta[\, \cap\, \mathbb{Q}$ such that $\lim_{n \to \infty} \beta_n = \beta$. Write
$\gamma_n := \frac{P(C)}{P(A)} \beta_n$ for $n = 1, 2, \ldots$; obviously $\gamma_n \in \,]0, 1[$. We proceed to define inductively an
expansive sequence $(A_n)$ in $\mathcal{A}$, with $A_n \subseteq A$ and $P(A_n) = \beta_n P(C)$. For $n = 1$, by
Theorem 5, there is $A_1 \in \mathcal{A}$, $A_1 \subseteq A$, such that $P(A_1) = \gamma_1 P(A) = \beta_1 P(C)$. For $n = 2$,
by Theorem 5, there is $\tilde{A}_2 \in \mathcal{A}$, $\tilde{A}_2 \subseteq (A \setminus A_1)$, such that $P(\tilde{A}_2) = \frac{\gamma_2 - \gamma_1}{1 - \gamma_1} P(A \setminus A_1)$; let
$A_2 := A_1 \cup \tilde{A}_2$. We have
\[ P(A_2) = \gamma_1 P(A) + \frac{\gamma_2 - \gamma_1}{1 - \gamma_1} \bigl(P(A) - \gamma_1 P(A)\bigr) = \gamma_2 P(A) = \beta_2 P(C). \]
Suppose now that $A_1, \ldots, A_n \in \mathcal{A}$ are defined, such that $A_{i-1} \subseteq A_i \subseteq A$ for $i = 2, \ldots, n$
and $P(A_n) = \beta_n P(C)$. By Theorem 5, there is $\tilde{A}_{n+1} \in \mathcal{A}$, $\tilde{A}_{n+1} \subseteq (A \setminus A_n)$, such that
$P(\tilde{A}_{n+1}) = \frac{\gamma_{n+1} - \gamma_n}{1 - \gamma_n} P(A \setminus A_n)$; let $A_{n+1} := A_n \cup \tilde{A}_{n+1}$. We have
\[ P(A_{n+1}) = \gamma_n P(A) + \frac{\gamma_{n+1} - \gamma_n}{1 - \gamma_n} \bigl(P(A) - \gamma_n P(A)\bigr) = \gamma_{n+1} P(A) = \beta_{n+1} P(C), \]
which shows that the expansive sequence $(A_n)$ is defined as intended. Since $\frac{P(A_n)}{P(C)} = \beta_n \in \mathbb{Q}$,
it follows from case (b) that $P'(A_n) = \beta_n$ for $n = 1, 2, \ldots$ Therefore
\[ P'\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \beta_n = \beta. \]
On the other hand,
\[ P\left(\bigcup_{n=1}^{\infty} A_n\right) = \left(\lim_{n \to \infty} \beta_n\right) P(C) = \beta P(C) = P(A). \]
Hence, from (A.A), we have $P'(A) = P'\left(\bigcup_{n=1}^{\infty} A_n\right)$, and so $P'(A) = \beta = \frac{P(A)}{P(C)}$.
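The inductive construction in step (c) can be traced numerically. The sketch below assumes Lebesgue measure on [0, 1) with C = [0, 0.5) and A = [0, β·P(C)), takes β = π/4 as an arbitrary irrational ratio, and uses decimal truncations of β as the rational sequence; all of these choices are illustrative assumptions, and only the measures of the sets A_n are tracked, in floating point.

```python
import math
from fractions import Fraction

# Lebesgue measure on [0, 1): an interval [0, x) has probability x.
P_C = 0.5                     # C = [0, 0.5)
beta = math.pi / 4            # assumed irrational ratio P(A) / P(C)
P_A = beta * P_C              # A = [0, beta * P_C), a subset of C

# Strictly increasing rationals below beta: decimal truncations of beta.
betas = []
for n in range(1, 9):
    b = Fraction(math.floor(beta * 10**n), 10**n)
    if not betas or b > betas[-1]:
        betas.append(b)

# gamma_n = beta_n * P(C) / P(A), as in the proof.
gammas = [float(b) * P_C / P_A for b in betas]

# Inductive step: enlarge A_n by a piece of A \ A_n of the prescribed measure.
P_An = gammas[0] * P_A        # P(A_1) = gamma_1 * P(A)
measures = [P_An]
for g_n, g_next in zip(gammas, gammas[1:]):
    piece = (g_next - g_n) / (1.0 - g_n) * (P_A - P_An)
    P_An += piece
    measures.append(P_An)

for b, mu in zip(betas, measures):
    print(b, mu, float(b) * P_C)
```

Each printed measure should agree with β_n·P(C) up to rounding, and the measures increase towards P(A), which is the property the proof passes to the limit.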
Obviously (Example 2) the condition of $P$ being non-atomic cannot be dropped in Theorem 7.
3 Bayesian parametric inference
In standard Bayesian parametric inference we consider a probability space $(S \times \Theta, \mathcal{B}_{n+k}, P)$,
where $S$ is a Borel set in $\mathbb{R}^n$, $\Theta$ is a (generalized) interval in $\mathbb{R}^k$, $\mathcal{B}_{n+k}$ is the Borel σ-algebra
on $S \times \Theta$ and $P$ is a (σ-additive) probability measure. Here $S$ is interpreted as the sample
space where the response vector $Y$ takes values and $\Theta$ as the parameter space, each parameter
$\theta$ determining a probability distribution for $Y$. Recall that the marginal distributions $P_Y$ and
$P_\theta$ are defined by $P_Y(A) := P(A \times \Theta)$, $P_\theta(B) := P(S \times B)$ for the corresponding Borel
sets $A$ in $S$ and $B$ in $\Theta$. In accordance with practice (see [4] and [3]; note that improper
prior distributions are not being considered) we assume that in the parametric case $P_\theta$ is
non-atomic. We shall refer to $(S \times \Theta, \mathcal{B}_{n+k}, P)$ as the Bayesian parametric model.
For proofs of the following proposition see [1] or [5].
Proposition 8 Any atom of a Borel measure on a second countable Hausdorff space includes
a singleton of positive measure.
Our last result is now immediate.
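Proposition 9 Let $(S \times \Theta, \mathcal{B}_{n+k}, P)$ be the Bayesian parametric model. Then $P$ is non-atomic.
Proof By Proposition 8, if $P$ had an atom, then it would include a singleton $\{(y, \theta)\}$ of positive
measure. Then $P_\theta(\{\theta\}) = P(S \times \{\theta\}) \ge P(\{(y, \theta)\}) > 0$, so $\{\theta\}$ would be an atom for $P_\theta$,
which contradicts that $P_\theta$ is non-atomic.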
If the Bayesian parametric model is considered as a valid formulation of a statistical
problem (essentially, if $S \times \Theta$ can be given a joint probability distribution), we conclude
(taking (A.A) for granted) from Theorem 7 and Proposition 9 that (H2) follows, and thus
Bayes’ formula for the posterior distribution can be applied (provided that the measure-theoretic
hypotheses for the suitable representation of the probability distributions hold; see
for instance [6]).
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd edn. Springer,
Berlin, New York (2006)
2. Aristotle: Politics. Greek text and facing English translation, edited by H. Rackham, Loeb Classical Library,
Harvard University Press, Cambridge, Mass. (1932)
3. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory, 2nd edn. Wiley, Chichester (2006)
4. DeGroot, M.H.: Optimal Statistical Decisions. McGraw-Hill, New York (1970)
5. Dudley, R.M., Norvaisa, R.: Concrete Functional Calculus. Springer, New York (2011)
6. Ghosal, S., van der Vaart, A.: Fundamentals of Nonparametric Bayesian Inference. Cambridge University
Press, Cambridge (2017)
7. Johnson, R.A.: Atomic and nonatomic measures. Proc. Am. Math. Soc. 25, 650–655 (1970)
8. Krantz, D.H., Luce, R.D., Suppes, P., Tversky, A.: Foundations of measurement, Vol. I: additive and
polynomial representations. Academic Press, New York (1971)
9. Pfeffer, W.F.: Integrals and Measures. Dekker, New York, Basel (1977)
10. Sierpinski, W.: Sur les fonctions d’ensemble additives et continues. Fundamenta Mathematicae 3, 240–246 (1922)
11. Villegas, C.: On qualitative probability σ-algebras. Ann. Math. Stat. 35, 1787–1796 (1964)
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.