Statistical Data Analysis: PH4515
Lecture 1
Glen Cowan
1 Course structure
Be aware that up through week 6, the final hour of each session is used for C++ preparation; after that, it will be essentially coursework review and examples supervision.
Please turn in your problem sheets on paper at the Monday lectures. Staple the pages and write on the sheet your name, College, and degree programme (e.g. Graham Van Goffrier, UCL, PhD Data-Intensive Science).
Course assessment is 100% coursework for PhD students; this material will NOT appear on the HEP exams. There are 9 problem sheets!
2 What is Statistics?
Theory → Statistics → Experiment
Theories give predictions for observables, but these are essentially always approximations in particle physics. Furthermore, they require adjustable parameters (25 in the SM) which are known only approximately.
Experimental measurements are subject to random fluctuations, which limit repeatability.
Goals:
1. Estimate the parameters.
2. Quantify uncertainty in parameter estimates.
3. Test the theory (model).
We want quantitative statements here!
To understand uncertainty in our models and measurements, we need to first describe
probability.
3 What is Probability?
Kolmogorov’s Axioms were formalized (using set theory) in 1933.
Consider a set called the sample space S, which has subsets A, B, etc. E.g. elements
could be the outcomes of some repeatable measurement.
1. ∀A ⊂ S, P (A) ≥ 0.
2. P(S) = 1.
3. If A ∩ B = ∅, then P (A ∪ B) = P (A) + P (B).
From these axioms we can derive familiar results, e.g. P(Ā) = 1 − P(A) (where Ā ≡ S − A), P(A ∪ Ā) = 1, P(∅) = 0, and A ⊂ B ⟹ P(A) ≤ P(B).
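For instance, the first of these follows directly (a short derivation added here for clarity, not an assigned problem): A and Ā are disjoint with A ∪ Ā = S, so axioms 2 and 3 give 1 = P(S) = P(A) + P(Ā), hence P(Ā) = 1 − P(A).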
Show this on problem sheet 1: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). And show it using only the three Kolmogorov axioms!
The conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B).    (1)
e.g. rolling a die: imagine somebody has peeked and already told us that the outcome is even, i.e. B = {2, 4, 6}. Then P(6|B) = (1/6)/(1/2) = 1/3, rather than the unconditional P(6) = 1/6.
A and B are called independent if

P(A ∩ B) = P(A) · P(B),    (2)

in which case

P(A|B) = P(A) · P(B) / P(B) = P(A).    (3)

e.g. for a die, A = {even} and B = {1, 2} are independent: P(A ∩ B) = P({2}) = 1/6 = P(A) · P(B).
Interpretations of probability:
1. Relative frequency (frequentist): the elements of S are outcomes of a repeatable experiment, and

P(A) = lim_{n→∞} (# times outcome is in A) / n.    (4)

e.g. the "usual" Copenhagen interpretation of QM, proton-proton (pp) collisions, radioactive decay. (A short numerical illustration of this limit follows after the list.)
2. Subjective (Bayesian): we can think of the elements of S as hypotheses which may be true or false. Only one of the hypotheses in S is true, and P(A) is my subjective degree of belief that the true hypothesis is in the subset A.
e.g. cosmology, since we only have access to one universe and therefore experiments are not repeatable.
Both interpretations are consistent with the Kolmogorov Axioms, but remain distinct.
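The following is a minimal Python sketch (our own illustration, not part of the lecture) of the limiting relative frequency in (4), using the even-die-roll event from above:

    import random

    # Estimate P(even) for a fair die as a limiting relative frequency.
    random.seed(1)
    for n in (100, 10_000, 1_000_000):
        hits = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
        print(n, hits / n)  # the relative frequency approaches 1/2 as n grows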
By the same definition as (1),

P(B|A) = P(B ∩ A) / P(A).    (5)

Since P(A ∩ B) = P(B ∩ A), eliminating the intersection between (1) and (5) gives Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B).    (6)
This is key: P(A|B) and P(B|A) are in general very different. The probability that a person eats bread given that they are a murderer is something like 80%; the probability that a person is a murderer given that they eat bread is something like 10⁻⁶.
If the subsets A_i form a disjoint partition of S (i.e. S = ∪_i A_i with A_i ∩ A_j = ∅ for i ≠ j), then the law of total probability gives

P(B) = Σ_i P(B ∩ A_i) = Σ_i P(B|A_i) · P(A_i),    (7)

so for any A = A_j in the partition, Bayes' theorem can be written

P(A|B) = P(B|A) · P(A) / Σ_i P(B|A_i) · P(A_i).    (9)
3.5 Example application of Bayes’ Theorem
Consider a disease D where P(D) = 0.001 and P(no D) = 0.999, i.e. the prior probability of carrying the disease, before any test, is 0.1%.
The accuracy of a test for the disease is given as:

P(+|D) = 0.98
P(−|D) = 0.02
P(+|no D) = 0.03
P(−|no D) = 0.97

The probability of having the disease given a positive test result is then, using (7) for the denominator,

P(D|+) = P(+|D) · P(D) / P(+)
       = P(+|D) · P(D) / [P(+|D) · P(D) + P(+|no D) · P(no D)]
       ≈ 0.032.    (10)
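Below is a minimal Python sketch of this calculation (our own check, not from the lecture; the variable names are ours):

    # Bayes' theorem for the disease-test example, eq. (10).
    p_d = 0.001              # prior P(D)
    p_pos_given_d = 0.98     # P(+|D)
    p_pos_given_nod = 0.03   # P(+|no D)

    # Law of total probability, eq. (7), for P(+):
    p_pos = p_pos_given_d * p_d + p_pos_given_nod * (1 - p_d)

    # Posterior probability of disease given a positive test:
    print(p_pos_given_d * p_d / p_pos)  # ~0.0317

Despite the test's 98% sensitivity, a positive result only raises the probability of disease to about 3%, because the disease is so rare to begin with.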
4 Bayesian Statistics
More generally, for a set of hypotheses H (or a continuous parameter θ) with prior π, and a vector of observed data X, Bayes' theorem reads

P(H|X) = P(X|H) · π(H) / Σ_i P(X|H_i) · π(H_i)  →  P(X|θ) · π(θ) / ∫ dθ P(X|θ) · π(θ),    (11)

where the arrow indicates the continuous (parameter) version.
This denominator is a kind of normalization constant. But where do we find our prior, π(H)? There is no golden rule; this is a subjective choice. But Bayes' theorem gives an if-then statement: if I have assumed some prior, then, given experimental data, the theorem tells us how to update that assumption.
Before any serious experimentation, we might have a very broad, "uninformative" prior. For any prior that is sufficiently uniform within the region favoured by the experimental results, the posterior will be very similar, which lends legitimacy to this approach. However, a prior which is very certain (like a delta function), but false, will not easily be corrected by this approach. It is therefore important to keep an open mind.
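To see the claim about broad priors numerically, here is a minimal Python sketch (the binomial "coin" model, the made-up data, and all names are our own assumptions, not from the lecture): two different broad priors for a success probability θ are updated on the same data, and the gridded posteriors are compared.

    # Posterior for a binomial parameter theta under two broad priors,
    # evaluated on a grid (pure Python; made-up data: k successes in n trials).
    k, n = 70, 100
    grid = [i / 200 for i in range(1, 200)]  # theta values in (0, 1)

    def posterior(prior):
        unnorm = [prior(t) * t**k * (1 - t)**(n - k) for t in grid]
        norm = sum(unnorm)
        return [u / norm for u in unnorm]

    flat = posterior(lambda t: 1.0)            # uniform prior
    sloped = posterior(lambda t: 2 * (1 - t))  # mildly decreasing prior
    # With this much data the likelihood dominates, so the two
    # gridded posteriors differ only slightly:
    print(max(abs(a - b) for a, b in zip(flat, sloped)))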
The bulk of statistical tools used in physics are frequentist in nature, and this course will
match that.
5 Random Variables
A random variable (RV) is a numerical label for each element of the sample space (or
the hypothesis space). This label could be continuous or discrete. It could be a single
quantity (a scalar) or a collection of numbers (a vector).
For continuous variables, we need to talk about the probability of finding a value in some infinitesimal interval:

f(x) dx = P(x found in [x, x + dx]),    (12)

where f(x) is called the probability density function (pdf). Integrating over a finite interval,

∫_a^b f(x) dx = P(a < x < b).    (13)
For discrete variables, P(x_i) = p_i gives a "probability mass function" (pmf) rather than a pdf; e.g. for a fair die, p_i = 1/6 for i = 1, …, 6, and Σ_i p_i = 1.
We can calculate the cumulative distribution function (cdf):

F(x) = P(x′ ≤ x) = ∫_−∞^x f(x′) dx′.    (15)
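As a numerical illustration of (13) and (15), here is a minimal Python sketch (the exponential pdf f(x) = e^−x is our own choice of example, not from the lecture):

    import math
    import random

    # For f(x) = exp(-x), x >= 0, the cdf is F(x) = 1 - exp(-x).
    def F(x):
        return 1.0 - math.exp(-x)

    a, b = 0.5, 2.0
    exact = F(b) - F(a)  # P(a < x < b) from eqs. (13) and (15)

    # Monte Carlo estimate of the same probability:
    random.seed(1)
    n = 1_000_000
    hits = sum(1 for _ in range(n) if a < random.expovariate(1.0) < b)
    print(exact, hits / n)  # should agree to roughly 3 decimal places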
6 Homework
Problem Set 1: Due Monday, October 9