Statistical Data Analysis: PH4515
Lecture 1
Glen Cowan
1 Course structure
Be aware that up through week 6, the final hour of each session is used for C++ preparation; after that, it will be essentially coursework review and examples supervision.
Please turn in your problem sheets on paper at the Monday lectures. Staple the pages and write on the sheet your name, College, and degree programme (e.g. Graham Van Goffrier, UCL, PhD Data-Intensive Science).
Course assessment is 100% coursework for PhD students; this material will NOT appear on the HEP exams. There are 9 problem sheets!
2 What is Statistics?
Theory → Statistics → Experiment
Theories give predictions for observables, but these are essentially always approximations in particle physics. Furthermore, they require adjustable parameters (25 in the SM) which are known only approximately.
Experimental measurements are subject to random fluctuations, which limit repeatability.
Goals:
1. Estimate the parameters.
2. Quantify uncertainty in parameter estimates.
3. Test the theory (model).
We want quantitative statements here!
To understand uncertainty in our models and measurements, we need to first describe
probability.
3 What is Probability?
Kolmogorov’s Axioms were formalized (using set theory) in 1933.
Consider a set called the sample space S, which has subsets A, B, etc. E.g. elements
could be the outcomes of some repeatable measurement.
1. ∀A ⊂ S, P (A) ≥ 0.
2. P(S) = 1.
3. If A ∩ B = ∅, then P (A ∪ B) = P (A) + P (B).
From these axioms we can derive familiar results, e.g. P(Ā) = 1 − P(A) (where Ā ≡ S − A), P(A ∪ Ā) = 1, P(∅) = 0, and A ⊂ B ⟹ P(A) ≤ P(B).
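For instance, the first of these follows directly (a short derivation added here for clarity, not an assigned problem): A and Ā are disjoint with A ∪ Ā = S, so axioms 2 and 3 give 1 = P(S) = P(A) + P(Ā), hence P(Ā) = 1 − P(A).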
Show this on problem sheet 1: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). And show it using only the three Kolmogorov axioms!
The conditional probability of A given B is defined as

P(A|B) = P(A ∩ B) / P(B).    (1)
e.g. rolling a die: imagine somebody has peeked and already told us that the outcome is even, i.e. B = {2, 4, 6}. Then P(6|B) = (1/6)/(1/2) = 1/3, rather than the unconditional P(6) = 1/6.
A and B are called independent if

P(A ∩ B) = P(A) · P(B),    (2)

in which case

P(A|B) = P(A) · P(B) / P(B) = P(A).    (3)

e.g. for a die, A = {even} and B = {1, 2} are independent: P(A ∩ B) = P({2}) = 1/6 = P(A) · P(B).
Interpretations of probability:
1. Relative frequency (frequentist): the elements of S are outcomes of a repeatable experiment, and

P(A) = lim_{n→∞} (# times outcome is in A) / n.    (4)

e.g. the "usual" Copenhagen interpretation of QM, proton-proton (pp) collisions, radioactive decay. (A short numerical illustration of this limit follows after the list.)
2. Subjective (Bayesian): we can think of the elements of S as hypotheses which may be true or false. Only one of the hypotheses in S is true, and P(A) is my subjective degree of belief that the true hypothesis is in the subset A.
e.g. cosmology, since we only have access to one universe and therefore experiments are not repeatable.
Both interpretations are consistent with the Kolmogorov Axioms, but remain distinct.
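The following is a minimal Python sketch (our own illustration, not part of the lecture) of the limiting relative frequency in (4), using the even-die-roll event from above:

    import random

    # Estimate P(even) for a fair die as a limiting relative frequency.
    random.seed(1)
    for n in (100, 10_000, 1_000_000):
        hits = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
        print(n, hits / n)  # the relative frequency approaches 1/2 as n grows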
By the same definition as (1),

P(B|A) = P(B ∩ A) / P(A).    (5)

Since P(A ∩ B) = P(B ∩ A), eliminating the intersection between (1) and (5) gives Bayes' theorem:

P(A|B) = P(B|A) · P(A) / P(B).    (6)
This is key: P(A|B) and P(B|A) are in general very different. The probability that a person eats bread given that they are a murderer is something like 80%; the probability that a person is a murderer given that they eat bread is something like 10⁻⁶.
If the subsets A_i form a disjoint partition of S (i.e. S = ∪_i A_i with A_i ∩ A_j = ∅ for i ≠ j), then the law of total probability gives

P(B) = Σ_i P(B ∩ A_i) = Σ_i P(B|A_i) · P(A_i),    (7)

so for any A = A_j in the partition, Bayes' theorem can be written

P(A|B) = P(B|A) · P(A) / Σ_i P(B|A_i) · P(A_i).    (9)
3.5 Example application of Bayes’ Theorem
Consider a disease D where P(D) = 0.001 and P(no D) = 0.999, i.e. the prior probability of carrying the disease, before any test, is 0.1%.
The accuracy of a test for the disease is given as:

P(+|D) = 0.98
P(−|D) = 0.02
P(+|no D) = 0.03
P(−|no D) = 0.97

The probability of having the disease given a positive test result is then, using (7) for the denominator,

P(D|+) = P(+|D) · P(D) / P(+)
       = P(+|D) · P(D) / [P(+|D) · P(D) + P(+|no D) · P(no D)]
       ≈ 0.032.    (10)
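Below is a minimal Python sketch of this calculation (our own check, not from the lecture; the variable names are ours):

    # Bayes' theorem for the disease-test example, eq. (10).
    p_d = 0.001              # prior P(D)
    p_pos_given_d = 0.98     # P(+|D)
    p_pos_given_nod = 0.03   # P(+|no D)

    # Law of total probability, eq. (7), for P(+):
    p_pos = p_pos_given_d * p_d + p_pos_given_nod * (1 - p_d)

    # Posterior probability of disease given a positive test:
    print(p_pos_given_d * p_d / p_pos)  # ~0.0317

Despite the test's 98% sensitivity, a positive result only raises the probability of disease to about 3%, because the disease is so rare to begin with.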
4 Bayesian Statistics
More generally, for a set of hypotheses H (or a continuous parameter θ) with prior π, and a vector of observed data X, Bayes' theorem reads

P(H|X) = P(X|H) · π(H) / Σ_i P(X|H_i) · π(H_i)  →  P(X|θ) · π(θ) / ∫ dθ P(X|θ) · π(θ),    (11)

where the arrow indicates the continuous (parameter) version.
This denominator is a kind of normalization constant. But where do we find our prior, π(H)? There is no golden rule; this is a subjective choice. But Bayes' theorem gives an if-then statement: if I have assumed some prior, then, given experimental data, the theorem tells us how to update that assumption.
Before any serious experimentation, we might have a very broad, "uninformative" prior. For any prior that is sufficiently uniform within the region favoured by the experimental results, the posterior will be very similar, which lends legitimacy to this approach. However, a prior which is very certain (like a delta function), but false, will not easily be corrected by this approach. It is therefore important to keep an open mind.
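To see the claim about broad priors numerically, here is a minimal Python sketch (the binomial "coin" model, the made-up data, and all names are our own assumptions, not from the lecture): two different broad priors for a success probability θ are updated on the same data, and the gridded posteriors are compared.

    # Posterior for a binomial parameter theta under two broad priors,
    # evaluated on a grid (pure Python; made-up data: k successes in n trials).
    k, n = 70, 100
    grid = [i / 200 for i in range(1, 200)]  # theta values in (0, 1)

    def posterior(prior):
        unnorm = [prior(t) * t**k * (1 - t)**(n - k) for t in grid]
        norm = sum(unnorm)
        return [u / norm for u in unnorm]

    flat = posterior(lambda t: 1.0)            # uniform prior
    sloped = posterior(lambda t: 2 * (1 - t))  # mildly decreasing prior
    # With this much data the likelihood dominates, so the two
    # gridded posteriors differ only slightly:
    print(max(abs(a - b) for a, b in zip(flat, sloped)))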
The bulk of statistical tools used in physics are frequentist in nature, and this course will
match that.
5 Random Variables
A random variable (RV) is a numerical label for each element of the sample space (or
the hypothesis space). This label could be continuous or discrete. It could be a single
quantity (a scalar) or a collection of numbers (a vector).
For continuous variables, we need to talk about the probability of finding a value in some infinitesimal interval:

f(x) dx = P(x found in [x, x + dx]),    (12)

where f(x) is called the probability density function (pdf). Integrating over a finite interval,

∫_a^b f(x) dx = P(a < x < b).    (13)
For discrete variables, P(x_i) = p_i gives a "probability mass function" (pmf) rather than a pdf; e.g. for a fair die, p_i = 1/6 for i = 1, …, 6, and Σ_i p_i = 1.
We can calculate the cumulative distribution function (cdf):

F(x) = P(x′ ≤ x) = ∫_−∞^x f(x′) dx′.    (15)
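As a numerical illustration of (13) and (15), here is a minimal Python sketch (the exponential pdf f(x) = e^−x is our own choice of example, not from the lecture):

    import math
    import random

    # For f(x) = exp(-x), x >= 0, the cdf is F(x) = 1 - exp(-x).
    def F(x):
        return 1.0 - math.exp(-x)

    a, b = 0.5, 2.0
    exact = F(b) - F(a)  # P(a < x < b) from eqs. (13) and (15)

    # Monte Carlo estimate of the same probability:
    random.seed(1)
    n = 1_000_000
    hits = sum(1 for _ in range(n) if a < random.expovariate(1.0) < b)
    print(exact, hits / n)  # should agree to roughly 3 decimal places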
6 Homework
Problem Set 1: Due Monday, October 9