Dark Arts Handout


Statistics

Lent Term 2015


Prof. Mark Thomson

Lecture 4 : The Dark Arts


Prof. M.A. Thomson Lent 2015 107

Course Synopsis
Lecture 1: The basics
Introduction, Probability distribution functions, Binomial
distributions, Poisson distribution
Lecture 2: Treatment of Gaussian Errors
The central limit theorem, Gaussian errors, Error
propagation, Combination of measurements, Multi-
dimensional Gaussian errors, Error Matrix
Lecture 3: Fitting and Hypothesis Testing
The χ2 test, Likelihood functions, Fitting, Binned maximum
likelihood, Unbinned maximum likelihood
Lecture 4: The Dark Arts
Bayesian Inference, Credible Intervals
The Frequentist approach, Confidence Intervals
Systematic Uncertainties

Prof. M.A. Thomson Lent 2015 108


Parameter Estimation Revisited
!  Let's consider more carefully the maximum likelihood method;
for simplicity consider a single parameter a
!  Construct the likelihood that our data are consistent with the model, i.e.
the probability that the model would give the observed data: L(a) = P(data | a)

!  We have then (very reasonably) taken the value of a which maximises
the likelihood as our best estimate of the parameter, â
!  With less justification we then took our error estimate from the points where
ln L falls by 1/2 from its maximum
!  Does this really make sense ?


!  What we really want to calculate is the posterior PDF for the parameter
given the data, i.e. P(a | data)

!  Implicitly we have assumed P(a | data) ∝ L(data | a)

Cannot justify this – in general it is not the case
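As an illustration of the recipe above, a minimal numerical sketch (the toy counts and the Poisson model are assumptions for the example, not from the handout): scan ln L(a), take the maximum as the estimate and the Δln L = 1/2 points as the error.

```python
# Minimal sketch: ML estimate of a Poisson mean 'a' from toy counts,
# error from the points where ln L falls by 1/2 from its maximum.
import numpy as np
from scipy.stats import poisson

data = np.array([4, 7, 5, 6, 3])           # hypothetical observed counts
a_grid = np.linspace(1.0, 12.0, 2201)      # scan of the single parameter a

# ln L(a) = sum_i ln P(n_i | a) for independent Poisson counts
lnL = np.array([poisson.logpmf(data, a).sum() for a in a_grid])

a_hat = a_grid[np.argmax(lnL)]             # best estimate: maximises ln L
inside = a_grid[lnL >= lnL.max() - 0.5]    # points with ln L within 1/2 of the maximum
print(f"a_hat = {a_hat:.2f}, interval ~ [{inside.min():.2f}, {inside.max():.2f}]")
```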

Prof. M.A. Thomson Lent 2015 109

Conditional Probabilities and Bayes Theory


!  A nice example of conditional probability (from L. Lyons)
"  In the general population, the probability of a randomly selected woman
being pregnant is 2%: P(pregnant | woman) = 0.02

"  But it clearly does not follow that a randomly selected pregnant person has only
a 2% probability of being a woman: P(woman | pregnant) ≠ P(pregnant | woman)
!  Correct treatment of conditional probabilities requires Bayes theorem
"  Probability of A and B can be expressed in terms of conditional probabilities:
P(A and B) = P(A | B) P(B) = P(B | A) P(A), hence P(A | B) = P(B | A) P(A) / P(B)

!  Here the prior probability of selecting a woman is P(woman) = 0.5,
i.e. half the population are women,
and the prior probability of selecting a pregnant person is P(pregnant) = 0.01,
i.e. 1 % of the population are pregnant
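Putting in the numbers quoted above (the P(·|·) notation is mine, the numbers are those stated on this slide):

   P(woman | pregnant) = P(pregnant | woman) × P(woman) / P(pregnant)
                       = (0.02 × 0.5) / 0.01 = 1

i.e. every randomly selected pregnant person is a woman.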

Sanity
restored…

Prof. M.A. Thomson Lent 2015 110


!  Apply Bayes theory to the measurement of a parameter x
"  We determine P(data | x), i.e. the likelihood function
"  We want P(x | data), i.e. the PDF for x in the light of the data

"  Bayes theory gives:

   P(x | data) = P(data | x) P(x) / P(data)

P(x | data) : the posterior PDF for x, i.e. in the light of the data
P(data | x) : the likelihood function, i.e. what we measure
P(x) : prior probability of x, i.e. encompassing our knowledge of
x before the measurement
P(data) : prior probability of the data. Since this doesn't depend on
x it is essentially a normalisation constant
!  Bayes theory tells us how to modify our knowledge of x in the light of new data
Bayes theory is the formal basis of Statistical Inference
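As a concrete illustration, a minimal sketch of Bayes theorem applied numerically on a grid (the Gaussian likelihood, the numbers and the flat prior are all assumptions for the example, not from the handout):

```python
# Minimal sketch: posterior(x) ∝ likelihood(data | x) × prior(x);
# P(data) is just the normalisation of the product.
import numpy as np

x = np.linspace(0.0, 10.0, 1001)                  # grid of the parameter x
dx = x[1] - x[0]
x0, s = 4.2, 1.0                                  # hypothetical measurement x0 ± s
likelihood = np.exp(-0.5 * ((x0 - x) / s) ** 2)   # P(data | x), up to a constant
prior = np.ones_like(x)                           # flat prior, P(x) = constant

posterior = likelihood * prior                    # numerator of Bayes theorem
posterior /= posterior.sum() * dx                 # dividing by P(data) = normalising
print("posterior mean:", (x * posterior).sum() * dx)
```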

Prof. M.A. Thomson Lent 2015 111

Applying Bayes Theorem


!  Bayes theory provides an unambiguous prescription for going from
the likelihood P(data | x) to the posterior P(x | data)
!  But you need to provide the PRIOR PROBABILITY P(x)


!  This is fine if you have an objective prior, e.g. a previous measurement
x = x1 ± σ1, so that the prior P(x) is a Gaussian of mean x1 and width σ1

"  If we now make a new measurement x2 ± σ2, i.e. determine the likelihood function
P(data | x)
"  Bayes theory then gives a Gaussian posterior P(x | data) of mean ⟨x⟩ and width σ,
where ⟨x⟩ and σ² are the usual
mean and variance for combining
two measurements
"  For this to be a (normalised) PDF one can infer P(data) (although it isn't of any interest)

Prof. M.A. Thomson Lent 2015 112


The Problem with Applying Bayes Theorem
!  The problem arises when there is no objective prior
!  For example, in a hypothetical background-free search for a Z′, observe
no events
"  No problem in calculating the likelihood function (a conditional probability):
P(0 | x) = e^(−x), the Poisson prob. for observing 0 events,
where x is the true number of expected events


"  What is the best estimate of x and the 90 % confidence level upper limit ?
"  Depends on the choice of prior probability:

"  What to do about the prior ?


"  i.e. how do we express our knowledge (none) of x prior to the measurement
!  In general there is no objective answer – one is always putting in some extra information
"  i.e. a subjective bias
"  could argue that a flat prior, i.e. P(x) = constant, is objective
"  but why not choose a prior that is flat in ln x ?
"  for some limits/measurements (e.g. a mass) a flat prior in ln x is more natural
"  the arbitrariness in the choice of prior is a problem for the Bayesian approach
"  it can make a big difference…
Prof. M.A. Thomson Lent 2015 113

Choice of Prior, example I


!  See no events… the likelihood is the Poisson prob. for observing 0: P(0 | x) = e^(−x)

[Plots: posterior for a prior flat in x and for a prior flat in ln x]

!  The conclusions are very different. Compare regions containing 90 % of probability

"  In this case, the choice of prior is important

Prof. M.A. Thomson Lent 2015 114


Choice of Prior, example II
!  Suppose we measure the W-boson mass:

!  We want P(m | data), the posterior PDF for the mass m
"  Again consider two priors: flat in m and flat in ln m

"  Here the choice of prior is NOT important


"  The data are strong enough to overcome our prior assumptions (subjective bias)
"  Here, can interpret the measurement as a Gaussian PDF for m
Prof. M.A. Thomson Lent 2015 115

Choice of Prior, example III


#  An example (apparently due to Newton), e.g. see CERN Yellow Report 2000-005
!  Suppose you are in the Tower of London facing execution.
!  The Queen arrives carrying a small bag and says
This bag contains 5 balls; the balls are either white or black. If you correctly
guess the number of black balls, I will spare your life and set you free.
!  The Queen is in a good mood and continues
To give you a better chance, you can take one of the balls from the bag.
It's BLACK
!  The Queen points her pistol at you
Time to choose, sucker…
!  What do you guess to maximise your chance of survival ?
!  Use statistical inference to analyse the problem.
"  Let n be the number of black balls in the bag.
"  The data are that you picked out a black ball
"  Can calculate
e.g. if there were two black balls chance of picking out a black ball from the
five in the bag was 2/5.

Prof. M.A. Thomson Lent 2015 116


[Plot: the likelihood P(drew black | n) for n = 0, 1, …, 5]

!  But we want the posterior P(n | drew black)
!  Answer depends on choice of Prior

!  Could assume flat Prior
[Plots: flat prior and the resulting posterior, which peaks at n = 5]
GUESS: 5

!  Could assume balls drawn randomly from a large bag containing equal nos. B & W
[Plots: binomial prior and the resulting posterior, which peaks at n = 3]
GUESS: 3

!  Oh dear… answer depends on Prior (unknown) assumptions
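A minimal sketch reproducing these two cases numerically (nothing beyond the slide is assumed except that the hypothetical large bag is 50/50 black and white):

```python
# Minimal sketch: posterior P(n | drew black) for n = 0..5 under the two priors.
import numpy as np
from math import comb

n = np.arange(6)                                    # possible numbers of black balls
likelihood = n / 5.0                                # P(drew a black ball | n)

for name, prior in [("flat prior", np.ones(6)),
                    ("binomial prior", np.array([comb(5, int(k)) for k in n]) / 32.0)]:
    post = likelihood * prior
    post /= post.sum()
    print(f"{name:>14s}: best guess n = {np.argmax(post)}, posterior = {np.round(post, 3)}")
```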


Prof. M.A. Thomson Lent 2015 117

!  So what do we learn from this ?


(apart from something about the role of the Monarchy in a modern democracy)

"  Whilst we know how to apply Bayesian statistical inference, we have


insufficient data, i.e. we don t know the prior
"  Unless the data are strong , i.e. override the information in the reasonable
range of prior probabilities, we cannot expect to know

"  Applies equally to our experiment where we saw zero events and wanted to
arrive at a PDF for the expected mean number of events…
Don't have enough information to answer this question

Prof. M.A. Thomson Lent 2015 118


Bayesian Credible Intervals
!  Ideally, (I) would like to work with probabilities, i.e. a PDF which encompasses all
our knowledge of a particular parameter, e.g. P(mH | data) for the Higgs boson mass

!  Could then integrate PDF to contain 95 % of probability. Can then define the
95 % Credible Interval*: mH < 186 GeV

!  To do this need to go from the likelihood, i.e. from P(data | mH), to P(mH | data)

"  requires a subjective choice of prior probability
!  Hence Bayesian Credible Intervals necessarily include some additional input
beyond the data alone…
*This is not what is done.
Prof. M.A. Thomson Lent 2015 119

Bayesian Credible Intervals - example


!  Trying to estimate a selection efficiency using MC events. All N events pass cuts.
"  what statement can we make about the efficiency?
!  Binomial distribution… the likelihood for all N events to pass is P(data | ε) = ε^N

!  Apply Bayes theorem:

   P(ε | data) ∝ P(data | ε) × P(ε)    (likelihood × Prior, up to a normalisation Constant)

!  Choose prior, e.g. flat: P(ε) = constant for 0 ≤ ε ≤ 1

!  Normalise: ∫ P(ε | data) dε = 1 over [0, 1] gives P(ε | data) = (N + 1) ε^N

Prof. M.A. Thomson Lent 2015 120


! Integrate to find region containing 90% of probability

90 % Credible Interval:
(with a flat prior probability)
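A minimal sketch of that integration for an invented N (the handout's actual number of MC events is not quoted above); with a flat prior the posterior is (N+1) ε^N and the 90 % interval is ε > 0.1^(1/(N+1)):

```python
# Minimal sketch: 90% credible lower limit on the efficiency when all N MC events pass.
import numpy as np

N = 10                                     # hypothetical number of MC events, all passing
eps = np.linspace(0.0, 1.0, 100001)
posterior = (N + 1) * eps ** N             # flat prior: normalised posterior on [0, 1]

cdf = np.cumsum(posterior)
cdf /= cdf[-1]
eps_low = eps[np.searchsorted(cdf, 0.10)]  # lower edge of the interval [eps_low, 1]
print(f"numerical: eps > {eps_low:.3f},  analytic: eps > {0.1 ** (1.0 / (N + 1)):.3f}")
```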

Prof. M.A. Thomson Lent 2015 121

Likelihood Ordering
!  Note, the 90 % credible interval is not uniquely defined
"  more than one interval contains 90 % of the probability, e.g.

[Plot: an alternative 90 % Credible Interval]

!  It is natural to choose the interval such that all points in the excluded region are
lower in likelihood than those in the credible interval: likelihood ordering
!  Credible intervals provide an intuitive way of interpreting data, but:
"  Rarely used in Particle Physics as a way of presenting data
"  Because they represent the data and prior combined
"  NOTE: all information from the experiment is in the likelihood

Prof. M.A. Thomson Lent 2015 122


C.I. vs C.L.
!  From data obtain the likelihood function P(data | x)
!  Bayes theorem provides the mathematical framework for statistical inference
!  To go from P(data | x) to P(x | data) requires a (usually) subjective choice
of Prior probability
!  For weak data, the choice of Prior can drive the interpretation of the data
!  Credible intervals are a useful way of interpreting data, but are generally not
used in Particle Physics as a way of presenting the conclusions of an experiment.
!  Particle Physics tends to use Frequentist Confidence Limits, which are not the same thing
[and do not form a mathematically consistent basis for
statistical inference]
!  Finally, never forget that credible intervals (or confidence limits) are an
interpretation of the data

The experimental result is the likelihood function

Prof. M.A. Thomson Lent 2015 123

A Few words on Systematic Uncertainties


!  Systematic Uncertainties are often associated with an internal unknown bias, e.g.
"  How well do you know your calibration
"  How well does MC model the data, e.g. jet fragmentation parameters
!  Parametric Uncertainties associated with uncertain parameters
"  How does the uncertainty on the Higgs mass impact the interpretation of a
a measurement
!  No over-riding principle – just some general guidelines
"  Once a result is published, systematic errors will be treated as if they are
Gaussian
x = a ± b (stat.) ± c (syst.)
"  Some systematic errors are Gaussian: e.g. energy scale determined
from data e.g. Z ! e+ e to determine electron energy scale
"  Others are not: e.g. impact of different jet hadronisation models, where one
might compare PYTHIA with HERWIG – here one obtains a single estimate
of the scale of the uncertainties
"  Theoretical uncertainties: e.g. missing HO corrections. Again these are
estimates – should not be treated as Gaussian (although they are)
!  Systematic dominated measurements
"  Beware – if there is a single dominating systematic error and it is inherently
non-Gaussian, this is a problem

Prof. M.A. Thomson Lent 2015 124


Estimating Systematic Uncertainties
!  No rules – just guidelines
"  Remember syst. errors will be treated as Gaussian, so try to evaluate them on
this basis, e.g. suppose use 3 alternative MC jet fragmentation models and
result changes by +Δ1, +Δ2 and –Δ3 (where Δ2 is the largest):
i) take largest shift as systematic error estimate: Δ2 ?
ii) assume error distributed uniformly in “box” of width 2Δ2 giving an rms
of 2Δ2 /√12 ?

"  Cut variation is evil (i.e. vary cuts and see how results change)
•  at best, introduces statistical noise
•  at worst, hides away lack of understanding of some data - MC discrepancy
understand the origin of the discrepancy

"  Wherever possible use data driven estimates, energy scales, control samples,
etc.

"  Remember that you are estimating the scale of a possible systematic bias

Prof. M.A. Thomson Lent 2015 125

Incorporating Systematics into Fits


!  Two commonly used approaches
"  Error matrix – with (correlated) systematic uncertainties
"  Nuisance parameters
!  Nuisance parameter example:
"  Suppose we are looking at WW decays and count numbers of events in three
different decay channels qqqq, qqlv and lvlv
"  Want to measure cross section and hadronic branching fractions accounting
for common luminosity uncertainty
i) build physics model

   N_qqqq^exp(σ_WW, B_qq, L) = L σ_WW B_qq²

ii) build likelihood function

   χ²(σ_WW, B_qq, L) = −2 ln L
      = (N_qqqq^exp − N_qqqq^obs)²/N_qqqq^exp + (N_qqlv^exp − N_qqlv^obs)²/N_qqlv^exp + (N_lvlv^exp − N_lvlv^obs)²/N_lvlv^exp

iii) add penalty term for nuisance parameters, here the integrated lumi. known
to be L0 with uncertainty σ_L

   χ²(σ_WW, B_qq, L) = −2 ln L
      = (N_qqqq^exp − N_qqqq^obs)²/N_qqqq^exp + (N_qqlv^exp − N_qqlv^obs)²/N_qqlv^exp + (N_lvlv^exp − N_lvlv^obs)²/N_lvlv^exp + (L − L0)²/σ_L²
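A minimal numerical sketch of such a fit (all counts, the luminosity values and the qqlv / lvlv expressions are assumptions for illustration, not taken from the handout):

```python
# Minimal sketch: chi2 with a luminosity nuisance parameter constrained by a penalty term.
import numpy as np
from scipy.optimize import minimize

N_obs = {"qqqq": 920, "qqlv": 880, "lvlv": 210}    # hypothetical observed event counts
L0, sigma_L = 500.0, 10.0                          # externally measured luminosity L0 ± sigma_L

def n_exp(sig_WW, B_qq, L):
    # physics model; the qqlv and lvlv expressions assume the branching fractions factorise
    return {"qqqq": L * sig_WW * B_qq ** 2,
            "qqlv": L * sig_WW * 2 * B_qq * (1 - B_qq),
            "lvlv": L * sig_WW * (1 - B_qq) ** 2}

def chi2(p):
    sig_WW, B_qq, L = p
    exp = n_exp(sig_WW, B_qq, L)
    penalty = (L - L0) ** 2 / sigma_L ** 2          # nuisance-parameter penalty term
    return sum((exp[c] - N_obs[c]) ** 2 / exp[c] for c in N_obs) + penalty

fit = minimize(chi2, x0=[4.0, 0.67, 500.0])         # fit sigma_WW, B_qq and the nuisance L
print("fitted (sigma_WW, B_qq, L):", np.round(fit.x, 3))
```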

Prof. M.A. Thomson Lent 2015 126


Incorporating Systematics into Fits
!  Let’s consider this more closely
   χ²(σ_WW, B_qq, L) = −2 ln L
      = (N_qqqq^exp − N_qqqq^obs)²/N_qqqq^exp + … + (L − L0)²/σ_L²

"  We are now fitting 3 parameters


•  the number of degrees of freedom has not changed, since we have added one
parameter, but also one additional “data point”
"  Of the 3 parameters, we are “not interested” in the fitted value of the lumi.
"  The penalty term constrains the luminosity to be consistent with the
externally measured value
"  The presence of the nuisance parameters will flatten the fitted likelihood
surface – increasing the uncertainties on the fitted parameters
"  Also have some measure of the tension in the fit
•  if the data pull the nuisance parameter away from the expected value, could
indicate a problem

Prof. M.A. Thomson Lent 2015 127

That’s All Folks

Prof. M.A. Thomson Lent 2015 128
