Dark Arts Handout


Statistics

Lent Term 2015


Prof. Mark Thomson

Lecture 4 : The Dark Arts


Prof. M.A. Thomson Lent 2015 107

Course Synopsis
Lecture 1: The basics
Introduction, Probability distribution functions, Binomial
distributions, Poisson distribution
Lecture 2: Treatment of Gaussian Errors
The central limit theorem, Gaussian errors, Error
propagation, Combination of measurements, Multi-
dimensional Gaussian errors, Error Matrix
Lecture 3: Fitting and Hypothesis Testing
The χ2 test, Likelihood functions, Fitting, Binned maximum
likelihood, Unbinned maximum likelihood
Lecture 4: The Dark Arts
Bayesian Inference, Credible Intervals
The Frequentist approach, Confidence Intervals
Systematic Uncertainties

Prof. M.A. Thomson Lent 2015 108


Parameter Estimation Revisited
!  Let's consider more carefully the maximum likelihood method;
for simplicity consider a single parameter a
!  Construct the likelihood that our data are consistent with the model, i.e.
the probability that the model would give the observed data: L(a) = P(data | a)

!  We have then (very reasonably) taken the value of a which maximises
the likelihood as our best estimate of the parameter, â
!  With less justification we then took our error estimate from the points where
ln L falls by 1/2 from its maximum
!  Does this really make sense ?


!  What we really want to calculate is the posterior PDF for the parameter
given the data, i.e. P(a | data)

!  Implicitly we have assumed P(a | data) ∝ L(data | a)

Cannot justify this – in general it is not the case
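As an illustration of the recipe above, a minimal numerical sketch (the toy counts and the Poisson model are assumptions for the example, not from the handout): scan ln L(a), take the maximum as the estimate and the Δln L = 1/2 points as the error.

```python
# Minimal sketch: ML estimate of a Poisson mean 'a' from toy counts,
# error from the points where ln L falls by 1/2 from its maximum.
import numpy as np
from scipy.stats import poisson

data = np.array([4, 7, 5, 6, 3])           # hypothetical observed counts
a_grid = np.linspace(1.0, 12.0, 2201)      # scan of the single parameter a

# ln L(a) = sum_i ln P(n_i | a) for independent Poisson counts
lnL = np.array([poisson.logpmf(data, a).sum() for a in a_grid])

a_hat = a_grid[np.argmax(lnL)]             # best estimate: maximises ln L
inside = a_grid[lnL >= lnL.max() - 0.5]    # points with ln L within 1/2 of the maximum
print(f"a_hat = {a_hat:.2f}, interval ~ [{inside.min():.2f}, {inside.max():.2f}]")
```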

Prof. M.A. Thomson Lent 2015 109

Conditional Probabilities and Bayes Theory


!  A nice example of conditional probability (from L. Lyons)
"  In the general population, the probability of a randomly selected woman
being pregnant is 2%: P(pregnant | woman) = 0.02

"  But it clearly does not follow that a randomly selected pregnant person has only
a 2% probability of being a woman: P(woman | pregnant) ≠ P(pregnant | woman)
!  Correct treatment of conditional probabilities requires Bayes theorem
"  Probability of A and B can be expressed in terms of conditional probabilities:
P(A and B) = P(A | B) P(B) = P(B | A) P(A), hence P(A | B) = P(B | A) P(A) / P(B)

!  Here the prior probability of selecting a woman is P(woman) = 0.5,
i.e. half the population are women,
and the prior probability of selecting a pregnant person is P(pregnant) = 0.01,
i.e. 1 % of the population are pregnant
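Putting in the numbers quoted above (the P(·|·) notation is mine, the numbers are those stated on this slide):

   P(woman | pregnant) = P(pregnant | woman) × P(woman) / P(pregnant)
                       = (0.02 × 0.5) / 0.01 = 1

i.e. every randomly selected pregnant person is a woman.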

Sanity
restored…

Prof. M.A. Thomson Lent 2015 110


!  Apply Bayes theory to the measurement of a parameter x
"  We determine P(data | x), i.e. the likelihood function
"  We want P(x | data), i.e. the PDF for x in the light of the data

"  Bayes theory gives:

   P(x | data) = P(data | x) P(x) / P(data)

P(x | data) : the posterior PDF for x, i.e. in the light of the data
P(data | x) : the likelihood function, i.e. what we measure
P(x) : prior probability of x, i.e. encompassing our knowledge of
x before the measurement
P(data) : prior probability of the data. Since this doesn't depend on
x it is essentially a normalisation constant
!  Bayes theory tells us how to modify our knowledge of x in the light of new data
Bayes theory is the formal basis of Statistical Inference
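As a concrete illustration, a minimal sketch of Bayes theorem applied numerically on a grid (the Gaussian likelihood, the numbers and the flat prior are all assumptions for the example, not from the handout):

```python
# Minimal sketch: posterior(x) ∝ likelihood(data | x) × prior(x);
# P(data) is just the normalisation of the product.
import numpy as np

x = np.linspace(0.0, 10.0, 1001)                  # grid of the parameter x
dx = x[1] - x[0]
x0, s = 4.2, 1.0                                  # hypothetical measurement x0 ± s
likelihood = np.exp(-0.5 * ((x0 - x) / s) ** 2)   # P(data | x), up to a constant
prior = np.ones_like(x)                           # flat prior, P(x) = constant

posterior = likelihood * prior                    # numerator of Bayes theorem
posterior /= posterior.sum() * dx                 # dividing by P(data) = normalising
print("posterior mean:", (x * posterior).sum() * dx)
```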

Prof. M.A. Thomson Lent 2015 111

Applying Bayes Theorem


!  Bayes theory provides an unambiguous prescription for going from
the likelihood P(data | x) to the posterior P(x | data)
!  But you need to provide the PRIOR PROBABILITY P(x)


!  This is fine if you have an objective prior, e.g. a previous measurement
x = x1 ± σ1, so that the prior P(x) is a Gaussian of mean x1 and width σ1

"  If we now make a new measurement x2 ± σ2, i.e. determine the likelihood function
P(data | x)
"  Bayes theory then gives a Gaussian posterior P(x | data) of mean ⟨x⟩ and width σ,
where ⟨x⟩ and σ² are the usual
mean and variance for combining
two measurements
"  For this to be a (normalised) PDF one can infer P(data) (although it isn't of any interest)

Prof. M.A. Thomson Lent 2015 112


The Problem with Applying Bayes Theorem
!  The problem arises when there is no objective prior
!  For example, in a hypothetical background-free search for a Z′, observe
no events
"  No problem in calculating the likelihood function (a conditional probability):
P(0 | x) = e^(−x), the Poisson prob. for observing 0 events,
where x is the true number of expected events


"  What is the best estimate of x and the 90 % confidence level upper limit ?
"  Depends on the choice of prior probability:

"  What to do about the prior ?


"  i.e. how do we express our knowledge (none) of x prior to the measurement
!  In general there is no objective answer – one is always putting in some extra information
"  i.e. a subjective bias
"  could argue that a flat prior, i.e. P(x) = constant, is objective
"  but why not choose a prior that is flat in ln x ?
"  for some limits/measurements (e.g. a mass) a flat prior in ln x is more natural
"  the arbitrariness in the choice of prior is a problem for the Bayesian approach
"  it can make a big difference…
Prof. M.A. Thomson Lent 2015 113

Choice of Prior, example I


!  See no events… the likelihood is the Poisson prob. for observing 0: P(0 | x) = e^(−x)

[Plots: posterior for a prior flat in x and for a prior flat in ln x]

!  The conclusions are very different. Compare regions containing 90 % of probability

"  In this case, the choice of prior is important

Prof. M.A. Thomson Lent 2015 114


Choice of Prior, example II
!  Suppose we measure the W-boson mass:

!  We want P(m | data), the posterior PDF for the mass m
"  Again consider two priors: flat in m and flat in ln m

"  Here the choice of prior is NOT important


"  The data are strong enough to overcome our prior assumptions (subjective bias)
"  Here, can interpret the measurement as a Gaussian PDF for m
Prof. M.A. Thomson Lent 2015 115

Choice of Prior, example III


#  An example (apparently due to Newton), e.g. see CERN Yellow Report 2000-005
!  Suppose you are in the Tower of London facing execution.
!  The Queen arrives carrying a small bag and says
This bag contains 5 balls; the balls are either white or black. If you correctly
guess the number of black balls, I will spare your life and set you free.
!  The Queen is in a good mood and continues
To give you a better chance, you can take one of the balls from the bag.
It's BLACK
!  The Queen points her pistol at you
Time to choose, sucker…
!  What do you guess to maximise your chance of survival ?
!  Use statistical inference to analyse the problem.
"  Let n be the number of black balls in the bag.
"  The data are that you picked out a black ball
"  Can calculate
e.g. if there were two black balls chance of picking out a black ball from the
five in the bag was 2/5.

Prof. M.A. Thomson Lent 2015 116


[Plot: the likelihood P(drew black | n) for n = 0, 1, …, 5]

!  But we want the posterior P(n | drew black)
!  Answer depends on choice of Prior

!  Could assume flat Prior
[Plots: flat prior and the resulting posterior, which peaks at n = 5]
GUESS: 5

!  Could assume balls drawn randomly from a large bag containing equal nos. B & W
[Plots: binomial prior and the resulting posterior, which peaks at n = 3]
GUESS: 3

!  Oh dear… answer depends on Prior (unknown) assumptions
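A minimal sketch reproducing these two cases numerically (nothing beyond the slide is assumed except that the hypothetical large bag is 50/50 black and white):

```python
# Minimal sketch: posterior P(n | drew black) for n = 0..5 under the two priors.
import numpy as np
from math import comb

n = np.arange(6)                                    # possible numbers of black balls
likelihood = n / 5.0                                # P(drew a black ball | n)

for name, prior in [("flat prior", np.ones(6)),
                    ("binomial prior", np.array([comb(5, int(k)) for k in n]) / 32.0)]:
    post = likelihood * prior
    post /= post.sum()
    print(f"{name:>14s}: best guess n = {np.argmax(post)}, posterior = {np.round(post, 3)}")
```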


Prof. M.A. Thomson Lent 2015 117

!  So what do we learn from this ?


(apart from something about the role of the Monarchy in a modern democracy)

"  Whilst we know how to apply Bayesian statistical inference, we have


insufficient data, i.e. we don t know the prior
"  Unless the data are strong , i.e. override the information in the reasonable
range of prior probabilities, we cannot expect to know

"  Applies equally to our experiment where we saw zero events and wanted to
arrive at a PDF for the expected mean number of events…
Don't have enough information to answer this question

Prof. M.A. Thomson Lent 2015 118


Bayesian Credible Intervals
!  Ideally, (I) would like to work with probabilities, i.e. a PDF which encompasses all
our knowledge of a particular parameter, e.g. P(mH | data) for the Higgs boson mass

!  Could then integrate PDF to contain 95 % of probability. Can then define the
95 % Credible Interval*: mH < 186 GeV

!  To do this need to go from the likelihood, i.e. from P(data | mH), to P(mH | data)

"  requires a subjective choice of prior probability
!  Hence Bayesian Credible Intervals necessarily include some additional input
beyond the data alone…
*This is not what is done.
Prof. M.A. Thomson Lent 2015 119

Bayesian Credible Intervals - example


!  Trying to estimate a selection efficiency using MC events. All N events pass cuts.
"  what statement can we make about the efficiency?
!  Binomial distribution… the likelihood for all N events to pass is P(data | ε) = ε^N

!  Apply Bayes theorem:

   P(ε | data) ∝ P(data | ε) × P(ε)    (likelihood × Prior, up to a normalisation Constant)

!  Choose prior, e.g. flat: P(ε) = constant for 0 ≤ ε ≤ 1

!  Normalise: ∫ P(ε | data) dε = 1 over [0, 1] gives P(ε | data) = (N + 1) ε^N

Prof. M.A. Thomson Lent 2015 120


! Integrate to find region containing 90% of probability

90 % Credible Interval:
(with a flat prior probability)
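A minimal sketch of that integration for an invented N (the handout's actual number of MC events is not quoted above); with a flat prior the posterior is (N+1) ε^N and the 90 % interval is ε > 0.1^(1/(N+1)):

```python
# Minimal sketch: 90% credible lower limit on the efficiency when all N MC events pass.
import numpy as np

N = 10                                     # hypothetical number of MC events, all passing
eps = np.linspace(0.0, 1.0, 100001)
posterior = (N + 1) * eps ** N             # flat prior: normalised posterior on [0, 1]

cdf = np.cumsum(posterior)
cdf /= cdf[-1]
eps_low = eps[np.searchsorted(cdf, 0.10)]  # lower edge of the interval [eps_low, 1]
print(f"numerical: eps > {eps_low:.3f},  analytic: eps > {0.1 ** (1.0 / (N + 1)):.3f}")
```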

Prof. M.A. Thomson Lent 2015 121

Likelihood Ordering
!  Note, the 90 % credible interval is not uniquely defined
"  more than one interval contains 90 % of the probability, e.g.

[Plot: an alternative 90 % Credible Interval]

!  It is natural to choose the interval such that all points in the excluded region are
lower in likelihood than those in the credible interval: likelihood ordering
!  Credible intervals provide an intuitive way of interpreting data, but:
"  Rarely used in Particle Physics as a way of presenting data
"  Because they represent the data and prior combined
"  NOTE: all information from the experiment is in the likelihood

Prof. M.A. Thomson Lent 2015 122


C.I. vs C.L.
!  From data obtain the likelihood function P(data | x)
!  Bayes theorem provides the mathematical framework for statistical inference
!  To go from P(data | x) to P(x | data) requires a (usually) subjective choice
of Prior probability
!  For weak data, the choice of Prior can drive the interpretation of the data
!  Credible intervals are a useful way of interpreting data, but are generally not
used in Particle Physics as a way of presenting the conclusions of an experiment.
!  Particle Physics tends to use Frequentist Confidence Limits, which are not the same thing
[and do not form a mathematically consistent basis for
statistical inference]
!  Finally, never forget that credible intervals (or confidence limits) are an
interpretation of the data

The experimental result is the likelihood function

Prof. M.A. Thomson Lent 2015 123

A Few words on Systematic Uncertainties


!  Systematic Uncertainties are often associated with an internal unknown bias, e.g.
"  How well do you know your calibration
"  How well does MC model the data, e.g. jet fragmentation parameters
!  Parametric Uncertainties associated with uncertain parameters
"  How does the uncertainty on the Higgs mass impact the interpretation of a
a measurement
!  No over-riding principle – just some general guidelines
"  Once a result is published, systematic errors will be treated as if they are
Gaussian
x = a ± b (stat.) ± c (syst.)
"  Some systematic errors are Gaussian: e.g. energy scale determined
from data e.g. Z ! e+ e to determine electron energy scale
"  Others are not: e.g. impact of different jet hadronisation models, where one
might compare PYTHIA with HERWIG – here one obtains a single estimate
of the scale of the uncertainties
"  Theoretical uncertainties: e.g. missing HO corrections. Again these are
estimates – should not be treated as Gaussian (although they are)
!  Systematic dominated measurements
"  Beware – if there is a single dominating systematic error and it is inherently
non-Gaussian, this is a problem

Prof. M.A. Thomson Lent 2015 124


Estimating Systematic Uncertainties
!  No rules – just guidelines
"  Remember syst. errors will be treated as Gaussian, so try to evaluate them on
this basis, e.g. suppose use 3 alternative MC jet fragmentation models and
result changes by +Δ1, +Δ2 and –Δ3 (where Δ2 is the largest):
i) take largest shift as systematic error estimate: Δ2 ?
ii) assume error distributed uniformly in “box” of width 2Δ2 giving an rms
of 2Δ2 /√12 ?

"  Cut variation is evil (i.e. vary cuts and see how results change)
•  at best, introduces statistical noise
•  at worst, hides away lack of understanding of some data - MC discrepancy
understand the origin of the discrepancy

"  Wherever possible use data driven estimates, energy scales, control samples,
etc.

"  Remember that you are estimating the scale of a possible systematic bias

Prof. M.A. Thomson Lent 2015 125

Incorporating Systematics into Fits


!  Two commonly used approaches
"  Error matrix – with (correlated) systematic uncertainties
"  Nuisance parameters
!  Nuisance parameter example:
"  Suppose we are looking at WW decays and count numbers of events in three
different decay channels qqqq, qqlv and lvlv
"  Want to measure cross section and hadronic branching fractions accounting
for common luminosity uncertainty
i) build physics model

   N_qqqq^exp(σ_WW, B_qq, L) = L σ_WW B_qq²

ii) build likelihood function

   χ²(σ_WW, B_qq, L) = −2 ln L
      = (N_qqqq^exp − N_qqqq^obs)²/N_qqqq^exp + (N_qqlv^exp − N_qqlv^obs)²/N_qqlv^exp + (N_lvlv^exp − N_lvlv^obs)²/N_lvlv^exp

iii) add penalty term for nuisance parameters, here the integrated lumi. known
to be L0 with uncertainty σ_L

   χ²(σ_WW, B_qq, L) = −2 ln L
      = (N_qqqq^exp − N_qqqq^obs)²/N_qqqq^exp + (N_qqlv^exp − N_qqlv^obs)²/N_qqlv^exp + (N_lvlv^exp − N_lvlv^obs)²/N_lvlv^exp + (L − L0)²/σ_L²
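A minimal numerical sketch of such a fit (all counts, the luminosity values and the qqlv / lvlv expressions are assumptions for illustration, not taken from the handout):

```python
# Minimal sketch: chi2 with a luminosity nuisance parameter constrained by a penalty term.
import numpy as np
from scipy.optimize import minimize

N_obs = {"qqqq": 920, "qqlv": 880, "lvlv": 210}    # hypothetical observed event counts
L0, sigma_L = 500.0, 10.0                          # externally measured luminosity L0 ± sigma_L

def n_exp(sig_WW, B_qq, L):
    # physics model; the qqlv and lvlv expressions assume the branching fractions factorise
    return {"qqqq": L * sig_WW * B_qq ** 2,
            "qqlv": L * sig_WW * 2 * B_qq * (1 - B_qq),
            "lvlv": L * sig_WW * (1 - B_qq) ** 2}

def chi2(p):
    sig_WW, B_qq, L = p
    exp = n_exp(sig_WW, B_qq, L)
    penalty = (L - L0) ** 2 / sigma_L ** 2          # nuisance-parameter penalty term
    return sum((exp[c] - N_obs[c]) ** 2 / exp[c] for c in N_obs) + penalty

fit = minimize(chi2, x0=[4.0, 0.67, 500.0])         # fit sigma_WW, B_qq and the nuisance L
print("fitted (sigma_WW, B_qq, L):", np.round(fit.x, 3))
```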

Prof. M.A. Thomson Lent 2015 126


Incorporating Systematics into Fits
!  Let’s consider this more closely
   χ²(σ_WW, B_qq, L) = −2 ln L
      = (N_qqqq^exp − N_qqqq^obs)²/N_qqqq^exp + … + (L − L0)²/σ_L²

"  We are now fitting 3 parameters


•  the number of degrees of freedom has not changed, since we have added one
parameter, but also one additional “data point”
"  Of the 3 parameters, we are “not interested” in the fitted value of the lumi.
"  The penalty term constrains the luminosity to be consistent with the
externally measured value
"  The presence of the nuisance parameters will flatten the fitted likelihood
surface – increasing the uncertainties on the fitted parameters
"  Also have some measure of the tension in the fit
•  if the data pull the nuisance parameter away from the expected value, could
indicate a problem

Prof. M.A. Thomson Lent 2015 127

That’s All Folks

Prof. M.A. Thomson Lent 2015 128
