Understanding P-Values and CI 20nov08


Understanding P-values and Confidence Intervals

Thomas B. Newman, MD, MPH

20 Nov 08
Announcements
• Optional reading about P-values and Confidence Intervals on the website
• Exam questions due Monday 11/24/08 5:00 PM
• Next week (11/27) is Thanksgiving
• Following week: Physicians and Probability (Chapter 12) and Course Review
• Final exam to be distributed in SECTION 12/4 and posted on web
• Exam due 12/11 8:45 AM
• Key will be posted shortly thereafter
Overview
• Introduction and justification
• What P-values and Confidence Intervals don't mean
• What they do mean: analogy between diagnostic tests and clinical research
• Useful confidence interval tips
– CI for "negative" studies; absolute vs. relative risk
– Confidence intervals for small numerators
Why cover this material here?
• P-values and confidence intervals are ubiquitous in clinical research
• Widely misunderstood and mistaught
• Pedagogical argument:
– Is it important?
– Can you handle it?
Example: Douglas Altman Definition of 95% Confidence Intervals*
• “A strictly correct definition of a 95% CI is, somewhat opaquely, that 95% of such intervals will contain the true population value.”
• “Little is lost by the less pure interpretation of the CI as the range of values within which we can be 95% sure that the population value lies.”

*Quoted in: Guyatt G, Rennie D, et al. (2002). Users' Guides to the Medical Literature: Essentials of Evidence-Based Clinical Practice. Chicago, IL: AMA Press.
Understanding P-values and confidence intervals is important because
• It explains things that otherwise do not make sense, e.g., the need to state hypotheses in advance and to correct for multiple hypothesis testing
• You will be using them all the time
• You are future leaders in clinical research
You can handle it because
• We have already covered the important concepts at length earlier in this course
– Prior probability
– Posterior probability
– What you thought before + new information = what you think now
• We will support you through the process
Review of traditional statistical significance testing
• State the null (H0) and alternative (Ha) hypotheses
• Choose α
• Calculate the value of the test statistic from your data
• Calculate the P-value from the test statistic
• If P-value < α, reject H0
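As a sketch of these five steps, the Python below applies them to the bacteremia counts from Example 1 later in this lecture (19/507 with amoxicillin vs 8/448 with placebo). Using a Pearson chi-square test without continuity correction is our assumption; the slides do not say which test produced P = 0.07. The function names are hypothetical helpers, not from any library.

```python
import math
from statistics import NormalDist


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    row 1 (outcome present): a = group 1, b = group 2
    row 2 (outcome absent):  c = group 1, d = group 2
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    # With 1 degree of freedom, chi2 = z**2, so P(chi2_1 > x) = 2 * (1 - Phi(sqrt(x))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


# Step 1: H0: bacteremia risk is the same with amox and placebo; Ha: it differs.
# Step 2: choose alpha.
alpha = 0.05
# Step 3: test statistic from the data (19/507 amox vs 8/448 placebo).
chi2 = pearson_chi2_2x2(19, 8, 507 - 19, 448 - 8)
# Step 4: P-value from the test statistic.
p = p_value_1df(chi2)
# Step 5: reject H0 only if P < alpha.
reject_h0 = p < alpha
print(f"chi2 = {chi2:.2f}, P = {p:.3f}, reject H0: {reject_h0}")
```

Here P comes out at about 0.07, so with α = 0.05 the procedure does not reject H0.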
Problem:
• Traditional statistical significance testing has led to widespread misinterpretation of P-values
What P-values don't mean
• If the P-value is 0.05, there is a 95% probability that…
– The results did not occur by chance
– The null hypothesis is false
– There really is a difference between the groups
So if P = 0.05, what IS there a 95% probability of?
White board:
• 2x2 tables and “false positive confusion”
• Analogy with diagnostic tests
• (This is covered step-by-step in the course book.)
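Analogy between diagnostic tests and research studies

Diagnostic test | Research study
Absence of disease | Null hypothesis is true
Presence of disease | Alternative hypothesis is true
Severity of disease in the diseased group | Size of the true difference (effect size)
Cutoff for distinguishing positive and negative results | α (significance level)
Test result | P-value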
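Analogy between diagnostic tests and research studies
Diagnostic test | Research study
Negative result (test within normal limits) | Not statistically significant (P ≥ α)
Positive result | Statistically significant (P < α)
Sensitivity | Power
False positive rate (1 − specificity) | α
Prior probability of disease (of a given severity) | Prior probability of the alternative hypothesis (for an effect of a given size)
Posterior probability of disease, given test result | Posterior probability of the alternative hypothesis, given the study result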
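One way to make the analogy concrete is Bayes' theorem, with power playing the role of sensitivity and α the role of the false positive rate. The sketch below uses a hypothetical function name, and the priors, power of 0.8, and α = 0.05 are illustrative assumptions, not data from any study. It computes the probability that the alternative hypothesis is true given a statistically significant result (P < α); note that it conditions on "P < α" rather than on an exact P-value of 0.05.

```python
def prob_ha_given_significant(prior_ha, power, alpha):
    """Posterior probability that Ha is true given P < alpha (Bayes' theorem),
    treating power as 'sensitivity' and alpha as the 'false positive rate'."""
    true_pos = power * prior_ha          # Ha true and the study is significant
    false_pos = alpha * (1 - prior_ha)   # H0 true and the study is significant
    return true_pos / (true_pos + false_pos)


# Hypothetical scenarios: illustrative numbers only
for prior in (0.5, 0.1, 0.01):
    post = prob_ha_given_significant(prior, power=0.8, alpha=0.05)
    print(f"prior P(Ha) = {prior:.2f} -> P(Ha | P < 0.05) = {post:.2f}")
```

Under these assumptions a significant result raises P(Ha) from 0.5 to about 0.94, from 0.1 to 0.64, and from 0.01 to only about 0.14, so "P < 0.05" by itself does not mean a 95% probability that the null hypothesis is false.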
Extending the Analogy
• Intentionally ordered tests and hypotheses stated in advance
• Multiple tests and multiple hypotheses
• Laboratory error and bias
• Alternative diagnoses and confounding
Bonferroni
• Inequality: If we do k different tests, each with significance level α, the probability that one or more will be significant is less than or equal to k × α
• Correction: If we test k different hypotheses and want our total Type 1 error rate to be no more than α, then we should reject H0 only if P < α/k
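A quick numeric check of the inequality and the correction, assuming k independent tests with every null hypothesis true (an assumption; with dependent tests the exact error rate differs, but the k × α bound still holds). `fwer_independent` is a hypothetical helper.

```python
def fwer_independent(k, alpha):
    """Probability of at least one false rejection among k independent
    tests when every null hypothesis is true: 1 - (1 - alpha)**k."""
    return 1 - (1 - alpha) ** k


alpha = 0.05
for k in (1, 2, 5, 10, 20):
    uncorrected = fwer_independent(k, alpha)
    bound = min(1.0, k * alpha)                 # Bonferroni inequality
    corrected = fwer_independent(k, alpha / k)  # reject only if P < alpha/k
    print(f"k={k:2d}  FWER={uncorrected:.3f}  bound k*alpha={bound:.2f}  "
          f"FWER with alpha/k={corrected:.4f}")
```

For k = 20 the chance of at least one false positive is about 0.64 without correction and just under 0.05 after testing each hypothesis at α/k.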
Derivation
• Let A and B be the events of a Type 1 error (falsely rejecting H0) for hypotheses A and B
• P(A or B) = P(A) + P(B) − P(A & B)
• Under both null hypotheses, P(A) = P(B) = α
• So P(A or B) = α + α − P(A & B) = 2α − P(A & B)
• Of course, it is possible to falsely reject both null hypotheses, so P(A & B) > 0. Therefore, the probability of falsely rejecting either of the null hypotheses must be less than 2α.
• Note that A and B are often not independent; when they are positively correlated, P(A & B) is even larger, so Bonferroni will be even more conservative
Problems with Bonferroni correction
• Overly conservative (especially when hypotheses are not independent)
• Maintains specificity at the expense of sensitivity
• Does not take prior probability into account
• Not clear when to use it
• BUT can be useful if results remain significant after correction
CONFIDENCE INTERVALS
What Confidence Intervals don't mean
• There is a 95% chance that the true value is within the interval
• If you conclude that the true value is within the interval, you have a 95% chance of being right
• The range of values within which we can be 95% sure that the population value lies
One source of confusion: Statistical “confidence”
• (Some) statisticians say: “You can be 95% confident that the population value is in the interval.”
• This is NOT the same as “There is a 95% probability that the population value is in the interval.”
• “Confidence” is defined tautologically by statisticians as what you get from a confidence interval
Illustration
• If a 95% CI has a 95% chance of containing the true value, then a 90% CI should have a 90% chance and a 40% CI should have a 40% chance.
• Study: 4 deaths in 10 subjects in each group
• RR = 1.0 (95% CI: 0.34 to 2.9)
• 40% CI: 0.75 to 1.33
• Should we conclude from this study that there is a 60% chance that the true RR is < 0.75 or > 1.33?
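The intervals on this slide can be reproduced with the usual log-scale (Katz) method for a risk ratio; that this is the method behind the slide's numbers is our assumption, though it matches them. Changing the confidence level only changes the normal multiplier z, which is why a 40% CI is as easy to compute as a 95% one. `rr_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def rr_ci(a, n1, c, n0, level=0.95):
    """Risk ratio (a/n1)/(c/n0) with a log-scale (Katz) confidence interval."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)   # SE of ln(RR)
    z = NormalDist().inv_cdf(0.5 + level / 2)          # about 1.96 for 95%
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# 4 deaths in 10 subjects in each group
for level in (0.95, 0.90, 0.40):
    rr, lo, hi = rr_ci(4, 10, 4, 10, level)
    print(f"{level:.0%} CI: RR = {rr:.2f} ({lo:.2f} to {hi:.2f})")
```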
Confidence Intervals apply to a Process
• Consider a bag with 19 white grapefruit and 1 pink grapefruit
• The process of selecting a grapefruit at random has a 95% probability of yielding a white one
• But once I've selected one, does it still have a 95% chance of being white?
• You may have prior knowledge that changes the probability (e.g., pink grapefruit have thinner peel, are denser, etc.)
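The "process" view can be checked by simulation. The sketch below (hypothetical function; it assumes normally distributed data with known σ, so the interval is exactly ±z·σ/√n around the sample mean) repeats a study many times and counts how often the interval covers the true mean. The long-run fraction is close to the nominal level, but each individual interval either contains the true value or does not.

```python
import random
from statistics import NormalDist


def coverage(true_mean=10.0, sigma=2.0, n=25, reps=4000, level=0.95, seed=1):
    """Fraction of simulated studies whose CI for a normal mean (sigma known,
    a simplifying assumption) contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps


print(coverage())             # long-run coverage of 95% intervals
print(coverage(level=0.40))   # long-run coverage of 40% intervals
```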
Confidence Intervals for negative studies: 5 levels of sophistication
• Example 1: Oral amoxicillin to treat possible occult bacteremia in febrile children*
– Randomized, double-blind trial
– 3–36-month-old children with T ≥ 39 °C (N = 955)
– Treatment: amox 125 mg tid (≤ 10 kg) or 250 mg tid (> 10 kg)
– Outcome: major infectious morbidity

*Jaffe et al., N Engl J Med 1987;317:1175-80


Amoxicillin for possible occult bacteremia 2: Results
• Bacteremia in 19/507 (3.7%) with amox vs 8/448 (1.8%) with placebo (P = 0.07)
• “Major Infectious Morbidity” among bacteremic children: 2/19 (10.5%) with amox vs 1/8 (12.5%) with placebo (P = 0.9)
• Conclusion: “Data do not support routine use of standard doses of amoxicillin…”
5 levels of sophistication
• Level 1: P > 0.05 means the treatment does not work
• Level 2: Look at the power of the study. (Authors reported power = 0.24 for OR = 4. Therefore, the study was underpowered and the negative study is uninformative.)
5 levels of sophistication, cont'd
• Level 3: Look at the 95% CI!
• Authors calculated OR = 1.2 (95% CI: 0.02 to 30.4)
– This is based on 1/8 (12.5%) with placebo vs 2/19 (10.5%) with amox
– (They put placebo on top)
– (Silly to use OR)
• With amox on top, RR = 0.84 (95% CI: 0.09 to 8.0)
• This was TBN's level of sophistication in his letter to the editor (1987)
5 levels of sophistication, cont'd
• Level 4: Make sure you do an “intention to treat” analysis!
– It is not OK to restrict attention to bacteremic patients
– So it should be 2/507 (0.39%) with amox vs 1/448 (0.22%) with placebo
– RR = 1.8 (95% CI: 0.16 to 19.4)
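The Level 3 and Level 4 relative risks can be checked with a log-scale (Katz) RR interval; that this is the method used is our assumption, but it matches the Stata `csi` output shown later to about three significant figures. The sketch also shows that putting placebo on top simply inverts the RR and swaps and inverts its limits: 1.77 (0.16 to 19.4) becomes 0.57 (0.05 to 6.2). `rr_ci95` is a hypothetical helper.

```python
import math


def rr_ci95(a, n1, c, n0):
    """Risk ratio (a/n1)/(c/n0) with a 95% log-scale (Katz) CI."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)   # SE of ln(RR)
    return rr, rr * math.exp(-1.96 * se), rr * math.exp(1.96 * se)


# Level 3: bacteremic children only, amox on top
print("Level 3, amox/placebo: RR = %.2f (%.2f to %.1f)" % rr_ci95(2, 19, 1, 8))
# Level 4: intention to treat, amox on top vs placebo on top
print("Level 4, amox/placebo: RR = %.2f (%.2f to %.1f)" % rr_ci95(2, 507, 1, 448))
print("Level 4, placebo/amox: RR = %.2f (%.2f to %.1f)" % rr_ci95(1, 448, 2, 507))
```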
Level 5: the clinically relevant quantity is the Absolute Risk Reduction (ARR)!
• 2/507 (0.39%) with amox vs 1/448 (0.22%) with placebo
• ARR = −0.17% {amoxicillin worse}
• 95% CI: −0.9% {harm} to +0.5% {benefit}
• Therefore, the LOWER limit of the 95% CI for the NNT for benefit (i.e., the best case) is 1/0.5% = 200
• So this study suggests that we would need to treat ≥ 200 children to prevent “Major Infectious Morbidity” in one
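The ARR interval and the NNT bound can be sketched with a Wald (normal-approximation) interval for a difference in risks. That this is what Stata's `csi` uses is an inference from the matching numbers in the output below, not something the slides state. `risk_difference_ci95` is a hypothetical helper.

```python
import math


def risk_difference_ci95(a, n1, c, n0):
    """Risk difference a/n1 - c/n0 with a 95% Wald (normal-approximation) CI."""
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, rd - 1.96 * se, rd + 1.96 * se


# ARR = risk with placebo minus risk with amox (positive = benefit of amox)
arr, arr_lo, arr_hi = risk_difference_ci95(1, 448, 2, 507)
print(f"ARR = {arr:.2%} (95% CI {arr_lo:.2%} to {arr_hi:.2%})")
# Best case: the upper CI limit for benefit gives the smallest plausible NNT
print(f"Smallest NNT consistent with the data: {1 / arr_hi:.0f}")
```

The unrounded upper limit for benefit is about 0.53%, which gives a best-case NNT of about 190; the slide's 200 comes from rounding that limit to 0.5%.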
Stata output
. csi 2 1 505 447

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         2           1  |          3
        Noncases |       505         447  |        952
-----------------+------------------------+------------
           Total |       507         448  |        955
                 |                        |
            Risk |  .0039448    .0022321  |   .0031414
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0017126       |   -.005278    .0087032
      Risk ratio |         1.767258       |   .1607894    19.42418
 Attr. frac. ex. |         .4341518       |  -5.219315    .9485178
 Attr. frac. pop |         .2894345       |
                 +-------------------------------------------------
                               chi2(1) =     0.22  Pr>chi2 = 0.6369
Example 2: Pyelonephritis and new renal scarring in the International Reflux Study in Children*
• RCT of ureteral reimplantation vs prophylactic antibiotics for children with vesicoureteral reflux
• Overall result: the surgery group had fewer episodes of pyelonephritis (8% vs 22%; NNT = 7; P < 0.05) but more new scarring (31% vs 22%; P = 0.4)
• This raises questions about whether new scarring is caused by pyelonephritis
*Weiss et al. J Urol 1992;148:1667-73
Within groups, no association between new pyelo and new scarring
• Trend goes in the OPPOSITE direction

               New scarring   No new scarring     N    % with new scarring
New pyelo            2               18          20           10%
No new pyelo        28               58          86           33%
Total               30               76         106

RR = 0.28; 95% CI (0.09–1.32)

Weiss, J Urol 1992;148:1672
Stata output to get 95% CI:
. csi 2 18 28 58

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         2          18  |         20
        Noncases |        28          58  |         86
-----------------+------------------------+------------
           Total |        30          76  |        106
                 |                        |
            Risk |  .0666667    .2368421  |   .1886792
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.1701754       |  -.3009557   -.0393952
      Risk ratio |         .2814815       |    .069523     1.13965
 Prev. frac. ex. |         .7185185       |  -.1396499     .930477
 Prev. frac. pop |         .2033543       |
                 +-------------------------------------------------
                               chi2(1) =     4.07  Pr>chi2 = 0.0437
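Stata's `csi #a #b #c #d` expects exposed cases, unexposed cases, exposed noncases, and unexposed noncases, in that order. The command above therefore treats the two scarring columns as the exposure groups (its Risk row is 2/30 and 18/76), whereas the table's rows treat new pyelonephritis as the exposure (2/20 vs 28/86, which would correspond to `csi 2 28 18 58`). The sketch below (hypothetical `rr_ci95` helper, log-scale CI assumed) computes both orientations. In both, the 95% CI for the RR includes 1, while the Pearson chi-square, and hence P = 0.044, is the same either way.

```python
import math


def rr_ci95(a, n1, c, n0):
    """Risk ratio (a/n1)/(c/n0) with a 95% log-scale (Katz) CI."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)   # SE of ln(RR)
    return rr, rr * math.exp(-1.96 * se), rr * math.exp(1.96 * se)


# As entered in "csi 2 18 28 58": 2/30 vs 18/76
print("csi 2 18 28 58: RR = %.2f (%.2f to %.2f)" % rr_ci95(2, 30, 18, 76))
# New pyelo as exposure, new scarring as outcome: 2/20 vs 28/86
print("csi 2 28 18 58: RR = %.2f (%.2f to %.2f)" % rr_ci95(2, 20, 28, 86))
```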
Conclusions
• No evidence that new pyelonephritis causes scarring
• Some evidence that it does not
• P-values and confidence intervals are approximate, especially for small sample sizes
• There is nothing magical about 0.05
• Key concept: calculate 95% CI for negative studies
– ARR for clinical questions (less generalizable)
– RR for etiologic questions
Confidence intervals for small numerators
Observed numerator | Approximate numerator for upper limit of 95% CI
0 | 3
1 | 5
2 | 7
3 | 9
4 | 10
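The table's approximate numerators are close to exact Poisson upper confidence limits. The sketch below (hypothetical helper names; it assumes the count is Poisson, a reasonable approximation for a small numerator with a large denominator) finds the two-sided 95% exact upper limit by bisection. Dividing the limit by the denominator gives an approximate upper limit for the proportion.

```python
import math


def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu)."""
    term = total = math.exp(-mu)
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return total


def exact_upper_limit(x, tail=0.025):
    """Exact upper confidence limit for a Poisson count x: the mean mu at
    which P(X <= x | mu) equals `tail`, found by bisection."""
    lo, hi = 0.0, 10.0 * (x + 5)
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(x, mid) > tail:
            lo = mid   # mu too small: observing <= x events is still too likely
        else:
            hi = mid
    return (lo + hi) / 2


for x in range(5):
    print(x, round(exact_upper_limit(x), 2))
```

The two-sided limits are about 3.7, 5.6, 7.2, 8.8, and 10.2 for 0 through 4 events. The table's 3 for a zero numerator corresponds to the one-sided 95% limit, −ln(0.05) ≈ 3.0 (the "rule of three"): for example, 0 events in 100 patients gives an upper limit of roughly 3/100 = 3%.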
When P-values and Confidence Intervals Disagree
• Usually P < 0.05 means the 95% CI excludes the null value.
• But both 95% CIs and P-values are based on approximations, so this may not be the case
• Illustrated by the IRSC slide above (P = 0.044, but the 95% CI for the RR, 0.07 to 1.14, includes 1)
• If you want 95% CIs and P-values to agree, use “test-based” confidence intervals (see next slide)
Alternative Stata output: Test-based CI
. csi 2 18 28 58, tb

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         2          18  |         20
        Noncases |        28          58  |         86
-----------------+------------------------+------------
           Total |        30          76  |        106
                 |                        |
            Risk |  .0666667    .2368421  |   .1886792
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |        -.1701754       |  -.3363063   -.0040446  (tb)
      Risk ratio |         .2814815       |   .0816554    .9703199  (tb)
 Prev. frac. ex. |         .7185185       |   .0296801    .9183446  (tb)
 Prev. frac. pop |         .2033543       |
                 +-------------------------------------------------
                               chi2(1) =     4.07  Pr>chi2 = 0.0437
