Likelihood Ratio PDF
Summary
• Diagnostic accuracy studies address how well a test identifies the target condition of interest.
• Sensitivity, specificity, predictive values and likelihood ratios (LRs) are all different ways of expressing test performance.
• Receiver operating characteristic (ROC) curves plot sensitivity against 1 − specificity across a range of cut-off values for the
ability to predict a dichotomous outcome. Area under the ROC curve is another measure of test performance.
• None of these parameters is intrinsic to the test; all are determined by the clinical context in which the test is
employed.
• High sensitivity corresponds to high negative predictive value and is the ideal property of a “rule-out” test.
• High specificity corresponds to high positive predictive value and is the ideal property of a “rule-in” test.
• LRs convert pre-test probabilities into post-test probabilities of a condition of interest, and there is some evidence that they are more
intelligible to users.
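The last point above can be made concrete with a short calculation. The following is a minimal Python sketch, added for illustration and not part of the original article, of the conversion from probability to odds and back. The function names are illustrative; the inputs (a pre-test probability of 32% and negative LRs of 0.53 and 0.42) are the figures quoted for the general practice BNP example later in this review.

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability (0 <= p < 1) into odds: p / (1 - p)."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds back into a probability: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Post-test odds = pre-test odds x LR, then convert back to probability."""
    pre_test_odds = probability_to_odds(pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return odds_to_probability(post_test_odds)


if __name__ == "__main__":
    # Illustrative inputs taken from the general practice example in this review.
    for lr in (0.53, 0.42):
        p = post_test_probability(0.32, lr)
        print(f"negative LR {lr}: post-test probability {p:.1%}")
```

Carried through without intermediate rounding, the two results come out at about 20% and between 16% and 17%, in line with the 20% and 16% quoted in the worked example.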
                                   Reference Standard
                      Disease present       Disease absent        Total
Index Test positive   True positive (TP)    False positive (FP)   TP + FP
Index Test negative   False negative (FN)   True negative (TN)    TN + FN
Total                 TP + FN               TN + FP
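As an illustration of how the cells of the table above translate into the measures listed in the Summary, the following Python sketch (an addition for illustration, not taken from the original article) computes them from the four counts. The names `TwoByTwo` and `ppv_at_prevalence` and the counts used in the note below are hypothetical; the formulas are the standard definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of index test versus reference standard.

    Assumes every row and column total is non-zero; no guard is included.
    """
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def positive_lr(self) -> float:
        return self.sensitivity / (1.0 - self.specificity)

    @property
    def negative_lr(self) -> float:
        return (1.0 - self.sensitivity) / self.specificity


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value for fixed sensitivity/specificity at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)
```

With hypothetical counts of TP = 90, FP = 30, FN = 10 and TN = 70, the sensitivity is 0.90, the specificity 0.70, the positive LR 3 and the negative LR about 0.14. Holding sensitivity and specificity fixed, `ppv_at_prevalence` falls from 0.75 at a prevalence of 50% to 0.25 at a prevalence of 10%, which is the prevalence dependence of predictive values discussed in this review.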
in the population being studied, even if the sensitivity and specificity remain the same.1 In the examples discussed below, the positive predictive value of B-type natriuretic peptide (BNP) for identifying congestive heart failure (CHF) is lower in a low-prevalence setting, namely patients being screened in general practice, than in newly presenting breathless patients in the emergency department (ED).

This approach usually requires the creation of a cut-off point from continuous data and, depending on the cut-off selected, the sensitivity and specificity of a test will vary. If the cut-off is selected so that the sensitivity increases, the specificity will decrease, as discussed in the example below. ROC curves are a way of graphically displaying true positives versus false positives across a range of cut-offs and of selecting the optimal cut-off for clinical use.1

Ultimately, the value of a test will depend upon its ability to alter a pre-test probability of a target condition into a post-test probability that will influence a clinical management decision. This can be achieved through the application of LRs, as discussed further below.3 Positive LR is the ratio of the proportion of patients who have the target condition and test positive to the proportion of patients without the target condition who also test positive, i.e. sensitivity / (1 − specificity).3 Negative LR is the ratio of the proportion of patients who have the target condition who test negative to the proportion of patients without the target condition who also test negative, i.e. (1 − sensitivity) / specificity.3 A positive LR >10 and a negative LR <0.1 are considered to exert highly significant changes in probability, such as to alter clinical management.3 Application of Fagan’s nomogram is a way of making these changes in probability by graphical means.3

Applied Examples – BNP Studies
BNP is a cardiac peptide secreted from the ventricles in response to volume expansion, and plasma levels have been shown to be elevated in patients with left ventricular (LV) dysfunction and to correlate with the New York Heart Association class and prognosis.4 The Breathing Not Properly (BNP) study is an example of a diagnostic accuracy study in which breathless patients newly presenting to the ED were enrolled.5 Patients whose dyspnoea was clearly not secondary to CHF were excluded.5 During the initial evaluation, BNP was compared with the reference standard of CHF provided by two cardiologists who reviewed all medical records and independently classified the diagnosis without knowledge of the BNP result. In the ROC curve shown (Figure 1), BNP levels are plotted for their ability to predict CHF (a dichotomous
outcome) with true positives on the vertical axis (sensitivity) and false positives (1 − specificity) on the horizontal axis.

Figure 1. ROC curve for various cut-off levels of BNP in differentiating between dyspnoea due to congestive heart failure and dyspnoea due to other causes. Copyright © 2002 Massachusetts Medical Society. All rights reserved.5

At lower BNP cut-offs, e.g. 50 pg/mL (17 pmol/L), there is higher sensitivity, or better ability to identify patients with CHF, although this is compromised by lower specificity (i.e. the test falsely identifies more subjects without CHF).5 The corollary of higher sensitivity, however, is higher negative predictive value; in other words, the test performs better as a “rule-out” test and enables the clinician to consider causes of dyspnoea other than CHF. Conversely, at higher cut-offs a positive result is more likely to reflect CHF than dyspnoea due to other causes; in other words, there is higher specificity and positive predictive value, giving a better “rule-in” test.

The ROC curve graphically displays the trade-off between sensitivity and specificity and is useful in assigning the best cut-offs for clinical use.3 Overall accuracy is sometimes expressed as area under the ROC curve (AUC) and provides a useful parameter for comparing test performance between, for example, different commercial BNP assays and also the related N-terminal pro-BNP assay.6

The diagnostic parameters of a test are not intrinsic properties of the test and are critically dependent upon the clinical context within which they are employed, as illustrated in the following example.

The diagnostic accuracy of BNP was evaluated in a General Practice study of elderly patients (mean age of 74 years) who presented with breathlessness. The diagnosis of LV dysfunction was confirmed by transthoracic echocardiography, and a BNP concentration >17.9 pg/mL was considered abnormal.7

BNP was raised in the 40 patients with LV systolic dysfunction compared with those with normal ventricular systolic function.7 At a BNP concentration >17.9 pg/mL, there was a sensitivity of 88% and specificity of 34% for identification of LV dysfunction.7 The prevalence (or prior probability) of LV dysfunction in this study was 32%, lower than the 47% of patients with CHF in the ED setting.5 The negative LR for a patient without a history of myocardial infarction, with normal chest radiography and electrocardiogram (ECG) is 0.53, yielding a posterior probability of LV dysfunction of 20%.7 This is derived as follows:

LRs are multiplied by the pre-test odds of a condition in order to give the post-test odds. A positive LR gives the post-test odds of the condition being present if the test is positive (relative to the chosen cut-off). Conversely, a negative LR gives the post-test odds of the condition being present if the test is negative (again relative to the chosen cut-off). Doing the calculations long-hand requires converting the probability into odds, multiplying by the LR and then converting the result back into a probability.

Post-test odds = pre-test odds × LR
Odds = prevalence / (1 − prevalence)
Prevalence = odds / (1 + odds)

In the example above, the prior probability of LV dysfunction in this clinical setting was 32%. By applying the equation above, this can be converted to odds.

Odds = 0.32 / (1 − 0.32)
Odds = 0.32 / 0.68 = 0.47 (or odds of approximately 1 to 2)
Post-test odds = 0.47 × 0.53 (LR of a negative test)
Post-test odds = 0.25
Post-test probability = 0.25 / (1 + 0.25)
Post-test probability = 0.2 or 20%

The posterior (or post-test) probability of LV dysfunction is therefore 20% in the presence of normal ECG and chest radiogram and the absence of a prior myocardial infarction.

When a negative BNP (<17.9 pg/mL) is added to the above combination of tests, the negative LR becomes 0.42 (as opposed to 0.53 without BNP). It is an instructive exercise for the reader to follow the above train of calculations starting with the given pre-test probability of 32%. With the addition of BNP, the reader should be able to derive the post-test probability of 16%.7

The point of this exercise is to show that adding a test for BNP to the determination of a patient’s history of myocardial infarction in the diagnostic screening process reduces the posterior probability to 16%, a small incremental advantage over that achieved by a combination of clinical history and traditional investigations likely to be undertaken in any case.7 This leaves a residual chance of roughly 1 in 6 of LV systolic dysfunction, which is unacceptably high and unlikely to deter a General Practitioner from referring the patient for echocardiography.7 Therefore, in the clinical context of General Practice, the prevalence of CHF is lower than that among newly presenting dyspnoeic patients in the ED and the diagnostic performance of BNP is correspondingly lower.

Another way of applying LRs without doing long-hand calculations is to use Fagan’s nomogram, an example of which is shown in Figure 2.8 Prior probability is indicated on the vertical axis on the left of the nomogram and a line is then drawn through the LR value in the middle (note the logarithmic
scale) and extrapolated to the point where it intercepts the vertical axis on the right of the nomogram, which corresponds to post-test probability.

Figure 2. An example of Fagan’s nomogram.8 Prior probability is indicated on the vertical axis on the left of the nomogram and a line can be drawn through the LR value in the middle (note the logarithmic scale) and extrapolated to the point where it intercepts the vertical axis on the right of the nomogram, which corresponds to post-test probability. Source: BMJ, 2004, 329, 168-9. Reproduced with permission from the BMJ Publishing Group.

Likelihood Ratios may be More Intelligible
In conveying the meaning of diagnostic accuracy to clinicians, there is some evidence that LRs expressed in non-technical language are more intelligible to clinicians and enable a more appropriate interpretation of tests.

General practitioners were asked to estimate the probability of endometrial cancer in a 65-year-old woman with abnormal uterine bleeding, with the prevalence of endometrial cancer in all women with abnormal uterine bleeding given as 10%.9 Participants were given the result of a transvaginal ultrasound scan in one of three different ways: “Transvaginal ultrasound showed a pathological result compatible with cancer”; “Transvaginal ultrasound showed a pathological result compatible with cancer. The sensitivity of this test is 80%, its specificity is 60%”; or “Transvaginal ultrasound showed a pathological result compatible with cancer. A positive result is obtained twice as frequently in women with an endometrial cancer than in women without this disease.” The third version was intended to present the positive LR of the second (0.80 / (1 − 0.60) = 2) in non-technical language. The participants who were not given any information on the test’s accuracy seemed to grossly overestimate the probability of endometrial cancer compared with the other two groups. Those clinicians provided with the sensitivity and specificity of the scan had a lower degree of over-estimation of test performance, and those given the LR in plain language gave the most appropriate estimation of test performance.9 Despite a long tradition of reporting diagnostic accuracy in terms of sensitivity and specificity, only a minority of clinicians correctly apply this information. Authors of diagnostic test data have been urged to reconsider the way they communicate their research data, with more emphasis on LRs. With more structured request forms, it may be possible to elicit prior probabilities of a condition and then use the test values (converted to LRs) to derive post-test probabilities.

Pre-test Probability – the Starting Point
An assignment of pre-test probability is the prerequisite to any decision to undertake a diagnostic test or not and presupposes that there is diagnostic uncertainty. Related to this are the
concepts of test and treatment thresholds.1 If the probability of a condition is sufficiently low (below the test threshold), it can be eliminated from the differential diagnosis. Conversely, if the probability is sufficiently high for treatment to be initiated (above the treatment threshold), then testing is not required.1 Where the probability lies between the two thresholds, further diagnostic testing is indicated.1 Where the thresholds are set depends upon the clinical context and clinician preference.1

Quality of Studies and STARD Criteria
Diagnostic accuracy presupposes that the quality of studies is rigorous and that sources of bias are avoided. There are many sources of bias. For example, spectrum bias, where one compares a group of subjects with the condition of interest to a group without the condition, causes the diagnostic accuracy to be overestimated (perhaps as much as three-fold), a commonly observed problem with tumour markers. Lijmer et al. give a good review of sources of bias, an essential foundation for any appraisal of the quality of diagnostic studies,10 and the Standards for Reporting of Diagnostic Accuracy (STARD) criteria give a checklist of points to be fulfilled by investigators.11 Beyond diagnostic accuracy is the consideration of whether the performance of a diagnostic test actually influences a clinical outcome, a higher stratum in the pyramid of evidence-based laboratory medicine and beyond the scope of the present review.

Conclusion
Sensitivity and specificity vary with the cut-off chosen for a diagnostic test and are not intrinsic to the test but critically dependent upon the clinical context. ROC curve analysis enables the best cut-off for clinical purpose to be assigned, higher sensitivity corresponding to high negative predictive value and the ideal property of a “rule-out” test. LRs may be a more intelligible way of conveying the properties of a diagnostic test to clinicians and may merit further adoption into operational practice.

Competing Interests: None declared.

References
1. Matchar DB, Orlando LA. The Relationship Between Test and Outcome. In: Price CP, editor. Evidence-Based Laboratory Medicine; Principles, Practice and Outcomes, Second Edition. Washington DC, USA: AACC Press; 2007. p. 53-66.
2. Bossuyt PMM. Studies for Evaluating Diagnostic and Prognostic Accuracy. In: Price CP, editor. Evidence-Based Laboratory Medicine; Principles, Practice and Outcomes, Second Edition. Washington DC, USA: AACC Press; 2007. p. 67-81.
3. Boyd JC. Statistical Analysis and Presentation of Data. In: Price CP, editor. Evidence-Based Laboratory Medicine; Principles, Practice and Outcomes, Second Edition. Washington DC, USA: AACC Press; 2007. p. 113-40.
4. Maisel A. Circulating natriuretic peptide levels in acute heart failure. Rev Cardiovasc Med 2007;8 (Suppl): S13-21.
5. Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347:161-7.
6. Lainchbury JG, Campbell E, Frampton CM, Yandle TG, Nicholls MG, Richards AM. Brain natriuretic peptide and n-terminal brain natriuretic peptide in the diagnosis of heart failure in patients with acute shortness of breath. J Am Coll Cardiol 2003;42:728-35.
7. Landray MJ, Lehman R, Arnold I. Measuring brain natriuretic peptide in suspected left ventricular systolic dysfunction in general practice: cross-sectional study. BMJ 2000;320:985-6.
8. Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004;329:168-9.
9. Steurer J, Fischer JE, Bachmann LM, Koller M, Riet G. Communicating accuracy of tests to general practitioners: a controlled study. BMJ 2002;324:824-6.
10. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-6.
11. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41-4.