Bland-Altman Analysis


Turkish Journal of Emergency Medicine 18 (2018) 139–141


journal homepage: www.elsevier.com/locate/tjem

Review Article

Bland-Altman analysis: A paradigm to understand correlation and agreement
Nurettin Özgür Doğan
Kocaeli University, Faculty of Medicine, Dept. of Emergency Medicine, Kocaeli, Turkey

ARTICLE INFO

Keywords:
Bland-Altman analysis
Limits of agreement
Correlation analysis
Biostatistics

ABSTRACT

The rapid increase in the number of new laboratory methods has led to the necessity of reliable verification methods. Validation of a new measurement method for application to medical practice requires comparison with gold standard techniques. The Bland-Altman analysis is a frequently applied technique in studies that investigate the agreement between two methods of the same medical measurement. In this review, potential areas of usage of Bland-Altman analysis are elaborated from a clinical viewpoint, and possible pitfalls in study designs are discussed from a statistical perspective.

1. Introduction

The Bland-Altman analysis was proposed by Martin Bland and Douglas Altman over thirty years ago in an article published in the Lancet.1 Their main argument concerned the incorrect use of correlation coefficients when comparing a new measurement technique with an established gold standard. This article, which addressed the differences between measurements obtained by two different measurement systems, is regarded as the sixth most-cited paper in the statistics literature.2 In the following years, their method has become the most appropriate way of determining the limits of agreement (LOA) between measurements.

Medical laboratories and clinicians often need to assess the agreement between two measurement methods. Validation of a clinical measurement method is a demanding and lengthy process, which requires acceptable LOA between two techniques. When the methods being compared yield continuous variables (e.g. leucocyte count, antibody titer, body temperature), the Bland-Altman analysis is an appropriate way to perform this comparison and provides quantified measures to decide whether the new method is acceptable or not. This review focuses on the current approach to the Bland-Altman method and its applications in clinical practice.

2. Concept of correlation analysis

For many years, correlation analysis has been used to assess the relationship between one variable and another. Correlation analysis is classified as part of a larger class of statistical techniques known as regression. Regression analysis uses the principles of correlation, but it does more than just describe the strength of a relationship between two variables.3 The main result of correlation analysis is the correlation coefficient (r), which ranges from −1.0 to +1.0. The closer the coefficient is to the ends of this range, the greater the strength of the linear relationship.4 Correlation coefficients measure the linear relationship between variables, not their agreement.

A fictitious data set is provided in Table 1. In this data set, potassium measurements from venous blood gas analysis and the biochemistry panel are presented for each patient. It is easy to inspect these values by eye and conclude that they are very close to each other. In addition, a Spearman correlation analysis yields a correlation coefficient (Spearman's rho) of 0.885 (p < 0.001), which indicates a very strong relationship between the variables.5

Does this mean that we can use one variable instead of the other? Can we replace a laboratory method with the new one on the basis of this strong relationship? This argument is not always correct. Unfortunately, correlation analysis can link variables which just happen to occur together, without a real association between them. In this setting, Spearman's rho indicates only the strength of this relationship, and the small p-value merely suggests strong evidence against the null hypothesis. Consequently, the null hypothesis is rejected and there is probably a relationship. However, the results of the correlation analysis do not answer the following questions: [a] Is this occurrence an incidental finding, or do the variables have a meaningful clinical association? [b] What is the probability of error in each measurement of potassium? A high correlation does not explicitly imply that there is

Peer review under responsibility of The Emergency Medicine Association of Turkey.


E-mail addresses: @DrOzgurDogan, nurettinozgurdogan@gmail.com.

https://doi.org/10.1016/j.tjem.2018.09.001
Received 1 September 2018; Received in revised form 10 September 2018; Accepted 11 September 2018
Available online 17 September 2018
2452-2473/© 2018 Emergency Medicine Association of Turkey. Production and hosting by Elsevier B.V. on behalf of the Owner. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).

Table 1
Dataset for potassium levels in venous blood gases and blood electrolyte work-up. All values are in mEq/L.

Patient Nr.   Venous blood gas   Blood electrolyte   Mean    Difference
              analysis           levels
1             4.5                4.7                 4.6      0.2
2             3.8                4.2                 4.0      0.4
3             5.1                5.1                 5.1      0.0
4             4.9                5.3                 5.1      0.4
5             3.9                4.0                 3.95     0.1
6             4.0                3.8                 3.9     −0.2
7             4.1                4.0                 4.05    −0.1
8             4.3                4.0                 4.15    −0.3
9             5.3                5.3                 5.3      0.0
10            5.2                5.1                 5.15    −0.1
11            3.9                4.0                 3.95     0.1
12            4.1                4.4                 4.25     0.3
13            4.0                4.2                 4.1      0.2
14            5.3                5.1                 5.2     −0.2
15            5.5                5.3                 5.4     −0.2
16            4.4                4.2                 4.3     −0.2
17            4.9                5.0                 4.95     0.1
18            3.7                3.9                 3.8      0.2
19            3.9                3.7                 3.8     −0.2
20            4.8                4.7                 4.75    −0.1
21            5.5                5.2                 5.35    −0.3
22            3.7                3.8                 3.75     0.1
23            3.7                3.9                 3.8      0.2
24            4.8                4.2                 4.5     −0.6
25            5.1                5.6                 5.35     0.5
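
The correlation coefficient quoted for Table 1 can be checked with a short script. The following is a minimal sketch in plain Python, not the procedure the author used: the article does not say which software or tie-handling rule produced the reported value, so the sketch assumes the common convention of average ranks for ties followed by a Pearson correlation on the ranks. The names `vbg`, `lab`, `average_ranks`, `pearson` and `spearman_rho` are illustrative only.

```python
# Table 1 potassium values (mEq/L), patients 1-25, copied from the article.
vbg = [4.5, 3.8, 5.1, 4.9, 3.9, 4.0, 4.1, 4.3, 5.3, 5.2, 3.9, 4.1, 4.0,
       5.3, 5.5, 4.4, 4.9, 3.7, 3.9, 4.8, 5.5, 3.7, 3.7, 4.8, 5.1]
lab = [4.7, 4.2, 5.1, 5.3, 4.0, 3.8, 4.0, 4.0, 5.3, 5.1, 4.0, 4.4, 4.2,
       5.1, 5.3, 4.2, 5.0, 3.9, 3.7, 4.7, 5.2, 3.8, 3.9, 4.2, 5.6]


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


rho = spearman_rho(vbg, lab)
print(f"Spearman's rho = {rho:.3f}")
```

Under these assumptions the result should come out close to the reported 0.885. The script says nothing about whether the two methods agree; it only measures how consistently they rank the patients.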

good agreement between the two methods.4 Moreover, data which seem to be in poor agreement can produce quite high correlations.

3. Analysis of the differences between variables

Bland and Altman quantified the difference between measurements using a graphical method. They drew a scatterplot in which the X-axis represented the average [(K1 + K2)/2] and the Y-axis represented the difference (K1 – K2) of two measurements. After the graph is drawn, the mean bias (mean of K1 – K2) and its confidence limits (limits of agreement) should be quantified. Using statistical software, a one-sample t-test can be performed to calculate the mean bias and its SD. To represent the mean bias and limits of agreement, we need only the mean of the differences between the measurement methods and its standard deviation, both obtained from the one-sample t-test. Secondly, the data points can be bounded using ±2 standard deviations (SD) to demonstrate a 95% confidence interval (CI; precisely defined: mean ± 1.96 standard deviations) of the distributed data. An ideal agreement is zero difference between measurements. Thus the average difference and its limits would also be found near zero in this setting.

For our data set, the mean difference (mean bias) was found to be 0.012 with an SD of 0.260. A scatterplot should be drawn to understand the dispersion of variables using the X-axis (average) and Y-axis (difference). The LOA can be drawn manually if the statistical software does not display them automatically. In our data set, the upper limit can be calculated using mean + 1.96 × SD (0.012 + 1.96 × 0.260 = 0.522) and the lower limit using mean − 1.96 × SD (0.012 − 1.96 × 0.260 = −0.498). An appropriate statement for the manuscript would be the following: The Bland-Altman plot showed the mean bias ± SD between first and second potassium levels as 0.012 ± 0.260 mEq/L, and the limits of agreement were −0.498 and 0.522 (Fig. 1).

The scatterplot can be evaluated according to the dispersion of the points. In a good agreement, the scattering of points is diminished, and points lie relatively close to the line which represents the mean bias. As quantifiable measures, the mean bias and limits of agreement give information about the utility of the new measurement method. Regarding our data set, the two methods can be used interchangeably, as the limits of agreement span about one mEq/L of potassium.

4. Clinical implication and potential areas of usage

Only a clinician who uses the test results in a clinical setting can decide whether the mean bias and LOA are acceptable or not. For instance, a mean bias of 0.2 mEq/L is obviously acceptable for potassium levels. However, 3 mEq/L is too broad and can lead to lethal complications if the actual potassium value is higher in the biochemistry panel.

Bland-Altman analysis has been used in many method comparisons in the literature. It may be used to compare two new measurement methods or one measurement method against a reference standard. The measurement variables should be continuous (not categorical), such as hemoglobin level (g/dl), anti-HCV antibody titer or the size of a tumor (cm). The Bland-Altman method is a popular approach, and published reports include, but are not limited to, comparisons of two hemodynamic measurements,6 end-tidal carbon dioxide measurement methods,7,8 different electrolyte level measurement methods,9 self-assessed general well-being scores,10 and the performance of different computed tomography technologies in evaluating pulmonary nodules.11

5. Pitfalls in Bland-Altman analysis

One of the critical problems in the Bland-Altman analysis is the need to meet the assumption of normal distribution. The continuous measurement variables need not be normally distributed, but their differences should be. If the assumption of normal distribution is not met, data may be logarithmically transformed.4 The data may be tested against the normal distribution using classical methods such as the Shapiro-Wilk test or Kolmogorov-Smirnov test. Visual evaluation of the histogram plot may not be adequate.

Another problem arises from the sample size. Studies comparing methods of measurement should be adequately sized to conclude that the effects are universally valid. If the sample size is not adequate, it is possible to find a low mean bias and reduced limits of agreement by comparing two methods.12 Such methods cannot be recommended for general use without verification of the results by other studies. To calculate the sample size, the maximum allowed difference derived from other studies should be provided.

Some authors argue that regression analysis can also be performed


Fig. 1. Agreement between two potassium measurements (Bland-Altman plot).
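
The quantities behind Fig. 1 can be reproduced from Table 1 with a few lines of code. The sketch below is an illustration under stated assumptions rather than the author's own workflow: it uses only the Python standard library, takes the difference as electrolyte work-up minus blood gas (the sign convention that matches the "Difference" column of Table 1), and uses the sample standard deviation. The names `vbg`, `lab` and `bland_altman` are hypothetical.

```python
from statistics import mean, stdev

# Table 1 potassium values (mEq/L), patients 1-25, copied from the article.
vbg = [4.5, 3.8, 5.1, 4.9, 3.9, 4.0, 4.1, 4.3, 5.3, 5.2, 3.9, 4.1, 4.0,
       5.3, 5.5, 4.4, 4.9, 3.7, 3.9, 4.8, 5.5, 3.7, 3.7, 4.8, 5.1]
lab = [4.7, 4.2, 5.1, 5.3, 4.0, 3.8, 4.0, 4.0, 5.3, 5.1, 4.0, 4.4, 4.2,
       5.1, 5.3, 4.2, 5.0, 3.9, 3.7, 4.7, 5.2, 3.8, 3.9, 4.2, 5.6]


def bland_altman(first, second, z=1.96):
    """Return (diffs, bias, sd, lower_loa, upper_loa) for paired data.

    Differences are second - first, which matches the sign of the
    "Difference" column in Table 1 (electrolyte work-up minus blood gas).
    """
    diffs = [b - a for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD, n - 1 in the denominator
    return diffs, bias, sd, bias - z * sd, bias + z * sd


diffs, bias, sd, lower, upper = bland_altman(vbg, lab)
averages = [(a + b) / 2 for a, b in zip(vbg, lab)]  # X-axis values of the plot

# Patients whose difference falls outside the limits of agreement.
outside = [i + 1 for i, d in enumerate(diffs) if not lower <= d <= upper]

print(f"mean bias = {bias:.3f}, SD = {sd:.3f}")
print(f"limits of agreement = {lower:.3f} to {upper:.3f}")
print("patients outside the limits:", outside)
```

With the Table 1 values this prints a mean bias of 0.012, an SD of 0.260 and limits of agreement of -0.498 to 0.522, in line with the figures quoted in Section 3. Only patient 24 (difference −0.6) falls outside the limits. Plotting `averages` against `diffs` with horizontal lines at `bias`, `lower` and `upper` would give a plot of the kind shown in Fig. 1.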

to compare two methods of measurement. The Bland-Altman analysis may exhibit proportional bias, which is present when the difference in values resulting from two methods increases or decreases in proportion to the average values.13 Although it is an uncertain area of expertise, Ludbrook indicated that the two methods could be used for different purposes. According to him, regression analysis can be used if the concern of the investigator is to calibrate one measurement against another or to detect bias between two methods of measurement. However, if the goal is to determine whether a method may be safely substituted for another, particularly in clinical practice, the Bland-Altman method may be used.13

Another problem in the Bland-Altman analysis is repeated-measures designs. The standard Bland-Altman analysis is not an appropriate method to compare repeated measurements. However, it can be performed by adding a random effects model to the analysis.14,15 In addition, some statistical software packages can perform the Bland-Altman analysis for repeated-measures designs. Finally, meta-analysis of studies conducted with the Bland-Altman analysis is still under debate; a framework for the meta-analysis of Bland-Altman studies based on a limits of agreement approach was recently published.16

6. Conclusion

Correlation analysis may lead to incorrect or debated results in the comparison of two measurement methods. The Bland-Altman analysis is a simple and accurate way to quantify agreement between two variables and may help clinicians to compare a new measurement method against another one or a reference standard.

Conflict of interest

The author declares no conflicts of interest.

Source of funding

None declared.

Author contributions

NOD designed and wrote the manuscript; he also takes responsibility for the paper as a whole.

References

1. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310.
2. Ryan TP, Woodall WH. The most-cited statistical papers. J Appl Stat. 2005;32:461–474.
3. Greenfield ML, Kuhn JE, Wojtys EM. A statistics primer. Correlation and regression analysis. Am J Sports Med. 1998;26:338–343.
4. Giavarina D. Understanding Bland Altman analysis. Biochem Med. 2015;25:141–151.
5. Akoglu H. User's guide to correlation coefficients. Turk J Emerg Med. 2018;18:91–93.
6. Brazdzionyte J, Macas A. Bland-Altman analysis as an alternative approach for statistical evaluation of agreement between two methods for measuring hemodynamics during acute myocardial infarction. Medicina (Kaunas). 2007;43:208–214.
7. Pekdemir M, Cinar O, Yilmaz S, Yaka E, Yuksel M. Disparity between mainstream and sidestream end-tidal carbon dioxide values and arterial carbon dioxide levels. Respir Care. 2013;58:1152–1156.
8. Doğan NÖ, Şener A, Günaydın GP, et al. The accuracy of mainstream end-tidal carbon dioxide levels to predict the severity of chronic obstructive pulmonary disease exacerbations presented to the ED. Am J Emerg Med. 2014;32:408–411.
9. Altunok İ, Aksel G, Eroğlu SE. Correlation between sodium, potassium, hemoglobin, hematocrit, and glucose values as measured by a laboratory autoanalyzer and a blood gas analyzer. Am J Emerg Med. 2018 Aug 18. https://doi.org/10.1016/j.ajem.2018.08.045 [In Press].
10. Hofman CS, Melis RJ, Donders AR. Adapted Bland-Altman method was used to compare measurement methods with unequal observations per case. J Clin Epidemiol. 2015;68:939–943.
11. Paks M, Leong P, Einsiedel P, Irving LB, Steinfort DP, Pascoe DM. Ultralow dose CT for follow-up of solid pulmonary nodules: a pilot single-center study using Bland-Altman analysis. Medicine (Baltim). 2018;97(34):e12019.
12. Bunce C. Correlation, agreement, and Bland-Altman analysis: statistical analysis of method comparison studies. Am J Ophthalmol. 2009;148:4–6.
13. Ludbrook J. Confidence in Altman-Bland plots: a critical review of the method of differences. Clin Exp Pharmacol Physiol. 2010;37:143–149.
14. Myles PS, Cui J. Using the Bland-Altman method to measure agreement with repeated measures. Br J Anaesth. 2007;99:309–311.
15. Woodman RJ. Bland-Altman beyond the basics: creating confidence with badly behaved data. Clin Exp Pharmacol Physiol. 2010;37:141–142.
16. Tipton E, Shuster J. A framework for the meta-analysis of Bland-Altman studies based on a limits of agreement approach. Stat Med. 2017;36:3621–3635.
