
Statistics

Study designs: Part 5 – Interventional studies (III)


Priya Ranganathan, Rakesh Aggarwal¹
Department of Anaesthesiology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India; ¹Director, Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, India

Abstract: Several methodological and statistical aspects of clinical trials can affect the robustness of their results. We conclude the series of articles on “Interventional Studies” by discussing some of these features.

Keywords: Bias, clinical trials as topic, research design

Address for correspondence: Dr. Priya Ranganathan, Department of Anaesthesiology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai,
Maharashtra, India.
E‑mail: drpriyaranganathan@gmail.com
Received: 19-12-19, Accepted: 31-12-19, Published: 31-01-20.

In this last of the three pieces on interventional studies, we examine some additional aspects of clinical trials which are crucial to ensure the validity of their results. These include:
a. Choice of study outcomes
b. Appropriate sample size
c. Minimizing missing data
d. Appropriate analysis technique
   i. Intention‑to‑treat versus per‑protocol analysis
   ii. Choice of statistical test
   iii. Adjustment for multiple testing
e. Complete and unbiased reporting.

CHOICE OF STUDY OUTCOME

The study outcomes are the variables that a research study sets out to measure. These should be chosen such that they capture the key effects of the study interventions. Study outcomes should be defined a priori (in the protocol, before the study commences), should be clinically relevant, should be amenable to quick and reliable measurement, should be sensitive to the effect of the study intervention, and should address the overall aim of the study. At times, a study may assess a few additional exploratory outcomes; these are essentially hypothesis generating, and the hypotheses can then form the basis of future studies.

Most studies will have a single primary outcome (corresponding to the primary objective of the study) and a number of secondary outcomes (corresponding to the secondary objectives). For example, the DREAMS study compared the efficacy of dexamethasone versus standard therapy for postoperative nausea and vomiting in patients undergoing gastrointestinal surgery.[1] The primary outcome was the occurrence of “any episode of vomiting within 24 h after surgery.” The study also assessed many secondary outcomes, including the number of episodes of vomiting, the need for anti‑emetics, and the severity of nausea and of vomiting.

Sometimes, a researcher may choose to study more than one primary outcome. Although this may provide a more comprehensive assessment of the effects of the experimental treatment, it carries an increased risk of false‑positive results, as discussed in the section below on multiple testing. Hence, such studies need more careful planning and interpretation.
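To see why multiple primary outcomes inflate the risk of false-positive results, note that with k independent outcomes each tested at a significance level of 0.05, the chance of at least one spurious “significant” finding is 1 - (1 - 0.05)^k. A minimal sketch of this arithmetic (the independence of outcomes is our simplifying assumption for illustration):

```python
# Family-wise error rate for k independent outcomes, each tested at
# alpha = 0.05 (independence is a simplifying assumption).
alpha = 0.05
for k in (1, 2, 5, 10):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} outcomes -> P(at least one false positive) = {fwer:.2f}")
# 1 -> 0.05, 2 -> 0.10, 5 -> 0.23, 10 -> 0.40
```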



The sample size required for a study is calculated based on the expected difference in the primary outcome measure between the intervention and the control groups. Studies are often not sufficiently powered to definitively address the secondary outcomes.

Very often, in addition to the efficacy outcomes, some outcomes related to toxicity (e.g., the total number of adverse events, or the number of individuals with specific adverse events, in each arm) are also included.

Outcomes can be of different types. Several considerations may influence the decision to choose some specific types of outcomes.
Surrogate outcomes
Researchers may choose to measure one or more biochemical or radiological parameters (which are often easier to measure and show a change over a shorter time frame) as substitutes for more direct outcomes, such as clinical improvement, improved survival, or reduced risk of disease recurrence. These are known as surrogate outcomes. For example, to assess the effect of a new treatment for diabetes, one may measure the change in glycosylated hemoglobin, although the real interest is the impact of the experimental treatment on diabetic complications and end‑organ damage. In prostate cancer, one could measure the changes in blood levels of prostate‑specific antigen or tumor shrinkage after therapy; however, again, the real interest is in whether the treatment translates into a benefit in survival. Other examples include measurement of CD4 counts to assess the efficacy of antiretroviral therapy, or of lipid levels for that of statins.

The use of surrogate outcomes is valid only if changes in these correlate well with changes in clinical outcomes. Their use may sometimes lead to misleading conclusions. The medical literature is replete with examples of drugs that were initially approved for marketing based on benefit in surrogate outcomes but were subsequently found to worsen clinical outcomes. For example, anti‑arrhythmic drugs were found to suppress ventricular premature beats in myocardial infarction (MI) patients, and such beats are known in this situation to be associated with increased mortality; on this basis, these drugs were, for several years, recommended for post‑MI patients.[2] However, a subsequent trial showed that the use of these drugs, despite reducing the occurrence of premature beats (the surrogate outcome), was not associated with a reduction in more complex fatal arrhythmias (the desired clinical endpoint) and in fact led to increased mortality.[2] Similarly, higher doses of erythropoietin in patients with renal failure improve hematocrit but lead to increased cardiovascular thrombotic events and death.[3]

Composite outcomes
Researchers often combine many related outcomes into a single outcome measure known as a composite endpoint. For example, trials of cardiovascular diseases commonly use major adverse cardiovascular events (MACE) as a composite endpoint; this combines any myocardial infarction, cerebrovascular event (e.g., stroke), and cardiovascular death. Composite endpoints increase the total number of patients who have events of interest, improving the statistical power of the analysis of study results. However, one should be careful to combine only such outcomes that have the same biological pathway and are affected similarly by the study interventions.

Some considerations for integrating many outcomes into a composite endpoint include whether the components are of similar importance, whether they occur with somewhat similar frequency, and whether the intervention is likely to affect all the components similarly.[4] A systematic review of studies with composite endpoints in cardiovascular medicine found that the largest treatment effects were seen in the components which were clinically less important, thus potentially misleading readers.[5] Interestingly, in a trial of cariporide, a cardiovascular drug, the incidence of the composite outcome (death or MI) showed a reduction from 20.3% in the placebo group to 16.6% in the treatment group; however, a closer look showed that though the incidence of MI had declined (from 18.9% to 14.4%), mortality had in fact increased (from 1.5% to 2.2%).[6]
Subjective versus objective outcomes
Objective or “hard” outcomes are those which are unambiguous and can be consistently measured by different assessors. On the other hand, subjective or “soft” outcomes are based on interpretation by the participant or assessor and can be associated with measurement bias. For example, in the DREAMS study, episodes of vomiting, defined as projectile expulsion of gastric content, would be a hard endpoint, whereas nausea (as experienced by the participant) is a subjective endpoint.[1] Wherever possible, one should use objective endpoints in order to minimize bias and improve the validity of study results. If subjective outcomes have to be used (since patient‑reported outcomes are important though often subjective), all attempts must be made to reduce or eliminate bias, such as using blinding techniques (for patients and assessors) and standardized, validated scales and scores. As an example, the DREAMS trial used standard validated scales to measure nausea, fatigue, and quality of life.[1]

APPROPRIATE SAMPLE SIZE

Research studies begin with a statement of belief or a hypothesis. For conventional superiority studies, where the objective is to compare an experimental treatment (E) with a standard treatment (S), we start with a null hypothesis – that there is no difference between the effects of treatment S and treatment E. The alternate hypothesis states that there is a difference between these effects.

Research studies are carried out in subsets (“samples”) from the entire universe (“population”) of individuals to whom the research question pertains. For example, to compare two drugs for the treatment of hypertension, ideally, we would randomly assign all the individuals with hypertension to receive either drug and compare the results. However, since this is not practical or feasible, we choose a sample of individuals with hypertension, compare the effects of the two anti‑hypertensive drugs in them, and extrapolate the results to the rest of the population. In doing so, we run the risk of two types of errors:
1. Finding a difference between the effects of treatments when a true difference does not exist (i.e., there would be no difference if we could study the entire population). This is called a type 1 error, alpha error, or false‑positive error. In terms of hypothesis testing, this means that we would falsely reject the null hypothesis and accept the alternate hypothesis.
2. Not finding a difference between the effects of treatments when, in fact, a difference exists. This is known as a type 2 error, beta error, or false‑negative error. This means that we falsely accept the null hypothesis and reject the alternate hypothesis.
Fortunately, statistical methods allow us to assess the likelihood of these errors. By convention, the upper limit of the type 1 error is set at 5%. This means that if we observe a difference between the samples receiving the new and the standard treatments, and the probability of this difference having occurred by chance is 5% or less, we conclude (with 95% or greater certainty) that the observed difference is a true difference.

In most studies, the type 2 error is set at 10% or 20%. This means that even if there is a true difference between the treatments in the population, there is a 10% (or 20%) probability that the study will fail to pick up this difference. The converse of the beta error is the “power” of a study, which is defined as the ability of the study to detect a true difference in treatment effects (90% or 80%, in the above examples).

These errors are more likely if the sample sizes are small. In particular, studies with a small sample size have low study power and a high risk of beta error. Thus, if a study with only a few subjects fails to find a difference between two treatments, this may reflect a failure to detect a difference even if one existed, rather than a true absence of difference. Hence, it is important to ensure that a study is designed to be sufficiently large to have a reasonable power, i.e., to have a reasonable likelihood of picking up a difference if one exists.
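The sketch below illustrates this point by simulation: it estimates the power of an unpaired t‑test to detect a fixed true difference of 0.5 standard deviations at several sample sizes (the effect size, sample sizes, and number of simulations are our illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimated_power(n_per_group, true_diff=0.5, sd=1.0, sims=2000):
    """Fraction of simulated trials in which p < 0.05 for a real difference."""
    hits = sum(
        stats.ttest_ind(rng.normal(0.0, sd, n_per_group),
                        rng.normal(true_diff, sd, n_per_group)).pvalue < 0.05
        for _ in range(sims)
    )
    return hits / sims

for n in (10, 30, 85):
    print(f"n = {n:3d} per group -> power ~ {estimated_power(n):.2f}")
# Expect roughly 0.2, 0.5, and 0.9: a small trial usually misses this
# real difference (high beta error), while n = 85 per group detects it
# about 90% of the time.
```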
The formula for the calculation of the sample size required for a clinical trial is based on the type 1 and type 2 errors that one is willing to accept and the expected difference between the treatment effects. The lower the type 1 and type 2 errors one permits, the larger the required sample size. One may wish both these errors to be zero; however, this would mean an infinite sample size – an impossible task. Hence, as indicated above, we conventionally limit the acceptable type 1 error to 5% and the type 2 error to 10% or 20%. As for the treatment effect, if the expected difference in outcomes (or the difference that one wishes to detect) between the two groups is smaller, or if the outcome measure (on a continuous scale) has a larger standard deviation, the required sample size is larger. The estimate of the expected difference can be based on previous literature, a pilot study, or the researcher’s assessment of what would be a clinically meaningful yet feasible difference between treatments. The calculated sample size is inflated by 10%–20% to account for protocol violations and losses to follow‑up (please see the section on “Minimizing missing data” below), so that an adequate final number of observations is available for the analysis when the study ends.
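The article does not reproduce the formula itself; one commonly used normal-approximation form for comparing two means, consistent with the factors listed above, is the following (a standard textbook expression, not taken from this article):

```latex
% n per group for a two-sided type 1 error \alpha, type 2 error \beta,
% outcome standard deviation \sigma, and expected difference \Delta:
n \;=\; \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\sigma^{2}}{\Delta^{2}}
```

Here z denotes a standard normal quantile; a smaller Δ or a larger σ increases n, exactly as described above.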
Researchers are often tempted to use a large expected treatment difference to obtain a smaller estimate of the required sample size. However, if this is not a realistic difference, one would run a greater risk of negative study results.

All trial protocols (and reports) should include a detailed section on sample size calculation, allowing readers to assess whether the assumptions made are valid.
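A minimal sketch of what such a calculation looks like, implementing the normal-approximation formula shown above and applying a dropout inflation of 15% (our example value within the 10%–20% range mentioned earlier):

```python
from math import ceil
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.90, dropout=0.15):
    """Sample size per group for comparing two means, inflated for
    expected protocol violations and losses to follow-up."""
    z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for a 5% two-sided alpha
    z_beta = norm.ppf(power)           # 1.28 for 90% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return ceil(n / (1 - dropout))     # inflate for expected losses

# Hypothetical example: detect a 5 mmHg difference in blood pressure
# between two drugs, assuming a standard deviation of 12 mmHg.
print(n_per_group(delta=5, sd=12))  # 143 per group (about 122 before inflation)
```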
MINIMIZING MISSING DATA

During a trial, there are likely to be protocol deviations or violations, and participant losses to follow‑up, resulting in missing data. This has a negative impact on the validity of the study results. Statisticians have developed methods to deal with missing data, such as multiple imputation techniques, best‑ and worst‑case scenarios, and the last‑observation‑carried‑forward technique. However, the best way of ensuring the validity of results is to have as complete data as possible. There are no absolute cut‑off points to define the acceptable level of missing data – these vary with the clinical condition being studied and the duration of follow‑up required.
Some ways to improve the completeness of data collection include training the study personnel to minimize protocol violations, keeping the study protocol simple so that compliance is better, and motivating participants to adhere to the protocol.
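As an illustration of one of the simpler statistical fixes mentioned above, the sketch below applies the last-observation-carried-forward technique to hypothetical longitudinal data in pandas (the scores are invented; LOCF is shown because it is easy to demonstrate, not because it is the preferred method):

```python
import numpy as np
import pandas as pd

# Hypothetical symptom scores for three participants over four visits;
# NaN marks visits that were missed or occurred after dropout.
scores = pd.DataFrame(
    {"visit1": [7.0, 6.0, 8.0],
     "visit2": [5.0, np.nan, 6.0],
     "visit3": [4.0, np.nan, np.nan],
     "visit4": [np.nan, np.nan, 3.0]},
    index=["P01", "P02", "P03"],
)

# Last observation carried forward: fill each missing visit with the
# participant's most recent observed value (fill along each row).
locf = scores.ffill(axis=1)
print(locf)
# LOCF can bias results when outcomes drift over time; multiple
# imputation is generally preferred, and complete data beat both.
```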
APPROPRIATE STATISTICAL ANALYSIS

Intention‑to‑treat versus per‑protocol analysis
Intention‑to‑treat analysis refers to the analysis of participants in the group to which they were randomized, irrespective of what treatment they received. On the other hand, per‑protocol analysis refers to the analysis of only those participants who adhered to the protocol. To minimize bias, as discussed in a previous article in this journal,[7] intention‑to‑treat analysis should always be reported in superiority studies; per‑protocol analysis may be reported in addition, if desired.
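A minimal sketch of how the two analysis sets differ, using a hypothetical six-participant trial table (the column names and values are ours for illustration):

```python
import pandas as pd

# Hypothetical trial: arm assigned at randomization, whether the
# participant adhered to the protocol, and the outcome (1 = success).
df = pd.DataFrame({
    "randomized_arm": ["E", "E", "E", "S", "S", "S"],
    "adhered":        [True, False, True, True, True, False],
    "success":        [1, 0, 1, 0, 1, 0],
})

# Intention-to-treat: everyone, analyzed in the arm they were randomized to.
itt = df.groupby("randomized_arm")["success"].mean()

# Per-protocol: only participants who adhered to the protocol.
pp = df[df["adhered"]].groupby("randomized_arm")["success"].mean()

print("Intention-to-treat success rates:", itt.to_dict())
print("Per-protocol success rates:     ", pp.to_dict())
# The two sets can give different estimates; only intention-to-treat
# preserves the balance created by randomization.
```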
Choice of statistical test
The choice of statistical test used for the analysis depends on the type of data, the number of groups to be compared, the objective of the study, and the study design (paired versus unpaired). The use of an inappropriate test can give misleading results. Readers can refer to published articles for further details on the different types of tests and their application.[8]
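As a brief illustration of how these factors map onto concrete tests, the sketch below runs three common comparisons with scipy (the data are invented, and the mapping is a simplification of fuller guidance such as that in reference [8]):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
drug_a = rng.normal(120, 10, 40)  # e.g., systolic BP on drug A
drug_b = rng.normal(115, 10, 40)  # e.g., systolic BP on drug B

# Continuous, roughly normal data, two unpaired groups -> unpaired t-test.
print("t-test p =", stats.ttest_ind(drug_a, drug_b).pvalue)

# Skewed continuous or ordinal data -> Mann-Whitney U test.
print("Mann-Whitney p =", stats.mannwhitneyu(drug_a, drug_b).pvalue)

# Categorical outcome in two groups -> chi-squared test on a 2x2 table.
table = [[30, 10],  # arm A: success / failure
         [22, 18]]  # arm B: success / failure
chi2, p, dof, expected = stats.chi2_contingency(table)
print("Chi-squared p =", p)

# Paired designs (before/after in the same patients) need paired
# versions, e.g., stats.ttest_rel or stats.wilcoxon.
```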
Adjustment for multiple testing
In a previous article, we discussed how the comparison of several outcomes, interim analyses, or multiple subgroup comparisons increases the possibility of spuriously significant results.[9] For such analyses, the validity of positive results is questionable unless the analysis examines and corrects for multiple comparisons.
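A minimal sketch of two standard corrections, Bonferroni and Holm, applied to hypothetical p-values from five comparisons (the values are invented and listed in ascending order for simplicity):

```python
# Hypothetical p-values from five outcome comparisons in one trial.
p_values = [0.008, 0.011, 0.040, 0.300, 0.650]
alpha = 0.05
m = len(p_values)

# Bonferroni: compare every p-value against alpha / m.
bonferroni = [p < alpha / m for p in p_values]

# Holm step-down: compare the i-th smallest p-value (0-based) against
# alpha / (m - i), stopping at the first failure.
holm = [False] * m
for i, p in enumerate(sorted(p_values)):
    if p >= alpha / (m - i):
        break
    holm[i] = True

print("Unadjusted:", [p < alpha for p in p_values])  # three look 'significant'
print("Bonferroni:", bonferroni)                     # only p = 0.008 survives
print("Holm      :", holm)                           # 0.008 and 0.011 survive
```

Uncorrected testing would declare three of the five comparisons significant; after correction, only the genuinely strong results remain.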
COMPLETE AND UNBIASED REPORTING

The CONSORT statement lists the elements which are mandatory for the reporting of randomized clinical trials.[10] This ensures that readers can better assess the quality of a study and hence the validity and applicability of its results.

It is not uncommon for investigators to compare multiple outcomes or to use multiple statistical tests for a particular comparison and then cherry‑pick the results that show a positive impact of a treatment. This is inappropriate. It is important to report the results of a trial in totality and without bias so that readers can assess the validity of the study findings. Mandatory registration of clinical trials, with investigators being required to specify the primary and secondary outcomes before starting a trial, is aimed at promoting such behavior.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.

REFERENCES

1. DREAMS Trial Collaborators and West Midlands Research Collaborative. Dexamethasone versus standard treatment for postoperative nausea and vomiting in gastrointestinal surgery: Randomised controlled trial (DREAMS Trial). BMJ 2017;357:j1455.
2. Connolly SJ. Use and misuse of surrogate outcomes in arrhythmia trials. Circulation 2006;113:764-6.
3. Singh AK, Szczech L, Tang KL, Barnhart H, Sapp S, Wolfson M, et al. Correction of anemia with epoetin alfa in chronic kidney disease. N Engl J Med 2006;355:2085-98.
4. McCoy CE. Understanding the use of composite endpoints in clinical trials. West J Emerg Med 2018;19:631-4.
5. Ferreira-González I, Busse JW, Heels-Ansdell D, Montori VM, Akl EA, Bryant DM, et al. Problems with use of composite end points in cardiovascular trials: Systematic review of randomised controlled trials. BMJ 2007;334:786.
6. Mentzer RM Jr, Bartels C, Bolli R, Boyce S, Buckberg GD, Chaitman B, et al. Sodium-hydrogen exchange inhibition by cariporide to reduce the risk of ischemic cardiac events in patients undergoing coronary artery bypass grafting: Results of the EXPEDITION study. Ann Thorac Surg 2008;85:1261-70.
7. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: Intention-to-treat versus per-protocol analysis. Perspect Clin Res 2016;7:144-6.
8. Bellolio MF, Serrano LA, Stead LG. Understanding statistical tests in the medical literature: Which test should I use? Int J Emerg Med 2008;1:197-9.
9. Ranganathan P, Pramesh CS, Buyse M. Common pitfalls in statistical analysis: The perils of multiple testing. Perspect Clin Res 2016;7:106-7.
10. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 statement: Updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.

