The Essential Guide to N-of-1 Trials in Health
Jane Nikles • Geoffrey Mitchell
Editors
Springer Science+Business Media B.V. Dordrecht is part of Springer Science+Business Media (www.springer.com)
Preface
The fundamental problem facing the clinical research enterprise is this: what
clinicians (and patients) want to know is not what clinical trials are equipped to say.
Randomized controlled trials (RCTs) are all about the average patient; they yield
average treatment effects. Doctors and patients want individual treatment effects:
how a given patient will respond to Treatment A versus Treatment B. No amount of
statistical subterfuge can make standard-issue, parallel group RCTs reveal precisely
the results we want. There is one method, however, that under certain conditions can
reliably identify the best treatment for an individual. That method is the N-of-1 trial.
N-of-1 trials are crossover experiments conducted with a single patient. They are
applicable principally to treatments used for symptomatic, chronic, nonfatal condi-
tions. By systematically observing a patient’s response to two or more treatments,
we can determine which of the treatments are likely to work best for that patient in
the long run. N-of-1 trials were introduced to clinicians by Hogben and Sim as early
as 1953, but it took 30 years before Gordon Guyatt brought them into the medical
mainstream. Early pioneers established active N-of-1 trial units in academic cen-
ters, only to abandon them once funding was exhausted. However, several units are
still thriving, and over the past three decades, over 2,000 patients have participated
in published N-of-1 trials.
And yet, considering the significant potential N-of-1 trials have for individual-
izing care and supporting shared decision making, a compelling case could be made
that they are woefully underused. One reason is that few clinical investigators and
even fewer clinicians understand their rationale, methods, and applications. Now,
here in one place is the information these individuals have been seeking.
The Essential Guide to N-of-1 Trials in Health will be useful to two audiences:
clinical researchers seeking a more direct approach to estimating individual
treatment effects and clinicians aspiring to apply more rigor to their own therapeutic
decision making. Written by many of the world’s most knowledgeable authorities
on N-of-1 trials, the book provides a step-by-step approach to design and conduct of
N-of-1 trials, from study conception and ethical approval to data collection, analysis,
and interpretation. While some enthusiastic readers will read the guide cover to
cover, each chapter can also stand alone.
When clinicians and patients first hear about N-of-1 trials, their initial incredulity
frequently turns to intense interest. How, in an era when personalized medicine is all
the rage, could such a powerful approach be so little known? As an accessible yet
rigorous introduction to the method, the Essential Guide to N-of-1 Trials in Health
will help provide tools, answers, and inspiration.
We are very appreciative of the time and skill contributed by Kylie O’Toole and
Genevieve Clark in preparing this book for publication and for invaluable adminis-
trative support.
We thank our editor at Springer, Melania Ruiz, for her guidance and patience in
the process of bringing this book to completion.
Contents
1 Introduction ............................................................................................. 1
Jane Nikles and Geoffrey Mitchell
2 What are N-of-1 Trials?......................................................................... 9
Jane Nikles
3 N-of-1 Trials in the Behavioral Sciences ............................................... 19
Robyn L. Tate and Michael Perdices
4 N-of-1 Trials in Medical Contexts.......................................................... 43
Geoffrey Mitchell
5 Aggregated N-of-1 Trials ........................................................................ 57
Geoffrey Mitchell
6 Methodological Considerations for N-of-1 Trials................................. 67
Keumhee C. Carriere, Yin Li, Geoffrey Mitchell, and Hugh Senior
7 Randomization, Allocation Concealment, and Blinding ..................... 81
Hugh Senior
8 Data Collection and Quality Control .................................................... 93
Hugh Senior
9 Individual Reporting of N-of-1 Trials to Patients and Clinicians ....... 105
Michael Yelland
10 Assessing and Reporting Adverse Events ............................................. 115
Hugh Senior
11 Research Ethics and N-of-1 Trials ......................................................... 125
Andrew Crowden, Gordon Guyatt, Nikola Stepanov, and Sunita Vohra
12 Statistical Analysis of N-of-1 Trials ....................................................... 135
Kerrie Mengersen, James M. McGree, and Christopher H. Schmid
Jane Nikles has been working at The University of Queensland in the field of
N-of-1 trials for over 15 years. Her Ph.D. on using N-of-1 trial methodology in
clinical practice was awarded in 2006. She has been a chief investigator on over
$40 million in research funding in the area of N-of-1 trials and has published over 20
peer-reviewed journal articles in the field. She was involved in developing the
CONSORT extension for N-of-1 trials (CENT). She is currently conducting an
international multisite N-of-1 trial which also compares aggregated N-of-1 trials
with parallel arm randomized controlled trials, a world first.
Chapter 1
Introduction
With the rising cost of patient care (including drug costs and clinic visits), N-of-1
trials have potential to minimize clinician and patient investment in time and money
on suboptimal treatments. Recognition that the USA is in the midst of a healthcare
crisis has prompted calls for advances in biomedical research. Potential ways for-
ward are individualized medicine and personalized evidence-based medicine to
improve treatment efficiency, by reducing individual patients’ exposure to treat-
ments that do not work and those that cause adverse side effects. In addition, moving towards a more individualized and personalized health-care system of the type built on the N-of-1 study principle and infrastructure would allow the potential of genomics and wireless devices to be explored and tapped. In this context, a
text setting out the theoretical and practical issues surrounding N-of-1 trials in the
health setting is timely. This is illustrated by a quote from Lillie et al. (2011):
Despite their obvious appeal and wide use in educational settings, N-of-1 trials have been
used sparingly in medical and general clinical settings. We emphasize the great utility of
modern wireless medical monitoring devices in their execution. We ultimately argue that
N-of-1 trials demand serious attention among the health research and clinical care com-
munities given the contemporary focus on individualized medicine. (Lillie et al. 2011)
Our centre has conducted over 600 N-of-1 trials in areas ranging from osteoarthritis
in adults to Attention Deficit Hyperactivity Disorder in children to palliative care.
We have experience in conducting the trials face to face and by post and telephone, and both individually and in aggregate.
Our colleagues felt that the expertise we had developed over 15 years of conducting N-of-1 trials was worth sharing more broadly, and in more depth than is possible in journal articles. The idea for a book was born.
At the time we commenced writing, there were no in-depth books on N-of-1
trials in the health setting such as this one. However, Kravitz et al. (2014) have
recently published a comprehensive text entitled “Design and Implementation of
N-of-1 Trials: A User’s Guide.” Our book intentionally avoids significant overlap
with their book.
The readers we hope to reach with this book are clinicians, academic researchers,
health professionals or practitioners, scientists, and pharmaceutical company staff
in the broad area of health; and funders and regulators in various countries who wish
to investigate or conduct N-of-1 trials.
The book may also be useful for graduate students in methodologically based
courses or doing research higher degrees in areas such as public health, and also for
undergraduate students or interested consumers not trained in the health sphere.
We have written this book with two discrete audiences in mind. The first is
interested clinicians who will gain benefit from an overview of the N-of-1
technique. We have included chapters that look at the clinical applicability of the
technique, how to run an N-of-1 trial in individuals and how to combine results to
gain a population estimate. We would suggest reading Chaps. 2, 3, 4, 5, 9, and 15
for this broader overview.
For those readers who desire in-depth examination of N-of-1 trial design, con-
duct and analysis, we have included chapters that are more technical in nature. This
will be of considerable use to people designing high quality trials, and analyzing the
data that arises from them, both in terms of determining individual treatment effects
and when aggregating the data to generate a population estimate. We would suggest
reading Chaps. 6, 7, 8, 10, 11, 12, 13, 14, 16, and 17 for this more in-depth
discussion.
A brief description of the chapters follows.
What are N-of-1 trials? In Chap. 2, Jane Nikles defines N-of-1 trials and provides
a brief historical perspective. She discusses the background and rationale for N-of-1
trials, and describes their benefits.
Robyn Tate and Michael Perdices’ chapter on N-of-1 trials in the behavioral
sciences (Chap. 3) describes the application of N-of-1 trials in the behavioural sci-
ences, where they are commonly referred to as single-case experimental designs
(SCEDs). Four essential features demarcate single-case methodology from between-
group designs: (i) the individual serves as his or her own control, (ii) use of a specific
and operationally-defined behaviour that is targeted by the intervention, (iii) frequent
and repeated measurement of the target behaviour throughout all phases of the exper-
iment, and (iv) issues surrounding external validity. Features that strengthen internal
and external validity of SCEDs are discussed in the context of a standardised scale to
evaluate the scientific quality of SCEDs and N-of-1 trials, the Risk of Bias in N-of-1
Trials (RoBiNT) Scale. New work in developing a reporting guide in the CONSORT
tradition (the Single-Case Reporting guideline In BEhavioural interventions,
SCRIBE) is referenced. Subsequent sections in the chapter highlight differences
among the prototypical single-case designs reported in the literature, both experi-
mental (withdrawal/reversal, multiple-baseline, alternating-treatments, and chang-
ing-criterion designs) and non-experimental (biphasic A-B design, B-phase training
study, preintervention/ post-intervention design, and case description/report), along
with illustrative examples reported in the literature. The final section of the chapter
describes available methods to analyse data produced by SCEDs, including struc-
tured visual analysis, randomization tests and other statistical procedures.
Following on from this, Geoff Mitchell in N-of-1 trials in medical contexts
(Chap. 4) argues the case for N-of-1 studies assuming a place in the clinical arma-
mentarium. Clinicians make treatment decisions on a regular basis, and some deci-
sions may result in patients taking treatments for years. This decision-making is a
core skill of clinicians, and if possible it should be evidence based. The problem is
that the most common tool to aid this decision making, the RCT, has many problems
which can lead to a patient being prescribed a treatment that may not work for them.
N-of-1 studies may be useful tools to assist in making the best decision possible. The chapter describes the rationale for and uses of N-of-1 trials, their advantages and limitations, and discusses aggregation of N-of-1 trials to generate population estimates of effect.
In the next chapter (Chap. 5) he outlines the rationale, methods, benefits and
limitations of combining N-of-1 trials. The original purpose of N-of-1 trials is to
determine whether a treatment works in a person. However, these trials can be
considered as mini-randomized controlled trials (RCTs), with the person providing
multiple datasets to the intervention and control groups. Therefore, several people
undergoing the same N-of-1 trial can contribute many data sets and this rapidly
scales up to the point where the power of the trial can equate to a normal RCT, but
with far fewer participants. This characteristic means that RCT-level evidence can
be derived from populations that are almost impossible to gather data from, because
of low prevalence conditions, or difficulty in recruiting or retaining subjects. This
chapter describes the method in detail, along with methodological challenges and
limitations of the method.
In Chapter 6, on major design elements of N-of-1 trials, Keumhee Carriere, Yin Li, Geoff Mitchell and Hugh Senior discuss some important considerations when choosing a particular individual N-of-1 trial design. N-of-1 trials are extremely
useful in subject-focused investigations, for example, medical experiments. As far
as we are aware, no guidelines are available in the literature on how to plan such a
trial optimally. In this chapter, they discuss the considerations when choosing a
particular N-of-1 trial design, assuming that the outcome of interest is measured on
a continuous scale. The discussion is limited to comparisons of two treatments, and no claim is made that the designs constructed apply to non-continuous or binary outcomes. Optimal N-of-1 trials under various models are constructed depending
upon how we accommodate the carryover effects and the error structures for the
repeated measurements. Overall, they conclude that alternating between AB and BA
pairs in subsequent cycles will result in practically optimal N-of-1 trials for a single
patient, under all the models considered, without the need to guess at the correlation
structure or conduct a pilot study. Alternating between AB and BA pairs in a single trial is also largely robust to misspecification of the error structure of the repeated measurements.
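As a concrete illustration of this layout, the following minimal Python sketch (ours, not the chapter authors'; function names and values are invented for illustration) builds the alternating AB/BA-pair sequence and, for comparison, a sequence whose pair order is randomized independently:

import random

def alternating_pair_design(n_cycles):
    # Fixed alternation of pair order across cycles: AB, BA, AB, ...
    return ["AB" if i % 2 == 0 else "BA" for i in range(n_cycles)]

def randomized_pair_design(n_cycles, rng):
    # For comparison: order within each pair randomized independently,
    # as in many conventional N-of-1 protocols.
    return [rng.choice(["AB", "BA"]) for _ in range(n_cycles)]

rng = random.Random(42)  # arbitrary seed, for reproducibility only
print(alternating_pair_design(3))      # ['AB', 'BA', 'AB']
print(randomized_pair_design(3, rng))  # order varies pair by pair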
In Chap. 7 Hugh Senior discusses a major concern in N-of-1 trials, common to
any epidemiological approach – the introduction of bias and confounding. These
factors may modify the size of the treatment estimate or shift the treatment estimate
away from its true value. The methodological approaches of randomization, alloca-
tion concealment, and blinding are employed to prevent or minimize confounding
and bias in trials. This chapter provides definitions and describes the various
methods of randomization, allocation concealment, and blinding that can be adopted
in N-of-1 trials. In addition, the chapter details the roles of specific research staff
and the information required for the reporting of N-of-1 trial blinding methods in
medical journals.
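To make the interplay of randomization and allocation concealment concrete, here is a minimal sketch of ours (the treatment labels and workflow are hypothetical, not prescribed by the chapter): the order within each pair is randomized, and the resulting mapping is held only by the dispensing pharmacist.

import secrets

TREATMENTS = ("active", "placebo")  # hypothetical two-treatment trial

def concealed_schedule(n_cycles):
    # Randomize the order within each treatment pair using a
    # cryptographically strong RNG. For allocation concealment and
    # blinding, only the dispensing pharmacist would hold this mapping;
    # patient and clinician see identical containers labelled with
    # period numbers only.
    schedule = {}
    for cycle in range(n_cycles):
        first = secrets.choice(TREATMENTS)
        second = TREATMENTS[1] if first == TREATMENTS[0] else TREATMENTS[0]
        schedule[2 * cycle + 1] = first
        schedule[2 * cycle + 2] = second
    return schedule

print(concealed_schedule(3))  # held by the pharmacist until data lock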
In Chap. 8 on data collection and quality control, Hugh Senior explains how to
achieve a reliable data set for analysis that complies with the protocol. A system of
clinical data management (the planning and process of data collection, integration
and validation) is critical. This chapter provides a synopsis of the key components
of clinical data management which need to be considered during the design phase
of any trial. Topics addressed include the roles and responsibilities of research staff; the design of case report forms for collecting data; the design and development of a clinical database management system; subject enrolment and data entry; data validation; medical coding; and database close-out, data lock and archiving. An additional section discusses the rationale for the requirement of trial registration.
Chapter 9, by Michael Yelland, offers a very practical account of the reporting of
N-of-1 trials to patients and clinicians, using trials for chronic pain conditions as
models which may be applied to many other forms of N-of-1 trials. It draws from
Next, Kerrie Mengersen, James McGree and Christopher Schmid discuss issues
and approaches related to systematic review and meta-analysis of N-of-1 trials.
Chapter 16 describes some basic guidelines and methods, and some important steps
in a systematic review of these types of trials are discussed in detail. This is followed
by a detailed description of meta-analytic methods, spanning both frequentist and
Bayesian techniques. A previously undertaken meta-analysis of a comparison of
treatments for fibromyalgia syndrome is discussed with some sample size consider-
ations. This is further elaborated on through a discussion on the statistical power of
studies through a comparison of treatments for chronic pain. The chapter concludes
with some final thoughts about the aggregation of evidence from individual
N-of-1 trials.
Finally, in Chap. 17, Jane Nikles looks at the current status of N-of-1 trials and
where N-of-1 trials are headed. N-of-1 trials and review articles have recently been
published in the areas of chronic pain, pediatrics, palliative care, complementary
and alternative medicine, rare diseases, patient-centered care, the behavioral sciences
and genomics. These are briefly reviewed and the current place of N-of-1 trials
discussed. The chapter concludes with a vision for the future of N-of-1 trials.
We trust you find the book useful. Feedback that might inform later editions is
welcomed.
References
Kravitz RL, Duan N (eds) and the DEcIDE Methods Center N-of-1 Guidance Panel (Duan N,
Eslick I, Gabler NB, Kaplan HC, Kravitz RL, Larson EB, Pace WD, Schmid CH, Sim I, Vohra
S). Design and implementation of N-of-1 trials: a user’s guide. AHRQ Publication No.
13(14)-EHC122-EF. Agency for Healthcare Research and Quality, Rockville, February 2014.
www.effectivehealthcare.ahrq.gov/N-1-Trials.cfm
Lillie EO, Patay B, Diamant J, Issell B, Topol EJ, Schork NJ (2011) The N-of-1 clinical trial: the
ultimate strategy for individualizing medicine? Pers Med 8(2):161–173
Chapter 2
What are N-of-1 Trials?
Jane Nikles
Abstract In this chapter, we define N-of-1 trials and provide a brief historical
perspective. We briefly cover the background and rationale for N-of-1 trials, and
discuss their benefits.
Keywords N-of-1 trials • Trial of therapy • Clinical trial • Crossover trial • Patient-centered outcome research • Chronic disease • Medication • Medication expenditure • Quality use of medicines • Adverse events
Introduction
J. Nikles (*)
School of Medicine, The University of Queensland, Ipswich, QLD, Australia
e-mail: uqjnikle@uq.edu.au
These trials may also be useful in situations where there is a need to prioritize
medications or where there is significant difference in cost or availability between
drugs approved for the same indication.
In addition, for selected treatments, multiple N-of-1 studies of the same treat-
ment in a similar patient population can be aggregated with high levels of power
(e.g. via Bayesian or other statistical methods), to provide a population estimate
of effect, but requiring a fraction of the sample size of the equivalent parallel arm
RCT. This has obvious benefits for accumulating evidence in populations where participants are hard to recruit (e.g., small patient numbers, such as pediatric traumatic brain injury (Nikles et al. 2014), or rare conditions (Facey et al. 2014)) or hard to retain (e.g., palliative care (Mitchell et al. 2015)). Further, the effect on each
participant in the trial is known, which presents opportunities for research into
individual responsiveness to therapy. This is discussed in more detail in Chap. 5.
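Chapter 5 treats aggregation in detail, but the idea of deriving a population estimate from per-patient effects can be sketched simply. The following Python fragment (ours; all numbers are invented) pools hypothetical per-patient treatment differences with a standard DerSimonian-Laird random-effects model, a frequentist counterpart of the Bayesian approaches mentioned above.

import numpy as np

# Hypothetical per-patient summaries from a series of identical N-of-1
# trials: (mean within-patient A-B difference, variance of that mean).
patients = [(-1.2, 0.30), (-0.4, 0.25), (-0.9, 0.40), (0.1, 0.35)]
d = np.array([p[0] for p in patients])
v = np.array([p[1] for p in patients])

# DerSimonian-Laird random-effects pooling: estimate between-patient
# heterogeneity (tau^2), then combine with inverse-variance weights.
w = 1.0 / v
d_fixed = np.sum(w * d) / np.sum(w)
q = np.sum(w * (d - d_fixed) ** 2)  # Cochran's Q
tau2 = max(0.0, (q - (len(d) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
w_star = 1.0 / (v + tau2)
pooled = np.sum(w_star * d) / np.sum(w_star)
se = np.sqrt(1.0 / np.sum(w_star))
print(f"population estimate: {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.2f}")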
Historical Perspective
Cushny and Peebles conducted the first N-of-1 trial in 1905 by examining the
saliva-inducing effects of optical isomers of hyoscines (Cushny and Peebles
1905). N-of-1 trials have long been used in psychology, and over the last 40
years, have been used in many different situations in clinical medicine, initially
by Kellner in 1968 (Kellner and Sheffield 1968) as an experimental approach to
targeting drug therapy. A resurgence began in 1986, when Guyatt, Sackett and colleagues conducted several N-of-1 trials; this, together with the N-of-1 services set up by Guyatt and Larson in the 1980s and 1990s (e.g. Guyatt et al. 1986, 1988; Larson 1990; Larson et al. 1993), marked the beginning of the use of N-of-1 trials in modern clinical settings.
Today, chronic diseases are among the most prevalent, costly, and preventable of
all health problems. Seven out of every ten Americans who die each year, or
more than 1.7 million people in America yearly, die of a chronic disease (Centers
for Chronic Disease Prevention and Health Promotion 2009). The prolonged
course of illness and disability from such chronic diseases as diabetes and arthri-
tis results in extended pain and suffering and decreased quality of life for mil-
lions of Americans, causing major limitations in activity for more than 33
million Americans (Centers for Chronic Disease Prevention and Health
Promotion 2009).
In the USA, as in other Western countries, chronic diseases cost large amounts both
to the government and to the individual. Heart disease and stroke cost US$313.8
billion in 2009; cancer cost US$89 billion in health care expenditures in 2007; the
direct costs of diabetes totaled US$116 billion in 2007; and the cost of direct medical care for arthritis was more than US$80.8 billion per year in 2003 (Centers for
Chronic Disease Prevention and Health Promotion 2009).
In Western countries, a large part of the cost of chronic disease is the cost of
medications and their adverse effects. As in the USA, large and increasing amounts
of money are spent on prescription medications in Australia. In the year ending 30 June 2013, a total of at least AUD$8.9 billion, including AUD$7.4 billion by the Pharmaceutical Benefits Scheme (PBS) and AUD$1.49 billion by patients, was spent on 197 million PBS prescriptions (compared to 194.9 million in the previous year) (PBS expenditure and prescriptions, July 2012).
Over the last 15 years, there has been increasing concern about the high psychoso-
cial, economic and health costs of inappropriate medication use. Many people do
not respond to medicines they are prescribed (Guyatt et al. 2000): the evidence that
clinicians rely on to make treatment decisions may not apply to individuals. Ensuring
that patients only take medicines that work for them, is an important strategy in
reducing the burden of medication misadventure. In Australia each year, medication
misadventure, including adverse drug reactions and drug-drug interactions, is impli-
cated in 2–3 % of all hospitalizations (i.e. 190,000 per year) (Australian Council for
Safety and Quality in Health Care 2006), 8,000 hospital-related deaths (Roughead
and Semple 2009), an estimated annual cost of $350 million in direct hospital costs
and total costs to the health system of $660 million (Roughead and Semple 2009).
Even for those who are not ill enough to require admission, adverse events can
impair their quality of life (Sorensen et al. 2005). Polypharmacy, especially in older
people and women, contributes to this (Curry et al. 2005). There is also evidence
to suggest that considerable amounts of medicines are wasted, that is, unused by
patients and returned to pharmacists (Commonwealth Department of Human
Services and Health 2009).
It is clear that the appropriate and safe use of medicines is an urgent national
priority. In the current evidence-based and consumer-driven policy environ-
ments, information which helps medical practitioners and patients make informed
decisions about the appropriate use of medicines is much needed, for example,
ways of reducing unnecessary medication use by targeting medications to
responders.
Although the art of prescribing has advanced significantly over recent decades,
there is still a large element of uncertainty involved. For example, it is known that
only 30–50 % of individuals with osteoarthritis respond to NSAIDs (Walker et al.
1997); so how does a clinician predict whether a particular osteoarthritic patient will
respond or not?
Randomized controlled trials remain the ‘gold standard’ for assessing drug
effectiveness. However, the difficulty of extrapolating the results of randomized
controlled trials to the care of an individual patient has resulted in prescribing
decisions in clinical practice often being based on tradition, rather than evi-
dence (Larson 1990; Larson et al. 1993; Mapel et al. 2004). When considering
any source of evidence about treatment other than N-of-1 trials, clinicians are
generalizing from results on other people to their patients, inevitably weaken-
ing the inferences about treatment impact and introducing complex issues of
how randomized controlled trial (RCT) results apply to individuals (Kellner and
Sheffield 1968). In fact, in their hierarchy of study design to evaluate the
strength of evidence for making treatment decisions, Guyatt et al. place N-of-1
trials at the top (highest level of evidence) (Guyatt et al. 2000). This is discussed in
more detail in Chap. 4.
Doctors often use a trial of therapy to assist in their clinical decision-making. That
is, the patient presents with a particular cluster of symptoms, they are prescribed a
particular medication (for example, an asthma drug or an NSAID) tentatively, as a
trial, and the subsequent condition of the patient is used to monitor the efficacy of
the treatment, usually informally. Then, based on the patient’s response, the medica-
tion is either continued, discontinued or the dose is changed. This is discussed in
more detail in Chap. 4.
The informal trial of therapy has serious potential biases. These are the placebo
effect, the patient’s desire to please the doctor, and the expectations of patient and
doctor. Although commonly used, it is not adequate for determining the appropri-
ateness of prescribing certain medications, particularly those that have significant
side effects or are expensive. This is discussed in more detail in Chap. 4.
One method with the potential to solve these problems is the N-of-1 trial. An N-of-1 trial is a more rigorous and objective version of a trial of therapy, and could therefore be a feasible initiative to incorporate into clinicians' routine day-to-day
practice. Rather than using a group of patients as a control for a group of patients,
as is done in parallel group clinical trials, N-of-1 trials use a single patient as their
own control. Because the data have come from that patient, the result is definitely
applicable to that patient. Because treatment periods are randomized and double-
blind, N-of-1 trials remove the biases mentioned above.
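To show in miniature what "the patient as their own control" means at analysis time, here is a toy Python sketch (ours, with invented data; Chapter 12 covers analysis properly) that treats each cycle's A-B difference as a paired observation:

from math import sqrt
from statistics import mean, stdev

# Invented symptom scores (lower = better) from one patient's
# three-cycle N-of-1 trial comparing treatments A and B.
a_periods = [6.0, 5.5, 6.5]
b_periods = [4.0, 4.5, 3.5]

# The patient serves as their own control: analyse per-cycle differences.
diffs = [a - b for a, b in zip(a_periods, b_periods)]
d_bar = mean(diffs)
se = stdev(diffs) / sqrt(len(diffs))
print(f"mean within-patient A-B difference {d_bar:.2f}, "
      f"t = {d_bar / se:.2f} on {len(diffs) - 1} df")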
interventions since 1986 (Gabler et al. 2011). The most common conditions examined
in the N-of-1 trials were neuropsychiatric (36 %, of which 9 % were attention deficit
hyperactivity disorder), musculoskeletal (21 %, of which 9 % were osteoarthritis),
and pulmonary (13 %).
Doctors
Health System
Conclusion
N-of-1 trials are individual randomized multiple crossover controlled trials which
use the patient as their own control. Applicable in many contexts, they have signifi-
cant benefits for patients, clinicians and the health system.
References
Askew DA (2005). A study of research adequacy in Australian general practice. PhD thesis, The
University of Queensland, Brisbane
Australian Council for Safety and Quality in Health Care (2006) National inpatient medication
chart: general instructions/information for doctors. The Australian Department of Health and
Ageing, Canberra
Centers for Chronic Disease Prevention and Health Promotion (2009) The power of prevention
chronic disease….the public health challenge of the 21st century. http://www.cdc.gov/chronicdisease/pdf/2009-power-of-prevention.pdf. Accessed 31 Oct 2014
Commonwealth Department of Human Services and Health (2009) A policy on the quality use of
medicines. Commonwealth Department of Human Services and Health, Canberra
Curry LC, Walker C, Hogstel MO, Burns P (2005) Teaching older adults to self manage medica-
tions: preventing adverse drug reactions. J Gerontol Nurs 31(4):32–42
Cushny AR, Peebles AR (1905) The action of optical isomers: II. Hyoscines. J Physiol
32(5–6):501–510
Facey K, Granados A, Guyatt G, Kent A, Shah N, van der Wilt GJ, Wong-Rieger D (2014)
Generating health technology assessment evidence for rare diseases. Int J Technol Assess Health
Care 19:1–7 [Epub ahead of print]
Gabler NB, Duan N, Vohra S, Kravitz RL (2011) N-of-1 trials in the medical literature: a system-
atic review. Med Care 49(8):761–768. doi:10.1097/MLR.0b013e318215d90d
Guyatt G, Sackett D, Taylor DW et al (1986) Determining optimal therapy: randomized trials in
individual patients. N Engl J Med 314:889–892
Guyatt G, Sackett D, Adachi J et al (1988) A clinician’s guide for conducting randomized trials in
individual patients. CMAJ 139:497–503
Guyatt GH et al (2000) Users' guides to the medical literature: XXV. Evidence-based medicine: principles for applying the users' guides to patient care. Evidence-based medicine working
group. JAMA 284(10):1290–1296
Chapter 3
N-of-1 Trials in the Behavioral Sciences
Robyn L. Tate and Michael Perdices
Abstract This chapter describes the application of N-of-1 trials in the behavioural
sciences, where they are commonly referred to as single-case experimental designs
(SCEDs). Four essential features demarcate single-case methodology from between-
group designs: (i) the individual serves as his or her own control, (ii) use of a specific
and operationally-defined behaviour that is targeted by the intervention, (iii) frequent
and repeated measurement of the target behaviour throughout all phases of the exper-
iment, and (iv) issues surrounding external validity. Features that strengthen internal
and external validity of SCEDs are discussed in the context of a standardised scale to
evaluate the scientific quality of SCEDs and N-of-1 trials, the Risk of Bias in N-of-1
Trials (RoBiNT) Scale. New work in developing a reporting guide in the CONSORT
tradition (the Single-Case Reporting guideline In BEhavioural interventions, SCRIBE)
is referenced. Subsequent sections in the chapter highlight differences among the
prototypical single-case designs reported in the literature, both experimental (with-
drawal/reversal, multiple-baseline, alternating-treatments, and changing-criterion
designs) and non-experimental (biphasic A-B design, B-phase training study, pre-
intervention/post-intervention design, and case description/report), along with illus-
trative examples reported in the literature. The final section of the chapter describes
available methods to analyse data produced by SCEDs, including structured visual
analysis, randomization tests and other statistical procedures.
In the 25 years between 1985 and 2010, just over 100 articles were published in
medical journals using N-of-1 methodology (Gabler et al. 2011). By contrast, dur-
ing the same period more than 600 articles using single-case experimental designs
were published in the neuropsychological rehabilitation field alone (www.psycbite.com, accessed 21 February 2014), and in the even more circumscribed field of special education more than 450 such articles were published over approximately the
same period (1983–2007; Hammond and Gast 2010).
The behavioral sciences (including clinical, education, and neuro- and rehabilita-
tion psychology) have used single-case experimental methodology to test the efficacy
of interventions for many decades. Indeed, Gordon Guyatt, who was interviewed
about the evolution of the medical N-of-1 trial that occurred in the early 1980s, com-
mented that at that time:
The department [Clinical Epidemiology and Biostatistics at McMaster University, Canada]
was multidisciplinary and very tightly integrated. So there were … statisticians and psy-
chologists and people with behavioral backgrounds, physicians and epidemiologists getting
together on a regular basis. And for a while, one of the psychologists would say, ‘Oh, that
would be very interesting for an N-of-1 trial.’ And we said, ‘Thank you very much’ and
would go on. Then at one point it clicked, and we started to get out the psychology literature
and found three textbooks full of N-of-1 designs from a psychology perspective. … It was
totally old news (Kravitz et al. 2008, p. 535).
This interview highlights two issues: first, the methodology of the N-of-1 trial
was already well established in psychology by the time that Guyatt and colleagues
commenced work with N-of-1 trials, and second, there was not just a single N-of-1
design but many different types. In what is taken to be the initial, landmark publica-
tion describing a medical N-of-1 trial (a 66-year-old man with uncontrolled asthma),
Guyatt et al. (1986, pp. 889–890) commented that “experimental methods have
been applied to individual subjects in experimental psychology for over two
decades, to investigate behavioral and pharmacological interventions. The method
has been called an ‘intensive research design’, a ‘single case experiment’ or (the
term we prefer) an ‘N of 1’ study (‘N’ being a standard abbreviation for sample
size).” In the behavioral sciences, although the term ‘N-of-1’ was used in early work
(e.g., Davidson and Costello 1978; Kratochwill and Brody 1978), the descriptor
‘single-case experimental design’ (SCED) is more commonly used, referring to a
family of different types of controlled research designs using a single participant.
Because the ‘N-of-1 trial’ in medicine now refers to a specific type of SCED, hence-
forth, in the present chapter we use the broader term SCED to avoid confusion. We
include the medical N-of-1 trial as a subset of SCEDs, the varieties of which are
described later in this chapter.
Several essential features define SCEDs and distinguish them from traditional
between-group research methodology. These differences are important, because
they also form the building blocks for designing and implementing scientifically
rigorous SCEDs:
First, in SCEDs the individual serves as his or her own control. Whereas a group
design compares two or more groups who receive different interventions (which
may include a non-intervention condition), comparable control is achieved in
SCEDs by having the same individual receive the intervention conditions sequen-
tially in a number of “phases” (this term being comparable to “periods” as used in
the medical N-of-1 literature). There is a minimum of two types of phases, the
baseline phase (generally designated by the letter, A) and the intervention phase
(generally designated by the letter, B). In this way, a SCED involves a controlled
experiment. There are certain design rules that govern the way in which the inter-
vention (i.e., the independent variable) is manipulated (e.g., only changing one vari-
able at a time) which are described in detail in single-case methodology texts (e.g.,
Barlow et al. 2009; Kazdin 2011).
Second, the outcome (i.e., the dependent variable) in SCEDs is an operationally
defined and precisely specified behavior or symptom that is targeted by the inter-
vention (and hence referred to as the “target behavior”). In group designs, outcomes
often reflect general constructs (e.g., social skills) and are usually measured with
standardized instruments that, ideally, have good psychometric properties. Such
instruments (at least in the behavioral sciences) often encompass multiple and even
disparate aspects of the outcome of interest (e.g., an outcome measure for “social
skills” may include a heterogeneous set of items addressing eye contact, facial
expression, initiating conversation, response acknowledgement) and the score
obtained may not capture the specificity of the particular problem behavior being
treated. By contrast, the dependent variable used in a SCED aims to provide a
specific, precisely defined, behavioral observation (e.g., frequency of initiating
conversation topics), and indeed often does not employ a standardized instrument
for measuring the outcome. As a consequence, the researcher needs to demonstrate
that the data collected on the target behavior have acceptable inter-rater reliability.
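One simple way to quantify inter-rater reliability for such observational records is point-by-point agreement; the sketch below (ours, with invented records) computes it, noting that chance-corrected indices such as kappa are often preferred in practice.

def interrater_agreement(rater1, rater2):
    # Point-by-point agreement: the proportion of observation intervals
    # on which two independent raters record the same value.
    if len(rater1) != len(rater2):
        raise ValueError("records must cover the same intervals")
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

# Invented interval records (1 = target behavior observed, 0 = not).
print(interrater_agreement([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1]))  # ~0.83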
Third, SCEDs involve frequent and repeated measurement of the target behavior
in every phase, whereas the outcome variable/s in group designs may be measured
on as little as two occasions (pre-intervention and post-intervention). The reason for
multiple measures of the target behavior is to address the variability in behavior that
occurs in a single individual. In group designs, such variability is overcome by
aggregating data across participants. Because the target behavior in SCEDs needs to
be measured repeatedly, it also needs to lend itself well to this purpose and be ame-
nable to frequently repeated administrations (which are often not feasible with
standardized instruments that may be time-consuming to administer). The exact
number of measurements taken per phase, however, will also depend on different
parameters, such as stability of the data (discussed later in this chapter).
Fourth, another difference between group designs and SCEDs pertains to exter-
nal validity, specifically generalization of the results to other individuals and other
settings. Traditionally, external validity has been regarded as a special strength of
group designs (most notably in the randomized controlled trial), and conversely a
particular weakness of SCEDs – clearly, with a sample size of n = 1, the grounds for
generalization to other individuals are very tenuous. Nonetheless, these extreme
views regarding external validity are an oversimplification. It has been observed
previously that generalization of results from a group design only applies to indi-
viduals who share similar characteristics to those participating in the study, and
more specifically, the subset of participants in a study who actually improve (gener-
ally, not all participants respond positively to the intervention). Moreover, selection
criteria for participants in clinical trials are often stringent in order to increase
homogeneity of the sample and generally people are excluded who have premorbid
and/or current comorbidities (e.g., in the neuropsychological rehabilitation field:
alcohol and substance use problems, other neurological or psychiatric conditions, as
well as severe motor-sensory and cognitive impairments). Restricted selection
criteria place severe limitations on the capacity to generalize the results to those
individuals who do not possess such characteristics.
On the other hand, the SCED has developed methods that strengthen external
validity, most commonly through replication (see Horner et al. 2005), as well as
using generalization measures as additional outcome measures (see Schlosser and
Braun 1994). Direct replication of the experimental effect within the experiment
(i.e., intra-subject replication) is a key feature of single-case methodology that
strengthens internal validity, whereas inter-subject replication and systematic
replication are methods to enhance external validity (Barlow et al. 2009; Gast 2010;
Horner et al. 2005; Sidman 1960).
The foregoing provides the parameters of single-case methodology. It is impor-
tant to distinguish this single-case methodology from anecdotal, uncontrolled case
descriptions that are also reported in the literature. Specifically, single-case method-
ology is distinguished by the following cardinal features (Allen et al. 1992; Backman
et al. 1997; Perdices and Tate 2009; Rizvi and Nock 2008):
• It consists of a number of discrete phases, generally, but not invariably, baseline
(A) and intervention (B) phases in which the individual serves as his or her own
control
• There is a clear, operational definition of the dependent variable (behavior tar-
geted for treatment)
• The dependent variable is measured repeatedly and frequently throughout all
phases using precisely defined methods
• Measurement and recording procedures are continued until requirements of the
specific design have been satisfied
• Experimental control is exercised by systematically manipulating the indepen-
dent variable (intervention), manipulating one independent variable at a time and
carefully controlling extraneous variables.
The above defining features of SCEDs have been incorporated into methodologi-
cal quality rating scales to evaluate both internal and external validity. It is well
established that even randomized controlled trials vary widely with respect to their
scientific rigor (in the neurorehabilitation literature see Moseley et al. 2000; Perdices
et al. 2006), and the single-case literature also exhibits variability with respect to
scientific calibre (Barker et al. 2013; Maggin et al. 2011; Shadish and Sullivan
2011; Smith 2012; Tate et al. 2014). Over the past 10 years concerted efforts have been made in the behavioral sciences to improve the conduct and reporting of SCEDs.
Table 3.1 Item content of the Risk of Bias in N-of-1 Trials (RoBiNT) Scale

Internal validity subscale:
1. Design: Does the design of the study meet requirements to demonstrate experimental control?
2. Randomization: Was the phase sequence and/or phase commencement randomized?
3. Sampling: Were there a sufficient number of data points (as defined) in each of baseline and intervention phases?
4. Blind participants/therapists: Were the participants and therapists blinded to treatment condition (phase of study)?
5. Blind assessors: Were assessors blinded to treatment condition (phase of study)?
6. Inter-rater reliability (IRR): Was IRR adequately conducted for the required proportion of data, and did it reach a sufficiently high level (as defined)?
7. Treatment adherence: Was the intervention delivered in the way it was planned?

External validity and interpretation subscale:
8. Baseline characteristics: Were the participant's relevant demographic and clinical characteristics, as well as characteristics maintaining the condition, adequately described?
9. Therapeutic setting: Were both the specific environment and general location of the investigation adequately described?
10. Dependent variable (target behavior): Was the target behavior defined, operationalized, and the method of its measurement adequately described?
11. Independent variable (intervention): Was the intervention described in sufficient detail, including the number, duration and periodicity of sessions?
12. Raw data record: Were the data from the target behavior provided for each session?
13. Data analysis: Was a method of data analysis applied and rationale provided for its use?
14. Replication: Was systematic and/or inter-subject replication incorporated into the design?
15. Generalization: Were generalization measures taken prior to, during, and at the conclusion of treatment?

Note: the RoBiNT manual, available from the corresponding author, provides detailed description and operational definitions of the items
¹ PsycBITE is a multi-disciplinary database that archives all of the published empirical articles on nonpharmacological treatments for the psychological consequences of acquired brain impairment and contains more than 4,500 records. Controlled studies (both group and single-case) are rated for scientific rigor and ranked on the database in terms of their scientific quality.
The specific requirements of our research team were to develop a practical, feasible,
reliable and sensitive scale that evaluated risk of bias in single-case research designs.
The RoBiNT Scale is designed to be a generic scale, applicable to any type of
SCED (including the medical N-of-1 trial). It demonstrates very good psychometric
properties, with excellent inter-rater reliability (all intra-class correlation coeffi-
cients (ICC) for each of the total score and the two subscales for pairs of both novice
and trained raters are in excess of ICC = 0.86), as well as discriminative validity
(Tate et al. 2013).
An added feature of the RoBiNT Scale is that the items can be used as a checklist
to plan the conduct and reporting of single-case experiments and N-of-1 trials. That
said, recent endeavors have focused specifically on developing reporting guidelines
in the CONSORT tradition for SCEDs in the behavioral sciences literature (Tate
et al. 2012). The guideline, entitled the SCRIBE (Single-Case Reporting guideline
In BEhavioral interventions) is to be published soon (Tate et al. accepted). The
SCRIBE owes its origins to the complementary CONSORT Extension for N-of-1
Trials (CENT) Statement developed for the medical community (Shamseer et al.
2015; Vohra et al. 2015; see Chap. 14 in this volume). The reason that two sets of
guidelines are being developed is to cater to the different readership (medical vs
behavior sciences). In addition, the CENT relates exclusively to the medical N-of-1
trial, whereas the SCRIBE addresses the broader family of SCEDs.
[Fig. 3.1 here: a taxonomy of common designs using a single participant. Single-case experimental designs (SCEDs) comprise withdrawal/reversal designs (e.g., A-B-A, A-B-A-B, A-B-A-C-A-D), which include the N-of-1 randomised trial, together with multiple-baseline designs, alternating-treatments designs, and changing-criterion designs; case descriptions sit outside the experimental designs.]
Fig. 3.1 Common types of designs using a single participant (Reproduced from: unpublished manual for critical appraisal of single-case reports, University of Sydney. Adapted version published in Tate et al. (2013))
Case Descriptions
Case descriptions are “a description of clinical practice that does not involve
research methodology” (Backman and Harris 1999, p. 171). The quintessential
example of the case description noted by Rizvi and Nock (2008) is the classic work of Freud and Breuer (1895), the most famous case being that of Anna O.

B-Phase Training Study

In B-phase training studies, the target behavior is measured only during the treat-
ment (B) phase. Because these designs lack a baseline (A) phase, there is no systematic manipulation of the independent variable. The experimental effect therefore cannot be demonstrated, precluding any experimental control. It is
therefore not possible to reliably determine the reasons for any change observed in
the target behavior – it may be the result of treatment or it may not. One might rea-
sonably ask the question as to why such an uncontrolled study that cannot demon-
strate treatment effect is conducted in the first place. B-phase training studies are
often implemented in situations of clinical emergency where the target behavior is
at risk of causing harm to the patient or other people (Kazdin 2011). In this
situation, there may not be sufficient time to conduct an initial baseline (A) phase
and there may also be ethical objections to withdrawing the intervention (to revert
to baseline to demonstrate treatment effect) if there has been amelioration of the
target behavior.
Biphasic A-B Design

The biphasic A-B design can be considered the basic, entry-level design for single-
case methodology. It is characterized by two phases: in the baseline (A) phase, the
target behavior is measured repeatedly and frequently, and treatment does not com-
mence until the B phase, in which the target behavior also continues to be measured
repeatedly and frequently. Yet, because there is no second A phase (i.e., A-B-A),
there is only one demonstration of the experimental effect and thus little, if any,
control of extraneous variables. Without a control condition (e.g., a withdrawal or
reversal phase, additional patients or behaviors that are concurrently examined) it is
not possible to reliably establish the reason for change in the target behavior.
Consequently, A-B designs are “more accurately considered a pre-experimental
design” (Byiers et al. 2012, p. 401). Such designs cannot provide firm evidence to
establish a causal (functional) relationship between the dependent and independent
variables, even though it may be possible to verify statistically a change in level,
trend or variability of the target behavior between the A and B phases.
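As a rough sketch of what such a statistical check can look like (ours, with invented data; randomization tests proper are discussed later in the chapter), the following Python fragment compares A-phase and B-phase levels by permutation:

import random

def randomization_test(a_phase, b_phase, n_perm=5000, seed=0):
    # Permutation test for a change in level between A and B phases.
    # Caution: naive shuffling ignores autocorrelation and the phase
    # structure; formal SCED randomization tests permute only over the
    # randomization scheme actually used (e.g., eligible start points).
    rng = random.Random(seed)
    observed = sum(b_phase) / len(b_phase) - sum(a_phase) / len(a_phase)
    pooled = list(a_phase) + list(b_phase)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a_phase)], pooled[len(a_phase):]
        if abs(sum(pb) / len(pb) - sum(pa) / len(pa)) >= abs(observed):
            extreme += 1
    return extreme / n_perm  # two-sided p-value

# Invented daily counts of a target behavior: A phase, then B phase.
print(randomization_test([9, 8, 10, 9, 11], [5, 4, 6, 5, 4]))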
Recent surveys have documented a rich diversity of designs reported in the current
psychological and educational literature (Shadish and Sullivan 2011; Smith 2012;
Barker et al. 2013). Barlow et al. (2009) describe 19 distinct SCEDs that allow
investigations to be tailored to the diverse and often complex nature of behavioral
interventions, and meet specific challenges in scientifically evaluating treatment
effects. In this section, we review the four major types of SCEDs used in rehabilita-
tion, behavioral and education research. The 19 design varieties described by
Barlow et al. fit within the four major design types.
Withdrawal/Reversal Designs
Withdrawal/reversal designs are generally unsuitable for evaluating interventions that have potentially irreversible effects on the target behavior, such as training the
participant to use a self-monitoring strategy, which cannot be readily ‘unlearned’
when the treatment phase ends. These designs may also be unsuitable in situations
when withdrawal of treatment might not be clinically appropriate or ethical, particu-
larly if treatment appears to be effective or dangerous behaviors are involved.
Moreover, if the investigation includes several interventions (e.g., A-B-A-C-A-
D-A) the effect of intervention order, or interaction between different interventions
may be difficult to interpret.
In general, N-of-1 trials in the medical literature are essentially multiple-phase,
withdrawal/reversal designs, described by Guyatt and colleagues (1986, 1988) as
the double-blind, randomized, multiple cross-over A-B trial in an individual patient.
They may include washout periods following an intervention phase in order to
reduce carryover effects of treatment. In addition, duration of treatment periods
may be determined a priori in order to allow the expected treatment effects to occur
(Duan et al. 2013).
Experimental effect in withdrawal/reversal designs is demonstrated if the level of
the dependent variable (target behavior) varies in the predicted manner when the
independent variable (intervention) is systematically introduced and withdrawn
(i.e., at phase change from A to B, from B to A, etc.). Adequate experimental
control is achieved when the study design provides at least three opportunities for
demonstrating the experimental effect (Horner et al. 2005; Kratochwill et al. 2013).
The A-B and A-B-A designs fail to meet this criterion. The A-B-A-B design (see
Fig. 3.2) is now regarded as the simplest withdrawal/reversal design offering accept-
able experimental control.
Travis and Sturmey (2010) used an A-B-A-B design to decrease the production
of delusional utterances in a 26-year old man with “mild intellectual disabilities,
frontal lobe syndrome, traumatic brain injury, mood disorder, and mania” (p. 745).
Several years after sustaining his brain injury, this man began to utter delusional
statements which negatively impacted the relationships he had with his peers in
the inpatient facility where he lived. During baseline phases, the therapist pro-
vided 10 s of disapproving comment, immediately following the patient uttering
a delusional statement. The intervention was based on differential reinforcement
of alternative behavior and extinction. This consisted of withholding attention for
10 s when the patient uttered a delusional statement and providing 10 s of positive
verbal reinforcement following contextually appropriate, non-delusional utter-
ances. The behavioral experiment was conducted over 17 sessions with four to
five sessions per phase, during which data were collected on the target behavior
each session, and were presented graphically in the report. Compared with base-
line performance, the intervention markedly decreased the rate (per minute) of
delusional utterances, and increased the rate of non-delusional, contextually
appropriate statements. Treatment effect was still evident at 6-month, 1-, 2- and
4-year follow-ups.
Multiple-Baseline Designs
Many interventions teach new skills (e.g., communication strategies, gait retraining,
social skills, behavior monitoring), which cannot be readily ‘unlearned’. In this
scenario, as well as the situation where it is considered unethical to withdraw a suc-
cessful intervention, the multiple-baseline design (MBD) provides an effective and
ready way by which to test the efficacy of an intervention. The MBD can also be
used for interventions that can be meaningfully withdrawn.
In MBDs several baselines (legs or tiers) of the dependent variable are measured
simultaneously. The intervention, however, is introduced across the various tiers in
a staggered sequence. Thus, at different stages of the experiment some tiers will be
in the baseline (A) phase and others will be in the intervention (B). There are three
basic types of MBDs: across behaviors, participants or settings (see Fig. 3.3). Onset
of the initial baseline phase occurs concurrently across all tiers, and onset of the first
intervention phase is then sequentially staggered over time across the different tiers.
Each tier generally consists of a simple A-B sequence, but more complex designs
(e.g., alternating-treatment or multiphasic withdrawal/reversal designs) can also be embedded into each tier (Shadish and Sullivan 2011; Smith 2012). In the latter instance, onset of additional phases is also generally staggered across the tiers.

Fig. 3.3 Simulated data demonstrating a multiple-baseline design across three different settings
MBDs are commonly reported in the psychology and education litera-
ture, accounting for more than 50 % of SCEDs (Shadish and Sullivan 2011).
Experimental effect in MBDs is demonstrated when “change occurs when, and only
when, the intervention is directed at the behavior, setting or participant in question”
(Barlow et al. 2009, p. 202). Adequate experimental control is achieved when the
design permits the experimental effect to be demonstrated on at least three occa-
sions (Horner et al. 2005; Kratochwill et al. 2013). This means that in a MBD with
three tiers, each tier must, at minimum, incorporate an A-B phase sequence.
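The staggered layout is easy to picture in a few lines of Python (a sketch of ours; the tier count, session count and lag are arbitrary):

def mbd_schedule(n_tiers, n_sessions, first_start, stagger):
    # Every tier begins in baseline ('A') concurrently; intervention
    # ('B') onset is staggered across tiers by a fixed lag.
    schedule = []
    for tier in range(n_tiers):
        start = first_start + tier * stagger  # session where B begins
        schedule.append("A" * start + "B" * (n_sessions - start))
    return schedule

for tier, phases in enumerate(mbd_schedule(3, 12, 3, 3), start=1):
    print(f"tier {tier}: {phases}")
# tier 1: AAABBBBBBBBB
# tier 2: AAAAAABBBBBB
# tier 3: AAAAAAAAABBB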
The MBD has many strengths, offering efficiency and flexibility in meeting specific contingencies that may cause problems for the classic withdrawal/reversal (A-B-A-B) design. MBDs are most elegant in addressing replication (MBD
across participants) and testing for generalization (MBD across settings and behav-
iors). Nonetheless, Kazdin (2011) has noted some potential difficulties with the
MBD. There may be interdependence, particularly in MBD across behaviors, which
means that when the intervention is introduced at the first tier, carry-over effects
occur in the behaviors of the second and subsequent tiers which are still in the base-
line phase. There may also be inconsistencies in response across tiers, which intro-
duce ambiguity in interpreting results. In addition, prolonged baselines in the second
and subsequent tiers may mean that in MBDs across behaviors or participants there
is a lengthy period before intervention can commence.
Feeney (2010) used an A-B MBD across settings to investigate the effects of a
multi-component intervention (including addressing environmental context, behavior
supports and cognitive strategies) for reducing challenging behaviors in two children
with traumatic brain injury. The intervention was delivered in three classroom
settings: English Language Arts class, Mathematics class and Science class. In each
setting, Feeney measured the treatment effect on three behavioral measures:
(1) frequency of challenging behaviors (defined as attempted or completed acts of
physical aggression, such as hitting or pushing, or verbal aggression, such as threats);
(2) intensity of aggression measured on a 5-point scale (0 = no problems, to 4 = severe
problems) using the sections relevant to aggression of the Aberrant Behavior
Checklist; (3) percentage of work completed in the classroom. The experiment in
both children lasted 30 days (the baseline in the first tier being 5 days), and data on
the target behavior were collected for each of the 30 days and presented graphically.
For both children, the treatment diminished both frequency and intensity of challeng-
ing behavior and also increased the quantity of work completed.
Alternating-Treatment Designs

In alternating-treatment designs (ATDs), two or more interventions are alternated
rapidly, with the different conditions delivered in separate sessions that may occur
in close succession (even on the same day) (Barlow et al. 2009). ATDs are particularly
appropriate and useful when there is a need to identify an effective intervention as
quickly as possible (see Fig. 3.4).

Fig. 3.4 Simulated data demonstrating an alternating treatment design, comparing two interventions. The treatment phase may be preceded by a baseline phase (e.g., Morrow and Fridriksson 2006), and/or a best intervention phase may follow the treatment phase
An initial baseline phase may precede the treatment phase (Morrow and
Fridriksson 2006) and a final “best treatment” phase may also be included in the
design. The “best treatment” phase refers to the final phase of the experiment where
the intervention that has been shown to be superior is administered by itself. This
phase is incorporated into the design in order to evaluate threats to internal validity
posed by potential interference associated with multiple interventions. Generally,
factors such as treatment sequence, setting or therapists are also counterbal-
anced in order to reduce threats to internal validity. Experimental effect is demon-
strated by “iterative manipulation of the independent variable (or levels of the
independent variable) across observation periods” (Horner et al. 2005, p. 168).
Experimental control is demonstrated when measures (levels) of the dependent vari-
able for each intervention do not overlap.
ATDs are not suitable for investigating irreversible effects, and there is a risk that
response generalization from one treatment to another may also occur. Interpretation
of results may also be problematic due to carry-over effects or interaction between
treatments. Intervention order may also influence outcome and make it difficult to
attribute change in the dependent variable to a particular intervention.
Mechling (2006) used an alternating treatment design to compare the relative
effectiveness of three interventions to teach three students (aged 5, 6 and 18 years)
with profound physical and intellectual disabilities (assessed as functioning at lev-
els between 6 and 13 months of age) to use switches to operate equipment. The
interventions used different types of reinforcers (intervention A = adapted toys and
devices; intervention B = cause-and-effect commercial software; and intervention
C = instructor-created video programs). The intervention was delivered over nine
sessions, with sessions occurring 2–3 days per week. Each session lasted 9 min,
within which the three interventions were administered, in block rotation order, for
3 min each. The dependent variable was the number of times the switches were
activated. As is common with ATDs, there was no initial baseline phase in the study.
Intervention C (instructor created video programs) was found to be the most effec-
tive intervention, and the study concluded with a “best treatment” phase, namely
intervention C for three sessions.
Changing-Criterion Designs
In a changing-criterion design, an initial baseline (A) phase is followed by a series
of treatment phases in each of which a stepwise performance criterion is set; the
criterion is changed once performance meets it, and experimental control is
demonstrated when the target behavior tracks each successive criterion (Hartmann
and Hall 1976).

Fig. 3.5 Simulated data demonstrating a changing criterion design. Vertical solid lines indicate phase changes. Dashed horizontal lines indicate performance criterion level for a given treatment phase. Phase change occurs when the criterion has been met, in this instance on three consecutive occasions
Randomization Tests
Table 3.2 Standards of evidence for single-case experimental designs (After Kratochwill et al. 2010, 2013)

Features of graphed data to be examined within- and between-phases, evaluating the following:
1. Level
2. Trend
3. Variability
4. Immediacy of effect
5. Data overlap
6. Consistency of data patterns across similar phases

Procedure for visual analysis:
Step 1: Scrutinize the first A (baseline) phase to establish that the target behavior has been demonstrated to occur and that there is an acceptable level of stability
Step 2: Compare data in adjacent phases for level (phase mean), trend (slope of fitted line) and variability
Step 3: Determine immediacy of intervention effect by examining changes in level, trend, variability, and degree of overlap between the last three data points in one phase and the first three data points in the next phase. Examine patterns of level, trend and variability for consistency across similar phases
Step 4: Integrate the information yielded in the preceding steps to determine if there is adequate experimental control. If so, the intervention is deemed to work if data in either a treatment or baseline phase do not overlap the actual or extrapolated data pattern of the preceding baseline or treatment phase, respectively
Randomization tests require that the sequence of conditions (or the onset of phases)
be determined at random; the observed effect is then evaluated against the distribution
of effects obtained under all possible random assignments (Edgington 1980). Sufficient
numbers of observations can be readily obtained in medical N-of-1 trials and random
allocation is generally applied in this setting (Duan et al. 2013; Kravitz et al. 2014).
By contrast, randomization is, regrettably, not commonly featured in behavioral SCEDs.
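To make the logic of a randomization test concrete, the following minimal sketch (illustrative data only) compares two conditions that were randomly assigned to eight sessions, as in an alternating-treatment design, and computes an exact p-value over all possible assignments:

# Minimal sketch of a randomization (permutation) test for a design in
# which two conditions were randomly assigned to sessions. Illustrative data.
import itertools
import numpy as np

scores = np.array([7., 3., 8., 2., 6., 4., 7., 3.])   # one score per session
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])           # 0 = condition A, 1 = B

observed = scores[labels == 0].mean() - scores[labels == 1].mean()

# Reference distribution: every way of assigning four sessions to condition A
assignments = list(itertools.combinations(range(len(scores)), 4))
extreme = 0
for idx_a in assignments:
    mask = np.zeros(len(scores), dtype=bool)
    mask[list(idx_a)] = True
    diff = scores[mask].mean() - scores[~mask].mean()
    if abs(diff) >= abs(observed):
        extreme += 1

print(f"observed difference = {observed:.2f}, p = {extreme / len(assignments):.3f}")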
Visual Analysis

Visual analysis is principally concerned with whether or not an intervention has had
a tangible (and supposedly beneficial) impact on the recipient and the way he/she
functions and interacts in quotidian life
(Kazdin 2001). An intervention effect might well be statistically significant, but so
small that it has little or no impact on functional performance and be of little value
to the recipient of the intervention.
As Kazdin (1978) points out, however, visual analysis has significant shortcom-
ings: “The problem with visual inspection is that those individuals who peruse that
data may not see eye to eye” (p. 638). First, there are no agreed upon formal criteria
or decision rules. Inter-rater agreement, even among experienced judges, varies
from not much above chance (De Prospero and Cohen 1978) to very high (e.g.,
Kahng et al. 2010), and is poorer for some features of the data, such as change in
variability or slope, than others, such as change in level or mean (Gibson and
Ottenbacher 1988). Using visual aids (e.g., celeration and trend lines) can increase
the accuracy of visual judgments (e.g., Stocks and Williams 1995), but can also lead
to misinterpretation by emphasizing the trend at the expense of other important
features of the data (e.g., level; Brossart et al. 2006).
Visual analysis can also yield high rates of Type I error (up to 84 %), and the
reliability of judgments can be confounded by variability in the data, autocorrelation2,
pre-existing linear trends, data cyclicity and effect size (Brossart et al. 2006;
Jones et al. 1978). Although at least one study has reported high agreement (86 %)
between statistical and visual analysis (Bobrovitz and Ottenbacher 1998), the two
approaches are generally discordant, particularly when autocorrelation is high or
when statistical analyses yield a significant result rather than a non-significant result
(Jones et al. 1978).
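Because such disagreements turn on features that are simple to quantify, the features listed in Table 3.2 can also be computed directly rather than judged by eye. A minimal sketch (illustrative data and function names, not a standardized implementation):

# Minimal sketch of the phase features listed in Table 3.2: level, trend,
# variability, and overlap between phases. Data and names are illustrative.
import numpy as np

def phase_features(y):
    # level (mean), trend (slope of a fitted line) and variability (SD)
    x = np.arange(len(y))
    slope = np.polyfit(x, y, 1)[0]
    return {"level": y.mean(), "trend": slope, "variability": y.std(ddof=1)}

def overlap(phase_a, phase_b):
    # proportion of B-phase points that fall within the range of the A phase
    return np.mean((phase_b >= phase_a.min()) & (phase_b <= phase_a.max()))

a = np.array([11., 12., 10., 13., 12., 11.])   # baseline (A) phase
b = np.array([7., 6., 5., 6., 4., 5.])         # treatment (B) phase

print(phase_features(a), phase_features(b))
print("overlap:", overlap(a, b))   # 0.0 here: no B point lies within A's range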
Statistical Techniques
A variety of statistical techniques and approaches can be used to analyze SCED data,
and these have been reviewed elsewhere (Brossart et al. 2006; Perdices and Tate
2009; Smith 2012). Techniques include the following: quasi-statistical techniques
applied to graphed data (e.g., split-middle trend line; standard deviation band),
randomization tests (described above), time-series analysis (e.g., C-statistic; auto-
regressive integrated moving average, ARIMA), traditional inferential statistics
(e.g., parametric t-test; nonparametric Wilcoxon matched-pairs signed-ranks test;
Friedman two-way analysis of variance), and effect sizes (using nonparametric
methods, such as percentage of non-overlapping data; standardized mean differences;
regression models; hybrid nonparametric/regression models, such as Tau – U; and
multilevel modelling).
2 Autocorrelation in a series of observations refers to the degree of predictability, or lack of independence, between one observation and the next. It is usually expressed as a Pearson product-moment correlation coefficient between all pairs of consecutive observations.
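Two of the quantities mentioned above are easy to compute directly: the percentage of non-overlapping data (PND) and the lag-1 autocorrelation defined in footnote 2. A minimal sketch with illustrative data:

# Minimal sketch: percentage of non-overlapping data (PND) and lag-1
# autocorrelation. Data values are illustrative.
import numpy as np

def pnd(baseline, treatment, improvement="decrease"):
    # percentage of treatment points beyond the most extreme baseline point
    if improvement == "decrease":
        return 100.0 * np.mean(treatment < baseline.min())
    return 100.0 * np.mean(treatment > baseline.max())

def lag1_autocorrelation(y):
    # Pearson correlation between consecutive observations y[t] and y[t+1]
    return np.corrcoef(y[:-1], y[1:])[0, 1]

baseline = np.array([11., 12., 10., 13., 12.])
treatment = np.array([8., 9., 7., 6., 11.])

print(f"PND = {pnd(baseline, treatment):.0f} %")   # 80 %: 4 of 5 points below 10
print(f"lag-1 r = {lag1_autocorrelation(np.concatenate([baseline, treatment])):.2f}")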
Conclusion
This chapter has described N-of-1 methods from the perspective of the behavioral
sciences. In spite of their established history, however, the family of SCEDs has had
a chequered course in both the medical and behavioral sciences. We believe that this
References
Allen KD, Friman PC, Sanger WG (1992) Small n research designs in reproductive toxicology.
Reprod Toxicol 6:115–121
Arnau J, Bono R (1998) Short-time series analysis: C statistic vs Edgington model. Qual Quant
32:63–75
Backman CL, Harris SR (1999) Case studies, single-subject research, and n of 1 randomized trials:
comparisons and contrasts. Am J Phys Med Rehabil 78:170–176
Backman CL, Harris SR, Chisholm J-AM, Monette AD (1997) Single-subject research in rehabili-
tation: a review of studies using AB, withdrawal, multiple baseline, and alternating treatments
designs. Arch Phys Med Rehabil 78:1145–1153
Barker JB, Mellalieu SD, McCarthy PJ, Jones MV, Moran A (2013) A review of single-case
research in sport psychology 1997–2012: research trends and future directions. J Appl Sport
Psychol 25:4–32
Barlow DH, Hersen M (1984) Single case experimental designs. Strategies for studying behaviour
change, 2nd edn. Allyn and Bacon, Boston
Barlow DH, Nock MK, Hersen M (2009) Single case experimental designs. Strategies for studying
behaviour change, 3rd edn. Pearson, Boston
Bobrovitz CD, Ottenbacher KJ (1998) Comparison of visual inspection and statistical analysis of
single-subject data in rehabilitation research. Am J Phys Med Rehabil 77(2):90–102
Box GEP, Jenkins GM (1970) Time-series analysis: forecasting and control. Cambridge University
Press, New York
Brossart DF, Parker RI, Olson EA, Mahadevan L (2006) The relationship between visual analysis
and five statistical analyses in a simple AB single-case research design. Behav Modif
30(5):531–563
Byiers BJ, Reichle J, Symons FJ (2012) Single-subject experimental design for evidence-based
practice. Am J Speech Lang Pathol 21:397–414
Crosbie J (1993) Interrupted time-series analysis with brief single-subject data. J Consult Clin
Psychol 61:966–974
Davidson PO, Costello CG (eds) (1978) N = 1: experimental studies of single cases. Van Nostrand
Reinhold Company, New York
Davis DH, Gagné P, Fredrick LD, Alberto PA, Waugh RE, Haardörfer R (2013) Augmenting visual
analysis in single-case research with hierarchical linear modeling. Behav Modif 37:62–89
De Prospero A, Cohen S (1978) Inconsistent visual analyses of intrasubject data. J Appl Behav
Anal 12(4):573–579
Duan N, Kravitz RL, Schmid CH (2013) Single-patient (N-of-1) trials: a pragmatic clinical deci-
sion methodology for patient-centered comparative effectiveness research. J Clin Epidemiol
66:S21–S28
Edgington ES (1980) Random assignments and statistical tests for one-subject experiments. Behav
Assess 2:19–28
Feeney TJ (2010) Structured flexibility: the use of context-sensitive self-regulatory scripts to sup-
port young persons with acquired brain injury and behavioral difficulties. J Head Trauma
Rehabil 25(6):416–425
Freud S, Breuer J (1895) Studies in hysteria (trans: Luckhurst N, Bowlby R). Penguin Books,
London
Gabler NB, Duan N, Vohra S, Kravitz RL (2011) N-of-1 trials in the medical literature. A system-
atic review. Med Care 49(8):761–768
Gast DL (2010) Single subject research methodology in behavioural sciences. Routledge,
New York
Gibson G, Ottenbacher K (1988) Characteristics influencing the visual analysis of single-subject
data: an empirical analysis. J Appl Behav Sci 24:298–314. doi:10.1177/0021886388243007
Guyatt G, Sackett D, Taylor DW, Chong J, Roberts R, Pugsley S (1986) Determining optimal
therapy—randomized trials in individual patients. N Engl J Med 314(14):889–892
Guyatt G, Sackett D, Adachi J, Roberts R, Chong J, Rosenbloom D, Keller J (1988) A clinician’s
guide for conducting randomized trials in individual patients. CMAJ 139:497–503
Guyatt G, Jaeschke R, McGinn T (2002) N-of-1 randomized controlled trials. In: Guyatt G, Rennie
D, Meade MO, Cook DJ (eds) User’s guide to the medical literature: a manual for evidence-
based clinical practice, 2nd edn. McGraw Hill/AMA, New York/Chicago, pp 179–192
Hammond D, Gast DL (2010) Descriptive analysis of single subject research designs: 1983–2007.
Educ Train Autism Dev Disabil 45(2):187–202
Hartmann DP, Hall RV (1976) The changing criterion design. J Appl Behav Anal 4:527–532
Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M (2005) The use of single-subject
research to identify evidence-based practice in special education. Except Child
71(2):165–179
Howick J, Chalmers I, Glasziou P, Greenhalgh T, Heneghan C, Liberati A, Moschetti I, Phillips B,
Thornton H (2011) The 2011 Oxford CEBM evidence table (Introductory document). Oxford
Centre for Evidence-Based Medicine. http://www.cebm.net/index.aspx?o=5653
Jones RR, Weinrott MR, Vaught RS (1978) Effects of serial dependency on the agreement between
visual and statistical inference. J Appl Behav Anal 11(2):277–283
Kahng SW, Chung KM, Gutshall K, Pitts SC, Kao J, Girolami K (2010) Consistent visual analyses
of intrasubject data. J Appl Behav Anal 43(1):35–45
Kazdin AE (1978) Methodological and interpretative problems of single-case experimental
designs. J Consult Clin Psychol 46(4):629–642
Kazdin AE (1982) Single case research designs: methods for clinical and applied settings. Oxford,
New York
Kazdin AE (2001) Almost clinically significant (p < .10): current measures may only approach
clinical significance. Clin Psychol Sci Pract 8:455–462
Kazdin AE (2011) Single-case research designs: methods for clinical and applied settings, 2nd edn.
Oxford University Press, New York
Kratochwill TR, Brody GH (1978) Single subject designs: a perspective on the controversy over
employing statistical inference and implications for research and training in behavior modifica-
tion. Behav Modif 2:291–307
Kratochwill TR, Levin JR (2010) Enhancing the scientific credibility of single-case intervention
research: randomization to the rescue. Psychol Methods 15(2):124–144
Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR
(2010) Single-case designs technical documentation. Retrieved from What Works Clearinghouse
website. http://ies.ed.gov/ncee/wwc/pdf/wwc_scd.pdf
Kratochwill TR, Hitchcock J, Horner RH, Levin JR, Odom SL, Rindskopf DM, Shadish WR
(2013) Single-case intervention research design standards. Remedial Spec Educ
34(1):26–38
Kravitz RL, Duan N, Niedzinski EJ, Hay MC, Subramanian SK, Weisner TS (2008) Whatever
happened to N-of-1 trials? Insiders’ perspectives and a look to the future. Milbank Q
86(4):533–555
Kravitz RL, Duan N, Vohra S, Li J (2014) Introduction to N-of-1 trials: indications and barriers.
In: Kravitz RL, Duan N (eds) Design and implementation of N-of-1 trials: a user's guide.
AHRQ Publication No. 13(14)-EHC122-EF. Agency for Healthcare Research and Quality, Rockville
Lane JD, Gast DL (2013) Visual analysis in single case experimental design studies: brief review
and guidelines. Neuropsychol Rehabil. doi:10.1080/09602011.2013.815636
Maggin DM, Chafouleas SM, Goddard KM, Johnson AH (2011) A systematic evaluation of token
economies as a classroom management tool for students with challenging behaviour. J Sch
Psychol 49:529–554
Mechling LC (2006) Comparison of the effects of three approaches on the frequency of stimulus
activations, via a single switch, by students with profound intellectual disabilities. J Spec Educ
40:94–102
Morrow KL, Fridriksson J (2006) Comparing fixed- and randomized-interval spaced retrieval in
anomia treatment. J Commun Disord 39:2–11
Moseley A, Sherrington C, Herbert R, Maher C (2000) The extent and quality of evidence in neu-
rological physiotherapy: an analysis of the Physiotherapy Evidence Database (PEDro). Brain
Impair 1(2):130–140
Parker RI, Brossart DE (2003) Evaluating single-case research data: a comparison of seven statisti-
cal methods. Behav Ther 34:189–211
Perdices M, Tate RL (2009) Single-subject designs as a tool for evidence-based clinical practice:
are they unrecognised and undervalued? Neuropsychol Rehabil 19:904–927
Perdices M, Schultz R, Tate RL, McDonald S, Togher L, Savage S, Winders K (2006) The evi-
dence base of neuropsychological rehabilitation in acquired brain impairment (ABI): how good
is the research? Brain Impair 7(2):119–132
Rizvi SL, Nock MK (2008) Single-case experimental designs for the evaluation of treatments for
self-injurious and suicidal behaviors. Suicide Life Threat Behav 38(5):498–510
Schlosser RW, Braun U (1994) Efficacy of AAC interventions: methodologic issues in evaluating
behaviour change, generalization, and effects. Augment Altern Commun 10:207–223
Shadish WR, Sullivan KJ (2011) Characteristics of single-case designs used to assess intervention
effects in 2008. Behav Res 43:971–980
Shadish WR, Rindskopf DM, Hedges LV (2008) The state of the science in the meta-analysis of
single-case experimental designs. Evid Based Commun Assess Interv 3:188–196
Shamseer L, Sampson M, Bukutu C, Schmid CH, Nikles J, Tate R, Johnston BC, Zucker D,
Shadish WR, Kravitz R, Guyatt G, Altman DG, Moher D, Vohra S, The CENT Group (2015)
CONSORT extension for reporting N-of-1 Trials (CENT) 2015: explanation and elaboration.
BMJ 350:h1793. doi:10.1136/bmj.h1793
Sidman M (1960) Tactics of scientific research. Evaluating experimental data in psychology. Basic
Books, New York
Skinner CH, Skinner AI, Armstrong KJ (2000) Analysis of a client-staff-developed shaping pro-
gram designed to enhance reading persistence in an adult diagnosed with schizophrenia.
Psychiatr Rehabil J 24(1):52–57
Smith JD (2012) Single-case experimental designs: a systematic review of published research and
current standards. Psychol Methods 17(4):510–550
Stocks JT, Williams M (1995) Evaluation of single subject data using statistical hypothesis tests
versus visual inspection of charts with and without celeration lines. J Soc Serv Res
20(3–4):105–126
Tate RL, McDonald S, Perdices M, Togher L, Schultz R, Savage S (2008) Rating the methodologi-
cal quality of single-subject designs and N-of-1 trials: introducing the Single-Case Experimental
Design (SCED) Scale. Neuropsychol Rehabil 18(4):385–401
Tate R, Togher L, Perdices M, McDonald S, Rosenkoetter U on behalf of the SCRIBE Steering
Committee (2012) Developing reporting guidelines for single-case experimental designs: the
SCRIBE project. Paper presented at the 8th annual conference of the Special Interest Group in
Neuropsychological rehabilitation of the World Federation of NeuroRehabilitation, Maastricht,
July 2012. Abstract in Brain Impair 13(1):135
Tate RL, Perdices M, Rosenkoetter U, Wakim D, Godbee K, Togher L, McDonald S (2013)
Revision of a method quality rating scale for single-case experimental designs and N-of-1 tri-
als: the 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychol Rehabil
23(5):619–638
Tate RL, Perdices M, McDonald S, Togher L, Rosenkoetter U (2014) The design, conduct and
report of single-case research: resources to improve the quality of the neurorehabilitation litera-
ture. Neuropsychol Rehabil 24(3–4):315–331
Travis R, Sturmey P (2010) Functional analysis and treatment of the delusional statements of a
man with multiple disabilities: a four-year follow-up. J Appl Behav Anal 43(4):745–749
Vohra S, Shamseer L, Sampson M, Bukutu C, Schmid CH, Tate R, Nikles J, Zucker D, Kravitz R,
Guyatt G, Altman DG, Moher D, The CENT Group (2015) CONSORT extension for reporting
N-of-1 trials (CENT) 2015 statement. BMJ 350:h1738. doi:10.1136/bmj.h1738
Wampold BE, Furlong MJ (1981) Randomization tests in single-subject designs: illustrative exam-
ples. J Behav Assess 3(4):329–341
Chapter 4
N-of-1 Trials in Medical Contexts
Geoffrey Mitchell
Introduction
After a doctor has taken a history, examined the patient and possibly ordered and
reviewed pathology tests and radiological examinations, a decision is made about
the nature of the presenting problem. From this arises one of the most critical deci-
sions to be made: how to manage the problem. This will often result in a medicine
being prescribed. But, how does the doctor decide what treatment is best for that
condition?
The following discussion relates to the likelihood of a treatment improving a
presenting symptom, like pain or nausea. It does not relate to long-term treatments
aimed at preventing a consequence like a stroke or heart attack. Here randomized
controlled trials (RCTs) conducted on large populations and with long follow-up
times are the only way of estimating benefit.
Therapeutic decision-making is not easy. Ideally the clinician will utilize published
evidence for treatment efficacy, by either knowing the evidence for treatments, or
searching for it amongst the vast academic literature at his or her disposal. Often
there are clinical guidelines, which have been developed by expert reference groups
who have identified and evaluated the literature and made considered recommenda-
tions. However, it is common that there is no credible evidence to guide a specific
situation, and the clinician has to decide on relatively flimsy grounds. Sometimes
clinicians choose a treatment on the basis of probable physiological or biochemical
effect. While appearing logical, the reality may not match the theoretical effect.
They may also take a “try it and see” approach, either on the basis of published tri-
als, less robust evidence, or intuition.
Gold standard trial evidence comes from randomized controlled trials (RCTs).
These are trials where the subjects are allocated a treatment purely by chance. The
treatments in the trial are the test treatment, and either a comparator treatment in
common use or a placebo, or dummy medicine. Sometimes the trial involves both
groups being given the best available treatment, plus either the test treatment or a
placebo.
In RCTs, subjects have an equal chance of being randomly assigned to the test
treatment or the comparator. There are important reasons for testing a treatment in
this way. Firstly, if a person is offered a treatment for a problem, they expect to
observe an effect, whether or not an effect is actually there. This is called the pla-
cebo effect. In clinical practice, if a patient is prescribed a treatment and he or she
experiences an improvement, it may be the placebo effect where taking a tablet
leads to a presumed improvement. Alternatively, the observed improvement may
have occurred simply because the disease was resolving in line with its natural
history. The illness may spontaneously resolve, as do upper respiratory illnesses caused
by viruses.
Secondly, trials of treatment may demonstrate an improvement that is actually
due to an unrelated factor. The patient may be taking an “over the counter” treat-
ment, unknown to the doctor, and it might be impossible to tell which treatment, if
any, was responsible for the resolution. Alternatively, treatment effectiveness may
be influenced positively or negatively by some unrelated issue, termed a confounder,
like age, gender or smoking status. If the sample size is large enough, randomizing
the participants should lead to confounders being evenly distributed across the two
groups, leading their effects on the trial outcome to be negated. The only thing that
should influence the outcome is the test treatment.
Minimizing Bias
The whole purpose of RCTs is to minimize the risk of the results being biased.
There are myriad types of bias; some of the most important forms are listed below
(Higgins and Green 2011), and a sketch of a computer-generated randomization
schedule follows the list.
1. Selection bias. Participants are (consciously or unconsciously) allocated to one
or other treatment group on the basis of their perceived likelihood of improvement,
or of other characteristics such as age, gender or appearance. This is prevented by
a selection process that is truly random, such as a computer-generated randomization
schedule, with the randomization performed by someone completely at arm's
length from the participants.
2. Performance bias. If those observing the patients are involved in their care, and
know to which arm the person has been assigned, there is a risk that they may be
wishing for a positive outcome in the trial, and (hopefully unconsciously) make
observations in favor of one treatment over another. This is dealt with by blind-
ing the allocation of the treatment to both the treating clinician and the person
receiving the treatment, so called double blinding. This may not always be pos-
sible. The next best alternative is that the person doing the assessment of effect
is not the treating clinician, but someone blind to the allocation.
3. Detection bias. This is where the results are derived in a manner in which they
may be selectively reported. An example might be clinical records of blood pres-
sure. The clinician (usually unconsciously) may record the blood pressure more
often in a particular group of patients, perhaps on the assumption that it is more
useful to do so. Such groups might include overweight or obese people com-
pared with normal weight individuals, women on the oral contraceptive pill com-
pared with those not on the pill, and so on.
4. Attrition bias. Here some people drop out of the trial in a differential way. For
example, the trial treatment may give some people a headache, but the treatment
is being tested for its ability to reduce their cholesterol levels. If the people who
drop out are not taken into account in the analysis, a skewed result in favor of
people who do not suffer headaches will occur. This is countered by so-called
“intention to treat” analysis, where every person who enters the experiment is
accounted for, and the characteristics of those who drop out are compared with
those who stay. Further, a means of accounting for missing data in dropouts is
devised so they are represented within the trial results.
5. Publication bias. Researchers are (unconsciously) less willing to report trials
where the test treatment did not work than those where the treatment was suc-
cessful. Furthermore, journals are more likely to publish trials with positive
results, than trials reporting no change in outcomes. Thus what is available to the
clinician making the decision is not the full story. This is countered by the
requirement to register the trials before commencement, so that interested people
can use trials registries to track whether all or most of the trials being conducted
are actually reported in the literature.
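As a concrete illustration of the randomization procedure mentioned under selection bias, the following minimal sketch generates a permuted-block randomization schedule of the kind produced at arm's length from the participants. The block size, seed and arm labels are illustrative assumptions:

# Minimal sketch of a computer-generated randomization schedule using
# permuted blocks. Block size and arm labels are illustrative assumptions.
import random

def blocked_schedule(n_participants, block_size=4, seed=2015):
    # within each block, half the allocations are A and half are B,
    # shuffled so the order within the block is unpredictable
    assert block_size % 2 == 0
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_participants:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]

print(blocked_schedule(10))   # e.g. ['A', 'B', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B']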
We have shown that there are problems in trial design, which researchers try very
hard to minimize. However, decision-making is made even more difficult by the
nature of clinical trials.
1. The evidence may be from clinical trials that have been conducted in different
circumstances to that of the patient. Approved medicines have to have support-
ing evidence for a particular condition, a particular dose and a particular form of
the treatment (for example, tablets, capsules or liquids). The most common issue
is that the evidence is for a condition that is similar to, but not the same as, the
patient’s problem. In some cases the trial population is quite different to that of
the patients. For example, a particular analgesic may have been tested in people
with postoperative pain. It will be challenging to apply evidence derived from
this setting to people with chronic pain seen in a physician’s office.
2. Clinical trials often have very restrictive inclusion and exclusion criteria, so that
the characteristics of the patients where benefit was displayed may be quite dif-
ferent to the characteristics of the patient in front of the doctor, even if the setting
is the same as the trial. The classic example of this is where medicines with
approval for use in adults are used in children. Another situation arises where the
person’s condition is similar to, but not the same as, the condition for which the
approval has been obtained. Prescription in these circumstances is called off-
label usage.
3. The evidence may be inferential, rather than actual trial evidence. This is virtu-
ally always the case in treatments for pregnancy. It is rare indeed for ethics
approval to be granted for trials of a medicine to be conducted on pregnant
women, because of the fear of adverse effects on the fetus. All that can be done
is for studies to be conducted on pregnant animals, usually using doses far in
excess of those to be applied in humans, looking for adverse effects on the fetus.
For older medicines, clinical data on use and outcomes in pregnancy that may
have been collected over many years can be used, but this is observational rather
than trial based.
Trial data are based on population estimates of effect, but clinical decisions have to
be made about individuals.
Arguably the most important constraint of RCT evidence is that it presents popu-
lation estimates of effect – the mean effect of the treatment group is compared with
that of the control group. However, within each of the groups, there will be a range
of responses, and individuals may well have a contrary response to that of the bulk
of people in that group. In Fig. 4.1, one of the intervention values falls within the
confidence intervals of the control group, so for that individual the intervention had
no discernible effect. Three individuals in the intervention group had such major
effects that they fell outside the upper 95 % confidence interval of the control group.
Conversely, some control individuals responded so strongly to the comparator
treatment (which might be placebo!) that they fell within the intervention confidence
intervals – they had a very strong control treatment effect.
To try to overcome the problem of reporting average effects, different measures
have been derived to help the clinician. These have in common an estimate of the
likelihood that a treatment will work for a given person. Furthermore, because the
control group did not have access to the intervention treatment, there is no way of
knowing how they would respond if subjected to the intervention.
Cross-Over Studies
Cross-over studies involve the subject receiving both the intervention and control
arms sequentially, in random order. The main objective is to provide a single
dataset from each participant in both the intervention and control arms. While it is
possible to use the intervention and control arms to estimate the intervention
response in the individual, this is usually not done. The data from a single pair could
produce a result that does not reflect the actual participant response, simply by
chance. Usually, trial data from a crossover study are not analyzed until the entire
dataset is complete.
Fig. 4.1 The distribution of individual results in a randomized controlled trial (RCT). Blue values
are control values, green are intervention values. Individual values that fall outside the confidence
intervals of the other group have a black border
N-of-1 Studies
N-of-1 studies are double blind, placebo controlled multiple crossover trials
measuring immediate treatment effects. They are described in detail in Sect. 1.1.
The key differences compared with RCTs are that each participant receives both the
intervention and comparator treatment in random order, and this is repeated multi-
ple times – ideally at least three times. Each pair, termed a cycle, is analyzed. If the
intervention treatment shows a stronger effect than the comparator in each cycle,
this is the strongest evidence possible for the intervention in that participant. If the
majority of cycles demonstrate a benefit for the test treatment, then the person is a
probable responder. If more than one cycle favors the comparator, the patient is
deemed a non-responder to the intervention treatment. This is one way of describing
the results – there are others (See Chap. 9).
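The cycle-counting rule just described can be stated compactly. The following sketch simply encodes the rule from the text; the function name and category labels are illustrative:

# Minimal sketch of the cycle-counting decision rule described above.
def classify_response(cycle_results):
    # cycle_results: one boolean per cycle, True if the cycle favored
    # the intervention over the comparator
    favors = sum(cycle_results)
    against = len(cycle_results) - favors
    if against == 0:
        return "responder (strongest evidence)"
    if against == 1 and favors > against:
        return "probable responder"
    return "non-responder"   # more than one cycle favored the comparator

print(classify_response([True, True, True]))    # responder (strongest evidence)
print(classify_response([True, True, False]))   # probable responder
print(classify_response([True, False, False]))  # non-responder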
The trial design overcomes some key limitations of RCTs. In particular, N-of-1
studies allow all participants to receive both the active and the comparator treat-
ments. Hence individual treatment decisions can be made with more certainty than
those using RCT information.
Individual Decision-Making
For the reasons described above, RCTs have significant limitations. Most trial
designs cannot estimate the efficacy of a treatment for an individual. There may be
situations where this could be quite critical. These include: the treatment in question
might be very expensive; there may be a significant side effect profile; or the treat-
ment is controversial. It would be ideal to take such treatments only when a benefit
will be obtained. N-of-1 studies have been used to assist in rational decision making
for individual patients and their clinicians.
Example
A study has been completed which compares paracetamol with a non-steroidal anti-
inflammatory medicine (NSAID), for chronic osteoarthritic pain of large joints
(Yelland et al. 2006). NSAIDs can be very effective as treatment for osteoarthritis,
but they carry a significant risk of gastro-intestinal bleeding, and of exacerbating
both heart failure and renal impairment. Paracetamol has a more benign side effect
profile, so may be an acceptable alternative so long as the clinical relief obtained is
acceptable.
Each participant had three cycles, comprising 2 weeks of each of paracetamol
and celecoxib in random order. To ensure blinding, every day they took active
paracetamol, they also took placebo celecoxib, and vice versa. Every day they
completed a symptom diary. At the end of the 12 week study period, the order
within each pair was unmasked and the symptom diary data were analyzed. It was
then possible to determine whether the patient’s arthritis symptom improved with
the NSAID or whether equal or better relief was obtained with paracetamol (Figs. 4.2
and 4.3).
Some of the treatments which have been tested in this way are found in Table 4.1.
The technique has been used in symptom management in cancer treatment and pal-
liative care, chronic non-malignant pain, Attention Deficit Hyperactivity Disorder,
natural therapies vs prescription therapy for insomnia, and melatonin for sleep in
children with ADHD.
The method can be used for any treatment where the following characteristics
apply (Nikles et al. 2011):
• The treatment is expensive, has a significant side effect profile, or is controversial
(these trials can be complex to set up, and the effort has to be worth the cost);
• The condition is present at all times and has minimal fluctuations over time;
• The treatment does not alter the underlying pathology, but only treats symptoms;
• The treatment has a short half-life;
• There is rapid onset of therapeutic effect and rapid reversal when the treatment is
ceased;
• There is no cumulative treatment effect.
Fig. 4.2 Effect of non-steroidal anti-inflammatory medicines vs paracetamol for chronic arthritic pain in a responder to paracetamol
Fig. 4.3 Effect of non-steroidal anti-inflammatory medicines vs paracetamol for chronic arthritic pain in a non-responder to paracetamol
Table 4.1 Conditions and treatments where N-of-1 trials have been used by our group

Study – Principal author
Stimulant therapy for Attention Deficit Hyperactivity Disorder (ADHD) – Nikles et al. (2006)
Paracetamol vs celecoxib for chronic large joint arthritis – Yelland et al. (2007)
Paracetamol vs ibuprofen in chronic large joint arthritis – Nikles et al. (2007)
Temazepam vs Valerian for insomnia – Coxeter et al. (2003)
Gabapentin vs placebo in chronic neuropathic pain – Yelland et al. (2009)
Stimulants for fatigue in advanced cancer – Senior et al. (2013b); Mitchell et al. (2015)
Pilocarpine oral drops for dry mouth in palliative care – Nikles et al. (2013)
Paracetamol vs placebo in people already on opioids in palliative care – (Publication in preparation)
Stimulants for acquired ADHD in children with acquired brain injury – Nikles et al. (2014)
Melatonin vs placebo for children with ADHD – (In progress)
While the initial intent of N-of-1 studies is to provide the strongest possible evidence
for the effectiveness of a treatment in an individual, another use has emerged
(Zucker et al. 1997; Nikles et al. 2011). This is to provide a population estimate
comparable in power to a full RCT by aggregating a series of individual N-of-1 tri-
als. The difference between the two trial designs is that in an RCT, the participant
only receives either the active treatment or placebo/comparator. Potential partici-
pants may baulk at the prospect of receiving the comparator/placebo and withdraw
or not participate. Since participants receive both the test and comparator treatments,
and will find out whether the test treatment works for them, they may be more
willing to participate in aggregated N-of-1 trials.
Each participant in an N-of-1 trial contributes multiple datasets (often three or
more) to each of the intervention and control arms of the trial. Therefore, aggregating
multiple N-of-1 studies is in effect a cluster randomized controlled trial, with the unit
of the cluster being the individual patient. The number of participants required is far
less than the equivalent RCT, and the two comparator groups are perfectly matched.
This technique can be used for some treatments where the treatment meets the
characteristics described above. It may be an alternative to RCTs in patient groups
where conducting a standard RCT is difficult or impossible (Nikles et al. 2011). For
example, some populations are so small that there are not enough participants to
generate the required sample size. This includes rare genetic conditions, rare can-
cers, and infrequent events like brain injury in children. A further scenario involves
patients who are difficult to recruit or to retain in a trial, such as people approaching
the end of life.
Example
Adults with advanced cancer frequently have fatigue, which is difficult to treat. The
National Comprehensive Cancer Network of the USA has recommended stimulants
(Network NCC 2010), but the evidence base is scant. We undertook an aggregated
N-of-1 study of methylphenidate (MPH), which yielded 43 patients and 84 completed
cycles. The equivalent RCT would have required 94 participants. The population
estimate showed that there was no difference between treatment and placebo.
However, because each participant contributed data to both the intervention and
control arms, a report for each patient was generated. We showed that, although
there was a negative population finding, seven patients had important differences
favoring MPH over placebo, and one patient had important worsening of fatigue
on MPH (Senior et al. 2013a; Mitchell et al. 2015) (Fig. 4.4).
N-of-1 trials produce results that can guide individuals and their clinicians to make
rational treatment choices. This is easiest when all three cycles show results in favor
of the intervention; it is also useful when two of the three cycles favor the intervention,
although the level of uncertainty rises because of the risk that the results could have
arisen by chance (Nikles et al. 2000).
These trials can be complex to establish and run. However, the advantage to the
patient is more certainty that the treatment will work for them. The patient also has
the advantage of not taking medicines that are known not to work, which reduces
patient costs and the risk of adverse events and drug interactions with the test
medicine.
Fig. 4.4 Mean difference (95 % credible intervals) between methylphenidate (MPH) compared to placebo on individual fatigue scores (FACIT-F) for each patient (circle) and the overall group (square), with patients ordered by estimated mean (Note: the solid black circles designate positive responders, the hollow circles designate non-responders, and the solid gray circle designates a negative responder)
N-of-1 trials represent a major shift in the way clinicians could think about
treatment decisions. Given the time pressures most clinicians work under, rapid
decision-making tends to override the certainty about treatment effectiveness that
conducting an N-of-1 trial can provide. However, treatment decisions can have
longstanding consequences. The decision to commence stimulant therapy for a child
with ADHD, for example, could lead to years of therapy. Having at the clinician's
disposal a means of determining whether a treatment is effective, before the decision
to prescribe long-term is made, should lead the clinician to make use of this opportunity.
This has been borne out in a study of treatment decisions in children with ADHD,
who underwent N-of-1 trials of stimulant therapy (Nikles et al. 2006). Forty-five
doctors across Australia requested 108 N-of-1 trials, of which 86 were completed.
In 69 drug-versus-placebo comparisons, 29 children responded better to stimulant
than placebo. Immediately post-trial, 19 of 25 (76 %) drug-versus-placebo responders
stayed on the same stimulant, and 13 of 24 (54.2 %) non-responders ceased or
switched stimulants. In 40 of 63 trials (63.5 %) for which data were available, post-trial
management was consistent with the trial results. For all types of N-of-1 trials,
management changed for 28 of 64 (43.8 %) children for whom information was
available. Twelve months after the trial, 89 % of participants were still adhering to
treatment consistent with the trial, with the concordance rate falling from 50 % at
the time of the trial to 38 % (Nikles et al. 2007) (Figs. 4.5 and 4.6).
Another use of N-of-1 studies is to assist in determining if an existing treatment
is working or not. Professor Dave Sackett, one of the founding fathers of evidence
based practice, established a single patient trial clinic at his hospital. Its purpose was
to utilize the method to help solve challenging clinical dilemmas. This clinic arose
from a case of intractable asthma where the clinicians thought there was a better
response to one treatment (theophylline) than an alternative one (ipratropium).
Sackett, using an N-of-1 approach, showed that the patient felt worse on theophylline
than when not taking it (Sackett 2011). The opportunities that such clinics
could create in terms of higher quality treatment, better decision making, improved
patient outcomes, and reduced system costs are obvious.

Fig. 4.5 Stimulant vs comparator trials in ADHD children – concordance rate from time of N-of-1 test result

Fig. 4.6 Treatment decisions by time after N-of-1 test results received – non-responders
N-of-1 studies are not a panacea. They are useful only when certain conditions are
met in the treatment to be tested, as discussed above. They can be complex, and it
may take considerable setting up to ensure they are done well. If there is access to a
service that can set them up for the clinician, this could make it simpler – more like
ordering a pathology or radiology test.
The use of evidence derived from aggregating N-of-1 studies is limited because
few of these tests have been done. In addition, there is overwhelming acceptance of
RCTs as the gold standard, and the place of individual and aggregated N-of-1
studies is not clear in the minds of most clinicians. They are yet to find their place
in the clinical armamentarium. Finally, there is a perceived risk that the results of
aggregated N-of-1 studies may not be generalizable. The very characteristic that
makes them attractive in situations where gathering evidence is difficult – small
participant numbers needed to get adequate statistical power – may lead to the sam-
ple not being representative of the broader population in question. In fact, this is the
case for all RCTs, particularly of symptom interventions. It is important if possible
to compare the test population with the characteristics of the broader population in
question, to try and ensure that the results are generalizable.
Conclusion
Clinicians make treatment decisions on a regular basis, and some decisions may
result in patients taking treatments for years. This decision-making is a core skill of
clinicians, and if possible it should be evidence based. The problem is that the most
common tool to aid this decision making, the RCT, has many problems which can
lead to a patient being prescribed a treatment that may not work for them. N-of-1
studies may be useful tools to assist in making the best decision possible. This chapter
argues the case for N-of-1 studies assuming a place in the clinical armamentarium.
Future chapters will look in detail at how this can be done.
References
Coxeter PD, Schluter PJ, Eastwood HL, Nikles CJ, Glasziou PP (2003) Valerian does not appear
to reduce symptoms for patients with chronic insomnia in general practice using a series of
randomised n-of-1 trials. Complement Ther Med 11:215–222
Higgins J, Green SE (2011) Cochrane handbook for systematic reviews of interventions. Version
5.1.0. The Cochrane Collaboration, London
Mitchell G, Hardy J, Nikles J, Carmont S, Senior H, Schluter P, Good P, Currow D (2015) The
effect of methylphenidate on fatigue in advanced cancer: an aggregated n-of-1 trial. J Pain
Symptom Manage. doi:10.1016/j.jpainsymman.2015.03.009
Network NCC (2010) NCCN clinical practice guidelines in oncology – adult cancer pain V.1.2010.
NCCN, Washington, DC
Nikles CJ, Glasziou PP, Del Mar CB, Duggan CM, Mitchell G (2000) N of 1 trials. Practical tools
for medication management. Aust Fam Physician 29:1108–1112
Nikles CJ, Mitchell GK, Del Mar CB, Clavarino A, Mcnairn N (2006) An n-of-1 trial service in
clinical practice: testing the effectiveness of stimulants for attention-deficit/hyperactivity disor-
der. Pediatrics 117:2040–2046
Nikles CJ, Mitchell GK, Del Mar CB, Mcnairn N, Clavarino A (2007) Long-term changes in
management following n-of-1 trials of stimulants in attention-deficit/hyperactivity disorder.
Eur J Clin Pharmacol 63:985–989
Nikles J, Mitchell GK, Schluter P, Good P, Hardy J, Rowett D, Shelby-James T, Vohra S, Currow
D (2011) Aggregating single patient (n-of-1) trials in populations where recruitment and reten-
tion was difficult: the case of palliative care. J Clin Epidemiol 64:471–480
Nikles J, Mitchell GK, Hardy J, Agar M, Senior H, Carmont SA, Schluter PJ, Good P, Vora R,
Currow D (2013) Do pilocarpine drops help dry mouth in palliative care patients: a protocol for
an aggregated series of n-of-1 trials. BMC Palliat Care 12:39
Nikles CJ, Mckinlay L, Mitchell GK, Carmont SA, Senior HE, Waugh MC, Epps A, Schluter PJ,
Lloyd OT (2014) Aggregated n-of-1 trials of central nervous system stimulants versus placebo
for paediatric traumatic brain injury–a pilot study. Trials 15:54
Sackett DL (2011) Clinician-trialist rounds: 4. why not do an N-of-1 RCT? Clin Trials 8:350–352
Senior HE, Mckinlay L, Nikles J, Schluter PJ, Carmont SA, Waugh MC, Epps A, Lloyd O, Mitchell
GK (2013a) Central nervous system stimulants for secondary attention deficit-hyperactivity
disorder after paediatric traumatic brain injury: a rationale and protocol for single patient (n-of-
1) multiple cross-over trials. BMC Pediatr 13:89
Senior HE, Mitchell GK, Nikles J, Carmont SA, Schluter PJ, Currow DC, Vora R, Yelland MJ,
Agar M, Good PD, Hardy JR (2013b) Using aggregated single patient (N-of-1) trials to deter-
mine the effectiveness of psychostimulants to reduce fatigue in advanced cancer patients: a
rationale and protocol. BMC Palliat Care 12:17
Yelland MJ, Nikles CJ, Mcnairn N, Del Mar CB, Schluter PJ, Brown RM (2006) Celecoxib
compared with sustained-release paracetamol for osteoarthritis: a series of n-of-1 trials.
Rheumatology (Oxford) 46:135–140
Yelland MJ, Nikles CJ, Mcnairn N, Del Mar CB, Schluter PJ, Brown RM (2007) Celecoxib
compared with sustained-release paracetamol for osteoarthritis: a series of n-of-1 trials.
Rheumatology (Oxford) 46:135–140
Yelland MJ, Poulos CJ, Pillans PI, Bashford GM, Nikles CJ, Sturtevant JM, Vine N, Del Mar CB,
Schluter PJ, Tan M, Chan J, MacKenzie F, Brown R (2009) N-of-1 randomized trials to assess
the efficacy of gabapentin for chronic neuropathic pain. Pain Med 10:754–761
Zucker DR, Schmid CH, McIntosh MW, D’Agostino RB, Selker HP, Lau J (1997) Combining
single patient (N-of-1) trials to estimate population treatment effects and to evaluate individual
patient responses to treatment. J Clin Epidemiol 50:401–410
Chapter 5
Aggregated N-of-1 Trials
Geoffrey Mitchell
In conventional RCTs, population estimates of treatment effect are reported.
Clinicians use these results to decide whether to use the treatment or not, based on
these population results.
Please note that the following discussion is only relevant to evidence about
treatment interventions, and not large scale, long term, population-based outcomes.
Where the outcome is death or a major morbid event, thousands of people have to be
followed for years to identify a difference in mortality or morbidity between inter-
vention and comparator treatments.
Sample size is determined by the size of the probable effect of the intervention.
If a large effect size is expected, the required sample size is small. Conversely, the
sample size may be in the hundreds or thousands if the effect or event sought has a
low prevalence. This is the case for outcomes like deaths or life-threatening events
which a particular treatment is trying to avoid. An example might be a treatment
to prevent deep venous thrombosis during airline flights. Finally, if the impact
of a limited intervention on a parameter such as quality of life is likely to be small,
a large sample will be needed to demonstrate it.
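To see the relationship between effect size and sample size concretely, the following sketch applies the standard normal-approximation formula for a two-arm comparison of means; it is a generic illustration with assumed values, not a calculation taken from this chapter:

# Minimal sketch of a standard two-arm sample-size calculation for a
# continuous outcome. All numerical values are illustrative assumptions.
from scipy.stats import norm

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    # participants per arm for a two-sided, two-sample comparison of means
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

print(round(n_per_arm(delta=10, sd=10)))   # large effect: about 16 per arm
print(round(n_per_arm(delta=2, sd=10)))    # small effect: about 392 per arm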
It can be very difficult to reach the predetermined sample size. There may be
stringent inclusion and exclusion criteria, which make most potential participants
ineligible. It may be necessary to screen many people to find one eligible partici-
pant. Trials may require long recruitment times and very large budgets to achieve
the sample size, even if the condition is relatively common.
There are some situations where achieving the sample size is exceedingly
difficult (Nikles et al. 2011). For example, the condition may have a low prevalence.
Say a trial is planned where the estimated sample size for an RCT is 250. If there are
only 500 people in the country with the condition, the recruiters would have to
recruit half of all eligible people into the trial. This is virtually impossible.
There may be conditions where it is very difficult to recruit or retain people.
Palliative care is the classic example. The subjects are very sick and can deteriorate
very quickly. Hence relatively small numbers may agree to participate (Davis and
Mitchell 2012). Even if they agree, they may not stay well enough to complete the
trial, and their data are lost when they withdraw or die. Gate-keepers may limit
access to potential participants on the basis that the patient is too sick. Formal RCTs
in palliative care populations often require multiple recruitment sites, significant
staffing and long recruitment time frames to achieve the required sample size
(Shelby-James et al. 2012).
Individual N-of-1 trials have been described elsewhere in the book (Chaps. 3 and 4).
The characteristics that are so useful in providing information about the efficacy of
a treatment in the individual can be used to provide population estimates of effect
(Nikles et al. 2011). An individual N-of-1 study could be considered as a series of
RCTs in an individual. Each cycle is a double blind, placebo- or comparator-
controlled trial, repeated multiple times.
Therefore, an N-of-1 trial comprising three cycles could be considered as a series
of three RCTs, where the participants are perfectly matched. If multiple people do
the same trial, then it can be considered as a cluster RCT, with the unit of clustering
being the individual participant.
Hence if each person participating does a three cycle N-of-1 trial, they contribute
three sets of data about the test treatment, and three for the comparator (Fig. 5.1).
Even accounting for a cluster effect, there is a dramatic escalation in the accumulation
of trial data with the addition of each new person to such a trial.

Fig. 5.1 Aggregated N-of-1 trials contributing multiple datasets to a virtual randomized controlled trial. Effectively, each participant provides multiple datasets to each side of an RCT
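The escalation in data, and the penalty exacted by clustering, can both be illustrated with the standard cluster-trial design effect; the formula is the conventional one for cluster designs, and the intra-cluster correlation value below is an illustrative assumption, not a figure from this chapter:

# Minimal sketch: datasets accumulated per arm in an aggregated N-of-1
# trial, and the effective sample size after the usual cluster design
# effect. The intra-cluster correlation (icc) is an illustrative assumption.
def accumulated_datasets(n_participants, cycles=3):
    # each completed cycle contributes one dataset to each arm
    return n_participants * cycles

def effective_n(n_participants, cycles, icc):
    design_effect = 1 + (cycles - 1) * icc
    return n_participants * cycles / design_effect

print(accumulated_datasets(20))             # 60 datasets per arm
print(round(effective_n(20, 3, icc=0.5)))   # 30 effective observations per arm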
Sample Size
If a participant drops out part way through an RCT, or is lost to follow-up, those data
are effectively lost to the trial. This is an expected problem, and the sample size is
inflated to account for this. If a patient does drop out, but completes at least one
cycle of data in an N-of-1 trial, then the completed cycles can be added to the final
dataset for analysis. The sample size can be described in terms of the number of
participants required and the number of cycles completed. This reduces the number
of patients who must be recruited, shortens the trial and reduces the cost.
The result of a parallel arm RCT is the mean effect and standard deviation of each
group, and a decision is made about effectiveness based on the difference in effect
between the groups. No inference can be made about what happens to individuals
because any one individual only gets exposed to one arm of the trial. Contrast this
with the results seen in the aggregated N-of-1 trial. Figure 5.2 shows the results of
an N-of-1 trial of methylphenidate in fatigue in palliative care patients (Mitchell
et al. 2015). The mean and confidence intervals of the group are small. However,
note the variation in individual responses that are in effect hidden within this popu-
lation result.

Fig. 5.2 Multiple N-of-1 trials of methylphenidate vs placebo for cancer fatigue. Participants with shaded dots show clear improvement, most show no change (unshaded dots) and one (grey dotted lines) is worse. All this is hidden within a single population estimate
N-of-1 studies give an indication of what happens to individuals within a trial.
How many fall outside the population result and actually benefit from the test
treatment? There will also be some who get worse on the treatment.
It could be argued that this is a major weakness of parallel arm RCT results
where the trial is testing an intervention for a symptom. There may be a case for
developing a technique where an estimate of the proportion of people who will
respond is made. This could be done by adding a crossover element to the trial, for
suitable treatments. The original calculations would be done on the first crossover,
and the population estimate calculated. The proportions that appeared to respond
and get worse could be estimated. The trial would report both the population
response and the estimated proportions who appeared to respond or get worse.
Because there is only one crossover, the level of precision is not as high as in a
multiple crossover design. This suggestion is a pragmatic balance between preci-
sion and the practicalities of lengthening a trial with the added expense and time this
would entail.
Then the discussion between doctor and patient around treatment decisions
could be couched as follows for a trial with a null treatment effect: “Most people
in the population will not derive a benefit. However, it is estimated that (say)
20 % will benefit, and (say) 5 % will get worse if they try it. Would you consider
a trial of this treatment where you have those odds of a positive or a negative
response?”
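A sketch of the proportion estimate proposed above: each participant's paired difference from the first crossover is compared against a minimal clinically important difference (MCID). The MCID, the convention that positive differences favor the intervention, and the data are all illustrative assumptions:

# Minimal sketch: estimate the proportions of responders and worseners
# from first-crossover paired differences. All values are illustrative;
# positive differences are taken to favor the intervention.
import numpy as np

mcid = 2.0   # assumed minimal clinically important difference
first_cycle_diff = np.array([3.1, 0.4, -0.2, 2.6, 0.1, -2.8, 0.9, 2.2, -0.5, 0.3])

responders = np.mean(first_cycle_diff >= mcid)
worseners = np.mean(first_cycle_diff <= -mcid)

print(f"estimated responders: {responders:.0%}")   # 30 %
print(f"estimated worseners:  {worseners:.0%}")    # 10 %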
In Chap. 16 we discuss how to analyze aggregated N-of-1 trials.
Assume a trial determines that some people respond to a treatment and not others.
Two distinct groups therefore exist in the trial population, and it should be possible
to compare demographic and clinical characteristics of the two groups, looking for
markers likely to predict response.
However, the study is usually powered for the primary outcome, such as a change
in a clinical condition like pain. Unless it was also powered to identify responder
characteristics, there is a risk of Type 2 error when reporting the characteristics of
responders. If statistically significant characteristics are nevertheless present in
responders despite these smaller numbers, they are likely to represent true clinical
differences between the intervention and control groups. Then the above argument
arises: do these observed differences apply to a larger population?
In some people's minds, the very small participant number carries a risk similar
to that of an underpowered study: does a negative result truly represent the
outcomes for a person to whom the study results could be applied, and would a
larger sample give a more reliable result?
The same sample size problem applies to adverse events. Is the sample size big enough
to detect low-prevalence events, as many adverse effects of treatments are? It may
not be and therefore there is a risk of reporting no adverse events, when the risk of
a potentially dangerous adverse event may exist. There is no counter to this problem,
except to demonstrate a robust process to record and assess any adverse events that
do occur. Minor adverse events can be recorded, and serious adverse events need to
be reviewed by an independent drug safety monitoring committee (See Chap. 10).
All that can be done is to state in the discussion that serious side effects may exist,
but were not observed in the test population.
By presenting individual results to the patient and clinician, the clinician may come
to believe that certain patients will respond to the treatment and others will not.
They may then put certain patients forward for recruitment and avoid others. For
this reason it is ideal to have one person recruiting patients, a different person
presenting the results to the patient, and another seeing the completed data and
making treatment decisions. Obviously this may not always be practical. If one
clinician both selects patients and discusses the trial results, tests for trends in the
recruitment of responders and non-responders over the course of the trial must be
conducted, to detect whether the proportions of each change as the trial progresses.
For the same reason it is important that the person generating individual reports
is not the person analyzing the completed dataset. This is particularly the case if the
technique of determining an important clinical effect is based on the posterior prob-
ability of effect. The posterior probability figure is the one used to decide whether
the person has responded to a treatment or not, and will constantly change as new
patient data are added. This requires constant analysis of the data by the researcher.
Whether or not the participant is considered a responder to the treatment has major
implications for the patient. It is possible that the reports presented to the patient and
clinician could be influenced by the knowledge of what has been observed in the
posterior probability of effect.
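To make the idea of a constantly updated posterior probability concrete, here is a minimal sketch under a simple normal-normal conjugate model; the prior, the assumed known observation variance, and the cycle differences are all hypothetical, and this is not necessarily the model used in any particular trial.

import numpy as np
from scipy import stats

prior_mean, prior_var = 0.0, 4.0   # hypothetical vague prior on the true effect
obs_var = 1.0                       # assumed known observation variance

mean, var = prior_mean, prior_var
for diff in [1.2, 0.4, 0.9]:        # successive A-minus-B cycle differences
    # Conjugate normal update with one new observation.
    precision = 1.0 / var + 1.0 / obs_var
    mean = (mean / var + diff / obs_var) / precision
    var = 1.0 / precision
    p_benefit = 1.0 - stats.norm.cdf(0.0, loc=mean, scale=np.sqrt(var))
    print(f"after diff={diff}: P(effect > 0) = {p_benefit:.3f}")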
Estimating the sample size requires the expected size of the treatment effect and the
expected standard deviation of the effects. These figures can be obtained from the
literature reporting similar studies, or from pilot trials large enough to obtain
sufficient data to estimate an effect size.
Because these trials are in effect small cluster randomized trials with the individual
as the unit of randomization, adjustment of the sample size has to take into account
an intra-cluster correlation coefficient that may be as high as one, and the sample
size has to be inflated accordingly. Computer modeling can be used to estimate the
sample size by running multiple simulations of the theoretical trial.
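A minimal sketch of this simulation approach follows. Every parameter here (effect size, between- and within-patient standard deviations, dropout rate, number of cycles) is a hypothetical assumption; dropouts who complete at least one cycle contribute their completed cycles, as described above.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power(n_patients, n_cycles, effect=0.5, sd_between=0.5,
          sd_within=1.0, dropout=0.2, n_sims=2000, alpha=0.05):
    hits = 0
    for _ in range(n_sims):
        # Dropouts contribute only the cycles they completed (at least one).
        completed = np.where(rng.random(n_patients) < dropout,
                             rng.integers(1, n_cycles + 1, n_patients),
                             n_cycles)
        patient_effect = rng.normal(effect, sd_between, n_patients)
        # Mean A-minus-B difference per patient over completed cycles.
        diffs = [rng.normal(mu, sd_within, c).mean()
                 for mu, c in zip(patient_effect, completed)]
        t, p = stats.ttest_1samp(diffs, 0.0)
        hits += p < alpha
    return hits / n_sims

for n in (10, 15, 20):
    print(n, "patients:", power(n, n_cycles=3))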
Conclusion
References
Bland M (1995) An introduction to medical statistics. Oxford University Press, New York
Davis MP, Mitchell GK (2012) Topics in research: structuring studies in palliative care. Curr Opin
Support Palliat Care 6:483–489
Hardy J, Quinn S, Fazekas B, Plummer J, Eckermann S, Agar M, Spruyt O, Rowett D, Currow DC
(2012) Randomized, double-blind, placebo-controlled study to assess the efficacy and toxicity
of subcutaneous ketamine in the management of cancer pain. J Clin Oncol 30:3611–3617
Mitchell G, Hardy J, Nikles J, Carmont S, Senior H, Schluter P, Good P, Currow D (2015) The
effect of methylphenidate on fatigue in advanced cancer: an aggregated n-of-1 trial. J Pain
Symptom Manag (in press). doi: 10.1016/j.jpainsymman.2015.03.009
Nikles J, Mitchell GK, Schluter P, Good P, Hardy J, Rowett D, Shelby-James T, Vohra S, Currow
D (2011) Aggregating single patient (n-of-1) trials in populations where recruitment and retention
was difficult: the case of palliative care. J Clin Epidemiol 64:471–480
Shelby-James TM, Hardy J, Agar M, Yates P, Mitchell G, Sanderson C, Luckett T, Abernethy AP,
Currow DC (2012) Designing and conducting randomized controlled trials in palliative care: a
summary of discussions from the 2010 clinical research forum of the Australian Palliative Care
Clinical Studies Collaborative. Palliat Med 26:1042–1047
Chapter 6
Methodological Considerations
for N-of-1 Trials
Introduction
(Armitage 1975; Kenward and Jones 1987; Wei and Durham 1978). One approach
to designing an RCT is the use of optimal experimental designs. An optimal design
is a technique designed to assist a decision maker in identifying a preferable choice
among many possible alternatives. Among the many RCT designs available, the
most useful and popular design is the crossover design. For example, in a survey
done in 1980 of numerous studies on the effects of antianxiety drugs on human
performance, 68 % of the studies used the crossover approach (Brown 1980). It is
still certainly one of the most popular approaches being adopted in many epidemio-
logic and pharmaceutical trials (Figueiras et al. 2005).
To illustrate the logistics of choosing a particular design, we first note that there
are a number of excellent articles on optimal designs in the RCT literature; see,
for example, Cheng and Wu (1980), Carriere (1994), Carriere and Huang (2000),
Liang and Carriere (2009), Laska and Meisner (1985), Afsarinejad and Hedayat
(2002), and Kunert and Stufken (2002, 2008). However, most of these designs, if
not all, focus on
optimizing the treatment effect for an average patient. The average patient is a
construct – a virtual person who responds to the intervention by the mean of the
population’s responses. Individuals enrolled in a trial will respond better or worse
than, or simply differently from, the average patient. The available optimal designs
are not adequate when estimation of individual-based treatment effects is desired.
Multi-crossover single-patient trials, known as N-of-1 trials, are often employed
when the focus is to make the best possible treatment decision for an individual
patient. From a clinician’s perspective, having clear evidence of the value of one
treatment over another (or no treatment) is far more useful than knowing the average
response. The average response gives the clinician the probability that a treatment
will be effective, whereas N-of-1 trials give far more certainty about whether the
treatment for the patient sitting in front of them will work or not.
The simplest two-treatment N-of-1 trial uses the AB (or BA) sequence for treat-
ments A and B; this treatment sequence has one crossover pair over two treatment
periods. Each period is chosen to be of sufficient length for the treatments being
tested to show an effect. Two periods (such as AB or BA) constitute a single cycle
in a N-of-1 trial. As the patient becomes his or her own control, N-of-1 trials provide
individual-based clinical evidence for the treatment effect, free of between-patient
variations. With the rising cost of patient care, N-of-1 trials have the potential to be
extremely useful, as they can minimize clinic visits and time on suboptimal treat-
ments (Greenfield et al. 2007; Guyatt et al. 1986; Kravitz et al. 2004; Larson 1990;
Nikles et al. 2005). To obtain stable estimates of the treatment effect, we desire to
replicate such evidence for each patient. The question is then how many such cycles
are desirable and what is the optimal order of treatment administration. Quite
naturally, designing an N-of-1 trial involves deciding on the number of cycles and
proper sequencing of treatments in order to plan the study optimally to achieve
the trial objectives. The literature, however, provides few guidelines for
constructing optimal N-of-1 trials. A recent book on N-of-1 trials also leaves the
choice of an ideal design to the clinician, while suggesting various possible designs
to consider (Kravitz et al. 2014).
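As a small illustration of the design space, the following sketch enumerates the candidate treatment sequences for an N-of-1 trial built from AB/BA cycles: with c cycles there are 2**c possible sequences.

from itertools import product

def cycle_sequences(n_cycles):
    # Each cycle is either an AB pair or a BA pair.
    return ["".join(pair for pair in cycles)
            for cycles in product(("AB", "BA"), repeat=n_cycles)]

print(cycle_sequences(3))
# ['ABABAB', 'ABABBA', 'ABBAAB', 'ABBABA',
#  'BAABAB', 'BAABBA', 'BABAAB', 'BABABA']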
Ideally, when aggregated, the series of N-of-1 trials that are optimal for individual
patients can also provide an optimal estimate of the treatment effects for the average
patient. For example, in a multi-clinic setting with three-AB-pair, six-period N-of-1
studies, all eight possible sequences (2^(6/2) = 8) have been used, i.e., ABABAB,
ABABBA, ABBAAB, ABBABA and their duals, to estimate both individual-based
and average treatment effects (Guyatt et al. 1990). However, it is not known whether
each of these eight sequences is optimal for individual patients. Further, it is not
known whether a collection of the optimal and not-so-optimal N-of-1 trials will lead
to optimal designs for estimating the average treatment effects. In the next sections,
we discuss how these do not lead to optimal aggregated N-of-1 trials for estimating
the treatment effects for the average patient. We first discuss issues arising due to the
repeated nature of these experiments.
The main attraction of crossover designs is that the subject provides their own
control, as measurements are taken repeatedly from the same subject using different
treatments. If any treatment effects lasting beyond the given period (carryover
effects) are equal across treatments, crossover designs can provide efficient
within-subject estimators of direct short-term treatment effects by removing
between-subject variation.
However, one critical issue plagues these repeated measurement designs and limits
their popularity despite their practical appeal: a long-standing controversy regarding
residual treatment effects that last beyond the given period. N-of-1 trials are no
exception. Sometimes referred to
as the carryover effect, the residual effect is the effect of a previous treatment that
carries over into the subsequent treatment periods. Thus, the effect of a treatment in
a given period can be carried over to influence the responses in a subsequent period.
Often, the residual effect of a treatment may be ignorable after two periods.
However, the residual effect between the responses over two consecutive treatment
periods may not be assumed to be negligible. A “washout” period placed between
treatment periods could reduce the carryover effects, but a long washout period may
increase the risk of drop-outs. Also, there is no guarantee that it completely removes
the residual effects. Therefore, careful planning is important, as the nature of the
carryover effect may be such that the N-of-1 trial method is not feasible (Bose and
Mukherjee 2003; Carriere and Reinsel 1992; Kunert and Stufken 2002).
Nevertheless, the presence of residual effects does not invalidate the use of cross-
over designs. Rather it is the inequality of the residual effects of each treatment that
may be causing the controversies. If the residual effects of the treatments are equal,
then statistically it is as if the residual effects do not exist, because they cancel
out mathematically.
Despite the concern over residual effects, ethicists apparently have less of a
problem with self-controlled designs such as crossover designs than with com-
pletely randomized or parallel-group designs (Carriere 1994). For example, when a
6 Methodological Considerations for N-of-1 Trials 71
The most widely read statistical paper on the use of crossover experiments in clinical
trials was published in 1965 by Grizzle, where the responses are modeled as:
Response = overall mean
+ period effects
+ sequence effects
+ direct treatment effects
+ residual treatment effects
+ measurement error. (1)
Aside from the obvious overall mean effect and period effects, sequence effects may
be present due to treatments given in a different order to patients, because some patients
will be given AB or BA or some other order. While the primary objective is to study the
direct treatment effects, their estimates may not be unique due to the unequal residual
treatment effects. Crossover design models have typically assumed that the treatments
assigned to subjects have lasting effects on their responses to treatments in subsequent
periods. A two-step approach has been used quite extensively where the unequal
residual effects are first estimated and tested for significance before proceeding to
estimate the direct treatment effects (Carriere 1994; Kunert and Stufken 2008).
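A hedged sketch of fitting a Grizzle-type model of this form by ordinary least squares with statsmodels follows. The data frame is hypothetical; treatment and first-order carryover are coded +1/-1 (carryover 0 in period 1), a common coding assumed here to avoid collinearity with the period effects, and the error structure is simplified to i.i.d.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

rows = []
for pid, seq in enumerate(["ABBA", "ABBA", "BAAB", "BAAB"], start=1):
    prev = None
    for period, t in enumerate(seq, start=1):
        rows.append({
            "patient": pid, "period": period, "sequence": seq,
            "treat": 1 if t == "A" else -1,
            # First-order carryover: previous period's treatment, 0 in period 1.
            "carry": 0 if prev is None else (1 if prev == "A" else -1),
        })
        prev = t
df = pd.DataFrame(rows)
# Hypothetical response: direct effect of A plus noise.
df["y"] = 0.8 * df["treat"] + rng.normal(0, 1, len(df))

fit = smf.ols("y ~ C(period) + C(sequence) + treat + carry", df).fit()
print(fit.params)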
When the carryover effects are assumed to last for only one period, the model is
known as a first-order residual effect model. In such a model, no interaction is
assumed between the treatment administered during the current period and the
carryover effect from the previous period; such an interaction would give rise to a
second-order residual effect. Hence, the model under this assumption is basically
Eq. (1), where the term for the residual treatment effects contains just the
first-order residual effects, which last for only one period beyond the treatment
administration.
Taking the treatment and period interactions into account, Kunert and Stufken
(2002, 2008) presented an alternate model with self and mixed carryover effects.
A self carryover effect occurs when the treatment administered in the current
period is the same as in the previous period; if two different treatments are
administered in consecutive periods, it is known as a mixed carryover effect. The
model under this assumption is more elaborate: in Eq. (1), the term for the residual
treatment effects is split and replaced with the following two terms (a small sketch
follows the list):
• +Self carryover effects (if a preceding treatment is the same as the current one)
• +Mixed carryover effects (if the preceding treatment is not the same as the
current one).
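The sketch below classifies the carryover in each period of a given sequence as self or mixed, following the definitions above; note how an all-AB sequence produces only mixed carryover, consistent with the later observation that such a sequence does not allow estimation of self carryover effects.

def carryover_types(sequence):
    out = [None]  # no carryover into the first period
    for prev, curr in zip(sequence, sequence[1:]):
        out.append("self" if prev == curr else "mixed")
    return out

for seq in ("ABABABAB", "ABBAABBA"):
    print(seq, carryover_types(seq))
# ABABABAB -> mixed carryover only
# ABBAABBA -> alternating mixed/self pattern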
Optimal designs are highly model dependent. These effects are assumed to exist
unless proven insignificant and therefore a reasonable effort should be made to
separate them for an unbiased estimation of the direct treatment effects. Sometimes,
however, it is simply practically impossible to accommodate all effects in the model.
With N-of-1 trials, not all of these effects can be included in the model. Because we
are dealing with just one subject and p responses in total, the period effects cannot
be accommodated.
Repeated responses from a subject can be correlated and also involve measurement
error. The most popular structure may be to consider all pairs of measurements as
equally correlated (compound symmetry). Such a simple structure may work in
designs with small numbers of periods. In N-of-1 trials with at least 4 periods, we
may need to consider an auto-regressive structure, as the correlations may diminish
gradually as the pair of measurements comes farther apart in their treatment
periods. Here, the
correlation is assumed to be large for a pair of measurements from two adjacent
periods and to decrease as the time between the two periods increases. Some
modification is possible by assuming measurements to be uncorrelated if they are
more than two periods apart. See the related discussion in Carriere (1994).
Table 6.1 Sequences for p = 8 with corresponding design parameter values

h    Sequence    Alternation    s    m
−7   ABABABAB    0              0    7
−5   ABABABBA    1              1    6
−5   ABABBABA    1              1    6
−5   ABBABABA    1              1    6
−3   ABABBAAB    2              2    5
−3   ABBAABAB    2              2    5
−3   ABBABAAB    2              2    5
−1   ABBAABBA    3              3    4

Note: s is the number of AA and BB transitions and m is the number of AB and BA transitions in a
treatment sequence, with h = s − m
Typically, we are interested in estimating the direct and carryover treatment effects,
while all others are treated as nuisance and secondary parameters. We build designs
to this end. Due to the special nature of the design and the correlated data, we first
consider how one could approach the data analysis. When data are obtained from an
N-of-1 design, there are various ways to approach data analyses in order to gather
any beneficial treatment evidence for the patient.
One is to observe effects between successive trial conditions, as shown in
Fig. 6.1. Once the series of differences is observed, the usual repeated measures
data-analytic techniques can be used to plan and analyze them (Cantoni 2004;
Davidian et al. 2009; Liang and Zeger 1986).
Alternately, one can observe and analyze treatment differences from each pair, as
shown in Fig. 6.2. The same longitudinal and repeated measures analysis strategy
as in the first approach applies here, with the N-of-1 trial data replaced by
differences from successive pairs. The paired differences are simply regarded as
repeated measurements, and more data simply mean better precision
for the treatment effect (Carriere and Huang 2000). Such repeated measurements
can be analyzed using various parametric and nonparametric methods (Liang and
Zeger 1986; Senn 2002).
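The simplest version of this second approach is sketched below for a single patient: each cycle's A-minus-B difference is treated as a repeated measurement and tested against zero. The data are hypothetical, and this naive one-sample test ignores any serial correlation among cycles, which the longitudinal methods cited above are designed to handle.

import numpy as np
from scipy import stats

# Hypothetical symptom scores for one patient over three AB cycles.
scores_a = np.array([6.0, 5.5, 6.2])   # periods on treatment A
scores_b = np.array([4.1, 4.8, 4.5])   # periods on treatment B
cycle_diffs = scores_a - scores_b       # one difference per cycle

t, p = stats.ttest_1samp(cycle_diffs, 0.0)
print(f"mean difference = {cycle_diffs.mean():.2f}, p = {p:.3f}")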
The third way to look at N-of-1 trial data is similar to the first approach,
but differs in that it holds the judgment or decision of a beneficial treatment effect
for the given patient till the end of the trial, by using likelihood methods (Kravitz
et al. 2014). As it analyzes the entire data set based on the employed model, the
Type I error probability is minimized while possibly making multiple interim
analyses and decisions. This is another consideration, discussed in this chapter in
the development of optimal N-of-1 trial design. Such a model-based general
approach could be more efficient than the first two approaches analyzing each cycle
separately within each patient (Carriere 1994; Kravitz et al. 2014).
In this chapter, we recognize that N-of-1 designs deal with small samples, and thus
we discuss an optimal strategy to collect the data based on the model by construct-
ing an optimal design rather than specific data analysis strategies. To do so, we first
need to define the information matrix under a particular model, which will contain
information about all parameters of interest under the assumed model.

Fig. 6.2 Analysis of results from each successive pair in a six-period N-of-1 design

More details about an information matrix are found in much of the design literature, for example,
Carriere and Reinsel (1993). Then, the optimal design for a set of parameters of
interest is constructed using a chosen optimality criterion. We use D-optimality,
which finds the optimal design by maximizing the determinant of the information
matrix. See also Cheng and Wu (1980) and Kiefer (1975). Carriere (1994) and
Carriere and Huang (2000) describe a practical approach to find optimal designs.
For N-of-1 trials, the optimal sequence is completely determined by h, as noted
in the previous section, and is therefore much simpler to construct than in general
crossover settings.
One could also find the optimal design that simultaneously optimizes all parameters
of interest or the one that optimizes the carryover effects under the constraint that it
optimizes the direct treatment effects. We note that the approach can also apply to
find optimal designs for estimating some linear combinations of the parameters of
interest. See also Carriere and Reinsel (1993). Since we are primarily interested in
the optimal estimation of the direct treatment effects, we do not consider these
cases. We summarize important results based on a D-optimality criterion in the next
subsections (Carriere and Reinsel 1992).
Under the traditional model, we consider the error structure to be equally correlated
between two measurements within the patient. We find the optimal design to consist
of pairs of AB and BA appearing alternately throughout the trial. Therefore, we
have the following result.
Result 1 The optimal N-of-1 trial for estimating both the direct and residual effects is
the one sequence design that consists of pairs of AB and BA appearing alternately.
For example, the optimal designs for N-of-1 trials with 4, 6, and 8 periods are the
one sequence designs, ABBA, ABBAAB, and ABBAABBA, respectively. One
could switch A and B to obtain a dual sequence with the same effect.
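A numerical sketch in the spirit of Result 1 follows, under a deliberately simplified traditional model for a single subject: response = intercept + direct treatment effect + first-order carryover + i.i.d. error (period effects cannot be fitted with one subject, as noted earlier). The +1/-1 coding and the adjustment for the intercept are assumptions of this sketch, not the authors' exact derivation.

import numpy as np
from itertools import product

def d_criterion(seq):
    x_t = np.array([1.0 if s == "A" else -1.0 for s in seq])
    x_c = np.concatenate([[0.0], x_t[:-1]])  # previous period's treatment
    X = np.column_stack([np.ones(len(seq)), x_t, x_c])
    M = X.T @ X
    # Information for (direct, carryover), adjusted for the intercept.
    M_sub = M[1:, 1:] - np.outer(M[1:, 0], M[0, 1:]) / M[0, 0]
    return np.linalg.det(M_sub)

# Candidate 8-period sequences built from AB/BA cycles, as in Table 6.1.
seqs = ["".join(c) for c in product(("AB", "BA"), repeat=4)]
for seq in sorted(seqs, key=d_criterion, reverse=True)[:4]:
    print(seq, round(d_criterion(seq), 2))
# ABBAABBA and its dual BAABBAAB come out on top, consistent with Result 1.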
Repeated measures may be highly correlated when they are close together in time,
while the correlation may be negligible when they occur at a distance in time. One possible
model for such a situation is to consider auto-regressive errors, accounting for
correlation of measurements between two adjacent periods to be stronger than those
far apart. It turns out that to discuss designs with auto-regressive errors, we need to
consider whether the treatment given in the first and last periods are the same or not.
However, the optimal designs are still determined by the value of h, and in general,
the optimal design is to alternate AB and BA cycles as above.
We also considered the model with self and mixed carryover effects. The optimal
design was constructed by obtaining information matrices for the relevant parameters.
Unlike the previous case, the optimal designs for estimating the direct treatment
effect are not the same as those for residual effects.
The optimal design for estimating the direct treatment effect is the sequence with
only AB pairs, such as ABABABAB. Although it may be of less interest, the optimal
sequence for estimating the self carryover effect is to alternate between AB and BA
pairs, while the optimal sequence for estimating the mixed carryover effect is to
repeat the AB pair with no alternation. We summarize our findings in Result 2.
Result 2 The optimal N-of-1 trial for estimating the direct treatment and mixed
carryover effect is the sequence with only AB pairs with no alternation, such as
ABABABAB, while the optimal N-of-1 trial for estimating the self carryover effect
is the sequence with AB and BA alternating throughout the trial.
Numerical Comparison
Table 6.2 tabulates the variances of the treatment effects for 8-period N-of-1
trials under the two models. Table 6.2 reveals that the effects of choosing a specific
sequence under the self and mixed carryover effect model are rather minimal.
Careful examination also reveals that the optimal sequence for all three (direct and
residual carryover) treatment effects simultaneously is the same as that under the
traditional model.
Specifically, Table 6.2 shows that the optimal individual-based N-of-1 trials are
S63 and S81 for estimating the direct treatment effects under respective models, as
expected. However, there are no real practical differences among various N-of-1
trials under the self and mixed model. Under the self and mixed carryover model,
although S81 repeating AB pairs in each cycle is optimal for estimating the direct
treatment effect, it does not allow estimation of self carryover effects, making S63
and S83 preferable. Therefore, for robust and optimal N-of-1 trials it is recom-
mended to use a sequence alternating between AB and BA pairs, such as S63 and
S83, under all models.
In summary, there appears to be no discernible or sizable advantage in
distinguishing between the two models and the various possible error structures.
The above comparison remains true under both independent and equi-correlated
error structures.
Overall, S63 and S83 for single N-of-1 trials seem to be the best under both
models. They are optimal for estimating direct treatment and mixed carryover
effects. Further, they are optimal for estimating both the treatment and carryover
effects under the traditional model.
Table 6.2 also shows that increasing the number of periods from 6 to 8 results
in an efficiency gain of 0.173/0.127 ≈ 1.36, i.e., about 36 %, under the traditional
model, while the gain is not as substantial under a complex model.
In general, we suggest that alternating AB and BA pairs in sequence will result
in a nearly optimal design, if not the optimal one, under all models we considered,
for estimating individual effects in N-of-1 trials.
Adaptive designs have been gaining popularity in recent years. Liang and Carriere (2009)
outline how one could plan response adaptive designs utilizing outcomes in a given
experiment while achieving multiple objectives. For example, clinicians may wish
to achieve good estimation precision, effective treatment of patients, or cost effec-
tiveness (Carriere and Huang 2000). Recently, Liang, Li, Wang, and Carriere (Liang
and Carriere 2009) extended their approach to binary responses. For N-of-1 trials,
designs can be found by updating AB or BA pairs successively as the trial progresses.
Such objectives as maintaining a balance or counterbalancing between AB and BA
pairs as suggested by Kravitz et al. (2014) can also be considered. A Bayesian
framework may be the most natural for adaptive design and decision-making.
However, not much attention has been given to finding optimal designs for binary
data in the literature, and further research is needed.
Discussion
Surprisingly, however, the results do not change very much in practice when
not-so-optimal sequences are adopted, as we examined in Section "Optimal N-of-1
Trials". A numerical calculation of the estimation precision using several 6- and
8-period designs revealed the actual performance of a particular design, giving us
practical guidelines. Overall, we conclude that alternating between AB and BA
pairs in subsequent cycles will result in practically optimal N-of-1 trials for a single
patient, if not the optimal one, under all the models we considered, without the
need to guess at the correlation structure or conduct a pilot study. Alternating
between AB and BA pairs in a single trial is also largely robust to misspecification
of the error structure of the repeated measurements.
Lastly, we suggest that when an experiment has been carried out with the optimal
N-of-1 trial and additional patients are accrued in the trial, we can plan and aggre-
gate these N-of-1 trials optimally by allocating the same number of patients to its
dual sequence by reversing the treatment order, thereby optimizing the trial for both
the individual and average patients.
Acknowledgments This research was made possible due to the funding from the Natural Sciences
and Engineering Research Council of Canada.
References
Afsarinejad K, Hedayat A (2002) Repeated measurement designs for a model with self and mixed
carryover effects. J Stat Plan Inference 106:449–459
Armitage P (1975) Sequential medical trials. Blackwell, Oxford
Bose M, Mukherjee B (2003) Optimal crossover designs under a general model. Stat Probab Lett
62:413–418
Brown BWJ (1980) The crossover experiment for clinical trials. Biometrics 36:69–79
Cantoni E (2004) A robust approach to longitudinal data analysis. Can J Stat 32:169–180
Carriere KC (1994) Cross-over designs for clinical trials. Stat Med 13:1063–1069
Carriere KC, Huang R (2000) Crossover designs for two-treatment cancer clinical trials. J Stat
Plan Inference 87:125–134
Carriere KC, Reinsel GC (1992) Investigation of dual-balanced crossover designs for two treatments.
Biometrics 48:1157–1164
Carriere KC, Reinsel GC (1993) Optimal two-period repeated measurements designs with two or
more treatments. Biometrika 80:924–929
Cheng CS, Wu CF (1980) Balanced repeated measurements designs. Ann Stat 8:1272–1283
Davidian M, Verbeke G, Molenberghs G (2009) Longitudinal data analysis. Int Stat Rev
77:1857–1936
Figueiras A, Carracedo-Martinez E, Saez M, Taracido M (2005) Analysis of case-crossover designs
using longitudinal approaches: a simulation study. Epidemiology 16:239–246
Greenfield S, Kravitz R, Duan N, Kaplan SH (2007) Heterogeneity of treatment effects: implications
for guidelines, payment, and quality assessment. Am J Med 120:S3–S9
Grizzle JE (1965) The two-period change over design and its use in clinical trials. Biometrics
21:461–480
Guyatt G, Sackett D, Taylor DW, Chong J, Roberts R, Pugsley S (1986) Determining optimal
therapy–randomized trials in individual patients. N Engl J Med 314:889–892
Guyatt GH, Heyting A, Jaeschke R, Keller J, Adachi JD, Roberts RS (1990) N of 1 randomized
trials for investigating new drugs. Control Clin Trials 11:88–100
Hedayat AS, Yang M (2003) Universal optimality of balanced uniform crossover designs. Ann Stat
31:978–983
Kenward M, Jones B (1987) A log-linear model for binary crossover data. Appl Stat 36:192–204
Kiefer J (1975) Construction and optimality of generalized Youden designs. In: Srivastava JN (ed)
A survey of statistical designs and linear models. North-Holland, Amsterdam
Kravitz RL, Duan N, Breslow J (2004) Evidence-based Medicine, heterogeneity of treatment
effects, and the trouble with averages. Milbank Q 82:661–687
Kravitz RL, Duan N (eds) and the DEcIDE Methods Center N-of-1 Guidance Panel (Duan N, Eslick I,
Gabler NB, Kaplan HC, Kravitz RL, Larson EB, Pace WD, Schmid CH, Sim I, Vohra S) (2014)
Design and implementation of N-of-1 trials: a user’s guide. Agency for Healthcare Research
and Quality, Rockville
Kunert J, Stufken J (2002) Optimal crossover designs in a model with self and mixed carryover
effects. J Am Stat Assoc 97:896–906
Kunert J, Stufken J (2008) Optimal crossover designs for two treatments in the presence of mixed
and self-carryover effects. J Am Stat Assoc 103:1641–1647 (correction 1999, 86, 234)
Larson EB (1990) N-of-1 clinical trials. West J Med 152:52–56
Laska E, Meisner M (1985) A variational approach to optimal two-treatment crossover designs:
application to carryover-effect models. J Am Stat Assoc 80:704–710
Liang YY, Carriere KC (2009) Multiple-objective response-adaptive repeated measurement
designs for clinical trials. J Stat Plan Inference 139:1134–1145
Liang KY, Zeger S (1986) Longitudinal data analysis using generalized linear models. Biometrika
73:13–22
Nikles CJ, Clavarino AM, Del Mar CB (2005) Using N-of-1 trials as a clinical tool to improve
prescribing. Br J Gen Pract 55:175–180
Senn S (2002) Crossover trials in clinical research. Wiley, Hoboken
Wei LJ, Durham S (1978) The randomized play the winner rule in medical trials. J Am Stat Assoc
73:840–843
Chapter 7
Randomization, Allocation Concealment,
and Blinding
Hugh Senior
Introduction
N-of-1 trials are cross-over trials with multiple cycles where each patient receives
both the intervention and control treatments. Randomization in N-of-1 trials, as in
other trial designs, aims to minimize confounding and selection bias. The process of
randomization in N-of-1 trials involves the random selection of the order of treat-
ments for each patient. Allocation concealment is a process of concealing the allo-
cation sequence from the investigator responsible for recruiting patients. It prevents
investigators from influencing the assignment of treatment to a patient and thus
prevents selection bias. Blinding is the process of keeping investigators, patients
and other researchers unaware of the assigned treatments within an N-of-1 trial to
minimize ascertainment bias. Ascertainment bias occurs if the results and conclu-
sions of a trial are influenced by knowledge of the trial medication each participant
is receiving.
number the envelopes in advance, (iii) place pressure-sensitive or carbon paper
inside the envelope to record the patient's name onto an assignment card, thereby
creating an audit trail, (iv) insert a material such as cardboard or tinfoil in the
envelope that is impermeable to bright light, and (v) ensure the envelopes are opened
sequentially only after the patient’s details are written on the face of the envelope
(Schulz and Grimes 2002a; Viera and Bangdiwala 2007).
More trial designs, including N-of-1 trials, are using pharmacy-controlled
allocation to ensure allocation concealment. Eligible patients are registered into the trial
by the recruiting investigator. The investigator provides the patient with a prescrip-
tion for the trial medications, which the patient takes to the pharmacy. The pharma-
cist prepares and dispenses the trial medications in numbered containers (for
example, week 1, week 2…) according to a pre-prepared numbered randomization
list that lists the next allocation sequence. The pharmacist writes the patient’s details
alongside the next allocation sequence on the randomization list. This method also
ensures the pharmacy can provide unblinding in the case of an emergency (see later
in this chapter). In some trial designs such as community N-of-1 trials, the prescrip-
tion may be emailed/faxed to the pharmacy, who will prepare the trial medication
according to the randomization list and courier the medication containers to the
patient at home, workplace or community pharmacy. Investigators must ensure the
pharmacy is provided with and follows standard operating procedures for random-
ization and allocation concealment.
Centralized computer systems that can be accessed through a remote computer
via the internet or by telephone (through an “Interactive Voice Response System”) or
fax/e-mail are a popular method of randomization and allocation concealment. This
method is especially useful for multisite trials. This method ensures allocation con-
cealment by only assigning the next allocation sequence for an individual if the
patient is eligible and enrolled in the study. In an N-of-1 trial, the central computer
provides the number of the sequence that is next on the randomization list. This
number is given to the pharmacy, which dispenses medications according to the
sequence of medications denoted by that number. In a common N-of-1 trial
design where two treatments are randomly allocated in three cycles, for example, AB
BA AB, there are 2^3 = 8 possible sequences (namely, ABBAAB, ABABAB, ABBABA,
ABABBA, BABAAB, BAABAB, BABABA, BAABBA). The pharmacy will be pro-
vided with a number from 1 to 8 by the centralized computer system with each num-
ber denoting a specific sequence. An additional advantage of the centralized computer
system of treatment allocation is that the system also monitors allocation
concealment through time stamps and electronic logs (Viera and Bangdiwala 2007).
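A hedged sketch of this allocation step follows: the central system draws the next number from a pre-generated randomization list, and the pharmacy maps that number to a treatment sequence. The function names and the seeding are illustrative only; the sequence table matches the eight sequences listed above.

import random

SEQUENCES = {1: "ABBAAB", 2: "ABABAB", 3: "ABBABA", 4: "ABABBA",
             5: "BABAAB", 6: "BAABAB", 7: "BABABA", 8: "BAABBA"}

rng = random.Random(2024)  # seeded for a reproducible randomization list
randomization_list = [rng.randint(1, 8) for _ in range(20)]

next_position = 0  # advanced only when an eligible patient is enrolled

def allocate(patient_id):
    global next_position
    number = randomization_list[next_position]
    next_position += 1
    # Only the sequence *number* is seen by the investigator; the pharmacy
    # holds the number-to-sequence mapping, preserving concealment.
    return patient_id, number, SEQUENCES[number]

print(allocate("P001"))
print(allocate("P002"))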
concealment methods were adequate (Clark et al. 2013; Hewitt et al. 2005). The
CONSORT statement on reporting of trials states that investigators must provide
information on the mechanism used to implement the random allocation sequence
including the description of any steps taken to conceal the sequence prior to assign-
ing the trial medications (Schulz et al. 2010).
Blinding (also known as “Masking”) is a process that attempts to ensure that those
who are blinded are unaware of the treatment group (e.g. active drug or placebo) for
the duration of the trial (Schulz and Grimes 2002b). Those who are blinded may
include participants, investigators, assessors who collect the data, data safety
monitoring boards, and statisticians (Schulz and Grimes 2002b; Viera and
Bangdiwala 2007).
Blinding ensures that the participants, investigators and assessors are not
influenced by their knowledge of the intervention, thereby minimizing ascertainment
bias (Schulz et al. 1995, 2002; Schulz and Grimes 2002b; Forder et al. 2005).
In N-of-1 trials, blinding reduces the likelihood that participants will bias any
physical or psychological responses to therapy (e.g. quality of life or pain levels)
due to their preconceived perception of the value of treatment, and the reporting of
side effects (Schulz and Grimes 2002b; Matthews 2000; Friedman et al. 1998). As
participants in an N-of-1 trial receive both the control and intervention therapies
during the trial, some concerns that arise in traditional trials and are minimized by
blinding may be less relevant. These include participants seeking adjunct therapies
or withdrawing from the trial because they are dissatisfied at being randomized to
a placebo control.
For investigators, blinding reduces the influence of the knowledge of the inter-
vention on their attitude and advice to participants, patient management, their likeli-
hood of differentially adjusting the dose or administering co-interventions, or
differentially guiding participants to withdraw (Schulz and Grimes 2002b; Viera
and Bangdiwala 2007; Matthews 2000). Blinding reduces the likelihood that asses-
sors (which may be the investigator and/or another health professional) will differ-
entially assess an outcome based on their knowledge of the assigned treatment
(Viera and Bangdiwala 2007).
Please note that the term “investigators” in describing blinding is a term broadly
assigned to the trial team, which may include among others the trial designers, trial
recruiters, assessors, and health care providers treating the participant (Schulz and
Grimes 2002b).
Types of Blinding
Blinded trials can be of three types: single, double, or triple blind. The reader
should note that these terms do not have clear definitions and are often reported
incorrectly in the literature.
Commonly, in a single-blinded trial, the participant (or sometimes the
investigator) is unaware of the treatment assignment, but everyone else involved is
aware. In some cases, the term can refer to a situation where the participant and the
investigator know the treatment assigned, whereas the assessor is blinded (Schulz
and Grimes 2002b).
A single-blinded trial may be the best approach when there is a clear rationale that
the participant must be kept blind to reduce bias, but the investigator's knowledge
of the treatment is critical for the participant's health and safety (Friedman et al.
1998). Alternatively,
if the intervention is actually delivered by a clinician, and the N-of-1 trial is for the
purpose of guiding treatment decisions, it may not be practical to blind the clinician.
The disadvantage of the single-blind is that the investigator may consciously or
subconsciously bias the study through biased data collection, differential
prescription of concomitant therapy, or differential patient management (Friedman
et al. 1998).
In a double-blinded trial, the participants, investigators and assessors are blinded
throughout the trial (Schulz and Grimes 2002b). As already stated, a double-blind
approach eliminates or minimizes the risk of bias in the trial. In both double-blind
and triple-blind (see below) trials, it is important that there are procedures for
blinding when assigning the interventions, and that a separate body that can be
unblinded, such as an "Independent Data Monitoring Committee" (see Chap. 10 on
Adverse Events), is responsible for assessing the data for any adverse effects and
benefit (Friedman et al. 1998).
A triple-blinded trial has the same characteristics as a double-blind trial with the
addition that those who adjudicate the study outcomes are also blinded (Schulz and
Grimes 2002b; Forder et al. 2005). This can be achieved by ensuring the data
monitoring committee are not provided with the identity of the groups; instead, the
groups are assigned codes such as A and B. This approach assumes the data
monitoring committee could be biased in their assessment of adverse effects or
benefit if the randomization status were known to them. Some investigators feel
this may impede
the ability of the committee to perform their tasks of safeguarding participants by
looking at individual cases (Friedman et al. 1998). This is a decision which needs to
be made during the design of the trial in consultation with the data monitoring com-
mittee. If a triple-blinded trial is chosen, the data monitoring committee should have
the option to break the blind if the direction of an observed trend in group A or B
requires a further unblinded investigation (Friedman et al. 1998). Some trials blind
the study statistician to reduce bias during analysis again by assigning the dummy
codes of A and B to the trial groups (Matthews 2000). The groups only become
unblinded at the end of the analysis for reporting purposes.
Utilizing a double or triple blind is highly recommended in N-of-1 drug trials to
determine drug efficacy.
A placebo is a pharmacologically inactive agent used in the control group of the trial
(Schulz et al. 2002). It is used in trials where the investigator is not assessing the
effectiveness of a new active agent against an effective standard agent. Indeed, it is
more ethically sound not to provide a placebo control if an effective standard agent
exists to act as the control. However, even if a standard agent is used as a control,
investigators may include placebos by using the double-dummy method for blind-
ing (see the following section). The use of a placebo agent is critical for achieving
trial blindness.
The use of a placebo not only maintains the blind, but also accounts for the
"placebo effect" in the trial. The placebo effect occurs when an inactive agent
administered to a patient nevertheless has a beneficial effect on the attitude of
participants, thereby producing a response (Schulz and Grimes 2002b; Matthews
2000). The placebo
effect occurs both in the control group and the intervention group; therefore, the
provision of a placebo balances the placebo effect across the trial groups (Schulz
and Grimes 2002b).
In some trials, investigators may involve an active placebo, rather than an inac-
tive placebo. An active placebo contains substances that will produce the symptoms
or side effects that will occur in the active intervention agent, thereby ensuring the
blind is not broken as these effects otherwise would identify the active investiga-
tional agent (Schulz and Grimes 2002b). Most placebo-controlled trials use an inac-
tive placebo.
To ensure a blinded trial, the placebo control agent and the intervention agent must
be similar.
This is especially important in cross-over trials including N-of-1 trials where
participants receive both the control (e.g. placebo) and intervention agents. The trial
agents must be similar in appearance (size, shape, color, sheen, and texture)
(Friedman et al. 1998). It may be necessary to also ensure that the taste (and odor)
is the same using masking agents. It is good practice to pre-test the similarity of the
trial agents by asking a group of independent observers not involved in the trial to
see if they can observe any differences (Friedman et al. 1998).
The most common method for drug matching in trials, and to reduce trial costs,
is to over-encapsulate the trial agents. Over-encapsulation is the process of placing
trial tablets or capsules in a hard gelatin capsule and backfilling with an inactive
excipient to produce identical capsules. When an investigator decides to produce
drug matches by over-encapsulation they must consider the size of the final capsule
and whether this will be difficult to swallow for participants who have swallowing
difficulties, such as older people, stroke patients, and young children.
Unblinding Procedures
All blinded trials must have unblinding processes for individual participants,
especially for a "Data Monitoring Committee" to assess the benefit and safety of the
trial, and also for doctors to be able to identify what agents an individual is prescribed in
a medical emergency (Matthews 2000). In many cases, the participant can be
withdrawn from the trial medication without breaking the blind. Unblinding
procedures may involve an individual or group (e.g. trial pharmacy) other than the
trial investigator to ensure the investigator and participant can remain blinded. If a
participant or investigator is unblinded, this must be noted as a protocol deviation/
violation. For additional information on unblinding procedures, see Chap. 10.
One major advantage of N-of-1 trial designs over other trial designs is that at the end
of the trial for an individual participant, the individual’s data can be analyzed and a
report prepared on the effectiveness of the intervention compared to control, for a
clinician to consult with the patient on whether to continue with trial medication
under routine care (see Chap. 9). However, if the study is a set of N-of-1 trials, a
problem arises, as even though an individual patient has finished the trial and a report
has been generated, other participants are still to be recruited or are still being fol-
lowed in the trial by the same investigator. Clinicians could possibly perceive a pat-
tern of treatment effect in certain sorts of individuals, whether that is a true observation
or not, thereby risking ascertainment bias. The challenge is how to produce a report
for an individual during a live trial while maintaining the blind of the remaining par-
ticipants and the investigators. A procedure adapted by our research group is to have
the unblinded statistician provide the individual analyses and send the report data to
an independent academic clinician, who will prepare the report and send it to the
patient’s doctor for consultation. If the statistician is also blinded, the statistician can
conduct an individual data analysis using dummy codes and send the blinded indi-
vidual findings to the academic clinician not involved in the trial to unblind and
prepare the individual patient's report. Another strategy is to have one investigator
responsible for analyzing and reporting the trial results to the patient, and a dif-
ferent investigator recruiting and conducting assessments for the trial.
For many journals, the extent of blinding must be reported according to the
CONSORT statement on reporting trial findings (Schulz et al. 2010). Information
must be provided on who was blinded, the methods of blinding, on what character-
istics the control and intervention agents were matched, where the randomization
schedule was held, if individuals or the trial were unblinded at any stage, and how
the success of the blind was assessed (Schulz and Grimes 2002b; Viera and
Bangdiwala 2007).
Conclusion
Investigators should dedicate adequate time and resources to prepare a trial protocol,
prior to study commencement, that details the procedures of randomization, alloca-
tion concealment and blinding. In N-of-1 trials, this should also include the prepara-
tion of the individual patient report for the patient’s doctor for consultation while
maintaining the blind for those who remain in the trial.
The implementation of these procedures during the trial will ensure the preven-
tion or minimization of confounding and bias, and accordingly their influence on
the estimate of the treatment effect, to allow reporting of accurate and credible
findings.
References
Schulz KF, Chalmers I, Altman DG (2002) The landscape and lexicon of blinding in randomized
trials. Ann Intern Med 136:254–259
Schulz KF, Altman DG, Moher D, CONSORT Group (2010) CONSORT 2010 statement: updated guidelines
for reporting parallel group randomised trials. BMJ 340:c332
Viera AJ, Bangdiwala SI (2007) Eliminating bias in randomized controlled trials: importance of
allocation concealment and masking. Fam Med 39:132–137
Yelland MJ, Nikles CJ, Mcnairn N, Del Mar CB, Schluter PJ, Brown RM (2007) Celecoxib com-
pared with sustained-release paracetamol for osteoarthritis: a series of n-of-1 trials.
Rheumatology (Oxford) 46:135–140
Chapter 8
Data Collection and Quality Control
Hugh Senior
Abstract To achieve a reliable data set for analysis that complies with the protocol,
a system of clinical data management (CDM) is critical. CDM is the planning and
process of data collection, integration and validation. This chapter provides a synop-
sis of the key components of CDM which need to be considered during the design
phase of any trial. Topics addressed include the roles and responsibilities of research
staff, the design of case report forms for collecting data; the design and develop-
ment of a clinical database management system, subject enrolment and data entry,
data validation, medical coding, database close-out, data lock and archiving. An
additional section discusses the rationale behind trial registration.
Introduction
Data management is an essential component in clinical trials to ensure that data that
are analyzed are reliable and statistically sound (Krishnankutty et al. 2012). Clinical
data management (CDM) aims to ensure high quality data by minimizing errors and
missing data to ensure a reliable dataset for analysis (Krishnankutty et al. 2012).
CDM is the process of collection, integration and validation of data in a clinical
trial. High quality CDM ensures that data collected for analysis is accurate and
complies with the protocol-specified requirements (Krishnankutty et al. 2012).
The main purpose of this chapter is to provide a synopsis of CDM for N-of-1
trials. For larger scale trials including multinational trials or trials evaluating inves-
tigational products for the purpose of registration of a new product, we refer the
reader to the guidelines by the working group on data centers of the European
Clinical Research Infrastructures Network (ECRIN) (Ohmann et al. 2011).
The size of a CDM team in a small study depends on the budget and the scope of the
project. Often, research staff such as project managers may need to take on addi-
tional CDM roles.
The CDM team members are typically data managers, database programmers/
designers, medical coders, a clinical data coordinator, a data entry associate, and a
quality control associate (Krishnankutty et al. 2012).
The investigators also have a critical role in CDM in ensuring data quality by
collecting data that are accurate, complete and verifiable.
The data manager oversees the entire CDM process and liaises with the study
researchers and project manager (also known as the clinical research coordinator).
They are responsible for preparing the data management plan which describes the
database design, data entry, data tracking, quality control measures, serious adverse
event (SAE) reconciliation, discrepancy management, data extraction (including
assisting the statistician in preparing data sets for analysis) and database locking.
They approve all CDM activities (Krishnankutty et al. 2012; McFadden 2007).
The database programmer/designer performs case report form (CRF) annotation,
creates the database, and programs and tests the data validation system. Further,
they design data entry screens including access via the web for data entry
(Krishnankutty et al. 2012; McFadden 2007).
The medical coder codes the data for medical history, medications, co-
morbidities, adverse events, and concomitant medications (Krishnankutty et al.
2012). The clinical data coordinator is responsible for CRF design, CRF completion
guides, a data validation plan (DVP) and discrepancy management (Krishnankutty
et al. 2012).
Quality control associates check accuracy of data entry and conduct audits of the
data for discrepancies (Krishnankutty et al. 2012).
Data entry associates track CRFs and perform data entry into the database
(Krishnankutty et al. 2012).
ICH-GCP states that “all clinical trial information should be recorded, handled, and
stored in a way that allows its accurate reporting, interpretation, and verification”
(ICH E6 Section 2) (International Conference on Harmonisation 2015). To comply,
a trial must have standard operating procedures for data collection, processing
and storage.
Case Report Forms (CRFs) are printed, optical or electronic documents designed
to record all of the protocol-required information to be reported to the sponsor on each
trial subject (ICH-GCP E6) (International Conference on Harmonisation 2015).
The CRFs must seek data that are specified in the protocol, and any data that are
required by regulatory bodies.
The types of CRFs required in a clinical trial may include the following forms:
screening/eligibility form, demographics, randomization form, medical history,
physical examination and vital signs, concomitant/concurrent medications, key
efficacy endpoints of the trial and other efficacy data, adverse events, and laboratory
tests. If validated questionnaires are used in a CRF, the CRF should preserve the
integrity of the questionnaire to maintain its validity.
CRFs must be written and formatted to be user-friendly and self-explanatory
(Krishnankutty et al. 2012). Most CRFs collect data through multiple-choice
responses, including yes/no responses, pre-coded tables, validated questionnaires or
Likert scales, which allows the data collected across subjects to use the same
terminology and to be easily combined (Liu et al. 2010). Generally, if free text is
sought, it is kept to a minimum, with the exception of the recording of adverse
events.
Thought also needs to be given to who will collect the data and when. For example,
if baseline data include both cardiology department data and medical imaging
department data, it may be better to have separate CRFs for each department
instead of requiring each department to complete a section of the same form
(McFadden 2007).
Often the data collected are conditional on a previous question. For example, if a
response is “Yes” for a medical history of diabetes, additional information may be
sought through dependent questions on insulin use or diet, but only for those who
responded with a yes. These series of questions are termed conditional data fields
(Liu et al. 2010).
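A minimal sketch of an edit check for such conditional data fields follows: dependent questions must be answered if, and only if, the parent question was answered "yes". The field names are hypothetical.

def check_conditional(record):
    issues = []
    if record.get("diabetes") == "yes":
        for dep in ("insulin_use", "diet_controlled"):
            if record.get(dep) is None:
                issues.append(f"{dep} required when diabetes = yes")
    else:
        for dep in ("insulin_use", "diet_controlled"):
            if record.get(dep) is not None:
                issues.append(f"{dep} should be blank when diabetes != yes")
    return issues

print(check_conditional({"diabetes": "yes", "insulin_use": None,
                         "diet_controlled": "no"}))
# ['insulin_use required when diabetes = yes']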
All CRFs should contain a header with the trial identifier and name, site number,
and the subject identification number stated. They should also contain a section at
the end of the form for the investigator/subject who completes the CRF to sign and
date the form. The signature should match those authorized to complete CRFs
according to the study signature log. CRFs should have version information in the
footer along with page numbers (X of Y). The version number should be the most
recent and be the version that has been ethically approved and listed on the version
control log. It is also helpful to name the form in the header; for example, Form A
may be the screening or enrolment form, Form X the adverse event form, etc.
To ensure quality CDM, the CDM team in collaboration with the project manager
should develop a Data Management Plan (DMP), which describes the data management
procedures including listing roles and responsibilities of personnel, a final set of
CRFs, design of the database, data entry procedure, data query rules, query handling,
quality assurance including audit trail checks, electronic data transfer rules, database
backup and recovery, archiving, database security, procedures for database locking
and unlocking, and reports (Ohmann et al. 2011).
The development of CDM should begin in the early stages of a study. The database
is created on the basis of the protocol-derived CRFs. The types of data, including
the number of decimal places and the units, should be clear from the CRFs when
developing the database.
Databases have to be designed to allow the linking of multiple database records
for an individual subject, by including a unique subject number for each record
(McFadden 2007). On the database, subjects must be de-identified, where each
subject is given a unique subject number that serves as the subject identifier within
the database (Liu et al. 2010).
When developing the database, the designer has to configure it to accommodate
the study objectives, assessment intervals, visits, users with different levels of
access, and the different sites and subjects.
Depending on the requirements of the project and the study budget, there is a range
of software available to create a database, from MS Excel, MS Access and open-source
software (TrialDB, OpenClinica, openCDMS, PhOSCo) to specialized software (Oracle
Clinical, REDCap, Clintrial, eClinical suite), among others. As a rule, these software
packages are designed to comply with regulatory requirements for conducting clinical
trials, but it is the responsibility of the sponsors to confirm this assumption.
If the purpose of a study is to provide evidence for regulatory approvals of new
medications or new indications, the database must comply with ICH-GCP and
country-specific regulations, and we recommend the employment of an experienced
clinical data manager. For these studies, it is imperative to ensure that the software
allows an audit trail of all activity in the database (Krishnankutty et al. 2012).
Security of the database is paramount, and most software allows the data manager
to restrict each user's access to only the parts of the database they require according
to their roles and responsibilities, thereby preventing any unauthorized access or
changes to the database. The data manager must maintain a list of individuals
who are authorized to have access and make changes to the data. Further, the database
must be constructed in such a way to ensure the safeguarding of any trial blinding.
When a database user enters or changes data, most software provides an audit trail
by recording the change made, the user's details, and the date and time of the change
(Krishnankutty et al. 2012). Further, the database must have a system of automatic
backups to ensure data preservation.
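A minimal sketch of such an audit-trail record (not modelled on any particular CDM product) might look as follows; in practice the trail would live in a protected database table rather than an in-memory list:

```python
# Sketch of an audit trail: every change preserves the original value,
# the user's identity, and the date and time of the change.
import datetime

audit_trail = []  # in a real system: a write-protected database table

def update_field(record, field, new_value, user):
    """Change a field on a CRF record, logging the change."""
    audit_trail.append({
        "record_id": record["record_id"],
        "field": field,
        "old_value": record.get(field),
        "new_value": new_value,
        "user": user,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    record[field] = new_value

crf = {"record_id": 17, "pain_score": 4}
update_field(crf, "pain_score", 5, user="investigator_01")
print(audit_trail[-1])
```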
With all CDM, it is important to develop a system to maintain quality control of the
data during all stages of the trial. ICH-GCP states that “Quality control should be applied
to each stage of data handling to ensure that all data are reliable and have been processed
correctly” (ICH-GCP E6 Section 5) (International Conference on Harmonisation 2015).
After subjects have provided informed consent, and prior to the collection of data,
subjects are enrolled into a clinical trial using an “enrolment form”. The enrolment
form is used to screen subjects for eligibility into the trial against the inclusion and
exclusion criteria specified in the protocol.
Data recorded on CRFs must be verifiable against source documents, which ICH-GCP
defines as original documents, data, and records (e.g., hospital records, clinical and
office charts, laboratory notes, X-rays, subject files, and records kept at the pharmacy,
at the laboratories and at medico-technical departments involved in the clinical trial)
(ICH-GCP E6 Section 1) (International Conference on Harmonisation 2015).
On the CRFs, subjects must be de-identified: each subject is given a unique subject
number that serves as the subject identifier within the database (Liu et al. 2010).
It is sound practice to also record the subject's initials on the CRFs along with the
unique subject number. Identifiers that must be kept separate from the CRFs and
database include names, addresses, email addresses, contact phone numbers, social
security numbers or equivalent, medical record numbers, and photos (Liu et al.
2010); in short, any information for which there is a reasonable basis to believe it
can identify the individual.
The requirement to de-identify subjects' data is reiterated in ICH-GCP, which
states that “the confidentiality of records that could identify subjects should be
protected, respecting the privacy and confidentiality rules in accordance with the
applicable regulatory requirements” (ICH E6 Section 2) (International Conference
on Harmonisation 2015).
The subject's name and initials appear on a subject log along with the unique
subject number; this log is stored separately from the CRFs and the database. The
subject log and a contact details form should be available for the verification of
source documents and subject follow-up (Liu et al. 2010).
After completion of a CRF, if an investigator wishes to make a correction or
change to a CRF, they should put a line through the original data without obscuring
the original entry, write the new data alongside, date and initial the change (and the
initials should match those in the signature log), and provide an explanation for the
change. This applies to both written and electronic changes or corrections (ICH E6
Section 4) (International Conference on Harmonisation 2015). Most CDM com-
puter software automatically records any changes made on electronic CRFs includ-
ing the identity of the investigator, the original data, and date and time of the change.
Submission of study data to the CDM team may be by mail, courier, fax or elec-
tronically (Liu et al. 2010). Timely submission is critical to ensure deadlines for
data entry are met (Liu et al. 2010) and data validation can occur. SAE report forms
are usually required within 24 h of the investigator becoming aware of an SAE. CRFs
can be submitted when completed or at specified intervals depending on the process
for verification of source documents at the site (Liu et al. 2010).
To minimize the risk of transcription errors, data are often entered from p-CRFs
into e-CRFs using double data entry, often by two operators separately. Any dis-
crepancy is checked and resolved, leading to a cleaner database (Krishnankutty
et al. 2012). With single data entry, the data operator must double-check that the
data entered matches exactly the data on the p-CRF, and retain the p-CRFs in the
subject folder.
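The comparison step can be sketched as follows (illustrative only; the fields shown are hypothetical):

```python
# Sketch of double data entry: two independent entries of the same p-CRF
# are compared field by field, and any disagreement is raised as a
# discrepancy to be resolved against the paper form.
def compare_entries(entry1, entry2):
    return {field: (entry1[field], entry2.get(field))
            for field in entry1
            if entry1[field] != entry2.get(field)}

first = {"subject_no": "001", "pain_score": 4, "weight_kg": 72.5}
second = {"subject_no": "001", "pain_score": 4, "weight_kg": 75.2}
print(compare_entries(first, second))
# {'weight_kg': (72.5, 75.2)} -> check the p-CRF and correct the database
```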
According to ICH-GCP, every clinical trial must have a designated staff member
who will monitor trial activities, called a clinical research associate (CRA) or a
study monitor. One role of a CRA is monitoring data entry into a CRF to ensure
completeness of data. The CDM team tracks CRFs to ensure data is collected at the
right assessment point, to check for missing pages on CRFs or illegible data, to raise
data discrepancies to the investigator seeking resolution, and to ensure data completeness,
timeliness and accuracy.
An important role of the CRA is to conduct site visits to ensure that data recorded
on a CRF and derived from a source document are consistent with that source
document and, where they are inconsistent, that the discrepancy is explained by the
investigator (ICH-GCP E6 Section 4) (International Conference on Harmonisation
2015). Where a discrepancy exists between a CRF entry and the source document,
the CRF needs to be corrected to match the source document, or an explanation
provided as to why the CRF is correct.
Data Validation
To ensure the quality of data in a database, data validation occurs throughout data
collection. Data validation is the process of testing the validity of the data against
protocol specifications (Krishnankutty et al. 2012). The project manager must
provide the CDM team with an edit specifications list which describes which data
are to be checked and queried, and this is programmed into the database. If the
investigators are not using a database with edit checking capabilities, they will have
to conduct edit checks manually prior to any data entry of a CRF.
Edit check programs test each variable with a logic condition specific for the
variable. All edit check programs are tested with dummy data before the database
goes live to ensure they are working (Krishnankutty et al. 2012). Any discrepancy
that occurs between the logic condition and the variable will raise a data query. The
types of discrepancies may include missing data that must be entered, data that is
not consistent with other data recorded on the forms, data out of range (range
checks), and protocol deviations (Krishnankutty et al. 2012; Liu et al. 2010).
Examples of edit checks include checking that eligibility criteria are met prior to
randomization, and checking that variables such as height and weight fall within a
specified range. Edit checks also need to allow for legitimate outliers: a subject may
genuinely be taller than the range because their height lies more than the pre-specified
two standard deviations above the population mean. Such a query would be resolved
by confirming that the recorded height is correct.
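As a minimal sketch (the variables and limits are invented for illustration), programmed edit checks pair each variable with a logic condition and raise a data query, rather than rejecting the value outright, when the condition fails:

```python
# Sketch of programmed edit checks: missing-value and range checks that
# raise data queries; an out-of-range value may still be confirmed as
# correct during query resolution.
EDIT_CHECKS = [
    ("height_cm", lambda v: v is not None,                 "missing value"),
    ("height_cm", lambda v: v is None or 120 <= v <= 210,  "out of range"),
    ("weight_kg", lambda v: v is None or 30 <= v <= 200,   "out of range"),
]

def run_edit_checks(record):
    return [f"Query on {field}: {reason} (value={record.get(field)!r})"
            for field, check, reason in EDIT_CHECKS
            if not check(record.get(field))]

print(run_edit_checks({"height_cm": 215, "weight_kg": 98}))
# ['Query on height_cm: out of range (value=215)']
```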
With e-CRFs, many of the edit checks can occur immediately, raising data que-
ries as data are entered into the database. Other data queries will be raised as data
validation processes are conducted at regular intervals, and the investigator will
resolve any queries after logging into the system (Krishnankutty et al. 2012).
Further, data queries may be logged on a study report or data clarification form
derived from the database which the project manager, CDM team and investigator
can access to ensure the investigator resolves queries quickly.
The CDM team is responsible for data quality checks to identify missing, illogical
and inconsistent data, both automatically in batches and manually. Manual checks
review the CRFs for any missed discrepancies, such as laboratory data or medical
records recorded on a CRF that suggest an adverse event has occurred when no such
event is recorded on the adverse event or serious adverse event forms.
Medical Coding
Often data are collected in clinical trials that require medical coding: the process of
classifying disease sub-types, medical procedures, medical terminologies and
medications so that they can be counted in statistical analysis. In large-scale trials,
the coding will be entered directly into the database by a qualified medical coder. In
smaller studies with limited resources, however, the research or CDM team will
have to assign a person with appropriate medical knowledge to conduct the medical
coding, using a medical dictionary and knowledge of the hierarchy of classifications
in medical coding so that terms are coded within their proper classes.
Common dictionaries for coding include the Medical Dictionary for Regulatory
Activities (MedDRA) for coding adverse events, medical history and medical terms;
the WHO Adverse Reactions Terminology (WHOART) for coding adverse events;
and the WHO Drug Dictionary Enhanced (WHO-DDE) for coding medications
(Krishnankutty et al. 2012; Liu et al. 2010). The dictionaries to be used for coding
must be specified in the protocol.
Prior to database close-out, a final data quality check and validation is made to
ensure that no discrepancies remain other than those assessed to be “irresolvable”.
All study activities are completed, including medical coding and data cleaning
procedures, and external data are reconciled (Ohmann et al. 2011). The datasets
required by the study statistician are finalized (Krishnankutty et al. 2012). Upon
approval of the steering committee, the database is locked and the data extracted for
analysis; after database lock, no data can be changed (Krishnankutty et al. 2012).
After data extraction, the database is archived. The data may be archived along
with essential documents on CDs/DVDs. The required archiving period for essential
documents can range from 2 to 15 years. If there is a marketing application, ICH-GCP
requires archiving for a minimum of 2 years after the approval of the application; if
no marketing application is made, records must be archived for 2 years after the end
of the trial (ICH-GCP E6 Section 4) (International Conference on Harmonisation
2015). Sponsors and researchers must be aware of the regulations and guidelines on
retention of records of their specific local and national regulatory bodies; for
example, in trials involving children in Australia the retention period can be 28 years.
If documents are archived off-site due to space constraints, a record of the location
of the documents must be kept by the sponsor or researcher (Liu et al. 2010).
From July 2005, the International Committee of Medical Journal Editors (ICMJE)
adopted a policy (De Angelis et al. 2005) that all member journals will only consider
a trial for publication if it has previously undergone registration in a trial registry.
N-of-1 trials are not exempt from this position of the ICMJE. A clinical trial is
defined by the ICMJE as “any research project that prospectively assigns human sub-
jects to intervention or comparison groups to study the cause-and-effect relationship
between a medical intervention and a health outcome” (De Angelis et al. 2005).
The impetus for trial registries, and for the ICMJE policy statement, is the substantial
under-reporting of the findings of clinical trials. Under-reporting potentially biases
the overall knowledge of the effect of a medical intervention, leading to over-estimates
of benefit and under-estimates of harm (McGauran et al. 2010; Chalmers et al. 2013).
Under-reporting occurs in both commercially sponsored and academic trials
(Chalmers et al. 2013).
In clinical practice, the evidence for an intervention in many cases does not arise
from a single trial, but from the collective body of evidence as assessed by systematic
review and meta-analysis. Selective reporting therefore misrepresents the true
effectiveness of a medical intervention and negatively impacts both clinical
guidelines and practice. Registering a trial before it is conducted places awareness
of the trial in the public domain.
An additional important ethical consideration is that subjects who volunteer for
a clinical trial do so because they assume that they are advancing medical knowledge.
This places an obligation on the trial sponsor and investigators not to betray this
trust (Chalmers et al. 2013; De Angelis et al. 2005). The responsibility is clearly
stated in the Declaration of Helsinki: “Every clinical trial must be registered in a
publicly accessible database before recruitment of the first subject” (World Medical
Association 2008). In some countries, clinical trials must also be registered to
comply with policies from funding bodies or with legislation (including that of the
European Commission and the US Food and Drug Administration).
Clinical trials can be registered in ICMJE-approved registries (International
Committee of Medical Journal Editors 2015) or within the WHO registry network
(World Health Organization 2014). A trial needs to be registered only once, although
some investigators may register a trial in more than one country depending on
specific funders' or countries' policies and regulations.
Conclusion
At the end of the hard work of seeking funding, preparing and conducting a clinical
trial to answer an important research question, ideally researchers will have pro-
duced an accurate and complete database for analysis. The phenomenon of ‘garbage
in, garbage out’ (GIGO) applies to clinical trials. Databases that are incomplete and/
or inaccurate increase the risk of biased findings. To avoid the GIGO phenomenon
in N-of-1 trials, it is imperative to engage a clinical data management system that
adopts well-designed and user-friendly CRFs and databases, a data validation and
CRF tracking system, an audit trail, and clinical monitoring.
References
Chalmers I, Glasziou P, Godlee F (2013) All trials must be registered and the results published.
BMJ 346:1–2
De Angelis CD, Drazen JM, Frizelle FA, Haug C, Hoey J, Horton R, Kotzin S, Laine C, Marusic
A, Overbeke AJ, Schroeder TV, Sox HC, Van Der Weyden MB, International Committee of
Medical Journal Editors (2005) Is this clinical trial fully registered? A statement from the
International Committee of Medical Journal Editors. Lancet 365:1827–1829
International Committee of Medical Journal Editors (2015) ICMJE clinical trials registration
[Online]. Available: http://www.icmje.org/about-icmje/faqs/clinical-trials-registration/
International Conference on Harmonisation (2015) ICH official web site: ICH [Online]. Available:
http://www.ich.org/home.html
Krishnankutty B, Bellary S, Kumar NB, Moodahadu LS (2012) Data management in clinical
research: an overview. Indian J Pharmacol 44:168–172
Liu MB, Davis K, Duke Clinical Research Institute (2010) A clinical trials manual from the Duke
Clinical Research Institute: lessons from a horse named Jim. Wiley-Blackwell,
Chichester/Hoboken
McFadden E (2007) Management of data in clinical trials. Wiley-Interscience, Hoboken
McGauran N, Wieseler B, Kreis J, Schuler YB, Kolsch H, Kaiser T (2010) Reporting bias in
medical research – a narrative review. Trials 11:37
Ohmann C, Kuchinke W, Canham S, Lauritsen J, Salas N, Schade-Brittinger C, Wittenberg M,
McPherson G, McCourt J, Gueyffier F, Lorimer A, Torres F, ECRIN Working Group on Data
Centres (2011) Standard requirements for GCP-compliant data management in multinational
clinical trials. Trials 12:85
Schulz KF, Altman DG, Moher D, CONSORT Group (2010) CONSORT 2010 statement: updated
guidelines for reporting parallel group randomised trials. BMJ 340:c332
World Health Organization (2014) The WHO registry network [Online]. Available: http://www.
who.int/ictrp/network/en/
World Medical Association (2008) Ethical principles for medical research involving human
subjects [Online]. Available: http://www.wma.net/en/30publications/10policies/b3/17c.pdf
Chapter 9
Individual Reporting of N-of-1 Trials
to Patients and Clinicians
Michael Yelland
Abstract This chapter offers a very practical account of the reporting of N-of-1
trials to patients and clinicians, using trials for chronic pain conditions as models
which may be applied to many other forms of N-of-1 trials. It draws from the
author’s experience in managing N-of-1 trials comparing celecoxib with extended
release paracetamol for chronic pain and osteoarthritis and comparing gabapentin
with placebo for chronic neuropathic pain. Reporting the results of N-of-1 trials to
patients and health care professionals requires considerable planning to make
reports user-friendly and efficient tools for clinical decision making. Decisions
need to be made about key elements of the report, how to order them with the most
important summary elements first followed by detailed results, and how to set
thresholds for clinically important changes. The inclusion of tables and graphs in
reports should improve readability. An example of an individual report is provided.
M. Yelland MBBS, Ph.D., FRACGP, FAFMM, Grad Dip Musculoskeletal Med (*)
School of Medicine, Griffith University and Menzies Health Institute,
Gold Coast, QLD, Australia
e-mail: m.yelland@griffith.edu.au
Aim of Reporting
Decisions must be made about the elements of the trial that are essential to report.
Key elements of the report may include:
• Patient details
• Description of the trial – medications compared; order of medication periods;
marker joint/region; date of report
• Conclusion/summary of overall response
• Summary of outcomes used to determine the overall response
• Use of other medications during the trial
• Success of blinding
• Detailed results of the individual outcome measures, including graphs and/or
tables of relevant data points.
The results we reported were a mix of quantitative and qualitative outcomes. See
Fig. 9.1 at the end of this chapter for an example of the report we generated. The
quantitative outcomes included mean scores on numerical rating scales ranging
from zero to ten, on which the severity of symptoms, functional loss (Yelland et al.
2007; Yelland et al. 2009) or sleep disturbance (Yelland et al. 2009) was rated. In
arriving at these means, we omitted the first week of data from each two-week
period to negate any carry-over effects from the preceding period.
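A sketch of that calculation follows (the daily scores below are invented for illustration):

```python
# Sketch of the period-mean calculation: the first week of each two-week
# treatment period is discarded to negate carry-over, and the mean score
# is taken over the remaining week.
def period_mean(daily_scores, days_per_week=7):
    """daily_scores: daily 0-10 ratings for one two-week period."""
    second_week = daily_scores[days_per_week:]  # drop week 1
    return sum(second_week) / len(second_week)

celecoxib_period = [6, 5, 6, 5, 5, 4, 5,  4, 4, 5, 4, 4, 3, 4]
print(f"Mean pain score (week 2 only): {period_mean(celecoxib_period):.1f}")
```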
The qualitative outcomes included medication preference between the current
period and the preceding period and a summary of adverse events during treatment
periods. Medication preference was recorded as a preference for one of the two
medications or no preference. Adverse events were listed and tallied for each period.
Reporting the results of N-of-1 trials should be in a format that suits the needs of the
‘consumers’ of the trial service, namely the patient and their health care profes-
sional. Their needs may differ, with some wishing to just read the ‘headlines’ or
conclusions, and others wishing to read the fine details that underlie these conclu-
sions. For this reason we chose to put the conclusions very early in the report,
directly after the description of the order of medication periods within the trial.
The raw data from treatment periods can be reported in graphical, tabular or
descriptive formats, depending on their nature. Data from washout periods, where a
carry-over effect may apply, can be omitted. For quantitative data collected on a
daily or weekly basis, we used graphs of scores over time. The means of scores for
each treatment period were presented in a table to allow easy comparison between
treatment periods (Fig. 9.1).
The qualitative data on medication preferences and description of adverse events
were presented in tabular format.
Name: XXXXX   ID: XXX   Sex: Z   DOB: DD-MM-YY
IMET: Celecoxib 200 mg vs Extended-release Paracetamol 665 mg
Dates of IMET: DD-MM-YY to DD-MM-YY
Marker joint/region: ZZZ ZZZZZ
Date of report: DD-MM-YY
Medication Diary:
1st Pair   Week 1-2   Celecoxib
           Week 3-4   Paracetamol SR
CONCLUSION
There were small differences in pain, stiffness and functional scores favouring paracetamol over
celecoxib throughout the IMET, but none of these differences was detectable by the patient. There
was no consistent preference for either medication and no adverse events or use of extra analgesics
with either medication. Overall there was no difference in the response to paracetamol and
celecoxib.
PAIN, SLEEP INTERFERENCE AND FUNCTIONAL SCORES: These scores were recorded on 0 to 10 scales
daily throughout the IMET. The mean differences between medications for each outcome and the probability that
these changes were detectable and clinically important are given below. Note a 50% probability signifies pure
chance and 100% signifies certainty.
USE OF OTHER TREATMENT: The patient used no additional pain medication during the IMET.
BLINDING OF MEDICATIONS: The patient correctly guessed that she was taking paracetamol in one period and
celecoxib in another period, but incorrectly guessed she was taking celecoxib in another period.
*Minimum detectable difference for pain and stiffness is 1.0 (J Rheumatol 2000;27(11):2635-41). Minimum clinically important difference for pain and stiffness is 1.75 (Pain 2001;94(2):149-58 & J Rheumatol 2001;28(2):427-30).
~Minimum detectable difference for function is 2 (Physical Therapy 1997;77(8):820-9). Minimum clinically important difference for function is not known.
DETAILED RESULTS
PAIN, STIFFNESS AND FUNCTIONAL SCORES
Week | Medication | Medication guess | Average pain score* | Average stiffness score^ | Average functional score~
[Table body not reproduced.]
GRAPHS
[Three graphs, one each for PAIN, STIFFNESS and FUNCTIONAL SCORE, plot the daily 0–10 scores for each period: Pre-IMET, Weeks 1-2 (Celecoxib), Weeks 3-4 (Paracetamol SR), Weeks 5-6 (Celecoxib), Weeks 7-8 (Paracetamol SR), Weeks 9-10 (Celecoxib), Weeks 11-12 (Paracetamol SR).]
Fig. 9.1 De-identified report of a patient who completed an individual medication effectiveness
test or N-of-1 trial of celecoxib versus sustained-release paracetamol for osteoarthritis
Defining what constitutes a response for each outcome can present some challenges.
This is in part because of the different ways outcomes are recorded and in part
because the threshold for response varies from individual to individual. With quan-
titative outcomes, the traditional way of defining differences in population trials is
to test for statistical significance of the difference between the mean change over
time for two groups of participants. In single patient datasets, the difficulties of
using inferential statistics with a significance threshold of 0.05 (i.e. 5 %) have
been discussed elsewhere in this book, in the section on analysis. Bayesian methods
are more appropriate for calculating and expressing differences in response to treat-
ment within the individual (Zucker et al. 1997). In Bayesian statistics results are
expressed as the probability that a nominated trend is present, e.g. there is an 87 %
probability that pain scores on drug A are lower than on drug B. This is more
informative than reporting whether or not the difference in outcomes met a prede-
termined threshold of statistical significance.
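As a simple sketch of how such a probability might be computed (this is not necessarily the model used in the trials described here; it assumes independent daily scores, a normal likelihood, and vague priors, under which the posterior for the mean difference is approximately normal):

```python
# Sketch: posterior probability that mean pain on drug A is lower than
# on drug B, using a normal approximation with vague priors.
import numpy as np
from scipy import stats

pain_a = np.array([4.0, 3.5, 4.5, 3.0, 4.0, 3.5, 4.0])  # invented scores
pain_b = np.array([5.0, 5.5, 4.5, 6.0, 5.0, 5.5, 5.0])

diff = pain_a.mean() - pain_b.mean()
se = np.sqrt(pain_a.var(ddof=1) / len(pain_a) +
             pain_b.var(ddof=1) / len(pain_b))

p_a_lower = stats.norm.cdf(0.0, loc=diff, scale=se)  # P(mean A < mean B)
print(f"P(pain on A < pain on B) = {p_a_lower:.2f}")
```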
Nonetheless, it is desirable to have some method of setting a threshold for a dif-
ference in response between treatments over the course of each trial. This will allow
comparisons between individuals in a series of N-of-1 trials. For this some guidance
may be found in the literature on minimum clinically important differences (MCID),
minimum clinically important change (MCIC) and minimum detectable change
(MDC). These will have been derived for some outcomes from mean results in
population based clinical trials and so may not represent what an individual patient
regards as important or detectable. This could be determined prospectively in con-
sultation with the patient with a question such as “What is the minimum percentage
improvement in your (insert outcome) that would make this treatment worthwhile?”
(Yelland and Schluter 2006).
The MCID is defined as “the smallest difference in score in the domain of interest
which patients perceive as beneficial and would mandate, in the absence of
troublesome side effects and excessive cost, a change in the patient's management”
(Jaeschke et al. 1989), and can be calculated from the differences between groups
in a clinical trial. The MCIC, in contrast, is calculated from the differences within
individuals in a clinical trial (Bombardier et al. 2001). Many methods of estimating
these values exist (Copay et al. 2007). The commonest are ‘anchor-based’ methods,
which compare change scores in key outcomes over time with an anchor of the
patient's retrospective global assessment of response. For example, the MCIC for
pain may be the mean change in pain scores of those who regard themselves as
‘much better’. Alternatively, it may be the difference in mean change scores between
those who rate their improvement as ‘much better’ or ‘very much better’ and the
remainder, who did not do as well.
The other way of estimating clinically important changes is by using distribution
methods (Copay et al. 2007). One distribution method is based on the effect size,
which is calculated by subtracting the mean of the scores at baseline from the mean
of the scores at the follow-up point, and then dividing this difference by the standard
deviation at baseline.
An effect size of 0.5, described as ‘moderate’, seems to correlate best with the
MCIC calculated by anchor-based methods. However, standard deviations can vary
considerably from sample to sample, so some prefer other distribution-based methods
that use the standard error of measurement (SEM) to calculate what is called the
‘minimum detectable change’ (MDC). The SEM provides a measure of within-person
change that is less dependent on a specific sample because it incorporates both the
standard deviation and the reliability. The MDC is equivalent to one SEM and
represents the minimum change that is reliably detected by patients (Copay et al. 2007).
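In symbols, consistent with the description above (where $r$ denotes the reliability coefficient of the outcome measure):

```latex
\[
\mathrm{ES} = \frac{\bar{x}_{\text{follow-up}} - \bar{x}_{\text{baseline}}}
                   {SD_{\text{baseline}}},
\qquad
\mathrm{SEM} = SD_{\text{baseline}}\,\sqrt{1 - r},
\qquad
\mathrm{MDC} = 1 \times \mathrm{SEM}.
\]
```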
In our chronic pain trials, we reported both the MCID and the MDC for pain, but
used the MDC to define a response. Using the published MDC of 1.0 for pain scores
(Dworkin et al. 2008), a definite response was defined as an adjusted mean absolute
difference ≥1.0, a probable response as a difference of ≥0.5 but <1.0, and differ-
ences of less than 0.5 as no response.
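These thresholds translate directly into a simple classification, sketched here:

```python
# Sketch of the response definitions above, using the published MDC of
# 1.0 for 0-10 pain scores (Dworkin et al. 2008).
def pain_response(mean_abs_difference):
    if mean_abs_difference >= 1.0:
        return "definite response"
    if mean_abs_difference >= 0.5:
        return "probable response"
    return "no response"

print(pain_response(0.7))  # -> probable response
```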
A statement on the overall response conveniently provides a single result that sum-
marises the trial outcomes for the patient and clinician and is also useful in report-
ing on series of N-of-1 trials in scientific papers. However it should not necessarily
be the one measure that determines future treatment decisions. Defining an overall
response requires the aggregation and integration of the results from several out-
comes into a single result. This is not an easy task as it is necessary to make a
judgment about the relative value of each outcome. This may be at odds with the
relative value of each outcome for individual patients. Symptom relief may be the
most important outcome for a highly symptomatic patient, whilst absence of
adverse events may be more important to one who has suffered a lot of adverse
events in the past. We dealt with this dilemma in the celecoxib-paracetamol trials
by creating an aggregate response variable, composed from the five outcomes
weighted equally (Yelland et al. 2007). Each outcome was arbitrarily defined on a
5-point scale from −2 favouring celecoxib to +2 favouring paracetamol. An indi-
vidual with aggregate response absolute value ≥6 was considered a definite
responder, a value ≥3 but <6 was considered a probable responder, and a value <3
was considered a non-responder.
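A sketch of that aggregate classification follows; the outcome names are illustrative rather than the exact variables used in the trials:

```python
# Sketch of the aggregate response variable: five equally weighted
# outcomes, each scored from -2 (favouring celecoxib) to +2 (favouring
# paracetamol); the aggregate therefore ranges from -10 to +10.
def classify(outcome_scores):
    aggregate = sum(outcome_scores.values())
    magnitude = abs(aggregate)
    if magnitude >= 6:
        label = "definite responder"
    elif magnitude >= 3:
        label = "probable responder"
    else:
        label = "non-responder"
    favours = ("paracetamol" if aggregate > 0
               else "celecoxib" if aggregate < 0 else "neither")
    return label, favours

scores = {"pain": 2, "stiffness": 1, "function": 1,
          "preference": 2, "adverse_events": 0}
print(classify(scores))  # ('definite responder', 'paracetamol')
```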
Equal weightings were assigned to each outcome here in the belief that it was
impossible to have a valid system of weighting each outcome. However there is now
an emerging science of discrete choice experimentation that allows determination of
an average value patients place on different attributes when making health related
decisions. This could conceivably be used to determine the relative value of out-
comes in a series of patients undertaking N-of-1 trials (Ryan 2004). These relative
values could then be used to weight the individual outcomes when defining an
overall response.
Conclusion
In summary, reporting the results of N-of-1 trials to patients and health care profes-
sionals requires considerable planning to make reports user-friendly and efficient
tools for clinical decision-making. Decisions need to be made about key elements of
the report, how to order them with the most important summary elements first fol-
lowed by detailed results, and how to set thresholds for clinically important changes.
The inclusion of tables and graphs in reports should improve readability.
Transmission of reports to patients and their health care professionals should be
done very soon after completion of the trial when the results are most useful for
clinical decision-making.
References
Bombardier C, Hayden J, Beaton DE (2001) Minimal clinically important difference. Low back
pain: outcome measures. J Rheumatol 28:431–438
Copay AG, Subach BR, Glassman SD, Polly DW, Schuler TC (2007) Understanding the minimum
clinically important difference: a review of concepts and methods. Spine J 7(5):541–546
Dworkin RH, Turk DC, Wyrwich KW, Beaton D, Cleeland CS, Farrar JT et al (2008) Interpreting
the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recom-
mendations. J Pain 9:105–121
Jaeschke R, Singer J, Guyatt GH (1989) Measurement of health status. Ascertaining the minimally
clinically important difference. Control Clin Trials 10:407–415
Ryan M (2004) Discrete choice experiments in health care. BMJ 328:360–361
Yelland M, Schluter P (2006) Defining worthwhile and desired responses to treatment of chronic
low back pain. Pain Med 7(1):38–45
Yelland MJ, Nikles CJ, McNairn N, Del Mar CB, Schluter PJ, Brown RM (2007) Celecoxib com-
pared with sustained-release paracetamol for osteoarthritis: a series of N-of-1 trials.
Rheumatology (Oxford) 46(1):135–140
Yelland MJ, Poulos CJ, Pillans PI, Bashford GM, Nikles CJ, Sturtevant JM (2009) N-of-1 random-
ized trials to assess the efficacy of gabapentin for chronic neuropathic pain. Pain Med
10(4):754–761
Zucker DR, Schmid CH, McIntosh MW, D'Agostino RB, Selker HP, Lau J (1997) Combining
single patient (N-of-1) trials to estimate population treatment effects and to evaluate individual
patient responses to treatment. J Clin Epidemiol 50(4):401–410
Chapter 10
Assessing and Reporting Adverse Events
Hugh Senior
H. Senior (*)
School of Medicine, The University of Queensland, Brisbane, Australia
e-mail: h.senior@uq.edu.au
Introduction
It is paramount that all investigators involved in clinical trials are trained in and
fully aware of the ICH guidelines. An essential set is the efficacy guidelines, denoted
by the letter ‘E’, especially ‘E6’, the guideline addressing Good Clinical Practice
(GCP), the most recent version of which was issued in May 1996. GCP is defined
as ‘a standard for the design, conduct, performance, monitoring, auditing, recording,
analyses, and reporting of clinical trials that provides assurance that the data
and reported results are credible and accurate, and that the rights, integrity, and
confidentiality of trial subjects are protected’.
Investigators working with specialist and vulnerable populations should also be
trained in and be aware of other guidelines. In addition, trial investigators must also
be aware of their national, local and institutional regulatory requirements for the
conduct of clinical trials.
An important aspect of the E6 guideline is the set of standards on the ethical and
safe conduct of trials, including the assessment and reporting of adverse events
(AE). Under ICH-GCP, ‘the rights, safety, and well-being of the trial subjects are
the most important considerations and should prevail over interests of science and
society’ (E6, 2.3).
The following glossary provides definitions on the types and seriousness of
adverse events, and of other important terminology required for the understanding
of the assessment and reporting of AEs.
According to the ICH Expert Working Group (ICH Harmonised Tripartite Guideline
2003), a serious adverse event or reaction is any untoward medical occurrence that
at any dose:
• Results in death,
• Is life-threatening (the term “life-threatening” in the definition of “serious” refers
to an event/reaction in which the patient was at risk of death at the time of the
event/reaction; it does not refer to an event/reaction which hypothetically might
have caused death if it were more severe),
• Requires inpatient hospitalization or results in prolongation of existing
hospitalization,
• Results in persistent or significant disability/incapacity,
• Is a congenital anomaly/birth defect,
• Is a medically important event or reaction.
Medical and scientific judgment should be exercised in deciding whether other
situations should be considered serious, such as important medical events that might
not be immediately life-threatening or result in death or hospitalization, but might
jeopardize the patient or might require intervention to prevent one of the other out-
comes listed in the definition above. Examples of such events are intensive treat-
ment in an emergency room or at home for allergic bronchospasm, blood dyscrasias
or convulsions that do not result in hospitalization, or development of drug depen-
dency or drug abuse.
In the pre-approval clinical experience with a new medicinal product or its new
usages, particularly as the therapeutic dose(s) may not be established, all noxious
and unintended responses to a medicinal product related to any dose should be con-
sidered adverse drug reactions.
The phrase, responses to a medicinal product, means that a causal relationship
between a medicinal product and an adverse event is at least a reasonable possibil-
ity, namely, the relationship cannot be ruled out.
Regarding marketed medicinal products, the definition of an adverse drug reac-
tion is: a response to a drug which is noxious and unintended and which occurs at
doses normally used in man for prophylaxis, diagnosis, or therapy of diseases or for
modification of physiological function.
According to the ICH Expert Working Group (ICH Harmonised Tripartite Guideline
2003), an Unexpected Adverse Drug Reaction is an ADR whose nature, severity,
specificity, or outcome is not consistent with the term or description used in the
local/regional product labelling (e.g. Package Insert or Summary of Product
Characteristics).
Investigator
A person responsible for the conduct of the clinical trial at a trial site. If a trial is
conducted by a team of individuals at a trial site, the investigator is the responsible
leader of the team and may be called the principal investigator.
Sub-investigator
Any individual member of the clinical trial team designated and supervised by the
investigator at a trial site to perform critical trial-related procedures and/or to make
important trial-related decisions (e.g., associates, residents, research fellows). See
also Investigator.
Sponsor
The sponsor is responsible for the ongoing safety evaluation of the investigational
medicinal product(s). Please note, in small studies or academic-led studies, the
sponsor may also be an investigator.
The sponsor must arrange systems and written standard operating procedures
(SOPs) to ensure quality standards are met in the identification, documentation,
grading, archiving and reporting of adverse events, and provide these SOPs to all
study investigators.
Study investigators and research staff must routinely and prospectively identify
any individual adverse events, record these on an adverse event case report form
(CRF), and report them to the sponsor for evaluation. The investigator needs to
evaluate the seriousness of the adverse event and the causal relationship between
the investigational medicinal product(s) and/or concomitant therapy and the
event.
The sponsor is responsible for retaining detailed records of all adverse events
reported by investigator(s) and performing an evaluation with respect to serious-
ness, causality and expectedness.
Seriousness
The reporting investigator makes the judgment as to whether the event is serious
according to the definition of serious adverse event and serious adverse drug reac-
tion (European Commission 2011).
Causality
The investigator assesses whether a causal relationship between the investigational
medicinal product(s) and/or concomitant therapy and the adverse event is at least a
reasonable possibility, i.e. whether the relationship cannot be ruled out (see the
definition of an adverse drug reaction above).
Expectedness
The sponsor assesses whether the reaction is consistent with the applicable reference
safety information, for example the Investigator's Brochure for an unapproved
product or the product labelling for a marketed product (see the definition of an
unexpected adverse drug reaction above).
All serious adverse events (SAEs) should be reported immediately to the sponsor
except for those SAEs that the protocol or other document (e.g. Investigator’s
Brochure) identifies as not needing immediate reporting (ICH E6).
The immediate reports should be followed promptly by detailed, written reports.
The immediate and follow-up reports should identify subjects by unique code num-
bers assigned to the trial subjects rather than by the subjects’ names, personal iden-
tification numbers, and/or addresses. The investigator should also comply with the
applicable regulatory requirement(s) related to the reporting of unexpected serious
adverse drug reactions to the regulatory authority(ies) and the IRB/IEC (ICH E6).
Cases of adverse drug reactions that are both serious and unexpected are subject to
expedited reporting. The reporting of serious expected reactions in an expedited
manner varies among countries. Non-serious adverse reactions, whether expected
or not, would normally not be subject to expedited reporting (ICH E2D).
The sponsor should expedite the reporting to all concerned investigator(s) and
institutions(s), to the IRB(s) and IEC(s), where required, and to the regulatory
authority(ies) of all adverse drug reactions (ADRs) that are both serious and unex-
pected (ICH E6).
Such expedited reports should comply with the applicable regulatory
requirement(s) and with the ICH Guideline for Clinical Safety Data Management:
Definitions and Standards for Expedited Reporting (ICH E6) (ICH Expert Working
Group 1994).
The sponsor should submit to the regulatory authority(ies) all safety updates and
periodic reports, as required by applicable regulatory requirement(s) (ICH E6).
The sponsor shall inform all involved investigators of relevant information about
suspected unexpected serious adverse reactions (SUSARs) that could adversely
affect the safety of subjects.
SUSARs must undergo expedited reporting. At the time of initial reporting informa-
tion may be incomplete. As much information as is possible should be collected for
the initial report. It should at a minimum report the following: (a) a suspected inves-
tigational medicinal product, (b) an identifiable subject (e.g. study subject code
number, age, sex), (c) an adverse event assessed as serious and unexpected and for
which a causal relationship with the investigational medicinal product is reasonably
suspected, and (d) an identifiable reporting source.
Conclusion
In conclusion, all researchers must ensure that the rights, safety, and well-being of
the trial subjects are the most important considerations, and these prevail over inter-
ests of science and society. Guidelines such as ICH-GCP amongst others, along
with regulatory bodies and committees, provide the standards and framework for
this to occur, and for the safety of subjects to be monitored by the sponsor and inde-
pendently of the sponsor by IDMCs, IRBs, IECs and Governmental agencies. By
centralizing and standardizing the reporting of events to the sponsor, this allows the
sponsor and others to have an overall perspective of the rate and types of events
across sites. It allows the sponsor to monitor sites with high adverse event rates
more closely, to make any protocol amendments to improve safety, and to collate
and publish adverse events so that the safety profile of the medication within sub-
populations can be reported in addition to the effectiveness of the test drug.
Chapter 11
Research Ethics and N-of-1 Trials
Andrew Crowden, Gordon Guyatt, Nikola Stepanov, and Sunita Vohra
Abstract Some N-of-1 trials are conducted as part of clinical care, others are
developed as research. For those that are research, unless they are deemed exempt
from formal review, a relevant Human Research Ethics Committee or Institutional
Review Board should review specific projects before they are approved. N-of-1 tri-
als should also be authorized by institutions before commencing. The level of risk
to the patient/participant should guide and determine whether a particular project is
exempt from review, subject to a low/negligible risk review, or should be reviewed
by a full committee. Research ethics reviewers must develop a heightened ethical
sensitivity toward ensuring that a misguided approach to N-of-1 review does not
occur. Clinical researchers, institutions and research review committees, should rec-
ognize the continuum of clinical care and clinical research, in order to set and act
from explicit standards which are consistent with the clinical practice – clinical
research continuum.
A. Crowden (*)
School of Medicine, The University of Queensland, Brisbane, Australia
e-mail: a.crowden@uq.edu.au
G. Guyatt
McMaster University, Hamilton, Canada
e-mail: guyatt@mcmaster.ca
N. Stepanov
School of Medicine, The University of Queensland, Brisbane, Australia
e-mail: n.stepanov@uq.edu.au
S. Vohra
Department of Pediatrics, University of Alberta, Edmonton, Canada
e-mail: svohra@ualberta.ca
Introduction
The same ethical values that underpin clinical care also underpin clinical research:
respect for human beings, research (or clinical) merit and integrity, justice, and
beneficence. The specific values that are the foundation for ethical relationships
between researchers and research participants are intrinsically connected to widely
acknowledged values for effective ethical therapeutic relationships between clini-
cians and patients. The weight placed on particular values in clinical care and
research may, however, differ depending on the context. Researchers and research
reviewers, for instance, tend to place a higher value on respect for persons and
autonomy, and therefore focus on the consent process and associated documenta-
tion (Appelbaum et al. 1987; Faden et al. 1986; Stepanov and Smith 2013; Stepanov
2014). This necessitates ensuring that potential participants are provided with infor-
mation and consent forms that clearly articulate the justifications for the proposed
research, including its scientific merit and integrity (Kerridge et al. 2013). While the
standards applied to the delivery of clinical care and the conduct of research may
appear to be growing more divergent over time, in practice the boundary between
clinical care and clinical research often cannot be clearly differentiated (Lewens
2006; Kottow 2009).
Perhaps there is no better illustration of the close connection between clinical
practice and clinical research than that evidenced by recent developments in innova-
tive N-of-1 trials (Guyatt 1996). Analysis of the ethical dimensions of these trials is
pertinent and instructive. We will begin this brief exploration of the ethics of N-of-1
trials and the nature of the relationship between clinical care and clinical research
by highlighting certain aspects from the N-of-1 trials story.
To ensure accurate results, one needs methods of inquiry that reduce potential bias
(Keech et al. 2007). Medical history is littered with misleading results. Many once
popular, but now discarded treatments, were previously thought to be effective, but
are now known to be useless or harmful (Brignardello-Petersen et al. 2014).
Accordingly, the last 50 years has seen the development of increasingly sophisti-
cated strategies for minimizing bias in establishing intervention effectiveness and
thus avoiding misleading results.
Even the most rigorously conducted randomized trials are limited by variability
in patient responses. Positive trials do not mean that every patient benefits. Clinicians
recognize this variability in response and conduct traditional trials of therapy in
which they offer an intervention to patients and then monitor patients’ response.
Such conventional trials of therapy may, however, result in misleading conclusions
because of factors such as the natural history of the illness, placebo effects, and
regression to the mean.
The ethical implications of N-of-1 trials became evident when early developers of
N-of-1 trials considered the circumstances in which to conduct such a trial. A typi-
cal experience is outlined in the following case:
Susan, an experienced clinician, faces Derek, a patient suffering from chronic
obstructive pulmonary disease who continues to have troubling dyspnea despite
treatment with inhaled tiotropium, inhaled steroids, and as-needed inhaled salbutamol.
Susan considers adding oral theophylline, but notes that it is often associated with
adverse effects and that, though randomized trials have shown that on average it
reduces dyspnea in such patients, there is considerable variability in patient response.
Susan decides to prescribe theophylline, which the evidence shows will help some,
but not all, of the patients to whom it is offered. Derek, the patient before her, may be
one of those who benefit, or one who receives no symptom relief but only treatment
side-effects. How might Susan handle the situation?
One option would be to prescribe theophylline and leave it at that. For treatments
such as theophylline in COPD where RCTs have shown overall benefit, clinicians
adopting this approach can at least be confident that on balance, their patients are
more likely to benefit than not. However, when trial results suggest large variability
in response to treatment, or when patients differ substantially from those enrolled in
the available randomized trials, the benefit for the individual patient will be uncer-
tain. Thus, rather than simply prescribing therapy, the clinician may choose to con-
duct a conventional trial of therapy. If that trial shows apparent benefit, the
intervention (typically a drug) is continued; if not, it is stopped.
Physicians who are aware of the aforementioned sources of bias may not, however,
be satisfied with conventional trials of therapy. N-of-1 trials potentially minimize the
bias of a conventional trial of therapy, allowing much greater confidence in inferences
regarding individual benefit and may result in changes in treatment decisions up to
35 % of the time (Guyatt et al. 1986). Thus, because the conclusion for treatment
choice will be far less likely to be spurious, clinicians such as Susan in the scenario
above, may decide it is in their patient’s best interest to conduct an N-of-1 trial.
Having made this judgment, how should the clinician proceed? Susan could
explain the uncertainties regarding the treatment decision to Derek. If Derek under-
stands the concept of an N-of-1 trial, is competent to understand potential risks and
benefits, and is willing to be involved, Susan and Derek, in partnership, can plan and
conduct an N-of-1 trial.
Or can they? Should Susan apply to a Human Research Ethics Committee or
Institutional Review Board for approval? Should the institution authorize the
research? Is authorization required from any other body such as a government health
department (often these are location specific, for example the Therapeutic Goods
Administration in Australia, or in Canada, Health Canada)?
When investigators at McMaster University proposed the first N-of-1 trials in the
early 1980s, their Institutional Review Boards (IRB) didn’t see the need for research
ethics review. The McMaster IRB viewed N-of-1 trials as ‘optimal clinical care’ and
not research.
Subsequent experience has been varied. Some IRBs or Human Research Ethics
Committees consider N-of-1 trials to be research, while others conclude that they
can be either research or clinical care. If the primary goal is to test treatments for the
purpose of contributing to further knowledge about how to manage and treat a con-
dition in average patients, then the N-of-1 trial may be most appropriately consid-
ered research. However, if the primary goal is to improve the care of the individual
patient, then the N-of-1 trial may be appropriately considered clinical care (Punja
et al. 2014). What is the correct view? At what point do clinical activities become
research? Are N-of-1 trials research?
N-of-1 trials illustrate the difference between the standards we set for clinical
care and the standards we set for research. The difference would not be a problem if
there were a clear boundary between clinical and research activities. Systematic
clinical observation, continuous quality improvement and clinical research are over-
lapping activities. These activities are on a continuum and the ethical standards we
apply should reflect that continuum.
As an illustration of the continuum between quality improvement efforts and
clinical research consider the following. A clinician wishes to monitor the extent to
which she is successful in achieving full vaccination for all children in her practice.
No one is likely to suggest that she is conducting research, or that she had better
appear before an institutional review board or risk subsequent censure from her col-
leagues when her clandestine research activities are brought to light. What if she
wishes to conduct her monitoring in collaboration with a number of colleagues with
one goal: to ultimately compare how well each of them is doing? What if, as a
group, these physicians negotiate with the local public health department for a pub-
lic health nurse to help them establish registries of their patients with systematic
reminders to help achieve full vaccination? What if they now monitor, in a before-
after fashion, the extent to which the intervention of the health department improved
the vaccination rate? Finally, what if the group decides to publish the results of their
experience, in the hopes that they might be beneficial to others? At what point in this
series of possible activities related to vaccination does the transition from quality
assurance not requiring research ethics oversight to a research activity requiring
such oversight occur?
As we have previously noted, the specific values that are the foundation for ethi-
cal relationships between researchers and research participants are intrinsically con-
nected to widely acknowledged foundational values for effective ethical therapeutic
relationships between clinicians and patients. These values include respect for
human beings, research (or clinical) merit and integrity, and justice and beneficence
(Beauchamp and Childress 2001). These values tend to be applied at a higher
standard in research, and the ranking of particular values may differ depending on
the context. Whether one considers Susan’s proposed N-of-1 trial the delivery of
optimal clinical care (requirement only that the patient understands and consents)
versus research (the necessity for review by a human research ethics committee or
IRB), illustrates the nature of the problem.
Most of us recognize that we must treat all rational beings, or persons, never
merely as a means, but always as ends. There are strong reasons to accept some ver-
sion of Immanuel Kant’s famous formula for humanity, the so-called ‘consent prin-
ciple’. Sometimes clinicians may find that consent seems too demanding. However
in most cases, clinicians agree that it is wrong to act in ways to which a person
might not rationally consent (Parfit 2013).
In relation to a proposed clinical practice intervention, or a choice about partici-
pation in a research project, it is generally accepted that the patient or potential
participant should make a decision. Relevant information is disclosed by the
clinician/researcher. If patient participants are competent to understand the informa-
tion, they can voluntarily choose to make an informed decision about whether to
accept treatment or participate in the research project. Good in theory, but what
about practice? Let’s return to Susan, our clinician, Derek her patient, and their
N-of-1 trial partnership.
If Susan had decided to undertake a conventional trial of therapy believing it
would benefit Derek, she would obtain consent in her usual way and begin. For most
clinicians, this would be implied consent only. In other words, the clinician would
suggest the intervention and, in the absence of objections from the patient, write the
prescription and presume that the patient will proceed accordingly. However, for
good reasons, she has chosen to conduct a trial of therapy in partnership with Derek
in a much more rigorous manner, with the idea of reaching a more trustworthy
assessment of benefit. In one view, apparently, by being more rigorous and explic-
itly involving the patient in the decision, the clinician, rather than conscientiously
discharging her ethical responsibilities, is taking on additional ones.
Clinicians considering N-of-1 trials have to think carefully about their intent.
If they are trying to improve the individual patient’s care, then the N-of-1 trial can
be conducted as part of clinical practice. If their intent is to develop knowledge to
benefit others (i.e. their primary intent is research), this may influence what patients
are offered or how their outcomes are measured. Under these circumstances, in
order to obtain research approval and institutional authorization before the therapy
can begin they must write a research protocol and complete an ethics application as
well as meeting any other local site authorization requirements. A research ethics
review board will consider the proposal, the consent procedure, and scrutinize other
relevant underlying values.
As John Lantos has rightly claimed, a researcher is evaluating therapy, while a
clinician who conducts a conventional trial of therapy makes her inferences based
on very imperfect information, because of the biases we have previously noted. It
does seem odd that Susan (a responsible clinician who understands the principle of
evidence-based practice) requires external review and regulation because she chose
to be more responsible in ascertaining what is best for her patient than colleagues
who would conduct conventional, far less rigorous, trials of therapy (Lantos 1994).
This is the double standard on consent to treatment/research that may frustrate
clinicians and potentially disadvantage patients. The clinician who is prepared to
admit uncertainty about therapy, and to address the need for safeguards against bias,
is subject to more stringent rules than clinicians who do not.
We regard this double standard as illogical and indefensible. The onus must be
on us all to ensure that participation in N-of-1 trials, or indeed any clinical care
procedure, is not always presented as a high-risk endeavor (Evans et al. 2013,
p. 166). The double standard anomaly is most obvious in the consent example.
The double standard anomaly is also relevant to other key values. For example, merit
and integrity, justice, beneficence, like respect for persons, are all reviewed and
regulated in a more robust manner in a research context. Clinical researchers are
aware of this anomaly. Research approving and authorizing bodies should be too.
This brings us to the last key issue with N-of-1 trials. How should they be
reviewed? Should N-of-1 trials be exempt from research ethics review, or not?
The answer to this question is as expected – it depends!
and considered care are important, especially when indigenous people are
involved (Crowden 2013). However again, it is likely that conventional therapy
will have greater risk than N-of-1 trials.
Conclusion
Some N-of-1 trials are conducted as part of clinical care, others are developed as
research. For those that are research, unless they are deemed exempt from formal
review, a relevant Human Research Ethics Committee or Institutional Review Board
should review specific projects before they are approved. N-of-1 trials should also
be authorized by institutions before commencing. The level of risk to the patient/
participant should guide and determine whether a particular project is exempt from
review, subject to a low negligible risk review, or should be reviewed by a full
committee.
Research ethics reviewers must develop a heightened ethical sensitivity toward
ensuring that a misguided approach to N-of-1 review does not occur. They must
ensure that the identified double standards between clinical care and clinical
research do not persist. Clinical researchers, institutions and research review com-
mittees, should recognize the continuum of clinical care and clinical research, in
order to set and act from explicit standards which are consistent with the clinical
practice – clinical research continuum. We should recognize that N-of-1 trials are a
better ethical alternative when compared to conventional therapy. The notion that
optimal clinical practice in the form of an N-of-1 clinical trial requires greater over-
sight than usual suboptimal clinical practice is indefensible.
References
Appelbaum PS, Lidz C, Meisel A (1987) Informed consent. Oxford University Press, New York
Beauchamp T, Childress J (2001) Principles of biomedical ethics. Oxford University Press,
New York
Brignardello-Petersen R, Ioannidis JPA, Tomlinson G, Guyatt G (2014) Surprising results of ran-
domized trials. In: Guyatt G, Meade MO, Cook DJ, Rennie D (eds) Users’ guides to the medi-
cal literature: a manual for evidence-based clinical practice, 2nd edn. McGraw-Hill, New York
Crowden A (2013) Chapter 6: ethics and indigenous health care: cultural competencies, protocols
and integrity. In: Hampton R, Toombs M (eds) Indigenous Australians and health: the wombat
in the room. Oxford University Press, South Melbourne, Victoria, Australia, pp 114–129
Evans I, Thornton H, Chalmers I, Glasziou P (2013) Testing treatments: better research for better
healthcare, 2nd edn. Pinter & Martin Ltd., London
Faden R, Beauchamp T, King N (1986) A history and theory of informed consent. Oxford
University Press, New York
Gabriel SE, Normand SL (2012) Getting the methods right – the foundation of patient-centered
outcomes research. N Engl J Med 367(9):787–790
Guyatt G (1996) Clinical care and clinical research: a false dichotomy. In: Daly J (ed) Ethical
intersections: health research, methods and researcher responsibility. Allen and Unwin, Sydney,
pp 66–73
Guyatt GH, Sackett DL, Taylor DW et al (1986) Determining optimal therapy-randomised trials in
individual patients. N Engl J Med 314:889–892
Keech A, Gebski V, Pike R (2007) Interpreting and reporting clinical trials. A guide to the consort
statement and the principles of randomised controlled trials. Australian Medical Publishing,
Sydney
Kerridge I, Lowe M, Stewart C (2013) Ethics and law for the health professions. The Federation
Press, Leichardt
Kottow M (2009) Clinical and research ethics as moral strangers. Arch Immunol Ther Exp (Warsz)
57:157–164
Kravitz R, Duan N (eds) (2014) Design and implementation of N-of-1 trials: a user's guide.
Agency for Healthcare Research and Quality, US Department of Health and Human Services,
Rockville, Feb 2014
Lantos J (1994) Ethical issues – how can we distinguish clinical research from innovative therapy?
Am J Pediatr Hematol/Oncol 16:72–75
Lewens T (2006) Distinguishing treatment from research: a functional approach. J Med Ethics
32:424–429
Parfit D (2013) On what matters, vol 1. Oxford University Press, Oxford/New York, p 178
Punja S, Vohra S, Eslick I, Duan N (2014) An ethical framework for N-of-1 trials: clinical care,
quality improvement, or human subjects research? In: Kravitz RL, Duan N (eds) Design and
implementation of N-of-1 trials: a user’s guide. AHRQ Publication No.
13(14)-EHC122-EF. Rockville, pp 13–22. www.effectivehealthcare.ahrq.gov/N-1-Trials.cfm
Stepanov N (2014) Questioning the boundaries of parental rights: exploring children’s rights, the
best interests standard, and parental consent to paediatric non-therapeutic experimental
research. PhD thesis, The University of Melbourne
Stepanov N, Smith MK (2013) Double standards in special medical research: questioning the
discrepancy between requirements for medical research involving incompetent adults and med-
ical research involving children. J Law Med 21:47–52
Tate RL, Perdices M, Rosenkoetter U et al (2013) Revision of a method quality rating scale for
single-case experimental designs and n-of-1 trials: the 15-item risk of bias in N-of-1 trials
(RoBiNT) scale. Neuropsychol Rehabil 23(5):619–638
Chapter 12
Statistical Analysis of N-of-1 Trials
Abstract This chapter discusses some techniques for exploratory data analysis and
statistical modelling of data from N-of-1 trials, and provides illustrations of how
statistical models and corresponding analyses can be developed for the more
common designs encountered in N-of-1 trials. Models and corresponding analyses
for other designs, perhaps involving different nesting of treatments, order and
blocks, can be developed in a similar manner. The focus of this chapter is on con-
tinuous response outcomes, that is, numerical response data. The chapter is pre-
sented in tutorial style, with concomitant R code and output provided to complement
the description of the models. Mixed effects models are also discussed. Such mod-
els can be extended to account for a variety of factors whose effects can be consid-
ered as random draws from a population of effects. A taxonomy of relevant statistical
methods is also presented. This chapter is aimed at readers with some background
in statistics who are considering an analysis of data from an N-of-1 trial using the
R software environment.
Introduction
Once data from an N-of-1 trial have been collected, they need to be analyzed. The
methods and models adopted for analysis will depend on the way in which the trial
has been designed and the aim of the analysis.
Suppose that the trial aims to evaluate an outcome associated with two treatments.
The patient is exposed to each treatment in six blocks (replicates). An example
dataset for this study is given below (Table 12.1). It involves six blocks of two time
periods each during which the patient receives each treatment in randomized order.
We note that the statistical techniques implemented in this chapter are not specific
to the particular design of this N-of-1 trial, meaning they could be applied to analyze
data from a range of different experimental designs.
There are a few ways to set up the data in the software package R. Two of these
are as follows.
• Open R and specify a working directory that identifies the location for the data
and the results. For example, in Windows, if the location is a folder called
‘example’ within the folder ‘trials’ on the C drive of the computer, this would be
achieved with the command:
setwd("C:/trials/example")
• Option 1:
Type the data in an Excel spreadsheet and save it as a csv file, say “data.csv”, in
the ‘example’ directory. Then read the file into R as an object called ‘egdat’, say,
using the command:
egdat = read.csv("data.csv")
egdat = data.frame(egdat)
attach(egdat)
• Option 2:
If the dataset is sufficiently small, type it directly into R:
time = seq(1,12)
block = c(1,1,2,2,3,3,4,4,5,5,6,6)
treat = rep(c(1,2),6)
order = rep(c(1,2,2,1),3)
outcome= c(31,35,28,39,32,39,36,37,38,41,39,39)
egdat = data.frame(time, block, treat, order, outcome)
attach(egdat)
Attaching the data frame allows you to reference the variables inside it without
having to also reference the data frame. Now check the number of rows and col-
umns of egdat and what data are in it:
dim(egdat)
egdat
head(egdat)
By default, any numbers that appear within the dataset will be considered within
R as numeric. For variables that are actually factors, such as ‘block’, it is impor-
tant to set these as factor variables. This can be achieved for the appropriate
variables as follows:
block = as.factor(block)
The output of the command 'head(egdat)' shows the data for the first six
observations:
time block treat order outcome
1 1 1 1 1 31
2 2 1 2 2 35
3 3 2 1 2 28
4 4 2 2 1 39
5 5 3 1 1 32
6 6 3 2 2 39
Schmid and Duan (2014) provide details of a range of statistical methods used for
analysis of N-of-1 trials. These methods are represented as a decision tree in
Fig. 12.1. This chapter elaborates on some of these approaches. Note that other
related methods, such as the treatment alone model using a t-test, or an ordered
categorical model for scaled outcomes, may also be appropriate, depending on the
design, data and the intended inference.
Exploratory Analysis
It is useful to conduct a preliminary evaluation of the data, using plots and summary
statistics. Plots of the data can be obtained using the following R commands.
par(mfrow=c(1,2))
plot(as.numeric(block[treat==1]),outcome[treat==1],
type="l", ylim=c(25,45),xlab="Block",ylab="Outcome",main="(a)")
lines(block[treat==2],outcome[treat==2], lty=2)
legend(3,45,c("'-' Treat=1","'--' Treat=2"))
plot(outcome[treat==1], outcome[treat==2],
xlim=c(28,41), ylim=c(28,41),xlab="Outcome for treatment
1",ylab="Outcome for treatment 2",main="(b)")
lines(c(30,41),c(30,41))
Fig. 12.1 Decision tree of statistical methods for analysis of N-of-1 trials
The resultant plots are shown in Fig. 12.2 and are described as follows. The left
hand panel shows the outcome values for the two treatments (1: solid line; 2: dotted
line), for each block (indicated on the horizontal axis). It appears that treatment 2 is
better than treatment 1, although the magnitude of this improvement is not clear and
appears to depend on the block and/or time.
The right hand panel shows the six pairs of outcome values plotted by treatment
(1: horizontal axis; 2: vertical axis). The solid line indicates where the outcome for
treatment 1 would be equal to the outcome for treatment 2. All of the points are
above this line, indicating that treatment 2 is better than treatment 1.
It is important to note that the plot does not take into account other variables such
as order. Similar plots could be drawn to visually evaluate the association and effect
of order on the outcome.
It is also interesting to plot a histogram and an empirical density of the outcome,
see Fig. 12.3. Although not shown here, one could ‘color’ (or otherwise identify) the
histogram bars to indicate the outcomes associated with different time periods,
blocks, treatments and order. This can facilitate a general evaluation of the contribu-
tion of these factors; for example if all of the outcomes for treatment A are at one
end of the plot, then treatment B would appear to be better than treatment A. This is
indeed shown in Fig. 12.3b, where the density for all of the outcomes (solid line) is
contrasted with the density for treatment 1 (wide dotted line) and treatment 2 (taller
dotted line, showing that these values are more concentrated and generally larger
than the values for treatment 1).
Note that, as above, these plots are based on very small numbers so it is impor-
tant not to read too much into them. They are visual inspections only, and not formal
statistical tests.
Fig. 12.2 Assessing preliminary outcomes using plots of (a) the outcomes of treatment 1 and
treatment 2, and (b) direct comparison of the outcomes of treatment 1 and treatment 2 against a
line indicating identical treatment effect
Fig. 12.3 Histogram (a) of the different treatment outcomes (frequency by outcome) and empirical densities (b) to estimate treatment effects
hist(outcome)
plot(density(outcome),main="",ylim=c(0,0.30))
lines(density(outcome[treat==1]), lty=2)
lines(density(outcome[treat==2]), lty=3)
Overall summary statistics for the outcome can also be obtained as part of the
exploratory analysis.
summary(outcome)
summary(outcome[treat==1])
summary(outcome[treat==2])
These commands display the minimum, 1st quartile, median, mean, 3rd quartile
and maximum:
Outcome:
 Min.  1st Qu.  Median   Mean  3rd Qu.   Max.
28.00    34.25   37.50  36.17    39.00  41.00
Outcome for Treatment=1:
 Min.  1st Qu.  Median   Mean  3rd Qu.   Max.
28.00    31.25   34.00  34.00    37.50  39.00
Outcome for Treatment=2:
 Min.  1st Qu.  Median   Mean  3rd Qu.   Max.
35.00    37.50   39.00  38.33    39.00  41.00
The differences between outcome values for treatments 1 and 2 within each
block can also be calculated. Since it is a small dataset, and for exposition, we do
this manually:
diff = c(35-31, 39-28, 39-32, 37-36, 41-38, 39-39)
diff
[1]  4 11  7  1  3  0
All of the values are positive, indicating that treatment 2 appears to be better than
treatment 1, ignoring time and order.
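For larger datasets, computing the differences directly from the data is less error-prone; a minimal equivalent sketch, assuming the variables attached above:
# within-block differences, treatment 2 minus treatment 1
diff = outcome[treat==2] - outcome[treat==1]
diff
[1]  4 11  7  1  3  0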
Nonparametric Methods
The sign test and the Wilcoxon rank-sum test (also known as the Mann–Whitney
test) are two common nonparametric tests that can be used to test for differences
between the median outcomes for the two different treatment groups. Note that
although the Wilcoxon test is more informative than the sign test, because it uses
the ranks of the outcome values in addition to their sign (i.e. whether they are above
or below the median), both tests ignore the other variables (such as time and order).
The R commands for the sign test below are in a library called BSDA. This pack-
age needs to be installed (e.g. using the 'Packages', 'Install Packages' menus in R)
and attached (e.g. using the 'Packages', 'Load Packages' menus, or the command
'library(BSDA)'). See the R user documentation for more details.
SIGN.test(diff)
wilcox.test(outcome[treat==1], outcome[treat==2])
These commands test whether the median difference is equal to zero. The sign
test returns a p-value of 0.06 and the Wilcoxon test returns a p-value of 0.07. Note
that with such a small dataset, these values, and the test itself, may have low statisti-
cal power to detect reasonably sized differences between treatment groups. As each
p-value is between 0.05 and 0.1, there is some evidence to indicate a true difference
between the two treatments, but as above this still ignores the possible effect of time
and order.
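Because the outcomes are paired within blocks, the Wilcoxon signed rank test applied to the within-block differences is a natural paired alternative; a minimal sketch using the diff vector computed earlier:
# one-sample signed rank test on the within-block differences
wilcox.test(diff)
# equivalent paired two-sample form
wilcox.test(outcome[treat==2], outcome[treat==1], paired=TRUE)
With ties and a zero difference present in this dataset, R will warn that exact p-values cannot be computed and will use a normal approximation instead.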
Linear Models
The simplest linear model describes the variation in the outcome, y, as a function of
the variation in blocks, treatments and order. This can be easily expressed in R as
follows. Here, ‘outlm’ is the object that holds the results of the linear model analy-
sis. The command ‘summary’ then provides a summary of these results.
outlm = lm(outcome ~ block + treat + order)
summary(outlm)
This output tells us that there is no significant effect (on the outcome values) due
to differences in blocks and due to order, but that there is some evidence of a signifi-
cant effect due to treatment. Multiple R-squared shows that about 76 % of the vari-
ability in the data is explained by the model. However, a much smaller adjusted
R-squared value is seen (34 %). This suggests that the model is overfit. Indeed, there
are 12 observations and 8 terms in the model. Generally, it is preferable to describe
as much variability in the data as possible but with as few model terms as possible.
Therefore, these results are not satisfactory, and further exploration into the devel-
opment of a parsimonious statistical model would typically follow. However, it is
important to understand that these nonparametric tests and the above linear statisti-
cal model do not take into account the nested structure of the data nor do they
account for other potentially important features of the study, such as possible linear
time trend (rather than by block) that may be present in the data, and/or autocorrela-
tion of errors that is typically seen in data that are collected over time. These are
fundamental flaws that render these approaches inappropriate for inference.
The above discussion illustrates the way in which the statistical model and corre-
sponding analysis are developed to reflect the design of the trial, in particular, the
way in which order or treatment is nested within blocks. For example, it may be
useful to consider outcomes for a particular treatment within a specified block. This
can be analyzed using the same model without order, that is, by using the following
R command
outlm <- lm(outcome ~ block + treat)
Alternatively, a model that describes the nested structures in the dataset, and the
corresponding analysis of variance, can be written in R as follows. Note that the
slash in the Error term indicates the nested structure of the design. Also, the use of
the term ‘nesting’ of treatment within block differs from the usual use of nesting,
where the nested factor levels only make sense within the higher order factor. Here,
order is nested within block unless one defines order as always the first or second
treatment in a block and the order has the same meaning across blocks. To reflect the
within block design, one could use the following call in R
outlm <- aov(outcome ~ treat + Error(block/treat))
The above analysis indicates that there is indeed a significant difference between
the treatments, after accounting for the block structure of the data.
Interaction plots can be drawn to visually evaluate the above results, and also to
evaluate the results of the order analysis, if undertaken (Fig. 12.4):
interaction.plot(block,treat,outcome)
interaction.plot(order,treat,outcome)
From the above plots, it would appear that there is a general increase
in the mean response for both treatments as the block identifier increases. With regard
to order, there is a slight suggestion that the effect of treatment 2 is increased when
it is administered second. However, given the sample size of this study, this pattern
may have been produced at random. An interaction between treatment and order
could suggest that there is a carryover effect in treatment. Despite the typical
assumption of a sufficiently long wash out period between treatments, the potential
existence of a carryover effect should be investigated and either discounted or incor-
porated in the analysis.
A linear modelling approach can be undertaken to test the effect of time on the
outcome using the following command. Note that time, block, treatment and order
can’t all be included in the model, since there are insufficient observations to esti-
mate all of these effects. In fact, order or block may be meaningless when time is
included in the model, since they are both time factors, just defined differently.
For illustration, we omit blocks and order, and evaluate the effect of time alone.
(Just for variety, we call the object containing the analysis in R a different name:
newout). Note that now there is no nesting in the model (since we are ignoring
blocks and order), and an interaction between treatment and time is also estimated.
newout <- lm(outcome ~ treat * time)
summary(newout)
The estimated difference in treatment effects is much more significant and has
actually increased when compared to our previous analysis. The sign of the interac-
tion effect indicates that the increase over time is smaller in the treatment 2 group
than in the treatment 1 group, which suggests that the difference between treatments
decreases over the course of the trial. While the difference is 7.5 at time 1, it has
completely disappeared by time 12. This suggests that the potential benefits of treat-
ment 2 over treatment 1 are short lived. We note that the above model assumes that
the outcome is linearly related to time, with a slope that depends upon treatment.
It is important to check that this assumed linear relationship is reasonable. If not,
then alternative parameterizations of time such as higher order polynomial terms or
some grouping of time could be considered.
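For example, a quadratic time trend could be fitted and compared with the linear model; a brief sketch (the object name newout2 is ours):
# quadratic alternative to the assumed linear time trend
newout2 = lm(outcome ~ treat * poly(time, 2))
summary(newout2)
anova(newout, newout2)   # does the quadratic term improve the fit?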
When fitting statistical models, it is important to check that statistical and model-
ling assumptions are appropriate. In the model described above, it is assumed that the
residuals (difference between the observed and fitted data points) are normally distrib-
uted. Specifically, that the residuals are centered on zero, have constant variance, are
independent and follow the bell shaped curve of the normal distribution. The follow-
ing plots should be inspected to ensure that these assumptions are valid (Fig. 12.5).
par(mfrow=c(2,2))
plot(fitted.values(newout), residuals(newout), main="Fitted values vs residuals")
plot(1:length(outcome), residuals(newout), type="l", main="Residuals by order")
hist(residuals(newout))
qqnorm(residuals(newout))
From the ‘Fitted values vs residuals’ plot, it appears that the residuals are
symmetrically distributed around zero, and there appears to be constant variability
Fig. 12.5 Four means of assessing the normality of the example dataset
of these residuals among different fitted values. The ‘Residuals by order’ plot shows
no patterns or runs in the residuals suggesting that independence is a valid assump-
tion. The remaining two plots show that the characteristic shape of the normal dis-
tribution seems appropriate to describe the distribution of the residuals. When
inspecting such plots, it is important to note that with such a small sample size, a
variety of patterns could be observed, even if the residuals were generated from a
normal distribution. Hence, some apparent anomalies may not necessarily invalidate
particular assumptions. For example, in this study, the histogram shows that the
mode of the residuals is not 0. However, with such a small sample size, this is not
enough to invalidate the normality assumption.
It is also necessary to confirm other model assumptions. For example, it is
assumed that each treatment group has equal variance and that the outcome has a
linear relationship with time. These assumptions can be confirmed by inspecting the
following plots (Fig. 12.6).
Fig. 12.6 Testing the effects of time (a) and mean and variability of each treatment group (b)
par(mfrow=c(1,2))
plot(time, residuals(newout), main="Time vs residuals")
boxplot(residuals(newout)~treat, xlab="Treatment group", main="Residuals by Treatment")
Models with an autocorrelated error structure, such as the first-order autoregressive
(AR(1)) structure, can be estimated under a generalized least squares framework via
the gls function in the nlme package. This can be implemented in R as follows:
library(nlme)
newout <- gls(outcome ~ treat * time, correlation = corAR1(form = ~ time))
summary(newout)
The benefit of including this additional correlation parameter can be investigated
via a likelihood ratio test or by comparing values of different information criteria.
Such criteria include the Akaike
information criterion or AIC (Akaike 1974) and the Bayesian information criterion
or BIC (Schwarz 1978).
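For instance, the independence and first-order autoregressive error models can be compared directly; a sketch under the setup above (the object names m0 and m1 are ours):
library(nlme)
m0 = gls(outcome ~ treat * time)                                       # independent errors
m1 = gls(outcome ~ treat * time, correlation = corAR1(form = ~ time))  # AR(1) errors
AIC(m0, m1)
BIC(m0, m1)
anova(m0, m1)   # likelihood ratio test for the autocorrelation parameter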
Further analyses could be undertaken. For example, in the analyses thus far, we
have considered the block effects to be fixed. Alternatively, one could consider such
effects as random or more specifically random effects drawn from a population of
block effects. This is useful in cases where the blocks considered in the experiment
cannot be used again for data collection. In this study, blocks were used to reflect the
variability in the response between different periods of time, hence this cannot be
repeated exactly. In such cases, one is usually less concerned with the specific esti-
mated block effects, but rather more interested in learning about the distribution of
block effects. Having this understanding would be useful for future experimentation.
There are many different packages and functions used in R to estimate models
with random effects (such models are also known as mixed effects models).
Examples of packages include: lme4, nlme and asreml (with each using different
function calls to estimate the mixed effects model). Below is an example of a mixed
model involving blocks which can be estimated via the lmer function in the lme4
package.
library(lme4)
mixedmodelout <- lmer(outcome~treat + (1|block))
summary(mixedmodelout)
From above, one can see that the estimated difference in treatment effects is the
same between the fixed and mixed effects models. The output also shows that vari-
ability from two different sources (block and residual) was assumed. The usual
assumptions regarding the residuals are made here. With regard to the block variabil-
ity, it is assumed that the block effects are normally distributed around zero with a
standard deviation estimated to be 1.8.
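The components of the fitted mixed model can be inspected directly; a short sketch using standard lme4 accessor functions:
VarCorr(mixedmodelout)   # block and residual standard deviations
fixef(mixedmodelout)     # estimated intercept and treatment effect
ranef(mixedmodelout)     # predicted (shrunken) block effects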
Discussion
This chapter has considered the statistical modelling and analytic aspects of N-of-1
trials. The preceding sections have provided illustrations of how statistical models
and corresponding analyses can be developed for the more common designs encoun-
tered in N-of-1 trials. Other designs, perhaps involving different nesting of treat-
ments, order and blocks, can be developed in a similar manner.
Mixed effects models were also discussed. Such models can be extended to
account for a variety of factors whose effects can be considered as random draws
from a population of effects. For example, this modelling approach can also account
for the between subject variability of repeated measures data, and thus offers an
approach to combining different N-of-1 trials conducted on many individuals
(Zucker et al. 2010; Schmid and Duan 2014). Importantly, such a modelling
approach can also handle sparse and/or unbalanced data that occur in studies for a
variety of reasons including missing data.
The focus of this chapter has been on continuous response outcomes, that is,
numerical response data. There are, of course, other data types such as binary or
count data which could be measured from N-of-1 trials. In such cases, it is impor-
tant to appropriately model the distribution of the response data. A wide variety of
response data types (or distributions of data) can be modelled within a generalized
linear modelling framework. Such models have three components: a distribution of
the response, a linear predictor and a link function that relates the mean response to
the linear predictor. It is the link function that appropriately re-scales the linear
predictor to define different parameters in different distributions as a function of
explanatory variables. These models can be implemented in R within the glm func-
tion for fixed effects models and in the glmer function for mixed effects models. In
either case, one needs to specify the appropriate distribution of the data. For exam-
ple, binary data could follow the Binomial distribution and count data could follow
the Poisson distribution.
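As an illustration only (the response vectors y_bin and y_count below are hypothetical, not part of the example dataset), such models could be specified as follows:
# hypothetical binary outcome (e.g. symptom present in a period)
y_bin = c(0,1,0,1,0,1,1,1,1,1,1,1)
glm_bin = glm(y_bin ~ treat, family = binomial)
# hypothetical count outcome (e.g. symptom episodes per period)
y_count = c(5,2,6,1,4,2,3,2,2,1,2,2)
glm_count = glm(y_count ~ treat, family = poisson)
# mixed effects analogue with a random block effect
library(lme4)
glmm_bin = glmer(y_bin ~ treat + (1|block), family = binomial)
With data this sparse, the mixed model may produce convergence or singular-fit warnings; the sketch is intended only to show the model specifications.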
The importance of exploring the goodness-of-fit of all models considered cannot
be overstated. One part of this is assessing the appropriateness of all statistical
and modelling assumptions. This exploration would focus on checking the validity
of assumptions in regards to the distribution of the residuals, the appropriate
inclusion of explanatory variables and the predictive ability of the particular model.
As shown, the assumptions about the distribution of the residuals can be investigated
via histograms, quantile-quantile plots, a plot of the residuals versus the fitted
values and a plot of the residuals versus the order of the data. In terms of models that
account for autocorrelation, the specified error structure needs to be checked. In the
first-order autoregressive model implemented in this chapter, the exponential decay
of the lag-s autocorrelation ρ^s towards zero as s tends to infinity should be verified. In relation to checking
the appropriate inclusion of explanatory variables, plots of the residuals versus each
explanatory variable could be inspected for the presence of any patterns that are
unaccounted for in the model. Other considerations for the mixed effects model
include checking the appropriateness of the assumed distribution of the random
effect estimates. Another part of assessing goodness-of-fit is investigating the accu-
racy and associated uncertainty of model predictions. These can be inspected visu-
ally or by cross-validation techniques such as leave-one-out procedures.
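A simple leave-one-out calculation for the fixed effects model fitted earlier might look as follows; a sketch assuming the egdat data frame defined at the start of the chapter:
# leave-one-out prediction errors for the treat*time model
loo_err = sapply(1:nrow(egdat), function(i) {
  fit = lm(outcome ~ treat * time, data = egdat[-i, ])
  egdat$outcome[i] - predict(fit, newdata = egdat[i, ])
})
sqrt(mean(loo_err^2))   # root mean squared prediction error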
Further, we only very briefly touched on model choice. With such an array of
potential models available for analysis and potentially many explanatory variables
to be considered for inclusion into the model, it can be a difficult task to determine
the most appropriate statistical model. This is a common statistical problem, and as
such it is prevalent in the literature. Much of the literature focuses on information
criteria such as AIC and BIC. These criteria are calculated for each compet-
ing model, and are constructed in a manner that rewards goodness-of-fit but
penalizes model complexity. This means that a more complex model is preferred, or
an additional explanatory variable is included in the model, only if the added
complexity sufficiently increases the goodness-of-fit of the model. We note that the
choice between random effect models is difficult, in general. In such cases, one can
consider the appropriateness of assuming a random effect and/or evaluate the vari-
ability of the random effect to determine the worth of inclusion. Other approaches
based on the differences in deviance have also been considered in the literature.
The residual variance structure of proposed models requires careful consider-
ation in relation to checking model assumptions and model choice. It can also relate
to the sample size of the N-of-1 trial/s. In taking account of the trial design, we
considered two forms of residual variance. The first was uncorrelated errors with a
common variance parameter for all time periods and both treatments, and the sec-
ond was a first-order autoregressive structure where errors were assumed to have a
specific form of correlation depending upon distance apart in time. There are a
variety of other choices that are worth considering in terms of model assumptions
and benefit in regards to model choice. For example, one could consider a com-
pletely unstructured covariance matrix, providing great flexibility in describing the
covariance of the errors. Further, one could consider other autoregressive structures
and/or different variance terms for each treatment. The benefits of assuming more
complex variance structures, even if they actually exist, will ultimately be deter-
mined by the sample size of the study. That is, are there enough data points to actu-
ally observe the variance structure, and does assuming this structure significantly
improve the goodness-of-fit of the model? Therefore, before trying a variety of different
variance structures, one should consider which ones make sense given the study
design (that is, the number of data points available to estimate the variance parameters).
Conclusion
Appropriate modelling and analysis are crucial for accurate statistical inference and
clinical decision support. However, they are only part of the larger picture: they
depend critically on careful design and conduct of the study, and management and
preparation of the data. Clear reporting of statistical methods, models and analyses,
including the availability of code and data, will facilitate continual improvement in
the way that these trials are designed, conducted, analyzed and used. This call for
clarity and transparency in statistical analysis is not confined to this type of study,
of course, but applies to all fields of quantitative scientific endeavor. It is hoped that
this chapter provides some useful resources to assist in this challenge.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control
19:716–723
Barlow DH, Hersen M (1984) Single case experimental designs: Strategies for studying behavior
change, vol 2, 2nd edn. Pergamon Press, New York.
Kazdin AE (1982) Single-case research designs: methods for clinical and applied settings.
Oxford University Press, New York
Lillie EO, Patay B, Diamant J, Issell B, Topol EJ, Schork NJ (2011) The N-of-1 clinical trial: the
ultimate strategy for individualizing medicine? Pers Med 8(2):161–173
Rochon J (1990) A statistical model for the “N-of-1” study. J Clin Epidemiol 43(5):499–508
Schmid CH, Duan N (2014) Chapter 4: statistical design and analytic considerations for N-of-1
trials. In: Kravitz RL, Duan N, the DEcIDE Methods Center N-of-1 Guidance Panel (Duan N,
Eslick I, Gabler NB, Kaplan HC, Kravitz RL, Larson EB, Pace WD, Schmid CH, Sim I, Vohra S)
(eds) Design and implementation of N-of-1 trials: a user's guide. AHRQ Publication
No. 13(14)-EHC122-EF. Agency for Healthcare Research and Quality, Rockville, pp 33–53,
Feb 2014. http://effectivehealthcare.ahrq.gov/index.cfm/search-for-guides-reviews-and-reports/?productid=1857&pageaction=displayproduct
Chapter 13
The Economics of N-of-1 Trials
Introduction
In Chaps. 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12 of this book, the authors have argued
that N-of-1 trials provide an important research design with the potential to evaluate
individual treatment effects and to estimate heterogeneity of treatment effects in a
population (Gabler et al. 2011). However, the economics of N-of-1 trials needs to be
considered also. The economics of N-of-1 trials relates to the economic value of
undertaking a clinical N-of-1 trial as opposed to an alternative research study or
usual practice (i.e. no trial). It also relates to the value of extending N-of-1 trials to
compare the costs as well as benefits of two or more interventions at an individual
level. These economic questions are, or should be, important considerations for
health care funders, clinicians and patients. However, whilst conventional economic
evaluation methods undertaken alongside traditional clinical trials, such as ran-
domized controlled trials, are well established, there are some nuances in the appli-
cation of economic evaluation, and in the interpretation of its findings, in the N-of-1
trial context as opposed to the more traditional trial setting.
Imagine you are considering undertaking an N-of-1 trial to address the clinical
question of whether a new intervention is superior to an existing intervention in
either a specific patient, or at the aggregate level, in a heterogeneous cohort. There
are two economic questions that might be of interest alongside this N-of-1 trial. The
first relates to the outcome of the trial, and asks “Is the new intervention a cost-
effective use of resources compared to an existing intervention, for the management
of the specific indication in this specific patient (or alternatively in this cohort)?” In
other words, does the new intervention provide acceptable value for money? The
second question relates to the optimal research approach for answering this ques-
tion: “Is an N-of-1 trial an economically viable research method to address this
clinical question?” In this section, we outline approaches to each of these questions
in turn. In the following section, we then outline some methodological consider-
ations in addressing these questions.
In comparison to the relatively developed literature reflecting the use of N-of-1 tri-
als to evaluate the clinical outcomes associated with an intervention (see for exam-
ple the systematic review by Gabler and colleagues (2011)), economic evaluations
undertaken alongside N-of-1 trials have been sparse. Karnon and Qizilbash (2001)
were innovators in recognizing the potential for N-of-1 trials to provide estimates of
et al. 2008b; Kravitz et al. 2008). In this case, the economic question is whether it is
worth paying the cost of undertaking an N-of-1 trial to inform access decisions, as
compared to “standard practice”, where standard practice is no individualized
N-of-1 trial. With standard practice, individuals could receive either an existing
intervention (with possible suboptimal clinical outcome), or the new intervention
(with possible unnecessary costs and adverse effects). However, they receive these
without the targeting to response provided by data from an N-of-1 trial.
Scuffham and colleagues (2008b, 2010) showed the potential for use of N-of-1
trials to tailor the decision to continue a high cost intervention in healthcare decision-
making. In the context of N-of-1 trials of celecoxib for osteoarthritis and gabapentin
for chronic neuropathic pain, they reported fixed costs associated with undertaking
an N-of-1 trial in the region of AU$23,000 with an additional variable cost of
AU$1,300 per patient (expressed as 2003–2005 Australian dollar value). In a subse-
quent paper, including a third trial (of medications for the management of Attention
Deficit Hyperactivity Disorder) with 1 year follow up, the estimated marginal cost
of running an N-of-1 trial was reduced to AU$600 per patient (2006 Australian dol-
lars) (Scuffham et al. 2010). The authors reported these costs to be partially offset
by the savings generated in subsequent prescribing patterns (Scuffham et al. 2008b,
2010). Moreover, the health benefits gained from individualized treatment resulted
in estimates for the incremental cost (AU$6,896 per life year or AU$29,550 per
quality-adjusted life year gained) well within the range generally considered to pro-
vide acceptable value for money in Australia (Scuffham et al. 2008b; Harris et al.
2008). The authors concluded that the N-of-1 trial offers a realistic and viable option
for improving access to selected high cost medicines in patients for whom manage-
ment is uncertain (Scuffham et al. 2008b, 2010). However, despite this potential, the
role of N-of-1 trials to target access to high cost interventions outside of the research
setting has not yet been realized in practice. Nor, to our knowledge, has any funding
mechanism been put in place to support the application of N-of-1 trials to address
routine policy decisions.
The Comparator
Within N-of-1 trials the comparator may either be current active treatment or pla-
cebo, with appropriate washout between treatments. N-of-1 trials require strict
control of within-participant blinding. In order to protect blinding, the following
considerations should be made:
• Design of pharmaceutical pack;
• Equivalent size, color, smell of competing treatments;
• Treatment frequency.
PV = FV / (1 + r)^t    (13.1)
Table 13.2 Discounting a future cost of $1,000 and 1 Life Year (LY, effect) occurring in 5 years
using different discount rates
Future value    Discount rate (%)    Present value
Cost ($)
1,000           0.0                  1,000.00
1,000           3.5                  841.97
1,000           5.0                  783.53
1,000           10.0                 620.92
Effect (LYs)
1               0.0                  1.00
1               3.5                  0.84
1               5.0                  0.78
1               10.0                 0.62
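The discounting calculations in Table 13.2 can be reproduced in a few lines of R; a minimal sketch (the function name pv is ours):
# present value of a future cost or effect, PV = FV / (1 + r)^t
pv = function(fv, r, t) fv / (1 + r)^t
pv(1000, c(0, 0.035, 0.05, 0.10), 5)   # 1000.00 841.97 783.53 620.92
pv(1,    c(0, 0.035, 0.05, 0.10), 5)   # 1.00 0.84 0.78 0.62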
Clinical and summary health outcome measures are collected for each patient and
for each phase of the trial, whether it is an N-of-1 trial or a traditional clinical trial.
Health outcomes may be clinical measures (e.g. a frailty score, leukocyte count),
natural units (e.g. visits to the family physician, days stay in hospital), or a patient
reported outcome such as a pain or quality of life measure.
Cost-Effectiveness Analysis
ICER = ∆Cost / ∆Effect    (13.3)
This produces the final metric of the analysis, the additional cost per unit of
outcome.
A CEA alongside an N-of-1 trial (N-of-1 CEA) is constructed for each intervention
arm for each patient. This differs from traditional clinical trials where the means for
each intervention and comparator groups are used and the differences calculated.
Where the trial design involves multiple cross-overs between treatment options, the
incremental cost is estimated as the difference between the mean costs for each
treatment for each individual (13.4).
Incremental Cost = ∑Cost_int / n_int − ∑Cost_comp / n_comp    (13.4)
In the case of differential periods of treatment, for example, 6 months of interven-
tion, 6 months comparator followed by 3 months intervention and finally 3 months
comparator, then the weighted average cost should be estimated reflecting the rela-
tive duration of treatment.
The incremental effect is estimated in a similar manner. The incremental cost-
effectiveness ratio (ICER), that is the comparison of the difference in cost (ΔCost)
and the difference in effect (ΔEffect) is then estimated for each individual patient.
ICER_i = ∆Cost_i / ∆Effect_i    (13.5)
Traditionally, the decision criterion for CEA is whether the cost per unit effect is
below an accepted level (cost-effectiveness threshold) that the decision maker is
willing to pay. Alternatively, for N-of-1 trials, the decision criterion revolves around
each individual patient. As such, the analysis is to determine whether for any one
patient the incremental gains are achieved at an acceptable level of additional cost.
It is then possible to summarize, for the entire cohort enrolled in the N-of-1 trial, the
proportion of patients whose benefit was achieved at an acceptable cost.
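As a numerical illustration (all values below are hypothetical), per-patient ICERs and the proportion of patients treated cost-effectively could be computed as:
# hypothetical per-patient mean differences in cost and effect
d_cost   = c(400, 150, 500, -50)        # intervention minus comparator, per patient
d_effect = c(0.02, 0.05, 0.04, 0.01)    # e.g. QALYs gained
icer_i = d_cost / d_effect              # one ICER per patient, as in (13.5)
lambda = 50000                          # hypothetical cost-effectiveness threshold
mean(lambda * d_effect - d_cost > 0)    # proportion with positive net monetary benefit
The net monetary benefit form avoids the difficulties of interpreting ratios when the effect difference is negative or close to zero.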
Uncertainty
Sensitivity Analyses
After accounting for the uncertainty in the incremental cost, incremental effect and the
final ICER using statistical methods, there remain additional uncertainties unrelated
to sampling variation. This is the role of sensitivity analysis. Sensitivity analysis is the
usual approach for CEAs alongside clinical trials; such analyses should also be applied
to an N-of-1 CEA when comparing all costs and outcomes for the cohort, but are generally
not appropriate at the individual level. For example, the price of a drug will be the
same for all participants in an N-of-1 trial and cannot vary for each individual.
A one-way sensitivity analysis is used to evaluate the relative impact of the vari-
ables included in the analysis. Systematic variation of each variable across a plausi-
ble range of values, whilst holding all other variables constant, reveals the relative
influence of each variable on the cost-effectiveness estimate (Briggs 1999). For
example, the plausible cost of hospitalization may vary by 10 % of the baseline value
used in the analysis. Re-running the analysis with the hospitalization cost reduced by
10 % and increased by 10 % will reveal the impact of this variation on the overall
result for each individual. Completing multiple one-way analyses with a fixed varia-
tion (e.g. ±50 %) will reveal which variables have the greatest to the least impact on
the overall result for all individuals within the trial. Alternatively, best and worst case
values can be used. Another alternative form of multiple one-way sensitivity analysis
is to complete a threshold sensitivity analysis. Instead of varying each parameter by
a fixed amount, each parameter is varied to the extent where it changes the overall
result of the evaluation. Threshold sensitivity analysis shows how much a particular
variable needs to change, for example, to result in the intervention under evaluation
being no longer cost-effective for any patient or for all patients.
Regardless, it is important to justify the form of sensitivity analysis selected as
well as the choice of parameters included and, for sensitivity analyses other than
threshold, the range over which these parameters are varied (Husereau et al. 2013).
The results of sensitivity analyses are best presented in a table and also diagram-
matically, for example as a Tornado Diagram (Drummond et al. 2005).
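To make the mechanics concrete, a minimal one-way sensitivity sketch in R (the cost structure and all values are hypothetical):
# one-way sensitivity analysis on a hypothetical hospitalization cost
icer_for = function(hosp_cost) {
  d_cost   = (1200 + hosp_cost) - (900 + 0.8 * hosp_cost)   # hypothetical arm costs
  d_effect = 0.05                                           # hypothetical effect gain
  d_cost / d_effect
}
icer_for(2000 * c(0.9, 1.0, 1.1))   # ICER at -10 %, baseline, and +10 % of the base cost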
Despite the potential of individual N-of-1 trials to optimize access to high cost inter-
ventions (Scuffham et al. 2008b), they do not appear to have been adopted to guide
market access or subsidy decisions yet. Their role in this context is largely unex-
plored. Mixed methods research could investigate the feasibility and barriers to
implementation (Kravitz et al. 2009). It would also be useful to gain insights into the
relative merits of an individual level or cohort level analysis from N-of-1 trials as
opposed to a traditional clinical trial, including the potential of N-of-1 trials to
expand our understanding of the heterogeneity of treatment response and costs.
N-of-1 trials offer patients an approach to effective and cost-effective individualized
medicine. Identifying the intervention to which the patient has the greatest response
enables the patient to make use of that intervention in the knowledge that it is effec-
tive for them. Moreover, this reduces wastage of scarce healthcare resources as the
patient does not continue using an ineffective intervention for a protracted period
until the clinician determines that there may be a better option; sometimes this can
be months.
Evaluations to date have considered only the tangible clinical and cost benefits asso-
ciated with N-of-1 trials. However, N-of-1 trials are closely aligned with the con-
ceptual framework of patient-centered health care (McMillan et al. 2013). As such,
they are likely to provide intangible benefits that go beyond direct health outcomes
which are as yet not easy to measure. These benefits might include greater patient
involvement with their care and decision-making, an improved clinician-patient
relationship, and a more holistic understanding of the trade-offs and patient prefer-
ences around the use of high cost interventions (Karnon and Qizilbash 2001).
Moreover, as a consequence of its individualized focus, it is quite possible that the
N-of-1 research method might of itself produce improvements in health (Karnon
and Qizilbash 2001; McMillan et al. 2013). Mechanisms for measuring and valuing
the broader benefits for patients and society associated with the delivery of N-of-1
trials need to be explored.
Arguably, the individualized nature of the N-of-1 trial and close involvement of
patients in decision-making means the N-of-1 trial closely captures and accounts for
individual patient treatment preferences in decision-making. This is aligned with
previous attempts to describe improvements in health related quality of life at the
individual level (Ruta et al. 1994). It is also consistent with current endeavors to
establish methods for valuing the outcomes of health care using preference-based
valuations of outcome at an individual level (Lancsar and Louviere 2008). However,
the concept of optimizing individual treatment preferences from health care raises
several interesting considerations relevant to the underlying theories of health eco-
nomics. Conventional economic evaluation has evolved to evaluate “average” costs
and benefits across a relevant cohort, and importantly to value “average” benefits
based on the preferences of the general public (Scuffham et al. 2008a). It is argued
that this approach to valuation is closest to the ideal of valuing benefits from behind
a “veil of ignorance” (Rawls 1999), avoids potential bias due to patient self-interest,
and allows the valuation to represent the preferences of all tax-payers who jointly
bear the opportunity cost of a funding decision (Robinson and Parkin 2002). From
a normative perspective, making individual funding decisions based on the valua-
tion of individual patients as might occur in an N-of-1 trial is somewhat contradic-
tory to this approach.
A related point has been raised by Karnon and Qizilbash (2001). If patients real-
ize the outcome of the trial affects their treatment decision, and if they have a prefer-
ence for a specific treatment, there is the potential for gaming. This may not be
completely mitigated by strategies to avoid bias, such as randomization and blind-
ing. How then do we control for patient preferences to avoid bias? One potential
answer to this challenge might be borrowed from the literature on patient preference
trials (Preference Collaborative Review Group 2008). Obtaining an indication of
patient treatment preference before randomization, and controlling for this in any
aggregate analysis, might mitigate any unintentional self-interest bias.
Nevertheless, despite these considerations relating to the use and valuation of
individual treatment preferences, the benefit of understanding the range of
responses, both clinical and economic, to a treatment is an important advantage
of the N-of-1 trial approach. This adds complexity to the data available to make
economic decisions, but may possibly add to the validity of the decisions made.
Conclusion
This chapter has outlined the rationale, challenges and methodological consider-
ations for evaluating the economics of N-of-1 trials. The costs and effects observed
for an individual in an N-of-1 trial can be used to determine whether the health benefits
are sufficient to justify any additional ongoing costs. This chapter describes the
approach to estimating the additional costs and health outcomes for individuals in
an N-of-1 trial. The classification of costs, measurement of health outcomes, data
transformations such as converting costs and outcomes to present values through
discounting are described. The statistical methods for data analysis, and making
decisions based on the comparative effectiveness and costs are presented. Finally,
some reflections on the research agenda to progress the methods of economic evalu-
ation alongside N-of-1 trials are outlined.
References
Briggs A (1999) Economics notes: handling uncertainty in economic evaluation. BMJ 319:120
Cairns J (1994) Valuing future benefits. Health Econ 3:221–229
Claxton K (1999) The irrelevance of inference: a decision-making approach to the stochastic eval-
uation of health care technologies. J Health Econ 18:341–364
Drummond MF, Sculpher MJ, Torrance GW, O'Brien B, Stoddart GL (2005) Methods for the eco-
nomic evaluation of health care programmes. Oxford University Press, New York
Gabler NB, Duan N, Vohra S, Kravitz RL (2011) N-of-1 trials in the medical literature: a system-
atic review. Med Care 49:761–768
Glick H, Doshi J, Sonnad S, Polsky D (2007) Economic evaluation in clinical trials. Oxford
University Press, Oxford
Gold MR, Siegel JE, Russell LB, Weinstein MC (eds) (1996) Cost-effectiveness in health and
medicine. Oxford University Press, New York
Harris AH, Hill SR, Chin G, Li JJ, Walkom E (2008) The role of value for money in public insur-
ance coverage decisions for drugs in Australia: a retrospective analysis 1994–2004. Med Decis
Making 28:713–722
Husereau D, Drummond M, Petrou S, Carswell C, Moher D, Greenberg D, Augustovski F, Briggs
AH, Mauskopf J, Loder E, on behalf of the ISPOR Health Economic Evaluation Publication
Guidelines Good Reporting Practices Task Force (2013) Consolidated Health Economic
Evaluation Reporting Standards (CHEERS) – explanation and elaboration: a report of the
ISPOR Health Economic Evaluation Publication Guidelines Good Reporting Practices Task
Force. Value Health 16:231–250
Karnon J, Qizilbash N (2001) Economic evaluation alongside n-of-1 trials: getting closer to the
margin. Health Econ 10:79–82
Kravitz RL, Duan N, White RH (2008) N-of-1 trials of expensive biological therapies: a third way?
Arch Intern Med 168:1030–1033
Kravitz RL, Paterniti DA, Hay MC, Subramanian S, Dean DE, Weisner T, Vohra S, Duan N (2009)
Marketing therapeutic precision: potential facilitators and barriers to adoption of n-of-1 trials.
Contemp Clin Trials 30:436–445
Lancsar E, Louviere J (2008) Estimating individual level discrete choice models and welfare mea-
sures using best worst choice experiments and sequential best worst MNL. Centre for the Study
of Choice, University of Technology, Sydney
McMillan SS, Kendall E, Sav A, King MA, Whitty JA, Kelly F, Wheeler AJ (2013) Patient-centered
approaches to health care: a systematic review of randomized controlled trials. Med Care Res
Rev 70:567–596
NICE (2004) Guide to the methods of technology appraisal. National Institute for Clinical
Excellence (NICE), London
Pharmaceutical Benefits Advisory Committee (2013) Guidelines for preparing submissions to the
Pharmaceutical Benefits Advisory Committee (Version 4.4). Australian Government
Department of Health, Canberra
Pope JE, Prashker M, Anderson J (2004) The efficacy and cost effectiveness of N of 1 studies with
diclofenac compared to standard treatment with nonsteroidal antiinflammatory drugs in osteo-
arthritis. J Rheumatol 31:140–149
Preference Collaborative Review Group (2008) Patients’ preferences within randomised trials:
systematic review and patient level meta-analysis. BMJ 337:a1864. doi:10.1136/bmj.a1864
Ramsey S, Willke R, Briggs A, Brown R, Buxton M, Chawla A, Cook J, Glick H, Liljas B, Petitti
D, Reed S (2005) Best practices for economic evaluation alongside clinical trials: an ISPOR
RCT-CEA task force report. Value Health 8:521–533
Rawls J (1999) A theory of justice. Belknap Press of Harvard University Press, Cambridge, MA
Robinson A, Parkin D (2002) Recognising diversity in public preferences: the use of preference
sub-groups in cost-effectiveness analysis. A response to Sculpher and Gafni. Health Econ
11:649–651
Ruta DA, Garratt AM, Leng M, Russell IT, Macdonald LM (1994) A new approach to the measurement
of quality of life. The patient-generated index. Med Care 32:1109–1126
Schulman K, Seils D (2003) Clinical economics. In: Max M, Lynn J (eds) Interactive textbook on
clinical symptom research. National Institutes of Health, Bethesda
Scuffham PA, Whitty JA, Mitchell A, Viney R (2008a) The use of QALY weights for QALY cal-
culations: a review of industry submissions requesting listing on the Australian Pharmaceutical
Benefits Scheme 2002-4. Pharmacoeconomics 26:297–310
Scuffham PA, Yelland MJ, Nikles J, Pietrzak E, Wilkinson D (2008b) Are N-of-1 trials an eco-
nomically viable option to improve access to selected high cost medications? The Australian
experience. Value Health 11:97–109
Scuffham PA, Nikles J, Mitchell GK, Yelland MJ, Vine N, Poulos CJ, Pillans PI, Bashford G, Del
Mar C, Schluter PJ, Glasziou P (2010) Using N-of-1 trials to improve patient management and
save costs. J Gen Intern Med 25:906–913
Smith DH, Gravelle H (2001) The practice of discounting in economic evaluations of healthcare
interventions. Int J Technol Assess Health Care 17:236–243
Thompson S, Barber J (2000) How should cost data in pragmatic randomised controlled trials be
analysed? BMJ 320:1197–1200
Torgerson D (1999) Discounting. BMJ 319:914–915
Chapter 14
Reporting N-of-1 Trials to Professional
Audiences
M. Sampson (*)
Children’s Hospital of Eastern Ontario, Ottawa, Canada
e-mail: msampson@cheo.on.ca
L. Shamseer
Ottawa Hospital Research Institute, Ottawa, Canada
e-mail: lshamseer@ohri.ca
S. Vohra
Department of Pediatrics, University of Alberta, Edmonton, Canada
e-mail: svohra@ualberta.ca
Reporting Standards
Since the early 1990s, there have been concerted efforts to improve the complete-
ness and clarity of reporting of healthcare research (Moher 2009). Philosophically,
making trial results available to a professional audience has been stated as an ethical
imperative. The Declaration of Helsinki states that
Researchers have a duty to make publicly available the results of their research on human
subjects and are accountable for the completeness and accuracy of their reports. All parties
should adhere to accepted guidelines for ethical reporting. Negative and inconclusive as
well as positive results must be published or otherwise made publicly available. Sources of
funding, institutional affiliations and conflicts of interest must be declared in the publica-
tion. Reports of research not in accordance with the principles of this Declaration should
not be accepted for publication. Paragraph 36 (World Medical Association General
Assembly 2013)
This requirement would clearly apply to N-of-1 trials with a research focus, but
many have a clinical focus, being designed to inform particular clinical decisions. It
seems likely that many of these clinically oriented trials are not shared in the profes-
sional literature (Price and Grimley Evans 2002). However, even in these cases, the
Declaration encourages evaluation of safety and efficacy and recommends that
information be recorded and made publicly available (Paragraph 37).
Positive and negative results are equally important – reporting only successful
trials, or only the successful outcome measures within trials, will result in a dis-
torted picture of the efficacy of the intervention. Publication bias, that is, selective
reporting of positive trials, has been extensively documented in the randomized
controlled trial literature (Dwan et al. 2013); similar problems are likely to exist
across the spectrum of health research, including in N-of-1 trials, perhaps to a larger
degree due to fewer regulations and less monitoring. Increasingly, there is pressure
from consumer, legal and professional sources to register trials and publicly report
their results (Goldacre 2014; Lefebvre et al. 2013). Reporting guidelines assist in
this. A reporting guideline is typically a consensus-based document, which provides
authors with a minimum set of information that should be completely reported for a
particular research design or design aspect (Moher et al. 2011). Importantly, report-
ing guidelines are not a judgment on the quality of the research (Moher 2009)
although transparency in reporting by authors enables readers to better gauge the
quality of the conduct and design of reported/published research.
Since the publication of the first scientific article around 1665, the overall orga-
nization of articles has become more formal and less literary in style and, in the
twentieth century, the IMRAD format of Introduction, Methods, Results and
Discussion has been adopted in medicine, becoming dominant by approximately
1965 (Sollaci and Pereira 2004). The narrative or chronological approach persisted in abstracts
until 1987 when the Annals of Internal Medicine introduced the structured abstract,
but only for clinical trials (Huth 1987), with the aim of assisting readers in quickly
judging the applicability and validity of findings of an article to clinical practice.
In 2006 a team led by Sunita Vohra of the University of Alberta, Canada, set out to
examine all published reports of N-of-1 trials. Our objective was to ascertain the
range of designs used to conduct N-of-1 trials; methods used in trial statistical anal-
ysis; and methods for combining data from a series of N-of-1 trials. While we later
scaled back our efforts to focus on ABAB designs, we initially examined AB, ABA
and ABAB approaches. What became clear quite early on, and was crystal clear
after reading several hundred reports, was that it was very often difficult to tell what
the investigators had done. We wanted to study the number of periods and pairs, the
use of blinding, run-in and washout periods. We wanted to know the number of
measurements per period, stopping criteria, types of outcome measures (e.g. subjec-
tive vs. objective, validated population-specific measures vs. patient- and symptom-
specific measures) and methods for adverse event reporting. It was often not clear
what the diagnosis was, and whether there were any concurrent conditions or thera-
pies. Often, the description was inadequate to assess if the comparator used was
appropriate. One frequently had to read and re-read, looking for clues as to how
many treatment periods were administered, in what sequence and how that sequence
was determined. Partly because many of the reports were so difficult to unravel, we
eventually realized that new studies were being published faster than we could
screen them and extract the data.
Over a period of years, we concluded that a standardized method of reporting of
N-of-1 trials, adapted from the CONSORT statement for randomized controlled
trial (RCT) reports, could help investigators improve the quality and consistency of
their N-of-1 trial reports. Based on this preliminary work, and with funding support, the CONSORT Extension for reporting N-of-1 Trials (CENT) guidelines were developed.
There are two main tools to support those using the CENT guidelines. The first is a
checklist of 25 items, some with sub-elements that were determined to be the essen-
tial aspects of an N-of-1 trial to be reported. The checklist follows the IMRAD
structure, so can be thought of as a detailed document outline. The checklist is avail-
able as part of the CENT statement (Vohra et al. 2015). The accompanying Explanation and Elaboration document (E&E) gives the rationale for reporting each item (Shamseer et al. 2015). In some cases,
such as randomization and blinding, the purpose of describing the research proce-
dures is to allow the assessment of observer bias. In other cases, the purpose is to
avoid bias in the interpretation of results, for example, by stating whether carryover
effect or period effect was assessed. Some elements are needed so that meta-
analysis, or aggregation across trials, is possible – these include reporting of mea-
sures of error and precision. Whatever the reason for inclusion in the checklist, the
E&E explains the rationale and provides examples from published accounts of
N-of-1 trials.
Most elements of the CENT checklist apply to reports of both individual N-of-1 trials and series of N-of-1 trials, but some items call for nuanced reporting differences between the two types of reports; where an item applies to both individual and aggregated reports, examples of each type are given.
Fundamentally, detailed reporting of the key elements of the methods and results of N-of-1 trials enables readers to assess the validity of the research and allows clinicians and researchers to make optimal use of N-of-1 trial findings, whether in clinical care or in future research.
Although reporting guidelines are not intended to dictate how a study should be
designed and conducted, full reporting is easiest with forethought and planning.
Thus, these two elements, the CENT checklist and the E&E, are helpful to the
researcher in both protocol development and manuscript preparation once a trial (or
series of trials) is complete. The CENT guidance can be used by journal editors and
peer reviewers to assess the merits of a research report and request that gaps be
filled. Many journals that have endorsed the CONSORT checklist require that
authors provide a completed checklist, indicating the page number of each item in
the manuscript when the manuscript is submitted for consideration (e.g. JAMA
Instructions For Authors (JAMA 2014)). Finally, post-publication, the user of the
published article can critically appraise the work.
The rest of this chapter addresses how specific aspects of the N-of-1 trial should
be reported and will draw heavily from the CENT statement (Vohra et al. 2015) and
the CENT E&E (Shamseer et al. 2015). The reader may wish to keep the CENT
checklist at hand while reading. All CONSORT extension checklists can be down-
loaded from the EQUATOR web site (http://www.equator-network.org/).
Main elements within CENT that differ from CONSORT begin with Item 1a, the title of the manuscript: the title should identify the study as an “N-of-1 trial” and, for a series, identify the design as “a series of N-of-1 trials”. The abstract should be a
structured summary of trial design, methods, results and conclusions. Detailed
guidance is available in Table 14.1 of the CENT E&E (Shamseer et al. 2015) as well
as the CONSORT extension for abstracts, designed to cover both abstracts of jour-
nal articles and conference abstracts (Hopewell et al. 2008).
Authors should state the scientific background, explain the rationale, including the
rationale for using the N-of-1 design and state the specific objectives or hypotheses
of the trial. It may be helpful to clarify whether the trial was done as research or as
clinical care (Punja et al. 2014).
Trial Design
For N-of-1 trials, authors should describe the planned number of periods and the duration of each period (including run-in and washout, if applicable), with a rationale.
In addition, if the report describes a series of trials, authors should state whether and
how the design was individualized to each participant, along with an explanation of
the series design.
Throughout the report, and beginning with the trial design, any deviation from
the planned design, such as a change in number or length of periods, should be
described and explained. The reasons for the deviation may be important in inter-
preting the results.
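To make this concrete, a minimal sketch follows (in Python; the field names are our own illustrative invention, not part of CENT) of how a planned design, and any deviations from it, might be recorded so that both can later be reported:

```python
# A hypothetical record of a planned N-of-1 design, capturing the elements
# CENT asks authors to report: planned periods and their order, period
# duration, run-in and washout, and how the sequence was determined.
planned_design = {
    "sequence": ["A", "B", "A", "B"],     # planned order of treatment periods
    "period_length_days": 14,             # duration of each treatment period
    "run_in_days": 7,                     # pre-trial run-in, if used
    "washout_days": 3,                    # washout between periods, if used
    "sequence_determination": "randomized within pairs",
    "rationale": "two-week periods allow symptom scores to stabilise",
}

# Deviations from the plan should be recorded with their reasons, since the
# reasons may be important in interpreting the results.
deviations = [
    {"period": 3, "planned_days": 14, "actual_days": 9,
     "reason": "adverse event led to early termination of the period"},
]
```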
Participant(s)
The eligibility criteria for trial participation should be included in the methods section; however, the description of the actual trial participants should be reported in the results section. In addition, the methods section should include a description of the settings and locations where data were collected and the dates defining the periods of recruitment and follow-up.
The interventions for each period should be described in sufficient detail to allow
replication. The description should include how and when the interventions were
actually administered. A strength of N-of-1 trials is that interventions can generally
be tailored to meet a patient’s unique profile, (Guyatt et al. 2000) and so the inter-
vention as tested needs to be fully and clearly described. Several other CONSORT
extensions are available that can guide reporting of herbal interventions (Gagnier
et al. 2006), acupuncture (MacPherson et al. 2010) and non-pharmacological treat-
ments (Boutron et al. 2008b). In addition, detailed guidance on effectively describ-
ing interventions can be found in the TIDieR – the Template for Intervention
Description and Replication Checklist and Guide (Hoffmann et al. 2014).
The measurement properties, that is, the validity and reliability, of outcome assessment tools should be reported. Any changes made to the selection of trial outcomes and measurement instruments after the trial commenced should be stated and the reasons for the change explained. Supplemental guidance on patient-reported outcomes is available
through CONSORT PRO (Calvert et al. 2013).
Population, intervention, comparison and outcome are the classic elements of a
clinical query. At this stage of the report preparation, authors should reflect on
whether they have given a sufficiently clear description of these elements so that the
report can be found by a search of those essential parameters. Although sources
such as PubMed do not allow readers to search the full text of articles, a good
description will enable indexers to assign useful subject headings and thereby make
the article easier to find.
Also included in the methods section are the more technical elements of the design –
sample size, allocation concealment, randomization, blinding and statistical meth-
ods – all of which need to be described in enough detail to permit critical appraisal
and replication.
In discussing these elements, it is helpful to keep in mind that reporting guide-
lines address reporting – they are not prescriptive regarding how a trial should be
conducted, therefore they do not mandate that a trial should or should not be ran-
domized or blinded. CENT takes no position on whether statistical analysis is
appropriate for N-of-1 designs. However, for each of these aspects, the report should
make clear what was done. In N-of-1 trials, randomization refers to the random ordering of the treatment periods.
Analytic Methods
Analytic methods for N-of-1 trials may include two broad types of approaches: visual analysis and statistical analysis. There is some difference of opinion as to which
approach is preferable, thus authors should state which approach to analysis they
used (if not both) and the reasons. In line with recommendations made by the
International Committee for Medical Journal Editors (ICMJE) and the CONSORT
group, analytical methods should be described “with enough detail to enable a
knowledgeable reader with access to the original data to verify the reported results”
(International Committee of Medical Journal Editors 1997).
Many N-of-1 trial authors provide a visual representation of the data which
allows readers to inspect the slope, variability and patterns of the data and the over-
all reliability and consistency of treatment effects (Gage and Lewis 2013; Horner
et al. 2012). Analytic aids such as a line of best fit may sometimes be used to facili-
tate interpretation of visually presented data. If such analysis was done, authors
should describe how and why it was carried out.
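As an illustration, the sketch below (entirely hypothetical data, plotted with the widely used matplotlib library) produces the kind of simple trial pictorial described above, with period boundaries marked so that slope, variability and patterns can be inspected:

```python
# Plot daily pain scores (0-10) from a hypothetical ABAB N-of-1 trial,
# with dashed lines marking the boundaries between treatment periods.
import matplotlib.pyplot as plt

scores = [6, 7, 6, 7, 6, 6, 7,   # period 1: treatment A
          4, 4, 3, 4, 3, 4, 4,   # period 2: treatment B
          6, 6, 7, 6, 7, 6, 6,   # period 3: treatment A
          3, 4, 3, 3, 4, 3, 4]   # period 4: treatment B
days = range(1, len(scores) + 1)

plt.plot(days, scores, marker="o")
for boundary in (7.5, 14.5, 21.5):       # period boundaries
    plt.axvline(boundary, linestyle="--", color="grey")
plt.xlabel("Day")
plt.ylabel("Pain score (0-10)")
plt.title("Outcome over time in an ABAB N-of-1 trial (illustrative data)")
plt.show()
```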
For a series of N-of-1 trials, methods of any quantitative synthesis of individual
trial data should be described fully, including subgroup analyses, adjusted analyses
and how heterogeneity between participants was assessed. Authors may find it help-
ful to consult the PRISMA Statement for specific guidance on reporting syntheses
of multiple trials (Moher et al. 2009).
Ethics
Finally, authors should include a statement in the methods section about the research
ethics status of the N-of-1 trial. If the N-of-1 trial was undertaken solely to better
manage an individual’s treatment, i.e. as a form of enhanced clinical assessment,
then the trial may not require institutional ethics review board oversight (Punja et al.
2014; Mahon et al. 1995; Irwig et al. 1995). Whether the report represents an under-
taking under the auspices of research or clinical care should be made clear and if it
is research, the report should cite the institutional ethics board that reviewed and
approved the research study (Punja et al. 2014).
Moving on to the results section, the main elements to be described are recruitment,
participant flow, baseline data, numbers analyzed, the estimated effect size and its
precision (such as 95 % confidence interval) for each primary and secondary out-
come, results of ancillary analysis and finally harms. Thus, the results section
accounts for the flow of the trial as well as the participant outcomes. While symp-
tom data are certainly going to be the core of any N-of-1 trial report, that data cannot
be interpreted in isolation.
The results section should begin with a clear account of the number and sequence
of periods completed and any changes from the original plan and the reasons for
those changes. If the report describes a series of N-of-1 trials, the number of partici-
pants who were enrolled, assigned to interventions and analyzed for the primary
outcome should be described. Any losses or exclusion of participants after treatment assignment should be accounted for. Whether or not any periods were stopped
early and whether or not the trial was stopped early should be reported and any early
stopping should be explained.
A diagram showing the flow of participants through the trial is strongly recom-
mended. This will be similar to the flow diagram recommended by the CONSORT
statement and an example from an N-of-1 series is presented in the CENT E&E and
reproduced here (Fig. 14.1) (Shamseer et al. 2015).
Following the description of the trial flow, a table showing baseline demographic
and clinical characteristics for trial participants is needed.
This brings us to the heart of the results section: reporting the results for the
outcomes of interest. This includes stating the estimated effect size (i.e. the magni-
tude of change in outcome for one treatment compared to another) and its precision
(e.g. 95 % confidence interval) for each primary and secondary outcome. For binary
outcomes, presentation of both absolute and relative effect sizes is recommended. In
addition, for a series of N-of-1 trials where a quantitative synthesis was performed,
group estimates of effect and precision for each primary and secondary outcome
should be stated.
Since N-of-1 trials consist of repeated periods, and sometimes multiple outcome
measurements within periods, authors may first summarize and present data for
each treatment before estimating effect size.
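For example, a crude sketch of this two-step approach (hypothetical numbers; it treats period means as independent and ignores autocorrelation, so it is a simplification rather than a recommended analysis) might look like this:

```python
# Summarize each treatment's period means, then estimate the effect size
# (difference in means) and an approximate 95% confidence interval.
from statistics import mean, stdev
from math import sqrt

a_period_means = [6.4, 6.3]   # mean outcome in each period on treatment A
b_period_means = [3.7, 3.4]   # mean outcome in each period on treatment B

effect = mean(b_period_means) - mean(a_period_means)
se = sqrt(stdev(a_period_means) ** 2 / len(a_period_means)
          + stdev(b_period_means) ** 2 / len(b_period_means))
low, high = effect - 1.96 * se, effect + 1.96 * se

print(f"Effect size (B vs A): {effect:.2f}, approx. 95% CI ({low:.2f}, {high:.2f})")
```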
In addition to a table or text containing this information, authors may also present it in the form of a simple graph, plotting each outcome over time and distinguishing between treatment periods (Fig. 14.2) (Donegan et al. 2013).

Fig. 14.2 Example of a trial pictorial showing outcome over time for an N-of-1 trial

Making individual patient data available may facilitate
future inclusion of N-of-1 trials in meta-analyses (i.e., using individual patient data)
(Riley et al. 2010). Where the nature of the data is such that presenting all data
points is prohibitive, authors are encouraged to consider data deposition, discussed
later in this chapter.
Results should be reported for all planned primary and secondary end points, and
for all participants, not just for analyses that were statistically significant or interest-
ing. Selective reporting of outcomes within population-based RCTs is a widespread
and serious problem (Chan et al. 2004) although it is, in many cases, unintentional
(Smyth et al. 2011). Trial registration has made selective reporting easier to detect,
although it has not eliminated it (Huić et al. 2011).
Results of any other analyses performed, including assessment of carryover
effects, period effects and intra-subject correlation, should be reported. Where a
series of N-of-1 trials is reported, results of any sub-group analysis that was done
should be reported.
All harms or unintended effects for each intervention should be described. If no
harms were observed, this should be stated. Without such a statement the reader
cannot determine if the treatment was free of unintended effects or if the authors
have simply not reported on this important aspect of trial outcomes. Specific guid-
ance for reporting harms associated with trials is available in CONSORT for harms
(Ioannidis et al. 2004).
The discussion section, like the introduction, follows the same format for N-of-1
trials as for parallel group trials. Authors should discuss limitations of the study,
generalizability of the findings and interpretation of the findings (balancing benefits
and harms) taking into consideration other relevant evidence.
Supplemental Information
Privacy concerns may weigh against public deposit of N-of-1 data (El Emam et al. 2012), although some advocate data sharing as necessary to advance the treatment of rare diseases (Pleticha 2014). In datasets that are to
be publicly available, rather than available to those with certain credentials, and
which contain data for small numbers of participants, the risk of re-identification is
considered high (El Emam 2008). The privacy afforded by removing identifying
information from the dataset will be entirely undone if the information supplied in
the manuscript concerning location of the research or characteristics of the partici-
pants can be easily collated with dataset elements to re-identify those participants.
When there is any doubt about the protection of the research participant, authors
should discuss the data release with their institutional review board (Hrynaszkiewicz
et al. 2010).
Clearly, data deposit must be planned in advance. Research participants must be informed during the consent process, and data management must be planned to ensure the dataset can be de-identified and fully documented without substantial additional effort.
Conclusion
A full report of clinical data to professional audiences will include trial registration,
a publicly available protocol, journal publication of methods and results without
bias toward positive findings as well as public deposit of the anonymized clinical
data. The journal article is the core of this reporting as it is often the only product of
a research study that is available to the public, with the other elements ensuring full
transparency and data reusability for a research study overall. CENT provides a
structured format to ensure that the main journal report is sufficiently detailed that
it can be critically appraised and replicated.
References
Hrynaszkiewicz I et al (2010) Preparing raw clinical data for publication: guidance for journal
editors, authors, and peer reviewers. Trials 11(1):9. Available at: http://www.trialsjournal.com/
content/11/1/9. Accessed 29 Apr 2014
Huić M, Marušić M, Marušić A (2011) Completeness and changes in registered data and reporting
bias of randomized controlled trials in ICMJE journals after trial registration policy.
PLoS One 6(9):e25258. Available at: http://dx.plos.org/10.1371/journal.pone.0025258. Accessed 1 May 2014
Huth EJ (1987) Structured abstracts for papers reporting clinical trials. Ann Intern Med 106(4):626.
Available at: http://annals.org/article.aspx?articleid=701794. Accessed 9 May 2014
International Committee of Medical Journal Editors (1997) Uniform requirements for manuscripts
submitted to biomedical journals. N Engl J Med 336(4):309–315. Available at: http://linkinghub.elsevier.com/retrieve/pii/S0168827897803066. Accessed 5 May 2014
Ioannidis JPA et al (2004) Better reporting of harms in randomized trials: an extension of the
CONSORT statement. Ann Intern Med 141(10):781–788
Irwig L, Glasziou P, March L (1995) Ethics of N-of-1 trials. Lancet 345(8948):469. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/7861872. Accessed 5 May 2014
JAMA (2014) Instructions for authors. Available at: https://jama.jamanetwork.com/public/instructionsForAuthors.aspx#CONSORTFlowDiagramandChecklist. Accessed 6 May 2014
Jones J, Hunter D (1995) Consensus methods for medical and health services research. BMJ
311(7001):376–380. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=
2550437&tool=pmcentrez&rendertype=abstract. Accessed 2 May 2014
Krleza-Jerić K, Lemmens T (2009) 7th revision of the declaration of Helsinki: good news for the
transparency of clinical trials. Croat Med J 50(2):105–110. Available at: http://www.ncbi.nlm.
nih.gov/pmc/articles/PMC2681053/. Accessed 1 May 2014
Lefebvre C et al (2013) Methodological developments in searching for studies for systematic
reviews: past, present and future? Syst Rev 2(1):78. Available at: http://www.ncbi.nlm.nih.gov/
pubmed/24066664. Accessed 17 Oct 2013
MacPherson H et al (2010) Revised STandards for Reporting Interventions in Clinical Trials of
Acupuncture (STRICTA): extending the CONSORT statement. J Evid Based Med 3(3):140–155
Mahon JL, Feagan BG, Laupacis A (1995) Ethics of N-of-1 trials. Lancet 345(8955):989. Available
at: http://www.sciencedirect.com/science/article/pii/S0140673695907386. Accessed 5 May
2014
Moher D (2009) Guidelines for reporting health care research: advancing the clarity and transpar-
ency of scientific reporting. Can J Anaesth 56(2):96–101
Moher D, Schulz KF, Altman DG (2001) The CONSORT statement: revised recommendations for
improving the quality of reports of parallel group randomized trials. BMC Med Res Methodol
1:2. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=32201&tool=pm
centrez&rendertype=abstract. Accessed 10 Jul 2014
Moher D et al (2009) Preferred reporting items for systematic reviews and meta-analyses: the
PRISMA statement. PLoS Med 6(7):e1000097. Available at: http://www.plosmedicine.org/
article/info%3Adoi%2F10.1371%2Fjournal.pmed.1000097
Moher D et al (2010a) CONSORT 2010 explanation and elaboration: updated guidelines for
reporting parallel group randomised trials. BMJ 340:c869–c869. Available at: http://www.bmj.
com/cgi/doi/10.1136/bmj.c869. Accessed 6 Jul 2010
Moher D et al (2010b) Guidance for developers of health research reporting guidelines. PLoS Med
7(2):e1000217. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2821895&tool=pmcentrez&rendertype=abstract
Moher D et al (2011) Describing reporting guidelines for health research: a systematic review. J
Clin Epidemiol 64(7):718–742
Morris C (2008) The EQUATOR network: promoting the transparent and accurate reporting of
research. Dev Med Child Neurol 50(10):723
Nakayama T et al (2005) Adoption of structured abstracts by general medical journals and format
for a structured abstract. J Med Libr Assoc 93(2):237–242. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1082941&tool=pmcentrez&rendertype=abstract. Accessed 29 Apr 2014
Nikles J et al (2013) Do pilocarpine drops help dry mouth in palliative care patients: a protocol for
an aggregated series of n-of-1 trials. BMC Palliat Care 12(1):39. Available at: http://www.
biomedcentral.com/1472-684X/12/39. Accessed 29 Apr 2014
Norris SL et al (2014) Clinical trial registries are of minimal use for identifying selective outcome
and analysis reporting. Res Synth Methods. Available at: http://doi.wiley.com/10.1002/jrsm.1113.
Accessed 19 Mar 2014
Pleticha R (2014) Guest post: people with rare diseases need results from all trials. All Trials News.
Available at: http://www.alltrials.net/2014/guest-post-people-with-rare-diseases-need-results-
from-all-trial/. Accessed 12 May 2014
Price JD, Grimley Evans J (2002) N-of-1 randomized controlled trials (‘N-of-1 trials’): singularly
useful in geriatric medicine. Age Ageing 31(4):227–232
Punja S et al (2014) Ethical framework for N-of-1 trials: clinical care, quality improvement, or
human subjects research? In: Kravitz R, Duan N, the DEcIDE Methods Center N-of-1 Guidance Panel (eds) Design and
implementation of N-of-1 trials: a user’s guide. Agency for Healthcare Research and Quality,
Rockville, pp 13–22
Riley RD, Lambert PC, Abo-Zaid G (2010) Meta-analysis of individual participant data: rationale,
conduct, and reporting. BMJ 340:c221. Available at: http://www.bmj.com/cgi/doi/10.1136/
bmj.c221. Accessed 6 May 2014
Ripple AM et al (2011) A retrospective cohort study of structured abstracts in MEDLINE, 1992–
2006. J Med Libr Assoc 99(2):160–163. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3066587&tool=pmcentrez&rendertype=abstract. Accessed 29 Apr 2014
Schardt C et al (2007) Utilization of the PICO framework to improve searching PubMed for clini-
cal questions. BMC Med Inform Decis Mak 7(1):16. Available at: http://www.biomedcentral.com/1472-6947/7/16.
Accessed 7 May 2014
Schulz KF, Altman DG, Moher D (2010) CONSORT 2010 Statement: updated guidelines for
reporting parallel group randomised trials. BMJ 340:c332–c332. Available at: http://www.bmj.
com/cgi/doi/10.1136/bmj.c332. Accessed 13 Apr 2011
Science.gc.ca (2011) Open access: research data. Available at: http://www.science.gc.ca/default.
asp?lang=en&n=2BBD98C5-1. Accessed 12 May 2014
Shamseer L, Sampson M, Bukutu C, Schmid CH, Nikles J, Tate R, Johnston BC, Zucker D,
Shadish WR, Kravitz R, Guyatt G, Altman DG, Moher D, Vohra S, CENT group (2015)
CONSORT extension for reporting N-of-1 trials (CENT) 2015: explanation and elaboration.
BMJ 350:h1793. doi:10.1136/bmj.h1793
Sharma S, Harrison JE (2006) Structured abstracts: do they improve the quality of information in
abstracts? Am J Orthod Dentofacial Orthop 130(4):523–530. Available at: http://www.sciencedirect.com/science/article/pii/S0889540606008936. Accessed 29 Apr 2014
Smyth RMD et al (2011) Frequency and reasons for outcome reporting bias in clinical trials: interviews with trialists. BMJ 342:c7153. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3016816&tool=pmcentrez&rendertype=abstract. Accessed 6 May 2014
Sollaci LB, Pereira MG (2004) The introduction, methods, results, and discussion (IMRAD) structure: a
fifty-year survey. J Med Libr Assoc 92(3):364–367. Available at: http://www.ncbi.nlm.nih.gov/
pmc/articles/PMC442179/. Accessed 29 Apr 2014
Stevenson HA, Harrison JE (2009) Structured abstracts: do they improve citation retrieval from
dental journals? J Orthod 36(1):52–60; discussion 15–6
Tufte E (1990) Envisioning information. Graphics Press, Cheshire
Turner L, Shamseer L, Altman DG, Weeks L et al (2012a) Consolidated standards of reporting
trials (CONSORT) and the completeness of reporting of randomised controlled trials (RCTs)
published in medical journals. Cochrane Database Syst Rev 11(11):MR000030. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/23152285. Accessed 8 May 2014
Turner L, Shamseer L, Altman DG, Schulz KF et al (2012b) Does use of the CONSORT Statement
impact the completeness of reporting of randomised controlled trials published in medical
journals? A Cochrane review. Syst Rev 1:60. Available at: http://www.pubmedcentral.nih.gov/
articlerender.fcgi?artid=3564748&tool=pmcentrez&rendertype=abstract
Vohra S, Shamseer L, Sampson M, Bukutu C, Schmid CH, Tate R, Nikles J, Zucker DR, Kravitz
R, Guyatt G, Altman DG, Moher D, CENT group (2015) CONSORT extension for reporting
N-of-1 trials (CENT) 2015 statement. BMJ 350:h1738. doi:10.1136/bmj.h1738
Wen J et al (2008) The reporting quality of meta-analyses improves: a random sampling study. J
Clin Epidemiol 61(8):770–775. Available at: http://www.ncbi.nlm.nih.gov/pubmed/18411041.
Accessed 27 Nov 2010
Wilczynski NL et al (1995) Preliminary assessment of the effect of more informative (structured)
abstracts on citation retrieval from MEDLINE. MEDINFO 8(Pt 2):1457–1461
World Medical Association General Assembly (2013) Declaration of Helsinki: ethical principles
for medical research involving human subjects. JAMA 310(20):2191–2194. Available at: http://
www.ncbi.nlm.nih.gov/pubmed/24141714
Chapter 15
Single Patient Open Trials (SPOTs)
Abstract Single patient open trials (SPOTs) are nearly identical to standard trials of
treatment. The added essential ingredient is a set of symptoms (commonly arrived at
by negotiation between clinician and patient) to monitor (the outcome measures).
This means they lie somewhere in between formal N-of-1 trials and totally informal
trials of treatment in terms of rigour. SPOTs are accordingly less demanding to
arrange (for both the patient and clinician) than N-of-1 trials, but they require considerably more effort and commitment than casual trials of treatment. This chapter
defines and describes the rationale for SPOTs, discusses when and why they could be
used, as well as their limitations, and describes outcome measures and analysis. As
well as describing the use of SPOTs in clinical contexts, it covers the extra consider-
ations required when using SPOTs in research. Several examples of the practical
application of SPOTs are given, some with the resulting data. It is anticipated that the
examples may be adapted to enable other clinicians and their patients to perform their
own SPOTs to validate other medical interventions in the context of the individual.
GPs are drawn to pragmatic approaches. One of these is the therapeutic trial (“trial
of treatment”), a well-established practical tool for deciding if a treatment is going
to be useful: the patient is asked to ‘try’ the treatment, and report back if the symp-
toms are helped or not. This enables a long-term treatment plan to be formulated.
This approach has three weaknesses.
Firstly, both the placebo effect and regression to the mean are likely to play a large role. The placebo effect is well known: patients are likely to expect that their symptoms will improve when offered a new, possibly expensive, treatment presented with hope, even alongside a list of the new treatment's dangers (perhaps the reason it had not been contemplated until now). The expectations may well become experiences (Howick et al. 2013).
Secondly, regression to the mean, a less well understood statistical phenomenon (Bland and Altman 1994), has important clinical effects. In this setting, because patients tend to seek help when fluctuating symptoms are at their worst, the passage of time alone will see those symptoms settle: symptoms that wax and wane tend to 'regress' towards their average severity when an outlying measurement is repeated. Any newly adopted treatment can therefore too easily be assumed to be the cause of the improvement, a post hoc, ergo propter hoc fallacy.
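A small simulation (all numbers hypothetical) makes the phenomenon tangible: if a fluctuating symptom is recorded only when the patient presents at their worst, the next measurement tends to be lower even though no treatment has been given:

```python
# Simulate a symptom that fluctuates around a stable mean of 5 (0-10 scale).
# Patients "present" only when severity exceeds 7; we then re-measure.
import random

random.seed(1)
at_presentation, at_followup = [], []
for _ in range(10_000):
    day1 = random.gauss(5, 1.5)
    day2 = random.gauss(5, 1.5)          # no treatment between measurements
    if day1 > 7:                         # help is sought when symptoms are bad
        at_presentation.append(day1)
        at_followup.append(day2)

print(f"Mean severity at presentation:   {sum(at_presentation)/len(at_presentation):.2f}")
print(f"Mean severity at re-measurement: {sum(at_followup)/len(at_followup):.2f}")
# The second reading is lower on average purely by regression to the mean.
```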
Finally, the vagueness of patients' symptom outcomes means that any placebo and regression-to-the-mean effects may be amplified. One of the strengths of randomised trials (including N-of-1 trials) is the great care taken to identify outcome measures, in this case, symptoms to follow over time. Otherwise, our all-too-human tendency to over-estimate causal effects means that we focus on symptom components that improve, and neglect those that do not improve (or even worsen) because they do not conform to the expected pattern of benefit.
Single patient open trials (SPOTs) are nearly identical to trials of treatment. The
added essential ingredient is a (commonly arrived at) set of symptoms to monitor
(the outcome measure). This means they lie somewhere in between formal N-of-1
trials and totally informal trials of treatment in terms of rigour. SPOTs are accordingly less demanding to arrange (for both the patient and clinician) than N-of-1 trials, but they require considerably more effort and commitment than casual trials of treatment.
SPOTs have several essential components (see Box 15.1).
In summary, SPOTs provide a framework and methodology for evaluating treatment response in single patients in practice, using patient-centred outcomes to assess the extent of benefit of a given treatment in that individual.
SPOTs can help in situations in which there is uncertainty about two or more treat-
ment options – in exactly the same way that N-of-1 trials can help.
Why should there be uncertainty? One could argue that we have the whole edi-
fice of evidence-based medicine (EBM) to help direct us to the most effective treat-
ment. This is true, and for many decisions, we can go to trials for the best available
evidence to help us choose the most effective, with the least cost (in terms of adverse
effects, as well as financial cost).
However, there are situations in which individual variation plays an important role. Individual patients vary in their response to some treatments: what is effective for one patient may not be effective for another, presumably because of genetically determined differences in metabolism. Usually the mechanism is not elucidated, and we have to resort to empirical testing. One day, we may be able to undertake specific tests in individual patients to decide whether they will respond to a certain drug.
A well-known example of such a clinical situation is osteoarthritis, where the possible alternatives are paracetamol (acetaminophen) and non-steroidal anti-inflammatory drugs (NSAIDs). The conundrum facing the prescribing clinician is which of these two classes to use. Paracetamol is generally regarded as much safer than the NSAIDs, which cause gastro-intestinal bleeding, fluid retention and a higher risk of cardiovascular events such as myocardial infarction. This means that most primary care doctors will start patients on paracetamol first. If this does not control the pain and stiffness, they consider stepping up to one of the NSAIDs, perhaps employing a trial of treatment to do so. However, we know that less than half of patients will respond to NSAIDs (March et al. 1994). The rest will be exposed to the extra risk with no benefit. How can we decide which group the patient in front of us belongs to?
The best method of course would be an N-of-1 trial (March et al. 1994). However,
this is beyond most clinicians (especially in primary care, where this decision is
faced most commonly). A trial of therapy is a less rigorous method, limited for the reasons outlined above.
SPOTs then become a second-best (but much more accessible) method.
SPOTs can help make clinical decisions when patient and clinician are uncertain
about the best of two or more different treatment options.
Firstly, both clinician and patient need to be willing to recognise the uncertainty and join together to answer the question. If either has a strong view about the efficacy of one of the choices (that is, equipoise does not exist), then a SPOT is not suitable. Both must be prepared to expend the extra effort of collecting the data and attempting to interpret it later, and the patient (particularly) must be prepared to endure periods of time using what might turn out to be the less effective treatment option.
Secondly, the disease state must be stable and unlikely to spontaneously resolve. This means that SPOTs are limited to chronic conditions with persistent symptoms (or possibly other markers of disease, such as blood pressure or glycated haemoglobin [HbA1c]). It does not matter if these symptoms fluctuate; it just means the SPOT might have to take place over a longer time.
Table 15.1 Comparison of disease-centred versus patient-centred management of SPOT processes

SPOT process | Disease-centred/bio-medical questions | Patient-centred questions
Implementation | Will the patient adhere/comply? | Does the patient like the treatment enough to use it?
Efficacy | Symptom and/or disease control? Drug response? Is this a placebo effect? | Does it address the patient's concerns? Does the patient feel better?
Safety | Are there adverse effects? | Are (any) unwanted effects or safety risks enough to outweigh (any) benefits?
Patients may be more likely to continue a treatment if they have been instrumental in gathering the data for its evaluation on themselves and rate it better than the alternative. However, it should be pointed out that there is little empirical evidence for this effect. Ideally, this research question will be addressed in the future.
Limitations of SPOTs
There are problems with SPOTs. The most important, discussed above, is that the patient is not blind to which treatment they are using in each time period. It is too easy to incorrectly attribute to a treatment effects that in fact come from a placebo response or regression to the mean.
Although SPOTs may be more patient-centred than N-of-1 trials, the level of
evidence they provide on the effectiveness of treatments is lower due to the method-
ological differences highlighted in Table 15.2.
Some of these shortcomings can be partly mitigated by a focus on making the outcome as robust as possible. The vague outcomes (e.g. "feeling better"; "sleeping better") that are tolerated in informal trials of treatment are often too subjective to be useful. Identifying an outcome to measure that is objective enough, yet still meaningful to the patient, may be challenging and may require careful probing during the consultation (perhaps extending over more than one).
SPOTs are not suitable for all patients and all conditions – the condition must be stable and long-standing enough for the effects of the interventions to be measurable over a reasonable period with minimal variation attributable to natural history. The patient also needs sufficient interest in their condition and its optimal management to see the SPOT through.
Outcome Measures
What to Measure
The indicator or outcome needs to be clear, quite specific, and reflect what is most important to the patient; this choice must then be agreed between patient and clinician. The more relevant the outcome is to the patient, the more likely they will be interested enough to document their responses, e.g. level of pain, mobility, unwanted fatigue or
other drug side effects. Outcome measures may include both subjective (e.g.
patient reported outcomes) and objective (e.g. respiratory function tests). The
number of measures should be kept to the minimum required to achieve the aims
of the SPOT.
How to Measure it
There are many recording techniques for outcome measurement. Examples include manually entering information in a written or electronic diary, and using smartphone apps that provide both automatic measurement and simultaneous storage.
There is an abundance of apps for all sorts of purposes, including measurement of exercise, diet, heart rate, sleep and mental health, available free or at small cost. Their functionality and their usefulness to the SPOT process vary; just how much they have to offer depends on both their appeal to an individual patient and whether what they measure fits the chosen outcome(s). For example, if heart rate is an issue for patient and clinician, an app such as “Instant Heart Rate” can be used to measure and record pulse rate at specified times; if sleep or mood is the issue, there are apps to record these values too. See Table 15.4 for examples.
The frequency and duration of monitoring need to be agreed between patient and
doctor. Just how often it is practical or realistic to measure, as well as how long for
(e.g. twice a day for 2 weeks; or once a day for 1 month) will depend on the patient’s
interest and attitude. Greater scientific rigour is more likely to be obtained by
repeated cycles of treatment versus comparator and/or no treatment exposures.
SPOT Example 2 (below) uses three cycles of 4 weeks each. It is important to plan
around the predicted length of drug washout periods, and any expected delays in the
onset of action of each medication (to avoid misinterpretation of carryover effects).
Symptoms subject to more fluctuation will need more data to off-set the greater
uncertainty.
Ideally the analysis of the results of SPOTs is both simple and meaningful for
patient as well as clinician. This means simple enough to be undertaken without
statistical training. Apps used for data collection could be designed to do some of
these simple calculations. Descriptive results are usually fairly simple to understand
and may just be tallied, e.g. ‘treatment A was preferred over treatment B in two
cycles and no preference was expressed in the third cycle’.
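Such a tally is simple enough to automate; a minimal sketch follows (the cycle judgments are invented for illustration):

```python
# Tally cycle-level treatment preferences from a three-cycle SPOT.
from collections import Counter

cycle_preferences = ["A", "A", "no preference"]   # one judgment per cycle
tally = Counter(cycle_preferences)

for choice, count in tally.items():
    print(f"{choice}: {count} of {len(cycle_preferences)} cycles")
```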
In fact the analysis of SPOTs could be as complicated as for N-of-1 trials (see
Chap. 12). However in SPOTs more emphasis is placed on the individual patient’s
interpretation of the data, and less on what is significant (either at the statistical or
minimum clinically important difference level).
Deciding what is a statistically significant difference between the time periods of
an N-of-1 trial or SPOT is often difficult. Some outcomes may be already well-
defined by existing clinical guidelines, for example blood pressure targets for dia-
betics. For some quantitative outcomes, minimum clinically important differences
can be found in the clinical research literature, but these are essentially a statistical
construct based on a comparison of change scores over time with the patients’
global impression of change for a series of patients (Copay et al. 2007). What
patients regard as a worthwhile change in their clinical status can vary widely. For
example in chronic low back pain, this can vary from 1 % to 100 % for reductions
in pain and in disability (Yelland and Schluter 2006).
Ideally the threshold for a worthwhile difference in response to the treatments in
question should be decided in consultation with the patient before the SPOT com-
mences to avoid the post hoc, ergo propter hoc fallacy. This also fits in well with the
patient-centred approach of the SPOT.
For SPOTs with more than one outcome, a more complex process will be needed to draw conclusions about differences in the effectiveness of treatments. What is the relative value of each outcome, medically and to the patient? Should all outcomes be weighted equally, or does one take precedence over another? Are the benefits of treatments outweighed by their adverse effects? Ultimately it may be difficult to make a definitive statement about the results of a SPOT, and it may be more appropriate to describe the findings for each outcome separately.
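One simple possibility, sketched below with invented outcomes and weights (which would in practice be negotiated with the patient before the SPOT begins), is a weighted composite score:

```python
# Combine several per-outcome effects of treatment B relative to A into a
# single weighted composite; positive values favour B, negative favour A.
effects = {
    "pain reduction": 2.0,        # points on a 0-10 scale
    "sleep quality": 0.5,
    "side-effect burden": -1.0,   # harms count against treatment B
}
weights = {"pain reduction": 0.5, "sleep quality": 0.3, "side-effect burden": 0.2}

composite = sum(weights[name] * value for name, value in effects.items())
print(f"Weighted composite effect of B vs A: {composite:.2f}")
```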
The extent to which the conclusions from SPOTs will influence decisions about
future management is unknown at present and should be a priority topic in future
research about SPOTs. However it seems a reasonable hypothesis that involving the
patient in all stages of the SPOT process from design to data collection to decisions
about the results and conclusions, will give them a sense of ownership that may help
them to adhere better to management decisions than through the conventional infor-
mal trial of treatment.
Up to now, this chapter has been written with clinical care in mind. However, SPOTs can be used in research just as N-of-1 trials can.
They can be used in two ways:
1. SPOTs as single 'case reports'. These tend to be hypothesis-generating rather than providing generalizable information to predict how future patients may be managed;
2. A series of SPOTs. These are more readily generalised to future patients, as in "what can be expected from patients presenting with equipoise about X treatment for Y disease…".
There are extra considerations for the use of SPOTs as a research tool.
Informed Consent
Ethical Approval
If using a SPOT to guide clinical care of a patient, ethics approval is not required on
the assumption that the patient’s best interests are served. A patient’s implied con-
sent is adequate.
To prepare an ethics approval application (which can be submitted to a university or professional college institutional ethics committee), the intended study needs to be set out for both future patients and the committee. This will need to include all the features required by ethics committees (see Chap. 11).
Protocols
A formal protocol is not essential for use in therapeutic SPOTs (although it is a good
idea to document what is intended in the patient’s clinical record).
SPOT Example 1
“I want you to prescribe me testosterone.”
The patient was hunched and ready for conflict from the consulting room
chair.
“I have looked it up on the web,” he continued, “and I am sure it will make
me feel better”.
“If you don’t give it to me, I’ll get something illegally,” he added.
The GP re-checked his patient records. Of course he had no indication for
supplemental testosterone. The GP began to gingerly explore the reasons for
this 62 year old’s odd request, suspecting some sexual problem, perhaps erec-
tile dysfunction.
To his surprise the patient’s concerns were not sexual. Rather they focussed
on fatigue, or what he called ‘energy levels’. He was not able to get out on the
golf course as much as he used to, although he was able to keep on top of work
(which was office work). Surfing on the Web he had found sites that suggested
to him that testosterone would be effective.
The GP carefully went through his own misgivings. There was evidence
that testosterone could invoke increased risk from a number of factors, princi-
pally thromboembolism. He would be prescribing it off-label for an unortho-
dox indication.
SPOT Example 2
Jason was a 34 year old male accountant who came to see the GP for his
chronic low back pain, present since an episode of heavy lifting more than 3 years earlier. Over this time the pain had spread up into his mid-thoracic
spine and out into both gluteal muscles. It was aggravated by activities such
as lifting, ten-pin bowling and cycling but also by sitting and sleeping.
After 8 h sleep he would wake with significant pain, stiffness and muscle
spasm, causing great difficulties in straightening up. It took 3 h to ease in
the morning with light activity. He got partial relief from osteopathy and
massage and from over-the-counter anti-inflammatory medications taken
intermittently. He had been investigated for an inflammatory cause with an
ESR, CRP and HLA B27 genotyping, but all these tests were negative. A
recent spinal MRI had shown early degenerative changes in his lumbar
discs and facet joints, but no hallmarks of inflammation. There was a mod-
erate restriction of lumbar flexion, but other spinal movements were normal.
Doctor and patient discussed the pattern in his response: noticeably reduced pain and stiffness while on meloxicam, with return of both within several days of ceasing it.
Note that there was no statistical analysis – just a visual inspection of the results
(Table 15.5) for patterns. Not captured by his recorded outcomes was an increased
ability to exercise and play with his children during the periods on meloxicam. This
was an important outcome to him and one, in retrospect, that should have been
included. On the strength of his experience with this SPOT, Jason was keen to con-
tinue the meloxicam (under the cover of a proton pump inhibitor for gastric
protection).
Conclusion
In conclusion, what SPOT methodology lacks in rigour is compensated for by its ease of use in the workplace. SPOT research can never eliminate the placebo response, but SPOTs can arguably claim the top spot in patient-centred research, in both treatment outcomes and patient consent. In this territory the SPOT is a useful tool, and the placebo response is welcomed.
References
Barry MJ, Edgman-Levitan S (2012) Shared decision making – the pinnacle of patient-centered
care. N Engl J Med 366:780–781
Bland JM, Altman DG (1994) Regression towards the mean. BMJ 308:1499
Brown M (2010) Charles Bridges-Webb AO 1934–2010. http://sydney.edu.au/medicine/alumni/
news/tributes/100712.php. [Online]. University of Sydney. Accessed 28 Sept 2013
Copay AG, Subach BR, Glassman SD, Polly DW Jr, Schuler TC (2007) Understanding the mini-
mum clinically important difference: a review of concepts and methods. Spine J 7:541–546
Howick J, Friedemann C, Tsakok M, Watson R, Tsakok T, Thomas J, Perera R, Fleming S,
Heneghan C (2013) Are treatments more effective than placebos? A systematic review and
meta-analysis. PLoS One 8:e62599
March L, Irwig L, Schwarz J, Simpson J, Chock C, Brooks P (1994) N-of-1 trials comparing non-
steroidal anti-inflammatory drug with paracetamol in osteoarthritis. BMJ 309:1041–1046
Stewart M (2005) Reflections on the doctor-patient relationship: from evidence and experience. Br
J Gen Pract 55:793–801
Stewart MA (1995) Effective physician-patient communication and health outcomes: a review.
Can Med Assoc J 152:1423–1433
Yelland MJ, Schluter PJ (2006) Defining worthwhile and desired responses to treatment of chronic
low back pain. Pain Med 7:38–45
Chapter 16
Systematic Review and Meta-analysis Using N-of-1 Trials
Abstract This chapter discusses issues and approaches related to systematic review and meta-analysis of N-of-1 trials. Basic guidelines and methods are described, and important steps in a systematic review of these types of trials are discussed in detail. This is followed by a detailed description of meta-analytic
methods, spanning both frequentist and Bayesian techniques. A previously under-
taken meta-analysis of a comparison of treatments for fibromyalgia syndrome is
discussed with some sample size considerations. This is further elaborated on
through a discussion on the statistical power of studies through a comparison of
treatments for chronic pain. The chapter concludes with some final thoughts about
the aggregation of evidence from individual N-of-1 trials.
Introduction
N-of-1 trials apply the design principles of randomized controlled clinical trials to a single patient, and have therefore been promoted as potentially useful for the individualization of medicine (Lillie et al. 2011; Duan et al. 2013). A comprehensive description of these types of trials is provided by Kravitz
et al. (2014a) while Dallery et al. (2013) provide examples of other disciplines
including psychology and occupational therapy where such designs have generated
evidenced-based practices.
Although a trial can comprise a single individual, N-of-1 studies commonly enrol multiple individuals. Multiple-subject designs allow for evaluation and comparison of treatments both within and across individuals: the data from the individuals can be statistically combined to provide individual treatment-effect estimates which 'borrow strength' from other similar patients, and also to provide average treatment effects. In a similar manner, evidence can be combined among different
N-of-1 trials conducted with different groups of patients. These combinations can
be performed through systematic reviews and meta-analysis.
Systematic reviews provide a framework for consistent evaluation of studies
undertaken for the purpose of addressing a common scientific question, where ‘sci-
entific’ is used here in a broad sense and covers medicine, science, social science,
environment and ecology, finance and economics, and so on. Systematic reviews are ubiquitous in medical research and are often a compulsory component of evidence-based medicine. They are also becoming standard practice in other fields; see, for
example, the guidelines for systematic reviews and meta-analysis in ecology and
evolution detailed in Koricheva et al. (2013).
Meta-analysis is the quantitative combination of statistical estimates from a col-
lection of studies, where these are often compiled as part of a systematic review.
The methodology employed for meta-analysis depends on a range of statistical con-
siderations as well as the aim of the meta-analysis itself. There is a large literature
on meta-analysis, particularly in the field of clinical medicine and health. The pur-
pose of this chapter is to discuss the way in which evidence from N-of-1 trials can
be used for systematic reviews and meta-analysis. The process of conducting a sys-
tematic review is discussed in section “Systematic Reviews of N-of-1 Trials” and of
undertaking a meta-analysis in section “Meta-analysis of N-of-1 Trials”. Sample
size considerations are discussed in section “Meta-analysis Modelling Decisions”.
The chapter concludes with a general discussion and directions for future research.
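As a taste of the frequentist side of what follows, the sketch below pools invented effect estimates from a hypothetical series of N-of-1 trials using the DerSimonian-Laird random-effects method, one standard approach to such combination:

```python
# DerSimonian-Laird random-effects pooling of per-trial effect estimates.
from math import sqrt

effects = [-1.2, -0.8, -2.0, -0.3, -1.5]   # per-trial treatment effects
ses     = [0.6, 0.5, 0.9, 0.7, 0.8]        # their standard errors

w = [1 / s ** 2 for s in ses]               # fixed-effect (inverse-variance) weights
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)

# Method-of-moments estimate of between-trial heterogeneity, truncated at 0.
q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
tau2 = max(0.0, (q - (len(effects) - 1))
           / (sum(w) - sum(wi ** 2 for wi in w) / sum(w)))

w_re = [1 / (s ** 2 + tau2) for s in ses]   # random-effects weights
pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
se_pooled = sqrt(1 / sum(w_re))

print(f"tau^2 = {tau2:.3f}; pooled effect = {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * se_pooled:.2f} to {pooled + 1.96 * se_pooled:.2f})")
```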
Fig. 16.1 Steps in a systematic review: (1) define the review question; (2) develop criteria for including studies; (3) search for studies; (4) collect data; (5) assess risk of bias in included studies
The key steps include defining the review question, developing criteria for including studies, searching for studies, collecting data, and assessing risk of bias in included studies, among other issues.
These steps are depicted in Fig. 16.1. Other fields have also developed guidelines;
see, for example, Chaps. 4 and 5 in Koricheva et al. (2013) which deal with search-
ing literature, criteria for selection of studies, and extraction and critical appraisal
of data.
Below we discuss the application of the first two steps depicted in Fig. 16.1,
which are most relevant for systematic reviews of N-of-1 trials.
As with all meta-analyses, the review question must be clearly and specifically defined. It must be specific enough for the combined results to be interpretable, yet broad enough that an adequate number of studies can be included in the analysis.
In formulating the review question, one can reflect upon the suggestions of
Zucker et al. (2010) in regards to the type of studies N-of-1 trials are best applied to.
These are:
• The condition must be chronic and stable.
• The interventions must be symptomatic (not permanently changing the condition
status).
• The interventions need to have appropriate on/off kinetics to limit possible car-
ryover and period effects.
Because N-of-1 trials are individually tailored to a single patient, the criteria for inclusion of studies will to some extent be specific to the problem. However, there are common considerations that will apply to most, if not all, reviews.
A scientific study is often proposed when there is substantial uncertainty about the
answer to a question of interest. Clinical trials are often proposed when the question
of interest is a comparison of the effectiveness of specified treatments. Most trials
are constructed to learn about the average treatment effect across a group of indi-
viduals receiving a treatment. An N-of-1 trial is often proposed when interest is
focused on the efficacy of treatment for a single individual, in order to make a clini-
cal decision.
It is useful to understand the background and motivation of the trial. For example, is the scientific study motivated by a general lack of knowledge, or by a specific clinical decision that must be made for an individual patient?
A study should provide justification of the use of this design as opposed to alterna-
tives, such as the gold-standard randomized parallel group trial.
Kravitz et al. (2014b) provide a detailed exposition of the reasons why such a
design might be selected. In summary, an N-of-1 trial is most useful:
• When there is substantial variation in treatment outcomes within the individual,
and when interest focuses on these treatment outcomes for the individual;
• If the variation in treatment effects across patients cannot be easily predicted
from available prognostic factors but is anticipated to be substantial;
• If the outcome of interest is chronic, stable or slowly progressive and is either
symptomatic or is associated with a valid biomarker;
• If the outcome is rare, so that there is little other evidence of treatment effect;
• If the treatments have relatively rapid onset and washout; and
• If the treatment regime is relatively straightforward.
The design of an N-of-1 trial requires careful consideration (Kravitz et al. 2014b;
Schmid and Duan 2014). These trials are subject to usual study design issues such
as randomization, replication, blocking, the choice of outcomes, the scale of the
outcomes (continuous, categorical or count data). They are also subject to other
specific issues, including the design of crossovers, time-dependent confounders
and changes over time independent of treatment, carryover of treatment effects
from one period into the next, auto-correlation of measurements and premature
end-of-treatment periods.
Kravitz et al. (2014b) describe five important characteristics of a well-designed
N-of-1 trial: balanced sequence assignment, repetition, washout and run-in, blinding,
and systematic outcomes measurement. Schmid and Duan (2014) reiterate
these design principles, identifying in particular the following considerations: randomization
and counterbalancing, replication and blocking, the number of crossovers needed to
optimize statistical power, adaptation, and the choice of outcomes of interest to the
patient and clinician.
Adaptation rules need to be developed before the trial commences, and potential
biases need to be mitigated by strategies such as blinding.
Finally, systematic outcomes measurement refers to the identification of what
data to collect and how to collect them. Kravitz et al. (2014b) describe the process
of identifying outcome domains, indicators or measures of those domains, and data
collection. Whereas other clinical trial designs typically select a primary measure of
interest and design the trial around this, N-of-1 trials focus on the outcomes of
importance to the individual patient and his or her clinician. These will often be
analyzed separately, but can be combined in some form of index or composite mea-
sure. While this offers substantial flexibility and the possibility of more effectively
answering the aims of the trial stakeholders, it is important to verify the acceptabil-
ity, reliability, validity and relevance of selected measures. Importantly, do they
provide comprehensive coverage of the question of interest and do they measure
what they are intended to measure? A range of data types and sources, from
surveys to mobile devices, is potentially available for use in N-of-1 trials. The chosen
observations, which will form the trial statistics, must be adequate estimators of
the measure of interest, both clinically and statistically. A preferred estimator, and
corresponding statistic, is one that is accepted in the literature and is unbiased and
consistent: that is, it can be used for comparative purposes in a systematic review or
meta-analysis because it accurately estimates the target measure and because the precision of the
estimate improves as the sample size increases.
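In standard notation (a textbook formulation rather than anything specific to this chapter), an estimator $\hat{\theta}_n$ of a target measure $\theta$ computed from $n$ observations is unbiased and consistent if

$$E[\hat{\theta}_n] = \theta \ \text{ for all } n, \qquad \hat{\theta}_n \xrightarrow{p} \theta \ \text{ as } n \to \infty,$$

so that the sampling variability of the statistic shrinks as observations accrue, which is what allows trial-level statistics to be weighted sensibly in a systematic review or meta-analysis.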
An extract of the checklist developed by Schmid and Duan (2014) for assessing
the design of an N-of-1 trial is reproduced in Table 16.2.
The design principles described above are intended to avoid a range of well-recognized
potential biases and confounders in an N-of-1 trial. Other, often situation-specific,
biases and confounders can also arise in these studies.
Consideration of potential issues in the trial design that may affect the statistical
estimates is important if these trials and the corresponding estimates are to be included
in systematic reviews and meta-analyses.
The typical N-of-1 trial is based on an individual design developed between trial
stakeholders, typically a clinician and a patient. The stakeholders therefore have a
vested interest in conducting the study according to the agreed design: it is likely
to be practically achievable and to answer questions of direct interest. However,
there are always variations between intent and execution, i.e. between trial design
and conduct. It is important that these differences are documented.
Table 16.2 Checklist for design of N-of-1 trials (Adapted from Schmid and Duan (2014))
Guideline: Balance treatment assignment across conditions, using either randomization or counterbalancing, along with blocking.
Considerations: Design needs to eliminate or mitigate potential confounding effects such as time trend. Pros and cons of randomization versus counterbalancing need to be considered carefully and selected appropriately: counterbalancing is more effective if there is good information on a critical confounding effect (for example, a linear time trend), while randomization is more robust against unknown sources of confounding. Blocking helps mitigate potential confounding with time trend, especially when early termination occurs.

Guideline: Blind treatment assignment when feasible.
Considerations: Blinding of patients and clinicians, to the extent feasible, is particularly important for N-of-1 trials, especially with self-reported outcomes, when it is deemed necessary to eliminate or mitigate nonspecific effects ancillary to treatment. Some nonspecific effects might continue beyond the end of the trial within the individual patient, and therefore should be considered part of the treatment effect instead of a source of confounding.

Guideline: Use appropriate measures to deal with potential bias due to carryover and slow onset effects.
Considerations: A washout period is commonly used to mitigate carryover effect; adverse interaction among the treatments being compared indicates the need for a washout period. Absence of active treatment during a washout period might pose an ethical dilemma and diminish user acceptance in active control trials. Washout does not deal with slow onset of the new treatment and might actually extend the duration of the transition between treatment conditions. Analytic methods can be useful for dealing with carryover and slow onset effects when repeated assessments are available.

Guideline: Perform multiple assessments within treatment periods.
Considerations: This increases the precision of the estimated treatment effect and facilitates analytic approaches to address carryover or slow onset effects. The cost and respondent burden need to be taken into consideration in decisions regarding the frequency of assessments.

Guideline: Consider adaptive trial designs and sequential stopping rules.
Considerations: These can help improve trial efficiency and reduce patients' exposure to the inferior treatment condition.
One important issue raised by Schmid and Duan (2014) is data collection. While
the motivation of patients to participate in an N-of-1 trial may reduce the problem
of missing data, the complexity of the design, multiplicity of outcome measures and
lack of easy access to standard data acquisition tools such as forms and software can
increase the problem. This needs more careful attention in these types of trials than
in more standard randomized controlled trials for which the infrastructure, data col-
lection rules and mechanisms, and trial support are more widely established.
Unplanned and unexpected events during the course of a trial can induce poten-
tial biases and confounders. Examples of these include unexpected termination
of the study, missing data not at random, unplanned changes in patient charac-
teristics relevant to the study and unplanned external influences on the trial.
These need to be carefully evaluated with respect to their potential influence on
the trial analysis and results. If the impact is substantial, the trial should be
adapted or terminated.
Were Appropriate Statistical Methods Used for the Analysis of the Data?
Most N-of-1 trials published to date have used graphical comparisons, a statistical
cutoff or a clinical significance cutoff to compare treatments (Gabler et al. 2011).
Graphical and statistical summaries are always helpful in the preliminary analysis
of data, and they can often support inferences if the study design is simple and the
results are clear. However, formal models are often needed to address the complications
discussed below.
Schmid and Duan (2014) provide details of the statistical methods used for
analysis of individual participant data from N-of-1 trials. These methods are represented
as a decision tree in Fig. 12.1.
As indicated in Fig. 12.1, many analyses ignore both time-related effects and the
fact that the measurements in an N-of-1 trial are correlated. Correlation between
measurements within periods, and carryover effects between treatment periods, can
be accounted for using established time series methods, such as autoregressive
models and dynamic models. The models can also account for time. Depending on
the nature of the time dependence, this can be achieved by indexing time within
treatment period and/or by indexing treatment periods within blocks.
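As a concrete illustration of this approach (a sketch under assumed parameter values, not an analysis from the chapter), the following Python fragment simulates a single N-of-1 outcome series with alternating treatment periods and AR(1) errors, then estimates the treatment effect with a regression model that accounts for the autocorrelation:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(3)
    n = 48
    treat = (np.arange(n) // 6) % 2          # alternating 6-observation treatment periods

    # AR(1) errors: e_t = phi * e_{t-1} + u_t, giving autocorrelated measurements.
    phi, e = 0.5, np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal(0.0, 1.0)

    y = 10.0 + 0.8 * treat + e               # assumed treatment effect of 0.8

    # Regression with AR(1) errors: ARIMA(1,0,0) with the treatment as an exogenous regressor.
    fit = ARIMA(y, exog=treat.astype(float), order=(1, 0, 0)).fit()
    print(fit.params)   # includes the treatment coefficient and the AR(1) parameter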
As argued by Schmid and Duan (2014), a Bayesian model is better able to describe
the complexities of an N-of-1 trial described above, compared with a standard fre-
quentist model, because it is more modular in structure, can include prior informa-
tion about treatment differences or measurement errors and biases, and can
incorporate different sources and types of information more easily. Moreover, a
Bayesian analysis provides more valuable inferences than a standard frequentist
analysis, because the posterior probability resulting from the Bayesian analysis is
more interpretable and able to deliver a richer set of results than the p-value that is
typically reported from a frequentist analysis. The posterior probability allows
stakeholders to obtain a wide range of relevant estimates, comparisons and proba-
bilities, with corresponding statements about the uncertainty of these values. These
can be for a composite outcome value or for multiple outcomes, where the latter are
described by joint posterior probability distributions. Schmid and Duan (2014)
provide examples of the types of outputs and inferences that can be
obtained from a Bayesian analysis, including the probability that one treatment is
superior to another treatment, probabilistic ranking of multiple treatments, the prob-
ability that the treatment effect is at least as large as a certain clinically important
size, and so on.
A Bayesian analysis of the individual N-of-1 trial also allows a more streamlined
analysis of multiple N-of-1 trials, as described in section “Meta-analysis Modelling
Decisions”.
Were the Conclusions of the Study Appropriate Given the Study Design,
Conduct and Analysis?
The adequacy of the reporting of statistical results can pose a problem when
evaluating studies. First, it can sometimes be difficult to decide whether results
are presented correctly, because the required information is not reported, as
described above. Second, the reported information may be limited (e.g., to percentages),
requiring substantial interpretation, which may be difficult to verify.
Reporting of conclusions is a third, related issue: the relevant information may
be presented, but if it is not compelling then an enthusiastic author might make
progressively more assertive statements from the 'Results' section to the
'Conclusions' section and from there to the 'Abstract'. An unsuspecting systematic
reviewer who takes the abstract at face value may then import this overstatement
into the review.
Meta-analysis of N-of-1 Trials

Zucker et al. (2010) provided details of the models, assumptions and inferences for
the approaches for combining all data available from the N-of-1 trials and compared
these with approaches using summaries or portions of the data. The models are
briefly presented here; see Zucker et al. (2010) for explanation and discussion.
Table 16.3 Meta-analysis models for N-of-1 trials, extracted from Zucker et al. (2010)
• Data aggregated to the trial (patient) level: summary fixed and random effects models
• Data at the trial-period level (multiple estimates of an effect per trial): summary random-effects model or mixed model
• Subset of prospective data with treatment order randomized across trials, e.g. (i) first-period treatments, analogous to a randomized parallel group trial; (ii) pair-randomized treatments in the first two periods (AB/BA crossover design): standard models for the analysis of population designs, e.g. (i) t-test; (ii) paired t-test
• Data using all periods: fixed or random effects model; multiple crossover model; repeated measures model; linear mixed model; Bayesian hierarchical model
Summary Fixed and Random Effects Models

Assume that there is a summary effect $y_i$ from each trial. This could be a single
outcome measure or a composite measure. In a fixed effects model, these are
assumed to vary randomly around an overall true mean effect $\alpha$:

$$y_i = \alpha + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2)$$
where $\sigma_i^2$ is the variance associated with $y_i$. If there is no repetition within a trial,
these variances will not be available; if the trials have similar designs then the variances
can be assumed to be equal and the analysis can proceed. Otherwise alternative
assumptions have to be made. If there is only a small number of repetitions or
treatment periods per trial, so that each trial variance is poorly estimated, it may be
preferable to replace $\sigma_i^2$ by a common pooled variance $\sigma^2$. For N-of-1 studies, such
variances are often assumed known, but this may be problematic if the number of
observations is small, as is usually the case in an N-of-1 study.
See Zucker et al. (2010) for details.
If there are multiple outcome measures, the model becomes multivariate, with $y_i$
becoming a vector of estimated effects from the $i$th study, $\alpha$ becoming a vector of
mean effects, and $\varepsilon_i$ having a multivariate normal distribution with $\sigma_i^2$ replaced by
a variance-covariance matrix $\Sigma_i$.
In a random effects model the estimated effect $y_i$ is assumed to vary around a
trial-specific effect $\alpha_i$, which is in turn assumed to vary around an overall effect $\alpha_0$:

$$y_i = \alpha_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2), \qquad \alpha_i \sim N(\alpha_0, \tau^2)$$

or, alternatively and equivalently,

$$y_i = \alpha_0 + \alpha_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_i^2), \qquad \alpha_i \sim N(0, \tau^2).$$
Here, $\sigma_i^2$ describes the variation of the effects within a trial (the 'within-trial variance')
and $\tau^2$ describes the variation of effects among or between trials (the 'between-trial
variance'). Note that there need to be enough trials to adequately estimate the
between-trial variance $\tau^2$. If this is not the case, a fixed effects model might be
preferred. See Zucker et al. (2010) for further discussion of this issue.
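To make the two models concrete, the following Python sketch pools hypothetical per-trial summary effects under both models. The data values are invented for illustration, and the DerSimonian-Laird moment estimator of $\tau^2$ is one common choice rather than anything prescribed by Zucker et al. (2010).

    import numpy as np

    # Hypothetical summary effects y_i and within-trial variances s2_i,
    # one per N-of-1 trial (illustrative values only).
    y = np.array([0.8, 0.3, 1.1, 0.5, 0.9, 0.2])
    s2 = np.array([0.20, 0.15, 0.30, 0.25, 0.10, 0.35])

    # Fixed effects model: y_i = alpha + e_i, e_i ~ N(0, s2_i).
    w_fe = 1.0 / s2
    alpha_fe = np.sum(w_fe * y) / np.sum(w_fe)
    var_fe = 1.0 / np.sum(w_fe)

    # Random effects model: DerSimonian-Laird moment estimate of tau^2.
    k = len(y)
    Q = np.sum(w_fe * (y - alpha_fe) ** 2)              # Cochran's Q
    c = np.sum(w_fe) - np.sum(w_fe ** 2) / np.sum(w_fe)
    tau2 = max(0.0, (Q - (k - 1)) / c)

    # Re-weight with total variance s2_i + tau^2 to estimate alpha_0.
    w_re = 1.0 / (s2 + tau2)
    alpha_re = np.sum(w_re * y) / np.sum(w_re)
    var_re = 1.0 / np.sum(w_re)

    print(f"fixed effects:  alpha  = {alpha_fe:.3f} (SE {var_fe**0.5:.3f})")
    print(f"random effects: alpha0 = {alpha_re:.3f} (SE {var_re**0.5:.3f}), tau^2 = {tau2:.3f}")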
Mixed Models
The above models can be extended to analyze the full set of data available from
multiple N-of-1 trials. As described by Schmid and Duan (2014), let $m$, $i$, $j$, $k$
and $l$ denote the individual (trial), observation, treatment period, block and
treatment, respectively. Then a simple random-effects model for observation
$y_{mijkl}$ is given by

$$y_{mijkl} = \alpha_m + \beta_l + \gamma_k + \delta_{j(k)} + \varepsilon_{i(j(k(m)))}$$
where the five terms represent, respectively, the effects of individuals, treatments, blocks,
treatment periods within a block, and observations within a treatment period within
a block within a patient. The treatment effect is considered fixed; the individual or
trial is considered to be random with distribution $\alpha_m \sim N(\alpha_0, \sigma_\alpha^2)$, where $\alpha_0$ is the overall
effect, and the other three random effects are also considered to be normally distributed
with means 0 and variances $\sigma_\gamma^2$, $\sigma_\delta^2$ and $\sigma_\varepsilon^2$, respectively.
The above models can also be extended to allow for time trend and carryover. As
described by Schmid and Duan (2014), a meta-analysis model for outcome $y$ for the
$i$th patient that incorporates a time trend at time $t$ is given by

$$y_{it} = \alpha_i + \beta T_t + \gamma X_t + \varepsilon_{it}$$
where $T_t$ is the time at measurement $t$ and $X_t$ is an indicator for the treatment received. This
model returns an estimate of the trial effect (i.e. the individual effect, given by $\alpha_i$), the
linear trend over time (given by $\beta$) and the treatment effect (given by $\gamma$), with residual
variation $\varepsilon_{it}$.
These models can be extended in a straightforward manner to include correlations
in the residuals over time, nonlinear terms to capture possible nonlinear trends,
seasonal effects, and interactions between patients and other factors explaining variation
across patients.
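The time-trend model above can be fitted with standard mixed-model software. The sketch below simulates data from $y_{it} = \alpha_i + \beta T_t + \gamma X_t + \varepsilon_{it}$ and fits it with statsmodels' MixedLM (a random intercept per patient); the simulated parameter values are arbitrary assumptions chosen only for illustration.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n_patients, n_times = 20, 12

    rows = []
    for i in range(n_patients):
        alpha_i = rng.normal(0.0, 1.0)          # random patient (trial) effect
        for t in range(n_times):
            x = (t // 2) % 2                     # alternating two-period treatment blocks
            y = alpha_i + 0.05 * t + 0.6 * x + rng.normal(0.0, 1.0)
            rows.append({"patient": i, "time": t, "treat": x, "y": y})
    df = pd.DataFrame(rows)

    # Random intercept for each patient; fixed linear time trend and treatment effect.
    model = smf.mixedlm("y ~ time + treat", df, groups=df["patient"])
    fit = model.fit()
    print(fit.summary())   # 'treat' estimates gamma, 'time' estimates beta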
Bayesian Models
Bayesian models build on the above formulations by adding priors to each of the
unknown parameters and expressing the parameter estimates in the form of posterior
distributions (instead of maximum likelihood estimates as in frequentist analyses).
Inferences of interest, such as comparisons and rankings of treatment effects, probabili-
ties that treatment effects exceed thresholds of interest, etc. are then derived from these
posterior distributions. See Zucker et al. (1997, 2010), Duan et al. (2013), and Schmid
and Duan (2014) for more detailed explanations and examples of Bayesian approaches.
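A minimal illustration of the kind of output a Bayesian analysis provides: with a normal prior on the treatment difference $\delta$ and a normal likelihood for a pooled estimate, the posterior is available in closed form, and posterior probabilities such as $P(\delta > 0)$ or $P(\delta$ exceeds a clinically important size$)$ follow directly. The prior, estimate and threshold below are assumed values for illustration only.

    from scipy.stats import norm

    # Assumed prior for the treatment difference delta, e.g. from an earlier
    # crossover trial: delta ~ N(m0, s0^2).
    m0, s0 = 0.0, 1.0

    # Assumed pooled estimate from the N-of-1 trials: ybar ~ N(delta, se^2).
    ybar, se = 0.6, 0.2

    # Conjugate normal-normal update.
    post_var = 1.0 / (1.0 / s0**2 + 1.0 / se**2)
    post_mean = post_var * (m0 / s0**2 + ybar / se**2)
    post_sd = post_var**0.5

    # Posterior probabilities of the kind listed above.
    p_superior = 1.0 - norm.cdf(0.0, post_mean, post_sd)   # P(delta > 0)
    p_clinical = 1.0 - norm.cdf(0.3, post_mean, post_sd)   # P(delta > 0.3), assumed threshold
    print(f"posterior: N({post_mean:.3f}, {post_sd:.3f}^2)")
    print(f"P(delta > 0)   = {p_superior:.3f}")
    print(f"P(delta > 0.3) = {p_clinical:.3f}")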
Zucker et al. (2010) describe a case study in which they combined 58 N-of-1 trials
comparing amitriptyline (AMT) and the combination of AMT and fluoxetine (FL)
for treating fibromyalgia syndrome (FMS). Details of the study are provided by
Zucker et al. (2006). The trials had the following characteristics:
• Each trial had six treatment periods: three sets of paired treatments, comprising
one period on AMT + FL and one on AMT.
• All treatment pairs were block randomized.
• The outcome measure, the quality-of-life Fibromyalgia Impact Questionnaire
(FIQ) score, a continuous value between 0 (best) and 100 (worst), was measured
prior to any FMS medications and again at the end of each of the six
6-week treatment periods.
Zucker et al. (2010) analyzed data from the 46 patients who completed at least
one period on each treatment. Of these patients, 34 completed all six treatment peri-
ods. The authors illustrated the application of a range of methods for meta-analysis.
In particular, a variety of mixed models were fitted to the aggregated data, and the
reader is encouraged to review their work. The meta-analysis models differed in
how the intercept and treatment effect were treated (fixed and/or random), how
patients’ variances were treated (equal or unequal variances) and how the within-
patient variance was structured (for example, single and uncorrelated). They made
a number of comments regarding the implications of sample size in these analyses.
Meta-analysis Modelling Decisions

The variance of estimated effects for a typical N-of-1 trial is usually poorly estimated,
since the sample size is usually small. For the simple fixed and random
effects models, a common within-trial variance can be calculated by pooling across
trials. Similarly, for a mixed model (e.g., nesting sets of treatments within treatment
periods in the case study described above), a common within-trial covariance matrix
can be calculated. In addition to providing a more robust estimator of the within-
trial variance, this approach also reduces the number of parameters that need to be
estimated. For example, in the above case study, the number of parameters in a
mixed model meta-analysis can be reduced from six variance and 21 covariance
terms in a full model (with different within-trial variances and covariances) to one
variance term (assuming common within-trial variances and uncorrelated trials).
Note that different model assumptions can be considered and evaluated with respect
to stability of estimation and interpretability of results.
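As a small illustration of the pooling step (with invented per-trial data), a common within-trial variance can be obtained by weighting each trial's sample variance by its degrees of freedom:

    import numpy as np

    # Per-trial outcome differences (illustrative); trials may differ in length.
    trials = [np.array([0.9, 0.4, 1.2]),
              np.array([0.1, 0.6, 0.3, 0.8]),
              np.array([1.0, 0.7])]

    # Pool sample variances weighted by degrees of freedom (n_i - 1).
    dfs = np.array([len(t) - 1 for t in trials])
    vars_ = np.array([t.var(ddof=1) for t in trials])
    pooled_var = np.sum(dfs * vars_) / np.sum(dfs)
    print(f"pooled within-trial variance: {pooled_var:.3f}")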
In the above case study, Zucker et al. (2010) derived prior distributions from a published
crossover trial that used the same medications and dosages. They showed that
this produced more robust estimates, in the sense that the estimates were not based only on
the small number of available observations; the priors also facilitated the estimation of
otherwise unavailable parameters such as trial-specific variances and covariances.
One difference between a meta-analysis of N-of-1 trials and one of randomized trials
is that the number of trials in an N-of-1 study is usually substantially larger than
the number available from clinical trials, so the data provide more information
about the between-trial variance. The Bayesian model is therefore less sensitive to
the prior on this parameter.
The models described in section "Meta-analysis of N-of-1 Trials" and the considerations
listed above highlight the importance of being able to accurately estimate the
within-trial and between-trial variances. This necessarily depends on the number of
treatment periods per trial and the number of trials.
Duan et al. (2013) studied this question by adopting a simple random effects
model and calculating the variance of the mean effect with M trials and N paired
treatment periods per trial, compared with a classic two-period (AB/BA) crossover
design, under several combinations of values of the between-trial variance ($\tau^2$) and
within-trial variance ($\sigma^2/N$).
Assuming independent trials, the calculated precision is

$$\omega = M / (\tau^2 + 2\sigma^2 / N).$$
The following observations were made by the authors.
• For fixed $\tau^2$ and $\sigma^2$, the value of $\omega$ increases as the number of trials (M) increases
and as the number of repeated measures within a trial (N) increases.
• The relative importance of M and N depends on the relative size of the within-
and between-trial variances.
• Additional measurements on individual patients are valuable if the between-trial
variability is small compared with the within-trial variability.
• Conversely, more trials are more valuable than more measurements on indi-
viduals if the within-trial variance is small compared with the between-trial
variance.
The effect of M, N, $\tau^2$ and $\sigma^2$ on $\omega$ is illustrated in Fig. 16.2. Here, the horizontal
and vertical axes show values of the within- and between-trial variances, respectively,
and the four plots show different combinations of the trial size and number of
trials. The contour lines show the value of the variance $1/\omega$. Comparison of the
orientation of the contours and their relative magnitudes supports
the above observations.
Fig. 16.2 Contour plots of the variance $1/\omega$ against the within-trial variance (horizontal axes) and the between-trial variance (vertical axes), for four combinations of the number of trials and measurements per trial: M=1, N=1; M=10, N=1; M=1, N=10; M=10, N=10
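The tradeoffs can also be read directly from the precision formula. A minimal sketch evaluating the variance $1/\omega$ for the four panel settings of Fig. 16.2, with assumed variance values:

    import numpy as np

    def inv_precision(M, N, tau2, sigma2):
        """Variance 1/omega of the mean effect: (tau^2 + 2*sigma^2/N) / M."""
        return (tau2 + 2.0 * sigma2 / N) / M

    tau2, sigma2 = 2.0, 2.0   # assumed between- and within-trial variances
    for M, N in [(1, 1), (10, 1), (1, 10), (10, 10)]:
        print(f"M={M:2d}, N={N:2d}: 1/omega = {inv_precision(M, N, tau2, sigma2):.3f}")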
These principles can be applied even when more complex meta-analysis models
are employed. Alternatively, more refined calculations can be made for these mod-
els by examining the role of the within- and between-trial variances in the relevant
equations for the variance components. As a general rule, more measurements
should be taken of parameters which are poorly estimated, but this will depend on
the accuracy required for the overall estimates and the role that the variances play in
these estimates.
Prior knowledge, whether it be from previous studies or expert opinion, about the true
difference or the clinically relevant difference between treatment effects can often be
obtained. This can then be used in formulating a study with high power.
We consider the statistical power of a meta-analysis using a previously conducted
N-of-1 randomized trial assessing the efficacy of gabapentin over placebo
for chronic neuropathic pain (Yelland et al. 2009). Details of the study can be
found in the reference, but suppose that we are interested in conducting a similar
meta-analysis, with functional limitation as the primary outcome. Based on the
hierarchical Bayesian meta-analysis conducted by Yelland et al. (2009), the difference
between treatment effects was estimated to be 0.6 (standard error 0.2). If we fix the
number of paired cycles per individual at six, the question is: how many individual
trials are required to maintain a high probability of detecting a difference
between treatments, assuming a true difference in effects of 0.6 exists?
To answer this, we must first be clear about how the treatment difference will
be estimated (that is, how the data will be analyzed). Here, let’s assume that we
follow the methodology above in ‘Summary fixed and random effects models’ and
fit the specified random effects model with no block, period or order effects.
Further, we assume that the individual effects and residual variability each follow a
normal distribution with variance of one. It is also assumed that patients have
equal, uncorrelated response variances and equal variances by treatment. Of
course, uncertainty in the parameters (for example, the standard error of 0.2 for the
difference in treatment effects) and in the model or models (for example, equal variances by
treatment or the inclusion of a block effect) can be incorporated, but this is beyond
the scope of this chapter. It is important to note that the estimates of
power will depend on such assumptions.
With the analysis plan clearly specified and relevant parameters defined, statistical
power can be estimated. Here, we estimated power via simulation:
we simulated patient data from the assumed model, re-fitted the model
to the simulated data, conducted a hypothesis test to determine whether there was a
significant difference between treatments (at the 0.05 significance level),
recorded the result of the test, and then repeated the whole process a large
number of times (here, 500 times). The proportion of
times the null hypothesis was rejected is the estimate of statistical power. This estimate
is shown in Fig. 16.3 for a variety of different numbers of patients.
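A minimal version of this simulation in Python, under the stated assumptions (treatment difference 0.6, individual effects and residuals each with variance one, six paired cycles per patient). For this balanced design with a patient-level random effect, testing the overall difference reduces to a one-sample t-test on the patient-mean cycle differences, which is what the sketch uses; exact power values depend on these assumptions and need not reproduce Fig. 16.3.

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(42)

    def estimate_power(n_patients, delta=0.6, n_cycles=6, reps=500, alpha=0.05):
        """Proportion of simulated meta-analyses rejecting H0: delta = 0."""
        rejections = 0
        for _ in range(reps):
            # Paired cycle differences: d_ij = delta + a_i + e_ij,
            # a_i ~ N(0, 1) patient effects, e_ij ~ N(0, 1) residuals.
            a = rng.normal(0.0, 1.0, size=n_patients)
            e = rng.normal(0.0, 1.0, size=(n_patients, n_cycles))
            d = delta + a[:, None] + e
            # Balanced design: test the overall mean via patient-mean differences.
            t_stat, p_value = ttest_1samp(d.mean(axis=1), 0.0)
            rejections += p_value < alpha
        return rejections / reps

    for m in (10, 20, 30, 40):
        print(f"{m} patients: estimated power = {estimate_power(m):.2f}")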
From Fig. 16.3, the power of the study increases as the number of subjects
increases. In general, 80 % power is considered reasonable, and it appears that
this would be achieved with about 22 individual trials; this can be improved to about
90 % with an additional 11 trials. In estimating statistical power, simulation techniques
were used to mimic the data that might be observed in the meta-analysis.
An important part of this data simulation is to allow for the occurrence of
missing data, as this has the potential to significantly reduce the power of the study.
For example, in Yelland et al. (2009), only 75 % of individual trials yielded at least
one cycle, and only 65 % of trials yielded all three cycles. From these percentages, it
is clear that such trials can be subject to many missing data points, and this should be
accounted for in the simulation when estimating statistical power.
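One way to fold missing data into the simulation is to randomly thin each simulated trial using assumed completion rates. The sketch below, loosely calibrated to the Yelland et al. (2009) percentages quoted above, lets 25 % of trials contribute no cycles, 65 % contribute all cycles, and the remainder a random number of completed cycles; the missing-completely-at-random mechanism and the three-cycle structure are illustrative assumptions, and the simple t-test on patient means ignores the unequal precision that results.

    import numpy as np
    from scipy.stats import ttest_1samp

    rng = np.random.default_rng(7)

    def power_with_dropout(n_patients, delta=0.6, n_cycles=3, reps=500, alpha=0.05):
        """Power when trials may contribute 0..n_cycles completed cycles."""
        rejections = 0
        for _ in range(reps):
            means = []
            for _i in range(n_patients):
                u = rng.random()
                if u < 0.25:                # ~25 % of trials yield no cycles
                    continue
                # ~65 % complete all cycles; the remaining ~10 % complete some.
                completed = n_cycles if u < 0.90 else rng.integers(1, n_cycles)
                d = delta + rng.normal(0.0, 1.0) + rng.normal(0.0, 1.0, size=completed)
                means.append(d.mean())
            if len(means) > 1:
                _t, p = ttest_1samp(means, 0.0)
                rejections += p < alpha
        return rejections / reps

    print(f"30 patients with dropout: power = {power_with_dropout(30):.2f}")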
Fig. 16.3 Estimated statistical power for different numbers of patients for a hypothetical N-of-1 trial of gabapentin versus placebo
Discussion
This chapter has described the conditions under which evidence from N-of-1 trials
can be included in a systematic review or meta-analysis. The overall answer is yes,
with reservations.
The first reservation is that the N-of-1 trials themselves need to be carefully and
well designed. As discussed in section “Systematic Reviews of N-of-1 Trials”, the
quality of these trials is a paramount issue for systematic reviews and meta-analyses.
Schmid and Duan (2014) argued that although N-of-1 trials allow great flexibility in
meeting the aims of the patient and clinician and conforming to individual con-
straints, they also need to adhere to good design principles if they are to deliver
accurate, replicable and comparable evidence. They suggest that a centralized ser-
vice responsible for designing these trials might assist clinicians who are unfamiliar
with these principles and hence ensure that proper standards are maintained, while
still allowing designs to remain flexible and easy to implement.
The second reservation is that the systematic review must be designed, conducted
and reported in such a way that it facilitates a systematic comparison while allowing
for the individual characteristics of the trials. The systematic review reported by
Gabler et al. (2011), based on 2154 participants in 108 studies published between
1985 and December 2010, found that N-of-1 trials were useful for increasing preci-
sion of estimates for a range of medical conditions, but recommended that the trial
results include a clear description of individual data in order to facilitate future
meta-analysis. Extending this observation, the ‘clear description’ should comprise
common components that enable the systematic comparison to be undertaken. This
is achievable. While the guidelines provided by the Cochrane Collaboration may be
the gold standard for randomized controlled clinical trials, there are also parallels
for less well designed studies, such as those developed by the Centre for Evidence
Based Conservation and National Centre for Ecological Analysis and Synthesis in
ecology and the Campbell Collaboration in the social sciences.
230 K. Mengersen et al.
The systematic review must also be representative in some sense, in that the
comparisons and generalizations arising from the review are applicable to a recog-
nized population. If the review only contains trials that are published because the
treatment comparisons are significant (so-called publication bias or the file-drawer
problem), then the generalizations will not be applicable to the whole population.
Notwithstanding this, a systematic review may still be useful for noting strengths
and weaknesses of the trials, issues in reporting, deficiencies in publication or
access to trial information, other issues related to systematic comparisons and other
information gaps.
The third reservation is that a meta-analysis based on studies that vary substan-
tially with respect to design and reporting needs to be very carefully formulated.
The statistical model needs to accommodate these variations in order to deliver
valid combined estimates. The conclusions of Zucker et al. (2010) were that ‘with
few observations per patient and little information about within-patient variation,
combined N-of-1 trials data may not support models that include complex variance
structures.’ If there are substantive concerns about the trials or the review, then it
may be better not to undertake a meta-analysis at all. However, with sufficient infor-
mation they can be used to estimate population effects and can provide enhanced
estimates and inferences compared with standard clinical trials. Moreover, ‘models
with fixed treatment effects and common variances are robust and lead to conclu-
sions that are similar to, though more precise, than single period or single crossover
designs’ (p. 1312).
In conclusion, the increasing interest in, and application of, N-of-1 trials is clear.
This is evidenced by the systematic review reported by Gabler et al. (2011), based on
2154 participants in 108 studies published between 1985 and December 2010, and by
the User Guide for these trials authored by Kravitz et al. (2014a) and sponsored by
the U.S. Agency for Healthcare Research and Quality.
Systematic reviews and meta-analyses of these studies are the next logical step in
evidence-based medicine. The basic guidelines and methods are described in this
chapter, and elaborations are available (Duan et al. 2013; Schmid and Duan 2014;
Zucker et al. 2010). It behoves the biostatisticians involved in these fields to keep
developing, improving and applying them.
References
Dallery J, Cassidy RN, Raiff BR (2013) Single-case experimental designs to evaluate novel
technology-based health interventions. J Med Internet Res 15(2), e22
Dallery J, Raiff BR (2014) Optimizing behavioral health interventions with single-case designs:
from development to dissemination. Transl Behav Med 4:290–303
Duan N, Kravitz RL, Schmid CH (2013) Single-patient (N-of-1) trials: a pragmatic clinical deci-
sion methodology for patient-centered comparative effectiveness research. J Clin Epidemiol
66:S21–S28
Gabler NB, Duan N, Vohra S, Kravitz RL (2011) N-of-1 trials in the medical literature: a system-
atic review. Med Care 49:761–768
Higgins JPT, Green S (eds) (2008) Cochrane handbook for systematic reviews of interventions. Wiley, Chichester
Koricheva J, Gurevitch J, Mengersen K (eds) (2013) Handbook of meta-analysis in ecology and
evolution. Princeton University Press, Princeton
Kravitz RL, Duan N, The Decide Methods Centre N-of-1 Guidance Panel (Duan N, Eslick L,
Gabler NB, Kaplan HC, Kravitz RL, Larson EB, Pace WD, Schmid CH, Sim I, Vohra S) (eds)
(2014a) Design and implementation of N-of-1 trials: a user’s guide. Agency for Healthcare
Research and Quality, Rockville
Kravitz R, Duan N, Vohra S, Li J (2014b) The DEcIDE methods centre N-of-1 guidance panel
introduction to N-of-1 trial: indications and barriers. In: Kravitz R, Duan N, The Decide
Methods Centre N-of-1 Guidance Panel (Duan N, Eslick L, Gabler NB, Kaplan HC,
Kravitz RL, Larson EB, Pace WD, Schmid CH, Sim I, Vohra S) (eds) Design and imple-
mentation of N-of-1 trials: a user's guide. Agency for Healthcare Research and Quality,
Rockville
Lillie EO, Patay B, Diamant J, Issell B, Topol EJ, Schork NJ (2011) The N-of-1 clinical trial: the
ultimate strategy for individualizing medicine? Per Med 8(2):161–173
Schmid C, Duan N (2014) The DEcIDE methods centre N-of-1 guidance panel statistical design
and analytic consideration for N-of-1 trials. In: Kravitz RL, Duan N, Eslick L, Gabler NB,
Kaplan HC, Kravitz RL, Larson EB, Pace WD, Schmid CH, Sim I, Vohra S (eds) Design and
implementation of N-of-1 trials: a user’s guide. Agency for Healthcare Research and Quality,
Rockville
Senior HE, Mitchell GK, Nikles J, Carmont SA, Schluter PJ, Currow DC, Vora R, Yelland MJ,
Agar M, Good PD, Hardy JR (2013) Using aggregated single patient (N-of-1) trials to deter-
mine the effectiveness of psychostimulants to reduce fatigue in advanced cancer patients: a
rationale and protocol. BMC Palliat Care 12:17
Yelland MJ, Poulos CJ, Pillans PI, Bashford GM, Nikles CJ, Sturtevant JM, Vine N, Del Mar CB,
Schluter PJ, Tan M, Chan J, Mackenzie F, Brown R (2009) N-of-1 randomized trials to assess
the efficacy of gabapentin for chronic neuropathic pain. Pain Med 10:754–761
Zucker DR, Schmid CH, McIntosh MW, D'Agostino RB, Selker HP, Lau J (1997) Combining
single patient (N-of-1) trials to estimate population treatment effects and to evaluate individual
patient responses to treatment. J Clin Epidemiol 50:401–410
Zucker DR, Ruthazer R, Schmid CH, Feuer JM, Fischer PA, Kieval RI, Mogavero N, Rapoport RJ,
Selker HP, Stotsky SA, Winston E, Goldenberg DL (2006) Lessons learned combining N-of-1
trials to assess fibromyalgia therapies. J Rheumatol 33:2069–2077
Zucker DR, Ruthazer R, Schmid CH (2010) Individual (N-of-1) trials can be combined to give
population comparative treatment effect estimates: methodologic considerations. J Clin
Epidemiol 63:1312–1323
Chapter 17
Where Are N-of-1 Trials Headed?
Jane Nikles
Abstract N-of-1 trials and review articles have recently been published in the areas
of chronic pain, pediatrics, palliative care, complementary and alternative medicine,
rare diseases, patient-centered care, the behavioral sciences and genomics. These
are briefly reviewed and the current place of N-of-1 trials discussed. The chapter
concludes with a vision for the future of N-of-1 trials.
N-of-1 trials are slowly gaining traction as their usefulness in a variety of situations
becomes more clearly recognized. As of mid-2015, there are eight N-of-1 trials listed as
currently or soon to be recruiting on clinicaltrials.gov: six currently
recruiting and two not yet recruiting. Two of these are in cancer, three in rare diseases
and two in children. In chronic pain research, palliative care, pediatrics, complementary
and alternative medicine, rare disease research and the behavioral sciences, the
place of N-of-1 trials is being solidified and strengthened.
Chronic Pain
Pediatrics
Pediatrics is ideally suited to N-of-1 trials, with small populations, frequent heterogeneity
of response and the clear benefit for parents of having individual information
about their child's response. ADHD is the most common condition studied in
pediatric N-of-1 trials: there have been N-of-1 trials of stimulants for ADHD in a
total of 193 children in four studies since 1996 (see Table 17.1). A further 138 children
have undertaken N-of-1 trials for a variety of conditions, making a total of 331 children
undergoing N-of-1 trials since 1996. The various drugs and conditions studied are listed
in Table 17.1.
There have been four reviews of the use of N-of-1 trials in children, covering:
• Complementary and alternative medicines in cancer (Sung and Feldman 2006)
• Human deoxyribonuclease (rhDNase) in the management of cystic fibrosis (Suri 2005)
• Montelukast in pediatric asthma (Bush 2014)
• Psychopharmacological studies (Greenhill et al. 2003).
More recently, attention has turned to pediatric analgesic trials. The standard
parallel-placebo analgesic trial design commonly used for adults poses scientific, ethical
and practical difficulties in pediatrics, due to the likelihood of subjects experiencing
pain for extended periods of time. Participants in an FDA-sponsored scientific
workshop developed consensus on aspects of pediatric analgesic clinical trial
design. The consensus was that small sample designs, including crossover trials
and N-of-1 trials, should be considered for particular pediatric chronic pain conditions
and for studies of pain and irritability in pediatric palliative care (Berde et al.
2012). One option is to compare best analgesia versus best analgesia plus the test treatment,
which removes the ethical problem of placebos in pain trials.
Palliative Care
N-of-1 trials are a new methodology well suited to meeting some of the challenges of
conducting trials in the palliative care (PC) setting (Davis and Mitchell 2012). The
need to improve the evidence base for PC is widely acknowledged
(Hermet et al. 2002), especially as many of the common practices and interventions
used routinely are based on anecdote or expert opinion alone. RCTs are considered
by many to be the gold standard for evidence in clinical medicine. However, many
RCTs fail in palliative populations (e.g. Cook et al. 2002) because it is too difficult
to recruit and retain enough people to achieve the predicted sample size without
extraordinary amounts of effort, organization and funding; multi-site support is
usually needed, even when patients are willing to participate. N-of-1 trials are an alternative
means of conducting trials in these patients. There is also the issue of how to manage missing
data in a population where a large proportion of patients are likely to die: in
conventional intention-to-treat analyses these are counted as treatment failures, which is
not appropriate in PC (Currow et al. 2012). Utilizing N-of-1 trials and including all completed
cycles in the final analysis overcomes this problem.
As described in Chap. 16, it is possible to combine the results of many N-of-1
trials to determine the effect of a therapy in a population (Zucker et al.
1997). N-of-1 trials can gather evidence of similar strength to RCTs in PC, but
require less than half the number of subjects. This allows more rapid accumulation
of strong evidence on treatment effects in patients with advanced life-limiting
illness, evidence that was previously very difficult to gather. For suitable clinical questions,
N-of-1 trials will enable high quality evidence to be gathered much more efficiently,
accelerating the rate of accumulation of high-grade evidence, with an important effect on
the quality and effectiveness of care offered to this very disadvantaged group
(Mitchell et al. unpublished data).
Complementary and Alternative Medicine

N-of-1 trials have been used to test valerian for insomnia (Coxeter et al. 2003).
Recently, several articles have been published by Chinese groups on using N-of-1
trials for Traditional Chinese Medicine (TCM) (Li et al. 2013; Huang et al. 2014);
one example is Liuwei Dihuang decoction for kidney-yin deficiency syndrome
(Yuhong et al. 2013). N-of-1 trials are uniquely suited to the individualized nature
of TCM, though limited information about the half-lives of some of these medicines
makes estimating the length of each treatment period and washout period, and
therefore the trial length, difficult. N-of-1 trials may also be suitable for trials of
acupuncture using a sham needle (Lee et al. 2012).
Rare Diseases
Patient-Centered Care
N-of-1 trials are a patient-centered intervention that may improve medication management
in suitable chronic diseases. We conducted the first study examining patient
perspectives of N-of-1 trials (Nikles et al. 2005). Patients were generally very satisfied
with the N-of-1 trial process. Their participation led to increased knowledge,
awareness and understanding of their condition, their body's response to it, and its
management; some of this arose specifically from the use of daily symptom diaries.
N-of-1 trials led to a sense of empowerment and control, as well as improved
individually-focused care. N-of-1 trials appeared to empower these patients through
both the collection of information about their responses to different treatment
options and active participation in subsequent therapeutic decisions.
Behavioral Sciences
Genomics
Pharmacogenomic testing of individual drug response has been carried out for various
drugs, e.g. warfarin, azathioprine, some cancer drugs and clopidogrel. Applying N-of-1
trials to pharmacogenomics, which would significantly reduce the sample sizes required,
has not yet been done, though it has been suggested in a number of articles (Lillie et al. 2011).
We conclude with a quote from Kaput and Morine (2012), who are developing
N-of-1 nutrigenomic research:
High throughput metabolomics, proteomic and genomic technologies provide 21st century
data that humans cannot be randomized into groups: individuals are genetically and bio-
chemically distinct. Gene–environment interactions caused by unique dietary and lifestyle
factors contribute to heterogeneity in physiologies observed in human studies. The risk
factors determined for populations cannot be applied to the individual. Developing indi-
vidual risk or benefit factors in light of the genetic diversity of human populations, the
complexity of foods, culture and lifestyle, and the variety of metabolic processes that lead
to health or disease are significant challenges for personalizing advice for healthy or medi-
cal treatments for individuals with chronic disease (Kaput and Morine 2012)
Vision
Imagine this: a patient attends their doctor with a chronic disease, e.g. osteoarthritis.
Before prescribing a medication, the doctor writes a "prescription" for an
N-of-1 trial, a test to see whether the medication works for the patient's pain. The
trial is set up on a mobile phone app allowing customized design of the trial. After
taking medication and placebo in blinded random order and keeping track of pain
and other symptoms via the app, the patient and their doctor receive a report
about whether the drug works for their pain and whether it has side effects. N-of-1
trials are widely known, and are standard practice in clinical situations where there is
uncertainty about the effectiveness of a drug, there is uncertainty about the effective dose,
the drug is expensive, or it has important side effects. Patients
initiate discussion with their doctor about using N-of-1 trials to answer specific
questions about their health. In rare conditions, conditions where recruitment is
difficult or populations are small, N-of-1 trials, the highest level of evidence, are
commonly used to assess effectiveness of drugs where the drug to be tested is suit-
able. Pharmaceutical funders such as health insurers and state government health
services use N-of-1 trials to decide whether a patient responds and therefore should
have the cost of the drug reimbursed. A central coordinating unit runs N-of-1 trials
all over the country by post and telephone, working closely with a manufacturing
pharmacy to supply medications. N-of-1 trials are used in many countries, with a
national coordinating center in each country. A worldwide database stores the
design and results of each N-of-1 trial for aggregation with other similar trials,
facilitating the application of sophisticated statistical methods to analyze the trials.
Conclusion
N-of-1 trials are becoming more widely used, and their application in certain suitable
areas, such as pediatrics and chronic pain, is growing. The confluence of genomics,
the upswing in personalized medicine and the widespread popularity of wireless
devices makes a promising platform for N-of-1 trials to find their true niche.
References
Berde CB, Walco GA, Krane EJ, Anand KJ, Aranda JV, Craig KD, Dampier CD, Finkel JC,
Grabois M, Johnston C, Lantos J, Lebel A, Maxwell LG, McGrath P, Oberlander TF, Schanberg
LE, Stevens B, Taddio A, Von Baeyer CL, Yaster M, Zempsky WT (2012) Pediatric analgesic
clinical trial designs, measures, and extrapolation: report of an FDA scientific workshop.
Pediatrics 129(2):354–364. doi:10.1542/peds.2010-3591, Epub 2012 Jan 16
Bush A (2014) Montelukast in paediatric asthma: where we are now and what still needs to be
done? Paediatr Respir Rev. doi:10.1016/j.prrv.2014.10.007
Camfield P, Gordon K, Dooley J, Camfield C (1996) Melatonin appears ineffective in children with
intellectual deficits and fragmented sleep: six "N of 1" trials. J Child Neurol 11(4):341–343
Cook AM, Finlay IG, Butler-Keating RJ (2002) Recruiting into palliative care trials: lessons learnt
from a feasibility study. Palliat Med 16:163–165
Coxeter P, Schluter P, Eastwood H, Nikles J, Glasziou P (2003) Valerian does not reduce
symptoms for patients with chronic insomnia in general practice. Complement Ther Med
11(4):215–222
Currow DC, Plummer JL, Kutner JS, Samsa GP, Abernethy AP (2012) Analyzing phase III studies
in hospice/palliative care. A solution that sits between intention-to-treat and per protocol analy-
ses: the palliative-modified ITT analysis. J Pain Symptom Manag 44(4):595–603
Davidson KW, Peacock J, Kronish IM, Edmondson D (2014) Personalizing behavioral interven-
tions through single-patient (N-of-1) trials. Soc Personal Psychol Compass 8(8):408–421
Davis MP, Mitchell GK (2012) Topics in research: structuring studies in palliative care. Curr Opin
Support Palliat Care 6(4):483–489. doi:10.1097/SPC.0b013e32835843d7
Duan N, Kravitz RL, Schmid CH (2013) Single-patient (N-of-1) trials: a pragmatic clinical deci-
sion methodology for patient-centered comparative effectiveness research. J Clin Epidemiol
66(8 Suppl):S21–S28. doi:10.1016/j.jclinepi.2013.04.006
Duggan CM, Mitchell G, Nikles CJ, Glasziou PP, Del Mar CB, Clavarino A (2000) Managing
ADHD in general practice. N of 1 trials can help! Aust Fam Physician 29(12):1205–1209
Evans JJ, Gast DL, Perdices M, Manolov R (2014) Single case experimental designs: introduction
to a special issue of neuropsychological rehabilitation. Neuropsychol Rehabil 24(3–4):305–
314. doi:10.1080/09602011.2014.903198, Epub 2014 Apr 25
Faber A, Keizer RJ, van den Berg PB, de Jong-van den Berg LT, Tobi H (2007) Use of double-blind
placebo-controlled N-of-1 trials among stimulant-treated youths in the Netherlands: a descrip-
tive study. Eur J Clin Pharmacol 63(1):57–63, Epub 2006 Nov 18
Facey K, Granados A, Guyatt G, Kent A, Shah N, van der Wilt GJ, Wong-Rieger D (2014)
Generating health technology assessment evidence for rare diseases. Int J Technol Assess
Health Care 19:1–7
Gewandter JS, Dworkin RH, Turk DC, McDermott MP, Baron R, Gastonguay MR, Gilron I, Katz
NP, Mehta C, Raja SN, Senn S, Taylor C, Cowan P, Desjardins P, Dimitrova R, Dionne R,
Farrar JT, Hewitt DJ, Iyengar S, Jay GW, Kalso E, Kerns RD, Leff R, Leong M, Petersen KL,
Ravina BM, Rauschkolb C, Rice AS, Rowbotham MC, Sampaio C, Sindrup SH, Stauffer JW,
Steigerwald I, Stewart J, Tobias J, Treede RD, Wallace M, White RE (2014) Research designs
for proof-of-concept chronic pain clinical trials: IMMPACT recommendations. Pain
155(9):1683–1695. doi:10.1016/j.pain.2014.05.025, Epub 2014 May 24
Greenhill LL, Jensen PS, Abikoff H, Blumer JL, Deveaugh-Geiss J, Fisher C, Hoagwood K,
Kratochvil CJ, Lahey BB, Laughren T, Leckman J, Petti TA, Pope K, Shaffer D, Vitiello B,
Zeanah C (2003) Developing strategies for psychopharmacological studies in preschool
children. J Am Acad Child Adolesc Psychiatry 42(4):406–414, Review
Gupta S, Faughnan ME, Tomlinson GA, Bayoumi AM (2011) A framework for applying unfamil-
iar trial designs in studies of rare diseases. J Clin Epidemiol 64(10):1085–1094. doi:10.1016/j.
jclinepi.2010.12.019, Epub 2011 May 6
Hermet R, Burucia B, Sentilles-Monkam A (2002) The need for evidence-based proof in palliative
care. Eur J Pall Care 9:104–107
Huang H, Yang P, Xue J, Tang J, Ding L, Ma Y, Wang J, Guyatt GH, Vanniyasingam T, Zhang Y
(2014) Evaluating the individualized treatment of traditional Chinese medicine: a pilot study of
N-of-1 trials. Evid Based Complement Alternat Med 2014:148730. doi:10.1155/2014/148730,
Epub 2014 Nov 11
Huber AM, Tomlinson GA, Koren G, Feldman BM (2007) Amitriptyline to relieve pain in juvenile
idiopathic arthritis: a pilot study using Bayesian metaanalysis of multiple N-of-1 clinical trials.
J Rheumatol 34(5):1125–1132, Epub 2007 Apr 15
Kaput J, Morine M (2012) Discovery-based nutritional systems biology: developing N-of-1 nutrig-
enomic research. Int J Vitam Nutr Res 82(5):333–341. doi:10.1024/0300-9831/a000128
Kent MA, Camfield CS, Camfield PR (1999) Double-blind methylphenidate trials: practical, use-
ful, and highly endorsed by families. Arch Pediatr Adolesc Med 153(12):1292–1296
Lee S, Lim N, Choi SM, Kim S (2012) Validation study of Kim’s sham needle by measuring facial
temperature: an N-of-1 randomized double-blind placebo-controlled clinical trial. Evid Based
Complement Alternat Med 2012:507937. doi:10.1155/2012/507937, Epub 2012 Mar 6
Li J, Tian J, Ma B, Yang K (2013) N-of-1 trials in China. Complement Ther Med 21(3):190–194.
doi:10.1016/j.ctim.2013.01.003, Epub 2013 Feb 18
Lillie EO, Patay B, Diamant J, Issell B, Topol EJ, Schork NJ (2011) The N-of-1 clinical trial: the
ultimate strategy for individualizing medicine? Per Med 8(2):161–173
Nathan PC, Tomlinson G, Dupuis LL, Greenberg ML, Ota S, Bartels U, Feldman BM (2006) A
pilot study of ondansetron plus metopimazine vs. ondansetron monotherapy in children
receiving highly emetogenic chemotherapy: a Bayesian randomized serial N-of-1 trials design.
Support Care Cancer 14(3):268–276, Epub 2005 Jul 29
Nikles CJ, Clavarino AM, Del Mar CB (2005) Using N-of-1 trials as a clinical tool to improve
prescribing. Br J Gen Pract 55(512):175–180
Nikles CJ, Mitchell GK, Del Mar CB, Clavarino AM, McNairn N (2006) An n-of-1 trial service in
clinical practice: testing the effectiveness of stimulants for attention-deficit/hyperactivity disor-
der. Pediatrics 117(6):2040–2046
Nikles CJ, McKinlay L, Mitchell GK, Carmont SA, Senior HE, Waugh MC, Epps A, Schluter PJ,
Lloyd OT (2014) Aggregated n-of-1 trials of central nervous system stimulants versus placebo for
paediatric traumatic brain injury–a pilot study. Trials 15:54. doi:10.1186/1745-6215-15-54
Sung L, Feldman BM (2006) N-of-1 trials: innovative methods to evaluate complementary and
alternative medicines in pediatric cancer. J Pediatr Hematol Oncol 28(4):263–266
Sung L, Tomlinson GA, Greenberg ML, Koren G, Judd P, Ota S, Feldman BM (2007) Serial con-
trolled N-of-1 trials of topical vitamin E as prophylaxis for chemotherapy-induced oral muco-
sitis in paediatric patients. Eur J Cancer 43(8):1269–1275, Epub 2007 Mar 23
Suri R (2005) The use of human deoxyribonuclease (rhDNase) in the management of cystic fibro-
sis. BioDrugs 19(3):135–144
Suri R, Metcalfe C, Wallis C, Bush A (2004) Predicting response to rhDNase and hypertonic saline
in children with cystic fibrosis. Pediatr Pulmonol 37(4):305–310
Taragin D, Berman S, Zelnik N, Karni A, Tirosh E (2013) Parents’ attitudes toward methylphenidate
using n-of-1 trial: a pilot study. Atten Defic Hyperact Disord 5(2):105–109. doi:10.1007/
s12402-012-0099-x. Epub 2012 Dec 16
Tate RL, McDonald S, Perdices M, Togher L, Schultz R, Savage S (2008) Rating the methodologi-
cal quality of single-subject designs and n-of-1 trials: introducing the single-case experimental
design (SCED) scale. Neuropsychol Rehabil 18(4):385–401. doi:10.1080/09602010802009201
Tate RL, Perdices M, Rosenkoetter U, Wakim D, Godbee K, Togher L, McDonald S (2013)
Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials:
the 15-item risk of bias in N-of-1 trials (RoBiNT) scale. Neuropsychol Rehabil 23(5):619–638.
doi:10.1080/09602011.2013.824383, Epub 2013 Sep 9
Tate RL, Perdices M, McDonald S, Togher L, Rosenkoetter U (2014) The design, conduct and
report of single-case research: resources to improve the quality of the neurorehabilitation
literature. Neuropsychol Rehabil 24(3–4):315–331. doi:10.1080/09602011.2013.875043,
Epub 2014 Apr 7
Yuhong H, Qian L, Yu L, Yingqiang Z, Yanfen L, Shujing Y, Shufang Q, Lanjun S, Shuxuan Z,
Baohe W (2013) An n-of-1 trial service in clinical practice: testing the effectiveness of Liuwei
Dihuang decoction for kidney-yin deficiency syndrome. Evid Based Complement Alternat
Med 2013:827915. doi:10.1155/2013/827915, Epub 2013 Sep 23
Zucker DR, Schmid CH, McIntosh MW, D’Agostino RB, Selker HP, Lau J (1997) Combining
single patient (N-of-1) trials to estimate population treatment effects and to evaluate individual
patient responses to treatment. J Clin Epidemiol 50(4):401–410
Index
A
Adaptive trial design, 78, 216, 218
Adverse drug reactions (ADR), 12, 116–118, 120, 121
Adverse event, 5, 12, 51, 64, 86, 94–96, 101, 102, 106, 108, 111, 115–122, 163, 178, 188, 237
Aggregated n-of-1 trial, 51, 54, 57–65, 70, 78, 82
Allocation concealment, 4, 81–90, 182, 183
Allocation sequence, 81–85, 183
Alternate treatment, 72, 74, 183
Analysis, 2, 23, 45, 61, 74, 86, 93, 109, 116, 126, 135, 157, 178, 200, 212, 235

B
Bayesian methods, 7, 109, 219, 220, 224–226, 236
Behavioral sciences, 3, 7, 16, 19–38, 233, 237
Beneficence, 126, 129, 130
Benefits, 2, 3, 9, 11, 15–16, 43, 46, 48, 52, 57, 61–63, 76, 86, 88, 102, 126, 128, 130, 131, 145, 151, 156–160, 162, 165, 166, 169, 171, 172, 189, 196, 197, 203, 207, 208, 234, 237, 238
Bias, 3, 4, 14, 23, 24, 45, 57, 61, 64, 81–83, 85, 86, 89, 90, 102, 103, 126–128, 130, 172, 176, 180, 190, 213, 214, 216–219, 230
Biphasic design, 19
Blinding, 4, 23, 34, 45, 48, 81–90, 97, 106, 122, 127, 152, 162, 172, 178, 180, 182–183, 196, 200, 207, 211, 215–218, 238
B-phase training, 25–26

C
Case description, 22, 24, 25
Case report form (CRF), 4, 94–103, 119–120
Causality, 26, 116, 117, 120, 122, 196
Changing-criterion design (CCD), 32–33
Checklist, 24, 31, 98, 131, 178–180, 183, 217, 218
Chronic disease, 1, 11, 12, 236, 238
Chronic pain, 4, 5, 7, 46, 110, 233–235, 239
Clinical decision-making, 3, 5, 14, 48, 53, 54, 112
Clinically important change, 5, 110, 112
Clinical research continuum, 5, 132
Clinical response, 158, 159
Clinical trial, 5, 6, 13, 14, 22, 46, 60, 67, 71, 76, 82, 93–95, 97–99, 101–103, 114, 116, 118, 119, 132, 156–160, 165, 167–171, 176–179, 188, 211–212, 214, 215, 217, 226, 229, 230, 233, 235
Complementary and alternative medicine, 7, 233, 235, 236
Confidence intervals (CI), 46–47, 58, 62, 65, 169, 170, 184–186
Confounding, 4, 26, 36, 37, 44, 57, 81, 82, 90, 214–219
Consolidated standards of reporting trials (CONSORT), 19, 24, 83, 85, 89, 98, 177–185, 187
Consort extension for reporting N-of-1 trials (CENT), 6, 24, 178–182, 185, 186, 190
Correlated measurements, 72–73, 75, 76, 215, 219
Cost-effectiveness, 16, 78, 98, 156–166
Cost-effectiveness analysis (CEA), 161, 163, 164, 166–172
Costs, 1, 10, 49, 61, 68, 87, 98, 110, 156, 188, 197, 218, 237
Crossover design, 62, 68, 70, 72, 215, 222, 226, 230
Cross-over studies, 47, 169
Crossover trial, 48, 81, 87, 211, 225, 235

D
Data
deposit, 6, 187–190
discrepancies, 94, 96, 98, 100, 101
management, 4, 94, 96–97, 103, 190
monitoring, 86, 88, 119
validation, 4, 94, 99–101, 103
Database, 4, 23, 94, 96–103, 188, 238
Decision-making, 3, 5, 6, 14, 15, 35, 44, 46, 48, 52–54, 68, 78, 112, 157, 160, 163, 171, 199
Design, 2, 13, 20, 46, 62, 68, 81, 94, 112, 115, 127, 135, 156, 176, 203, 211, 233
Direct treatment effect, 71–72, 75–78
Double-blind, 9, 14, 28, 45, 48, 59, 86, 88, 237

E
Economic evaluation, 6, 156–160, 163, 164, 166, 167, 170–172
Economic methods, 156, 158, 172
Economics, 6, 12, 131, 155–172, 212, 237
Efficiency, 1, 30, 76, 77, 208, 216, 218
Enrolment, 4, 95, 97–100, 216
Error structure, 4, 75, 77, 78, 151
Error types, 34, 36, 37, 58, 63, 74, 78
Ethics, 5, 26, 28, 29, 46, 70, 71, 87, 95, 96, 98, 103, 115, 116, 119, 122, 125–132, 158, 176, 184, 204, 205, 235, 237
Expectedness, 14, 27, 28, 32, 34, 44, 59, 61, 64, 65, 77, 118, 120, 121, 130, 159, 165, 181, 196, 199, 202, 204, 206
Experimental, 3, 4, 11, 20–26, 28–32, 34, 35, 45, 68–71, 76, 78, 79, 111, 136, 149, 211, 216, 237
Exploratory data analysis, 5

F
Fixed effects models, 150, 222, 223, 225

G
General practitioners (GPs), 196, 201, 205–207
Genomics, 1, 7, 237–239
Goodness-of-fit models, 150, 151

H
Health technology assessment, 236
Heterogeneity, 156, 157, 159, 171, 184, 234, 238
Human research ethics committees, 5, 128–132

I
ICH-Good Clinical Practice (ICH-GCP), 5, 95, 97–100, 102, 115, 116, 122
Inclusion criteria, 46, 59, 97–98, 213
Informed consent, 97, 119, 190, 204
Institutional review boards (IRB), 5, 119, 121, 122, 128, 129, 132, 190
Integrity, 95, 116, 126, 129, 130
Intent, 45, 50, 58, 130, 131, 177, 217, 235

J
Justice, 126, 129, 130

L
Linear models, 37, 142–144, 150, 168, 222
Logic check, 100

M
Medical coding, 4, 101–102
Medication, 9–16, 81, 84, 85, 89, 94, 95, 97, 101, 102, 106–108, 111, 112, 122, 160–163, 166, 202, 206, 207, 225, 236, 238
Medication expenditure, 12
Merit, 126, 129, 130, 171, 180, 214
Meta-analysis, 7, 37, 103, 152, 178, 180, 186, 187, 211–230
Methodology sample size, 20, 67–79, 156, 160, 172, 200, 235, 237
Minimal detectable change, 110
Missing data, 45, 93, 100, 101, 150, 218, 219, 228, 235
Multiple baseline, 29–30

N
N-of-1 trial, 1–7, 9–16, 19–38, 43–54, 57–65, 67–79, 81–90, 93, 98, 103, 105–112, 115, 125–132, 135–152, 155–172, 175–190, 196–200, 203, 204, 207, 211–230, 233–239
Non-experimental, 24
Non-parametric models, 167, 170
Normality, 65, 146

R
Random effect models, 151, 222–223, 225, 226, 228
Randomization, 3, 23, 33–36, 45, 61, 65, 81–90, 95, 98, 100, 127, 144, 152, 162, 172, 180, 182–183, 215, 216, 218
Randomized controlled trials (RCTs), 3, 11, 13, 21, 22, 43–48, 50, 51, 54, 57, 59–65, 67, 68, 127, 128, 156, 158, 176–178, 183, 187, 218, 229, 235, 237
Rare diseases, 7, 190, 233, 236
Replication, 6, 22, 23, 30, 136, 182, 190, 211, 215, 216
Reporting, 3, 22, 45, 58, 82, 94, 105, 115, 131, 152, 160, 176, 196, 214, 237
Reporting guideline, 6, 24, 176–180, 182, 237
Representativeness, 33, 54, 58, 63, 112, 115, 162, 230
Research design, 20, 24, 26, 33, 156, 176, 177, 233
Research ethics, 5, 125–132, 184
Residual effects, 70–72, 75–78

T
Test interpretation, 19
Threshold, 5, 109–112, 169, 170, 203, 224
Transparency, 6, 152, 176–178, 187, 188, 190
Treatment effects, 2–3, 10, 25–28, 30, 37, 47–49, 62, 65, 68–72, 74, 77, 83, 89, 90, 139, 140, 145, 150, 156, 184, 197, 211, 212, 214–216, 218, 220, 223, 224, 227, 228, 230, 234, 235, 237
Trial of therapy, 14, 126–128, 130, 131, 198
Trial registration, 4, 6, 94, 102, 187–190
Trials, 1, 9, 20, 43, 57, 67, 81, 93, 105, 115, 126, 135, 156, 176, 196, 211, 233
Trials of treatment, 6, 44, 197, 200
Triple-blind, 86

V
Validation, 4, 93, 94, 99–103, 151

W
Withdrawal/reversal design, 26–30