
Pharmaco-Genomics: Methods and Protocols


Methods in

Molecular Biology 1015

Federico Innocenti
Ron H. N. van Schaik Editors

Pharmacogenomics
Methods and Protocols
Second Edition
METHODS IN MOLECULAR BIOLOGY™

Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes:

http://www.springer.com/series/7651
Pharmacogenomics
Methods and Protocols

Second Edition

Edited by

Federico Innocenti
Division of Pharmacotherapy and Experimental Therapeutics
Lineberger Comprehensive Cancer Center, Institute for Pharmacogenomics and
Individualized Therapy, Eshelman School of Pharmacy
University of North Carolina, Chapel Hill, NC, USA

Ron H.N. van Schaik

Department of Clinical Chemistry (AKC), Erasmus University Medical Center
Rotterdam, The Netherlands
Editors

Federico Innocenti
Division of Pharmacotherapy and Experimental Therapeutics
Lineberger Comprehensive Cancer Center
Institute for Pharmacogenomics and Individualized Therapy
Eshelman School of Pharmacy
University of North Carolina
Chapel Hill, NC, USA

Ron H.N. van Schaik
Department of Clinical Chemistry (AKC)
Erasmus University Medical Center
Rotterdam, The Netherlands

ISSN 1064-3745 ISSN 1940-6029 (electronic)

ISBN 978-1-62703-434-0 ISBN 978-1-62703-435-7 (eBook)
DOI 10.1007/978-1-62703-435-7
Springer New York Heidelberg Dordrecht London

Library of Congress Control Number: 2013938603

© Springer Science+Business Media, LLC 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation,
computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this
legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for
the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the
Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions
for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution
under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither
the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be
made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Humana Press is a brand of Springer

Springer is part of Springer Science+Business Media (www.springer.com)
Preface

Based upon the success of its first edition, the second edition of Pharmacogenomics: Methods
and Protocols aims to continue to provide readers with high-quality content on the most
innovative and commonly adopted technologies in the field of pharmacogenomics. Many
contributors to this book are leading experts in this field.

Pharmacogenomics: Methods and Protocols has become an established guide for investigators
in the selection and the experimental application of pharmacogenomic technologies. Using the
extensive information in the materials and methods sections, investigators will be able to
easily perform each technique in their laboratories. This book is unique in that it identifies
and highlights problems that might be encountered in performing a specific technique and how
to overcome these. Each procedure is described in a stepwise fashion, providing detailed
information from leading experts that is usually not found in research articles.

Pharmacogenomics aims to study the genetic basis of interpatient variability in response
to drug therapy. Understanding an individual’s genetic makeup is the key to creating
personalized drugs with greater efficacy and safety. Various technologies are currently
available, and this book aids the researchers’ decision on the most suitable method to apply.

In this updated edition, an introductory chapter describes the history of pharmacogenomics
and its current status. It is followed by Part II, which includes a variety of techniques
that are currently available to interrogate a patient’s genome. Readers will find detailed
information on eight technologies for SNP detection, plus three in-depth chapters on recent
technological developments in epigenetic techniques, sequencing, and quality control.
Relative to the first edition, newer methods such as SmartAmp, GoldenGate, and Luminex xMAP
have now been included.

Part III describes six methodologies and tools to assess and infer the functional
significance of allele variation in humans, including more innovative in vitro models (assays
to detect allelic imbalance or the effects of nonsynonymous variants and to guide
identification of candidate genes) and in vivo assays in mice (use of genomically
characterized inbred mice and the hydrodynamic tail vein assay for human promoters and
enhancers).

Part IV describes current tools for supporting the translation and implementation of
pharmacogenomic markers in the clinic. Here, readers will find five completely new chapters
on the latest repositories of pharmacogenomic information, a summary guide to the most recent
Web-based resources of interest to pharmacogenomic researchers, and two key examples of
algorithms and guidelines for treatment personalization based upon genetics.

Pharmacologists, geneticists, molecular biologists, and physicians in academic institutions,
in biotechnology, and in pharmaceutical industries will find Pharmacogenomics: Methods and
Protocols, second edition an essential reference and a valuable source on the latest
information in this field.


We are extremely grateful to all the authors for their excellent contributions making this
book a comprehensive and up-to-date resource for investigators in pharmacogenomics.

Chapel Hill, NC, USA Federico Innocenti

Rotterdam, The Netherlands Ron H.N. van Schaik
Contents

Preface
Contributors

PART I INTRODUCTION TO PHARMACOGENOMICS

1 Pharmacogenomics: Historical Perspective and Current Status
Rosane Charlab and Lei Zhang

PART II TECHNIQUES FOR INTERROGATING VARIATION IN HUMAN GENES AND GENOMES

2 Denaturing High-Performance Liquid Chromatography for Mutation Detection and Genotyping
Donna Lee Fackenthal, Pei Xian Chen, Ted Howe, and Soma Das
3 Clinical SNP Detection by the SmartAmp Method
Toshihisa Ishikawa and Yoshihide Hayashizaki
4 MALDI-TOF Mass Spectrometry
Dirk van den Boom, Matthias Wjst, and Robin E. Everts
5 TaqMan® Drug Metabolism Genotyping Assays for the Detection of Human Polymorphisms Involved in Drug Metabolism
Toinette Hartshorne
6 Pyrosequencing of Clinically Relevant Polymorphisms
Cristi R. King and Sharon Marsh
7 Pharmacogenetics Using Luminex® xMAP® Technology: A Method for Developing a Custom Multiplex Single Nucleotide Polymorphism Mutation Assay
Gonnie Spierings and Sherry A. Dunbar
8 Use of Linkage Analysis, Genome-Wide Association Studies, and Next-Generation Sequencing in the Identification of Disease-Causing Mutations
Eric Londin, Priyanka Yadav, Saul Surrey, Larry J. Kricka, and Paolo Fortina
9 The GoldenGate Genotyping Assay: Custom Design, Processing, and Data Analysis
Anna González-Neira
10 Genome-Wide Gene Expression Profiling, Genotyping, and Copy Number Analyses of Acute Myeloid Leukemia Using Affymetrix GeneChips
Mathijs A. Sanders and Peter J.M. Valk
11 Epigenetic Techniques in Pharmacogenetics
Sandra G. Heil
12 Plasmid Derived External Quality Controls for Genetic Testing
Tahar van der Straaten and Henk-Jan Guchelaar

PART III FUNCTIONAL ASSESSMENT OF GENETIC VARIATION: IN VITRO AND IN VIVO METHODS

13 Allelic Imbalance Assays to Quantify Allele-Specific Gene Expression and Transcription Factor Binding
Francesca Luca and Anna Di Rienzo
14 SCAN: A Systems Biology Approach to Pharmacogenomic Discovery
Eric R. Gamazon, R. Stephanie Huang, and Nancy J. Cox
15 Methods to Examine the Impact of Nonsynonymous SNPs on Protein Degradation and Function of Human ABC Transporter
Toshihisa Ishikawa, Kanako Wakabayashi-Nakao, and Hiroshi Nakagawa
16 In Vitro Identification of Cytochrome P450 Enzymes Responsible for Drug Metabolism
Zhengyin Yan and Gary W. Caldwell
17 In Vitro and In Vivo Mouse Models for Pharmacogenetic Studies
Amber Frick, Oscar Suzuki, Natasha Butz, Emmanuel Chan, and Tim Wiltshire
18 The Hydrodynamic Tail Vein Assay as a Tool for the Study of Liver Promoters and Enhancers
Mee J. Kim and Nadav Ahituv

PART IV TOOLS FOR TRANSLATION AND IMPLEMENTATION OF PHARMACOGENETIC MARKERS

19 A Guide to the Current Web-Based Resources in Pharmacogenomics
Dylan M. Glubb, Steven W. Paugh, Ron H.N. van Schaik, and Federico Innocenti
20 PharmGKB: The Pharmacogenomics Knowledge Base
Caroline F. Thorn, Teri E. Klein, and Russ B. Altman
21 Genetic Databases in Pharmacogenomics: The Frequency of Inherited Disorders Database (FINDbase)
Marianthi Georgitsi and George P. Patrinos
22 Development of Predictive Models for Estimating Warfarin Maintenance Dose Based on Genetic and Clinical Factors
Lu Yang and Mark W. Linder
23 Evidence Based Drug Dosing and Pharmacotherapeutic Recommendations per Genotype
Vera H.M. Deneer and Ron H.N. van Schaik

Index
Contributors

NADAV AHITUV • Department of Bioengineering and Therapeutic Sciences,

Institute for Human Genetics, University of California, San Francisco, CA, USA
RUSS B. ALTMAN • Department of Genetics, School of Medicine, Stanford University,
Stanford, CA, USA
NATASHA BUTZ • Division of Pharmacotherapy and Experimental Therapeutics,
Institute for Pharmacogenomics and Individualized Therapy, Eshelman School of
Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
GARY W. CALDWELL • CREATe, Janssen Pharmaceutical Companies of Johnson & Johnson,
Spring House, PA, USA
EMMANUEL CHAN • Division of Pharmacotherapy and Experimental Therapeutics,
Institute for Pharmacogenomics and Individualized Therapy, Eshelman School of
Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
ROSANE CHARLAB • Office of Clinical Pharmacology, Office of Translational Sciences,
Center for Drug Evaluation and Research, United States Food and Drug
Administration, Silver Spring, MD, USA
PEI XIAN CHEN • Department of Human Genetics, University of Chicago, Chicago, IL,
USA
NANCY J. COX • Section of Genetic Medicine, Department of Medicine,
The University of Chicago, Chicago, IL, USA
SOMA DAS • Department of Human Genetics, University of Chicago, Chicago, IL, USA
ANNA DI RIENZO • Department of Human Genetics, University of Chicago,
Chicago, IL, USA
VERA H.M. DENEER • Department of Clinical Pharmacy, St. Antonius Ziekenhuis
Nieuwegein, Nieuwegein, The Netherlands
SHERRY A. DUNBAR • Luminex Corporation, Austin, TX, USA
ROBIN E. EVERTS • SEQUENOM® Inc., San Diego, CA, USA
DONNA LEE FACKENTHAL • Department of Human Genetics, University of Chicago,
Chicago, IL, USA
PAOLO FORTINA • Cancer Genomics Laboratory, Kimmel Cancer Center,
Department of Cancer Biology, Thomas Jefferson University, Jefferson Medical College,
Philadelphia, PA, USA
AMBER FRICK • Division of Pharmacotherapy and Experimental Therapeutics,
Institute for Pharmacogenomics and Individualized Therapy, Eshelman School of
Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
ERIC R. GAMAZON • Section of Genetic Medicine, Department of Medicine,
The University of Chicago, Chicago, IL, USA
MARIANTHI GEORGITSI • Department of Pharmacy, School of Health Sciences,
University of Patras, Patras, Greece
DYLAN M. GLUBB • Queensland Institute of Medical Research, Brisbane, QLD, Australia
ANNA GONZÁLEZ-NEIRA • Human Genotyping Unit, Spanish National Cancer Research
Center (CNIO), Madrid, Spain


HENK-JAN GUCHELAAR • Department of Clinical Pharmacy and Toxicology,

Leiden University Medical Center, Leiden, The Netherlands
TOINETTE HARTSHORNE • Life Technologies, South San Francisco, CA, USA
YOSHIHIDE HAYASHIZAKI • Preventive Medicine and Diagnosis Innovation Program,
RIKEN, Wako, Japan
SANDRA G. HEIL • Department of Clinical Chemistry, Erasmus University Medical Center,
Rotterdam, The Netherlands
TED HOWE • Transgenomic Inc., Omaha, NE, USA
R. STEPHANIE HUANG • Section of Hematology/Oncology, Department of Medicine,
The University of Chicago, Chicago, IL, USA
FEDERICO INNOCENTI • Division of Pharmacotherapy and Experimental Therapeutics,
Lineberger Comprehensive Cancer Center, Institute for Pharmacogenomics and
Individualized Therapy, Eshelman School of Pharmacy, University of North Carolina,
Chapel Hill, NC, USA
TOSHIHISA ISHIKAWA • Center for Life Science Technologies, RIKEN, Yokohama, Japan
MEE J. KIM • Department of Bioengineering and Therapeutic Sciences, Institute for
Human Genetics, University of California, San Francisco, CA, USA
CRISTI R. KING • Department of Internal Medicine, Washington University in St. Louis,
St. Louis, MO, USA
TERI E. KLEIN • Department of Genetics, Stanford University Medical Center, Stanford,
CA, USA
LARRY J. KRICKA • Department of Pathology and Laboratory Medicine,
University of Pennsylvania School of Medicine, Philadelphia, PA, USA
MARK W. LINDER • Department of Pathology and Laboratory Medicine,
University of Louisville School of Medicine, Louisville, KY, USA
ERIC LONDIN • Computational Medicine Center, Thomas Jefferson University Jefferson
Medical College, Philadelphia, PA, USA
FRANCESCA LUCA • Department of Human Genetics, University of Chicago,
Chicago, IL, USA
SHARON MARSH • Faculty of Pharmacy and Pharmaceutical Sciences, Katz Group Centre
for Pharmacy and Health Research, University of Alberta, Edmonton, AB, Canada
HIROSHI NAKAGAWA • College of Bioscience and Biotechnology, Chubu University, Aichi,
Japan
GEORGE P. PATRINOS • Department of Pharmacy, School of Health Sciences, University of
Patras, Patras, Greece
STEVEN W. PAUGH • Hematological Malignancies Program and Pharmaceutical Sciences
Department, St. Jude Children’s Research Hospital, Memphis, TN, USA
MATHIJS A. SANDERS • Department of Hematology, Erasmus University Medical Center,
Rotterdam, The Netherlands
GONNIE SPIERINGS • Luminex B.V., Oosterhout, The Netherlands
SAUL SURREY • Department of Medicine, Thomas Jefferson University, Jefferson Medical
College, Philadelphia, PA, USA
OSCAR SUZUKI • Division of Pharmacotherapy and Experimental Therapeutics,
Institute for Pharmacogenomics and Individualized Therapy, Eshelman School of
Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
CAROLINE F. THORN • Department of Genetics, School of Medicine, Stanford University,
Stanford, CA, USA

PETER J.M. VALK • Department of Hematology, Erasmus University Medical Center,

Rotterdam, The Netherlands
RON H.N. VAN SCHAIK • Department of Clinical Chemistry, Erasmus University Medical
Center, Rotterdam, The Netherlands
DIRK VAN DEN BOOM • SEQUENOM® Inc., San Diego, CA, USA
TAHAR VAN DER STRAATEN • Department of Clinical Pharmacy and Toxicology,
Leiden University Medical Center, Leiden, The Netherlands
KANAKO WAKABAYASHI-NAKAO • Medical Genetics Division, Shizuoka Cancer Center
Research Institute, Shizuoka, Japan
TIM WILTSHIRE • Division of Pharmacotherapy and Experimental Therapeutics,
Institute for Pharmacogenomics and Individualized Therapy, Eshelman School of
Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
MATTHIAS WJST • Comprehensive Pneumology Center (CPC), Helmholtz Zentrum
Muenchen, German Research Center for Environmental Health (GmbH),
Neuherberg, Germany; Institute of Medical Statistics and Epidemiology,
Klinikum Rechts der Isar der TU Muenchen, Muenchen, Germany
PRIYANKA YADAV • Cancer Genomics Laboratory, Kimmel Cancer Center,
Thomas Jefferson University, Jefferson Medical College, Philadelphia, PA, USA
ZHENGYIN YAN • CREATe, Janssen Pharmaceutical Companies of Johnson & Johnson,
Spring House, PA, USA
LU YANG • Department of Pathology and Laboratory Medicine, University of Louisville
School of Medicine, Louisville, KY, USA
LEI ZHANG • Office of Clinical Pharmacology, Office of Translational Sciences,
Center for Drug Evaluation and Research, United States Food and Drug
Administration, Silver Spring, MD, USA
Part I

Introduction to Pharmacogenomics
Chapter 1

Pharmacogenomics: Historical Perspective and Current Status
Rosane Charlab and Lei Zhang

Abstract
Pharmacogenomics and its predecessor pharmacogenetics study the contribution of genetic factors to the
interindividual variability in drug efficacy and safety. One of the major goals of pharmacogenomics is to
tailor drugs to individuals based on their genetic makeup and molecular profile. From early findings in the
1950s uncovering inherited deficiencies in drug metabolism that explained drug-related adverse events, to
today’s genome-wide approaches assessing genetic variation in multiple genes, pharmacogenomics has
come a long way. The evolution of pharmacogenomics has paralleled the evolution of genotyping
technologies, the completion of the human genome sequencing, and the HapMap project. Despite these
advances, the implementation of pharmacogenomics in clinical practice has so far been limited. Here we
present an overview of the history and current applications of pharmacogenomics in patient selection,
dosing, and drug development, with illustrative examples of these categories. Some of the challenges in
the field and future perspectives are also presented.

Key words Pharmacogenetics, Pharmacogenomics, Pharmacokinetics, Pharmacodynamics, Polymorphism,
Adverse event, Targeted therapy, Drug metabolizing enzyme, Drug transporter

1 Pharmacogenomics: Historical Perspective and Current Status

It is well known that people respond differently to medications. The same medication can be
well tolerated and/or effective in some individuals, but lead to severe adverse reactions
and/or be ineffective in others. This heterogeneity in drug response poses immense clinical
challenges and underscores the importance of individualized medicine efforts, i.e., tailoring
medications to individuals in order to optimize treatment, prevent adverse reactions, and
improve patient care [1].
The individual variation in drug response is attributed to the complex interplay of multiple
factors (Fig. 1). These include differences in genetic makeup, environmental factors,
co-morbidities, age, sex, race, organ dysfunction, disease characteristics, co-medications
and drug–drug interactions. It is important to assess the contribution of the various genetic
and nongenetic components to the overall response. Pharmacogenetics deals with the genetic
component, and focuses on how genetic differences in individual candidate genes, mostly
polymorphisms in genes encoding drug metabolizing enzymes (DMEs) and transporters, contribute
to the observed variability in drug response [2]. It is estimated that the genetic component
accounts for 20–95% of variability in drug response [3, 4]. Pharmacogenetics usually relies on
large clinical effects of a single or a few gene variants. Still, most of the genetic
variability in drug response is likely to be associated with complex traits involving multiple
genes with compensatory or overlapping roles. Consequently, the evaluation is also more
complex. To accommodate this complexity, pharmacogenetics has evolved into pharmacogenomics,
which studies how multiple genes, including relevant pathways and ultimately the entire genome
(and its products), impact drug response [5, 6]. Pharmacogenomics considers both inherited
(germ-line) and acquired (somatic; in tumor) DNA variations, in addition to variations in RNA
expression. This new field combines classical pharmacology and genomics, and applies genetic
information, at both the population and the patient level, to advance drug research and
development and to manage drug selection and dosing [7–9].

[Figure 1. Labels in the figure: Genetic, Environmental, Disease characteristics, Clinical
Practice, Age, DDI, Sex, Regulatory, Comorbidities, Life Style, Race, ???, surrounding DRUG
RESPONSE]

Fig. 1 The drug response “puzzle.” Genetic makeup is just one of many factors that contribute to
interindividual variability in drug response. The contribution of each factor can be different for
different patient–drug associations. DDI drug–drug interactions

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_1, © Springer Science+Business Media, LLC 2013

The use of large-scale genetic analysis to interpret and predict drug response, characteristic
of pharmacogenomics, was facilitated by the completion of the human genome sequencing and
mapping, the international HapMap project, and by advances in gene expression profiling, high
throughput genotyping, sequencing and other genomic methodologies. Ironically, these advances
also created a flood of information at a faster pace than it could be analyzed and properly
correlated with clinical information. Despite the fact that an increasing number of drug labels
have indications associated with genetic biomarkers [10], the exact contribution of genetic
factors to drug toxicity and efficacy is unclear for most drugs, and the implementation of
pharmacogenomics in clinical practice is in its infancy. The growing interest in applying
pharmacogenomic principles in mainstream clinical medicine and drug development is undeniable,
however, even to the most skeptical.

2 Pharmacogenetics Is Born

The inability to taste phenylthiourea was associated with an autosomal recessive trait by
Larry Snyder in 1932, establishing a link between drug response and inheritance [11]. The gene
responsible for the trait, the taste receptor TAS2R1, was only identified in 2003, 70 years
after Snyder’s findings [12]. Arno Motulsky in 1957 [13] anticipated that inheritance might
explain variability in drug efficacy and toxicity. The term “pharmacogenetics” was then coined
by Friedrich Vogel in 1959 [14] to define a new science applying genetics and pharmacology to
study the influence of inheritance on drug response. The initial reports of “classic”
pharmacogenetic traits were drug metabolism disorders in which inherited variations in a single
drug disposition gene caused abnormal response to the drug. These traits behaved as
high-penetrance monogenic Mendelian traits. Seminal studies include (1) association of
prolonged muscle paralysis and apnea caused by the muscle relaxant succinylcholine with
atypical butyrylcholinesterase (pseudocholinesterase), (2) thiopurine S-methyltransferase
(TPMT) phenotyping to uncover patients with low capacity to inactivate toxic thiopurines,
(3) “slow acetylator” versus “rapid acetylator” status for isoniazid metabolism associated
with inherited N-acetyltransferase (NAT2) variation, (4) cytochrome P450 2D6 (CYP2D6) poor
metabolizer phenotyping for debrisoquine hydroxylation, and (5) primaquine-related hemolytic
anemia in carriers of glucose-6-phosphate dehydrogenase deficiency (Table 1). The distribution
of phenotypes for these early examples was established by measuring significant variability in
pharmacokinetic (PK) parameters. A correlation was then established between drug
pharmacokinetics and efficacy or toxicity, and the genetic variations associated with enzyme
activity were later identified (reviewed in [2, 15, 16]). For the examples above, CYP2D6 was
cloned and the genetic polymorphism associated with deficient debrisoquine metabolism was
characterized in 1988 [17], followed by molecular cloning of the NAT2 [18] and TPMT [19] genes.
In addition, early findings from twin and family studies were consistent with an important role
of genetic variation in drug response [3, 20].

Table 1
Pharmacogenetics is born: seminal findings

Drug | Variable clinical effect | Gene | Associated mechanism (a) | References
Phenylthiourea (PTU) | Inability to taste phenylthiourea | “PTU nontaster trait” | Coding SNPs in taste receptor TAS2R1 | [11, 12]
Primaquine (antimalarial) | Primaquine-related hemolytic anemia | G6PD | G6PD deficiency | [92]
Succinylcholine (muscle relaxant) | Prolonged muscle paralysis and apnea | Pseudocholinesterase | Pseudocholinesterase deficiency | [93]
Isoniazid (antituberculosis) | Isoniazid-induced peripheral neuropathy in “slow acetylators” | NAT2 | Reduced function NAT2 variants | [94, 95]
Debrisoquine (antihypertensive) | Adverse response to debrisoquine in “poor metabolizers” | CYP2D6 | Reduced function CYP2D6 variants | [96, 97]
Thiopurines | Increased risk of myelosuppression in “poor metabolizers” | TPMT | Reduced function TPMT variants | [98]

Adapted from refs. 15, 27
Genes: TAS2R1 taste receptor, type 2, member 1; G6PD glucose-6-phosphate dehydrogenase; NAT2 N-acetyltransferase
2; CYP2D6 cytochrome P450, family 2, subfamily D, polypeptide 6; TPMT thiopurine S-methyltransferase
(a) Associated mechanisms at the molecular level were in some instances identified years after the original findings

3 Pharmacogenetics of Drug Disposition and Drug Targets: The “Art of Medicine” Unraveled

Genetically based differences can occur in processes involved in drug pharmacokinetics (PK)
and/or pharmacodynamics (PD) [2, 21, 22]. Genetic variation in drug disposition (Absorption,
Distribution, Metabolism, and Excretion; ADME) can lead to pharmacokinetic changes in the
levels of the parent drug or its metabolites and thereby affect drug action (Table 2). Many
functionally relevant polymorphisms have been identified in DMEs including cytochrome P450
(CYP450; especially CYP2D6, CYP2C9, CYP2C19), NAT2, TPMT, and UDP-glucuronosyltransferases
(UGTs) [7].
Polymorphisms in drug disposition genes may decrease the
functional activity or expression of the metabolizing enzymes. This
can give rise to distinct individual metabolism phenotypes ranging
from poor to ultra rapid (i.e., poor (PM), intermediate (IM),
Pharmacogenomics: Historical Perspective and Current Status 7

Table 2
Genetic variation among individuals can influence pharmacokinetics and/or pharmacodynamics
and affect drug benefit/risk profile

Gene (and
associated
Therapeutic agent Clinical effect variants) Problem affecting efficacy/safety
Metabolism
Warfarin Anticoagulant CYP2C9 PK: Increased risk of bleeding in PMs
Codeine Analgesic CYP2D6 PK: Lack of analgesia in CYP2D6 PMs,
(pro-drug) toxicity in UMs
Nortriptyline Antidepressant CYP2D6 PK: Increased adverse event risk in CYP2D6
PMs, decreased efficacy in UMs
Clopidogrel Anti-thrombotic CYP2C19 PK: Decreased bioactivation and reduced
(pro-drug) response to clopidogrel in PMs
Mercaptopurine Antineoplastic TPMT PK: Increased myelosupression risk in PMs
Irinotecan Antineoplastic UGT1A1 PK: Increased hematological toxicity risk in
the reduced function UGT1A1*28
carriers
Transport
Simvastatin Lipid-lowering drug SLCO1B1 PK: SNP associated with reduced hepatic
uptake and increased risk of statin-
induced myopathy
Drug target
Warfarin Anticoagulant VKORC1 PD: Decreased dose requirement associated
to VKORC1 SNP
Beta blockers Treat high blood ADRB1 PD: SNPs associated to variability in
pressure/decrease response to beta blockers
heart rate
Trastuzumab Antineoplastic ERBB2 PD: Effective in patients overexpressing
(HER2) HER2 receptor on tumor cells
Other
Abacavir Anti-HIV HLA-B*5701 PD/mechanistic: High risk of severe
hypersensitivity reaction in HLA-B*5701
allele carriers
Carbamezapine Anticonvulsant HLA-B*1502 PD/mechanistic: High risk of SJS/TEN
with HLA-B*1502 allele in Han Chinese
Pegylated Antiviral for IL28B PD/mechanistic: SNP near to IL28B gene
interferon hepatitis C is associated to response to PegIFN and
(PegINF) and RBV for patients with chronic genotype
ribovarin (RBV) 1 HCV infection [99]
Modified from refs. 22, 27, 100, 101
PK pharmacokinetics, PD pharmacodynamics, SNP single nucleotide polymorphism, PM poor metabolizer, UM ultrarapid
metabolizer, SJS/TEN Stevens Johnson syndrome/Toxic epidermal necrolysis, HIV human immunodeficiency virus, HCV
Hepatitis C virus
Genes: CYP2C9 cytochrome P450, family 2, subfamily C, polypeptide 9; CYP2D6 cytochrome P450, family 2, subfam-
ily D, polypeptide 6; CYP2C19 cytochrome P450, family 2, subfamily C, polypeptide 19; TPMT thiopurine
S-methyltransferase; UGT1A1 UDP glucuronosyltransferase 1 family, polypeptide A1; SLCO1B1 solute carrier organic
anion transporter family, member 1B1; VKORC1 vitamin K epoxide reductase complex, subunit 1; ADRB1 adrenergic,
beta-1-, receptor; ERBB2 (HER2) v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma
derived oncogene homolog (avian); HLA-B major histocompatibility complex, class I, B; IL28B interleukin 28B (inter-
feron, lambda 3)
8 Rosane Charlab and Lei Zhang

extensive (EM), and ultrarapid (UM)) metabolizers with potential

clinical consequences. For example, when an active parent drug
undergoes inactivation via a polymorphic DME, reduced function
variants can lead to drug accumulation and toxicity in poor
metabolizers, while ultrarapid variants can lead to higher clearance and
reduced drug action in ultrarapid metabolizers. Conversely, when
an inactive pro-drug needs to be converted to the active drug by a
polymorphic DME to display pharmacological activity, reduced
function polymorphisms can lead to lack of drug efficacy in poor
metabolizers, while ultrarapid variants can lead to active drug
accumulation and toxicity [23, 24]. In addition, the distribution of
functional DME variants can differ among ethnic groups, giving
rise to different proportions of PM, IM, EM, and UM subjects
within a given population [25]. For instance, 5–10 % of Europeans
and Africans are CYP2D6 poor metabolizers (PMs), while the fre-
quency of CYP2D6 PMs is much lower in Asians. On the other
hand, Asians have a higher frequency of CYP2C19 PMs compared
with the other two major ethnic groups [22, 25, 26]. In some
instances, loss or reduction of DME function due to a polymorphism
can be compensated by another functional DME. For example,
CYP3A5 deficiency can be caused by a polymorphism (CYP3A5*3,
6986A>G) that creates a cryptic splice site, leading to a premature
stop codon and a truncated enzyme. Because most CYP3A5 drug
substrates are also substrates of CYP3A4, the clinical effects of
altered CYP3A5 activity may be difficult to interpret in the context
of a functional CYP3A4 [16].
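
The active-drug versus pro-drug reasoning in this section can be condensed into a small lookup. The sketch below is illustrative only: the names `Phenotype` and `expected_effect` are hypothetical, and it assumes a drug whose fate depends on a single polymorphic DME, ignoring compensating enzymes such as the CYP3A4/CYP3A5 case just described.

```python
from enum import Enum


class Phenotype(Enum):
    """Metabolizer categories used in the text (hypothetical helper type)."""
    PM = "poor metabolizer"
    IM = "intermediate metabolizer"
    EM = "extensive metabolizer"
    UM = "ultrarapid metabolizer"


def expected_effect(phenotype: Phenotype, is_prodrug: bool) -> str:
    """Qualitative expectation for a drug handled by one polymorphic DME.

    Assumption: the DME inactivates an active parent drug, or activates an
    inactive pro-drug; no other pathway compensates.
    """
    if phenotype is Phenotype.PM:
        if is_prodrug:
            return "reduced activation: possible lack of efficacy"
        return "reduced inactivation: possible drug accumulation and toxicity"
    if phenotype is Phenotype.UM:
        if is_prodrug:
            return "increased activation: possible active-drug accumulation and toxicity"
        return "increased clearance: possible reduced drug action"
    # IM and EM are not singled out by the rule described in the text.
    return "no marked deviation expected from this rule alone"


if __name__ == "__main__":
    for phenotype in Phenotype:
        for is_prodrug in (False, True):
            kind = "pro-drug" if is_prodrug else "active drug"
            print(f"{phenotype.name}, {kind}: {expected_effect(phenotype, is_prodrug)}")
```

Running the script prints the expectation for each phenotype and drug type, which makes the symmetry between the two cases easy to see.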
Genetic variation can also occur in drug targets or associated
pharmacodynamic pathways, or in genes unrelated to the thera-
peutic effect, and these can lead to differences in how subjects
respond to a drug. Polymorphisms in VKORC1, which encodes
vitamin K epoxide reductase complex, subunit 1, the warfarin tar-
get, exemplify this category as described in the following sections.
Genetic variations at the pharmacokinetic and pharmacodynamic levels
can affect clinical outcome, and these are most relevant to clinical
practice. Drugs with a narrow therapeutic index (where small
differences in ADME may lead to toxicity; also referred to here as
“narrow therapeutic range”) and with at least one critical step in the
drug response pathway controlled mostly by a single gene are more
likely to show a measurable pharmacogenetic effect.

4 Pharmacogenetics to Pharmacogenomics: Knowing Is Half of the Battle

The term pharmacogenomics was introduced in the late 1990s
[27], just before the completion of the human genome sequence
and the realization that many genes are polymorphic [28, 29].
Polymorphisms are variations at DNA sequence level among
individuals. Historically, polymorphisms are defined as occurring at
a frequency of 1 % or more in the population. If the frequency is
lower than 1 %, the variation is usually considered a mutation [30].
SNPs (single nucleotide polymorphisms) are single-base pair sub-
stitutions and represent the most studied type of polymorphism in
the human genome. Over 14 million SNPs have been discovered,
most of them of as yet unknown function. In addition to single-base pair
substitutions, other types of DNA variation include insertions,
deletions, and variations in copy-number of genes or DNA regions
[16]. Interindividual differences in the epigenetic state of a genome
(i.e., DNA methylation status and histone modifications) may also
contribute to variability in drug response through regulation of
gene expression [31]. Genetic variation can differ considerably
among ethnic groups [32], and often combinations of SNPs
inherited together (haplotypes) are evaluated in genotype–phenotype
correlation studies instead of individual SNPs [33–35]. To date,
the inherited genetic variability in drug response has been primarily
associated with a small proportion of DNA variations in drug targets,
disposition genes, and genes related to serious adverse events.
Although extensively assessed, the role of most DNA variations in
drug pharmacokinetics, pharmacodynamics, efficacy and toxicity
has not been clearly determined.
Pharmacogenomic studies mainly use candidate gene or
genome-wide approaches to identify biomarkers. The candidate
gene approach is hypothesis driven and limited by the current
knowledge of the drug's pharmacokinetics and pharmacodynamics.
In this approach, variants in genes known to be involved in drug
disposition and response are tested in association studies.
An extension of the candidate gene approach is the pathway-based
approach, which is also hypothesis driven and interrogates a
group of polymorphisms in genes common to a candidate pathway.
Alternatively, approaches can rely on genome-wide association
studies (GWAS), which are hypothesis generating and discovery
driven, and are not limited to a predefined set of candidate
genes. GWAS use microarray technologies that can evaluate
millions of polymorphisms covering the entire human genome
simultaneously and, in this way, identify new targets irrespective
of prior biological knowledge of the gene or polymorphism
function. Polymorphisms (typically SNPs) significantly associated
in a GWAS with the phenotype of interest are used to identify
relevant genes using linkage disequilibrium and bioinformatics
tools. This approach, however, requires a very large sample size,
and the risk of false positives is high because so many
polymorphisms are tested at once. Both approaches have
advantages and disadvantages, and statistical considerations
should be carefully addressed prior to study initiation. Furthermore,
replication of pharmacogenomic data across studies and translation
to clinical applicability are still major challenges [36, 37].
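
One common way to limit false positives when many polymorphisms are tested is a Bonferroni correction, which divides the overall significance level by the number of tests. The chapter does not name a specific correction, so this choice, the function names, and the SNP identifiers below are assumptions made for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_snps(p_values: dict, alpha: float = 0.05) -> list:
    """Return the SNP identifiers whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return sorted(snp for snp, p in p_values.items() if p < threshold)


if __name__ == "__main__":
    # With one million tests and alpha = 0.05 the per-test threshold is about 5e-8.
    print(bonferroni_threshold(0.05, 1_000_000))

    # Hypothetical association results for three SNPs (made-up identifiers and values).
    toy_results = {"rs_example_1": 1e-9, "rs_example_2": 0.03, "rs_example_3": 0.2}
    print(significant_snps(toy_results))
```

In the toy example only the first SNP survives correction, even though the second would pass an uncorrected 0.05 cutoff, which is the kind of false positive the correction is meant to screen out.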
In the postgenomic era, genome-wide genotyping technologies
are being used to characterize genetic mutations or polymorphisms
and their functional consequences. These technologies use gene
chips known as microarrays. Chip technologies can also assess other
types of variation, such as gene copy number, structural changes in
DNA, and gene expression profiles of various tissue samples, in a
high-throughput fashion [38]. There has also been significant
progress in DNA sequencing technologies, referred to as
next-generation sequencing (NGS), with the traditional Sanger
method regarded as the first-generation technology. NGS methods are
becoming faster and cheaper than the traditional method used to generate
the first sequence of the human genome. The various methods
encompass a combination of protocols for DNA template preparation,
sequencing and imaging, and analysis. The reader is referred
to reviews on the subject for more details [39]. The power and
speed of these methods are key to accelerating discovery. For
instance, information on rare transcripts and alternative splicing can
be obtained through sequencing-based methods instead of with
gene-expression microarrays, without prior knowledge of which
genes to interrogate. Of note, despite differences in the definition,
the terms pharmacogenetics and pharmacogenomics are commonly
used interchangeably. Also, it is important to keep in mind that the
distribution of clinically relevant genetic variants may be different
among different ethnic groups and pharmacogenetic or pharma-
cogenomic information generated in one population may not be
applicable to a different one.

5 Pharmacogenomics Applications

Currently, pharmacogenomics applies or develops biomarkers to (1)
identify patients at risk of adverse events, (2) select patients most
likely to benefit from treatment, (3) establish rational dosing to
ensure safe and effective use of treatment agents, and (4) inform
clinical trial design and drug development. The following sections
will discuss these applications.

5.1 Pharmacogenomics and Adverse Drug Reactions: Who Is at Risk?

Toxic effects and serious adverse drug reactions (SADRs) are
responsible for a large number of deaths and hospitalizations per
year in the US, for preventing potential drugs from entering the
market, and for postmarketing withdrawal of approved drugs [36].
SADRs known as type A are predictable events based on the drug
pharmacokinetic and pharmacodynamic properties, and are typi-
cally dose-related. SADRs can also be unpredictable (of idiosyn-
cratic nature; known as type B) and not related to dose, and thus
more worrisome. Some type B SADRs have an underlying immu-
noallergenic mechanism not yet completely understood [36, 40].
Pharmacogenomic approaches can be used to uncover potential
gene(s) associated with the adverse events and develop biomarkers for
screening patients at risk [16]. Illustrative examples in which a
genetic marker was associated with a SADR are given below.
● Drug-induced liver injury (DILI): DILI is the most common
cause of drug withdrawal from the market. Also, approximately
13 % of acute liver failure in the US is attributed to idiosyncratic
DILI [16], underscoring the importance of being able to
identify patients at risk. Most positive genetic-association
studies of DILI caused by specific drugs relate to genetic variations in
the human leukocyte antigen (HLA) genes or in genes relevant
to drug metabolism and transport [41–43]. Examples include
flucloxacillin, ximelagatran, and lumiracoxib, which are not
marketed in the US. Flucloxacillin is an antibiotic used for the
treatment of staphylococcal infection in Europe and Australia.
In a multicenter GWAS, a SNP in the major histocompatibility
complex (MHC) that is closely linked with HLA-B*5701
showed a very strong association with flucloxacillin-induced
hepatic injury [44]. Ximelagatran, a thrombin inhibitor developed
for the prevention and treatment of thromboembolism,
failed to demonstrate a favorable safety profile. A retrospective
study including both genome-wide and large-scale candidate
gene analysis found a genetic association between elevated alanine
transaminase levels and HLA-DRB1*07 and HLA-DQA1*02
[45]. Lumiracoxib, a selective COX-2 inhibitor, was either not
approved or withdrawn from the market worldwide due to
hepatotoxicity concerns. Recently, a GWAS and fine mapping
identified a strong association with a common HLA haplotype
(HLA-DRB1*1501-HLA-DQB1*0602-HLA-DRB5*0101-
HLA-DQA1*0102) [46]. Attempts to replicate these associa-
tions in order to potentially “revive the drugs” are challenging.
Difficulties may include obtaining well-characterized samples
from patients with these adverse reactions in a number suitable
for genome-wide methods, involvement of multiple genes and
complex gene-environment interactions [47, 48].
● Drug-induced skin injury [16, 49]: The use of abacavir, an
anti-HIV-1 drug, can lead to a serious, potentially fatal hyper-
sensitivity syndrome in approximately 5–9 % of the patients. In
one of the most compelling applications of pharmacogenomics,
a double-blind, prospective, randomized multicenter study
with 1,956 patients randomized to an HLA-B*5701 prescreening
arm or to an arm without screening indicated that prospective
screening for HLA-B*5701 reduced the risk of hypersensitivity
reaction to abacavir [50]. This screening is now adopted in many
countries and has in fact reduced the frequency of skin reactions
to this drug. Similarly, carbamazepine, an anticonvulsant, can lead
to serious hypersensitivity reactions including maculopapular
eruption, hypersensitivity syndrome, Stevens–Johnson syndrome
(SJS), and toxic epidermal necrolysis (TEN). Carbamazepine-induced
SJS/TEN was strongly associated with the HLA polymorphism
HLA-B*1502 in Han Chinese [51], supporting genotyping
for the marker. HLA-B*1502 is mostly found across broad areas
of Asia. According to the carbamazepine labeling, “patients
with ancestry in populations in which HLA-B*1502 may be
present” should be screened for the allele before therapy [52].
Patients testing HLA-B*1502 positive should not be given car-
bamazepine “unless the benefit clearly outweighs the risk.”
However, caution should be exercised in deciding which
patients to screen due to the high variability in the rates of
HLA-B*1502 even within ethnic groups. Recently, GWAS
studies have also identified an association of HLA-A*3101 with
carbamazepine-induced adverse drug reactions in persons of
Northern European descent [53] and in Japanese [54], under-
scoring the variability of the genetic markers among ethnic
groups. Hypersensitivity to allopurinol, an anti-hyperuricemia
and anti-gout drug, has also been associated with the HLA-B
gene, but with a different allele, HLA-B*5801 [55].
● Statin-induced myotoxicity [56]: Statins are lipid-lowering
drugs that can cause myopathy ranging from mild myalgia
to potentially fatal and rare rhabdomyolysis. Statins are
substrates of the organic anion-transporting polypeptide
1B1 (OATP1B1), which is encoded by the SLCO1B1 gene. A
SLCO1B1 polymorphism (SLCO1B1 521T>C SNP) was
shown to markedly reduce hepatic uptake of statins and increase
their systemic exposure. A recent GWAS showed that the
SLCO1B1 521T>C SNP is strongly associated with simvastatin-induced
myopathy [57]. It is possible that the SLCO1B1
521T>C SNP is associated with an increased risk of myopathy
with most other statins as well, although this has not been studied.
● Irinotecan-induced neutropenia: Irinotecan is a chemotherapeutic
drug approved for the treatment of colorectal cancer.
The active metabolite of irinotecan, SN-38, is inactivated through
glucuronidation by the polymorphic drug metabolizing enzyme
UGT1A1, among other UGTs. High levels of
SN-38 due to impaired metabolic inactivation can lead to
SADRs including neutropenia and diarrhea. Many prospective
and retrospective studies have documented that patients har-
boring reduced function alleles of UGT1A1 such as
UGT1A1*28 are at higher risk of developing severe neutrope-
nia. The association of UGT1A1 genotype with diarrhea was
not as consistent throughout studies, possibly because of
aggressive management with antidiarrheal drugs [58]. The irinotecan
labeling was revised in 2005 in light of these studies
and now recommends a reduced initial dose for patients known
to be homozygous for UGT1A1*28. The frequency of the *28
allele is variable among ethnic groups, being more common in
Caucasians and Africans and less frequent in Asians [59].
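
The drug and marker pairs from the examples above can be collected into a simple screening lookup. This is a minimal sketch, not a clinical tool: the names `RISK_MARKERS` and `flagged_markers` are hypothetical, the marker strings are plain labels taken from the text, and for flucloxacillin the reported GWAS signal was a SNP closely linked with HLA-B*5701 rather than the allele itself.

```python
# Drug -> markers associated with a serious adverse reaction in the examples above.
# Marker strings are labels copied from the text, not a validated nomenclature.
RISK_MARKERS = {
    "abacavir": {"HLA-B*5701"},
    "flucloxacillin": {"HLA-B*5701"},  # via a closely linked SNP in the cited GWAS
    "carbamazepine": {"HLA-B*1502", "HLA-A*3101"},
    "allopurinol": {"HLA-B*5801"},
    "simvastatin": {"SLCO1B1 521T>C"},
    "irinotecan": {"UGT1A1*28"},
}


def flagged_markers(drug: str, patient_markers: set) -> set:
    """Return the patient's markers that the lookup associates with the given drug."""
    return RISK_MARKERS.get(drug.lower(), set()) & set(patient_markers)


if __name__ == "__main__":
    patient = {"HLA-B*5701", "UGT1A1*28"}  # hypothetical genotype report
    for drug in ("abacavir", "irinotecan", "carbamazepine"):
        print(drug, sorted(flagged_markers(drug, patient)))
```

A real screen would also need zygosity (the irinotecan labeling refers to UGT1A1*28 homozygotes) and ancestry-specific allele frequencies, both of which this lookup deliberately leaves out.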

5.2 Predictors of Efficacy: Who Benefits the Most?

Individualized drug therapy is particularly needed for agents with a
narrow therapeutic index and when the consequences of drug toxicity
or lack of efficacy are severe and potentially fatal, such as for
antineoplastic agents or anticoagulants [60].

5.2.1 Anticancer Drugs and Tumor Molecular Landscape

Only a small proportion of cancer patients respond to available
anticancer therapies. Furthermore, patients can develop resistance
to therapy in a short period of time. Tumor genomes present several
genetic or genomic structural changes and are highly heterogeneous.
This heterogeneity can be observed both among different
patients with the same tumor type, and within the same patient
when different tumor sites are compared at the molecular level. This
striking variability, together with the narrow therapeutic index of many
anticancer agents, makes cancer pharmacogenomics a field rich in
opportunities for tailoring therapy to individual patients. Recent examples
with several cancers [61–63] suggest that tumor types classically
subdivided and classified by histopathology can be further subdivided
into molecular subsets. These subsets are characterized by specific
alterations at the molecular level that “drive” the tumor (the
so-called “oncogenic drivers”) and are critical for its survival (e.g.,
point mutations, translocations, amplifications). Most of these
genetic alterations are acquired or somatic, i.e., they are only pres-
ent in the tumor tissue.
The increased understanding of the tumor molecular land-
scape and signaling has driven the development of antineoplastic
agents against these specific tumor molecular alterations (mostly in
the drug target or associated pathway). Impressive response rates
have been observed with this approach in the molecularly defined
subset of patients positive for the alteration and/or negative for
alterations associated with resistance to therapy. This targeting
approach is also expected to have a better safety profile than most
chemotherapy drugs—nonselective and designed to fit all.
However, in this scenario, traditional histologies may be subdivided
into smaller and smaller molecular subsets, adding complexity
and challenge to drug development. One of the earlier examples is
the use of tamoxifen in estrogen receptor-positive metastatic breast
cancer. Other examples (Table 3) include imatinib, a tyrosine
kinase inhibitor indicated for BCR-ABL translocation positive
chronic myelogenous leukemia (CML), trastuzumab, a monoclo-
nal antibody with clinical efficacy in patients with breast cancer
positive for v-erb-b2 erythroblastic leukemia viral oncogene homo-
log 2 (HER2) gene amplification or overexpression of the HER2
protein, erlotinib, an epidermal growth factor receptor (EGFR)
tyrosine kinase inhibitor most effective in non-small-cell lung can-
cer (NSCLC) positive for EGFR activating mutations,

Table 3
Examples of targeted oncology agents and associated tumor molecular alterations

Therapeutic agent | Class | Molecular alteration | Tumor type
Tamoxifen | Nonsteroidal antiestrogen | ER expression | Breast
Erlotinib | EGFR TKI | EGFR activating mutation | NSCLC
Lapatinib | EGFR/HER2 TKI | HER2 amplification/overexpression | Breast
Trastuzumab | Monoclonal antibody against HER2 | HER2 amplification/overexpression | Breast
Cetuximab/panitumumab | Monoclonal antibody against EGFR | KRAS mutation (a) | mCRC
Imatinib | ABL TKI (b) | BCR-ABL fusion gene by chromosomal translocation | CML
Crizotinib | ALK TKI | ALK rearrangement (e.g., EML4-ALK) | NSCLC
Vemurafenib | BRAF STK inhibitor | BRAF V600E mutation | Melanoma

Modified from ref. 102
TKI tyrosine kinase inhibitor, STK serine/threonine kinase, mCRC metastatic colorectal cancer, NSCLC non-small-cell lung cancer, CML chronic myelogenous leukemia
Genes: ER (ESR1) estrogen receptor 1; EGFR epidermal growth factor receptor; ERBB2 (HER2) v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian); KRAS v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog; BCR-ABL breakpoint cluster region-Abl tyrosine kinase; EML4-ALK echinoderm microtubule associated protein like 4-anaplastic lymphoma receptor tyrosine kinase; BRAF v-raf murine sarcoma viral oncogene homolog B1
(a) Lack of benefit from therapy in KRAS mutant mCRC
(b) Only the target relevant to the tumor type is indicated

vemurafenib, a kinase inhibitor recently approved for v-raf murine
sarcoma viral oncogene homolog B1 (BRAF) V600E mutant mel-
anoma [64], and crizotinib, approved for NSCLC positive for
anaplastic lymphoma kinase (ALK) rearrangements, which encom-
passes only about 3–5 % of unselected NSCLC US patients.
Cetuximab and panitumumab are anti-EGFR monoclonal anti-
bodies indicated for the treatment of metastatic colorectal cancer
(mCRC). Tumors positive for somatic activating mutations in
v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS),
which encodes a signaling protein downstream of EGFR,
do not respond to these antibodies and should not be
treated with these agents [1, 63, 65]. In some cases, the relevant
molecular alterations need to be assessed by specific diagnostic
tests before the patient can be given the therapeutic agent or
excluded from therapy.
Despite the success of these targeted therapies in molecularly
defined subsets of patients, acquired resistance to the targeted agent
is often a problem that impacts the clinical response [62, 66, 67].
Molecular mechanisms of resistance are now being investigated to
identify not only acquired but also intrinsic mechanisms of resistance
and to guide the development of rationally designed targeted agents.

5.2.2 Clopidogrel and CYP2C19

Clopidogrel is an anti-thrombotic agent. It has no
inherent antiplatelet activity, but about 15 % of its dose is converted
to an active metabolite in a two-step process involving multiple
CYPs, one of which is the polymorphic CYP2C19 [68]. The PM
status of CYP2C19 (carriers of two nonfunctional alleles *2 or *3)
is found to be associated with diminished clinical response to clopi-
dogrel [69]. The relationship between the CYP2C19 genotype
and pharmacokinetics and pharmacodynamics was further extended
to clinical outcomes in clinical studies. Namely, CYP2C19 PMs
had a higher rate of death, nonfatal myocardial infarction, or non-
fatal stroke as compared to noncarriers following percutaneous
coronary intervention [70]. Some practitioners have developed an
algorithm to indicate when alternate treatments or treatment strat-
egies (including dose increases) are recommended for patients with
variant CYP2C19 genotypes [71]. Between May 2009 and May
2011, several sections of the clopidogrel labeling were updated to
include pharmacogenomics and drug interaction information
related to the diminished antiplatelet responses and the increased
risk of cardiovascular events in patients with reduced CYP2C19
function [68].
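
The poor metabolizer definition used here (two nonfunctional alleles, *2 or *3) can be written as a short check. This is a sketch under that stated definition only: the function names are hypothetical, alleles other than *2 and *3 are treated as functional purely for illustration, and no treatment decision is implied.

```python
# Nonfunctional CYP2C19 alleles named in the text.
NONFUNCTIONAL_ALLELES = {"*2", "*3"}


def count_nonfunctional(allele_1: str, allele_2: str) -> int:
    """Number of nonfunctional alleles (0, 1, or 2) in a two-allele genotype."""
    return sum(allele in NONFUNCTIONAL_ALLELES for allele in (allele_1, allele_2))


def is_poor_metabolizer(allele_1: str, allele_2: str) -> bool:
    """True when both alleles are nonfunctional, the PM definition given above."""
    return count_nonfunctional(allele_1, allele_2) == 2


if __name__ == "__main__":
    for genotype in (("*1", "*1"), ("*1", "*2"), ("*2", "*3"), ("*2", "*2")):
        status = "PM" if is_poor_metabolizer(*genotype) else "not PM"
        print("/".join(genotype), status)
```

Carriers of a single nonfunctional allele are reported as "not PM" by this check, although the text notes that labeling information also addresses patients with reduced CYP2C19 function more broadly.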

5.3 Establish a Rational Dosing to Ensure Safety and Efficacy

5.3.1 Warfarin and CYP2C9 and VKORC1

Warfarin is an oral vitamin K antagonist anticoagulant. Treatment
with warfarin is complicated because of its narrow therapeutic
range and complex dose-response relationship [72, 73]. S-warfarin,
which is primarily responsible for warfarin's pharmacologic effect, is
mainly metabolized by CYP2C9, a polymorphic enzyme with
CYP2C9*2 and CYP2C9*3 as the two major variants with reduced
activity. The clearance of S-warfarin was found to be threefold lower
in CYP2C9*2 homozygous carriers and tenfold lower in CYP2C9*3
homozygous carriers [74]. Reduced clearance in patients with
CYP2C9 variants is associated with higher exposure to S-warfarin,
which leads to the need for reduced warfarin maintenance doses,
a longer time to achieve a stable dose, and a higher risk of
bleeding, especially during the induction period [72, 73].
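
The link between lower clearance and higher exposure follows from the standard steady-state relation, in which average exposure is proportional to dosing rate divided by clearance. The sketch below applies only that proportionality, with arbitrary units and hypothetical function names; it ignores R-warfarin, VKORC1 genotype, and clinical factors, so it illustrates the arithmetic and is not a dosing method.

```python
def steady_state_exposure(dose_rate: float, clearance: float) -> float:
    """Average steady-state exposure, proportional to dose rate / clearance (arbitrary units)."""
    if clearance <= 0:
        raise ValueError("clearance must be positive")
    return dose_rate / clearance


def dose_for_matched_exposure(reference_dose_rate: float,
                              reference_clearance: float,
                              patient_clearance: float) -> float:
    """Dose rate giving the patient the same exposure as the reference subject."""
    if reference_clearance <= 0 or patient_clearance <= 0:
        raise ValueError("clearances must be positive")
    return reference_dose_rate * patient_clearance / reference_clearance


if __name__ == "__main__":
    reference_clearance = 1.0  # arbitrary units
    for label, fold_lower in (("CYP2C9*2/*2", 3.0), ("CYP2C9*3/*3", 10.0)):
        clearance = reference_clearance / fold_lower
        exposure_ratio = (steady_state_exposure(1.0, clearance)
                          / steady_state_exposure(1.0, reference_clearance))
        matched = dose_for_matched_exposure(1.0, reference_clearance, clearance)
        print(f"{label}: exposure x{exposure_ratio:.1f} at the same dose; "
              f"matched-exposure dose fraction {matched:.2f}")
```

Under these assumptions a threefold or tenfold lower clearance translates directly into threefold or tenfold higher S-warfarin exposure at an unchanged dose, which is the intuition behind the reduced maintenance doses described above.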
In addition to CYP2C9, polymorphisms in VKORC1, which encodes
the target for vitamin K antagonists, were shown to have an important
impact on warfarin response. The major polymorphism of
VKORC1 has been identified at the −1639 position, G/G being
the homozygous wild-type and A/A and G/A being the
homozygous and heterozygous variants, respectively. Patients with the variant
genotypes show increased responsiveness to warfarin [75], and
generally require lower doses than patients with the −1639 G/G
wild-type genotype [76, 77]. Polymorphisms of the CYP2C9 and
VKORC1 genes alone consistently account for approximately 30 %
of the variability in warfarin dose requirement [78]. Patients carrying
variant CYP2C9 and/or VKORC1 genotypes had a higher
chance of major hemorrhage during warfarin therapy as a result of
overdosing [72, 78].

Pharmacogenetics-based dose adjustments are one tool to
individualize drug treatment according to genetic factors. Studies
have shown that genotype-based dosing can help identify an initial
dose that is closer to the stable dose and enhance the efficacy
and safety of anticoagulation [73, 79, 80]. A pharmacogenetic
algorithm was shown to estimate the therapeutic steady-state war-
farin dose more accurately than one using clinical factors and inter-
national normalized ratio (INR) response alone [81]. In January
2010, the warfarin labeling was updated to include a dosing table
to be considered for initial dosing based specifically on both
CYP2C9 and VKORC1 genotypes [82].

5.3.2 Tetrabenazine and CYP2D6

Tetrabenazine is a vesicular monoamine transporter 2 (VMAT2)
inhibitor approved by the US FDA in 2008 for the treatment of chorea
associated with Huntington’s disease, a rare disease. Tetrabenazine
is mainly metabolized by carbonyl reductase to the active metabolites
α-dihydrotetrabenazine (HTBZ) and β-HTBZ which are metabo-
lized primarily by polymorphic CYP2D6. Drug interaction studies
showed an increased exposure of α-HTBZ and β-HTBZ in subjects
taking strong CYP2D6 inhibitors (three- and ninefold, respectively),
indicating that it is likely that the exposure to α-HTBZ and β-HTBZ
would be increased in CYP2D6 PMs [83, 84]. Using modeling and
simulation, the relative exposure in CYP2D6 PMs and EMs was pre-
dicted based on information from drug interaction studies, and vari-
ous genotype-appropriate doses were simulated [83]. Because there
is an increased risk of drug-associated depression, suicidality, and
QTc prolongation adverse events with an increased dose of tetra-
benazine and anticipated increased exposure of α-HTBZ and
β-HTBZ in CYP2D6 PMs compared to that in CYP2D6 EMs, the
labeling recommends genotyping patients for CYP2D6 prior to
administering a higher tetrabenazine dose (i.e., >50 mg/day) [83,
84]. In the tetrabenazine case, drug interaction data with a strong
inhibitor of CYP2D6 were used to project the likely exposure in
CYP2D6 PMs for labeling dose recommendations.

5.4 Inform Clinical Trial Design and Drug Development [7, 85]

Pharmacogenomic principles have been successfully incorporated
in all phases of development, from preclinical to phase 3 trials and
in the postapproval phase (Fig. 2). Pharmacogenomics can be used in
preclinical studies to establish a biological rationale and assess the
potential for polymorphic drug disposition. In early phase clinical
trials, when the drug disposition pathway is known to be polymorphic,
pharmacogenomics can be applied to exclude patients with
reduced function alleles (poor metabolizers) or to direct a
genotype-guided dosing approach in later phase trials. In addition, drug
interaction studies can also help define the role of reduced function
metabolism phenotypes. Conversely, samples collected in these
studies can be retrospectively analyzed to assess the effect of genetic
variation on drug pharmacokinetics and pharmacodynamics.
Information gathered in these early studies can be prospectively
applied in the design of phase 3 trials to select patients most likely
to respond to therapy, and to exclude those at risk of adverse events
based on pharmacokinetics or pharmacodynamics. A spectrum of
enrichment designs based on genomic markers can also be applied
at any point of the clinical development timeline [86].
Additional roles for pharmacogenomics in the clinical arena
include applications in disease screening and risk assessment,
diagnosis, prognosis, and disease monitoring [87].

[Figure 2: development timeline (Nonclinical, Phase 1, Phase 2, Phase 3, Phase 4) spanning "optimize efficacy" to "minimize risk", with pharmacogenomic maneuvers at each stage: experimental evidence for PGx interaction and major polymorphic pathways; restricted FIH/DDI/HV trials and stratified dose-finding; enriched/stratified trials and stratified dosing; labeling]

Fig. 2 Pharmacogenomic maneuvers in drug development. Original Published Source: Figure 4 from reference
[86]. PGx pharmacogenomic, FIH first in human, DDI drug–drug interactions, HV healthy volunteers, ADME
absorption, distribution, metabolism, excretion, D/R dose–response, C/R concentration–response

6 Future Perspectives

A recent survey of pharmacogenetics trials registered with
clinicaltrials.gov shows 158 trials as of April 2011 [61]. The top three
therapeutic areas were oncology, psychiatric disorders and antico-
agulation/thrombosis. This snapshot reflects the major impelling
forces driving pharmacogenetics/pharmacogenomics approaches,
i.e., attempts to decrease adverse events and increase therapy suc-
cess by matching the right patient to the right drug and dosing
regimen. As discussed in the previous sections, some of the factors
contributing to therapeutic failures in oncology can be approached
through pharmacogenomics, for example the large tumor molecular
heterogeneity, molecular resistance to therapy, and toxicity of
antineoplastic drugs. Failure of unselected therapies may also
prompt the use of pharmacogenetic approaches in psychiatric dis-
orders. It is estimated that about 40–60 % of patients have to
change the first prescribed antidepressant. Concerning anticoagu-
lation and thrombosis, most of the studies reported involved
warfarin, which has a narrow therapeutic index, large inter-patient
and ethnic variation in dosing requirements, and a known genetic
component contributing to the response.
Where are we heading? Genome-wide approaches use geno-
type–phenotype associations. One of the bottlenecks of these
approaches is the collection of detailed, systematic and specific phe-
notypes in large studies to allow these associations to be investi-
gated. The wealth of genomic data that is currently being generated
is not paralleled by large-scale, detailed, standardized phenotypic
information of individuals—the so-called “phenomes.” The term
phenomics has been created to define this new growing “omic”
field dedicated to the large-scale study of high-dimensional pheno-
types, and which incorporates several layers of biology from mole-
cules, to signaling pathways to systems to behavior, in health and
disease. It requires systematic and comprehensive acquisition and
analysis of phenotypes with various methods (e.g., omic technolo-
gies, clinical, biochemical, imaging) at high throughput level in
order to provide insights on how genetic diversity translates into
phenotypes. Phenomics challenges are similar to the ones encoun-
tered by genomics in its early days: how to acquire and how to apply
comprehensive (omic; in this case phenotypic) data. Several phe-
nome projects are ongoing [88]. Efforts have also been initiated to
standardize definitions of phenotypes for discovery and validation.
Examples include phenotypes in serious adverse drug reactions such
as drug-induced liver injury, drug-induced skin injury, and drug-
induced torsade de pointes [89]. This search for standards and for
increase in phenotype granularity (known as “deep phenotyping”)
may reduce sample heterogeneity and measurement error leading
to an increase in study power, an improvement of diagnostics, and
ultimately to an increased understanding of genotype–phenotype
associations, outcomes, and treatment responses [88, 90, 91]. After
the genomics revolution, phenomics is next in line, as one more
necessary step to bridge the gap from bench to clinical practice.

Disclaimer

The views expressed in this article are those of the authors and may
not necessarily represent FDA policy. No official endorsement is
intended, nor should be inferred.

Acknowledgment

The authors acknowledge Dr. Issam Zineh for critical review of the
manuscript.

References

1. Offit K (2011) Personalized medicine: new genomics, old lessons. Hum Genet 130(1):3–14
2. Weinshilboum R, Wang L (2004) Pharmacogenomics: bench to bedside. Nat Rev Drug Discov 3(9):739–748
3. Kalow W et al (1998) Hypothesis: comparisons of inter- and intra-individual variations can substitute for twin studies in drug research. Pharmacogenetics 8(4):283–289
4. Evans WE, McLeod HL (2003) Pharmacogenomics—drug disposition, drug targets, and side effects. N Engl J Med 348(6):538–549
5. Roden DM et al (2006) Pharmacogenomics: challenges and opportunities. Ann Intern Med 145(10):749–757
6. Camilleri M, Saito YA (2008) Pharmacogenomics in gastrointestinal disorders. Methods Mol Biol 448:395–412
7. Kirk RJ et al (2008) Implications of pharmacogenomics for drug development. Exp Biol Med (Maywood) 233(12):1484–1497
8. McLeod HL, Evans WE (2001) Pharmacogenomics: unlocking the human genome for better drug therapy. Annu Rev Pharmacol Toxicol 41:101–121
9. Watters JW, McLeod HL (2003) Cancer pharmacogenomics: current and future applications. Biochim Biophys Acta 1603(2):99–111
10. Hudson KL (2011) Genomics, health care, and society. N Engl J Med 365(11):1033–1041
11. Snyder LH (1932) Studies in human inheritance. IX. The inheritance of taste deficiency in man. Ohio J Sci 32:436–468
12. Kim UK et al (2003) Positional cloning of the human quantitative trait locus underlying taste sensitivity to phenylthiocarbamide. Science 299(5610):1221–1225
13. Motulsky AG (1957) Drug reactions, enzymes, and biochemical genetics. J Am Med Assoc 165(7):835–837
14. Vogel F (1959) Moderne probleme der humangenetik. Ergebn Inn Med Kinderheilkd 12:52–125
15. Nebert DW et al (2008) From human genetics and genomics to pharmacogenetics and pharmacogenomics: past lessons, future directions. Drug Metab Rev 40(2):187–224
16. Ma Q, Lu AY (2011) Pharmacogenetics, pharmacogenomics, and individualized medicine. Pharmacol Rev 63(2):437–459
17. Gonzalez FJ et al (1988) Characterization of the common genetic defect in humans deficient in debrisoquine metabolism. Nature 331(6155):442–446
18. Blum M et al (1990) Human arylamine N-acetyltransferase genes - isolation, chromosomal localization, and functional expression. DNA Cell Biol 9(3):193–203
19. Krynetski EY et al (1995) A single-point mutation leading to loss of catalytic activity in human thiopurine S-methyltransferase. Proc Natl Acad Sci USA 92(4):949–953
20. Vesell ES (1989) Pharmacogenetic perspectives gained from twin and family studies. Pharmacol Ther 41(3):535–552
21. Shin J et al (2009) Pharmacogenetics: from discovery to patient care. Am J Health Syst Pharm 66(7):625–637
22. Roden DM et al (2011) Pharmacogenomics: the genetics of variable drug responses. Circulation 123(15):1661–1670
23. Evans WE, Relling MV (1999) Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286(5439):487–491
24. Belle DJ, Singh H (2008) Genetic factors in drug metabolism. Am Fam Physician 77(11):1553–1560
25. Gaedigk A et al (2008) The CYP2D6 activity score: translating genotype information into a qualitative measure of phenotype. Clin Pharmacol Ther 83(2):234–242
26. Desta Z et al (2002) Clinical significance of the cytochrome P450 2C19 genetic polymorphism. Clin Pharmacokinet 41(12):913–958
27. Mini E, Nobili S (2009) Pharmacogenetics: implementing personalized medicine. Clin Cases Miner Bone Metab 6(1):17–24
28. Venter JC et al (2001) The sequence of the human genome. Science 291(5507):1304–1351
29. Lander ES et al (2001) Initial sequencing and analysis of the human genome. Nature 409(6822):860–921
30. den Dunnen JT, Antonarakis SE (2001) Nomenclature for the description of human sequence variations. Hum Genet 109(1):121–124
31. Kacevska M et al (2011) Perspectives on epigenetics and its relevance to adverse drug reactions. Clin Pharmacol Ther 89(6):902–907
32. O’Donnell PH, Dolan ME (2009) Cancer pharmacoethnicity: ethnic differences in susceptibility to the effects of chemotherapy. Clin Cancer Res 15(15):4806–4814
33. Lee W et al (2005) Cancer pharmacogenomics: powerful tools in cancer chemotherapy and drug development. Oncologist 10(2):104–111
34. Judson R et al (2000) The predictive power of haplotypes in clinical response. Pharmacogenomics 1(1):15–26
35. Fujiwara Y, Minami H (2010) An overview of the recent progress in irinotecan pharmacogenetics. Pharmacogenomics 11(3):391–406
36. Becquemont L (2009) Pharmacogenomics of adverse drug reactions: practical applications and perspectives. Pharmacogenomics 10(6):961–969
37. Wu X et al (2009) Strategies to identify pharmacogenomic biomarkers: candidate gene, pathway-based, and genome-wide approaches. In: Innocenti F (ed) Genomics and pharmacogenomics in anticancer drug development and clinical response. Humana Press, Totowa NJ, pp 353–370
38. Feero WG et al (2010) Genomic medicine—an updated primer. N Engl J Med 362(21):2001–2011
39. Metzker ML (2010) Sequencing technologies—the next generation. Nat Rev Genet 11(1):31–46
40. Wilke RA et al (2007) Identifying genetic risk factors for serious adverse drug reactions: current progress and challenges. Nat Rev Drug Discov 6(11):904–916
41. Andrade RJ et al (2009) Drug-induced liver injury: insights from genetic studies. Pharmacogenomics 10(9):1467–1487
42. Huang YS (2010) Tailored drug therapy for mitigating drug-induced liver injury: is this the era of genetic screening? Pers Med 7(1):5–8
43. Wang L et al (2011) Genomics and drug response. N Engl J Med 364(12):1144–1153
44. Daly AK et al (2009) HLA-B*5701 genotype is a major determinant of drug-induced liver injury due to flucloxacillin. Nat Genet 41(7):816–819
45. Kindmark A et al (2008) Genome-wide pharmacogenetic investigation of a hepatic adverse event without clinical signs of immunopathology suggests an underlying immune pathogenesis. Pharmacogenomics J 8(3):186–195
46. Singer JB et al (2010) A genome-wide study identifies HLA alleles associated with lumiracoxib-related liver injury. Nat Genet 42(8):711–714
47. Tujios S, Fontana RJ (2011) Mechanisms of drug-induced liver injury: from bedside to bench. Nat Rev Gastroenterol Hepatol 8(4):202–211
48. Pirmohamed M (2010) Pharmacogenetics of idiosyncratic adverse drug reactions. Handb Exp Pharmacol 196:477–491
49. Becquemont L (2010) HLA: a pharmacogenomics success story. Pharmacogenomics 11(3):277–281
50. Mallal S et al (2008) HLA-B*5701 screening for hypersensitivity to abacavir. N Engl J Med 358(6):568–579
51. Hung SI et al (2006) Genetic susceptibility to carbamazepine-induced cutaneous adverse drug reactions. Pharmacogenet Genomics 16(4):297–306
52. TEGRETOL (Carbamazepine) Labeling [Online]. http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/016608s100s102 s.s., 018927s041s042,020234s031s033lbl.pdf. Accessed 31 Oct 2011
53. McCormack M et al (2011) HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans. N Engl J Med 364(12):1134–1143
54. Ozeki T et al (2011) Genome-wide association study identifies HLA-A*3101 allele as a genetic risk factor for carbamazepine-induced cutaneous adverse drug reactions in Japanese population. Hum Mol Genet 20(5):1034–1041
55. Somkrua R et al (2011) Association of HLA-B*5801 allele and Allopurinol-induced Stevens Johnson syndrome and toxic epidermal necrolysis: a systematic review and meta-analysis. BMC Med Genet 12(1):118
56. Niemi M (2010) Transporter pharmacogenetics and statin toxicity. Clin Pharmacol Ther 87(1):130–133
57. Link E et al (2008) SLCO1B1 variants and statin-induced myopathy—a genomewide study. N Engl J Med 359(8):789–799
58. Walko CM, McLeod H (2009) Pharmacogenomic progress in individualized dosing of key drugs for cancer patients. Nat Clin Pract Oncol 6(3):153–162
59. Innocenti F, Ratain MJ (2006) Pharmacogenetics of irinotecan: clinical perspectives on the utility of genotyping. Pharmacogenomics 7(8):1211–1221
60. Wilke RA, Dolan ME (2011) Genetics and variable drug response. JAMA 306(3):306–307
61. Carlquist JF, Anderson JL (2011) Pharmacogenetic mechanisms underlying unanticipated drug responses. Discov Med 11(60):469–478
62. Lovly CM, Carbone DP (2011) Lung cancer in 2010: one size does not fit all. Nat Rev Clin Oncol 8(2):68–70
63. Biankin AV, Hudson TJ (2011) Somatic variation and cancer: therapies lost in the mix. Hum Genet 130(1):79–91
64. Chapman PB et al (2011) Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N Engl J Med 364(26):2507–2516
65. Shaw AT et al (2011) Effect of crizotinib on overall survival in patients with advanced non-small-cell lung cancer harbouring ALK gene rearrangement: a retrospective analysis. Lancet Oncol 12(11):1004–1012
66. Hutchinson L (2010) Targeted therapies: activated PI3K/AKT confers resistance to trastuzumab but not lapatinib. Nat Rev Clin Oncol 7(8):424
67. Ellis LM, Hicklin DJ (2009) Resistance to targeted therapies: refining anticancer therapy in the era of molecular oncology. Clin Cancer Res 15(24):7471–7478
68. PLAVIX (Clopidogrel) Labeling [Online]. http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/020839s051lbl.pdf. Accessed 28 Aug
69. Mega JL et al (2009) Cytochrome p-450 polymorphisms and response to clopidogrel. N Engl J Med 360(4):354–362
70. Mega JL et al (2010) Reduced-function CYP2C19 genotype and risk of adverse clinical outcomes among patients treated with clopidogrel predominantly for PCI: a meta-analysis. JAMA 304(16):1821–1830
71. Scott SA et al (2011) Clinical pharmacogenetics implementation consortium guidelines for cytochrome P450-2C19 (CYP2C19) genotype and Clopidogrel therapy. Clin Pharmacol Ther 90(2):328–332
72. Daly AK, King BP (2003) Pharmacogenetics of oral anticoagulants. Pharmacogenetics 13(5):247–252
73. Kim MJ et al (2009) A regulatory science perspective on warfarin therapy: a pharmacogenetic opportunity. J Clin Pharmacol 49(2):138–146
74. Scordo MG et al (2002) Influence of CYP2C9 and CYP2C19 genetic polymorphisms on warfarin maintenance dose and metabolic clearance. Clin Pharmacol Ther 72(6):702–710
75. Sconce EA et al (2005) The impact of CYP2C9 and VKORC1 genetic polymorphism and patient characteristics upon warfarin dose requirements: proposal for a new dosing regimen. Blood 106(7):2329–2333
76. Marsh S et al (2006) Population variation in VKORC1 haplotype structure. J Thromb Haemost 4(2):473–474
77. Yuan HY et al (2005) A novel functional VKORC1 promoter polymorphism is associated with inter-individual and inter-ethnic differences in warfarin sensitivity. Hum Mol Genet 14(13):1745–1751
78. Gage BF, Lesko LJ (2008) Pharmacogenetics of warfarin: regulatory, scientific, and clinical issues. J Thromb Thrombolysis 25(1):45–51
79. Caraco Y et al (2008) CYP2C9 genotype-guided warfarin prescribing enhances the efficacy and safety of anticoagulation: a prospective randomized controlled study. Clin Pharmacol Ther 83(3):460–470
80. Gage BF et al (2008) Use of pharmacogenetic and clinical factors to predict the therapeutic dose of warfarin. Clin Pharmacol Ther 84(3):326–331
81. Lenzini P et al (2010) Integration of genetic, clinical, and INR data to refine warfarin dosing. Clin Pharmacol Ther 87(5):572–578
82. COUMADIN (Warfarin) prescribing information. [Online]. www.accessdata.fda.gov/drugsatfda_docs/label/2010/009218s108lbl.pdf. Accessed 28 Aug
83. Tetrabenazine Clinical Pharmacology Review. [Online]. http://www.accessdata.fda.gov/drugsatfda_docs/nda/2008/021894s000TOC.cfm. Accessed 28 Aug
84. XENAZINE (Tetrabenazine) Labeling [Online]. http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/021894s004lbl.pdf. Accessed 28 Aug
85. Stingl Kirchheiner JC, Brockmoller J (2011) Why, when, and how should pharmacogenetics be applied in clinical studies? Current and future approaches to study designs. Clin Pharmacol Ther 89(2):198–209
86. Zineh I, Pacanowski MA (2011) Pharmacogenomics in the assessment of therapeutic risks versus benefits: inside the United States Food and Drug Administration. Pharmacotherapy 31(8):729–735
87. Diamandis M et al (2010) Personalized medicine: marking a new epoch in cancer patient management. Mol Cancer Res 8(9):1175–1187
88. Houle D et al (2010) Phenomics: the next challenge. Nat Rev Genet 11(12):855–866
89. Pirmohamed M et al (2011) The phenotype standardization project: improving pharmacogenetic studies of serious adverse drug reactions. Clin Pharmacol Ther 89(6):784–785
90. Lanktree MB et al (2010) Phenomics: expanding the role of clinical evaluation in genomic studies. J Investig Med 58(5):700–706
91. Tracy RP (2008) ‘Deep phenotyping’: characterizing populations in the era of genomics and systems biology. Curr Opin Lipidol 19(2):151–157
92. Carson PE et al (1956) Enzymatic deficiency in primaquine-sensitive erythrocytes. Science 124(3220):484–485
93. Kalow W (1956) Familial incidence of low pseudocholinesterase level. Lancet 271:576–577
94. Harris HW et al (1958) Comparison of isoniazid concentrations in the blood of people of Japanese and European descent; therapeutic and genetic implications. Am Rev Tuberc 78(6):944–948
95. Evans DA et al (1960) Genetic control of isoniazid metabolism in man. Br Med J 2(5197):485–491
96. Mahgoub A et al (1977) Polymorphic hydroxylation of Debrisoquine in man. Lancet 2(8038):584–586
97. Eichelbaum M et al (1979) Defective N-oxidation of sparteine in man: a new pharmacogenetic defect. Eur J Clin Pharmacol 16(3):183–187
98. Weinshilboum RM, Sladek SL (1980) Mercaptopurine pharmacogenetics: monogenic inheritance of erythrocyte thiopurine methyltransferase activity. Am J Hum Genet 32(5):651–662
99. Ge D et al (2009) Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance. Nature 461(7262):399–401
100. Eichler HG et al (2011) Bridging the efficacy-effectiveness gap: a regulator’s perspective on addressing variability of drug response. Nat Rev Drug Discov 10(7):495–506
101. Bertilsson L et al (2002) Molecular genetics of CYP2D6: clinical relevance with focus on psychotropic drugs. Br J Clin Pharmacol 53(2):111–122
102. Janne PA et al (2009) Factors underlying sensitivity of cancers to small-molecule kinase inhibitors. Nat Rev Drug Discov 8(9):709–723
Part II

Techniques for Interrogating Variation in Human Genes and Genomes

Chapter 2

Denaturing High-Performance Liquid Chromatography for Mutation Detection and Genotyping

Donna Lee Fackenthal, Pei Xian Chen, Ted Howe, and Soma Das

Abstract
Denaturing high-performance liquid chromatography (DHPLC) is an accurate and efficient screening
technique used for detecting DNA sequence changes by heteroduplex analysis. It can also be used for
genotyping of single nucleotide polymorphisms (SNPs). The high sensitivity of DHPLC has made this
technique one of the most reliable approaches to mutation analysis, and it is therefore used in various
areas of genetics, in both research and clinical settings. This chapter describes the methods used for
mutation detection analysis and the genotyping of SNPs by DHPLC on the WAVE™ system from Transgenomic
Inc. (“WAVE” and “DNASep” are registered trademarks, and “Navigator” is a trademark, of Transgenomic,
used with permission. All other trademarks are property of the respective owners).

Key words Denaturing high-performance liquid chromatography (DHPLC), Genotyping, Mutation
detection, Single nucleotide polymorphism (SNP), Single-base extension, SURVEYOR® nuclease

1 Introduction

1.1 Mutation Detection by DHPLC

The basis of mutation detection by DHPLC is the formation and
discrimination of homoduplex and heteroduplex DNA molecules
that can be created when a DNA sequence change is present on
one allele [1]. The DHPLC cartridge (DNASep™) contains a nonporous
matrix consisting of polystyrene–divinylbenzene copolymer
beads. The beads are alkylated with C-18 chains which form single
C–C bonds, are electrostatically neutral, and do not interact with
nucleic acids [2]. DNA binds to the cartridge matrix through
triethylammonium acetate (TEAA), which serves as an ion-pairing
reagent between nucleic acids and the beads in the cartridge. The
positively charged triethylammonium ion binds to the negatively
charged phosphate groups on the DNA backbone, and the hydrophobic
groups of triethylammonium acetate interact with the
hydrophobic C-18 chains on the copolymer beads. DNA is eluted

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_2, © Springer Science+Business Media, LLC 2013

Fig. 1 Demonstration of the formation of heteroduplex and homoduplex molecules (schematic: wild-type and mutant fragments are denatured and cooled to yield heteroduplexes and homoduplexes)

from the cartridge by the use of an acetonitrile buffer, which at
increasing concentrations across the cartridge breaks the hydrophobic
interactions between the TEAA-bound DNA and the cartridge matrix.
Because heteroduplex molecules form fewer hydrophobic interactions
than homoduplex molecules, they elute from the cartridge earlier.
Generally coding exons and flanking intron sequences are tar-
geted for mutation detection. These regions are amplified with
specific primers to create amplicons of approximately 180–700 bp,
which is the optimal size for mutation detection by DHPLC. Larger
amplicons can also be used, but the sensitivity of the technique
decreases as amplicon size increases. To create
homoduplex and heteroduplex molecules, the PCR fragments
are denatured followed by gradual reannealing such that in the
presence of a heterozygous sequence change, “wild type” and
“mutant” sense fragments reanneal with both “wild type” and
“mutant” antisense fragments creating homoduplex and heterodu-
plex molecules (Fig. 1). Homoduplex and heteroduplex molecules
are separated on the DHPLC cartridge under partially denaturing
conditions (increased temperatures) that cause the heteroduplex
molecules to be significantly more denatured than the homoduplex
molecules allowing for their better separation. Homoduplex and
heteroduplex molecules bind with differing affinities to the car-
tridge and elute differently in the presence of an increasing acetoni-
trile gradient, with heteroduplex molecules eluting earlier. The
eluted DNA is detected with a UV lamp and a chromatogram is
generated electronically. A sample with no sequence change will
produce only homoduplex molecules while a sample with a sequence
change will produce both homoduplex and heteroduplex molecules
with the heteroduplex molecules showing up as an extra peak on
the chromatogram. PCR fragments with heteroduplex peaks can be
sequenced to determine the exact sequence change present.
The sensitivity of mutation detection by DHPLC is estimated to be
between 96 and 100 % and very closely matches the sensitivity of
direct sequencing [1, 3, 4]. For this reason, mutation detection by
DHPLC is now widely used in both research and clinical settings [5].
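
The denature-and-reanneal step described in this section can be illustrated with a short Python sketch. It is only a toy model, not part of the published protocol: the function name `reanneal` and the allele labels are hypothetical, and it assumes that every sense strand can pair with every antisense strand present in the PCR product.

```python
from itertools import product

def reanneal(alleles):
    """Pair every sense strand with every antisense strand after denaturation.

    `alleles` lists the alleles present in the PCR product, e.g.
    ["WT", "MUT"] for a heterozygote. Returns (sense, antisense, kind) tuples.
    """
    distinct = sorted(set(alleles))
    duplexes = []
    for sense, antisense in product(distinct, repeat=2):
        kind = "homoduplex" if sense == antisense else "heteroduplex"
        duplexes.append((sense, antisense, kind))
    return duplexes

heterozygote = reanneal(["WT", "MUT"])   # sample carrying a sequence change
homozygote = reanneal(["WT", "WT"])      # sample with no sequence change
```

Under these assumptions a heterozygous sample yields two homoduplex and two heteroduplex species, whereas a sample with no sequence change yields a single homoduplex species, consistent with the extra heteroduplex peak described for the chromatogram.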

1.2 Genotyping by DHPLC

The DHPLC instrument can also be used for genotyping of SNPs.
DNA fragments that differ at a single base pair position can be
distinguished on the DHPLC because different base pairs differ in
hydrophobicity, which can change the elution profile [6].
Genotyping applications on the DHPLC exploit this property; one that
has been used successfully is single-base extension (SBE) genotyping [7].
Single-base extension on the DHPLC (SBE-DHPLC) begins with a
polymerase chain reaction that amplifies the region containing the
single-base change to be genotyped, followed by an extension reaction
using an oligonucleotide that acts as a single-base extension primer.
The SBE primer anneals immediately adjacent to the SNP to be
genotyped, either upstream or downstream, so that its 3′ end abuts
the SNP position. Thermo Sequenase extends the 3′ end of the extension
primer with the appropriate ddNTP. The primer is extended by one base
only because the ddNTP terminates further extension. Extended products
are separated on the DHPLC based on the hydrophobicity of the last
base: although the extended products of different alleles have the
same length, their hydrophobicities differ.
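
As a rough illustration of the single-base extension logic, the sketch below maps each SNP allele to the ddNTP expected to extend the primer. The function names and the forward/reverse convention are assumptions made for this example; they are not taken from the chapter or from any instrument software.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sbe_terminator(allele, primer_strand="forward"):
    """Return the ddNTP expected to extend an SBE primer by one base.

    `allele` is the SNP base as written on the forward (sense) strand.
    A forward primer (sense sequence) is extended with the sense allele
    itself; a reverse primer, which anneals to the sense strand,
    incorporates the complement.
    """
    allele = allele.upper()
    if allele not in COMPLEMENT:
        raise ValueError(f"not a DNA base: {allele!r}")
    if primer_strand == "forward":
        base = allele
    elif primer_strand == "reverse":
        base = COMPLEMENT[allele]
    else:
        raise ValueError("primer_strand must be 'forward' or 'reverse'")
    return "dd" + base + "TP"

def sbe_products(alleles, primer_strand="forward"):
    """Map each SNP allele to the terminator that marks it."""
    return {a: sbe_terminator(a, primer_strand) for a in alleles}

c_t_snp = sbe_products(["C", "T"])
c_t_snp_reverse = sbe_products(["C", "T"], primer_strand="reverse")
```

For a hypothetical C/T SNP, the two alleles end in different terminal bases whichever primer orientation is chosen, which is the property the DHPLC separation relies on.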
Another variation of single-base extension genotyping on the
DHPLC is primer extension genotyping, in which a combination of
dNTPs and ddNTPs is added to the reaction so that, depending
on the allele present, either extension beyond the single base or
just single-base extension occurs [8]. Separation of the extended
products then becomes a function of the differing lengths of the
two extended products. This review will focus on the protocol for
single-base extension genotyping.
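
The length difference produced by primer extension genotyping can be sketched as follows. This is an illustrative model only: `extension_length`, the example sequences, and the choice of terminating bases are hypothetical, and the code assumes that every base not supplied as a ddNTP is supplied as a dNTP.

```python
def extension_length(bases_to_add, terminators):
    """Count the bases added up to and including the first terminator.

    `bases_to_add` is the sequence the polymerase would incorporate, in
    order, starting at the SNP. `terminators` is the set of bases supplied
    as ddNTPs; every other base is assumed to be supplied as a dNTP.
    """
    for added, base in enumerate(bases_to_add, start=1):
        if base in terminators:
            return added
    raise ValueError("no terminating base in the supplied sequence")

# Hypothetical C/T SNP followed by the bases T and G; ddCTP and ddGTP terminate.
length_c = extension_length("CTG", terminators={"C", "G"})
length_t = extension_length("TTG", terminators={"C", "G"})
```

In this example the C allele stops after one base while the T allele is extended by three, so the two products differ in length rather than only in their terminal base.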
The use of DHPLC for genotyping is not as widespread as its use
for mutation detection. However, the utility and effectiveness
of SBE-DHPLC for genotyping purposes have been clearly
demonstrated [7]. In our experience genotyping by SBE-DHPLC is
a very robust technique and has often worked where other methods
of genotyping have failed. It is a very useful methodology for
medium-scale genotyping projects of approximately 500–1,000 samples.

1.3 Surveyor Nuclease for Mutation Detection and Genotyping

Transgenomic SURVEYOR® Mutation Detection Kits use a new
mismatch-specific plant DNA endonuclease to scan for known and
unknown mutations and polymorphisms in heteroduplex DNA.
SURVEYOR Nuclease, the key component of the kit, is a member
of the CEL family of plant endonucleases that cleave DNA with
high specificity at sites of base substitution mismatch and other
distortions [9, 10]. These DNA endonucleases cut both strands of
a DNA heteroduplex on the 3′-side of the mismatch site [11, 12].
Insertion/deletion mismatches and all base-substitution mismatches
are recognized, but the efficiency of cleavage varies with the
sequence of the mismatch [9, 12]. DNA endonucleases from
celery have been used to detect accurately a variety of mutations
and polymorphisms in the human BRCA1 gene [10, 13]. Other
applications include high-throughput screening of induced point
mutations (TILLING) in Arabidopsis [14–16], Lotus [17], and
zebrafish [18], screening for SNPs in inbred rat strains [19], and
scanning of large regions of bacterial genomic DNA for mutations
and polymorphisms (GIRAFF) [11, 20]. SURVEYOR Nuclease
has been used to verify the presence of known mutations in a number
of genes in human peripheral blood DNA [21], to carry out screen-
ing for induced point mutations in barley [22] and to screen for
error-free clones generated from a plant cDNA library by PCR-based
cloning [23]. The SURVEYOR® Mutation Detection Kits for
WAVE® and WAVE® HS Systems have been designed to cleave DNA
fragments at mismatched sites for subsequent analysis by ion-pairing
reverse-phase HPLC using the WAVE and WAVE HS Systems.
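
Because the nuclease cuts a heteroduplex at the mismatch, the approximate sizes of the cleavage products follow from the amplicon length and the position of the sequence change. The sketch below is a simplification under stated assumptions: `cleavage_fragments` is a hypothetical helper, the example numbers are invented, and the few-nucleotide stagger between the cuts on the two strands is ignored.

```python
def cleavage_fragments(amplicon_length, mismatch_position):
    """Approximate fragment sizes (bp) after cleavage at one mismatch.

    `mismatch_position` is the 1-based position of the mismatched base.
    Simplifying assumption: the duplex is cut cleanly just after that
    base, so staggered ends of a few nucleotides are ignored.
    """
    if not 1 <= mismatch_position < amplicon_length:
        raise ValueError("mismatch must lie inside the amplicon")
    return mismatch_position, amplicon_length - mismatch_position

# Hypothetical 632 bp amplicon with a sequence change at position 415.
fragments = cleavage_fragments(amplicon_length=632, mismatch_position=415)
```

Two fragments whose sizes sum to the amplicon length, alongside residual uncut product, would be the pattern to look for when the digest is sized by HPLC.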

2 Materials

2.1 Instrumentation

1. The WAVE system from Transgenomic Inc., Omaha, NE, is the
most widely used system for DHPLC analysis. The methods
described in this chapter pertain specifically to the WAVE™
Nucleic Acid Fragment Analysis System 3500HT, although they are
applicable to other WAVE™ model types (see Note 1). The
3500HT system is a high-throughput system that allows for
analysis of hundreds of samples. It consists of six major
components (Fig. 2): degasser, in which the two ion-pairing buffers

Fig. 2 System flowpath of the WAVE nucleic acid fragment analysis system 3500HT (components shown: Buffers A, B, C, D; Degasser; Pump; Autosampler/Chiller with sample introduction; Oven with Cartridge; UV Detector; Waste Container; Computer)
(A and B) and two acetonitrile solutions (syringe wash and
solution D) begin their flow; a quaternary pump, which
controls the percentages of Buffer A, Buffer B, and Solution D
that flow through the system and uses filters to keep
contaminants out of the flowpath; autosampler and
chiller, which contain two 96-well plate holders and the injection
needle and valve; oven, which contains the inline filter used to
remove particles larger than 0.5 μm, a pre-heat coil, and the
separation cartridge (DNASep™); UV detector (deuterium
lamp), which measures the absorbance of DNA samples at
260 nm by refracting the light and splitting it into two beams; and a
computer, which plots the absorbance (y-axis) against time
(x-axis) and depicts the DNA as peaks on a chromatogram.
WAVE system options include a fragment collector and a
fluorescence detector (xenon lamp) fitted with a High-Sensitivity
Accessory (HSX). They are connected through the interface
module. The fragment collector is used to collect and
reanalyze separated fragments. The fluorescence detector and HSX
module are used for increasing sensitivity without the need to
use fluorescently tagged primers or probes.
The actual flow path of the samples (amplified, denatured
and reannealed PCR fragments) begins with the samples enter-
ing the autosampler (Fig. 2). Initially the injection needle
which is connected to the bottom of a glass syringe is washed
with the Syringe Wash Solution (buffer C) and then drops into
the vial/well at which time the syringe plunger goes down and
draws a vacuum that removes the sample from the vial/well.
The needle moves to the injection port where the sample is
injected into the sample loop. The sample is then carried by
buffers to the cartridge in the oven where the DNA fragments
are separated based on their structures. As the DNA is eluted
from the cartridge by the buffers, it passes through a flow cell
where the absorbance of light from the UV detector is measured
and plotted on a chromatogram with the aid of the computer
and associated software.
2. Thermal Cyclers with 96 wells and heated lids are used for
PCR reactions and denaturation/reannealing of samples.

2.2 Software

At the time of printing, Navigator™ Software Version 3.0 is the
operating software for the WAVE™ platform (Transgenomic, Inc.,
Omaha, NE).

2.3 Cartridge

DNASep™ HT Cartridge (Transgenomic, Inc., Omaha, NE) is used
for chromatography (see Note 2).

2.4 DHPLC Buffers/Solvents (See Note 3)

1. WAVE™ Optimized Buffer A or equivalent, 100 mM Triethylammonium Acetate, store at room temperature, stable up to 2 weeks opened, stable up to 18 months unopened.
2. WAVE™ Optimized Buffer B or equivalent, 100 mM Triethylammonium Acetate and 25 % Acetonitrile, store at room temperature, stable up to 2 weeks opened, stable up to 18 months unopened.
3. WAVE™ Optimized Syringe Wash Solution or Buffer C or equivalent, 4.5 % Acetonitrile, store at room temperature, stable up to 2 weeks opened, stable up to 18 months unopened.
4. WAVE™ Optimized Solution D or equivalent, 75 % Acetonitrile, store at room temperature, stable up to 1 month opened, stable up to 18 months unopened.
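
The opened-bottle stability limits listed above lend themselves to a simple date check. The following sketch is one possible way to track them; the names `OPENED_DAYS`, `discard_after`, and `is_usable` are hypothetical, the dates are invented, and "1 month" is taken as 30 days for illustration.

```python
from datetime import date, timedelta

# Opened shelf life in days, from the buffer list above.
# Assumption: "1 month" for Solution D is treated as 30 days.
OPENED_DAYS = {"A": 14, "B": 14, "C": 14, "D": 30}

def discard_after(buffer_name, opened_on):
    """Last day an opened bottle is within its stated stability window."""
    return opened_on + timedelta(days=OPENED_DAYS[buffer_name])

def is_usable(buffer_name, opened_on, today):
    """True while the opened bottle is still inside its stability window."""
    return today <= discard_after(buffer_name, opened_on)

opened = date(2024, 3, 1)  # hypothetical opening date
buffer_a_limit = discard_after("A", opened)
```

A check of this kind could be run before the daily instrument preparation so that buffers past their opened stability period are replaced rather than topped up.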

2.5 Single-Base Extension Buffers/Solvents

2.5.1 For PCR Purification

1. 1× Shrimp Alkaline Phosphatase Buffer, store at −20 °C.
2. 1 U E. coli Exonuclease I, store at −20 °C.
3. Deionized-distilled H2O (minimum 18 Mohms reading) or HPLC-grade H2O.

2.5.2 For Single-Base Extension Reaction

1. 1× Thermo Sequenase™ Concentrated Reaction Buffer, store at −20 °C.
2. Thermo Sequenase™ Enzyme Dilution Buffer, store at −20 °C.
3. 2.5 U Thermo Sequenase™ DNA Polymerase with Pyrophosphatase, store at −20 °C.
4. 250 μM each ddNTP: ddATP, ddCTP, ddGTP, ddTTP, store at −20 °C.
5. 1 μM each Extension Primer, store at −20 °C.

2.6 SURVEYOR® Nuclease

Transgenomic SURVEYOR® Plus Nuclease kits specifically
designed for the WAVE™ and WAVE™ HS systems are available in
25, 100, and 1,000 reaction kits, the components of which are
listed below; store at −20 °C.
1. 25-RXN Kit: 30 μL SURVEYOR Nuclease W, 30 μL SURVEYOR Nuclease Enhancer W2, 250 μL SURVEYOR Nuclease Enhancer Cofactor, 250 μL 0.15 M MgCl2 Solution, 250 μL SURVEYOR Nuclease Stop Solution, 10 μL SURVEYOR Control C, and 10 μL SURVEYOR Control G.
2. 100-RXN Kit: 120 μL SURVEYOR Nuclease W, 120 μL SURVEYOR Nuclease Enhancer W2, 1.0 mL SURVEYOR Nuclease Enhancer Cofactor, 1.0 mL 0.15 M MgCl2 Solution, 1.0 mL SURVEYOR Nuclease Stop Solution, 10 μL SURVEYOR Control C, and 10 μL SURVEYOR Control G.
3. 1000-RXN Kit: 1.0 mL SURVEYOR Nuclease W, 1.0 mL SURVEYOR Nuclease Enhancer W2, 3.0 mL SURVEYOR Nuclease Enhancer Cofactor, 3.0 mL 0.15 M MgCl2 Solution, 3.0 mL SURVEYOR Nuclease Stop Solution, 10 μL SURVEYOR Control C, and 10 μL SURVEYOR Control G.
3 Methods

3.1 Mutation Detection by Heteroduplex Analysis

3.1.1 PCR Amplicon Design

The following considerations should be taken into account when
choosing amplicons to be analyzed by DHPLC for mutation
detection:
1. The optimal size of the amplicons should be 180–700 bp.
2. The melting temperature range of the amplicon should be between 52 and 75 °C.
3. Ideally choose an amplicon with one melting domain as opposed to multiple melting domains. Figure 3a shows an amplicon with a single melting domain and Fig. 3b shows an amplicon with multiple melting domains. When the amplicon of interest has multiple melting domains, it may be necessary to break the fragment into smaller amplicons with one melting domain each or to add GC clamps (see Note 4) to the PCR primers to even out the melting domains within the amplicon, thereby obtaining one melting temperature.

Fig. 3 (a) Amplicon with single melting domain; (b) Amplicon with two melting domains (Domain 1 and Domain 2). Both panels plot helical fraction (0.00–1.00) against base position.
4. Amplicon melting profiles are sequence dependent. The GC content within an amplicon also determines the melting profile. The optimal GC content is 48–68 %.
5. Ideally, the Tm values of the PCR primers should differ by 2 °C or less.
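
Several of the design criteria above are simple numeric checks that can be automated before primers are ordered. The sketch below covers amplicon length, GC content, and the primer Tm difference only; the melting-temperature range and the number of melting domains require a melting-profile prediction that is not modelled here. The function names and test sequences are hypothetical, and the 2 degree limit is assumed to be in °C.

```python
def gc_percent(sequence):
    """GC content of a DNA sequence as a percentage."""
    seq = sequence.upper()
    return 100.0 * sum(seq.count(b) for b in "GC") / len(seq)

def check_amplicon(sequence, primer_tms):
    """Return a list of design warnings; an empty list means no rule was broken."""
    warnings = []
    if not 180 <= len(sequence) <= 700:
        warnings.append(f"length {len(sequence)} bp outside 180-700 bp")
    gc = gc_percent(sequence)
    if not 48 <= gc <= 68:
        warnings.append(f"GC content {gc:.1f} % outside 48-68 %")
    forward_tm, reverse_tm = primer_tms
    if abs(forward_tm - reverse_tm) > 2:
        warnings.append("primer Tm values differ by more than 2 degrees C")
    return warnings

# Artificial sequences used only to exercise the checks.
good = check_amplicon("GC" * 60 + "AT" * 50, primer_tms=(60.0, 61.5))
bad = check_amplicon("AT" * 50, primer_tms=(58.0, 62.0))
```

Returning a list of warnings rather than a single pass/fail flag makes it easier to see which criterion an amplicon misses and whether splitting the fragment or adding a GC clamp is the more sensible remedy.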

3.1.2 Preparation of PCR Samples for Mutation Detection

1. PCRs for subsequent DHPLC analysis are performed using regular touchdown PCR protocols and in 50 μL volumes to allow for sufficient volume injections for DHPLC analysis at various temperatures.
2. Negative control DNA samples should be included for every amplicon being analyzed and positive control DNA samples should be included when available (see Note 5). A negative control is a sample with no sequence change in the amplicon and a positive control is a sample with a known sequence change in the amplicon being analyzed. A blank (H2O) control should also be included to check for PCR contamination.
3. After the PCR, samples are denatured and allowed to reanneal gradually to create homoduplex and heteroduplex products. To do this, samples are briefly spun down, denatured, and slowly reannealed over 60 min with the following cycling profile: [95 °C for 5 min, ramp 95 °C → 45 °C over 60 min, 45 °C for 30 s, hold at 4 °C] (see Note 6).
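
The reannealing ramp in step 3 corresponds to an average cooling rate that can be worked out directly, and on a thermal cycler that cannot ramp continuously it can be approximated by small temperature steps. The sketch below is illustrative only: `ramp_rate`, `ramp_steps`, and the 1 °C step size are assumptions, not settings taken from the protocol.

```python
def ramp_rate(start_c, end_c, minutes):
    """Average rate of temperature change in degrees C per minute
    (negative values mean cooling)."""
    return (end_c - start_c) / minutes

def ramp_steps(start_c, end_c, minutes, step_c=1.0):
    """Split a linear ramp into equal temperature steps.

    Returns (temperature_c, hold_seconds) pairs, one per step.
    """
    n_steps = round(abs(end_c - start_c) / step_c)
    hold = minutes * 60 / n_steps
    direction = -1 if end_c < start_c else 1
    return [(start_c + direction * step_c * i, hold) for i in range(1, n_steps + 1)]

rate = ramp_rate(95, 45, 60)    # about -0.83 degrees C per minute
steps = ramp_steps(95, 45, 60)  # 50 steps of 1 degree C, 72 s each
```

With these assumptions the 95 °C to 45 °C ramp over 60 min works out to roughly 0.8 °C of cooling per minute, or fifty 1 °C steps held for 72 s each.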

3.1.3 Instrument and Cartridge Preparation

1. The cartridge for sample injections should be installed in the WAVE oven as per the manufacturer’s instructions, and the oven temperature should be set to 50 °C for sizing PCR products (see Note 7).
2. The volumes of Buffer A, Buffer B, Solution D, and the Syringe Wash Solution (Buffer C) should be checked to make sure that sufficient buffer exists for the number of injections to be performed (see Note 3). Check the waste receptacle and exchange it if it is almost full.
3. Wash the cartridge with 100 % Solution D at a flow rate of 1.5 mL/min for 10 min. This is performed by entering 100 for %D and 0 for %B and %C and changing the flow rate to 1.5 mL/min on the pump keypad.
4. Wash the syringe five times by pushing the WASH button on the instrument’s autosampler.
5. Purge the pumps by setting Buffer A, Buffer B, and Solution D to 33 % each, flipping the purge valve in the pump chamber to the “open” position, and pressing “purge.” Purge for 2 min. Press “purge” again and close the purge valve. Purging the pump helps to eliminate air bubbles.
6. Equilibrate the cartridge at 65 % Buffer A and 35 % Buffer B at a flow rate of 1.5 mL/min for 20 min. This is performed by entering 35 for %B and 0 for %C and %D and changing the flow rate to 1.5 mL/min on the pump keypad.
Note: Steps 2–6 should be performed once daily prior to running
samples on the instrument. This helps to keep the cartridge
and flow path clean and free from impurities. In addition, for
optimal instrument and cartridge performance, quality control
procedures should be performed at regular intervals (see Note 8).

3.1.4 Set-up of Project Defaults

Certain criteria are important to set up as default settings as
they pertain to all mutation detection runs. Once these settings are
created they need not be entered each time prior to each run.
1. On the Menu Bar, choose Setup then Project Defaults.
2. In the Equilibrate Cartridge area: check the Before first Injection
box and enter 3 min. This is necessary for equilibrating the
cartridge prior to the first injection. Check the After Temperature
Change box and enter 5 min. Again, it is necessary to equilibrate
the cartridge (see Note 9). Check the After Gradient Change
box and enter 5 min. This allows for a 5 min equilibration of
the cartridge in between changing of the buffer gradient.
3. In the Injection Ordering area: check the Run in Temperature
Order Ascending. This allows for samples to be injected in
ascending temperature order thereby minimizing the number
of times the oven needs to change temperature.
4. In the Clean Options area: check Normal Clean (see Note 10).
5. In the Injection area: Select Injection Type ALL. This injection
type gives better intensity. In the Default Injection Volume
enter 7 μL. In the Feed Volume, that is, the volume of syringe
wash solution injected into the flow path, enter 25 μL when
the Injection Type is ALL.
6. Disable Tray Change Request is optional. If this is checked,
the tray change prompt will not appear when a run is started.
This is especially useful when two trays are used for one run.

3.1.5 Creating a New Method

For every amplicon to be analyzed, a method needs to be created.
A method contains information or parameters used to run injections.
Once a method is created for a particular amplicon, it can be saved
and reused. There are three ways to create a method. A method can
be created while setting up specific injections on the Injection page
or on the DNA page or can be created independently. Guidelines
for creating a method independently are detailed below:
1. On the Menu Bar, select File -> New Method. Enter a method
name. It is helpful to choose a name that includes relevant infor-
mation such as the name of the gene, exon, and type of analysis.
2. Enter the Application Type (see Note 11).
3. Enter the number of Base Pairs of the amplicon.
4. Enter the appropriate Temperature for analysis (see Note 12).
5. The default Injection type is ALL.
6. The Clean type is set at Normal Clean as entered in the Project
Defaults.
7. The flow rate is the rate at which the buffers move through the
system in milliliters per minute. The application type will auto-
matically specify the flow rate.
8. The Percent B is automatically calculated based on the number
of base pairs that is entered.
9. The Slope is the amount the %B increases per minute. A slope
of 2 % increase in Buffer B per minute is the recommended
gradient for Mutation Detection [24, 25]. The Percent B
should be between the start and stop gradient as indicated on
the gradient table.
10. The gradient plot and gradient table are automatically updated
when certain parameters including the application type are
changed. The gradient plot displays the window of the gradi-
ent, that is, it shows the amount of buffer used along the
gradient. The horizontal blue line represents the percentage of
the buffer(s) indicated in the Display field. The blue vertical
line indicates where the fragment peak of interest is theoreti-
cally predicted to elute under denaturing conditions. The red
line (which appears with the Mutation Detection application
type only) is a guideline as to where the peak will elute under
non-denaturing conditions. The two solid black vertical lines
indicate the optimal elution window.
11. The estimated run time is automatically calculated and appears
above the gradient table. It should be noted that choosing
the application type, Rapid DNA, decreases the run time per
sample. The Rapid DNA application type is the one of choice
for the 3500HT system.
12. Time shift (optional)—The time shift is an adjustment in min-
utes that moves the elution of the fragment of interest either
earlier or later in the gradient. The value of the time shift can
be negative (earlier elution) or positive (later elution) with the
value between −10 and +10. The time shift actually offsets the
gradient by the formula: value × slope. For example, a slope of
2.5 %/min and a time shift value of +1.0 min decreases all
values for %B (not including clean-off) by 2.5 %. The lower
percentage of Buffer B results in peaks increasing in retention
time. A time shift is recommended if, for example:
(a) The peak of interest elutes too late, which would result
in an absence of the peak on the chromatogram: change the
time shift default to a negative value such as −1.0 min, which
results in earlier elution.
(b) The peak of interest elutes too early: change the value
to +1.0 min.
Essentially, a time shift offsets the whole gradient by a constant
percentage of Buffer B; the slope itself is unchanged.
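The time-shift arithmetic described above (offset = value × slope, subtracted from every %B value except the clean-off) can be sketched as follows. This is only an illustration of the stated formula, not the Navigator™ Software's actual implementation; the function name and the example start/stop values of 50 % and 59 % B are hypothetical.

```python
def apply_time_shift(gradient_pct_b, slope_pct_per_min, time_shift_min):
    """Offset each %B value of a gradient by time_shift_min * slope.

    A positive shift lowers %B (later elution); a negative shift raises it
    (earlier elution). Clean-off values should not be passed in.
    """
    if not -10.0 <= time_shift_min <= 10.0:
        raise ValueError("time shift must lie between -10 and +10 min")
    offset = time_shift_min * slope_pct_per_min
    return [round(b - offset, 2) for b in gradient_pct_b]


# Hypothetical start/stop %B of 50 and 59 with a 2.5 %/min slope.
print(apply_time_shift([50.0, 59.0], 2.5, +1.0))   # later elution
print(apply_time_shift([50.0, 59.0], 2.5, -1.0))   # earlier elution
```

With the text's example of a 2.5 %/min slope and a +1.0 min shift, both values drop by 2.5 % B.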

3.1.6 Create Sample Sheet (Injection Table)

A sample sheet is a table that specifies the injection order and the
method to use for a series of samples to be analyzed. For each
injection it lists the sample name, the sample location in the tray
(vial), and the method to be used (which links to information
such as application type, injection volume, oven
temperature, clean type, and flow rate). A sample sheet needs to be
set up for every set of samples to be analyzed, prior to each run.
1. The sample sheet should be set up following the Navigator™
Software Manual that lists detailed step-by-step instructions.
2. It is recommended that each sample be injected once for sizing
(see Note 7). This is performed using a Sizing application type,
that is, DS (double-stranded) Single Fragment (see Note 11).
It is recommended that each sample be injected three times for
mutation detection analysis, using the three different tempera-
tures calculated for optimal detection of sequence change for
the particular amplicon in question (see Note 12).
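As a planning aid, the injection list implied by these recommendations (one sizing injection plus three mutation detection injections per sample, run in ascending temperature order as set in the Project Defaults) can be sketched outside the instrument software. The dictionary layout, the function name, and the three example temperatures are assumptions for illustration; the actual sample sheet is built in the Navigator™ Software.

```python
def build_injections(vials, md_temps_c, sizing_temp_c=50.0):
    """Return a list of injections sorted by ascending oven temperature.

    vials: mapping of tray position -> sample name.
    md_temps_c: the mutation detection temperatures for this amplicon.
    """
    rows = []
    for vial, sample in vials.items():
        rows.append({"vial": vial, "sample": sample,
                     "application": "DS Single Fragment",
                     "oven_c": sizing_temp_c})
        for temp in md_temps_c:
            rows.append({"vial": vial, "sample": sample,
                         "application": "Mutation Detection",
                         "oven_c": temp})
    # sorted() is stable, so vial order is preserved within each temperature.
    return sorted(rows, key=lambda row: row["oven_c"])


# Hypothetical tray layout and temperatures.
sheet = build_injections({"A1": "normal control", "A2": "sample 1"},
                         [58.0, 59.5, 61.0])
for row in sheet:
    print(row["oven_c"], row["vial"], row["application"])
```

Grouping by temperature in this way means the oven changes temperature only once per distinct temperature.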

3.1.7 Running Samples

1. After all the daily maintenance has been performed
(Subheading 3.1.3, steps 2–6) and the sample sheet created,
run the samples by highlighting specific injections then press-
ing the run injection button indicated by the green triangle or
simply pressing the run injection button when all injections in
the sample sheet are to be run.
2. The first 3 min of the run is an equilibration. The equilibration
line should be flat at 0 mV. A slight deviation in the line is nor-
mal. If the line is not flat at 0 mV the run must be discontinued
and the cartridge equilibrated for an additional 10 min.

3.1.8 Analyzing Results

1. As previously mentioned, for all amplicons being analyzed for
mutation detection, a normal control (a sample with no
sequence change in the amplicon) should be included, and a
positive control (a sample with a known sequence change in
the amplicon), if available, should be included.
2. Compare the chromatogram of the normal control with the
experimental samples for analysis. An absence of a change in
the chromatogram between the experimental sample and the
normal control indicates no sequence change present in the
amplicon of the experimental sample. If a sequence change is
present in the amplicon of the experimental sample this will be
depicted as an additional peak in the chromatogram as
compared to the normal control. The first peak to come off the
cartridge represents the heteroduplex product and the second
peak that elutes later is the homoduplex product (Fig. 4)
(see Note 13).
3. Sequence the amplicons of those samples where a change in
the chromatogram is observed (see Note 14).

Fig. 4 Normal and experimental samples indicating homoduplex and heteroduplex peaks
(peak labels: Heteroduplex Peak, Homoduplex Peak; traces: Experimental Sample, Normal Control Sample)

3.2 Single-Base Extension

3.2.1 Preparation of PCR Samples for Single-Base Extension

1. PCR amplification of the region containing the SNP to be
genotyped is performed using regular PCR conditions in a
15 μL volume, with the following exceptions:
(a) Primer concentrations are decreased to 125 nM each as
excess primer can interfere with the subsequent extension
reaction by causing extension to occur from the PCR prim-
ers as opposed to the extension primer. The concentration
may be doubled if multiplex reactions are performed or if
the amplicon size is larger than usual.
(b) dNTP concentrations are decreased to 50 μM each as
excess can result in extension beyond the single-base in the
subsequent extension reaction. The concentration may be
doubled if multiplex reactions are performed or if the
amplicon size is larger than usual.
2. The PCR cycling conditions begin with an initial denaturation
step at 95 °C for 15 min followed by 40 cycles of the following
profile: [95 °C for 15 s, TA for 15 s (the annealing temperature,
TA, depends on the primer Tm), 72 °C for 30–60 s], then a final
extension at 72 °C for 10 min and a hold at 4 °C.
3. Check quality and size of PCR products by running 3 μL on a
1.5 % agarose gel (see Note 15). Also include positive and
negative controls. Positive controls are samples with known
genotypes (see Note 16). The negative control contains no
DNA and therefore should not yield a PCR product. If sizes
are correct and yield is adequate, proceed with the purification
reaction.
4. The following should be noted with regard to the PCR:
(a) The optimal size of the amplicons should be 180–700 bp.
(b) As with all PCRs, when designing primers, avoid 3′ end
dimers, 3′ hairpin loops, and false priming. Primers can be
designed using primer analysis software such as Oligo
(Version 6.0). Primers can be checked for specificity and to
make sure that they do not contain polymorphic sites by
performing appropriate BLAST (www.ncbi.nlm.nih.gov)
and BLAT (www.genome.ucsc.edu) searches.
(c) Multiple SNPs can be genotyped simultaneously by per-
forming multiplex reactions (see Note 17).
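The reduced primer and dNTP concentrations given in step 1 translate into pipetting volumes through the usual dilution relation C1·V1 = C2·V2. The sketch below assumes 10 μM primer stocks and 10 mM dNTP stocks; these stock concentrations and both function names are assumptions for illustration and are not specified in the protocol.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_ul=15.0):
    """Volume of stock (µL) needed for final_conc in reaction_ul.

    Both concentrations must be in the same unit (C1 * V1 = C2 * V2).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * reaction_ul / stock_conc


def sbe_pcr_volumes(multiplex=False, primer_stock_um=10.0,
                    dntp_stock_um=10_000.0):
    """Per-reaction volumes for 125 nM primers and 50 µM dNTPs (15 µL PCR).

    Concentrations are doubled for multiplex reactions or larger amplicons,
    as the protocol allows. Stock concentrations are assumed values.
    """
    factor = 2.0 if multiplex else 1.0
    return {
        "each primer (uL)": stock_volume_ul(0.125 * factor, primer_stock_um),
        "each dNTP (uL)": stock_volume_ul(50.0 * factor, dntp_stock_um),
    }


print(sbe_pcr_volumes())
print(sbe_pcr_volumes(multiplex=True))
```

Because the per-reaction volumes are well below 1 μL, in practice these reagents would be added through a master mix rather than pipetted individually.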

3.2.2 PCR Purification for Single-Base Extension

1. Purification reactions are performed in a 20 μL volume; 10 μL
of PCR product is used for each reaction. Prepare a master
mix that consists of the following reagents:
1 U of Shrimp Alkaline Phosphatase (SAP) which removes
excess dNTPs from the PCR reaction.
1 U of Exonuclease I to remove excess primers (see Note 18).
1× SAP Buffer.
Aliquot 10 μL of master mix to 10 μL of PCR product for
each reaction (see Note 19).
2. Reactions are incubated at 37 °C for 45 min followed by
inactivation of the enzymes at 95 °C for 15 min. Samples can
be held at 4 °C after that.
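A master-mix calculation for this purification step might look like the sketch below. It assumes enzyme stocks of 1 U/μL (SAP) and 10 U/μL (Exonuclease I), a 10× SAP buffer, that the 1× buffer concentration refers to the final 20 μL reaction, and a 10 % pipetting overage; none of these values comes from the protocol, so substitute those of the reagents actually used.

```python
def purification_master_mix(n_reactions, sap_u_per_ul=1.0, exo_u_per_ul=10.0,
                            buffer_x=10.0, overage=0.10):
    """Return total volumes (µL) of each master-mix component.

    Each reaction receives 10 µL of master mix containing 1 U SAP,
    1 U Exonuclease I and enough buffer for 1x in the final 20 µL.
    """
    per_reaction = {
        "SAP": 1.0 / sap_u_per_ul,
        "Exonuclease I": 1.0 / exo_u_per_ul,
        "SAP buffer": 20.0 / buffer_x,
    }
    water = 10.0 - sum(per_reaction.values())
    if water < 0:
        raise ValueError("components exceed the 10 µL master-mix aliquot")
    per_reaction["water"] = water
    scale = n_reactions * (1.0 + overage)
    return {name: round(vol * scale, 2) for name, vol in per_reaction.items()}


for name, volume in purification_master_mix(10).items():
    print(f"{name}: {volume} µL")
```

The water volume is simply whatever brings each aliquot to 10 μL, so the function raises an error if dilute enzyme stocks would overfill the aliquot.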

3.2.3 Single-Base Extension Reaction

1. Single-base extension reactions are performed in a 10 μL volume.
Prepare a master mix that consists of the following reagents:
1× Thermo Sequenase™ Concentrated Reaction Buffer.
250 μM of each ddNTP.
1 μM extension primer (see Note 20).
1.25 U Thermo Sequenase (see Note 21).
Aliquot 4 μL of master mix to 6 μL of purified PCR product
for each reaction.
When performing multiplex SBE reactions (see Note 22),
add the additional extension primer(s) to the master mix (also
1 μM concentration) and increase the aliquot of master mix by
0.5 μL for each additional extension primer added. The volume
of purified PCR product should be decreased by 0.5 μL (for
each additional primer added) as well.
2. The cycling conditions begin with an initial denaturation step
at 96 °C for 2 min followed by 60 cycles of the following
profile: [96 °C for 30 s, 55 °C for 30 s, 60 °C for 30 s]. Hold
at 4 °C.
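The volume adjustment for multiplex SBE reactions described in step 1 keeps the total at 10 μL: each additional extension primer adds 0.5 μL to the master-mix aliquot and removes 0.5 μL of purified PCR product. A minimal sketch of that bookkeeping follows; the function name is hypothetical.

```python
def sbe_volumes(n_extension_primers):
    """Return (master mix µL, purified PCR product µL) per 10 µL reaction."""
    if n_extension_primers < 1:
        raise ValueError("at least one extension primer is required")
    extra = n_extension_primers - 1
    master_mix_ul = 4.0 + 0.5 * extra
    pcr_product_ul = 6.0 - 0.5 * extra
    if pcr_product_ul <= 0:
        raise ValueError("too many primers for a 10 µL reaction")
    return master_mix_ul, pcr_product_ul


for plex in (1, 2, 3):
    mm, pcr = sbe_volumes(plex)
    print(f"{plex}-plex: {mm} µL master mix + {pcr} µL PCR product")
```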
3.2.4 Denaturing Samples for Single-Base Extension

1. Denature samples at 96 °C for 4 min followed by a 4 °C hold
before running the samples on the DHPLC instrument.
2. In instances where single base extension reactions are pooled
prior to running on the DHPLC, a minimum of 8 μL of
each individual reaction is combined prior to denaturation
(see Note 23).

3.2.5 Instrument and Cartridge Preparation

1. The cartridge for sample injections should be installed in the
WAVE oven as per manufacturer’s instructions and the oven
temperature should be set to 70 °C to keep extension products
denatured.
2. The volumes of Buffer A, Buffer B, Solution D and the Syringe
Wash Solution (Buffer C) should be checked to make sure that
sufficient buffer exists for the number of injections to be
performed (see Note 3). Check the waste receptacle and exchange
it if it is almost full.
3. Wash the cartridge with 100 % Solution D at flow rate 1.5 mL/
min for 10 min.
This is performed by entering 100 for %D and 0 for %B
and %C and changing the flow rate to 1.5 mL/min on the
pump keypad.
4. Wash the syringe five times by pushing the WASH button on
the instrument’s autosampler.
5. Purge the pumps by setting Buffer A, Buffer B, and Solution D
to 33 % each, flip purge valve in the pump chamber to the
“open” position, press “purge.” Purge for 2 min. Press “purge”
again and close the purge valve. Purging the pump helps to
eliminate air bubbles.
6. Equilibrate the cartridge at 65 % Buffer A and 35 % Buffer B at
flow 1.5 mL/min for 20 min. This is performed by entering
35 for %B and 0 for %C and %D and changing the flow rate to
1.5 mL/min on the pump keypad.
Note: Steps 2–6 should be performed once daily prior to running
samples on the instrument. This helps to keep the cartridge
and flow path clean and free from impurities. In addition, for
optimal instrument and cartridge performance, quality control
procedures should be performed at regular intervals (see Note 8).

3.2.6 Set-up of Project Defaults

Certain criteria are important to set up as default settings as they
pertain to all single-base extension runs. Once these settings are
created they need not be entered each time prior to each run.
1. On the Menu Bar, choose Setup then Project Defaults.
2. In the Equilibrate Cartridge area: Check the Before first
Injection box and enter 3 min. This is necessary for equilibrating
the cartridge prior to the first injection.
3. In the Injection Ordering area: Select Run in Injection Order.

4. In the Clean Options area: Select Normal Clean. Do not
choose Fast or Active clean (see Note 24).
5. In the Injection area: Select Injection Type ALL. This injection
type gives better intensity. In the Default Injection Volume
enter 8 μL for both single and multiplex reactions. For pooled
samples, enter 16 μL. In the Feed Volume, that is, the volume
of syringe wash solution injected into the flow path, enter
25 μL when the Injection Type is ALL.
6. Disable Tray Change Request is optional. If this is checked, the
tray change prompt will not appear when a run is started. This
is especially useful when two trays are used for one run.

3.2.7 Creating a New Method

The Mutation Detection application type is used as a template to
manually create a new method for SBE-DHPLC runs.
1. On the Menu Bar, select File -> New Method. Enter a method
name. It is helpful to choose a name which includes relevant
information such as the name of the gene, targeted SNP, type
of genotyping assay, etc.
2. In the opened “Method” window, edit method parameters as
below:
(a) Select Mutation Detection as Application Type if it is not
shown.
(b) Enter 1.5 mL/min for Flow Rate when using a HT
cartridge.
(c) Enter 70 °C for Oven Temperature.
(d) Select Normal clean as Clean Type as indicated in Project
Defaults.
(e) For the number of base pairs, enter the length of the
extension primer, or of the shortest primer if multiplex SBE
is applied. The Navigator™ Software will calculate the start
gradient. This may need to be adjusted if the peak elutes too
early or too late within the run time. The start gradient can be
adjusted by performing a time shift (see step 3 below). It can
also be adjusted by noting the percentage B at which the
unextended primer elutes in the Navigator™-calculated
gradient and setting the start gradient accordingly (usually
1 % below that percentage B).
(f) Manually change the default settings for the following vari-
ables that determine the gradient range and duration:
Parameter                 Value
Slope (%B per min)        5.0 %
Drop for loading          5.0 %
Loading duration          0.3 min
Gradient duration         2.0 or 2.5 min (see Note 25)
Clean duration            0.5 min
Equilibration duration    0.9 min

(g) Before clicking Save, carefully check the values of all
parameters: entering or modifying one parameter can cause
the software to change one or more of the others. If this
happens, the original values need to be re-entered.
3. Time shift (optional)—The time shift is an adjustment in min-
utes that moves the elution of the fragment of interest either
earlier or later in the gradient. The value of the time shift can be
negative (earlier elution) or positive (later elution). The time
shift actually offsets the gradient by the formula: value × slope.
For example, a slope of 2.5 %/min and a time shift value of
+1.0 min decreases all values for %B (not including clean-off)
by 2.5 %. The lower percentage of Buffer B results in peaks
increasing in retention time. A time shift is recommended if, for
example:
1. The peak of interest elutes too late, which would result
in an absence of the peak on the chromatogram: change the
time shift default to a negative value such as −0.5 min, which
results in earlier elution.
2. The peak of interest elutes too early: change the value
to +0.5 min.
Essentially, a time shift offsets the whole gradient by a constant
percentage of Buffer B; the slope itself is unchanged.
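The gradient parameters entered in step 2(f) determine the %B range and the length of each injection. The sketch below assumes that the loading %B equals the start gradient minus the drop for loading, that the stop %B equals start + slope × gradient duration, and that the run time is the sum of the four durations; the software may compute these differently, and the start value of 20 % B is hypothetical.

```python
def sbe_gradient(start_pct_b, slope=5.0, drop=5.0, loading_min=0.3,
                 gradient_min=2.0, clean_min=0.5, equilibration_min=0.9):
    """Summarize an SBE-DHPLC gradient from the step 2(f) parameters."""
    return {
        "loading_pct_b": start_pct_b - drop,
        "start_pct_b": start_pct_b,
        "stop_pct_b": start_pct_b + slope * gradient_min,
        "run_time_min": round(loading_min + gradient_min
                              + clean_min + equilibration_min, 2),
    }


print(sbe_gradient(20.0))                    # 2.0 min gradient
print(sbe_gradient(20.0, gradient_min=2.5))  # 2.5 min gradient
```

Under these assumptions a 2.0 min gradient spans 10 % B and a 2.5 min gradient spans 12.5 % B, with injection-to-injection times of roughly 3.7 and 4.2 min.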

3.2.8 Create Sample Sheet (Injection Table)

A sample sheet is a table that specifies the injection order and the
method to use for a series of samples to be analyzed. For each
injection it lists the sample name, the sample location in the tray
(vial), and the method to be used (which links to information such
as application type, injection volume, oven
temperature, clean type and flow rate). A sample sheet needs to be
set up for every set of samples to be analyzed, prior to each run.
The sample sheet should be set up following the Navigator™
Software Manual that lists detailed step-by-step instructions.

3.2.9 Running Samples

The primers should be initially injected individually to check the
elution time. This is critical when performing multiplex SBE as the
peaks need to be separated by at least 30 s.
1. After all the daily maintenance has been performed
(Subheading 3.2.5, steps 2–6) and the sample sheet created,
run the samples by highlighting specific injections then pressing
the run injection button indicated by the green triangle or
simply pressing the run injection button when all injections in
the sample sheet are to be run.
2. The first 3 min of the run is an equilibration. Watch to make sure
that the equilibration line is flat at 0 mV. A slight deviation in the
line is normal. If the line is not flat at 0 mV the run must be dis-
continued and the cartridge equilibrated for an additional 10 min.

3.2.10 Analyzing Single-Base Extension Results

1. On the Injection Page, select the appropriate Tray Name for
the run which is to be analyzed and click on the Results tab.
Two charts, each with an x-axis and a y-axis, will be displayed.
The x-axis indicates time in minutes and the y-axis represents the
absorbance. In the Injection Table, under Chart 1 (for graph 1),
highlight the blank control as well as the known genotyped control
samples. The results of the experimental samples will be compared to
these controls and the genotypes determined. Under Chart 2,
highlight each sample individually to read the genotype.
2. Based on the increasing hydrophobicity of the four bases on the
extension products, the elution order is C<G<A<T, i.e., the C
extension product elutes first and the T extension product
elutes last [7]. Figure 5 displays the extension products for a
single reaction whereas Figs. 6 and 7 show extension products
for duplex and triplex reactions, respectively. It should be noted
that the elution order of T and A may sometimes be reversed
(see Note 26).
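Once the extension-product peaks of a sample have been assigned to bases, the genotype call itself is mechanical. The sketch below orders the extended bases by the nominal elution order C<G<A<T and complements them when a reverse extension primer was used (as in Fig. 7). The function name and output format are hypothetical, and because the A/T order may occasionally be reversed, peak identities should still be confirmed against the known-genotype controls.

```python
ELUTION_ORDER = "CGAT"  # nominal order: C elutes first, T last
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_genotype(extended_bases, reverse_primer=False):
    """Return a genotype string such as 'G/A' from the extended bases."""
    bases = sorted(set(b.upper() for b in extended_bases),
                   key=ELUTION_ORDER.index)
    if not 1 <= len(bases) <= 2:
        raise ValueError("expected one or two extension products")
    if reverse_primer:
        # Report the genotype on the forward strand.
        bases = [COMPLEMENT[b] for b in bases]
    if len(bases) == 1:
        bases = bases * 2  # homozygous
    return "/".join(bases)


print(call_genotype(["A", "G"]))                       # heterozygote
print(call_genotype(["G"]))                            # homozygote
print(call_genotype(["C", "A"], reverse_primer=True))  # reverse primer
```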

Fig. 5 Demonstration of single base extension products in three samples and a blank control sample. In this
example the extension products are G and A nucleotides. The genotype of each sample is determined based
on the extension products obtained. u.p. unextended primer
Fig. 6 Demonstration of single base extension products in a duplex reaction where two SNP regions are being
genotyped simultaneously. The extension products are G, T and C, T respectively. The genotype of each sample
is determined based on the extension products obtained. This example demonstrates the results in four experimental
samples and one blank control sample. u.p. unextended primer

Fig. 7 Demonstration of single base extension products in a triplex reaction where three SNP regions are being
genotyped simultaneously. The extension products for all three SNPs are C and A. The genotype of each sample
is determined based on the extension products obtained. This example demonstrates the results in two
experimental samples and one blank control sample. u.p. unextended primer. Note: for the first SNP, a reverse
extension primer has been used, necessitating reverse complementing of the extended bases for the genotype call.
For the second and third SNPs, forward extension primers have been used

3.3 Mutation Detection and Genotyping Using SURVEYOR® Nuclease

1. This section provides general instructions for the detection of
mutations using the SURVEYOR® PLUS Mutation Detection
Kit for WAVE and WAVE HS Systems. In general, processing
of samples should be carried out from start to finish as
described in the Transgenomic User Guide. If processing of a
sample is stopped before completion of all steps, the DNA
should be stored at −20 °C until the next step is carried out.
However, exposure of any frozen sample to repeated freeze-thaw
cycles should be avoided, as should storage at −20 °C of PCR
amplified DNA or SURVEYOR® Nuclease digestion products for
extended periods (>1 week). Several important factors in the
process can have an adverse effect on the quality of results
(see Note 27).

Fig. 8 Steps involved in mutation detection using SURVEYOR nuclease
(Mutation Detection in Four Easy Steps: amplify mutant and reference DNA; hybridize DNA to form hetero- and homoduplexes; add SURVEYOR Nuclease; direct fragment analysis)
2. Mutation detection and confirmation with SURVEYOR®
Nuclease involves four steps that are depicted in Fig. 8 and
described below:
Step 1: Prepare PCR amplicons from mutant (test) and wild-type
(reference) DNA.
Step 2: Mix equal amounts of test and reference DNA;
hybridize them by heating and cooling the mixture to
form hetero- and homoduplexes.
Step 3: Treat the annealed heteroduplex/homoduplex mixture
with SURVEYOR® Nuclease. The reference DNA alone,
treated similarly, serves as a background control.
Step 4: Analyze the DNA fragments with the WAVE™ or
WAVE™ HS System. The formation of new cleavage
products, due to the presence of one or more mismatches,
is indicated by the presence of additional peaks when
compared to the homoduplex. An example is given in
Fig. 9, where wild-type (HMD) and mutant (HTD)
amplicons were mixed in 1:64 and 1:128 ratios and their
digestion products were detected by fluorescence. The
retention times of the cleavage products indicate the sizes
of the fragments and therefore the location of the
mismatch or mismatches.
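The reasoning in Step 4, that fragment size indicates mismatch location, can be illustrated with a small calculation. The sketch assumes a single cleavage at the mismatch, so that the two fragments sum to the amplicon length; since a fragment size alone does not reveal which end it came from, two candidate positions result. The function name and the 500 bp/180 bp example are hypothetical.

```python
def candidate_mismatch_positions(amplicon_bp, fragment_bp):
    """Return possible mismatch positions (bp from the 5' end of the amplicon).

    Assumes one cleavage site, so the fragments are fragment_bp and
    amplicon_bp - fragment_bp long.
    """
    if not 0 < fragment_bp < amplicon_bp:
        raise ValueError("fragment must be shorter than the amplicon")
    return sorted({fragment_bp, amplicon_bp - fragment_bp})


# A 180 bp cleavage product from a 500 bp amplicon.
print(candidate_mismatch_positions(500, 180))
```

Sequencing of the amplicon remains necessary to establish the exact position and nature of the variant.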

Fig. 9 Heteroduplex/homoduplex digestion peaks shown overlaid on the homoduplex digestion profile.
Fluorescent signal produced by SURVEYOR® Nuclease digestion of heteroduplex DNA present in 1:64 and
1:128 ratio of homoduplex DNA. Products were analyzed using the WAVE™ System run under non-denaturing
conditions at 50 °C and equipped with a Fluorescence Detector and a High-Sensitivity Accessory for
post-column DNA intercalation with fluorescent dye

4 Notes

1. There are four WAVE™ system models. The 3500 is the base
model without high throughput capacity. It uses the DNASep™
Cartridge, has a larger mixer than the 3500HT and does not
have an internal accelerator. The smaller volume of the mixer
on the 3500HT system allows for an increased flow rate. The
3500A is identical to the 3500 with the addition of the internal
accelerator. The 4500HT model is a high-throughput system
that employs the DNASep™ HT Cartridge, although it can
easily accommodate the DNASep™ Cartridge.
2. The DNASep™HT Cartridge has a larger diameter to accom-
modate an increased flow rate. The 3500HT/4500HT systems
are the only models that can use the DNASep™HT Cartridge.
Another analytical cartridge available for DHPLC mutation
detection analysis is the DNASep™ Cartridge, which runs at
lower throughput than the DNASep™HT Cartridge. With
proper care and maintenance, the DNASep™ and DNASep™HT
Cartridges should last a minimum of 6,000 injections.
3. Only ASTM (American Society for Testing and Materials) Type 1
water of 18 MOhm purity with <5 ppb Total Organic Content
is recommended if buffers are prepared in the laboratory. DO
NOT use autoclaved water (for PCR reactions as well) because
there may be metal ions and/or organic contaminants present.
Buffers are flammable and are irritants; proper laboratory safety
must be exercised when handling these reagents. Generally, a
buffer volume of approximately 1.25 L is sufficient for
about 250 injections, but this depends on the duration of
each injection, the type of method, etc.
4. Name and Sequence of GC clamps [2]
The following clamps can be added to PCR primers for
amplicon design.

Name           Sequence
3′ End Up      cgggacgc
5′ End Up      gcgtcccg
25 bp GCT      cgcccgccgccgcccgccgcccgtc
30 bp GCT      cgcccgccgccgcccgccgcccgtcccgcc
40 bp GCT      cgcccgccgcgccccgcgcccgtcccgccgcccccgcccg
10 bp GC       cgcccgccgc
15 bp GC       cgcccgccgccgccc
20 bp GC       cgcccgccgccgcccgccgc
3′ GC 10 bp    cgggcggggg
3′ GC 20 bp    cgggcgggggcggcgggccg
5′ GC 10 bp    gcccccgccg
5′ GC 20 bp    gcggcccgccgcccccgccg

If 5′ or 3′ is not specified, the clamp can be used in forward
or reverse direction.
5. A negative control (a DNA fragment with no sequence change)
serves as the reference against which the profiles of the
experimental samples are compared. A change in peak appearance
or number in the experimental samples indicates a potential
variant. This must be further analyzed by double-stranded
sequencing to determine the exact nature of the sequence variant.
6. PCRs can be performed and stored at 4 °C in advance; however,
it is essential to denature and reanneal the samples just prior to
loading samples on the DHPLC instrument.
7. At 50 °C the PCR products are double-stranded and should be
sized. All samples, with or without mutations/sequence
changes, amplified for a particular region, should have identical
elution profiles at 50 °C. Sizing is performed to check for the
amount of PCR product and its purity prior to running at the
optimal mutation detection temperatures. The PCR product
should show up on the chromatogram as a single sharp peak of
the expected size. The sizing step on the DHPLC negates the
need to run PCR products on agarose gels to check for the
presence of amplification products.
8. One of the greatest factors to ensure optimal functioning of the
DHPLC instrument is the performance of regular maintenance
and quality control procedures. These are detailed below:
(a) The following standards should be run weekly or
approximately every 500 injections: WAVE DNA Sizing
Standard (pUC18 HaeIII digest, 50 °C), WAVE Low-Range
Mutation Control Standard (56 °C), and WAVE High-Range
Mutation Control Standard (70 °C). Reference chromatogram
peak profiles are supplied with the reagents. The
DNA Sizing Standard is used for size-based DNA fragment
separations. Nine resolved fragments should be detected
with the following sizes in base pairs: 80, 102, 174, 257,
267, 298, 434, 458, 587. Both the 56 °C and 70 °C
Mutation Standards form two heteroduplexes and two
homoduplexes which assess temperature, elution gradient,
and buffer composition. The peak heights should be even
and symmetrical and the valley between the two peaks
should be deep. Symmetrical peaks with a resolution of
approximately 50 % baseline indicate good resolution and
an oven in calibration. Sometimes uneven peak heights or
uneven or high valley depths indicate that the oven may
need to be re-calibrated or there is loss of resolution.
Symmetrical but poorly resolved peaks indicate loss of
resolution, usually due to contamination.
If standards deviate from the optimal profiles, it is
possible that the cartridge requires a reverse hot wash
with Solution D, as with increasing numbers of injections
the cartridge will gradually lose resolution (however, with
proper maintenance cartridges should reach over 6,000
injections). A reverse hot wash removes contaminants off
of the front side of the cartridge beads. Because the flow
direction is reversed, the buildup of contaminants is now
washed off the backside of the cartridge. Another possible
cause of decreased cartridge performance is that reagents
have degraded and need to be changed.
(b) A reverse hot wash should be performed weekly or after
every 1,000 injections to clean contaminants off the front
end of the cartridge. This is performed by physically revers-
ing the cartridge in the oven and washing the cartridge with
100 % Solution D at 80 °C for 30 min followed by equili-
brating at 50 % A and 50 % B at 50 °C for 1 h. Sizing and
mutation standards should be run following the hot wash
to check the quality of resolution.
(c) The inline filter should be changed after every 1,000 injec-
tions. The inline filter serves to block potential contami-
nants from reaching the cartridge.
(d) An isopropyl alcohol wash should be run to flush the system
every 3 months. This is done by first washing the cartridge
with 100 % D for 5 min and then stopping the flow. The
cartridge and inline filter are each replaced with a peak
union. All solvent lines with the inlet filters attached should
be placed in HPLC-grade isopropanol. Purge the pumps
for 5 min. Then flush the system for 15 min at a flow rate of
1.5 mL/min with Buffer A, Buffer B, and Solution D at
33 % each. The autosampler syringe should be washed
20–50 times as well. Once the system has been flushed with
isopropanol, the solvent lines should be placed in Millipore
or equivalent water. The previous purging and flushing
steps should be repeated. Equilibrate the system for 20 min
with 50 % A and 50 % B. Note that ALL traces of isopropa-
nol must be removed.
(e) It is important to change the solvent inlet filters every
3 months, or sooner if they are discolored (that is, no longer
white) or slimy. It is possible for the solvent filters to draw a
vacuum and cause irregular elution times.
(f) If the pressure is unusually high, for example >1,700 (this
is checked on the pump display by reading the pressure
value), change the inline filter or perform a reverse hot
wash. Often there may be buildup on either the filter or
cartridge that will cause a pressure increase.
(g) The UV lamp energy should be checked weekly. The lamp
energy should be equal to or greater than half the initial
energy. The wavelength accuracy should be ±1 nm [1].
A decreased UV lamp energy will result in lower peak reso-
lution, that is, peaks may be shorter and broader.
9. Equilibrating the cartridge for a minimum of 5 min, between
temperature changes, is recommended to help maintain the
sharpness and consistency of peaks from the beginning of a set
of samples at a given temperature to the end of the set.
10. There are three different clean types. The Active clean type runs
100 % Solution D (75 % Acetonitrile) through the WAVE System
for the amount of clean time specified in the method. The Fast
clean type injects Solution D directly into the flow path but is
only available for systems with an accelerator. The Normal clean
type runs 100 % Buffer B through the WAVE System for the
amount of clean time specified in the method [2].
11. There are six different application types. The first three types are
the most commonly used for mutation detection. Mutation
Detection is used for creating partially denaturing conditions
that will produce heteroduplexes from mixtures of wild-type
and mutant fragments. Rapid DNA, which can only be run on
the 3500HT system, is a faster version of Mutation Detection
used for detecting known mutations. Double-stranded Single
Fragment confirms or determines the size of a single fragment.
This type is used for sizing PCR products. Double-stranded
Multiple Fragments sizes mixtures of fragments over a range of
sizes. Universal Linear is used for cartridge calibration and
general analyses. Oligo Purification is a fragment-separation gra-
dient used for purifying DNA within specific size ranges [2].
12. The appropriate temperature for mutation detection analysis
for each amplicon needs to be determined. Each amplicon
sequence can be checked by the Navigator™ software to deter-
mine its melting temperature and optimal temperature for
mutation detection. The software calculates the optimal tem-
perature for each amplicon based on its sequence and melting
temperature. It should be noted that optimal temperatures can
range from 48 to 68 °C for AT- and GC-rich sequences, respec-
tively [26]. The optimal temperature is the temperature at
which partial denaturation of the amplicon occurs that allows
for the best separation between homoduplex and heteroduplex
molecules. For optimal mutation detection sensitivity, two
additional temperatures are used, one above and one below the
calculated optimal temperature. Generally, samples are also run
at 0.5 °C above and below the calculated temperature to capture
any changes in the mutation detection peak profile. However, if
fragments melt fast as indicated on the Helical Fraction % versus
Base Position panel (under the DNA tab), that is, if the helical
fraction % curve drops below 50–60 % within approximately
the first 100 bp, then choose smaller temperature increments,
that is, ±0.2 °C.
In general, the optimal temperature should result in 65–95 %
helicity of the amplicon, i.e., the portion of DNA that remains
double-stranded. Approximately one-third of the amplicon
should be above 50 % helicity at the chosen temperature.
As per the manufacturer’s recommendations, if the GC
content is above 68 %, a mutation detection temperature that is
0.2 °C higher, for each 2 % of GC content, than that calculated
by the Navigator™ software should be used.
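The choice of run temperatures described in this note can be sketched in Python as follows. This is only an illustrative sketch: the function name and the `fast_melting` flag are hypothetical, and the calculated optimal temperature is assumed to have been read from the Navigator™ software beforehand. The GC-content correction is deliberately left out.

```python
def run_temperatures(optimal_c, fast_melting=False):
    """Return the three partially denaturing temperatures (in °C) for one amplicon.

    optimal_c    -- optimal temperature reported by the software (assumed input)
    fast_melting -- True if the helical fraction curve drops below 50-60 %
                    within roughly the first 100 bp (judged by the user)
    """
    step = 0.2 if fast_melting else 0.5  # increments given in this note
    return [round(optimal_c - step, 1), optimal_c, round(optimal_c + step, 1)]


print(run_temperatures(58.0))                     # standard 0.5 °C increments
print(run_temperatures(58.0, fast_melting=True))  # smaller 0.2 °C increments
```

The value 58.0 is an arbitrary example within the 48 to 68 °C range mentioned above.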
13. Some sequence changes may show up as more subtle changes
in the chromatogram and therefore the recommendation is
to sequence any sample where a deviation from the normal
chromatogram pattern is observed. In some instances, the only
change in chromatogram between the sample with a sequence
change and the normal control may be a decrease in peak
height. When a decrease in peak height is observed, it should
be compared with the sizing run for the same amplicon (a run
performed at 50 °C, when the amplicon is still double-stranded).
If the peak height is also decreased in the sizing run, the
decrease most likely reflects a lower yield of amplification
product rather than a sequence change.
A sequence change generally shows up as a change in the
chromatogram profile in more than one of the three partially
denaturing temperatures run for each amplicon, and very often
at all three temperatures.
14. Some samples that show more subtle changes in chromatogram
profiles, particularly at only one of the three partially denaturing
temperatures, may not represent a true sequence change, and
may reflect a PCR or DHPLC artifact. It should be noted
however that a subtle change, if present at all three tempera-
tures, is very likely to represent a true sequence change. The
use of a proofreading Taq polymerase enzyme to generate the
amplicons for DHPLC mutation analysis is likely to minimize
changes in chromatogram profiles that are due to PCR artifacts
as opposed to true sequence changes.
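The rule of thumb in Notes 13 and 14 can be summarized as a small Python sketch. The function name and the wording of the returned messages are hypothetical; the sketch only counts at how many of the three partially denaturing temperatures a profile deviates from the normal control, and it does not replace sequencing.

```python
def interpret_profile(changed_at):
    """Suggest how to read a sample's chromatograms.

    changed_at -- three booleans, one per partially denaturing temperature,
                  True where the profile deviates from the normal control
    """
    n_changed = sum(changed_at)
    if n_changed == 0:
        return "no deviation from control"
    if n_changed == 1:
        # A change at only one temperature may be a PCR or DHPLC artifact.
        return "possible artifact; sequence to confirm"
    # A change at more than one temperature generally indicates a sequence change.
    return "likely sequence change; sequence to confirm"


print(interpret_profile([False, True, False]))
print(interpret_profile([True, True, True]))
```

Every deviating sample is still flagged for sequencing, in line with the recommendation in Note 13.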
15. When a large number of samples are being genotyped, a
subset of samples that is representative of the entire sample
set being amplified for genotyping can be checked on an
agarose gel.
16. It is important to simultaneously run controls with known
genotypes as this provides a standard peak comparison with the
experimental samples. Extended products of the experimental
samples can be compared with, or superimposed on, those of the
positive control samples. Peak positions will determine the extended
base products and therefore the genotype of the sample.
17. Multiplex reactions can be performed in a variety of different
ways. This chapter discusses duplex and triplex reactions for
genotyping two to three SNPs simultaneously. While further
multiplexing can be performed [8] this is increasingly more
difficult and requires more thorough optimization of conditions.
Duplex PCRs are performed when genotyping two SNPs that are
not in close proximity to each other and require two sets of
PCR primers. For triplex PCRs three sets of PCR primers are
used. Primer annealing temperatures should be within 2 °C of
each other for optimal amplification. Also, primer concentra-
tions may need to be adjusted to give equal PCR products as
viewed on an agarose gel. If the SNPs to be genotyped are in
close proximity to each other they can be included in a single
PCR amplicon followed by a multiplex extension reaction.
In those instances where multiplex PCRs do not work, PCRs
can be performed as individual reactions and then pooled for
the subsequent extension reaction.
18. Double the concentration of Exonuclease I if performing
duplex PCR, that is, when two sets of PCR primers are used.
19. Also include the blank (H2O) PCR control in the purification
reaction and subsequent SBE-DHPLC reaction. This is added
not only to check for the purity of reagents but also to allow
for sizing of the extension primer on the DHPLC. Alignment
of the blank control reaction with the extension reaction prod-
ucts of the experimental samples on the DHPLC, will allow for
easy distinction between the extension primer (unextended
product) and the single-base extended products (as no exten-
sion products will be observed in the blank control reaction)
(see Figs. 5, 6, and 7).
20. The extension primer is designed such that its 3′ end lies imme-
diately adjacent to the SNP to be genotyped. The extension
primer can be designed to lie either upstream or downstream
of the SNP. The sequence surrounding the SNP often
determines the direction in which to design the extension primer
(i.e., upstream/forward or downstream/reverse). The length
of the extension primer can range between 18 and 24 bp, with
an optimal length of 20 bp. Aim for a Tm of ~60 °C with
~50 % AT and ~50 % GC content. When designing the primer
avoid areas of repeats and check primers for hairpin loops and
primer dimer formation especially at the 3′ end of the primer.
Single base changes can be introduced into the primer (usually
towards the 5′ end) if needed to stabilize the structure, break
hairpin loops etc.
21. Dilute the stock enzyme [32 U/μL] 1:12.8 with Thermo
Sequenase™ Enzyme Dilution Buffer. Final concentration should
be 2.5 U/μL. Enzyme solution should be prepared fresh.
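The arithmetic behind this dilution can be checked with a short Python sketch. The function name and the example total volume of 12.8 μL are hypothetical; the sketch assumes that 1:12.8 means one part enzyme in 12.8 parts total volume, which is the reading consistent with going from 32 to 2.5 U/μL.

```python
def enzyme_dilution(stock_u_per_ul, final_u_per_ul, total_ul):
    """Return (enzyme_ul, buffer_ul) for a simple C1*V1 = C2*V2 dilution."""
    if final_u_per_ul > stock_u_per_ul:
        raise ValueError("final concentration cannot exceed the stock")
    enzyme_ul = total_ul * final_u_per_ul / stock_u_per_ul
    return enzyme_ul, total_ul - enzyme_ul


dilution_factor = 32 / 2.5  # stock / final concentration
enzyme_ul, buffer_ul = enzyme_dilution(32, 2.5, total_ul=12.8)
print(dilution_factor, enzyme_ul, buffer_ul)
```

Scaling `total_ul` up or down gives the volumes for however much fresh enzyme solution is needed.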
22. The following considerations must be taken into account when
designing extension primers for multiplex single-base exten-
sion. There must be sufficient separation between primers, that
is, there must be a minimum of 30 s elution time separating
the primers. This is generally obtained by having primers differ
in length by 2 bp or more. GC content as well as the sequence
content must be considered. GC-rich primers elute earlier than
AT-rich primers. Hydrophobicity of the primers is a factor; for
example, elution times are shorter for cytosine and guanine.
Elution times may also be adjusted with the addition of GC or
AT clamps onto the 5′ end of the extension primer.
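The 30 s separation requirement can be checked with a small Python sketch once the extension primers have been sized on the DHPLC. The function name, primer names, and elution times below are hypothetical and serve only as an illustration.

```python
from itertools import combinations

MIN_SEPARATION_S = 30  # minimum elution-time gap stated in this note


def poorly_separated(elution_s):
    """Return primer pairs whose elution times differ by less than 30 s.

    elution_s -- dict mapping primer name to its measured elution time (s)
    """
    return [
        (a, b)
        for (a, ta), (b, tb) in combinations(sorted(elution_s.items()), 2)
        if abs(ta - tb) < MIN_SEPARATION_S
    ]


# Hypothetical elution times, for illustration only.
times = {"SNP1_ext": 95.0, "SNP2_ext": 130.0, "SNP3_ext": 150.0}
print(poorly_separated(times))
```

Any pair returned by the function would need a redesigned primer, for example one that differs in length by 2 bp or more or carries a 5′ clamp, as described above.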
23. Prior to pooling reactions one needs to verify that different
extension primers and their extension products can be separated
on the DHPLC by sizing aliquots of the extension primers to
check their separation. Ideally there should be a separation of
a minimum of 30 s between extension primers to allow for
clear separation of the corresponding extension products.
Pooling can be performed when it is difficult to perform mul-
tiplex reactions. Reactions are performed separately and
pooled prior to running on the DHPLC. Pooling (as with
multiplex reactions) saves DHPLC run time and DHPLC
reagents and increases the life span of the DHPLC
cartridge.
24. A “Normal” clean is the 100 % B clean-off step containing
25 % acetonitrile which prevents cartridge gradient fluctuation
for this application, as the gradient conditions for SBE-DHPLC
require a low percentage acetonitrile and small increases over
time. The “Fast” or “Active” clean is the 100 % D clean-off
step containing 75 % acetonitrile.
25. The gradient duration is modified to 2 min, which cuts down the
run time. The gradient duration may be increased to 2.5 min
or more as needed when multiplex SBE is performed.
26. The elution order of T and A may sometimes be switched at
70 °C. Devaney et al. [7] reported an elution order of C<G<A<T
at 70 °C. This elution order has also been observed when a set
of four 16-mer heterooligonucleotides differ in a single base at
the 3′ end [6]. We have also observed an elution order of
C<G<T<A. Our experimental samples are run at a 1.5 mL/min
flow rate using the high-throughput (DNASep™HT) cartridge.
It is possible that the faster run time as well as the dimensions of
the DNASep™HT Cartridge versus the DNASep™ Cartridge
(as used by Devaney et al. [7]) have a subtle effect on the elution
order of T and A. In addition it is known that retention may be
governed not only by the substituted base but also by the
immediate sequence context. For this reason the inclusion of
positive controls with known genotypes (as determined by an
independent method) is very important.
27. Factors Affecting the Quality of SURVEYOR® Nuclease Results:
(a) The quality of the genomic DNA to be amplified: High
quality DNA (from fresh or frozen cells or tissue) should
be used. The DNA should have a concentration of >5 ng/
μL as determined by absorbance at 260 nm, have an
absorbance ratio at 260/280 nm of >1.7 and be >90 %
DNA (i.e., free of most tRNA and rRNA contamination
as judged by appearance on an agarose gel). Store DNA
samples at −20 °C. If the DNA template is extracted from
paraffin-embedded tissue, several additional precautions
can be taken. The extracted DNA can be treated with uracil
DNA glycosylase to prevent amplification of DNA frag-
ments containing deaminated C residues. Often a high
percentage of the A260-absorbing material extracted from
paraffin-embedded tissue is not amplified well during
PCR. Using a larger amount of starting DNA, e.g., ~50
versus 10 ng, will often help to produce a reasonable
amplification product.
(b) The quality of the PCR amplified DNA: PCR should pro-
duce a sufficiently high yield (>15 ng/μL) of a SINGLE
amplified species of the correct size. We strongly
recommend the use of a proofreading DNA polymerase
(such as Transgenomic Optimase® Polymerase) to reduce
the amount of base misincorporation during PCR (which
leads to the generation of “false” mutations and spurious
SURVEYOR® Nuclease cleavage fragments). Similarly,
nonspecific PCR fragments can be interpreted as muta-
tions and can mask SURVEYOR® Nuclease mismatch
cleavage products. If possible, a reference DNA should be
digested with SURVEYOR® Nuclease and run to exclude
spurious background by visual comparison of chromato-
gram profiles. Primer-dimers should be strenuously avoided
as their presence dramatically inhibits SURVEYOR®
Nuclease cleavage at mismatch sites. Examine each ampli-
fied DNA product before digestion by gel electrophoresis
or WAVE HPLC to be sure that it is a single species of the
expected size.
(c) The relative proportion of mutant (test) to wild-type (refer-
ence) DNA in the hybridized sample. Whenever possible,
test and reference PCR products should be hybridized in
equal proportion to maximize the amount of heteroduplex
DNA available for digestion.
(d) Suppression of DNA nicking: SURVEYOR® Nuclease
nicks double-stranded DNA at random matched sites,
which produces background during extended incubations.
This activity is suppressed by SURVEYOR® Enhancer W2
and its cofactor without otherwise negatively affecting the
reaction. SURVEYOR® Nuclease Enhancer W2 and cofactor
are included in this kit.
(e) The composition of the PCR buffer: Commercially available
PCR buffers vary dramatically in content and the contents
are often not defined by suppliers. A few buffers are NOT
compatible with SURVEYOR® Nuclease due to pH or the
presence of additives, surfactants, or other proprietary ingre-
dients. As the result of development efforts at Transgenomic,
new SURVEYOR® Nuclease reaction conditions have been
defined for a large number of different PCR buffers that
improve signal intensity significantly (1.5–4 fold, depending
upon the buffer) while maintaining low background. These
new reaction conditions are included in the protocols
described in the Transgenomic User Guide.
(f) Signal-to-Noise ratio. The signal to noise ratio is generally
high enough to detect mutations present at a low percent-
age of the total DNA template; it is possible to detect
1–20 % mutant DNA depending upon the particular DNA
amplicon, its size, the number and type(s) of mutation(s),
and the model of WAVE™ platform used. Figure 10 shows
the digestion products generated with homoduplex and
heteroduplex Control DNA (included with the
SURVEYOR® Nuclease Kits) fractionated by ion-pairing
reverse phase HPLC under non-denaturing conditions
using the WAVE™ HS System. The mutation-specific
cleavage products are clearly seen as two new peaks
eluting with the expected sizes that can be estimated relative
to the DNA size marker.
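The numerical thresholds for genomic DNA in item (a) of this note can be expressed as a short Python sketch. The function names are hypothetical, and the conversion of 1 A260 unit to roughly 50 ng/μL of double-stranded DNA (1 cm path length) is a commonly used approximation that is assumed here rather than taken from this chapter. The >90 % DNA criterion is judged on an agarose gel and is not modeled.

```python
def dsdna_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration, assuming 1 A260 unit ~ 50 ng/uL."""
    return a260 * 50.0 * dilution_factor


def template_ok(a260, a280, dilution_factor=1.0):
    """Check the thresholds from item (a): >5 ng/uL and A260/A280 > 1.7."""
    concentration = dsdna_ng_per_ul(a260, dilution_factor)
    ratio = a260 / a280
    return concentration > 5.0 and ratio > 1.7


# Hypothetical absorbance readings, for illustration only.
print(template_ok(a260=0.20, a280=0.11))  # about 10 ng/uL, ratio about 1.8
print(template_ok(a260=0.05, a280=0.04))  # about 2.5 ng/uL, ratio 1.25
```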

Fig. 10 SURVEYOR Nuclease digestion products of self-annealed Control C homoduplex (HMD) and Control G/C
homoduplex/heteroduplex (HTD). The 633-bp amplicons were PCR amplified with Optimase polymerase from
2 μL of Control G and Control C. Control G/C homoduplex/heteroduplex was formed by hybridizing equal
amounts of Control G and Control C homoduplex PCR product and contains homoduplexes and C/C and G/G
mismatched heteroduplexes. DNA (300, 600, and 1,200 ng) was digested with 1 μL of SURVEYOR Nuclease W,
1 μL of SURVEYOR Enhancer W2, 1/10th volume of 0.15 M MgCl2 and 1/10th volume of Enhancer Cofactor for
60 min at 42 °C. SURVEYOR Nuclease digestion products [180 (6 μL), 400 (12 μL), and 850 ng (24 μL)] were
analyzed using the WAVE system run under non-denaturing conditions at 50 °C and equipped with a
Fluorescence Detector and a High-Sensitivity Accessory for post-column DNA intercalation with fluorescent
dye. Transgenomic 100-bp ladder DNA was run as a marker. The 217- and 416-bp cleavage products expected
from the Control G/C heteroduplex are clearly visible. Also visible in all three chromatograms is the full-length
633-bp homoduplex. Flat-top peaks were produced by injection of amounts of DNA that saturated the instrument
detector

References
1. Oefner PJ, Underhill PA (1995) Comparative DNA sequencing by denaturing high-performance liquid chromatography (DHPLC). In: ASHG Annual meeting A2666. University of Chicago Press
2. Navigator™ Software Manual, Version 3, © (2002–2009) Transgenomic, Inc., used with permission
3. Cotton RG (1997) Slowly but surely towards better scanning for mutations. Trends Genet 13:43–46
4. O’Donovan MC et al (1998) Blind analysis of denaturing high-performance liquid chromatography as a tool for mutation detection. Genomics 52:44–49
5. Frueh FW, Noyer-Weidner M (2003) The use of denaturing high performance liquid chromatography (DHPLC) for the analysis of genetic variations: impact for diagnostics and pharmacogenetics. Clin Chem Lab Med 41:452–461
6. Oefner PJ (2000) Allelic discrimination by denaturing high-performance liquid chromatography. J Chromatogr B 739:345–355
7. Devaney JM et al (2001) Genotyping of two mutations in the HFE gene using single-base extension and high-performance liquid chromatography. Anal Chem 73:620–624
8. Wu G et al (2003) Rapid, accurate genotyping of β-thalassaemia mutations using a novel multiplex primer extension/denaturing high-performance liquid chromatography assay. Br J Haematol 122:311–316
9. Oleykowski CA et al (1998) Mutation detection using a novel plant endonuclease. Nucleic Acids Res 26:4597–4602
10. Yang B et al (2000) Purification, cloning, and characterization of CEL I nuclease. Biochemistry 39:3533–3541
11. Sokurenko EV et al (2001) Detection of simple mutations and polymorphisms in large genomic regions. Nucleic Acids Res 29:e111
12. Qiu P et al (2004) Mutation detection using Surveyor nuclease. Biotechniques 36:702–707
13. Kulinski J et al (2000) CEL I enzymatic mutation detection assay. Biotechniques 29:44–48
14. Colbert T et al (2001) High-throughput screening for induced point mutations. Plant Physiol 126:480–484
15. Till BJ et al (2003) Large-scale discovery of induced point mutations with high throughput TILLING. Genome Res 13:524–530
16. Greene EA et al (2003) Spectrum of chemically induced mutations from a large-scale reverse-genetic screen in Arabidopsis. Genetics 164:731–740
17. Perry JA et al (2003) A TILLING reverse genetics tool and a Web-accessible collection of mutants of the legume Lotus japonicus. Plant Physiol 131:866–871
18. Wienholds E et al (2003) Efficient target-selected mutagenesis in zebrafish. Genome Res 13:2700–2707
19. Smits BMG et al (2004) Genetic variation in coding regions between and within commonly used inbred rat strains. Genome Res 14:1285–1290
20. Sokurenko EV (2001) Discovering the sweeping power of point mutations using a GIRAFF. Trends Microbiol 9:522–525
21. Scaffino MF et al (2004) Heteroduplex detection with a plant DNA endonuclease for standard gel electrophoresis. Transgenics 4:157–166
22. Caldwell DG et al (2004) A structured mutant population for forward and reverse genetics in Barley (Hordeum vulgare L.). Plant J. doi:10.1111/j.1365-313X.2004.02190.x
23. Qiu P, Shandilya H, Gerard GF (2005) A method for clone confirmation using a mismatch-specific DNA endonuclease. Mol Biotechnol 29:11–18
24. Taylor P, Munson K, Gjerde D (1999) Detection of mutations and polymorphisms on the WAVE™ DNA fragment analysis system. Application Note 101. Transgenomic Inc.
25. Kuklin A et al (1997/98) Detection of single-nucleotide polymorphisms with the WAVE™ DNA fragment analysis system. Genet Test 1(3):201–206
26. Xiao W, Oefner PJ (2001) Denaturing high-performance liquid chromatography: a review. Hum Mutat 17:439–474

Chapter 3

Clinical SNP Detection by the SmartAmp Method

Toshihisa Ishikawa and Yoshihide Hayashizaki

Abstract
For advancing personalized medicine, it is important to incorporate pharmacogenomics data into routine
clinical practice. The SmartAmp method enables us to detect genetic polymorphisms or mutations in tar-
get genes within 30–40 min without DNA isolation and PCR amplification. The SmartAmp method has
been developed based on the concept that DNA amplification per se is the signal for the presence of a
specific target sequence. Differing from the widely used PCR, the SmartAmp reaction is an isothermal
DNA amplification, where the initial step of copying a target sequence from the template DNA is critically
important. For clinical applications, we have created SmartAmp primers and a clinical device that detect
genetic polymorphisms of human genes involved in drug-induced toxicity or disease risk. This chapter
addresses both the basic molecular mechanism underlying the SmartAmp method and its practical applica-
tions to detect clinically important single nucleotide polymorphisms (SNPs).

Key words Personalized medicine, Point of care, Adverse reaction, Warfarin, Irinotecan, ABCB1
(MDR1), ABCC11, VKORC1, CYP2C9, UGT1A1

1 Introduction

Genetic polymorphisms and mutations in drug metabolizing
enzymes, transporters, receptors, and other drug targets (e.g.,
toxicity targets) are linked to inter-individual differences in the
efficacy and toxicity of medications as well as risk of genetic
diseases. The inter-individual variation in the rate of drug
metabolism has been known for many years. Pharmacogenomics,
which deals with heredity and the response to drugs, is the part of
science that attempts to explain the variability of drug responses
and to search for the genetic basis of such variations or
differences. Validation of clinically important
genetic polymorphisms and development of new technologies to
rapidly detect clinically important variants are critical issues for
advancing personalized medicine.
In recent years, technologies have been evolving to transform
diagnostic devices for rapid testing at the Point-of-Care (POC).
Portable devices are being engineered for use in a range of settings
to perform robust assays for the diagnosis of disease that will
improve patient management and result in greater convenience and
speed to answer. POC diagnostics is a growing field that is gradually
becoming more user-friendly with the introduction of portable
devices and quicker nucleic acid detection. Successful POC
diagnostics requires four major elements: rapid reaction, low
cost, low energy consumption, and simple analysis (with minimal
technical training and inclusion of controls but no off-instrument
processing or reagent preparation). In this context, we decided to
develop POC technology and to apply it to medical advances.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_3, © Springer Science+Business Media, LLC 2013

The next important step is to incorporate pharmacogenomics
data into routine clinical practice. Development of personalized
medicine including POC diagnostics requires integration of
various segments of biotechnology, clinical medicine, and
pharmacology. A key requirement for advancing personalized
medicine is the ability to test patients’ genetic polymorphisms
and/or mutations rapidly and conveniently.
In 2007, the SmartAmp method was developed based on the
principal concept that DNA amplification per se is the signal for
detection of a genetic mutation or SNP. Differing from the widely
used PCR, the SmartAmp method is an isothermal DNA amplifica-
tion reaction [1, 2]. Therefore, the method enables rapid detection
of SNPs and mutations using a simple and cost-effective instru-
ment. While we describe a protocol for SmartAmp-based SNP
detection using a real-time PCR detection system in this article,
end-point detection with a CCD camera-linked digital processor
has recently been developed in our laboratory [2]. By using 96-well
or 384-well plates and automated dispenser units, the throughput
of SmartAmp-based SNP detection will be markedly increased.

2 Materials

2.1 Enzyme The SmartAmp method utilizes Aac polymerase, a DNA
polymerase with strand-displacement activity. Aac polymerase is from
the thermophilic bacterium Alicyclobacillus acidocaldarius. This DNA
polymerase is highly resistant to cellular contaminants and hence
works directly on blood samples, just after a simple heat treatment
(98 °C, 3 min) to degrade/denature RNA and proteins. This is a
great advantage of the SmartAmp method over the commonly used
PCR-based techniques, because the enzymatic activity of Taq DNA
polymerase is easily inhibited by impurities in the PCR reaction.
1. Aac polymerase used for the SmartAmp method is a cloned
large fragment of 610 amino acids (69.5 kDa) that carries
DNA strand displacement activity. The optimal temperature
and pH for this enzyme are 60–65 °C and 8.0–8.2,
respectively.
2. This enzyme is commercially available from K.K. DNAFORM
(Yokohama, Japan). See the Web page:
http://www.dnaform.jp/smartamp/index_e.html.

2.2 Primers In the SmartAmp method, the entire DNA amplification process
requires five primers: turnback primer (TP), boost primer (BP),
folding primer (FP), and two outer primers (OP1 and OP2)
(Fig. 1). Primers are selected by algorithms that consider
the free energy, probability of base-pairing, optimal melting
temperature, and product size range. The design
of these primers contributes to the specificity of SmartAmp. In
particular two primers (TP and FP) are critically important for the
amplification process. The genomic sequence between the anneal-
ing sites of the TP and FP primers is the target region that will be
amplified by the SmartAmp reaction. The other primers (BP, OP1,
and OP2) are additionally employed to accelerate the process and
enhance specificity. Those primers can be synthesized and obtained
from any commercial source, such as Invitrogen.
1. In isothermal DNA amplification by the SmartAmp method,
the initial step of copying a target sequence from the genomic
DNA is a prerequisite. FP and TP hybridize to the template
genomic DNA. Next, both products primed by the FP and
TP are detached from template genomic DNA by strand-
displacing DNA polymerase, whose extensions are primed by
OP1 and OP2. Single-stranded DNA products, thus displaced,
become templates in the second step for the opposing FP and
TP. These single stranded DNA products are generated by the
strand-displacement activity of the DNA polymerase, being
primed from the flanking region of OP primers adjacent to the
target sequence. The resulting DNA products are referred to
as “intermediate products” which play key roles in the subse-
quent amplification steps (Figs. 1 and 2).
2. The formation of those intermediate products is the
rate-limiting step in SmartAmp-based isothermal DNA
amplification. Intermediate 1 (IM1) has the TP sequence at
the 5′ end and the FP complementary sequence at the 3′ end.
Intermediate 2 (IM2) is complementary to IM1 (Fig. 2). The
initial self-priming site on IM1 is the 3′-end of the FP sequence
of IM1. Concatenated products of IM1 are synthesized by an
elongation process termed pathway A. The characteristic
feature of the products of pathway A is that the free 5′ and 3′
ends carry TP and its complementary sequence, forming long
double stranded hairpin DNA. The initial self-priming
elongation site on IM2 is located at the 3′ end of the TP
sequence of IM2. Long concatenated DNA products are
synthesized as in pathway A, but end products in pathway B
are different. The long-hairpin DNA products of pathway B
carry FP and its complementary sequence at the free 5′ and 3′
ends respectively. There is another elongation pathway which
starts from the 3′ end of a free TP-primer that hybridizes to the
looping structure of the TP complementary sequence, which is
located at the intermediate region of the long products of
pathway A. Thus, concatenated DNA products are formed in the
SmartAmp reaction. The resulting DNA products could be
detected by conventional agarose gel electrophoresis, where
DNA ladder patterns represented the formation of
concatenated DNA products [1] (Fig. 3).

Fig. 1 Schematic illustration of the SmartAmp reaction using five primers: turnback primer (TP), boost primer
(BP), folding primer (FP), and two outer primers (OP1 and OP2)

Fig. 2 Formation of intermediate products in the initial step of the SmartAmp reaction. As an initial step,
primer annealing and the DNA polymerase reaction generate two intermediates (IM1 and IM2). The inner primer set
with FP and TP initiates the reaction by hybridizing to opposite strands of a target region. Linear primer
extension products from the FP and TP primers are then released from their templates in a second primer extension
reaction driven by a set of outer primers (OP1 and OP2) that hybridize downstream of the FP and TP primers.
Due to the special features of the FP and TP primers, single-stranded primer extension products from those
primers will refold at their 3′ and 5′ ends to form new priming sites that maintain self-amplification in a
continuous process driven by the DNA strand displacement activity of Aac DNA polymerase

Fig. 3 Formation of concatenated DNA products in the SmartAmp reaction. Self-priming DNA synthesis from
each of IM1 and IM2 creates hairpin molecules via pathways A and B. These structures lead to further
self-primed DNA synthesis to create dimeric amplicons and then subsequently concatenated DNA products

3. To ensure the high fidelity of SNP detection by the SmartAmp
method, exponential amplification of mis-primed DNA must
be suppressed. In the original SmartAmp method, this was
achieved by adding either the mismatch binding protein
(MutS) from Thermus aquaticus [1, 2] or a competitive probe (CP)
[3, 4] to the reaction mixture. MutS inhibits background
DNA from entering the amplification cycle by specifically
binding to mis-primed amplification products. In addition, a
combination of the asymmetrical primers, i.e., TP and FP is
used to minimize alternative mis-amplification pathways [1].
4. As an alternative to MutS, which has a narrow window of
optimal concentration, CP has been developed for SNP
detection. Figure 4 depicts an example of SNP detection using
CP. Since the 3′-end of CP is blocked by an NH2 group, DNA polymerization
from that end does not take place. CP recognizes the target
SNP and binds to DNA, and thereby it interferes with binding
of the turn-back tail of TP to DNA (Fig. 4).

Fig. 4 Schematic illustration of the SmartAmp method-based SNP detection. The SNP 538G>A resides in exon
4 of the ABCC11 gene on chromosome 16q12. The lower panel shows the strategy for the SmartAmp method-
based detection of SNP 538G>A in the ABCC11 gene

2.3 Reagents

1. Deoxynucleotide triphosphates (dNTPs) are available from any
commercial source.
2. Tris(hydroxymethyl)aminomethane (Tris), dimethyl sulfoxide
(DMSO), and Tween®20 are available from any commercial
source.
3. Chemicals, such as HCl, KCl, (NH4)2SO4, and MgSO4, are of
analytical grade.
4. 10× SmartAmp buffer: 200 mM Tris–HCl (pH 8.0), 100 mM
KCl, 100 mM (NH4)2SO4, 80 mM MgSO4, 1 % (v/v)
Tween®20.
5. SYBR®Green I (Molecular Probes, Invitrogen).
6. Sample pretreatment solution: 40 mM NaOH.
Clinical SNP Detection by the SmartAmp Method 61

2.4 Instrument

1. For laboratory use, a real-time PCR instrument such as the Mx3000P
(Stratagene, La Jolla, CA, USA), the LightCycler 480 (Roche), or
another compatible instrument can be used for SmartAmp-based
SNP detection.

2.5 Samples

1. The blood sample can be obtained with a finger prick.
2. The minimal volume of blood sample to be tested is 2 μl.
3. The blood sample is subsequently pretreated and then applied
to the reaction of SmartAmp-based SNP detection.

3 Methods

3.1 Sample Pretreatment

Before being applied to the SmartAmp reaction, each blood sample
should be diluted fourfold with 40 mM NaOH and then heated at
98 °C for 3 min. During this pretreatment process, proteins and
RNA are denatured and degraded under alkaline conditions.
Genomic DNA remains intact at concentrations of 5–10 ng/μl in
the sample after the pretreatment.
1. Take 2 μl of the blood sample and mix with 6 μl of 40 mM
NaOH to make a fourfold dilution.
2. Heat the mixed sample solution at 98 °C for 3 min.
3. Chill the sample solution on ice until use.
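The dilution in step 1 can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that volumes are additive and that the blood itself contributes no NaOH; the function name `dilute` is illustrative and not part of the protocol.

```python
def dilute(sample_ul: float, diluent_ul: float, diluent_mM: float):
    """Return (fold dilution of the sample, final diluent concentration in mM).

    Assumes additive volumes and no NaOH in the sample itself.
    """
    total_ul = sample_ul + diluent_ul
    fold = total_ul / sample_ul
    final_mM = diluent_mM * diluent_ul / total_ul
    return fold, final_mM


# 2 ul of blood mixed with 6 ul of 40 mM NaOH, as in step 1.
fold, naoh_mM = dilute(sample_ul=2.0, diluent_ul=6.0, diluent_mM=40.0)
print(f"{fold:.0f}-fold dilution, about {naoh_mM:.0f} mM NaOH after mixing")
```

Under these assumptions the 2 μl + 6 μl mix is a fourfold dilution of the blood, and the NaOH itself is diluted to roughly three quarters of its stock concentration.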

3.2 SmartAmp Reaction

The standard reaction mixture (total volume of 25 μl) contains:
2.0 μM FP, 2.0 μM TP, 1.0 μM BP, 0.25 μM OP1, 0.25 μM
OP2, 1.4 mM dNTPs, 5 % DMSO, 20 mM Tris–HCl (pH 8.0),
10 mM KCl, 10 mM (NH4)2SO4, 8 mM MgSO4, 0.1 % (v/v)
Tween®20, 1/100,000-diluted SYBR® Green I, and 20 units of Aac
DNA polymerase.
1. Add 1 μl of the pretreated sample to the SmartAmp
reaction mixture (final volume of 25 μl).
2. Incubate the SmartAmp reaction mixture at 60 °C for
30–60 min in a real-time PCR instrument.
3. Measure the fluorescence of SYBR® Green I that indicates for-
mation of concatenated DNA products.
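The buffer components of the standard reaction mixture correspond to a tenfold dilution of the 10× SmartAmp buffer listed under Reagents. The sketch below only illustrates that arithmetic; the dictionary and function names are hypothetical, and the sketch does not cover primers, dNTPs, DMSO, dye, or enzyme.

```python
# Composition of the 10x SmartAmp buffer as listed under Reagents.
BUFFER_10X = {
    "Tris-HCl (mM)": 200.0,
    "KCl (mM)": 100.0,
    "(NH4)2SO4 (mM)": 100.0,
    "MgSO4 (mM)": 80.0,
    "Tween 20 (% v/v)": 1.0,
}


def buffer_volume_ul(total_ul: float, stock_fold: float = 10.0) -> float:
    """Volume of concentrated buffer needed for a reaction of total_ul."""
    return total_ul / stock_fold


def final_concentrations(stock: dict, stock_fold: float = 10.0) -> dict:
    """Concentration of each buffer component after dilution to 1x."""
    return {name: value / stock_fold for name, value in stock.items()}


print(f"10x buffer per 25 ul reaction: {buffer_volume_ul(25.0)} ul")
for name, value in final_concentrations(BUFFER_10X).items():
    print(f"{name}: {value}")
```

The resulting 1× values match the Tris–HCl, KCl, (NH4)2SO4, MgSO4, and Tween®20 concentrations given for the standard reaction mixture.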

3.3 Detection of SNP 538G>A in the Human ABCC11 Gene

We here present an example of SmartAmp-based SNP detection
using clinical samples. This SNP detection procedure can be applied
to genetic testing of axillary osmidrosis and other clinical indications
(e.g., mastopathy). Human ATP-binding cassette (ABC)
transporter ABCC11 functions as an ATP-dependent efflux pump
for amphipathic anions. One non-synonymous SNP, 538G>A
(Gly180Arg), has been found to greatly affect the function and
stability of the de novo synthesized ABCC11 (Arg180) variant protein.
The SNP variant lacking N-linked glycosylation is recognized as a
misfolded protein in the endoplasmic reticulum (ER) and readily
undergoes proteasomal degradation [4]. This ER-associated degradation
of the ABCC11 protein is the molecular mechanism by which the SNP
affects the function of apocrine glands. On the other hand, the
wild type (Gly180) of ABCC11 is associated with wet-type earwax
[6], axillary osmidrosis [4, 5], colostrum secretion from the mammary
gland [6], and potential susceptibility to breast cancer [7].
Furthermore, the wild type of ABCC11 reportedly has the ability to
efflux cyclic nucleotides and nucleoside-based anticancer drugs [8].
1. The SNP 538G>A (Gly180Arg) resides in exon 4 of the
ABCC11 gene located on human chromosome 16q12.1
(Fig. 4). To determine the SNP 538G>A (Gly180Arg) in the
ABCC11 gene, we prepared one set of primers designated TP,
FP, BP, OP, and CP (Fig. 5). The TPs discriminate the

WT allele (538G) primers

OP1 TP (538G) BP
5’-CAGTGCTTCTGGTGATGCTGAGGTTCCAGAGAACAAGGTTGATTTTCGATGCACTTCTGGGCATCTGCTTCTG

TP /Bc FP OP2
CATTGCCAGTGTACTCGGGCCAGTAAGTGGCAGACTTGGTGAGGTTTGGGGGACTCTAGGCTTCAGAGGT-3’
CP (538G)

SNP allele (538A) primers

BP
OP1 TP (538A)
5’-CAGTGCTTCTGGTGATGCTGAGGTTCCAGAGAACAAGGTTGATTTTCGATGCACTTCTGGGCATCTGCTTCTG

TP/Bc FP OP2
CATTGCCAGTGTACTCAGGCCAGTAAGTGGCAGACTTGGTGAGGTTTGGGGGACTCTAGGCTTCAGAGGT-3’
CP (538A)

TP (538G) 5’-CGAGTACACT GGTTGATTTTCGATGCACTTC-3’

TP (538A) 5’-CTGAGTACACT AGGTTGATTTTCGATGCACTTC-3’
FP 5’-agcgatgcgttcgagcatcgct GTCTGCCACTTACTGGCC-3’
BP 5’-AGAAGCAGATGCCCAGAA-3’
OP1 5’-TGATGCTGAGGTTCCAG-3’
OP2 5’-TAGAGTCCCCCAAACCT-3’
CP (538G) 5’-TACTGGCTCGAGTACAC-NH2-3’
CP (538A) 5’-TACTGGCCCGAGTACAC-NH2-3’

Fig. 5 Partial genomic DNA sequences of the ABCC11 gene carrying WT (538G) and SNP (538A) alleles as well
as the sequences of the primers used for the SmartAmp assay. Arrows indicate the sequence difference
between the WT and SNP alleles
polymorphism 538G or 538A in the ABCC11 gene, and the

CPs inhibit the background amplification from mismatch
sequence pairs.
2. A sample to be subjected to the SmartAmp-based detection of
the SNP 538G>A in the ABCC11 gene is prepared from blood
samples by incubating at 98 °C for 3 min (see Subheading 3.1).
After chilling on ice, 1 μl of the pretreated sample is added
directly into the reaction mixture (final volume of 25 μl) con-
taining 2.0 μM FP, 2.0 μM TP, 1.0 μM BP, 0.25 μM OP1,
0.25 μM OP2, 20 μM, 1.4 mM dNTPs, 5 % DMSO, 20 mM
Tris–HCl (pH 8.0), 10 mM KCl, 10 mM (NH4)2SO4, 8 mM
MgSO4, 0.1 % (v/v) Tween®20, 1/100,000-diluted SYBR®
Green I, 0.24 unit/μl Aac DNA polymerase. The SmartAmp
reaction mixture is incubated at 60 °C for 30–60 min under an
isothermal condition in a real-time PCR model Mx3000P sys-
tem, where changes in the fluorescence intensity of SYBR®
Green I dye indicating DNA amplification are monitored dur-
ing the reaction.
3. The SmartAmp primers we have designed (Fig. 5) selectively
recognize the SNP 538G>A of the ABCC11 gene to discrimi-
nate homozygous 538G/G (wet type), heterozygous 538G/A
(wet type), and homozygous 538A/A (dry type) in genomic
DNA (Fig. 6). These results are consistent with the sequence
analysis data (Fig. 6).
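The genotype assignment in step 3 follows from which allele-specific reaction (538G primers, 538A primers, or both) produces a fluorescence signal. A minimal sketch of that decision logic follows, assuming a simple fixed fluorescence threshold; the threshold value, the function names, and the example traces are hypothetical and are not taken from the authors' analysis.

```python
THRESHOLD_DR = 1500.0  # hypothetical fluorescence cutoff (dR), for illustration only


def amplified(trace, threshold=THRESHOLD_DR):
    """True if the fluorescence time-course ever reaches the threshold."""
    return max(trace) >= threshold


def call_genotype(trace_538g, trace_538a):
    """Combine the two allele-specific reactions into a genotype and earwax type."""
    g_positive = amplified(trace_538g)
    a_positive = amplified(trace_538a)
    if g_positive and a_positive:
        return "538G/A", "wet type"
    if g_positive:
        return "538G/G", "wet type"
    if a_positive:
        return "538A/A", "dry type"
    return "no call", "undetermined"


# Made-up traces: only the 538A-specific reaction rises above the threshold.
flat = [0.0, 10.0, 20.0, 15.0]
rising = [0.0, 200.0, 2500.0, 4000.0]
print(call_genotype(trace_538g=flat, trace_538a=rising))
```

A real assay would also need the positive and negative controls described in the Notes before any call is trusted; the threshold rule here is only one possible way to turn a time-course into a yes/no signal.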

[Figure 6: upper panels, fluorescence (dR) versus time (min, 0–60) for the 538G- and 538A-specific reactions in genotypes 538G/G, 538G/A, and 538A/A; lower panels, DNA sequence analysis for the same three genotypes.]

Fig. 6 Detection of the SNP 538G>A by SmartAmp assay. Upper panels demonstrate time-courses of the
SmartAmp assay reaction with ABCC11 allele–specific primers. Lower panels show the results of DNA
sequence analysis for three diploid genotypes of ABCC11
4 Notes

4.1 Primer Design

To design primer sets for the reaction, we have developed algorithms
specific to SmartAmp primer design [9]. Primer candidates
can be selected based on those algorithms, considering the free
energy, probability of base-pairing, optimal melting temperature,
and product size range. The design of these
primers contributes to the specificity of SmartAmp. In particular,
two primers (TP and FP) are critically important for the amplification
process. For SmartAmp primer design, it is convenient to use
the software program available on the Web site at
http://www.smapDNA.com. Initial candidate primer sets can be generated
with this program.
Primer extension-based SNP detection systems usually require
the SNP detection nucleotide to be engineered precisely at the
3′-end of a specific primer [2]. However, SmartAmp does not have
this limitation and thus there is a far greater versatility in its ability
to detect SNPs. The design options are numerous and the primer
design flexibility is unrivaled. However, the best primer set should
be selected by experimental screening among numerous possible
combinations of primer candidates. The criteria of the screening
are: no mis-amplification, high fidelity and selectivity to the target
SNP, and high sensitivity to the target site. For clinical SNP detec-
tion, the minimal detection limit of SmartAmp-based SNP detec-
tion is 5 ng genomic DNA per 1 μl of sample or even lower.

4.2 Variations of SmartAmp-Based SNP Detection

There are several applications of the SmartAmp method to the detection
of clinically important genetic polymorphisms of drug-metabolizing
enzymes and transporters.
1. Detection of CYP2C9*2, CYP2C9*3, and vitamin K epoxide
reductase VKORC1.
Warfarin is the most widely prescribed anticoagulant for the
treatment of thromboembolic disorders. Because of its narrow
therapeutic index and the large individual variability observed
between warfarin dosage and its anticoagulant effect [10–12], it
is essential to carefully adjust the dosage based on the prothrom-
bin time (PT) expressed as the international normalized ratio.
The CYP2C9*2 and CYP2C9*3 polymorphisms and the −1639G>A
polymorphism in the vitamin K epoxide reductase (VKORC1) promoter
have a great impact on the pharmacokinetic profile
and pharmacological efficacy of warfarin. Genetic testing of a
patient for these SNPs prior to prescription of the drug is of great
importance in warfarin-based individualized pharmacotherapy
that will minimize the risks of adverse reactions and reoccurrence
of thromboembolic episodes. In 2007, the US FDA updated the
labeling for warfarin (http://www.fda.gov/bbs/topics/
NEWS/2007/NEW01684.html) such that genetic testing is
recommended to ensure the efficacy and safety of warfarin by

adjusting the optimal dose for individual patients. Therefore, we
aimed to analyze CYP2C9*2, CYP2C9*3, and VKORC1
−1639G>A polymorphisms by the SmartAmp method. Blood
samples from a total of 125 consenting participants were used to
test for those SNPs by the SmartAmp method, whereby samples
were subjected to real-time assay without DNA purification.
SmartAmp-based SNP testing was completed within 45 min for
each blood sample, and the obtained data were perfectly consis-
tent with the data of PCR-restriction fragment length polymor-
phisms (PCR-RFLP) [13].
With respect to the CYP2C9*2 polymorphism, 123 par-
ticipants were homozygous wild-type, one was heterozygous
(WT/SNP), and one was homozygous SNP. For the
CYP2C9*3 polymorphism, 116 participants were homozy-
gous wild-type (WT/WT), eight were heterozygous (WT/
SNP), and one was homozygous SNP [13]. All of the data
were verified by PCR-restriction fragment length polymor-
phisms (PCR-RFLP), and the results demonstrated a perfect
concordance with the SmartAmp results. Neither false posi-
tives nor false negatives were observed in the SmartAmp-based
SNP detection.
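The genotype counts reported above can be converted into variant allele frequencies by counting one variant allele per heterozygote and two per homozygous SNP carrier. The sketch below illustrates that arithmetic with the counts from the text; the function name is illustrative, and the frequencies are a derived calculation rather than values stated by the authors.

```python
def variant_allele_frequency(n_wt_wt: int, n_wt_snp: int, n_snp_snp: int) -> float:
    """Fraction of all chromosomes that carry the variant (SNP) allele."""
    n_individuals = n_wt_wt + n_wt_snp + n_snp_snp
    variant_alleles = n_wt_snp + 2 * n_snp_snp
    return variant_alleles / (2 * n_individuals)


# Genotype counts (WT/WT, WT/SNP, SNP/SNP) among the 125 participants.
counts = {
    "CYP2C9*2": (123, 1, 1),
    "CYP2C9*3": (116, 8, 1),
}

for allele, genotype_counts in counts.items():
    assert sum(genotype_counts) == 125  # counts must add up to the cohort size
    freq = variant_allele_frequency(*genotype_counts)
    print(f"{allele}: variant allele frequency = {freq:.3f}")
```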
2. Detection of UDP-glucuronosyltransferase UGT1A1*28.
Irinotecan (CPT-11) is a camptothecin analogue with
strong antitumor activity that acts through inhibition of topoi-
somerase I. Irinotecan is now widely used, especially for treat-
ing colorectal and lung cancers, but occasionally causes
unpredictably severe leucopenia or diarrhea and fatal toxicity.
Irinotecan is hydrolyzed in vivo to form an active metabolite
SN-38 by carboxylesterase. SN-38 is subsequently conjugated
mainly by UDP-glucuronosyltransferase UGT1A1 to form a
hydrophilic glucuronide conjugate. Genetic polymorphisms of
UGT1A1 are reportedly an important determinant of individ-
ual variation in susceptibility to the toxicity of irinotecan. Severe
toxicity is attributed, at least in part, to increased exposure to
SN-38 caused by decreased glucuronidation activity owing to
genetic polymorphisms of UGT1A1. Previous studies [14–20]
have provided evidence that the UGT1A1*28 polymorphism is
linked to irinotecan toxicity. Thus, the US FDA encourages
genetic testing to reduce the risk of UGT1A1*28-mediated
irinotecan toxicity.
Microsatellite polymorphisms that are typically copy num-
ber differences of two to four nucleotide repeats are a very
important class of genetic variations found in many genes.
One well-studied example of a microsatellite polymorphism,
the UGT1A1*28 allele, has been linked to a pharmacokinetic
phenotypic outcome. The TATA box in the promoter of this
allele generally includes a wild-type sequence of (TA)6TAA.

The UGT1A1*28 allele, however, has a two-base pair inser-
tion (TA) resulting in the sequence (TA)7TAA and is associ-
ated with impaired expression of UGT1A1 and reduced
glucuronidation of SN-38 [21]. Several previous reports of
allele frequencies of UGT1A1 variants have included (TA)5,
(TA)6, (TA)7, and (TA)8 in the TATA box in various ethnic
groups [22].
Initially, the UGT1A1*28 allele was a difficult target
sequence for assay development by the conventional SmartAmp
primer approach, because of a high frequency of mismatch
amplification. This phenomenon may be typical of promoter
polymorphisms that differ only in copy number of the repeat
sequence. We have improved the SmartAmp method for
detecting the UGT1A1*28 polymorphism by using a com-
petitive probe (CP) to suppress mis-amplification [23]. By
using the CP with complete homology to the repetitive “TA”
dinucleotides and some flanking sequence on either side,
hybridization to the mismatch allele can be favored, because
its melting temperature (Tm) is higher than that of the unfa-
vorable mismatch hybridization event to the discrimination
primer (i.e., FP).
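The UGT1A1 TATA-box alleles described above differ only in the number of TA repeats preceding TAA. As an illustration of the polymorphism itself (not of the SmartAmp chemistry), the following sketch counts the repeats in a sequence string. The flanking bases in the example are synthetic placeholders, not the real UGT1A1 promoter sequence, and the function names are hypothetical.

```python
import re


def count_ta_repeats(promoter_seq: str):
    """Return n for the longest (TA)nTAA motif in the sequence, or None if absent."""
    matches = re.findall(r"((?:TA)+)TAA", promoter_seq.upper())
    if not matches:
        return None
    return max(len(m) for m in matches) // 2


def describe_allele(n_repeats) -> str:
    """Label the repeat count using only the names given in the text."""
    if n_repeats == 6:
        return "(TA)6TAA, wild type"
    if n_repeats == 7:
        return "(TA)7TAA, UGT1A1*28"
    return f"(TA){n_repeats}TAA, other variant"


# Synthetic flanks ("GCC", "GGC") are placeholders for illustration only.
for n in (5, 6, 7, 8):
    seq = "GCC" + "TA" * n + "TAA" + "GGC"
    print(describe_allele(count_ta_repeats(seq)))
```

Because the alleles differ by a single dinucleotide within a repeat, a primer can hybridize to the wrong allele with only a small penalty, which is consistent with the mis-amplification problem the competitive probe is meant to suppress.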
3. Detection of triallelic SNPs in Human ABC transporter
ABCB1 gene.
Human ABC transporter ABCB1 (P-glycoprotein/
MDR1) was originally identified as a multidrug export pump
overexpressed in cancer cells, whereas it is also expressed in
many normal tissues. For example, ABCB1 is located in the
apical domain of the enterocytes of the gastrointestinal tract
(jejunum and duodenum) and limits the uptake and absorp-
tion of drugs and other substrates from the intestine into the
systemic circulation by excreting substrates into the gastroin-
testinal tract. In addition, the expression of ABCB1 on the
luminal membrane of capillary endothelial cells of the brain
restricts drug distribution into the central nervous system.
This function of ABCB1 appears to be very important for pro-
tecting the central nervous system from attack by toxic com-
pounds. A similar protective role to limit the distribution of
potentially toxic xenobiotics into tissues was suggested for
ABCB1 expressed in the placenta and the testis. ABCB1
expressed in the canalicular domain of hepatocytes and the
brush border of proximal renal tubules plays a role in the
biliary and urinary excretion of xenobiotics and drugs.
There is increasing recognition of triallelic SNPs in the
genome and their possible role in varied responses to drugs. It
has been shown that nonsynonymous polymorphisms (2677G>T,
A, or C) at amino acid position 893 (Ala>Ser, Thr, or Pro) have
a great impact on both the activity and the substrate specificity
of the human ABC transporter ABCB1 (P-glycoprotein/

MDR1) [24, 25]. While the A893P variant (2677G>C) is a
rare mutation, triallelic SNPs of 2677G, 2677T, and 2677A
exhibit wide ethnic differences in allele frequency, and these
non-synonymous polymorphisms are suggested to be clinically
important [25]. However, Hüebner et al. have tested and com-
pared widely used methods with respect to their error-
producing potential in detecting triallelic SNPs [26]. Their
study revealed that all methods tested, except Sequenom, produced
errors for detection of the triallelic SNP (2677G>T/A)
in the human ABCB1 (P-glycoprotein/MDR1) gene. In this
context, we examined whether the SmartAmp method could
accurately detect the triallelic SNPs in the ABCB1 gene. The
corresponding results are shown in a recent article [27], demonstrating
that the SmartAmp method could accurately detect
and discriminate all possible homozygotes and heterozygotes
of the triallelic SNPs.
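For the three common alleles at position 2677 (G, T, and A), "all possible homozygotes and heterozygotes" amounts to six diploid genotypes. The short sketch below enumerates them; it is only a counting aid, with illustrative names, and says nothing about how the assay distinguishes them.

```python
from itertools import combinations_with_replacement

ALLELES_2677 = ("G", "T", "A")  # the rare 2677C allele is left out here


def diploid_genotypes(alleles=ALLELES_2677):
    """List every unordered pair of alleles, homozygotes included."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


genotypes = diploid_genotypes()
homozygotes = [g for g in genotypes if g[0] == g[-1]]
heterozygotes = [g for g in genotypes if g[0] != g[-1]]
print(genotypes)
print(len(homozygotes), "homozygotes,", len(heterozygotes), "heterozygotes")
```

A biallelic SNP has only three genotypes, so a triallelic site doubles the number of classes an assay has to separate.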

4.3 Positive and Negative Controls for SmartAmp-Based SNP Detection

As the positive control, we use three types of isolated genomic
DNA with homozygous WT/WT, heterozygous WT/SNP, or
homozygous SNP/SNP in the gene of interest. The sequences of
those genetic polymorphisms should be analyzed and confirmed
by conventional DNA sequence analysis. The concentration of the
genomic DNA in each control is adjusted to be about 10 ng/μl.
The negative control is distilled water. For control experiments,
1 μl of the positive or negative control is taken and added to the
SmartAmp reaction mixture (final volume 25 μl). The negative
control should not lead to any DNA amplification during the
SmartAmp reaction over time up to 60 min.

4.4 Sensitivity Check

By using the positive controls, the sensitivity of SmartAmp-based
SNP detection can be checked. We gain insight into the minimal
detection limit of SmartAmp-based SNP detection by diluting the
positive controls (WT/WT, WT/SNP, and SNP/SNP) in a stepwise
manner. As described above, the minimal detection limit of
SmartAmp-based SNP detection should be 5 ng genomic DNA
per 1 μl of test sample or even lower. Determining the minimal
detection limit is a prerequisite for clinical applications, since we use
heat-pretreated blood samples that contain varying numbers of
white blood cells. Caution is needed for SmartAmp-based SNP
detection, in particular when we use blood samples from patients
with leucopenia.
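A stepwise dilution of the roughly 10 ng/μl positive controls can be planned as in the sketch below. The twofold step size, the number of steps, and the example detection results are assumptions made for illustration; the text specifies only that the dilution is stepwise and that the limit should be 5 ng/μl or lower.

```python
REQUIRED_LIMIT_NG_PER_UL = 5.0  # detection limit the assay should reach, from the text


def twofold_series(start_ng_per_ul: float, steps: int):
    """Concentrations of a twofold serial dilution, starting with the undiluted control."""
    return [start_ng_per_ul / 2 ** i for i in range(steps)]


def lowest_detected(series, detected):
    """Lowest concentration that still gave amplification, or None if none did."""
    hits = [conc for conc, ok in zip(series, detected) if ok]
    return min(hits) if hits else None


series = twofold_series(10.0, steps=4)
# Hypothetical outcome: amplification seen down to the third dilution step.
detected = [True, True, True, False]
limit = lowest_detected(series, detected)
print("dilution series (ng/ul):", series)
print("observed limit:", limit, "ng/ul")
print("meets requirement:", limit is not None and limit <= REQUIRED_LIMIT_NG_PER_UL)
```

Running each genotype control (WT/WT, WT/SNP, SNP/SNP) through the same series would show whether the limit holds for both allele-specific reactions.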

4.5 Clinical Applications

One of the biggest advantages of SmartAmp-based SNP detection
is the simple procedure for the end-user. In clinical use, the
end-user only needs to mix a lysed specimen (blood) with the reagent
mix. The entire assay is performed in a closed tube, which not only
simplifies the process, but also reduces the risk of contamination.
Furthermore, for clinical SNP detection, we have most recently
[Figure 7: workflow schematic. Samples are dispensed into reaction wells; the SmartAmp reaction runs at 60 °C for 30–40 min with end-point measurement; a CCD camera with lens and filter, linked to a PC, reports the result as an SNP digital pattern.]

Fig. 7 Schematic illustration for end-point detection of SmartAmp-based SNP typing with a CCD camera-linked
digital processor

developed an end-point detection system with a charge-coupled

device (CCD) camera-linked digital processor (Fig. 7). By using
96-well or 384-well plates and automated dispenser units, the
throughput of SmartAmp reactions could be markedly increased.
In that sense, the end-point determination can be considered digi-
tal, perhaps enabling simple and cost-effective detection method-
ologies that could be deployed in countries of limited financial
resources for health care diagnostics.

Acknowledgments

The authors thank Dr. Alexander Lezhava (RIKEN Omics Science

Center) and Mr. Makoto Nagakura and Mr. Takeaki Fukami
(BioTec Co., Ltd.) for their fruitful discussion. The authors’ study
was supported by a Japan Science and Technology Agency (JST)
research project named “Development of the world’s fastest SNP
detection system” (to T.I.) and a Research Grant for RIKEN
Omics Science Center from the Ministry of Education, Culture,
Sports, Science and Technology (to Y.H.).

References

1. Mitani Y et al (2007) Rapid SNP diagnostics using asymmetric isothermal amplification and a new mismatch-suppression technology. Nat Methods 4:257–262
2. Mitani Y et al (2009) A rapid and cost-effective SNP detection method: application of SmartAmp2 to pharmacogenomics research. Pharmacogenomics 10:1187–1197
3. Watanabe J et al (2007) Use of a competitive probe in assay design for genotyping of the UGT1A1*28 microsatellite polymorphism by the smart amplification process. Biotechniques 43:479–484
4. Toyoda Y et al (2009) Earwax, osmidrosis, and breast cancer: why does one SNP (538G>A) in the human ABC transporter ABCC11 gene determine earwax type? FASEB J 23:2001–2013
5. Yoshiura K et al (2006) A SNP in the ABCC11 gene is the determinant of human earwax type. Nat Genet 38:324–330
6. Miura K et al (2007) A strong association between human earwax-type and apocrine colostrum secretion from the mammary gland. Hum Genet 121:631–633
7. Ota I et al (2010) Association between breast cancer risk and the wild-type allele of human ABC transporter ABCC11. Anticancer Res 30:5189–5194
8. Toyoda Y, Ishikawa T (2010) Pharmacogenomics of human ABC transporter ABCC11 (MRP8): potential risk of breast cancer and chemotherapy failure. Anticancer Agents Med Chem 10:617–623
9. Kimura Y et al (2011) Optimization of turn-back primers in isothermal amplification. Nucleic Acids Res 39:e59
10. Kaminsky LS, Zhang ZY (1997) Human P450 metabolism of warfarin. Pharmacol Ther 73:67–74
11. Cannegieter SC et al (1995) Optimal oral anticoagulant therapy in patients with mechanical heart valves. N Engl J Med 333:11–17
12. Fihn SD et al (1993) Risk factors for complications of chronic anticoagulation. Ann Intern Med 118:511–520
13. Aomori T et al (2009) Rapid SNP detection of the cytochrome P-450 (CYP) 2C9 and the vitamin K oxide reductase (VKORC1) gene for the warfarin dose adjustment by Smart-Amplification process version 2. Clin Chem 55:804–812
14. Ando Y et al (2000) Polymorphisms of UDPglucuronosyltransferase gene and irinotecan adverse reactions: a pharmacogenetic analysis. Cancer Res 60:6921–6929
15. Ando Y, Hasegawa Y (2005) Clinical pharmacogenetics of irinotecan (CPT-11). Drug Metab Rev 37:565–574
16. Rouits E et al (2004) Relevance of different UGT1A1 polymorphisms in irinotecan-induced toxicity: a molecular and clinical study of 75 patients. Clin Cancer Res 10:5151–5159
17. Iyer L et al (2002) UGT1A1*28 polymorphism as a determinant of irinotecan disposition and adverse reactions. Pharmacogenomics J 2:43–47
18. Innocenti F et al (2004) Genetic variants in the UDP-glucuronosyltransferase 1A1 gene predict the risk of severe neutropenia of irinotecan. J Clin Oncol 22:1382–1388
19. Marcuello E et al (2004) UGT1A1 gene variations and irinotecan treatment in patients with metastatic colorectal cancer. Br J Cancer 91:678–682
20. Kitagawa C et al (2005) Genetic polymorphism in the phenobarbital-responsive enhancer module of the UDPglucuronosyltransferase 1A1 gene and irinotecan toxicity. Pharmacogenet Genomics 15:35–41
21. Hasegawa Y et al (2006) Pharmacogenetic approach for cancer treatment-tailored medicine in practice. Ann N Y Acad Sci 1086:223–232
22. Innocenti F et al (2002) Haplotype structure of the UDP-glucuronosyltransferase 1A1 promoter in different ethnic groups. Pharmacogenetics 12:725–733
23. Watanabe J et al (2007) Complete suppression of background amplification using competitive probe in a SMart-Amplification process assay for microsatellite polymorphism genotyping of UGT1A1*28. Biotechniques 43:479–484
24. Leschzinger GD et al (2007) ABCB1 genotype and PGP expression, function and therapeutic drug response: a critical review and recommendations for future research. Pharmacogenomics J 7:154–179
25. Sakurai A et al (2007) Quantitative SAR analysis and molecular dynamic simulation to functionally validate nonsynonymous polymorphisms of human ABC transporter ABCB1. Biochemistry 46:7678–7693
26. Hüebner C et al (2007) Triallelic single nucleotide polymorphisms and genotyping error in genetic epidemiology studies: MDR1 (ABCB1) G2677/T/A as an example. Cancer Epidemiol Biomarkers Prev 16:1185–1192
27. Ishikawa T et al (2010) Emerging new technologies in pharmacogenomics: rapid SNP detection, molecular dynamic simulation, and QSAR analysis methods to validate clinically important genetic variants of human ABC transporter ABCB1 (P-gp/MDR1). Pharmacol Ther 126:69–81
Chapter 4

MALDI-TOF Mass Spectrometry

Dirk van den Boom, Matthias Wjst, and Robin E. Everts

Abstract
Major strengths of mass spectrometry analysis include the accuracy of the detection principle, automatic
data storage as well as simplicity and flexibility of assay design making it a premier choice for targeted
genotyping of sequence variations. We explain the assay principle in detail and give step-by-step laboratory
instructions. Finally, references pointing toward further use of mass spectrometry analysis for molecular
haplotyping, re-sequencing, and quantitative analysis of copy number variations and gene expression
are given.

Key words Matrix-assisted laser desorption/ionization mass spectrometry, High-throughput geno-

typing, Haplotype, Copy number variation, Gene expression, Re-sequencing

1 Introduction

The efficacy of drugs is dependent on absorption, distribution,

metabolism, excretion (ADME), and toxicity. These processes are
dependent on the interaction of genes and their products with the
drug compounds, mainly the so-called drug metabolizing enzyme (DME)
genes [1]. Pharmacogenomic studies rely on genetically deter-
mined differences in individuals that are thought to influence treat-
ment response or side effects of a drug. The genetic differences are
in part subtle changes in the nucleotide sequence of the genome
that can influence gene expression levels or gene function. Although
there are many repeat regions and small insertions and deletions,
the main sources for human genetic variation are single base pair
exchanges (single nucleotide polymorphisms or SNPs) and struc-
tural variations such as copy number polymorphisms (copy number
variations or CNVs) [2, 3] that occur in functionally important
genomic regions.
Genes of the cytochrome P450 superfamily, such as CYP2D6,
CYP2C9, and CYP2C19, have been studied with sufficient detail
that there are phenotypic classifications as to how variations in
these genes directly impact drug efficacy (further refs. 4–6).

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_4, © Springer Science+Business Media, LLC 2013

72 Dirk van den Boom et al.

The use of large-scale association studies of genotypes from

many individuals participating in a clinical trial is considered to be
the most promising method to identify responders and non-
responders to a particular treatment. Relevant SNPs may be situ-
ated directly in genes targeted by a specific treatment, e.g., a
receptor, as well as in the signalling cascade, even in parallel path-
ways or in genes involved in the metabolizing pathway of certain
drugs.
There has been intensive worldwide research into different
assay methods for more than two decades. The technical possibili-
ties to discover and genotype SNPs in individuals have expanded
significantly and a plethora of different methods is available today
[7]. Today there is an estimated count of nearly 15 million SNPs,
1 million short insertions and deletions as well as 20,000 structural
variants in the human genome [8].
Analyzing these variations requires high-throughput (HT)
methods that are accurate, flexible towards the type of variant
(SNPs, In/Dels, CNVs), able to accommodate new variants
as they are discovered, and scalable to the study size usually
required to obtain sufficient statistical power.

1.1 High-Throughput Genotyping

Despite the plethora of available technologies, not all genotyping
methods are suitable for HT genotyping. A major requirement for
HT genotyping is automation—from sample preparation to auto-
mated readout of the genotype. Another requirement is the avail-
ability of sufficient DNA template—the reason why nearly all
methods are based on PCR amplification. Timing, throughput and
accuracy are also critical. Missing or incorrect genotypes, even in a
minor number of samples, may double the time for genotyping.
Either individual samples need to be re-arrayed in a second step
from original plates or repeated from the same source. Average
setup, implementation and process time for an assay are therefore
important factors to consider. Finally, accuracy is extremely impor-
tant, as running all assays in duplicate or triplicate would not be
cost efficient.
Current methods for targeted genotyping combine at least one
of four different principles of allelic discrimination (hybridization,
primer extension, ligation, or restriction) with one of four different
detection techniques (chemiluminescence/fluorescence, fluorescence
polarization, resonance energy transfer, and mass spectrometry).
Assay formats still range from gel electrophoresis, plates,
particles, fiber arrays, and microchip arrays to semi-homogeneous and
homogeneous assays that do not require any further sample separation or
purification.
The major strengths of mass spectrometric analysis are the
inherent accuracy of this detection principle, the automatic data
accumulation and interpretation, the high-throughput capacity
and the ability to analyze not only SNPs but also more complex
sequence variations, including quantitative analysis such as copy

number variants [9]. The instrumentation comes with slightly
higher initial setup costs compared to other methods, but these
amortize very quickly in high-throughput application. More
importantly, the effort required for development and implementa-
tion of assays and assay panels is very low. Therefore, mass spec-
trometry appears to be particularly suitable for fast setup and
analysis of a large number of markers. In addition to the large
genotyping capacity, MALDI-TOF MS provides the possibilities of
multiplexing and even second-use functions (quantification of
allele frequencies, sequencing and even protein analysis), which
renders this technology universally applicable.

1.2 Mass Spectrometry

The importance of mass spectrometry in the field of proteomics
and genomics has steadily increased over the last two decades.
While mass spectrometry has long been a prominent method in
analytical chemistry, the analysis of biomolecules appeared to be a
problematic task for several reasons.
Generally, in mass spectrometry an ion source is coupled with
a mass analyzer equipped with a detection system. The ion source
generates gas-phase ions of the molecules of interest. The genera-
tion of analyte ions is a prerequisite because mass analyzers usually
apply either magnetic or electrical fields for the molecular mass
determination. The process of desorption and ionization
is therefore a crucial step. It needs to proceed as gently as possible to avoid
decomposition of the analyte, and the lack of appropriate methods
to produce intact ions of large biomolecules such as nucleic acids
and proteins initially hampered the application of mass
spectrometry.
With the introduction of the “soft ionization” methods, elec-
trospray ionization (ESI) and matrix-assisted laser desorption/ion-
ization (MALDI) at the end of the 1980s, the accessible mass
range for biomolecules was expanded so significantly that both
methods now can be seen as cornerstones of modern molecular
analysis in proteomics and genomics. This development was
rewarded with the Nobel Prize in chemistry 2002 to Fenn and
Tanaka. MALDI-TOF MS in particular has significantly impacted
the field of nucleic acid analysis.
During MALDI the analyte molecules are mixed with a small
molecular weight compound, the matrix. Typically these are small
organic molecules with an absorption maximum close to the laser
wavelength used for subsequent irradiation. The matrix is used in
high molar excess over the analyte. The matrix-analyte mixture is
then irradiated with a laser beam (lasers emitting in the UV wave-
length or mid-infrared lasers are most common). The irradiation
triggers a micro-explosion during which the analyte molecules are
co-desorbed into the gas phase with the matrix. The matrix mole-
cules almost exclusively absorb the laser energy and this allows the
generation of intact gas-phase analyte molecules. The most common
mass analyzer employed with a MALDI ion source is a time-of-
flight (TOF) mass analyzer. All ions generated in the desorption
process are accelerated to an almost uniform translational energy
by means of an electric field. They then enter a field-free drift
region and traverse it with a velocity that depends on their
mass-to-charge ratio. The time for travelling through this drift
region is recorded and allows determination of the analyte mass.
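The relationship between drift time and mass can be illustrated with a minimal sketch of an idealized linear TOF analyzer. The acceleration voltage (20 kV) and drift length (1 m) are assumed values chosen for illustration, not instrument specifications, and the function name is hypothetical.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19   # coulombs
DALTON_KG = 1.66053906660e-27           # kilograms per dalton


def flight_time_us(mass_da, charge=1, accel_voltage_v=20_000.0, drift_length_m=1.0):
    """Idealized drift time (microseconds) of an ion in a linear TOF analyzer.

    Energy balance: z*e*U = 0.5*m*v**2, so v = sqrt(2*z*e*U/m) and t = L/v.
    Voltage and drift length are illustrative assumptions.
    """
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2.0 * charge * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_kg)
    return drift_length_m / velocity * 1e6


if __name__ == "__main__":
    for mass in (4500.0, 6000.0, 9000.0):
        print(f"{mass:7.1f} Da -> {flight_time_us(mass):6.2f} us")
```

Under these assumptions the drift time grows with the square root of the mass, so an ion four times heavier takes twice as long to reach the detector.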
The use of mass spectrometry for analysis of nucleic acids pro-
vides significant advantages. First and foremost, this analytical
method determines an inherent physical property of the molecule
of interest, the molecular mass. In principle, this provides
a higher accuracy than indirect analysis through, for example,
fluorescent labels or assessment of gel electrophoretic mobility. The
flight time of a molecule is not affected by its three-dimensional
structure. Side-products sometimes generated in enzymatic reac-
tions usually exhibit a different mass and thus do not lead to mis-
interpretation of data. Additionally, MALDI-TOF MS provides
very high analytical speed. The process proceeds in microseconds
and thus provides very fast turnaround times. Mass spectra provide
a very simple data format, which lends itself to automated data
interpretation without the help of statistical tools. The current
rate-limiting step is the laser repetition rate. With current 200 Hz
lasers, sample acquisition and real-time data analysis can be com-
pleted in 400 ms.
SNP genotyping by matrix-assisted laser desorption/ioniza-
tion time-of-flight (MALDI-TOF) mass spectrometry takes advan-
tage of mass differences between allele-specific primer extension
products. Several methods have been developed, including the
PinPoint assay, the GOOD assay, and the homogeneous
MassEXTEND assay, which was further developed into the
iPLEX®/iPLEX Gold assay by SEQUENOM [9–11].
A representative scheme is depicted in Fig. 1.
The iPLEX Gold assay is currently the most widely adopted
method of MALDI-TOF MS-based genotyping. It is a homogeneous
assay format that involves amplification of target regions
followed by a post-PCR primer extension reaction, in which a
primer is annealed immediately adjacent to the SNP position and
extended allele-specifically to determine which alleles are present.
Two different primer extension principles are used most often
(see Fig. 1a, b). In the first implementation, the reaction cocktail
contains four terminating nucleotides such as ddNTPs or acy-
cloNTPs in combination with thermostable DNA polymerase. In a
cycled reaction, the extension primer is extended by exactly one
nucleotide and alleles are differentiated by the molecular mass of
the extension products. The terminator nucleotides can be mass-
optimized to allow optimal separation. This concept is very often
used when multiplex reactions up to 40-plexes are designed as this
MALDI-TOF Mass Spectrometry 75

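[Figure 1. Panel (a): primer extension with four terminating nucleotides and DNA polymerase; extension products differ by +1 nt, Δm = 16 Da. Panel (b): primer extension with one elongator (dNTP), three terminators (ddNTP), and DNA polymerase; extension products differ by +1 nt, Δm = 305.2 Da. Spectra are plotted as intensity against m/z.]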

Fig. 1 Schematic representation of primer extension reactions and analysis by mass spectrometry for genotyping
of sequence variants/single nucleotide polymorphisms. (a) depicts an example of an extension reaction
where only terminating nucleotides are used in the extension reaction. Alleles are identified by the corresponding
molecular mass, and the difference in the molecular mass of extension products is driven by the mass difference
between the terminator nucleotides. (b) depicts an example of a multibase extension reaction. Here a
mixture of one elongator with three terminator nucleotides is modelled. The extension products differ in molecular
mass by a nucleotide. While single base extension designs usually allow for higher multiplexing, mixtures of
elongators/terminators are occasionally the more appropriate choice for genotyping insertion/deletions

implementation gives the most compact spacing of extension
products (Fig. 2). In the second implementation, the nucleotide
mix contains one elongator nucleotide (e.g., a dNTP) and
three terminating nucleotides (e.g., ddNTPs). Here the extension
products differ in length by one nucleotide (see Note 1). This
implementation offers more flexibility for designing assays for
insertion/deletion polymorphisms, but at times does not achieve the same level of
multiplexing. The extension products of the cycled reaction are
conditioned by the addition of ion-exchange resin, and only a few
nanoliters of product are transferred onto a prefabricated matrix-
loaded chip array with a nanodispensing device. Chip arrays can
carry up to 384 samples that are analyzed automatically by
MALDI-TOF MS.
Prior to the primer extension reaction, shrimp alkaline phos-
phatase (SAP) is added to the PCR product. This dephosphory-
lates any residual deoxynucleotides (dNTPs), which otherwise
would interfere with the allele-specific termination. The heat-labile
SAP is then easily inactivated. The assay allows a single-tube add-
on procedure, where addition of ion-exchange resin provides for
sample conditioning and is more amenable to automated sample
preparation.
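For the single-base extension format, the expected allele masses follow directly from the primer mass plus the mass of the incorporated terminator. The sketch below assumes average masses for unmodified dideoxynucleotide residues; mass-optimized commercial terminators will differ, and the function names are hypothetical.

```python
# Average mass (Da) added when one unmodified dideoxynucleotide is incorporated.
# Assumption: unmodified ddNMP residues; mass-modified terminators have other values.
TERMINATOR_MASS_DA = {"C": 273.2, "T": 288.2, "A": 297.2, "G": 313.2}


def extension_product_masses(primer_mass_da, alleles):
    """Return {allele: expected mass} for a single-base extension of one primer."""
    return {base: round(primer_mass_da + TERMINATOR_MASS_DA[base], 1) for base in alleles}


def allele_mass_difference(primer_mass_da, allele_1, allele_2):
    """Mass separation (Da) between the two extension products of a biallelic SNP."""
    masses = extension_product_masses(primer_mass_da, (allele_1, allele_2))
    return round(abs(masses[allele_1] - masses[allele_2]), 1)


if __name__ == "__main__":
    # Hypothetical 6000 Da extension primer probing an A/G SNP.
    print(extension_product_masses(6000.0, "AG"))
    print(allele_mass_difference(6000.0, "A", "G"))
```

With these assumed values an A/G pair is separated by 16 Da, which matches the mass difference shown for the terminator-only format in Fig. 1a.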
[Figure 2: mass spectrum of a multiplexed genotyping reaction. x-axis: Mass (4500–9000); y-axis: Intensity. Peaks are labeled with the unextended primers (UEP/EP, identified by rs number) and the corresponding allele signals (A, C, G, T).]

Fig. 2 Depicted is a representative mass spectrum of a 36-plexed SNP genotyping reaction covering multiple sequence variants influencing drug metabolism and
transport (for details please see Everts R. et al., Application Note, Development and Validation of the iPLEX ADME PGx Panel on the MassARRAY System). The inten-
sity of mass signals (y-axis) is plotted against the molecular mass of extension products (x-axis) of all assays in the multiplex. Assays are color-coded consisting
of the unextended primer (EP) referencing the rs number of the sequence variant, e.g., EP.rs2282143, followed by the alleles, e.g., C and T in the same color
Sequenom has optimized this protocol for multiplexing up to
40 sequence variants/SNPs in a single reaction well. The following
application note summarizes this procedure. A critical step is the
use of provided reagents and thermal cycling parameters (Table 2).

2 Materials

Although all materials may be ordered from individual suppliers,
optimized reagents can be ordered on Sequenom’s website,
http://www.sequenom.com.
1. MassARRAY® Analyzer 4 system (SEQUENOM catalog
#10411).
2. Assay Design Suite available on http://www.mysequenom.com.
3. PCR enzyme.
4. PCR buffer.
5. dNTPs.
6. PCR primers (obtain from oligonucleotide supplier).
7. iPLEX Gold enzyme (SEQUENOM catalog #10148/10148-2
OR 10142/10142.2).
8. iPLEX Gold termination mix (SEQUENOM catalog
#10148/10148-2 OR 10142/10142.2).
9. iPLEX Gold buffer (SEQUENOM catalog #10148/10148-2
OR 10142/10142.2).
10. MassEXTEND Mix (SEQUENOM catalog #10039-10048).
11. MassEXTEND primers (obtain from oligonucleotide
supplier).
12. Thermo Sequenase (SEQUENOM catalog #10186).
13. Shrimp Alkaline Phosphatase (SAP).
14. SpectroCHIP® Arrays & Clean Resin Kit #10117.
15. Clean Resin Kit #10118.
16. Clean Resin Dimple Plate (SEQUENOM catalog #11235).

3 Methods

3.1 Assay Design Considerations

For designing highly multiplexed genotyping assays, specific primer
design software is available that designs PCR and EXTEND primers
for each SNP (or insertion/deletion polymorphism) to be
investigated. It uses a multiplexing algorithm developed to take full
advantage of the available mass range while avoiding overlapping
mass signals in the analyzed mass range. The program is also
designed to consider potential unwanted intra- and inter-primer
interactions in order to avoid mis-amplification and false extension
products. If a larger set of sequence variants has to be multiplexed,
the program will determine an appropriate number of wells
over which to distribute the assays. It also allows new assays to be inserted
into existing multiplexes (also referred to as superplexing) when
new content should be added to an existing multiplex assay panel.
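The idea of distributing assays over wells while avoiding overlapping mass signals can be sketched with a simple greedy first-fit procedure. This is not the algorithm of the commercial design software; the minimum peak separation (30 Da), the function names, and the example masses are assumptions, and primer-primer interactions are ignored.

```python
def conflicts(assay_masses, well_masses, min_separation_da=30.0):
    """True if any peak of the new assay lies too close to a peak already in the well."""
    return any(abs(m - w) < min_separation_da for m in assay_masses for w in well_masses)


def assign_to_wells(assays, max_plex=40, min_separation_da=30.0):
    """Greedy first-fit assignment.

    assays: {assay name: [expected peak masses in Da]}
    Returns a list of wells, each a dict {assay name: masses}.
    """
    wells = []
    for name, masses in assays.items():
        for well in wells:
            occupied = [m for peaks in well.values() for m in peaks]
            if len(well) < max_plex and not conflicts(masses, occupied, min_separation_da):
                well[name] = masses
                break
        else:
            # No existing well can take this assay: open a new one.
            wells.append({name: masses})
    return wells


if __name__ == "__main__":
    example = {
        "assay_a": [5000.0, 5273.2, 5297.2],   # primer and two allele peaks (hypothetical)
        "assay_b": [5010.0, 5283.2, 5307.2],   # collides with assay_a
        "assay_c": [6000.0, 6288.2, 6313.2],
    }
    for index, well in enumerate(assign_to_wells(example), start=1):
        print(f"well {index}: {sorted(well)}")
```

Adding a new assay to an existing panel (superplexing) corresponds, in this simplified picture, to running the same check against wells that are already populated.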
Prior to the hME/iPLEX reaction, the genomic DNA is ampli-
fied using the polymerase chain reaction (PCR, see Section 3.2 or
Notes 2, 3). The use of a 10-mer tag (5′-ACGTTGGATG-3′) on
the 5′ end of each PCR primer provides significant improvement in
overall performance. The tags increase the masses of unused PCR
primers so that they fall outside the mass range of analytical peaks
and help to balance amplification.
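The effect of the 10-mer tag on primer mass can be estimated with a short sketch. The residue masses are average values for unmodified DNA and the mass formula is the usual approximation for an unphosphorylated oligonucleotide; the example primer sequence and function names are hypothetical.

```python
HME_TAG = "ACGTTGGATG"  # 10-mer tag quoted in the text

# Average residue masses (Da) of unmodified DNA nucleotides within a strand.
RESIDUE_MASS_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def add_tag(primer, tag=HME_TAG):
    """Prepend the tag to the 5' end of a PCR primer (sequences written 5'->3')."""
    return tag + primer.upper()


def oligo_mass_da(sequence):
    """Approximate average mass of an unphosphorylated, unmodified DNA oligonucleotide."""
    return sum(RESIDUE_MASS_DA[base] for base in sequence.upper()) - 61.96


if __name__ == "__main__":
    primer = "TTGCAAGCTAGCTAGCTAGC"          # hypothetical 20-mer PCR primer
    tagged = add_tag(primer)
    print(f"untagged: {oligo_mass_da(primer):.1f} Da")
    print(f"tagged:   {oligo_mass_da(tagged):.1f} Da")
```

For this hypothetical 20-mer, the tag raises the mass by roughly 3.1 kDa, which moves the unused PCR primer above the 4500–9000 mass window plotted in Fig. 2.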
General Notes:
● Tubes and plates with reagents are lightly vortexed and centri-
fuged before use.
● Plates are sealed with adhesive PCR seals when not in use. The
reagents (stocks, dilutions, and finished cocktails in plates) are
stored at −20 °C when not in use.
● DNA samples are stored at −20 °C when not in use.
● DNA samples are stored at 4 °C when in use.

3.2 PCR

To prepare and process the PCR, perform the following steps:
1. Prepare a PCR cocktail as described in Table 1 (volumes are
provided on a per-well basis).
2. Cycle the PCR in a standard thermal cycler according to the
conditions described in Table 2.

3.3 SAP

1. 0.5 U of shrimp alkaline phosphatase (SAP), provided in a 2 μL
volume of enzyme/buffer, is then added to each PCR to
dephosphorylate unincorporated dNTPs from the amplification
reaction. The reaction is incubated at 37 °C for 40 min
followed by inactivation of the enzyme at 85 °C for 5 min.

Reagent                          Volume (μL)   Final concentration in 7 μL reaction volume
Nanopure water (HPLC grade)      1.53          N/A
SAP buffer                       0.17          0.025×
SAP enzyme                       0.3           0.5 U
Total volume                     2
Table 1
PCR cocktail

Reagent                                        Volume (μL)   Final concentration
Nanopure water                                 1.800         N/A
Genomic DNA (10 ng/μL)                         1.000         10 ng/rxn
PCR buffer containing 20 mM MgCl2 (10×) (a)    0.500         1.00×/2.00 mM MgCl2
Fresh dNTPs (25 mM) (b)                        0.100         500 μM each
Forward PCR primers (c) (500 nM each)          0.500         100 nM each
Reverse PCR primers (c) (500 nM each)          0.500         100 nM each
MgCl2 (25 mM)                                  0.400         2.00 mM
PCR enzyme (5 U/μL) SEQUENOM Inc.              0.200         1 U/rxn
Total                                          5.000

(a) The PCR buffer concentration should not exceed 1.25×. Higher salt concentrations have negative effects at the hME level
(b) Maximum of 5 freeze–thaws
(c) Containing a 10-mer tag: hME-10 (5′-ACGTTGGATG-3′). Do not use Q solution. It has negative effects on MALDI-TOF MS analysis

Table 2
PCR conditions

Cycles   Condition
1        95 °C for 2 min
45       95 °C for 30 s
         56 °C for 30 s
         72 °C for 1 min
1        72 °C for 5 min
1        4 °C hold
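The per-well volumes of the PCR cocktail in Table 1 can be scaled to a plate with a few lines of code. This is a minimal sketch: the 10 % pipetting overage and the decision to add genomic DNA separately per well are assumptions, and the function names are hypothetical.

```python
PCR_COCKTAIL_UL = {            # per-well volumes taken from Table 1
    "Nanopure water": 1.800,
    "Genomic DNA (10 ng/uL)": 1.000,
    "PCR buffer (10x)": 0.500,
    "dNTPs (25 mM)": 0.100,
    "Forward PCR primers": 0.500,
    "Reverse PCR primers": 0.500,
    "MgCl2 (25 mM)": 0.400,
    "PCR enzyme (5 U/uL)": 0.200,
}


def master_mix(per_well_ul, n_wells, overage=0.10, exclude=("Genomic DNA (10 ng/uL)",)):
    """Scale per-well volumes to n_wells plus an assumed overage; skip per-sample reagents."""
    factor = n_wells * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in per_well_ul.items() if name not in exclude}


def final_concentration(stock, volume_ul, total_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


if __name__ == "__main__":
    for reagent, volume in master_mix(PCR_COCKTAIL_UL, n_wells=384).items():
        print(f"{reagent:22s} {volume:8.2f} uL")
    # 25 mM (25,000 uM) dNTP stock, 0.1 uL in a 5 uL reaction
    print(final_concentration(25_000, 0.100, 5.0), "uM each")
```

The same dilution helper reproduces the 500 μM dNTP concentration listed in Table 1, which is a quick way to sanity-check a modified cocktail.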

3.4 Adjusting Primer Amount

The mass signals in the mass spectrum for a multiplexed reaction
may not have comparable heights. Variations in peak height may
stem from (1) inconsistent oligonucleotide quality, (2) inconsistent
oligonucleotide concentration, or (3) different desorption/ionization
behavior in MALDI. For best multiplexing results, the concentrations
of MassEXTEND primers should be adjusted to even
out peak heights (intensities) in the mass spectrum. This adjustment
must be done prior to preparing the iPLEX/MassEXTEND
reaction cocktail and processing the iPLEX reaction. The following
steps need to be performed to adjust primer mixes:
1. For each multiplex, prepare a mixture of the required primers.
The final concentration of each primer in the primer mix
should be 9 μM. Consider how much primer mix you will
need so that this step has to be performed only once for
the assay setup. Each single reaction (i.e., a single well in a
384-well microplate) requires 1 μL primer mix.
2. Pipette 1 μL of the primer mix into a well of a microplate and
add 24 μL Nanopure water to obtain a 360 nM dilution of the
primer mix (referred to as a primer mix sample).
3. Repeat steps 1 and 2 for each multiplex, to generate a micro-
plate containing primer mix samples for all of the multiplexes.
4. Add 6 mg CLEAN resin to each well of the microtiter plate
(MTP) using the dimple plate.
5. Dispense the primer mix samples to a precoated chip using
standard dispensing conditions for iPLEX reaction products.
6. Acquire spectra using the MassARRAY Typer software. Use
the assay definitions (in Typer) for the actual multiplexes. Each
well on the SpectroCHIP array will yield no-calls because
there is no analyte, only unextended primers. A mass signal
should appear at the expected mass for each primer in the mix.
A missing signal generally indicates poor primer quality or a
primer missing from the mix. An unexpected signal generally
indicates poor primer quality or the addition of an unnecessary
primer to the mix.
7. Check whether the primer mass signals in each mass spectrum
have comparable heights (see Note 4). If all mass signals are at
least 50 % the height of the highest mass signal, they are
acceptable. If any mass signal is less than 50 % the height of the
highest signal, add more of that primer: add the deficit relative to
the highest signal, in percent, as the same percentage of the initial
volume (e.g., for a signal at 40 % of the highest signal, add a further
60 % of the volume initially pipetted).
A corresponding report function is provided within the
supplied genotyping software (see Notes 5–7).
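The dilution in step 2 and the 50 % rule in step 7 can be written down as a small calculation. This is a sketch of the rule as described in the text, not the report function of the genotyping software; the function names and example values are hypothetical.

```python
def dilute(stock_nm, sample_ul, water_ul):
    """Concentration (nM) after mixing sample_ul of stock with water_ul of water."""
    return stock_nm * sample_ul / (sample_ul + water_ul)


def primer_top_up(peak_heights, initial_volume_ul, threshold=0.5):
    """Return {primer: extra volume (uL)} for primers whose peak is below the threshold.

    Rule from the text: the deficit relative to the highest peak (in percent)
    is added as the same percentage of the initially pipetted volume.
    """
    highest = max(peak_heights.values())
    extra = {}
    for primer, height in peak_heights.items():
        relative = height / highest
        if relative < threshold:
            extra[primer] = round((1.0 - relative) * initial_volume_ul[primer], 3)
    return extra


if __name__ == "__main__":
    print(dilute(9000, 1, 24), "nM")                      # 9 uM mix, 1 uL + 24 uL water
    heights = {"primer_1": 100.0, "primer_2": 40.0, "primer_3": 60.0}
    volumes = {"primer_1": 10.0, "primer_2": 10.0, "primer_3": 10.0}
    print(primer_top_up(heights, volumes))
```

After topping up, the adjusted mix would normally be re-measured, since the extra volume also changes the concentrations of the other primers slightly.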

3.5 iPLEX Reaction, Desalting and Dispensing

Once the MassEXTEND primer mixes have been adjusted, the
iPLEX extend reaction cocktail is prepared (Table 3), added to the
SAP-treated PCR product, and thermocycled.
Cycle the reaction as indicated in Table 4.
Dilute with 16 μL Nanopure water and add 6 mg CLEAN resin
to the Extend reaction products for conditioning (see Note 8).
Then incubate for 15 min at room temperature and keep the resin
particles in suspension during incubation. Spin the reaction vessel
at 3,200 × g (2,000 rpm for standard rotor centrifuge) prior to the
next step. Using a nanodispenser, 12–15 nL of the reaction product
is then transferred onto a 384-well SpectroCHIP array.
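The extension program in Table 4 appears to be a nested loop: 40 outer cycles, each containing one denaturation step and five short anneal/extend cycles. Assuming that reading of the table is correct, the program can be expanded into a flat step list to count cycles and estimate the programmed hold time (ramp times and the final 4 °C hold are not included). The function name is hypothetical.

```python
def extension_program(outer_cycles=40, inner_cycles=5):
    """Expand the nested extension program into a flat list of (temperature, seconds)."""
    steps = [("94 C", 120)]                      # initial denaturation
    for _ in range(outer_cycles):
        steps.append(("94 C", 5))                # denaturation, once per outer cycle
        for _ in range(inner_cycles):
            steps.append(("52 C", 5))            # annealing
            steps.append(("80 C", 5))            # extension
    steps.append(("72 C", 180))                  # final extension
    return steps


if __name__ == "__main__":
    program = extension_program()
    anneal_steps = sum(1 for temperature, _ in program if temperature == "52 C")
    hold_seconds = sum(seconds for _, seconds in program)
    print(f"{anneal_steps} anneal/extend cycles, {hold_seconds / 60:.1f} min of programmed holds")
```

Under this interpretation the primer goes through 200 short anneal/extend cycles in total.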

3.6 Desorption and Spectral Analysis, Assignment of Genotypes

Analysis of chip-transferred samples proceeds in a linear, delayed
extraction time-of-flight (TOF) mass spectrometer (MassARRAY
Analyzer 4). Mass spectra are acquired in positive ion mode
(all positively charged molecular ions are accelerated). The
SpectroCHIPs are introduced into the ion source and high-vacuum
Table 3
Extension reaction cocktail (per reaction well)

Reagent                                 Volume (μL)   Final concentration in 9 μL reaction volume
Nanopure water (HPLC grade)             0.619         N/A
iPLEX Gold buffer                       0.200         0.222×
iPLEX termination mix                   0.200         1×
Adjusted primer mix (~9 μM each) (a)    0.940         1.25×/1.875 mM MgCl2
iPLEX enzyme (32 U/μL)                  0.041         1.25 U/rxn
Total volume                            2.000

(a) Note that the primers in an adjusted mix may not be at 9 μM each. Each starts out at 9 μM; however, the addition of
extra amounts of some primers to adjust the mix will change the concentrations

Table 4
Extension conditions

Step   Cycles                    Condition
1      1                         94 °C for 2 min
2      40 (of steps 3 and 4)
3                                94 °C for 5 s
4      5 cycles                  52 °C for 5 s
                                 80 °C for 5 s
5      1                         72 °C for 3 min
6      1                         4 °C hold

conditions are applied. Image processing aligns the laser position
automatically to the chip element raster for fully automated scanning
of each chip position. Each matrix crystal is addressed individually
and irradiated with a 337 nm laser pulse of 1 ns duration.
The irradiation results in a plume of volatilized matrix and analyte.
In the gas phase, charge-transfer processes generate matrix and
analyte ions, which are accelerated in an electric field. The ions then
travel through a field-free region of approximately 1 m length, in which
their velocity is inversely proportional to the square root of their
mass-to-charge ratio. The
resulting time-resolved spectrum is then translated into a mass
spectrum by comparison with known calibrants. Usually four to six
sets of 15 single laser shots are accumulated and averaged into a
single spectrum. This average spectrum is then further processed
and analyzed using dedicated software (Typer Analyzer,
SEQUENOM) that performs baseline correction, peak identification
and quality assessments. The determination of corresponding
genotypes occurs in real time during data acquisition and is usually
completed within 1 s processing time (transit time of laser, laser
irradiation, spectra accumulation and analysis). If the mass spec-
trum is not of sufficient quality, the software will automatically
reacquire new data points from the same chip position before it
moves to the next chip position. This provides real-time control of
data quality and increases accuracy as well as call rates.
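The conversion of flight times into masses by comparison with calibrants can be illustrated with a two-point calibration. The sketch assumes the idealized relation t = t0 + k·sqrt(m) for singly charged ions; real instruments typically use more calibrants and may fit additional terms. Function names and calibrant values are hypothetical.

```python
import math


def fit_calibration(calibrants):
    """Fit t = t0 + k*sqrt(m) from two (mass_da, time_us) calibrant pairs."""
    (mass_1, time_1), (mass_2, time_2) = calibrants
    k = (time_2 - time_1) / (math.sqrt(mass_2) - math.sqrt(mass_1))
    t0 = time_1 - k * math.sqrt(mass_1)
    return t0, k


def time_to_mass(time_us, t0, k):
    """Convert a measured flight time into a mass using the fitted calibration."""
    return ((time_us - t0) / k) ** 2


if __name__ == "__main__":
    # Two hypothetical calibrant peaks with known masses and measured flight times.
    t0, k = fit_calibration([(4900.0, 35.5), (8100.0, 45.5)])
    print(f"t0 = {t0:.3f} us, k = {k:.3f} us/sqrt(Da)")
    print(f"peak at 40.5 us -> {time_to_mass(40.5, t0, k):.1f} Da")
```

Once the calibration constants are known, every point of the time-resolved spectrum can be mapped onto the mass axis in the same way.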

3.7 Other MALDI Applications

The focus of this chapter has been genotyping of SNPs using primer
extension methods and MALDI-TOF MS. Within recent years, the
portfolio of applications using MALDI-TOF MS as a detection
platform has expanded significantly. A majority of these new applications
not only rely on the accuracy provided by mass spectrometry
for qualitative analysis of nucleic acids, but also have
established measures for quantitative analysis of nucleic acids.
Recent publications describe the use of MALDI-TOF MS for relative
quantitation of genetic information in DNA pools and sample mixtures
[12–15]; re-sequencing methods, which allow the rapid discovery
of SNPs, the screening for mutations, or signature sequence-
based identification of organisms such as pathogens [16–18]; and
also relative and absolute quantitation in gene expression and copy
number variations [19]. A further interesting application is M1-PCR
for haplotyping [20]. Here, multiplex PCR performed on single
DNA molecules generated by dilution is combined with the specificity
of mass spectrometry read-outs to generate up to 25 kb haplotypes.
Recent reviews summarize these developments [9].

4 Notes

In addition to the above procedures it is worthwhile to also consider
the following points.
1. When designing multi-base extensions, DNA polymerase pausing
has occasionally been observed when the template
exhibits strong secondary structure. This leads to prematurely
terminated extension products, which can confound the
analysis if termination mixes are not selected carefully (note
that an extension primer elongated either with one ddGTP or
dATP will have the same molecular mass). Table 5 lists
termination mixes for biallelic SNPs, other than the standard
iPLEX termination mix recommended by SEQUENOM,
that prevent mass signal coincidence of pausing artifacts and
real termination events.
2. PCR reactions for the MassEXTEND reaction are usually per-
formed in low volumes (5 μL). It is important that the TE
Table 5
SNPs and suitable termination mixes in addition to SEQUENOM standard
iPLEX termination mix

SNP Termination mix

A/C dATP/ddCTP/ddGTP/ddTTP
A/G dGTP/ddATP/ddCTP/ddTTP
A/T dATP/ddCTP/ddGTP/ddTTP
C/G dGTP/ddATP/ddCTP/ddTTP
dCTP/ddATP/ddGTP/ddTTP
C/T dTTP/ddATP/ddCTP/ddGTP
G/T dGTP/ddATP/ddCTP/ddTTP
Ins/dels Dependent on sequence context

concentration in the genomic DNA does not inhibit the

amplification. Make sure that the genomic DNA does not
contain more than 0.25× TE buffer.
3. The matrix/crystallization process is sensitive to detergents.
PCR additives such as Q solution (provided with HotStarTaq)
may disturb the crystallization process and reduce the data
quality, and thus should be avoided.
4. Oligonucleotides of poor quality (increased amount of synthe-
sis failure products or strong depurination signals) will lead to
poor genotyping performance and may interfere with correct
genotype assignment. Make sure during the primer amount
adjustment that each primer generates only the desired mass
signal. Preferably, order primers from oligonucleotide manu-
facturers using MALDI-TOF MS for synthesis quality control.
5. SEQUENOM provides an iPLEX termination mix for single
base extension reaction that optimizes the mass separation of
extension products while also allowing levels of multiplexing
up to 40 SNPs. When designing genotyping assays manually
and with your own nucleotides, do not use termination mixes
containing all four dideoxynucleotides. Mass differences
between alleles would then be as little as 9 Da (ddATP/ddTTP
mass difference). This can be challenging to discriminate and
can lead to wrong genotype assignments. Additionally, the
mass difference between ddA/ddC and ddT/ddG falls close
to the mass of sodium adducts (22 Da), potentially leading to
misinterpretation of mass signals.
6. When designing assays manually, check PCR primers for mul-
tiple binding to the genome and for formation of primer
dimers and hairpins to avoid mis-amplification. Check
self-designed EXTEND primers for hairpin formation to avoid

self-extension. A software package is available from
SEQUENOM at http://www.mysequenom.com.
7. When designing assays, check target regions for copy number
variations or paralogous regions as both may impact the allele
ratios in the genotyping reaction and may lead to skewed
“genotype clusters”. While these regions cannot always be
avoided, special care should be taken towards their analysis,
for example by employing advanced clustering algorithms.
8. Desalting the MassEXTEND products with CLEAN resin is a
crucial step with strong impact on the data quality. It is impor-
tant that the resin particles stay in suspension during the 15 min
incubation step and do not settle. A rotation where plates are
turned upside down usually provides best performance.
Increased incubation temperature is not recommended.
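The concern raised in Note 5 about a mix of all four unmodified dideoxynucleotides can be checked numerically. The sketch below lists every pairwise mass difference and flags pairs that are either very close together or close to the 22 Da sodium adduct spacing. The residue masses are average values for unmodified ddNMPs, and the two thresholds (10 Da and ±3 Da) are illustrative assumptions, not recommended limits.

```python
from itertools import combinations

# Average mass (Da) added per incorporated unmodified dideoxynucleotide (assumed values).
DDNMP_MASS_DA = {"ddC": 273.2, "ddT": 288.2, "ddA": 297.2, "ddG": 313.2}
SODIUM_ADDUCT_DA = 22.0


def screen_termination_mix(masses=DDNMP_MASS_DA, min_difference_da=10.0, adduct_window_da=3.0):
    """Return (pair, mass difference, warning) for every pair of terminators."""
    report = []
    for (name_1, mass_1), (name_2, mass_2) in combinations(masses.items(), 2):
        difference = round(abs(mass_1 - mass_2), 1)
        if difference < min_difference_da:
            warning = "small mass difference"
        elif abs(difference - SODIUM_ADDUCT_DA) <= adduct_window_da:
            warning = "close to sodium adduct"
        else:
            warning = ""
        report.append((f"{name_1}/{name_2}", difference, warning))
    return report


if __name__ == "__main__":
    for pair, difference, warning in screen_termination_mix():
        print(f"{pair:8s} {difference:5.1f} Da  {warning}")
```

With these assumed masses, the ddT/ddA pair is separated by only 9 Da and the ddC/ddA and ddT/ddG pairs fall within a few daltons of the sodium adduct spacing, in line with the cautions in Note 5.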

Acknowledgments

Trademarks may be copyrighted by the respective owners.

References

1. Sadee W, Dai Z (2005) Pharmacogenetics/genomics and personalized medicine. Hum Mol Genet 14 Spec No. 2:R207–R214
2. Ring HZ, Kwok PY, Cotton RG (2006) Human variome project: an international collaboration to catalogue human genetic variation. Pharmacogenomics 7(7):969–972
3. Pang AW, MacDonald JR, Pinto D, Wei J, Rafiq MA, Conrad DF, Park H, Hurles ME, Lee C, Venter JC, Kirkness EF, Levy S, Feuk L, Scherer SW (2010) Towards a comprehensive structural variation map of an individual human genome. Genome Biol 11(5):R52
4. Gaedigk A, Ryder DL, Bradford LD, Leeder JS (2003) CYP2D6 poor metabolizer status can be ruled out by a single genotyping assay for the −1584G promoter polymorphism. Clin Chem 49(6 Pt 1):1008–1011
5. Mega JL, Close SL, Wiviott SD, Shen L, Hockett RD, Brandt JT, Walker JR, Antman EM, Macias W, Braunwald E, Sabatine MS (2009) Cytochrome p-450 polymorphisms and response to clopidogrel. N Engl J Med 360(4):354–362
6. Gaedigk A, Frank D, Fuhr U (2009) Identification of a novel non-functional CYP2D6 allele, CYP2D6*69, in a Caucasian poor metabolizer individual. Eur J Clin Pharmacol 65(1):97–100
7. Ragoussis J (2009) Genotyping technologies for genetic research. Annu Rev Genomics Hum Genet 10:117–133
8. Xue Y, Cartwright RA, Altshuler DL, Kebbel J, Kokko-Gonzales P, Nickerson DA (2010) A map of human genome variation from population-scale sequencing. 1000 Genomes Project Consortium. Nature 467(7319):1061–1073
9. Oeth P, del Mistro G, Marnellos G, Shi T, van den Boom D (2009) Qualitative and quantitative genotyping using single base primer extension coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MassARRAY). Methods Mol Biol 578:307–343
10. Tost J, Gut IG (2002) Genotyping single nucleotide polymorphisms by mass spectrometry. Mass Spectrom Rev 21:388–418
11. Ross P, Hall L, Haff LA (2000) Quantitative approach to single nucleotide polymorphism analysis using MALDI TOF mass spectrometry. Biotechniques 29:620–629
12. Bansal A, van den Boom D, Kammerer S, Honisch C, Adam G, Cantor CR, Kleyn P, Braun A (2002) Association testing by DNA pooling: an effective initial screen. Proc Natl Acad Sci USA 99:16871–16874
13. Werner M, Sych M, Herbon N, Illig T, König I, Wjst M (2002) Large-scale determination of SNP allele frequencies in DNA pools using MALDI-TOF mass spectrometry. Hum Mutat 20:57–64
14. Mohlke KL, Erdos MR, Scott LJ, Fingerlin TE, Jackson AU, Silander K, Hollstein P, Boehnke M, Collins FS (2002) High-throughput screening for evidence of association by using mass spectrometry genotyping on DNA pools. Proc Natl Acad Sci USA 99:16928–16933
15. Herbon N, Werner M, Braig C, Gohlke H, Dütsch G, Illig T, Altmüller J, Hampe J, Lantermann A, Schreiber S et al (2003) High-resolution SNP scan of chromosome 6p21 in pooled samples from patients with complex diseases. Genomics 81:510–518
16. Stanssens P, Zabeau M, Meersseman G, Remes G, Gansemans Y, Storm N, Hartmer R, Honisch C, Rodi CP, Bocker S et al (2004) High-throughput MALDI-TOF discovery of genomic sequence polymorphisms. Genome Res 14:126–133
17. Ehrich M, Böcker S, van den Boom D (2005) Multiplexed discovery of sequence polymorphisms using base-specific cleavage and MALDI-TOF MS. Nucleic Acids Res 33(4):e38
18. Honisch C, Chen Y, Mortimer C, Arnold C, Schmidt O, van den Boom D, Cantor CR, Shah HN, Gharbia SE (2007) Automated comparative sequence analysis by base-specific cleavage and mass spectrometry for nucleic acid-based microbial typing. Proc Natl Acad Sci USA 104(25):10649–10654
19. Ding C, Cantor CR (2003) A high-throughput gene expression analysis technique using competitive PCR and matrix-assisted laser desorption ionization time-of-flight MS. Proc Natl Acad Sci USA 100:3059–3064
20. Ding C, Cantor CR (2003) Direct molecular haplotyping of long-range genomic DNA with M1-PCR. Proc Natl Acad Sci USA 100:7449–7453
Chapter 5

TaqMan® Drug Metabolism Genotyping Assays
for the Detection of Human Polymorphisms Involved
in Drug Metabolism
Toinette Hartshorne

Abstract
Polymorphisms associated with genes that code for various drug-metabolizing enzymes (DMEs) and
associated transport proteins can influence the rate of drug metabolism within individuals, thus potentially
affecting drug efficacy and the occurrence of side effects. There are 2,700 unique TaqMan® Drug
Metabolism Genotyping Assays (Life Technologies) for detecting single nucleotide polymorphisms
(SNPs), insertions and deletions (indels), and multinucleotide polymorphisms (MNPs) in both coding and
regulatory regions. These research assays are useful tools for better understanding genetic variation in drug
metabolism. Here we describe the procedure for measuring genetic variation in human DNA using
TaqMan® Drug Metabolism Genotyping Assays. These assays are for research use only and are not intended
for any animal or human therapeutic or diagnostic use.

Key words Single nucleotide polymorphism, SNP, Drug-metabolizing enzymes, DME, Cytochrome
P450, CYP superfamily, TaqMan® Drug Metabolism Genotyping Assays, TaqMan® Genotyper
Software

1 Introduction

TaqMan® Drug Metabolism Genotyping Assays are used to detect
polymorphisms in 221 human genes that code for various drug-
metabolizing enzymes (DMEs) and associated transport proteins,
and were designed based on information from several private and
public SNP databases, including recognized public allele nomen-
clature sites. Genomic SNP context sequences were masked for
known polymorphisms (including SNPs from NCBI dbSNP and
Life Technologies SNP databases) and repetitive sequence ele-
ments, and were then submitted to a proprietary TaqMan® assay
design pipeline to generate primer and probe sequences. The
design pipeline optimized the sequence composition of primer and
probe sequences and included a BlastN search to reference genome
sequences to ensure specificity for the unique genomic target. This

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_5, © Springer Science+Business Media, LLC 2013

88 Toinette Hartshorne

genome quality check was crucial when designing assays for CYP
gene SNP targets, because the CYP gene family contains highly
homologous genes and pseudogenes. All assays have passed perfor-
mance tests involving 180 unique DNA samples from four differ-
ent populations. These TaqMan® DME Assays were designed to be
used with Applied Biosystems® real-time PCR systems.
Polymorphisms within the DME collection include single-
nucleotide polymorphisms (SNPs), insertion/deletions (indels),
and multinucleotide polymorphisms (MNPs). Initially, all poly-
morphisms were identified for the 221 genes, and the set was then
filtered to include only polymorphisms within regulatory elements,
coding regions, and splice junctions. Public allele nomenclature
sites were used to help assign common allele names to specific
polymorphisms in the DME collection. A complete list of the
TaqMan® Drug Metabolism Genotyping Assays, including any
available common allele names and refSNP identifiers as well as
SNP context sequences and other annotations, is available in the
Drug Metabolism Genotyping Assay Index file provided at the
TaqMan® Drug Metabolism Genotyping Assays product Web site.
TaqMan® DME Genotyping Assays utilize allelic discrimina-
tion analysis of genomic DNA (gDNA) using real-time or end-
point PCR data. Allelic discrimination is achieved using TaqMan®
chemistry with two fluorogenic probes to enable the allele-specific
discrimination of a single base pair in a PCR product as it accumu-
lates during PCR cycles (Fig. 1). A VIC® reporter dye is linked to
the 5′ end of the Allele 1 probe, and a FAM™ dye is linked to the
5′ end of the Allele 2 probe. A minor groove binder (MGB)
increases the probe melting temperature (Tm) and sensitivity,
allowing the design of shorter probes and resulting in improved
allelic discrimination. The increase in fluorescence signal occurs
when probes that have been hybridized to the complementary
sequence are cleaved during amplification by the 5′ nuclease activ-
ity of the AmpliTaq Gold® DNA Polymerase. Thus, the fluores-
cence signal generated by PCR amplification distinguishes the SNP
alleles and enables sample genotyping. Detection of SNPs by
TaqMan® DME genotyping assays is a closed-tube PCR method
that can be completed in about 3 h from sample to results.
Automated genotype calling is facilitated by TaqMan® Genotyper
Software, developed specifically for analysis of TaqMan® SNP and
DME Genotyping Assays.
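The logic of two-dye allelic discrimination can be illustrated with a toy endpoint call. This is not the algorithm used by TaqMan® Genotyper Software, which works on clusters of samples; the normalized signal scale, the thresholds, and the function name are all assumptions made for illustration.

```python
def call_genotype(vic_signal, fam_signal, min_signal=0.5, het_ratio=(0.4, 2.5)):
    """Toy endpoint genotype call from two normalized fluorescence signals.

    VIC reports Allele 1 and FAM reports Allele 2 (as described in the text).
    min_signal and het_ratio are hypothetical thresholds.
    """
    if vic_signal < min_signal and fam_signal < min_signal:
        return "no call"                      # e.g., no-template control or failed well
    if fam_signal < min_signal:
        return "Allele 1/Allele 1"
    if vic_signal < min_signal:
        return "Allele 2/Allele 2"
    ratio = vic_signal / fam_signal
    if het_ratio[0] <= ratio <= het_ratio[1]:
        return "Allele 1/Allele 2"
    return "Allele 1/Allele 1" if ratio > het_ratio[1] else "Allele 2/Allele 2"


if __name__ == "__main__":
    for vic, fam in [(2.0, 0.1), (0.1, 2.0), (1.5, 1.4), (0.1, 0.1)]:
        print(f"VIC={vic:.1f} FAM={fam:.1f} -> {call_genotype(vic, fam)}")
```

In practice, calls are made relative to the signal clusters of all samples and controls on the plate rather than with fixed cut-offs.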

2 Materials

2.1 Assay Components

Each TaqMan® DME assay consists of a single tube containing:
● Two primers for specific amplification of the locus containing
the polymorphism of interest.
TaqMan® Drug Metabolism Genotyping Assays 89

Fig. 1 TaqMan® SNP Genotyping Assay. (1) The four TaqMan® Genotyping Assay components and the target
DNA template with the SNP (in brackets). (2) The denatured DNA target and annealing of the assay compo-
nents. (3) Signal generation leading to specific allele detection

● Two TaqMan® MGB probes for detection of alleles.

Each TaqMan® MGB probe contains:
● A reporter dye at the 5′ end of each probe—the VIC® dye is
linked to the 5′ end of the Allele 1 probe; the FAM™ dye is
linked to the 5′ end of the Allele 2 probe.
● A minor groove binder (MGB) at the 3′ end of the probe
sequence.
● A nonfluorescent quencher (NFQ) at the 3′ end of the probe.
All TaqMan® Drug Metabolism Genotyping PCR Assays
require only three components:
● 3–20 ng of purified gDNA sample per well, with all wells in a
given study containing the same amount of DNA.
● 20× TaqMan® Drug Metabolism Genotyping Assay (specific
for each polymorphism).
● 2× TaqMan® Genotyping Master Mix.
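Because the assay is supplied at 20× and the master mix at 2×, per-well volumes follow from the chosen reaction volume. The sketch below uses a 10 μL reaction as a hypothetical example (the reaction volume is not specified in this section), and the function names are assumptions.

```python
def reaction_setup(reaction_volume_ul, dna_volume_ul):
    """Per-well volumes for a reaction built from 2x master mix and 20x assay."""
    master_mix = reaction_volume_ul / 2.0      # 2x stock -> 1x final
    assay = reaction_volume_ul / 20.0          # 20x stock -> 1x final
    water = reaction_volume_ul - master_mix - assay - dna_volume_ul
    if water < 0:
        raise ValueError("DNA volume too large for this reaction volume")
    return {
        "2x master mix": round(master_mix, 3),
        "20x assay": round(assay, 3),
        "gDNA": round(dna_volume_ul, 3),
        "water": round(water, 3),
    }


def dna_input_ok(dna_ng):
    """Check the gDNA amount against the 3-20 ng range given in the text."""
    return 3.0 <= dna_ng <= 20.0


if __name__ == "__main__":
    print(reaction_setup(reaction_volume_ul=10.0, dna_volume_ul=2.0))
    print(dna_input_ok(10.0))
```

Keeping the DNA volume and amount identical across wells, as the text requires, means only the water volume changes if the DNA stock concentration differs between studies.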
2.2 Recommended Template

The recommended template for TaqMan® Drug Metabolism
Genotyping Assays is purified gDNA (3–20 ng). Quantify gDNA
by fluorometric analysis using a Qubit® dsDNA BR or HS Assay
Kit (recommended) or by UV spectrophotometry (see Note 1).

2.3 Selecting a Thermal Cycler or Real-Time PCR System

This PCR protocol has been tested using the GeneAmp® PCR System
9700 and the Applied Biosystems® 7900HT Real-Time PCR
System thermal cyclers for PCR amplification, as well as other
Applied Biosystems® instruments, including those listed below (see
Notes 2 and 3).
Instruments:
Thermal Cyclers: GeneAmp® PCR System 9700, Veriti® 384-Well
Thermal Cycler, Veriti® 96-Well Fast Thermal Cycler.
(TaqMan® Drug Metabolism Assays can be performed on Fast
thermal cyclers using standard reagents and standard cycling
protocols.)
Real-Time PCR Systems (these systems allow real-time analysis of
PCR, which is helpful for troubleshooting): QuantStudio™
12K Flex system, ViiA™ 7 system, 7900HT Fast system, 7500
system, 7500 Fast system, StepOnePlus™ system, StepOne™
system. (TaqMan® Drug Metabolism Assays can be performed
on Fast real-time PCR systems using standard reagents and
standard cycling protocols.)

3 Methods

The TaqMan® Drug Metabolism Genotyping Assay procedure
consists of three main steps: PCR amplification, allelic discrimination
plate read, and allelic discrimination analysis. An overview of
the procedure is shown in Fig. 2.

3.1 PCR Amplification

3.1.1 Overview

During the first step of a TaqMan® Drug Metabolism Genotyping
Assay, AmpliTaq Gold® DNA Polymerase from the TaqMan®
Genotyping Master Mix (see Note 4) amplifies the target DNA
using sequence-specific primers. TaqMan® MGB probes from the
Drug Metabolism Genotyping Assay provide a fluorescence readout
of the amplification of each allele.

3.1.2 General Process

PCR amplification requires that you prepare an optical reaction
plate containing the following for each assay:
● No-template controls (NTCs) (at least two are strongly recommended;
see Note 5)

Fig. 2 Overview of the TaqMan® Drug Metabolism Genotyping Assay procedure

● gDNA samples with known genotype at the SNP of interest
(optional controls)
● gDNA samples with unknown genotype at the SNP of interest

3.1.3 Reagent Preparation Guidelines

● Keep all TaqMan® Drug Metabolism Genotyping Assays in the
freezer, protected from light, until ready for use. Excessive
exposure to light may affect the fluorescent probes.
● Minimize freeze–thaw cycles.

● Prior to use: Thoroughly mix the TaqMan® Genotyping Master
Mix by swirling the bottle; resuspend the assay mix by vortex-
ing and then centrifuge the tube briefly. After thawing frozen
gDNA samples, resuspend the samples by vortexing and then
centrifuge the tubes briefly.
● Prepare the reaction mix for each assay before transferring it to
the optical reaction plate for thermal cycling.
● Mix the reagents thoroughly after adding the reaction mix to
the gDNA samples to avoid reagent stratification in the wells.

3.1.4 Methods for Adding DNA

The TaqMan® Drug Metabolism Genotyping Assay protocol
allows you to use either wet or dried-down DNA. If your experiment
requires multiple plates that use the same gDNA, or if you
plan to use the same gDNA in several experiments, it is convenient
to dry down the gDNA in the plates, which are then ready for use
at any time. Both methods are described below.
To create a plate with wet DNA:
1. Dilute each purified gDNA sample with DNase-free water to
deliver a final DNA mass in the range of 3–20 ng per reaction
well (see Notes 6 and 7). For preparing a 384-well reaction
plate, the combined volume of DNA sample and DNase-free water
should be 2.25 μL. For preparing a 96-well reaction plate, the
combined volume of DNA and DNase-free water should be 11.25 μL.
2. Into each well of the 96-well or 384-well optical reaction
plate, pipet one control or sample aliquot of the volume
(indicated in step 1) appropriate for the plate type.
To create a plate with dried-down DNA:
1. Pipet one control or sample (3–20 ng of purified gDNA) into
each well of a 96-well or 384-well optical reaction plate. All
wells belonging to the same drug metabolism genotyping
assay must contain the same amount of sample or control.
2. Dry down the samples completely by evaporation at room
temperature in a dark, amplicon-free location (cover the plate
with a lint-free tissue while drying).
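
As a hedged illustration of step 1 of the wet-DNA method above, the following sketch computes how much gDNA stock and DNase-free water to combine so that the per-well delivery volume (2.25 μL or 11.25 μL) carries a chosen DNA mass. The function name, the 20 ng/μL stock concentration, and the 10 ng target are illustrative assumptions, not part of the protocol.

```python
# Per-well wet-DNA delivery volumes (μL) taken from the protocol text.
DELIVERY_VOLUME_UL = {384: 2.25, 96: 11.25}


def wet_dna_dilution(stock_ng_per_ul, target_ng, plate_wells):
    """Return (stock_ul, water_ul) per well for the wet-DNA method.

    Hypothetical helper: assumes the stock concentration is known
    (see Note 1) and that the target mass lies within 3-20 ng.
    """
    if not 3 <= target_ng <= 20:
        raise ValueError("target mass should be 3-20 ng per well")
    total_ul = DELIVERY_VOLUME_UL[plate_wells]
    stock_ul = target_ng / stock_ng_per_ul
    if stock_ul > total_ul:
        raise ValueError("stock too dilute for this delivery volume")
    return stock_ul, total_ul - stock_ul


# Example: 20 ng/μL stock, 10 ng per well, 384-well plate.
stock_ul, water_ul = wet_dna_dilution(20.0, 10.0, 384)
print(f"{stock_ul:.2f} μL stock + {water_ul:.2f} μL water per well")
```

In practice it is probably easier to prepare one bulk dilution at the resulting ratio and dispense the full delivery volume from it than to pipet sub-microliter volumes into each well.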

3.1.5 Prepare the Reaction Mix

The reaction mix contains TaqMan® Drug Metabolism Genotyping
Assay Mix, TaqMan® Genotyping Master Mix, and DNase-free
water. The recommended final reaction volume, per well, is 5 μL
for a 384-well plate and 25 μL for a 96-well plate. To prepare the
reaction mix:
1. Calculate the number of reactions to be performed for each
assay (see Note 8).
2. Calculate the total volume of each component needed for each
assay, using the table below. Be sure to choose the appropriate
DNA delivery method for your experiment (see Note 9).

Volume/well (μL)

                                        Wet DNA method        Dried-down DNA method
Component                               384-well   96-well    384-well   96-well
                                        plate      plate      plate      plate
2× TaqMan® Genotyping Master Mix        2.50       12.50      2.50       12.50
20× Drug Metabolism Genotyping
  Assay Mix                             0.25       1.25       0.25       1.25
DNase-free water                        None       None       2.25       11.25
Total volume per well                   2.75       13.75      5.00       25.00

3. Gently swirl the bottle of 2× TaqMan® Genotyping Master
Mix (abbreviated as “GTMM” in subsequent steps). Ensure
that the 2× GTMM is well mixed before use.
4. Vortex and centrifuge the 20× Drug Metabolism Genotyping
Assay Mix briefly.
5. Pipet the required total volumes of 2× GTMM and 20× Drug
Metabolism Genotyping Assay Mix into sterile test tubes.
6. Flick and invert the tube(s) to mix.
7. Centrifuge the tube(s) briefly to spin down the contents and
to eliminate any air bubbles from the solution.
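
The calculation in steps 1 and 2 can be sketched as follows. The per-well volumes are copied from the table in step 2; the 10 % overage is an assumption, since Note 9 asks for extra reactions without giving a figure, and the function and key names are illustrative.

```python
# Volume per well (μL) from the reaction-mix table in step 2.
PER_WELL_UL = {
    ("wet", 384): {"master_mix_2x": 2.50, "assay_mix_20x": 0.25, "water": 0.00},
    ("wet", 96): {"master_mix_2x": 12.50, "assay_mix_20x": 1.25, "water": 0.00},
    ("dried", 384): {"master_mix_2x": 2.50, "assay_mix_20x": 0.25, "water": 2.25},
    ("dried", 96): {"master_mix_2x": 12.50, "assay_mix_20x": 1.25, "water": 11.25},
}


def reaction_mix_totals(n_reactions, method, plate_wells, overage=0.10):
    """Total volume (μL) of each component for one assay.

    `overage` is an assumed fractional excess for pipetting loss.
    """
    per_well = PER_WELL_UL[(method, plate_wells)]
    scale = n_reactions * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in per_well.items()}


# Example: one full 96-well plate, wet-DNA method.
totals = reaction_mix_totals(96, "wet", 96)
print(totals)
```

Remember to count the NTCs and any known-genotype controls in the number of reactions.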

3.1.6 Prepare the DNA Reaction Plate

For each assay and on each reaction plate, run controls to ensure
optimal analysis and troubleshooting capabilities of TaqMan®
Drug Metabolism Genotyping Assays:
● Two no-template controls (NTCs, DNase-free water) per assay
(strongly recommended)
● Known gDNA controls (optional)
1. Into each well of the DNA reaction plate, pipet the reaction
mix as indicated below (see Note 10). For preparing a 384-
well reaction plate, the volume of reaction mix/well should
be 2.75 μL (wet method) or 5 μL (dried-down DNA
method). For preparing a 96-well reaction plate, the vol-
ume of reaction mix/well should be 13.75 μL (wet method)
or 25 μL (dried-down DNA method).
2. Inspect all the wells for uniformity of volume, and note
which wells do not appear to contain the proper volume.
Redo any reactions that do not contain the proper
volume.
3. Seal the plate with an optical adhesive cover (required if
using the Applied Biosystems® 7900HT Real-Time PCR
System) or with optical caps.
4. Vortex the plate to mix the wells.
5. Centrifuge the plate briefly to spin down the contents and
eliminate any air bubbles.

3.1.7 Perform PCR

The TaqMan® Drug Metabolism Genotyping Assay protocol uses
a 90-s PCR extension time and 50 PCR cycles. These conditions
are chosen for optimal performance because the average amplicon
size of TaqMan® Drug Metabolism Genotyping Assays is longer
than the average amplicon size of most TaqMan® SNP Genotyping
Assays.
To perform PCR:
1. Specify the thermal cycling conditions (see Note 11).

Stage                                Step             Conditions
AmpliTaq Gold® enzyme activation     Hold             10 min at 95 °C
PCR (50 cycles)                      Denature         15 s at 92 °C
                                     Anneal/extend    90 s at 60 °C

2. Specify the reaction volume: 5 μL for a 384-well plate, 25 μL
for a 96-well plate.
3. Load the reaction plate into the thermal cycler, then start the
run.
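
As a rough planning aid, the programmed hold times in the table above can be summed as in the sketch below. It ignores ramp times, which depend on the instrument, so the actual run will take longer; the function name is illustrative.

```python
def cycling_time_min(hold_s=600, denature_s=15, anneal_extend_s=90, cycles=50):
    """Programmed hold time in minutes, ignoring instrument ramp times."""
    return (hold_s + cycles * (denature_s + anneal_extend_s)) / 60


# 10 min activation + 50 cycles of (15 s + 90 s)
print(cycling_time_min())
```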

3.2 Allelic Discrimination Plate Read and Analysis

After PCR amplification, you perform an endpoint plate read using
Applied Biosystems® Real-Time PCR System Software. The instrument
software uses the fluorescence measurements made during
the plate read to plot fluorescence (Rn) values based on the signals
from each well. The plotted fluorescence signals indicate which
alleles and genotypes are in each sample.
Refer to the allelic discrimination or genotyping section of the
appropriate instrument user guide for instructions on how to use
the system software to perform the plate read and analysis.
Analyzing data for SNP genotyping requires that you:
1. Create and set up an allelic discrimination plate read document
2. Perform an allelic discrimination plate read on a real-time
PCR instrument system

3. Analyze the plate read document

4. Make manual allele calls or review automatic allele calls
5. Convert allele calls to genotypes

3.3 Data Analysis Using TaqMan® Genotyper Software

TaqMan® Genotyper Software features a state-of-the-art genotype-calling
algorithm, an intuitive user interface, and enhanced multi-plate
analysis features. The software enables identification and
utilization of various controls and reference data panels to influence
genotype calls from real-time or endpoint TaqMan® SNP and DME
Genotyping Assays, and enables you to overlay and analyze raw data
from multiple genotyping experiments. Download free TaqMan®
Genotyper Software at http://www.lifetechnologies.com/us/en/
home.html.
1. Use the TaqMan® Genotyper Software to create a study and:
(a) Import multiple experiments into the study
(b) Import assay information files to update assay
information
(c) Import Supplementary Sample Information (SSI) files to
update sample information
(d) Import reference panel files to add reference samples to a
study
(e) Set the study analysis settings
2. Analyze the study data using one of two call methods:
(a) Autocalling—genotypes are automatically assigned to
samples using an improved algorithm
(b) Classification scheme—the user sets linear cluster
boundaries to define regions associated with each genotype
call category (i.e., homozygote, heterozygote, and
undetermined regions)
3. View the study results, including the sample genotypes and
quality-control statistics, at the study, assay, experiment, and
sample levels
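
To make the idea of a classification scheme with linear cluster boundaries concrete, here is a minimal sketch. It is not the algorithm used by TaqMan® Genotyper Software; the angular boundaries (30° and 60°), the minimum-signal cutoff, and all names are assumptions chosen for illustration. VIC® reports Allele 1 and FAM™ reports Allele 2, as described in Subheading 2.1.

```python
import math


def call_genotype(vic, fam, ntc=(0.0, 0.0), min_signal=0.5,
                  low_deg=30.0, high_deg=60.0):
    """Classify one well from endpoint VIC (Allele 1) and FAM (Allele 2) Rn.

    Hypothetical thresholds: two straight lines through the NTC origin
    (at low_deg and high_deg) split the plot into three genotype regions;
    wells closer than min_signal to the origin stay undetermined.
    """
    x, y = vic - ntc[0], fam - ntc[1]
    if math.hypot(x, y) < min_signal:
        return "undetermined"
    angle = math.degrees(math.atan2(y, x))
    if angle < low_deg:
        return "homozygous allele 1"
    if angle > high_deg:
        return "homozygous allele 2"
    return "heterozygous"


# Made-up endpoint signals for four wells.
wells = {"A1": (2.1, 0.2), "A2": (1.5, 1.6), "A3": (0.3, 2.4), "A4": (0.1, 0.1)}
for well, (vic, fam) in wells.items():
    print(well, call_genotype(vic, fam))
```

Using the NTC position as the origin mirrors the role of the NTCs described in Note 5.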

4 Notes

1. Genomic DNA (gDNA) should be quantitated by a reliable
method such as fluorometric analysis using a Qubit® dsDNA
BR or HS Assay Kit (recommended) or by measuring the UV
absorbance (A260/A280). Be sure that the human gDNA that
you use has an A260/A280 ratio >1.7.

2. Because of differences in ramp rates and thermal accuracy, you
may need to adjust the settings if you use thermal cyclers other
than those indicated here. Use of thermal cyclers from manu-
facturers other than Applied Biosystems® is not supported by
Life Technologies.
3. TaqMan® Drug Metabolism Genotyping Assays can also be
run on OpenArray® plates on the Applied Biosystems™
QuantStudio™ 12K Flex system, a flexible, high-throughput,
economical platform. Follow the genotyping protocol in the
QuantStudio™ 12K Flex Real-Time PCR System OpenArray®
Experiments User Guide.
4. Alternatively, TaqMan® Universal Master Mix II, no UNG or
with UNG (i.e., AmpErase UNG), can be used. Only
Genotyping Master Mix (which does not contain UNG) is
referred to in this protocol.
5. We strongly recommend using at least two NTCs per assay to
orient the VIC® and/or FAM™ clusters to an origin, and to
enhance the detection of gDNA contamination on a given set
of plates.
6. All wells belonging to the same Drug Metabolism Genotyping
Assay must contain the same amount of sample or control.
7. Multiple Drug Metabolism Genotyping Assays may be run on
one reaction plate, but they must be analyzed separately using
the real-time instrument system software. Data from multiple
plates may be overlaid and analyzed using TaqMan® Genotyper
Software.
8. Include at least two NTCs and, if available, at least one known
gDNA control on each plate for optimal analysis and trouble-
shooting capabilities.
9. In your calculations, include some extra reactions to compen-
sate for the volume loss that occurs during pipetting.
10. Be sure that no cross-contamination occurs from well to well
during pipetting.
11. These conditions are optimized for use only with TaqMan®
Drug Metabolism Genotyping Assays on the instruments
specified here. Refer to the appropriate instrument user guide
for help with programming your thermal cycler or real-time
PCR system.
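
For the UV absorbance option in Note 1, a minimal sketch of the usual calculation follows. It assumes a 1 cm path length and the conventional factor of 50 ng/μL per A260 unit for double-stranded DNA; the function name and the example readings are illustrative.

```python
def gdna_from_absorbance(a260, a280, dilution_factor=1.0):
    """Return (ng_per_ul, ratio, passes) for a dsDNA sample.

    Assumes a 1 cm path length and 50 ng/μL per A260 unit (dsDNA).
    `passes` applies the A260/A280 > 1.7 criterion from Note 1.
    """
    ratio = a260 / a280
    ng_per_ul = a260 * 50.0 * dilution_factor
    return ng_per_ul, ratio, ratio > 1.7


# Example: a 1:10 dilution read at A260 = 0.25 and A280 = 0.135.
conc, ratio, ok = gdna_from_absorbance(0.25, 0.135, dilution_factor=10)
print(f"{conc:.0f} ng/μL, A260/A280 = {ratio:.2f}, acceptable: {ok}")
```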
Chapter 6

Pyrosequencing of Clinically Relevant Polymorphisms

Cristi R. King and Sharon Marsh

Abstract
Despite the influx of high throughput sequencing techniques, there is still a niche for low-medium
throughput genotyping technologies for small-scale screening and validation purposes. Pyrosequencing is
a genotyping assay based on sequencing-by-synthesis. Short runs of sequence around each polymorphism
are generated, allowing for internal controls for each sample. Pyrosequencing can also be utilized to iden-
tify tri-allelic, indel, and short repeat polymorphisms, as well as determining allele percentages for methyla-
tion or pooled sample assessment. This range of applications makes it well-suited to the research laboratory
as a one-stop system.

Key words Pyrosequencing, Genotype, Polymorphism, Indel, Tri-allelic

1 Introduction

Polymorphisms in coding and control regions of genes can cause
significant inter-individual variation in the resulting protein function
and activity, leading to important differences in disease susceptibility
and drug metabolism [1]. This expansion in evaluable
SNPs has led to a number of detection methods [2, 3].
Pyrosequencing is a robust medium-throughput genotyping
system capable of analyzing a wide range of DNA variation. The
methodology is easy to perform and readily transferrable to other
laboratories. Applications vary widely from research to diagnostics.
Pyrosequencing produces specific sequence data in the form of
peaks on a pyrogram. It does not require the presence of a restriction
enzyme site, and the PCR product and internal primer sites can
vary in size and position. In addition, it can be utilized to identify
tri-allelic, indel, and short repeat polymorphisms, as well as to
determine allele percentages for methylation or pooled sample
assessment [2]. The availability of sequence directly adjacent to the
polymorphisms allows internal quality control checks to be made
for each sample. Pyrosequencing is typically performed on a

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_6, © Springer Science+Business Media, LLC 2013


96-well platform, and in an average day over 3,000 individual
genotypes can be measured. This method has been utilized to
genotype many clinically relevant polymorphisms [4–7].

2 Materials

2.1 DNA Template

DNA from any source can be used in Pyrosequencing assays
(see Note 1). Commonly used kits for manual or machine DNA
extraction, including Gentra, Qiagen, and Oragene, do not inhibit
the assay.

2.2 PCR

1. Primer Design Software (custom software is available for a fee
from Qiagen, and is usually included in the purchase of the
Pyrosequencing machine).
2. 1–5 ng DNA template (see Note 2).
3. PCR Mastermix, for example: 30 mM Tris–HCl, 100 mM
potassium chloride, pH 8.05, 400 μM dNTP, and 5 mM mag-
nesium chloride (see Note 3).
4. Hot start Taq Polymerase (see Note 4).
5. DNase- and RNase-free 18.2 MΩ water.
6. DNA Oligonucleotides (primers), one biotinylated.
7. Unskirted 96-well PCR trays.
8. Sealing Film or Silicon Mat for covering PCR plates in a
thermocycler.
9. Thermocycler with 96-well capacity, gradient block, and
heated lid.

2.3 Agarose Gel Electrophoresis for Validating and Optimizing PCR

1. Agarose.
2. 50× TAE buffer: For 1 l, add 242 g Tris base, 57.1 ml glacial
acetic acid, and 18.6 g EDTA to 18.2 MΩ water. Store at room
temperature. Dilute to 1× with water prior to use.
3. Microwave.
4. Ethidium Bromide (4 μl of 10 mg/ml ethidium bro-
mide/100 ml agarose; add AFTER heating).
5. Loading Dye (can be purchased premade or made using 6×
recipe below): For 100 ml 6× loading dye: 30 ml Glycerol,
70 ml water plus a pinch of Bromophenol Blue and a pinch of
Xylene Cyanol FF (amount can be varied depending on the
desired color). Store at room temperature.
6. Gel Apparatus: casting tray, gel tank, lid and power supply.
7. UV Gel Documentation system with thermal printer.
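
The quantities in items 2 and 4 scale with gel volume. The sketch below, assuming the gel is cast in 1× TAE at a weight/volume agarose percentage, works out the amounts for a given gel. The function name and the 50 ml example volume are illustrative, and the 1–2 % range comes from Subheading 3.3.

```python
def gel_recipe(gel_ml, agarose_percent, tae_stock_x=50):
    """Quantities for an agarose gel cast in 1x TAE.

    Percent is weight/volume (g per 100 ml); ethidium bromide is
    scaled from 4 μl of 10 mg/ml stock per 100 ml of agarose.
    """
    tae_stock_ml = gel_ml / tae_stock_x
    return {
        "agarose_g": gel_ml * agarose_percent / 100,
        "tae_stock_ml": tae_stock_ml,
        "water_ml": gel_ml - tae_stock_ml,
        "ethidium_bromide_ul": 4 * gel_ml / 100,
    }


# Example: a 50 ml, 2 % gel.
print(gel_recipe(50, 2))
```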

2.4 Processing PCR for Pyrosequencing

1. Centrifuge with rotor/buckets to handle 96-well plates.
2. 2× Binding Buffer: For 1 l, add 1.21 g Tris, 117 g NaCl, and
0.292 g EDTA to water; adjust to pH 7.6 with 1 M HCl. Sterile filter,
then add 1 ml Tween 20.
3. Sepharose Bead Mix: 240 μl Streptavidin-coated Sepharose
beads, 4,560 μl 2× Binding Buffer, and 3,600 μl 18.2 MΩ
water per 96-well plate (the older magnetic bead processing
protocol for a PSQ96 or PSQ96MA is described elsewhere
[8]). Excess sepharose/binding buffer mix can be stored in a
glass bottle at 4 °C.
4. 24- or 96-well plate shaker, e.g., Eppendorf Thermomixer
(Fisher Scientific, Hampton, NH).
5. Vacuum prep tool and troughs (Qiagen, Germany).
6. 70 % ethanol in 18.2 MΩ water (see Note 5).
7. 0.2 M NaOH in 18.2 MΩ water.
8. Washing buffer: For 1 l, add 1.21 g Tris to water; adjust to pH 7.6 with
4 M acetic acid. Sterile filter.
9. Annealing buffer: For 1 l, add 2.42 g Tris and 0.43 g magnesium
acetate tetrahydrate to water; adjust to pH 7.6 with 4 M acetic acid.
Sterile filter.
10. Pyrosequencing primer mix: 12 μl of 0.3 μM Pyrosequencing
primer in annealing buffer per well dispensed into a 96-well
Pyrosequencing plate (Qiagen, Germany).
11. Heating block capable of at least 80 °C.
12. Pyrosequencing plate adaptor set (base and iron) (Qiagen,
Germany).
13. Adhesive sealing film for 96-well plates.
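
When fewer than 96 wells are processed, the Sepharose bead mix in item 3 can be scaled down. The sketch below assumes simple linear scaling of the per-plate recipe, which the protocol does not state explicitly; the names are illustrative.

```python
# Per-plate (96-well) Sepharose bead mix from item 3, in μl.
BEAD_MIX_PER_PLATE_UL = {"beads": 240, "binding_buffer_2x": 4560, "water": 3600}


def bead_mix_for_wells(n_wells):
    """Scale the per-plate bead mix linearly to n_wells (an assumption)."""
    return {k: v * n_wells / 96 for k, v in BEAD_MIX_PER_PLATE_UL.items()}


# Example: 24 wells.
mix = bead_mix_for_wells(24)
print(mix, "total:", sum(mix.values()), "μl")
```

The per-plate recipe totals 8,400 μl, or 87.5 μl per well, so it already includes excess over the 70 μl added to each well in Subheading 3.5.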

2.5 Pyrosequencing

1. 96-well PyroMark Pyrosequencer with Pyrosequencing 96A
version 1.1 or 96MA software or higher. A detailed protocol
for the older PSQ96 or PSQ96MA has been described previ-
ously [8].
2. PSQ cartridge, capillary dispensing tips or nucleotide dispens-
ing tips, and reagent dispensing tips (Qiagen, Germany).
3. PyroMark reagent kit (Qiagen, Germany).
4. DNase- and RNase-free 18.2 MΩ water.
5. Microcentrifuge.

3 Methods

Pyrosequencing is based on sequencing by synthesis. The assay
takes advantage of the natural release of pyrophosphate whenever
a nucleotide is incorporated onto an open 3′ DNA strand.

Fig. 1 The Pyrosequencing reaction. A modified ATP is used for the nucleotide dispensations to prevent its
direct use by luciferase in the reaction. Modified and published with permission from Biotage AB

The released pyrophosphate is used in a sulfurylase reaction releasing
ATP. The released ATP can be used by luciferase in the conversion
of luciferin to oxyluciferin. The reaction results in the emission
of light, which is collected by a CCD camera and recorded in the
form of peaks, known as pyrograms (Fig. 1). When a nucleotide is
not incorporated into the reaction, no pyrophosphate is released
and the unused nucleotide is removed from the system by degradation
through apyrase. This four-enzyme process is performed in a
closed system in a single well.
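
The logic of the reaction can be illustrated with a small idealized simulation. It assumes that each incorporated base contributes one unit of light, so a run of identical bases gives a proportionally taller peak, and that a heterozygote behaves as a 1:1 mixture of the two alleles. The sequences and dispensation order are made up, and real peak heights are not this clean.

```python
def pyrogram(sequence, dispensation):
    """Idealized peak heights: one unit of light per incorporated base."""
    peaks, pos = [], 0
    for nucleotide in dispensation:
        run = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        peaks.append(run)  # 0 means the nucleotide was degraded unused
    return peaks


def heterozygote(seq_a, seq_b, dispensation):
    """Average the two allele patterns, as in a 1:1 template mixture."""
    a, b = pyrogram(seq_a, dispensation), pyrogram(seq_b, dispensation)
    return [(x + y) / 2 for x, y in zip(a, b)]


# Hypothetical T/C SNP at the second base of the sequence to analyze.
order = "ATCGC"
print(pyrogram("ATGGC", order))                # T allele
print(pyrogram("ACGGC", order))                # C allele
print(heterozygote("ATGGC", "ACGGC", order))   # T/C heterozygote
```

The bases flanking the SNP give the same peaks for every genotype, which is what makes them usable as internal controls.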
The PyroMarkID and PyroMark MD (with optional plate
loader) instruments will perform the majority of applications,
including analysis of di-, tri-, or tetra-allelic SNPs (simplex or
multiplex), insertions, deletions, methylation analysis, and allele
quantification. In addition, the PyroMarkID can perform short
sequencing, which can be used for microbial typing. Premade kits
are available for several commonly studied polymorphisms, fungal
and mycobacteria typing, and methylation. Currently available kits
can be found at http://www.qiagen.com/Products/Catalog/
Assay-Technologies/Pyrosequencing. The assays contain opti-
mized pretested reagents and primers, eliminating the need for
assay design. Detailed protocols for multiplex, allelic quantifica-
tion, methylation, etc. have been described previously [9, 10]. The
methods below are specific for SNP analysis on the PyroMark 96
well systems.

3.1 PCR Primer Design

1. Any primer design software, freely available or custom purchased,
may be used to design PCR primers for Pyrosequencing.
The polymorphism may be in any position of the PCR ampli-
con from one base in from the 3′ end of the PCR primer
sequence to centered between the primers. SNPs, indels,
repeats, etc. do not require specific PCR primer design
modifications.
2. Primers should be between 15 and 30 bases long, with an
optimum size of 20 bases, ideally with a GC:AT ratio around
50 % (although not essential, as you are at the mercy of the
location of the polymorphism).
3. Most amplicon sizes are usable for high-quality DNA; however,
amplicon sizes of 100–200 bp are suitable for most template
sources, including fragmented DNA.
4. Care should be taken to avoid any possible template loops
from primers or the single-stranded amplicon doubling back
on themselves, as these can lead to background problems dur-
ing the Pyrosequencing assay (see Note 6).
5. Optimum primer melting temperature (Tm) is 60 °C; however,
again, the position of the polymorphism determines the
ability to design optimum primers, and 50–69 °C will work.
The individual primers should ideally have Tms within 2 °C of
each other to allow effective optimization of the PCR.
6. Primer specificity should be checked by screening the primers
across available human genome sequence using the NCBI Blast
program (http://www.ncbi.nlm.nih.gov/blast/). Extra care
should be taken when designing assays for gene family mem-
bers, e.g., cytochromes, or genes with known pseudogenes,
e.g., DHFR, as cross-hybridization of primers can lead to high
background, reduced signal and/or false positive results.
7. One primer needs to be biotinylated at the 5′ end. Which
primer to be biotinylated is dependent on the Pyrosequencing
primer orientation.
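
The numeric guidelines in steps 2 and 5 can be checked with a short script such as the sketch below. The Tm estimate uses the crude Wallace rule (2 °C per A/T, 4 °C per G/C), which is an assumption made here for simplicity; primer design software typically uses more accurate models. The primer sequences are made up.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100 * sum(primer.count(b) for b in "GC") / len(primer)


def wallace_tm(primer):
    """Rough Tm estimate: 2 °C per A/T plus 4 °C per G/C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer_pair(forward, reverse):
    """Return a list of guideline violations (empty list = none found)."""
    problems = []
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if not 15 <= len(primer) <= 30:
            problems.append(f"{name}: length {len(primer)} outside 15-30")
        if not 50 <= wallace_tm(primer) <= 69:
            problems.append(f"{name}: Tm {wallace_tm(primer)} outside 50-69")
    if abs(wallace_tm(forward) - wallace_tm(reverse)) > 2:
        problems.append("Tm difference exceeds 2 °C")
    return problems


# Hypothetical 20-mers, each with 50 % GC.
fwd = "AGCTGACCTGAAGCTTGACA"
rev = "TGCATCGGAACTCTGGTCAA"
print(gc_percent(fwd), wallace_tm(fwd), check_primer_pair(fwd, rev))
```

GC content is reported rather than enforced, because step 2 describes the 50 % figure as ideal but not essential.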

3.2 Pyrosequencing Primer Design

1. The entire PCR amplicon sequence, including forward and
reverse primer sequences, is required to generate the optimum
Pyrosequencing primer. The custom software from Qiagen
(Germany) should be used for optimum results.
2. Unless multiplexing is required (see Note 7), the software
should be defaulted to find both forward and reverse primers
to improve the likelihood of obtaining the optimum primer
sequence. The software will list all possible forward and reverse
primers by score. Often “medium” scores yield usable primers,
as certain scoring parameters are more critical than others (see
Note 8). Template loops likely to cause background will not
affect the overall score but can cause problems and should be
avoided (see Note 9).
3. The orientation of the Pyrosequencing primer will determine
the PCR primer to be biotinylated. Forward Pyrosequencing
primers require a biotinylated reverse PCR primer; reverse
Pyrosequencing primers require a biotinylated forward PCR
primer.

3.3 PCR Optimization

1. Optimization of magnesium concentration and annealing temperature
should be carried out in advance for new assays.
Ideally a gradient PCR with different magnesium concentra-
tions should be performed, if a thermocycler with a gradient
block is available. If a premade PCR mix is used, only tempera-
ture optimization need be performed (see Note 10). An exam-
ple gradient set-up based on a 96-well PCR block with gradient
function follows:
Mastermix (see Note 11):
130 μl Amplitaq Gold PCR mastermix (Applied Biosystems,
Foster City, CA)
Forward primer (10 pM final concentration)
Reverse primer (10 pM final concentration)
13 μl DNA
Up to 260 μl with 18.2 MΩ water
Add 20 μl of mastermix to each well of one row of a 96-well plate or to
twelve 0.2 ml tubes and place on the gradient block (ensure samples cover a
continuous row).
PCR program (based on a thermal cycler with a gradient block):
93 °C 20 min (or appropriate temperature/time to activate
Taq)
30 cycles of:
94 °C 30 s
55–72 °C 30 s
72 °C 30 s
Then:
72 °C 5 min
4 °C storage.
2. The gradient PCR should be visualized using a 1 or 2 %
agarose gel. The optimal temperature should give the brightest
single band at the appropriate amplicon size. Care should be
taken to avoid temperatures where a smeared or multiband
product can be seen as these can increase pyrosequencing
background or reduce specificity if co-amplifying a different
DNA region. Where several temperatures of equal band
intensity are available, the highest temperature should be
picked to ensure specificity.
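
The selection rule in step 2 can be written down as a small sketch. The column temperatures are assumed to be evenly spaced across the 55–72 °C gradient, which real blocks may not achieve, and the band intensity values stand for a hypothetical visual score; all names are illustrative.

```python
def gradient_temperatures(low=55.0, high=72.0, columns=12):
    """Evenly spaced column temperatures; real blocks may not be linear."""
    step = (high - low) / (columns - 1)
    return [round(low + i * step, 1) for i in range(columns)]


def pick_annealing_temp(results):
    """results: list of (temp_c, band_intensity, single_band) tuples.

    Keeps only clean single-band lanes, then returns the highest
    temperature among the brightest ones (None if no lane is clean).
    """
    clean = [(t, i) for t, i, single in results if single]
    if not clean:
        return None
    brightest = max(i for _, i in clean)
    return max(t for t, i in clean if i == brightest)


# Made-up gel readout: (temperature, intensity score, single band?)
lanes = [(55.0, 3, False), (58.1, 5, True), (61.2, 5, True),
         (64.3, 4, True), (67.4, 1, True)]
print(gradient_temperatures())
print(pick_annealing_temp(lanes))
```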

3.4 PCR for Pyrosequencing

1. Care should be taken to avoid contamination. Ideally a separate
room that does not come in contact with post-PCR amplified
DNA or post-PCR pipettes, reagents and consumables
should be used. The bench area should be swabbed with 70 %
ethanol or 5 % bleach solution before each PCR set-up and
barrier tips should be used for all pipetting steps.
2. 1 μl (1–5 ng) DNA (depending on source, see Note 2) should
be dispensed into an unskirted 96-well PCR tray (see Note
12). At least 1 well should not contain DNA to act as a nega-
tive control (see Note 13).
3. A 20 μl PCR reaction is ideal for Pyrosequencing; however, if
the PCR product is especially strong or wide-peak pyrograms
occur, a 10 μl reaction will work well. For a 20 μl reaction
based on ABI Amplitaq Gold PCR mastermix (Applied
Biosystems, Foster City, CA):
10 μl ABI Amplitaq Gold PCR mix
Forward PCR primer (10 pM final concentration)
Reverse PCR primer (10 pM final concentration)
Up to 19 μl with 18.2 MΩ water
1 μl template
4. The PCR plate should be well sealed using a silicon mat or
adhesive film. The following PCR program should be run (see
Note 14):
93 °C 10 min (or relevant temperature/time for Taq activation)
55 cycles of:
95 °C 30 s
X °C 30 s (based on gradient-derived annealing temp)
72 °C 30 s
Then:
72 °C 5 min
4 °C storage.
5. It is possible to directly use the PCR product for
Pyrosequencing; however, it is advisable to check the product
and the negative control on a 1–2 % agarose gel to ensure the
reaction has been performed successfully and no contamination
is present. Contamination is identifiable at the
Pyrosequencing stage; however, it is cheaper and faster to run
an agarose gel than to process and run a contaminated/failed
Pyrosequencing plate. 96-well plates should be briefly centrifuged
and the lid removed with care to prevent sample aerosol
and inadvertent cross-contamination. Typically, running 5 μl of the
negative control and 5 μl from each of five to six sample wells gives an idea
of the success of the PCR. The Pyrosequencing will not be
affected by the reduction in volume in these wells. Due to the
unusually large number of PCR cycles, some smearing may be
visible on a gel, even if the optimum annealing temperature
has been used. At this stage the smearing typically does not
affect the Pyrosequencing reaction if the PCR primers are spe-
cific and the negative control does not contain product.
6. The PCR product can be stored at 4 °C until needed. PCR
trays should be briefly centrifuged prior to use, as condensation may
occur on the lid, which is a possible source of post-PCR
contamination.

3.5 PCR Processing for Pyrosequencing

This protocol assumes the use of a streptavidin/sepharose bead
set-up for Pyrosequencing on a 96-well PyroMark system. The
magnetic bead processing method for the PSQ96 or PSQ96MA is
described elsewhere [8].
1. A 96-well Pyrosequencing plate containing Pyrosequencing
primer mix should be set up as described in Subheading 2.4
(see Note 15).
2. The small volume readily evaporates; if the set-up time is longer
than 10–15 min, cover the plate with adhesive film. Primer
plates can be aliquoted in advance and stored at 4 °C. After
storage, it is advisable to allow them to reach ambient temperature
and briefly centrifuge them before use.
3. Add 70 μl sepharose bead mix as described in Subheading 2.4
to each well of the PCR product. Replace silicon lid/adhesive
film securely.
4. Shake the 96-well plate for 5 min at room temperature. If
using the Eppendorf Thermomixer, 1,400 RPM is the optimum
speed. This allows the streptavidin-coated sepharose
beads to bind to the biotin tag on the PCR primer. Use the
plate immediately; if the plate is allowed to sit, the beads will
settle to the bottom of the wells and will not be accessible to
the vacuum tool. If settling has occurred, briefly return the
plate to the shaker to disperse the beads.
5. Align reagent troughs, PCR product/bead mix tray and
Pyrosequencing primer tray on the vacuum workstation (see
Note 16).
6. With the vacuum switched OFF, shake the vacuum tool tips
into clean 18.2 mΩ water. Discard water, refill trough and
switch the vacuum on. Place filter tips into trough until all
water has been removed (approximately 30 s).
7. Place filter tips into the wells containing the PCR/bead mix.
Ensure all liquid has been removed from the tray; slightly
rocking the vacuum tool can prevent surface tension from
causing liquid to remain in the wells. The beads attached to
the biotin primer will prevent the PCR product from going
through the filters.
8. With the vacuum still on, place the filter tips in the 70 % ethanol.
Wait a few seconds until a good flow of liquid is seen
through the tubing, then allow the tips to suck up ethanol for 5 s.
Repeat with 0.2 M NaOH and washing buffer. The NaOH
denatures the DNA, so only single-stranded PCR product
remains adhered to the filter tips.
9. Switch the vacuum off or remove the vacuum hose from the vacuum
tool and place the filter tips into the Pyrosequencing plate
containing the Pyrosequencing primer/annealing buffer mix.
Residual vacuum will cause the primer mix to be sucked up
through the tips, so ensure it is fully off. Gently rock the tips in
the wells to disperse the PCR product.
10. Place the Pyrosequencing plate onto a heating block at 80 °C
for 2 min. Ensure the plate sits on the Pyrosequencing plate
adaptor with the corresponding lid (or “iron”) placed over the
plate to prevent evaporation. After 2 min, remove from heat-
ing block and place on a bench surface to cool. Once the plate
is cool to the touch, cover with an adhesive seal (unless it will
be run within 10–15 min) to prevent evaporation. If evapora-
tion has occurred, adding 12 μl of annealing buffer will rescue
the plate. Covering the plate while too hot will cause
condensation on the lid, which can lead to cross-contamina-
tion of the wells.
11. Processed plates can be stored at 4 °C until needed.

3.6 Pyrosequencing

3.6.1 Entering Assay Details

1. Open the Pyrosequencing software. A user name and password
are typically required. These are usually set up with instrument
installation. Individual or group-wide passwords can
be used.

2. If the assay is not already entered into the software, on the left
of the screen click “simplex entry” (see Note 17). In the menu
tree to the right of the simplex entry icon scroll to the top,
right click over “simplex entry” and select “new entry”.
3. The required fields are a unique name for the assay (usually
gene/SNP name) and a sequence to analyze. Usually five to
six bases after the SNP position provide enough information
for the assay. SNPs should be denoted as, for example, T/C
(tri-allelic or tetra-allelic SNPs can also be entered, e.g.,
G/A/T or G/A/T/C) and indels as, for example, [GATC].
Short repeats should be entered as a series of indels, e.g., [TA]
[TA][TA]. Clicking “dispensation order” will automatically
generate the smallest number of nucleotide dispensations required
for optimum genotype information. The dispensation order
can be manually edited by typing in the dispensation order
field, which is useful for troubleshooting problem assays.
4. Select “show histograms” and the predicted pyrogram pattern
will be displayed on the right. The default screens show both
homozygous patterns and the heterozygous pattern. It is possible
to scroll through histograms on the lower panel, which is useful if
multiplex assays or multiple indels are to be analyzed. Selecting
individual or all predicted histograms on the box below the
dispensation order and clicking “export” opens the histograms
in a browser window where they can be printed or saved.
5. Click “save”. At this stage the parameters can no longer be
altered; a duplicate setup with a unique name will need to be
created for any alterations to the assay.
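
The sequence-to-analyze notation and the predicted histograms described in steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the instrument software's algorithm: the function names (`parse_sequence`, `haplotypes`, `peak_heights`, `predicted_histogram`), the example sequence, and the hand-written dispensation order are all assumptions made for illustration, and peak heights are idealized as the number of bases incorporated per dispensation.

```python
from itertools import product


def parse_sequence(seq):
    """Split a sequence to analyze into positions with their alternatives.

    'T/C' style SNPs give one position with several bases; '[GATC]' style
    indels give one position that is either present or absent ('').
    """
    parts = []
    i = 0
    while i < len(seq):
        ch = seq[i]
        if ch == "[":                      # indel: bracketed bases, optional
            end = seq.index("]", i)
            parts.append([seq[i + 1:end], ""])
            i = end + 1
        elif ch == "/":                    # another allele at the previous position
            parts[-1].append(seq[i + 1])
            i += 2
        else:                              # ordinary base
            parts.append([ch])
            i += 1
    return parts


def haplotypes(seq):
    """List every allele sequence encoded by the notation."""
    return ["".join(choice) for choice in product(*parse_sequence(seq))]


def peak_heights(allele, dispensation):
    """Idealized peaks: bases incorporated at each nucleotide dispensation."""
    pos, heights = 0, []
    for nucleotide in dispensation:
        count = 0
        while pos < len(allele) and allele[pos] == nucleotide:
            count += 1
            pos += 1
        heights.append(count)
    return heights


def predicted_histogram(allele_1, allele_2, dispensation):
    """Average the two alleles of a genotype, so a heterozygote gives half peaks."""
    pairs = zip(peak_heights(allele_1, dispensation),
                peak_heights(allele_2, dispensation))
    return [(a + b) / 2 for a, b in pairs]


# Hypothetical assay: a T/C SNP flanked by A and GG, dispensed as A, T, C, G.
t_allele, c_allele = haplotypes("AT/CGG")
print(predicted_histogram(t_allele, t_allele, "ATCG"))  # homozygous T
print(predicted_histogram(t_allele, c_allele, "ATCG"))  # heterozygous T/C
```

In this idealized example the homozygous T genotype gives peaks of 1, 1, 0, 2 and the heterozygote gives 1, 0.5, 0.5, 2. Differences of this kind between the predicted patterns are what allow a measured pyrogram to be matched to a genotype.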

3.6.2 Entering a SNP Run

1. Select the “SNP run” icon on the far left of the screen.
2. On the menu tree right-click over “SNP run” and select “new
SNP run” (see Note 18).
3. The essential parameters on the setup tab are a unique run
name (e.g., gene/SNP/sample set/date) and the active well
map. The default plate map is for a full 96-well plate. Individual
wells can be selected (hold down control for nonadjacent wells);
clicking the “activate wells” button will grey out unused wells.
In addition, instrument parameters must be selected from the
drop down menu. Usually “instrument parameters” is a default
file; however, care should be taken to ensure the appropriate
parameters are selected for nucleotide or capillary dispensing
tips, as they are not interchangeable. Parameter setup instructions
are found with the dispensing tip packaging.
4. The essential parameters on the setup tab are to select the SNP
assay by clicking on the drop-down menu under “simplex”
and selecting the assay name entered in Subheading 3.6.1, and
to fill the plate map by clicking and dragging over the active
(white) wells (see Note 19).
Pyrosequencing of Clinically Relevant Polymorphisms 107

5. Once the run has been set up, click “save”. This can be edited
post-save, and changes can be re-saved.
6. If multiple plates of the same assay are to be run, on the menu
tree right click over the SNP run you have just entered and
select “duplicate SNP run”. The only parameter necessary is a
unique run name.

3.6.3 Individual Plate Run for PyroMark Systems

1. On the SNP run setup page described in Subheading 3.6.2,
click the “view” tab and select “run”. This will list the appropriate
volumes of nucleotides, enzyme, and reagent needed
for the individual run.
2. Set up the cartridge holder as shown in Fig. 2. It is essential
that all nucleotide/capillary and reagent tips are clean before
use. To check for blockages in the capillary and reagent tips,
fill with 18.2 MΩ water and apply pressure over the top of the
tip. Water should squirt from the bottom of the tip. If this
does not occur, try filling/emptying the tip several times with
water and retry forcing liquid through. If the tip remains
blocked, discard. For nucleotide dispensing tips, do not force
water through them. The hydrophobic disks may dislodge and
prevent the tip from functioning. Rather, ensure the tip has
been rinsed several times in water and has been stored in a
clean, lint-free environment (see Note 20).
3. Nucleotides, enzyme, and substrate are sold as a reagent kit.
Each vial is clearly labeled. Nucleotides come as a solution;
enzyme and substrate are lyophilized and should be resuspended
with 18.2 MΩ water before use; the volumes vary per kit and are
clearly marked on the labels. The enzyme and substrate both
dissolve rapidly and no mixing or shaking is required. Indeed,
this should be avoided as air bubbles in the liquid could cause
tip blockages or inconsistent dispensation. Unused resuspended
enzyme and substrate can be stored at −20 °C for future use.

Fig. 2 Reagent and nucleotide cartridge orientation. E = enzyme, S = substrate, A,
C, G, and T = nucleotides. A modification of dATP is used to prevent the nucleotide
from being a direct substrate for the luciferase

4. If using the nucleotide dispensing tips, the nucleotides should
be microfuged for 10 min and care should be taken to not
aliquot from the bottom of the vial in case any precipitate is
present which could cause tip blockage. For all dispensing tips
it is recommended that non-barrier pipette tips are used as
fibers can cause tip blockage.
5. If the capillary dispensing tips are used, the nucleotides should
be diluted 1:1 with TE buffer pH 8 and mixed well before use.
6. The nucleotide and reagent dispensing tips should be filled
according to the volumes suggested by the software. Capillary
dispensing tips should be filled by doubling the amount sug-
gested by the software. Care should be taken not to pipette air
bubbles and to gently angle the liquid down the sides of the
tips. Capillary and reagent dispensing tips can allow minute air
bubbles without affecting their performance. With nucleotide
dispensing tips it is extremely important to check all of the tips
for air bubbles. These can usually be removed by gently tap-
ping the sides of the tips until the air bubbles surface, or, if
necessary, dislodging them with a clean pipette tip.
7. A test plate should be run after each cartridge refill. This is
extremely important when using the nucleotide dispensing
tips, and three or four test plates should be run in succession
to ensure no blockages are present. The substrate reagent dis-
pensing tip is also prone to blockage if the substrate is allowed
to sit in the tip at room temperature for any length of time. To
run a test plate: Place the cartridge in the Pyrosequencer and
the test plate in the 96-well plate platform. On the far left of
the software screen select the “instrument” tab, then select
“instrument” and “manage”. Click “test”. A warning will
appear asking you to check that you have placed the test plate
(see Note 21) into the instrument. Click “ok”. The test takes
approximately 30 s. Remove the plate. In the center there
should be six wells with liquid: four nucleotides, enzyme, and
substrate. If there are fewer than six wells with liquid, a blockage
has occurred.
8. Remove the adhesive film carefully from the Pyrosequencing
plate and place it in the Pyrosequencer. Close all levers and
click “run” on the plate run setup. The Pyrosequencer will
now automatically dispense enzyme, substrate and nucleotides
in the predetermined dispensation order. The progress of each
individual well can be monitored at any time by selecting the
relevant well on the 96-well plate map on the screen.
9. To automatically analyze the data once the run has completed,
select “analyze all”.
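
The filling rules in steps 5 and 6 of this subheading can be summarized in a short Python sketch. The function name `fill_volumes`, the reagent labels, and the example volumes are hypothetical, and "diluted 1:1" is read here as equal volumes of nucleotide and TE buffer, which is an assumption; the volumes reported by the software for the actual run should always be used.

```python
NUCLEOTIDES = ("A", "C", "G", "T")


def fill_volumes(software_volumes_ul, tip_type):
    """Return per-reagent fill volumes in microliters.

    software_volumes_ul: volumes listed by the software, e.g. {"E": 100, ...}.
    tip_type: "nucleotide" or "capillary" (the dispensing tips used for A, C, G, T).
    """
    if tip_type not in ("nucleotide", "capillary"):
        raise ValueError("tip_type must be 'nucleotide' or 'capillary'")
    plan = {}
    for reagent, volume in software_volumes_ul.items():
        if tip_type == "capillary" and reagent in NUCLEOTIDES:
            # Dilute 1:1 with TE buffer pH 8 and fill double the software volume.
            plan[reagent] = {"stock": volume, "te_buffer": volume, "fill": 2 * volume}
        else:
            # Enzyme, substrate, and nucleotide-tip fills follow the software.
            plan[reagent] = {"stock": volume, "te_buffer": 0, "fill": volume}
    return plan


# Hypothetical software output for one plate (microliters).
suggested = {"E": 120, "S": 120, "A": 60, "C": 55, "G": 50, "T": 58}
for reagent, volumes in fill_volumes(suggested, "capillary").items():
    print(reagent, volumes)
```

Keeping the enzyme and substrate volumes unchanged reflects that only the nucleotides are diluted and doubled when capillary dispensing tips are used.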

3.6.4 Batch Runs Using the Automatic Plate Loader

1. SNP runs should be set up as described in Subheading 3.6.2,
saved and closed.
2. Select the “Batch run” icon on the far left of the Pyrosequencing
software. On the menu tree right click over “batch runs” and
select “new batch run”. One to ten plates can be run in each
batch. A unique name for each batch must be provided, and
the instrument parameters must be selected for each batch. If
barcoded plates are not used, uncheck the “barcode” field.
3. On the far left of the software click on the “SNP runs” icon.
From the menu tree, click and drag your SNP runs into the
one to ten slots on the batch window.
4. On the top menu bar select “batch” and “setup information”.
This will open a browser window (may take a few seconds)
with the total amount of nucleotides (which should be doubled
for the capillary dispensing tips), enzyme and reagents needed
for the entire batch.
5. The cartridge should be set up as described in Subheading 3.6.3.
The dispensing tips should be cleaned between every batch
and a test plate should be run prior to every batch.
6. Remove the adhesive film from the Pyrosequencing plates and
stack them. Check that each plate can be lifted free without
sticking to the plate below; occasional warping causes plates
to stick together, which jams the robotic arm. Place plates in
the robot stacker unit. The correct plate orientation is shown on
the top of the stacker unit. Ensure the plates lie flat on the base
of the stacker unit and are between the grooves. Plate 1 in the
batch setup should be on the top, and plate 10 (or the last plate in
the batch setup) should be on the bottom.
7. Ensure the stacker unit is firmly pushed into place. The nucle-
otides will not dispense if the unit is only partially home.
8. Click the “play” icon. Plates will automatically load and be
discarded throughout the batch.
9. Plates will automatically be analyzed by the software when run
in batch mode. They can be accessed from the batch setup
window or from the individual SNP run files.

3.6.5 Analysis of Pyrosequencing Results

1. Once the Pyrosequencing run has been analyzed by the software,
the 96-well plate map will be color-coded according to
the result. Blue indicates a well where the pyrogram matches
one of the predicted histograms and a genotype can be accurately
called. Orange indicates a possible match with a predicted
histogram; however, human intervention is required to
validate the call. Red indicates a failed well, where no match
with a predicted histogram can be found. Figure 3 shows
pyrograms and associated predicted histograms for the
tri-allelic ABCB1 2677 G>A/T polymorphism.

Fig. 3 Predicted histograms and actual pyrograms for ABCB1 2677 G>A/T genotypes

2. The well(s) where no DNA was added in the PCR reaction
should automatically be scored negative (see Note 22). There
may be nonspecific peaks in the negative control(s). These are
likely to be caused by looping of the internal primer and can
aid trouble-shooting assays by identifying whether the internal
primer is the culprit for background peaks.
3. Samples checked (orange) for human intervention can be
edited by clicking on the specific well and opening up the pre-
dicted histograms from the “histogram” tab on the right. If a
genotype consensus is reached the sample call can be manually
edited by right-clicking over the genotype above the pyro-
gram. Genotypes can be selected and pass/check/fail can be
altered. The well on the plate map will show a dark circle,
indicating that manual editing has taken place.
4. The data can be exported as a report, as a tab delimited file, or
an XML file. Custom export options are also available. The
export function can be accessed by selecting “report” and
then saved as the appropriate file type. Selected wells or the
entire plate can be saved/exported. Pyrograms (all or selected)
can also be saved or printed, up to 6 per page (see Note 23).
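
As a rough illustration of how an exported tab delimited file might be summarized after a run, the sketch below counts wells by the plate-map colors described in step 1 and lists the wells that need manual review. The column names (`Well`, `Genotype`, `Quality`) and the quality labels are hypothetical; the layout of a real export should be checked before adapting this.

```python
import csv
import io
from collections import Counter

# Plate-map colors described in step 1 of this subheading.
COLOR_BY_QUALITY = {"Passed": "blue", "Check": "orange", "Failed": "red"}


def summarize_export(tsv_text):
    """Count wells per color and list the wells that need manual review."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    review = []
    for row in reader:
        color = COLOR_BY_QUALITY[row["Quality"]]
        counts[color] += 1
        if color == "orange":
            review.append(row["Well"])
    return counts, review


# Hypothetical three-well export.
example = "Well\tGenotype\tQuality\nA1\tG/G\tPassed\nA2\tG/T\tCheck\nA3\t\tFailed\n"
counts, review = summarize_export(example)
print(dict(counts), review)
```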

4 Notes

1. Pyrosequencing has been successfully performed on DNA
from cell lines, blood, serum, plasma, paraffin-embedded tissue,
frozen tissue, and whole genome amplified product. In
addition, cDNA from various sources has also been successfully
pyrosequenced.
2. The actual starting concentration of DNA depends on the
quality of the template. DNA extracted from blood is highly
accessible for PCR and consequently 0.5–1 ng can produce
reliable, reproducible product. DNA from plasma, serum,
frozen tissue, and whole genome amplification tends to
be fragmented, and more template may be necessary for optimum
PCR. Serial dilutions of the template DNA should be
tested in advance with the PCR primers to find the
concentration that gives a clean high-yield PCR product.
3. Premade mixes of buffer, magnesium, dNTPs, and Taq poly-
merase are recommended as they provide consistent results
and minimize pipetting errors.
4. Non-hot start Taq is also suitable; however, primer dimers
are less of a problem with hot start Taq, which is therefore
recommended.

5. All solutions should be made using 18.2 MΩ water. Solutions
other than the NaOH and 70 % ethanol should be sterile filtered
prior to the addition of Tween 20. 10× washing buffer,
annealing buffer, and NaOH can be made and stored at room
temperature for dilution to the working concentrations. All
solutions can be stored at room temperature.
6. Problem template loops will also be flagged in the
Pyrosequencing primer design software.
7. This protocol is based on simplex assays; however, multiplex-
ing with up to three internal primers can be performed, either
from the same PCR product or different PCR products. The
primer design software can only determine one internal primer
at a time; often the first choice primers for each will not be
useful in a multiplex assay where the combined sequence to
analyze is best designed to generate unique SNP dispensa-
tions. In addition, the orientation of the primers is vital for
multiplex assays as only one PCR primer can be biotinylated.
8. Issues of concern from primer design:
Mis-priming: If the internal primer can anneal to multiple
positions within the amplicon the 3′ ends of the annealed
region can incorporate nucleotides leading to incorrect geno-
type calls or unacceptable background.
Duplex formation: If the internal primer can dimerize with
itself, as for the mis-priming, unacceptable background may
result, or reduced signal intensity due to suboptimum primer
annealing.
Hairpin loop: If the primer forms secondary structures the
amount of primer available for the reaction is diminished and
reduced signal can result.
Template loop: Loops of more than ~4–5 GC rich regions will
be flagged by an asterisk and should be avoided. Loops less
than 4 bases should also be avoided if possible to reduce the
likelihood of background.
Noncritical parameters from Pyrosequencing primer design:
Repeated base at SNP sequence: This cannot be controlled or
optimized because the SNP position cannot be moved.
Typically the pyrograms can accommodate up to three
bases in a row with no problems. Four to six bases may be difficult
to read manually as the scale will be affected. More than six
repeated bases are not recommended as distinguishing the
peak heights becomes very difficult.
Primer length: The length of the primer is not critical to the
reaction.

9. If an appropriate Pyrosequencing primer cannot be found because
the critical scoring parameters are flagged, it is possible to
“trick” the software to improve the search. As the software will
only look five bases either side of the SNP for a suitable primer,
entering a fake SNP 5 bases before or after will extend the
region searched. This may help to overcome mis-priming and
dimer problems. To eliminate template loops, adjusting the 5′
end of the PCR primer that would cause the loop will help,
e.g., shifting the primer two to three bases to the left or right,
or trying a PCR primer in a slightly different region. As only
one primer is likely to cause the loop problem, if a primer in
the opposite orientation is available (even if not the highest
score), this is often the easiest solution.
10. Premade PCR mixes usually have a fixed magnesium chloride
concentration. If primer conditions are not optimized through
temperature alone, extra magnesium chloride may be added to
the PCR mix. In addition, problem assays may be improved by
the addition of 5–10 % DMSO or 1 M Betaine. This will not
affect the Pyrosequencing.
11. The mix is for 13 samples, allowing one extra sample for pipet-
ting discrepancies.
12. If a larger volume of DNA is necessary, adjustments can be
made to the PCR mastermix (reducing the water volume), or
DNA may be dispensed into the plate and allowed to dry
down overnight at room temperature. The DNA is reconsti-
tuted once the PCR mastermix is added.
13. For multiple primer sets per plate, at least one negative control
per primer set should be included, as well as a negative control
with all primer sets combined.
14. 55 cycles are run to ensure all primers and nucleotides are
exhausted and not available to cause background during the
Pyrosequencing. If wide peaks occur in the pyrogram, reducing
the number of cycles to 40 may help to prevent these.
15. Multiple assays can be run per 96-well plate; indeed, each well
could contain a different internal primer. The wells corresponding
to the negative controls from the PCR setup should
contain internal primer as this is a valuable trouble-shooting
method for pyrogram background issues.
16. A workstation platform is available from Pyrosequencing,
which holds the reagent troughs and plates in specified posi-
tions. Any method to hold the reagent troughs stationary is
appropriate, e.g., rigid plastic tip box lids. A video protocol for
setting up a plate can be viewed at: http://www.jove.com/
index/Details.stp?ID=630, doi: 10.3791/630

17. If a multiplex assay is to be set up, select the “multiplex entry”
icon, right click over “multiplex entry” on the menu tree and
select “new entry”. Type in the three separate dispensation
orders for each internal primer. The computer generated dis-
pensation order will give a combined dispensation for the
three SNPs. The field requirements here are the same as for
the simplex entry except two or three sequences to analyze
may be entered.
18. The menu tree for SNP runs can be organized into folders so
multiple users can easily access their files. If this has been done,
right click over the relevant folder and select “new run”.
19. Each well can contain a different simplex/multiplex entry if
desired, simply select the entry and click in the appropriate
well until all active wells are filled.
20. Pyrosequencing provides specific storage boxes for the tips
with the instrument, and more are available from the company
if required.
21. To save on plate costs, attach adhesive film to the top of the test
plate. The dispensation will occur on the film, rather than in
the wells and this can be wiped off and the plate can be reused.
22. If multiple primer sets are used per plate, the negative controls
for each primer set should be checked for contamination.
23. The report structure is available in forms readily transferable
to most database/spreadsheet systems.
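
The repeated-base guidance in Note 8 (up to three identical bases in a row are unproblematic, four to six may be difficult to read manually, and more than six are not recommended) can be expressed as a small Python check. This is only a sketch under that reading of the note: the function names `longest_repeat` and `repeat_assessment` are hypothetical, and the caller is assumed to pass the stretch of one allele around the SNP position.

```python
from itertools import groupby


def longest_repeat(sequence):
    """Length of the longest run of one identical base in the sequence."""
    return max((len(list(run)) for _, run in groupby(sequence.upper())), default=0)


def repeat_assessment(sequence):
    """Apply the thresholds from Note 8 to the longest repeated-base run."""
    length = longest_repeat(sequence)
    if length <= 3:
        return "no problem expected"
    if length <= 6:
        return "may be difficult to read manually"
    return "not recommended"


# Hypothetical allele sequences around a SNP.
for allele in ("GACTTC", "GATTTTC", "GAAAAAAAC"):
    print(allele, longest_repeat(allele), repeat_assessment(allele))
```

For a SNP, each allele would be checked separately, since the length of the repeat can differ between alleles.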

References
1. Evans WE, McLeod HM (2003) Pharmacogenomics: drug disposition, drug targets, and side effects. N Engl J Med 348:538–549
2. Marsh S (2009) Pyrosequencing. In: Patrinos GP, Ansorge W (eds) Molecular diagnostics. Elsevier, USA
3. Freimuth RR, Ameyaw M-M, Pritchard SC, Kwok P-Y, McLeod HL (2004) High-throughput genotyping methods for pharmacogenomic studies. Curr Pharmacogenomics 2:21–33
4. Ahluwalia R, Freimuth R, McLeod HL, Marsh S (2003) Use of pyrosequencing to detect clinically relevant polymorphisms in dihydropyrimidine dehydrogenase. Clin Chem 49:1661–1664
5. Hoskins JM, Marcuello E, Altes A, Marsh S, Maxwell T, Van Booven DJ, Pare L, Culverhouse R, McLeod HL, Baiget M (2008) Irinotecan pharmacogenetics: influence of pharmacodynamic genes. Clin Cancer Res 14:1788–1796
6. Saeki M, Saito Y, Jinno H, Tohkin M, Kurose K, Kaniwa N, Komamura K, Ueno K, Kamakura S, Kitakaze M, Ozawa S, Sawada J (2003) Comprehensive UGT1A1 genotyping in a Japanese population by pyrosequencing. Clin Chem 49:1182–1185
7. Garsa A, Marsh S, McLeod HL (2005) CYP3A4 and CYP3A5 genotyping by Pyrosequencing. BMC Med Genet 6:19
8. Rose CM, Marsh S, Ameyaw MM, McLeod HL (2003) Pharmacogenetic analysis of clinically relevant genetic polymorphisms. Methods Mol Med 85:225–237
9. Lee SS, Kim WY, Jang YJ, Shin JG (2008) Duplex pyrosequencing of the TPMT3C and TPMT6 alleles in Korean and Vietnamese populations. Clin Chim Acta 398:82–85
10. Yu J, Marsh S (2008) SNP and DNA methylation analysis with Pyrosequencing. In: Wang F (ed) Biomarker methods in drug discovery and development: methods and protocols. Humana, Totowa, pp 119–140

Chapter 7

Pharmacogenetics Using Luminex® xMAP® Technology:
A Method for Developing a Custom Multiplex Single
Nucleotide Polymorphism Mutation Assay
Gonnie Spierings and Sherry A. Dunbar

Abstract
Sequence variations in the human genome can affect the development of diseases and provide markers for
the identification of genetic diseases and drug susceptibility. Single Nucleotide Polymorphisms (SNPs), the
most abundant sequence variations in the genome, are used in pharmacogenetics as indicators of drug
therapy efficacy in individuals and are important road maps in the route to personalized medicine. This
chapter describes the development of PCR based custom multiplex SNP mutation analysis assays using
Luminex® Multi-Analyte Profiling (xMAP®) Technology. Up to 500 different mutations can be detected
in a single well and up to 384 samples can be analyzed per run.

Key words Pharmacogenetics, Luminex®, xMAP® technology, Multiplex mutation analysis,
Microsphere, Suspension array, Liquid array, Nucleic acid detection, SNP analysis

1 Introduction

Since the introduction of the xMAP® Technology by Luminex®,
the platform has found its way into the scientific community,
including research, clinical research, and pharmaceutical laboratories
[1, 2]. This is not surprising since (1) it can be used for protein and
nucleic acid based assays, (2) a broad spectrum of assays is
commercially available, among which is an IVD and CE marked 2D6
panel (xTAG® CYP2D6 kit v3) (Table 1) [3], and (3) the open
architecture allows for the development of custom assays.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_7, © Springer Science+Business Media, LLC 2013

Table 1
Mutations and polymorphisms detected by the xTAG® CYP2D6 Kit v3. All ancillary reagents are included in the kit

Star (*) genotype    Mutations and polymorphisms (a)
*1                   None
*2                   −1584C>G, 1661G>C, 2850C>T, 4180G>C
*3                   2549A>del
*4                   100C>T, 1661G>C, 1846G>A, 2850C>T, 4180G>C
*5                   Deletion
*6                   1707T>del, 4180G>C
*7                   2935A>C
*8                   1661G>C, 1758G>T, 2850C>T, 4180G>C
*9                   2613delAGA
*10                  100C>T, 1661G>C, 4180G>C
*11                  883G>C, 1661G>C, 2850C>T, 4180G>C
*15                  138insT
*17                  1023C>T, 1661G>C, 2850C>T, 4180G>C
*29                  1659G>A, 1661G>C, 2850C>T, 3183G>A, 4180G>C
*35                  −1584C>G, 31G>A, 1661G>C, 2850C>T, 4180G>C
*41                  1661G>C, 2850C>T, 2988G>A, 4180G>C
DUP                  Duplication

The assay is developed to be analyzed on Luminex®100™ and Luminex®200™
(a) Nucleotide changes that define the (*) star genotype are shown in bold font

At the heart of the xTAG® platform are polystyrene microspheres,
or so-called “beads”. There are two types of microspheres:
the MicroPlex® microspheres, which have a diameter of 5.6 μm, and
the superparamagnetic MagPlex® Microspheres, which are 6.5 μm in
diameter. Both types of microspheres have carboxylated surfaces to
which different capture molecules like proteins or oligonucleotides
can be covalently bound. The microspheres are internally dyed
with precise amounts of two or three spectrally different fluorochromes.
By using this method an array is created of up to 500
different microsphere sets, each possessing a unique spectral
address, allowing them to be simultaneously measured in a single
reaction vessel. A fourth fluorochrome, coupled to a reporter molecule,
detects the biomolecular interaction that occurs at the surface
of the microsphere. Multiple readings are made per microsphere
set, providing valid and robust statistics.
The microspheres are interrogated in Luminex analyzers.
Currently there are three different Luminex analyzers available:
Luminex® 200™ (and its predecessor Luminex®100™), FLEXMAP
3D®, and MAGPIX®.
Luminex®200™ and FLEXMAP 3D® are based on flow cytometry,
and the microspheres are interrogated individually in a rapidly
flowing fluid stream as they pass two lasers, a 635-nm 10 mW red
diode laser and a 532-nm 13 mW green yttrium aluminum garnet
(YAG) laser.

Fig. 1 Schematic presentation of the detection modules of Luminex® 200™/FLEXMAP 3D® (a) and MAGPIX® (b)

The red laser excites the internal fluorochromes in order to
classify the microspheres. The green laser excites the reporter fluo-
rochrome (R-phycoerythrin, Alexa 532, or Cy3) bound to the sur-
faces of the microspheres when the analytes for the corresponding
beads are present in the sample (Fig. 1a). Both types of micro-
spheres can be analyzed.
The MAGPIX® utilizes a flow cell, a red (classification) and a
green (reporter) LED for illumination of the microspheres, and
CCD based optics for signal recording. During the illumination
and recording process a magnet is used to hold the MagPlex®
Microspheres in the optics module. After the image recording,
the magnet is released and the sample is transported to the waste
container (Fig. 1b).
xMAP® technology has been used extensively for SNP
genotyping, both in the direct hybridization format and the
solution-based microsphere capture format as described previously
[1]. The direct hybridization assays use the fact that there is a
difference of several degrees in hybridization temperature of an
oligonucleotide to a perfect match template compared to a template
containing a single base mismatch. The SNP of interest is amplified
via a PCR reaction and, by labeling one of the PCR primers, the
amplified product acquires a fluorescent reporter. A capture probe,
complementary to part of the sequence of the labeled PCR strand,
is modified with an amine group and spacer. Using a carbodiimide
coupling procedure, the modified capture probe is coupled to the
carboxylated microsphere [4]. When multiplexing different SNPs,
care should be taken that the hybridization temperatures of the
different PCR strands to their corresponding capture probes are
equal and no cross hybridization occurs.
The xTAG® Microspheres (previously named FlexMAP™
Microspheres) and the recently introduced superparamagnetic
MagPlex®-TAG™ Microspheres are pre-coupled with xTAG® oli-
gonucleotides (anti-tags) that are optimized to have a hybridization
temperature of 37 °C and to have minimum cross-reactivity. Using
these pre-coupled microspheres, different types of solution-based
microsphere capture assays like Allele Specific Primer Extension
(ASPE), Oligonucleotide Ligation Assay (OLA), Multiplex Ligation-
dependent Probe Amplification (MLPA), Gap Ligase Chain Reaction
(Gap-LCR) and Multiplex Oligonucleotide Ligation (MOL)-PCR
[5–9] have been used. Specific capture sequences (tags) are added to
allele-specific primers or probes and are subsequently incorporated
during an enzymatic step, allowing hybridization to the complemen-
tary anti-tag sequence on the microsphere surface. The assays rely on
the discrimination ability of DNA polymerases (ASPE) or DNA
ligases (OLA, MLPA, MOL-PCR, Gap-LCR).
ASPE and OLA start with a PCR step, whereas MLPA, MOL-PCR,
and Gap-LCR start with a ligation step. Because universal
primer sequences are attached to both the allele-specific probe and
the reporter probe, the subsequent PCR reactions of MLPA,
MOL-PCR, and Gap-LCR need only one PCR primer pair.
In this chapter we present a method to develop multiplex SNP
assays on the xMAP® Platform using MagPlex®-TAG™
Microspheres that only needs a PCR step before hybridizing the
labeled products to the microspheres (see Note 1). It should be
noted that in SNP analysis allelic ratios need to be calculated. This
requires the use of one MagPlex®-TAG™ microsphere to detect
the Wild-type SNP and the use of a different MagPlex®-TAG™
Microsphere to detect the Mutation SNP (see Note 2).
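
The chapter text here does not define how the allelic ratio is computed, so the following Python sketch shows one common convention as an assumption: the background-subtracted signal on the mutant microsphere divided by the sum of the background-subtracted signals on the wild-type and mutant microspheres. The function names, the use of median fluorescence intensity (MFI) as input, and the genotype cut-offs of 0.3 and 0.7 are all hypothetical and would need to be established empirically for each SNP.

```python
def allelic_ratio(mfi_wild_type, mfi_mutant,
                  background_wild_type=0.0, background_mutant=0.0):
    """Fraction of net signal on the mutant microsphere (0 = wild-type only)."""
    net_wt = max(mfi_wild_type - background_wild_type, 0.0)
    net_mut = max(mfi_mutant - background_mutant, 0.0)
    total = net_wt + net_mut
    if total == 0:
        return None  # no signal above background on either microsphere set
    return net_mut / total


def call_genotype(ratio, low=0.3, high=0.7):
    """Illustrative cut-offs only; real thresholds must be set empirically."""
    if ratio is None:
        return "no call"
    if ratio < low:
        return "wild-type"
    if ratio > high:
        return "mutant"
    return "heterozygous"


# Hypothetical readings: (wild-type MFI, mutant MFI), background of 50 on each set.
for wt, mut in ((1050, 50), (550, 550), (60, 1050)):
    ratio = allelic_ratio(wt, mut, 50, 50)
    print(wt, mut, ratio, call_genotype(ratio))
```

Using two different microsphere sets per SNP is what makes this calculation possible, because each allele then produces its own independent signal in the same well.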
The PCR reaction should be designed so that the production
of the anti-TAG sequence (complement of TAG) on the nontarget
strand is prevented or minimized. This can be achieved by the
following PCR amplification strategies (Fig. 2):
Method A: Asymmetric PCR. The TAGged primers (for the target
strands) are in excess relative to the primers without TAG
(for the nontarget strands). Optimize the ratio of TAGged to
non-TAGged primers (usually 10:1–100:1). Include a biotinylated
dNTP in the PCR reaction to biotinylate the TAGged
target strand.
Method B: PCR with Lambda Exonuclease Treatment. The primers
without TAG are 5′ phosphorylated. The completed PCR reactions
are treated with Lambda Exonuclease to degrade the phosphorylated
nontarget strands. Include a biotinylated dNTP in
the PCR reaction to biotinylate the TAGged target strands.
Method C: Spacer-modified TAGged Primers. Design the TAGged
primers so that there is a spacer modification between the TAG
and target-specific sequence to prevent amplification of the
anti-TAG sequence in the nontarget strand. The primers without
TAG are 5′ biotinylated to label the PCR products (see Note 3).

Fig. 2 Schematic representation of PCR products hybridized to the corresponding MagPlex®-TAG™
microspheres

2 Materials

2.1 Equipment

1. Luminex® xMAP® analyzer run under xPONENT® software
(either Luminex®100™, Luminex®200™, FLEXMAP 3D®, or
MAGPIX®).

2. Thermal cycler for 0.2 ml thin wall PCR tubes and 96-well
plates.
3. Microcentrifuge for 1.5 ml and 0.2 ml tubes.
4. Vortex mixer.
5. Mini bath sonicator.
6. Cold block for 1.5 ml and 0.5 ml microcentrifuge tubes.
7. PCR cooler rack for 0.2 ml thin wall PCR tubes (96-well
compatible).
8. Pipettes (P10, P20, P100, P200, P1000).
9. 8 channel pipette (1–10 μl, 5–50 μl, 50–200 μl).
10. Racks for 1.5 ml and 0.5 ml microcentrifuge tubes and for
0.2 ml thin-walled PCR tubes.
11. Dynal MPC®-96S Magnetic Particle Concentrator (see Note 4).
12. 96-well plate magnet compatible with V-bottom plates (see
Notes 5 and 6).
13. Pipette aid.

2.2 Consumables

1. 0.2 ml thin wall polypropylene tubes for PCR (see Note 7).
2. 1.5 ml and 0.5 ml polypropylene microcentrifuge tubes.
3. 25 ml Pipettes.
4. Polypropylene tubes (Falcon® tubes): 15 ml and 50 ml.
5. Aerosol Resistant tips for Pipettes.
6. Corning Costar® Thermowell® Thin-wall polycarbonate
96-well plate (see Note 8).
7. Bio-Rad Microseal® A.
8. Parafilm M.
9. Reservoir basins.

2.3 Reagents

1. MagPlex®-TAG™ Microspheres (see Notes 9 and 10).
2. PCR amplification primers (see Note 11): Resuspend in
Molecular Biology grade water to a concentration of 1 mM.
For each target, one primer has a unique TAG sequence, or a
unique TAG and spacer (see Method C), at the 5′ end upstream
from the target-specific sequence. The other primer is designed
according to one of the following methods:
(a) Method A. Asymmetric PCR: the primer without TAG is
unmodified.
(b) Method B. PCR with Lambda Exonuclease treatment: the
primer without TAG is 5′ phosphorylated.
(c) Method C. Spacer-modified TAGged primers: the primer without
TAG is 5′ biotinylated (see Notes 12 and 13).
3. Molecular Biology grade water.
4. Qiagen HotStarTaq® Polymerase including 10× PCR Buffer
and 25 mM MgCl2 or equivalent.
5. Lambda Exonuclease and 10× reaction buffer (for Method B).
6. dNTPs at 100 mM each.
7. Biotin-14-dCTP at 0.4 mM (for Methods A and B) (see Note 12).
8. 1.1× Tm Hybridization Buffer: 0.22 M NaCl, 0.22 M Tris,
0.088 % Triton X-100, pH 8.0 (see Note 14).
9. 1× Tm Hybridization Buffer: 0.2 M NaCl, 0.1 M Tris, 0.08 %
Triton X-100, pH 8.0 (see Note 15).
10. Streptavidin–R-phycoerythrin (SA-PE) 1 mg/ml (Invitrogen)
(see Notes 13 and 16).

3 Methods

3.1 Multiplexed PCR Reaction

The following procedures are for single PCR reactions (see Notes
17 and 18). To analyze up to 96 samples, scale up by multiplying the
volumes by the number of samples (see Note 19). PCR should be
performed under optimized conditions. The parameters listed
below are for example purposes only.
1. PCR Set-up
(a) Method A: Asymmetric PCR.
Prepare the following PCR mix per sample: 1× Qiagen
PCR reaction buffer, 1.5 mM MgCl2, 200 μM each dNTP
(-dCTP), 200 μM biotin-dCTP, 0.4–1 μM each TAGged
primer, 0.004–0.1 μM each primer without TAG, 2.5 Units
Qiagen HotStarTaq® polymerase, 50 ng template (see
Notes 20 and 21).
(b) Method B: PCR with Lambda Exonuclease Treatment.
Prepare the following PCR mix per sample: 1× Qiagen
PCR reaction buffer, 1.5 mM MgCl2, 200 µM each dNTP
(-dCTP), 200 µM biotin-dCTP, 0.2 μM each primer,
2.5 Units Qiagen HotStarTaq® polymerase, 50 ng tem-
plate (see Note 21).
(c) Method C: Spacer-modified TAGged Primers.
Prepare the following PCR mix per sample: 1× Qiagen PCR
reaction buffer, 1.5 mM MgCl2, 200 μM each dNTP, 0.2 μM
each primer, 2.5 Units Qiagen HotStarTaq® polymerase,
50 ng template (see Note 22).
122 Gonnie Spierings and Sherry A. Dunbar

2. PCR Cycles (all three methods)

Hold: 95 °C, 15 min (for enzyme activation)

Cycle: 94 °C, 30 s
55 °C, 30 s
72 °C, 30 s
35 Cycles
Hold: 72 °C, 7 min
Hold: 4 °C, Forever
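
Scaling a single reaction to a master mix, with the 10 % overage recommended in Note 19, can be sketched as follows. The per-reaction volumes in the example are hypothetical placeholders (the protocol specifies final concentrations, not volumes), and `master_mix` is an illustrative name.

```python
def master_mix(per_reaction_ul, n_samples, overage=0.10):
    """Scale per-reaction volumes (in µl) to a master mix.

    overage is the fractional excess added to cover pipetting
    variability (0.10 = 10 %).
    """
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


# Hypothetical per-reaction volumes; replace with your own optimized values.
example_reaction = {"10x PCR buffer": 2.5, "primer mix": 1.5, "water": 16.0}
mix_for_plate = master_mix(example_reaction, n_samples=96)
```

Template is usually added to each tube separately, so it is left out of the example dictionary.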

3.2 Lambda Exonuclease Treatment of PCR Product (Only for Method B;
for Methods A and C, Proceed to Subheading 3.3)

1. Combine 5 μl PCR reaction (Method B), 1 μl 10× Lambda Exonuclease
reaction buffer, and 5–10 Units Lambda Exonuclease. Add
Molecular Biology grade water to a final volume of 10 μl,
mix, and place in a thermal cycler.
2. Thermal cycler conditions

Hold: 37 °C, 30 min
Hold: 80 °C, 15 min
Hold: 4 °C, Forever

3.3 Hybridization to MagPlex®-TAG™ Microspheres

For Methods A and B (see Note 23)
1. Select the appropriate MagPlex®-TAG™ microsphere sets and
resuspend according to the instructions described in the
Product Information Sheet provided with the microspheres.
2. Combine 2,500 microspheres of each region per reaction.
3. Dilute/concentrate the MagPlex®-TAG™ microsphere mixture
to 125 microspheres of each set per μl in 1.1× Tm Hybridization
Buffer and mix by vortexing and sonication for approximately 20 s.
4. Aliquot 20 μl of the MagPlex®-TAG™ microsphere mixture to
each well.
5. Add 1–5 μl of dH2O to each background well.
6. Add 1–5 μl of each PCR reaction to appropriate wells.
7. Cover the plate to prevent evaporation and denature at 96 °C for
90 s.
8. Hybridize at 37–45 °C for 30 min (see Notes 24 and 25).
9. Prepare Reporter Mix by diluting SA-PE to 8–10 μg/ml in
1× Tm Hybridization Buffer (see Note 26).
10. Add 70 μl to each well. Mix gently.
11. Incubate at 37–45 °C for 15 min.
12. Analyze 70 μl at hybridization temperature on the Luminex
analyzer according to the system manual.

For Method C
1. Select the appropriate MagPlex®-TAG™ microsphere sets and
resuspend according to the instructions described in the
Product Information Sheet provided with your microspheres.
2. Combine 2,500 microspheres of each set per reaction.
3. Dilute/concentrate the MagPlex®-TAG™ microsphere mixture
to 125 microspheres of each set per μl in 1.1× Tm Hybridization
Buffer and mix by vortexing and sonication for approximately 20 s.
4. Aliquot 20 μl of the MagPlex®-TAG™ microsphere mixture to
each well.
5. Add 1–5 μl of dH2O to each background well.
6. Add 1–5 μl of each PCR reaction to appropriate wells.
7. Prepare Reporter Mix by diluting SA-PE to 8–10 μg/ml in 1× Tm
Hybridization Buffer.
8. Add 70 μl to each well. Mix gently.
9. Cover the plate to prevent evaporation and hybridize at
37–45 °C for 25–45 min.
10. Analyze 70 μl at hybridization temperature on the Luminex
analyzer according to the system manual.
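
The microsphere arithmetic in steps 2–4 (2,500 microspheres of each set per reaction, 125 per μl, 20 μl per well) can be planned with a short calculation. This is a sketch under one stated assumption: the stock concentration `stock_per_ul` (2,500 microspheres per μl here) is a placeholder, so take the real value from the Product Information Sheet. The function name `bead_mix` is hypothetical.

```python
def bead_mix(n_sets, n_reactions, stock_per_ul=2500.0,
             beads_per_reaction=2500, final_per_ul=125.0):
    """Plan a microsphere working mixture.

    stock_per_ul is an ASSUMED stock concentration (microspheres per µl);
    check the Product Information Sheet for the real value.
    """
    beads_per_set = beads_per_reaction * n_reactions
    stock_ul_per_set = beads_per_set / stock_per_ul   # volume of each stock to pool
    final_ul = beads_per_set / final_per_ul           # final mixture volume
    combined_stock_ul = stock_ul_per_set * n_sets
    return {
        "stock_ul_per_set": stock_ul_per_set,
        "final_ul": final_ul,
        "well_ul": beads_per_reaction / final_per_ul,
        # True when the pooled stocks exceed the final volume, so the
        # microspheres must be pelleted and resuspended (concentrated).
        "concentrate": combined_stock_ul > final_ul,
    }


plan = bead_mix(n_sets=10, n_reactions=100)
```

The `concentrate` flag mirrors the "Dilute/concentrate" wording of step 3: highly multiplexed mixtures may need pelleting before resuspension in 1.1× Tm Hybridization Buffer.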

4 Notes

1. Certain applications of MagPlex®-TAG™ Microspheres may be
covered by patents owned by parties other than Luminex.
Purchase of MagPlex®-TAG™ Microspheres does not convey a
license to any third-party patents unless explicitly stated in
writing. You are responsible for conducting the necessary due
diligence and securing rights to any third-party intellectual
property required for your specific application(s). Nothing
herein is to be construed as recommending any practice or any
product in violation of any patent or in violation of any law
or regulation.
2. Allelic Ratio Normal = MFI normal/(MFI normal + MFI mutant).
Allelic Ratio Mutant = MFI mutant/(MFI normal + MFI mutant).
Homozygote: allelic ratio > x on one bead set.
Heterozygote: allelic ratio between y and x on each bead set.
Often x = 0.75 and y = 0.25, but these values should be
determined for each SNP.

3. Do not denature the PCR product of Method C prior to the
hybridization step.
4. Instead of using a magnet, a microcentrifuge can be used to
pellet the Magnetic Microspheres by centrifugation at
≥2,250 × g for 3 min and remove the supernatant.
5. In order to automate the procedure, a microplate washer
compatible with Magnetic Microspheres can be used.
6. Instead of using a magnet, a 96-well plate centrifuge (≥2,250 × g
for 3 min) or a 1.2 μm Millipore filter plate and vacuum manifold
can be used.
7. For ease of use we advise using strips of eight PCR tubes with
attached flat caps.
8. Plate compatible with Luminex analyzers.
9. Make sure the MagPlex®-TAG™ Microspheres used are com-
patible with the type of Luminex analyzer used. For more
information please visit http://www.luminexcorp.com.
10. MagPlex®-TAG™ Microspheres are light sensitive. Protect
from light during incubation steps.
11. PCR Primer Design
(a) PCR primers should be designed to amplify a region con-
taining the SNP of interest.
(b) The discriminating target-specific PCR primers should be
synthesized for all sequence variants and should be from
the same DNA strand (per SNP).
(c) PCR primers should be matched for melting temperature
at 51–56 °C.
(d) The target-specific PCR primer should extend out to and
include the SNP as the 3′ nucleotide.
(e) Use oligo design software to select an appropriate TAG
sequence.
12. Biotin is light sensitive, so protect from light during incubation
steps.
13. It is possible to label the primer without TAG with a fluorescent
dye such as Alexa 532 or Cy3. When using a fluorescently labeled
primer, no SA-PE is needed. The signal intensities obtained will,
however, be approximately 70 % lower than with the
Biotin/SA-PE method.
14. For 250 ml 1.1× Tm buffer: 27.5 ml 1 M Tris–HCl pH 8.0,
11 ml 5 M NaCl, 0.22 ml Triton® X-100, 211.28 ml Molecular
Grade dH2O. Filter-sterilize and store at 4 °C.
15. For 250 ml 1× Tm buffer: 25 ml 1 M Tris–HCl pH 8.0, 10 ml 5 M
NaCl, 0.2 ml Triton® X-100, 214.8 ml Molecular Grade
dH2O. Filter-sterilize and store at 4 °C.
16. SA-PE is light sensitive. Protect from light at all times.
17. Perform PCR setup in Pre-PCR area.
18. Prior to use, mix all solutions, except enzyme stock solutions,
by short vortex (2–5 s) and settle the reagents to the bottom of
the tube by short centrifugation (2–5 s). Enzyme stock solutions
should be taken from the freezer when ready to use and returned
to the freezer immediately after use (alternatively, they can be
kept on a freezer block). Mix enzyme stock solutions by inverting
and flicking the tube, followed by a short centrifugation step
(2–5 s) to settle reagents to the bottom of the tube.
19. When calculating master mix volumes for multiple reactions,
include a minimum of 10 % overage to account for variability
in pipetting. After making the master mix, vortex the solution
(2–5 s) followed by a short centrifugation step (2–5 s) and
then aliquot in the separate tubes.
20. Ratio of TAGged (excess) to non-TAGged (limiting) primer
should be optimized in the 10:1–100:1 range.
21. Ratio of biotinylated to unlabeled dNTPs may require
optimization.
22. During setup, keep master mix and samples on ice or a cold
block. Preheat thermal cycler to 95 °C.
23. If background signals are too high wash steps may need to be
added after the hybridization step and/or after adding the SA-PE.
24. These steps can be performed on a thermal cycler programmed
as follows: Hold at 96 °C, 90 s, Hold at 37 °C, Forever.
25. Optional Wash Procedure after Hybridization Step:
(a) Pellet the MagPlex®-TAG™ microspheres by placing the
plate on a magnetic separator and allow separation to
occur for 30–60 s. Remove the supernatant.
(b) Resuspend the pelleted MagPlex®-TAG™ microspheres in
75 μl of 1× Tm Hybridization Buffer.
(c) Repeat steps (a) and (b). This is a total of two washes.
(d) Pellet the MagPlex®-TAG™ microspheres by placing the
plate on a magnetic separator and allow separation to
occur for 30–60 s. Remove the supernatant.
● Alternatively, wash steps can be performed by centrif-
ugation or vacuum filtration.
– Pellet the MagPlex®-TAG™ microspheres by
centrifugation at ≥2,250 × g for 3 min and remove
the supernatant.
– Pre-wet a 1.2 μm Millipore filter plate with 1×
Tm Hybridization Buffer and filter by vacuum
manifold. Transfer the reactions to the pre-wetted
filter plate and remove the supernatant by vacuum
filtration. Wash twice with 100 μl 1× Tm
Hybridization Buffer.
(e) Resuspend microspheres in 75 µl of 1× Tm Hybridization
Buffer containing 2–8 µg/ml SA-PE.
(f) Incubate at 37–45 °C for 15 min.
(g) Analyze 50 µl at hybridization temperature on the Luminex
analyzer according to the system manual.
26. Make the diluted SA-PE when ready to use in an appropriate
polypropylene tube. Prolonged storage of diluted SA-PE in a
plastic container will decrease the SA-PE concentration in
solution.
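
The allelic ratio rule in Note 2 translates directly into a small genotype-calling function. The sketch below is a minimal illustration, assuming background-subtracted MFI values for the two bead sets of one SNP and the commonly used cutoffs x = 0.75 and y = 0.25; the function name and the returned labels are hypothetical, and the cutoffs should be determined per SNP as the note states.

```python
def call_genotype(mfi_normal, mfi_mutant, x=0.75, y=0.25):
    """Call a genotype from two MFI values using allelic ratios.

    Assumes background has already been subtracted from both MFI values.
    x and y are the homozygote and heterozygote cutoffs (Note 2).
    """
    total = mfi_normal + mfi_mutant
    if total <= 0:
        return "no call"
    ratio_normal = mfi_normal / total
    ratio_mutant = mfi_mutant / total
    if ratio_normal > x:
        return "homozygous normal"
    if ratio_mutant > x:
        return "homozygous mutant"
    if y <= ratio_normal <= x and y <= ratio_mutant <= x:
        return "heterozygous"
    return "no call"


genotype = call_genotype(mfi_normal=1000, mfi_mutant=1100)
```

With SNP-specific cutoffs that leave a gap between y and 1 − x, samples falling in the gap are returned as "no call" rather than being forced into a genotype.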

References
1. Dunbar SA (2006) Applications of Luminex® xMAP™ technology
for rapid, high-throughput multiplexed nucleic acid detection.
Clin Chim Acta 363:71–82
2. Dunbar SA, Hoffmeyer MR (2013) Microsphere-based multiplex
immunoassays: development and applications using Luminex®
xMAP® Technology. In: Wild DG (ed) The immunoassay handbook,
4th edition. Elsevier, Amsterdam, NL, pp 157–174
3. Luminex Corporation (2010) Luminex Corporation launches new
FDA cleared pharmacogenetic diagnostic test. xTAG® CYP2D6 Kit
can assist physicians in improving patient care by helping to
determine a personalized therapeutic strategy: press release.
PRNewswire via COMTEX.
http://www.prnewswire.com/news-releases/luminex-corporation-launches-new-fda-cleared-pharmacogenetic-diagnostic-test-108680239.html.
Accessed 17 Nov 2010
4. Dunbar SA, Jacobson JW (2007) Quantitative, multiplexed
detection of Salmonella and other pathogens by Luminex® xMAP™
suspension array. In: Schatten H, Eisenstark A (eds) Methods in
molecular biology: Salmonella: methods and protocols, vol 394.
Humana, Totowa, NJ, pp 1–19
5. Pickering JW et al (2004) Flow cytometric assay for genotyping
cytochrome P450 2C9 and 2C19: comparison with a microelectronic
DNA array. Am J Pharmacogenomics 4(3):199–207
6. Bruse E et al (2008) Improvements to bead-based oligonucleotide
ligation SNP genotyping assays. Biotechniques 45:559–571
7. Monico CG et al (2007) Comprehensive mutation screening in 55
probands with type 1 primary Hyperoxaluria shows feasibility of a
gene-based diagnosis. J Am Soc Nephrol 18:1905–1914
8. Tian F et al (2008) A new single nucleotide polymorphism
genotyping method based on gap ligase chain reaction and a
microsphere detection assay. Clin Chem Lab Med 46:486–489
9. Deshpande A et al (2010) A rapid multiplex assay for nucleic
acid-based diagnostics. J Microbiol Methods 80:155–163

Chapter 8

Use of Linkage Analysis, Genome-Wide Association
Studies, and Next-Generation Sequencing in
the Identification of Disease-Causing Mutations
Eric Londin, Priyanka Yadav, Saul Surrey, Larry J. Kricka,
and Paolo Fortina

Abstract
For the past two decades, linkage analysis and genome-wide analysis have greatly advanced our knowledge
of the human genome. Despite these successes, the genetic architecture of many diseases remains unknown.
More recently, the availability of next-generation sequencing has dramatically increased our capability for
determining DNA sequences that range from large portions of one individual’s genome to targeted regions
of many genomes in a cohort of interest. In this review, we highlight the successes and shortcomings of
genome-wide association studies (GWAS) in identifying the variants contributing to disease. We further
review the methods and use of new technologies, based on next-generation sequencing, that are
increasingly being used to expand our knowledge of the causes of genetic disease.

Key words Linkage analysis, Genome-wide association study, Massively parallel sequencing,
NGS applications, Pharmacogenomics

1 Introduction

Over the last 25 years, exciting progress has been made in identifying
the genetic variants associated with human diseases. During this
time, genes responsible for over 3,000 Mendelian disorders have
been identified (Online Mendelian Inheritance in Man, http://www.
ncbi.nlm.nih.gov/omim); however, identifying variants associated
with complex diseases has proven more difficult. More recently, new
genomic methods have begun to impact this field and currently more
than 1,300 variants have been associated with a variety of complex
diseases (http://www.genome.gov/gwastudies) [1].
Linkage analysis and, more recently, genome-wide association
studies (GWAS) have been the main tools to identify variants for
Mendelian and complex diseases, respectively. Both methods reveal
genomic regions associated with disease rather than the actual
disease-causing variants. In some instances, follow-up examination
of these regions has enabled discovery of the causal variant. While
both are powerful techniques and have increased our knowledge of
the genetic basis of many diseases, they are not amenable to all
diseases, and many of the disease-causing variants have remained
elusive. In this review, we highlight the successes and shortcomings
of GWAS in identifying the variants contributing to disease. We
further review the methods and use of new technologies, based on
next-generation sequencing, that are increasingly being used to
expand our knowledge of the causes of genetic disease.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_8, © Springer Science+Business Media, LLC 2013

2 Methods for Identifying Genomic Variants Associated with Disease

2.1 Genome-Wide Linkage Studies

The initial successes in identifying mutations causing monogenic
(or Mendelian) disease used linkage and positional cloning through
family-based studies [2–4]. Following early successes of this
approach, its widespread adoption led to the identification of the
genetic links to many diseases. Often, in Mendelian diseases the
identified mutations lead to changes in the amino acid sequence of
the translated protein, greatly increasing one’s risk of developing
the disease. Linkage studies are very powerful for identifying such
rare risk alleles, which are typically responsible for Mendelian
disorders, but due to low resolution and lack of statistical power
to identify more common variants of modest effect, they have not
been successful for more complex disorders [5, 6]. Additionally,
the lack of genetically informative families, particularly for
diseases displaying late onset or caused by de novo mutations,
limits the availability of sufficient numbers of affected
relatives to provide adequate power to identify the disease-causing
variants even for some Mendelian disorders. For a disease where the
majority of cases are sporadic (such as Kabuki syndrome), linkage
studies have failed to identify causative mutations.

2.2 Genome-Wide Association Studies

GWAS examine single nucleotide polymorphisms (SNPs) throughout
the genome in thousands of individuals to identify alleles
associated with disease. This approach relies upon information from
the HapMap project, and the existence of linkage disequilibrium
throughout the human genome, so that a variant at one locus can
predict the genetic variation at adjoining loci [7]. In this approach,
typically hundreds of thousands of SNPs are genotyped in disease and
control groups. Comparison of allele frequencies between the two
groups reveals genotypes that are overrepresented in one group
compared to the other and are therefore associated with disease risk.
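
The comparison of allele frequencies between cases and controls can be illustrated for a single SNP with a 2 × 2 table of allele counts. The sketch below is one common way to do this (an odds ratio plus a Pearson chi-square test with one degree of freedom); it is not taken from any specific GWAS pipeline, and the counts in the example are invented.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio, chi-square statistic and p-value for a 2x2 allele table.

    Rows are cases and controls; columns are alternate and reference
    allele counts. The p-value uses the 1-degree-of-freedom chi-square
    survival function, which equals erfc(sqrt(chi2 / 2)).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Invented allele counts for one SNP: 60/40 in cases, 40/60 in controls.
odds_ratio, chi2, p_value = allelic_association(60, 40, 40, 60)
```

In a real study this test is repeated for every genotyped SNP, which is why a stringent genome-wide significance threshold is needed.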
The fundamental basis of GWAS is the common-disease/
common-variant (CD/CV) hypothesis, which states that common
diseases are driven by multiple common variants [8]. Individually,
each variant confers only a minor amount of risk, but when
combined, their effect is substantially increased. In GWAS, these
disease-causing variants may not be identified directly, but rather
genomic locations linked to them may be identified. In fact, a
statistically significant association is often not found within a gene.
Generally, the gene closest to the associated common variant
is taken as the most likely candidate gene.
The first GWAS, performed in 2005, was for age-related macular
degeneration (AMD). Here, Klein et al. [9] genotyped ~100,000
SNPs throughout the genome in a small sample set of 96 cases and
50 controls, resulting in the association of a SNP in the
complement factor H (CFH) gene. The p-value of the associated SNP
surpassed the genome-wide significance threshold, and the high
effect size (odds ratio = 4.6) contributed to the highly significant
finding. The success of this study suggested that this would be a
viable approach for other complex disorders. Since the AMD study,
numerous GWAS have been published identifying over 1,300
significant associations (p-value < 5 × 10⁻⁸) with over 220 different
traits (http://www.genome.gov/gwastudies) [1].
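
The 5 × 10⁻⁸ threshold quoted above is commonly motivated as a Bonferroni correction of a 0.05 significance level for roughly one million independent tests across the genome. The sketch below shows that arithmetic and a simple filter; the figure of one million tests is the usual rule of thumb rather than a value stated in this chapter, and the SNP identifiers and p-values are invented.

```python
def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def genome_wide_significant(p_values, threshold=5e-8):
    """Keep only the SNPs whose p-value is below the threshold."""
    return {snp: p for snp, p in p_values.items() if p < threshold}


# Invented example results for three SNPs.
results = {"snp_a": 3.2e-9, "snp_b": 4.0e-6, "snp_c": 7.5e-8}
hits = genome_wide_significant(results)
```

Only `snp_a` passes here; `snp_c` would count as suggestive but not genome-wide significant.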
While thousands of associations have been identified, they have
failed, with few exceptions, to produce results as definitive as the
AMD study. Over time, the number of SNPs assayed has increased
into the millions with tens of thousands of subjects. Even with
these increasingly large studies, only modest associations have been
identified. For example, studies of Crohn’s disease have identified
30 loci, but when combined they only explain ~20 % of the overall
heritability of the disease [10]. Similarly, studies of human height,
which has an estimated heritability of ~80 %, have identified hun-
dreds of loci which explain only 10 % of its overall heritability [11].
These studies are in contrast to AMD in which five common loci
explain over 50 % of the heritability of the disease [12]. These dif-
ferences in study results can be attributed to the fact that some
common diseases, such as AMD, can be explained by a limited
number of common variants of large effect [13]. But, for most
other conditions, this is not the case, and common variants only
account for a small part of the overall heritability of the disease.
In conclusion, even though GWAS have been successful in
identifying many low-risk alleles for common disease, the findings
do not explain a large proportion of the heritability of complex
diseases [8]. Several possible reasons for this exist. First,
estimates of heritability, based on familial aggregation, may be
inflated. Genetic and non-genetic factors contribute to familial
aggregation of disease [14], with the latter not being detected in
a GWAS. Second, GWAS tend to identify loci, not genes, and a
positive signal is not always in a protein-coding region, making
identification of the disease-causing variants difficult.
Therefore, the proportion of heritability explained by the
associated SNPs may be underestimated. Third, not all of the
genome is covered. Finally, the majority of GWAS are underpowered
to detect association to rare alleles (frequency less than 0.05).
Detecting such associations would require cohorts in the tens to
hundreds of thousands of subjects to generate adequate statistical
power. Such large cohort sizes would be impractical to obtain
for many diseases. With these factors in mind, GWAS may not be the
appropriate method to identify all the genetic variants involved in
diseases that are due to the effects of multiple genes. More
comprehensive approaches are needed, and it is anticipated that the
analysis of rarer variants identified by the 1000 Genomes Project
(1KGP) [15] may allow for this to occur. By leveraging advances in
massively parallel sequencing technology, the 1KGP will extend the
catalog of human variation to cover minor allele frequencies as low
as ~1 %, thereby increasing the scientific community’s understanding
of the full spectrum of variation in human populations.

2.3 Massively Parallel Sequencing

Perhaps the most comprehensive approach to identify genomic
variants associated with both Mendelian and more complex
diseases is to perform massively parallel sequencing or
next-generation sequencing (NGS). This approach involves either
sequencing the entire genome or specifically targeted regions.
Eventually, these studies will be performed on a large number of
samples, in a manner similar to GWAS, to allow for the
identification of all variants associated with the disease. The
increased use of this technology will give a more complete
understanding of the genome and the information encoded within.
Genome sequencing has progressed significantly in recent years,
from being able to sequence hundreds of base pairs to millions of
base pairs in a single reaction. The initial sequencing of the human
genome, completed in 2000 [16, 17], was performed by Sanger
sequencing. In this process, DNA fragments are terminated with a
fluorescently labeled base, and all of the fragments are separated in
order of their length via capillary electrophoresis. The identity of
the last base of each fragment is used to determine the original
sequence [18, 19]. This method can produce sequence reads of up to
800 nucleotides. The initial sequencing of the human genome took
10 years to complete at a cost of $3 billion. Although this
methodology is accurate and powerful, its cost and speed do not make
it a feasible approach for large-scale sequencing. Recent years have
seen great advances in sequencing technologies, now making it
possible for individual laboratories to sequence an entire genome.
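
The Sanger read-out principle described above (fragments sorted by length, with the labeled terminal base of each fragment giving one position of the sequence) can be sketched as a toy simulation. This is an idealized illustration only: it assumes exactly one terminated fragment per position and perfect size separation, and the function names are hypothetical.

```python
import random


def terminated_fragments(synthesized, seed=0):
    """Simulate chain termination: one fragment per position, in mixed order.

    Each fragment is recorded as (length, fluorescent terminal base).
    """
    fragments = [(i + 1, base) for i, base in enumerate(synthesized)]
    random.Random(seed).shuffle(fragments)   # the reaction yields a mixture
    return fragments


def read_sequence(fragments):
    """Order fragments by length (as electrophoresis does) and read the last bases."""
    return "".join(base for _, base in sorted(fragments, key=lambda f: f[0]))


mixture = terminated_fragments("GATTACA")
sequence = read_sequence(mixture)
```

Sorting by length restores the order of the synthesized strand regardless of how the fragments were mixed.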

2.4 NGS Sequencing Platforms

The newer sequencing technologies can achieve a much higher
throughput by sequencing a large number of samples in parallel.
Currently, a variety of new platforms are available to perform
massively parallel sequencing, with each platform allowing for
differences in the scale of the sequencing being performed [20–22].
The various platforms each produce sequence reads of different
lengths, which then have to undergo extensive bioinformatic analyses
to align the sequences and identify genomic variants.

Table 1
Overview of the three main next-generation sequencing platforms

                       Roche 454        Life Technologies             Illumina HiSeq
                                        SOLiD/5500xl
Sequencing method      Pyrosequencing   Sequencing by ligation of     Polymerase-mediated incorporation
                                        fluorescently labeled         of terminally labeled fluorescent
                                        nucleotides                   nucleotides
Library amplification  Emulsion PCR     Emulsion PCR                  Enzymatic amplification
method
Maximum read lengths   Up to 1,000 bp   Mate-paired: 2 × 50 bp        Paired-end: 2 × 150 bp
                                        Paired-end: 75 bp and 35 bp
                                        Fragment: 75 bp
Throughput             700 Mb/run       Up to 15 Gb/day for a         Up to 35 Gb/day for a
                                        single flow cell              single flow cell

Currently, there are three main NGS platforms (Roche 454
FLX, Life Technologies SOLiD and 5500xl sequencer and Illumina
HiSeq) each relying on different technologies (Table 1). The
Roche 454 FLX sequencer uses large-scale parallel pyrosequencing
to generate 400–600 megabases of sequence. This method ampli-
fies DNA inside water droplets in an oil solution (emulsion PCR,
ePCR), with each droplet containing a single DNA template
attached to a single primer-coated bead that then forms a single
molecular colony. Pyrosequencing [23, 24] uses luciferase to gen-
erate light for detection of the individual nucleotides added to
nascent DNA during polymerization, and the combined data are
used to generate sequence readouts. Currently, this technology
allows for read lengths of up to 400 nucleotides (Fig. 1).
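
The way pyrosequencing turns light signals into sequence can be sketched with a toy flowgram decoder. Nucleotides are flowed one at a time in a fixed cyclic order, and the light emitted in each flow is roughly proportional to the number of bases incorporated. The flow order "TACG" and the signal values below are assumptions chosen for illustration, and simple rounding stands in for the instrument's real base-calling model.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert per-flow light signals into a nucleotide sequence.

    signals[i] is the normalized light intensity of flow i; a value near
    n means n identical bases were incorporated (a homopolymer of length n).
    """
    sequence = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]   # nucleotide added in this flow
        n_incorporated = int(round(signal))
        sequence.append(base * n_incorporated)
    return "".join(sequence)


# Invented signals for six flows: T, A, C, G, T, A.
read = decode_flowgram([1.02, 0.05, 0.0, 2.1, 0.0, 0.97])
```

Because homopolymer length is inferred from signal strength, long runs of the same base are the main source of error in this kind of read-out.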
Similar to the Roche 454 sequencer, the Life Technologies
SOLiD (Sequencing by Oligonucleotide Ligation and Detection)
and 5500xl sequencer rely on ePCR to amplify DNA fragments
bound to beads, which are covalently bound to a glass slide.
Sequencing is performed by ligation, detection and cleavage of
di-base probes. After each set of reactions the extension product is
cleaved and the template is reset for the next round of reactions.
Multiple cycles of this will produce read lengths of up to 75 nucle-
otides (Fig. 2).

Fig. 1 Roche 454 Sequencing. (a) Genomic DNA is fragmented and adapter sequences are ligated onto
fragmented DNA mixed with agarose beads; (b) Emulsion PCR is used to amplify the DNA fragments on the
agarose beads generating millions of amplified sequencing templates on each bead; (c) The beads are deposited
into PicoTiter wells where simultaneous sequencing of the entire genome is performed in thousands of picoliter-
sized wells by Pyrosequencing; (d) Pyrosequencing reactions consist of stepwise elongation of the primer
strand by sequential addition of the individual deoxynucleoside triphosphates in the presence of sulfurylase
and luciferase. Sequence at each elongation step is inferred by measuring light emission as an indicator of
nucleotide incorporation. This method allows for the amplification of up to 1,000 nucleotide size sequences
(with permission from 454 Sequencing© Roche Diagnostics, Branford, CT.)

Fig. 2 Life Technologies Sequencing by Ligation (SOLiD). The SOLiD sequencing technology involves the
preparation of a sequencing library. (a) DNA is fragmented into smaller pieces and adapter and primer
sequences are ligated onto the fragments; (b) The DNA fragments are deposited onto agarose beads and
the fragments are enriched during ePCR. The 3′ ends of the amplified fragments are covalently modified
to allow for attachment to the glass slide; (c) Following the 3′ modification, the beads are deposited
onto glass slides; (d) Sequencing by ligation occurs with the binding of a sequencing primer to the DNA
fragment with fluorescently labeled di-bases being ligated to the primer; (e) The specificity of the
di-base probe is achieved through the interrogation of the 1st and 2nd base of each ligation reaction.
Multiple cycles of ligation, detection and cleavage are performed with the number of cycles determining
the overall read length. Following the ligation and detection cycles, the extension product is removed
and the template is reset with a primer complementary to the n − 1 position for a second round of
ligation cycles. In total, five rounds of primer resets are completed allowing for virtually every base
to be interrogated in two independent ligation reactions by two different primers allowing for up to a
99.99 % accuracy to be achieved (with permission from Life Technologies, Carlsbad, CA.)

In contrast to the 454 and SOLiD/5500xl methods, which
use bead-based ePCR to amplify DNA fragments, Illumina utilizes
a unique bridged-amplification reaction that occurs on the surface
of the flow cell. Here, single-stranded, adapter-ligated fragments
are bound to the surface of the flow cell and exposed to reagents
for polymerase-based extension. Priming occurs as the free/distal
end of a ligated fragment bridges to a complementary
oligonucleotide on the surface. This process is repeated multiple
times to produce millions of copies of DNA fragments. Following
this step, sequencing by synthesis is performed with the
incorporation of a single fluorescent nucleotide, followed by
imaging of the flow cell. This process is repeated multiple times
to produce 75-nucleotide reads (Fig. 3).
Taken together, these machines can produce billions of base
pairs of sequence in a relatively short period of time (days to a few
weeks), allowing for the sequencing of entire genomes or specifically
targeted regions (e.g., whole-exome sequencing). It is now clear
that this process can be performed by a single lab; the major hurdle
is the bioinformatic analysis of the sequence data. Typically, the
analysis is performed in multiple stages [25]. The first is the
acquisition of the raw sequence reads from the machine, which need
to be mapped to the genome. The short sequence reads produced by
the Illumina and SOLiD machines do pose a challenge in aligning
the sequences and require multiple overlapping reads per fragment.
This overlap in sequence reads ensures proper mapping and high
confidence in variant calling. The second step involves variant
discovery (SNPs, indels, and structural variants such as copy
number variants). The final step is to interpret the results in the
context of the disease or trait under study.
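
The role of overlapping reads in variant discovery can be illustrated with a toy pileup. The sketch below assumes the mapping step is already done (each read comes with its 0-based start position on the reference) and calls a single-nucleotide variant wherever enough reads agree on a non-reference base. The thresholds, names and data are illustrative assumptions, not the settings of any real variant caller, and indels and base qualities are ignored.

```python
from collections import Counter, defaultdict


def call_snvs(reference, mapped_reads, min_depth=3, min_fraction=0.2):
    """Call single-nucleotide variants from reads already mapped to a reference.

    mapped_reads: iterable of (start_position, read_sequence), 0-based,
    gap-free alignments. Returns (position, ref_base, alt_base, alt_count, depth).
    """
    pileup = defaultdict(Counter)
    for start, read in mapped_reads:
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1     # count bases seen per position

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue                              # too few reads to trust a call
        for base, n in sorted(counts.items()):
            if base != reference[pos] and n / depth >= min_fraction:
                variants.append((pos, reference[pos], base, n, depth))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTTC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC")]
variants = call_snvs(reference, reads)
```

Here four overlapping reads all show T instead of the reference A at position 4, so a single variant is reported at that position with a depth of 4.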

2.5 Applications of NGS

The increased use of NGS is rapidly expanding our understanding
of the genetic basis of disease. Perhaps the largest genome
sequencing effort currently under way is the 1000 Genomes Project
[15]. The project aims to characterize variations in the human
genome by performing both whole-genome and exome sequencing
on roughly 2,500 subjects from diverse population groups.
The pilot phase of this project sequenced 179 individuals and
identified over 15 million new genetic variations. Release of these
data into the public domain will aid additional studies and increase
our knowledge of genetic variation.
Perhaps the greatest success of next-generation sequencing to
date has been in the discovery of variants for rare Mendelian
diseases. Instead of sequencing the entire genome, targeted
sequencing of the coding regions as well as whole-exome sequencing
have yielded valuable results in the identification of
disease-causing variants. The initial proof-of-concept that these
were viable approaches came in 2009 with the sequencing of four
cases of Freeman–Sheldon syndrome (FSS) and an additional eight
controls [26]. While the causative gene (MYH3) was already known,
the authors were able to identify causative variants within this
gene in all four of their subjects. Since then, whole-exome
sequencing has been used extensively to identify previously
unknown causes of a variety of diseases (Table 2).

Fig. 3 Illumina Sequencing by Synthesis. (a) Genomic DNA is randomly fragmented and adapters are ligated
to both ends; (b)–(f) Bridge amplification is used to create clusters of DNA strands and fluorescently labeled
3′-OH blocked nucleotides are added to the flow cell with DNA polymerase; (g) The strands are extended by
one nucleotide. Following the addition of a single nucleotide, the unused nucleotides and DNA polymerase are
washed away, and the reaction is imaged; (h) and (i) The process is then repeated for another round of nucleo-
tide incorporation (with permission from Illumina, San Diego, CA.)

Whole-exome sequencing can identify variants for disease that
would not have been otherwise discovered. For example, exome
sequencing in patients with Miller syndrome, a rare autosomal
recessive disorder, identified causative mutations in the DHODH
gene [27]. Similarly, de novo mutations were identified in the
SETBP1 gene associated with Schinzel–Giedion syndrome [28].
For both of these studies, the causative variants would not have
been identified without the use of exome sequencing. Taken
together, these studies highlight the advantages of exome
sequencing over previously used linkage studies for Mendelian
disorders.
In contrast to Mendelian disorders, complex disorders pose a
more difficult challenge for NGS. The polygenic basis for these
disorders will require increased numbers of subjects to yield signifi-
cant results. Since some of the causative variants for complex dis-
eases will likely be within the noncoding regions of the genome,
relying solely on exome sequencing would not be a feasible
approach. Entire genomes of affected individuals would likely have
to be sequenced to find these variants. Currently, the cost of
whole-genome sequencing prohibits this from being performed on a
large scale. However, as the cost decreases and sequencing throughput
increases, this approach will become a reality. Nonetheless,
identification of significant association for variants detected by
NGS will require groups of cases and controls as large as, or even
larger than, those used in current GWAS of common variants. In
fact, although effects of rare variants are expected to be higher than
those of common ones, and therefore easier to detect, their lower
frequency will decrease power to detect a significant association
compared to more frequent variants. For this reason, more power-
ful statistical methods specific to the analysis of rare variants (usu-
ally based on collapsing multiple rare variants into sets, rather than
testing each variant one at a time) are being developed [29].
Despite these drawbacks, the use of whole-genome sequencing has
started to show some progress in identifying variants for complex
disorders. These studies are benefitting from results of the 1000
Genomes project. The millions of SNPs identified are being
imputed into previously performed and current GWAS to aid in
the identification of rarer disease-causing variants. For example, in
a recent meta-analysis of multiple Parkinson’s disease (PD) studies
[30], GWAS results were combined with the imputation of mil-
lions of SNPs from the 1000 Genomes Project and resulted in the
identification of new risk factors for PD, which would not have
been possible without the addition of the new SNPs. This result
suggests that there will be an intersection between GWAS and
genome sequencing where previously performed GWAS can be
combined with new SNP data to increase the power of the GWAS
resulting in the identification of new variants associated with these
complex diseases.
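The collapsing strategy mentioned above for rare variants can be illustrated with a minimal Python sketch. It is not the method of reference [29]; it assumes a simple carrier-based burden test in which each subject is scored as carrying at least one rare allele in a gene, and carrier counts are compared between cases and controls with a one-sided Fisher's exact test. The input layout, the function names (collapse_carriers, fisher_one_sided), and the toy genotypes are all hypothetical.

```python
from math import comb

def collapse_carriers(genotypes):
    """Count subjects carrying at least one rare allele in the gene.

    genotypes: one list per subject holding minor-allele counts (0, 1, or 2),
    one entry per rare variant (hypothetical input layout).
    """
    return sum(1 for subject in genotypes if any(g > 0 for g in subject))

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test: P(carriers among cases >= observed)."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    p = 0.0
    for k in range(case_carriers, upper + 1):
        p += comb(n_cases, k) * comb(n_controls, carriers - k) / denom
    return p

# Toy data (assumed): 3 rare variants in one gene, 6 cases and 6 controls.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
controls = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

case_carriers = collapse_carriers(cases)
control_carriers = collapse_carriers(controls)
p_value = fisher_one_sided(case_carriers, len(cases),
                           control_carriers, len(controls))
print(case_carriers, control_carriers, round(p_value, 4))
```

In this toy example each individual variant is carried by at most two cases, so testing variants one at a time would have little power, whereas the collapsed carrier count (5 of 6 cases versus 1 of 6 controls) gives a single, stronger comparison.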
Table 2
Recent literature using whole-exome and whole-genome sequencing

| Study | Method performed | Disease studied | Number of subjects sequenced | Major findings |
| --- | --- | --- | --- | --- |
| 1000 Genomes Project [15] | Whole-genome and exome sequencing | Normal human populations | 179 individuals from four populations and 697 exomes from seven populations | Identified over 15 million SNPs, 1 million insertion–deletions, and 20,000 structural variants, most of which were previously uncharacterized |
| Al Badr et al. [44] | Whole-exome sequencing | Ochoa syndrome | One affected individual | Identified a frameshift mutation in the HPSE2 gene |
| Alvarado et al. [45] | Whole-exome sequencing | Distal arthrogryposis type 1 | One subject from a multigenerational family | Identified a missense mutation in the myosin (MYH3) gene that segregated with distal arthrogryposis in the family |
| Barak et al. [46] | Whole-exome sequencing | Malformations of occipital cortical development | Two affected individuals from two separate consanguineous families | Identified mutations in the LAMC3 gene |
| Becker et al. [47] | Whole-exome sequencing | Autosomal-recessive osteogenesis imperfecta | One affected individual | Identified a truncating mutation in the SERPINF1 gene |
| Bolze et al. [48] | Whole-exome sequencing | Autoimmune lymphoproliferative syndrome (ALPS) | A consanguineous family | Identified homozygous missense mutations in the Fas-associated death domain (FADD) gene |
| Caliskan et al. [49] | Whole-exome sequencing | Non-syndromic mental retardation | A consanguineous family | Identified a novel missense mutation in the trans-2,3-enoyl-CoA reductase (TECR) gene |
| Choi et al. [34] | Whole-exome sequencing | Bartter’s syndrome | A single affected subject | Used whole-exome sequencing to make a diagnosis of Bartter syndrome in a subject who did not harbor mutations in previously known genes for the disease |
| de Greef et al. [50] | Whole-exome sequencing | Immunodeficiency, centromeric instability, and facial anomalies (ICF) | Performed homozygosity mapping in five patients, then whole-exome sequencing in one of the patients | Identified mutations in the ZBTB24 gene |
| Erlich et al. [51] | Whole-exome sequencing | Hereditary spastic paraparesis (HSP) | One affected subject | Identified a missense mutation in the KIF1A gene |
| Gilissen et al. [52] | Whole-exome sequencing | Sensenbrenner syndrome | Two unrelated affected subjects | Identified compound heterozygous mutations in the WDR35 gene |
| Glazov et al. [53] | Whole-exome sequencing | Skeletal dysplasia | Two affected children and unaffected parents | Identified missense mutations in the POP1 gene |
| Gotz et al. [54] | Whole-exome sequencing | Infantile mitochondrial cardiomyopathy | A single patient who died at 10 months of age | Identified a missense mutation in the mitochondrial alanyl-tRNA synthetase (mtAlaRS) gene. This mutation was further confirmed in two additional subjects |
| Greif et al. [55] | Whole-exome sequencing | Acute promyelocytic leukemia (APL) | Tumors from three APL patients | Identified 13 tumor-specific mutations, including mutations in known leukemia genes such as WT1 and KRAS |
| Johnson et al. [56] | Whole-exome sequencing | Familial amyotrophic lateral sclerosis (ALS) | A single affected family | Identified missense mutations in the valosin-containing protein (VCP) gene. Mutations were identified in additional affected individuals |
| Li et al. [57] | Whole-exome sequencing | Non-disease based study | 200 exomes | Sequenced 200 exomes from individuals from Denmark and identified an excess of low-frequency non-synonymous mutations |
| Liu et al. [58] | Whole-exome sequencing | Acne inversa | Two affected individuals and one unaffected individual from a family | Identified a splice site mutation in the NCSTN gene, and confirmed the presence of mutations within the gene in additional affected individuals |
| Mondal et al. [59] | Targeted sequencing | Human X-chromosome exome | 24 male subjects | Sequenced the human X-exome using a new primer library to a 97 % coverage of the targeted regions of the chromosome |
| Montenegro et al. [60] | Whole-exome sequencing | Charcot-Marie-Tooth (CMT) | An undiagnosed family | Identified a previously characterized missense mutation in the GJB1 gene linked to additional CMT subjects |
| Ng et al. [26] | Whole-exome sequencing | Freeman–Sheldon syndrome (FSS) | 12 exomes | Using whole-exome sequencing of four affected individuals and eight unaffected individuals, the authors were able to identify known mutations for the disease. This study represents a proof-of-principle that exome sequencing can be used to identify variants associated with Mendelian diseases |
| Ng et al. [61] | Whole-exome sequencing | Kabuki syndrome | Ten unrelated affected subjects | Identified mutations in the MLL2 gene which were shown to be de novo in families where the parental DNA was available |
| O’Roak et al. [62] | Whole-exome sequencing | Sporadic autism spectrum disorders | 20 affected individuals and their parents | Identified 21 de novo mutations, 11 of which were nonsynonymous |
| O’Sullivan et al. [63] | Whole-exome sequencing | Amelogenesis imperfecta (AI) | One affected individual | Identified a homozygous nonsense mutation in the FAM20A gene |
| Ostergaard et al. [64] | Whole-exome sequencing | Primary lymphoedema | A single affected individual | Identified missense mutations in the GJC2 gene |
| Puente et al. [65] | Whole-exome sequencing | Hereditary progeroid syndrome | Two unrelated families with an affected individual | Identified homozygous mutations in the barrier-to-autointegration factor 1 (BANF1) gene |
| Rios et al. [66] | Whole-genome sequencing | Severe hypercholesterolemia | A single affected subject | Identified 2 nonsense mutations in the ABCG5 gene |
| Saarinen et al. [67] | Whole-exome sequencing | Hodgkin lymphoma | A family of four affected cousins | Identified mutations in the ataxia-telangiectasia (NPAT) gene in the subjects and confirmed the presence of the mutation in additional affected subjects |
| Simpson et al. [68] | Whole-exome sequencing | Hajdu–Cheney syndrome | Three unrelated affected subjects | Identified nonsense mutations in the NOTCH2 gene |
| Snape et al. [69] | Whole-exome sequencing | Mosaic variegated aneuploidy syndrome (MVA) | Two affected siblings | Identified two mutations in the CEP57 gene, the first a 2 bp deletion and the second an 11 bp insertion, present in both siblings. Confirmed in an additional 18 affected individuals |
| Szperl et al. [70] | Whole-exome sequencing | Celiac disease | Two affected subjects from a three-generation family with six affected individuals | Identified 12 low-frequency nonsense mutations present in both individuals. Two of the variants, in the CSAG1 and KRT37 genes, were present in all six affected individuals, and two additional variants in the MADD and GBGT1 genes were present in 5/6 and 4/6 individuals, respectively |
| Sundaram et al. [71] | Whole-exome sequencing | Tourette syndrome | Ten members of a 3-generation family | Identified three missense mutations in the MRPL3, DNAJC13 and OFCC1 genes that segregated with chronic tic disorder |
| Timmermann et al. [72] | Whole-exome sequencing | Colorectal cancer | Tumor and adjacent non-affected normal colonic tissue from two subjects | Identified mutations in the intracellular kinase domain of bone morphogenetic protein receptor 1A (BMPR1A) |
| Tsurusaki et al. [73] | Whole-exome sequencing | X-linked leukodystrophy | An affected subject and unaffected sibling | Identified a nonsense mutation in the MCT8 gene |
| Vissers et al. [74] | Whole-exome sequencing | Mental retardation | Ten families with one affected child | Identified missense de novo mutations in nine genes |
| Vissers et al. [75] | Whole-exome sequencing | Chondrodysplasia and abnormal joint development | Three affected individuals | Identified a missense mutation in the Golgi-resident nucleotide phosphatase (gPAPP) gene in all three patients |
| Wei et al. [76] | Whole-exome sequencing | Melanoma | 14 matched normal and metastatic tumors | Identified 68 genes that have somatic mutations. TRRAP harbored a recurrent mutation in 4 % of additional patients, and GRIN2A was mutated in 33 % of the melanoma samples |
| Worthey et al. [77] | Whole-exome sequencing | Intractable inflammatory bowel disease | A single affected subject | Identified a single missense mutation in the X-linked inhibitor of apoptosis gene. The exome sequencing performed was used to make a definitive diagnosis of the disease |
| Yamaguchi et al. [78] | Whole-exome sequencing | Primary failure of tooth eruption (PFE) | Two affected subjects | Identified a missense mutation in the parathyroid hormone 1 receptor (PTH1R) gene |
| Zhou et al. [79] | Whole-exome sequencing | Hereditary hypotrichosis simplex | One affected subject | Identified a missense mutation in the ribosomal protein L21 (RPL21) gene |
| Zuchner et al. [80] | Whole-exome sequencing | Retinitis pigmentosa | A single affected family | Identified a missense mutation in the dehydrodolichyl diphosphate synthase (DHDDS) gene |

NGS will have important clinical utility [31–33]. Application
of NGS within the clinic can be used to diagnose disease and develop a
plan to treat it [31]. Recently, Choi et al. [34] used
whole exome sequencing to discover the cause of disease in an
individual with a suspected diagnosis of Bartter’s syndrome. They
identified a variant in the SLC26A3 gene, known to cause con-
genital chloride diarrhea, which was consistent with the patient’s
symptoms. Exome sequencing is one approach that the newly cre-
ated NIH Undiagnosed Disease Program is using to help diagnose
patients with rare diseases of unknown cause [35]. The initial suc-
cess of this program was shown with the identification of muta-
tions in the NT5E gene [36] in patients experiencing arterial and
joint calcifications. These studies highlight the uses of NGS to
diagnose and potentially treat patients with unknown diseases.
Pharmacogenomics will also benefit from NGS technology
[37] (PharmGKB, http://pharmgkb.org/). Current methods use
either targeted genotyping by SNP qPCR of a single locus or arrays
with a comprehensive coverage of the absorption, distribution,
metabolism, and excretion (ADME) markers panel. The human
DMET Plus (drug-metabolizing enzymes and transporters) SNP
array (Affymetrix, Santa Clara, CA) enables direct assessment of
common functional variants (1,936 markers) in 225 ADME genes that
may play a role in the phenotypic response of a patient to drug
treatment. A recent NGS study on a patient with thrombophilia
informed the patient about the appropriate pharmacological treat-
ment for their disease [38]. As NGS becomes used in clinical set-
tings, pharmacological treatments potentially could be tailored to
one’s personal genome. Additionally, as new pharmacogenomic
targets are identified, having the complete sequence of a person’s
genome will eliminate the need to reexamine the individual for
these new loci.

2.6 Limitations to NGS

Because most disease-causing variants are located within the
coding regions of the genome, whole exome sequencing will prove
to be a powerful approach to identify genetic variation, and will
continue to be the method of choice until whole genome sequencing
can be performed more cost effectively. Despite the clear
advantages of exome sequencing, there are drawbacks to this
approach. First, this approach will not detect structural variants
such as copy number changes, which have been implicated in disease.
Second, there is a limitation in the specific exons that are
captured. Additional variants may be located in exons that are not
targeted and, as such, would not be identified. Finally, since exome
capture is not sufficiently specific, it currently requires sequencing
of a much larger area than the targeted exons. This level of
sequencing would be equivalent to performing whole genome
sequencing at a low coverage. However, the low coverage sequencing
would likely miss many variants that are present. Ultimately, whole
genome sequencing will allow for the most thorough examination of
the genome.
Probably the greatest limitation to NGS is the bioinformatic
approaches needed to store and analyze the data [25, 39–42]. The
first challenge is the initial alignment of the short sequence data to
the genome. This process requires large amounts of computing
power and often takes days to complete. Following the alignment,
genomic variants must be called accurately and then annotated.
Finally, interpretation of the clinical relevance of findings is
made difficult by our limited understanding of
the potential significance of many sequence variants. Together,
these steps require complex bioinformatic approaches; this process
will get easier as standardized methodologies become available.
The error rate of raw sequence data produced through NGS is
higher than that achieved through Sanger sequencing. But the
overall error rate is reduced because of the high degree of sequenc-
ing depth (20–40×) that is necessary to achieve complete coverage.
This high redundancy in sequencing of each base gives confidence
in sequence calls. However, with increased sequence depth comes
increased cost. A key question in NGS is whether the identified
variant represents a true SNP or is a false positive. This is particu-
larly important for the identification of low-frequency variants or
private mutations that may be observed in a single or limited num-
ber of subjects. Until improved bioinformatic techniques become
available that will better ensure the accuracy of the sequence data,
validation of the identified variants through Sanger sequencing or
other genotyping methods is required [42].
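The relationship between sequencing depth and confidence in a base call can be made concrete with a small calculation. The sketch below is a deliberate simplification and not a published error model: it assumes that reads err independently with the same probability, and that a position is miscalled only when more than half of the reads covering it are wrong. The 1 % per-read error rate and the function name majority_error_probability are assumptions chosen for illustration.

```python
from math import comb

def majority_error_probability(depth, per_read_error):
    """P(more than half of `depth` independent reads are wrong at one base)."""
    threshold = depth // 2 + 1  # smallest read count that is a strict majority
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error)**(depth - k)
        for k in range(threshold, depth + 1)
    )

# Assumed per-read error rate of 1 %; depths span a single read up to 40x.
for depth in (1, 5, 20, 40):
    print(depth, majority_error_probability(depth, 0.01))
```

Under these assumptions the chance of a wrong majority call falls steeply as depth increases, which is the intuition behind requiring redundant coverage. Real data also contain systematic, platform-specific errors that do not average out in this way, so independent validation remains necessary.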
Other sources of sequence errors may arise from differences
in the sequencing chemistries employed by the various
NGS platforms. A recent study compared the sequence obtained
from the same individual on three different platforms [43]. When
more than one platform was combined, the number of false positives
was significantly reduced. The differences among the results from the
three platforms suggest that their error profiles differ
substantially. Additionally, the subsequent bioinformatic analyses
employed by different researchers could also introduce false positive
results.
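The observation that combining platforms reduces false positives can be sketched as a simple concordance filter. This is only an illustration of the idea, not the analysis performed in reference [43]: it assumes that each platform yields a set of variant calls keyed by chromosome, position, reference allele, and alternate allele, and it keeps calls reported by at least two platforms. The platform names, the variant tuples, and the function concordant_variants are hypothetical.

```python
from collections import Counter

def concordant_variants(calls_by_platform, min_platforms=2):
    """Return variants reported by at least `min_platforms` platforms."""
    counts = Counter()
    for calls in calls_by_platform.values():
        counts.update(set(calls))  # count each variant once per platform
    return {variant for variant, n in counts.items() if n >= min_platforms}

# Hypothetical calls for one individual: (chromosome, position, ref, alt)
calls = {
    "platform_A": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"),
                   ("chr3", 42, "G", "A")},
    "platform_B": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    "platform_C": {("chr1", 1000, "A", "G"), ("chr7", 900, "T", "C")},
}

kept = concordant_variants(calls)
print(sorted(kept))
```

Calls seen on a single platform only (here the chr3 and chr7 variants) are set aside as candidates for platform-specific error. The trade-off is that a true variant missed by all but one platform would also be discarded, so the threshold is a design choice.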

3 Conclusions

The recent advances in DNA sequencing technologies have given
human geneticists new tools to delineate the genetic basis of both
rare and common diseases. In the next few years, as the cost of
genome sequencing continues to drop, more genetic variants
contributing to Mendelian and complex diseases will be identified,
enhancing our knowledge of these diseases. Beyond gene
discovery experiments, NGS will play an important role in
personalized medicine, where an individual’s genomic sequence could
provide information needed to make informed decisions about
disease risk, treatment, and outcome.

Acknowledgments

This work was supported by the Kimmel Cancer Center and the
Computational Medicine Center at Thomas Jefferson University,
Jefferson Medical College.

References
1. Hindorff LA et al (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106:9362–9367
2. Rommens JM et al (1989) Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245:1059–1065
3. Riordan JR et al (1989) Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245:1066–1073
4. Kerem B et al (1989) Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073–1080
5. Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
6. Altmuller J et al (2001) Genomewide scans of complex human diseases: true linkage is hard to find. Am J Hum Genet 69:936–950
7. Consortium IH (2005) A haplotype map of the human genome. Nature 437:1299–1320
8. Manolio TA et al (2009) Finding the missing heritability of complex diseases. Nature 461:747–753
9. Klein RJ et al (2005) Complement factor H polymorphism in age-related macular degeneration. Science 308:385–389
10. Barrett JC et al (2008) Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease. Nat Genet 40:955–962
11. Lango AH et al (2010) Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467:832–838
12. Maller J et al (2006) Common variation in three genes, including a noncoding variant in CFH, strongly influences risk of age-related macular degeneration. Nat Genet 38:1055–1059
13. Jakobsdottir J et al (2009) Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers. PLoS Genet 5:e1000337
14. Rose SP (2006) Commentary: heritability estimates–long past their sell-by date. Int J Epidemiol 35:525–527
15. 1000 Genomes Project Consortium, Abecasis GR et al (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073
16. Venter JC et al (2001) The sequence of the human genome. Science 291:1304–1351
17. Lander ES et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
18. Schloss JA (2008) How to get genomes at one ten-thousandth the cost. Nat Biotechnol 26:1113–1115
19. Hert DG, Fredlake CP, Barron AE (2008) Advantages and limitations of next-generation sequencing technologies: a comparison of electrophoresis and non-electrophoresis methods. Electrophoresis 29:4618–4626
20. Pareek CS, Smoczynski R, Tretyn A (2011) Sequencing technologies and genome sequencing. J Appl Genet 52:413–435
21. Metzker ML (2010) Sequencing technologies—the next generation. Nat Rev Genet 11:31–46
22. Mardis ER (2008) Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9:387–402
23. Zheng Z et al (2010) Titration-free massively parallel pyrosequencing using trace amounts of starting material. Nucleic Acids Res 38:e137
24. Margulies M et al (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380
25. Depristo MA et al (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43:491–498
26. Ng SB et al (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461:272–276
27. Ng SB et al (2009) Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42:30–35
28. Hoischen A et al (2010) De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nat Genet 42:483–485
29. Bansal V et al (2010) Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet 11:773–785
30. Nalls MA et al (2011) Imputation of sequence variants for identification of genetic risks for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet 377:641–649
31. Su Z et al (2011) Next-generation sequencing and its applications in molecular diagnostics. Expert Rev Mol Diagn 11:333–343
32. Marian AJ (2011) Medical DNA sequencing. Curr Opin Cardiol 26:175–180
33. Diamandis EP (2009) Next-generation sequencing: a new revolution in molecular diagnostics? Clin Chem 55:2088–2092
34. Choi M et al (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci USA 106:19096–19101
35. Maxmen A (2011) Exome sequencing deciphers rare diseases. Cell 144:635–637
36. St HC et al (2011) NT5E mutations and arterial calcifications. N Engl J Med 364:432–442
37. Daly AK (2010) Genome-wide association studies in pharmacogenomics. Nat Rev Genet 11:241–246
38. Dewey FE et al (2011) Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet 7:e1002280
39. Trapnell C, Salzberg SL (2009) How to map billions of short reads onto genomes. Nat Biotechnol 27:455–457
40. Nielsen R et al (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451
41. Hinchcliffe M, Webster P (2011) In silico analysis of the exome for gene discovery. Methods Mol Biol 760:109–128
42. Blaby-Haas CE, de Crecy-Lagard V (2011) Mining high-throughput experimental data to link gene and function. Trends Biotechnol 29:174–182
43. Nothnagel M et al (2011) Technology-specific error signatures in the 1000 Genomes Project data. Hum Genet 130:505–516
44. Al Badr W et al (2011) Exome capture and massively parallel sequencing identifies a novel HPSE2 mutation in a Saudi Arabian child with Ochoa (urofacial) syndrome. J Pediatr Urol 7:569–573
45. Alvarado DM et al (2011) Exome sequencing identifies an MYH3 mutation in a family with distal arthrogryposis type 1. J Bone Joint Surg Am 93:1045–1050
46. Barak T et al (2011) Recessive LAMC3 mutations cause malformations of occipital cortical development. Nat Genet 43:590–594
47. Becker J et al (2011) Exome sequencing identifies truncating mutations in human SERPINF1 in autosomal-recessive osteogenesis imperfecta. Am J Hum Genet 88:362–371
48. Bolze A et al (2010) Whole-exome-sequencing-based discovery of human FADD deficiency. Am J Hum Genet 87:873–881
49. Caliskan M et al (2011) Exome sequencing reveals a novel mutation for autosomal recessive non-syndromic mental retardation in the TECR gene on chromosome 19p13. Hum Mol Genet 20:1285–1289
50. de Greef JC et al (2011) Mutations in ZBTB24 are associated with immunodeficiency, centromeric instability, and facial anomalies syndrome type 2. Am J Hum Genet 88:796–804
51. Erlich Y et al (2011) Exome sequencing and disease-network analysis of a single family implicate a mutation in KIF1A in hereditary spastic paraparesis. Genome Res 21:658–664
52. Gilissen C et al (2010) Exome sequencing identifies WDR35 variants involved in Sensenbrenner syndrome. Am J Hum Genet 87:418–423
53. Glazov EA et al (2011) Whole-exome re-sequencing in a family quartet identifies POP1 mutations as the cause of a novel skeletal dysplasia. PLoS Genet 7:e1002027
54. Gotz A et al (2011) Exome sequencing identifies mitochondrial alanyl-tRNA synthetase mutations in infantile mitochondrial cardiomyopathy. Am J Hum Genet 88:635–642
55. Greif PA et al (2011) Somatic mutations in acute promyelocytic leukemia (APL) identified by exome sequencing. Leukemia 25:1519–1522
56. Johnson JO et al (2010) Exome sequencing reveals VCP mutations as a cause of familial ALS. Neuron 68:857–864
57. Li Y et al (2010) Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet 42:969–972
58. Liu Y et al (2011) Confirmation by exome sequencing of the pathogenic role of NCSTN mutations in acne inversa (hidradenitis suppurativa). J Invest Dermatol 131:1570–1572
59. Mondal K et al (2011) Targeted sequencing of the human X chromosome exome. Genomics 98:260–265
60. Montenegro G et al (2011) Exome sequencing allows for rapid gene identification in a Charcot-Marie-Tooth family. Ann Neurol 69:464–470
61. Ng SB et al (2010) Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nat Genet 42:790–793
62. O’Roak BJ et al (2011) Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet 43:585–589
63. O’Sullivan J et al (2011) Whole-exome sequencing identifies FAM20A mutations as a cause of amelogenesis imperfecta and gingival hyperplasia syndrome. Am J Hum Genet 88:616–620
64. Ostergaard P et al (2011) Rapid identification of mutations in GJC2 in primary lymphoedema using whole exome sequencing combined with linkage analysis with delineation of the phenotype. J Med Genet 48:251–255
65. Puente XS et al (2011) Exome sequencing and functional analysis identifies BANF1 mutation as the cause of a hereditary progeroid syndrome. Am J Hum Genet 88:650–656
66. Rios J et al (2010) Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia. Hum Mol Genet 19:4313–4318
67. Saarinen S et al (2011) Exome sequencing reveals germline NPAT mutation as a candidate risk factor for Hodgkin lymphoma. Blood 118:493–498
68. Simpson MA et al (2011) Mutations in NOTCH2 cause Hajdu-Cheney syndrome, a disorder of severe and progressive bone loss. Nat Genet 43:303–305
69. Snape K et al (2011) Mutations in CEP57 cause mosaic variegated aneuploidy syndrome. Nat Genet 43:527–529
70. Szperl AM et al (2011) Exome sequencing in a family segregating for celiac disease. Clin Genet 80:138–147
71. Sundaram SK et al (2011) Exome sequencing of a pedigree with Tourette syndrome or chronic tic disorder. Ann Neurol 69:901–904
72. Timmermann B et al (2010) Somatic mutation profiles of MSI and MSS colorectal cancer identified by whole exome next generation sequencing and bioinformatics analysis. PLoS One 5:e15661
73. Tsurusaki Y et al (2011) Exome sequencing of two patients in a family with atypical X-linked leukodystrophy. Clin Genet 80:161–166
74. Vissers LE et al (2010) A de novo paradigm for mental retardation. Nat Genet 42:1109–1112
75. Vissers LE et al (2011) Chondrodysplasia and abnormal joint development associated with mutations in IMPAD1, encoding the Golgi-resident nucleotide phosphatase, gPAPP. Am J Hum Genet 88:608–615
76. Wei X et al (2011) Exome sequencing identifies GRIN2A as frequently mutated in melanoma. Nat Genet 43:442–446
77. Worthey EA et al (2011) Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med 13:255–262
78. Yamaguchi T et al (2011) Exome resequencing combined with linkage analysis identifies novel PTH1R variants in primary failure of tooth eruption in Japanese. J Bone Miner Res 26:1655–1661
79. Zhou C et al (2011) Mutation in ribosomal protein L21 underlies hereditary hypotrichosis simplex. Hum Mutat 32:710–714
80. Zuchner S et al (2011) Whole-exome sequencing links a variant in DHDDS to retinitis pigmentosa. Am J Hum Genet 88:201–206
Chapter 9

The GoldenGate Genotyping Assay: Custom Design, Processing, and Data Analysis
Anna González-Neira

Abstract
The Illumina GoldenGate Assay is a technique that is widely used in molecular genetics to analyze up to
thousands of single nucleotide polymorphisms (SNPs) simultaneously, providing data of very high quality
in a fast and efficient manner. This technique allows the user to optimize the number of genetic loci to be
interrogated in a way that best suits their research goals. This chapter describes in detail all the steps to be
followed in the process of genotyping a custom panel, from panel design through data analysis.

Key words GoldenGate assay, Custom genotyping, Single nucleotide polymorphism, Fluorescence
signal, Cluster analysis

1 Introduction

The GoldenGate Assay allows the user to carry out low- to
moderate-multiplex genotyping based on a custom-built panel of
SNPs [1–3]. Researchers can therefore use it to create assays
tailored directly to their specific genotyping needs, focused on
targeted regions, candidate genes or pathways, and many other
applications [4–9]. Custom assay panels can currently be deployed
either with BeadArray technology using Illumina’s iScan System
or VeraCode technology using the BeadXpress Reader System.
BeadArray technology is based on 3-μm silica beads that
self-assemble in micro-wells on planar silica slides (multisample
BeadChip) [2]. The beads are randomly assembled on this substrate
with a uniform spacing of ~5.7 μm, and each bead is covered with
hundreds of thousands of copies of a specific oligonucleotide that
act as the capture sequences for each of the assays. VeraCode
technology, on the other hand, is based on cylindrical glass
microbeads measuring 240 μm in length by 28 μm in diameter [3]. Each
microbead carries a high-density digital holographic code and, when
excited by a laser, emits a unique code image, allowing quick and
specific allele detection. Assays are created by pooling microbeads
with code diversities from one to several hundred, depending on
the desired level of multiplexing. Multiplex assays of 96, or any
number from 384 to 3,072, SNPs can be designed for BeadArray,
while 48-, 96-, 144-, 192-, and 384-plex genotyping can be carried
out using VeraCode.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_9, © Springer Science+Business Media, LLC 2013

2 Custom Panel Design

The first step in creating a custom genotyping panel using the
GoldenGate Assay is to select the panel of SNPs to be investigated.
Illumina offers an easy and convenient method to ensure successful
assay development, providing SNP-specific information including
predicted genotyping success, validation status, and minor allele
frequencies (MAF) from published studies [10].
First, the researcher needs to create a preliminary input file in
comma-separated (*.csv) format, containing a list of genes, regions,
sequences, or names of loci of interest. The file should include
specific column headings that depend on the type of list
included, as detailed below:
● Gene list: the column headings should be Gene_Name (Ref
Seq accession ID or HUGO gene symbol), Bases_Upstream
(number of bases upstream of the first coordinate for the gene)
and Bases_Downstream (number of bases downstream of the
last coordinate for the gene).
● Region list: the column headings should be Chromosome (chromo-
some containing the locus, enter 0 if unknown), Start_Coordinate
(first coordinate for the region to search), End_Coordinate (last
coordinate for the region to search), and User_Information
(any comments the researcher wishes to include).
● Sequence list: the column headings should be Locus_Name
(customer-supplied name for the sequence), Sequence (limited
to 10 kb; the polymorphic locus must be enclosed in brackets,
e.g., TGG[A/C]ATT, and at least 50 base pairs of flanking
sequence are required on each side of the variant), Target_Type
(must be either SNP or Indel), Genome_Build_Version (enter 0
if unknown), Chromosome (chromosome containing the locus, enter 0
if unknown), Coordinate (chromosomal coordinate, enter 0 if
unknown), Source (the source of the sequence and annotation data,
enter unknown if no information is available), Source_Version
(source version number, enter 0 if unknown), Sequence_Orientation
(forward, reverse or unknown), and Plus_Minus (Plus or Minus).
● Locus list: it should consist of just one column with heading
Locus_Name (the RS number taken from the dbSNP database:
www.ncbi.nlm.nih.gov/projects/SNP/).
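The Sequence list has the most constraints of the four list types, so it can help to check a file before submission. The following is a minimal sketch in Python, not an Illumina tool: the function names, the allele pattern (including "-" for indel alleles), and the way the 10 kb limit is counted are assumptions made for illustration.

```python
import csv
import io
import re

# Column headings for a Sequence list, as given in the text.
SEQUENCE_COLUMNS = [
    "Locus_Name", "Sequence", "Target_Type", "Genome_Build_Version",
    "Chromosome", "Coordinate", "Source", "Source_Version",
    "Sequence_Orientation", "Plus_Minus",
]

# One bracketed variant such as [A/C]; "-" for indel alleles is an assumption.
VARIANT = re.compile(r"\[([ACGT-]+)/([ACGT-]+)\]")


def check_sequence(seq, min_flank=50, max_len=10_000):
    """Return a list of problems found in one submitted sequence."""
    matches = list(VARIANT.finditer(seq))
    if len(matches) != 1:
        return ["expected exactly one bracketed variant, e.g. TGG[A/C]ATT"]
    m = matches[0]
    left, right = seq[:m.start()], seq[m.end():]
    problems = []
    if len(left) < min_flank or len(right) < min_flank:
        problems.append(
            f"flanks are {len(left)} and {len(right)} bp; need >= {min_flank}")
    # Counting the variant as one position is an assumption of this sketch.
    if len(left) + len(right) + 1 > max_len:
        problems.append("sequence longer than 10 kb")
    return problems


def write_sequence_list(rows):
    """Serialize rows (dicts keyed by SEQUENCE_COLUMNS) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SEQUENCE_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A sequence that passes `check_sequence` returns an empty list; the bare example from the text, TGG[A/C]ATT, is reported as having flanks that are too short.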
The preliminary input file is then evaluated by the Assay Design
Tool (ADT), which provides independent assay success prediction
values, validation status, and allele frequencies. The researcher
can submit the file to the ADT directly, in which case an e-mail
notification is sent when scoring is complete. Alternatively, the
file can be e-mailed to a Technical Support Scientist, who will
submit it to the ADT for processing.
The ADT generates a Score output file. This file contains a set
of informative metrics for each locus requested in the preliminary
input file. These metrics should be used to preferentially select the
assays that have a high likelihood of success in the final product
design and can be used to create a final order file. Performance
values are presented for each locus. The most important metric is
the Final_Score, which ranges from 0 to 1; higher values reflect a
greater likelihood that the assay will succeed experimentally. Additional
information such as whether the designed assay has been validated
(Validation_status), how a designed assay has been validated
(Validation_Bin), and the reasons why a successful assay is unlikely
for a marker locus (Failure_Codes) are also provided.
Researchers should use these metrics to select the final assay
panel. The following criteria for assay selection are recommended
to create a final product with the highest chance of generating
meaningful results: minimum MAF; spacing and/or tagging of
SNPs across the region/gene; favor GoldenGate-validated designs;
favor two-hit or HapMap-validated loci (www.hapmap.org); give
preference to assays with higher Final_Score; avoid assays with
Final_Score lower than 0.4 (because they have a lower chance of
converting into a functional assay and can also decrease the overall
performance of all assays); avoid assays containing SNPs with warn-
ing codes. After this custom selection, a final file must be submit-
ted to Illumina so that the custom pool can be manufactured.
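Two of the selection criteria above (the Final_Score cutoff of 0.4 and the absence of warning or failure codes) are easy to apply programmatically. The sketch below assumes the Score output file is a CSV with columns named Locus_Name, Final_Score, and Failure_Codes; the exact layout of a real ADT file may differ, and the function name is hypothetical.

```python
import csv
import io


def select_assays(score_csv_text, min_score=0.4):
    """Keep loci with Final_Score >= min_score and no failure codes,
    sorted so that the highest-scoring assays come first."""
    reader = csv.DictReader(io.StringIO(score_csv_text))
    kept = []
    for row in reader:
        score = float(row["Final_Score"])
        codes = (row.get("Failure_Codes") or "").strip()
        if score >= min_score and not codes:
            kept.append(row)
    kept.sort(key=lambda r: float(r["Final_Score"]), reverse=True)
    return kept
```

The remaining criteria (minimum MAF, SNP spacing or tagging, and validation status) depend on the study and would be applied to the rows returned here.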

3 DNA Preparation Requirements

Five microliters of DNA at a concentration of 50–100 ng/μL, as
determined by the Molecular Probes PicoGreen® assay, are placed
per well in a 96-well plate [11].
Internal quality control DNA samples should be included:
● It is recommended that at least 5 % of samples are included in
duplicate and that duplicate pairs are scattered across all plates.
If amplified DNA is used, the duplicates should be from inde-
pendent amplifications if possible.
● Samples from trios (mother, father, and offspring) can also be
included to check marker segregation.
● A negative control (one well without any DNA template) is
not essential according to Illumina recommendations.
● It is advisable to avoid placing the same group of samples
together on plates: for example, if cases and controls are to be
genotyped, avoid systematically placing them in certain areas
of plates or on separate plates. However, if cases and controls
cannot be intermixed across the plates, all plates should be
genotyped using the same genotyping platform, methods, and
conditions and scored by someone blind to case–control status.
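One simple way to follow the layout recommendations above is to shuffle all samples, together with at least 5 % duplicates, before assigning wells. The sketch below is an illustration only: the plate names (P1, P2, ...), the fixed random seed, and the function name are assumptions, and no well is reserved for a negative control.

```python
import math
import random

# 96-well positions in the A01..H12 style.
WELLS = [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]


def layout_plates(sample_ids, duplicate_fraction=0.05, seed=0):
    """Return (plate, well, sample_id) tuples with samples in random order.

    A random subset of samples (at least duplicate_fraction of the total)
    is added a second time so that duplicates are scattered over the plates.
    """
    rng = random.Random(seed)
    n_dup = math.ceil(duplicate_fraction * len(sample_ids))
    duplicates = rng.sample(list(sample_ids), n_dup)
    entries = list(sample_ids) + duplicates
    rng.shuffle(entries)  # intermixes cases, controls, and duplicates
    layout = []
    for i, sample_id in enumerate(entries):
        plate, well = divmod(i, len(WELLS))
        layout.append((f"P{plate + 1}", WELLS[well], sample_id))
    return layout
```

Because the shuffle ignores case–control status, cases and controls end up intermixed by chance; a stricter design could additionally balance the groups per plate.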
Create one sample sheet for each plate in comma-separated
(*.csv) format with the following column headings:
● Sample_ID: required; must be unique and contain no spaces.
● SentrixBarcode: required; barcode of the BeadChip or microtiter
plate on which hybridization of the GoldenGate Assay products
takes place, depending on whether the iScan System or the
BeadXpress Reader, respectively, is used to analyze the
fluorescent signal.
● SentrixPosition: required.
● Sample_Plate: required; plate name, must be unique and less
than six characters long.
● Sample_Well: required; well position of the sample (e.g., A01).
● Sample_Group: preferred; for example, case/control.
● Gender: preferred; enter M or Male/F or Female.
● Sample_Name: required.
● Replicate: required (if applicable); Sample_ID of the duplicate
sample.
● Parent1: required (if applicable); Sample_ID of one parent.
● Parent2: required (if applicable); Sample_ID of the other parent.
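The sample sheet rules above can be checked before the sheet is loaded into the analysis software. The following sketch assumes each row has already been read into a dictionary keyed by column heading (for example with csv.DictReader); the function name and the error messages are hypothetical.

```python
import re

REQUIRED = ["Sample_ID", "SentrixBarcode", "SentrixPosition",
            "Sample_Plate", "Sample_Well", "Sample_Name"]
WELL = re.compile(r"^[A-H](0[1-9]|1[0-2])$")  # A01 .. H12


def validate_sample_sheet(rows):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    all_ids = {r.get("Sample_ID") for r in rows}
    seen = set()
    for n, r in enumerate(rows, start=1):
        for col in REQUIRED:
            if not r.get(col):
                errors.append(f"row {n}: {col} is required")
        sample_id = r.get("Sample_ID") or ""
        if " " in sample_id:
            errors.append(f"row {n}: Sample_ID contains a space")
        if sample_id in seen:
            errors.append(f"row {n}: Sample_ID {sample_id} is not unique")
        seen.add(sample_id)
        if len(r.get("Sample_Plate") or "") >= 6:
            errors.append(f"row {n}: Sample_Plate must be < 6 characters")
        well = r.get("Sample_Well")
        if well and not WELL.match(well):
            errors.append(f"row {n}: Sample_Well {well} is not like A01")
        # Replicate and parent columns must point at an existing Sample_ID.
        for link in ("Replicate", "Parent1", "Parent2"):
            ref = r.get(link)
            if ref and ref not in all_ids:
                errors.append(f"row {n}: {link} {ref} is not a known Sample_ID")
    return errors
```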

4 GoldenGate Genotyping Protocol

The GoldenGate Genotyping protocol can be performed manually

or can be easily automated in the laboratory using an LIMS
(Laboratory Information Management System) [11].
● A minimum of 250 ng of DNA is activated by binding to para-
magnetic particles.
● The activated DNA is mixed with the Assay hybridization
buffer, which contains three oligonucleotides (oligos) for each
SNP locus to be interrogated. Two of these oligos (Allele-Specific
Oligos, ASOs) are specific to each allele at the SNP site
and the third, the Locus-Specific Oligo (LSO), hybridizes several
bases downstream of the SNP site. All three oligos contain
a region of genomic complementarity and a universal PCR primer
site; in addition, the LSO contains a unique address sequence
that targets a particular bead type. The oligos hybridize to the
activated DNA.
● Several washing steps are then required to remove excess
oligos.
● The next step is an extension reaction of the ASO and ligation
of the extended product to the LSO at each SNP site.
● The extended products are used as template for the PCR using
three universal PCR primers: two forward primers, P1 and P2,
labeled with Cy3 and Cy5 and a reverse primer, P3.
● The PCR products are again bound and the single-stranded
dye-labeled DNAs are eluted and prepared for hybridization to
their complementary bead type via their unique address
sequences on the micron silica beads or cylindrical glass
microbeads, using a BeadChip (BeadArray technology) or microtiter
plate (VeraCode technology), respectively.

5 Analyzing GoldenGate Genotyping Data

After hybridization, the iScan System or BeadXpress Reader is
used to read the fluorescence signal, which is in turn analyzed using
the GenomeStudio software [12]. In particular, the GenomeStudio
Genotyping Module uses a clustering algorithm that defines cluster
positions and can perform automatic reclustering for all loci
or a subset of them. The clustering algorithm does not automatically
accommodate loci without a heterozygote cluster, so manual
clustering is recommended for mitochondrial SNPs and those on
the Y chromosome.
Before evaluating SNP cluster positions to identify SNPs that
need to be excluded or manually clustered, it is important to check
the internal controls provided by the GoldenGate assay. These
include sample-dependent, sample-independent, and contamination
controls, and they provide relevant information about the overall
performance of the reagents, samples, and equipment used in the
experiment. They can be visualized in GenomeStudio Software.
In addition, before SNPs are further evaluated, it is important
to highlight and exclude problematic samples that show poor per-
formance on the genotyping assay. The user should use the GenCall
scores and call rate to identify these samples. A scatter plot of
GenCall score (use 10 % GC or p10 GC values) against sample call
rate should be generated. Samples with a low 10 % GC score or a low
call rate will be outliers from the majority of samples and should be
excluded or reprocessed in an additional experiment.
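Outliers on such a scatter plot are usually identified by eye, but a rough numerical screen can point to the samples worth inspecting. The sketch below flags samples whose p10 GC or call rate lies far below the median, using the median absolute deviation as a robust scale; this rule and the multiplier k are assumptions for illustration and are not part of GenomeStudio.

```python
from statistics import median


def flag_outliers(samples, k=5.0):
    """samples maps sample_id -> (p10_gc, call_rate).

    Returns the sorted ids of samples whose p10 GC or call rate is more
    than k median absolute deviations below the median of all samples.
    """
    def low_cutoff(values):
        med = median(values)
        mad = median(abs(v - med) for v in values)
        return med - k * mad

    gc_cut = low_cutoff([gc for gc, _ in samples.values()])
    cr_cut = low_cutoff([cr for _, cr in samples.values()])
    return sorted(sid for sid, (gc, cr) in samples.items()
                  if gc < gc_cut or cr < cr_cut)
```

Flagged samples are candidates for exclusion or reprocessing; the final decision should still be made on the plot itself.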
The user can then manually edit the clusters of all the SNPs in
the project. Alternatively, they can prioritize loci for manual clus-
tering (or exclusion) using the metrics listed in the SNP Table
from the GenomeStudio software. To do this the user should sort
this SNP Table by:
● Cluster separation score (Cluster Sep): it is the measure of the
separation between the three genotype clusters in the theta
dimension and varies from 0 to 1. The user should prioritize
the evaluation of SNPs with a low score. SNPs with overlap-
ping clusters should be excluded.
● Call Frequency (Call Freq): it is the proportion of all samples at

each locus with call scores above the no-call threshold and
ranges from 0 to 1. SNPs with low Call Freq should be priori-
tized for manual clustering.
● AB Mean for Intensity (R) (AB R Mean): this parameter is the
mean normalized intensity (R) of the heterozygote cluster.
The metric helps identify SNPs with low intensity and has val-
ues increasing from 0. The user should prioritize SNPs with
low AB R Mean for manual evaluation and exclude those with
intensities too low for genotypes to be called reliably.
● AB Mean for Theta (T) (AB T Mean): this parameter is the
mean of normalized theta values of the heterozygote cluster,
and ranges from 0 to 1. Values ≤0.2 or ≥0.8 are indicative of a
possible shift of the heterozygote cluster towards a homozy-
gous cluster. If the cluster can be separated the user can edit
the SNP manually, otherwise the locus should be excluded.
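The four metrics above can be combined into a simple review list once the SNP Table has been exported. In the sketch below only the AB T Mean limits (≤0.2 or ≥0.8) come from the text; the other thresholds are hypothetical placeholders that each laboratory would need to set, and the dictionary keys assume the column names shown in the SNP Table.

```python
def review_reasons(snp, min_cluster_sep=0.3, min_call_freq=0.95, min_ab_r=0.2):
    """Return the reasons why a SNP should be inspected manually.

    snp is a dict with the keys "Cluster Sep", "Call Freq", "AB R Mean",
    and "AB T Mean". The three min_* thresholds are illustrative only.
    """
    reasons = []
    if snp["Cluster Sep"] < min_cluster_sep:
        reasons.append("low cluster separation")
    if snp["Call Freq"] < min_call_freq:
        reasons.append("low call frequency")
    if snp["AB R Mean"] < min_ab_r:
        reasons.append("low heterozygote intensity")
    if snp["AB T Mean"] <= 0.2 or snp["AB T Mean"] >= 0.8:
        reasons.append("heterozygote cluster shifted toward a homozygote")
    return reasons


def prioritize(snps):
    """Return (name, reasons) for SNPs needing review, most reasons first."""
    flagged = [(name, review_reasons(s)) for name, s in snps.items()]
    flagged = [item for item in flagged if item[1]]
    return sorted(flagged, key=lambda item: len(item[1]), reverse=True)
```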

6 Quality Control for GoldenGate Data

After editing SNP clusters, GenomeStudio can be used to calculate
the reproducibility across duplicated samples. The investigator
needs to establish the concordance threshold between duplicate
samples. According to Illumina data, the reproducibility of
the GoldenGate assay is extremely high, approximately 99 % across
replicated samples.
Next, Mendelian inheritance in trios should be checked as
should evidence of departure from Hardy–Weinberg equilibrium.
These are standard quality control checks that test the quality of
SNP genotypes.
Finally, samples and SNPs which do not meet the overall call-
rate threshold established by the investigator should be excluded.
The data is then ready to be statistically analyzed.
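The three checks described in this section (duplicate concordance, Mendelian inheritance in trios, and Hardy–Weinberg equilibrium) can also be reproduced outside GenomeStudio. The sketch below assumes genotypes are coded as the strings AA, AB, and BB with NC for no-calls; the coding and the function names are assumptions. The Hardy–Weinberg function returns a chi-square statistic with one degree of freedom, for which values above 3.84 correspond to p < 0.05.

```python
def concordance(calls_a, calls_b, no_call="NC"):
    """Fraction of identical genotypes among pairs where both were called."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a != no_call and b != no_call]
    return sum(a == b for a, b in pairs) / len(pairs)


def mendelian_consistent(child, parent1, parent2):
    """True if the child genotype can be formed from one allele per parent."""
    return any(sorted(a + b) == sorted(child)
               for a in parent1 for b in parent2)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for departure from
    Hardy-Weinberg equilibrium, given observed genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

For small samples or rare alleles an exact test is generally preferred over the chi-square approximation.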

References

1. Oliphant A et al (2002) BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques Suppl:56–58, 60–61
2. Shen R et al (2005) High-throughput SNP genotyping on universal bead arrays. Mutat Res 573(1–2):70–82
3. Lin CH et al (2009) Medium- to high-throughput SNP genotyping using VeraCode microbeads. Methods Mol Biol 496:129–142
4. Fan JB et al (2005) BeadArray-based solutions for enabling the promise of pharmacogenomics. Biotechniques 39(4):583–588
5. Lubomirov R et al (2010) ADME pharmacogenetics: investigation of the pharmacokinetics of the antiretroviral agent lopinavir coformulated with ritonavir. Pharmacogenet Genomics 20(4):217–230
6. Monsuur AJ et al (2005) Myosin IXB variant increases the risk of celiac disease and points toward a primary intestinal barrier defect. Nat Genet 37(12):1341–1344, Epub 2005 Nov 13
7. McMahon FJ et al (2006) Variation in the gene encoding the serotonin 2A receptor is associated with outcome of antidepressant treatment. Am J Hum Genet 78(5):804–814, Epub 2006 Mar 20
8. Fallin MD et al (2005) Bipolar I disorder and schizophrenia: a 440-single-nucleotide polymorphism screen of 64 candidate genes among Ashkenazi Jewish case-parent trios. Am J Hum Genet 77(6):918–936, Epub 2005 Oct 28
9. Campino S et al (2011) Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay. PLoS One 6(6):e20251, Epub 2011 Jun 6
10. Designing custom GoldenGate genotyping assay. Technical note (from www.illumina.com)
11. GoldenGate® assay workflow. Technical note (from www.illumina.com)
12. Analyzing GoldenGate genotyping data. Technical note (from www.illumina.com)
Chapter 10

Genome-Wide Gene Expression Profiling, Genotyping,

and Copy Number Analyses of Acute Myeloid Leukemia
Using Affymetrix GeneChips
Mathijs A. Sanders and Peter J.M. Valk

Abstract
With novel genome-wide technologies it is nowadays possible to perform detailed molecular analyses of
normal and malignant tissues. Acute myeloid leukemia (AML) is a heterogeneous group of diseases with
variable response to therapy. Gene expression profiling and genome-wide genotyping have recently been
successfully applied to unravel the heterogeneity of AML. This chapter gives instructions and recommen-
dations for genome-wide gene expression analyses, genotyping, and copy number analyses, as performed
for AML using Affymetrix GeneChips.

Key words Affymetrix GeneChips, Affymetrix DNA mapping arrays, Gene expression profiling,
Genome-wide genotyping, Copy number analyses, Acute myeloid leukemia

1 Introduction

Within the human body, thousands of genes and their products,

i.e., RNA and proteins, function in a complicated web and are
orchestrated both temporally and spatially. Gene expression,
however, varies from tissue to tissue depending on the cell types
present and the condition of the material, e.g., disease state, giving
a source of variation within and between organisms. The ability to
measure the RNA expression or DNA structure of multiple genes
simultaneously provides the researcher with the ability to study the
entire genome in one experiment, with a quantifiable signal being
generated that is directly proportional to the copy number/expres-
sion level in cells/tissues. The microarray technology is a great
milestone for full (global) genome research. Qualitative or quanti-
tative measurements with DNA microarrays use the selective nature
of DNA–DNA or DNA–RNA hybridization under high-stringency
conditions and fluorophore-based detection. There are currently a

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_10, © Springer Science+Business Media, LLC 2013

large number of companies producing both cDNA and oligonucleotide
arrays that interrogate human genome expression to maximum
capacity. GeneChip technology, a slight variation of oligonucleotide
arrays, is produced by Affymetrix [1] and currently has the
leading position in microarray technology, along with Agilent [2]
and Illumina [3].
Affymetrix GeneChips refer to the high-density oligonucleotide-
based arrays, which consist of small DNA oligonucleotides referred
to as probes. DNA probes are synthesized in situ on silicon wafers
using a photolithographic process. The 11-μm DNA probes on
expression arrays are 25 nucleotides long and a probe set, repre-
senting a single mRNA, consists of 11 different probe pairs (22
probes). This allows for consistent discrimination between signal
and background noise. The 54,000 different probe sets on the cur-
rent U133-plus2.0 GeneChip microarray represent approximately
30,000 known genes and EST sequences. For each probe on the
array that perfectly matches (PM) its target sequence, Affymetrix
also created a paired “mismatch” probe (MM). The mismatch
probe contains a single mismatch located directly at the 13th posi-
tion in the 25-mer probe sequence [4]. This mismatch probe is
used as a background control and also to overcome the low speci-
ficity of the short oligonucleotide used [5]. While the perfect
match probe provides measurable fluorescence when the sample
binds to it, the paired mismatch probe is used to detect and
eliminate any false or contaminating fluorescence within that mea-
surement [1]. The mismatch probe serves as an internal control for
its perfect match partner because it hybridizes to nonspecific
sequences about as effectively as its counterpart, allowing mislead-
ing signals, from cross-hybridization, for example, to be efficiently
quantified and subtracted from a gene expression measurement or
genotype call [4–6]. These multiple measurements provide high
sensitivity and reproducibility, just as the 25-mer oligonucleotide
probe length confers high specificity. The chip design strategy for
genotyping probe sets is to use a set of perfect match/mismatch
probe pairs to interrogate the bases surrounding the SNP on the
forward and/or reverse target for both the A and B alleles. The
Genome-Wide Human SNP Array 6.0 contains more than 906,600
single nucleotide polymorphisms (SNPs) and more than 946,000
probes for the detection of copy number variation.
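The relationship between a perfect match probe and its mismatch partner can be illustrated in a few lines of Python. The sketch below assumes, as is commonly described for these arrays, that the mismatch base is the complement of the perfect match base at position 13; the probe sequence, the intensities, and the function names are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_sequence, position=13):
    """Return the MM partner of a 25-mer PM probe by replacing the base
    at the given 1-based position with its complement (an assumption)."""
    if len(pm_sequence) != 25:
        raise ValueError("expected a 25-mer probe")
    i = position - 1
    return pm_sequence[:i] + COMPLEMENT[pm_sequence[i]] + pm_sequence[i + 1:]


def pm_minus_mm(pm_intensities, mm_intensities):
    """Subtract the MM signal from the PM signal for each probe pair."""
    return [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
```

Simple subtraction can give negative values when the mismatch probe is brighter than its partner, which is one reason why later algorithms handle the mismatch signal differently.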
Acute myeloid leukemia (AML) is a group of neoplasms char-
acterized by a variety of genetic aberrations and a variable response
to therapy [7, 8]. The pretreatment karyotype of leukemic blasts is
currently the key determinant for therapy decision-making in
AML. For instance, the chromosomal rearrangements inv(16), t(8;21), and
t(15;17) are indicative of a favorable prognosis, whereas other
cytogenetic aberrations indicate a poor-risk leukemia [7, 8]. The
largest cytogenetic subclass of AML, i.e., those patients with a
normal karyotype, is categorized as standard risk, since these AML

cases lack informative chromosomal markers. This group, accounting
for approximately 40–45 % of all AML patients, most probably
contains a mixture of patients with favorable and unfavorable
prognosis.
Molecular analyses of AML have revealed mutations in various
genes, such as the genes encoding nucleophosmin (NPM1), the
fms-like tyrosine kinase receptor 3 (FLT3), and the CCAAT/
enhancer binding protein alpha (CEBPA), as well as increased
expression of the ecotropic virus integration site-1 (EVI1) gene in
specific subsets of AML [8]. These mutations refine the classifica-
tion of AML. For instance, mutations in NPM1, like those in the
gene encoding CEBPA, are associated with a favorable outcome,
whereas internal tandem duplication (ITD) mutations in the
hematopoietic growth factor receptor gene FLT3 and elevated
mRNA expression of the transcription factor EVI1 are indicative of
an unfavorable prognosis [7, 8].
Although nonrandom clonal aberrations are identified in
40–50 % of all AML patients and the numbers of molecular genetic
abnormalities are growing, a large proportion of AML patients
cannot adequately be classified because of the lack of prognosti-
cally significant molecular abnormalities. Novel genome-wide
approaches open possibilities to improve risk-stratification of AML.
Moreover, genome-wide analyses of AML will help to unravel the
biology of this disease. In fact, in recent years a number of studies
have applied gene expression profiling in the discovery of new dis-
ease entities in AML, comparison of AML subtypes and prediction
of molecularly defined subtypes and disease outcome in AML [9].
Likewise, genome-wide genotyping has been applied in detailed
molecular analyses of acute leukemias. In the following paragraphs,
we will highlight the various practical aspects of gene expression
profiling and genome-wide genotyping of AML as well as the anal-
yses of these large data sets (see Note 1).

2 Materials

Use molecular biology grade water (RNase/DNase-free; BioWhittaker)
for all preparations.

2.1 Isolation of Mononuclear Cells by Density Separation Using Ficoll
1. Ficoll with a density of 1.077 (e.g., NycoMed or Ficoll-Paque Plus).
2. Phosphate-buffered saline (PBS).
3. Fetal calf serum (FCS).
4. Hypotonic medium (e.g., NH4Cl).
2.2 RNA Isolation
1. RNA lysis buffer:
25 g guanidine thiocyanate.
0.25 g sarkosyl.
1.25 ml 1 M sodium citrate pH 7.0.
Make up to 50 ml with RNase-free water.
2. Cesium chloride:
5.7 M: 95.97 g CsCl.
0.83 ml 3 M NaAc pH 5.0.
Make up to 100 ml with RNase-free water.
3. 3 M NaAc pH 5.0.
4. 96 % ethanol.
5. 70 % ethanol.
6. RNase-free water (0.1 % v/v diethylpyrocarbonate for 1 h at
37 °C and then autoclaved [at least 15 min]).

2.3 DNA Isolation
1. DNA lysis buffer: 12.5 ml 100 mM EDTA and 3.75 ml 1 M
NaCl in 50 ml water.
2. Proteinase K (20 mg/ml).
3. SDS (20 %).
4. Saturated NaCl (87.7 g NaCl in 250 ml water).
5. 96 % ethanol.
6. TE buffer (10 mM Tris–HCl, 1 mM EDTA).
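The quantities listed in Subheadings 2.2 and 2.3 can be verified, or rescaled to other volumes, with elementary molarity arithmetic. The sketch below uses standard molar masses for CsCl and NaCl; the function names are hypothetical.

```python
MOLAR_MASS = {"CsCl": 168.36, "NaCl": 58.44}  # g/mol, standard values


def grams_needed(compound, molarity, volume_ml):
    """Mass of solid needed for a solution of the given molarity."""
    return molarity * (volume_ml / 1000) * MOLAR_MASS[compound]


def molarity_of(compound, grams, volume_ml):
    """Molarity obtained when dissolving grams of solid in volume_ml."""
    return grams / MOLAR_MASS[compound] / (volume_ml / 1000)


def diluted_mM(stock_mM, stock_volume_ml, final_volume_ml):
    """Final concentration after diluting a stock to the final volume."""
    return stock_mM * stock_volume_ml / final_volume_ml


cscl_g = grams_needed("CsCl", 5.7, 100)      # about 95.97 g, as listed
nacl_M = molarity_of("NaCl", 87.7, 250)      # about 6 M
edta_mM = diluted_mM(100, 12.5, 50)          # 25 mM EDTA in the lysis buffer
nacl_mM = diluted_mM(1000, 3.75, 50)         # 75 mM NaCl in the lysis buffer
```

These results agree with the recipe: 95.97 g CsCl in 100 ml gives 5.7 M, and the "saturated" NaCl solution corresponds to roughly 6 M.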

2.4 Labeling and Hybridization Procedures
Hybridization and staining of cRNA were performed exactly
according to the instructions of the manufacturer of the Affymetrix
GeneChips [Affymetrix (Santa Clara, CA, USA)] [10].

2.4.1 Labeling RNA

Double Strand cDNA Synthesis
1. Superscript II RT (200 U/μl), Invitrogen Life Technologies.
2. E. coli DNA ligase (10 U/μl).
3. DNA polymerase I (10 U/μl).
4. E. coli RNAse H (2 U/μl).
5. T4 DNA polymerase (5 U/μl).
6. 5× second strand buffer 500 μl, Invitrogen Life Technologies.
7. dNTP set, 250 μl of 100 mM each of dATP, dCTP, dGTP, dTTP.
8. RNAsin, 10,000 U.

In Vitro Transcription
1. Biotin-11-CTP (10 mM, 250 nmol in 25 μl), PerkinElmer.
2. Biotin-16-UTP (10 mM, 250 nmol in 25 μl), PerkinElmer.
3. MEGAScript T7, 40 labeling reactions, Ambion.

Cleanup
1. GeneChip® Sample Cleanup Module, 30 reactions, Affymetrix.

2.4.2 Hybridization and Staining Gene Expression Profiling

Hybridization
1. Acetylated bovine serum albumin (BSA) solution (50 mg/ml).
2. Herring Sperm DNA (10 mg/ml).
3. GeneChip® Eukaryotic Hybridization Control Kit (contains
Control cRNA and Control Oligo B2), 150 reactions, Affymetrix.
4. 5 M NaCl (RNase-free/DNase-free).
5. MES-Free Acid Monohydrate SigmaUltra.
6. MES Sodium Salt.
7. EDTA Disodium Salt, 0.5 M solution.

Staining
1. Acetylated bovine serum albumin (BSA) solution (50 mg/ml).
2. R-Phycoerythrin–Streptavidin (1 mg/ml).
3. 5 M NaCl (RNase-free/DNase-free).
4. PBS, pH 7.2.
5. 20× SSPE (3 M NaCl, 0.2 M NaH2PO4, 0.02 M EDTA).
6. Goat IgG (Reagent Grade).
7. Anti-streptavidin antibody (goat, biotinylated).

2.5 Labeling, Hybridization, and Staining Genome-Wide Genotyping
Labeling of genomic DNA, hybridization, and staining for genome-wide
genotyping were performed exactly according to the instructions of the
manufacturer of the Affymetrix GeneChips [Affymetrix (Santa Clara,
CA, USA)] [11].

3 Methods (See Note 2)

3.1 Isolation of Mononuclear Cells by Density Separation Using Ficoll (See Note 3)
1. Use Ficoll-1.077 at room temperature.
2. Add PBS with 0.5 % FCS to bone marrow or blood up to the
appropriate volume [blood: minimum 1:1 dilution (maximum
WBC concentration 60 × 10^6/ml); bone marrow: 1:3 or 1:5
dilution (maximum cell concentration 60 × 10^6/ml)].
3. Pipet 15 ml Ficoll-1.077 into a 50 ml tube. Carefully layer the
cell suspension on top of the Ficoll-1.077.
4. Spin 20 min at 1,800 rpm (600 × g) at room temperature.
5. Remove the upper layer of medium up to 5 mm above the
interphase.
6. Collect the interphase in a 50 ml tube.
7. Wash cells with 50 ml PBS + 0.5 % FCS and centrifuge for
10 min at 760 × g.
8. Decant the supernatant.
9. Depending on the type of isolation the cells should be lysed in
RNA- or DNA lysis buffer (see Note 4).
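Because rotor speed (rpm) only translates into a centrifugal force (× g) for a given rotor radius, it is useful to be able to convert between the two when a different centrifuge is used. The sketch below uses the standard relation RCF = 1.118 × 10^-5 × r × rpm^2, with r in centimeters; the radius of 16.6 cm in the example is a hypothetical value chosen to match the 1,800 rpm and 600 × g quoted above, not a specification of any particular rotor.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a rotor speed and radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed needed to reach a given relative centrifugal force."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# Hypothetical rotor radius of 16.6 cm:
force = rcf_from_rpm(1800, 16.6)   # close to 600 x g
speed = rpm_from_rcf(760, 16.6)    # speed needed for the wash step
```

When a protocol gives both values, the × g figure is the one to reproduce on a different instrument.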
3.2 RNA Isolation (See Note 5)
1. Lyse mononuclear cells (see Note 6) in 6 ml RNA lysis buffer
by adding the lysis buffer, resuspending the cells and vortexing
for 1 min.
2. Pipet 3 ml cesium chloride into open centrifuge tubes for the
SW41Ti rotor.
3. Carefully pipet the lysate on top of the cesium chloride.
4. Weigh the tubes and make them equal in weight with lysis buffer
(maximal variation: 0.1 g).
5. Centrifuge for 18 h at 32,000 rpm (approximately 175,000 × g) at room
temperature.
6. After centrifuging the RNA pellet is on the bottom, DNA is
halfway and proteins are in the upper phase.
7. Pipet off the upper fluid and turn the tube upside down. DNA
that is left over in the tube cannot contaminate the RNA
pellet.
8. Cut the bottom of the tube containing the RNA pellet with a
sterile lancet.
9. Collect the RNA pellet into a sterile eppendorf tube by wash-
ing the bottom twice with RNase-free water. Keep on ice.
10. Precipitate the RNA by adding 40 μl 3 M NaAc pH 5.0 and
1 ml 96 % ethanol. Mix well by turning upside down.
11. Incubate for 30 min at −70 °C.
12. Centrifuge for 15 min at 16,200 × g at 4 °C.
13. Wash pellet with 500 μl 70 % ethanol. Centrifuge for 10 min at
13,000 rpm at 4 °C.
14. Dissolve the pellet in 25 μl RNase-free water, by pipetting up
and down.
15. Measure the RNA concentration using a 1:10 dilution and
determine quality (see Note 7).
16. Store RNA at −70 °C.

3.3 DNA Isolation (See Note 8)
1. Pellet mononuclear cells (see Note 9) by centrifugation for 5 min
at 4 °C at 1,200 rpm (300 × g).
2. Remove supernatant and resuspend 10–100 × 10^6 cells in 3 ml
DNA lysis buffer.
3. Add 25 μl proteinase K (20 mg/ml).
4. Add 150 μl SDS (20 %).
5. Incubate overnight at 37 °C.
6. Add 1 ml saturated NaCl and vortex.
7. Centrifuge for 15 min at 2,500 rpm (1,100 × g).
8. Repeat steps 6 and 7 until the supernatant is clear.
9. Add two volumes (8 ml) ethanol to the supernatant.

10. Harvest DNA from the interphase using a pipet tip [or spin the
tube 10 min at 3,500 rpm (2,300 × g)].
11. Wash DNA with 70 % ethanol.
12. Incubate for 30 min at 65 °C to deactivate DNases.
13. Dissolve DNA in 100–350 μl TE overnight at room
temperature.
14. Store DNA at 4 °C in air-tight tube to prevent evaporation.

3.4 Labeling and Hybridization Procedures

3.4.1 Labeling RNA (Fig. 1)

First Strand cDNA Synthesis
1. Pipet:
10 μl RNA in H2O (5 μg RNA).
2 μl T7(dT)24 Primer (50 pmol/μl).
2. Incubate for 10 min at 70 °C (in 500 μl tubes in ABI9700).
3. Cool on ice and spin.
4. Pipet per reaction:
2 μl 0.1 M DTT.
1 μl dNTP’s (10 mM).
5. Incubate for 2 min at 42 °C.
6. Add 1 μl Superscript II RT (200 U/μl).
7. Incubate for 1 h at 42 °C.
8. Cool on ice (at least 2 min).

Second Strand cDNA Synthesis
1. Pipet per reaction:
91 μl water.
30 μl 5× Second Strand Reaction Buffer.
3 μl 10 mM dNTP mix.
1 μl DNA Ligase (10 U/μl).
4 μl DNA Polymerase I (10 U/μl).
1 μl RNAse H (2 U/μl).
2. Mix by pipetting and spin.
3. Add 130 μl of this mix to each first strand synthesis.
4. Mix by pipetting and spin.
5. Incubate for 2 h at 16 °C.
6. Add 2 μl T4 DNA polymerase.
7. Incubate for 5 min at 16 °C.
8. Cool on ice and add 10 μl 0.5 M EDTA to stop the reaction.
9. Clean the ds cDNA with the GeneChip® Sample Cleanup
Module.
Fig. 1 Overview of Affymetrix genome-wide expression profiling with U133-plus2.0 GeneChip microarray.
Affymetrix gene expression arrays use a standardized biotin labeling protocol, which utilizes an Oligo(dT)-T7
promoter primed, in vitro transcription based linear amplification strategy. The procedure consists of reverse
transcription with an oligo(dT) primer bearing a T7 promoter. Subsequently, the cDNA is subjected to second
strand synthesis and cleanup to become a template for in vitro transcription (IVT) with T7 RNA Polymerase in
the presence of biotinylated nucleotides. Following this, strict Affymetrix protocols are utilized by the standard
fluidics and scanning stations
10. After cleanup, the ds cDNA has a volume of 14 μl. Concentrate
the ds cDNA with a SpeedVac down to 1.5 μl.
11. Use 1.5 μl in IVT.
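When several samples are processed in parallel, the second strand mix is usually prepared as a master mix. The sketch below scales the per-reaction volumes listed in the second strand synthesis step; the 10 % pipetting overage and the function name are assumptions, not part of the protocol.

```python
# Per-reaction volumes (ul) of the second strand mix, as listed in the protocol.
SECOND_STRAND_MIX_UL = {
    "water": 91.0,
    "5x Second Strand Reaction Buffer": 30.0,
    "10 mM dNTP mix": 3.0,
    "DNA Ligase (10 U/ul)": 1.0,
    "DNA Polymerase I (10 U/ul)": 4.0,
    "RNAse H (2 U/ul)": 1.0,
}


def master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale per-reaction volumes to n reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


per_reaction_total = sum(SECOND_STRAND_MIX_UL.values())  # 130 ul per reaction
mix_for_8 = master_mix(SECOND_STRAND_MIX_UL, 8)
```

The per-reaction total of 130 μl matches the volume that is added to each first strand synthesis.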

In Vitro Transcription (See Note 10)
1. Pipet per reaction:
1.5 μl ds cDNA Template.
2 μl 17 mM GTP.
1.5 μl 17 mM UTP.
1.5 μl 17 mM CTP.
3.75 μl 10 mM Biotin-11-CTP.
3.75 μl 10 mM Biotin-16-UTP.
2 μl 10× Reaction buffer.
2 μl T7 enzyme mix.
2 μl 17 mM ATP.
2. Add 18.5 μl IVT reaction mix per 1.5 μl ds cDNA sample.
3. Mix by pipetting and spin.
4. Incubate for 5 h (or overnight) at 37 °C.
5. Clean cRNA with GeneChip® Sample Cleanup Module.
6. Determine cRNA concentration.
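The volumes in the IVT reaction determine the final nucleotide concentrations and the proportion of biotinylated CTP and UTP offered to the polymerase. The sketch below derives these from the list in step 1; it is plain arithmetic on the listed volumes and stock concentrations, and the function names are hypothetical.

```python
# name: (volume in ul, stock concentration in mM or None), per reaction.
IVT_REACTION = {
    "ds cDNA template": (1.5, None),
    "GTP": (2.0, 17.0),
    "UTP": (1.5, 17.0),
    "CTP": (1.5, 17.0),
    "Biotin-11-CTP": (3.75, 10.0),
    "Biotin-16-UTP": (3.75, 10.0),
    "10x reaction buffer": (2.0, None),
    "T7 enzyme mix": (2.0, None),
    "ATP": (2.0, 17.0),
}


def total_volume(reaction):
    """Total reaction volume in ul."""
    return sum(vol for vol, _ in reaction.values())


def final_mM(reaction, name):
    """Final concentration of one nucleotide in the complete reaction."""
    vol, stock = reaction[name]
    return vol * stock / total_volume(reaction)


def labeled_fraction(reaction, plain, biotinylated):
    """Share of a nucleotide that is supplied in biotinylated form."""
    p = final_mM(reaction, plain)
    b = final_mM(reaction, biotinylated)
    return b / (p + b)
```

With these volumes the reaction totals 20 μl (18.5 μl of mix plus 1.5 μl of template), ATP and GTP end up at 1.7 mM each, and a little under 60 % of the CTP and UTP is biotinylated.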

Fragmentation cRNA
1. Fragment 10 μg cRNA:
24 μl (RNA in water).
6 μl fragmentation buffer.
2. Incubate for 35 min at 94 °C.
3. Cool on ice.

3.4.2 Hybridization and Staining Gene Expression Profiling
Hybridization and staining of cRNA were performed exactly
according to the instructions of the manufacturer of the Affymetrix
GeneChips [Affymetrix (Santa Clara, CA, USA)] [10].

Hybridization cRNA
1. Prepare 1× hybridization mix:

30 μl fragmented cRNA.
3.3 μl Control Oligonucleotide B2.
10 μl 20× Eukaryotic Hybridization Controls (bioB, bioC,
bioD, cre).
2 μl Herring Sperm DNA (10 mg/ml).
2 μl Acetylated BSA (50 mg/ml).
100 μl 2× Hybridization Buffer.
52.7 μl water.
3.5 Labeling, Hybridization, and Staining Genome-Wide Genotyping
Labeling of genomic DNA, hybridization, and staining for genome-wide
genotyping were performed exactly according to the instructions of the
manufacturer of the Affymetrix GeneChips [Affymetrix (Santa Clara,
CA, USA)] (Fig. 2) [11].

3.6 Experimental Design and Variation
To perform a successful microarray experiment, one should take
several factors into account. The structure of the experimental
design is directly related to the statistical power of the analysis. Not
only is the number of samples of primary importance, but additional
emphasis should be put on the structure of the design (e.g., cases
vs. controls). Furthermore, a myriad of possible sources of variation
can erode the statistical power needed to answer the biological
questions at hand. A vital step of the design is to identify all possible
sources of variation. One of the largest sources of variation is
unwanted biological variation. For instance, it has been shown that
dramatic transcriptional differences can occur at different times of
the day solely due to the circadian rhythm [12]. Another source of

Fig. 2 Overview of Affymetrix genome-wide genotyping with the Genome-Wide Human SNP Array 6.0 [11]. Genomic DNA is
digested with NspI and StyI, followed by adapter ligation, linear amplification, and labeling, before hybridization
on the GeneChip
variation is introduced by technical variation. For instance, different
techniques for isolating genetic material can have a strong
impact on gene expression measurements, as can changing
labeling kits or other reagents within an experiment. It is
imperative that the researcher designs the experiment in advance
and lists all possible sources of variation, resulting in an optimal
design for answering the questions at hand.

3.7 Analyses of Gene Expression Profiling Data
In the following paragraphs, important recommendations are given
on how to successfully accomplish the major study objectives when
analyzing gene expression data, i.e., class discovery, class
comparison, and class prediction. A full overview of all types of
gene expression data analyses is given in [13].

3.7.1 Normalization and Summarization
Pre-processing of microarrays is a vital step in acquiring the
measurements. To truly understand the output of the whole
data-generating process and the reason for pre-processing, it is
necessary to first explain the fundamentals of microarray analysis.
On a microarray, each probe pair consists of a perfect match and a
mismatch probe. The signal intensities emitted from these probes
are read by the microarray scanner and condensed into a signal
intensity file, also called a .CEL file. This .CEL file contains the
signal intensity measurement for each probe situated on the
microarray and is therefore pivotal for research. These probes
belong to a probe set, which is directly related to a known
transcript, as has been stated above. Hence, the individual probes
associated with one probe set can be summarized to one intensity
value reflecting the expression level of the associated gene. Before
summarization, it is of utmost importance that the microarrays are
normalized. Normalization is a type of “calibration” that serves to
remove nonbiological or systematic variation between samples, such
as differences in background and noise levels, hybridization
conditions, handling, and instrumentation. Most normalization
procedures perform the following pre-processing steps:
Step 1: Background correction of the probes.
Step 2: Normalization within the chip to correct technical variation
or to facilitate between-array comparison. Frequently used meth-
ods use statistical techniques such as “quantiles” or “invariant set
of genes.”
Step 3: Perfect Match correction methods, e.g., subtracting the
Mismatch probe from the Perfect Match probe.
Step 4: Summarization, e.g., “average difference” or “median
polish,” which converts the 11–22 probe pair intensities into one
probe set value.
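The quantile normalization mentioned in Step 2 can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used by any Affymetrix or Bioconductor tool; the function name `quantile_normalize` and the toy intensities are assumptions made for this example, and tied intensities are simply ranked in probe order.

```python
from statistics import mean

def quantile_normalize(arrays):
    """Quantile-normalize arrays (one list of probe intensities per array).

    After normalization every array has exactly the same set of values:
    the mean, across arrays, of the intensities at each rank.
    """
    n_probes = len(arrays[0])
    # Sort each array and average across arrays at every rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [mean(s[r] for s in sorted_arrays) for r in range(n_probes)]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities: three arrays, four probes each.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After this step the arrays differ only in which probe holds which value, which is what makes them comparable.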
166 Mathijs A. Sanders and Peter J.M. Valk

There are a myriad of normalization techniques developed
during the last decade. Only a few of them are frequently used:
Microarray Suite 5.0: Also commonly abbreviated to MAS5.0.
This algorithm is one of the most frequently used normalization
procedures. It is routinely embedded in the microarray software of
Affymetrix, known as Expression Console. The normalization
method assumes that the total amount of labeled mRNA is equal
among all samples [14]. MAS5.0 uses a robust estimator, i.e., Tukey's
biweight, which is based on a weighted mean, to summarize the
probe pairs within one probe set while down-weighting outlying
pairs. Following this, the algorithm applies the Wilcoxon
signed-rank test to make the detection calls, which indicate the
reliability of each measurement.
Robust Multi-array analysis: Also commonly abbreviated to (GC)RMA
[15]. The RMA algorithm adjusts the background to create
an ideal match (IM), ignoring Mismatch probes and removing
global background. Furthermore, it utilizes quantile normalization,
in which the intensities are adjusted, ignoring the outliers, such
that the microarrays are comparable. It uses median polish to
summarize the probes into a probe set intensity value on a logarithmic
scale. A modified version of RMA is GC-RMA, which models
the intensity of the probe-level data taking into account the stronger
binding of G/C pairs, which presumably results in higher intensity
values for GC-rich probes.
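To illustrate the median polish summarization named above, the following is a minimal sketch in plain Python. It assumes a matrix of log2 intensities for one probe set (rows are probes, columns are arrays) and is not the RMA code itself; `median_polish` and the example values are hypothetical.

```python
from statistics import median

def median_polish(matrix, max_iter=10, tol=1e-6):
    """Fit log2 intensity = overall + probe effect + array effect + residual.

    matrix[p][a] is the log2 intensity of probe p on array a.
    Returns (overall, probe_effects, array_effects, residuals).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    resid = [row[:] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out the row (probe) medians.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            for j in range(n_cols):
                resid[i][j] -= row_med[i]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep out the column (array) medians.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
        if max(abs(x) for x in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid

# Three probes on three arrays; probe 0 on array 1 is an outlier (20.0).
log2_intensities = [
    [8.0, 20.0, 9.0],
    [9.0, 11.0, 10.0],
    [7.0, 9.0, 8.0],
]
overall, probe_eff, array_eff, residuals = median_polish(log2_intensities)
# One probe set value per array: overall level plus the array effect.
expression = [overall + a for a in array_eff]
```

Because medians are used, the single outlying probe ends up in the residuals and does not shift the probe set value of its array.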
The given normalization procedures can be performed with most
commercial software packages, such as dChip [16] or Omniviz
(Omniviz, MI, USA) [17], but are most commonly performed with R in
conjunction with the appropriate Bioconductor packages. This requires
a basic understanding of programming languages from the researcher.

3.7.2 Class Discovery

Usually the first step after pre-processing is the use of techniques
to perform unsupervised analysis. One of these techniques is called
clustering, which is a tool that aims at dividing the data in such a
way that similar items (e.g., samples or genes) fall into the same
group and that dissimilar items fall into different groups. Clustering is an
unsupervised technique when prior information, such as phenotype,
molecular subtype or any clinical parameter, is not taken into
account. It is an easy way to infer if samples with similar subtypes
of disease are grouping together, hinting to the researcher that there
is information in the data that could discriminate these subtypes.
There are many different techniques to perform clustering. Most
frequently the technique called hierarchical clustering is performed.
This method divides the data set into clusters, which are further
subdivided into smaller clusters, resulting in a dendrogram. To
cluster the data the method needs a similarity/dissimilarity matrix.
Mostly this distance metric between samples is based on a subset of
the genes measured using the microarrays. Particular metrics that
are frequently used are:
Euclidean distance: Given the set of selected genes the Euclidean
distance is calculated between the samples. The smaller the distance
the more similar are the samples. A drawback is that genes
with higher expression have the tendency to play a larger part in
this metric.
Pearson correlation: Given the set of selected genes the Pearson
correlation coefficient is calculated between the samples. This metric
is always between −1 (anti-correlation) and 1 (perfect correlation). A
coefficient closer to 1 implies that the samples are more
similar, while a coefficient closer to −1 implies that the samples are
more dissimilar. A coefficient close to zero implies no linear
relation between the samples.
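Both metrics can be written down in a few lines. The sketch below uses plain Python and hypothetical expression values for three samples over four selected genes; the function names are chosen for this example only.

```python
from math import sqrt
from statistics import mean

def euclidean_distance(x, y):
    """Euclidean distance between two samples over the same genes."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient between two samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

sample_a = [2.0, 4.0, 6.0, 8.0]
sample_b = [12.0, 14.0, 16.0, 18.0]  # same pattern as sample_a, shifted up by 10
sample_c = [8.0, 6.0, 4.0, 2.0]      # reversed pattern of sample_a

d_ab = euclidean_distance(sample_a, sample_b)   # large: absolute levels differ
r_ab = pearson_correlation(sample_a, sample_b)  # close to 1: same pattern
r_ac = pearson_correlation(sample_a, sample_c)  # close to -1: opposite pattern
```

The example shows why the choice matters: samples a and b are far apart by Euclidean distance although their expression patterns are perfectly correlated.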
It is imperative that the researcher pre-selects genes before
performing clustering. The rationale behind this rule-of-thumb
relates to the fact that in most diseases a large proportion of genes
is unaffected. Ultimately, if this proportion is very large, it will
result in distance metrics showing similarity between all samples.
The highest proportion of information, which can be harvested to
generate strong cluster dendrograms, lies in genes showing large
variation. Particular R-packages as well as the tool Cluster [18]
allow the researcher to pre-select genes to generate strong cluster
dendrograms. The outline of performing cluster analysis is as follows,
which will result in a clustering heatmap as illustrated in Fig. 3:
Step 1: Decide in advance which distance metric seems most optimal.
It has been suggested that the Euclidean distance works best for
log-transformed data, while the Pearson correlation coefficient
seems to work best for absolute values.
Step 2: Pre-select genes in advance of clustering. This pre-selection
is necessary to optimally differentiate between possible subtypes of
disease. Packages in R as well as the tool Cluster allow the user to
select for genes with high variation.
Step 3: Choose clustering technique of interest. There are a large
number of different clustering techniques which all have their ben-
efits as well as drawbacks, and the choice is highly dependent on
the structure of the data. Most frequently techniques such as hier-
archical or k-means clustering are used.
Step 4: Ultimately the researcher cannot be sure that the clustering
reflects the structure contained within the data, i.e., differentiates all
possible subtypes. There is no score or metric which shows that the
generated dendrogram is the most appropriate. Furthermore, it is
difficult to define in advance the number of genes needed for a
strong clustering dendrogram. Frequently, one will look if known
subtypes cluster together and select the clustering that agrees most
strongly with one's presumptions, but this is a biased way to look at the
data. There are particular packages, e.g., pvclust [20], which allow
pre-selecting the number of genes, resulting in a strong clustering
dendrogram, in an unbiased way.
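Steps 2 and 3 above can be sketched as follows, assuming that genes are pre-selected by their variance across samples and that samples are clustered by average linkage on the distance 1 − Pearson correlation. This is a small illustration in plain Python, not the procedure of Cluster or pvclust; all names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

def top_variable_genes(expr, n_genes):
    """expr maps gene -> list of expression values (one per sample)."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:n_genes]

def correlation_distance(x, y):
    """1 - Pearson correlation between two sample profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return 1.0 - cov / norm

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = {name: [name] for name in profiles}  # cluster label -> member samples
    merges = []
    while len(clusters) > 1:
        best = None
        labels = sorted(clusters)
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                # Average distance over all pairs of members of the two clusters.
                d = mean(correlation_distance(profiles[s], profiles[t])
                         for s in clusters[a] for t in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((a, b, d))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges

samples = ["S1", "S2", "S3", "S4"]
expr = {
    "g1": [10.0, 9.5, 2.0, 2.5],  # variable between subtypes
    "g2": [1.0, 1.5, 9.0, 8.5],   # variable between subtypes
    "g3": [5.0, 6.0, 1.0, 1.5],   # variable between subtypes
    "g4": [7.0, 7.1, 7.0, 7.1],   # nearly constant
    "g5": [3.0, 3.0, 3.1, 3.0],   # nearly constant
}
selected = top_variable_genes(expr, 3)
profiles = {s: [expr[g][j] for g in selected] for j, s in enumerate(samples)}
merges = average_linkage(profiles)
```

The order of the merges corresponds to the dendrogram read from the leaves upward: here S3 and S4 join first, then S1 and S2, and the two pairs join last at a large distance.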

[Fig. 3 graphic. Panel (a): correlation view of 285 AML patients. Panel (b): correlation view with columns for FAB, karyotype, FLT3 ITD, FLT3 TKD, N-RAS, K-RAS, CEBPA, and EVI1; clusters 1–16 annotated with abnormality (percentage); axis labels: AML patients, Genes, CD34, NBM.]

Fig. 3 Unsupervised cluster analyses of 285 cases of primary AML. (a) Correlation view of 285 AML patients
(2856 probe sets) [19]. The Correlation visualization tool displays pair-wise correlations between the samples.
The cells in the visualization are colored by Pearson’s correlation coefficient values with deeper colors indicat-
ing higher positive (red) or negative (blue) correlations. The scale bar indicates 100 % correlation (red) towards
100 % anti-correlation (blue). One hundred percent anti-correlation would indicate that genes with high
expression in one sample would always have low expression in the other sample and vice versa. The red
diagonal displays the comparison of an AML patient with itself, i.e., 100 % correlation. In order to reveal cor-
relation patterns, a matrix ordering method is applied to rearrange the samples. The ordering algorithm starts
with the most correlated sample pair and, through an iterative process, sorts all the samples into correlated
blocks. Each sample is joined to a block in an ordered manner so that a correlation trend is formed within a
block with the most correlated samples at the center. The blocks are then positioned along the diagonal of the
plot in a similar ordered manner. (b) Adapted correlation view (2856 probe sets) of 285 AML patients (right
panel) and the expression levels of the top 40 genes defining the 16 individual clusters of patients (left panel).
FAB classification and karyotype based on cytogenetics are depicted in the columns along the original diagonal
of the correlation view (FAB M0-red, M1-green, M2-purple, M3-orange, M4-yellow, M5-blue, M6-grey; karyo-
type: normal-green, inv(16)-yellow, t(8;21)-purple, t(15;17)-orange, t(11q23)/MLL abnormalities-blue, 7(q)
abnormalities-red, +8-pink, complex-black, other-grey). FLT3 ITD, FLT3 TKD, N-RAS, K-RAS and CEBPA muta-
tions and EVI1 overexpression are depicted in the same set of columns (red bar: positive and green bar:
negative). (Reprinted with permission from Valk et al., Copyright 2004, Massachusetts Medical Society.)

3.7.3 Class Comparison

Class comparison involves the discovery of differentially expressed
genes among different classes of samples. Analysis methods are
“supervised” when they include prior classification information.
This may be different cell or tissue types or experimental/treatment
conditions. For example, when looking at tissues of normal
breast and cancerous breast, the genes that are consistently
differentially expressed between them may be involved in the initiation
or progression of cancer.
There are many different methods to infer differential expression
and they are all based on hypothesis testing. Such methods generate
a p-value, i.e., the probability of observing data at least as
extreme as the measured data if the H0 (H-null) hypothesis is true.
In most cases this hypothesis states that there
are no differences between the two groups. If the p-value is sufficiently
small this hypothesis is rejected in favor of the alternative
hypothesis H1, implying that there is a difference between the two
groups. The selection of your hypothesis testing method is highly
dependent on the structure of your data and experimental design.
When one assumes that the data are normally distributed, methods
such as the t-test and ANOVA can be used, while non-normally
distributed data should be tested by nonparametric methods such as
the Wilcoxon rank-sum test (or the Wilcoxon signed-rank test for
paired samples). Even more elaborate structures
such as time series or hierarchical structures can be tested by
methods called linear mixed models. Hence, the choice of statistical
test is highly dependent on the distribution/form of your data
and structure of your experimental design. In some experiments it
is difficult to generate a large number of samples per condition.
When dealing with microarray data there are particular techniques
that can (partly) overcome this problem of insufficient statistical
power. Packages such as limma [21] use information across genes
to determine if one particular gene is differentially expressed
between two conditions.
Finally, after the selection of the most appropriate statistical
test the researcher is left with uncorrected p-values per gene. When
many tests are performed with the same acceptance
threshold that would be used when considering a single test,
it becomes a problem to control the Type 1 error (the Type 1
error is the probability of rejecting the H-null hypothesis when the
H-null hypothesis is in fact true). This Type 1 error is often controlled
by methods such as Holm–Bonferroni [22] and false discovery
rate (FDR) [23]. Holm–Bonferroni is a very conservative
method that corrects the p-values such that the probability of
even a single false positive is kept low, while FDR tries to reject as many
hypotheses as possible (i.e., gene is differentially expressed) while
controlling the expected proportion of false positives among the
rejected hypotheses.
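The two corrections can be compared on a small example. The sketch below computes Holm–Bonferroni and Benjamini–Hochberg (FDR) adjusted p-values in plain Python; the five p-values are hypothetical and the function names are chosen for this illustration.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values (step-down procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR, step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # descending
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical uncorrected p-values for five genes.
pvalues = [0.001, 0.008, 0.012, 0.03, 0.20]
holm = holm_adjust(pvalues)
fdr = bh_adjust(pvalues)
n_holm = sum(p <= 0.05 for p in holm)  # genes called at 0.05 after Holm
n_fdr = sum(p <= 0.05 for p in fdr)    # genes called at 0.05 after FDR
```

With these values the FDR correction calls one gene more than Holm–Bonferroni at the same threshold, which reflects the more conservative nature of the latter.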
An example analysis is given by Valk et al. [19]. In this study,
285 AML samples were characterized on the Affymetrix U133A
GeneChip, measuring 21,765 probe sets, and normalized with the
Affymetrix Microarray Suite (MAS5.0). Unsupervised analysis,
i.e., hierarchical clustering, with 2,856 probe sets
resulted in a dendrogram containing 16 AML clusters, as illustrated
by the correlation view in the right panel of Fig. 3b.
Subsequently, all supervised analyses in this study were performed
using significance analysis of microarrays (SAM) [24]. SAM calculates
a score for each gene on the basis of change in gene expression
relative to the standard deviation of all 285 measurements.
The inferred q-value for each gene represents the probability that
it is falsely called significantly deregulated. Gene characteristics of
each of the 16 clusters were obtained after supervised analysis. The
expression profiles of the distinct subsets of genes, either up- or
down-regulated, are plotted in Fig. 3b in the left panel alongside
the correlation view. Notably, the SAM algorithm is now considered
outdated, as significant improvements have been
made over the years.
The outline of statistical testing for differential expression:
Step 1: Determine if your data is normally distributed. This can be
done by looking at Q–Q plots or using the Kolmogorov–Smirnov
test.
Step 2a: If the data are normally distributed, use techniques such as the
t-test for independent samples or the paired t-test for paired samples.
Other techniques include ANOVA or limma.
Step 2b: If not, use a nonparametric method based on ranking or
permuting the data. Example: Wilcoxon rank-sum test (independent
samples) or Wilcoxon signed-rank test (paired samples).
Step 2c: If there is a structure in the data, e.g., time series, it is best
to use linear mixed models.
Step 3: Correct the p-values for multiple testing. For an exploratory
experiment it is best to select FDR, while Holm–Bonferroni
is more appropriate for confirmation experiments.
All of the described methods can be performed with most commercial
statistical packages as well as with R. Additionally, there are
packages which allow the researcher to infer if particular pathways
are differentially expressed (not implying gene set enrichment) [25].
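As an illustration of a nonparametric test based on permuting the data (Step 2b), the following sketch runs an exact two-sided permutation test on the difference in group means for a single gene. It is not the Wilcoxon test itself; the function name and the expression values are hypothetical, and enumerating all relabelings is only feasible for small groups.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    as_extreme = 0
    n_relabelings = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_relabelings += 1
        if diff >= observed - 1e-12:  # small tolerance for rounding
            as_extreme += 1
    return as_extreme / n_relabelings

# Hypothetical log2 expression of one gene in two conditions.
treated = [7.1, 7.4, 7.9, 8.2]
control = [5.0, 5.3, 5.9, 6.1]
p_value = exact_permutation_pvalue(treated, control)
```

With four samples per group there are only 70 possible relabelings, so the smallest attainable two-sided p-value is 2/70. This makes concrete why small sample sizes limit statistical power, especially once multiple testing correction is applied.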

3.7.4 Class Prediction

Class prediction is a supervised technique which allows the discovery
of genes that, alone or in combination, can predict which class
a sample belongs to. Please note that the genes which most optimally
classify the samples are not necessarily the most differentially
expressed. These techniques are particularly useful in diagnostics as
specific profiles can be inferred, with the ability to predict subtypes
or which patient will be effectively treated.
Many classification algorithms have been developed during the
last decades. Most methods select, by means of cross-validation, an
appropriate subset of genes which most optimally classifies the
samples. To infer if the prediction algorithm works well the
researcher has to split the data set into a training set and a test set.
The model is trained on the training set by means of cross-validation
to select the most appropriate subset of genes, and the
model accuracy is finally inferred on the test set. The prediction
accuracy should never be inferred from the training data from which
the model is constructed, as this will lead to an over-estimation of
the prediction accuracy due to overfitting.
There are different methods to perform the classification, but
they can generally be divided into two groups when it comes to
gene selection.
Discrete selection: Genes are selected as single entities. Forward
selection starts off with an empty set of genes and will subsequently
add one gene at a time until the optimum is reached. Backward selection
starts off with a set containing all genes and will subsequently remove
genes until the optimum is reached. A drawback is that all genes in
the final set will contribute equally, decreasing the interpretability, as
it becomes difficult to infer which gene is important.
Continuous selection: These methods select genes by a technique
called shrinkage. By optimizing a regularization parameter particular
genes are removed from the model, while the remaining genes are
weighted in such a way that the model results in the optimal
prediction on the training set. This is an unbiased way of selecting
genes, and additionally leads to an increased interpretability of the
given set of genes in relation to the outcome. Given the weights
per gene one can infer if a particular gene has a strong impact on
the classification of a class. The most well-known technique is
called (multinomial) logistic regression with lasso penalization,
also called “the lasso” [26].
Additionally, these methods enable the user to classify multiple
classes at the same time, also called multi-class prediction. Most of
these algorithms can be utilized within the R environment in
conjunction with Bioconductor [27], but less extensively in most
commercial packages.
Outline of class prediction:
Step 1: Normalize and/or transform data if necessary. If the data
set has not been normalized in advance, then it should be done
before performing classification as microarrays should be comparable.
Additionally, some researchers transform their data for reasons
of interpretability or to obtain a better classification
accuracy.
Step 2: Split the data set randomly into a training and a test set. Generally,
one generates a larger training set than test set, as long as the test
set is sufficiently large to accurately infer the prediction accuracy.
Additionally, subtype incidences can be preserved in
the training and test set, to prevent, for instance, the training set
from containing all samples with a rare disease subtype.
Step 3: Determine number of folds for the cross-validation. Usually,
this is ten, but it is highly dependent on the number of samples in
your dataset and the structure of your data.
Step 4: Determine classification algorithm. The accuracy of your
algorithm is highly dependent on the structure of your data. There
is no algorithm that excels at all problems. Most researchers try a
large number of different algorithms and determine the best on
the basis of prediction accuracy. Interpretability also plays a large
role, for which the continuous selection procedures are best
suited.
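Steps 2 and 3 of this outline can be sketched as follows: a random split that preserves subtype incidences, followed by the generation of cross-validation folds within the training set. This is a minimal illustration in plain Python; the function names, the 25 % test fraction, and the class labels are assumptions made for this example, and a class with a single sample would end up entirely in the test set.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train/test, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def k_fold_indices(indices, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation

# Hypothetical cohort: 16 samples of a common subtype and 4 of a rare one.
labels = ["common"] * 16 + ["rare"] * 4
train, test = stratified_split(labels, test_fraction=0.25)
# Cross-validation is run on the training set only; the test set stays untouched.
folds = list(k_fold_indices(train, k=5))
```

The gene selection and model fitting would take place inside each fold, and the test set would be used once, at the very end, to infer the prediction accuracy.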

3.8 Analyses of Genome-Wide Genotyping Data

In the following paragraphs important recommendations are given on
how to successfully accomplish the major study objectives when
analyzing SNP arrays, i.e., genome-wide association, loss of
heterozygosity, and copy number variation.

3.8.1 Genotyping

SNP microarrays grant the ability to infer the genotype of SNPs.
SNP arrays have been used en masse to determine genotypes for
large numbers of SNPs per individual (~900,000 for Affymetrix
SNP 6.0). After pre-processing and labeling the DNA, the scanner
determines the signal intensities of the probes that are tiled over
the SNP, as illustrated in Fig. 2. Based on the signal intensities the
researcher can then determine whether the SNP has the genotype
major/major (AA), major/minor (AB), or minor/minor (BB), or
whether it is a noCall (undetermined genotype).
These genotype calls can be used to perform Genome-Wide
Association (GWAS) analysis to infer if a particular genotype of a
SNP or a combination of SNPs is associated with the outcome,
e.g., disease. This in particular is very difficult due to issues of
statistical power. Using microarrays we are generating more than
900,000 observations. To infer if there is an association between a
SNP and the outcome we need to perform a statistical test. Since
we are performing this test on all these SNPs we will need to perform
multiple testing correction, as shown above. Because of the large
number of measured SNPs, this will result in strong p-value
corrections, which can leave no statistically significant association
between the SNP and the outcome. This is one of the reasons why
these studies need many samples.
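The size of the correction can be made concrete with a short calculation. The sketch below assumes a plain Bonferroni correction and a significance level of 0.05, both chosen here for illustration only.

```python
n_snps = 900_000   # approximate number of SNPs on the array (from the text)
alpha = 0.05       # assumed overall significance level

# Without correction, this many SNPs are expected to pass the threshold
# by chance alone if none of them is truly associated.
expected_false_positives = alpha * n_snps

# With a Bonferroni correction each SNP must reach this p-value.
per_snp_threshold = alpha / n_snps
```

A single SNP therefore has to reach a p-value of roughly 5.6 × 10^-8, which only strong effects or large cohorts can deliver.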
Genotyping of SNPs makes it also possible to infer Loss of
Heterozygosity (LOH). In cancer research this is of particular
interest since these regions may contain tumor suppressor genes
with equal or similar mutations on both alleles. In the case of
paired samples, e.g., normal–cancerous tissue or diagnostic–remis-
sion sample, this can easily be done by determining regions of
homozygous genotypes (AA or BB) in the case samples which are
not observed in the control sample. If no control sample is avail-
able the regions of LOH can be determined by hidden Markov
models (HMM).
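For paired samples the comparison can be sketched as follows. The code assumes genotype calls ordered by chromosomal position and reports runs of SNPs that are heterozygous (AB) in the control sample but homozygous (AA or BB) in the case sample; the function name and the minimum of three informative SNPs per region are hypothetical choices, and the HMM approach for unpaired samples is not covered.

```python
def loh_regions(control_calls, case_calls, min_informative=3):
    """Return (start, end) index pairs (inclusive) of candidate LOH regions.

    Only SNPs that are AB in the control are informative. SNPs that are
    homozygous or noCall in the control, or noCall in the case sample,
    neither extend nor interrupt a region.
    """
    regions = []
    start = last = None
    count = 0
    for i, (control, case) in enumerate(zip(control_calls, case_calls)):
        if control != "AB" or case == "noCall":
            continue  # not informative for LOH
        if case in ("AA", "BB"):  # heterozygosity lost in the case sample
            if start is None:
                start = i
            last = i
            count += 1
        else:  # heterozygosity retained: close any open run
            if start is not None and count >= min_informative:
                regions.append((start, last))
            start = last = None
            count = 0
    if start is not None and count >= min_informative:
        regions.append((start, last))
    return regions

# Hypothetical calls for ten consecutive SNPs on one chromosome.
control = ["AB", "AA", "AB", "AB", "BB", "AB", "AB", "AB", "AA", "AB"]
case    = ["AB", "AA", "AA", "AA", "BB", "BB", "AA", "AB", "AA", "BB"]
regions = loh_regions(control, case)
```

In this example SNPs 2 to 6 form one candidate region, while the single informative SNP at position 9 is too short a run to be reported.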
Outline of genotype inference:
Step 1: The Affymetrix software package, GCOS, contains
the algorithms to determine the genotype calls. Also, the stand-alone
software package Affymetrix-Power-Tools (APT) [28] has
the ability to determine the genotype calls per sample.
Step 2: Using these algorithms, it is best to determine the genotype
calls in a large batch of microarrays, as these algorithms
depend strongly on the distribution of signal intensities over all
samples to optimally determine the genotype calls.

3.8.2 Copy Number Analysis

One vital, but very difficult, step in SNP microarray analysis is
determining the copy number variations (CNVs). In particular
types of cancer, genomic regions that are recurrently deleted or
amplified could point towards tumor suppressor genes or oncogenes.
Determining CNVs is by far the most difficult aspect of
analyzing SNP arrays.
There are multiple algorithms to determine CNVs from SNP
arrays, such as CNAG [29] and dChip [16]. In this section we will
focus on determining the CNVs using dChip. It is imperative,
especially for determining CNVs, to have a correct experimental
design. To optimally determine the CNVs per sample there must
be an appropriate reference batch (e.g., multiple samples) from
which the signal distribution per SNP can be determined under the
assumption that the reference has a normal copy number, i.e., diploid.
In some cases one or more samples in the reference batch may not be
diploid for a particular SNP. Even then the algorithm can accurately
estimate the copy number of a sample, as long as this does not
occur in too large a proportion of the reference samples.
Additionally, these reference samples should not be selected
from an online database. Sources of variation, such as different
methods of DNA isolation, and batch effects will result in very poor
estimates of CNVs. The most appropriate way to generate your
reference batch is to run some SNP arrays for normal karyotype,
remission, or healthy samples. It is most effective to also have
a 1:1 male–female ratio in the reference batch to accurately estimate
CNVs for the X and Y chromosomes.
The outline for CNV estimation in dChip [16]:
Step 1: Construct an appropriate experimental design. Determine
your reference set on the basis of DNA quality, expected normal
copy number (remission or healthy samples), and similar DNA iso-
lation and pre-processing methods.
Step 2: Gather all necessary files to load your SNP microarrays into
dChip. Next to your .CEL files generated by the scanner, you will
need a CDF file (library file describing the array) from the
Affymetrix Web site (select your array type), and a patient info file
describing which sample belongs to the reference batch and
gender.
Step 3: When all samples are loaded one should check that none of the
samples contain irregularities. It is possible to visualize the signal
intensities of the array in dChip. This grants the ability to look for
artifacts, such as blobs or scratches, resulting in wrongly measured
signal intensities. If a reference sample shows these artifacts,
it should be removed from the reference batch.
Step 4: Normalize to a reference sample. Just like for gene expression
microarrays, the signal intensity distribution must be made
comparable. Optimally one should select a reference sample without
any artifacts on the array.
Step 5: Determine CNVs. After normalization, dChip grants the
ability to calculate the raw copy number per SNP. There is still
some considerable variation per probe set, which is the reason for
an additional smoothing step. This smoothing step takes the raw
copy numbers and calculates the copy number of a SNP by taking
into account the copy numbers of SNPs in its vicinity. For SNP 6.0
the number of neighboring SNPs is optimally set to 10.
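The idea behind Step 5 can be sketched outside dChip as follows. The code assumes that the raw copy number of a SNP is twice the ratio of the sample signal to the mean signal of a diploid reference batch, and that smoothing is a running median over a window of neighboring SNPs. Both are simplifying assumptions for illustration; dChip's own model and smoothing may differ, and the names and values below are hypothetical.

```python
from statistics import mean, median

def raw_copy_numbers(sample_signal, reference_signals):
    """Raw copy number per SNP: 2 x sample signal / mean reference signal."""
    raw = []
    for i, signal in enumerate(sample_signal):
        reference_mean = mean(ref[i] for ref in reference_signals)
        raw.append(2.0 * signal / reference_mean)
    return raw

def smooth(raw, window=10):
    """Running median over the SNP itself and window // 2 SNPs on each side."""
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        lo = max(0, i - half)
        hi = min(len(raw), i + half + 1)
        smoothed.append(median(raw[lo:hi]))
    return smoothed

# Hypothetical signals for 30 consecutive SNPs: a one-copy loss at SNPs 10-19
# and a single noisy measurement at SNP 5.
sample_signal = [100.0] * 10 + [50.0] * 10 + [100.0] * 10
sample_signal[5] = 300.0
reference_signals = [[100.0] * 30, [100.0] * 30]  # two diploid reference samples

raw = raw_copy_numbers(sample_signal, reference_signals)
smoothed = smooth(raw, window=10)
```

The isolated spike at SNP 5 disappears after smoothing, whereas the deletion spanning ten SNPs is retained.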

3.9 Integrated Analyses

Accurate analysis of comprehensive genome-wide SNP genotyping
and gene expression data sets is challenging for many researchers.
High-density genome-wide views of biological samples, using
high-throughput DNA mapping and mRNA gene expression
microarrays, facilitate the identification of recurrent molecular
lesions. The number of software packages that help the researcher
visualize SNP genotyping and mRNA gene expression data in
a combined view is still limited. This combined view grants the
power to discern if particular (recurrent) molecular lesions have an
effect on nearby genes. One package that can effectively
combine these two datasets is called SNPExpress [30]. This package
enables the researcher to combine the SNP array data (genotype
calls, CNVs) and gene expression data for multiple samples in one
comprehensive plot. Furthermore, this software package has the
ability to plot additional information such as gene location based
on RefSeq and has the ability to infer Loss of Heterozygosity using
a hidden Markov model. Using these visualizations the researchers
can easily infer if a particular CNV has an effect on gene expression,
as illustrated in Fig. 4. Furthermore, one has the ability to
zoom in on particular regions of interest to see at high resolution
if genetic aberrations have an effect on gene expression. This
software package is programmed in Java and hence is cross-platform,
i.e., it can be run on all operating systems as long as the Java
Runtime Environment is installed. It is memory-efficient and easy
to run. Finally, the input files can be transformed to binary files for
(fast) random access.
Outline of SNPExpress (example files on
http://www.planetmathematics.com/SNPExpress/):
Step 1: Generate a tab-delimited matrix file containing the genotypes.
The rows are the probe sets and the columns are the samples.
Step 2: Generate a tab-delimited matrix file containing the copy
numbers. Same format as the genotype file.
Step 3: Generate a tab-delimited matrix file containing the gene
expression levels. Same format as the genotype file.
Step 4: Download SNPExpress from given URL and download
additional annotation files from the same Web site.
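Writing such a tab-delimited matrix (Steps 1 to 3) can be sketched as follows. Only the layout stated above, probe sets on the rows and samples on the columns, is taken from the text; the header label `ProbeSet`, the sample names, the probe set identifiers, and the way genotypes are encoded are assumptions, so the example files on the SNPExpress Web site should be consulted for the exact format.

```python
import csv
import io

def write_matrix(handle, samples, rows):
    """Write a tab-delimited matrix: one row per probe set, one column per sample.

    rows maps a probe set identifier to a list of values (one per sample).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["ProbeSet"] + list(samples))  # assumed header layout
    for probe_set, values in rows.items():
        writer.writerow([probe_set] + list(values))

# Hypothetical sample names, probe set identifiers, and genotype calls.
samples = ["AML_001", "AML_002"]
genotypes = {
    "SNP_A-0000001": ["AA", "AB"],
    "SNP_A-0000002": ["BB", "noCall"],
}

# An in-memory buffer is used here; a real file opened with
# open(path, "w", newline="") can be passed in the same way.
buffer = io.StringIO()
write_matrix(buffer, samples, genotypes)
genotype_text = buffer.getvalue()
```

The same function can be reused for the copy number and gene expression matrices, since the text specifies the same format for all three files.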

Fig. 4 Integrated analyses using SNPExpress (available at: http://www.planetmathematics.com/SNPExpress/)
[30]. DNA mapping array data from the Affymetrix 250K NspI DNA mapping array was used to sequentially
align the genotypes and copy numbers of chromosome 7 of four AML samples. The copy numbers (n = 0, 1, 2,
3, 4) are shown for each individual patient by horizontal lines. Copy number n = 2 is depicted by a green line
(A). The SNP genotypes are sequentially aligned along the chromosome (AA: red; BB: yellow; AB: blue, noCall:
white). LOH is indicated by a thick magenta horizontal bar (A), gains (default n > 2.5) by a pink (Fig. 1c) and
losses (default n < 1.5) by a turquoise background (C). Gene expression levels are visualized as vertical white
bar at the chromosomal position of the gene-specific probe set. In the event that multiple probe sets span the
same region in the chromosome-wide view the vertical gene expression bars are red and proportional to the
highest expression value. The two upper samples clearly display a decreased copy number as was previously
shown by cytogenetics, i.e., a complete monosomy (sample 1) or a deletion of the q-arm of chromosome 7
(sample 2). The overall expression of the majority of genes in the displayed region is decreased in the samples
with chromosome 7 abnormalities. The chromosome selector (D; where 23 is the X chromosome), the mouse-
over function showing info of each SNP or probe set (E), full chromosome view (F), zoom function (G) gene
search function (H), the links to external databases (I), display CNVs (J), and export selected data (K) options
are indicated

4 Notes

1. The analyses described in this chapter are based on [19, 30].

2. Standardizing laboratory steps such as sample preparation,
labeling, and hybridization is important to minimize
the interpretation variability within and across microarray
experiments.

3. A major drawback in the generation of tumor-specific gene
expression profiles is the inevitable infiltration of normal cells
in solid tumors. AML, like other types of cancer, is a heterogeneous
group of diseases; however, cells can be relatively easily
harvested and a simple Ficoll procedure will result in high
percentages of malignant cells (generally above 80 %).
Accordingly, microarray-generated profiles of AML are
tumor-specific.
4. Cells can also be viably frozen in 70 % PBS, 20 % FCS and 10 %
DMSO in liquid nitrogen. Thaw these cells quickly at 37 °C
until some ice is left. This will result in the highest quality of
viable cells as well as RNA and DNA. Pellet cells by 5 min cen-
trifugation at 4 °C at 300 × g.
5. Keeping RNA extraction procedures consistent and making
sure that the samples yield good-quality RNA is of utmost
importance. The treatment of tissue before the extraction of
RNA is also important. Fresh frozen tissue must have been
handled consistently. Tissue that has not been immediately frozen
may have degradation of RNA species, as may samples that
have undergone freeze–thaw cycles. With CsCl centrifugation
highly purified RNA is isolated; however, small RNAs, such as
microRNAs, are lost. Besides CsCl centrifugation, RNA can
also be isolated following other methods, such as RNABee isolation
or using purification columns, all having advantages and
disadvantages. RNABee isolation is phenol-based, but includes
small RNAs, whereas column-based isolations generally give
lower yields.
6. 10 × 10^6 cells will yield 10 μg RNA. In case of AML, sufficient
mononuclear cells are harvested.
7. Determine the quality of the RNA, for instance with an
Agilent Bioanalyzer. Only use RNA with an RNA integrity number
(RIN) of seven or higher. Generally RNA isolated from
AML mononuclear cells has a high RIN value.
8. High-salt-based DNA isolation procedures result in
high-molecular-weight genomic DNA. However, column-based
procedures also result in high-quality DNA. Do not use different
procedures for isolation of genomic DNA in the same
experiment.
9. 1 × 10^6 cells will yield 8–10 μg DNA. In case of AML, sufficient
mononuclear cells are harvested.
10. Do not place the reaction on ice because of DNA precipitation
of the sample as a result of the spermidine.

References
1. http://www.affymetrix.com genome-wide expression patterns. Proc Natl
2. http://www.home.agilent.com/agilent/home Acad Sci USA 95(25):863–868
3. http://www.illumina.com 19. Valk PJM, Verhaak RGW, Beijen MA,
4. Millenaar FF et al (2006) How to decide? Erpelinck CAJ, Barjesteh van Waalwijk van
Different methods of calculating gene expres- Doorn-Khosrovani S, Boer JM, Beverloo HB,
sion from short oligonucleotide array data will Moorhouse MJ, van der Spek PJ, Löwenberg
give different results. BMC Bioinformatics B, Delwel R (2004) Prognostically useful
7:137 gene-expression profiles in acute myeloid leu-
kemia. N Engl J Med 350:1617–1628
5. Lipshutz RJ et al (1999) High density syn-
thetic oligonucleotide arrays. Nat Genet 21(1 20. Suzuki R, Shimodaira H (2006) Pvclust: an R
Suppl):20–24 package for assessing the uncertainty in hierar-
chical clustering. Bioinformatics 22(12):
6. Lipshutz RJ (2000) Applications of high- 40–42
density oligonucleotide arrays. Novartis Found
Symp 229:84–90, discussion 90–3 21. Smyth GK (2004) Linear models and empiri-
cal Bayes methods for assessing differential
7. Burnett A, Wetzler M, Lowenberg B (2011) expression in microarray experiments. Stat
Therapeutic advances in acute myeloid leuke- Appl Genet Mol Biol 3(1):Article 3
mia. J Clin Oncol 29(5):487–494
22. Holm S (1979) A simple sequentially rejective
8. Marcucci G, Haferlach T, Dohner H (2011) multiple test procedure. Scand J Stat
Molecular genetics of adult acute myeloid leu- 6(2):65–70
kemia: prognostic and therapeutic implica-
tions. J Clin Oncol 29(5):475–486 23. Benjamini Y, Hochberg Y (1995) Controlling
the false discovery rate: a practical and power-
9. Wouters BJ, Löwenberg B, Delwel R (2009) A ful approach to multiple testing. J R Stat Soc
decade of genome-wide gene expression pro- Ser 57(1):289–300
filing in acute myeloid leukemia: flashback and
prospects. Blood 113(2):291–298 24. Tusher VG, Tibshirani R, Chu G (2001)
Significant analysis of microarrays applied to
10. http://media.af fymetrix.com/suppor t/ the ionizing radiation response. Proc Natl
downloads/manuals/expression_analysis_ Acad Sci USA 98(9):5116–5121
technical_manual.pdf
25. Goeman JJ, van der Geer SA, de Kort F, van
11. http://media.af fymetrix.com/suppor t/ Houwelingen HC (2004) A global test for
downloads/manuals/genomewidesnp6_man- groups of genes: testing association with a clin-
ual.pdf ical outcome. Bioinformatics 20(1):93–99
12. Harmer SL, Kay SA (2000) Microarrays: 26. Tibshirani R (1996) Regression shrinkage and
determining the balance of cellular transcrip- selection via the lasso. J R Stat Soc Ser
tion. Plant Cell 12(5):613–616 58(1):267–288
13. WB van Leeuwen, C Vink (2009) Molecular 27. http://www.r-project.org/
Diagnostics—Techniques & Applications. IVA
Groep B.V. Rotterdam, the Netherlands. 28. http://www.affymetrix.com/partners_pro-
ISBN 978-90-6464-340-8 grams/programs/developer/tools/pow-
ertools.affx
14. Clarke JD, Zhu T (2006) Microarray analysis
of the transcriptome as a stepping stone 29. Nannya Y, Sanada M, Nakazaki K, Hosoya N,
towards understanding biological systems; Wang L, Hangaishi A, Kurokawa M, Chiba S,
practical considerations and perspectives. Plant Bailey DK, Kennedy GC, Ogawa S (2005) A
J 45(4):630–650 robust algorithm for copy number detection
using high-density oligonucleotide single
15. Wu Z, Irizarry RA, Gentleman R, Martinez- nucleotide polymorphism genotyping arrays.
Murillo F, Spencer F (2004) Model-based Cancer Res 65:6071–6079
background adjustment for oligonucleotide
expression arrays. J Am Stat Assoc 99:909 30. Sanders MA, Verhaak RGW, Geertsma-
Kleinekoort WM, Abbas S, Horsman S, van der
16. http://www.dchip.org Spek PJ, Löwenberg B, Valk PJM (2008)
17. http://www.omniviz.com SNPExpress: integrated visualization of genome-
18. Eisen MB, Spellman PT, Brown PO, Botstein wide genotypes, copy numbers and gene expres-
D (1998) Cluster analysis and display of sion levels. BMC Genomics 25(9):41
Chapter 11

Epigenetic Techniques in Pharmacogenetics

Sandra G. Heil

Abstract
Pharmacoepigenetics is an emerging field, which can be studied by several approaches. Addressing DNA
methylation status of drug-metabolizing enzymes and transporters (DMET) is challenging and might
provide answers in relation to interindividual differences in pharmacokinetics and pharmacodynamics.
Studying genetic variation in DMET genes in relation to drug response has been the main focus of phar-
macogenetics laboratories; it is, however, expected that epigenetic modifications will play a role in drug
responses as well. Some of the variations in drug-responses cannot be explained by genetic variation in
DMET genes. For those particular genes it might be interesting to examine the DNA methylation status
in relation to pharmacokinetics. In this chapter we discuss the methods available and provide a protocol to
quantify DNA methylation status of CpG sites in candidate genes, which can readily be applied in most
pharmacogenetics laboratories. In addition, we provide details about optimization and validation of the
method in terms of technical specificity, technical sensitivity, and precision.

Key words Pharmacoepigenetics, DNA methylation, Bisulfite, Real-time quantitative PCR

1 Introduction

Epigenetics was originally defined as “inheritable changes in gene
function that cannot be explained by changes in DNA sequence” [1].
Epigenetic regulation of gene expression is important during
development and disease progression but might also explain inter-
individual variations in drug response. The most studied epigenetic
mechanism is DNA methylation.
DNA methylation is essential for normal development and is
involved in genomic imprinting, X-chromosome inactivation and
carcinogenesis. DNA methylation involves the addition of a methyl
group from S-adenosylmethionine to cytosine in a CpG context by
DNA methyltransferases (DNMT). Between 70 and 80 % of the
CpG sites of the mammalian genome are methylated. Large stretches
of unmethylated CpG sites can be found in so-called CpG islands
located near promoter regions of many genes. Methylation of CpG
islands is associated with gene silencing, whereas methylation of
CpG sites in gene bodies has been described to be positively
correlated with transcription (for review see ref. [2]).

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_11, © Springer Science+Business Media, LLC 2013

Large differences in drug response are present between
individuals, which can for a considerable part be explained by
genetic variation in drug metabolizing enzymes and transporters
(DMET) (i.e., pharmacogenetics). Interestingly, recent studies
suggest that aberrant DNA methylation can also explain these
interindividual differences in drug response, a field which is
known as pharmacoepigenetics (for review see refs. 3, 4).
Pharmacoepigenetics is an emerging field that is of potential
interest for drug responses that are strongly regulated at the
transcript level and cannot be fully accounted for by genetic
variation [3, 4].
Several techniques can be applied to study DNA methylation
at a global, genome-wide, or gene-specific level (for review see
refs. 5–7). In the field of pharmacoepigenetics, gene-specific DNA
methylation of DMET genes might be of particular interest, and
determination of gene-specific methylation will therefore be
described in this chapter.

1.1 Techniques to Study DNA Methylation

Studying DNA methylation depends on the ability to discriminate
between methylated cytosines and unmethylated cytosines.
In the early 1980s, DNA methylation was studied by Southern
blotting using methylation-sensitive restriction endonucleases [8].
This method is relatively simple and does not require any special
instrumentation. However, large amounts of DNA are required and
analysis is limited to CpG sites present within restriction
recognition sites. This method has largely been replaced by
bisulfite modification combined with PCR-based methods [9].

1.1.1 Bisulfite Treatment

Treatment of genomic DNA with sodium bisulfite converts
unmethylated cytosines into uracil, leaving methylated cytosines
as cytosines. The bisulfite-converted genomic DNA is then
subjected to PCR, in which uracil residues are amplified as
thymine residues and methylated cytosines as cytosines, enabling
simple discrimination by detection techniques like Sanger
sequencing [9].
Bisulfite treatment can be performed by the initial protocol
described by Frommer et al. [9]. However, bisulfite modification
kits are also available that enable high-throughput bisulfite
modification and provide good-quality results (e.g., Zymo
Research). Importantly, a control reaction should be performed to
assess whether all unmethylated cytosines are read as thymines
after PCR. Inefficient bisulfite modification might result in
false-positive methylation calls due to incomplete conversion of
unmethylated cytosines into uracils.
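
The conversion logic described above can be sketched in a few lines of Python. This is an illustrative in silico model only and not part of the original protocol: the function names, the example sequence, and the convention of passing methylated cytosines as 0-based positions are assumptions made for the sketch. Only the top strand, as read after PCR, is modelled.

```python
def cpg_positions(seq):
    """Return 0-based indices of cytosines that sit in a CpG context."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Top-strand sequence as read after bisulfite treatment and PCR.

    methylated_positions: 0-based indices of methylated cytosines
    (a hypothetical input convention chosen for this sketch).
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")   # unmethylated C -> U (bisulfite) -> T (PCR)
        else:
            converted.append(base)  # methylated C and other bases are unchanged
    return "".join(converted)


example = "ACGTCCGATCGA"  # hypothetical sequence with three CpG sites
fully_methylated = bisulfite_convert(example, set(cpg_positions(example)))
unmethylated = bisulfite_convert(example)
print(fully_methylated)  # cytosines in CpG context retained
print(unmethylated)      # every cytosine read as thymine
```

Comparing the two outputs shows which positions distinguish methylated from unmethylated templates after conversion, and any cytosine remaining outside a CpG context in real data would point to incomplete conversion.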

1.1.2 PCR Based Detection Methods

Several PCR-based methods to determine gene-specific DNA
methylation status have been published [7], which will be
discussed in relation to pharmacoepigenetics. The gold standard is
PCR-based methylation detection by bisulfite sequencing, which
is quantitative but labor-intensive when combined with cloning of
individual alleles [9]. Recent technologies enable detection of
bisulfite-modified DNA by methods like real-time quantitative
PCR (qPCR), pyrosequencing, and mass spectrometry (Fig. 1)
[10–12]. These methods can roughly be divided into sensitive and
quantitative methods [5]. Sensitive methods enable detection of
methylated alleles in the presence of a large amount of
unmethylated alleles without quantifying them (e.g., MethyLight),
whereas quantitative detection methods enable (relative)
quantification of the methylation status of (single) CpG sites
(e.g., cloning in combination with Sanger sequencing,
Pyrosequencing®, and epiTYPER™) [5]. The choice of technique is
largely dependent on the main research question. In the field of
pharmacoepigenetics, in which the main question is to find
epigenetic changes that contribute to pharmacokinetics, assessing
the methylation status of specific CpG sites in Drug Metabolizing
Enzyme and Transporter (DMET) genes by a quantitative approach
will be the preferred method.

Fig. 1 Detection of DNA methylation by generally applied PCR-based quantitative methods. Genomic DNA is
treated with bisulfite, which modifies unmethylated cytosines into uracil, leaving methylated cytosines in a CpG
context unchanged. After PCR the uracils will be amplified as thymines. Several quantitative detection methods
can be applied to discriminate between cytosines and thymines (e.g., cloning followed by Sanger sequencing,
sequencing-by-synthesis based approaches such as Pyrosequencing®, mass spectrometry by epiTYPER™,
and real-time quantitative PCR based approaches such as MethyLight)

Most pharmacogenetics laboratories have instruments avail-
able to analyze single-nucleotide polymorphisms (e.g., PCR,
real-time PCR) and these instruments can be readily applied for
methylation analysis. For that reason, a sodium-bisulfite qPCR
based method originally described by Laird and coworkers will be
discussed in detail [10]. This method enables accurate
quantification of the methylation status of multiple CpG sites in
a gene of interest (GOI).

2 Materials

2.1 Control DNA

Unmethylated and methylated human DNA from Zymo Research
(Cat. No. D5014, Zymo Research, BaseClear, Leiden, The
Netherlands) can be used as control DNA during bisulfite
treatment. This bisulfite-treated control DNA is subsequently used
as control DNA in the PCR reaction.

2.2 Bisulfite Modification

● Zymo EZ DNA methylation direct kit (Cat. No. D5021,
Zymo Research).
● Eppendorf microcentrifuge tubes, 1.5 and 2.0 mL.
● Absolute ethanol (Cat. No. 1.00983.2500, Merck).

2.3 Real-Time Quantitative PCR

● Taqman GTXpress Master Mix containing AmpliTaq Gold,
buffer, MgCl2, and dNTPs (Cat. No. N808-0249, Applied
Biosystems).
● Primers (10 pmol/L final concentration, Invitrogen,
Life Technologies, The Netherlands).
● FAM-labelled Taqman probes with Black Hole Quencher 1
(BHQ1) (10 pmol/L final concentration, Biolegio, Nijmegen,
The Netherlands).

3 Methods

3.1 DNA Isolation

High-quality genomic DNA is preferred for quantification of DNA
methylation. Isolation can be performed by several protocols as
long as the A260/280 ratio and the A260/230 ratio are around
1.8. DNA quality can be checked with a UV spectrophotometer,
for example the NanoDrop (NanoDrop, Thermo Scientific).

3.2 Bisulfite Modification

Control DNA (unmethylated and methylated) and sample DNA are
treated with bisulfite using the Zymo EZ DNA methylation direct
kit (Cat. No. D5021, Zymo Research). Reagents are prepared
according to the manufacturer’s instructions.

1. Add 20 μL of genomic DNA (25 ng/μL) to 130 μL of CT
conversion reagent and incubate in a PCR machine (PTC-
200, MJ Research) with heated lid using the following
protocol: 98 °C/8 min (denaturation), 64 °C/3.5 h (bisulfite
modification) and 4 °C (storage up to 20 h).
2. Place a Zymo-Spin™ IC column into a provided collection
tube and add 600 μL of M-binding buffer (use fume hood
when working with M-binding buffer).
3. Add the sample from step 1 to the Zymo column containing
the M-binding buffer and mix the sample by pipetting several
times.
4. Centrifuge the columns at full speed (>10,000 × g) for 30 s.
5. Place each Zymo-Spin™ IC column in a new collection tube
or 2 mL Eppendorf tube and throw away the collection tube
used in steps 2–4 (see Note 1).
6. Add 100 μL of M-wash buffer and centrifuge at full speed
(>10,000 × g) for 30 s.
7. Add 200 μL M-Desulfonation buffer to the column and incu-
bate for 20 min at room temperature.
8. Centrifuge the sample 30 s at full speed (>10,000 × g).
9. Add 200 μL of M-wash buffer to the column. Centrifuge at
full speed (>10,000 × g), discard the supernatant and add
another 200 μL of M-wash buffer.
10. Centrifuge at full speed (>10,000 × g) for 30 s and place the
Zymo-Spin™ IC column into a 1.5 mL Eppendorf microcen-
trifuge tube.
11. Add 50 μL of M-Elution buffer directly to the matrix and
centrifuge at full speed (>10,000 × g) for 30 s to elute the
DNA.
12. Calculate the DNA concentration applying the RNA-40 factor
on a UV-spectrophotometer like the NanoDrop (see Note 2).
13. Calculate the recovery of the bisulfite treatment by the
following formula:

$$\text{Recovery} = \frac{\text{Amount of DNA after bisulfite treatment (ng)}}{\text{Amount of DNA used in bisulfite treatment (ng)}} \times 100\,\%$$

Normally, recoveries of >80 % are obtained. If recoveries are
lower, repeat the bisulfite treatment.
14. Use DNA immediately for qPCR or store the DNA at −20 °C
for later use. It is recommended to use the DNA within a
month as bisulfite-treated DNA is quickly degraded.
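
As a minimal sketch of steps 12 and 13, the recovery can be computed from an A260 reading as shown below. The input amount (20 μL at 25 ng/μL), the 50 μL elution volume, and the RNA-40 factor are taken from the protocol and Note 2; the A260 reading itself is a hypothetical example value, the function names are illustrative, and the reading is assumed to be normalized to a 10 mm path length.

```python
RNA40_FACTOR_NG_PER_UL = 40.0  # 1 OD260 unit = 40 ng/uL bisulfite-treated DNA (Note 2)


def concentration_ng_per_ul(a260):
    """Concentration of bisulfite-treated DNA from a path-length-normalized A260."""
    return a260 * RNA40_FACTOR_NG_PER_UL


def recovery_percent(a260, elution_volume_ul=50.0,
                     input_volume_ul=20.0, input_conc_ng_per_ul=25.0):
    """Recovery (%) = DNA after bisulfite treatment / DNA used * 100."""
    amount_after_ng = concentration_ng_per_ul(a260) * elution_volume_ul
    amount_used_ng = input_volume_ul * input_conc_ng_per_ul
    return amount_after_ng / amount_used_ng * 100.0


a260_reading = 0.22  # hypothetical spectrophotometer reading
recovery = recovery_percent(a260_reading)
print(f"Recovery: {recovery:.1f} %")
if recovery <= 80.0:
    print("Recovery below the usual range; consider repeating the bisulfite treatment")
```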

3.3 Real-Time Quantitative PCR

qPCR can be used to quantify the amount of cytosines and thymines
in the GOI. This method was originally described by Laird and
coworkers as MethyLight [10, 13]. qPCR conditions should be
optimized for the GOI and for the reference gene. We provide a
general protocol to optimize the qPCR assay, including PCR
efficiency, validation, and calculation of the percentage
methylated reference (PMR) [13], but do not provide sequence-
specific information like primer and probe sequences, as this is
dependent upon the GOI.
1. Design primers based on the bisulfite-converted sequence of the
GOI and of the reference gene. We generally apply MethPrimer
software [14]. Choose primers for bisulfite sequencing to
obtain primers that do not contain CpG sites (see Note 3).
2. Design a probe specific for methylated cytosines within the
GOI. We generally apply Taqman probes labelled with FAM
and a Black Hole Quencher (BHQ-1).
3. Design a probe specific for a bisulfite-treated reference
sequence to control for the amount of input DNA (see Note 4).
We generally apply Taqman probes for beta-actin labelled with
FAM-BHQ1.
4. Optimize the PCR reaction using standard protocols and
chemicals using 1–5 μL of bisulfite-treated control DNA. Run
each reaction in triplicate.
5. After optimization and validation (see Subheadings 3.3.1–
3.3.3) run each sample in triplicate for each gene (monoplex).
Run standard curves in triplicate for GOI and reference gene
on each plate (see Note 5).

3.3.1 PCR Efficiency

1. Make a fivefold dilution series of the methylated control
DNA (e.g., we prefer a dilution series of undiluted and 5, 25,
125, 625, and 3,125 times diluted DNA).
2. Run the optimized PCR protocol for both the GOI and the
reference gene on the ABI Prism 7000 Sequence Detection
System.
3. Plot the threshold cycle (CT value) against the log of the
dilution factor and calculate the coefficient of determination
(R²) and slope using linear regression (e.g., Excel or
Analyse-it). PCR efficiency can be calculated by the following
formula [15] (see Note 6):

$$\text{PCR efficiency} = \left(10^{-1/\text{slope}} - 1\right) \times 100\,\%$$

If the PCR is not 100 % efficient, optimize the reaction
further by adapting the annealing temperature and magnesium
chloride concentration (3–5 mM). Otherwise, design new primers
and/or probes.
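
The regression and efficiency calculation above can be sketched as follows. The CT values are hypothetical example data and the function names are illustrative. To obtain the negative slope quoted in Note 6 (−3.32 for a 100 % efficient PCR), the sketch regresses CT on the log10 of the relative template amount (1/dilution factor); this reading of "log of the dilution factor" is an assumption.

```python
import math


def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def pcr_efficiency_percent(slope):
    """PCR efficiency (%) = (10^(-1/slope) - 1) * 100."""
    return (10.0 ** (-1.0 / slope) - 1.0) * 100.0


dilution_factors = [1, 5, 25, 125, 625, 3125]
log_relative_amount = [math.log10(1.0 / d) for d in dilution_factors]
mean_ct = [22.0, 24.3, 26.7, 29.0, 31.3, 33.6]  # hypothetical mean CT values

slope, intercept, r_squared = linear_fit(log_relative_amount, mean_ct)
efficiency = pcr_efficiency_percent(slope)
print(f"slope = {slope:.2f}, R2 = {r_squared:.4f}, efficiency = {efficiency:.1f} %")
```

The same fit can be run separately for the GOI and for the reference gene, since both standard curves are needed later for the PMR calculation.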

3.3.2 PCR Bias, Technical Specificity and Technical Sensitivity

1. Make a standard curve using mixtures of unmethylated
control DNA with increasing amounts of methylated control
DNA (0, 20, 40, 60, 80, and 100 %). Perform the bisulfite
modification protocol as described in Subheading 3.2.
2. Run the optimized protocol in triplicate for both the GOI and
reference gene.
3. Plot the mean threshold cycle (CT value) of the GOI against
the log of % of methylated DNA and calculate the regression
coefficient and slope for the GOI using linear regression (e.g.,
Excel or Analyse-it). A positive signal should be obtained for
the methylated control DNA whereas the unmethylated con-
trol DNA (0 % methylation) should not be amplified (CT > 40)
(i.e., technical specificity). In addition, the standard curve
should be linear with a PCR efficiency of 90–100 % indicating
that the method is able to discriminate between different lev-
els of DNA methylation (i.e., technical sensitivity).
4. Plot the mean threshold cycle (CT value) of the reference gene
against the log of % of methylated DNA using linear regres-
sion (e.g., Excel or Analyse-it). This should result in a straight
flat line with a slope of 0 as this reaction is independent of
methylation status. If the line is not straight a PCR bias due to
preferential amplification of the methylated allele might be
present and the PCR should be further optimized or new
primers/probe should be designed.
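
The two acceptance checks in steps 3 and 4 can be expressed as a short sketch. All CT values below are hypothetical, the function names are illustrative, and the slope tolerance of 0.3 cycles per log10 unit is an arbitrary assumption rather than a value from the protocol. Because log10(0) is undefined, the 0 % standard is left out of the regression and is only used for the specificity check.

```python
import math


def is_specific(ct_unmethylated, cutoff=40.0):
    """True if the 0 % methylated control gives no signal (None) or CT > cutoff."""
    return ct_unmethylated is None or ct_unmethylated > cutoff


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def reference_is_flat(percent_methylated, ref_ct, tolerance=0.3):
    """True if the reference-gene CT does not depend on % methylation."""
    pairs = [(math.log10(p), ct)
             for p, ct in zip(percent_methylated, ref_ct) if p > 0]
    xs = [p for p, _ in pairs]
    ys = [ct for _, ct in pairs]
    return abs(ols_slope(xs, ys)) <= tolerance


percent_methylated = [0, 20, 40, 60, 80, 100]
goi_ct = [None, 30.4, 29.4, 28.8, 28.4, 28.1]  # None = no amplification
ref_ct = [25.1, 25.0, 25.2, 25.1, 25.0, 25.1]

print("specific:", is_specific(goi_ct[0]))
print("reference flat:", reference_is_flat(percent_methylated, ref_ct))
```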

3.3.3 Precision

1. Perform repeated measurements (n = 5) of three
concentrations (e.g., 5, 25, and 125 times diluted DNA) of the
bisulfite-treated control DNA on 5 consecutive days to calculate
the within-run (i.e., repeatability) and between-day precision.
2. Perform two runs a day on 5 consecutive days to calculate the
between-run precision.
3. Calculate the mean with standard deviation (SD). Calculate the
coefficient of variation (CV%) for each precision and the total
precision (i.e., reproducibility) by the following formula:

$$\text{Total precision} = \sqrt{SD_{\text{between-day}}^{2} + SD_{\text{between-run}}^{2} + SD_{\text{within-run}}^{2}}$$
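
A small sketch of the precision calculation follows. It assumes that the total precision is the square root of the summed squared SDs (the usual way of combining independent variance components); the SD and mean values are hypothetical and the function names are illustrative.

```python
import math


def cv_percent(sd, mean):
    """Coefficient of variation (%) = SD / mean * 100."""
    return sd / mean * 100.0


def total_sd(sd_between_day, sd_between_run, sd_within_run):
    """Combine the three precision components (assumed independent)."""
    return math.sqrt(sd_between_day ** 2 + sd_between_run ** 2 + sd_within_run ** 2)


# Hypothetical SDs (in CT units) for one concentration of control DNA
sd_between_day, sd_between_run, sd_within_run = 0.3, 0.4, 1.2
mean_ct = 26.0

sd_total = total_sd(sd_between_day, sd_between_run, sd_within_run)
print(f"total SD = {sd_total:.2f}, total CV = {cv_percent(sd_total, mean_ct):.1f} %")
```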

3.3.4 Calculation of Percentage Methylated Reference

Relative quantification is applied to quantify the DNA methylation
status of the GOI. The most frequently applied methods are the
ΔΔCT method, which originates from the field of gene
quantification [16], and calculation of the percentage methylated
reference (PMR), which is most frequently applied in the field of
epigenetics [13]. Both methods are relative quantification
methods based on the same principle. In this chapter we will
describe the PMR method as originally described by the group of
Laird and coworkers [13].
1. Plot the mean threshold cycle (CT value) against the log of the
percentage methylation and calculate the coefficient of deter-
mination (R2) and the slope using linear regression (e.g., Excel
or Analyse-it). Two regression equations should be calculated;
one for the GOI and one for the reference gene.
2. Calculate the amount of (methylated) DNA from the mean CT
values using the regression equation of the GOI and the refer-
ence gene respectively. This value is further addressed as Value.
3. Calculate the PMR for each sample by the following formula
[13]:

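$$\text{PMR} = \frac{\text{Value}_{\text{sample}}[\text{GOI}] \,/\, \text{Value}_{\text{sample}}[\text{REF}]}{\text{Value}_{\text{control}}[\text{GOI}] \,/\, \text{Value}_{\text{control}}[\text{REF}]} \times 100\,\%$$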

In this formula, GOI indicates the gene of interest and REF
indicates the reference gene. Sample indicates the test sample
and control indicates the 100 % methylated control DNA
(see Note 7).
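
Steps 2 and 3 can be sketched as below. The sketch assumes that each standard curve has the form CT = slope × log10(Value) + intercept, so that Value is obtained by inverting the regression equation; the slopes, intercepts, and CT values are hypothetical and the function and variable names are illustrative.

```python
def value_from_ct(ct, slope, intercept):
    """Invert the standard curve CT = slope * log10(value) + intercept."""
    return 10.0 ** ((ct - intercept) / slope)


def pmr(sample_ct, control_ct, goi_curve, ref_curve):
    """Percentage methylated reference.

    sample_ct, control_ct: dicts with mean CT values under keys "GOI" and "REF".
    goi_curve, ref_curve: (slope, intercept) tuples of the two standard curves.
    """
    sample_ratio = (value_from_ct(sample_ct["GOI"], *goi_curve)
                    / value_from_ct(sample_ct["REF"], *ref_curve))
    control_ratio = (value_from_ct(control_ct["GOI"], *goi_curve)
                     / value_from_ct(control_ct["REF"], *ref_curve))
    return sample_ratio / control_ratio * 100.0


goi_curve = (-3.32, 35.0)  # hypothetical slope and intercept for the GOI
ref_curve = (-3.32, 30.0)  # hypothetical slope and intercept for the reference gene
control = {"GOI": 28.36, "REF": 23.36}  # 100 % methylated control DNA
sample = {"GOI": 31.68, "REF": 23.36}   # test sample with equal input DNA

print(f"PMR = {pmr(sample, control, goi_curve, ref_curve):.1f} %")
```

In this example the sample GOI signal appears one full slope unit (3.32 cycles) later than the control at identical reference CT values, which corresponds to a tenfold lower Value for the GOI.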

4 Notes

1. Use a new tube for each step and use 1.5 mL or 2.0 mL
Eppendorf tubes instead of the original collection tubes.
2. Bisulfite-treated DNA resembles the characteristics of RNA
(e.g., contains uracil and is single-stranded). For that reason
we apply the RNA-40 factor in UV-spectrophotometry to cal-
culate the concentration (i.e., 1 OD260 Unit = 40 ng/μL
bisulfite-treated DNA).
3. Universal primers in combination with a probe containing one
or multiple CpG sites result in quantification of one to five
CpG sites at once, depending on the sequence of the GOI.
A disadvantage of this approach is that quantification of meth-
ylation status of a single CpG is difficult to obtain due to the
presence of multiple CpG sites within the probe sequence.
Other methods such as (pyro)sequencing are available for
quantification of single CpG sites.
4. A frequently used reference gene for quantification of
methylation is beta-actin. However, as a reference gene is
applied to control for the amount of input DNA, any gene
can theoretically be chosen as long as multiple copies of the
gene have not been described. The primers and probe of the
reference gene do not contain CpG sites and thus are specific
for bisulfite-treated DNA independent of methylation status.
This reference gene can be applied in each qPCR assay to
correct for the amount of input DNA.
5. We prefer to run the dilution series and samples on one plate.
However, if more samples need to be measured several plates
can be used, depending on the total precision (see
Subheading 3.3.3). If more than one plate is used, we prefer
to include standard curves of the GOI and reference gene on
each plate.
6. An efficient PCR should result in a slope of −3.32 with a
coefficient of determination (R²) of >0.990. In general, we
obtain PCR efficiencies of 90–100 % with R² of >0.995.
7. PMR is based on relative quantification and therefore does
not provide information about the absolute amount of
methylation [13]. In case–control studies the PMR values of
patients and controls can be compared to each other by
calculating a mean value and performing statistics. In a
diagnostic perspective the PMR can be calculated and
interpreted as a relative measure in relation to established
reference values.

Acknowledgment

The expert technical assistance of Mr. Pieter Griffioen is gratefully
acknowledged.

References
1. Russo VEA, Martienssen RA, Riggs AD (1996) Epigenetic mechanisms of gene regulation. Cold Spring Harbor Press, Cold Spring Harbor, NY
2. Chen ZX, Riggs AD (2011) DNA methylation and demethylation in mammals. J Biol Chem 286:18347–18353
3. Gomez A, Ingelman-Sundberg M (2009) Pharmacoepigenetics: its role in interindividual differences in drug response. Clin Pharmacol Ther 85:426–430
4. Ingelman-Sundberg M, Sim SC, Gomez A, Rodriguez-Antona C (2007) Influence of cytochrome P450 polymorphisms on drug therapies: pharmacogenetic, pharmacoepigenetic and clinical aspects. Pharmacol Ther 116:496–526
5. Shen L, Waterland RA (2007) Methods of DNA methylation analysis. Curr Opin Clin Nutr Metab Care 10:576–581
6. Laird PW (2010) Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet 11:191–203
7. Kristensen LS, Hansen LL (2009) PCR-based methods for detecting single-locus DNA methylation biomarkers in cancer diagnostics, prognostics, and response to treatment. Clin Chem 55:1471–1483
8. Feinberg AP, Vogelstein B (1983) Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301:89–92
9. Frommer M et al (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci USA 89:1827–1831
10. Eads CA et al (2000) MethyLight: a high-throughput assay to measure DNA methylation. Nucleic Acids Res 28:E32
11. Colella S, Shen L, Baggerly KA, Issa JP, Krahe R (2003) Sensitive and quantitative universal Pyrosequencing methylation analysis of CpG sites. Biotechniques 35:146–150
12. Ehrich M et al (2005) Quantitative high-throughput analysis of DNA methylation patterns by base-specific cleavage and mass spectrometry. Proc Natl Acad Sci USA 102:15785–15790
13. Campan M, Weisenberger DJ, Trinh B, Laird PW (2009) MethyLight. Methods Mol Biol 507:325–337
14. http://www.urogene.org/methprimer/index1.html. Accessed 26 July 2011
15. http://www.gene-quantification.de/efficiency01.html#rebrikov. Accessed 26 July 2011
16. Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) method. Methods 25:402–408

Chapter 12

Plasmid Derived External Quality Controls for Genetic Testing

Tahar van der Straaten and Henk-Jan Guchelaar

Abstract
Since the human genome has been fully sequenced and single nucleotide polymorphisms (SNPs) have
proven to be abundant, many studies have associated SNPs with clinical response or even with disease.
For some diseases or drug treatments these associations are clear, so that genetic screening for such SNPs
or mutations is a standard procedure. For that reason, many different techniques have been developed for
fast and easy screening for such specific SNPs/mutations. For reliable screening, the use of controls with
known genotypes is indispensable. Plasmids are an ideal tool for making controls which can serve as an
inexhaustible source, making new validation superfluous.
In this chapter we describe how plasmid controls can be made using DNA with a heterozygous
genotype, and also from DNA of which only one allele is available.

Key words Genotyping, Controls, Plasmids, Mutagenesis

1 Introduction

For routine healthcare purposes, there is a strong demand for
cost-effective, reliable, and easy-to-use genotyping tests for
SNPs/mutations that have proven to be clinically relevant.
In general, a patient’s genotype for a specific SNP/mutation can be
determined within 3 h after taking a blood sample (although saliva
is also suitable). Many such techniques make use of specific
probes that bind selectively to a certain genotype. Since these
probes are labeled with a genotype-specific fluorescent dye, the
genotype can be followed in real time. Although such techniques
are easy to perform, analysis might be a problem when not all three
possible genotypes are present. Automatic genotype calling might
be wrong in case of one or two clusters. For such analysis, using all
possible genotype controls is necessary. Another technique that is
easy to use is based on differences in melting point between two
genotypes in a small PCR product. These melting peaks can be
distinguished, but which peak belongs to which genotype cannot
be predicted and should be established by another method, unless
controls with known genotypes are included. As discussed above,
controls with known genotypes are a prerequisite for testing or
validating a new genotyping method. Whether this method is for
research only or for patient healthcare, for a single sample or a
large group of samples, the use of established controls contributes
to the reliability of the results.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_12, © Springer Science+Business Media, LLC 2013

We argue for the use of plasmid controls instead of linear DNA,
such as chromosomal DNA, for several reasons. Once a plasmid
has been created, it is an inexhaustible source for that specific con-
trol since picograms of DNA are enough to retransform it into
E. coli and get as much new plasmid DNA as desired. Therefore it
is easy to distribute this control to other laboratories. Since plasmid
DNA is circular, it is more stable than linear DNA because it
cannot be degraded by exonucleases. The gold standard for
determining the order of nucleotides is the chain termination
method [1], nowadays known as Sanger sequencing. Plasmids are
good templates for such reactions and many plasmids contain
sequences for commonly used primers (such as M13, SP6, T7, T3,
see Fig. 1). Sequencing of heterozygous DNA might be
misinterpreted when both alleles are not equally amplified by
PCR [2]. Plasmid vectors can only accept one PCR product
(allele) as insert, so in case of doubt, a suspicious
DNA sample can be cloned into a plasmid and the genotype can be
confirmed after sequencing several plasmids. Theoretically 50 % of
plasmids contain allele 1 and the other 50 % contain allele 2. But,
when one allele is less efficiently amplified this could be, for
example, 20–80 %. Nevertheless, sequencing ten samples should
statistically rule out heterozygosity when only one allele is found.

Fig. 1 pGEM-T Easy vector (3015 bp) from Promega. The map shows the T7 and SP6 promoter sites flanking the
multiple cloning site, the f1 ori, the Ampr and lacZ genes, and the ori. Within the Lac operon the multiple
cloning site is opened and a thymidine is added at the 3′-ends

Recently we discussed the use of controls in genetic testing and
argued for the use of plasmid-derived controls [3]. In 2010, we
published a review of genotyping methods [4]. In this chapter we
describe in more detail how such plasmid controls can be
established for SNPs/mutations (for indels see Note 5).
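
The statement above that sequencing ten plasmid clones should rule out heterozygosity can be examined with a simple calculation. The sketch below assumes that each sequenced clone independently carries the less efficiently amplified allele with a fixed probability (the "minor fraction"); this binomial model and the function names are illustrative assumptions and not part of the original protocol.

```python
def prob_only_one_allele_seen(minor_fraction, n_clones):
    """Chance that all n clones from a heterozygous sample carry the same allele."""
    major_fraction = 1.0 - minor_fraction
    return major_fraction ** n_clones + minor_fraction ** n_clones


def clones_needed(minor_fraction, max_risk):
    """Smallest number of clones for which that chance is at most max_risk."""
    if not 0.0 < minor_fraction <= 0.5:
        raise ValueError("minor_fraction must be in (0, 0.5]")
    if not 0.0 < max_risk < 1.0:
        raise ValueError("max_risk must be in (0, 1)")
    n = 1
    while prob_only_one_allele_seen(minor_fraction, n) > max_risk:
        n += 1
    return n


for fraction in (0.5, 0.2):
    risk = prob_only_one_allele_seen(fraction, 10)
    print(f"minor allele fraction {fraction:.0%}: "
          f"risk of missing heterozygosity with 10 clones = {risk:.3f}")
print("clones needed at 20-80 % for <=5 % risk:", clones_needed(0.2, 0.05))
```

Under this model, equal amplification makes it very unlikely that ten clones all carry one allele, whereas a 20–80 % skew leaves a noticeably larger residual risk, so sequencing a few more clones may be worthwhile when unequal amplification is suspected.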

1.1 Background Plasmids are self-replicating, extrachromosomal DNA molecules

found in almost all bacterial species. Plasmids carry genes for a
wide variety of functions such as resistance to antibiotics. Most
plasmids are double stranded circular DNA molecules and their
size varies between several kilobases to hundreds of kilobases.
Some plasmids transfer their DNA across bacterial species; some
only transfer their DNA into bacteria of the same species, whereas
others do not transfer their DNA at all. In the 1970s, these natu-
rally occurring plasmids, mainly derived from Escherichia coli
(E. coli), were used to create vectors that allow manipulation and
delivery of specific DNA sequences. All such created plasmid vec-
tors contain three common features: a replicator, a selectable
marker, and a multiple cloning site. The replicator contains the site
at which DNA replication starts. The selectable marker is usually a
gene encoding resistance to an antibiotic, which is used for
maintaining the plasmid in cells. The cloning site is a restriction
endonuclease cleavage site in which foreign DNA can be inserted
without interfering with the plasmid’s ability to replicate or with
antibiotic resistance. In some plasmids, the multiple cloning site is
located in the Lac operon which allows blue/white screening
[5, 6]. Commercial plasmids have been developed that are opened
in the multiple cloning site and where at both 3′-ends a thymidine
(T) is added (Fig. 1). These 3′-T overhangs at the insertion site
greatly improve the efficiency of ligation of a PCR product into
the plasmids [7, 8] since most DNA polymerases have the ability to
add an adenosine (A) at the 3′-ends of the amplified DNA [9].
After insertion of the PCR product into this so-called A-T vector,
this new plasmid is transformed into E. coli for multiplication.
Transformation of E. coli was first described in 1970 [10] and
improved by Dagert and Ehrlich in 1979 [11]. After special treat-
ments these E. coli obtain the ability to take up plasmids and there-
fore are called “competent cells.” These cells are commercially
available or can be prepared by standard methods [12].

2 Materials

A-T plasmids such as pGEM-T can be commercially obtained

from Promega (Leiden, The Netherlands), or pCR2.1 from
Invitrogen (Nieuwerkerk aan den IJssel, The Netherlands).
192 Tahar van der Straaten and Henk-Jan Guchelaar

Competent cells are also available from both companies. Primers in

this example are synthesized by Eurogentec (Maastricht, The
Netherlands). Many companies offer Taq polymerases with the capacity
to add adenosine at the 3′-ends; as an example we use Hotstar
Taq polymerase master mix from Qiagen (Venlo, The Netherlands)
(see Note 1).

2.1 Generation of Plasmid Controls

In order to establish plasmid controls that can be used for several
genotyping techniques, we suggest primers that are located about
500 nucleotides upstream and downstream of the SNP. As an example
we took DPYD gene rs3918290 (Fig. 2) to insert into Promega’s
pGEM-T Easy vector. Two approaches are described below: (1) using
a DNA sample that has previously been genotyped as heterozygous
(in this example the DNA sample was genotyped heterozygous for
rs3918290 by means of Taqman analysis (Lifetech, Nieuwerkerk aan
de IJssel, The Netherlands) and confirmed by pyrosequencing
(Qiagen, Venlo, The Netherlands)); and (2) inserting the mutation
at the SNP site when the minor allele frequency is low and only
one genotype is available.
Of note, for testing or validating genotyping methods it is advised
to use a standard control DNA panel of healthy volunteers. This can
be commercially available reference material (e.g., from Coriell
[13], Gentris [14], or the GeT-RM cell lines collected by the CDC
[15]) or, as in our case, a panel of 94 blood samples from blood
donors who gave informed consent for research use.

Fig. 2 DPYD exon 14 deletion. SNP rs3918290 (C/T) is shown in bold and capital. The forward and
reverse PCR primers are shown in bold and underlined

3 Methods

3.1 Generate Genotype Control Using Heterozygous DNA

1. Perform regular PCR as follows: in one reaction tube add
5 pmol of forward and reverse PCR primers, 10 μl Hotstar
mastermix, and 10 ng of chromosomal DNA, and add sterile water
to a total volume of 20 μl.
2. Run a standard PCR program as follows: 15 min at 95 °C; 30
cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 60 s;
followed by a final extension at 72 °C for 10 min (important
for the addition of adenosine at the 3′-ends).
3. Analyze PCR product by gel electrophoresis.
4. Insert the PCR product into pGEM-T easy vector as follows:
in one tube add 3 μl PCR product, 1 μl pGEM-T vector, 5 μl
ligase buffer, and 1 μl ligase. Incubate for at least 2 h at room
temperature.
5. Thaw competent E. coli cells and add them to the ligation mixture.
Follow the procedure described for these cells, which depends on
how they were prepared, for example heat shock or electroshock
(see Notes 2 and 4).
6. pGEM-T vector allows blue/white screening when comple-
mentary competent cells are used and substrate is added to the
growth plate. pGEM-T that has no insert will give blue colonies,
whereas an insertion will disturb the LacZ gene and yield
white colonies. Using heterozygous DNA, theoretically, 50 %
of (white) colonies will contain allele 1 and the other 50 %
allele 2 (Fig. 3). Grow four colonies and next day isolate plas-
mid DNA by standard methods [12] and check for genotype.
Usually, about 50 pg of plasmid in a PCR reaction is sufficient
for standard genotyping methods.
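
As a rough check on how many colonies to pick in step 6, the following sketch computes the chance that a set of white colonies contains at least one clone of each allele. It assumes that every colony independently carries allele 1 with a fixed probability (0.5 in the theoretical case); the function name and the skewed fraction of 0.2 are illustrative assumptions, not part of the protocol.

```python
def prob_both_alleles(n_colonies: int, allele1_fraction: float = 0.5) -> float:
    """Probability that n picked colonies include at least one clone of each allele.

    Assumes every colony independently carries allele 1 with probability
    allele1_fraction (0.5 is the theoretical value for heterozygous DNA).
    """
    if n_colonies < 1:
        raise ValueError("n_colonies must be at least 1")
    if not 0.0 <= allele1_fraction <= 1.0:
        raise ValueError("allele1_fraction must lie between 0 and 1")
    p = allele1_fraction
    q = 1.0 - p
    # Complement of "all colonies carry allele 1" or "all colonies carry allele 2".
    return 1.0 - p ** n_colonies - q ** n_colonies


if __name__ == "__main__":
    # Columns: colonies picked, even 50/50 split, hypothetical 20/80 skew.
    for n in (2, 4, 6, 10):
        print(n, round(prob_both_alleles(n), 3), round(prob_both_alleles(n, 0.2), 3))
```

With an even split, four colonies yield both alleles with probability 0.875, so picking a few extra colonies may be worthwhile if cloning turns out to be skewed toward one allele.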

3.2 Generate Genotype Control Using DNA of Which One Allele Is Available

Since only one allele is available, the other allele has to be created.
This can be done by PCR in which one primer (forward in this example)
overlaps the SNP/mutation and contains the mutant nucleotide at that
position (Fig. 4). The resulting PCR product will contain the mutation
and can be used as a primer with the original forward primer (Fig. 5).
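
The mutant primer described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the 0-based SNP index, and the default of 15 flanking nucleotides on each side are hypothetical choices (the flank length is borrowed from the 15–20 nucleotide range given in Note 5), and the template in the example is a toy sequence, not the DPYD sequence.

```python
def design_mutant_primer(sequence: str, snp_index: int, mutant_base: str,
                         flank: int = 15) -> str:
    """Return a forward primer that overlaps the SNP and carries the mutant base.

    sequence    : template strand, written 5' to 3'
    snp_index   : 0-based position of the SNP in sequence (assumed convention)
    mutant_base : nucleotide to place at the SNP position
    flank       : nucleotides kept on each side of the SNP (assumed default)
    """
    if len(mutant_base) != 1 or mutant_base.upper() not in "ACGT":
        raise ValueError("mutant_base must be one of A, C, G, T")
    start = snp_index - flank
    end = snp_index + flank + 1
    if start < 0 or end > len(sequence):
        raise ValueError("not enough flanking sequence around the SNP")
    seq = sequence.upper()
    # Upstream flank + mutant nucleotide + downstream flank.
    return seq[start:snp_index] + mutant_base.upper() + seq[snp_index + 1:end]


if __name__ == "__main__":
    toy_template = "A" * 15 + "C" + "G" * 15   # toy sequence with a C at the SNP
    print(design_mutant_primer(toy_template, snp_index=15, mutant_base="T"))
```

Placing the mismatch near the middle of the primer is a design choice made here for illustration; melting temperature and specificity would still need to be checked with the usual primer design tools.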
1. Perform regular PCR as follows: in one reaction tube add
5 pmol of PCR primers mutant forward and reverse, 10 μl
Hotstar mastermix, 10 ng of chromosomal DNA (add sterile
water to a total volume of 20 μl).
2. Run a standard PCR program as follows: 15 min at 95 °C; 30
cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 60 s;
followed by a final extension at 72 °C for 10 min.
3. Analyze PCR product by gel electrophoresis (see Note 3).
Fig. 3 Mixture of PCR products containing either a C or a T at the SNP site. Per plasmid only one PCR strand
can be inserted, thus either a C or a T

4. Use 1 μl of this PCR product as the reverse primer in combination
with the original forward primer as described in step 1.
5. Analyze PCR product by gel electrophoresis.
6. Insert the PCR product into pGEM-T easy vector as follows:
in one tube add 3 μl PCR product, 1 μl pGEM-T vector, 5 μl
ligase buffer, and 1 μl ligase. Incubate for at least 2 h at room
temperature.
7. Thaw competent E. coli cells and add them to the ligation mixture.
Follow the procedure described for these cells (see Note 4).
8. pGEM-T vector allows blue/white screening when complementary
competent cells are used and substrate is added to the
growth plate. pGEM-T that has no insert will give blue colonies,
whereas an insertion will disturb the LacZ gene and yield
white colonies. Grow a number of colonies and next day isolate
plasmid DNA by standard methods [12] and check for
genotype. Usually, about 50 pg of plasmid in a PCR reaction
is sufficient for standard genotyping methods.
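
To put the 50 pg figure in perspective, the sketch below converts a plasmid mass into an approximate number of plasmid copies. The assumptions are labeled in the code: an average mass of about 650 g/mol per base pair of double-stranded DNA, the 3015 bp vector size from Fig. 1, and a hypothetical insert of about 1,000 bp (primers roughly 500 nucleotides on either side of the SNP).

```python
AVOGADRO = 6.02214076e23       # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0   # assumed average for double-stranded DNA


def plasmid_copies(mass_pg: float, plasmid_bp: int) -> float:
    """Approximate number of plasmid molecules in a given mass of plasmid DNA."""
    if mass_pg < 0 or plasmid_bp <= 0:
        raise ValueError("mass must be non-negative and plasmid length positive")
    grams = mass_pg * 1e-12
    return grams * AVOGADRO / (plasmid_bp * GRAMS_PER_MOL_PER_BP)


if __name__ == "__main__":
    vector_bp = 3015   # pGEM-T Easy, from Fig. 1
    insert_bp = 1000   # assumption: about 500 nt on each side of the SNP
    copies = plasmid_copies(50.0, vector_bp + insert_bp)
    print(f"about {copies:.2e} plasmid copies in 50 pg")
```

Under these assumptions, 50 pg corresponds to roughly ten million template copies, which is consistent with the small amount of plasmid needed per reaction.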

[Figure 4 schematic, panels from top to bottom: primer design (mutant T primer opposite the template C/G pair); annealing and elongation; new strands; annealing and elongation; new strands after a number of cycles (T/A pairs at the SNP site)]

Fig. 4 Introduction of mutation in DNA. A primer is chosen with the nucleotide of interest at the SNP site. This
mutant primer will bind to the complementary strand and be elongated. This new strand, containing a thymidine at
the SNP site, is a template for the reverse primer in the next PCR cycle. The next cycle will use both new strands
to give a double-stranded PCR product with the mutation inserted

[Figure 5 schematic: the PCR product with the inserted mutation shown together with the forward PCR primer]

Fig. 5 PCR product with the inserted mutation will serve as a reverse primer in combination with the original used
forward primer, yielding a PCR product as was described for Fig. 4 but with a T instead of a C at the SNP site

4 Notes

1. When using an A-T plasmid be sure to use a polymerase that
adds an adenosine at the 3′-end of the PCR product. In particular,
enzymes with proofreading activity do not add this adenosine. In
that case it is possible to use a blunt-ended vector, into which
such blunt-ended PCR products can be inserted.
2. For blue/white screening, which is advised because of background
colonies, be sure to use competent cells with characteristics
such as F′ and LacIq, which are necessary for alpha
complementation. IPTG is necessary to activate the Lac
operon. If the wrong cells are used, or no IPTG is added, all
colonies will appear white.
3. Before ligation into A-T vector, check the PCR products on
gel. It should be one clear band of the expected size. If not,
optimize the PCR or purify the band from gel before ligation.
4. Competent cells can be made manually or bought commercially.
The way the plasmid enters the bacterial cell depends on
the way these E. coli cells are made competent. The most common
method is to wash the cells in CaCl2 solution and snap-freeze
them in liquid nitrogen. When thawed, these cells are incubated
with plasmid, which enters the bacterial cell (more efficiently
after a heat shock at 42 °C for 90 s). Bacteria containing
a plasmid will grow into colonies after plating on selective
medium and incubation at 37 °C. After heat shock and before
plating on selective medium, the number of colonies will
increase by a factor of 2–4 when the cells are recovered for
30–60 min in normal medium (no antibiotics). Instead of heat
shock, E. coli can also be prepared for electroshock, which is
often more efficient.
5. Instead of replacing one single nucleotide for another, the
same approach can be taken for creating controls with an inser-
tion or deletion. Choose 15–20 nucleotides directly upstream
and downstream of the insertion/deletion. The maximal
length of the primer can be up to 60 nucleotides, so if the
insertion/deletion is about ten nucleotides it is better to
increase the length of the primer.
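
The primer layout suggested in Note 5 can be sketched as follows. This is a hedged illustration only: the function names, the 0-based coordinates, and the default flank of 20 nucleotides are assumptions, and the sequences in the example are toy strings. The 60-nucleotide ceiling is taken from Note 5.

```python
MAX_PRIMER_LENGTH = 60  # upper limit mentioned in Note 5


def deletion_primer(sequence: str, del_start: int, del_length: int,
                    flank: int = 20) -> str:
    """Primer spanning a deletion: upstream flank joined directly to downstream flank."""
    del_end = del_start + del_length
    if del_length < 1 or del_start - flank < 0 or del_end + flank > len(sequence):
        raise ValueError("deletion or flanks fall outside the sequence")
    seq = sequence.upper()
    return seq[del_start - flank:del_start] + seq[del_end:del_end + flank]


def insertion_primer(sequence: str, ins_pos: int, inserted: str,
                     flank: int = 20) -> str:
    """Primer carrying an insertion between its upstream and downstream flanks."""
    if ins_pos - flank < 0 or ins_pos + flank > len(sequence):
        raise ValueError("flanks fall outside the sequence")
    seq = sequence.upper()
    primer = seq[ins_pos - flank:ins_pos] + inserted.upper() + seq[ins_pos:ins_pos + flank]
    if len(primer) > MAX_PRIMER_LENGTH:
        raise ValueError("primer exceeds 60 nucleotides; shorten the flanks")
    return primer


if __name__ == "__main__":
    wild_type = "A" * 20 + "CCCCC" + "G" * 20          # toy sequence
    print(deletion_primer(wild_type, del_start=20, del_length=5))
    print(insertion_primer("A" * 20 + "G" * 20, ins_pos=20, inserted="TTT"))
```

For an insertion, the inserted nucleotides add to the primer length, which is why the sketch checks the total against the 60-nucleotide limit.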

References

1. Sanger F, Coulson AR (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J Mol Biol 94:441–448
2. van der Heiden I, van der Werf M et al (2004) Sequencing: not always the “gold standard”. Clin Chem 50:248–249
3. van der Straaten T, Swen J et al (2008) Use of plasmid-derived external quality control samples in pharmacogenetic testing. Pharmacogenomics 9:1261–1266
4. van der Straaten T, van Schaik RH (2010) Genetic techniques for pharmacogenetic analyses. Curr Pharm Des 16:231–237
5. Ullmann A, Jacob F, Monod J (1967) Characterization by in vitro complementation of a peptide corresponding to an operator-proximal segment of the beta-galactosidase structural gene of Escherichia coli. J Mol Biol 24:339–343
6. Yanisch-Perron C, Vieira J, Messing J (1985) Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33:103–119
7. Mezei LM, Storts DR (1994) Purifications of PCR products. In: Griffin HG, Griffin AM (eds) PCR technology: current innovations. CRC, Boca Raton, p 21
8. Robles J, Doers M (1994) pGEM-T Vector Systems troubleshooting guide. Promega Notes 45:19–20
9. Clark JM (1988) Novel non-templated nucleotide addition reactions catalyzed by procaryotic and eucaryotic DNA polymerases. Nucleic Acids Res 16:9677–9686
10. Mandel M, Higa A (1970) Calcium-dependent bacteriophage DNA infection. J Mol Biol 53:159–162
11. Dagert M, Ehrlich SD (1979) Prolonged incubation in calcium chloride improves the competence of Escherichia coli cells. Gene 6:23–28
12. Ausubel FM (1997) Current protocols in molecular biology. Wiley, New York
13. http://ccr.coriell.org/Sections/Collections/CDC/?Ssld=16
14. http://www.gentris.com
15. http://www.cdc.gov/dls/genetics/rmmaterials/default.aspx
Part III

Functional Assessment of Genetic Variation:

In Vitro and In Vivo Methods
Chapter 13

Allelic Imbalance Assays to Quantify Allele-Specific Gene

Expression and Transcription Factor Binding
Francesca Luca and Anna Di Rienzo

Abstract
A growing number of noncoding variants are found to influence the susceptibility to common diseases and
interindividual variation in drug response. However, the mechanisms by which noncoding variation affects
cellular and clinical phenotypes remain to be elucidated. Allele-specific assays allow testing directly the dif-
ferential properties of the alleles at a regulatory variant, which are detected as an allelic imbalance. Two
widely used allelic imbalance assays target cDNA and DNA from chromatin immunoprecipitation (ChIP)
experiments, thereby revealing allele-specific gene expression and transcription factor binding,
respectively. The throughput of allelic imbalance assays ranges from a single variant to the genome scale,
the latter made possible by recent advances in genotyping and sequencing technologies (e.g., genome-wide
quantitative cDNA genotyping, ChIP-seq).

Key words Polymorphism, Chromatin immunoprecipitation, RNA, cDNA, Quantitative PCR, Gene
expression

1 Introduction

DNA polymorphisms in regulatory regions may account for a large

proportion of interindividual differences in common phenotypes.
Accordingly, a large number of noncoding SNPs have been associ-
ated with diseases in genome-wide association studies (e.g., [1]).
The functional relevance of regulatory polymorphisms has been
further confirmed by the increasing number of studies establishing
an association between genetic variation and cellular phenotypes
(e.g., mRNA levels, [2–7]).
Allelic imbalance assays allow the investigator to evaluate the
cis effect of a putative regulatory variant, by directly assessing the
effect of each allele at the site of interest or at a proxy SNP. The
power of these approaches relies on the fact that the two alleles at

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_13, © Springer Science+Business Media, LLC 2013


a site are compared within the same sample (a heterozygous

individual), therefore removing the confounding effect of environ-
mental or trans-acting factors.
These assays have been largely used to investigate allelic expres-
sion and allele-specific transcription factor binding. Ultimately, a
combination of both approaches is able to provide direct evidence
that variation at a given site results in different levels of gene
expression by altering, for example, the interaction between the
DNA and a transcription factor.
A cDNA/RNA allelic imbalance assay is based on the notion
that cis-acting regulatory polymorphisms cause differential expres-
sion between chromosomes in heterozygotes. This will result in
unequal representation of alleles at coding polymorphisms on the
same haplotype in the mRNA of individuals heterozygous for the
regulatory polymorphism.
One of the molecular mechanisms by which regulatory poly-
morphisms affect gene expression is through alterations of DNA–
protein binding affinity. This can be detected in a Chromatin
immunoprecipitation (ChIP) assay followed by allele specific quan-
tification of the ChIPed DNA (commonly known as HaploChIP).
ChIP assays allow in vivo analysis of DNA–protein interaction.
Proteins are cross-linked to the chromatin in living cells by formal-
dehyde treatment; the chromatin is then sheared and incubated
with an antibody specific for the protein of interest. Following the
immunoprecipitation, the DNA is purified and can be analyzed by
a variety of techniques including quantitative real-time PCR.
Both RNA/cDNA and ChIP allelic imbalance assays were
originally developed for single gene analyses [8, 9]. In the follow-
ing protocol, we will describe applications that include TaqMan
quantitative genotyping assays. However, alternative approaches to
quantifying allelic imbalance can be used (e.g., fluorescent dideoxy
terminator-based methods [10], MALDI-TOF-MS [11, 12]).
More recently, high-throughput genotyping and sequencing tech-
nologies have expanded the potential for allelic imbalance applica-
tions to the genome-wide scale [13].

2 Materials

2.1 Cell Culture

The following protocol uses lymphoblastoid cell lines (LCLs).
However, allelic imbalance assays can also be performed in other
cell lines as well as primary cells. Suggestions on how to modify the
protocol when using different cell types are provided throughout
the text.

1. LCLs from individuals carrying the heterozygous genotype at
the SNP(s) of interest (see Note 1).
2. RPMI 1640 (Gibco), supplemented with 15 % FBS and
0.1 % gentamicin.
ChIP and RNA/cDNA Allelic Imbalance 203

2.2 Preparation of the Sample for Allelic Imbalance (AI)

In the current protocol the Upstate (Millipore) ChIP Assay Kit
reagents are used; however, details to prepare the reagents are also
provided.

2.2.1 Preparation of the Sample for ChIP AI

1. 2 × 10⁶ LCLs in mid-log exponential phase.
2. Fresh 18.5 % formaldehyde: 0.925 g paraformaldehyde, 35 μl
1 M KOH, add water to a final volume of 5 ml.
3. Fresh 10× Glycine (1.25 M).
4. Protease Inhibitors (For 1 ml of buffer add 10 μl of PMSF and
1 μl Protease Inhibitors Cocktail).
5. Ice-cold PBS.
6. SDS Lysis Buffer: 1 % SDS, 10 mM EDTA, 50 mM Tris, pH 8.1.
7. Protease inhibitors.
8. Sonicator.
9. ChIP Dilution Buffer: 0.01 % SDS, 1.1 % Triton X-100, 2 mM
EDTA, 20 mM Tris–HCl, pH 8.1, 150 mM NaCl.
10. IgG.
11. Salmon Sperm DNA/Protein A Agarose-50 % Slurry.
12. ChIP-grade antibody specific for the protein of interest.
13. 5 M NaCl.
14. 0.5 M EDTA, pH 8.0.
15. 1 M Tris–HCl, pH 6.5.
16. Low-Salt Immune Complex Wash Buffer: 0.1 % SDS, 1 %
Triton X-100, 2 mM EDTA, 20 mM Tris–HCl, pH 8.1,
150 mM NaCl.
17. High-Salt Immune Complex Wash Buffer: 0.1 % SDS, 1 %
Triton X-100, 2 mM EDTA, 20 mM Tris–HCl, pH 8.1,
500 mM NaCl.
18. LiCl Immune Complex Wash Buffer: 0.25 M LiCl, 1 %
IGEPAL-CA630, 1 % deoxycholic acid (sodium salt), 1 mM
EDTA, 10 mM Tris, pH 8.1.
19. 1× TE: 10 mM Tris–HCl, 1 mM EDTA, pH 8.0.
20. Freshly prepared Elution Buffer: 1 % SDS, 0.1 M NaHCO3.
21. RNase.
22. Qiagen PCR purification kit.
23. 10 μM primers specific to a negative control region (i.e., a
region known not to bind the transcription factor).
24. 10 μM primers specific to a positive control region (i.e., a
region known to bind the transcription factor).
25. SYBR® Green Master Mix (Applied Biosystems, or any other
company).

2.2.2 Preparation of the Sample for RNA/cDNA AI

1. LCLs in mid-log exponential phase (approx. total RNA yield:
3 μg/10⁶ cells).
2. Qiagen RNeasy Plus mini kit.
3. High-Capacity cDNA Reverse Transcription Kit from Applied
Biosystems.

2.3 AI Assay

1. Taqman® Universal PCR Master Mix with no AmpErase® UNG.
2. Taqman® 40× SNP Genotyping Assay.

3 Methods

Allelic imbalance assays are performed on cells from individuals

heterozygous at the site of interest (ChIP AI assays) or at a coding
site in linkage disequilibrium with the site of interest (RNA/cDNA
AI assays). Here we describe the protocols for these two assays,
starting from cell cultures. Different methods have been developed
to assay allelic imbalance, both at the single gene and at the
genome-wide level. The method we describe uses quantitative
TaqMan genotyping, but could be substituted by other methods
depending on the resources available to each investigator.

3.1 Cell Culture

This method describes the protocol for LCLs grown in suspension;
however, other cell types can be used (see Note 2). LCLs are seeded
at 0.5 × 10⁶ and, once in mid-log exponential phase, may be stimulated
according to experimental design (for example with dexamethasone)
(see Note 3) prior to harvesting.

3.2 Preparation of the Sample for AI

Alternative ChIP protocols are successfully used in other laboratories
and can replace the one described here.

3.2.1 Preparation of the Sample for ChIP AI

Formaldehyde Cross-linking

1. In the tissue culture hood, add formaldehyde to culture flasks
(final concentration of 1 %). Swirl and incubate for 20 min at
37 °C in the tissue culture incubator (see Note 4).
2. Add 1.25 M (10×) Glycine to the flasks, to a final concentra-
tion of 0.125 M (1×). Glycine quenches the cross-linking
reaction. Swirl and incubate at room temperature for 5 min.
3. Meanwhile, add protease inhibitors to PBS (need about 20 ml
PBS per 75 cm² flask) and cool the centrifuge to 4 °C.
4. Collect the cells by centrifugation at 290 × g for 7 min at 4 °C.
5. Wash cells twice with 10 ml of ice-cold PBS containing prote-
ase inhibitors.
6. Cell pellet can be stored at −80 °C.
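
The volumes needed in steps 1 and 2 follow from a simple dilution calculation, sketched below. The function name and the 20 ml culture volume are hypothetical; the stock and final concentrations (18.5 % and 1 % formaldehyde, 1.25 M and 0.125 M glycine) are those given in the protocol. The calculation assumes that the added stock itself contributes to the final volume.

```python
def stock_volume_to_add(current_volume_ml: float, stock_conc: float,
                        final_conc: float) -> float:
    """Volume of stock to add to reach final_conc in the combined volume.

    Solves stock_conc * v = final_conc * (current_volume_ml + v) for v.
    Both concentrations must be in the same units.
    """
    if current_volume_ml <= 0:
        raise ValueError("current volume must be positive")
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return current_volume_ml * final_conc / (stock_conc - final_conc)


if __name__ == "__main__":
    culture_ml = 20.0  # hypothetical culture volume
    formaldehyde_ml = stock_volume_to_add(culture_ml, stock_conc=18.5, final_conc=1.0)
    # Glycine is added after the formaldehyde, so the volume has already increased.
    glycine_ml = stock_volume_to_add(culture_ml + formaldehyde_ml,
                                     stock_conc=1.25, final_conc=0.125)
    print(f"18.5 % formaldehyde: {formaldehyde_ml:.2f} ml")
    print(f"1.25 M glycine:      {glycine_ml:.2f} ml")
```

For the hypothetical 20 ml culture this comes to roughly 1.14 ml of formaldehyde stock and 2.35 ml of glycine stock.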

Cell lysis and Sonication

7. (If continuing with the sonication on the same day) Add pro-
tease inhibitors to SDS Lysis Buffer at room temperature.
8. (If cell pellet was frozen) Thaw cell pellet on ice, take half of
the volume and collect the pellet at 700 × g at 4 °C for 5 min
(store at −80 °C the remaining aliquot).
9. Meanwhile, add protease inhibitors to SDS Lysis Buffer
10. Remove the supernatant and add 700 μl of the SDS Lysis
Buffer to the pellet, resuspend and split into two 2 ml tubes
with flat bottom and incubate for 10 min on ice.
11. Sonicate the samples in a QSonica S4000 (or a Q700)
Sonicator with a cup horn immersed in ice-cold water. The
sonication program includes 50 cycles (each cycle is 30 s ON
and 1 min OFF, amplitude set at 90) (see Note 5). To avoid
overheating of the sample, replace the water with fresh ice-cold
water every 10 min.
12. Pellet in tabletop centrifuge (14,000 × g) at 4 °C for 10 min,
place the supernatant into a new 1.7 ml tube.
13. The sonication should result in DNA fragments approx 200–
400 bp in size. Remove 20 μl (tester) in order to determine if
the appropriate fragment size has been obtained.
14. Store the majority of the supernatant at −80 °C until the soni-
cation has been checked.
15. To reverse the cross-links in the tester, add 2 μl 5 M NaCl and
incubate at 65 °C for 4 h or overnight.
16. Add 1 μl RNase to each tester and incubate at room tempera-
ture for 30 min.
17. Add 1 μl 0.5 M EDTA, 2 μl 1 M Tris–HCl, pH 6.5, and 1 μl
of 10 mg/ml Proteinase K to the testers and incubate for 1 h
at 45 °C.
18. Run the samples on a 1 % TBE gel at 80 V for 30 min. If the
DNA smear is within 200–400 bp size range proceed with the
ChIP assay protocol (see Note 6).
ChIP Assay
19. Add protease inhibitors to the ChIP Dilution Buffer.
20. Split each sample into two 1.7 ml tubes (one of the two sam-
ples can be used to obtain an IgG ChIP control). Dilute the
sonicated cell supernatant ~10-fold in ChIP Dilution Buffer
by adding 1.2 ml of the ChIP Dilution Buffer to the 150 μl
sonicated cell supernatant to a final volume of 1.35 ml.
21. Preclear, by adding 60 μl of Salmon Sperm DNA/Protein A
Agarose-50 % Slurry and 1 μg of IgG for 1 h at 4 °C with rota-
tion (see Note 7).

22. Pellet agarose (5,000 × g for 1 min or less at 4 °C) and place
supernatant in new tubes.
23. Remove 20 μl for the input control sample and store at −80 °C.
24. Add antibody (1–5 μg depending on antibody of choice) to
the supernatant and incubate overnight at 4 °C with rotation.
25. Add 45 μl Salmon Sperm DNA/Protein A Agarose Slurry for
1 h at 4 °C with rotation.
26. Pellet agarose (5,000 × g for 1 min at 4 °C). Aspirate out the
supernatant and wash for 3–5 min on a rotating platform with
1 ml of each of the buffers listed in the order given below:
(a) Low-Salt Immune Complex Wash Buffer, one wash
(b) High-Salt Immune Complex Wash Buffer, one wash
(c) LiCl Immune Complex Wash Buffer, one wash
(d) 1× TE, two washes
27. Meanwhile, prepare the Elution Buffer.
28. Elute by adding 250 μl elution buffer to the pelleted agarose/
antibody/protein complex. Vortex briefly and incubate at room
temperature for 15 min with rotation. Spin down the agarose,
and carefully transfer the supernatant fraction to another tube
and repeat elution. Combine eluates (total volume ~ 500 μl).
29. Thaw the input control samples.
30. Add 20 μl 5 M NaCl (2 μl for input) to the 500 μl eluates and
reverse cross-links by heating at 65 °C for 4 h or overnight (see
Note 8). One may store the sample at −20 °C and continue
the next day.
31. Add 1 μl RNase to each sample and incubate at room tempera-
ture for 30 min.
32. Add 10 μl 0.5 M EDTA (1 μl for input), 20 μl 1 M Tris–HCl
(2 μl for input), pH 6.5, and 2 μl of 10 mg/ml Proteinase K
(1 μl for input) to the combined eluates and incubate for 1 h
at 45 °C.
33. Recover DNA with the Qiagen PCR Purification Kit, follow-
ing the manufacturer’s instructions.
34. Before performing AI Assays on the ChIPed DNA, the quality
of the ChIP experiment should be assessed by performing
quantitative real time PCRs targeting a positive and a negative
control region.
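
One common way to summarize the quality check in step 34 is the percent-input calculation, sketched below. The protocol does not prescribe a particular calculation, so this is an assumption: the function names are hypothetical, the Ct values in the example are invented for illustration, and the input fraction of 20 μl out of 1.35 ml is an approximation derived from steps 20 and 23 that presumes the input and IP DNA are recovered in equal volumes. The amplification factor of 2 assumes perfect PCR efficiency.

```python
def percent_input(ct_ip: float, ct_input: float, input_fraction: float,
                  amplification_factor: float = 2.0) -> float:
    """Recovered DNA in the IP expressed as a percentage of the starting chromatin.

    input_fraction       : fraction of the chromatin set aside as input
    amplification_factor : fold increase per PCR cycle (2.0 = 100 % efficiency)
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    if amplification_factor <= 1.0:
        raise ValueError("amplification_factor must be greater than 1")
    return 100.0 * input_fraction * amplification_factor ** (ct_input - ct_ip)


def fold_enrichment(percent_positive: float, percent_negative: float) -> float:
    """Ratio of percent input at the positive and negative control regions."""
    return percent_positive / percent_negative


if __name__ == "__main__":
    fraction = 20.0 / 1350.0   # assumption: 20 ul input taken from 1.35 ml
    # Invented Ct values, for illustration only.
    positive = percent_input(ct_ip=27.0, ct_input=26.0, input_fraction=fraction)
    negative = percent_input(ct_ip=31.0, ct_input=26.0, input_fraction=fraction)
    print(f"positive region: {positive:.3f} % input")
    print(f"negative region: {negative:.3f} % input")
    print(f"fold enrichment: {fold_enrichment(positive, negative):.1f}")
```

A clearly higher percent input at the positive than at the negative control region would suggest that the immunoprecipitation worked; the threshold for "clearly" is left to the investigator.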

3.2.2 Preparation of the Sample for RNA/cDNA AI

1. Extract total RNA with the Qiagen RNeasy Plus mini kit
following the manufacturer’s protocol.

2. Synthesize cDNA from total RNA (100 ng) using the High-
Capacity cDNA Reverse Transcription Kit (Applied Biosystems,
Foster City, CA) according to the manufacturer’s protocol.
Dilute cDNA samples 1:30 to perform AI assays.

3.3 AI Assay

While ChIP AI assays directly target the candidate binding variant
of interest, cDNA AI assays are designed to target a coding SNP in
the gene that is differentially regulated by the two alleles at the
candidate regulatory variant.
1. The following criteria should be used to select a coding SNP
to be assayed in a cDNA AI assay:
(a) High linkage disequilibrium with the regulatory variant
(if phased genotype data including the regulatory SNP are
not available (see Note 9))
(b) High heterozygosity
(c) >40 bp away from exon boundary to allow designing
assays that will amplify both gDNA and cDNA thus
allowing for the use of a gDNA standard curve
2. In most cases, predesigned TaqMan genotyping assays are
available from Applied Biosystems. Alternatively, custom made
assays can be designed using the Custom TaqMan Assay
Design Tool.
3. Quantitative real-time PCR assays can be performed in either
96- or 384-well plates using any of the ABI systems (e.g., the
ABI PRISM 7900HT Sequence Detection System or the ABI
StepOnePlus™ Real-Time PCR System). Reactions are typi-
cally run in triplicates for each sample.
PCR mix for a sample run on a 96-well plate
Total volume: 20 μl
cDNA/ChIPed DNA: 4 μl
Taqman® Universal PCR Master Mix with no AmpErase®
UNG: 10 μl
Taqman® 40× SNP Genotyping Assay: 0.5 μl
Use Applied Biosystems standard recommended PCR cycling
conditions.
4. To account for differences between the two fluorochromes, a
standard curve should be built for each of the two alleles using
serial dilutions (see Note 10) of genomic DNA from an
individual heterozygous at the assayed SNP (Fig. 1). PCR
products are quantified for each allele separately in each
reaction and ratios between the two different alleles can be
calculated (Fig. 2). The results can then be averaged across
PCR replicates.
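
The quantification in step 4 can be sketched as follows: fit one standard curve per allele (Ct against log10 of input quantity), convert each measured Ct back to a quantity with the curve of the matching allele, and take the ratio. The function names are hypothetical, and the sample Ct values in the example are invented; the slopes and intercepts are the ones reported in Fig. 1. The efficiency formula, E = 10^(−1/slope) − 1, is the standard relation and reproduces the Eff% values shown in that figure.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(quantities: Sequence[float],
                       cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    if len(quantities) != len(cts) or len(cts) < 2:
        raise ValueError("need at least two (quantity, Ct) pairs")
    xs = [math.log10(q) for q in quantities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("quantities must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to get the quantity for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def efficiency_percent(slope: float) -> float:
    """PCR efficiency implied by the slope of the standard curve."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


def allelic_ratio(ct_allele1: float, ct_allele2: float,
                  curve1: Tuple[float, float], curve2: Tuple[float, float]) -> float:
    """Quantity of allele 1 divided by quantity of allele 2 in one reaction."""
    return quantity_from_ct(ct_allele1, *curve1) / quantity_from_ct(ct_allele2, *curve2)


if __name__ == "__main__":
    curve_g = (-4.397, 29.521)   # slope, intercept for allele G (Fig. 1)
    curve_a = (-4.376, 30.482)   # slope, intercept for allele A (Fig. 1)
    print(f"efficiency G: {efficiency_percent(curve_g[0]):.1f} %")
    # Invented Ct values for three PCR replicates of one sample.
    replicates = [(27.9, 29.6), (28.0, 29.5), (27.8, 29.7)]
    ratios = [allelic_ratio(g, a, curve_g, curve_a) for g, a in replicates]
    print(f"mean G/A ratio: {sum(ratios) / len(ratios):.2f}")
```

Using a separate curve for each allele is what corrects for the different behavior of the two fluorochromes; a ratio computed directly from raw Ct differences would not.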

[Figure 1 plot: standard curves of CT (y-axis, 24–37) against quantity (x-axis, log scale, 0.01–100) for alleles A and G, with the input DNA and IP DNA samples marked. Target G: slope −4.397, Y-intercept 29.521, R² 0.994, Eff% 68.828. Target A: slope −4.376, Y-intercept 30.482, R² 0.993, Eff% 69.244]

Fig. 1 Example of the results from a ChIP AI assay at a regulatory variant for the
SGK1 gene [14]. The samples are plotted over the standard curves built for the
two alleles separately. An imbalance in the ChIPed DNA can be observed

4 Notes

1. For ChIP AI assays, individuals should be heterozygous at the

candidate binding variant. For RNA/cDNA AI assays, the
assayed individuals should include heterozygotes at a coding
SNP in linkage disequilibrium with the candidate regulatory
variant (this coding SNP is directly assayed for allelic imbal-
ance in the cDNA) and both homozygotes and heterozygotes
at the candidate regulatory variant (a minimum of three individuals
in each of these genotype classes is required to perform
a t-test). AI assays can be relatively noisy, especially in genes
expressed at low levels; therefore, a large sample size is recommended
(e.g., >5). This can be achieved by either performing
replicates of the chromatin immunoprecipitation/RNA
extraction and PCR in the same individual or by assaying multiple
individuals with the same genotype. The latter option has
the advantage that it allows testing for robustness of the
observed AI in different genetic backgrounds.
2. This same protocol for formaldehyde cross-linking can be
applied also to other cell types growing in suspension. For
adherent cells, formaldehyde and glycine should be added

Fig. 2 Example of the results of a cDNA AI assay targeting a coding SNP in link-
age disequilibrium with an interaction eQTL at the LSG1 gene [15]. The assay
was performed on samples cultured in two different conditions (with and without
dexamethasone). In this assay, the natural log-ratio between the two different
alleles was calculated and quantile normalized in each treatment condition sep-
arately. Two PCR replicates were performed and the results were averaged.
A significant difference between heterozygotes and homozygotes at the candi-
date regulatory variant in the presence of dexamethasone was observed
(p = 8.38 × 10−5). In each box, the horizontal line represents the median and the
whiskers represent the first and third quartile

directly to the tissue culture dish. The medium should then be

aspirated and cells washed with PBS in the plate. Cells should
be harvested using a cell scraper and moved to a conical tube.
3. If the goal of the experiment is to compare allele-specific
expression/binding under different conditions, a balanced
study design should be employed. Specifically, each experi-
mental unit should be defined as the set of experimental con-
ditions assayed for each sample.
4. The time and concentration of treatment for the cross-linking
with formaldehyde should be optimized for the specific cell
type analyzed. The conditions described here have been suc-
cessfully used for MCF-10aMyc and LCLs [14].
5. Depending on the cell type, protein of interest and sonicator
available, sonication conditions should be optimized. In gen-
eral we suggest using a sonicator equipped with a water bath

to prevent the foaming generated by a sonicator equipped

with a probe. The use of a water bath also reduces inter-sample
variability in sonication size, which is a feature desired
when performing comparisons across treatment conditions or
between samples.
6. If the desired fragment length distribution has not been
achieved, it is possible to repeat the sonication. In our experi-
ence if the first sonication has generated a tight fragment dis-
tribution of a size larger than the desired one, 10–20 additional
cycles of sonication are enough to shift the fragment size dis-
tribution to the desired one. Repeated freezing–thawing cycles
of the pre-sonicated pellet are not advisable as they could
result in disruption of protein–DNA bonds.
7. The preclear step is optional. Its purpose is to remove proteins
that interact nonspecifically with the IP components.
8. In our experience, 4 h of incubation is enough for the tester,
while overnight incubation is required for the actual IP
samples.
9. If the regulatory variant and the coding SNP are in perfect
linkage disequilibrium, allelic imbalance can be detected as an
overrepresentation of the allele at the coding SNP that occurs
on the same chromosome as the allele at the regulatory SNP
that results in higher transcript levels. However, when the two
variants are not in perfect linkage disequilibrium, different
alleles at the coding SNP will be over-represented depending
on the haplotype phase in the samples examined. In this case,
an allelic imbalance can be detected by comparing the variance
of the allelic ratio at the coding SNP across individuals that are
heterozygous and homozygous for the regulatory SNP. This
test is less powerful than the one performed in the case of per-
fect linkage disequilibrium.
10. Standard curves should be built using eight serial dilutions
(1:2 for ChIP AI or 1:5 for cDNA AI), starting at
20–25 ng/μl.
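
The dilution scheme in Note 10 can be tabulated with a short sketch. This is only an illustration: the function name is hypothetical, and it assumes the undiluted stock counts as the first of the eight standards, which the note does not state explicitly.

```python
def dilution_series(start_ng_per_ul, fold, n_points=8):
    """Return the concentration (ng/ul) of each standard-curve point.

    Assumption: the undiluted stock is point 1, and every later point
    is the previous one diluted 1:fold (1:2 for ChIP AI, 1:5 for cDNA AI).
    """
    if start_ng_per_ul <= 0 or fold <= 1 or n_points < 1:
        raise ValueError("need a positive stock, fold > 1 and n_points >= 1")
    return [start_ng_per_ul / fold ** i for i in range(n_points)]


# Example: 1:2 series for ChIP AI starting at 20 ng/ul.
chip_series = dilution_series(20, 2)
# Example: 1:5 series for cDNA AI starting at 25 ng/ul.
cdna_series = dilution_series(25, 5)
```

With a 1:5 series the last standards fall several orders of magnitude below the stock, so it may be worth checking that they remain within the quantitative range of the assay.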

Acknowledgments

We thank Sonal Kashyap, Allison Richards, and Shaneen Baxter for
contributing to the optimization of these protocols and Joseph
Maranville for helpful advice. F.L. was supported by an AHA
postdoctoral fellowship (11POST5390005).

References

1. WTCCC (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447(7145):661–678
2. Morley M et al (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430(7001):743–747
3. Pickrell JK et al (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464(7289):768–772
4. Smirnov DA et al (2009) Genetic analysis of radiation-induced changes in human gene expression. Nature 459:587–591
5. Stranger BE et al (2005) Genome-wide associations of gene expression variation in humans. PLoS Genet 1(6):e78
6. Stranger BE et al (2007) Population genomics of human gene expression. Nat Genet 39(10):1217–1224
7. Montgomery SB et al (2010) Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464(7289):773–777
8. Yan H et al (2002) Allelic variation in human gene expression. Science 297(5584):1143
9. Knight JC et al (2003) In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat Genet 33(4):469–475
10. Matyas G et al (2002) Quantification of single nucleotide polymorphisms: a novel method that combines primer extension assay and capillary electrophoresis. Hum Mutat 19(1):58–68
11. Jurinke C et al (2002) Automated genotyping using the DNA MassArray technology. Methods Mol Biol 187:179–192
12. Braun A et al (1997) Detecting CFTR gene mutations by using primer oligo base extension and mass spectrometry. Clin Chem 43(7):1151–1158
13. Pastinen T (2010) Genome-wide allele-specific analysis: insights into regulatory variation. Nat Rev Genet 11(8):533–538
14. Luca F et al (2009) Adaptive variation regulates the expression of the human SGK1 gene in response to stress. PLoS Genet 5(5):e1000489
15. Maranville JC et al (2011) Interactions between glucocorticoid treatment and cis-regulatory polymorphisms contribute to cellular response phenotypes. PLoS Genet 7(7):e1002162
Chapter 14

SCAN: A Systems Biology Approach to Pharmacogenomic Discovery
Eric R. Gamazon, R. Stephanie Huang, and Nancy J. Cox

Abstract
Genome-wide association (GWA) studies have identified thousands of genetic variants that contribute to
disease and pharmacologic traits. More recently, high-throughput sequencing studies promise to provide
a more complete catalog of genetic variants with roles in human phenotypic variation. Yet, characterizing
the influence of functional variants on genes, RNAs, proteins, and ultimately disease or pharmacologic
traits is a critical challenge for a vast majority of the implicated susceptibility loci. Here we describe SCAN,
a bioinformatics resource we have developed to elucidate the functional consequences of genetic variants
identified by genome-wide scans. In particular, this public resource implements a systems biology approach
to pharmacogenomic discovery.

Key words eQTLs, Pharmacogenomics, Expression profiling, Transcriptome, SNP function, Genetic
variation

1 Introduction

Genome-wide association (GWA) studies have provided molecular
medicine and modern biology with datasets of increasing complexity,
in both scope and diversity, and have enabled high-throughput genomic
analyses of a broad spectrum of disease traits and pharmacologic
outcomes. The massive amounts of data that characterize such
studies generate an unprecedented volume of results whose explora-
tion is facilitated, and often only feasible, through the use of versatile
databases and computational tools. Indeed, our greater ability to
conduct high-throughput surveys of human genetic variation has
led to new computational challenges in the analysis and prioritiza-
tion of findings aimed at the identification and characterization of
genetic loci that predict phenotype.
Besides the deluge of data emerging from GWA studies, there is
perhaps a greater challenge facing GWA as an approach to complex
traits genetics. Although GWA studies have led to some notable

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_14, © Springer Science+Business Media, LLC 2013

successes in the identification of genetic variation influencing complex
traits [1], the primary aim of such studies—a comprehensive account
of the genetic etiology of disease risk or drug response—continues
to challenge our best efforts. It is widely appreciated that GWA find-
ings on their own do not necessarily enhance our ability to discern
the underlying biological function of genetic signals. Furthermore,
despite the growing list of reproducible GWA study results,
substantial heritability remains to be accounted for [2].
To meet some of these computational and analytic challenges,
our group has devoted substantial resources to developing a pub-
licly available genomic database, SCAN (http://www.scandb.org)
[3], which serves results of transcriptome studies in HapMap lym-
phoblastoid cell lines (LCLs), and includes information on com-
mon copy number variants as well. Our preliminary studies with
SCAN have shown that SNPs associated with complex traits are
more likely to be expression quantitative trait loci (eQTLs) than
minor allele-frequency-matched SNPs drawn from high-density
SNP genotyping platforms [4]. This finding has been robustly
observed across a wide range of definitions for trait-associated
SNPs and eQTLs, and across a broad range of human phenotypes
[1, 5]. SCAN provides a framework for utilizing the transcriptome
and implements a functional annotation pipeline to expand on
genetic association studies.
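
The comparison against minor-allele-frequency-matched SNPs described above can be sketched as a resampling test. This is a minimal illustration under stated assumptions, not the procedure used in [4]: the record keys (`maf`, `is_eqtl`), the ten equal-width MAF bins, and the function names are all hypothetical.

```python
import random


def maf_bin(maf, n_bins=10):
    """Map a minor allele frequency in [0, 0.5] to one of n_bins bins."""
    return min(int(maf * 2 * n_bins), n_bins - 1)


def eqtl_enrichment(trait_snps, background, n_draws=1000, seed=0):
    """Empirical enrichment of eQTLs among trait-associated SNPs.

    Each SNP is a dict with hypothetical keys 'maf' (float) and
    'is_eqtl' (bool). In every draw, one background SNP is sampled from
    the same MAF bin as each trait SNP; a KeyError means the background
    has no SNP in a required bin.
    """
    rng = random.Random(seed)
    by_bin = {}
    for snp in background:
        by_bin.setdefault(maf_bin(snp["maf"]), []).append(snp)

    observed = sum(s["is_eqtl"] for s in trait_snps)
    at_least = 0
    for _ in range(n_draws):
        drawn = sum(rng.choice(by_bin[maf_bin(s["maf"])])["is_eqtl"]
                    for s in trait_snps)
        if drawn >= observed:
            at_least += 1
    # Add-one correction keeps the empirical P-value above zero.
    return observed, (at_least + 1) / (n_draws + 1)
```

A small empirical P-value would indicate that the trait-associated set contains more eQTLs than MAF-matched random sets typically do.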
In this chapter, we describe the feature set of a bioinformatics
resource that has proven useful for studies in systems-based pharma-
cogenomics [6, 7]. We motivate SCAN’s particular variant annota-
tion approach to the prioritization of results from the flood of data
from GWA studies. Finally, we present various in silico experiments to
demonstrate the utility of the particular approach to the integration
of genomic datasets that has been implemented in SCAN.

2 Features

2.1 Genetic Variation: SNPs and CNVs

The cell lines from the International HapMap Project [8] have
been the most frequently used for in vitro studies of gene expression
and pharmacologic phenotypes [6, 9, 10]. The cell lines are
Epstein-Barr virus transformed lymphocytes derived from appar-
ently healthy individuals of different ancestry. The HapMap LCLs
have been extensively utilized as a model for pharmacogenomic
studies, most notably on cellular toxicities of anticancer agents.
HapMap samples in 30 CEU (Utah residents with ancestry from
Northern and Western Europe) trios and in 30 YRI (Yoruba in
Ibadan, Nigeria) trios comprised the initial genotype dataset in
SCAN. Studies that have been conducted on these samples have
demonstrated that gene expression and drug response phenotypes
such as cytotoxicity are heritable and include an appreciable genetic
component. Extensive genotypic data are available on these samples
from the International HapMap Project (http://www.hapmap.org)
and, more recently, the 1000 Genomes Project
(http://www.1000Genomes.org) [11]. The latter initiative promises to
enable explorations of the role of rare variants in complex traits
genetics by providing a more comprehensive catalog of human
genetic variation. SCAN is poised to contribute to the development
of methods for testing associations with complex traits, including
drug response [12], using data from sequencing studies.
The first release of SCAN utilized the HapMap SNPs as the
primary unit of analysis. With the recent release of an extensive
catalog of copy number variants (CNVs) assayed in the HapMap cell
lines from array-based technologies [13] and from population-based
sequencing studies [14], SCAN has been expanded to include
information on these genetic variants [15]. In a recent pharma-
cogenomic study of these CNVs, we have been able to show that,
for an array of functionally diverse chemotherapeutics, the top
CNV associations with cytotoxicity are independent of known
SNP associations and have thus not been interrogated by previous
SNP-based GWA studies of these drugs [16].

2.2 Integration of Functional Annotations of Genetic Variation

Given the proliferation of databases and resources for genomic
studies, we saw a need to consolidate publicly
available functional annotations for loci emerging from genome-wide
studies. We were particularly interested in developing novel
tools relevant to large-scale, whole-genome studies aimed at char-
acterizing susceptibility loci for complex traits. Functional classifi-
cation of variants is one of the primary challenges of such studies for
the purposes of prioritization for follow-up studies and of providing
mechanistic hypotheses for observed associations between SNPs
and disease or drug response. A database to assist in the identifica-
tion of the functional consequences or potential biological impact
of genetic variants from genome-wide scanning was a primary aim
in the development of SCAN.
A priori information on how genetic variation may impact
biological processes or molecular function may aid in the interpre-
tation of results from GWA studies. Indeed, GWA analyses
restricted to variants with increased probabilities of association may
be utilized to reduce the number of tests performed and increase
the power to detect susceptibility loci. Of course, many genetic
variants have no known functional effect on disease or therapeutic
outcome in humans. Furthermore, determining the influence of
genetic variation on genes, RNAs, or proteins, and ultimately disease
or drug response is a challenge of enormous complexity. SNPs
located in coding regions may be silent or synonymous, resulting in
no change in the gene product. A missense variant, in contrast, changes
an amino acid and may alter protein function. For example, a missense
polymorphism causes a change in amino acid from glutamic acid in
normal hemoglobin to valine in sickle cell hemoglobin; as a result, an
individual homozygous for the sickle cell allele develops sickle cell
anemia. On the other hand, an individual homozygous or
heterozygous for the sickle cell allele may
develop better resistance to malaria. A nonsense variant results in a
premature termination codon. This type of polymorphism is
responsible for at least some forms of such diseases as Duchenne
Muscular Dystrophy (characterized by a damaged dystrophin gene)
and Cystic Fibrosis (caused by mutations in the CFTR gene).
On the other hand, nonsense-mediated decay (NMD) provides a
cellular machinery for detecting nonsense mutations and prevent-
ing the expression of an aberrant protein. A frameshift interferes
with the triplet nature of gene expression and causes a change in
the reading frame of the codons (thus, a change in the resulting
translation). A frameshift mutation in NOD2 has been shown to be
associated with Crohn’s disease [17, 18]. NOD2 is an outstanding
candidate for inflammatory bowel disease (IBD), as tumor necrosis
factor signaling and nuclear factor (NF) κB activation in mononu-
clear cells are critical for IBD pathophysiology. Yet the frameshift in
NOD2 was found to be associated solely with Crohn’s disease, and
not with ulcerative colitis. SCAN annotates a coding variant with
information on how it exerts its functional effect on its host gene by
altering the gene product.
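
The coding-variant classes discussed above (synonymous, missense, nonsense) can be illustrated with a small sketch based on the standard genetic code. This is not SCAN's annotation code; the function name and the "stop-lost" label are introduced here for illustration, and frameshifts are out of scope because they cannot be represented as a single codon substitution.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon substitution by its effect on the protein."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # premature termination codon
    if ref_aa == "*":
        return "stop-lost"  # hypothetical label for a lost stop codon
    return "missense"


# Sickle cell substitution: GAG (glutamic acid) -> GTG (valine).
sickle = classify_coding_snp("GAG", "GTG")
```

Here `sickle` evaluates to `"missense"`, while a change such as CAG to TAG would be reported as `"nonsense"`.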
An SNP may have other effects on biological molecules.
A transcript may contain variation in a region that is not translated.
An SNP in a 5′ untranslated (5′ UTR) region may alter a binding
site for a protein, thus affecting mRNA stability; alternatively, it may
promote or inhibit the initiation of translation. A recent study has
shown that 5′ UTR SNPs in nuclear transcripts encoding both mito-
chondrial and secreted proteins may influence gene regulation at the
level of mRNA export [19]. Similarly, an SNP in a 3′ untranslated
(3′ UTR) region may affect the binding site of a microRNA
(miRNA), a posttranscriptional regulator that binds to the target
transcript to induce translational repression or gene silencing.
Dysregulation of miRNAs has been associated with higher tumor
proliferation in human epithelial ovarian cancer [20] and with the
etiologies of cardiovascular diseases [21] and of a psychiatric pheno-
type [22]. SCAN annotates noncoding SNPs with information on
location in the genome relative to the most proximal transcript.

2.3 Transcript Regulation

In contrast to conventional approaches to functional classification,
SCAN provides an annotation pipeline to characterize genetic variation
with high-throughput molecular phenotypes (e.g., gene
expression traits). GWA studies have discovered numerous repro-
ducible associations between common variants and complex human
phenotypes, but only a small proportion can be attributed to
protein-altering variants. Indeed a substantial proportion of the
discovered associations are from noncoding regions; thus, these
variants have been hypothesized to alter the expression levels of
one or more target genes.
Variation in gene expression is an important feature of human
phenotypic variation. In recent years, we have witnessed some trans-
formative advances in the assay of gene expression on a genome-wide
scale, which have enabled studies of transcript variation at an unprec-
edented resolution [23]. These technological advances are likely to
contribute further to our understanding of the patterns of modula-
tion of important pharmacokinetic and pharmacodynamic genes, the
characterization of pharmacogenetically relevant co-regulated genes,
and the elucidation of biological networks.
Studies conducted in LCLs have shown extensive differences
in the genomic regulation of gene expression within and among
ethnic populations [24]. Of enormous importance to the develop-
ment of SCAN is the availability of large-scale datasets on the
genetic regulation of gene expression. The assimilation of genomic
data and high-throughput gene expression data for the identifica-
tion of regulatory variation is an important approach to SNP func-
tional annotation implemented in SCAN. Studies in our group
[25, 26] and others [27, 28] have mapped gene expression varia-
tion to particular genomic loci known as expression quantitative
trait loci or eQTLs. SCAN uses summary results of SNP associa-
tions to transcriptional expression to functionally characterize
polymorphisms.

2.4 Multi-locus Linkage Disequilibrium

Genotype–phenotype correlations from GWA studies often span
many correlated variants across multiple genes. SCAN annotates an
SNP with the set of genes (located in a region spanning 500 kb,
centered at the SNP) that contain variants in linkage disequilibrium
(LD) with the given SNP. Furthermore, despite the enormous
advances in genotyping platforms utilized in GWA studies, it is likely
that the causative variants are not genotyped. SCAN provides a
framework for interpreting results from GWA studies by assessing,
for a given set of SNPs, the coverage of high-throughput genotyp-
ing platforms relative to a reference panel (see Fig. 1 for an example).
Multilocus LD, which is calculated using haplotype frequencies,
provides a way to estimate how much the genotyped variants
capture the available information at a locus; multilocus LD can then
be used in the choice of genotyping platform for a candidate gene.
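
As a simplified illustration of LD computed from haplotype frequencies, the sketch below covers only the pairwise case (D and r² for two biallelic loci), not the multilocus measure used in SCAN. It also shows the 500 kb window centered at a SNP. Function names are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at locus 1
    and allele B at locus 2; p_a and p_b are the allele frequencies.
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


def ld_window(snp_pos, span_bp=500_000):
    """Bounds of a region of span_bp centered at the SNP (1-based)."""
    half = span_bp // 2
    return max(1, snp_pos - half), snp_pos + half


# Two perfectly correlated loci: the A and B alleles always co-occur.
d, r2 = pairwise_ld(p_ab=0.5, p_a=0.5, p_b=0.5)
```

In this example `r2` is 1.0, meaning a genotyped variant at one locus fully captures the other; values near 0 would indicate poor coverage of an untyped variant.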

3 Implementation

SCAN provides a set of query tools (see Fig. 2) using the following
interfaces, which are all available in batch query mode:
1. An SNP Query that returns physical and functional annotation,
host and flanking genes, and genes whose expression levels are
predicted by the variant, at a user-specified P-value threshold.
2. A Gene Query that retrieves all variants within and up to a
user-specified distance (in kilobases) of the gene, maps the gene
to its genomic coordinates relative to the reference assembly
and returns the list of local (cis-) and distant (trans-acting)
regulators of the gene. The eQTLs located within 4 Mb of
a gene are classified as cis-acting; other eQTLs (including those
on other chromosomes) are defined as trans-acting. Clicking
on the gene symbol in the result output provides additional
annotation including nomenclature, gene type (e.g., protein-coding),
and whether the gene is expressed in the various
tissues (e.g., LCL).

Fig. 1 Platform coverage. SCAN enables the interpretation of results from GWA studies by assessing, for a
given set of SNPs, the coverage of high-throughput genotyping platforms relative to a reference panel

3. A Genomic Region Query that returns the list of variants in
the specified genomic region, the list of all genes located within
the region and all genes predicted to be regulated by the SNPs
within the region, at a user-specified P-value threshold.
4. A CNV Query that returns the CNV’s genomic coordinates,
the copy number genotype, genes overlapping the CNV as
well as flanking genes, and genes predicted to be regulated by
the CNV at a user-specified P-value threshold.
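
The cis/trans rule stated for the Gene Query can be sketched as follows. The text gives the 4 Mb cutoff but not how distance is measured, so this sketch assumes distance to the nearest gene boundary; the function name is hypothetical.

```python
CIS_WINDOW_BP = 4_000_000  # cutoff stated in the text


def classify_eqtl(var_chrom, var_pos, gene_chrom, gene_start, gene_end,
                  window=CIS_WINDOW_BP):
    """Label an eQTL as 'cis' or 'trans' relative to its target gene.

    Assumption: distance is measured from the variant to the nearest
    gene boundary, and variants inside the gene have distance 0.
    """
    if var_chrom != gene_chrom:
        return "trans"
    if var_pos < gene_start:
        distance = gene_start - var_pos
    elif var_pos > gene_end:
        distance = var_pos - gene_end
    else:
        distance = 0
    return "cis" if distance <= window else "trans"


# Gene coordinates used in Subheading 4.3 (NEGR1, chromosome 1).
label = classify_eqtl("1", 71_000_000, "1", 71_641_212, 72_520_864)
```

A variant on another chromosome is always labeled trans, whatever its coordinate.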
For each query, the strength of the association between the
eQTL (either cis- or trans-acting) and the target gene is provided.

Fig. 2 Functional annotation. SCAN provides a set of interfaces for annotating SNPs, genes, genomic regions
and CNVs with high-throughput molecular phenotypes (e.g., gene expression)

SCAN reports unadjusted P-values since the appropriate multiple
testing correction method may vary with the study context (in particular,
the number of statistical tests performed in a given study).
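
Because the reported P-values are unadjusted, a user has to apply a correction suited to the study. The sketch below uses a Bonferroni correction purely as one common choice; it is not a method prescribed by SCAN, and the number of tests is a hypothetical value.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if an unadjusted P-value survives the correction."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Hypothetical study with one million tests at a family-wise alpha of 0.05.
N_TESTS = 1_000_000
strong = is_significant(4e-10, 0.05, N_TESTS)
weak = is_significant(3e-6, 0.05, N_TESTS)
```

Restricting an analysis to fewer candidate variants lowers `n_tests` and therefore relaxes the per-test threshold, which is the power gain mentioned in Subheading 2.2.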

4 Applications

In this section, we illustrate the use of SCAN by conducting several
in silico experiments aimed at exploiting the functional annotation
system. We pursue particular research questions (to extend the
results of published studies) and, in the process, define the protocols
used to conduct these in silico experiments.

4.1 eQTLs for Known Asthma Susceptibility Gene

A GWA study of childhood onset asthma [29] implicated ORMDL3
as a susceptibility gene. Multiple SNPs located on chromosome
17q21 have been found to be strongly and reproducibly associated
with risk of disease [30]. ORMDL3 encodes a transmembrane
protein anchored in the endoplasmic reticulum.
In particular, genetic variants regulating ORMDL3 expression in cis
were hypothesized to be determinants of disease susceptibility
[29]. We sought to identify regulators of this gene in LCLs to
validate the initial findings.
Protocol:
(a) In SCAN’s Gene Query tool, enter the gene of interest
(ORMDL3). SCAN queries can also be done in batch mode by
entering or uploading a list, but we restrict the present analysis
to this particular gene.
(b) Select “include gene start, end, and chromosome” option.
(c) Select “include SNPs that predict expression with p-value less
than” and enter 0.0001 as SNP eQTL p-value threshold.
(d) Select “include CNVs that predict expression with p-value less
than” and enter 0.01 as CNV eQTL p-value threshold.
(e) Select “Restrict to eQTLs on the same chromosome”.
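
The selections in steps (c) to (e) amount to a filter on eQTL results. The sketch below applies the same thresholds to a locally held table; the record layout, identifiers, and P-values are invented for illustration and do not reflect SCAN's output format.

```python
# Thresholds from steps (c) and (d) of the protocol.
THRESHOLDS = {"SNP": 1e-4, "CNV": 1e-2}

# Invented example records (not real SCAN results).
EXAMPLE_RECORDS = [
    {"id": "snp_a", "type": "SNP", "chrom": "17", "p": 4e-10},
    {"id": "snp_b", "type": "SNP", "chrom": "17", "p": 5e-3},
    {"id": "cnv_a", "type": "CNV", "chrom": "17", "p": 5e-3},
    {"id": "snp_c", "type": "SNP", "chrom": "2", "p": 1e-8},
]


def select_eqtls(records, gene_chrom, thresholds=THRESHOLDS,
                 same_chrom_only=True):
    """Keep eQTLs below their type-specific P-value threshold.

    With same_chrom_only=True, eQTLs on other chromosomes are dropped,
    mirroring step (e). Results are sorted by increasing P-value.
    """
    kept = []
    for rec in records:
        if rec["p"] >= thresholds[rec["type"]]:
            continue
        if same_chrom_only and rec["chrom"] != gene_chrom:
            continue
        kept.append(rec)
    return sorted(kept, key=lambda rec: rec["p"])


selected_ids = [r["id"] for r in select_eqtls(EXAMPLE_RECORDS, "17")]
```

In this toy table the same P-value (0.005) is retained for the CNV but rejected for the SNP, which shows the effect of using a looser exploratory threshold for CNV eQTLs.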
Results:
The experiment we just conducted replicates many of the eQTLs
reported for ORMDL3 in the original study (see Fig. 3), including
rs9303277 on IKZF3 (p = 4 × 10⁻¹⁰), rs2290400 on GSDMB
(p = 4 × 10⁻¹⁰), rs7216389 on GSDMB (p = 4 × 10⁻¹⁰), and an SNP,
rs4795405 (p = 3 × 10⁻⁶), outside of a gene. The first 3 of these
SNPs are the top eQTL associations, in samples of European
ancestry (CEU), for ORMDL3 in SCAN. The recent study from
the 1000 Genomes Project [11] has identified an SNP, rs11078928,
a variant in GSDMB, to be in strong LD with various SNPs near
ORMDL3, suggesting that GSDMB may be the causative gene in
the observed associations.
We set a loose p-value threshold since local (cis) associations
imply a reduced multiple testing burden and, in the case of CNV
eQTLs, their identification is exploratory. Going beyond the original
study, we identified 2 CNVs on the same chromosome (namely,
CNVR7003.1 and CNVR7095.3) that predict ORMDL3 expression.
These are outstanding candidate loci for follow-up studies on
asthma susceptibility.

Fig. 3 Novel susceptibility loci identified from SCAN’s annotation pipeline. SCAN replicates many of the eQTLs
reported for ORMDL3 in the original GWA study (Moffatt, M.F., et al., 2007) of childhood onset asthma. In addition
to the reported loci, we identified 2 CNVs on the same chromosome (namely, CNVR7003.1 and CNVR7095.3)
that predict ORMDL3 expression. These are outstanding candidate loci for follow-up studies on asthma
susceptibility

4.2 Functional Annotation of Chemotherapeutic Susceptibility Associated SNPs

The identification of patients who are likely to experience adverse
events from a particular chemotherapeutic treatment is an important
step in the individualization of cancer chemotherapeutics.
A recent study [31] has applied an in vitro genome-wide cell-based
model system to identify pharmacogenetic variants that can serve
as germ-line genetic biomarkers of carboplatin susceptibility in
head and neck cancer (HNC) patients. Two SNPs, rs2551038 and
rs6870861, identified in a cell-based model were found to be asso-
ciated with overall response to carboplatin-based therapy in HNC
patients. We sought mechanistic hypotheses for the published SNP
associations with chemoresponse.
Protocol:
(a) In SCAN’s SNP Query tool, enter the SNPs of interest
(rs2551038 and rs6870861). The SNPs should be white space-
or comma-delimited.
(b) Select “include SNP info” option.
(c) Select “include host gene and SNP function” option.
(d) Select “include left- and right-flanking genes” option.
(e) Select “include genes that SNP predicts expression for with
p-value less than” option and enter 0.0001 as SNP eQTL
p-value threshold.
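
Step (a) requires a white space- or comma-delimited SNP list. When preparing a batch query, the input can be normalized and checked beforehand; the helper below is a hypothetical convenience and not part of SCAN, and it assumes all identifiers are dbSNP-style rs numbers.

```python
import re

RSID_PATTERN = re.compile(r"^rs\d+$")


def parse_snp_list(text):
    """Split a white space- or comma-delimited string into unique rs IDs.

    Order of first appearance is preserved; malformed tokens raise
    ValueError so that typos are caught before submitting a query.
    """
    tokens = [tok for tok in re.split(r"[,\s]+", text.strip()) if tok]
    malformed = [tok for tok in tokens if not RSID_PATTERN.match(tok)]
    if malformed:
        raise ValueError(f"not valid rs identifiers: {malformed}")
    return list(dict.fromkeys(tokens))


snps = parse_snp_list("rs2551038, rs6870861")
query_text = " ".join(snps)  # white space-delimited form
```

Here `query_text` is the string that would be pasted into the SNP Query field.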
Results:
This protocol identified rs2551038 as an intronic SNP within
HINT1, a gene that has been shown to act as a tumor suppressor and
to exert a pro-apoptotic function [32]. On the other hand, rs6870861
is located outside of a gene, but is flanked on the right by HINT1.
The two SNPs are associated with the expression of at least ten genes,
mostly in trans, including TLR4 (p = 5 × 10⁻⁷), the most significantly
associated gene (for both variants). Interestingly, a previous study has
shown a link between the TLR4 signaling pathway, inflammation, tumor
growth, and chemoresistance [33].

4.3 Identification of a CNV at a BMI-Associated Gene Locus

A polymorphism at NEGR1 has been reproducibly
associated with body mass index; in particular, a 45 kb
deletion at the locus was proposed as a candidate causal polymorphism
[34]. We sought to identify copy number polymorphisms in
this genomic region and determine their role in genome-wide
gene expression.
Protocol:
(a) In SCAN’s Gene Query, enter the gene (NEGR1) and select
“include gene start, end, and chromosome” to identify the
gene’s genomic coordinates. The gene is located on chromo-
some 1 (71641212–72520864).
(b) In SCAN’s Region query, enter a region (1:71611212:72550864)
that spans 30 kb upstream and downstream of the gene.
(c) Select “include SNPs” option to list all SNPs in the specified
region.
(d) Select “include genes” option to list all genes located in the
region.
(e) Select “include CNVs” option to list all CNVs in the specified
region.
(f) Click on the “CNVR216.1” link to retrieve the genomic
boundaries and the genotypes of the deletion polymorphism.
(g) In SCAN’s CNV Query, enter “CNVR216.1”.
(h) Select “include genes that CNV predicts expression for with
p-value less than” option and enter 0.0005 as CNV eQTL
p-value threshold.
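
The region string in step (b) follows from the gene coordinates in step (a) by adding a 30 kb flank on each side. A minimal sketch of that arithmetic is shown below; the function name is hypothetical, and the `chrom:start:end` layout simply copies the example given in the protocol.

```python
def flanked_region(chrom, start, end, flank_bp):
    """Format a region query extended by flank_bp on both sides."""
    if start > end:
        raise ValueError("start must not exceed end")
    # Clamp at 1 so that genes near the chromosome start stay valid.
    return f"{chrom}:{max(1, start - flank_bp)}:{end + flank_bp}"


# NEGR1 coordinates from step (a), with a 30 kb flank.
region = flanked_region("1", 71_641_212, 72_520_864, 30_000)
```

The result is `1:71611212:72550864`, the region entered in step (b).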
Results:
This experiment identified a copy number polymorphism
(CNVR216.1) that is located downstream from the NEGR1 gene.
The variant is a deletion with copy number genotypes 0, 1, and 2.
Furthermore, at the given eQTL threshold, the CNV is associated
with the expression of several target genes, including HDAC5 and
FLJ42957. Remarkably, HDAC5 is one of several NAD+ depen-
dent histone deacetylases (induction of which has been linked to
calorie restriction [35]) found to be down-regulated in the fat tis-
sue of obese co-twins [36]. Redoing the same experiment (using
the larger region 1:71611212:72600864) yields a second deletion
polymorphism CNVR217.1 of length 45 kb at this locus (the
hypothesized causal variant). CNVR217.1 is also an eQTL for sev-
eral target genes, including the diabetes gene RALGPS2 [37].

5 Conclusion

We have described the current feature set of a bioinformatics
database that facilitates the functional characterization of genetic
variation and the identification of potential mechanistic underpin-
nings of associations from GWA studies. To date, SCAN has
received over 800,000 queries. A catalog of eQTLs identified in
primary human tissues (including liver, adipose, muscle, and cere-
bellum) is now being assimilated. Ongoing research efforts (e.g.,
rare variants from the 1000 Genomes Project, genetic regulation
of protein levels and studies of epigenetic regulation of gene
expression) promise to expand the scope and relevance of SCAN.

Acknowledgments

This work was funded through Pharmacogenomics of Anticancer
Agents Research (PAAR; U01 GM61393), ENDGAMe (ENhancing
Development of Genome-wide Association Methods) initiative
(U01 HL084715), the Genotype-Tissue Expression project
(GTEx) (R01 MH090937), Rare Variants and Complex Human
Phenotypes (U01HG005773), and the University of Chicago
DRTC (Diabetes Research and Training Center; P60 DK20595).

References

1. Hindorff LA et al (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA 106:9362–9367
2. Manolio TA et al (2009) Finding the missing heritability of complex diseases. Nature 461:747–753
3. Gamazon ER et al (2010) SCAN: SNP and copy number annotation. Bioinformatics 26:259–262
4. Nicolae DL et al (2010) Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet 6:e1000888
5. Gamazon ER et al (2010) Chemotherapeutic drug susceptibility associated SNPs are enriched in expression quantitative trait loci. Proc Natl Acad Sci USA 107:9287–9292
6. Welsh M et al (2009) Pharmacogenomic discovery using cell-based models. Pharmacol Rev 61:413–429
7. Gamazon ER et al (2010) PACdb: a database for cell-based pharmacogenomics. Pharmacogenet Genomics 20:269–273
8. International HapMap Consortium (2003) The International HapMap Project. Nature 426:789–796
9. Huang RS et al (2007) A genome-wide approach to identify genetic variants that contribute to etoposide-induced cytotoxicity. Proc Natl Acad Sci USA 104:9758–9763
10. Huang RS et al (2008) Genetic variants associated with carboplatin-induced cytotoxicity in cell lines derived from Africans. Mol Cancer Ther 7:3038–3046
11. Durbin RM et al (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073
12. Gamazon ER et al (2009) A pharmacogene database enhanced by the 1000 Genomes Project. Pharmacogenet Genomics 19:829–832
13. Conrad DF et al (2010) Origins and functional impact of copy number variation in the human genome. Nature 464:704–712
14. Mills RE et al (2011) Mapping copy number variation by population-scale genome sequencing. Nature 470:59–65
15. Gamazon ER, Nicolae DL, Cox NJ (2011) A study of CNVs as trait-associated polymorphisms and as expression quantitative trait loci. PLoS Genet 7:e1001292
16. Gamazon ER, Huang RS et al (2011) Copy number polymorphisms and anticancer pharmacogenomics. Genome Biol 12:R46
17. Ogura Y et al (2001) A frameshift mutation in NOD2 associated with susceptibility to Crohn’s disease. Nature 411:603–606
18. Hampe J et al (2001) Association between insertion mutation in NOD2 gene and Crohn’s disease in German and British populations. Lancet 357:1925–1928
19. Cenik C (2011) Genome analysis reveals interplay between 5′UTR introns and nuclear mRNA export for secretory and mitochondrial genes. PLoS Genet 7:e1001366
20. Zhang L et al (2008) Genomic and epigenetic alterations deregulate microRNA expression in human epithelial ovarian cancer. Proc Natl Acad Sci USA 105:7004–7009
21. Yang B et al (2007) The muscle-specific microRNA miR-1 regulates cardiac arrhythmogenic potential by targeting GJA1 and KCNJ2. Nat Med 13:486–491
22. Hansen T et al (2007) Brain expressed microRNAs implicated in schizophrenia etiology. PLoS One 2:e873
23. Pickrell JK et al (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464:768–772
24. Spielman RS et al (2007) Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39:226–231
25. Zhang W et al (2009) Identification of common genetic variants that account for transcript isoform variation between human populations. Hum Genet 125:81–93
26. Duan S et al (2008) Genetic architecture of transcript-level variation in humans. Am J Hum Genet 82:1101–1113
27. Stranger BE et al (2007) Population genomics of human gene expression. Nat Genet 39:1217–1224
28. Storey JD et al (2007) Gene-expression variation within and among human populations. Am J Hum Genet 80:502–509
29. Moffatt MF et al (2007) Genetic variants regulating ORMDL3 expression contribute to the risk of childhood asthma. Nature 448:470–473
30. Bouzigon E et al (2008) Effect of 17q21 variants and smoking exposure in early-onset asthma. N Engl J Med 359:1985–1994
31. Ziliak D et al (2011) Germline polymorphisms discovered via a cell-based, genome-wide approach predict platinum response in head and neck cancers. Transl Res 157:265–272
32. Weiske J, Huber O (2006) The histidine triad protein Hint1 triggers apoptosis independent of its enzymatic activity. J Biol Chem 281:27356–27366
33. Kelly MG et al (2006) TLR-4 signaling promotes tumor growth and paclitaxel chemoresistance in ovarian cancer. Cancer Res 66:3859–3868
34. Willer CJ et al (2009) Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 41:25–34
35. Yang T, Sauve AA (2006) NAD metabolism and sirtuins: metabolic regulation of protein deacetylation in stress and toxicity. AAPS J 8:E632–E643
36. Pietilainen KH et al (2008) Global transcript profiles of fat in monozygotic twins discordant for BMI: pathways behind acquired obesity. PLoS Med 5:e51
37. Hayes MG et al (2007) Identification of type 2 diabetes genes in Mexican Americans through genome-wide association studies. Diabetes 56:3033–3044
Chapter 15

Methods to Examine the Impact of Nonsynonymous SNPs on Protein Degradation and Function
of Human ABC Transporter
Toshihisa Ishikawa, Kanako Wakabayashi-Nakao,
and Hiroshi Nakagawa

Abstract
Clinical studies have strongly suggested that genetic polymorphisms and/or mutations of certain
ATP-binding cassette (ABC) transporter genes might be regarded as significant factors affecting patients’
responses to medication and/or the risk of diseases. In the case of ABCG2, certain single nucleotide poly-
morphisms (SNPs) in the encoding gene alter the substrate specificity and/or enhance endoplasmic
reticulum-associated degradation (ERAD) of the de novo synthesized ABCG2 protein via the ubiquitin-
mediated proteasomal proteolysis pathway. Hitherto accumulated clinical data imply that several nonsyn-
onymous SNPs affect the ABCG2-mediated clearance of drugs or cellular metabolites, although some
controversies still exist. Therefore, we recently developed methods for high-speed functional screening and for
assessing ERAD of ABC transporters so as to evaluate the effect of genetic polymorphisms on their function and
protein expression levels in vitro. In this chapter we present in vitro experimental methods to elucidate the impact
of nonsynonymous SNPs on protein degradation of ABCG2 as well as on its transport function.

Key words BCRP, Endoplasmic reticulum-associated degradation (ERAD), Ubiquitin, Proteasome,
Endosome, Lysosome, Genetic polymorphism, Porphyrin, Gout

1 Introduction

It is well recognized that drugs can exhibit wide inter-patient
variability in their efficacy and toxicity. For many drugs, these
inter-individual differences are due, in part, to polymorphisms in
genes encoding drug metabolizing enzymes, drug transporters,
and/or drug targets (e.g., receptors, enzymes) [1, 2]. As a means
of implementing personalized medicine, it is critically important to
understand the molecular mechanisms underlying inter-individual
differences in the drug response. Genetic polymorphisms of drug
metabolizing enzymes and drug transporters have been found to
play a significant role in the patients’ responses to medication [3–5].

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_15, © Springer Science+Business Media, LLC 2013

[Figure 1 flowchart: SNP database and ABC transporter cDNA → site-directed mutagenesis.
Cellular traffic and protein degradation assay branch: pcDNA5/FRT vector → Flp-In 293 cells →
hygromycin selection → Q-RT-PCR to detect mRNA → immunoblotting to detect protein level →
immunofluorescence micrograph.
Transport function branch: pFastBac vector → recombinant bacmid DNA → expression in insect
Sf9 cells → preparation of membrane vesicles → transport function assay → QSAR analysis for
substrate specificity.
Both branches: determine target SNP → detection of target SNP in clinical samples →
evaluate clinical impact of SNP (together with clinical PK/PD data).]

Fig. 1 Flowchart of experimental procedures for studying protein quality control
and transport function to validate clinically important SNPs. The protein quality
control and ERAD of SNP variants of the ABC transporters can be studied by using
Flp-In-293 cells, whereas the transport function assay is carried out by using plasma
membrane vesicles prepared from insect Sf9 cells and high-speed screening/QSAR
analysis technologies. QSAR quantitative structure–activity relationship, Q-RT-PCR
quantitative reverse-transcription polymerase chain reaction

Accumulating evidence demonstrates that certain nonsynonymous

polymorphisms have great impact on protein stability and degrada-
tion as well as on the function of drug metabolizing enzymes and
transporters [6]. In addition to alterations in the transport activity
and substrate specificity, genotype-related protein degradation or
impaired intracellular trafficking of drug transporters can affect the
overall pharmacological and pharmacokinetic profiles of a drug.
For determining clinically important SNPs, factors such as the
molecular mechanisms underlying differences in patients’ drug
responses should be taken into account. Figure 1 depicts a flow-
chart of experimental procedures for studying both protein quality
control and transport function assays to achieve the final goal of clini-
cal SNP detection. The protein quality control and ERAD for SNP
variants of ABC transporters can be studied by using Flp-In-293
cells, as described in this chapter. Furthermore, transport function
assays can be carried out by using plasma membrane vesicles prepared
from cultured cells, such as insect Sf9 cells. We herein address func-
tional evaluation of nonsynonymous SNP variants.
The endoplasmic reticulum (ER) is the cellular system respon-
sible for protein synthesis and maturation. The native conforma-
tion of a protein is latently encoded in its primary amino acid
sequence and the corresponding gene sequence; however, protein
folding does not proceed spontaneously. The ER is responsible for
enhancing the efficacy and fidelity of protein folding. Furthermore,
the role of the ER resides in checking de novo synthesized proteins
and in destining them for the plasma membrane and for the secre-
tory or endocytic organelles [6–10].
Efficient protein quality control in the ER is required to prevent
incompletely folded molecules from moving along the intra-cellular
traffic pathway, since accumulation of misfolded proteins is consid-
ered to detrimentally affect cellular functions. “ER stress” has been
proposed as a term to describe a cellular response to the accumula-
tion of misfolded proteins. Misfolded proteins resulting from
genetic polymorphisms should be removed from the ER by ret-
rotranslocation to the cytosol compartment and then degraded by
the ubiquitin–proteasome system. At present, however, it still
remains to be elucidated how misfolded proteins are recognized
and destroyed via the ERAD pathway. Furthermore, the current
bioinformatics technology is not able to accurately predict which
nonsynonymous SNPs cause misfolded proteins. In this chapter,
we present an in vitro method to evaluate the effect of nonsynony-
mous SNPs on protein quality control of the de novo synthesized
ABCG2 protein. For this purpose, the “Flp recombinase” system is
used as it provides a useful tool to quantitatively analyze the protein
stability and degradation of misfolded proteins.
To identify drugs affected by genetic polymorphisms, high-speed
screening technologies are useful. The isolated membrane vesicle
system provides a practical tool for low cost and high-throughput
analysis of ABC multidrug transporters. Baculovirus-infected insect
cells have successfully been employed to give relatively high protein
expression yields; for example, Spodoptera frugiperda (Sf9) cells
are widely used to obtain membranes overexpressing various ABC
transporters. We present procedures for functional evaluation of
nonsynonymous SNP variants of human ABCG2.

2 Materials

2.1 Materials for Evaluation of the Impact of Nonsynonymous SNPs on Protein Degradation

Flp recombinase-mediated site-specific integration and gene expression
in mammalian cells allow us to integrate one single copy of cDNA
into the genomic DNA at a specific genome location in mammalian
host cells. At present, the Flp-In™ system is commercially available
from Invitrogen (Carlsbad, CA, USA: www.invitrogen.com).

2.1.1 Flp-In Cell Lines

Flp-In™ cell lines (Invitrogen, Carlsbad, CA, USA) were
generated from the American Type Culture Collection (ATCC)
cell lines (e.g., HEK293, CV-1, CHO-K1, BHK, NIH/3T3, or
Jurkat) to stably express the lacZ-Zeocin fusion gene. Each cell line
contains a single integrated Flp Recombination Target (FRT) site.
The FRT site, originally isolated from S. cerevisiae, serves as a binding
site for Flp recombinase and has been well characterized [11–14].
The minimal FRT site comprises a 34-bp sequence containing two
13-bp imperfect inverted repeats separated by an 8-bp spacer that
includes an Xba I restriction site. An additional 13-bp repeat is
found in most FRT sites [15]. While Flp recombinase binds to all
three of the 13-bp repeats, strand cleavage actually occurs at the
boundaries of the 8-bp spacer region [14, 15].
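
The architecture of the minimal FRT site described above can be checked with a short script. The following is a minimal sketch, assuming the 34-bp minimal FRT sequence as it is widely cited in the literature (the sequence itself is not given in this chapter, and the function names are illustrative only). It splits the site into the two 13-bp repeats and the 8-bp spacer, counts mismatches between the inverted repeats, and looks for the Xba I recognition sequence in the spacer.

```python
# Minimal FRT site as commonly cited in the literature.
# Assumption: this sequence is not stated in the chapter itself.
FRT_MINIMAL = "GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC"
XBAI_SITE = "TCTAGA"  # Xba I recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def dissect_frt(site: str) -> dict:
    """Split a 34-bp FRT site into 13-bp repeat, 8-bp spacer, 13-bp repeat."""
    left, spacer, right = site[:13], site[13:21], site[21:]
    # Inverted repeats: compare the left repeat with the reverse
    # complement of the right repeat, position by position.
    mismatches = sum(a != b for a, b in zip(left, reverse_complement(right)))
    return {
        "length": len(site),
        "left_repeat": left,
        "spacer": spacer,
        "right_repeat": right,
        "repeat_mismatches": mismatches,
        "xbai_in_spacer": XBAI_SITE in spacer,
    }


if __name__ == "__main__":
    for key, value in dissect_frt(FRT_MINIMAL).items():
        print(f"{key}: {value}")
```

A nonzero mismatch count is what makes the repeats "imperfect"; the third 13-bp repeat found in most FRT sites lies outside this minimal sequence and is not modeled here.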

2.1.2 pcDNA5/FRT Vector

pcDNA5/FRT (Invitrogen, Carlsbad, CA, USA) is a 5.1-kb
expression vector designed for use with the Flp-In™ system. This
vector contains the following elements: the human cytomegalovirus
(CMV) immediate-early enhancer/promoter [16–18]; a multiple
cloning site with ten unique restriction sites, which can be
used to introduce the cDNA sequence encoding the protein to be
studied (in the present case, ABCG2); the FRT site for Flp
recombinase-mediated integration of the vector into Flp-In host
cells; and the hygromycin-resistance gene for the selection of stable
cell lines [19].

2.1.3 pOG44 Vector

pOG44 is a 5.8-kb Flp recombinase expression vector (Invitrogen,
Carlsbad, CA, USA). The FLP gene was originally isolated from
the S. cerevisiae 2-μ plasmid [20, 21] and encodes a site-specific
recombinase that is a member of the integrase family of recombi-
nases [22]. The Flp recombinase mediates a site-specific recombi-
nation reaction between interacting DNA molecules via the pairing
of interacting FRT sites [11, 23]. The native FLP gene encodes a
protein of 423 amino acids with a calculated molecular mass of
49 kDa. The FLP gene expressed from pOG44 encodes a
temperature-sensitive Flp recombinase, which carries a point muta-
tion (flp-F70L) that results in a change in amino acid 70 from Phe
to Leu [24]. The flp-F70L protein expressed from pOG44 exhibits
reduced thermostability at 37 °C in mammalian cells when compared
with the native Flp recombinase [24].

2.1.4 Reagents

The following reagents are available from commercial sources.
1. Dulbecco’s modified Eagle’s medium (D-MEM).
2. 10 % (v/v) heat-inactivated fetal calf serum (FCS).
3. L-Glutamine (2 mM).
4. Penicillin (100 U/ml).
5. Streptomycin (100 μg/ml).
6. Zeocin (100 μg/ml).
7. Hygromycin B (100 μg/ml).
8. Amphotericin B (250 ng/ml).
9. Trypan Blue dye.
10. Lipofectamine-2000.
11. MG132 (inhibitor for proteasomal degradation of proteins).
12. Bafilomycin A1 (inhibitor for lysosomal degradation of proteins).
13. 3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide
(MTT reagent).

2.1.5 Sample

ABCG2 wild type (WT) cDNA

Human ABCG2 cDNA was cloned from cDNA of the MCF7/BCRP
clone-8 cell line by PCR, as described previously [25]. The
PCR product is inserted into the pcDNA5/FRT plasmid [26–28],
and its sequence is analyzed by automated DNA sequencing.

2.1.6 Enzymes

N-glycosidase F (PNGase F)

2.2 Materials for Functional Evaluation of Nonsynonymous SNPs

We usually infect Sf9 cells (1 × 10⁶ cells/ml) with human ABCG2-
recombinant baculovirus and culture them at 27 °C with gentle
shaking [29, 30].

2.2.1 Cells Used for the Expression of ABCG2 in Sf9 Cells

1. Insect Spodoptera frugiperda Sf9 cells.
2. Competent DH10Bac E. coli cells.

2.2.2 Plasmid and Enzymes

1. pFastBac1 plasmid.
2. EcoRI.
3. Dpn I endonuclease.
4. PfuTurbo® DNA polymerase.

2.2.3 Medium and Reagent

1. EX-CELL™ 420 Insect serum-free medium (JRH Biosciences,
Inc., Lenexa, KS, USA).
2. Cellfectin® reagent (Invitrogen Co., Carlsbad, CA, USA).

2.2.4 Buffer Solutions and Media Used for Plasma Membrane Preparation and the Transport Assay

1. Hypotonic buffer (0.5 mM Tris/HEPES, pH 7.4, 0.1 mM
EGTA) containing leupeptin (10 μg/ml).
2. Phosphate-buffered saline (PBS).
3. 0.25 M sucrose containing 10 mM Tris/HEPES (pH 7.4).
4. 40 % (w/v) sucrose.
5. Standard incubation medium (0.25 M sucrose and 10 mM
Tris/HEPES, pH 7.4, 10 mM creatine phosphate, 100 μg/ml
of creatine kinase, and 10 mM MgCl2).
6. Stop solution (0.25 M sucrose, 10 mM Tris/HEPES, pH 7.4,
and 2 mM EDTA).
7. 10 mM NaOH.

2.2.5 Antibody

1. ABCG2-specific antibody BXP-21 (SIGNET, Dedham, MA,
USA).

2.2.6 Materials and Instrument Used for the Transport Assay

1. Sephadex G-25 equilibrated with 0.25 M sucrose and 10 mM
Tris/HEPES (pH 7.4).
2. MultiScreen™ plates (Nihon Millipore KK, Tokyo, Japan).
3. EDR384S system (BioTec, Tokyo, Japan).

3 Methods

3.1 Methods for Evaluation of the Impact of Nonsynonymous SNPs on Protein Degradation

3.1.1 SNP Data on Nonsynonymous Polymorphisms of Human ABCG2 Gene

SNP data on the polymorphisms of human ABCG2 gene were
obtained from the NCBI dbSNP database and publications [31].
Figure 2 depicts nonsynonymous polymorphisms and acquired
mutations in the human ABCG2 gene.

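[Figure 2 schematic: membrane topology of the ABCG2 homodimer (S–S linked) in the plasma
membrane, with the outside and inside faces, the H2N and COOH termini, and the motifs A, B,
and C marked. Labeled variants: V12M, G51C, Q126stop, Q141K, T153M, Q166E, I206L, F208S,
S248P, E334stop, F431L, S441N, R482G, R482T, F489L, F571I, N590Y, and D620N.]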

Fig. 2 Schematic illustration of human ABCG2 and its nonsynonymous polymor-
phisms. The ABCG2 protein expressed in the plasma membrane is a homodimer
linked via a cysteinyl disulfide bond. The cysteine residue corresponding to Cys603 of
human ABCG2 is involved in the homodimer formation. The disulfide bond formation
at Cys603 does not appear to be a prerequisite for exerting the transport activity of
ABCG2. SNP data on the polymorphisms of ABCG2 were obtained from the NCBI
dbSNP database and recent publications. The variants R482G and R482T are acquired
mutations. A, B, and C indicate the motifs of Walker A (amino acids #80–86),
Walker B (amino acids #205–210), and signature C (amino acids #186–200)
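
Variant labels such as Q141K or Q126stop encode the reference residue, the amino acid position, and the substituted residue. As a minimal sketch (the function and dictionary names are illustrative and not part of the published method), the labels can be parsed and compared against the motif ranges stated in the caption of Fig. 2:

```python
import re

# Motif ranges (amino acid numbering) as stated in the Fig. 2 caption.
MOTIFS = {
    "Walker A": (80, 86),
    "signature C": (186, 200),
    "Walker B": (205, 210),
}

# One-letter reference residue, position, then a one-letter residue or "stop".
VARIANT_RE = re.compile(r"^([A-Z])(\d+)([A-Z]|stop)$")


def parse_variant(label: str) -> dict:
    """Parse a label such as 'Q141K' or 'Q126stop' into its components."""
    match = VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized variant label: {label!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    motif = next(
        (name for name, (lo, hi) in MOTIFS.items() if lo <= pos <= hi),
        None,
    )
    return {
        "ref": ref,
        "position": pos,
        "alt": alt,
        "nonsense": alt == "stop",  # premature stop codon
        "motif": motif,             # None if outside the listed motifs
    }


if __name__ == "__main__":
    for label in ["V12M", "Q126stop", "Q141K", "I206L", "F208S", "D620N"]:
        print(label, parse_variant(label))
```

With the ranges given in the caption, I206L and F208S fall within the Walker B motif, whereas the other listed variants lie outside the three motifs.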

3.1.2 Preparation of Plasmids Carrying ABCG2 Variant cDNA

ABCG2 WT cDNA inserted into the pcDNA5/FRT plasmid is
used as the template, and nonsynonymous SNP variants are
generated by using the QuikChange® Site-Directed Mutagenesis
Kit (Stratagene, La Jolla, CA, USA). Table 1 summarizes the PCR
primers and conditions for site-directed mutagenesis to create
variants of ABCG2. The mutations should be confirmed by
sequencing the inserted cDNA.
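
Before ordering mutagenic primers, it can be useful to verify that each forward/reverse pair is complementary and that its length and GC content agree with the tabulated values. The following is a minimal sketch of such a check, using the V12M pair from Table 1 as input; the helper names are illustrative, and the check does not compute Tm, which depends on the formula recommended by the kit manufacturer.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def check_primer_pair(forward: str, reverse: str) -> dict:
    """Summarize a mutagenic primer pair written 5' -> 3'."""
    return {
        "length": len(forward),
        "gc_percent": round(gc_percent(forward)),
        "complementary": reverse_complement(forward) == reverse,
    }


if __name__ == "__main__":
    # V12M primer pair as listed in Table 1.
    v12m_f = "CGAAGTTTTTATCCCAATGTCACAAGGAAACAC"
    v12m_r = "GTGTTTCCTTGTGACATTGGGATAAAAACTTCG"
    print(check_primer_pair(v12m_f, v12m_r))
```

A pair that is reported as not complementary should be rechecked against the reference cDNA sequence before use.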

3.1.3 Expression of ABCG2 and Its Variants in Flp-In-293 Cells

To quantitatively analyze the effect of nonsynonymous SNPs of
ABCG2 on the protein expression level, we used the Flp-In method
to integrate one single copy of ABCG2 variant-cDNA into FRT-
tagged genomic DNA. Figure 3 illustrates the strategy by which we
integrate one single copy of the human ABCG2 cDNA into the chro-
mosomal DNA of Flp-In-293 cells by means of the Flp recombinase
system. By using this method, we exclude the random integration of
ABCG2 cDNA into the chromosomal DNA in host cells [27].
1. Flp-In™-293 cells (Invitrogen) are maintained in D-MEM
supplemented with 10 % (v/v) heat-inactivated FCS, 2 mM
L-glutamine, penicillin (100 U/ml), and streptomycin
(100 μg/ml) at 37 °C in a humidified atmosphere of 5 % CO2
in air.
2. The number of viable cells is determined from counts made in
a hemocytometer with Trypan Blue dye exclusion.
3. Flp-In-293 cells are co-transfected with the ABCG2-pcDNA5/
FRT vector and the Flp recombinase expression plasmid pOG44
by using Lipofectamine™ 2000 (Invitrogen, Carlsbad, CA, USA)
according to the manufacturer’s instructions.
4. Single colonies resistant to hygromycin B (Invitrogen,
Carlsbad, CA, USA) are picked and subcultured.
5. Selection of positive colonies is performed by immunoblotting.
6. Mock cells (Flp-In-293/Mock) are prepared by transfecting
Flp-In-293 cells with the pcDNA5/FRT and pOG44 vectors
in the same manner as described above.
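
The viable cell count in step 2 is usually converted to a cell density with the standard hemocytometer formula. The sketch below assumes an improved Neubauer chamber, in which each large square holds 0.1 μl (hence the factor of 10⁴ per ml), and a 1:1 dilution of the cell suspension with Trypan Blue; both are assumptions that should be adapted to the chamber and dilution actually used.

```python
def viable_cell_density(unstained_counts, stained_counts, dilution_factor=2.0):
    """Return (viable cells per ml, viability fraction).

    unstained_counts: cells excluding Trypan Blue (viable), one count
        per large square of the hemocytometer.
    stained_counts: blue (non-viable) cells in the same squares.
    dilution_factor: 2.0 corresponds to a 1:1 mix with Trypan Blue
        (an assumption, not stated in the protocol).
    """
    if not unstained_counts or len(unstained_counts) != len(stained_counts):
        raise ValueError("provide one stained and one unstained count per square")
    mean_viable = sum(unstained_counts) / len(unstained_counts)
    total = sum(unstained_counts) + sum(stained_counts)
    viability = sum(unstained_counts) / total if total else 0.0
    # Each large square of an improved Neubauer chamber holds 1e-4 ml.
    density = mean_viable * dilution_factor * 1e4
    return density, viability


if __name__ == "__main__":
    density, viability = viable_cell_density([48, 52, 50, 50], [3, 2, 4, 1])
    print(f"{density:.3g} viable cells/ml, viability {viability:.1%}")
```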

3.1.4 Detection of mRNA by RT-PCR

It is important to examine whether the genomic DNA-integrated
ABCG2 cDNA is transcribed into mRNA. The transcript can be
detected by conventional RT-PCR or quantitative RT-PCR
methods.
1. Total RNA is extracted from cultured Flp-In-293 cells with
NucleoSpin® RNA II (MACHEREY-NAGEL GmbH & Co.
KG, Dueren, Germany).
2. cDNA is prepared from the extracted RNA in a reverse tran-
scriptase reaction with SuperScript II RT (Invitrogen, Carlsbad,
CA, USA) and random hexamers according to the manufac-
turer’s instructions.
Table 1
PCR primers and conditions for site-directed mutagenesis to create variants of ABCG2

Variant   F/R Primer sequence (5′ → 3′)                       Length (bases)  % GC  Tm (°C)
V12M      F   CGAAGTTTTTATCCCAATGTCACAAGGAAACAC               33              39    55
          R   GTGTTTCCTTGTGACATTGGGATAAAAACTTCG
G51C      F   ATCGAGTAAAACTGAAGAGTTGCTTTCTACCTTGTAGAAAAC      42              35    59
          R   GTTTTCGACAAGGTAGAAAGCAACTCTTCAGTTTTACTCGAT
Q126stop  F   GTAATTCAGGTTACGTGGTATAAGATGATGTTGTGATGGG        40              40    62
          R   CCCATCACAACATCATCTTATACCACGTAACCTGAATTAC
Q141K     F   CGGTGAGAGAAAACTTAAAGTTCTCAGCAGCTCTT             35              42    55
          R   AAGAGCTGCTGAGAACTTTAAGTTTTCTCTCACCG
T153M     F   CGGCTTGCAACAACTATGATGAATCATGAAAAAAACGAACGG      42              40    60
          R   CCGTTCGTTTTTTTCATGATTCATCATAGTTGTTGCAAGCCG
Q166E     F   GGATTAACAGGGTCATTGAAGAGTTAGGTCTGGAT             35              42    55
          R   ATCCAGACCTAACTCTTCAATGACCCTGTTAATCC
I206L     F   CTTATCACTGATCCTTCCCTCTTGTTCTTGGATGAG            36              44    59
          R   CTCATCCAAGAACAAGAGGGAAGGATCAGTGATAAG
F208S     F   TGATCCTTCCATCTTGTCCTTGGATGAGCCTACAA             35              45    55
          R   TTGTAGGCTCATCCAAGGACAAGATGGAAGGATCA
S248P     F   TTCATCAGCCTCGATATCCCATCTTCAAGTTGTTT             35              40    55
          R   AAACAACTTGAAGATGGGATATCGAGGCTGATGAA
E334stop  F   TCATAGAAAAATTAGCGTAGATTTATGTCAACTCC             35              31    55
          R   GGAGTTGACATAAATCTACGCTAATTTTTCTATGA
F431L     F   AGCTGGGGTTCTCCTCTTCCTGACGACC                    28              60    62
          R   GGTCGTCAGGAAGAGGAGAACCCCAGCT
S441N     F   AACCAGTGTTTCAGCAATGTTTCAGCCGTGGAAC              34              47    59
          R   GTTCCACGGCTGAAACATTGCTGAAACACTGGTT
F489L     F   GAGGATGTTACCAAGTATTATACTTACCTGTATAGTGTACTTCATG  46              34    62
          R   CATGAAGTACACTATACAGGTAAGTATAATACTTGGTAACATCCTC
F571I     F   GTCATGGCTTCAGTACATCAGCATTCCACGATATGG            36              47    61
          R   CCATATCGTGGAATGCTGATGTACTGAAGCCATGAC
N590Y     F   CATAATGAATTTTTGGGACAATACTTCTGCCCAGGACTCAAT      42              38    62
          R   ATTGAGTCCTGGGCAGAAGTATTGTCCCAAAAATTCATTATG
D620N     F   GGTAAAGCAGGGCATCAATCTCTCACCCTGGG                32              56    62
          R   CCCAGGGTGAGAGATTGATGCCCTGCTTTACC

Mutagenesis sites are indicated by underbars and bold letters. The % GC indicates the percentage of guanine and cytosine contents in the PCR primer set. Tm shows the melting
temperature for each PCR primer set.

Fig. 3 Flp-mediated integration of the ABCG2 cDNA into FRT-tagged genomic DNA. Flp-In-293 cells were
co-transfected with the pcDNA5/FRT vector carrying the ABCG2 cDNA and the Flp recombinase expression
plasmid pOG44. Flp recombinase mediates insertion of the expression construct with the ABCG2 cDNA into
the genome at the integrated FRT site through site-specific DNA recombination

3. The mRNA levels of ABCG2 and glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) are determined by PCR in an iCycler™
thermal cycler (BIO-RAD, Hercules, CA, USA) with
the following specific primer sets: ABCG2
(5′-GATCTCTCACCCTGGGGCTTGTGGA; 5′-TGTGCAACAGTGTGATGGCAAGGGA)
and GAPDH (5′-ACTGCCAACGTGTCAGTGGTGGACCTGA;
5′-GGCTGGTGGTCCAGGGGTCTTACTCCTT). The PCR consists of a
hot-start incubation at 94 °C for 2 min and 30 cycles of 94 °C
for 30 s, 59 °C for 30 s, and 72 °C for 30 s.
4. After the PCR, products are separated by agarose gel electro-
phoresis and detected with ethidium bromide under UV light.

3.1.5 Measurement of mRNA Levels by Quantitative RT-PCR

The mRNA levels of ABCG2 and GAPDH are measured by quan-
titative PCR, and the ratios of ABCG2 variants vs. GAPDH are
plotted.
1. Determine the mRNA levels of ABCG2 and GAPDH by using
the 7500 Fast Real Time-PCR System (Applied Biosystems,
Foster City, CA, USA), TaqMan® Fast Universal Master Mix
(Applied Biosystems), and TaqMan® probes (ABCG2;
Hs00184979_m1, GAPDH; Hs99999905_m1) (Applied
Biosystems) according to the manufacturer’s protocol.
2. The expression levels of ABCG2 are normalized against those
of GAPDH.
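
The chapter does not state which calculation is used for the normalization against GAPDH. One common choice is the comparative Ct (2^-ΔΔCt) method, sketched below under the assumption that both assays amplify with close to 100 % efficiency and that the WT sample serves as the calibrator. The function name and the example Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Assumes near-100 % amplification efficiency for both assays.
    ct_target / ct_reference: Ct of ABCG2 and GAPDH in the variant sample.
    *_calibrator: the same two Ct values in the calibrator (e.g., WT).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values for illustration only.
    ratio = relative_expression(25.0, 18.0, 24.0, 18.0)
    print(f"ABCG2 mRNA relative to calibrator: {ratio:.2f}")
```

A ratio close to 1 indicates that the variant transcript is present at a level similar to that of the calibrator, which is the expected outcome when a single cDNA copy is integrated at the same FRT site.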

3.1.6 Immunoblotting to Detect ABCG2 Protein

The ABCG2 protein expressed in Flp-In-293 cells is detected by
immunoblotting with BXP-21 (SIGNET, Dedham, MA, USA), a
specific antibody to human ABCG2 [27].
1. Cells are rinsed with ice-cold PBS (pH 7.4) and subsequently
treated with lysis buffer containing 50 mM Tris–HCl (pH 7.4),
1 % (w/v) Triton X-100, 1 mM dithiothreitol, and a protease
inhibitor cocktail (Roche Ltd., Mannheim, Germany).
2. The samples are homogenized by passage through a 27-gauge
needle and then centrifuged at 800 × g for 10 min at 4 °C. For
glycosidase treatment, 20 μg of cell lysate protein is incubated
with 20 U of PNGase F at 37 °C for 10 min.
3. Proteins are separated by electrophoresis on 7.5 % (w/v) poly-
acrylamide gels. Equal amounts of the resulting cell lysate
(10 μg of protein) are subjected to SDS-PAGE in the presence
or absence of mercaptoethanol.
4. Proteins are electrophoretically blotted onto Hybond-ECL
nitrocellulose membranes (Amersham, Buckinghamshire, UK).
5. Immunoblotting is performed by using BXP-21 (1:200 dilu-
tion) as the first antibody and anti-mouse IgG-horseradish per-
oxidase (HRP)-conjugate (1:3,000 dilution; Cell Signaling
Technology, Beverly, MA, USA) as the secondary antibody.
6. HRP-dependent luminescence is developed by using Western
Lightning Chemiluminescent Reagent Plus (PerkinElmer Life
Sciences, Boston, MA, USA) and detected in a Lumino
Imaging Analyzer FAS-1000 (TOYOBO, Osaka, Japan).
7. To detect GAPDH, as an internal loading control, immunob-
lot detection is carried out in the same manner as described
above, except for the use of mouse monoclonal antibody
against GAPDH (1:1,000 dilution; American Research
Products, Inc. Belmont, MA, USA) as the first antibody.
8. Based on the amino acid sequence (NM_004827) of human
ABCG2, the molecular weight of non-glycosylated ABCG2
WT is calculated to be 72,314 by using the ExPASy Compute
pI/Mw tool (http://us.expasy.org/tools/pi_tool.html). This
molecular weight corresponds to the non-glycosylated nascent
peptide (monomer) of ABCG2 [6, 32].
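
The calculation in step 8 amounts to summing the average masses of the amino acid residues and adding one water molecule for the free termini. The sketch below illustrates that arithmetic with standard average residue masses; it is not the ExPASy implementation, it ignores post-translational modifications such as N-glycosylation, and it is demonstrated on short peptides rather than on the ABCG2 sequence, which is not reproduced here.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water per chain (free N- and C-termini)


def average_molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    # Short illustrative peptide, not an ABCG2 fragment.
    print(f"{average_molecular_weight('GG'):.2f} Da")
```

Comparing the calculated value with the apparent size on the immunoblot before and after PNGase F treatment gives an indication of the contribution of N-linked glycans.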

3.1.7 Immunofluorescence Microscopy

1. ABCG2-expressing Flp-In-293 cells are seeded onto collagen
type I-coated cover glasses and incubated under the above-
mentioned culture conditions for 24 h.
2. Cells are fixed with 4 % paraformaldehyde in PBS at room
temperature for 20 min. Thereafter, cell membranes are per-
meabilized by incubation with 0.02 % Triton X-100 in PBS at
room temperature for 5 min.
3. To block free aldehyde groups of the formaldehyde, cells are
treated with glycine (10 mg/ml) in PBS at room temperature
for 10 min, which is followed by a further incubation with
0.5 % (w/v) albumin in PBS at room temperature for 1 h.
4. To detect the ABCG2 protein, cells are treated with the BXP-21
antibody (1:1,000 dilution; SIGNET, Dedham, MA, USA) as
the first antibody and subsequently with the Alexa Fluor
488-conjugated anti-mouse IgG antibody (1:1,000 dilution;
Invitrogen, Carlsbad, CA, USA).
5. In the same preparations, nuclear DNA is stained with prop-
idium iodide (4 μg/ml) in PBS containing 0.5 % (w/v) albumin.
The immunofluorescence of Flp-In-293 cells is detected with
a confocal laser-scanning fluorescence microscope [28].

3.2 Methods for Functional Evaluation of Nonsynonymous SNPs

3.2.1 Preparation of Plasmids Carrying ABCG2 Variant cDNA

The ABCG2 cDNA-containing pcDNA5/FRT plasmid is digested
by EcoRI, and the ABCG2 cDNA is excised. After treatment with
alkaline phosphatase, ABCG2 cDNA is ligated to the EcoRI site of
the pFastBac1 plasmid by using the Rapid DNA Ligation Kit
(Roche Applied Science, Roche Diagnostics Corp., Indianapolis, IN,
USA) [30].

1. Nonsynonymous SNP variants are generated by using the
QuikChange® Site-Directed Mutagenesis Kit (Stratagene, La
Jolla, CA, USA). PCR is carried out in an iCycler (Bio-Rad
Laboratories, Inc., Hercules, CA, USA) by using PfuTurbo®
DNA polymerase, the ABCG2-pFastBac1 plasmid, and specific
primers (see Table 1 for primers).
2. The PCR is initiated by incubation at 95 °C for 30 s and then
followed by 12 cycles of reactions at 95 °C for 30 s, at the Tm
given in Table 1 for 1 min, and at 68 °C for 14 min.
3. After the PCR, the reaction mixture is incubated with Dpn I
endonuclease at 37 °C for 1 h to digest the original template
plasmid. Each variant cDNA generated in the pFastBac1
plasmid is subjected to nucleotide sequence analysis (Hitachi,
Ltd., Tokyo, Japan).
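
The cycling conditions in step 2 differ between variants only in the annealing temperature, which is taken from Table 1. As a rough planning aid, the sketch below builds the program as a list of steps and sums the hold times; ramp times between temperatures are not included, and the function names are illustrative.

```python
def quikchange_program(annealing_temp_c, cycles=12):
    """Build the mutagenesis PCR program as (step, temperature in C, seconds)."""
    steps = [("initial denaturation", 95, 30)]
    for _ in range(cycles):
        steps.append(("denaturation", 95, 30))
        steps.append(("annealing", annealing_temp_c, 60))   # Tm from Table 1
        steps.append(("extension", 68, 14 * 60))
    return steps


def total_minutes(steps):
    """Sum of programmed hold times in minutes (ramping excluded)."""
    return sum(seconds for _, _, seconds in steps) / 60.0


if __name__ == "__main__":
    program = quikchange_program(annealing_temp_c=55)  # e.g., V12M in Table 1
    print(f"{len(program)} steps, about {total_minutes(program):.1f} min of hold time")
```

Because the 14-min extension dominates each cycle, variants with the same annealing temperature can be grouped in one run without changing the total run time.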

3.2.2 Expression of ABCG2 Variants in Sf9 Cells

Figure 4 demonstrates the strategy for the expression of ABCG2
variants in Sf9 cells.

1. Competent DH10Bac E. coli cells are transformed by the variant
ABCG2 plasmids. The variant ABCG2 cDNA is then trans-
posed into a bacmid, which is a baculovirus shuttle vector
carrying the baculovirus genome, in DH10Bac cells with the
aid of a helper plasmid.

[Figure 4 flow diagram: human ABCG2 cDNA → pFastBac1-ABCG2 plasmid → competent DH10Bac™
E. coli cells → recombinant bacmid DNA → recombinant baculovirus → viral amplification →
infection of Sf9 cells → harvest insect cells → prepare plasma membrane vesicles →
quality control of vesicles → functional screening.]

Fig. 4 Expression of ABCG2 in Sf9 insect cells. ABCG2 cDNA is inserted into the pFastBac1 plasmid. Competent
DH10Bac E. coli cells were transformed by the variant ABCG2 plasmids. Then, the variant ABCG2 cDNA was
transposed into a bacmid, which is a baculovirus shuttle vector carrying the baculovirus genome, in DH10Bac
cells with the aid of a helper plasmid. The baculovirus has a 130-kb double-stranded DNA genome packaged
in a cigar-shaped (25 by 260 nm) enveloped nucleocapsid. Baculovirus enters insect cells via receptor-
mediated endocytosis [33]. The viral fusion protein gp64 is responsible for acid-induced endosomal escape [34].
In the cytoplasm, the nucleocapsid probably induces the formation of actin filaments, which provide a possible
mode of transport toward the nucleus [35, 36]. The recombinant bacmid was isolated and purified. Sf9 cells
were grown in EX-CELL™ 420 Insect serum-free medium and then transfected with the ABCG2-recombinant
bacmid in the presence of Cellfectin® reagent. The culture medium containing the recombinant baculovirus
was harvested, and Sf9 cells were further infected with the harvested virus and maintained at 27 °C for 72 h.
Sf9 cells expressing ABCG2 were finally harvested by centrifugation

2. The recombinant bacmid is isolated and purified.

3. Insect Spodoptera frugiperda Sf9 cells are grown in EX-CELL™
420 Insect serum-free medium (JRH Biosciences, Inc., Lenexa,
KS, USA) supplemented with 1 % (v/v) heat-inactivated FCS,
penicillin (100 U/ml), and streptomycin (100 μg/ml) (Invitrogen
Co., Carlsbad, CA, USA) with gentle shaking at 27 °C.
4. Sf9 cells are then transfected with the ABCG2-recombinant
bacmid in the presence of Cellfectin® reagent (Invitrogen Co.,
Carlsbad, CA, USA) according to the manufacturer’s
protocol.
5. Ninety-six hours after the transfection, the culture medium
containing the recombinant baculovirus is harvested by
centrifugation.
6. To amplify recombinant baculovirus, Sf9 cells are further
infected with the harvested virus and maintained at 27 °C for
72 h. After the incubation, the culture medium is harvested by
centrifugation. This process is repeated two times.
7. Sf9 cells (1 × 10⁶ cells/ml) are infected with the amplified
recombinant baculoviruses and cultured in EX-CELL™ 420
Insect serum-free medium at 27 °C with gentle shaking.
8. Three days after the infection, Sf9 cells are harvested by
centrifugation.
9. Sf9 cells are subsequently washed with PBS at 4 °C, collected
by centrifugation, and stored at −80 °C until used.

3.2.3 Preparation of the Plasma Membrane Vesicles from Sf9 Cells

Plasma membrane vesicles are prepared from ABCG2-expressing
Sf9 cells as described previously [29, 30]. The use of low ionic
strength buffers during the membrane preparation steps promotes
the formation of open membrane sheets and inside-out membrane
vesicles. It is important to maintain high integrity of the plasma
membrane vesicles used in the transport assay. In other words, the
membrane vesicles must be completely sealed. Figure 5
demonstrates the strategy for preparation of the plasma membrane
vesicles from Sf9 cells.

[Figure 5 flow diagram: 600 × 10⁶ cells in a 50-ml tube with 30 ml of hypotonic buffer
(0.5 mM Tris/HEPES, pH 7.4, and 0.1 mM EGTA); homogenization (repeated); centrifugation at
2,000 × g for 10 min; supernatant centrifuged at 100,000 × g for 30 min; pellet homogenized
in 10 ml of 0.25 M sucrose and 10 mM Tris/HEPES (pH 7.4); layered over 40 % sucrose and
10 mM Tris/HEPES (pH 7.4) and centrifuged at 100,000 × g for 30 min; interface fraction
suspended in 10 ml of 0.25 M sucrose and 10 mM Tris/HEPES (pH 7.4) and centrifuged at
100,000 × g for 30 min; pellet taken up in 0.25 M sucrose and 10 mM Tris/HEPES (pH 7.4),
divided into micro-tubes, and frozen at −80 °C as plasma membrane vesicles.]

Fig. 5 Preparation of membrane vesicles from ABCG2-expressing Sf9 cells. The frozen cell pellet was thawed
quickly, diluted 40-fold with the hypotonic buffer, and then homogenized with a Potter-Elvehjem homogenizer.
After centrifugation at 2,000 × g, the supernatant was further centrifuged at 100,000 × g for 30 min. The crude
membrane fraction was layered over 40 % (w/v) sucrose solution and centrifuged at 100,000 × g for 30 min.
The turbid layer at the interface was collected and then centrifuged at 100,000 × g for 30 min. The membrane
fraction was collected and resuspended in a small volume (150–250 μl) of 0.25 M sucrose containing 10 mM
Tris/HEPES (pH 7.4)

1. The frozen cell pellet is thawed quickly, diluted 40-fold with a
hypotonic buffer (0.5 mM Tris/HEPES, pH 7.4, 0.1 mM
EGTA) containing leupeptin (10 μg/ml), and then homoge-
nized with a Potter-Elvehjem homogenizer.
2. After centrifugation at 2,000 × g, the supernatant is further
centrifuged at 100,000 × g for 30 min. The resulting pellet is
suspended in 0.25 M sucrose containing 10 mM Tris/HEPES,
pH 7.4 and leupeptin (10 μg/ml).
3. The crude membrane fraction is layered over 40 % (w/v)
sucrose solution and centrifuged at 100,000 × g for 30 min.
4. The turbid layer at the interface is collected, suspended in
0.25 M sucrose containing 10 mM Tris/HEPES, pH 7.4, and
centrifuged at 100,000 × g for 30 min.
5. The membrane fraction is collected and resuspended in a small
volume (150–250 μl) of 0.25 M sucrose containing 10 mM
Tris/HEPES, pH 7.4.
6. After the protein concentration is measured by the BCA
Protein Assay Kit (PIERCE, Rockford, IL, USA), the membrane
solution is stored at −80 °C until used.

3.2.4 Immunological Detection of ABCG2 in Plasma Membrane Vesicles

The amount of ABCG2 expressed in the cell membrane vesicles is
determined by immunoblotting with BXP-21 (SIGNET, Dedham,
MA, USA), a specific antibody to human ABCG2, as described
above. To quantitatively analyze the transport activity of ABCG2
variants, it is critically important to normalize the expression level
of each variant protein. There is a linear relationship between the
signal intensity of immunoblotting and the logarithmic value of
the amount of protein applied to the electrophoresis [29, 30].
Based on this linear relationship, the expressed levels of ABCG2
and its variants in different plasma membrane preparations can be
quantitatively estimated and normalized [29, 30].
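
The log-linear relationship described above can be used as a calibration curve. The sketch below, which is an illustration rather than the authors' published procedure, fits signal intensity against log10 of the applied protein amount by ordinary least squares and inverts the fit to express an unknown band as an equivalent amount of the reference preparation. The dilution series and intensities in the example are hypothetical.

```python
import math


def fit_log_linear(amounts_ug, intensities):
    """Least-squares fit of intensity = intercept + slope * log10(amount)."""
    if len(amounts_ug) != len(intensities) or len(amounts_ug) < 2:
        raise ValueError("need at least two (amount, intensity) pairs")
    xs = [math.log10(a) for a in amounts_ug]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(intensities) / len(intensities)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(intensity, slope, intercept):
    """Invert the calibration: equivalent protein amount for a band intensity."""
    return 10 ** ((intensity - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical dilution series of a WT membrane preparation.
    slope, intercept = fit_log_linear([1.0, 3.0, 10.0], [120.0, 410.0, 730.0])
    wt_equivalent = estimate_amount(500.0, slope, intercept)
    print(f"band corresponds to about {wt_equivalent:.2f} ug of the WT preparation")
```

Dividing the equivalent amount by the amount of variant membrane protein actually loaded gives a relative expression factor that can be applied when comparing transport activities; the fit should only be used within the range covered by the dilution series.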

3.2.5 High-Speed Detection of ABCG2-Mediated Porphyrin Transport

ABCG2 is suggested to be responsible for the cellular homeostasis of
porphyrins and their related compounds. In fact, ABCG2 transports
protoporphyrin IX and hematoporphyrin in an ATP-dependent
manner. These porphyrins are considered to be endogenous sub-
strates of ABCG2. To evaluate the impact of nonsynonymous
SNPs on such physiological functions of ABCG2, we have devel-
oped a high-speed detection method for ABCG2-mediated por-
phyrin transport. Figure 6 illustrates the procedure of the porphyrin
transport assay.
1. The frozen stocked membrane is quickly thawed, and mem-
brane vesicles are formed by passing the membrane suspension
through a 27-gauge needle.

[Figure 6 workflow diagram: incubation medium (100 μl) containing plasma membrane vesicles
(50 μg of protein), 1 mM ATP, 10 mM MgCl2, 10 mM creatine phosphate, 100 μg/ml creatine
kinase, 0.25 M sucrose, 10 mM Tris/HEPES (pH 7.4), and porphyrins (and inhibitors);
30 μl/well incubated in a PCR machine at 4 °C for 10 s and at 37 °C for 10 min; stop solution
(10 mM EDTA, 0.25 M sucrose, 10 mM Tris/HEPES, pH 7.4), 80 μl/well; separation plate packed
with Sephadex G-25 (500 μl/well; bed volume 100 μl) and washed with 0.25 M sucrose and
10 mM Tris/HEPES (pH 7.4) at 1,600 × g, 4 °C, 5 min (×3); 50 μl/well loaded onto the
separation plate placed on a 96-well microplate and centrifuged at 1,600 × g, 4 °C, 5 min;
250 μl/well of 10 mM NaOH; fluorescence measured at excitation 405 nm and emission 612 nm.]

Fig. 6 Detection of ATP-dependent hematoporphyrin transport into plasma membrane vesicles by using a
96-well separation plate. Plasma membrane vesicles expressing human ABCG2 were prepared from Sf9 insect
cells. ATP-dependent transport of hematoporphyrin into the vesicles mediated by the action of ABCG2 was
determined by measuring the fluorescence of hematoporphyrin incorporated into the membrane vesicles [37]

2. Plasma membrane vesicles (50 μg of protein) are incubated
with 20 μM hematoporphyrin in the presence or absence of
1 mM ATP in 30 μl of the standard incubation medium
(0.25 M sucrose and 10 mM Tris/HEPES, pH 7.4, 10 mM
creatine phosphate, 100 μg/ml of creatine kinase, and 10 mM
MgCl2) at 37 °C for 10 min.
3. After a specified incubation period, the reaction mixture is
mixed with 80 μl of ice-cold stop solution (10 mM EDTA,
0.25 M sucrose, and 10 mM Tris/HEPES, pH 7.4), and then
50 μl of the resulting solution is loaded onto a 96-well separa-
tion plate (100 μl of bed volume) packed with Sephadex G-25
equilibrated with 0.25 M sucrose and 10 mM Tris/HEPES,
pH 7.4 (Fig. 6).
4. The plate is immediately centrifuged in a swing-type rotor at
1,600 × g for 5 min, whereby the eluate is collected into a
96-well microplate (Fig. 6).
5. The eluate in each well is mixed with 250 μl of 10 mM NaOH
solution to dissolve the plasma membrane vesicles.
6. Hematoporphyrin in the resulting solution is quantitatively
analyzed by measuring its fluorescence in a fluorescence
spectrophotometer (excitation at 405 nm; emission at 612 nm).
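The volume bookkeeping in the steps above can be sketched as follows. This is an illustrative calculation, not the authors' stated data reduction: it assumes that fluorescence readings have already been converted to nmol of hematoporphyrin with a standard curve, and that the rate is referred to the portion of membrane protein actually loaded onto the separation plate. The example readings are hypothetical.

```python
# Volumes and amounts taken from the protocol steps above.
INCUBATION_UL = 30.0   # reaction mixture per well
STOP_UL = 80.0         # ice-cold stop solution added per well
LOADED_UL = 50.0       # portion loaded onto the Sephadex G-25 plate
PROTEIN_UG = 50.0      # membrane protein per incubation
INCUBATION_MIN = 10.0  # incubation time at 37 degrees C


def loaded_fraction(incubation_ul, stop_ul, loaded_ul):
    """Fraction of the stopped reaction that is applied to the gel bed."""
    return loaded_ul / (incubation_ul + stop_ul)


def atp_dependent_rate(nmol_with_atp, nmol_without_atp,
                       protein_ug, fraction, minutes):
    """ATP-dependent uptake in nmol/min/mg protein."""
    protein_mg = protein_ug * fraction / 1000.0
    return (nmol_with_atp - nmol_without_atp) / (protein_mg * minutes)


fraction = loaded_fraction(INCUBATION_UL, STOP_UL, LOADED_UL)

# Hypothetical eluate contents (nmol) with and without ATP.
rate = atp_dependent_rate(0.12, 0.02, PROTEIN_UG, fraction, INCUBATION_MIN)
print(f"loaded fraction: {fraction:.3f}")
print(f"ATP-dependent transport: {rate:.2f} nmol/min/mg protein")
```

Subtracting the reading obtained without ATP removes passive binding and diffusion, so that only the ATP-dependent component is reported.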

3.2.6 High-Speed Screening to Measure the Transport Activity of ABCG2 and Its Variants
To detect the drug transport activity of ABCG2 WT and SNP
variants, we used methotrexate (MTX) as a model substrate.
Figure 7 illustrates the procedure of the functional assay. This
high-speed screening method can be used to investigate drug-ABCG2
interactions based on quantitative structure–activity relationship
(QSAR) analysis [38].
1. The frozen stocked membrane is first thawed quickly, and then
membrane vesicles are formed by passing the membrane
suspension through a 27-gauge needle.
2. To measure the ABCG2-mediated MTX transport, the stan-
dard incubation medium should contain plasma membrane
vesicles (10 or 50 μg of protein), 200 μM [3′,5′,7′-3H]MTX
(Amersham, Buckinghamshire, UK), 0.25 M sucrose, 10 mM
Tris/HEPES, pH 7.4, 10 mM MgCl2, 1 mM ATP, 10 mM
creatine phosphate, and 100 μg/ml of creatine kinase in a final
volume of 100 μl. The incubation is carried out at 37 °C.

[Figure 7: flow diagram of the assay: preparation of the incubation medium on ice, incubation at 37 °C for 20 min, quick mixing with ice-cold stop solution, transfer to a MultiScreen™ plate (270 μl/well), rinsing four times under aspiration (200 μl/well), and measurement of the radioactivity retained on the filter membrane.]

Fig. 7 High-speed screening method to study the transport activity of human ABCG2 and its SNP variants.
ATP-dependent transport of [3H]methotrexate (MTX) into plasma membrane vesicles mediated by the action of
ABCG2 was measured by counting the radioactivity remaining on the filter of MultiScreen™ plates. Inhibition
of MTX transport was detected by adding a test compound into the reaction mixture [38]
3. After a specified time (20 min for the standard condition), the
reaction medium is mixed with 1 ml of the ice-cold stop solution
(0.25 M sucrose, 10 mM Tris/HEPES, pH 7.4, and 2 mM
EDTA) to terminate the transport reaction. Subsequently, ali-
quots (270 μl per well) of the resulting mixture are transferred
to MultiScreen™ plates (Nihon Millipore KK, Tokyo, Japan).
4. Under aspiration, each well of the plate is then rinsed with the
0.25 M sucrose solution containing 10 mM Tris/HEPES, pH
7.4, four times (4 × 200 μl for each well) in an EDR384S system
(BioTec, Tokyo, Japan).
5. [3H]MTX thus incorporated into the vesicles is measured by
counting the radioactivity remaining on the filter of
MultiScreen™ plates, where each filter is placed in 2 ml of liq-
uid scintillation fluid (Ultima Gold, Packard BioScience).
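The conversion from filter radioactivity to a transport rate can be sketched in the same way. The snippet below is only an illustration under stated assumptions: the specific activity and the dpm readings are hypothetical, and the rate is referred to the share of membrane protein contained in one 270 μl aliquot of the 1.1 ml stopped reaction (100 μl of incubation medium plus 1 ml of stop solution).

```python
# Volumes and amounts taken from the protocol steps above.
INCUBATION_UL = 100.0   # final incubation volume
STOP_UL = 1000.0        # ice-cold stop solution
ALIQUOT_UL = 270.0      # volume transferred to each MultiScreen well
PROTEIN_UG = 50.0       # membrane protein per incubation
INCUBATION_MIN = 20.0   # standard incubation time

# Hypothetical specific activity of the [3H]MTX working solution.
SPECIFIC_ACTIVITY_DPM_PER_NMOL = 2200.0


def mtx_on_filter_nmol(dpm_with_atp, dpm_without_atp, dpm_per_nmol):
    """ATP-dependent MTX retained on one filter, in nmol."""
    return (dpm_with_atp - dpm_without_atp) / dpm_per_nmol


def transport_rate(nmol, protein_ug, aliquot_ul, total_ul, minutes):
    """Transport in nmol/min/mg protein for one aliquot."""
    protein_mg = protein_ug * (aliquot_ul / total_ul) / 1000.0
    return nmol / (protein_mg * minutes)


nmol = mtx_on_filter_nmol(1540.0, 1100.0, SPECIFIC_ACTIVITY_DPM_PER_NMOL)
mtx_rate = transport_rate(nmol, PROTEIN_UG, ALIQUOT_UL,
                          INCUBATION_UL + STOP_UL, INCUBATION_MIN)
print(f"ATP-dependent MTX transport: {mtx_rate:.3f} nmol/min/mg protein")
```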

4 Notes

4.1 Notes for Evaluation of the Impact of Nonsynonymous SNPs on Protein Degradation
4.1.1 Flp-In Cell Lines
The Flp-In-293 cell line is a useful cell system for studying the
molecular mechanism of protein misfolding and the subsequently
occurring ERAD process. Flp-In-293 cells are not polarized cells.
Therefore, for studying the apical or basolateral localization of
membrane proteins, MDCK (Madin-Darby canine kidney) and
LLC-PK1 (porcine kidney) cells may be applicable.
1. The Flp-In method is based on the exchange of an expression
cassette within a previously tagged FRT site. M-FISH revealed
that ABCG2 cDNA was incorporated into the telomeric region
of chromosome 12p in Flp-In-293 cells [26, 27].
2. As shown in Fig. 8a, mRNA levels of ABCG2 WT and SNP
variants (V12M, Q141K, F208S, S248P, F431L, S441N, and
F489L) were evenly represented in Flp-In-293 cells. ABCG2 WT,
the SNP variants, and GAPDH were also detected at the protein
level by immunoblotting, and their expression levels needed to
be quantified. For this purpose, we
treated all of the samples with PNGase F and mercaptoethanol
to remove glycomoieties and to break the cysteinyl disulfide
bond forming a homodimer. Since there was a linear relationship
between the signal intensity of immunoblotting and the
logarithmic value of the amount of ABCG2 protein applied to
the electrophoresis, the expression level of ABCG2 or GAPDH
in cell lysate samples could be quantitatively estimated based on
the linear relationship [28, 29, 32]. The relative values of
protein levels were then normalized to the ratio of ABCG2
WT/GAPDH. Although mRNA levels were almost the same in
the WT and SNP variants, protein levels of the F208S and S441N
variants were markedly decreased (Fig. 8a). The protein
level of the Q141K variant was about half that of the WT level.
3. The immunofluorescence images of Flp-In-293 cells expressing
ABCG2 WT or SNP variants revealed that F208S and S441N
variant proteins were not expressed in the plasma membrane
(Fig. 8b). The S441N variant appeared to remain in the
intracellular space, most probably located in aggresomes. The other
variants, i.e., V12M, Q141K, S248P, F431L, and F489L, were
expressed in the plasma membrane as was ABCG2 WT.

[Figure 8a: RT-PCR and immunoblot panels for ABCG2 and GAPDH, with bar charts of relative mRNA level and relative protein level for Mock, WT, V12M, Q141K, F208S, S248P, F431L, S441N, and F489L; asterisks mark statistically significant differences.]

Fig. 8 mRNA and protein expression levels (a) as well as immunofluorescence images of Flp-In-293 cells
expressing ABCG2 WT or SNP variants (b). (a) Relative levels of mRNA were detected by RT-PCR with specific
primers for ABCG2 and GAPDH. Data are calculated as ratios by referring to the GAPDH mRNA levels in
Flp-In-293 cells and normalized to the ratio of ABCG2/GAPDH. Data are expressed as mean values ± SD (n = 4).
Relative levels of ABCG2 protein were detected by immunoblotting. ABCG2 protein was detected by immunoblot
analysis with BXP-21 monoclonal antibody. Data are calculated as ratios by referring to the GAPDH protein
levels in Flp-In-293 cells and normalized to the ratio of ABCG2/GAPDH. Data are expressed as means ± SD in
triplicate experiments. Statistical significance (*P < 0.05) was evaluated by Student’s t-test. (b) The ABCG2
protein was immunologically linked with Alexa Fluor 488 (green fluorescence), and nuclei were stained with
propidium iodide (red fluorescence). Horizontal bars correspond to 20 μm
4. MG132 and bafilomycin A1 (BMA) are potent proteolysis
inhibitors in proteasomes and lysosomes, respectively [6, 27, 32].
By using these inhibitors, we could identify protein degrada-
tion pathways for ABCG2 WT and SNP variants. Flp-In-293
cells expressing F208S or S441N were incubated in the presence
of MG132 (2.0 μM) for 24 h, and then cell lysate samples were
immediately prepared. Protein expression levels of the F208S
and S441N variants were determined by immunoblotting after
PNGase F treatments. As shown in Fig. 9a, the protein levels of
those ABCG2 variants were remarkably enhanced by the treat-
ment with the proteasome inhibitor MG132 in a concentration-
dependent manner. In contrast, the protein level of the ABCG2
WT was not significantly affected by MG132 treatment; however,
it was significantly enhanced by BMA treatment. These results
suggest that ABCG2 WT is degraded mainly in lysosomes,
whereas the F208S and S441N variants undergo ubiquitination
and proteasomal degradation (Fig. 9b).
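The GAPDH normalization used in the notes above reduces to a ratio of ratios. The sketch below assumes that ABCG2 and GAPDH amounts have already been estimated from the immunoblot calibration; the numerical values are hypothetical and were chosen only so that the Q141K example lands at about half the WT level, as described in the text.

```python
def relative_protein_level(abcg2, gapdh, abcg2_wt, gapdh_wt):
    """(ABCG2/GAPDH) of a sample divided by (ABCG2/GAPDH) of WT."""
    return (abcg2 / gapdh) / (abcg2_wt / gapdh_wt)


# Hypothetical estimated amounts (arbitrary units): (ABCG2, GAPDH).
estimates = {
    "WT": (1000.0, 500.0),
    "Q141K": (520.0, 520.0),
    "S441N": (60.0, 480.0),
}

wt_abcg2, wt_gapdh = estimates["WT"]
relative = {
    name: relative_protein_level(abcg2, gapdh, wt_abcg2, wt_gapdh)
    for name, (abcg2, gapdh) in estimates.items()
}

for name, value in relative.items():
    print(f"{name}: {value:.2f} relative to WT")
```

The same function can be applied to inhibitor experiments by using the untreated control in place of WT as the reference.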

4.1.2 Ubiquitin-Mediated Proteasomal Degradation of Other ABC Transporters
The ubiquitin-mediated proteasomal degradation of drug metabolizing
enzymes and transporters is a new aspect of pharmacogenomics.
The present review addresses ER protein quality control
and ubiquitin-mediated proteasomal degradation of human ABC
transporters. A recent review [6] summarizes the effect of nonsynonymous
mutations and SNPs on protein maturation, intracellular
trafficking, or ERAD of ABC transporters. While there are many
reports on impaired protein processing and enhanced degradation
of mutation variants of disease-associated ABC transporters, only
limited information is presently available for the genetic polymorphisms
of drug-transporting ABC transporters.

Fig. 9 Effect of bafilomycin A1 (BMA) and MG132 (MG) on the protein levels of ABCG2 WT, F208S, and S441N
(a) as well as a schematic illustration of plausible pathways involved in the degradation of ABCG2 protein (b).
(a) Flp-In-293 cells expressing WT, F208S, or S441N were incubated in the absence or presence of BMA
(10 nM) or MG (2.0 μM) for 24 h. ABCG2 WT, F208S, and S441N variant proteins were analyzed by immunoblotting
with the ABCG2-specific monoclonal antibody (BXP-21) after PNGase F treatment. The ABCG2 protein level in the
cell lysate of each cell population was analyzed by immunoblotting with the ABCG2-specific monoclonal antibody
(BXP-21) or the GAPDH-specific antibody after PNGase F treatment. The signal intensity ratio (ABCG2/GAPDH)
was normalized to the control level (labeled as “None”). Data are expressed as means ± SD in triplicate experiments.
(b) The correctly processed ABCG2 WT is finally destined to reach the plasma membrane and is then
degraded by the endosome–lysosome pathway after remaining in the plasma membrane domain for a certain
period. In contrast, the misfolded ABCG2 protein undergoes ubiquitination-mediated proteasomal degradation.
Bafilomycin A1 (BMA) and MG132 inhibit lysosomal and proteasomal degradation, respectively

4.2 Notes for Functional Evaluation of Nonsynonymous SNPs
4.2.1 Quality of Plasma Membrane Vesicles Prepared from Sf9 cells
To examine the quality of plasma membrane vesicles prepared from
Sf9 cells, we used scanning electron microscopy (SEM) technologies
and identified the optimal conditions required to prepare the
membrane vesicles. SEM revealed that well-sealed membrane
vesicles have an average size (diameter) of about 200 nm [38].
The timing of harvesting Sf9 cells after baculovirus infection is very
critical. The membrane morphology of infected Sf9 cells changed
greatly; in particular, numerous pores were observed after day 5.
Membrane vesicles prepared from those cells (>day 5) are useless
for our purpose.
1. It is important to prepare membrane vesicles in the presence of
serine/cysteine protease inhibitors. Leupeptin (10 μg/ml)
inhibits the degradation of ABCG2 protein in membrane
vesicles prepared from baculovirus-infected Sf9 cells during
repetitive freeze–thaw cycles.
2. Membrane vesicles (suspended in 250 mM sucrose and 10 mM
Tris/HEPES, pH 7.4) can be stored at −80 °C or in liquid
nitrogen until used. For long-term (over 1 year) storage of
membrane vesicles, however, we recommend substituting
trehalose for sucrose in the membrane vesicle preparations
[39]. Trehalose (α-D-glucopyranosyl α-D-glucopyranoside) is a
nonreducing disaccharide comprising two glucose molecules
joined by an α,α-1,1 linkage. Trehalose is a stress protectant in
biological systems as it interacts with and directly protects lipid
membranes and proteins from desiccation and during freezing
[40–42].

4.2.2 Gel-Filtration Assay Method for the Transport of Hydrophobic Compounds
Whereas the rapid filtration method is widely used for the transport
assay, it is not applicable for assaying porphyrin transport. Since
hematoporphyrin is bound to the filter membrane surface, it causes
high background levels in the transport measurements [30, 37].
Therefore, we applied gel-filtration to the porphyrin transport
assay, as shown in Fig. 6. Based on our experiences, we recommend
the gel-filtration method when hydrophobic compounds are used
as substrates of the transporter of interest.

4.2.3 Quantitative Analysis of the Transport Activity of SNP Variants
To quantitatively compare the transport activity of each SNP
variant, it is important to calibrate the immunoblotting signal
intensities against ABCG2 protein levels. There is a linear relationship
between the signal intensity of immunoblotting and the
logarithmic value of the amount of protein applied to the
electrophoresis [30]. Based on this linear relationship, the expression
levels of ABCG2 and its variants in different plasma membrane
preparations can be quantitatively estimated and normalized.
1. Figure 10 demonstrates the ATP-dependent transport of
hematoporphyrin (upper panel) and methotrexate (lower
panel) mediated by ABCG2 and its variants. Plasma membrane
vesicles (50 μg of protein) expressing ABCG2 and its variants
are incubated with 20 μM hematoporphyrin or 200 μM [3H]
methotrexate in the presence of 1 mM ATP. Each transport
activity is calculated by considering the normalized levels of
ABCG2 protein expression [30].
2. It is important to note that the variants Q126stop, F208S,
S248P, E334stop, and S441N lack substantial transport activity
for both hematoporphyrin and methotrexate. Interestingly, the
F489L variant, which does not transport methotrexate, exhibits
impaired hematoporphyrin transport (Vmax = 0.058 nmol/min/
mg protein, Km = 8.6 μM for F489L vs. Vmax = 0.654 nmol/min/
mg protein, Km = 17.8 μM for WT).
3. The F431L variant as well as the acquired mutants R482G and
R482T transport hematoporphyrin (upper panel), although
they do not transport methotrexate (lower panel). These
results provide evidence that certain nonsynonymous SNPs
and acquired mutations greatly affect the substrate specificity as
well as the protein expression level of ABCG2.

[Figure 10: bar charts of porphyrin transport (upper panel) and methotrexate transport (lower panel), each in nmol/min/mg protein, for Mock, WT, V12M, G51C, Q126stop, Q141K, T153M, Q166E, I206L, F208S, S248P, E334stop, F431L, S441N, R482G, R482T, F489L, F571I, N590Y, and D620N.]

Fig. 10 ATP-dependent transport of hematoporphyrin (upper panel) and methotrexate (lower panel) mediated
by ABCG2 and its variants. Plasma membrane vesicles (50 μg of protein) were incubated with 20 μM hematoporphyrin
or 200 μM [3H]MTX in the presence or absence of 1 mM ATP in the standard incubation medium at
37 °C for 10 min (hematoporphyrin) or 20 min (MTX). The ATP-dependent transport of hematoporphyrin or MTX
is normalized for the amount of ABCG2 protein as described previously [6]. Data are expressed as means ± SD
in triplicate experiments
4. We functionally classified these nonsynonymous polymorphisms
(V12M, Q141K, F208S, S248P, F431L, S441N, and F489L) and
acquired mutants (R482G, R482T) in terms of their protein expression
level, drug resistance profile, and prazosin-stimulated ATPase
activity. Figure 11 summarizes the functional properties of these
variants and the acquired mutants. Based on the experimental data
hitherto obtained, those variants and mutants are classified into
four groups: group 1 (WT, V12M, Q141K); group 2 (F208S,
S441N); group 3 (S248P, F431L, F489L); and group 4 (R482G,
R482T) [28].
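The kinetic constants quoted in note 2 for hematoporphyrin transport can be compared directly with the Michaelis-Menten equation, v = Vmax·[S]/(Km + [S]). The sketch below uses only the Vmax and Km values given in the text; evaluating the rate at 20 μM (the substrate concentration of the standard assay) and using Vmax/Km as a summary of transport efficiency are illustrative choices, not part of the original analysis.

```python
def michaelis_menten(vmax, km, substrate_um):
    """Initial rate (nmol/min/mg protein) at a given substrate concentration."""
    return vmax * substrate_um / (km + substrate_um)


# Vmax (nmol/min/mg protein) and Km (uM) for hematoporphyrin, from note 2.
KINETICS = {
    "WT": {"vmax": 0.654, "km": 17.8},
    "F489L": {"vmax": 0.058, "km": 8.6},
}

SUBSTRATE_UM = 20.0  # hematoporphyrin concentration in the standard assay

rates = {
    name: michaelis_menten(p["vmax"], p["km"], SUBSTRATE_UM)
    for name, p in KINETICS.items()
}
efficiency = {name: p["vmax"] / p["km"] for name, p in KINETICS.items()}

for name in KINETICS:
    print(f"{name}: v(20 uM) = {rates[name]:.3f} nmol/min/mg, "
          f"Vmax/Km = {efficiency[name]:.4f}")
print(f"F489L/WT efficiency ratio: {efficiency['F489L'] / efficiency['WT']:.2f}")
```

Although the Km of F489L is lower than that of WT, its roughly tenfold lower Vmax dominates, so the predicted rate at 20 μM is well below the WT rate.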

4.2.4 Q141K as a Risk Factor for Gout, Cardiovascular Disease, and Diabetes
Large meta-analyses of genome-wide association studies (GWAS)
have revealed that one SNP in the ABCG2 gene is strongly associated
with the phenotype of gout [43–45]. Several laboratories have
independently reported that the nonsynonymous SNP 421C > A
(Q141K) in the ABCG2 gene is one of the major genetic factors
for elevated serum uric acid levels and the risk of gout [46, 47].
Uric acid is the end product of purine metabolism in humans.
Two-thirds of the uric acid in the human body is normally excreted
through the kidney, whereas one-third gains entrance to the gut
where it undergoes uricolysis (decomposition of uric acid). Impaired
expression of the Q141K variant of ABCG2 leads to
elevated serum uric acid levels, which cause gout and are a risk factor for
cardiovascular disease and diabetes. This provides evidence that
ABCG2 expressed on the apical side of the proximal tubular cells
in human kidney plays a pivotal role in the renal excretion of serum
uric acid.

                            WT  V12M  Q141K  F208S  S248P  F431L  S441N  F489L  R482G  R482T
Protein expression          +   +     +      −      +      +      −      +      +      +
MTX transport               +   +     +      −      −      −      −      ±      −      −
Porphyrin transport         +   +     +      −      −      +      −      ±      +      +
SN-38 resistance            +   +     +      −      ±      +      −      −      +      +
MX resistance               +   +     +      −      −      ±      −      −      +      +
Doxorubicin resistance      −   −     −      −      −      −      −      −      +      +
Daunorubicin resistance     −   −     −      −      −      −      −      −      +      +
ATPase activity (Prazosin)  −   −     −      −      −      −      −      −      +      +
The expression of Q141K protein is reduced by about 50% due to both lysosomal and proteasomal degradation.

Fig. 11 Characterization of ABCG2 WT and SNP variants. The properties of ABCG2 WT and SNP variants were
characterized as + (positive), − (negative), or ± (marginal) according to the following indexes: protein expression,
transport of methotrexate (MTX) or porphyrin, resistance to SN-38, mitoxantrone (MX), doxorubicin, or daunorubicin,
and prazosin-stimulated ATPase activity. Data are from ref. 28

Acknowledgments

The study performed in the authors’ laboratory was supported by
the NEDO International Joint Research Grant program
“International standardization of functional analysis technology
for genetic polymorphisms of drug transporters” as well as a
Grant-in-Aid for Scientific Research (A) (No. 18201041) and Grants for
Exploratory Research (No. 19659136 and No. 23650619) from
the Japan Society for the Promotion of Science (JSPS).

References

1. Evans WE, Johnson JA (2001) Pharmacogenomics: the inherited basis for interindividual differences in drug response. Annu Rev Genomics Hum Genet 2:9–39
2. Evans WE, Relling MV (1999) Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286:487–491
3. Kroetz DL, Yee SW, Giacomini GK (2010) The pharmacogenomics of membrane transporters project: research at the interface of genomics and transporter pharmacology. Clin Pharmacol Ther 87:109–116
4. Kim RB (2002) Pharmacogenetics of CYP enzymes and drug transporters: remarkable recent advances. Adv Drug Deliv Rev 54:1241–1242
5. Ishikawa T, Tsuji A, Inui K et al (2004) The genetic polymorphism of drug transporters: functional analysis approaches. Pharmacogenomics 5:67–99
6. Nakagawa H, Toyoda Y, Wakabayashi-Nakao K et al (2011) Ubiquitin-mediated proteasomal degradation of ABC transporters: a new aspect of genetic polymorphisms and clinical impacts. J Pharm Sci 100:3602–3619
7. Ellgaard L, Molinari M, Helenius A (1999) Setting the standards: quality control in the secretory pathway. Science 286:1882–1888
8. Mori K (2000) Tripartite management of unfolded proteins in the endoplasmic reticulum. Cell 101:451–454
9. Hampton RY (2002) ER-associated degradation in protein quality control and cellular regulation. Curr Opin Cell Biol 14:476–482
10. Kleizen B, Braakman I (2004) Protein folding and quality control in the endoplasmic reticulum. Curr Opin Cell Biol 16:343–349
11. Sauer B (1994) Site-specific recombination: developments and applications. Curr Opin Biotechnol 5:521–527
12. Gronostajski RM, Sadowski PD (1985) Determination of DNA sequences essential for FLP-mediated recombination by a novel method. J Biol Chem 260:12320–12327
13. Jayaram M (1985) Two-micrometer circle site-specific recombination: the minimal substrate and the possible role of flanking sequences. Proc Natl Acad Sci USA 82:5875–5879
14. Senecoff JF, Bruckner RC, Cox MM (1985) The FLP recombinase of the yeast 2-micron plasmid: characterization of its recombination site. Proc Natl Acad Sci USA 82:7270–7274
15. Andrews BJ, Proteau GA, Beatty LG et al (1985) The FLP recombinase of the 2 micron circle DNA of yeast: interaction with its target sequences. Cell 40:795–803
16. Boshart M, Weber F, Jahn G et al (1985) A very strong enhancer is located upstream of an immediate early gene of human cytomegalovirus. Cell 41:521–530
17. Nelson JA, Reynolds-Kohler C, Smith BA (1987) Negative and positive regulation by a short segment in the 5'-flanking region of the human cytomegalovirus major immediate-early gene. Mol Cell Biol 7:4125–4129
18. Andersson S, Davis DL, Dahlback H et al (1989) Cloning, structure, and expression of the mitochondrial cytochrome P-450 sterol 26-hydroxylase, a bile acid biosynthetic enzyme. J Biol Chem 264:8222–8229
19. Gritz L, Davies J (1983) Plasmid-encoded hygromycin B resistance: the sequence of hygromycin B phosphotransferase gene and its expression in Escherichia coli and Saccharomyces cerevisiae. Gene 25:179–188
20. Broach JR, Hicks JB (1980) Replication and recombination functions associated with the yeast plasmid, 2 mu circle. Cell 21:501–508
21. Broach JR, Guarascio VR, Jayaram M (1982) Recombination within the yeast plasmid 2 mu circle is site-specific. Cell 29:227–234
22. Argos P, Landy A, Abremski K et al (1986) The integrase family of site-specific recombinases: regional similarities and global diversity. EMBO J 5:433–440
23. Craig NL (1988) The mechanism of conservative site-specific recombination. Annu Rev Genet 22:77–105
24. Buchholz F, Ringrose L, Angrand PO et al (1996) Different thermostabilities of FLP and Cre recombinases: implications for applied site-specific recombination. Nucleic Acids Res 24:4256–4262
25. Mitomo H, Kato R, Ito A et al (2003) A functional study on polymorphism of the ATP-binding cassette transporter ABCG2: critical role of arginine-482 in methotrexate transport. Biochem J 373:767–774
26. Tamura A, Wakabayashi K, Onishi Y et al (2006) Genetic polymorphisms of human ABC transporter ABCG2: development of the standard method for functional validation of SNPs by using the Flp recombinase system. J Exp Ther Oncol 6:1–11
27. Wakabayashi-Nakao K, Tamura A, Koshiba S et al (2010) Production of cells with targeted integration of gene variants of human ABC transporter for stable and regulated expression using the Flp recombinase system. Methods Mol Biol 648:139–159
28. Tamura A, Wakabayashi K, Onishi Y et al (2007) Re-evaluation and functional classification of non-synonymous single nucleotide polymorphisms of the human ATP-binding cassette transporter ABCG2. Cancer Sci 98:231–239
29. Ishikawa T, Sakurai A, Kanamori Y et al (2005) High-speed screening of human ATP-binding cassette transporter function and genetic polymorphisms: new strategies in pharmacogenomics. Methods Enzymol 400:485–510
30. Tamura A, Watanabe M, Saito H et al (2006) Functional validation of the genetic polymorphisms of human ATP-binding cassette (ABC) transporter ABCG2: identification of alleles that are defective in porphyrin transport. Mol Pharmacol 70:287–296
31. Ishikawa T, Tamura A, Saito H et al (2005) Pharmacogenomics of the human ABC transporter ABCG2: from functional evaluation to drug molecular design. Naturwissenschaften 92:451–463
32. Nakagawa H, Tamura A, Wakabayashi K et al (2008) Ubiquitin-mediated proteasomal degradation of non-synonymous SNP variants of human ABC transporter ABCG2. Biochem J 411:623–631
33. Wang P, Hammer DA, Granados RR (1997) Binding and fusion of Autographa californica nucleopolyhedrovirus to cultured insect cells. J Gen Virol 78:3081–3089
34. Blissard GW, Wenz JR (1992) Baculovirus gp64 envelope glycoprotein is sufficient to mediate pH-dependent membrane fusion. J Virol 66:6829–6835
35. Lanir LM, Volkman LE (1998) Actin binding and nucleation by Autographa californica M nucleopolyhedrovirus. Virology 243:167–177
36. Whittaker GR, Helenius A (1998) Nuclear import and export of viruses and virus genomes. Virology 246:1–23
37. An R, Hagiya Y, Tamura A et al (2009) Cellular phototoxicity evoked through the inhibition of human ABC transporter ABCG2 by cyclin-dependent kinase inhibitors in vitro. Pharm Res 26:449–458
38. Saito H, Hirano H, Nakagawa H et al (2006) A new strategy of high-speed screening and quantitative structure-activity relationship analysis to evaluate human ATP-binding cassette transporter ABCG2-drug interactions. J Pharmacol Exp Ther 317:1114–1124
39. Saito H, Hirano H, Shin W et al (2009) Technical pitfalls and improvements in high-speed screening and QSAR analysis to predict drug–drug interactions of ABC transporter ABCB11 (bile salt export pump). AAPS J 11:581–589
40. Elbein AD, Pan YT, Pastuszak I et al (2003) New insights on trehalose: a multifunctional molecule. Glycobiology 13:17R–27R
41. Furuki T, Oku K, Sakurai M (2009) Thermodynamic, hydration and structural characterization of alpha, alpha-trehalose. Front Biosci 14:3523–3535
42. Guo N, Puhlev I, Brown DR et al (2008) Trehalose expression confers desiccation tolerance on human cells. Nat Biotechnol 18:168–171
43. Deghan A, Köttgen A, Yang Q et al (2008) Association of three genetic loci with uric acid concentration and risk of gout: a genome-wide association study. Lancet 372:1953–1961
44. Kolz M, Johnson T, Sanna S et al (2009) Meta-analysis of 28,141 individuals identifies common variants within five new loci that influence uric acid concentrations. PLoS Genet 5:e1000504
45. Stark K, Reinhard W, Grassi M et al (2009) Common polymorphisms influencing serum uric acid levels contribute to susceptibility to gout, but not to coronary artery disease. PLoS One 4:e7729
46. Woodward O, Köttgen A, Coresh J et al (2009) Identification of a urate transporter, ABCG2, with a common functional polymorphism causing gout. Proc Natl Acad Sci USA 106:10338–10342
47. Matsuo H, Takada T, Ichida K et al (2009) Common defects of ABCG2, a high-capacity urate exporter, cause gout. A function-based genetic analysis in a Japanese population. Sci Transl Med 1:5ra11

Chapter 16

In Vitro Identification of Cytochrome P450 Enzymes Responsible for Drug Metabolism

Zhengyin Yan and Gary W. Caldwell

Abstract
Metabolism catalyzed by the cytochrome P450 enzymes (CYPs) represents the most important pathway
for drug metabolism and elimination in humans. Identification of the CYPs responsible for metabolism of
existing and novel drugs is critical for the prediction of adverse reactions caused by drug–drug interactions
or individual genetic polymorphism. An integrated approach is described for CYP-mediated metabolic
reaction phenotyping using both recombinant enzymes and human liver microsomes in combination with
selective inhibitors or inhibitory antibodies. The in vitro method described includes screening of recombi-
nant CYPs for metabolic activity, chemical inhibition or antibody neutralization, and correlation analysis
with isoform-selective marker activities. The primary focus is on identification of the most common
enzymes including CYP1A2, 2C9, 2C19, 2D6, and 3A4, although the same strategy could potentially be
used for identification of other isoforms.

Key words Cytochrome P450, CYPs, Phenotyping, Metabolism

1 Introduction

Cytochrome P450s (CYPs) are a superfamily of enzymes that play
a pivotal role in metabolism and elimination of drugs. It has been
estimated that approximately two-thirds of drugs on the market
are metabolized by this group of enzymes. Among these CYPs,
three subgroups (CYP1, CYP2, and CYP3) are largely responsible
for the metabolism of marketed drugs and also xenobiotics [1].
In particular, CYP1A2, 2C9, 2C19, 2D6, and 3A4/5 play a major
role in the metabolism of the vast majority of drugs. As a result,
inhibition of these CYPs by co-administered drugs represents a
principal mechanism for metabolism-based drug–drug interactions,
which can potentially lead to severe clinical consequences and even
withdrawal of drugs from the market [2]. In addition, genetic
variations in CYPs and various polymorphisms have been well
documented [3–6]. If a clinically used drug is predominantly metabolized
by a polymorphic CYP such as CYP2D6, 2C19, or 2C9, genetic
variations in the expression of the enzyme can potentially play a
great role in therapeutic efficacy and drug toxicity. Therefore,
identification and characterization of the enzyme(s) responsible for
the metabolism of a given drug (CYP-phenotyping) has become an
important task in drug discovery and development.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_16, © Springer Science+Business Media, LLC 2013

Major Metabolite Profiling in HLM
• Identify major metabolites
• Rule out involvement of non-CYP enzymes

Optimization of Metabolic Reactions
• Determine linear protein conc. range
• Determine linear incubation time
• Determine Km and Vmax

CYP-selective Inhibition Studies
• CYP-selective chemical inhibitors
• CYP-selective inhibitory antibodies

Recombinant CYP Screening
• Activity screen of all available CYPs
• Normalize activity relative to CYP content in liver

Integration & Prediction
• In-vitro kinetic analysis
• In-vivo PK data

Correlation Analysis using HLM panel
• Determine correlation coefficient (r) for each CYP

Fig. 1 An integral approach for CYP phenotyping

During the past decades, a variety of in vitro reagents and
tools have been developed to routinely determine and characterize
which CYP enzyme(s) is involved in the metabolism of a given
drug [7–9]. As shown in Fig. 1, metabolite profiling in pooled
human liver microsomes (HLM) fortified with NADPH is a prerequisite
for CYP reaction phenotyping. It is followed by
various kinetic studies to determine the linear ranges of protein
concentration and incubation time for metabolite formation, as
well as the kinetic parameters (Km and Vmax) in microsomal incubations.
Additionally, it is generally recognized that CYP reaction
phenotyping requires the integration of data obtained from various
in vitro assays such as CYP activity correlation and inhibition studies,
as well as integration of clinical data including clinical PK and
drug interaction studies [10, 11]. This chapter is limited to the in
vitro protocols commonly used for CYP-phenotyping.
In Vitro Identification of Cytochrome P450 Enzymes... 253

2 Materials

2.1 Buffers, Cofactors, and Stop Solution

All reagents were obtained from Sigma-Aldrich (St. Louis, MO)
except for those specified.
1. 0.5 M Potassium phosphate buffer, pH 7.4, is prepared as
follows:
(a) 0.5 M Potassium phosphate, KH2PO4, monobasic. Dissolve
34 g KH2PO4 in 450 mL deionized water, and then bring
the final volume to 500 mL with deionized water.
(b) 0.5 M Potassium phosphate, K2HPO4, dibasic. Dissolve
57 g K2HPO4·3H2O in 450 mL deionized water, and then
bring the final volume to 500 mL with deionized water.
(c) Mix 60 mL 0.5 M KH2PO4 with 280 mL 0.5 M K2HPO4,
and check with a pH meter for a pH value of 7.4. If
necessary, adjust pH with either KH2PO4 or K2HPO4.
2. 5 mM Sodium citrate, tribasic. Dissolve 147 mg sodium
citrate tribasic dihydrate in 100 mL deionized water, and store
at 4 °C.
3. Co-factors: Dissolve 400 mg nicotinamide adenine dinucleo-
tide phosphate (NADP+), 400 mg glucose-6-phosphate, and
266 mg MgCl2·6H2O in 18 mL deionized water, and then
adjust the final volume to 20 mL with deionized water. Aliquot
and store at −20 °C.
4. Glucose-6-phosphate dehydrogenase (G6PDH): 40 U/mL,
prepared in 5 mM sodium citrate. Aliquot and store at −20 °C.
5. Stop solution: acetonitrile containing 0.5 μM propranolol or
an equivalent as an internal standard for LC-MS/MS analysis.
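The weighed amounts in this subsection follow from mass = molarity × volume × molecular weight. The short Python sketch below reproduces that arithmetic as a bench-side check. The function names are illustrative, and the molecular weights (136.09 g/mol for KH2PO4, 228.22 g/mol for K2HPO4·3H2O, and 203.30 g/mol for MgCl2·6H2O) are standard values assumed here rather than taken from the protocol, so they should be confirmed against the reagent labels.

```python
def grams_needed(molarity_mol_per_l, volume_ml, mol_weight_g_per_mol):
    """Mass of solid (g) needed for a solution of the given molarity and volume."""
    return molarity_mol_per_l * (volume_ml / 1000.0) * mol_weight_g_per_mol


def molarity_from_mass(mass_g, volume_ml, mol_weight_g_per_mol):
    """Molarity (mol/L) obtained when mass_g of solid is made up to volume_ml."""
    return mass_g / mol_weight_g_per_mol / (volume_ml / 1000.0)


# Items 1(a) and 1(b): 500 mL of each 0.5 M phosphate stock.
# Molecular weights are assumed standard values (see lead-in).
kh2po4_g = grams_needed(0.5, 500, 136.09)
k2hpo4_g = grams_needed(0.5, 500, 228.22)

# Item 3: 266 mg MgCl2·6H2O made up to 20 mL, expressed in mM.
mgcl2_mm = molarity_from_mass(0.266, 20, 203.30) * 1000

print(f"KH2PO4: {kh2po4_g:.1f} g, K2HPO4·3H2O: {k2hpo4_g:.1f} g")
print(f"MgCl2 in cofactor mix: {mgcl2_mm:.1f} mM")
```

The computed masses agree with the 34 g and 57 g given in items 1(a) and 1(b).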

2.2 CYP Inhibitors, Substrates, and Antibodies

1. CYP-selective inhibitors: All inhibitors and their effective
concentrations are listed in Table 1.
2. CYP marker substrates (optional).
(a) Phenacetin (CYP1A2).
(b) Coumarin (CYP2A6).
(c) (S)-Mephenytoin (CYP2C19 and CYP2B6).
(d) Paclitaxel (CYP2C8).
(e) Diclofenac (CYP2C9).
(f) Bufuralol (CYP2D6).
(g) Chlorzoxazone (CYP2E1).
(h) Testosterone (CYP3A4).
3. Inhibitory antibodies. Polyclonal or monoclonal antibodies
raised against CYP1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1,
and 3A4 were purchased from XenoTech, LLC (Lenexa, KS) or
another supplier (BD Biosciences, Woburn, MA).

Table 1
Chemical inhibitors and effective concentrations for in vitro
CYP-phenotyping [10]

CYP      Inhibitor                   Inhibitor concentration (μM)
1A2      α-Naphthoflavone            1
2A6      Methoxsalen                 1
2B6      ThioTEPA                    50
2C8      Montelukast                 0.1
2C9      Sulphaphenazole             10
2C19     N-3-Benzylphenobarbital     1
2D6      Quinidine                   1
2E1      Diethyldithiocarbamate      50
3A4/5    Ketoconazole                1

2.3 CYP Enzymes and Human Liver Microsomes

1. Pooled human liver microsomes: HLM prepared from 20 to
50 donors were obtained from BD Biosciences (Woburn, MA)
and stored at −80 °C.
2. cDNA-expressed cytochrome P450: Supersomes™ enzymes
such as CYP1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and
3A4 were all purchased from BD Biosciences (Woburn, MA)
and stored at −80 °C (see Note 1).
3. Individual human liver microsomal panel: Individual liver
microsomes prepared from 10 to 15 different donors. Each
HLM preparation was fully characterized using marker
substrates for CYP-specific activity, including CYP1A2, 2A6,
2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4/5 (BD Biosciences).

2.4 Instrumentation for Analysis

LC-MS/MS analyses were performed on an ABI/MDS Sciex 4000
QTRAP mass spectrometer (Toronto, Canada) or a comparable
MS coupled with a CTC LEAP auto-sampler and a Shimadzu 20A
HPLC system (Canby, OR). The mass spectrometer was operated in
the electrospray ionization positive (ESI+) mode using the
following conditions: ion spray voltage 5,500 V, turbo gas temperature
450 °C, entrance potential 10 V, nebulizing gas 30, and turbo gas
30. MS analytical parameters were optimized for each analyte.
Aliquots of 15 μL samples were injected onto a Princeton
SPHER-100 C18 column (2.0 × 50 mm, 5 μm) with mobile phases
of 1 % acetic acid in water and acetonitrile at a flow rate of
0.4 mL/min. The metabolites were eluted using a single gradient from
95 % aqueous to 95 % acetonitrile over 8 min, and then the column
was flushed with 95 % acetonitrile for 2 min before re-equilibration
at the initial condition. During the run, the divert valve was
activated to direct the HPLC eluant to the waste line for the first
1.5 min of elution and then switched to the mass spectrometer for
analysis. LC-MS data were processed by Analyst 1.4.2 (ABI Sciex)
to obtain peak areas of each analyte, which were normalized relative
to the internal standard.
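Normalization to the internal standard can be sketched as follows. This is a minimal Python illustration and not the calculation performed by the Analyst software; the peak areas are hypothetical and the function name is an assumption made for the example.

```python
from statistics import mean


def area_ratio(analyte_area, internal_std_area):
    """Analyte peak area divided by the internal-standard peak area."""
    if internal_std_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / internal_std_area


# HYPOTHETICAL triplicate injections: (metabolite area, internal standard area).
replicates = [(12500.0, 50000.0), (13000.0, 52000.0), (11760.0, 49000.0)]

ratios = [area_ratio(analyte, istd) for analyte, istd in replicates]
mean_ratio = mean(ratios)

print("area ratios:", [round(r, 3) for r in ratios])
print(f"mean area ratio: {mean_ratio:.3f}")
```

Working with the ratio rather than the raw area compensates for injection-to-injection variation in volume and ionization, which is why the same internal standard concentration is kept in every stop solution.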

3 Methods

3.1 Metabolism by Recombinant CYPs (See Note 2)

The following procedure assumes a 300-μL incubation volume in
triplicate but can be scaled to other volumes.
1. Dissolve the drug compound in acetonitrile to make a stock
solution (see Note 3).
2. Label the wells of a 96-well incubation plate as “1A2,” “2C9,”
“2C19,” “3A4,” etc. and “control” (three replicates are needed
for each CYP enzyme).
3. Dilute 0.50 M phosphate buffer with deionized water to make
5.0 mL 50 mM phosphate buffer.
4. Dilute the drug stock in 2,500 μL 50 mM phosphate buffer to
a concentration equal to 2× Km determined in the kinetic
studies using HLM (see Note 4).
5. Dispense 100 μL of 50 mM phosphate buffer containing the
drug into every well on the labeled 96-well plate.
6. Calculate the exact volume of each individual CYP enzyme
needed to make 150 μL of solution at 100 pmol/mL (see
Note 5).
7. Based on the calculation, add the exact amount of each
individual CYP enzyme in triplicate to individual wells on the
labeled 96-well plate containing 100 μL of drug in 50 mM
phosphate buffer; the “control” wells receive the same
amount of non-transfected Supersomes (without any CYP
enzymes).
8. Add proper volumes of 50 mM phosphate buffer to individual
wells to bring the total volume to 150 μL.
9. Prepare NADPH regenerating solution as follows:
(a) Add 1,000 μL phosphate buffer (500 mM).
(b) Add 7,800 μL deionized water.
(c) Add 1,000 μL NADP+ cofactor mixture.
(d) Add 200 μL G6PDH.
(e) Vortex briefly.
10. Dispense 150 μL NADPH regenerating solution to every well
containing CYP-drug mixture, and tap the plate repeatedly
to mix.

Table 2
Expression levels of different CYP enzymes in liver [10]

         Mean level (pmol/mg protein)    Relative level (%)
CYP      Min        Max                  Min        Max
1A2      19         67                   7.5        13
2A6      14         68                   5.5        13
2B6      1.0        45                   0.4        8.4
2C8      12         64                   4.5        12
2C9      50         96                   20         18
2C19     8.0        20                   3.1        3.7
2D6      5.0        11                   2.0        2.1
2E1      22         52                   8.6        9.8
3A4      37         108                  15         20
3A5      1.0        117                  0.4        22

11. Incubate at 37 °C for 30–60 min (see Note 6).

12. Add 150 μL stop solution to terminate the reaction.
13. Centrifuge to precipitate protein.
14. Transfer supernatants to a labeled HPLC sample plate for ana-
lyzing major metabolites (see Note 7) using LC-MS/MS.
15. Calculation and data analysis: kinetic rates of individual CYPs can
be determined relative to the highest one (100 %) (see Note 8).
16. Normalize kinetic rates using the estimated content of each
CYP in human liver (Table 2) to determine which CYP(s) are
the most effective in metabolizing the drug.
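The calculations in steps 6, 8, 15, and 16 can be sketched in Python as shown below. This is only an illustration under stated assumptions: the stock concentration of 1,000 pmol/mL and the metabolite formation rates are hypothetical, and the hepatic abundance of each CYP is taken as the midpoint of the Min and Max mean levels in Table 2, which is a simplification chosen for the example rather than a recommendation of the protocol.

```python
def stock_volume_ul(target_pmol_per_ml, final_volume_ul, stock_pmol_per_ml):
    """Volume of enzyme stock (uL) that gives the target concentration
    once the well is brought to final_volume_ul (C1*V1 = C2*V2)."""
    return target_pmol_per_ml * final_volume_ul / stock_pmol_per_ml


# Step 6: 150 uL at 100 pmol/mL from a HYPOTHETICAL 1,000 pmol/mL stock.
enzyme_ul = stock_volume_ul(100, 150, 1000)
# Step 8: buffer needed on top of the 100 uL drug solution and the enzyme.
buffer_ul = 150 - 100 - enzyme_ul

# HYPOTHETICAL metabolite formation rates (peak-area ratio/min/pmol CYP).
rates = {"1A2": 0.0, "2C9": 0.5, "2D6": 10.0, "3A4": 2.0}

# Step 15: rates relative to the most active isoform (100 %).
top = max(rates.values())
relative_pct = {cyp: 100 * r / top for cyp, r in rates.items()}

# Step 16: weight each rate by hepatic abundance (pmol/mg protein).
# ASSUMPTION: midpoint of the Min and Max mean levels in Table 2.
abundance = {"1A2": 43.0, "2C9": 73.0, "2D6": 8.0, "3A4": 72.5}
weighted = {cyp: rates[cyp] * abundance[cyp] for cyp in rates}
total = sum(weighted.values())
contribution_pct = {cyp: 100 * w / total for cyp, w in weighted.items()}

print(f"enzyme stock: {enzyme_ul:.1f} uL, buffer: {buffer_ul:.1f} uL")
for cyp in rates:
    print(f"CYP{cyp}: relative rate {relative_pct[cyp]:.0f} %, "
          f"abundance-weighted contribution {contribution_pct[cyp]:.1f} %")
```

With these made-up numbers, CYP2D6 shows the highest rate per pmol of enzyme, yet CYP3A4 accounts for the largest abundance-weighted share. That reversal is the reason step 16 is applied before ranking the isoforms.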

3.2 Chemical Inhibition (See Note 9)

The following procedure assumes a 200 μL incubation volume and
0.5 mg/mL HLM in triplicate but can be scaled to other volumes
and protein concentrations (see Note 10).
1. Dissolve CYP-selective inhibitors in acetonitrile to make a
working solution at a concentration of 200× the efficacious
concentration (Table 1, see Note 11).
2. Label individual wells on a 96-well incubation plate as “1A2,”
“2C9,” “2C19,” “3A4” … and “control” (three replicates are
needed for each CYP enzyme).
3. Calculate the total volume of HLM mixture needed for the
assay based on the total number of CYP-selective inhibitors to
be tested.
4. For ten CYP-selective inhibitors, prepare HLM (1 mg
protein/mL) as follows (see Note 2).
(a) 400 μL phosphate buffer (500 mM).
(b) 200 μL pooled human liver microsomes (20 mg/mL
protein).
(c) 3,400 μL deionized water.
(d) Invert tubes repeatedly to mix well.
5. Dilute the drug stock in HLM mixture to make the final con-
centration equal to 2× Km.
6. Dispense 100 μL drug–HLM solution to each well on the
labeled plate.
7. Add 1.0 μL of each CYP-selective inhibitor to the corresponding
labeled wells containing drug–HLM mixture; for the “control”
group, the same volume of acetonitrile is added.
8. Prepare NADPH regenerating solution as follows:
(a) Add 1,000 μL phosphate buffer (500 mM).
(b) Add 7,800 μL deionized water.
(c) Add 1,000 μL NADP+ cofactor mixture.
(d) Add 200 μL G6PDH.
(e) Vortex briefly.
9. Dispense 100 μL NADPH regenerating solution to every well
containing HLM–drug mixture, and tap the plate repeatedly
to mix.
10. Put the plates in a water bath and incubate at 37 °C for
30–60 min (see Note 10).
11. Add 100 μL stop solution to terminate the reaction.
12. Centrifuge to precipitate protein.
13. Transfer supernatants to labeled HPLC sample plate for ana-
lyzing major metabolites (see Note 7) using LC-MS/MS.
14. Calculation and data analysis: inhibition of metabolite
formation by individual CYP inhibitors can be determined relative
to the control without inhibitors (100 %).
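The dilution in steps 1 and 7 and the calculation in step 14 can be sketched in Python as follows. The triplicate area ratios are hypothetical, the function names are illustrative, and the nominal 200 μL incubation volume is used as in the protocol, so the extra 1 μL of inhibitor solution is ignored.

```python
from statistics import mean


def final_conc(stock_conc, added_ul, total_ul):
    """Concentration after adding added_ul of stock to a total of total_ul."""
    return stock_conc * added_ul / total_ul


# Steps 1 and 7: ketoconazole, effective concentration 1 uM (Table 1),
# prepared as a 200x working solution and added as 1 uL per 200 uL.
working_um = 200 * 1.0
keto_final_um = final_conc(working_um, 1.0, 200.0)
acetonitrile_pct = 100 * 1.0 / 200.0  # % v/v solvent carried into the well


def percent_of_control(treated, control):
    """Mean metabolite formation with inhibitor as % of the solvent control."""
    return 100 * mean(treated) / mean(control)


# HYPOTHETICAL triplicate metabolite/internal-standard area ratios.
control = [0.50, 0.52, 0.48]
with_keto = [0.10, 0.11, 0.09]

remaining_pct = percent_of_control(with_keto, control)
inhibition_pct = 100 - remaining_pct

print(f"final ketoconazole: {keto_final_um:.1f} uM "
      f"({acetonitrile_pct:.1f} % acetonitrile)")
print(f"activity remaining: {remaining_pct:.0f} %, "
      f"inhibition: {inhibition_pct:.0f} %")
```

In this made-up case, about 80 % inhibition by ketoconazole would point toward CYP3A4/5 as the main enzyme forming the metabolite.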

3.3 Antibody Neutralization (See Note 12)

The following procedure assumes a 200 μL incubation volume and
0.5 mg/mL HLM in triplicate but can be scaled to other volumes
and protein concentrations.
1. Label the wells of a 96-well incubation plate as “1A2,” “2C9,”
“2C19,” “3A4” … and “control” (three replicates are needed
for each CYP-selective inhibitory antibody and for the
control without antibody).
2. Calculate the total volume of pooled human liver microsomal
solution based on the number of CYP-selective antibodies to
be tested.
3. For ten CYP-specific antibodies, prepare drug–HLM solution
as follows:
(a) 400 μL phosphate buffer (500 mM).
(b) 200 μL pooled human liver microsomes (20 mg/mL
protein).
(c) (3,400 − X) μL deionized water.
(d) X μL drug stock solution to a final concentration equal to
3× Km.
(e) Invert tubes repeatedly to mix well.
4. Dispense 100 μL diluted drug–HLM mixture to each well on
the labeled plate.
5. Calculate the dilution fold of each CYP-specific antibody
(see Note 13).
6. Add a proper amount of CYP-specific antibody to the
corresponding wells containing drug–HLM mixture; for the
“control” group, the same volume of 50 mM phosphate buffer is
added.
7. Bring the total volume to 150 μL with 50 mM phosphate
buffer.
8. Incubate the HLM–antibody mixture for 15 min at room
temperature.
9. Place the plate in a water bath, and warm up for 3 min.
10. Prepare NADPH regenerating solution as follows, in a tube of
sufficient capacity (the listed volumes total 10 mL):
(a) Add 1,000 μL phosphate buffer (500 mM).
(b) Add 7,800 μL deionized water.
(c) Add 1,000 μL NADP+ cofactor mixture.
(d) Add 200 μL G6PDH.
(e) Vortex briefly.
11. Dispense 150 μL NADPH regenerating solution to every well
containing HLM–drug mixture, and tap the plate repeatedly
to mix.
12. Continue incubation at 37 °C for 30–60 min (see Note 6).
13. Add 150 μL stop solution to terminate the reaction.
14. Centrifuge for 10 min to precipitate protein.
15. Transfer supernatants to a labeled HPLC sample plate for ana-
lyzing major metabolites by LC-MS/MS.
16. Calculation and data analysis: inhibition of metabolite
formation by individual CYP antibodies can be determined relative
to the control without inhibitory antibody (100 %).
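The volume X of drug stock in step 3 depends on the Km and on the stock concentration. A minimal Python sketch of that calculation is given below; the Km of 5 μM and the 10 mM drug stock are hypothetical values chosen for the example, and the function name is illustrative.

```python
def drug_stock_volume_ul(km_um, fold, mix_volume_ul, stock_um):
    """Volume X (uL) of drug stock giving fold x Km in the drug-HLM mixture."""
    return fold * km_um * mix_volume_ul / stock_um


KM_UM = 5.0          # HYPOTHETICAL Km from the HLM kinetic study (uM)
STOCK_UM = 10_000.0  # HYPOTHETICAL 10 mM drug stock, expressed in uM
MIX_UL = 4000.0      # 400 buffer + 200 HLM + (3,400 - X) water + X drug

x_ul = drug_stock_volume_ul(KM_UM, 3, MIX_UL, STOCK_UM)
water_ul = 3400.0 - x_ul

# Resulting concentrations in the drug-HLM mixture.
hlm_mg_per_ml = 20.0 * 200.0 / MIX_UL  # from the 20 mg/mL HLM stock
buffer_mm = 500.0 * 400.0 / MIX_UL     # from the 500 mM phosphate stock

# 100 uL of mixture in a final 300 uL incubation gives 1x Km in the well.
well_conc_um = 3 * KM_UM * 100.0 / 300.0

print(f"X = {x_ul:.1f} uL drug stock, water = {water_ul:.1f} uL")
print(f"mixture: {hlm_mg_per_ml:.1f} mg/mL HLM in {buffer_mm:.0f} mM buffer")
print(f"drug in the final incubation: {well_conc_um:.1f} uM")
```

If X works out to more than a few microliters of organic stock, the solvent limit in Note 3 is worth checking before proceeding.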

3.4 Correlation Analysis of Metabolite Formation and CYP Activities (See Note 14)

The following procedure assumes ten individual HLM preparations,
each with a 300 μL incubation volume and 0.5 mg/mL HLM
in triplicate, but it can be scaled to other volumes and protein
concentrations (see Note 10).
1. Dilute 500 mM phosphate buffer (pH 7.4) with deionized
water to prepare 10 mL 50 mM phosphate buffer (pH 7.4).
2. Dilute drug stock in the freshly made 10 mL phosphate buffer
(50 mM, pH 7.4) to make a final concentration equal to
3× Km.
3. Label a 96-well plate (0.5 mL/well) by donor numbers, and
assign three wells to each donor.
4. Dispense 100 μL drug–phosphate buffer to each well.
5. Label ten 1.5-mL microcentrifuge tubes by donor numbers.
6. Dilute individual HLM (20 mg/mL protein) in the labeled
tubes with phosphate buffer solution (50 mM, pH 7.4) to
make 300 μL of HLM working solution (2.5 mg protein/
mL).
7. Add 50 μL of HLM working solution to corresponding wells
in triplicate.
8. Put the 96-well plate in a water bath to warm up for 3 min.
9. Prepare NADPH regenerating solution as follows:
(a) Add 1,000 μL phosphate buffer (500 mM).
(b) Add 7,800 μL deionized water.
(c) Add 1,000 μL NADP+ cofactor mixture.
(d) Add 200 μL G6PDH.
(e) Vortex briefly.
10. Dispense 150 μL NADPH regenerating solution to every well
containing HLM–drug mixture, and tap the plate repeatedly
to mix.
11. Continue incubation at 37 °C for 30–60 min (see Note 6).
12. Add 100 μL stop solution to terminate the reaction.
13. Centrifuge for 10 min to precipitate protein.
14. Transfer supernatants to a new 96-well plate for analyzing
major metabolites by LC-MS/MS.
15. Calculation and data analysis: kinetic rates of individual HLM
preparations can be determined relative to the highest one
(100 %) (see Note 8).
16. Correlation analysis is performed using relative kinetic rates
obtained for each HLM preparation and CYP-specific marker
activity ([7], see Note 14).
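The correlation in step 16 can be sketched with a plain Pearson coefficient, as in the Python example below. All numbers are hypothetical: ten donors, relative metabolite formation rates from step 15, and supplier-style marker activities for two CYPs. The function name and the data layout are assumptions made for the illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length (n >= 3)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL relative metabolite formation rates (%) for ten donors.
rates = [100, 80, 65, 50, 45, 40, 30, 25, 15, 10]

# HYPOTHETICAL CYP marker activities for the same ten donors.
marker_activity = {
    "CYP3A4": [5000, 4100, 3300, 2400, 2300, 2100, 1400, 1300, 800, 500],
    "CYP2D6": [40, 90, 20, 75, 10, 60, 85, 30, 55, 45],
}

r_values = {cyp: pearson_r(rates, act) for cyp, act in marker_activity.items()}
best = max(r_values, key=lambda cyp: r_values[cyp])

for cyp, r in r_values.items():
    print(f"{cyp}: r = {r:.2f}")
print("strongest correlation:", best)
```

A high r for one marker activity and low values for the others supports, but does not prove, involvement of that CYP, so the result is best read together with the inhibition and recombinant enzyme data.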

4 Notes

1. It is important to obtain all recombinant enzymes from the
same supplier since enzymes from different sources may exhibit
different kinetic profiles [12].
2. The most attractive feature of using recombinant CYPs is that
one can greatly simplify the CYP-phenotyping study and
unambiguously identify a particular CYP responsible for for-
mation of a given metabolite, because this in vitro system does
not contain competing enzymes. Also, many isoforms are
commercially available, including less common ones
such as CYP1A1 and 1B1.
3. Acetonitrile is the preferred solvent, due to its minimal inhibi-
tory effect on CYP activity. If DMSO must be used to maxi-
mize the solubility, the final concentration in the incubation
must be kept below 0.2 %.
4. If desired, one or three drug concentrations (0.5× Km, 1× Km,
and 2× Km) can be included in the rCYP kinetic study. Although
metabolite profiling and kinetic studies are both critical
and necessary for correctly designing and conducting CYP
inhibition studies and CYP activity correlation analysis, those
experimental procedures can be found readily in the literature
[7, 13], and thus are not covered in this chapter.
5. The contents of individual CYPs can be found on the sample
sheets from the supplier.
6. The incubation time varies based on the turnover rate, and it
can be estimated by the kinetic studies.
7. Although there is no rigid guideline for conducting in vitro
CYP reaction phenotyping studies, monitoring the major
metabolite formation is generally preferred over substrate
depletion measurement because it is reaction-specific. In
contrast, measuring depletion of the drug can be less reliable for
low-turnover drugs and those metabolized by multiple enzymes.
8. The relative rate determination approach can be utilized in the
early stage of drug discovery since metabolite standards may
not be available.
9. The basic principle of the assay is to examine the impact of
CYP-selective inhibitors on formation of individual metabolites
in pooled human liver microsomes.
10. Both HLM protein concentration and incubation time can be
adjusted based on the turnover rate.
11. It is recommended to validate the effective concentrations of
individual CYP-selective inhibitors using pooled HLM and
CYP-selective substrates. Alternatively, three concentrations can be
used for each CYP-selective inhibitor.
12. Because well-characterized selective inhibitors are not available
for every CYP enzyme, inhibitory antibodies specific to indi-
vidual CYPs can be used as a superior alternative to chemical
inhibitors for identification of CYPs responsible for metabo-
lism of a given drug.
13. The dilution fold information can be obtained from the
antibody supplier. Because affinity can vary significantly
among antibodies, it is recommended that a titration
experiment be carried out to determine the effective dilution
fold needed to neutralize specific CYP activity in HLM.
14. CYP activity correlation studies are sometimes conducted to
further elucidate the role of CYP enzymes and this assay nor-
mally requires a panel of individual HLM prepared from at least
ten different donors. Those HLM preparations had been previ-
ously characterized for individual CYP marker activities by the
supplier, and those were chosen to be included in the panel
because of their distinct CYP marker activity profiles. In this
instance, the rate of metabolite formation is correlated with
various CYP marker activities or the levels of individual enzymes.
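As background to the substrate concentrations mentioned in Note 4, the Python sketch below shows what fraction of Vmax is expected at 0.5×, 1×, and 2× Km. It assumes simple Michaelis–Menten kinetics, v = Vmax·[S]/(Km + [S]), which may not hold for every CYP reaction, and the function name is illustrative.

```python
def fraction_of_vmax(substrate_over_km):
    """v/Vmax under Michaelis-Menten kinetics, with [S] given as a multiple of Km."""
    if substrate_over_km < 0:
        raise ValueError("substrate concentration cannot be negative")
    return substrate_over_km / (1.0 + substrate_over_km)


# The three concentrations suggested in Note 4.
fractions = {multiple: fraction_of_vmax(multiple) for multiple in (0.5, 1.0, 2.0)}

for multiple, frac in fractions.items():
    print(f"[S] = {multiple}x Km -> {100 * frac:.0f} % of Vmax")
```

Under this assumption the three concentrations span roughly one-third to two-thirds of Vmax, so the rate still responds to changes in substrate concentration across the whole set.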

References
1. Lin JH, Lu AY (1997) Role of pharmacokinetics and metabolism
in drug discovery and development. Pharmacol Rev 49:403–449
2. Yan Z, Caldwell GW (2001) Metabolism profiling, and cytochrome
P450 inhibition & induction in drug discovery. Curr Top Med
Chem 5:403–425
3. Rannug A, Alexandrie A-K, Persson I, Ingelman-Sundberg M
(1995) Genetic polymorphism of cytochromes P450 1A1, 2D6
and 2E1: regulation and toxicological significance. J Occup
Environ Med 37:25–36
4. Murray M (2006) Role of CYP pharmacogenetics and drug–drug
interactions in the efficacy and safety of atypical and other
antipsychotic agents. J Pharm Pharmacol 58:871–885
5. Hamdy SI, Hiratsuka M, Narahara K, El-Enany M, Moursi N,
Ahmed MS-E, Mizugaki M (2002) Allele and genotype frequencies
of polymorphic cytochromes P450 (CYP2C9, CYP2C19, CYP2E1)
and dihydropyrimidine dehydrogenase (DPYD) in the Egyptian
population. Br J Clin Pharmacol 53:596–603
6. Ingelman-Sundberg M, Sim SC, Gomez A, Rodriguez-Antona C
(2007) Influence of cytochrome P450 polymorphisms on drug
therapies: pharmacogenetic, pharmacoepigenetic and clinical
aspects. Pharmacol Ther 116:496–526
7. Khojasteh SC, Prabhu S, Kenny JR, Halladay JS, Lu AYH (2011)
Chemical inhibitors of cytochrome P450 isoforms in human liver
microsomes: a re-evaluation of P450 isoform selectivity. Eur J
Drug Metab Pharmacokinet 36:1–16
8. Vermeir M, Hemeryck A, Cuyckens F, Francesch A, Bockx M,
Van Houdt J, Steemans K, Mannens G, Aviles P, De Coster R
(2009) In vitro studies on the metabolism of trabectedin
(YONDELIS) in monkey and man, including human CYP reaction
phenotyping. Biochem Pharmacol 77:1642–1654
9. Yan Z, Caldwell GW (2004) Evaluation of cytochrome P450
inhibition in human liver microsomes. Optimization in Drug
Discovery, Humana, Totowa, NJ, pp 231–244
10. Zhang H, Davis CD, Sinz MW, Rodrigues AD (2007) Cytochrome
P450 reaction-phenotyping: an industrial perspective. Expert
Opin Drug Metab Toxicol 3:667–687
11. Harper TW, Brassil PJ (2008) Reaction phenotyping: current
industry efforts to identify enzymes responsible for metabolizing
drug candidates. AAPS J 10:200–207
12. Kumar V, Rock DA, Warren CJ, Tracy TS, Wahlstrom JL (2006)
Enzyme source effects on CYP2C9 kinetics and inhibition. Drug
Metab Dispos 34:1903–1908
13. Yan Z, Caldwell GW, Wu W, McKown L, Rafferty B, Jones W,
Masucci JA (2002) In vitro identification of metabolic pathways
and cytochrome P450 isoforms involved in the metabolism of
etoperidone. Xenobiotica 32(11):949–962
Chapter 17

In Vitro and In Vivo Mouse Models for Pharmacogenetic Studies
Amber Frick, Oscar Suzuki, Natasha Butz, Emmanuel Chan,
and Tim Wiltshire

Abstract
The identification of causative genes underlying biomedically relevant phenotypes, particularly complex
multigenic traits, is of vital interest to modern medicine. Using genome-wide association analysis, many
studies have successfully identified thousands of loci (called quantitative trait loci or QTL), some of these
associating with drug response phenotypes. However, the determination and validation of putative genes
has been much more challenging. The actions of drugs, both efficacious and deleterious, are complex
phenotypes that are controlled or influenced in part by genetic mechanisms.
Investigation of genetic correlates of complex traits and pharmacogenetic traits is often difficult to
perform in human studies due to cost, availability of relevant sample populations, and limited ability to
control for environmental effects. These challenges can be circumvented with the use of mouse models for
pharmacogenetic studies. In addition, the mouse can be treated at sub- and supratherapeutic doses and
subjected to invasive procedures, which can facilitate measures of drug response phenotypes, making iden-
tification of pharmacogenetically relevant genes more feasible. The availability of multiple mouse genetic
and phenotypic resources is an additional benefit to using the mouse for pharmacogenetic studies.
Here, we describe the contribution of animal models, specifically the mouse, towards the field of
pharmacogenetics. In this chapter, we describe different mouse models, including the knockout mouse,
recombinant mouse inbred strains, in vitro mouse cell-based assays, as well as novel experimental approaches
like the Collaborative Cross recombinant mouse inbred panel, which can be applied to preclinical pharma-
cogenetics research. These approaches can be used to assess drug response phenotypes that are difficult to
model in humans, thereby facilitating drug discovery, development, and application.

Key words Quantitative trait loci (QTL), Quantitative trait genes (QTG), Knockout (KO) mouse,
Recombinant inbred strain, Genome-wide association mapping, High content screening (HCS),
Collaborative cross (CC), Cell barcoding

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_17, © Springer Science+Business Media, LLC 2013


1 Introduction

Drug response and toxicity are complex traits, highly variable across
individuals, partly attributable to heredity and genetic diversity [1].
Experiments using model organisms can complement human
genomic studies with unique advantages, including the ability to
circumvent some issues that arise in clinical trials due to administra-
tion of potentially toxic or narrow therapeutic index drugs, allow for
risky or invasive procedures, control environmental factors that influ-
ence drug response such as diet, and reduce experimental cost [2].
Many mouse inbred strains have been well characterized both geno-
typically and phenotypically. These genetically diverse and stable
mouse populations are powerful tools for genome-wide association
pharmacogenetic studies. Classical laboratory mouse strains exhibit
variation in multiple phenotypes and have long been used for
genetic analysis of human disease. Furthermore, the mouse genome
can be easily manipulated, which makes mice robust models to
identify and validate specific causative genes underlying toxic and
variable drug responses in humans. Detailed information regarding
mouse genomics can be found in Silver’s Mouse Genetics: Concepts
and Applications (1995) [3] and Hedrich’s The Laboratory Mouse
(Handbook of Experimental Animals) (2004) [4].
More recently, inbred strains derived from wild mice have been
created, adding genetic and phenotypic diversity to the pool of
available laboratory mice. Ideally, we can measure multiple drug
response phenotypes in vitro or in vivo across a panel of mouse
inbred strains to identify genes underlying variable responses to
drugs. Findings from mouse studies and high-throughput mouse
cell-based screens can help identify which genetic variants deter-
mine positive, negative, or non-response to pharmacologic agents.
This information can be used to develop and design subsequent
clinical trials. The use of pharmacogenetic information in clinical
trials can help ensure that only patients who are likely to respond
or patients who are less likely to display toxicity will be tested with
the novel agent. This approach will minimize drug exposure to
patients who are less likely to benefit from the new drug.
Additionally, conducting clinical trials in a targeted patient popula-
tion can make research efforts more cost-effective. Fewer patients
may be required to observe an effect, which will reduce cost and
shorten the time required to complete the study.
Targeted drug trial designs and personalized drug therapy
are possible through the use of pharmacogenetic information.
Therefore, it is important to use and develop innovative preclinical
tools that will facilitate pharmacogenetic research. This chapter
provides an initial guide as to how mouse models can be used for
identification of pharmacogenetically relevant genes, thereby facili-
tating efforts to advance drug development and medication ther-
apy management.

2 Methods

2.1 Mouse Phenotyping

Prior to any genetic analysis, reproducible measurement of robust
phenotypes is critical. Phenotypes must be measured accurately
and precisely, features often obtainable using the mouse model.
However, species differences may unfortunately contribute to phe-
notypes that are irrelevant or inaccurately used to model human
response to pharmacotherapy. Of particular note, drug metabolism
in mice may differ from that in humans [2].
In vitro studies with human liver microsomes, human hepato-
cytes, liver slices, and recombinant enzymes are important meth-
ods to assess human drug metabolism. However, these techniques
alone cannot predict how absorption, distribution, metabolism,
and excretion will modulate pharmacologic activity in vivo. Rodent
models are widely accepted experimental tools for evaluating the
carcinogenicity, toxicity, metabolism, and pharmacology of xeno-
biotics. Mice are particularly useful models with advantages over
other rodents including short gestation, large litter sizes, fast
breeding, and lower animal husbandry and maintenance costs.
However, translation to humans may not be straightforward largely
due to differences in drug metabolizing enzymes and subsequent
alterations in efficacy and safety, necessitating preliminary pharma-
cokinetic and pharmacodynamic studies for pharmacological com-
pounds of interest [3].

2.2 Knockout Mice

Generation of genetically altered mice has been extremely useful for
analyzing gene function and mapping complex traits. Presently,
knockout (KO) mice are readily available, and the methodology for
generating new KO lines is well established. A review by Liggett
(2004) [4] provides practical guidance regarding the use of geneti-
cally modified mouse models for pharmacogenomic research. KO
mice are generally used in pharmacogenetic studies to examine the
effects of specific genes in mediating pharmacotherapeutic out-
comes. To generate KO mouse models, a modified segment of the
mouse gene is transfected into embryonic stem (ES) cells. Some
cells will then incorporate the transfected DNA in the target chro-
mosomal region through homologous recombination. Subsequently,
cells with altered DNA are isolated and implanted into mice in a
state of pseudopregnancy where the corpus luteum persists without
an embryo following estrus and breeding with an infertile male.
Characterization of the gene’s pharmacologic function is performed
by comparing drug response phenotypes between control and KO
mice [4]. For example, Hernandez et al. characterized the role of
flavin-containing monooxygenase family genes (FMOs) in mediat-
ing imipramine metabolism and central nervous system effects by
using mice lacking different Fmo genes [5]. Another group showed
that genetic ablation of glutathione S-transferase Pi (GstP1/P2(−/−))
led to resistance to acetaminophen-induced liver damage [6].
Although the use of KO mouse models has led to the identification
of several genes linked with drug response, there are several chal-
lenges in using this approach. This methodology is labor-intensive
and time-consuming and can negatively affect development or pro-
duce phenotypic effects that are neither relevant nor similar to the
effects of the human ortholog genes [4]. A recent review by Eisener-
Dorman and colleagues discusses several factors to consider when
using and generating KO mouse models [7]. The Jackson Laboratory
(http://www.jax.org) and the Mutant Mouse Regional Resource
Centers (MMRRC, http://www.mmrrc.org) have wide repositories
of KO mouse models that are available as live animals, cryopreserved
cell lines, or embryos. Additionally, the International Knockout
Mouse Consortium (IKMC, http://www.knockoutmouse.org) aims
to knock out all protein-coding genes in the mouse and test multiple
KO mouse lines in a battery of phenotype tests. So far, over 9,000
genes have been targeted by the consortium and more than 12,000
targeting vectors have been produced [8]. A number of private com-
panies also produce and provide KO mouse lines.

2.3 Quantitative Trait Loci Identification

KO mice or transformed ES cells are restricted to assessing
single-gene effects. It is not often viable to ablate multiple genes
to evaluate multigenic effects on drug response. KO mouse models are
amenable to pharmacogenetic studies of genes with known or
suspected function. Often, however, we do not have prior
knowledge regarding which genes may influence drug response,
thus necessitating the use of genome-wide association analysis for
identification of genetic loci significantly linked with drug response
in quantitative trait locus (QTL) regions. The most common
approach for QTL identification is to study a specific mouse refer-
ence population that was generated from strategic breeding experi-
ments. The aim is to establish a reference mouse population that
exhibits phenotypic and genotypic variation, providing the basis for
QTL mapping analysis. For a review regarding the use of different
mouse breeding approaches, the readers are referred to Darvasi
(1998) [9]. Typically, two parental mouse inbred strains that exhibit
significant difference in drug response phenotypes are mated (out-
cross), which leads to generation of a recombinant mouse popula-
tion (F1 or filial generation 1). Well-established linkage analysis
methodologies are available for genetic mapping studies of out-
cross-backcross (N2 or nuclear generation 2, resulting from an F1
mated to its parent) and outcross-intercross (F2, resulting from F1
brother–sister mating) mouse populations [10], and many of these
QTL mapping studies have successfully identified genomic regions
closely linked with drug response. For example, Haston et al.
mapped regions that influence differences in bleomycin-induced
pulmonary fibrosis by using two inbred strains with differential sus-
ceptibility to this drug [11]. QTL underlying variations in pheno-
types related to ethanol [12] and cocaine consumption [13] have
also been identified using F2 mouse populations.
One of the disadvantages in using F2 or N2 mouse populations
is that the mice must be repeatedly genotyped because each mouse
is genetically unique, complicating replication efforts. A related
strategy for QTL discovery that circumvents this shortcoming is
the use of recombinant inbred (RI) strains for linkage analysis. The
RI strains are derived after 20 or more generations of brother–sis-
ter mating. The brother–sister breeding pairs are typically gener-
ated from an outcross between two founder strains. RI lines are
genetically stable and have been extensively genotyped. In com-
parison to F2 crosses, RI lines are less expensive and require less
effort to generate and maintain.
RI lines have been successfully used by a number of groups to
identify drug response QTL. Boyle and Gill, for example, identi-
fied two loci that control cocaine-induced locomotor activity using
AXB/BXA RI strains, which were derived from C57BL/6J and
A/J strains [14]. RI strains from different parental strains can be
obtained from The Jackson Laboratory and more are in
development.
However, due to the limited recombination events that can
occur from crossing two mouse inbred strains, the QTL region is
often wide. Also, it is possible that the parental strains that contrib-
ute to the RI line do not show significant differences in drug
response, which would necessitate phenotyping a larger number of
animals. Furthermore, once a QTL has been mapped, additional
experiments are usually needed to narrow the chromosomal region
linked to the phenotype to subsequently identify quantitative trait
genes (QTG).
Another approach to mapping QTL has been developed more
recently to take advantage of a genetically and phenotypically
diverse panel of mouse inbred strains. QTL mapping analysis across
a panel of multiple mouse inbred strains requires the use of a dense
SNP (single nucleotide polymorphism) genotype map. This QTL
mapping approach became possible with advances in DNA
sequencing and genotyping technology, which permitted the iden-
tification of millions of SNPs and the genotyping of a large number
of mouse inbred strains. The density of the SNP map
potentially improves QTL resolution; candidate regions are often
less than 2 Mb [15].
The improved precision in QTL mapping analysis makes the
identification of QTG possible. Using this approach, Guo and col-
leagues were able to detect the Cyp2c29 gene, a murine homolog of
human CYP2C9, as partially responsible for mediating warfarin
metabolism in mice [16]. In another study, Harrill et al. found an
association between a genomic region that includes the Cd44 gene
and acetaminophen-induced liver injury in the mouse [17]. The
authors subsequently performed a candidate gene study in humans
and found an association between CD44 genetic variants and suscep-
tibility to acetaminophen toxicity. The identification of QTG also
has the potential to assist in drug development. Following QTL
268 Amber Frick et al.

mapping analysis, Zhang and colleagues further investigated the
effects of a candidate gene, aldehyde oxidase-1 (Aox), in mediating
drug clearance. In this study, the authors found that Aox is respon-
sible for the rapid metabolism of RO1 (6-(2,4-difluoro-phenoxy)-2-
((R)-2-hydroxy-1-methyl-ethylamino)-8-((S)-2-hydroxy-propyl)-
8H-pyrido[2,3-d]pyrimidin-7-one), a candidate p38 MAP kinase
inhibitor. The use of specific enzyme inhibitors and expressed
recombinant enzymes confirmed that the AOX protein catalyzed the
formation of the 4-hydroxylated drug metabolite in mice and
humans. RO1 was a candidate drug for rheumatoid arthritis. Clinical
trials for RO1 were terminated due to rapid clearance of this drug in
humans [18]. The short half-life and metabolic profile in humans
differed from those characterized in rats, dogs, and monkeys
during routine preclinical studies.
Although methods for genome-wide association mapping anal-
ysis in a panel of inbred mouse strains are not as well-established as
QTL mapping approaches in F2 or N2 mouse populations, there
are a few genotype–phenotype association mapping algorithms like
Efficient Mixed Model Association (EMMA) (http://mouse.
cs.ucla.edu/emma) [19] and SNPster (http://snpster.gnf.org)
[15, 20] that have been used successfully to identify QTL. Single
marker mapping is the simplest method to compute association
between genotype and phenotype. As each SNP is biallelic, the
strength of association between genotype and phenotype is calcu-
lated with a t-test or an F-test. EMMA utilizes F-tests for associa-
tion mapping in model organisms, such as the laboratory mouse,
while accounting for potential confounding variables like popula-
tion structure and genetic relatedness. On the other hand, SNPster
conducts an association analysis between haplotype and phenotype
across the mouse genome by using a weighted bootstrap method.
The haplotypes are inferred from the genotype patterns observed within a
sliding 3-SNP window. Mean phenotypic values are then calculated
for the strains in each haplotype group and the strength of the asso-
ciation between phenotype and haplotype groups is determined
using analysis of variance (ANOVA). Significant QTL are then eval-
uated in detail to identify candidate QTG [21]. QTG may be priori-
tized and validated using knockdown and overexpression
experiments in vivo to further characterize the genetic effects on
drug response phenotypes. As more information about gene function,
interaction networks, and biochemical pathways comes to light,
the identification of causative QTG will become more feasible.
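The two scoring ideas described above, a single-marker test and an ANOVA across haplotype groups in a sliding 3-SNP window, can be illustrated with a minimal Python sketch. It is not the EMMA or SNPster implementation: it omits the mixed-model correction for population structure and the weighted bootstrap, and the function names, strain labels, genotypes, and phenotype values are invented for illustration. For a biallelic SNP the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic, so one function serves both cases.

```python
from collections import defaultdict
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of phenotype values.

    Returns None when the test is undefined (fewer than two groups,
    no residual degrees of freedom, or zero within-group variance).
    """
    groups = [g for g in groups if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        return None
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return None
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def window_scan(genotypes, phenotype, width=1):
    """Score every window of `width` adjacent SNPs along the map.

    genotypes: dict strain -> list of alleles, one per SNP, in map order
    phenotype: dict strain -> trait value (e.g., one IC50 per strain)
    width=1 is single-marker mapping; width=3 groups strains by the
    haplotype they carry across three adjacent SNPs.
    """
    strains = [s for s in genotypes if s in phenotype]
    n_snps = len(next(iter(genotypes.values())))
    scores = []
    for start in range(n_snps - width + 1):
        by_haplotype = defaultdict(list)
        for s in strains:
            hap = tuple(genotypes[s][start:start + width])
            by_haplotype[hap].append(phenotype[s])
        scores.append(anova_f(list(by_haplotype.values())))
    return scores


# Hypothetical data: six strains, four SNPs, one phenotype value per strain.
genotypes = {
    "S1": ["A", "A", "G", "T"],
    "S2": ["A", "A", "G", "C"],
    "S3": ["A", "G", "G", "T"],
    "S4": ["G", "G", "C", "C"],
    "S5": ["G", "G", "C", "T"],
    "S6": ["G", "A", "C", "C"],
}
phenotype = {"S1": 1.0, "S2": 1.2, "S3": 0.8, "S4": 3.0, "S5": 3.2, "S6": 2.8}

single = window_scan(genotypes, phenotype, width=1)
triple = window_scan(genotypes, phenotype, width=3)
```

In this toy data set the first and third SNPs separate the low and high strains and receive the largest scores. The second 3-SNP window places every strain in its own haplotype group, so no F statistic can be computed, which illustrates why haplotype methods need many strains per window.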
The Mouse Phenome Project is a valuable resource that can
help in the selection of traits and strains for pharmacogenetic stud-
ies. The Mouse Phenome Database (MPD, http://phenome.jax.
org) is a public central repository for mouse phenotype and geno-
type information. Currently, there are over 1,400 phenotypic mea-
surements and more than eight million SNPs that have been
deposited into the MPD [22]. A large portion of the phenotypes
deposited may be of relevance to pharmacogenomic studies,
including data on cancer (e.g., metastatic progression and tumor
growth), cardiovascular disease (e.g., atherosclerosis), infectious
disease (e.g., susceptibility to bacterial infection and response to
parasitic infection), obesity (e.g., fat body composition, body
weight, and body mass index), and behavior (e.g., anxiety).
Information on strain responses to therapeutic drugs like imipra-
mine, diazepam, acetaminophen, and lamotrigine is also available.
The Mouse Genome Database Project [23] (http://www.informat-
ics.jax.org/) is another online resource that has information on
mouse QTL mapping studies and multiple mouse phenotypic mea-
sures. The Wellcome Trust Sanger Institute (http://www.sanger.
ac.uk/resources/mouse/genomes) has extensive sequencing, SNP,
indel, and structural variation data available.
After a genomic interval or gene is identified in the mouse,
researchers will commonly be interested in the corresponding
region of the human genome. The majority of human genes have a
murine ortholog; additionally, human and mouse genes are often
found in regions of conserved synteny, where multiple genes and
regulatory regions can be found in blocks that are conserved
between the two species. Maps of human-mouse genomic align-
ments and tools to convert genomic positions between organisms
are widely available online. The UCSC Genome Browser (http://
genome.ucsc.edu), Ensembl (http://www.ensembl.org) and NCBI
(http://www.ncbi.nlm.nih.gov/projects/homology/maps) provide
tools frequently used for this purpose. However, the translation to
humans is not always simple. The selection of human candidate genes
based on mouse findings may need to be extended beyond orthologs
to also include genes for proteins in the same pathway or family as the
mouse QTG.

2.4 High-Throughput in Vitro Cell-Based Assays for Characterization of Inter-individual Drug Responses Using Mouse Embryonic Fibroblasts from Recombinant Inbred Mouse Strains

The use of in vitro cell-based assays for pharmacogenomics studies
provides unprecedented opportunities for researchers to assess
molecular response to drugs. In comparison to in vivo models,
cell-based assays have higher assay versatility and scalability. In vitro
cell-based assays can be conducted in a high-throughput fashion,
allowing for multiple endpoints to be measured simultaneously.
Importantly, large cell-based in vitro screens can be performed for
comparison of inter-individual cellular responses to drugs and toxins,
thereby making identification of drug response QTG feasible.
Cell-based assays can be developed for in vitro characterization
of pharmacological and cytotoxic responses. There is a broad selection
of drug response phenotypes, including mutagenicity, carcinogenicity,
cytotoxicity, and teratogenicity, that can be measured in
cell-based assays. To obtain biologically relevant results, it is important
to choose the appropriate endpoints to measure. Table 1 lists
a number of different drug response phenotypes that can be easily
measured through the use of commercially available cell-based kits.
270 Amber Frick et al.

Table 1
Endpoints commonly measured in cell assays and high content screening

Cellular process or response    Biomarkers and endpoints measured
Apoptosis                       Cell loss
                                Cell viability
                                Nuclear morphology
                                DNA content
                                Cell permeability
                                Mitochondrial mass
                                Mitochondrial membrane potential
                                Changes in the actin cytoskeleton
                                Caspase-3 activation
                                Caspase-9 activation
                                Cytochrome c localization
Cell cycle control              p53 detection
                                p21 detection
Autophagy                       LC3B protein quantification
Cell proliferation              BrdU incorporation
                                Ki-67 antigen quantification
                                3H-thymidine incorporation (DNA synthesis)
                                14C-methionine incorporation (protein synthesis)
Cell morphology                 F-actin and microtubule rearrangements
Oxidative stress                Superoxide formation
                                MnSOD production
                                Phospho-H2AX detection
                                Cytochrome c reduction
Loss of critical molecules      ATP depletion
                                Glutathione depletion
Cell membrane integrity         LDH release assay
                                Membrane-impermeable DNA stain

Conventional cytotoxicity assays typically have lower assay
sensitivity and limited ability to model complex toxicity pathways
because traditional cytotoxicity assays only measure a single end-
point and evaluate cytotoxic responses that occur in later stages of
cell death [24, 25]. Alternatively, high-throughput, high content
cell-based imaging screening (HCS) can simultaneously measure
large numbers of phenotypic endpoints in the same cell, thus facili-
tating detection of mechanisms underlying toxic drug response.
For additional information regarding the use and application of
HCS assays please refer to Rausch [26], Bullen [27], Abraham
et al. [28], Mayr and Bojanic [29], Zock [30], and Zanella et al.
[31]. HCS assays begin with a combination of cellular dyes/stains,
antibodies typically labeled with fluorescent compounds, and/or
GFP-fusion proteins. Cell-based imaging HCS combines the use
of fluorescence-based reagents and imaging instruments to assess
toxic responses of both individual cells and total cell populations in
a high-throughput format [32, 33]. Additionally, it is feasible to
evaluate early measures of cell death using HCS.
Investigators may choose to conduct HCS on fixed cells (plated
cells are fixed after experimental treatment and then read on an
HCS instrument) or on live cells (an HCS readout is performed on
living cells). The fixed cell HCS approach is a high-throughput
screening method with all fixation steps usually automated, making
the process relatively fast and reproducible. However, experimental
design is limited to a single time-point, requiring preparation of
multiple plates to cover an entire time-course. Live cell-based HCS
assays, in contrast, permit kinetic measurements from the same
plate. Through the use of fluorescent markers that are functional in
living cells, cellular phenotypes can be measured at baseline and
throughout the experimental period [34]. Time is a critical com-
ponent for the live cell-based HCS assays because the experiment
can only be conducted as long as the cells are alive.
Of note, there are a few methodologies that can be used to
assess the sequential dynamics of cellular processes. One in particu-
lar is an extracellular flux analyzer (http://www.seahorsebio.com),
which measures oxygen consumption rate (an indicator of mito-
chondrial respiration) and extracellular acidification rate (occurs
due to glycolysis and is indicative of cell metabolism). Another
methodology that can be used for measuring dynamic cellular pro-
cesses is flow cytometry. Flow cytometry combines light scatter,
excitation, and fluorochrome emission to generate multiparameter
data from particles and cells. For more information regarding flow
cytometry, the reader is referred to Shapiro’s Practical Flow
Cytometry (2003) [35].
Several factors to consider when selecting cell-based assays for
high-throughput screening include cost, ease of use, assay sensitiv-
ity and specificity, and the number of endpoints that can be mea-
sured simultaneously by the detection instrument. Another
important consideration is the type of cells that will be used for
HCS. Multiple immortalized cell lines or cloned cell lines derived
from human or animal tumors have been widely used due to
their experimental convenience and commercial availability.
The American Type Culture Collection (ATCC, http://www.atcc.
org) is a comprehensive resource for cell-based in vitro assays with
information for more than 3,600 cell lines from over 150 species.
A few of the most widely used cell lines for pharmacogenomics
studies are the HapMap, Human Variation Panel, the Centre
d’Etude du Polymorphisme Humain pedigree cell lines, and the
Epstein-Barr virus (EBV)-transformed lymphocyte immortalized
cell lines, which were obtained from hundreds of individuals of
different ethnic groups. These cell lines can be obtained from the
Coriell Institute repository (http://ccr.coriell.org). Several studies
have used these cell lines to perform genome-wide association
studies, investigating genetic variants linked with differential
responses to cisplatin, carboplatin, and etoposide [36–38].
Immortalized cell lines are an important tool for pharmacoge-
nomic research. However, immortalized cell lines do not exhibit
normal in vivo cellular functions and have dysfunctional apoptotic
and cell cycle control mechanisms. Primary cultures derived from
human and animal tissues are an important alternative to immortalized
cells because these cultures more closely mimic normal cell
functions and are thus more physiologically relevant experimental
models for drug screening, in vitro mechanistic characterization,
or gene discovery [39]. For detailed protocols on cell isolation and
culture from a variety of mouse tissues, please refer to Ward and
Tosh’s Mouse Cell Culture: Methods and Protocols [40]. Mouse
embryonic fibroblasts (MEFs) are an example of an effective primary
cell type currently utilized for these purposes. MEFs are
advantageous because they exhibit features of primary cultures and
are easily manipulated for experimental purposes. Here, we pro-
vide a methodology that can assist in the design of a pharmacoge-
nomic high-throughput screen using MEFs.

2.5 Measuring Inter-strain Cytotoxic Responses to Drugs and Toxins Using High-Throughput Cell-Based Screening in Mouse Embryonic Fibroblasts

Cellular genetics strategies combine the experimental advantages
of both in vitro and in vivo studies. MEFs and other cells isolated
from genetically defined mice serve as a platform to molecularly
characterize multigenic phenotypes. Characterization of these
phenotypes with screens of pharmacotherapeutic compounds
facilitates identification of toxicity pathways.

2.5.1 Cell Culture

One of the most important and challenging aspects in using cell
assays for high-throughput pharmacogenomic QTL mapping is to
use cells that are in the same growth phase at the time they are
plated for HCS. To ensure consistency in conditions across all cell
lines, all cells must have been through the same number of passages
when expanded and have the same confluency before the
HCS procedure, thus minimizing environmental and experimental
variation between multiple cell lines. In addition, primary cells are
less accessible or robust and have a limited life span compared to
immortalized or tumor cell lines [41]. Therefore, the time required
for cell growth and culture is an important experimental
consideration.
1. Grow the appropriate number of cells in tissue culture flasks.
Cells should be 50–60% confluent prior to trypsinization.
2. Collect cells by centrifugation and resuspend them in the
appropriate media for that cell type at a desired density (e.g.,
MEFs are plated at a density of 1,500 cells per well in a 384-
well plate for a 24 h assay time point and at a density of 1,000
cells per well for a 72 h assay time point).
3. Plate the suspended cells in the appropriate 96- or 384-well
cell culture plates. Multiple replicate wells should be used for
accurate quantification of the phenotype. The multi-well plate
format allows for cells to be treated with a wide range of drug
concentrations, facilitating acquisition of dose–response
curves. An automated system to dispense cells into the wells
can be used, especially if plating on a 384-well plate, further
minimizing experimental variability.
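The arithmetic behind steps 2 and 3 can be sketched as a short Python calculation. The plating densities (1,500 and 1,000 cells per well in a 384-well plate) come from the protocol above; the dispense volume of 50 µL per well, the 10% overage, the stock concentration, and the function name are assumptions chosen only for illustration and should be replaced with values appropriate to the plate and dispenser in use.

```python
def plating_plan(cells_per_well, n_wells, stock_cells_per_ml,
                 dispense_ul=50.0, overage=0.10):
    """Volumes needed to seed a plate at a target density.

    dispense_ul and overage are assumed values for illustration;
    the protocol specifies only the number of cells per well.
    """
    total_ul = n_wells * dispense_ul * (1.0 + overage)
    total_cells = cells_per_well * n_wells * (1.0 + overage)
    stock_ul = total_cells / stock_cells_per_ml * 1000.0
    if stock_ul > total_ul:
        raise ValueError("stock suspension is too dilute for this density")
    return {
        "total_cells": round(total_cells),
        "stock_ul": round(stock_ul, 1),
        "medium_ul": round(total_ul - stock_ul, 1),
    }


# 24 h and 72 h time points from the protocol; stock density is hypothetical.
plan_24h = plating_plan(1500, 384, stock_cells_per_ml=5e5)
plan_72h = plating_plan(1000, 384, stock_cells_per_ml=5e5)
```

Each returned dictionary gives the number of cells to harvest and the volumes of cell stock and medium to combine for one plate.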

2.5.2 Dosing the Cells

After a recovery period, which varies between cell types, treat the
cells with the drug of interest by either adding the drug to the
existing medium or replacing the medium with fresh medium containing
the drug. A serial dilution of the drug is most commonly used to dose
each of the cell lines.
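A serial dilution series is straightforward to tabulate in advance. The Python sketch below assumes a constant 3-fold dilution factor, which is not stated in this protocol; it is chosen because nine 3-fold steps starting at 100 µM end at approximately 0.015 µM, matching the rotenone range reported in the caption of Fig. 1. The function name is hypothetical.

```python
def serial_dilution(top, fold, n_points):
    """Concentrations from `top` downward, each `fold` times lower."""
    if top <= 0 or fold <= 1 or n_points < 1:
        raise ValueError("need top > 0, fold > 1, and at least one point")
    return [top / fold ** i for i in range(n_points)]


# Nine concentrations starting at 100 uM; the 3-fold factor is an assumption.
doses_um = serial_dilution(top=100.0, fold=3, n_points=9)
```

A constant dilution factor spaces the doses evenly on a logarithmic axis, which is convenient for the dose-response analysis that follows.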

2.5.3 Cell Assay and Phenotyping

1. HCS and additional methodologies like flow cytometry can be
easily multiplexed. There are multiple drug response phenotypes
that can be measured (Table 1). Caution should be exercised
to minimize experimental variation.
2. Collect cellular response data using an automated microscope,
flow cytometer, fluorescence plate reader, or other types of
detection instruments.

2.5.4 Data Analysis

1. Data collected should be processed according to the type of
assay used. In the case of HCS, the images obtained are processed
by segmentation algorithms that quantify different
aspects of cellular morphology.
2. Normalize experimental values to reference wells, which are
treated with vehicle. This data normalization approach minimizes
experimental variation and other variance unrelated to
drug response.
3. If cells were treated with multiple drug concentrations, a
dose–response curve can be obtained (Fig. 1). A single phenotypic
value, usually IC50, is then calculated from the curve.
4. Calculated phenotypic values can be used for QTL mapping
using the methods previously described (Fig. 2).
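Steps 2 and 3 can be sketched in Python as follows. The curve-fitting method used for Fig. 1 is not specified here, so this example substitutes a deliberately simple approach: a two-parameter Hill model with the top fixed at 1 and the bottom at 0, fitted by linear least squares on the logit scale. A four-parameter logistic model fitted by nonlinear least squares is a common alternative in practice. The function names and the synthetic well counts are invented for the demonstration.

```python
import math
from statistics import mean


def normalize_to_vehicle(treated, vehicle):
    """Express each dose's replicate mean as a fraction of the vehicle mean.

    treated: dict concentration -> list of replicate readouts (e.g., cell counts)
    vehicle: list of readouts from vehicle-only wells on the same plate
    """
    baseline = mean(vehicle)
    return {conc: mean(reps) / baseline for conc, reps in treated.items()}


def fit_ic50(fractions):
    """Estimate IC50 and Hill slope by least squares on the logit scale.

    Assumes response = 1 / (1 + (conc / IC50) ** h), so that
    ln(y / (1 - y)) = h * ln(IC50) - h * ln(conc) is linear in ln(conc).
    Points with y <= 0 or y >= 1 have no finite logit and are skipped.
    """
    pts = [(math.log(c), math.log(y / (1.0 - y)))
           for c, y in fractions.items() if 0.0 < y < 1.0]
    if len(pts) < 2:
        raise ValueError("need at least two points strictly between 0 and 1")
    mx = mean(x for x, _ in pts)
    my = mean(y for _, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("flat response; IC50 is not identifiable")
    intercept = my - slope * mx
    return math.exp(-intercept / slope), -slope  # (IC50, Hill slope)


# Synthetic, noise-free wells generated from a known curve (illustration only).
true_ic50, true_hill = 2.0, 1.5
vehicle_wells = [980.0, 1000.0, 1020.0]
treated_wells = {
    c: [1000.0 / (1.0 + (c / true_ic50) ** true_hill)] * 3
    for c in (0.1, 0.3, 1.0, 3.0, 10.0, 30.0)
}
fractions = normalize_to_vehicle(treated_wells, vehicle_wells)
ic50, hill = fit_ic50(fractions)
```

With noise-free input the fit recovers the generating parameters. Real data will include responses at or beyond 0 and 1, which this simplified fit discards, so the estimate depends on having several doses within the sloped part of the curve.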

Fig. 1 Dose–response curves and IC50 values for eight rotenone-treated MEF cell
lines obtained using high-content image screening. MEFs from 32 inbred mouse
strains were plated on 384-well clear-bottom plates and treated with nine different
concentrations of rotenone, ranging from 0.015 to 100 µM. Triplicate wells
were used for each concentration of the compound. The cells were fixed and
stained after 72 h of treatment, and plates were subsequently imaged using
an automated fluorescence high-content imaging microscope (BD Pathway 435).
The number of cells was estimated by software analysis of Hoechst-stained
nuclei within collected images. (a) Dose–response curves were calculated using
the responses at each dose normalized to the vehicle-only wells (DMSO treatment);
results for eight of the cell lines are presented on this graph. QTL mapping
was performed using the IC50 values obtained for all of the strains. Data points
have been slightly offset on the x-axis to prevent overlapping of SEM bars and
enhance clarity. (b) IC50 results from 30 strains are presented including SEM
bars. The responses from two strains failed the curve fitting. Observed differences
are statistically significant.

Fig. 2 Genome-wide association analysis for rotenone IC50 values. IC50 values obtained from rotenone dose–
response curves were used for SNPster analysis. (a) Association plot between haplotypes inferred from a
sliding 3-SNP window across the entire mouse genome and rotenone IC50 values. The arrow indicates the
genomic region with the strongest association signal. (b) Enlarged view of the indicated region on the X chromosome,
showing known and predicted genes and transcripts that lie in this interval and could potentially explain
the phenotypic variation.

2.5.5 Candidate QTG Identification and Validation

The identification of candidate genes is usually based on biological
relevance of the QTL genes to the phenotype being studied.
Pathway analysis tools (GeneGO, http://www.genego.com and
Ingenuity, http://www.ingenuity.com) can be useful to discover
such relationships. These tools contain rich databases of known
interactions that allow the researcher to quickly find connections
between the genes of interest and pathways affected by the drug.
In cell-based studies, the first step to validate the selected candidate
genes is usually knockdown or overexpression of the
genes, followed by drug treatment of the cell line used. This can be
achieved by the transfection of siRNA oligonucleotides or DNA
vectors. Different methods of transfection can be used. For a
review on this subject, please refer to Kim and Eberwine [42]. The
response of the transfected cells to drug treatment is then com-
pared with cells that have a normal expression of the gene; changes
are suggestive of gene involvement in drug response.

3 Future Directions

Resources are being developed to fully exploit the genetic power of
the mouse. Complex etiologies underlie the most common and
insidious human health problems, necessitating novel mouse mod-
els to delineate genetic, environmental, and developmental com-
ponents of complex diseases. The Complex Trait Consortium is
providing unique resources for the research community, including
the Collaborative Cross (CC), designed to enhance models of
complex traits.
The emerging CC was designed to achieve high mapping reso-
lution and detect extended networks of epistatic and gene–envi-
ronment interactions [43]. The CC is a large panel of RI strains
derived from eight genetically divergent founder strains. Given the
genetic diversity found within the parental strains, there are poten-
tially unlimited numbers of genetically identical mouse strains that
can be used for pharmacogenetic studies. Eight-way RI strains
achieve 99% inbreeding by generation 23, and each strain captures
approximately 135 unique recombination events. Each CC line is
genetically diverse, with segregating polymorphisms every 100–200 bp.
Varied allele effects can be used to delineate genetic effects
within pleiotropic loci, facilitating identification of QTG. This
genetic diversity will ensure phenotypic diversity in almost any trait
of interest. A recent study by Aylor et al. evaluated the utility of
partly inbred CC lines for gene mapping analysis. Results from the
study indicate that CC strains have high genetic diversity, balanced
allele frequencies, and dense, evenly distributed recombination
sites. Discrete, complex, and biomolecular traits (i.e., white head-
spotting, body weight, and liver mRNA expression levels) were
successfully mapped using CC lines. This study provided insight
into the use of CC lines for gene mapping studies [44]. Cells from
such inbred mouse strains may be “barcoded” in a cell-based mul-
tiplexing approach, allowing for simultaneous detection of pheno-
types across genetically divergent strains. Barcoding reduces the
number of reagents, improves high-throughput screening meth-
ods, minimizes experimental variability, and facilitates data acquisi-
tion efforts [45].
The ultimate goal of mouse models is to enhance translation to
human populations, disease processes, and efficacious and toxic
responses to pharmacotherapy. In the case of the CC, a fixed set of
genomes and subsequent data obtained through scientific interro-
gation will enable understanding of the complex interplay of genes
and environment. The question “Is the mouse a good model for
pharmacogenetic translation?” often arises. Clearly, the same polymorphisms
and possibly some genes identified in the mouse will
not be translatable to human systems. However, as for any good
model system, some of the genes and likely many of the genetic
pathways uncovered using the mouse model will be translatable as
“mammalian responses” to drug toxicity rather than species-
specific responses. The reader is referred to Harrill et al. [17] for an
example of a successful pharmacogenomic study translation from
the mouse model to humans. Further sophistication of mouse
models in the future will likely lead to significant improvements in
clinical and pharmacotherapeutic management of human diseases.

4 Conclusion

In addition to recapitulating human disease, mouse models are
valuable assets for drug discovery and development. In vivo mouse
genetic methods have resulted in the identification of thousands of
QTL for an extensive range of phenotypes. More recent genomic
advances have enabled narrowing these QTL regions to specific
QTG. Furthermore, novel in vitro high-throughput technologies
strengthen characterization of multiple phenotypes. Thus, in vivo,
in vitro, and in silico mouse models are essential tools for preclini-
cal assessment of drug response phenotypes, including pharmaco-
kinetic, pharmacodynamic, and pharmacogenetic responses.

Acknowledgments

The authors express their appreciation to Cristina Benton for
valuable contributions and manuscript revision.

References

1. Zhang W, Dolan ME (2009) Use of cell lines in the investigation of pharmacogenetic loci. Curr Pharm Des 15(32):3782–3795
2. Bogaards JJ et al (2000) Determining the best animal model for human cytochrome P450 activities: a comparison of mouse, rat, rabbit, dog, micropig, monkey and man. Xenobiotica 30(12):1131–1152
3. Cheung C, Gonzalez FJ (2008) Humanized mouse lines and their application for prediction of human drug metabolism and toxicological risk assessment. J Pharmacol Exp Ther 327(2):288–299
4. Liggett SB (2004) Genetically modified mouse models for pharmacogenomic research. Nat Rev Genet 5(9):657–663
5. Hernandez D et al (2009) Deletion of the mouse Fmo1 gene results in enhanced pharmacological behavioural responses to imipramine. Pharmacogenet Genomics 19(4):289–299
6. Henderson CJ et al (2000) Increased resistance to acetaminophen hepatotoxicity in mice lacking glutathione S-transferase Pi. Proc Natl Acad Sci USA 97(23):12741–12745
7. Eisener-Dorman AF et al (2009) Cautionary insights on knockout mouse studies: the gene or not the gene? Brain Behav Immun 23(3):318–324
8. Skarnes WC et al (2011) A conditional knockout resource for the genome-wide study of mouse gene function. Nature 474(7351):337–342
9. Darvasi A (1998) Experimental strategies for the genetic dissection of complex traits in animal models. Nat Genet 18(1):19–24
10. Zou F (2009) QTL mapping in intercross and backcross populations. Methods Mol Biol 573:157–173
11. Haston CK et al (2002) Bleomycin hydrolase and a genetic locus within the MHC affect risk for pulmonary fibrosis in mice. Hum Mol Genet 11(16):1855–1863
12. Drews E et al (2010) Quantitative trait loci contributing to physiological and behavioural ethanol responses after acute and chronic treatment. Int J Neuropsychopharmacol 13(2):155–169
13. Jones BC et al (1999) Quantitative-trait loci analysis of cocaine-related behaviours and neurochemistry. Pharmacogenetics 9(5):607–617
14. Boyle AE, Gill K (2001) Sensitivity of AXB/BXA recombinant inbred lines of mice to the locomotor activating effects of cocaine: a quantitative trait loci analysis. Pharmacogenetics 11(3):255–264
15. McClurg P et al (2006) Comparative analysis of haplotype association mapping algorithms. BMC Bioinformatics 7:61
16. Guo Y et al (2006) In silico pharmacogenetics: warfarin metabolism. Nat Biotechnol 24(5):531–536
17. Harrill AH et al (2009) Mouse population-guided resequencing reveals that variants in CD44 contribute to acetaminophen-induced liver injury in humans. Genome Res 19(9):1507–1515
18. Zhang X et al (2011) In silico and in vitro pharmacogenetics: aldehyde oxidase rapidly metabolizes a p38 kinase inhibitor. Pharmacogenomics J 11(1):15–24
19. Kang HM et al (2008) Efficient control of population structure in model organism association mapping. Genetics 178(3):1709–1723
20. Pletcher MT et al (2004) Use of a dense single nucleotide polymorphism map for in silico mapping in the mouse. PLoS Biol 2(12):e393
21. McClurg P et al (2007) Genomewide association analysis in diverse inbred mice: power and population structure. Genetics 176(1):675–683
22. Grubb SC et al (2009) Mouse phenome database. Nucleic Acids Res 37(Database issue):D720–D730
23. Blake JA et al (2009) The Mouse Genome Database genotypes::phenotypes. Nucleic Acids Res 37(Database issue):D712–D719
24. Olson H et al (2000) Concordance of the toxicity of pharmaceuticals in humans and in animals. Regul Toxicol Pharmacol 32(1):56–67
25. Jaeschke H et al (2002) Mechanisms of hepatotoxicity. Toxicol Sci 65(2):166–176
26. Rausch O (2006) High content cellular screening. Curr Opin Chem Biol 10(4):316–320
27. Bullen A (2008) Microscopic imaging techniques for drug discovery. Nat Rev Drug Discov 7(1):54–67
28. Abraham VC et al (2008) Application of a high-content multiparameter cytotoxicity assay to prioritize compounds based on toxicity potential in humans. J Biomol Screen 13(6):527–537
29. Mayr LM, Bojanic D (2009) Novel trends in high-throughput screening. Curr Opin Pharmacol 9(5):580–588
30. Zock JM (2009) Applications of high content screening in life science research. Comb Chem High Throughput Screen 12(9):870–876
31. Zanella F et al (2010) High content screening: seeing is believing. Trends Biotechnol 28(5):237–245
32. Taylor DL, Giuliano KA (2005) Multiplexed high content screening assays create a systems cell biology approach to drug discovery. Drug Discov Today Techn 2(2):149–154
33. Giuliano KA et al (2006) Systems cell biology based on high-content screening. Methods Enzymol 414:601–619
34. Abraham VC et al (2004) High content screening applied to large-scale cell biology. Trends Biotechnol 22(1):15–22
35. Shapiro HM (2003) Practical flow cytometry. Wiley, Hoboken
36. Huang RS et al (2007) Identification of genetic variants contributing to cisplatin-induced cytotoxicity by use of a genomewide approach. Am J Hum Genet 81(3):427–437
37. Shukla SJ et al (2009) Whole-genome approach implicates CD44 in cellular resistance to carboplatin. Hum Genomics 3(2):128–142
38. Bleibel WK et al (2009) Identification of genomic regions contributing to etoposide-induced cytotoxicity. Hum Genet 125(2):173–180
39. Marshak DR, Greenwalt DE (2007) Differentiating primary human cells in rapid-throughput discovery applications. Methods Mol Biol 356:121–128
40. Ward A, Tosh D (2010) Mouse cell culture: methods and protocols, 1st edn. Humana Press, New York
41. Freshney R (2005) Culture of animal cells: a manual of basic technique, 5th edn. Wiley, Hoboken, NJ
42. Kim TK, Eberwine JH (2010) Mammalian cell transfection: the present and the future. Anal Bioanal Chem 397(8):3173–3178
43. Churchill GA et al (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet 36(11):1133–1137
44. Aylor DL et al (2011) Genetic analysis of complex traits in the emerging collaborative cross. Genome Res 21:1213–1222
45. Krutzik PO, Nolan GP (2006) Fluorescent cell barcoding in flow cytometry allows high-throughput drug screening and signaling profiling. Nat Methods 3(5):361–368
Chapter 18

The Hydrodynamic Tail Vein Assay as a Tool
for the Study of Liver Promoters and Enhancers
Mee J. Kim and Nadav Ahituv

Abstract
The hydrodynamic tail vein injection is a technique used to deliver nucleic acids into live mice.
Delivery through this method results in the in vivo transfection of foreign DNA, primarily in the liver.
Here, we describe the use of this technique to test for regulatory activity of liver promoters and enhancers,
using a dual luciferase reporter system as the measurable output, and how this application can
be used for pharmacogenomic studies.

Key words Hydrodynamic tail vein, Pharmacogenomics, Promoter, Enhancer

1 Introduction

The hydrodynamic tail vein technique is an efficient procedure to
deliver nucleic acids to the liver via the rapid intravascular injection
of a large volume of liquid [1, 2]. It can be used to deliver specific
genes into the liver [3–5], to deliver RNAi [6–11], to induce
tumors [12], and to study the gene expression of the host after
delivery of foreign DNA that generates therapeutic plasma levels of
the resulting protein [13]. Here, we describe its use for the
pharmacogenomic characterization of promoters, enhancers, and their
variants in the liver. This approach can also be applied to the
general investigation of hepatic gene regulatory elements.
The liver is a central organ for drug absorption, distribution,
metabolism, and elimination (ADME). Much research has focused on
understanding how the diverse array of drug-associated liver genes
interact to determine drug response. It is well known that differential
expression of drug-metabolizing enzymes and drug transporters in the
liver is a major determinant of drug response variability. In addition,
interindividual differences in liver function clearly have a genetic
component, as demonstrated by recent genome-wide association studies
that have uncovered novel susceptibility loci linked to many
liver-associated diseases [14–16]. However, in the majority of cases,
the variation observed within these loci does not affect protein-coding
regions [17]. The same holds true for pharmacogenomic phenotypes.
Faulty gene regulation (in regulatory regions, such as promoters and
enhancers), rather than aberrant protein structure, could be the cause
of many of these pharmacogenomic outcomes.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_18, © Springer Science+Business Media, LLC 2013

With technological advances, such as comparative genomics
and chromatin immunoprecipitation in combination with mas-
sively parallel sequencing technologies (ChIP-seq), gene regula-
tory sequences can now be identified in a rapid manner. However,
their functional characterization, particularly in vivo, still remains a
challenge. The hydrodynamic tail vein injection technique can be
used as an in vivo assay to rapidly characterize functional gene reg-
ulatory elements and to test them for differences in activity due to
nucleotide variation. Identifying regulatory elements and charac-
terizing the functional effects of genetic variants on drug response
will help us uncover the mechanisms of adverse drug reactions
mediated by the liver. Ultimately, the study of pharmacogenomics
offers the promise of tailoring more effective drug treatments on
an individual basis.

2 Materials

2.1 For the Injection

Nucleic acid (10 μg/mouse of assayed plasmid; 2 μg/mouse of Renilla control, such as pGL4.74[hRluc/TK] (Promega)).
Heat source (heat lamp or heat box with 120 W bulb, such as Aladin Enterprises, Inc., Cat. # RHB.1812).
Heat pad(s) (such as reusable pads from SnapHeat.com; Cat. # SH812 & SH88).
Source of anesthesia (e.g., isoflurane), gas chamber, and mouth piece.
Scale.
Mice: 21–25 g (see Note 1).
3 mL syringes (Becton Dickinson; Cat. #: 309585).
27½ gauge needles (Becton Dickinson; Cat. #: 305109).
Medical gauze pads (Kendall; Cat. #: 2187).
Delivery Solution (Mirus TransIT®-EE In Vivo; Cat. #: Mir5340; [18]).
5 mL sterile centrifuge/plastic tubes to hold injection solution (Argos Technologies; Cat. #: T2076S).

2.2 For Liver Harvest and Luciferase Assay

Dissection instruments (scissors/forceps).
1× Lysis Buffer (5× Passive Lysis Buffer, Promega; Cat. #: E1941; Proprietary Formulation).
Luciferase substrate (see Note 2).
Renilla luciferase substrate (see Note 2).
Sterile or autoclaved liver collection tubes (see Note 8) and 1.5 mL microcentrifuge tubes.
Homogenizer (e.g., rotor stator).
Refrigerated microcentrifuge.
70 % Ethanol.
White flat bottom 96-well plates (CoStar, Cat. #: 3917).
Luminometer.

2.3 Nucleic Acid Preparation

2.3.1 Promoters

1. For the analysis of promoters and their variants, it is typical to clone at least 250 bp upstream of the transcriptional start site (TSS) and approximately 100 bp downstream of the TSS.
2. Promoter sequences and their variants are cloned into the pGL4.11b [luc2P] (Promega) vector that contains the luciferase reporter gene (Fig. 1).
3. An empty pGL4.11b [luc2P] vector (without an insert) is used as a negative control for promoter assays. In addition, the reference sequence of the assayed promoter is typically used as a baseline to compare the promoter activity of nucleotide variants [19].
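
The cloning window described in step 1 can be sketched in a few lines of Python. This is only an illustration: the function name, the 1-based inclusive coordinate convention, and the example coordinates are assumptions made here, not part of the protocol. The one point it makes explicit is that "upstream" and "downstream" swap sides on the minus strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromoterWindow:
    chrom: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive (assumed convention)
    strand: str


def promoter_window(chrom, tss, strand, upstream=250, downstream=100):
    """Return the interval from `upstream` bp 5' of the TSS to `downstream` bp 3' of it.

    The defaults follow the sizes given in the text (at least 250 bp upstream,
    about 100 bp downstream). Everything else here is hypothetical.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream of the TSS means higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 1:
        raise ValueError("window extends past the start of the chromosome")
    return PromoterWindow(chrom, start, end, strand)


if __name__ == "__main__":
    # Hypothetical TSS position, for illustration only.
    print(promoter_window("chr1", 10_000, "+"))
    print(promoter_window("chr1", 10_000, "-"))
```

The resulting interval would then be used to design primers for cloning into the reporter vector; primer design itself is outside the scope of this sketch.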

2.3.2 Enhancers

1. For enhancers, there is a variety of methods and resources from which one can select candidate sequences to test for enhancer activity. These include but are not restricted to: comparative genomics, ChIP-seq datasets for enhancer marks, DNase hypersensitive sites, and transcription factor binding site analysis. For pharmacogenomic purposes, we have focused on using some of these approaches to analyze regulatory sequences around genes that are involved in ADME, such as liver membrane transporters [19, 20].
2. The sequences that are tested for enhancer activity and their variants are cloned into the pGL4.23 [luc/minP] (Promega) vector that contains a minimal promoter (a promoter that is not sufficient to drive reporter expression without the presence of a functional enhancer) and the firefly luciferase reporter gene (Fig. 1).
3. For enhancers, an empty pGL4.23[luc/minP] plasmid serves as a negative control and results are compared to this vector as fold induction [20]. In addition, the reference sequence of an identified enhancer is used as a baseline to compare the enhancer activity of any nucleotide variants.

[Figure: reporter constructs (promoter + luciferase, or enhancer + minP + luciferase); tail vein injection (Day 1) followed by luciferase assay (Day 2)]

Fig. 1 A schematic illustration of the hydrodynamic tail vein injection assay. Promoters or putative enhancer sequences are cloned into a luciferase reporter plasmid and co-injected with a Renilla luciferase reporter plasmid (not shown) into the tail vein of the mouse. Luciferase activity is assayed 24 h post injection and measured by a luminometer. minP: minimal promoter

2.3.3 Control Constructs and DNA Purification

1. In every injection, 2 μg of the pGL4.74 [hRluc/TK] vector, which contains the constitutive HSV-TK promoter followed by the Renilla reniformis (hRluc) luciferase gene, is injected to control for injection efficiency.
2. The Apolipoprotein E (APOE) enhancer (hg18: chr19:50119112–50119676; [21]), which is known to drive liver-specific expression, can be used as a positive control for liver enhancer activity in this assay.
3. All plasmids are typically grown up from bacterial culture and should be purified using an endotoxin-free plasmid DNA purification kit (see Note 3).

3 Methods

3.1 Day 1: Hydrodynamic Tail Vein Injection

3.1.1 Sample Preparation

The total volume required for injection (formula adopted from Mirus; see Note 4 [18]):

\[
\text{Total volume (mL)} = \frac{\text{mouse weight (g)}}{10\ \text{g/mL}} + 0.1\ \text{mL delivery solution}
\]

However, to determine the actual volume of delivery solution, the formula is rearranged and the addition of nucleic acids is taken into account:

\[
\text{Delivery solution (mL)} = \text{total volume (mL)} - \left[\text{volume of } 10\ \mu\text{g test plasmid (mL)} + \text{volume of } 2\ \mu\text{g Renilla plasmid (mL)}\right]
\]

Example 3A
A mouse to be injected weighs 24 g. The DNA concentration of
the test construct (TC) plasmid is 500 ng/μL and the Renilla
plasmid (RP) concentration is 200 ng/μL. To inject 10 μg of the
test construct and 2 μg of the Renilla, the total injection mix
volume would be:
\[
\text{Total volume (mL)} = \frac{24\ \text{g}}{10\ \text{g/mL}} + 0.1\ \text{mL delivery solution} = 2.5\ \text{mL}
\]

After subtracting the volume of the test and Renilla plasmids, the volume of the delivery solution should be:

\[
\text{Delivery solution (mL)} = 2.5\ \text{mL} - (0.02\ \text{mL TC} + 0.01\ \text{mL RP}) = 2.47\ \text{mL}
\]
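
The arithmetic of Example 3A, including the unit conversion from plasmid concentration to volume, can be checked with a short Python sketch. The function and parameter names are illustrative assumptions; only the formula, the 10 μg and 2 μg doses, and the 0.1 mL allowance come from the text.

```python
def plasmid_volume_ml(mass_ug, conc_ng_per_ul):
    """Volume (mL) of plasmid stock that contains `mass_ug` micrograms."""
    mass_ng = mass_ug * 1000.0            # ug -> ng
    volume_ul = mass_ng / conc_ng_per_ul  # ng / (ng/uL) -> uL
    return volume_ul / 1000.0             # uL -> mL


def total_injection_volume_ml(mouse_weight_g, syringe_allowance_ml=0.1):
    """Total volume: weight (g) / 10 g/mL plus the 0.1 mL allowance of Note 4."""
    return mouse_weight_g / 10.0 + syringe_allowance_ml


def delivery_solution_ml(mouse_weight_g, test_conc_ng_per_ul, renilla_conc_ng_per_ul,
                         test_mass_ug=10.0, renilla_mass_ug=2.0):
    """Delivery solution volume after subtracting both plasmid volumes."""
    plasmids_ml = (plasmid_volume_ml(test_mass_ug, test_conc_ng_per_ul)
                   + plasmid_volume_ml(renilla_mass_ug, renilla_conc_ng_per_ul))
    return total_injection_volume_ml(mouse_weight_g) - plasmids_ml


if __name__ == "__main__":
    # Values from Example 3A: 24 g mouse, 500 ng/uL test construct, 200 ng/uL Renilla.
    print(f"{delivery_solution_ml(24.0, 500.0, 200.0):.2f} mL")  # prints "2.47 mL"
```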

1. To assist in determining the amount of delivery solution for each individual mouse at the time of injection, it is advised to prepare a worksheet that has already calculated the volume of nucleic acid needed for each injection and subtracted that volume from the total volume that is to be injected. In this worksheet, the weight of the mouse is already taken into account by increments of 0.5 g. This allows for the injection procedure to be more time efficient and less prone to error.

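Example 3B

| Parameter | Value |
|---|---|
| Construct ID | Test 1 |
| Construct DNA concentration (ng/μL) | 500 |
| Volume for 10 μg (μL) | 20 |
| Construct volume (μL) | 20 |
| Renilla volume (μL) | 10 |

| Weight of mouse (g) | Mirus (mL) |
|---|---|
| 21.0 | 2.17 |
| 21.5 | 2.22 |
| 22.0 | 2.27 |
| 22.5 | 2.32 |
| 23.0 | 2.37 |
| 23.5 | 2.42 |
| 24.0 | 2.47 |
| 24.5 | 2.52 |
| 25.0 | 2.57 |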
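
A worksheet like the one in Example 3B could be generated rather than typed by hand. The following Python sketch is one possible way to do this; the function names and default arguments are assumptions, and the 200 ng/μL Renilla concentration is taken from Example 3A.

```python
def delivery_volume_ml(weight_g, construct_ng_per_ul, renilla_ng_per_ul,
                       construct_ug=10.0, renilla_ug=2.0, syringe_allowance_ml=0.1):
    """Delivery solution (mL) for one mouse, following the formula in Subheading 3.1.1."""
    total_ml = weight_g / 10.0 + syringe_allowance_ml
    # ug / (ng/uL) gives mL directly, because the two factors of 1000 cancel.
    nucleic_acid_ml = construct_ug / construct_ng_per_ul + renilla_ug / renilla_ng_per_ul
    return total_ml - nucleic_acid_ml


def build_worksheet(construct_ng_per_ul, renilla_ng_per_ul,
                    min_g=21.0, max_g=25.0, step_g=0.5):
    """Return (weight, delivery volume) rows in `step_g` increments."""
    n_steps = int(round((max_g - min_g) / step_g))
    rows = []
    for i in range(n_steps + 1):
        weight = min_g + i * step_g
        volume = delivery_volume_ml(weight, construct_ng_per_ul, renilla_ng_per_ul)
        rows.append((weight, round(volume, 2)))
    return rows


if __name__ == "__main__":
    print("Weight of mouse (g)\tMirus (mL)")
    for weight, volume in build_worksheet(500.0, 200.0):
        print(f"{weight:.1f}\t{volume:.2f}")
```

Preparing one such table per construct, each with its own plasmid concentration, keeps the bench-side step down to a single lookup by weight.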

2. Warm up the delivery solution to 25–37 °C prior to injection.
3. After determining the mouse weight using a scale, add the nucleic acid (test and Renilla) into a sterile 5 mL plastic tube.
4. Add the appropriate volume of delivery solution to the tube containing the nucleic acid.
5. Connect the needle to the syringe and load the injection solution. Make certain that there are no air bubbles in the syringe by flipping the syringe, tapping the side, or moving the plunger up and down carefully; push the air out of the needle until a small volume of the injection solution is ejected.
3.2 Preparation of the Animal for Injection

1. Dilate the tail vein by warming the mouse with a heat source, such as a heat box (Fig. 2), prior to administering the gaseous anesthetic (see Note 5). Following 3–5 min in the heat box, transfer the mouse into the anesthesia chamber (Fig. 3b). The chamber used to administer the anesthetic can also be fitted with a heat pad for optimal dilation.
2. After the mouse is anesthetized in the chamber, as it is transferred to the injection station, weigh the animal on a scale to determine the volume of delivery solution needed (see Subheading 3.1.1 above). Place the animal on the injection station, with the anesthetic mask positioned over its muzzle (Fig. 4).
3. Locate the tail veins on either side of the tail (they are located laterally) and adjust the body to the side to be injected. Swab the tail with alcohol to clean the injection site.
4. Pull the tail taut and place the needle, bevel up, approximately 30–45° from the plane of the tail (see Fig. 5a). It is recommended to first inject in the distal half of the tail so that, should the initial injection be unsuccessful, the needle can be repositioned closer towards the trunk of the mouse. Starting too close to the end, however, will be more difficult, as the vein and tail are thinner. As the needle inserts into the vein, move the needle nearly parallel to the tail and insert the entire length of the needle into the vein (see Fig. 5b); inject the injection solution into the tail (see Note 6).
5. Inject the entire contents of the syringe within 4–8 s at a constant rate.
6. Stop the bleeding by applying the medical gauze to the injection site.
7. Take the mouse off the anesthetic and allow it to recover in a new cage (see Note 7). Optionally, label each mouse with a different number using a permanent marker (as we usually do) so as to know which construct was injected.

Fig. 2 (a) Heat box (Aladin Enterprises, Inc., Cat. # RHB.1812); (b) mice being warmed up for tail dilation

Fig. 3 (a) Gaseous anesthesia machine; (b) anesthesia box with heat pad; (c) injection station with anesthesia mouth piece and heat pad

Fig. 4 Anesthetized mouse on injection station, ready for tail vein injection

Fig. 5 Mouse tail vein injection. (a) Entry of the needle is approximately at a 45° angle; (b) As the needle is inserted into the vein, the needle becomes more parallel with the tail and the solution is injected rapidly, within 4–8 s

3.3 Day 2: Harvesting the Livers and Luciferase Measurements

Based on Herweijer et al. [3], whose study determined that liver expression of injected DNA is optimal at 24 h post injection, livers are harvested at this time point.

3.3.1 Preparation of Reagents

1. Before sacrificing the mice, dilute lysis buffer in water to working concentration and aliquot 3 mL into labeled liver collection tubes (see Note 8).
2. Prepare the Luciferase Assay Reagent II and Stop & Glo® Reagent solutions, according to the manufacturer’s protocol (see Note 9).
3. Sacrifice mice according to approved animal protocols, dissect
the liver and place it in the numbered liver collection tubes
containing 3 mL of cold lysis buffer on ice (make sure all of the
liver tissue is entirely immersed in the lysis buffer).
4. Homogenize the livers for 1 min at high speed until there are
no observed liver chunks, keeping samples on ice before and
after homogenization. Use 70 % ethanol to clean the homog-
enizer between samples.
5. Transfer 1 mL of the liver homogenate to a labeled 1.5 mL
microcentrifuge tube and centrifuge at 4 °C for 30 min at
14,000 rpm.
6. During this centrifugation, aliquot 380 μL lysis buffer to a
newly labeled 1.5 mL microcentrifuge tube per sample and
store at 4 °C.
7. Upon completion of centrifugation of the liver homogenate,
transfer 20 μL of the supernatant (liquid in the top phase of the
homogenate) to the 1.5 mL microcentrifuge tube containing
380 μL lysis buffer that has been chilled to 4 °C and vortex
briefly (see Note 10).
8. Add the appropriate volume of Luciferase Assay Reagent II and Stop & Glo® Reagent to the diluted liver supernatant from step 7. This amount is usually determined by the specific luminometer that is used for this assay, according to the manufacturer’s protocols.

3.4 Analysis of Luciferase Activity Readings

The background blank readings of lysis buffer are routinely subtracted from both the Luciferase and Renilla activity readings. In addition, the Luciferase values are divided by the Renilla activity values (within each sample) to get a normalized relative Luciferase activity value and the replicates are averaged together.
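
The normalization described above, together with the fold induction over the empty vector mentioned in Subheading 2.3.2, can be sketched in Python as follows. The function names and all readings in the example are hypothetical; they only illustrate the order of operations (blank subtraction, firefly/Renilla ratio per sample, averaging of replicates).

```python
from statistics import mean


def relative_activity(firefly, renilla, blank_firefly, blank_renilla):
    """Blank-subtracted firefly signal divided by blank-subtracted Renilla signal."""
    renilla_net = renilla - blank_renilla
    if renilla_net <= 0:
        raise ValueError("Renilla signal is not above the blank; sample cannot be normalized")
    return (firefly - blank_firefly) / renilla_net


def mean_relative_activity(replicates, blank_firefly, blank_renilla):
    """Average the normalized activity over replicate (firefly, renilla) readings."""
    return mean([relative_activity(f, r, blank_firefly, blank_renilla)
                 for f, r in replicates])


def fold_induction(construct_mean, empty_vector_mean):
    """Activity of a construct relative to the empty reporter vector."""
    return construct_mean / empty_vector_mean


if __name__ == "__main__":
    # Hypothetical luminometer readings (arbitrary units), for illustration only.
    blank_f, blank_r = 100.0, 100.0
    enhancer = mean_relative_activity([(1100.0, 2100.0), (2100.0, 4100.0)], blank_f, blank_r)
    empty = mean_relative_activity([(300.0, 2100.0), (500.0, 4100.0)], blank_f, blank_r)
    print(f"fold induction: {fold_induction(enhancer, empty):.1f}")
```

A sample whose Renilla signal does not exceed the blank is rejected rather than normalized, since dividing by a near-zero value would inflate the ratio; whether to exclude such animals is a judgment left to the experimenter.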
An alternate method of quantifying luciferase activity is a real
time approach, using in vivo imaging technology such as the IVIS
optical imaging system by Caliper Life Sciences [22]. This system
allows for the quantification of bioluminescence and/or lumines-
cence in vivo and avoids sacrificing mice to measure reporter gene
activity.

4 Notes

1. While it is possible to inject mice at lower or higher weights, we find that mice outside the 21–25 g weight range pose technical challenges. Smaller mice have tail veins that are small for the needle gauge used in this technique. Larger mice require more volume to inject and have been found to recover from the injection at a lower rate than mice that are between 21 and 25 g. If the source of mice is an outside provider, plan to have your shipment arrive at least 1 day before the injection, so that the mice are less stressed and better fed/hydrated than those that arrive on the day of the injection. Also keep in mind that mice that arrive several days ahead of the injection date may gain weight and may be larger than the weight requested. We typically use CD1 mice since they are bigger and albino, making it easier to detect their tail vein and inject.
2. Promega provides a Dual-Luciferase Reporter Assay kit that
provides all the necessary reagents in one kit (Promega, Dual-
Luciferase® Reporter Assay System; Cat. #: E1960) [23].
3. It is important that the purified plasmid DNA is of high quality
and protein-, endotoxin-, DNase-, RNase-free to prevent
adverse or toxic effects on the animal. To achieve this, we rou-
tinely use the Qiagen Endofree Kit (Qiagen, Cat. #: 12362).
4. The addition of the 0.1 mL delivery solution compensates for
the volume of delivery solution that remains in the syringe
after the injection.
5. Condensation in the box or excessive movement is an indication of overheating. To prevent overheating or dehydration, mice are only pulled from their original mouse cage a few minutes prior to their sedation. Do not keep mice in the heat box for more than 20 min.
6. If the needle is inserted properly, the blood in the vein should clear and injection of the solution should be without resistance. If there is resistance upon pushing the plunger, the needle is not placed properly in the vein. This is also evident if, during the injection, the tail becomes swollen locally and the injected solution appears to “perspire” out of the tail’s pores. Likewise, if the injection is improperly administered, the anal area of the mouse may also become swollen. Resistance may also be experienced mid-injection, perhaps because the needle has moved out of the vein. If that is the case, continue to inject the solution but pull the needle slightly out. This may alleviate the resistance and allow for a successful injection.
7. The mouse should recover within 5 min of the injection. The heart rate may slow or increase rapidly within the first minute post injection; however, this should normalize. If the mouse appears to be seizing after the injection, this may be an indicator that either an air bubble or an impurity entered the circulation and the mouse may not survive. Careful monitoring of the mice post injection is necessary.
8. The collection tube used may be dependent on the homoge-
nizer; a 14 mL Falcon tube (Becton Dickinson, Cat. #: 352001)
is adequate for a rotor stator homogenizer. Additional lysis buf-
fer will be needed to dilute the supernatant in step 7 and to use
as a blank/plate control during the luminescence read.
9. Keep both solutions away from direct light and heat. They can
be stored at 4 °C until they are ready to aliquot.
10. Dilution of the supernatant may not be necessary, depending
on the luminometer’s range.

References
1. Zhang G, Budker V, Wolff JA (1999) High levels of foreign gene expression in hepatocytes after tail vein injections of naked plasmid DNA. Hum Gene Ther 10:1735–1737
2. Eggenhofer E et al (2009) High volume naked DNA tail-vein injection restores liver function in Fah-knock out mice. Hepatology 25:1002–1008
3. Herweijer H, Zhang G et al (2001) Time course of gene expression after plasmid DNA gene transfer to the liver. J Gene Med 3(3):280–291
4. Herweijer H, Wolff JA (2003) Progress and prospects: naked DNA gene transfer and therapy. Gene Ther 10:453–458
5. Herweijer H, Wolff JA (2007) Gene therapy progress and prospects: hydrodynamic gene delivery. Gene Ther 14:99–107
6. Lewis DL et al (2002) Efficient delivery of siRNA for inhibition of gene expression in postnatal mice. Nat Genet 32:107–108
7. McCaffrey AP et al (2002) RNA interference in adult mice. Nature 418:38–39
8. Sen A et al (2003) Inhibition of hepatitis C virus protein expression by RNA interference. Virus Res 96:27–35
9. Song E et al (2003) RNA interference targeting Fas protects mice from fulminant hepatitis. Nat Med 9:347–351
10. Zender L et al (2003) Caspase 8 small interfering RNA prevents acute liver failure in mice. Proc Natl Acad Sci USA 100:7797–7802
11. Xu J et al (2005) Reduction of PTP1B by RNAi upregulates the activity of insulin controlled fatty acid synthase promoter. Biochem Biophys Res Commun 329:538–543
12. Keng VW et al (2011) Modeling hepatitis B virus X-induced hepatocellular carcinoma in mice with the sleeping beauty transposon system. Hepatology 53:781–790
13. Zhou T et al (2010) Intracellular gene transfer in rats by tail vein injection of plasmid DNA. AAPS J 12:692–698
14. Chalasani N et al (2010) Genome-wide association study identifies variants associated with histologic features of nonalcoholic fatty liver disease. Gastroenterology 139:1567–1576
15. Zhang H et al (2010) Genome-wide association study identifies 1p36.22 as a new susceptibility locus for hepatocellular carcinoma in chronic hepatitis B virus carriers. Nat Genet 42:755–758
16. Chen WM et al (2008) Variations in the G6PC2/ABCB11 genomic region are associated with fasting glucose levels. J Clin Invest 118:2620–2628
17. Schadt EE et al (2008) Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6:1020–1032
18. MIRUS, TransIT®-EE Hydrodynamic Delivery Solution Protocol. http://www.mirusbio.com/assets/cms_files/protocols/ML043.pdf. Accessed July 2011
19. Choi JH et al (2009) Identification and characterization of novel polymorphisms in the basal promoter of the human transporter, MATE1. Pharmacogenet Genomics 19:770–780
20. Kim MJ et al (2011) Functional characterization of liver enhancers regulating drug-associated transporters. Clin Pharmacol Ther 89:571–578
21. Simonet WS et al (1993) A far-downstream hepatocyte-specific control region directs expression of the linked human apolipoprotein E and C-I genes in transgenic mice. J Biol Chem 268:8221–8229
22. Caliper Life Sciences, IVIS® Lumina II. http://www.caliperls.com/products/preclinical-imaging/ivis-luminaii.htm. Accessed July 2011
23. Promega Corporation, Dual-Luciferase® Reporter Assay Technical Manual. http://www.promega.com/resources/protocols/technical-manuals/0/dual-luciferase-reporter-assay-system-protocol/. Accessed July 2011
Part IV

Tools for Translation and Implementation of Pharmacogenetic Markers
Chapter 19

A Guide to the Current Web-Based Resources in Pharmacogenomics
Dylan M. Glubb, Steven W. Paugh, Ron H.N. van Schaik,
and Federico Innocenti

Abstract
Human genomics research has produced vast amounts of data that can be applied to or used to inform
pharmacogenomic studies. The Internet is an extremely useful resource for pharmacogenomics as many
Web sites provide access to data from genomic and clinical studies or host tools which can be used to
interpret findings or generate hypotheses. Human genetic variation can now easily be explored or visual-
ized through genome browsers and Web-based repositories which store the details of millions of human
germ-line and somatic genetic variants. Gene expression data from many different tissue and cell types are
available through Web-based repositories, and human genetic variants that associate with mRNA
expression can be identified using Web data portals. Pharmacogenetic associations can be explored through
publicly available data repositories, and the functionality of genetic variants can be predicted through
Web-based bioinformatic tools. Furthermore, resources relating to currently used genetic tests are available online.
Large clinical and population studies, many linked to medical records, can be queried for the availability of
biospecimens or data. In the future, as the amount of genomic and associated clinical data increases, there
is little doubt that Web-based resources will continue to evolve and overcome barriers hindering their
efficient use, leading to systems-based approaches to pharmacogenomics.

Key words Genome browser, Genetic variation, Genotypes, Gene expression, eQTLs, Genetic
association studies, GWAS, Biorepositories, Bioinformatic tools

1 Introduction

Since the beginning of the twenty-first century, we have transitioned
to a time of routine clinical pharmacogenetic testing with
the possibility of routine whole genome sequencing in the near
future [1]. Currently, other methods such as DNA oligonucleotide
genotyping arrays allow more cost-effective but less comprehensive
genome-wide interrogation. Nonetheless, the interpretation
and synthesis of genomic data for broad audiences, including
patients, physicians and researchers, remains a large hurdle as, even
at this point in time, there exist overwhelming amounts of human
genomic and pharmacogenomic data.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_19, © Springer Science+Business Media, LLC 2013

Fig. 1 Publications related to genomic resources. PubMed searches for associated terms such as “genome
browser” (a) and “expression database” (b) show their rapid growth since the beginning of the twenty-first
century

Web-based resources allow investigators, clinicians and patients
to access an extremely wide range of valuable DNA sequence, gen-
otype, phenotype and bioinformatic data that are relevant to phar-
macogenomic studies. Searches of PubMed show that these
resources are rapidly growing in number (Fig. 1) but the integra-
tion of pharmacogenomic-related information remains an issue.
One publicly available resource that helps overcome this barrier is
the Pharmacogenetics and Pharmacogenomics Knowledge Base
(PharmGKB; Chapter 20). PharmGKB has changed the way
pharmacogenomics research is performed and interpreted, and provides
a tremendous resource.
Given the large number of Web-based resources which have
relevance to pharmacogenomics, this chapter will only highlight
the most useful and comprehensive resources available. It should
be noted that in such a quickly changing field, what is described
here represents only a snapshot in time as human genomic and
pharmacogenomic projects and technologies evolve.

2 Repositories of Germ-line Genetic DNA Sequences and Variants (Table 1)

There are many Web-based genome browsers which enable viewing
of reference germ-line DNA sequences of humans and other
species and are often integrated with a variety of contextual
information. The most well-known include the Ensembl, UCSC and
NCBI browsers. These browsers provide annotated reference DNA
sequence data and are searchable by chromosomal location, gene
or genetic variant, allowing investigators to visualize DNA
sequences of interest. The level of annotation varies significantly
between the genome browsers with the most comprehensive being
the UCSC browser. This browser has many annotations available
for display which include those for genetic variation, clinical
phenotypes, gene expression and regulation, epigenetics, and
comparative genomics.

Table 1
Repositories of germ-line DNA sequence and genetic variation

| Resource | Web address | Description |
|---|---|---|
| dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP/index.html | Serves as a central repository for both single base nucleotide substitutions and short indel polymorphisms |
| International HapMap Project | http://www.hapmap.org/index.html.en | Repository of SNP genotypes from eleven different ethnic populations |
| NIEHS SNPs | http://egp.gs.washington.edu | A SNP discovery resource focused on examining the relationships between environmental exposures, inter-individual sequence variation in human genes and disease risk in US populations |
| SeattleSNPs | http://pga.mbt.washington.edu | A SNP discovery resource focused on genes involved in pathways that underlie inflammatory responses in humans |
| UCSC Genome Browser | http://genome.ucsc.edu | Repository of annotated genomic reference DNA sequences |
| Ensembl | http://ensembl.org | Repository of annotated genomic reference DNA sequences |
| NCBI Genome | http://www.ncbi.nlm.nih.gov/sites/genome | Repository of annotated genomic reference DNA sequences |
| Wellcome Trust Case Control Consortium | http://www.wtccc.org.uk/info/access_to_data_samples.shtml | Identifies genome sequence variants influencing major causes of human morbidity and mortality, through implementation and analysis of large-scale genome wide association studies |
| 1000 Genomes | http://www.1000genomes.org | Repository of DNA sequencing and variant calls from different ethnic populations |
| SPSmart | http://spsmart.cesga.es/ | Allows browsing and combination of genotypes from many large-scale genomic databases into user defined groups |
| dbGaP | http://www.ncbi.nlm.nih.gov/gap | Repository of human genotypes from human genotype–phenotype studies |
| SNAP | http://www.broadinstitute.org/mpg/snap/index.php | Web-based service for retrieval of SNP LD data from the International HapMap and 1000 Genomes Projects |
| Database of Genomic Variants | http://projects.tcag.ca/variation/ | A repository of structural genetic variation found in healthy individuals |
| Innate Immunity Programs for Genomic Applications | https://regepi.bwh.harvard.edu/IIPGA2/index_html | Repository of human genotypes and haplotypes from genes related to innate immunity |
| JSNP database | http://snp.ims.u-tokyo.ac.jp/ | Repository of common SNPs in the Japanese population |
| dbVar | http://www.ncbi.nlm.nih.gov/dbvar/ | An NCBI repository of CNVs |
| FINDbase | http://www.findbase.org | Repository of allele frequencies of pharmacogenetic markers in different populations |

While genome browsers can provide DNA sequences with
genetic variants annotated, they are not themselves repositories of
germ-line genetic variation. The largest collection of SNPs can be
found in the NCBI dbSNP repository which now contains over 30
million human SNPs. SNPs can be searched for by rs ID or Human
Genome Variation Society name and dbSNP provides DNA
sequence flanking the SNP of interest, and, when available, popu-
lation allele and genotype frequencies. dbSNP does not contain
information about larger genomic alterations but the Database of
Genomic Variants (DGV) fills this gap in the knowledge. DGV lists
germ-line insertion/deletion (indel) variants larger than 100 base
pairs as well as inversions and copy number variants (CNVs) larger
than 1 kilobase. These structural variants can be browsed by chro-
mosome and searches can be performed by DNA sequence, gene
or chromosomal location. The Web site also has a genome browser
page allowing annotations of sequence with structural variation,
SNPs, genes and disease phenotypes. The DGV repository is lim-
ited to CNVs observed in healthy humans but dbVar, the NCBI
database of genomic structural variation, holds information about
germ-line and somatic CNVs identified in healthy and clinical sam-
ples. dbVar can be queried by many different criteria including
chromosomal position, gene, CNV ID, associated clinical pheno-
types, sample type, study and variant size and detailed CNV infor-
mation is provided.
With regard to genetic variants that are specifically related to
pharmacogenetics, the Frequency of Inherited Disorders database
(FINDbase) stores allelic frequency data of nearly 150 pharmacogenetic
variants from ~87,000 individuals belonging to different
populations and ethnic groups. FINDbase is described in detail in
Chapter 21.
Genome browsers and databases of genetic variation provide
information about DNA sequence and variation but these reposi-
tories do not contain germ-line DNA genotypes. This role is
filled by repositories such as those of HapMap and 1000 Genomes
Project. These projects have proved to have many applications to
pharmacogenomic studies and have greatly advanced research in
the field [2]. HapMap has led the way as a source of human
genotype information and has identified millions of common
(>5 % minor allele frequency) SNPs and short indels in eleven
different ethnic populations. Researchers can use the HaploView
software application from the Broad Institute to analyze linkage
disequilibrium (LD) patterns using genotype data downloaded
from the HapMap Web site. Another way to view LD from
HapMap is to use the CandiSNPer Web-tool. CandiSNPer can
retrieve the SNPs of a given HapMap population in a user-defined
region flanking the SNP of interest, determine LD between
this SNP and the flanking SNPs, and plot the annotated SNPs
according to chromosomal position. Rarer alleles are not well
represented in HapMap but the 1000 Genomes Project aims to
describe the map of SNPs, indels and larger structural variants
present at a frequency of at least 0.1 % (in coding regions) or
1 % (in noncoding regions) by whole-genome sequencing of more
than 2,000 individuals from five major ethnic populations [3].
1000 Genomes Project data can be visualized through a specific
version of the Ensembl genome browser.
Another way to view 1000 Genomes Project data is through the
SNP Annotation and Proxy Search (SNAP) Web site. SNAP uses
data from 1000 Genomes Project Pilot 1, in addition to HapMap,
to allow SNP searches by rs ID and the identification of SNPs
based on LD and the generation of LD plots.
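The LD values that such tools display can be illustrated with a short calculation of D, D' and r², three statistics commonly used to summarize LD between two SNPs. The following is a minimal sketch, assuming phased haplotype counts for two biallelic SNPs are already in hand; the function name and the example counts are hypothetical, and the tools named above may estimate LD differently (for example, from unphased genotypes).

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at the first SNP and allele B at
    the second; the other arguments cover the remaining three combinations.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at SNP 2

    variance_term = p_A * (1 - p_A) * p_B * (1 - p_B)
    if variance_term == 0:
        raise ValueError("LD is undefined when either SNP is monomorphic")

    d = p_AB - p_A * p_B  # deviation from independence
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max          # D scaled to its maximum possible value
    r_squared = d * d / variance_term  # squared allelic correlation
    return d, d_prime, r_squared

# Hypothetical counts of the four haplotypes in 100 chromosomes.
d, d_prime, r_squared = linkage_disequilibrium(40, 10, 10, 40)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r_squared:.3f}")
```

An r² close to 1 indicates that one SNP is a good proxy for the other, which is the idea behind proxy searches based on LD.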
There also exist Web-based repositories of smaller targeted
studies of human genetic variation such as the NIEHS SNPs and
SeattleSNPs projects. The NIEHS SNPs project has two phases:
Phase I entailed the identification of SNPs in genes from key bio-
logical pathways involved in environmental response through
DNA resequencing using a set of human DNA samples represent-
ing the diversity of the USA [4]; and Phase II, which is in progress,
is using second-generation DNA sequencing to characterize the
genetic variation of the entire exome. Variation data are uploaded
to the NIEHS SNPs Exome Variant Server which is searchable by
gene name or chromosomal location and complete exome sequenc-
ing data are available for download. The aim of the SeattleSNPs
project is to characterize genetic variation in candidate genes of
inflammatory responses by genotyping individuals from HapMap
298 Dylan M. Glubb et al.

population groups; the data generated are incorporated into
the HapMap project. The SeattleSNPs Web site has a Genome
Variation Server allowing searches by chromosomal location, gene
or rs ID. This database contains 4.5 million variants with
corresponding genotype data.
There are several Web-based repositories of germ-line genetic
variation from clinical studies. The NCBI database of Genotypes
and Phenotypes (dbGaP) contains genotypes, pedigree informa-
tion, fine mapping results and resequencing traces from over 2,000
clinical datasets. Searches of dbGaP can be made by disease,
genotyping platform or study name, or studies can be browsed. The core
study of the Wellcome Trust Case Control Consortium (WTCCC)
has genotype information derived from genotyping of 500,000
SNPs in 2,000 individuals for each of seven common diseases in
addition to 3,000 control individuals from the UK [5]. WTCCC
genotype data are not publicly available but access can be obtained
by researchers after application to the WTCCC Data Access
Committee.
Many databases and tools can be used to mine the data origi-
nating from these projects, some of which are described extensively
in this book. SPSmart is a Web-tool which incorporates genotype
data from many of the database repositories such as HapMap,
1000 Genomes and others and allows a specific population-based
analysis across databases by SNP, chromosomal location or gene.

3 Repositories of Cancer Somatic Variation (Table 2)

The identification of somatic DNA variation associated with cancer
and wide-scale genotyping of such variants has become an area of
considerable research activity in recent years. The International
Cancer Genome Consortium (ICGC) is collecting genomic data
from 50 different cancer types using more than 25,000 individual
tumors, initially through whole exome sequencing [6]. The data
will include the full range of somatic variation (SNPs, indels, CNVs
and chromosomal rearrangements) occurring at a frequency of
at least 3 %, and matching non-tumor tissue will be used to
distinguish somatic variants from germ-line ones. The ICGC Data Portal
allows searches to be made by genes, samples, simple somatic vari-
ants and structural rearrangements. Queries by gene will generate
mutation summaries identifying the frequencies of somatic muta-
tions in ICGC tumor datasets. Alternatively, genes or gene path-
ways affected by somatic mutations can be identified by searching
the database by tumor type. Raw sequencing data can be accessed
by researchers upon request. A variety of DNA sequencing and
genotyping techniques, which have changed over the course of the
project, have been used to identify somatic mutations by The
Cancer Genome Atlas (TCGA). TCGA is cataloguing genomic
information from more than 20 different types of cancer using
matched tumor and normal tissue and plans to collect samples from
500 patients for each cancer. As of early 2012, nearly 6,000 patient
samples have been collected and analyzed. The TCGA Data Portal
allows searches to be made by genes, somatic variants and disease
type, and raw sequencing data can be retrieved and downloaded
upon granting of access.

Table 2
Repositories of somatic (cancer) variation

Resource | Web address | Description
ICGC | http://www.icgc.org/ | Repository planned to contain DNA sequencing and somatic mutation data from 50 different tumor types
COSMIC | http://www.sanger.ac.uk/cosmic/ | Repository of somatic mutation, genotype and whole genome sequencing data from cancer studies
TCGA | http://cancergenome.nih.gov/dataportal | Repository planned to contain DNA sequencing and somatic mutation data from 20 different tumor types
variant GPS | http://variantgps.nci.nih.gov/cgfseq/pages/home.do | Repository of genotyping data and genetic variants identified from targeted next-generation sequencing in cancer studies
cBio Cancer Genomics Portal | http://www.cbioportal.org/public-portal/ | Searchable Web-tool which integrates tumor and somatic mutation data from TCGA and the Memorial Sloan-Kettering Cancer Center
Tumorscape | http://www.broadinstitute.org/tumorscape | Repository of somatic CNV data from multiple cancer types
Oncomine | https://www.oncomine.org | Searchable Web-tool which integrates somatic CNV data from TCGA
Cancer Genomics Browser | https://genome-cancer.ucsc.edu/ | Web-tool which integrates somatic mutation data from TCGA and other cancer genomic studies

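The use of matched normal tissue to separate somatic from germ-line variants, and the idea of a minimum frequency across tumors, can be sketched as follows. This is a simplified illustration rather than the ICGC or TCGA pipeline: the function names, the tuple encoding of a variant and the example calls are hypothetical, and real variant calling also has to deal with sequencing error, coverage and tumor purity.

```python
from collections import Counter

def somatic_variants(tumor_calls, normal_calls):
    """Variants called in a tumor but absent from the same patient's normal tissue."""
    return set(tumor_calls) - set(normal_calls)

def recurrent_variants(somatic_by_patient, min_fraction=0.03):
    """Fraction of patients carrying each somatic variant.

    Only variants seen in at least `min_fraction` of patients are returned.
    """
    counts = Counter()
    for variants in somatic_by_patient.values():
        counts.update(set(variants))  # count each variant once per patient
    n_patients = len(somatic_by_patient)
    return {
        variant: n / n_patients
        for variant, n in counts.items()
        if n / n_patients >= min_fraction
    }

# Hypothetical calls: (chromosome, position, reference allele, alternate allele).
tumor = [("chr1", 1000, "A", "T"), ("chr2", 2000, "G", "C")]
normal = [("chr2", 2000, "G", "C")]      # germ-line variant, also seen in the tumor
print(somatic_variants(tumor, normal))   # {('chr1', 1000, 'A', 'T')}
```

Applying `recurrent_variants` to the per-patient somatic sets of a cohort then keeps only the variants that recur often enough to meet a chosen frequency cutoff.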
The Catalogue of Somatic Mutations in Cancer (COSMIC)
has collated data generated from the aforementioned ICGC and
TCGA studies, the Cancer Genome Project (CGP) and targeted
sequencing of the NCI60 cell lines (a panel of 60 diverse human
cell lines) in known cancer genes, in addition to information
extracted through the literature. As of early 2012, this repository
holds genotypes from over 600,000 tumors, whole-genome
sequencing data from nearly 500 cancer genomes and lists more
than 200,000 somatic mutations. The somatic variation data can
be searched by gene, sample, tissue or mutation or browsed by
tissue and sub-tissue categories to identify genes in which
mutations reside and their distribution within a gene. To determine
whether somatic mutations in a specific gene may be driving a
cancer, the cBio Cancer Genomics Portal is a useful resource. This
Web-tool identifies the percentage of cases within a particular
tumor type which display mutations in a gene of interest, using
data derived from TCGA and the Memorial Sloan-Kettering Cancer
Center. A tab linking to survival curve data shows whether there is
a significant association between survival and mutations in the
gene of interest.
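The percentage-of-cases summary described above amounts to a simple grouped count. The sketch below assumes a hypothetical in-memory list of cases, each recording a tumor type and the set of genes mutated in that case; it does not call the portal itself, and the gene and tumor type names are placeholders.

```python
from collections import defaultdict

def percent_cases_mutated(cases, gene):
    """Percentage of cases per tumor type that carry a mutation in `gene`.

    `cases` is an iterable of (tumor_type, set_of_mutated_genes) pairs.
    """
    totals = defaultdict(int)
    mutated = defaultdict(int)
    for tumor_type, mutated_genes in cases:
        totals[tumor_type] += 1
        if gene in mutated_genes:
            mutated[tumor_type] += 1
    return {t: 100.0 * mutated[t] / totals[t] for t in totals}

# Hypothetical cases; gene and tumor type names are placeholders.
cases = [
    ("tumor_type_1", {"GENE_A", "GENE_B"}),
    ("tumor_type_1", {"GENE_B"}),
    ("tumor_type_1", set()),
    ("tumor_type_1", {"GENE_B"}),
    ("tumor_type_2", {"GENE_A"}),
    ("tumor_type_2", set()),
]
print(percent_cases_mutated(cases, "GENE_A"))
```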
Somatic CNVs are a common feature of cancers and, in addition
to COSMIC and TCGA, the Oncomine and Tumorscape
repositories contain such information. The Tumorscape Web site
allows data from a study of somatic CNVs in more than 3,000
specimens from 26 types of tumors to be queried by tumor type or
by gene, and data can be downloaded from specific studies.
Searches by gene will provide summary information of genetic
amplifications and deletions in specific cancers while searches by
tumor type will summarize the amplifications and deletions
identified. The UCSC Cancer Genomics Browser also hosts somatic CNV
data in addition to other somatic mutations from TCGA and
other cancer genomic projects. The data are searchable by
chromosome or gene but genomic data cannot currently be downloaded.

4 Repositories of Gene Expression Data (Table 3)

Just as technologies have evolved to allow the interrogation of
DNA variation at the genome-wide level, it is also now possible to
quantitate the mRNA output of the genome through DNA
expression arrays or next-generation sequencing (RNA-seq) [7]. One of
the biggest resources of these data is the Gene Expression Omnibus
(GEO) repository which is maintained on the NCBI Web site.
GEO allows the downloading of gene expression data from over
2,500 datasets containing more than 625,000 samples, including
many pharmacogenomic studies, and can be queried for studies,
experimental keywords, genes or even nucleotide sequences.
Alternatively, GEO can be browsed by dataset, array platform or
samples. Another large repository of mRNA data is ArrayExpress
which contains microarray gene expression data collected to MIAME
standards from over 25,000 experiments. ArrayExpress data can be
downloaded upon querying or browsing by studies, experiments,
array platforms, or genes. Within the ArrayExpress archive, a
subset of more than 5,000 experiments has been curated and
re-annotated, allowing queries of individual genes to determine effects
on expression across experiments in specific diseases, tissues or
cells under different biological conditions or treatments.

Table 3
Repositories of gene expression data

Resource | Web address | Description
ICGC | http://www.icgc.org/ | Copy number, rearrangement, expression, and mutation data
GEO | http://www.ncbi.nlm.nih.gov/geo/ | Repository of gene expression data from >2,500 studies
Oncomine | http://www.oncomine.org | Repository of gene expression data from GEO, TCGA and other projects
Cell Miner | http://discover.nci.nih.gov/cellminer/ | Repository of gene expression and GI50 drug concentration data from NCI60 cell lines
SBM DB | http://www.lsbm.org/site_e/database/ | Repository of mRNA expression from healthy and tumoral tissues
ArrayExpress | http://www.ebi.ac.uk/arrayexpress/ | Repository of microarray-derived mRNA expression data from >25,000 studies
GENT | http://medical-genome.kribb.re.kr/GENT/ | Repository of microarray-derived mRNA expression from >34,000 tissue samples
Cancer Genome Anatomy Project | http://cgap.nci.nih.gov/cgap.html | Repository of gene expression from normal, precancer and cancer cells
UCSC Genome Browser | http://genome.ucsc.edu | DNA sequences annotated with gene expression data from a wide range of sources

Oncomine has a focus on cancer gene expression and includes
relevant data from GEO, TCGA and other publicly available
repositories. Tools in Oncomine allow the comparison of gene
expression in tumor and normal tissue samples to identify genes
which are specifically expressed in the dataset of interest. The dif-
ferential analysis option, available in the premium edition of
Oncomine, is especially pertinent to the study of pharmacogenomics
as it can be used to examine drug sensitivity and patient treatment
response data. The Gene Expression across Normal and Tumor tis-
sue (GENT) database also contains gene expression data from tumor
and normal tissue. GENT stores mRNA data profiled using
Affymetrix microarray gene expression arrays from over 34,000 tis-
sue samples and nearly 1,000 human cancer cell lines. Queries can
be made by Affymetrix probe or gene and the output plots gene
expression across tissues (tumor and/or normal) or cell lines.
Another source of expression data from normal and tumor tissue
samples, including expression of miRNAs, is the UCSC genome
browser. Furthermore, gene expression data from the NCI60 cell
lines are also available in the browser. The NCI60 cell lines have
been used to screen >100,000 chemicals and the Cell Miner database
contains the drug concentrations which inhibit cell growth by 50 %
(GI50), expression (gene, protein and miRNA), DNA methylation
and fingerprinting data generated from these cell lines. Queries can
be made by drug, gene, protein, miRNA or tissue and the
corresponding data are available for download. The Systems Biology and
Medicine Database (SBM DB) is another repository which stores
expression data from cell lines. SBM DB contains mRNA expres-
sion profiles from 112 different (normal and tumor) tissues and
cell lines measured using Affymetrix U133 microarrays. Searches
can be made by gene and relative expression visualized across the
different tissues and cells.
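To make the GI50 measure concrete, the sketch below estimates the concentration at which growth falls to 50 % of control by interpolating between the two tested doses that bracket that level. It is an illustrative approach only, assuming growth is supplied as a percentage of untreated control and that interpolation is done on a log10 concentration scale; it is not the procedure used to produce the Cell Miner values, and the function name and example data are hypothetical.

```python
import math

def estimate_gi50(concentrations, percent_growth):
    """Interpolate the concentration at which growth falls to 50 % of control.

    Interpolation is linear in log10(concentration) between the two tested
    doses that bracket 50 % growth. Returns None if 50 % is never reached.
    """
    if len(concentrations) != len(percent_growth):
        raise ValueError("one growth value is required per concentration")
    if any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be positive")
    points = sorted(zip(concentrations, percent_growth))
    for (c_low, g_low), (c_high, g_high) in zip(points, points[1:]):
        if g_low >= 50.0 >= g_high and g_low != g_high:
            fraction = (g_low - 50.0) / (g_low - g_high)
            log_gi50 = math.log10(c_low) + fraction * (
                math.log10(c_high) - math.log10(c_low)
            )
            return 10 ** log_gi50
    return None

# Hypothetical dose-response data: concentrations in micromolar, growth in % of control.
doses_um = [0.01, 0.1, 1.0, 10.0]
growth = [95.0, 80.0, 20.0, 5.0]
print(estimate_gi50(doses_um, growth))
```

A lower GI50 indicates a more sensitive cell line, so values of this kind can be compared against expression or genotype data from the same cell lines.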

5 Repositories of Expression Quantitative Trait Loci (eQTL) Data (Table 4)

Gene expression can be used as a quantitative trait to which SNPs
can be associated and such variants are known as expression
quantitative trait loci (eQTL). This enables mRNA expression data from
genome-wide studies to be analyzed for associations with the large
numbers of SNPs which are known today. The field of eQTL
research is in the early stages of development but several searchable
repositories are available on the Internet and one of them, SCAN,
which contains eQTLs identified in lymphoblastoid cell lines
(LCLs), is discussed in detail in Chapter 14. The eQTL.uchicago.edu
Web site hosts a genome browser which has eQTL annotations
from liver, brain, LCL, monocyte and T-cell studies.
The Genotype-Tissue Expression (GTEx) eQTL browser contains
eQTL data from liver, brain and LCL studies. Further eQTL data
from other tissues will be added as the project progresses. The data
can be searched by SNP or gene and results can be filtered by the
significance of the eQTL association. eQTLs are also annotated
with significant clinical associations observed in Genome-Wide
Association Studies (GWAS). seeQTL is a human eQTL browser
with data from monocyte, brain and LCL studies and incorporates
a meta-analysis approach to score eQTLs across the studies of
LCLs from different HapMap populations [8]. The browser is
searchable by chromosomal position, gene or SNP.

Table 4
Repositories of eQTL data

Resource | Web address | Description
SCAN | http://www.scandb.org | Repository of SNP and CNV eQTLs identified in LCLs
eqtl.uchicago.edu | http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/ | Genome browser with eQTL annotations
GTEx | http://www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi | Repository of eQTL data from different tissues
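The idea of treating expression as a quantitative trait can be sketched as a regression of expression on genotype dosage (0, 1 or 2 copies of one allele). The following is a minimal sketch under an additive model, assumed here for illustration; the function name and the example values are hypothetical, and the eQTL resources described in this section add significance testing, covariates and multiple-testing control that are omitted here.

```python
import math

def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on allele dosage.

    Returns (slope, intercept, r_squared). The slope is the estimated change
    in expression per additional copy of the counted allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    s_xx = sum((x - mean_x) ** 2 for x in dosages)
    s_yy = sum((y - mean_y) ** 2 for y in expression)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("dosage and expression must both vary")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r * r

# Hypothetical data: allele dosage and normalized expression for six samples.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept, r_squared = eqtl_regression(dosages, expression)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r_squared:.3f}")
```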

6 Repositories of Genetic and Genome-Wide Association Studies and Related Tools (Table 5)

Repositories of phenotype–genotype association studies are very
useful resources for pharmacogenomic researchers looking to find
genes or genetic variants associated with drug responses or clinical
relevance for genes or variants of interest. dbGaP is a repository of
such clinical data and can be browsed by studies, clinical variables,
analyses and datasets. Genotype and phenotype data are available
for download although some data are embargoed or have con-
trolled access. Additionally, dbGaP contains data from GWAS.
These analyses allow the testing of associations between clinical
phenotypes and a large number of common SNPs. The National
Human Genome Research Institute (NHGRI) maintains a catalog
which summarizes the findings of GWAS which have assayed at
least 100,000 SNPs. The entire catalog can be downloaded or
searched by clinical phenotype, chromosomal region, gene or SNP.
There are also Web-based portals such as the Phenotype-Genotype
Integrator (PheGenI) which allow dbGaP and the NHGRI catalog
to be queried by clinical phenotype, SNP, gene or chromosomal
location. Queries using these tools will return a summary of
significant SNP-phenotype associations with links to the relevant
studies in dbGaP or the NHGRI catalog.

Table 5
Repositories of GWAS and clinical genetic associations

Resource | Web address | Description
GWAS Central | http://www.gwascentral.org/ | Repository of genetic association studies
HuGE Navigator | http://hugenavigator.net/HuGENavigator/home.do | Series of Web-tools which enable mining of the literature and genetic association studies
OMIM | http://omim.org/ | Compendium of human genes and related phenotypes
NHGRI catalog | http://www.genome.gov/gwastudies/ | Repository of GWAS results
dbGaP | http://www.ncbi.nlm.nih.gov/gap | Repository of phenotype–genotype data from clinical studies
PheGenI | http://www.ncbi.nlm.nih.gov/gap/PheGenI | Web-tool which allows queries of dbGaP and the NHGRI catalog

The human genome epidemiology (HuGE) Navigator enables
the mining of the literature, GWAS, meta-analyses and cancer studies
for clinical and epidemiological genetic associations. Queries can
be made by disease, gene, SNP or study keyword and the individual
databases of associations organized by GWAS, genes and diseases
can be downloaded. Searching by disease will generate a table of
genes which studies have shown to be associated with that disease
and the table further divides studies into GWAS and meta-analyses.
There is a bioinformatics tool called Gene Prospector which rates
the gene-disease associations based on the available evidence. Links
to relevant studies in dbGaP are also provided. The GWAS results
link to summaries of the associations. Similarly, searching by gene
will generate a table of diseases which studies have shown to be
associated with that gene. GWAS Central performs a similar role to
the HuGE Navigator and is one of the largest repositories of
genetic association studies with summaries of data from over 1,000
studies, including GWAS and candidate gene or region studies.
GWAS Central builds upon SNPs and other variants from public
databases to provide findings from genetic association studies
without providing individual level genotypes or phenotypes. Summary
level data can be presented for one or more studies, comprising
single or multiple experiments and subjects. Data can be queried or
browsed by SNP, gene, chromosomal location, disease, or study
keyword and filtered by the p-value of the association.
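Filtering summary-level associations by p-value can be sketched in a few lines. The example below applies a conventional Bonferroni correction (a significance level divided by the number of SNPs tested), which is an illustrative choice and not a rule stated for any of the resources above; the record layout, function names and rs-style identifiers are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """P-value cutoff after Bonferroni correction for `n_tests` comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def significant_associations(records, n_tests, alpha=0.05):
    """Records whose p-value passes the corrected cutoff, most significant first."""
    threshold = bonferroni_threshold(n_tests, alpha)
    hits = [record for record in records if record["p_value"] <= threshold]
    return sorted(hits, key=lambda record: record["p_value"])

# Hypothetical summary-level results from a study assaying 500,000 SNPs.
records = [
    {"snp": "rs_hypothetical_1", "phenotype": "drug response", "p_value": 3e-9},
    {"snp": "rs_hypothetical_2", "phenotype": "drug response", "p_value": 4e-6},
    {"snp": "rs_hypothetical_3", "phenotype": "drug response", "p_value": 2e-8},
]
for hit in significant_associations(records, n_tests=500_000):
    print(hit["snp"], hit["p_value"])
```

With 500,000 tests the cutoff is about 1e-7, so the second record is dropped even though its p-value would look striking in a single-SNP candidate study.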

7 Bioinformatic Tools to Predict SNP Function (Table 6)

Bioinformatic prediction of the functional consequences of a
particular genetic variant plays an increasingly important role in genetic
association studies, as genetic associations identified from non-can-
didate driven studies, such as GWAS, do not typically have an obvi-
ous mechanistic explanation. Non-synonymous (i.e., amino acid
changing) SNPs are the variants most likely to have effects on pro-
tein function and many validated pharmacogenetic SNPs are non-
synonymous. There are several Web sites which provide a
bioinformatic analysis of non-synonymous SNPs. SIFT (sorts intol-
erant from tolerant) is an algorithm which is based on the degree
of amino acid conservation in similar sequences and SNPs can be
queried by rs ID or chromosomal location. The results of the analy-
sis are a score and a prediction of whether the SNP is damaging or
tolerated by the protein. The Polymorphism Phenotyping
(PolyPhen) Web site performs a similar role and contains analyses
of the effects of all 54,373 unique human SNPs from the dbSNP
build 126 which map to proteins. A score is given to each SNP,
along with a prediction of its effect on parameters such as protein
structure, hydrophobicity, and function. The complete database can
be downloaded or searches can be made by rs ID or amino acid
sequence.

Table 6
Tools to predict SNP function

Resource | Web address | Description
ENCODE Project | http://genome.ucsc.edu/ENCODE/ | ENCODE data can be directly downloaded or visualized through the UCSC Genome Browser
SNP Function Portal | http://brainarray.mbni.med.umich.edu/Brainarray/Database/SearchSNP/snpfunc.aspx | Provides annotation of SNPs at the genome, transcript, protein, pathway, disease and population levels
FastSNP | http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp | Web-tool which incorporates bioinformatic analyses of SNP function
pfSNP | http://pfs.nus.edu.sg | Web-tool which incorporates bioinformatic analyses of SNP function and results from genetic association studies
Pupasuite | http://pupasuite.bioinfo.cipf.es | Web-tool which incorporates bioinformatic analyses of SNP function
SIFT | http://sift.jcvi.org/ | Web-tool which predicts effects of non-synonymous SNPs
PolyPhen | http://genetics.bwh.harvard.edu/pph/ | Web-tool which predicts effects of non-synonymous SNPs
FuncPred | http://snpinfo.niehs.nih.gov/snpfunc.htm | Web-tool which incorporates bioinformatic analyses of SNP function
F-SNP | http://compbio.cs.queensu.ca/F-SNP/ | Web-tool which incorporates bioinformatic analyses of SNP function

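The pairing of a score with a damaging or tolerated call, as described for SIFT above, can be illustrated with a simple threshold rule. The sketch assumes the commonly quoted SIFT convention that scores run from 0 to 1 and that scores at or below 0.05 are reported as damaging; that cutoff is not stated in this chapter, and the function name and example scores are hypothetical.

```python
# Commonly quoted SIFT cutoff; an assumption, not a value given in this chapter.
SIFT_CUTOFF = 0.05

def classify_sift(score, cutoff=SIFT_CUTOFF):
    """Map a SIFT-style score (0 to 1) to a 'damaging' or 'tolerated' call."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("SIFT-style scores lie between 0 and 1")
    return "damaging" if score <= cutoff else "tolerated"

# Hypothetical scores for three non-synonymous variants.
scores = {"variant_1": 0.00, "variant_2": 0.04, "variant_3": 0.32}
for name, score in scores.items():
    print(name, score, classify_sift(score))
```

Because the call depends on a fixed cutoff, variants scoring just either side of it deserve the same scrutiny, which is one reason to compare predictions from more than one tool.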
Synonymous SNPs or those not in coding regions can have
functional effects, most likely on gene expression, through mecha-
nisms which modify exon–intron splicing, transcription factor and
miRNA binding. A useful way to analyze an SNP when uncertain
of its functionality is to use one of the Web-based meta-analysis
tools. SNP Function Prediction (FuncPred) from the NIEHS can
be queried by SNP, gene, or chromosomal position and provides
predictions of transcription factor and miRNA binding, splicing,
non-synonymous SNP effects (including PolyPhen analysis) and stop
codons, as well as scores for regulatory potential and for sequence
conservation based on comparative genomics. In addition, SNPs in LD
with the variant of interest in a specific population can also be
visualized. F-SNP provides similar functional analyses of effects on
splicing and transcription, and also protein function, including
posttranslational effects, using information from 16 tools and
databases. FastSNP provides a gene or SNP based search and provides
a risk definition based upon a decision tree hierarchy classifying
SNPs into different categories of functional significance from
nonsense to downstream SNPs with no known function.
The EnCyclopedia Of DNA Elements (ENCODE) project has
generated much genomic and epigenomic data which aids the
identification of functional elements in the genome and thus
enables hypotheses to be generated about genetic variants.
ENCODE data has been generated from experiments examining
transcription, chromatin accessibility, histone modification, DNA
methylation, transcription factor binding, and several other
genomic/epigenomic features in many different cell types [9].
Data are available for download or can be visualized in the UCSC
genome browser.

8 Web sites Related to Pharmacogenetic Testing (Table 7)

There are now over 100 FDA-approved drugs which have
pharmacogenetic information in their label and the FDA Web site
contains a list of these drugs with links to label and pharmacoge-
netic information. Warfarin is one of these drugs and its pharmaco-
genetics has been very well characterized. Indeed, a dosing algorithm
for warfarin is available at the Warfarin Dosing Web site (Chapter
22). The algorithm takes into account the genotypes of variants in
CYP2C9, CYP4F2, GGCX and VKORC1, in addition to clinical
factors, to estimate the warfarin dose a patient should receive.
There are more pharmacogenetic tests available than just those
listed on FDA drug labels and the Genetic Diagnostic Network
(GENDIA), an international network of more than 100
laboratories, currently lists 201 tests on its pharmacogenetic Web site
(PHARMACO-GENDIA). A searchable repository of genetic test
information is provided by the Genetic Testing Registry (GTR)
from the NIH. The GTR links genes and genetic variants to
diseases and drug responses. The database can be queried by test,
drug, disease and other clinical phenotypes, gene, protein or
laboratory providing genetic tests. In the context of cancer genetic
tests, Cancer GEM KB maintains a list of related tests which can be
queried by disease, gene or drug.

Table 7
Web sites related to pharmacogenetic testing

Resource | Web address | Description
FDA | http://www.fda.gov/Drugs/ScienceResearch/ResearchAreas/Pharmacogenetics | Links to pharmacogenomic information
PHARMACO-GENDIA | http://www.pharmaco-gendia.net | Repository of pharmacogenetic tests
GTR | http://www.ncbi.nlm.nih.gov/gtr/ | Repository of genetic test information
CancerGEM KB | http://www.hugenavigator.net/CancerGEMKB | Repository of cancer genetic tests
Warfarin Dosing | http://www.warfarindosing.org/Source/Home.aspx | Web-based algorithm for warfarin dosing
PharmGKB | http://pharmgkb.org | Repository of pharmacogenetic information and PGx-based dosing guidelines

9 Biorepositories and Population-Based Cohorts with Linked Medical Records (Table 8)

Table 8
Web sites for biorepositories and large cohorts with linked medical information

Resource | Web address | Description
CGN | http://www.cancergen.org/ | Clinical database of patients from 14 cancer centers
NCI Specimen Resource Locator | http://pluto3.nci.nih.gov/tissue/default.cfm | Web-portal which allows searches for biological samples from cancer patients
BioLINCC | http://biolincc.nhlbi.nih.gov/home/ | Web-portal which allows searches for biological samples and clinical data from NHLBI studies
Rare Disease-HUB | http://biospecimens.ordr.info.nih.gov/ | Web-portal which allows searches for biological samples from patients with rare diseases
PGP | http://www.personalgenomes.org/ | Open access repository of genotype data with linked medical and personal information
iSAEC | http://www.saeconsortium.org | Repository of clinical and genotyping data from studies of adverse drug events
eMERGE Network | https://www.mc.vanderbilt.edu/victr/dcc/projects/acc/index.php/Main_Page | Network of DNA repositories with linked electronic medical records
UK Biobank | http://www.ukbiobank.ac.uk/ | Population-based repository of biological samples with linked medical, lifestyle and family history data

Large populations which are accurately clinically phenotyped are
increasingly needed to explore and understand the relationship
between genetic variants and drug responses. Moreover, studies
linked to national biorepositories which store biological samples
from participants, such as the BioBank Japan Project [10] and UK
Biobank [11], have great potential for pharmacogenomic analysis.
The UK Biobank is a longitudinal project which has collected
blood, urine and saliva samples, in addition to health and lifestyle
information, from 500,000 individuals aged 40–69 who live in the
UK. To access samples, researchers have to register, propose a
research project and pay charges related to administration of the
project. However, researchers can visualize data summaries using
UK Biobank Showcase, an open access Web-portal, which allows
the study to be queried or browsed for factors such as lifestyle,
family and medical history, health and physical measures. Indeed,
many data coordinating centers and registries allow their registries
to be searched for numbers of patients with specific clinical charac-
teristics without restrictions. The Cancer Genetics Network (CGN)
is a data coordinating center for 14 clinical centers and the data-
base contains information from over 26,000 individuals with can-
cer and/or a family history of cancer. The CGN database can be
queried by demographics, family history of cancer, clinical charac-
teristics, and whether a genetic test has been performed to deter-
mine the number of patients who meet the criteria selected. Access
to the database is then possible after acceptance of a research pro-
posal. One Web-based repository which contains medical and per-
sonal information, in addition to genotype data, with unrestricted
access is the Personal Genome Project (PGP) [12]. The PGP has
enrolled 1,000 individuals so far and aims to incrementally expand
to 100,000.
There are also large studies which have matching electronic
medical records. For example, the Electronic Medical Records and
Genomics (eMERGE) Network is an NHGRI consortium of five
institutions that link DNA from over 100,000 individuals to
electronic medical records [13]. DNA samples and electronic medical
records from the eMERGE Network of DNA repositories can be
accessed after a successful affiliate membership application that
contains a research proposal.
The International Serious Adverse Event Consortium (iSAEC)
provides a Web-based portal to serious adverse event clinical and
genotype data from participating pharmaceutical companies and
academic institutions, which are available to researchers who sign
the consortium’s data use agreement. In addition, iSAEC has also compiled
data from similarly genotyped population controls matched for
age, sex and ethnicity.
Web-based repositories which enable biospecimens with associated
clinical data to be located and their availability checked provide
useful tools for researchers. The NCI Specimen Resource Locator can
be queried for tumor specimens and other matching biological
samples (e.g., DNA/RNA) based on the type of tumor and the
associated clinical data available, and will link to specific
biospecimen repositories matching the specified criteria. Similarly,
the Rare Disease-HUB from the Office of Rare Diseases Research (NIH)
can be queried by disease, biospecimen anatomic source, processing
and storage method, and imaging data available, and will then link
to repositories matching the specified criteria. The Biologic
Specimen and Data Repository Information Coordinating Center
(BioLINCC) contains information about clinical studies supported
by the National Heart Lung and Blood Institute (NHLBI). The
BioLINCC can be queried by study conditions and biospecimen
type. Researchers must register to request data and biospecimens
or add study data and information to the database.

10 Conclusion and Future Directions

Pharmacogenomic research is a rapidly evolving research area
moving away from singular in vitro/ex vivo/in vivo experiments and
converging on large-scale agnostic in vivo genotype/phenotype
studies of large cohorts of patients. This growth and evolution are
certainly tied to the growth in processing and communication
capacity provided by Internet-connected workstations and computer
facilities which dwarf the technology available only a few years
ago. Barriers remain to the utilization and interpretation of the
results of these experiments, including the lack of a common
standardized application programming interface among
pharmacogenomic repositories, which leads to a lack of connectivity and
some level of redundancy. Problems of ethics, human subject
protection, and the commercialization of publicly funded resources also
remain important areas for discussion among both scientific and
public groups. As new technologies for massive data generation
further emerge and become more widely used, the population of
computational Web-based portals for genomic data analysis is
expected to grow as well and perhaps lead to the emergence of a new
type of system science to merge these data sources together into
knowledge.

References

1. Wagner MJ (2009) Pharmacogenetics and personal genomes. Per Med 6:643–652
2. Zhang W, Dolan ME (2010) Impact of the 1000 genomes project on the next wave of pharmacogenomic discovery. Pharmacogenomics 11:249–256
3. 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073
4. Rieder MJ et al (2008) The environmental genome project: reference polymorphisms for drug metabolism genes and genome-wide association studies. Drug Metab Rev 40:241–261
5. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678
6. Hudson TJ et al (2010) International network of cancer genome projects. Nature 464:993–998
7. Roy NC et al (2011) A comparison of analog and Next-Generation transcriptomic tools for mammalian studies. Brief Funct Genomics 10:135–150
8. Xia K et al (2012) seeQTL: a searchable database for human eQTLs. Bioinformatics 28:451–452
9. Myers RM et al (2011) A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 9:e1001046
10. Nakamura Y (2007) The BioBank Japan project. Clin Adv Hematol Oncol 5:696–697
11. Ollier W, Sprosen T, Peakman T (2005) UK Biobank: from concept to reality. Pharmacogenomics 6:639–646
12. Angrist M (2009) Eyes wide open: the personal genome project, citizen science and veracity in informed consent. Per Med 6:691–699
13. Kho AN et al (2011) Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med 3:79re1
Chapter 20

PharmGKB: The Pharmacogenomics Knowledge Base

Caroline F. Thorn, Teri E. Klein, and Russ B. Altman

Abstract
The Pharmacogenomics Knowledge Base, PharmGKB, is an interactive tool for researchers investigating
how genetic variation affects drug response. The PharmGKB Web site, http://www.pharmgkb.org, dis-
plays genotype, molecular, and clinical knowledge integrated into pathway representations and Very
Important Pharmacogene (VIP) summaries with links to additional external resources. Users can search
and browse the knowledgebase by genes, variants, drugs, diseases, and pathways. Registration is free to the
entire research community, but subject to agreement to use for research purposes only and not to redis-
tribute. Registered users can access and download data to aid in the design of future pharmacogenetics and
pharmacogenomics studies.

Key words PharmGKB, Database, Pharmacogenetics, Pharmacogenomics, Genotype, Phenotype,
Pathways, VIP genes, Pharmacogenes

1 Background

In 1999 the National Institutes of Health recognized the need for
a freely available collection of high quality genotypic and phenotypic
data from pharmacogenetics and pharmacogenomics studies,
and announced the funding of the Pharmacogenetics Research
Network (PGRN). Its mission: “to enable the formation of a series
of multi-disciplinary research groups funded to conduct studies
addressing research problems in pharmacogenetics. These groups
are united by the purpose of developing and populating a public
database, which was envisioned as a tool for all researchers in the
field.” [1] This tool is the PharmGKB, the Pharmacogenomics
Knowledge Base, with Web site access that provides summaries of
pharmacogenomic relationships linked to the data that support
them, to be used by the scientific community for pharmacogenetics
and pharmacogenomics research (Fig. 1).

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_20, © Springer Science+Business Media, LLC 2013


Fig. 1 The PharmGKB homepage, http://www.pharmgkb.org, contains directed search boxes where users can
also browse from lists of genes, variants, drugs, or diseases

2 Overview

PharmGKB captures pharmacogenomic relationships in a structured
format so that they can be searched, interrelated, and displayed
according to the researcher's interests, either for manual inspection
or for download and further analysis. The knowledge base is valuable
both to the researcher who is interested in a specific single
nucleotide polymorphism and its influence on a particular drug
treatment and to the researcher interested in a disease or drug and
looking for candidate genes which may affect disease progression
or drug response. At present PharmGKB has over 5,000 variant
annotations, with over 900 genes related to drugs and over 600
drugs related to genes [April 2013]. The data contained within
the database is curated from a variety of sources to bring together
the most relevant features of genes, drugs, and diseases for
pharmacogenomics [2]. Some information is imported directly from
other trusted standard repositories (such as gene symbols and
names from the HUGO Gene Nomenclature Committee, HGNC [3], and
drug names and structures from DrugBank [4]); detailed relationship
data from the literature is manually curated and described using
controlled vocabularies. For genes and drugs where many
relationships are known, these are compiled by curators and experts
in the field into Very Important Pharmacogene (VIP) summaries and
PharmGKB drug pathways, which are published in an interactive form
on the Web site and in conventional form in peer-reviewed
journals [5, 6].
PharmGKB averages around 30,000 visitors per month. Of the
more than 5,000 user accounts, approximately 30 % are identified
as academic users (.edu), with 30 % from industry (.com) and 8 %
from nonprofit or government domains. A user account and acceptance
of the PharmGKB database license agreement are necessary for
downloading data. Data is distributed as zipped packages of
spreadsheets with literature relationships, variant annotations,
clinical annotations, or pathway relationships. Individualized genotype
and phenotype datasets from pharmacogenomics studies of the
PGRN can be found under the download tabs of the relevant
genes, drugs, and diseases.
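
As a minimal sketch of working with such a download, the Python snippet
below reads one tab-separated table out of a zipped package without
unpacking it to disk. The file name annotations.tsv, the column names,
and the example row are hypothetical placeholders and do not reflect the
actual PharmGKB file layout; the real names should be taken from the
downloaded package itself.

```python
import csv
import io
import zipfile


def read_table_from_zip(zip_source, member_name, delimiter="\t"):
    """Return the rows of one delimited text file inside a zip archive as dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter=delimiter))


def build_demo_package():
    """Create a tiny in-memory archive so the sketch runs without a download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "annotations.tsv",  # hypothetical file name
            # rs0000001 is a placeholder identifier, not a real variant
            "gene\tvariant\tdrug\nTPMT\trs0000001\tmercaptopurine\n",
        )
    buffer.seek(0)
    return buffer


rows = read_table_from_zip(build_demo_package(), "annotations.tsv")
genes = sorted({row["gene"] for row in rows})
print(genes)
```

In practice the first argument would be the path to the downloaded zip
file rather than the in-memory demo archive.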
PharmGKB exchanges data with Drugbank, dbSNP, the
CYP alleles database, and HuGE Navigator. Data is imported
from HGNC, Entrez, and UCSC Golden Path. Links are also
maintained with a number of other sources as seen under the
Downloads/Link Outs tab.
The initial interaction with the Web site is through pages
devoted to genes, variants, drugs, diseases, and pathways, with
directed searches to make access to these more rapid for focused
users (see the homepage, Fig. 1). The data is represented according
to a hierarchy and tagged with icons. This enables many facets of
the data to be captured and stored in the database but also permits
the user to find exactly what they are looking for. The use of stan-
dardized vocabulary aids both the sorting and storage of data and
supports automated methods of analysis as well as traditional
human browsing.

3 Initial Interactions with the PharmGKB Web Site: Gene, Drug, and Disease Pages

In PharmGKB, genes are catalogued according to the HGNC [3].
Alternative names and symbols are also listed; they can be
submitted by researchers and searched on. The general layout of
a gene page is shown in Fig. 2. The data are organized under tabs
for clinical pharmacogenomics, pharmacogenomics research, over-
view, VIP, haplotypes, pathways, related genes, drugs, and diseases,
datasets, and downloads or links out. The clinical pharmacoge-
nomics tab displays any dosing guidelines involving the gene pub-
lished by CPIC (the Clinical Pharmacogenomics Implementation
Consortium) [7] and the Royal Dutch Pharmacogenetics Working
Group [8]. This tab also has drug labels, high level clinical annota-
tions (described in more detail below) and links to genetic testing
sources for the gene. The pharmacogenomics research tab lists
genomic variants associated with the gene and the drugs they
interact with and links to annotations that describe the relationship
between the variants and drugs from individual papers (described
in more detail below). The overview page contains the basic data
about the gene, standard and alternate names and symbols, and
location on the genome. The VIP tab is present for genes where
there is considerable knowledge of the pharmacogenomics and a
summary has been written (see below for more details and Fig. 2).
Pathway tabs link to the curated drug pathways that involve the gene.
Related genes, drugs, and diseases are compiled from literature
annotations (described below). Download/Link outs provide a
mechanism to retrieve primary data files or go to the original source.

Fig. 2 The TPMT gene page showing genomic variants and related drugs with links to the annotations and
haplotypes and tabs for clinical pharmacogenomics, pharmacogenomics research, overview, VIP, haplotypes,
pathways, related drugs and diseases, and downloads and links out

Drug and disease pages follow a similar tabbed layout to
the gene pages. Drug information, including pharmacological
effects, mechanisms of action, and structures, was obtained from
DrugBank [4] and PubChem [9]. Additional information and short
pharmacogenomics summaries for the top 100 drugs (selected
based on a combined list of the most prescribed drugs and the
drugs most often reported for adverse events) were compiled by
PharmGKB curators. Disease information is imported from MeSH
[10] and SNOMED CT [11].

4 Curated Knowledge

Capturing the wealth of pharmacogenomic data already published
is a considerable challenge. Most of this information is stored as
natural language text in journal articles or books and is not
easily retrieved by automated methods. We conduct research into
natural language processing (NLP) and ways to appropriately
aggregate all pharmacogenetics and pharmacogenomics
articles in PubMed [12], but there is still a necessity for human
curation to ensure quality data [13].

4.1 Literature Annotations

A basic literature annotation captures the genes, drugs, and diseases
involved in a single article from PubMed and the category (or
categories) of evidence that describe the type of relationships measured.
Our current process for literature annotation uses NLP to suggest
possible genes, drugs, and diseases to the curator [14, 15], but after
reading the article the curator decides which are appropriate.
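
The suggestion step can be pictured with a deliberately simple
dictionary lookup. The sketch below is not the PharmGKB NLP pipeline
[14, 15]; it only illustrates the idea of proposing candidate genes and
drugs from text for a curator to accept or reject. The mini-lexicon and
the example sentence are made up for illustration.

```python
import re

# Hypothetical mini-lexicon; a real system would load full gene and drug vocabularies.
LEXICON = {
    "gene": {"CYP2D6", "UGT1A1", "TPMT"},
    "drug": {"codeine", "irinotecan", "mercaptopurine"},
}


def suggest_terms(text):
    """Return candidate terms found in the text, grouped by category."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    lowered = {token.lower() for token in tokens}
    suggestions = {}
    for category, terms in LEXICON.items():
        if category == "gene":
            # Gene symbols are matched case-sensitively.
            hits = terms & tokens
        else:
            # Drug names are matched case-insensitively.
            hits = {term for term in terms if term.lower() in lowered}
        suggestions[category] = sorted(hits)
    return suggestions


abstract = "UGT1A1 genotype was associated with neutropenia after irinotecan."
print(suggest_terms(abstract))
```

The output is only a list of candidates; as described above, the final
decision on which terms apply remains with the curator.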

4.2 Genomic Variant Annotations and Very Important Pharmacogenes

In addition to tagging articles for basic relationships, curators can
also describe in detail the relationships for individual variants and
their effects on drug response. The variant is mapped to the dbSNP
identifier and controlled vocabularies are used to define the alleles
or genotypes observed in the paper and their response to the drug
in the particular population studied. Information about the
population size, location or race and ethnicity, allele frequencies, and
statistical measures can be captured and stored in the database.
Although time consuming, annotating each individual publication in
such a detailed manner has the benefit of allowing many kinds of
computational analyses. PharmGKB currently has over
5,000 genomic variant annotations [April 2013].
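
To make the idea of a structured annotation concrete, the sketch below
models one record as a small Python data class. The field names and all
example values are illustrative assumptions, not the PharmGKB schema and
not data from a real study.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VariantAnnotation:
    """One curated variant-drug relationship from a single publication (hypothetical layout)."""

    rsid: str   # dbSNP identifier
    gene: str
    drug: str
    pmid: str   # PubMed ID of the annotated article
    genotype_effects: Dict[str, str] = field(default_factory=dict)
    population: Optional[str] = None
    n_subjects: Optional[int] = None
    allele_frequencies: Dict[str, float] = field(default_factory=dict)
    p_value: Optional[float] = None

    def is_significant(self, alpha: float = 0.05) -> bool:
        """True when a p-value was captured and falls below alpha."""
        return self.p_value is not None and self.p_value < alpha


# Placeholder values only; they do not describe a real study.
example = VariantAnnotation(
    rsid="rs0000001",
    gene="GENE1",
    drug="drug A",
    pmid="00000000",
    genotype_effects={"AA": "decreased response", "AG": "intermediate response"},
    population="example cohort",
    n_subjects=120,
    allele_frequencies={"A": 0.25, "G": 0.75},
    p_value=0.01,
)
print(example.is_significant())
```

Keeping each field typed and named is what allows records from many
publications to be filtered, grouped, and compared automatically.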

In addition to the very structured annotations, a more text-based,
reader-friendly format is provided to summarize the relationships
for genes and variants where there have been many pharmacogenomic
studies. These mini-reviews are known as Very
Important Pharmacogene summaries or VIPs. PharmGKB currently
provides VIPs for 47 genes [April 2013] with a priority list of more
to be developed. The list of VIP genes has been used by several
groups in a variety of studies to provide a candidate set of genes to
work from [16–19]. The NIH Pharmacogenomics Research
Network (PGRN) has a longer list of more than 500 genes of
relevance to pharmacogenetics, which is available at PharmGKB.

4.3 Clinical Annotations

Once there is sufficient evidence available from variant annotations
for a given variant and drug combination, a clinical annotation is
written. This is a summary of the clinical relevance for each of the
individual genotypes that may be observed for a given gene variant
and drug combination. The PharmGKB’s clinical annotations reflect
expert consensus based on clinical evidence and peer-reviewed
literature available at the time they are written and are intended only
to assist clinicians in decision-making and to identify questions for
further research. A strength of evidence score is given for clinical
annotations based on the type of study, number of study subjects,
and statistical significance reported.

4.4 Pathways

Historically, many pharmacogenetic studies have focused on single
genes involved in drug side effects; there is now a growing
interest in how pathways of interacting genes can affect both drug
metabolism and drug response. PharmGKB pathways are
drug-centered and depict candidate genes for pharmacogenetics and
pharmacogenomics studies; they provide the means to connect
separate data sets to represent the current knowledge as a
cohesive snapshot. The diagrams carry information in the shape
and color of the icons, which indicate whether the component is a
gene, a drug, a metabolic intermediate, and so on. This
information is captured in the database in a BioPAX [20] compatible format
that can be downloaded and used in pathway analysis packages.
The Web-displayed pathways are interactive: clicking on a gene
icon opens a window with the gene page, clicking on a drug opens
a window with a drug page, etc. The Irinotecan Pathway is shown
in Fig. 3 as an example. We currently have 99 curated pathways
[April 2013], many of which have been published in peer-reviewed
journals [21–35].

[Figure 3 image: Irinotecan Pathway, Pharmacokinetics. Labeled components include the drug and metabolites
irinotecan, SN-38, SN-38G, APC, NPC, and M4; the enzymes CES1, CES2, BCHE, CYP3A4, CYP3A5, UGT1A1, UGT1A9,
and UGT1A10; the transporters SLCO1B1, ABCC1, ABCC2, ABCB1, and ABCG2; and the outcomes neutropenia and diarrhea.]

Fig. 3 The Irinotecan Pathway, view of a model human liver cell showing blood, bile, and intestinal compartments,
indicating tissue-specific involvement of genes in the irinotecan pathway. Drugs are depicted by purple boxes,
transporter genes by turquoise ovals, genes coding for metabolic enzymes by blue ovals.
http://www.pharmgkb.org/do/serve?objId=PA2001&objCls=Pathway

A summary is provided to describe in words the content of the
graphic, its particular view and limitations, and additional, perhaps
ill-defined or controversial, data that were not included in the
representation. The pathways are generated by collaboration of
investigators to link data, either novel or in the public domain,
centered on a particular drug. The representation is a consensus of the
opinions of the authors. Currently these pathways are constructed
by hand as graphic images. They are then converted by a curator into
GPML (GenMAPP Pathway Markup Language), a BioPAX compatible
format, and stored in the knowledgebase.
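
Because GPML is an XML format, a downloaded pathway can be inspected
with a generic XML parser. The sketch below uses Python's standard
library to group pathway nodes by type. The embedded document is a
hand-written fragment in the general shape of GPML (DataNode elements
carrying TextLabel and Type attributes) and is an assumption made for
illustration; it is not an actual PharmGKB export, and real files
contain additional elements such as graphics and cross-references.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written, simplified fragment; not a real PharmGKB pathway file.
SAMPLE_GPML = """<Pathway xmlns="http://genmapp.org/GPML/2010a" Name="Example pathway">
  <DataNode TextLabel="Irinotecan" Type="Metabolite"/>
  <DataNode TextLabel="SN-38" Type="Metabolite"/>
  <DataNode TextLabel="CES1" Type="GeneProduct"/>
  <DataNode TextLabel="UGT1A1" Type="GeneProduct"/>
</Pathway>
"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def nodes_by_type(gpml_text):
    """Group DataNode labels by their Type attribute."""
    root = ET.fromstring(gpml_text)
    groups = defaultdict(list)
    for element in root.iter():
        if local_name(element.tag) == "DataNode":
            groups[element.get("Type", "Unknown")].append(element.get("TextLabel"))
    return dict(groups)


print(nodes_by_type(SAMPLE_GPML))
```

Matching on the local element name rather than a fixed namespace keeps
the sketch usable across GPML versions that declare different namespaces.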

5 Future Directions

Since the year 2000, the PharmGKB has become the “go to” site
for pharmacogenetics and pharmacogenomics knowledge [36,
37]. In response to assessment of the field and feedback from
users, the priorities for the next 5 years include:
● Supporting data-sharing consortia in which multiple investigators
pool their data in collaboration with PharmGKB to answer
specific questions that require large datasets, not typically avail-
able to single research groups.
● Developing algorithms for text mining in order to identify
appropriate pharmacogenomics literature, and begin the process
of extracting the key genes, variations, drugs, and phenotypes
that form the basis for our curator annotations.
● Creating algorithms for the analysis of rare variations that
emerge from whole exome and whole genome sequencing
efforts. Most of the efforts to date in pharmacogenomics have
focused on the analysis of common variants, but the era of
genome sequencing has made it clear that a primary challenge
will be interpreting rare or novel variations found in individual
genomes.
● Helping lead the clinical implementation and impact of phar-
macogenomics knowledge in clinical settings. The contents of
PharmGKB can provide a base of peer-reviewed information
from which clinical guidelines can be constructed.
● Studying the molecular and cellular mechanisms of drug
response in order to provide the knowledgebase required to
understand the systemic effects of drugs, their side effects, and
their unexpected interactions.
Finally, we will evaluate how these and other activities impact
the requirements for the PharmGKB Web site, and consider its evo-
lution from a purely research repository of knowledge to a more
integrated research and clinical resource for personalized medicine.

Acknowledgments

The authors would like to acknowledge Dorit Berlin, Michelle
Whirl Carrillo, John Conroy, Adrien Coulet, Sean David, Katrina
Easton, Ray Fergerson, Yael Garten, Li Gong, Mei Gong, Winston
Gor, Joan Hebert, Tina Hernandez-Boussard, Micheal Hewett,
Amy Hodge, Laura Hodges, Daniel Holbert, Tiffany Jung, Mark
Kiuchi, Steve Lin, Feng Liu, Xing Jian Lou, Charity Lu, Andrew
MacBride, Ellen McDonagh, Diane Oliver, Connie Oshiro, Ryan
Owen, Daniel Rubin, Katrin Sangkuhl, Farhad Shafa, Ravi Shankar,
Rebecca Tang, TC Truong, Ryan Whaley, Mark Woon, and Tina
Zhou for their contributions to building the PharmGKB.
The PharmGKB is financially supported by NIH/NIGMS
(R24GM61374).

References

1. NIH. Goals for the PGRN. http://www.nigms.nih.gov/Research/FeaturedPrograms/PGRN/
2. Altman RB, Klein TE (2002) Challenges for biomedical informatics and pharmacogenomics. Annu Rev Pharmacol Toxicol 42:113–133
3. Povey S et al (2001) The HUGO Gene Nomenclature Committee (HGNC). Hum Genet 109:678–680
4. Wishart DS et al (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34:D668–D672
5. Eichelbaum M et al (2009) New feature: pathways and important genes from PharmGKB. Pharmacogenet Genomics 19:403
6. Sangkuhl K et al (2008) PharmGKB: understanding the effects of individual genetic variants. Drug Metab Rev 40:539–551
7. Relling MV, Klein TE (2011) CPIC: clinical pharmacogenetics implementation consortium of the pharmacogenomics research network. Clin Pharmacol Ther 89:464–467
8. Swen JJ et al (2011) Pharmacogenetics: from bench to byte–an update of guidelines. Clin Pharmacol Ther 89:662–673
9. Bolton E, Wang Y, Thiessen PA, Bryant SH (2008) PubChem: integrated platform of small molecules and biological activities. In: Annual Reports in Computational Chemistry. American Chemical Society, Washington, DC
10. (US), N. L. o. M. MeSH Browser http://www.nlm.nih.gov/mesh/MBrowser.html
11. Organisation, I. H. T. S. D. SNOMED CT http://www.ihtsdo.org/snomed-ct/
12. Rubin DL et al (2005) A statistical approach to scanning the biomedical literature for pharmacogenetics knowledge. J Am Med Inform Assoc 12:121–129
13. Altman RB et al (2003) Indexing pharmacogenetic knowledge on the World Wide Web. Pharmacogenetics 13:3–5
14. Garten Y, Altman RB (2009) Pharmspresso: a text mining tool for extraction of pharmacogenomic concepts and relationships from full text. BMC Bioinformatics 10(Suppl 2):S6
15. Coulet A et al (2010) Using text to build semantic networks for pharmacogenomics. J Biomed Inform 43:1009–1019
16. Chen J et al (2010) Interethnic comparisons of important pharmacology genes using SNP databases: potential application to drug regulatory assessments. Pharmacogenomics 11:1077–1094
17. Sissung TM et al (2010) Clinical pharmacology and pharmacogenetics in a genomics era: the DMET platform. Pharmacogenomics 11:89–103
18. Gamazon ER et al (2009) A pharmacogene database enhanced by the 1000 Genomes Project. Pharmacogenet Genomics 19:829–832
19. Feng J et al (2010) Compilation of a comprehensive gene panel for systematic assessment of genes that govern an individual’s drug responses. Pharmacogenomics 11:1403–1425
20. Demir E et al (2010) The BioPAX community standard for pathway data sharing. Nat Biotechnol 28:935–942
21. Desta Z et al (2009) Antiestrogen pathway (aromatase inhibitor). Pharmacogenet Genomics 19:554–555
22. Thorn CF, Klein TE, Altman RB (2009) Codeine and morphine pathway. Pharmacogenet Genomics 19:556–558
23. Yang J et al (2009) Etoposide pathway. Pharmacogenet Genomics 19:552–553
24. Marsh S et al (2009) Platinum pathway. Pharmacogenet Genomics 19:563–564
25. Sangkuhl K, Klein TE, Altman RB (2009) Selective serotonin reuptake inhibitors pathway. Pharmacogenet Genomics 19:907–909
26. Zaza G et al (2010) Thiopurine pathway. Pharmacogenet Genomics 20:573–574
27. Gong L, Altman RB, Klein TE (2011) Bisphosphonates pathway. Pharmacogenet Genomics 21:50–53
28. Maitland ML et al (2010) Vascular endothelial growth factor pathway. Pharmacogenet Genomics 20:346–349
29. Sangkuhl K, Klein TE, Altman RB (2010) Clopidogrel pathway. Pharmacogenet Genomics 20:463–465
30. Sangkuhl K et al (2011) Platelet aggregation pathway. Pharmacogenet Genomics 21(8):516–521
31. Oshiro C et al (2009) Taxane Pathway. Pharmacogenet Genomics 19:979–983
32. Mikkelsen TS et al (2011) PharmGKB summary: methotrexate pathway. Pharmacogenet Genomics 21(10):679–686
33. Sangkuhl K, Klein TE, Altman RB (2011) PharmGKB summary: citalopram pharmacokinetics pathway. Pharmacogenet Genomics 21(11):769–772
34. Thorn CF et al (2011) Doxorubicin pathways: pharmacodynamics and adverse effects. Pharmacogenet Genomics 21(7):440–446
35. Thorn CF et al (2011) PharmGKB summary: fluoropyrimidine pathways. Pharmacogenet Genomics 21:237–242
36. Sim SC, Altman RB, Ingelman-Sundberg M (2011) Databases in the area of pharmacogenetics. Hum Mutat 32:526–531
37. Thorn CF, Klein TE, Altman RB (2010) Pharmacogenomics and bioinformatics: PharmGKB. Pharmacogenomics 11:501–505
Chapter 21

Genetic Databases in Pharmacogenomics: The Frequency of Inherited Disorders Database (FINDbase)

Marianthi Georgitsi and George P. Patrinos

Abstract
Pharmacogenomics studies how the variations of the individuals’ genetic makeup are correlated with a
person’s response to certain drugs in relation to the therapeutic efficiency, clinical outcome, or even sur-
vival, and how they affect drug metabolism, transport, or clearance. Yet, since the incidence of these poly-
morphisms, being either single-point variations or small insertions/deletions, varies among different
populations, a systematic collection and documentation of these variations is warranted, in order to facili-
tate implementation of pharmacogenomics in different populations. Here we review the existing electronic
databases related to pharmacogenomics and pay particular attention to the pharmacogenomics
module of the Frequency of Inherited Disorders database (FINDbase), which documents curated allelic
frequency data pertaining to 144 pharmacogenomics markers across 14 genes, representing approximately
87,000 individuals from 150 populations and ethnic groups worldwide. Long-term sustainability of these
resources aims to contribute to the design, development, and implementation of pharmacogenomics test-
ing towards the application of personalized approaches in medical treatment.

Key words Database, Pharmacogenomics, Markers, Allelic frequencies, Populations, Ethnic groups,
Genes

1 Introduction

In recent years, we have witnessed remarkable progress in our
understanding of the genetic basis of disease, resulting from
significantly advanced genotyping technology. This in turn has led
to very high rates of data production in many laboratories. At the
same time, DNA diagnostics and electronic healthcare records
tend to become increasingly common features of modern medical
practice. Therefore, it should be possible to integrate all of this
information in order to establish a detailed understanding of how
genome variations impact human health. It has rapidly become
clear that the knowledge and organization of these alterations in
structured repositories will be of great importance not only for
diagnosis but also for clinicians and researchers.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_21, © Springer Science+Business Media, LLC 2013


Genetic databases are online repositories of mutation data,
described for a single gene (locus-specific) or more genes (general)
or specifically for a population or ethnic group (national/ethnic).
The main applications of mutation databases are to provide
genotype-phenotype information and to facilitate molecular
diagnostics. The first serious effort towards summarizing DNA
variations and their clinical consequences was made by Victor McKusick
in 1966 [1], when he published Mendelian Inheritance in Man
(MIM), which is now distributed electronically (Online Mendelian
Inheritance in Man; OMIM, http://www.ncbi.nlm.nih.gov/omim)
by the National Center for Biotechnology Information and
updated on a daily basis [2]. The first database collecting
mutations from a single gene was published in 1976. It
included 200 mutations from the globin gene in what was, at that
time, book format, and it led to the HbVar database for hemoglobin
variants and thalassemia mutations [3, 4]. In the mid-1990s the
Human Genome Organization-Mutation Database Initiative
(HUGO-MDI) was created in order to organize this new domain
of genetics, i.e., mutation analysis [5]; it then evolved into the
Human Genome Variation Society (HGVS, http://www.hgvs.org).
Nowadays, this field is expanding at a rapid pace and there
are diverse types of genetic databases available on the Internet.
Pharmacogenetics and pharmacogenomics are gradually
assuming an important role in modern medical practice. Therefore,
publicly available specialist pharmacogenomics databases that
summarize allele frequencies of pharmacogenetically relevant single
nucleotide polymorphisms (SNPs) in different populations, and related
information on drug response in the context of the underlying
genetic variation, would be particularly helpful.
However, this field is currently in its infancy. The Pharmacogenomics
Knowledge Base (PharmGKB, http://www.pharmgkb.org) is the
most prominent project in this field, supported by the National
Institutes of Health since 1999 through the PharmacoGenetics
Research Network [6, 7]. This database is described in detail in
Chapter 20 and will not be discussed here.
This article aims to emphasize the potential applications of genetic
databases in pharmacogenomics and personalized medicine, by paying
particular attention to the Frequency of Inherited Disorders database
(FINDbase; http://www.findbase.org), where allele frequencies of
pharmacogenetically relevant genes are stored, and comment upon
the key elements that are still missing and holding back the field.

2 Types of Genetic Databases

The various depositories that fall under the banner of “genetic
databases” can be divided into three main categories:
1. General (or central) mutation databases (GMDs). These
databases attempt to capture all described mutations in all genes,
but with each being represented in only limited detail. The
included phenotype descriptions are generally quite cursory,
making GMDs of little value for those wishing to understand
the subtleties of phenotypic variability [8]. The best current
example of a GMD would be the Human Gene Mutation
Database (HGMD, http://www.hgmd.org) [9].
2. Locus-specific databases (LSDBs). There are over 1700 LSDBs
(http://www.gen2phen.org) that are concerned with just one
or a few specific genes [10, 11], usually related to a single dis-
ease entity. They aim to be highly curated repositories of pub-
lished and unpublished mutations within those genes, and as
such they provide a much-needed complement to the core
databases. Data quality and completeness are typically high,
with up to 50 % of stored records pertaining to otherwise
unpublished mutations. The data are also very rich and infor-
mative and the annotation of each mutant includes a full
molecular and phenotypic description.
3. National/ethnic mutation databases (NEMDBs). NEMDBs
are repositories documenting the genetic composition of an
ethnic group and/or population, the genetic defects leading
to various inherited disorders, and their frequencies calculated
on a population-specific basis (http://www.goldenhelix.org)
[12]. The emergence of the NEMDBs is justified from the fact
that the spectrum of mutations observed for any gene or
disease will often differ between population groups across the
planet, and also between distinct ethnic groups within a
geographical region.
These database types share the same primary purpose of
representing DNA variations that have definitive or likely phenotypic
effect, but they achieve this goal from very different angles. Beyond
the aforementioned main database types, DNA variation is also
recorded in various genomic databases, such as dbSNP
(http://www.ncbi.nlm.nih.gov/projects/SNP) [13] and the Genome
Wide Association central database (GWAS Central,
http://www.gwascentral.org) [14]. These resources make available a very
extensive list of normally occurring human genome variation and are of
utmost importance in helping to complete the picture for any gene
or region of interest, by summarizing all the neutral variants that
are typically not included in GMDs, LSDBs, and NEMDBs.

3 Database Management Models

A database is a collection of records, each of which
contains one or more fields (i.e., pieces of data) about some entity
(e.g., DNA sequences, mutations), that has a regular structure and
that is organized in such a way that the desired information can
easily be retrieved. The creation of a database relies on the model that
the curator, i.e., the person or group of persons responsible
for developing, updating, and ultimately maintaining a mutation
database, chooses for setting it up. In the past, information
was contained within plain text Web sites, but this structure cannot
be considered a database in the strict sense. Flat-file databases were
the simplest type and the dominant type for a long time, and
they can still be useful, particularly for small-scale and simple
applications. These databases have modest querying capacity and can
accommodate small to moderately big datasets.
Nowadays, relational databases gradually tend to dominate the
field. A relational database is based on data organization in a series
of interrelated tables. Also, information can be retrieved in an
extremely flexible manner by using structured data queries. The
dominant query language for relational databases is the semi-
standardized structured query language (SQL) with many differ-
ent SQL variants. The requirement of specialized software for
developing a relational database can potentially be a disadvantage,
since significant computer proficiency is required.
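
As a small illustration of interrelated tables and a structured query,
the sketch below builds a two-table SQLite database in memory with
Python's standard sqlite3 module and joins the tables to list allele
frequencies per population. The schema, gene and allele names,
populations, and numbers are invented for the example and are not taken
from any real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE marker (
        marker_id INTEGER PRIMARY KEY,
        gene      TEXT NOT NULL,
        allele    TEXT NOT NULL
    );
    CREATE TABLE frequency (
        marker_id  INTEGER NOT NULL REFERENCES marker(marker_id),
        population TEXT NOT NULL,
        freq       REAL NOT NULL
    );
    """
)
# Invented example rows: one marker observed in two populations.
conn.execute("INSERT INTO marker VALUES (1, 'GENE1', 'allele*2')")
conn.executemany(
    "INSERT INTO frequency VALUES (?, ?, ?)",
    [(1, "Population A", 0.12), (1, "Population B", 0.03)],
)

# The join relates the two tables through their shared marker_id key.
query = """
    SELECT m.gene, m.allele, f.population, f.freq
    FROM marker AS m
    JOIN frequency AS f ON f.marker_id = m.marker_id
    WHERE m.gene = ?
    ORDER BY f.freq DESC
"""
rows = conn.execute(query, ("GENE1",)).fetchall()
for row in rows:
    print(row)
conn.close()
```

Storing the marker once and referring to it by key from the frequency
table is what distinguishes this layout from a flat file, where the gene
and allele would be repeated on every row.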

4 Depositing Pharmacogenomics Data into Databases: The FINDbase Paradigm

As previously mentioned, NEMDBs aim to document extensive
information on the described genetic heterogeneity of an ethnic
group or population. These resources have recently emerged,
mostly driven by the need to document the varying mutation spec-
trum observed for any gene (or multiple genes) associated with a
genetic disorder, among different population and ethnic groups
[12]. The first NEMDBs to come online were the Finnish database
(http://www.findis.org) [15], and the various NEMDBs that are
available at the Golden Helix Server (http://www.goldenhelix.
org) [16, 17]. In the latter case, a specialized database manage-
ment system was introduced, namely ETHNOS (available in both
flat-file and relational database format) that enables both basic
query formulation and restricted-access data entry so that all
records are manually curated to ensure high and consistent data
quality [18]. This management system led to the worldwide
Frequency of Inherited Disorders database (FINDbase; http://www.findbase.org),
a relational database that currently records frequencies
of causative mutations and pharmacogenetic markers
worldwide [19]. FINDbase was originally developed in 2006 as a
relational database hosting frequencies of causative mutations
in genes associated with inherited disorders, systematically collected
from various populations and ethnic groups worldwide [19].
In 2010, it underwent a significant revision, covering not only
updated data content but also technological advances that facilitate
data querying and visualization [20]. FINDbase
currently represents the richest NEMDB, content-wise, and has
been broadly accepted by the scientific community as a key resource
for retrieving population-specific information. A new feature of the
updated database was the incorporation of a separate module,
FINDbase-Pharmacogenomics (FINDbase-PGx), pertaining
solely to allelic frequencies of pharmacogenomic markers in genes
representing different classes of drug-metabolizing enzymes, transporters,
and drug targets [21]. FINDbase-PGx represents the
largest collection so far of data on population- and ethnic group-specific
allelic frequencies of pharmacogenomic markers, an aspect
not sufficiently covered by other existing pharmacogenomic databases
and related resources. Understanding and registering
population-specific differences in individuals’ genetic make-up
is expected to assist in tailoring therapeutic modalities
in the light of personalized medicine, in an effort to maximize
therapeutic benefit and to minimize, primarily, the chance of adverse
drug reactions in individuals, but also the costly burden of treating
such serious or even fatal reactions. Moreover, since most
drug-development programs still focus on Caucasian subjects, it is
of great importance to document pharmacogenomics data on non-Caucasians
too, in order to better evaluate the implementation of
novel drugs in such populations [22].

5 Database Overview

5.1 Data Collection
The main body of data is derived from the published literature
(http://www.ncbi.nlm.nih.gov/pubmed), mainly from original
reports or, occasionally, from review articles if the original publications
were not accessible, according to the following criteria:
● The population and ethnicity should be clearly stated.
● The cohorts should be ethnically homogeneous.
● The subjects should be unrelated.
● Each population should be represented by a sufficient sample
size [i.e., ≥50 subjects (100 chromosomes)], with exceptions
for smaller cohort sizes made in the case of isolated popula-
tions or tribes studied less commonly.
● Each population should be represented only once for each
gene in the final data, by the largest available cohort, in order
to avoid redundant cases.
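The criteria above can be read as a simple filtering rule. The following is a minimal sketch of that rule, not the curators' actual procedure: the Cohort record, its field names, and the example cohorts are hypothetical, and only the threshold of 50 subjects and the "largest cohort per population and gene" rule come from the text.

```python
from dataclasses import dataclass

MIN_SUBJECTS = 50  # threshold stated in the criteria (100 chromosomes)


@dataclass(frozen=True)
class Cohort:
    """Hypothetical summary record for one published cohort."""
    gene: str
    population: str
    n_subjects: int
    unrelated: bool = True
    ethnically_homogeneous: bool = True
    isolated_population: bool = False  # allows the small-cohort exception


def is_eligible(cohort):
    """Apply the stated inclusion criteria to a single cohort."""
    if not cohort.population:
        return False  # population/ethnicity must be clearly stated
    if not (cohort.unrelated and cohort.ethnically_homogeneous):
        return False
    return cohort.n_subjects >= MIN_SUBJECTS or cohort.isolated_population


def select_cohorts(cohorts):
    """Keep one eligible cohort (the largest) per (gene, population) pair."""
    best = {}
    for cohort in cohorts:
        if not is_eligible(cohort):
            continue
        key = (cohort.gene, cohort.population)
        if key not in best or cohort.n_subjects > best[key].n_subjects:
            best[key] = cohort
    return list(best.values())


cohorts = [
    Cohort("NAT2", "Population X", 120),
    Cohort("NAT2", "Population X", 300),  # larger cohort, replaces the first
    Cohort("NAT2", "Population Y", 30),   # too small and not isolated
    Cohort("NAT2", "Tribe Z", 30, isolated_population=True),
    Cohort("TPMT", "Population X", 80, unrelated=False),
]
selected = select_cohorts(cohorts)
```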
Data curation consisted initially of a careful selection of only
the pharmacogenomically relevant variants, followed by a reevaluation
of the calculated allelic frequencies per study, as some inconsistencies
were identified in the allele frequencies
reported in different parts of the same article (i.e., text and
tables), in the number of samples from which rare allele frequencies
were calculated, or in the nomenclature used for
each specific variant compared to that used in genomic databases
such as dbSNP [6]. No individual-level genotypes are presented in
FINDbase-PGx; only group-level aggregated (summary-level)
data are collected. Allelic frequency data for pharmacogenomic markers
were not curated from papers in which analyses were performed on
groups of individuals selected on the basis of race (e.g., Caucasians,
Africans, Asians), since racial classification
divides humans into groups based largely on genetically transmitted
phenotypic traits. In FINDbase, ethnicity is considered more
important and is represented by a “population,” which is regarded as a
group of people with the same ethnic origin and nationality. An
“ethnic group,” on the other hand, is a subcategory of a population
whose individuals share distinct cultural, linguistic, or religious
identities, and are typically found in isolation within a certain geographic
area or country.
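The reevaluation of reported allelic frequencies mentioned above can be illustrated with standard allele counting. The text does not describe how the curators recalculated frequencies, so the sketch below is an assumption: it recomputes a variant allele frequency from genotype counts in a diploid cohort and flags a reported value that deviates by more than a hypothetical tolerance. The function names and the tolerance are illustrative only.

```python
def allele_frequency(n_het, n_hom_variant, n_subjects):
    """Variant allele frequency from genotype counts in a diploid cohort.

    Each subject contributes two chromosomes; heterozygotes carry one
    variant allele and variant homozygotes carry two.
    """
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    if n_het < 0 or n_hom_variant < 0 or n_het + n_hom_variant > n_subjects:
        raise ValueError("genotype counts are inconsistent with cohort size")
    chromosomes = 2 * n_subjects
    return (n_het + 2 * n_hom_variant) / chromosomes


def is_consistent(reported, recomputed, tolerance=0.005):
    """True if a reported frequency agrees with the recomputed one.

    The tolerance is a hypothetical allowance for rounding in the source.
    """
    return abs(reported - recomputed) <= tolerance


# 100 subjects (200 chromosomes): 40 heterozygotes and 10 variant homozygotes
freq = allele_frequency(n_het=40, n_hom_variant=10, n_subjects=100)
```

With these counts the variant allele is carried on 60 of 200 chromosomes, so a paper reporting 0.30 would pass the check while one reporting 0.35 would be flagged for a second look.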
The pharmacogenomics module of FINDbase (FINDbase-
PGx) was launched in August 2010 containing allelic frequencies
from a total of 144 pharmacogenomic markers from 14 well-
documented pharmacogenes, mined from 214 publications, per-
taining to approximately 87,000 subjects (>173,000 chromosomes)
across 150 populations and ethnic groups worldwide (Table 1),
including North and sub-Saharan Africans, Caucasians, Northeast
and Southeast Asians, Pacific Islanders, Amerindians, Aborigines,
and rare tribes. These variations include single nucleotide poly-
morphisms (SNPs) and small insertions/deletions, residing in cod-
ing or regulatory regions of the corresponding genes, and may
affect either the quality or stability of the produced proteins (quali-
tative effect) or alter gene transcription and expression (quantita-
tive effect) (for a review on transcriptional regulation and
pharmacogenomics see ref. 23).
Three classes of pharmacogenes are represented in FINDbase-
PGx: (a) genes coding for drug-metabolizing enzymes (CYP1A2,
CYP2D6, CYP2E1, CYP3A4, CYP3A5, DPYD, NAT2, PON1,
PON2, TPMT, UGT1A1, and UGT2B7), (b) genes coding for
drug transporters (SLCO1B1), and (c) genes coding for enzymes
that are themselves drug targets (TYMS). The two latter categories
may be under-represented in the first version of the database; however,
database contents are being continuously updated and the
second version of FINDbase-PGx aims to host data on additional
genes coding for drug transporters (for instance, members of the ATP-Binding
Cassette superfamily of genes, such as ABCB1/MDR1 and ABCC1)
or drug targets (VKORC1).

5.2 System Design and Access
FINDbase-PGx is a publicly available database that is accessible via
the URL http://www.findbase.org, and is hosted at the
Golden Helix server (http://www.goldenhelix.org). There are no
registration requirements for data querying. The system architec-
ture and database schema were detailed in the original publication

Table 1
Well-established pharmacogenes currently included in FINDbase-PGx, presented according
to their role in drug metabolism, transport, or action (adapted from ref. 21)

Gene      Drug (links in PharmGKB)                                                     Number of markers   Number of populations   Number of chromosomes
                                                                                       studied per gene    studied per gene        analyzed per gene
Genes coding for drug-metabolizing enzymes
CYP1A2    http://www.pharmgkb.org/do/serve?objId=PA27093&objCls=Gene#tabview=tab6      17                  20                      12,074
CYP2D6    http://www.pharmgkb.org/do/serve?objId=PA128&objCls=Gene#tabview=tab6        47                  35                      21,406
CYP2E1    http://www.pharmgkb.org/do/serve?objId=PA129&objCls=Gene#tabview=tab4        10                  45                      5,182
CYP3A4    http://www.pharmgkb.org/do/serve?objId=PA130&objCls=Gene#tabview=tab6        6                   18                      9,048
CYP3A5    http://www.pharmgkb.org/do/serve?objId=PA131&objCls=Gene#tabview=tab6        9                   51                      20,320
DPYD      http://www.pharmgkb.org/do/serve?objId=PA145&objCls=Gene#tabview=tab6        15                  18                      8,652
NAT2      http://www.pharmgkb.org/do/serve?objId=PA18&objCls=Gene#tabview=tab5         13                  23                      10,668
PON1      http://www.pharmgkb.org/do/serve?objId=PA33529&objCls=Gene#tabview=tab4      3                   23                      22,042
PON2      –                                                                            2                   10                      11,984
TPMT      http://www.pharmgkb.org/do/serve?objId=PA356&objCls=Gene#tabview=tab6        4                   20                      >11,776
UGT1A1    http://www.pharmgkb.org/do/serve?objId=PA420&objCls=Gene#tabview=tab6        3                   23                      3,324
UGT2B7    http://www.pharmgkb.org/do/serve?objId=PA361&objCls=Gene#tabview=tab5        4                   5                       3,508
Genes coding for drug transporters
SLCO1B1   http://www.pharmgkb.org/do/serve?objId=PA134865839&objCls=Gene#tabview=tab6  4                   18                      11,226
Genes coding for drug targets
TYMS      http://www.pharmgkb.org/do/serve?objId=PA359&objCls=Gene#tabview=tab6        7                   17                      22,528
Total                                                                                  144                 a                       >173,738

a A total number of populations cannot be calculated, as the same population may be represented more than once in
these 14 genes

of FINDbase-PGx [21], whereas the component services that
comprise FINDbase-PGx follow the service-oriented architectural
approach [24]. The database querying interface was developed
using Microsoft’s PivotViewer program (http://www.getpivot.com),
based on Microsoft Silverlight® technology (http://www.silverlight.net).
FINDbase represents the first effort to implement
this program for mining biological information from large
datasets (Fig. 1). The user-friendly FINDbase interface creates an environment
for quickly arranging data collections according to
selected criteria, filtering a collection to acquire subsets of information,
or even zooming in on a particular item (i.e., entry) for in-depth
data acquisition. Each entry (i.e., marker) is displayed in the
form of a card (Fig. 2a), along with a sidebar textbox with data
regarding the allelic marker frequency and the population/ethnic
group, as well as external links to PubMed, OMIM, and PharmGKB
(Fig. 2b), thus enhancing the creation of a network of genomic
repositories. These cards are automatically generated by PivotViewer
upon data submission.

Fig. 1 Outline of the entire FINDbase-PGx data collection, consisting of variation boxes (see also Fig. 2)

In addition, all recorded entries are provided along with their
unique PubMed ID for immediate article retrieval, whereas each
entry is identified against a unique ResearcherID (http://www.
researcherid.com), corresponding to the person who served as data
miner and curator. The use of unique researcher identifiers (such as
ResearcherID, OpenID®, and Researcher Identification Primer)
provides incentives for direct data submission and identifies a
researcher’s contribution to science in forms other than the established
peer-reviewed publications, such as submission to genetic
databases and data curation in such databases, whether LSDBs
or NEMDBs. Recently, this concept has been successfully implemented
for the systematic documentation and analysis of published
and unpublished human genetic variation related to hemoglobinopathies
and thalassemias using the microattribution approach, a
microcredit-tracking system for rewarding data contribution [25],
which is currently being implemented to provide incentives and credit to
researchers worldwide who are involved in the determination of
pharmacogenetically relevant allele frequencies.

Fig. 2 Pharmacogenomic markers are presented in FINDbase-PGx as “Variation cards.” (a) Example of a card
representing the NAT2 variation rs1801280 (alternatively known as NAT2*5), in the Portuguese population. (b)
The corresponding information box with data regarding this particular marker appears upon zooming in the
corresponding “Variation card”

5.3 Querying Engine
FINDbase-PGx gives the user the possibility to view, organize, categorize,
and reorganize data dynamically, owing to the various filters
provided in the left-side menu (Fig. 3), as detailed earlier.
It is not yet possible to download data, but users are
provided with a user-friendly environment for on-site data analysis
by sorting the acquired data with the help of the options provided
in the upper right drop-down menu (Fig. 4). Queries may be
simple, such as viewing variants from a certain population only
(Fig. 4), specific allelic variants (Fig. 5), rare variants with minor
allele frequency <10 % (Fig. 6), variants associated with a certain
drug, or variants included in a particular publication.
Queries may also be compound, combining several of
the above.
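The logic of simple and compound queries can be sketched independently of the PivotViewer interface. The example below is an illustration only and does not reproduce the FINDbase-PGx implementation: the record layout, the query function, and the marker labels, populations, and frequencies are hypothetical. A compound query is expressed by supplying more than one filter at once.

```python
def query(records, population=None, gene=None, max_rare_allele_freq=None):
    """Return the records matching every filter that was supplied.

    Filters left as None are ignored, so one filter gives a simple query
    and several filters together give a compound query.
    """
    result = []
    for record in records:
        if population is not None and record["population"] != population:
            continue
        if gene is not None and record["gene"] != gene:
            continue
        if (max_rare_allele_freq is not None
                and record["rare_allele_freq"] >= max_rare_allele_freq):
            continue
        result.append(record)
    # Sort by gene name, mirroring the "sort by gene" option in the text
    return sorted(result, key=lambda r: (r["gene"], r["marker"]))


# Hypothetical summary-level records (no individual-level genotypes)
records = [
    {"gene": "NAT2", "marker": "marker_1", "population": "Population X", "rare_allele_freq": 0.42},
    {"gene": "TPMT", "marker": "marker_2", "population": "Population X", "rare_allele_freq": 0.03},
    {"gene": "CYP2D6", "marker": "marker_3", "population": "Population X", "rare_allele_freq": 0.08},
    {"gene": "TPMT", "marker": "marker_2", "population": "Population Y", "rare_allele_freq": 0.05},
]

# Compound query: one population AND rare allele frequency below 10 %
rare_in_x = query(records, population="Population X", max_rare_allele_freq=0.10)
```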
One querying example is presented in Fig. 4, pertaining specifically
to the Chinese population. In this example, the Chinese population is represented
by a total of 226 alleles, as depicted by an equal number
of display items across 13 genes (presented in alphabetical order):
eight alleles for CYP1A2, 125 alleles for CYP2D6, 20 alleles for
CYP2E1, three alleles for CYP3A4, five alleles for CYP3A5, six
alleles for DPYD, four alleles for NAT2, one allele for PON1, two
alleles for PON2, four alleles for SLCO1B1, 10 alleles for TPMT,
34 alleles for TYMS, and four alleles for UGT1A1. These data can
be further scrutinized by zooming into specific ethnic groups
within the Chinese population (Han, Yao, Uygur), or based on the
geographical region (Chinese people from China, Hong Kong,
Singapore, or Malaysia).

Fig. 3 The querying interface of FINDbase-PGx

Fig. 4 Database query based on a specific population. A query for retrieving all pharmacogenetically relevant
variations for the Chinese population alone (left red arrow), sorted by gene name (right red arrow), returns 226
alleles in an equal number of display items

Fig. 5 Database querying based on pharmacogenomic marker allele frequency: Presentation of data pertaining
to marker rs2306283 (c.388A>G, alternatively known as *1b) of SLCO1B1, filtered by rare allele frequency
data (upper right sorting drop-down menu, reading “Sort:RareAlleleFrequency”)

Fig. 6 Database querying based on rare allele frequency: Presentation of all markers with rare allele frequency
0–10.44 %, sorted by gene name

6 FINDbase Versus Other Pharmacogenomic Databases/Resources

PharmGKB represents the most comprehensive collection of
information in the field of pharmacogenomics, harboring information
on all pharmacogenomically relevant genes that have been
described thus far, and focusing mainly on the relationships between
these genes, their genetic variations, and drug response. It contains
curated genomic, phenotypic, and clinical information
from a multitude of pharmacogenomic studies, from either independent
contributors or consortia investigating the clinical applications
of pharmacogenomics. The features of PharmGKB and its
utilities are thoroughly covered in Chapter 20. Here, it is important
to highlight the ways in which FINDbase-PGx and PharmGKB
differ from each other, strengthening the notion that these two
databases are complementary rather than redundant.
Even though PharmGKB focuses only on variants with a well-established
relationship to drug efficacy, metabolism, or toxicity, these
variants have rarely been documented, and only recently so,
with respect to their population- or ethnic group-specific frequencies.
Rather, allelic frequencies have often been reported in terms of
race-specific differences (Africans, Asians, Caucasians), and only
for a small number of variants. In contrast,
FINDbase-PGx provides the largest currently available collection
of allelic frequency data at the population- and ethnic group-specific
level (a total of 150 populations/ethnic groups represented), from a
much smaller number of genes, but from many variants within each
gene. In addition, one novel aspect of FINDbase-PGx is the possibility
offered to the user, via the implementation of the
innovative PivotViewer program and Silverlight® technology, to
perform dynamic data queries on large datasets, which is not currently
possible in PharmGKB. In FINDbase-PGx, the user can visualize
and dynamically arrange data according to various filters, categorize
data, and discover trends or differences across all items. This
approach is implemented for the first time in a genomic data
repository.
On the other hand, LSDBs relevant to pharmacogenomics,
such as the Human CYP Nomenclature Committee (http://www.cypalleles.ki.se),
the UGT Alleles Nomenclature page
(http://www.pharmacogenomics.pha.ulaval.ca/sgc/ugt_alleles/), and the
Consensus Human Arylamine N-Acetyltransferase Gene
Nomenclature page
(http://louisville.edu/medschool/pharmacology/consensus-human-arylamine-n-acetyltransferase-gene-nomenclature/),
provide detailed compilations of all identified
genetic variants, albeit with very limited information on the functional
effects of these variants. As helpful as such databases/pages
may be, they lack any information pertaining to population- or ethnic
group-specific allelic frequencies of those markers that are
pharmacogenomically relevant. FINDbase-PGx now covers this
aspect for those genes from the CYP (CYP1A2, CYP2D6,
CYP2E1, CYP3A4, CYP3A5), UGT (UGT1A1 and UGT2B7),
and NAT (NAT2) families that are currently included.
Continuous data enrichment aims to enlarge the present compilation
of genes, for even better overlap between FINDbase-PGx and
the aforementioned LSDBs.

7 Conclusions and Future Prospects

As summarized in a recent review by Lagoumintzis and coworkers,
the importance of pharmacogenomics knowledgebases can be
viewed in three points: first, they summarize information on
drug response in the context of the underlying genetic variation;
second, they document allelic frequencies of pharmacogenetically
relevant SNPs in different populations; third, they serve as public
repositories for depositing genotype/phenotype data from pharmacogenomics
studies that could ultimately be used for subsequent
meta-analyses [26]. The well-established pharmacogenomic
knowledgebase PharmGKB thoroughly covers the first and third
points, whereas FINDbase-PGx covers the second and third points,
so that each database fills a gap left by the other without
being redundant. FINDbase-PGx will soon be enriched with additional
pharmacogenomic markers and their allelic frequencies in
genes such as CYP2C9, VKORC1, CYP2C19, and NAT1; data
pertaining to already existing genes and markers are updated as
new studies are published and additional populations and/or
ethnic groups are analyzed. Given the way the system was
developed, data addition will not occur at the expense of the performance
of the querying or visualization interface.
Moreover, direct data submission from individual contributors is
greatly encouraged, and the features needed to support data upload
have already been developed. Contributor identification
will be possible via a unique ResearcherID, so that credit is
properly given through a report in the form of a manuscript in the
Human Genomics and Proteomics journal, the first peer-reviewed
open-access database-journal (http://www.sage-hindawi.com/journals/hgp),
while the report becomes indexed in PubMed.
Complementing other available pharmacogenomics knowledgebases
and related resources, FINDbase-PGx represents a useful tool that
was developed to assist in the design and future development
of pharmacogenomic testing across different nations worldwide.
Pharmacogenomics, despite being in its early days, represents a tangible
aspect of personalized medicine and is expected to further improve
certain therapeutic modalities, given that physicians may slowly
become accustomed to pharmacogenomic testing. The impact of
pharmacogenomics on certain clinical research areas, such as the management
of cancer and cardiovascular diseases, is becoming more and
more established, whereas for others, such as hemoglobinopathies and
neuropsychiatric disorders, research interest remains intensive [27].
Apart from the technical challenges and the uniqueness of the
field of pharmacogenomics in relation to database projects, per-
haps more difficult to overcome will be problems associated with
the way database research is organized, motivated, and rewarded.
For example, forming consensus opinions and truly committed
consortia in order to create standards, such as the warfarin consor-
tium, is far from easy in the highly competitive world of science.
This may partly explain why leading bioinformatics activities today
are often conducted in large specialized centers (e.g., the European
Bioinformatics Institute and the US National Center for
Biotechnology Information), where the political influence and critical
mass are such that what they produce automatically becomes the
de facto standard [28]. To this end, a global initiative, the Human
Variome Project (HVP; http://www.humanvariomeproject.org),
was initiated in 2006, aiming to catalogue all human genetic variation
and to make that information freely available to researchers,
clinicians, and patients worldwide [29]. The HVP envisions achieving
improved health outcomes by facilitating the unification of data on
human genetic variation and its impact on human health [30]. It
will support the use of human variation information in clinical and
research environments across the world by developing the resources
required to undertake key tasks, such as capturing and archiving all
human gene variation associated with human disease and with variation
in drug response, establishing systems that ensure adequate curation
of human variation knowledge from genetic databases, facilitating
the development of software to collect and exchange human
variation data in a federation, and developing ethical standards that
ensure open access to all human variation data that are to be used
for the global public good and that address the needs of “indigenous”
communities under threat of dilution in emerging countries [31].
This kind of distributed and coordinated effort would also,
ideally, be managed in close partnership with specialized journals
[32] to ensure that contributors not only have the means but also
the incentives to publish their efforts. Such incentives include the
microattribution reviews proposed by Nature Genetics in early
2008 [33], or new publication modalities such as the database-
journals, i.e., databases inter-related with scientific journals. In the
latter case, Human Genomics and Proteomics, inter-related with
FINDbase, stands as a representative, and currently the only,
example of a database-journal [34].
Finally, the most fundamental hurdle of all that retards the
field is that of limited funding. Because of this, almost all mutation
databases in existence today have been built by researchers “on the
side” for their own use, with a small degree of sponsorship/funding
at best. Therefore, to advance beyond this stage, database projects
need to be increased in scale, quality, and durability, and this
can only happen if strategically minded funding agencies make
available substantial targeted funds, not only for development
but also for general maintenance.

8 Conclusions

It is widely accepted that genetic databases are increasingly becoming
valuable tools in modern medical practice and personalized
medicine. However, the current array of genetic databases, particularly
those related to pharmacogenomics, is limited in number
and in degree of interconnection, and cannot capture all that is known
and being discovered regarding genetic variation and its correlation
with variable drug response. Apparently, the biomedical
community must first appreciate the overwhelming need for
improved genetic/mutation database systems; the most adequate
solution will then presumably follow.

Acknowledgments

We wish to thank Sjozef van Baal for providing the building blocks
for FINDbase and FINDbase-PGx and for his continuous efforts.
A part of our own work has been funded by the European
Commission [ITHANET (FP6-026539), EuroGenTest (FP6-
512148), and GEN2PHEN (FP7-200754) projects], and the
Golden Helix Institute of Biomedical Research. Dr. Marianthi
Georgitsi is the recipient of a State Scholarship Foundation (IKY)
postdoctoral grant.

References
1. McKusick VA (1966) Mendelian Inheritance in Man. A catalog of human genes and genetic disorders. Johns Hopkins University Press, Baltimore, MD
2. Hamosh A et al (2005) Online mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33:D514–D517
3. Hardison RC et al (2002) HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum Mutat 19:225–233
4. Patrinos GP et al (2004) Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies. Nucleic Acids Res 32:D537–D541
5. Cotton RG, McKusick V, Scriver CR (1998) The HUGO mutation database initiative. Science 279:10–11
6. Klein TE et al (2001) Integrating genotype and phenotype information: an overview of the PharmGKB project. Pharmacogenetics Research Network and Knowledge Base. Pharmacogenomics J 1:167–170
7. Davis A, Long R (2001) Pharmacogenetics research network and knowledge base: 1st annual scientific meeting. Pharmacogenomics 2:285–289
8. George RA et al (2008) General mutation databases: analysis and review. J Med Genet 45:65–70
9. Stenson PD et al (2003) Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 21:577–581
10. Claustres M, Horaitis O, Vanevski M, Cotton RG (2002) Time for a unified system of mutation description and reporting: a review of locus-specific mutation databases. Genome Res 12:680–688
11. Cotton RG, Phillips K, Horaitis O (2007) A survey of locus-specific database curation. Human Genome Variation Society. J Med Genet 44:e72
12. Patrinos GP (2006) National and ethnic mutation databases: recording populations’ genography. Hum Mutat 27:879–887
13. Wheeler DL et al (2008) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 36:D13–D21
14. Thorisson GA et al (2009) HGVbaseG2P: a central genetic association database. Nucleic Acids Res 37:D797–D802
15. Sipila K, Aula P (2002) Database for the mutations of the Finnish disease heritage. Hum Mutat 19:16–22
16. Patrinos GP et al (2005) The Hellenic national mutation database: a prototype database for mutations leading to inherited disorders in the Hellenic population. Hum Mutat 25:327–333
17. Kleanthous M et al (2006) The Cypriot and Iranian National Mutation databases. Hum Mutat 27:598–599
18. Patrinos GP, Kollia P, Papadakis MN (2005) Molecular diagnosis of inherited disorders: lessons from hemoglobinopathies. Hum Mutat 26:399–412
19. van Baal S et al (2007) FINDbase: a relational database recording frequencies of genetic defects leading to inherited disorders worldwide. Nucleic Acids Res 35:D690–D695
20. Georgitsi M et al (2011) FINDbase: a worldwide database for genetic variation allele frequencies updated. Nucleic Acids Res 39:D926–D932
21. Georgitsi M et al (2011) Population-specific documentation of pharmacogenomic markers and their allelic frequencies in FINDbase. Pharmacogenomics 12:49–58
22. Chen J, Teo YY et al (2010) Interethnic comparisons of important pharmacology genes using SNP databases: potential application to drug regulatory assessments. Pharmacogenomics 11:1077–1094
23. Georgitsi M et al (2011) Transcriptional regulation and pharmacogenomics. Pharmacogenomics 12:655–673
24. Bell M (2010) SOA modeling patterns for service-oriented discovery and analysis. Wiley & Sons, Inc., Hoboken, NJ
25. Giardine B et al (2011) Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nat Genet 43:295–301
26. Lagoumintzis G, Poulas K, Patrinos GP (2010) Genetic databases and their potential in pharmacogenomics. Curr Pharm Des 16:2224–2231
27. Squassina A et al (2010) Realities and expectations of pharmacogenomics and personalized medicine: impact of translating genetic knowledge into clinical practice. Pharmacogenomics 11:1149–1167
28. Stein L (2002) Creating a bioinformatics nation. Nature 417:119–120
29. Ring HZ, Kwok PY, Cotton RG (2006) Human Variome Project: an international collaboration to catalogue human genetic variation. Pharmacogenomics 7:969–972
30. Horaitis O et al (2007) A database of locus-specific databases. Nat Genet 39:425
31. Kaput J et al (2009) Planning the human variome project. The Spain report. Hum Mutat 30:496–510
32. Patrinos GP, Wajcman H (2004) Recording human globin gene variation. Hemoglobin 28:5–7
33. Axton M (2008) Human variome microattribution reviews. Nat Genet 40:1
34. Patrinos GP, Petricoin EF (2009) A new scientific journal linked to a genetic database: towards a novel publication modality. Hum Genomics Proteomics 1:e597478

Chapter 22

Development of Predictive Models for Estimating Warfarin
Maintenance Dose Based on Genetic and Clinical Factors
Lu Yang and Mark W. Linder

Abstract
In this chapter, we use the calculation of an estimated warfarin maintenance dose as an example to illustrate
how to develop a multiple linear regression model that quantifies the relationship between several independent
variables (e.g., patients’ genotype information) and a dependent variable (e.g., a measurable clinical
outcome).

Key words Multiple regression, Warfarin dosing algorithm, CYP2C9, VKORC1, INR, Pharmacogenetics,
Personalized medicine

1 Introduction

We use warfarin dosing to illustrate the approach for establishing
a pharmacogenetic algorithm, as it is a good example of a pharmacogenetic
application in personalized medicine. Several reasons
underlie the need to predict the warfarin dose
requirement. First, warfarin is the most widely used anticoagulant,
prescribed to more than 2 million new patients per year.
Clinical management is difficult because of large interpatient variability
[1]. Second, several research studies have confirmed a significant
influence of two genetic factors [cytochrome P450 2C9
(CYP2C9) and vitamin K epoxide reductase complex protein 1
(VKORC1)], as well as patients’ demographic factors, on the warfarin
maintenance dose [2–4]. Third, since warfarin has a narrow therapeutic
window, it is crucial to develop a dosing model to direct
therapeutic management [5].
In this chapter, we introduce the approach of establishing a
warfarin dosing algorithm step by step. The three major steps
involved in the development of a pharmacogenetic algorithm are:
study design, algorithm building using a multiple regression model,
and final model validation.

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_22, © Springer Science+Business Media, LLC 2013


Prior to developing a pharmacogenetic algorithm, it is
important to understand the basics of study design,
which include the criteria for selecting the target population and
the relevant information about the population that needs to be collected.
When building a pharmacogenetic algorithm for a
specific drug therapy, one needs to choose an appropriate mathematical
model based on examination of the data. Multiple regression
analysis, used in this chapter for developing the warfarin dosing algorithm,
is a powerful technique for assessing the association between multiple
independent variables and the dependent variable. The final
model can then be used to predict the unknown value of the
dependent variable (e.g., warfarin maintenance dose) from the
known values of two or more variables, also called the predictors
(e.g., the patients’ characteristics). The validity of the model can
then be evaluated by the correlation between the predicted values
and the actual values.
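The idea of fitting a multiple regression model and then checking it by the correlation between predicted and actual values can be sketched in a few lines. The example below is a minimal illustration in plain Python, solving the least-squares normal equations directly; it is not the dosing algorithm developed in this chapter. The two predictors (age and number of variant alleles), the coefficients, and the data are synthetic and hypothetical, chosen only so that the fit can be checked, and they carry no clinical meaning.

```python
import math


def fit_linear_model(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows without an intercept; an intercept
    column is added here. Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = []
    for i in range(k):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; model not identifiable")
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * b for a, b in zip(A[idx], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, row):
    """Predicted outcome for one row of predictors."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(
        sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)
    )


# Synthetic, noise-free data: [age in years, number of variant alleles]
X = [[40, 0], [50, 1], [60, 2], [70, 0], [55, 1], [65, 2], [45, 2]]
y = [8.0 - 0.05 * age - 1.5 * alleles for age, alleles in X]

coef = fit_linear_model(X, y)
predicted = [predict(coef, row) for row in X]
r = pearson_r(y, predicted)
```

Because the synthetic outcome was generated without noise, the fitted coefficients recover the generating values and the correlation is essentially 1; with real patient data the correlation would be lower and would ideally be assessed in a cohort that was not used for fitting.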

2 Study Design

The objective of developing a predictive model is to quantify the
relationship between predictive variables and outcome (phenotype).
Multiple regression is a well-suited modeling approach for
measuring the weighted contribution of each predictor to the final
outcome. Prior to the predictive study, the investigator needs to
define what standard will be used for deciding that a subject has a
particular disease or outcome. This standard serves as the
subject selection criterion. At the same time, one also needs to decide
what data to collect for use as predictive variables, i.e., variables that are
hypothesized to have effects on the outcome based on previous
findings. These variables may include both genotype information
and clinical characteristics.
Taking the example of building a warfarin dosing algorithm, the study goal is clearly defined: to calculate the warfarin maintenance dose based on patients’ genetic and demographic information. INR is used as the standard for judging the therapeutic response to warfarin (an INR between 2 and 3 is the most commonly recommended therapeutic range). The ideal maintenance dose should achieve this therapeutic INR range. Therefore, the study would target patients whose INR has remained within the recommended range on a consistent dose. Genetic variants of both CYP2C9 and VKORC1 [4, 6, 7] influence warfarin maintenance dose requirements. Other genetic factors such as CYP4F2, ApoE, and GGCX have also been reported as potentially important for warfarin dosing management [8–10], but they are not included in our example warfarin dosing model.
Patients’ demographic factors such as age, sex, and weight were also selected as candidate predictors for calculating warfarin maintenance dose, according to previous research. The major workflow in the study design for developing a pharmacogenetic algorithm is summarized below:
1. Clearly define the study goal.
2. Choose the gold standard to measure the clinical outcome.
3. Set up the study subject criteria according to the targeted outcome, as measured by the standard.
4. Establish a list of predictive factors for which data need to be collected.
When selecting the target population, it is important that all variables be approximately normally distributed across the range over which the predictive model is intended to be applied. Before data collection, attention should therefore be paid to the range of data to gather. For example, if the goal is to predict the warfarin dose for a population aged 20 to 90 years, what should be considered when choosing the age range of the study subjects? Ideally, the distribution of age will follow a normal, bell-shaped curve. A non-normal distribution of the data can distort the relationships and affect the model fitting process. We would therefore like to choose a study population whose ages are normally distributed within the range of 20–90 years. Alternatively, a data transformation (e.g., square root or log) may be applied to improve normality.
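The effect of a transformation on normality can be illustrated with a small sketch. It is only an illustration under stated assumptions: the dose values are hypothetical (not data from any study cited here), and sample skewness is used as a simple stand-in for a formal normality test.

```python
import math
import statistics


def skewness(values):
    """Sample skewness (third standardized moment); close to 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical, right-skewed dose values (illustration only).
doses = [14, 17.5, 21, 21, 24.5, 28, 28, 31.5, 35, 42, 52.5, 70, 105]
log_doses = [math.log(d) for d in doses]

# The log-transformed values are less skewed than the raw values.
print(round(skewness(doses), 2), round(skewness(log_doses), 2))
```

A skewness closer to zero after the transformation suggests that the transformed variable is nearer to a symmetric, bell-shaped distribution.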
The sample size for developing the multiple regression model has to be considered carefully when genetic information is among the predictor variables. If the frequency of the allele of interest is relatively low, a larger sample size is required to achieve a given statistical power.
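As a rough illustration of why rare alleles demand larger cohorts, the sketch below estimates how many subjects are needed so that the expected number of carriers reaches a chosen minimum. It is not a power calculation. It assumes Hardy-Weinberg proportions, and both the function names and the target of 30 carriers are hypothetical choices made for the example.

```python
import math


def carrier_fraction(allele_freq):
    """Fraction of subjects carrying at least one copy of the allele,
    assuming Hardy-Weinberg proportions (an assumption of this sketch)."""
    return 1.0 - (1.0 - allele_freq) ** 2


def subjects_needed(allele_freq, min_carriers):
    """Smallest cohort size whose expected number of carriers is >= min_carriers."""
    return math.ceil(min_carriers / carrier_fraction(allele_freq))


# The required cohort grows quickly as the allele becomes rarer.
for q in (0.20, 0.05, 0.01):
    print(q, subjects_needed(q, min_carriers=30))
```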

2.1 Note
● Ideally the mechanism of effect for a predictive genetic factor on the outcome is the primary consideration for inclusion within the model. However, validated statistical association in the absence of mechanistic explanation may provide equivalent input to the model but may limit opportunities for appropriate accommodation in practice. For example, the knowledge that CYP2C9 genetic variants reduce S-warfarin clearance not only contributes to estimation of maintenance dose, but also is instructive in terms of timing INR measurements with time to reach steady state [11].
● The study population size depends on the allele frequency of interest and might be relatively large if the genetic variants are rare.
● Experimental approaches for data collection need to be selected appropriately in terms of efficiency, reliability, and economic situation.

3 Model Fitting

Multiple regression is a commonly applied technique to predict the variance in a dependent variable based on a linear combination of several independent variables. The multiple regression equation takes the form $y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$. Here $y$ is the value of the dependent variable that is being predicted or explained, and $x_n$ is the nth independent variable that accounts for the variance in $y$. The $b$s are the regression coefficients; each represents the amount by which the dependent variable $y$ changes when the corresponding independent variable changes by 1 unit. The constant $a$ is the point where the regression line intercepts the $y$ axis, i.e., the value of $y$ when all the independent variables are 0. The multiple regression fitting process computes the linear relationship between the $y$ variable and the $n$ $x$ variables using statistical software (see examples in Subheading 3.1).
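To make the fitting step concrete, the following is a minimal sketch of an ordinary least-squares fit that solves the normal equations with plain Python. It is not the procedure of any particular statistical package; the function name and the data are hypothetical, and the data are generated exactly from y = 1 + 2*x1 + 3*x2 so that the recovered coefficients can be checked by eye.

```python
def fit_multiple_regression(rows, y):
    """Least-squares fit of y = a + b1*x1 + ... + bn*xn.

    rows: list of predictor lists [x1, ..., xn]; y: list of outcomes.
    Returns [a, b1, ..., bn] by solving the normal equations
    (X'X) beta = X'y with Gauss-Jordan elimination and partial pivoting.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 gives the intercept a
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coef = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coef])  # intercept a, then b1 and b2
```

In practice a statistical package would also report standard errors and p-values for each coefficient, which this sketch does not compute.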
The multiple linear regression model can be characterized by three basic assumptions: (1) the relationship between the dependent variable and the independent variables follows the linear and additive pattern of the regression equation; (2) the continuous variables in the multiple regression are normally distributed; (3) the independent variables are not correlated with one another. Violations of these assumptions should be checked for before the experimental data are analyzed. We give a more detailed explanation below, using the warfarin dosing algorithm as an example.
Before starting the multiple regression model fitting process, one first needs to choose which predictor variables to include in the model. In general, the rationale for choosing variables depends mainly on the study objectives. In our example study of warfarin maintenance dose prediction, the objective is to investigate the quantifiable relationship between candidate factors and warfarin maintenance dose. Individual warfarin maintenance dose is therefore selected as the dependent variable. Age, sex, weight, CYP2C9 genotype, and VKORC1 genotype were selected as candidate predictors (independent variables). Age, sex, and weight are commonly measured clinical characteristics and are consistently regarded as significant determinants of drug dosage. CYP2C9 and VKORC1 variants contribute significantly to warfarin maintenance dose variability, as repeatedly reported in pharmacogenetic studies [4, 6, 7].
After gathering all the data required for the model, one needs to examine whether any of the model assumptions are violated. Normality and linearity can be examined separately for each variable. If the distribution of the dependent variable is not normal, or if it is not linearly related to the independent variables, a logarithmic or square root transformation of the dependent variable may result in a better model fit. The Shapiro-Wilk test can be used to examine the normality of the data. If two or more of the independent variables are nearly linear combinations of each other, they are multicollinear, which is considered a violation of the multiple regression model assumptions. In such a case, the simplest solution is to use only one variable from the group of related independent variables for the regression model fitting. For example, if weight and BMI both contribute to the warfarin dose prediction and are linearly related to each other, then only one of these two variables needs to be included in the multiple regression model. The rule is to include the fewest independent variables that account for the outcome (dependent variable).
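A simple screen for the weight/BMI situation described above is to compute pairwise correlations between candidate predictors and flag pairs that are nearly collinear. The sketch below is an illustration only: the cohort values are hypothetical, and the threshold of 0.9 is an arbitrary choice for the example rather than a recommendation from this chapter.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.9):
    """Return pairs of predictor names whose |r| is at or above the threshold."""
    names = list(predictors)
    return [(p, q)
            for i, p in enumerate(names)
            for q in names[i + 1:]
            if abs(pearson_r(predictors[p], predictors[q])) >= threshold]


# Hypothetical cohort: BMI is derived from weight, so the two are nearly collinear.
weight = [60, 72, 85, 95, 110, 68]            # kg
height = [1.70, 1.75, 1.72, 1.78, 1.74, 1.69]  # m
predictors = {
    "weight": weight,
    "bmi": [w / h ** 2 for w, h in zip(weight, height)],
    "age": [71, 34, 58, 45, 63, 52],
}
print(collinear_pairs(predictors))
```

When a pair is flagged, only one member of the pair would be carried forward into the model fitting.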
Two variable screening strategies that can be used in the regression fitting procedure are forward stepwise regression and backward stepwise regression. The first strategy adds candidate variables to the model one at a time: it measures the degree to which each independent variable correlates with the dependent variable and continues to add variables to the model until no significant variables remain.
For example, suppose there are four independent variables $x_1$, $x_2$, $x_3$, $x_4$, and we want to build a multiple regression model from these variables to predict the dependent variable $y$ using the forward stepwise regression approach. We start by fitting the four single-variable models $y = a + b_n x_n$, $n = 1, \dots, 4$, to obtain the p-value from the t-test for the significance of each variable. Suppose that fitting these models yields the following p-values: $x_1$, p = 0.026; $x_2$, p = 0.146; $x_3$, p = 0.238; and $x_4$, p = 0.059.
Assuming that a significance level of 5 % (p < 0.05) is required, only $x_1$ meets the threshold (p = 0.026), although $x_4$ comes close (p = 0.059). We start by adding the most significant variable, $x_1$, to the model. Then we fit three models, each adding one of the remaining three variables, $y = a + b_1 x_1 + b_n x_n$, $n = 2, 3, 4$, to determine whether any of these variables is significant enough to be included in the regression model. For example, suppose that adding $x_2$, $x_3$, and $x_4$ in turn yields p-values of 0.056, 0.016, and 0.148, respectively.
The variable $x_3$ has the lowest p-value, so it is added to the current model, given that $x_1$ is already in the model. Before testing the remaining variables for inclusion, we fit the current model ($y = a + b_1 x_1 + b_3 x_3$) and test the significance of $x_1$ again. In this example, $x_1$, p = 0.012 and $x_3$, p = 0.016, so $x_1$ remains significant after $x_3$ is added to the model.
Since $x_1$ is significant, we leave it in the model and continue by fitting the two models that contain $x_1$, $x_3$, and one of the remaining variables, $x_2$ or $x_4$. If, for example, the resulting p-values for $x_2$ and $x_4$ do not meet the required significance level, the procedure ends, and the forward stepwise regression approach results in a model containing only $x_1$ and $x_3$:
$$y = a + b_1 x_1 + b_3 x_3$$
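The forward selection loop walked through above can be sketched as follows. This is a simplified illustration, not a full stepwise implementation: the p-values are supplied by a lookup function rather than computed from data, the third-round p-values for x2 and x4 are hypothetical (the text only says they are not significant), and the re-check of variables already in the model is omitted.

```python
def forward_stepwise(candidates, p_value, alpha=0.05):
    """Forward selection of predictors.

    candidates: names of the candidate predictors.
    p_value(var, selected): p-value of `var` when added to a model that
        already contains the variables in the tuple `selected`
        (supplied by the caller, e.g., from a t-test).
    Adds the most significant remaining variable until none has p < alpha.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda v: p_value(v, tuple(selected)))
        if p_value(best, tuple(selected)) >= alpha:
            break  # no remaining variable is significant
        selected.append(best)
        remaining.remove(best)
    return selected


# p-values from the worked example; the last round is hypothetical.
EXAMPLE_P = {
    (): {"x1": 0.026, "x2": 0.146, "x3": 0.238, "x4": 0.059},
    ("x1",): {"x2": 0.056, "x3": 0.016, "x4": 0.148},
    ("x1", "x3"): {"x2": 0.21, "x4": 0.34},  # assumed non-significant values
}


def lookup(var, selected):
    return EXAMPLE_P[selected][var]


print(forward_stepwise(["x1", "x2", "x3", "x4"], lookup))
```

With these inputs the loop selects x1 first, then x3, and stops, matching the model reached in the worked example.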
The second strategy, backward stepwise regression, starts by fitting the model with all variables of interest and then performs a new analysis after removing variables one at a time. Variables that are not significant at the chosen critical level (e.g., p = 0.05) are dropped from the final model.
After these steps of initial regression with candidate variables, assumption checking, and any data transformations needed to correct assumption violations, the researcher obtains a final model in the form of a standard multiple regression equation, i.e., the sum of each independent variable multiplied by its specific coefficient. For example, the final model for warfarin maintenance dose is expressed as [12]:
$$\ln(\text{Dose}) = 1.35 - 0.008 \cdot \text{Age} + 0.116 \cdot \text{Sex} + 0.004 \cdot \text{Weight} - 0.376 \cdot (\text{VKORC1-AA}) + 0.271 \cdot (\text{VKORC1-GG}) - 0.307 \cdot (\text{2C9*2}) - 0.318 \cdot (\text{2C9*3})$$
Note that the p-value is significant for each variable in the final model ([12], Table 2).
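As a sketch of how such a final model is applied, the function below evaluates the equation quoted above and back-transforms the result from ln(Dose) to dose. The coefficients are copied from the text; everything else is an assumption of this example, since the units and variable coding (e.g., how sex is coded, the unit of weight, whether genotype terms are 0/1 indicators) are not stated here and should be taken from the original study [12].

```python
import math

# Coefficients copied from the final model quoted in the text [12].
COEF = {
    "intercept": 1.35,
    "age": -0.008,
    "sex": 0.116,
    "weight": 0.004,
    "vkorc1_aa": -0.376,
    "vkorc1_gg": 0.271,
    "cyp2c9_2": -0.307,
    "cyp2c9_3": -0.318,
}


def predicted_dose(age, sex, weight, vkorc1_aa, vkorc1_gg, cyp2c9_2, cyp2c9_3):
    """Evaluate the model and return exp(ln(Dose)).

    Units and coding of the inputs are NOT given in the text; genotype
    arguments are treated here as numeric indicators (an assumption).
    """
    ln_dose = (COEF["intercept"]
               + COEF["age"] * age
               + COEF["sex"] * sex
               + COEF["weight"] * weight
               + COEF["vkorc1_aa"] * vkorc1_aa
               + COEF["vkorc1_gg"] * vkorc1_gg
               + COEF["cyp2c9_2"] * cyp2c9_2
               + COEF["cyp2c9_3"] * cyp2c9_3)
    return math.exp(ln_dose)


# Hypothetical patient, illustration only (not dosing advice).
print(round(predicted_dose(60, 1, 180, 0, 0, 0, 0), 2))
```

Because the model is fitted on the logarithm of the dose, each coefficient acts multiplicatively on the predicted dose after back-transformation.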

3.1 Note
1. There may be interactions between the independent variables in the multiple regression model, meaning that the effect of one variable on the response depends on the value of another. For example, Fig. 1 in the paper by Linder et al. [13] shows the relationship between warfarin maintenance dose (dependent variable) and advancing age according to CYP2C9 variant status. The non-variant group showed a much stronger effect of age on the warfarin dose. Under this circumstance, the interaction between CYP2C9 variant status and age should be included as a new term in addition to these two independent variables. The new model could have Age, CYP2C9, and Age*CYP2C9 as independent variables, which allows a separate regression line for each of the CYP2C9 variant groups.
2. Several commercial software applications can be used to perform multiple regression modeling. Common ones include Stata (StataCorp LP, College Station, TX), SPSS (SPSS, Chicago, IL), and S-Plus (TIBCO Software Inc., Palo Alto, CA), among other statistical software packages.
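The interaction term described in Note 1 can be illustrated with a short sketch. It assumes that CYP2C9 variant status is coded as 0 (non-variant) or 1 (variant); the function names and the coefficient values in the usage lines are hypothetical and are not taken from ref. [13].

```python
def design_row(age, cyp2c9_variant):
    """Predictor row [Age, CYP2C9, Age*CYP2C9] for one subject.

    cyp2c9_variant is assumed to be coded 0 (non-variant) or 1 (variant).
    """
    return [age, cyp2c9_variant, age * cyp2c9_variant]


def age_slope(b_age, b_interaction, cyp2c9_variant):
    """Slope of the outcome on age within one genotype group.

    With an interaction term, the age effect is b_age for non-variant
    subjects and b_age + b_interaction for variant carriers.
    """
    return b_age + b_interaction * cyp2c9_variant


# Hypothetical coefficients: a steeper age effect in the non-variant group.
print(design_row(70, 1), age_slope(-0.5, 0.4, 0), age_slope(-0.5, 0.4, 1))
```

The product column is what lets the fitted model draw a separate regression line of dose on age for each genotype group.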

4 Model Validation

The most straightforward measure of the accuracy of the multiple regression model is the $R^2$ value. $R^2$ is defined as the proportion of variance in the dependent variable that can be explained by the multiple regression model. $R^2$ equals 1 minus the ratio of the residual variability (sum of squares not explained by the model) to the overall variability (sum of squares around the mean):
$$R^2 = 1 - \frac{SS_{\text{Residual}}}{SS_{\text{Overall}}}$$
$R^2$ ranges between zero and one. The closer the $R^2$ value is to one, the smaller the difference between the variance explained by the model and the overall variance, and therefore the better the model predicts. An example is shown in Fig. 2 of ref. [12]. In that study, the correlation between actual dosages and model-calculated daily warfarin maintenance dosages yielded an $R^2$ value of 0.61. Validations such as these are most rigorous when tested against an independent data set not involved in the development of the predictive model.
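The definition of R² given above translates directly into a few lines of code. The sketch below is a minimal illustration with hypothetical values; the function name is an assumption of this example.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS(residual) / SS(overall)."""
    mean = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_overall = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_overall


# Hypothetical actual and predicted doses (illustration only).
actual = [2.0, 4.0, 6.0]
print(r_squared(actual, actual))           # perfect prediction
print(r_squared(actual, [4.0, 4.0, 4.0]))  # always predicting the mean
```

A model that always predicts the mean explains none of the variance, while a perfect model leaves no residual sum of squares.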
In addition to the $R^2$ approach, one can also assess model accuracy by comparison with other existing models that use different mathematical approaches. The mean absolute error, which is the mean of the absolute differences between the predicted and actual values, can be used to evaluate each model’s predictive accuracy. To allow a fair comparison of all models, the mean absolute error is computed in the original units rather than in the transformed units.
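A minimal sketch of this comparison is shown below. It assumes a model fitted on the natural logarithm of the dose, so predictions are back-transformed with the exponential function before the error is computed; the numbers are hypothetical and serve only to show the order of operations.

```python
import math


def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between predicted and actual values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual doses and model predictions on the ln scale.
actual_dose = [5.0, 3.0, 7.5]
predicted_ln_dose = [1.5, 1.2, 2.1]

# Back-transform to the original units before computing the error.
predicted_dose = [math.exp(v) for v in predicted_ln_dose]
print(round(mean_absolute_error(actual_dose, predicted_dose), 2))
```

Computing the error on the back-transformed values keeps models with different transformations comparable in the same units.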
In conclusion, $R^2$ is the most commonly used statistic for evaluating model fit. An alternative approach is to use the mean absolute error to evaluate the model’s prediction accuracy.

5 Summary

In this chapter, we use estimation of warfarin maintenance dosage as an example to illustrate how to develop a multiple linear regression model that quantifies the relationship between several independent variables (e.g., patients’ genotype information and demographics) and a dependent variable (e.g., a measurable clinical outcome such as INR). The quality of the predictive model ultimately depends on the quality of the data obtained from the cohort of subjects used to develop the model, the detection of violations of the model assumptions, and appropriate data transformation techniques. Final predictive models should be validated against independent cohorts of patients selected to test the limits of the predictive model. Before a predictive model is implemented in local clinical practice, it should first be tested to identify characteristics of the local population (e.g., racial diversity, smoking habits) that may not be adequately accounted for in the model and may therefore keep it from reaching the anticipated level of predictive accuracy.

References

1. Takahashi H, Echizen H (2003) Pharmacogenetics of CYP2C9 and interindividual variability in anticoagulant response to warfarin. Pharmacogenomics J 3:202–214
2. Aquilante CL, Langaee TY, Lopez LM, Yarandi HN, Tromberg JS, Mohuczy D et al (2006) Influence of coagulation factor, vitamin K epoxide reductase complex subunit 1, and cytochrome P450 2C9 gene polymorphisms on warfarin dose requirements. Clin Pharmacol Ther 79:291–302
3. D'Andrea G, D'Ambrosio RL, Di Perna P, Chetta M, Santacroce R, Brancaccio V et al (2005) A polymorphism in the VKORC1 gene is associated with an interindividual variability in the dose-anticoagulant effect of warfarin. Blood 105:645–649
4. Sconce EA, Khan TI, Wynne HA, Avery P, Monkhouse L, King BP et al (2005) The impact of CYP2C9 and VKORC1 genetic polymorphism and patient characteristics upon warfarin dose requirements: proposal for a new dosing regimen. Blood 106:2329–2333
5. Burns M (1999) Management of narrow therapeutic index drugs. J Thromb Thrombolysis 7:137–143
6. Tham LS, Goh BC, Nafziger A, Guo JY, Wang LZ, Soong R, Lee SC (2006) A warfarin-dosing model in Asians that uses single-nucleotide polymorphisms in vitamin K epoxide reductase complex and cytochrome P450 2C9. Clin Pharmacol Ther 80:346–355
7. Wadelius M, Chen LY, Eriksson N, Bumpstead S, Ghori J, Wadelius C et al (2007) Association of warfarin dose with genes involved in its action and metabolism. Hum Genet 121:23–34
8. Caldwell MD, Awad T, Johnson JA, Gage BF, Falkowski M, Gardina P et al (2008) CYP4F2 genetic variant alters required warfarin dose. Blood 111:4106–4112
9. Wadelius M, Chen LY, Downes K, Ghori J, Hunt S, Eriksson N et al (2005) Common VKORC1 and GGCX polymorphisms associated with warfarin dose. Pharmacogenomics J 5:262–270
10. Kimmel SE, Christie J, Kealey C, Chen Z, Price M, Thorn CF et al (2008) Apolipoprotein E genotype and warfarin dosing among Caucasians and African Americans. Pharmacogenomics J 8:53–60
11. Linder MW, Bon Homme M, Reynolds KK, Gage BF, Eby C, Silvestrov N, Valdes R Jr (2009) Interactive modeling for ongoing utility of pharmacogenetic diagnostic testing: application for warfarin therapy. Clin Chem 55:1861–1868
12. Zhu Y, Shennan M, Reynolds KK, Johnson NA, Herrnberger MR, Valdes R Jr, Linder MW (2007) Estimation of warfarin maintenance dose based on VKORC1 (−1639 G>A) and CYP2C9 genotypes. Clin Chem 53:1199–1205
13. Linder MW, Looney S, Adams JE 3rd, Johnson N, Antonino-Green D, Lacefield N et al (2002) Warfarin dose adjustments based on CYP2C9 genetic polymorphisms. J Thromb Thrombolysis 14:227–232

Chapter 23

Evidence Based Drug Dosing and Pharmacotherapeutic Recommendations per Genotype
Vera H.M. Deneer and Ron H.N. van Schaik

Abstract
Implementing pharmacogenetics in daily clinical practice has the potential to improve patient care. The
translation of results of pharmacogenetic studies into practical pharmacotherapeutic recommendations is
essential. These recommendations are preferably available at the time of drug prescribing and drug dis-
pensing. This chapter describes a process of developing evidence based drug dosing and pharmacothera-
peutic guidelines per genotype by the Dutch Pharmacogenetics Working Group. It is aimed to provide
recommendations in case drugs are prescribed to a patient whose genotype is known. Furthermore, several
examples are given. Many drugs are metabolized by the Cytochrome P450 CYP2D6 enzyme. Carriage of
genetic variants of the CYP2D6 gene can result in a predicted phenotype of poor, intermediate or ultrar-
apid metabolizer. Dose adjustments, pharmacotherapeutic and monitoring recommendations are described
for several CYP2D6 substrates, when initiated in patients with the above mentioned phenotypes.

Key words Pharmacogenetics, Cytochrome P450 enzymes, CYP2D6, Personalized medicine, Pharmacotherapy

1 Introduction

In the past years, both the number and quality of published studies on the association between genetic variants and the pharmacokinetics or pharmacodynamics of drugs have increased enormously. Studies include larger populations, and clinically relevant endpoints are more frequently evaluated. The FDA and EMA have encouraged and requested the addition of pharmacogenetic information to drug labels. Pharmacogenetic testing has become cheaper and more readily available. Many researchers, health care providers, and those involved in health care policy have stated that implementing pharmacogenetics in daily clinical practice is a major step forward to personalized medicine. By implementing pharmacogenetics, one aims to improve

The Pharmacogenetics Working Group is part of the Pharmacogenetics Project of the Royal Dutch Association for the
Advancement of Pharmacy

Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7_23, © Springer Science+Business Media, LLC 2013


[Figure 1: flow diagram. PGx testing → Predicting drug concentration or response → Better, individualized prescribing → Better patient outcome → Better public health]

Fig. 1 Implementation of pharmacogenetics in daily clinical practice

patient care and therefore improve public health (Fig. 1). However, the implementation of pharmacogenetics in routine patient care is still limited. One of the main reasons may be that studies have not resulted in practical dosing and prescribing recommendations per genotype. In 2005, the Royal Dutch Association for the Advancement of Pharmacy initiated the Pharmacogenetics Project, for which it established a multidisciplinary working group (Dutch Pharmacogenetics Working Group; DPWG) in which (hospital) pharmacists, medical doctors, clinical pharmacologists, clinical (bio)chemists, and epidemiologists participate. The aims of the project were (1) to evaluate gene–drug interactions by systematic review, (2) to give drug dosing and pharmacotherapeutic recommendations per genotype, and (3) to make recommendations available at the time of drug prescribing by medical doctors and drug dispensing by pharmacies. This approach did not include recommendations on whether genotyping is necessary for certain drugs or certain clinical conditions, but aimed at providing drug dosing and other pharmacotherapeutic recommendations in case the genotype of the patient is known. It was anticipated that in the near future genotyping would more often be applied as part of routine patient care in case of problems with pharmacotherapy, such as the development of side effects or insufficient efficacy of drugs. Once a patient’s genotype is known, it should be taken into account whenever other drugs are initiated.
Monitoring of the pharmacotherapy of individual patients is routinely performed using electronic drug prescribing systems or drug dispensing systems with an incorporated database. The prescribing and dispensing systems contain patient characteristics such as the prescribed drugs and the patient’s genotype. The incorporated database includes information on drug interactions and gene–drug interactions (Fig. 2). Implementation in daily practice means

[Figure 2: two linked panels.
Software: drug prescribing; drug dispensing; patient characteristics (e.g., CYP2D6 PM, CYP2C9 *1/*3); drugs.
Electronic database: level of evidence of gene–drug interaction; clinical relevance of gene–drug interaction; interaction/action; dose or pharmacotherapeutic recommendation; report.]

Fig. 2 Implementation of pharmacogenetics database into drug prescribing and dispensing software

that an alert is generated when a genotype requires modification of the therapy; e.g., initiating metoprolol in a CYP2D6 poor metabolizer generates an alert with the advice to reduce the initial dose of metoprolol by 70 % when treating a patient with heart failure. The pharmacogenetics database includes the following parameters of genotype–drug interactions: level of evidence, clinical relevance, whether an action is required or not, a dose or pharmacotherapeutic recommendation if applicable, and a report summarizing the available data used to arrive at the particular recommendation.
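The alert mechanism described above might be sketched as a simple lookup. This is an illustration only and does not represent the actual prescribing or dispensing software: the class, field, and function names are hypothetical, and the evidence and relevance scores for the metoprolol entry are left unset because they are not stated in the text. Only the recommendation itself is taken from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneDrugInteraction:
    """One database record; fields mirror the parameters listed in the text."""
    action_required: bool
    recommendation: str
    level_of_evidence: Optional[int] = None   # not stated in the text; left unset
    clinical_relevance: Optional[str] = None  # not stated in the text; left unset


# Hypothetical database keyed by (drug, gene, predicted phenotype).
DATABASE = {
    ("metoprolol", "CYP2D6", "PM"): GeneDrugInteraction(
        action_required=True,
        recommendation="Reduce the initial dose by 70 % when treating heart failure",
    ),
}


def check_prescription(drug, patient_phenotypes):
    """Return alert messages for a newly prescribed drug.

    patient_phenotypes maps gene name to predicted phenotype, e.g. {"CYP2D6": "PM"}.
    """
    alerts = []
    for gene, phenotype in patient_phenotypes.items():
        entry = DATABASE.get((drug.lower(), gene, phenotype))
        if entry is not None and entry.action_required:
            alerts.append(f"{drug} / {gene} {phenotype}: {entry.recommendation}")
    return alerts


print(check_prescription("Metoprolol", {"CYP2D6": "PM"}))
```

A patient without a recorded genotype, or with a phenotype that has no database entry, simply produces no alert in this sketch.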
This chapter describes the process of developing evidence
based drug dosing and pharmacotherapeutic guidelines per geno-
type by the DPWG, as well as relevant aspects of the parameters in
the pharmacogenetics database. It is meant as an example of a
method and assessment process for the development of pharmaco-
genetic guidelines.

2 From Genotype to Predicted Phenotype

In daily clinical practice it is useful to translate a patient’s genotype into a predicted phenotype. The latter is easier to understand for health care providers who are less familiar with the detailed nomenclature in the field of pharmacogenetics. Genotypes for enzymes such as the cytochrome P450 enzymes CYP2D6 and CYP2C19 are translated into the predicted phenotype poor, intermediate, extensive, or ultrarapid metabolizer. In some cases, the translation has been incorporated into commercially available genetic tests, so that a report is generated that includes the patient’s predicted phenotype. However, this translation may differ between genetic tests, health care professionals, and researchers. This means that, in the absence of harmonization, the reported phenotype may depend on the laboratory that performed the genetic test. This is obviously not optimal, and one of the requirements for dosing recommendations based on a predicted phenotype is agreement on how to translate genotype into predicted phenotype. For CYP2D6, for example, there is agreement that patients with two alleles encoding a nonfunctional CYP2D6 enzyme are poor metabolizers and those with additional copies of a functional CYP2D6 allele are ultrarapid metabolizers. However, carriers of only one nonfunctional allele are classified in the literature as either intermediate or extensive metabolizers, with the latter having a “normal” enzyme capacity.
Of course, the predicted phenotype depends on the substrate. The metabolic ratio of the CYP2D6 probe drug dextromethorphan is increased by a factor of 3.0 in carriers of one nonfunctional CYP2D6 allele versus subjects without such an allele [1]. The area under the plasma concentration versus time curve of trimipramine in subjects with one nonfunctional allele is increased by a factor of 2.5 compared with those without such an allele [2]. However, the clearance of haloperidol is similar in both groups, with a ratio of 0.9 for those with one nonfunctional allele versus those without [3]. Since carriage of only one nonfunctional CYP2D6 allele leads to clinically relevant changes in the pharmacokinetics of some drugs, it was decided to classify these subjects as intermediate metabolizers. The result of the consensus meeting on this topic was shared with professionals involved in genetic testing in the Netherlands.
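The translation rules stated in this section can be summarized in a short sketch. It covers only the cases described in the text (nonfunctional alleles and additional functional copies); other situations, such as reduced-function alleles, are not addressed here and are deliberately rejected rather than guessed. The function name and its two-count input format are assumptions of this example.

```python
def predicted_cyp2d6_phenotype(functional_copies, nonfunctional_alleles):
    """Translate a CYP2D6 genotype summary into a predicted phenotype.

    functional_copies: number of functional gene copies (duplications included).
    nonfunctional_alleles: number of nonfunctional alleles.
    Only the combinations described in the text are handled.
    """
    if functional_copies == 0 and nonfunctional_alleles == 2:
        return "PM"  # two nonfunctional alleles: poor metabolizer
    if functional_copies == 1 and nonfunctional_alleles == 1:
        return "IM"  # one nonfunctional allele: intermediate (consensus decision)
    if nonfunctional_alleles == 0 and functional_copies > 2:
        return "UM"  # additional functional copies: ultrarapid metabolizer
    if nonfunctional_alleles == 0 and functional_copies == 2:
        return "EM"  # "normal" enzyme capacity: extensive metabolizer
    raise ValueError("combination not covered by the rules described in the text")


for genotype in [(0, 2), (1, 1), (2, 0), (3, 0)]:
    print(genotype, predicted_cyp2d6_phenotype(*genotype))
```

Encoding the agreed rules in one place is one way to avoid the laboratory-dependent translations mentioned above.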

3 Level of Evidence and Clinical Relevance of Gene–Drug Interactions

Initially, the scientific literature on pharmacogenetics primarily included case reports, case series, pharmacokinetic studies, and pharmacodynamic studies with surrogate endpoints. In more recent years, the number of observational studies on the association between genetic variants and clinically relevant endpoints has increased. Randomized studies are, however, still scarce. Within the pharmacogenetics project, a systematic search is performed for each specific gene–drug interaction. In the further assessment, review articles, nonhuman studies, and in vitro data are excluded. The level of evidence of every study is scored on a five-point scale, with the scores 0 and 4 being the lowest and highest level of evidence, respectively. The criteria for assigning the different scores are described in Table 1 [4]. The clinical relevance is scored on a seven-point scale. A clinical or pharmacokinetic effect that is not statistically significant in a specific study is coded as AA (lowest impact), while code F represents a highly clinically relevant effect, e.g., death, severe arrhythmia, or bone marrow depression (highest impact). A more detailed description is given in Table 2 [4]. As part of the assessment of literature, events are added to the list. Initially, the level of evidence and clinical relevance of each article are independently scored by two members of the DPWG. The assigned scores are subsequently evaluated by the complete working group. Finally, the overall code of a gene–drug interaction is the highest level of evidence and the clinical effect with the highest relevance assigned to the articles included in the assessment.

Table 1
Scoring system for level of evidence of gene–drug interaction [4]

Level of evidence
4    Published controlled studies of good quality (a) relating to phenotyped and/or genotyped patients or healthy volunteers, and having relevant pharmacokinetic or clinical endpoints
3    Published controlled studies of moderate quality (b) relating to phenotyped and/or genotyped patients or healthy volunteers, and having relevant pharmacokinetic or clinical endpoints
2    Published case reports, well documented, and having relevant pharmacokinetic or clinical endpoints. Well documented case series
1    Published incomplete case reports. Product information
0    Data on file
–    No evidence

Population size was not assessed when assigning the level of evidence but dose adjustments were calculated as the population size-weighted mean
(a) “Good quality” criteria include:
    − The use of concomitant medication with a possible effect on the phenotype is reported in the manuscript
    − Confounders are reported (e.g., smoking status)
    − The reported data are based on steady-state kinetics
    − Results are corrected for dose variability
(b) Wherever one or more of these “good quality” criteria was missing, the quality of the study was considered to be “moderate”

Table 2
Scoring system for clinical relevance of gene–drug interaction [4]

Classification of clinical relevance
AA   Clinical effect (NS)
     Kinetic effect (NS)
A    Minor clinical effect (S): QTc prolongation (<450 msec ♂, <460 msec ♀); INR increase <4.5
     Kinetic effect (S)
B    Clinical effect (S): short-lived discomfort (<48 h) without permanent injury: e.g., reduced decrease in resting heart rate; reduction in exercise tachycardia; decreased pain relief from oxycodone; ADE resulting from increased bioavailability of atomoxetine (decreased appetite, insomnia, sleep disturbance etc.); neutropenia >1.5 × 10^9/l; leucopenia >3.0 × 10^9/l; thrombocytopenia >75 × 10^9/l; moderate diarrhea not affecting daily activities; reduced glucose increase following oral glucose tolerance test
C    Clinical effect (S): long-standing discomfort (48–168 h) without permanent injury e.g., failure of therapy with tricyclic antidepressants, atypical antipsychotic drugs; extrapyramidal side effects; parkinsonism; ADE resulting from increased bioavailability of tricyclic antidepressants, metoprolol, propafenone (central effects e.g., dizziness); INR 4.5–6.0; neutropenia 1.0–1.5 × 10^9/l; leucopenia 2.0–3.0 × 10^9/l; thrombocytopenia 50–75 × 10^9/l
D    Clinical effect (S): long-standing discomfort (>168 h), permanent symptom or invalidating injury e.g., failure of prophylaxis of atrial fibrillation; venous thromboembolism; decreased effect of clopidogrel on inhibition of platelet aggregation; ADE resulting from increased bioavailability of phenytoin; INR > 6.0; neutropenia 0.5–1.0 × 10^9/l; leucopenia 1.0–2.0 × 10^9/l; thrombocytopenia 25–50 × 10^9/l; severe diarrhea
E    Clinical effect (S): Failure of lifesaving therapy e.g., anticipated myelosuppression; prevention of breast cancer relapse; arrhythmia; neutropenia < 0.5 × 10^9/l; leucopenia < 1.0 × 10^9/l; thrombocytopenia <25 × 10^9/l; life-threatening complications from diarrhea
F    Clinical effect (S): death; arrhythmia; unanticipated myelosuppression

NS not statistically significant difference, S statistically significant difference, INR international normalized ratio, ADE adverse drug event
The clinical relevance was scored on a seven-point scale derived from the National Cancer Institute’s Common Toxicity Criteria. A clinical or pharmacokinetic effect that was not statistically significant was classified as AA (lowest impact), whereas death, for example, was classified as F (highest impact). At every level of this point scale, new events are added after assessment by the DPWG
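The rule for the overall code of a gene–drug interaction (highest level of evidence and highest clinical relevance among the assessed articles) can be sketched as follows. The function name and the article scores in the usage line are hypothetical; the ordering of the relevance codes follows the seven-point scale described in this section.

```python
# Clinical relevance codes ordered from lowest (AA) to highest (F) impact.
RELEVANCE_ORDER = ["AA", "A", "B", "C", "D", "E", "F"]


def overall_code(article_scores):
    """Combine per-article scores into the overall code of an interaction.

    article_scores: list of (level_of_evidence, clinical_relevance) tuples,
    with level_of_evidence from 0 to 4 and clinical_relevance from AA to F.
    Returns the highest level of evidence and the highest clinical relevance.
    """
    level = max(level for level, _ in article_scores)
    relevance = max((code for _, code in article_scores),
                    key=RELEVANCE_ORDER.index)
    return level, relevance


# Hypothetical scores for three articles on one gene-drug interaction.
print(overall_code([(3, "A"), (4, "AA"), (2, "C")]))
```

Note that the two maxima are taken independently, so the highest evidence level and the highest relevance may come from different articles.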

4 Calculation of Dose Adjustments

In the literature, dose adjustments of antidepressants and antipsychotic drugs have been calculated for the different CYP2D6 phenotypes [5]. However, data were partially extrapolated and the clinical relevance of differences in pharmacokinetics between phenotypes was not taken into account. In general, studies are frequently underpowered to detect differences for less frequent phenotypes such as CYP2D6 ultrarapid metabolizers.
As part of the pharmacogenetics project, dose calculations
were based on the following: (1) only pharmacokinetic data from
articles with a level of evidence of 3 or 4 were used; (2) data from
studies showing either statistically significant or not statistically
significant differences were used, because a lack of statistical
significance is often caused by the limited sample size of the group
with a certain genotype; (3) in the case of active metabolites, the sum
of the parent compound and the active metabolite was used; (4) in the
case of prodrugs, which have to be metabolized into the active compound,
data on the active metabolite were used; (5) AUCs were used
preferentially in calculations, followed by concentrations, calculated
drug clearances, and elimination half-lives.
A dose adjustment is calculated for every selected article using
the formula depicted in Fig. 3. A final dose adjustment is calculated
as the population size-weighted mean of the individually calculated
dose adjustments (Fig. 4).

$$D_{\mathrm{PM}}(\%) = \frac{\mathrm{AUC}_{\mathrm{EM}}}{\mathrm{AUC}_{\mathrm{PM}}} \times 100\,\%$$

Fig. 3 Calculation of dose adjustment for a phenotype based on one article using AUC as pharmacokinetic
parameter [4]. D dose, PM poor metabolizer, AUC area under the plasma concentration versus time curve, EM
extensive metabolizer. The calculated dose adjustment is the percentage of the dose usually prescribed or
recommended

$$D_{\mathrm{PM}}(\%) = \frac{N(a)\,D_{\mathrm{PM}}(a) + N(b)\,D_{\mathrm{PM}}(b) + N(c)\,D_{\mathrm{PM}}(c) + \dots + N(x)\,D_{\mathrm{PM}}(x)}{N(a) + N(b) + N(c) + \dots + N(x)}$$

Fig. 4 Calculation of the overall dose adjustment for a phenotype as the population size-weighted mean of
calculated dose adjustments per article [4]. D dose, PM poor metabolizer, N(a) number of subjects of study “a”,
DPM(a) calculated dose adjustment for PMs based on the results of study “a”; “b,” “c,” … “x” represent other
articles. The calculated overall dose adjustment is the percentage of the dose usually prescribed or
recommended. The percentage by which the dose should be reduced is calculated as 100 % minus DPM(%)
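
The two calculations in Figs. 3 and 4 can be traced with a small Python sketch. The `Study` container, the function names, and all numbers are illustrative assumptions; the AUC values are made up and do not come from any of the assessed articles.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Study:
    """Pharmacokinetic summary of one article (hypothetical container)."""
    n_subjects: int  # N: number of subjects of the study
    auc_em: float    # AUC in extensive metabolizers
    auc_pm: float    # AUC in poor metabolizers


def dose_adjustment_percent(study: Study) -> float:
    """Fig. 3: dose for PMs as a percentage of the usual dose."""
    return study.auc_em / study.auc_pm * 100.0


def overall_dose_adjustment_percent(studies: Sequence[Study]) -> float:
    """Fig. 4: population size-weighted mean of the per-article adjustments."""
    total_n = sum(s.n_subjects for s in studies)
    if total_n == 0:
        raise ValueError("at least one subject is required")
    weighted = sum(s.n_subjects * dose_adjustment_percent(s) for s in studies)
    return weighted / total_n


# Made-up numbers, for illustration only.
studies = [
    Study(n_subjects=10, auc_em=100.0, auc_pm=200.0),  # per-article value: 50 %
    Study(n_subjects=30, auc_em=100.0, auc_pm=250.0),  # per-article value: 40 %
]
d_pm = overall_dose_adjustment_percent(studies)  # (10*50 + 30*40) / 40
reduction = 100.0 - d_pm                         # percentage to reduce the dose by
```

With these made-up inputs the larger study dominates the weighted mean, so the overall adjustment lies closer to 40 % than to 50 %.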

Table 3
Classification of gene–drug interactions on whether an interaction
exists or an action is required in case the combination occurs
in an individual patient

Interaction  Action  Result

Yes          Yes     System generates alert and advice on how to deal with the interaction
Yes          No      No alert; local users can adjust the system to generate an alert
No           No      No alert
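
The decision logic of Table 3 can be written as a small Python function. This is only a sketch: the function name, the return strings, and the `local_override` parameter (standing in for the local adjustment mentioned in the table) are hypothetical, and the combination "interaction: no, action: yes" is treated as invalid because Table 3 does not list it.

```python
def alert_result(interaction: bool, action: bool, local_override: bool = False) -> str:
    """Return the system behaviour for one gene-drug combination (Table 3)."""
    if action and not interaction:
        # Not a row of Table 3: an action without an interaction is undefined here.
        raise ValueError("action without interaction is not defined in Table 3")
    if interaction and action:
        return "alert with advice"
    if interaction and local_override:
        # Interaction exists but no action is required; a local user chose to be alerted.
        return "alert (enabled locally)"
    return "no alert"
```

For example, `alert_result(True, False)` yields `"no alert"` by default, while the same call with `local_override=True` yields an alert.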

5 Dosing and Pharmacotherapeutic Recommendations

The final part of the assessment is the classification of the gene–drug
interaction (interaction: yes/no) and the decision on whether
a specific action is required in case the gene–drug interaction
occurs (action: yes/no). This classification is essential for the
generation of alerts within drug prescribing and drug dispensing
systems. Local users can modify the generated alerts as described in
Table 3. The recommendations include adjusting the dose, selecting
an alternative drug, or being alert to adverse drug events or
insufficient efficacy of a drug in an individual patient. An overview
of the dosing and pharmacotherapeutic recommendations per
gene–drug interaction has been published in the scientific
literature [4, 6]. Some examples are described in Table 4. When an
alert is generated, a text pops up that includes the recommendation
and some background information. The pharmacogenetics database
contains a report summarizing the main findings of the assessed
articles, the assigned codes, background information, and
issues that were important within the assessment process.

Table 4
Gene–drug interactions of several CYP2D6 substrates [4]

Drug          Phenotype  Level of  Clinical   Gene–drug    Dose or therapeutic recommendation
                         evidence  relevance  interaction

Clomipramine  PM         4         C          Yes          Reduce dose by 50 % and monitor (desmethyl)clomipramine
                                                           plasma concentration
              IM         4         C          Yes          Insufficient data to allow calculation of dose adjustment.
                                                           Monitor (desmethyl)clomipramine plasma concentration
              UM         2         C          Yes          Select alternative drug (e.g., citalopram, sertraline) or
                                                           monitor (desmethyl)clomipramine plasma concentration
Flecainide    PM         4         A          Yes          Reduce dose by 50 %, record ECG, monitor plasma concentration
              IM         3         A          Yes          Reduce dose by 25 %, record ECG, monitor plasma concentration
              UM         –         –          Yes          Record ECG and monitor plasma concentration or select
                                                           alternative drug (e.g., sotalol, disopyramide, quinidine,
                                                           amiodarone)
Haloperidol   PM         4         C          Yes          Reduce dose by 50 % or select alternative drug (e.g., pimozide,
                                                           flupenthixol, fluphenazine, quetiapine, olanzapine, clozapine)
              IM         4         A          Yes          No
              UM         4         C          Yes          Insufficient data to allow calculation of dose adjustment.
                                                           Be alert to decreased haloperidol plasma concentration and
                                                           adjust maintenance dose in response to haloperidol plasma
                                                           concentration or select alternative drug (e.g., pimozide,
                                                           flupenthixol, fluphenazine, quetiapine, olanzapine, clozapine)
Metoprolol    PM         4         C          Yes          Heart failure: Select alternative drug (e.g., bisoprolol,
                                                           carvedilol) or reduce dose by 75 %
                                                           Other indications: Be alert to ADE (e.g., bradycardia, cold
                                                           extremities) or select alternative drug (e.g., atenolol,
                                                           bisoprolol)
              IM         4         B          Yes          Heart failure: Select alternative drug (e.g., bisoprolol,
                                                           carvedilol) or reduce dose by 50 %
                                                           Other indications: Be alert to ADE (e.g., bradycardia, cold
                                                           extremities) or select alternative drug (e.g., atenolol,
                                                           bisoprolol)
              UM         4         D          Yes          Heart failure: Select alternative drug (e.g., bisoprolol,
                                                           carvedilol) or titrate dose to max. 250 % of the normal dose
                                                           in response to efficacy and ADE
                                                           Other indications: Select alternative drug (e.g., atenolol,
                                                           bisoprolol) or titrate dose to max. 250 % of the normal dose
                                                           in response to efficacy and ADE
Paroxetine    PM         4         A          Yes          No
              IM         4         A          Yes          No
              UM         4         C          Yes          Insufficient data to allow calculation of dose adjustment.
                                                           Select alternative drug (e.g., citalopram, sertraline)
Tramadol      PM         4         B          Yes          Select alternative drug (NOT oxycodone or codeine) or be
                                                           alert to symptoms of insufficient pain relief
              IM         4         B          Yes          Be alert to decreased efficacy. Consider dose increase. If
                                                           response is still inadequate, select alternative drug (NOT
                                                           oxycodone or codeine) or be alert to symptoms of insufficient
                                                           pain relief
              UM         3         C          Yes          Reduce dose by 30 % and be alert to ADE (e.g., nausea,
                                                           vomiting, constipation, respiratory depression, confusion,
                                                           urinary retention) or select alternative drug (e.g.,
                                                           acetaminophen, NSAID, morphine; NOT oxycodone or codeine)

ADE adverse drug event
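
How a prescribing system might turn a row of Table 4 into pop-up text can be sketched as follows. The data structure, the function name `alert_text`, and the message layout are hypothetical; only four rows of Table 4 are included, and a "No" in the recommendation column is assumed to mean that no action is required and therefore no alert is shown.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    level_of_evidence: int
    clinical_relevance: str
    interaction: bool
    recommendation: Optional[str]  # None where Table 4 lists "No"


# Excerpt of Table 4, keyed by (drug, CYP2D6 phenotype).
TABLE_4_EXCERPT = {
    ("flecainide", "PM"): Entry(4, "A", True,
        "Reduce dose by 50 %, record ECG, monitor plasma concentration"),
    ("flecainide", "IM"): Entry(3, "A", True,
        "Reduce dose by 25 %, record ECG, monitor plasma concentration"),
    ("paroxetine", "PM"): Entry(4, "A", True, None),
    ("paroxetine", "UM"): Entry(4, "C", True,
        "Insufficient data to allow calculation of dose adjustment. "
        "Select alternative drug (e.g., citalopram, sertraline)"),
}


def alert_text(drug: str, phenotype: str) -> Optional[str]:
    """Return the pop-up text for a drug/phenotype pair, or None if no alert applies."""
    entry = TABLE_4_EXCERPT.get((drug.lower(), phenotype.upper()))
    if entry is None or entry.recommendation is None:
        return None
    return (f"{drug} / CYP2D6 {phenotype.upper()}: {entry.recommendation} "
            f"(level of evidence {entry.level_of_evidence}, "
            f"clinical relevance {entry.clinical_relevance})")
```

A call such as `alert_text("Flecainide", "IM")` returns the 25 % dose-reduction advice together with its codes, whereas `alert_text("Paroxetine", "PM")` returns `None`.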

References

1. Sachse C et al (1997) Cytochrome P450 2D6 variants in a Caucasian population: allele frequencies and phenotypic consequences. Am J Human Genet 60:284
2. Kirchheiner J et al (2003) Effects of polymorphisms in CYP2D6, CYP2C9, and CYP2C19 on trimipramine pharmacokinetics. J Clin Psychopharmacol 23:459
3. Brockmöller J et al (2002) The impact of the CYP2D6 polymorphism on haloperidol pharmacokinetics and on the outcome of haloperidol treatment. Clin Pharmacol Ther 72:438
4. Swen J et al (2008) Pharmacogenetics: from bench to byte. Clin Pharmacol Ther 83:781
5. Kirchheiner J et al (2004) Pharmacogenetics of antidepressants and antipsychotics: the contribution of allelic variations to the phenotype of drug response. Mol Psychiatry 9:442
6. Swen J et al (2011) Pharmacogenetics: from bench to byte—an update of guidelines. Clin Pharmacol Ther 89:662
Federico Innocenti and Ron H.N. van Schaik (eds.), Pharmacogenomics: Methods and Protocols,
Methods in Molecular Biology, vol. 1015, DOI 10.1007/978-1-62703-435-7, © Springer Science+Business Media, LLC 2013


