Key Points
-
Genome-wide association studies are rapidly becoming feasible as an approach for identifying the genes that underlie common diseases and related quantitative traits. This strategy combines a comprehensive and unbiased survey of the genome with the power to detect common alleles with modest phenotypic effects.
-
Sets of markers for genome-wide association studies can be chosen using various criteria, but the degree to which a particular marker set actually surveys the genome should be evaluated if the label “genome-wide association” is to be applied. Empirical assessments of linkage disequilibrium patterns, such as those that are being performed in the HapMap project, will enable the selection of efficient sets of markers and the evaluation of the comprehensiveness of a given marker set.
-
Study design and interpretation of results must include appropriate statistical thresholds that take multiple-hypothesis testing into account, as can be achieved, for example, by permutation testing. Balancing the need for power to detect modest effects with the cost of genotyping large numbers of markers will probably require a multi-stage design.
-
False-positive results that arise due to population stratification might outnumber true associations, and population stratification should be assessed and corrected for, if needed. Alternatively, family-based designs can be used, but high-quality data are needed to avoid artifacts that are specific to these designs.
-
Gene–gene and gene–environment interactions might be common in complex traits, but unbounded searches for such interactions are unlikely to retain adequate power in studies of hundreds of thousands of markers. Either new methods will be required, or, alternatively, markers with individual effects will need to be identified first, followed by focused searches for interactions.
-
Genome-wide association studies are likely to become a reality in the near future. Care will be required in their design, performance, analysis and interpretation, and well-conceived pilot studies might be valuable for understanding and minimizing the pitfalls of this approach. Nevertheless, genome-wide association studies have the potential to identify many genes for common diseases and quantitative traits.
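The permutation testing mentioned in the Key Points as one way to set thresholds under multiple-hypothesis testing can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the per-marker statistic (absolute difference in mean allele count between cases and controls) and the max-statistic adjustment are all assumptions chosen for brevity. Genotypes are assumed to be coded as allele counts (0, 1 or 2) and phenotypes as 1 (case) or 0 (control), with at least one individual in each group.

```python
import random


def marker_stats(genotypes, labels):
    """Per-marker statistic (illustrative choice): absolute difference in
    mean allele count between cases (label 1) and controls (label 0)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    stats = []
    for j in range(len(genotypes[0])):
        mean_case = sum(g[j] for g in cases) / len(cases)
        mean_ctrl = sum(g[j] for g in controls) / len(controls)
        stats.append(abs(mean_case - mean_ctrl))
    return stats


def max_stat_permutation_pvalues(genotypes, labels, n_perm=1000, seed=0):
    """Experiment-wide p-values from a max-statistic permutation test.

    Case/control labels are shuffled n_perm times. For each shuffle the
    largest statistic across all markers is recorded, which preserves the
    correlation (linkage disequilibrium) between markers. A marker's
    adjusted p-value is the fraction of shuffles whose maximum is at
    least as large as that marker's observed statistic.
    """
    rng = random.Random(seed)
    observed = marker_stats(genotypes, labels)
    shuffled = list(labels)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max(marker_stats(genotypes, shuffled)))
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return [(1 + sum(m >= s for m in null_max)) / (1 + n_perm) for s in observed]
```

Because each shuffle records the maximum over all markers, the resulting p-values are already corrected for the number of markers tested, without assuming that the markers are independent. A real genome-wide study would use a standard association statistic and far more markers and permutations than this sketch handles comfortably.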
Abstract
Genetic factors strongly affect susceptibility to common diseases and also influence disease-related quantitative traits. Identifying the relevant genes has been difficult, in part because each causal gene only makes a small contribution to overall heritability. Genetic association studies offer a potentially powerful approach for mapping causal genes with modest effects, but are limited because only a small number of genes can be studied at a time. Genome-wide association studies will soon become possible, and could open new frontiers in our understanding and treatment of disease. However, the execution and analysis of such studies will require great care.
Acknowledgements
We thank David Altshuler, Paul DeBakker, Chris Newton-Cheh and Nick Patterson for useful discussions. J.N.H. is the recipient of a Burroughs Wellcome Career Award in Biomedical Science and a Smith Family Foundation New Investigator Award.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- ASSOCIATION STUDY
-
A genetic variant is genotyped in a population for which phenotypic information is available (such as disease occurrence, or a range of different trait values). If a correlation is observed between genotype and phenotype, there is said to be an association between the variant and the disease or trait.
- QUANTITATIVE TRAIT
-
A biological trait that shows continuous variation (such as height) rather than falling into distinct categories (such as diabetic or healthy). The genetic basis of these traits generally involves the effects of multiple genes and gene–environment interactions. Examples of quantitative traits that contribute to disease are body mass index, blood pressure and blood lipid levels.
- CANDIDATE GENE
-
A gene for which there is evidence of its possible role in the trait or disease that is under study.
- LINKAGE MAPPING
-
An approach in which genes are mapped by typing genetic markers in families to identify regions that are associated with disease or trait values within pedigrees more often than is expected by chance. Such linked regions are more likely to contain a causal genetic variant.
- ADMIXTURE MAPPING
-
Predicting the recent ancestry of chromosomal segments across the genome to identify regions for which recent ancestry in a particular population correlates with disease or trait values. Such regions are more likely to contain causal variants that are more common in the ancestral population.
- PENETRANCE
-
The proportion of individuals with a specific genotype who manifest the genotype at the phenotypic level. For example, if all individuals with a specific disease genotype show the disease phenotype, then the genotype is said to be 'completely penetrant'.
- HERITABILITY
-
The proportion of the variation in a given characteristic or state that can be attributed to (additive) genetic factors.
- LINKAGE DISEQUILIBRIUM
-
Correlation between nearby variants such that the alleles at neighbouring markers (observed on the same chromosome) are associated within a population more often than if they were unlinked.
- HAPLOTYPE
-
A sequential set of genetic markers that are present on the same chromosome.
- TAG SNPs
-
Single nucleotide polymorphisms that are correlated with, and therefore can serve as a proxy for, much of the known remaining common variation in a region.
- ASCERTAINMENT BIAS
-
A consequence of collecting a nonrandom subsample with a systematic bias, so that results based on the subsample are not representative of the entire sample.
- MULTIPLE-HYPOTHESIS TESTING
-
Testing more than one hypothesis within an experiment. As a result, the probability of an unusual result from within the entire experiment occurring by chance is higher than the individual p-value associated with that result.
- BONFERRONI CORRECTION
-
The simplest correction of individual p-values for multiple-hypothesis testing: $p_{\text{corrected}} = 1 - (1 - p_{\text{uncorrected}})^{n}$, where $n$ is the number of hypotheses tested. This formula assumes that the hypotheses are all independent, and simplifies to $p_{\text{corrected}} = n\,p_{\text{uncorrected}}$ when $n\,p_{\text{uncorrected}} \ll 1$.
- ODDS RATIO
-
A measure of relative risk that is usually estimated from case-control studies.
- FREQUENTIST
-
A statistical approach for assessing the likelihood that a hypothesis is correct (such as an association being valid), by assessing the strength of the data that supports the hypothesis and the number of hypotheses that are tested.
- BAYESIAN
-
A statistical approach that assesses the probability of a hypothesis being correct (for example, whether an association is valid) by incorporating the prior probability of the hypothesis and the experimental data supporting the hypothesis.
- FOUNDER POPULATIONS
-
Populations that have been derived from a limited pool of individuals within the last 100 or fewer generations.
- ADMIXTURE
-
Combining two or more populations into a single group. This has implications for studies of genotype–disease associations if the component populations have different genotypic distributions.
- DISCORDANT SIB STUDY
-
A family-based association approach that uses only sibs who are phenotypically discordant (that is, different). Like the transmission disequilibrium test, this approach is immune to population stratification.
- TRANSMISSION DISEQUILIBRIUM TEST
-
A family-based test for association that is immune to population stratification. The transmission of alleles from heterozygous parents to affected offspring is compared to the expected 1:1 ratio.
- HARDY–WEINBERG EQUILIBRIUM
-
The binomial distribution of genotypes in a population, such that frequencies of genotypes AA, Aa and aa will be $p^2$, $2pq$ and $q^2$, respectively, where $p$ is the frequency of allele A, and $q$ is the frequency of allele a. Hardy–Weinberg equilibrium applies in a population when there are no factors such as migration or admixture that cause deviations from $p^2$, $2pq$ and $q^2$.
- EPISTASIS
-
In statistical genetics, this term refers to an interaction of multiple genetic variants (usually at different loci) such that the net phenotypic effect of carrying more than one variant differs from what would be predicted by simply combining the effects of each individual variant (mathematically, this means that the gene–gene interaction term is significant).
- MULTIFACTOR-DIMENSIONALITY REDUCTION
-
An approach that attempts to reduce the number of tests required to search for interactions between multiple variables.
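Several of the glossary entries above reduce to short calculations, sketched here under stated assumptions. The function names are hypothetical. The multiple-testing correction follows the glossary formula as written. The transmission disequilibrium statistic uses the common McNemar-type chi-square form, (b − c)²/(b + c), which the glossary does not spell out. The odds ratio is the usual cross-product estimate from a 2×2 case-control table.

```python
import math


def corrected_p(p_uncorrected, n_tests):
    """Glossary formula 1 - (1 - p)^n; assumes the n tests are independent."""
    return 1.0 - (1.0 - p_uncorrected) ** n_tests


def corrected_p_approx(p_uncorrected, n_tests):
    """Approximation n * p, reasonable when n * p << 1 (capped at 1)."""
    return min(1.0, n_tests * p_uncorrected)


def hardy_weinberg_frequencies(p):
    """Expected genotype frequencies p^2, 2pq and q^2 for allele frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT chi-square (1 d.f.): compares how often heterozygous
    parents transmit versus do not transmit an allele to affected offspring
    against the expected 1:1 ratio."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Cross-product odds ratio from a 2x2 case-control table of carrier counts."""
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)
```

For example, with 500,000 independent tests an uncorrected p-value of 1e-7 gives nearly the same corrected value from the exact formula and from the n × p approximation, because n × p is only 0.05. In practice markers in linkage disequilibrium are not independent, so both versions tend to be conservative.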
Cite this article
Hirschhorn, J., Daly, M. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6, 95–108 (2005). https://doi.org/10.1038/nrg1521
DOI: https://doi.org/10.1038/nrg1521