Statistical analysis of genome-wide association (GWAS) data

Jim Stankovich
Menzies Research Institute
University of Tasmania
J.Stankovich@utas.edu.au
Outline

• Introduction
• Confounding variables and linkage disequilibrium
• Statistical methods to test for association in case-control GWA studies
– Allele counting chi-square test
– Logistic regression
• Multiple testing and power
• Example: GWAS for multiple sclerosis (MS)
– Data cleaning / quality control
– Results
GWA studies have been very successful since 2007

• Prior to the advent of GWA studies, there was very little success in
identifying genetic risk factors for complex multifactorial diseases
• GWA studies have identified over 200 separate associations with
various complex diseases in the past two years
• “Human Genetic Variation” hailed as “Breakthrough of the Year” by
Science magazine in 2007

[Figure: timeline, 2000-2009: Human genome project, The SNP consortium, The International HapMap Project, SNP genotyping arrays, GWA studies]

This talk: case-control GWA studies

• Obtain DNA from people with disease of interest (cases) and unaffected controls
• Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls
• Identify SNPs where one allele is significantly more common in cases than controls
– The SNP is associated with disease
• Alternative strategy (Peter Visscher's talk): test for association between SNPs and a quantitative trait that underlies the disease (endophenotype)

[Figure: SNP rs12425791, the endophenotype blood pressure, and stroke]

Association does not imply causation

• Suppose that genotypes at a particular SNP are significantly associated with disease
• This may be because the SNP is associated with some other factor (a confounder), which is associated with disease but is not in the same causal pathway

[Figure: a SNP near the lactase gene is associated with Northern European ancestry, which is associated with multiple sclerosis (MS)]

• Possible confounders of genetic associations:
– Ethnic ancestry
– Genotyping batch, genotyping centre
– DNA quality
• By contrast, an association can also arise through an environmental exposure in the same causal pathway:
– Nicotine receptors --> smoking --> lung cancer
Hung et al, Nature 452: 633 (2008) + other articles in same issue
– Alcohol dehydrogenase genes --> alcohol consumption --> throat cancer
Hashibe et al, Nature Genetics 40: 707 (2008)
Helpful confounding: linkage disequilibrium

Linkage disequilibrium (LD) is the non-independence of alleles at nearby markers in a population because of a lack of recombination between the markers

[Figure: chromosomes 50,000 years ago and today, with a ~50 kb "haplotype block" of alleles still inherited together]
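The slide defines LD qualitatively. Two common ways to quantify it for a pair of SNPs are D′ and r². The talk does not mention either measure, so the sketch below is supplementary. It assumes the four two-locus haplotype frequencies are already known; in practice they have to be estimated from unphased genotypes. The function name and the example frequencies are hypothetical.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) for two biallelic SNPs.

    Arguments are the frequencies of haplotypes AB, Ab, aB, ab (summing to 1),
    where A/a are the alleles at SNP 1 and B/b the alleles at SNP 2.
    Assumes both SNPs are polymorphic (allele frequencies strictly between 0 and 1).
    """
    p_A = p_AB + p_Ab                 # frequency of allele A at SNP 1
    p_B = p_AB + p_aB                 # frequency of allele B at SNP 2
    D = p_AB - p_A * p_B              # zero when alleles at the two SNPs are independent
    # D' rescales D by its largest possible magnitude given the allele frequencies
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0
    # r^2: squared correlation between the allele indicators on a haplotype
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


# Hypothetical examples
print(ld_stats(0.3, 0.0, 0.0, 0.7))      # complete LD: only AB and ab haplotypes exist
print(ld_stats(0.06, 0.24, 0.14, 0.56))  # no LD: each haplotype frequency = product of allele frequencies
```

High r² between a genotyped SNP and an ungenotyped functional SNP is what makes the indirect association testing described on the next slide work.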

Direct and indirect association testing
Hirschhorn and Daly: Nature Reviews Genetics 6: 95 (2005)

[Figure, two panels]
• Direct: the functional SNP is genotyped and an association is found
• Indirect: the functional SNP (blue) is not genotyped, but a number of other SNPs (red), in LD with the functional SNP, are genotyped, and an association is found for these SNPs
LD is helpful, because not all SNPs have to be genotyped

Pe'er et al: Nature Genetics 38: 663 (2006)

Allele counting to test for association between SNP genotype and case / control status

Genotype counts:
            GG    GT    TT    Total
Cases       r0    r1    r2    R
Controls    s0    s1    s2    S
Total       n0    n1    n2    N

Observed allele counts:
            G            T            Total
Cases       2r0 + r1     r1 + 2r2     2R
Controls    2s0 + s1     s1 + 2s2     2S
Total       2n0 + n1     n1 + 2n2     2N

Expected allele counts (if allele frequencies are the same in cases and controls):
            G                     T
Cases       2R(2n0 + n1)/(2N)     2R(n1 + 2n2)/(2N)
Controls    2S(2n0 + n1)/(2N)     2S(n1 + 2n2)/(2N)

Chi-square test of the null hypothesis that rows and columns are independent, summing over the four cells of the allele count table:

$$\sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}} \sim \chi^2 \text{ with 1 df}$$

PLINK: --assoc option. Other options (e.g. dominant/recessive models): --model
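The following minimal Python sketch implements the allele counting test, working from the genotype counts (r0, r1, r2) and (s0, s1, s2) in the table above. The function name is illustrative and no continuity correction is applied. It demonstrates the calculation only and is not a replacement for PLINK --assoc.

```python
import math

def allelic_chisq(case_counts, control_counts):
    """Allele counting chi-square test.

    case_counts = (r0, r1, r2) and control_counts = (s0, s1, s2) are
    genotype counts for GG, GT, TT. Returns (statistic, p-value, 1 df).
    """
    r0, r1, r2 = case_counts
    s0, s1, s2 = control_counts
    # Observed allele counts: every person contributes two alleles
    obs = [[2 * r0 + r1, r1 + 2 * r2],    # cases:    G, T
           [2 * s0 + s1, s1 + 2 * s2]]    # controls: G, T
    row_tot = [sum(row) for row in obs]                    # 2R, 2S
    col_tot = [obs[0][j] + obs[1][j] for j in range(2)]    # 2n0 + n1, n1 + 2n2
    total = sum(row_tot)                                   # 2N
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / total
            x2 += (obs[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: Pr(|Z| > sqrt(x2)) for standard normal Z
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


# Hypothetical counts: 30/50/20 GG/GT/TT cases vs 20/50/30 controls
print(allelic_chisq((30, 50, 20), (20, 50, 30)))   # about (4.0, 0.0455)
```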
The odds ratio: a measure of effect size

Odds of an event occurring = Pr(event occurs) / Pr(event doesn't occur) = Pr(event occurs) / [1 − Pr(event occurs)]

Allele counts:
            G    T
Cases       a    b
Controls    c    d

Consider all the G alleles in the sample, and pick one at random. The odds that the G allele occurs in a case: a/c

Consider all the T alleles in the sample, and pick one at random. The odds that the T allele occurs in a case: b/d

$$\text{odds ratio} = \frac{\text{odds that a G allele occurs in a case}}{\text{odds that a T allele occurs in a case}} = \frac{a/c}{b/d} = \frac{ad}{bc}$$
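Here is a short Python check of the formula. It computes the odds ratio as the ratio of the two "pick an allele at random" odds and confirms that this equals ad/(bc). The counts are hypothetical.

```python
def allelic_odds_ratio(a, b, c, d):
    """Odds ratio for the G allele from the 2x2 allele count table.

    a, b: G and T allele counts in cases; c, d: G and T allele counts in controls.
    """
    odds_g_in_case = a / c      # a random G allele: odds that it comes from a case
    odds_t_in_case = b / d      # a random T allele: odds that it comes from a case
    return odds_g_in_case / odds_t_in_case   # algebraically equal to (a * d) / (b * c)


# Hypothetical allele counts: cases G=110, T=90; controls G=90, T=110
or_g = allelic_odds_ratio(110, 90, 90, 110)
print(round(or_g, 3))                          # 1.494
print(round((110 * 110) / (90 * 90), 3))       # same value via ad/bc
```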
Interpretation of the odds ratio

Using the allele counts a, b (cases) and c, d (controls) from the table above, OR = ad/(bc).

OR = factor by which the odds of being a case are multiplied for each additional G allele

OR = 1: no association between genotype and disease
OR > 1: G allele increases risk of disease
OR < 1: T allele increases risk of disease

If the disease is rare (e.g. prevalence ~0.1% for MS), the odds ratio is roughly equal to the genotype relative risk (GRR): the factor by which the risk of disease is multiplied for each additional G allele

e.g. if OR = 1.2:
Pr(MS | TT) = 0.1%, Pr(MS | GT) = 0.12%, Pr(MS | GG) = 0.144%
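The rare-disease approximation can be checked numerically. The sketch below assumes that each G allele multiplies the odds (not the risk) by the OR. It converts the resulting odds back to risks and compares a rare disease (0.1% baseline, as in the MS example) with a hypothetical common disease (20% baseline).

```python
def genotype_risks(baseline_risk, odds_ratio):
    """Risk of disease for TT, GT, GG when each G allele multiplies the odds by odds_ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    risks = []
    for n_g in range(3):                     # number of G alleles: TT = 0, GT = 1, GG = 2
        odds = baseline_odds * odds_ratio ** n_g
        risks.append(odds / (1 + odds))      # convert odds back to a probability
    return risks


rare = genotype_risks(0.001, 1.2)   # close to 0.1%, 0.12%, 0.144%: OR is approximately the GRR
common = genotype_risks(0.2, 1.2)   # about 0.2, 0.231, 0.265, not 0.2, 0.24, 0.288
print([round(r, 6) for r in rare])
print([round(r, 3) for r in common])
```

For the common disease the risks grow more slowly than a relative risk of 1.2 per allele would imply, so the OR overstates the GRR.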
Logistic regression: more flexible analysis for GWA studies
• Similar to linear regression, but used for binary outcomes instead of continuous outcomes
• Let $Y_i$ be the phenotype for individual $i$:
$Y_i = 0$ for controls
$Y_i = 1$ for cases
• Let $X_i$ be the genotype of individual $i$ at a particular SNP (the number of G alleles):
TT: $X_i = 0$
GT: $X_i = 1$
GG: $X_i = 2$
• Basic logistic regression model
Let $p_i = E(Y_i \mid X_i) = \Pr(Y_i = 1 \mid X_i)$, the expected value of the phenotype given the genotype
Define $\operatorname{logit}(p_i) = \log_e[p_i / (1 - p_i)]$

$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 X_i$$

Test whether $\beta_1$ differs significantly from zero: roughly equivalent to the allele counting chi-square test

Estimate of odds ratio: $\exp(\beta_1)$
• Add extra terms to adjust for potential confounders, e.g. ethnicity, genotyping batch, genotypes at other SNPs (covariates $C_i$, $D_i$, …):
Let $p_i = E(Y_i \mid X_i, C_i, D_i, \ldots)$

$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 X_i + \beta_2 C_i + \beta_3 D_i + \cdots$$

PLINK: --logistic
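To make the model concrete, here is a minimal maximum-likelihood fit by Newton-Raphson in Python/numpy. It illustrates the method and is not PLINK's implementation. The data are hypothetical grouped counts (one row per genotype and status, with frequency weights). They were chosen so that each extra G allele multiplies the odds of being a case by exactly 1.5, so the fitted exp(β1) should come out at 1.5.

```python
import math
import numpy as np

def fit_logistic(X, y, w):
    """Logistic regression by Newton-Raphson.

    X: (n, k) design matrix whose first column is all ones (intercept beta0).
    y: (n,) outcomes, 1 = case, 0 = control.
    w: (n,) frequency weights (number of individuals sharing each row).
    Returns (coefficients, standard errors).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(50):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))             # fitted Pr(Y = 1 | X)
        score = X.T @ (w * (y - p))                        # gradient of the log-likelihood
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))            # from information near the MLE
    return beta, se


# Hypothetical grouped data: genotype (number of G alleles), status, and count
x = np.array([0, 1, 2, 0, 1, 2], dtype=float)              # TT, GT, GG
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)              # controls, then cases
w = np.array([1000, 400, 160, 100, 60, 36], dtype=float)

X = np.column_stack([np.ones_like(x), x])                   # columns: intercept, X_i
beta, se = fit_logistic(X, y, w)
odds_ratio = math.exp(beta[1])                              # estimate of the per-allele OR
z = beta[1] / se[1]
p_value = math.erfc(abs(z) / math.sqrt(2))                  # two-sided Wald test of beta1 = 0
print(f"OR = {odds_ratio:.3f}, p = {p_value:.2g}")
```

Adjusting for confounders as in the second model above would simply mean adding columns (e.g. batch indicators or ancestry scores) to X.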
Multiple testing
• Suppose you test 500,000 SNPs for association with disease
• Even if no SNP is associated, expect around 500,000 × 0.05 = 25,000 to have p-values less than 0.05 by chance
• A more appropriate significance threshold (Bonferroni correction): p = 0.05 / 500,000 = 10^-7, "genome-wide significance"
• In our MS GWAS we considered SNPs for follow-up if they had p-values less than 0.001
• Detecting an association at a smaller p-value threshold requires a larger study
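The arithmetic above can be illustrated by simulation. The sketch assumes that no SNP is truly associated and that the 500,000 tests are independent, so every p-value is uniform on (0, 1). Real SNPs in LD are correlated, which makes the Bonferroni threshold somewhat conservative.

```python
import numpy as np

n_snps = 500_000
alpha = 0.05
bonferroni = alpha / n_snps                      # genome-wide threshold, 1e-7

# Under the global null hypothesis every p-value is Uniform(0, 1)
rng = np.random.default_rng(1)
null_p = rng.uniform(size=n_snps)
n_nominal = int((null_p < alpha).sum())          # close to 500,000 x 0.05 = 25,000
n_genome_wide = int((null_p < bonferroni).sum()) # expected count is only 0.05
print(bonferroni, n_nominal, n_genome_wide)
```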
The power to detect an association
• Suppose the G allele of a SNP has frequency 0.2. If each additional G allele multiplies the odds of disease by 1.2, and 1618 cases and 3413 controls are genotyped, what is the power (chance) of detecting an association with significance p < 0.001?

[Figure: distributions of the observed OR under the null hypothesis (true OR = 1) and the alternative hypothesis (true OR = 1.2), with the p = 0.001 cutoff marked on the odds ratio axis]

Power = blue shaded area (the part of the alternative distribution beyond the p = 0.001 cutoff) = 59%

Effect of increasing sample size
• Observed OR tends to be closer to true OR (narrower distributions)
⇒ Null and alternative distributions become more separate
⇒ Power increases

[Figure: the same null (true OR = 1) and alternative (true OR = 1.2) distributions, now narrower; power is now greater]
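The talk does not say how its 59% figure was computed. A common approximation gives a similar answer. That approximation treats the log of the allelic OR as normally distributed, uses the Woolf standard error computed from expected allele counts, and assumes that 0.2 is the G allele frequency in controls. With these assumptions the sketch below gives roughly 60% for the example above, and more power when the sample size is doubled.

```python
import math
from statistics import NormalDist

def allelic_power(freq_controls, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic test at significance level alpha.

    Assumes freq_controls is the G allele frequency in controls and that each
    G allele multiplies the odds by odds_ratio (normal approximation on log OR).
    """
    odds_controls = freq_controls / (1 - freq_controls)
    odds_cases = odds_controls * odds_ratio
    freq_cases = odds_cases / (1 + odds_cases)           # implied G frequency in cases
    # Expected allele counts: two alleles per person
    a = 2 * n_cases * freq_cases
    b = 2 * n_cases * (1 - freq_cases)
    c = 2 * n_controls * freq_controls
    d = 2 * n_controls * (1 - freq_controls)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)          # standard error of log(OR)
    z_true = math.log(odds_ratio) / se                     # centre of the alternative distribution
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)             # cutoff corresponding to p = alpha
    # Chance that the observed z falls beyond the cutoff in either direction
    return std_normal.cdf(z_true - z_crit) + std_normal.cdf(-z_true - z_crit)


power = allelic_power(0.2, 1.2, 1618, 3413, 0.001)
power_doubled = allelic_power(0.2, 1.2, 2 * 1618, 2 * 3413, 0.001)
print(f"{power:.2f} {power_doubled:.2f}")
```

Doubling the sample shrinks the standard error by a factor of √2, which narrows both distributions in the figure.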

Multiple sclerosis - degradation of myelin sheath around nerve fibres

www.msif.org/en/about_ms/demyelination.html
Multiple sclerosis
• neurodegenerative disease
• autoimmune attack on myelin sheaths around nerve cells
• more females affected than males (3:1)
• average age-at-onset ~30
• ~16,000 people with MS in Australia ($2 billion p.a.)
• no cure
Risk factors
• Epstein-Barr virus
• Exposure to infant siblings (Ponsonby et al, JAMA, 2005)
• Latitude gradient, childhood sun exposure
(van der Mei et al, Lancet, 2003)
• Only genetic risk factor known before 2007 (first GWAS): HLA-DRB1*1501, discovered in 1972 (present in 60% of MS cases and 30% of controls)

[Figure: timeline, 2000-2009 (Human genome project, The SNP consortium, The International HapMap Project, SNP genotyping arrays, GWA studies), with the MS risk genes found by GWA studies: IL7R, CD58, IL2RA, EVI5/RPL5, CLEC16A, CD226, KIF1B, TYK2]

Australian and New Zealand MS GWAS

• Assemble collection of DNA samples (all states + NZ)

• Genotype 1952 MS cases from around Australia and New
Zealand with Illumina 370CNV BeadChips
(Patrick Danoy, Matt Brown, Diamantina Institute, UQ)
• Analyse GWAS data
– Quality control (Devindri Perera, Menzies)
– Impute genotypes at millions of other SNPs
(Sharon Browning, Univ of Auckland)
– Compare case genotypes with >3500 controls from the UK and US
(publicly available data)
• Replication genotyping
(Justin Rubio's lab, Howard Florey Institute, Univ of Melbourne)
Quality control - MS samples (PLINK)
• Start with 1952 samples
• Exclusions (number of samples; PLINK option where used)
– Samples with >2% of SNPs not called: 70 (--mind)
– Suspect batch of samples: 128
– Uncertain phenotype: 10
– Duplicates / relatives: 88 (--genome)
– Ancestry outliers: 35
– Sex discrepancies: 3 (--check-sex)
• Leaves 1618 samples

[Figure: per-sample genotype call rate / no-call rate, highlighting Oragene saliva DNA samples]

Quality control - ethnicity
• Principal components analysis: EIGENSTRAT (Price et al (2006). Nat Genet 38: 904)
• Use an independent set of ~77,000 SNPs (--indep-pairwise)
• 178 outliers removed:
– 35 MS
– 143 controls

[Figure: principal components plot with clusters labelled N European, African, Japanese and Chinese]
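EIGENSTRAT is a separate program. The numpy sketch below is only a simplified illustration of the idea behind it: principal components of standardised genotypes, computed on simulated data from two hypothetical populations. It does not reproduce EIGENSTRAT's outlier-removal rule, and the threshold behind the 178 exclusions is not given here.

```python
import numpy as np

def genotype_pcs(G, n_pcs=2):
    """Top principal-component scores for a (samples x SNPs) matrix of 0/1/2 genotypes.

    Each SNP is centred and divided by sqrt(p(1 - p)), a normalisation similar
    to EIGENSTRAT's; monomorphic SNPs are dropped to avoid dividing by zero.
    """
    G = np.asarray(G, dtype=float)
    mean = G.mean(axis=0)                   # mean allele count per SNP
    p = mean / 2.0                          # allele frequency per SNP
    keep = (p > 0) & (p < 1)
    Z = (G[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1 - p[keep]))
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_pcs] * s[:n_pcs]         # sample scores on the top PCs


# Simulated data: 200 samples from a hypothetical main population plus
# 10 samples from a strongly differentiated hypothetical population
rng = np.random.default_rng(0)
n_snps = 500
freq_main = rng.uniform(0.1, 0.9, n_snps)
freq_other = 1.0 - freq_main
G = np.vstack([rng.binomial(2, freq_main, size=(200, n_snps)),
               rng.binomial(2, freq_other, size=(10, n_snps))])

pc1 = genotype_pcs(G)[:, 0]
gap = abs(pc1[200:].mean() - pc1[:200].mean())   # separation between the two groups
spread = pc1[:200].std()                          # spread within the main group
print(round(float(gap / spread), 1))
```

The sign of each PC is arbitrary. Samples far from the main cluster on the leading PCs would be candidates for exclusion.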

Quality control - SNPs

• Start with 310,504 SNPs in both case and control datasets
• Exclude SNPs:
– Not called in >5% of samples (--geno)
– In Hardy-Weinberg disequilibrium (--hwe)
– Where one allele has frequency < 1% (--maf)
• Leaves 302,098 SNPs
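The three SNP filters can be sketched in Python as follows. The missingness (5%) and allele-frequency (1%) thresholds come from the slide. The Hardy-Weinberg p-value threshold is not given, so `min_hwe_p` below is a hypothetical value. This sketch uses a simple 1-df chi-square test for HWE. PLINK's --hwe filter uses an exact test, so the p-values are not directly interchangeable.

```python
import math

def hwe_chisq_p(n_gg, n_gt, n_tt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_gg + n_gt + n_tt
    p = (2 * n_gg + n_gt) / (2 * n)      # G allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_gg, n_gt, n_tt)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(x2 / 2))


def passes_snp_qc(n_gg, n_gt, n_tt, n_missing,
                  max_missing=0.05, min_maf=0.01, min_hwe_p=1e-6):
    """Apply the three SNP filters; min_hwe_p is a hypothetical threshold."""
    n_called = n_gg + n_gt + n_tt
    if n_called == 0:
        return False
    if n_missing / (n_called + n_missing) > max_missing:   # not called in >5% of samples
        return False
    p = (2 * n_gg + n_gt) / (2 * n_called)
    if min(p, 1 - p) < min_maf:                             # minor allele frequency < 1%
        return False
    return hwe_chisq_p(n_gg, n_gt, n_tt) >= min_hwe_p       # Hardy-Weinberg disequilibrium


print(passes_snp_qc(25, 50, 25, 1))    # in HWE, 1% missing: kept
print(passes_snp_qc(25, 50, 25, 10))   # about 9% missing: excluded
print(passes_snp_qc(50, 0, 50, 0))     # no heterozygotes: fails HWE
```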
Choice of 5% no-call threshold
• We originally planned to use a 10% threshold, but many SNPs with no-call rates of 5-10% showed deviations from Hardy-Weinberg equilibrium

• A closer look at SNPs with no-call rates between 5% and 10% suggested that they were unreliable
GWAS - results

Total sample = 1618 MS cases + 3413 controls

[Figure: genome-wide association results; the HLA region reaches P = 7 × 10^-84; reference lines at P = 10^-7 and P = 0.001]
Extra QC for associated SNPs: cluster plots

[Figure: genotype cluster plots for UK controls, ANZ cases, and both combined]

The replication phase
• Selected 100 SNPs for replication genotyping
• 2,256 ANZ MS cases + 2,310 ANZ controls
• Two chromosome regions, on chr 12 and chr 20, showed (almost) genome-wide significant (p < 5 × 10^-7) association with MS after combining GWAS and replication data
• SNPs in 13/53 other regions had replication p-values < 0.1: more than expected by chance (p = 0.002)
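The talk does not state how p = 0.002 was obtained. One simple check assumes the 53 regions are independent and that, with no true association, each has a 10% chance of a replication p-value below 0.1. Under those assumptions, the binomial tail probability of seeing 13 or more such regions is roughly 0.002, consistent with the slide.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_regions = 53       # other regions followed up in replication
threshold = 0.1      # replication p-value cutoff
observed = 13        # regions with replication p < 0.1
expected = n_regions * threshold          # about 5.3 regions expected by chance
p_excess = binomial_upper_tail(observed, n_regions, threshold)
print(f"expected {expected:.1f}, observed {observed}, p = {p_excess:.4f}")
```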
Chromosome 12 association: the downside of LD

[Figure: regional association plot on chromosome 12, P-value axis marked at 0.0001, 0.00001 and 4.1 × 10^-6]

rs703842
– GWAS: P = 4.1 × 10^-6
– Replication: P = 1.4 × 10^-6
– GWAS + replication: P = 5.4 × 10^-11
– Allele frequency 0.33
– Odds ratio 0.81 (protective)

• The region also contains a KIF5A SNP associated with rheumatoid arthritis and type 1 diabetes
• Logistic regression with both SNPs (rs10876994 and rs703842) in the same model:
– rs10876994 p = 0.004
– rs703842 p = 0.08
• The downside of LD: the associated SNPs are correlated with many nearby variants and genes, so the association alone does not identify the causal gene. CYP27B1: the most likely candidate?
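Fitting both SNPs in one model is how the analysis separates a SNP's own effect from an effect it picks up through LD. The hypothetical sketch below builds expected counts in which SNP 1 is causal (per-allele OR 1.5). SNP 2 has no effect of its own but copies SNP 1's genotype 80% of the time. Fitted alone, SNP 2 appears associated. Fitted jointly, its OR returns to 1. The set-up and numbers are illustrative only and say nothing about which chromosome 12 SNP is causal.

```python
import numpy as np

def fit_logistic(X, y, w):
    """Newton-Raphson logistic regression with frequency weights w; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(50):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


geno_freq = np.array([0.49, 0.42, 0.09])   # 0/1/2 risk alleles (allele frequency 0.3, HWE)
rows = []
for i in range(3):          # SNP 1 genotype (causal)
    for j in range(3):      # SNP 2 genotype (in LD with SNP 1, no effect of its own)
        freq = geno_freq[i] * (0.8 * (i == j) + 0.2 * geno_freq[j])   # joint genotype frequency
        risk = 1.0 / (1.0 + np.exp(1.0 - np.log(1.5) * i))            # logit = -1 + log(1.5) * i
        rows.append((i, j, 1.0, 10000 * freq * risk))                  # expected cases
        rows.append((i, j, 0.0, 10000 * freq * (1 - risk)))            # expected controls
x1, x2, y, w = (np.array(col, dtype=float) for col in zip(*rows))
ones = np.ones_like(y)

beta_snp2 = fit_logistic(np.column_stack([ones, x2]), y, w)        # SNP 2 alone
beta_both = fit_logistic(np.column_stack([ones, x1, x2]), y, w)    # both SNPs together
print("SNP 2 alone: OR =", round(float(np.exp(beta_snp2[1])), 3))
print("Both SNPs: OR1 =", round(float(np.exp(beta_both[1])), 3),
      "OR2 =", round(float(np.exp(beta_both[2])), 3))
```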
CYP27B1
• Cytochrome P450 gene family (drug metabolizing)
• Encodes 25-hydroxyvitamin D-1 alpha hydroxylase (1α-OHase)
• Converts 25(OH)D3 to bioactive 1,25(OH)2D3
• 1,25(OH)2D3 regulates calcium metabolism and the immune system via vitamin D receptor (VDR)

[Figure: vitamin D pathway: 7-dehydrocholesterol, UVB, vitamin D (also from diet), liver, 25(OH)D3; HLA-DRB1*1501 also shown]
Adorini and Penna (2008) Nat Clin Prac Rheum 4: 404-12
The chromosome 20 association

[Figure: regional association plot on chromosome 20, P-value axis marked at 0.0001]

rs6074022
– GWAS: P = 2.5 × 10^-5
– Replication: P = 4.6 × 10^-4
– GWAS + replication: P = 1.3 × 10^-7
– Allele frequency 0.25
– Odds ratio 1.20 (increased risk)
CD40
• Member of TNF receptor superfamily: regulates many cell- and antibody-mediated immune responses
• SNPs in CD40 are associated with risk of rheumatoid arthritis and Graves' disease
• Functional SNP rs1883832C>T, 1 base pair upstream of the ATG translation initiation codon
• Allelic heterogeneity

[Figure: position -1 relative to the ATG start codon; T allele, CD40 protein and MS; C allele, CD40 protein, rheumatoid arthritis (RA) and Graves' disease (GD); www.hapmap.org]
Another use of logistic regression:
test for gene-gene interaction

MS risk alleles: Chr 12 = rs703842A; Chr 20 = rs6074022G

[Figure: odds ratios by chr 12 and chr 20 risk genotype]

Modest evidence that each risk allele has a bigger effect in the presence of
the other risk allele (p = 0.03)
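In logistic regression, a gene-gene interaction is usually tested by adding a product term, so that logit(p) = β0 + β1·x12 + β2·x20 + β3·x12·x20. Here β3 ≠ 0 means that each risk allele's effect on the log-odds depends on the other SNP. The talk does not say exactly which model gave p = 0.03. The sketch below fits this product-term model to hypothetical expected counts built from made-up coefficients and allele frequencies. The fit recovers β3, and the code reports a Wald p-value for it.

```python
import math
import numpy as np

def fit_logistic(X, y, w):
    """Newton-Raphson logistic regression with frequency weights; returns (beta, se)."""
    beta = np.zeros(X.shape[1])
    for _ in range(50):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - p))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def hwe_freqs(p):
    """Frequencies of 0, 1, 2 copies of an allele with frequency p under HWE."""
    return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])


b0, b1, b2, b3 = -1.0, 0.2, 0.18, 0.15        # hypothetical true logit coefficients
g12, g20 = hwe_freqs(0.6), hwe_freqs(0.25)    # hypothetical risk-allele frequencies
rows = []
for i in range(3):              # chr 12 risk alleles
    for j in range(3):          # chr 20 risk alleles
        freq = g12[i] * g20[j]  # SNPs on different chromosomes: genotypes independent
        risk = 1.0 / (1.0 + math.exp(-(b0 + b1 * i + b2 * j + b3 * i * j)))
        rows.append((i, j, 1.0, 5000 * freq * risk))         # expected cases
        rows.append((i, j, 0.0, 5000 * freq * (1 - risk)))   # expected controls
x12, x20, y, w = (np.array(col, dtype=float) for col in zip(*rows))

X = np.column_stack([np.ones_like(y), x12, x20, x12 * x20])  # last column: interaction
beta, se = fit_logistic(X, y, w)
z = beta[3] / se[3]
p_interaction = math.erfc(abs(z) / math.sqrt(2))             # Wald test of beta3 = 0
print(f"interaction OR = {math.exp(beta[3]):.3f}, p = {p_interaction:.2g}")
```

The interaction here is multiplicative on the odds scale. A positive β3 matches the slide's description of each risk allele having a bigger effect in the presence of the other.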
Summary

• Case-control GWA studies have been very successful in the past couple of years
• Linkage disequilibrium means that most, but not all, common human
genetic variation is captured by genotyping a few hundred thousand
SNPs
• Small effect sizes (e.g. OR 1.2) mean that GWA studies need to be
large, with thousands of cases and controls --> big collaborations
• Methods of statistical analysis are fairly straightforward, but care is
required to clean data
• The ultimate test of any association: replication in an independent
population
Acknowledgments - MS GWAS
Hobart: Devindri Perera
Sydney: Graeme Stewart Adelaide: Mark Slee
Bruce Taylor David Booth
Karen Drysdale Robert Heard
Preethi Guru Jim Wiley
Brendan McMorran New Sharon Browning
Simon Foote Zealand: Brian Browning
Gold Coast: Simon Broadley
Deborah Mason
Lyn Griffiths
Melbourne: Justin Rubio Ernie Willoughby
Lotfi Tajouri
Melanie Bahlo Glynnis Clarke
Michael Pender
Helmut Butzkueven Ruth McCallum
Vicky Perreau Marilyn Merriman
Laura Johnson Brisbane: Matthew Brown Tony Merriman
Judith Field Patrick Danoy
Cathy Jensen Johanna Hadler
Ella Wilkins Karen Pryce
Caron Chapman Peter Csurshes
Mark Marriott Judith Greer
Niall Tubridy
Trevor Kilpatrick
Perth: Bill Carroll
Newcastle: Jeanette Lechner-Scott Alan Kermode
Rodney Scott
Pablo Moscato
Mathew Cox
The Australian and NZ MS Genetics Consortium (2009). Nat Genet 41: 824

Funding: MS Research Australia, John T Reid Charitable Trusts, Trish MS Research Foundation, Australian Research Council
