
Commit e69df96

Repository Initialization (#1)
* initializing README
* edit language in README
* add relative link to gestalt
* update relative links in README
* modify README for correct links and instructions
* update README instructions
* add ignore file
* add functionality scripts
  these scripts are copied over from https://github.com/greenelab/tad_pathways and are only slightly modified
* add example files and example results
* add conda environment file
* add example pipelines
* add repo initialization script
* add R sessionInfo
* update gestalt path in readme
* set error to exit bash scripts
* add info to preprint
* add correct bmd evidence files
1 parent 86a678e commit e69df96

37 files changed, 20146 lines added, 0 lines removed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.Rhistory
data/
tad_pathways_data.tar.gz

README.md

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
# TAD_Pathways

## Leveraging TADs to identify candidate genes at GWAS signals

**Gregory P. Way and Casey S. Greene - 2017**

### Summary

This repository contains data and instructions to implement a "TAD_Pathways"
analysis for over 300 different trait/disease GWAS or for custom SNP lists.

TAD_Pathways uses the principles of topologically associating domains (TADs) to
define where an association signal (typically a GWAS signal) can most likely
impact gene function. We use TAD boundaries as defined by
[Dixon et al. 2012](https://doi.org/10.1038/nature11082) and
[hg19 Gencode genes](ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_19/)
to identify which genes may be implicated. We then input this gene list into a
[WebGestalt Pathways Analysis](http://webgestalt.org/) to identify pathways
significantly enriched in the TAD-defined gene set.

For more specific details about the method, refer to our
[preprint](https://doi.org/10.1101/087718 "Determining causal genes from GWAS signals using topologically associating domains").
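
The core lookup described above (find the TAD that contains a signal, then collect the genes that fall in that TAD) can be sketched in a few lines of Python. This is an illustration only, not the repository's implementation: the TAD intervals and gene coordinates below are made-up placeholders (the real analysis uses the Dixon et al. 2012 boundaries and hg19 Gencode genes), and the sketch assumes a gene counts as implicated if it overlaps the TAD at all, which may differ from the rule used to build the provided index files.

```python
from bisect import bisect_right

# Placeholder TAD intervals on a single chromosome as half-open [start, end),
# sorted by start. Real boundaries come from the Dixon et al. 2012 TAD calls.
TADS = [(0, 1000000), (1000000, 2500000), (2500000, 4000000)]
TAD_STARTS = [start for start, _ in TADS]

# Placeholder gene coordinates: name -> (start, end) on the same chromosome.
GENES = {
    "GENE_A": (200000, 260000),
    "GENE_B": (1100000, 1300000),
    "GENE_C": (2400000, 2700000),  # crosses the boundary at 2,500,000
    "GENE_D": (3000000, 3050000),
}


def tad_for_position(pos):
    """Return the TAD (start, end) containing pos, or None if pos is in no TAD."""
    i = bisect_right(TAD_STARTS, pos) - 1
    if i >= 0 and TADS[i][0] <= pos < TADS[i][1]:
        return TADS[i]
    return None


def genes_in_signal_tad(snp_pos):
    """Return sorted names of genes overlapping the TAD that contains snp_pos."""
    tad = tad_for_position(snp_pos)
    if tad is None:
        return []
    tad_start, tad_end = tad
    return sorted(
        name
        for name, (gene_start, gene_end) in GENES.items()
        # Assumption: any overlap with the TAD implicates the gene.
        if gene_start < tad_end and gene_end > tad_start
    )


print(genes_in_signal_tad(1500000))
```

Note that a gene crossing a boundary (here the placeholder `GENE_C`) is implicated by signals in either neighbouring TAD under this overlap rule.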

### Setup

Before you begin, download the necessary TAD-based index files and GWAS
curation files, and set up the Python environment:

```bash
bash initialize.sh

source activate tad_pathways
```

### Examples

We provide three example TAD_Pathways analysis pipelines. To run each
analysis:

```bash
# Example using Bone Mineral Density GWAS
bash example_pipeline_bmd.sh

# Example using Type 2 Diabetes GWAS
bash example_pipeline_t2d.sh

# Example using custom input SNPs
bash example_pipeline_custom.sh
```

### General Usage

There are two ways to implement a TAD_Pathways analysis:

1. GWAS
2. Custom

#### GWAS

Browse the `data/gwas_tad_genes/` directory to select a GWAS file. Each file in
this directory is a tab-separated text file with information about each gene
located within a signal TAD. The `gene_name` column holds the complete list of
implicated genes. For details on how these lists were constructed, refer to
https://github.com/greenelab/tad_pathways.

Input this gene list directly into a
[WebGestalt Pathway Analysis](http://webgestalt.org/) and skip to the
[WebGestalt step](#webgestalt-pathway-analysis).
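
If you prefer to extract the `gene_name` column programmatically rather than by hand, a minimal sketch using only the Python standard library follows. It assumes only what is stated above (the file is tab-separated and has a `gene_name` header); the `snp` column and all values in the usage example are hypothetical placeholders.

```python
import csv
import io


def unique_gene_names(handle):
    """Return unique non-empty values of the `gene_name` column, in file order.

    `handle` is any open text file (or file-like object) holding
    tab-separated data with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if not reader.fieldnames or "gene_name" not in reader.fieldnames:
        raise ValueError("input has no 'gene_name' column")
    names = ((row["gene_name"] or "").strip() for row in reader)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(name for name in names if name))


# Usage with an in-memory stand-in for a file in data/gwas_tad_genes/;
# the `snp` column and all values are hypothetical.
example = io.StringIO("snp\tgene_name\nrs1\tGENE_A\nrs1\tGENE_B\nrs2\tGENE_A\n")
print("\n".join(unique_gene_names(example)))
```

To use it on a real file, pass an open handle to one of the files in `data/gwas_tad_genes/` and paste the printed names, one per line, into WebGestalt.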

#### Custom

Create a comma-separated file in which the first row of each column names a
group and the rows below it list that group's SNPs (rsIDs). The file can contain
many columns, and the columns can differ in length.

For example (the bundled `custom_example.csv` uses this layout with a single
`prostate_cancer` column):

| Group 1 | Group 2 |
| ------- | ------- |
| rs12345 | rs67891 |
| rs19876 | rs54321 |
| ... | ... |
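
Because columns can have different lengths, shorter columns end in empty cells. The sketch below (an illustration, not one of the repository's scripts) reads such a file into a mapping from group name to SNP IDs, skipping empty cells; the SNP IDs in the usage example are placeholders.

```python
import csv
import io


def read_snp_groups(handle):
    """Map each header (group name) to the non-empty SNP IDs listed below it."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    groups = {name: [] for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:  # shorter columns leave empty trailing cells
                groups[name].append(cell)
    return groups


# Placeholder SNP IDs; "Group 1" has one more entry than "Group 2".
example = io.StringIO("Group 1,Group 2\nrs12345,rs67891\nrs19876,rs54321\nrs13579,\n")
print(read_snp_groups(example))
```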

Then, perform the following steps:

```bash
# Map custom SNPs to genomic locations
Rscript --vanilla scripts/build_snp_list.R \
  --snp_file "custom_example.csv" \
  --output_file "mapped_results.tsv"

# Build TAD-based gene lists for each group
python scripts/build_custom_TAD_genelist.py \
  --snp_data_file "mapped_results.tsv" \
  --output_file "custom_tad_genelist.tsv"
```

Now skip to the [WebGestalt step](#webgestalt-pathway-analysis).

### WebGestalt Pathway Analysis

Paste either the curated GWAS gene list or one group (column) of the custom gene
list into WebGestalt and use the following parameters:

| Parameter | Input |
| --------- | ----- |
| Select gene ID type | *hsapiens__gene_symbol* |
| Enrichment Analysis | *GO Analysis* |
| GO Slim Classification | *Yes* |
| Reference Set | *hsapiens__genome* |
| Statistical Method | *Hypergeometric* |
| Multiple Test Adjustment | *BH* |
| Significance Level | *Top10* |
| Minimum Number of Genes for a Category | *4* |
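
WebGestalt performs the enrichment test itself, but a small sketch of the two statistical choices in the table may help when interpreting its output: a one-sided hypergeometric test for over-representation of a category in the input gene list, followed by Benjamini-Hochberg (BH) adjustment across categories. This is the textbook form of each method, not WebGestalt's code, and the counts in the usage example are hypothetical. It needs Python 3.8+ for `math.comb`.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set, K: reference genes in the category,
    n: input genes, k: input genes that fall in the category.
    """
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: 20000 reference genes, 150 in one category,
# 60 input genes, 5 of which fall in that category.
p_category = hypergeom_sf(5, 20000, 150, 60)
print(benjamini_hochberg([p_category, 0.2, 0.03]))
```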

Once the analysis is complete, click `Export TSV Only` and save the file as
`gestalt/<INSERT_TRAIT_HERE>_gestalt.tsv`.

### Curation

Clean and tidy the output files and summarize them into lists of candidate
genes. A candidate gene may or may not be the nearest gene to its GWAS signal,
and every candidate requires experimental validation.

```bash
# An example for Bone Mineral Density (see `example_pipeline_bmd.sh` as well)

# Process WebGestalt output saved in `data/gestalt/bmd_gestalt.tsv`
python scripts/parse_gestalt.py --trait 'bmd' --process

# Output evidence tables
python scripts/construct_evidence.py \
  --trait 'bmd' \
  --genelist 'data/gwas_catalog/Bone_mineral_density_hg19.tsv' \
  --pathway 'skeletal system development'

# Summarize evidence
python scripts/assign_evidence_to_TADs.py \
  --evidence 'results/bmd_gene_evidence.csv' \
  --snps 'data/gwas_tad_genes/Bone_mineral_density_hg19_SNPs.tsv' \
  --output_file 'results/BMD_evidence_summary.tsv'

# Output Venn diagram
R --no-save --args 'results/bmd_gene_evidence.csv' \
  'BMD' < scripts/integrative_summary.R
```
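
The evidence step distinguishes candidates that are the nearest gene to a signal from those that are not. The exact rule in `construct_evidence.py` is not shown here, so the sketch below rests on assumptions: it measures base-pair distance from the SNP to each gene body (zero if the SNP lies inside the gene), marks all tied genes as nearest, and searches every supplied gene rather than only the TAD genes, since the nearest gene can sit on the other side of a TAD boundary. Gene names and coordinates are placeholders.

```python
def distance_to_gene(pos, start, end):
    """Base-pair distance from pos to the gene body [start, end]; 0 if inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def label_candidates(snp_pos, all_genes, candidates):
    """Map each candidate gene to True if it is (one of) the nearest genes.

    all_genes: name -> (start, end) for genes near the SNP, not only TAD genes.
    candidates: names of TAD-implicated genes to label.
    """
    if not all_genes:
        raise ValueError("all_genes must not be empty")
    distances = {
        name: distance_to_gene(snp_pos, start, end)
        for name, (start, end) in all_genes.items()
    }
    best = min(distances.values())
    nearest = {name for name, d in distances.items() if d == best}
    return {name: name in nearest for name in candidates}


# Placeholder genes and coordinates on one chromosome.
genes = {
    "GENE_A": (200000, 260000),
    "GENE_B": (1100000, 1300000),
    "GENE_C": (2400000, 2700000),
}
print(label_candidates(1500000, genes, ["GENE_B", "GENE_C"]))
```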

### Contact

For questions and bug reports, please file a
[GitHub issue](https://github.com/greenelab/tad_pathways/issues).

For all other inquiries, contact Casey Greene at csgreene@mail.med.upenn.edu or
Struan Grant at grants@email.chop.edu.

custom_example.csv

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
prostate_cancer
rs10009409
rs1016343
rs10187424
rs103294
rs1041449
rs10486567
rs10875943
rs10896449
rs10934853
rs10936632
rs10993994
rs11135910
rs11214775
rs11228565
rs115306967
rs115457135
rs11568818
rs11650494
rs11902236
rs12051443
rs12155172
rs1218582
rs12480328
rs12500426
rs12621278
rs12653946
rs1270884
rs130067
rs1327301
rs13385191
rs1447295
rs1456315
rs1465618
rs1512268
rs16901979
rs17021918
rs17181170
rs17599629
rs17694493
rs1775148
rs1859962
rs188140481
rs1894292
rs1933488
rs1983891
rs2121875
rs2238776
rs2242652
rs2273669
rs2405942
rs2427345
rs2660753
rs2735839
rs2807031
rs3096702
rs3123078
rs339331
rs3771570
rs3850699
rs4242382
rs4245739
rs4430796
rs4713266
rs4844289
rs4962416
rs56232506
rs5759167
rs5919432
rs5945572
rs6062509
rs636291
rs6465657
rs651164
rs6545977
rs6625711
rs6763931
rs684232
rs6869841
rs6983267
rs7127900
rs7130881
rs7141529
rs7153648
rs7210100
rs721048
rs7241993
rs7501939
rs7584330
rs7611694
rs76934034
rs7837688
rs7931342
rs8008270
rs80130819
rs8014671
rs8102476
rs817826
rs9284813
rs9287719
rs9364554
rs9443189
rs9600079

environment.yml

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
name: tad_pathways
dependencies:
  - python=3.5.2
  - pandas=0.18.0
  - numexpr=2.5.2
  - numpy=1.11.1
  - scipy=0.17.1

example_pipeline_bmd.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
#!/bin/bash

set -o errexit

# Example of a TAD_Pathways analysis applied to Bone Mineral Density GWAS

# After saving the WebGestalt TSV file, parse its contents
python scripts/parse_gestalt.py --trait 'bmd'

# Construct an evidence file recording whether each candidate is the nearest gene to the GWAS signal
python scripts/construct_evidence.py \
  --trait 'bmd' \
  --gwas 'data/gwas_catalog/Bone_mineral_density_hg19.tsv' \
  --pathway 'skeletal system development'

# Summarize the evidence file
python scripts/summarize_evidence.py \
  --evidence 'results/bmd_gene_evidence.csv' \
  --snps 'data/gwas_tad_snps/Bone_mineral_density_hg19_SNPs.tsv' \
  --output_file 'results/bmd_gene_evidence_summary.tsv'

# Visualize overlap in TAD pathways curation
R --no-save --args 'results/bmd_gene_evidence.csv' \
  < scripts/integrative_summary.R

example_pipeline_custom.sh

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
#!/bin/bash

set -o errexit

# Example of a TAD_Pathways analysis applied to a custom SNP list
# For this example, the custom SNP list contains the GWAS findings for
# prostate cancer, supplied as custom input.

# Map SNPs to genomic locations
Rscript --vanilla scripts/build_snp_list.R \
  --snp_file 'custom_example.csv' \
  --output_file 'results/custom_example_location.tsv'

# Build a customized gene list to input into WebGestalt
python scripts/build_custom_tad_genelist.py \
  --snp_data_file 'results/custom_example_location.tsv' \
  --output_file 'results/custom_example_tad_results.tsv'

# After saving the WebGestalt TSV file, parse its contents
python scripts/parse_gestalt.py --trait 'custom'

# Construct an evidence file recording whether each candidate is the nearest gene to the GWAS signal
python scripts/construct_evidence.py \
  --trait 'custom' \
  --gwas 'results/custom_example_tad_results_nearest_gene.tsv' \
  --pathway 'epidermis development,antigen processing and presentation'

# Summarize the evidence file
python scripts/summarize_evidence.py \
  --evidence 'results/custom_gene_evidence.csv' \
  --snps 'results/custom_example_tad_results.tsv' \
  --output_file 'results/custom_gene_evidence_summary.tsv'

# Visualize overlap in TAD pathways curation
R --no-save --args 'results/custom_gene_evidence.csv' \
  < scripts/integrative_summary.R
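
The custom pipeline above passes two GO terms to `--pathway` as a single comma-separated value. One way a script could accept that format is an `argparse` type function that splits on commas, sketched below. This is a hypothetical illustration, not the code in `scripts/construct_evidence.py`; the option names simply mirror the command line above.

```python
import argparse


def comma_list(text):
    """Split a comma-separated option value into trimmed, non-empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_args(argv=None):
    # Hypothetical CLI mirroring the options used in the pipeline above.
    parser = argparse.ArgumentParser(description="Sketch of an evidence-step CLI")
    parser.add_argument("--trait", required=True)
    parser.add_argument("--gwas", required=True)
    parser.add_argument("--pathway", required=True, type=comma_list,
                        help="one or more GO terms separated by commas")
    return parser.parse_args(argv)


args = parse_args([
    "--trait", "custom",
    "--gwas", "results/custom_example_tad_results_nearest_gene.tsv",
    "--pathway", "epidermis development,antigen processing and presentation",
])
print(args.pathway)
```

Splitting on commas means a GO term whose own name contains a comma cannot be passed this way; a repeated option (`action="append"`) would avoid that limitation.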

example_pipeline_t2d.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
#!/bin/bash

set -o errexit

# Example of a TAD_Pathways analysis applied to Type 2 Diabetes GWAS

# After saving the WebGestalt TSV file, parse its contents
python scripts/parse_gestalt.py --trait 't2d'

# Construct an evidence file recording whether each candidate is the nearest gene to the GWAS signal
python scripts/construct_evidence.py \
  --trait 't2d' \
  --gwas 'data/gwas_catalog/Type_2_diabetes_hg19.tsv' \
  --pathway 'peptide hormone secretion'

# Summarize the evidence file
python scripts/summarize_evidence.py \
  --evidence 'results/t2d_gene_evidence.csv' \
  --snps 'data/gwas_tad_snps/Type_2_diabetes_hg19_SNPs.tsv' \
  --output_file 'results/t2d_gene_evidence_summary.tsv'

# Visualize overlap in TAD pathways curation
R --no-save --args 'results/t2d_gene_evidence.csv' \
  < scripts/integrative_summary.R
