Exercises
Email: daforerog@gmail.com
http://users.skynet.be/dforero/index.htm
I have compiled a set of exercises in which you can apply different in silico approaches to
common research problems in genetics and genomics. Applying these tools is expected to improve
the design and analysis of neurogenomics experiments in terms of scope, precision and speed.
All the bioinformatics tools required to solve these exercises are listed on my website:
http://users.skynet.be/dforero/df9.htm
1. Identify the number of haplotype blocks found in the following human genes:
-CREM gene in European population
-GABRA6 gene in African population
-BDNF gene in Asian population
-LMNA gene in African population
-PRNP gene in European population
3. Find the top 10 candidate targets for each one of the following human microRNAs:
-hsa-mir-132
-hsa-mir-134
-hsa-mir-7
-hsa-mir-135b
-hsa-let-7a
5. For each of the following genes, retrieve the human tissue with the highest expression:
-APOE
-CREM
-BDNF
-PRNP
-BACE1
7. Identify the top 10 candidate genes for Alzheimer disease and the top 10 candidate genes
for Parkinson disease, based on meta-analyses of published association studies.
8. Retrieve the list of known genes located in the following human genomic regions:
-9q34.3
-21q21.3
-17p13.1
-11q23.3
-1q23.2
9. Identify the repeat sequences that are present in the following human genomic regions:
-chr17:8279904-8312206
-chr2:86247142-86276108
-chr6:16846682-16869700
-chr1:40858939-40903911
-chr6:163755665-163914884
10. Identify the effects of the following SNPs on transcription factor binding sites:
-rs34706444
-rs12028379
-rs5774713
-rs12239355
-rs17129477
11. Identify the vector sequences that are present in the following DNA fragments:
-acacctttgaggtgaaagagtattcagtgaatatgatggtcatgatgatgtcaccttggatttaaggcattttcttaagatgtgtaaagtatgttcctttagccgccaccgcggtggagctcccagcttttgttcccttta
-tatctgggctttagtttctccatcattacaatgaagagatgtgctatccttttccaccctgttctaaaattgtgtaacttttttttttcttttttgagacatgcacgagtgggttacatcgaactggatctcaacagcggt
-gtagtcaggattctgctgacctgcttacagggcactaaatacctgaggaggcaggagcttgggggaaagctgagaggtatctatccccatctacctactgatggagttccgcgttacataacttacggtaaatggcccgcc
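Vector contamination is normally screened with NCBI VecScreen, which compares a query against the UniVec database. The underlying idea can be sketched as a shared k-mer scan: report the stretches of the query whose k-mers also occur in a vector sequence, on either strand. This is a simplification (exact matches only, no alignment scoring), the vector sequence must be supplied by the user, and the short vector string below is a made-up placeholder, not a real vector.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def vector_hits(query, vector, k=20):
    """Return merged (start, end) query intervals (0-based, end exclusive)
    covered by k-mers that also occur in the vector, on either strand."""
    query, vector = query.upper(), vector.upper()
    rc = revcomp(vector)
    kmers = {s[i:i + k] for s in (vector, rc) for i in range(len(s) - k + 1)}
    intervals = []
    for i in range(len(query) - k + 1):
        if query[i:i + k] in kmers:
            if intervals and i <= intervals[-1][1]:
                # Overlaps or touches the previous hit: extend it
                intervals[-1] = (intervals[-1][0], i + k)
            else:
                intervals.append((i, i + k))
    return intervals


# Hypothetical toy example; replace `vector` with a real vector sequence
vector = "GGGGGAAAAACCCCCTTTTT"
query = "ATATATAAAAACCCCCATATAT"
print(vector_hits(query, vector, k=10))  # [(6, 16)]
```

With real data, k around 20 keeps chance matches rare; a smaller k finds shorter vector remnants at the cost of more false hits.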
12. Identify the top candidate variants in the following human DNA sequence traces.
You will use a file with the chromatograms of 96 subjects sequenced for a 500 bp region.
13. Retrieve the genomic lengths, protein lengths, chromosomal positions and number of
exons for the following genes:
-PLXNA2
-NRG1
-MTHFR
-DTNBP1
-SLC6A4
14. Identify the homologues in mouse and Drosophila of the following human genes:
-SV2A
-PDE4B
-DRD1
-SYT1
-RGS4
15. Design overlapping PCR primers to sequence the following human genomic regions:
-chr1:40,879,177-40,883,673
-chrX:77,256,575-77,258,830
-chr8:26,530,136-26,532,811
-chr4:122,960,094-122,962,212
-chr5:161,054,462-161,056,347
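Before picking individual primers (for example with Primer3), each region can be split into overlapping amplicon windows so that adjacent sequencing reads overlap. A minimal Python sketch, assuming 600 bp amplicons with 100 bp of overlap (illustrative values, not taken from the exercise):

```python
def parse_region(region):
    """Parse 'chr1:40,879,177-40,883,673' into ('chr1', 40879177, 40883673)."""
    chrom, span = region.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start, end


def tile_amplicons(start, end, amplicon_len=600, overlap=100):
    """Split [start, end] (inclusive) into overlapping amplicon windows."""
    step = amplicon_len - overlap
    if step <= 0:
        raise ValueError("overlap must be shorter than the amplicon")
    windows, pos = [], start
    while True:
        win_end = min(pos + amplicon_len - 1, end)
        windows.append((pos, win_end))
        if win_end >= end:
            return windows
        pos += step


chrom, start, end = parse_region("chr1:40,879,177-40,883,673")
for win_start, win_end in tile_amplicons(start, end):
    # Primers would then be picked near the two ends of each window
    print(f"{chrom}:{win_start}-{win_end}")
```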
16. Identify the differentially enriched GO and KEGG terms in the following two lists of human genes:
List 1. GPR51, GRIA2, KIF5C, MBP, MEF2C, NAP1L3, NCDN, NDRG4, NEFL, NRGN,
NTRK2, OLFM1
List 2. AKAP6, BRF1, CCNA2, DST, MACF1, NBEA, RAB11A, RANBP5, SEC8L1, SYNE1,
ZFYVE20, ZNF490
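Web tools such as DAVID report GO and KEGG enrichment, and the statistic underneath is usually a hypergeometric (one-sided Fisher's exact) test. To compare the two lists directly, one option (an assumption here, not prescribed by the exercise) is to treat the 24 genes of both lists as the population and ask whether list 1 holds more of a term's genes than expected by chance. The annotation counts in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric: `draws` genes sampled from a
    `population` in which `successes` genes carry the annotation."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


# Hypothetical counts: a GO term annotated to 6 of the 24 genes,
# all 6 of them in list 1 (12 genes)
p = hypergeom_sf(6, population=24, successes=6, draws=12)
print(f"one-sided P = {p:.4g}")
```

Testing many terms this way calls for a multiple-testing correction, such as the FDR procedure in exercise 26.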
20. Find the genes with their highest expression in the prefrontal cortex (200-fold enrichment
compared with other tissues); repeat the search for the amygdala.
21. Identify the transcripts that are targeted by the following Affymetrix probes:
-204312_x_at
-207630_s_at
-210400_at
-212581_x_at
-201891_s_at
22. Identify the haplotypes present in the following dataset, estimate their frequencies, and
calculate the LD values between SNPs.
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13
subj1 CT AG AC TT AC CT GG CC CT AG CT AA AG
subj2 TT GG AA TT CC CC GG CC CT AG CT AG AG
subj3 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj4 CT AG AA CT CC CC AG CT CT GG CT AG AG
subj5 CT AG AA TT CC CC GG CC CT AG CT AG AG
subj6 CT AG AA TT CC CC GG CC CC GG TT AA AA
subj7 CT AG AA TT CC CC GG CT CT GG CT AG AG
subj8 CC AG AC TT CC CC GG TT TT GG CC GG GG
subj9 TT GG CC TT AA TT GG CT CT GG CT AG AG
subj10 TT GG AA TT CC CC GG CC CT AG CT AG AG
subj11 CT AG AA TT CC CC GG CC CT AG CT AA AG
subj12 CT GG AA TT CC CC GG TT TT GG CC GG GG
subj13 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj14 TT GG AA CT CC CC AG CC CT AG CT AG AG
subj15 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj16 CT AG AA TT CC CC GG CC CT AG CT AG AG
subj17 CT GG AC TT AC CT GG TT TT GG CC GG GG
subj18 CC AA AA TT CC CC GG CC CT AG CT AG AG
subj19 CT AG AA TT CC CC GG CC CC GG TT AA AA
subj20 CC AA AC TT CC CT GG CC CC GG TT AA AA
subj21 CT AG AA TT CC CC GG CC CC GG TT AA AG
subj22 CC AA CC TT AA TT GG CC CC GG TT AA AG
subj23 TT GG AA TT CC CC GG CC CC GG TT AA AG
subj24 TT GG AA TT CC CC GG CC CC GG TT AA AA
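Haplotype frequencies across all 13 SNPs are normally estimated with dedicated software (for example Haploview or PHASE). For a single pair of SNPs, the expectation-maximization (EM) idea can be sketched in Python: double heterozygotes are the only individuals whose phase is unknown, so EM splits them between the two possible phases in proportion to the current haplotype frequencies, and D' and r² follow from the estimated frequencies. The sketch assumes complete, biallelic calls written as in the table (e.g. "CT") and uses a fixed number of iterations instead of a convergence test.

```python
def pairwise_ld(snp1, snp2, iterations=100):
    """EM estimate of two-locus haplotype frequencies, then signed D' and r^2."""
    a = sorted(set("".join(snp1)))[0]  # reference allele at SNP 1
    b = sorted(set("".join(snp2)))[0]  # reference allele at SNP 2
    g1 = [g.count(a) for g in snp1]    # copies of the reference allele
    g2 = [g.count(b) for g in snp2]
    # counts[i][j]: phase-known haplotypes with allele i at SNP 1, j at SNP 2
    # (0 = reference, 1 = other allele)
    counts = [[0, 0], [0, 0]]
    double_het = 0
    for x, y in zip(g1, g2):
        if x == 1 and y == 1:
            double_het += 1  # phase unknown: AB/ab or Ab/aB
            continue
        l1 = [0, 0] if x == 2 else [1, 1] if x == 0 else [0, 1]
        l2 = [0, 0] if y == 2 else [1, 1] if y == 0 else [0, 1]
        for i, j in zip(l1, l2):
            counts[i][j] += 1
    n_hap = 2 * len(g1)
    p = [[0.25, 0.25], [0.25, 0.25]]
    for _ in range(iterations):
        cis, trans = p[0][0] * p[1][1], p[0][1] * p[1][0]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5  # P(AB/ab phase)
        p = [[(counts[0][0] + double_het * w) / n_hap,
              (counts[0][1] + double_het * (1 - w)) / n_hap],
             [(counts[1][0] + double_het * (1 - w)) / n_hap,
              (counts[1][1] + double_het * w) / n_hap]]
    pa, pb = p[0][0] + p[0][1], p[0][0] + p[1][0]
    d = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    if d >= 0:
        d_max = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        d_max = min(pa * pb, (1 - pa) * (1 - pb))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = pa * (1 - pa) * pb * (1 - pb)
    r2 = d * d / denom if denom > 0 else 0.0
    return p, d_prime, r2


# Columns S1 and S2 of the table above (subj1 to subj24)
s1 = "CT TT CT CT CT CT CT CC TT TT CT CT CT TT CT CT CT CC CT CC CT CC TT TT".split()
s2 = "AG GG AG AG AG AG AG AG GG GG AG GG AG GG AG AG GG AA AG AA AG AA GG GG".split()
freqs, d_prime, r2 = pairwise_ld(s1, s2)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```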
23. Identify the predicted functional effects of each one of the following nsSNPs:
-rs28931579
-rs769452
-rs28931577
-rs11542040
-rs11542035
24. Retrieve the genomic sequence for all the exons (including 50 bp of flanking sequence) of
the following genes:
-RGS4
-RIMS3
-RTN1
-SLC1A3
-SNAP25
25. Identify the interacting partners for each one of the following genes:
-MEF2C
-NAP1L3
-NCDN
-NDRG4
-NEFL
26. Identify which of the following P values pass a false discovery rate (FDR) threshold of 0.05.
0.650106935, 0.308093469, 0.463145394, 0.19572116, 0.112681844, 0.493084372, 0.043017213,
0.515230709, 0.098477813, 0.276669253, 0.4536028, 0.927263525, 0.000763073, 0.391324056,
0.381511095, 0.003431856, 0.206671413, 0.354702281, 0.25477432
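The exercise does not name the FDR procedure. Assuming the widely used Benjamini-Hochberg step-up method, the rule is: sort the m P values, find the largest rank k with p(k) <= (k/m) * q, and declare the k smallest P values significant. A minimal Python sketch applied to the values above:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return, in ascending order, the P values declared significant by the
    Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    ranked = sorted(pvalues)
    k_max = 0
    # Largest rank k (1-based) whose P value satisfies p_(k) <= (k / m) * q
    for k, p in enumerate(ranked, start=1):
        if p <= k / m * q:
            k_max = k
    return ranked[:k_max]


pvals = [0.650106935, 0.308093469, 0.463145394, 0.19572116, 0.112681844,
         0.493084372, 0.043017213, 0.515230709, 0.098477813, 0.276669253,
         0.4536028, 0.927263525, 0.000763073, 0.391324056, 0.381511095,
         0.003431856, 0.206671413, 0.354702281, 0.25477432]
print(benjamini_hochberg(pvals))
```

Because it is a step-up rule, a P value can pass even if a smaller-ranked one misses its own threshold, as long as some larger rank qualifies.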
27. Identify the top 10 down-regulated genes in post-mortem schizophrenia brains; repeat the
search for bipolar disorder.
28. Design PCR primers that allow the cloning of the following fragments using the indicated restriction enzymes:
-chrX:77256575-77256975; EcoRI and HindIII
-chr8:26530136-26530636; HindIII and XbaI
-chr6:16846682-16846982; EcoRI and XbaI
-chr1:40858939-40859339; HindIII and EcoRI
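The usual approach is to add the restriction site, plus a few extra "clamp" bases that help the enzyme cut near the end of the product, to the 5' end of each primer. The Python sketch below assumes the fragment sequence has already been retrieved (for example from the UCSC Genome Browser); the 4-nt clamp and 20 nt of annealing sequence are illustrative choices, not rules from the exercise. It also flags enzymes whose site occurs inside the fragment, since those would cut the insert.

```python
# Recognition sequences (5'->3') of the enzymes used in exercise 28
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "XbaI": "TCTAGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cloning_primers(insert, enzyme_5p, enzyme_3p, anneal_len=20, clamp="GCGC"):
    """Return (forward, reverse) primers with a clamp and restriction site
    added to the 5' end; enzyme_5p ends up upstream of the insert."""
    insert = insert.upper()
    fwd = clamp + SITES[enzyme_5p] + insert[:anneal_len]
    rev = clamp + SITES[enzyme_3p] + revcomp(insert[-anneal_len:])
    return fwd, rev


def internal_sites(insert, enzymes):
    """Enzymes whose site also occurs inside the insert (all three sites
    are palindromic, so checking one strand is enough)."""
    return [e for e in enzymes if SITES[e] in insert.upper()]


# Hypothetical 40-bp insert, used only to show the output format
insert = "ATGC" * 10
print(cloning_primers(insert, "EcoRI", "HindIII", anneal_len=8))
# ('GCGCGAATTCATGCATGC', 'GCGCAAGCTTGCATGCAT')
print(internal_sites(insert, ["EcoRI", "HindIII"]))  # []
```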
29. Identify the genomic regions that are amplified using the following PCR primer pairs:
-F-ATGGAGTGGCTAGAAGAGTCAG
R-TGGATCATTTGCGATTTCCAGTT
-F-AGGGCTTCCTTATGTCCTCCA
R-TACCCACGTACCATTAGGAGC
-F-AAAAGCAGGAGTGTGATGACG
R-CGATCCCAAGTGTGTTACTGG
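Online tools such as UCSC In-Silico PCR or NCBI Primer-BLAST search a whole genome for primer pairs. The core idea is an exact-match search: find the forward primer on the template, then the reverse complement of the reverse primer downstream of it. The Python sketch below scans one strand of a user-supplied template only (running it on the reverse complement of the template covers the other strand) and, unlike the real tools, tolerates no mismatches; the template in the example is made up.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, fwd, rev):
    """Return (start, product) for the first exact-match amplicon, or None.

    The forward primer must match the template as given; the reverse
    primer's reverse complement must match downstream of it.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rc_rev = revcomp(rev.upper())
    end = template.find(rc_rev, start + len(fwd))
    if end == -1:
        return None
    return start, template[start:end + len(rc_rev)]


# Hypothetical template and primers
template = "TTTTACGTACGGGGGGCATGCATTTT"
print(in_silico_pcr(template, "ACGTAC", "TGCATG"))  # (4, 'ACGTACGGGGGGCATGCA')
```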
31. Identify the maximum LOD score simulated for the following pedigree:
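For phase-known, fully informative meioses, the two-point LOD score at recombination fraction θ is LOD(θ) = log10[θ^R (1 - θ)^(N - R) / 0.5^N], where R is the number of recombinants among N meioses; it is maximized at θ = R/N, capped at 0.5. The exercise asks for a simulated maximum LOD, which dedicated linkage simulation programs estimate by simulating marker data on the pedigree. The Python sketch below is only the simpler analytic calculation, and the recombinant counts in the example are hypothetical.

```python
from math import log10


def lod(theta, recombinants, meioses):
    """Two-point LOD score for phase-known, fully informative meioses."""
    non_rec = meioses - recombinants
    likelihood = theta ** recombinants * (1 - theta) ** non_rec
    return log10(likelihood / 0.5 ** meioses)


def max_lod(recombinants, meioses):
    """Maximum-likelihood theta (capped at 0.5) and the LOD score there."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod(theta_hat, recombinants, meioses)


# Hypothetical pedigree summary: 1 recombinant among 10 scored meioses
theta_hat, z_max = max_lod(1, 10)
print(f"theta = {theta_hat:.2f}, max LOD = {z_max:.2f}")
```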
32. Identify the nucleotide that is conserved in mouse and rat for the following SNPs:
-rs9817739
-rs1937690
-rs7973772
-rs278151
-rs10128858
35. Identify the number of citations for the papers with the following PMIDs:
-17173049
-16862116
-8895455
-818641
-17571346
36. Identify the predicted network of interactions for the following genes:
CAMK2B, DNER, DNM1, EEF1A2, ELAVL4, GFAP
37. Identify the best predicted drug compound that can modulate the activity of the following
genes:
-CAMK2B
-NTRK2
-VDAC1
-CCNA2
-PDE4B
38. Design PCR primers to differentiate between cDNA and genomic DNA for the following
genes:
-TF
-TU3A
-TUBB4
-UCHL1
-VSNL1
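A common strategy is to place the two primers in different exons, so that the genomic DNA product includes the intervening intron and is longer (or too long to amplify) than the cDNA product; alternatively, one primer can span an exon-exon junction. Given exon coordinates (for example from the UCSC Genome Browser or Ensembl), the expected product sizes can be checked with a short Python sketch. The coordinates below are hypothetical, and primer positions are taken as plus-strand genomic coordinates of each primer's 5' end.

```python
def product_sizes(exons, fwd_pos, rev_pos):
    """Expected PCR product sizes (genomic, cDNA) in bp.

    exons: (start, end) genomic coordinates, 1-based and inclusive.
    fwd_pos / rev_pos: genomic positions of the forward and reverse
    primer 5' ends, with fwd_pos < rev_pos; both must lie in exons.
    """
    def in_exon(pos):
        return any(start <= pos <= end for start, end in exons)

    if not (in_exon(fwd_pos) and in_exon(rev_pos)):
        raise ValueError("both primer ends must lie within exons")
    genomic = rev_pos - fwd_pos + 1
    # cDNA product = exonic bases between the primer ends (introns spliced out)
    cdna = sum(max(0, min(end, rev_pos) - max(start, fwd_pos) + 1)
               for start, end in exons)
    return genomic, cdna


# Hypothetical gene with two exons separated by an 800-bp intron
exons = [(100, 199), (1000, 1099)]
print(product_sizes(exons, fwd_pos=150, rev_pos=1050))  # (901, 101): different exons
print(product_sizes(exons, fwd_pos=110, rev_pos=180))   # (71, 71): same exon, no difference
```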
39. Identify the most suitable journal to publish a hypothetical paper with the following
abstract:
Human memory is a polygenic trait. We performed a genome-wide screen to identify
memory-related gene variants. A genomic locus encoding the brain protein KIBRA was significantly
associated with memory performance in three independent, cognitively normal cohorts from
Switzerland and the United States. Gene expression studies showed that KIBRA was expressed in
memory-related brain structures. Functional magnetic resonance imaging detected KIBRA
allele-dependent differences in hippocampal activations during memory retrieval. Evidence from
these experiments suggests a role for KIBRA in human memory.
40. Identify the significant SNPs in a genome-wide association study and identify possible runs
of homozygosity in the same dataset.
You will download a publicly available dataset with results from about 500,000 SNPs.
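PLINK's --homozyg option is a standard way to call runs of homozygosity genome-wide, using criteria such as minimum run length, SNP density and a tolerated number of heterozygous calls. The basic scan can be sketched in Python as a search for stretches of consecutive homozygous genotypes along one chromosome. This simplified version ignores physical distance, missing calls and tolerated heterozygotes, and the 50-SNP default is an arbitrary illustrative threshold.

```python
def runs_of_homozygosity(genotypes, min_snps=50):
    """Return (first, last) indices of runs of >= min_snps consecutive
    homozygous calls; genotypes are two-letter calls in map order."""
    runs, run_start = [], None
    # A heterozygous sentinel at the end closes any run still open
    for i, call in enumerate(list(genotypes) + ["XY"]):
        if call[0] == call[1]:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_snps:
                runs.append((run_start, i - 1))
            run_start = None
    return runs


# Hypothetical genotype calls for 7 consecutive SNPs
calls = ["AA", "CC", "AG", "TT", "TT", "GG", "CT"]
print(runs_of_homozygosity(calls, min_snps=3))  # [(3, 5)]
```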
41. Identify SNPs that are located in conserved transcription factor binding sites on
chromosome 1; retrieve SNPs that are located in microRNA binding sites on chromosome 2.
42. Identify the Ensembl IDs for the genes listed in exercise 36.
DF, 03-2008
If you use these exercises for teaching purposes, please cite the original source; if you have
comments or suggestions, please do not hesitate to contact me by email.