Computational Biology Lab File
COMPUTATIONAL
BIOLOGY LAB
RECORD
(BT-516)
VARNIT CHAUHAN
17/IBT/042
EXPERIMENT-01
OBJECTIVE
To explore the features of the PubMed and OMIM databases.
ABOUT
PubMed
PubMed is a free online database for searching and retrieving biomedical and life
science literature, with the aim of improving health both globally and personally.
More than 32 million citations and abstracts of biomedical literature are available in the
PubMed database. It does not contain full-text journal articles; however, links to the full
text are often present when it is available from other sources, such as the publisher's
website or PubMed Central (PMC).
PubMed, which has been available to the public online since 1996, was created and is
maintained by the National Center for Biotechnology Information (NCBI) at the U.S.
National Library of Medicine (NLM), part of the National Institutes of Health (NIH).
OMIM
Online Mendelian Inheritance in Man (OMIM) is a publicly accessible, authoritative database of
human genes and genetic phenotypes that is updated regularly. Its full-text, referenced
overviews cover all known mendelian disorders and over 16,000 genes. OMIM focuses on the
relationship between phenotype and genotype.
This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of
mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). Twelve book
editions of MIM were published between 1966 and 1998. The online version, OMIM, was
created in 1985 by a collaboration between the National Library of Medicine and the William
H. Welch Medical Library at Johns Hopkins. It was made generally available on the internet
starting in 1987. In 1995, OMIM was developed for the World Wide Web by NCBI, the
National Center for Biotechnology Information.
QUESTION-1
How many articles do you find for the epidemiology and pathogenesis of coronavirus
disease? Give 5 references.
Answer:
1. Jin Y, Yang H, Ji W, Wu W, Chen S, Zhang W, Duan G. Virology, Epidemiology,
Pathogenesis, and Control of COVID-19. Viruses. 2020 Mar 27;12(4):372. doi:
10.3390/v12040372. PMID: 32230900; PMCID: PMC7232198.
2. Amawi H, Abu Deiab GI, A Aljabali AA, Dua K, Tambuwala MM. COVID-19 pandemic: an
overview of epidemiology, pathogenesis, diagnostics and potential vaccines and
therapeutics. Ther Deliv. 2020 Apr;11(4):245-268. doi: 10.4155/tde-2020-0035. Epub
2020 May 12. PMID: 32397911; PMCID: PMC7222554.
3. Hu B, Guo H, Zhou P, Shi ZL. Characteristics of SARS-CoV-2 and COVID-19. Nat Rev
Microbiol. 2021 Mar;19(3):141-154. doi: 10.1038/s41579-020-00459-7. Epub 2020 Oct
6. PMID: 33024307; PMCID: PMC7537588.
4. Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC. Pathophysiology,
Transmission, Diagnosis, and Treatment of Coronavirus Disease 2019 (COVID-19): A
Review. JAMA. 2020 Aug 25;324(8):782-793. doi: 10.1001/jama.2020.12839. PMID:
32648899.
5. Ahn DG, Shin HJ, Kim MH, Lee S, Kim HS, Myoung J, Kim BT, Kim SJ. Current Status of
Epidemiology, Diagnosis, Therapeutics, and Vaccines for Novel Coronavirus Disease
2019 (COVID-19). J Microbiol Biotechnol. 2020 Mar 28;30(3):313-324. doi:
10.4014/jmb.2003.03011. PMID: 32238757.
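The question also asks for a hit count, which can be obtained programmatically as well as from the web page. The sketch below assumes NCBI's E-utilities `esearch` endpoint with JSON output; the search term and the helper names `build_esearch_url`, `parse_count`, and `pubmed_count` are illustrative choices and not part of the lab protocol. The count returned will change over time as PubMed grows.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the E-utilities esearch service (assumed current at time of writing).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=5):
    """Build an esearch URL that asks PubMed for matching PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def parse_count(response_text):
    """Extract the total hit count and the returned PMIDs from an esearch JSON reply."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"]), result["idlist"]


def pubmed_count(term, retmax=5):
    """Query PubMed over the network (requires internet access)."""
    with urlopen(build_esearch_url(term, retmax)) as handle:
        return parse_count(handle.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical search term; adjust it to match the query used in the lab.
    term = "coronavirus disease AND epidemiology AND pathogenesis"
    print(build_esearch_url(term))
```

Calling `pubmed_count(term)` returns the total number of matching records together with the first few PMIDs, which can then be looked up to build the reference list.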
QUESTION-2
What are the genes associated with alveolar cell carcinoma?
a) Name the genes, give their cytogenetic location, MIM number, and function. Is
alveolar carcinoma caused by a mutation in a single gene or multiple genes? What clinically
relevant information is provided for cystic fibrosis? Also, report the genomic coordinates of
any five genes involved.
Answer:
4) NEOPLASIA- Alveolar cell carcinoma
- Non Small cell lung cancer
- Adenocarcinoma of lung
5) MISCELLANEOUS
- Genes associated with susceptibility to lung cancer, e.g., FASLG (134638.0002), FAS
(134637.0021), CHRNA5 (118505.0001), CHRNA3 (118503.0001)
- Genes associated with protection against lung cancer, e.g., CASP8 (601763.0004),
CYP2A6 (122720.0002)
- Mutations in EGFR (131550) are associated with altered response to treatment with
tyrosine kinase inhibitors
6) MOLECULAR BASIS
EXPERIMENT-02
OBJECTIVE
To explore the features of GenBank, RefSeq, and UniProt.
ABOUT
GenBank
NCBI builds GenBank primarily from the submission of sequence data from authors and
from the bulk submission of expressed sequence tag (EST), genome survey sequence (GSS)
and other high-throughput data from sequencing centers. The U.S. Patent and Trademark
Office (USPTO) also contributes sequences from issued patents. GenBank participates with the
European Molecular Biology Laboratory Nucleotide Sequence Database (EMBL) and the
DNA Databank of Japan (DDBJ) as a partner in the International Nucleotide Sequence
Database Collaboration (INSDC), which exchanges data daily to ensure that a uniform and
comprehensive collection of sequence information is available worldwide. NCBI makes the
GenBank data available at no cost over the Internet, through FTP and a wide range of
web-based retrieval and analysis services.
RefSeq
NCBI’s Reference Sequence (RefSeq) database is a collection of taxonomically diverse,
non-redundant and richly annotated sequences representing naturally occurring molecules
of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses,
archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data
submitted to the International Nucleotide Sequence Database Collaboration (INSDC). Similar
to a review article, a RefSeq is a synthesis of information integrated across multiple sources
at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and
functional information. They are generated to provide reference standards for multiple
purposes ranging from genome annotation to reporting locations of sequence variation in
medical records. The RefSeq collection is available without restriction and can be retrieved
in several different ways, such as by searching or by available links in NCBI resources,
including PubMed, Nucleotide, Protein, Gene, and Map Viewer, searching with a sequence
via BLAST, and downloading from the RefSeq FTP site.
UniProt
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence
and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB),
the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt
consortium and host institutions EMBL-EBI, SIB and PIR are committed to the long-term
preservation of the UniProt databases.
QUESTION-1
Retrieve 5 DNA sequences and corresponding protein sequence each of
a) Histone H4
b) Insulin from 5 different organisms.
Answer:
Information about sequence-
Histone H4
SEQUENCE 1:
1. Accession no.- X54078
2. Title- T. nilotica gene for Histone H4
3. Organism- Oreochromis niloticus
4. Base pairs- 605bp
5. If gene, then gene name- N/A
6. Features: Taxon id, CDS, Protein id- 8128, P62796
7. Sequence in Fasta format/ Genbank format
>X54078.1 T.nilotica gene for histone H4
TTACCAATCGAACTTGGCCTGGTTCAAAAAACCTTACCATGTCTGGAAGAGGTAAAGGCGGCAAAGGACTCGGAAAAGGAGGC
GCCAAGCGTCACCGTAAGGTTCTCCGTGATAACATTCAGGGCATCACCAAACCAGCCATCCGTCGTCTGGCTCGCCGTGGTGG
CGTCAAGCGTATCTCTGGTCTGATCTACGAGGAGACCCGTGGTGTGTTGAAGGTGTTTCTGGAGAACGTCATCCGTGACGCCG
TCACCTACACTGAGCACGCCAAGAGGAAGACCGTGACCGCCATGGATGTGGTGTACGCTCTGAAGAGGCAGGGCCGCACTCTG
TACGGCTTCGGCGGTTAAACTCATGCTCCTTCATCCATCAAACGGCTCTTTTAAGAGCCACACACTTCACTTTAAGGGCTTTG
TTCTGGGGCCTTCTTGTGGGTAGGGGTTGGGTTGTTGTGTGTGGTGGTGTTGTTTTTTTGTTTTTTTTTTTTGTCTTACTACA
GATTTCTTGAAATAGAAATTTATAGTTAGGAAAATGTCTGGGTAATAACTTTACAATTTAAGTCACTCAGATTTTTATTCATC
TAGTAGTGATGACCAATGAAGCTT
PROTEIN SEQ.
>CAA38015.1 histone H4 [Oreochromis niloticus]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMD
VVYALKRQGRTLYGFGG
SEQUENCE 2:
PROTEIN SEQ.
>CAA54829.1 histone H4 [Pyrenomonas salina]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRSVLKVFLENVIRDAVTYTEHARRKTVTAMD
VVYALKRQGRTLYGFGG
SEQUENCE 3:
PROTEIN SEQ
>CAA56154.1 histone H4 [Lolium temulentum]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRDAVTYTE
HAXRKTVTAMDVVYALKRQGRTLYGFGG
SEQUENCE 4:
TCACGATTCATTTTCTAACCCAAACAATCTTCTTTATAAATAAAAAATAAATTCTTCTCTACTCATCAGAAAACTCAAATCTTAAA
ACTTTCTGGAAAAACAAAAATGTCAGGAAGAGGAAAAGGAGGAAAAGGGTTAGGCAAAGGAGGAGCAAAGAGACACAGAAAGGTTC
TAAGAGACAACATTCAAGGAATCACAAAGCCAGCGATTCGTCGTCTTGCTCGTAGAGGAGGTGTGAAGAGAATCAGTGGATTGATC
TATGAAGAAACGAGAGGTGTGTTGAAGATTTTTCTGGAGAATGTGATTAGAGATGCTGTTACTTACACTGAGCATGCGAGGAGGAA
GACGGTGACTGCTATGGATGTTGTTTATGCCTTGAAGAGACAAGGAAGAACTCTATATGGATTTGGTGGTTGATCAATTTGAGATC
TGGGTTTTCTGGTGAATGATGATGATTTAAGTCTTGCGATCAAGAAATTCCAGAAATTGGGTTGAATTTTAGGGTTTCGTTTTGTG
TTGTAATTAGGGCAGCATTGTAATGGATTAATGATAAGTACCATTTGCTCTAATTACTCTTTAATCTCTGAAATTCATGGTAAAGG
ATTATCAATCGAAAACTAATCAAAGGAATTGATTGAACTATGTTTTTGAAGATTGAAAAACAAATAAGGACTAAATGTGAGCAATT
TAAAGTTTAGAGGTATAAGAATCGAAT
PROTEIN SEQ
>NP_180441.1 histone H4 [Arabidopsis thaliana]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRDAVTYTEHARRKTVTAMD
VVYALKRQGRTLYGFGG
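Histone H4 is a convenient protein for seeing how retrieved FASTA records can be compared directly. The following is a minimal sketch that parses the two complete protein records above (Oreochromis niloticus and Arabidopsis thaliana) and lists the positions where they differ. It assumes an ungapped, position-by-position comparison, which is only valid here because both sequences have the same length; `parse_fasta` and `differences` are hypothetical helper names.

```python
def parse_fasta(text):
    """Return {header: sequence} for a FASTA-formatted string."""
    records, header = {}, None
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def differences(seq_a, seq_b):
    """List (position, residue_a, residue_b) where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length for an ungapped comparison")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# Protein records copied from Sequence 1 and Sequence 4 of this answer.
FASTA = """
>CAA38015.1 histone H4 [Oreochromis niloticus]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMD
VVYALKRQGRTLYGFGG
>NP_180441.1 histone H4 [Arabidopsis thaliana]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRDAVTYTEHARRKTVTAMD
VVYALKRQGRTLYGFGG
"""

records = parse_fasta(FASTA)
fish, plant = records.values()
diffs = differences(fish, plant)
identity = 100 * (len(fish) - len(diffs)) / len(fish)
print(f"length={len(fish)} differences={diffs} identity={identity:.2f}%")
```

For sequences of unequal length, a proper pairwise alignment tool would be needed instead of this direct comparison.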
SEQUENCE 5:
PROTEIN SEQ
INSULIN
SEQUENCE 1:
1. DEFINITION Octodon degus insulin mRNA, complete cds.
2. ACCESSION M57671
3. SOURCE Octodon degus (degu)
4. ORGANISM Octodon degus
5. No of base pairs 432 bp
6. REFERENCE 1 (bases 1 to 432)
Nishi,M. and Steiner,D.F.Cloning of complementary DNAs encoding
islet amyloid polypeptide,insulin, and glucagon precursors from
PROTEIN SEQ
SEQUENCE 2:
PROTEIN SEQ
SEQUENCE 3:
CTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGCCTCCTGCACCGAGAGAGATG
GAATAAAGCCCTTGAACCAGCCCTGCTGTGCCGTCTGTGTGTCTTGGGGGCCCTGGGCCAAGCCCCACTTCCC
PROTEIN SEQ
SEQUENCE 4:
>M10039.1 Human alpha-type insulin gene and 5' flanking polymorphic region
CTGGGGCTGCTGTCCTAAGGCAGGGTGGGAACTAGGCAGCCAGCAGGGAGGGGACCCCTCCCTCACTCCCACTCTCCCACCCCCAC
CACCTTGGCCCATCCATGGCGGCATCTTGGGCCATCCGGGACTGGGGACAGGGGTCCTGGGGACAGGGGTCCGGGGACAGGGTCCT
GGGGACAGGGGTGTGAGGACAGGGGTCCTGGGGACAGGGGTGTGGGGACAGGGGTGTGAGGACAGGGGTCCCGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGG
GGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGG
ACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGA
CAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACA
GGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGG
GTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGT
GTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGG
GGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGTGGACAGGGGTGTGGG
GATAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTCCCGGG
GACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGG
ACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGAC
AGGGGTCCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAG
GGCTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGG
GGTCCGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGG
TGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGG
TGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGT
GTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTG
TGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTG
GGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGCAGCGCAAAGAGCCCCGCCCTGCAGCCTCCAGCTCTCCTGGTCTAATGTG
GAAAGTGGCCCAGGTGAGGGCTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAGGGACAGGTCTGGCCACCGGGCCCCTG
GTTAAGACTCTAATGACCCGCTGGTCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGCACCAGGGAAATG
GTCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCCGACCCCCCCACCCCAGGCCCTAATGGGCCAGGCGGCAGGGGTTGACAGG
TAGGGGAGATGGGCTCTGAGACTATAAAGCCAGCGGGGGCCCAGCAGCCCTCAGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCA
TCAAGCAGGTCTGTTCCAAGGGCCTTTGCGTCAGGTGGGCTCAGGGTTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAG
GGAGGACGTGGCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACCTGGCCTTCAGCCTGCCTCAGCC
CTGCCTGTCTCCCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGA
CCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGG
CTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCTGCCCCTGGCCGCCCCCAG
CCACCCCCTGCTCCTGGCGCTCCCACCCAGCATGGGCAGAAGGGGGCAGGAGGCTGCCACCCAGCAGGGGGTCAGGTGCACTTTTT
TAAAAAGAAGTTCTCTTGGTCACGTCCTAAAAGTGACCAGCTCCCTGTGGCCCAGTCAGAATCTCAGCCTGAGGACGGTGTTGGCT
TCGGCAGCCCCGAGATACATCAGAGGGTGGGCACGCTCCTCCCTCCACTCGCCCCTCAAACAAATGCCCCGCAGCCCATTTCTCCA
CCCTCATTTGATGACCGCAGATTCAAGTGTTTTGTTAAGTAAAGTCCTGGGTGACCTGGGGTCACAGGGTGCCCCACGCTGCCTGC
CTCTGGGCGAACACCCCATCACGCCCGGAGGAGGGCGTGGCTGCCTGCCTGAGTGGGCCAGACCCCTGTCGCCAGGCCTCACGGCA
GCTCCATAGTCAGGAGATGGGGAAGATGCTGGGGACAGGCCCTGGGGAGAAGTACTGGGATCACCTGTTCAGGCTCCCACTGTGAC
GCTGCCCCGGGGCGGGGGAAGGAGGTGGGACATGTGGGCGTTGGGGCCTGTAGGTCCACACCCACTGTGGGTGACCCTCCCTCTAA
CCTGGGTCCAGCCCGGCTGGAGATGGGTGGGAGTGTGACCTAGGGCTGGCGGGCAGGCGGGCACTGTGTCTCCCTGACTGTGTCCT
CCTGTGTCCCTCTGCCTCGCCGCTGTTCCGGAACCTGCTCTGCGCGGCACGTCCTGGCAGTGGGGCAGGTGGAGCTGGGCGGGGGC
CCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTG
CTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGCCTCCTGCACCGAGAGAGATG
GAATAAAGCCCTTGAACCAGCCCTGCTGTGCCGTCTGTGTGTCTTGGGGGCCCTGGGCCAAGCCCCACTTCCC
PROTEIN SEQ
>AAA37041.1 insulin [Cavia porcellus]
MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELEDPQVEQTELGMGLGAGGLQPLALEMAL
QKRGIVDQCCTGTCTRHQLQSYCN
SEQUENCE 5:
>M10039.1 Human alpha-type insulin gene and 5' flanking polymorphic region
CTGGGGCTGCTGTCCTAAGGCAGGGTGGGAACTAGGCAGCCAGCAGGGAGGGGACCCCTCCCTCACTCCC
ACTCTCCCACCCCCACCACCTTGGCCCATCCATGGCGGCATCTTGGGCCATCCGGGACTGGGGACAGGGG
TCCTGGGGACAGGGGTCCGGGGACAGGGTCCTGGGGACAGGGGTGTGAGGACAGGGGTCCTGGGGACAGG
GGTGTGGGGACAGGGGTGTGAGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAG
GGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACA
GGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACA
GGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACA
GGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGAC
AGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGAC
AGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGA
CAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGTGGACAGGGGTGTGGGGA
TAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGG
ACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGG
GACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGG
GGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTGTG
GGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGT
GGGGACAGGGCTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTG
TGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTG
TGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGT
CTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGG
TGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGG
GTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGG
GGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGG
GGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAG
CAGCGCAAAGAGCCCCGCCCTGCAGCCTCCAGCTCTCCTGGTCTAATGTGGAAAGTGGCCCAGGTGAGGG
CTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAGGGACAGGTCTGGCCACCGGGCCCCTGGTTA
AGACTCTAATGACCCGCTGGTCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGC
ACCAGGGAAATGGTCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCCGACCCCCCCACCCCAGGCCCT
AATGGGCCAGGCGGCAGGGGTTGACAGGTAGGGGAGATGGGCTCTGAGACTATAAAGCCAGCGGGGGCCC
AGCAGCCCTCAGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGGTCTGTTCCAAGGGCCTT
TGCGTCAGGTGGGCTCAGGGTTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAGGGAGGACGTG
GCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACCTGGCCTTCAGCCTGCC
TCAGCCCTGCCTGTCTCCCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCT
GGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACAC
CTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGG
CAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCTGCCCCTGGCCGCCCCCAGCCACCCCCTGCTCCTG
GCGCTCCCACCCAGCATGGGCAGAAGGGGGCAGGAGGCTGCCACCCAGCAGGGGGTCAGGTGCACTTTTT
TAAAAAGAAGTTCTCTTGGTCACGTCCTAAAAGTGACCAGCTCCCTGTGGCCCAGTCAGAATCTCAGCCT
GAGGACGGTGTTGGCTTCGGCAGCCCCGAGATACATCAGAGGGTGGGCACGCTCCTCCCTCCACTCGCCC
CTCAAACAAATGCCCCGCAGCCCATTTCTCCACCCTCATTTGATGACCGCAGATTCAAGTGTTTTGTTAA
GTAAAGTCCTGGGTGACCTGGGGTCACAGGGTGCCCCACGCTGCCTGCCTCTGGGCGAACACCCCATCAC
GCCCGGAGGAGGGCGTGGCTGCCTGCCTGAGTGGGCCAGACCCCTGTCGCCAGGCCTCACGGCAGCTCCA
TAGTCAGGAGATGGGGAAGATGCTGGGGACAGGCCCTGGGGAGAAGTACTGGGATCACCTGTTCAGGCTC
CCACTGTGACGCTGCCCCGGGGCGGGGGAAGGAGGTGGGACATGTGGGCGTTGGGGCCTGTAGGTCCACA
CCCACTGTGGGTGACCCTCCCTCTAACCTGGGTCCAGCCCGGCTGGAGATGGGTGGGAGTGTGACCTAGG
GCTGGCGGGCAGGCGGGCACTGTGTCTCCCTGACTGTGTCCTCCTGTGTCCCTCTGCCTCGCCGCTGTTC
CGGAACCTGCTCTGCGCGGCACGTCCTGGCAGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGC
AGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCA
TCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGC
CTCCTGCACCGAGAGAGATGGAATAAAGCCCTTGAACCAGCCCTGCTGTGCCGTCTGTGTGTCTTGGGGG
CCCTGGGCCAAGCCCCACTTCCC
PROTEIN SEQ
EXPERIMENT-03
OBJECTIVE
To find information about genes and proteins using the KEGG database.
ABOUT
QUESTIONS
● Perform a quick search to find pathways associated with the APC gene. Examine the
KEGG GENE record for human APC.
Answer:
Gene symbols listed in the KEGG GENES record for human APC: APC, BTPS2, DESMD, DP2, DP2.5,
DP3, GS, PPP1R46
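The same pathway lookup can be scripted against the KEGG REST interface. The sketch below is an illustration under stated assumptions: it assumes the `link` operation at `https://rest.kegg.jp` and assumes that `hsa:324` is the KEGG identifier for human APC; both should be confirmed on the KEGG website before use. The function names are hypothetical.

```python
from urllib.request import urlopen

# Assumed base URL of the KEGG REST service.
KEGG_REST = "https://rest.kegg.jp"


def link_url(target_db, entry):
    """URL of a KEGG 'link' request, e.g. all pathways linked to one gene."""
    return f"{KEGG_REST}/link/{target_db}/{entry}"


def parse_link_table(text):
    """Parse the two-column, tab-separated text returned by a 'link' request."""
    pairs = []
    for line in text.strip().splitlines():
        source, target = line.split("\t")
        pairs.append((source, target))
    return pairs


def pathways_for_gene(entry):
    """Fetch pathway identifiers linked to a gene entry (requires internet access)."""
    with urlopen(link_url("pathway", entry)) as handle:
        return [target for _, target in parse_link_table(handle.read().decode("utf-8"))]


if __name__ == "__main__":
    # hsa:324 is assumed to be human APC; verify the identifier in KEGG GENES.
    print(link_url("pathway", "hsa:324"))
```

Each returned identifier can then be opened in the KEGG PATHWAY browser to inspect where the gene sits in the diagram.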
● Examine the WNT pathway page and the location of APC in this diagram.
Answer:
● Determine the upstream membrane protein that interacts with Wnt to trigger this
pathway. Find more information about this protein in KEGG.
Answer:
Frizzled is the upstream membrane protein that interacts with Wnt to trigger this
pathway.
● Find genes of interest relating to Asthma. Collect the symbols for genes involved in
Mast cell communication with eosinophils and epithelial cells.
Answer:
IL4 (polymorphism)
IL4RA (polymorphism)
IL13 (polymorphism)
FCER1B (polymorphism)
TNFA (polymorphism)
ADAM33 (polymorphism)
CD14 (polymorphism)
HLA-DRB1 (polymorphism)
HLA-DQB1 (polymorphism)
ADRB2 (polymorphism)
EXPERIMENT-04
OBJECTIVE
To find out the information regarding motifs and domains in the given protein sequences
using PROSITE.
PROSITE
ABOUT
PROSITE is a database of protein families and domains. It is based on the observation that,
while there is a huge number of different proteins, most of them can be grouped, based on
similarities in their sequences, into a limited number of families. Proteins or protein domains
belonging to a particular family generally share functional attributes and are derived from a
common ancestor.
It is apparent, when studying protein sequence families, that some regions have been better
conserved than others during evolution. These regions are generally important for the
function of a protein and/or for the maintenance of its three-dimensional structure. By
analyzing the constant and variable properties of such groups of similar sequences, it is
possible to derive a signature for a protein family or domain, which distinguishes its
members from all other unrelated proteins. A pertinent analogy is the use of fingerprints by
the police for identification purposes. A fingerprint is generally sufficient to identify a given
individual. Similarly, a protein signature can be used to assign a newly sequenced protein to
a specific family of proteins and thus to formulate hypotheses about its function.
QUESTION-1
>sp|Seq1
MGSKRGISSRHHSLSSYEIMFAALFAILVVLCAGLIAVSCLTIKESQRGAALGQSHEARA
TFKITSGVTYNPNLQDKLSVDFKVLAFDLQQMIDEIFLSSNLKNEYKNSRVLQFENGSII
VVFDLFFAQWVSDENVKEELIQGLEANKSSQLVTFHIDLNSVDILDKLTTTSHLATPGNV
SIECLPGSSPCTDALTCIKADLFCDGEVNCPDGSDEDNKMCATVCDGRFLLTGSSGSFQA
THYPKPSETSVVCQWIIRVNQGLSIKLSFDDFNTYYTDILDIYEGVGSSKILRASIWETN
PGTIRIFSNQVTATFLIESDESDYVGFNATYTAFNSSELNNYEKINCNFEDGFCFWVQDL
NDDNEWERIQGSTFSPFTGPNFDHTFGNASGFYISTPTGPGGRQERVGLLSLPLDPTLEP
ACLSFWYHMYGENVHKLSINISNDQNMEKTVFQKEGNYGDNWNYGQVTLNETVKFKVAFN
AFKNKILSDIALDDISLTYGICNGSLYPEPTLVPTPPPELPTDCGGPFELWEPNTTFSST
NFPNSYPNLAFCVWILNAQKGKNIQLHFQEFDLENINDVVEIRDGEEADSLLLAVYTGPG
PVKDVFSTTNRMTVLLITNDVLARGGFKANFTTGYHLGIPEPCKADHFQCKNGECVPLVN
LCDGHLHCEDGSDEADCVRFFNGTTNNNGLVRFRIQSIWHTACAENWTTQISNDVCQLLG
LGSGNSSKPIFPTDGGPFVKLNTAPDGHLILTPSQQCLQDSLIRLQCNHKSCGKKLAAQD
ITPKIVGGSNAKEGAWPWVVGLYYGGRLLCGASLVSSDWLVSAAHCVYGRNLEPSKWTAI
LGLHMKSNLTSPQTVPRLIDEIVINPHYNRRRKDNDIAMMHLEFKVNYTDYIQPICLPEE
NQVFPPGRNCSIAGWGTVVYQGTTANILQEADVPLLSNERCQQQMPEYNITENMICAGYE
EGGIDSCQGDSGGPLMCQENNRWFLAGVTSFGYKCALPNRPGVYARVSRFTEWIQSFLH
● Give the range of amino acids in the sequence where these domains/motifs occur.
Answer:
❏ SEA: 54-169
❏ LDLRA_2: 183-222 and 642-678
❏ CUB: 225-334 and 524-634
❏ MAM_2: 345-504
❏ SRCR_2: 678-788
❏ TRYPSIN_DOM: 785-1019
❏ LDLRA_1: 199-221 and 655-677
❏ MAM_1: 391-431
❏ TRYPSIN_HIS: 821-826
❏ TRYPSIN_SER: 965-976
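Short PROSITE signatures such as TRYPSIN_HIS are written in a pattern syntax that maps closely onto regular expressions. The sketch below converts a PROSITE-style pattern into a Python regex and scans a fragment of Seq1. Two assumptions are worth flagging: the pattern string is quoted from memory of the trypsin histidine active-site entry and should be checked against PROSITE itself, and the fragment is taken to start at residue 781 because the sequence above is wrapped at 60 residues per line. The converter handles only the common syntax elements (`x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        core, _, repeat = element.partition("(")
        repeat = repeat.rstrip(")")
        if core == "x":                 # x matches any residue
            core = "."
        elif core.startswith("{"):      # {ABC} means any residue except A, B, C
            core = "[^" + core[1:-1] + "]"
        if repeat:                      # x(2) or x(2,4) become {2} or {2,4}
            core += "{" + repeat + "}"
        regex += prefix + core + suffix
    return regex


# Assumed signature for the trypsin histidine active site (verify in PROSITE).
PATTERN = "[LIVM]-[ST]-A-[STAG]-H-C"
# One 60-residue line of Seq1, assumed to cover residues 781-840.
FRAGMENT = "ITPKIVGGSNAKEGAWPWVVGLYYGGRLLCGASLVSSDWLVSAAHCVYGRNLEPSKWTAI"
FRAGMENT_START = 781

match = re.search(prosite_to_regex(PATTERN), FRAGMENT)
if match:
    start = FRAGMENT_START + match.start()
    end = FRAGMENT_START + match.end() - 1
    print(f"{match.group()} at {start}-{end}")
```

Under these assumptions the match falls on the same residue range that ScanProsite reports for TRYPSIN_HIS in the answer above. Profile-based entries such as SEA or CUB cannot be reproduced this way, since they rely on weight matrices rather than patterns.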
QUESTION-2
What information does PROSITE have for P54886 (UniProt accession number)?
● How many hits/results are returned?
Answer:
2 hits are obtained
EXPERIMENT-05
OBJECTIVE
To determine the Open Reading Frames in the given nucleotide sequences and translate
them into protein.
ABOUT
ORF
An open reading frame (ORF) is a stretch of DNA that, when translated into amino
acids, contains no stop codons. The genetic code is read in groups of three
bases (codons), so a double-stranded DNA molecule can be read in any of six
possible reading frames: three in the forward direction and three in the reverse. A long
open reading frame is likely part of a gene.
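The six-frame idea can be made concrete with a small script. This is a minimal sketch, not a reimplementation of NCBI ORFfinder: it assumes the standard genetic code, requires an ATG start codon, and reports only ORFs that end in a stop codon. The function names and the `min_codons` parameter are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Find ATG-to-stop ORFs in all six reading frames.

    Returns tuples (strand, frame, start, end, protein) with 1-based
    coordinates; the end includes the stop codon. Coordinates on the
    "-" strand are relative to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon, "X") == "*":
                    protein = "".join(
                        CODON_TABLE.get(s[j:j + 3], "X") for j in range(start, i, 3)
                    )
                    if len(protein) >= min_codons:
                        orfs.append((strand, frame + 1, start + 1, i + 3, protein))
                    start = None
    return orfs


if __name__ == "__main__":
    for orf in find_orfs("AAATGGCCTGATT"):
        print(orf)
```

Running it on a retrieved sequence such as X00371.1 will generally give a different ORF count from ORFfinder, because that tool offers other start-codon rules and minimum-length settings.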
QUESTIONS
● Retrieve the human myoglobin DNA sequence (Accession number: X00371.1). Can
you identify the Open Reading Frame (ORF)? Once you have determined the ORF of
the human myoglobin gene, translate it to the amino acid sequence.
Answer:
Accession Number and description of query sequence - X00371.1
Number of ORFs found - 31
Genetic Code table Used- Standard
Details of all ORFs in Tabular Form (Frame, Start position, End position, number of base
pairs, number of amino acids)
QUESTION
Obtain the genomic DNA of a human hexokinase and translate it into protein. What's the
result?
Answer:
Translated into amino acid-
MGGWFCKEREWRMISLILHPCILLQSIQVFNHLSIYPLVHPVTHPSIHLS
SLNSSIHHLFVYPCIHSSIHPPPIHPTSINPSIYPTSLNSSIHLSIISLS
IQAFIHLSIHPSIQNPPVPPSIHLSIHPCVHPSIPQFIHPPIIPPFIHPP
IPSFIHSLSLHSSSLPPSIYPSITHPSLHLSICSSIPPSIHPPIPSFIHS
LSLHSSSPLHPSIHPSSTHPSIYPSIHL
EXPERIMENT-06
OBJECTIVE
To explore nucleotide BLAST (blastn).
TOOL USED
BLAST
DESCRIPTION
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between
sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional
and evolutionary relationships between sequences as well as help identify members of gene
families.
Variants of BLAST
QUESTIONS
TCAAGCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCC
CTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTC
TCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCA
GGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCC
CTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACT
GCAACTAGACGCAGCCCGCATGCAGNCCCCCACCCGCCGNCTTCTGCACCGAGAGAGATGGAATTAAACC
CTTGAACCCAGCANANAAAAAAAAAAAAAAA
Perform a pairwise alignment of the sequence against the non-redundant (nr) database using BLASTN. Look at
the graph of best results at the top of the page. How long is the alignment for the best match
in the graph?
Answer:
Query coverage: 94%
Percent identity: 98.36%
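To make the reported figures less opaque, the sketch below shows how percent identity, percent gaps, and query coverage can be computed from a single gapped alignment. It is a simplified illustration on a made-up toy alignment: BLAST derives query coverage from all high-scoring segment pairs of a hit, whereas this function considers one aligned segment only. The name `alignment_stats` is hypothetical.

```python
def alignment_stats(aligned_query, aligned_subject, query_length):
    """Compute identity, gap, and coverage percentages for one gapped alignment.

    aligned_query and aligned_subject are equal-length strings in which "-"
    marks a gap; query_length is the length of the full, unaligned query.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    columns = len(aligned_query)
    pairs = list(zip(aligned_query, aligned_subject))
    matches = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    query_residues = sum(q != "-" for q in aligned_query)
    return {
        "percent_identity": 100 * matches / columns,
        "percent_gaps": 100 * gaps / columns,
        "query_coverage": 100 * query_residues / query_length,
    }


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    stats = alignment_stats("ACGT-ACGT", "ACGTTACCT", query_length=10)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

Percent identity is expressed relative to the alignment length (columns), which is why a hit can show high identity while covering only part of the query.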
● What organism is the most common source of the sequences in the first 5 hits?
Answer:
Homo sapiens
● What protein is most commonly identified in the description column of the alignments as
being associated with or related to these sequences?
Answer:
Human insulin
● Report the alignment result with five different organisms. What is the listed percent identity,
score, percent gaps used and E-value?
● What group was responsible for the DNA sequencing for this entry?
Answer:
PUBMED 12477932
TTAACACGTGTTGTTGGGAGTTCGAACCTCCCCCTTCCAGCCATCTCTTCTGTTGGTGCA
TAGCTTAATGGTAAAGCCCTGGTCTCCAAAACCAGTCACCTAGGTTCGATTCCTGGTGCA
TCAGCCAAATCTGTCTAAAATAGTCCACCATGACAGATTTCCTCTTTGATCCCCACGCAC
ATGTTAGGTTCGACAAGATCGATCTCGTACCAGAGGCTCAAAATCCTCGGTTCAAGGAGT
TGATCGAAAATACCAACCTGAACTACAAATACGATCCGAATGCTCCTCTCATCGTTGACA
CGACGAAGTACGGTCCTCGTTCATTCTACATGAACAATAAGAAGATCAAACGTACTGGCG
TCGAAATCAACATGACTGATGAGATGGTGGAGGAGCTCTACAAGTGCTCGACTGATGTCT
TGTACTTTGCTGAGCGATATTATTACATTCGTACTCTTGACCACGGCAAGATCAAGATCC
CTCTCCGTGACTATCAGAAATTCTGGCTCCGCATTTACGAGGTCCCTGAGATCCGTAACC
GCGTGTGGCTCGCTTGTCGTCAGTCAGCCAAGTCCACGACACTGACCGTTGAGATCATGC
ATCGAATGCTCTTCAATGAGGATTTCGAATACGTCATCCTGGCCAACAAGGGTAACACGG
CTCGTGAAATCTTCTCGCGTGTCCGTATGGCATATGAACAGCTTCCTCTCTGGATGCAGA
TCGGCGTGACCGAATGGAACAAGGGTTCTGTGAAGCTCGAGAACGATTCCCGTGTCTTCG
CTGCTGCTTCTGGTTCTGACTCTGTCCGCGGTTTCTCTCCCAACGAAGTTCTCCTCGACG
AAGCGGCATTCGTTCGTAATGATGAAGAGTTCATGGCCTCCGTGTTCCCGACCATCTCGT
CTGGTTCTAAGTCTCGTCTGACCCAGATCTCCACGCCGAATGGCCCTCGTGGGATCTTCC
ACCGAGATTACACTCGTGCGACCAAGGGTCTCAACAACTACTTCTCCTACAAGGTTCCAT
GGCACTTTGTTCCTGGCCGTGATGAGGAATGGAAGAAGAAGCAGATCGAAGATACCTCGC
TCATGCAGTTCAAGCAGGAGCAGGATTGTGACTTCATGGGCACTACCGACGGCCTCATCG
ATGGTATGGTTCTCGAGAGCATCCAGGACAATATTCAGGCCCCAATTCTTGTCAATGATG
AGGAACTGTCGGATCCAGCGTATGCAATCTTCGAACTCCCGAAGGAAGGACACTCGTATA
TCGTCACTGCCGACACCGGTGAAGGTAAAGGCAAGGACTCGAGCACATTCACCGTGTTTG
ACGTTTCGACTCGTCCGTTCGTCCAGGTTGCTTCCTACAAATCTAACCAGGTTTCCCAGC
TGATCTTCCCAAACAAGCTGGTCAAGATCGCAGAGACATACAATAATGCTCTCTTGATCC
CAGAAAGAAATAACACATCCGGCGGCACTGTCTGCTATAAGGTCTATTATGAACTCGAAT
ACCCGAACGTCTTCCTCCAGGGCGATGGCGAAATGGACATCGGCGTCCATGTCTCTCATG
CTGTCAGATCCCTCGGTGTCAATACCCTCCGTGGCCTCGTTGAAAAGGGCGGTCTCATTA
TCCGAGATGAACGGACCTTCAAGGAACTAGCGAACTTCCGTCTCCAGAAGAACGGGAAAT
ATGCTGCTCCAGAAGGTGAACATGATGATATGGTCCAGAACCTTTGGATATTCTCGTGGT
ATACGGCTGGTGACGAATTCGAAGAAGCGATGAAGGAAAATATCTACAATGACCTCTATC
GTGAAGAGCTCCAGAGCATCGAGAACCTGAAGGTCACCTCTGCGAACGACGCTTATGACC
CGTATAATGTGAAACCAGTGGCAGGAAAGGCGGCCTCTGCGTTTTGGTAAGAAAGGACAA
ATGATGGCATATGATCTATTCTCGGGTACCGTGGCCAATATCCCGGACCTGTCCACTCTC
AAAGAGAAGACGAGAGAGCTTCTCCAGATCCCTGTGTCGGCTGGCCTGGATATCACTCGC
TGGCTGAACGCACCTTCTCTCAATATCCCGGACCCGTTCAAGGAGTTCAATTTCAAATTC
AAGGGCGGTCGAGTATCGTTCGAGGAATATGAAAGGAATTTTCTTCCGAAAGTTCCTAAC
ATTCTCCAGTCGGCCAATCCGATCGGGTTCCAGGATGTCATCAGTGTCTTGGAGACGTTG
Also report the listed percent identity, score, percent gaps used, and E-value.
EXPERIMENT-07
QUESTIONS
>gi|6552317|ref|NP_009234.1|
MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQCPLCKNDITK
RSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENNSPEHLKDEVSIIQSMGYRNRAKRLLQS
EPENPSLQETSLSVQLSNLGTVRTLRTKQRIQPQKTSVYIELGSDSSEDTVNKATYCSVGDQELLQITPQ
GTRDEISLDSAKKGEAASGCESETSVSEDCSGLSSQSDILTTQQRDTMQHNLIKLQQEMAELEAVLEQHG
SQPSNSYPSIISDSSALEDLRNPEQSTSEKAVLTSQKSSEYPISQNPEGLSADKFEVSADSSTSKNKEPG
VERSSPSKCPSLDDRWYMHSCSGSLQNRNYPSQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTP
YLESGISLFSDDPESDPSEDRAPESARVGNIPSSTSALKVPQLKVAESAQSPAAAHTTDTAGYNAMEESV
SREKPELTASTERVNKRMSMVVSGLTPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLK
YFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDRKIFRGLEICCYGPF
TNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSV
ALYQCQELDTYLIPQIPHSHY
Perform a pairwise alignment of this sequence against the non-redundant (nr) database using BLASTP.
● What are the listed percent identity, score, and E-value for the top 10 results?
Answer:
● Find proteins that are known to contribute to pulmonary artery hypertension and
determine if animal models exist in which the disease can be studied. Can a
full-length dog protein sequence be found?
Answer:
>QUERY1
MKDTDLSTLLSIIRLTELKESKRNALLSLIFQLSVAYFIALVIVSRFVRYVNYITYNNLV
EFIIVLSLIMLIIVTDIFIKKYISKFSNILLETLNLKINSDNNFRREIINASKNHNDKNK
LYDLINKTFEKDNIEIKQLGLFIISSVINNFAYIILLSIGFILLNEVYSNLFSSRYTTIS
IFTLIVSYMLFIRNKIISSEEEEQIEYEKVATSYISSLINRILNTKFTENTTTIGQDKQL
YDSFKTPKIQYGAKVPVKLEEIKEVAKNIEHIPSKAYFVLLAESGLRPGELLNVSIENID
LKARIIWINKETQTKRAYFSFFSRKTAEFLEKVYLPAREEFIRANEKNIAKLAAANENQE
IDLEKWKAKLFPYKDDVLRRKIYEAMDRALGKRFELYALRRHFATYMQLKKVPPLAINIL
QGRVGPNEFRILKENYTVFTIEDLRKLYDEAGLVVLE
● How large a fraction of the query sequence do the significant hits match (excluding the
identical matches)?
Answer:
About 50%
● How large a fraction of the query sequence do the significant hits match?
Answer:
About 50%
● Why does BLAST come up with more significant hits in the second iteration?
Answer:
In the first iteration, BLAST uses a BLOSUM substitution matrix to align sequences and identify
significant hits. Before the second iteration, the significant hits are aligned and a sequence
profile is estimated; that is, the frequency of each of the 20 amino acids is estimated at
every position. In the second iteration, this profile is used as a position-specific scoring
matrix, which makes the search specific to the query's sequence family and therefore more
sensitive to distant homologs.
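The profile step described above can be sketched in a few lines. This is a deliberately simplified illustration: it assumes equal-length, already aligned hits, a uniform background frequency of 1/20, and a flat pseudocount, whereas PSI-BLAST itself uses sequence weighting and more elaborate pseudocount and background estimates. The toy hits and all function names are invented for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_hits, pseudocount=1.0):
    """Per-column amino acid frequencies from equal-length aligned sequences."""
    length = len(aligned_hits[0])
    if any(len(hit) != length for hit in aligned_hits):
        raise ValueError("all aligned hits must have the same length")
    profile = []
    for col in range(length):
        # Gaps and ambiguous residues are ignored when counting.
        counts = Counter(hit[col] for hit in aligned_hits if hit[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def to_log_odds(profile, background=1 / 20):
    """Convert frequencies to log2 odds scores against a uniform background."""
    return [{aa: math.log2(freq / background) for aa, freq in col.items()}
            for col in profile]


def score(pssm, peptide):
    """Score a peptide of the same length as the profile, column by column."""
    return sum(col[aa] for col, aa in zip(pssm, peptide))


if __name__ == "__main__":
    hits = ["MKVL", "MRVL", "MKIL", "MKVL"]  # toy aligned hits
    pssm = to_log_odds(build_profile(hits))
    print(round(score(pssm, "MKVL"), 2), round(score(pssm, "AAAA"), 2))
```

A residue that is conserved across the hits receives a high score at its column, so sequences sharing the family's conserved positions rank higher in the next iteration than they would under a general substitution matrix.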
EXPERIMENT-08
OBJECTIVE
TOOL USED
GENSCAN
DESCRIPTION
GENSCAN was developed by Christopher Burge in the research group of Samuel Karlin at Stanford
University.
QUESTIONS
● Predict the genes in the above sequences using GENSCAN and FGENESH/FGENES.
Answer:
FGENESH
Running Genscan
QUESTION
● How many exons are predicted in the sequence?
Answer:
Two
Exon 2
Begin : 2401
End : 2675
Q5. Write down the sequences and the total length of the predicted protein sequence.
Answer:
Total length of the predicted protein: 393 amino acids.
predicted peptide sequence(s):
B. Running NetGene2
● NetGene2 predicts potential donor and acceptor splice sites as well as protein coding
potential. It does not predict a complete exon-intron gene structure.
● Go to the NetGene2 server.
QUESTIONS
● Based on the predictions from Genscan, at which position do you expect to find a
donor splice site?
Answer:
● If NetGene predicts a donor splice site at this position, what is the confidence score?
Answer:
● If NetGene predicts a donor splice site at this position, write down the 3 nucleotides
on either side of the splice site
Answer:
● If NetGene predicts an acceptor splice site at this position, write down the 3
nucleotides on either side of the splice site.
Answer:
EXPERIMENT-09
OBJECTIVE
To carry out the Multiple sequence alignment using Clustal Omega and find the conserved
regions.
TOOL USED
Clustal Omega
ABOUT:
Clustal Omega is a multiple sequence alignment program for aligning three or more
sequences together in a computationally efficient and accurate manner. It produces
biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary
relationships can be seen by viewing cladograms or phylograms.
Running a tool from the web form is a simple multi-step process, starting at the top of
the page and following the steps to the bottom.
Each tool has at least 2 steps, but most of them have more:
1. The first steps are usually where the user sets the tool input (e.g. sequences,
databases...)
2. In the following steps, the user has the possibility to change the default tool
parameters
3. And finally, the last step is always the tool submission step, where the user can
specify a title to be associated with the results and an email address for email
notification. Clicking the submit button sends the information specified in the form
to the server and launches the tool.
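Once the alignment has been downloaded, conserved regions can be located by scanning it column by column. The following is a minimal sketch under stated assumptions: the alignment is supplied as equal-length strings with `-` for gaps, a column counts as conserved only when every sequence carries the same residue, and the function name `conserved_regions` and the toy alignment are hypothetical. Clustal Omega's own conservation symbols (`*`, `:`, `.`) also grade similar residues, which this sketch does not attempt.

```python
def conserved_regions(aligned, min_length=1):
    """Return 1-based (start, end) column ranges that are identical in all sequences."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    regions = []
    start = None
    for col in range(length):
        column = {seq[col] for seq in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col                      # a conserved run begins here
        elif not conserved and start is not None:
            if col - start >= min_length:
                regions.append((start + 1, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and length - start >= min_length:
        regions.append((start + 1, length))
    return regions


# Toy alignment (hypothetical, not the lab sequences).
toy_alignment = [
    "MKT-HHGWV",
    "MKTAHHGYV",
    "MKS-HHGWV",
]
print(conserved_regions(toy_alignment, min_length=2))
```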
QUESTIONS
● Suppose you have cloned and sequenced the following sequence in the lab. The only
thing we know is that we isolated this sequence from the bacterial species
Paracoccus denitrificans and that it is involved in respiration.
MADAAVHGHGDHHDTRGFFTRWFMSTNHKDIGILYLFTAGIVGLISVCFTVYMRMELQHPGVQYMCLEG
ARLIADASAECTPNGHLWNVMITYHGVLMMFFVVIPALFGGFGNYFMPLHIGAPDMAFPRLNNLSYWMY
VCGVALGVASLLAPGGNDQMGSGVGWVLYPPLSTTEAGYSMDLAIFAVHVSGASSILGAINIITTFLNM
RAPGMTLFKVPLFAWSVFITAWLILLSLPVLAGAITMLLMDRNFGTQFFDPAGGGDPVLYQHILWFFGH
PEVYIIILPGFGIISHVISTFAKKPIFGYLPMVLAMAAIGILGFVVWAHHMYTAGMSLTQQAYFMLATM
TIAVPTGIKVFSWIATMWGGSIEFKTPMLWAFGFLFLFTVGGVTGVVLSQAPLDRVYHDTYYVVAHFHY
VMSLGAVFGIFAGVYYWIGKMSGRQYPEWAGQLHFWMMFIGSNLIFFPQHFLGRQGMPRRYIDYPVEFA
YWNNISSIGAYISFASFLFFIGIVFYTLFAGKRVNVPNYWNEHADTLEWTLPSPPPEHTFETLPKREDW
DRAHAH
● Perform a multiple sequence alignment of this sequence using Clustal Omega
with nine homologous sequences from different organisms.
Answer:
EXERCISE 10
OBJECTIVE
To investigate the structure of the influenza virus neuraminidase protein using
Swiss-PdbViewer.
TOOL USED
Swiss-PdbViewer
ABOUT
Swiss-PdbViewer (aka DeepView) is an application that provides a user-friendly interface
allowing the analysis of several proteins at the same time. The proteins can be
superimposed in order to deduce structural alignments and compare their active sites or
any other relevant parts. Amino acid mutations, H-bonds, angles, and distances between
atoms are easy to obtain thanks to the intuitive graphic and menu interface.
Swiss-PdbViewer has been developed since 1994 by Nicolas Guex.
Swiss-PdbViewer is tightly linked to SWISS-MODEL, an automated homology modeling
server developed within the Swiss Institute of Bioinformatics (SIB) at the Structural
Bioinformatics Group at the Biozentrum in Basel. Working with these two programs greatly
reduces the amount of work necessary to generate models, as it is possible to thread a
protein primary sequence onto a 3D template and get immediate feedback on how well the
threaded protein will be accepted by the reference structure before submitting a request to
build missing loops and refine sidechain packing. Swiss-PdbViewer can also read electron
density maps, and provides various tools to build into the density. In addition, various
modeling tools are integrated and residues can be mutated.
Background Info: The substrate of influenza virus neuraminidase is sialic acid. The structures
of sialic acid and of the drugs oseltamivir (in Tamiflu) and zanamivir (in Relenza) are shown
below (from P.J. Collins et al., Nature 453, 1258 (2008)).
● Display each of these files in Swiss PDB Viewer. The protein should be displayed in
ribbon and ligands as ball and stick models. Color the ribbons as per secondary
structures. Color the ligand in CPK color.
2BAT
2HU4
● Superimpose 2HU4 over 2BAT and report the RMSD. Color each molecule in a
different color.
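The RMSD reported after a superposition is the root-mean-square distance between paired atoms, RMSD = sqrt((1/N) * sum of d_i^2). The sketch below only illustrates that formula for coordinates that are already superimposed and paired; it does not perform the fitting that Swiss-PdbViewer does, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by one unit along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```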
EXPERIMENT 11
Objective: To predict the 3D-structure of a protein having Uniprot Accession No. Q9BZ11.
Description
SWISS-MODEL is a structural bioinformatics web server dedicated to homology modeling of 3D
protein structures. Homology modeling is currently the most accurate method to generate reliable
three-dimensional protein structure models and is routinely used in many practical
applications. Homology (or comparative) modelling methods make use of experimental
protein structures ("templates") to build models for evolutionarily related proteins ("targets").
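Templates are ranked largely by how similar they are to the target, and percent identity is the simplest such measure. The sketch below is one possible way to compute it from a pairwise target-template alignment. It assumes `-` marks gaps and counts identity over columns where both sequences have a residue; servers may use a different denominator, so values need not match theirs exactly. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(target, template):
    """Percent of gap-free aligned columns in which both sequences share a residue."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(target, template):
        if a == "-" or b == "-":
            continue                 # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Toy alignment (hypothetical): 4 identical residues out of 5 gap-free columns.
print(percent_identity("MKT-AHG", "MKSLAH-"))
```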
QUESTIONS
● Give information about Q9BZ11 (protein name, Gene name, Function, Organism).
Answer:
Protein name- Disintegrin and metalloproteinase domain-containing protein 33
Gene name- ADAM33
Function- metalloendopeptidase activity, zinc ion binding
Organism- Homo sapiens (Human)
● What are the templates used? (PDB id, organism, % identity, Bit score, E-value)
● Give the % homology and the regions aligned with the template(s).
● Give the range of residues modeled, pdb file, energy, Q-Mean score, Ramachandran
Plot of your best model.
Answer:
Range of residues- 209-642 (3g5c.1.A)
Q-mean score- -2.74
Ramachandran plot-
Superposition of the models:
Reference     Second model     RMS
1r54.1.A      3g5c.1.A         1.96
1r54.1.A      2erq.1.A         2.44
1r54.1.A      3dsl.1.A         1.71
1r54.1.A      2ero.1.A         1.89
EXPERIMENT-12
OBJECTIVE
To predict the 3D-structure of a protein having Uniprot Accession No. Q9BZ11.
TOOL USED
PHYRE2
ABOUT:
Phyre2 is a suite of tools available on the web to predict and analyse protein structure,
function and mutations. The focus of Phyre2 is to provide biologists with a simple and
intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre,
the original version of the server. It uses advanced remote homology detection methods to
build 3D models, predict ligand binding sites, and analyse the effect of amino-acid variants
(e.g. nsSNPs) for a user's protein sequence. Users are guided through results by a simple
interface at a level of detail determined by them, from submitting a protein sequence to
interpreting the secondary and tertiary structure of the models, their domain composition
and model quality. Additional tools are available to find a protein structure in a genome,
to submit a large number of sequences at once and to automatically run weekly searches for
proteins that are difficult to model. A typical structure prediction is returned between
30 minutes and 2 hours after submission.
RESULT
● Give information about Q9BZ11 (protein name, Gene name, Function, Organism).
Answer:
Protein name- Disintegrin and metalloproteinase domain-containing protein 33
Gene name- ADAM33
Function- metalloendopeptidase activity, zinc ion binding
Organism- Homo sapiens (Human)
● What are the templates used? (PDB id, organism, % identity, Bit score, E-value)
Answer:
● Give the % homology and the regions aligned with the template(s).
● Give the range of residues modeled, pdb file, energy, Q-Mean score, Ramachandran
Plot of your best model.
Answer:
Range of residues- 209-642 (3g5c.1.A)
Q-mean score- -2.74
Ramachandran plot-
● Overlap the best models obtained from SWISS Model (previous exercise) and from
Phyre2.
Answer:
EXPERIMENT-13
OBJECTIVE:
To draw the 2D- structures of given molecules and convert them into 3D structures.
TOOL USED:
ACD/Labs ChemSketch
ABOUT:
ACD/ChemSketch is an easy-to-use, chemically intelligent molecular structure drawing
application, with more than 2 million users worldwide.
1. Draw chemical structures, reactions, and schema, and access a variety of graphical
tools and templates
2. Generate names from molecular structure
3. Calculate molecular properties from chemical structure
4. Create professional reports, presentations, and publication-ready figures
5. Communicate scientific information with clarity and ease
●
Answer:
EXERCISE-14
OBJECTIVE:
To find the binding pockets in the given protein.
TOOL USED:
CASTp
ABOUT:
Computed Atlas of Surface Topography of proteins (CASTp) provides an online resource for
locating, delineating and measuring concave surface regions on three-dimensional
structures of proteins. These include pockets located on protein surfaces and voids buried in
the interior of proteins. The measurement includes the area and volume of pocket or void
by solvent accessible surface model (Richards' surface) and by molecular surface model
(Connolly's surface), all calculated analytically. CASTp can be used to study surface features
and functional regions of proteins. CASTp includes a graphical user interface, flexible
interactive visualization, as well as on-the-fly calculation for user uploaded structures.
EXERCISE-15
OBJECTIVE:
To dock a given ligand to a given protein using SwissDock.
TOOL
SwissDock
ABOUT:
SwissDock and S3DB are developed by Aurélien Grosdidier, Vincent Zoete and Olivier Michielin, from
the Molecular Modeling Group of the Swiss Institute of Bioinformatics in Lausanne, Switzerland.
● SwissDock is a web service that predicts the molecular interactions that may occur between a
target protein and a small molecule. It is based on the EADock DSS engine, combined with setup
scripts for curating common problems and for preparing both the target protein and the ligand
input files. An efficient Ajax/HTML interface was designed and implemented so that scientists
can easily submit dockings and retrieve the predicted complexes.
● S3DB is a database of manually curated target and ligand structures, inspired by the
Ligand-Protein Database.
SwissDock is based on the docking software EADock DSS, whose algorithm consists of the following
steps:
● Many binding modes are generated either in a box (local docking) or in the vicinity of all
target cavities (blind docking).
● Simultaneously, their CHARMM energies are estimated on a grid.
● The binding modes with the most favorable energies are evaluated with FACTS, and
clustered.
● The most favorable clusters can be visualized online and downloaded on your computer.
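The rank-and-cluster step of the list above can be illustrated with a toy example. This is not the EADock DSS algorithm: there is no CHARMM or FACTS scoring here, the energies and ligand centre coordinates are made up, and the greedy distance-based clustering, the function name `cluster_poses` and the cutoff value are all assumptions chosen for illustration only.

```python
import math


def cluster_poses(poses, cutoff):
    """Greedily cluster poses given as (energy, (x, y, z)) tuples.

    Poses are visited from most to least favourable energy. A pose joins the
    first cluster whose seed (its best pose) lies within `cutoff`; otherwise
    it starts a new cluster.
    """
    clusters = []
    for energy, centre in sorted(poses):
        for cluster in clusters:
            seed_centre = cluster[0][1]
            if math.dist(centre, seed_centre) <= cutoff:
                cluster.append((energy, centre))
                break
        else:
            clusters.append([(energy, centre)])
    return clusters


# Made-up poses: three near the origin and one far away.
poses = [
    (-7.2, (0.0, 0.0, 0.0)),
    (-6.8, (0.5, 0.0, 0.0)),
    (-5.1, (10.0, 0.0, 0.0)),
    (-7.9, (0.2, 0.1, 0.0)),
]
for rank, cluster in enumerate(cluster_poses(poses, cutoff=2.0), start=1):
    print(rank, "best energy:", cluster[0][0], "members:", len(cluster))
```

Because poses are sorted by energy first, the first member of each cluster is its most favourable pose, and clusters come out ordered by their best energy.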
RESULT
EXERCISE-16
OBJECTIVE:
To draw phylogenetic trees using NGPhylogeny.fr by trial and error.
TOOL USED:
PhyML
TNT
MrBayes
DESCRIPTION:
A phylogenetic tree is a tree diagram that shows the evolutionary histories and relationships among
taxonomic groups. It is used in phylogenetics, the branch of biology concerned mainly with the
study of evolutionary relatedness among groups of organisms through molecular sequencing data and
morphological data matrices. The phylogenetic tree is essential in phylogenetics, since it is used
to understand biodiversity, genetics, evolution, and ecology among groups of organisms.
The phylogenetic tree shows phylogeny, i.e. the similarities and differences in morphology and
genetics between groups of organisms (or taxa). Taxa that are joined together in the tree are
implied to be evolutionarily related, and may be hypothesized to have descended from a
hypothetical common ancestor (an internal node). There are different types of phylogenetic trees.
In a rooted tree, the nodes imply the most recent common ancestor of the taxa being analyzed, so
the ancestral path is specified. In an unrooted tree, no assumption is made about ancestry; only
evolutionary relatedness is shown.
PhyML
PhyML Online is a web interface to PhyML, software that implements a fast and accurate
heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. This tool
provides the user with a number of options, e.g. nonparametric bootstrap and estimation of various
evolutionary parameters, in order to perform comprehensive phylogenetic analyses on large
datasets in reasonable computing time.
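The nonparametric bootstrap mentioned above resamples alignment columns with replacement and rebuilds a tree from each pseudo-alignment. The sketch below shows only the resampling step, under stated assumptions: the alignment is a list of equal-length strings, and the function name `bootstrap_alignment` and the toy data are hypothetical. It is not PhyML's own implementation.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    # The same column indices are used for every sequence, so each
    # resampled column stays intact across taxa.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in aligned]


# Toy alignment (hypothetical). A fixed seed makes the replicate reproducible.
toy = ["ACGTAC", "ACGTTC", "ACCTAC"]
replicate = bootstrap_alignment(toy, random.Random(0))
for seq in replicate:
    print(seq)
```

Branch support is then the fraction of replicate trees in which a given grouping reappears.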
TNT
TNT stands for "Tree analysis using New Technology". It is a program for phylogenetic analysis under
parsimony (with very fast tree-searching algorithms; Nixon, 1999, Cladistics 15:407-414; Goloboff,
1999, Cladistics 15:415-428), as well as extensive tree handling and diagnosis capabilities. It is a
joint project by Pablo Goloboff, James Farris, and Kevin Nixon.
An R interface to TNT includes basic functions for reading and writing data and trees between R
and TNT, and for parsing the TNT outputs. Even when Bayesian methods are preferred, quickly
estimating parsimony trees can be extremely useful when working with morphology datasets and
their various coding issues. Examples include finding coding mistakes, finding starting trees,
and finding parts of the tree with no morphological support at all.
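Parsimony scores a tree by the minimum number of character changes it requires. As an illustration, the sketch below applies Fitch's algorithm to a single unordered character on a fixed binary tree. It assumes the tree is written as nested 2-tuples of leaf names; the function name `fitch_score` and the toy data are hypothetical. TNT's tree searches go far beyond this and are not reproduced here.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one character on a binary tree.

    tree   -- nested 2-tuples whose leaves are taxon names (strings)
    states -- dict mapping each taxon name to its character state
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # The children agree on at least one state: no extra change.
            return common, left_cost + right_cost
        # The children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Toy example (hypothetical): only taxon C differs, so one change suffices.
tree = (("A", "B"), ("C", "D"))
states = {"A": "G", "B": "G", "C": "T", "D": "G"}
print(fitch_score(tree, states))
```

Summing this score over all characters gives the tree length that a parsimony search tries to minimise.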
MrBayes
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic
and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the
posterior distribution of model parameters.
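The idea behind MCMC can be shown with a toy Metropolis sampler for a single parameter. This is only a sketch and not MrBayes itself: it samples the bias of a coin under a uniform prior rather than trees and substitution models, and the function names, step size and data are all hypothetical.

```python
import math
import random


def log_posterior(p, heads, tails):
    """Unnormalised log posterior of a coin bias p under a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return heads * math.log(p) + tails * math.log(1.0 - p)


def metropolis(heads, tails, steps, rng, step_size=0.1):
    """Draw `steps` samples of p with a symmetric random-walk proposal."""
    p = 0.5
    current = log_posterior(p, heads, tails)
    samples = []
    for _ in range(steps):
        proposal = p + rng.uniform(-step_size, step_size)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            p, current = proposal, candidate
        samples.append(p)
    return samples


# Made-up data: 7 heads and 3 tails. The first samples are discarded as burn-in.
samples = metropolis(7, 3, steps=20000, rng=random.Random(1))
kept = samples[2000:]
print(sum(kept) / len(kept))
```

For this toy model the exact posterior is Beta(8, 4) with mean 8/12, so the printed average should land close to 0.67.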
RESULT
RESULT 2
RESULT 3