Computational Biology Lab File


SCHOOL OF BIOTECHNOLOGY

GAUTAM BUDDHA UNIVERSITY

COMPUTATIONAL
BIOLOGY LAB
RECORD
(BT-516)

VARNIT CHAUHAN
17/IBT/042
Computational Biology Lab

INDEX
Sr. No. Experiment

1. To explore features of PubMed, OMIM database.

2. To explore features of GenBank, RefSeq, UniProt.

3. To find the information about genes and proteins using the KEGG database.

4. To find out the information regarding motifs and domains in the given protein sequences using PROSITE.

5. To determine the Open Reading Frames in the given nucleotide sequences and translate them into protein.

6. To explore nucleotide BLAST.

7. To perform pairwise alignment.

8. To predict the genes in the given sequences.

9. To carry out the multiple sequence alignment using Clustal Omega and find the conserved regions.

10. To investigate the structure of the influenza virus neuraminidase protein using Swiss-PdbViewer.

11. To predict the 3D-structure of a protein having UniProt Accession No. Q9BZ11.

12. To predict the 3D-structure of a protein having UniProt Accession No. Q9BZ11.

13. To draw the 2D-structures of given molecules and convert them into 3D structures.

14. To find the binding pockets in the given protein.

15. To do docking of a given protein with ligand using SwissDock.

16. To draw phylogenetic tree using ngPhylogeny.

Varnit Chauhan(17/IBT/042) Computational Biology Lab(BT516)


EXPERIMENT-01

OBJECTIVE
To explore the features of the PubMed and OMIM databases.

DATABASES & TOOLS USED


PubMed, OMIM

ABOUT

PubMed
PubMed is an online database for searching and retrieving biomedical and life sciences
literature, with the aim of improving health both globally and personally.
The PubMed database contains more than 32 million citations and abstracts of biomedical
literature. It does not include full-text journal articles; however, links to the full text
are often present when it is available from other sources, such as the publisher's website
or PubMed Central (PMC).
PubMed has been available to the public online since 1996. It was developed and is
maintained by the National Center for Biotechnology Information (NCBI) at the U.S.
National Library of Medicine (NLM), located at the National Institutes of Health (NIH).

OMIM
Online Mendelian Inheritance in Man (OMIM) is a publicly accessible, authoritative database
of human genes and genetic phenotypes that is updated regularly. Its full-text, referenced
overviews cover all known Mendelian disorders and over 16,000 genes. OMIM focuses on the
relationship between phenotype and genotype.
This database was initiated in the early 1960s by Dr. Victor A. McKusick as a catalog of
mendelian traits and disorders, entitled Mendelian Inheritance in Man (MIM). Twelve book
editions of MIM were published between 1966 and 1998. The online version, OMIM, was
created in 1985 by a collaboration between the National Library of Medicine and the William
H. Welch Medical Library at Johns Hopkins. It was made generally available on the internet
starting in 1987. In 1995, OMIM was developed for the World Wide Web by NCBI, the
National Center for Biotechnology Information.

QUESTION-1

How many articles do you find for the epidemiology and pathogenesis of coronavirus
disease? Give 5 references.

Answer:
1. Jin Y, Yang H, Ji W, Wu W, Chen S, Zhang W, Duan G. Virology, Epidemiology,
Pathogenesis, and Control of COVID-19. Viruses. 2020 Mar 27;12(4):372. doi:
10.3390/v12040372. PMID: 32230900; PMCID: PMC7232198.

2. Amawi H, Abu Deiab GI, A Aljabali AA, Dua K, Tambuwala MM. COVID-19 pandemic: an
overview of epidemiology, pathogenesis, diagnostics and potential vaccines and
therapeutics. Ther Deliv. 2020 Apr;11(4):245-268. doi: 10.4155/tde-2020-0035. Epub
2020 May 12. PMID: 32397911; PMCID: PMC7222554.

3. Hu B, Guo H, Zhou P, Shi ZL. Characteristics of SARS-CoV-2 and COVID-19. Nat Rev
Microbiol. 2021 Mar;19(3):141-154. doi: 10.1038/s41579-020-00459-7. Epub 2020 Oct
6. PMID: 33024307; PMCID: PMC7537588.

4. Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC. Pathophysiology,
Transmission, Diagnosis, and Treatment of Coronavirus Disease 2019 (COVID-19): A
Review. JAMA. 2020 Aug 25;324(8):782-793. doi: 10.1001/jama.2020.12839. PMID:
32648899.

5. Ahn DG, Shin HJ, Kim MH, Lee S, Kim HS, Myoung J, Kim BT, Kim SJ. Current Status of
Epidemiology, Diagnosis, Therapeutics, and Vaccines for Novel Coronavirus Disease
2019 (COVID-19). J Microbiol Biotechnol. 2020 Mar 28;30(3):313-324. doi:
10.4014/jmb.2003.03011. PMID: 32238757.
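
The article count asked for in Question 1 can also be obtained programmatically. The following is a minimal sketch, assuming NCBI's E-utilities `esearch` endpoint with `retmode=json`. The helper names (`build_esearch_url`, `parse_count`, `count_pubmed_hits`) and the example query wording are illustrative assumptions, not part of the original exercise, and the count returned will change as PubMed grows.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, retmax=5):
    """Build an esearch URL that queries PubMed and asks for a JSON reply."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urllib.parse.urlencode(params)

def parse_count(response_text):
    """Return (total hit count, list of PMIDs) from an esearch JSON reply."""
    result = json.loads(response_text)["esearchresult"]
    return int(result["count"]), result["idlist"]

def count_pubmed_hits(term, retmax=5):
    """Run the search over the network (requires internet access)."""
    with urllib.request.urlopen(build_esearch_url(term, retmax)) as reply:
        return parse_count(reply.read().decode("utf-8"))
```

Calling `count_pubmed_hits("coronavirus disease epidemiology pathogenesis")` (a hypothetical query string) would return the total number of matching citations together with the first five PMIDs, which can then be looked up to build the reference list.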

QUESTION-2
What are the genes associated with alveolar cell carcinoma?
a) Name the genes, give their cytogenetic location, MIM number, and function. Is
alveolar carcinoma caused by a mutation in a single gene or multiple genes? What clinically
relevant information is provided for cystic fibrosis? Also, report the Genomic coordinates of
any five genes involved.

Answer:

Alveolar cell carcinoma is caused by mutations in multiple genes.

Clinically relevant information:
1) INHERITANCE
- Somatic mutation
- Autosomal dominant

2) RESPIRATORY (Lung)
- Alveolar cell carcinoma
- Non-small cell lung cancer
- Adenocarcinoma of the lung

3) NEOPLASIA
- Alveolar cell carcinoma
- Non-small cell lung cancer
- Adenocarcinoma of the lung

4) MISCELLANEOUS
- Genes associated with susceptibility to lung cancer, e.g., FASLG (134638.0002), FAS
(134637.0021), CHRNA5 (118505.0001), CHRNA3 (118503.0001)
- Genes associated with protection against lung cancer, e.g., CASP8 (601763.0004),
CYP2A6 (122720.0002)
- Mutations in EGFR (131550) are associated with altered response to treatment with
tyrosine kinase inhibitors

5) MOLECULAR BASIS
- Susceptibility conferred by a mutation in the epidermal growth factor receptor gene
(EGFR, 131550.0006)

EXPERIMENT-02

OBJECTIVE
To explore the features of GenBank, RefSeq, and UniProt.

DATABASES & TOOLS USED


GenBank, RefSeq, UniProt

ABOUT

GenBank

GenBank is a comprehensive public database of nucleotide sequences and supporting
bibliographic and biological annotation. GenBank is built and distributed by the National
Center for Biotechnology Information (NCBI), a division of the National Library of Medicine
(NLM), located on the campus of the US National Institutes of Health (NIH) in Bethesda, MD,
USA.

NCBI builds GenBank primarily from the submission of sequence data from authors and
from the bulk submission of expressed sequence tag (EST), genome survey sequence (GSS)
and other high-throughput data from sequencing centers. The US Patent and Trademark
Office (USPTO) also contributes sequences from issued patents. GenBank participates with the
European Molecular Biology Laboratory Nucleotide Sequence Database (EMBL) and the
DNA Databank of Japan (DDBJ) as a partner in the International Nucleotide Sequence
Database Collaboration (INSDC), which exchanges data daily to ensure that a uniform and
comprehensive collection of sequence information is available worldwide. NCBI makes the
GenBank data available at no cost over the Internet, through FTP and a wide range of
web-based retrieval and analysis services.

RefSeq
NCBI’s Reference Sequence (RefSeq) database is a collection of taxonomically diverse,
non-redundant and richly annotated sequences representing naturally occurring molecules
of DNA, RNA, and protein. Included are sequences from plasmids, organelles, viruses,
archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data
submitted to the International Nucleotide Sequence Database Collaboration (INSDC). Similar
to a review article, a RefSeq is a synthesis of information integrated across multiple sources
at a given time. RefSeqs provide a foundation for uniting sequence data with genetic and
functional information. They are generated to provide reference standards for multiple
purposes ranging from genome annotation to reporting locations of sequence variation in
medical records. The RefSeq collection is available without restriction and can be retrieved
in several different ways, such as by searching or by available links in NCBI resources,
including PubMed, Nucleotide, Protein, Gene, and Map Viewer, searching with a sequence
via BLAST, and downloading from the RefSeq FTP site.
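
RefSeq records can be told apart from ordinary GenBank/INSDC records by the form of the accession: RefSeq accessions begin with a two-letter prefix followed by an underscore (for example `NM_` for mRNA and `NP_` for protein). The sketch below illustrates this with a small, non-exhaustive prefix table. The function name `classify_accession` and the wording of the labels are assumptions made for illustration.

```python
import re

# A few common RefSeq prefixes (not a complete list).
REFSEQ_PREFIXES = {
    "NC_": "RefSeq genomic (complete molecule)",
    "NG_": "RefSeq genomic region",
    "NM_": "RefSeq mRNA",
    "NR_": "RefSeq non-coding RNA",
    "NP_": "RefSeq protein",
    "XM_": "RefSeq predicted mRNA (model)",
    "XR_": "RefSeq predicted non-coding RNA (model)",
    "XP_": "RefSeq predicted protein (model)",
}

# Two capital letters, an underscore, digits, and an optional ".version".
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")

def classify_accession(accession):
    """Label an accession as a RefSeq type or as a non-RefSeq record."""
    accession = accession.strip()
    if REFSEQ_PATTERN.match(accession):
        return REFSEQ_PREFIXES.get(accession[:3], "RefSeq (other prefix)")
    return "not a RefSeq accession (e.g. GenBank/EMBL/DDBJ)"
```

Applied to accessions used in this experiment, `classify_accession("NM_128434")` is labelled as RefSeq mRNA, while `classify_accession("X54078")` falls into the non-RefSeq group.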

UniProt

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence
and annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB),
the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt
consortium and host institutions EMBL-EBI, SIB and PIR are committed to the long-term
preservation of the UniProt databases.

UniProt is a collaboration between the European Bioinformatics Institute (EMBL-EBI), the
SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR). Across the
three institutes more than 100 people are involved through different tasks such as database
curation, software development and support.

QUESTION-1
Retrieve 5 DNA sequences and corresponding protein sequence each of
a) Histone H4
b) Insulin from 5 different organisms.

Answer:
Information about sequence-
Histone H4
SEQUENCE 1:
1. Accession no.- X54078
2. Title- T. nilotica gene for Histone H4
3. Organism- Oreochromis niloticus
4. Base pairs- 605bp
5. If gene, then gene name- N/A
6. Features: Taxon id, CDS, Protein id- 8128, P62796
7. Sequence in Fasta format/ Genbank format
>X54078.1 T.nilotica gene for histone H4
TTACCAATCGAACTTGGCCTGGTTCAAAAAACCTTACCATGTCTGGAAGAGGTAAAGGCGGCAAAGGACTCGGAAAAGGAGGC
GCCAAGCGTCACCGTAAGGTTCTCCGTGATAACATTCAGGGCATCACCAAACCAGCCATCCGTCGTCTGGCTCGCCGTGGTGG
CGTCAAGCGTATCTCTGGTCTGATCTACGAGGAGACCCGTGGTGTGTTGAAGGTGTTTCTGGAGAACGTCATCCGTGACGCCG
TCACCTACACTGAGCACGCCAAGAGGAAGACCGTGACCGCCATGGATGTGGTGTACGCTCTGAAGAGGCAGGGCCGCACTCTG
TACGGCTTCGGCGGTTAAACTCATGCTCCTTCATCCATCAAACGGCTCTTTTAAGAGCCACACACTTCACTTTAAGGGCTTTG

TTCTGGGGCCTTCTTGTGGGTAGGGGTTGGGTTGTTGTGTGTGGTGGTGTTGTTTTTTTGTTTTTTTTTTTTGTCTTACTACA
GATTTCTTGAAATAGAAATTTATAGTTAGGAAAATGTCTGGGTAATAACTTTACAATTTAAGTCACTCAGATTTTTATTCATC
TAGTAGTGATGACCAATGAAGCTT

PROTEIN SEQ.
>CAA38015.1 histone H4 [Oreochromis niloticus]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKVFLENVIRDAVTYTEHAKRKTVTAMD
VVYALKRQGRTLYGFGG

SEQUENCE 2:

1. Accession no.- X77806


2. Title- P.salina histone H4 gene.
3. Organism- Pyrenomonas salina
4. No. of Base pairs- 990
5. PubMed Ref- Müller SB, Rensing SA, Maier UG. The cryptomonad
histone H4-encoding gene: structure and chromosomal
localization. Gene. 1994 Dec 15;150(2):299-302. doi:
10.1016/0378-1119(94)90441-3. PMID: 7821795.
6. If gene, then gene name- N/A
7. Features: Taxon id, CDS, Protein id- 3034, Q43083
8. Sequence in Fasta format/ Genbank format
>X77806.1 P.salina histone H4 gene
AATCAGTAGGTGCTCTCAAGTTGATGATTCCAAATACGCGCCGGTATTTGCGTCGATGTTTGCGCGCGCACGGATCACAAGGC
AATATAAAGAAAGCATGGCTGGGAGCAGCACATCAAGGTACTCGCTGCCGTCTTCAACGATTTCGCGCGTTGCCCCTCAAAAC
ACAGAAGTTGCGGAAGGAAAGTGGTGTCACTTCCGGTCGCCATTTTTGATTTCGTCCGCTTCGGTTTGTGGAAATAGTTAACT
GATCCGAACCGCAGCGTCTTGACGTGGAGTTGACATTTCGATTTCGACAGAAGAGCCAACGACGCAGAGGTCACGAGCCCCAA
CCCGAACCCACCCGAAAAAACATTATGAACATACCACGTTCTCCGTATCGAACTTGGTATGACATTTGGACCAGCTAGCACAG
CCCACCCAACAAGAGACGGCACAAGCTTGCACGACAACCAACACAAACTATCACATGTCTGGACGTGGCAAAGGCGGTAAGGG
TCTCGGAAAGGGAGGAGCCAAGCGCCACAGGAAGGTTCTGCGTGACAACATCCAGGGCATCACCAAGCCTGCTATCCGTCGTC
TTGCTCGTCGTGGTGGTGTGAAGCGCATCTCTGGCCTCATCTATGAGGAGACGCGATCTGTCCTCAAGGTTTTCCTGGAGAAC
GTGATCCGCGACGCCGTGACCTACACTGAGCACGCGCGCAGGAAGACTGTCACTGCGATGGATGTGGTCTATGCACTCAAGAG
GCAGGGTCGCACGCTTTACGGCTTCGGTGGATAAGCAAGCGCGCTGTAAAAAGTGTTTGGCCAACTGGTGTTTCCTCAACACC
ATTGAGCTGCCTGGTGCGCTTTACGGCTGATTCGATTTCAGTGCGGCTAGCTTCGATCGTGTGTACAGAGATCTTTTCAACAG
CTACAGGCTGGTTCGGCGTTCTACATGTCTTCAACTAATAAAGACCTAATCAAACTCGATCCATTTGTCGTCTTCGG

PROTEIN SEQ.
>CAA54829.1 histone H4 [Pyrenomonas salina]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRSVLKVFLENVIRDAVTYTEHARRKTVTAMD
VVYALKRQGRTLYGFGG

SEQUENCE 3:

1. Accession no.- X79715


2. Title- L.temulentum mRNA for histone H4


3. Organism- Lolium temulentum
4. No. of Base pairs- 616
5. If gene, then gene name- N/A
6. Features: Taxon id, CDS, Protein id- 34176, P62887
7. Sequence in Fasta format/ Genbank format
>X79715.1 L.temulentum mRNA for histone H4
CATCCCACTCTCCCTCAGCTCAGCTCAAACCAGCCACCTCTCGTCAGCCATGTCGGGCCGCGGCAAGGGAGGAAAGGGGCTCG
GCAAGGGCGGCGCCAAGCGCCACCGCAAGGTCCTCCGCGACAACATCCAGGGCATCACCAAGCCGGCCATCCGCCGCCTGGCT
CGCCGCGGCGGCGTGAAGCGCATCTCCGGGCTCATCTACGAGGAGACCCGTGGCGTGCTCAAGATCTTCCTCGAGAACGTCAT
CCGCGACGCCGTCACCTACACCGAGCACGCCSGCCGCAAGACCGTCACCGCCATGGACGTAGTCTACGCGCTCAAGCGCCAGG
GTCGCACTCTCTACGGCTTCGGCGGCTAGAGGCCTCTCCTCCTCTTCGTCCGCCTAGAGTGCTCATGTAGTTCCTCATCTGGA
GGTGTAGAGATGTTGTTACCTTCTTCCAGTGAGTGCGTGCCGGAATCTAGTAGGTGTGGTAGGTGTTCTTGTTCGTGCTCTCA
TGCATTGTGTTAGTTTCTGTTCGTGTCTGATGTTACCCGCTTGTTATTATCAGTAATGAAAATGTTGGCTACCAGGCGGCCGC
GAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

PROTEIN SEQ
>CAA56154.1 histone H4 [Lolium temulentum]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRDAVTYTE
HAXRKTVTAMDVVYALKRQGRTLYGFGG

SEQUENCE 4:

1. Accession no. - NM_128434


2. Title- Arabidopsis thaliana histone H4 (HIS4), mRNA.
3. Organism- Arabidopsis thaliana
4. No. of base pairs- 715
5. Pubmed ref- Crevillén P, Gómez-Zambrano Á, López JA, Vázquez J,
Piñeiro M, Jarillo JA. Arabidopsis YAF9 histone readers modulate
flowering time through NuA4-complex-dependent H4 and H2A.Z
histone acetylation at FLC chromatin. New Phytol. 2019
Jun;222(4):1893-1908. doi: 10.1111/nph.15737. Epub 2019 Mar 13.
PMID: 30742710.
6. If gene, gene names- At1g07660, F24B9.25
At1g07820, F24B9.8
At2g28740, F8N16.2, T11P11.4
At3g45930, F16L2_140
At3g46320, F18L15.40
At3g53730, F5K20_30
7. Features: Taxon Id, CDS, Protein ID- 3702, P59259
8. Sequence in FASTA/ Genbank format
>NM_128434.4 Arabidopsis thaliana histone H4 (HIS4), mRNA

TCACGATTCATTTTCTAACCCAAACAATCTTCTTTATAAATAAAAAATAAATTCTTCTCTACTCATCAGAAAACTCAAATCTTAAA
ACTTTCTGGAAAAACAAAAATGTCAGGAAGAGGAAAAGGAGGAAAAGGGTTAGGCAAAGGAGGAGCAAAGAGACACAGAAAGGTTC
TAAGAGACAACATTCAAGGAATCACAAAGCCAGCGATTCGTCGTCTTGCTCGTAGAGGAGGTGTGAAGAGAATCAGTGGATTGATC
TATGAAGAAACGAGAGGTGTGTTGAAGATTTTTCTGGAGAATGTGATTAGAGATGCTGTTACTTACACTGAGCATGCGAGGAGGAA
GACGGTGACTGCTATGGATGTTGTTTATGCCTTGAAGAGACAAGGAAGAACTCTATATGGATTTGGTGGTTGATCAATTTGAGATC
TGGGTTTTCTGGTGAATGATGATGATTTAAGTCTTGCGATCAAGAAATTCCAGAAATTGGGTTGAATTTTAGGGTTTCGTTTTGTG
TTGTAATTAGGGCAGCATTGTAATGGATTAATGATAAGTACCATTTGCTCTAATTACTCTTTAATCTCTGAAATTCATGGTAAAGG
ATTATCAATCGAAAACTAATCAAAGGAATTGATTGAACTATGTTTTTGAAGATTGAAAAACAAATAAGGACTAAATGTGAGCAATT
TAAAGTTTAGAGGTATAAGAATCGAAT

PROTEIN SEQ
>NP_180441.1 histone H4 [Arabidopsis thaliana]
MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRDAVTYTEHARRKTVTAMD
VVYALKRQGRTLYGFGG

SEQUENCE 5:

1. Accession no. AB050889


2. Title- Citrus jambhiri mRNA for histone H4, partial cds.
3. Organism- Citrus jambhiri
4. No. of bp- 308
5. Features: Taxon ID, CDS, Protein ID- 64884, Q948T8
>AB050889.1 Citrus jambhiri mRNA for histone H4, partial cds
ATGTCAGGGCGGGGGAAGGGAGGCAAGGGATTGGGAAAGGGAGGCGCCAAGCGTCACCGTAAGGTGCTTCGCGATAACATTCAGGG
TATCACAAAGCCAGCAATCCGGCGTTTGGCTCGTAGAGGTGGAGTCAAGCGTATCAGTGGCTTGATCTACGAGGAGACACGTGGCG
TCCTGAAGATATTCTTGGAGAACGTCATCCGTGACGCCGTGACCTACACTGAGCACGCCAGGAGAAAGACCGTGACCGCGATGGAC
GTCGTTTACGCTCTCAAGAGGCAGGGCAGGACTCTTTATGGATTTGGGGG

PROTEIN SEQ

>BAB71814.1 histone H4, partial [Citrus jambhiri]


MSGRGKGGKGLGKGGAKRHRKVLRDNIQGITKPAIRRLARRGGVKRISGLIYEETRGVLKIFLENVIRDAVTYTEHARRKTVTAMD
VVYALKRQGRTLYGFG
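
Each protein sequence above can be cross-checked against its nucleotide record by translating the coding region. The sketch below is a minimal translator under stated assumptions: it uses the standard genetic code, starts at the first base it is given (so the caller must pass a sequence that begins at the start codon), and reports any codon containing an ambiguity code as `X`. The names `CODON_TABLE` and `translate` are illustrative.

```python
# Standard genetic code, enumerated in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, stop_at_stop=True):
    """Translate a DNA string in reading frame 1 into one-letter amino acids."""
    dna = "".join(dna.split()).upper()  # drop line breaks and spaces
    protein = []
    for start in range(0, len(dna) - 2, 3):
        # Codons with ambiguity codes (e.g. S, N) are not in the table.
        amino_acid = CODON_TABLE.get(dna[start:start + 3], "X")
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, the first codons of AB050889.1, `ATGTCAGGGCGGGGGAAGGGA`, translate to `MSGRGKG`, which matches the start of the BAB71814.1 protein listed above. The ambiguous base `S` in the X79715.1 record is likewise consistent with the `X` in its protein sequence.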

INSULIN

SEQUENCE 1:
1. DEFINITION Octodon degus insulin mRNA, complete cds.
2. ACCESSION M57671
3. SOURCE Octodon degus (degu)
4. ORGANISM Octodon degus
5. No of base pairs 432 bp
6. REFERENCE 1 (bases 1 to 432)
Nishi, M. and Steiner, D.F. Cloning of complementary DNAs encoding
islet amyloid polypeptide, insulin, and glucagon precursors from
a New World rodent, the degu, Octodon degus. Mol. Endocrinol. 4
(8), 1192-1198 (1990)
7. Sequence in FASTA/ Genbank format

>M57671.1 Octodon degus insulin mRNA, complete cds


GCATTCTGAGGCATTCTCTAACAGGTTCTCGACCCTCCGCCATGGCCCCGTGGATGCATCTCCTCACCGTGCTGGCCCTGCTGGCC
CTCTGGGGACCCAACTCTGTTCAGGCCTATTCCAGCCAGCACCTGTGCGGCTCCAACCTAGTGGAGGCACTGTACATGACATGTGG
ACGGAGTGGCTTCTATAGACCCCACGACCGCCGAGAGCTGGAGGACCTCCAGGTGGAGCAGGCAGAACTGGGTCTGGAGGCAGGCG
GCCTGCAGCCTTCGGCCCTGGAGATGATTCTGCAGAAGCGCGGCATTGTGGATCAGTGCTGTAATAACATTTGCACATTTAACCAG
CTGCAGAACTACTGCAATGTCCCTTAGACACCTGCCTTGGGCCTGGCCTGCTGCTCTGCCCTGGCAACCAATAAACCCCTTGAATG
AG

PROTEIN SEQ

>AAA40590.1 insulin [Octodon degus]


MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLEAGGLQPSALEMILQKR
GIVDQCCNNICTFNQLQNYCNVP

SEQUENCE 2:

1. DEFINITION Aplysia californica insulin precursor (PIN), mRNA.


2. ACCESSION NM_001204686
3. SOURCE Aplysia californica (California sea hare)
4. ORGANISM Aplysia californica
5. No of base pairs 968 bp
6. Sequence in FASTA/ Genbank format

>NM_001204686.1 Aplysia californica insulin precursor (PIN), mRNA


CCTGAATATAGCCAACTAAATTCTAGGAACTCTAAGAGGACTACGCTTGTCTCCAACATCTTATCGTCAACATCTTCTGCAAGCGA
TAACTATATTTCTGGTCCGCCAAAGTAGTATACGCTAAGAACAAGAGGAAGAGAGTCGTAAGGTTTTTTATTCCCAGCCGGCGAGA
GCAGAAACTGTTGTTCTAGCTGCCTTTCTGGTCTTAACAGGACCATTTTGCTGGCCAGTGAAAAACTAACTCGGGTGAAACAACAT
TGGTGCTACCAGCCTCTCCTGACTGTTCCAACGGTGCCTTCTCGTAGCCAGAATGAGCAAGTTCCTCCTCCAGAGCCACTCCGCCA
ACGCCTGCCTGCTCACCCTTCTGCTCACGCTGGCCTCCAACCTCGACATATCCCTGGCCAACTTCGAGCACTCGTGCAACGGCTAC
ATGCGGCCCCACCCGCGGGGTCTGTGCGGCGAAGACCTGCACGTCATCATTTCCAACCTGTGCAGCTCTCTGGGGGGCAACAGGAG
GTTCCTGGCCAAGTACATGGTCAAAAGAGACACGGAAAATGTGAACGACAAGTTACGAGGGATCCTGCTCAATAAGAAAGAAGCTT
TCTCCTACTTGACCAAGAGAGAGGCCTCAGGCTCCATCACATGCGAATGTTGCTTCAACCAGTGTCGGATATTTGAGCTGGCTCAG
TACTGCCGTCTGCCAGACCATTTCTTCTCCAGAATATCCAGAACCGGAAGGAGCAACAGTGGACATGCGCAGTTGGAGGACAACTT
TAGTTAGACATGTTGAGGGCGTAAATGCTTTTAAAATTTTTAATTTGGTGATTATTATTATAAAGGAGGAGTCCACGTGGTGTCAG
ATTTAGCGGGTTTTTTCCACGTGTTTGACTAAAGTTTCCAGATTTATTTCATACCAGCGATACCCGCAGGAATAGAAGGTCCCCTA
AGAAGCTGAAGGCATTATTG

PROTEIN SEQ

>NP_001191615.1 insulin precursor [Aplysia californica]


MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRFLAKYMVKRDTENVNDK
LRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISRTGRSNSGHAQLEDNFS

SEQUENCE 3:

1. DEFINITION Human alpha-type insulin gene and 5' flanking polymorphic region.
2. ACCESSION M10039
3. SOURCE Homo sapiens (human)
4. ORGANISM Homo sapiens
5. No of base pairs 3943bp
6. Sequence in FASTA/ Genbank format
>M10039.1 Human alpha-type insulin gene and 5' flanking polymorphic region
CTGGGGCTGCTGTCCTAAGGCAGGGTGGGAACTAGGCAGCCAGCAGGGAGGGGACCCCTCCCTCACTCCCACTCTCCCACCCCCAC
CACCTTGGCCCATCCATGGCGGCATCTTGGGCCATCCGGGACTGGGGACAGGGGTCCTGGGGACAGGGGTCCGGGGACAGGGTCCT
GGGGACAGGGGTGTGAGGACAGGGGTCCTGGGGACAGGGGTGTGGGGACAGGGGTGTGAGGACAGGGGTCCCGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGG
GGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGG
ACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGA
CAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACA
GGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGG
GTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGT
GTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGG
GGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGTGGACAGGGGTGTGGG
GATAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTCCCGGG
GACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGG
ACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGAC
AGGGGTCCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAG
GGCTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGG
GGTCCGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGG
TGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGG
TGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGT
GTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTG
TGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTG
GGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGCAGCGCAAAGAGCCCCGCCCTGCAGCCTCCAGCTCTCCTGGTCTAATGTG
GAAAGTGGCCCAGGTGAGGGCTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAGGGACAGGTCTGGCCACCGGGCCCCTG
GTTAAGACTCTAATGACCCGCTGGTCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGCACCAGGGAAATG
GTCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCCGACCCCCCCACCCCAGGCCCTAATGGGCCAGGCGGCAGGGGTTGACAGG
TAGGGGAGATGGGCTCTGAGACTATAAAGCCAGCGGGGGCCCAGCAGCCCTCAGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCA
TCAAGCAGGTCTGTTCCAAGGGCCTTTGCGTCAGGTGGGCTCAGGGTTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAG
GGAGGACGTGGCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACCTGGCCTTCAGCCTGCCTCAGCC
CTGCCTGTCTCCCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGA
CCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGG
CTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCTGCCCCTGGCCGCCCCCAG
CCACCCCCTGCTCCTGGCGCTCCCACCCAGCATGGGCAGAAGGGGGCAGGAGGCTGCCACCCAGCAGGGGGTCAGGTGCACTTTTT
TAAAAAGAAGTTCTCTTGGTCACGTCCTAAAAGTGACCAGCTCCCTGTGGCCCAGTCAGAATCTCAGCCTGAGGACGGTGTTGGCT
TCGGCAGCCCCGAGATACATCAGAGGGTGGGCACGCTCCTCCCTCCACTCGCCCCTCAAACAAATGCCCCGCAGCCCATTTCTCCA
CCCTCATTTGATGACCGCAGATTCAAGTGTTTTGTTAAGTAAAGTCCTGGGTGACCTGGGGTCACAGGGTGCCCCACGCTGCCTGC
CTCTGGGCGAACACCCCATCACGCCCGGAGGAGGGCGTGGCTGCCTGCCTGAGTGGGCCAGACCCCTGTCGCCAGGCCTCACGGCA
GCTCCATAGTCAGGAGATGGGGAAGATGCTGGGGACAGGCCCTGGGGAGAAGTACTGGGATCACCTGTTCAGGCTCCCACTGTGAC
GCTGCCCCGGGGCGGGGGAAGGAGGTGGGACATGTGGGCGTTGGGGCCTGTAGGTCCACACCCACTGTGGGTGACCCTCCCTCTAA
CCTGGGTCCAGCCCGGCTGGAGATGGGTGGGAGTGTGACCTAGGGCTGGCGGGCAGGCGGGCACTGTGTCTCCCTGACTGTGTCCT
CCTGTGTCCCTCTGCCTCGCCGCTGTTCCGGAACCTGCTCTGCGCGGCACGTCCTGGCAGTGGGGCAGGTGGAGCTGGGCGGGGGC
CCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTG

CTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGCCTCCTGCACCGAGAGAGATG
GAATAAAGCCCTTGAACCAGCCCTGCTGTGCCGTCTGTGTGTCTTGGGGGCCCTGGGCCAAGCCCCACTTCCC

PROTEIN SEQ

>AAA59173.1 insulin [Homo sapiens]


MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSL
QKRGIVEQCCTSICSLYQLENYCN

SEQUENCE 4:

1. DEFINITION Guinea pig insulin gene, complete cds.


2. ACCESSION K02233
3. SOURCE Cavia porcellus (domestic guinea pig)
4. ORGANISM Cavia porcellus
5. No of base pairs 1472bp
6. Sequence in FASTA/ Genbank format

>M10039.1 Human alpha-type insulin gene and 5' flanking polymorphic region
CTGGGGCTGCTGTCCTAAGGCAGGGTGGGAACTAGGCAGCCAGCAGGGAGGGGACCCCTCCCTCACTCCCACTCTCCCACCCCCAC
CACCTTGGCCCATCCATGGCGGCATCTTGGGCCATCCGGGACTGGGGACAGGGGTCCTGGGGACAGGGGTCCGGGGACAGGGTCCT
GGGGACAGGGGTGTGAGGACAGGGGTCCTGGGGACAGGGGTGTGGGGACAGGGGTGTGAGGACAGGGGTCCCGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGG
GGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGG
ACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGA
CAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACA
GGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGG
GTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGT
GTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGG
GGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGTGGACAGGGGTGTGGG
GATAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTCCCGGG
GACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGG
ACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGAC
AGGGGTCCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAG
GGCTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGG
GGTCCGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGG
TGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGG
TGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGT
GTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTG
TGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTG
GGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGCAGCGCAAAGAGCCCCGCCCTGCAGCCTCCAGCTCTCCTGGTCTAATGTG
GAAAGTGGCCCAGGTGAGGGCTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAGGGACAGGTCTGGCCACCGGGCCCCTG
GTTAAGACTCTAATGACCCGCTGGTCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGCACCAGGGAAATG
GTCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCCGACCCCCCCACCCCAGGCCCTAATGGGCCAGGCGGCAGGGGTTGACAGG
TAGGGGAGATGGGCTCTGAGACTATAAAGCCAGCGGGGGCCCAGCAGCCCTCAGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCA
TCAAGCAGGTCTGTTCCAAGGGCCTTTGCGTCAGGTGGGCTCAGGGTTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAG
GGAGGACGTGGCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACCTGGCCTTCAGCCTGCCTCAGCC
CTGCCTGTCTCCCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGA
CCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGG

CTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCTGCCCCTGGCCGCCCCCAG
CCACCCCCTGCTCCTGGCGCTCCCACCCAGCATGGGCAGAAGGGGGCAGGAGGCTGCCACCCAGCAGGGGGTCAGGTGCACTTTTT
TAAAAAGAAGTTCTCTTGGTCACGTCCTAAAAGTGACCAGCTCCCTGTGGCCCAGTCAGAATCTCAGCCTGAGGACGGTGTTGGCT
TCGGCAGCCCCGAGATACATCAGAGGGTGGGCACGCTCCTCCCTCCACTCGCCCCTCAAACAAATGCCCCGCAGCCCATTTCTCCA
CCCTCATTTGATGACCGCAGATTCAAGTGTTTTGTTAAGTAAAGTCCTGGGTGACCTGGGGTCACAGGGTGCCCCACGCTGCCTGC
CTCTGGGCGAACACCCCATCACGCCCGGAGGAGGGCGTGGCTGCCTGCCTGAGTGGGCCAGACCCCTGTCGCCAGGCCTCACGGCA
GCTCCATAGTCAGGAGATGGGGAAGATGCTGGGGACAGGCCCTGGGGAGAAGTACTGGGATCACCTGTTCAGGCTCCCACTGTGAC
GCTGCCCCGGGGCGGGGGAAGGAGGTGGGACATGTGGGCGTTGGGGCCTGTAGGTCCACACCCACTGTGGGTGACCCTCCCTCTAA
CCTGGGTCCAGCCCGGCTGGAGATGGGTGGGAGTGTGACCTAGGGCTGGCGGGCAGGCGGGCACTGTGTCTCCCTGACTGTGTCCT
CCTGTGTCCCTCTGCCTCGCCGCTGTTCCGGAACCTGCTCTGCGCGGCACGTCCTGGCAGTGGGGCAGGTGGAGCTGGGCGGGGGC
CCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTG
CTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGCCTCCTGCACCGAGAGAGATG
GAATAAAGCCCTTGAACCAGCCCTGCTGTGCCGTCTGTGTGTCTTGGGGGCCCTGGGCCAAGCCCCACTTCCC

PROTEIN SEQ
>AAA37041.1 insulin [Cavia porcellus]
MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELEDPQVEQTELGMGLGAGGLQPLALEMAL
QKRGIVDQCCTGTCTRHQLQSYCN

SEQUENCE 5:

1. DEFINITION Human alpha-type insulin gene and 5' flanking polymorphic region.
2. ACCESSION M10039
3. SOURCE Homo sapiens (human)
4. ORGANISM Homo sapiens
5. No of base pairs 3943 bp
6. Sequence in FASTA/ Genbank format

>M10039.1 Human alpha-type insulin gene and 5' flanking polymorphic region
CTGGGGCTGCTGTCCTAAGGCAGGGTGGGAACTAGGCAGCCAGCAGGGAGGGGACCCCTCCCTCACTCCC
ACTCTCCCACCCCCACCACCTTGGCCCATCCATGGCGGCATCTTGGGCCATCCGGGACTGGGGACAGGGG
TCCTGGGGACAGGGGTCCGGGGACAGGGTCCTGGGGACAGGGGTGTGAGGACAGGGGTCCTGGGGACAGG
GGTGTGGGGACAGGGGTGTGAGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAG
GGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAG
GGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACA
GGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACA
GGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACA
GGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGAC
AGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGAC
AGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGAC
AGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGA
CAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGTGGACAGGGGTGTGGGGA
TAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGATAGGGGTGTGGGG

ACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGACAGGGGTGTGGG
GACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGACAGGGGTGTGGGGATAGGGGTGTGG
GGACAGGGGTGTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTGTG
GGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGT
GGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTGT
GGGGACAGGGCTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAGGGGTG
TGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTGTGGGGACAGGGGTCCGGGGACAGGGGTG
TGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGT
CTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGGGTGTGGGGACAGGGG
TGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCCGGGGACAGGG
GTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGG
GGTCTGGGGACAGGGGTGTGGGGACAGGGGTCTGGGGATAGGGGTGTGGGGACAGGGGTGTGGGGACAGG
GGTGTGGGGACAGGGGTGTGGGGACAGGGGTGTGGGGACAGGGGTCCTGGGGACAGGGGTCTGGGGACAG
CAGCGCAAAGAGCCCCGCCCTGCAGCCTCCAGCTCTCCTGGTCTAATGTGGAAAGTGGCCCAGGTGAGGG
CTTTGCTCTCCTGGAGACATTTGCCCCCAGCTGTGAGCAGGGACAGGTCTGGCCACCGGGCCCCTGGTTA
AGACTCTAATGACCCGCTGGTCCTGAGGAAGAGGTGCTGACGACCAAGGAGATCTTCCCACAGACCCAGC
ACCAGGGAAATGGTCCGGAAATTGCAGCCTCAGCCCCCAGCCATCTGCCGACCCCCCCACCCCAGGCCCT
AATGGGCCAGGCGGCAGGGGTTGACAGGTAGGGGAGATGGGCTCTGAGACTATAAAGCCAGCGGGGGCCC
AGCAGCCCTCAGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGGTCTGTTCCAAGGGCCTT
TGCGTCAGGTGGGCTCAGGGTTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAGGGAGGACGTG
GCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACCTGGCCTTCAGCCTGCC
TCAGCCCTGCCTGTCTCCCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCT
GGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACAC
CTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGG
CAGAGGACCTGCAGGGTGAGCCAACCGCCCATTGCTGCCCCTGGCCGCCCCCAGCCACCCCCTGCTCCTG
GCGCTCCCACCCAGCATGGGCAGAAGGGGGCAGGAGGCTGCCACCCAGCAGGGGGTCAGGTGCACTTTTT
TAAAAAGAAGTTCTCTTGGTCACGTCCTAAAAGTGACCAGCTCCCTGTGGCCCAGTCAGAATCTCAGCCT
GAGGACGGTGTTGGCTTCGGCAGCCCCGAGATACATCAGAGGGTGGGCACGCTCCTCCCTCCACTCGCCC
CTCAAACAAATGCCCCGCAGCCCATTTCTCCACCCTCATTTGATGACCGCAGATTCAAGTGTTTTGTTAA
GTAAAGTCCTGGGTGACCTGGGGTCACAGGGTGCCCCACGCTGCCTGCCTCTGGGCGAACACCCCATCAC
GCCCGGAGGAGGGCGTGGCTGCCTGCCTGAGTGGGCCAGACCCCTGTCGCCAGGCCTCACGGCAGCTCCA
TAGTCAGGAGATGGGGAAGATGCTGGGGACAGGCCCTGGGGAGAAGTACTGGGATCACCTGTTCAGGCTC
CCACTGTGACGCTGCCCCGGGGCGGGGGAAGGAGGTGGGACATGTGGGCGTTGGGGCCTGTAGGTCCACA
CCCACTGTGGGTGACCCTCCCTCTAACCTGGGTCCAGCCCGGCTGGAGATGGGTGGGAGTGTGACCTAGG
GCTGGCGGGCAGGCGGGCACTGTGTCTCCCTGACTGTGTCCTCCTGTGTCCCTCTGCCTCGCCGCTGTTC
CGGAACCTGCTCTGCGCGGCACGTCCTGGCAGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGC
AGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCA
TCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCACACCCGCCGC
CTCCTGCACCGAGAGAGATGGAATAAAGCCCTTGAACCAGCCCTGCTGTGCCGTCTGTGTGTCTTGGGGG
CCCTGGGCCAAGCCCCACTTCCC

PROTEIN SEQ

>AAA59173.1 insulin [Homo sapiens]


MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN


EXPERIMENT-03

OBJECTIVE
To find the information about genes and proteins using the KEGG database.

DATABASES & TOOLS USED


KEGG

ABOUT

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases
dealing with genomes, biological pathways, diseases, drugs, and chemical substances.
KEGG is used for bioinformatics research and education, including data analysis in
genomics, metagenomics, metabolomics and other omics studies, modeling and simulation
in systems biology, and translational research in drug development.

QUESTIONS

● Perform a quick search to find pathways associated with the APC gene. Examine the
KEGG GENE record for human APC.

Answer:

hsa04310 Wnt signaling pathway

hsa04390 Hippo signaling pathway

hsa04550 Signaling pathways regulating pluripotency of stem cells

hsa04810 Regulation of actin cytoskeleton

hsa04934 Cushing syndrome

hsa05010 Alzheimer disease

hsa05022 Pathways of neurodegeneration - multiple diseases

hsa05165 Human papillomavirus infection

hsa05200 Pathways in cancer

hsa05206 MicroRNAs in cancer

hsa05210 Colorectal cancer


hsa05213 Endometrial cancer

hsa05217 Basal cell carcinoma

hsa05224 Breast cancer

hsa05225 Hepatocellular carcinoma

hsa05226 Gastric Cancer

KEGG GENE record for human APC (gene symbols): APC, BTPS2, DESMD, DP2, DP2.5, DP3, GS,
PPP1R46

● Examine the WNT pathway page and the location of APC in this diagram.

Answer:


● Determine the upstream membrane protein that interacts with Wnt to trigger this
pathway. Find more information about this protein in KEGG.

Answer:

Frizzled is the upstream membrane protein that interacts with Wnt to trigger this
pathway.

● Find genes of interest relating to Asthma. Collect the symbols for genes involved in
Mast cell communication with eosinophils and epithelial cells.

Answer:

IL4 (polymorphism)

IL4RA (polymorphism)

IL13 (polymorphism)

FCER1B (polymorphism)

TNFA (polymorphism)

ADAM33 (polymorphism)

CD14 (polymorphism)

HLA-DRB1 (polymorphism)

HLA-DQB1 (polymorphism)

ADRB2 (polymorphism)


EXPERIMENT-04

OBJECTIVE

To find out the information regarding motifs and domains in the given protein sequences
using PROSITE.

DATABASES & TOOLS USED

PROSITE

ABOUT

PROSITE is a database of protein families and domains. It is based on the observation that,
while there is a huge number of different proteins, most of them can be grouped, based on
similarities in their sequences, into a limited number of families. Proteins or protein domains
belonging to a particular family generally share functional attributes and are derived from a
common ancestor.

It is apparent, when studying protein sequence families, that some regions have been better
conserved than others during evolution. These regions are generally important for the
function of a protein and/or for the maintenance of its three-dimensional structure. By
analyzing the constant and variable properties of such groups of similar sequences, it is
possible to derive a signature for a protein family or domain, which distinguishes its
members from all other unrelated proteins. A pertinent analogy is the use of fingerprints by
the police for identification purposes. A fingerprint is generally sufficient to identify a given
individual. Similarly, a protein signature can be used to assign a newly sequenced protein to
a specific family of proteins and thus to formulate hypotheses about its function.
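
The idea of a signature can be made concrete with a short sketch. The Python below
converts a PROSITE-style pattern into a regular expression and searches a sequence
fragment with it. The pattern string is quoted from memory as the trypsin histidine
active-site signature and should be checked against the PROSITE entry; the helper name
`prosite_to_regex` is hypothetical, and the sketch ignores the `<` and `>` anchors that
real patterns may carry. The fragment is residues 817-829 of Seq1 from Question 1.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        # Optional repeat count such as x(2) or x(2,4).
        repeat = ""
        if "(" in element:
            element, count = element.rstrip(")").split("(")
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # any residue except those listed
        else:
            core = element                      # single residue or [ABC] set
        regex.append(core + repeat)
    return "".join(regex)


# Assumed pattern for the trypsin histidine active site (verify against PROSITE).
TRYPSIN_HIS = "[LIVM]-[ST]-A-[STAG]-H-C"
fragment = "SDWLVSAAHCVYG"   # residues 817-829 of Seq1
fragment_start = 817

match = re.search(prosite_to_regex(TRYPSIN_HIS), fragment)
hit = (fragment_start + match.start(), fragment_start + match.end() - 1)
print(match.group(), hit)
```

With this fragment the match covers residues 821-826, the same range reported for
TRYPSIN_HIS in the answers to Question 1.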

QUESTION-1

Consider the following amino acid sequence

>sp|Seq1
MGSKRGISSRHHSLSSYEIMFAALFAILVVLCAGLIAVSCLTIKESQRGAALGQSHEARA
TFKITSGVTYNPNLQDKLSVDFKVLAFDLQQMIDEIFLSSNLKNEYKNSRVLQFENGSII
VVFDLFFAQWVSDENVKEELIQGLEANKSSQLVTFHIDLNSVDILDKLTTTSHLATPGNV
SIECLPGSSPCTDALTCIKADLFCDGEVNCPDGSDEDNKMCATVCDGRFLLTGSSGSFQA
THYPKPSETSVVCQWIIRVNQGLSIKLSFDDFNTYYTDILDIYEGVGSSKILRASIWETN
PGTIRIFSNQVTATFLIESDESDYVGFNATYTAFNSSELNNYEKINCNFEDGFCFWVQDL
NDDNEWERIQGSTFSPFTGPNFDHTFGNASGFYISTPTGPGGRQERVGLLSLPLDPTLEP
ACLSFWYHMYGENVHKLSINISNDQNMEKTVFQKEGNYGDNWNYGQVTLNETVKFKVAFN
AFKNKILSDIALDDISLTYGICNGSLYPEPTLVPTPPPELPTDCGGPFELWEPNTTFSST
NFPNSYPNLAFCVWILNAQKGKNIQLHFQEFDLENINDVVEIRDGEEADSLLLAVYTGPG
PVKDVFSTTNRMTVLLITNDVLARGGFKANFTTGYHLGIPEPCKADHFQCKNGECVPLVN
LCDGHLHCEDGSDEADCVRFFNGTTNNNGLVRFRIQSIWHTACAENWTTQISNDVCQLLG
LGSGNSSKPIFPTDGGPFVKLNTAPDGHLILTPSQQCLQDSLIRLQCNHKSCGKKLAAQD
ITPKIVGGSNAKEGAWPWVVGLYYGGRLLCGASLVSSDWLVSAAHCVYGRNLEPSKWTAI
LGLHMKSNLTSPQTVPRLIDEIVINPHYNRRRKDNDIAMMHLEFKVNYTDYIQPICLPEE
NQVFPPGRNCSIAGWGTVVYQGTTANILQEADVPLLSNERCQQQMPEYNITENMICAGYE
EGGIDSCQGDSGGPLMCQENNRWFLAGVTSFGYKCALPNRPGVYARVSRFTEWIQSFLH

● How many motifs/domains and patterns are found?


Answer:
HITS BY PROFILE
❏ SEA
❏ LDLRA_2
❏ CUB
❏ MAM_2
❏ SRCR_2
❏ TRYPSIN_DOM
HITS BY PATTERN
❏ LDLRA_1
❏ MAM_1
❏ TRYPSIN_HIS
❏ TRYPSIN_SER

● Give the range of amino acids in the sequence where these domains/motifs occur.
Answer:
❏ SEA: 54-169
❏ LDLRA_2: 183-222 and 642-678
❏ CUB: 225-334 and 524-634
❏ MAM_2: 345-504
❏ SRCR_2: 678-788
❏ TRYPSIN_DOM: 785-1019
❏ LDLRA_1: 199-221 and 655-677
❏ MAM_1: 391-431
❏ TRYPSIN_HIS: 821-826
❏ TRYPSIN_SER: 965-976

● To which family do these motifs/patterns belong?


Answer:
SEA-


● What is the function of these domains/motifs?


Answer:
❏ SEA: membrane-protective function and a role in ligand-receptor interactions.
❏ LDLRA_2: LDL-receptor class A domain; in the LDL receptor these repeats bind the
cholesterol-carrying lipoproteins of plasma.
❏ CUB: an extracellular domain found in regulatory proteins.

QUESTION-02
What information does PROSITE have for P54886 (UniProt accession number)?
● How many hits/results are returned?
Answer:
2 hits are obtained

● Where in the sequence is this region?


Answer:
301-318, 678-699

● Which protein family does P54886 belong to according to PROSITE?


Answer:
Dehydrogenase family


EXPERIMENT-05

OBJECTIVE
To determine the Open Reading Frames in the given nucleotide sequences and translate
them into protein.

DATABASES & TOOLS USED


ORF finder

ABOUT
ORF
An open reading frame (ORF) is a portion of a DNA molecule that, when translated into
amino acids, contains no stop codons. The genetic code reads DNA sequences in groups of
three bases (codons), which means that a double-stranded DNA molecule can be read in any
of six possible reading frames: three in the forward direction and three in the reverse.
A long open reading frame is likely part of a gene.
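
The six-frame idea can be sketched in a few lines of Python. This is a minimal
illustration, not the algorithm used by NCBI ORF finder: it assumes an ORF starts at ATG
and ends at the first in-frame stop codon of the standard genetic code, and the function
names and the toy sequence are hypothetical.

```python
# Minimal six-frame ORF scan (illustrative; not the NCBI ORF finder algorithm).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Yield (strand, frame, start, end, codons) for ATG...stop ORFs.

    start and end are 0-based offsets on the scanned strand; end is exclusive
    and includes the stop codon. codons counts the ATG but not the stop.
    """
    seq = seq.upper()
    for strand, dna in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(dna) - 2, 3):
                codon = dna[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    codons = (i - start) // 3
                    if codons >= min_codons:
                        yield (strand, frame + 1, start, i + 3, codons)
                    start = None


toy = "CCATGGCCTAAGG"  # hypothetical toy sequence, not from the lab record
orfs = list(find_orfs(toy))
print(orfs)
```

For a real record such as X00371.1 the sequence would be read from a FASTA file and
`min_codons` raised so that only long ORFs are reported.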

QUESTIONS
● Retrieve the human myoglobin DNA sequence (Accession number: X00371.1). Can
you identify the Open Reading Frame (ORF)? Once you have determined the ORF of
the human myoglobin gene, translate it to the amino acid sequence.

Answer:
Accession Number and description of query sequence - X00371.1
Number of ORFs found - 31
Genetic Code table Used- Standard
Details of all ORFs in Tabular Form (Frame, Start position, End position, number of base
pairs, number of amino acids)


QUESTION

Obtain the genomic DNA of a human hexokinase and translate it into protein. What's the
result?

Answer:
Translated into amino acid-
MGGWFCKEREWRMISLILHPCILLQSIQVFNHLSIYPLVHPVTHPSIHLS
SLNSSIHHLFVYPCIHSSIHPPPIHPTSINPSIYPTSLNSSIHLSIISLS
IQAFIHLSIHPSIQNPPVPPSIHLSIHPCVHPSIPQFIHPPIIPPFIHPP
IPSFIHSLSLHSSSLPPSIYPSITHPSLHLSICSSIPPSIHPPIPSFIHS
LSLHSSSPLHPSIHPSSTHPSIYPSIHL

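NOTE: NC_000010.11 is the reference sequence of the whole of human chromosome 10, and
the hexokinase gene lies at positions 69269991..69401882 (about 132 kb). ORF finder
accepts query sequences only up to 50,000 nucleotides, so it cannot process this region
in a single run and cannot be used to solve this problem in full.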


EXPERIMENT-06
OBJECTIVE
To explore nucleotide blast.

TOOL USED
BLAST

DESCRIPTION
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between
sequences. The program compares nucleotide or protein sequences to sequence databases
and calculates the statistical significance of matches. BLAST can be used to infer functional
and evolutionary relationships between sequences as well as help identify members of gene
families.

Variants of BLAST

❏ BLASTN - Compares a DNA query to a DNA database. Searches both strands
automatically. It is optimized for speed, rather than sensitivity.
❏ BLASTP - Compares a protein query to a protein database.
❏ BLASTX - Compares a DNA query to a protein database, by translating the query
sequence in the 6 possible frames (3 reading frames from each strand of the DNA)
and comparing each against the database.
❏ TBLASTN - Compares a protein query to a DNA database, in the 6 possible frames of
the database.
❏ TBLASTX - Compares the protein encoded in a DNA query to the protein encoded in a
DNA database, in the 6*6 possible frames of both query and database sequences
(Note that all the combinations of frames may have different scores).
❏ BLAST2 - Also called advanced BLAST. It can perform gapped alignments.
❏ PSI-BLAST - (Position-Specific Iterated) Performs iterative database searches (details
below).


QUESTIONS

● You have been provided with a sequence

TCAAGCAGATCACTGTCCTTCTGCCATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCC
CTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTC
TCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCA
GGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCC
CTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACT
GCAACTAGACGCAGCCCGCATGCAGNCCCCCACCCGCCGNCTTCTGCACCGAGAGAGATGGAATTAAACC
CTTGAACCCAGCANANAAAAAAAAAAAAAAA
Perform a pairwise alignment of the sequence against Nonredundant database using Blastn. Look at
the graph of best results at the top of the page. How long is the alignment length for the best match
in the graph?

Answer:
Query coverage- 94%
Percent identity- 98.36%
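
As a rough illustration of what these figures measure, the sketch below computes percent
identity, percent gaps and query coverage for a single gapped pairwise alignment. It
assumes the usual BLAST convention of dividing identities by the alignment length; the
function name and the toy alignment are hypothetical, and BLAST itself combines all
aligned segments when it reports query coverage.

```python
def alignment_stats(query_aln, subject_aln, query_length):
    """Summarise one gapped pairwise alignment ('-' marks a gap)."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")
    columns = len(query_aln)
    pairs = list(zip(query_aln, subject_aln))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    aligned_query = sum(1 for q in query_aln if q != "-")
    return {
        "percent_identity": 100.0 * identities / columns,
        "percent_gaps": 100.0 * gaps / columns,
        "query_coverage": 100.0 * aligned_query / query_length,
    }


# Toy alignment: 10 columns, 8 identities, 1 gap; 9 of 12 query bases aligned.
stats = alignment_stats("ACGTACGT-A", "ACGTACCTTA", query_length=12)
print(stats)
```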

● What organism is the most common source of the sequences in the first 5 hits?
Answer:
Homo sapiens

● What protein is most commonly identified in the description column of the alignments as
being associated with or related to these sequences?
Answer:
Human insulin

● Report the alignment result with five different organisms. What is the listed percent identity,
score, percent gaps used and E-value?

Organism              Percent identity   Score   Percent gaps   E-value
Homo sapiens          98.36%             764     0%             0.0
Gorilla gorilla       96.50%             707     0%             0.0
Pongo abelii          95.09%             680     0%             0.0
Nomascus leucogenys   94.86%             675     0%             0.0
Hylobates moloch      94.39%             666     0%             0.0

● What group was responsible for the DNA sequencing for this entry?
Answer:
PUBMED 12477932
TTAACACGTGTTGTTGGGAGTTCGAACCTCCCCCTTCCAGCCATCTCTTCTGTTGGTGCA
TAGCTTAATGGTAAAGCCCTGGTCTCCAAAACCAGTCACCTAGGTTCGATTCCTGGTGCA
TCAGCCAAATCTGTCTAAAATAGTCCACCATGACAGATTTCCTCTTTGATCCCCACGCAC
ATGTTAGGTTCGACAAGATCGATCTCGTACCAGAGGCTCAAAATCCTCGGTTCAAGGAGT
TGATCGAAAATACCAACCTGAACTACAAATACGATCCGAATGCTCCTCTCATCGTTGACA
CGACGAAGTACGGTCCTCGTTCATTCTACATGAACAATAAGAAGATCAAACGTACTGGCG
TCGAAATCAACATGACTGATGAGATGGTGGAGGAGCTCTACAAGTGCTCGACTGATGTCT
TGTACTTTGCTGAGCGATATTATTACATTCGTACTCTTGACCACGGCAAGATCAAGATCC
CTCTCCGTGACTATCAGAAATTCTGGCTCCGCATTTACGAGGTCCCTGAGATCCGTAACC
GCGTGTGGCTCGCTTGTCGTCAGTCAGCCAAGTCCACGACACTGACCGTTGAGATCATGC
ATCGAATGCTCTTCAATGAGGATTTCGAATACGTCATCCTGGCCAACAAGGGTAACACGG
CTCGTGAAATCTTCTCGCGTGTCCGTATGGCATATGAACAGCTTCCTCTCTGGATGCAGA
TCGGCGTGACCGAATGGAACAAGGGTTCTGTGAAGCTCGAGAACGATTCCCGTGTCTTCG
CTGCTGCTTCTGGTTCTGACTCTGTCCGCGGTTTCTCTCCCAACGAAGTTCTCCTCGACG
AAGCGGCATTCGTTCGTAATGATGAAGAGTTCATGGCCTCCGTGTTCCCGACCATCTCGT
CTGGTTCTAAGTCTCGTCTGACCCAGATCTCCACGCCGAATGGCCCTCGTGGGATCTTCC
ACCGAGATTACACTCGTGCGACCAAGGGTCTCAACAACTACTTCTCCTACAAGGTTCCAT
GGCACTTTGTTCCTGGCCGTGATGAGGAATGGAAGAAGAAGCAGATCGAAGATACCTCGC
TCATGCAGTTCAAGCAGGAGCAGGATTGTGACTTCATGGGCACTACCGACGGCCTCATCG
ATGGTATGGTTCTCGAGAGCATCCAGGACAATATTCAGGCCCCAATTCTTGTCAATGATG
AGGAACTGTCGGATCCAGCGTATGCAATCTTCGAACTCCCGAAGGAAGGACACTCGTATA
TCGTCACTGCCGACACCGGTGAAGGTAAAGGCAAGGACTCGAGCACATTCACCGTGTTTG
ACGTTTCGACTCGTCCGTTCGTCCAGGTTGCTTCCTACAAATCTAACCAGGTTTCCCAGC
TGATCTTCCCAAACAAGCTGGTCAAGATCGCAGAGACATACAATAATGCTCTCTTGATCC
CAGAAAGAAATAACACATCCGGCGGCACTGTCTGCTATAAGGTCTATTATGAACTCGAAT
ACCCGAACGTCTTCCTCCAGGGCGATGGCGAAATGGACATCGGCGTCCATGTCTCTCATG
CTGTCAGATCCCTCGGTGTCAATACCCTCCGTGGCCTCGTTGAAAAGGGCGGTCTCATTA
TCCGAGATGAACGGACCTTCAAGGAACTAGCGAACTTCCGTCTCCAGAAGAACGGGAAAT
ATGCTGCTCCAGAAGGTGAACATGATGATATGGTCCAGAACCTTTGGATATTCTCGTGGT
ATACGGCTGGTGACGAATTCGAAGAAGCGATGAAGGAAAATATCTACAATGACCTCTATC
GTGAAGAGCTCCAGAGCATCGAGAACCTGAAGGTCACCTCTGCGAACGACGCTTATGACC
CGTATAATGTGAAACCAGTGGCAGGAAAGGCGGCCTCTGCGTTTTGGTAAGAAAGGACAA
ATGATGGCATATGATCTATTCTCGGGTACCGTGGCCAATATCCCGGACCTGTCCACTCTC
AAAGAGAAGACGAGAGAGCTTCTCCAGATCCCTGTGTCGGCTGGCCTGGATATCACTCGC
TGGCTGAACGCACCTTCTCTCAATATCCCGGACCCGTTCAAGGAGTTCAATTTCAAATTC
AAGGGCGGTCGAGTATCGTTCGAGGAATATGAAAGGAATTTTCTTCCGAAAGTTCCTAAC
ATTCTCCAGTCGGCCAATCCGATCGGGTTCCAGGATGTCATCAGTGTCTTGGAGACGTTG


i. Perform a discontiguous and MEGABLAST search.


ii. What difference do you observe in the results of both as compared to Blastn
results for the sequence.

Also report the listed percent identity, score, percent gaps used, and E-value?


EXPERIMENT-07

QUESTIONS
>gi|6552317|ref|NP_009234.1|
MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQCPLCKNDITK
RSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENNSPEHLKDEVSIIQSMGYRNRAKRLLQS
EPENPSLQETSLSVQLSNLGTVRTLRTKQRIQPQKTSVYIELGSDSSEDTVNKATYCSVGDQELLQITPQ
GTRDEISLDSAKKGEAASGCESETSVSEDCSGLSSQSDILTTQQRDTMQHNLIKLQQEMAELEAVLEQHG
SQPSNSYPSIISDSSALEDLRNPEQSTSEKAVLTSQKSSEYPISQNPEGLSADKFEVSADSSTSKNKEPG
VERSSPSKCPSLDDRWYMHSCSGSLQNRNYPSQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEGTP
YLESGISLFSDDPESDPSEDRAPESARVGNIPSSTSALKVPQLKVAESAQSPAAAHTTDTAGYNAMEESV
SREKPELTASTERVNKRMSMVVSGLTPEEFMLVYKFARKHHITLTNLITEETTHVVMKTDAEFVCERTLK
YFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDVVNGRNHQGPKRARESQDRKIFRGLEICCYGPF
TNMPTDQLEWMVQLCGASVVKELSSFTLGTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSV
ALYQCQELDTYLIPQIPHSHY

Perform a pairwise alignment for these sequences against a Non-redundant database using Blastp.

● Report 10 most homologous sequences to your query sequence.


Answer:

● What are the listed percent identity, score, and E-value for the top 10 results?
Answer:


● Are any conserved regions/domains found in your query sequence?


Answer:
Yes
● Is the PDB structure of your query sequence or any of the homologous sequences known? If
yes, give the PDB code.
Answer:
Yes, 3KOH_A
● What organism is the source of your query sequence?
Answer:
Homo sapiens

● Find proteins that are known to contribute to pulmonary artery hypertension and
determine if animal models exist in which the disease can be studied. Can a
full-length dog protein sequence be found?

Answer:

● >QUERY1
MKDTDLSTLLSIIRLTELKESKRNALLSLIFQLSVAYFIALVIVSRFVRYVNYITYNNLV
EFIIVLSLIMLIIVTDIFIKKYISKFSNILLETLNLKINSDNNFRREIINASKNHNDKNK
LYDLINKTFEKDNIEIKQLGLFIISSVINNFAYIILLSIGFILLNEVYSNLFSSRYTTIS
IFTLIVSYMLFIRNKIISSEEEEQIEYEKVATSYISSLINRILNTKFTENTTTIGQDKQL
YDSFKTPKIQYGAKVPVKLEEIKEVAKNIEHIPSKAYFVLLAESGLRPGELLNVSIENID
LKARIIWINKETQTKRAYFSFFSRKTAEFLEKVYLPAREEFIRANEKNIAKLAAANENQE
IDLEKWKAKLFPYKDDVLRRKIYEAMDRALGKRFELYALRRHFATYMQLKKVPPLAINIL
QGRVGPNEFRILKENYTVFTIEDLRKLYDEAGLVVLE

● How many significant hits does BLAST find (E-value < 0.005)?


Answer:
0

● How many significant hits does PSI-BLAST find (E-value < 0.005)?


Answer:
16

● How large a fraction of the query sequence do the significant hits match (excluding the
identical matches)?
Answer:
About 50%

● Do you find any PDB hits among the significant hits?


Answer:
No pdb id found

● How many significant hits does BLAST find?


Answer:
100+

● How large a fraction of the query sequence do the significant hits match?

Answer:
About 50%

● Why does BLAST come up with more significant hits in the second iteration?
Answer:
In the first iteration, BLAST uses a general BLOSUM scoring matrix to align the query
and identify significant hits. Before the second iteration, the sequences of the
significant hits are aligned and a sequence profile is estimated: at each position, the
frequency of each of the 20 amino acids is calculated. In the second iteration this
position-specific profile is used as the scoring matrix instead of BLOSUM, which tailors
the search to the query's family and lets it detect more distant relatives.
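
The profile step can be sketched as follows. This is a simplified illustration, not the
PSI-BLAST implementation: it assumes an ungapped toy alignment, a flat pseudocount and a
uniform background frequency, whereas PSI-BLAST uses sequence weighting and more refined
pseudocounts. All names and the toy sequences are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, pseudocount=1.0):
    """Per-column amino-acid frequencies with a flat pseudocount."""
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds(profile, background=1.0 / 20):
    """Convert frequencies to log2 odds against a uniform background."""
    return [{aa: math.log2(freq / background) for aa, freq in column.items()}
            for column in profile]


def score(pssm, peptide):
    """Sum the position-specific scores of a peptide of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


hits = ["MKLV", "MKIV", "MRLV"]          # toy aligned hits from a first iteration
profile = column_frequencies(hits)
pssm = log_odds(profile)
print(score(pssm, "MKLV"), score(pssm, "AAAA"))
```

Residues that recur in a column receive positive scores at that position only, which is
why a profile can separate family members from unrelated sequences better than a single
general matrix.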

● Do you find any PDB hits among the significant hits?


Answer:
No


EXPERIMENT-08
OBJECTIVE

To predict the genes in the given sequences.

TOOL USED

GENSCAN

DESCRIPTION

In bioinformatics, GENSCAN is a program to identify complete gene structures in genomic
DNA. It is a GHMM-based (generalized hidden Markov model) program that can be used to
predict the location of genes and their exon-intron boundaries in genomic sequences from
a variety of organisms. The GENSCAN Web server can be found at MIT.

GENSCAN was developed by Christopher Burge in the research group of Samuel Karlin at Stanford
University.

QUESTIONS

● Obtain the genomic DNA of a human hexokinase. (#Sequence 1)


Answer:
https://www.ncbi.nlm.nih.gov/nuccore/NC_000010.11?report=fasta&log$=seqview&from=69269991&to=69401882

● Predict the genes in the above sequences using GENSCAN and FGENESH/FGENES.
Answer:


FGENESH


● Retrieve the sequence from Genbank: Sequence #2: GenBank: AF040714.1


● Identify the sequence
Answer:
https://www.ncbi.nlm.nih.gov/nuccore/AF040714.1?report=fasta
● Predict possible exon sequences.
● Use gene-finding programs Genscan, and NetGene2


Running Genscan

● Go to the Genscan server

QUESTION
● How many exons are predicted in Sequence?
Answer:
Two

Q2. What are the beginning and end positions?

Answer:
Exon 1
Begin: 320
End: 1226

Exon 2
Begin: 2401
End: 2675

Q3. For the possible exons, note the probability of each


Answer:
Probability for exon 1 = 0.871
Probability for exon 2 = 0.988

Q4. On which strand (+ or -) is the gene located?


Answer:
Both are on + strand

Q5. Write down the sequences and the total length of the predicted protein sequence.
Answer:
Total length of the predicted protein: 393 amino acids.
Predicted peptide sequence(s):

>/tmp/06_01_21-13:51:00.fasta|GENSCAN_predicted_peptide_1|393_aa

MSCSESPAANSFLVDSLISSGRGEAGGGGGGAGGGGGGGYYAHGGVYLPPAADLPYGLQSCGLFPTLGG
KRNEAASPGSGGGGGGLGPGAHGYGPSPIDLWLDAPRSCRMEPPDGPPPPPQQQPPPPPQPPQPAPQAT
SCSFAQNIKEESSYCLYDSADKCPKVSATAAELAPFPRGPPPDGCALGTSSGVPVPGYFRLSQAYGTAK
GYGSGGGGAQQLGAGPFPAQPPGRGFDLPPALASGSADAARKERALDSPPPPTLACGSGGGSQGDEEAH
ASSSAAEELSPAPSESSKASPEKDSLGNSKGENAANWLTAKSGRKKRCPYTKHQTLELEKEFLFNMYLT
RERRLEISRSVHLTDRQVKIWFQNRRMKLKKMNRENRIRELTANFNFS
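
A quick arithmetic check ties the exon coordinates from Q2 to the protein length from Q5.
The sketch assumes that the two predicted exons together make up the complete coding
sequence and that the last codon is the stop codon; the variable names are illustrative.

```python
# Exon coordinates from the GENSCAN answer above (1-based, inclusive).
exons = [(320, 1226), (2401, 2675)]


def exon_length(begin, end):
    """Length in nucleotides of an exon with inclusive coordinates."""
    return end - begin + 1


coding_nt = sum(exon_length(begin, end) for begin, end in exons)
codons, remainder = divmod(coding_nt, 3)
residues = codons - 1   # assumption: the final codon is a stop codon
print(coding_nt, codons, remainder, residues)
```

The exons add up to 907 + 275 = 1182 nucleotides, that is 394 codons with no remainder,
or 393 residues once the stop codon is removed, in agreement with the reported length.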


B. Running NetGene2

● NetGene predicts potential donor and acceptor splice sites as well as protein coding
potential. It does not predict a complete exon-intron gene structure
● Go to the NetGene2 server

QUESTIONS
● Based on the predictions from Genscan, at which position do you expect to find a
donor splice site?
Answer:

● If NetGene predicts a donor splice site at this position, what is the confidence score?
Answer:


● If NetGene predicts a donor splice site at this position, write down the 3 nucleotides
on either side of the splice site
Answer:

● If NetGene predicts an acceptor splice site at this position, write down the 3
nucleotides on either side of the splice site.

Answer:


EXPERIMENT-09
OBJECTIVE

To carry out the Multiple sequence alignment using Clustal Omega and find the conserved
regions.

TOOL USED

Clustal Omega

ABOUT:

Clustal Omega is a multiple sequence alignment program for aligning three or more
sequences together in a computationally efficient and accurate manner. It produces
biologically meaningful multiple sequence alignments of divergent sequences. Evolutionary
relationships can be seen via viewing Cladograms or Phylograms.
Running a tool from the web form is a simple multi-step process, starting at the top of
the page and following the steps to the bottom.

Each tool has at least 2 steps, but most of them have more:

1. The first steps are usually where the user sets the tool input (e.g. sequences,
databases...)
2. In the following steps, the user has the possibility to change the default tool
parameters
3. And finally, the last step is always the tool submission step, where the user can
specify a title to be associated with the results and an email address for email
notification. Using the submit button will effectively submit the information specified
previously in the form to launch the tool on the server
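
Once an alignment has been produced, conserved regions can be read off column by column.
The sketch below is a minimal illustration of that idea, corresponding to the columns
that Clustal marks with an asterisk; it assumes the aligned sequences are already
available as equal-length strings with '-' for gaps, and the function names and toy
alignment are hypothetical.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(width)
            if alignment[0][i] != "-" and len({seq[i] for seq in alignment}) == 1]


def conserved_regions(columns, min_length=2):
    """Group consecutive column indices into (start, end) runs, end inclusive."""
    regions, run = [], []
    for idx in columns:
        if run and idx != run[-1] + 1:
            if len(run) >= min_length:
                regions.append((run[0], run[-1]))
            run = []
        run.append(idx)
    if len(run) >= min_length:
        regions.append((run[0], run[-1]))
    return regions


toy = ["MADA-VHGH", "MADAAVHGH", "MSDA-VHGQ"]   # toy alignment, not real data
columns = conserved_columns(toy)
regions = conserved_regions(columns)
print(columns, regions)
```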

QUESTIONS
● Suppose you have cloned and sequenced the following sequence in the lab. The only
thing we know is that we isolated this sequence from the bacterial species
Paracoccus denitrificans and that it is involved in respiration.

MADAAVHGHGDHHDTRGFFTRWFMSTNHKDIGILYLFTAGIVGLISVCFTVYMRMELQHPGVQYMCLEG
ARLIADASAECTPNGHLWNVMITYHGVLMMFFVVIPALFGGFGNYFMPLHIGAPDMAFPRLNNLSYWMY
VCGVALGVASLLAPGGNDQMGSGVGWVLYPPLSTTEAGYSMDLAIFAVHVSGASSILGAINIITTFLNM
RAPGMTLFKVPLFAWSVFITAWLILLSLPVLAGAITMLLMDRNFGTQFFDPAGGGDPVLYQHILWFFGH
PEVYIIILPGFGIISHVISTFAKKPIFGYLPMVLAMAAIGILGFVVWAHHMYTAGMSLTQQAYFMLATM
TIAVPTGIKVFSWIATMWGGSIEFKTPMLWAFGFLFLFTVGGVTGVVLSQAPLDRVYHDTYYVVAHFHY
VMSLGAVFGIFAGVYYWIGKMSGRQYPEWAGQLHFWMMFIGSNLIFFPQHFLGRQGMPRRYIDYPVEFA
YWNNISSIGAYISFASFLFFIGIVFYTLFAGKRVNVPNYWNEHADTLEWTLPSPPPEHTFETLPKREDW
DRAHAH

● Perform the multiple sequence alignment of this sequence using CLUSTAL OMEGA
with nine homologous sequences from different organisms.
Answer:

>WP_011748228.1 cytochrome c oxidase subunit I [Paracoccus denitrificans]

>pdb|3EHB|A Chain A, Cytochrome c oxidase subunit 1-beta

>WP_028712542.1 MULTISPECIES: cytochrome c oxidase subunit I [unclassified Paracoccus]

>RQP06879.1 cytochrome c oxidase subunit I [Paracoccus sp. BP8]

>WP_205294768.1 cytochrome c oxidase subunit I [Paracoccus sp. H4-D09]

>WP_103172951.1 MULTISPECIES: cytochrome c oxidase subunit 1 [Paracoccus]

>WP_155044879.1 MULTISPECIES: cytochrome c oxidase subunit I [unclassified Paracoccus]

>WP_122113564.1 cytochrome c oxidase subunit I [Paracoccus alkanivorans]

>WP_090846794.1 cytochrome c oxidase subunit I [Paracoccus alkenifer]

>WP_136856693.1 cytochrome c oxidase subunit 1 [Paracoccus hibiscisoli]

● Draw the phylogenetic tree of the given sequences.


Answer:


EXPERIMENT-10
OBJECTIVE
To investigate the structure of the influenza virus neuraminidase protein using Swiss pdb
Viewer

TOOL USED
Swiss PDB Viewer

ABOUT
Swiss-PdbViewer (aka DeepView) is an application that provides a user-friendly interface
allowing the analysis of several proteins at the same time. The proteins can be
superimposed in order to deduce structural alignments and compare their active sites or
any other relevant parts. Amino acid mutations, H-bonds, angles, and distances between
atoms are easy to obtain thanks to the intuitive graphic and menu interface.
Swiss-PdbViewer (aka DeepView) has been developed since 1994 by Nicolas Guex.
Swiss-PdbViewer is tightly linked to SWISS-MODEL, an automated homology modeling
server developed within the Swiss Institute of Bioinformatics (SIB) at the Structural
Bioinformatics Group at the Biozentrum in Basel. Working with these two programs greatly
reduces the amount of work necessary to generate models, as it is possible to thread a
protein primary sequence onto a 3D template and get immediate feedback on how well the
threaded protein will be accepted by the reference structure before submitting a request to
build missing loops and refine sidechain packing. Swiss-PdbViewer can also read electron
density maps, and provides various tools to build into the density. In addition, various
modeling tools are integrated and residues can be mutated.

Background Info: The substrate of influenza virus neuraminidase is sialic acid. The
structures of sialic acid and the drugs oseltamivir (in Tamiflu) and zanamivir (in
Relenza) are shown below (from P.J. Collins et al., Nature 453, 1258 (2008)).

Download the PDB structures 2BAT and 2HU4.


● What is 2BAT and 2HU4?
Answer:
2BAT
2HU4

● How many chains are in 2BAT and 2HU4?


Answer:
2BAT- 1 chain: A
2HU4- 8 chains: A, B, C, D, E, F, G, H

● What are the heteroatoms/molecules present in the two molecules?


Answer:
2BAT- 17 molecules,
2HU4- 8 molecules,

● Display each of these files in Swiss PDB Viewer. The protein should be displayed in
ribbon and ligands as ball and stick models. Color the ribbons as per secondary
structures. Color the ligand in CPK color.

2BAT

2HU4


● Superimpose 2HU4 over 2BAT and report the RMSD. Color each molecule in a
different color.

The calculated RMSD is 3.33 Å.
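
RMSD is the root-mean-square deviation between paired atoms of the two structures. The
sketch below shows only the final calculation; it assumes the structures have already
been superimposed (as Swiss-PdbViewer does before reporting the value) and that the
atoms, for example C-alpha atoms, are listed in matching order. The function name and
coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by 3 Å along z, so the RMSD is 3 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
value = rmsd(reference, moved)
print(value)
```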


EXPERIMENT-11
Objective: To predict the 3D-structure of a protein having Uniprot Accession No. Q9BZ11.

Tool Used: SWISS MODEL

Description
SWISS-MODEL is a structural bioinformatics web server dedicated to homology modeling of
3D protein structures. Homology modeling is currently the most accurate method to
generate reliable three-dimensional protein structure models and is routinely used in
many practical applications. Homology (or comparative) modeling methods make use of
experimental protein structures ("templates") to build models for evolutionarily related
proteins ("targets").

QUESTIONS
● Give information about Q9BZ11 (protein name, Gene name, Function, Organism).
Answer:
Protein name- Disintegrin and metalloproteinase domain-containing protein 33
Gene name- ADAM33
Function- metalloendopeptidase activity, zinc ion binding
Organism- Homo sapiens (Human)

Which method and tool will you use?


Answer:
SWISS MODEL

What are the templates used? (PDB id, organism, % identity, Bit score, E-value)

Template   PDB ID   SCOP ID   Organism         Bit score   % identity   E-value
c2erpA     2ERP     55486     Crotalus atrox   899         39           0
c2dw1B     2DW2     55516     Crotalus atrox   877         38           0
c3g5cA     3G5C     8029803   Homo sapiens     1885        31           0
c3k7nA     3K7N     55516     Naja atra        829         38           0

● Give the % homology and the regions aligned with the template(s).

Templates used Regions %Homology

c3g5cA 204-686 31%

c2erpA 199-648 39%

c2dw1B 209-646 38%

c3k7nA 204-648 38%

● How many models have you obtained?


Answer:
5 models

● Give the range of residues modeled, pdb file, energy, Q-Mean score, Ramachandran
Plot of your best model.
Answer:
Range of residues- 209-642 (3g5c.1.A)
Q-mean score- -2.74
Ramachandran plot-

● Visualise the model using Swiss Pdb Viewer.


Answer:


Reference-1r54.1.A
Second model- 3g5c.1.A
RMS- 1.96

Reference- 1r54.1.A
Second-2erq.1.A
RMS- 2.44

Reference- 1r54.1.A
Second- 3dsl.1.A
RMS- 1.71


Reference- 1r54.1.A
Second- 2ero.1.A
RMS- 1.89


EXPERIMENT-12
OBJECTIVE
To predict the 3D-structure of a protein having Uniprot Accession No. Q9BZ11.

TOOL USED
PHYRE2

ABOUT:
Phyre2 is a suite of tools available on the web to predict and analyse protein structure,
function and mutations. The focus of Phyre2 is to provide biologists with a simple and
intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces
Phyre, the original version of the server. It uses advanced remote homology detection
methods to build 3D models, predict ligand binding sites, and analyse the effect of
amino-acid variants (e.g. nsSNPs) for a user's protein sequence. Users are guided through
results by a simple interface at a level of detail determined by them, from submitting a
protein sequence to interpreting the secondary and tertiary structure of their models,
their domain composition and model quality. Additional tools are available to find a
protein structure in a genome, to submit a large number of sequences at once and to
automatically run weekly searches for proteins that are difficult to model. A typical
structure prediction is returned between 30 minutes and 2 hours after submission.

RESULT
● Give information about Q9BZ11 (protein name, Gene name, Function, Organism).
Answer:
Protein name- Disintegrin and metalloproteinase domain-containing protein 33
Gene name- ADAM33
Function- metalloendopeptidase activity, zinc ion binding
Organism- Homo sapiens (Human)

● Which method and tool will you use?


Answer:
PHYRE2

● What are the templates used? (PDB id, organism, % identity, Bit score, E-value)
Answer:

Template   PDB ID   SCOP ID   Organism         Bit score   % identity   E-value
c2erpA     2ERP     55486     Crotalus atrox   899         39           0
c2dw1B     2DW2     55516     Crotalus atrox   877         38           0
c3g5cA     3G5C     8029803   Homo sapiens     1885        31           0
c3k7nA     3K7N     55516     Naja atra        829         38           0

● Give the % homology and the regions aligned with the template(s).

Templates used   Regions   % Homology
c3g5cA           204-686   31%
c2erpA           199-648   39%
c2dw1B           209-646   38%
c3k7nA           204-648   38%
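
The aligned regions in the table above can be compared with a little arithmetic. The following is a minimal sketch, not part of the Phyre2 output: it copies the residue ranges and percentages from the table, treats each range as inclusive, and reports the aligned length of each template. The variable and function names are chosen here for illustration only.

```python
# Template regions copied from the table above:
# (template, first residue, last residue, % identity/homology as reported)
regions = [
    ("c3g5cA", 204, 686, 31),
    ("c2erpA", 199, 648, 39),
    ("c2dw1B", 209, 646, 38),
    ("c3k7nA", 204, 648, 38),
]


def aligned_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


# Aligned length per template, e.g. 204-686 covers 483 residues.
lengths = {name: aligned_length(start, end) for name, start, end, _ in regions}

# Template covering the most residues, and template with the highest percentage.
longest = max(lengths, key=lengths.get)
best_identity = max(regions, key=lambda row: row[3])[0]

for name, start, end, percent in regions:
    print(f"{name}: residues {start}-{end}, {lengths[name]} aligned, {percent}%")
print("Longest aligned region:", longest)
print("Highest percentage:", best_identity)
```

In this data the template with the longest aligned region (c3g5cA) is not the one with the highest percentage (c2erpA), so both columns are worth reading together when choosing a template.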

● How many models have you obtained?


Answer:
5 models

● Give the range of residues modeled, pdb file, energy, Q-Mean score, Ramachandran
Plot of your best model.
Answer:
Range of residues- 209-642 (3g5c.1.A)
QMEAN score- -2.74
Ramachandran plot-
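
A Ramachandran plot places each residue at its backbone dihedral angles: phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). As a minimal sketch of the underlying calculation, the function below computes the dihedral angle defined by four points. The coordinates in the example are made-up test points, not atoms from the model, and the helper names are chosen here for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up points: the first three are fixed, only the fourth changes.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
gauche = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(cis, trans, gauche)
```

Applying the same function to the backbone atoms of every residue in a PDB file gives the (phi, psi) pairs that a Ramachandran plot displays.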


● Visualise the model using Swiss Pdb Viewer.


Answer:

● Overlap the best models obtained from SWISS Model (previous exercise) and from
Phyre2.
Answer:

RED- 3g5c.1.A from SWISS-MODEL
WHITE- c3g5cA from Phyre2
RMS- 2.54
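
The RMS value reported after an overlap is the root-mean-square distance between matched atoms of the two models. The following is a minimal sketch of that formula only: it assumes the two coordinate lists are already superposed and matched atom by atom, and the coordinates shown are made-up numbers, not atoms from the two models above.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched, superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates for three matched atoms in each model.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 2.0)]

value = rmsd(model_a, model_b)
print(f"RMSD = {value:.3f}")
```

A structure viewer first finds the rotation and translation that minimise this quantity and then reports the minimised value; the superposition step is not shown here.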


EXPERIMENT-13

OBJECTIVE:
To draw the 2D- structures of given molecules and convert them into 3D structures.

TOOL USED:
ACD/Labs ChemSketch

ABOUT:
ACD/ChemSketch is an easy-to-use, chemically intelligent molecular structure drawing
application with more than 2 million users worldwide. It can be used to:

1. Draw chemical structures, reactions, and schema, and access a variety of graphical
tools and templates
2. Generate names from molecular structure
3. Calculate molecular properties from chemical structure
4. Create professional reports, presentations, and publication-ready figures
5. Communicate scientific information with clarity and ease


Answer:


EXPERIMENT-14

OBJECTIVE:
To find the binding pockets in the given protein.

TOOL USED:
CASTp

ABOUT:
Computed Atlas of Surface Topography of proteins (CASTp) provides an online resource for
locating, delineating and measuring concave surface regions on three-dimensional
structures of proteins. These include pockets located on protein surfaces and voids buried in
the interior of proteins. The measurement includes the area and volume of pocket or void
by solvent accessible surface model (Richards' surface) and by molecular surface model
(Connolly's surface), all calculated analytically. CASTp can be used to study surface features
and functional regions of proteins. CASTp includes a graphical user interface, flexible
interactive visualization, as well as on-the-fly calculation for user uploaded structures.


EXPERIMENT-15

OBJECTIVE:
To dock a given protein with a ligand using SwissDock.

TOOL USED:
SwissDock

ABOUT:

SwissDock and S3DB are developed by Aurélien Grosdidier, Vincent Zoete and Olivier Michielin, from
the Molecular Modeling Group of the Swiss Institute of Bioinformatics in Lausanne, Switzerland.

SwissDock is a web server dedicated to the docking of small molecules on target proteins. It is
based on the EADock DSS engine, combined with setup scripts for curating common problems and for
preparing both the target protein and the ligand input files. An efficient Ajax/HTML interface was
designed and implemented so that scientists can easily submit dockings and retrieve the predicted
complexes.

The website provides access to:

● SwissDock, a web service to predict the molecular interactions that may occur between a target
protein and a small molecule.
● S3DB, a database of manually curated target and ligand structures, inspired by the Ligand-Protein
Database.

SwissDock is based on the docking software EADock DSS, whose algorithm consists of the following
steps:

● Many binding modes are generated either in a box (local docking) or in the vicinity of all
target cavities (blind docking).
● Simultaneously, their CHARMM energies are estimated on a grid.
● The binding modes with the most favorable energies are evaluated with FACTS, and
clustered.
● The most favorable clusters can be visualized online and downloaded on your computer.
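
The ranking and clustering steps above can be illustrated with a toy sketch. This is not the EADock DSS implementation: the pose names, energies, coordinates, the 2.0 distance cutoff and the greedy clustering rule are all assumptions made for illustration, and each binding mode is reduced to a single centre point.

```python
import math


def cluster_poses(poses, cutoff=2.0):
    """Greedy clustering of poses, visiting the most favourable energies first.

    poses: list of (name, energy, (x, y, z)) tuples; lower energy is better.
    A pose joins the first cluster whose best-energy member lies within
    `cutoff` of it, otherwise it starts a new cluster.
    """
    clusters = []
    for pose in sorted(poses, key=lambda p: p[1]):
        for cluster in clusters:
            representative = cluster[0]
            if math.dist(pose[2], representative[2]) <= cutoff:
                cluster.append(pose)
                break
        else:
            clusters.append([pose])
    return clusters


# Hypothetical binding modes: (name, energy, centre of the ligand).
poses = [
    ("pose1", -8.2, (10.0, 4.0, 3.0)),
    ("pose2", -7.9, (10.5, 4.2, 3.1)),
    ("pose3", -6.5, (25.0, 1.0, 9.0)),
    ("pose4", -8.6, (24.6, 1.3, 9.2)),
    ("pose5", -5.1, (40.0, 40.0, 40.0)),
]

clusters = cluster_poses(poses)
for rank, cluster in enumerate(clusters, start=1):
    names = [name for name, _, _ in cluster]
    print(f"Cluster {rank}: best energy {cluster[0][1]}, members {names}")
```

Because the poses are visited in order of energy, the first member of each cluster is its most favourable pose, and the clusters come out ranked by their best energy.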

RESULT


EXPERIMENT-16

OBJECTIVE:
To draw phylogenetic trees using NGPhylogeny.fr (https://ngphylogeny.fr) by trial and error.

TOOL USED:
PhyML
TNT
MrBayes

DESCRIPTION:
A phylogenetic tree is a tree diagram that shows the evolutionary histories and relationships among
taxonomic groups. It is used in phylogenetics, the branch of biology concerned mainly with the study
of evolutionary relatedness among groups of organisms, using molecular sequencing data and
morphological data matrices. Phylogenetic trees are essential in phylogenetics because they are used
to understand the biodiversity, genetics, evolution, and ecology of groups of organisms.
A phylogenetic tree shows phylogeny, i.e. the similarities and differences in morphology and
genetics between groups of organisms (taxa). Taxa that are joined together in the tree are implied
to be evolutionarily related, and may be hypothesized to have descended from a hypothetical common
ancestor (an internal node). There are different types of phylogenetic trees. In a rooted tree, the
ancestral path is specified and each internal node represents the most recent common ancestor of
the taxa descending from it. In an unrooted tree, no assumption is made about ancestry; only the
evolutionary relatedness of the taxa is shown.
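
The idea of an internal node as the most recent common ancestor can be made concrete with a small sketch. The tree below is hypothetical (taxa A to D and internal nodes n1, n2 and root are invented for illustration) and is stored as a child-to-parent map, which is only one of several possible representations.

```python
# Child -> parent map for a small rooted tree with hypothetical taxa A-D:
#
#          root
#         /    \
#       n1      n2
#      /  \    /  \
#     A    B  C    D
parent = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}


def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(a, b, parent):
    """Most recent common ancestor of two nodes in a rooted tree."""
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


print(mrca("A", "B", parent))
print(mrca("A", "C", parent))
```

The calculation only makes sense because the tree is rooted: in an unrooted tree there is no "up" direction, so no node can be singled out as the ancestor of a pair of taxa.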

PhyML
PhyML Online is a web interface to PhyML, software that implements a fast and accurate
heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. The tool
provides a number of options, e.g. nonparametric bootstrap and estimation of various
evolutionary parameters, so that comprehensive phylogenetic analyses can be performed on large
datasets in reasonable computing time.
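
The nonparametric bootstrap mentioned above works by resampling alignment columns with replacement and rebuilding the tree from each resampled alignment. The sketch below shows only the resampling step and is not PhyML code: the three short sequences, the taxon names and the seed are made up for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all equal length).
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


# Hypothetical toy alignment.
alignment = {"taxA": "ACGTACGT", "taxB": "ACGTTCGT", "taxC": "ACGAACGA"}

rng = random.Random(42)
replicate = bootstrap_alignment(alignment, rng)
for name, seq in replicate.items():
    print(name, seq)
```

Each replicate keeps whole columns intact, so the relationship between taxa at a site is preserved. The bootstrap support of a branch is the fraction of replicate trees in which that branch appears.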

TNT
TNT stands for "Tree analysis using New Technology". It is a program for phylogenetic analysis under
parsimony (with very fast tree-searching algorithms; Nixon, 1999, Cladistics 15:407-414; Goloboff,
1999, Cladistics 15:415-428), as well as extensive tree handling and diagnosis capabilities. It is a
joint project by Pablo Goloboff, James Farris, and Kevin Nixon.
TNT can also be driven from R through an interface that provides basic functions for reading and
writing data and trees between R and TNT and for parsing the TNT outputs. Even when Bayesian
methods are preferred, quickly estimating parsimony trees can be extremely useful when working with
morphology datasets and their various coding issues. Examples include finding coding mistakes,
finding starting trees, and finding parts of the tree with no morphological support at all.
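
Parsimony scores a tree by the minimum number of character changes it requires. As a minimal sketch of that idea, the code below uses Fitch's algorithm on a fixed rooted binary tree. It does not reproduce TNT's tree-searching algorithms, and the four taxa, their three-site sequences and the two candidate trees are made up for illustration.

```python
def fitch(tree, states):
    """Fitch pass for one character on a rooted binary tree.

    tree: a taxon name (leaf) or a pair (left_subtree, right_subtree).
    states: dict mapping taxon name -> state at this site.
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum of Fitch change counts over all sites of an alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical data: four taxa, three sites.
alignment = {"A": "AAG", "B": "AAG", "C": "ATG", "D": "CTG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

score_1 = parsimony_score(tree_1, alignment)
score_2 = parsimony_score(tree_2, alignment)
print("tree_1 needs", score_1, "changes; tree_2 needs", score_2)
```

Here tree_1 requires fewer changes than tree_2, so it is the more parsimonious of the two. A parsimony program repeats this scoring over many candidate trees and keeps the ones with the lowest score.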

MrBayes
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic
and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the
posterior distribution of model parameters.
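
How MCMC estimates a posterior distribution can be shown on a much simpler problem than a phylogeny. The sketch below is a toy Metropolis sampler and is not MrBayes code: it estimates the bias of a coin from 7 heads in 10 tosses under a uniform prior. The data, step size, chain length, burn-in and seed are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(theta, heads, tosses):
    """Log of the unnormalised posterior for a coin bias under a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + (tosses - heads) * math.log(1.0 - theta)


def metropolis(heads, tosses, n_steps, rng, step=0.3, start=0.5):
    """Random-walk Metropolis sampler; returns the visited values of theta."""
    theta = start
    current = log_posterior(theta, heads, tosses)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tosses)
        # Accept with probability min(1, posterior ratio); proposals outside
        # (0, 1) have zero posterior and are always rejected.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


rng = random.Random(1)
samples = metropolis(heads=7, tosses=10, n_steps=20000, rng=rng)
kept = samples[2000:]  # discard the first steps as burn-in
posterior_mean = sum(kept) / len(kept)
print(f"Estimated posterior mean: {posterior_mean:.3f}")
```

For this toy problem the exact posterior is a Beta(8, 4) distribution with mean 8/12, so the estimate can be checked against a known answer. In a phylogenetic analysis the single parameter is replaced by the tree, its branch lengths and the substitution model parameters, but the accept-or-reject step follows the same principle.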

RESULT


RESULT 2


RESULT 3
