Lecture1 20060306 Kang


Introduction to Bioinformatics

Lecture 1:
Overview

Introduction to Bioinformatics National Genome Information Center Spring 2006 (original author: Dr. Eric Rouchka)
Suggested Texts
Bioinformatics: Sequence and Genome Analysis. David Mount. 2001. ISBN: 0-87969-608-7.

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. R. Durbin, S. Eddy, A. Krogh and G. Mitchison. 1998. ISBN: 0-521-62971-3.

What is Bioinformatics/Computational Biology?
• Bioinformatics: collection and storage of biological information
• Computational biology: development of algorithms and statistical models to analyze biological data
• The terms bioinformatics and computational biology will be used interchangeably

What is Bioinformatics?

Source: http://ccb.wustl.edu/

Why should I care?
• SmartMoney ranks Bioinformatics as #1 among next Hot Jobs
• Business Week 50 Masters of Innovation
• Jobs available, exciting research potential
• Important information waiting to be decoded!

http://smartmoney.com/consumer/index.cfm?story=working-june02
Why is bioinformatics hot?
• Supply/demand: few people adequately trained in both biology and computer science
• Genome sequencing, microarrays, etc. lead to large amounts of data to be analyzed
• Leads to important discoveries
• Saves time and money

What skills are needed?
• Well grounded in one of the following areas:
– Computer science
– Molecular biology
– Statistics
• Working knowledge and appreciation of the others!

Where Can I Learn More?
• ISCB: http://www.iscb.org/
• NCBI: http://ncbi.nlm.nih.gov/
• http://www.bioinformatics.org/
• Journals
• Conferences (ISMB, RECOMB, PSB…)

Overview of Molecular Biology
• Cells
• Chromosomes
• DNA
• RNA
• Amino Acids
• Proteins
• Genome/Transcriptome/Proteome

Cells
• Complex system enclosed in a membrane
• Organisms are unicellular (bacteria, baker’s yeast) or multicellular
• Humans:
– 60 trillion cells
– 320 cell types

Example Animal Cell
www.ebi.ac.uk/microarray/biology_intro.htm

Organisms
• Classified into two types:
– Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi,…)
– Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled, includes bacteria)
• Not all single-celled organisms are prokaryotes!

Chromosomes
• In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes
• Humans:
– 22 pairs of autosomes
– 1 pair of sex chromosomes

Human Karyotype
http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html

Image source: www.biotec.or.th/Genome/whatGenome.html
What is DNA?
• DNA: Deoxyribonucleic Acid
• Each strand is a chain of nucleotides (an oligomer or polynucleotide)
• 4 different nucleotides, distinguished by their bases:
– Adenine (A)
– Cytosine (C)
– Guanine (G)
– Thymine (T)

Nucleotide Bases
• Purines (A and G)
• Pyrimidines (C and T)
• Difference is in base structure

Image Source: www.ebi.ac.uk/microarray/biology_intro.htm

DNA
• Can be thought of as an alphabet with 4 characters
• A 4-letter alphabet with sufficiently long words contains enough information to create complex organisms
• Not unlike a computer, which also works with a small (binary) alphabet
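
The alphabet analogy can be made concrete: four symbols need only two bits each. The following is a minimal sketch, assuming an arbitrary mapping (A=00, C=01, G=10, T=11). The mapping and the names `encode` and `decode` are illustrative choices, not a standard.

```python
# Hypothetical 2-bit encoding of the 4-letter DNA alphabet (mapping is arbitrary).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    # Bases were read from the low-order end, so reverse them.
    return "".join(reversed(bases))
```

The length has to be passed to `decode` because leading A's encode as zero bits and would otherwise be lost.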

DNA polynucleotides (oligomers)
• Different nucleotides are strung together to form polynucleotides
• Ends of the polynucleotide are different
• A directionality is present
• Convention is to label the coding strand from 5’ to 3’

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html
Single Strand Polynucleotide
Example polynucleotide:

5’ GTAAAGTCCCGTTAGC 3’

Double Stranded DNA
• DNA can be single-stranded or double-stranded
• Double-stranded DNA: second strand is the “reverse complement” strand
• Reverse complement runs in the opposite direction and its bases are complementary
• Complementary bases:
– A, T
– C, G

Introduction to Bioinformatics National Genome Information Center Spring 2006 (original author: Dr. Eric Rouchka)
Double Stranded Sequence
Example double stranded polynucleotide:
5’ GTAAAGTCCCGTTAGC 3’
   ||||||||||||||||
3’ CATTTCAGGGCAATCG 5’

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html
Double Helix
• Two complementary DNA strands form a stable DNA double helix
• The double-helix structure was first described in 1953 by Watson and Crick

Image source: www.ebi.ac.uk/microarray/biology_intro.htm

RNA
• Ribonucleic Acid
• Similar to DNA
• Thymine (T) is replaced by uracil (U)
• RNA can be:
– Single stranded
– Double stranded
– Hybridized with DNA

RNA
• RNA is generally single stranded
• Forms secondary or tertiary structures
• RNA folding will be discussed later
• Important in a variety of ways, including protein synthesis

RNA secondary structure
• E. coli RNase P RNA secondary structure

Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm

mRNA
• Messenger RNA
• Linear molecule encoding genetic information copied from DNA molecules
• Transcription: process in which DNA is copied into an RNA molecule
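
At the sequence level, transcription can be sketched as a simple substitution. In the cell, the RNA is synthesized from the template strand, so the resulting mRNA has the same sequence as the coding strand with U in place of T. The function name `transcribe` is an illustrative choice, and the sketch ignores everything except the letter substitution.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand (both 5' to 3')."""
    # The mRNA matches the coding strand, with uracil replacing thymine.
    return coding_strand.upper().replace("T", "U")


mrna = transcribe("GTAAAGTCCCGTTAGC")
print(mrna)
```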

mRNA processing
• Eukaryotic genes are split into segments:
– Exons: coding regions
– Introns: non-coding regions
• mRNA processing removes introns and splices exons together
• Processed mRNA can be translated into a protein sequence
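
Splicing can be sketched as joining the exon slices of a transcript. The sequence and exon coordinates below are made up for illustration, and the function name `splice` is hypothetical. Real exon boundaries come from gene annotation, not from the code.

```python
from typing import Sequence, Tuple


def splice(pre_mrna: str, exons: Sequence[Tuple[int, int]]) -> str:
    """Join exon slices (0-based start, end-exclusive) in the order given."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exon 1, a 12-base intron, exon 2 (all invented for this example).
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
exons = [(0, 6), (18, 24)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```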

mRNA Processing

Image source: http://departments.oxy.edu/biology/Stillman/bi221/111300/processing_of_hnrnas.htm

tRNA
• Transfer RNA
• Well-defined three-dimensional structure
• Critical for creation of proteins

tRNA
• An amino acid is attached to each tRNA
• Which amino acid is determined by the 3-base anticodon sequence (complementary to the mRNA codon)
• Translation: process in which the nucleotide sequence of the processed mRNA is used to join amino acids together into a protein with the help of ribosomes and tRNA
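
The codon/anticodon relationship is the RNA version of base pairing. This is a minimal sketch assuming strict Watson-Crick pairing (A with U, C with G). It ignores the looser "wobble" pairing that real tRNAs allow at the third codon position, and the function name `anticodon` is an illustrative choice.

```python
# Strict RNA base pairing (an assumption: wobble pairing is not modelled).
RNA_COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon(codon: str) -> str:
    """Return the anticodon for an mRNA codon, both written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))


print(anticodon("AUG"))  # anticodon pairing with the start codon
```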

Genetic Code
• 4 possible bases (A, C, G, U)
• 3 bases in the codon
• 4 * 4 * 4 = 64 possible codon sequences
• Start codon: AUG
• Stop codons: UAA, UAG, UGA
• 61 codons code for amino acids (including AUG, which also codes for methionine)
• 20 amino acids: redundancy in the genetic code (most amino acids have several codons)
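
The counts above can be checked by building the codon table directly. This sketch uses the standard genetic code written as a 64-letter string in U, C, A, G order, with one-letter amino acid codes and `*` for stop. The `translate` function is a simplification: it assumes translation starts at the first AUG and reads until the first stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

STOP_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
CODING_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE), len(CODING_CODONS), STOP_CODONS)
print(translate("AUGGCCUUUUAA"))
```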

20 Amino Acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic Acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)
• START: AUG
• STOP: UAA, UAG, UGA

Amino Acids
• Building blocks for proteins (20 different)
• Vary by side chain groups
• Hydrophilic amino acids are water soluble
• Hydrophobic amino acids are not
• Linked via a single chemical bond (peptide bond)
• Peptide: short linear chain of amino acids (< 30)
• Polypeptide: long chain of amino acids (which can be upwards of 4000 residues long)

Proteins
• Polypeptides having a three-dimensional structure
• Primary: sequence of amino acids constituting the polypeptide chain
• Secondary: local organization into secondary structures such as α helices and β sheets
• Tertiary: three-dimensional arrangement of the amino acids as they react to one another due to the polarity and resulting interactions between their side chains
• Quaternary: number and relative positions of the protein subunits

Central Dogma

DNA → RNA → PROTEIN

What is a Gene?
• The physical and functional unit of heredity that carries information from one generation to the next
• DNA sequence necessary for the synthesis of a functional protein or RNA molecule

Genome
• Chromosomal DNA of an organism
• Number of chromosomes and genome size vary significantly from one organism to another
• Genome size and number of genes do not necessarily determine organism complexity

Genome Comparison

ORGANISM                              CHROMOSOMES   GENOME SIZE (bp)   GENES
Homo sapiens (Humans)                 23            3,200,000,000      ~30,000
Mus musculus (Mouse)                  20            2,600,000,000      ~30,000
Drosophila melanogaster (Fruit Fly)   4             180,000,000        ~18,000
Saccharomyces cerevisiae (Yeast)      16            14,000,000         ~6,000
Zea mays (Corn)                       10            2,400,000,000      ???
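
One way to see that genome size and gene count do not track each other is to compute gene density. The sketch below uses only the approximate figures from the table above. Zea mays is left out because its gene count is not given, and the names `GENOMES` and `genes_per_megabase` are illustrative.

```python
# (genome size in base pairs, approximate gene count), taken from the table above.
GENOMES = {
    "Homo sapiens": (3_200_000_000, 30_000),
    "Mus musculus": (2_600_000_000, 30_000),
    "Drosophila melanogaster": (180_000_000, 18_000),
    "Saccharomyces cerevisiae": (14_000_000, 6_000),
}


def genes_per_megabase(size_bp: int, genes: int) -> float:
    """Approximate number of genes per million base pairs."""
    return genes / (size_bp / 1_000_000)


density = {
    name: genes_per_megabase(size_bp, genes)
    for name, (size_bp, genes) in GENOMES.items()
}

for name, value in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.1f} genes/Mb")
```

With these figures the yeast genome comes out far more gene-dense than the human genome, even though it has many fewer genes in total.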

Transcriptome
• Complete collection of all possible mRNAs (including splice variants) of an organism
• Regions of an organism’s genome that get transcribed into messenger RNA
• Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes

Introduction to Bioinformatics National Genome Information Center Spring 2006 (original author: Dr. Eric Rouchka)
Proteome
• The complete collection of proteins that can be produced by an organism
• Can be studied either as a static (sum of all proteins possible) or dynamic (all proteins found at a specific time point) entity
