L2 Proteomics, Genomics and Bioinformatics


EB3314 Protein Technologies

Chapter 2
Proteomics – Relation to Genomics and Bioinformatics
Genomics
● In principle, genomics starts with the discovery of the DNA structure by
Watson and Crick in 1953

● Progress was slow until the discovery of enzymes such as DNA
polymerases, restriction endonucleases and ligases for the synthesis,
cutting and joining of DNA segments (1960s-1970s)

● Further propelled by technologies such as DNA cloning, DNA
sequencing and amplification (1970s-1980s)

● Large-scale automated sequencing, plus the use of computer
programs to manage and analyze sequence data, led to the
beginning of the genome projects in the early 1990s
The Genome Projects
Organism                                      Base pairs   Genes
Haemophilus influenzae                        1.8 Mb       1,800
E. coli                                       4.6 Mb       4,300
Saccharomyces cerevisiae (yeast)              11.47 Mb     5,700
Drosophila melanogaster (fruit fly)           122 Mb       13,000
Caenorhabditis elegans (soil worm)            100 Mb       19,000
Arabidopsis thaliana (small flowering plant)  157 Mb       27,000
Mus musculus (mouse)                          2.7 Gb       23,000

[Figure: images of H. influenzae, E. coli, S. cerevisiae, M. musculus, A. thaliana, C. elegans and D. melanogaster]
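
The genome table can be turned into a quick gene-density comparison (genes per Mb), which makes it easier to see that gene count does not scale with genome size. This is only a sketch: the sizes and gene counts are copied from the table, while the function names are chosen here for illustration.

```python
# Genome sizes (in Mb) and gene counts copied from the genome table.
GENOMES = {
    "Haemophilus influenzae": (1.8, 1800),
    "E. coli": (4.6, 4300),
    "Saccharomyces cerevisiae": (11.47, 5700),
    "Drosophila melanogaster": (122, 13000),
    "Caenorhabditis elegans": (100, 19000),
    "Arabidopsis thaliana": (157, 27000),
    "Mus musculus": (2700, 23000),  # 2.7 Gb expressed in Mb
}


def genes_per_mb(size_mb, genes):
    """Gene density: number of genes per megabase of genome."""
    return genes / size_mb


def density_table(genomes):
    """Map each organism to its gene density, rounded to one decimal place."""
    return {name: round(genes_per_mb(size, genes), 1)
            for name, (size, genes) in genomes.items()}


if __name__ == "__main__":
    # Print organisms from the most to the least gene-dense genome.
    for name, density in sorted(density_table(GENOMES).items(),
                                key=lambda item: -item[1]):
        print(f"{name:26s} {density:8.1f} genes/Mb")
```

With these figures the bacterial genomes carry on the order of a thousand genes per Mb, whereas the mouse genome carries fewer than ten.
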


The Human Genome

About 3 billion base pairs (≈3.3 Gb) → about 30,000 genes

At any given moment, each of our cells has some combination of genes
turned on, while the others are turned off.

How do scientists figure out which are on and which are off?
→ By gene expression profiling, using a technique called microarray
analysis
→ Microarray analysis involves lysing the cells to isolate their
mRNA → detecting which mRNAs are present → generating
a list of the genes that are turned “on”
→ A gene counts as turned “on” when its mRNA is
present/detected.
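
The on/off decision can be sketched as a threshold on the signal of each array spot relative to background. The gene names, intensities and the log2 cutoff below are hypothetical values chosen for illustration; a real analysis would also normalize the data and use replicates.

```python
import math


def expression_calls(signal, background, min_log2_ratio=1.0):
    """Label each gene 'on' or 'off' from its spot intensity.

    A gene is called 'on' when log2(intensity / background) reaches the
    cutoff, i.e. when its mRNA signal is clearly above background.
    """
    calls = {}
    for gene, intensity in signal.items():
        ratio = math.log2(intensity / background)
        calls[gene] = "on" if ratio >= min_log2_ratio else "off"
    return calls


# Hypothetical spot intensities (arbitrary units) for three genes.
spots = {"geneA": 5200.0, "geneB": 310.0, "geneC": 880.0}
calls = expression_calls(spots, background=300.0)

if __name__ == "__main__":
    for gene, state in calls.items():
        print(gene, state)
```
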
Genome to Proteome
• The genome remains essentially the same in every cell / cell type

• The proteome changes and differs from one cell to
another cell

[Figure callout: clinical markers]
One gene-one enzyme/protein vs
one gene-many enzymes / proteins
● The one gene-one enzyme hypothesis suggested that genetic
information flows in the following order:

● DNA → (transcription: mRNA) → (translation: protein)

● It works well for prokaryotic organisms, as the protein-coding information is
continuous and directly translatable from mRNA into an amino acid (AA) sequence.

● This raises a question about the “one gene-one protein” hypothesis →
about 30,000 genes make about 500,000 proteins → why are there fewer
genes than the proteins encoded by them?

● Eukaryotic genes have exon (protein-coding) and intron
(non-protein-coding) segments

● DNA → one primary transcript → many mRNAs → many proteins


One gene-many enzymes
• Each gene may code for several proteins through a process known as
alternative splicing

• One gene gives rise to different mRNA products and hence different
proteins
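
A minimal sketch of how one gene can yield several proteins through alternative splicing, assuming the simplest case (exon skipping only, with the first and last exon always retained). The exon sequences are hypothetical, and the codon table covers only the codons used in this toy example.

```python
from itertools import combinations

# Partial codon table: only the codons that occur in the toy exons below.
CODONS = {"AUG": "M", "GCU": "A", "AAA": "K", "GGU": "G", "UUU": "F", "UAA": "*"}


def splice_isoforms(exons):
    """Yield mRNAs that keep the first and last exon plus any subset of the
    internal exons, in their original order (exon skipping only)."""
    first, *internal, last = exons
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):
            yield first + "".join(kept) + last


def translate(mrna):
    """Translate an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical exons; each is a whole number of codons, so skipping an
# exon never shifts the reading frame.
exons = ["AUGGCU", "AAA", "GGU", "UUUUAA"]
proteins = sorted({translate(mrna) for mrna in splice_isoforms(exons)})

if __name__ == "__main__":
    print(proteins)
```

Here a single four-exon gene gives four distinct proteins; real genes can also use alternative splice sites and retained introns, which this sketch ignores.
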
Low number of genes and high number of proteins
• Junk DNA → DNA that does not code for phenotypic traits

• Single nucleotide polymorphism (SNP) → a single-nucleotide difference
within the same gene sequence between individuals. No two human
genomes are 100% identical

• Gene transcripts undergo mRNA splicing (a post-transcriptional
modification) → gives different functional proteins (Chapter 5)
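
A single nucleotide polymorphism can be located by comparing two aligned copies of the same gene position by position. The sketch below assumes the two sequences are already aligned and of equal length; the short sequences are hypothetical examples.

```python
def find_snps(ref, alt):
    """Return (1-based position, ref base, alt base) for every mismatch
    between two aligned sequences of equal length."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, a, b)
            for pos, (a, b) in enumerate(zip(ref, alt), start=1)
            if a != b]


# Two hypothetical alleles that differ at a single position.
allele_1 = "ATGGTGCACCTGACTCCTGAGGAG"
allele_2 = "ATGGTGCACCTGACTCCTGTGGAG"
snps = find_snps(allele_1, allele_2)

if __name__ == "__main__":
    for pos, ref_base, alt_base in snps:
        print(f"position {pos}: {ref_base} -> {alt_base}")
```
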
Evidence of Evolutionary Discontinuity –
Human GULO Pseudogene

● Biosynthesis of vitamin C is a multi-step process
that starts in mammals with glucose as the initial
substrate
● The terminal enzyme in the pathway is L-gulono-γ-lactone
oxidase, which catalyzes the final step
● The gene that encodes this enzyme is typically
symbolized in the literature as GULO
● Loss of GULO gene function primarily impacts the
cell's ability to make vitamin C
Evidence of Evolutionary Discontinuity –
Human GULO Pseudogene
● By contrast, losing gene function for gluconolactonase
◦ affects the synthesis of L-gulono-1,4-lactone,
◦ affects caprolactam degradation and the pentose
phosphate pathway
● The GULO gene may be “predisposed” to being
lost or mutated (pseudogenization)
compared with other genes because it makes
an enzyme that is reportedly unnecessary for
other biochemical pathways.
Different types of protein
structures
Relationship between protein structure and
function

Hydrophobicity is determined by primary and secondary
structure
e.g. membrane-spanning regions of membrane proteins are
typically alpha helices made of hydrophobic amino acids (a.a.), which interact
with the hydrophobic lipids to form a stable membrane structure
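
One common way to look for such hydrophobic, potentially membrane-spanning stretches from the primary sequence is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6; the test sequence is hypothetical, and a high score is only a hint, not proof, of a transmembrane helix.

```python
# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophobic_windows(seq, window=19, cutoff=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds
    the cutoff."""
    return [i for i, score in enumerate(hydropathy_profile(seq, window))
            if score > cutoff]


# Hypothetical sequence: polar flanks around a 20-residue hydrophobic stretch.
seq = "KDEKRSDEQK" + "LLIVALLAVIGLLVAILLVA" + "RKDESQEKDR"
hits = hydrophobic_windows(seq)

if __name__ == "__main__":
    print("windows above cutoff start at:", hits)
```
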
Example: Relationship between structure and function

** Hb is a soluble protein found in the cytoplasm of RBCs as single
molecules. In sickle cell anemia, a mutation in the beta globin chain
(glutamate → valine at position 6) increases its hydrophobicity, and the
protein molecules stick to each other to avoid the aqueous environment.
Example: Relationship between structure and function

** Folding brings together amino acids that are distant from
each other in the primary sequence to form the active site of an enzyme
or the ligand-binding site of a receptor
The need for proteomics
• The function of a protein depends on its three-dimensional structure
and interactions, which can hardly be predicted from the gene sequence
alone

• Mutations at the DNA level interfere with the expression of proteins

• Not all mRNAs are translated into functional proteins.

• Post-transcriptional processing and post-translational modification
alter the function of proteins produced by eukaryotic cells.

• Protein function is localized: different compartments / organelles
in the cell synthesize or contain proteins that regulate their respective activities.

• Proteins are therapeutically relevant molecules → biomarkers


Biomarkers
• Biomarkers → measurable characteristics that reflect physiological,
pharmacological, or disease processes

• Protein biomarkers are promising screening tools for breast cancer,
Alzheimer's disease, leukemia, and Parkinson's disease.

• Six steps must be completed to validate a
biomarker or set of biomarkers: discovery, qualification, identification, assay
optimization, validation and commercialization.
• Once a biomarker is found and accepted, it may be used to help predict,
and possibly prevent, the disease it is related to.
Tools of proteomics
While gene expression can be measured easily by converting mRNA to
cDNA and amplifying the complementary sequences by PCR,
there is no analogous amplification tool for protein analysis.
Challenges in proteomics
• Diversity → Data generated in proteomics are diverse (proteins
are built from 20 amino acids, which give them widely different
properties; genomic studies deal with only 4 nucleotides)

• Technology → Different competing or
complementary technologies, steps and methodologies
can be used for peptide sequence analysis (there is NO common
technology platform), which makes the comparison / interpretation of
data difficult

• Sensitivity → There is no amplification method
equivalent to the polymerase chain reaction (PCR) for nucleic
acids. Low-abundance proteins are therefore difficult to
detect.

• Location of proteins in the cell → proteins are modified to
suit their functions in different locations in the cell.
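
Two of these challenges can be put into numbers: the size of the sequence space for a 20-letter versus a 4-letter alphabet, and the exponential gain of PCR that protein analysis lacks. This is a back-of-the-envelope sketch; the function names and the idealized per-cycle efficiency are assumptions made for illustration.

```python
def sequence_space(alphabet_size, length):
    """Number of distinct sequences of a given length."""
    return alphabet_size ** length


def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies after a number of PCR cycles.

    efficiency=1.0 is the idealized case of perfect doubling per cycle.
    """
    return initial_copies * (1 + efficiency) ** cycles


if __name__ == "__main__":
    # Diversity: 10-residue peptides versus 10-base oligonucleotides.
    print("10-mer peptides:     ", sequence_space(20, 10))
    print("10-mer nucleic acids:", sequence_space(4, 10))
    # Sensitivity: one DNA template after 30 ideal PCR cycles.
    print("copies after 30 cycles:", pcr_copies(1, 30))
```

Under ideal doubling, a single DNA molecule becomes about a billion copies after 30 cycles, which is why low-abundance nucleic acids are far easier to detect than low-abundance proteins.
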
What is Bioinformatics?
Conceptualizing biology in terms of molecules and then applying
“informatics” techniques from math, computer science, and statistics
to understand and organize the information associated with these
molecules on a large scale
How do we use Bioinformatics?

• Store/retrieve biological information (databases)
• Retrieve/compare gene sequences / peptide sequences
• Predict the function of unknown genes/proteins
• Search for previously known functions of a gene/protein
• Compare data with other researchers
• Compile/distribute data for other researchers
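
Comparing two gene or peptide sequences usually means aligning them and scoring the alignment. The sketch below computes a Needleman-Wunsch global alignment score with a simple match/mismatch/gap scheme. The scoring values are assumptions chosen for illustration; real tools typically use substitution matrices such as BLOSUM62 and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b.

    Only the previous row of the dynamic-programming table is kept,
    since the score (not the alignment itself) is all that is returned.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("GATTACA", "GATCACA"))
```
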
Bioinformatics in Proteomics
• Uses information determined by biochemical / crystal
structure methods
• Visualization of protein structure
• Make protein-protein comparisons
• Used to determine:
◦ conformation/folding
◦ antibody binding sites
◦ protein-protein interactions
◦ computer-aided drug design
Bioinformatics
Strengths          Weaknesses
Accessibility      Sometimes not accurate
Growing rapidly    Limited possibilities
User friendly      Limited comparisons and information

A need for improved Bioinformatics
Genomics                 Proteomics
Human Genome Project     Global view of protein function/interactions
Gene array technology    Protein motifs
Comparative genomics     Structural databases
Functional genomics
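
Searching a protein sequence for a motif can be sketched with a regular expression. The example uses the N-glycosylation sequon N-{P}-[ST]-{P} (Asn, any residue except Pro, Ser or Thr, any residue except Pro); the peptide is hypothetical, and a sequence match only flags a candidate site.

```python
import re

# N-{P}-[ST]-{P} written as a regular expression. The lookahead makes the
# match zero-width, so overlapping occurrences are all reported.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def find_motif(seq, pattern=N_GLYCOSYLATION):
    """1-based start positions of every occurrence of the motif."""
    return [match.start() + 1 for match in pattern.finditer(seq)]


# Hypothetical peptide with two candidate sites and one site blocked by Pro.
peptide = "MKNGSAPNPTQNLTV"
sites = find_motif(peptide)

if __name__ == "__main__":
    print("candidate N-glycosylation sites at:", sites)
```
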

Data Mining
• Handling enormous amounts of data
• Sorting through what is important and what is not
• Manipulating and analyzing data to find patterns and variations that correlate
with biological function

Online Mendelian Inheritance in Man (OMIM) is
a database that catalogues known
diseases with a genetic component and, when
possible, links them to the relevant genes in the
human genome. It also provides references for
further research and tools for genomic analysis of
a catalogued gene

Structure site:
• Molecular Modeling Database (MMDB)
• Biopolymer structures from the Protein Data Bank (PDB)
• Cn3D (a 3D structure viewer)
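
Structure files in the legacy PDB text format store each atom as a fixed-width ATOM record, so coordinates are read by column position rather than by splitting on spaces. The sketch below builds one record and parses it back. The atom values are hypothetical, the helper names are chosen here for illustration, and only the first 54 columns (up to the z coordinate) are handled.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build the first 54 columns of a PDB ATOM record.

    Columns (1-based): 1-6 record name, 7-11 serial, 13-16 atom name,
    18-20 residue name, 22 chain, 23-26 residue number,
    31-38 x, 39-46 y, 47-54 z.
    """
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Extract the main fields of an ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


# Hypothetical alpha-carbon of an alanine in chain A.
record = format_atom(1, " CA", "ALA", "A", 1, 11.104, 6.134, -6.504)
atom = parse_atom(record)

if __name__ == "__main__":
    print(record)
    print(atom)
```
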
In silico
• An approach that gathers information available from various sources and uses
computer software to model or simulate the system under actual conditions.

• In silico predictions still need experimental support. For example, if a
trans-membrane protein was analyzed with a
sequence-based localisation tool and the result agreed with the hypothesis, it would
probably still be worth confirming experimentally before drawing a
conclusion.

• However, bioinformatics tools can be extremely useful time savers: they can
provide a possible place to start with experimentation, narrow down a
problem domain, or provide potential solutions to problems which would be
very difficult or impossible to determine experimentally, such as protein
folding.

• Although folding prediction is for now not a replacement for structure determination by
crystallography, it can provide a reasonable estimate of a structure, which can
be investigated until the actual structure is elucidated.
