lecture1-3_525_W16_large


STRUCTURAL BIOINFORMATICS
Barry Grant
University of Michigan
www.thegrantlab.org

BIOINF 525 http://bioboot.github.io/bioinf525_w16/ 26-Jan-2016


MODULE OVERVIEW
Objective: Provide an introduction to the practice of bioinformatics
as well as a practical guide to using common bioinformatics
databases and algorithms

1.1. ‣ Introduction to Bioinformatics

1.2. ‣ Sequence Alignment and Database Searching

1.3 ‣ Structural Bioinformatics

1.4 ‣ Genome Informatics: High Throughput Sequencing Applications
and Analytical Methods

WEEK TWO REVIEW

Answers to last week's homework (19/19):


Answers week 2

Muddy Point Assessment (11/19):


Responses

- “More time to finish the assignment”


- “I felt there was too much material to cover in one lab”
- “The [NCBI] sites were so slow”
- “More time with HMMER would be helpful”
- “Very nice lab”
Q18: NW DYNAMIC PROGRAMMING

Match: +2
Mismatch: -1
Gap: -2

          A    G    T    T    C
     0   -2   -4   -6   -8  -10
A   -2   +2    0   -2   -4   -6
T   -4    0   +1   +2    0   -2
T   -6   -2   -1   +3   +4   +2
G   -8   -4    0   +1   +2   +3
C  -10   -6   -2   -1    0   +4

ATTGC
| | |
AGTTC

A-TTGC
| || |
AGTT-C
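
The matrix fill for Q18 can be sketched in a few lines of Python. This is a minimal illustration that assumes the scoring scheme stated on the slide (match +2, mismatch -1, gap -2); the function name `nw_score_matrix` is a hypothetical helper, and traceback is left out.

```python
def nw_score_matrix(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Fill the Needleman-Wunsch score matrix (rows: seq_a, columns: seq_b)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: cumulative gap penalties.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell takes the best of diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    return score

matrix = nw_score_matrix("ATTGC", "AGTTC")
for row in matrix:
    print(row)
```

The bottom-right cell holds the optimal global alignment score, which both alignments shown for Q18 achieve.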
THIS WEEK’S HOMEWORK

Check out the “Background Reading” material online:


‣ Achievements & Challenges in Structural Bioinformatics
‣ Protein Structure Prediction
‣ Biomolecular Simulation
‣ Computational Drug Discovery

Complete the lecture 1.3 homework questions:


http://tinyurl.com/bioinf525-quiz3
“Bioinformatics is the application of computers
to the collection, archiving, organization, and
analysis of biological data.”

… A hybrid of biology and computer science


Bioinformatics is computer aided biology!


Goal: Data to Knowledge


So what is structural bioinformatics?

… computer aided structural biology!

Aims to characterize and interpret biomolecules and
their assemblies at the molecular & atomic level
Why should we care?
Because biomolecules are “nature’s robots”

… and because it is only by coiling into
specific 3D structures that they are able to
perform their functions
BIOINFORMATICS DATA

Literature and ontologies


Gene expression
Genomes
Protein sequence

DNA & RNA sequence

Protein structure

DNA & RNA structure

Chemical entities
Protein families,
motifs and domains

Protein interactions

Pathways

Systems
STRUCTURAL DATA IS CENTRAL

Literature and ontologies


Gene expression
Genomes
Protein sequence

DNA & RNA sequence

Protein structure

Sequence > Structure > Function
(ENERGETICS, DYNAMICS)

DNA & RNA structure

Chemical entities
Protein families,
motifs and domains

Protein interactions

Pathways

Systems
Sequence
• Unfolded chain of amino acids
• Highly mobile
• Inactive

Structure
• Ordered in a precise 3D arrangement
• Stable but dynamic

Function
• Active in specific “conformations”
• Specific associations & precise reactions
In daily life, we use machines
with functional structure and moving parts
Genomics is a great start …
▪ But a parts list is not enough to understand how a bicycle works
… but not the end
▪ We want the full spatiotemporal picture, and an ability to control it
▪ Broad applications, including drug design, medical diagnostics, chemical manufacturing, and energy
Extracted from The Inner Life of a Cell by Cellular Visions and Harvard
[YouTube link: https://www.youtube.com/watch?v=y-uuk4Pr2i8 ]
KEY CONCEPT: ENERGY LANDSCAPE

Unfolded State: Expanded, Disordered
Molten Globule State: Compact, Disordered
Native State(s): Compact, Ordered
Multiple Native Conformations (e.g. ligand bound and unbound)

Barrier crossing time ~exp(Barrier Height)
Timescales shown: 0.1 microseconds and 1 millisecond
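
To get a feel for the exponential dependence of crossing time on barrier height, the two timescales quoted on the landscape can be compared. The sketch below assumes an Arrhenius-like form, tau = tau0 * exp(barrier / kT), with the same prefactor tau0 for both barriers. That is an assumption made here for illustration, not something stated on the slide.

```python
import math

def barrier_difference_kT(tau_slow, tau_fast):
    """Barrier-height difference (in units of kT) implied by two crossing times,
    assuming tau = tau0 * exp(barrier / kT) with a shared prefactor tau0."""
    return math.log(tau_slow / tau_fast)

# 1 millisecond versus 0.1 microseconds, both in seconds
delta = barrier_difference_kT(1e-3, 0.1e-6)
print(f"{delta:.1f} kT")
```

Under that assumption, a 10,000-fold difference in crossing time corresponds to a barrier only about 9 kT higher.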
OUTLINE:
‣ Overview of structural bioinformatics
• Major motivations, goals and challenges

‣ Fundamentals of protein structure


• Composition, form, forces and dynamics

‣ Representing and interpreting protein structure


• Modeling energy as a function of structure

‣ Example application areas


• Predicting functional dynamics & drug discovery
TRADITIONAL FOCUS: PROTEIN, DNA
AND SMALL MOLECULE DATA SETS
WITH MOLECULAR STRUCTURE

Protein (PDB)
DNA (NDB)
Small Molecules (CCDB)
Motivation 1:
Detailed understanding of
molecular interactions

Provides an invaluable structural context for conservation and
mechanistic analysis leading to functional insight.
Motivation 1:
Detailed understanding of
molecular interactions

Computational modeling can provide detailed insight into
functional interactions, their regulation and potential
consequences of perturbation.

Grant et al. PLoS. Comp. Biol. (2010)


Motivation 2:
Lots of structural data is
becoming available

115,306 PDB entries (1/20/2016)

Structural Genomics has contributed to driving down the cost and time
required for structural determination

Data from: http://www.rcsb.org/pdb/statistics/


[Figure: structure determination pipeline: target selection, cloning, expression, purification, crystallization, imaging, harvesting, bl xtal mounting, xtal screening, data collection, phasing, tracing, struc. refinement, struc. validation, annotation, publication, PDB]

Image Credit: “Structure determination assembly line” Adam Godzik


Motivation 3:
Theoretical and
computational predictions
have been, and continue
to be, enormously
valuable and influential!
SUMMARY OF KEY MOTIVATIONS

Sequence > Structure > Function


• Structure determines function, so understanding structure
helps our understanding of function

Structure is more conserved than sequence


• Structure allows identification of more distant evolutionary
relationships

Structure is encoded in sequence


• Understanding the determinants of structure allows design and
manipulation of proteins for industrial and medical advantage
Goals:
• Analysis
• Visualization
• Comparison
• Prediction
• Design

Examples shown (figures omitted):
Grant et al. JMB. (2007)
Scarabelli and Grant. PLoS. Comp. Biol. (2013)
Grant et al. unpublished (myosin, G-protein, kinesin)
Grant et al. PLoS One (2011, 2012)
Grant et al. PLoS Biology (2011)


MAJOR RESEARCH AREAS
AND CHALLENGES

Include but are not limited to:


• Protein classification
• Structure prediction from sequence
• Binding site detection
• Binding prediction and drug design
• Modeling molecular motions
• Predicting physical properties (stability, binding affinities)
• Design of structure and function
• etc...
With applications to Biology, Medicine, Agriculture and Industry
NEXT UP:
‣ Fundamentals of protein structure
• Composition, form, forces and dynamics
HIERARCHICAL STRUCTURE OF PROTEINS

Primary > Secondary > Tertiary > Quaternary

Primary: amino acid residues
Secondary: alpha helix
Tertiary: polypeptide chain
Quaternary: assembled subunits

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/


RECAP: AMINO ACID NOMENCLATURE

side chain
(R group)

main chain
(backbone)

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/


AMINO ACIDS CAN BE GROUPED BY THEIR
PHYSICOCHEMICAL PROPERTIES

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/


AMINO ACIDS POLYMERIZE THROUGH
PEPTIDE BOND FORMATION

side chains
backbone

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/


PEPTIDES CAN ADOPT DIFFERENT
CONFORMATIONS BY VARYING THEIR
PHI & PSI BACKBONE TORSIONS

φ, ψ

C-terminal

N-terminal

Bond angles and lengths are largely invariant.
Peptide bond is planar (Cα, C, O, N, H, Cα all lie in the same plane).

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/
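
Since bond lengths and angles are nearly fixed, a backbone conformation is largely described by its torsions. The sketch below shows one common way to compute a torsion (dihedral) angle from four atom positions. The helper names are hypothetical, and the points in the example are made-up coordinates rather than real atoms. For φ the four atoms would be C(i-1), N(i), Cα(i), C(i); for ψ they would be N(i), Cα(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Made-up points: eclipsed (cis), perpendicular, and trans arrangements
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perp = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```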


PHI VS PSI PLOTS ARE KNOWN AS
RAMACHANDRAN DIAGRAMS

Beta Sheet

Alpha Helix

• Steric hindrance dictates torsion angle preference
• Ramachandran plots show preferred regions of φ and ψ dihedral angles, which correspond to major forms of secondary structure

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/


MAJOR SECONDARY STRUCTURE TYPES
ALPHA HELIX & BETA SHEET

α-helix
• Most common form has 3.6 residues per turn (number of residues in one full rotation)
• Hydrogen bonds (dashed lines) between residue i and i+4 stabilize the structure
• The side chains (in green) protrude outward
• 3₁₀-helix and π-helix forms are less common

Hydrogen bond: i→i+4

Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/


MAJOR SECONDARY STRUCTURE TYPES
ALPHA HELIX & BETA SHEET

In antiparallel β-sheets
• Adjacent β-strands run in opposite directions
• Hydrogen bonds (dashed lines) between NH and CO stabilize the structure
• The side chains (in green) are above and below the sheet
Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/
MAJOR SECONDARY STRUCTURE TYPES
ALPHA HELIX & BETA SHEET

In parallel β-sheets
• Adjacent β-strands run in the same direction
• Hydrogen bonds (dashed lines) between NH and CO stabilize the structure
• The side chains (in green) are above and below the sheet
Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/
What Does a Protein Look like?
• Proteins are stable (and hidden) in water
• Proteins closely interact with water
• Proteins are close packed solid but flexible objects (globular)
• Due to their large size and complexity it is often hard to see what's important in the structure
• Backbone or main-chain representation can help trace chain topology & reveal secondary structure
• Simplified secondary structure representations are commonly used to communicate structural details
• Now we can clearly see 2°, 3° and 4° structure
• Coiled chain of connected secondary structures
DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY

Superposition of all 482 structures in RCSB PDB (23/09/2015)

Principal component analysis (PCA) of experimental structures
Key forces affecting structure:

• H-bonding
• Van der Waals
• Electrostatics
• Hydrophobicity
• Disulfide Bridges

H-bonding
2.6 Å < d < 3.1 Å
150° < θ < 180°

Van der Waals
Repulsion and attraction regions; 3 Å < d < 4 Å

Electrostatics (sometimes called IONIC BONDs or SALT BRIDGEs)
d = 2.8 Å
Coulomb's law: $E = \dfrac{k\, q_1 q_2}{D\, r}$
E = Energy
k = constant
D = Dielectric constant (vacuum = 1; H2O = 80)
q1 & q2 = electronic charges (Coulombs)
r = distance (Å)

Hydrophobicity
The force that causes hydrophobic molecules or nonpolar portions of molecules to aggregate together rather than to dissolve in water is called Hydrophobicity (Greek, “water fearing”). This is not a separate bonding force; rather, it is the result of the energy required to insert a nonpolar molecule into water.

Disulfide Bridges
Other names: cystine bridge, disulfide bridge
Hair contains lots of disulfide bonds which are broken and reformed by heat
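
The effect of the dielectric constant in Coulomb's law is easy to see numerically. The sketch below is illustrative only: it assumes charges expressed in units of the elementary charge and a conversion constant of about 332 kcal·Å/(mol·e²), a convention common in molecular modeling rather than the Coulomb units listed on the slide. The function name is hypothetical.

```python
# Assumed convention: charges in elementary-charge units, distance in Å,
# energy in kcal/mol, so k is approximately 332.06 kcal*Å/(mol*e^2).
COULOMB_K = 332.06

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """E = k * q1 * q2 / (D * r)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)

# A +1/-1 pair at the 2.8 Å separation quoted for salt bridges
e_vacuum = coulomb_energy(+1, -1, 2.8, dielectric=1.0)
e_water = coulomb_energy(+1, -1, 2.8, dielectric=80.0)
print(f"vacuum: {e_vacuum:.1f} kcal/mol, water: {e_water:.2f} kcal/mol")
```

With these assumptions the same ion pair is 80 times weaker in bulk water than in vacuum, which is one reason buried salt bridges behave differently from solvent-exposed ones.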
NEXT UP:
‣ Representing and interpreting protein structure
• Modeling energy as a function of structure
PDB

Growing but not as rapidly as Sequence repositories


It is highly biased towards crystallography of enzymes
Search: HIV
Search: 1HSG
(PDB ID)
Slide Credit: RCSB PDB
PDB FILE FORMAT

• PDB files contain atomic coordinates and
associated information.
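
As a rough illustration of how coordinates are stored, the sketch below reads one ATOM record using the fixed column positions of the PDB format. The record shown is an illustrative example line, and `parse_atom_record` is a hypothetical helper, not part of any particular library.

```python
def parse_atom_record(line):
    """Slice one fixed-width ATOM/HETATM record into its main fields."""
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "resname": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                # column 22
        "resseq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "bfactor": float(line[60:66]),    # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }

record = "ATOM      1  N   PRO A   1      29.361  39.686   5.862  1.00 38.10           N"
atom = parse_atom_record(record)
print(atom["resname"], atom["chain"], atom["resseq"], atom["x"], atom["y"], atom["z"])
```

Because the format is column-based rather than whitespace-delimited, slicing by position is safer than splitting on spaces.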
KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE
A SYSTEM'S ENERGY AS A FUNCTION OF ITS
STRUCTURE

Two main approaches:
(1). Physics-Based
(2). Knowledge-Based
PHYSICS-BASED POTENTIALS
ENERGY TERMS FROM PHYSICAL THEORY
The Potential Energy Function

$U = U_{bond} + U_{angle} + U_{dihedral} + U_{nonbond}$

Ubond = oscillations about the equilibrium bond length
Uangle = oscillations of 3 atoms about an equilibrium bond angle
Udihedral = torsional rotation of 4 atoms about a central bond
Unonbond = non-bonded energy terms (electrostatics and Lennard-Jones)

CHARMM P.E. function, see: http://www.charmm.org/
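
The individual terms are simple functions of geometry. The sketch below uses functional forms commonly quoted for CHARMM-style force fields (harmonic bonds and angles, a cosine dihedral term, and a 12-6 Lennard-Jones term written with Rmin). All parameter values in the example are placeholders chosen for illustration, not real force-field parameters.

```python
import math

def bond_energy(b, b0, k_b):
    """Harmonic bond stretch: k_b * (b - b0)^2."""
    return k_b * (b - b0) ** 2

def angle_energy(theta, theta0, k_theta):
    """Harmonic angle bend: k_theta * (theta - theta0)^2, angles in radians."""
    return k_theta * (theta - theta0) ** 2

def dihedral_energy(phi, k_phi, n, delta):
    """Periodic torsion: k_phi * (1 + cos(n * phi - delta))."""
    return k_phi * (1 + math.cos(n * phi - delta))

def lennard_jones(r, epsilon, r_min):
    """12-6 term: epsilon * ((r_min / r)^12 - 2 * (r_min / r)^6)."""
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 ** 2 - 2 * ratio6)

# Placeholder parameters, for illustration only
u_bond = bond_energy(b=1.53, b0=1.50, k_b=300.0)
u_lj_min = lennard_jones(r=3.5, epsilon=0.1, r_min=3.5)
u_dih = dihedral_energy(phi=0.0, k_phi=1.0, n=2, delta=0.0)
```

In this form the Lennard-Jones energy reaches its minimum of -epsilon exactly at r = r_min, which makes the parameters easy to interpret.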


[Image slides]

Slide Credit: Michael Levitt


PHYSICS-ORIENTED APPROACHES
Weaknesses
Fully physical detail becomes computationally intractable
Approximations are unavoidable
(Quantum effects approximated classically, water may be treated crudely)
Parameterization still required

Strengths
Interpretable, provides guides to design
Broadly applicable, in principle at least
Clear pathways to improving accuracy

Status
Useful, widely adopted but far from perfect
Multiple groups working on fewer, better approxs
Force fields, quantum
entropy, water effects
Moore's law: hardware improving
SIDE-NOTE: GPUS AND ANTON
SUPERCOMPUTER
KNOWLEDGE-BASED DOCKING POTENTIALS

[Figure labels: Histidine, Ligand carboxylate, Aromatic stacking]
ENERGY DETERMINES PROBABILITY
(STABILITY)
Basic idea: Use probability as a proxy for energy

Boltzmann: $p(r) \propto e^{-E(r)/RT}$

Inverse Boltzmann: $E(r) = -RT \ln p(r)$

Example: ligand carboxylate O to protein histidine N
Find all protein-ligand structures in the PDB with a ligand carboxylate O
1. For each structure, histogram the distances from O to every histidine N
2. Sum the histograms over all structures to obtain p(rO-N)
3. Compute E(rO-N) from p(rO-N)
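
The histogram-to-energy recipe above can be sketched as follows. This is a simplified illustration: the distances are made-up values, RT is taken as roughly 0.593 kcal/mol (near 298 K), and no reference-state or shell-volume correction is applied, which a production knowledge-based potential would normally include.

```python
import math

RT = 0.593  # kcal/mol near 298 K (assumed)

def inverse_boltzmann(distances, bin_width=0.5, r_max=6.0):
    """Histogram distances, then convert bin probabilities to energies."""
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        k = int(d / bin_width)
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts)
    energies = []
    for c in counts:
        if c == 0:
            energies.append(float("inf"))  # distance bin never observed
        else:
            energies.append(-RT * math.log(c / total))
    return counts, energies

# Hypothetical O...N distances (Å) pooled over many structures
observed = [2.7, 2.8, 2.9, 2.8, 3.1, 4.6]
counts, energies = inverse_boltzmann(observed)
```

Frequently observed distances come out with lower energy, which is the sense in which probability stands in for energy.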
KNOWLEDGE-BASED DOCKING
POTENTIALS
“PMF”, Muegge & Martin, J. Med. Chem. (1999) 42:791
A few types of atom pairs, out of several hundred total

[Figure panels: Nitrogen+/Oxygen−, Aromatic carbons, Aliphatic carbons; x-axis: atom-atom distance (Angstroms)]
KNOWLEDGE-BASED POTENTIALS
Weaknesses
Accuracy limited by availability of data

Strengths
Relatively easy to implement
Computationally fast

Status
Useful, far from perfect
May be at point of diminishing returns
(not always clear how to make improvements)
NEXT UP:
‣ Example application areas
• Predicting functional dynamics & drug discovery
PREDICTING FUNCTIONAL DYNAMICS

• Proteins are intrinsically flexible molecules with internal motions that are often intimately coupled to their biochemical function
– E.g. ligand and substrate binding, conformational activation, allosteric regulation, etc.

• Thus knowledge of dynamics can provide a deeper understanding of the mapping of structure to function
– Molecular dynamics (MD) and normal mode analysis (NMA) are two major methods for predicting and characterizing molecular motions and their properties
MOLECULAR DYNAMICS SIMULATION

• Use force-field to find potential energy between all atom pairs
• Move atoms to next state
• Repeat to generate trajectory

McCammon, Gelin & Karplus, Nature (1977)


[ See: https://www.youtube.com/watch?v=ui1ZysMFcKk ]
BASIC ANATOMY OF A MD SIMULATION
Divide time into discrete (~1fs) time steps (∆t)
(for integrating equations of motion, see below)

At each time step calculate pair-wise atomic forces (F(t))
(by evaluating force-field gradient)
Nuclear motion described classically
Empirical force field

Use the forces to calculate velocities and move atoms to new positions
(by integrating numerically via the “leapfrog” scheme)

REPEAT (iterate many, many times… 1 ms = 10^12 time steps)
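
The force, velocity, position loop can be sketched for a single particle on a spring. This is a toy illustration in reduced units, using the kick-drift-kick form of the leapfrog scheme; the function name and all parameter values are assumptions for the example, and a real MD engine applies the same update to every atom with forces taken from the force-field gradient.

```python
def leapfrog(x, v, force, mass, dt, n_steps):
    """Kick-drift-kick leapfrog integration for one particle in 1D."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update (kick)
        x = x + dt * v_half                # full-step position update (drift)
        f = force(x)                       # forces at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

k = 1.0  # spring constant (reduced units)
traj = leapfrog(x=1.0, v=0.0, force=lambda pos: -k * pos,
                mass=1.0, dt=0.01, n_steps=628)
x_end, v_end = traj[-1]
energy = 0.5 * v_end ** 2 + 0.5 * k * x_end ** 2
```

With this step size the oscillator returns close to its starting point after about one period, and the total energy stays near its initial value of 0.5, which is the behavior that makes leapfrog-type integrators attractive for long trajectories.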
MD Prediction of Functional Motions
“close”

“open”

Yao and Grant, Biophys J. (2013)
Simulations Identify Key Residues
Mediating Dynamic Activation

Yao … Grant, Journal of Biological Chemistry (2016)
EXAMPLE APPLICATION OF
MOLECULAR SIMULATIONS TO GPCRS
Structure determines function
• Example: G protein-coupled receptors (GPCRs)
• Largest class of human drug targets
• Function: allow the cell to sense and respond to molecules outside it

[Figure labels: Binding, Cell membrane, GPCR, Activation, G-protein coupling, G protein]
PROTEINS JUMP BETWEEN MANY, HIERARCHICALLY
ORDERED “CONFORMATIONAL SUBSTATES”

H. Frauenfelder et al., Science 229 (1985) 337



MOLECULAR DYNAMICS IS VERY EXPENSIVE

Example: F1-ATPase in water (183,674 atoms) for 1 nanosecond:
  => 10^6 integration steps
  => 8.4 * 10^11 floating point operations/step
       [n(n-1)/2 interactions]

       Total: 8.4 * 10^17 flop
      (on a 1 Gflop/s cpu: ca 25 years!)

… but performance has been improved by use of:
      multiple time stepping    ca. 2.5 years
      fast multipole methods    ca. 1 year
      parallel computers        ca. 5 days
      modern GPUs               ca. 1 day
      (Anton supercomputer      ca. minutes)
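
The arithmetic behind this estimate can be checked directly. The sketch below takes the atom count, step count and per-step cost from the slide and converts the total into wall-clock time at an assumed sustained throughput; the function name and the choice of rates are illustrative assumptions.

```python
n_atoms = 183_674
n_steps = 10 ** 6            # 1 ns at ~1 fs per step
flop_per_step = 8.4e11       # figure quoted for all pairwise interactions

n_pairs = n_atoms * (n_atoms - 1) // 2   # n(n-1)/2 atom pairs
total_flop = n_steps * flop_per_step

def wall_time_years(total, flop_per_second):
    """Convert a flop count to years at a sustained flop/s rate."""
    seconds = total / flop_per_second
    return seconds / (365.25 * 24 * 3600)

years_1gflops = wall_time_years(total_flop, 1e9)
print(f"{n_pairs:.3e} pairs, {total_flop:.1e} flop, "
      f"{years_1gflops:.1f} years at 1 Gflop/s")
```

At a sustained 1 Gflop/s the total comes to roughly 27 years, in line with the "ca. 25 years" figure, and every 100-fold gain in throughput cuts the time by the same factor.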
COARSE GRAINING: NORMAL MODE ANALYSIS (NMA)

• MD is still time-consuming for large systems
• Elastic network model NMA (ENM-NMA) is an example of a lower resolution approach that finishes in seconds even for large systems.

Atomistic → Coarse Grained (C.G.)
• 1 bead / 1 amino acid
• Connected by springs (between beads i and j at distance rij)

NMA models the protein as a network of elastic springs

Proteinase K
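
The "beads connected by springs" picture amounts to a connectivity matrix built from a distance cutoff. The sketch below builds the simplest such matrix (the Kirchhoff matrix of a Gaussian network model) for made-up bead coordinates. The 7 Å cutoff and the function name are assumptions for illustration; full ENM-NMA would go on to diagonalize this matrix (or its 3N x 3N anisotropic counterpart) to obtain modes, which is not done here.

```python
import math

def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix: -1 for bead pairs within the cutoff,
    and the number of connected neighbours on the diagonal."""
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma

# Hypothetical beads spaced 3.8 Å apart along a line (one bead per residue)
beads = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff_matrix(beads)
for row in gamma:
    print(row)
```

Only the bead positions and one cutoff enter the model, which is why such calculations finish in seconds even for large systems.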
THE TRADITIONAL EMPIRICAL PATH TO
DRUG DISCOVERY
Compound library
(commercial, in-house, synthetic, natural)

High throughput screening
(HTS)

Hit confirmation

Lead compounds
(e.g., µM Kd)

Lead optimization
(Medicinal chemistry)

Potent drug candidates
(nM Kd)

Animal and clinical evaluation
COMPUTER-AIDED LIGAND DESIGN

Aims to reduce number of compounds synthesized and assayed

Lower costs

Reduce chemical waste

Facilitate faster progress
Two main approaches:
(1). Receptor/Target-Based
(2). Ligand/Drug-Based
SCENARIO 1:
RECEPTOR-BASED DRUG DISCOVERY
Structure of Targeted Protein Known: Structure-Based Drug Discovery

HIV Protease/KNI-272 complex
PROTEIN-LIGAND DOCKING
Structure-Based Ligand Design
Docking software: search for structure of lowest energy
Potential function: energy as function of structure

[Figure: energy terms shown: VDW, Screened Coulombic, Dihedral]
STRUCTURE-BASED VIRTUAL SCREENING
Compound database + 3D structure of target (crystallography, NMR, modeling)

Virtual screening
(e.g., computational docking)

Candidate ligands

Experimental assay

Ligands

Ligand optimization
(Med chem, crystallography, modeling)

Drug candidates
COMPOUND LIBRARIES

Commercial (in-house pharma)
Government (NIH)
Academia
FRAGMENTAL STRUCTURE-BASED
SCREENING
“Fragment” library + 3D structure of target

Fragment docking

Compound design

Experimental assay and ligand optimization
(Med chem, crystallography, modeling)

Drug candidates

http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html
Multiple non active-site pockets identified

Small organic probe fragment affinities map multiple potential
binding sites across the structural ensemble.

[Figure: probe occupancy by residue number for GTP and GDP states; probes: ethanol, benzene, acetone, methylamine, isopropanol, cyclohexane, acetamide, phenol]
Ensemble docking & candidate inhibitor testing
3) NCI ligands that target the C1 pocket of K-ras

Top hits from ensemble docking against distal pockets were tested for
inhibitory effects on basal ERK activity in glioblastoma cell lines.

[Figure: ensemble computational docking; Ras activity in different cell lines (Ras-GTP, total Ras); compound effect on U251 cell line (P-ERK1/2, total ERK1/2) at 10 µM; compound testing in cancer cell lines. Compound labels shown include 117028, 23895, 36818, 121182 and 99660.]

PLoS One (2011, 2012)
Proteins and Ligand are Flexible

Protein + Ligand ⇌ Complex (ΔG°)
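
The binding free energy ΔG° can be related to the dissociation constants quoted earlier for leads (µM Kd) and potent candidates (nM Kd). The sketch below assumes the standard thermodynamic relation ΔG° = RT ln(Kd / c°) with a 1 M standard concentration and a temperature of 298.15 K. That relation is textbook thermodynamics rather than something shown on the slide, and the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature=298.15, standard_conc=1.0):
    """Standard binding free energy (kcal/mol) from a dissociation constant."""
    return R_KCAL * temperature * math.log(kd_molar / standard_conc)

dg_micromolar = binding_free_energy(1e-6)
dg_nanomolar = binding_free_energy(1e-9)
print(f"1 uM: {dg_micromolar:.1f} kcal/mol, 1 nM: {dg_nanomolar:.1f} kcal/mol")
```

Under these assumptions, going from a micromolar lead to a nanomolar candidate means gaining about 4 kcal/mol of binding free energy, which gives a sense of the accuracy a scoring function needs to rank compounds usefully.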
COMMON SIMPLIFICATIONS USED IN
PHYSICS-BASED DOCKING

Quantum effects approximated classically

Protein often held rigid

Configurational entropy neglected

Influence of water treated crudely
Two main approaches:
(1). Receptor/Target-Based
(2). Ligand/Drug-Based
Scenario 2
Structure of Targeted Protein Unknown: Ligand-Based Drug Discovery
e.g. MAP Kinase Inhibitors

Using knowledge of existing inhibitors to discover more
Why Look for Another Ligand if You Already Have Some?

Experimental screening generated some ligands, but they don’t bind tightly

A company wants to work around another company’s chemical patents

A high-affinity ligand is toxic, is not well-absorbed, etc.
LIGAND-BASED VIRTUAL SCREENING

Compound Library + Known Ligands

Molecular similarity
Machine-learning
Etc.

Candidate ligands

Assay

Actives

Optimization
(Med chem, crystallography, modeling)

Potent drug candidates
CHEMICAL SIMILARITY
LIGAND-BASED DRUG-DISCOVERY

Compounds
(available/synthesizable)

Compare with known ligands

Different: Don’t bother

Similar: Test experimentally
CHEMICAL FINGERPRINTS
BINARY STRUCTURE KEYS

Structure keys: phenyl, naphthyl, ketone, methyl, ethyl, aldehyde, alcohol, amide, carboxylate, …, S-S bond, chlorine, fluorine

[Figure: binary key table for Molecule 1 and Molecule 2]
CHEMICAL SIMILARITY FROM
FINGERPRINTS

Tanimoto Similarity
or Jaccard Index, T

Intersection NI=2

Union NU=8

$T = N_I / N_U = 2/8 = 0.25$

Molecule 1

Molecule 2
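
The Tanimoto calculation on binary structure keys can be sketched as below. The two bit vectors are hypothetical fingerprints, chosen only so that they share 2 "on" bits out of 8 set in either molecule, matching the counts on the slide; they are not the actual fingerprints from the figure.

```python
def tanimoto(fp1, fp2):
    """Tanimoto (Jaccard) similarity of two equal-length binary fingerprints."""
    on1 = {i for i, bit in enumerate(fp1) if bit}
    on2 = {i for i, bit in enumerate(fp2) if bit}
    union = on1 | on2
    if not union:
        return 0.0  # no keys set in either molecule
    return len(on1 & on2) / len(union)

# Hypothetical 13-key fingerprints (1 = substructure present)
molecule_1 = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
molecule_2 = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
similarity = tanimoto(molecule_1, molecule_2)
print(similarity)
```

Keys that are absent from both molecules do not count toward the union, so the score reflects shared features rather than shared absences.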
POTENTIAL DRAWBACKS OF PLAIN
CHEMICAL SIMILARITY

May miss good ligands by being overly conservative

May put too much weight on irrelevant details
  - Examine ligand shape and common substructures
  - Build pharmacophore models
  - Statistics and machine learning on chemical descriptors
Maximum Common Substructure

Ncommon=34
Pharmacophore Models
Φάρμακο (drug) + Φορά (carry)

A 3-point pharmacophore
Features: bulky hydrophobe, +1, aromatic
Distances: 5.0 ± 0.3 Å, 3.2 ± 0.4 Å, 2.8 ± 0.3 Å
Molecular Descriptors
More abstract than chemical fingerprints

Physical descriptors
  molecular weight
  charge
  dipole moment
  number of H-bond donors/acceptors
  number of rotatable bonds
  hydrophobicity (log P and clogP)

Topological
  branching index
  measures of linearity vs interconnectedness

Etc. etc.
A High-Dimensional “Chemical Space”
Each compound is at a point in an n-dimensional space
Compounds with similar properties are near each other

[Figure: a point representing a compound in descriptor space; axes: Descriptor 1, Descriptor 2, Descriptor 3]

Apply multivariate statistics and machine learning for descriptor-selection.
(e.g. partial least squares, support vector machines, random forest, etc.)
CAUTIONARY NOTES

• “Everything should be made as simple as it can be but not simpler”
A model is never perfect. A model that is not quantitatively accurate in every respect does not preclude one from establishing results relevant to our understanding of biomolecules as long as the biophysics of the model are properly understood and explored.

• Calibration of the parameters is an ongoing and imperfect process
Questions and hypotheses should always be designed such that they do not depend crucially on the precise numbers used for the various parameters.

• A computational model is rarely universally right or wrong
A model may be accurate in some regards, inaccurate in others. These subtleties can only be uncovered by comparing to all available experimental data.
SUMMARY

• Structural bioinformatics is computer aided structural biology

• Described major motivations, goals and challenges of structural bioinformatics

• Reviewed the fundamentals of protein structure

• Introduced both physics and knowledge based modeling approaches for describing the structure, energetics and dynamics of proteins computationally
Ilan Samish et al. Bioinformatics 2015;31:146-150
INFORMING SYSTEMS BIOLOGY?

Literature and ontologies


Gene expression
Genomes
Protein sequence

DNA & RNA sequence

Protein structure

DNA & RNA structure

Chemical entities
Protein families,
motifs and domains

Protein interactions

Pathways

Systems
