lecture1-3_525_W16_large
BIOINFORMATICS
Barry Grant
University of Michigan
www.thegrantlab.org
Scoring: Match: +2   Mismatch: -1   Gap: -2

           A    G    T    T    C
      0   -2   -4   -6   -8  -10
 A   -2   +2    0   -2   -4   -6
 T   -4    0   +1   +2    0   -2
 T   -6   -2   -1   +3   +4   +2
 G   -8   -4    0   +1   +2   +3
 C  -10   -6   -2   -1    0   +4

Alignments (each scores +4):

ATTGC
| | |
AGTTC

A-TTGC
| || |
AGTT-C
THIS WEEK’S HOMEWORK
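The matrix above can be reproduced with a short global-alignment routine. This is a minimal sketch of Needleman-Wunsch dynamic programming using the slide's scores (match +2, mismatch -1, gap -2); the function name and the tie-breaking order in the traceback (diagonal first) are illustrative assumptions, so it returns only one of the equally good alignments.

```python
def needleman_wunsch(a, b, match=2, mismatch=-1, gap=-2):
    """Fill the global-alignment score matrix and trace back one optimal alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: leading gaps in b
        score[i][0] = i * gap
    for j in range(1, m + 1):          # first row: leading gaps in a
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align the two residues
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Traceback from the bottom-right corner (assumed tie-break: diagonal first).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ATTGC", "AGTTC")
print(score[-1][-1])
print(top)
print(bottom)
```

With rows indexed by ATTGC and columns by AGTTC, the last cell holds the optimal score and the traceback recovers the ungapped alignment.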
Protein structure
Chemical entities
Protein families,
motifs and domains
Protein interactions
Pathways
Systems
STRUCTURAL DATA IS CENTRAL
Sequence                   Structure                  Function
• Unfolded chain of        • Ordered in a precise     • Active in specific
  amino acids                3D arrangement             “conformations”
• Highly mobile            • Stable but dynamic       • Specific associations
• Inactive                                              & precise reactions
In daily life, we use machines with functional structure and moving parts
Genomics is a great start … but not the end
▪ But a parts list is not enough to understand how a bicycle works
KEY CONCEPT: ENERGY LANDSCAPE
[Figure: energy landscape with labelled states and barriers]
• Unfolded State: Expanded, Disordered
• Molten Globule State: Compact, Disordered
• Native State(s): Compact, Ordered
• Multiple Native Conformations (e.g. ligand bound and unbound)
• Barrier crossing time ~ exp(Barrier Height)
• Timescales labelled on the figure: 1 millisecond and 0.1 microseconds
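The relation "barrier crossing time ~ exp(barrier height)" can be turned into a quick estimate. The sketch below assumes an Arrhenius-like form, tau = tau0 * exp(dG / kT), with the same prefactor tau0 for both barriers; that shared prefactor and the 300 K temperature are assumptions for illustration, not values from the slide.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def barrier_difference_kT(tau_slow, tau_fast):
    """Barrier height difference (in units of kT) implied by two crossing times,
    assuming tau = tau0 * exp(barrier / kT) with an identical prefactor tau0."""
    return math.log(tau_slow / tau_fast)


# Timescales quoted on the slide: 1 millisecond versus 0.1 microseconds.
diff_kT = barrier_difference_kT(1e-3, 0.1e-6)
diff_kcal = diff_kT * K_B * 300.0  # assumed temperature of 300 K

print(round(diff_kT, 2), "kT")
print(round(diff_kcal, 2), "kcal/mol")
```

Under these assumptions a 10,000-fold difference in crossing time corresponds to a barrier difference of roughly 9 kT, which illustrates how steeply timescales depend on barrier height.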
OUTLINE:
‣ Overview of structural bioinformatics
• Major motivations, goals and challenges
Structural Genomics has contributed to driving down the cost and time
[Figure: structure determination pipeline; stage labels: bl, xtal mounting, xtal screening, data collection, phasing, tracing, struc. refinement, struc. validation, annotation, publication]
• Comparison
• Prediction
• Design
[Figure labels: G-protein, kinesin; axis: Residue No.]
side chain (R group)
main chain (backbone)
[Figure labels: side chains, backbone, φ, ψ, N-terminal, C-terminal]
Bond angles and lengths are largely invariant
Peptide bond is planar (Cα, C, O, N, H, Cα all lie in the same plane)
[Figure: Ramachandran plot with Beta Sheet and Alpha Helix regions labelled]
• Steric hindrance dictates torsion angle preference
• The Ramachandran plot shows preferred regions of the φ and ψ dihedral angles, which correspond to the major forms of secondary structure
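A dihedral (torsion) angle such as φ or ψ is defined by four consecutive atoms. The following is a minimal sketch of how such an angle can be computed from Cartesian coordinates using the standard atan2 formulation; the helper names are illustrative and the coordinates in the example are made-up points, not real backbone atoms.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: three arrangements of the fourth point around the same central bond.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
perpendicular = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
print(cis, trans, perpendicular)
```

For a protein backbone, φ would use the atoms C(i-1), N(i), Cα(i), C(i) and ψ would use N(i), Cα(i), C(i), N(i+1); plotting the (φ, ψ) pairs for every residue gives the Ramachandran plot.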
α-helix
• Most common form has 3.6 residues per turn (number of residues in one full rotation)
• Hydrogen bonds (dashed lines) between residue i and i+4 stabilize the structure
• The side chains (in green) protrude outward
• 3₁₀-helix and π-helix forms are less common
Hydrogen bond: i→i+4
In antiparallel β-sheets
• Adjacent β-strands run in opposite directions
• Hydrogen bonds (dashed lines) between NH and CO stabilize the structure
• The side chains (in green) are above and below the sheet
Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/
MAJOR SECONDARY STRUCTURE TYPES
ALPHA HELIX & BETA SHEET
In parallel β-sheets
• Adjacent β-strands run in the same direction
• Hydrogen bonds (dashed lines) between NH and CO stabilize the structure
• The side chains (in green) are above and below the sheet
Image from: http://www.ncbi.nlm.nih.gov/books/NBK21581/
What Does a Protein Look Like?
• Proteins are stable (and hidden) in water
• Proteins closely interact with water
• Proteins are close-packed, solid but flexible objects (globular)
• Due to their large size and complexity it is often hard to see what's important in the structure
• Backbone or main-chain representation can help trace chain topology & reveal secondary structure
• Simplified secondary structure representations are commonly used to communicate structural details
• Now we can clearly see 2°, 3° and 4° structure
• Coiled chain of connected secondary structures
DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY
Superposition of all 482 structures in RCSB PDB (23/09/2015)
DISPLACEMENTS REFLECT INTRINSIC FLEXIBILITY
Principal component analysis (PCA) of experimental structures
Key forces affecting structure:
• H-bonding
• Van der Waals
• Electrostatics
• Hydrophobicity
• Disulfide Bridges
H-bonding geometry: 2.6 Å < d < 3.1 Å, 150° < θ < 180°
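The distance and angle ranges above translate directly into a geometric test. This is a minimal sketch that assumes d is the donor-acceptor heavy-atom distance and θ is the donor-hydrogen-acceptor angle; the slide does not spell out which atoms define d and θ, so both choices are assumptions, as are the function names and the toy coordinates.

```python
import math


def angle_deg(a, b, c):
    """Angle at point b (in degrees) formed by the points a-b-c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    cos_angle = sum(x * y for x, y in zip(ba, bc)) / (math.dist(a, b) * math.dist(c, b))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_angle))


def is_hbond(donor, hydrogen, acceptor, d_range=(2.6, 3.1), theta_range=(150.0, 180.0)):
    """Apply the slide's geometric criteria (distance in Å, angle in degrees).
    The upper angle bound is inclusive here so a perfectly linear bond passes."""
    d = math.dist(donor, acceptor)
    theta = angle_deg(donor, hydrogen, acceptor)
    return d_range[0] < d < d_range[1] and theta_range[0] < theta <= theta_range[1]


# Toy coordinates in Å (illustrative only).
linear = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
too_far = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0))
bent = is_hbond((0.0, 0.0, 0.0), (0.5, 0.866, 0.0), (2.9, 0.0, 0.0))
print(linear, too_far, bent)
```

The same pattern extends to scanning all donor/acceptor pairs in a structure, at the cost of an all-pairs loop.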
Key forces affecting structure: Van der Waals
[Figure: interaction between two atoms at distance d, with Repulsion and Attraction regions labelled]
3 Å < d < 4 Å
Key forces affecting structure: Electrostatics (sometimes called IONIC BONDs or SALT BRIDGEs)
d = 2.8 Å
Coulomb's law:
$$E = \frac{k\, q_1 q_2}{D\, r}$$
E = Energy
k = constant
D = Dielectric constant (vacuum = 1; H2O = 80)
q1 & q2 = electronic charges (Coulombs)
r = distance (Å)
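To get a feel for the dielectric term, Coulomb's law can be evaluated for a pair of opposite unit charges at the 2.8 Å separation shown on the slide. This sketch assumes charges expressed in units of the elementary charge and the commonly used conversion constant k ≈ 332.06 kcal·Å/(mol·e²); that unit choice differs from the slide's Coulombs and is an assumption made for convenience.

```python
COULOMB_K = 332.06  # kcal*Å/(mol*e^2); assumed unit system, not taken from the slide


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Coulomb interaction energy (kcal/mol) for charges in units of e and r in Å."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_K * q1 * q2 / (dielectric * r)


# A +1/-1 pair (e.g. a salt bridge) at 2.8 Å, in vacuum (D = 1) and in water (D = 80).
e_vacuum = coulomb_energy(+1, -1, 2.8, dielectric=1.0)
e_water = coulomb_energy(+1, -1, 2.8, dielectric=80.0)
print(round(e_vacuum, 1), "kcal/mol in vacuum")
print(round(e_water, 2), "kcal/mol in water")
```

The 80-fold weakening in water is why the local environment matters so much when estimating how much a salt bridge contributes to stability.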
Key forces affecting structure: Hydrophobicity
The force that causes hydrophobic molecules or nonpolar portions of molecules to aggregate together rather than to dissolve in water is called Hydrophobicity (Greek, “water fearing”). This is not a separate bonding force; rather, it is the result of the energy required to insert a nonpolar molecule into water.
Forces affecting structure: Disulfide Bridges
Other names: cystine bridge, disulfide bridge
Hair contains lots of disulfide bonds which are broken and reformed by heat
NEXT UP:
‣ Overview of structural bioinformatics
• Major motivations, goals and challenges
Two main approaches:
(1). Physics-Based
(2). Knowledge-Based
KEY CONCEPT: POTENTIAL FUNCTIONS DESCRIBE A SYSTEM'S ENERGY AS A FUNCTION OF ITS STRUCTURE
PHYSICS-BASED POTENTIALS
ENERGY TERMS FROM PHYSICAL THEORY
The Potential Energy Function
Strengths
  Interpretable, provides guides to design
  Broadly applicable, in principle at least
  Clear pathways to improving accuracy
Status
  Useful, widely adopted but far from perfect
  Multiple groups working on fewer, better approxs
    Force fields, quantum, entropy, water effects
  Moore's law: hardware improving
SIDE-NOTE: GPUS AND ANTON SUPERCOMPUTER
KNOWLEDGE-BASED DOCKING POTENTIALS
[Figure labels: Histidine, Ligand carboxylate, Aromatic stacking]
ENERGY DETERMINES PROBABILITY (STABILITY)
Basic idea: Use probability as a proxy for energy
Boltzmann (Energy → Probability):
$$p(r) \propto e^{-E(r)/k_B T}$$
Inverse Boltzmann (Probability → Energy):
$$E(r) = -k_B T \ln p(r)$$
Example: ligand carboxylate O to protein histidine N
Find all protein-ligand structures in the PDB with a ligand carboxylate O
1. For each structure, histogram the distances from O to every histidine N
2. Sum the histograms over all structures to obtain p(r_O-N)
3. Compute E(r_O-N) from p(r_O-N)
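The three steps above can be sketched in a few lines. This is a simplified illustration, assuming the inverse Boltzmann relation E(r) = -kT ln p(r) applied to raw bin frequencies; published knowledge-based potentials also normalize against a reference state, which this sketch deliberately omits. The function names, bin settings, temperature and the distance lists are all made up for the example.

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def histogram(distances, r_max, bin_width):
    """Count distances (Å) into equal-width bins covering [0, r_max)."""
    n_bins = int(round(r_max / bin_width))
    counts = [0] * n_bins
    for d in distances:
        idx = int(d / bin_width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def inverse_boltzmann(per_structure_distances, r_max=6.0, bin_width=1.0, temperature=300.0):
    """Steps 1-3: histogram per structure, sum over structures, convert p(r) to E(r).
    Bins with no observations return None because ln(0) is undefined."""
    n_bins = int(round(r_max / bin_width))
    total = [0] * n_bins
    for distances in per_structure_distances:                    # step 1
        for i, count in enumerate(histogram(distances, r_max, bin_width)):
            total[i] += count                                     # step 2
    n_obs = sum(total)
    energies = []
    for count in total:                                           # step 3
        if count == 0:
            energies.append(None)
        else:
            energies.append(-K_B * temperature * math.log(count / n_obs))
    return energies


# Hypothetical O...N distances (Å) from two structures; not real PDB data.
toy_distances = [[2.7, 2.9, 4.5], [2.8, 5.5]]
energies = inverse_boltzmann(toy_distances)
print(energies)
```

Frequently observed distances map to low energies and rare distances to high energies, which is the sense in which probability serves as a proxy for energy.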
KNOWLEDGE-BASED DOCKING POTENTIALS
“PMF”, Muegge & Martin, J. Med. Chem. (1999) 42:791
[Figure: a few types of atom pairs, out of several hundred total; x-axis: atom-atom distance (Angstroms)]
KNOWLEDGE-BASED POTENTIALS
Weaknesses
  Accuracy limited by availability of data
Strengths
  Relatively easy to implement
  Computationally fast
Status
  Useful, far from perfect
  May be at point of diminishing returns (not always clear how to make improvements)
NEXT UP:
‣ Overview of structural bioinformatics
• Major motivations, goals and challenges
• Proteins are intrinsically flexible molecules with internal motions that are often intimately coupled to their biochemical function
  – E.g. ligand and substrate binding, conformational activation, allosteric regulation, etc.
• Thus knowledge of dynamics can provide a deeper understanding of the mapping of structure to function
  – Molecular dynamics (MD) and normal mode analysis (NMA) are two major methods for predicting and characterizing molecular motions and their properties
MOLECULAR DYNAMICS SIMULATION
BASIC ANATOMY OF A MD SIMULATION
Divide time into discrete (~1 fs) time steps (∆t)
(for integrating equations of motion, see below)
At each time step calculate pair-wise atomic forces (F(t))
(by evaluating force-field gradient)
  Nuclear motion described classically
  Empirical force field
Use the forces to calculate velocities and move atoms to new positions
(by integrating numerically via the “leapfrog” scheme)
REPEAT, (iterate) many, many, many times… 1 ms = 10^12 time steps
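The loop described above (compute forces, update velocities, move atoms, repeat) can be illustrated on the simplest possible system. This is a minimal sketch of leapfrog integration for a single particle on a one-dimensional harmonic spring in reduced units; the harmonic force stands in for a real force field, and the function names and parameter values are assumptions for the example.

```python
import math


def force(x, k=1.0):
    """Force from a harmonic potential E = 0.5 * k * x**2 (stand-in for a force field)."""
    return -k * x


def leapfrog(x, v, dt, n_steps, mass=1.0, k=1.0):
    """Leapfrog integration: velocities live on half steps, positions on whole steps."""
    v_half = v + 0.5 * dt * force(x, k) / mass       # shift velocity to t + dt/2
    for _ in range(n_steps):
        x = x + dt * v_half                          # move the particle using v(t + dt/2)
        v_half = v_half + dt * force(x, k) / mass    # update velocity with the new force
    v = v_half - 0.5 * dt * force(x, k) / mass       # shift velocity back onto a whole step
    return x, v


# Start at x = 1 with zero velocity; the exact solution is x(t) = cos(t).
x_end, v_end = leapfrog(x=1.0, v=0.0, dt=0.01, n_steps=1000)
energy_end = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
print(x_end, math.cos(10.0), energy_end)
```

In a real MD code the scalar x becomes the coordinates of every atom and the force call evaluates the force-field gradient, but the update pattern per time step is the same.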
MD Prediction of Functional Motions
[Figure labels: “close”, “open”]
Yao and Grant, Biophys J. (2013)
Simulations Identify Key Residues Mediating Dynamic Activation
Yao … Grant, Journal of Biological Chemistry (2016)
EXAMPLE APPLICATION OF MOLECULAR SIMULATIONS TO GPCRS
Structure determines function
• Example: G protein-coupled receptors (GPCRs)
• Largest class of human drug targets
• Function: allow the cell to sense and respond to molecules outside it
[Figure labels: Binding, Cell membrane, GPCR, Activation, G-protein coupling, G protein]
PROTEINS JUMP BETWEEN MANY, HIERARCHICALLY
ORDERED “CONFORMATIONAL SUBSTATES”
Example: F1-ATPase in water (183,674 atoms) for 1 nanosecond:
  => 10^6 integration steps
  => 8.4 * 10^11 floating point operations/step
       [n(n-1)/2 interactions]
       Total: 8.4 * 10^17 flop
       (on a 100 Gflop/s cpu: ca 25 years!)
… but performance has been improved by use of:
       multiple time stepping      ca. 2.5 years
       fast multipole methods      ca. 1 year
       parallel computers          ca. 5 days
       modern GPUs                 ca. 1 day
       (Anton supercomputer        ca. minutes)
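The cost estimate above is simple arithmetic and can be reproduced directly. The sketch below takes the atom count, step count and flop-per-step figures from the slide and derives the number of pair interactions and the total work; the helper names are illustrative. Dividing the total by a peak rate of 100 Gflop/s gives roughly 97 days rather than 25 years, so the slide's figure presumably reflects a much lower sustained throughput (about 1 Gflop/s would give roughly 25 years); that reading is an assumption, not something the slide states.

```python
SECONDS_PER_DAY = 86_400
N_ATOMS = 183_674          # from the slide
N_STEPS = 10 ** 6          # 1 ns at a 1 fs time step
FLOP_PER_STEP = 8.4e11     # from the slide


def pair_count(n_atoms):
    """Number of unique atom pairs, n(n-1)/2."""
    return n_atoms * (n_atoms - 1) // 2


def wall_time_days(total_flop, flop_per_second):
    """Wall-clock time in days at a given sustained floating point rate."""
    return total_flop / flop_per_second / SECONDS_PER_DAY


pairs = pair_count(N_ATOMS)
flop_per_pair = FLOP_PER_STEP / pairs
total_flop = FLOP_PER_STEP * N_STEPS
days_at_100_gflops = wall_time_days(total_flop, 100e9)
days_at_1_gflops = wall_time_days(total_flop, 1e9)

print(pairs, round(flop_per_pair, 1), total_flop)
print(round(days_at_100_gflops, 1), "days at 100 Gflop/s")
print(round(days_at_1_gflops / 365.25, 1), "years at 1 Gflop/s")
```

The quadratic growth of the pair count with system size is what motivates the algorithmic improvements listed on the slide, such as fast multipole methods.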
COARSE GRAINING: NORMAL MODE ANALYSIS (NMA)
• MD is still time-consuming for large systems
• Elastic network model NMA (ENM-NMA) is an example of a lower resolution approach that finishes in seconds even for large systems.
[Figure: Atomistic versus Coarse Grained (C.G.) representation; beads i and j separated by distance rij]
• 1 bead / 1 amino acid
• Connected by springs
NMA models the protein as a network of elastic springs
[Example structure: Proteinase K]
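The "beads connected by springs" picture can be made concrete with a small sketch. It builds a spring between every pair of beads closer than a cutoff and evaluates the harmonic energy of a deformed structure relative to the reference; the 7 Å cutoff, the uniform spring constant and the four-bead toy chain are assumptions for illustration. A full NMA would go on to diagonalize the second-derivative (Hessian) matrix of this energy, which is not shown here.

```python
import math


def build_springs(coords, cutoff=7.0):
    """Connect every pair of beads within `cutoff` Å; store the reference length."""
    springs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r0 = math.dist(coords[i], coords[j])
            if r0 <= cutoff:
                springs.append((i, j, r0))
    return springs


def enm_energy(coords, springs, k=1.0):
    """Elastic network energy: sum over springs of 0.5 * k * (r - r0)**2."""
    energy = 0.0
    for i, j, r0 in springs:
        r = math.dist(coords[i], coords[j])
        energy += 0.5 * k * (r - r0) ** 2
    return energy


# Toy chain: four beads (one per residue) spaced 3.8 Å apart along x.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
springs = build_springs(reference)

# Pull the last bead 1 Å further along x and measure the strain energy.
deformed = reference[:3] + [(12.4, 0.0, 0.0)]
print(len(springs), enm_energy(reference, springs), enm_energy(deformed, springs))
```

Because the reference structure is by construction the energy minimum, only the network topology is needed, which is why ENM-NMA is so much cheaper than MD.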
NEXT UP:
‣ Overview of structural bioinformatics
• Major motivations, goals and challenges
High throughput screening (HTS)
Hit confirmation
Lead compounds (e.g., µM Kd)
Lead optimization (Medicinal chemistry)
Potent drug candidates (nM Kd)
Animal and clinical evaluation
COMPUTER-AIDED LIGAND DESIGN
Aims to reduce number of compounds synthesized and assayed
  Lower costs
  Reduce chemical waste
  Facilitate faster progress
Two main approaches:
(1). Receptor/Target-Based
(2). Ligand/Drug-Based
SCENARIO 1:
RECEPTOR-BASED DRUG DISCOVERY
Structure of Targeted Protein Known: Structure-Based Drug Discovery
HIV Protease/KNI-272 complex
PROTEIN-LIGAND DOCKING
Structure-Based Ligand Design
Docking software
  Search for structure of lowest energy
  Potential function: Energy as function of structure
[Figure: example energy terms: VDW, Screened Coulombic, Dihedral]
STRUCTURE-BASED VIRTUAL SCREENING
Compound database + 3D structure of target (crystallography, NMR, modeling)
→ Virtual screening (e.g., computational docking)
→ Candidate ligands
→ Experimental assay
→ Ligands
→ Ligand optimization (Med chem, crystallography, modeling)
→ Drug candidates
COMPOUND LIBRARIES
Commercial (in-house pharma)
Government (NIH)
Academia
FRAGMENTAL STRUCTURE-BASED SCREENING
“Fragment” library + 3D structure of target
→ Fragment docking
→ Compound design
→ Experimental assay and ligand optimization (Med chem, crystallography, modeling)
→ Drug candidates
http://www.beilstein-institut.de/bozen2002/proceedings/Jhoti/jhoti.html
Multiple non active-site pockets identified
[Figure: Probe Occupancy versus Residue No. for GTP and GDP states; probes: ethanol, benzene, acetone, methylamine]
Top hits from ensemble docking against distal pockets were tested for inhibitory effects on basal ERK activity in glioblastoma cell lines.
[Figure: Ensemble computational docking; Ras activity in different cell lines; compound effect on U251 cell line; blot labels: Ras-GTP, Total Ras, P-ERK1/2, Total ERK1/2; compound labels include 117028 (10 µM) and 23895]
Compound testing in cancer cell lines
[Figure labels: Ligand, +, Complex, ΔG°]
COMMON SIMPLIFICATIONS USED IN PHYSICS-BASED DOCKING
Quantum effects approximated classically
Protein often held rigid
Configurational entropy neglected
Influence of water treated crudely
Two main approaches:
(1). Receptor/Target-Based
(2). Ligand/Drug-Based
Scenario 2
Structure of Targeted Protein Unknown: Ligand-Based Drug Discovery
e.g. MAP Kinase Inhibitors
Using knowledge of existing inhibitors to discover more
Why Look for Another Ligand if You Already Have Some?
  Experimental screening generated some ligands, but they don’t bind tightly
  A company wants to work around another company’s chemical patents
  A high-affinity ligand is toxic, is not well-absorbed, etc.
LIGAND-BASED VIRTUAL SCREENING
Compound Library + Known Ligands
→ Molecular similarity, Machine-learning, Etc.
→ Candidate ligands
→ Assay
→ Actives
→ Optimization (Med chem, crystallography, modeling)
→ Potent drug candidates
CHEMICAL SIMILARITY
LIGAND-BASED DRUG-DISCOVERY
Compounds (available/synthesizable)
→ Compare with known ligands
   Similar: Test experimentally
   Different: Don’t bother
CHEMICAL FINGERPRINTS
BINARY STRUCTURE KEYS
[Figure: binary structure-key bit strings for Molecule 1 and Molecule 2; key labels: phenyl, naphthyl, ketone, methyl, ethyl, aldehyde, alcohol, amide, carboxylate, S-S bond, chlorine, fluorine, …]
CHEMICAL SIMILARITY FROM FINGERPRINTS
Tanimoto Similarity or Jaccard Index, T
$$T = \frac{N_I}{N_U}$$
Intersection N_I = 2
Union N_U = 8
so T = 2/8 = 0.25
[Figure: fingerprints of Molecule 1 and Molecule 2]
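The Tanimoto (Jaccard) calculation is easy to express with sets of "on" bits. In this minimal sketch each fingerprint is the set of structure keys present in a molecule; the particular key assignments for the two molecules are hypothetical, chosen only so that the intersection is 2 and the union is 8 as on the slide.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto similarity (Jaccard index): shared keys divided by keys present in either."""
    keys_a, keys_b = set(keys_a), set(keys_b)
    union = keys_a | keys_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(keys_a & keys_b) / len(union)


# Hypothetical structure keys switched on in each molecule (illustrative only).
molecule_1 = {"phenyl", "methyl", "ketone", "alcohol", "amide"}
molecule_2 = {"phenyl", "methyl", "carboxylate", "chlorine", "fluorine"}

similarity = tanimoto(molecule_1, molecule_2)
print(len(molecule_1 & molecule_2), len(molecule_1 | molecule_2), similarity)
```

In a ligand-based screen, each library compound would be scored this way against the known ligands and only those above a chosen similarity threshold would be tested experimentally.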
POTENTIAL DRAWBACKS OF PLAIN CHEMICAL SIMILARITY
May miss good ligands by being overly conservative
May put too much weight on irrelevant details
  - Examine ligand shape and common substructures
  - Build pharmacophore models
  - Statistics and machine learning on chemical descriptors
Maximum Common Substructure
Ncommon = 34
Pharmacophore Models
Φάρμακο (drug) + Φορά (carry)
A 3-point pharmacophore
[Figure: three features (Bulky hydrophobe, Aromatic, +1) separated by distances of 5.0 ± 0.3 Å, 3.2 ± 0.4 Å and 2.8 ± 0.3 Å]
Molecular Descriptors
More abstract than chemical fingerprints
Physical descriptors
  molecular weight
  charge
  dipole moment
  number of H-bond donors/acceptors
  number of rotatable bonds
  hydrophobicity (log P and clogP)
Topological
  branching index
  measures of linearity vs interconnectedness
Etc. etc.
A High-Dimensional “Chemical Space”
Each compound is at a point in an n-dimensional space
Compounds with similar properties are near each other
[Figure: axes Descriptor 1, Descriptor 2, Descriptor 3; point representing a compound in descriptor space]
Apply multivariate statistics and machine learning for descriptor-selection.
(e.g. partial least squares, support vector machines, random forest, etc.)
CAUTIONARY NOTES
• “Everything should be made as simple as it can be but not simpler”
  A model is never perfect. A model that is not quantitatively accurate in every respect does not preclude one from establishing results relevant to our understanding of biomolecules as long as the biophysics of the model are properly understood and explored.
• Calibration of the parameters is an ongoing and imperfect process
  Questions and hypotheses should always be designed such that they do not depend crucially on the precise numbers used for the various parameters.
• A computational model is rarely universally right or wrong
  A model may be accurate in some regards, inaccurate in others. These subtleties can only be uncovered by comparing to all available experimental data.
SUMMARY
Protein structure
Chemical entities
Protein families,
motifs and domains
Protein interactions
Pathways
Systems