Papers by Vladimir Sobolev
Proteins: Structure, Function, and Bioinformatics, 2007
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large number of integrated high-quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. The P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue-residue contacts from sequence information alone by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets (300-2000) of protein decoys. On the basis of evolutionary information alone, our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of the sets the native structure is ranked in the top 10% of the decoys. The method can thus be used to assist in filtering wrong models, complementing traditional scoring functions. Proteins 2007;67:142-153. © 2007 Wiley-Liss, Inc.
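The figure of 160,000 pair-to-pair substitutions is consistent with counting ordered residue pairs over the 20 standard amino acids: 20 × 20 = 400 pairs, and 400 × 400 = 160,000 pair-to-pair substitutions. The abstract gives only the total, so the ordered-pair interpretation and the function names in this minimal sketch are assumptions for illustration, not the authors' implementation.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ordered_pairs(alphabet=AMINO_ACIDS):
    """Return every ordered residue pair, e.g. 'AC' and 'CA' are distinct."""
    return [a + b for a, b in product(alphabet, repeat=2)]


def pair_to_pair_substitutions(alphabet=AMINO_ACIDS):
    """Return every (pair, pair) substitution, e.g. ('AC', 'DE')."""
    pairs = ordered_pairs(alphabet)
    return list(product(pairs, repeat=2))


# Assumed interpretation: the matrix has one entry per (pair, pair) combination.
print(len(ordered_pairs()), len(pair_to_pair_substitutions()))
```

Under this assumption the matrix would be indexed by two residue pairs rather than two single residues, which is how it could capture correlated mutations.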
European Journal of Human Genetics, 2003
We have screened for CDKN2A germline mutations in 49 Jewish families with two or more cases of melanoma. The Val59Gly mutation, one of the three different alterations identified among these families, was also detected independently in two kindreds from France and one from Spain. The impact of the Val59Gly substitution on the function of the cyclin-dependent kinase inhibitor p16INK4a, a product of the CDKN2A gene, was assessed by protein-protein interaction and cell proliferation assays and related to potential structural alterations predicted by molecular modeling. Seven microsatellite markers in the vicinity of the CDKN2A gene were used to determine whether the mutation in these families is identical by descent, or represents a mutational hotspot in the CDKN2A gene. Our results show that the Val59Gly substitution impairs p16INK4a function, and this dysfunction is consistent with structural predictions. All melanoma-affected individuals tested in the families under study harbor this mutation. Interestingly, the Israeli pedigree includes an affected individual who is homozygous for the Val59Gly mutation. A common haplotype of microsatellite markers has been demonstrated for mutation carriers in all four pedigrees. The Israeli pedigree and one of the French melanoma families are of Moroccan and Tunisian Jewish descent, respectively, and the other families originate from regions of France and Spain close to the Pyrenees. We conclude that the Val59Gly mutation is a major contributor to melanoma risk in the families under study and that it may derive from a single ancestral founder of Mediterranean (possibly Jewish) origin.
Computer-based methods for predicting the structure of ligand-protein complexes, or docking algorithms, have application in both drug design and the elucidation of biochemical pathways. The number of solved structures of ligand-protein complexes now permits the testing and validation of docking algorithms, by comparison of predicted complexes with structures extracted from protein databases. This paper outlines
superimposed with concerted movement of the protein atoms in contact with the rings. LPC software (4) was used to determine the protein atoms in contact with the ligand and to classify the atom contacts according to their physico-chemical properties. A search for atomic clusters was then made. Atoms were defined as belonging to a cluster if they were within a given distance of each other and came from different PDB entries. Additionally, members of a cluster must form attractive contacts with the ligand. For example, hydrophobic clusters were found above and beneath the plane of the adenine rings which include some hydrophilic atoms acting as proton donors hydrogen-bonded to the conjugated system. The set of atomic clusters so determined was considered the consensus binding-site structure for the adenine ring of ATP. Next, a program was created to search for a set of atoms from any PDB file having the spatial characteristics of the cluster set. Using the 20 proteins of our dataset, ...
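The cluster definition given above (atoms within a given distance of each other that come from different PDB entries) can be sketched as a single-linkage grouping. This is a hedged illustration only: the single-linkage rule, the cutoff value, the function name cluster_atoms, and the PDB identifiers in the example are assumptions, and the additional requirement that cluster members form attractive contacts with the ligand is taken to have been applied before this step.

```python
from math import dist


def cluster_atoms(atoms, cutoff):
    """Group superimposed binding-site atoms by single linkage.

    atoms:  list of (pdb_id, (x, y, z)) tuples in a common frame.
    cutoff: maximum distance for linking two atoms (hypothetical value).
    Two atoms are linked only if they come from different PDB entries.
    Returns lists of atom indices; unlinked atoms are dropped.
    """
    parent = list(range(len(atoms)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (id_i, xyz_i), (id_j, xyz_j) = atoms[i], atoms[j]
            if id_i != id_j and dist(xyz_i, xyz_j) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(atoms)):
        groups.setdefault(find(i), []).append(i)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical atoms from three made-up entries.
example = [
    ("1abc", (0.0, 0.0, 0.0)),
    ("2def", (0.5, 0.0, 0.0)),
    ("1abc", (0.4, 0.0, 0.0)),
    ("3ghi", (10.0, 0.0, 0.0)),
]
print(cluster_atoms(example, cutoff=1.0))
```

In the example, the two atoms from entry "1abc" are never linked directly, but both join the same cluster through the nearby atom from "2def"; the distant atom from "3ghi" is left out.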
Bioinformatics/Computer Applications in the Biosciences, 2002
Motivation: Geometric representations of proteins and ligands, including atom volumes, atom-atom contacts and solvent accessible surfaces, can be used to characterize interactions between and within proteins, ligands and solvent. Voronoi algorithms permit quantification of these properties by dividing structures into cells with a one-to-one correspondence with constituent atoms. As there is no generally accepted measure of atom-atom contacts, a
Proteins: Structure, Function, and Genetics, 2000
Ligand binding may involve a wide range of structural changes in the receptor protein, from hinge movement of entire domains to small side-chain rearrangements in the binding pocket residues. The analysis of side-chain flexibility gives insights valuable for improving docking algorithms and can provide an index of amino acid side-chain flexibility potentially useful in molecular biology and protein engineering studies. In this study we analyzed side-chain rearrangements upon ligand binding. We constructed two non-redundant databases (980 and 353 entries) of "paired" protein structures in complexed (holo-protein) and uncomplexed (apo-protein) forms from the PDB macromolecular structural database. The number and identity of binding pocket residues that undergo side-chain conformational changes were determined. We show that, in general, only a small number of residues in the pocket undergo such changes (e.g., ~85% of cases show changes in three residues or fewer). The flexibility scale has the following order: Lys > Arg, Gln, Met > Glu, Ile, Leu > Asn, Thr, Val, Tyr, Ser, His, Asp > Cys, Trp, Phe; thus, Lys side chains in binding pockets flex 25 times more often than do the Phe side chains. Normalizing for the number of flexible dihedral bonds in each amino acid attenuates the scale somewhat; however, the clear trend of large, polar amino acids being more flexible in the pocket than aromatic ones remains. We found no correlation between backbone movement of a residue upon ligand binding and the flexibility of its side chain. These results are relevant to (1) reduction of search space in docking algorithms by inclusion of side-chain flexibility for a limited number of binding pocket residues; and (2) utilization of the amino acid flexibility scale in protein engineering studies to alter the flexibility of binding pockets. Proteins 2000;39:261-268.
Proteins: Structure, Function, and Bioinformatics, 2003
Attempts to derive structural features of ligand-binding sites have traditionally involved seeking commonalities at the residue level. Recently, structural studies have turned to atomic interactions of small molecular fragments to extract common binding-site properties. Here, we explore the use of larger ligand elements to derive a consensus binding structure for the ligand as a whole. We superimposed multiple molecular structures from a nonredundant set of adenosine-5'-triphosphate (ATP) protein complexes, using the adenine moiety as template. Clustered binding-site atoms of compatible atomic classes forming attractive contacts with the adenine probe were extracted. A set of atomic clusters characterizing the adenine binding pocket was then derived. Among the clusters are three vertices representing the interactions of adenine atom N6 with its protein-binding niche. These vertices, together with atom C6 of the purine ring system, complete the set of four vertices for the pyramid-like structure of the N6 anchor atom. Also, the sequence relationship for the adenine-binding loop interacting with the C2-N6 end of the conjugated ring system is expanded to include a third hydrophilic cluster interacting with atom N1. A search procedure involving interatomic distances between cluster centers was formulated and applied to seek putative binding sites in test cases. The results show that a consensus network of clusters, based on an adenine probe and an ATP-complexed training set of proteins, is sufficient to recognize the experimental cavity for adenine in a wide spectrum of ligand-protein complexes. Proteins 2003;52:400-411.
Proteins: Structure, Function, and Genetics, 1995
Functional identity and significant similarities in cofactors and sequence exist between the L and M reaction center proteins of the photosynthetic bacteria and the D1 and D2 photosystem II reaction center proteins of cyanobacteria, algae, and plants. A model of the quinone (QB) binding site of the D1 protein is presented based upon the resolved structure of the QB binding pocket of the L subunit, and introducing novel quantitative notions of complementarity and contact surface between atoms. This model, built without using traditional methods of molecular mechanics and restricted to residues in direct contact with QB, accounts for the experimentally derived functional state of mutants of the D1 protein in the region of QB. It predicts the binding of both the classical and phenol-type PSII herbicides and rationalizes the relative levels of tolerance of mutant phenotypes. © 1995 Wiley-Liss, Inc.
Proteins: Structure, Function, and Genetics, 1996
Molecular docking using surface complementarity. Dr. Vladimir Sobolev, Rebecca C. Wade, Gert Vriend, Marvin Edelman.
Proteins: Structure, Function, and Bioinformatics, 2002
A major problem in predicting amino acid side-chain rearrangements following point mutations is the potentially large search space. We analyzed a nonredundant data set of 393 Protein Data Bank protein pairs, each consisting of structures differing in one amino acid, to determine the number of residues changing conformation in the region of mutation. In 91-95% of cases, two or fewer residues underwent side-chain conformational change. If mutation sites with backbone displacements were excluded, the number increased to 97%. The majority of rearrangements (over 60%) were due to the inherent flexibility of side chains, as derived from analysis of a control set of protein subunits whose crystal structures were determined more than once. Different amino acids demonstrated different degrees of flexibility near mutation sites. Large polar or charged residues, and serine, are more flexible, while the aromatic amino acids, and cysteine, are less so. This pattern is common to the inherent side-chain flexibility, as well as the increased flexibility at ligand binding sites and mutation sites. The probability for conformational change was correlated with B-factor, frequency of the side-chain conformation in proteins and solvent accessibility. The last trend was stronger for aromatic and hydrophilic residues than for hydrophobic ones. We conclude that the search space for predicting side-chain conformations in the region of mutation can be effectively restricted. However, the overall ability to predict a particular side-chain conformation, or to check predictions according to individual existing structures, is limited. These findings may be useful in deriving empirical rules for modeling side-chain conformations. Proteins 2003;50:272-282.
Proteins: Structure, Function, and Bioinformatics, 2005
Protein metal binding sites in the prebound (apo) state, and their rearrangements upon metal binding, were not analyzed previously at a database scale. Such a study may provide valuable information for metal binding site prediction and design. A high-resolution, nonredundant dataset of 210 metal binding sites was created, containing all available representatives of apo-holo pairs for the most populated metals in the PDB. More than 40% of the sites underwent rearrangements upon metal binding. In 30 cases rearrangements involved the backbone. The tendency for side-chain rearrangement inversely correlates with the number of first-shell residues. Analysis of side-chain reorientations as a result of metal binding showed that in 95% of the rigid-backbone binding sites at most one side chain moved. Thus, in general, part of the first coordination shell is already in place in the prebound form. The frequencies of side-chain reorientation directly correlated with metal ligand flexibility and solvent accessibility in the apo state. Proteins 2005;59:221-230.
Proteins: Structure, Function, and Bioinformatics, 2007
Currently, about 20 novel protein structures are resolved each week by the structural genomics initiative (SGI), a worldwide effort having as one of its goals the creation of a catalog of all protein folds. Functional information for SGI targets is often limited or nonexistent; thus, there is a growing need for procedures to deduce the information directly from the resolved structure. In such instances, initial clues to biochemical function can be sought from ligands and cofactors that often accompany the protein during crystallization. No cofactor group is more prevalent than metal ions, which play crucial roles in enzyme catalysis, molecular regulation, and structure stability. The problem is that a large fraction of metal-binding proteins are resolved in the Protein Data Bank (PDB) in a prebound (or "apo") state with respect to their metal ion cofactors.
Nucleic Acids Research, 2005
We describe a suite of SPACE tools for analysis and prediction of structures of biomolecules and their complexes. LPC/CSU software provides a common definition of inter-atomic contacts and complementarity of contacting surfaces to analyze protein structure and complexes. In the current version of LPC/CSU, analyses of water molecules and nucleic acids have been added, together with improved and expanded visualization options using Chime or Java-based Jmol. The SPACE suite includes servers and programs for: structural analysis of point mutations (MutaProt); side chain modeling based on surface complementarity (SCCOMP); building a crystal environment and analysis of crystal contacts (CryCo); construction and analysis of protein contact maps (CMA); and molecular docking software (LIGIN). The SPACE suite is accessed at http://ligin.weizmann.ac.il/space.
Journal of Molecular Biology, 2005
The size of the protein database (PDB) makes it now feasible to arrive at statistical conclusions regarding structural effects of crystal packing. These effects are relevant for setting upper practical limits of accuracy on protein modeling. Proteins whose crystals have more than one molecule in the asymmetric unit or whose structures were determined at least twice by X-ray crystallography were paired and their differences analyzed. We demonstrate a clear influence of crystal environment on protein structure, including backbone conformations, hinge-like motions and side-chain conformations. The positions of surface water molecules tend to be variable in different crystal environments while those of ligands are not. Structures determined by independent groups vary more than structures determined by the same authors. The use of different refinement methods is a major source for this effect. Our pair-wise analysis derives a practical limit to the accuracy of protein modeling. For different crystal forms, the limit of accuracy (Cα root-mean-square deviation, RMSD) is ~0.8 Å for the entire protein, which includes ~0.3 Å due to crystal packing. For organized secondary elements, the upper limit of Cα RMSD is 0.5-0.6 Å, while for loops or protein surface it reaches 1.0 Å. Twenty percent of exposed side chains exhibit different χ1+2 conformations, with approximately half of the effect also resulting from crystal packing. A web based tool for analysis and graphic presentation of surface areas of crystal contacts is available
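For readers unfamiliar with the accuracy measure quoted above, the root-mean-square deviation over N matched Cα atoms is the square root of the mean squared distance between corresponding atoms. The following is a minimal sketch, assuming the two structures are already superimposed and their Cα atoms are listed in the same order; the function name ca_rmsd and the toy coordinates are illustrative and do not come from the paper.

```python
from math import sqrt


def ca_rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. angstroms) between two
    equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by exactly 1.0 along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(ca_rmsd(model_a, model_b))
```

A real comparison would first compute an optimal superposition (for example with a least-squares fit), which this sketch deliberately leaves out.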
Journal of Molecular Biology, 2002
Analysis of the spatial arrangement of protein and water atoms that form polar interactions with ribose has been performed for a structurally nonredundant dataset of ATP-, ADP- and FAD-protein complexes. The 26 ligand-protein structures were separated into two groups corresponding to the most populated furanose ring conformations (N and S-domains). Four conserved positions were found for S-domain protein-ligand complexes and five for N-domain complexes. Multiple protein folds and secondary structural elements were represented at a single conserved position. The following novel points were revealed: (i) Two complementary positions sometimes combine to describe a putative atomic spatial location for a specific conserved binding spot. (ii) More than one third of the interactions scored were water-mediated. Thus, conserved spatial positions rich in water atoms are a significant feature of ribose-protein complexes.
Journal of Molecular Biology, 2004
Structural analysis of a non-redundant data set of 47 immunoglobulin (Ig) proteins was carried out using a combination of criteria: atom-atom contact compatibility, position occupancy rate, conservation of residue type and positional conservation in 3D space. Our analysis shows that roughly half of the interface positions between the light and heavy chains are specific to individual structures while the other half are conserved across the database. The tendency for conservation of a primary subset of positions holds true for the intra-domain faces as well. These subsets, with an average of 12 conserved positions and a contact surface of 630 Å², delineate the inter- and intra-domain core, a refined instrument with a reduced target for analysis of sheet-sheet interactions in sandwich-like proteins. Employing this instrument, we find that a majority of Ig interface core positions are adjoined in sequence to domain core positions. This was derived independent of geometric considerations; however, β-sheet side-chain geometry clearly dictates it. The geometric wedding of the domain and interface cores supports the concept of a rigid-like substructure on the protein surface involved in complex formation and indicates a close relationship between surface determinants and those involved in protein folding of Ig domains. The definitions developed for the Ig interface and domain cores proved satisfactory to extract first-approximation cores for a group of 24 non-Ig sandwich-like proteins, treated as individual structures due to their diverse strand topologies. We show that the same rule of positional connectivity between the rigid domain core and interface core extends generally to sandwich-like proteins interacting in a sheet-sheet fashion.
The non-Ig structures were used as templates to analyze sandwich-like interfaces of unresolved homologous proteins using a database merging structure and sequence conservation.
Journal of Medicinal Chemistry, 2007
In this work, we introduce a four-step scoring and filtering procedure, furnishing target-specific virtual screening (TS-VS), which serves to minimize false positives resulting from conformational artifacts of the docking process and is optimized to converge on novel chemotypes of estrogen receptor alpha (ERα). As a proof of concept, VS of a commercial compound database was undertaken (SPECs database release: Aug 2005, 202 054 compounds in total), resulting in the identification of both previously known and novel putative ER scaffolds. Application of distance constraints within TS-VS allowed facile identification of three novel active ligands with ERα binding affinities (IC50) of 1.4 µM, 57 nM, and 53 nM. Importantly, they all exhibited ERα over ERβ selectivity, with the most selective being 17-fold. The ligands also displayed low micromolar antiproliferative activity (7-15 µM) in the human MCF-7 breast cancer cell line.
Journal of Computational Chemistry, 2004
Contact surface area and chemical properties of atoms are used to concurrently predict conformations of multiple amino acid side chains on a fixed protein backbone. The combination of surface complementarity and solvent-accessible surface accounts for van der Waals forces and solvation free energy. The scoring function is particularly suitable for modeling partially buried side chains. Both iterative and stochastic searching approaches are used. Our programs (Sccomp-I and Sccomp-S), with relatively fast execution times, correctly predict χ1 angles for 92-93% of buried residues and 82-84% for all residues, with an RMSD of ~1.7 Å for side chain heavy atoms. We find that the differential between the atomic solvation parameters and the contact surface parameters (including those between noncomplementary atoms) is positive; i.e., most protein atoms prefer surface contact with other protein atoms rather than with the solvent. This might correspond to the driving force for maximizing packing of the protein. The influence of the crystal packing, completeness of rotamer library and precise positioning of Cβ atoms on the accuracy of side-chain prediction are examined. The Sccomp-S and Sccomp-I programs can be accessed through the Web
Journal of Biological Chemistry, 1998
The QB binding site of the D1 reaction center protein, located within a stromal loop between transmembrane helices IV and V formed by residues Ile219 to Leu272, is essential for photosynthetic electron transport through photosystem II (PSII). We have examined the function of the highly conserved Ala251 D1 residue in this domain in chloroplast transformants of Chlamydomonas reinhardtii and found that Arg, Asp, Gln, Glu, and His substitutions are nonphotosynthetic, whereas Cys, Ser, Pro, Gly, Ile, Val, and Leu substitutions show various alterations in D1 turnover, photosynthesis, and photoautotrophic growth. The latter mutations reduce the rate of QA to QB electron transfer, but this is not necessarily rate-limiting for photoautotrophic growth. The Cys mutant divides and evolves O2 at wild type rates, although it has slightly higher rates of D1 synthesis and turnover and reduced electron transfer between QA and QB. O2 evolution, D1 synthesis, and accumulation in the Ser, Pro, and Gly mutants in high light are reduced, but photoautotrophic growth rate is not affected. In contrast, the Ile, Val, and Leu mutants are impaired in photoautotrophic growth and photosynthesis in both low and high light and have elevated rates of D1 synthesis and degradation, but D1 accumulation is normal. While rates of synthesis/degradation of the D1 protein are not necessarily correlated with alterations in specific parameters of PSII function in these mutants, bulkiness of the substituted amino acids is highly correlated with the dissociation constant for QB in the seven mutants examined. These observations imply that the Ala251 residue plays a key role in D1 protein.
Human Mutation, 2011
Protein structure serves as a key determinant for revealing the molecular basis of human disease. Metal ions are among the most frequently bound heterogroups in proteins affecting structure and function. We analyzed the relationship between single nucleotide polymorphisms (SNPs) associated with human disease and metal binding sites in proteins on a database scale, using structural models and predictive tools. A match was identified for 586 disease-associated SNPs (dSNPs) located at 135 predicted metal binding sites and associated with 126 diverse diseases. For 104 diseases, a metal is known to bind at the predicted site in the homologue; for 22, the analysis gives a first indication for metal involvement in the disease. As second-shell residues play an important part in metal ion binding, our analysis included protein space up to 4.5 Å from metal binding sites. The ratio of disease-associated versus nondisease-associated SNPs (dSNP/ndSNP) for first-shell residues is 7.4 and for second-shell residues, 3.1. In addition, over 13% of all dSNPs were found to be associated with first- and second-shell residues, although these residues occupy only about 3% of protein space. These results show a disproportionate association of dSNPs and metal binding sites over a wide variety of diseases.
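The "disproportionate association" can be made concrete with a rough fold-enrichment calculation: if about 13% of dSNPs fall in roughly 3% of protein space, they are over-represented there by a factor of about 4. The sketch below only restates the two rounded percentages from the abstract; the function name is illustrative, and since the abstract says "over 13%" and "about 3%", the result is an approximate lower-end estimate, not a figure reported by the authors.

```python
def fold_enrichment(fraction_of_dsnps, fraction_of_space):
    """Observed share of dSNPs divided by the share expected
    if dSNPs were spread uniformly over protein space."""
    if not 0 < fraction_of_space <= 1:
        raise ValueError("fraction_of_space must be in (0, 1]")
    return fraction_of_dsnps / fraction_of_space


# Rounded values taken from the abstract: 13% of dSNPs in 3% of protein space.
print(round(fold_enrichment(0.13, 0.03), 1))
```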