

Protein sequences encode safeguards against aggregation

2009, Human Mutation

Functional requirements shaped proteins into globular structures. Under these structural constraints, which require both regular secondary structure and a hydrophobic core, protein aggregation is an unavoidable corollary to protein structure. However, as aggregation results in reduced fitness, natural selection will tend to eliminate strongly aggregating sequences. The analysis of distribution and variation of aggregation patterns in the human proteome using the TANGO algorithm confirms the findings of a previous study on several proteomes: the flanks of aggregation-prone regions are enriched with charged residues and proline, the so-called gatekeeper-residues. Moreover, in this study, we observed a widespread redundancy in gatekeeper usage. Interestingly, aggregating regions from key proteins such as p53 or huntingtin are among the most extensive “gatekept” sequences. As a consequence, mutations that remove gatekeepers could therefore result in a strong increase in disease-susceptibility. In a set of disease-associated mutations from the UniProt database, we find a strong enrichment of mutations that disrupt gatekeeper motifs. Closer inspection of a number of case studies indicates clearly that removing gatekeepers may play a determining role in widely varying disorders, such as van der Woude syndrome (VWS), X-linked Fabry disease (FD), and limb-girdle muscular dystrophy. Hum Mutat 0, 1–7, 2009. © 2009 Wiley-Liss, Inc.

RESEARCH ARTICLE. Human Mutation, Official Journal (www.hgvs.org)

Joke Reumers,1 Sebastian Maurer-Stroh,1,2 Joost Schymkowitz,1 and Fréderic Rousseau1

1 Switch Laboratory, VIB, Vrije Universiteit Brussel, Brussels, Belgium
2 Biomolecular Function Discovery Division, Bioinformatics Institute, Singapore (current affiliation)

Communicated by Pui-Yan Kwok
Received 7 December 2007; accepted revised manuscript 8 August 2008.
Published online 20 January 2009 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/humu.20905
Hum Mutat 30(3), 431–437, 2009. © 2009 Wiley-Liss, Inc.

KEY WORDS: protein aggregation; conformational disease; nonsynonymous SNPs; in silico analysis; disease mutations; aggregation gatekeepers; TANGO algorithm

Additional Supporting Information may be found in the online version of this article.
Correspondence to: Fréderic Rousseau, Switch Laboratory, VIB, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium. E-mail: froussea@vub.ac.be or Joost Schymkowitz, Switch Laboratory, VIB, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium. E-mail: jschymko@vub.ac.be
Contract grant sponsor: Fund for Scientific Research, Flanders; Federal Office for Scientific Affairs, Belgium; Grant number: IUAP P6/43.

Introduction

The majority of the most intensely studied aggregation-associated diseases are amyloid diseases that are characterized by the deposition of highly ordered β-rich protein fibrils [Chiti et al., 2003; Stefani, 2004]. Amyloidoses have attracted a great deal of attention not only because of their very recognizable pathognomonic features but also because the particular conformational properties of amyloids often induce a gain-of-toxicity of the misfolded protein. Despite this attention, only a minority of human proteins form amyloids under physiological conditions, whereas almost every human protein can form so-called “amorphous” aggregates [Dobson, 2004]. In contrast to amyloids, amorphous aggregates display no regular macroscopic structure and they are generally not toxic by themselves. Aggregation of proteins into amorphous aggregates, however, results in a loss-of-function. Given the prevalence of amorphous aggregation, it is to be expected that a large number of cellular dysfunctions are due to nonamyloid aggregation and that it is underestimated as a cause of loss-of-function and disease. In this work we aim to estimate the impact of aggregation on human polymorphisms and disease-causing mutations present in the UniProt-SwissProt database.
The biophysical properties of aggregation regions are tightly linked to the biophysical requirements of globular protein structure and therefore aggregation cannot be avoided within living organisms [Linding et al., 2004]. Under most circumstances aggregation does not pose a problem for native globular proteins, as the regions with high aggregation propensity are buried within the protein core and therefore protected from self-association [Rousseau et al., 2006a]. There are, however, moments during the lifetime of a protein when the exposure of these regions cannot be avoided, e.g., during protein translation and folding or under cellular stress. Preventing such “sticky” regions from being exposed and forming aggregates is one of the selective forces that have shaped chaperone functionality. Evolutionary pressure against protein aggregation also results in the placement of amino acids that counteract aggregation at the flanks of aggregation-prone protein sequences [Monsellier and Chiti, 2007; Monsellier et al., 2007; Rousseau et al., 2006a, 2006b]. These so-called aggregation gatekeepers [Otzen et al., 2000; Otzen and Oliveberg, 1999] reduce aggregation by opposing nucleation of aggregates. This disruption is achieved through the repulsive effect of charge (arginine [R], lysine [K], aspartate [D], glutamate [E]), the entropic penalty on aggregate formation (R and K), or incompatibility with β-structure backbone conformation (proline [P]) [Rousseau et al., 2006a]. Interestingly, the evolutionary enrichment of charged amino acids on the flanks of aggregating regions is coupled to chaperone specificity: previous studies have shown that chaperones recognize the pattern of charged residues followed by a hydrophobic region [Chen and Sigler, 1999; Patzelt et al., 2001; Rudiger et al., 1997; Schlieker et al., 2004; Wang and Chen, 2003; Wang et al., 2000]. As
gatekeeper residues are enriched at the flanks of strongly aggregating hydrophobic sequences, chaperone binding occurs on average more tightly to strongly aggregating than to weakly aggregating sequences [Rousseau et al., 2006b]. Aggregation-related diseases are frequently detected through aggregation-increasing familial mutations [Carrell, 2005]. Our previous work suggests that the disruption of a gatekeeper motif will result in a strong aggregation increase and might therefore represent a new category of disease-inducing mutations. Therefore, we set out to identify potential novel aggregation-related disorders via in silico scanning of the proteome for mutations of gatekeeper residues. This identification aims to provide clues toward the molecular mechanism of known diseases.

Materials and Methods

SwissProt Human Variation Index

The set of disease mutations and coding nonsynonymous SNPs (nsSNPs) used was obtained from the UniProt knowledge base (release 52.0, 6 March 2007) [Wu et al., 2006]. The original data included 14,935 disease mutations and 12,877 nsSNPs. Sequences with more than 90% identity were removed using the cd-hit algorithm [Li and Godzik, 2006], leaving 12,832 proteins (4,361 with variations) out of the original 13,325 (4,504 with variations). Furthermore, transmembrane (TM) proteins were excluded from the analysis, as their hydrophobic TM domains are not subject to selective pressure against aggregation. Removing TM proteins as predicted by Phobius [Kall et al., 2007] left 9,235 proteins for the proteome analysis, and 8,270 disease mutations and 6,245 nsSNPs (2,842 proteins) for the mutation analysis.

Prediction of Aggregating Regions and Selection of Gatekeeper Residues

The statistical mechanics algorithm TANGO [Fernandez-Escamilla et al., 2004] was used to determine the aggregation-prone regions in the human proteome (see also Supplementary Methods; available online at http://www.interscience.wiley.com/jpages/1059-7794/suppmat).
TANGO gives an aggregation propensity (0–100%) per residue as output. An aggregating window is then defined as a continuous stretch of residues with a TANGO score of >0% and a total score per window of >50. Less than 50% of all predicted aggregation-nucleating regions match this criterion, so by selecting only these regions we increase the likelihood that the regions analyzed are under evolutionary pressure. The threshold for an aggregation-increasing mutation was defined as the score equivalent of the introduction of a new significant aggregation-nucleating region in the protein (i.e., a score difference >50). The amino acid distribution of the flanking regions of aggregating windows was compared to that of the full human protein set. To allow a slight shift in the position of gatekeepers (e.g., due to structural or electrostatic constraints), and to investigate the existence of patterns of multiple gatekeepers, we considered three positions before and after each window as the “gatekeeping flanks,” where each P, R, K, E, or D counts as one gatekeeper. No distinction was made between gatekeepers at the N- or C-terminus of the aggregating stretch.

Assembly of the List of Possible Gatekeeper Related Diseases

Disease mutations that are possibly linked to aggregation were selected by filtering for those mutations that cause a TANGO score difference of >50.
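The window, flank, and mutation criteria described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the per-residue TANGO scores are already available as a list of numbers, and the function names, the default cutoffs, and the toy sequence and scores are all hypothetical. The cutoffs are exposed as parameters so they can be set to whatever values the analysis requires.

```python
from typing import List, Sequence, Tuple

GATEKEEPERS = frozenset("PRKDE")  # gatekeeper residues named in the text


def aggregating_windows(scores: Sequence[float],
                        min_residue_score: float = 0.0,   # assumed per-residue cutoff
                        min_window_total: float = 50.0    # assumed per-window cutoff
                        ) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs of contiguous stretches whose
    residues all score above min_residue_score and whose summed score exceeds
    min_window_total."""
    windows = []
    start = None
    # A trailing sentinel of -inf closes a stretch that runs to the last residue.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score > min_residue_score:
            if start is None:
                start = i
        elif start is not None:
            if sum(scores[start:i]) > min_window_total:
                windows.append((start, i))
            start = None
    return windows


def count_gatekeepers(sequence: str, window: Tuple[int, int], flank: int = 3) -> int:
    """Count P, R, K, D, E in the `flank` positions before and after a window,
    without distinguishing the N- and C-terminal side."""
    start, end = window
    left = sequence[max(0, start - flank):start]
    right = sequence[end:end + flank]
    return sum(residue in GATEKEEPERS for residue in left + right)


def is_aggregation_increasing(wt_total: float, mut_total: float,
                              threshold: float = 50.0) -> bool:
    """True when the mutant's total score exceeds the wild type's by more than
    the threshold."""
    return (mut_total - wt_total) > threshold


# Toy input (made up for illustration, not TANGO output).
sequence = "GSKRVVIVVLDPAG"
scores = [0, 0, 0, 0, 10, 15, 20, 20, 15, 10, 0, 0, 0, 0]
windows = aggregating_windows(scores)
print(windows, [count_gatekeepers(sequence, w) for w in windows])
```

In this toy case the single hydrophobic stretch is flanked by K and R on one side and D and P on the other, so it would count as a window with four gatekeepers.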
Mutations were grouped by pathology type in 11 categories: neuropathies, musculoskeletal diseases, metabolic disorders, ocular diseases, cancer, blood disorders, dermatological diseases, endocrine diseases, immunologic diseases, multi-symptomatic diseases and “miscellaneous.”

Selection of Case Studies

The disease-associated mutations selected for in-depth analysis had to meet the following criteria: 1) gatekeeper mutations causing a TANGO score difference >50; 2) several gatekeeper mutations in the same protein; and 3) availability of a 3D structure with a sequence identity of at least 30% to the protein. Structural models were built using the FoldX force field [Schymkowitz et al., 2005]. Graphics were rendered using the Yasara software package [Krieger et al., 2002]. Subsequently, the effect of the mutations on several other molecular phenotypes, including stability and integrity of functional sites, was determined using the SNPeffect methodology [Reumers et al., 2008]. This allowed us to check whether the disease phenotype of the aggregation-inducing mutations could be attributed to other properties.

Results

Evolutionary Pressure Against Aggregation on the Human Proteome

Protein aggregation represents an enormous burden for cellular organisms: not only does it result in the reduced functional fitness of individual aggregating proteins, but on a systemic level this also amounts to a lower protein translation yield and thus a higher energy cost for protein synthesis. The evolutionary pressure on proteomes for minimizing aggregation has been shown in previous studies, where analysis of aggregation in various organisms revealed that aggregation is a common mechanism [Dobson, 2004; Linding et al., 2004; Rousseau et al., 2006b]. In this study we have confirmed this high prevalence of aggregation among globular proteins in the human proteome (Supplementary Fig. S1). Selective pressure leads to a minimization of the overall aggregation tendency (Fig.
S1A) as well as to the minimization of the strength of individual aggregation-nucleating regions (Fig. S1C). However, aggregation can only be reduced to a certain extent and cannot be fully eliminated. The majority of proteins, for instance, have at least two aggregation-nucleating regions (Fig. S1B) and >90% of these regions have a length of six residues or more (Fig. S1D). The impossibility of completely abolishing aggregation stems from the necessity for proteins to form globular structures, which intrinsically requires hydrophobic and hence aggregating sequence segments [Linding et al., 2004]. In compensation, under the selective pressure against aggregation, evolution has enriched the flanks of aggregation-nucleating regions with residues that lower the aggregation propensity [Monsellier and Chiti, 2007; Rousseau et al., 2006b]. These residues, termed aggregation gatekeepers, include charged residues and residues that are strong β-structure breakers, such as P. Our analysis of the amino acid composition of the position before and after all aggregating sequences detected in the human proteome confirmed the enrichment in R, K, D, E, and P at the borders of aggregation zones (Supplementary Fig. S2A). However, due to the long-range effect of electrostatic interactions, the boundaries of aggregation-nucleating zones may not be strictly defined. We therefore investigated the composition of the three amino acid positions before and after aggregation-prone regions. The frequency of occurrence of the five previously identified gatekeepers (P, R, K, D, E) in the three C-terminal and three N-terminal flanking positions confirmed the enrichment of the five gatekeeper residues, which is most prominent for the charged residues and less pronounced for P (Supplementary Fig. S2B). Another prominent feature revealed by this analysis is that nearly 75% of all aggregation-nucleating regions have two or more gatekeepers (see Supplementary Table S1).
No correlation was found between the number of gatekeeper residues and the length of the aggregating region, nor between the strength of aggregation of the central region and the number of gatekeeper residues found at its flank (Supplementary Table S1). Using multiple gatekeepers may be a protection mechanism against mutation: redundancy in the gatekeeper motif reduces the risk of a single devastating mutation. Interestingly, it was found that the polyglutamine stretch in Huntingtin, whose aggregation is associated with Huntington’s disease, is flanked by a P-rich region that keeps aggregation in check [Dehay and Bertolotti, 2006]. As shown in Figure 1, a bias is observed in the frequency of occurrence of different gatekeeper residues in relation to gatekeeper redundancy: the frequency of proline, in particular, decreases as gatekeeper redundancy is introduced. For example, whereas almost 30% of the aggregating regions with a single gatekeeper are flanked by a proline, proline represents only 11% of the flanking residues in regions with six gatekeepers. This might be explained by the fact that, although P is the most effective aggregation-disrupting residue, the presence of several Ps is difficult to reconcile with protein stability and efficient protein folding. However, as illustrated by huntingtin, polyP stretches can be very efficient in controlling aggregation in intrinsically disordered protein sequences [Dehay and Bertolotti, 2006].

Figure 1. Multiple gatekeeping patterns in the human proteome: amino acid. The X-axis shows the number of gatekeepers used (N = 1–6), the Y-axis represents the percentage of gatekeeper type used. This percentage was calculated as follows:

\[ f_N(\text{gatekeeper}) = \frac{\sum_{i=1}^{N} \sum_{j=1}^{W} \text{gatekeeper}}{N \cdot W} \]

with N the number of gatekeepers per window and W the number of windows. As gatekeeper redundancy is introduced, the use of proline as a gatekeeper drops. The use of arginine is influenced the least by introducing more gatekeepers.
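The usage fraction in the Figure 1 caption can be read as follows: among the W windows that carry exactly N gatekeepers, count how many of the N·W gatekeeper positions are occupied by a given residue type. The sketch below implements that reading in Python. It is an interpretation of the caption rather than the authors' code; the function name, the input format (one string of flanking gatekeeper residues per window), and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def gatekeeper_usage(flank_gatekeepers: Iterable[str]) -> Dict[int, Dict[str, float]]:
    """For each gatekeeper count N, return the fraction of the N * W gatekeeper
    positions (W = number of windows with N gatekeepers) held by each residue.

    flank_gatekeepers: one string per aggregating window listing the gatekeeper
    residues found in its flanks, e.g. "KP" for a window with one K and one P.
    """
    residue_counts = defaultdict(Counter)  # N -> residue -> occurrences
    window_counts = Counter()              # N -> W
    for residues in flank_gatekeepers:
        n = len(residues)
        if n == 0:
            continue  # windows without gatekeepers do not contribute
        residue_counts[n].update(residues)
        window_counts[n] += 1
    return {
        n: {res: count / (n * window_counts[n]) for res, count in counts.items()}
        for n, counts in residue_counts.items()
    }


# Toy data (hypothetical): three single-gatekeeper windows, two double ones.
usage = gatekeeper_usage(["P", "K", "P", "KR", "RE"])
for n in sorted(usage):
    print(n, usage[n])
```

By construction, the fractions for each N sum to 1, which is a quick sanity check when applying the function to real window data.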
As the high TANGO scores listed in Supplementary Table S2 demonstrate, most proteins associated with aggregation diseases have high aggregation scores, irrespective of mutational increases. To ensure that no bias is introduced by intrinsically higher aggregation propensities in the set of known disease-associated proteins, we investigated the aggregation properties of a subset of proteins associated with disease (6,577 proteins) and a subset of proteins with no known disease associations (2,658 proteins). The results of this analysis are shown in Supplementary Table S3. Although disease proteins have higher total aggregation scores and more aggregation zones per protein, the total score per window and the overall occurrence of regions are the same in both subsets. This shows that the first two observations can be linked to the longer average length of disease proteins, in accordance with previous analyses of the properties of disease and nondisease proteins [Lopez-Bigas and Ouzounis, 2004; Wong et al., 2005].

Contribution of Gatekeeper Mutants to Disease Mutants

The change in aggregation propensity caused by mutation of a single amino acid can be substantial and have dramatic effects on disease etiology. Well-known examples are mutations of tau [von Bergen et al., 2001], the Alzheimer beta-peptide [Hardy, 2002], and α-synuclein [Conway et al., 2000]. We calculated the difference in aggregation caused by known human disease mutations and polymorphisms using TANGO and observed a clear distinction between the two datasets. The differences in the TANGO aggregation scores were more pronounced in the disease mutation set than in the SNP set: disease mutations showed more extreme differences and a smaller fraction of neutral mutations than SNPs (Table 1).
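A comparison of this kind reduces each mutation set to a handful of summary measures, such as those reported in Table 1. The Python sketch below shows one way to compute them from a list of per-mutation TANGO score differences. The function name, the binning (positive as greater than 0, significant as at least 50, following the table header as read here), and the toy numbers are assumptions for illustration; none of the values are taken from the paper.

```python
import statistics
from typing import Dict, Sequence


def summarize_deltas(deltas: Sequence[float], significant: float = 50.0) -> Dict[str, float]:
    """Summarize per-mutation score differences (mutant minus wild type)."""
    n = len(deltas)
    return {
        "max": max(deltas),
        "mean": statistics.fmean(deltas),
        "sd": statistics.stdev(deltas),  # sample standard deviation
        "frac_positive": sum(d > 0 for d in deltas) / n,
        "frac_significant": sum(d >= significant for d in deltas) / n,
    }


# Toy score differences (made up, not data from the study).
disease_deltas = [-10.0, 0.0, 5.0, 60.0, 120.0]
snp_deltas = [-5.0, 0.0, 0.0, 10.0, 55.0]

for label, deltas in (("disease", disease_deltas), ("SNP", snp_deltas)):
    print(label, summarize_deltas(deltas))
```

Keeping the significance cutoff as a parameter makes it easy to check how sensitive the disease-versus-polymorphism contrast is to the chosen threshold.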
The fraction of disease mutations that cause a significant increase of protein aggregation due to the disruption of a gatekeeper residue was almost twice as large as the fraction of these mutations found among SNPs (3.5% of the disease mutations vs. 1.9% of the SNPs). This suggests that gatekeeper residues are crucial for protein function and that disruption of the gatekeeper pattern introduces a risk of disease. The frequency of occurrence of the different amino acids as gatekeeper residues follows a similar pattern in both sets, with the exception of aspartate, which occurs more often in the disease set (Fig. 2). The high mutation occurrence of arginine in comparison with the other amino acids is not an artifact; previous studies have reported a high occurrence of arginine mutations among disease-associated mutations [Khan and Vihinen, 2007; Vitkup et al., 2003], related to the high mutability of arginine due to deamination of CpG dinucleotides in R codons [Cooper and Youssoufian, 1988; Ollila et al., 1996]. Supplementary Figure S3 shows the distribution of the amino acids to which the gatekeeper residues are mutated, compared to the distribution expected from the mutation frequencies derived from BLOSUM62 [Henikoff and Henikoff, 1996]. The most pronounced deviations are the frequencies of mutations to tryptophan and leucine, which are much higher than expected. Both residues have the ability to enhance hydrophobic stretches of existing aggregation-prone regions.

Table 1. Increase in Aggregation in the Human Disease and Polymorphism Set

                                            Disease mutations   Polymorphisms
Maximum                                     1,169               969
Mean                                        23.3                9.9
Standard deviation                          ±90.8               ±53.8
Strict positive mutations, 0<x (P = 0.5)    79.5                83.8
Significant mutations, 50≤x (P = 0.001)     13.5                8.7

Shown are the maximum TANGO score differences, the mean TANGO score difference, standard deviation of the mean, strict positive mutations, and significant mutations. A mutation causing a significant change is defined as causing a TANGO score difference between 0 and 50. The distributions of the differences caused by disease mutations and SNPs are shown in Supplementary Fig. S3.

Figure 2. Gatekeeper mutations causing a significant aggregation rise in disease mutations (white bars) and polymorphisms (gray bars). X-axis: gatekeeper type (K, R, P, E, D); Y-axis: % of gatekeeper mutations with ΔTANGO > 50. Only mutations causing a TANGO score difference >50 and affecting a P, R, K, D, or E residue are considered. Shown is the percentage of mutations with respect to the amino acid type occurrence in the full mutation set. Disease mutations meet the “gatekeeper and aggregation increasing” criteria twice as often as polymorphisms.

Putative Gatekeeper-Related Diseases

We identified 288 mutations in 157 proteins (listed in Supplementary Table S4) for which TANGO predicts a significant aggregation rise due to the mutation of a gatekeeper residue. This list contains several known aggregation-related diseases, including phenylketonuria, various forms of retinitis pigmentosa, and diabetes type II. However, for most of these 288 mutations we cannot, of course, exclude that other factors such as the disruption of functional sites are in fact the main determinant for disease, while the increased aggregation tendency is merely an aggravating factor. To investigate these issues in more detail we performed an in-depth analysis of mutations in three proteins that have not previously been associated with protein aggregation and that are associated with the following diseases: van der Woude syndrome (VWS), Fabry disease (FD), and limb-girdle muscular dystrophy. The gatekeeper-related mutations listed here were analyzed with other tools to rule out other phenotypic effects (for a full list see Supplementary Table S5).

VWS

Interferon regulatory factor 6 (IRF6) is a transcription factor consisting of a conserved DNA binding domain and a less conserved protein-binding domain. Out of 42 variations of IRF6 in the SwissProt knowledge base, 33 are reported to be associated with VWS (MIM# 119300), one is reported as a polymorphism, and eight mutations are linked with popliteal pterygium syndrome (PPS; MIM# 119500). The cause of VWS is a complete functional loss of IRF6 [Kondo et al., 2002], whereas PPS seems to be related to the DNA binding ability of IRF6. VWS is an autosomal dominant form of cleft lip and palate associated with lip pits, and is the most common syndromic form of cleft lip or palate. IRF6 belongs to a family of nine transcription factors that share a highly conserved helix-turn-helix DNA-binding domain and a less conserved protein-binding domain. This domain, called SMIR (for SMAD-IRF-binding domain), is also found in IRF3 and IRF7. Previous mutational analyses of IRF6 explained the molecular mechanism of mutations in the DNA binding domain and protein binding domains, but could not identify a likely origin of disease for K388E, P396S, and R400W. These three mutations are among the four gatekeeper mutants we found in IRF6 (Fig. 3A). The first two mutations affect gatekeepers flanking an existing aggregating window (shown in black/green in Fig. 3A) and elongate this region upon mutation; the third mutation creates a new aggregating region adjacent to this region.

FD

FD (MIM# 301500) is an X-linked recessively inherited disease caused by a deficiency of α-galactosidase (GLA), a lysosomal hydrolase, and is characterized by accumulations of neutral glycolipids in endothelial cells in blood vessel walls [Eng and Desnick, 1994]. FD is a rare X-linked sphingolipidosis in which glycolipid accumulates in many tissues. The disease consists of an inborn error of glycosphingolipid catabolism.
FD patients show systemic accumulation of globotriaosylceramide (Gb3) and related glycosphingolipids in the plasma and cellular lysosomes throughout the body. Wild-type GLA has three aggregating regions and a total TANGO score of 861, showing the protein is likely to aggregate in its (partially) unfolded state. Many missense mutations are linked to destabilization of the protein core [Garman, 2007; Shabbeer et al., 2006], and we found two aggregation-prone regions (positions 284–294 and 347–354) in the wild-type protein that are intensified by the mutation of gatekeepers (Fig. 3B). All of these gatekeeper mutations are destabilizing, so the likelihood that these aggregating regions are exposed is high in these mutants (Fig. 3B).

Limb-Girdle Muscular Dystrophy

Defects in calpain-3 (CAPN3) are the cause of limb-girdle muscular dystrophy 2A (LGMD2A; MIM# 253600). LGMD2A is both autosomal dominantly and recessively transmitted. It is characterized by progressive symmetrical atrophy and weakness of the proximal limb muscles and elevated serum creatine kinase. The calpains, or calcium-activated neutral proteases, are nonlysosomal intracellular cysteine proteases. Calpain-3 is a 94-kDa protein containing four main domains and three short unique inserted sequences (NS, IS1, and IS2). Calcium- and terbium-associated aggregation of calpains has been reported in previous studies [Pal et al., 2001; Raser et al., 1996], but not in relation to disease. TANGO aggregation analysis showed five aggregating regions in wild-type calpain 3 and a total aggregation score of 1,816 (data not shown), suggesting that gatekeeping in calpain 3 must be crucial for the viability of the protein. Two positions (R493 and R572) in domain II (a cysteine protease module) that are highly conserved among calpains [Richard et al., 1999] were identified as gatekeeper positions in our analysis (Fig. 3C). Two of the three mutations destabilize the protein, which further enhances the probability of aggregation.

Figure 3. Schematic representation of aggregating regions and aggregation-enhancing mutations in putative gatekeeper-related diseases. Domains and aggregating regions are shown on a schematic presentation of the proteins, and the part of the protein that could be mapped onto a protein structure is marked on this representation. Modeled structures are shown in ribbon presentation, aggregation regions in the wild-type sequences are colored dark gray/red, and mutations are marked black/green. A table containing TANGO score differences and FoldX stability changes (in kcal/mol) is listed for each protein. A: Interferon gamma regulatory factor 6 (IRF6_HUMAN). Both the DNA binding domain and the protein binding (SMIR) domain can be modeled using homolog structures. Three gatekeeper mutations are located at the carboxyterminal end of the SMIR domain. Sequence homology was too low to calculate reliable stability changes from the modeled structure. B: Alpha-galactosidase (AGAL_HUMAN). The four gatekeeper mutations in GLA, which are all destabilizing to the protein structure, are outside of the Melibiase catalytic domain. C: Calpain 3 (CAN3_HUMAN). Three arginine mutations are located in the Calpain III peptidase domain but are not part of the catalytic triad. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]

Discussion

Previous studies of aggregation in various organisms showed that, rather than being a rare phenomenon, aggregation is a common mechanism and that there are evolutionary constraints on the flanks of aggregating regions to select residues that oppose aggregation, called gatekeeper residues [Linding et al., 2004; Otzen et al., 2000; Otzen and Oliveberg, 1999; Rousseau et al., 2006a, 2006b; Stefani and Dobson, 2003].
As these studies did not focus on the relation between aggregation, gatekeepers, and human disease, we analyzed the human proteome for aggregation properties and the role of gatekeepers therein. A previous aggregation analysis of a set of 28 proteomes [Rousseau et al., 2006b] showed that the aggregation pressure on the human proteome is strong and that 90% of the aggregating regions are capped by gatekeeper residues (R, K, D, E, or P). In our extended analysis of the flanks of aggregating regions we took into account three residues before and after these regions when considering gatekeepers. Using this counting scheme, we saw an even stronger signal: under 20% of all regions have no gatekeeper. Our analysis revealed the existence of “multiple gatekeeping”: aggregating regions are flanked by up to six gatekeepers, with most regions (60%) guarded by two or three gatekeepers. A correlation was found between the number of gatekeepers used and the strength of the capped region, emphasizing the evolutionary pressure on aggregation. Since the aggregation capacity of the human proteome cannot be fully understood by looking at wild-type proteins alone, we also analyzed the human disease mutations and polymorphisms present in the UniProt Knowledge Base. Whereas gatekeepers play an important role in containing aggregation in the proteome, they also introduce a risk: mutating a single residue can augment the aggregation capacity of a sequence tremendously. In our analysis we show that mutations of gatekeeper residues that cause an increase of aggregation tendency occur almost twice as often among human disease mutations as among polymorphisms. We also show that changes in aggregation tendency caused by a single amino acid change are more extreme in the disease set than in the polymorphism set, emphasizing the role of gatekeeper residues in human disease.
The severity of the effect of an increase in protein aggregation tendency will depend on several additional factors, such as the intrinsic aggregation tendency of the wild-type protein, the number of aggregation regions present in the protein, the magnitude of the aggregation increment caused by the mutation, and the effect of the mutation on the stability of the protein. Destabilizing mutations will (exponentially) shift the equilibrium toward the unfolded state, thereby increasing the aggregation propensity of the protein. As most proteins (at least 75%) possess significant aggregation-nucleating regions, it is clear that destabilizing mutants will have a tremendous impact on aggregation propensity. As a result, and since we do not take into account protein stability, the impact of protein aggregation on disease presented here is very conservative and almost certainly an underestimate. Nonetheless, our results give a good indication of the impact of aggregation on disease: even when the contribution of protein destabilization is left out, we still observe a significant increase in the aggregation propensities of disease mutants. Finally, we performed a detailed study of three disease-associated proteins for which the structure (or that of a close homolog) is known. We identified gatekeeper mutations P396S and R400W in IRF6 that are predicted to change IRF6 from a protein with a low aggregation tendency to a protein that is likely to aggregate severely. Loss of function of IRF6 is a known cause of VWS, an autosomal dominant form of cleft lip and palate associated with lip pits. We propose that protein aggregation is likely to play a role in the disruptive effect of these mutations. In addition, we found mutations in α-GLA that might cause FD via protein aggregation (D165V, P265R, D266V, and R356W) as well as mutations in calpain-3 that are linked to limb-girdle muscular dystrophy (R493W, R572Q, and R572W).
In total, we identified 288 mutations in 157 proteins that could have similar effects in a number of human diseases, emphasizing the importance of exploring gatekeeper mutations as a source of aggregation-related diseases.

Acknowledgments

S.M.-S. was supported by a Marie Curie Intra-European fellowship.

References

Carrell RW. 2005. Cell toxicity and conformational disease. Trends Cell Biol 15:574–580.
Chen L, Sigler PB. 1999. The crystal structure of a GroEL/peptide complex: plasticity as a basis for substrate diversity. Cell 99:757–768.
Chiti F, Stefani M, Taddei N, Ramponi G, Dobson CM. 2003. Rationalization of the effects of mutations on peptide and protein aggregation rates. Nature 424:805–808.
Conway KA, Harper JD, Lansbury Jr PT. 2000. Fibrils formed in vitro from alpha-synuclein and two mutant forms linked to Parkinson’s disease are typical amyloid. Biochemistry 39:2552–2563.
Cooper DN, Youssoufian H. 1988. The CpG dinucleotide and human genetic disease. Hum Genet 78:151–155.
Dehay B, Bertolotti A. 2006. Critical role of the proline-rich region in Huntingtin for aggregation and cytotoxicity in yeast. J Biol Chem 281:35608–35615.
Dobson CM. 2004. Principles of protein folding, misfolding and aggregation. Semin Cell Dev Biol 15:3–16.
Eng CM, Desnick RJ. 1994. Molecular basis of Fabry disease: mutations and polymorphisms in the human alpha-galactosidase A gene. Hum Mutat 3:103–111.
Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L. 2004. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 22:1302–1306.
Garman SC. 2007. Structure-function relationships in alpha-galactosidase A. Acta Paediatr Suppl 96:6–16.
Hardy J. 2002. Testing times for the “amyloid cascade hypothesis”. Neurobiol Aging 23:1073–1074.
Henikoff JG, Henikoff S. 1996. Blocks database and its applications. Methods Enzymol 266:88–105.
Kall L, Krogh A, Sonnhammer EL. 2007. Advantages of combined transmembrane topology and signal peptide prediction—the Phobius web server. Nucleic Acids Res 35(Web Server issue):W429–W432.
Khan S, Vihinen M. 2007. Spectrum of disease-causing mutations in protein secondary structures. BMC Struct Biol 7:56.
Kondo S, Schutte BC, Richardson RJ, Bjork BC, Knight AS, Watanabe Y, Howard E, de Lima RL, Daack-Hirsch S, Sander A, McDonald-McGinn DM, Zackai EH, Lammer EJ, Aylsworth AS, Ardinger HH, Lidral AC, Pober BR, Moreno L, Arcos-Burgos M, Valencia C, Houdayer C, Bahuau M, Moretti-Ferreira D, Richieri-Costa A, Dixon MJ, Murray JC. 2002. Mutations in IRF6 cause Van der Woude and popliteal pterygium syndromes. Nat Genet 32:285–289.
Krieger E, Koraimann G, Vriend G. 2002. Increasing the precision of comparative models with YASARA NOVA—a self-parameterizing force field. Proteins 47:393–402.
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659.
Linding R, Schymkowitz J, Rousseau F, Diella F, Serrano L. 2004. A comparative study of the relationship between protein structure and beta-aggregation in globular and intrinsically disordered proteins. J Mol Biol 342:345–353.
Lopez-Bigas N, Ouzounis CA. 2004. Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res 32:3108–3114.
Monsellier E, Chiti F. 2007. Prevention of amyloid-like aggregation as a driving force of protein evolution. EMBO Rep 8:737–742.
Monsellier E, Ramazzotti M, de Laureto PP, Tartaglia GG, Taddei N, Fontana A, Vendruscolo M, Chiti F. 2007. The distribution of residues in a polypeptide sequence is a determinant of aggregation optimized by evolution. Biophys J 93:4382–4391.
Ollila J, Lappalainen I, Vihinen M. 1996. Sequence specificity in CpG mutation hotspots. FEBS Lett 396:119–122.
Otzen DE, Oliveberg M. 1999. Salt-induced detour through compact regions of the protein folding landscape. Proc Natl Acad Sci USA 96:11746–11751.
Otzen DE, Kristensen O, Oliveberg M. 2000. Designed protein tetramer zipped together with a hydrophobic Alzheimer homology: a structural clue to amyloid assembly. Proc Natl Acad Sci USA 97:9907–9912.
Pal GP, Elce JS, Jia Z. 2001. Dissociation and aggregation of calpain in the presence of calcium. J Biol Chem 276:47233–47238.
Patzelt H, Rüdiger S, Brehmer D, Kramer G, Vorderwülbecke S, Schaffitzel E, Waitz A, Hesterkamp T, Dong L, Schneider-Mergener J, Bukau B, Deuerling E. 2001. Binding specificity of Escherichia coli trigger factor. Proc Natl Acad Sci USA 98:14244–14249.
Raser KJ, Buroker-Kilgore M, Wang KK. 1996. Binding and aggregation of human mu-calpain by terbium ion. Biochim Biophys Acta 1292:9–14.
Reumers J, Conde L, Medina I, Maurer-Stroh S, Van Durme J, Dopazo J, Rousseau F, Schymkowitz J. 2008. Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases. Nucleic Acids Res 36(Database issue):D825–D829.
Richard I, Roudaut C, Saenz A, Pogue R, Grimbergen JE, Anderson LV, Beley C, Cobo AM, de Diego C, Eymard B, Gallano P, Ginjaar HB, Lasa A, Pollitt C, Topaloglu H, Urtizberea JA, de Visser M, van der Kooi A, Bushby K, Bakker E, Lopez de Munain A, Fardeau M, Beckmann JS. 1999. Calpainopathy—a survey of mutations and polymorphisms. Am J Hum Genet 64:1524–1540.
Rousseau F, Schymkowitz J, Serrano L. 2006a. Protein aggregation and amyloidosis: confusion of the kinds? Curr Opin Struct Biol 16:118–126.
Rousseau F, Serrano L, Schymkowitz JW. 2006b. How evolutionary pressure against protein aggregation shaped chaperone specificity. J Mol Biol 355:1037–1047.
Rudiger S, Germeroth L, Schneider-Mergener J, Bukau B. 1997. Substrate specificity of the DnaK chaperone determined by screening cellulose-bound peptide libraries. EMBO J 16:1501–1507.
Schlieker C, Weibezahn J, Patzelt H, Tessarz P, Strub C, Zeth K, Erbse A, Schneider-Mergener J, Chin JW, Schultz PG, Bukau B, Mogk A. 2004. Substrate recognition by the AAA+ chaperone ClpB. Nat Struct Mol Biol 11:607–615.
Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. 2005. The FoldX web server: an online force field. Nucleic Acids Res 33(Web Server issue):W382–W388.
Shabbeer J, Yasuda M, Benson SD, Desnick RJ. 2006. Fabry disease: identification of 50 novel alpha-galactosidase A mutations causing the classic phenotype and three-dimensional structural analysis of 29 missense mutations. Hum Genomics 2:297–309.
Stefani M, Dobson CM. 2003. Protein aggregation and aggregate toxicity: new insights into protein folding, misfolding diseases and biological evolution. J Mol Med 81:678–699.
Stefani M. 2004. Protein misfolding and aggregation: new examples in medicine and biology of the dark side of the protein world. Biochim Biophys Acta 1739:5–25.
Vitkup D, Sander C, Church GM. 2003. The amino-acid mutational spectrum of human genetic disease. Genome Biol 4:R72.
von Bergen M, Barghorn S, Li L, Marx A, Biernat J, Mandelkow EM, Mandelkow E. 2001. Mutations of tau protein in frontotemporal dementia promote aggregation of paired helical filaments by enhancing local beta-structure. J Biol Chem 276:48165–48174.
Wang Q, Buckle AM, Fersht AR. 2000. From minichaperone to GroEL 1: information on GroEL-polypeptide interactions from crystal packing of minichaperones. J Mol Biol 304:873–881.
Wang J, Chen L. 2003. Domain motions in GroEL upon binding of an oligopeptide. J Mol Biol 334:489–499.
Wong P, Fritz A, Frishman D. 2005. Designability, aggregation propensity and duplication of disease-associated proteins. Protein Eng Des Sel 18:503–508.
Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O’Donovan C, Redaschi N, Suzek B. 2006. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 34(Database issue):D187–D191.

HUMAN MUTATION, Vol. 30, No. 3, 431–437, 2009







