1.
Machine-learning approach expands the repertoire of anti-CRISPR protein families.
Gussow, AB, Park, AE, Borges, AL, Shmakov, SA, Makarova, KS, Wolf, YI, Bondy-Denomy, J, Koonin, EV
Nature communications. 2020;(1):3784
Abstract
CRISPR-Cas systems are adaptive immunity systems of bacteria and archaea that have been harnessed for the development of powerful genome editing and engineering tools. In the incessant host-parasite arms race, viruses evolved multiple anti-defense mechanisms, including diverse anti-CRISPR proteins (Acrs) that specifically inhibit CRISPR-Cas and therefore have enormous potential as modulators of genome editing tools. Most Acrs are small and highly variable proteins, which makes their bioinformatic prediction a formidable task. We present a machine-learning approach for comprehensive Acr prediction. The model shows high predictive power when tested against an unseen test set and was employed to predict 2,500 candidate Acr families. Experimental validation of top candidates revealed two previously unknown Acrs (AcrIC9 and AcrIC10), and three other top candidates were coincidentally identified and found to possess anti-CRISPR activity. These results substantially expand the repertoire of predicted Acrs and provide a resource for experimental Acr discovery.
2.
Exploring Protein-Peptide Binding Specificity through Computational Peptide Screening.
Bhattacherjee, A, Wallin, S
PLoS computational biology. 2013;(10):e1003277
Abstract
The binding of short disordered peptide stretches to globular protein domains is important for a wide range of cellular processes, including signal transduction, protein transport, and immune response. The often promiscuous nature of these interactions and the conformational flexibility of the peptide chain, sometimes even when bound, make the binding specificity of this type of protein interaction difficult to understand. Here we develop and test a Monte Carlo-based procedure for calculating protein-peptide binding thermodynamics for many sequences in a single run. The method explores peptide sequence space and conformational space simultaneously by simulating a joint probability distribution, which in particular makes searching through peptide sequence space computationally efficient. To test our method, we apply it to three different peptide-binding protein domains and assess its ability to capture the experimentally determined specificity profiles. Insight into the molecular underpinnings of the observed specificities is obtained by analyzing the peptide conformational ensembles of a large number of binding-competent sequences. We also explore the possibility of using our method to discover new peptide-binding pockets on protein structures.
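The idea of simulating a joint distribution over sequence and conformation can be illustrated with a toy Metropolis sampler. This is a minimal sketch under loudly stated assumptions: the four-letter alphabet, the per-residue docking energies, the binary "docked/unbound" conformation of each residue, and all function names are hypothetical and are not the authors' energy function or move set. Because sequence moves and conformation moves are proposed within one Markov chain, the fraction of steps spent at each sequence estimates that sequence's probability summed over all conformations, which is what lets a single run scan many sequences.

```python
import math
import random
from collections import Counter
from itertools import product

# Hypothetical toy model (NOT the energy function from the paper):
# a peptide of LENGTH residues drawn from ALPHABET; each residue also carries a
# binary conformation state: 1 = docked in the binding groove, 0 = unbound.
ALPHABET = "AGLK"
LENGTH = 3
DOCK_ENERGY = {"A": -0.5, "G": 0.2, "L": -1.5, "K": 0.8}  # hypothetical, lower = better
DOCK_PENALTY = 0.3  # hypothetical entropic cost of fixing a residue in the groove


def energy(seq, conf):
    """Energy of a joint (sequence, conformation) state."""
    return sum((DOCK_ENERGY[a] + DOCK_PENALTY) * c for a, c in zip(seq, conf))


def sample_sequences(n_steps, temperature=1.0, seed=0):
    """Metropolis sampling of P(seq, conf) ~ exp(-E / T).

    Each step proposes either a point mutation or a conformation flip at one
    random position; both proposals are symmetric, so the plain Metropolis
    acceptance rule is valid. Returns visit frequencies per sequence, i.e. an
    estimate of the sequence marginal.
    """
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in range(LENGTH)]
    conf = [0] * LENGTH
    e = energy(seq, conf)
    counts = Counter()
    for _ in range(n_steps):
        new_seq, new_conf = seq[:], conf[:]
        i = rng.randrange(LENGTH)
        if rng.random() < 0.5:  # sequence move
            new_seq[i] = rng.choice([a for a in ALPHABET if a != seq[i]])
        else:  # conformation move
            new_conf[i] = 1 - conf[i]
        new_e = energy(new_seq, new_conf)
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / temperature):
            seq, conf, e = new_seq, new_conf, new_e
        counts["".join(seq)] += 1  # record current state, accepted or not
    return {s: c / n_steps for s, c in counts.items()}


def exact_sequence_marginal(temperature=1.0):
    """Exact P(seq) by enumerating every conformation (feasible only for toys)."""
    weights = {}
    for seq in product(ALPHABET, repeat=LENGTH):
        weights["".join(seq)] = sum(
            math.exp(-energy(seq, conf) / temperature)
            for conf in product((0, 1), repeat=LENGTH)
        )
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


sampled = sample_sequences(200_000)
exact = exact_sequence_marginal()
```

For a system this small the exact marginal can be enumerated, so `exact_sequence_marginal()` serves as a check on the sampler; both should rank "LLL" first, since L has the most favourable hypothetical docking energy. Real peptide models have continuous conformations and far larger sequence spaces, where only the sampling route remains tractable.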
3.
A mathematical framework for the selection of an optimal set of peptides for epitope-based vaccines.
Toussaint, NC, Dönnes, P, Kohlbacher, O
PLoS computational biology. 2008;(12):e1000246
Abstract
Epitope-based vaccines (EVs) have a wide range of applications: from therapeutic to prophylactic approaches, from infectious diseases to cancer. The development of an EV is based on the knowledge of target-specific antigens from which immunogenic peptides, so-called epitopes, are derived. Such epitopes form the key components of the EV. Due to regulatory, economic, and practical concerns, the number of epitopes that can be included in an EV is limited. Furthermore, because the major histocompatibility complex (MHC) molecules that bind these epitopes are highly polymorphic, every patient carries a set of MHC class I and class II molecules with differing specificities. A peptide combination effective for one person can thus be completely ineffective for another. This makes the optimal selection of epitopes an important and interesting optimization problem. In this work we present a mathematical framework based on integer linear programming (ILP) that allows the formulation of various flavors of the vaccine design problem and the efficient identification of optimal sets of epitopes. Out of a user-defined set of predicted or experimentally determined epitopes, the framework selects the set with the maximum likelihood of eliciting a broad and potent immune response. Our ILP approach allows an elegant and flexible formulation of numerous variants of the EV design problem. To demonstrate this, we show how common immunological requirements for a good EV (e.g., coverage of epitopes from each antigen, coverage of all MHC alleles in a set, or avoidance of epitopes with high mutation rates) can be translated into constraints or modifications of the objective function within the ILP framework. An implementation of the algorithm outperforms a simple greedy strategy as well as a previously suggested evolutionary algorithm and has runtimes on the order of seconds for typical problem sizes.
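The structure of such a selection problem (binary variables x_e for "epitope e is included", a budget of at most k epitopes, an allele-coverage constraint, and an allele-frequency-weighted immunogenicity objective) can be sketched on a toy instance. This is a minimal illustration under stated assumptions: the allele frequencies, immunogenicity scores, epitope names, and helper functions are all hypothetical, and the exact solver here enumerates every subset instead of calling an ILP solver, which is only viable for a handful of epitopes. The `greedy_select` baseline is a naive top-k ranking and is not the specific greedy strategy the authors compared against.

```python
from itertools import combinations

# Hypothetical allele frequencies p_a in a target population.
ALLELE_FREQ = {"HLA-A*02:01": 0.5, "HLA-A*01:01": 0.3, "HLA-B*07:02": 0.2}

# Hypothetical predicted immunogenicity i(e, a); unlisted alleles mean i(e, a) = 0.
IMMUNOGENICITY = {
    "E1": {"HLA-A*02:01": 0.9},
    "E2": {"HLA-A*02:01": 0.8},
    "E3": {"HLA-A*01:01": 0.7},
    "E4": {"HLA-B*07:02": 0.9},
    "E5": {"HLA-A*01:01": 0.4, "HLA-B*07:02": 0.5},
}


def score(epitope):
    """Objective coefficient of x_e: sum over alleles of p_a * i(e, a)."""
    return sum(ALLELE_FREQ[a] * v for a, v in IMMUNOGENICITY[epitope].items())


def covers_all_alleles(selection):
    """Coverage constraint: each allele is bound by at least one chosen epitope."""
    return all(
        any(IMMUNOGENICITY[e].get(a, 0) > 0 for e in selection) for a in ALLELE_FREQ
    )


def exact_select(k):
    """Optimal subset of size <= k satisfying coverage, by brute-force enumeration.

    Exponential in the number of epitopes; stands in for an ILP solver on toys.
    """
    best, best_value = None, float("-inf")
    for size in range(1, k + 1):
        for subset in combinations(IMMUNOGENICITY, size):
            if covers_all_alleles(subset):
                value = sum(score(e) for e in subset)
                if value > best_value:
                    best, best_value = set(subset), value
    return best, best_value


def greedy_select(k):
    """Naive baseline: the k highest-scoring epitopes, ignoring coverage."""
    return set(sorted(IMMUNOGENICITY, key=score, reverse=True)[:k])


chosen, value = exact_select(2)
```

With k = 2 the naive ranking picks E1 and E2, which both bind only HLA-A*02:01 and therefore violate the coverage constraint, while the exact solution {E1, E5} trades some raw score for covering all three alleles. Expressing such requirements as linear constraints is what lets an ILP solver handle realistic instance sizes that enumeration cannot.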