-
1.
Amalgamation of 3D structure and sequence information for protein-protein interaction prediction.
Jha, K, Saha, S
Scientific reports. 2020;(1):19171
Abstract
Protein is the primary building block of living organisms. It interacts with other proteins and is then involved in various biological processes. Protein-protein interactions (PPIs) help in predicting and hence help in understanding the functionality of the proteins, causes and growth of diseases, and designing new drugs. However, there is a vast gap between the available protein sequences and the identification of protein-protein interactions. To bridge this gap, researchers proposed several computational methods to reveal the interactions between proteins. These methods merely depend on sequence-based information of proteins. With the advancement of technology, different types of information related to proteins are available such as 3D structure information. Nowadays, deep learning techniques are adopted successfully in various domains, including bioinformatics. So, current work focuses on the utilization of different modalities, such as 3D structures and sequence-based information of proteins, and deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first get several illustrations of proteins using their 3D coordinates information, and three attributes, such as hydropathy index, isoelectric point, and charge of amino acids. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a subclass of a convolutional neural network, is utilized to extract features from these representations of proteins. Autocovariance and conjoint triad are two widely used sequence-based methods to encode proteins, which are used here as another modality of protein sequences. A stacked autoencoder is utilized to get the compact form of sequence-based information. Finally, the features obtained from different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs. 
We have experimented on the human PPIs dataset and Saccharomyces cerevisiae PPIs dataset and compared our results with the state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experimentations on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
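The conjoint triad encoding named in this abstract can be illustrated with a minimal sketch. The seven-class amino acid grouping and the (count - min) / max normalization below follow the form in which the method is commonly described; the abstract states neither, so both are assumptions, and the function name `conjoint_triad` is hypothetical.

```python
from itertools import product

# Seven-class grouping commonly cited for conjoint triad.
# ASSUMPTION: the abstract does not state which grouping the authors used.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(len(CLASSES)), repeat=3))  # 7^3 = 343 triads


def conjoint_triad(sequence):
    """Return a 343-dimensional conjoint triad vector for a protein sequence."""
    counts = dict.fromkeys(TRIADS, 0)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three residues and count each class triad.
    for i in range(len(classes) - 2):
        triad = tuple(classes[i:i + 3])
        if None not in triad:  # skip windows containing unknown residues
            counts[triad] += 1
    values = [counts[t] for t in TRIADS]
    lowest, highest = min(values), max(values)
    if highest == 0:
        return [0.0] * len(values)
    # ASSUMPTION: (count - min) / max normalization, used to reduce length bias.
    return [(v - lowest) / highest for v in values]
```

Two such vectors, one per protein in a pair, would then be concatenated before classification, as the abstract describes for its feature sets.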
-
2.
ProteoViz: a tool for the analysis and interactive visualization of phosphoproteomics data.
Storey, AJ, Naceanceno, KS, Lan, RS, Washam, CL, Orr, LM, Mackintosh, SG, Tackett, AJ, Edmondson, RD, Wang, Z, Li, HY, et al
Molecular omics. 2020;(4):316-326
Abstract
Quantitative proteomics generates large datasets with increasing depth and quantitative information. With the advance of mass spectrometry and increasingly larger data sets, streamlined methodologies and tools for analysis and visualization of phosphoproteomics are needed both at the protein and modified peptide levels. To assist in addressing this need, we developed ProteoViz, which includes a set of R scripts that perform normalization and differential expression analysis of both the proteins and enriched phosphorylated peptides, and identify sequence motifs, kinases, and gene set enrichment pathways. The tool generates interactive visualization plots that allow users to interact with the phosphoproteomics results and quickly identify proteins and phosphorylated peptides of interest for their biological study. The tool also links significant phosphosites with sequence motifs and pathways that will help explain the experimental conditions and guide future experiments. Here, we present the workflow and demonstrate its functionality by analyzing a phosphoproteomic data set from two lymphoma cell lines treated with kinase inhibitors. The scripts and data are freely available at and via the ProteomeXchange with identifier PXD015606.
-
3.
Combining sequence and network information to enhance protein-protein interaction prediction.
Liu, L, Zhu, X, Ma, Y, Piao, H, Yang, Y, Hao, X, Fu, Y, Wang, L, Peng, J
BMC bioinformatics. 2020;(Suppl 16):537
Abstract
BACKGROUND Protein-protein interactions (PPIs) are of great importance in cellular systems of organisms, since they are the basis of cellular structure and function, and many essential cellular processes depend on them. Most proteins perform their functions by interacting with other proteins, so predicting PPIs accurately is crucial for understanding cell physiology. RESULTS Recently, graph convolutional networks (GCNs) have been proposed to capture graph structure information and generate representations for nodes in the graph. In our paper, we use GCNs to learn the position information of proteins in the PPI network graph, which can reflect the properties of proteins to some extent. Combining amino acid sequence information and position information yields a stronger representation of each protein, which improves the accuracy of PPI prediction. CONCLUSION Most previous methods used only the protein amino acid sequence as input, without considering the structural information of the PPI network graph. We are the first to combine amino acid sequence information and position information to build protein representations. The experimental results indicate that our method is highly competitive with several sequence-based methods.
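As a rough illustration of how a graph convolutional layer propagates information over a PPI network, the sketch below implements one layer in the widely used form ReLU(Â H W), where Â is the symmetrically normalized adjacency matrix with self-loops. This formulation is an assumption; the abstract does not say which GCN variant the authors used, and the helper names are hypothetical.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each protein keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One GCN layer: aggregate neighbour features, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in projected]
```

In this sketch, `adj` would be the PPI network, `features` the per-protein input vectors, and `weights` a learned projection; the output rows play the role of the "position information" that is combined with sequence features.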
-
4.
Protein-Protein Interactions Prediction Based on Graph Energy and Protein Sequence Information.
Xu, D, Xu, H, Zhang, Y, Chen, W, Gao, R
Molecules (Basel, Switzerland). 2020;(8)
Abstract
Identification of protein-protein interactions (PPIs) plays an essential role in the understanding of protein functions and cellular biological activities. However, the traditional experiment-based methods are time-consuming and laborious. Therefore, developing new reliable computational approaches has great practical significance for the identification of PPIs. In this paper, a novel prediction method is proposed for predicting PPIs using graph energy, named PPI-GE. Particularly, in the process of feature extraction, we designed two new feature extraction methods, the physicochemical graph energy based on the ionization equilibrium constant and isoelectric point and the contact graph energy based on the contact information of amino acids. The dipeptide composition method was used for order information of amino acids. After multi-information fusion, principal component analysis (PCA) was implemented for eliminating noise and a robust weighted sparse representation-based classification (WSRC) classifier was applied for sample classification. The prediction accuracies based on the five-fold cross-validation of the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, in five independent data sets and two significant PPI networks, the comparative experimental results also demonstrate that PPI-GE obtained better performance than the compared methods.
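The dipeptide composition step mentioned in this abstract can be sketched as follows. It counts every adjacent pair of the 20 standard amino acids and divides by the number of valid pairs, giving a 400-dimensional vector. Skipping pairs that contain non-standard residues is an assumption of this sketch, and the function name is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered dipeptides, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional dipeptide frequency vector of a sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # ASSUMPTION: pairs with non-standard residues (e.g. X) are ignored.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```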
-
5.
Improving Identification of In-organello Protein-Protein Interactions Using an Affinity-enrichable, Isotopically Coded, and Mass Spectrometry-cleavable Chemical Crosslinker.
Makepeace, KAT, Mohammed, Y, Rudashevskaya, EL, Petrotchenko, EV, Vögtle, FN, Meisinger, C, Sickmann, A, Borchers, CH
Molecular & cellular proteomics : MCP. 2020;(4):624-639
Abstract
An experimental and computational approach for identification of protein-protein interactions by ex vivo chemical crosslinking and mass spectrometry (CLMS) has been developed that takes advantage of the specific characteristics of cyanurbiotindipropionylsuccinimide (CBDPS), an affinity-tagged isotopically coded mass spectrometry (MS)-cleavable crosslinking reagent. Utilizing this reagent in combination with a crosslinker-specific data-dependent acquisition strategy based on MS2 scans, and a software pipeline designed for integrating crosslinker-specific mass spectral information led to demonstrated improvements in the application of the CLMS technique, in terms of the detection, acquisition, and identification of crosslinker-modified peptides. This approach was evaluated on intact yeast mitochondria, and the results showed that hundreds of unique protein-protein interactions could be identified on an organelle proteome-wide scale. Both known and previously unknown protein-protein interactions were identified. These interactions were assessed based on their known sub-compartmental localizations. Additionally, the identified crosslinking distance constraints are in good agreement with existing structural models of protein complexes involved in the mitochondrial electron transport chain.
-
6.
The Methods Employed in Mass Spectrometric Analysis of Posttranslational Modifications (PTMs) and Protein-Protein Interactions (PPIs).
Yakubu, RR, Nieves, E, Weiss, LM
Advances in experimental medicine and biology. 2019;:169-198
Abstract
Mass Spectrometry (MS) has revolutionized the way we study biomolecules, especially proteins, their interactions and posttranslational modifications (PTM). As such MS has established itself as the leading tool for the analysis of PTMs mainly because this approach is highly sensitive, amenable to high throughput and is capable of assigning PTMs to specific sites in the amino acid sequence of proteins and peptides. Along with the advances in MS methodology there have been improvements in biochemical, genetic and cell biological approaches to mapping the interactome which are discussed with consideration for both the practical and technical considerations of these techniques. The interactome of a species is generally understood to represent the sum of all potential protein-protein interactions. There are still a number of barriers to the elucidation of the human interactome or any other species as physical contact between protein pairs that occur by selective molecular docking in a particular spatiotemporal biological context are not easily captured and measured. PTMs massively increase the complexity of organismal proteomes and play a role in almost all aspects of cell biology, allowing for fine-tuning of protein structure, function and localization. There are an estimated 300 PTMs with a predicted 5% of the eukaryotic genome coding for enzymes involved in protein modification, however we have not yet been able to reliably map PTM proteomes due to limitations in sample preparation, analytical techniques, data analysis, and the substoichiometric and transient nature of some PTMs. Improvements in proteomic and mass spectrometry methods, as well as sample preparation, have been exploited in a large number of proteome-wide surveys of PTMs in many different organisms. Here we focus on previously published global PTM proteome studies in the Apicomplexan parasites T. gondii and P. falciparum which offer numerous insights into the abundance and function of each of the studied PTM in the Apicomplexa. Integration of these datasets provide a more complete picture of the relative importance of PTM and crosstalk between them and how together PTM globally change the cellular biology of the Apicomplexan protozoa. A multitude of techniques used to investigate PTMs, mostly techniques in MS-based proteomics, are discussed for their ability to uncover relevant biological function.
-
7.
Protein-protein interactions of the nicotinamide/nicotinate mononucleotide adenylyltransferase of Leishmania braziliensis.
Ortiz-Joya, L, Contreras-Rodríguez, LE, Ramírez-Hernández, MH
Memorias do Instituto Oswaldo Cruz. 2019;:e180506
Abstract
BACKGROUND Nicotinamide adenine dinucleotide (NAD) plays a central role in energy metabolism and integrates cellular metabolism with signalling and gene expression. NAD biosynthesis depends on the enzyme nicotinamide/nicotinate mononucleotide adenylyltransferase (NMNAT; EC: 2.7.7.1/18), in which converge the de novo and salvage pathways. OBJECTIVE The purpose of this study was to analyse the protein-protein interactions (PPI) of NMNAT of Leishmania braziliensis (LbNMNAT) in promastigotes. METHODS Transgenic lines of L. braziliensis promastigotes were established by transfection with the pSP72αneoαLbNMNAT-GFP vector. Soluble protein extracts were prepared, co-immunoprecipitation assays were performed, and the co-immunoprecipitates were analysed by mass spectrometry. Furthermore, bioinformatics tools such as network analysis were applied to generate a PPI network. FINDINGS Proteins involved in protein folding, redox homeostasis, and translation were found to interact with the LbNMNAT protein. The PPI network indicated enzymes of the nicotinate and nicotinamide metabolic routes, as well as RNA-binding proteins, the latter being the point of convergence between our experimental and computational results. MAIN CONCLUSION We constructed a model of PPI of LbNMNAT and showed its association with proteins involved in various functions such as protein folding, redox homeostasis, translation, and NAD synthesis.
-
8.
Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model.
Chen, ZH, You, ZH, Zhang, WB, Wang, YB, Cheng, L, Alghazzawi, D
Genes. 2019;(11)
Abstract
Self-interacting proteins (SIPs) are of paramount importance in current molecular biology. A number of traditional experimental methods for identifying SIPs have been developed in the past few years. However, these methods are costly, time-consuming, and inefficient, which limits their use. Computational methods have therefore emerged to meet this need. In this paper, we propose, for the first time, a deep learning model that incorporates a natural language processing (NLP) method to predict potential SIPs from protein sequence information. More specifically, the protein sequence is de novo assembled by k-mers. Then, we obtain the global vectors representation of each protein sequence using the NLP technique. Finally, based on known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments were performed on yeast and human datasets, yielding accuracies of 91.45% and 93.12%, respectively. From our evaluations, the experimental results show that the use of amino acid semantics information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work would have potential applications for various biological classification problems.
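To make the k-mer and global-vectors idea concrete, the sketch below splits a sequence into overlapping k-mers ("words") and counts how often two k-mers appear within a small window of each other. Co-occurrence counts of this kind are the input that global-vector (GloVe-style) embeddings are fitted to. The overlapping tokenization, the window size, and the unweighted counting are all simplifying assumptions, not details taken from the abstract, and the function names are hypothetical.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def cooccurrence(tokens, window=2):
    """Count unordered k-mer pairs that occur within `window` positions."""
    counts = Counter()
    for i, center in enumerate(tokens):
        # Look only forward; sorting the pair makes the count symmetric.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            counts[tuple(sorted((center, tokens[j])))] += 1
    return counts
```

Fitting the embedding itself and training the cascade forest classifier are outside the scope of this sketch.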
-
9.
Predicting protein-protein interactions by fusing various Chou's pseudo components and using wavelet denoising approach.
Tian, B, Wu, X, Chen, C, Qiu, W, Ma, Q, Yu, B
Journal of theoretical biology. 2019;:329-346
Abstract
Research on protein-protein interactions (PPIs) not only helps to reveal the nature of life activities but also plays a driving role in understanding the mechanisms of disease activity and the development of effective drugs. The rapid development of machine learning provides new opportunities and challenges for understanding the mechanism of PPIs. It plays an important role in the field of proteomics research. In recent years, an increasing number of computational methods for predicting PPIs have been developed. This paper proposes a new method for predicting PPIs based on multi-information fusion. First, the pseudo-amino acid composition (PseAAC), auto-covariance (AC) and encoding based on grouped weight (EBGW) methods are used to extract the features of protein sequences, and the three extracted groups of feature vectors are fused. Secondly, the fused feature vectors are denoised by two-dimensional (2-D) wavelet denoising. Finally, the denoised feature vectors are input to the support vector machine (SVM) classifier to predict the PPIs. Under 5-fold cross-validation, the accuracy (ACC) on the Helicobacter pylori (H. pylori) and Saccharomyces cerevisiae (S. cerevisiae) datasets was 95.97% and 95.55%, respectively, and the results were compared with those of other prediction methods. The experimental results show that the proposed multi-information fusion prediction method can effectively improve the prediction performance of PPIs. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/PPIs-WDSVM/.
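The auto-covariance (AC) descriptor used as one of the fused feature groups can be sketched for a single physicochemical property. The example below uses the Kyte-Doolittle hydropathy scale and computes, for each lag, the mean product of deviations from the sequence average. The choice of property, the maximum lag, and the absence of property standardization are assumptions of this sketch; the linked repository, not this code, is the authoritative implementation.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocovariance(sequence, max_lag=3, scale=HYDROPATHY):
    """Return AC values for lags 1..max_lag over one property scale."""
    # ASSUMPTION: residues missing from the scale are dropped.
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    n = len(values)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    result = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            result.append(0.0)  # not enough residues for this lag
            continue
        total = sum(deviations[i] * deviations[i + lag] for i in range(n - lag))
        result.append(total / (n - lag))
    return result
```

Repeating this over several property scales and concatenating the results gives a fixed-length vector regardless of sequence length, which is what makes AC convenient to fuse with other descriptors.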
-
10.
Bioinformatics Approach to Identify Novel AMPK Targets.
Gongol, B, Marin, T, Johnson, DA, Shyy, JY
Methods in molecular biology (Clifton, N.J.). 2018;:99-109
Abstract
In silico analysis of Big Data is a useful tool to identify putative kinase targets as well as nodes of signaling cascades that are difficult to discover by traditional single molecule experimentation. System approaches that use a multi-tiered investigational methodology have been instrumental in advancing our understanding of cellular mechanisms that result in phenotypic changes. Here, we present a bioinformatics approach to identify AMP-activated protein kinase (AMPK) target proteins on a proteome-wide scale and an in vitro method for preliminary validation of these targets. This approach offers an initial screening for the identification of AMPK targets that can be further validated using mutagenesis and molecular biology techniques.