1.
LSTM-PHV: prediction of human-virus protein-protein interactions by LSTM with word2vec.
Tsukiyama, S, Hasan, MM, Fujii, S, Kurata, H
Briefings in bioinformatics. 2021;(6)
Abstract
Viral infection involves a large number of protein-protein interactions (PPIs) between human and virus. These PPIs range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of host transcription machinery. However, few interspecies PPIs have been identified, because experimental methods, including mass spectrometry, are time-consuming and expensive, and molecular dynamics simulation is limited to proteins whose 3D structures have been solved. Sequence-based machine learning methods are expected to overcome these problems. We have developed the first LSTM model with word2vec for predicting PPIs between human and virus from amino acid sequences alone, named LSTM-PHV. LSTM-PHV effectively learned from training data with a highly imbalanced ratio of positive to negative samples and achieved AUCs of 0.976 and 0.973 and accuracies of 0.984 and 0.985 on the training and independent datasets, respectively. In predicting PPIs between human and unknown or new viruses, LSTM-PHV greatly outperformed the existing state-of-the-art PPI predictors. Interestingly, learning only sequence contexts as words is sufficient for PPI prediction. Uniform manifold approximation and projection (UMAP) showed that LSTM-PHV clearly distinguished the positive PPI samples from the negative ones. The LSTM-PHV web server and supporting data are freely available at http://kurata35.bio.kyutech.ac.jp/LSTM-PHV.
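LSTM-PHV treats amino acid subsequences as words for word2vec. As a hedged illustration of that preprocessing idea, the sketch below splits a sequence into overlapping k-mers, builds an integer vocabulary, and pads the index sequence to a fixed length as LSTM input. The k-mer length, the stride of 1, the padding index 0, and the function names `kmer_words`, `build_vocabulary` and `encode` are assumptions for illustration, not the published LSTM-PHV settings; word2vec training and the LSTM itself are not shown.

```python
def kmer_words(sequence: str, k: int = 3) -> list:
    """Split a protein sequence into overlapping k-mer "words" (stride 1).

    Assumption: k=3 and stride 1 are illustrative, not LSTM-PHV's settings.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(sentences: list) -> dict:
    """Map each distinct k-mer to an integer index; 0 is reserved for padding/unknown."""
    vocab = {}
    for words in sentences:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(words: list, vocab: dict, max_len: int) -> list:
    """Convert k-mers to indices, truncate to max_len, and right-pad with 0."""
    ids = [vocab.get(word, 0) for word in words][:max_len]
    return ids + [0] * (max_len - len(ids))


if __name__ == "__main__":
    words = kmer_words("MKTAYIAK", k=3)
    vocab = build_vocabulary([words])
    print(words)                            # ['MKT', 'KTA', 'TAY', 'AYI', 'YIA', 'IAK']
    print(encode(words, vocab, max_len=8))  # [1, 2, 3, 4, 5, 6, 0, 0]
```

In a full pipeline, each index would select a row of a word2vec embedding matrix before the LSTM layer; sequences shorter than k yield no words and encode to all padding.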
-
2.
Current status and future perspectives of computational studies on human-virus protein-protein interactions.
Lian, X, Yang, X, Yang, S, Zhang, Z
Briefings in bioinformatics. 2021;(5)
Abstract
The protein-protein interactions (PPIs) between human and viruses mediate viral infection and host immunity processes. Therefore, the study of human-virus PPIs can help us understand the principles of human-virus relationships and can thus guide the development of highly effective drugs to break the transmission of viral infectious diseases. Recent years have witnessed the rapid accumulation of experimentally identified human-virus PPI data, which provides an unprecedented opportunity for bioinformatics studies of human-virus PPIs. In this article, we provide a comprehensive overview of computational studies on human-virus PPIs, focusing especially on the development of methods for human-virus PPI prediction. We briefly introduce the experimental detection methods and existing database resources for human-virus PPIs, and then discuss research progress in the development of computational prediction methods. In particular, we elaborate on machine learning-based prediction methods and highlight the need to embrace state-of-the-art deep-learning algorithms and new feature engineering techniques (e.g. the protein embedding technique derived from natural language processing). To further advance understanding of this research topic, we also outline the practical applications of the human-virus interactome in fundamental biological discovery and new antiviral therapy development.
-
3.
A Novel Protein Mapping Method for Predicting the Protein Interactions in COVID-19 Disease by Deep Learning.
Alakus, TB, Turkoglu, I
Interdisciplinary sciences, computational life sciences. 2021;(1):44-60
Abstract
The new type of coronavirus (SARS-CoV-2) that emerged in Wuhan, China, has spread rapidly around the world and become a pandemic. In addition to its significant impact on daily life, it has affected other areas, including public health and the economy. Currently, there is no vaccine or antiviral drug available to prevent COVID-19. Therefore, determining the protein interactions of the new coronavirus is vital for clinical studies, drug therapy, and the identification of preclinical compounds and protein functions. Protein-protein interactions are important for examining protein functions and the pathways involved in various biological processes, and for determining the cause and progression of diseases. Various high-throughput experimental methods have been used to identify protein-protein interactions in organisms, yet there is still a huge gap in specifying all possible protein interactions in an organism. In addition, because the experimental methods involve cloning, labeling, and affinity purification mass spectrometry, the processes take a long time. Determining these interactions with artificial intelligence-based methods rather than experimental approaches may help identify protein functions faster. Thus, protein-protein interaction prediction using deep-learning algorithms has been employed in conjunction with experimental methods to explore new protein interactions. However, to predict protein interactions with artificial intelligence techniques, protein sequences need to be mapped to numerical representations. The literature contains many types of protein-mapping methods. In this study, we contribute a novel protein-mapping method based on the AVL tree. The proposed method was inspired by the fast search performance of the AVL tree as a dictionary structure and was used to verify the protein interactions between SARS-CoV-2 and human proteins.
First, protein sequences were mapped by both the proposed method and various other protein-mapping methods. Then, the mapped protein sequences were normalized and classified by bidirectional recurrent neural networks. The performance of the proposed method was evaluated with accuracy, F1-score, precision, recall, and AUC. Our results indicated that our mapping method predicts the protein interactions between SARS-CoV-2 proteins and human proteins with, on average, an accuracy of 97.76%, precision of 97.60%, recall of 98.33%, F1-score of 79.42%, and AUC of 89%.
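The abstract does not spell out how the AVL tree assigns values to residues. As a sketch under stated assumptions, the code below stores the 20 standard amino acids in an AVL tree (a self-balancing binary search tree, so lookups take O(log n) time), maps each residue of a sequence to a numeric code by tree lookup, and min-max scales the codes to [0, 1] as a stand-in for the normalization step. The alphabetical code assignment and all names (`insert`, `lookup`, `build_table`, `map_sequence`) are hypothetical, not the authors' published mapping.

```python
class Node:
    """AVL tree node storing one residue (key) and its numeric code (value)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.height = 1
        self.left = None
        self.right = None


def _height(node):
    return node.height if node else 0


def _update(node):
    node.height = 1 + max(_height(node.left), _height(node.right))


def _rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    _update(y)
    _update(x)
    return x


def _rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    _update(x)
    _update(y)
    return y


def insert(node, key, value):
    """Insert (key, value) and rebalance; returns the new subtree root."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value  # existing key: overwrite code, tree shape unchanged
        return node
    _update(node)
    balance = _height(node.left) - _height(node.right)
    if balance > 1:  # left-heavy
        if key > node.left.key:  # left-right case
            node.left = _rotate_left(node.left)
        return _rotate_right(node)
    if balance < -1:  # right-heavy
        if key < node.right.key:  # right-left case
            node.right = _rotate_right(node.right)
        return _rotate_left(node)
    return node


def lookup(node, key):
    """Iterative O(log n) search; raises KeyError for unknown residues."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    raise KeyError(key)


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical code order: alphabetical


def build_table():
    root = None
    for code, residue in enumerate(AMINO_ACIDS, start=1):
        root = insert(root, residue, code)
    return root


def map_sequence(sequence, root):
    """Map residues to codes via the AVL tree, then min-max scale to [0, 1]."""
    low, high = 1, len(AMINO_ACIDS)
    return [(lookup(root, r) - low) / (high - low) for r in sequence.upper()]


if __name__ == "__main__":
    table = build_table()
    print([round(x, 3) for x in map_sequence("ACY", table)])  # [0.0, 0.053, 1.0]
```

Non-standard residues such as `X` raise `KeyError` here; a real pipeline would need an explicit policy for them. For a 20-letter alphabet a plain dict would be just as fast, so the tree mainly shows the balanced-search idea the abstract refers to.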
-
4.
Imbalance Data Processing Strategy for Protein Interaction Sites Prediction.
Wang, B, Mei, C, Wang, Y, Zhou, Y, Cheng, MT, Zheng, CH, Wang, L, Zhang, J, Chen, P, Xiong, Y
IEEE/ACM transactions on computational biology and bioinformatics. 2021;(3):985-994
Abstract
Protein-protein interactions play essential roles in various biological processes. Identifying protein interaction sites can help researchers understand life activities and will therefore be helpful for drug design. However, the number of experimentally determined protein interaction sites is far smaller than the number of protein sites in protein-protein interactions or protein complexes. Therefore, the negative and positive samples are usually imbalanced, which is common but biases the prediction of protein interaction sites by computational approaches. In this work, we presented three imbalanced-data processing strategies to reconstruct the original dataset, and then extracted protein features from the evolutionary conservation of amino acids to build a predictor for identifying protein interaction sites. On a dataset with 10,430 surface residues but only 2,299 interface residues, the imbalanced-data processing strategies clearly reduced the prediction bias and therefore improved the prediction performance for protein interaction sites. The experimental results show that our prediction models achieve better prediction performance, such as a prediction accuracy of 0.758 or an F-measure of 0.737, which demonstrates the effectiveness of our method.
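The abstract does not name its three strategies. Random undersampling of the majority class is one common way to rebalance such data and is shown here only as a sketch, not as the authors' method. It assumes that the positives are the 2,299 interface residues and the negatives are the remaining 10,430 - 2,299 = 8,131 surface residues; the function name, the `neg_per_pos` parameter, and the fixed seed are hypothetical.

```python
import random


def random_undersample(samples, labels, neg_per_pos=1.0, seed=0):
    """Keep every positive (label 1) and a random subset of negatives (label 0).

    neg_per_pos sets the desired negative:positive ratio after resampling.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), round(neg_per_pos * len(pos)))
    kept = pos + rng.sample(neg, n_keep)
    rng.shuffle(kept)  # avoid all positives appearing first
    return [samples[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    # Assumption: interface residues are positives; other surface residues are negatives.
    labels = [1] * 2299 + [0] * 8131
    samples = list(range(len(labels)))  # placeholder residue IDs, not real features
    x, y = random_undersample(samples, labels)
    print(len(x), sum(y))  # 4598 2299
```

Undersampling discards information from the majority class; oversampling the minority class or class-weighted training are common alternatives with different trade-offs.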
-
5.
Machine learning for phytopathology: from the molecular scale towards the network scale.
Wang, Y, Zhou, M, Zou, Q, Xu, L
Briefings in bioinformatics. 2021;(5)
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information has become a great challenge. Amid this explosion of data, machine learning offers powerful tools for processing these complex omics data through various algorithms, such as Bayesian reasoning, support vector machines and random forests. Here, we introduce the basic frameworks of machine learning for dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, the monitoring of plant disease resistance proteins and the discovery of protein-protein networks. The aim of this review is to summarize advances in plant defense and pathogen infection and to highlight the important developments of machine learning in phytopathology.
-
6.
Determining protein-protein functional associations by functional rules based on gene ontology and KEGG pathway.
Zhang, YH, Zeng, T, Chen, L, Huang, T, Cai, YD
Biochimica et biophysica acta. Proteins and proteomics. 2021;(6):140621
Abstract
Protein-protein interactions (PPIs) describe the direct physical contact of two proteins that usually results in specific biological functions or regulatory processes. Characterizing PPIs by investigating their patterns and principles has remained an open question in biological studies. Various experimental and computational methods have been used for PPI studies, but most of them are based on sequence similarity to currently validated PPI participants or on cellular localization patterns. Most methods ignore the fact that PPIs are defined by their specific biological functions. In this study, we constructed a novel rule-based computational method using gene ontology and KEGG pathway annotations of PPI participants, which correspond to the complicated biological effects of PPIs. Our newly presented computational method identified a group of biological functions that are tightly associated with PPIs and provides a new function-based tool for PPI studies in a rule-based manner.
-
7.
Improving Identification of In-organello Protein-Protein Interactions Using an Affinity-enrichable, Isotopically Coded, and Mass Spectrometry-cleavable Chemical Crosslinker.
Makepeace, KAT, Mohammed, Y, Rudashevskaya, EL, Petrotchenko, EV, Vögtle, FN, Meisinger, C, Sickmann, A, Borchers, CH
Molecular & cellular proteomics : MCP. 2020;(4):624-639
Abstract
An experimental and computational approach for identifying protein-protein interactions by ex vivo chemical crosslinking and mass spectrometry (CLMS) has been developed that takes advantage of the specific characteristics of cyanurbiotindipropionylsuccinimide (CBDPS), an affinity-tagged, isotopically coded, mass spectrometry (MS)-cleavable crosslinking reagent. Using this reagent together with a crosslinker-specific data-dependent acquisition strategy based on MS2 scans and a software pipeline designed to integrate crosslinker-specific mass spectral information demonstrably improved the CLMS technique in terms of the detection, acquisition, and identification of crosslinker-modified peptides. This approach was evaluated on intact yeast mitochondria, and the results showed that hundreds of unique protein-protein interactions could be identified on an organelle proteome-wide scale. Both known and previously unknown protein-protein interactions were identified, and these interactions were assessed based on their known sub-compartmental localizations. Additionally, the identified crosslinking distance constraints agree well with existing structural models of protein complexes involved in the mitochondrial electron transport chain.
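The abstract does not detail its acquisition logic. Isotopically coded crosslinkers are typically recognized in MS1 spectra as light/heavy peak doublets, so the sketch below searches a peak list for pairs whose m/z spacing equals the expected mass shift divided by the charge. It assumes a light/heavy reagent pair differing by eight deuterium atoms (about 8.05 Da); the label count, the 10 ppm tolerance, and the function name `find_doublets` are assumptions, not parameters from the paper.

```python
# Monoisotopic masses (Da) of 1H and 2H; their difference is the per-label shift.
HYDROGEN = 1.00782503
DEUTERIUM = 2.01410178
DEUTERIUM_SHIFT = DEUTERIUM - HYDROGEN  # about 1.00628 Da


def find_doublets(peaks, n_labels=8, tol_ppm=10.0):
    """Return (light_mz, heavy_mz, charge) pairs whose spacing matches the label shift.

    peaks: iterable of (mz, charge) tuples from one MS1 scan; charge must be >= 1.
    n_labels: number of H->D substitutions between light and heavy forms (assumed 8).
    """
    delta_mass = n_labels * DEUTERIUM_SHIFT
    by_charge = {}
    for mz, charge in peaks:
        by_charge.setdefault(charge, []).append(mz)

    pairs = []
    for charge, mzs in by_charge.items():
        mzs.sort()
        gap = delta_mass / charge  # m/z spacing shrinks with charge
        for i, light in enumerate(mzs):
            target = light + gap
            tol = target * tol_ppm * 1e-6
            for heavy in mzs[i + 1:]:
                if heavy > target + tol:
                    break  # sorted list: no later peak can match
                if abs(heavy - target) <= tol:
                    pairs.append((light, heavy, charge))
    return pairs


if __name__ == "__main__":
    light = 500.0
    heavy = light + 8 * DEUTERIUM_SHIFT / 2  # doubly charged heavy partner
    print(find_doublets([(light, 2), (heavy, 2), (600.0, 3)]))
```

Detected doublets could then be prioritized for MS2 fragmentation, which is the general idea behind crosslinker-specific data-dependent acquisition; the real instrument method is not reproduced here.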
-
8.
ProteoViz: a tool for the analysis and interactive visualization of phosphoproteomics data.
Storey, AJ, Naceanceno, KS, Lan, RS, Washam, CL, Orr, LM, Mackintosh, SG, Tackett, AJ, Edmondson, RD, Wang, Z, Li, HY, et al
Molecular omics. 2020;(4):316-326
Abstract
Quantitative proteomics generates large datasets with increasing depth and quantitative information. With advances in mass spectrometry and ever-larger datasets, streamlined methodologies and tools for the analysis and visualization of phosphoproteomics data are needed at both the protein and modified-peptide levels. To help address this need, we developed ProteoViz, which includes a set of R scripts that perform normalization and differential expression analysis of both proteins and enriched phosphorylated peptides, and identify sequence motifs, kinases, and gene set enrichment pathways. The tool generates interactive visualization plots that let users explore the phosphoproteomics results and quickly identify proteins and phosphorylated peptides of interest for their biological study. The tool also links significant phosphosites with sequence motifs and pathways, which helps explain the experimental conditions and guide future experiments. Here, we present the workflow and demonstrate its functionality by analyzing a phosphoproteomic dataset from two lymphoma cell lines treated with kinase inhibitors. The scripts and data are freely available at and via the ProteomeXchange with identifier PXD015606.
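The abstract names normalization and differential expression analysis but not the specific methods, and ProteoViz itself is written in R. For consistency with the other sketches in this list, the Python code below illustrates two common choices as stand-ins: median normalization of log2 intensities and a per-feature log2 fold change with Welch's t statistic. These choices and the function names are assumptions; a real analysis would also compute p-values and correct for multiple testing.

```python
import math
import statistics


def median_normalize(log2_samples):
    """Shift each sample's log2 intensities so all sample medians are equal.

    log2_samples: dict mapping sample name -> list of log2 intensities,
    with features (proteins or phosphopeptides) in the same order.
    """
    medians = {name: statistics.median(vals) for name, vals in log2_samples.items()}
    target = statistics.median(medians.values())  # common reference median
    return {
        name: [v - medians[name] + target for v in vals]
        for name, vals in log2_samples.items()
    }


def log2_fold_change(group_a, group_b):
    """Difference of mean log2 intensities (log2 ratio of geometric means)."""
    return statistics.mean(group_a) - statistics.mean(group_b)


def welch_t(group_a, group_b):
    """Welch's t statistic; each group needs at least two replicates."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return log2_fold_change(group_a, group_b) / se


if __name__ == "__main__":
    data = {"ctrl_1": [10.0, 12.0, 14.0], "ctrl_2": [11.0, 13.0, 15.0]}
    print(median_normalize(data))  # both samples become [10.5, 12.5, 14.5]
    print(log2_fold_change([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))  # 2.0
```

Median normalization assumes most features do not change between conditions; that assumption should be checked before applying it to phosphopeptide data after strong kinase inhibition.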
-
9.
Combining sequence and network information to enhance protein-protein interaction prediction.
Liu, L, Zhu, X, Ma, Y, Piao, H, Yang, Y, Hao, X, Fu, Y, Wang, L, Peng, J
BMC bioinformatics. 2020;(Suppl 16):537
Abstract
BACKGROUND Protein-protein interactions (PPIs) are of great importance in the cellular systems of organisms, since they are the basis of cellular structure and function, and many essential cellular processes depend on them. Most proteins perform their functions by interacting with other proteins, so predicting PPIs accurately is crucial for understanding cell physiology. RESULTS Recently, graph convolutional networks (GCNs) have been proposed to capture graph structure information and generate representations for nodes in a graph. In our paper, we use GCNs to learn the position information of proteins in the PPI network graph, which can partly reflect the properties of proteins. Combining amino acid sequence information with position information yields a stronger protein representation, which improves the accuracy of PPI prediction. CONCLUSION Most previous methods used only the protein amino acid sequence as input, without considering the structural information of the PPI network graph. We are the first to combine amino acid sequence information and position information to build protein representations. The experimental results indicate that our method is highly competitive with several sequence-based methods.
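The abstract does not specify the GCN architecture. The sketch below uses the widely used propagation rule H' = ReLU(Â H W), where Â = D^(-1/2)(A + I)D^(-1/2) is the symmetrically normalized adjacency matrix with self-loops, and then concatenates a protein's sequence features with its GCN embedding. This rule is the standard Kipf and Welling formulation, assumed here rather than taken from the paper; the toy graph, weights, and function names are hypothetical, and training is omitted.

```python
import math


def normalized_adjacency(n_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph given as an edge list."""
    a = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    for i in range(n_nodes):
        a[i][i] = 1.0  # self-loop so each node keeps its own features
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_nodes)]
        for i in range(n_nodes)
    ]


def matmul(x, y):
    """Plain-Python matrix product of x (m x k) and y (k x n)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_hat, h, w):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def protein_representation(sequence_features, position_embedding):
    """Concatenate sequence-derived features with the GCN (network position) embedding."""
    return list(sequence_features) + list(position_embedding)


if __name__ == "__main__":
    # Toy PPI network: proteins 0-1-2 in a path; one-hot node features.
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # hypothetical weights
    embedding = gcn_layer(a_hat, h, w)
    print(protein_representation([0.2, 0.7], embedding[0]))
```

Dense lists keep the example dependency-free; real PPI networks are large and sparse, so practical implementations use sparse matrices and a deep-learning framework.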
-
10.
Amalgamation of 3D structure and sequence information for protein-protein interaction prediction.
Jha, K, Saha, S
Scientific reports. 2020;(1):19171
Abstract
Proteins are the primary building blocks of living organisms. They interact with other proteins and are thereby involved in various biological processes. Knowledge of protein-protein interactions (PPIs) helps in understanding protein function and the causes and progression of diseases, and in designing new drugs. However, there is a vast gap between the number of available protein sequences and the number of identified protein-protein interactions. To bridge this gap, researchers have proposed several computational methods for revealing interactions between proteins. These methods depend solely on sequence-based information about proteins. With the advancement of technology, other types of protein information, such as 3D structure, have become available. Nowadays, deep learning techniques are successfully adopted in various domains, including bioinformatics. The current work therefore focuses on using different modalities, such as 3D structures and sequence-based information of proteins, together with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first generate several illustrations of each protein from its 3D coordinate information and three attributes of amino acids: hydropathy index, isoelectric point, and charge. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a type of convolutional neural network, is used to extract features from these representations of proteins. Autocovariance and conjoint triad, two widely used sequence-based methods for encoding proteins, are used here as another modality of protein sequences. A stacked autoencoder is used to obtain a compact form of the sequence-based information. Finally, the features obtained from the different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs.
We experimented on the human and Saccharomyces cerevisiae PPI datasets and compared our results with state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach of learning and combining features from two different modalities is useful for PPI prediction.
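Autocovariance (AC) and conjoint triad (CT) are named in the abstract but not defined. The sketch below shows common textbook forms of both: CT counts consecutive triads over a seven-class reduced amino acid alphabet and normalizes them to frequencies, while AC measures how a physicochemical property co-varies between residues `lag` positions apart. The seven-class grouping, the use of Kyte-Doolittle hydropathy as the only AC property, the maximum lag, and the simple frequency normalization are assumptions for illustration. Published AC encodings typically use several properties and larger lags, and the stacked autoencoder and 3D-structure branch are not shown.

```python
# Seven amino acid classes commonly used for the conjoint triad (assumed grouping).
CT_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CT_CLASSES) for aa in group}

# Kyte-Doolittle hydropathy values, used here as the single AC property (assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def conjoint_triad(sequence):
    """343-dim vector of frequencies of consecutive class triads (7**3 combinations)."""
    classes = [AA_TO_CLASS[aa] for aa in sequence.upper() if aa in AA_TO_CLASS]
    counts = [0] * 343
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    return [x / total for x in counts] if total else [0.0] * 343


def autocovariance(values, max_lag):
    """AC(lag) = mean over i of (x_i - mean) * (x_{i+lag} - mean), for lag = 1..max_lag."""
    n = len(values)
    if n == 0:
        return [0.0] * max_lag
    mean = sum(values) / n
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            features.append(0.0)  # sequence too short for this lag
            continue
        s = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
        features.append(s / (n - lag))
    return features


def encode_protein(sequence, max_lag=3):
    """Concatenate AC (hydropathy) and CT features for one protein."""
    series = [HYDROPATHY[aa] for aa in sequence.upper() if aa in HYDROPATHY]
    return autocovariance(series, max_lag) + conjoint_triad(sequence)


def encode_pair(seq_a, seq_b, max_lag=3):
    """Pair feature vector: the two protein encodings concatenated."""
    return encode_protein(seq_a, max_lag) + encode_protein(seq_b, max_lag)


if __name__ == "__main__":
    features = encode_pair("MKTAYIAK", "GAVLIC")
    print(len(features))  # 2 * (3 + 343) = 692
```

Note that concatenating in a fixed order makes the pair vector order-dependent; some PPI methods symmetrize it (for example by summing or averaging the two encodings) so that (A, B) and (B, A) map to the same input.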