-
1.
Identifying glycan motifs using a novel subtree mining approach.
Coff, L, Chan, J, Ramsland, PA, Guy, AJ
BMC bioinformatics. 2020;(1):42
Abstract
BACKGROUND Glycans are complex sugar chains, crucial to many biological processes. By participating in binding interactions with proteins, glycans often play key roles in host-pathogen interactions. The specificities of glycan-binding proteins, such as lectins and antibodies, are governed by motifs within larger glycan structures, and improved characterisations of these determinants would aid research into human diseases. Identification of motifs has previously been approached as a frequent subtree mining problem, and we extend these approaches with a glycan notation that allows recognition of terminal motifs. RESULTS In this work, we customised a frequent subtree mining approach by altering the glycan notation to include information on terminal connections. This allows specific identification of terminal residues as potential motifs, better capturing the complexity of glycan-binding interactions. We achieved this by including additional nodes in a graph representation of the glycan structure to indicate the presence or absence of a linkage at particular backbone carbon positions. Combining this frequent subtree mining approach with a state-of-the-art feature selection algorithm termed minimum-redundancy, maximum-relevance (mRMR), we have generated a classification pipeline that is trained on data from a glycan microarray. When applied to a set of commonly used lectins, the identified motifs were consistent with known binding determinants. Furthermore, logistic regression classifiers trained using these motifs performed well across most lectins examined, with a median AUC value of 0.89. CONCLUSIONS We present here a new subtree mining approach for the classification of glycan binding and identification of potential binding motifs. 
The Carbohydrate Classification Accounting for Restricted Linkages (CCARL) method will assist in the interpretation of glycan microarray experiments and will aid in the discovery of novel binding motifs for further experimental characterisation.
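The feature selection step named in this abstract (mRMR) can be illustrated with a minimal sketch. It assumes binary motif-presence features and a binary binding label, and uses the common "relevance minus mean redundancy" greedy criterion. The feature names and data are hypothetical, and this is not the CCARL implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def mrmr_select(features, labels, k):
    """Greedy mRMR: pick k feature names maximising relevance minus mean redundancy.

    `features` maps a (hypothetical) motif name to a list of discrete values.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, float("-inf")
        for name, col in features.items():
            if name in selected:
                continue
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(col, features[s]) for s in selected
                ) / len(selected)
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical toy data: 8 glycans, 1 = motif present / glycan bound.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "motif_a": [1, 1, 1, 0, 0, 0, 0, 0],
    "motif_a_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # redundant copy of motif_a
    "motif_c": [0, 0, 0, 1, 0, 0, 0, 0],      # weaker but complementary
}
print(mrmr_select(features, labels, 2))
```

In this toy case the duplicate motif is skipped in favour of the complementary one, which is the behaviour the redundancy term is meant to produce.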
-
2.
Amalgamation of 3D structure and sequence information for protein-protein interaction prediction.
Jha, K, Saha, S
Scientific reports. 2020;(1):19171
Abstract
Protein is the primary building block of living organisms. It interacts with other proteins and is then involved in various biological processes. Protein-protein interactions (PPIs) help in predicting and hence help in understanding the functionality of the proteins, causes and growth of diseases, and designing new drugs. However, there is a vast gap between the available protein sequences and the identification of protein-protein interactions. To bridge this gap, researchers proposed several computational methods to reveal the interactions between proteins. These methods merely depend on sequence-based information of proteins. With the advancement of technology, different types of information related to proteins are available such as 3D structure information. Nowadays, deep learning techniques are adopted successfully in various domains, including bioinformatics. So, current work focuses on the utilization of different modalities, such as 3D structures and sequence-based information of proteins, and deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first get several illustrations of proteins using their 3D coordinates information, and three attributes, such as hydropathy index, isoelectric point, and charge of amino acids. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a subclass of a convolutional neural network, is utilized to extract features from these representations of proteins. Autocovariance and conjoint triad are two widely used sequence-based methods to encode proteins, which are used here as another modality of protein sequences. A stacked autoencoder is utilized to get the compact form of sequence-based information. Finally, the features obtained from different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs. 
We have experimented on the human PPIs dataset and Saccharomyces cerevisiae PPIs dataset and compared our results with the state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experimentations on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
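As an illustration of the conjoint triad encoding mentioned in this abstract, the sketch below counts triads of amino acid classes along a sequence. It assumes the widely used seven-class grouping of the 20 standard amino acids. Whether the paper uses exactly this grouping or any normalisation is not stated in the abstract, so raw counts are returned.

```python
from itertools import product

# Seven-class amino acid grouping commonly used for conjoint triad (an assumption here).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible class triads, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-element list of raw triad counts for a protein sequence."""
    counts = [0] * len(TRIADS)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    for i in range(len(classes) - 2):
        triad = tuple(classes[i:i + 3])
        if None in triad:
            # Skip windows containing non-standard residues (e.g. X).
            continue
        counts[TRIAD_INDEX[triad]] += 1
    return counts


vector = conjoint_triad("AGVILK")
print(len(vector), sum(vector))
```

Each protein then becomes a fixed-length vector regardless of sequence length, which is what allows it to be fed to an autoencoder or classifier.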
-
3.
Predicting substitutions to modulate disorder and stability in coiled-coils.
Karami, Y, Saighi, P, Vanderhaegen, R, Gerlier, D, Longhi, S, Laine, E, Carbone, A
BMC bioinformatics. 2020;(Suppl 19):573
Abstract
BACKGROUND Coiled-coils are described as stable structural motifs, where two or more helices wind around each other. However, coiled-coils are associated with local mobility and intrinsic disorder. Intrinsically disordered regions in proteins are characterized by lack of stable secondary and tertiary structure under physiological conditions in vitro. They are increasingly recognized as important for protein function. However, characterizing their behaviour in solution and determining precisely the extent of disorder of a protein region remains challenging, both experimentally and computationally. RESULTS In this work, we propose a computational framework to quantify the extent of disorder within a coiled-coil in solution and to help design substitutions modulating such disorder. Our method relies on the analysis of conformational ensembles generated by relatively short all-atom Molecular Dynamics (MD) simulations. We apply it to the phosphoprotein multimerisation domains (PMD) of Measles virus (MeV) and Nipah virus (NiV), both forming tetrameric left-handed coiled-coils. We show that our method can help quantify the extent of disorder of the C-terminus region of MeV and NiV PMDs from MD simulations of a few tens of nanoseconds, and without requiring an extensive exploration of the conformational space. Moreover, this study provided a conceptual framework for the rational design of substitutions aimed at modulating the stability of the coiled-coils. By assessing the impact of four substitutions known to destabilize coiled-coils, we derive a set of rules to control MeV PMD structural stability and cohesiveness. We therefore design two contrasting substitutions, one increasing the stability of the tetramer and the other increasing its flexibility. CONCLUSIONS Our method can be considered as a platform to reason about how to design substitutions aimed at regulating flexibility and stability.
-
4.
Inferring the molecular and phenotypic impact of amino acid variants with MutPred2.
Pejaver, V, Urresti, J, Lugo-Martinez, J, Pagel, KA, Lin, GN, Nam, HJ, Mort, M, Cooper, DN, Sebat, J, Iakoucheva, LM, et al
Nature communications. 2020;(1):5918
Abstract
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
-
5.
Bioinformatics analysis of multi-omics data identifying molecular biomarker candidates and epigenetically regulatory targets associated with retinoblastoma.
Zeng, Y, He, T, Liu, J, Li, Z, Xie, F, Chen, C, Xing, Y
Medicine. 2020;(47):e23314
Abstract
Retinoblastoma (RB) is the commonest malignant tumor of the infant retina. Besides genetic changes, epigenetic events are also considered to be implicated in the occurrence of RB. This study aimed to identify significantly altered protein-coding genes, DNA methylation, microRNAs (miRNAs), long noncoding RNAs (lncRNAs), and their molecular functions and pathways associated with RB, and to investigate the epigenetic regulatory mechanisms of DNA methylation modification and non-coding RNAs on key genes of RB via bioinformatics methods. We obtained multi-omics data on protein-coding genes, DNA methylation, miRNAs, and lncRNAs from the Gene Expression Omnibus database. We identified differentially expressed genes (DEGs) using the Limma package in R, discerned their biological functions and pathways using enrichment analysis, and conducted a modular analysis based on the protein-protein interaction network to identify hub genes of RB. Survival analyses based on The Cancer Genome Atlas clinical database were performed to analyze the prognostic value of key genes of RB. Subsequently, we identified the differentially methylated genes, differentially expressed miRNAs (DEMs) and lncRNAs (DELs), and intersected them with key genes to analyze possible targets of the underlying epigenetic regulatory mechanisms. Finally, the ceRNA network of lncRNAs-miRNAs-mRNAs was constructed using Cytoscape. A total of 193 DEGs, 74 differentially methylated-DEGs (DM-DEGs), 45 DEMs, and 5 DELs were identified. The molecular pathways of DEGs were enriched in cell cycle, p53 signaling pathway, and DNA replication. A total of 10 key genes were identified and found to be significantly associated with poor survival outcome based on survival analyses, including CDK1, BUB1, CCNB2, TOP2A, CCNB1, RRM2, KIF11, KIF20A, NDC80, and TTK. We further found that hub genes MCM6 and KIF14 were differentially methylated, key gene RRM2 was targeted by DEMs, and key genes TTK, RRM2, and CDK1 were indirectly regulated by DELs. 
Additionally, the ceRNA network with 222 regulatory associations was constructed to visualize the correlations between lncRNAs-miRNAs-mRNAs. This study presents an integrated bioinformatics analysis of genetic and epigenetic changes that may be associated with the development of RB. Findings may yield many new insights into the molecular biomarker candidates and epigenetically regulatory targets of RB.
-
6.
Bioinformatics analysis of differentially expressed genes in subchondral bone in early experimental osteoarthritis using microarray data.
Wang, Z, Ji, Y, Bao, HW
Journal of orthopaedic surgery and research. 2020;(1):310
Abstract
BACKGROUND Osteoarthritis (OA) is the most common arthritic disease in humans, affecting the majority of individuals over 65 years of age. The aim of this study is to identify the gene expression profile specific to subchondral bone in OA by comparing the different expression profiles in experimental and sham-operation groups. METHODS Gene expression profile GSE30322 was downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were obtained using the limma package. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was then used to identify the potential gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, a protein-protein interaction (PPI) network was constructed and significant modules were extracted. RESULTS In total, 588 DEGs were identified between the OA and sham-operation groups, including 199 upregulated DEGs and 389 downregulated DEGs. GO analysis showed that DEGs were significantly enriched for ribosomal subunit export from nucleus and molting cycle. KEGG pathway analysis revealed that target genes were enriched in thiamine metabolism. CONCLUSION These key candidate DEGs may affect the progression of OA, and these genes might serve as potential therapeutic targets for OA.
-
7.
Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning.
Pandurangan, AP, Blundell, TL
Protein science : a publication of the Protein Society. 2020;(1):247-257
Abstract
Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of resistance to drugs in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance, from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of impacts of missense mutations in gene expression and regulation, and on the disruption of protein function by modulating protein stability or disturbing interactions with proteins, nucleic acids, small molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are most accurate but are also time-consuming and costly. Computational tools used to predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. Here, we describe DUET that uses machine learning to combine the two approaches. We discuss briefly the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understand human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections. 
STATEMENT FOR A BROADER AUDIENCE Genetic or somatic changes in genes can lead to mutations in human proteins, which give rise to genetic disorders or cancer, or to genes of pathogens leading to drug resistance. Computer software described here, using statistical approaches or machine learning, uses the information from genome sequencing of humans and pathogens, together with experimental or modeled 3D structures of gene products, the proteins, to predict impacts of mutations in genetic disease, cancer and drug resistance.
-
8.
SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features.
Preto, AJ, Moreira, IS
International journal of molecular sciences. 2020;(19)
Abstract
Protein Hot-Spots (HS) are experimentally determined amino acids that are key to small ligand binding and tend to be structural landmarks on protein-protein interactions. As such, they were extensively approached by structure-based Machine Learning (ML) prediction methods. However, the availability of a much larger array of protein sequences in comparison to determined three-dimensional structures indicates that a sequence-based HS predictor has the potential to be more useful for the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS via sequence-only features. This algorithm shows accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent testing set. The algorithm is deployed within a free-to-use webserver at http://moreiralab.com/resources/spotone, only requiring the user to submit a FASTA file with one or more protein sequences.
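For readers less familiar with the threshold-based metrics reported above, the following sketch shows how accuracy, precision, recall and F1-score are conventionally derived from confusion-matrix counts. The counts used are hypothetical and are not taken from the SPOTONE test set. AUROC is omitted because it requires ranked scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: hot-spot = positive class, null-spot = negative class.
print(classification_metrics(tp=8, fp=2, tn=7, fn=3))
```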
-
9.
Effects of reverse genetic mutations on the spectral and photochemical behavior of a photoactivatable fluorescent protein PAiRFP1.
Hassan, F, Khan, FI, Song, H, Lai, D, Juan, F
Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy. 2020;:117807
Abstract
Bacteriophytochrome photoreceptors (BphPs) containing biliverdin (BV) have great potential for the development of genetically engineered near-infrared fluorescent proteins (NIR FPs). We investigated a photoactivatable fluorescent protein, PAiRFP1, which was engineered through directed molecular evolution. The coexistence of both red light absorbing (Pr) and far-red light absorbing (Pfr) states in the dark is essential for the photoactivation of PAiRFP1. PCR-based site-directed reverse mutagenesis, spectroscopic measurements and molecular dynamics (MD) simulations were performed on three targeted sites, V386A, V480A and Y498H, in the PHY domain to explore their potential effects during molecular evolution of PAiRFP1. We found that these substitutions did not affect the coexistence of Pr and Pfr states but led to slight changes in the photophysical parameters. The covalent docking of biliverdin (cis and trans form) with PAiRFP1 was followed by several 100 ns MD simulations to provide some theoretical explanations for the coexistence of Pr and Pfr states. The results suggested that the experimentally observed coexistence of Pr and Pfr states in both PAiRFP1 and the mutants resulted from the improved stability of the Pr state. The combination of experimental and computational work provided useful understanding of Pr and Pfr states and the effects of these mutations on the photophysical properties of PAiRFP1.
-
10.
TooT-T: discrimination of transport proteins from non-transport proteins.
Alballa, M, Butler, G
BMC bioinformatics. 2020;(Suppl 3):25
Abstract
BACKGROUND Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across the hydrophobic membranes. While the sequences of many membrane proteins are known, their structure and function are still not well characterized and understood, owing to the immense effort needed to characterize them. Therefore, there is a need for advanced computational techniques that take sequence information alone to distinguish membrane transporter proteins; this can then be used to direct new experiments and give a hint about the function of a protein. RESULTS This work proposes an ensemble classifier TooT-T that is trained to optimally combine the predictions from homology annotation transfer and machine-learning methods to determine the final prediction. Experimental results obtained by cross-validation and independent testing show that combining the two approaches is more beneficial than employing only one. CONCLUSION The proposed model outperforms all of the state-of-the-art methods that rely on the protein sequence alone, with respect to accuracy and MCC. TooT-T achieved an overall accuracy of 90.07% and 92.22% and an MCC of 0.80 and 0.82 with the training and independent datasets, respectively.
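The Matthews correlation coefficient (MCC) reported alongside accuracy can be computed from confusion-matrix counts as sketched below. The counts in the example are hypothetical and chosen only to show how an accuracy near 90% can correspond to an MCC near 0.80 on a balanced set. They are not the TooT-T results.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional fallback when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical balanced example: 100 proteins, 90 classified correctly.
tp, fp, tn, fn = 45, 5, 45, 5
print((tp + tn) / (tp + fp + tn + fn), matthews_corrcoef(tp, fp, tn, fn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when transporters and non-transporters are not equally represented.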