-
1.
Predicting substitutions to modulate disorder and stability in coiled-coils.
Karami, Y, Saighi, P, Vanderhaegen, R, Gerlier, D, Longhi, S, Laine, E, Carbone, A
BMC bioinformatics. 2020;(Suppl 19):573
Abstract
BACKGROUND Coiled-coils are described as stable structural motifs, where two or more helices wind around each other. However, coiled-coils are associated with local mobility and intrinsic disorder. Intrinsically disordered regions in proteins are characterized by lack of stable secondary and tertiary structure under physiological conditions in vitro. They are increasingly recognized as important for protein function. However, characterizing their behaviour in solution and determining precisely the extent of disorder of a protein region remains challenging, both experimentally and computationally. RESULTS In this work, we propose a computational framework to quantify the extent of disorder within a coiled-coil in solution and to help design substitutions modulating such disorder. Our method relies on the analysis of conformational ensembles generated by relatively short all-atom Molecular Dynamics (MD) simulations. We apply it to the phosphoprotein multimerisation domains (PMD) of Measles virus (MeV) and Nipah virus (NiV), both forming tetrameric left-handed coiled-coils. We show that our method can help quantify the extent of disorder of the C-terminus region of MeV and NiV PMDs from MD simulations of a few tens of nanoseconds, and without requiring an extensive exploration of the conformational space. Moreover, this study provided a conceptual framework for the rational design of substitutions aimed at modulating the stability of the coiled-coils. By assessing the impact of four substitutions known to destabilize coiled-coils, we derive a set of rules to control MeV PMD structural stability and cohesiveness. We therefore design two contrasting substitutions, one increasing the stability of the tetramer and the other increasing its flexibility. CONCLUSIONS Our method can be considered as a platform to reason about how to design substitutions aimed at regulating flexibility and stability.
-
2.
Bioinformatics analysis of differentially expressed genes in subchondral bone in early experimental osteoarthritis using microarray data.
Wang, Z, Ji, Y, Bao, HW
Journal of orthopaedic surgery and research. 2020;(1):310
Abstract
BACKGROUND Osteoarthritis (OA) is the most common arthritic disease in humans, affecting the majority of individuals over 65 years of age. The aim of this study is to identify the gene expression profile specific to subchondral bone in OA by comparing the expression profiles of experimental and sham-operation groups. METHODS Gene expression profile GSE30322 was downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were obtained using the limma package. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was then used to identify enriched gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Furthermore, a protein-protein interaction (PPI) network was constructed and significant modules were extracted. RESULTS In total, 588 DEGs were identified between the OA and sham-operation groups, including 199 upregulated and 389 downregulated DEGs. GO analysis showed that the DEGs were significantly enriched in ribosomal subunit export from nucleus and molting cycle. KEGG pathway analysis revealed that target genes were enriched in thiamine metabolism. CONCLUSION These key candidate DEGs may affect the progression of OA and might serve as potential therapeutic targets for OA.
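The split into upregulated and downregulated DEGs described in this abstract can be illustrated with a minimal Python sketch. It assumes a table of per-gene log2 fold changes and adjusted p-values has already been exported from a tool such as limma. The cutoffs (|log2FC| >= 1, adjusted p < 0.05), the function name and the toy rows are illustrative assumptions and are not taken from the paper.

```python
def split_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into up- and downregulated lists.

    results: iterable of (gene, log2_fold_change, adjusted_p_value).
    The default cutoffs are assumptions, not the thresholds used in the study.
    """
    up, down = [], []
    for gene, lfc, padj in results:
        # Discard genes that are not significant or change too little.
        if padj >= padj_cutoff or abs(lfc) < lfc_cutoff:
            continue
        (up if lfc > 0 else down).append(gene)
    return up, down


# Toy rows with placeholder gene names (hypothetical values).
toy_results = [
    ("geneA", 2.3, 0.001),   # significant, upregulated
    ("geneB", -1.7, 0.010),  # significant, downregulated
    ("geneC", 0.4, 0.002),   # significant but small effect: dropped
    ("geneD", -3.0, 0.200),  # large effect but not significant: dropped
]
up_genes, down_genes = split_degs(toy_results)
print(len(up_genes) + len(down_genes), up_genes, down_genes)
```

Applied to a real result table, the total DEG count is simply the sum of the two list lengths, which is how the reported 588 = 199 + 389 breaks down.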
-
3.
Amalgamation of 3D structure and sequence information for protein-protein interaction prediction.
Jha, K, Saha, S
Scientific reports. 2020;(1):19171
Abstract
Protein is the primary building block of living organisms. It interacts with other proteins and is then involved in various biological processes. Knowledge of protein-protein interactions (PPIs) helps in predicting, and hence understanding, the functionality of proteins and the causes and growth of diseases, and in designing new drugs. However, there is a vast gap between the available protein sequences and the identification of protein-protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal the interactions between proteins. These methods depend only on sequence-based information of proteins. With the advancement of technology, different types of information related to proteins are available, such as 3D structure information. Nowadays, deep learning techniques are adopted successfully in various domains, including bioinformatics. The current work therefore focuses on the utilization of different modalities, such as 3D structures and sequence-based information of proteins, and deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first obtain several representations of proteins using their 3D coordinate information and three attributes of amino acids: hydropathy index, isoelectric point, and charge. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a type of convolutional neural network, is utilized to extract features from these representations of proteins. Autocovariance and conjoint triad are two widely used sequence-based methods to encode proteins, which are used here as another modality of protein sequences. A stacked autoencoder is utilized to get the compact form of sequence-based information. Finally, the features obtained from different modalities are concatenated in pairs and fed into the classifier to predict labels for protein pairs. 
We have experimented on the human PPIs dataset and Saccharomyces cerevisiae PPIs dataset and compared our results with the state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach to learning and combining features from two different modalities is useful in PPI prediction.
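As an illustration of the conjoint triad encoding mentioned in this abstract, the following Python sketch maps a sequence to a 343-dimensional vector of residue-class triad frequencies and concatenates the vectors of two proteins. The seven-class residue grouping and the min-max normalisation follow the commonly cited formulation of the descriptor. They are assumptions here, since the abstract does not specify them, and the function names are hypothetical.

```python
from itertools import product

# Seven residue classes often used for the conjoint triad descriptor
# (assumed grouping; the abstract does not list it).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
RESIDUE_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
TRIADS = list(product(range(len(CLASSES)), repeat=3))  # 7**3 = 343 triads
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def conjoint_triad(sequence):
    """Return a 343-dimensional, min-max normalised triad frequency vector."""
    counts = [0] * len(TRIADS)
    classes = [RESIDUE_TO_CLASS.get(aa) for aa in sequence.upper()]
    for i in range(len(classes) - 2):
        triad = tuple(classes[i:i + 3])
        if None in triad:  # skip windows containing non-standard residues
            continue
        counts[TRIAD_INDEX[triad]] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:  # sequence too short or no valid triad
        return [0.0] * len(counts)
    return [(c - lo) / (hi - lo) for c in counts]


def pair_features(seq_a, seq_b):
    """Concatenate the encodings of two proteins into one pair feature vector."""
    return conjoint_triad(seq_a) + conjoint_triad(seq_b)


features = pair_features("MKTAYIAKQR", "GSHMLEDPVA")  # toy sequences
print(len(features))
```

In the pipeline described by the abstract, a vector of this kind would be compressed by the stacked autoencoder before being combined with the structure-derived features. That step is not reproduced here.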
-
4.
Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning.
Pandurangan, AP, Blundell, TL
Protein science : a publication of the Protein Society. 2020;(1):247-257
Abstract
Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of resistance to drugs in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance, from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of impacts of missense mutations in gene expression and regulation, and on the disruption of protein function by modulating protein stability or disturbing interactions with proteins, nucleic acids, small molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are most accurate but are also time-consuming and costly. Computational tools used to predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. Here, we describe DUET that uses machine learning to combine the two approaches. We discuss briefly the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understand human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections. 
STATEMENT FOR A BROADER AUDIENCE Genetic or somatic changes in genes can lead to mutations in human proteins, which give rise to genetic disorders or cancer, or to genes of pathogens leading to drug resistance. Computer software described here, using statistical approaches or machine learning, uses the information from genome sequencing of humans and pathogens, together with experimental or modeled 3D structures of gene products, the proteins, to predict impacts of mutations in genetic disease, cancer and drug resistance.
-
5.
Bioinformatics analysis of multi-omics data identifying molecular biomarker candidates and epigenetically regulatory targets associated with retinoblastoma.
Zeng, Y, He, T, Liu, J, Li, Z, Xie, F, Chen, C, Xing, Y
Medicine. 2020;(47):e23314
Abstract
Retinoblastoma (RB) is the commonest malignant tumor of the infant retina. Besides genetic changes, epigenetic events are also considered to be implicated in the occurrence of RB. This study aimed to identify significantly altered protein-coding genes, DNA methylation, microRNAs (miRNAs), long noncoding RNAs (lncRNAs), and their molecular functions and pathways associated with RB, and to investigate the epigenetic regulatory mechanisms of DNA methylation modification and non-coding RNAs on key genes of RB via bioinformatics methods. We obtained multi-omics data on protein-coding genes, DNA methylation, miRNAs, and lncRNAs from the Gene Expression Omnibus database. We identified differentially expressed genes (DEGs) using the Limma package in R, discerned their biological functions and pathways using enrichment analysis, and conducted the modular analysis based on protein-protein interaction network to identify hub genes of RB. Survival analyses based on The Cancer Genome Atlas clinical database were performed to analyze prognostic values of key genes of RB. Subsequently, we identified the differentially methylated genes, differentially expressed miRNAs (DEMs) and lncRNAs (DELs), and intersected them with key genes to analyze possible targets of the underlying epigenetic regulatory mechanisms. Finally, the ceRNA network of lncRNAs-miRNAs-mRNAs was constructed using Cytoscape. A total of 193 DEGs, 74 differentially methylated-DEGs (DM-DEGs), 45 DEMs, and 5 DELs were identified. The molecular pathways of DEGs were enriched in cell cycle, p53 signaling pathway, and DNA replication. A total of 10 key genes were identified and found significantly associated with poor survival outcome based on survival analyses, including CDK1, BUB1, CCNB2, TOP2A, CCNB1, RRM2, KIF11, KIF20A, NDC80, and TTK. We further found that hub genes MCM6 and KIF14 were differentially methylated, key gene RRM2 was targeted by DEMs, and key genes TTK, RRM2, and CDK1 were indirectly regulated by DELs. 
Additionally, the ceRNA network with 222 regulatory associations was constructed to visualize the correlations between lncRNAs-miRNAs-mRNAs. This study presents an integrated bioinformatics analysis of genetic and epigenetic changes that may be associated with the development of RB. Findings may yield many new insights into the molecular biomarker candidates and epigenetically regulatory targets of RB.
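The step of intersecting differentially methylated genes with DEGs, described in this abstract, amounts to a set intersection. The short Python sketch below shows the idea. The gene identifiers are placeholders and the function name is hypothetical, so nothing here reproduces the study's actual gene lists.

```python
def differentially_methylated_degs(degs, methylated_genes):
    """Return the genes present in both lists, sorted for stable output."""
    return sorted(set(degs) & set(methylated_genes))


# Placeholder identifiers only (not genes reported in the study).
toy_degs = ["GENE1", "GENE2", "GENE3", "GENE4"]
toy_methylated = ["GENE3", "GENE5", "GENE1"]

dm_degs = differentially_methylated_degs(toy_degs, toy_methylated)
print(dm_degs)
```

The same intersection could be repeated with predicted miRNA or lncRNA targets in place of the methylated gene list to shortlist candidate regulatory targets.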
-
6.
Inferring the molecular and phenotypic impact of amino acid variants with MutPred2.
Pejaver, V, Urresti, J, Lugo-Martinez, J, Pagel, KA, Lin, GN, Nam, HJ, Mort, M, Cooper, DN, Sebat, J, Iakoucheva, LM, et al
Nature communications. 2020;(1):5918
Abstract
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
-
7.
Effects of reverse genetic mutations on the spectral and photochemical behavior of a photoactivatable fluorescent protein PAiRFP1.
Hassan, F, Khan, FI, Song, H, Lai, D, Juan, F
Spectrochimica acta. Part A, Molecular and biomolecular spectroscopy. 2020;:117807
Abstract
Bacteriophytochrome photoreceptors (BphPs) containing biliverdin (BV) have great potential for the development of genetically engineered near-infrared fluorescent proteins (NIR FPs). We investigated a photoactivatable fluorescent protein, PAiRFP1, which was engineered through directed molecular evolution. The coexistence of both red light absorbing (Pr) and far-red light absorbing (Pfr) states in the dark is essential for the photoactivation of PAiRFP1. PCR-based site-directed reverse mutagenesis, spectroscopic measurements and molecular dynamics (MD) simulations were performed on three targeted sites, V386A, V480A and Y498H, in the PHY domain to explore their potential effects during molecular evolution of PAiRFP1. We found that these substitutions did not affect the coexistence of Pr and Pfr states but led to slight changes in the photophysical parameters. The covalent docking of biliverdin (cis and trans form) with PAiRFP1 was followed by several 100 ns MD simulations to provide some theoretical explanations for the coexistence of Pr and Pfr states. The results suggested that the experimentally observed coexistence of Pr and Pfr states in both PAiRFP1 and the mutants resulted from the improved stability of the Pr state. The combination of experimental and computational work provided a useful understanding of Pr and Pfr states and the effects of these mutations on the photophysical properties of PAiRFP1.
-
8.
SPOTONE: Hot Spots on Protein Complexes with Extremely Randomized Trees via Sequence-Only Features.
Preto, AJ, Moreira, IS
International journal of molecular sciences. 2020;(19)
Abstract
Protein Hot-Spots (HS) are experimentally determined amino acids that are key to small ligand binding and tend to be structural landmarks on protein-protein interactions. As such, they have been extensively approached by structure-based Machine Learning (ML) prediction methods. However, the availability of a much larger array of protein sequences in comparison to determined three-dimensional structures indicates that a sequence-based HS predictor has the potential to be more useful for the scientific community. Herein, we present SPOTONE, a new ML predictor able to accurately classify protein HS via sequence-only features. This algorithm shows accuracy, AUROC, precision, recall and F1-score of 0.82, 0.83, 0.91, 0.82 and 0.85, respectively, on an independent testing set. The algorithm is deployed within a free-to-use webserver at http://moreiralab.com/resources/spotone, only requiring the user to submit a FASTA file with one or more protein sequences.
-
9.
Into the wild: new yeast genomes from natural environments and new tools for their analysis.
Libkind, D, Peris, D, Cubillos, FA, Steenwyk, JL, Opulente, DA, Langdon, QK, Rokas, A, Hittinger, CT
FEMS yeast research. 2020;(2)
Abstract
Genomic studies of yeasts from the wild have increased considerably in the past few years. This revolution has been fueled by advances in high-throughput sequencing technologies and a better understanding of yeast ecology and phylogeography, especially for biotechnologically important species. The present review aims to first introduce new bioinformatic tools available for the generation and analysis of yeast genomes. We also assess the accumulated genomic data of wild isolates of industrially relevant species, such as Saccharomyces spp., which provide unique opportunities to further investigate the domestication processes associated with the fermentation industry and opportunistic pathogenesis. The availability of genome sequences of other less conventional yeasts obtained from the wild has also increased substantially, including representatives of the phyla Ascomycota (e.g. Hanseniaspora) and Basidiomycota (e.g. Phaffia). Here, we review salient examples of both fundamental and applied research that demonstrate the importance of continuing to sequence and analyze genomes of wild yeasts.
-
10.
Immunoinformatics guided rational design of a next generation multi epitope based peptide (MEBP) vaccine by exploring Zika virus proteome.
Shahid, F, Ashfaq, UA, Javaid, A, Khalid, H
Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases. 2020;:104199
Abstract
Zika virus (ZIKV) is an RNA virus that spreads through mosquito bites. Currently, no vaccine or antiviral medication is available against ZIKV. This has prompted a study to design an MEBP vaccine enabling effective prevention of ZIKV infection. In this study, a combination of immunoinformatics and molecular docking approaches was used to construct an MEBP vaccine. The ZIKV proteome was used for prediction of B-cell, T-cell (HTL & CTL) and IFN-γ epitopes. After prediction, highly antigenic and overlapping epitopes were shortlisted, including 14 CTL and 11 HTL epitopes that were linked into the final peptide through AAY and GPGPG linkers, respectively. An adjuvant was added at the N-terminus of the vaccine through the EAAAK linker to improve its immunogenicity. The final construct comprises 435 amino acids after the addition of linkers and adjuvant. The presence of B-cell and IFN-γ epitopes supports the humoral and cell-mediated immune responses elicited by the construct. Allergenicity, antigenicity and different physicochemical attributes of the vaccine were evaluated to assess its safety and immunogenicity profile. The construct was found to be antigenic and non-allergenic. Docking was performed between the vaccine and TLR-3 to evaluate the binding affinity and the molecular interaction. Finally, the construct was subjected to in silico cloning to confirm its expression efficiency. However, the proposed construct needs to be validated experimentally to ensure its safety and immunogenic profile.
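The construct layout described in this abstract (adjuvant, EAAAK linker, CTL epitopes joined by AAY, HTL epitopes joined by GPGPG) can be sketched in Python as a simple string assembly. The adjuvant and epitope sequences below are placeholders, not the sequences used in the study. The use of a GPGPG linker between the CTL block and the HTL block is an assumption, since the abstract does not state how the two blocks are joined.

```python
def assemble_construct(adjuvant, ctl_epitopes, htl_epitopes):
    """Assemble a multi-epitope peptide as described in the abstract.

    Layout (block junction is an assumption):
    adjuvant - EAAAK - CTL1 - AAY - CTL2 ... - GPGPG - HTL1 - GPGPG - HTL2 ...
    """
    ctl_block = "AAY".join(ctl_epitopes)
    htl_block = "GPGPG".join(htl_epitopes)
    return adjuvant + "EAAAK" + ctl_block + "GPGPG" + htl_block


# Placeholder sequences (hypothetical; real epitopes are not given in the abstract).
toy_adjuvant = "MKL"
toy_ctl = ["K" * 9, "R" * 9]  # CTL epitopes are typically 9-mers
toy_htl = ["D" * 15, "E" * 15]  # HTL epitopes are typically 15-mers

construct = assemble_construct(toy_adjuvant, toy_ctl, toy_htl)
print(construct, len(construct))
```

With real inputs, the length of the returned string can be compared with the 435 residues reported for the final construct as a basic sanity check.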