-
1.
RSVdb: a comprehensive database of transcriptome RNA structure.
Yu, H, Zhang, Y, Sun, Q, Gao, H, Tao, S
Briefings in bioinformatics. 2021;(3)
Abstract
RNA fulfills a crucial regulatory role in cells by folding into a complex RNA structure. To date, a chemical compound, dimethyl sulfate (DMS), has been developed to probe the RNA structure at the transcriptome level effectively. We proposed a database, RSVdb (https://taolab.nwafu.edu.cn/rsvdb/), for the browsing and visualization of transcriptome RNA structures. RSVdb, including 626 225 RNAs with validated DMS reactivity from 178 samples in eight species, supports four main functions: information retrieval, research overview, structure prediction and resource download. Users can search for species, studies, transcripts and genes of interest; browse the quality control of sequencing data and statistical charts of RNA structure information; preview and perform online prediction of RNA structures in silico and under DMS restraint of different experimental treatments; and download RNA structure data for species and studies. Together, RSVdb provides a reference for RNA structure and will support future research on the function of RNA structure at the transcriptome level.
-
2.
Protein-Protein Interactions Prediction Based on Graph Energy and Protein Sequence Information.
Xu, D, Xu, H, Zhang, Y, Chen, W, Gao, R
Molecules (Basel, Switzerland). 2020;(8)
Abstract
Identification of protein-protein interactions (PPIs) plays an essential role in the understanding of protein functions and cellular biological activities. However, the traditional experiment-based methods are time-consuming and laborious. Therefore, developing new reliable computational approaches has great practical significance for the identification of PPIs. In this paper, a novel prediction method is proposed for predicting PPIs using graph energy, named PPI-GE. Particularly, in the process of feature extraction, we designed two new feature extraction methods, the physicochemical graph energy based on the ionization equilibrium constant and isoelectric point and the contact graph energy based on the contact information of amino acids. The dipeptide composition method was used for order information of amino acids. After multi-information fusion, principal component analysis (PCA) was implemented for eliminating noise and a robust weighted sparse representation-based classification (WSRC) classifier was applied for sample classification. The prediction accuracies based on the five-fold cross-validation of the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, in five independent data sets and two significant PPI networks, the comparative experimental results also demonstrate that PPI-GE obtained better performance than the compared methods.
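The dipeptide composition feature mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition (the frequency of each of the 400 ordered amino acid pairs in a sequence); the function name and the handling of non-standard residues are assumptions made for illustration, not details taken from PPI-GE.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return a 400-element vector of dipeptide frequencies.

    Hypothetical helper: pairs containing a non-standard residue are
    skipped, which is an assumption, not the paper's stated rule.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    # Slide a window of width 2 along the sequence.
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]
```

The resulting vector has a fixed length regardless of protein length, which is what makes it convenient to combine with other features before dimensionality reduction such as PCA.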
-
3.
Genome-wide probing RNA structure with the modified DMS-MaPseq in Arabidopsis.
Wang, Z, Wang, M, Wang, T, Zhang, Y, Zhang, X
Methods (San Diego, Calif.). 2019;:30-40
Abstract
Transcripts have an intrinsic propensity to form stable secondary structure that is fundamental to regulating RNA transcription, splicing, translation, RNA localization and turnover. Numerous methods that integrate chemical reactions with next-generation sequencing (NGS) have been applied to study in vivo RNA structure, providing new insights into RNA biology. Dimethyl sulfate (DMS) probing coupled with mutational profiling through NGS (DMS-MaPseq) is a newly developed method for revealing genome-wide or target-specific RNA structure. Herein, we present our experimental protocol of a modified DMS-MaPseq method for plant materials. The DMS treatment condition was optimized, and library preparation procedures were simplified. We also provided custom scripts for bioinformatic analysis of genome-wide DMS-MaPseq data. Bioinformatic results showed that our method could generate high-quality and reproducible data. Further, we assessed sequencing depth and coverage for genome-wide RNA structure profiling in Arabidopsis, and provided two examples of the in vivo structure of mobile RNAs. We hope that our modified DMS-MaPseq method will serve as a powerful tool for analyzing the in vivo RNA structurome in plants.
-
4.
PRWHMDA: Human Microbe-Disease Association Prediction by Random Walk on the Heterogeneous Network with PSO.
Wu, C, Gao, R, Zhang, D, Han, S, Zhang, Y
International journal of biological sciences. 2018;(8):849-857
Abstract
Microorganisms residing in the human body play a vital role in metabolism, immune defense, nutrition absorption, cancer control and protection against pathogen colonization. Changes in microbial communities can cause human diseases. Based on the known microbe-disease associations, we presented a novel computational model employing Random Walk with Restart optimized by Particle Swarm Optimization (PSO) on the heterogeneous interlinked network of Human Microbe-Disease Associations (PRWHMDA) (see Figure 1). Based on the known human microbe-disease associations, we constructed the heterogeneous interlinked network with cosine similarity. The extended random walk with restart (RWR) method was derived to obtain the potential microbe-disease associations. PSO was utilized to obtain the optimal parameters of RWR. To evaluate the prediction effectiveness, we performed leave-one-out cross validation (LOOCV) and 5-fold cross validation (CV), which yielded an AUC (area under the ROC curve) of 0.915 (LOOCV) and an average AUC of 0.8875 ± 0.0046 (5-fold CV). Moreover, we carried out three case studies of asthma, inflammatory bowel disease (IBD) and type 1 diabetes (T1D) for further evaluation. The results showed that 10, 10 and 9 of the top-10 predicted microbes, respectively, were verified by previously published experimental results. It is anticipated that PRWHMDA can be effective in identifying disease-related microbes and may be helpful in disclosing the relationship between microorganisms and their human host.
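For readers unfamiliar with random walk with restart, a minimal sketch of the basic iteration follows. It runs plain RWR on a single small network; the extended heterogeneous-network version and the PSO parameter search described in the abstract are not reproduced. The function name, the restart probability of 0.7 and the toy network are assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seed, restart=0.7,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until convergence.

    adjacency: square list of lists of non-negative edge weights.
    seed: index of the start node (for example, a disease of interest).
    restart: probability r of jumping back to the seed (assumed value).
    """
    n = len(adjacency)
    # Column-normalise so each column of W sums to 1 (or 0 if isolated).
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    p0 = [0.0] * n
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy path network 0 - 1 - 2 (hypothetical data), walk seeded at node 0.
toy_network = [[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]]
scores = random_walk_with_restart(toy_network, seed=0)
```

Nodes closer to the seed receive higher steady-state scores, which is the property used to rank candidate associations.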
-
5.
Integrative bioinformatics and proteomics-based discovery of an eEF2K inhibitor (cefatrizine) with ER stress modulation in breast cancer cells.
Yao, Z, Li, J, Liu, Z, Zheng, L, Fan, N, Zhang, Y, Jia, N, Lv, J, Liu, N, Zhu, X, et al
Molecular bioSystems. 2016;(3):729-36
Abstract
Eukaryotic elongation factor-2 kinase (eEF2K), a unique calcium/calmodulin-dependent protein kinase, is well known to regulate apoptosis, autophagy and ER stress in many types of human cancers. Therefore, eEF2K would be regarded as a promising therapeutic target; however, the eEF2K-regulated mechanism and its targeted inhibitor still remain to be discovered in cancer. Herein, we constructed a protein-protein interaction (PPI) network of eEF2K and obtained an eEF2K-regulated ER stress subnetwork by bioinformatics prediction. Then, we identified differentially expressed proteins involved in ER stress in si-eEF2K-treated MCF-7 and MDA-MB-436 cells by iTRAQ-based analyses. Integrating these results, we constructed a core eEF2K-regulated ER stress subnetwork in breast cancer cells. Subsequently, we screened a series of candidate compounds targeting eEF2K and discovered a novel eEF2K inhibitor (cefatrizine) with anti-proliferative activity toward breast cancer cells. Moreover, we found that cefatrizine induced ER stress in both MCF-7 and MDA-MB-436 cells. Interestingly, we demonstrated that the mechanism of cefatrizine-induced ER stress was in good agreement with our bioinformatics and proteomics-based results. In conclusion, by integrating bioinformatics prediction, proteomics analyses and experimental validation, these results demonstrate that a novel eEF2K inhibitor (cefatrizine) induces ER stress in breast cancer cells, which would provide a clue for exploring more mechanisms of eEF2K and its targeted inhibitors in cancer therapy.
-
6.
An evolution-based approach to De Novo protein design and case study on Mycobacterium tuberculosis.
Mitra, P, Shultis, D, Brender, JR, Czajka, J, Marsh, D, Gray, F, Cierpicki, T, Zhang, Y
PLoS computational biology. 2013;(10):e1003298
Abstract
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.
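The root-mean-square deviation quoted in this abstract can be illustrated with a minimal sketch. It assumes the two structures are already superimposed and have matching atoms in the same order; the optimal superposition step usually applied before reporting RMSD is not shown. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are pre-aligned; no superposition is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum((a - b) ** 2
                  for pa, pb in zip(coords_a, coords_b)
                  for a, b in zip(pa, pb))
    return math.sqrt(squared / len(coords_a))


# Two hypothetical two-atom structures, in angstroms.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model, target)
```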
-
7.
Prediction and Analysis of Post-Translational Pyruvoyl Residue Modification Sites from Internal Serines in Proteins.
Jiang, Y, Li, BQ, Zhang, Y, Feng, YM, Gao, YF, Zhang, N, Cai, YD
PLoS one. 2013;(6):e66678
Abstract
Most pyruvoyl-dependent proteins observed in prokaryotes and eukaryotes are critical regulatory enzymes, which are primary targets of inhibitors for anti-cancer and anti-parasitic therapy. These proteins undergo an autocatalytic, intramolecular self-cleavage reaction in which a covalently bound pyruvoyl group is generated on a conserved serine residue. Traditionally, the modified serine sites are detected by experimental approaches, which are often labor-intensive and time-consuming. In this study, we made an initial attempt at computational prediction of such serine sites with feature selection based on a Random Forest. Since only a small number of experimentally verified pyruvoyl-modified proteins are collected in the current version of the protein database, we used only a small dataset in this study. After removing proteins with sequence identities >60%, we obtained a non-redundant dataset that contained only 46 proteins, with one pyruvoyl serine site for each protein. Several types of features were considered in our method, including PSSM conservation scores, disorder, secondary structure, solvent accessibility, amino acid factors and amino acid occurrence frequencies. The method performed well on our dataset: the best result was 100.00% accuracy and an MCC of 1.0000 on the training dataset, and 93.75% accuracy and an MCC of 0.8441 on the testing dataset. The optimal feature set contained 9 features. Analysis of the optimal feature set indicated the important roles of some specific features in determining the pyruvoyl-group-serine sites, which were consistent with several results of earlier experimental studies. These selected features may shed some light on the mechanism of the post-translational self-maturation process, providing guidelines for experimental validation. 
As more pyruvoyl-modified proteins are found, future work should evaluate the method on larger datasets. Finally, the prediction software can be downloaded from http://www.nkbiox.com/sub/pyrupred/index.html.
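The accuracy and Matthews correlation coefficient (MCC) reported in this abstract are both computed from a binary confusion matrix. A minimal sketch using the standard definitions follows; the function name and the counts are hypothetical and are not the paper's data.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, MCC) from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By convention here, MCC is 0.0 when the denominator is zero.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts for illustration only.
acc, mcc = accuracy_and_mcc(tp=4, tn=3, fp=1, fn=0)
```

Unlike accuracy, MCC stays informative when positive and negative classes are unbalanced, which matters for small datasets such as the one described here.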