-
1.
Molecular mechanism and role of microRNA-93 in human cancers: A study based on bioinformatics analysis, meta-analysis, and quantitative polymerase chain reaction validation.
Gao, Y, Deng, K, Liu, X, Dai, M, Chen, X, Chen, J, Chen, J, Huang, Y, Dai, S, Chen, J
Journal of cellular biochemistry. 2019;(4):6370-6383
Abstract
INTRODUCTION Currently, studies have shown that microRNA-93 (miR-93) can act as an oncogene or a tumor suppressor in different kinds of cancers. The role of miR-93 in human cancers is inconsistent, and the mechanism underlying the aberrant expression of miR-93 is complicated. METHODS We first conducted gene enrichment analysis to gain insight into the potential mechanism of miR-93. Second, we performed a meta-analysis to evaluate the clinical value of miR-93. Finally, a validation test based on quantitative polymerase chain reaction (qPCR) was performed to further investigate the role of miR-93 in pan-cancer. RESULTS Gene Ontology (GO) enrichment analysis showed that the target genes of miR-93 were closely related to transcription, and MAPK1, RBBP7, and Smad7 were identified as hub genes. In the diagnostic meta-analysis, the overall sensitivity, specificity, and area under the curve were 0.76 (0.64-0.85), 0.82 (0.64-0.92), and 0.85 (0.82-0.88), respectively, suggesting that miR-93 has excellent diagnostic performance in human cancers. In the prognostic meta-analysis, dysregulated miR-93 was associated with poor overall survival (OS) in cancer patients. In the qPCR validation test, serum levels of miR-93 were upregulated in breast cancer, breast hyperplasia, lung cancer, chronic obstructive pulmonary disease, nasopharyngeal cancer, hepatocellular cancer, gastric ulcer, endometrial cancer, esophageal cancer, laryngeal cancer, and prostate cancer compared with healthy controls. CONCLUSIONS miR-93 could act as an effective diagnostic and prognostic factor for cancer patients, and its clinical value for early cancer diagnosis and survival prediction is promising.
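The abstract does not say which pooling model the prognostic meta-analysis used. As a minimal sketch, assuming a fixed-effect inverse-variance model on the log hazard-ratio scale, with each study's standard error back-calculated from its reported 95% confidence interval, pooling could look like this. The function name and any input values are hypothetical, not taken from the paper.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def pool_hazard_ratios(studies):
    """Fixed-effect inverse-variance pooling of hazard ratios (illustrative).

    studies: iterable of (hr, ci_low, ci_high) tuples with 95% CIs.
    Returns (pooled_hr, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lo, hi in studies:
        log_hr = math.log(hr)
        # SE on the log scale recovered from the width of the 95% CI
        se = (math.log(hi) - math.log(lo)) / (2 * Z95)
        weight = 1.0 / se ** 2  # inverse-variance weight
        weighted_sum += weight * log_hr
        total_weight += weight
    pooled = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled),
            math.exp(pooled - Z95 * pooled_se),
            math.exp(pooled + Z95 * pooled_se))
```

A random-effects model (for example DerSimonian-Laird) would add a between-study variance term to each weight, which is usually preferred when studies are heterogeneous.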
-
2.
PSPEL: In Silico Prediction of Self-Interacting Proteins from Amino Acids Sequences Using Ensemble Learning.
Li, JQ, You, ZH, Li, X, Ming, Z, Chen, X
IEEE/ACM transactions on computational biology and bioinformatics. 2017;(5):1165-1172
Abstract
Self-interacting proteins (SIPs) play an important role in various aspects of the structural and functional organization of the cell. Detecting SIPs is one of the most important issues in current molecular biology. Although a large amount of SIP data has been generated by experimental methods, wet-laboratory approaches are both time-consuming and costly, and they yield high false-negative and false-positive rates. Thus, there is a great need for in silico methods that predict SIPs accurately and efficiently. In this study, a new sequence-based method is proposed to predict SIPs. The evolutionary information contained in the Position-Specific Scoring Matrix (PSSM) is extracted from proteins with known sequences. The resulting features are then fed to an ensemble classifier to distinguish self-interacting from non-self-interacting proteins. On the Saccharomyces cerevisiae and human SIP data sets, the proposed method achieves high accuracies of 86.86% and 91.30%, respectively. Our method also performs well compared with the SVM classifier and previous methods. Consequently, the proposed method can be considered a promising novel tool for predicting SIPs.
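The abstract does not specify how the variable-length PSSM becomes a fixed-length feature vector. One common choice in sequence-based predictors, shown here only as an assumed illustration and not as the PSPEL feature set, is to sum the PSSM rows that share a residue type, which yields a 20 x 20 = 400-dimensional vector independent of sequence length. The PSSM is assumed to be an L x 20 list of rows in the PSI-BLAST column order below; `pssm_composition` is a hypothetical name.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed PSI-BLAST PSSM column order

def pssm_composition(sequence, pssm):
    """Collapse an L x 20 PSSM into a 400-dimensional feature vector.

    Row i of `pssm` holds the 20 scores for residue sequence[i]. Rows are
    summed by residue type and divided by L so that proteins of different
    lengths yield comparable features.
    """
    if not sequence or len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must be non-empty and equally long")
    index = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
    matrix = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        k = index.get(residue)
        if k is None:  # skip non-standard residues such as X
            continue
        for j, score in enumerate(row):
            matrix[k][j] += score
    length = len(sequence)
    return [value / length for row in matrix for value in row]
```

The resulting vectors can then be passed to any classifier, including an ensemble such as bagged decision trees.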
-
3.
Prediction of Drug-Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures.
Meng, FR, You, ZH, Chen, X, Zhou, Y, An, JY
Molecules (Basel, Switzerland). 2017;(7)
Abstract
Knowledge of drug-target interactions (DTI) plays an important role in discovering new drug candidates. Unfortunately, experimental methods for determining DTI have unavoidable shortcomings, being both time-consuming and expensive. This motivated us to develop an effective computational method to predict DTI based on protein sequence. In this paper, we propose a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence), to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with a Relevance Vector Machine (RVM). To evaluate the prediction capacity of PDTPS, experiments were carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets using five-fold cross-validation. The proposed PDTPS method achieved average accuracies of 97.73%, 93.12%, 86.78%, and 87.78% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. To further evaluate the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The promising comparison results further demonstrate the efficiency and robustness of the proposed PDTPS method, making it a useful tool for predicting DTI as well as for other bioinformatics tasks.
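Bi-gram features from a PSSM are commonly defined as B[m][n] = sum over i of P[i][m] * P[i+1][n], where P is the PSSM expressed as per-position probabilities. The sketch below implements that definition; the row-wise softmax used to turn raw scores into probabilities is an assumption (implementations differ), and the PCA and RVM stages of PDTPS are not shown.

```python
import math

def to_probabilities(pssm):
    """Assumed conversion: row-wise softmax of raw PSSM log-odds scores."""
    probs = []
    for row in pssm:
        peak = max(row)
        exps = [math.exp(s - peak) for s in row]  # subtract max for numerical stability
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

def bigram_features(prob_matrix):
    """Flattened bi-gram matrix B[m][n] = sum_i P[i][m] * P[i+1][n]."""
    n_cols = len(prob_matrix[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair each position with the next one along the sequence
    for current, following in zip(prob_matrix, prob_matrix[1:]):
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += current[m] * following[n]
    return [value for row in bigram for value in row]
```

With a 20-column PSSM this produces 400 features per protein, which is the input size a subsequent PCA step would reduce.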
-
4.
PCVMZM: Using the Probabilistic Classification Vector Machines Model Combined with a Zernike Moments Descriptor to Predict Protein-Protein Interactions from Protein Sequences.
Wang, Y, You, Z, Li, X, Chen, X, Jiang, T, Zhang, J
International journal of molecular sciences. 2017;(5)
Abstract
Protein-protein interactions (PPIs) are essential to most processes in living organisms. Thus, detecting PPIs is extremely important for understanding the molecular mechanisms of biological systems. Although much PPI data has been generated by high-throughput technologies for a variety of organisms, the whole interactome is still far from complete. In addition, high-throughput technologies for detecting PPIs have unavoidable defects, including time consumption, high cost, and high error rates. In recent years, with the development of machine learning, computational methods have been broadly used to predict PPIs and can achieve good prediction rates. In this paper, we present PCVMZM, a computational method based on a Probabilistic Classification Vector Machines (PCVM) model and a Zernike moments (ZM) descriptor for predicting PPIs from protein amino acid sequences. Specifically, a ZM descriptor is used to extract protein evolutionary information from the Position-Specific Scoring Matrix (PSSM) generated by the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then, the PCVM classifier is used to infer interactions among proteins. On the Yeast and H. pylori PPI datasets, the proposed method achieves average prediction accuracies of 94.48% and 91.25%, respectively. To further evaluate the proposed method, the PCVM model is compared with the state-of-the-art support vector machine (SVM) classifier. Experimental results on the Yeast dataset show that the PCVM classifier performs better than the SVM classifier. These results indicate that our proposed method is robust, powerful, and feasible, and can serve as a helpful tool for proteomics research.
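The abstract does not give the moment orders used by PCVMZM or how the PSSM is mapped onto the unit disk. As a generic sketch, assuming the PSSM is treated as a grayscale image whose pixel centres are rescaled into [-1, 1] x [-1, 1], the Zernike moment Z_nm sums f(x, y) * R_nm(rho) * exp(-i*m*theta) over pixels inside the unit disk. The magnitude |Z_nm| is rotation invariant and is what is typically used as a feature; normalisation conventions vary between implementations.

```python
import cmath
import math

def radial_polynomial(n, m, rho):
    """Zernike radial polynomial R_n^m(rho); requires |m| <= n and n - |m| even."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("invalid (n, m) pair")
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)
                 / (math.factorial(k)
                    * math.factorial((n + m) // 2 - k)
                    * math.factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total

def zernike_moment(image, n, m):
    """Complex Zernike moment Z_nm of a 2-D list mapped onto the unit disk."""
    rows, cols = len(image), len(image[0])
    pixel_area = (2.0 / rows) * (2.0 / cols)
    acc = 0j
    for r in range(rows):
        y = (2 * r + 1) / rows - 1  # pixel centre rescaled into [-1, 1]
        for c in range(cols):
            x = (2 * c + 1) / cols - 1
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # only pixels inside the unit disk contribute
            theta = math.atan2(y, x)
            acc += image[r][c] * radial_polynomial(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * acc * pixel_area
```

A feature vector would then collect abs(zernike_moment(pssm_image, n, m)) over a chosen set of valid (n, m) pairs.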
-
5.
Robust and accurate prediction of protein self-interactions from amino acids sequence using evolutionary information.
An, JY, You, ZH, Chen, X, Huang, DS, Yan, G, Wang, DF
Molecular bioSystems. 2016;(12):3702-3710
Abstract
Self-interacting proteins (SIPs) play an essential role in cellular functions and in the evolution of protein interaction networks (PINs). Due to the limitations of experimental technology for detecting self-interacting proteins, developing a robust and accurate computational approach for SIP prediction is a very important task. In this study, we propose a novel computational method for predicting SIPs from protein amino acid sequences. First, a novel feature representation scheme based on Local Binary Pattern (LBP) is developed, which takes into account evolutionary information in the form of multiple sequence alignments. Then, using the Relevance Vector Machine (RVM) classifier, the performance of our proposed method is evaluated on yeast and human datasets with a five-fold cross-validation test. The experimental results show that the proposed method achieves high accuracies of 94.82% and 97.28% on the yeast and human datasets, respectively. To further assess the performance of our method, we compared it with the state-of-the-art Support Vector Machine (SVM) classifier and other existing methods on the same datasets. The comparison results demonstrate that the proposed method is very promising and could provide a cost-effective alternative for predicting SIPs. In addition, to facilitate future proteomics research, a web server is freely available for academic use at .
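The abstract does not detail how LBP is applied to the evolutionary information. In its standard image-processing form, each interior cell is compared with its eight neighbours, every neighbour at least as large as the centre sets one bit, and the 256-bin histogram of the resulting codes is the descriptor. A minimal sketch, assuming the PSSM is treated as a 2-D grid of numbers; the neighbour order and the function name are arbitrary illustrative choices.

```python
# Clockwise neighbour offsets starting at the top-left cell (an arbitrary but fixed order).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(grid):
    """256-bin histogram of 8-neighbour local binary pattern codes."""
    histogram = [0] * 256
    for r in range(1, len(grid) - 1):          # border cells lack a full neighbourhood
        for c in range(1, len(grid[0]) - 1):
            centre = grid[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if grid[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            histogram[code] += 1
    return histogram
```

Dividing the histogram by the number of interior cells makes descriptors from proteins of different lengths comparable.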
-
6.
PredHydroxy: computational prediction of protein hydroxylation site locations based on the primary structure.
Shi, SP, Chen, X, Xu, HD, Qiu, JD
Molecular bioSystems. 2015;(3):819-25
Abstract
Compared with the well-known and extensively studied protein phosphorylation, protein hydroxylation attracts much less attention, and the molecular mechanism of hydroxylation is still incompletely understood. Yet annotating hydroxylation in proteomes is a critical first step toward decoding protein function, understanding the physiological roles of hydroxylation that have been implicated in pathological processes, and providing useful information for drug design against various hydroxylation-related diseases. In this work, we present a novel method called PredHydroxy that automates the prediction of proline and lysine hydroxylation sites based on position weight amino acid composition, 8 high-quality amino acid indices, and support vector machines. In jackknife cross-validation, PredHydroxy achieved promising performance, with an area under the receiver operating characteristic curve (AUC) of 82.72% and a Matthews correlation coefficient (MCC) of 69.03% for hydroxyproline, and an AUC of 87.41% and an MCC of 66.68% for hydroxylysine. The results obtained from both the cross-validation and independent tests suggest that PredHydroxy might be a powerful and complementary tool for further experimental investigation of protein hydroxylation. Feature analyses demonstrate that hydroxylation and non-hydroxylation sites have distinct location-specific differences, and that alpha and turn propensity are important for the hydroxylation of proline and lysine residues. A user-friendly server is freely available on the web at: .
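The AUC reported above is the area under the ROC curve, which equals the probability that a randomly chosen positive site receives a higher classifier score than a randomly chosen negative site, with ties counted as one half. A minimal O(n*m) sketch of that definition, independent of how PredHydroxy computes it:

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties count 0.5."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

For large score sets a rank-based implementation (sorting once) gives the same value in O(N log N).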
-
7.
The prediction of palmitoylation site locations using a multiple feature extraction method.
Shi, SP, Sun, XY, Qiu, JD, Suo, SB, Chen, X, Huang, SY, Liang, RP
Journal of molecular graphics & modelling. 2013;:125-30
Abstract
As an extremely important and ubiquitous post-translational lipid modification, palmitoylation plays a significant role in a variety of biological and physiological processes. Unlike other lipid modifications, protein palmitoylation and depalmitoylation are highly dynamic and can regulate both protein function and localization. The dynamic nature of palmitoylation is poorly understood because of limitations in current assay methods, and in vivo or in vitro experimental identification of palmitoylation sites is both time-consuming and expensive. Given the large volume of protein sequences generated in the post-genomic era, rapidly identifying the palmitoylation sites of new proteins is extraordinarily important for both basic research and drug discovery. In this work, a new computational method, WAP-Palm, which combines multiple feature extraction methods, has been developed to predict protein palmitoylation sites. In a 10-fold cross-validation test, the WAP-Palm model achieved a sensitivity of 81.53%, a specificity of 90.45%, an accuracy of 85.99%, and a Matthews correlation coefficient of 72.26%. The results obtained from both the cross-validation and independent tests suggest that the WAP-Palm model might facilitate the identification and annotation of protein palmitoylation sites. The online service is available at http://bioinfo.ncu.edu.cn/WAP-Palm.aspx.
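Sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) are all computed from confusion-matrix counts (TP, FN, TN, FP). The sketch below uses the standard formulas; it does not reproduce how WAP-Palm aggregated counts across folds, and any counts passed to it here are invented.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0 by convention
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}
```

Note that MCC ranges from -1 to 1, so an MCC quoted as 72.26% corresponds to 0.7226 on this scale.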
-
8.
Position-specific analysis and prediction for protein lysine acetylation based on multiple features.
Suo, SB, Qiu, JD, Shi, SP, Sun, XY, Huang, SY, Chen, X, Liang, RP
PloS one. 2012;(11):e49108
Abstract
Protein lysine acetylation is a reversible post-translational modification that plays a vital role in many cellular processes, such as transcriptional regulation, apoptosis, and cytokine signaling. To fully decipher the molecular mechanisms of acetylation-related biological processes, an initial but crucial step is recognizing acetylated substrates and their acetylation sites. In this study, we developed a position-specific method named PSKAcePred for lysine acetylation prediction based on support vector machines. Residues around the acetylation sites were selected or excluded based on their entropy values. We incorporated features of amino acid composition, evolutionary similarity, and physicochemical properties to predict lysine acetylation sites. The prediction model achieved an accuracy of 79.84% and a Matthews correlation coefficient of 59.72% in 10-fold cross-validation on balanced positive and negative samples. A feature analysis showed that all features used in this method contributed to the acetylation process. A position-specific analysis showed that features derived from the critical neighboring residues contributed profoundly to acetylation site determination. The detailed analysis in this paper can help deepen understanding of the acetylation mechanism and provide guidance for related experimental validation.
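The abstract says residues around the sites were kept or dropped according to their entropy but gives neither the window size nor the cut-off. A minimal sketch, assuming equal-length sequence windows centred on known sites and a hypothetical threshold, with lower-entropy (more conserved) positions treated as informative: compute the Shannon entropy of the residue distribution at each window position, then keep positions at or below the cut-off.

```python
import math
from collections import Counter

def positional_entropy(windows):
    """Shannon entropy (bits) of the residue distribution at each window position."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    total = len(windows)
    entropies = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h + 0.0)  # normalise -0.0 to 0.0 for fully conserved positions
    return entropies

def select_positions(windows, max_entropy):
    """Indices of positions whose entropy is at or below a hypothetical cut-off."""
    return [i for i, h in enumerate(positional_entropy(windows)) if h <= max_entropy]
```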
-
9.
Modelling amorphous computations with transcription networks.
Simpson, ZB, Tsai, TL, Nguyen, N, Chen, X, Ellington, AD
Journal of the Royal Society, Interface. 2009;(Suppl 4):S523-33
Abstract
The power of electronic computation is due in part to the development of modular gate structures that can be coupled to carry out sophisticated logical operations and whose performance can be readily modelled. However, the equivalences between electronic and biochemical operations are far from obvious. To help bridge these disciplines, we develop an analogy between complementary metal oxide semiconductor (CMOS) and transcriptional logic gates. We surmise that these transcriptional logic gates might prove useful in amorphous computations, and we model the ability of immobilized gates to form patterns. Finally, to begin implementing these computations, we design unique hairpin transcriptional gates and characterize them in a binary latch similar to that already demonstrated by Kim et al. (Kim, White & Winfree 2006 Mol. Syst. Biol. 2, 68 (doi:10.1038/msb4100099)). The hairpin transcriptional gates are uniquely suited to the design of a complementary NAND gate that can serve as an underlying basis for molecular computing that outputs matter rather than electronic information.
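At the purely logical level, a binary latch can be built from two cross-coupled NAND gates, the same topology used in CMOS SR latches. The Boolean sketch below abstracts away all transcriptional kinetics and is not a model of the hairpin gates; it only shows the set, reset, and hold behaviour such a latch is expected to reproduce. Inputs are active-low, as in the standard NAND SR latch, and the function names are illustrative.

```python
def nand(a, b):
    """Boolean NAND on 0/1 values."""
    return 0 if (a and b) else 1

def sr_latch(set_n, reset_n, q, q_bar):
    """Settle a cross-coupled NAND SR latch with active-low inputs.

    set_n=0 drives Q to 1, reset_n=0 drives Q to 0, and set_n=reset_n=1
    holds the stored state. Gates are re-evaluated until outputs stop changing.
    """
    for _ in range(4):  # valid inputs settle within a few passes
        new_q = nand(set_n, q_bar)
        new_q_bar = nand(reset_n, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar
```

As in electronic latches, driving both inputs low at once (set_n = reset_n = 0) is the disallowed state.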
-
10.
An improved Gibbs sampling method for motif discovery via sequence weighting.
Chen, X, Jiang, T
Computational systems bioinformatics. Computational Systems Bioinformatics Conference. 2006;:239-47
Abstract
The discovery of motifs in DNA sequences remains a fundamental and challenging problem in computational molecular biology and regulatory genomics, although a large number of computational methods have been proposed in the past decade. Among these methods, the Gibbs sampling strategy has shown great promise and is routinely used to find regulatory motif elements in the promoter regions of co-expressed genes. In this paper, we present an enhancement to the Gibbs sampling method for cases where expression data for the genes of interest are available. A sequence weighting scheme is proposed that explicitly takes gene expression variation into account in Gibbs sampling. That is, every putative motif element is assigned a weight proportional to the fold change in the expression level of its downstream gene under a single experimental condition, and a position-specific scoring matrix (PSSM) is estimated from these weighted putative motif elements. Such an estimated PSSM might represent a more accurate motif model, since motif elements with dramatic fold changes in gene expression are more likely to represent true motifs. This weighted Gibbs sampling method has been implemented and successfully tested on both simulated and biological sequence data. Our experimental results demonstrate that sequence weighting has a profound impact on the performance of a Gibbs motif sampling algorithm.
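The weighted PSSM estimate described above can be sketched as follows, assuming aligned equal-length motif instances, one precomputed weight per instance (for example proportional to the downstream gene's fold change), a pseudocount, and a uniform 0.25 background per base. The function names, pseudocount, and background model are illustrative assumptions rather than the paper's settings, and the Gibbs sampling loop itself is not shown.

```python
import math

BASES = "ACGT"

def weighted_profile(instances, weights, pseudocount=0.5):
    """Per-position base probabilities estimated from weighted motif instances."""
    if len(instances) != len(weights):
        raise ValueError("one weight per instance is required")
    width = len(instances[0])
    profile = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site, w in zip(instances, weights):
            counts[site[pos]] += w  # each instance contributes its weight, not a count of 1
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile

def log_odds_pssm(profile, background=None):
    """Convert per-position probabilities to log2-odds scores against a background."""
    background = background or {b: 0.25 for b in BASES}
    return [{b: math.log2(column[b] / background[b]) for b in BASES} for column in profile]
```

With all weights equal to 1 this reduces to the usual unweighted count matrix; a positive pseudocount is needed before taking log-odds so that unseen bases do not produce log2(0).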