-
1.
Protein Remote Homology Detection and Fold Recognition Based on Sequence-Order Frequency Matrix.
Liu, B, Chen, J, Guo, M, Wang, X
IEEE/ACM transactions on computational biology and bioinformatics. 2019;(1):292-300
Abstract
Protein remote homology detection and fold recognition are two critical tasks for the studies of protein structures and functions. Currently, the profile-based methods achieve the state-of-the-art performance in these fields. However, the widely used sequence profiles, like position-specific frequency matrix (PSFM) and position-specific scoring matrix (PSSM), ignore the sequence-order effects along protein sequence. In this study, we have proposed a novel profile, called sequence-order frequency matrix (SOFM), to extract the sequence-order information of neighboring residues from multiple sequence alignment (MSA). Combined with two profile feature extraction approaches, top-n-grams and the Smith-Waterman algorithm, the SOFMs are applied to protein remote homology detection and fold recognition, and two predictors called SOFM-Top and SOFM-SW are proposed. Experimental results show that SOFM contains more information content than other profiles, and these two predictors outperform other state-of-the-art methods. It is anticipated that SOFM will become a very useful profile in the studies of protein structures and functions.
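The point that a PSFM ignores sequence-order effects can be illustrated with a minimal Python sketch. It assumes a toy MSA given as equal-length strings with `-` marking gaps. The function names `psfm` and `neighbor_pair_counts` are hypothetical, and the pair count is only one simple way to expose neighboring-residue information. It is not the SOFM definition from the paper, which the abstract does not give.

```python
from collections import Counter


def psfm(msa):
    """Per-column residue frequencies of an MSA; gap characters are ignored."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values())
        # Each column is summarized on its own, independently of its neighbors.
        matrix.append({res: n / total for res, n in counts.items()})
    return matrix


def neighbor_pair_counts(msa):
    """Counts of adjacent residue pairs for each pair of neighboring columns.

    Illustration only: an assumed stand-in for sequence-order information.
    """
    length = len(msa[0])
    return [
        Counter(row[c:c + 2] for row in msa if "-" not in row[c:c + 2])
        for c in range(length - 1)
    ]


# Two toy alignments with identical column compositions but different pairings.
msa_a = ["AC", "GT"]
msa_b = ["AT", "GC"]
print(psfm(msa_a) == psfm(msa_b))
print(neighbor_pair_counts(msa_a) == neighbor_pair_counts(msa_b))
```

The two toy alignments have the same residues in each column, so their PSFMs coincide, while the adjacent-pair counts differ. That difference is the kind of information a column-wise profile cannot represent.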
-
2.
Molecular mechanism and role of microRNA-93 in human cancers: A study based on bioinformatics analysis, meta-analysis, and quantitative polymerase chain reaction validation.
Gao, Y, Deng, K, Liu, X, Dai, M, Chen, X, Chen, J, Chen, J, Huang, Y, Dai, S, Chen, J
Journal of cellular biochemistry. 2019;(4):6370-6383
Abstract
INTRODUCTION Studies have shown that microRNA-93 (miR-93) can act as an oncogene or a tumor suppressor in different kinds of cancers. The role of miR-93 in human cancers is inconsistent, and the mechanism underlying the aberrant expression of miR-93 is complicated. METHODS We first conducted gene enrichment analysis to give insight into the prospective mechanism of miR-93. Second, we performed a meta-analysis to evaluate the clinical value of miR-93. Finally, a validation test based on quantitative polymerase chain reaction (qPCR) was performed to further investigate the role of miR-93 in pan-cancer. RESULTS Gene Ontology (GO) enrichment analysis results showed that the target genes of miR-93 were closely related to transcription, and MAPK1, RBBP7 and Smad7 emerged as the hub genes. In the diagnostic meta-analysis, the overall sensitivity, specificity, and area under the curve were 0.76 (0.64-0.85), 0.82 (0.64-0.92), and 0.85 (0.82-0.88), respectively, which suggested that miR-93 had excellent performance in diagnosing human cancers. In the prognostic meta-analysis, dysregulated miR-93 was found to be associated with poor overall survival (OS) in cancer patients. In the qPCR validation test, the serum levels of miR-93 were upregulated in breast cancer, breast hyperplasia, lung cancer, chronic obstructive pulmonary disease, nasopharyngeal cancer, hepatocellular cancer, gastric ulcer, endometrial cancer, esophageal cancer, laryngeal cancer, and prostate cancer compared with healthy controls. CONCLUSIONS miR-93 could act as an effective diagnostic and prognostic factor for cancer patients. Its clinical value for early cancer diagnosis and survival prediction is promising.
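For readers less familiar with the diagnostic metrics quoted above, the following Python sketch shows the standard definitions of sensitivity and specificity for a single 2x2 table. The counts are hypothetical and are not taken from the study. Pooled meta-analytic estimates are usually obtained with dedicated statistical models across studies, which this sketch does not attempt.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of diseased subjects that the test correctly flags."""
    diseased = true_pos + false_neg
    if diseased == 0:
        raise ValueError("no diseased subjects in the table")
    return true_pos / diseased


def specificity(true_neg, false_pos):
    """Fraction of healthy subjects that the test correctly clears."""
    healthy = true_neg + false_pos
    if healthy == 0:
        raise ValueError("no healthy subjects in the table")
    return true_neg / healthy


# Hypothetical counts for one study, for illustration only.
tp, fn, tn, fp = 40, 10, 45, 5
print(sensitivity(tp, fn), specificity(tn, fp))
```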
-
3.
Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier.
Wei, L, Xing, P, Zeng, J, Chen, J, Su, R, Guo, F
Artificial intelligence in medicine. 2017;:67-74
Abstract
Computational methods are employed in bioinformatics to predict protein-protein interactions (PPIs). PPIs and protein-protein non-interactions (PPNIs) display different levels of development, and the number of PPIs is considerably greater than that of PPNIs. This significant difference in the number of PPIs and PPNIs increases the cost of constructing a balanced dataset. PPIs can be classified as either physical or genetic. However, ready-made PPNI databases were proven only to have no physical interactions and were not proven to have no genetic interactions. Hence, ready-made PPNI databases contain false negative non-interactions. In this study, two PPNI datasets were artificially generated from a PPI database. In contrast to various traditional PPI feature extraction methods based on sequential information, two types of novel feature extraction methods were proposed. One is based on secondary structure information, and the other is based on the physicochemical properties of proteins. The experimental results of the RandomPairs dataset validate the efficiency and effectiveness of the proposed prediction model. These results reveal the potential of constructing a PPI negative dataset to reduce false negatives. Related datasets, tools, and source codes are accessible at http://lab.malab.cn/soft/PPIPre/PPIPre.html.
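One common way to generate negative samples from a PPI database is to pair proteins at random and discard any pair already recorded as interacting. The Python sketch below shows that idea under stated assumptions: the function name `random_negative_pairs` and the toy identifiers are hypothetical, and the abstract does not specify how the RandomPairs dataset was actually built.

```python
import random


def random_negative_pairs(positive_pairs, n_negatives, seed=0):
    """Sample distinct protein pairs that do not appear among the positives.

    Assumption: a pair is treated as unordered, and candidate proteins are
    exactly those that occur somewhere in the positive set.
    """
    rng = random.Random(seed)
    positives = {frozenset(pair) for pair in positive_pairs}
    proteins = sorted({prot for pair in positive_pairs for prot in pair})

    # Guard against asking for more negatives than can exist.
    n = len(proteins)
    known = sum(1 for pair in positives if len(pair) == 2)
    if n_negatives > n * (n - 1) // 2 - known:
        raise ValueError("not enough non-interacting pairs available")

    negatives = set()
    while len(negatives) < n_negatives:
        pair = frozenset(rng.sample(proteins, 2))
        if pair not in positives:
            negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Toy positive set with hypothetical protein identifiers.
ppis = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
print(random_negative_pairs(ppis, 2))
```

Random pairing yields a balanced dataset cheaply, but a sampled pair is only unobserved, not proven non-interacting, so some false negatives can remain.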
-
4.
ProtDec-LTR2.0: an improved method for protein remote homology detection by combining pseudo protein and supervised Learning to Rank.
Chen, J, Guo, M, Li, S, Liu, B
Bioinformatics (Oxford, England). 2017;(21):3473-3476
Abstract
SUMMARY As one of the most important tasks in protein sequence analysis, protein remote homology detection is critical for both basic research and practical applications. Here, we present an effective web server for protein remote homology detection called ProtDec-LTR2.0 by combining ProtDec-Learning to Rank (LTR) and pseudo protein representation. Experimental results showed that the detection performance is clearly improved. The web server provides a user-friendly interface to explore the sequence and structure information of candidate proteins and find their conserved domains by launching a multiple sequence alignment tool. AVAILABILITY AND IMPLEMENTATION The web server is free and open to all users with no login requirement at http://bioinformatics.hitsz.edu.cn/ProtDec-LTR2.0/. CONTACT bliu@hit.edu.cn.
-
5.
Protein Remote Homology Detection Based on an Ensemble Learning Approach.
Chen, J, Liu, B, Huang, D
BioMed research international. 2016;:5813645
Abstract
Protein remote homology detection is one of the central problems in bioinformatics. Although some computational methods have been proposed, the problem is still far from being solved. In this paper, an ensemble classifier for protein remote homology detection, called SVM-Ensemble, was proposed with a weighted voting strategy. SVM-Ensemble combined three basic classifiers based on different feature spaces, including Kmer, ACC, and SC-PseAAC. These features consider the characteristics of proteins from various perspectives, incorporating both the sequence composition and the sequence-order information along the protein sequences. Experimental results on a widely used benchmark dataset showed that the proposed SVM-Ensemble clearly improves the predictive performance of protein remote homology detection. Moreover, it achieved the best performance and outperformed other state-of-the-art methods.
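Two ingredients named in this abstract, Kmer features and weighted voting, can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function names `kmer_composition` and `weighted_vote`, the weights, and the tie-breaking rule are hypothetical, the base classifiers are reduced to already computed labels of +1 or -1, and the paper's actual SVM training and weighting scheme are not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k=2, alphabet=AMINO_ACIDS):
    """Normalized k-mer frequencies, in lexicographic order of the alphabet."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:  # skip k-mers containing non-standard residues
            counts[index[kmer]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def weighted_vote(labels, weights):
    """Combine +1/-1 predictions; the sign of the weighted sum decides.

    Assumption: a tie (sum of exactly zero) is resolved as +1.
    """
    if len(labels) != len(weights):
        raise ValueError("one weight is required per base classifier")
    score = sum(label * weight for label, weight in zip(labels, weights))
    return 1 if score >= 0 else -1


# Hypothetical outputs of three base classifiers with hypothetical weights.
print(weighted_vote([+1, -1, +1], [0.5, 0.3, 0.2]))
```

Weighting lets a more reliable base classifier outvote weaker ones, whereas plain majority voting treats all feature spaces as equally informative.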
-
6.
High-throughput phenotypic characterization of Pseudomonas aeruginosa membrane transport genes.
Johnson, DA, Tetu, SG, Phillippy, K, Chen, J, Ren, Q, Paulsen, IT
PLoS genetics. 2008;(10):e1000211
Abstract
The deluge of data generated by genome sequencing has led to an increasing reliance on bioinformatic predictions, since the traditional experimental approach of characterizing gene function one at a time cannot possibly keep pace with the sequence-based discovery of novel genes. We have utilized Biolog phenotype MicroArrays to identify phenotypes of gene knockout mutants in the opportunistic pathogen and versatile soil bacterium Pseudomonas aeruginosa in a relatively high-throughput fashion. Seventy-eight P. aeruginosa mutants defective in predicted sugar and amino acid membrane transporter genes were screened and clear phenotypes were identified for 27 of these. In all cases, these phenotypes were confirmed by independent growth assays on minimal media. Using qRT-PCR, we demonstrate that the expression levels of 11 of these transporter genes were induced from 4- to 90-fold by their substrates identified via phenotype analysis. Overall, the experimental data showed the bioinformatic predictions to be largely correct in 22 out of 27 cases, and led to the identification of novel transporter genes and a potentially new histamine catabolic pathway. Thus, rapid phenotype identification assays are an invaluable tool for confirming and extending bioinformatic predictions.