-
1.
Amino Acid Encoding Methods for Protein Sequences: A Comprehensive Review and Assessment.
Jing, X, Dong, Q, Hong, D, Lu, R
IEEE/ACM transactions on computational biology and bioinformatics. 2020;(6):1918-1931
Abstract
As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used for both residue-level and sequence-level prediction of protein properties by combining it with different algorithms. However, it has not attracted enough attention in past decades, and there has been no comprehensive review or assessment of encoding methods so far. In this article, we present a systematic classification together with a comprehensive review and assessment of various amino acid encoding methods. These methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based position-dependent encoding method PSSM achieved the best performance. The structure-based and machine-learning encoding methods also show potential for further application; the neural-network-based distributed representation of amino acids in particular may shed new light on this area. We hope that this review and assessment will be useful for future studies of amino acid encoding.
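Binary encoding, the first of the five categories above, is simple enough to sketch directly. The snippet below is a minimal one-hot encoder, assuming the 20 standard residues in alphabetical one-letter order and mapping any other symbol (e.g. 'X') to an all-zero vector; these conventions and the function name one_hot_encode are illustrative assumptions, not choices taken from the review.

```python
# Minimal one-hot (binary) amino acid encoding sketch.
# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue of `sequence`.

    Non-standard symbols (e.g. 'X') map to an all-zero vector, which is
    one common convention rather than the only one.
    """
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:
            vector[idx] = 1  # set the single position for this residue
        encoded.append(vector)
    return encoded
```

For example, one_hot_encode("MKV") yields a 3-by-20 matrix whose rows have their single 1 at indices 10, 8, and 17. This per-residue matrix suits residue-level tasks such as secondary structure prediction; sequence-level tasks would need an additional pooling step, which is not shown.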
-
2.
De novo sequencing of proteins by mass spectrometry.
Vitorino, R, Guedes, S, Trindade, F, Correia, I, Moura, G, Carvalho, P, Santos, MAS, Amado, F
Expert review of proteomics. 2020;(7-8):595-607
Abstract
INTRODUCTION Proteins are essential to every cellular activity, and unraveling their sequence and structure is a crucial step toward fully understanding their biology. Early methods of protein sequencing were mainly based on enzymatic or chemical degradation of peptide chains. With the completion of the human genome project and the expansion of the information available for each protein, various databases containing this sequence information were created. AREAS COVERED De novo protein sequencing, shotgun proteomics, and other mass-spectrometric techniques are covered, along with the various software tools currently available for proteogenomic analysis. Emphasis is placed on methods for de novo sequencing, together with the potential and shortcomings of using databases to interpret protein sequence data. EXPERT OPINION As mass-spectrometry sequencing performance improves through better software and hardware optimization, combined with user-friendly interfaces, de novo protein sequencing is becoming imperative in shotgun proteomic studies. Issues regarding unknown or mutated peptide sequences, as well as unexpected post-translational modifications (PTMs) and their identification through false discovery rate searches using the target/decoy strategy, need to be addressed. Ideally, de novo sequencing should be integrated into standard proteomic workflows as an add-on to conventional database search engines, which would then be able to provide improved identification.
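The target/decoy strategy mentioned above estimates the false discovery rate by counting how many decoy (e.g. reversed or shuffled) sequences pass a score cutoff. A minimal sketch follows, assuming higher scores are better and a search against a concatenated target + decoy database; the function names are hypothetical, and the simple decoy/target ratio used here is only one of several estimators in use.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring >= threshold.

    Uses the simple decoy/target count ratio from a search against a
    concatenated target + decoy database; other estimators, such as
    (decoys + 1) / targets, are also in use.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # no accepted target matches, so no false discoveries
    return n_decoy / n_target


def score_threshold_at_fdr(target_scores, decoy_scores, max_fdr=0.01):
    """Return the lowest target score whose estimated FDR is <= max_fdr.

    Lower thresholds accept more matches, so the lowest passing threshold
    maximizes identifications at the requested FDR. Returns None if no
    threshold qualifies.
    """
    for t in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None
```

For example, with target scores [10, 9, 8, 7, 6, 5] and decoy scores [5.5, 4, 3], a cutoff of 5 gives an estimated FDR of 1/6, while a cutoff of 6 gives 0, so score_threshold_at_fdr returns 6 at a 1% FDR.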
-
3.
Small design from big alignment: engineering proteins with multiple sequence alignment as the starting point.
Wang, T, Liang, C, Hou, Y, Zheng, M, Xu, H, An, Y, Xiao, S, Liu, L, Lian, S
Biotechnology letters. 2020;(8):1305-1315
Abstract
Multiple sequence alignment (MSA) is a fundamental way to gain information that cannot be obtained from the analysis of any individual sequence included in the alignment. It provides ways to investigate the relationship between sequence and function from an evolutionary perspective. Thus, the MSA of proteins can be employed as a reference for protein engineering. In this paper, we review recent advances to highlight how protein engineering has benefited from protein MSAs. These methods include (1) engineering the thermostability or solubility of proteins by introducing site mutations that bring them closer to the consensus sequence of the alignment; (2) structure-based protein engineering with comparative modeling; (3) creating paleoenzymes featuring high thermostability and promiscuity by constructing ancestral sequences derived from the multiple sequence alignment; and (4) introducing site mutations targeting the evolutionarily coupled sites identified from the multiple sequence alignment.
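Approach (1), the consensus design strategy, starts from the column-wise consensus of the alignment. The sketch below computes a simple majority-residue consensus and lists the positions where a query sequence differs from it, i.e. candidate back-to-consensus mutations. It assumes a gapped alignment of equal-length strings with '-' as the gap symbol; the function names and the tie-breaking rule are illustrative choices, and practical pipelines may also weight sequences or apply frequency thresholds, which this sketch omits.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent non-gap residue in each alignment column.

    All-gap columns yield the gap symbol. Ties go to the residue seen
    first in the column (Counter.most_common ordering), an arbitrary
    choice for this sketch.
    """
    if not alignment:
        return ""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != gap)
        consensus.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(consensus)


def consensus_mutations(query, consensus, gap="-"):
    """List (column, query_residue, consensus_residue) where they differ.

    Columns are 1-based alignment columns, not query residue numbers.
    """
    return [
        (i + 1, q, c)
        for i, (q, c) in enumerate(zip(query, consensus))
        if q != gap and c != gap and q != c
    ]
```

For example, the alignment ["MKVL-", "MRVLA", "MKILA"] has the consensus "MKVLA", and the second sequence differs from it only at column 2 (R versus consensus K).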
-
4.
Mutants of β2-glycoprotein I: Their features and potent applications.
Shen, L, Azmi, NU, Tan, XW, Yasuda, S, Wahyuningsih, AT, Inagaki, J, Kobayashi, K, Ando, E, Sasaki, T, Matsuura, E
Best practice & research. Clinical rheumatology. 2018;(4):572-590
Abstract
β2-Glycoprotein I (β2GPI), a highly glycosylated plasma protein composed of five homologous domains, regulates coagulation, fibrinolysis, and/or angiogenesis by interacting with negatively charged hydrophobic molecules and/or with plasminogen and its metabolites. The present study focused on the structural and functional characterization of β2GPI's domain I (DI) and domain V (DV). Through N-terminal amino acid sequencing, a novel plasmin cleavage site at K287C288 was identified in DV. We further modified the intact DV by altering two amino acids at specific proteolytic cleavage sites to generate three stable DV mutants: DV(PP), DV(PE), and DV(AA). Both SDS-PAGE and MALDI-TOF-MS showed that all three DV mutants were more stable than the intact DV, and DV(PE) was particularly resistant to proteolysis. Competitive ELISA was used to assess the affinities of intact β2GPI and the mutants for cardiolipin. In a culture system, all DV and DI mutants potently inhibited HUVEC proliferation by 18-30% compared with the control. Only DI and nicked β2GPI significantly inhibited HUVEC tube formation. Moreover, DV(PE)-coated affinity columns bound anionic lipids and substantially separated anionic DOPS from zwitterionic DOPC in a model purification. In summary, the proteolytic resistance and unhindered phospholipid (PL) binding of DV(PE) make it an appealing candidate for further prospective studies. Future in-depth characterization and optimized applications of cleavage-resistant DV(PE) would help realize its full potential as a novel clinical modality in vascular imaging and/or lipidomics studies.
-
5.
Practical analysis of specificity-determining residues in protein families.
Chagoyen, M, García-Martín, JA, Pazos, F
Briefings in bioinformatics. 2016;(2):255-61
Abstract
Determining the residues that are important for the molecular activity of a protein is a topic of broad interest in biomedicine and biotechnology. This knowledge can help in understanding the protein's molecular mechanism and in fine-tuning its natural function, potentially with biotechnological or therapeutic implications. Some protein residues are essential for the function common to all members of a protein family, while others explain the particular specificities of certain subfamilies (such as binding to different substrates or cofactors, or distinct binding affinities). Owing to the difficulty of determining them experimentally, a number of computational methods have been developed to detect these functional residues, generally known as 'specificity-determining positions' (SDPs), from a collection of homologous protein sequences. These methods are mature enough to be used routinely by molecular biologists to direct experiments aimed at gaining insight into the functional specificity of a protein family and, eventually, modifying it. In this review, we summarize some of the recent discoveries achieved through computational SDP identification in a number of relevant protein families, as well as the main approaches and software tools available for this type of analysis.
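The distinction drawn above, between residues conserved across the whole family and residues that differ between subfamilies, can be illustrated with a deliberately simple rule. The sketch below flags an alignment column when every subfamily is internally conserved there but the subfamilies do not all share the same residue. This strict all-or-nothing rule and the function name toy_sdp_positions are assumptions for illustration only; it is not one of the SDP methods surveyed in the review, and it does not tolerate the sequence noise that practical SDP tools must handle.

```python
def toy_sdp_positions(subfamilies, gap="-"):
    """Flag alignment columns that look like specificity-determining positions.

    `subfamilies` maps a subfamily name to its list of aligned sequences.
    A column is flagged when each subfamily has a single non-gap residue
    there, but not all subfamilies share that residue.
    Returns 1-based alignment column indices.
    """
    groups = [seqs for seqs in subfamilies.values() if seqs]
    if len(groups) < 2:
        return []  # need at least two subfamilies to compare
    length = len(groups[0][0])
    if any(len(seq) != length for seqs in groups for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")
    flagged = []
    for col in range(length):
        per_group = []
        for seqs in groups:
            residues = {seq[col] for seq in seqs}
            if len(residues) != 1 or gap in residues:
                break  # not conserved within this subfamily
            per_group.append(residues.pop())
        else:
            # Conserved within every subfamily; flag only if they disagree.
            if len(set(per_group)) > 1:
                flagged.append(col + 1)
    return flagged
```

For the toy input {"A": ["MKDL", "MKDI"], "B": ["MRNL", "MRNV"]}, it flags columns 2 and 3. Column 1 is conserved across both subfamilies (a family-wide residue rather than a specificity determinant), and column 4 varies within each subfamily.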
-
6.
Protein Function Prediction: Problems and Pitfalls.
Pearson, WR
Current protocols in bioinformatics. 2015;:4.12.1-4.12.8
Abstract
The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous meaning of "functional similarity"; and (3) being aware of the limitations of annotations in functional databases. Protein function prediction does not offer "one-size-fits-all" solutions. Prediction strategies work better when the idiosyncrasies of function and functional annotation are better understood.
-
7.
Sequence-based protein superfamily classification using computational intelligence techniques: a review.
Vipsita, S, Rath, SK
International journal of data mining and bioinformatics. 2015;(4):424-57
Abstract
Protein superfamily classification deals with the problem of predicting the family membership of newly discovered amino acid sequences. Although many traditional alignment methods have already been developed, the present trend demands the application of computational intelligence techniques. As biological databases grow exponentially, retrieving and inferring essential knowledge in the biological domain becomes a very cumbersome task. This problem can be handled more easily using intelligent techniques because of their tolerance for imprecision, uncertainty, and partial truth, and their capacity for approximate reasoning. This paper discusses the various global and local features extracted from full-length protein sequences that are used for the approximation and generalisation of the classifier. The various parameters used for evaluating classifier performance are also discussed. This review can therefore point current researchers in the right direction for improving on existing methods.
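Amino acid composition (20 features) and dipeptide composition (400 features) are two widely used global, fixed-length descriptors of a full-length sequence; the abstract does not say which specific features the review covers, so treat these as illustrative examples. A minimal sketch, assuming only the 20 standard residues are counted and non-standard symbols are skipped:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """20-dim vector: fraction of each standard residue in the sequence."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """400-dim vector: fraction of each adjacent pair of standard residues.

    Pairs involving a non-standard symbol are skipped.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[d] / len(pairs) for d in DIPEPTIDES]
```

Because both vectors have a fixed length regardless of sequence length, they can be fed directly to standard classifiers; local features (for example, motif occurrences) would need their own extraction step.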
-
8.
State-of-the-art bioinformatics protein structure prediction tools (Review).
Pavlopoulou, A, Michalopoulos, I
International journal of molecular medicine. 2011;(3):295-310
Abstract
Knowledge of the native structure of a protein could provide an understanding of the molecular basis of its function. However, in the postgenomic era, there is a growing gap between proteins with experimentally determined structures and proteins without known structures. To deal with this overwhelming volume of data, a collection of automated bioinformatics tools that predict the structure of a protein from its amino acid sequence has emerged. The aim of this paper is to provide experimental biologists with a set of cutting-edge, carefully evaluated, user-friendly computational tools for protein structure prediction that should help them interpret their results and rationally design new experiments.
-
9.
Supervised ensembles of prediction methods for subcellular localization.
Assfalg, J, Gong, J, Kriegel, HP, Pryakhin, A, Wei, T, Zimek, A
Journal of bioinformatics and computational biology. 2009;(2):269-85
Abstract
In the past decade, many automated prediction methods for the subcellular localization of proteins have been proposed, utilizing a wide range of principles and learning approaches. Based on an experimental evaluation of different methods and their theoretical properties, we propose to combine a well-balanced set of existing approaches into new, ensemble-based prediction methods. The experimental evaluation shows that our ensembles improve substantially over the underlying base methods.
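A simple way to combine several localization predictors is weighted voting. The sketch below is a generic illustration, not the supervised ensemble proposed in the paper: it assumes each base method outputs a single compartment label, uses fixed per-method weights (for example, accuracies measured on a held-out set), and falls back to a plain majority vote when no weights are given. The method names in the usage example are hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine per-method localization labels for one protein.

    `predictions` maps a method name to its predicted compartment label;
    `weights` optionally maps method names to non-negative weights.
    Missing weights default to 1.0 (plain majority vote). Ties are broken
    alphabetically so the result is deterministic.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for method, label in predictions.items():
        scores[label] += weights.get(method, 1.0)
    if not scores:
        raise ValueError("no predictions to combine")
    best = max(scores.values())
    return min(label for label, score in scores.items() if score == best)
```

For example, weighted_vote({"m1": "nucleus", "m2": "cytoplasm", "m3": "nucleus"}) returns "nucleus", while passing weights={"m2": 3.0} makes it return "cytoplasm". A supervised ensemble would instead learn how to combine the base outputs from labelled training data.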
-
10.
Terminal proteomics: N- and C-terminal analyses for high-fidelity identification of proteins using MS.
Nakazawa, T, Yamaguchi, M, Okamura, TA, Ando, E, Nishimura, O, Tsunasawa, S
Proteomics. 2008;(4):673-85
Abstract
In proteomics, MS plays an essential role in identifying and quantifying proteins. To characterize mature target proteins from living cells, candidate proteins are often analyzed with PMF and MS/MS ion search methods in combination with computational search routines based on bioinformatics. In contrast to shotgun proteomics, which is widely used to identify proteins, proteomics based on the analysis of N- and C-terminal amino acid sequences (terminal proteomics) should yield higher-fidelity results, because terminal sequences carry high information content and because the method is potentially high-throughput, as it does not require the very high sequence coverage achieved by extensive sequencing. In line with this expectation, we review recent advances in methods for N- and C-terminal amino acid sequencing of proteins. This review focuses mainly on N- and C-terminal analysis methods based on MALDI-TOF MS, owing to its easy accessibility, with several complementary approaches using LC/MS/MS. We also describe problems associated with MS and possible remedies, including chemical and enzymatic procedures to enhance the fidelity of these methods.
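The claim that terminal sequences carry high information content can be illustrated by matching a short N-terminal sequence tag against a protein database: even a few residues can narrow the candidates sharply. The sketch below assumes the database is a dict from identifier to full sequence and optionally allows for removal of the initiator methionine; the function name, the tag, and the example entries are hypothetical, and other N-terminal processing (such as signal-peptide cleavage) is not modelled.

```python
def match_n_terminal_tag(tag, proteins, allow_met_removal=True):
    """Return IDs of proteins whose N-terminus starts with `tag`.

    `proteins` maps an identifier to its full sequence. If
    `allow_met_removal` is True, the tag is also compared against the
    sequence with its initiator methionine removed, since mature
    proteins often lack it.
    """
    tag = tag.upper()
    hits = []
    for pid, seq in proteins.items():
        seq = seq.upper()
        if seq.startswith(tag):
            hits.append(pid)
        elif allow_met_removal and seq.startswith("M") and seq[1:].startswith(tag):
            hits.append(pid)  # matches only after removing the initial Met
    return hits
```

For example, with proteins = {"P1": "MKTAYIAK", "P2": "MASKTAY", "P3": "KTAYLL"}, the tag "KTAY" matches ["P1", "P3"] (P1 only after methionine removal).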