-
1.
Structure-based protein function prediction using graph convolutional networks.
Gligorijević, V, Renfrew, PD, Kosciolek, T, Leman, JK, Berenberg, D, Vatanen, T, Chandler, C, Taylor, BC, Fisk, IM, Vlamakis, H, et al
Nature communications. 2021;(1):3168
Abstract
The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us to significantly expand the number of predictable functions. DeepFRI has significant de-noising capability, with only a minor drop in performance when experimental structures are replaced by protein models. Class activation mapping allows function predictions at an unprecedented resolution, allowing site-specific annotations at the residue-level in an automated manner. We show the utility and high performance of our method by annotating structures from the PDB and SWISS-MODEL, making several new confident function predictions. DeepFRI is available as a webserver at https://beta.deepfri.flatironinstitute.org/ .
-
2.
Analysis and comparison of alkaline and acid phosphatases of Gram-negative bacteria by bioinformatic and colorimetric methods.
Amoozadeh, M, Behbahani, M, Mohabatkar, H, Keyhanfar, M
Journal of biotechnology. 2020;:56-62
Abstract
Alkaline phosphatase (ALP) and acid phosphatase (ACP) are two important phosphatase enzymes that play fundamental roles in Gram-negative bacteria. Additionally, they are useful for various biotechnological and industrial applications. In the present study, different aspects of bacterial ALPs and ACPs such as pseudo amino acid composition (PseAAC), amino acid composition, dipeptide composition, physicochemical properties, secondary structures and structural motifs were studied. The binding affinity of the phosphomonoesters to ALP and ACP enzymes was predicted by docking, and the activities of ALPs and ACPs were measured using a colorimetric assay. ROC curve statistical analysis and machine learning algorithms were applied for classification of these two phosphatase protein groups. The results indicated that the physicochemical properties of ALPs and ACPs were not significantly different, although the aliphatic index and extinction coefficient of motifs of these two enzymes were significantly different. Classification based on the concept of PseAAC and dipeptide composition also indicated high accuracy. The result of docking demonstrated that the binding free energy of ALPs was lower than that of ACPs, and the experimental results demonstrated that the activity of ACPs was higher than that of ALPs. In conclusion, there is a relationship between efficiency and PseAAC and dipeptide compositions of these two enzymes.
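Two of the sequence descriptors named in this abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. The function names, the restriction to the 20 standard residues, and the toy sequence are assumptions made for illustration; the abstract does not state how the authors computed these features, and PseAAC (which adds sequence-order terms) is not covered here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each standard residue in seq (20 values)."""
    residues = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    total = len(residues)
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Return the fraction of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip pairs that contain a non-standard symbol such as X or a gap.
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        raise ValueError("sequence contains no valid dipeptides")
    counts = Counter(valid)
    total = len(valid)
    return {a + b: counts[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)}


# Hypothetical toy sequence, not taken from the study.
toy_sequence = "AAGG"
aac = amino_acid_composition(toy_sequence)
dpc = dipeptide_composition(toy_sequence)
```

Feature vectors of this kind (20 and 400 values per protein) are the sort of input a classifier could use to separate two protein groups.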
-
3.
Critical evaluation of bioinformatics tools for the prediction of protein crystallization propensity.
Wang, H, Feng, L, Webb, GI, Kurgan, L, Song, J, Lin, D
Briefings in bioinformatics. 2018;(5):838-852
Abstract
X-ray crystallography is the main tool for structural determination of proteins. Yet, the underlying crystallization process is costly, has a high attrition rate and involves a series of trial-and-error attempts to obtain diffraction-quality crystals. The Structural Genomics Consortium aims to systematically solve representative structures of major protein-fold classes using primarily high-throughput X-ray crystallography. The attrition rate of these efforts can be improved by selecting proteins that are potentially easier to crystallize. In this context, bioinformatics approaches have been developed to predict crystallization propensities based on protein sequences. These approaches are used to facilitate prioritization of the most promising target proteins, search for alternative structural orthologues of the target proteins and suggest designs of constructs capable of potentially enhancing the likelihood of successful crystallization. We reviewed and compared nine predictors of protein crystallization propensity. Moreover, we demonstrated that integrating selected outputs from multiple predictors as candidate input features to build the predictive model results in a significantly higher predictive performance when compared to using these predictors individually. Furthermore, we also introduced a new and accurate predictor of protein crystallization propensity, Crysf, which uses functional features extracted from UniProt as inputs. This comprehensive review will assist structural biologists in selecting the most appropriate predictor, and will also help bioinformaticians develop a new generation of predictive algorithms.
-
4.
Computational tools help improve protein stability but with a solubility tradeoff.
Broom, A, Jacobi, Z, Trainor, K, Meiering, EM
The Journal of biological chemistry. 2017;(35):14349-14361
Abstract
Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnology. Increasing protein stability is an especially challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 experimental mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodynamic stability, ThreeFoil. Experimental characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional ties, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein solubility, the most common cause of protein design failure. Examination of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of solubility.
-
5.
CoffeebEST: an integrated resource for Coffea spp expressed sequence tags.
Paschoal, AR, Fernandes, ED, Silva, JC, Lopes, FM, Pereira, LF, Domingues, DS
Genetics and molecular research : GMR. 2014;(4):10913-20
Abstract
Coffee is one of the most important commodities in the world, and its production relies mainly on two species, Coffea arabica and Coffea canephora. Although there are diverse transcriptome datasets available for coffee trees, few research groups have exploited the potential knowledge contained in these data, especially with respect to fruit and seed development. Here, we present a comparative analysis of the transcriptomes of Coffea arabica and Coffea canephora with a focus on fruit development using publicly available expressed sequence tags (ESTs). Most of the fruit and seed EST data has been obtained from C. canephora. Therefore, we performed a fruit EST analysis of the 5 developmental stages of this species (18, 22, 30, 42, and 46 weeks after flowering) comprising 29,009 sequences. We compared C. canephora fruit ESTs to reference unigenes of C. canephora (7,710 contigs and 8,955 singletons) and C. arabica (15,656 contigs and 16,351 singletons). Additional analyses included functional annotation based on Gene Ontology, as well as an annotation using PlantCyc, a curated plant protein database. The Coffee Bean EST (CoffeebEST) is a public database available at http://bioinfo-02.cp.utfpr.edu.br/. This database represents an additional resource for the coffee scientific community, offering a user-friendly collection of information for non-specialists in coffee molecular biology to support experimental research on comparative and functional genomics.
-
6.
Deleterious nonsynonymous single nucleotide polymorphisms in human solute carriers: the first comparison of three prediction methods.
Hao, DC, Xiao, B, Xiang, Y, Dong, XW, Xiao, PG
European journal of drug metabolism and pharmacokinetics. 2013;(1):53-62
Abstract
Abundant nsSNPs have been found in genes coding for human solute carrier (SLC) transporters, but there is little known about the relationship between the genotype and phenotype of nsSNPs in these membrane proteins. It is unknown which prediction method is better suited for the prediction of nonneutral nsSNPs of SLC transporters. We have identified 2,958 validated nsSNPs in human SLC family members 1-47 from the Ensembl genome database and the NCBI SNP database. Using three different algorithms, 37-45 % of nsSNPs in SLC genes were predicted to have functional impacts on transporter function. Predictions largely agreed with the available experimental annotations. Overall, 76.5, 74.4, and 73.5 % of nonneutral nsSNPs were predicted correctly as damaging by SNAP, SIFT, and PolyPhen, respectively, while 67.4, 66.3, and 76.7 % of neutral nsSNPs were predicted correctly as nondamaging by the three methods, respectively. This study identified many amino acids that were likely to be functionally critical but have not yet been studied experimentally. There was a significant concordance between the predicted results of different methods. Evolutionarily nonneutral (destabilizing) amino acid substitutions are predicted to be the basis for the pathogenic alteration of SLC transporter activity that is associated with disease susceptibility and altered drug/xenobiotic response.
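The two percentages reported per method in this abstract correspond to sensitivity (nonneutral nsSNPs correctly called damaging) and specificity (neutral nsSNPs correctly called nondamaging). A minimal sketch of that calculation follows; the function name and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from paired boolean labels.

    truth[i] is True when variant i is experimentally nonneutral;
    predicted[i] is True when the tool calls variant i damaging.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1  # nonneutral, called damaging
        elif t and not p:
            fn += 1  # nonneutral, missed
        elif not t and not p:
            tn += 1  # neutral, called nondamaging
        else:
            fp += 1  # neutral, wrongly called damaging
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one nonneutral and one neutral variant")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels for five variants (illustration only).
toy_truth = [True, True, True, False, False]
toy_predicted = [True, True, False, False, True]
sens, spec = sensitivity_specificity(toy_truth, toy_predicted)
```

Reporting both values matters because a tool can score well on one by over-calling or under-calling damaging variants.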
-
7.
Computational methods to work as first-pass filter in deleterious SNP analysis of alkaptonuria.
Magesh, R, George Priya Doss, C
TheScientificWorldJournal. 2012;:738423
Abstract
A major challenge in the analysis of human genetic variation is to distinguish functional from nonfunctional SNPs. Discovering these functional SNPs is one of the main goals of modern genetics and genomics studies. There is a need to effectively and efficiently identify functionally important nsSNPs which may be deleterious or disease causing and to identify their molecular effects. The prediction of the phenotype of nsSNPs by computational analysis may provide a good way to explore the function of nsSNPs and their relationship with susceptibility to disease. In this context, we surveyed and compared variation databases along with in silico prediction programs to assess the effects of deleterious functional variants on protein function. We also applied these methods as a first-pass filter to identify the deleterious substitutions worth pursuing in further experimental research. In this analysis, we used existing computational methods to explore the mutation-structure-function relationship in the HGD gene, which causes alkaptonuria.
-
8.
Human platelet acetylcholinesterase inhibition by cyclophosphamide: a combined experimental and computational approach.
Al-Jafari, AA, Shakil, S, Reale, M, Kamal, MA
CNS & neurological disorders drug targets. 2011;(8):928-35
Abstract
Acetylcholinesterase (AChE)-inhibition is an area of priority research as various roles have been attributed to AChE in neurodegenerative disorders and cancer as well. In the present study, a comparative multiple 4 dimensional (4D)-approach was applied to analyze human platelet AChE-inhibition by cyclophosphamide (CP). AChE activity was assessed by measuring the hydrolysis of acetylthiocholine iodide (ASChI). The different perspective of analysis was based on two classical (Lineweaver-Burk as well as Dixon) plots, built-in equations of GOSA and a recently introduced graphical approach. Thus, various kinetic constants such as KI, Ks, Km, ksl, Vmao, Ki, ksli, Slmax, �Ks, K1/2, kcat and ksp were estimated. Previous findings of AChE (from different sources) inhibition by CP were also compared. This study extends the elucidation of the kinetic approach of analysis and quantifying enzyme-substrate and enzyme-inhibitor interactions, which is crucial to bringing any drug from bench to bedside. The acyl pocket of human AChE was found to interact with CP through the amino acid residues Y70, Y121, W233, F288, F290, Y334, F408 and Y442, while the anionic sub-site of catalytic site (CAS) interacted with the ligand through residues W84, N85, G116, G117, Y121, S122, G123, L127, Y130, E198, Y334, H443 and G444. CP displayed variable docking poses with the peripheral anionic site (PAS) of human AChE. The findings of kinetic analysis were reinforced by the results of docking experiments. Both the applied approaches strongly indicate partial mixed type of inhibition pattern for the study enzyme (AChE) and its inhibitor (CP).
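Of the classical plots mentioned in this abstract, the Lineweaver-Burk plot linearizes the Michaelis-Menten equation as 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so Km and Vmax can be read from the slope and intercept of a straight-line fit. The sketch below shows that fit with ordinary least squares; the function name and the synthetic data are assumptions for illustration and do not reproduce the study's measurements or its GOSA-based analysis.

```python
def lineweaver_burk_fit(substrate, velocity):
    """Estimate (Km, Vmax) by a least-squares line through (1/[S], 1/v)."""
    if len(substrate) != len(velocity) or len(substrate) < 2:
        raise ValueError("need at least two paired measurements")
    x = [1.0 / s for s in substrate]  # reciprocal substrate concentration
    y = [1.0 / v for v in velocity]   # reciprocal initial velocity
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # equals Km / Vmax
    intercept = y_mean - slope * x_mean  # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Synthetic, noise-free data generated from assumed Km = 0.5 and Vmax = 2.0
# (arbitrary units), used only to show that the fit recovers its inputs.
toy_substrate = [0.1, 0.2, 0.5, 1.0, 2.0]
toy_velocity = [2.0 * s / (0.5 + s) for s in toy_substrate]
km_est, vmax_est = lineweaver_burk_fit(toy_substrate, toy_velocity)
```

With real measurements the double-reciprocal transform amplifies error at low substrate concentrations, which is one reason to cross-check it against other approaches, as the abstract describes doing.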
-
9.
Computer-based comparison of structural features of envelope protein of Alkhurma hemorrhagic fever virus with the homologous proteins of two closest viruses.
Mohabatkar, H
Protein and peptide letters. 2011;(6):559-67
Abstract
The aim of this study was to predict epitopes and medically important structural properties of protein E of Alkhurma hemorrhagic fever virus (AHFV) and to compare these features with those of two closely related viruses, i.e. Kyasanur Forest disease virus (KFDV) and Tick-borne encephalitis virus (TBEV), using bioinformatics tools. Evolutionary distance, localization, signal peptide sequences, C-, N- and O-glycosylation sites, transmembrane helices (TMHs), cysteine bond positions and B cell and T cell epitopes of the E proteins were predicted. The 2D-MH, Virus-PLoc, Signal-CF, EnsembleGly, MemBrain, DiANNA, BCPREDS and MHCPred servers were applied for the prediction. According to the results, the evolutionary distance between the E protein of AHFV and those of the two other viruses was almost equal. In all three proteins studied, residues 1-35 were predicted as signal sequences and one asparagine was predicted to be glycosylated. Prediction of transmembrane helices showed one TMH at position 444-467 and another at position 476-490. Twelve cysteines were potentially involved in forming six disulfide bridges in the proteins. Four regions were predicted as B cell epitopes in the E protein of AHFV. One epitope was conserved among the three proteins studied. The only major histocompatibility complex (MHC) binding epitope conserved among the three viruses was for the DRB0401 allele. As few experimental data are available for AHFV, computer-aided study and comparison of the E protein of this virus with those of two closely related flaviviruses can help in better understanding the medical properties of the virus.
-
10.
Comparison of PGH2 binding site in prostaglandin synthases.
Paragi-Vedanthi, P, Doble, M
BMC bioinformatics. 2010;(Suppl 1):S51
Abstract
BACKGROUND Prostaglandin H2 (PGH2) is a common precursor for the synthesis of five different Prostanoids via specific Prostanoid Synthases. The binding of this substrate to these Synthases is not properly understood. Moreover, no crystal structure of a complex bound with PGH2 has been reported to date. Hence, understanding the interactions of PGH2 and characterizing its binding sites in these synthases is crucial for developing novel therapeutics based on these proteins as targets. RESULTS Shape and physico-chemical properties of the PGH2 binding sites of the four prostanoid synthases were analyzed and compared in order to understand the molecular basis of the specificity. This study provides models with predicted pockets for the binding of PGH2 with PGD, PGE, PGF and PGI Synthases. The results closely match available experimental data. The comparison showed seven physico-chemical features that are common to the four PGH2 binding sites. However, this common pattern is not statistically unique and is not specific enough to distinguish between proteins that can or cannot bind PGH2. A large-scale search for a similar pattern in the ASTRAL data bank, a non-redundant Protein Data Bank, showed the uniqueness of each of the PGH2 binding sites in these Synthases. CONCLUSION The binding pockets in PGDS, PGES, PGFS and PGIS are unique and do not share significant commonality which can be characterized as a PGH2 binding site. Local comparison of these protein structures highlights a case of convergent evolution in analogous functional sites.