-
1.
PepQuery enables fast, accurate, and convenient proteomic validation of novel genomic alterations.
Wen, B, Wang, X, Zhang, B
Genome research. 2019;(3):485-493
Abstract
Massively parallel or second-generation sequencing-based genomic studies continuously identify new genomic alterations that may lead to novel protein sequences, which are attractive candidates for disease biomarkers and therapeutic targets after proteomic validation. Integrative proteogenomic methods have been developed to use mass spectrometry (MS)-based proteomics data for such validation. These methods replace the reference sequence database in proteomic database searching with a customized protein database that incorporates sample- or disease-specific sequences derived from DNA or RNA sequencing, thus enabling the identification of novel protein sequences. Although useful, this spectrum-centric approach requires a full evaluation of all possible spectrum-peptide pairs, which is time-consuming, error-prone, and difficult to apply. Here, we present PepQuery, a peptide-centric approach that focuses on only novel DNA or protein sequences of interest. PepQuery allows quick and easy proteomic validation of genomic alterations without customized database construction. We demonstrated the sensitivity and specificity of the approach in validating completely novel proteins, novel splice junctions, and single amino acid variants using simulations and experimental data. Notably, enabling unrestricted modification searching in PepQuery reduced false positives by up to 95%. We implemented PepQuery as both web-based and stand-alone applications. The web version provides direct access to more than half a billion MS/MS spectra from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and other cancer proteomic studies. The stand-alone version supports batch analysis and user-provided MS/MS data. PepQuery will increase the usage of proteogenomics beyond the proteomics community and will broaden the application of proteogenomics in personalized medicine.
-
2.
The Reliability of DNA Sequences in Public Databases Belonging to the Most Economically Important Shiitake Culinary-Medicinal Mushroom Lentinus edodes (Agaricomycetes) in Asia.
Yang, RH, Wu, YY, Tang, LH, Li, CH, Shang, JJ, Li, Y, Song, Y, Huang, WH, Tao, XS, Tan, Q, et al
International journal of medicinal mushrooms. 2019;(12):1223-1239
Abstract
Large numbers of DNA sequences deposited in the International Nucleotide Sequence Databases (INSD) are erroneously annotated. The erroneous information may lead to misleading conclusions or cause great economic losses to farmers. Lentinus edodes (= Lentinula edodes (Berk.) Pegler) is one of the most important and popular culinary-medicinal mushrooms with a high nutritional value. In this study, experimental and in silico methods were used to correct the sequences annotated as L. edodes in the INSD. A total of 3,426 nucleotide entries were retrieved from public databases, including 140 different types of genetic sequences. Excluding 1,893 genome sequences, the most abundant signatures represented by ITS (258) and IGS1 (259) sequences accounted for 33.23% of the total entries. A total of 3,058 sequences were annotated correctly, 350 were indeterminate, and 18 were annotated erroneously based on the two methods. Correction of sequences will be beneficial for species identification and annotation. Phylogenetic analysis based on ITS sequences suggested that L. edodes segregates into four clades. The isolates from China were distributed into two clades. In L. edodes, the intraspecific variation of the ITS2 sequences was much higher than that of the ITS1 sequences. In addition, the genetic diversity of the L. edodes sequences from China was much higher than that of any other region included in this study. The northwest and southwest regions of China were L. edodes diversity centers.
-
3.
Carotenoid Cleavage Dioxygenases: Identification, Expression, and Evolutionary Analysis of This Gene Family in Tobacco.
Zhou, Q, Li, Q, Li, P, Zhang, S, Liu, C, Jin, J, Cao, P, Yang, Y
International journal of molecular sciences. 2019;(22)
Abstract
Carotenoid cleavage dioxygenases (CCDs) selectively cleave carotenoids, forming smaller apocarotenoids that are essential for the synthesis of apocarotenoid flavor and aroma volatiles and of the phytohormones abscisic acid (ABA) and strigolactones (SLs), as well as for responses to abiotic stresses. Here, 19, 11, and 10 CCD genes were identified in Nicotiana tabacum, Nicotiana tomentosiformis, and Nicotiana sylvestris, respectively. For this family, we systematically analyzed phylogeny, gene structure, conserved motifs, gene duplications, cis-elements, subcellular and chromosomal localization, miRNA-target sites, expression patterns with different treatments, and molecular evolution. CCD genes were classified into two subfamilies and nine groups. Gene structures, motifs, and tertiary structures showed similarities within the same groups. Subcellular localization analysis predicted that CCD family genes are cytoplasmic and plastid-localized, which was confirmed experimentally. Evolutionary analysis showed that purifying selection dominated the evolution of these genes. Meanwhile, seven positively selected sites were identified on the ancestor branch of the tobacco CCD subfamily. Cis-regulatory elements of the CCD promoters were mainly involved in light-responsiveness, hormone treatment, and physiological stress. Different CCD family genes were predominantly expressed separately in roots, flowers, seeds, and leaves and exhibited divergent expression patterns with different hormones (ABA, MeJA, IAA, SA) and abiotic (drought, cold, heat) stresses. This study provides a comprehensive overview of the NtCCD gene family and a foundation for future functional characterization of individual genes.
-
4.
i6mA-DNCP: Computational Identification of DNA N6-Methyladenine Sites in the Rice Genome Using Optimized Dinucleotide-Based Features.
Kong, L, Zhang, L
Genes. 2019;(10)
Abstract
DNA N6-methyladenine (6mA) plays an important role in regulating the gene expression of eukaryotes. Accurate identification of 6mA sites may assist in understanding genomic 6mA distributions and biological functions. Various experimental methods have been applied to detect 6mA sites on a genome-wide scale, but they are time-consuming and expensive, so computational methods that rapidly identify 6mA sites are needed. In this paper, a new machine learning-based method, i6mA-DNCP, was proposed for identifying 6mA sites in the rice genome. Dinucleotide composition and dinucleotide-based DNA properties were first employed to represent DNA sequences. After a specially designed DNA property selection process, a bagging classifier was used to build the prediction model. The jackknife test on a benchmark dataset demonstrated that i6mA-DNCP could obtain 84.43% sensitivity, 88.86% specificity, 86.65% accuracy, a 0.734 Matthews correlation coefficient (MCC), and a 0.926 area under the receiver operating characteristic curve (AUC). Moreover, three independent datasets were established to assess the generalization ability of our method. Extensive experiments validated the effectiveness of i6mA-DNCP.
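As a rough illustration of two ingredients named in this abstract, the sketch below computes a plain dinucleotide composition vector and the threshold-based evaluation metrics (sensitivity, specificity, accuracy, MCC). It is not the i6mA-DNCP implementation: the DNA-property features, the property selection step, and the bagging classifier are omitted, and the function names `dinucleotide_composition` and `binary_metrics` are hypothetical.

```python
from itertools import product
from math import sqrt

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the frequencies of the 16 overlapping dinucleotides in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other ambiguity codes
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        # MCC is conventionally reported as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

On a balanced benchmark, accuracy is the mean of sensitivity and specificity, which is consistent with the reported figures: (84.43% + 88.86%) / 2 ≈ 86.65%.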
-
5.
Opportunities and limitations of reduced representation bisulfite sequencing in plant ecological epigenomics.
Paun, O, Verhoeven, KJF, Richards, CL
The New phytologist. 2019;(2):738-742
Abstract
Contents: Summary; I. Introduction; II. RRBS loci as genome-wide epigenetic markers; III. Exploiting functional annotation of RRBS loci; IV. Limitations of RRBS methods for nonmodel species; V. Maximising the impact of RRBS in plants; VI. Conclusions; Acknowledgements. SUMMARY: Investigating the features and implications of epigenetic mechanisms across the breadth of organisms and ecosystems is important for understanding the ecological relevance of epigenetics. Several cost-effective reduced representation bisulfite sequencing (RRBS) approaches have been recently developed and applied to different organisms that lack a well-annotated reference genome. These new approaches improve the assessment of epigenetic diversity in ecological settings and may provide functional insights. We assess here the opportunities and limitations of RRBS in nonmodel plant species. Well-thought-out experimental designs that include complementary gene expression studies, and the improvement of genomics resources for the target group, promise to maximize the effect of future RRBS studies.
-
6.
Isolation of group B Streptococcus with reduced β-lactam susceptibility from pregnant women.
Moroi, H, Kimura, K, Kotani, T, Tsuda, H, Banno, H, Jin, W, Wachino, JI, Yamada, K, Mitsui, T, Yamashita, M, et al
Emerging microbes & infections. 2019;(1):2-7
Abstract
β-Lactam antibiotics are first-line agents for the treatment and prevention of group B Streptococcus (GBS) infections. We previously reported clinical GBS isolates with reduced β-lactam susceptibility (GBS-RBS) and characterized them as harbouring amino acid substitutions in penicillin-binding proteins (PBPs). However, to our knowledge, GBS-RBS clinical isolates have never previously been isolated from pregnant women worldwide. We obtained 477 clinical GBS isolates from vaginal/rectal swabs of 4530 pregnant women in Japan. We determined the MICs of seven β-lactams for all 477 clinical isolates. Five clinical isolates showed reduced ceftibuten susceptibility. For these isolates, we performed sequencing analysis of pbp genes. None of the 477 isolates was non-susceptible to penicillin G, ampicillin, or meropenem. For five isolates, the MICs of ceftibuten were relatively high (64-128 μg/ml). Each of these isolates possessed a single amino acid substitution in PBP2X, and some of the substitutions had been previously found in GBS with reduced penicillin susceptibility. This is the first report of the isolation of clinical GBS-RBS isolates harbouring amino acid substitutions in PBP2X that confer reduced ceftibuten susceptibility from pregnant women.
-
7.
How good are pathogenicity predictors in detecting benign variants?
Niroula, A, Vihinen, M
PLoS computational biology. 2019;(2):e1006481
Abstract
Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation for precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely in different performance assessments, for example due to the contents and sizes of test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, the capability to detect benign variants, for 10 variant interpretation tools. In addition to overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%) followed by FATHMM (86.4%) and VEST (83.5%). While these tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, as well as provide a benchmark for method developers.
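The evaluation described here reduces to a simple calculation: because every variant in the test set is assumed benign, a tool's specificity is the fraction of those variants it does not call disease-causing. The sketch below shows that arithmetic only; the function name, the label strings, and the toy predictions are hypothetical and are not taken from the study.

```python
def specificity(predictions, benign_label="benign"):
    """Fraction of known-benign variants that a tool predicts as benign.

    predictions: one predicted label per variant in a benign-only test set.
    Every non-benign call counts as a false positive.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    true_negatives = sum(1 for label in predictions if label == benign_label)
    return true_negatives / len(predictions)


# Toy example (made-up labels): three of four benign variants called benign.
toy_calls = ["benign", "benign", "pathogenic", "benign"]
print(f"specificity = {specificity(toy_calls):.1%}")
```

How a real benchmark treats variants that a tool leaves unclassified is a separate design decision that this sketch does not address.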
-
8.
Generative Models for Quantification of DNA Modifications.
Äijö, T, Bonneau, R, Lähdesmäki, H
Methods in molecular biology (Clifton, N.J.). 2018;:37-50
Abstract
There are multiple chemical modifications of cytosine that are important to the regulation and ultimately the functional expression of the genome. To date no single experiment can capture these separate modifications, and integrative experimental designs are needed to fully characterize cytosine methylation and chemical modification. This chapter describes a generative probabilistic model, Lux, for integrative analysis of cytosine methylation and its oxidized variants. Lux simultaneously analyzes partially orthogonal bisulfite sequencing data sets to estimate proportions of different cytosine methylation modifications and estimate multiple cytosine modifications for a single sample by integrating across experimental designs composed of multiple parallel destructive genomic measurements. Lux also considers the variation in measurements introduced by different imperfect experimental steps; the experimental variation can be quantified by using appropriate spike-in controls, allowing Lux to deconvolve the measurements and recover accurately the underlying signal.
-
9.
Novo&Stitch: accurate reconciliation of genome assemblies via optical maps.
Pan, W, Wanamaker, SI, Ah-Fong, AMV, Judelson, HS, Lonardi, S
Bioinformatics (Oxford, England). 2018;(13):i43-i51
Abstract
MOTIVATION: De novo genome assembly is a challenging computational problem due to the high repetitive content of eukaryotic genomes and the imperfections of sequencing technologies (i.e. sequencing errors, uneven sequencing coverage and chimeric reads). Several assembly tools are currently available, each of which has strengths and weaknesses in dealing with the trade-off between maximizing contiguity and minimizing assembly errors (e.g. mis-joins). To obtain the best possible assembly, it is common practice to generate multiple assemblies from several assemblers and/or parameter settings and try to identify the highest quality assembly. Unfortunately, often there is no assembly that both maximizes contiguity and minimizes assembly errors, so one has to compromise one for the other. RESULTS: The concept of assembly reconciliation has been proposed as a way to obtain a higher quality assembly by merging or reconciling all the available assemblies. While several reconciliation methods have been introduced in the literature, we have shown in one of our recent papers that none of them can consistently produce assemblies that are better than the assemblies provided in input. Here we introduce Novo&Stitch, a novel method that takes advantage of optical maps to accurately carry out assembly reconciliation (assuming that the assembled contigs are sufficiently long to be reliably aligned to the optical maps, e.g. 50 Kbp or longer). Experimental results demonstrate that Novo&Stitch can double the contiguity (N50) of the input assemblies without introducing mis-joins or reducing genome completeness. AVAILABILITY AND IMPLEMENTATION: Novo&Stitch can be obtained from https://github.com/ucrbioinfo/Novo_Stitch.
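The contiguity measure cited in this abstract, N50, is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that standard definition follows; the function name `n50` and the toy contig lengths are illustrative and are not part of Novo&Stitch.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running sum first reaches half the total length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assemblies (made-up lengths): merging contigs raises N50.
print(n50([2, 3, 4, 5, 6]))   # five separate contigs
print(n50([5, 6, 9]))         # same bases after joining 2+3+4 into one contig
```

Because N50 depends only on lengths, it rewards mis-joins as much as correct joins, which is why the abstract reports it together with mis-join counts and genome completeness.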
-
10.
DNA methylation detection: recent developments in bisulfite free electrochemical and optical approaches.
Bhattacharjee, R, Moriam, S, Umer, M, Nguyen, NT, Shiddiky, MJA
The Analyst. 2018;(20):4802-4818
Abstract
DNA methylation is one of the significant epigenetic modifications involved in mammalian development as well as in the initiation and progression of various diseases like cancer. Over the past few decades, an enormous amount of research has been carried out for the quantification of DNA methylation in the mammalian genome. Earlier, most of these methodologies used bisulfite treatment. However, low conversion, false readings, longer assay times and complex chemical reactions are common limitations of this method that hinder its application in routine clinical screening. Thus, as an alternative to bisulfite conversion-based DNA methylation detection, numerous bisulfite-free methods have been proposed. In this regard, electrochemical biosensors have gained much attention in recent years for being highly sensitive yet cost-effective, portable, and simple to operate. On the other hand, biosensors with optical readouts enable direct real-time detection of biological molecules and are easily adaptable to multiplexing. Incorporation of electrochemical and optical readouts into bisulfite-free DNA methylation analysis is paving the way for the translation of this important biomarker into standard patient care. In this review, we provide a critical overview of recent advances in the development of electrochemical and optical readout-based bisulfite-free DNA methylation assays.