1.
Carotenoid Cleavage Dioxygenases: Identification, Expression, and Evolutionary Analysis of This Gene Family in Tobacco.
Zhou, Q, Li, Q, Li, P, Zhang, S, Liu, C, Jin, J, Cao, P, Yang, Y
International journal of molecular sciences. 2019;(22)
Abstract
Carotenoid cleavage dioxygenases (CCDs) selectively cleave carotenoids into smaller apocarotenoids, which are essential for the synthesis of apocarotenoid flavor and aroma volatiles and of the phytohormones abscisic acid (ABA) and strigolactones (SLs), as well as for responses to abiotic stresses. Here, 19, 11, and 10 CCD genes were identified in Nicotiana tabacum, Nicotiana tomentosiformis, and Nicotiana sylvestris, respectively. For this family, we systematically analyzed phylogeny, gene structure, conserved motifs, gene duplications, cis-elements, subcellular and chromosomal localization, miRNA-target sites, expression patterns under different treatments, and molecular evolution. CCD genes were classified into two subfamilies and nine groups. Gene structures, motifs, and tertiary structures were similar within each group. Subcellular localization analysis predicted that CCD family proteins are localized to the cytoplasm and plastids, which was confirmed experimentally. Evolutionary analysis showed that purifying selection dominated the evolution of these genes. Meanwhile, seven positively selected sites were identified on the ancestral branch of the tobacco CCD subfamily. Cis-regulatory elements in the CCD promoters were mainly associated with light responsiveness, hormone response, and physiological stress. Individual CCD genes were predominantly expressed in different tissues (roots, flowers, seeds, or leaves) and exhibited divergent expression patterns in response to different hormones (ABA, MeJA, IAA, SA) and abiotic stresses (drought, cold, heat). This study provides a comprehensive overview of the NtCCD gene family and a foundation for future functional characterization of individual genes.
2.
An improved string composition method for sequence comparison.
Lu, G, Zhang, S, Fang, X
BMC bioinformatics. 2008;(Suppl 6):S15
Abstract
BACKGROUND: Historically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison, one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although it is the approach most widely used by biologists, has both fundamental and computational limitations. Consequently, alignment-free methods have been explored as important alternatives for estimating sequence similarity. Among the alignment-free methods, the string composition vector (CV) methods, which use the frequencies of nucleotide or amino acid strings to represent sequence information, show promising results in genome sequence comparison of prokaryotes. The existing CV-based methods, however, suffer from certain statistical problems and therefore underestimate the amount of evolutionary information in genetic sequences.
RESULTS: We show that the existing string composition based methods have two problems, one related to the Markov model assumption and the other associated with the denominator of the frequency normalization equation. We propose an improved complete composition vector (CCV) method under the assumption of a uniform and independent model to estimate sequence information contributing to selection for sequence comparison. Phylogenetic analyses using both simulated and experimental data sets demonstrate that our new method is more robust than existing counterparts and comparable in robustness to alignment-based methods.
CONCLUSION: We identified two problems in the currently used string composition methods and proposed a new robust method for estimating the evolutionary information of genetic sequences. In addition, we argued that it might not be necessary to use relatively long strings to build a CCV, owing to the overlapping nature of variable-length vector strings. We suggested a practical approach for choosing an optimal string length to construct the CCV.
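As a rough illustration of the general idea behind composition vectors (string frequencies used to represent a sequence), the following minimal Python sketch counts overlapping k-strings and compares two sequences by cosine distance. It is not the improved CCV method of the paper: the function names `composition_vector` and `cosine_distance`, the choice of cosine distance, and the plain relative-frequency normalization are all assumptions made for illustration, and no background-model correction is applied.

```python
from collections import Counter
from itertools import product
import math


def composition_vector(seq, k, alphabet="ACGT"):
    """Relative frequencies of all overlapping k-strings over `alphabet`.

    Illustrative only: plain frequencies, no Markov or uniform-model
    correction as discussed in the abstract.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed ordering of all |alphabet|^k possible strings, so vectors
    # from different sequences are directly comparable.
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing characters outside the alphabet are ignored.
    total = sum(counts[key] for key in keys)
    if total == 0:
        raise ValueError("sequence contains no valid k-strings")
    return [counts[key] / total for key in keys]


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return 1.0 - dot / norm


if __name__ == "__main__":
    a = composition_vector("ACGTACGTACGGTTAC", k=2)
    b = composition_vector("ACGTTCGTACGATTAC", k=2)
    print(cosine_distance(a, b))
```

A pairwise distance matrix built this way could be passed to a standard tree-building method such as neighbor joining; the string length `k` is the main parameter to tune, which is the choice the abstract's final sentence refers to.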