Introns structure patterns of variation in nucleotide composition in Arabidopsis thaliana and rice protein-coding genes

bioRxiv: the Preprint Server for Biology
Adrienne Ressayre, Johann Joets


Plant genomes are large, intron-rich and show a wide range of variation in coding-region G + C content. In coding regions, plants display a kind of syndrome: an increase in G + C content is associated both with greater heterogeneity among genes within a genome and with greater variation across genes. Taking advantage of the large number of genes in plant genomes and the wide range of intron numbers per gene, we performed a comprehensive survey of G + C content variation at scales ranging from the nucleotide to the whole genome in two species, Arabidopsis thaliana and Oryza sativa, comparing the patterns among genes with different intron numbers. In both species, we observed a pervasive effect of intron number and intron location along genes on G + C content and on codon and amino acid frequencies, suggesting that introns act as barriers that structure G + C content along genes. In external gene regions (located upstream of the first intron or downstream of the last intron), species-specific factors shape G + C content, whereas in internal gene regions (flanked by introns on both sides), G + C content is constrained to remain within a range common to both species. In rice, ...
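The distinction between external and internal coding regions can be illustrated with a minimal Python sketch. It assumes each gene is supplied as the list of its coding exon sequences in 5' to 3' order with introns already removed, and it treats an exon as internal only when an intron lies on both sides of it (so intronless genes are entirely external). The function names `gc_content` and `gc_by_region` and the toy sequences are hypothetical; this is an illustration of the definitions above, not the authors' pipeline.

```python
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G or C among A/C/G/T bases in seq; other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_by_region(exons: List[str]) -> Dict[str, float]:
    """Pool coding exons into 'external' and 'internal' regions and return G + C content of each.

    Assumption (hypothetical helper): exons are coding sequences in 5'->3' order.
    The first exon lies upstream of the first intron and the last exon lies downstream
    of the last intron, so both are external; exons between two introns are internal.
    """
    pooled: Dict[str, List[str]] = {"external": [], "internal": []}
    for i, exon in enumerate(exons):
        region = "internal" if 0 < i < len(exons) - 1 else "external"
        pooled[region].append(exon)
    # Only report regions that actually contain sequence (intronless genes have no internal region).
    return {region: gc_content("".join(parts)) for region, parts in pooled.items() if parts}


if __name__ == "__main__":
    toy_gene = ["ATGGCC", "GCGCGC", "ATATAA"]  # three exons, two introns (toy data)
    print(gc_by_region(toy_gene))
```

Because the intron number of a gene is simply `len(exons) - 1`, the same helper could be applied after grouping genes by intron number, which mirrors the comparison described above. In practice the exon sequences would be extracted from a genome annotation; that parsing step is left out of this sketch.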
