Data compression and genomes: a two-dimensional life domain map

Journal of Theoretical Biology
Giulia MenconiMarcello Buiatti

Abstract

We define the complexity of DNA sequences as the information content per nucleotide, calculated by means of some Lempel-Ziv data compression algorithm. It is possible to use the statistics of the complexity values of the functional regions of different complete genomes to distinguish among genomes of different domains of life (Archaea, Bacteria and Eukarya). We shall focus on the distribution function of the complexity of non-coding regions. We show that the three domains may be plotted in separate regions within the two-dimensional space where the axes are the skewness coefficient and the curtosis coefficient of the aforementioned distribution. Preliminary results on 15 genomes are introduced.

References

May 1, 1995·Physical Review. E, Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics·S V BuldyrevH E Stanley
Apr 6, 2001·Physical Review Letters·B AuditA Arneodo
Jan 22, 2002·Physical Review Letters·Dario BenedettoVittorio Loreto
Nov 5, 2003·Bioinformatics·Hasan H Otu, Khalid Sayood
Jun 8, 2004·Bioinformatics·Olivier Delgrange, Eric Rivals
Jun 25, 2004·Nucleic Acids Research·Y L Orlov, V N Potapov
May 21, 2005·Physical Review Letters·Hong-Da ChenHoong-Chien Lee
Jun 15, 2005·Journal of Computational Biology : a Journal of Computational Molecular Cell Biology·Manuel DehnertMarc-Th Hütt
Dec 1, 2007·International Journal of Bioinformatics Research and Applications·Wojciech SzpankowskiLukasz Szpankowski

❮ Previous
Next ❯

Citations

Feb 17, 2010·Database : the Journal of Biological Databases and Curation·Gregory Vey
Feb 6, 2016·Journal of Bioinformatics and Computational Biology·Muhammad SardarazAtaul Aziz Ikram
May 11, 2011·Molecular Phylogenetics and Evolution·Elisa CalistriMarcello Buiatti
Feb 14, 2014·IEEE/ACM Transactions on Computational Biology and Bioinformatics·Sebastian Wandelt, Ulf Leser
Aug 26, 2014·Journal of Theoretical Biology·Elisa CalistriRoberto Livi

❮ Previous
Next ❯

Related Concepts

Related Feeds

Archaeogenetics

Recent advances in genomic sequencing has led to the discovery of new strains of Archaea and shed light on their evolutionary history. Discover the latest research on Archaeogenetics here.