Exploration of multivariate analysis in microbial coding sequence modeling.

BMC Bioinformatics
Tahir Mehmood, Lars Snipen

Abstract

Gene finding is a complicated procedure that encompasses algorithms for coding sequence modeling, identification of promoter regions, handling of overlapping genes, and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identifying and predicting the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov Model (IMM). The methods were compared on DNA, codon and protein sequences, using highly conserved genes taken from several species with different genomic properties. The multivariate CPPLS approach classified coding sequences substantially better than the commonly used IMM on the same set of sequences. We also found that CPPLS with codon representation gave significantly better classification results than IMM with either protein (p < 0.001) or DNA (p < 0.001) representation. Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). The performance of coding sequence modeling can be substant...
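The codon representation mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how sequences were encoded, so the following is only an assumption: it turns a DNA sequence into a vector of 64 relative codon frequencies, the kind of numeric feature vector a multivariate classifier could take as input. The function name `codon_frequencies` and the `frame` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(dna, frame=0):
    """Return relative codon frequencies (a list of 64 floats) for one reading frame.

    Illustrative encoding only; not the representation used in the paper.
    """
    seq = dna.upper()
    # Split into non-overlapping triplets starting at the chosen frame offset.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Ignore triplets that contain ambiguous bases such as N.
    counts = Counter(t for t in triplets if t in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


if __name__ == "__main__":
    features = codon_frequencies("ATGAAATAA")
    # Print only the codons that actually occur in the sequence.
    for codon, freq in zip(CODONS, features):
        if freq > 0:
            print(codon, round(freq, 3))
```

Computing the vector for each candidate reading frame and stacking the results gives a feature matrix with one row per sequence, which is the general shape of input that partial least squares style methods work on.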

Software Mentioned

Glimmer3
GeneMark
IMM
Glimmer
CPPLS
megaBLAST
