Gene prediction in metagenomic fragments: a large scale machine learning approach

BMC Bioinformatics
Katharina J Hoff ... Peter Meinicke

Abstract

Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast, while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of the fragments (Sanger sequencing, for example, yields fragments of about 700 bp on average) and the unknown phylogenetic origin of most fragments require approaches to gene prediction that differ from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extra...
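To make the first-stage idea more concrete, the following is a minimal sketch of how a monocodon usage feature vector could be computed for a candidate open reading frame and scored with a linear discriminant. It is not the authors' implementation: the function names `monocodon_usage` and `linear_score`, the example sequence, and the weight values are all hypothetical, and a real discriminant would have weights learned from annotated training genomes.

```python
from itertools import product

# All 64 codons in a fixed order, so every feature vector has the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def monocodon_usage(orf):
    """Return relative codon frequencies (length 64) for an in-frame sequence."""
    counts = [0] * len(CODONS)
    seq = orf.upper()
    for i in range(0, len(seq) - 2, 3):
        idx = CODON_INDEX.get(seq[i:i + 3])
        if idx is not None:  # skip codons containing ambiguous bases such as N
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(CODONS)
    return [c / total for c in counts]


def linear_score(features, weights, bias=0.0):
    """Linear discriminant score: dot product of weights and features plus bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical usage: placeholder weights that only reward the codon GCT.
example_orf = "ATGGCTGCTTAA"
example_weights = [0.0] * len(CODONS)
example_weights[CODON_INDEX["GCT"]] = 1.0
example_score = linear_score(monocodon_usage(example_orf), example_weights)
print(example_score)
```

A dicodon usage feature could be built the same way over the 4096 ordered codon pairs. According to the abstract, the scores from the individual discriminants are features for a second stage, which is not sketched here.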

Related Concepts

Knowledge Representation (Computer)
Genome Mapping
DNA, Bacterial
Pattern Recognition System
Genome, Bacterial
Sequence Determinations, DNA
Classification
Environment
Genes
Genome
