An analysis of the Sargasso Sea resource and the consequences for database composition.

BMC Bioinformatics
Michael L Tress, Alfonso Valencia

Abstract

The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains a large number of fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present some intrinsic difficulties. It is important to know whether function can be assigned to these sequences with current methods and whether they will increase our capacity to explore sequence space; it is also of interest to know how current bioinformatics techniques will cope with the new sequences in the resource. The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular, the high proportion of sequence fragments in the resource seems to result in poor-quality multiple alignments. These observations suggest that the new sequences should be used...

Methods Mentioned

phylogenetic profiles
whole genome shotgun

Software Mentioned

MUSCLE
LGA
nr
hit
CLUSTALW
cd
BLAST
SEG
