Towards realistic benchmarks for multiple alignments of non-coding sequences.

BMC Bioinformatics
Jaebum Kim, Saurabh Sinha

Abstract

With the continued development of new computational tools for multiple sequence alignment, benchmarks are needed to help select the most effective tools. Simulation-based benchmarks have been proposed to meet this need, especially for non-coding sequences. However, it is unclear whether such benchmarks truly represent real sequence data from any given group of species, in terms of the difficulty of alignment tasks. We find that the conventional simulation approach, which relies on empirically estimated values for parameters such as substitution and insertion/deletion rates, is unable to generate synthetic sequences that reflect the broad genomic variation in conservation levels. We tackle this problem with a new method for simulating non-coding sequence evolution that relies on genome-wide distributions of evolutionary parameters rather than their averages. We then generate synthetic data sets to mimic orthologous sequences from the Drosophila group of species, and show that these data sets truly represent the variability observed in genomic data in terms of the difficulty of the alignment task. This allows us to make significant progress towards estimating the alignment accuracy of ...
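The contrast between simulating with an average parameter value and simulating with a distribution of values can be illustrated with a minimal sketch. This is not the method of the paper, whose details are not given in the abstract. It is a toy model under stated assumptions: substitutions only (no insertions or deletions), a single ancestor-descendant pair per locus, and a made-up list of per-locus substitution rates (`HYPOTHETICAL_RATES`) standing in for a genome-wide distribution. All function names are hypothetical.

```python
import random
import statistics

BASES = "ACGT"

def mutate(seq, sub_rate, rng):
    """Substitute each site independently with probability sub_rate (toy model, no indels)."""
    out = []
    for base in seq:
        if rng.random() < sub_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def simulate(n_loci, length, rate_sampler, seed=0):
    """Return the ancestor-descendant identity for each simulated locus."""
    rng = random.Random(seed)
    identities = []
    for _ in range(n_loci):
        ancestor = "".join(rng.choice(BASES) for _ in range(length))
        rate = rate_sampler(rng)  # one rate per locus
        identities.append(identity(ancestor, mutate(ancestor, rate, rng)))
    return identities

# Assumption: illustrative values only, not estimates from any genome.
HYPOTHETICAL_RATES = [0.02, 0.05, 0.10, 0.20, 0.40]
mean_rate = statistics.mean(HYPOTHETICAL_RATES)

# Conventional style: every locus uses the same average rate.
fixed = simulate(200, 500, lambda rng: mean_rate)
# Distribution style: each locus draws its own rate.
varied = simulate(200, 500, lambda rng: rng.choice(HYPOTHETICAL_RATES))

spread_fixed = statistics.pstdev(fixed)
spread_varied = statistics.pstdev(varied)
print(f"identity spread, fixed rate:    {spread_fixed:.3f}")
print(f"identity spread, sampled rates: {spread_varied:.3f}")
```

In this toy setting, loci simulated with a single average rate cluster around one conservation level, while loci simulated with sampled rates span a much wider range of conservation, which is the kind of variability the abstract says average-based simulation fails to reproduce.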



Software Mentioned

HoT SPS
Indelign
HoT CS
HoT
PAML
MAFFT
DIALIGN
Pecan
phastCons
Rose
