A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods.

PloS One
Ai-Bing Zhang, Wei-zhong Zhao

Abstract

Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) that take advantage of the power of machine learning and bioinformatics to address this issue for species assignment by both coding and non-coding sequences. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing neighbor-joining (NJ) and maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML me...
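The general idea of alignment-free species assignment evaluated by random sub-sampling can be illustrated with a small sketch. This is not DV-RBF or FJ-RBF, whose internals the abstract does not describe. It assumes, purely for illustration, k-mer frequency features and a Gaussian (RBF) similarity score; the function names, the `gamma` value and the toy sequences are all hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Alignment-free features: relative frequency of each A/C/G/T k-mer."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def rbf_similarity(x, y, gamma=10.0):
    """Gaussian (RBF) similarity between two feature vectors (gamma is illustrative)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)


def assign_species(query, references, k=2, gamma=10.0):
    """Return the species whose reference barcodes are most similar on average.

    references: list of (species_name, sequence) pairs.
    """
    q = kmer_frequencies(query, k)
    scores = {}
    for species, seq in references:
        sim = rbf_similarity(q, kmer_frequencies(seq, k), gamma)
        scores.setdefault(species, []).append(sim)
    return max(scores, key=lambda s: sum(scores[s]) / len(scores[s]))


def subsample_accuracy(labelled, test_fraction=0.25, repeats=20, seed=0):
    """Repeated random sub-sampling: hold out a fraction as queries,
    use the remainder as the reference library, and pool the success rate."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(repeats):
        data = labelled[:]
        rng.shuffle(data)
        n_test = max(1, int(len(data) * test_fraction))
        test, ref = data[:n_test], data[n_test:]
        for species, seq in test:
            total += 1
            correct += assign_species(seq, ref) == species
    return correct / total


# Toy, made-up sequences (not real barcodes): one AT-only and one GC-only "species".
toy = [
    ("species_a", "ATATATATAT"), ("species_a", "ATATTATATA"),
    ("species_a", "TATATATATA"), ("species_a", "ATATATTATA"),
    ("species_b", "GCGCGCGCGC"), ("species_b", "GCGGCGCGCG"),
    ("species_b", "CGCGCGCGCG"), ("species_b", "GCGCGCCGCG"),
]
accuracy = subsample_accuracy(toy)
```

Because the features are computed per sequence, no multiple alignment is needed, which is the property that matters for non-coding regions such as ITS. Real barcode data would call for a trained classifier and a much larger reference library.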


Methods Mentioned

feature extraction

Software Mentioned

MUSCLE
NJ
PHYML
PAUP
beta
BOLD
ClustalW
