CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers

BMC Genomics
Rachid Ounit, Stefano Lonardi

Abstract

The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to that of the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/.
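The abstract does not spell out the algorithm, so the following is only a minimal sketch of what classification with discriminative k-mers can look like, not CLARK's actual implementation. It assumes that a k-mer is "discriminative" when it occurs in exactly one target (for example, one species), and that a read is assigned to the target with the most discriminative k-mer hits. The function names, the toy sequences, and the choice k = 4 are all hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer that occurs in exactly one target to that target.

    targets: dict of target name -> list of reference sequences (hypothetical layout).
    """
    owners = defaultdict(set)
    for name, sequences in targets.items():
        for seq in sequences:
            for kmer in kmers(seq, k):
                owners[kmer].add(name)
    # Assumption: k-mers shared by two or more targets carry no signal and are dropped.
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}


def classify(read, index, k):
    """Return (target, hits) for the best-supported target, or (None, hits) if undecided."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    if not hits:
        return None, 0
    ranked = hits.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None, ranked[0][1]  # tie between the top two targets: leave unassigned
    return ranked[0]


# Toy reference data, invented for illustration only.
targets = {
    "species_A": ["ACGTACGTGGAT"],
    "species_B": ["TTGACCGTGGAA"],
}
index = build_discriminative_index(targets, k=4)

result_a = classify("GTACGTGG", index, k=4)   # contains k-mers unique to species_A
result_none = classify("CGTGGA", index, k=4)  # contains only k-mers shared by both
```

Because the index stores only k-mers unique to one target, classifying a read reduces to one dictionary lookup per k-mer. That property is one plausible reason an approach of this kind can be fast, although the sketch makes no claim about CLARK's real data structures or performance.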
