FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares

Genivaldo Gueiros Z Silva, Robert A Edwards


One of the major goals in metagenomics is to identify the organisms present in a microbial community from unannotated shotgun sequencing reads. Taxonomic profiling has valuable applications in biological and medical research, including disease diagnostics. Most currently available approaches do not scale well with increasing data volumes, which matters because both the number and the length of the reads produced by sequencing platforms keep increasing. Here we introduce FOCUS, an agile composition-based approach that uses non-negative least squares (NNLS) to report the organisms present in metagenomic samples and profile their abundances. FOCUS was tested with simulated and real metagenomes, and the results show that our approach accurately predicts the organisms present in microbial communities. FOCUS was implemented in Python. The source code and web server are freely available at
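The core idea of composition-based profiling with NNLS can be illustrated with a minimal sketch. It is not the FOCUS implementation: the function names `kmer_profile` and `nnls`, the k-mer size of 2, the toy reference sequences, and the projected-gradient solver (used here in place of a library NNLS routine so the example needs only the standard library) are all assumptions made for illustration. The sample's k-mer profile is modelled as a non-negative weighted sum of reference profiles, and the recovered weights serve as abundance estimates.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Relative frequency of each DNA k-mer in seq, in fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def nnls(columns, b, iters=2000):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    columns holds the reference profiles (the columns of A); b is the
    sample profile. Solved by projected gradient descent (an assumption
    of this sketch, not necessarily the solver FOCUS uses).
    """
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so its reciprocal is a safe step size.
    step = 1.0 / sum(v * v for col in columns for v in col)
    x = [0.0] * len(columns)
    for _ in range(iters):
        # Residual r = A x - b.
        r = [sum(col[i] * xj for col, xj in zip(columns, x)) - bi
             for i, bi in enumerate(b)]
        # Gradient step on each coefficient, then clip negatives to zero.
        x = [max(0.0, xj - step * sum(ci * ri for ci, ri in zip(col, r)))
             for col, xj in zip(columns, x)]
    return x


# Hypothetical toy "genomes" with deliberately distinct composition.
references = {
    "AT_rich": "AATTATATTAAATATT" * 4,
    "GC_rich": "GGCCGCGCCGGGCGCC" * 4,
    "mixed": "ACACAGAGTCTCTGTG" * 4,
}
profiles = [kmer_profile(s) for s in references.values()]

# Simulate a sample profile as a known mixture of the reference profiles.
true_abundance = [0.5, 0.3, 0.2]
sample = [sum(w * p[i] for w, p in zip(true_abundance, profiles))
          for i in range(len(profiles[0]))]

estimate = nnls(profiles, sample)
```

In this toy setting `estimate` should closely match `true_abundance`, because the sample was built as an exact mixture. Real metagenomes add sequencing noise and organisms missing from the reference set, so the fit would only be approximate.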


