Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures

Frontiers in Microbiology
Ying Wang ... Fengzhu Sun

Abstract

Comparing metagenomic samples is crucial for understanding microbial communities. When microbial communities fall into distinct groups, such as human gut metagenomic samples from patients with a certain disease and from healthy controls, identifying group-specific sequences provides essential information for potential biomarker discovery. In our study, a sequence is considered "group-specific" if it is present or abundant in one group but absent or scarce in the other. Our main purpose is to discover sequence regions that are specific to either the case or the control group and can serve as disease-associated markers. We developed a computational pipeline based on long k-mers (k ≥ 30 bp) that detects group-specific sequences at strain resolution without relying on reference sequences, sequence alignment, or metagenome-wide de novo assembly. We call our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. It is implemented as an open-source, parallel pipeline on Apache Spark. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of the identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the dis...
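The notion of a "group-specific" k-mer described in the abstract can be illustrated with a minimal Python sketch. This is not the MetaGO implementation: the function names, the presence/absence criterion, and the thresholds `min_present` and `max_absent` are assumptions made purely for illustration, and the sketch ignores reverse complements, k-mer abundance, and statistical testing, all of which a real pipeline would need to handle.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq made up only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def presence_fraction(samples, k):
    """Map each k-mer to the fraction of samples containing it at least once.

    `samples` is a list of samples; each sample is a list of read strings.
    """
    counts = defaultdict(int)
    for reads in samples:
        seen = set()
        for read in reads:
            seen.update(kmers(read, k))
        for kmer in seen:
            counts[kmer] += 1
    n = len(samples)
    return {kmer: c / n for kmer, c in counts.items()}


def group_specific_kmers(cases, controls, k, min_present=0.8, max_absent=0.2):
    """Return k-mers common among cases and rare among controls.

    The thresholds are hypothetical illustration values, not taken
    from the paper.
    """
    case_frac = presence_fraction(cases, k)
    control_frac = presence_fraction(controls, k)
    return {
        kmer
        for kmer, frac in case_frac.items()
        if frac >= min_present and control_frac.get(kmer, 0.0) <= max_absent
    }


# Toy data with k = 4 (the paper uses k >= 30; short k keeps the example readable).
cases = [["ACGTAC"], ["ACGTTT"]]
controls = [["TTTTGG"], ["GGGTTT"]]
specific = group_specific_kmers(cases, controls, k=4,
                                min_present=1.0, max_absent=0.0)
print(specific)  # {'ACGT'}
```

In this toy example only `ACGT` occurs in every case sample and in no control sample, so it is the single k-mer reported. Swapping the `cases` and `controls` arguments would give control-specific k-mers by the same rule.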



Datasets Mentioned

ERP002469

Software Mentioned

Nucleotide Blast
CAP3
DSK
MaxBin2
CONCOCT
JELLYFISH
COCACOLA
MegaHIT
Apache Spark
MetaSim
