Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity

BioRxiv: the Preprint Server for Biology
C Titus Brown, Blair Sullivan

Abstract

Genomes computationally inferred from large metagenomic data sets are often incomplete and may be missing functionally important content and strain variation. We introduce an information retrieval system for large metagenomic data sets that exploits the sparsity of DNA assembly graphs to efficiently extract subgraphs surrounding an inferred genome. We apply this system to recover missing content from genome bins and show that substantial genomic sequence variation is present in a real metagenome. Our software implementation is available at https://github.com/spacegraphcats/spacegraphcats under the 3-Clause BSD License.
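To make the idea of "extracting the subgraph surrounding an inferred genome" concrete, here is a toy sketch in Python. It is not the spacegraphcats implementation or its algorithm: the function names (`build_graph`, `neighborhood`), the `radius` parameter, and the plain breadth-first search are all assumptions made for illustration. It also ignores reverse complements and would not scale to a real metagenome.

```python
from collections import defaultdict, deque


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(reads, k):
    """Build an undirected k-mer graph (hypothetical helper).

    Nodes are k-mers; an edge joins two k-mers that are adjacent in a read.
    """
    graph = defaultdict(set)
    for read in reads:
        ks = kmers(read, k)
        for node in ks:
            graph[node]  # register the node even if it has no neighbors
        for a, b in zip(ks, ks[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighborhood(graph, query, k, radius):
    """Return every k-mer within `radius` edges of the query's k-mers.

    Multi-source breadth-first search: all query k-mers found in the
    graph start at distance 0.
    """
    seeds = {km for km in kmers(query, k) if km in graph}
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand past the requested radius
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


# Two toy "reads" that share the k-mer CGT, and a query standing in for a genome bin.
reads = ["ACGTAC", "CGTTTG"]
graph = build_graph(reads, k=3)
print(sorted(neighborhood(graph, "ACGT", k=3, radius=1)))
# ['ACG', 'CGT', 'GTA', 'GTT']
```

In this toy example the query contains only ACG and CGT, but its radius-1 neighborhood also picks up GTA and GTT, which come from two different reads. That is the sense in which a neighborhood query can surface sequence content and variation that the query itself is missing.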
