Aug 14, 2014

Assembling Large Genomes with Single-Molecule Sequencing and Locality Sensitive Hashing

bioRxiv
Konstantin Berlin, Adam M Phillippy

Abstract

We report reference-grade de novo assemblies of four model organisms and the human genome from single-molecule, real-time (SMRT) sequencing. Long-read SMRT sequencing is routinely used to finish microbial genomes, but the available assembly methods have not scaled well to larger genomes. Here we introduce the MinHash Alignment Process (MHAP) for efficient overlapping of noisy, long reads using probabilistic, locality-sensitive hashing. Together with Celera Assembler, MHAP was used to reconstruct the genomes of Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, and human from high-coverage SMRT sequencing. The resulting assemblies include fully resolved chromosome arms and close persistent gaps in these important reference genomes, including heterochromatic and telomeric transition sequences. For D. melanogaster, MHAP achieved a 600-fold speedup relative to prior methods and a cloud computing cost of a few hundred dollars. These results demonstrate that single-molecule sequencing alone can produce near-complete eukaryotic genomes at modest cost.
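The abstract names MinHash, a locality-sensitive hashing scheme, as the basis for finding overlaps between reads. The Python sketch below illustrates the general MinHash idea only and is not the MHAP implementation: each read is reduced to a fixed-size sketch of minimum k-mer hash values, and the fraction of matching sketch positions estimates the Jaccard similarity of the two k-mer sets. The function names, the k-mer size, the number of hash functions, and the use of seeded BLAKE2b hashes are all assumptions chosen for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=8, num_hashes=128):
    """Build a MinHash sketch: the minimum hash value under each seeded hash.

    k and num_hashes are illustrative defaults, not values from the paper.
    """
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch slots."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)
```

Comparing two sketches costs time proportional to the sketch size rather than the read length, which is why a scheme of this kind can serve as a cheap filter for candidate overlaps before any more expensive alignment step.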
