DOI: 10.1101/006841Jul 3, 2014Paper

Single haplotype assembly of the human genome from a hydatidiform mole

BioRxiv : the Preprint Server for Biology
Karyn Meltz SteinbergRichard K Wilson


An accurate and complete reference human genome sequence assembly is essential for accurately interpreting individual genomes and associating sequence variation with disease phenotypes. While the current reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can help overcome these problems, even the longest available reads do not resolve all regions of the human genome. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones, an optical map, and 100X whole genome shotgun (WGS) sequence coverage using short (lllumina) read pairs. We used the WGS sequence and the GRCh37 reference assembly to create a sequence assembly of the CHM1 genome. We subsequently incorporated 382 finished CHORI-17 BAC clone sequences to generate a second draft assembly, CHM1\_1.1 (NCBI AssemblyDB GCA\_000306695.2). Analysis of gene and repeat conten...Continue Reading

Related Concepts

Related Feeds

BioRxiv & MedRxiv Preprints

BioRxiv and MedRxiv are the preprint servers for biology and health sciences respectively, operated by Cold Spring Harbor Laboratory. Here are the latest preprint articles (which are not peer-reviewed) from BioRxiv and MedRxiv.