ARCS: scaffolding genome drafts with linked reads

Sarah Yeo, Inanc Birol


Sequencing of human genomes is now routine, and assembly of shotgun reads is increasingly feasible. However, assemblies often fail to inform about chromosome-scale structure due to a lack of linkage information over long stretches of DNA, a shortcoming that is being addressed by new sequencing protocols, such as the GemCode and Chromium linked reads from 10x Genomics. Here, we present ARCS, an application that utilizes the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. We show how the contiguity of an ABySS H. sapiens genome assembly can be increased over six-fold, using moderate coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts. Supplementary data are available at Bioinformatics online.
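The general idea of using shared barcodes as linkage evidence between draft sequences can be illustrated with a toy sketch. This is not the ARCS implementation: the function name `barcode_links`, the input format (one `(barcode, contig)` tuple per aligned read), and both thresholds are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def barcode_links(alignments, min_reads=2, min_shared=2):
    """Count barcodes shared between pairs of contigs (illustrative only).

    alignments: iterable of (barcode, contig) tuples, one per aligned read.
    min_reads:  hypothetical threshold; reads a barcode needs on a contig
                before that barcode counts as present on the contig.
    min_shared: hypothetical threshold; shared barcodes needed before a
                contig pair is reported as a candidate link.
    """
    # Number of reads observed for each (contig, barcode) combination.
    read_counts = defaultdict(int)
    for barcode, contig in alignments:
        read_counts[(contig, barcode)] += 1

    # For each barcode, the contigs it hits with enough supporting reads.
    contigs_by_barcode = defaultdict(set)
    for (contig, barcode), n in read_counts.items():
        if n >= min_reads:
            contigs_by_barcode[barcode].add(contig)

    # Each barcode seen on two contigs adds one unit of evidence for the pair.
    shared = defaultdict(int)
    for contigs in contigs_by_barcode.values():
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    toy = [
        ("BC1", "c1"), ("BC1", "c1"), ("BC1", "c2"), ("BC1", "c2"),
        ("BC2", "c1"), ("BC2", "c1"), ("BC2", "c2"), ("BC2", "c2"),
        ("BC3", "c1"), ("BC3", "c3"),  # too few reads per contig to count
    ]
    print(barcode_links(toy))
```

In this toy input, contigs `c1` and `c2` are supported by two barcodes, while the single-read hits of `BC3` are filtered out. A real scaffolder would additionally need to infer contig order and orientation, which this sketch does not attempt.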


