Mar 3, 2009

ABySS: a parallel assembler for short read sequence data

Genome Research
Jared T Simpson, Inanç Birol


Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
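
The N50 figure quoted above can be illustrated with a short sketch. It assumes the commonly used definition: the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The abstract does not state the exact convention the authors applied, so the function name `n50` and the `min_length` parameter (set to 100 to mirror the ≥100 bp cutoff) are illustrative assumptions, not part of ABySS.

```python
def n50(lengths, min_length=100):
    """Return the N50 of a collection of contig lengths.

    Assumption: N50 is the length L such that contigs of length >= L
    hold at least half of the total bases. Contigs shorter than
    min_length are ignored (hypothetical parameter for illustration).
    """
    # Keep only contigs that pass the length cutoff, longest first.
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_length")

    half_total = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        # The first contig that pushes the running sum to half the total
        # defines the N50.
        if running >= half_total:
            return n


if __name__ == "__main__":
    # Toy lengths, not data from the paper.
    print(n50([50, 100, 200, 300, 400]))
```

In the toy call, the 50 bp contig is discarded by the cutoff, the remaining contigs total 1000 bp, and the two longest (400 and 300) are the first to reach half of that, so the result is 300.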

  • References: 32
  • Citations: 1361


Mentioned in this Paper

Escherichia coli K12
Computer Programs and Programming
Sequence Determinations, DNA
Base Pairing
Genetic Polymorphism
Contig Mapping
Large-Scale Sequencing
