An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing

GigaScience
Aleksey Zimin ... Steven L Salzberg

Abstract

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). That assembly was highly fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve on this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time (SMRT) sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361 bp, more than three times that of the original assembly, and an N50 scaffold size of 107 821 bp, 61% larger than that of the previous assembly.
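The N50 statistic quoted in the abstract can be illustrated with a minimal Python sketch. It uses the common definition of N50: the length L such that contigs of length L or longer contain at least half of all assembled bases. The function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration; this is not the authors' code and not part of MaSuRCA.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths in bp.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example with made-up contig lengths (not data from the paper).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))

# Ratio of the contig N50 values reported in the abstract.
print(25361 / 8206)
```

Because N50 weights each contig by its length, a handful of long contigs can raise it considerably even when millions of short contigs remain, which is why it is reported in place of a plain mean contig length.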

Related Concepts

Sequence Determinations, DNA
Genome, Plant
Contig Mapping
Genomics
Pinus taeda
High-Throughput Nucleotide Sequencing
Genome
Size
Molecular Assembly/Self Assembly
