AGOUTI: improving genome assembly and annotation using transcriptome data

GigaScience
Simo V Zhang, Matthew W Hahn

Abstract

Genomes sequenced using short-read, next-generation sequencing technologies can have many errors and may be fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number and identity of genes within species, as well as gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA sequencing data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on both simulated and real datasets, we show that it is highly accurate and that it achieves greater accuracy and contiguity when compared with other existing methods. AGOUTI is a powerful and effective scaffolder and, unlike most scaffolders, is expected to be more effective in larger genomes because of the commensurate increase in intron length. AGOUTI is able to scaffold thousands of contigs ...
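The general idea of using RNA sequencing read pairs to link contigs can be illustrated with a minimal sketch. This is not AGOUTI's implementation: the function name `propose_joins`, the `min_support` threshold, and the input format (one pair of contig names per read pair, giving the contig each mate mapped to) are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def propose_joins(
    pair_contigs: Iterable[tuple[str, str]],
    min_support: int = 2,
) -> list[tuple[str, str, int]]:
    """Return candidate contig joins supported by at least `min_support` read pairs.

    Hypothetical input: for each read pair, the names of the contigs that
    the two mates mapped to. Pairs with both mates on one contig carry no
    scaffolding information and are skipped.
    """
    support: Counter = Counter()
    for contig_a, contig_b in pair_contigs:
        if contig_a == contig_b:
            continue
        # Treat (a, b) and (b, a) as the same undirected link.
        key = tuple(sorted((contig_a, contig_b)))
        support[key] += 1

    # Keep well-supported links, strongest first, ties broken by name.
    return sorted(
        ((a, b, n) for (a, b), n in support.items() if n >= min_support),
        key=lambda edge: (-edge[2], edge[0], edge[1]),
    )
```

In this sketch a link is kept only when several independent read pairs agree, which is one simple way to guard against spurious joins from mismapped reads. A real scaffolder would also need to resolve contig order and orientation, which this example does not attempt.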

Related Concepts

Computer Programs and Programming
Genome
Sequence Determinations, RNA
Computational Molecular Biology
MRNA Differential Display
Molecular Sequence Annotation
High-Throughput Nucleotide Sequencing
Genes
RNA
