Optimization of de novo transcriptome assembly from next-generation sequencing data

Genome Research
Yann Surget-Groba, Juan I Montoya-Burgos

Abstract

Transcriptome analysis has important applications in many biological fields. However, assembling a transcriptome without a known reference remains a challenging task requiring algorithmic improvements. We present two methods for substantially improving transcriptome de novo assembly. The first method relies on the observation that the use of a single k-mer length by current de novo assemblers is suboptimal to assemble transcriptomes where the sequence coverage of transcripts is highly heterogeneous. We present the Multiple-k method in which various k-mer lengths are used for de novo transcriptome assembly. We demonstrate its good performance by assembling de novo a published next-generation transcriptome sequence data set of Aedes aegypti, using the existing genome to check the accuracy of our method. The second method relies on the use of a reference proteome to improve the de novo assembly. We developed the Scaffolding using Translation Mapping (STM) method that uses mapping against the closest available reference proteome for scaffolding contigs that map onto the same protein. In a controlled experiment using simulated data, we show that the STM method considerably improves the assembly, with few errors. We applied these two...
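The grouping step behind STM, as the abstract describes it (contigs that map onto the same reference protein are scaffolded together), can be illustrated with a minimal sketch. This is not the authors' implementation. The `Hit` record layout, the function name `group_contigs_by_protein`, and the choice to order contigs by their start coordinate on the protein are all assumptions made for illustration, and the example hits are invented.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One contig-to-protein alignment (hypothetical record layout)."""
    contig: str
    protein: str
    prot_start: int  # 1-based start of the aligned region on the protein
    prot_end: int    # 1-based end of the aligned region on the protein


def group_contigs_by_protein(hits):
    """Group contigs that hit the same protein, ordered along that protein.

    Assumption: ordering by protein start coordinate is an illustrative
    choice, not a detail taken from the abstract.
    """
    by_protein = defaultdict(list)
    for hit in hits:
        by_protein[hit.protein].append(hit)

    scaffolds = {}
    for protein, protein_hits in by_protein.items():
        if len(protein_hits) < 2:
            continue  # a lone contig has no partner to be scaffolded with
        ordered = sorted(protein_hits, key=lambda h: (h.prot_start, h.prot_end))
        scaffolds[protein] = [h.contig for h in ordered]
    return scaffolds


# Invented example hits: three contigs on protA, one on protB.
hits = [
    Hit("contig_3", "protA", 210, 340),
    Hit("contig_1", "protA", 5, 120),
    Hit("contig_7", "protB", 1, 90),
    Hit("contig_2", "protA", 130, 200),
]
scaffolds = group_contigs_by_protein(hits)
print(scaffolds)  # {'protA': ['contig_1', 'contig_2', 'contig_3']}
```

In practice the hits would come from a translated search of the contigs against the reference proteome, and a real scaffolder would also need to handle strand, overlapping hits, and contigs that match several proteins; none of that is covered here.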
