Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3

Molecular Biology and Evolution
Mira V Han, Matthew W Hahn


Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.
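The claim that annotation errors inflate inferred gain and loss rates can be illustrated with a small sketch. It assumes a hypothetical symmetric error model in which an annotated gene family size is off by one in either direction with total probability `eps`, independently in each genome. The abstract does not specify the error model used in CAFE 3, so the function names, the `eps` parameter, and the shape of the distribution are all illustrative assumptions. The first function shows how often two genomes with identical true counts would appear to differ; the second shows one way a tip's observed count could be replaced by a distribution over true counts before a likelihood calculation.

```python
def error_distribution(eps):
    """Hypothetical error model (an assumption, not taken from the paper).

    The annotated count equals the true count with probability 1 - eps,
    and is off by -1 or +1 with probability eps / 2 each.
    The boundary at a true count of zero is ignored for simplicity.
    """
    return {-1: eps / 2, 0: 1 - eps, 1: eps / 2}


def prob_spurious_difference(eps):
    """Probability that two genomes with the same true family size
    are annotated with different sizes, assuming independent errors."""
    dist = error_distribution(eps)
    # The annotated counts agree only when both genomes draw the same offset.
    p_same_offset = sum(p * p for p in dist.values())
    return 1 - p_same_offset


def tip_likelihoods(observed, eps, max_size):
    """Return P(observed count | true count = n) for n = 0..max_size.

    Without an error model this vector would be 1 at n == observed and
    0 elsewhere; with one, neighbouring true counts also get weight.
    """
    dist = error_distribution(eps)
    return [dist.get(observed - n, 0.0) for n in range(max_size + 1)]
```

Under these assumptions, `prob_spurious_difference(0.1)` is about 0.185: nearly one in five unchanged families would look as if a gain or loss had occurred, which a method that trusts the counts would attribute to evolution. `tip_likelihoods(3, 0.1, 5)` spreads the weight for an observed count of 3 over true counts 2, 3, and 4.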



Related Concepts

Computer Programs and Programming
Sequence Determinations, DNA
Evolution, Molecular
Computational Molecular Biology
Molecular Sequence Annotation
Gene Duplication
