CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes

Genis Parra, Ian Korf


The numbers of finished and ongoing genome projects are increasing at a rapid rate, and providing the catalog of genes for these new genomes is a key challenge. Obtaining a set of well-characterized genes is a basic requirement in the initial steps of any genome annotation process. An accurate set of genes is needed in order to learn about species-specific properties, to train gene-finding programs, and to validate automatic predictions. Unfortunately, many new genome projects lack comprehensive experimental data to derive a reliable initial set of genes. In this study, we report a computational method, CEGMA (Core Eukaryotic Genes Mapping Approach), for building a highly reliable set of gene annotations in the absence of experimental data. We define a set of conserved protein families that occur in a wide range of eukaryotes, and present a mapping procedure that accurately identifies their exon-intron structures in a novel genomic sequence. CEGMA includes the use of profile-hidden Markov models to ensure the reliability of the gene structures. Our procedure allows one to build an initial set of reliable gene annotations in potentially any eukaryotic genome, even those in draft stages. Software and data sets are available onlin...




