Gene cluster statistics with gene families.

Molecular Biology and Evolution
Narayanan Raghupathy, Dannie Durand

Abstract

Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene fami...

Software Mentioned

BLAST
R
Mathematica
