Graph Splitting: A Graph-Based Approach for Superfamily-Scale Phylogenetic Tree Reconstruction

Systematic Biology
Motomu Matsui, Wataru Iwasaki

Abstract

A protein superfamily contains distantly related proteins that have acquired diverse biological functions through a long evolutionary history. Phylogenetic analysis of the early evolution of protein superfamilies is a key challenge because existing phylogenetic methods perform poorly when protein sequences are too diverged to construct an informative multiple sequence alignment (MSA). Here, we propose the Graph Splitting (GS) method, which rapidly reconstructs a protein superfamily-scale phylogenetic tree using a graph-based approach. Evolutionary simulation showed that the GS method accurately reconstructs phylogenetic trees and is robust to major problems in phylogenetic estimation, such as biased taxon sampling, heterogeneous evolutionary rates, and long-branch attraction, when sequences are substantially diverged. Its application to an empirical data set of the triosephosphate isomerase (TIM)-barrel superfamily suggests rapid evolution of protein-mediated pyrimidine biosynthesis, likely taking place after the RNA world. Furthermore, the GS method can substantially improve the performance of widely used MSA methods by providing accurate guide trees.
