A comprehensive vertebrate phylogeny using vector representations of protein sequences from whole genomes

Molecular Biology and Evolution
Gary W Stuart … Jeffery J Leader

Abstract

We recently developed a method for producing comprehensive gene and species phylogenies from unaligned whole genome data using singular value decomposition (SVD) to analyze character string frequencies. This work provides an integrated gene and species phylogeny for 64 vertebrate mitochondrial genomes comprising a total of 832 proteins. In addition, to provide a theoretical basis for the method, we present a graphical interpretation of both the original frequency matrix and the SVD-derived matrix. These large matrices describe high-dimensional Euclidean spaces within which biomolecular sequences can be uniquely represented as vectors. In particular, the SVD-derived vector space describes each protein relative to a restricted set of newly defined, independent axes, each of which represents a novel form of conserved motif, termed a correlated peptide motif. A quantitative comparison of the relative orientations of protein vectors in this space provides accurate and straightforward estimates of sequence similarity, which can in turn be used to produce comprehensive gene trees. Alternatively, the vectors for all genes from a given species can be summed into a single species vector, allowing species trees to be produced.
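The pipeline summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The peptide length `k=2`, the use of raw (unweighted) counts, the truncation `rank`, and the use of 1 - cos(angle) as a distance are all illustrative choices. The function names are hypothetical. In the sketch, rows of the frequency matrix are every possible k-length peptide and columns are proteins. The truncated SVD supplies the independent axes, each protein becomes a vector on those axes, and the angles between vectors give pairwise distances. Each species vector is the sum of its proteins' vectors.

```python
from itertools import product

import numpy as np

# Assumption: only the 20 standard amino acids; words with other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_frequency_matrix(proteins, k=2):
    """Hypothetical helper: count overlapping k-length peptides.

    Rows are all possible k-length peptides, columns are proteins.
    """
    words = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    A = np.zeros((len(words), len(proteins)))
    for j, seq in enumerate(proteins):
        for start in range(len(seq) - k + 1):
            row = index.get(seq[start:start + k])
            if row is not None:
                A[row, j] += 1
    return A, words


def reduced_protein_vectors(A, rank):
    """Hypothetical helper: coordinates of each protein on the top `rank` SVD axes."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    rank = min(rank, len(s))
    # Row j equals U[:, :rank].T @ A[:, j], i.e. protein j projected onto the axes.
    return (s[:rank, None] * Vt[:rank, :]).T  # shape (n_proteins, rank)


def cosine_distance_matrix(vectors):
    """Illustrative distance: 1 - cos(angle) between every pair of row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero vectors
    unit = vectors / norms
    cos = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - cos


def species_vectors(protein_vectors, species_labels):
    """Hypothetical helper: sum the vectors of all proteins from each species."""
    names = sorted(set(species_labels))
    labels = np.array(species_labels)
    summed = np.array([protein_vectors[labels == name].sum(axis=0) for name in names])
    return names, summed


# Toy usage with made-up sequences (not data from the study).
proteins = ["MKTAYIAKQR", "MKTAYIAKQL", "GGSWLLPPAA", "GGSWLLPPAV"]
labels = ["sp1", "sp2", "sp1", "sp2"]
A, words = peptide_frequency_matrix(proteins, k=2)
V = reduced_protein_vectors(A, rank=2)
gene_D = cosine_distance_matrix(V)      # pairwise protein distances (gene tree input)
names, S = species_vectors(V, labels)
species_D = cosine_distance_matrix(S)   # pairwise species distances (species tree input)
```

Projection onto the SVD axes is linear. Summing the reduced protein vectors therefore gives the same species vector as projecting the summed frequency columns. The resulting distance matrices could then be passed to a standard distance-based tree method such as neighbor joining. That step is omitted here.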
