Abstract
Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, these data give rise to inferred protein sequences without ready access to the underlying genomic data. Analysis of this information (e.g. for EST clustering or phylogenetic reconstruction from proteome data) is hampered because it is not known whether two protein sequences are isoforms (splice variants) or distinct genes (i.e. paralogs or orthologs). However, even without knowledge of the intron/exon structure, visual inspection of the pattern of similarity across the alignment of the two protein sequences is usually helpful: paralogs and orthologs show substitutions with respect to each other, whereas isoforms do not. The IsoSVM tool introduces an automated approach to identifying isoforms at the protein level using a support vector machine (SVM) classifier. Based on three specific features used as input to the SVM classifier, it is possible to identify isoforms automatically, with little effort and with an accuracy of more than 97%. We show that the SVM is superior to a radial basis function network and to a linear classifier. As an example application we use IsoSVM to estimate that a set of Xenopu...
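The contrast drawn in the abstract (substitutions between paralogs/orthologs, none between isoforms) can be illustrated with a minimal sketch. The abstract does not name the three features IsoSVM uses, so the quantities below (substitution count, gap blocks, identity over aligned columns) and the function names `alignment_features` and `looks_like_isoform` are assumptions chosen for illustration only; the simple rule at the end stands in for the trained SVM and is not the authors' method. The input is assumed to be a pairwise alignment already padded with `-` gap characters.

```python
def alignment_features(aligned_a, aligned_b, gap="-"):
    """Summarize a pairwise protein alignment (illustrative features only)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = substitutions = gap_columns = gap_blocks = 0
    in_gap = False
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column with gaps on both sides carries no information
        if x == gap or y == gap:
            gap_columns += 1
            if not in_gap:
                gap_blocks += 1  # a new contiguous run of gap columns starts
            in_gap = True
        else:
            in_gap = False
            if x == y:
                matches += 1
            else:
                substitutions += 1

    aligned = matches + substitutions
    identity = matches / aligned if aligned else 0.0
    return {
        "matches": matches,
        "substitutions": substitutions,
        "gap_columns": gap_columns,
        "gap_blocks": gap_blocks,
        "identity": identity,
    }


def looks_like_isoform(features):
    """Crude hypothetical rule: no substitutions, differences only as gap blocks."""
    return features["substitutions"] == 0 and features["gap_blocks"] >= 1


# Toy alignments: the first pair differs only by a missing segment,
# the second pair differs only by point substitutions.
isoform_like = alignment_features("ACDEFGHIK", "ACD---HIK")
paralog_like = alignment_features("ACDEFGHIK", "ACNEFGHLK")
print(isoform_like, looks_like_isoform(isoform_like))
print(paralog_like, looks_like_isoform(paralog_like))
```

In practice a trained classifier would replace the hard rule, since sequencing errors and allelic variation can introduce a few substitutions even between true isoforms.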