Knowledge discovery by accuracy maximization

Proceedings of the National Academy of Sciences of the United States of America
Stefano Cacciatore, Leonardo Tenori

Abstract

Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, KODAMA is driven by an integrated procedure of cross-validation of the results: the discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure that maximizes cross-validated predictive accuracy. In this way, the method aims to ensure a highly robust solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan's pres...
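The core idea in the abstract, a Monte Carlo search over class labels that keeps only changes that do not lower cross-validated accuracy, can be illustrated with a toy sketch. This is not the authors' implementation. The choice of a k-nearest-neighbour classifier, leave-one-out cross-validation, random initial labels, the single-label proposal step, and the names `loo_knn_accuracy` and `maximize_cv_accuracy` are all assumptions made for illustration.

```python
import random


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out cross-validated accuracy of a k-nearest-neighbour classifier."""
    n = len(points)
    correct = 0
    for i in range(n):
        # Rank every other point by squared Euclidean distance to point i.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
        )[:k]
        # Majority vote among the k nearest neighbours (ties go to the smallest label).
        votes = {}
        for j in neighbours:
            votes[labels[j]] = votes.get(labels[j], 0) + 1
        predicted = max(sorted(votes), key=lambda c: votes[c])
        correct += predicted == labels[i]
    return correct / n


def maximize_cv_accuracy(points, n_classes=2, n_steps=200, k=3, seed=0):
    """Toy Monte Carlo search (hypothetical helper, not the published algorithm).

    Starts from random labels, proposes one random label change per step,
    and keeps the change only if cross-validated accuracy does not decrease.
    Returns (labels, starting_accuracy, final_accuracy).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(n_classes) for _ in points]
    start = best = loo_knn_accuracy(points, labels, k)
    for _ in range(n_steps):
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(n_classes)
        score = loo_knn_accuracy(points, labels, k)
        if score >= best:
            best = score      # accept the proposal
        else:
            labels[i] = old   # reject and restore the previous label
    return labels, start, best


if __name__ == "__main__":
    # Two well-separated groups of points in the plane (made-up data).
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11)]
    labels, start, best = maximize_cv_accuracy(points)
    print("accuracy before:", start, "after:", best)
    print("labels:", labels)
```

Because a proposal is accepted only when the score does not drop, the final accuracy can never be lower than the starting one. Note that in this toy version a labeling that puts every point in a single class also reaches perfect accuracy, so the sketch shows the accept/reject loop only and should not be read as a usable clustering method.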
