Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

BMC Medical Informatics and Decision Making
Mir S Siadaty, William A Knaus

Abstract

Data mining can automate the analysis of the large amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting patterns. The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying it as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a lar...
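
The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: the surprise score is taken to be the absolute difference between rank-normalized pattern strengths in the database and in the knowledgebase, the p value comes from a simple permutation test, and the pattern names and numbers are hypothetical. The authors' implementation used Perl and R; Python is used here only for readability.

```python
import random


def rank_normalize(strengths):
    """Map each pattern's strength to its rank, scaled into [0, 1]."""
    ordered = sorted(strengths, key=strengths.get)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 0.5}
    return {pattern: i / (n - 1) for i, pattern in enumerate(ordered)}


def surprise_scores(data_strength, kb_strength):
    """Assumed score: |rank in data - rank in knowledgebase| per shared pattern."""
    shared = data_strength.keys() & kb_strength.keys()
    d = rank_normalize({p: data_strength[p] for p in shared})
    k = rank_normalize({p: kb_strength[p] for p in shared})
    return {p: abs(d[p] - k[p]) for p in shared}


def permutation_p_values(data_strength, kb_strength, n_perm=2000, seed=0):
    """Assumed null model: knowledgebase strengths are shuffled across patterns."""
    rng = random.Random(seed)
    observed = surprise_scores(data_strength, kb_strength)
    patterns = sorted(observed)
    data_shared = {p: data_strength[p] for p in patterns}
    kb_values = [kb_strength[p] for p in patterns]
    exceed = dict.fromkeys(patterns, 0)
    for _ in range(n_perm):
        rng.shuffle(kb_values)
        null = surprise_scores(data_shared, dict(zip(patterns, kb_values)))
        for p in patterns:
            if null[p] >= observed[p]:
                exceed[p] += 1
    # Add-one correction keeps every p value strictly above zero.
    return {p: (exceed[p] + 1) / (n_perm + 1) for p in patterns}


# Hypothetical inputs: association strength in the database, and
# co-mention counts for the same patterns in a knowledgebase.
data_strength = {"A-B": 0.9, "A-C": 0.2, "B-C": 0.5, "C-D": 0.7}
kb_strength = {"A-B": 5, "A-C": 40, "B-C": 120, "C-D": 300}

scores = surprise_scores(data_strength, kb_strength)
p_values = permutation_p_values(data_strength, kb_strength)
most_surprising = max(scores, key=scores.get)
print(most_surprising, scores[most_surprising], p_values[most_surprising])
```

In this toy input, the pattern "A-B" is the strongest in the database but the weakest in the knowledgebase, so it receives the highest score under the assumed definition. With only four patterns the permutation p value cannot be small; a realistic pattern list would be far longer.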



Software Mentioned

UMLS
Perl
