GHOST: Adjusting the Decision Threshold to Handle Imbalanced Data in Machine Learning.

Journal of Chemical Information and Modeling
Carmen Esposito, Sereina Riniker

Abstract

Machine learning classifiers trained on class-imbalanced data are prone to overpredicting the majority class. This leads to a larger misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set to 0.5 by default, which is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy for dealing with the class imbalance problem. In this work, we present two different automated procedures for selecting the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they do not require retraining of the machine learning models or resampling of the training data. The first approach is specific to random forest (RF), while the second approach, named GHOST, can potentially be applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure-activity data for a variety of pharmaceutical targets. We show that both thresholding methods significantly improve the performance of RF. We tested the use of GHOST with four different classifiers in combination with two molecular descriptors, and we fou...
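The general idea of adjusting the decision threshold without retraining can be illustrated with a short sketch. This is not the GHOST procedure or the RF-specific procedure from the paper, whose details are not given in the abstract; it is a generic threshold scan over predicted probabilities that are already available. The choice of Cohen's kappa as the metric to maximize, the grid of candidate thresholds, the function names, and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch only: a generic threshold scan, NOT the authors' method.
# Assumptions: Cohen's kappa as the selection metric, a 0.05-spaced threshold
# grid, and hypothetical function names.

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels encoded as 0/1."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = n - tp - tn - fp
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal class frequencies.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


def select_threshold(y_true, probs, thresholds=None):
    """Return (threshold, kappa) for the candidate with the highest kappa.

    `probs` are minority-class probabilities from an already trained model,
    so no retraining or resampling is involved.
    """
    if thresholds is None:
        thresholds = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95
    best_t, best_k = 0.5, float("-inf")
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        k = cohen_kappa(y_true, preds)
        if k > best_k:  # strict comparison keeps the lowest tied threshold
            best_t, best_k = t, k
    return best_t, best_k


# Toy imbalanced data: 8 majority-class (0) and 2 minority-class (1) samples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.45]

# At the default 0.5 every sample is predicted as majority class (kappa = 0).
default_preds = [1 if p >= 0.5 else 0 for p in probs]
default_kappa = cohen_kappa(y_true, default_preds)

print(select_threshold(y_true, probs))  # prints (0.35, 1.0)
```

In this toy case no minority-class sample reaches a probability of 0.5, so the default threshold never predicts the class of interest, while a lower threshold separates the two classes. In practice the threshold would be chosen on training or held-out data rather than on the test set.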



