Random forest versus logistic regression: a large-scale benchmark experiment

BMC Bioinformatics
Raphael Couronné, Anne-Laure Boulesteix

Abstract

The Random Forest (RF) algorithm for regression and classification has gained considerable popularity since its introduction in 2001. It has since become a standard classification approach competing with logistic regression (LR) in many innovation-friendly scientific fields. In this context, we present a large-scale benchmarking experiment based on 243 real datasets comparing the prediction performance of the original version of RF with default parameters and LR as binary classification tools. Most importantly, the design of our benchmark experiment is inspired by clinical trial methodology, thus avoiding common pitfalls and major sources of bias. RF performed better than LR according to the considered accuracy measure in approximately 69% of the datasets. The mean difference between RF and LR was 0.029 (95%-CI = [0.022, 0.038]) for the accuracy, 0.041 (95%-CI = [0.031, 0.053]) for the Area Under the Curve, and -0.027 (95%-CI = [-0.034, -0.021]) for the Brier score, all measures thus suggesting a significantly better performance of RF. As a side-result of our benchmarking experiment, we observed that the results were noticeably dependent on the inclusion criteria used to select the example datasets, thus emphasizing the impor...
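The three performance measures named in the abstract, and the summary of per-dataset differences (RF minus LR), can be illustrated with a minimal standard-library Python sketch. This is not the authors' code: the function names, the 0.5 classification threshold, the toy labels and probabilities, and the percentile bootstrap used for the interval are all assumptions made for illustration. The abstract does not state how its confidence intervals were computed.

```python
import random

def accuracy(y_true, p_hat, threshold=0.5):
    """Share of observations whose thresholded prediction matches the label.
    The 0.5 threshold is an assumption, not taken from the abstract."""
    hits = sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_hat))
    return hits / len(y_true)

def auc(y_true, p_hat):
    """Area Under the ROC Curve via pairwise comparison: the share of
    (positive, negative) pairs ranked correctly, with ties counted as 0.5."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, p_hat):
    """Mean squared difference between predicted probability and label
    (lower is better, so a negative RF - LR difference favors RF)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

def mean_diff_with_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Mean of per-dataset differences with a percentile bootstrap interval.
    The bootstrap is an illustrative choice, not the paper's stated method."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(diffs) / n, (lower, upper)

# Hypothetical toy data: labels and predicted probabilities for one dataset.
y = [1, 0, 1, 1, 0, 0]
p_rf = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
p_lr = [0.8, 0.6, 0.4, 0.7, 0.3, 0.2]

for name, measure in [("accuracy", accuracy), ("AUC", auc), ("Brier", brier)]:
    print(name, "RF - LR:", round(measure(y, p_rf) - measure(y, p_lr), 3))

# Hypothetical per-dataset accuracy differences (RF - LR) across five datasets.
print(mean_diff_with_ci([0.02, 0.05, -0.01, 0.04, 0.03]))
```

In a benchmark of this kind, each measure would be estimated per dataset by resampling (for example cross-validation), and the differences would then be summarized across all datasets as in the last call.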



Software Mentioned

mlr
GitHub
glmnet
tuneRanger
Docker
glm
randomForest
batchtools
ArrayExpress
ALB

