Modelling methods and cross-validation variants in QSAR: a multi-level analysis

SAR and QSAR in Environmental Research
A Rácz, K Héberger

Abstract

Prediction performance often depends on the cross-validation and test validation protocols applied. Several combinations of cross-validation (CV) variants and model-building techniques were used to reveal their complexity. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (in random, contiguous and Venetian blind forms) and leave-one-out cross-validation. External test sets showed the effects of, and differences between, the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made by the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and contiguous block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared based on their basic performance parameters (r2 and Q2). MLR produced the largest gap, while PCR gave the smallest. Although PCR is the best validated and balanced...
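The abstract names the cross-validation variants but does not say how samples are assigned to folds. The following is a minimal sketch of the usual chemometrics definitions, offered as an assumption rather than as the authors' implementation: contiguous blocks take consecutive samples, Venetian blinds take every k-th sample, and the random form shuffles the assignment. All function names are hypothetical.

```python
import numpy as np

def contiguous_folds(n_samples, n_folds=5):
    """Fold label per sample: consecutive samples share a fold."""
    return np.arange(n_samples) * n_folds // n_samples

def venetian_blind_folds(n_samples, n_folds=5):
    """Fold label per sample: every n_folds-th sample shares a fold."""
    return np.arange(n_samples) % n_folds

def random_folds(n_samples, n_folds=5, seed=0):
    """Fold label per sample: balanced fold sizes, random membership."""
    rng = np.random.default_rng(seed)
    return rng.permutation(venetian_blind_folds(n_samples, n_folds))

def leave_one_out_folds(n_samples):
    """Leave-one-out: each sample is its own fold."""
    return np.arange(n_samples)

def iter_splits(fold_labels):
    """Yield (train_idx, test_idx) index arrays for each fold in turn."""
    for fold in np.unique(fold_labels):
        test = np.flatnonzero(fold_labels == fold)
        train = np.flatnonzero(fold_labels != fold)
        yield train, test

if __name__ == "__main__":
    print("contiguous:    ", contiguous_folds(10))
    print("venetian blind:", venetian_blind_folds(10))
    print("random:        ", random_folds(10))
```

Under these definitions, the outcome of contiguous and Venetian blind splitting depends on how the samples are ordered in the data table, whereas the random form does not.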

Methods Mentioned

PCR

Software Mentioned

QikProp
PLS Toolbox
MATLAB
RDKit
