Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection

Journal of Chemical Information and Modeling
Igor V Tetko, Alexandre Varnek

Abstract

Estimating the accuracy of predictions is a critical problem in QSAR modeling. The "distance to model" is a metric that quantifies the similarity between the training set molecules and a test set compound for the given property in the context of a specific model. It can be expressed in many different ways, e.g., using the Tanimoto coefficient, leverage, correlation in the space of models, etc. In this paper we used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate between compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on the standard deviation of predicted toxicity calculated from an ensemble of models afforded the best results. This distance also successfully discriminated molecules with small and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR mode...
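
The ensemble-based distance described above can be illustrated with a short Python sketch. It is not the authors' implementation: the bagged least-squares models, the function names, and the binning helper are assumptions chosen only to show the idea that the standard deviation of an ensemble's predictions can serve as a distance to model, and that prediction errors can then be compared across compounds ranked by that distance.

```python
import numpy as np


def bagged_linear_predictions(X_train, y_train, X_test, n_models=64, seed=0):
    """Fit least-squares models on bootstrap resamples of the training set.

    Hypothetical stand-in for an ensemble of QSAR models.
    Returns an array of shape (n_models, n_test) with one row per model.
    """
    rng = np.random.default_rng(seed)
    n_train = X_train.shape[0]
    # Append a column of ones so each model has an intercept term.
    A_train = np.hstack([X_train, np.ones((n_train, 1))])
    A_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    preds = np.empty((n_models, X_test.shape[0]))
    for m in range(n_models):
        idx = rng.integers(0, n_train, size=n_train)  # bootstrap sample
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds[m] = A_test @ coef
    return preds


def std_distance(preds):
    """Distance to model: per-compound standard deviation across the ensemble."""
    return preds.std(axis=0, ddof=1)


def mean_abs_error_by_distance_bin(distances, errors, n_bins=4):
    """Sort compounds by distance, split into n_bins groups of near-equal size,
    and return the mean absolute error of each group (closest group first)."""
    order = np.argsort(distances)
    abs_err = np.abs(np.asarray(errors, dtype=float))[order]
    return np.array([chunk.mean() for chunk in np.array_split(abs_err, n_bins)])
```

If the distance is informative, the mean absolute error returned by `mean_abs_error_by_distance_bin` should tend to grow from the first bin (compounds closest to the model) to the last. The ensemble mean, `preds.mean(axis=0)`, would serve as the prediction whose error is being assessed.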
