Holistic Prediction of pKa in Diverse Solvents Based on Machine Learning Approach

Angewandte Chemie
Qi Yang, Jin-Pei Cheng

Abstract

Although numerous theoretical approaches have been developed for predicting aqueous pKa, fast and accurate prediction of non-aqueous pKa values has remained a major challenge. On the basis of the iBonD experimental pKa database curated across 39 solvents, a holistic pKa prediction model was established using a machine learning approach. Structural and physical organic parameters combined descriptors (SPOC) were introduced to represent the electronic and structural features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with a neural network or the XGBoost algorithm showed the best prediction performance, with a mean absolute error (MAE) as low as 0.87 pKa unit. The capability of prediction in diverse solvents allows for a comprehensive mapping of all possible pKa correlations between different solvents, verifying the existence of transfer learning features. The holistic model was validated with high accuracy by predicting aqueous pKa and micro-pKa values of pharmaceutical molecules and pKa values of organocatalysts in DMSO and MeCN. An on-line prediction platform (http://pka.luoszgroup.com) was constructed based on the current model, which could provide pKa prediction beyond the reach otherwise for differ...
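The MAE quoted above is the mean of the absolute differences between experimental and predicted pKa values. The following is a minimal sketch of that metric only, not the authors' code: the function name `mean_absolute_error` and the numbers in `experimental` and `predicted` are hypothetical, chosen so the arithmetic is easy to follow.

```python
def mean_absolute_error(experimental, predicted):
    """Return the mean of |experimental - predicted| over paired values."""
    if len(experimental) != len(predicted):
        raise ValueError("inputs must have the same length")
    if not experimental:
        raise ValueError("inputs must not be empty")
    total = sum(abs(e - p) for e, p in zip(experimental, predicted))
    return total / len(experimental)


# Hypothetical pKa values for illustration only (not taken from iBonD).
experimental = [4.0, 10.0, 12.0]
predicted = [4.5, 9.0, 12.0]

# Absolute errors are 0.5, 1.0 and 0.0, so the MAE is 1.5 / 3 = 0.5.
mae = mean_absolute_error(experimental, predicted)
print(f"MAE = {mae:.2f} pKa units")
```

By this measure, an MAE of 0.87 means that predictions deviate from experiment by 0.87 pKa unit on average over the evaluated set.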


