PubChem and ChEMBL Beyond Lipinski

Alice CapecchiJean-Louis Reymond


Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (, the second one to perform MXFP nearest neighbor searches in NLP ( We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at and can be used freely to explore and better underst...Continue Reading

Related Concepts

Trending Feeds


Coronaviruses encompass a large family of viruses that cause the common cold as well as more serious diseases, such as the ongoing outbreak of coronavirus disease 2019 (COVID-19; formally known as 2019-nCoV). Coronaviruses can spread from animals to humans; symptoms include fever, cough, shortness of breath, and breathing difficulties; in more severe cases, infection can lead to death. This feed covers recent research on COVID-19.

STING Receptor Agonists

Stimulator of IFN genes (STING) are a group of transmembrane proteins that are involved in the induction of type I interferon that is important in the innate immune response. The stimulation of STING has been an active area of research in the treatment of cancer and infectious diseases. Here is the latest research on STING receptor agonists.

Chronic Fatigue Syndrome

Chronic fatigue syndrome is a disease characterized by unexplained disabling fatigue; the pathology of which is incompletely understood. Discover the latest research on chronic fatigue syndrome here.

Spatio-Temporal Regulation of DNA Repair

DNA repair is a complex process regulated by several different classes of enzymes, including ligases, endonucleases, and polymerases. This feed focuses on the spatial and temporal regulation that accompanies DNA damage signaling and repair enzymes and processes.

Glut1 Deficiency

Glut1 deficiency, an autosomal dominant, genetic metabolic disorder associated with a deficiency of GLUT1, the protein that transports glucose across the blood brain barrier, is characterized by mental and motor developmental delays and infantile seizures. Follow the latest research on Glut1 deficiency with this feed.

Hereditary Sensory Autonomic Neuropathy

Hereditary Sensory Autonomic Neuropathies are a group of inherited neurodegenerative disorders characterized clinically by loss of sensation and autonomic dysfunction. Here is the latest research on these neuropathies.

Separation Anxiety

Separation anxiety is a type of anxiety disorder that involves excessive distress and anxiety with separation. This may include separation from places or people to which they have a strong emotional connection with. It often affects children more than adults. Here is the latest research on separation anxiety.

Neural Activity: Imaging

Imaging of neural activity in vivo has developed rapidly recently with the advancement of fluorescence microscopy, including new applications using miniaturized microscopes (miniscopes). This feed follows the progress in this growing field.

Applications of Molecular Barcoding

The concept of molecular barcoding is that each original DNA or RNA molecule is attached to a unique sequence barcode. Sequence reads having different barcodes represent different original molecules, while sequence reads having the same barcode are results of PCR duplication from one original molecule. Discover the latest research on molecular barcoding here.

Related Papers

Alice CapecchiJean-Louis Reymond
Jean-Louis ReymondAlice Capecchi
Molecular Informatics
Alice CapecchiJean-Louis Reymond
© 2021 Meta ULC. All rights reserved