Machine Learning for Organic Cage Property Prediction

ChemRxiv
Lukas Turcani, Kim Jelfs

Abstract

We use machine learning to predict shape persistence and cavity size in porous organic cages. The majority of hypothetical organic cages lack shape persistence and, as a result, lack intrinsic porosity, rendering them unsuitable for many applications. We have created the largest computational database of these molecules to date, numbering 63,472 cages, formed through a range of reaction chemistries and in multiple topologies. We study our database and identify features that lead to the formation of shape-persistent cages. We find that the imine condensation of trialdehydes and diamines in a [4+6] reaction is the most likely to result in shape-persistent cages, whereas thiol reactions are the most likely to give collapsed cages. Using this database, we develop machine learning models capable of predicting shape persistence with an accuracy of up to 93%, reducing the time taken to predict this property to milliseconds and removing the need for specialist software. In addition, we develop machine learning models for two other key properties of these molecules: cavity size and symmetry. We provide open-source implementations of our models, together with the accompanying data sets, and an online tool giving users access...
