DOI: 10.1101/500694 | Dec 18, 2018 | Paper

A Deep Learning Genome-Mining Strategy Improves Biosynthetic Gene Cluster Prediction

BioRxiv: the Preprint Server for Biology
Geoffrey D Hannigan, Danny A Bitton

Abstract

Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The growing availability of complete microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers more accurate BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing tools. We supplemented this with downstream random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable BGCs that may code for natural products with novel biological activities. The improved accuracy and classification ability of DeepBGC represent a significant step forward for in silico BGC identification.
