Apr 3, 2020

PhANNs, a fast and accurate tool and web server to classify phage structural proteins

bioRxiv: the Preprint Server for Biology
Vito Adrian Cantu Alessio Robles, A. Segall

Abstract

For any given bacteriophage genome, or for phage sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages and their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an "other" category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (ten for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and a test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten classes, and non-phage proteins are classified as "other", provid...
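The abstract does not describe how the clustering works, but the property the split relies on can be illustrated: no group of homologous proteins is divided between subsets. The sketch below is not the PhANNs method. It assumes every sequence already carries a cluster label from some homology-clustering step (the abstract does not say which), and the function name `split_by_cluster` and its greedy balancing rule are hypothetical. It assigns whole clusters, largest first, to whichever of the eleven subsets currently holds the fewest sequences. It makes no attempt to maximize sequence diversity, which the authors' method also targets.

```python
from collections import Counter
from collections.abc import Hashable, Mapping


def split_by_cluster(cluster_of: Mapping[str, Hashable], n_subsets: int = 11) -> list[list[str]]:
    """Hypothetical helper: split sequence IDs into n_subsets so that every
    cluster (assumed to group homologous proteins) lands in exactly one subset."""
    # Number of sequences in each cluster.
    sizes = Counter(cluster_of.values())
    load = [0] * n_subsets  # sequences assigned to each subset so far
    subset_of_cluster = {}
    # Largest clusters first; ties broken by the label's string form for determinism.
    for cluster, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = min(range(n_subsets), key=load.__getitem__)  # least-filled subset
        subset_of_cluster[cluster] = target
        load[target] += size
    # Place every sequence in the subset chosen for its whole cluster.
    subsets: list[list[str]] = [[] for _ in range(n_subsets)]
    for seq_id, cluster in cluster_of.items():
        subsets[subset_of_cluster[cluster]].append(seq_id)
    return subsets


if __name__ == "__main__":
    toy = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "C", "p6": "C"}
    print(split_by_cluster(toy, n_subsets=3))
    # [['p4', 'p5', 'p6'], ['p1', 'p2'], ['p3']]
```

In the toy run, clusters C, A and B each land in a single subset. With the default eleven-way split, the first ten subsets would play the role of cross-validation folds and the last one the held-out test set, so no test protein has a homolog in training under the assumed cluster labels.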