Apr 3, 2020

PhANNs, a fast and accurate tool and web server to classify phage structural proteins

BioRxiv : the Preprint Server for Biology
Vito Adrian Cantu Alessio Robles, A. Segall


For any given bacteriophage genome or phage sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an "other" category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (ten for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten classes, and non-phage proteins are classified as "other", provid...
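
The abstract does not describe the clustering method itself, so the following is only a minimal sketch of the constraint it states: sequences from the same homology cluster never land in two different subsets. It assumes cluster ids have already been computed by some external tool; the function name `split_by_cluster`, the input mapping, and the greedy size balancing are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict


def split_by_cluster(cluster_of, n_subsets=11):
    """Assign whole clusters to subsets so no cluster spans two subsets.

    cluster_of: dict mapping protein id -> cluster id (hypothetical input,
    assumed to come from a separate homology-clustering step).
    Returns a list of n_subsets lists of protein ids.
    """
    # Group protein ids by their cluster.
    members = defaultdict(list)
    for protein_id, cluster_id in cluster_of.items():
        members[cluster_id].append(protein_id)

    subsets = [[] for _ in range(n_subsets)]
    # Place the largest clusters first, each into the currently smallest
    # subset, which keeps subset sizes roughly balanced.
    for cluster_id in sorted(members, key=lambda c: (-len(members[c]), str(c))):
        smallest = min(range(n_subsets), key=lambda i: len(subsets[i]))
        subsets[smallest].extend(members[cluster_id])
    return subsets


# Toy data: made-up protein ids in five made-up clusters.
toy = {"p1": "A", "p2": "A", "p3": "A", "p4": "B",
       "p5": "B", "p6": "C", "p7": "D", "p8": "E"}
subsets = split_by_cluster(toy, n_subsets=3)

# Hold out the last subset for testing; the rest serve as cross-validation folds.
test_set, cv_folds = subsets[-1], subsets[:-1]
```

With the default `n_subsets=11`, the same call would yield ten cross-validation folds and one held-out test subset, matching the split sizes mentioned in the abstract.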
