DOI: 10.1101/505768
Dec 31, 2018
Paper

A network-based integrated framework for predicting virus-host interactions

BioRxiv: the Preprint Server for Biology
Weili Wang, Nathan A. Ahlgren

Abstract

Metagenomic sequencing has greatly enhanced the discovery of viral genomic sequences; however, it remains challenging to identify the host(s) of these new viruses. We developed VirHostMatcher-Net, a flexible, network-based, Markov random field framework for predicting virus-host interactions using multiple, integrated features: CRISPR sequences, sequence homology, and alignment-free similarity measures (s2* and WIsH). Evaluation of this method on a benchmark set of 1,075 known virus-host pairs yielded host prediction accuracies of 62% and 85% at the genus and phylum levels, representing improvements of 12-27% and 10-18%, respectively, over previous single-feature prediction approaches. We applied our host-prediction tool to three metagenomic virus datasets: human gut crAss-like phages, marine viruses, and viruses recovered from globally distributed, diverse habitats. Host predictions were frequently consistent with those of previous studies, but more importantly, this new tool made many more confident predictions than previous tools, up to 6-fold more (n>60,000), greatly expanding the diversity of known virus-host interactions.
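
The genus-level and phylum-level accuracies quoted above can be read as the fraction of benchmark viruses whose predicted host falls in the same taxon as the known host at that rank. The sketch below illustrates that reading only. It is not code from VirHostMatcher-Net: the `TAXONOMY` table, the `accuracy_at_rank` helper, and the four toy pairs are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table (an assumption, not data from the paper):
# host name -> taxon at each rank of interest.
TAXONOMY: Dict[str, Dict[str, str]] = {
    "Escherichia coli": {"genus": "Escherichia", "phylum": "Proteobacteria"},
    "Salmonella enterica": {"genus": "Salmonella", "phylum": "Proteobacteria"},
    "Bacillus subtilis": {"genus": "Bacillus", "phylum": "Firmicutes"},
}


def accuracy_at_rank(
    pairs: List[Tuple[str, str]],
    rank: str,
    taxonomy: Dict[str, Dict[str, str]] = TAXONOMY,
) -> float:
    """Fraction of (predicted_host, true_host) pairs that agree at `rank`."""
    if not pairs:
        raise ValueError("need at least one (predicted, true) pair")
    hits = sum(
        1
        for predicted, true in pairs
        if taxonomy[predicted][rank] == taxonomy[true][rank]
    )
    return hits / len(pairs)


# Toy benchmark: (predicted host, known host) for four imaginary viruses.
PAIRS = [
    ("Escherichia coli", "Escherichia coli"),     # matches at genus and phylum
    ("Salmonella enterica", "Escherichia coli"),  # matches at phylum only
    ("Bacillus subtilis", "Escherichia coli"),    # matches at neither rank
    ("Bacillus subtilis", "Bacillus subtilis"),   # matches at genus and phylum
]

genus_acc = accuracy_at_rank(PAIRS, "genus")
phylum_acc = accuracy_at_rank(PAIRS, "phylum")
print(f"genus accuracy:  {genus_acc:.2f}")
print(f"phylum accuracy: {phylum_acc:.2f}")
```

Accuracy can only stay the same or rise when moving from genus to phylum, because any two hosts in the same genus also share a phylum. This is consistent with the 62% and 85% figures reported in the abstract.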
