DOI: 10.1101/504209 | Dec 21, 2018 | Paper

Population-wide copy number variation calling using variant call format files from 6,898 individuals

bioRxiv: The Preprint Server for Biology
Grace Png, Arthur Gilly

Abstract

Motivation: Copy number variants (CNVs) are large deletions or duplications, at least 50 to 200 base pairs long depending on the definition used. They play an important role in multiple disorders, but calling CNVs accurately remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process.

Results: We use a regression tree-based approach to call CNVs from whole-genome sequencing (WGS, >18x) variant call sets in 6,898 samples across four European cohorts, and describe a rich landscape of large variation comprising 1,320 CNVs. Of the detected events, 61.8% have been previously reported in the Database of Genomic Variants. Of the high-quality deletions, 23% affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. To assess the potential clinical impact of the detected CNVs, we test for association between the detected deletions and the levels of 275 proteins in 1,457 individuals. We describe the LD structure and copy number variation underlying the association between levels of the CCL3 protein and a complex structural variant (MAF = 0.15, p = 3.6 × 10⁻¹²) affecting CCL3L3, a paralog of the CCL3 gene. We also identify a cis-association between a low-frequency NOMO1 deletion and the pr...
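The abstract names a regression tree-based approach for calling CNVs from variant call sets but does not describe it. The sketch below is a generic, hypothetical illustration of the idea, not the authors' pipeline. It assumes a per-site depth signal (for example, the DP field of VCF records) has already been extracted into a list. It fits a one-dimensional regression tree by recursive binary splitting, which yields piecewise-constant segments, and labels segments whose mean depth deviates from the sample median. The thresholds (`del_ratio`, `dup_ratio`), `min_len`, and `min_gain` are illustrative assumptions.

```python
from statistics import median


def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(values, min_len):
    """Return (index, gain) of the split that most reduces SSE, or (None, 0.0)."""
    total = sse(values)
    best_idx, best_gain = None, 0.0
    # Only consider splits that leave at least min_len sites on each side.
    for i in range(min_len, len(values) - min_len + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


def segment(values, start=0, min_len=3, min_gain=1.0):
    """Recursively split values into piecewise-constant segments (a 1-D regression tree).

    Returns a list of (start, end, mean) tuples with half-open [start, end) indices.
    min_gain is in squared depth units, so it depends on sequencing depth (assumption).
    """
    idx, gain = best_split(values, min_len)
    if idx is None or gain < min_gain:
        return [(start, start + len(values), sum(values) / len(values))]
    return (segment(values[:idx], start, min_len, min_gain)
            + segment(values[idx:], start + idx, min_len, min_gain))


def call_cnvs(depths, del_ratio=0.75, dup_ratio=1.25, **kwargs):
    """Label segments as DEL or DUP by their mean depth relative to the sample median."""
    baseline = median(depths)
    calls = []
    for s, e, mean in segment(depths, **kwargs):
        ratio = mean / baseline
        if ratio < del_ratio:
            calls.append((s, e, "DEL"))
        elif ratio > dup_ratio:
            calls.append((s, e, "DUP"))
    return calls


# Toy depth track: ~30x baseline with a halved block and a 1.5x block.
depths = [30.0] * 10 + [15.0] * 6 + [30.0] * 10 + [45.0] * 6 + [30.0] * 10
print(call_cnvs(depths))
```

On the toy track this should report one deletion-like segment at sites 10 to 15 and one duplication-like segment at sites 26 to 31. Real depth data are noisy and biased by GC content and mappability, so a practical caller would need normalization and tuned stopping rules; this brute-force split search is quadratic per node and is meant only to show the segmentation logic.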
