Nov 14, 2015

A novel method to model read counts in genomic data to reduce false positive identification of heterozygotes

BioRxiv: the Preprint Server for Biology
Steven H Wu, Reed A Cartwright

Abstract

Accurate identification of genotypes is critical for identifying de novo mutations, linking mutations with disease, and determining mutation rates. Calling genotypes correctly from short-read data requires modeling the read counts for each base. True heterozygotes may be affected by mapping reference bias and library preparation, leading to a distribution of reads that does not fit a 1:1 binomial distribution and potentially resulting in a failure to call the alternate allele. Homozygous sites can be affected by the alignment of paralogous genes and by sequencing error, either of which could incorrectly suggest heterozygosity. Previous work has modeled increased variance and skewed allele ratios to some degree. Here, we were able to model reads for all data as a mixture of Dirichlet-multinomial distributions. This model fits the data better than previously used models. In most cases we observed two distributions: one corresponds to a large proportion of heterozygous sites with low reference bias and a close-to-binomial distribution, and the other to a small proportion of sites with high bias and overdispersion. The sites with high reference bias have not been previously identified as SNPs in extensive human genome research; thus, we belie...
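
To make the idea of a mixture of Dirichlet-multinomial distributions concrete, the following is a minimal Python sketch of the likelihood of read counts at a single site. It is an illustration under stated assumptions, not the authors' implementation: the two-category (reference, alternate) simplification, the function names dm_logpmf and mixture_logpmf, and every parameter value are hypothetical and are not taken from the paper.

```python
import math

def dm_logpmf(counts, alpha):
    """Log-probability of read counts under a Dirichlet-multinomial
    distribution with concentration parameters alpha."""
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient: n! / prod(c_i!)
    out = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Ratio of gamma functions over the total concentration
    out += math.lgamma(a0) - math.lgamma(n + a0)
    # Per-category gamma ratios
    out += sum(math.lgamma(c + a) - math.lgamma(a)
               for c, a in zip(counts, alpha))
    return out

def mixture_logpmf(counts, weights, alphas):
    """Log-probability under a finite mixture of Dirichlet-multinomials,
    combined with a numerically stable log-sum-exp."""
    terms = [math.log(w) + dm_logpmf(counts, a)
             for w, a in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Hypothetical two-component mixture over (reference, alternate) reads.
# Component 1: low reference bias, close to a 1:1 binomial.
# Component 2: strong reference bias and heavy overdispersion.
weights = [0.9, 0.1]                 # assumed mixing proportions
alphas = [[50.0, 50.0], [1.5, 0.5]]  # assumed concentration parameters

balanced = mixture_logpmf((10, 10), weights, alphas)
skewed = mixture_logpmf((18, 2), weights, alphas)
```

In this sketch, a large total concentration (50 + 50) keeps the first component close to a binomial with equal allele proportions, while the small, unequal concentrations of the second component give it a mean skewed toward the reference allele and a much wider spread. A site with 18 reference and 2 alternate reads is therefore far better explained by the second component than by the first.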

Mentioned in this Paper

Research
Genome
Nucleic Acid Sequencing
Genetic Pedigree
RAD51B gene
Site
Low Complexity Region
Sequencing
cDNA Library
Alleles
