Apr 15, 2020

A pan-cancer landscape of somatic substitutions in non-unique regions of the human genome

BioRxiv : the Preprint Server for Biology
M. Tarabichi, Tomasz Konopka

Abstract

Around 13% of the human genome displays high sequence similarity with at least one other chromosomal position, which poses challenges for computational analyses such as the detection of somatic events in cancer. Here we extract features of sequencing data from across non-unique regions and employ a machine learning pipeline to describe a landscape of somatic substitutions in 2,658 cancers from the PCAWG cohort. We show that mutations in non-unique regions are consistent with mutations in unique regions in terms of mutation load and substitution profiles, and that they can be validated with linked-read sequencing. This analysis uncovers hidden mutations in ~1,700 coding sequences and thousands of regulatory elements, including known cancer genes, immunoglobulins, and highly mutated gene families.


