Quantifying the unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

BioRxiv : the Preprint Server for Biology
James Zou, Daniel G MacArthur


As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework for evaluating the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not yet been observed in current cohorts. Our results quantify the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of less than 10⁻⁵ in healthy humans, consistent with very strong intolerance to gene inactivation.
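
To illustrate how an estimated frequency distribution translates into a projected discovery count, the sketch below computes the expected number of distinct variants seen at least once in a cohort of a given size. It is not UnseenEst itself: it assumes the frequency histogram is already known, that each diploid individual contributes two independent chromosome draws, and the function name `expected_observed` and the histogram values are hypothetical, chosen only for illustration.

```python
def expected_observed(freq_histogram, n_individuals):
    """Expected number of distinct variants seen at least once in a cohort.

    freq_histogram: list of (allele_frequency, number_of_variants) pairs.
    Assumption: each diploid individual contributes two independent draws,
    so a variant of frequency x is missed with probability (1 - x)^(2n).
    """
    n_chromosomes = 2 * n_individuals
    return sum(
        count * (1.0 - (1.0 - freq) ** n_chromosomes)
        for freq, count in freq_histogram
    )


# Hypothetical histogram (NOT data from the paper): most variants are very rare.
toy_histogram = [
    (1e-2, 1_000),
    (1e-4, 50_000),
    (1e-6, 2_000_000),
    (1e-8, 5_000_000),
]

total_variants = sum(count for _, count in toy_histogram)
for cohort_size in (60_706, 500_000):
    seen = expected_observed(toy_histogram, cohort_size)
    print(f"n = {cohort_size}: expected fraction captured = {seen / total_variants:.3f}")
```

Because the capture probability for a rare variant grows roughly linearly with cohort size until it saturates, the very rare bins dominate how many variants remain unseen at any given cohort size.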

