Ancestral informative marker selection and population structure visualization using sparse Laplacian eigenfunctions

PloS One
Jun Zhang

Abstract

Identifying a small panel of markers that are informative about population structure can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics, and evolutionary theory in population genetics. Traditional methods for ascertaining ancestry informative markers usually require prior knowledge of individual ancestry and have difficulty with admixed populations. Recently, principal component analysis (PCA) has been used successfully to select SNPs that are highly correlated with the top significant principal components (PCs), without using individual ancestry information. That approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Through simulation of a ring population and the real global population sample HGDP of 650K SNPs genotyped in 940 unrelated individuals, we validate the proposed algorithm at selecting the most informative markers, a small fraction of which can recover the simila...
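To make the idea in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a Gaussian-kernel similarity graph over individuals, a symmetric normalized graph Laplacian, and a simple top-m ranking of SNPs by squared correlation with the leading nontrivial eigenvectors. That ranking is a stand-in for the sparsity-based selection, whose details the abstract does not give. The function names, the kernel bandwidth rule, and the toy two-population data are all assumptions made for illustration.

```python
import numpy as np


def standardize(G):
    """Center each SNP column and scale it to unit variance (constant columns become 0)."""
    X = G - G.mean(axis=0)
    sd = X.std(axis=0)
    return X / np.where(sd > 0, sd, 1.0)


def laplacian_eigenvectors(G, k=1):
    """Return the k leading nontrivial eigenvectors of a normalized graph Laplacian.

    G is an (individuals x SNPs) genotype matrix coded 0/1/2.
    The Gaussian kernel and median bandwidth are assumptions of this sketch.
    """
    X = standardize(G)
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances between individuals.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    sigma2 = np.median(D2[~np.eye(n, dtype=bool)])
    W = np.exp(-D2 / sigma2)
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    return vecs[:, 1:k + 1]      # skip the trivial eigenvector (eigenvalue 0)


def select_markers(G, eigvecs, m):
    """Rank SNPs by summed squared correlation with the eigenvectors; keep the top m."""
    X = standardize(G)
    E = eigvecs - eigvecs.mean(axis=0)
    E = E / np.linalg.norm(E, axis=0)
    R = (X.T @ E) / np.sqrt(X.shape[0])  # SNP-by-eigenvector correlations
    score = (R ** 2).sum(axis=1)
    return np.argsort(score)[::-1][:m]


# Toy data (simulated, not from the paper): two populations of 60 individuals,
# 100 SNPs, of which only the first 20 differ in allele frequency.
rng = np.random.default_rng(0)
freq_a = np.full(100, 0.5)
freq_b = np.full(100, 0.5)
freq_a[:20], freq_b[:20] = 0.1, 0.9
G = np.vstack([rng.binomial(2, freq_a, size=(60, 100)),
               rng.binomial(2, freq_b, size=(60, 100))])

selected = select_markers(G, laplacian_eigenvectors(G, k=1), m=20)
print(sorted(selected.tolist()))
```

In this toy setting no population labels are passed to either function; the eigenvector is computed from genotypes alone, which mirrors the abstract's point that individual ancestry information is not required. A penalized regression of the eigenvectors on the SNPs would be one way to bring in an explicit sparsity prior, but that is a design choice of this sketch rather than something the abstract specifies.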
