Testing for neutrality in samples with sequencing errors

Genetics
Guillaume Achaz

Abstract

Many data sets that could be used for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We show that in the presence of those errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of θ that ignore singletons, as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. We show that even though singletons are ignored, these new tests retain some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.
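
To make the idea of singleton-free estimators concrete, here is a minimal Python sketch. It is not taken from the paper: it assumes an unfolded site frequency spectrum and the standard neutral expectation E[ξ_i] = θ/i, and rescales Watterson's estimator and the pairwise-diversity estimator so that they stay unbiased once the singleton class is dropped. The function names (`harmonic`, `theta_s`, `theta_pi`) and the toy spectra are hypothetical, and the exact estimators and the Y and Y* statistics defined in the article may differ in form.

```python
from math import comb


def harmonic(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def theta_s(sfs, drop_singletons=False):
    """Watterson-type estimator from an unfolded spectrum.

    sfs[i - 1] is the number of sites whose derived allele is carried by
    i of the n sequences (i = 1 .. n-1). Dropping singletons needs n >= 3.
    """
    n = len(sfs) + 1
    if drop_singletons:
        # Assumption: E[S - xi_1] = theta * (a_n - 1) under neutrality.
        return sum(sfs[1:]) / (harmonic(n) - 1.0)
    return sum(sfs) / harmonic(n)


def theta_pi(sfs, drop_singletons=False):
    """Pairwise-diversity estimator from an unfolded spectrum."""
    n = len(sfs) + 1
    start = 2 if drop_singletons else 1
    total = sum(i * (n - i) * sfs[i - 1] for i in range(start, n))
    pi = total / comb(n, 2)
    if drop_singletons:
        # Assumption: without xi_1 the expectation is theta * (n - 2) / n,
        # so rescale by n / (n - 2) to remove the bias.
        pi *= n / (n - 2)
    return pi


# Toy spectrum matching the neutral expectation for theta = 12 and n = 5.
clean = [12, 6, 4, 3]
# Same spectrum with 10 sequencing errors added as extra singletons.
noisy = [22, 6, 4, 3]

for name, sfs in (("clean", clean), ("noisy", noisy)):
    print(name,
          theta_s(sfs), theta_pi(sfs),
          theta_s(sfs, drop_singletons=True),
          theta_pi(sfs, drop_singletons=True))
```

On the toy data, the ten extra singletons push the standard estimators from 12 to about 16.8 and 16, while both singleton-free versions stay at 12. The price is that every true singleton is discarded as well, which is why the abstract speaks of the tests keeping only "some power".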
