Inferring ethnicity from mitochondrial DNA sequence

BMC Proceedings
Chih Lee, Craig E Nelson


Assigning DNA samples to coarse population groups is a useful but difficult task. One example is the inference of coarse ethnic groupings for forensic applications. Ethnicity plays an important role in forensic investigation and can be inferred with the help of genetic markers. Because mitochondrial DNA is maternally inherited, present in high copy number, and persistent in degraded samples, it may be useful for inferring coarse ethnicity. In this study, we compare the performance of methods for inferring ethnicity from the sequence of the hypervariable region of the mitochondrial genome. We present the results of comprehensive experiments conducted on datasets extracted from the mtDNA population database, showing that ethnicity inference based on support vector machines (SVM) achieves an overall accuracy of 80-90%, consistently outperforming the nearest neighbor and discriminant analysis methods previously proposed in the literature. We also evaluate methods of handling missing data and characterize the most informative segments of the hypervariable region of the mitochondrial genome. Support vector machines can be used to infer coarse ethnicity from a small region of mitochondrial DNA sequence with surprisingly high accu...
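To make the comparison concrete, here is a minimal sketch of the kind of nearest neighbor baseline the abstract mentions, written in plain Python. It is not the authors' implementation and it is not the SVM method. The sequences, the labels `group_A` and `group_B`, the function names, and the choice to skip positions marked `N`, `-`, or `?` are all assumptions made for illustration; the abstract does not say which missing-data strategies were evaluated.

```python
from collections import Counter

# Assumed symbols for an unknown or unsequenced base (illustrative only).
MISSING = {"N", "-", "?"}


def hamming_ignore_missing(a, b):
    """Fraction of mismatches over positions observed in both aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x in MISSING or y in MISSING:
            continue  # skip sites where either sequence has no call
        compared += 1
        if x != y:
            mismatches += 1
    # With no comparable sites, treat the pair as maximally distant.
    return mismatches / compared if compared else 1.0


def predict_knn(train, query, k=1):
    """Return the majority label among the k training sequences nearest to query.

    train is a list of (sequence, label) pairs.
    """
    ranked = sorted(train, key=lambda item: hamming_ignore_missing(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy data: short aligned fragments with hypothetical group labels.
train = [
    ("ACGTACGT", "group_A"),
    ("ACGTACGA", "group_A"),
    ("TTGTCCGT", "group_B"),
    ("TTGTCCGA", "group_B"),
]

print(predict_knn(train, "ACGNACGT"))       # query with one missing base
print(predict_knn(train, "TTGTCNGT", k=3))  # majority vote over 3 neighbors
```

Skipping unobserved positions is only one possible way to handle missing data; imputing the most common base at each site would be another, and the two can give different neighbors when many sites are missing.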



Related Concepts

DNA, Mitochondrial
Ethnic Group
Genetic Markers
DNA Sequence
