The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Genome Research
Aaron McKenna, Mark A DePristo

Abstract

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of ...
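
The separation the abstract describes, between analysis-specific calculations and shared data management, can be illustrated with a small sketch. This is not the GATK API: the names `Read`, `traverse_by_locus`, `depth_map`, and `depth_sum` are hypothetical, and the read model (ungapped alignments held in memory) is a simplifying assumption. The "framework" function owns the data access pattern, here a walk over reference positions, while the analysis supplies only a map function and a reduce function.

```python
from typing import Callable, List, NamedTuple, TypeVar

M = TypeVar("M")  # type produced by the map step
A = TypeVar("A")  # type of the reduce accumulator


class Read(NamedTuple):
    """Toy aligned read (hypothetical model): ungapped, 0-based start."""
    start: int
    bases: str


def traverse_by_locus(
    reads: List[Read],
    region_start: int,
    region_end: int,
    map_fn: Callable[[int, List[str]], M],
    reduce_fn: Callable[[M, A], A],
    init: A,
) -> A:
    """Framework side: build the pileup at each locus, then map and reduce.

    The analysis author never touches read iteration or pileup construction.
    """
    acc = init
    for pos in range(region_start, region_end):
        # Bases from every read that overlaps this reference position.
        pileup = [
            r.bases[pos - r.start]
            for r in reads
            if r.start <= pos < r.start + len(r.bases)
        ]
        acc = reduce_fn(map_fn(pos, pileup), acc)
    return acc


# Analysis side: a depth-of-coverage calculation in two small functions.
def depth_map(pos: int, pileup: List[str]) -> int:
    return len(pileup)


def depth_sum(value: int, acc: int) -> int:
    return value + acc


reads = [Read(0, "ACGT"), Read(2, "GTAC"), Read(3, "TACG")]
total_depth = traverse_by_locus(reads, 0, 7, depth_map, depth_sum, 0)
print(total_depth)  # 12: every base of the three 4-base reads lies in [0, 7)
```

Because the per-locus map calls are independent and this reduce is associative, a framework built this way could split the region into shards and combine partial results, which is one way to read the abstract's remark about enabling parallelization. How the GATK actually shards and schedules work is not described in this excerpt.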




