DOI: 10.1101/470138 · Nov 14, 2018 · Paper

Scaling computational genomics to millions of individuals with GPUs

bioRxiv: the Preprint Server for Biology
Amaro Taylor-Weiner, Gad Getz


Current genomics methods were designed to handle tens to thousands of samples, but will soon need to scale to millions to keep up with the pace of data and hypothesis generation in biomedical science. Moreover, the costs of processing these growing datasets will become prohibitive unless the computational efficiency and scalability of methods improve. Here, we show that recently developed machine-learning libraries (TensorFlow and PyTorch) facilitate implementation of genomics methods for GPUs and significantly accelerate computations. To demonstrate this, we re-implemented methods for two commonly performed computational genomics tasks: QTL mapping and Bayesian non-negative matrix factorization. Our implementations ran >200 times faster than current CPU-based versions, and these analyses are ~5-10-fold cheaper on GPUs due to the vastly shorter runtimes. We anticipate that the accessibility of these libraries and the improvements in runtime will lead to a transition to GPU-based implementations for a wide range of computational genomics methods.
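
To illustrate why a task such as QTL mapping suits GPU libraries, here is a minimal sketch, not the authors' implementation. It assumes the simplest case, a linear association test with no covariates, where the Pearson correlation between every variant and every phenotype reduces to one matrix product of row-standardized matrices. A library such as TensorFlow or PyTorch could run that product as a single batched operation. The function names and toy data are hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
import math

def standardize(row):
    """Center a row to mean 0 and scale it to unit Euclidean length."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0.0:
        raise ValueError("row has zero variance")
    return [x / norm for x in centered]

def correlation_matrix(genotypes, phenotypes):
    """Pearson r for every (variant, phenotype) pair.

    With standardized rows, r equals the matrix product G @ P^T,
    which is the step a GPU library would compute in one call.
    """
    g = [standardize(row) for row in genotypes]
    p = [standardize(row) for row in phenotypes]
    return [[sum(a * b for a, b in zip(g_row, p_row)) for p_row in p]
            for g_row in g]

def t_statistic(r, n_samples):
    """t-statistic for a Pearson r, with n_samples - 2 degrees of freedom."""
    dof = n_samples - 2
    return r * math.sqrt(dof / (1.0 - r * r))

# Hypothetical toy data: 2 variants x 5 samples (genotype dosages 0/1/2).
genotypes = [[0, 1, 2, 1, 0], [2, 2, 1, 0, 0]]
# One phenotype (for example, expression of one gene) in the same 5 samples.
phenotypes = [[1.0, 2.1, 2.9, 2.0, 1.1]]

r = correlation_matrix(genotypes, phenotypes)
t = [[t_statistic(value, 5) for value in row] for row in r]
```

In this toy example the first variant tracks the phenotype closely and the second does not. A real analysis would also adjust for covariates and correct for multiple testing, both of which this sketch omits.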

