DOI: 10.1101/456756
Oct 30, 2018
Paper

Accumulating computational resource usage of genomic data analysis workflow to optimize cloud computing instance selection

BioRxiv : the Preprint Server for Biology
Tazro Ohta, Osamu Ogasawara


Background: Container virtualization technologies such as Docker have become popular in the bioinformatics domain because they improve the portability and reproducibility of software deployment. Together with software packaged in containers, the workflow description standard Common Workflow Language (CWL) has made it easy to run data analyses in multiple different computing environments. These technologies accelerate the use of on-demand cloud computing platforms, which can scale out according to the amount of data. However, to optimize time and budget when using the cloud, users need to select an instance type suited to the resource requirements of their workflows. Results: We developed CWL-metrics, a system that collects runtime metrics of Docker containers and workflow metadata to analyze the resource requirements of workflows. We demonstrated the analysis using seven transcriptome quantification workflows on six instance types. The results identified instance type options with lower financial cost and faster execution time that still provide the required amount of computational resources. Conclusions: The summary of resource requirements of workflow executions provided by CWL-metrics can help users to optimize the selection of cloud computing inst...
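
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the CWL-metrics implementation: the `InstanceType` and `WorkflowRun` records, the `cheapest_feasible` helper, the 10% memory headroom, and every instance name, price, and measurement below are hypothetical placeholders chosen only for illustration. The sketch assumes that each run has been summarized as a peak memory figure and an elapsed time, and it picks the cheapest run whose instance still has enough memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical description of a cloud instance type."""
    name: str
    vcpus: int
    memory_gib: float
    usd_per_hour: float


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical summary of one workflow execution on one instance type."""
    instance: InstanceType
    peak_memory_gib: float
    elapsed_hours: float


def run_cost(run: WorkflowRun) -> float:
    """Cost of a run, assuming simple per-hour billing."""
    return run.elapsed_hours * run.instance.usd_per_hour


def cheapest_feasible(runs: List[WorkflowRun],
                      headroom: float = 1.1) -> Optional[WorkflowRun]:
    """Return the lowest-cost run whose instance memory covers the observed
    peak memory times a safety headroom; ties are broken by elapsed time."""
    feasible = [
        r for r in runs
        if r.peak_memory_gib * headroom <= r.instance.memory_gib
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda r: (run_cost(r), r.elapsed_hours))


# Placeholder instance types and measurements (not real prices or results).
tiny = InstanceType("tiny-example", vcpus=2, memory_gib=4.0, usd_per_hour=0.05)
small = InstanceType("small-example", vcpus=2, memory_gib=8.0, usd_per_hour=0.10)
large = InstanceType("large-example", vcpus=8, memory_gib=32.0, usd_per_hour=0.40)

runs = [
    WorkflowRun(tiny, peak_memory_gib=3.9, elapsed_hours=3.0),   # too close to the memory limit
    WorkflowRun(small, peak_memory_gib=6.0, elapsed_hours=2.0),
    WorkflowRun(large, peak_memory_gib=6.5, elapsed_hours=0.75),
]

best = cheapest_feasible(runs)
if best is not None:
    print(best.instance.name, round(run_cost(best), 2))
```

In this toy data the cheapest run overall is excluded because its peak memory leaves less than the assumed headroom, which is the kind of trade-off between cost, execution time, and required resources that the abstract describes. A real analysis would also need to account for CPU, disk, and billing granularity.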
