DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

Nucleic Acids Research
Daniel X Quang, Xiaohui Xie

Abstract

Modeling the properties and functions of DNA sequences is an important but challenging task in genomics. The task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA could greatly benefit both basic science and translational research, because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid framework that combines a convolutional neural network with a bi-directional long short-term memory (BLSTM) recurrent neural network to predict non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory 'grammar' that improves predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ achieves over a 50% relative improvement in the area under the precision-recall curve compared with related models. We have made the source code available in the GitHub repository http://github.com/uci-cbcl/DanQ.
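The role the abstract assigns to the convolution layer, scanning a sequence for regulatory motifs, can be illustrated with a minimal pure-Python sketch. This is not the DanQ implementation: the example sequence, the hand-written "TATA" filter, the pooling size, and the function names `one_hot`, `conv1d_relu`, and `max_pool` are all assumptions made for illustration. In a trained model the filter weights would be learned from data, and the pooled activations would then be passed to the recurrent layer, which is not shown here.

```python
# Illustrative sketch only: one-hot encode DNA and scan it with one
# hand-written convolution filter. All names and values are hypothetical.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Characters outside ACGT become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(encoded, kernel):
    """Slide `kernel` (width x 4 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(max(0.0, total))
    return scores


def max_pool(scores, size):
    """Non-overlapping max pooling (stride equal to the window size)."""
    return [max(scores[i:i + size]) for i in range(0, len(scores), size)]


# Hypothetical filter rewarding the motif "TATA": +1 for a match, -1 otherwise.
MOTIF = "TATA"
tata_kernel = [[1.0 if b == base else -1.0 for b in BASES] for base in MOTIF]

sequence = "GGCTATAAGGCC"  # made-up example sequence
activations = conv1d_relu(one_hot(sequence), tata_kernel)
pooled = max_pool(activations, 3)

print(activations)
print(pooled)
```

The filter responds only where the window matches the motif, and pooling keeps the strongest response per window, giving a shorter sequence of motif activations. A recurrent layer reading that sequence in both directions is what would let a model relate one motif occurrence to others further along the sequence.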
