Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences

Genome Research
Josh T Cuperus ... Georg Seelig

Abstract

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts protein expression from the sequence of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5' UTRs and native S. cerevisiae 5' UTRs. We additionally used the model to computationally evolve highly active 5' UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
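
To make the sequence features named in the abstract concrete, here is a minimal Python sketch of how a random 50-nucleotide 5' UTR might be generated, one-hot encoded as input for a CNN, and scanned for upstream start codons. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the DNA alphabet, the uniform base sampling, and the three uATG labels are all hypothetical choices made for this example.

```python
import random

BASES = "ACGT"                      # assumption: sequences handled as DNA
STOPS = {"TAA", "TAG", "TGA"}       # standard stop codons (DNA spelling)

def random_utr(length=50, rng=None):
    """Draw one random sequence, each base uniform over A/C/G/T (assumed)."""
    rng = rng or random.Random()
    return "".join(rng.choice(BASES) for _ in range(length))

def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]

def upstream_atgs(utr):
    """Return the 0-based position of every ATG inside the UTR."""
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"]

def classify_uatg(utr, pos):
    """Label an upstream ATG (hypothetical labels for this sketch).

    The main start codon is assumed to begin right after the UTR,
    i.e. at index len(utr).
    """
    # Walk codon by codon from the upstream ATG to the end of the UTR.
    for i in range(pos + 3, len(utr) - 2, 3):
        if utr[i:i + 3] in STOPS:
            return "closed_uORF"            # stop codon found within the UTR
    if (len(utr) - pos) % 3 == 0:
        return "in_frame_extension"         # reads into the main ORF in frame
    return "out_of_frame"                   # reads past the main start, shifted

if __name__ == "__main__":
    utr = random_utr(50, random.Random(0))
    encoded = one_hot(utr)                  # 50 rows x 4 columns
    for pos in upstream_atgs(utr):
        print(pos, classify_uatg(utr, pos))
```

A matrix like `encoded` (50 positions by 4 channels) is the usual input shape for a one-dimensional convolutional network over sequence, though the abstract does not state the encoding or architecture actually used.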
