Elimination of PCR duplicates in RNA-seq and small RNA-seq using unique molecular identifiers

BMC Genomics
Yu Fu ... Zhiping Weng

Abstract

RNA-seq and small RNA-seq are powerful, quantitative tools to study gene regulation and function. Common high-throughput sequencing methods rely on polymerase chain reaction (PCR) to amplify the starting material, but not every molecule amplifies equally, causing some to be overrepresented. Unique molecular identifiers (UMIs) can be used to distinguish undesirable PCR duplicates derived from a single molecule from identical but biologically meaningful reads derived from different molecules. We have incorporated UMIs into RNA-seq and small RNA-seq protocols and developed tools to analyze the resulting data. Our UMIs contain stretches of random nucleotides that are long enough to capture the diverse molecule species in both RNA-seq and small RNA-seq libraries generated from mouse testis. Our approach yields high-quality data while allowing unique tagging of all molecules in high-depth libraries. Using simulated and real datasets, we demonstrate that our methods increase the reproducibility of RNA-seq and small RNA-seq data. Notably, we find that the amount of starting material and sequencing depth, but not the number of PCR cycles, determine PCR duplicate frequency. Finally, we show that computational removal of PCR duplicates based only on...
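The idea of separating PCR duplicates from identical reads that come from different molecules can be illustrated with a minimal sketch. This is not the authors' tool: the read representation (a tuple of chromosome, position, strand, UMI, and sequence), the function names `dedup_by_umi` and `umi_capacity`, and the 5-nt UMI length in the example are all assumptions made for illustration.

```python
from collections import defaultdict


def umi_capacity(length):
    """Number of distinct UMIs available from `length` random nucleotides (4 bases)."""
    return 4 ** length


def dedup_by_umi(reads):
    """Keep one read per (chrom, pos, strand, umi) group.

    `reads` is an iterable of hypothetical (chrom, pos, strand, umi, sequence)
    tuples. Reads sharing alignment coordinates AND the same UMI are treated
    as PCR duplicates; reads with the same coordinates but different UMIs are
    treated as coming from different original molecules and are all kept.
    """
    groups = defaultdict(list)
    for read in reads:
        chrom, pos, strand, umi, _sequence = read
        groups[(chrom, pos, strand, umi)].append(read)
    # Keep the first read seen in each group as its representative.
    return [group[0] for group in groups.values()]


# Illustrative data only (not from the paper).
reads = [
    ("chr1", 100, "+", "ACGTA", "TTGACC"),
    ("chr1", 100, "+", "ACGTA", "TTGACC"),  # same position and UMI: PCR duplicate
    ("chr1", 100, "+", "GGTCA", "TTGACC"),  # same position, different UMI: kept
    ("chr2", 500, "-", "ACGTA", "CCATGG"),  # different position: kept
]

unique_reads = dedup_by_umi(reads)
print(len(unique_reads), "of", len(reads), "reads kept")
print("distinct 5-nt UMIs:", umi_capacity(5))
```

The sketch requires an exact UMI match and ignores sequencing errors within the UMI, which a production pipeline would need to handle.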

Methods Mentioned

RNA-seq
PCR
Illumina sequencing
single-cell sequencing
immunoprecipitation
ChIP-seq
degradome sequencing

Software Mentioned

NetworkX
CaptureSeq
CUT
umitools
NextSeq
SAMtools
