A deep learning framework combined with word embedding to identify DNA replication origins

Scientific Reports
Feng Wu, Lina Zhang


DNA replication influences the inheritance of genetic information in the DNA life cycle. Because the distribution of replication origins (ORIs) is the major determinant of how precisely the replication process is regulated, correctly identifying ORIs is important for understanding DNA replication mechanisms and the regulatory mechanisms of gene expression. In eukaryotes in particular, each gene sequence contains multiple ORIs so that replication can complete in a reasonable period of time. To simplify the identification of eukaryotic ORIs, most existing methods rely on traditional machine learning algorithms and target gene sequences of a fixed length. Consequently, their identification results are unsatisfactory and leave considerable room for improvement. To overcome these limitations, this paper develops sequence segmentation methods and employs the word embedding technique 'Word2vec' to convert gene sequences into word vectors, thereby capturing the inner correlations of gene sequences of different lengths. A deep learning framework for the ORI identification task is then constructed from a convolutional neural network with a...
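The abstract does not say which segmentation scheme is used, so the following is only a minimal sketch of one common choice: splitting a DNA sequence into k-mer "words" so that each sequence becomes a "sentence" a word embedding model can consume. The function name `segment_kmers` and the parameters `k` and `stride` are hypothetical, chosen for illustration and not taken from the paper.

```python
from typing import List


def segment_kmers(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into k-mer "words".

    stride=1 yields overlapping k-mers; stride=k yields non-overlapping ones.
    Both k and stride are illustrative assumptions, not values from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # Slide a window of width k across the sequence in steps of `stride`;
    # a sequence shorter than k produces an empty list.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# Sequences of different lengths become "sentences" with different word counts,
# which is what lets an embedding model handle variable-length input.
corpus = [segment_kmers(s) for s in ["ATGCGT", "ATGCGTTAGC"]]
```

A list of token lists such as `corpus` is the usual input shape for Word2vec implementations. The learned vectors for each token could then be stacked into a matrix and passed to a convolutional network, though the exact pipeline used in the paper is not recoverable from the truncated abstract.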



Methods Mentioned

feature extraction

Related Concepts

Base Sequence
DNA Replication
Biological Neural Networks
Comparative Genomic Analysis
Protein Expression
Protein, Organized by Origin
Replication-Associated Process
