Modular non-repeating codes for DNA storage

BioRxiv : the Preprint Server for Biology
Ian Holmes


We describe a strategy for constructing codes for DNA-based information storage by serial composition of weighted finite-state transducers. The resulting state machines can integrate correction of substitution errors; synchronization by interleaving watermark and periodic marker signals; conversion from binary to ternary, quaternary or mixed-radix sequences via an efficient block code; encoding into a DNA sequence that avoids homopolymer, dinucleotide, or trinucleotide runs and other short local repeats; and detection/correction of errors (including local duplications, burst deletions, and substitutions) that are characteristic of DNA sequencing technologies. We present software implementing these codes, available at with simulation results demonstrating that the generated DNA is free of short repeats and can be accurately decoded even in the presence of substitutions, short duplications and deletions.
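As a rough illustration of two of the stages named above (binary-to-ternary conversion, then encoding into DNA without homopolymer runs), the following is a minimal sketch and not the paper's transducer-based software. It assumes a simple fixed-width block code of 6 trits per byte and a rotating code in which each trit selects one of the three nucleotides that differ from the previous one. All function names and the virtual start nucleotide are hypothetical, and the sketch only prevents adjacent identical bases; it does not avoid dinucleotide or trinucleotide repeats, and it performs no error correction or synchronization.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 >= 256, so 6 trits hold one byte


def bytes_to_trits(data: bytes) -> list:
    """Block-encode each byte as 6 base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list) -> bytes:
    """Inverse of bytes_to_trits for trit lists produced by it."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def trits_to_dna(trits: list, start: str = "A") -> str:
    """Map each trit to one of the 3 nucleotides differing from the previous one.

    `start` is a virtual previous nucleotide (an assumption of this sketch);
    it is not emitted.
    """
    prev = start
    out = []
    for t in trits:
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna: str, start: str = "A") -> list:
    """Inverse of trits_to_dna for error-free sequences."""
    prev = start
    trits = []
    for n in dna:
        choices = [x for x in NUCLEOTIDES if x != prev]
        trits.append(choices.index(n))
        prev = n
    return trits


def encode(data: bytes) -> str:
    """Compose the two stages: bytes -> trits -> DNA."""
    return trits_to_dna(bytes_to_trits(data))


def decode(dna: str) -> bytes:
    """Compose the inverse stages: DNA -> trits -> bytes."""
    return trits_to_bytes(dna_to_trits(dna))
```

Chaining `encode` and `decode` as plain function composition mirrors, in a very loose way, the idea of composing stages serially. Each stage here is a deterministic function rather than a weighted finite-state transducer, so the sketch cannot score or correct the duplications, deletions, and substitutions that the abstract describes.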
