Nov 19, 2019

SPDI: data model for variants and applications at NCBI

J. Bradley Holmes, Brandi L. Kattman


Normalizing sequence variants on a reference, projecting them across congruent sequences and aggregating their diverse representations are critical to the elucidation of the genetic basis of disease and biological function. Inconsistent representation of variants among variant callers, local databases and tools results in discrepancies that complicate analysis. NCBI's genetic variation resources, dbSNP and ClinVar, require a robust, scalable set of principles to manage asserted sequence variants. The SPDI data model defines variants as a sequence of four attributes: sequence, position, deletion and insertion, and can be applied to nucleotide and protein variants. NCBI web services convert representations among HGVS, VCF and SPDI and provide two functions to aggregate variants. One, based on the NCBI Variant Overprecision Correction Algorithm, returns a unique, normalized representation termed the 'Contextual Allele'. The SPDI data model, with its four operations, defines exactly the reference subsequence affected by the variant, even in repeat regions, such as homopolymer and other sequence repeats. The second function projects variants across congruent sequences and depends on an alignment dataset of non-assembly NCBI RefSeq se...
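
As an illustration of the four attributes named in the abstract, the following is a minimal Python sketch and not NCBI's implementation. The `Spdi` class, the `parse_spdi` and `apply_spdi` helpers, the colon-separated string form with a 0-based position, and the toy sequence name `toy_seq` are assumptions made for this example; the abstract itself does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    """Hypothetical container for the four SPDI attributes."""
    sequence: str   # reference sequence identifier
    position: int   # assumed 0-based start of the deleted subsequence
    deletion: str   # reference bases removed (may be empty)
    insertion: str  # bases inserted in their place (may be empty)


def parse_spdi(text: str) -> Spdi:
    """Parse an assumed 'sequence:position:deletion:insertion' string."""
    sequence, position, deletion, insertion = text.split(":")
    return Spdi(sequence, int(position), deletion, insertion)


def apply_spdi(reference: str, variant: Spdi) -> str:
    """Return the reference with the deletion replaced by the insertion."""
    start = variant.position
    end = start + len(variant.deletion)
    # The deleted bases must match the reference at the stated position.
    if reference[start:end] != variant.deletion:
        raise ValueError("deletion does not match the reference at this position")
    return reference[:start] + variant.insertion + reference[end:]


# Toy reference containing a homopolymer run (TTT) at positions 3 to 5.
toy_reference = "ACGTTTGA"

# Deleting any single T of the run yields the same resulting sequence.
results = {
    apply_spdi(toy_reference, parse_spdi(f"toy_seq:{pos}:T:"))
    for pos in (3, 4, 5)
}
print(results)
```

The three toy deletions differ only in position yet produce one and the same sequence. This is the kind of ambiguity in repeat regions that the abstract says a normalized representation is meant to resolve, although the sketch does not attempt the normalization itself.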
