Store Tag in Sequence

StoreTagInSequence #

Insert a tag’s sequence or string value into a read at a position defined by a location tag.

[[step]]
    action = "StoreTagInSequence"
    in_value_label    = "mytag"    # location or string tag to insert
    in_position_label = "mytag2"   # location tag defining where to insert
    anchor = "Start"               # "Start"/"left" or "End"/"right"

Parameters #

| Parameter | Type | Required | Description |
|---|---|---|---|
| in_value_label | location or string tag | yes | Tag whose sequence is inserted into the read |
| in_position_label | location tag | yes | Tag that defines the insertion position |
| anchor | "Start" / "left" / "End" / "right" | yes | Whether to insert before the leftmost position (Start) or after the rightmost end (End) of the position tag |

How it works #

in_position_label is a location tag pointing to a region (or multiple regions) in a read. The insertion point is derived from the anchor:

  • Start / left: insert before the leftmost start coordinate across all regions.
  • End / right: insert after the rightmost end coordinate (start + len) across all regions.
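
The anchor rule can be sketched in Python as follows. This is an illustrative model only, not the tool's implementation: regions are assumed to be (start, length) pairs, matching the [start, len] notation used in the example on this page, and the function name insertion_point is hypothetical.

```python
def insertion_point(regions, anchor):
    """Return the 0-based index at which bases would be inserted.

    regions: list of (start, length) pairs (assumed representation).
    anchor:  "Start"/"left" or "End"/"right".
    Returns None when there are no regions to derive a position from.
    """
    if not regions:
        return None
    if anchor in ("Start", "left"):
        # Leftmost start coordinate across all regions.
        return min(start for start, _ in regions)
    if anchor in ("End", "right"):
        # Rightmost end coordinate (start + length) across all regions.
        return max(start + length for start, length in regions)
    raise ValueError(f"unknown anchor: {anchor!r}")
```

For a single region [3,3], this yields 3 for Start and 6 for End. With two regions [3,3] and [10,2], End yields 12.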

The bytes inserted come from in_value_label:

  • If it is a location tag, its sequences are joined (without spacer) and used.
  • If it is a string tag, the string value is used directly.

Quality scores for the inserted bases are set to ~ (ASCII 126, which is Q93 in Phred+33 encoding, the maximum Sanger quality).
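
As a rough sketch of the two rules above (how the inserted bytes are chosen and how their qualities are filled in), the following Python models a read as a pair of strings. The helper names inserted_bases and splice are hypothetical, and representing a location tag's value as a list of strings is an assumption made for illustration.

```python
def inserted_bases(value):
    """Return the bases to insert for a tag value.

    value: None (missing tag), a str (string tag), or a list of str
           (the sequences of a location tag's regions; assumed shape).
    """
    if value is None:
        return ""
    if isinstance(value, str):
        return value            # string tag: used directly
    return "".join(value)       # location tag: joined without spacer


def splice(seq, qual, pos, bases):
    """Insert bases at index pos; inserted qualities are '~' (Q93 in Phred+33)."""
    new_seq = seq[:pos] + bases + seq[pos:]
    new_qual = qual[:pos] + "~" * len(bases) + qual[pos:]
    return new_seq, new_qual
```

For instance, splice("ACGT", "IIII", 2, inserted_bases(["TT", "A"])) returns ("ACTTAGT", "II~~~II").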

Location tag adjustment #

After insertion, all location tags on the same segment are updated:

  • Locations after the insertion point are shifted forward by the number of inserted bases.
  • Locations that straddle the insertion point (start before, end after) have their coordinate information removed (sequence data is preserved).
  • Locations before the insertion point are unchanged.
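
The three cases can be modelled with a small Python function. This is a sketch under assumptions: a location is a single (start, length) pair, None stands for "coordinates removed", and the boundary handling (a location that starts exactly at the insertion point is shifted, one that ends exactly there is unchanged) is a guess for illustration rather than documented behaviour. The name adjust_location is hypothetical.

```python
def adjust_location(location, pos, n_inserted):
    """Return the updated (start, length), or None if coordinates are dropped.

    location:   (start, length) pair, or None if already without coordinates.
    pos:        0-based insertion index.
    n_inserted: number of bases inserted at pos.
    """
    if location is None:
        return None
    start, length = location
    end = start + length
    if end <= pos:
        # Entirely before the insertion point: unchanged.
        return (start, length)
    if start >= pos:
        # At or after the insertion point (assumed boundary rule): shifted forward.
        return (start + n_inserted, length)
    # Starts before and ends after the insertion point: coordinates removed.
    return None
```

With three bases inserted at position 3, a location [0,3] stays [0,3], a location [6,3] becomes [9,3], and a location [2,3] loses its coordinates.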

Behaviour when tags are missing #

If in_value_label is Missing or produces an empty sequence, or if in_position_label carries no location information, the read is left unchanged and no error is raised.
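
The no-op behaviour can be illustrated with a minimal end-to-end Python model of the step. Everything here is hypothetical scaffolding: the function name, the use of None for a missing tag, and the (start, length) region pairs are assumptions, not the tool's actual data structures.

```python
def store_tag_in_sequence(seq, qual, value, regions, anchor):
    """Model of the step: return (seq, qual), unchanged if nothing can be inserted.

    value:   bases to insert; None models a Missing tag, "" an empty sequence.
    regions: list of (start, length) pairs; None or [] models a position tag
             without location information.
    """
    if not value or not regions:
        # Missing or empty value, or no location information: leave the read as is.
        return seq, qual
    if anchor in ("Start", "left"):
        pos = min(start for start, _ in regions)
    else:
        pos = max(start + length for start, length in regions)
    return (
        seq[:pos] + value + seq[pos:],
        qual[:pos] + "~" * len(value) + qual[pos:],
    )
```

Calling it with value=None, value="", or regions=None returns the input read untouched, without raising.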

Example #

Given read1 = AAACCCGGG (quality IIIIIIIII), with:

  • val_tag extracting bases 0–2 (AAA, location [start, len] = [0,3])
  • pos_tag extracting bases 3–5 (CCC, location [start, len] = [3,3])

# ignore_in_test
[[step]]
    action = "StoreTagInSequence"
    in_value_label    = "val_tag"
    in_position_label = "pos_tag"
    anchor = "Start"

Result: AAA inserted before position 3 → AAAAAACCCGGG (quality III~~~IIIIII).

With anchor = "End":

Result: AAA inserted after position 6 → AAACCCAAAGGG (quality IIIIII~~~III).