Extract IUPAC

ExtractIUPAC

[[step]]
    action = "ExtractIUPAC"
    out_label = "mytag"
    anchor = 'Left' # Left | Right | Anywhere
    max_anchor_distance = 0 # Optional, default = 0. Allow hits within the first/last n bp.
                            # Only valid if anchor is not 'Anywhere'.
    search = "CTN" # May also be a list ["CTN", "GAN", ...]
    segment = 'read1' # Any of your input segments
    max_mismatches = 0 # Required. How many mismatches are allowed.
    on_tie = 'Earliest' # Earliest | LeftMost | RightMost
                        # Decides what happens when multiple search queries hit.

Search for and extract a sequence from the read, defined by one or more IUPAC string(s).

The anchor decides where we search.

  • Anywhere: Search for the IUPAC string everywhere in the segment and return the leftmost hit.
  • Left: Search at the beginning. The leftmost position of the hit is at most max_anchor_distance.
  • Right: Search at the end. The hit ends at most max_anchor_distance bp before the end of the segment, i.e. its rightmost position is at least segment.len() - max_anchor_distance.

max_anchor_distance must not be set if anchor is ‘Anywhere’.

Ambiguous matches (e.g. query ‘Y’ matching ‘C’) do not count as mismatches, but as full matches.
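
As an illustration, here is a minimal configuration sketch using an ambiguity code. The query and the reads named in the comments are made-up examples, not taken from the tool's test data; the IUPAC code Y stands for C or T.

```toml
[[step]]
    action = "ExtractIUPAC"
    out_label = "mytag"
    anchor = 'Left'
    search = "CTY" # Y = C or T
    segment = 'read1'
    max_mismatches = 0
    # By the rule above, a read starting with CTC or CTT is a full match (0 mismatches).
    # A read starting with CTA or CTG has one mismatch and should not hit at max_mismatches = 0.
```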

If you set max_anchor_distance > 0, the leftmost hit (anchor=‘Left’) or the rightmost hit (anchor=‘Right’) for any given query is preferred.

This is independent of the on_tie setting which handles multiple queries.
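
A hedged sketch of an anchored search with some slack. The distance of 2 and the read in the comments are illustrative assumptions, and positions are assumed to be counted from 0.

```toml
[[step]]
    action = "ExtractIUPAC"
    out_label = "mytag"
    anchor = 'Left'
    max_anchor_distance = 2 # the hit may start at position 0, 1 or 2
    search = "CTN"
    segment = 'read1'
    max_mismatches = 0
    # Hypothetical read ACTGCTA: CTN matches at position 1 (CTG) and at position 4 (CTA).
    # Only position 1 lies within max_anchor_distance, so that hit would be expected.
```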

Multiple search queries

With multiple search queries, more than one of them may be present in your read.

If anchor is ‘Left’ or ‘Right’ and max_anchor_distance is 0, the first query in the search list that hits within max_mismatches is reported.

In all other cases, this is also the default behaviour.

This may be changed by setting on_tie to ‘LeftMost’ or ‘RightMost’, which instead takes the leftmost or rightmost occurrence.

No preference is given to hits with a lower Hamming distance.
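
To make the tie handling concrete, here is a small configuration sketch with two queries. The read in the comments is hypothetical, and the described outcomes are simply our reading of the rules above, not verified output.

```toml
[[step]]
    action = "ExtractIUPAC"
    out_label = "mytag"
    anchor = 'Anywhere'
    search = ["CTN", "GAN"]
    segment = 'read1'
    max_mismatches = 0
    on_tie = 'LeftMost'
    # Hypothetical read GAACTG: GAN matches at position 0 (GAA), CTN at position 3 (CTG).
    # on_tie = 'LeftMost'  -> the GAN hit at position 0
    # on_tie = 'RightMost' -> the CTN hit at position 3
    # default              -> CTN, the first query in the list that hits
```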