Source

Source #

When a step refers to a ‘source’ (instead of a segment), it means the step can read from multiple types of data: segment sequences, segment names, or tag values.

Overview #

The source parameter generalizes the segment parameter, allowing steps to operate on different kinds of string data within a fragment. This flexibility enables advanced workflows like extracting patterns from read names, processing tag-derived sequences, or combining multiple data sources.

Source Types #

Segment Source #

Reads the sequence from a specific segment.

Syntax: Just the segment name (e.g., "read1", "index1")

Example:

[[step]]
    action = "ExtractIUPAC"
    segment = "read1"              # Read from read1 segment sequence
    search = "AGATCGGAAGAGC"
    anchor = "anywhere"
    out_label = "adapter"
    max_mismatches = 1

This is functionally identical to using segment = "read1" in steps that support both parameters.

Name Source #

Reads the read name (FASTQ header) from a specific segment.

Syntax: "name:<segment>" (e.g., "name:read1", "name:index1")

Example:

[[step]]
    action = "ExtractRegex"
    source = "name:read1"         # Read from read1's name/header
    pattern = "BC:([ACGT]+)"      # Extract barcode from name
    out_label = "barcode_from_name"

Use cases:

  • Extracting barcodes embedded in read names by upstream tools
  • Parsing instrument-specific metadata from headers
  • Extracting UMIs added to read names during library prep

Tag Source #

Reads the sequence value from a previously defined tag.

Syntax: "tag:<label>" (e.g., "tag:barcode", "tag:adapter")

Requirements:

  • The tag must be a location tag or sequence tag
  • The tag must be defined earlier in the pipeline

Example:

[[step]]
    action = "ExtractIUPAC"
    segment = "read1"    # Search within the extracted barcode
    max_mismatches = 0
    search = "NNNATCG"
    out_label = "hit"
    anchor = "anywhere"

[[step]]
    action = "ExtractRegion"
    source = "tag:hit"
    start = 0
    length = 3
    anchor = "Start"
    out_label = "barcode"     # Extract first 3bp

Use cases:

  • Searching within previously extracted regions
  • Processing tag-derived sequences without affecting original segments
  • Multi-stage pattern matching (extract large region, then search within it)

Segment vs Source #

Many steps accept either segment or source, but not both:

ParameterAcceptsExample Values
segmentSegment names only"read1", "index1", "All"
sourceSegments, names, or tags"read1", "name:read1", "tag:barcode"

When to use segment:

  • Simple operations on segment sequences
  • Steps that modify segment data in place
  • When you only work with segment sequences

When to use source:

  • Extracting from read names
  • Processing tag-derived sequences
  • Flexible workflows that might switch data sources

Supported Steps #

Not all steps support source. Common steps that do:

Limitations #

  • Tag sources must be location or sequence tags (not numeric or boolean)
  • Name sources access the entire FASTQ header line (excluding the leading @)

See Also #