Tag Other File

TagOtherFile

Marks reads based on whether they are present in another file.

Supports comparing by read sequence, read name, and tags.

[[step]]
    action = "TagOtherFile"
    source = 'read1' # <segment>, name:<segment> or tag:<tag_name>
    out_label = "present_in_other_file"
    filename = "names.fastq" # Can read fastq (also compressed), or SAM/BAM, or fasta files
    false_positive_rate = 0.01 # false positive rate (0..1)
    seed = 42 # seed for randomness
    include_mapped = true # in case of BAM/SAM, whether to include aligned reads
    include_unmapped = true # in case of BAM/SAM, whether to include unaligned reads
    # other_read_name_end_character = " " # in name: mode, cut the other file's read names at this character

This step annotates reads by comparing them to another file.

With false_positive_rate > 0, the step uses a cuckoo filter, which may occasionally report a read as present when it is not; otherwise it uses an exact hash set. Please note our remarks about cuckoo filters.
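
As a minimal sketch, assuming a value of 0.0 is accepted for false_positive_rate, the step could be configured for exact matching as follows. The option names are taken from the example above; the label exact_match is only illustrative.

```toml
[[step]]
    action = "TagOtherFile"
    source = 'read1' # compare by the sequence of read1
    out_label = "exact_match" # illustrative label, not a fixed name
    filename = "names.fastq"
    false_positive_rate = 0.0 # not > 0, so an exact hash set is used instead of a cuckoo filter
    seed = 42 # kept from the example above; assumed to be harmless in exact mode
```

An exact set rules out false positives, but it has to keep every entry of the other file in memory.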

We can compare reads based on sequences, names, or extracted sequences (= string and location tags), selected via the source option.

In name mode, our reads' names are cut at input.options.read_name_end_character. The other file's read names are cut only if other_read_name_end_character is set.
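
A name-mode configuration might look like the sketch below. It assumes the other file's read names carry a comment after a space that should be ignored; the label name_in_other_file is illustrative.

```toml
[[step]]
    action = "TagOtherFile"
    source = 'name:read1' # compare by the name of read1
    out_label = "name_in_other_file" # illustrative label
    filename = "names.fastq"
    false_positive_rate = 0.01
    seed = 42
    other_read_name_end_character = " " # cut the other file's read names at a space
```

Without other_read_name_end_character, the other file's names are compared uncut, so a trailing comment in them would presumably prevent a match.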

In tag mode, missing tags are treated as 'not present in other file'.
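
For tag mode, a configuration could be sketched as follows. Both the tag name barcode and the tag:<tag_name> spelling of the source are assumptions: barcode stands for a string or location tag created by an earlier extraction step that is not shown here.

```toml
[[step]]
    action = "TagOtherFile"
    source = 'tag:barcode' # hypothetical tag set by an earlier step
    out_label = "barcode_in_other_file" # illustrative label
    filename = "names.fastq"
    false_positive_rate = 0.01
    seed = 42
```

Reads for which the earlier step found no barcode would then be marked as not present.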