TagOtherFile #
Marks reads based on whether ’they’ are present in another file.
Supports comparing by read sequence, read name, and tags.
[[step]]
action = "TagOtherFile"
source = 'read1' # <segment>, name:<segment> or tag<tag_name>
out_label = "present_in_other_file"
filename = "names.fastq" # Can read fastq (also compressed), or SAM/BAM, or fasta files
false_positive_rate = 0.01 # false positive rate (0..1)
seed = 42 # seed for randomness
include_mapped = true # in case of BAM/SAM, whether to include aligned reads
include_unmapped = true # in case of BAM/SAM, whether to include unaligned reads
# other_read_name_end_character " " # in name: mode, Cut the other files read names at this character
This step annotates reads by comparing them to another file.
With false_positive_rate > 0, uses a cuckoo filter, otherwise an exact hash set. Please note our remarks about cuckoo filters.
We can compare reads based on sequencs, names, or extracted sequences (=string & location tags), by using source concept.
In name mode, our read’s names are cut input.options.read_name_end_characterat The other files read names are cut if other_read_name_end_character is set.
In tag mode, missing tags are treated as ’not present in other file'.