Tag Other File by Name

TagOtherFileByName #

Mark reads based on wether names are present in another file.

[[step]]
    action = "TagOtherFileByName"
    segment = "read1" # which segment's name are we using
    label = "present_in_other"
    filename = "names.fastq" # Can read fastq (also compressed), or sam/bam files
    false_positive_rate = 0.01 # false positive rate (0..1)
    seed = 42 # seed for randomness
    ignore_unaligned = false # in case of BAM/SAM, whether to ignore unaligned reads
    fastq_readname_end_char = " " # (optional) char (byte value) at which to cut input fastq read names before comparing. If not set, no cutting is done.
    reference_readname_end_char = "/" # (optional) char (byte value) at which to cut reference read names before storing them.

This step marks reads by comparing their names against names from another file.

fastq_readname_end_chars trims the incoming fastq read names before lookup, while reference_readname_end_chars controls how names from the reference file are stored. Leave either field unset to keep names unchanged.

With false_positive_rate > 0, uses a cuckoo filter, otherwise an exact hash set.

Please note our remarks about cuckoo filters.