Benchmark Section

Benchmark #

For profiling and benchmarking (individual) steps, mbf-fastq-processor has a special benchmark mode.

This mode focuses on benchmarking the steps, and avoids (most) input and output runtime.

Enable it by adding this TOML section. The output section becomes optional (and ignored) when benchmarking is enabled.

[benchmark]
    enable = true # required to enable benchmark mode
    quiet = false # default. If true, don't output timing information
    molecule_count = 1_000_000

Benchmark mode:

  • disables (regular) output,
  • runs in a temp directory,
  • repeats the first molecule ‘block’ of Options.block_size reads until molecule_count has been exceeded.

The last point means that we spend very little time on reading and decompression (which, without rapidgzip / parallel BAM processing, are the largest parts of the runtime) and focus on the steps. The drawback is that your pipeline sees the same reads over and over, which leads to a different ‘hit’ profile for set-based steps such as duplication counting, TagOtherFileByName, and Demultiplex.
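
To illustrate why a replayed block skews set-based steps, here is a minimal Python sketch. It is not the tool's implementation: `make_block`, `replay_block` and `duplicate_fraction` are hypothetical helpers, and the stopping rule (stop once at least `molecule_count` reads have been emitted) is an assumption about how "exceeded" is handled.

```python
from collections import Counter


def make_block(block_size, read_length=12):
    """Build block_size distinct sequences by writing each index in base 4."""
    block = []
    for index in range(block_size):
        bases = []
        for _ in range(read_length):
            bases.append("ACGT"[index % 4])
            index //= 4
        block.append("".join(bases))
    return block


def replay_block(block, molecule_count):
    """Yield the same block again and again (assumed stopping rule:
    stop once at least molecule_count reads have been emitted)."""
    emitted = 0
    while emitted < molecule_count:
        yield from block
        emitted += len(block)


def duplicate_fraction(reads):
    """Fraction of reads whose sequence was already seen earlier."""
    counts = Counter(reads)
    total = sum(counts.values())
    return (total - len(counts)) / total


if __name__ == "__main__":
    block = make_block(1_000)
    replayed = list(replay_block(block, 10_000))
    print("duplicates in one block:   ", duplicate_fraction(block))
    print("duplicates after replaying:", duplicate_fraction(replayed))
```

A block without any duplicates turns into a stream that is mostly duplicates once it is replayed, so timings for duplicate-sensitive steps reflect that artificial profile rather than real data.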

Results #

Parsing, Ryzen AI Max+ 395, 2025 NVMe #

Benchmarking parsing performance with a configuration that produces no output and only reports the read count, using one file of about 44 million 75 bp reads, we observe:

  • FASTQ, gzip, rapidgzip, 5 threads: 11.2 million reads / second

  • FASTQ, gzip, single thread, flate2: 4.7 million reads / second

  • FASTQ, uncompressed: about 6.4 million reads / second

  • BAM, single-threaded: 4.7 million reads / second

  • BAM, multi-threaded, 5 threads: 4.3 million reads / second (slower than single-threaded; the bottleneck is the read-to-FASTQ adaptation)

  • FASTA, uncompressed: 3.4 million reads / second

  • FASTA, gzip, single thread, flate2: 2.8 million reads / second

  • FASTA, gzip, rapidgzip: 4 million reads / second
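
To put these rates in perspective, the following Python sketch converts them into approximate wall-clock parsing times. It assumes exactly 44 million reads for every format (the text only says "about 44 million"), and `seconds_for` is a hypothetical helper, not part of mbf-fastq-processor.

```python
READS = 44_000_000  # assumption: "about 44 million" reads, same count for every format

# Throughput in million reads per second, copied from the list above.
THROUGHPUT_MREADS_PER_S = {
    "FASTQ, gzip, rapidgzip, 5 threads": 11.2,
    "FASTQ, gzip, single thread, flate2": 4.7,
    "FASTQ, uncompressed": 6.4,
    "BAM, single-threaded": 4.7,
    "BAM, multi-threaded, 5 threads": 4.3,
    "FASTA, uncompressed": 3.4,
    "FASTA, gzip, single thread, flate2": 2.8,
    "FASTA, gzip, rapidgzip": 4.0,
}


def seconds_for(mreads_per_second, reads=READS):
    """Wall-clock seconds needed to parse `reads` reads at the given rate."""
    return reads / (mreads_per_second * 1_000_000)


if __name__ == "__main__":
    for name, rate in THROUGHPUT_MREADS_PER_S.items():
        print(f"{name:<36} {seconds_for(rate):6.1f} s")
```

Under these assumptions the whole file parses in roughly 4 to 16 seconds, depending on format and decompressor.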

Per step micro benchmarks #

This section is mostly useful to estimate which steps are fast and which are not.

Measured on a Ryzen AI Max+ 395 using benchmark mode with 12 threads and 10 million single-end reads (10k reads, repeated).

Time (ms)  Step
  2592.60  ExtractIUPACWithIndel
  1944.30  Report_count_oligios
  1918.40  Rename
  1801.40  FilterReservoirSample
  1692.80  ConcatTags
  1186.90  Report_duplicate_count_per_fragment
   908.83  HammingCorrect
   905.40  Demultiplex
   859.17  StoreTagInSequence
   743.61  TrimAtTag
   725.05  QuantifyTag
   704.68  StoreTagLocationInComment
   702.65  MergeReads
   691.08  StoreTagInComment
   684.80  UppercaseTag
   681.20  Report_duplicate_count_per_read
   659.23  ExtractRegex
   620.23  TagDuplicates
   619.42  ConvertRegionsToLength
   531.71  FilterByTag
   526.09  LowercaseTag
   514.62  ExtractRegion
   502.24  ReplaceTagWithLetter
   498.61  ExtractRegions
   477.21  EvalExpression
   388.73  Report_base_statistics
   380.06  ReverseComplement
   352.31  ExtractIUPAC
   304.53  ValidateName
   215.64  ExtractLongestPolyX
   209.48  ExtractLowQualityEnd
   207.28  ValidateSeq
   196.47  ExtractLowQualityStart
   190.81  Postfix
   172.78  SpotCheckReadPairing
   172.69  Swap
   166.42  Prefix
   158.80  ConvertQuality
   158.20  ExtractPolyTail
   158.02  Report_length_distribution
   153.09  CalcComplexity
   153.07  TagOtherFileBySequence
   150.06  CalcGCContent
   149.37  CalcLength
   147.20  TagOtherFileByName
   140.28  ExtractRegionsOfLowQuality
   139.93  CalcQualifiedBases
   138.66  CalcBaseContent
   137.49  CalcExpectedError
   136.57  ExtractIUPACSuffix
   136.45  CalcNCount
   134.07  ValidateQuality
   128.60  FilterSample
   117.46  LowercaseSequence
   114.71  UppercaseSequence
   110.16  CalcKmers
   108.80  Skip
   106.57  FilterEmpty
   104.83  Report_count
   102.38  Progress
   101.75  CutStart
    98.83  Truncate
    96.69  CutEnd
    91.84  FilterByNumericTag
    26.11  Head
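
The table values can be turned into per-read costs with a little arithmetic. The Python sketch below does this for a few rows, assuming each time covers all 10 million reads; `ns_per_read` and `mreads_per_second` are hypothetical helpers. Because the runs used 12 threads, the results are wall-clock figures, not CPU time per read.

```python
TOTAL_READS = 10_000_000  # reads per benchmark run, per the text

# A few rows copied from the table above: step name -> wall-clock time in ms.
STEP_TIMES_MS = {
    "ExtractIUPACWithIndel": 2592.60,
    "Demultiplex": 905.40,
    "ReverseComplement": 380.06,
    "CutStart": 101.75,
    "Head": 26.11,
}


def ns_per_read(time_ms, reads=TOTAL_READS):
    """Average wall-clock nanoseconds spent per read."""
    return time_ms * 1_000_000 / reads


def mreads_per_second(time_ms, reads=TOTAL_READS):
    """Throughput in million reads per second."""
    return reads / (time_ms / 1000) / 1_000_000


if __name__ == "__main__":
    for step, time_ms in STEP_TIMES_MS.items():
        print(f"{step:<24} {ns_per_read(time_ms):8.2f} ns/read "
              f"{mreads_per_second(time_ms):8.2f} M reads/s")
```

For example, 2592.60 ms over 10 million reads works out to about 259 ns per read for ExtractIUPACWithIndel.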