Changelog

v0.9.0 #

General #

  • Renamed project from mbf-fastq-processor to fastqrab
  • Much improved error messages pinpointing exactly what needs to change.
  • Virtual Tags introduced
  • stdout output on any (infix-based) output step with infix='--stdout--'
  • Numeric tags now support range checking
  • Added Günther to the docs.

New limitations #

  • Segment count limited to 255

New steps #

Step renames/merges #

Step changes #

  • Conditional Swap/ReverseComplement variants merged into the main steps

  • if tag support on 8 core editing steps for conditional read editing

  • min_length added to ExtractRegions

  • (optional) Quality checking added to Prefix / Postfix

  • Tag value replacement within regular expressions on ExtractRegex

  • Swap - Fixed a correctness bug.

  • Progress stdout handling unified, reports core configuration and time taken by finalization of steps

  • ExtractRegex from tags

  • StoreTagsInTable - optionally do not output read names.

  • StoreTagInComment now supports multiple tags at once

  • ReverseComplement can now be applied to tags

  • CalcBaseContent outputs 0..=1, not 0..=100

  • More steps support ‘relative’ as an option (CalcGCContent, CalcNContent, CalcQualifiedBases)

ExtractIUPAC #

  • Multiple queries in one step, max_mismatches now required, large performance improvements
  • Tie handling for multiple queries
  • Can now also search within a (small) distance from start/end.
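
To illustrate what max_mismatches means for an IUPAC query, here is a minimal Rust sketch. It is not the fastqrab implementation: the names `iupac_mask`, `base_matches` and `find_iupac` are hypothetical, it handles a single query, it returns only the leftmost hit (no tie handling), and it assumes that an N in the read never matches.

```rust
/// Map an IUPAC nucleotide code to a 4-bit mask over A, C, G, T.
/// Unknown characters map to 0 and therefore match nothing.
fn iupac_mask(code: u8) -> u8 {
    match code.to_ascii_uppercase() {
        b'A' => 0b0001,
        b'C' => 0b0010,
        b'G' => 0b0100,
        b'T' | b'U' => 0b1000,
        b'R' => 0b0101, // A or G
        b'Y' => 0b1010, // C or T
        b'S' => 0b0110, // C or G
        b'W' => 0b1001, // A or T
        b'K' => 0b1100, // G or T
        b'M' => 0b0011, // A or C
        b'B' => 0b1110, // not A
        b'D' => 0b1101, // not C
        b'H' => 0b1011, // not G
        b'V' => 0b0111, // not T
        b'N' => 0b1111, // any base
        _ => 0,
    }
}

/// True if the concrete read base is allowed by the IUPAC query code.
/// Assumption of this sketch: only A/C/G/T in the read can match.
fn base_matches(query: u8, base: u8) -> bool {
    let concrete: u8 = match base.to_ascii_uppercase() {
        b'A' => 0b0001,
        b'C' => 0b0010,
        b'G' => 0b0100,
        b'T' => 0b1000,
        _ => 0,
    };
    (iupac_mask(query) & concrete) != 0
}

/// Leftmost position where `query` matches `read` with at most
/// `max_mismatches` mismatching positions, or None if there is none.
fn find_iupac(read: &[u8], query: &[u8], max_mismatches: usize) -> Option<usize> {
    if query.is_empty() || query.len() > read.len() {
        return None;
    }
    read.windows(query.len()).position(|window| {
        let mismatches = window
            .iter()
            .zip(query)
            .filter(|(b, q)| !base_matches(**q, **b))
            .count();
        mismatches <= max_mismatches
    })
}
```

For example, `find_iupac(b"ACGTACGT", b"GTR", 0)` finds the query at offset 2, because R accepts the A that follows GT. This naive scan costs O(read length × query length) per query; it only demonstrates the matching rule, not the optimized matcher mentioned under Performance.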

HammingCorrect #

  • Can now assign either barcodes or barcode labels
  • Uses the hamming-resonate crate.
  • Gained support for explicit on_no_match and on_tie handling in multiple variations, with or without name-prefix based equivalence classes.
  • Now runs multi-core
  • on_tie=ByEditProbability supports reading the barcode distribution from a previously generated report.
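
The difference between a unique match, a tie and no match can be sketched in a few lines of Rust. This is an illustrative sketch only, not the step's actual code or the hamming-resonate API: the `Correction` enum, `hamming` and `correct` are hypothetical names, and how a caller reacts to `Tie` or `NoMatch` (the on_tie and on_no_match policies) is left out.

```rust
/// Outcome of trying to correct one observed barcode (hypothetical type).
#[derive(Debug, PartialEq)]
enum Correction<'a> {
    /// Exactly one whitelist barcode is closest and within the distance limit.
    Unique(&'a [u8]),
    /// Two or more whitelist barcodes share the smallest distance.
    Tie,
    /// No whitelist barcode is within the distance limit.
    NoMatch,
}

/// Hamming distance; None if the lengths differ.
fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

/// Find the closest whitelist barcode within `max_distance`.
fn correct<'a>(observed: &[u8], whitelist: &[&'a [u8]], max_distance: usize) -> Correction<'a> {
    let mut best: Option<(usize, &'a [u8])> = None;
    let mut tied = false;
    for &candidate in whitelist {
        let Some(d) = hamming(observed, candidate) else { continue };
        if d > max_distance {
            continue;
        }
        match best {
            // Worse than the current best: ignore.
            Some((best_d, _)) if d > best_d => {}
            // Equal to the current best: remember the tie.
            Some((best_d, _)) if d == best_d => tied = true,
            // First candidate, or strictly better: replace and clear the tie.
            _ => {
                best = Some((d, candidate));
                tied = false;
            }
        }
    }
    match best {
        None => Correction::NoMatch,
        Some(_) if tied => Correction::Tie,
        Some((_, barcode)) => Correction::Unique(barcode),
    }
}
```

With the whitelist AAAA, CCCC, AATT and a maximum distance of 1, the observed barcode AAAT is one mismatch away from both AAAA and AATT, so the sketch reports a tie. A tie-breaking policy such as ByEditProbability would take over at that point.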

Output changes #

  • Tag histogram reports; demultiplex data nested under ‘demultiplex’ key in reports

BAM output #

  • Store tags in BAM tags
  • Store one tag in BAM reference field.
  • Corrected the error message shown when the read length exceeds the maximum BAM read length
  • Better error messages when read names can’t be represented in BAM (see also ValidateReadNamesPrintable)

Performance #

  • Redesigned multi-core engine: workpool based, better controllable, better documented
  • Default thread count now uses all available CPU cores
  • Rapidgzip for parallel gzip decompression, now also for FASTA; auto-detected; included in Nix builds
  • Arena-based parsers for FASTA and BAM
  • Parallel BAM decoding
  • Multicore EvalExpression, ReportCountOligos, ReportLengthDistribution
  • Prefix/Postfix massively improved performance
  • Merge base statistics ~80% faster
  • ConcatTags ~15% faster
  • IUPAC matching: replaced Sassy with optimized pure-Rust implementation
  • Optimized SwapConditional, TrimAtTag, StoreTagBackInSequence, FilterReservoirSample, Rename
  • Dynamic cuckoo filter sizing; initial_filter_capacity documented; read count estimation
  • ExtractLongestPolyX no longer O(n^2)
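
As a rough illustration of a linear-time scan for the longest homopolymer run, consider the Rust sketch below. It assumes exact runs of a single given base; the real ExtractLongestPolyX step may accept other options (such as mismatches) that this sketch ignores, and `longest_run` is a hypothetical name.

```rust
/// Return (start, length) of the longest run of `base` in `seq`,
/// or None if `base` does not occur. On equal lengths the first run wins.
/// Each position is visited once, so the scan is O(n).
fn longest_run(seq: &[u8], base: u8) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    let mut run_start = 0;
    let mut run_len = 0;
    for (i, &b) in seq.iter().enumerate() {
        if b == base {
            if run_len == 0 {
                run_start = i; // a new run begins here
            }
            run_len += 1;
            // Only a strictly longer run replaces the current best.
            if best.map_or(true, |(_, len)| run_len > len) {
                best = Some((run_start, run_len));
            }
        } else {
            run_len = 0; // run interrupted
        }
    }
    best
}
```

For the sequence AATAAAGA and base A, the sketch reports the run starting at offset 3 with length 3.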

Other #

  • verify command: validates a pipeline produces expected output; auto-detects config, captures stdout/stderr
  • Configuration TOML can now be read from stdin (incompatible with reading reads from stdin).
  • Barcode sections can now be read from files (FASTA, txt)
  • Shell autocompletion for bash, fish, and zsh
  • benchmark mode and per-step benchmark harness
  • template command: shows help on error
  • LLM configuration guide and template.toml rewrite for LLM-assisted config generation
  • TagLabel type: all tag names are now strongly typed; duplicate tag names produce a clear error
  • IndexMap replaces HashMap everywhere for deterministic output order
  • unwrap() replaced with expect() throughout; clippy::unwrap_used now denied
  • MSRV pinned to match flake.nix Rust version
  • Security: upgraded bytes crate (GHSA-434x-w66g-qw3r)
  • Upgraded dependencies
  • Reports: Prevent script injection.
  • Report’s count_oligos now supports named oligos.

Documentation #

Correctness #

  • The parser has been fuzzed.
  • CI now tests docker images
  • Barcode sections now must be used (or removed).
  • Removed unsafe in parallel gzp writing, interactive mode

Bug fixes #

  • Fixed fastp merge algorithm (replaced with direct port of the reference C++ algorithm)
  • Fixed invalid FASTQ detection when comment line doesn’t start with ‘+’
  • Fixed Windows newline detection edge case in parser
  • Fixed Local-Local FastQElement swap
  • Fixed demultiplex & fragment count in reports
  • Fixed Head short-circuit (broken by SpotCheckReadPairing)
  • Fixed ignore_unaligned → now include_mapped / include_unmapped
  • Fixed barcode overlapping multiple matches
  • Fixed a bug in [ExtractIUPACSuffix](/fastqrab/v0.9.0/docs/reference/tag-steps/extract/ExtractIUPACSuffix/)
  • Fixed a potential panic in ExtractLongestPolyX

v0.8.1 #

  • GitHub release workflow test

v0.8.0 #

  • Versioned documentation
  • First revision where every major feature is in place. Changelog starts here.