Output Section

The [output] table controls how transformed reads and reporting artefacts are written.

[output]
    prefix = "output"          # required.
    format = "Fastq"            # (optional) output format, defaults to "Fastq"
                                # Valid values are: Fastq, Fasta, Bam and None (for no sequence output)
    compression = "Gzip"        # Uncompressed (alias: Raw) | Gzip | Zstd | None (default: Uncompressed)
    compression_threads = 5        # (optional) number of threads to use for compressing gzip data
    suffix = ".fq.gz"           # optional override; inferred from format when omitted
    compression_level = 6       # gzip: 0-9, zstd: 1-22, bam: 0-9 (BGZF); defaults are gzip=6, zstd=5
    ix_separator = "_"          # optional separator between prefix, infixes, and segments. Defaults to '_'

    report_json = false         # write prefix.json (default: false)
    report_html = true          # write prefix.html (default: false)
    report_timing = false          # write prefix.timing.json (default: false)

    output = ["read1", "read2"] # limit which segments become FASTQ files
    interleave = false          # emit a single interleaved FASTQ
    stdout = false              # stream to stdout instead of files
    chunk_size = 100000         # Write multiple, numbered output files, each a maximum of chunk_size reads/molecules.

    output_hash_uncompressed = false
    output_hash_compressed = false
| Key | Default | Description |
| --- | --- | --- |
| prefix | "output" | Base name for all files produced by the run. |
| format | "Fastq" | Output format. Valid values are: Fastq, Fasta, Bam, and None (for no sequence output). |
| compression | "Uncompressed" | Compression format for read outputs. Valid values are: Gzip, Zstd, Uncompressed (alias: "Raw"). Must not be set when format = "Bam". |
| compression_threads | auto | Number of threads used for gzip compression. See threading. |
| suffix | derived from format | Override the file extension when interoperability with other tooling demands a specific suffix. |
| compression_level | gzip: 6, zstd: 5 | Fine-tune compression effort. Ignored for Raw/None. For Bam, the value maps directly to the BGZF level (0–9). |
| report_json / report_html | false | Toggle the structured (JSON) and interactive (HTML) reports. |
| report_timing | false | Emit a JSON file with detailed timing information for all steps. |
| output | all input segments | Restrict the subset of segments written to disk. Use an empty list to suppress FASTQs while still running steps that depend on fragment data. |
| interleave | false | Generate a single interleaved FASTQ ({prefix}_interleaved.fq*). |
| stdout | false | Write to stdout. Forces uncompressed output (compression = "Raw"). Sets interleave = true if more than one segment is listed in output. |
| output_hash_uncompressed / output_hash_compressed | false | Emit SHA-256 checksums of the uncompressed or compressed output, respectively. |
| ix_separator | "_" | Separator inserted between prefix, any infix (demultiplex labels, inspect names, etc.), and segment names. |
| chunk_size | (unlimited) | Split outputs into multiple files, each containing at most chunk_size reads/molecules. Non-interleaved files count reads; interleaved files count molecules, so mixing interleaved and non-interleaved output yields the same number of files for both. Files are numbered sequentially, e.g. output_read1_0.fq.gz, …. Numbers start at 0 and use the minimum number of base-10 digits needed for alphabetical sorting; already written files are renamed whenever an extra digit is needed. |
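As a hedged illustration of chunk_size, the fragment below splits gzip-compressed output into files of at most one million reads each. The value is an arbitrary choice for this sketch, and the file names in the comments are derived from the numbering rule described above rather than copied from an actual run.

```toml
# Fragment only: a complete configuration also needs its input and step sections.
[output]
    prefix = "output"
    compression = "Gzip"
    output = ["read1", "read2"]
    chunk_size = 1000000        # at most 1,000,000 reads per file

# Naming implied by the numbering rule (assumption, not verified against a run):
#   output_read1_0.fq.gz, output_read1_1.fq.gz, ...
#   output_read2_0.fq.gz, output_read2_1.fq.gz, ...
```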

Generated filenames join these components with ix_separator (default _), e.g. {prefix}_{segment}{suffix}. Interleaving replaces segment with interleaved; demultiplexing adds per-barcode infixes before the segment. Checksums use .uncompressed.sha256 or .compressed.sha256 suffixes.
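The joining rule can be sketched with a non-default separator. The prefix "sample1" is hypothetical, and the names in the comments are inferred from the pattern above, not taken from real output.

```toml
# Fragment only: shows how ix_separator and the checksum switch affect naming.
[output]
    prefix = "sample1"              # hypothetical prefix
    ix_separator = "-"              # replaces the default "_"
    compression = "Gzip"
    output = ["read1", "read2"]
    output_hash_compressed = true

# Names implied by {prefix}{ix_separator}{segment}{suffix} (assumption):
#   sample1-read1.fq.gz
#   sample1-read2.fq.gz
# plus checksum files ending in .compressed.sha256
```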

Compression format and suffix are independent: overriding the suffix will not change the actual compression algorithm.
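A minimal sketch of that independence, assuming a downstream tool that expects the longer .fastq.gz extension: compression decides how the bytes are written, while suffix only changes the file name.

```toml
# Fragment only: suffix and compression are set separately and must be kept in sync by hand.
[output]
    prefix = "output"
    compression = "Gzip"        # determines the actual compression algorithm
    suffix = ".fastq.gz"        # only renames the file; does not affect compression
```

By the same rule, setting suffix = ".fq.gz" together with compression = "Uncompressed" would be expected to yield a file named like a gzip archive whose content is not compressed, so the two keys should be changed together.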

BAM-specific notes

  • format = "Bam" emits an unaligned BAM file using BGZF compression.
  • BAM read names may not contain spaces. If a read's FASTQ name contains a space, the name is truncated at the first space and the remaining text is placed in the “CO” tag.
  • BAM output cannot be streamed to stdout and requires output_hash_uncompressed = false (compressed hashes continue to work).
  • Interleaved writes produce one paired BAM with appropriate SAM flags; per-segment outputs yield independent BAMs.
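The notes above can be combined into one configuration sketch. It uses only keys documented on this page; the chosen values are illustrative, not recommendations.

```toml
# Fragment only: unaligned BAM output following the constraints listed above.
[output]
    prefix = "output"
    format = "Bam"
    # compression is deliberately left unset: it must not be set for BAM
    compression_level = 6               # BGZF level, 0-9
    interleave = true                   # one paired BAM instead of per-segment BAMs
    stdout = false                      # BAM cannot be streamed to stdout
    output_hash_uncompressed = false    # required for BAM
    output_hash_compressed = true       # compressed hashes continue to work
```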

Example output files

As above

The above configuration produces:

  • output_read1.fq.gz # .fq is the default suffix for raw, .fq.gz for gzip
  • output_read2.fq.gz
  • output.html # HTML report

With interleave = true

  • output_interleaved.fq.gz
  • output.html # HTML report
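For reference, a trimmed sketch of the settings that would lead to the interleaved listing; keys not relevant to file naming are omitted here.

```toml
# Fragment only: same prefix, compression and report settings as the main example.
[output]
    prefix = "output"
    compression = "Gzip"
    report_html = true
    output = ["read1", "read2"]
    interleave = true           # the relevant change: one interleaved file instead of two
```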

No sequence output

Set format = "None" or output = [] when you only need reports or tag quantification. A prefix is still required so report files have a stable name.
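A minimal reports-only sketch, using the format = "None" variant; the report file names in the comments follow the prefix.json and prefix.html pattern from the configuration reference above.

```toml
# Fragment only: no sequence files, reports only.
[output]
    prefix = "output"           # still required so the reports have a stable name
    format = "None"             # suppress all sequence output
    report_json = true          # writes output.json
    report_html = true          # writes output.html
```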

See also the Report steps reference for producing summaries, and the Demultiplex documentation for how barcode outputs influence file naming.

Named pipe outputs

Output files may be (preexisting) named pipes (FIFOs).

Overwrite protection

If any output file already exists, mbf-fastq-processor refuses to overwrite it, unless the incompletion marker (see below) is present.

(In-)Completion marker

Every run writes {prefix}.incompleted in the output directory before any other file handles are opened. The file is deleted once processing finishes, so its presence later indicates an interrupted run.

Because the marker predates other outputs, reruns detect its presence and permit overwriting prior artefacts without manual cleanup.

If the process aborts for any reason, the marker stays behind.