[barcodes.*] section
Barcode tables supply the sequence-to-sample-name mappings used by Demultiplex, [HammingCorrect](/fastqrab/v0.9.0/docs/reference/tag-steps/using/HammingCorrect/), and AssignByHalves.
Each table is an independent named dictionary. The name is chosen by the user and referenced from the step that consumes it.
# ignore_in_test
[barcodes.my_barcodes]
AAAAAA = "sample-1"
CCCCCC = "sample-2"
GGGGGG = "sample-3"
The table may appear anywhere in the TOML file; forward and backward references are both valid.
Key format
Keys are DNA sequences using uppercase IUPAC nucleotide codes.
All standard IUPAC ambiguity codes are accepted (e.g. N, R, Y, W, …).
Underscores (_) in keys are ignored, so they can be used as visual separators for readability.
# ignore_in_test
[barcodes.dual_index]
AAAAAA_TTTTTT = "sample-1" # i7 = AAAAAA, i5 = TTTTTT
CCCCCC_GGGGGG = "sample-2"
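The key rules above can be sketched in a few lines of Python. This is an illustration only, not fastqrab's implementation: the function name `normalize_key` is hypothetical, and the accepted alphabet is assumed to be the 15 IUPAC DNA codes (no `U`).

```python
# Illustrative sketch only; not fastqrab's actual code.
# Assumption: the accepted alphabet is the 15 uppercase IUPAC DNA codes.
IUPAC_CODES = set("ACGTRYSWKMBDHVN")


def normalize_key(key: str) -> str:
    """Drop '_' separators and check that only uppercase IUPAC codes remain."""
    sequence = key.replace("_", "")
    invalid = sorted(set(sequence) - IUPAC_CODES)
    if not sequence or invalid:
        raise ValueError(f"invalid barcode key {key!r}: {invalid}")
    return sequence


print(normalize_key("AAAAAA_TTTTTT"))  # AAAAAATTTTTT
```

Under these assumptions, `AAAAAA_TTTTTT` and `AAAAAATTTTTT` describe the same 12-base barcode.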
Values
Values are arbitrary sample-name strings that appear in the output file names
produced by Demultiplex.
The name no-barcode is reserved and will be rejected.
Multiple keys may map to the same value (barcode aliases):
# ignore_in_test
[barcodes.my_barcodes]
AAAAAA = "sample-1"
AAAAAC = "sample-1" # treated identically to AAAAAA
CCCCCC = "sample-2"
Reading from file
# ignore_in_test
[barcodes.my_barcodes]
# read_name_comment_character is optional
from_file = { filename = "barcodes.fasta", read_name_comment_character = " " }
Barcodes can be read from a FASTA / FASTQ / BAM / txt file.
If read_name_comment_character is provided, read names are cut off at that character. Defaults to no truncation.
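As a rough illustration of the truncation described above, the following Python sketch cuts a read name at a comment character. It is not fastqrab's code: the function name is hypothetical, the example read name is made up, and cutting at the first occurrence of the character is an assumption.

```python
# Illustrative sketch only; not fastqrab's actual code.
from typing import Optional


def truncate_read_name(name: str, comment_character: Optional[str] = None) -> str:
    """Keep the part of the read name before the comment character."""
    if not comment_character:
        return name  # default: no truncation
    # Assumption: the name is cut at the FIRST occurrence of the character.
    return name.split(comment_character, 1)[0]


print(truncate_read_name("sample-1 lane=3", " "))  # sample-1
print(truncate_read_name("sample-1 lane=3"))       # sample-1 lane=3
```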
Note that this file is considered part of the configuration:
fastqrab validate will fail if it is not present.
Txt files are detected automatically: they have ".txt" in their filename and contain one barcode per line. All lines must have the same length. They may be compressed (gzip, zstd). Each barcode is used as its own label, which makes this format better suited to Hamming correction than to sample assignment.
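To make the txt format concrete, here is a minimal Python sketch of a reader for it. It is an illustration under stated assumptions, not fastqrab's parser: the function name `read_txt_barcodes` is hypothetical, blank lines are skipped (an assumption), and only plain and gzip-compressed files are handled (zstd is left out to keep the sketch free of third-party dependencies).

```python
# Illustrative sketch only; not fastqrab's actual parser.
import gzip
import os
import tempfile


def read_txt_barcodes(path: str) -> dict[str, str]:
    """Read one barcode per line; each barcode becomes its own label."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Assumption: surrounding whitespace and blank lines are ignored.
        barcodes = [line.strip() for line in handle if line.strip()]
    lengths = {len(barcode) for barcode in barcodes}
    if len(lengths) > 1:
        raise ValueError(f"lines differ in length: {sorted(lengths)}")
    return {barcode: barcode for barcode in barcodes}


# Small demonstration with a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "barcodes.txt")
    with open(path, "w") as handle:
        handle.write("AAAAAA\nCCCCCC\n")
    print(read_txt_barcodes(path))  # {'AAAAAA': 'AAAAAA', 'CCCCCC': 'CCCCCC'}
```

Because label and barcode are identical, the resulting table says which sequences are valid but not which sample they belong to.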
Constraints
| Constraint | Detail |
|---|---|
| Non-empty | At least one entry required per table. |
| Uniform length | All keys (barcodes) in the same table must have the same length (counting _ separators). |
| Non-overlapping IUPAC | No two keys may match the same concrete sequence; e.g. NNNN and ATCG overlap. |
| Reserved name | The value "no-barcode" is used internally for unmatched reads and may not be used as a sample name. |
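The constraints in the table can be checked with a short Python sketch. It is illustrative only and does not reproduce fastqrab's validation or its error messages: `keys_overlap` and `check_table` are hypothetical names, and keys are assumed to contain only valid IUPAC codes and `_`. Two keys overlap when, at every position, their IUPAC codes share at least one concrete base.

```python
# Illustrative sketch only; not fastqrab's actual validation.
from itertools import combinations

# Concrete bases matched by each IUPAC DNA code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def keys_overlap(a: str, b: str) -> bool:
    """True if at least one concrete sequence matches both keys."""
    if len(a) != len(b):
        return False
    return all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a, b))


def check_table(table: dict[str, str]) -> list[str]:
    """Return constraint violations; an empty list means the table passes."""
    if not table:
        return ["table is empty"]
    problems = []
    if len({len(key) for key in table}) > 1:  # lengths include '_' separators
        problems.append("keys differ in length")
    if "no-barcode" in table.values():
        problems.append("sample name 'no-barcode' is reserved")
    for a, b in combinations(table, 2):
        if keys_overlap(a.replace("_", ""), b.replace("_", "")):
            problems.append(f"keys {a} and {b} overlap")
    return problems


print(check_table({"NNNN": "sample-1", "ATCG": "sample-2"}))
# ['keys NNNN and ATCG overlap']
print(check_table({"AAAAAA": "sample-1", "AAAAAC": "sample-1"}))
# []
```

The second call passes because aliases such as AAAAAA and AAAAAC differ at a concrete base, so no sequence can match both.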
Multiple tables
Define as many [barcodes.<name>] sections as needed; each step refers to
the one it needs by name:
# ignore_in_test
[barcodes.i7_barcodes]
AAAAAA = "sample-1"
CCCCCC = "sample-2"
[barcodes.i5_barcodes]
TTTTTT = "lib-A"
GGGGGG = "lib-B"
See also
- Demultiplex — how barcode tables are consumed to fork the output
- HammingCorrect — correct sequencing errors in a tag before demultiplexing
- ExtractRegion — extract a barcode from a read into a tag