[barcodes.*] section
Barcode tables supply the sequence-to-sample-name mappings used by Demultiplex and HammingCorrect.
Each table is an independent named dictionary. The name is chosen by the user and referenced from the step that consumes it.
```toml
[barcodes.my_barcodes]
AAAAAA = "sample-1"
CCCCCC = "sample-2"
GGGGGG = "sample-3"
```
The table may appear anywhere in the TOML file; forward and backward references are both valid.
Key format
Keys are DNA sequences using uppercase IUPAC nucleotide codes.
All standard IUPAC ambiguity codes are accepted (e.g. N, R, Y, W, …).
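To make the ambiguity codes concrete, here is a minimal Python sketch of position-by-position IUPAC matching. The code table is the standard IUPAC nucleotide alphabet; the function name `key_matches` is hypothetical, and the sketch assumes the sequence being tested contains only concrete bases (A, C, G, T). It is not the tool's own matching code.

```python
# Standard IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def key_matches(key: str, sequence: str) -> bool:
    """True if each base of `sequence` is allowed by the code at that position of `key`."""
    if len(key) != len(sequence):
        return False
    return all(base in IUPAC[code] for code, base in zip(key, sequence))


print(key_matches("AARCCN", "AAGCCT"))  # True: R allows G, N allows T
print(key_matches("AARCCN", "AACCCT"))  # False: R does not allow C
```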
An underscore (_) in a key separates regions when the tag being matched spans multiple
extracted segments joined with _:
```toml
[barcodes.dual_index]
AAAAAA_TTTTTT = "sample-1" # i7 = AAAAAA, i5 = TTTTTT
CCCCCC_GGGGGG = "sample-2"
```
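The relationship between extracted segments and a multi-region key can be sketched as a join followed by a lookup. This is an illustrative Python sketch only: the function name `lookup` is hypothetical, and exact-match lookup is an assumption made for brevity (it ignores IUPAC ambiguity codes and any error correction).

```python
# The dual_index table from the example above, as a plain dictionary.
dual_index = {
    "AAAAAA_TTTTTT": "sample-1",  # i7 = AAAAAA, i5 = TTTTTT
    "CCCCCC_GGGGGG": "sample-2",
}


def lookup(table: dict, segments: list):
    """Join the extracted segments with '_' and look the result up in the table."""
    tag = "_".join(segments)
    return table.get(tag)  # None when no key matches exactly


print(lookup(dual_index, ["CCCCCC", "GGGGGG"]))  # sample-2
print(lookup(dual_index, ["CCCCCC", "TTTTTT"]))  # None
```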
Values
Values are arbitrary sample-name strings that appear in the output file names
produced by Demultiplex.
The name no-barcode is reserved; a table that uses it as a value is rejected.
Multiple keys may map to the same value (barcode aliases):
```toml
[barcodes.my_barcodes]
AAAAAA = "sample-1"
AAAAAC = "sample-1" # treated identically to AAAAAA
CCCCCC = "sample-2"
```
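One way to review aliases in a table is to invert it and list the keys that share each sample name. The helper below is a hypothetical Python sketch for inspecting a table by hand, not part of the tool.

```python
from collections import defaultdict

# The alias example from above, as a plain dictionary.
my_barcodes = {
    "AAAAAA": "sample-1",
    "AAAAAC": "sample-1",  # alias of AAAAAA
    "CCCCCC": "sample-2",
}


def keys_by_sample(table: dict) -> dict:
    """Group barcode keys by the sample name they map to."""
    grouped = defaultdict(list)
    for key, sample in table.items():
        grouped[sample].append(key)
    return dict(grouped)


print(keys_by_sample(my_barcodes))
# {'sample-1': ['AAAAAA', 'AAAAAC'], 'sample-2': ['CCCCCC']}
```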
Constraints
| Constraint | Detail |
|---|---|
| Non-empty | At least one entry required per table. |
| Uniform length | All keys in the same table must have the same length (counting _ separators). |
| Non-overlapping IUPAC | No two keys may match the same concrete sequence. For example, NNNN and ATCG overlap because NNNN also matches ATCG. |
| Reserved name | The value "no-barcode" is used internally for unmatched reads and may not be used as a sample name. |
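The four constraints can be checked ahead of time with a small script. The following Python sketch is one possible pre-flight check under stated assumptions: keys contain only uppercase IUPAC codes and `_`, two keys overlap when every position shares at least one concrete base, and the names `keys_overlap` and `validate_table` are hypothetical. The tool's own validation and error messages may differ.

```python
from itertools import combinations

# Standard IUPAC codes; "_" is included so separators only match separators.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "_": "_",
}
RESERVED = "no-barcode"


def keys_overlap(a: str, b: str) -> bool:
    """True if at least one concrete sequence is matched by both keys."""
    if len(a) != len(b):
        return False
    return all(set(IUPAC[x]) & set(IUPAC[y]) for x, y in zip(a, b))


def validate_table(table: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not table:
        problems.append("table is empty")
    if len({len(key) for key in table}) > 1:
        problems.append("keys differ in length")
    for a, b in combinations(table, 2):
        if keys_overlap(a, b):
            problems.append(f"{a} and {b} overlap")
    if RESERVED in table.values():
        problems.append(f"{RESERVED} is a reserved sample name")
    return problems


print(validate_table({"NNNN": "sample-1", "ATCG": "sample-2"}))
# ['NNNN and ATCG overlap']
print(validate_table({"AAAAAA": "sample-1", "AAAAAC": "sample-1"}))
# []
```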
Multiple tables
Define as many [barcodes.<name>] sections as needed; each step refers to
the one it needs by name:
```toml
[barcodes.i7_barcodes]
AAAAAA = "sample-1"
CCCCCC = "sample-2"

[barcodes.i5_barcodes]
TTTTTT = "lib-A"
GGGGGG = "lib-B"
```
See also
- Demultiplex — how barcode tables are consumed to fork the output
- HammingCorrect — correct sequencing errors in a tag before demultiplexing
- ExtractRegion — extract a barcode from a read into a tag