<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Concepts on fastqrab documentation</title><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/</link><description>Recent content in Concepts on fastqrab documentation</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://tyberiusprime.github.io/fastqrab/main/docs/concepts/index.xml" rel="self" type="application/rss+xml"/><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/philosophy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/philosophy/</guid><description>&lt;h1 id="philosophy">
 Philosophy
 &lt;a class="anchor" href="#philosophy">#&lt;/a>
&lt;/h1>
&lt;p>fastqrab transforms (DNA) sequencing reads for downstream analysis.&lt;/p>
&lt;p>Its focus is on&lt;/p>
&lt;ul>
&lt;li>correctness&lt;/li>
&lt;li>reproducibility&lt;/li>
&lt;li>a lack of surprises&lt;/li>
&lt;li>friendliness&lt;/li>
&lt;li>speed&lt;/li>
&lt;/ul>
&lt;h2 id="correctness">
 Correctness
 &lt;a class="anchor" href="#correctness">#&lt;/a>
&lt;/h2>
&lt;p>We strive to do the right thing, always.&lt;/p>
&lt;p>To that end, fastqrab is covered by more than 500
end-to-end, input-to-output tests, which run both during development and in
continuous integration.&lt;/p>
&lt;h2 id="reproducibility">
 Reproducibility
 &lt;a class="anchor" href="#reproducibility">#&lt;/a>
&lt;/h2>
&lt;p>Repeated runs on the same bits (input data &amp;amp; configuration)
must deliver the same output bits. Every time.&lt;/p></description></item><item><title>Parser Architecture</title><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/parser-architecture/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/parser-architecture/</guid><description>&lt;h1 id="parser-architecture">
 Parser Architecture
 &lt;a class="anchor" href="#parser-architecture">#&lt;/a>
&lt;/h1>
&lt;h2 id="overview">
 Overview
 &lt;a class="anchor" href="#overview">#&lt;/a>
&lt;/h2>
&lt;p>fastqrab uses a custom-built parser designed for high performance and correctness when processing FASTQ.
The parser&amp;rsquo;s design emphasizes:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Zero-copy parsing&lt;/strong> where possible to minimize memory allocations&lt;/li>
&lt;li>&lt;strong>Streaming architecture&lt;/strong> to handle files of any size&lt;/li>
&lt;li>&lt;strong>Transparent compression&lt;/strong> support (raw, gzip, zstd)&lt;/li>
&lt;li>&lt;strong>Cross-platform compatibility&lt;/strong> (Unix/Windows line endings)&lt;/li>
&lt;/ol>
&lt;p>(FASTA and BAM files are processed differently; see below.)&lt;/p>
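&lt;p>The design goals listed above can be illustrated with a minimal Python sketch. It is not fastqrab&amp;rsquo;s parser: the function names &lt;code>open_maybe_gzip&lt;/code> and &lt;code>read_fastq&lt;/code> are hypothetical, it copies each line instead of parsing zero-copy, and zstd is omitted for brevity. It only shows streaming record-by-record, detecting gzip from the magic bytes, and accepting both Unix and Windows line endings.&lt;/p>

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(raw):
    """Wrap a binary stream; decompress transparently if it looks like gzip."""
    buffered = io.BufferedReader(raw)
    # peek() does not consume bytes, so the reader still starts at offset 0
    if buffered.peek(2)[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=buffered, mode="rb")
    return buffered


def read_fastq(stream):
    """Yield (name, sequence, qualities) one record at a time.

    Only four lines are held in memory at once, so input size does not matter.
    """
    while True:
        header = stream.readline()
        if not header:
            return  # clean end of input
        lines = (header, stream.readline(), stream.readline(), stream.readline())
        # rstrip accepts both Unix (\n) and Windows (\r\n) line endings
        header, seq, plus, qual = (line.rstrip(b"\r\n") for line in lines)
        if not header.startswith(b"@") or not plus.startswith(b"+"):
            raise ValueError("malformed FASTQ record")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual
```

&lt;p>Under these assumptions, &lt;code>read_fastq(open_maybe_gzip(open(path, "rb")))&lt;/code> iterates over a plain or gzip-compressed file with the same calling code.&lt;/p>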
&lt;h2 id="the-zero-copy-challenge-with-compressed-files">
 The Zero-Copy Challenge with Compressed Files
 &lt;a class="anchor" href="#the-zero-copy-challenge-with-compressed-files">#&lt;/a>
&lt;/h2>
&lt;h3 id="why-not-pure-zero-copy">
 Why Not Pure Zero-Copy?
 &lt;a class="anchor" href="#why-not-pure-zero-copy">#&lt;/a>
&lt;/h3>
&lt;p>A common optimization in bioinformatics tools is &amp;ldquo;zero-copy&amp;rdquo; parsing,
where the parser operates directly on memory-mapped file contents without allocating separate buffers.
This works well for uncompressed files stored on fast storage in suitable file formats.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/segments/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/segments/</guid><description>&lt;h1 id="segments">
 Segments
 &lt;a class="anchor" href="#segments">#&lt;/a>
&lt;/h1>
&lt;p>Modern sequencers, particularly Illumina sequencers, can read multiple times from one (amplified) DNA molecule, producing multiple &amp;lsquo;segments&amp;rsquo; (often called &amp;lsquo;reads&amp;rsquo;) that together form a &amp;lsquo;molecule&amp;rsquo; or &amp;lsquo;fragment&amp;rsquo;.&lt;/p>
&lt;h2 id="definition-and-configuration">
 Definition and Configuration
 &lt;a class="anchor" href="#definition-and-configuration">#&lt;/a>
&lt;/h2>
&lt;p>Segments are defined in the &lt;code>[input]&lt;/code> section of your TOML configuration. Each segment corresponds to one FASTQ file (or stream in interleaved formats), and segment names are arbitrary but should be meaningful.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-toml" data-lang="toml">&lt;span style="display:flex;">&lt;span>[&lt;span style="color:#a6e22e">input&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">read1&lt;/span> = [&lt;span style="color:#e6db74">&amp;#34;sample_R1.fq.gz&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">read2&lt;/span> = [&lt;span style="color:#e6db74">&amp;#34;sample_R2.fq.gz&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">index1&lt;/span> = [&lt;span style="color:#e6db74">&amp;#34;sample_I1.fq.gz&amp;#34;&lt;/span>]
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In this example, three segments are defined: &lt;code>read1&lt;/code>, &lt;code>read2&lt;/code>, and &lt;code>index1&lt;/code>.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/source/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/source/</guid><description>&lt;h1 id="source">
 Source
 &lt;a class="anchor" href="#source">#&lt;/a>
&lt;/h1>
&lt;p>When a step refers to a &amp;lsquo;source&amp;rsquo; (instead of a &lt;a href="https://tyberiusprime.github.io/fastqrab/main/docs/concepts/segments/">&lt;code>segment&lt;/code>&lt;/a>), it means the step can read from several kinds of data: segment sequences, segment names, or tag values.&lt;/p>
&lt;h2 id="overview">
 Overview
 &lt;a class="anchor" href="#overview">#&lt;/a>
&lt;/h2>
&lt;p>The &lt;code>source&lt;/code> parameter generalizes the &lt;code>segment&lt;/code> parameter, allowing steps to operate on different kinds of string data within a fragment. This flexibility enables advanced workflows like extracting patterns from read names, processing tag-derived sequences, or combining multiple data sources.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/step/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/step/</guid><description>&lt;h1 id="step">
 Step
 &lt;a class="anchor" href="#step">#&lt;/a>
&lt;/h1>
&lt;p>A step is one coherent manipulation of the FASTQ stream and its associated data.&lt;/p>
&lt;h2 id="overview">
 Overview
 &lt;a class="anchor" href="#overview">#&lt;/a>
&lt;/h2>
&lt;p>Steps are the building blocks of a processing pipeline. Each step is declared as a &lt;code>[[step]]&lt;/code> entry in the TOML configuration file, and the complete pipeline executes steps sequentially from top to bottom.&lt;/p>
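&lt;p>As a rough illustration of sequential execution, the Python sketch below runs a list of steps in declaration order over whole fragments. It is a conceptual model only, not fastqrab&amp;rsquo;s implementation: &lt;code>run_pipeline&lt;/code>, &lt;code>min_length_read1&lt;/code> and &lt;code>uppercase_all&lt;/code> are hypothetical names, and a fragment is modelled as a plain dictionary mapping segment names to sequences.&lt;/p>

```python
def run_pipeline(fragments, steps):
    """Apply steps in declaration order; a step returns None to drop a fragment."""
    for fragment in fragments:
        for step in steps:
            fragment = step(fragment)
            if fragment is None:
                break  # the whole fragment is dropped, all segments together
        else:
            # reached only when no step dropped the fragment
            yield fragment


def min_length_read1(minimum):
    """Hypothetical filter: inspects read1 only, but removes the entire fragment."""
    def step(fragment):
        return fragment if len(fragment["read1"]) >= minimum else None
    return step


def uppercase_all(fragment):
    """Hypothetical transform: uppercase every segment of the fragment."""
    return {name: seq.upper() for name, seq in fragment.items()}
```

&lt;p>Because each step receives and returns a complete fragment, a step never sees one segment without its partners.&lt;/p>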
&lt;p>Every step operates on complete fragments (molecules), ensuring that paired segments remain synchronized. If a filtering step removes a fragment based on criteria from &lt;code>read1&lt;/code>, the corresponding &lt;code>read2&lt;/code>, &lt;code>index1&lt;/code>, and any other segments are automatically removed alongside it.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/tag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/concepts/tag/</guid><description>&lt;h1 id="tag--label">
 Tag / Label
 &lt;a class="anchor" href="#tag--label">#&lt;/a>
&lt;/h1>
&lt;p>A regular tag is a piece of fragment-derived metadata that one step in the pipeline
produces, and other steps may consume, transform, or export.&lt;/p>
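&lt;p>The produce-then-consume pattern can be sketched in Python as follows. This is an illustration under stated assumptions, not fastqrab&amp;rsquo;s API: the function names, the &lt;code>adapter_start&lt;/code> tag name, and the dictionary layout of a fragment are all hypothetical.&lt;/p>

```python
def tag_adapter_start(fragment, adapter):
    """Producer step: record where the adapter begins in read1, or None if absent."""
    position = fragment["read1"].find(adapter)
    fragment["tags"]["adapter_start"] = position if position != -1 else None
    return fragment


def trim_at_tag(fragment):
    """Consumer step: cut read1 at the tagged position, if one was recorded."""
    start = fragment["tags"]["adapter_start"]
    if start is not None:
        fragment["read1"] = fragment["read1"][:start]
    return fragment


def has_adapter(fragment):
    """Consumer step: reuse the same tag to decide whether the adapter was found."""
    return fragment["tags"]["adapter_start"] is not None
```

&lt;p>The search runs once in the producer; both consumers rely on the stored tag instead of repeating it.&lt;/p>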
&lt;p>A virtual tag is created on the fly, exists only for
a single step, and disappears right afterwards.&lt;/p>
&lt;h2 id="overview---regular-tags">
 Overview - Regular tags
 &lt;a class="anchor" href="#overview---regular-tags">#&lt;/a>
&lt;/h2>
&lt;p>Tags enable sophisticated workflows by decoupling data extraction from data
usage. Instead of hardcoding logic like &amp;ldquo;trim adapters AND filter by adapter
presence&amp;rdquo; into a single step, you extract adapter locations as a tag, then use
that tag in multiple downstream operations.&lt;/p></description></item></channel></rss>