<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Cookbooks on fastqrab documentation</title><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/</link><description>Recent content in Cookbooks on fastqrab documentation</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/index.xml" rel="self" type="application/rss+xml"/><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/01-basic-quality-report/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/01-basic-quality-report/</guid><description>&lt;h1 id="cookbook-01-basic-quality-report">
 Cookbook 01: Basic Quality Report
 &lt;a class="anchor" href="#cookbook-01-basic-quality-report">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have FastQ files from a sequencing run and want to generate comprehensive quality reports to assess:&lt;/p>
&lt;ul>
&lt;li>Read quality scores&lt;/li>
&lt;li>Base composition&lt;/li>
&lt;li>Read length distribution&lt;/li>
&lt;li>Duplicate read counts&lt;/li>
&lt;/ul>
&lt;p>This is typically the first step in any sequencing data analysis to understand data quality before downstream processing.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Reads input FastQ file(s)&lt;/li>
&lt;li>Generates a comprehensive quality report including:
&lt;ul>
&lt;li>Base quality statistics&lt;/li>
&lt;li>Base distribution across positions&lt;/li>
&lt;li>Read length distribution&lt;/li>
&lt;li>Duplicate read counting&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Outputs reports in both HTML (human-readable) and JSON (machine-readable) formats&lt;/li>
&lt;li>Passes through all reads unchanged (no filtering)&lt;/li>
&lt;/ol>
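&lt;p>The sketch below shows what these statistics measure. It is a minimal Python sketch that computes per-position mean quality, per-position base counts, the read-length distribution, and a simple duplicate count from reads already parsed into (sequence, quality) pairs. It assumes Phred+33 quality encoding and counts every extra copy of an identical sequence as one duplicate. The function name &lt;code>summarize_reads&lt;/code> is hypothetical, and fastqrab&amp;rsquo;s report may define or compute these values differently.&lt;/p>

```python
from collections import Counter, defaultdict


def summarize_reads(reads):
    """Summarize (sequence, quality) pairs, assuming Phred+33 qualities."""
    quality_sums = defaultdict(int)     # position -> summed Phred scores
    quality_counts = defaultdict(int)   # position -> number of bases seen
    base_counts = defaultdict(Counter)  # position -> Counter of observed bases
    length_distribution = Counter()     # read length -> number of reads
    sequence_counts = Counter()         # sequence -> number of occurrences

    for seq, qual in reads:
        length_distribution[len(seq)] += 1
        sequence_counts[seq] += 1
        for pos, (base, q_char) in enumerate(zip(seq, qual)):
            quality_sums[pos] += ord(q_char) - 33  # Phred+33 decoding
            quality_counts[pos] += 1
            base_counts[pos][base] += 1

    mean_quality = {
        pos: quality_sums[pos] / quality_counts[pos] for pos in sorted(quality_sums)
    }
    # Every copy of a sequence beyond the first counts as one duplicate.
    duplicates = sum(n - 1 for n in sequence_counts.values())
    return {
        "mean_quality": mean_quality,
        "base_counts": {pos: dict(c) for pos, c in sorted(base_counts.items())},
        "length_distribution": dict(length_distribution),
        "duplicates": duplicates,
    }


reads = [("ACGT", "IIII"), ("ACGT", "IIII"), ("ACG", "#I5")]
print(summarize_reads(reads)["duplicates"])  # prints 1
```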
&lt;h2 id="input-files">
 Input Files
 &lt;a class="anchor" href="#input-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>input/sample_R1.fq&lt;/code> - Forward reads (Read 1) from paired-end sequencing&lt;/li>
&lt;/ul>
&lt;h2 id="output-files">
 Output Files
 &lt;a class="anchor" href="#output-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>output_R1.fq&lt;/code> - Passed-through reads (identical to input)&lt;/li>
&lt;li>&lt;code>output.report_initial.html&lt;/code> - HTML quality report&lt;/li>
&lt;li>&lt;code>output.report_initial.json&lt;/code> - JSON quality report with detailed statistics&lt;/li>
&lt;/ul>
&lt;h2 id="when-to-use-this">
 When to Use This
 &lt;a class="anchor" href="#when-to-use-this">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>First analysis of new sequencing data&lt;/li>
&lt;li>Quality control before committing to expensive downstream analysis&lt;/li>
&lt;li>Comparing data quality across different sequencing runs&lt;/li>
&lt;li>Identifying potential issues (adapter contamination, quality drop-off, etc.)&lt;/li>
&lt;/ul>
&lt;h2 id="download">
 Download
 &lt;a class="anchor" href="#download">#&lt;/a>
&lt;/h2>
&lt;p>&lt;a href="../../../../cookbooks/01-basic-quality-report.tar.gz">Download 01-basic-quality-report.tar.gz&lt;/a> for a complete, runnable example including expected output files.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/02-umi-extraction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/02-umi-extraction/</guid><description>&lt;h1 id="cookbook-02-umi-extraction">
 Cookbook 02: UMI Extraction
 &lt;a class="anchor" href="#cookbook-02-umi-extraction">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have sequencing data with Unique Molecular Identifiers (UMIs) embedded in the reads. UMIs are short random barcodes added during library preparation that allow you to:&lt;/p>
&lt;ul>
&lt;li>Identify and remove PCR duplicates&lt;/li>
&lt;li>Distinguish true biological duplicates from amplification artifacts&lt;/li>
&lt;li>Improve accuracy in quantitative analyses (RNA-seq, ATAC-seq, etc.)&lt;/li>
&lt;/ul>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Reads input FastQ file with UMIs at the start of read1&lt;/li>
&lt;li>Extracts the UMI sequence (first 8 bases) and creates a tag&lt;/li>
&lt;li>Stores the UMI in the read comment (FASTQ header)&lt;/li>
&lt;li>Removes the UMI bases from the read sequence (so they don&amp;rsquo;t interfere with alignment)&lt;/li>
&lt;li>Outputs modified reads with UMI preserved in the header&lt;/li>
&lt;/ol>
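&lt;p>The three steps (extract, store in the comment, cut) can be illustrated on a single FASTQ record with a short Python sketch. It assumes the record is already split into name, sequence, and quality, and it appends the UMI to the header as &lt;code>umi=...&lt;/code>. That comment format and the &lt;code>extract_umi&lt;/code> helper are hypothetical and may differ from what &lt;code>StoreTagInComment&lt;/code> actually writes.&lt;/p>

```python
def extract_umi(name, seq, qual, umi_length=8):
    """Move the first umi_length bases of a read into its FASTQ header.

    The 'umi=' comment format is a hypothetical choice for illustration.
    """
    umi = seq[:umi_length]          # comparable to ExtractRegions start=0, length=8
    new_name = f"{name} umi={umi}"  # comparable to StoreTagInComment
    # Trim sequence and quality together so they keep the same length
    # (comparable to CutStart n=8).
    return new_name, seq[umi_length:], qual[umi_length:]


record = extract_umi("@read1", "ACGTACGTTTGCAAGG", "IIIIIIIIFFFFFFFF")
print("\n".join([record[0], record[1], "+", record[2]]))
```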
&lt;h2 id="input-files">
 Input Files
 &lt;a class="anchor" href="#input-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>input/sample_R1.fq&lt;/code> - Reads with 8bp UMI at the start&lt;/li>
&lt;/ul>
&lt;h2 id="output-files">
 Output Files
 &lt;a class="anchor" href="#output-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>output_R1.fq&lt;/code> - Reads with UMI in comment, UMI bases removed from sequence&lt;/li>
&lt;/ul>
&lt;h2 id="configuration-highlights">
 Configuration Highlights
 &lt;a class="anchor" href="#configuration-highlights">#&lt;/a>
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-toml" data-lang="toml">&lt;span style="display:flex;">&lt;span>[[&lt;span style="color:#a6e22e">step&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Extract UMI from positions 0-7 (8 bases)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">action&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;ExtractRegions&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">label&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;umi&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">regions&lt;/span> = [{&lt;span style="color:#a6e22e">source&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;read1&amp;#39;&lt;/span>, &lt;span style="color:#a6e22e">start&lt;/span> = &lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#a6e22e">length&lt;/span> = &lt;span style="color:#ae81ff">8&lt;/span>, &lt;span style="color:#a6e22e">anchor&lt;/span>=&lt;span style="color:#e6db74">&amp;#34;Start&amp;#34;&lt;/span>}]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>[[&lt;span style="color:#a6e22e">step&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Store UMI in the FASTQ comment&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">action&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;StoreTagInComment&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">label&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;umi&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>[[&lt;span style="color:#a6e22e">step&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Remove the UMI bases from the read&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">action&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;CutStart&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">target&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;Read1&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">n&lt;/span> = &lt;span style="color:#ae81ff">8&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="workflow-details">
 Workflow Details
 &lt;a class="anchor" href="#workflow-details">#&lt;/a>
&lt;/h2>
&lt;p>&lt;strong>Before processing:&lt;/strong>&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/03-lexogen-quantseq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/03-lexogen-quantseq/</guid><description>&lt;h1 id="cookbook-03-lexogen-quantseq-processing">
 Cookbook 03: Lexogen QuantSeq Processing
 &lt;a class="anchor" href="#cookbook-03-lexogen-quantseq-processing">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>Lexogen QuantSeq is a popular 3&amp;rsquo; mRNA sequencing protocol optimized for gene expression profiling. The library structure includes:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>First 8 bases&lt;/strong>: UMI (Unique Molecular Identifier) for deduplication&lt;/li>
&lt;li>&lt;strong>Next 6 bases&lt;/strong>: Random hexamer primer sequence (needs removal)&lt;/li>
&lt;li>&lt;strong>Remaining sequence&lt;/strong>: Actual cDNA from the 3&amp;rsquo; end of transcripts&lt;/li>
&lt;/ul>
&lt;p>This cookbook demonstrates the standard preprocessing for QuantSeq data before alignment.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Extracts the 8bp UMI from the start of reads&lt;/li>
&lt;li>Stores the UMI in the read comment (FASTQ header)&lt;/li>
&lt;li>Removes the first 14 bases total (8bp UMI + 6bp random hexamer)&lt;/li>
&lt;li>Outputs processed reads ready for alignment&lt;/li>
&lt;/ol>
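&lt;p>With the 8 + 6 layout described above, the UMI occupies bases 1 to 8 and the cut removes bases 1 to 14. The Python sketch below spells out that arithmetic on a single sequence. The helper name &lt;code>split_quantseq_read&lt;/code> is hypothetical, and the two length constants come from this cookbook&amp;rsquo;s description; adjust them if your kit uses a different layout.&lt;/p>

```python
UMI_LENGTH = 8      # UMI at the very start of the read
HEXAMER_LENGTH = 6  # random hexamer primer that follows the UMI
CUT_LENGTH = UMI_LENGTH + HEXAMER_LENGTH  # 14 bases removed in total


def split_quantseq_read(seq):
    """Split a raw QuantSeq read into (umi, hexamer, cdna) segments."""
    umi = seq[:UMI_LENGTH]
    hexamer = seq[UMI_LENGTH:CUT_LENGTH]
    cdna = seq[CUT_LENGTH:]  # empty if the read is 14 bases or shorter
    return umi, hexamer, cdna


umi, hexamer, cdna = split_quantseq_read("AACCGGTTCATGCAGATTACAGATTACA")
print(umi, hexamer, cdna)  # prints AACCGGTT CATGCA GATTACAGATTACA
```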
&lt;h2 id="input-files">
 Input Files
 &lt;a class="anchor" href="#input-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>input/quantseq_sample.fq&lt;/code> - Raw QuantSeq reads with UMI and random hexamer&lt;/li>
&lt;/ul>
&lt;h2 id="output-files">
 Output Files
 &lt;a class="anchor" href="#output-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>output_read1.fq&lt;/code> - Processed reads with:
&lt;ul>
&lt;li>UMI stored in comment&lt;/li>
&lt;li>First 14bp removed&lt;/li>
&lt;li>Ready for alignment to reference genome&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="workflow-details">
 Workflow Details
 &lt;a class="anchor" href="#workflow-details">#&lt;/a>
&lt;/h2>
&lt;p>&lt;strong>Raw read structure:&lt;/strong>&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/04-phiX-removal/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/04-phiX-removal/</guid><description>&lt;h1 id="cookbook-04-phix-removal">
 Cookbook 04: PhiX Removal
 &lt;a class="anchor" href="#cookbook-04-phix-removal">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>Your dataset contains reads from an Illumina PhiX spike-in, and you want to remove them before downstream analysis. PhiX is commonly spiked into sequencing runs as a control and to increase base diversity, particularly for low-diversity libraries.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;p>This cookbook demonstrates how to identify and remove PhiX contamination using k-mer counting:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Count k-mers&lt;/strong>: Uses &lt;code>CalcKmers&lt;/code> to count how many 30-mers from each read match the PhiX genome&lt;/li>
&lt;li>&lt;strong>Export data&lt;/strong>: Saves k-mer counts to a TSV table for analysis&lt;/li>
&lt;li>&lt;strong>Filter reads&lt;/strong>: Removes reads with high PhiX k-mer counts (≥25 matching k-mers)&lt;/li>
&lt;/ol>
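&lt;p>The k-mer filter can be sketched in a few lines of Python. The sketch collects every k-mer of the reference into a set, counts how many k-mer positions of each read appear in that set, and drops reads whose count is at least a threshold. It illustrates the idea under stated assumptions and is not fastqrab&amp;rsquo;s &lt;code>CalcKmers&lt;/code> implementation. It only considers the forward strand; whether fastqrab also checks reverse complements is not stated here. The tiny reference, the small k, and the helper names in the example are made up so that the result is easy to follow.&lt;/p>

```python
def kmer_set(reference, k):
    """All k-mers of the reference sequence (forward strand only)."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


def count_matching_kmers(read, reference_kmers, k):
    """Number of k-mer positions in the read that occur in the reference."""
    return sum(
        1 for i in range(len(read) - k + 1) if read[i:i + k] in reference_kmers
    )


def remove_contaminants(reads, reference, k=30, min_hits=25):
    """Keep only reads with fewer than min_hits matching k-mers."""
    ref_kmers = kmer_set(reference, k)
    return [r for r in reads if min_hits > count_matching_kmers(r, ref_kmers, k)]


# Toy example: k=4 and min_hits=3 instead of the cookbook's 30 and 25.
reads = ["ACGTTGCA", "GGGGCCCC", "ACGTAAAA"]
kept = remove_contaminants(reads, "ACGTTGCATGCA", k=4, min_hits=3)
print(kept)  # prints ['GGGGCCCC', 'ACGTAAAA']
```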
&lt;h2 id="understanding-the-approach">
 Understanding the Approach
 &lt;a class="anchor" href="#understanding-the-approach">#&lt;/a>
&lt;/h2>
&lt;h3 id="k-mer-counting">
 K-mer Counting
 &lt;a class="anchor" href="#k-mer-counting">#&lt;/a>
&lt;/h3>
&lt;p>The &lt;code>CalcKmers&lt;/code> step counts how many k-mers (short subsequences of length k) from each read are present in the PhiX reference genome:&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/05-quality-filtering/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/05-quality-filtering/</guid><description>&lt;h1 id="cookbook-05-quality-filtering">
 Cookbook 05: Quality Filtering
 &lt;a class="anchor" href="#cookbook-05-quality-filtering">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have sequencing data with varying quality and want to remove low-quality reads before downstream analysis. Poor-quality reads can introduce errors into variant calling, assembly, and other analyses.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;p>This cookbook demonstrates quality-based filtering using expected error calculation:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Calculate Expected Errors&lt;/strong>: Uses &lt;code>CalcExpectedError&lt;/code> to compute the expected number of base call errors per read based on quality scores&lt;/li>
&lt;li>&lt;strong>Filter Low-Quality Reads&lt;/strong>: Uses &lt;code>FilterByNumericTag&lt;/code> to remove reads exceeding an error threshold&lt;/li>
&lt;li>&lt;strong>Generate Reports&lt;/strong>: Creates quality reports before and after filtering to show improvement&lt;/li>
&lt;/ol>
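&lt;p>Expected error is conventionally defined as the sum of per-base error probabilities, where a base with Phred score Q has error probability &lt;code>10 ** (-Q / 10)&lt;/code>. The minimal Python sketch below uses that standard definition and assumes Phred+33 quality strings. The function names and the threshold of 1.0 are illustrative choices, not values taken from this cookbook. &lt;code>CalcExpectedError&lt;/code> may differ in details.&lt;/p>

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities 10 ** (-Q / 10) for a quality string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quality)


def filter_by_expected_errors(records, max_ee=1.0):
    """Keep (sequence, quality) records whose expected error is at most max_ee."""
    return [(s, q) for s, q in records if max_ee >= expected_errors(q)]


# Ten Q20 bases ('5' is ASCII 53 = 20 + 33) each have a 1% error chance: EE = 0.1.
print(round(expected_errors("5" * 10), 6))  # prints 0.1
```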
&lt;h2 id="understanding-expected-error">
 Understanding Expected Error
 &lt;a class="anchor" href="#understanding-expected-error">#&lt;/a>
&lt;/h2>
&lt;p>Expected error (EE) is a more nuanced quality metric than average quality score:&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/06-adapter-trimming/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/06-adapter-trimming/</guid><description>&lt;h1 id="cookbook-06-adapter-trimming-with-polya-tail-removal">
 Cookbook 06: Adapter Trimming with PolyA Tail Removal
 &lt;a class="anchor" href="#cookbook-06-adapter-trimming-with-polya-tail-removal">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have RNA-seq data that contains:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>PolyA tails&lt;/strong>: Stretches of A bases at the 3&amp;rsquo; end of reads (or polyT stretches at the 5&amp;rsquo; end when reads come from the reverse strand)&lt;/li>
&lt;li>&lt;strong>Sequencing adapters&lt;/strong>: Illumina or other adapter sequences that need removal before alignment&lt;/li>
&lt;/ul>
&lt;p>These artifacts can interfere with alignment and downstream analysis if not removed.&lt;/p>
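&lt;p>The Python sketch below makes the two artifacts concrete. It trims an adapter, including a partial adapter that runs off the 3&amp;rsquo; end of the read, and then a trailing polyA stretch. It rests on simple assumptions: exact matching, a minimum polyA run of 8 bases, and a minimum partial-adapter overlap of 5 bases. The helper names are hypothetical, and the steps this cookbook actually uses may match adapters and tails differently.&lt;/p>

```python
def trim_adapter(seq, adapter, min_overlap=5):
    """Cut the read at the first full adapter match, or at a partial adapter
    (an adapter prefix of at least min_overlap bases) at the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Compare read suffixes with adapter prefixes, longest overlap first.
    for overlap in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:-overlap]
    return seq


def trim_poly_a(seq, min_length=8):
    """Remove a trailing run of A bases if it is at least min_length long."""
    stripped = seq.rstrip("A")
    return stripped if len(seq) - len(stripped) >= min_length else seq


ADAPTER = "AGATCGGAAGAGC"  # common Illumina adapter prefix
read = "TTGCCATGGAC" + "A" * 12 + "AGATCGG"  # insert + polyA + partial adapter
print(trim_poly_a(trim_adapter(read, ADAPTER)))  # prints TTGCCATGGAC
```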
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;p>This cookbook demonstrates a complete adapter and polyA trimming workflow:&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/07-demultiplexing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/07-demultiplexing/</guid><description>&lt;h1 id="cookbook-07-demultiplexing-by-inline-barcode">
 Cookbook 07: Demultiplexing by Inline Barcode
 &lt;a class="anchor" href="#cookbook-07-demultiplexing-by-inline-barcode">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have pooled sequencing data from multiple samples that were tagged with unique barcode sequences during library preparation
and have not been demultiplexed by your sequencing facility.&lt;/p>
&lt;p>You need to:&lt;/p>
&lt;ul>
&lt;li>Extract the barcode(s) from each read&lt;/li>
&lt;li>Correct sequencing errors in barcodes&lt;/li>
&lt;li>Separate reads into individual files per sample&lt;/li>
&lt;/ul>
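&lt;p>One common way to correct barcode errors is to accept an observed barcode if it lies within one mismatch (Hamming distance 1) of exactly one known barcode. The Python sketch below shows that idea with a made-up barcode table. The sample names, barcodes, and helper names are hypothetical, and the error-correction rule this cookbook uses may differ. Reads are kept intact here rather than having the barcode removed.&lt;/p>

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(observed, barcodes, max_mismatches=1):
    """Return the only sample whose barcode is within max_mismatches of the
    observed barcode, or None if no barcode or more than one qualifies."""
    hits = [
        sample for sample, bc in barcodes.items()
        if len(bc) == len(observed) and max_mismatches >= hamming(bc, observed)
    ]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcodes, barcode_length):
    """Group reads by sample, using the first barcode_length bases as barcode."""
    groups = defaultdict(list)
    for read in reads:
        sample = assign_sample(read[:barcode_length], barcodes)
        groups[sample or "unmatched"].append(read)
    return dict(groups)


barcodes = {"sample_a": "ACGTAC", "sample_b": "TGCATG"}
reads = ["ACGTACGGGG", "ACGTTCCCCC", "TGCATGAAAA", "GGGGGGTTTT"]
print(demultiplex(reads, barcodes, 6))
```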
&lt;p>This is common in multiplexed sequencing runs to maximize sequencing efficiency and reduce costs.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/08-length-filtering/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/08-length-filtering/</guid><description>&lt;h1 id="cookbook-08-read-length-filtering-and-truncation">
 Cookbook 08: Read Length Filtering and Truncation
 &lt;a class="anchor" href="#cookbook-08-read-length-filtering-and-truncation">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have sequencing data with variable read lengths and need to:&lt;/p>
&lt;ul>
&lt;li>Remove reads that are too short (may align poorly or represent artifacts)&lt;/li>
&lt;li>Remove reads that are too long (may indicate technical issues)&lt;/li>
&lt;li>Truncate all reads to a uniform length (required by some downstream tools)&lt;/li>
&lt;/ul>
&lt;p>Read length filtering is important for:&lt;/p>
&lt;ul>
&lt;li>Quality control after adapter trimming&lt;/li>
&lt;li>Preparing data for tools that require uniform read lengths&lt;/li>
&lt;li>Removing degraded or artifactual sequences&lt;/li>
&lt;/ul>
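&lt;p>The three operations combine naturally into one pass: drop reads outside a length window, then cut the survivors to a fixed length. Truncation only shortens reads, so the output is uniformly long only if the minimum length is at least the truncation length. The Python sketch below illustrates that ordering. The bounds (30 and 150) and the truncation length (30) are placeholder values rather than this cookbook&amp;rsquo;s settings, and the function name is hypothetical.&lt;/p>

```python
def filter_and_truncate(reads, min_length=30, max_length=150, truncate_to=30):
    """Drop (sequence, quality) reads outside [min_length, max_length], then
    truncate the rest to at most truncate_to bases.

    Reads are only uniformly long afterwards if min_length >= truncate_to.
    """
    kept = []
    for seq, qual in reads:
        if min_length > len(seq) or len(seq) > max_length:
            continue  # too short or too long
        kept.append((seq[:truncate_to], qual[:truncate_to]))
    return kept


reads = [("A" * 10, "I" * 10), ("C" * 40, "I" * 40), ("G" * 200, "I" * 200)]
result = filter_and_truncate(reads)
print([len(seq) for seq, _ in result])  # prints [30]
```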
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;p>This cookbook demonstrates comprehensive read length management:&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/09-fastp-equivalent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/09-fastp-equivalent/</guid><description>&lt;h1 id="cookbook-09-fastp-equivalent-workflow">
 Cookbook 09: Fastp-Equivalent Workflow
 &lt;a class="anchor" href="#cookbook-09-fastp-equivalent-workflow">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You want to replicate the default behavior of &lt;a href="https://github.com/OpenGene/fastp/">fastp&lt;/a> — a popular all-in-one FASTQ preprocessor — using a configurable pipeline. This is useful when you need reproducible, step-by-step control over each filtering stage, or want to extend the workflow beyond what fastp offers.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;p>This cookbook replicates fastp&amp;rsquo;s default single-end processing pipeline:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>PolyG Trimming&lt;/strong>: Uses &lt;code>ExtractPolyTail&lt;/code> + &lt;code>TrimAtTag&lt;/code> to remove polyG tails (Illumina NextSeq/NovaSeq artifact)&lt;/li>
&lt;li>&lt;strong>Adapter Trimming&lt;/strong>: Uses &lt;code>ExtractIUPAC&lt;/code> + &lt;code>TrimAtTag&lt;/code> to remove the Illumina TruSeq R1 adapter&lt;/li>
&lt;li>&lt;strong>N-base Filtering&lt;/strong>: Uses &lt;code>CalcNCount&lt;/code> + &lt;code>FilterByNumericTag&lt;/code> to remove reads with too many ambiguous bases (&lt;code>--n_base_limit 5&lt;/code>)&lt;/li>
&lt;li>&lt;strong>Quality Filtering&lt;/strong>: Uses &lt;code>CalcQualifiedBases&lt;/code> + &lt;code>FilterByNumericTag&lt;/code> to remove reads with too many low-quality bases (&lt;code>--qualified_quality_phred 15&lt;/code>, &lt;code>--unqualified_percent_limit 40&lt;/code>)&lt;/li>
&lt;li>&lt;strong>Length Filtering&lt;/strong>: Uses &lt;code>CalcLength&lt;/code> + &lt;code>FilterByNumericTag&lt;/code> to remove reads shorter than 15bp (&lt;code>--length_required 15&lt;/code>)&lt;/li>
&lt;/ol>
&lt;h2 id="fastp-defaults-replicated">
 Fastp Defaults Replicated
 &lt;a class="anchor" href="#fastp-defaults-replicated">#&lt;/a>
&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Fastp parameter&lt;/th>
 &lt;th>Value&lt;/th>
 &lt;th>Pipeline step&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>--poly_g_min_len&lt;/code>&lt;/td>
 &lt;td>10&lt;/td>
 &lt;td>&lt;code>ExtractPolyTail min_length = 10&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>--adapter_sequence&lt;/code>&lt;/td>
 &lt;td>&lt;code>AGATCGGAAGAGCACACGTCTGAACTCCAGTCA&lt;/code>&lt;/td>
 &lt;td>&lt;code>ExtractIUPAC query = ...&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>--n_base_limit&lt;/code>&lt;/td>
 &lt;td>5&lt;/td>
 &lt;td>&lt;code>FilterByNumericTag max_value = 6&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>--qualified_quality_phred&lt;/code>&lt;/td>
 &lt;td>15 (Phred) → 48 (ASCII)&lt;/td>
 &lt;td>&lt;code>CalcQualifiedBases threshold = 48&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>--unqualified_percent_limit&lt;/code>&lt;/td>
 &lt;td>40% → keep if ≥ 60% qualified&lt;/td>
 &lt;td>&lt;code>FilterByNumericTag min_value = 0.60&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>--length_required&lt;/code>&lt;/td>
 &lt;td>15&lt;/td>
 &lt;td>&lt;code>FilterByNumericTag min_value = 15&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
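&lt;p>The three read-level filters in the table can be expressed as one predicate applied to a read after polyG and adapter trimming. The Python sketch below mirrors fastp&amp;rsquo;s defaults. A read fails if it has more than 5 Ns, if more than 40% of its bases are below Phred 15, or if it is shorter than 15 bases. The sketch assumes Phred+33 qualities and uses a hypothetical function name, so it illustrates the logic rather than reproducing fastqrab&amp;rsquo;s or fastp&amp;rsquo;s code.&lt;/p>

```python
def passes_fastp_defaults(seq, qual, n_base_limit=5, qualified_phred=15,
                          unqualified_percent_limit=40, length_required=15):
    """Apply fastp-style N, quality, and length filters to one trimmed read."""
    if len(seq) == 0:
        return False
    # --n_base_limit: discard reads with more than n_base_limit N bases.
    if seq.upper().count("N") > n_base_limit:
        return False
    # --qualified_quality_phred: a base is qualified if its Phred score is at
    # least the threshold. With Phred+33, Q15 is ASCII 48 (the character '0').
    threshold_ascii = qualified_phred + 33
    qualified = sum(1 for c in qual if ord(c) >= threshold_ascii)
    unqualified_percent = 100 * (len(qual) - qualified) / len(qual)
    # --unqualified_percent_limit: discard if more than 40% are unqualified.
    if unqualified_percent > unqualified_percent_limit:
        return False
    # --length_required: discard reads shorter than length_required.
    return len(seq) >= length_required


print(passes_fastp_defaults("ACGT" * 5, "I" * 20))            # prints True
print(passes_fastp_defaults("N" * 6 + "ACGT" * 4, "I" * 22))  # prints False
```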
&lt;p>&lt;strong>Note on quality scores&lt;/strong>: Quality values in FASTQ files are ASCII-encoded. With the standard Phred+33 offset, the ASCII code of each quality character is its Phred score plus 33. Phred Q15 therefore corresponds to ASCII 48, the character &lt;code>0&lt;/code> (&lt;code>15 + 33 = 48&lt;/code>).&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/10-adapter-identification/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/10-adapter-identification/</guid><description>&lt;h1 id="cookbook-10-adapter-identification">
 Cookbook 10: Adapter Identification
 &lt;a class="anchor" href="#cookbook-10-adapter-identification">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have a FASTQ file and want to identify which sequencing adapter is present
before trimming — or to confirm no adapter contamination remains after
trimming. This is useful when the adapter type is unknown, when working with
data from multiple library prep kits, or when validating a trimming step.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Runs a single &lt;code>Report&lt;/code> step (&lt;code>count_oligos&lt;/code>) that counts, for each
common adapter sequence, how many reads contain an exact copy of it&lt;/li>
&lt;li>Writes an &lt;a href="../../../../cookbooks/10-adapter-identification/output.html">HTML&lt;/a> and JSON report — no reads are filtered or written to disk&lt;/li>
&lt;/ol>
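&lt;p>Exact per-read oligo counting can be sketched in Python as follows. A read contributes 1 to an adapter&amp;rsquo;s count if the adapter occurs verbatim anywhere in it, with no mismatches and no IUPAC codes. The two-entry adapter table and the function name are illustrative and hypothetical. They do not reproduce fastqrab&amp;rsquo;s built-in adapter list or its report code.&lt;/p>

```python
# Illustrative adapter table (not fastqrab's built-in list).
ADAPTERS = {
    "illumina_truseq": "AGATCGGAAGAGC",
    "nextera": "CTGTCTCTTATACACATCT",
}


def count_reads_with_oligos(reads, oligos):
    """Count reads containing an exact copy of each oligo (no mismatches, no IUPAC)."""
    counts = {name: 0 for name in oligos}
    for read in reads:
        for name, oligo in oligos.items():
            if oligo in read:
                counts[name] += 1  # each read counts at most once per oligo
    return counts


reads = [
    "TTTTAGATCGGAAGAGCAAAA",
    "CTGTCTCTTATACACATCTGG",
    "AGATCGGAAGAGGGGG",  # last adapter base differs: not counted
]
print(count_reads_with_oligos(reads, ADAPTERS))  # prints {'illumina_truseq': 1, 'nextera': 1}
```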
&lt;h2 id="how-count_oligos-works">
 How count_oligos Works
 &lt;a class="anchor" href="#how-count_oligos-works">#&lt;/a>
&lt;/h2>
&lt;p>&lt;code>count_oligos&lt;/code> performs exact, full-sequence matching across every read. A read
is counted if the probe sequence appears verbatim anywhere within it. There are
no mismatches and no IUPAC wildcards. A non-zero count means reads carry at
least one complete copy of that adapter.&lt;/p></description></item></channel></rss>