<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Cookbooks on mbf-fastq-processor documentation</title><link>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/</link><description>Recent content in Cookbooks on mbf-fastq-processor documentation</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/index.xml" rel="self" type="application/rss+xml"/><item><title/><link>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/01-basic-quality-report/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/01-basic-quality-report/</guid><description>&lt;h1 id="cookbook-01-basic-quality-report">
 Cookbook 01: Basic Quality Report
 &lt;a class="anchor" href="#cookbook-01-basic-quality-report">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have FASTQ files from a sequencing run and want to generate comprehensive quality reports to assess:&lt;/p>
&lt;ul>
&lt;li>Read quality scores&lt;/li>
&lt;li>Base composition&lt;/li>
&lt;li>Read length distribution&lt;/li>
&lt;li>Duplicate read counts&lt;/li>
&lt;/ul>
&lt;p>This is typically the first step in a sequencing data analysis: it establishes data quality before any downstream processing.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Reads input FASTQ file(s)&lt;/li>
&lt;li>Generates a comprehensive quality report including:
&lt;ul>
&lt;li>Base quality statistics&lt;/li>
&lt;li>Base distribution across positions&lt;/li>
&lt;li>Read length distribution&lt;/li>
&lt;li>Duplicate read counting&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Outputs reports in both HTML (human-readable) and JSON (machine-readable) formats&lt;/li>
&lt;li>Passes through all reads unchanged (no filtering)&lt;/li>
&lt;/ol>
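&lt;p>As a rough illustration of the statistics listed above, the following Python sketch computes a read length distribution, a mean base quality per read, and an exact-duplicate count for a small in-memory FASTQ sample. It is not how mbf-fastq-processor builds its report; the function name &lt;code>summarize_fastq&lt;/code>, the example records, and the Phred+33 quality encoding are assumptions made for this example.&lt;/p>

```python
from collections import Counter


def summarize_fastq(text, phred_offset=33):
    """Compute simple statistics from FASTQ text (4 lines per record)."""
    lines = text.strip().split("\n")
    sequences = lines[1::4]   # second line of each record
    qualities = lines[3::4]   # fourth line of each record

    length_distribution = Counter(len(seq) for seq in sequences)
    # Mean Phred score per read, assuming Phred+33 encoding.
    mean_qualities = [
        sum(ord(ch) - phred_offset for ch in qual) / len(qual)
        for qual in qualities
    ]
    # A read counts as a duplicate if its exact sequence was seen before.
    sequence_counts = Counter(sequences)
    duplicates = sum(count - 1 for count in sequence_counts.values())

    return {
        "reads": len(sequences),
        "length_distribution": dict(length_distribution),
        "mean_qualities": mean_qualities,
        "duplicates": duplicates,
    }


# Hypothetical three-read sample; read2 repeats the sequence of read1.
EXAMPLE = "\n".join([
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTACGT", "+", "IIII5555",
    "@read3", "TTGCA", "+", "55555",
])

summary = summarize_fastq(EXAMPLE)
print(summary)
```

&lt;p>The sketch counts only exact sequence matches as duplicates and keeps every read in memory, which is fine for a toy sample but would not scale to a full sequencing run.&lt;/p>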
&lt;h2 id="input-files">
 Input Files
 &lt;a class="anchor" href="#input-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>input/sample_R1.fq&lt;/code> - Forward reads (Read 1) from paired-end sequencing&lt;/li>
&lt;/ul>
&lt;h2 id="output-files">
 Output Files
 &lt;a class="anchor" href="#output-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>output_R1.fq&lt;/code> - Passed-through reads (identical to input)&lt;/li>
&lt;li>&lt;code>output.report_initial.html&lt;/code> - HTML quality report&lt;/li>
&lt;li>&lt;code>output.report_initial.json&lt;/code> - JSON quality report with detailed statistics&lt;/li>
&lt;/ul>
&lt;h2 id="when-to-use-this">
 When to Use This
 &lt;a class="anchor" href="#when-to-use-this">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>First analysis of new sequencing data&lt;/li>
&lt;li>Quality control before committing to expensive downstream analysis&lt;/li>
&lt;li>Comparing data quality across different sequencing runs&lt;/li>
&lt;li>Identifying potential issues (adapter contamination, quality drop-off, etc.)&lt;/li>
&lt;/ul>
&lt;h2 id="download">
 Download
 &lt;a class="anchor" href="#download">#&lt;/a>
&lt;/h2>
&lt;p>&lt;a href="../../../../cookbooks/01-basic-quality-report.tar.gz">Download 01-basic-quality-report.tar.gz&lt;/a> for a complete, runnable example including expected output files.&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/02-umi-extraction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/02-umi-extraction/</guid><description>&lt;h1 id="cookbook-02-umi-extraction">
 Cookbook 02: UMI Extraction
 &lt;a class="anchor" href="#cookbook-02-umi-extraction">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>You have sequencing data with Unique Molecular Identifiers (UMIs) embedded in the reads. UMIs are short random barcodes added during library preparation that allow you to:&lt;/p>
&lt;ul>
&lt;li>Identify and remove PCR duplicates&lt;/li>
&lt;li>Distinguish true biological duplicates from amplification artifacts&lt;/li>
&lt;li>Improve accuracy in quantitative analyses (RNA-seq, ATAC-seq, etc.)&lt;/li>
&lt;/ul>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Reads an input FASTQ file with UMIs at the start of read1&lt;/li>
&lt;li>Extracts the UMI sequence (first 8 bases) into a tag labeled &lt;code>umi&lt;/code>&lt;/li>
&lt;li>Stores the UMI in the read comment (FASTQ header)&lt;/li>
&lt;li>Removes the UMI bases from the read sequence (so they don&amp;rsquo;t interfere with alignment)&lt;/li>
&lt;li>Outputs modified reads with UMI preserved in the header&lt;/li>
&lt;/ol>
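&lt;p>To make the effect of these steps concrete, here is a minimal Python sketch that applies the same idea to a single read. It only illustrates the transformation and is not the tool's implementation; the function name &lt;code>extract_umi&lt;/code>, the example read, and the &lt;code>umi=&lt;/code> header format are assumptions, and the exact comment layout the tool writes may differ.&lt;/p>

```python
def extract_umi(record, umi_length=8):
    """Move the first `umi_length` bases of a read into its header.

    `record` is a (header, sequence, quality) tuple for one FASTQ read.
    """
    header, sequence, quality = record
    umi = sequence[:umi_length]
    # Append the UMI to the header; the "umi=" key is an assumption of this sketch.
    new_header = f"{header} umi={umi}"
    # Trim sequence and quality together so they stay the same length.
    return new_header, sequence[umi_length:], quality[umi_length:]


# Hypothetical read: 8 UMI bases followed by 12 bases of insert.
record = ("@read1", "AAAACCCCGGGGTTTTACGT", "IIIIIIII555555555555")
result = extract_umi(record)
print(result)
```

&lt;p>Trimming the quality string together with the sequence matters: a FASTQ record whose sequence and quality lengths differ is invalid.&lt;/p>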
&lt;h2 id="input-files">
 Input Files
 &lt;a class="anchor" href="#input-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>input/sample_R1.fq&lt;/code> - Reads with 8bp UMI at the start&lt;/li>
&lt;/ul>
&lt;h2 id="output-files">
 Output Files
 &lt;a class="anchor" href="#output-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>output_R1.fq&lt;/code> - Reads with UMI in comment, UMI bases removed from sequence&lt;/li>
&lt;/ul>
&lt;h2 id="configuration-highlights">
 Configuration Highlights
 &lt;a class="anchor" href="#configuration-highlights">#&lt;/a>
&lt;/h2>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-toml" data-lang="toml">&lt;span style="display:flex;">&lt;span>[[&lt;span style="color:#a6e22e">step&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Extract UMI from positions 0-7 (8 bases)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">action&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;ExtractRegions&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">label&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;umi&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">regions&lt;/span> = [{&lt;span style="color:#a6e22e">segment&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;read1&amp;#39;&lt;/span>, &lt;span style="color:#a6e22e">start&lt;/span> = &lt;span style="color:#ae81ff">0&lt;/span>, &lt;span style="color:#a6e22e">length&lt;/span> = &lt;span style="color:#ae81ff">8&lt;/span>}]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>[[&lt;span style="color:#a6e22e">step&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Store UMI in the FASTQ comment&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">action&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;StoreTagInComment&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">label&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;umi&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>[[&lt;span style="color:#a6e22e">step&lt;/span>]]
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Remove the UMI bases from the read&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">action&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;CutStart&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">target&lt;/span> = &lt;span style="color:#e6db74">&amp;#39;Read1&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#a6e22e">n&lt;/span> = &lt;span style="color:#ae81ff">8&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="workflow-details">
 Workflow Details
 &lt;a class="anchor" href="#workflow-details">#&lt;/a>
&lt;/h2>
&lt;p>&lt;strong>Before processing:&lt;/strong>&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/03-lexogen-quantseq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/03-lexogen-quantseq/</guid><description>&lt;h1 id="cookbook-03-lexogen-quantseq-processing">
 Cookbook 03: Lexogen QuantSeq Processing
 &lt;a class="anchor" href="#cookbook-03-lexogen-quantseq-processing">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>Lexogen QuantSeq is a popular 3&amp;rsquo; mRNA sequencing protocol optimized for gene expression profiling. The library structure includes:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>First 8 bases&lt;/strong>: UMI (Unique Molecular Identifier) for deduplication&lt;/li>
&lt;li>&lt;strong>Next 6 bases&lt;/strong>: Random hexamer primer sequence (needs removal)&lt;/li>
&lt;li>&lt;strong>Remaining sequence&lt;/strong>: Actual cDNA from the 3&amp;rsquo; end of transcripts&lt;/li>
&lt;/ul>
&lt;p>This cookbook demonstrates the standard preprocessing for QuantSeq data before alignment.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;ol>
&lt;li>Extracts the 8bp UMI from the start of reads&lt;/li>
&lt;li>Stores the UMI in the read comment (FASTQ header)&lt;/li>
&lt;li>Removes the first 14 bases total (8bp UMI + 6bp random hexamer)&lt;/li>
&lt;li>Outputs processed reads ready for alignment&lt;/li>
&lt;/ol>
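&lt;p>The offsets involved can be sketched in a few lines of Python. This is an illustration of the read layout described above, not the tool's code; the function name &lt;code>split_quantseq_read&lt;/code> and the example sequence are made up for this example.&lt;/p>

```python
UMI_LENGTH = 8       # first 8 bases: UMI
HEXAMER_LENGTH = 6   # next 6 bases: random hexamer


def split_quantseq_read(sequence):
    """Split a raw read into (umi, hexamer, cdna) by fixed offsets."""
    umi = sequence[:UMI_LENGTH]
    hexamer = sequence[UMI_LENGTH:UMI_LENGTH + HEXAMER_LENGTH]
    # Everything after the first 14 bases is kept for alignment.
    cdna = sequence[UMI_LENGTH + HEXAMER_LENGTH:]
    return umi, hexamer, cdna


# Hypothetical read: 8 + 6 + 12 bases.
umi, hexamer, cdna = split_quantseq_read("ACGTACGTNNNNNNTTTTGGGGCCCC")
print(umi, hexamer, cdna)
```

&lt;p>Only the UMI is retained (in the header); the hexamer is discarded along with it when the first 14 bases are cut.&lt;/p>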
&lt;h2 id="input-files">
 Input Files
 &lt;a class="anchor" href="#input-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>input/quantseq_sample.fq&lt;/code> - Raw QuantSeq reads with UMI and random hexamer&lt;/li>
&lt;/ul>
&lt;h2 id="output-files">
 Output Files
 &lt;a class="anchor" href="#output-files">#&lt;/a>
&lt;/h2>
&lt;ul>
&lt;li>&lt;code>output_read1.fq&lt;/code> - Processed reads with:
&lt;ul>
&lt;li>UMI stored in comment&lt;/li>
&lt;li>First 14bp removed&lt;/li>
&lt;li>Ready for alignment to reference genome&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="workflow-details">
 Workflow Details
 &lt;a class="anchor" href="#workflow-details">#&lt;/a>
&lt;/h2>
&lt;p>&lt;strong>Raw read structure:&lt;/strong>&lt;/p></description></item><item><title/><link>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/04-phix-removal/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://tyberiusprime.github.io/fastqrab/v0.8.1/docs/how-to/cookbooks/04-phix-removal/</guid><description>&lt;h1 id="cookbook-04-phix-removal">
 Cookbook 04: PhiX Removal
 &lt;a class="anchor" href="#cookbook-04-phix-removal">#&lt;/a>
&lt;/h1>
&lt;h2 id="use-case">
 Use Case
 &lt;a class="anchor" href="#use-case">#&lt;/a>
&lt;/h2>
&lt;p>Your dataset contains reads from the Illumina PhiX spike-in, and you want to remove them before downstream analysis. PhiX is commonly added as a control to increase base diversity during sequencing runs.&lt;/p>
&lt;h2 id="what-this-pipeline-does">
 What This Pipeline Does
 &lt;a class="anchor" href="#what-this-pipeline-does">#&lt;/a>
&lt;/h2>
&lt;p>This cookbook demonstrates how to identify and remove PhiX contamination using k-mer counting:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Count k-mers&lt;/strong>: Uses &lt;code>CalcKmers&lt;/code> to count how many 30-mers from each read match the PhiX genome&lt;/li>
&lt;li>&lt;strong>Export data&lt;/strong>: Saves k-mer counts to a TSV table for analysis&lt;/li>
&lt;li>&lt;strong>Filter reads&lt;/strong>: Removes reads with high PhiX k-mer counts (≥25 matching k-mers)&lt;/li>
&lt;/ol>
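&lt;p>The counting and filtering logic can be sketched in plain Python as follows. This is a simplified illustration, not the &lt;code>CalcKmers&lt;/code> implementation: the helper names, the toy reference string, and the small values of k and the threshold are assumptions chosen so the example stays readable, and the sketch ignores reverse complements.&lt;/p>

```python
def kmers(sequence, k):
    """Return every length-k substring of a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def count_matching_kmers(read, reference_kmers, k):
    """Count how many k-mers of `read` occur in the reference k-mer set."""
    return sum(1 for kmer in kmers(read, k) if kmer in reference_kmers)


def keep_read(read, reference_kmers, k, min_count):
    """Keep a read only if it has fewer than `min_count` matching k-mers."""
    return min_count > count_matching_kmers(read, reference_kmers, k)


# Toy values for readability; the cookbook uses 30-mers and a threshold of 25.
K = 4
MIN_COUNT = 3
reference = "GAGTTTTATCGCTTCC"   # made-up stand-in for the PhiX genome
reference_kmers = set(kmers(reference, K))

reads = ["TTTTATCGCT", "ACACACACAC"]
counts = [count_matching_kmers(r, reference_kmers, K) for r in reads]
kept = [r for r in reads if keep_read(r, reference_kmers, K, MIN_COUNT)]
print(counts, kept)
```

&lt;p>The first read is a substring of the toy reference, so all of its k-mers match and it is removed; the second shares no k-mers with the reference and is kept.&lt;/p>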
&lt;h2 id="understanding-the-approach">
 Understanding the Approach
 &lt;a class="anchor" href="#understanding-the-approach">#&lt;/a>
&lt;/h2>
&lt;h3 id="k-mer-counting">
 K-mer Counting
 &lt;a class="anchor" href="#k-mer-counting">#&lt;/a>
&lt;/h3>
&lt;p>The &lt;code>CalcKmers&lt;/code> step counts how many k-mers (short subsequences of length k) from each read are present in the PhiX reference genome:&lt;/p></description></item></channel></rss>