Cookbook 01: Basic Quality Report #

Use Case #

You have FastQ files from a sequencing run and want to generate comprehensive quality reports to assess:

Read quality scores
Base composition
Read length distribution
Duplicate read counts

This is typically the first step in any sequencing data analysis to understand data quality before downstream processing.

What This Pipeline Does #

Reads input FastQ file(s)
Generates a comprehensive quality report including:
- Base quality statistics
- Base distribution across positions
- Read length distribution
- Duplicate read counting
Outputs reports in both HTML (human-readable) and JSON (machine-readable) formats
Passes through all reads unchanged (no filtering)

Input Files #

input/sample_R1.fq - Forward reads (Read 1) from paired-end sequencing

Output Files #

output_R1.fq - Passed-through reads (identical to input)
output.report_initial.html - HTML quality report
output.report_initial.json - JSON quality report with detailed statistics

When to Use This #

First analysis of new sequencing data
Quality control before committing to expensive downstream analysis
Comparing data quality across different sequencing runs
Identifying potential issues (adapter contamination, quality drop-off, etc.)

Download #

Download 01-basic-quality-report.tar.gz for a complete, runnable example including expected output files.

Configuration File #

[input]
    # Single-end reads for this example
    # For paired-end data, you would also include: read2 = 'input/sample_R2.fq'
    read1 = 'input/sample_R1.fq'

[[step]]
    # Generate a comprehensive quality report
    action = 'Report'
    name = 'initial'

    # Count total number of reads
    count = true

    # Analyze base quality scores and GC content
    base_statistics = true
    # Analyze the distribution of read lengths

    length_distribution = true

    # Count duplicate reads (identical sequences)
    duplicate_count_per_read = true

    # Count duplicate reads (identical sequences)
    duplicate_count_per_fragment = true

	count_oligos = [
		'AAAAAA',
		'GATCGGAAGAGCACACGTCTGAACTCCAGTCAC',
		'ATCTCGTATGCCGTCTTCTGCTTG',
		'GGGGGGGGGGG',
	]


[output]
    # Output prefix for all files
    prefix = 'output'

    # Generate both HTML and JSON reports
    report_html = true
    report_json = true

    # Output format (FASTQ = uncompressed FASTQ format)
    format = "FASTQ"