Cookbook 01: Basic Quality Report
Use Case
You have FASTQ files from a sequencing run and want to generate a comprehensive quality report to assess:
- Read quality scores
- Base composition
- Read length distribution
- Duplicate read counts
This is typically the first step in a sequencing data analysis: it establishes data quality before any downstream processing.
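To make the four metrics concrete, here is a minimal Python sketch that computes them from FASTQ text. It is an illustration only, not how the pipeline builds its report. It assumes four-line records and Phred+33 quality encoding, and the names `read_fastq` and `summarize` are hypothetical.

```python
from collections import Counter
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) ends the stream
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def summarize(handle, phred_offset=33):
    """Compute simple per-file metrics; assumes Phred+33 unless told otherwise."""
    lengths = Counter()      # read length -> number of reads
    seen = Counter()         # sequence -> number of occurrences
    base_counts = Counter()  # base -> total count over all reads
    qual_sum = 0
    n_bases = 0
    n_reads = 0
    for _name, seq, qual in read_fastq(handle):
        n_reads += 1
        lengths[len(seq)] += 1
        seen[seq] += 1
        base_counts.update(seq)
        qual_sum += sum(ord(c) - phred_offset for c in qual)
        n_bases += len(qual)
    return {
        "reads": n_reads,
        "length_distribution": dict(lengths),
        "base_counts": dict(base_counts),
        "mean_quality": qual_sum / n_bases if n_bases else 0.0,
        # every occurrence of a sequence beyond the first counts as a duplicate
        "duplicate_reads": sum(n - 1 for n in seen.values()),
    }


# Tiny in-memory example: two identical reads and one shorter, low-quality read.
EXAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nGG\n+\n!!\n"
summary = summarize(StringIO(EXAMPLE))
```

For real files, pass an open file handle instead of the `StringIO` object, for example `summarize(open('input/sample_R1.fq'))`.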
What This Pipeline Does
- Reads input FastQ file(s)
- Generates a comprehensive quality report including:
  - Base quality statistics
  - Base distribution across positions
  - Read length distribution
  - Duplicate read counting
- Outputs reports in both HTML (human-readable) and JSON (machine-readable) formats
- Passes through all reads unchanged (no filtering)
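Because the reads pass through unchanged, the output FASTQ should match the input. One way to check this is to compare file hashes, as in the sketch below. The helper names `sha256_of` and `files_identical` are hypothetical, and a byte-level comparison is a stricter test than record-level equality, so treat a mismatch as a prompt to look closer rather than proof of a problem.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True if both files have the same byte content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

With the file names used in this cookbook, the check would be `files_identical('input/sample_R1.fq', 'output_R1.fq')`.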
Input Files
- input/sample_R1.fq - Forward reads (Read 1) from a paired-end sequencing run, processed here as single-end data
Output Files
- output_R1.fq - Passed-through reads (identical to input)
- output.report_initial.html - HTML quality report
- output.report_initial.json - JSON quality report with detailed statistics
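The JSON report is meant for scripts. Its internal layout is not described here, so the sketch below makes no assumption about key names: it loads the file and lists whatever nested keys it finds, which is a reasonable first step before writing anything that depends on specific fields. The helpers `load_report` and `describe` are hypothetical.

```python
import json


def load_report(path):
    """Parse a JSON report file into Python objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def describe(node, max_depth=2, depth=0):
    """List nested dictionary keys with their value types, down to max_depth levels."""
    lines = []
    if not isinstance(node, dict) or depth >= max_depth:
        return lines
    for key, value in node.items():
        lines.append(f"{'  ' * depth}{key}: {type(value).__name__}")
        lines.extend(describe(value, max_depth, depth + 1))
    return lines
```

After running the pipeline, `print('\n'.join(describe(load_report('output.report_initial.json'))))` shows the top two levels of the report.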
When to Use This
- First analysis of new sequencing data
- Quality control before committing to expensive downstream analysis
- Comparing data quality across different sequencing runs
- Identifying potential issues (adapter contamination, quality drop-off, etc.)
Download
Download 01-basic-quality-report.tar.gz for a complete, runnable example including expected output files.
Configuration File
[input]
# Single-end reads for this example
# For paired-end data, you would also include: read2 = 'input/sample_R2.fq'
read1 = 'input/sample_R1.fq'
[[step]]
# Generate a comprehensive quality report
action = 'Report'
name = 'initial'
# Count total number of reads
count = true
# Analyze base quality scores and GC content
base_statistics = true
# Analyze the distribution of read lengths
length_distribution = true
# Count duplicate reads (identical sequences)
duplicate_count_per_read = true
# Count duplicate fragments (all reads of a fragment considered together; matters for paired-end data)
duplicate_count_per_fragment = true
# Count occurrences of specific sequences (here: homopolymer runs and adapter-like oligos)
count_oligos = [
'AAAAAA',
'GATCGGAAGAGCACACGTCTGAACTCCAGTCAC',
'ATCTCGTATGCCGTCTTCTGCTTG',
'GGGGGGGGGGG',
]
[output]
# Output prefix for all files
prefix = 'output'
# Generate both HTML and JSON reports
report_html = true
report_json = true
# Output format (FASTQ = uncompressed FASTQ format)
format = "FASTQ"
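The `count_oligos` setting asks the report to look for specific sequences in the reads. As a rough illustration of the idea, the Python sketch below counts how many reads contain each oligo at least once. Whether the tool counts reads or total occurrences is not stated here, so this counting rule is an assumption, and `count_reads_with_oligos` is a hypothetical helper.

```python
# Oligos copied from the configuration above.
OLIGOS = [
    "AAAAAA",
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "ATCTCGTATGCCGTCTTCTGCTTG",
    "GGGGGGGGGGG",
]


def count_reads_with_oligos(sequences, oligos=OLIGOS):
    """Count, per oligo, the reads that contain it at least once (assumed semantics)."""
    counts = {oligo: 0 for oligo in oligos}
    for seq in sequences:
        for oligo in oligos:
            if oligo in seq:
                counts[oligo] += 1
    return counts
```

A high count for an adapter-like oligo suggests adapter contamination, while many poly-G hits can point to signal loss at read ends on some instruments.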