Generate alignment statistics using samtools flagstat, stats, depth, and coverage. Use when assessing alignment quality, calculating coverage, or generating QC reports.
Resources
2Install
npx skillscat add gptomics/bioskills/bio-alignment-files-bam-statistics Install via the SkillsCat registry.
Version Compatibility
Reference examples tested with: pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package>thenhelp(module.function)to check signatures - CLI:
<tool> --versionthen<tool> --helpto confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
BAM Statistics
"Get alignment statistics and coverage from my BAM file" → Generate read counts, mapping rates, per-chromosome statistics, depth profiles, and coverage summaries.
- CLI:
samtools flagstat,samtools stats,samtools depth,samtools coverage(samtools) - Python:
pysam.AlignmentFilewithpileup()andget_index_statistics()(pysam)
Generate alignment statistics using samtools and pysam.
Quick Summary Commands
| Command | Output | Speed |
|---|---|---|
flagstat |
Read counts by category | Very fast |
idxstats |
Per-chromosome counts | Very fast (needs index) |
stats |
Comprehensive statistics | Moderate |
depth |
Per-position depth | Slow (full scan) |
coverage |
Per-region coverage | Fast (needs index) |
samtools flagstat
Fast summary of alignment flags.
samtools flagstat input.bamOutput:
10000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
50000 + 0 supplementary
0 + 0 duplicates
9800000 + 0 mapped (98.00% : N/A)
9950000 + 0 paired in sequencing
4975000 + 0 read1
4975000 + 0 read2
9700000 + 0 properly paired (97.49% : N/A)
9750000 + 0 with itself and mate mapped
100000 + 0 singletons (1.01% : N/A)
25000 + 0 with mate mapped to a different chr
10000 + 0 with mate mapped to a different chr (mapQ>=5)Multi-threaded
samtools flagstat -@ 4 input.bamOutput to File
samtools flagstat input.bam > flagstat.txtsamtools idxstats
Per-chromosome read counts (requires index).
samtools idxstats input.bamOutput format: chrom length mapped unmapped
chr1 248956422 5000000 1000
chr2 242193529 4800000 800
chrM 16569 50000 100
* 0 0 150000Parse idxstats
# Total mapped reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
# Mitochondrial percentage
samtools idxstats input.bam | awk '
/^chrM/ {mt = $3}
{total += $3}
END {print mt/total*100 "% mitochondrial"}'samtools stats
Comprehensive statistics including insert size, base quality, and more.
samtools stats input.bam > stats.txtView Summary Numbers
samtools stats input.bam | grep "^SN"Key summary fields:
raw total sequences- Total readsreads mapped- Mapped readsreads mapped and paired- Properly pairedinsert size average- Mean insert sizeinsert size standard deviation- Insert size spreadaverage length- Mean read lengtherror rate- Mismatch rate
Generate Plots (with plot-bamstats)
samtools stats input.bam > stats.txt
plot-bamstats -p plots/ stats.txtStats for Specific Region
samtools stats input.bam chr1:1000000-2000000 > region_stats.txtsamtools depth
Per-position read depth.
Basic Depth
samtools depth input.bam > depth.txtOutput: chrom position depth
Depth at Specific Positions
samtools depth -r chr1:1000-2000 input.bamInclude Zero-Depth Positions
samtools depth -a input.bam > depth_with_zeros.txtMaximum Depth Cap
samtools depth -d 0 input.bam # No cap (default 8000)Depth from BED Regions
samtools depth -b regions.bed input.bamCalculate Mean Depth
samtools depth input.bam | awk '{sum += $3; n++} END {print sum/n}'samtools coverage
Per-chromosome or per-region coverage statistics (faster than depth).
samtools coverage input.bamOutput columns:
#rname- Reference namestartpos- Start positionendpos- End positionnumreads- Number of readscovbases- Bases with coveragecoverage- Percentage of bases coveredmeandepth- Mean depthmeanbaseq- Mean base qualitymeanmapq- Mean mapping quality
Coverage for Specific Region
samtools coverage -r chr1:1000000-2000000 input.bamCoverage from BED
samtools coverage -b regions.bed input.bamHistogram Output
samtools coverage -m input.bampysam Python Alternative
Count Reads
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
total = mapped = paired = proper = 0
for read in bam:
total += 1
if not read.is_unmapped:
mapped += 1
if read.is_paired:
paired += 1
if read.is_proper_pair:
proper += 1
print(f'Total: {total}')
print(f'Mapped: {mapped} ({mapped/total*100:.1f}%)')
print(f'Properly paired: {proper} ({proper/paired*100:.1f}%)')Per-Chromosome Counts
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for stat in bam.get_index_statistics():
print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')Calculate Depth at Position
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup in bam.pileup('chr1', 1000000, 1000001):
print(f'Position {pileup.pos}: depth {pileup.n}')Mean Depth in Region
import pysam
def mean_depth(bam_path, chrom, start, end):
depths = []
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup in bam.pileup(chrom, start, end, truncate=True):
depths.append(pileup.n)
if depths:
return sum(depths) / len(depths)
return 0
depth = mean_depth('input.bam', 'chr1', 1000000, 2000000)
print(f'Mean depth: {depth:.1f}x')Coverage Statistics
Goal: Compute coverage breadth and depth for a genomic region from a BAM file.
Approach: Iterate pileup columns in the region, count covered positions and accumulate depth, then derive percentages and means.
Reference (pysam 0.22+):
import pysam
def coverage_stats(bam_path, chrom, start, end):
covered = 0
total_depth = 0
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup in bam.pileup(chrom, start, end, truncate=True):
covered += 1
total_depth += pileup.n
length = end - start
pct_covered = covered / length * 100
mean_depth = total_depth / length if length > 0 else 0
return {
'length': length,
'covered_bases': covered,
'pct_covered': pct_covered,
'mean_depth': mean_depth
}
stats = coverage_stats('input.bam', 'chr1', 1000000, 2000000)
print(f'Coverage: {stats["pct_covered"]:.1f}%')
print(f'Mean depth: {stats["mean_depth"]:.1f}x')Insert Size Distribution
Goal: Compute the insert size distribution to assess library preparation quality.
Approach: Iterate properly paired read1 records, accumulate template lengths into a Counter, then compute summary statistics.
Reference (pysam 0.22+):
import pysam
from collections import Counter
insert_sizes = Counter()
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for read in bam:
if read.is_proper_pair and read.is_read1 and read.template_length > 0:
insert_sizes[read.template_length] += 1
sizes = list(insert_sizes.keys())
mean_insert = sum(s * c for s, c in insert_sizes.items()) / sum(insert_sizes.values())
print(f'Mean insert size: {mean_insert:.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')Quick Reference
| Task | Command |
|---|---|
| Quick counts | samtools flagstat input.bam |
| Per-chrom counts | samtools idxstats input.bam |
| Full stats | samtools stats input.bam |
| Coverage summary | samtools coverage input.bam |
| Per-position depth | samtools depth input.bam |
| Mean depth | samtools depth input.bam | awk '{sum+=$3;n++}END{print sum/n}' |
Common Metrics
| Metric | Good | Concerning |
|---|---|---|
| Mapping rate | >95% | <80% |
| Proper pair rate | >90% | <70% |
| Duplicate rate | <20% | >40% |
| Error rate | <1% | >2% |
| Coverage uniformity | <2x CV | >3x CV |
Related Skills
- sam-bam-basics - View alignment files
- alignment-indexing - idxstats requires index
- duplicate-handling - Check duplicate rates
- alignment-filtering - Filter before stats
- sequence-io/sequence-statistics - FASTA/FASTQ statistics