Filter alignments by flags, mapping quality, and regions using samtools view and pysam. Use when extracting specific reads, removing low-quality alignments, or subsetting to target regions.
Resources
2Install
npx skillscat add gptomics/bioskills/bio-alignment-filtering Install via the SkillsCat registry.
Version Compatibility
Reference examples tested with: pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- Python:
pip show <package>thenhelp(module.function)to check signatures - CLI:
<tool> --versionthen<tool> --helpto confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed
package and adapt the example to match the actual API rather than retrying.
Alignment Filtering
"Filter my BAM file to keep only high-quality reads" → Select reads by FLAG bits, mapping quality, and genomic regions using samtools view or pysam.
- CLI:
samtools viewwith-F/-f/-q/-Lflags (samtools) - Python:
pysam.AlignmentFileiteration with attribute filters (pysam)
Filter alignments by flags, quality, and regions using samtools and pysam.
Filter Flags
| Option | Description |
|---|---|
-f FLAG |
Include reads with ALL bits set |
-F FLAG |
Exclude reads with ANY bits set |
-G FLAG |
Exclude reads with ALL bits set |
-q MAPQ |
Minimum mapping quality |
-L BED |
Include reads overlapping regions |
Common FLAG Values
| Flag | Hex | Meaning |
|---|---|---|
| 1 | 0x1 | Paired |
| 2 | 0x2 | Proper pair |
| 4 | 0x4 | Unmapped |
| 8 | 0x8 | Mate unmapped |
| 16 | 0x10 | Reverse strand |
| 32 | 0x20 | Mate reverse strand |
| 64 | 0x40 | First in pair (read1) |
| 128 | 0x80 | Second in pair (read2) |
| 256 | 0x100 | Secondary alignment |
| 512 | 0x200 | Failed QC |
| 1024 | 0x400 | Duplicate |
| 2048 | 0x800 | Supplementary |
Filter by FLAG
Keep Only Mapped Reads
samtools view -F 4 -o mapped.bam input.bamKeep Only Unmapped Reads
samtools view -f 4 -o unmapped.bam input.bamKeep Only Properly Paired
samtools view -f 2 -o proper.bam input.bamRemove Duplicates
samtools view -F 1024 -o nodup.bam input.bamRemove Secondary and Supplementary
samtools view -F 2304 -o primary.bam input.bamKeep Only Primary Alignments
samtools view -F 256 -F 2048 -o primary.bam input.bam
# Or combined: -F 2304Keep Read1 Only
samtools view -f 64 -o read1.bam input.bamKeep Read2 Only
samtools view -f 128 -o read2.bam input.bamForward Strand Only
samtools view -F 16 -o forward.bam input.bamReverse Strand Only
samtools view -f 16 -o reverse.bam input.bamFilter by Mapping Quality
Minimum MAPQ
samtools view -q 30 -o highqual.bam input.bamMAPQ and Mapped
samtools view -F 4 -q 30 -o filtered.bam input.bamCommon MAPQ Thresholds
| MAPQ | Meaning |
|---|---|
| 0 | Mapped to multiple locations equally well |
| 20 | ~1% chance of wrong mapping |
| 30 | ~0.1% chance of wrong mapping |
| 40 | ~0.01% chance of wrong mapping |
| 60 | Unique mapping (BWA max) |
Filter by Region
Single Region
samtools view -o region.bam input.bam chr1:1000000-2000000Multiple Regions
samtools view -o regions.bam input.bam chr1:1000-2000 chr2:3000-4000Regions from BED File
samtools view -L targets.bed -o targets.bam input.bamCombine Region and Quality
samtools view -q 30 -L targets.bed -o filtered.bam input.bamCombined Filters
Standard Quality Filter
Goal: Produce a clean BAM containing only primary, mapped, non-duplicate reads with high mapping confidence.
Approach: Combine FLAG exclusion (-F for unmapped + secondary + duplicate + supplementary) with a MAPQ threshold.
Reference (samtools 1.19+):
samtools view -F 3332 -q 30 -o filtered.bam input.bam
# 3332 = 4 (unmapped) + 256 (secondary) + 1024 (duplicate) + 2048 (supplementary)Variant Calling Prep
Goal: Prepare alignments for variant calling by keeping only properly paired, primary, deduplicated reads.
Approach: Require proper pair flag (-f 2), exclude secondary/duplicate/supplementary (-F 3328), and set a MAPQ floor.
Reference (samtools 1.19+):
samtools view -f 2 -F 3328 -q 20 -o clean.bam input.bam
# 3328 = 256 (secondary) + 1024 (duplicate) + 2048 (supplementary)
# Note: -f 2 (proper pair) implies mapped, so -F 4 is not strictly neededChIP-seq Filter
# Remove duplicates and low MAPQ
samtools view -F 1024 -q 30 -o filtered.bam input.bamSubsample Reads
Random Subsample
# Keep ~10% of reads
samtools view -s 0.1 -o subset.bam input.bam
# With seed for reproducibility
samtools view -s 42.1 -o subset.bam input.bamSubsample to Target Count
# Calculate fraction needed
total=$(samtools view -c input.bam)
frac=$(echo "scale=4; 1000000 / $total" | bc)
samtools view -s "$frac" -o subset.bam input.bampysam Python Alternative
Basic Filtering
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('filtered.bam', 'wb', header=infile.header) as outfile:
for read in infile:
if read.is_unmapped:
continue
if read.mapping_quality < 30:
continue
if read.is_duplicate:
continue
outfile.write(read)Filter with Function
Goal: Apply a multi-criteria quality filter to produce clean alignments for downstream analysis.
Approach: Define a predicate checking mapped status, primary alignment, duplicate flag, and MAPQ; stream reads through it.
Reference (pysam 0.22+):
import pysam
def passes_filter(read):
if read.is_unmapped:
return False
if read.is_secondary or read.is_supplementary:
return False
if read.is_duplicate:
return False
if read.mapping_quality < 30:
return False
return True
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('filtered.bam', 'wb', header=infile.header) as outfile:
for read in infile:
if passes_filter(read):
outfile.write(read)Filter by Region
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('region.bam', 'wb', header=infile.header) as outfile:
for read in infile.fetch('chr1', 1000000, 2000000):
outfile.write(read)Filter from BED File
Goal: Extract only reads overlapping target regions defined in a BED file.
Approach: Parse BED into a list of (chrom, start, end) tuples, then fetch reads from each region and write to output.
Reference (pysam 0.22+):
import pysam
def read_bed(bed_path):
regions = []
with open(bed_path) as f:
for line in f:
if line.startswith('#'):
continue
parts = line.strip().split('\t')
regions.append((parts[0], int(parts[1]), int(parts[2])))
return regions
regions = read_bed('targets.bed')
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('targets.bam', 'wb', header=infile.header) as outfile:
for chrom, start, end in regions:
for read in infile.fetch(chrom, start, end):
outfile.write(read)Subsample
import pysam
import random
random.seed(42)
fraction = 0.1
with pysam.AlignmentFile('input.bam', 'rb') as infile:
with pysam.AlignmentFile('subset.bam', 'wb', header=infile.header) as outfile:
for read in infile:
if random.random() < fraction:
outfile.write(read)Quick Reference
| Task | samtools command |
|---|---|
| Mapped only | view -F 4 |
| Unmapped only | view -f 4 |
| Properly paired | view -f 2 |
| Primary only | view -F 2304 |
| No duplicates | view -F 1024 |
| High MAPQ | view -q 30 |
| Region | view file.bam chr1:1-1000 |
| BED regions | view -L file.bed |
| Subsample 10% | view -s 0.1 |
| Standard filter | view -F 3332 -q 30 |
Common Filter Combinations
| Purpose | Flags |
|---|---|
| Clean reads | -F 3332 -q 30 (mapped, primary, no dups, high qual) |
| Variant calling | -f 2 -F 3328 -q 20 (proper pair, primary, no dups) |
| Coverage analysis | -F 1284 -q 1 (mapped, primary, no dups) |
| Count unique | -F 2304 (primary only) |
Flag breakdowns:
- 2304 = 256 + 2048 (secondary + supplementary)
- 3328 = 256 + 1024 + 2048 (secondary + duplicate + supplementary)
- 3332 = 4 + 256 + 1024 + 2048 (unmapped + secondary + duplicate + supplementary)
- 1284 = 4 + 256 + 1024 (unmapped + secondary + duplicate)
Related Skills
- sam-bam-basics - View and understand alignment files
- alignment-sorting - Sort before/after filtering
- alignment-indexing - Required for region filtering
- duplicate-handling - Mark duplicates before filtering
- bam-statistics - Check filter effects