mirpy documentation
AIRR-seq toolkit
mirpy is a Python toolkit for immune repertoire analysis. It brings together parsers, repertoire abstractions, diversity utilities, and exploratory notebooks in one place.
Getting Started
Install the package, load segment libraries, parse repertoires, and understand the main concepts.
Notebook Gallery
Open every analysis notebook from the repository directly in the docs, without maintaining duplicate examples.
Repository
Jump to the GitHub repository for source code, issues, notebooks, and the current development state.
What mirpy covers
Large AIRR-seq datasets
Parse AIRR-seq repertoires, clonotype tables, and segment annotations into reusable Python objects for downstream analysis.
Basic repertoire statistics
Compute diversity summaries, richness estimates, rarefaction, and counts such as singletons and doubletons.
K-mer analysis
Work with sequence k-mers for repertoire comparison, enrichment analysis, and exploratory feature construction.
T-cell marker discovery
Search for T-cell biomarkers and enriched clonotype patterns across cohorts and phenotype groups.
Gene usage matrices
Build and analyze V/J usage matrices for samples, cohorts, and comparative repertoire studies.
Resampling workflows
Resample repertoires to matched depths or to adjusted segment usage profiles for controlled comparisons.
Clonotype filtering and clustering
Select clonotypes, compare them by distance, and cluster related sequences into analysis-ready groups.
Embeddings
Generate repertoire and prototype embeddings that can be used in downstream machine learning and visualization pipelines.
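To make the repertoire statistics listed above more concrete, here is a minimal sketch in plain Python of singleton and doubleton counts, a bias-corrected Chao1 richness estimate, and Shannon diversity. It does not use mirpy's API: the function name `repertoire_summary` and the toy counts are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def repertoire_summary(counts):
    """Summarize a repertoire from per-clonotype counts (hypothetical helper)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    observed = len(counts)

    # How many clonotypes were seen exactly once / exactly twice
    abundance = Counter(counts)
    f1 = abundance[1]  # singletons
    f2 = abundance[2]  # doubletons

    # Bias-corrected Chao1: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    chao1 = observed + f1 * (f1 - 1) / (2 * (f2 + 1))

    # Shannon entropy (natural log) over clonotype frequencies
    shannon = -sum((c / total) * math.log(c / total) for c in counts)

    return {
        "observed": observed,
        "singletons": f1,
        "doubletons": f2,
        "chao1": chao1,
        "shannon": shannon,
    }


# Toy counts, not real data
print(repertoire_summary([1, 1, 1, 2, 2, 5, 10]))
```

The bias-corrected form of Chao1 is used here because it stays defined when a sample has no doubletons. mirpy's own diversity utilities may use a different estimator or different defaults.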
Quick example
import gzip

from mir.basic.token_tables import filter_token_table, tokenize_clonotypes
from mir.common.clonotype import Clonotype
from mir.graph.token_graph import build_token_graph

# Load CDR3 sequences and build an RS-filtered token graph
with gzip.open("gilgfvftl_trb_cdr3.txt.gz", "rt") as fh:
    cdr3s = [line.strip() for line in fh if line.strip()]

clonotypes = [
    Clonotype(junction_aa=seq, locus="TRB", v_gene="TRBV", duplicate_count=1)
    for seq in cdr3s
]
table = tokenize_clonotypes(clonotypes, k=3)
rs_table = filter_token_table(table, kmer_pattern="RS")
g_rs = build_token_graph(clonotypes, rs_table)

# Largest connected component: the RS-bearing clonotype cluster
rs_cluster = g_rs.components().giant()
print(f"RS cluster: {rs_cluster.vcount()} nodes")
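For readers who do not have the example data file at hand, the idea behind the snippet above can be sketched in plain Python: split each CDR3 into overlapping k-mers, keep only k-mers that contain a motif, and group sequences that share a retained k-mer. This is an illustration under assumptions, not mirpy's implementation. The helpers `kmers` and `motif_clusters` and the toy sequences are hypothetical, and mirpy's token graph may define nodes and edges differently.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def motif_clusters(seqs, k=3, motif="RS"):
    """Group sequences that share at least one motif-bearing k-mer."""
    # Map each retained k-mer to the indices of the sequences containing it
    index = defaultdict(set)
    for i, seq in enumerate(seqs):
        for km in kmers(seq, k):
            if motif in km:
                index[km].add(i)

    # Union-find over sequence indices
    parent = list(range(len(seqs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Sequences sharing a k-mer end up in the same component
    for members in index.values():
        ordered = sorted(members)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    # Collect components, skipping sequences with no motif-bearing k-mer
    linked = {i for members in index.values() for i in members}
    groups = defaultdict(list)
    for i in sorted(linked):
        groups[find(i)].append(seqs[i])
    return sorted(groups.values(), key=len, reverse=True)


# Toy sequences, not real data
toy = ["CASSIRSSYEQYF", "CASSIRSAYEQYF", "CASSLGQAYEQYF", "CASSPRSTDTQYF"]
clusters = motif_clusters(toy)
print(f"Largest cluster: {len(clusters[0])} sequences")
```

Union-find keeps the sketch free of dependencies. A graph library would give the same connected components and would also expose the graph itself for plotting or further analysis.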
Explore next
Getting Started for the shortest path from install to first parsed repertoire
Getting Started for the Copilot agent and companion prompt (/mirpy-analysis)
Examples for the full notebook gallery published from the repository
mir for API documentation generated from the current codebase
GitHub repository for source browsing and notebooks