mirpy documentation

AIRR-seq toolkit

mirpy is a Python toolkit for immune repertoire analysis. It brings together parsers, repertoire abstractions, diversity utilities, and exploratory notebooks in one place.

What mirpy covers

Large AIRR-seq datasets

Parse AIRR-seq repertoires, clonotype tables, and segment annotations into reusable Python objects for downstream analysis.

Basic repertoire statistics

Compute diversity summaries, richness estimates, rarefaction, and counts such as singletons and doubletons.

K-mer analysis

Work with sequence k-mers for repertoire comparison, enrichment analysis, and exploratory feature construction.

T-cell marker discovery

Search for T-cell biomarkers and enriched clonotype patterns across cohorts and phenotype groups.

Gene usage matrices

Build and analyze V/J usage matrices for samples, cohorts, and comparative repertoire studies.

Resampling workflows

Resample repertoires to matched depths or to adjusted segment usage profiles for controlled comparisons.

Clonotype filtering and clustering

Select clonotypes, compare them by distance, and cluster related sequences into analysis-ready groups.

Embeddings

Generate repertoire and prototype embeddings that can be used in downstream machine learning and visualization pipelines.
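
To make the quantities listed under "Basic repertoire statistics" concrete, here is a minimal standard-library sketch of singleton and doubleton counts, a bias-corrected Chao1 richness estimate, and Shannon entropy. It does not use or reproduce mirpy's own API: the function name `repertoire_stats` and the toy counts are assumptions made purely for illustration.

```python
import math
from collections import Counter


def repertoire_stats(counts):
    """Summarize per-clonotype read counts (positive integers).

    Hypothetical helper for illustration; not part of mirpy.
    """
    total = sum(counts)
    observed = len(counts)
    # Frequency of frequencies: how many clonotypes were seen exactly n times.
    freq_of_freq = Counter(counts)
    singletons = freq_of_freq[1]
    doubletons = freq_of_freq[2]
    # Bias-corrected Chao1 lower bound on richness.
    # This form stays defined when there are no doubletons.
    chao1 = observed + singletons * (singletons - 1) / (2 * (doubletons + 1))
    # Shannon entropy (natural log) of the clonotype frequencies.
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    return {
        "observed": observed,
        "singletons": singletons,
        "doubletons": doubletons,
        "chao1": chao1,
        "shannon": shannon,
    }


# Toy counts invented for this example, not taken from any dataset.
toy_counts = [10, 5, 2, 2, 1, 1, 1]
stats = repertoire_stats(toy_counts)
print(stats)
```

With three singletons and two doubletons among seven observed clonotypes, the Chao1 estimate here is 7 + 3·2 / (2·3) = 8, that is, one clonotype is estimated to be unseen at this depth.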

Quick example

import gzip

from mir.basic.token_tables import filter_token_table, tokenize_clonotypes
from mir.common.clonotype import Clonotype
from mir.graph.token_graph import build_token_graph

# Load CDR3 sequences and build an RS-filtered token graph
with gzip.open("gilgfvftl_trb_cdr3.txt.gz", "rt") as fh:
    cdr3s = [line.strip() for line in fh if line.strip()]

clonotypes = [
    Clonotype(junction_aa=seq, locus="TRB", v_gene="TRBV", duplicate_count=1)
    for seq in cdr3s
]

table    = tokenize_clonotypes(clonotypes, k=3)
rs_table = filter_token_table(table, kmer_pattern="RS")
g_rs     = build_token_graph(clonotypes, rs_table)

# Largest connected component — the RS-bearing clonotype cluster
rs_cluster = g_rs.components().giant()
print(f"RS cluster: {rs_cluster.vcount()} nodes")
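
For readers new to k-mer tokens, the idea behind the tokenize and filter steps above can be sketched in plain Python. This is a conceptual illustration only, not how `tokenize_clonotypes` or `filter_token_table` are implemented: the helper names `kmers`, `kmer_index`, and `filter_index` are hypothetical, and treating the pattern as a plain substring match is an assumption.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_index(sequences, k=3):
    """Map each k-mer to the set of sequence indices that contain it."""
    index = defaultdict(set)
    for i, seq in enumerate(sequences):
        for token in kmers(seq, k):
            index[token].add(i)
    return index


def filter_index(index, pattern):
    """Keep only k-mers that contain the given substring (assumed semantics)."""
    return {token: ids for token, ids in index.items() if pattern in token}


# Illustrative CDR3-style inputs; any list of amino acid strings works.
toy = ["CASSIRSSYEQYF", "CASSIRSAYEQYF", "CASSLGQGNTEAFF"]
index = kmer_index(toy, k=3)
rs_index = filter_index(index, "RS")

for token, ids in sorted(rs_index.items()):
    print(token, sorted(ids))
```

Sequences that share a retained k-mer (here the first two share `IRS`) are the ones a token graph would connect, which is why the largest connected component groups clonotypes carrying the same motif.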

Explore next

  • Getting Started for the shortest path from install to first parsed repertoire

  • Getting Started for the Copilot agent and companion prompt (/mirpy-analysis)

  • Examples for the full notebook gallery published from the repository

  • mir for API documentation generated from the current codebase

  • GitHub repository for source browsing and notebooks