Getting started
Installation
$ bash setup.sh
$ conda activate tcren
setup.sh creates the tcren conda environment and installs tcren in editable mode.
The TCR-annotation backend arda (mmseqs2-based) is pulled in automatically as a pinned
git dependency (tag 2.0.1); its C++ extension builds against the conda toolchain.
Command line
End-to-end candidate-epitope scoring from a structure:
$ tcren score -s complex.pdb -c candidates.txt -o ranked.csv
The full pipeline is also a single command (tcren.run_pipeline(structure) in the library). It runs
annotate → superimpose → resmarkup / canonical Cα / contacts → per-interface energies (TCRen for
TCR↔peptide, MJ for TCR↔MHC and peptide↔MHC) plus the total:
$ tcren pipeline -s complex.pdb -o scores.csv
Inputs accept .pdb/.cif/.pdb.gz/.cif.gz, a directory, or a .tar.gz batch;
identifiers are resolved from the file names:
$ tcren contacts -s batch.tar.gz -o contacts.csv --interface tcr_peptide
$ tcren annotate -s complex.cif.gz -o markup.csv --regions mhc --pseudo
tcren annotate emits one per-residue markup table covering TCR (CDR/FR), MHC groove
(helices/floor) and peptide; --regions all|tcr|mhc|peptide filters it to one chain class and
--pseudo additionally marks the NetMHCpan MHC pseudosequence residues (region MPS). It
replaces the old separate tcren mhc command.
There are two orientation commands (chains are renamed A=Vα, B=Vβ, C=peptide,
D=MHCα, E=MHCβ/β2m):
tcren superimpose brings a new structure into the canonical frame by superposing its conserved MHC
groove Cα onto a canonical database. It detects the input’s MHC class and species, selects every
database structure of the same class and species, superposes against each (sequence alignment fixes
the residue correspondence), and averages the rigid transforms into one consensus placement:
translations by their mean, rotations by the chordal (SVD-orthonormalised) mean. The database
defaults to data/Canonical2026 (populated at install).
tcren orient builds a canonical database from native complexes by deriving the per-class canonical
frame (this is how Canonical2026 itself is produced). Annotation runs as a single batched mmseqs2
call; -t threads only the structural alignment and write.
$ tcren superimpose -s complex.pdb -o oriented/
$ tcren orient -s data/Native2026 -o data/Canonical2026 -t 8
Both need the reference sets in data/; setup.sh runs tcren fetch-data at install to
populate them. Structure outputs are plain .pdb by default — add --mmCIF for .cif and
--compress for a trailing .gz (these flags apply to every command that writes a structure).
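The consensus averaging described for tcren superimpose can be illustrated in isolation. The sketch below is not tcren's code: it is a dependency-free illustration of a chordal mean, in which the rotation matrices are averaged entry-wise and the result is projected back onto the nearest rotation. The text above describes an SVD orthonormalisation; this sketch swaps in the Newton iteration for the polar decomposition, which gives the same nearest rotation whenever the averaged matrix has a positive determinant. All function names here are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def det3(m):
    """Determinant of a 3x3 matrix as the triple product of its rows."""
    return sum(x * y for x, y in zip(m[0], cross(m[1], m[2])))

def inverse_transpose(m):
    """Inverse transpose of a 3x3 matrix: cofactor matrix divided by the determinant."""
    d = det3(m)
    cofactors = [cross(m[1], m[2]), cross(m[2], m[0]), cross(m[0], m[1])]
    return [[c / d for c in row] for row in cofactors]

def nearest_rotation(m, iterations=30):
    """Project a 3x3 matrix with positive determinant onto the closest rotation."""
    if det3(m) <= 0:
        raise ValueError("averaged matrix must have a positive determinant")
    x = [row[:] for row in m]
    for _ in range(iterations):
        # Newton step for the orthogonal polar factor: X <- (X + X^-T) / 2
        xit = inverse_transpose(x)
        x = [[0.5 * (x[i][j] + xit[i][j]) for j in range(3)] for i in range(3)]
    return x

def chordal_mean(rotations):
    """Entry-wise mean of rotation matrices, projected back onto a rotation."""
    n = len(rotations)
    mean = [[sum(r[i][j] for r in rotations) / n for j in range(3)] for i in range(3)]
    return nearest_rotation(mean)

def mean_translation(translations):
    """Component-wise mean of translation vectors."""
    n = len(translations)
    return [sum(t[k] for t in translations) / n for k in range(3)]

def rot_z(degrees):
    """Rotation about the z axis, used only to build demo inputs."""
    a = math.radians(degrees)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

# Three made-up per-reference transforms; the consensus rotation is 20 degrees about z.
consensus_rotation = chordal_mean([rot_z(10), rot_z(20), rot_z(30)])
consensus_translation = mean_translation([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [2.0, 3.0, 0.0]])
```

Averaging matrices entry-wise does not generally give a rotation, which is why the projection step is needed; translations have no such constraint and can be averaged directly.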
Fetch recent TCR-pMHC structures from the RCSB into data/pdb_recent (mmCIF .cif.gz,
validated to have all five required chains):
$ tcren fetch-recent --discover --after 2024-01-01
Library
Score candidate epitopes against a structure:
from tcren import parse_structure, ContactMap, score_peptides
from tcren.annotation import classify_chains
from tcren.potential import tcren
structure = parse_structure("complex.pdb.gz") # .pdb/.cif/.pdb.gz/.cif.gz
classify_chains(structure, organism="human") # TRA/TRB via arda, peptide, MHC
contact_map = ContactMap.from_structure(structure)
ranked = score_peptides(contact_map, ["KQWLVWLFL", "RLLHPHHPL"], tcren())
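To make the idea behind score_peptides concrete, the following sketch scores candidates with a pairwise contact potential: every TCR↔peptide contact contributes the energy of the TCR residue paired with whichever residue the candidate places at that peptide position, and candidates are ranked by the sum. This is an illustration under assumptions, not tcren's implementation: the contact list, the toy energy values, the default for missing pairs and the function name rank_candidates are all made up, and lower totals are assumed to be more favourable.

```python
def rank_candidates(contacts, candidates, potential, default=0.0):
    """Rank peptides by summed pairwise energy over fixed TCR-peptide contacts.

    contacts   : list of (tcr_residue, peptide_position) pairs, positions 0-based
    candidates : peptide sequences with the same length as the native peptide
    potential  : dict mapping (tcr_residue, peptide_residue) -> energy
    """
    scored = []
    for peptide in candidates:
        total = sum(potential.get((tcr_aa, peptide[pos]), default)
                    for tcr_aa, pos in contacts)
        scored.append((peptide, total))
    # Assumption: lower energy means a more favourable interface.
    return sorted(scored, key=lambda item: item[1])

# Toy inputs, invented for illustration only (not real TCRen values).
contacts = [("Y", 3), ("D", 4), ("G", 4), ("S", 7)]
potential = {("Y", "L"): -0.5, ("D", "V"): -0.2, ("G", "V"): 0.1,
             ("Y", "H"): 0.3, ("D", "P"): 0.4, ("S", "P"): -0.1}
ranked = rank_candidates(contacts, ["KQWLVWLFL", "RLLHPHHPL"], potential)
```

The contact map stays fixed while only the peptide residues change, so one structure can be reused to score any number of candidates of the same length.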
Iterate over a batch (file, directory, or .tar.gz):
from tcren.structure import iter_structures
for pdb_id, structure in iter_structures("batch.tar.gz"):
    classify_chains(structure, organism="human")
    ...
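How a .tar.gz batch might be walked can be sketched with the standard library alone. The generator below is a hypothetical stand-in, not iter_structures itself: it yields (identifier, text) pairs rather than parsed structures, and it assumes the identifier is simply the file name with a recognised structure suffix removed. The archive contents and the identifiers 1abc and 2xyz are invented for the demo.

```python
import gzip
import io
import tarfile
from pathlib import PurePosixPath

SUFFIXES = (".pdb.gz", ".cif.gz", ".pdb", ".cif")  # longest suffixes first

def split_suffix(filename):
    """Return (identifier, suffix) for a structure file name, or None if unrecognised."""
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            return filename[:-len(suffix)], suffix
    return None

def iter_structure_texts(fileobj):
    """Yield (identifier, text) for every structure file in a .tar.gz stream."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue
            parsed = split_suffix(PurePosixPath(member.name).name)
            if parsed is None:
                continue  # skip files that are not structures
            identifier, suffix = parsed
            data = archive.extractfile(member).read()
            if suffix.endswith(".gz"):
                data = gzip.decompress(data)  # member is itself gzipped
            yield identifier, data.decode("utf-8")

def make_demo_archive():
    """Build a tiny in-memory batch so the sketch runs without any files on disk."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in [("batch/1abc.pdb", b"HEADER demo\n"),
                              ("batch/2xyz.cif.gz", gzip.compress(b"data_2xyz\n")),
                              ("batch/notes.txt", b"ignored\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer

found = dict(iter_structure_texts(make_demo_archive()))
```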
Orient into the canonical frame, layer contacts, and read the docking geometry:
from tcren.mhc import annotate_mhc
from tcren.orient import canonicalize_structure, superimpose, docking_angles
from tcren.contacts import multi_contacts, ContactDefinition
annotate_mhc(structure)
oriented, info = canonicalize_structure(structure) # z=MHC->TCR, y=peptide, x=thin
oriented, info = superimpose(structure) # onto data/Canonical2026 (class+species ensemble)
layers = multi_contacts(structure, ContactDefinition(d1=5, d2=8, d3=12))
angles = docking_angles(structure) # crossing + incident angle
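The layered contact definition can be pictured as three distance tests per residue pair. The sketch below is an interpretation rather than tcren's code: it assumes, from the thresholds quoted in this guide, that d1 applies to the closest pair of atoms, d2 to the Cβ-Cβ distance and d3 to the Cα-Cα distance. The residue dictionaries, their keys, the coordinates and the function name are hypothetical.

```python
import math
from itertools import product

def contact_layers(res_a, res_b, d1=5.0, d2=8.0, d3=12.0):
    """Return which of the three distance criteria a residue pair satisfies.

    Each residue is a dict with "atoms" (list of xyz tuples), "cb" and "ca" (xyz tuples).
    """
    closest = min(math.dist(p, q) for p, q in product(res_a["atoms"], res_b["atoms"]))
    return {
        "closest": closest <= d1,                          # any atom pair within d1
        "cb": math.dist(res_a["cb"], res_b["cb"]) <= d2,   # C-beta pair within d2
        "ca": math.dist(res_a["ca"], res_b["ca"]) <= d3,   # C-alpha pair within d3
    }

# Made-up coordinates for two residues, purely for illustration.
tcr_res = {"ca": (0.0, 0.0, 0.0), "cb": (1.5, 0.0, 0.0),
           "atoms": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.5, 1.0, 0.0)]}
pep_res = {"ca": (10.0, 0.0, 0.0), "cb": (8.5, 0.0, 0.0),
           "atoms": [(10.0, 0.0, 0.0), (8.5, 0.0, 0.0), (8.0, 1.0, 0.0)]}
layers = contact_layers(tcr_res, pep_res)
```

In this toy pair the closest atoms are 5.5 apart, so the pair fails the tightest test but still counts at the two coarser levels.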
Build a 2D complementarity map and summarise contacts by region pair:
from tcren.project2d import (project_structure, residue_markup_table,
                             contacts_table, region_pair_summary)
from tcren.viz import render_complementarity_map
proj = project_structure(structure)
svg = render_complementarity_map(residue_markup_table(structure, proj),
                                 contacts=contacts_table(structure, threshold=5.0))
summary = region_pair_summary(structure, kind="closest") # also "cb" (8 Å) / "ca" (12 Å)
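What a summary by region pair could look like is easy to show on plain data. The sketch below is an assumption about the general idea, not the behaviour of region_pair_summary: it simply counts contacts per unordered pair of region labels. The labels and the contact list are hypothetical.

```python
from collections import Counter

def summarise_region_pairs(contacts):
    """Count contacts per unordered pair of region labels."""
    counts = Counter()
    for region_a, region_b in contacts:
        # Sort the labels so (a, b) and (b, a) land in the same bucket.
        counts[tuple(sorted((region_a, region_b)))] += 1
    return counts

# Hypothetical region labels for five residue-residue contacts.
contacts = [("CDR3", "peptide"), ("peptide", "CDR3"), ("CDR1", "helix"),
            ("CDR3", "peptide"), ("CDR2", "helix")]
summary = summarise_region_pairs(contacts)
```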
Data
Structures come from the Hugging Face dataset
isalgo/tcren_structures (all gzipped):
Native2022 (the 2022 paper set), Native2026 (the 2026 set the current potential is derived
from), and Canonical2026 (Native2026 re-oriented). When orienting a new complex, an installed
library lazily fetches the canonical reference structures (1ao7/1fyt) from the Hub, so no local
dataset is required.