Performance

Per-stage timings on a TCR-pMHC complex (1ao7), Apple M3, single thread. Reproduce with:

$ RUN_BENCHMARK=1 pytest -k benchmark -s

stage	time	notes
parse a gzipped structure	~19 ms	`.pdb.gz` / `.cif.gz`
contact map (5 Å, cKDTree)	~9 ms	per structure
score 1000 candidate peptides	~8 ms	~8 µs/peptide (vectorised)
annotate (TCR + MHC), batched	~213 ms/structure	one mmseqs2 call for the whole set
peak RSS, single-structure pipeline	~195 MB

Threading model

Annotation (TCR chain typing + MHC mapping) is the only compute-heavy step. It always runs as a single batched mmseqs2 search over every chain in the input set. Because mmseqs2 parallelises internally, it is never called per structure and never wrapped in Python threads (a fork-based process pool is not an option either: it would deadlock after mmseqs2/BLAS spawn their own threads). Batching amortises the fixed ~1.5 s mmseqs2 startup: ~213 ms/structure across a set, versus ~1.5 s/structure one at a time.
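The amortisation argument can be made concrete with a simple cost model: one fixed startup shared by the whole set, plus a per-structure search cost. This is only an illustrative sketch. The function name `per_structure_seconds` and the `MARGINAL_S` value are hypothetical; the ~1.5 s startup is the figure quoted above, and the size of the benchmark set is not stated here.

```python
def per_structure_seconds(n_structures: int, startup_s: float, marginal_s: float) -> float:
    """Average wall time per structure when ONE call handles the whole set.

    startup_s  : fixed cost paid once per mmseqs2 invocation
    marginal_s : additional search cost contributed by each structure
    """
    if n_structures < 1:
        raise ValueError("need at least one structure")
    return startup_s / n_structures + marginal_s


STARTUP_S = 1.5    # fixed mmseqs2 startup quoted in the text
MARGINAL_S = 0.05  # hypothetical per-structure search cost, NOT a measured value

# The startup share shrinks as 1/n, so larger batches approach MARGINAL_S.
for n in (1, 10, 100):
    print(f"{n:>4} structures: {per_structure_seconds(n, STARTUP_S, MARGINAL_S):.3f} s each")
```

With a batch of one, the startup dominates and the per-structure cost is essentially the full ~1.5 s, which is the "one at a time" case described above.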

Threads (tcren.orient.run_folder()’s threads / tcren orient -t N) are used only for the embarrassingly parallel, mmseqs-free stages: structure parsing, the Kabsch/SVD alignment, and writing oriented files. By extension, the same holds for any PyMOL/Rosetta/FlexPepDock rendering and relaxation.
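As a rough illustration of what "embarrassingly parallel" means for the parsing stage, the sketch below fans independent per-file work out over a thread pool using only the standard library. It is not tcren's implementation: `count_atom_records` is a hypothetical stand-in for a real structure parser, and `parse_folder` is an assumed name that only mirrors the idea of a `threads` setting.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_atom_records(path: Path) -> int:
    """Hypothetical stand-in for a parser: count ATOM/HETATM lines in a .pdb.gz file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(("ATOM", "HETATM")))


def parse_folder(folder: Path, threads: int = 4) -> dict[str, int]:
    """Process every .pdb.gz in `folder`; each file is an independent task."""
    paths = sorted(folder.glob("*.pdb.gz"))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so results line up with `paths`.
        counts = list(pool.map(count_atom_records, paths))
    return {path.name: n for path, n in zip(paths, counts)}
```

No task depends on another's result, so no locking or shared state is needed. How much speed-up threads actually give depends on how much of the per-file work releases the GIL (file I/O and zlib decompression do), which is not characterised here.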