Performance =========== Per-stage timings on a TCR-pMHC complex (1ao7), Apple M3, single thread. Reproduce with:: $ RUN_BENCHMARK=1 pytest -k benchmark -s ======================================== =========== ==================================== stage time notes ======================================== =========== ==================================== parse a gzipped structure ~19 ms ``.pdb.gz`` / ``.cif.gz`` contact map (5 Å, cKDTree) ~9 ms per structure score 1000 candidate peptides ~8 ms ~8 µs/peptide (vectorised) annotate (TCR + MHC), **batched** ~213 ms/str one mmseqs2 call for the whole set peak RSS, single-structure pipeline ~195 MB ======================================== =========== ==================================== Threading model --------------- Annotation (TCR chain typing + MHC mapping) is the only compute-heavy step. It is always run as a **single batched mmseqs2 search** over every chain in the input set — mmseqs2 parallelises internally, so it is never called per structure and never wrapped in Python threads (a fork-based pool would also deadlock after mmseqs2/BLAS spawn their own threads). Batching amortises the fixed ~1.5 s mmseqs2 startup: ~213 ms/structure across a set, versus ~1.5 s/structure one at a time. Threads (:func:`tcren.orient.run_folder`'s ``threads`` / ``tcren orient -t N``) are used **only** for the embarrassingly-parallel, mmseqs-free stages — structure parsing, the Kabsch/SVD alignment, and writing oriented files — and, by extension, any PyMOL/Rosetta/FlexPepDock rendering and relaxation.