Performance#
Per-stage timings on a TCR-pMHC complex (1ao7), Apple M3, single thread. Reproduce with:
$ RUN_BENCHMARK=1 pytest -k benchmark -s
stage |
time |
notes |
|---|---|---|
parse a gzipped structure |
~19 ms |
|
contact map (5 Å, cKDTree) |
~9 ms |
per structure |
score 1000 candidate peptides |
~8 ms |
~8 µs/peptide (vectorised) |
annotate (TCR + MHC), batched |
~213 ms/str |
one mmseqs2 call for the whole set |
peak RSS, single-structure pipeline |
~195 MB |
Threading model#
Annotation (TCR chain typing + MHC mapping) is the only compute-heavy step. It is always run as a single batched mmseqs2 search over every chain in the input set — mmseqs2 parallelises internally, so it is never called per structure and never wrapped in Python threads (a fork-based pool would also deadlock after mmseqs2/BLAS spawn their own threads). Batching amortises the fixed ~1.5 s mmseqs2 startup: ~213 ms/structure across a set, versus ~1.5 s/structure one at a time.
Threads (tcren.orient.run_folder()’s threads / tcren orient -t N) are used only for
the embarrassingly-parallel, mmseqs-free stages — structure parsing, the Kabsch/SVD alignment, and
writing oriented files — and, by extension, any PyMOL/Rosetta/FlexPepDock rendering and relaxation.