{ "cells": [ { "cell_type": "markdown", "id": "164caae8", "metadata": { "language": "markdown" }, "source": [ "# The canonical TCR-pMHC frame — figures & summary\n", "\n", "`tcren.orient` superposes every TCR-pMHC complex onto a per-class native reference by its MHC\n", "groove Cα, then applies a fixed per-class rotation `R_canon` (PCA of the reference Cα cloud) to\n", "land it in a common **canonical frame**:\n", "\n", "* **z** = PC1 (largest variance, the MHC→TCR long axis), signed +z toward the TCR;\n", "* **y** = PC2 (the groove / peptide axis), signed +y toward the peptide C-terminus;\n", "* **x** = PC3 (the thin axis), right-handed.\n", "\n", "Chains are renamed A=Vα, B=Vβ, C=peptide, D=MHCα, E=MHCβ/β2m. This notebook orients a sample of\n", "the Hugging Face `Native2026` set and reports the frame's quality and geometry: alignment RMSD,\n", "the variance the canonical axes capture, a registered overlay, and the TCR docking-angle\n", "distribution. Structures are read only from the bootstrapped HF set." ] }, { "cell_type": "code", "execution_count": 1, "id": "0abf8ce3", "metadata": { "execution": { "iopub.execute_input": "2026-06-16T12:50:24.220082Z", "iopub.status.busy": "2026-06-16T12:50:24.219958Z", "iopub.status.idle": "2026-06-16T12:50:24.727161Z", "shell.execute_reply": "2026-06-16T12:50:24.726754Z" }, "language": "python" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "python 3.11.15 | tcren 0.1.0 | numpy 2.4.6\n", "MHCI: ref=1ao7 variance PC1/PC2/PC3 = 0.74/0.17/0.09\n", "MHCII: ref=1fyt variance PC1/PC2/PC3 = 0.71/0.18/0.11\n" ] } ], "source": [ "# Imports + environment versions, and the bundled per-class canonical-frame artifact.\n", "import warnings; warnings.filterwarnings('ignore')\n", "import json, sys\n", "from importlib import resources\n", "from pathlib import Path\n", "import numpy as np, polars as pl, matplotlib, matplotlib.pyplot as plt\n", "import tcren\n", "from tcren.structure.io import import_structure\n", "from tcren.annotation import classify_chains\n", "from tcren.mhc import annotate_mhc\n", "from tcren.orient import canonicalize_structure, check_oriented_complex, docking_angles\n", "print('python', sys.version.split()[0], '| tcren', tcren.__version__, '| numpy', np.__version__)\n", "frame_art = json.loads(resources.files('tcren.data').joinpath('canonical_frame.json').read_text())\n", "for cls, e in frame_art.items():\n", " v = e.get('variance_explained', {})\n", " print(f\"{cls}: ref={e['reference_id']} variance PC1/PC2/PC3 = \"\n", " f\"{v.get('PC1_z',0):.2f}/{v.get('PC2_y',0):.2f}/{v.get('PC3_x',0):.2f}\")\n", "STRUCT = Path('data/Native2026')\n", "plt.rcParams.update({'figure.dpi': 110, 'font.size': 10})" ] }, { "cell_type": "markdown", "id": "c09a00a4", "metadata": { "language": "markdown" }, "source": [ "## Orient a Native2026 sample" ] }, { "cell_type": "code", "execution_count": 2, "id": "79754032", "metadata": { "execution": { "iopub.execute_input": "2026-06-16T12:50:24.728484Z", "iopub.status.busy": "2026-06-16T12:50:24.728362Z", "iopub.status.idle": "2026-06-16T12:53:54.970020Z", "shell.execute_reply": "2026-06-16T12:53:54.969580Z" }, "language": "python" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "oriented: 40 / 40 | native frame: 40 | reverse-docked: 0\n" ] }, { "data": { "text/html": [ "
| pdb.id | frame | rmsd | reversed_dock | crossing | incident | qc |
|---|---|---|---|---|---|---|
| str | str | f64 | bool | f64 | f64 | str |
| "1ao7" | "native" | 4.1624e-16 | false | 52.434485 | 6.421442 | "ok" |
| "1bd2" | "native" | 0.441107 | false | 68.22069 | 5.005298 | "ok" |
| "1d9k" | "native" | 1.249152 | false | 79.191269 | 1.318339 | "ok" |
| "1fo0" | "native" | 0.961154 | false | 60.81962 | -11.356836 | "ok" |
| "1fyt" | "native" | 6.3426e-15 | false | 69.571251 | -1.027948 | "ok" |
| "1g6r" | "native" | 1.057513 | false | 43.411631 | -7.968066 | "ok" |
| "1j8h" | "native" | 0.30476 | false | 69.267062 | -1.616912 | "ok" |
| "1jtr" | "native" | 0.963854 | false | 42.897356 | -10.029156 | "ok" |