Engines & Concepts
==================

One immutable trie, two search drivers. Pick one with ``engine=`` or let
``engine="auto"`` choose per query.

Scope vs budget
---------------

A query matches a reference when it satisfies the **scope** and/or the **budget**:

- **Scope** — per-type edit caps: ``max_subs``, ``max_ins``, ``max_dels``, and an optional
  combined ``max_total_edits``. A per-type cap of ``0`` means *zero of that type*.
- **Budget** — a score threshold ``max_penalty`` under a substitution matrix and gap costs.

``max_total_edits`` is an independent total cap (``0`` means "no total cap", falling back to the
per-type sum). It is not clamped by the per-type caps, so ``seqtrie`` can be driven by it alone.

seqtm — branch-and-bound
------------------------

Enumerates each edit (substitution / insertion / deletion) while descending the trie, tracking
the per-type counts, and prunes as soon as a cap or the budget is exceeded. Consequences:

- Per-type caps are enforced **exactly**, and every hit reports an exact
  ``(n_subs, n_ins, n_dels)`` breakdown.
- A dedicated **Hamming-only** path runs when ``max_ins == max_dels == 0``.
- Cost grows with the number of allowed edits, so it is fastest at small distances (k = 1–3) —
  which covers UMI collapse, CDR3 error correction, and CDR3 / epitope matching.

seqtrie — banded DP
-------------------

Carries an edit-distance DP row down the trie and prunes a subtree once its best cell exceeds the
budget. Consequences:

- Handles a matrix-weighted **score budget** (BLOSUM62 or a custom matrix) and indels naturally.
- Cost is independent of the edit count, so it scales better to large budgets / long range.
- It tracks a single cost, so it enforces ``max_total_edits`` + ``max_penalty`` but **not** the
  per-type caps; ``n_subs`` / ``n_ins`` / ``n_dels`` are reported as ``0`` (use
  :meth:`~seqtree.Index.align` to recover the breakdown).

``engine="auto"`` routes substitution-only / small-k indel work to ``seqtm`` and matrix-weighted
budgets to ``seqtrie``.

Scoring
-------

Scores are non-negative penalties where ``0`` is an exact match. Similarity matrices such as
BLOSUM62 are converted at load time via the Gram→squared-distance transform
``pen[a][b] = sim[a][a] + sim[b][b] - 2*sim[a][b]`` — i.e. ``‖φ(a) - φ(b)‖²`` if the score is read
as an inner product ``sim(a,b) = ⟨φ(a), φ(b)⟩``. It is symmetric, ``0`` on the identity, and
non-negative for BLOSUM/PAM (the diagonal is each row's maximum), so every edit adds a non-negative
cost and the budget prune stays valid. Note penalties are roughly twice the scale of the raw score
gaps, so in matrix mode set ``gap_open`` / ``max_penalty`` accordingly. With no matrix, the cost is
unit (1 per substitution and per gap position; linear gaps in v1).

Alphabets
---------

``"aa"`` (20 amino acids in BLOSUM62 order plus ``B Z X *``), ``"nt"`` (``ACGT``), and
``"iupac"`` (nucleotide ambiguity codes). Encoding is case-insensitive; a symbol outside the
alphabet raises an error at build or search time.