seqtree

FUZZY SEQUENCE SEARCH

seqtree finds biological sequences (amino-acid or nucleotide) that match a query within fixed edit caps or a score budget. Build an immutable index once, then search single queries or millions of queries in parallel. C++ core, minimal Python binding.

seqtm — branch-and-bound

Exact per-type edit caps (subs / ins / dels), a fast Hamming-only path, and an exact edit-type breakdown per hit. The workhorse for small edit distances: UMI collapse, CDR3 error correction, CDR3/epitope matching.
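To make "per-type edit caps" and the "edit-type breakdown" concrete, here is a minimal pure-Python sketch. It is not seqtm's implementation or API: the function names `hamming_within` and `match_with_caps` are hypothetical, it compares one query against one reference rather than an index, and it assumes that an insertion means an extra character in the query and a deletion means a reference character missing from the query.

```python
from functools import lru_cache


def hamming_within(query, ref, max_subs):
    """Substitution-only check: mismatch count if <= max_subs, else None."""
    if len(query) != len(ref):
        return None  # Hamming distance is only defined for equal lengths
    mismatches = 0
    for a, b in zip(query, ref):
        if a != b:
            mismatches += 1
            if mismatches > max_subs:
                return None  # bound exceeded, stop early
    return mismatches


def match_with_caps(query, ref, max_subs, max_ins, max_dels):
    """Return (n_subs, n_ins, n_dels) with the fewest total edits among
    alignments that respect every cap, or None if no such alignment exists."""

    @lru_cache(maxsize=None)
    def search(i, j, subs, ins, dels):
        # i, j: positions in query and ref; subs/ins/dels: remaining budgets.
        if i == len(query) and j == len(ref):
            return (0, 0, 0)
        best = None

        def consider(result, added):
            nonlocal best
            if result is None:
                return  # that branch ran out of budget
            cand = tuple(r + a for r, a in zip(result, added))
            if best is None or sum(cand) < sum(best):
                best = cand

        if i < len(query) and j < len(ref):
            if query[i] == ref[j]:
                consider(search(i + 1, j + 1, subs, ins, dels), (0, 0, 0))
            elif subs > 0:
                consider(search(i + 1, j + 1, subs - 1, ins, dels), (1, 0, 0))
        if i < len(query) and ins > 0:
            consider(search(i + 1, j, subs, ins - 1, dels), (0, 1, 0))
        if j < len(ref) and dels > 0:
            consider(search(i, j + 1, subs, ins, dels - 1), (0, 0, 1))
        return best

    return search(0, 0, max_subs, max_ins, max_dels)


one_sub = match_with_caps("CASSL", "CATSL", 1, 0, 0)
no_subs_allowed = match_with_caps("CASSL", "CATSL", 0, 1, 1)
```

Each cap is enforced separately, so a hit that needs one substitution is rejected when only indels are allowed, even if the total edit count would fit. Branches are cut as soon as a budget is exhausted, which is why this style of search suits small edit distances.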

seqtrie — banded DP

Matrix-weighted score budgets (BLOSUM62 + gap costs), with search cost independent of the edit count. Best for similarity-scored searches over a total-edit or penalty budget.
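The banded-DP idea can be sketched for a single query/reference pair as follows. This is an illustration under stated assumptions, not seqtrie's code: `banded_score` and `toy_sub_cost` are hypothetical names, penalties are non-negative (lower is better), the gap cost is linear, and the toy substitution penalties are a stand-in rather than values derived from BLOSUM62.

```python
# Hypothetical penalty table: conservative swaps cost 1, other mismatches 2.
CONSERVATIVE = {frozenset("IL"), frozenset("LV"), frozenset("IV")}


def toy_sub_cost(a, b):
    if a == b:
        return 0
    return 1 if frozenset((a, b)) in CONSERVATIVE else 2


def banded_score(query, ref, budget, sub_cost, gap_cost):
    """Minimum alignment penalty if it is <= budget, else None.

    Assumes sub_cost(a, b) >= 0 and gap_cost > 0.
    """
    if gap_cost <= 0:
        raise ValueError("gap_cost must be positive")
    # An alignment within budget can afford at most this many gaps, so its
    # path never strays further than `band` cells from the main diagonal.
    band = int(budget // gap_cost)
    n, m = len(query), len(ref)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning an empty query prefix against ref[:j] costs j gaps.
    prev = {j: j * gap_cost for j in range(min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                best = i * gap_cost
            else:
                best = prev.get(j - 1, inf) + sub_cost(query[i - 1], ref[j - 1])
                best = min(best, cur.get(j - 1, inf) + gap_cost)
            best = min(best, prev.get(j, inf) + gap_cost)
            cur[j] = best
        prev = cur
    total = prev.get(m, inf)
    return total if total <= budget else None


conservative_swap = banded_score("CASSL", "CASSI", 2, toy_sub_cost, 3)
one_gap = banded_score("CASSL", "CASL", 3, toy_sub_cost, 3)
```

The work per pair is bounded by the sequence length times the band width, which is set by the budget and gap cost rather than by how many edits the best alignment ends up using.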

Results are payload-agnostic: each hit carries (ref_id, score, n_subs, n_ins, n_dels) and nothing else. Downstream libraries map ref_id back to their own payloads (V gene, MHC, read counts) and filter there.
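A downstream library might resolve hits against its own payload table along these lines. This is a sketch only: the `Hit` namedtuple, the `attach_payloads` helper, and the sample records are made up for illustration and are not part of seqtree; the only thing taken from the text above is the field order of a hit.

```python
from collections import namedtuple

# Field order follows the result layout described above.
Hit = namedtuple("Hit", ["ref_id", "score", "n_subs", "n_ins", "n_dels"])


def attach_payloads(hits, payloads, keep=lambda payload: True):
    """Pair each hit with its payload; drop hits whose payload fails `keep`."""
    paired = []
    for hit in hits:
        payload = payloads[hit.ref_id]  # lookup owned by the downstream library
        if keep(payload):
            paired.append((hit, payload))
    return paired


# Made-up example data: payloads keyed by the same ids used to build the index.
payloads = {
    0: {"v_gene": "TRBV5-1", "reads": 120},
    1: {"v_gene": "TRBV7-2", "reads": 3},
}
hits = [Hit(0, 1, 1, 0, 0), Hit(1, 0, 0, 0, 0)]

well_supported = attach_payloads(hits, payloads, keep=lambda p: p["reads"] >= 10)
```

Keeping the payload lookup outside the search means the index stores only sequences and ids, and each consumer applies its own filters after the fact.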