seqtree
FUZZY SEQUENCE SEARCH
seqtree finds biological sequences (amino-acid or nucleotide) that fall within a fixed edit scope or score budget of a query. Build an immutable index once, then search single queries or millions of queries in parallel. The core is C++, with a minimal Python binding.
Getting Started
Install, build an index, run your first search.
Engines & Concepts
seqtm vs seqtrie, scope vs budget, scoring.
API Reference
Index, SearchParams, Hit, Alignment.
Benchmarks
Throughput, scaling, alignment cost.
seqtm — branch-and-bound
Exact per-type edit caps (subs / ins / dels), a fast Hamming-only path, and an exact edit-type breakdown per hit. The workhorse for small edit distances: UMI collapse, CDR3 error correction, CDR3/epitope matching.
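To make per-type edit caps concrete, here is a minimal pure-Python sketch. It is not seqtm's implementation or API: the function name `edit_breakdown` is hypothetical, and it assumes that an insertion is an extra residue in the query and a deletion is a residue missing from it. It checks one query against one reference, prunes branches whose remaining length difference can no longer be covered by the remaining insertion or deletion cap, and returns the breakdown with the fewest total edits.

```python
from functools import lru_cache
from typing import Optional, Tuple

Breakdown = Tuple[int, int, int]  # (n_subs, n_ins, n_dels)


def edit_breakdown(query: str, ref: str,
                   max_subs: int, max_ins: int, max_dels: int) -> Optional[Breakdown]:
    """Return a fewest-edits breakdown that respects every cap, or None.

    Assumed convention: insertion = extra residue in the query,
    deletion = reference residue missing from the query.
    """

    @lru_cache(maxsize=None)
    def go(i: int, j: int, subs: int, ins: int, dels: int) -> Optional[Breakdown]:
        # Bound: the remaining length difference forces a minimum number of indels.
        surplus = (len(query) - i) - (len(ref) - j)
        if surplus > ins or -surplus > dels:
            return None
        if i == len(query) and j == len(ref):
            return (0, 0, 0)

        best: Optional[Breakdown] = None

        def consider(tail: Optional[Breakdown], step: Breakdown) -> None:
            nonlocal best
            if tail is None:
                return
            total = (tail[0] + step[0], tail[1] + step[1], tail[2] + step[2])
            if best is None or sum(total) < sum(best):
                best = total

        if i < len(query) and j < len(ref):
            if query[i] == ref[j]:
                consider(go(i + 1, j + 1, subs, ins, dels), (0, 0, 0))
            elif subs > 0:
                consider(go(i + 1, j + 1, subs - 1, ins, dels), (1, 0, 0))
        if i < len(query) and ins > 0:
            consider(go(i + 1, j, subs, ins - 1, dels), (0, 1, 0))
        if j < len(ref) and dels > 0:
            consider(go(i, j + 1, subs, ins, dels - 1), (0, 0, 1))
        return best

    return go(0, 0, max_subs, max_ins, max_dels)


# One substitution allowed: L -> I is accepted as a single substitution.
print(edit_breakdown("CASSLGQ", "CASSIGQ", 1, 0, 0))
# Substitutions forbidden: the same pair needs one insertion plus one deletion.
print(edit_breakdown("CASSLGQ", "CASSIGQ", 0, 1, 1))
```

Setting `max_ins` and `max_dels` to zero reduces the search to a plain position-by-position comparison, which is why a Hamming-only case can take a much cheaper route.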
seqtrie — banded DP
Matrix-weighted score budgets (BLOSUM62 + gap costs), with search cost independent of the edit count. Best for similarity-scored searches bounded by a total-edit or penalty budget.
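The idea of a banded DP under a penalty budget can be sketched in a few lines of Python. This is an illustration under stated assumptions, not seqtrie's code: `banded_penalty` and `sub_penalty` are hypothetical names, the substitution penalties are a toy stand-in for a BLOSUM62-derived matrix, and the score is expressed as a penalty to be kept at or below `budget`. Because every gap costs `gap`, no alignment within budget can stray more than `budget // gap` cells from the main diagonal, so only that band is filled.

```python
from typing import Optional

# Toy stand-in for a real substitution matrix (assumption, not BLOSUM62 values).
CONSERVATIVE = {frozenset("IL"), frozenset("IV"), frozenset("LV"),
                frozenset("DE"), frozenset("KR")}


def sub_penalty(a: str, b: str) -> int:
    """0 for identity, 1 for a 'conservative' swap, 3 otherwise."""
    if a == b:
        return 0
    return 1 if frozenset((a, b)) in CONSERVATIVE else 3


def banded_penalty(query: str, ref: str, budget: int, gap: int = 2) -> Optional[int]:
    """Minimum alignment penalty if it fits the budget, else None (gap must be > 0)."""
    band = budget // gap              # max diagonal offset any in-budget path can reach
    n, m = len(query), len(ref)
    if abs(n - m) > band:
        return None                   # the length difference alone exceeds the budget
    inf = budget + 1                  # sentinel: "already over budget"
    prev = [j * gap if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        if lo == 0:
            cur[0] = i * gap
        for j in range(max(1, lo), hi + 1):
            cur[j] = min(
                prev[j - 1] + sub_penalty(query[i - 1], ref[j - 1]),  # match / substitute
                prev[j] + gap,                                        # gap in ref
                cur[j - 1] + gap,                                     # gap in query
                inf,                                                  # clamp to sentinel
            )
        prev = cur
    return prev[m] if prev[m] <= budget else None


print(banded_penalty("CASSLGQ", "CASSIGQ", budget=2))  # conservative L/I swap
print(banded_penalty("CASSLGQ", "CASSWGQ", budget=2))  # L/W swap costs more than the budget
```

The work per row depends on the band width, which is fixed by the budget and gap cost rather than by how many edits a particular hit contains.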
Results are payload-agnostic: (ref_id, score, n_subs, n_ins, n_dels). Downstream libraries map ref_id back to their own payloads (V gene, MHC, read counts) and filter there.
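A downstream join might look like the sketch below. The `Hit` class here is a local stand-in that only mirrors the five fields listed above; it is not the library's own `Hit` type. The `annotate` helper, the payload layout, and all values are made up for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    """Stand-in with the five result fields; not seqtree's actual class."""
    ref_id: int
    score: float
    n_subs: int
    n_ins: int
    n_dels: int


Payload = Dict[str, object]


def annotate(hits: Iterable[Hit], payloads: Dict[int, Payload],
             v_gene: Optional[str] = None) -> List[Tuple[Hit, Payload]]:
    """Attach the caller's payload to each hit, optionally keeping one V gene."""
    out: List[Tuple[Hit, Payload]] = []
    for hit in hits:
        payload = payloads[hit.ref_id]      # ref_id is the only link to user data
        if v_gene is not None and payload["v_gene"] != v_gene:
            continue                        # filtering happens downstream, not in the search
        out.append((hit, payload))
    return out


# Illustrative data only.
PAYLOADS: Dict[int, Payload] = {
    0: {"v_gene": "TRBV7-9", "reads": 120},
    1: {"v_gene": "TRBV5-1", "reads": 8},
}
HITS = [Hit(0, 0.0, 0, 0, 0), Hit(1, 1.0, 1, 0, 0)]

matched = annotate(HITS, PAYLOADS, v_gene="TRBV5-1")
print(matched)
```

Keeping payloads outside the index means the same index can serve callers with different metadata, at the cost of one dictionary lookup per hit.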