API Reference

Index

class seqtree.Index

Immutable search index over a set of reference sequences. Build the index once, then query it concurrently. A reference id is the position of that sequence in the refs argument passed to build().

align(self: seqtree._core.Index, ref_id: SupportsInt | SupportsIndex, query: str, params: seqtree._core.SearchParams) -> seqtree._core.Alignment

Compute, on demand, a global alignment between query and the reference identified by ref_id.

static build(refs: collections.abc.Sequence[str], alphabet: str = 'aa') -> seqtree._core.Index

Build an index over refs. alphabet is 'aa', 'nt', or 'iupac'. Raises ValueError if a reference contains a symbol outside the alphabet.

collisions_batch(self: seqtree._core.Index, queries: collections.abc.Sequence[str], params: seqtree._core.SearchParams, threads: SupportsInt | SupportsIndex = 0) -> list[int]

Return the per-query count of seqtm collisions, that is, how often a reference was reached again via a different edit path during branch-and-bound (0 for seqtrie / substitution-only).

static load(path: str) -> seqtree._core.Index

Load an index previously written with save(). Raises if the file is corrupt or too old to read.

ref_seq(self: seqtree._core.Index, ref_id: SupportsInt | SupportsIndex) -> str

Return the reference sequence string for a reference id.

save(self: seqtree._core.Index, path: str) -> None

Serialize the index to a flat binary file for fast reload.
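The save() and load() pair lends itself to a simple on-disk cache. The following is a minimal sketch, not part of seqtree: load_or_build is a hypothetical helper, and index_cls stands for seqtree.Index (any class with the build/load/save methods documented above works). Because the exception type raised by load() is not specified here, the sketch catches Exception broadly.

```python
import os


def load_or_build(index_cls, refs, path, alphabet="aa"):
    """Hypothetical helper: reuse a saved index if it loads, else rebuild it.

    index_cls is expected to be seqtree.Index (passed in so that the sketch
    does not depend on the library being importable).
    """
    if os.path.exists(path):
        try:
            return index_cls.load(path)
        except Exception:
            # load() raises on a corrupt/old file; the exact exception type
            # is an unknown here, so fall through and rebuild.
            pass
    index = index_cls.build(refs, alphabet)
    index.save(path)
    return index
```

A call would look like load_or_build(seqtree.Index, refs, "refs.idx"). Note that the sketch does not check that a cached file was built from the same refs; that bookkeeping is left to the caller.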

search(self: seqtree._core.Index, query: str, params: seqtree._core.SearchParams) -> list

Return all hits for one query within the scope and budget set in params.

search_batch(self: seqtree._core.Index, queries: collections.abc.Sequence[str], params: seqtree._core.SearchParams, threads: SupportsInt | SupportsIndex = 0) -> list

Search many queries in parallel (releases the GIL). threads=0 uses all cores. Returns one hit list per query, in input order.
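Since search_batch returns one hit list per query in input order, results can be paired back with their queries by position. The sketch below assumes only the search_batch signature documented above; pair_queries_with_hits and unmatched_queries are hypothetical helper names, and index and params are expected to be a seqtree.Index and a seqtree.SearchParams created elsewhere.

```python
def pair_queries_with_hits(index, queries, params, threads=0):
    """Hypothetical helper: return [(query, hits), ...] in input order."""
    queries = list(queries)  # materialise so it can be zipped after the call
    hit_lists = index.search_batch(queries, params, threads)
    if len(hit_lists) != len(queries):
        # Defensive check on the documented "one hit list per query" contract.
        raise RuntimeError("expected one hit list per query")
    return list(zip(queries, hit_lists))


def unmatched_queries(pairs):
    """Return the queries whose hit list is empty."""
    return [query for query, hits in pairs if not hits]
```

Pairs are kept in a list rather than a dict so that duplicate queries do not overwrite each other.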

search_top(self: seqtree._core.Index, query: str, params: seqtree._core.SearchParams, k: SupportsInt | SupportsIndex = 1) -> list

Return up to k best (lowest-score) hits for one query.
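search_top, ref_seq, and align combine naturally when a single best match needs to be inspected. The following is a rough sketch that uses only the method signatures documented above; explain_best_hit is a hypothetical name, and index and params are assumed to be a seqtree.Index and a seqtree.SearchParams built elsewhere.

```python
def explain_best_hit(index, query, params):
    """Hypothetical helper: return (ref_id, reference sequence, alignment)
    for the best hit, or None when the search finds nothing."""
    hits = index.search_top(query, params, k=1)
    if not hits:
        # "up to k" hits means the list may be empty.
        return None
    best = hits[0]
    # The alignment is computed on demand for this one reference.
    alignment = index.align(best.ref_id, query, params)
    return best.ref_id, index.ref_seq(best.ref_id), alignment
```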

SearchParams

class seqtree.SearchParams

Search scope and budget. Scope: max_subs, max_ins, and max_dels (exact, seqtm) and max_total_edits. Budget: max_penalty, with an optional substitution matrix ('BLOSUM62') and gap costs (gap_open, gap_extend). engine is 'auto', 'seqtrie', or 'seqtm'; mode is 'all' or 'top'.

property engine
property gap_extend
property gap_open
property matrix
property max_dels
property max_ins
property max_penalty
property max_subs
property max_total_edits
property mode

Hit

class seqtree.Hit

A search result. A Hit is payload-agnostic: map ref_id back to your own payload downstream. score is a non-negative penalty; 0 means an exact match. n_subs, n_ins, and n_dels are exact for the seqtm engine and 0 for seqtrie. A Hit is iterable as (ref_id, score, n_subs, n_ins, n_dels).

property n_dels
property n_ins
property n_subs
property ref_id
property score
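Because a Hit unpacks as (ref_id, score, n_subs, n_ins, n_dels), joining hits to application data is a matter of indexing a parallel list by ref_id. The sketch below is illustrative only: FakeHit is a stand-in namedtuple with the same field order so that the example runs without seqtree, and attach_payloads is a hypothetical helper.

```python
from collections import namedtuple

# Stand-in for seqtree.Hit (same iteration order), used only in this sketch.
FakeHit = namedtuple("FakeHit", "ref_id score n_subs n_ins n_dels")


def attach_payloads(hits, payloads):
    """Hypothetical helper: join hits to payloads[ref_id].

    payloads must be in the same order as the refs the index was built from.
    """
    rows = []
    for hit in hits:
        ref_id, score, n_subs, n_ins, n_dels = hit  # Hit is iterable
        rows.append({
            "payload": payloads[ref_id],
            "score": score,
            # Per the description above, these counts are 0 under seqtrie.
            "edits": n_subs + n_ins + n_dels,
        })
    return rows


rows = attach_payloads(
    [FakeHit(1, 0, 0, 0, 0), FakeHit(0, 3, 1, 0, 1)],
    payloads=["clone-A", "clone-B"],
)
```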

Alignment

class seqtree.Alignment

Global alignment of a query to a reference. ops has one char per column: 'M' match, 'S' substitution, 'I' insertion, 'D' deletion.

property aligned_query
property aligned_ref
property ops
property score
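The ops string can be summarised with plain string handling. Here is a small sketch, assuming only the four op codes listed above; summarize_ops is a hypothetical helper that takes the ops string itself (for example alignment.ops), so it runs without seqtree.

```python
from collections import Counter

# Op codes as documented for Alignment.ops.
OP_NAMES = {"M": "match", "S": "substitution", "I": "insertion", "D": "deletion"}


def summarize_ops(ops):
    """Hypothetical helper: count alignment columns by operation name."""
    counts = Counter(ops)
    unknown = set(counts) - set(OP_NAMES)
    if unknown:
        raise ValueError("unexpected op codes: %s" % sorted(unknown))
    return {name: counts.get(code, 0) for code, name in OP_NAMES.items()}


summary = summarize_ops("MMSMIDM")
```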

Functions

seqtree.pairwise_batch(a: collections.abc.Sequence[str], b: collections.abc.Sequence[str], params: seqtree._core.SearchParams, alphabet: str = 'aa', threads: SupportsInt | SupportsIndex = 0) -> list

Batch-vs-batch search. Indexes the larger of the two sets internally and streams the smaller one against it. Results are a-major: one hit list per a[i], with Hit.ref_id pointing into b.
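The a-major layout can be flattened into (i, j, score) triples, where i indexes a and j indexes b. This is a minimal sketch: flatten_pairs is a hypothetical helper, and FakeHit is a stand-in for seqtree.Hit exposing only the ref_id and score attributes documented above, so the example runs without the library.

```python
from collections import namedtuple

# Stand-in for seqtree.Hit, limited to the two attributes used here.
FakeHit = namedtuple("FakeHit", "ref_id score")


def flatten_pairs(hit_lists):
    """Hypothetical helper: turn a-major hit lists into (i, j, score) triples.

    i is the position in a (the outer list); j is Hit.ref_id, an index into b.
    """
    return [
        (i, hit.ref_id, hit.score)
        for i, hits in enumerate(hit_lists)
        for hit in hits
    ]


triples = flatten_pairs([
    [FakeHit(2, 0), FakeHit(0, 4)],  # hits for a[0]
    [],                              # a[1] matched nothing in b
    [FakeHit(1, 1)],                 # hits for a[2]
])
```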