Usage#

Command line#

arda info
arda annotate -i reads.fastq -o out.airr.tsv --organism human --seqtype nt
arda annotate -i prot.fasta  -o out.airr.tsv --organism human --seqtype aa

The output is an AIRR rearrangement TSV with 1-based, closed region coordinates (fwr1_start/fwr1_endcdr3_start/cdr3_end), region nucleotide and amino-acid sequences, v_call/j_call, junction, and productive.

Python library#

import arda

records = arda.annotate_sequences(
    ["GACGTGCAG...", ("clone7", "CAGGTG...")],
    seqtype="nt",
    organism="human",
)

Each record is a dict keyed by the AIRR fields above.

Scaling#

MMseqs2 runs multi-threaded (--threads); inputs may be FASTA or FASTQ, plain or gzipped. For cluster runs, shard the input across SLURM array tasks and concatenate the per-shard AIRR TSVs.

Supported organisms#

  • human, mouse — full IG and TR loci.

  • rat, rabbit, rhesus_monkey — IG only (IgBLAST ships no TR internal annotation for these organisms).