Usage#
Command line#
arda info
arda annotate -i reads.fastq -o out.airr.tsv --organism human --seqtype nt
arda annotate -i prot.fasta -o out.airr.tsv --organism human --seqtype aa
The output is an AIRR rearrangement TSV with 1-based, closed region coordinates
(fwr1_start/fwr1_end … cdr3_start/cdr3_end), region nucleotide and
amino-acid sequences, v_call/j_call, junction, and productive.
Python library#
import arda
records = arda.annotate_sequences(
["GACGTGCAG...", ("clone7", "CAGGTG...")],
seqtype="nt",
organism="human",
)
Each record is a dict keyed by the AIRR fields above.
Scaling#
MMseqs2 runs multi-threaded (--threads); inputs may be FASTA or FASTQ, plain
or gzipped. For cluster runs, shard the input across SLURM array tasks and
concatenate the per-shard AIRR TSVs.
Supported organisms#
human, mouse — full IG and TR loci.
rat, rabbit, rhesus_monkey — IG only (IgBLAST ships no TR internal annotation for these organisms).