Usage
=====

Command line
------------

.. code-block:: bash

   arda info
   arda annotate -i reads.fastq -o out.airr.tsv --organism human --seqtype nt
   arda annotate -i prot.fasta  -o out.airr.tsv --organism human --seqtype aa

The output is an AIRR rearrangement TSV with 1-based, closed (both ends
inclusive) region coordinates (``fwr1_start``/``fwr1_end`` …
``cdr3_start``/``cdr3_end``), region nucleotide and amino-acid sequences,
``v_call``/``j_call``, ``junction``, and ``productive``.
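
Because the coordinates are 1-based and closed, turning them into a Python
slice means subtracting one from the start and leaving the end unchanged. The
sketch below illustrates this on a made-up, in-memory TSV. It assumes the file
carries the standard AIRR ``sequence`` column and that the coordinates index
into it; ``region_slice`` is a hypothetical helper, not part of ``arda``.

.. code-block:: python

   import csv
   import io


   def region_slice(sequence, start, end):
       """Return the region for 1-based, closed coordinates [start, end]."""
       # 1-based closed [start, end] maps to 0-based half-open [start - 1, end).
       return sequence[int(start) - 1 : int(end)]


   # Tiny stand-in for out.airr.tsv (made-up values, for illustration only).
   example_tsv = (
       "sequence_id\tsequence\tcdr3_start\tcdr3_end\n"
       "demo1\tAAACCCGGGTTT\t4\t9\n"
   )

   for row in csv.DictReader(io.StringIO(example_tsv), delimiter="\t"):
       cdr3 = region_slice(row["sequence"], row["cdr3_start"], row["cdr3_end"])
       print(row["sequence_id"], cdr3)  # prints: demo1 CCCGGG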

Python library
--------------

.. code-block:: python

   import arda

   records = arda.annotate_sequences(
       ["GACGTGCAG...", ("clone7", "CAGGTG...")],
       seqtype="nt",
       organism="human",
   )

Each record is a dict keyed by the AIRR fields above.
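
Since each record is a plain dict, ordinary Python is enough for downstream
filtering. The following is a minimal sketch that keeps productive records and
pulls out their calls and junction. How ``arda`` encodes ``productive`` in the
returned dicts is an assumption here: the helper accepts either a real boolean
or the ``T``/``F`` strings used in AIRR TSV files. Both function names and the
demo records are hypothetical.

.. code-block:: python

   def is_productive(record):
       """True if the record's ``productive`` field is truthy (bool or 'T')."""
       value = record.get("productive")
       if isinstance(value, str):
           # AIRR TSV files write booleans as T/F; accept that spelling too.
           return value.strip().upper() in {"T", "TRUE"}
       return bool(value)


   def productive_junctions(records):
       """Return (v_call, j_call, junction) for every productive record."""
       return [
           (r.get("v_call"), r.get("j_call"), r.get("junction"))
           for r in records
           if is_productive(r)
       ]


   # Made-up records standing in for the output of arda.annotate_sequences().
   demo_records = [
       {"v_call": "IGHV1-2*02", "j_call": "IGHJ4*02",
        "junction": "TGTGCGAGATGG", "productive": "T"},
       {"v_call": "IGHV3-23*01", "j_call": "IGHJ6*02",
        "junction": "TGTGCGAAATGG", "productive": "F"},
   ]

   for v_call, j_call, junction in productive_junctions(demo_records):
       print(v_call, j_call, junction)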

Scaling
-------

MMseqs2 runs multi-threaded (``--threads``); inputs may be FASTA or FASTQ, plain
or gzipped. For cluster runs, shard the input across SLURM array tasks and
concatenate the per-shard AIRR TSVs.
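
The shard-and-concatenate pattern can be sketched in a few lines of Python.
Everything here is illustrative rather than part of ``arda``:
``shard_for_task`` and ``concat_airr_tsvs`` are hypothetical helpers, the
round-robin split is just one possible sharding scheme, and the merge assumes
every shard was written with the same header line.

.. code-block:: python

   import os
   import tempfile


   def shard_for_task(items, task_id, n_tasks):
       """Round-robin subset of ``items`` for one array task (0-based id)."""
       # In a SLURM array job, task_id would typically come from the
       # SLURM_ARRAY_TASK_ID environment variable.
       return items[task_id::n_tasks]


   def concat_airr_tsvs(shard_paths, out_path):
       """Concatenate per-shard TSVs, writing the header line only once."""
       header = None
       with open(out_path, "w", encoding="utf-8", newline="") as out:
           for path in shard_paths:
               with open(path, encoding="utf-8", newline="") as shard:
                   first = shard.readline()
                   if not first:
                       continue  # empty shard file: nothing to copy
                   shard_header = first.rstrip("\r\n")
                   if header is None:
                       header = shard_header
                       out.write(header + "\n")
                   elif shard_header != header:
                       raise ValueError(f"header mismatch in {path}")
                   for line in shard:
                       # Guard against a shard whose last line lacks a newline.
                       out.write(line if line.endswith("\n") else line + "\n")


   # Demo with two made-up shard files in a temporary directory.
   with tempfile.TemporaryDirectory() as tmp:
       paths = []
       for i, body in enumerate(["a\tTGTGCG\n", "b\tTGTGCA"]):
           path = os.path.join(tmp, f"shard{i}.airr.tsv")
           with open(path, "w", encoding="utf-8") as fh:
               fh.write("sequence_id\tjunction\n" + body)
           paths.append(path)
       merged = os.path.join(tmp, "merged.airr.tsv")
       concat_airr_tsvs(paths, merged)
       with open(merged, encoding="utf-8") as fh:
           print(fh.read(), end="")

Checking the header on every shard is a cheap way to catch a shard that was
produced with different options before the rows are mixed together.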

Supported organisms
-------------------

* **human, mouse** — full IG and TR loci.
* **rat, rabbit, rhesus_monkey** — IG only (IgBLAST ships no TR internal
  annotation for these organisms).