mir.common package#

Submodules#

mir.common.clonotype module#

mir.common.diversity module#

mir.common.filter module#

mir.common.alleles module#

mir.common.gene_library module#

mir.common.parser module#

mir.common.repertoire module#

mir.common.metaclonotype module#

mir.common.single_cell module#

mir.common.single_cell_parser module#

mir.common.single_cell_repair module#

mir.common.single_cell_util module#

mir.common.sampling module#

mir.common.pool module#

mir.common.control module#

mir.common.io_parallel module#

Parallel Default And Fallback Policy#

  • Default mode uses parallel parsing with 4 workers.

  • Sequential fallback is used when any of the following is true:

    • n_jobs is set to 1.

    • Parsed row count is below parallel_min_rows (default: 10,000).

    • The file fits in one chunk (n_rows <= chunk_size).

  • Practical estimate for typical AIRR tables:

    • A small to medium AIRR file (~3,000 rows at ~0.07 MB gz) implies approximately 43,000 rows per MB gz for similarly narrow AIRR tables.

    • Under this approximation, 10,000 rows correspond to roughly 0.23 MB gz.

  • Rule of thumb:

    • If a gzipped AIRR file is well under 0.23 MB, sequential loading is typically chosen.

    • If it is larger than about 0.23 MB, parallel loading is typically beneficial and is selected by default.
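
The policy above can be sketched as a small decision function. This is a minimal illustration, not the code in mir.common.io_parallel: the function names choose_mode, estimate_rows, and threshold_mb_gz are hypothetical, and chunk_size is passed explicitly because its default is not stated here. Only the worker count (4), the parallel_min_rows default (10,000), and the ~3,000 rows per ~0.07 MB gz estimate come from the text.

```python
# Values from the policy above. All function names below are hypothetical
# and are not part of mir.common.io_parallel.
DEFAULT_N_JOBS = 4
DEFAULT_PARALLEL_MIN_ROWS = 10_000
ROWS_PER_MB_GZ = 3_000 / 0.07  # about 43,000 rows per MB gz


def choose_mode(n_rows: int, chunk_size: int,
                n_jobs: int = DEFAULT_N_JOBS,
                parallel_min_rows: int = DEFAULT_PARALLEL_MIN_ROWS) -> str:
    """Return "sequential" or "parallel" following the three fallback rules."""
    if n_jobs == 1:
        return "sequential"   # parallelism explicitly disabled
    if n_rows < parallel_min_rows:
        return "sequential"   # too few rows to amortize worker start-up
    if n_rows <= chunk_size:
        return "sequential"   # the whole file fits in one chunk
    return "parallel"


def estimate_rows(size_mb_gz: float) -> int:
    """Rough row count for a narrow gzipped AIRR table of the given size."""
    return round(size_mb_gz * ROWS_PER_MB_GZ)


def threshold_mb_gz(parallel_min_rows: int = DEFAULT_PARALLEL_MIN_ROWS) -> float:
    """Approximate gzipped size at which the row threshold is reached."""
    return parallel_min_rows / ROWS_PER_MB_GZ


if __name__ == "__main__":
    # chunk_size=5_000 is an arbitrary value chosen for the demonstration.
    print(choose_mode(n_rows=3_000, chunk_size=5_000))
    print(choose_mode(n_rows=50_000, chunk_size=5_000))
    print(f"{threshold_mb_gz():.2f} MB gz")
```

The size figures are only a proxy: the decision itself depends on row count, n_jobs, and chunk_size, so wide tables or unusually compressible files can cross the threshold at a different file size.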

mir.common.repertoire_dataset module#

TSV And Parquet I/O Layouts#

The repertoire classes provide Polars-first TSV/Parquet I/O helpers with roundtrip-safe schemas:

  • LocusRepertoire:

    • to_tsv / from_tsv

    • to_parquet / from_parquet

  • SampleRepertoire:

    • single-file: one TSV/Parquet with a locus column

    • split-loci: one file per locus via split_loci=True

  • RepertoireDataset:

    • per_sample_locus layout: one file per sample and locus

    • single_file layout: one combined file with sample_id and locus columns, plus a separate metadata.tsv

All dataset loaders dispatch worker tasks per sample, so each task handles one individual sample.
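
To make the two RepertoireDataset layouts and the per-sample task model concrete, here is a rough standard-library sketch. Everything in it is an assumption for illustration: plan_files, load_sample, and load_dataset are hypothetical helpers, the file names (for example s1.TRA.tsv and dataset.tsv) are invented, and the worker count is arbitrary. Only the layout names, the metadata.tsv file, and the one-task-per-sample idea come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_files(root, samples, layout, ext="tsv"):
    """List the files a layout would contain.

    samples maps sample_id -> list of loci. File names are hypothetical;
    the naming used by mir may differ.
    """
    root = Path(root)
    if layout == "per_sample_locus":
        # One file per (sample, locus) pair.
        return [root / f"{sample_id}.{locus}.{ext}"
                for sample_id, loci in samples.items()
                for locus in loci]
    if layout == "single_file":
        # One combined table plus a separate metadata table.
        return [root / f"dataset.{ext}", root / "metadata.tsv"]
    raise ValueError(f"unknown layout: {layout!r}")


def load_sample(sample_id, paths):
    """Stand-in worker task; a real loader would parse the files."""
    return sample_id, [path.name for path in paths]


def load_dataset(root, samples, n_jobs=2):
    """Run one worker task per sample and collect the results by sample_id."""
    tasks = [(sample_id, plan_files(root, {sample_id: loci}, "per_sample_locus"))
             for sample_id, loci in samples.items()]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return dict(pool.map(lambda task: load_sample(*task), tasks))


if __name__ == "__main__":
    demo = {"s1": ["TRA", "TRB"], "s2": ["TRB"]}
    print(plan_files("out", demo, "single_file", ext="parquet"))
    print(load_dataset("out", demo))
```

Splitting work by sample keeps each task independent, which is why the per_sample_locus layout maps naturally onto worker tasks; with the single_file layout a loader would first have to partition the combined table by sample_id.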

Module contents#