calm — Lilush API

Submodules

Module	Description
calm.pipeline	CALM pipeline utilities.
calm.template	CALM template DSL parser and sequence builder.

Functions

Name	Signature
`tokenize`	`tokenize(text) -> tokens, err`
`detokenize`	`detokenize(tokens) -> text, err`
`token_text`	`token_text(id) -> text`
`token_info`	`token_info(id) -> info`
`vocab_size`	`vocab_size() -> sizes`
`build_sequence`	`build_sequence(model_or_spec, opts) -> tokens, err`
`parse_field_input`	`parse_field_input(field_names, text) -> fields`
`build_raw_sequence`	`build_raw_sequence(text) -> tokens, err`
`build_template_sequence`	`build_template_sequence(text) -> tokens, err`
`load_model`	`load_model(path) -> model, err`
`new_model`	`new_model(opts) -> model, err`
`read_header`	`read_header(path) -> header, err`
`pack_floats`	`pack_floats(values) -> data`
`unpack_floats`	`unpack_floats(data) -> values`

tokenize(text) -> tokens, err

Tokenize text using normal mode (all tiers)

detokenize(tokens) -> text, err

Detokenize a table of token IDs back to text

token_text(id) -> text

Get surface text of a token by ID

token_info(id) -> info

Get full token info by ID

vocab_size() -> sizes

Get vocabulary size breakdown

build_sequence(model_or_spec, opts) -> tokens, err

Build a token sequence using a model's template (or explicit template spec)

Builds a token sequence from a template and the context values in opts.

The first argument can be:

- A model userdata (the template is read from model:info().template)
- A template spec string (parsed directly)
- nil (falls back to BOS + raw input)

The opts table contains field values referenced by the template (e.g. input for a QUERY:input frame).

If opts.eos is true, EOS is appended (for training).
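The dispatch on the first argument and the nil fallback can be sketched in plain Lua. This is an illustration only, not the library's implementation: `build_sequence_sketch` is a hypothetical name, the BOS ID 257 is read off the build_template_sequence example output in this document, the EOS ID 258 is a placeholder, and it is assumed that the raw input lives in opts.input and that byte tokens equal raw byte values. Template expansion itself is left out.

```lua
-- BOS taken from the example output in this document; EOS is a placeholder value.
local BOS, EOS = 257, 258

local function build_sequence_sketch(model_or_spec, opts)
  -- Resolve the template spec from the three accepted argument forms.
  local spec
  if type(model_or_spec) == "string" then
    spec = model_or_spec                       -- spec string, used directly
  elseif model_or_spec ~= nil then
    spec = model_or_spec:info().template       -- model object exposes its template
  end
  if spec ~= nil then
    return nil, "template expansion is outside this sketch"
  end

  -- nil fallback: BOS followed by the raw input bytes (assumed to be opts.input).
  local tokens = { BOS }
  local input = opts.input or ""
  for i = 1, #input do
    tokens[#tokens + 1] = input:byte(i)
  end
  -- opts.eos appends EOS, as used for training sequences.
  if opts.eos then
    tokens[#tokens + 1] = EOS
  end
  return tokens
end
```

With these assumptions, `build_sequence_sketch(nil, { input = "hi", eos = true })` yields BOS, the two byte tokens, then EOS.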

parse_field_input(field_names, text) -> fields

Parse inline field:value patterns from text using known field names

Scans text for patterns like field_name:value where field_name is one of the known names. Each field's value runs until the next field anchor or end of string. One trailing space is stripped from each value.

Returns a table of field = value pairs, or nil if no patterns were found.

Example: parse_field_input({"headword","pos"}, "pos:n. headword:anything you want") → {pos="n.", headword="anything you want"}
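The scanning rule described above can be approximated in a few lines of plain Lua. This is a minimal sketch under stated assumptions, not the library's code: `parse_field_input_sketch` is a hypothetical name, and the requirement that an anchor start the string or follow whitespace is an assumption added here so that a name such as `word` does not match inside `headword`.

```lua
local function parse_field_input_sketch(field_names, text)
  -- Collect every position where a known "name:" anchor begins.
  local anchors = {}
  for _, name in ipairs(field_names) do
    local needle = name .. ":"
    local init = 1
    while true do
      local s, e = text:find(needle, init, true) -- plain search, no patterns
      if not s then break end
      -- Assumption: anchors start the string or follow whitespace.
      if s == 1 or text:sub(s - 1, s - 1):match("%s") then
        anchors[#anchors + 1] = { name = name, start = s, value_start = e + 1 }
      end
      init = e + 1
    end
  end
  if #anchors == 0 then return nil end

  -- Process anchors in text order so each value ends at the next anchor.
  table.sort(anchors, function(a, b) return a.start < b.start end)
  local fields = {}
  for i, a in ipairs(anchors) do
    local nxt = anchors[i + 1]
    local stop = nxt and (nxt.start - 1) or #text
    local value = text:sub(a.value_start, stop)
    -- Strip exactly one trailing space, as the description states.
    if value:sub(-1) == " " then value = value:sub(1, -2) end
    fields[a.name] = value
  end
  return fields
end
```

Run on the documented example, the sketch returns a table with pos set to "n." and headword set to "anything you want".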

build_raw_sequence(text) -> tokens, err

Build a raw token sequence from plain text (no context frames)

Builds a minimal sequence: <BOS> [byte tokens] <EOS>. No context frames, no CMD token. Use with cmd_pos = 0 for full-sequence loss (training on plain text, code, etc.).
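The layout of such a sequence can be illustrated in plain Lua. This is a sketch under assumptions: `build_raw_sketch` is a hypothetical name, byte tokens are assumed to equal raw byte values, the BOS ID 257 is taken from the build_template_sequence example output in this document, and the EOS ID 258 is a placeholder, not a documented value.

```lua
-- BOS from the example output in this document; EOS is a placeholder value.
local BOS, EOS = 257, 258

local function build_raw_sketch(text)
  local tokens = { BOS }
  for i = 1, #text do
    tokens[#tokens + 1] = text:byte(i) -- one token per input byte
  end
  tokens[#tokens + 1] = EOS
  return tokens
end

local tokens = build_raw_sketch("hi")
print(table.concat(tokens, " "))
```

The result has no frame or CMD tokens, which matches the full-sequence loss use case described above.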

build_template_sequence(text) -> tokens, err

Build a token sequence from a template string with inline special tokens

Parses <NAME> patterns in the input and replaces them with the corresponding special token IDs. Text segments between patterns are byte-tokenized. No automatic BOS is prepended -- the caller controls the full sequence via patterns.

Example input: <BOS><QUERY>define: window<END><ATN><REPLY>

Example output: {257, 260, 100, 101, ..., 264, 259, 261}
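The pattern replacement can be sketched in plain Lua as follows. This is an illustration, not the library's parser: `build_template_sketch` is a hypothetical name, the ID table contains only the five values that can be read off the example above, the `<NAME>` pattern is assumed to consist of uppercase letters and underscores, and treating an unknown name as an error is an assumption about behavior the documentation does not specify.

```lua
-- IDs inferred from the documented example; the real vocabulary may contain more.
local SPECIAL = { BOS = 257, ATN = 259, QUERY = 260, REPLY = 261, END = 264 }

local function build_template_sketch(text)
  local tokens = {}
  local pos = 1
  while pos <= #text do
    -- Find the next <NAME> pattern at or after pos.
    local s, e, name = text:find("<([%u_]+)>", pos)
    local stop = s and (s - 1) or #text
    -- Byte-tokenize the plain text before the pattern (or up to the end).
    for i = pos, stop do
      tokens[#tokens + 1] = text:byte(i)
    end
    if not s then break end
    local id = SPECIAL[name]
    if not id then
      return nil, "unknown special token: " .. name -- assumed error behavior
    end
    tokens[#tokens + 1] = id
    pos = e + 1
  end
  return tokens
end
```

Note that nothing is prepended automatically: an input without `<BOS>` produces a sequence without a BOS token.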

load_model(path) -> model, err

Load a trained model from a weight file

new_model(opts) -> model, err

Create a new model with random weights

read_header(path) -> header, err

Read a CWGT weight file header without loading the model

pack_floats(values) -> data

Pack a table of floats into a binary string (little-endian fp32)

unpack_floats(data) -> values

Unpack a binary string (little-endian fp32) into a table of floats
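The byte layout these two functions describe (4 bytes per value, little-endian fp32) can be reproduced with standard facilities. The sketch below is a stand-in, not the library's code, and the `_sketch` names are hypothetical. It uses string.pack and string.unpack where available (Lua 5.3+) and otherwise falls back to the LuaJIT ffi library, where the layout is the host's native one and is therefore little-endian only on little-endian hosts such as x86. It will not run on plain Lua 5.1 or 5.2, which have neither facility.

```lua
local function pack_floats_sketch(values)
  if string.pack then
    local parts = {}
    for i, v in ipairs(values) do
      parts[i] = string.pack("<f", v) -- "<f": little-endian 32-bit float
    end
    return table.concat(parts)
  end
  -- LuaJIT fallback: copy into a native float array and take its bytes.
  local ffi = require("ffi")
  local buf = ffi.new("float[?]", #values)
  for i, v in ipairs(values) do buf[i - 1] = v end
  return ffi.string(buf, #values * 4)
end

local function unpack_floats_sketch(data)
  local values = {}
  if string.unpack then
    for offset = 1, #data - 3, 4 do
      -- Only the first return value (the float) is stored.
      values[#values + 1] = string.unpack("<f", data, offset)
    end
    return values
  end
  -- LuaJIT fallback: copy the bytes into a float array and read it back.
  local ffi = require("ffi")
  local n = math.floor(#data / 4)
  local buf = ffi.new("float[?]", n)
  ffi.copy(buf, data, n * 4)
  for i = 0, n - 1 do values[i + 1] = buf[i] end
  return values
end
```

Values that are not exactly representable in fp32 come back rounded to single precision, so a round trip is exact only for values such as 1.0, 0.5 or -2.5.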
