Text Generation

MotifA1

A 105M-parameter bilingual causal language model with dual-mode reasoning, built on the Codon stack.

MotifA1 is a compact, CPU-friendly causal language model trained for bilingual (Chinese / English) instruction following. It supports both an explicit thinking mode (chain-of-thought wrapped in [cot_start] ... [cot_end]) and a direct non-thinking mode, switchable at inference time.


Model Summary

Field Value
Parameters 105.41 M
Vocabulary 8,192 (BPE, packed)
Architecture Causal Transformer (decoder-only)
Position Encoding RoPE, base = 500,000
Training Context 4,096 tokens
Languages δΈ­ζ–‡ / English
Modes Thinking / Non-thinking
Runtime CUDA / CPU
Precision fp32 / bf16
License See repository

Why RoPE base 500k

A RoPE base of 500k flattens the rotary frequency spectrum, which gives MotifA1 headroom to extend its context window beyond the 4,096 it was trained on via interpolation-style scaling, without retraining the position basis from scratch.


Installation

pip install codon-model==0.0.5

Required artifacts:

  • motifa1_sft.safetensors β€” model weights
  • motif.vocab β€” packed tokenizer (vocab + chat template + config in one zip)

Quickstart

Load the model

from codon.motif import MotifA1

model = MotifA1().load_pretrained('motifa1_sft.safetensors').to('cuda')
print(model.count_params(human_readable=True))   # -> 105.41 M

CPU users: replace 'cuda' with 'cpu'. Inference works out of the box, just slower.

Load the tokenizer

from codon.utils.tokens import PackedTokenizer

tokenizer = PackedTokenizer('motif.vocab')

Streaming chat

from codon.utils.generate import chat
from rich.console import Console

console = Console()

for chunk in chat(
    model, tokenizer, model.device,
    messages=[{'role': 'user', 'content': 'Your Q'}],
    stream=True,
    max_new_tokens=1024,
):
    if chunk.cot_ended:
        console.print('\n')
    if chunk.is_cot:
        console.print(chunk.content, end='', style='blue')
    else:
        console.print(chunk.content, end='')

The stream yields chunks tagged with:

  • chunk.is_cot β€” whether the current span is inside a chain-of-thought block
  • chunk.cot_ended β€” fires once when the model exits thinking mode and begins the user-facing answer
  • chunk.content β€” the decoded text fragment

This lets you render reasoning in a separate visual channel (e.g. dim blue) and the final answer in normal style.


Modes

MotifA1 follows a chat template with explicit reasoning markers.

Thinking mode β€” the model first generates content between [cot_start] and [cot_end], then produces the final answer. Recommended for math, multi-step reasoning, code planning.

Non-thinking mode β€” the model emits an empty [cot_start][cot_end] block and answers directly. Recommended for chit-chat, translation, short-form generation, and latency-sensitive applications.

The chat helper exposes mode switching; consult the Codon docs for the parameter form your version exposes.


Training

  • Stage 1 β€” Pretraining at 4,096-token context, bilingual corpus.
  • Stage 2 β€” SFT on a curated mix of single- and multi-turn dialogues, with thinking and non-thinking samples blended.
  • Optimizer AdamW, weight decay 0.01, gradient clip 1.0.
  • Schedule Linear warmup β†’ cosine annealing to 10% of peak LR.
  • Precision bf16 autocast.

Known Limitations

  • Long-range attention is weak. Even with RoPE base 500k allowing window extension, retrieval and reasoning over long spans (>2k effective tokens) degrade noticeably. Treat MotifA1 as a short- to mid-context model in practice.
  • Scale-bound knowledge. At ~105 M parameters, factual recall is limited. Pair with retrieval for knowledge-heavy tasks.
  • Vocabulary is compact. With 8,192 BPE tokens, rare scripts, niche jargon, and long URLs may be tokenized inefficiently.
  • Hallucination. Like all LMs of this scale, MotifA1 can produce confident but incorrect answers. Verify safety-critical outputs.

Intended Use

  • Personal assistants, on-device chat, edge deployment
  • Education and research on small LMs, dual-mode reasoning, and RoPE scaling
  • A base model for further fine-tuning at modest compute budgets

Not intended for high-stakes decisions (medical, legal, financial) or as a sole knowledge source.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for CodonProject/MotifA1-Base

Finetunes
1 model

Dataset used to train CodonProject/MotifA1-Base