MotifA1

A 105M-parameter bilingual causal language model with dual-mode reasoning, built on the Codon stack.

MotifA1 is a compact, CPU-friendly causal language model trained for bilingual (Chinese / English) instruction following. It supports both an explicit thinking mode (chain-of-thought wrapped in [cot_start] ... [cot_end]) and a direct non-thinking mode, switchable at inference time.

Web Demo at ModelScope HuggingFace

Model Summary

Field	Value
Parameters	105.41 M
Vocabulary	8,192 (BPE, packed)
Architecture	Causal Transformer (decoder-only)
Position Encoding	RoPE, base = 500,000
Training Context	4,096 tokens
Languages	中文 / English
Modes	Thinking / Non-thinking
Runtime	CUDA / CPU
Precision	fp32 / bf16
License	See repository

Why RoPE base 500k

A RoPE base of 500k flattens the rotary frequency spectrum, which gives MotifA1 headroom to extend its context window beyond the 4,096 it was trained on via interpolation-style scaling, without retraining the position basis from scratch.

Installation

pip install codon-model==0.0.6a2

Required artifacts:

motifa1_sft.safetensors — model weights
motif.vocab — packed tokenizer (vocab + chat template + config in one zip)

Quickstart

Load the model

If you have downloaded the motifa1_sft.safetensors, then you can load as:

from codon.motif import MotifA1

model = MotifA1().load_pretrained('motifa1_sft.safetensors').to('cuda')
print(model.count_params(human_readable=True))   # -> 105.41 M

More simply load from network:

from codon.motif import MotifA1

model = MotifA1().from_remote().to('cuda')
print(model.count_params(human_readable=True))   # -> 105.41 M

CPU users: replace 'cuda' with 'cpu'. Inference works out of the box, just slower.

Load the tokenizer

If you have downloaded the motif.vocab, then you can load as:

from codon.utils.tokens import PackedTokenizer

tokenizer = PackedTokenizer('motif.vocab')

More simply load from network:

from codon.motif import MotifA1Tokenizer

tokenizer = MotifA1Tokenizer().from_remote()

Streaming chat

from codon.utils.generate import chat
from rich.console import Console

console = Console()

for chunk in chat(
    model, tokenizer, model.device,
    messages=[{'role': 'user', 'content': 'Your Q'}],
    stream=True,
    max_new_tokens=1024,
):
    if chunk.cot_ended:
        console.print('\n')
    if chunk.is_cot:
        console.print(chunk.content, end='', style='blue')
    else:
        console.print(chunk.content, end='')

The stream yields chunks tagged with:

chunk.is_cot — whether the current span is inside a chain-of-thought block
chunk.cot_ended — fires once when the model exits thinking mode and begins the user-facing answer
chunk.content — the decoded text fragment

This lets you render reasoning in a separate visual channel (e.g. dim blue) and the final answer in normal style.

Using as OpenAI-Compat Endpoint

from codon.utils.service import Service, ModelCard

Service([
    ModelCard(
        model=model,
        tokenizer=tokenizer,
        model_id='Motif-A1',
        owned='CodonProject'
    )
]).run(port=11305)

Modes

MotifA1 follows a chat template with explicit reasoning markers.

Thinking mode — the model first generates content between [cot_start] and [cot_end], then produces the final answer. Recommended for math, multi-step reasoning, code planning.

Non-thinking mode — the model emits an empty [cot_start][cot_end] block and answers directly. Recommended for chit-chat, translation, short-form generation, and latency-sensitive applications.

The chat helper exposes mode switching; consult the Codon docs for the parameter form your version exposes.

Training

Stage 1 — Pretraining at 4,096-token context, bilingual corpus.
Stage 2 — SFT on a curated mix of single- and multi-turn dialogues, with thinking and non-thinking samples blended.
Optimizer AdamW, weight decay 0.01, gradient clip 1.0.
Schedule Linear warmup → cosine annealing to 10% of peak LR.
Precision bf16 autocast.

The pretraining and SFT code are available at https://github.com/CodonProject/codon-model/tree/v0.0.6-alpha.2/train_exp .

Known Limitations

Long-range attention is weak. Even with RoPE base 500k allowing window extension, retrieval and reasoning over long spans (>2k effective tokens) degrade noticeably. Treat MotifA1 as a short- to mid-context model in practice.
Scale-bound knowledge. At ~105 M parameters, factual recall is limited. Pair with retrieval for knowledge-heavy tasks.
Vocabulary is compact. With 8,192 BPE tokens, rare scripts, niche jargon, and long URLs may be tokenized inefficiently.
Hallucination. Like all LMs of this scale, MotifA1 can produce confident but incorrect answers. Verify safety-critical outputs.

Intended Use

Personal assistants, on-device chat, edge deployment
Education and research on small LMs, dual-mode reasoning, and RoPE scaling
A base model for further fine-tuning at modest compute budgets

Not intended for high-stakes decisions (medical, legal, financial) or as a sole knowledge source.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for CodonProject/MotifA1-SFT

Base model

CodonProject/MotifA1-Base

Finetuned

(1)

this model

CodonProject
/

MotifA1-SFT