MotifA1
A 105M-parameter bilingual causal language model with dual-mode reasoning, built on the Codon stack.
MotifA1 is a compact, CPU-friendly causal language model trained for bilingual (Chinese / English) instruction following. It supports both an explicit thinking mode (chain-of-thought wrapped in [cot_start] ... [cot_end]) and a direct non-thinking mode, switchable at inference time.
Web demo available on ModelScope and HuggingFace.
Model Summary
| Field | Value |
|---|---|
| Parameters | 105.41 M |
| Vocabulary | 8,192 (BPE, packed) |
| Architecture | Causal Transformer (decoder-only) |
| Position Encoding | RoPE, base = 500,000 |
| Training Context | 4,096 tokens |
| Languages | Chinese / English |
| Modes | Thinking / Non-thinking |
| Runtime | CUDA / CPU |
| Precision | fp32 / bf16 |
| License | See repository |
Why RoPE base 500k
A RoPE base of 500k lowers the rotary frequencies (stretching their wavelengths) relative to the common default of 10,000. This gives MotifA1 headroom to extend its context window beyond the 4,096 tokens it was trained on via interpolation-style scaling, without retraining the position basis from scratch.
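To make the effect of the base concrete, the sketch below compares per-pair rotary wavelengths for a base of 10,000 and a base of 500,000, using the standard RoPE frequency formula `theta_i = base^(-2i/d)`. The head dimension of 64 is an assumption for illustration; the model card does not state it.

```python
import math

def rope_wavelengths(base: float, head_dim: int) -> list[float]:
    """Wavelength in tokens of each rotary pair: 2*pi * base**(2i/d)."""
    return [2 * math.pi * base ** (2 * i / head_dim) for i in range(head_dim // 2)]

HEAD_DIM = 64          # assumption: not stated in the model card
TRAIN_CONTEXT = 4096   # from the model summary table

for base in (10_000, 500_000):
    waves = rope_wavelengths(base, HEAD_DIM)
    # A pair whose wavelength exceeds the training context never completes
    # a full rotation inside the training window.
    longer = sum(w > TRAIN_CONTEXT for w in waves)
    print(f"base={base:>7,}: longest wavelength ~ {waves[-1]:,.0f} tokens, "
          f"{longer}/{len(waves)} pairs longer than {TRAIN_CONTEXT} tokens")
```

With the larger base, more pairs stay well inside a single rotation at the training length, which is the headroom that interpolation-style scaling draws on.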
Installation
pip install codon-model==0.0.6a2
Required artifacts:
- motifa1_sft.safetensors: model weights
- motif.vocab: packed tokenizer (vocab + chat template + config in one zip)
Quickstart
Load the model
If you have already downloaded motifa1_sft.safetensors, load it from the local file:
from codon.motif import MotifA1
model = MotifA1().load_pretrained('motifa1_sft.safetensors').to('cuda')
print(model.count_params(human_readable=True)) # -> 105.41 M
Alternatively, fetch the weights from the network:
from codon.motif import MotifA1
model = MotifA1().from_remote().to('cuda')
print(model.count_params(human_readable=True)) # -> 105.41 M
CPU users: replace 'cuda' with 'cpu'. Inference works out of the box, just slower.
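If the same script has to run on machines with and without a GPU, the device string can be chosen at runtime. This is a minimal sketch that assumes Codon models are PyTorch modules, which the `.to('cuda')` call suggests but the model card does not state; `pick_device` is a hypothetical helper, not part of Codon.

```python
def pick_device() -> str:
    """Return 'cuda' when a CUDA-capable PyTorch build is usable, else 'cpu'."""
    try:
        import torch  # assumption: Codon is built on PyTorch
    except ImportError:
        return 'cpu'
    return 'cuda' if torch.cuda.is_available() else 'cpu'

device = pick_device()
print(device)
```

The result can then be passed wherever the snippets above use `'cuda'`, for example `.to(device)`.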
Load the tokenizer
If you have already downloaded motif.vocab, load it from the local file:
from codon.utils.tokens import PackedTokenizer
tokenizer = PackedTokenizer('motif.vocab')
Alternatively, fetch the tokenizer from the network:
from codon.motif import MotifA1Tokenizer
tokenizer = MotifA1Tokenizer().from_remote()
Streaming chat
from codon.utils.generate import chat
from rich.console import Console
console = Console()
for chunk in chat(
    model, tokenizer, model.device,
    messages=[{'role': 'user', 'content': 'Your Q'}],
    stream=True,
    max_new_tokens=1024,
):
    # Insert a blank line when the model leaves the reasoning block
    if chunk.cot_ended:
        console.print('\n')
    # Render reasoning in blue, the final answer in the default style
    if chunk.is_cot:
        console.print(chunk.content, end='', style='blue')
    else:
        console.print(chunk.content, end='')
The stream yields chunks tagged with:
- chunk.is_cot: whether the current span is inside a chain-of-thought block
- chunk.cot_ended: fires once when the model exits thinking mode and begins the user-facing answer
- chunk.content: the decoded text fragment
This lets you render reasoning in a separate visual channel (e.g. dim blue) and the final answer in normal style.
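The same chunk fields can also be used to collect reasoning and answer into separate strings instead of printing them. The sketch below runs without a model: `FakeChunk` is a hypothetical stand-in that only mirrors the three fields described above, and the exact chunk on which `cot_ended` fires is an assumption.

```python
from dataclasses import dataclass

@dataclass
class FakeChunk:
    """Hypothetical stand-in for the chunks yielded by chat(stream=True)."""
    content: str
    is_cot: bool
    cot_ended: bool = False

def split_stream(chunks):
    """Collect reasoning text and answer text into two separate strings."""
    reasoning, answer = [], []
    for chunk in chunks:
        (reasoning if chunk.is_cot else answer).append(chunk.content)
    return ''.join(reasoning), ''.join(answer)

demo = [
    FakeChunk('2 + 2 ', is_cot=True),
    FakeChunk('is 4.', is_cot=True),
    FakeChunk('The answer ', is_cot=False, cot_ended=True),
    FakeChunk('is 4.', is_cot=False),
]
reasoning, answer = split_stream(demo)
print('reasoning:', reasoning)
print('answer:', answer)
```

In real use, the iterable passed to `split_stream` would be the generator returned by `chat(..., stream=True)`.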
Serving as an OpenAI-Compatible Endpoint
from codon.utils.service import Service, ModelCard
Service([
    ModelCard(
        model=model,
        tokenizer=tokenizer,
        model_id='Motif-A1',
        owned='CodonProject'
    )
]).run(port=11305)
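Once the service is running, any OpenAI-style client should be able to talk to it. The sketch below builds such a request with the standard library only. It assumes the service listens on localhost and follows the usual OpenAI route layout (`/v1/chat/completions`) and response shape; neither is confirmed by this model card, so check the Codon docs for the actual paths.

```python
import json
import urllib.request

# Assumptions: host, route prefix, and response shape follow OpenAI conventions.
BASE_URL = 'http://localhost:11305/v1'

def build_chat_request(prompt: str, model_id: str = 'Motif-A1') -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        'model': model_id,
        'messages': [{'role': 'user', 'content': prompt}],
    }
    return urllib.request.Request(
        f'{BASE_URL}/chat/completions',
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

def ask(prompt: str) -> str:
    """Send the request; requires the Service to be running on port 11305."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body['choices'][0]['message']['content']

request = build_chat_request('Hello')
print(request.full_url)
```

Only `build_chat_request` runs at import time; `ask` performs the network call and is meant to be invoked after the server has started.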
Modes
MotifA1 follows a chat template with explicit reasoning markers.
Thinking mode: the model first generates content between [cot_start] and [cot_end], then produces the final answer. Recommended for math, multi-step reasoning, and code planning.
Non-thinking mode: the model emits an empty [cot_start][cot_end] block and answers directly. Recommended for chit-chat, translation, short-form generation, and latency-sensitive applications.
The chat helper exposes mode switching; consult the Codon docs for the parameter form your version exposes.
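When working with raw, non-streamed output, the reasoning markers can be split off with plain string handling. This is a minimal sketch that assumes the markers appear literally in the decoded text; `split_reasoning` is a hypothetical helper, not a Codon function.

```python
COT_START, COT_END = '[cot_start]', '[cot_end]'

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw text containing the CoT markers."""
    start = text.find(COT_START)
    if start == -1:
        return '', text.strip()      # no reasoning block at all
    body_start = start + len(COT_START)
    end = text.find(COT_END, body_start)
    if end == -1:
        return '', text.strip()      # unterminated block: treat all as answer
    reasoning = text[body_start:end].strip()
    answer = text[end + len(COT_END):].strip()
    return reasoning, answer

# Thinking mode: non-empty reasoning followed by the answer
print(split_reasoning('[cot_start]17 has no divisors up to 4.[cot_end]Yes, 17 is prime.'))
# Non-thinking mode: empty reasoning block
print(split_reasoning('[cot_start][cot_end]Hello!'))
```

An empty first element is a simple way to detect that the model answered in non-thinking mode.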
Training
- Stage 1: Pretraining at 4,096-token context on a bilingual corpus.
- Stage 2: SFT on a curated mix of single- and multi-turn dialogues, with thinking and non-thinking samples blended.
- Optimizer: AdamW, weight decay 0.01, gradient clip 1.0.
- Schedule: linear warmup, then cosine annealing to 10% of peak LR.
- Precision: bf16 autocast.
The pretraining and SFT code is available at https://github.com/CodonProject/codon-model/tree/v0.0.6-alpha.2/train_exp.
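The schedule named in the list above (linear warmup followed by cosine annealing to 10% of peak LR) can be sketched as a pure function of the step index. The peak learning rate, warmup length, and total step count below are hypothetical values for illustration; the actual settings are in the linked training code.

```python
import math

def lr_at(step: int, peak_lr: float, warmup_steps: int, total_steps: int,
          final_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to final_ratio * peak_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    min_lr = final_ratio * peak_lr
    return min_lr + (peak_lr - min_lr) * cosine

# Hypothetical values, not taken from the model card.
PEAK_LR, WARMUP, TOTAL = 3e-4, 1_000, 10_000
for step in (0, 999, 1_000, 5_500, 10_000):
    print(step, f'{lr_at(step, PEAK_LR, WARMUP, TOTAL):.2e}')
```

The floor at `final_ratio * peak_lr` keeps the learning rate from decaying to zero, matching the "10% of peak" target.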
Known Limitations
- Long-range attention is weak. Even with RoPE base 500k allowing window extension, retrieval and reasoning over long spans (>2k effective tokens) degrade noticeably. Treat MotifA1 as a short- to mid-context model in practice.
- Scale-bound knowledge. At ~105 M parameters, factual recall is limited. Pair with retrieval for knowledge-heavy tasks.
- Vocabulary is compact. With 8,192 BPE tokens, rare scripts, niche jargon, and long URLs may be tokenized inefficiently.
- Hallucination. Like all LMs of this scale, MotifA1 can produce confident but incorrect answers. Verify safety-critical outputs.
Intended Use
- Personal assistants, on-device chat, edge deployment
- Education and research on small LMs, dual-mode reasoning, and RoPE scaling
- A base model for further fine-tuning at modest compute budgets
Not intended for high-stakes decisions (medical, legal, financial) or as a sole knowledge source.
Model tree for CodonProject/MotifA1-SFT
Base model: CodonProject/MotifA1-Base