# Codeas Model

A fine-tuned Qwen3-14B model optimized for code generation and reasoning tasks. Available in GGUF Q6_K format for efficient local inference.
## Model Details

| | |
|---|---|
| Base Model | Qwen3-14B |
| Parameters | ~15B |
| Architecture | Qwen3 (GQA, RoPE) |
| Context Length | 40,960 tokens |
| Precision | BF16 (original), Q6_K (GGUF) |
| License | Apache 2.0 |
## Architecture
- 40 transformer blocks
- 40 attention heads, 8 KV heads (Grouped Query Attention)
- 5,120 hidden size / 17,408 FFN size
- RoPE with 1M frequency base
- SiLU activation
- 151,936-token vocabulary (GPT-2-style BPE, Qwen2 pre-tokenizer)
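One practical consequence of GQA is that the KV cache scales with the 8 KV heads rather than the 40 query heads. A back-of-the-envelope estimate from the figures above (assuming a BF16 KV cache and head dim = 5,120 / 40 = 128):

```python
# Estimate KV-cache size per token and at full context for the
# architecture listed above (GQA: 8 KV heads, not 40 query heads).
layers = 40
kv_heads = 8
head_dim = 5120 // 40          # hidden size / attention heads = 128
bytes_per_value = 2            # BF16

# K and V each store kv_heads * head_dim values per layer.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
print(kv_bytes_per_token)      # 163840 bytes = 160 KiB per token

full_context = 40_960
print(kv_bytes_per_token * full_context / 2**30)   # 6.25 GiB at max context
```

With 40 full attention heads the same cache would be five times larger, which is why GQA matters for long-context local inference.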
## Capabilities

- Chain-of-thought reasoning via `<think>` blocks
- Tool/function calling via the `<tool_call>` format
- Thinking mode can be toggled on/off per request
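When thinking mode is on, client code typically needs to separate the reasoning from the visible answer. A minimal sketch (the `split_thinking` helper and the sample text are illustrative, not part of the model's API):

```python
import re

def split_thinking(text: str):
    """Split a response into its <think> reasoning and the visible answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

raw = "<think>Two sorted inputs, so a two-pointer merge works.</think>Use a two-pointer merge."
thinking, answer = split_thinking(raw)
print(thinking)  # Two sorted inputs, so a two-pointer merge works.
print(answer)    # Use a two-pointer merge.
```

With thinking mode off, the same helper returns an empty reasoning string and the full text as the answer.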
## GGUF Quantizations

| File | Quant | Size | Quality |
|---|---|---|---|
| codeas-model-Q6_K.gguf | Q6_K | 12.1 GB | Near-lossless |
## Usage

### llama.cpp

```sh
./llama-cli -m codeas-model-Q6_K.gguf \
  -p "Write a Python function to merge two sorted lists" \
  -n 512
```
### Ollama

Create a `Modelfile` with the following content:

```
FROM ./codeas-model-Q6_K.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are Codeas, a helpful coding assistant."
```

Then build and run it:

```sh
ollama create codeas -f Modelfile
ollama run codeas
```
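Once registered, the model can also be called over Ollama's local HTTP API. A sketch of the request body, reusing the Modelfile's sampling settings as per-request options (the endpoint and port are Ollama's defaults; the prompt is illustrative):

```python
import json

# Request body for Ollama's /api/chat endpoint, with the model's
# recommended sampling defaults passed as per-request options.
payload = {
    "model": "codeas",
    "messages": [
        {"role": "user",
         "content": "Write a Python function to merge two sorted lists"},
    ],
    "options": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "stream": False,
}
body = json.dumps(payload)

# With an Ollama server running locally, POST it to the default endpoint,
# e.g. with urllib:
#   req = urllib.request.Request(
#       "http://localhost:11434/api/chat", data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```

Setting `"stream": False` returns one complete JSON response instead of a stream of partial chunks.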
## Hardware Requirements
| Format | VRAM / RAM |
|---|---|
| Q6_K GGUF | ~14 GB |
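The ~14 GB figure can be sanity-checked as weights plus KV cache: the Q6_K file is 12.1 GB, and with 8 KV heads a BF16 cache adds roughly 160 KB per token of context (a rough estimate; compute-buffer overhead varies by backend):

```python
# Rough memory budget for running the Q6_K GGUF (estimates, not measurements).
weights_gb = 12.1                            # Q6_K file size from the table above
kv_bytes_per_token = 2 * 40 * 8 * 128 * 2    # layers, KV heads, head dim, BF16

context = 8192                               # a typical working context
kv_gb = kv_bytes_per_token * context / 1e9
total = weights_gb + kv_gb
print(round(kv_gb, 2))    # 1.34
print(round(total, 1))    # 13.4 -> near the ~14 GB figure once
                          # compute buffers are added
```

Longer contexts raise the budget accordingly; at the full 40,960-token window the cache alone is several GB more.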
## Training

| | |
|---|---|
| Method | Full fine-tune (no LoRA) |
| Framework | Axolotl 0.13.0 + Transformers 4.55.4 |
| Hardware | 8x GPU (FSDP) |
| Optimizer | AdamW (fused) |
| LR Schedule | Cosine, 1e-5 peak |
| Sequence Length | 8,192 |
| Batch Size | 24 (3 per device) |
| Epochs | 3 |
| Precision | BF16 + TF32 |
| Techniques | Flash Attention, Sample Packing, Gradient Checkpointing, Activation Offloading |
## Sampling Defaults

```
temperature: 0.6
top_p: 0.95
top_k: 20
```