# Codeas Model

A fine-tuned Qwen3-14B model optimized for code generation and reasoning tasks. Available in GGUF Q6_K format for efficient local inference.
## Model Details

| | |
|---|---|
| Base Model | Qwen3-14B |
| Parameters | ~15B |
| Architecture | Qwen3 (GQA, RoPE) |
| Context Length | 40,960 tokens |
| Precision | BF16 (original), Q6_K (GGUF) |
| License | Apache 2.0 |
## Architecture
- 40 transformer blocks
- 40 attention heads, 8 KV heads (Grouped Query Attention)
- 5,120 hidden size / 17,408 FFN size
- RoPE with 1M frequency base
- SiLU activation
- 151,936-token vocabulary (GPT-2-style BPE, Qwen2 pre-tokenizer)
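One practical consequence of GQA is that the KV cache scales with the 8 KV heads rather than the 40 query heads. A back-of-the-envelope estimate from the figures above (assuming a BF16 KV cache and head dim = 5,120 / 40 = 128):

```python
# Estimate KV-cache size per token and at full context for the
# architecture listed above (GQA: 8 KV heads, not 40 query heads).
layers = 40
kv_heads = 8
head_dim = 5120 // 40          # hidden size / attention heads = 128
bytes_per_value = 2            # BF16

# K and V each store kv_heads * head_dim values per layer.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
print(kv_bytes_per_token)      # 163840 bytes = 160 KiB per token

full_context = 40_960
print(kv_bytes_per_token * full_context / 2**30)   # 6.25 GiB at max context
```

With 40 full attention heads the same cache would be five times larger, which is why GQA matters for long-context local inference.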
## Capabilities

- Chain-of-thought reasoning via `<think>` blocks
- Tool/function calling via the `<tool_call>` format
- Thinking mode can be toggled on/off per request
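When thinking mode is on, client code typically needs to separate the reasoning from the visible answer. A minimal sketch (the `split_thinking` helper and the sample text are illustrative, not part of the model's API):

```python
import re

def split_thinking(text: str):
    """Split a response into its <think> reasoning and the visible answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thinking, answer

raw = "<think>Two sorted inputs, so a two-pointer merge works.</think>Use a two-pointer merge."
thinking, answer = split_thinking(raw)
print(thinking)  # Two sorted inputs, so a two-pointer merge works.
print(answer)    # Use a two-pointer merge.
```

With thinking mode off, the same helper returns an empty reasoning string and the full text as the answer.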
## GGUF Quantizations

| File | Quant | Size | Quality |
|---|---|---|---|
| codeas-model-Q6_K.gguf | Q6_K | 12.1 GB | Near-lossless |
## Usage

### llama.cpp

```sh
./llama-cli -m codeas-model-Q6_K.gguf \
  -p "Write a Python function to merge two sorted lists" \
  -n 512
```
### Ollama

Create a `Modelfile` with the following content:

```
FROM ./codeas-model-Q6_K.gguf
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM "You are Codeas, a helpful coding assistant."
```

Then build and run it:

```sh
ollama create codeas -f Modelfile
ollama run codeas
```
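Once registered, the model can also be called over Ollama's local HTTP API. A sketch of the request body, reusing the Modelfile's sampling settings as per-request options (the endpoint and port are Ollama's defaults; the prompt is illustrative):

```python
import json

# Request body for Ollama's /api/chat endpoint, with the model's
# recommended sampling defaults passed as per-request options.
payload = {
    "model": "codeas",
    "messages": [
        {"role": "user",
         "content": "Write a Python function to merge two sorted lists"},
    ],
    "options": {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "stream": False,
}
body = json.dumps(payload)

# With an Ollama server running locally, POST it to the default endpoint,
# e.g. with urllib:
#   req = urllib.request.Request(
#       "http://localhost:11434/api/chat", data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   urllib.request.urlopen(req)
```

Setting `"stream": False` returns one complete JSON response instead of a stream of partial chunks.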
## Hardware Requirements
| Format | VRAM / RAM |
|---|---|
| Q6_K GGUF | ~14 GB |
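The ~14 GB figure can be sanity-checked as weights plus KV cache: the Q6_K file is 12.1 GB, and with 8 KV heads a BF16 cache adds roughly 160 KB per token of context (a rough estimate; compute-buffer overhead varies by backend):

```python
# Rough memory budget for running the Q6_K GGUF (estimates, not measurements).
weights_gb = 12.1                            # Q6_K file size from the table above
kv_bytes_per_token = 2 * 40 * 8 * 128 * 2    # layers, KV heads, head dim, BF16

context = 8192                               # a typical working context
kv_gb = kv_bytes_per_token * context / 1e9
total = weights_gb + kv_gb
print(round(kv_gb, 2))    # 1.34
print(round(total, 1))    # 13.4 -> near the ~14 GB figure once
                          # compute buffers are added
```

Longer contexts raise the budget accordingly; at the full 40,960-token window the cache alone is several GB more.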
## Training

| | |
|---|---|
| Method | Full fine-tune (no LoRA) |
| Framework | Axolotl 0.13.0 + Transformers 4.55.4 |
| Hardware | 8x GPU (FSDP) |
| Optimizer | AdamW (fused) |
| LR Schedule | Cosine, 1e-5 peak |
| Sequence Length | 8,192 |
| Batch Size | 24 (3 per device) |
| Epochs | 3 |
| Precision | BF16 + TF32 |
| Techniques | Flash Attention, Sample Packing, Gradient Checkpointing, Activation Offloading |
## Sampling Defaults

```
temperature: 0.6
top_p: 0.95
top_k: 20
```