SigLino: Vision Foundation Models (SigLIP2 + DINOv3)
Vision encoders distilled from DINOv3 and SigLIP2 (MoE & Dense).
Accepted at CVPR 2026
This work stems from the CVPR 2026 AMoE paper, which distills multiple vision foundation models into a Mixture-of-Experts (MoE) architecture. We chose the name SigLino (SigLIP2 + DINOv3) for clarity.
This checkpoint is the sparse MoE variant of SigLino, routing each token to the top 6 of 28 experts: 0.3B active parameters, 0.6B total. Part of the SigLino model family.
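The top-6-of-28 routing can be sketched as a standard sparse MoE layer: a linear router scores all experts per token, the top-k are selected, and their outputs are combined with renormalised router weights. This is a generic illustration using the dimensions reported in the table below, not the actual SigLino code; the expert MLP shape and gating details are assumptions.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-k sparse MoE sketch (hypothetical, not the SigLino implementation)."""
    def __init__(self, dim=768, moe_dim=384, num_experts=28, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        # Each expert is assumed to be a small two-layer MLP with hidden size moe_dim.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, moe_dim), nn.GELU(), nn.Linear(moe_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (batch, tokens, dim)
        logits = self.router(x)                          # (B, T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = weights.softmax(dim=-1)                # renormalise over selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoE()
tokens = torch.randn(2, 196, 768)
out = layer(tokens)
print(out.shape)  # torch.Size([2, 196, 768])
```

Only the 6 selected experts run per token, which is how 0.6B total parameters yield roughly 0.3B active ones.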
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

# Load model and processor
model_id = "tiiuae/siglino-moe-0.3-0.6B"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

# Preprocess image
image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

# Access specialized features
# Options: 'siglino' (768d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["siglino"]      # (Batch, Tokens, 768)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
```
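The summary features are typical inputs for image-to-image retrieval: L2-normalise them and rank gallery images by cosine similarity. The sketch below uses random tensors as stand-ins for features extracted as above (the 1152-d SigLIP2 head); everything else is plain PyTorch.

```python
import torch
import torch.nn.functional as F

# Random placeholders for summary features extracted with the model above.
gallery = F.normalize(torch.randn(100, 1152), dim=-1)  # 100 indexed images
query = F.normalize(torch.randn(1, 1152), dim=-1)      # one query image

scores = query @ gallery.T          # cosine similarities, shape (1, 100)
top5 = scores.topk(5, dim=-1)       # indices of the 5 most similar gallery images
print(top5.indices.shape)           # torch.Size([1, 5])
```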
| Attribute | Value |
|---|---|
| Architecture | MoE (top-6/28) |
| Parameters (active) | 0.3B |
| Parameters (total) | 0.6B |
| Layers | 18 |
| Hidden Dim | 768 |
| MoE Dim | 384 |
| Patch Size | 16x16 |
| Teachers | DINOv3, SigLIP2 |

| Task | Metric | Score (%) |
|---|---|---|
| kNN (ImageNet) | Acc | 85.9 |
| kNN (6-dataset avg) | Acc | 90.5 |
| Zero-shot cls (ImageNet) | Acc | 79.9 |
| Flickr30K I2T | R@1 | 94.6 |
| MSCOCO I2T | R@1 | 70.8 |
| Pascal VOC (1024) | mIoU | 88.9 |
| Cityscapes (1024) | mIoU | 65.4 |
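As a rough illustration of how the kNN numbers above are obtained, here is a minimal cosine-similarity k-NN probe over frozen features with majority voting. It is a generic sketch, not the paper's exact protocol: the value of k and the voting scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=20):
    """Classify each test feature by majority vote among its k nearest
    training features under cosine similarity; return accuracy."""
    train_feats = F.normalize(train_feats, dim=-1)
    test_feats = F.normalize(test_feats, dim=-1)
    sims = test_feats @ train_feats.T              # (N_test, N_train)
    nn_idx = sims.topk(k, dim=-1).indices          # k nearest neighbours
    nn_labels = train_labels[nn_idx]               # (N_test, k)
    preds = nn_labels.mode(dim=-1).values          # majority vote
    return (preds == test_labels).float().mean().item()

# Synthetic sanity check: two well-separated clusters.
train = torch.cat([torch.randn(50, 8) + 5, torch.randn(50, 8) - 5])
labels = torch.cat([torch.zeros(50, dtype=torch.long), torch.ones(50, dtype=torch.long)])
acc = knn_accuracy(train, labels, train, labels)
print(acc)  # close to 1.0 on separable data
```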
If you use this work in your research, please cite:
```bibtex
@article{chaybouti2025amoe,
  title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}
```