SigLino-MoE-0.3-0.6B

Accepted at CVPR 2026

Project Website · arXiv · GitHub

This work stems from the CVPR 2026 AMoE paper, which distills multiple teacher models into a Mixture-of-Experts (MoE) vision architecture. We chose the name SigLino for clarity (SigLIP2 + DINOv3).

A sparse MoE variant of SigLino that routes each token to the top 6 of 28 experts: 0.3B active parameters, 0.6B total.
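
Top-k routing means the router scores all 28 experts for each token and only the 6 highest-scoring experts run on it. A minimal, generic sketch of this gating step (illustrative only, not the model's actual routing code):

```python
import numpy as np

def topk_route(router_logits: np.ndarray, k: int = 6):
    """Pick the top-k experts per token and softmax-normalize their gates."""
    # indices of the k largest logits per token
    idx = np.argsort(router_logits, axis=-1)[..., -k:]
    topk = np.take_along_axis(router_logits, idx, axis=-1)
    # softmax over only the selected experts
    gates = np.exp(topk - topk.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return idx, gates

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 28))  # 4 tokens, 28 experts
idx, gates = topk_route(logits)        # each token keeps 6 experts
```

Only the selected experts' MLPs execute, which is why the per-token (active) parameter count is far below the total.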

Part of the SigLino model family.

Usage

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

# Load model and processor
model_id = "tiiuae/siglino-moe-0.3-0.6B"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

# Preprocess image
image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

# Access specialized features
# Options: 'siglino' (768d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["siglino"]     # (Batch, Tokens, 768)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
```
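
For kNN- or retrieval-style tasks, the summary features are typically compared with cosine similarity. A self-contained sketch, using random vectors as stand-ins for real `summary_features` outputs so it runs without the model:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(0)
query = rng.standard_normal((1, 1152))    # stand-in for one SigLIP2 summary feature
gallery = rng.standard_normal((5, 1152))  # stand-in for 5 gallery images
sims = cosine_sim(query, gallery)         # (1, 5), values in [-1, 1]
best = int(sims.argmax(axis=-1)[0])       # index of the closest gallery image
```

The same pattern applies to the 768d `siglino` and 1024d `dinov3` heads; only the feature dimension changes.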

Model Details

| Attribute | Value |
|---|---|
| Architecture | MoE (top-6/28) |
| Parameters (active) | 0.3B |
| Parameters (total) | 0.6B |
| Layers | 18 |
| Hidden Dim | 768 |
| MoE Dim | 384 |
| Patch Size | 16×16 |
| Teachers | DINOv3, SigLIP2 |
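
The gap between active and total parameters follows from the routing: only 6 of 28 expert MLPs run per token, while the remaining parameters are always active. A back-of-envelope check, under the simplifying assumption that all non-expert parameters are shared and router overhead is negligible:

```python
total = 0.6e9   # total parameters
active = 0.3e9  # active parameters per token
k, n = 6, 28    # top-6 of 28 experts

# shared + experts = total ;  shared + (k / n) * experts = active
experts = (total - active) / (1 - k / n)  # total parameters inside experts
shared = total - experts                  # always-active (non-expert) parameters
print(f"expert params ~ {experts / 1e9:.2f}B, shared params ~ {shared / 1e9:.2f}B")
```

These are illustrative estimates derived only from the table above, not numbers reported by the authors.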

Results (512x512, ensemble features)

| Task | Metric | Score |
|---|---|---|
| kNN (ImageNet) | Acc | 85.9 |
| kNN (6-dataset avg) | Acc | 90.5 |
| Zero-shot cls (ImageNet) | Acc | 79.9 |
| Flickr30K I2T | R@1 | 94.6 |
| MSCOCO I2T | R@1 | 70.8 |
| Pascal VOC (1024) | mIoU | 88.9 |
| Cityscapes (1024) | mIoU | 65.4 |

Citation

If you use this work in your research, please cite:

```bibtex
@article{chaybouti2025amoe,
  title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}
```