SigLino: Vision Foundation Models (SigLIP2 + DINOv3)
Vision encoders distilled from DINOv3 and SigLIP2 (MoE & Dense).
Accepted at CVPR 2026
This work stems from the CVPR 2026 AMoE paper, which distills multiple vision foundation models into a Mixture-of-Experts (MoE) architecture. We chose the name SigLino (SigLIP2 + DINOv3) for clarity.
This checkpoint is the sparse MoE variant of SigLino, routing each token to the top 6 of 28 experts: 0.3B active parameters, 0.6B total. Part of the SigLino model family.
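The top-6-of-28 routing can be sketched as a standard sparse MoE layer: a linear router scores all experts per token, the top-k are selected, and their outputs are combined with renormalised router weights. This is a generic illustration using the dimensions reported in the table below, not the actual SigLino code; the expert MLP shape and gating details are assumptions.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Minimal top-k sparse MoE sketch (hypothetical, not the SigLino implementation)."""
    def __init__(self, dim=768, moe_dim=384, num_experts=28, top_k=6):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        # Each expert is assumed to be a small two-layer MLP with hidden size moe_dim.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, moe_dim), nn.GELU(), nn.Linear(moe_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                # x: (batch, tokens, dim)
        logits = self.router(x)                          # (B, T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = weights.softmax(dim=-1)                # renormalise over selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                  # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoE()
tokens = torch.randn(2, 196, 768)
out = layer(tokens)
print(out.shape)  # torch.Size([2, 196, 768])
```

Only the 6 selected experts run per token, which is how 0.6B total parameters yield roughly 0.3B active ones.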
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

# Load model and processor
model_id = "tiiuae/siglino-moe-0.3-0.6B"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

# Preprocess image
image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

# Access specialized features
# Options: 'siglino' (768d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["siglino"]      # (Batch, Tokens, 768)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
```
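The summary features are typical inputs for image-to-image retrieval: L2-normalise them and rank gallery images by cosine similarity. The sketch below uses random tensors as stand-ins for features extracted as above (the 1152-d SigLIP2 head); everything else is plain PyTorch.

```python
import torch
import torch.nn.functional as F

# Random placeholders for summary features extracted with the model above.
gallery = F.normalize(torch.randn(100, 1152), dim=-1)  # 100 indexed images
query = F.normalize(torch.randn(1, 1152), dim=-1)      # one query image

scores = query @ gallery.T          # cosine similarities, shape (1, 100)
top5 = scores.topk(5, dim=-1)       # indices of the 5 most similar gallery images
print(top5.indices.shape)           # torch.Size([1, 5])
```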
| Attribute | Value |
|---|---|
| Architecture | MoE (top-6/28) |
| Parameters (active) | 0.3B |
| Parameters (total) | 0.6B |
| Layers | 18 |
| Hidden Dim | 768 |
| MoE Dim | 384 |
| Patch Size | 16x16 |
| Teachers | DINOv3, SigLIP2 |

| Task | Metric | Score (%) |
|---|---|---|
| kNN (ImageNet) | Acc | 85.9 |
| kNN (6-dataset avg) | Acc | 90.5 |
| Zero-shot cls (ImageNet) | Acc | 79.9 |
| Flickr30K I2T | R@1 | 94.6 |
| MSCOCO I2T | R@1 | 70.8 |
| Pascal VOC (1024) | mIoU | 88.9 |
| Cityscapes (1024) | mIoU | 65.4 |
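As a rough illustration of how the kNN numbers above are obtained, here is a minimal cosine-similarity k-NN probe over frozen features with majority voting. It is a generic sketch, not the paper's exact protocol: the value of k and the voting scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=20):
    """Classify each test feature by majority vote among its k nearest
    training features under cosine similarity; return accuracy."""
    train_feats = F.normalize(train_feats, dim=-1)
    test_feats = F.normalize(test_feats, dim=-1)
    sims = test_feats @ train_feats.T              # (N_test, N_train)
    nn_idx = sims.topk(k, dim=-1).indices          # k nearest neighbours
    nn_labels = train_labels[nn_idx]               # (N_test, k)
    preds = nn_labels.mode(dim=-1).values          # majority vote
    return (preds == test_labels).float().mean().item()

# Synthetic sanity check: two well-separated clusters.
train = torch.cat([torch.randn(50, 8) + 5, torch.randn(50, 8) - 5])
labels = torch.cat([torch.zeros(50, dtype=torch.long), torch.ones(50, dtype=torch.long)])
acc = knn_accuracy(train, labels, train, labels)
print(acc)  # close to 1.0 on separable data
```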
If you use this work in your research, please cite:
```bibtex
@article{chaybouti2025amoe,
  title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}
```