SetFit Multilingual OVR Router (ONNX with Attentions)

This is a SetFit model exported to ONNX format and trained to classify LLM tasks into three semantic categories: Needle (Fact Retrieval), Reasoning (Logic/Analysis), and Summary (General Recap).

The model is based on paraphrase-multilingual-MiniLM-L12-v2 and has been modified to expose all 12 layers of raw attention weights.

Key Features

  • 3-Class Classification: High-precision separation of intents.
  • Multilingual: Native support for Russian, English, and 50+ other languages.
  • Attention Output: Every inference returns the attention weights of all 12 layers, one tensor of shape (batch, heads, seq_len, seq_len) per layer.
  • Dual Precision: Both FP32 (model.onnx) and INT8 Quantized (model_quantized.onnx) versions are available.
  • Optimized for CPU: Fast ONNX inference via onnxruntime.
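
As a rough illustration of how the per-layer attention tensors might be consumed, the sketch below averages them over layers and heads to get one token-to-token matrix per input. It assumes the attentions are available as a list of 12 arrays shaped (batch, heads, seq_len, seq_len). The helper name `mean_attention`, the tensor sizes, and the random stand-in data are hypothetical; the actual output names and ordering of the exported graph should be checked with `session.get_outputs()`.

```python
import numpy as np

def mean_attention(attentions):
    """Average attention over layers and heads.

    attentions: sequence of arrays, each (batch, heads, seq_len, seq_len).
    Returns an array of shape (batch, seq_len, seq_len).
    """
    stacked = np.stack(attentions, axis=0)  # (layers, batch, heads, seq, seq)
    return stacked.mean(axis=(0, 2))        # average over layers and heads

# Stand-in data (hypothetical sizes): 12 layers, batch 1, 12 heads, 6 tokens.
rng = np.random.default_rng(0)
raw = rng.random((12, 1, 12, 6, 6))
# Normalize each row so it sums to 1, like real attention weights.
fake_attentions = list(raw / raw.sum(axis=-1, keepdims=True))

avg = mean_attention(fake_attentions)
print(avg.shape)
```

Averaging is only one possible reduction; per-layer or per-head inspection may be more informative depending on the analysis.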

Classification Map

  • Label 0: Summary (Chatter, Recaps, TL;DR)
  • Label 1: Needle (Pinpoint facts, parameters, keys, IPs)
  • Label 2: Reasoning (Comparison, analysis, code debugging, logical chains)
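
The label ids above can be turned into readable names with a small lookup. This is a minimal sketch: the `ID2LABEL` dictionary mirrors the classification map, while the `decode` helper and the example probabilities are hypothetical and not part of the released files.

```python
import numpy as np

# Label ids as listed in the classification map.
ID2LABEL = {0: "Summary", 1: "Needle", 2: "Reasoning"}

def decode(probs):
    """Return (label, confidence) for each row of an (n, 3) probability array."""
    probs = np.atleast_2d(np.asarray(probs, dtype=float))
    ids = probs.argmax(axis=-1)
    return [(ID2LABEL[int(i)], float(p[i])) for i, p in zip(ids, probs)]

# Made-up probabilities for two inputs.
example = decode([[0.1, 0.8, 0.1], [0.2, 0.1, 0.7]])
print(example)
```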

Project Origin

This model is a core component of the WAMP-proxy project, an intelligent middleware for research into LLM context optimization.

Quick Inference (Python)

import json

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# 1. Load model and weights
session = ort.InferenceSession("model.onnx")
tokenizer = AutoTokenizer.from_pretrained(".")
with open("router_weights_setfit.json", "r") as f:
    weights = json.load(f)

# 2. Prepare Input
text = "What is the database port?"
inputs = tokenizer(text, return_tensors="np")
onnx_inputs = {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64)
}

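# 3. Run
outputs = session.run(None, onnx_inputs)
# outputs[0] holds the token embeddings (batch, seq_len, hidden).
# A plain mean is enough here because a single text has no padding tokens.
embeddings = np.mean(outputs[0], axis=1)  # Mean pooling

# 4. Predict probabilities (LogReg Head)
scores = np.dot(embeddings, np.array(weights["coef"]).T) + np.array(weights["intercept"])
exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable softmax
probs = exp_scores / exp_scores.sum(axis=-1, keepdims=True)
print(f"Probabilities: {probs}")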
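
When several texts are tokenized together (for example with `tokenizer(texts, padding=True, return_tensors="np")`), shorter texts are padded, and a plain mean over the sequence axis would mix padding positions into the embedding. The sketch below shows mask-weighted mean pooling as one way to handle that case. The helper name `masked_mean_pool` and the toy arrays are hypothetical; it assumes token embeddings shaped (batch, seq_len, hidden) and the usual 0/1 attention mask.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Mean-pool token embeddings, ignoring positions where the mask is 0."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Toy batch: the first row has one padded position whose values must be ignored.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
                   [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])
mask = np.array([[1, 1, 0],
                 [1, 1, 1]])

pooled = masked_mean_pool(tokens, mask)
print(pooled)
```

For a single unpadded input this gives the same result as `np.mean(outputs[0], axis=1)`.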

License

MIT License.
