Tarka-Embedding-150M-V1 (ONNX)

ONNX version of Tarka-AIR/Tarka-Embedding-150M-V1.

  • Embedding dimension: 768
  • Context length: 2048
  • Model size: ~600MB

Usage

With ONNX Runtime (Python)

import numpy as np
import onnxruntime as ort
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer

# Load the ONNX model from a local path and the matching tokenizer from the Hub
session = ort.InferenceSession("tarka-150m-v1-onnx/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("permutans/Tarka-Embedding-150M-V1-ONNX")

texts = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

embeddings = []
for text in texts:
    inputs = tokenizer(text, return_tensors="np")
    # The model returns (token_embeddings, sentence_embedding); keep the pooled vector
    _, sentence_embedding = session.run(
        None,
        {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
    )
    embeddings.append(sentence_embedding[0])

embeddings = np.array(embeddings)
print(embeddings.shape)  # (3, 768)

# Pairwise cosine similarities between the sentence embeddings
similarities = cosine_similarity(embeddings)
print(similarities)
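
For more than a few sentences, it is usually faster to embed the whole batch in one call by letting the tokenizer pad to a common length. A minimal sketch, assuming the exported graph accepts padded batches (the attention_mask is passed so padding can be ignored during pooling):

inputs = tokenizer(texts, padding=True, return_tensors="np")
_, batch_embeddings = session.run(
    None,
    {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
)
print(batch_embeddings.shape)  # (3, 768)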

With FastEmbed (Rust)

Compatible with fastembed-rs for high-performance embedding generation.

Model Outputs

  • token_embeddings: Token-level embeddings (batch_size, sequence_length, 768)
  • sentence_embedding: Pooled sentence embeddings (batch_size, 768) - use this for most tasks
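
You can check the output names and shapes directly from an ONNX Runtime session; the commented values below are what the list above implies, with symbolic batch/sequence dimensions:

for output in session.get_outputs():
    print(output.name, output.shape)
# Expected, per the list above (dynamic axis names may differ):
#   token_embeddings   [batch, sequence, 768]
#   sentence_embedding [batch, 768]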

Performance

This ONNX export works with both the CPU and CUDA execution providers, so it can be deployed on CPU-only or GPU hosts.
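
For example, to prefer the CUDA execution provider and fall back to CPU when no GPU is available (standard ONNX Runtime usage, not specific to this model):

import onnxruntime as ort

session = ort.InferenceSession(
    "tarka-150m-v1-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # lists the providers actually in use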
