# Tarka-Embedding-150M-V1 (ONNX)
ONNX version of Tarka-AIR/Tarka-Embedding-150M-V1.
- Embedding dimension: 768
- Context length: 2048 tokens (see the truncation sketch below)
- Model size: ~600MB
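
Inputs longer than the context window should be truncated at tokenization time. A minimal sketch with the bundled tokenizer; the `max_length` value mirrors the 2048-token limit listed above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("permutans/Tarka-Embedding-150M-V1-ONNX")

# Clip anything beyond the model's 2048-token context window
inputs = tokenizer(
    "A very long document " * 1000,
    return_tensors="np",
    truncation=True,
    max_length=2048,
)
print(inputs["input_ids"].shape)  # (1, 2048)
```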
## Usage

### With ONNX Runtime (Python)
```python
import numpy as np
import onnxruntime as ort
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer

session = ort.InferenceSession("tarka-150m-v1-onnx/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("permutans/Tarka-Embedding-150M-V1-ONNX")

texts = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

embeddings = []
for text in texts:
    inputs = tokenizer(text, return_tensors="np")
    # The graph has two outputs: token_embeddings and sentence_embedding
    _, sentence_embedding = session.run(
        None,
        {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
    )
    embeddings.append(sentence_embedding[0])

embeddings = np.array(embeddings)
print(embeddings.shape)  # (3, 768)

# Compute pairwise cosine similarities between the sentence embeddings
similarities = cosine_similarity(embeddings)
print(similarities)
```
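
The loop above embeds one sentence per `run` call. If the export has dynamic batch and sequence axes (an assumption worth verifying with `session.get_inputs()`), you can pad a list of texts and embed them in a single call:

```python
import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("tarka-150m-v1-onnx/model.onnx")
tokenizer = AutoTokenizer.from_pretrained("permutans/Tarka-Embedding-150M-V1-ONNX")

texts = ["The weather is lovely today.", "It's so sunny outside!"]

# Pad to the longest text so the batch forms a rectangular array
inputs = tokenizer(texts, return_tensors="np", padding=True)

_, sentence_embeddings = session.run(
    None,
    {"input_ids": inputs["input_ids"], "attention_mask": inputs["attention_mask"]},
)
print(sentence_embeddings.shape)  # (2, 768)
```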
### With FastEmbed (Rust)
This export is compatible with fastembed-rs for high-performance embedding generation in Rust.
## Model Outputs

- `token_embeddings`: token-level embeddings, shape `(batch_size, sequence_length, 768)`
- `sentence_embedding`: pooled sentence embeddings, shape `(batch_size, 768)`; use this for most tasks
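
To confirm the output order and shapes on your copy of the model, you can ask ONNX Runtime directly:

```python
import onnxruntime as ort

session = ort.InferenceSession("tarka-150m-v1-onnx/model.onnx")
for output in session.get_outputs():
    # Prints each output's name and (possibly symbolic) shape
    print(output.name, output.shape)
```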
## Performance
This ONNX export runs with both the CPU and CUDA execution providers, so the same model file serves CPU-only and GPU deployments.
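
For example, to prefer the GPU when onnxruntime-gpu is installed and fall back to the CPU otherwise (both provider names are standard ONNX Runtime identifiers):

```python
import onnxruntime as ort

# Providers are tried in the order given: CUDA first, then CPU fallback
session = ort.InferenceSession(
    "tarka-150m-v1-onnx/model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # Shows which providers were actually loaded
```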