pplx-embed-v1-late-0.6b: Late-Interaction Embeddings
pplx-embed-v1-late-0.6b is a token-level late-interaction embedding model for retrieval with MaxSim scoring. It was obtained by continuing the training of pplx-embed-v1-0.6b with ContrastiveLoss to optimize token-level MaxSim.
The token-level embedding dimension is 128, which hits the fast path of the optional erikkaum/maxsim MaxSim kernel.
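For readers new to late interaction, MaxSim is usually written as below. This is the standard ColBERT-style definition; the symbols $L_q$, $L_d$, $\mathbf{q}_i$, and $\mathbf{d}_j$ are our own notation rather than something taken from this card, and whether the token vectors are L2-normalized before scoring is an assumption to check against the model configuration.

```latex
S(q, d) \;=\; \sum_{i=1}^{L_q} \; \max_{1 \le j \le L_d} \; \langle \mathbf{q}_i, \mathbf{d}_j \rangle,
\qquad \mathbf{q}_i,\ \mathbf{d}_j \in \mathbb{R}^{128}
```

Each query token keeps only its best-matching document token, and the per-token maxima are summed into one relevance score.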
Usage
Using PyLate (indexing + retrieval)
from pylate import indexes, models, retrieve

# Load the late-interaction model through PyLate's ColBERT wrapper.
model = models.ColBERT(
    model_name_or_path="perplexity-ai/pplx-embed-v1-late-0.6b",
    trust_remote_code=True,
)

documents_ids = ["1", "2", "3"]
documents = [
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
]

# Create (or overwrite) a PLAID index on disk.
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="pplx-embed-v1-late-0.6b",
    override=True,
)

# Encode documents as token-level embeddings and add them to the index.
documents_embeddings = model.encode(documents, is_query=False)
index.add_documents(documents_ids=documents_ids, documents_embeddings=documents_embeddings)

# Encode the query and retrieve the top-3 documents.
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(["What motivates scientific discovery?"], is_query=True)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=3)
print(scores)
Using the erikkaum/maxsim kernel (fast MaxSim scoring)
A fused MaxSim kernel for reranking, pair scoring, or evaluation. It supports CUDA (sm_80/86/89) and Metal (Apple Silicon), accepts fp32, fp16, or bf16 inputs, returns fp32 scores, and is forward-only (no backward pass).
import torch
from kernels import get_kernel
from pylate import models

device = "cuda" if torch.cuda.is_available() else "mps"

model = models.ColBERT(
    model_name_or_path="perplexity-ai/pplx-embed-v1-late-0.6b",
    trust_remote_code=True,
    device=device,
)
maxsim = get_kernel("erikkaum/maxsim", version=1, trust_remote_code=True)

q_emb = model.encode(["What motivates scientific discovery?"], is_query=True, convert_to_tensor=True)
d_emb = model.encode([
    "Scientists explore the universe driven by curiosity.",
    "Children learn through curious exploration.",
    "Historical discoveries began with curious questions.",
], is_query=False, convert_to_tensor=True)

# Pad to [B=1, n_candidates, Ld_max, dim] for score_candidates_padded.
Lq, dim = q_emb[0].shape
n, Ld_max = len(d_emb), max(d.shape[0] for d in d_emb)
queries_pad = q_emb[0].unsqueeze(0).to(device, torch.float16)
documents_pad = torch.zeros(1, n, Ld_max, dim, device=device, dtype=torch.float16)
for i, d in enumerate(d_emb):
    # Copy each document's real tokens; the remaining rows stay zero-padded.
    documents_pad[0, i, : d.shape[0]] = d.to(device, torch.float16)

# True (unpadded) lengths so that padding rows can be ignored during scoring.
query_lengths = torch.tensor([Lq], dtype=torch.int32, device=device)
doc_lengths = torch.tensor([[d.shape[0] for d in d_emb]], dtype=torch.int32, device=device)

scores = maxsim.score_candidates_padded(queries_pad, documents_pad, query_lengths, doc_lengths)
print(scores[0].tolist())  # fp32 scores per candidate
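To sanity-check kernel output on small inputs, a plain NumPy reference over the same padded layout can help. The sketch below is not the kernel's implementation: `maxsim_padded_reference` is a hypothetical helper, and it assumes that padding positions are excluded through the length arguments, which is how we read the call above.

```python
import numpy as np

def maxsim_padded_reference(queries, documents, query_lengths, doc_lengths):
    """Unfused MaxSim over padded inputs (hypothetical reference helper).

    queries:       [B, Lq_max, dim]
    documents:     [B, n, Ld_max, dim]
    query_lengths: [B]      true token count per query
    doc_lengths:   [B, n]   true token count per candidate
    returns:       [B, n]   one score per (query, candidate)
    """
    q = np.asarray(queries, dtype=np.float32)
    d = np.asarray(documents, dtype=np.float32)
    q_len = np.asarray(query_lengths)
    d_len = np.asarray(doc_lengths)

    # sim[b, c, i, j] = dot(q[b, i], d[b, c, j])
    sim = np.einsum("bqe,bnde->bnqd", q, d)

    # Padded document tokens must never win the max (assumed masking rule).
    doc_mask = np.arange(d.shape[2])[None, None, :] < d_len[:, :, None]
    sim = np.where(doc_mask[:, :, None, :], sim, -np.inf)
    best = sim.max(axis=-1)  # [B, n, Lq_max]

    # Padded query tokens contribute nothing to the sum.
    q_mask = np.arange(q.shape[1])[None, :] < q_len[:, None]
    best = np.where(q_mask[:, None, :], best, 0.0)
    return best.sum(axis=-1)
```

Converting `queries_pad`, `documents_pad`, and the two length tensors to NumPy (for example with `.float().cpu().numpy()`) and comparing against the kernel scores within an fp16 tolerance is one way to use it.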
For ragged variable-length pair scoring (evaluation, distillation, hard-negative mining), use maxsim.score_pairs_packed(...) instead; see the kernel card for the packed API.
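As a rough illustration of what a packed (ragged) layout typically looks like, the sketch below concatenates all token vectors and tracks cumulative offsets, then scores each query/document pair in NumPy. This is an assumption about the general idea only: it does not call `score_pairs_packed`, whose real signature is documented on the kernel card, and both `pack` and `maxsim_pairs_packed_reference` are hypothetical helpers.

```python
import numpy as np

def pack(sequences):
    """Concatenate variable-length [L_i, dim] arrays and return (tokens, offsets).

    offsets has len(sequences) + 1 entries; sequence p occupies
    tokens[offsets[p]:offsets[p + 1]].
    """
    arrays = [np.asarray(s, dtype=np.float32) for s in sequences]
    lengths = [len(a) for a in arrays]
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int64)
    return np.concatenate(arrays, axis=0), offsets

def maxsim_pairs_packed_reference(q_tokens, q_offsets, d_tokens, d_offsets):
    """Score pair p as MaxSim(query p, document p) without any padding."""
    scores = []
    for p in range(len(q_offsets) - 1):
        q = q_tokens[q_offsets[p]:q_offsets[p + 1]]
        d = d_tokens[d_offsets[p]:d_offsets[p + 1]]
        # Best document token per query token, summed over query tokens.
        scores.append((q @ d.T).max(axis=1).sum())
    return np.asarray(scores, dtype=np.float32)
```

Compared with the padded layout, nothing is spent on padding rows, which matters when document lengths vary widely.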
Performance
We evaluate pplx-embed-v1-late-0.6b on two standard late-interaction retrieval suites and report the average nDCG@10:
- BEIR — average over 15 English retrieval tasks.
- MIRACL — average over 18 languages.
| Benchmark | pplx-embed-v1-late-0.6b | Reference |
|---|---|---|
| BEIR (15 tasks) | 56.61 | colbert-zero: 55.43 |
| MIRACL (18 langs) | 66.62 | jina-colbert-v2: 62.28 |
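For readers unfamiliar with the metric, nDCG@10 for a single query can be sketched as below. This uses one common formulation (linear gain with a log2 rank discount); the exact evaluation code behind the numbers above is not specified in this card, so treat `ndcg_at_k` as an illustrative helper rather than the evaluation script.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query.

    ranked_ids: document ids in the order the system returned them
    relevance:  dict mapping doc id -> graded relevance (missing ids count as 0)
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    # rank is 0-based, so the discount for position 1 is log2(2) = 1.
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

    # Ideal ordering: all judged documents sorted by relevance.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A benchmark score is then the mean of this value over all queries in a task, averaged again over tasks or languages.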
Technical Details
This model uses late interaction: queries and documents are encoded as token-level vectors and scored with MaxSim rather than pooled into a single vector.
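The difference between MaxSim and single-vector pooling can be seen on a toy example. The sketch below uses hand-made 2-dimensional unit vectors rather than real model output, and mean pooling stands in for single-vector scoring purely for illustration; `maxsim_score` and `pooled_score` are hypothetical helper names.

```python
import numpy as np

def maxsim_score(q_tokens, d_tokens):
    """Late interaction: best document token per query token, then sum."""
    sim = q_tokens @ d_tokens.T  # [Lq, Ld] token-to-token similarities
    return float(sim.max(axis=1).sum())

def pooled_score(q_tokens, d_tokens):
    """Single-vector baseline: cosine similarity of mean-pooled tokens."""
    q = q_tokens.mean(axis=0)
    d = d_tokens.mean(axis=0)
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))

s = np.sqrt(0.5)
query = np.array([[1.0, 0.0], [0.0, 1.0]])  # two distinct query tokens
doc_a = np.array([[1.0, 0.0], [0.0, 1.0]])  # matches each query token exactly
doc_b = np.array([[s, s], [s, s]])          # only matches the "average" direction

print(pooled_score(query, doc_a), pooled_score(query, doc_b))
print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Both documents pool to the same direction, so the pooled scores tie, while MaxSim ranks `doc_a` higher because each query token finds an exact match there.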
For background on the base embedding family, see the pplx-embed-v1-0.6b model card and the technical report: https://arxiv.org/abs/2602.11151.