Integrate with Sentence Transformers v5.4
Hello!
Pull Request overview
- Integrate this model using a Sentence Transformers CrossEncoder
Details
This PR adds the configuration files needed to load this model directly as a CrossEncoder via Sentence Transformers. The model uses an any-to-any Transformer with a LogitScore head that computes the logit difference between the "yes" and "no" tokens, i.e. the model's confidence that a document is relevant to a query. The model supports text, image, video, and multimodal (i.e. combinations of these) inputs via a structured message format.
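To make the "logit difference" idea concrete, here is a minimal sketch in plain Python. It is not the LogitScore implementation from Sentence Transformers; the token IDs and the toy logits are made up for illustration (the real IDs are stored in 1_LogitScore/config.json), and the helper names are hypothetical.

```python
import math

# Hypothetical token IDs, for illustration only.
# The real values are read from 1_LogitScore/config.json.
YES_TOKEN_ID = 3
NO_TOKEN_ID = 1


def logit_score(next_token_logits, yes_id=YES_TOKEN_ID, no_id=NO_TOKEN_ID):
    """Return logit("yes") - logit("no") for the next-token logits."""
    return next_token_logits[yes_id] - next_token_logits[no_id]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def yes_probability(next_token_logits, yes_id=YES_TOKEN_ID, no_id=NO_TOKEN_ID):
    """Softmax restricted to the two tokens: P(yes) / (P(yes) + P(no))."""
    yes = math.exp(next_token_logits[yes_id])
    no = math.exp(next_token_logits[no_id])
    return yes / (yes + no)


# Toy logits over a 5-token vocabulary (made-up numbers).
toy_logits = [0.1, -0.5, 2.0, 0.8125, -1.0]

score = logit_score(toy_logits)  # 0.8125 - (-0.5)
print(score)
print(sigmoid(score), yes_probability(toy_logits))
```

The last line prints the same value twice: applying a sigmoid to the logit difference gives the probability of "yes" when only "yes" and "no" are considered, which is why the raw difference can be read as a relevance confidence.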
A custom additional_chat_templates/reranker.jinja maps Sentence Transformers' structured messages (with "query" and "document" roles) to the model's expected format with the <Instruct>, <Query>, and <Document> fields, including the system prompt for yes/no judgment. The template includes a default instruction ("Given a search query, retrieve relevant candidates that answer the query.") as a fallback when no prompt is provided. unpad_inputs is set to false as Qwen3 can't flatten inputs nicely.
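As a rough illustration of what the template does, the sketch below renders a query/document pair into the three fields with the fallback instruction. This is a hypothetical Python stand-in, not the actual reranker.jinja: the function name, the plain-string message contents, the exact separators, and the omission of the system prompt are all assumptions.

```python
# Fallback instruction quoted from the template description above.
DEFAULT_INSTRUCTION = (
    "Given a search query, retrieve relevant candidates that answer the query."
)


def render_reranker_prompt(messages, prompt=None):
    """Hypothetical stand-in for the Jinja template (text-only, no system prompt).

    `messages` is assumed to be a list of {"role": ..., "content": ...} dicts
    with one "query" and one "document" role; the real structured message
    format may differ, especially for image and video inputs.
    """
    instruction = prompt or DEFAULT_INSTRUCTION
    by_role = {message["role"]: message["content"] for message in messages}
    # The field layout and separators here are assumed, not taken from the template.
    return (
        f"<Instruct>: {instruction}\n"
        f"<Query>: {by_role['query']}\n"
        f"<Document>: {by_role['document']}"
    )


example_messages = [
    {"role": "query", "content": "A woman playing with her dog on a beach at sunset."},
    {"role": "document", "content": "A woman and her golden retriever on a beach."},
]

# No prompt given, so the default instruction is used.
print(render_reranker_prompt(example_messages))
```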
Added files:
- modules.json: pipeline: Transformer & LogitScore
- sentence_bert_config.json: any-to-any task, structured message format, multimodal config
- config_sentence_transformers.json: default prompt ("Retrieve text relevant to the user's query."), Identity activation
- additional_chat_templates/reranker.jinja: custom template for the reranker format
- 1_LogitScore/config.json: yes/no token IDs
Once the Sentence Transformers v5.4 release is out, the model can be used immediately like so:
from sentence_transformers import CrossEncoder

model = CrossEncoder("Qwen/Qwen3-VL-Reranker-8B", revision="refs/pr/9")

query = "A woman playing with her dog on a beach at sunset."
documents = [
    "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    {
        "text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    },
]
prompt = "Retrieve images or text relevant to the user's query."

pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs, prompt=prompt)
print(scores)
# [1.3125, 0.25, 0.4375]

rankings = model.rank(query, documents, prompt=prompt)
print(rankings)
# [{'corpus_id': 0, 'score': 1.3125}, {'corpus_id': 2, 'score': 0.4375}, {'corpus_id': 1, 'score': 0.25}]
And after merging, the revision argument can be dropped.
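For readers comparing the two outputs above: the rankings are the same scores as the predict call, paired with each document's index and sorted from highest to lowest. The sketch below reproduces that relationship in plain Python from the printed scores; it only illustrates the ordering and is not how model.rank is implemented.

```python
# Scores copied from the predict output above.
scores = [1.3125, 0.25, 0.4375]

# Pair each score with its position in `documents`, then sort by score, descending.
rankings = sorted(
    ({"corpus_id": index, "score": score} for index, score in enumerate(scores)),
    key=lambda entry: entry["score"],
    reverse=True,
)
print(rankings)
# [{'corpus_id': 0, 'score': 1.3125}, {'corpus_id': 2, 'score': 0.4375}, {'corpus_id': 1, 'score': 0.25}]
```

Because the configuration uses an Identity activation, these values are raw logit differences rather than probabilities, so they are not bounded to the range 0 to 1.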
Note that none of the old behaviour is affected/changed. It only adds an additional way to run this model in a familiar and common format.
If you are able to merge this before tomorrow's Sentence Transformers v5.4 release, then I will be able to include this in my blogpost and documentation as a release model without revision. Otherwise I'll document it with revision and I can drop that later.
- Tom Aarsen