Integrate with Sentence Transformers v5.4

This PR adds the configuration files needed to load this model directly as a CrossEncoder via Sentence Transformers. The model uses an any-to-any Transformer module followed by a LogitScore head, which computes the logit difference between the "yes" and "no" tokens, i.e. the model's confidence that a document is relevant to a query. The model supports text, image, and video inputs, as well as multimodal inputs that combine them, via a structured message format.
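
To make the LogitScore idea concrete, here is a minimal pure-Python sketch of a "yes" minus "no" logit difference. The function names and the logit values are hypothetical and chosen only for illustration; this is not the Sentence Transformers implementation. It also shows that applying a sigmoid to the difference gives the same value as a softmax over just the two tokens.

```python
import math


def logit_score(logit_yes: float, logit_no: float) -> float:
    """Relevance score as the difference between the "yes" and "no" logits."""
    return logit_yes - logit_no


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax_yes(logit_yes: float, logit_no: float) -> float:
    """Probability of "yes" when normalising over only the two tokens."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


# Hypothetical logits, not taken from the model.
score = logit_score(2.5, 1.0)
print(score)  # 1.5

# The two quantities agree up to floating-point error.
print(abs(sigmoid(score) - softmax_yes(2.5, 1.0)) < 1e-12)  # True
```

A positive score means the model favours "yes" over "no", and a score of 0 corresponds to a 50% two-token probability.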

A custom chat template, additional_chat_templates/reranker.jinja, maps Sentence Transformers' structured messages (with "query" and "document" roles) to the model's expected format with the <Instruct>, <Query>, and <Document> fields, including the system prompt for yes/no judgment. The template includes a default instruction ("Given a search query, retrieve relevant candidates that answer the query.") as a fallback when no prompt is provided. unpad_inputs is set to false because Qwen3 does not handle flattened (unpadded) inputs well.
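
As a rough illustration of the fallback behaviour described above, the following Python sketch fills the three fields for a text-only pair. The function name and the exact layout (colons, newlines) are assumptions made for this example; the real template is written in Jinja, also emits the system prompt, and handles image and video content, none of which is reproduced here.

```python
from typing import Optional

# Fallback instruction quoted in the description above.
DEFAULT_INSTRUCTION = (
    "Given a search query, retrieve relevant candidates that answer the query."
)


def render_reranker_fields(
    query: str, document: str, instruction: Optional[str] = None
) -> str:
    """Hypothetical text-only rendering of the Instruct/Query/Document fields."""
    # Use the default instruction when no prompt is provided.
    instruction = instruction or DEFAULT_INSTRUCTION
    return (
        f"<Instruct>: {instruction}\n"
        f"<Query>: {query}\n"
        f"<Document>: {document}"
    )


print(render_reranker_fields("beach at sunset", "A dog on a beach."))
print(
    render_reranker_fields(
        "beach at sunset",
        "A dog on a beach.",
        instruction="Retrieve images or text relevant to the user's query.",
    )
)
```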

Added files:

modules.json: pipeline: Transformer & LogitScore
sentence_bert_config.json: any-to-any task, structured message format, multimodal config
config_sentence_transformers.json: default prompt ("Retrieve text relevant to the user's query."), Identity activation
additional_chat_templates/reranker.jinja: custom template for the reranker format
1_LogitScore/config.json: yes/no token IDs

Once the Sentence Transformers v5.4 release is out, the model can be used immediately like so:

from sentence_transformers import CrossEncoder

model = CrossEncoder("Qwen/Qwen3-VL-Reranker-8B", revision="refs/pr/9")

query = "A woman playing with her dog on a beach at sunset."
documents = [
    "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
    "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    {
        "text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
        "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
    },
]

prompt = "Retrieve images or text relevant to the user's query."
pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs, prompt=prompt)
print(scores)
# [1.3125, 0.25, 0.4375]

rankings = model.rank(query, documents, prompt=prompt)
print(rankings)
# [{'corpus_id': 0, 'score': 1.3125}, {'corpus_id': 2, 'score': 0.4375}, {'corpus_id': 1, 'score': 0.25}]
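
For intuition about how the two outputs above relate, here is a small sketch that rebuilds the ranking from the predicted scores by sorting in descending order. The helper name is hypothetical; it only illustrates the relationship between the two printed results and is not how the library itself is implemented.

```python
from typing import Dict, List, Union


def rank_from_scores(scores: List[float]) -> List[Dict[str, Union[int, float]]]:
    """Pair each score with its document index and sort by score, highest first."""
    results = [{"corpus_id": i, "score": s} for i, s in enumerate(scores)]
    return sorted(results, key=lambda r: r["score"], reverse=True)


# Scores copied from the predict() output shown above.
scores = [1.3125, 0.25, 0.4375]
print(rank_from_scores(scores))
# [{'corpus_id': 0, 'score': 1.3125}, {'corpus_id': 2, 'score': 0.4375}, {'corpus_id': 1, 'score': 0.25}]
```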

And after merging, the revision argument can be dropped.

Note that none of the existing behaviour is changed. This PR only adds an additional way to run the model in a familiar and common format.

If you are able to merge this before tomorrow's Sentence Transformers v5.4 release, then I will be able to include this in my blogpost and documentation as a release model without revision. Otherwise I'll document it with revision and I can drop that later.

Tom Aarsen

tomaarsen changed pull request status to open Apr 8

thenlper changed pull request status to merged 26 days ago
