Add model card for Visual Jigsaw Video 7B

by nielsr HF Staff - opened Oct 1, 2025

←

nielsr

Oct 1, 2025

This PR adds a comprehensive model card for the Visual Jigsaw Video 7B model, based on the paper Visual Jigsaw Post-Training Improves MLLMs.

The updates include:

Relevant metadata: license (apache-2.0), pipeline_tag (video-text-to-text), library_name (transformers), base_model (Qwen2.5-VL-7B-Instruct), and additional tags.
A link to the paper on Hugging Face Papers.
Links to the project page and the GitHub repository.
A concise overview of the model and its capabilities, including the overview image.
A sample usage section demonstrating how to use the model with the transformers library, building upon the original repository's instructions regarding its Qwen2.5-VL-7B-Instruct base.
The paper's BibTeX citation.

This should significantly improve the discoverability and usability of the model on the Hub.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Ready to merge

This branch is ready to get merged automatically.

· Sign up or log in to comment