Add model card for Visual Jigsaw Video 7B

#1
by nielsr HF Staff - opened

This PR adds a comprehensive model card for the Visual Jigsaw Video 7B model, based on the paper "Visual Jigsaw Post-Training Improves MLLMs".

The updates include:

  • Relevant metadata: license (apache-2.0), pipeline_tag (video-text-to-text), library_name (transformers), base_model (Qwen2.5-VL-7B-Instruct), and additional tags.
  • A link to the paper on Hugging Face Papers.
  • Links to the project page and the GitHub repository.
  • A concise overview of the model and its capabilities, including the overview image.
  • A sample usage section showing how to run the model with the transformers library, following the original repository's instructions for its Qwen2.5-VL-7B-Instruct base.
  • The paper's BibTeX citation.
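
For reference, the metadata listed above would typically sit in a YAML front-matter block at the top of the model card's README.md. The following is a minimal sketch of how that block could be rendered, not the exact contents of this PR: `render_front_matter` is a hypothetical helper, the `Qwen/` organization prefix on `base_model` is an assumption (the PR names only `Qwen2.5-VL-7B-Instruct`), and the additional tags are left out because the PR does not enumerate them.

```python
# Sketch: render the model-card metadata named in this PR as YAML front matter.
# Uses only the standard library and handles flat string values only.

METADATA = {
    "license": "apache-2.0",
    "pipeline_tag": "video-text-to-text",
    "library_name": "transformers",
    # Assumption: the "Qwen/" organization prefix is not stated in the PR text.
    "base_model": "Qwen/Qwen2.5-VL-7B-Instruct",
}


def render_front_matter(metadata):
    """Hypothetical helper: return a YAML front-matter block for flat string fields."""
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the block that would be placed at the top of README.md.
    print(render_front_matter(METADATA), end="")
```

The Hub reads these fields to populate the license badge, the pipeline filter, and the base-model link, which is how the metadata contributes to discoverability.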

This should significantly improve the discoverability and usability of the model on the Hub.

