Improve FG-CLIP 2 model card: Add Chinese language, project page, and enhanced sample usage

#1 opened by nielsr (HF Staff)

This PR aims to improve the FG-CLIP 2 model card by:

  • Adding zh to the language metadata, as the model is explicitly designed for bilingual (English and Chinese) fine-grained vision-language understanding, as stated in the paper abstract and GitHub README.
  • Including a direct link to the project page (https://360cvgroup.github.io/FG-CLIP) for better discoverability.
  • Incorporating an overview image (FGCLIP2_compare_all_n.png) from the original GitHub repository to visually summarize the model's performance.
  • Updating the "Retrieval" sample usage section with bilingual (Chinese) captions, max_length=196, and walk_type="long" to better showcase the model's fine-grained and long-caption capabilities, consistent with the GitHub README. A corresponding output image (cn_re_demo.png) is also added.
  • Updating the "Dense feature effect display" sample usage with bilingual captions from the GitHub README.
  • Ensuring all image links in the markdown content use absolute Hugging Face repository URLs for correct rendering on the Hub.
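For context, the retrieval sample usage described above follows the standard CLIP-style scoring recipe: encode images and captions, L2-normalize the embeddings, and rank candidates by cosine similarity. The snippet below is a minimal sketch of just that scoring step over dummy feature vectors; the `cosine_retrieval_scores` helper, the embedding dimension 512, and the random features are illustrative assumptions, not FG-CLIP 2's actual API.

```python
import numpy as np

def cosine_retrieval_scores(image_feats: np.ndarray, text_feats: np.ndarray) -> np.ndarray:
    """Score every (image, caption) pair by cosine similarity of L2-normalized features."""
    img = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    return img @ txt.T  # shape: (num_images, num_texts)

# Dummy embeddings standing in for model outputs (dimension 512 is assumed).
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(2, 512))
text_feats = rng.normal(size=(3, 512))

scores = cosine_retrieval_scores(image_feats, text_feats)
best_caption = scores.argmax(axis=1)  # best-matching caption index per image
```

In the actual model card sample, the features would come from the model's image and text encoders (with long Chinese captions tokenized up to `max_length=196`); the ranking step itself is unchanged.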

These changes enhance the model card's completeness and clarity, making it more informative and useful for users on the Hugging Face platform.

qingshan777 changed pull request status to merged
