Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

Jianhao Zeng¹,*, Yancheng Bai¹,*, Ruidong Chen¹,², Xuanpu Zhang², Lei Sun¹

Dongyang Jin¹, Ryan Xu¹, Nannan Zhang³,#, Dan Song², Xiangxiang Chu¹

¹Amap, Alibaba Group ²Tianjin University

³Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Abstract

Video virtual try-on technology provides a cost-effective solution for creating marketing videos in fashion e-commerce. However, its practical adoption is hindered by two critical limitations. First, current virtual try-on datasets rely on a single garment image as input, which limits the accurate capture of realistic texture details. Second, most existing methods focus solely on generating full-shot virtual try-on videos, neglecting the commercial demand for videos that also provide detailed close-ups. To address these challenges, we introduce a high-resolution dataset for video-based virtual try-on. This dataset offers two key features. First, it provides more detailed garment information, including high-fidelity images with detailed close-ups and textual descriptions. Second, it uniquely includes both full-shot and close-up try-on videos of real human models. Furthermore, accurately assessing consistency becomes significantly more critical for close-up videos, which demand high-fidelity preservation of garment details. To facilitate such fine-grained evaluation, we propose a new garment consistency metric, VGID (Video Garment Inception Distance), that quantifies the preservation of both texture and structure. Our experiments validate these contributions. We demonstrate that, by utilizing the detailed images from our dataset, existing video generation models can extract and incorporate texture features, significantly enhancing the realism and detail fidelity of virtual try-on results. We also conduct a comprehensive benchmark of recent models, which effectively identifies the texture and structure preservation problems of current methods.

Dataset Access

Sets the HF_ENDPOINT environment variable to point to a mirror site for faster and more stable Hugging Face connections. This step is only useful when the default endpoint is slow or unreachable.

export HF_ENDPOINT=https://hf-mirror.com

Downloads the repository snapshot from Hugging Face and saves it to the local ./data directory.

from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="JianhaoZeng/Eevee",
    local_dir="./data",
    repo_type="model"
)

Merges the split multi-part files into a single zip archive and extracts the contents.

cd ./data
cat Eevee.zip.part* > Eevee.zip
unzip Eevee.zip -d ./Eevee
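
On systems without `cat` and `unzip`, the same merge-and-extract step can be approximated in Python. The following is a minimal sketch, not part of the official instructions: the function name `merge_and_extract` is hypothetical, and it assumes (as the shell glob above does) that the part suffixes sort into the correct order lexicographically.

```python
import shutil
import zipfile
from pathlib import Path


def merge_and_extract(data_dir, archive_name="Eevee.zip", out_dir="Eevee"):
    """Concatenate <archive_name>.part* in sorted order, then unzip the result.

    Assumption: sorting the part names lexicographically yields the right order.
    """
    data_dir = Path(data_dir)
    parts = sorted(data_dir.glob(archive_name + ".part*"))
    if not parts:
        raise FileNotFoundError(f"no {archive_name}.part* files in {data_dir}")

    archive = data_dir / archive_name
    # Stream each part into the merged archive without loading it into memory.
    with open(archive, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, merged)

    target = data_dir / out_dir
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

Calling `merge_and_extract("./data")` mirrors the three shell commands above and returns the path of the extracted directory.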

Data Organization

After extraction, ./Eevee should contain the following files and directories.

|-- dresses
|   |-- 00030
|   |   |-- garment_caption.txt
|   |   |-- garment_detail.png
|   |   |-- garment_line.png
|   |   |-- garment_mask.png
|   |   |-- garment.png
|   |   |-- person_agnostic.png
|   |   |-- person_mask.png
|   |   |-- person.png
|   |   |-- video_0_agnostic_sam.mp4
|   |   |-- video_0_agnostic.mp4
|   |   |-- video_0_densepose.mp4
|   |   |-- video_0_mask.mp4
|   |   |-- video_0.mp4
|   |   |-- video_1_agnostic_sam.mp4
|   |   |-- video_1_agnostic.mp4
|   |   |-- video_1_densepose.mp4
|   |   |-- video_1_mask.mp4
|   |   |-- video_1.mp4
|   |-- 00032
|   ...
|-- lower_body
|   |-- 00003
|   |-- 00006
|   ...
|-- upper_body
|   |-- 00000
|   |-- 00001
|   ...
|-- dresses_test.csv
|-- dresses_train.csv
|-- lower_test.csv
|-- lower_train.csv
|-- upper_test.csv
|-- upper_train.csv
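
A quick integrity check after extraction can catch incomplete samples. The sketch below is illustrative only: `find_incomplete_samples` is a hypothetical helper, and it assumes that every sample directory carries the same 18 files listed for sample 00030 above, which the listing suggests but does not state. It walks whatever category directories exist under the root rather than hard-coding their names.

```python
from pathlib import Path

# File names taken from the listing for sample 00030.
IMAGE_FILES = [
    "garment.png", "garment_detail.png", "garment_line.png", "garment_mask.png",
    "person.png", "person_mask.png", "person_agnostic.png",
]
TEXT_FILES = ["garment_caption.txt"]
VIDEO_SUFFIXES = ["", "_mask", "_agnostic", "_agnostic_sam", "_densepose"]
# video_0_* are the full-shot videos, video_1_* the close-up videos.
VIDEO_FILES = [f"video_{i}{s}.mp4" for i in (0, 1) for s in VIDEO_SUFFIXES]
EXPECTED_FILES = IMAGE_FILES + TEXT_FILES + VIDEO_FILES


def find_incomplete_samples(root):
    """Map 'category/sample_id' to the expected files missing from that sample."""
    missing = {}
    root = Path(root)
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for sample_dir in sorted(p for p in category_dir.iterdir() if p.is_dir()):
            absent = [n for n in EXPECTED_FILES if not (sample_dir / n).is_file()]
            if absent:
                missing[f"{category_dir.name}/{sample_dir.name}"] = absent
    return missing
```

An empty dictionary from `find_incomplete_samples("./data/Eevee")` would indicate that every sample directory contains the full file set under this assumption.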

File Name	Source	Description
--- Garment Data ---
garment.png	Raw data	In-shop garment image
garment_detail.png	Raw data	Detailed close-up garment image
garment_caption.txt	Qwen-VL-MAX	Detailed text description of garment image generated by Qwen-VL-MAX
garment_line.png	AniLines	Lineart of garment image generated by AniLines
garment_mask.png	Grounded SAM-2	Binary mask of garment image generated by Grounded SAM-2
--- Person Data ---
person.png	Raw data	Image of a person wearing the corresponding garment
person_mask.png	Grounded SAM-2	Binary mask of the garment area on the person image generated by Grounded SAM-2
person_agnostic.png	Multiplication	Person image with garment area masked out generated by pixel-wise multiplication
--- Full-shot person video Data ---
video_0.mp4	Raw data	Full-shot person video
video_0_mask.mp4	OpenPose	Binary mask of the garment area on the full-shot person video generated by OpenPose
video_0_agnostic.mp4	Multiplication	Full-shot person video with garment area masked out generated by pixel-wise multiplication
video_0_agnostic_sam.mp4	Grounded SAM-2	Full-shot person video with garment area masked out generated by Grounded SAM-2
video_0_densepose.mp4	Detectron2	DensePose UV coordinates for the human body of full-shot person video generated by Detectron2
--- Close-up person video Data ---
video_1.mp4	Raw data	Close-up person video
video_1_mask.mp4	Grounded SAM-2	Binary mask of the garment area on the close-up person video generated by Grounded SAM-2
video_1_agnostic.mp4	Multiplication	Close-up person video with garment area masked out generated by pixel-wise multiplication
video_1_agnostic_sam.mp4	Grounded SAM-2	Close-up person video with garment area masked out generated by Grounded SAM-2
video_1_densepose.mp4	Detectron2	DensePose UV coordinates for the human body of close-up person video generated by Detectron2
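
The "Multiplication" entries in the table above refer to pixel-wise multiplication of an image (or video frame) with a mask. A minimal NumPy sketch of that operation follows. It rests on two assumptions that the table does not confirm: nonzero mask pixels mark the garment area, and the masked-out region is filled with zeros (black). The function name `make_agnostic` is hypothetical.

```python
import numpy as np


def make_agnostic(image, mask):
    """Zero out the garment area of an image by pixel-wise multiplication.

    image: H x W x 3 array (e.g. uint8 RGB frame).
    mask:  H x W array; nonzero = garment area (assumed polarity).
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must share height and width")
    # 1 where the pixel is kept, 0 where the garment is masked out.
    keep = (mask == 0).astype(image.dtype)
    # Broadcast the H x W keep-map across the colour channels.
    return image * keep[..., None]
```

For videos, the same function would be applied frame by frame to `video_N.mp4` and `video_N_mask.mp4`; if the released masks use the opposite polarity, the comparison in `keep` would need to be inverted.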

Contact

If you have any questions, please reach out via email at [email protected]

Citation

If you find this work useful for your research, please cite our paper:

@article{zeng2025eevee,
  title={Eevee: Towards Close-up High-resolution Video-based Virtual Try-on},
  author={Zeng, Jianhao and Bai, Yancheng and Chen, Ruidong and Zhang, Xuanpu and Sun, Lei and Jin, Dongyang and Xu, Ryan and Zhang, Nannan and Song, Dan and Chu, Xiangxiang},
  journal={arXiv preprint arXiv:2511.18957},
  year={2025}
}
