Add comprehensive model card for ReDiff with metadata and usage

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +50 -0
README.md ADDED
---
pipeline_tag: image-text-to-text
library_name: transformers
---

# From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

This repository contains the official implementation of **ReDiff**, a refining-enhanced vision-language diffusion model, presented in the paper [From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model](https://huggingface.co/papers/2510.19871).

ReDiff addresses the train-inference discrepancy in discrete diffusion models, which often leads to catastrophic error cascades. It reframes generation from passive denoising to active refining, teaching the model to identify and correct its own errors. Training proceeds in two stages: first, the model acquires foundational revision skills by learning to revise synthetic errors; second, an online self-correction loop trains the model to refine its own flawed drafts using an expert's corrections. This mistake-driven learning markedly improves the coherence and factual accuracy of generated content, enabling stable and efficient parallel generation that goes well beyond traditional denoising.
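The denoising-vs-refining distinction can be illustrated with a toy sketch. This is not the official ReDiff API: the stand-in predictor, token values, and function names below are invented purely for illustration, to show why freezing committed tokens lets early errors cascade while re-predicting every position allows them to be corrected.

```python
# Toy contrast between mask-only denoising and ReDiff-style refining.
# Everything here (tokens, predictor) is an illustrative stand-in.
MASK = "<m>"

def toy_model(seq):
    """Stand-in predictor: always returns the 'correct' target tokens,
    regardless of the (possibly flawed) draft it is given."""
    return ["a", "cat", "sat", "down"]

def denoise_step(draft, preds):
    # Classic denoising: only masked positions are filled in;
    # committed tokens are frozen, so an early error survives.
    return [p if t == MASK else t for t, p in zip(draft, preds)]

def refine_step(draft, preds):
    # Refining: every position is re-predicted, so a flawed
    # committed token in the draft can still be corrected.
    return list(preds)

draft = ["a", "dog", MASK, MASK]  # "dog" is a committed early error
preds = toy_model(draft)
print(denoise_step(draft, preds))  # ['a', 'dog', 'sat', 'down'] - error survives
print(refine_step(draft, preds))   # ['a', 'cat', 'sat', 'down'] - error corrected
```

In the real model both steps are parallel token predictions; the difference is only in which positions the prediction is allowed to overwrite.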
11
+
12
+ - **Project Page**: [https://rediff-hku.github.io/](https://rediff-hku.github.io/)
13
+ - **Code**: [https://github.com/jiyt17/ReDiff](https://github.com/jiyt17/ReDiff)
14
+
15
+ <img src="https://github.com/jiyt17/ReDiff/raw/main/assets/teaser.jpg" alt="ReDiff Teaser" style="width: 100%;"/>
16
+

## Quick Inference Demo

The ReDiff model is designed for vision-language tasks. To quickly test it on a visual instruction demo, follow these steps:

1. **Clone the repository**
   ```bash
   git clone https://github.com/jiyt17/ReDiff
   cd ReDiff/train
   ```
2. **Initialize the environment**
   Run the setup script to install the necessary dependencies (including `transformers`):
   ```bash
   bash init_env.sh
   ```
3. **Run the demo script**
   Test ReDiff on an example image:
   ```bash
   python generate_demo.py
   ```

For more detailed usage, training, and evaluation instructions, please refer to the [GitHub repository](https://github.com/jiyt17/ReDiff).

## Citation

If you find our work helpful or inspiring, please cite it:

```bibtex
@article{ji2025denoising,
  title={From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model},
  author={Ji, Yatai and Wang, Teng and Ge, Yuying and Liu, Zhiheng and Yang, Sidi and Shan, Ying and Luo, Ping},
  journal={arXiv preprint arXiv:2510.19871},
  year={2025}
}
```