Add comprehensive model card for ReDiff with metadata and usage

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +50 -0
README.md ADDED
---
pipeline_tag: image-text-to-text
library_name: transformers
---

# From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

This repository contains the official implementation of **ReDiff**, a refining-enhanced vision-language diffusion model, presented in the paper [From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model](https://huggingface.co/papers/2510.19871).

ReDiff addresses the train-inference discrepancy in discrete diffusion models, which often leads to catastrophic error cascades. It reframes generation from passive denoising to active refining, teaching the model to identify and correct its own errors. Training proceeds in two stages: first, the model acquires foundational revision skills by learning to revise synthetic errors; second, an online self-correction loop trains the model to refine its own flawed drafts using an expert's corrections. This mistake-driven learning markedly improves the coherence and factual accuracy of generated content, enabling stable and efficient parallel generation that goes well beyond traditional denoising.
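The denoising-vs-refining distinction can be illustrated with a toy sketch. This is not the official ReDiff API: the stand-in predictor, token values, and function names below are invented purely for illustration, to show why freezing committed tokens lets early errors cascade while re-predicting every position allows them to be corrected.

```python
# Toy contrast between mask-only denoising and ReDiff-style refining.
# Everything here (tokens, predictor) is an illustrative stand-in.
MASK = "<m>"

def toy_model(seq):
    """Stand-in predictor: always returns the 'correct' target tokens,
    regardless of the (possibly flawed) draft it is given."""
    return ["a", "cat", "sat", "down"]

def denoise_step(draft, preds):
    # Classic denoising: only masked positions are filled in;
    # committed tokens are frozen, so an early error survives.
    return [p if t == MASK else t for t, p in zip(draft, preds)]

def refine_step(draft, preds):
    # Refining: every position is re-predicted, so a flawed
    # committed token in the draft can still be corrected.
    return list(preds)

draft = ["a", "dog", MASK, MASK]  # "dog" is a committed early error
preds = toy_model(draft)
print(denoise_step(draft, preds))  # ['a', 'dog', 'sat', 'down'] - error survives
print(refine_step(draft, preds))   # ['a', 'cat', 'sat', 'down'] - error corrected
```

In the real model both steps are parallel token predictions; the difference is only in which positions the prediction is allowed to overwrite.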
11
+
12
+ - **Project Page**: [https://rediff-hku.github.io/](https://rediff-hku.github.io/)
13
+ - **Code**: [https://github.com/jiyt17/ReDiff](https://github.com/jiyt17/ReDiff)
14
+
15
+ <img src="https://github.com/jiyt17/ReDiff/raw/main/assets/teaser.jpg" alt="ReDiff Teaser" style="width: 100%;"/>
16
+

## Quick Inference Demo

The ReDiff model is designed for vision-language tasks. To quickly test it on a visual instruction demo, follow these steps:

1. **Clone the repository**
   ```bash
   git clone https://github.com/jiyt17/ReDiff
   cd ReDiff/train
   ```
2. **Initialize the environment**
   Run the setup script to install the necessary dependencies (including `transformers`):
   ```bash
   bash init_env.sh
   ```
3. **Run the demo script**
   Test ReDiff on an example image:
   ```bash
   python generate_demo.py
   ```

For more detailed usage, training, and evaluation instructions, please refer to the [GitHub repository](https://github.com/jiyt17/ReDiff).

## Citation

If you find our work helpful or inspiring, please cite it:

```bibtex
@article{ji2025denoising,
  title={From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model},
  author={Ji, Yatai and Wang, Teng and Ge, Yuying and Liu, Zhiheng and Yang, Sidi and Shan, Ying and Luo, Ping},
  journal={arXiv preprint arXiv:2510.19871},
  year={2025}
}
```