Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Abstract
Stable-DiffCoder achieves stronger code modeling performance than autoregressive baselines through block diffusion continual pretraining with a tailored warmup and a block-wise clipped noise schedule.
Diffusion-based language models (DLLMs) offer non-sequential, block-wise generation and richer data reuse than autoregressive (AR) models, but existing code DLLMs still lag behind strong AR baselines under comparable budgets. We revisit this setting in a controlled study and introduce Stable-DiffCoder, a block diffusion code model that reuses the Seed-Coder architecture, data, and training pipeline. To enable efficient knowledge learning and stable training, we incorporate a block diffusion continual pretraining (CPT) stage enhanced by a tailored warmup and a block-wise clipped noise schedule. Under the same data and architecture, Stable-DiffCoder outperforms its AR counterpart overall on a broad suite of code benchmarks. Moreover, relying only on the CPT and supervised fine-tuning stages, Stable-DiffCoder achieves stronger performance than a wide range of ~8B AR models and DLLMs, demonstrating that diffusion-based training can improve code modeling quality beyond AR training alone. Finally, diffusion-based any-order modeling improves structured code modeling for editing and reasoning and, through data augmentation, benefits low-resource coding languages.
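To make the block-wise clipped noise schedule concrete, below is a minimal illustrative sketch of how per-block masking with clipped noise levels could look. It assumes the schedule means drawing a masking ratio per block and clamping it to a fixed range before corrupting tokens; the function name, the `[t_min, t_max]` range, and the block size are illustrative assumptions, not the paper's actual hyperparameters or implementation.

```python
import torch

def block_clipped_mask(tokens, block_size=32, t_min=0.2, t_max=0.8, mask_id=0):
    """Illustrative block-wise masking with a clipped noise schedule.

    For each block, a masking ratio t is drawn uniformly and clamped to
    [t_min, t_max]; that fraction of the block's tokens is replaced by
    mask_id. All values here are assumptions for illustration only.
    """
    noisy = tokens.clone()
    seq_len = tokens.size(-1)
    for start in range(0, seq_len, block_size):
        end = min(start + block_size, seq_len)
        block_len = end - start
        # Sample a noise level and clip it to the allowed range.
        t = torch.empty(1).uniform_(0.0, 1.0).clamp_(t_min, t_max).item()
        num_mask = max(1, int(round(t * block_len)))
        # Randomly choose which positions inside the block to mask.
        idx = torch.randperm(block_len)[:num_mask] + start
        noisy[..., idx] = mask_id
    return noisy

# Usage example on a toy token sequence of length 64.
tokens = torch.arange(1, 65).unsqueeze(0)   # shape (1, 64), dummy token ids
corrupted = block_clipped_mask(tokens)
```

In this sketch, clipping keeps every block away from the extremes of being almost fully masked or almost untouched, which is one plausible way such a schedule could stabilize block diffusion training.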