FLUX.2-dev: Attention-only INT8 Weight-Only Transformer (ROCm)
This repository provides an INT8 weight-only quantized transformer for black-forest-labs/FLUX.2-dev.
It is designed to be:
- ROCm-compatible
- Stable on AMD Instinct MI210
- Image-quality preserving
Only attention Linear layers (Q/K/V + projections) are quantized. All other components remain in BF16.
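For reference, the sketch below lists the Linear layers that fall under this attention-only scope once the transformer from the Usage section has been loaded. It assumes the Diffusers FLUX.2 transformer follows the common convention of qualified module names containing `attn` (with `to_q`/`to_k`/`to_v`/`to_out` projections); adjust the pattern if the layer naming differs.

```python
import torch

# Hypothetical inspection helper: list the Linear layers inside attention blocks.
# Assumption: attention projections live under module names containing "attn".
attn_linears = [
    name
    for name, module in transformer.named_modules()
    if isinstance(module, torch.nn.Linear) and "attn" in name
]
print(f"{len(attn_linears)} attention Linear layers, e.g. {attn_linears[:4]}")
```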
What is included
- Transformer with attention-only INT8 weight-only quantization
- TorchAO-based quantization (no bitsandbytes)
- Compatible with standard Diffusers pipelines
What is NOT included
- VAE
- Text encoders
- Scheduler
These components are automatically loaded from the base FLUX.2 model.
Why attention-only INT8?
Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm. Quantizing only attention layers provides:
- Significant VRAM reduction
- Stable generation
- No "confetti noise" artifacts
- Safe inference on MI210 (64 GB)
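As a rough illustration, a checkpoint like this can be produced with TorchAO's weight-only INT8 config restricted to attention Linear layers. This is a minimal sketch, assuming the `torchao.quantization.quantize_` API with a `filter_fn` and that attention projections can be identified by `attn` in their qualified names; it is not necessarily the exact script used to build this repository.

```python
import torch
from torchao.quantization import int8_weight_only, quantize_

def is_attention_linear(module: torch.nn.Module, fqn: str) -> bool:
    # Only quantize Linear layers inside attention blocks (Q/K/V + projections).
    return isinstance(module, torch.nn.Linear) and "attn" in fqn

# `transformer` is the BF16 FLUX.2 transformer loaded beforehand (see Usage below).
quantize_(transformer, int8_weight_only(), filter_fn=is_attention_linear)
```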
Usage (Diffusers)
```python
import torch
from diffusers import AutoModel, Flux2Pipeline

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16
device = "cuda"  # ROCm also uses the "cuda" device name in PyTorch

# Load the attention-only INT8 transformer from this repository.
transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,
).to(device)

# All other components (VAE, text encoders, scheduler) come from the base model.
pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)

# Optional memory savers for a 64 GB MI210.
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]
image.save("out.png")
```
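To confirm the VRAM savings on your GPU, you can read PyTorch's allocator statistics after generation; this also works on ROCm, since HIP is exposed through the `torch.cuda` namespace.

```python
# Peak VRAM used by the generation above (ROCm reports through torch.cuda/HIP).
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gib:.1f} GiB")
```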