FLUX.2-dev: Attention-only INT8 Weight-Only Transformer (ROCm)
This repository provides an INT8 weight-only quantized transformer for black-forest-labs/FLUX.2-dev.
It is designed to be:
- ROCm-compatible
- Stable on AMD Instinct MI210
- Image-quality preserving
Only attention Linear layers (Q/K/V + projections) are quantized. All other components remain in BF16.
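For reference, the sketch below lists the Linear layers that fall under this attention-only scope once the transformer from the Usage section has been loaded. It assumes the Diffusers FLUX.2 transformer follows the common convention of qualified module names containing `attn` (with `to_q`/`to_k`/`to_v`/`to_out` projections); adjust the pattern if the layer naming differs.

```python
import torch

# Hypothetical inspection helper: list the Linear layers inside attention blocks.
# Assumption: attention projections live under module names containing "attn".
attn_linears = [
    name
    for name, module in transformer.named_modules()
    if isinstance(module, torch.nn.Linear) and "attn" in name
]
print(f"{len(attn_linears)} attention Linear layers, e.g. {attn_linears[:4]}")
```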
What is included
- Transformer with attention-only INT8 weight-only quantization
- TorchAO-based quantization (no bitsandbytes)
- Compatible with standard Diffusers pipelines
What is NOT included
- VAE
- Text encoders
- Scheduler
These components are automatically loaded from the base FLUX.2 model.
Why attention-only INT8?
Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm. Quantizing only attention layers provides:
- Significant VRAM reduction
- Stable generation
- No "confetti noise" artifacts
- Safe inference on MI210 (64 GB)
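As a rough illustration, a checkpoint like this can be produced with TorchAO's weight-only INT8 config restricted to attention Linear layers. This is a minimal sketch, assuming the `torchao.quantization.quantize_` API with a `filter_fn` and that attention projections can be identified by `attn` in their qualified names; it is not necessarily the exact script used to build this repository.

```python
import torch
from torchao.quantization import int8_weight_only, quantize_

def is_attention_linear(module: torch.nn.Module, fqn: str) -> bool:
    # Only quantize Linear layers inside attention blocks (Q/K/V + projections).
    return isinstance(module, torch.nn.Linear) and "attn" in fqn

# `transformer` is the BF16 FLUX.2 transformer loaded beforehand (see Usage below).
quantize_(transformer, int8_weight_only(), filter_fn=is_attention_linear)
```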
Usage (Diffusers)
```python
import torch
from diffusers import AutoModel, Flux2Pipeline

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16
device = "cuda"  # ROCm also uses the "cuda" device name in PyTorch

# Load the attention-only INT8 transformer from this repository.
transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,
).to(device)

# All other components (VAE, text encoders, scheduler) come from the base model.
pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)

# Optional memory savers for a 64 GB MI210.
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]
image.save("out.png")
```
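To confirm the VRAM savings on your GPU, you can read PyTorch's allocator statistics after generation; this also works on ROCm, since HIP is exposed through the `torch.cuda` namespace.

```python
# Peak VRAM used by the generation above (ROCm reports through torch.cuda/HIP).
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gib:.1f} GiB")
```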