Flash Attention 2.8.3 for Windows (RTX 50 Series / Blackwell)

This repository provides a pre-compiled Windows wheel (.whl) of Flash Attention 2, built and optimized specifically for the NVIDIA RTX 50 Series (Blackwell architecture).

🚀 Key Specifications

  • Version: Flash Attention 2.8.3
  • Architecture: sm_120 (NVIDIA Blackwell - RTX 5070 Ti, 5080, 5090, etc.)
  • Operating System: Windows 10/11 (64-bit)
  • Python Version: 3.12.x
  • CUDA Toolkit: 12.9
  • PyTorch Compatibility: Built with PyTorch 2.8.0+cu129
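
A quick pre-install check (a minimal sketch; the expected values in the comments come from the specs above):

import sys
import torch

print(f"Python:  {sys.version.split()[0]}")   # expect 3.12.x
print(f"PyTorch: {torch.__version__}")        # expect 2.8.0+cu129
print(f"CUDA:    {torch.version.cuda}")       # expect 12.9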

📦 Installation

  1. Download the .whl file from the Files and versions tab.
  2. Open your terminal in your Python environment.
  3. Run the following command (the tags in the filename - cu129, sm120, cp312, win_amd64 - must match your CUDA version, GPU architecture, Python version, and OS):
pip install flash_attn-2.8.3+cu129sm120-cp312-cp312-win_amd64.whl

✅ Verification

import torch
import flash_attn

# Both lines should print without raising on a correctly matched install.
print(f"Flash Attention version: {flash_attn.__version__}")
print(f"Device: {torch.cuda.get_device_name(0)}")

⚖️ Disclaimer

  • Unofficial Build: This binary is an unofficial community build and is not affiliated with or endorsed by the original authors (Dao-AILab) or NVIDIA.
  • Compatibility: The kernels are compiled specifically for sm_120, so the wheel will not work on older architectures (e.g. RTX 30/40 series); see the check after this list.
  • Use at Your Own Risk: The author is not responsible for any system instability, data loss, or hardware issues resulting from the use of this file.
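
A hedged pre-flight check for the compatibility point above: sm_120 corresponds to CUDA compute capability (12, 0), which PyTorch can report before you rely on this wheel:

import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) != (12, 0):
    print(f"GPU reports sm_{major}{minor}; this wheel targets sm_120 and may fail here.")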

πŸ™ Credits

Original Flash Attention 2 developed by Tri Dao (Dao-AILab).
