# Flash Attention 2.8.3 for Windows (RTX 50 Series / Blackwell)
This repository provides a pre-compiled Windows wheel (`.whl`) for Flash Attention 2, built and optimized specifically for the NVIDIA RTX 50 Series (Blackwell architecture).
## 📌 Key Specifications
- Version: Flash Attention 2.8.3
- Architecture: `sm_120` (NVIDIA Blackwell - RTX 5070 Ti, 5080, 5090, etc.)
- Operating System: Windows 10/11 (64-bit)
- Python Version: 3.12.x
- CUDA Toolkit: 12.9
- PyTorch Compatibility: Built with PyTorch 2.8.0+cu129
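The checklist above can be verified directly from Python before installing the wheel. A minimal sketch; the expected values in the comments mirror the specifications above:

```python
import sys
import torch

# The wheel targets CPython 3.12 (cp312) on 64-bit Windows.
print(f"Python: {sys.version.split()[0]}")          # expect 3.12.x
print(f"PyTorch: {torch.__version__}")              # expect 2.8.0+cu129
print(f"CUDA (torch build): {torch.version.cuda}")  # expect 12.9

# sm_120 corresponds to CUDA compute capability (12, 0) on RTX 50 series GPUs.
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: sm_{major}{minor}")     # expect sm_120
```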
## 📦 Installation
- Download the `.whl` file from the Files and versions tab.
- Open your terminal in your Python environment.
- Run the following command:

```bash
pip install flash_attn-2.8.3+cu129sm120-cp312-cp312-win_amd64.whl
```
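The wheel links against PyTorch 2.8.0 built for CUDA 12.9, so that PyTorch build should be installed first. A hedged example, assuming the usual PyTorch wheel index layout carries a `cu129` channel (verify the exact command on pytorch.org):

```bash
# Install the matching PyTorch build, then the Flash Attention wheel.
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu129
pip install flash_attn-2.8.3+cu129sm120-cp312-cp312-win_amd64.whl
```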
## ✅ Verification
```python
import torch
import flash_attn

# Confirm the wheel imports and reports the expected version and GPU.
print(f"Flash Attention version: {flash_attn.__version__}")  # expect 2.8.3
print(f"Device: {torch.cuda.get_device_name(0)}")
```
## ⚠️ Disclaimer
- Unofficial Build: This binary is an unofficial community build and is not affiliated with, or endorsed by, the original authors (Dao-AILab) or NVIDIA.
- Compatibility: Specifically optimized for `sm_120`. It may not work on older architectures (RTX 30/40 series); a defensive fallback pattern is sketched after this list.
- Use at Your Own Risk: The author is not responsible for any system instability, data loss, or hardware issues resulting from the use of this file.
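Since the wheel only targets `sm_120`, code that may also run on other GPUs can guard the import and fall back to PyTorch's built-in attention. A hedged sketch; the capability check and the SDPA fallback are this example's assumptions, not part of the wheel:

```python
import torch
import torch.nn.functional as F

HAS_FLASH = False
try:
    from flash_attn import flash_attn_func
    if torch.cuda.is_available():
        # This wheel is built only for compute capability (12, 0).
        HAS_FLASH = torch.cuda.get_device_capability(0) == (12, 0)
except ImportError:
    pass

def attention(q, k, v, causal=False):
    """q, k, v: (batch, seqlen, nheads, headdim), fp16/bf16, on CUDA."""
    if HAS_FLASH:
        return flash_attn_func(q, k, v, causal=causal)
    # Fallback: PyTorch SDPA expects (batch, nheads, seqlen, headdim).
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2),
        is_causal=causal,
    )
    return out.transpose(1, 2)
```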
## 🙏 Credits
Original Flash Attention 2 developed by Tri Dao (Dao-AILab).