Flash Attention 2.8.3 for Windows (RTX 50 Series / Blackwell)

This repository provides a pre-compiled Windows wheel (.whl) of Flash Attention 2, built and optimized specifically for the NVIDIA RTX 50 Series (Blackwell architecture).

🚀 Key Specifications

  • Version: Flash Attention 2.8.3
  • Architecture: sm_120 (NVIDIA Blackwell - RTX 5070 Ti, 5080, 5090, etc.)
  • Operating System: Windows 10/11 (64-bit)
  • Python Version: 3.12.x
  • CUDA Toolkit: 12.9
  • PyTorch Compatibility: Built with PyTorch 2.8.0+cu129
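
A quick pre-install check (a minimal sketch; the expected values in the comments come from the specs above):

import sys
import torch

print(f"Python:  {sys.version.split()[0]}")   # expect 3.12.x
print(f"PyTorch: {torch.__version__}")        # expect 2.8.0+cu129
print(f"CUDA:    {torch.version.cuda}")       # expect 12.9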

📦 Installation

  1. Download the .whl file from the Files and versions tab.
  2. Open your terminal in your Python environment.
  3. Run the following command (the tags in the filename - cu129, sm120, cp312, win_amd64 - must match your CUDA version, GPU architecture, Python version, and OS):
pip install flash_attn-2.8.3+cu129sm120-cp312-cp312-win_amd64.whl

✅ Verification

import torch
import flash_attn

# Both lines should print without raising on a correctly matched install.
print(f"Flash Attention version: {flash_attn.__version__}")
print(f"Device: {torch.cuda.get_device_name(0)}")

⚖️ Disclaimer

  • Unofficial Build: This binary is an unofficial community build and is not affiliated with or endorsed by the original authors (Dao-AILab) or NVIDIA.
  • Compatibility: The kernels are compiled specifically for sm_120, so the wheel will not work on older architectures (e.g. RTX 30/40 series); see the check after this list.
  • Use at Your Own Risk: The author is not responsible for any system instability, data loss, or hardware issues resulting from the use of this file.
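
A hedged pre-flight check for the compatibility point above: sm_120 corresponds to CUDA compute capability (12, 0), which PyTorch can report before you rely on this wheel:

import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) != (12, 0):
    print(f"GPU reports sm_{major}{minor}; this wheel targets sm_120 and may fail here.")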

πŸ™ Credits

Original Flash Attention 2 developed by Tri Dao (Dao-AILab).
