# SageAttention 2.1.1 - Pre-built Wheel for RTX 5090 (Blackwell)
A pre-compiled SageAttention wheel with sm_120 kernel support for the NVIDIA RTX 5090 and other Blackwell-architecture GPUs.
## Specifications
| Component | Version |
|---|---|
| SageAttention | 2.1.1 |
| Python | 3.12 |
| Platform | Linux x86_64 |
| CUDA Architecture | sm_120 (Blackwell) |
| Built From | thu-ml/SageAttention sm120_compilation branch |
## Supported GPUs
- NVIDIA RTX 5090
- NVIDIA RTX 5080
- NVIDIA RTX 5070 Ti
- NVIDIA RTX 5070
- Other Blackwell architecture GPUs (sm_120)
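To confirm that a GPU is in this family, you can query its CUDA compute capability through PyTorch; consumer Blackwell parts report 12.0, which corresponds to sm_120. A minimal check, assuming PyTorch with CUDA support is already installed:

```python
import torch

# Blackwell consumer GPUs (RTX 50-series) report compute capability (12, 0).
major, minor = torch.cuda.get_device_capability(0)
print(f"{torch.cuda.get_device_name(0)}: sm_{major}{minor}")
assert (major, minor) == (12, 0), "this wheel only ships sm_120 kernels"
```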
## Installation

### Direct pip install

```bash
pip install https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
```
### Download and install

```bash
wget https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
pip install sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
```
### In a Dockerfile

```dockerfile
RUN wget -q -O /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl \
    "https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl" && \
    pip install /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl && \
    rm /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
```
## Requirements

- CUDA: 12.8+
- PyTorch: 2.7.0+ with CUDA 12.8 support
- Python: 3.12
- Driver: NVIDIA 570.x+
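You can sanity-check these requirements from Python; a minimal sketch (the expected values mirror the list above and are not enforced by the wheel itself):

```python
import sys
import torch

# Compare against the requirements listed above.
print("Python:", sys.version.split()[0])             # expect 3.12.x
print("PyTorch:", torch.__version__)                 # expect 2.7.0+
print("CUDA (PyTorch build):", torch.version.cuda)   # expect 12.8+
print("CUDA available:", torch.cuda.is_available())
```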
## Why This Wheel Exists

The official SageAttention repository doesn't publish pre-built wheels, and building SageAttention from source requires a CUDA-capable GPU at compile time to generate the kernels for your specific architecture.

This wheel was compiled on a RunPod GPU pod with an RTX 5090, making it ready to use in:

- Docker/serverless deployments
- CI/CD pipelines
- Any environment where a GPU isn't available at build time
## Performance

SageAttention delivers roughly a 30-50% speedup on attention operations compared to standard PyTorch attention (a drop-in usage sketch follows this list), which is particularly beneficial for:

- Video generation (WAN 2.1/2.2, Mochi, etc.)
- Image generation (FLUX, SD3, etc.)
- Large language models
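In practice the speedup comes from calling `sageattn` in place of PyTorch's scaled-dot-product attention inside a model's attention layers. A minimal drop-in sketch following the upstream `sageattn(q, k, v, tensor_layout, is_causal)` API; the shapes here are illustrative:

```python
import torch
import torch.nn.functional as F
from sageattention import sageattn

# Illustrative shapes: (batch, heads, seq_len, head_dim), fp16 on the GPU.
q = torch.randn(2, 16, 4096, 128, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Baseline: PyTorch's built-in scaled dot-product attention.
ref = F.scaled_dot_product_attention(q, k, v, is_causal=False)

# SageAttention drop-in; "HND" declares the (batch, heads, seq, dim) layout.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
print("max abs diff vs SDPA:", (out - ref).abs().max().item())
```

Because SageAttention quantizes Q/K internally, the output differs slightly from the fp16 SDPA baseline; upstream reports negligible end-to-end quality loss.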
## Build Details

Built on a RunPod GPU pod with an RTX 5090:

```bash
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
git checkout sm120_compilation

# Build kernels for the Blackwell architecture (sm_120) only.
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py bdist_wheel
```
## Verification

Test that SageAttention installed correctly:

```python
import torch
import sageattention

print("SageAttention version:", sageattention.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
```
## License
SageAttention is released under the Apache 2.0 License. See the original repository for details.
## Credits

- [thu-ml/SageAttention](https://github.com/thu-ml/SageAttention) - Original SageAttention implementation