
SageAttention 2.1.1 - Pre-built Wheel for RTX 5090 (Blackwell)

Pre-compiled SageAttention wheel with sm_120 kernel support for NVIDIA RTX 5090 and other Blackwell architecture GPUs.

Specifications

| Component         | Version                                        |
|-------------------|------------------------------------------------|
| SageAttention     | 2.1.1                                          |
| Python            | 3.12                                           |
| Platform          | Linux x86_64                                   |
| CUDA Architecture | sm_120 (Blackwell)                             |
| Built From        | thu-ml/SageAttention, sm120_compilation branch |

Supported GPUs

  • NVIDIA RTX 5090
  • NVIDIA RTX 5080
  • NVIDIA RTX 5070 Ti
  • NVIDIA RTX 5070
  • Other Blackwell architecture GPUs (sm_120)

Installation

Direct pip install

pip install https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl

Download and install

wget https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
pip install sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
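Note that pip will refuse the wheel unless its filename tags match your environment: `cp312` ties it to CPython 3.12 and `linux_x86_64` to 64-bit Linux. As a small illustrative sketch (not part of any install step), a wheel filename can be split into its PEP 427 tags like this:

```python
def parse_wheel(filename: str) -> dict:
    """Split a PEP 427 wheel filename into name, version, and compatibility tags.

    Assumes the common 5-part form; wheels with an optional build tag
    have 6 parts and would need extra handling.
    """
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}

print(parse_wheel("sageattention-2.1.1-cp312-cp312-linux_x86_64.whl"))
```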

In Dockerfile

RUN wget -q -O /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl \
    "https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl" && \
    pip install /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl && \
    rm /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl

Requirements

  • CUDA: 12.8+
  • PyTorch: 2.7.0+ with CUDA 12.8 support
  • Python: 3.12
  • Driver: NVIDIA 570.x+
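A minimal pre-install sanity check, assuming PyTorch is already installed; the version strings below simply mirror the requirements list above:

```python
import sys

def meets_min(installed: str, required: str) -> bool:
    """Numeric comparison of dotted versions, ignoring local suffixes like '+cu128'."""
    parse = lambda v: [int(p) for p in v.split("+")[0].split(".") if p.isdigit()]
    return parse(installed) >= parse(required)

def check_environment() -> None:
    import torch  # assumed: PyTorch 2.7.0+ built against CUDA 12.8
    py = "{}.{}".format(*sys.version_info[:2])
    assert py == "3.12", f"wheel is cp312-only, found Python {py}"
    assert meets_min(torch.__version__, "2.7.0"), "PyTorch 2.7.0+ required"
    assert torch.version.cuda and meets_min(torch.version.cuda, "12.8"), "CUDA 12.8+ required"
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        assert (major, minor) == (12, 0), f"wheel targets sm_120, found sm_{major}{minor}"
    print("environment looks compatible with this wheel")

# check_environment()  # run this in your target environment
```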

Why This Wheel Exists

The official SageAttention repository doesn't publish pre-built wheels. Building SageAttention from source requires the CUDA toolkit and, by default, a GPU present at compile time so the kernels are generated for your specific architecture.

This wheel was compiled on a RunPod GPU pod with an RTX 5090, making it ready to use for:

  • Docker/serverless deployments
  • CI/CD pipelines
  • Any environment where a GPU isn't available at build time

Performance

SageAttention delivers a 30-50% speedup on attention operations compared to standard PyTorch attention, which is particularly beneficial for:

  • Video generation (WAN 2.1/2.2, Mochi, etc.)
  • Image generation (FLUX, SD3, etc.)
  • Large language models
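SageAttention is typically wired in either by calling `sageattn` directly in place of an attention call, or by monkey-patching PyTorch's scaled dot-product attention so existing models pick it up without code changes. A minimal sketch of the latter, mirroring the plug-and-play pattern from the upstream README (treat the exact `sageattn` signature as version-dependent):

```python
import torch.nn.functional as F
from sageattention import sageattn

# Route every F.scaled_dot_product_attention call through SageAttention.
# Caveat (assumption): sageattn does not take attention masks or dropout,
# so this only works for models that call SDPA without those arguments.
F.scaled_dot_product_attention = sageattn
```

Whether this patch is safe depends on the model; verify outputs on your own workload before relying on it.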

Build Details

Built on RunPod GPU pod with RTX 5090:

git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
git checkout sm120_compilation
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py bdist_wheel

Verification

Test if SageAttention is working:

import torch
import sageattention
from sageattention import sageattn

print("SageAttention version:", sageattention.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))

# Functional check: one attention call on the GPU (FP16, (batch, heads, seq, dim) layout)
q = k = v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
out = sageattn(q, k, v, is_causal=False)
print("sageattn output shape:", tuple(out.shape))

License

SageAttention is released under the Apache 2.0 License. See the original repository for details.

Credits

SageAttention is developed by the thu-ml team. This wheel is an unofficial community build of their sm120_compilation branch.