
SageAttention 2.1.1 - Pre-built Wheel for RTX 5090 (Blackwell)

Pre-compiled SageAttention wheel with sm_120 kernel support for NVIDIA RTX 5090 and other Blackwell architecture GPUs.

Specifications

| Component         | Version                                        |
|-------------------|------------------------------------------------|
| SageAttention     | 2.1.1                                          |
| Python            | 3.12                                           |
| Platform          | Linux x86_64                                   |
| CUDA Architecture | sm_120 (Blackwell)                             |
| Built From        | thu-ml/SageAttention, sm120_compilation branch |

Supported GPUs

  • NVIDIA RTX 5090
  • NVIDIA RTX 5080
  • NVIDIA RTX 5070 Ti
  • NVIDIA RTX 5070
  • Other Blackwell architecture GPUs (sm_120)

Installation

Direct pip install

pip install https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl

Download and install

wget https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
pip install sageattention-2.1.1-cp312-cp312-linux_x86_64.whl
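Note that pip will refuse the wheel unless its filename tags match your environment: `cp312` ties it to CPython 3.12 and `linux_x86_64` to 64-bit Linux. As a small illustrative sketch (not part of any install step), a wheel filename can be split into its PEP 427 tags like this:

```python
def parse_wheel(filename: str) -> dict:
    """Split a PEP 427 wheel filename into name, version, and compatibility tags.

    Assumes the common 5-part form; wheels with an optional build tag
    have 6 parts and would need extra handling.
    """
    stem = filename.removesuffix(".whl")
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}

print(parse_wheel("sageattention-2.1.1-cp312-cp312-linux_x86_64.whl"))
```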

In Dockerfile

RUN wget -q -O /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl \
    "https://huggingface.co/stoor0777/sageattention-rtx5090/resolve/main/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl" && \
    pip install /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl && \
    rm /tmp/sageattention-2.1.1-cp312-cp312-linux_x86_64.whl

Requirements

  • CUDA: 12.8+
  • PyTorch: 2.7.0+ with CUDA 12.8 support
  • Python: 3.12
  • Driver: NVIDIA 570.x+
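A minimal pre-install sanity check, assuming PyTorch is already installed; the version strings below simply mirror the requirements list above:

```python
import sys

def meets_min(installed: str, required: str) -> bool:
    """Numeric comparison of dotted versions, ignoring local suffixes like '+cu128'."""
    parse = lambda v: [int(p) for p in v.split("+")[0].split(".") if p.isdigit()]
    return parse(installed) >= parse(required)

def check_environment() -> None:
    import torch  # assumed: PyTorch 2.7.0+ built against CUDA 12.8
    py = "{}.{}".format(*sys.version_info[:2])
    assert py == "3.12", f"wheel is cp312-only, found Python {py}"
    assert meets_min(torch.__version__, "2.7.0"), "PyTorch 2.7.0+ required"
    assert torch.version.cuda and meets_min(torch.version.cuda, "12.8"), "CUDA 12.8+ required"
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        assert (major, minor) == (12, 0), f"wheel targets sm_120, found sm_{major}{minor}"
    print("environment looks compatible with this wheel")

# check_environment()  # run this in your target environment
```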

Why This Wheel Exists

The official SageAttention repository doesn't publish pre-built wheels. Building SageAttention from source requires the CUDA toolkit and, by default, a GPU present at compile time so the kernels are generated for your specific architecture.

This wheel was compiled on a RunPod GPU pod with an RTX 5090, making it ready to use for:

  • Docker/serverless deployments
  • CI/CD pipelines
  • Any environment where a GPU isn't available at build time

Performance

SageAttention delivers a 30-50% speedup on attention operations compared to standard PyTorch attention, which is particularly beneficial for:

  • Video generation (WAN 2.1/2.2, Mochi, etc.)
  • Image generation (FLUX, SD3, etc.)
  • Large language models
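SageAttention is typically wired in either by calling `sageattn` directly in place of an attention call, or by monkey-patching PyTorch's scaled dot-product attention so existing models pick it up without code changes. A minimal sketch of the latter, mirroring the plug-and-play pattern from the upstream README (treat the exact `sageattn` signature as version-dependent):

```python
import torch.nn.functional as F
from sageattention import sageattn

# Route every F.scaled_dot_product_attention call through SageAttention.
# Caveat (assumption): sageattn does not take attention masks or dropout,
# so this only works for models that call SDPA without those arguments.
F.scaled_dot_product_attention = sageattn
```

Whether this patch is safe depends on the model; verify outputs on your own workload before relying on it.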

Build Details

Built on RunPod GPU pod with RTX 5090:

git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
git checkout sm120_compilation
export TORCH_CUDA_ARCH_LIST="12.0"
python setup.py bdist_wheel

Verification

Test if SageAttention is working:

import torch
import sageattention
from sageattention import sageattn

print("SageAttention version:", sageattention.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))

# Functional check: one attention call on the GPU (FP16, (batch, heads, seq, dim) layout)
q = k = v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
out = sageattn(q, k, v, is_causal=False)
print("sageattn output shape:", tuple(out.shape))

License

SageAttention is released under the Apache 2.0 License. See the original repository for details.

Credits

SageAttention is developed by the thu-ml team. This wheel is an unofficial community build of their sm120_compilation branch.