---
datasets:
- vita-video-gen/svi-benchmark
language:
- en
tags:
- video generation
pipeline_tag: image-to-video
library_name: diffusers
license: mit
project_page: https://stable-video-infinity.github.io/homepage/
papers:
- title: 'Stable Video Infinity: Infinite-Length Video Generation with Error Recycling'
  authors:
  - Wuyang Li
  - Wentao Pan
  - Po-Chien Luan
  - Yang Gao
  - Alexandre Alahi
  url: https://huggingface.co/papers/2510.09212
  conference: arXiv preprint, 2025
---
<div align="center">
<h1>Stable Video Infinity: Infinite-Length Video Generation with Error Recycling</h1>
<p align="center">
<a href="https://huggingface.co/papers/2510.09212"> <img src="https://img.shields.io/badge/Paper-HuggingFace-red?logo=huggingface&logoColor=yellow" alt="Paper on Hugging Face"/> </a>
<a href="https://stable-video-infinity.github.io/homepage/"> <img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"/> </a>
<a href="https://github.com/vita-epfl/Stable-Video-Infinity"> <img src="https://img.shields.io/badge/SVI-GitHub-black?logo=github&logoColor=white" alt="SVI on GitHub"/> </a>
<a href="https://huggingface.co/datasets/vita-video-gen/svi-benchmark"> <img src="https://img.shields.io/badge/SVI_Dataset-Hugging%20Face-orange?logo=huggingface&logoColor=yellow" alt="SVI Dataset"/> </a>
<a href="https://huggingface.co/vita-video-gen/svi-model"> <img src="https://img.shields.io/badge/SVI_models-Hugging%20Face-FFCC00?logo=huggingface&logoColor=yellow" alt="SVI Models"/> </a> </p> </div>
## 🎯 About This Repository
**Stable-Video-Infinity (SVI)** can generate videos of ANY length with high temporal consistency, plausible scene transitions, and controllable streaming storylines in ANY domain.
This repository contains the model weights of the SVI family.
## 🌟 Key Highlights
- **OpenSVI**: Everything is open-sourced, including training and evaluation scripts, datasets, and more.
- **Infinite Length**: No inherent limit on video duration; generate arbitrarily long stories (see the 10-minute "Tom and Jerry" demo).
- **Versatile**: Supports diverse in-the-wild generation tasks: multi-scene short films, single-scene animations, skeleton- and audio-conditioned generation, cartoons, and more.
- **Efficient**: Only LoRA adapters are tuned, so very little training data is needed and anyone can easily train their own SVI.
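The **Efficient** point rests on LoRA (low-rank adaptation): the pretrained weight matrix `W` stays frozen and only a low-rank update `B @ A` is trained, which is why so few parameters (and so little data) are needed. The NumPy sketch below is a generic illustration of that idea, not SVI's training code; the class name `LoRALinear`, the rank, `alpha`, and the initialization are illustrative assumptions.

```python
import numpy as np


class LoRALinear:
    """Generic LoRA sketch: frozen base weight plus a trainable low-rank update."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # y = x W^T + (alpha / r) * x A^T B^T; with B = 0 this equals the base layer
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged_weight(self) -> np.ndarray:
        # Fold the adapter into the base weight for inference: W' = W + (alpha / r) B A
        return self.weight + self.scale * self.B @ self.A

    def trainable_params(self) -> int:
        # Only A and B are trained; W is never updated
        return self.A.size + self.B.size


if __name__ == "__main__":
    base = np.random.default_rng(1).normal(size=(1024, 1024))
    layer = LoRALinear(base, rank=16, alpha=16.0)
    print("full fine-tuning params:", base.size)        # 1048576
    print("LoRA params:", layer.trainable_params())     # 32768
```

With rank 16 on a 1024 x 1024 layer, the adapter trains about 3% of the parameters that full fine-tuning would touch; SVI's actual ranks and target layers are defined in the GitHub repository.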
## 📦 Resources
| **Model** | **Task** | **Input** | **Output** | **Hugging Face Link** | **Comments** |
|-------|------|-------|--------|-------------------|------------------|
| **ALL** | Infinite possibilities | Image + X | X video | [🤗 Folder](https://huggingface.co/vita-video-gen/svi-model/tree/main/version-1.0) | The full bundle: play with all of them! |
| **SVI-Shot** | Single-scene generation | Image + Text prompt | Long video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-shot.safetensors?download=true) | Generates a consistent long video from a single text prompt (this will never drift). |
| **SVI-Film** | Multi-scene generation | Image + Text prompt stream | Film-style video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-film.safetensors?download=true) | Generates a creative long video from a text prompt stream (5 seconds per prompt). |
| **SVI-Film (Transition)** | Multi-scene generation | Image + Text prompt stream | Film-style video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-film-transitions.safetensors?download=true) | Generates a creative long video from a text prompt stream, with more scene transitions due to its training data. |
| **SVI-Tom&Jerry** | Cartoon animation | Image | Cartoon video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-tom.safetensors?download=true) | Generates creative long cartoon videos from a text prompt stream (no drift in our 20-minute test). |
| **SVI-Talk** | Talking head | Image + Audio | Talking video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-talk.safetensors?download=true) | Generates long videos of a person speaking, conditioned on audio. |
| **SVI-Dance** | Dancing animation | Image + Skeleton | Dance video | [🤗 Model](https://huggingface.co/vita-video-gen/svi-model/resolve/main/version-1.0/svi-dance.safetensors?download=true) | Generates long videos of a person dancing, conditioned on skeleton poses. |
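Each model in the table is a single `.safetensors` file under `version-1.0/` in the `vita-video-gen/svi-model` repository. The helper below is a minimal sketch for fetching them programmatically; the repository ID and file names come from the table, while the function names (`repo_path`, `resolve_url`, `download`, `download_all`) are illustrative. The download functions use `hf_hub_download` and `snapshot_download` from `huggingface_hub`, imported lazily so the path helpers work without it.

```python
REPO_ID = "vita-video-gen/svi-model"
VERSION_DIR = "version-1.0"

# Model name -> file name, as listed in the Resources table
SVI_FILES = {
    "SVI-Shot": "svi-shot.safetensors",
    "SVI-Film": "svi-film.safetensors",
    "SVI-Film (Transition)": "svi-film-transitions.safetensors",
    "SVI-Tom&Jerry": "svi-tom.safetensors",
    "SVI-Talk": "svi-talk.safetensors",
    "SVI-Dance": "svi-dance.safetensors",
}


def repo_path(model: str) -> str:
    """Path of a model's LoRA file inside the repository."""
    try:
        return f"{VERSION_DIR}/{SVI_FILES[model]}"
    except KeyError:
        raise ValueError(f"unknown SVI model {model!r}; choose one of {sorted(SVI_FILES)}") from None


def resolve_url(model: str) -> str:
    """Direct download URL (the table links, without the '?download=true' query)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{repo_path(model)}"


def download(model: str) -> str:
    """Download one LoRA file (or reuse the local cache) and return its local path."""
    from huggingface_hub import hf_hub_download  # only needed for real downloads

    return hf_hub_download(repo_id=REPO_ID, filename=repo_path(model))


def download_all() -> str:
    """Fetch the whole version-1.0 folder (the "ALL" row) and return the snapshot path."""
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, allow_patterns=[f"{VERSION_DIR}/*"])
```

For example, `download("SVI-Shot")` returns a cached file path that can be inspected with `safetensors.torch.load_file`; how the adapter is attached to the base video model is handled by the inference scripts in the GitHub repository.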
Note: For text-to-video (T2V) generation, you can use SVI directly: generate the first frame with any text-to-image (T2I) model and use it as the input image.
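The T2V note describes a two-stage route: a T2I model produces the first frame, then SVI extends it following a prompt stream. The sketch below shows that control flow with injected callables, so it stays framework-agnostic. The names `text_to_long_video`, `t2i`, `i2v`, and `planned_duration_s` are hypothetical, and the naive "condition each clip on the last frame of the previous clip" loop is an assumption made for illustration; SVI's real inference loop (including how it conditions each clip) lives in the GitHub repository.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")  # any image type your pipelines use, e.g. PIL.Image.Image

SECONDS_PER_PROMPT = 5  # per the SVI-Film row of the Resources table


def text_to_long_video(
    prompts: Sequence[str],
    t2i: Callable[[str], Frame],
    i2v: Callable[[Frame, str], List[Frame]],
    first_frame_prompt: Optional[str] = None,
) -> List[Frame]:
    """Hypothetical two-stage T2V loop: T2I for the first frame, then one I2V clip per prompt."""
    if not prompts:
        raise ValueError("need at least one prompt")
    # Stage 1: any text-to-image model produces the starting frame
    image = t2i(first_frame_prompt or prompts[0])
    video: List[Frame] = []
    # Stage 2: one clip per prompt; each clip starts from the last frame generated so far
    # (naive last-frame chaining, assumed here for illustration only)
    for prompt in prompts:
        clip = i2v(image, prompt)
        video.extend(clip)
        image = clip[-1]
    return video


def planned_duration_s(prompts: Sequence[str]) -> int:
    """Approximate video length if each prompt drives about 5 seconds of footage."""
    return len(prompts) * SECONDS_PER_PROMPT
```

In practice, `t2i` could wrap any text-to-image pipeline and `i2v` the SVI inference code; a stream of 12 prompts would plan for roughly one minute of video.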
## 📝 Citation
If you find our work helpful for your research, please consider citing our paper. Thank you so much!
```bibtex
@article{li2025stable,
  title={Stable Video Infinity: Infinite-Length Video Generation with Error Recycling},
  author={Wuyang Li and Wentao Pan and Po-Chien Luan and Yang Gao and Alexandre Alahi},
  journal={arXiv preprint arXiv:2510.09212},
  year={2025},
  url={https://huggingface.co/papers/2510.09212},
}
```