Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
Paper • 2510.12586 • Published • 115
Table 1: Pre-trained model configurations.
| Model | Dataset | Configuration |
|---|---|---|
| RCM-B | IN-256 | 12 layers, 768 dim, 12 heads, patch size 16 |
| RCM-B | IN-512 | 12 layers, 768 dim, 12 heads, patch size 32 |
| RCM-L | IN-256 | 16 layers, 1024 dim, 16 heads, patch size 16 |
Table 2: Fine-tuned model configurations. Encoder and decoder settings are given as comma-separated pairs (encoder, decoder).
| Name | Blocks | Dim | Heads | Params |
|---|---|---|---|---|
| EPG-L | 16, 16 | 1024, 1024 | 16, 16 | 540M |
| EPG-XL | 12, 12 | 768, 1584 | 12, 22 | 583M |
| EPG-XXL | 12, 12 | 768, 1920 | 12, 16 | 789M |
| EPG-G | 12, 12 | 768, 2688 | 12, 21 | 1391M |
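The fine-tuned configurations in Table 2 can be expressed in code, e.g. as a small dataclass. This is an illustrative sketch, not the released implementation; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EPGConfig:
    """Encoder/decoder transformer settings from Table 2 (illustrative)."""
    enc_blocks: int
    dec_blocks: int
    enc_dim: int
    dec_dim: int
    enc_heads: int
    dec_heads: int

    def head_dims(self) -> tuple[int, int]:
        # Per-head dimension for encoder and decoder attention.
        return (self.enc_dim // self.enc_heads, self.dec_dim // self.dec_heads)


# Values copied directly from Table 2.
EPG_CONFIGS = {
    "EPG-L":   EPGConfig(16, 16, 1024, 1024, 16, 16),
    "EPG-XL":  EPGConfig(12, 12, 768, 1584, 12, 22),
    "EPG-XXL": EPGConfig(12, 12, 768, 1920, 12, 16),
    "EPG-G":   EPGConfig(12, 12, 768, 2688, 12, 21),
}
```

For example, `EPG_CONFIGS["EPG-G"].head_dims()` gives (64, 128), since 768 / 12 = 64 and 2688 / 21 = 128.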
Table 3: Fine-tuned model performance on downstream tasks (DM: diffusion model; CM: consistency model).
| Model | Task | FID |
|---|---|---|
| EPG-XL/16 | DM on IN-256 | 2.04 |
| EPG-XXL/16 | DM on IN-256 | 1.87 |
| EPG-G/16 | DM on IN-256 | 1.58 |
| EPG-L/32 | DM on IN-512 | 2.35 |
| EPG-L/16 | CM on IN-256 | 8.82 |