The model is great, clearly superior to the others, and I'm very excited about it, but its VRAM consumption is quite high. Could you publish an FP8 or similarly quantized version?
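In the meantime, here's a minimal sketch of a stopgap: loading the released weights with 8-bit (INT8) quantization via bitsandbytes, assuming this is a standard transformers causal LM. The `model_id` below is a placeholder, not the actual repo name.

```python
# Stopgap sketch: load the full-precision checkpoint with INT8 quantization
# via bitsandbytes until an official FP8 release is available.
# Assumes a standard transformers causal LM; "org/model-name" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-name"  # placeholder for the actual model repo

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # roughly halves VRAM vs. FP16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",          # spread layers across available GPUs
    torch_dtype=torch.float16,  # keep non-quantized ops in half precision
)
```

That said, an official FP8 checkpoint would still be preferable, since on-the-fly INT8 quantization adds load time and can cost some quality compared to a properly calibrated release.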
Can we get an FP8 version or similar?