GGUFs, to run locally:
MLX, to run on macOS:
16-bit safetensors, to run or fine-tune with:
NVFP4, for fast inference/deployment: