WeDLM-8B-Instruct-MLX-8bit

This is an 8-bit quantized MLX version of tencent/WeDLM-8B-Instruct for efficient inference on Apple Silicon.

It currently does not work well and does not provide a meaningful speedup, due to the lack of precompilation. See https://github.com/ZimengXiong/WeDLM-MLX/tree/main

Related Models

Variant	HuggingFace
4-bit	zimengxiong/WeDLM-8B-Instruct-MLX-4bit
8-bit (this model)	zimengxiong/WeDLM-8B-Instruct-MLX-8bit
fp16	zimengxiong/WeDLM-8B-Instruct-MLX
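
A minimal sketch of fetching one of the variants listed above, assuming the `huggingface_hub` package is installed. The helper names `repo_for` and `fetch` are hypothetical and are not part of the WeDLM-MLX repository; only the repository IDs come from the table. Running inference is out of scope here.

```python
# Repository IDs copied from the "Related Models" table.
VARIANTS = {
    "4-bit": "zimengxiong/WeDLM-8B-Instruct-MLX-4bit",
    "8-bit": "zimengxiong/WeDLM-8B-Instruct-MLX-8bit",
    "fp16": "zimengxiong/WeDLM-8B-Instruct-MLX",
}


def repo_for(variant: str) -> str:
    """Return the Hugging Face repo ID for a variant name (hypothetical helper)."""
    try:
        return VARIANTS[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}"
        ) from None


def fetch(variant: str = "8-bit") -> str:
    """Download the variant's files and return the local directory (hypothetical helper)."""
    # Imported lazily so the lookup above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_for(variant))
```

Calling `fetch("8-bit")` downloads several gigabytes of weights into the local Hugging Face cache, so it is worth checking free disk space first.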

This model inherits the license from the base model tencent/WeDLM-8B-Instruct.
