FP8-dynamic, FP8-block, NVFP4, INT4, INT8 versions of Qwen3-Next-80B-A3B-Instruct
Inference Optimization
community
AI & ML interests
None defined yet.
Collection of Mixed Precision LLaMA and Qwen Models
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-out_proj-all
5B • Updated • 15
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-qkv_proj-all
5B • Updated • 14
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-down_proj-all
6B • Updated • 15
inference-optimization/Llama-3.1-8B-Instruct-Mixed-NVFP4-FP8_BLOCK-gate_up_proj-all
7B • Updated • 16
models 30
inference-optimization/Qwen3-Next-80B-A3B-Instruct-quantized.w8a8
Updated
inference-optimization/Qwen3-Next-80B-A3B-Instruct-quantized.w4a16
Updated
inference-optimization/Qwen3-Next-80B-A3B-Instruct-NVFP4
Updated
inference-optimization/Qwen3-Next-80B-A3B-Instruct-FP8-dynamic
Updated
inference-optimization/Qwen3-Next-80B-A3B-Instruct-FP8-block
Updated
inference-optimization/Qwen3-30B-A3B-Thinking-2507.w4a16
Text Generation • 5B • Updated • 16
inference-optimization/Llama-3.1-8B-Instruct-HIGGS-quantized-paths
Updated
inference-optimization/Qwen3-30B-A3B-Instruct-2507.w4a16
Text Generation • 5B • Updated • 21
inference-optimization/Qwen3-4B-Instruct-2507.w4a16
Text Generation • 1B • Updated • 28
inference-optimization/Qwen3-4B-Thinking-2507.w4a16
Text Generation • 1B • Updated • 209
datasets 0
None public yet