Commit 1701e9a (verified) · 1 parent: e9cd035
Author: cpatonn

Update README.md

Files changed (1):
  1. README.md (+18, -0)
README.md CHANGED
@@ -9,12 +9,30 @@ base_model: MiniMaxAI/MiniMax-M2
 
 ## Model Details
 
+### Quantization Details
+
 - **Quantization Method:** cyankiwi AWQ v1.0
 - **Bits:** 4
 - **Group Size:** 32
 - **Calibration Dataset:** [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset)
 - **Quantization Tool:** [llm-compressor](https://github.com/vllm-project/llm-compressor)
 
+### Memory Usage
+
+| **Type** | **MiniMax-M2** | **MiniMax-M2-AWQ-4bit** |
+|:---------------:|:----------------:|:----------------:|
+| **Memory Size** | 214.3 GB | 121.5 GB |
+| **KV Cache per Token** | 124.0 kB | 31.0 kB |
+| **KV Cache per Context** | 23.3 GB | 5.8 GB |
+
+### Evaluations
+
+| **Benchmarks** | **MiniMax-M2** | **MiniMax-M2-AWQ-4bit** |
+|:---------------:|:----------------:|:----------------:|
+| **Perplexity** | 1.54984 | 1.54743 |
+
+- **Evaluation Context Length:** 16384
+
 ## Inference
 
 ### Prerequisite
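
Since the card's Inference section begins right after this hunk, here is a minimal offline-inference sketch with vLLM, which can load llm-compressor (compressed-tensors) AWQ checkpoints. The repository id `cpatonn/MiniMax-M2-AWQ-4bit`, the tensor-parallel degree, and the 16384-token window (borrowed from the evaluation setting above) are assumptions, not part of this commit.

```python
# Minimal sketch, not part of this commit.
# Assumptions: the quantized checkpoint is published as "cpatonn/MiniMax-M2-AWQ-4bit",
# a recent vLLM build supports MiniMax-M2 with compressed-tensors AWQ weights,
# and the tensor-parallel group has enough memory for ~121.5 GB of weights.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cpatonn/MiniMax-M2-AWQ-4bit",  # assumed repo id for this 4-bit AWQ checkpoint
    tensor_parallel_size=4,               # adjust to the GPUs available
    max_model_len=16384,                  # mirrors the evaluation context length above
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize AWQ 4-bit quantization in two sentences."], sampling)
print(outputs[0].outputs[0].text)
```

The same checkpoint can also be exposed over an OpenAI-compatible endpoint with `vllm serve`; the exact flags (tensor parallelism, context length) follow the README's Prerequisite section rather than this sketch.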