InternLM3-8B-Instruct: GGUF Quants

Quantized GGUF versions of internlm/internlm3-8b-instruct, Shanghai AI Lab's 8B instruction-tuned InternLM3 model. It features a 1M token context window, strong support for English and Chinese, and competitive performance across reasoning and coding benchmarks.

The 1M context window makes InternLM3-8B uniquely capable among sub-10B models for long-document tasks, RAG pipelines, and extended reasoning chains.

Available Files

| File | Quant | Size | Use Case |
|---|---|---|---|
| InternLM3-8B-Instruct-Q8_0.gguf | Q8_0 | ~8.5 GB | Maximum quality |
| InternLM3-8B-Instruct-Q6_K.gguf | Q6_K | ~6.6 GB | Near-lossless |
| InternLM3-8B-Instruct-Q5_K_M.gguf | Q5_K_M | ~5.7 GB | High quality |
| InternLM3-8B-Instruct-Q4_K_M.gguf | Q4_K_M | ~4.9 GB | Recommended default |
| InternLM3-8B-Instruct-Q3_K_M.gguf | Q3_K_M | ~3.9 GB | Low VRAM |
| InternLM3-8B-Instruct-IQ4_XS.gguf | IQ4_XS | ~4.3 GB | Imatrix 4-bit |
| InternLM3-8B-Instruct-IQ3_XXS.gguf | IQ3_XXS | ~3.2 GB | Imatrix 3-bit |
| InternLM3-8B-Instruct-IQ2_M.gguf | IQ2_M | ~2.8 GB | Imatrix 2-bit |
| InternLM3-8B-Instruct-IQ1_S.gguf | IQ1_S | ~2.0 GB | Extreme compression |
| InternLM3-8B-Instruct-fp16.gguf | FP16 | ~16.0 GB | Full precision |
| imatrix.dat | - | - | Importance matrix |
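
As a rough aid for choosing among the files above, the sketch below picks the largest quant whose file size fits a given memory budget. `pick_quant` is a hypothetical helper, not part of llama.cpp or this repository, and the sizes are the approximate figures from the table. File size is only a lower bound on memory use: the runtime also needs room for the KV cache and compute buffers, so the budget passed in should leave some headroom.

```shell
# pick_quant BUDGET_GB
# Hypothetical helper: prints the largest file from the table above whose
# approximate size (GB) is at or below BUDGET_GB. Returns non-zero if none fit.
pick_quant() {
  budget_gb="$1"
  awk -v budget="$budget_gb" '
    BEGIN { best = 0 }
    # $1 = file name, $2 = approximate size in GB (copied from the table)
    ($2 + 0) <= (budget + 0) && ($2 + 0) > best { best = $2 + 0; file = $1 }
    END { if (file != "") print file; else exit 1 }
  ' <<'EOF'
InternLM3-8B-Instruct-fp16.gguf 16.0
InternLM3-8B-Instruct-Q8_0.gguf 8.5
InternLM3-8B-Instruct-Q6_K.gguf 6.6
InternLM3-8B-Instruct-Q5_K_M.gguf 5.7
InternLM3-8B-Instruct-Q4_K_M.gguf 4.9
InternLM3-8B-Instruct-IQ4_XS.gguf 4.3
InternLM3-8B-Instruct-Q3_K_M.gguf 3.9
InternLM3-8B-Instruct-IQ3_XXS.gguf 3.2
InternLM3-8B-Instruct-IQ2_M.gguf 2.8
InternLM3-8B-Instruct-IQ1_S.gguf 2.0
EOF
}
```

For instance, `pick_quant 6` selects the Q5_K_M file, since 5.7 GB is the largest listed size that does not exceed 6 GB. Size is used here only as a proxy for quality; it does not account for differences between the K-quant and imatrix variants.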

Usage

# llama.cpp
./llama-cli -m InternLM3-8B-Instruct-Q4_K_M.gguf \
  --ctx-size 8192 -n 512 \
  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n"

# Ollama
ollama run hf.co/DuoNeural/InternLM3-8B-Instruct-GGUF:Q4_K_M
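
The llama.cpp prompt above follows a ChatML-style layout: each turn is wrapped in `<|im_start|>role` and `<|im_end|>`, and the prompt ends with an open assistant turn. A minimal sketch of assembling that string in the shell is shown below; `build_prompt` is a hypothetical helper name, and the template is simply copied from the `-p` argument above. It emits real newlines, so the result does not depend on how a given llama.cpp build treats literal `\n` sequences.

```shell
# build_prompt SYSTEM_MESSAGE USER_MESSAGE
# Hypothetical helper: prints a single-turn prompt in the same layout as the
# -p string in the llama.cpp example (system turn, user turn, open assistant turn).
build_prompt() {
  system_msg="$1"
  user_msg="$2"
  # printf expands \n in the format string into real newline characters;
  # the messages are inserted via %s so their contents are not interpreted.
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system_msg" "$user_msg"
}
```

The output can be passed to the same command as `-p "$(build_prompt 'You are a helpful assistant.' 'Hello!')"`. Note that command substitution strips the trailing newline after `assistant`.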

About InternLM3-8B

  • Parameters: 8B
  • Context: 1M tokens (unique at this parameter scale)
  • Architecture: Decoder-only transformer
  • Languages: English, Chinese (multilingual)
  • Strengths: Long-context reasoning, instruction following, coding, math

Notable for its extreme context length: 1M tokens in a sub-10B model is unmatched in the open-source landscape.
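
Long contexts cost memory beyond the model file itself, because the KV cache grows linearly with the number of tokens. The sketch below applies the usual estimate for a transformer with grouped-query attention: 2 (keys and values) × layers × KV heads × head dimension × context length × bytes per element. `kv_cache_bytes` is a hypothetical helper, and the values in the usage note are illustrative placeholders, not figures taken from this model's configuration; the real ones would need to be read from the GGUF metadata.

```shell
# kv_cache_bytes LAYERS KV_HEADS HEAD_DIM CTX_TOKENS BYTES_PER_ELEMENT
# Hypothetical helper: prints an estimated KV cache size in bytes.
# The leading factor 2 accounts for storing both keys and values.
kv_cache_bytes() {
  layers="$1"
  kv_heads="$2"
  head_dim="$3"
  ctx_tokens="$4"
  bytes_per_elem="$5"
  echo $(( 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_elem ))
}
```

With placeholder values, `kv_cache_bytes 48 2 128 8192 2` gives 402653184 bytes (384 MiB) for an 8192-token context with a 16-bit cache. Because the estimate is linear in context length, scaling the same placeholders to a 1M-token context multiplies that figure by 128.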


Quantized by DuoNeural using llama.cpp on RTX 5090.


DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).
