---
library_name: onnx
tags:
  - text-generation
  - smollm2
  - llama
  - onnx
  - inference4j
license: apache-2.0
pipeline_tag: text-generation
---

# SmolLM2-360M-Instruct — ONNX

ONNX export of [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) (360M parameters) with KV cache support for efficient autoregressive generation.

Converted for use with [inference4j](https://github.com/inference4j/inference4j), an inference-only AI library for Java.

## Original Source

- **Repository:** [HuggingFaceTB/SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct)
- **License:** Apache 2.0

## Usage with inference4j

```java
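// try-with-resources closes the generator once generation is done.
try (var gen = SmolLM2TextGenerator.builder().build()) {
    // Generate a completion for a single user prompt and print its text.
    GenerationResult result = gen.generate("What is Java?");
    System.out.println(result.text());
}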
```
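
The model is instruction-tuned on the ChatML chat template listed under Model Details, so a raw prompt is normally wrapped in role markers before tokenization. The sketch below only illustrates that wrapping. `ChatMlPrompt`, `Message`, and `format` are hypothetical names, not part of inference4j, and it is an assumption that no manual formatting is needed when calling `generate` as above. It also does not reproduce any default system message the upstream template may add. Records require Java 16 or later.

```java
import java.util.List;

public class ChatMlPrompt {

    /** One chat turn; role is "system", "user", or "assistant". (Hypothetical helper type.) */
    record Message(String role, String content) {}

    /** Wraps each turn in ChatML markers, then opens an assistant turn for the model to complete. */
    static String format(List<Message> messages) {
        StringBuilder sb = new StringBuilder();
        for (Message m : messages) {
            sb.append("<|im_start|>").append(m.role()).append('\n')
              .append(m.content()).append("<|im_end|>\n");
        }
        // Generation prompt: the model continues from here until it emits <|im_end|>.
        sb.append("<|im_start|>assistant\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(List.of(new Message("user", "What is Java?"))));
    }
}
```

Ending the prompt with an open assistant turn is what lets the model's `<|im_end|>` token act as a natural stop signal for generation.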

## Model Details

| Property | Value |
|----------|-------|
| Architecture | LlamaForCausalLM (360M parameters, 32 layers, 960 hidden, 15 heads, 5 KV heads) |
| Task | Text generation (instruction-tuned) |
| Context length | 8192 tokens |
| Vocabulary | 49,152 tokens (BPE) |
| Chat template | ChatML (`<\|im_start\|>user`...`<\|im_end\|>`) |
| Original framework | PyTorch (transformers) |
| Export method | Hugging Face Optimum (with KV cache) |
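
The architecture figures above give a rough idea of how much memory the KV cache needs. The following is a back-of-the-envelope sketch, not a measurement of this export. It assumes float32 cache tensors, batch size 1, and one key and one value tensor per layer of shape `[kv_heads, seq_len, head_dim]`. `KvCacheEstimate` and `bytesPerToken` are illustrative names.

```java
public class KvCacheEstimate {
    // Values taken from the Model Details table.
    static final int LAYERS = 32;
    static final int HIDDEN = 960;
    static final int HEADS = 15;
    static final int KV_HEADS = 5;
    static final int CONTEXT = 8192;

    /** Bytes of KV cache per token for a given element width (assumed layout, see lead-in). */
    static long bytesPerToken(int bytesPerElement) {
        int headDim = HIDDEN / HEADS; // 960 / 15 = 64
        // Two tensors (key and value) per layer, each KV_HEADS x headDim values per token.
        return 2L * LAYERS * KV_HEADS * headDim * bytesPerElement;
    }

    public static void main(String[] args) {
        long perToken = bytesPerToken(4); // assumption: float32 cache
        long fullContext = perToken * CONTEXT;
        System.out.printf("per token: %d bytes%n", perToken);                        // 81920
        System.out.printf("full context: %d MiB%n", fullContext / (1024 * 1024));    // 640
    }
}
```

Because only 5 of the 15 attention heads carry their own keys and values (grouped-query attention), the cache is one third the size it would be with 15 KV heads.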

## License

This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). Original model by [Hugging Face](https://huggingface.co/HuggingFaceTB).