|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- tool |
|
|
- function-calling |
|
|
- agent |
|
|
- merge |
|
|
base_model: |
|
|
- Qwen/Qwen3-4B-Instruct-2507 |
|
|
- beyoru/Qwen3-4B-I-1209 |
|
|
- Qwen/Qwen3-4B-Thinking-2507 |
|
|
datasets: |
|
|
- Salesforce/xlam-function-calling-60k |
|
|
- beyoru/xlam-instruct-grpo |
|
|
--- |
|
|
|
|
|
|
|
|
# π§ **Model Card β EvolLLM-Linh** |
|
|
|
|
|
### **Model Overview** |
|
|
|
|
|
**Name:** EvolLLM-Linh |
|
|
**Version:** v1.0 |
|
|
**Release Date:** October 23, 2025 |
|
|
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) |
|
|
**Library:** π€ *Transformers* |
|
|
|
|
|
<p align="center"> |
|
|
<img src="hyacine-hsr.gif" width="150"> |
|
|
</p> |
|
|
|
|
|
**Purpose:** |
|
|
EvolLLM-Linh is a fine-tuned large language model designed for **function calling**. |
|
|
It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**. |
|
|
|
|
|
**Key Capabilities:** |
|
|
- Precise and context-aware API invocation |
|
|
- Robust multi-turn dialogue consistency |
|
|
- Adaptive understanding of user preferences and intent shifts |
|
|
|
|
|
|
|
|
### **Evaluation Comparison** |
|
|
|
|
|
| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** | **MinCoder-4B-Expert** | |
|
|
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: | |
|
|
| SINGLE TURN β SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 | 0.81 | |
|
|
| SINGLE TURN β PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 | 0.66 | |
|
|
| MULTI TURN β USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 | 0.50 | |
|
|
| MULTI TURN β USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 | 0.64 | |
|
|
| SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 | 0.76 | |
|
|
| USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 | 0.60 | |
|
|
| ATOMIC TASK β BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 | 0.88 | |
|
|
| ATOMIC TASK β ENUM | 0.940 | 0.940 | 0.94 | 0.86 | 0.96 | |
|
|
| ATOMIC TASK β NUMBER | 0.940 | 0.960 | 0.90 | 0.82 | 0.94 | |
|
|
| ATOMIC TASK β LIST | 0.920 | 0.900 | 0.84 | 0.78 | 0.94 | |
|
|
| ATOMIC TASK β OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 | 0.62 | |
|
|
| ATOMIC TASK β OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 | 0.82 | |
|
|
| **Overall Accuracy** | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.61** | **0.64** | **0.761** | |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
> **Note:** |
|
|
> **We evaluate all models with the same configuration.** |
|
|
> If you find any incorrect or inconsistent result, please report it for verification. |
|
|
> This ensures transparency and reproducibility across benchmarks. |
|
|
|
|
|
### **Leaderboard Reference** |
|
|
all model are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** β assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**. |
|
|
Results are **internal benchmarks** aligned with ACEBench task categories. |
|
|
|
|
|
--- |
|
|
|
|
|
### **Method** |
|
|
- GRPO (Rule-based reward + self-confidence reward) |
|
|
- Evol Merging |
|
|
|
|
|
--- |
|
|
|
|
|
## **Support me at** |
|
|
<p align="center"> |
|
|
<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank"> |
|
|
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px"> |
|
|
</a> |
|
|
</p> |
|
|
|
|
|
### **License** |
|
|
|
|
|
**MIT License** β free for research and non-commercial use with attribution. |
|
|
Β© 2025 beyoru. |
|
|
--- |