---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
- beyoru/xlam-instruct-grpo
---
# **Model Card: EvolLLM-Linh**
### **Model Overview**
**Name:** EvolLLM-Linh
**Version:** v1.0
**Release Date:** October 23, 2025
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
**Library:** 🤗 *Transformers*
<p align="center">
<img src="hyacine-hsr.gif" width="150">
</p>
**Purpose:**
EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.
It aims to improve the **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
**Key Capabilities:**
- Precise and context-aware API invocation
- Robust multi-turn dialogue consistency
- Adaptive understanding of user preferences and intent shifts
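**Quick Start (sketch):**
The snippet below is a minimal tool-calling sketch using 🤗 Transformers. The repo id `beyoru/EvolLLM-Linh` and the `get_weather` schema are illustrative assumptions, and it presumes a recent Transformers version whose chat template renders OpenAI-style tool schemas (as with the Qwen3 base model); adapt to the actual checkpoint.

```python
# Minimal tool-calling sketch (assumes a recent transformers release with
# tool support in apply_chat_template). Repo id and tool schema below are
# illustrative assumptions, not part of this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "beyoru/EvolLLM-Linh"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# OpenAI-style JSON schema for a hypothetical tool
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Hanoi right now?"}]

# The chat template renders the tool schemas into the prompt for the model.
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```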
### **Evaluation Comparison**
| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** | **MinCoder-4B-Expert** |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: |
| SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.630 | 0.690 | 0.810 |
| SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.160 | 0.510 | 0.660 |
| MULTI TURN – USER ADJUST | 0.500 | 0.500 | 0.400 | 0.480 | 0.500 |
| MULTI TURN – USER SWITCH | 0.620 | 0.620 | 0.400 | 0.560 | 0.640 |
| SIMILAR API CALLS | 0.760 | 0.740 | 0.640 | 0.680 | 0.760 |
| USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.620 | 0.640 | 0.600 |
| ATOMIC TASK – BOOLEAN | 0.880 | 0.960 | 0.700 | 0.680 | 0.880 |
| ATOMIC TASK – ENUM | 0.940 | 0.940 | 0.940 | 0.860 | 0.960 |
| ATOMIC TASK – NUMBER | 0.940 | 0.960 | 0.900 | 0.820 | 0.940 |
| ATOMIC TASK – LIST | 0.920 | 0.900 | 0.840 | 0.780 | 0.940 |
| ATOMIC TASK – OBJECT (DEEP) | 0.580 | 0.520 | 0.320 | 0.360 | 0.620 |
| ATOMIC TASK – OBJECT (SHORT) | 0.800 | 0.960 | 0.700 | 0.560 | 0.820 |
| **Overall Accuracy** | **0.750 (75.0%)** | **0.760 (76.0%)** | **0.610 (61.0%)** | **0.640 (64.0%)** | **0.761 (76.1%)** |
---
> **Note:**
> **All models were evaluated with the same configuration.**
> If you find an incorrect or inconsistent result, please report it so it can be verified.
> This keeps the benchmarks transparent and reproducible.
### **Leaderboard Reference**
All models are benchmarked with **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**.
The results above are **internal benchmark runs** aligned with the ACEBench task categories.
---
### **Method**
- **GRPO** with a combined rule-based reward and self-confidence reward (a hypothetical sketch of such a rule-based reward follows below)
- **Evol Merging** across the base models listed above
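
The exact reward is not documented beyond "rule-based + self-confidence", so the following is only a hypothetical sketch of what a rule-based function-calling reward could check: JSON validity, tool-call shape, and whether the named tool exists. The actual criteria used in training may differ.

```python
import json

# Hypothetical rule-based reward for function-calling completions.
# Illustration only: the actual reward used to train EvolLLM-Linh is not
# documented here, and its criteria may differ.
def rule_based_reward(completion: str, valid_tool_names: set) -> float:
    """Return 0.0 for unparseable output, partial credit for valid JSON,
    and full credit for a well-formed call to a known tool."""
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # not valid JSON at all
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        return 0.25  # valid JSON, but not shaped like a tool call
    if call["name"] not in valid_tool_names:
        return 0.5  # well-formed call, but to an unknown tool
    return 1.0  # well-formed call to a known tool

# Example: a correct call earns the full reward.
print(rule_based_reward(
    '{"name": "get_weather", "arguments": {"city": "Hanoi"}}',
    {"get_weather"},
))  # -> 1.0
```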
---
## **Support Me**
<p align="center">
<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
</a>
</p>
### **License**
**MIT License** – free for research and non-commercial use with attribution.
© 2025 beyoru.
---