---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
- beyoru/xlam-instruct-grpo
---
# 🧠 **Model Card: EvolLLM-Linh**
### **Model Overview**
- **Name:** EvolLLM-Linh
- **Version:** v1.0
- **Release Date:** October 23, 2025
- **Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Library:** 🤗 *Transformers*
<p align="center">
<img src="hyacine-hsr.gif" width="150">
</p>
**Purpose:**
EvolLLM-Linh is a fine-tuned and merged large language model designed for **function calling**.
It aims to improve the **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
**Key Capabilities:**
- Precise and context-aware API invocation
- Robust multi-turn dialogue consistency
- Adaptive understanding of user preferences and intent shifts
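
The card does not document the model's output format for tool calls. Assuming EvolLLM-Linh keeps the tool-calling chat template of its Qwen3 base model, which wraps each call as a JSON object inside `<tool_call>...</tool_call>` tags, a minimal sketch for extracting calls from generated text could look like this. `parse_tool_calls` is a hypothetical helper, not part of Transformers, and the function names in the sample output are made up for illustration.

```python
import json
import re
from typing import Any

# Assumed format (inherited from the Qwen3 chat template, not confirmed by this card):
# <tool_call>
# {"name": "...", "arguments": {...}}
# </tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Hypothetical helper: extract every well-formed tool call from model output."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip a malformed call instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


# Two calls in one turn, as in the parallel-function setting.
output = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Hanoi"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_time", "arguments": {"tz": "Asia/Ho_Chi_Minh"}}\n</tool_call>'
)
print(parse_tool_calls(output))
```

Returning a list handles single and parallel invocations the same way; an answer with no tagged block yields an empty list, which the caller can treat as a plain-text reply.
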
### **Evaluation Comparison**
| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **Llama** | **Qwen-2507** | **MinCoder-4B-Expert** |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: |
| SINGLE TURN – SINGLE FUNCTION | 0.80 | 0.80 | 0.63 | 0.69 | 0.81 |
| SINGLE TURN – PARALLEL FUNCTION | 0.66 | 0.62 | 0.16 | 0.51 | 0.66 |
| MULTI TURN – USER ADJUST | 0.50 | 0.50 | 0.40 | 0.48 | 0.50 |
| MULTI TURN – USER SWITCH | 0.62 | 0.62 | 0.40 | 0.56 | 0.64 |
| SIMILAR API CALLS | 0.76 | 0.74 | 0.64 | 0.68 | 0.76 |
| USER PREFERENCE HANDLING | 0.60 | 0.64 | 0.62 | 0.64 | 0.60 |
| ATOMIC TASK – BOOLEAN | 0.88 | 0.96 | 0.70 | 0.68 | 0.88 |
| ATOMIC TASK – ENUM | 0.94 | 0.94 | 0.94 | 0.86 | 0.96 |
| ATOMIC TASK – NUMBER | 0.94 | 0.96 | 0.90 | 0.82 | 0.94 |
| ATOMIC TASK – LIST | 0.92 | 0.90 | 0.84 | 0.78 | 0.94 |
| ATOMIC TASK – OBJECT (DEEP) | 0.58 | 0.52 | 0.32 | 0.36 | 0.62 |
| ATOMIC TASK – OBJECT (SHORT) | 0.80 | 0.96 | 0.70 | 0.56 | 0.82 |
| **Overall Accuracy** | **0.750** | **0.760** | **0.61** | **0.64** | **0.761** |
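
The card does not state how **Overall Accuracy** is aggregated. As a quick arithmetic check, an unweighted mean of the twelve category scores in the EvolLLM-Linh column reproduces its reported 0.750. The sketch below assumes that rule, which may differ from ACEBench's own weighting.

```python
from statistics import fmean

# EvolLLM-Linh category scores, copied from the evaluation table (row order preserved).
evolllm_linh = {
    "SINGLE TURN – SINGLE FUNCTION": 0.80,
    "SINGLE TURN – PARALLEL FUNCTION": 0.66,
    "MULTI TURN – USER ADJUST": 0.50,
    "MULTI TURN – USER SWITCH": 0.62,
    "SIMILAR API CALLS": 0.76,
    "USER PREFERENCE HANDLING": 0.60,
    "ATOMIC TASK – BOOLEAN": 0.88,
    "ATOMIC TASK – ENUM": 0.94,
    "ATOMIC TASK – NUMBER": 0.94,
    "ATOMIC TASK – LIST": 0.92,
    "ATOMIC TASK – OBJECT (DEEP)": 0.58,
    "ATOMIC TASK – OBJECT (SHORT)": 0.80,
}

# Assumption: Overall Accuracy = unweighted mean over all categories.
overall = fmean(evolllm_linh.values())
print(f"{overall:.3f}")  # 0.750
```
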
---
> **Note:**
> **All models are evaluated with the same configuration.**
> If you find an incorrect or inconsistent result, please report it so it can be verified.
> This supports transparency and reproducibility across benchmarks.
### **Leaderboard Reference**
All models are benchmarked on **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**.
Results come from **internal benchmark runs** aligned with the ACEBench task categories.
---
### **Method**
- GRPO (Group Relative Policy Optimization) with a rule-based reward and a self-confidence reward
- Evol Merging
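
The card does not publish its reward functions, so the sketch below is a hypothetical illustration only. It pairs an assumed rule-based reward, which scores a predicted call against a reference call (exact function name first, then the fraction of matching arguments), with GRPO's group-relative advantage, where each sampled completion's reward is normalized by the mean and standard deviation of its group. The scoring rule and the names `rule_based_reward` and `group_relative_advantages` are assumptions; the self-confidence reward and the Evol Merging step are not shown.

```python
from statistics import fmean, pstdev
from typing import Any, Optional


def rule_based_reward(pred: Optional[dict[str, Any]], ref: dict[str, Any]) -> float:
    """Hypothetical rule: 0 for a missing or wrong function name, else partial credit for arguments."""
    if pred is None or pred.get("name") != ref["name"]:
        return 0.0
    ref_args = ref.get("arguments", {})
    pred_args = pred.get("arguments", {})
    if not ref_args:
        # No arguments expected: full reward only if none were hallucinated.
        return 1.0 if not pred_args else 0.5
    matched = sum(1 for key, value in ref_args.items() if pred_args.get(key) == value)
    return 0.5 + 0.5 * matched / len(ref_args)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: center each reward on the group mean and scale by the group std."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


ref = {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "celsius"}}
group = [  # several completions sampled for the same prompt
    {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "celsius"}},
    {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "fahrenheit"}},
    {"name": "get_time", "arguments": {}},
    None,  # no parseable tool call
]
rewards = [rule_based_reward(p, ref) for p in group]
print(rewards)  # [1.0, 0.75, 0.0, 0.0]
print(group_relative_advantages(rewards))
```

Because advantages are relative within a group, a prompt whose completions all earn the same reward produces zero advantage for every sample; graded partial credit, as in this sketch, is one way to keep a learning signal when no completion is fully correct.
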
---
## **Support me at**
<p align="center">
<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
</a>
</p>
### **License**
**MIT License**: use, modification, and redistribution are permitted, provided the copyright notice and license text are retained.
© 2025 beyoru.
---