---
library_name: transformers
tags:
- tool
- function-calling
- agent
- merge
base_model:
- Qwen/Qwen3-4B-Instruct-2507
- beyoru/Qwen3-4B-I-1209
- Qwen/Qwen3-4B-Thinking-2507
datasets:
- Salesforce/xlam-function-calling-60k
- beyoru/xlam-instruct-grpo
---


# 🧠 **Model Card: EvolLLM-Linh**

### **Model Overview**

**Name:** EvolLLM-Linh  
**Version:** v1.0  
**Release Date:** October 23, 2025  
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)  
**Library:** 🤗 *Transformers*  

<p align="center">
  <img src="hyacine-hsr.gif" width="150">
</p>

**Purpose:**  
EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.  
It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.

**Key Capabilities:**
- Precise and context-aware API invocation  
- Robust multi-turn dialogue consistency  
- Adaptive understanding of user preferences and intent shifts  
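
To make the API-invocation capability concrete, here is a minimal sketch of how a caller might extract tool calls from generated text. It assumes the model keeps the tool-call format of the Qwen3 chat template (a JSON object wrapped in `<tool_call>` tags), which this card does not confirm. The tool names `get_weather` and `get_time` are hypothetical.

```python
import json
import re

# Assumption: tool calls are JSON objects wrapped in <tool_call> tags, as in
# the Qwen3 chat template. This card does not confirm the output format.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of raising
        if isinstance(call, dict) and "name" in call:
            calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
    return calls


# Hypothetical model output containing two parallel tool calls.
sample = (
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Hanoi"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_time", "arguments": {"timezone": "Asia/Ho_Chi_Minh"}}\n</tool_call>'
)
print(parse_tool_calls(sample))
```

Skipping malformed JSON rather than raising is a design choice for this sketch; a stricter caller may prefer to surface the error and retry generation.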


### **Evaluation Comparison**

| **Category**                    |  **EvolLLM-Linh** |  **GPT-OSS-20B**  | **Llama** | **Qwen-2507** |      **MinCoder-4B-Expert**     |
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: | :---------------: |
| SINGLE TURN – SINGLE FUNCTION   |       0.800       |       0.800       |    0.63   |      0.69     |        0.81       |
| SINGLE TURN – PARALLEL FUNCTION |       0.660       |       0.620       |    0.16   |      0.51     |        0.66       |
| MULTI TURN – USER ADJUST        |       0.500       |       0.500       |    0.40   |      0.48     |        0.50       |
| MULTI TURN – USER SWITCH        |       0.620       |       0.620       |    0.40   |      0.56     |        0.64       |
| SIMILAR API CALLS               |       0.760       |       0.740       |    0.64   |      0.68     |        0.76       |
| USER PREFERENCE HANDLING        |       0.600       |       0.640       |    0.62   |      0.64     |        0.60       |
| ATOMIC TASK – BOOLEAN           |       0.880       |       0.960       |    0.70   |      0.68     |        0.88       |
| ATOMIC TASK – ENUM              |       0.940       |       0.940       |    0.94   |      0.86     |        0.96       |
| ATOMIC TASK – NUMBER            |       0.940       |       0.960       |    0.90   |      0.82     |        0.94       |
| ATOMIC TASK – LIST              |       0.920       |       0.900       |    0.84   |      0.78     |        0.94       |
| ATOMIC TASK – OBJECT (DEEP)     |       0.580       |       0.520       |    0.32   |      0.36     |        0.62       |
| ATOMIC TASK – OBJECT (SHORT)    |       0.800       |       0.960       |    0.70   |      0.56     |        0.82       |
| **Overall Accuracy**            | **0.750 (75.0%)** | **0.760 (76.0%)** |  **0.61** |    **0.64**   |      **0.761**    |
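
The card does not state how the overall accuracy is aggregated. As a sanity check, the sketch below computes the unweighted mean of the twelve category scores for two columns of the table above. It reproduces the reported 0.750 for EvolLLM-Linh, but gives about 0.763 for GPT-OSS-20B against the reported 0.760, so the official figure may use a different weighting (for example, by number of samples per category). That explanation is an assumption, not something the card confirms.

```python
# Per-category accuracies copied from the table above (12 categories each).
scores = {
    "EvolLLM-Linh": [0.800, 0.660, 0.500, 0.620, 0.760, 0.600,
                     0.880, 0.940, 0.940, 0.920, 0.580, 0.800],
    "GPT-OSS-20B": [0.800, 0.620, 0.500, 0.620, 0.740, 0.640,
                    0.960, 0.940, 0.960, 0.900, 0.520, 0.960],
}


def macro_average(values):
    """Unweighted mean over categories (an assumed aggregation rule)."""
    return sum(values) / len(values)


for model, values in scores.items():
    print(f"{model}: {macro_average(values):.3f}")
```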


---

> **Note:**
> **We evaluate all models with the same configuration.**
> If you find any incorrect or inconsistent result, please report it for verification.
> This ensures transparency and reproducibility across benchmarks.

### **Leaderboard Reference**
All models are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, which assesses **function calling**, **compositional reasoning**, and **multi-turn interaction**.  
Results are **internal benchmarks** aligned with ACEBench task categories.

---

### **Method**
- GRPO (Rule-based reward + self-confidence reward)  
- Evol Merging  
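
The card does not publish the reward rules, so the following is only an illustrative sketch of how a rule-based reward can feed GRPO's group-relative advantage. The scoring rule in `rule_based_reward` (1.0, 0.5, 0.0) is hypothetical, the self-confidence reward is omitted because the card does not define it, and the use of the population standard deviation is an assumption.

```python
import statistics


def rule_based_reward(predicted: dict, expected: dict) -> float:
    """Hypothetical rule: 1.0 exact match, 0.5 correct name only, else 0.0."""
    if predicted.get("name") != expected.get("name"):
        return 0.0
    if predicted.get("arguments") == expected.get("arguments"):
        return 1.0
    return 0.5


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled completions (all values are made up for illustration).
expected = {"name": "get_weather", "arguments": {"city": "Hanoi"}}
group = [
    {"name": "get_weather", "arguments": {"city": "Hanoi"}},  # exact match
    {"name": "get_weather", "arguments": {"city": "Hue"}},    # wrong argument
    {"name": "get_time", "arguments": {"city": "Hanoi"}},     # wrong function
    {"name": "get_weather", "arguments": {"city": "Hanoi"}},  # exact match
]
rewards = [rule_based_reward(call, expected) for call in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Because advantages are computed relative to the group mean, completions are rewarded for being better than their siblings on the same prompt, which removes the need for a separate value model.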

---

## **Support me at**
<p align="center">
  <a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
    <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
  </a>
</p>

### **License**

**MIT License**: free for research and non-commercial use with attribution.  
© 2025 beyoru.

---