Update README.md
README.md
CHANGED
|
@@ -1,4 +1,17 @@
|
|
| 1 |
-
---
|
| 2 |
library_name: transformers
|
| 3 |
tags:
|
| 4 |
- tool
|
|
@@ -10,67 +23,60 @@ datasets:
|
|
| 10 |
- Salesforce/xlam-function-calling-60k
|
| 11 |
---
|
| 12 |
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
# 🧠 **Model Card - EvolLLM-Linh**
|
| 17 |
|
| 18 |
### **Model Overview**
|
| 19 |
|
| 20 |
-
**Name:** EvolLLM-Linh
|
| 21 |
-
**Version:** v1.0
|
| 22 |
-
**Release Date:** October 23, 2025
|
| 23 |
-
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
|
| 24 |
-
**Library:** 🤗 *Transformers*
|
| 25 |
|
| 26 |
-
|
| 27 |
-
**
|
| 28 |
-
|
| 29 |
-
It aims enhance the **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
|
| 30 |
|
| 31 |
**Key Capabilities:**
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
* Adaptive understanding of user preferences and intent shifts
|
| 36 |
|
| 37 |
---
|
| 38 |
|
| 39 |
-
### **Evaluation
|
| 40 |
-
|
| 41 |
-
| **Category** | **
|
| 42 |
-
| ------------------------------- |
|
| 43 |
-
| SINGLE TURN - SINGLE FUNCTION |
|
| 44 |
-
| SINGLE TURN - PARALLEL FUNCTION |
|
| 45 |
-
| MULTI TURN - USER ADJUST |
|
| 46 |
-
| MULTI TURN - USER SWITCH |
|
| 47 |
-
| SIMILAR API CALLS |
|
| 48 |
-
| USER PREFERENCE HANDLING |
|
| 49 |
-
| ATOMIC TASK - BOOLEAN |
|
| 50 |
-
| ATOMIC TASK - ENUM |
|
| 51 |
-
| ATOMIC TASK - NUMBER |
|
| 52 |
-
| ATOMIC TASK - LIST |
|
| 53 |
-
| ATOMIC TASK - OBJECT (DEEP) |
|
| 54 |
-
| ATOMIC TASK - OBJECT (SHORT) |
|
| 55 |
-
|
| 56 |
-
**Overall Accuracy:** **0.750 (75.0%)** in normal en task.
|
| 57 |
|
| 58 |
---
|
| 59 |
|
| 60 |
### **Leaderboard Reference**
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
Results here are preliminary and reflect internal benchmarking on the same task categories.
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
-
### Method
|
| 68 |
-
- GRPO (Rule
|
| 69 |
-
- Evol Merging
|
| 70 |
-
|
| 71 |
|
| 72 |
-
|
| 73 |
|
|
|
|
| 74 |
<p align="center">
|
| 75 |
<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
|
| 76 |
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
|
|
@@ -78,8 +84,6 @@ Results here are preliminary and reflect internal benchmarking on the same task
|
|
| 78 |
</p>
|
| 79 |
|
| 80 |
### **License**
|
| 81 |
-
|
| 82 |
-
**MIT License**: free for research and non-commercial use with attribution.
|
| 83 |
© 2025 beyoru.
|
| 84 |
-
|
| 85 |
---
|
|
|
|
| 1 |
+
---
|
| 2 |
+
library_name: transformers
|
| 3 |
+
tags:
|
| 4 |
+
- tool
|
| 5 |
+
- function-calling
|
| 6 |
+
- agent
|
| 7 |
+
- merge
|
| 8 |
+
base_model:
|
| 9 |
+
- Qwen/Qwen3-4B-Instruct-2507
|
| 10 |
+
- beyoru/Qwen3-4B-I-1209
|
| 11 |
+
- Qwen/Qwen3-4B-Thinking-2507
|
| 12 |
+
datasets:
|
| 13 |
+
- Salesforce/xlam-function-calling-60k
|
| 14 |
+
---
|
| 15 |
library_name: transformers
|
| 16 |
tags:
|
| 17 |
- tool
|
|
|
|
| 23 |
- Salesforce/xlam-function-calling-60k
|
| 24 |
---
|
| 25 |
|
|
| 26 |
# 🧠 **Model Card - EvolLLM-Linh**
|
| 27 |
|
| 28 |
### **Model Overview**
|
| 29 |
|
| 30 |
+
**Name:** EvolLLM-Linh
|
| 31 |
+
**Version:** v1.0
|
| 32 |
+
**Release Date:** October 23, 2025
|
| 33 |
+
**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
|
| 34 |
+
**Library:** 🤗 *Transformers*
|
| 35 |
|
| 36 |
+
**Purpose:**
|
| 37 |
+
EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.
|
| 38 |
+
It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
|
|
|
|
| 39 |
|
| 40 |
**Key Capabilities:**
|
| 41 |
+
- Precise and context-aware API invocation
|
| 42 |
+
- Robust multi-turn dialogue consistency
|
| 43 |
+
- Adaptive understanding of user preferences and intent shifts
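
The capabilities listed above all end in the model emitting a structured call that the host application has to read back. Below is a minimal sketch of that parsing step. It assumes the fine-tune keeps the Hermes-style `<tool_call>{...}</tool_call>` blocks used by Qwen3 chat templates, which this card does not confirm. `extract_tool_calls` and the `get_weather` tool are hypothetical names used only for illustration.

```python
import json
import re

# Assumption: calls arrive as <tool_call>{"name": ..., "arguments": {...}}</tool_call>,
# the format used by Qwen3 chat templates. The card itself does not state this.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in a model response."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            call.setdefault("arguments", {})
            calls.append(call)
    return calls


# Hypothetical response containing two parallel calls to a made-up tool.
sample_output = (
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Hanoi"}}</tool_call>\n'
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Hue"}}</tool_call>'
)
parsed_calls = extract_tool_calls(sample_output)
```

Skipping malformed payloads rather than raising is a design choice for the sketch: in a multi-turn loop it lets the caller ask the model to retry instead of aborting the conversation.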
|
|
|
|
| 44 |
|
| 45 |
---
|
| 46 |
|
| 47 |
+
### **Evaluation Comparison**
|
| 48 |
+
|
| 49 |
+
| **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **xLAM-2-8b-fc-r** | **Qwen3-2507** |
|
| 50 |
+
| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
|
| 51 |
+
| SINGLE TURN - SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
|
| 52 |
+
| SINGLE TURN - PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 |
|
| 53 |
+
| MULTI TURN - USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 |
|
| 54 |
+
| MULTI TURN - USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 |
|
| 55 |
+
| SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 |
|
| 56 |
+
| USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 |
|
| 57 |
+
| ATOMIC TASK - BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 |
|
| 58 |
+
| ATOMIC TASK - ENUM | 0.940 | 0.940 | 0.94 | 0.86 |
|
| 59 |
+
| ATOMIC TASK - NUMBER | 0.940 | 0.960 | 0.90 | 0.82 |
|
| 60 |
+
| ATOMIC TASK - LIST | 0.920 | 0.900 | 0.84 | 0.78 |
|
| 61 |
+
| ATOMIC TASK - OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 |
|
| 62 |
+
| ATOMIC TASK - OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 |
|
| 63 |
+
| **Overall Accuracy** | **0.750** | **0.760** | **0.61** | **0.64** |
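
The card does not say how the overall row is aggregated. As a quick sanity check, the sketch below assumes an unweighted mean over the twelve categories and recomputes the EvolLLM-Linh figure from the per-category scores in the table. The equal weighting is an assumption, not something the source states.

```python
from statistics import mean

# Per-category scores for EvolLLM-Linh, copied from the table above.
evolllm_linh_scores = {
    "SINGLE TURN - SINGLE FUNCTION": 0.800,
    "SINGLE TURN - PARALLEL FUNCTION": 0.660,
    "MULTI TURN - USER ADJUST": 0.500,
    "MULTI TURN - USER SWITCH": 0.620,
    "SIMILAR API CALLS": 0.760,
    "USER PREFERENCE HANDLING": 0.600,
    "ATOMIC TASK - BOOLEAN": 0.880,
    "ATOMIC TASK - ENUM": 0.940,
    "ATOMIC TASK - NUMBER": 0.940,
    "ATOMIC TASK - LIST": 0.920,
    "ATOMIC TASK - OBJECT (DEEP)": 0.580,
    "ATOMIC TASK - OBJECT (SHORT)": 0.800,
}

# Assumption: every category carries equal weight in the overall score.
overall_accuracy = round(mean(evolllm_linh_scores.values()), 3)
print(f"Overall accuracy: {overall_accuracy:.3f}")
```

Under that assumption the result matches the 0.750 reported for EvolLLM-Linh.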
|
|
|
|
| 64 |
|
| 65 |
---
|
| 66 |
|
| 67 |
### **Leaderboard Reference**
|
| 68 |
+
Both **EvolLLM-Linh** and **GPT-OSS-20B** are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)**, assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
|
| 69 |
+
Results are **internal benchmarks** aligned with ACEBench task categories.
|
|
|
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
+
### **Method**
|
| 74 |
+
- GRPO (Rule-based reward + self-confidence reward)
|
| 75 |
+
- Evol Merging
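
To make the GRPO item above more concrete, here is a rough sketch of the two pieces it names most clearly: a rule-based reward and the group-relative advantage that GRPO computes over several completions sampled for one prompt. The reward rule, the function names, and the `get_weather` tool are hypothetical. The self-confidence reward is left out because the card does not define it, and population standard deviation is used here as one possible normalization choice.

```python
import json
from statistics import mean, pstdev


def rule_based_reward(completion: str, expected_call: dict) -> float:
    """Hypothetical rule: 1.0 for an exact call match, 0.5 for the right
    function name with wrong arguments, 0.0 for anything else."""
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(call, dict) or call.get("name") != expected_call["name"]:
        return 0.0
    return 1.0 if call.get("arguments") == expected_call["arguments"] else 0.5


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions (all made up for illustration).
expected = {"name": "get_weather", "arguments": {"city": "Hanoi"}}
group = [
    '{"name": "get_weather", "arguments": {"city": "Hanoi"}}',  # exact match
    '{"name": "get_weather", "arguments": {"city": "Hue"}}',    # wrong arguments
    "not json",                                                  # unparseable
    '{"name": "get_time", "arguments": {}}',                     # wrong function
]
rewards = [rule_based_reward(c, expected) for c in group]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest a negative one, so no separate value model is needed to rank them.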
|
|
|
|
| 76 |
|
| 77 |
+
---
|
| 78 |
|
| 79 |
+
## **Support me at**
|
| 80 |
<p align="center">
|
| 81 |
<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
|
| 82 |
<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
|
|
|
|
| 84 |
</p>
|
| 85 |
|
| 86 |
### **License**
|
| 87 |
+
**MIT License**: free for research and non-commercial use with attribution.
|
|
|
|
| 88 |
© 2025 beyoru.
|
|
|
|
| 89 |
---
|