Commit f7d9b62 by beyoru · verified · 1 parent: a6a3366

Update README.md

Files changed (1)
  1. README.md +50 -46
README.md CHANGED
@@ -1,4 +1,17 @@
1
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
2
  library_name: transformers
3
  tags:
4
  - tool
@@ -10,67 +23,60 @@ datasets:
10
  - Salesforce/xlam-function-calling-60k
11
  ---
12
 
13
-
14
-
15
-
16
  # 🧠 **Model Card – EvolLLM-Linh**
17
 
18
  ### **Model Overview**
19
 
20
- **Name:** EvolLLM-Linh \
21
- **Version:** v1.0 \
22
- **Release Date:** October 23, 2025 \
23
- **Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) \
24
- **Library:** 🤗 *Transformers*
25
 
26
-
27
- **Purpose:**
28
- EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.
29
- It aims enhance the **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
30
 
31
  **Key Capabilities:**
32
-
33
- * Precise and context-aware API invocation
34
- * Robust multi-turn dialogue consistency
35
- * Adaptive understanding of user preferences and intent shifts
36
 
37
  ---
38
 
39
- ### **Evaluation Summary**
40
-
41
- | **Category** | **Accuracy** |
42
- | ------------------------------- | -----------: |
43
- | SINGLE TURN – SINGLE FUNCTION | 0.800 |
44
- | SINGLE TURN – PARALLEL FUNCTION | 0.660 |
45
- | MULTI TURN – USER ADJUST | 0.500 |
46
- | MULTI TURN – USER SWITCH | 0.620 |
47
- | SIMILAR API CALLS | 0.760 |
48
- | USER PREFERENCE HANDLING | 0.600 |
49
- | ATOMIC TASK – BOOLEAN | 0.880 |
50
- | ATOMIC TASK – ENUM | 0.940 |
51
- | ATOMIC TASK – NUMBER | 0.940 |
52
- | ATOMIC TASK – LIST | 0.920 |
53
- | ATOMIC TASK – OBJECT (DEEP) | 0.580 |
54
- | ATOMIC TASK – OBJECT (SHORT) | 0.800 |
55
-
56
- **Overall Accuracy:** **0.750 (75.0%)** in normal en task.
57
 
58
  ---
59
 
60
  ### **Leaderboard Reference**
61
-
62
- EvolLLM-Linh participates in evaluations aligned with **[ACEBench](https://chenchen0103.github.io/ACEBench/)** – a public leaderboard assessing LLM performance on **function calling, compositional reasoning, and multi-turn interaction**.
63
- Results here are preliminary and reflect internal benchmarking on the same task categories.
64
 
65
  ---
66
 
67
- ### Method
68
- - GRPO (Rule base reward + self confidence reward)
69
- - Evol Merging
70
-
71
 
72
- ## Support me at:
73
 
 
74
  <p align="center">
75
  <a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
76
  <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
@@ -78,8 +84,6 @@ Results here are preliminary and reflect internal benchmarking on the same task
78
  </p>
79
 
80
  ### **License**
81
-
82
- **MIT License** – free for research and non-commercial use with attribution.
83
  Β© 2025 beyoru.
84
-
85
  ---
 
1
+ ---
2
+ library_name: transformers
3
+ tags:
4
+ - tool
5
+ - function-calling
6
+ - agent
7
+ - merge
8
+ base_model:
9
+ - Qwen/Qwen3-4B-Instruct-2507
10
+ - beyoru/Qwen3-4B-I-1209
11
+ - Qwen/Qwen3-4B-Thinking-2507
12
+ datasets:
13
+ - Salesforce/xlam-function-calling-60k
14
+ ---
15
  library_name: transformers
16
  tags:
17
  - tool
 
23
  - Salesforce/xlam-function-calling-60k
24
  ---
25
 
 
 
 
26
  # 🧠 **Model Card – EvolLLM-Linh**
27
 
28
  ### **Model Overview**
29
 
30
+ **Name:** EvolLLM-Linh
31
+ **Version:** v1.0
32
+ **Release Date:** October 23, 2025
33
+ **Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
34
+ **Library:** 🤗 *Transformers*
35
 
36
+ **Purpose:**
37
+ EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.
38
+ It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
 
39
 
40
  **Key Capabilities:**
41
+ - Precise and context-aware API invocation
42
+ - Robust multi-turn dialogue consistency
43
+ - Adaptive understanding of user preferences and intent shifts
 
44
 
45
  ---
46
 
47
+ ### **Evaluation Comparison**
48
+
49
+ | **Category** | **EvolLLM-Linh** | **GPT-OSS-20B** | **xLAM-2-8b-fc-r** | **Qwen3-2507** |
50
+ | ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
51
+ | SINGLE TURN – SINGLE FUNCTION | 0.800 | 0.800 | 0.63 | 0.69 |
52
+ | SINGLE TURN – PARALLEL FUNCTION | 0.660 | 0.620 | 0.16 | 0.51 |
53
+ | MULTI TURN – USER ADJUST | 0.500 | 0.500 | 0.40 | 0.48 |
54
+ | MULTI TURN – USER SWITCH | 0.620 | 0.620 | 0.40 | 0.56 |
55
+ | SIMILAR API CALLS | 0.760 | 0.740 | 0.64 | 0.68 |
56
+ | USER PREFERENCE HANDLING | 0.600 | 0.640 | 0.62 | 0.64 |
57
+ | ATOMIC TASK – BOOLEAN | 0.880 | 0.960 | 0.70 | 0.68 |
58
+ | ATOMIC TASK – ENUM | 0.940 | 0.940 | 0.94 | 0.86 |
59
+ | ATOMIC TASK – NUMBER | 0.940 | 0.960 | 0.90 | 0.82 |
60
+ | ATOMIC TASK – LIST | 0.920 | 0.900 | 0.84 | 0.78 |
61
+ | ATOMIC TASK – OBJECT (DEEP) | 0.580 | 0.520 | 0.32 | 0.36 |
62
+ | ATOMIC TASK – OBJECT (SHORT) | 0.800 | 0.960 | 0.70 | 0.56 |
63
+ | **Overall Accuracy** | **0.750** | **0.760** | **0.61** | **0.64** |
 
64
 
65
  ---
66
 
67
  ### **Leaderboard Reference**
68
+ Both **EvolLLM-Linh** and **GPT-OSS-20B** are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** – assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
69
+ Results are **internal benchmarks** aligned with ACEBench task categories.
 
70
 
71
  ---
72
 
73
+ ### **Method**
74
+ - GRPO (Rule-based reward + self-confidence reward)
75
+ - Evol Merging
 
76
 
77
+ ---
78
 
79
+ ## **Support me at**
80
  <p align="center">
81
  <a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
82
  <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
 
84
  </p>
85
 
86
  ### **License**
87
+ **MIT License** – free for research and non-commercial use with attribution.
 
88
  Β© 2025 beyoru.
 
89
  ---
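
The **Overall Accuracy** row in the comparison table added by this commit can be sanity-checked with a few lines of Python. The sketch below assumes the overall figure is an unweighted mean of the twelve category scores. The README does not state how it is aggregated, so this is an assumption, though it does reproduce the 0.750 reported for EvolLLM-Linh. The dictionary and function names are illustrative only.

```python
# Per-category accuracies for EvolLLM-Linh, copied from the comparison table.
evolllm_linh_scores = {
    "SINGLE TURN - SINGLE FUNCTION": 0.800,
    "SINGLE TURN - PARALLEL FUNCTION": 0.660,
    "MULTI TURN - USER ADJUST": 0.500,
    "MULTI TURN - USER SWITCH": 0.620,
    "SIMILAR API CALLS": 0.760,
    "USER PREFERENCE HANDLING": 0.600,
    "ATOMIC TASK - BOOLEAN": 0.880,
    "ATOMIC TASK - ENUM": 0.940,
    "ATOMIC TASK - NUMBER": 0.940,
    "ATOMIC TASK - LIST": 0.920,
    "ATOMIC TASK - OBJECT (DEEP)": 0.580,
    "ATOMIC TASK - OBJECT (SHORT)": 0.800,
}


def unweighted_mean(scores):
    """Average the category scores, giving every category equal weight.

    Equal weighting is an assumption; the README does not document
    how the overall accuracy is computed.
    """
    return sum(scores.values()) / len(scores)


# Round to three decimals to match the precision used in the table.
overall = round(unweighted_mean(evolllm_linh_scores), 3)
print(f"Overall accuracy (unweighted mean): {overall:.3f}")
```

The same function can be applied to the other columns to check whether their overall rows follow the same aggregation.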