beyoru/EvolLLM-Linh

@@ -1,4 +1,17 @@
----
 library_name: transformers
 tags:
 - tool
@@ -10,67 +23,60 @@ datasets:
 - Salesforce/xlam-function-calling-60k
 ---
 # 🧠 **Model Card — EvolLLM-Linh**
 ### **Model Overview**
-**Name:** EvolLLM-Linh \
-**Version:** v1.0 \
-**Release Date:** October 23, 2025 \
-**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) \
-**Library:** 🤗 *Transformers*
-**Purpose:**
-EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.
-It aims enhance the **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
 **Key Capabilities:**
-* Precise and context-aware API invocation
-* Robust multi-turn dialogue consistency
-* Adaptive understanding of user preferences and intent shifts
 ---
-### **Evaluation Summary**
-| **Category**                    | **Accuracy** |
-| ------------------------------- | -----------: |
-| SINGLE TURN – SINGLE FUNCTION   |        0.800 |
-| SINGLE TURN – PARALLEL FUNCTION |        0.660 |
-| MULTI TURN – USER ADJUST        |        0.500 |
-| MULTI TURN – USER SWITCH        |        0.620 |
-| SIMILAR API CALLS               |        0.760 |
-| USER PREFERENCE HANDLING        |        0.600 |
-| ATOMIC TASK – BOOLEAN           |        0.880 |
-| ATOMIC TASK – ENUM              |        0.940 |
-| ATOMIC TASK – NUMBER            |        0.940 |
-| ATOMIC TASK – LIST              |        0.920 |
-| ATOMIC TASK – OBJECT (DEEP)     |        0.580 |
-| ATOMIC TASK – OBJECT (SHORT)    |        0.800 |
-**Overall Accuracy:** **0.750 (75.0%)** in normal en task.
 ---
 ### **Leaderboard Reference**
-EvolLLM-Linh participates in evaluations aligned with **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — a public leaderboard assessing LLM performance on **function calling, compositional reasoning, and multi-turn interaction**.
-Results here are preliminary and reflect internal benchmarking on the same task categories.
 ---
-### Method
-- GRPO (Rule base reward + self confidence reward)
-- Evol Merging
-## Support me at:
 <p align="center">
   <a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
     <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
@@ -78,8 +84,6 @@ Results here are preliminary and reflect internal benchmarking on the same task
 </p>
 ### **License**
-**MIT License** — free for research and non-commercial use with attribution.
 © 2025 beyoru.
 ---

+---
+library_name: transformers
+tags:
+- tool
+- function-calling
+- agent
+- merge
+base_model:
+- Qwen/Qwen3-4B-Instruct-2507
+- beyoru/Qwen3-4B-I-1209
+- Qwen/Qwen3-4B-Thinking-2507
+datasets:
+- Salesforce/xlam-function-calling-60k
+---
 library_name: transformers
 tags:
 - tool
 - Salesforce/xlam-function-calling-60k
 ---
 # 🧠 **Model Card — EvolLLM-Linh**
 ### **Model Overview**
+**Name:** EvolLLM-Linh
+**Version:** v1.0
+**Release Date:** October 23, 2025
+**Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+**Library:** 🤗 *Transformers*
+**Purpose:**
+EvolLLM-Linh is a fine-tuned large language model designed for **function calling**.
+It aims to enhance **robustness, accuracy, and dialogue coherence** of LLMs operating in **API-driven or tool-using environments**.
 **Key Capabilities:**
+- Precise and context-aware API invocation
+- Robust multi-turn dialogue consistency
+- Adaptive understanding of user preferences and intent shifts
 ---
+### **Evaluation Comparison**
+| **Category**                    |  **EvolLLM-Linh** |  **GPT-OSS-20B**  | **xLAM-2-8b-fc-r** | **Qwen3-2507** |
+| ------------------------------- | :---------------: | :---------------: | :-------: | :-----------: |
+| SINGLE TURN – SINGLE FUNCTION   |       0.800       |       0.800       |    0.63   |      0.69     |
+| SINGLE TURN – PARALLEL FUNCTION |       0.660       |       0.620       |    0.16   |      0.51     |
+| MULTI TURN – USER ADJUST        |       0.500       |       0.500       |    0.40   |      0.48     |
+| MULTI TURN – USER SWITCH        |       0.620       |       0.620       |    0.40   |      0.56     |
+| SIMILAR API CALLS               |       0.760       |       0.740       |    0.64   |      0.68     |
+| USER PREFERENCE HANDLING        |       0.600       |       0.640       |    0.62   |      0.64     |
+| ATOMIC TASK – BOOLEAN           |       0.880       |       0.960       |    0.70   |      0.68     |
+| ATOMIC TASK – ENUM              |       0.940       |       0.940       |    0.94   |      0.86     |
+| ATOMIC TASK – NUMBER            |       0.940       |       0.960       |    0.90   |      0.82     |
+| ATOMIC TASK – LIST              |       0.920       |       0.900       |    0.84   |      0.78     |
+| ATOMIC TASK – OBJECT (DEEP)     |       0.580       |       0.520       |    0.32   |      0.36     |
+| ATOMIC TASK – OBJECT (SHORT)    |       0.800       |       0.960       |    0.70   |      0.56     |
+| **Overall Accuracy**            | **0.750**         |     **0.760**     |  **0.61** |    **0.64**   |
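
As a quick sanity check on the table above, the sketch below recomputes each model's overall score as the plain unweighted mean of its twelve category scores. The card does not say how the overall figure is aggregated, so the unweighted mean is an assumption. It reproduces the reported 0.750 for EvolLLM-Linh exactly and lands within 0.01 of the reported value for the other three models, which may reflect rounding or per-category weighting in the benchmark.

```python
# Per-category accuracies copied from the comparison table, in row order.
scores = {
    "EvolLLM-Linh": [0.800, 0.660, 0.500, 0.620, 0.760, 0.600,
                     0.880, 0.940, 0.940, 0.920, 0.580, 0.800],
    "GPT-OSS-20B": [0.800, 0.620, 0.500, 0.620, 0.740, 0.640,
                    0.960, 0.940, 0.960, 0.900, 0.520, 0.960],
    "xLAM-2-8b-fc-r": [0.63, 0.16, 0.40, 0.40, 0.64, 0.62,
                       0.70, 0.94, 0.90, 0.84, 0.32, 0.70],
    "Qwen3-2507": [0.69, 0.51, 0.48, 0.56, 0.68, 0.64,
                   0.68, 0.86, 0.82, 0.78, 0.36, 0.56],
}

# "Overall Accuracy" row as reported in the table.
reported = {
    "EvolLLM-Linh": 0.750,
    "GPT-OSS-20B": 0.760,
    "xLAM-2-8b-fc-r": 0.61,
    "Qwen3-2507": 0.64,
}


def unweighted_mean(values):
    """Plain arithmetic mean; assumes every category counts equally."""
    return sum(values) / len(values)


means = {model: unweighted_mean(vals) for model, vals in scores.items()}

for model, mean in means.items():
    gap = abs(mean - reported[model])
    print(f"{model}: unweighted mean {mean:.3f}, "
          f"reported {reported[model]}, gap {gap:.3f}")
```
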
 ---
 ### **Leaderboard Reference**
+Both **EvolLLM-Linh** and **GPT-OSS-20B** are benchmarked using **[ACEBench](https://chenchen0103.github.io/ACEBench/)** — assessing **function calling**, **compositional reasoning**, and **multi-turn interaction**.
+Results are **internal benchmarks** aligned with ACEBench task categories.
 ---
+### **Method**
+- GRPO (Rule-based reward + self-confidence reward)
+- Evol Merging
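
The card names the two reward types but does not describe how they are computed. The following is only an illustrative sketch of what a rule-based reward and a confidence-based reward for function calling could look like in general. Every name, weight, and the JSON call format here is a hypothetical choice, not the author's implementation.

```python
import json
import math


def rule_based_reward(completion: str, expected: dict) -> float:
    """Hypothetical rule-based reward for a single function call.

    Assumes the model emits a JSON object with "name" and "arguments".
    The 0.2 / 0.3 / 0.5 weights are arbitrary illustration values.
    """
    try:
        call = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # output is not valid JSON
    if not isinstance(call, dict):
        return 0.0  # valid JSON, but not a call object

    reward = 0.2  # well-formed call object
    if call.get("name") != expected["name"]:
        return reward
    reward += 0.3  # correct function name

    expected_args = expected.get("arguments", {})
    actual_args = call.get("arguments", {})
    if not isinstance(actual_args, dict):
        return reward
    if expected_args:
        # Partial credit: fraction of expected arguments matched exactly.
        matched = sum(
            1 for key, value in expected_args.items()
            if key in actual_args and actual_args[key] == value
        )
        reward += 0.5 * matched / len(expected_args)
    elif not actual_args:
        reward += 0.5  # no arguments expected and none given
    return reward


def self_confidence_reward(token_logprobs: list) -> float:
    """Hypothetical confidence score: geometric mean of token probabilities."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def combined_reward(completion, expected, token_logprobs, weight=0.1):
    """Sum of both signals; the 0.1 weight is an assumption."""
    return (rule_based_reward(completion, expected)
            + weight * self_confidence_reward(token_logprobs))


expected_call = {"name": "get_weather", "arguments": {"city": "Hanoi"}}
good = '{"name": "get_weather", "arguments": {"city": "Hanoi"}}'
print(combined_reward(good, expected_call, [-0.1, -0.2, -0.05]))
```

In a GRPO setup, a scalar like this would be computed for each sampled completion in a group and then normalized within the group; that step is left out of the sketch.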
+---
+## **Support me at**
 <p align="center">
   <a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
     <img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
 </p>
 ### **License**
+**MIT License** — free for research and non-commercial use with attribution.
 © 2025 beyoru.
 ---