EvolLLM-Linh / README.md

Update README.md

561865e verified about 2 months ago

3.92 kB

	---
	library_name: transformers
	tags:
	- tool
	- function-calling
	- agent
	- merge
	base_model:
	- Qwen/Qwen3-4B-Instruct-2507
	- beyoru/Qwen3-4B-I-1209
	- Qwen/Qwen3-4B-Thinking-2507
	datasets:
	- Salesforce/xlam-function-calling-60k
	- beyoru/xlam-instruct-grpo
	---


	# 🧠 Model Card — EvolLLM-Linh

	### Model Overview

	Name: EvolLLM-Linh
	Version: v1.0
	Release Date: October 23, 2025
	Base Model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
	Library: 🤗 Transformers

	<p align="center">
	<img src="hyacine-hsr.gif" width="150">
	</p>

	Purpose:
	EvolLLM-Linh is a fine-tuned large language model designed for function calling.
	It aims to enhance robustness, accuracy, and dialogue coherence of LLMs operating in API-driven or tool-using environments.

	Key Capabilities:
	- Precise and context-aware API invocation
	- Robust multi-turn dialogue consistency
	- Adaptive understanding of user preferences and intent shifts


	### Evaluation Comparison

	\| Category \| EvolLLM-Linh \| GPT-OSS-20B \| Llama \| Qwen-2507 \| MinCoder-4B-Expert \|
	\| ------------------------------- \| :---------------: \| :---------------: \| :-------: \| :-----------: \| :---------------: \|
	\| SINGLE TURN – SINGLE FUNCTION \| 0.800 \| 0.800 \| 0.63 \| 0.69 \| 0.81 \|
	\| SINGLE TURN – PARALLEL FUNCTION \| 0.660 \| 0.620 \| 0.16 \| 0.51 \| 0.66 \|
	\| MULTI TURN – USER ADJUST \| 0.500 \| 0.500 \| 0.40 \| 0.48 \| 0.50 \|
	\| MULTI TURN – USER SWITCH \| 0.620 \| 0.620 \| 0.40 \| 0.56 \| 0.64 \|
	\| SIMILAR API CALLS \| 0.760 \| 0.740 \| 0.64 \| 0.68 \| 0.76 \|
	\| USER PREFERENCE HANDLING \| 0.600 \| 0.640 \| 0.62 \| 0.64 \| 0.60 \|
	\| ATOMIC TASK – BOOLEAN \| 0.880 \| 0.960 \| 0.70 \| 0.68 \| 0.88 \|
	\| ATOMIC TASK – ENUM \| 0.940 \| 0.940 \| 0.94 \| 0.86 \| 0.96 \|
	\| ATOMIC TASK – NUMBER \| 0.940 \| 0.960 \| 0.90 \| 0.82 \| 0.94 \|
	\| ATOMIC TASK – LIST \| 0.920 \| 0.900 \| 0.84 \| 0.78 \| 0.94 \|
	\| ATOMIC TASK – OBJECT (DEEP) \| 0.580 \| 0.520 \| 0.32 \| 0.36 \| 0.62 \|
	\| ATOMIC TASK – OBJECT (SHORT) \| 0.800 \| 0.960 \| 0.70 \| 0.56 \| 0.82 \|
	\| Overall Accuracy \| 0.750 (75.0%) \| 0.760 (76.0%) \| 0.61 \| 0.64 \| 0.761 \|


	---

	> Note:
	> We evaluate all models with the same configuration.
	> If you find any incorrect or inconsistent result, please report it for verification.
	> This ensures transparency and reproducibility across benchmarks.

	### Leaderboard Reference
	all model are benchmarked using [ACEBench](https://chenchen0103.github.io/ACEBench/) — assessing function calling, compositional reasoning, and multi-turn interaction.
	Results are internal benchmarks aligned with ACEBench task categories.

	---

	### Method
	- GRPO (Rule-based reward + self-confidence reward)
	- Evol Merging

	---

	## Support me at
	<p align="center">
	<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
	<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
	</a>
	</p>

	### License

	MIT License — free for research and non-commercial use with attribution.
	© 2025 beyoru.
	---

	---
	library_name: transformers
	tags:
	- tool
	- function-calling
	- agent
	- merge
	base_model:
	- Qwen/Qwen3-4B-Instruct-2507
	- beyoru/Qwen3-4B-I-1209
	- Qwen/Qwen3-4B-Thinking-2507
	datasets:
	- Salesforce/xlam-function-calling-60k
	- beyoru/xlam-instruct-grpo
	---


	# 🧠 Model Card — EvolLLM-Linh

	### Model Overview

	Name: EvolLLM-Linh
	Version: v1.0
	Release Date: October 23, 2025
	Base Model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
	Library: 🤗 Transformers

	<p align="center">
	<img src="hyacine-hsr.gif" width="150">
	</p>

	Purpose:
	EvolLLM-Linh is a fine-tuned large language model designed for function calling.
	It aims to enhance robustness, accuracy, and dialogue coherence of LLMs operating in API-driven or tool-using environments.

	Key Capabilities:
	- Precise and context-aware API invocation
	- Robust multi-turn dialogue consistency
	- Adaptive understanding of user preferences and intent shifts


	### Evaluation Comparison

	\| Category \| EvolLLM-Linh \| GPT-OSS-20B \| Llama \| Qwen-2507 \| MinCoder-4B-Expert \|
	\| ------------------------------- \| :---------------: \| :---------------: \| :-------: \| :-----------: \| :---------------: \|
	\| SINGLE TURN – SINGLE FUNCTION \| 0.800 \| 0.800 \| 0.63 \| 0.69 \| 0.81 \|
	\| SINGLE TURN – PARALLEL FUNCTION \| 0.660 \| 0.620 \| 0.16 \| 0.51 \| 0.66 \|
	\| MULTI TURN – USER ADJUST \| 0.500 \| 0.500 \| 0.40 \| 0.48 \| 0.50 \|
	\| MULTI TURN – USER SWITCH \| 0.620 \| 0.620 \| 0.40 \| 0.56 \| 0.64 \|
	\| SIMILAR API CALLS \| 0.760 \| 0.740 \| 0.64 \| 0.68 \| 0.76 \|
	\| USER PREFERENCE HANDLING \| 0.600 \| 0.640 \| 0.62 \| 0.64 \| 0.60 \|
	\| ATOMIC TASK – BOOLEAN \| 0.880 \| 0.960 \| 0.70 \| 0.68 \| 0.88 \|
	\| ATOMIC TASK – ENUM \| 0.940 \| 0.940 \| 0.94 \| 0.86 \| 0.96 \|
	\| ATOMIC TASK – NUMBER \| 0.940 \| 0.960 \| 0.90 \| 0.82 \| 0.94 \|
	\| ATOMIC TASK – LIST \| 0.920 \| 0.900 \| 0.84 \| 0.78 \| 0.94 \|
	\| ATOMIC TASK – OBJECT (DEEP) \| 0.580 \| 0.520 \| 0.32 \| 0.36 \| 0.62 \|
	\| ATOMIC TASK – OBJECT (SHORT) \| 0.800 \| 0.960 \| 0.70 \| 0.56 \| 0.82 \|
	\| Overall Accuracy \| 0.750 (75.0%) \| 0.760 (76.0%) \| 0.61 \| 0.64 \| 0.761 \|


	---

	> Note:
	> We evaluate all models with the same configuration.
	> If you find any incorrect or inconsistent result, please report it for verification.
	> This ensures transparency and reproducibility across benchmarks.

	### Leaderboard Reference
	all model are benchmarked using [ACEBench](https://chenchen0103.github.io/ACEBench/) — assessing function calling, compositional reasoning, and multi-turn interaction.
	Results are internal benchmarks aligned with ACEBench task categories.

	---

	### Method
	- GRPO (Rule-based reward + self-confidence reward)
	- Evol Merging

	---

	## Support me at
	<p align="center">
	<a href="https://www.buymeacoffee.com/ductransa0g" target="_blank">
	<img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" width="150px">
	</a>
	</p>

	### License

	MIT License — free for research and non-commercial use with attribution.
	© 2025 beyoru.
	---