| --- |
| base_model: |
| - meta-llama/Llama-3.1-8B-Instruct |
| library_name: peft |
| license: mit |
| datasets: |
| - Roihn/Einstein-Puzzles-Data |
| language: |
| - en |
| --- |
| # Einstein-Puzzles |
|
|
**Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry** ([arXiv](https://arxiv.org/abs/2510.25595))
|
|
| *Run Peng\*, Ziqiao Ma\*, Amy Pang, Sikai Li, Zhang Xi-Jia, Yingzhuo Yu, Cristian-Paul Bara, Joyce Chai* |
|
|
| ## Model Details |
|
|
For all model fine-tuning, we employ LoRA with a rank of 32, training for 1 epoch with a global batch size of 128 and a learning rate of 2e-4 on a cosine decay schedule. Fine-tuning is conducted with [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), and FlashAttention-2 is used to speed up training. The process takes approximately 30 minutes on 4 A40 GPUs with 48GB of VRAM each.
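
For reference, the LoRA setup above corresponds roughly to the following PEFT configuration. This is a minimal sketch: only the rank of 32 is reported here, while `lora_alpha`, `lora_dropout`, and `target_modules` are assumptions, not values confirmed by the paper.

```python
from peft import LoraConfig

# Sketch of the LoRA hyperparameters described above.
# Only r=32 comes from this card; the other values are assumed defaults.
lora_config = LoraConfig(
    r=32,                  # LoRA rank used for fine-tuning
    lora_alpha=32,         # assumption: common choice of alpha == rank
    lora_dropout=0.0,      # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: attention projections
    task_type="CAUSAL_LM",
)
```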
|
|
This repo provides the fine-tuned model with full capabilities for information providing, information seeking, and chain-of-thought reasoning.
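
A minimal loading sketch with `transformers` and `peft` is shown below. The adapter repo id is an assumption; substitute this repository's id on the Hub.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER = "Roihn/Einstein-Puzzles"  # assumption: replace with this repo's Hub id

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)

# Example generation with the Llama chat template.
messages = [{"role": "user", "content": "Let's solve the puzzle together."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```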
|
|
| ## Citation |
| ```bibtex |
@misc{peng2025communicationverificationllmagents,
      title={Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry},
      author={Run Peng and Ziqiao Ma and Amy Pang and Sikai Li and Zhang Xi-Jia and Yingzhuo Yu and Cristian-Paul Bara and Joyce Chai},
      year={2025},
      eprint={2510.25595},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.25595},
}
| ``` |
|
|