---
license: apache-2.0
pipeline_tag: text-generation
extra_gated_prompt: "You agree to not use this model (or future versions) to conduct experiments that cause harm to any person or group."
extra_gated_fields:
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Work
      - Research
      - Education
      - Hobby
      - label: Other
        value: other
  I agree to use this model in good faith ONLY: checkbox
---


In this repository, we present the next iteration of `arco`, a meta-learner small language model, now built on `qwen` as the base architecture. In previous research, we noticed that the earlier `arco` series underperformed dramatically on few-shot prompting, despite its benchmark improvements on ARC. We therefore decided to work on more robust few-shot learning by focusing directly on tasks that improve that skill, starting from a stronger baseline model from the `qwen` family. After several merging iterations with openly available models, we arrived at a strong baseline for a meta-learner model, which we call [`arco-3`](https://huggingface.co/appvoid/arco-3-gguf). This model will serve as the starting point for future few-shot finetunings and experiments.

#### prompt

There is intentionally no prompt set.

#### benchmarks

##### **meta arena**

We tested around 65 models against each other on few-shot tasks and used `gemini-2.5-pro` to choose the best answer among them. Currently, `arco-3` ranks 13th in [meta-arena](https://huggingface.co/spaces/appvoid/meta-arena).
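Because no prompt template is set, a few-shot task is simply a plain text completion: a handful of solved examples followed by one unsolved query. The sketch below shows one way to assemble such a prompt. The `Input:`/`Output:` labels, the helper name `build_fewshot_prompt`, and the sentiment examples are illustrative assumptions, not a format this card prescribes.

```python
def build_fewshot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Join solved examples and one unsolved query into a plain completion prompt.

    The labels are hypothetical; any consistent pattern should work the same way.
    """
    blocks = [
        f"{input_label}: {text}\n{output_label}: {answer}"
        for text, answer in examples
    ]
    # The final block leaves the output empty so the model can complete it.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Hypothetical classification examples, used only for illustration.
examples = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_fewshot_prompt(examples, "Best purchase I have made this year.")
print(prompt)
```

The resulting string can be passed as-is to whatever completion interface you use to run the model. Stopping generation at the first blank line keeps the output to a single answer.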


##### **variance**

We also compared the model against some popular small models on the "power" distribution across the 5 language modeling benchmarks we typically use.

##### **language modeling**

To our surprise, this model also improved on the base model across several well-known language modeling benchmarks.

| Parameters | Model  | MMLU      | ARC-C     | HellaSwag | PIQA      | Winogrande | Average   |
| ---------- | ------ | --------- | --------- | --------- | --------- | ---------- | --------- |
| 0.6b       | qwen 3 | 40.31     | 34.47     | 47.38     | 67.46     | 56.04      | 49.13     |
| 0.6b       | arco 3 | **43.34** | **36.01** | **49.56** | **68.17** | **58.09**  | **51.03** |

#### strengths

- Strong bias to format
- Excellent classifier
- State-of-the-art paraphrasing
- Vocabulary/Idiomatic understanding

#### limitations

- Lack of creative outputs
- Extremely poor summarization skills
- Poor causality understanding
- Hallucinations

We plan to tackle each of these issues in future versions.

#### supporters

Buy Me A Coffee

#### trivia

`arco` means "bow" in Spanish, which is just another way of saying that it hits its target fast and accurately.

**Note**: the model has not been tested as a chat assistant and might not work as intended; use with caution.
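As a quick sanity check on the language modeling table above, the Average column can be reproduced as the unweighted mean of the five benchmark scores. This is a minimal sketch: the scores are copied from the table, and the variable names are illustrative.

```python
# Scores copied from the language modeling table, in this order:
# MMLU, ARC-C, HellaSwag, PIQA, Winogrande.
scores = {
    "qwen 3": [40.31, 34.47, 47.38, 67.46, 56.04],
    "arco 3": [43.34, 36.01, 49.56, 68.17, 58.09],
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Round to two decimals to match the table's precision.
averages = {name: round(mean(vals), 2) for name, vals in scores.items()}
gain = round(mean(scores["arco 3"]) - mean(scores["qwen 3"]), 2)

print(averages)
print(gain)
```

The recomputed averages match the table (49.13 and 51.03), which puts the average gain over the base model at about 1.9 points.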