ColBERT-Zero Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 18 days ago • 19
Article LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family • Jan 19 • 88
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5, 2025 • 305
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 49 items • Updated 19 days ago • 139
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated about 9 hours ago • 155
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 660
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 309
xLAM models Collection xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 19 items • Updated 19 days ago • 59
GLiNER bi-encoders Collection Bi-encoder and poly-encoder architectures of GLiNER • 5 items • Updated Jan 29 • 13
SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5, 2025 • 246
CommonCatalog Collection Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 7 items • Updated 19 days ago • 16
Article Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval • Mar 22, 2024 • 128
Idefics2 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6, 2024 • 92
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11, 2024 • 80
Vector-io compatible Datasets Collection These datasets can be loaded into your vector database with a single line bash command • 15 items • Updated Sep 19, 2024 • 3
Pokemons dataset captioned with different models Collection The Pokemons dataset from Lambda Labs is quite popular in the diffusion community because it lets us quickly validate ideas. • 3 items • Updated Nov 28, 2023 • 3