The Smol Training Playbook 📚 • Running on CPU Upgrade • Featured • 2.82k • The secrets to building world-class LLMs
The Ultra-Scale Playbook 🌌 • Running • 3.63k • The ultimate guide to training LLM on large GPU Clusters
deepseek-ai/DeepSeek-R1-0528 Text Generation • 685B • Updated May 29, 2025 • 367k • 2.39k
Alibaba-NLP/gte-Qwen1.5-7B-instruct Sentence Similarity • 8B • Updated Jan 11, 2025 • 780 • 108
Salesforce/SFR-Embedding-Code-400M_R Feature Extraction • 0.4B • Updated Jan 24, 2025 • 11k • 34
Alibaba-NLP/gte-modernbert-base Sentence Similarity • 0.1B • Updated Jul 4, 2025 • 33k • 186
R3GAN Collection R3GAN: A Modern Baseline GAN https://github.com/brownvc/R3GAN/ https://arxiv.org/abs/2501.05441 • 7 items • Updated Jan 10, 2025 • 10
nomic-ai/modernbert-embed-base-unsupervised Sentence Similarity • 0.1B • Updated Dec 30, 2024 • 366 • 10
nomic-ai/modernbert-embed-base Sentence Similarity • 0.1B • Updated Jan 24, 2025 • 76.2k • 223
Scaling Test-Time Compute with Open Models Collection Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated Jan 6, 2025 • 29
Long Context RAG Performance of Large Language Models Paper • 2411.03538 • Published Nov 5, 2024 • 1