Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels Aug 18 • 88
Article Asynchronous Robot Inference: Decoupling Action Prediction and Execution +5 Jul 10 • 45
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9 • 722
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 Jul 1 • 130
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy +4 Sep 18, 2024 • 272
Article 💥 Building a Vulnerable Bank MCP — Then Automating an Agent to Hack It Jun 18 • 8
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 143
Changelog Xet is now the default storage option for new users and organizations May 23 • 74
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16 • 55
Article Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models. May 15 • 36
Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 76