Data Selection for Language Models via Importance Resampling Paper • 2302.03169 • Published Feb 6, 2023 • 1
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity Paper • 2305.13169 • Published May 22, 2023 • 4
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 58
LLaMA: Open and Efficient Foundation Language Models Paper • 2302.13971 • Published Feb 27, 2023 • 22
OpenAssistant Conversations -- Democratizing Large Language Model Alignment Paper • 2304.07327 • Published Apr 14, 2023 • 10
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner Paper • 2406.05498 • Published Jun 8, 2024 • 1
Universal and Transferable Adversarial Attacks on Aligned Language Models Paper • 2307.15043 • Published Jul 27, 2023 • 3
CyberOps AI: Red, Blue, Purple & Black Hat Defense Collection A cutting-edge collection of AI-driven models, datasets, and spaces dedicated to advancing the full spectrum of cybersecurity operations. • 6 items • Updated Feb 2, 2025 • 3
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 50
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 52
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published Dec 14, 2025 • 14
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated about 16 hours ago • 128
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face • Jul 29, 2025 • 219
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 109
WhiteRabbitNeo-V3 Collection The latest and most capable cybersecurity model we've ever created • 1 item • Updated Jun 25, 2025 • 20