Vanishing Gradients in Reinforcement Finetuning of Language Models Paper • 2310.20703 • Published Oct 31, 2023
LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures Paper • 2312.04000 • Published Dec 7, 2023
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks Paper • 2407.03475 • Published Jul 3, 2024
Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers Paper • 2509.24317 • Published Sep 29, 2025
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models Paper • 2501.12370 • Published Jan 21, 2025