arxiv:2512.22910

Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning

Published on Dec 28

Authors:

Abstract

Deep Q-learning algorithms remain notoriously unstable, especially during early training when the maximization operator amplifies estimation errors. Inspired by bounded rationality theory and developmental learning, we introduce Sat-EnQ, a two-phase framework that first learns to be ``good enough'' before optimizing aggressively. In Phase 1, we train an ensemble of lightweight Q-networks under a satisficing objective that limits early value growth using a dynamic baseline, producing diverse, low-variance estimates while avoiding catastrophic overestimation. In Phase 2, the ensemble is distilled into a larger network and fine-tuned with standard Double DQN. We prove theoretically that satisficing induces bounded updates and cannot increase target variance, with a corollary quantifying conditions for substantial reduction. Empirically, Sat-EnQ achieves 3.8x variance reduction, eliminates catastrophic failures (0% vs 50% for DQN), maintains 79% performance under environmental noise}, and requires 2.5x less compute than bootstrapped ensembles. Our results highlight a principled path toward robust reinforcement learning by embracing satisficing before optimization.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2512.22910 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2512.22910 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2512.22910 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.