Scaling Continual Learning to 300+ Tasks with Bi-Level Routing Mixture-of-Experts
Paper • 2602.03473