Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection Paper • 2504.01931 • Published Apr 2, 2025
Enhancing Group Fairness in Online Settings Using Oblique Decision Forests Paper • 2310.11401 • Published Oct 17, 2023
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models Paper • 2307.12980 • Published Jul 24, 2023
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment Paper • 2411.18688 • Published Nov 27, 2024
Data-augmented phrase-level alignment for mitigating object hallucination Paper • 2405.18654 • Published May 28, 2024
Safety Alignment Should Be Made More Than Just a Few Tokens Deep Paper • 2406.05946 • Published Jun 10, 2024
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment Paper • 2404.12318 • Published Apr 18, 2024
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking Paper • 2312.09244 • Published Dec 14, 2023