FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions Paper • 2310.15421 • Published Oct 24, 2023
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting Paper • 2310.11324 • Published Oct 17, 2023
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement Paper • 2310.08559 • Published Oct 12, 2023
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023
Faith and Fate: Limits of Transformers on Compositionality Paper • 2305.18654 • Published May 29, 2023