arxiv:2511.10899
Farima Fatahi
farimafatahi
AI & ML interests
None yet
Recent Activity
authored a paper 22 days ago
FactBench: A Dynamic Benchmark for In-the-Wild Language Model Factuality Evaluation
authored a paper 22 days ago
Logit Arithmetic Elicits Long Reasoning Capabilities Without Training
authored a paper 22 days ago
From Proof to Program: Characterizing Tool-Induced Reasoning Hallucinations in Large Language Models