More Thinking, More Hallucination
The prevailing wisdom in AI has been simple: make models think harder, and they'll get more accurate. Chain-of-thought prompting. Reasoning RL. Thinking tokens. The entire industry has been optimizing for one thing: deeper reasoning.
An ICLR 2026 paper just showed that, for tool-using agents, that assumption can fail badly.
"The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination" demonstrates that training AI agents to reason harder through reinforcement learning increases hallucination rates in lockstep with task performance. The smarter the model gets at solving problems, the worse it gets at knowing when to stop.
The Data
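The researchers tested multiple models on SimpleToolHalluBench, a benchmark measuring two types of failure: NTA (no tool available), where none of the provided tools can do the job and the model should decline, and DT (distractor tool), where only an irrelevant tool is offered and the model should not call it.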
| Model (Task) | Base Hallucination Rate | After Reasoning Enhancement | Relative Change |
|---|---|---|---|
| Qwen 2.5-7B (NTA) | **34.8%** | **74.3%** | +113% |
| Qwen 2.5-7B (DT) | 54.7% | 78.7% | +44% |
| Qwen 3-8B (DT) | 36.2% | 56.8% | +57% |
| Qwen 3-32B (DT) | 46.6% | 50.7% | +9% |
The worst case: on the NTA task, Qwen 2.5-7B goes from hallucinating tool calls 34.8% of the time to 74.3%, a 2.1x increase, just from having its reasoning enhanced through knowledge distillation from DeepSeek-R1.
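Note that the Relative Change column is an increase relative to the base rate, not a difference in percentage points: 34.8% to 74.3% is +39.5 points, or roughly +113% relative, which is the same 2.1x. The sketch below recomputes all three views from the table's rates. The numbers are copied from the table; the function names are just for illustration.

```python
# Hallucination rates (%) copied from the table: (base, after reasoning enhancement).
RATES: dict[str, tuple[float, float]] = {
    "Qwen 2.5-7B (NTA)": (34.8, 74.3),
    "Qwen 2.5-7B (DT)": (54.7, 78.7),
    "Qwen 3-8B (DT)": (36.2, 56.8),
    "Qwen 3-32B (DT)": (46.6, 50.7),
}


def point_change(base: float, after: float) -> float:
    """Absolute difference in percentage points."""
    return after - base


def relative_change(base: float, after: float) -> float:
    """Increase relative to the base rate, in percent (matches the Relative Change column)."""
    return (after - base) / base * 100.0


def multiplier(base: float, after: float) -> float:
    """How many times the base rate the post-enhancement rate is."""
    return after / base


for model, (base, after) in RATES.items():
    print(
        f"{model:<18} {point_change(base, after):+5.1f} pts  "
        f"{relative_change(base, after):+6.1f}%  {multiplier(base, after):.2f}x"
    )
```

The Qwen 3-32B row is the reminder that "percent change" can mislead in both directions: +9% relative is only about 4 points.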
Why This Happens
This isn't a bug in one model. The researchers found a mechanistic explanation:
The internal representations that should restrain bad tool calls are exactly what gets trained away during reasoning RL.
Using representation analysis based on CKA (centered kernel alignment) similarity between the model before and after RL, they showed that representations on tool-reliability inputs collapse to below 0.75 similarity in the early and middle layers, while representations on in-distribution inputs stay above 0.9. The model's ability to reason about tools gets better, but its ability to say "this tool doesn't exist" erodes.
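To make the 0.75 vs 0.9 numbers concrete, here is a minimal sketch of linear CKA, the most common variant, comparing one layer's activations on the same n inputs before and after training: CKA(X, Y) = ‖YᵀX‖²_F / (‖XᵀX‖_F · ‖YᵀY‖_F) with column-centered X and Y. This text doesn't say whether the paper uses linear or kernel CKA, or how it picks inputs and layers, so treat this as an illustration of the metric, not the paper's pipeline. A score of 1 means the representation is unchanged up to rotation and scale; lower scores mean it has moved.

```python
import math

Matrix = list[list[float]]  # n rows (inputs) x p columns (hidden units)


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every feature is zero-mean across the n inputs."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def transpose_times(a: Matrix, b: Matrix) -> Matrix:
    """Compute a^T b for a (n x p) and b (n x q); the result is p x q."""
    n = len(a)
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
        for i in range(len(a[0]))
    ]


def frobenius_sq(m: Matrix) -> float:
    """Squared Frobenius norm: the sum of squared entries."""
    return sum(v * v for row in m for v in row)


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two activation matrices recorded on the same n inputs.

    Returns a value in [0, 1]; it equals 1 when y is x up to rotation and uniform scaling.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same number of rows (inputs), at least 2")
    xc, yc = center_columns(x), center_columns(y)
    cross = frobenius_sq(transpose_times(yc, xc))               # ||Y^T X||_F^2
    norm_x = math.sqrt(frobenius_sq(transpose_times(xc, xc)))  # ||X^T X||_F
    norm_y = math.sqrt(frobenius_sq(transpose_times(yc, yc)))  # ||Y^T Y||_F
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("a representation is constant across inputs; CKA is undefined")
    return cross / (norm_x * norm_y)


# Toy usage: one layer's activations on 4 inputs, before and after a hypothetical training run.
before = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
rotated = [[-b, a] for a, b in before]  # same geometry, rotated 90 degrees
changed = [[0.0, 1.0], [5.0, -2.0], [1.0, 0.0], [-1.0, 3.0]]
print(linear_cka(before, rotated))  # ≈ 1.0: rotation alone does not count as change
print(linear_cka(before, changed))  # lower: the representation itself moved
```

Because CKA ignores rotation and scale, a drop below 0.75 signals that the geometry of the representation changed, not merely that the weights were reparameterized.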
Reasoning RL teaches the model to be more confident and more thorough. Those are exactly the wrong traits when the correct answer is "I don't know" or "I can't do that."
Can You Fix It?
The paper tested two mitigation strategies, and neither closes the gap. And this is on a controlled benchmark. In production, where prompts are messier and tool inventories are larger, the problem is likely to be worse.
Why This Matters Now
This isn't an academic curiosity. Consider:
- Every major AI lab is pushing reasoning models: OpenAI's o-series, DeepSeek-R1, Qwen's thinking mode, Anthropic's extended thinking
The industry is collectively building agents that are increasingly capable and increasingly unreliable at the same time. The reasoning trap isn't hypothetical: the same reasoning-enhanced model families are already deployed in production systems making real decisions.
The Contrarian Take
The AI industry's response to hallucination has been "make the model smarter." For tool use, this paper suggests that's exactly backwards.
You don't fix hallucination by adding more reasoning. You fix it by building systems that assume the model will hallucinate and handle it gracefully. Verification layers. Human-in-the-loop checkpoints. Tool call validation. Output grounding against source data.
The smartest AI teams aren't the ones with the biggest models. They're the ones that treat model output as unverified-by-default and build the infrastructure to catch the inevitable failures.
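Tool call validation is the most mechanical of these safeguards. Below is a minimal sketch, assuming a hypothetical in-house registry and model output shaped like `{"name": ..., "arguments": {...}}`; `ToolSpec`, `ValidationResult`, and `validate_tool_call` are illustrative names, not any framework's API. Every proposed call is checked against the tools that actually exist before anything executes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolSpec:
    """Hypothetical description of one tool the agent is actually allowed to call."""
    name: str
    required_args: dict[str, type]  # argument name -> expected Python type
    optional_args: dict[str, type] = field(default_factory=dict)


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def validate_tool_call(call: dict[str, Any], registry: dict[str, ToolSpec]) -> ValidationResult:
    """Reject a model-proposed tool call unless it names a registered tool with valid arguments."""
    name = call.get("name")
    spec = registry.get(name) if isinstance(name, str) else None
    if spec is None:
        # The model invoked a tool that does not exist: never execute it.
        return ValidationResult(False, [f"unknown tool: {name!r}"])
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ValidationResult(False, ["arguments must be an object"])

    errors: list[str] = []
    allowed = {**spec.required_args, **spec.optional_args}
    for arg in spec.required_args:
        if arg not in args:
            errors.append(f"missing required argument: {arg}")
    for arg, value in args.items():
        if arg not in allowed:
            errors.append(f"unexpected argument: {arg}")
        elif not isinstance(value, allowed[arg]):
            errors.append(
                f"argument {arg} should be {allowed[arg].__name__}, got {type(value).__name__}"
            )
    return ValidationResult(not errors, errors)


if __name__ == "__main__":
    registry = {"get_weather": ToolSpec("get_weather", {"city": str}, {"units": str})}
    print(validate_tool_call({"name": "book_flight", "arguments": {"to": "GIG"}}, registry))
    print(validate_tool_call({"name": "get_weather", "arguments": {"city": "Rio de Janeiro"}}, registry))
```

This catches invented tool names and malformed arguments. It cannot tell whether a real, registered tool is the right one for the request, which is the distractor-tool failure; that still needs output grounding or a human-in-the-loop checkpoint.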
We built an interactive explorer so you can see this trade-off for yourself. Toggle reasoning on and off across real models and watch the hallucination rates change in real time.
Source: "The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination", ICLR 2026, Rio de Janeiro.