An ICLR 2026 paper found that making AI agents reason harder makes them hallucinate more, not less. Toggle reasoning on and watch it happen.
Toggle to see what happens when you turn on chain-of-thought / reasoning RL
Base Model · 7B params
Base Model · 8B params
Base Model · 32B params
The part of the network that should restrain bad tool calls is exactly what gets trained away during reasoning RL. The model learns to reason harder, but in the process its guardrails against hallucination collapse.
This isn't a bug in one model. It's a fundamental trade-off in how we train AI agents today. Every major reasoning method — RL, distillation, chain-of-thought — amplifies the same problem.
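To make "guardrails against tool hallucination" concrete: one common safeguard is a schema check that rejects tool calls the agent was never given. The sketch below is illustrative only, with made-up tool names and a hypothetical registry; it is not the paper's method or any particular product's code.

```python
# Illustrative guardrail sketch (hypothetical tool names, not the paper's code).
# It flags calls to tools the agent was never offered, and calls with unknown
# arguments -- two concrete forms of the tool hallucination described above.
from dataclasses import dataclass


@dataclass
class ToolSpec:
    name: str
    params: set[str]  # allowed parameter names for this tool


AVAILABLE_TOOLS = {
    "search_orders": ToolSpec("search_orders", {"customer_id", "since"}),
    "refund_order": ToolSpec("refund_order", {"order_id", "amount"}),
}


def validate_tool_call(name: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks legitimate."""
    problems = []
    spec = AVAILABLE_TOOLS.get(name)
    if spec is None:
        problems.append(f"hallucinated tool: {name!r} is not in the registry")
        return problems
    unknown = set(args) - spec.params
    if unknown:
        problems.append(f"hallucinated arguments for {name!r}: {sorted(unknown)}")
    return problems


# Example: the model invents a tool and a parameter it was never offered.
print(validate_tool_call("cancel_subscription", {"user": "42"}))
print(validate_tool_call("refund_order", {"order_id": "A-17", "reason": "late"}))
```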
Enterprise Impact: 47% made decisions on hallucinated content
Agents in Production: 96% of enterprises run AI agents
Worst Case: 2.1x hallucination increase with reasoning
"The Reasoning Trap: How Enhancing LLM Reasoning Amplifies Tool Hallucination"
ICLR 2026 · Rio de Janeiro · SimpleToolHalluBench
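A figure like the 2.1x above summarizes a ratio of hallucination rates: the fraction of benchmark tasks (for example, in SimpleToolHalluBench) where the agent emits at least one invalid tool call, with reasoning on versus off. Below is a minimal, hypothetical sketch of how such a rate could be measured; `run_agent` and `validate` are stand-ins for whatever model and checker you use, not the paper's evaluation harness.

```python
def hallucination_rate(episodes, run_agent, validate) -> float:
    """Fraction of tasks where the agent emits at least one hallucinated tool call.

    Hypothetical stand-ins: `episodes` is an iterable of task prompts,
    `run_agent(prompt)` returns the (tool_name, args) calls the agent made, and
    `validate(name, args)` returns a non-empty list of problems for a bad call
    (e.g. the validate_tool_call sketch earlier on this page).
    """
    episodes = list(episodes)
    if not episodes:
        return 0.0
    bad = sum(
        1
        for prompt in episodes
        if any(validate(name, args) for name, args in run_agent(prompt))
    )
    return bad / len(episodes)


# Running this for the same agent with reasoning off vs. on gives two rates;
# their ratio is the kind of number a "2.1x increase" summarizes.
```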
Read the full paper →
We help teams ship reliable AI systems, with guardrails that actually work.