Techniques & Methods

Test-Time Compute

Letting an AI think longer at inference time — the breakthrough behind reasoning models.

Also known as: inference-time compute, extended thinking

Test-time compute is the idea that an AI model can give much better answers to hard problems if it is allowed to think longer before answering. Instead of generating a response immediately, the model works through the problem internally, sometimes for minutes: exploring multiple approaches, catching errors, and revising. This is the breakthrough behind GPT-5 Pro, Claude Opus 4.8 with extended thinking, Gemini 3.5 Pro "Deep Think", and DeepSeek-R1. The trade-off is cost: "thinking tokens" are billed at output prices, which on Opus 4.8 is $75 per million tokens. It is worth enabling for complex coding, multi-step maths, and research synthesis, and wasted spend for simple Q&A.
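To make the cost trade-off concrete, here is a minimal arithmetic sketch in Python. It uses only the $75 per million output-token price quoted above; the function name and the token counts are hypothetical, chosen purely for illustration, and input-token charges are ignored.

```python
# Price quoted in the text above: $75 per million output tokens.
OUTPUT_PRICE_PER_MILLION_USD = 75.0


def output_cost_usd(thinking_tokens: int, answer_tokens: int,
                    price_per_million: float = OUTPUT_PRICE_PER_MILLION_USD) -> float:
    """Output-side cost of one response, with thinking tokens billed as output tokens."""
    billed_tokens = thinking_tokens + answer_tokens
    return billed_tokens * price_per_million / 1_000_000


# Hypothetical token counts (assumptions, not measurements).
quick = output_cost_usd(thinking_tokens=0, answer_tokens=500)
deep = output_cost_usd(thinking_tokens=20_000, answer_tokens=500)

print(f"without thinking: ${quick:.4f}")
print(f"with thinking:    ${deep:.4f}")
```

Under these assumed numbers, the same 500-token answer costs about $0.04 without thinking and about $1.54 with 20,000 thinking tokens, which is why the extra spend only pays off on problems that actually need the reasoning.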
