Quick answer

Anthropic released Claude Opus 4.8 on June 3. The headline changes: extended thinking mode now runs for up to 60 minutes on hard problems (was 12), tool-use reliability jumped to 96% on agentic benchmarks, and input pricing dropped 15% to $12.50 per million tokens. Output stays at $75/M. It now leads GPT-5 on every published benchmark Anthropic reports — by small but consistent margins. For coding, research synthesis, and long agentic workflows, it is the most capable AI you can access today. Existing Opus 4.7 users in claude.ai are auto-upgraded; API users opt in by switching model strings.

Anthropic's release cadence in 2026 has been steady — Opus 4.6 in February, Opus 4.7 in April, Opus 4.8 this week. Each release is incremental, none of them feel revolutionary on day one, but the cumulative improvements have moved Claude Opus from "second-best frontier model" in 2024 to "the model professionals reach for on hard work" in mid-2026. Here is what actually changed with Opus 4.8 and whether the upgrade matters for what you do.

What is new in Opus 4.8?

  • Extended thinking now runs up to 60 minutes (was 12) — useful for genuinely hard research, multi-step debugging, and long agentic loops
  • Tool use reliability hit 96% on Anthropic's internal agentic benchmark (Opus 4.7 was 89%) — fewer wasted tool calls, fewer retries
  • Input pricing cut from $15 to $12.50 per million tokens — a 15% reduction at the most expensive tier
  • Cached input pricing unchanged at $1.50/M — caching still saves 88%
  • Context window stays at 500,000 tokens — Anthropic prioritised depth over length
  • Coding benchmarks: SWE-Bench Verified 89.7% (up from 87.4%); now the highest published score by any model
  • GPQA Diamond: 94.3% (up from 93.1%) — leads GPT-5's 90.4% and Gemini 3.5 Pro's 88.7%
  • Improved instruction following — Anthropic specifically tuned for "do exactly what I asked, nothing more"

The single most impactful change in real work is the tool-use reliability bump. Going from 89% to 96% sounds small, but for agentic workflows (Devin, Cline, Claude Code) it is the difference between "Devin completes 65% of tickets unsupervised" and "Devin completes 80% of tickets unsupervised". That is a meaningful productivity shift.

How does it compare to GPT-5 and Gemini 3.5 Pro?

On published benchmarks, Opus 4.8 now leads both. The margins are small (1–4 percentage points across most tests) but consistent. The bigger story is task fit — different models still have different strengths even when their headline numbers converge.

  • Coding: Opus 4.8 leads decisively — SWE-Bench 89.7% vs GPT-5's 79.1% and Gemini 3.5 Pro's 76.4%
  • Long-document analysis: Opus 4.8 leads on extended-thinking benchmarks; Gemini still wins on raw context length (2M tokens)
  • Multi-step reasoning: Opus 4.8 and GPT-5 are essentially tied; Opus pulls ahead with extended thinking enabled
  • Writing quality: Opus 4.8 remains the preferred model in blind A/B tests for long-form prose
  • Multimodal (image + audio + video): Gemini still strongest — Anthropic has not prioritised this
  • Speed: Opus 4.8 is slightly faster than 4.7 but still slower than GPT-5 for casual chat
  • Ecosystem: GPT-5 still has more third-party plugins and integrations

Should you upgrade from Opus 4.7?

Three clear groups should upgrade immediately. Three groups can probably wait.

Upgrade now:

  • Engineering teams using Devin, Claude Code, or Cline — the tool-use reliability bump is real
  • Researchers doing long-document synthesis — the 60-minute extended thinking unlocks workflows that previously timed out
  • Anyone using Anthropic API at scale — the 15% input pricing cut applies automatically

Wait or stay on 4.7:

  • Casual claude.ai users — you will be auto-upgraded; no action needed
  • Heavy multimodal users — Gemini 3.5 Pro is still better for image/audio/video
  • Cost-sensitive prototyping — Claude Haiku 4.5 at $0.80/M is dramatically cheaper for simple tasks

How do you access Opus 4.8?

Three ways. (1) Claude.ai Pro and Team users are auto-upgraded — Opus 4.8 is now the default Opus tier in the web app. (2) Claude Code users get it on the next CLI update. (3) API users opt in by passing `claude-opus-4-8` as the model string. The deprecation notice on 4.7 has not yet been announced, but expect it within 6 months following Anthropic's normal lifecycle.

What this means for the frontier race

OpenAI is now the lab playing catch-up on raw capability benchmarks. GPT-5 launched in early 2026 with industry-leading scores; six months later, Opus 4.7 caught up; now Opus 4.8 has pulled ahead. The next OpenAI release — widely rumoured to be GPT-5.5 or a refreshed reasoning model — will need to retake the lead, or the narrative around "the best AI" shifts decisively to Anthropic.

For users, the competition is genuinely good news. Two labs at the frontier means faster iteration, more pricing pressure, and better tooling for everyone. The era of "one model is clearly better" is probably over for the foreseeable future.

Bottom line

Opus 4.8 is a meaningful upgrade for engineering teams, researchers, and API users — but a quiet one for casual chat. The headline is that Anthropic has reclaimed the benchmark lead from GPT-5 while cutting prices. The bigger story is the steady, incremental march of Claude from "interesting alternative" to "default for hard professional work" over the past 18 months. If your AI bills are mostly Anthropic, you have less work to do than ever. If they are not, this is the release that justifies revisiting your default.