Trends Editor's Pick ·7 min read·June 4, 2026

Claude Opus 4.8 Released — What's New and Should You Switch from Opus 4.7?

Anthropic just shipped Opus 4.8 with extended thinking improvements, cheaper pricing, and better tool use. Here is what actually changed.

Quick answer

Anthropic released Claude Opus 4.8 on June 3. The headline changes: extended thinking mode now runs for up to 60 minutes on hard problems (was 12), tool-use reliability jumped to 96% on agentic benchmarks, and input pricing dropped 15% to $12.50 per million tokens. Output stays at $75/M. It now leads GPT-5 on every published benchmark Anthropic reports — by small but consistent margins. For coding, research synthesis, and long agentic workflows, it is the most capable AI you can access today. Existing Opus 4.7 users in claude.ai are auto-upgraded; API users opt in by switching model strings.

Anthropic's release cadence in 2026 has been steady — Opus 4.6 in February, Opus 4.7 in April, Opus 4.8 this week. Each release is incremental, none of them feel revolutionary on day one, but the cumulative improvements have moved Claude Opus from "second-best frontier model" in 2024 to "the model professionals reach for on hard work" in mid-2026. Here is what actually changed with Opus 4.8 and whether the upgrade matters for what you do.

What is new in Opus 4.8?

Extended thinking now runs up to 60 minutes (was 12) — useful for genuinely hard research, multi-step debugging, and long agentic loops
Tool use reliability hit 96% on Anthropic's internal agentic benchmark (Opus 4.7 was 89%) — fewer wasted tool calls, fewer retries
Input pricing cut from $15 to $12.50 per million tokens — a 15% reduction at the most expensive tier
Cached input pricing unchanged at $1.50/M — caching still saves 88%
Context window stays at 500,000 tokens — Anthropic prioritised depth over length
Coding benchmarks: SWE-Bench Verified 89.7% (up from 87.4%); now the highest published score by any model
GPQA Diamond: 94.3% (up from 93.1%) — leads GPT-5's 90.4% and Gemini 3.5 Pro's 88.7%
Improved instruction following — Anthropic specifically tuned for "do exactly what I asked, nothing more"

The single most impactful change in real work is the tool-use reliability bump. Going from 89% to 96% sounds small, but for agentic workflows (Devin, Cline, Claude Code) it is the difference between "Devin completes 65% of tickets unsupervised" and "Devin completes 80% of tickets unsupervised". That is a meaningful productivity shift.

How does it compare to GPT-5 and Gemini 3.5 Pro?

On published benchmarks, Opus 4.8 now leads both. The margins are small (1–4 percentage points across most tests) but consistent. The bigger story is task fit — different models still have different strengths even when their headline numbers converge.

Coding: Opus 4.8 leads decisively — SWE-Bench 89.7% vs GPT-5's 79.1% and Gemini 3.5 Pro's 76.4%
Long-document analysis: Opus 4.8 leads on extended-thinking benchmarks; Gemini still wins on raw context length (2M tokens)
Multi-step reasoning: Opus 4.8 and GPT-5 are essentially tied; Opus pulls ahead with extended thinking enabled
Writing quality: Opus 4.8 remains the preferred model in blind A/B tests for long-form prose
Multimodal (image + audio + video): Gemini still strongest — Anthropic has not prioritised this
Speed: Opus 4.8 is slightly faster than 4.7 but still slower than GPT-5 for casual chat
Ecosystem: GPT-5 still has more third-party plugins and integrations

Should you upgrade from Opus 4.7?

Three clear groups should upgrade immediately. Three groups can probably wait.

Upgrade now:

Engineering teams using Devin, Claude Code, or Cline — the tool-use reliability bump is real
Researchers doing long-document synthesis — the 60-minute extended thinking unlocks workflows that previously timed out
Anyone using Anthropic API at scale — the 15% input pricing cut applies automatically

Wait or stay on 4.7:

Casual claude.ai users — you will be auto-upgraded; no action needed
Heavy multimodal users — Gemini 3.5 Pro is still better for image/audio/video
Cost-sensitive prototyping — Claude Haiku 4.5 at $0.80/M is dramatically cheaper for simple tasks

How do you access Opus 4.8?

Three ways. (1) Claude.ai Pro and Team users are auto-upgraded — Opus 4.8 is now the default Opus tier in the web app. (2) Claude Code users get it on the next CLI update. (3) API users opt in by passing `claude-opus-4-8` as the model string. The deprecation notice on 4.7 has not yet been announced, but expect it within 6 months following Anthropic's normal lifecycle.

What this means for the frontier race

OpenAI is now the lab playing catch-up on raw capability benchmarks. GPT-5 launched in early 2026 with industry-leading scores; six months later, Opus 4.7 caught up; now Opus 4.8 has pulled ahead. The next OpenAI release — widely rumoured to be GPT-5.5 or a refreshed reasoning model — will need to retake the lead, or the narrative around "the best AI" shifts decisively to Anthropic.

For users, the competition is genuinely good news. Two labs at the frontier means faster iteration, more pricing pressure, and better tooling for everyone. The era of "one model is clearly better" is probably over for the foreseeable future.

Bottom line

Opus 4.8 is a meaningful upgrade for engineering teams, researchers, and API users — but a quiet one for casual chat. The headline is that Anthropic has reclaimed the benchmark lead from GPT-5 while cutting prices. The bigger story is the steady, incremental march of Claude from "interesting alternative" to "default for hard professional work" over the past 18 months. If your AI bills are mostly Anthropic, you have less work to do than ever. If they are not, this is the release that justifies revisiting your default.

What is new in Opus 4.8?

How does it compare to GPT-5 and Gemini 3.5 Pro?

Should you upgrade from Opus 4.7?

How do you access Opus 4.8?

What this means for the frontier race

Bottom line

What Is Sora 2 — and Is It Better Than Veo and Runway in 2026?

AI for Small Business in 2026 — 7 Tools That Actually Save Time

AI Voice Generators in 2026 — The 5 That Actually Sound Human