Quick answer

Google launched Gemini 3.5 Flash in May 2026 — a small, fast, cheap model that punches above its weight. It is roughly 2x faster than Gemini 3 Pro at a fraction of the price, while matching GPT-5 mini and beating Claude Haiku 4.5 on most benchmarks. For developers, it is the new default for high-volume tasks. For everyday users, it is the model now powering most Gemini app responses.

Frontier models grab the headlines, but the small models are where most actual AI usage happens. Google's Gemini 3.5 Flash launch this week is a quiet but significant moment — it pushes the price-to-capability frontier forward and changes the calculus for anyone building AI products at scale.

What is Gemini 3.5 Flash?

Gemini 3.5 Flash is the latest in Google's "Flash" tier — small models optimised for speed and cost rather than absolute capability. It sits below Gemini 3.5 Pro in Google's lineup but is significantly more capable than the Gemini 2.5 Flash it replaces. Crucially, it is multimodal by default — text, images, audio, and video all work natively.

What is actually new?

  • Roughly 2x faster than Gemini 3 Pro on identical workloads
  • Pricing: $0.10 input / $0.30 output per million tokens — about half of GPT-5 mini
  • 1 million token context window (matches Pro tier)
  • Native tool use — function calling, code execution, web search built in
  • Best-in-class for non-English languages, especially Hindi, Arabic, and Indonesian
  • Improved reasoning — closes much of the gap to Pro on multi-step tasks

How does it compare to GPT-5 mini and Claude Haiku 4.5?

On most general benchmarks, Gemini 3.5 Flash and GPT-5 mini are roughly tied. Claude Haiku 4.5 is slightly behind on raw capability but ahead on writing quality. Where Gemini 3.5 Flash genuinely leads: multimodal tasks (especially video understanding), non-English languages, and price-per-token at scale.

Benchmark numbers from Google's announcement: Gemini 3.5 Flash scores 78.4% on GPQA Diamond (PhD-level science questions) — up from 71.2% for Gemini 2.5 Flash, and ahead of GPT-5 mini at 75.8%. On HumanEval coding, it lands at 88.1% vs Claude Haiku 4.5's 86.4%.

Who should care about this launch?

  • Developers building AI features at scale — the price drop is real and significant
  • Anyone using the free Gemini app — Google routes most queries to Flash, so quality just got better
  • Startups choosing a default LLM — Gemini 3.5 Flash is genuinely competitive with the OpenAI default
  • Teams in non-English markets — multilingual performance is best-in-class for the price

How do you access it?

Three ways: (1) The free Gemini app at gemini.google.com — most free responses now come from 3.5 Flash. (2) Google AI Studio for developers — free tier with generous limits. (3) Vertex AI for enterprise, with the same per-token pricing as the API. Roll-out started May 19 and is fully global by end of May.

What is the catch?

Two things. First, it is still a "small" model — for genuinely hard reasoning, long-form writing, or complex agentic workflows, Gemini 3.5 Pro and Claude Opus 4.7 are meaningfully better. Second, Google's API has historically had rougher edges than OpenAI's and Anthropic's; rate limits and error handling can be inconsistent at scale.

Bottom line

Gemini 3.5 Flash is the new default small model. If you are building AI features at scale or just opening the free Gemini app, you are about to use this model a lot. The frontier moved this week — quietly, but meaningfully.