Core Concepts
Long Context
AI models that can hold millions of tokens at once — entire codebases, books, year-long threads.
Also known as: long context AI, long context window
Long context refers to AI models with very large context windows, typically 1M tokens and above. By mid-2026, Gemini 3.5 Pro holds 2M tokens, GPT-5.6 holds 1M, Llama 4 Behemoth holds 1M, and Claude Opus 4.8 holds 200K (with effective extended-thinking memory beyond that). Long context enables workflows that were impossible at 8K or 32K: fitting an entire codebase in the prompt for cross-file refactors, synthesising 50+ research papers in one query, analysing year-long Slack threads, and reviewing full legal contracts with their amendments. The trade-offs are cost (a 1M-token prompt is billed in full on every request unless it is cached), latency (the first token can take 5-15 seconds), and reliability ("needle in a haystack" retrieval accuracy drops near the boundaries of the context window). Prompt caching is critical to making long context affordable in production.
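To make the cost and caching trade-off concrete, here is a minimal sketch of the arithmetic. All prices are hypothetical placeholders, not any vendor's real rates, and the names `Pricing`, `prompt_cost`, and `session_cost` are invented for illustration. It also assumes the simplest caching model: the first request pays the full input rate for the shared prefix, later requests pay a discounted cached rate, and there is no cache-write surcharge or expiry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million input tokens (assumed values)."""
    input_per_mtok: float          # normal input rate
    cached_input_per_mtok: float   # discounted rate for a cached prefix


def prompt_cost(prefix_tokens: int, fresh_tokens: int,
                pricing: Pricing, cache_hit: bool) -> float:
    """Input cost of one request: a large shared prefix plus a small new question."""
    prefix_rate = (pricing.cached_input_per_mtok if cache_hit
                   else pricing.input_per_mtok)
    total = prefix_tokens * prefix_rate + fresh_tokens * pricing.input_per_mtok
    return total / 1_000_000


def session_cost(prefix_tokens: int, fresh_tokens: int, queries: int,
                 pricing: Pricing, use_cache: bool) -> float:
    """Input cost of asking `queries` questions against the same prefix."""
    total = 0.0
    for i in range(queries):
        # Assumption: only the first request misses the cache.
        hit = use_cache and i > 0
        total += prompt_cost(prefix_tokens, fresh_tokens, pricing, hit)
    return total


# Assumed example: a 1M-token codebase, 20 questions of 1,000 tokens each.
example = Pricing(input_per_mtok=2.00, cached_input_per_mtok=0.20)
uncached = session_cost(1_000_000, 1_000, 20, example, use_cache=False)
cached = session_cost(1_000_000, 1_000, 20, example, use_cache=True)
print(f"without caching: ${uncached:.2f}")  # $40.04 under the assumed prices
print(f"with caching:    ${cached:.2f}")    # $5.84 under the assumed prices
```

Under these assumed prices the shared prefix dominates the bill, so the saving grows with the number of queries that reuse it. Output tokens are left out because they do not depend on the prompt size.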

