Techniques & Methods
Prompt Caching
A pricing trick that cuts AI API input cost by 50–90% for repeat-context workflows.
Also known as: cache,context caching
Prompt caching is a feature where AI providers cache the internal computation of a stable prompt prefix (a system prompt, retrieved documents, conversation history) and charge you dramatically less for it on repeat requests. The economics: Anthropic offers 90% off cached input. Google offers 75%. OpenAI offers 50%. For RAG, agentic workflows, or multi-turn chat — where the prefix repeats — caching is the single biggest cost lever available. A RAG chatbot with a 4,000-token system prompt at 10k conversations/day on Claude Sonnet 4.6 costs $12,000/mo without caching and roughly $1,500/mo with 95% cache hit rate. Same model, same output. Most developers in 2026 are not yet using it.
