Prompt caching (supported by some providers) lets the model reuse a prefix of the prompt across requests instead of reprocessing it every time. You send a long prefix once (e.g. system message + large doc) and the API caches it. Later requests with the same prefix process only the new part (e.g. the latest user message); the cached portion is typically billed at a steep discount rather than full input price. That cuts both cost and latency when you have many turns over the same context. Not all APIs support it; when they do, you usually mark which part of the prompt is cacheable.
Prompt caching: reuse the prefix
Without caching
Same prefix re-sent and re-processed every time.
With caching
Prefix (e.g. system + doc) stored; you pay less and get lower latency for follow-up turns.
Supported by some APIs (e.g. Anthropic, Vertex). Best when the same long context is used across many requests.
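A minimal sketch of marking a prompt prefix as cacheable, modeled on Anthropic's Messages API `cache_control` field. It builds the request as a plain dict so it runs without a network call or API key; with the real SDK you would pass the same structure to `client.messages.create(...)`. The model name, document text, and system prompt are placeholders, and other providers use different mechanisms (e.g. implicit prefix caching).

```python
# Assumed placeholder for the large cacheable context (e.g. a 100-page doc).
LARGE_DOC = "...full text of the reference document..."

def build_request(user_message: str) -> dict:
    """Build a request whose long prefix (system + doc) is marked cacheable."""
    return {
        "model": "claude-sonnet-4-20250514",  # assumed model name
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": "You are a support agent."},
            {
                "type": "text",
                "text": LARGE_DOC,
                # Anthropic-style marker: everything up to and including
                # this block becomes the cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # Only this part changes between turns, so only it is
        # processed at full price on cache hits.
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("How do I reset my password?")
```

The key design point: the cacheable prefix must be byte-identical across requests, so anything that varies per turn (the user message) goes after the cache marker, never inside it.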
Example: Typical use
Long system prompt + 100-page doc as context for a support agent. First turn: full cost (on some providers plus a small cache-write surcharge). Next 10 turns: the prefix is served from cache at the discounted rate and only the new user/assistant messages are processed in full. Ideal for chat-over-docs or agents with a fixed, large context.
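The savings in the scenario above can be put in back-of-envelope numbers. This sketch uses illustrative (made-up) rates — cache reads at 10% of the normal input price and cache writes at 125%, in the ballpark of Anthropic's published pricing — and assumed token counts; real rates and counts vary by provider and model.

```python
# Assumed token counts and prices for illustration only.
PREFIX_TOKENS = 80_000        # system prompt + 100-page doc
TURN_TOKENS = 500             # new tokens per turn
INPUT_PRICE = 3.00 / 1_000_000  # $ per input token (illustrative)
CACHE_WRITE = 1.25 * INPUT_PRICE  # surcharge to write the cache
CACHE_READ = 0.10 * INPUT_PRICE   # discounted rate for cache hits

def cost(turns: int, cached: bool) -> float:
    """Total input cost over `turns` requests sharing the same prefix."""
    if not cached:
        # Full prefix re-sent and re-processed on every turn.
        return turns * (PREFIX_TOKENS + TURN_TOKENS) * INPUT_PRICE
    # Turn 1 writes the cache; later turns read it at the discounted rate.
    first = PREFIX_TOKENS * CACHE_WRITE + TURN_TOKENS * INPUT_PRICE
    rest = (turns - 1) * (PREFIX_TOKENS * CACHE_READ + TURN_TOKENS * INPUT_PRICE)
    return first + rest

print(f"11 turns uncached: ${cost(11, cached=False):.2f}")
print(f"11 turns cached:   ${cost(11, cached=True):.2f}")
```

Under these assumptions the cached run costs roughly a fifth of the uncached one, and the gap widens with every additional turn over the same prefix.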