
Agent Caching

Storing and reusing the results of previous agent computations, tool calls, or model inferences to reduce latency and cost for repeated or similar requests. Agent caching operates at multiple levels from prompt caching to full response caching.

Agent caching can substantially reduce both cost and latency for agent systems that handle repetitive or similar requests. Prompt caching (supported natively by providers like Anthropic) stores the already-processed prompt prefix, such as the system prompt and tool definitions, so it does not need to be reprocessed on every call. Tool result caching stores API responses so repeated queries are answered without a new call to the underlying service. Semantic caching matches similar (not just identical) queries to cached responses.
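As a minimal sketch of tool result caching, the example below keys each cached result on the tool name plus a canonical JSON encoding of its arguments, so the same call with arguments in a different order still hits the cache. The names `ToolResultCache`, `make_key`, and `lookup_order` are hypothetical and chosen for illustration; the sketch also assumes the tool takes JSON-serializable keyword arguments and returns the same result for the same inputs.

```python
import json
from typing import Any, Callable, Dict


class ToolResultCache:
    """Exact-match cache keyed by tool name and canonicalized arguments."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    @staticmethod
    def make_key(tool_name: str, args: Dict[str, Any]) -> str:
        # sort_keys=True makes the key independent of argument order.
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def call(self, tool_name: str, tool: Callable[..., Any], args: Dict[str, Any]) -> Any:
        key = self.make_key(tool_name, args)
        if key not in self._store:
            # Cache miss: invoke the real tool once and remember the result.
            self._store[key] = tool(**args)
        return self._store[key]


# Hypothetical tool standing in for a slow or paid API call.
calls = []


def lookup_order(order_id: str, region: str) -> dict:
    calls.append(order_id)
    return {"order_id": order_id, "region": region, "status": "shipped"}


cache = ToolResultCache()
first = cache.call("lookup_order", lookup_order, {"order_id": "A1", "region": "eu"})
second = cache.call("lookup_order", lookup_order, {"region": "eu", "order_id": "A1"})
```

Here the second call is served from the cache, so `lookup_order` runs only once. This cache has no expiry, which is only appropriate for tools whose results do not change for a given set of arguments.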

For high-traffic agent deployments, caching can reduce costs by as much as 50 to 80 percent, depending on how repetitive the traffic is. Implement it in layers: deterministic caching for identical tool calls (where the same parameters always return the same results), time-bounded caching for data that changes slowly (competitor prices, inventory levels), and semantic caching for model inferences on similar queries. The key challenge is cache invalidation: knowing when cached data is stale and needs refreshing. Set TTLs (time-to-live values) based on how quickly the underlying data changes, and implement cache warming, that is, loading entries before they are requested, for data you can predict will be needed. Monitor cache hit rates to quantify the value of your caching strategy and to find where it can be improved.
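The time-bounded layer, cache warming, and hit-rate monitoring can be sketched together in a few lines. This is an illustrative, single-process example rather than a production design: `TTLCache`, `get_or_compute`, `warm`, and `fetch_price` are hypothetical names, the 60-second TTL is an arbitrary assumption, and the clock is injectable only so that expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Time-bounded cache that also counts hits and misses."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (stored_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        # Missing or stale entry: recompute and restart the TTL.
        self.misses += 1
        value = compute()
        self._store[key] = (now, value)
        return value

    def warm(self, key: str, compute: Callable[[], Any]) -> None:
        # Preload an entry ahead of demand; not counted as a hit or a miss.
        self._store[key] = (self._clock(), compute())

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Demonstration with a fake clock (assumed values, not real data).
fake_now = [0.0]
fetches = []


def fetch_price() -> float:
    fetches.append(fake_now[0])
    return 19.99


prices = TTLCache(ttl_seconds=60.0, clock=lambda: fake_now[0])
prices.get_or_compute("sku-1", fetch_price)  # miss: nothing cached yet
fake_now[0] = 30.0
prices.get_or_compute("sku-1", fetch_price)  # hit: entry is 30 seconds old
fake_now[0] = 61.0
prices.get_or_compute("sku-1", fetch_price)  # miss: entry is older than the TTL
```

After these three lookups the cache has recorded one hit and two misses, and `fetch_price` has run twice. A shorter TTL trades a lower hit rate for fresher data, which is the tuning decision described above.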

Related Terms

Model Context Protocol (MCP)

An open standard that defines how AI models connect to external tools, data sources, and services through a unified interface. MCP enables agents to dynamically discover and invoke capabilities without hardcoded integrations.

Tool Use

The ability of an AI model to invoke external functions, APIs, or services during a conversation to perform actions beyond text generation. Tool use transforms language models from passive responders into active problem solvers.

Function Calling

A model capability where the AI generates structured JSON arguments for predefined functions rather than free-form text. Function calling provides a reliable bridge between natural language understanding and programmatic execution.

Agentic Workflow

A multi-step process where an AI agent autonomously plans, executes, and iterates on tasks using tools, reasoning, and feedback loops. Agentic workflows go beyond single-turn interactions to accomplish complex goals.

ReAct Pattern

An agent architecture that interleaves Reasoning and Acting steps, where the model thinks about what to do next, takes an action, observes the result, and repeats. ReAct combines chain-of-thought reasoning with tool use in a unified loop.

Chain of Thought

A prompting technique that instructs the model to break down complex problems into sequential reasoning steps before producing a final answer. Chain of thought significantly improves accuracy on math, logic, and multi-step tasks.