Agents need memory to be useful across turns. Short-term memory is the current context window: the last N messages, the latest tool calls and results. Everything in the prompt that the model can "see" is short-term; when the window is full, older content is dropped or summarized. Long-term memory is persistent storage: a vector DB of past facts, user preferences, or conversation summaries. The agent doesn't read the whole store every time — it retrieves relevant pieces (e.g. via similarity search) and injects them into the context. So the model gets "you prefer Celsius" or "last time we discussed project X" only when relevant.
How memory feeds into the conversation
Short-term = current chat. Long-term = retrieved (e.g. from vector DB) and injected only when relevant.
Short-term (context)
What's in the current conversation: last N messages, latest tool results. Lost when the session ends or the window is trimmed.
Example: "You asked for London weather; the result was 12°C. You then asked if you need a jacket — the model uses that to say yes."
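The "last N messages" behavior above can be sketched as a sliding window over the conversation. This is a minimal illustration (class and method names are invented, not from any particular framework): older messages fall out automatically once the window is full.

```python
from collections import deque

class ShortTermMemory:
    """Sliding window over the conversation: keeps only the last N messages."""
    def __init__(self, max_messages: int = 4):
        # deque with maxlen drops the oldest entry when a new one is appended
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def as_prompt(self) -> list:
        # what the model "sees" on the next turn
        return list(self.messages)

stm = ShortTermMemory(max_messages=4)
stm.add("user", "What's the weather in London?")
stm.add("assistant", "It's 12°C and cloudy.")
stm.add("user", "Do I need a jacket?")
# The 12°C result is still inside the window, so the model can answer "yes".
```

Once a fifth message arrives, the first one is silently trimmed — exactly the "lost when the window is trimmed" behavior described above.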
Long-term
Stored across sessions: vector DB of past facts, user preferences, or conversation summaries. The agent retrieves relevant bits (e.g. via RAG) and adds them to context.
Example: "User prefers Celsius and lives in Berlin" stored; when they ask "weather today" the agent can default to Berlin and Celsius.
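The store-then-retrieve flow can be sketched as follows. This is a toy stand-in: a real agent would embed facts and queries with an embedding model and rank by vector similarity in a vector DB; here, shared topic tags play the role of "relevant", and all names are illustrative.

```python
class LongTermMemory:
    """Toy persistent store: tag overlap stands in for embedding similarity."""
    def __init__(self):
        self.facts = []  # list of (fact, tags) pairs

    def store(self, fact: str, tags: set) -> None:
        self.facts.append((fact, tags))

    def retrieve(self, topics: set) -> list:
        # Real systems rank by similarity score; here any shared tag counts as relevant.
        return [fact for fact, tags in self.facts if tags & topics]

ltm = LongTermMemory()
ltm.store("User prefers Celsius", {"weather", "units"})
ltm.store("User lives in Berlin", {"weather", "location"})
ltm.store("Last session covered the project X roadmap", {"work"})

# Incoming question "weather today" is classified as a weather topic,
# so only the weather-related facts are injected into the context.
relevant = ltm.retrieve({"weather"})
system_prompt = "Known about the user: " + "; ".join(relevant)
```

Only the retrieved facts reach the prompt — the project X note stays in storage, which is the point: the model gets "prefers Celsius, lives in Berlin" exactly when the question makes them relevant.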
In practice