Memory & Context Management

Agent Memory Systems

Why Memory Matters

Without memory:
Step 1: Search for "Alice's phone number"
Step 2: Forget result
Step 3: Try to call Alice → Error (no number available)

With memory:
Step 1: Search for "Alice's phone number" → Store in memory
Step 2: Call Alice using number from memory ✓
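The two-step flow above can be sketched as a scratchpad the agent writes to between steps. The tool names and contact data here are hypothetical stand-ins.

```python
# A minimal sketch: tool results persist in a memory dict between steps.

memory = {}

def search_contacts(name):
    # Stand-in for a real search tool.
    contacts = {"Alice": "555-0142"}
    return contacts.get(name)

def call(number):
    # Stand-in for a real phone tool.
    return f"Calling {number}..."

# Step 1: search and store the result in memory.
memory["alice_phone"] = search_contacts("Alice")

# Step 2: a later step reads from memory instead of failing.
result = call(memory["alice_phone"])
```

Without the `memory` dict, step 2 would have no way to recover the number found in step 1.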

Types of Memory

1. Short-Term Memory (Context Window)

- Conversation history in current session
- Token-limited (e.g., last 4K tokens)
- Used by LLM to maintain context

Example:
User: "My name is Alice"
[Agent stores name in short-term memory]
User: "What's my name?"
[Agent retrieves from short-term memory] → "Alice"
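A token-limited buffer like this can be sketched as follows. The token count here is a crude word-based approximation; a real agent would use the model's tokenizer.

```python
# Sketch of short-term memory: a conversation buffer that evicts the
# oldest turns once a (rough) token budget is exceeded.

class ShortTermMemory:
    def __init__(self, max_tokens=50):
        self.max_tokens = max_tokens
        self.turns = []  # list of (role, text)

    def add(self, role, text):
        self.turns.append((role, text))
        # Evict oldest turns until the buffer fits the budget again.
        while self._tokens() > self.max_tokens and len(self.turns) > 1:
            self.turns.pop(0)

    def _tokens(self):
        # Crude approximation: one token per whitespace-separated word.
        return sum(len(text.split()) for _, text in self.turns)

    def context(self):
        return "\n".join(f"{role}: {text}" for role, text in self.turns)

stm = ShortTermMemory(max_tokens=10)
stm.add("user", "My name is Alice")
stm.add("user", "What's my name?")
```

Both turns fit the budget here, so "Alice" is still in context when the second question arrives.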

2. Long-Term Memory

- Persistent storage (database, files)
- Unbounded size
- Contains facts, preferences, history

Examples:
- User preferences ("Alice likes coffee")
- Past interactions ("Booked 3 flights for Alice")
- Learned facts ("Target price limit: $300")
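Persistence is the defining property: the store must survive restarts. A minimal file-backed sketch (JSON on disk; the keys mirror the examples above and are hypothetical):

```python
# Sketch of long-term memory backed by a JSON file on disk.
import json
import os
import tempfile

class LongTermMemory:
    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.facts = json.load(f)
        else:
            self.facts = {}

    def remember(self, key, value):
        self.facts[key] = value
        with open(self.path, "w") as f:
            json.dump(self.facts, f)

    def recall(self, key):
        return self.facts.get(key)

path = os.path.join(tempfile.gettempdir(), "agent_memory.json")
ltm = LongTermMemory(path)
ltm.remember("preference", "Alice likes coffee")

# A fresh instance reads the same file, so facts survive restarts.
ltm2 = LongTermMemory(path)
```

Production systems would use a database instead of a flat file, but the contract is the same: write once, recall in any later session.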

3. Working Memory

- Active task state
- Goals being pursued
- Current reasoning path

Example:
"Working on: Book flight from NYC to LA"
"Current step: Searching prices"
"Constraint: Budget $500 max"
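Working memory is naturally a small structured object rather than free text. A sketch with illustrative field names:

```python
# Sketch of working memory: the active task state an agent carries
# through a multi-step task.
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    goal: str
    current_step: str = ""
    constraints: list = field(default_factory=list)

wm = WorkingMemory(goal="Book flight from NYC to LA")
wm.current_step = "Searching prices"
wm.constraints.append("Budget $500 max")
```

Keeping this state explicit lets every reasoning step check the goal and constraints instead of re-deriving them from the conversation.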

Context Window Management

Token limits are real:

GPT-4: 8K-128K tokens
Claude 3: Up to 200K tokens

Problem: Long conversations exceed limits

Solutions:
1. Summarization: Compress old conversations
2. Retrieval: Only load relevant context
3. Hierarchical: Keep summaries, load details as needed
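The hierarchical approach (solution 3) can be sketched as: collapse old turns into a summary, keep recent turns verbatim. A real agent would generate the summary with an LLM; here it is just a placeholder line so the sketch stays self-contained.

```python
# Sketch of hierarchical context: old turns become a summary,
# recent turns are kept in full.

def build_context(turns, keep_recent=3):
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    parts = []
    if old:
        # In practice, an LLM would write this summary.
        parts.append(f"[Summary of {len(old)} earlier turns]")
    parts.extend(recent)
    return "\n".join(parts)

turns = [f"turn {i}" for i in range(10)]
ctx = build_context(turns)
```

The context stays a bounded size no matter how long the conversation grows, at the cost of detail in the older turns.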

Memory Retrieval

Smart agents retrieve relevant memories:

Query: "Book a flight for Alice"

Retrieve:
- Alice's location (NYC)
- Alice's destination preferences (loves LA)
- Alice's budget ($300)
- Alice's past flights (prefers morning departures)

Now the agent has the context it needs to book well.
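The simplest form of retrieval scores memories by keyword overlap with the query. A sketch (the stored facts are the hypothetical examples above):

```python
# Sketch of keyword-overlap retrieval: rank memories by how many
# query words they share, return the top matches.

def retrieve(query, memories, top_k=2):
    query_words = set(query.lower().split())
    return sorted(
        memories,
        key=lambda m: len(query_words & set(m.lower().split())),
        reverse=True,
    )[:top_k]

memories = [
    "alice lives in nyc",
    "alice prefers morning flights",
    "bob likes tea",
]
hits = retrieve("book a flight for alice", memories)
```

Keyword overlap misses paraphrases ("favorite drink" vs. "loves coffee"), which is exactly the gap embeddings close in the next section.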

Vector Embeddings for Memory

Modern approach: Store memories as embeddings.

Memory: "Alice loves coffee"
Embedding: [0.2, -0.5, 0.8, ...] (vector in high-dimensional space)

Query: "What's Alice's favorite drink?"
Query embedding: [0.19, -0.52, 0.78, ...] (similar!)

Find memories with similar embeddings → "Alice loves coffee" matches!
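The matching above is cosine similarity between vectors. A sketch with toy 3-dimensional vectors; real systems use an embedding model (hundreds to thousands of dimensions) and a vector database.

```python
# Sketch of embedding retrieval: pick the stored memory whose vector
# has the highest cosine similarity to the query vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

store = {
    "Alice loves coffee":  [0.2, -0.5, 0.8],
    "Flight booked to LA": [0.9, 0.1, -0.3],
}

# Toy embedding of "What's Alice's favorite drink?"
query_vec = [0.19, -0.52, 0.78]

best = max(store, key=lambda text: cosine(store[text], query_vec))
```

The query never mentions "coffee", yet the nearby vector still surfaces the right memory, which is the point of semantic retrieval.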

Forgetting (Purging Old Memory)

Agents need to forget irrelevant information:

Memory management:
- Keep: Frequently accessed, high relevance
- Archive: Older but still useful
- Forget: Outdated, incorrect, irrelevant

Example:
- Keep: User's current address
- Archive: Previous address from 5 years ago
- Forget: Temporary task from yesterday
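A purge pass over stored memories can be sketched as a triage function. The thresholds and fields here are illustrative, not a standard policy.

```python
# Sketch of memory triage: decide keep / archive / forget from
# last-access age and an explicit relevance flag.

def triage(memory, now, keep_days=30, archive_days=365):
    age_days = now - memory["last_access"]
    if not memory["relevant"]:
        return "forget"          # outdated or incorrect
    if age_days <= keep_days:
        return "keep"            # fresh and relevant
    if age_days <= archive_days:
        return "archive"         # older but still useful
    return "forget"              # too stale to matter

now = 1000  # toy clock, in days
recent = triage({"last_access": 995, "relevant": True}, now)
stale = triage({"last_access": 800, "relevant": True}, now)
junk = triage({"last_access": 999, "relevant": False}, now)
```

Running such a pass periodically keeps the memory store small and the retrieval step fast.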