RAG Concepts & Architecture
Retrieval-Augmented Generation (RAG)
The Problem: Knowledge Cutoff
LLMs are trained on data up to a certain cutoff date:
- GPT-4's training data runs up to April 2023
- After that date, the model has no knowledge of events
User: "What happened on May 1, 2024?"
GPT-4: "I don't know, my training ended before that."
The Solution: RAG
Instead of retraining the model:
- Store your documents in a database
- When a user asks a question:
  - Search the documents for relevant info
  - Give the relevant info to the LLM
  - The LLM answers based on that context
User: "What are the new policies?"
System:
1. Search knowledge base for "policies"
2. Find: "New policy document from May 2024"
3. Pass to LLM: "Based on this document: [text]... Answer: what are the new policies?"
4. LLM: "The new policies are..."
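The flow above can be sketched as a minimal loop. This is a toy illustration, not a production system: the retriever is a naive keyword match, and `call_llm` is a hypothetical stand-in for a real LLM API call.

```python
# Minimal RAG loop: retrieve relevant documents, then answer with context.

def retrieve(query, knowledge_base, k=1):
    """Naive keyword retriever: rank documents by words shared with the query."""
    words = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: len(words & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def call_llm(prompt):
    # Placeholder: a real system would send `prompt` to an LLM API here.
    return f"(LLM answer based on prompt of {len(prompt)} chars)"

knowledge_base = [
    "New policy document from May 2024: remote work is allowed.",
    "Cafeteria menu for this week.",
]

question = "What are the new policies?"
context = retrieve(question, knowledge_base)
prompt = f"Based on this document: {context[0]}\nAnswer: {question}"
print(call_llm(prompt))
```

Real systems replace the keyword match with vector search (Steps 1-3 below), but the control flow stays the same.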
Why RAG is Powerful
| Approach | Cost | Speed | Freshness | Accuracy |
|---|---|---|---|---|
| Fine-tuning | $$$$$ | Slow | Days | Good |
| RAG | $$ | Fast | Real-time | Excellent |
| No augmentation (base model) | $ | Very Fast | N/A | Poor |
RAG Architecture
Documents → Vector Database
                  ↓
User Question → Embedding → Search → Top K Results → LLM → Answer
Step 1: Vectorize Documents
Document: "Python is a programming language"
Embedding: [0.2, -0.5, 0.8, 0.1, ...] (1024D vector)
Embedding captures semantic meaning!
Similar documents → Similar embeddings
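A toy demonstration of "similar documents → similar embeddings": the 4-dimensional vectors below are invented for illustration (real models emit ~1024 dimensions), and similarity is measured with cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: near 1.0 means same direction, near 0 or negative means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 4-dimensional embeddings (invented values for illustration).
python_doc   = [0.2, -0.5, 0.8, 0.1]       # "Python is a programming language"
python_query = [0.25, -0.48, 0.75, 0.12]   # "How do I learn Python?"
cooking_doc  = [-0.7, 0.3, -0.1, 0.9]

print(cosine_similarity(python_query, python_doc))    # close to 1.0
print(cosine_similarity(python_query, cooking_doc))   # much lower
```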
Step 2: Search for Relevant Documents
User question: "How do I learn Python?"
Question embedding: [0.25, -0.48, 0.75, 0.12, ...] (similar to "Python document"!)
Search: Find K documents with highest similarity
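The top-K search can be sketched as a linear scan over stored embeddings, ranked by cosine similarity (vector databases do this at scale with approximate indexes; the toy vectors here are invented):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k document ids with the highest cosine similarity to the query."""
    scored = sorted(doc_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy embeddings (invented values for illustration).
docs = {
    "python_intro": [0.2, -0.5, 0.8, 0.1],
    "cooking_tips": [-0.7, 0.3, -0.1, 0.9],
    "python_faq":   [0.3, -0.4, 0.7, 0.0],
}
query = [0.25, -0.48, 0.75, 0.12]   # "How do I learn Python?"
print(top_k(query, docs, k=2))      # both Python docs rank first
```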
Step 3: Pass to LLM with Context
System message: "Use these documents to answer:"
Documents: [retrieved documents]
User question: "How do I learn Python?"
LLM: [answers based on context]
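Stitching the retrieved documents into the prompt might look like this; the exact message format is an assumption here and varies by LLM API:

```python
def build_prompt(documents, question):
    """Assemble a context-grounded prompt for the LLM (format is illustrative)."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Use these documents to answer:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

retrieved = ["Python is a programming language.",
             "Official tutorials are at docs.python.org."]
print(build_prompt(retrieved, "How do I learn Python?"))
```

In a chat-style API, the instruction typically goes in the system message and the question in the user message, but the idea is the same: the model only sees the documents you paste into the context.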
Practical Example: Customer Support Bot
Company stores:
- Product manuals
- FAQs
- Support tickets
- Policies
Customer: "How do I return an item?"
RAG system:
1. Search knowledge base → find the return policy
2. Pass to LLM with policy
3. LLM: "According to our policy: [details] Steps: [steps]"
Vector Databases (Tools)
| Tool | Best For |
|---|---|
| Pinecone | Managed, easy |
| Weaviate | Open source, flexible |
| Milvus | Scalable, enterprise |
| Chroma | Local/small projects |
| Qdrant | Performance |
RAG vs Fine-tuning
Use RAG when:
- Knowledge changes frequently
- Need up-to-date info
- Multiple knowledge sources
- Quick implementation needed
Use Fine-tuning when:
- Want to change model behavior/style
- Need performance optimization
- Training data is stable
- Cost is not a concern