Fine-Tuning Strategies

36 min Intermediate

Fine-Tuning Large Language Models

What is Fine-Tuning?

Fine-tuning = Taking a pre-trained LLM and continuing to train it on your own data.

Pre-trained model: General knowledge (trained on broad internet text)
Fine-tuning data: Your specific data (domain knowledge)
Result: A model that behaves the way you want

Why Fine-Tune?

Base GPT-3.5: General responses
Fine-tuned on customer service: Customer service responses
Fine-tuned on medical data: Medical advice (with proper disclaimers)
Fine-tuned on code: Code generation in your style

Fine-Tuning vs RAG

Fine-Tuning:
- Modifies model weights
- Learning is "baked in"
- Better for style/behavior changes
- More expensive and slower to update (each change needs a training run)
- Changes how the model behaves

RAG:
- Model stays the same
- Adds context at inference time
- Better for knowledge addition
- Cheaper and faster to update (edit the documents, no retraining)
- Model retrieves then answers
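
To make the RAG side of the comparison concrete, here is a minimal sketch in which the model is untouched and only the prompt changes. The retriever is a toy word-overlap scorer, and `tokenize`, `retrieve`, `build_prompt`, and the sample documents are hypothetical names chosen for illustration; a real system would typically use embedding search instead.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(question, documents, k=1):
    """Return the k documents that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Place the retrieved text in front of the question at inference time."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Illustrative documents (assumed, not from any real knowledge base)
docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
]
prompt = build_prompt("How long do refunds take?", docs)
print(prompt)
```

Updating what the model "knows" here means editing `docs`; no weights change, which is why RAG suits knowledge that changes often.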

Types of Fine-Tuning

Full Fine-Tuning

Train ALL parameters of the model.

Pros:
- Best quality
- Model fully adapts

Cons:
- Expensive (requires substantial GPU memory and lots of data)
- Time-consuming
- Requires hundreds of examples or more

Parameter-Efficient Fine-Tuning (PEFT)

Train only a small percentage of parameters.

Main techniques:
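1. LoRA (Low-Rank Adaptation) - Freeze the base weights and train small low-rank matrices, often around 1% of params or fewer
2. QLoRA - LoRA on top of a 4-bit quantized base model (lower GPU memory)
3. Prefix tuning - Prepend learnable prefix vectors while the base model stays frozen
4. Adapter layers - Insert small trainable modules between frozen layers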
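
The savings from LoRA come down to simple arithmetic. Instead of updating a full d x k weight matrix, LoRA trains two low-rank factors, B (d x r) and A (r x k). The sketch below counts parameters for a single matrix; the hidden size of 4096 and rank of 8 are assumed values for illustration, not settings taken from any particular model.

```python
def lora_trainable_params(d, k, r):
    """Number of parameters in the LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)

def lora_fraction(d, k, r):
    """LoRA parameters as a fraction of the full d x k weight matrix."""
    return lora_trainable_params(d, k, r) / (d * k)

d = k = 4096   # assumed hidden size
r = 8          # assumed LoRA rank

full = d * k
lora = lora_trainable_params(d, k, r)
print(f"full matrix: {full:,} params")
print(f"LoRA factors: {lora:,} params ({lora_fraction(d, k, r):.2%})")
```

With these assumed sizes the LoRA factors hold 65,536 parameters against 16,777,216 in the full matrix, about 0.39%. The fraction for a whole model depends on which matrices receive adapters and on the chosen rank.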

Instruction Fine-Tuning

Training data format:
{
  "instruction": "Summarize this text",
  "input": "[long text]",
  "output": "[summary]"
}

Model learns: instruction (+ input) → output
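
As a rough sketch of how a record in this format might be turned into training text, the snippet below renders it into a prompt/completion pair and serializes it as one JSONL line. The `### Instruction` template and the `to_training_pair` helper are assumptions for illustration; each training framework defines its own expected template.

```python
import json

# Assumed prompt template; real frameworks each define their own.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

def to_training_pair(record):
    """Turn one instruction record into a prompt/completion pair."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=record["instruction"],
        input=record.get("input", ""),
    )
    return {"prompt": prompt, "completion": record["output"]}

record = {
    "instruction": "Summarize this text",
    "input": "[long text]",
    "output": "[summary]",
}

pair = to_training_pair(record)
line = json.dumps(pair)  # one line of a JSONL training file
print(line)
```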

RLHF (Reinforcement Learning from Human Feedback)

Stage 1: Supervised fine-tuning
- Train on high-quality examples

Stage 2: Reward model training
- Humans compare outputs for the same prompt (A is better than B)
- Train a separate reward model to predict those preferences

Stage 3: Policy optimization
- Use the reward model's scores to fine-tune the LLM with reinforcement learning (commonly PPO)
- Optimize for "human-preferred" outputs
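
Stage 2 can be illustrated with the pairwise loss that is commonly used for reward models: -log(sigmoid(r_chosen - r_rejected)). The text above does not name a specific loss, so treat this as one widely used choice, not the only one. The reward scores below are made-up numbers standing in for a reward model's outputs.

```python
import math

def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    # For negative margins, rewrite to avoid overflow in exp()
    return -margin + math.log1p(math.exp(margin))

# Made-up reward scores for (preferred, rejected) outputs
agrees = pairwise_preference_loss(2.0, 0.0)     # reward model agrees with the human
undecided = pairwise_preference_loss(0.0, 0.0)  # reward model has no preference
disagrees = pairwise_preference_loss(0.0, 2.0)  # reward model disagrees

print(f"{agrees:.4f} {undecided:.4f} {disagrees:.4f}")
```

The loss shrinks as the reward model scores the human-preferred output higher than the rejected one, which is what pushes it to predict human preferences.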

Fine-Tuning Process

Step 1: Prepare data (100-1000+ examples)
Step 2: Format data correctly
Step 3: Choose base model
Step 4: Fine-tune (hours to days)
Step 5: Evaluate on a held-out test set
Step 6: Deploy and monitor
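
Evaluating on a test set only means something if those examples were set aside before training. A minimal sketch of that split, assuming a 10% validation and 10% test share (the fractions and the `split_dataset` helper are illustrative choices, not prescribed by the steps above):

```python
import random

def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle with a fixed seed, then split into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Placeholder examples; real ones would carry instruction/input/output fields
examples = [{"id": i} for i in range(500)]
train, val, test = split_dataset(examples)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible, so the test set stays the same across training runs.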

Data Requirements

Small model (7B):    100-500 examples minimum
Medium (13-70B):     500-5K examples
Large (175B+):       Thousands of examples

Quality > Quantity:
- 100 high-quality examples > 1000 random examples
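
One inexpensive way to act on "quality over quantity" is to filter the dataset before training. The sketch below drops exact duplicates (ignoring case) and examples with an empty instruction or a very short output. The `clean_examples` helper, the 10-character threshold, and the sample records are assumptions for illustration; real curation usually also involves manual review.

```python
def clean_examples(examples, min_output_chars=10):
    """Drop duplicates and examples with a missing instruction or very short output."""
    seen = set()
    cleaned = []
    for ex in examples:
        instruction = ex.get("instruction", "").strip()
        output = ex.get("output", "").strip()
        if not instruction or len(output) < min_output_chars:
            continue  # unusable example
        key = (instruction.lower(), output.lower())
        if key in seen:
            continue  # duplicate of an example already kept
        seen.add(key)
        cleaned.append(ex)
    return cleaned

# Illustrative records (assumed)
raw = [
    {"instruction": "Reset my password", "output": "Go to Settings > Security and choose Reset."},
    {"instruction": "reset my password", "output": "Go to Settings > Security and choose Reset."},
    {"instruction": "Cancel order", "output": "ok"},
    {"instruction": "", "output": "An answer without a question."},
]
cleaned = clean_examples(raw)
print(f"kept {len(cleaned)} of {len(raw)} examples")
```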

Cost Comparison

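GPT-3.5 Turbo fine-tuning (launch pricing): $0.008 per 1K training tokens; usage at $0.012 per 1K input tokens and $0.016 per 1K output tokens
Claude fine-tuning: Availability and pricing depend on the provider
Open-weight models (LLaMA): No per-token fee, but you pay for your own GPU compute

ROI: Better model → Better results → Worth it if you use the model heavily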
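
A back-of-the-envelope estimate helps when weighing that ROI. The sketch below assumes billing proportional to dataset tokens multiplied by the number of epochs, and plugs in the $0.008 per 1K tokens figure quoted above purely as an example rate. The dataset size and epoch count are made-up inputs, and current provider pricing should be checked before relying on any number.

```python
def training_cost(total_tokens, epochs, price_per_1k):
    """Estimated cost in dollars: tokens seen during training times the per-1K rate."""
    return total_tokens / 1000 * epochs * price_per_1k

# Assumed inputs: 500K tokens of training data, 3 epochs, $0.008 per 1K tokens
cost = training_cost(total_tokens=500_000, epochs=3, price_per_1k=0.008)
print(f"Estimated training cost: ${cost:.2f}")
```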

Risks & Challenges

Catastrophic forgetting: Model "forgets" general knowledge it had before fine-tuning
- Solution: Blend general-purpose data with the new data during training

Overfitting: Model memorizes training data
- Solution: Validation set, early stopping, regularization

Data quality: Bad training data → Bad results
- Solution: Carefully curate & clean training data

Bias amplification: Fine-tuning can amplify biases present in the training data
- Solution: Diverse training data, bias testing
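
Of the mitigations listed above, early stopping is the easiest to show in code. The sketch below assumes a validation loss has been recorded after every epoch and stops once it has failed to improve for `patience` epochs in a row. The loss values and the `early_stopping_epoch` helper are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch seen before training would stop.

    Training stops once validation loss has not improved for `patience`
    consecutive epochs. Returns -1 if no losses are given.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss stopped improving
    return best_epoch

# Made-up validation losses: improvement stalls after epoch 2
val_losses = [1.20, 0.95, 0.90, 0.93, 0.97, 0.85]
best = early_stopping_epoch(val_losses)
print(f"keep the checkpoint from epoch {best}")
```

With a patience of 2, training halts after epoch 4 and never reaches the later 0.85, which shows the trade-off: a larger patience tolerates temporary plateaus at the cost of more training time.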
