From Transformers to ChatGPT
Large Language Models (LLMs)
What is an LLM?
A Large Language Model is:
- A transformer network
- Pre-trained on billions of text tokens
- Trained to predict the next token in a sequence
- Scaled to hundreds of billions of parameters
Input: "The cat sat on the"
LLM: "mat" (predicts next word)
Input: "What is 2+2?"
LLM: "4" (learned from examples)
Input: "Write a poem about"
LLM: Generates coherent, creative text
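To make "predicts the next word" concrete, here is a minimal sketch with a hand-made four-word vocabulary and invented scores (no real model involved): a softmax turns the model's raw scores into a probability distribution, and the prediction is the highest-probability word.

import math

# Invented scores (logits) for candidate next words after "The cat sat on the".
logits = {"mat": 4.0, "floor": 2.5, "moon": 0.5, "banana": -1.0}

# Softmax: turn raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {word: math.exp(v) / total for word, v in logits.items()}

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.3f}")

print("prediction:", max(probs, key=probs.get))  # -> mat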
Training Process
Stage 1: Pre-training (Next Token Prediction)
Training data: Large web-scale corpora (books, websites, code)
Objective: Predict next word given previous words
Learning: Billions of examples → learns language patterns
Loss function:
- For each position, the model outputs a probability distribution over possible next tokens
- Compare it to the token that actually comes next
- Minimize the cross-entropy loss (sketched below)
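As a toy illustration of this loss (the probabilities here are invented, not produced by a real model): the loss at one position is the negative log of the probability the model assigned to the token that actually came next.

import math

# Invented model probabilities for the next token; "mat" actually follows.
probs = {"mat": 0.70, "floor": 0.20, "moon": 0.07, "banana": 0.03}
actual = "mat"

# Cross-entropy at this position: -log p(actual next token).
loss = -math.log(probs[actual])
print(f"loss = {loss:.3f}")  # ~0.357: low, because the model was confident and right

# If the actual next token had been "banana", the loss would be much higher:
print(f"loss = {-math.log(probs['banana']):.3f}")  # ~3.507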
Stage 2: Supervised Fine-Tuning (Optional)
Training data: Human-written examples of good responses
Objective: Learn to follow instructions better
Example:
User: "Summarize this text: [long text]"
Model: "[good summary]"
Loss: Minimize difference between model output and human-written summary
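A common detail in this stage (an assumption here, not stated above) is that the loss is computed only on the model's response tokens, not on the user's prompt. A toy sketch with invented per-token losses:

# Toy sketch: average the loss over response tokens only (numbers are invented).
tokens         = ["Summarize", ":", "[text]", "A", "good", "summary", "."]
is_response    = [False, False, False, True, True, True, True]  # mask out the prompt
per_token_loss = [0.9, 0.4, 0.7, 1.2, 0.8, 0.5, 0.3]

response_losses = [l for l, r in zip(per_token_loss, is_response) if r]
sft_loss = sum(response_losses) / len(response_losses)
print(f"SFT loss (response tokens only): {sft_loss:.3f}")  # 0.700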
Stage 3: Reinforcement Learning from Human Feedback (RLHF)
Training data: Human rankings of different outputs
Objective: Learn which responses are better
Example:
User: "How to make friends?"
Response A: [helpful advice]
Response B: [vague nonsense]
Human rates: Response A >> Response B
Model learns: Generate Response A-like outputs
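One common way to turn these rankings into a training signal (a standard choice for reward-model training, not necessarily the exact method used by any particular model) is a pairwise ranking loss: it is small when the preferred response is scored well above the rejected one. A toy sketch with invented reward scores:

import math

# Invented reward-model scores for the two responses above.
reward_a = 2.1   # [helpful advice] -- preferred
reward_b = -0.4  # [vague nonsense] -- rejected

# Pairwise ranking loss: -log sigmoid(reward_a - reward_b).
# Small when the preferred response is scored much higher.
loss = -math.log(1 / (1 + math.exp(-(reward_a - reward_b))))
print(f"ranking loss = {loss:.4f}")  # ~0.079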
Model Sizes
Small (~7B params): cheapest to train and serve; fast inference
Medium (13-70B): balanced cost vs. capability
Large (175B+): GPT-3-scale models (e.g., GPT-3's 175B)
Huge (1T+ params): future models, not yet practical for most uses
Rule of thumb: more parameters → better understanding & reasoning
But: slower, more expensive, more compute
Inference (Using the Model)
User input: "What is Python?"
Step 1: Tokenize: ["What", "is", "Python", "?"]
Step 2: Convert to numbers: [1234, 567, 890, 123]
Step 3: Feed through transformer (generate next token probabilities)
Step 4: Sample from probabilities (pick next word)
Step 5: Repeat steps 3-4 until an [END] token (or a length limit) is reached
Output: "Python is a programming language..."
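The whole loop can be sketched in a few lines. The "model" here is a hypothetical stand-in that returns hard-coded per-step probabilities (a real LLM would produce a distribution over its full vocabulary at each step):

import random

random.seed(0)  # make the sampling reproducible

# Hypothetical per-step next-token distributions (invented numbers).
step_distributions = [
    {"Python": 0.9, "It": 0.1},
    {"is": 0.95, "was": 0.05},
    {"a": 0.8, "an": 0.2},
    {"programming": 0.7, "scripting": 0.3},
    {"language": 0.9, "tool": 0.1},
    {"[END]": 1.0},
]

tokens = ["What", "is", "Python", "?"]   # Steps 1-2: tokenized prompt
for dist in step_distributions:          # Step 3: next-token probabilities
    words, weights = zip(*dist.items())
    next_word = random.choices(words, weights=weights)[0]  # Step 4: sample
    if next_word == "[END]":             # Step 5: stop at the end token
        break
    tokens.append(next_word)

print(" ".join(tokens[4:]))  # e.g. "Python is a programming language"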
Key Insight: Emergent Abilities
With enough scale, models suddenly can, without explicit training for these tasks:
- Translate languages
- Write code
- Answer questions
- Reason about problems
These abilities "emerge" from scale; they are not explicitly programmed in.