
From Transformers to ChatGPT

Large Language Models (LLMs)

What is an LLM?

A Large Language Model is:

  • A transformer network
  • Pre-trained on billions of text tokens
  • Trained to predict the next word (token)
  • Scaled to hundreds of billions of parameters

Input: "The cat sat on the"
LLM: "mat" (predicts next word)

Input: "What is 2+2?"
LLM: "4" (learned from examples)

Input: "Write a poem about"
LLM: Generates coherent, creative text
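
A minimal sketch of this next-word loop, assuming the Hugging Face transformers library with the small "gpt2" checkpoint as a stand-in LLM (any causal language model works the same way):

# Next-token prediction sketch (assumes: pip install transformers torch)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

next_id = logits[0, -1].argmax().item()    # most likely next token
print(tokenizer.decode([next_id]))         # a plausible continuation, e.g. " floor"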

Training Process

Stage 1: Pre-training (Next Token Prediction)

Training data: Web-scale text corpora (books, websites, code)
Objective: Predict next word given previous words
Learning: Billions of examples → learns language patterns

Loss function:
- For each position, compute a probability distribution over the next token
- Compare it to the actual next token
- Minimize the cross-entropy loss
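
A sketch of that objective in PyTorch, with random tensors standing in for real text and a real transformer's output:

# Pre-training loss sketch: shift by one position so each token predicts
# the next, then minimize cross-entropy
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50_000, 8
token_ids = torch.randint(0, vocab_size, (1, seq_len))   # stand-in text
logits = torch.randn(1, seq_len, vocab_size)             # stand-in model output

loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 0..T-2
    token_ids[:, 1:].reshape(-1),            # actual tokens at positions 1..T-1
)
print(loss)  # training repeats this over billions of examples, lowering the loss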

Stage 2: Supervised Fine-Tuning (Optional)

Training data: Human-written examples of good responses
Objective: Learn to follow instructions better

Example:
User: "Summarize this text: [long text]"
Model: "[good summary]"

Loss: Cross-entropy between the model's output and the human-written summary
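
A sketch of that loss with stand-in token ids; one common convention is to mask the prompt tokens (PyTorch's ignore_index) so only the response is scored:

# SFT loss sketch: mask prompt positions so only the response is scored
import torch
import torch.nn.functional as F

vocab_size = 50_000
prompt_ids = torch.tensor([1234, 567, 890])     # "Summarize this text: ..."
response_ids = torch.tensor([42, 7, 9])         # the human-written summary
input_ids = torch.cat([prompt_ids, response_ids])

# -100 tells cross_entropy to skip prompt positions
labels = torch.cat([torch.full_like(prompt_ids, -100), response_ids])

logits = torch.randn(len(input_ids), vocab_size)  # stand-in model output
loss = F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)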

Stage 3: Reinforcement Learning from Human Feedback (RLHF)

Training data: Human rankings of different outputs
Objective: Learn which responses are better

Example:
User: "How to make friends?"
Response A: [helpful advice]
Response B: [vague nonsense]

Human ranks: Response A >> Response B

Model learns: Generate Response A-like outputs
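
A common way to turn those rankings into a training signal is a pairwise reward-model loss (Bradley-Terry style); a sketch with stand-in scores:

# Reward-model loss sketch: push the score of the preferred response
# above the rejected one via -log sigmoid(r_chosen - r_rejected)
import torch
import torch.nn.functional as F

reward_a = torch.tensor(1.2, requires_grad=True)  # score for helpful advice
reward_b = torch.tensor(0.9, requires_grad=True)  # score for vague nonsense

loss = -F.logsigmoid(reward_a - reward_b)  # small when reward_a >> reward_b
loss.backward()  # gradients nudge the reward model toward the human ranking

The LLM itself is then fine-tuned (commonly with PPO) to produce outputs the learned reward model scores highly.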

Model Sizes

Small (7B params):    Runs on a single GPU, cheap inference
Medium (13-70B):      Balanced quality vs. cost
Large (175B+):        GPT-3 / Claude class, multi-GPU serving
Huge (1T+ params):    Frontier scale, very costly to train and serve

Rule of thumb: More parameters → better understanding & reasoning
               But: slower, more expensive, more compute
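
A back-of-the-envelope check on the cost side: weight memory alone is roughly parameter count × bytes per parameter (activations and the KV cache add more):

# Rough weight-memory arithmetic (fp16 = 2 bytes per parameter)
def weight_memory_gb(params_billion, bytes_per_param=2):
    return params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 = GB

for size in (7, 70, 175):
    print(f"{size}B params ≈ {weight_memory_gb(size):.0f} GB in fp16")
# 7B ≈ 14 GB (one GPU), 70B ≈ 140 GB, 175B ≈ 350 GB (multi-GPU)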

Inference (Using the Model)

User input: "What is Python?"

Step 1: Tokenize (simplified; real tokenizers use subwords): ["What", "is", "Python", "?"]
Step 2: Convert to numbers: [1234, 567, 890, 123]
Step 3: Feed through transformer (generate next token probabilities)
Step 4: Sample from probabilities (pick next word)
Step 5: Repeat steps 3-4 until done or [END] token

Output: "Python is a programming language..."

Key Insight: Emergent Abilities

With enough scale:

  • Without being explicitly trained on these tasks, models suddenly can:
    • Translate languages
    • Write code
    • Answer questions
    • Reason about problems

These abilities "emerge" from scale; they are not explicitly programmed.
