Tokenization, Word Embeddings & Word2Vec

Word2Vec & Learning Embeddings

Word2Vec (Context-Based Embeddings)

Idea: Learn word embeddings by predicting context!

Skip-Gram Model

Train a network that, given a word, predicts the words around it.

"the cat sat on the mat"

For word "cat":
- Context words: ["the", "sat"] (nearby words)

Network learns:
- Input: embed("cat")
- Output: predict ["the", "sat"]

By training on millions of such pairs:
- Words that appear in similar contexts
- End up with similar embeddings!
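Here is a minimal sketch (plain Python; a window size of 1 is assumed for this illustration) of how the (center word, context word) training pairs can be generated:

# Generate skip-gram training pairs from a tokenized sentence.
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        # Look at neighbors within `window` positions of the center word
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
print(skipgram_pairs(tokens))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ...]
# Each pair is one training example: given the center word, predict the neighbor.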

Example Training

Sentence: "king queen man woman"

Training examples (predict next word):
- "king" β†’ "queen"
- "queen" β†’ "man"
- "man" β†’ "woman"

After training (on a real corpus, not just this toy sentence):
embed("king") - embed("man") ≈ embed("queen") - embed("woman")

Why? (king, man) and (queen, woman) differ by the same "royalty" offset, just as (king, queen) and (man, woman) differ by the same {masculine → feminine} offset.
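A quick numeric sanity check of the offset idea, using tiny hand-made 3-D vectors (hypothetical values, not trained):

import numpy as np

# Hypothetical 3-D embeddings: dims ~ (royalty, masculinity, noise)
king  = np.array([0.9, 0.8, 0.1])
queen = np.array([0.9, 0.1, 0.1])
man   = np.array([0.1, 0.8, 0.0])
woman = np.array([0.1, 0.1, 0.0])

print(king - man)     # [0.8 0.  0.1] -> the "royalty" offset
print(queen - woman)  # [0.8 0.  0.1] -> the same offset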

Semantic Algebra (Magic!)

Pre-trained embeddings capture semantics:

embed("king") - embed("man") + embed("woman") β‰ˆ embed("queen")

king is to man as queen is to woman!

embed("Paris") - embed("France") + embed("Germany") β‰ˆ embed("Berlin")

Paris is to France as Berlin is to Germany!
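You can try this yourself with gensim's pre-trained vectors. A sketch, assuming gensim is installed (it downloads roughly 130 MB of GloVe vectors on first run):

import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # 100-D GloVe KeyedVectors

# king - man + woman ≈ ?
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Typically prints something like [('queen', 0.77)]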

Pre-Trained Embeddings

Don't train from scratch! Use pre-trained embeddings:

  • Word2Vec: 300D vectors trained on Google News (billions of words)
  • GloVe: Global vectors, captures global word co-occurrence
  • FastText: Handles out-of-vocabulary words (subword information)

Advantage: these embeddings already encode word relationships learned from huge corpora. Just load and use them.
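For example, loading the classic 300-D Google News vectors with gensim looks like this (a sketch; assumes you have already downloaded GoogleNews-vectors-negative300.bin, a multi-gigabyte file):

from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)
print(wv.similarity("cat", "dog"))       # high similarity, e.g. ~0.76
print(wv.most_similar("Paris", topn=3))  # nearest words in embedding space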

Embedding Dimensions

50D embeddings:   Fast, low memory, OK quality
100D embeddings:  Good balance of speed and quality
300D embeddings:  High quality (the Word2Vec standard)
1000D embeddings: Diminishing returns; slow and memory-hungry

Most projects use 300D as the sweet spot.
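If you do train your own, the dimensionality is just a parameter. A sketch with gensim (a tiny toy corpus is assumed here, just to show the API):

from gensim.models import Word2Vec

sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
# sg=1 selects the skip-gram architecture; vector_size sets the dimension
model = Word2Vec(sentences, vector_size=300, window=2, min_count=1, sg=1)
print(model.wv["cat"].shape)  # (300,) -- one 300-D vector per word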
