Tokenization, Word Embeddings & Word2Vec

From Text to Numbers

Text Processing & Embeddings

The Challenge: Text to Neural Networks

Neural networks need numbers, but we have text!

"The cat sat on the mat" β†’ ???

Step 1: Tokenization

Break text into tokens (words, subwords, characters).

"The cat sat on the mat"
↓
["The", "cat", "sat", "on", "the", "mat"]

Step 2: Vocabulary & Indexing

Map each token to an integer ID (tokens are lowercased first, so "The" and "the" share the same ID).

Vocabulary:
- "cat" β†’ 2
- "mat" β†’ 5
- "on" β†’ 4
- "sat" β†’ 3
- "the" β†’ 1

Text: "The cat sat on the mat"
↓
[1, 2, 3, 4, 1, 5]
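
A small sketch of this step, assuming tokens are lowercased and IDs are assigned in order of first appearance starting at 1 (0 is often reserved for padding), which reproduces the mapping above:

# Lowercase the tokens, then assign IDs in order of first appearance.
tokens = [t.lower() for t in "The cat sat on the mat".split()]

vocab = {}
for tok in tokens:
    if tok not in vocab:
        vocab[tok] = len(vocab) + 1   # start at 1; 0 kept for padding

ids = [vocab[tok] for tok in tokens]
print(vocab)  # {'the': 1, 'cat': 2, 'sat': 3, 'on': 4, 'mat': 5}
print(ids)    # [1, 2, 3, 4, 1, 5]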

Step 3: Convert to Vectors

Now the neural network can process the sequence! (Under the hood, each ID is first turned into a vector, as the next sections show.)

[1, 2, 3, 4, 1, 5] → Neural Network → Output
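
A hedged sketch of this hand-off, assuming PyTorch is available: an nn.Embedding layer looks up a learned vector for each ID, and those vectors are what the network actually consumes (dense embeddings are covered below):

import torch
import torch.nn as nn

# Embedding table with room for IDs 0..5 (0 left free for padding),
# each ID mapped to a learned 5-dimensional vector.
embedding = nn.Embedding(num_embeddings=6, embedding_dim=5)

ids = torch.tensor([1, 2, 3, 4, 1, 5])
vectors = embedding(ids)
print(vectors.shape)  # torch.Size([6, 5]) -> 6 tokens, 5 dimensions each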

One-Hot Encoding (Naive Approach)

Represent each word as a vector that is all zeros except for a single 1 at that word's index:

Vocabulary: {cat, dog, mat, sat, the}

"cat" β†’ [1, 0, 0, 0, 0]  (one-hot)
"dog" β†’ [0, 1, 0, 0, 0]
"the" β†’ [0, 0, 0, 0, 1]

Problem: No semantic relationship!

  • "cat" and "dog" should be similar (both animals)
  • "cat" and "pizza" should be different
  • But one-hot gives them no relationship

Dense Word Embeddings (Better Approach)

Instead of sparse one-hot vectors, learn dense embeddings:

"cat" β†’ [0.2, -0.5, 0.8, 0.1, -0.3]  (5D embedding)
"dog" β†’ [0.25, -0.48, 0.75, 0.15, -0.25]  (similar!)
"the" β†’ [-0.1, 0.2, -0.3, 0.8, 0.1]  (different)

Magic: Similar words have similar vectors!

  • Euclidean distance between the "cat" and "dog" vectors: ≈ 0.1 ← close!
  • Euclidean distance between the "cat" and "the" vectors: ≈ 1.6 ← far! (see the sketch below)
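
A small sketch that reproduces these numbers with NumPy, assuming Euclidean distance over the 5-D example vectors above:

import numpy as np

# The 5-D example embeddings from above.
cat = np.array([0.2, -0.5, 0.8, 0.1, -0.3])
dog = np.array([0.25, -0.48, 0.75, 0.15, -0.25])
the = np.array([-0.1, 0.2, -0.3, 0.8, 0.1])

print(np.linalg.norm(cat - dog))  # ~0.10 -> "cat" and "dog" are close
print(np.linalg.norm(cat - the))  # ~1.56 -> "cat" and "the" are far apart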