Recurrent Neural Networks (RNN, LSTM, GRU) · Page 2 of 2

LSTM (Long Short-Term Memory) & GRU

LSTM (Long Short-Term Memory)

Solution to vanishing gradients: Use gates to control information flow!

The LSTM Cell

Three gates, plus a candidate update, control what gets remembered:

1. Forget Gate: f_t = sigmoid(W_f × [h_{t-1}, x_t] + b_f)
   "Should I forget this info?"

2. Input Gate: i_t = sigmoid(W_i × [h_{t-1}, x_t] + b_i)
   "Should I learn this new info?"

3. Candidate: C̃_t = tanh(W_c × [h_{t-1}, x_t] + b_c)
   "What new info should I learn?"

4. Output Gate: o_t = sigmoid(W_o × [h_{t-1}, x_t] + b_o)
   "What info should I output?"

Cell state update:
C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t  (add new, forget old)

Hidden state:
h_t = o_t ⊙ tanh(C_t)

(⊙ = element-wise multiplication)
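
The equations above can be sketched as a single time step in plain Python. This is a minimal illustration, not a production implementation: the helper names (`sigmoid`, `affine`, `lstm_step`, `init_params`), the `params` dictionary layout, the sizes, and the random initialization are all assumptions made for the example. Plain lists are used so the code runs without any libraries; real code would normally use NumPy or a deep learning framework.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Return W × v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step, following the equations above."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]    # input gate
    g = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate C̃_t
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]    # output gate
    # Cell state: keep part of the old state, add part of the candidate.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Hidden state: gated, squashed view of the cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initializer: small random weights, zero biases."""
    rng = random.Random(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params["W_" + name] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        params["b_" + name] = [0.0] * hidden_size
    return params

# Usage: run three one-hot inputs through the cell (assumed toy sizes).
params = init_params(input_size=3, hidden_size=4)
h, c = [0.0] * 4, [0.0] * 4
for x_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, c = lstm_step(x_t, h, c, params)
print("h_t:", h)
print("C_t:", c)
```

Note that `h_t` is always strictly between -1 and 1, because it is a sigmoid output multiplied by a tanh output, while `C_t` is not bounded in the same way.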

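Key insight: The cell state is updated additively (scaled only by the forget gate), so gradients can flow across many time steps without shrinking as quickly as in a vanilla RNN.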
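
A small numeric illustration of this point, under stated assumptions. Along the direct cell-state path, and treating the gate values as fixed, the derivative of C_t with respect to C_{t-1} is just f_t, so the gradient over many steps is a product of forget-gate values. In a vanilla RNN each step instead multiplies by a weight times the tanh derivative. The per-step factors below (0.5 and 0.99) are made-up illustrative numbers, not measurements from a trained model.

```python
def path_gradient(factors):
    """Multiply per-step derivative factors along a backpropagation path."""
    grad = 1.0
    for factor in factors:
        grad *= factor
    return grad

steps = 100

# Assumed illustrative values:
rnn_factor = 0.5     # per-step |w * tanh'(.)| in a vanilla RNN
forget_gate = 0.99   # per-step f_t when the LSTM has learned to keep a memory

rnn_grad = path_gradient([rnn_factor] * steps)
lstm_grad = path_gradient([forget_gate] * steps)

print(f"vanilla RNN path after {steps} steps: {rnn_grad:.3e}")
print(f"LSTM cell-state path after {steps} steps: {lstm_grad:.3f}")
```

The same arithmetic shows the limit of the argument: if the forget gate sits near 0 instead of near 1, the product shrinks just as fast, so the LSTM reduces the vanishing-gradient problem without removing it entirely.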

Example: Understanding Context

Sentence: "The cat, which was orange and fluffy, sat"

Intuitively (real gates act on vector components, not whole words):

LSTM can learn to discard less relevant words (the adjectives in the clause)
Remembers "cat" as the subject
Learns that "sat" is the verb for "cat"

Forget gate: "Forget 'orange', 'fluffy', 'and'"
Input gate: "Remember 'cat'"
Output gate: "Output: 'sat' is the verb of 'cat'"

GRU (Gated Recurrent Unit)

Simpler than LSTM, and often similar in performance.

Only 2 gates (vs LSTM's 3) and no separate cell state:

Reset gate: r_t = sigmoid(W_r × [h_{t-1}, x_t] + b_r)
Update gate: z_t = sigmoid(W_z × [h_{t-1}, x_t] + b_z)

h̃_t = tanh(W × [r_t ⊙ h_{t-1}, x_t] + b)
h_t = (1 - z_t) ⊙ h̃_t + z_t ⊙ h_{t-1}
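
The GRU equations above can be sketched as one time step in plain Python. As a hedged illustration only: the helper names (`sigmoid`, `affine`, `gru_step`), the `params` dictionary keys, the toy sizes, and the random weights are assumptions for this example, and lists are used instead of a numerical library so the code runs anywhere.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, v):
    """Return W × v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def gru_step(x_t, h_prev, params):
    """One GRU time step, following the equations above."""
    z_in = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    r = [sigmoid(a) for a in affine(params["W_r"], params["b_r"], z_in)]  # reset gate
    z = [sigmoid(a) for a in affine(params["W_z"], params["b_z"], z_in)]  # update gate
    # The candidate sees the previous state only through the reset gate.
    gated = [rk * hk for rk, hk in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(a) for a in affine(params["W"], params["b"], gated)]
    # z_t near 1 keeps the old state; z_t near 0 takes the candidate.
    return [(1 - zk) * ck + zk * hk for zk, ck, hk in zip(z, h_cand, h_prev)]

# Usage with assumed toy sizes and small random weights.
rng = random.Random(0)
hidden_size, input_size = 2, 3
params = {}
for w_name, b_name in (("W_r", "b_r"), ("W_z", "b_z"), ("W", "b")):
    params[w_name] = [
        [rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
        for _ in range(hidden_size)
    ]
    params[b_name] = [0.0] * hidden_size

h = [0.0] * hidden_size
for x_t in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h = gru_step(x_t, h, params)
print("h_t:", h)
```

Unlike the LSTM, there is no separate cell state: the single vector `h` serves as both memory and output.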

Advantages of GRU:

  • Fewer parameters (3 weight matrices vs LSTM's 4)
  • Faster training
  • Often similar performance to LSTM
  • Good for smaller datasets
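
To make the parameter difference concrete, here is a rough count based on the equations on this page, where every gate or candidate has one weight matrix of shape hidden × (hidden + input) and one bias vector. The function name and the example sizes (128 inputs, 256 hidden units) are assumptions for illustration. Framework implementations may store biases differently, so their reported counts can differ slightly.

```python
def recurrent_param_count(input_size, hidden_size, num_blocks):
    """Parameters for num_blocks (weight matrix + bias) pairs.

    Each block has a hidden × (hidden + input) matrix and a bias of length hidden.
    """
    per_block = hidden_size * (hidden_size + input_size) + hidden_size
    return num_blocks * per_block

# Number of (W, b) blocks per model, read off the equations:
# RNN: 1 update; GRU: reset, update, candidate; LSTM: forget, input, candidate, output.
BLOCKS = {"RNN": 1, "GRU": 3, "LSTM": 4}

input_size, hidden_size = 128, 256  # assumed example sizes
for name, blocks in BLOCKS.items():
    print(f"{name:<5}{recurrent_param_count(input_size, hidden_size, blocks):>10,}")
```

With the same sizes, a GRU layer therefore has three quarters of the parameters of an LSTM layer.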

LSTM vs GRU vs RNN

Model   Params   Speed    Long-term memory   Use Case
RNN     Few      Fast     Poor               Simple sequences
GRU     Medium   Medium   Good               Text, most tasks
LSTM    Many     Slow     Excellent          Complex sequences

Modern practice:

  • Default to GRU (good balance)
  • Use LSTM for very long sequences
  • Avoid vanilla RNN (vanishing gradients)

Bidirectional RNNs

Process sequence in both directions:

Forward RNN: → (left to right)
Backward RNN: ← (right to left)
Concatenate: [forward_hidden, backward_hidden]

Advantage: Each position sees both past and future context (the full sequence must be available up front).

Examples: Sequence labeling, machine translation
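
The forward/backward/concatenate idea can be sketched with a toy scalar RNN. This is only an illustration under assumptions: `run_rnn` and `bidirectional` are hypothetical helper names, the weights are arbitrary, and the simple tanh recurrence stands in for any recurrent cell (RNN, GRU, or LSTM).

```python
import math

def run_rnn(inputs, w_x, w_h):
    """Run a toy scalar RNN and return the hidden state at every step."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional(inputs, fwd_weights, bwd_weights):
    """Return [forward_hidden, backward_hidden] for each position."""
    forward = run_rnn(inputs, *fwd_weights)
    # Process the reversed sequence, then flip back so index t lines up again.
    backward = run_rnn(inputs[::-1], *bwd_weights)[::-1]
    return [[f, b] for f, b in zip(forward, backward)]

# The only non-zero input is at the last position.
sequence = [0.0, 0.0, 0.0, 1.0]
outputs = bidirectional(sequence, fwd_weights=(1.0, 0.5), bwd_weights=(1.0, 0.5))
for t, (f, b) in enumerate(outputs):
    print(f"t={t}  forward={f:.4f}  backward={b:.4f}")
```

At position 0 the forward state is still zero because it has not yet seen the final input, while the backward state is already non-zero: that is the "look ahead" the backward pass provides.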
