LIVE · BENGALURU EST. 2024
7 SHORTS
OBSERVATION · 2026-05-28

L2 regularisation is just weight decay

$$w \leftarrow w(1 - 2\lambda\eta) - \eta\nabla L$$

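Under plain SGD the math is identical. L2 adds a penalty term λ‖w‖² to the loss. The gradient of that penalty is 2λw, so every step multiplies the weights by (1 − 2λη) before applying the usual gradient update. That shrink factor is weight decay. Two names, one operation. With plain SGD, a framework that exposes both knobs is giving you the same lever twice. Adaptive optimisers such as Adam rescale the penalty gradient along with everything else, so there the two knobs can behave differently.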
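A quick numerical check of the equivalence, as a minimal sketch assuming plain SGD with learning rate η (`lr`) and penalty λ‖w‖² (`lam`). The function names and the random test vectors are illustrative assumptions, not taken from any framework.

```python
import numpy as np


def sgd_step_l2(w, grad_loss, lr, lam):
    """One SGD step on the penalised loss L(w) + lam * ||w||^2 (hypothetical helper)."""
    # The gradient of the penalty lam * ||w||^2 is 2 * lam * w.
    return w - lr * (grad_loss + 2.0 * lam * w)


def sgd_step_weight_decay(w, grad_loss, lr, lam):
    """One SGD step written as shrink-then-update (hypothetical helper)."""
    # Multiply the weights by the shrink factor, then apply the plain gradient.
    return w * (1.0 - 2.0 * lam * lr) - lr * grad_loss


rng = np.random.default_rng(0)
w = rng.normal(size=5)   # stand-in weights
g = rng.normal(size=5)   # stand-in for the unpenalised gradient of L
lr, lam = 0.1, 0.01

step_l2 = sgd_step_l2(w, g, lr, lam)
step_wd = sgd_step_weight_decay(w, g, lr, lam)
same = bool(np.allclose(step_l2, step_wd))
print(same)
```

The two functions are the same expression with the terms regrouped, so they agree up to floating-point rounding. The factor of 2 comes from writing the penalty as λ‖w‖²; a penalty written as (λ/2)‖w‖² would give a shrink factor of (1 − λη) instead.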

★ STANDALONE OBSERVATION