7 SHORTS
OBSERVATION · 2026-05-28
L2 regularisation is just weight decay
w←w(1−2λη)−η∇L
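For plain SGD, the math is identical. L2 adds a penalty term λ‖w‖² to the loss, and the gradient of that penalty is 2λw. Take a gradient step with learning rate η and the weights pick up a multiplicative shrink factor (1−2λη) at every step: that shrink factor is weight decay. Two names, one operation. Under plain SGD, a framework that exposes both knobs is giving you the same lever twice. The equivalence does not carry over to adaptive optimisers such as Adam, where the penalty gradient is rescaled along with the rest of the gradient; that is the distinction AdamW's decoupled weight decay draws.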
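A minimal sketch of the equivalence for a single plain-SGD step, in pure Python. The toy quadratic loss, the target vector, the values of λ and η, and all function and variable names are assumptions made for illustration; none of them comes from a particular framework.

```python
# Sketch: one plain-SGD step on a toy loss, computed two ways.
# All names and values here are hypothetical, chosen for illustration.

def grad_loss(w, target):
    """Gradient of the unpenalised toy loss L(w) = 0.5 * sum((w_i - t_i)^2)."""
    return [wi - ti for wi, ti in zip(w, target)]

def step_l2(w, target, lam, eta):
    """SGD step on L(w) + lam * ||w||^2.

    The penalty adds 2 * lam * w to the gradient.
    """
    g = grad_loss(w, target)
    return [wi - eta * (gi + 2 * lam * wi) for wi, gi in zip(w, g)]

def step_weight_decay(w, target, lam, eta):
    """Shrink the weights by (1 - 2 * lam * eta), then step on the unpenalised gradient."""
    g = grad_loss(w, target)
    return [wi * (1 - 2 * lam * eta) - eta * gi for wi, gi in zip(w, g)]

w = [1.0, -2.0, 0.5]
target = [0.0, 1.0, 1.0]
lam, eta = 0.01, 0.1

w_l2 = step_l2(w, target, lam, eta)
w_wd = step_weight_decay(w, target, lam, eta)

# Largest per-coordinate disagreement between the two updates.
max_diff = max(abs(a - b) for a, b in zip(w_l2, w_wd))
print(max_diff)
```

The two updates should agree up to floating-point rounding, so the printed difference is zero or vanishingly small. The sketch covers plain SGD only; it says nothing about optimisers that rescale the gradient before applying it.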
★ STANDALONE OBSERVATION