Lab Shorts — Kushal Lab

9 SHORTS

OBSERVATION 2026-05-28

L2 regularisation is just weight decay

w \leftarrow w(1 - 2\lambda\eta) - \eta\nabla L

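For plain SGD, the math is identical. L2 adds a penalty term λ‖w‖² to the loss, whose gradient is 2λw. Follow the gradient with step size η and you get a multiplicative shrink factor (1 − 2λη) on the weights at every step; that shrink factor is weight decay. Two names, one operation. With plain SGD, frameworks that expose both knobs are giving you the same lever twice. Adaptive optimisers such as Adam rescale the penalty gradient before applying it, so there the two are no longer the same.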
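A minimal sketch to check the claim numerically, assuming plain SGD and a made-up toy loss L(w) = ½ Σ (wᵢ − 1)². The function names (`grad_loss`, `step_l2`, `step_weight_decay`) and the values of `lr` and `lam` are hypothetical, chosen only for illustration. One path adds the penalty gradient 2λw to the loss gradient; the other shrinks the weights by (1 − 2λη) and then takes an unpenalised step.

```python
def grad_loss(w):
    # Gradient of the toy loss L(w) = 0.5 * sum((w_i - 1)^2); an assumption for this sketch.
    return [wi - 1.0 for wi in w]


def step_l2(w, lr, lam):
    # SGD on L(w) + lam * ||w||^2: the penalty adds 2 * lam * w to the gradient.
    g = grad_loss(w)
    return [wi - lr * (gi + 2.0 * lam * wi) for wi, gi in zip(w, g)]


def step_weight_decay(w, lr, lam):
    # Shrink the weights by (1 - 2 * lam * lr), then take a plain gradient step.
    g = grad_loss(w)
    return [wi * (1.0 - 2.0 * lam * lr) - lr * gi for wi, gi in zip(w, g)]


w_l2 = [0.5, -2.0, 3.0]
w_wd = [0.5, -2.0, 3.0]
for _ in range(100):
    w_l2 = step_l2(w_l2, lr=0.1, lam=0.01)
    w_wd = step_weight_decay(w_wd, lr=0.1, lam=0.01)

# Largest gap between the two trajectories; only floating-point rounding separates them.
max_diff = max(abs(a - b) for a, b in zip(w_l2, w_wd))
print(max_diff)
```

The two update rules are the same expression with the terms regrouped, so any gap comes from rounding alone. The sketch says nothing about optimisers with momentum or per-parameter scaling.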

★ STANDALONE OBSERVATION
