- A popular technique to mitigate the exploding gradients problem.
- During backpropagation, gradients are clipped so that they never exceed a given threshold (see the sketch after this list).
- The threshold is a hyperparameter that can be tuned.
- Used mostly in RNNs, where batch normalization is tricky to apply; for most other types of networks, batch normalization is usually sufficient.
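A minimal sketch of what clipping means inside a custom training loop, assuming a small TensorFlow/Keras model; the `train_step` helper and the threshold value are illustrative. The Keras optimizer arguments shown in the next section do the same thing for you.

```python
import tensorflow as tf
from tensorflow import keras

# Assumed setup: a tiny model, loss, and optimizer purely for illustration.
model = keras.Sequential([keras.layers.Dense(1)])
loss_fn = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

@tf.function
def train_step(x, y, clip_threshold=1.0):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    # Clip every gradient component into [-clip_threshold, clip_threshold]
    # before the optimizer applies the update.
    clipped = [tf.clip_by_value(g, -clip_threshold, clip_threshold) for g in grads]
    optimizer.apply_gradients(zip(clipped, model.trainable_variables))
    return loss
```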
Using Gradient Clipping in Keras
from tensorflow import keras

optimizer = keras.optimizers.SGD(clipvalue=1.0)  # clip each gradient component to [-1.0, 1.0]
model.compile(loss="mse", optimizer=optimizer)
- Every component of the gradient vector will be clipped to the range [-1.0, 1.0].
- This can change the orientation of the gradient vector; nevertheless, this approach works well in practice.
- To retain the direction of the gradient vector, use `clipnorm` instead of `clipvalue`.
- When using `clipnorm`, the whole gradient vector is clipped if its ℓ2 norm is greater than the specified threshold (see the comparison after this list).
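A small comparison, assuming TensorFlow is available, of why clipping by value can change the gradient's direction while clipping by norm only rescales it; the gradient values here are made up for illustration.

```python
import tensorflow as tf

grad = tf.constant([0.9, 100.0])

# Clip by value: each component is forced into [-1.0, 1.0].
by_value = tf.clip_by_value(grad, -1.0, 1.0)  # -> [0.9, 1.0]

# Clip by norm: the whole vector is rescaled so its L2 norm is at most 1.0.
by_norm = tf.clip_by_norm(grad, 1.0)          # -> roughly [0.009, 1.0]

print(by_value.numpy())  # points in a very different direction than grad
print(by_norm.numpy())   # same direction as grad, just much shorter
```

In Keras, the corresponding optimizer arguments are `clipvalue=1.0` and `clipnorm=1.0`.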
Gradient Clipping in Practice
- Track the size of the gradients during training, for example with TensorBoard (see the logging sketch after this list).
- If the gradients are exploding, try clipping by value and clipping by norm.
- Try both approaches with different thresholds and see which one performs best on the validation set (see the comparison sketch after this list).
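One way to track gradient sizes, sketched below assuming a custom training loop with `tf.summary`; the log directory and step counter are illustrative choices, not part of the original notes.

```python
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1)])
loss_fn = keras.losses.MeanSquaredError()
optimizer = keras.optimizers.SGD(learning_rate=1e-3)

# Hypothetical log directory; point TensorBoard at it with `tensorboard --logdir logs`.
writer = tf.summary.create_file_writer("logs/gradients")

def train_step(x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    with writer.as_default():
        # Log the global L2 norm of all gradients; exploding gradients show up
        # as a growing curve in TensorBoard.
        tf.summary.scalar("grad_global_norm", tf.linalg.global_norm(grads), step=step)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```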
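A rough sketch of comparing the two clipping flavours on a validation set; the thresholds, the dummy data, and the `build_model` helper are assumptions made for illustration only.

```python
import numpy as np
from tensorflow import keras

# Dummy data purely for illustration; substitute your own dataset.
X_train, y_train = np.random.rand(100, 8), np.random.rand(100, 1)
X_valid, y_valid = np.random.rand(20, 8), np.random.rand(20, 1)

def build_model():
    # Placeholder architecture; substitute your own network.
    return keras.Sequential([keras.layers.Dense(30, activation="relu"),
                             keras.layers.Dense(1)])

candidates = [
    {"clipvalue": 0.5}, {"clipvalue": 1.0},  # clip by value at different thresholds
    {"clipnorm": 0.5},  {"clipnorm": 1.0},   # clip by norm at different thresholds
]

results = {}
for kwargs in candidates:
    model = build_model()
    model.compile(loss="mse", optimizer=keras.optimizers.SGD(**kwargs))
    history = model.fit(X_train, y_train, epochs=10,
                        validation_data=(X_valid, y_valid), verbose=0)
    results[str(kwargs)] = history.history["val_loss"][-1]

print(results)  # pick the configuration with the lowest validation loss
```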