Day 19 - Nesterov Accelerated Gradient
- A faster variant of momentum optimization.
- The gradient of the cost function is not calculated at the local position θ but slightly ahead in the direction of the momentum, at θ + βm.
- Nesterov Accelerated Gradient algorithm (one update step; a NumPy sketch follows this list):
  1. m ← βm − η ∇θ J(θ + βm)
  2. θ ← θ + m
- This tweak works because the momentum vector generally points toward the optimum, so it is slightly more accurate to measure the gradient a bit farther in that direction rather than at the original position.
- When momentum pushes the weights across a valley, regular momentum optimization continues to push further across the valley while NAG pushes back toward the bottom of the valley.
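A minimal NumPy sketch of the NAG update on a toy quadratic cost. The function `cost_grad`, the matrix `A`, and the values of `eta` and `beta` are illustrative assumptions, not from the source; the point is only that the gradient is measured at θ + βm instead of at θ.

```python
import numpy as np

def cost_grad(theta):
    """Gradient of a toy quadratic cost J(theta) = 0.5 * theta @ A @ theta (illustrative)."""
    A = np.array([[4.0, 0.0], [0.0, 1.0]])
    return A @ theta

eta, beta = 0.1, 0.9                 # learning rate and momentum coefficient (assumed values)
theta = np.array([1.0, 1.0])         # current parameters
m = np.zeros_like(theta)             # momentum vector

for _ in range(50):
    # NAG: measure the gradient a bit ahead, at theta + beta*m, rather than at theta
    m = beta * m - eta * cost_grad(theta + beta * m)
    theta = theta + m

print(theta)  # close to the optimum at [0, 0]
```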
Using Nesterov Accelerated Gradient in Keras
from tensorflow import keras

optimizer = keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)  # nesterov=True enables NAG; learning_rate replaces the deprecated lr argument
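For context, this optimizer would then be passed to `model.compile`. The tiny model, input shape, and loss below are illustrative assumptions only, reusing the `optimizer` defined above:

```python
# Illustrative model to show where the optimizer plugs in (architecture and loss are assumed)
model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer=optimizer)
```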