This tweak works because the momentum vector generally points toward the optimum, so it is slightly more accurate to measure the gradient a bit farther in that direction than at the current position.
When momentum pushes the weights across a valley, regular momentum optimization continues to push farther across the valley, while NAG pushes back toward the bottom of the valley.
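To make the difference concrete, here is a minimal sketch comparing the two update rules on a toy one-dimensional valley. Everything in it is an assumption chosen for illustration: the cost function J(θ) = ½θ², the learning rate of 0.1, the momentum of 0.9, and the helper names `grad`, `momentum_step`, `nag_step`, and `run`. It also assumes the common formulation in which NAG measures the gradient at θ + βm rather than at θ.

```python
def grad(theta):
    # Gradient of the hypothetical cost J(theta) = 0.5 * theta**2,
    # a simple valley whose bottom is at theta = 0.
    return theta

def momentum_step(theta, m, lr=0.1, beta=0.9):
    # Regular momentum: the gradient is measured at the current position.
    m = beta * m - lr * grad(theta)
    return theta + m, m

def nag_step(theta, m, lr=0.1, beta=0.9):
    # NAG: the gradient is measured slightly ahead, at theta + beta * m.
    m = beta * m - lr * grad(theta + beta * m)
    return theta + m, m

def run(step, n_steps=100, theta0=1.0):
    # Apply the given update rule repeatedly and record every position visited.
    theta, m = theta0, 0.0
    path = [theta]
    for _ in range(n_steps):
        theta, m = step(theta, m)
        path.append(theta)
    return path

momentum_path = run(momentum_step)
nag_path = run(nag_step)

# Both runs start at theta = 1.0 and cross the bottom of the valley at 0,
# so the most negative value reached is how far each one overshoots.
print("momentum overshoot:", min(momentum_path))
print("NAG overshoot:     ", min(nag_path))
```

In this toy setup both optimizers cross the bottom of the valley, but NAG's deepest overshoot is smaller than regular momentum's. Once the weights are past the bottom, the look-ahead point lies even farther up the opposite slope, so the gradient measured there pulls back harder.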