Day 21 - Learning Rate Scheduling
Finding a good learning rate is important.
- A very high rate causes training to diverge.
- A very low rate makes training take very long to converge.
- A slightly too high rate makes quick progress at first but never settles down at the optimum.
Finding a good learning rate:
- Train the model for a few hundred iterations.
- Exponentially increase the learning rate from a very small value to a very large value during this run.
- Plot the loss as a function of the learning rate and pick a rate slightly lower than the point where the loss starts shooting back up (see the sketch after this list).
- Reinitialize the model and train it with this learning rate.
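A minimal sketch of this procedure using a custom callback (the class name ExponentialLearningRate, the 1e-5 to 10 range, the 500 iterations, and the existing model, X_train, and y_train are all assumptions, not part of the original notes):

from tensorflow import keras

class ExponentialLearningRate(keras.callbacks.Callback):
    def __init__(self, factor):
        self.factor = factor   # multiplicative increase applied after every batch
        self.rates = []
        self.losses = []
    def on_batch_end(self, batch, logs=None):
        # record the current rate and loss, then grow the rate exponentially
        lr = keras.backend.get_value(self.model.optimizer.lr)
        self.rates.append(lr)
        self.losses.append(logs["loss"])
        keras.backend.set_value(self.model.optimizer.lr, lr * self.factor)

# grow the learning rate from 1e-5 to 10 over roughly 500 iterations (assumed values)
expon_lr = ExponentialLearningRate(factor=(10 / 1e-5) ** (1 / 500))
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(lr=1e-5))
history = model.fit(X_train, y_train, epochs=1, callbacks=[expon_lr])
# then plot expon_lr.losses against expon_lr.rates and pick the rate just below the blow-up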
Using a non-constant learning rate:
- Start with a large learning rate and then reduce it once training stops making fast progress.
- A good solution can be reached faster this way than when using the optimal constant learning rate.
- There are many different strategies to reduce the learning rate. These strategies are called learning schedules.
Examples of learning schedules:
- Power scheduling
# decay divides the learning rate by (1 + decay * iteration) at each step
optimizer = keras.optimizers.SGD(lr=0.01, decay=1e-4)
- Exponential scheduling
def exponential_decay(lr0, s):
    def exponential_decay_fn(epoch):
        return lr0 * 0.1 ** (epoch / s)
    return exponential_decay_fn

exponential_decay_fn = exponential_decay(lr0=0.01, s=20)
lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)
model.fit(..., callbacks=[lr_scheduler])
- Piecewise constant scheduling
def piecewise_constant_fn(epoch):
    if epoch < 5:
        return 0.01
    elif epoch < 15:
        return 0.005
    else:
        return 0.001

lr_scheduler = keras.callbacks.LearningRateScheduler(piecewise_constant_fn)
model.fit(..., callbacks=[lr_scheduler])
- Performance scheduling
# multiply the learning rate by 0.5 whenever the monitored metric (val_loss by default) has not improved for 5 epochs
lr_scheduler = keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)
- 1-cycle scheduling
- Create a custom callback that modifies the learning rate at each iteration.
- Inside the callback, use self.model.optimizer.lr to update the learning rate (see the sketch below).
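A minimal sketch of such a callback for a simplified 1-cycle schedule (the class name OneCycleScheduler, the linear up/down shape, and the example values are assumptions; a full 1-cycle implementation also drops the rate further in a final phase and cycles momentum):

from tensorflow import keras

class OneCycleScheduler(keras.callbacks.Callback):
    def __init__(self, iterations, max_rate, start_rate=None):
        self.iterations = iterations              # total number of training batches
        self.max_rate = max_rate
        self.start_rate = start_rate or max_rate / 10
        self.half = iterations // 2
        self.iteration = 0
    def on_batch_begin(self, batch, logs=None):
        # first half: ramp linearly from start_rate up to max_rate,
        # second half: ramp linearly back down to start_rate
        if self.iteration < self.half:
            rate = self.start_rate + (self.max_rate - self.start_rate) * self.iteration / self.half
        else:
            rate = self.max_rate - (self.max_rate - self.start_rate) * (self.iteration - self.half) / self.half
        keras.backend.set_value(self.model.optimizer.lr, rate)
        self.iteration += 1

# e.g. 25 epochs with batch size 32 (assumed values)
n_iterations = len(X_train) // 32 * 25
onecycle = OneCycleScheduler(n_iterations, max_rate=0.05)
history = model.fit(X_train, y_train, epochs=25, batch_size=32, callbacks=[onecycle])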
Another way to use a learning schedule (specific to tf.keras): define the learning rate using one of the schedules available in keras.optimizers.schedules and then pass this learning rate to any optimizer. With this approach the learning rate is updated at each step rather than once per epoch.
learning_rate = keras.optimizers.schedules.ExponentialDecay(...)
optimizer = keras.optimizers.SGD(learning_rate)
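For example (the concrete values below are assumptions chosen to match the exponential schedule above, with s expressed in steps rather than epochs):

# divide the rate by 10 every s steps; here s is 20 epochs' worth of batches (assumed batch size 32)
s = 20 * len(X_train) // 32
learning_rate = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=s, decay_rate=0.1)
optimizer = keras.optimizers.SGD(learning_rate)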