Day 17 - Reusing Pretrained Layers
- Training a large deep neural network from scratch takes a lot of time, data and resources.
- Transfer learning - find a network trained for a similar task and reuse the lower layers of this network.
- Transfer learning works best when the original task and the new task rely on similar low-level features in their inputs.
- Transfer learning works best with deep convolutional neural networks.
Finding the right number of layers to reuse
- The more similar the tasks are, the more layers you can reuse (starting from the lowest layer).
- For very similar tasks, keep all hidden layers as they are and replace the output layer.
- Freeze all reused layers.
- Train model and see how it performs.
- Unfreeze one or two of the top-most frozen layers and see if performance improves.
⭐ The more training data you have, the more layers you can unfreeze.
⭐ Reduce the learning rate when unfreezing reused layers to prevent wrecking their pretrained weights.
- If you have little training data, drop some of the top hidden layers and freeze the remaining hidden layers.
- If you have a lot of training data, replace the top hidden layers instead of dropping them; more hidden layers can also be added (a rough sketch of both options follows this list).
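A rough sketch of both options, assuming an existing pretrained Keras Sequential model named `model` and hypothetical layer counts and sizes (in practice you would clone the reused layers first, as shown in the next section):
import tensorflow as tf

# Little data: drop the top hidden layers, reuse and freeze the lower ones,
# then add a new output layer (the number of dropped layers is hypothetical)
lower_layers = model.layers[:-3]
for layer in lower_layers:
    layer.trainable = False
small_data_model = tf.keras.models.Sequential(
    lower_layers + [tf.keras.layers.Dense(1, activation="sigmoid")]
)

# Lots of data: keep all hidden layers and add extra hidden layers plus a new output layer
big_data_model = tf.keras.models.Sequential(
    model.layers[:-1]
    + [tf.keras.layers.Dense(100, activation="relu"),
       tf.keras.layers.Dense(1, activation="sigmoid")]
)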
Transfer Learning with Keras
import tensorflow as tf
# Create a new model without the last layer of the original model
# (assumes `model` is an existing trained Keras Sequential model)
model_clone = tf.keras.models.Sequential(model.layers[:-1])
# Clone the model so that the layers of original model are unaffected
model_clone = tf.keras.models.clone_model(model_clone)
# Build the cloned model using the same input shape as the original model
model_clone.build(model.layers[0].input_shape)
# Copy the reused layers' weights manually since clone_model doesn't clone weights
# (zip stops at the clone's last layer, so the original output layer is skipped)
for cloned_layer, original_layer in zip(model_clone.layers, model.layers):
    cloned_layer.set_weights(original_layer.get_weights())
# Add new output layer depending on task we want to solve
model_clone.add(tf.keras.layers.Dense(1, activation="sigmoid"))
# Freeze reused layers for the first few epochs
# so that the new output layer has time to learn reasonable weights
for layer in model_clone.layers[:-1]:
    layer.trainable = False
# The model always needs to be compiled after freezing/unfreezing layers
model_clone.compile(loss="binary_crossentropy", optimizer="sgd")
# Train the model for a few epochs
model_clone.fit(...)
# Unfreeze reused layers
for layer in model_clone.layers[:-1]:
    layer.trainable = True
# Reduce learning rate to prevent wrecking reused weights
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4)
model_clone.compile(loss="binary_crossentropy", optimizer=optimizer)
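# Continue training to fine-tune the unfrozen reused layers at the lower learning rate
model_clone.fit(...)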
⭐ The model always needs to be compiled after freezing or unfreezing layers.
Unsupervised Pretraining
- If sufficient labeled training data is not available, gather unlabeled data, train an unsupervised model (e.g. an autoencoder or a GAN), and reuse its lower layers.
- Add an output layer on top and fine-tune the final network using supervised learning with the available labeled data (a rough sketch follows).
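A rough sketch of this workflow, assuming hypothetical arrays X_unlabeled (plentiful unlabeled data), X_labeled and y_labeled (a small labeled set), and hypothetical layer sizes:
import tensorflow as tf

n_inputs = 100  # hypothetical input dimension

# 1. Unsupervised pretraining: train an autoencoder on the unlabeled data
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=[n_inputs]),
    tf.keras.layers.Dense(32, activation="relu"),
])
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=[32]),
    tf.keras.layers.Dense(n_inputs),
])
autoencoder = tf.keras.models.Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer="adam")
autoencoder.fit(X_unlabeled, X_unlabeled, epochs=10)

# 2. Reuse the encoder's lower layers, add an output layer on top,
#    and fine-tune on the small labeled dataset
classifier = tf.keras.models.Sequential([
    encoder,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
classifier.compile(loss="binary_crossentropy", optimizer="sgd")
classifier.fit(X_labeled, y_labeled, epochs=10)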
Pretraining on an Auxiliary Task
- Train a network on a different but related task for which labeled data is readily available or can be easily obtained, then reuse the lower layers of that network (a minimal sketch follows).
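A minimal sketch, assuming hypothetical datasets (X_aux, y_aux) for the auxiliary task (plentiful labels) and (X_target, y_target) for the actual task:
import tensorflow as tf

n_inputs = 100  # hypothetical input dimension

# 1. Train a model on the auxiliary task, for which labels are plentiful
aux_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=[n_inputs]),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
aux_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
aux_model.fit(X_aux, y_aux, epochs=10)

# 2. Reuse its lower layers for the actual task and add a new output layer
target_model = tf.keras.models.Sequential(
    aux_model.layers[:-1] + [tf.keras.layers.Dense(1, activation="sigmoid")]
)
target_model.compile(loss="binary_crossentropy", optimizer="sgd")
target_model.fit(X_target, y_target, epochs=10)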
Self-supervised learning
- Automatically generate labels from data and then train a model on resulting labeled dataset using supervised learning.
- Since no human labeling is required, this is usually classified as a form of unsupervised learning.
"You show a system a piece of input, a text, a video, even an image, you suppress a piece of it, mask it, and you train a neural net or your favorite class of model to predict the piece that's missing. It could be the future of a video or the words missing in a text." - Yann LeCun
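A toy sketch of this mask-and-predict idea, assuming a hypothetical array token_ids of integer-encoded text sequences and a hypothetical vocabulary size; the labels are generated automatically by masking one token per sequence:
import numpy as np
import tensorflow as tf

vocab_size = 1000  # hypothetical
seq_len = 20       # hypothetical
mask_id = 0        # hypothetical id reserved for the mask token

# token_ids: unlabeled data of shape (n_samples, seq_len) with integer token ids
def make_masked_dataset(token_ids):
    X = token_ids.copy()
    positions = np.random.randint(0, seq_len, size=len(X))
    y = X[np.arange(len(X)), positions]      # the masked-out tokens become the labels
    X[np.arange(len(X)), positions] = mask_id
    return X, y

X_masked, y_masked = make_masked_dataset(token_ids)

# Train a model to predict the masked token from the corrupted sequence
masked_token_model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
masked_token_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
masked_token_model.fit(X_masked, y_masked, epochs=10)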