Day 17 - Reusing Pretrained Layers

Finding the right number of layers to reuse

  1. Freeze all reused layers.
  1. Train the model and see how it performs.
  1. Unfreeze one or two of the top-most reused layers and see if performance improves.
    The more training data you have, the more layers you can unfreeze.
    Reduce the learning rate when reused layers are unfrozen, to avoid wrecking the pretrained weights.
  1. If you have little training data, drop the top hidden layers and freeze the remaining hidden layers.
  1. If you have a lot of training data, replace the top hidden layers instead of dropping them. More hidden layers can also be added.

Transfer Learning with Keras

# Create a new model without the last layer of the original model
model_clone = tf.keras.models.Sequential(model.layers[:-1])

# Clone the model so that the layers of original model are unaffected
model_clone = tf.keras.models.clone_model(model_clone)

# Copy weights manually since clone_model doesn't clone weights
model_clone.set_weights(model.get_weights())

# Build the cloned model using same input shape as original model
model_clone.build(model.layers[0].input_shape)

# Add new output layer depending on task we want to solve
model_clone.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# Freeze reused layers for the first few epochs
# so that the new output layer has time to learn reasonable weights
for layer in model_clone.layers[:-1]:
	layer.trainable = False

# The model always needs to be compiled after freezing/unfreezing layers
model_clone.compile(loss="binary_crossentropy", optimizer="sgd")

# Train the model for a few epochs
model_clone.fit(...)

# Unfreeze reused layers
for layer in model_clone.layers[:-1]:
	layer.trainable = True

# Reduce learning rate to prevent wrecking reused weights
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4)
model_clone.compile(loss="binary_crossentropy", optimizer=optimizer)

Unsupervised Pretraining
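
The idea: if labeled data is scarce but unlabeled data is plentiful, first train a model (e.g. an autoencoder) on the unlabeled data, then reuse its lower layers for the actual task. A minimal sketch, assuming hypothetical 784-dimensional inputs and placeholder datasets (`X_unlabeled`, `X_labeled`, `y_labeled`):

```python
import tensorflow as tf

# Phase 1: train an autoencoder to reconstruct its own input
# (no labels needed, so all unlabeled data can be used)
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[784]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
])
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[30]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = tf.keras.models.Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer="sgd")
# autoencoder.fit(X_unlabeled, X_unlabeled, epochs=10)

# Phase 2: reuse the pretrained encoder, add a task-specific output layer,
# and fine-tune on the small labeled dataset
classifier = tf.keras.models.Sequential([
    encoder,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
classifier.compile(loss="binary_crossentropy", optimizer="sgd")
# classifier.fit(X_labeled, y_labeled, epochs=10)
```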

Pretraining on an Auxiliary Task
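
If labels are scarce for the target task, train first on a related auxiliary task where labels are cheap or abundant, then reuse the lower layers. A sketch using the same `layers[:-1]` pattern as above; the layer sizes and datasets (`X_aux`, `y_aux`, `X_target`, `y_target`) are hypothetical:

```python
import tensorflow as tf

# Train on the auxiliary task, for which we assume plenty of labels
aux_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[100]),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # auxiliary output
])
aux_model.compile(loss="binary_crossentropy", optimizer="sgd")
# aux_model.fit(X_aux, y_aux, epochs=10)

# Reuse the lower layers (feature detectors) for the target task,
# replacing only the auxiliary output layer
target_model = tf.keras.models.Sequential(aux_model.layers[:-1])
target_model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
target_model.compile(loss="binary_crossentropy", optimizer="sgd")
# target_model.fit(X_target, y_target, epochs=10)
```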

Self-supervised learning

You show a system a piece of input, a text, a video, even an image, you suppress a piece of it, mask it, and you train a neural net or your favorite class of model to predict the piece that’s missing. It could be the future of a video or the words missing in a text. - Yann LeCun
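
The masking step LeCun describes can be sketched as pure data preparation: hide part of the input and keep the hidden part as the training target. A toy example for text, assuming a hypothetical `[MASK]` token:

```python
import random

def make_masked_example(tokens, mask_token="[MASK]", rng=random):
    """Mask one token in a sentence; return (masked_tokens, position, target).

    A model would then be trained to predict `target` at `position`
    from `masked_tokens` - no human labels required.
    """
    i = rng.randrange(len(tokens))
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return masked, i, tokens[i]

sentence = ["the", "cat", "sat", "on", "the", "mat"]
masked, pos, target = make_masked_example(sentence)
# e.g. masked = ["the", "cat", "[MASK]", "on", "the", "mat"], target = "sat"
```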