Day 17 - Reusing Pretrained Layers

Finding the right number of layers to reuse

  1. Freeze all reused layers.
  1. Train the model and see how it performs.
  1. Unfreeze one or two of the top-most reused layers and see if performance improves.
    The more training data you have, the more layers you can unfreeze.
    Reduce the learning rate when reused layers are unfrozen, to avoid wrecking the pretrained weights.
  1. If you have little training data, drop the top hidden layers and freeze the remaining hidden layers.
  1. If you have a lot of training data, replace the top hidden layers instead of dropping them. More hidden layers can also be added.

Transfer Learning with Keras

# Create a new model without the last layer of the original model
model_clone = tf.keras.models.Sequential(model.layers[:-1])

# Clone the model so that the layers of original model are unaffected
model_clone = tf.keras.models.clone_model(model_clone)

# Copy weights manually since clone_model doesn't clone weights
model_clone.set_weights(model.get_weights())

# Build the cloned model using same input shape as original model
model_clone.build(model.layers[0].input_shape)

# Add new output layer depending on task we want to solve
model_clone.add(tf.keras.layers.Dense(1, activation="sigmoid"))

# Freeze reused layers for the first few epochs
# so that the new output layer has time to learn reasonable weights
for layer in model_clone.layers[:-1]:
	layer.trainable = False

# The model always needs to be compiled after freezing/unfreezing layers
model_clone.compile(loss="binary_crossentropy", optimizer="sgd")

# Train the model for a few epochs
model_clone.fit(...)

# Unfreeze reused layers
for layer in model_clone.layers[:-1]:
	layer.trainable = True

# Reduce learning rate to prevent wrecking reused weights
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4)
model_clone.compile(loss="binary_crossentropy", optimizer=optimizer)

Unsupervised Pretraining
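
The idea: if labeled data is scarce but unlabeled data is plentiful, first train a model (e.g. an autoencoder) on the unlabeled data, then reuse its lower layers for the actual task. A minimal sketch, assuming hypothetical 784-dimensional inputs and placeholder datasets (`X_unlabeled`, `X_labeled`, `y_labeled`):

```python
import tensorflow as tf

# Phase 1: train an autoencoder to reconstruct its own input
# (no labels needed, so all unlabeled data can be used)
encoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[784]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(30, activation="relu"),
])
decoder = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[30]),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),
])
autoencoder = tf.keras.models.Sequential([encoder, decoder])
autoencoder.compile(loss="mse", optimizer="sgd")
# autoencoder.fit(X_unlabeled, X_unlabeled, epochs=10)

# Phase 2: reuse the pretrained encoder, add a task-specific output layer,
# and fine-tune on the small labeled dataset
classifier = tf.keras.models.Sequential([
    encoder,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
classifier.compile(loss="binary_crossentropy", optimizer="sgd")
# classifier.fit(X_labeled, y_labeled, epochs=10)
```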

Pretraining on an Auxiliary Task
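
If labels are scarce for the target task, train first on a related auxiliary task where labels are cheap or abundant, then reuse the lower layers. A sketch using the same `layers[:-1]` pattern as above; the layer sizes and datasets (`X_aux`, `y_aux`, `X_target`, `y_target`) are hypothetical:

```python
import tensorflow as tf

# Train on the auxiliary task, for which we assume plenty of labels
aux_model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=[100]),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # auxiliary output
])
aux_model.compile(loss="binary_crossentropy", optimizer="sgd")
# aux_model.fit(X_aux, y_aux, epochs=10)

# Reuse the lower layers (feature detectors) for the target task,
# replacing only the auxiliary output layer
target_model = tf.keras.models.Sequential(aux_model.layers[:-1])
target_model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
target_model.compile(loss="binary_crossentropy", optimizer="sgd")
# target_model.fit(X_target, y_target, epochs=10)
```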

Self-supervised learning

You show a system a piece of input, a text, a video, even an image, you suppress a piece of it, mask it, and you train a neural net or your favorite class of model to predict the piece that’s missing. It could be the future of a video or the words missing in a text. - Yann LeCun
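
The masking step LeCun describes can be sketched as pure data preparation: hide part of the input and keep the hidden part as the training target. A toy example for text, assuming a hypothetical `[MASK]` token:

```python
import random

def make_masked_example(tokens, mask_token="[MASK]", rng=random):
    """Mask one token in a sentence; return (masked_tokens, position, target).

    A model would then be trained to predict `target` at `position`
    from `masked_tokens` - no human labels required.
    """
    i = rng.randrange(len(tokens))
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return masked, i, tokens[i]

sentence = ["the", "cat", "sat", "on", "the", "mat"]
masked, pos, target = make_masked_example(sentence)
# e.g. masked = ["the", "cat", "[MASK]", "on", "the", "mat"], target = "sat"
```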