Transfer learning is a machine learning technique in which a model developed for one task is reused as the starting point for a model on a second task. A pretrained model is usually trained on a large volume of images, so its convolutional layers, and the vector representations they produce, can be reused for other tasks as well. The dense layers, however, are specific to the original dataset and need to be retrained to make predictions on the new one. For our problem, we want to keep the convolutional layers but train new dense layers.
Following are the steps to create train/validation data for the model:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.xception import preprocess_input

# Build image generator for training (takes the Xception preprocessing function)
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
# Load the train dataset into the train generator
train_ds = train_gen.flow_from_directory(directory='path/to/train_imgs_dir', # Train images directory
                                         target_size=(150,150), # Resize images to train faster
                                         batch_size=32) # 32 images per batch
# Create image generator for validation
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)
# Load the validation images
val_ds = val_gen.flow_from_directory(directory='path/to/val_imgs_dir', # Validation images directory
                                     target_size=(150,150),
                                     batch_size=32,
                                     shuffle=False) # Don't shuffle validation data
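To sanity-check what the generators yield, you can pull a single batch with next() and inspect the class mapping (a minimal optional sketch; the shapes shown assume the settings above):
X, y = next(train_ds) # X: batch of images with shape (32, 150, 150, 3); y: one-hot targets
print(X.shape, y.shape)
print(train_ds.class_indices) # Mapping from class name (subdirectory name) to index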
Following are the steps to build the model from a pretrained model:
from tensorflow import keras
from tensorflow.keras.applications.xception import Xception

# Build the base model
base_model = Xception(weights='imagenet',
                      include_top=False, # Drop ImageNet's dense layers so we can add our own
                      input_shape=(150,150,3))
# Freeze the convolutional base to prevent its weights from being updated during training
base_model.trainable = False
# Define the expected image shape as input
inputs = keras.Input(shape=(150,150,3))
# Feed the inputs to the base model
base = base_model(inputs, training=False) # training=False because the model contains BatchNormalization layers
# Convert the feature maps into vectors using a pooling layer
vectors = keras.layers.GlobalAveragePooling2D()(base)
# Create a dense layer with 10 outputs, one per class
outputs = keras.layers.Dense(10)(vectors)
# Create the model for training
model = keras.Model(inputs, outputs)
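Before training, it can help to confirm that the base is actually frozen; model.summary() lists trainable and non-trainable parameter counts (a quick optional check, not part of the original steps):
model.summary() # The Xception base appears as non-trainable; only the Dense layer's weights are trainable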
Following are the steps to instantiate the optimizer and loss function:
# Define the learning rate
learning_rate = 0.01
# Create the optimizer
optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
# Define the loss function
loss = keras.losses.CategoricalCrossentropy(from_logits=True) # Keep the raw output (logits) of the dense layer without applying softmax
# Compile the model
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy']) # Evaluate with the accuracy metric
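Because from_logits=True, the Dense layer outputs raw scores (logits) instead of probabilities. If probabilities are needed at prediction time, softmax can be applied manually; here is a minimal sketch with made-up logits over three hypothetical classes:
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]]) # Raw model outputs for one image (hypothetical values)
probs = tf.nn.softmax(logits) # Converts logits to probabilities that sum to 1
print(probs.numpy()) # Approximately [[0.659 0.242 0.099]]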
The model is ready to train once it is defined and compiled:
# Train the model, validate it with validation data, and save the training history
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
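The returned history object is useful for spotting overfitting: training accuracy keeps improving while validation accuracy stalls or drops. A minimal plotting sketch, assuming matplotlib is installed:
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='train') # Training accuracy per epoch
plt.plot(history.history['val_accuracy'], label='validation') # Validation accuracy per epoch
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()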
Classes, functions, and attributes:
- from tensorflow.keras.preprocessing.image import ImageDataGenerator: to read the image data and make it useful for training/validation
- flow_from_directory(): method to read the images directly from a directory
- next(train_ds): unpacks the features and target variables of one batch
- train_ds.class_indices: attribute to get the classes according to the directory structure
- GlobalAveragePooling2D(): accepts a 4D tensor as input, takes the mean over the height and width dimensions for all the channels, and returns a vector representation of each image
- CategoricalCrossentropy(): loss function for multi-class classification where each target is a one-hot array marking the correct category
- model.fit(): method to train the model
- epochs: number of iterations over all of the training data
- history.history: dictionary recording the loss and metric values (accuracy in our case) at each epoch
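To make the pooling behavior concrete, here is a tiny sketch (with a made-up random tensor, not from the notes) showing how GlobalAveragePooling2D collapses the height and width dimensions:
import tensorflow as tf
from tensorflow import keras

feature_maps = tf.random.normal((1, 5, 5, 3)) # 4D tensor: (batch, height, width, channels)
vectors = keras.layers.GlobalAveragePooling2D()(feature_maps)
print(vectors.shape) # (1, 3): one mean value per channel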
Add notes from the video (PRs are welcome)
- convolutional layers convert an image into a vector representation
- dense layers use vector representations to make predictions
- using a pretrained neural network
- imagenet has 1000 different classes
- a dense layer may be specific to a certain number of classes whereas the vector representation can be applied to another dataset
- reusing the vector representation from convolutional layers means transferring knowledge and the idea behind transfer learning
- train faster on smaller size images
- the batch size
- base model vs custom model
- bottom layers vs top layers in keras
- keras optimizers
- using the adam optimizer
- weights, learning rates
- eta in xgboost
- model loss
- categorical cross entropy
- changing accuracy during several training epochs
- overfitting
The notes are written by the community. If you see an error here, please create a PR with a fix.