
Transfer learning

July 22, 2017

In machine learning, transfer learning means taking the knowledge acquired while solving one problem and applying it to solve another, similar problem.

Today, it is most widely used for recognizing objects in images.

Suppose, for example, that someone has designed and trained a model capable of recognizing cars. With transfer learning, I could try to reuse that model and adapt it into a new model that recognizes trucks.

What is the motivation behind transfer learning?

For object recognition, the best results are obtained with convolutional neural networks (ConvNets [1][2]). ConvNets are deep neural networks (deep learning) in which some layers act as convolutional filters [3]. During training, these filters adapt to recognize features of the images we want to classify.
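To make "convolutional filter" a bit more concrete, here is a minimal pure-Python sketch of what a single filter does: it slides a small kernel over a grayscale image and computes a weighted sum at each position. The 3×3 vertical-edge kernel is hand-picked for illustration only; in a ConvNet the kernel values are learned during training. Like most deep learning libraries, this computes a cross-correlation (the kernel is not flipped), with no padding and stride 1.

```python
def convolve2d(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation of a grayscale image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kh x kw patch whose top-left corner is (i, j)
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Hand-picked kernel that responds to dark-to-bright vertical edges
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# 5x6 image: dark left half (0), bright right half (9)
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

for row in convolve2d(image, vertical_edge):
    print(row)  # [0, 27, 27, 0] on every row
```

The filter output is large only where the patch straddles the edge and zero over flat regions; a trained ConvNet stacks many such filters, whose kernels it learns from data.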

Suppose I want to build a ConvNet. The first big problem I face is defining the architecture: how many convolutional layers? How many filters? Of what size? Getting good results here is not simple; it requires a lot of knowledge and plenty of practice.

The second problem is training. ConvNets are usually very expensive to train: they have many parameters and need a lot of input data.
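To put "many parameters" in numbers, this pure-Python sketch counts the weights of VGG16 (one of the models discussed below) from its standard layout: 3×3 convolutions, 2×2 max pooling, a 224×224 RGB input and three fully connected layers ending in 1000 classes. A convolution with `c_in` input and `c_out` output channels has `3*3*c_in*c_out` weights plus `c_out` biases.

```python
# VGG16 feature extractor: numbers are conv output channels, 'M' is 2x2 max pooling
VGG16_CONV = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
              512, 512, 512, 'M', 512, 512, 512, 'M']
VGG16_DENSE = [4096, 4096, 1000]  # fc1, fc2 and the 1000-class output layer

def conv_params(in_ch, out_ch, k=3):
    # k*k*in_ch weights per filter, plus one bias per filter
    return k * k * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    # One weight per input/output pair, plus one bias per output
    return n_in * n_out + n_out

def vgg16_param_counts(input_size=224, in_ch=3):
    conv_total, size, ch = 0, input_size, in_ch
    for layer in VGG16_CONV:
        if layer == 'M':
            size //= 2  # pooling halves height and width and has no weights
        else:
            conv_total += conv_params(ch, layer)
            ch = layer
    dense_total, n_in = 0, size * size * ch  # flattened 7*7*512 = 25088
    for n_out in VGG16_DENSE:
        dense_total += dense_params(n_in, n_out)
        n_in = n_out
    return conv_total, dense_total

conv_total, dense_total = vgg16_param_counts()
print(f"convolutional: {conv_total:,}")                # 14,714,688
print(f"fully connected: {dense_total:,}")             # 123,642,856
print(f"total: {conv_total + dense_total:,}")          # 138,357,544
```

That is roughly 138 million parameters, most of them in the fully connected layers, which is why training such a network from scratch needs so much labeled data.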

If someone has already solved a recognition problem similar to the one I want to solve, it is much easier to take advantage of the work already done (architecture and training).


ImageNet is a research project in object recognition and computer vision. It maintains a database of millions of labeled images of different objects, which researchers and students use to develop their models.

Since 2010, ImageNet has run a yearly challenge (the ImageNet Large Scale Visual Recognition Challenge, ILSVRC) in which teams of researchers from all over the world compete. The winning team is the one whose model, given a set of new photographs (150000 in 2016), recognizes 1000 different object classes in them with the lowest error.

Interestingly, many of the models entered in these challenges are released publicly, so any of us can use them. Since they recognize 1000 different categories of objects, they are quite generic and can be adapted to many problems: the filters learned by their convolutional layers are useful for many recognition and vision tasks. The best-known models include VGG16, VGG19, ResNet50 and InceptionV3.

Example with VGG16

Following an idea similar to the one in this post on the Keras blog, I will solve Kaggle's "Dogs vs Cats" problem by reusing VGG16 and adding a few lines of code.

"Dogs vs Cats" problem

The task is to identify which of 2000 photos show dogs and which show cats. For training, Kaggle provides 12500 photos of dogs and 12500 photos of cats that are already labeled.
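Before training, the labeled photos need to be split into a training set and a validation set. This is a minimal sketch, assuming the Kaggle training files are named like `cat.0.jpg` and `dog.0.jpg` (the label is the file-name prefix) and using the same encoding as the model code (0 = cats, 1 = dogs). The helper names and the 5000-image validation size are hypothetical choices for illustration, not the author's setup.

```python
import random

LABELS = {'cat': 0, 'dog': 1}  # same encoding as the model: 0 = cat, 1 = dog

def label_from_filename(filename):
    # Assumes Kaggle-style names such as 'cat.123.jpg' or 'dog.42.jpg'
    prefix = filename.split('.', 1)[0]
    return LABELS[prefix]

def split_dataset(filenames, n_val, seed=0):
    """Shuffle (filename, label) pairs and split off n_val of them for validation."""
    pairs = [(name, label_from_filename(name)) for name in filenames]
    random.Random(seed).shuffle(pairs)  # fixed seed -> reproducible split
    return pairs[n_val:], pairs[:n_val]

# Hypothetical file list standing in for the 12500 + 12500 Kaggle photos
files = ([f'cat.{i}.jpg' for i in range(12500)]
         + [f'dog.{i}.jpg' for i in range(12500)])
train, val = split_dataset(files, n_val=5000)
print(len(train), len(val))  # 20000 5000
```

Shuffling before splitting matters here because the file list is sorted by class; taking the first 5000 names unshuffled would give a validation set made only of cats.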

Some of these photos are shown below.

[Figure: sample dog and cat photos from the training set]

Solution using keras and VGG16

To solve the problem I will use Keras, one of the most widely used deep learning libraries. As of version 1.1, Keras includes several of the ImageNet models, which can be instantiated and used easily.
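Using a pre-trained model also means feeding it images preprocessed the same way as in its original training. For VGG16, Keras provides `keras.applications.vgg16.preprocess_input`, which (in its default "caffe" mode) converts RGB images to BGR and subtracts the ImageNet mean of each channel, without rescaling. The NumPy sketch below mirrors that behavior for channels-last float arrays, only to show what happens to the pixels; `vgg16_preprocess` is a hypothetical name, and in practice you would call the Keras function directly.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by VGG16 ("caffe" preprocessing)
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def vgg16_preprocess(rgb_images):
    """Sketch of VGG16 preprocessing for float arrays shaped (..., H, W, 3), RGB in 0-255."""
    bgr = rgb_images[..., ::-1]       # RGB -> BGR
    return bgr - IMAGENET_MEAN_BGR    # zero-center each channel (no scaling)

# A 224x224 image filled with the mean color (given in RGB order) maps to all zeros
mean_rgb = IMAGENET_MEAN_BGR[::-1]
image = np.tile(mean_rgb, (1, 224, 224, 1))  # shape (1, 224, 224, 3)
print(np.abs(vgg16_preprocess(image)).max())  # 0.0
```

Skipping this step, or using a different normalization, typically degrades the pre-trained features, because the filters were learned on inputs centered this way.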

The complete code (link) is about 50 lines long. This is the most interesting part:

from keras.applications import VGG16
from keras.layers import Dense, Dropout, Flatten
from keras.models import Model

# Instantiate VGG16 with ImageNet weights
vgg16 = VGG16(weights='imagenet')

# Create a model equal to VGG16 except for the classifier on top.
# Instead of the original fully connected layers, which classify among
# 1000 classes, I use my own layers that classify cats and dogs
# (0 = cats and 1 = dogs).
block5_pool_output = vgg16.get_layer('block5_pool').output  # shape (7, 7, 512)
x = Flatten()(block5_pool_output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.6)(x)  # regularization
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=vgg16.input, outputs=x)

# I do not train the convolutional part: model.layers[:19] is the input
# layer plus all conv/pool layers up to and including block5_pool.
# (Compile the model after setting these flags so they take effect.)
for layer in model.layers[:19]:
    layer.trainable = False
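To see the shapes involved in the new top, here is a NumPy sketch of its forward pass: the 7×7×512 `block5_pool` feature map is flattened into 25088 values, passed through a 256-unit ReLU layer, then dropout (active only during training), then a single sigmoid unit whose output can be read as the probability of "dog". The weights and input features are random placeholders, not trained values, and `head_forward` is a hypothetical helper. The dropout uses the "inverted" form (kept units are scaled by 1/(1 - rate)), which is how Keras applies it at training time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random placeholder weights with the same shapes as the new top layers
W1 = rng.normal(0, 0.01, size=(7 * 7 * 512, 256))  # Dense(256)
b1 = np.zeros(256)
W2 = rng.normal(0, 0.01, size=(256, 1))            # Dense(1)
b2 = np.zeros(1)

def head_forward(features, training=False, rate=0.6):
    x = features.reshape(features.shape[0], -1)    # Flatten: (batch, 25088)
    x = np.maximum(0, x @ W1 + b1)                 # Dense(256, relu)
    if training:
        # Inverted dropout: drop 60% of units, rescale the survivors
        keep = rng.random(x.shape) >= rate
        x = x * keep / (1 - rate)
    logits = x @ W2 + b2                           # Dense(1)
    return 1 / (1 + np.exp(-logits))               # sigmoid -> P(dog)

# Stand-in for the block5_pool output of a batch of 4 images
features = rng.random((4, 7, 7, 512))
probs = head_forward(features)
print(probs.shape)  # (4, 1)
```

At prediction time dropout is a no-op, so the same input always gives the same probability; during training each batch sees a different random subset of the 256 units.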

What I do is instantiate the VGG16 model and replace its top: instead of the fully connected layers that classify among 1000 kinds of objects, the new layers only distinguish between dogs and cats (this is known as "fine-tuning" the model).
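Counting parameters shows how much smaller the training problem becomes. This pure-Python sketch counts the frozen part (the 13 standard 3×3 convolution layers of VGG16) and the trainable part (the new Flatten, Dense(256), Dropout and Dense(1) top). Keras's `model.summary()` reports trainable and non-trainable parameter counts, which should agree with these numbers once the layers are frozen.

```python
# Frozen part: VGG16 conv layers as (input channels, output channels), 3x3 kernels
VGG16_CONV_LAYERS = [(3, 64), (64, 64),
                     (64, 128), (128, 128),
                     (128, 256), (256, 256), (256, 256),
                     (256, 512), (512, 512), (512, 512),
                     (512, 512), (512, 512), (512, 512)]

# Each conv layer: 3*3*c_in*c_out weights plus c_out biases
frozen = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in VGG16_CONV_LAYERS)

# Trainable part: Flatten (no weights) -> Dense(256) -> Dropout (no weights) -> Dense(1)
flat = 7 * 7 * 512
trainable = (flat * 256 + 256) + (256 * 1 + 1)

print(f"frozen: {frozen:,}  trainable: {trainable:,}")
# frozen: 14,714,688  trainable: 6,423,041
```

Only about 6.4 million weights are updated, and since the frozen convolutional part is not modified, each training step also avoids computing its weight gradients.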

By doing this and training only the new layers, the model reaches a validation accuracy of almost 97%:
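The `loss` and `acc` columns in the training log are per-batch metrics averaged over each epoch. Assuming the model was compiled with `binary_crossentropy` (the usual choice for a single sigmoid output; the compile call is not part of the excerpt), this is a minimal pure-Python sketch of both metrics for one batch. The clipping constant `1e-7` and the 0.5 decision threshold are assumptions for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold (1 = dog)."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)

labels = [1, 0, 1, 0]            # dog, cat, dog, cat
probs = [0.9, 0.2, 0.4, 0.1]     # model outputs P(dog)
print(round(binary_crossentropy(labels, probs), 4), binary_accuracy(labels, probs))
# 0.3375 0.75
```

The loss keeps decreasing as predictions become more confident and correct, while accuracy only counts which side of 0.5 each prediction falls on; that is why loss and accuracy in the log do not move in lockstep.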

Epoch 48/50
80/80 [==============================] - 52s - loss: 0.1121 - acc: 0.9577 - val_loss: 0.0890 - val_acc: 0.9681
Epoch 49/50
80/80 [==============================] - 52s - loss: 0.1236 - acc: 0.9587 - val_loss: 0.0934 - val_acc: 0.9650
Epoch 50/50
80/80 [==============================] - 51s - loss: 0.1203 - acc: 0.9475 - val_loss: 0.0878 - val_acc: 0.9694

Dogs and cats are among the categories that ImageNet models already classify, so training only the new top layers is enough. For other types of problems, you may need to retrain some of the convolutional layers as well.
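Retraining some convolutional layers usually means unfreezing only the last block or blocks, since the early layers tend to learn generic features such as edges. A minimal sketch, assuming you want VGG16's block 5 plus the new top to be trainable: the hypothetical helper `unfreeze_from` works on any list of objects with `name` and `trainable` attributes (such as `model.layers` in Keras) and freezes everything before a given layer name. Here it is exercised on stand-in objects; the names of the input and top layers are placeholders, since Keras generates them automatically. In Keras, the model must be compiled again after changing `trainable` flags, and a small learning rate is commonly used so the pre-trained filters are not destroyed.

```python
from types import SimpleNamespace

def unfreeze_from(layers, first_trainable):
    """Freeze every layer before `first_trainable` (by name); train it and all later ones."""
    names = [layer.name for layer in layers]
    if first_trainable not in names:
        raise ValueError(f"no layer named {first_trainable!r}")
    start = names.index(first_trainable)
    for i, layer in enumerate(layers):
        layer.trainable = i >= start
    return [layer.name for layer in layers if layer.trainable]

# Stand-in for model.layers: VGG16 up to block5_pool, then the new top
VGG16_BLOCKS = [(1, 2), (2, 2), (3, 3), (4, 3), (5, 3)]  # (block, number of convs)
names = ['input_1']  # placeholder name for the input layer
for block, n_convs in VGG16_BLOCKS:
    names += [f'block{block}_conv{c}' for c in range(1, n_convs + 1)]
    names.append(f'block{block}_pool')
names += ['flatten', 'dense', 'dropout', 'dense_1']  # placeholder names for the new top
layers = [SimpleNamespace(name=n, trainable=False) for n in names]

print(unfreeze_from(layers, 'block5_conv1'))
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool',
#  'flatten', 'dense', 'dropout', 'dense_1']
```

With a real Keras model the call would be `unfreeze_from(model.layers, 'block5_conv1')` followed by compiling the model again.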

Other transfer learning applications

The dogs-and-cats example may not seem very useful, but the same technique carries over to more useful applications.

For example, Jeremy Howard mentions in this video (link) that at his previous company, Enlitic, they used a technique very similar to this one, with pre-trained ImageNet models and CT scan images, to build a model that classifies tumors as benign or malignant more accurately than a panel of four radiologists.

For more information about transfer learning in ConvNets see:

[1] Introduction to convolutional networks - Adam Geitgey
[2] Understanding Convolutions - Chris Olah
[3] Convolution matrix - Wikipedia
