Made with the help of Google CodeLabs
We will be using transfer learning, which means we are starting with a model that has been already trained on another problem. We will then be retraining it on a similar problem. Deep learning from scratch can take days, but transfer learning can be done in short order.
We are going to use a model trained on the Large Visual Recognition Challenge . These models can differentiate between 1,000 different classes, like Dalmatian or dishwasher. You will have a choice of model architectures, so you can determine the right tradeoff between speed, size and accuracy for your problem.
The graph below shows the first-choice-accuracies of these configurations (y-axis), vs the number of calculations required (x-axis), and the size of the model (circle area).
16 points are shown for mobilenet. For each of the 4 model sizes (circle area in the figure) there is one point for each image resolution setting. The 128px image size models are represented by the lower-left point in each set, while the 224px models are in the upper right.
Other notable architectures are also included for reference. "GoogleNet" in this figure is "Inception V1" in this table. An extended version of this figure is available in slides 84-89 here.