This repository contains PyTorch evaluation code, training code and pretrained models for the following projects:
- DeiT (Data-Efficient Image Transformers), ICML 2021
- CaiT (Going deeper with Image Transformers), ICCV 2021 (Oral)
- ResMLP (ResMLP: Feedforward networks for image classification with data-efficient training)
- PatchConvnet (Augmenting Convolutional networks with attention-based aggregation)
- 3Things (Three things everyone should know about Vision Transformers)
- DeiT III (DeiT III: Revenge of the ViT)
PatchConvnet equips convnets with interpretable attention maps.
For details see Augmenting Convolutional networks with attention-based aggregation by Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek and Hervé Jégou.
If you use this code for a paper, please cite:
@article{touvron2021patchconvnet,
  title={Augmenting Convolutional networks with attention-based aggregation},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Jakob Verbeek and Herv\'e J\'egou},
  journal={arXiv preprint arXiv:2112.13692},
  year={2021},
}
We provide PatchConvnet models pretrained on ImageNet-1k:
name | acc@1 | res | FLOPs (B) | #params (M) | Peak Mem. (MB) | throughput (im/s) | url |
---|---|---|---|---|---|---|---|
S60 | 82.1 | 224 | 4.0 | 25.2 | 1322 | 1129 | model |
S120 | 83.2 | 224 | 7.5 | 47.7 | 1450 | 580 | model |
B60 | 83.5 | 224 | 15.8 | 99.4 | 2790 | 541 | model |
B120 | 84.1 | 224 | 29.9 | 188.6 | 3314 | 280 | model |
Models pretrained on ImageNet-21k and finetuned on ImageNet-1k:
name | acc@1 | res | FLOPs (B) | #params (M) | Peak Mem. (MB) | throughput (im/s) | url |
---|---|---|---|---|---|---|---|
S60 | 83.5 | 224 | 4.0 | 25.2 | 1322 | 1129 | model |
S60 | 84.9 | 384 | 11.8 | 25.2 | 3604 | 388 | model |
S60 | 85.4 | 512 | 20.9 | 25.2 | 6296 | 216 | model |
B60 | 85.4 | 224 | 15.8 | 99.4 | 2790 | 541 | model |
B60 | 86.5 | 384 | 46.5 | 99.4 | 7067 | 185 | model |
B120 | 86.0 | 224 | 29.8 | 188.6 | 3314 | 280 | model |
B120 | 86.9 | 384 | 87.7 | 188.6 | 7587 | 96 | model |
PatchConvnet models with multi-class tokens on ImageNet-1k:
name | acc@1 | res | FLOPs (B) | #params (M) | url |
---|---|---|---|---|---|
S60 (scratch) | 81.1 | 224 | 5.3 | 25.6 | model |
S60 (finetune) | 82.0 | 224 | 5.3 | 25.6 | model |
The models are also available via torch hub.
Before using them, make sure you have installed the latest version of Ross Wightman's pytorch-image-models package, timm.
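As a minimal sketch, loading a pretrained model through torch hub looks like the snippet below; the repository spec and the entry-point name `S60` are assumptions for illustration, so check `hubconf.py` for the exact identifiers exposed by this repository.

```python
import torch

# Minimal sketch: load a pretrained PatchConvnet through torch hub.
# NOTE: the repo spec and the entry-point name 'S60' are assumptions;
# see hubconf.py in this repository for the exact model identifiers.
model = torch.hub.load('facebookresearch/deit:main', 'S60', pretrained=True)
model.eval()

# Dummy forward pass on a 224x224 image batch (ImageNet-1k logits expected).
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)
```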
We provide a notebook to visualize the attention maps of our networks.
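The notebook works with the pretrained models directly. As a self-contained toy sketch of the underlying idea, the example below shows how an attention-based pooling layer can expose its attention map for visualization; the class and attribute names here are made up for illustration and do not match the real model layers, which the notebook uses instead.

```python
import torch
import torch.nn as nn

# Toy stand-in for attention-based aggregation: a single learned query
# attends over patch features, and the resulting attention map can be
# reshaped to the patch grid for visualization.
class ToyAttentionPooling(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))  # one class query
        self.scale = dim ** -0.5

    def forward(self, patches):  # patches: (B, N, dim)
        attn = (self.query * self.scale) @ patches.transpose(1, 2)  # (B, 1, N)
        attn = attn.softmax(dim=-1)
        self.last_attn = attn.detach()       # keep the map for visualization
        return attn @ patches                # (B, 1, dim) pooled feature

pool = ToyAttentionPooling()
feats = torch.randn(2, 196, 64)              # e.g. a 14x14 patch grid
pooled = pool(feats)
attn_map = pool.last_attn.reshape(2, 14, 14) # reshape back to the patch grid
print(pooled.shape, attn_map.shape)
```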
This repository is released under the Apache 2.0 license as found in the LICENSE file.
We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.