Augmenting Convolutional networks with attention-based aggregation

This repository contains PyTorch evaluation code, training code, and pretrained models for the following projects:

  • DeiT (Data-Efficient Image Transformers), ICML 2021
  • CaiT (Going deeper with Image Transformers), ICCV 2021 (Oral)
  • ResMLP (ResMLP: Feedforward networks for image classification with data-efficient training)
  • PatchConvnet (Augmenting Convolutional networks with attention-based aggregation)
  • 3Things (Three things everyone should know about Vision Transformers)
  • DeiT III (DeiT III: Revenge of the ViT)

PatchConvnet brings interpretable attention maps to convnets.

For details see Augmenting Convolutional networks with attention-based aggregation by Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Piotr Bojanowski, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek and Hervé Jégou.

If you use this code for a paper, please cite:

@article{touvron2021patchconvnet,
  title={Augmenting Convolutional networks with attention-based aggregation},
  author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Piotr Bojanowski and Armand Joulin and Gabriel Synnaeve and Jakob Verbeek and Herv\'e J\'egou},
  journal={arXiv preprint arXiv:2112.13692},
  year={2021},
}

Model Zoo

We provide PatchConvnet models pretrained on ImageNet-1k:

| name | acc@1 | res | FLOPs (B) | #params (M) | Peak Mem. (MB) | throughput (im/s) | url |
|------|-------|-----|-----------|-------------|----------------|-------------------|-----|
| S60  | 82.1  | 224 | 4.0       | 25.2        | 1322           | 1129              | model |
| S120 | 83.2  | 224 | 7.5       | 47.7        | 1450           | 580               | model |
| B60  | 83.5  | 224 | 15.8      | 99.4        | 2790           | 541               | model |
| B120 | 84.1  | 224 | 29.9      | 188.6       | 3314           | 280               | model |

Models pretrained on ImageNet-21k and fine-tuned on ImageNet-1k:

| name | acc@1 | res | FLOPs (B) | #params (M) | Peak Mem. (MB) | throughput (im/s) | url |
|------|-------|-----|-----------|-------------|----------------|-------------------|-----|
| S60  | 83.5  | 224 | 4.0       | 25.2        | 1322           | 1129              | model |
| S60  | 84.9  | 384 | 11.8      | 25.2        | 3604           | 388               | model |
| S60  | 85.4  | 512 | 20.9      | 25.2        | 6296           | 216               | model |
| B60  | 85.4  | 224 | 15.8      | 99.4        | 2790           | 541               | model |
| B60  | 86.5  | 384 | 46.5      | 99.4        | 7067           | 185               | model |
| B120 | 86.0  | 224 | 29.8      | 188.6       | 3314           | 280               | model |
| B120 | 86.9  | 384 | 87.7      | 188.6       | 7587           | 96                | model |

PatchConvnet models with multi-class tokens, trained on ImageNet-1k:

| name | acc@1 | res | FLOPs (B) | #params (M) | url |
|------|-------|-----|-----------|-------------|-----|
| S60 (scratch)  | 81.1 | 224 | 5.3 | 25.6 | model |
| S60 (finetune) | 82.0 | 224 | 5.3 | 25.6 | model |

The models are also available via torch hub. Before using them, make sure you have the latest pytorch-image-models package, timm, by Ross Wightman installed.
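As a minimal sketch of the torch hub workflow: the repo/branch string and the entry-point name used below are assumptions (the actual entry-point names are defined by the repository's `hubconf.py`, and `torch.hub.list` will show the real ones).

```python
import torch

# Assumed repo reference and entry-point name; verify against the repo's hubconf.py.
REPO = "facebookresearch/deit:main"


def list_entrypoints():
    """List the model entry points actually exported by the repo's hubconf.py."""
    return torch.hub.list(REPO)


def load_patchconvnet(name="patchconvnet_s60", pretrained=True):
    """Load a (hypothetical) PatchConvnet entry point from torch hub.

    torch.hub.load forwards keyword arguments (here, `pretrained`) to the
    entry-point function defined in hubconf.py.
    """
    return torch.hub.load(REPO, name, pretrained=pretrained)
```

Calling `list_entrypoints()` first is the safest way to discover the exact model names before downloading weights.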

Notebook for visualization

We provide a Colab notebook to visualize the attention maps of our networks.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Contributing

We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.