This repository contains PyTorch evaluation code, training code and pretrained models for the following projects:
- DeiT (Data-Efficient Image Transformers), ICML 2021
- CaiT (Going deeper with Image Transformers), ICCV 2021 (Oral)
- ResMLP (ResMLP: Feedforward networks for image classification with data-efficient training)
- PatchConvnet (Augmenting Convolutional networks with attention-based aggregation)
- 3Things (Three things everyone should know about Vision Transformers)
- DeiT III (DeiT III: Revenge of the ViT)
For details see *Three things everyone should know about Vision Transformers* by Hugo Touvron, Matthieu Cord, Alaaeldin El-Nouby, Jakob Verbeek and Hervé Jégou.
If you use this code for a paper, please cite:
    @article{Touvron2022ThreeTE,
      title={Three things everyone should know about Vision Transformers},
      author={Hugo Touvron and Matthieu Cord and Alaaeldin El-Nouby and Jakob Verbeek and Herve Jegou},
      journal={arXiv preprint arXiv:2203.09795},
      year={2022},
    }
We propose to fine-tune only the attention layers (flag `--attn-only`) to adapt the models to higher resolutions or to do transfer learning.
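As an illustration, here is a minimal sketch of what attention-only fine-tuning amounts to, assuming a timm-style ViT whose attention parameters contain "attn" in their names; the actual behaviour is controlled by the `--attn-only` flag of the training script.

```python
import torch
from timm import create_model

# Hypothetical sketch: freeze everything except the attention layers before fine-tuning.
# The name filter ("attn") is an assumption based on timm-style ViT parameter naming.
model = create_model("deit_small_patch16_224", pretrained=True)

for name, param in model.named_parameters():
    param.requires_grad = "attn" in name  # train attention weights only

num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {num_trainable}")
```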
We propose to replace the linear patch projection with an MLP patch projection (see class `hMLP_stem`). A key advantage is that this pre-processing stem is compatible with, and improves, mask-based self-supervised training such as BeiT.
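For intuition, here is a rough sketch of such a stem (not the repository's exact `hMLP_stem` implementation): each 16×16 patch is embedded by a stack of small patch-wise projections, expressed as convolutions whose stride equals their kernel size so that patches never exchange information. The channel widths and normalization below are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative hMLP-style stem: three patch-wise stages (4x4, then 2x2, then 2x2)
# replace the single 16x16 linear patch projection. Widths and norms are assumptions.
class HMLPStem(nn.Module):
    def __init__(self, in_chans=3, embed_dim=384):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_chans, embed_dim // 4, kernel_size=4, stride=4),
            nn.BatchNorm2d(embed_dim // 4),
            nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim // 4, kernel_size=2, stride=2),
            nn.BatchNorm2d(embed_dim // 4),
            nn.GELU(),
            nn.Conv2d(embed_dim // 4, embed_dim, kernel_size=2, stride=2),
        )

    def forward(self, x):
        x = self.proj(x)                      # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)

tokens = HMLPStem()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 196, 384])
```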
We propose to use blocks in parallel in order to obtain more flexible architectures (see class `Layer_scale_init_Block_paralx2`).
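Below is a minimal sketch of the idea using standard PyTorch modules (the repository's `Layer_scale_init_Block_paralx2` additionally applies LayerScale and stochastic depth, omitted here): two attention sublayers act in parallel on the same input and are summed into the residual stream, followed by two MLP sublayers in parallel.

```python
import torch
import torch.nn as nn

# Illustrative parallel block: two attention branches, then two MLP branches,
# each pair evaluated in parallel and summed into the residual stream.
class ParallelBlockX2(nn.Module):
    def __init__(self, dim=384, num_heads=6, mlp_ratio=4):
        super().__init__()
        self.norms_attn = nn.ModuleList([nn.LayerNorm(dim) for _ in range(2)])
        self.attns = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads, batch_first=True) for _ in range(2)]
        )
        self.norms_mlp = nn.ModuleList([nn.LayerNorm(dim) for _ in range(2)])
        self.mlps = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                           nn.Linear(dim * mlp_ratio, dim)) for _ in range(2)]
        )

    def forward(self, x):
        x = x + sum(attn(n(x), n(x), n(x), need_weights=False)[0]
                    for attn, n in zip(self.attns, self.norms_attn))
        x = x + sum(mlp(n(x)) for mlp, n in zip(self.mlps, self.norms_mlp))
        return x

out = ParallelBlockX2()(torch.randn(2, 197, 384))
print(out.shape)  # torch.Size([2, 197, 384])
```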
This repository is released under the Apache 2.0 license as found in the LICENSE file.
We actively welcome your pull requests! Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.