This is a PyTorch implementation of the Visformer models. The project is based on the training code in DeiT and the tools in timm.
Clone the repository:
git clone https://github.com/danczs/Visformer.git
Install PyTorch, timm and einops:
pip install -r requirements.txt
The ImageNet data should be laid out as follows:
/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
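Before launching training, it can help to confirm that a dataset directory follows the layout above. The following is a minimal sketch that uses only the Python standard library. The function name `summarize_imagenet_layout` is hypothetical and not part of this repository, and the check only counts files; it does not verify that they are valid images.

```python
from pathlib import Path


def summarize_imagenet_layout(root):
    """Count class folders and files under train/ and val/ of an ImageNet-style directory.

    Hypothetical helper for illustration; not part of the Visformer code base.
    """
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each immediate subdirectory of a split is treated as one class.
        class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
        # Count regular files directly inside each class folder.
        num_files = sum(1 for c in class_dirs for f in c.iterdir() if f.is_file())
        summary[split] = {"classes": len(class_dirs), "files": num_files}
    return summary
```

Calling `summarize_imagenet_layout("/path/to/imagenet")` returns a dictionary with class and file counts for each split, and raises `FileNotFoundError` if either `train/` or `val/` is missing.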
Visformer_small
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model visformer_small --batch-size 64 --data-path /path/to/imagenet --output_dir /path/to/save
Visformer_tiny
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --model visformer_tiny --batch-size 256 --drop-path 0.0 --data-path /path/to/imagenet --output_dir /path/to/save
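The two commands use different GPU counts and per-GPU batch sizes. Assuming `--batch-size` is the per-process batch size, as in the DeiT training code this project builds on, the global batch size is the product of the process count and that value. A small sketch of the arithmetic follows; the helper name `global_batch_size` and the `nnodes` parameter are hypothetical and used only for illustration.

```python
def global_batch_size(nproc_per_node, batch_size_per_gpu, nnodes=1):
    """Total samples per optimizer step across all processes (assumes one process per GPU)."""
    return nnodes * nproc_per_node * batch_size_per_gpu


# Values taken from the two commands above.
small = global_batch_size(nproc_per_node=8, batch_size_per_gpu=64)   # visformer_small
tiny = global_batch_size(nproc_per_node=4, batch_size_per_gpu=256)   # visformer_tiny
print(small, tiny)  # prints "512 1024"
```

When training on a different number of GPUs, adjusting `--batch-size` so that this product stays the same is one way to remain close to the settings shown here. Whether the learning rate is rescaled automatically depends on the training script.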
With the current version, visformer_small achieves 82.28% accuracy on ImageNet.
Because of our institution's policy, we cannot distribute the pre-trained models directly. Thankfully, @hzhang57 has shared a model that he trained himself.