This is the image classification template by Code-Generator, using the resnet18 model and the cifar10 dataset from TorchVision. Training is powered by PyTorch and PyTorch-Ignite.
Install the dependencies with pip:
pip install -r requirements.txt --progress-bar off -U
|
|- README.md
|
|- main.py : main script to run
|- data.py : helper module with functions to setup input datasets and create dataloaders
|- models.py : helper module with functions to create a model or multiple models
|- trainers.py : helper module with functions to create trainer and evaluator
|- utils.py : module with various helper functions
|
|- requirements.txt : dependencies to install with pip
|
|- config.yaml : global configuration YAML file
|
|- test_all.py : test file with a few basic sanity checks
Run the training on a single node with 2 processes (one per GPU):
torchrun \
--nproc_per_node 2 \
main.py config.yaml --backend nccl
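torchrun passes each process its position in the job through environment variables such as RANK, LOCAL_RANK, and WORLD_SIZE. The following is a minimal sketch of how a script could read them. The helper name `read_dist_env` is hypothetical and not part of this template, and the fallback values for a non-distributed run are an assumption.

```python
import os


def read_dist_env(env=None):
    """Return the rank information that torchrun exports to each process.

    `read_dist_env` is a hypothetical helper, not part of this template.
    The defaults are an assumption describing a plain single-process run.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global rank across all nodes
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # rank within the current node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


if __name__ == "__main__":
    print(read_dist_env())
```

With `--nproc_per_node 2` on a single node, each process should see a world size of 2 and a local rank of 0 or 1.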
- Execute on the master node:
torchrun \
--nproc_per_node 4 \
--nnodes 2 \
--node_rank 0 \
--master_addr 127.0.0.1 \
--master_port 8080 \
main.py config.yaml --backend nccl
- Execute on each worker node:
torchrun \
--nproc_per_node 4 \
--nnodes 2 \
--node_rank <node_rank> \
--master_addr 127.0.0.1 \
--master_port 8080 \
main.py config.yaml --backend nccl
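As a rough illustration of how the multi-node flags combine: `--nnodes 2` with `--nproc_per_node 4` gives 8 processes in total, and with a static launch like the one above each process's global rank is typically `node_rank * nproc_per_node + local_rank`. The function names below are hypothetical and exist only for this sketch.

```python
def world_size(nnodes, nproc_per_node):
    """Total number of processes across all nodes."""
    return nnodes * nproc_per_node


def global_rank(node_rank, nproc_per_node, local_rank):
    """Global rank of a process, assuming ranks are assigned node by node."""
    if not 0 <= local_rank < nproc_per_node:
        raise ValueError("local_rank must be in [0, nproc_per_node)")
    return node_rank * nproc_per_node + local_rank


if __name__ == "__main__":
    # Values taken from the commands above: 2 nodes, 4 processes per node.
    print("world size:", world_size(2, 4))
    for node_rank in range(2):
        ranks = [global_rank(node_rank, 4, lr) for lr in range(4)]
        print(f"node {node_rank}: global ranks {ranks}")
```

Under these assumptions, the master node (node rank 0) hosts global ranks 0 to 3 and the worker node with node rank 1 hosts global ranks 4 to 7.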