JalinWang/MultiSourceTrainner

Code-Generator

Image Classification Template by Code-Generator

This image classification template, generated by Code-Generator, trains a resnet18 model on the cifar10 dataset, both from TorchVision. Training is powered by PyTorch and PyTorch-Ignite.

Getting Started

Install the dependencies with pip:

pip install -r requirements.txt --progress-bar off -U

Code structure

|
|- README.md
|
|- main.py : main script to run
|- data.py : helper module with functions to set up input datasets and create dataloaders
|- models.py : helper module with functions to create a model or multiple models
|- trainers.py : helper module with functions to create the trainer and evaluator
|- utils.py : module with various helper functions
|
|- requirements.txt : dependencies to install with pip
|
|- config.yaml : global configuration YAML file
|
|- test_all.py : test file with a few basic sanity checks

Training

Multi GPU Training (torchrun) (recommended)

torchrun \
  --nproc_per_node 2 \
  main.py config.yaml --backend nccl
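
For context, torchrun starts one process per GPU and passes each one its position in the job through the environment variables RANK, LOCAL_RANK, and WORLD_SIZE. The sketch below is not taken from main.py; it only illustrates how a launched script could read those values. The names `DistEnv` and `read_dist_env` are hypothetical, and the single-process fallback defaults are an assumption.

```python
import os
from typing import Mapping, NamedTuple


class DistEnv(NamedTuple):
    """Position of one worker process inside a torchrun job."""
    rank: int        # global rank across all nodes
    local_rank: int  # rank on the current node (commonly used to pick the GPU)
    world_size: int  # total number of worker processes


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the variables torchrun exports to each worker.

    Falls back to single-process values (an assumption of this sketch)
    when the script is started without torchrun.
    """
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    # Under the two-process command above, each process prints its own values.
    print(read_dist_env())
```

With `--nproc_per_node 2` on a single node, one process would see rank 0 and the other rank 1, both with a world size of 2.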

Multi Node, Multi GPU Training (torchrun) (recommended)

  • Execute on the master node (node_rank 0)
torchrun \
  --nproc_per_node 4 \
  --nnodes 2 \
  --node_rank 0 \
  --master_addr 127.0.0.1 \
  --master_port 8080 \
  main.py config.yaml --backend nccl
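  • Execute on each worker node, setting <node_rank> to that node's rank (1 for the second node) and replacing 127.0.0.1 with an address at which the master node is reachable; the loopback address only works when every process runs on the same machine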
torchrun \
  --nproc_per_node 4 \
  --nnodes 2 \
  --node_rank <node_rank> \
  --master_addr 127.0.0.1 \
  --master_port 8080 \
  main.py config.yaml --backend nccl
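
To make the numbers in the two commands concrete, the following sketch computes the total process count and the global ranks hosted on each node. It assumes every node launches the same number of processes and that global ranks are assigned contiguously in node_rank order, which is the usual layout but is not stated in this README. The function names are hypothetical.

```python
from typing import List


def world_size(nnodes: int, nproc_per_node: int) -> int:
    """Total number of worker processes across all nodes."""
    return nnodes * nproc_per_node


def global_ranks(node_rank: int, nnodes: int, nproc_per_node: int) -> List[int]:
    """Global ranks hosted on one node, assuming contiguous assignment."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes}), got {node_rank}")
    first = node_rank * nproc_per_node
    return list(range(first, first + nproc_per_node))


if __name__ == "__main__":
    # Values from the commands above: 2 nodes with 4 processes each.
    print(world_size(2, 4))
    for node in range(2):
        print(node, global_ranks(node, 2, 4))
```

Under these assumptions the job has 8 processes: the master node hosts ranks 0 to 3 and the worker node hosts ranks 4 to 7.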
