Training and Testing

English | 简体中文

Please run the commands from the root directory of BasicSR.
In general, both training and testing involve the following steps:

  1. Prepare datasets. Please refer to DatasetPreparation.md.
  2. Modify the config files. The config files are under the options folder. For details on the configuration fields, please refer to Config; an illustrative excerpt follows this list.
  3. [Optional] Download pre-trained models if you are testing with or fine-tuning from them. Please see ModelZoo.
  4. Run the commands. Use the Training Commands or Testing Commands below accordingly.
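
For orientation, here is an illustrative excerpt of the kind of fields a training config contains. The values below are examples, not the authoritative schema; see Config and the actual YAML files under options for the real keys.

# Illustrative excerpt only; field names follow options/train/SRResNet_SRGAN/train_MSRResNet_x4.yml
# but may differ between models. Adjust the dataset roots to your local paths.
name: MSRResNet_x4_example
model_type: SRModel
scale: 4
num_gpu: 1
datasets:
  train:
    name: DIV2K
    type: PairedImageDataset
    dataroot_gt: datasets/DIV2K/DIV2K_train_HR_sub
    dataroot_lq: datasets/DIV2K/DIV2K_train_LR_bicubic/X4_sub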

Contents

  1. Training Commands
    1. Single GPU Training
    2. Distributed (Multi-GPU) Training
    3. Slurm Training
  2. Testing Commands
    1. Single GPU Testing
    2. Distributed (Multi-GPU) Testing
    3. Slurm Testing

Training Commands

Single GPU Training

PYTHONPATH="./:${PYTHONPATH}" \
CUDA_VISIBLE_DEVICES=0 \
python basicsr/train.py -opt options/train/SRResNet_SRGAN/train_MSRResNet_x4.yml

Distributed Training

8 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 basicsr/train.py -opt options/train/EDVR/train_EDVR_M_x4_SR_REDS_woTSA.yml --launcher pytorch

or

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
./scripts/dist_train.sh 8 options/train/EDVR/train_EDVR_M_x4_SR_REDS_woTSA.yml

4 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/train.py -opt options/train/EDVR/train_EDVR_M_x4_SR_REDS_woTSA.yml --launcher pytorch

or

CUDA_VISIBLE_DEVICES=0,1,2,3 \
./scripts/dist_train.sh 4 options/train/EDVR/train_EDVR_M_x4_SR_REDS_woTSA.yml
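
The ./scripts/dist_train.sh helper is shorthand for the torch.distributed.launch command above. A minimal sketch of what such a wrapper might look like, assuming it only sets PYTHONPATH and forwards to the launcher; the real script in scripts/ is authoritative:

#!/usr/bin/env bash
# Sketch of a dist_train.sh-style wrapper: $1 = number of GPUs, $2 = config path.
GPUS=$1
CONFIG=$2
PORT=${PORT:-4321}   # assumption: the rendezvous port can be overridden via PORT
PYTHONPATH="./:${PYTHONPATH}" \
python -m torch.distributed.launch --nproc_per_node="${GPUS}" --master_port="${PORT}" \
    basicsr/train.py -opt "${CONFIG}" --launcher pytorch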

Slurm Training

Introduction to Slurm

1 GPU

PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p [partition] --mpi=pmi2 --job-name=MSRResNetx4 --gres=gpu:1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=6 --kill-on-bad-exit=1 \
python -u basicsr/train.py -opt options/train/SRResNet_SRGAN/train_MSRResNet_x4.yml --launcher="slurm"

4 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p [partition] --mpi=pmi2 --job-name=EDVRMwoTSA --gres=gpu:4 --ntasks=4 --ntasks-per-node=4 --cpus-per-task=4 --kill-on-bad-exit=1 \
python -u basicsr/train.py -opt options/train/EDVR/train_EDVR_M_x4_SR_REDS_woTSA.yml --launcher="slurm"

8 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p [partition] --mpi=pmi2 --job-name=EDVRMwoTSA --gres=gpu:8 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=6 --kill-on-bad-exit=1 \
python -u basicsr/train.py -opt options/train/EDVR/train_EDVR_M_x4_SR_REDS_woTSA.yml --launcher="slurm"
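
In the srun commands above, [partition] is a placeholder for a partition on your cluster, --gres=gpu:N requests N GPUs on a node, and --ntasks/--ntasks-per-node launch one task per GPU. A generic template, where my_partition, my_job, and the config path are placeholders you must substitute:

# Hypothetical template; substitute the partition, job name, GPU count, and config.
PARTITION=my_partition   # placeholder: a real Slurm partition on your cluster
GPUS=4
PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p "${PARTITION}" --mpi=pmi2 --job-name=my_job --gres=gpu:${GPUS} \
    --ntasks="${GPUS}" --ntasks-per-node="${GPUS}" --cpus-per-task=4 --kill-on-bad-exit=1 \
python -u basicsr/train.py -opt options/train/your_config.yml --launcher="slurm"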

Testing Commands

Single GPU Testing

PYTHONPATH="./:${PYTHONPATH}" \
CUDA_VISIBLE_DEVICES=0 \
python basicsr/test.py -opt options/test/SRResNet_SRGAN/test_MSRResNet_x4.yml

Distributed Testing

8 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python -m torch.distributed.launch --nproc_per_node=8 --master_port=4321 basicsr/test.py -opt options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml --launcher pytorch

or

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
./scripts/dist_test.sh 8 options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml

4 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 basicsr/test.py -opt options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml --launcher pytorch

or

CUDA_VISIBLE_DEVICES=0,1,2,3 \
./scripts/dist_test.sh 4 options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml
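
If two distributed jobs share a machine, the fixed --master_port=4321 will collide. Pass a distinct port per job; with a wrapper like the one sketched under Distributed Training (assuming it honors a PORT variable):

# Hypothetical: two test jobs side by side on one machine, each on its own port.
CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=4321 ./scripts/dist_test.sh 4 options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml
CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=4322 ./scripts/dist_test.sh 4 options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml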

Slurm Testing

Introduction to Slurm

1 GPU

PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p [partition] --mpi=pmi2 --job-name=test --gres=gpu:1 --ntasks=1 --ntasks-per-node=1 --cpus-per-task=6 --kill-on-bad-exit=1 \
python -u basicsr/test.py -opt options/test/SRResNet_SRGAN/test_MSRResNet_x4.yml --launcher="slurm"

4 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p [partition] --mpi=pmi2 --job-name=test --gres=gpu:4 --ntasks=4 --ntasks-per-node=4 --cpus-per-task=4 --kill-on-bad-exit=1 \
python -u basicsr/test.py -opt options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml --launcher="slurm"

8 GPUs

PYTHONPATH="./:${PYTHONPATH}" \
GLOG_vmodule=MemcachedClient=-1 \
srun -p [partition] --mpi=pmi2 --job-name=test --gres=gpu:8 --ntasks=8 --ntasks-per-node=8 --cpus-per-task=6 --kill-on-bad-exit=1 \
python -u basicsr/test.py -opt options/test/EDVR/test_EDVR_M_x4_SR_REDS.yml --launcher="slurm"