ProGamerGov edited this page Nov 29, 2019 · 2 revisions

The neural_style_time.py timing modification is used to accurately measure how long a style transfer run takes to complete.

The following parameters are used to generate the timing data:

python3 neural_style_time.py -backend nn -optimizer lbfgs -num_iterations 500 -print_iter 0

python3 neural_style_time.py -backend nn -optimizer adam -num_iterations 500 -print_iter 0

python3 neural_style_time.py -backend cudnn -optimizer lbfgs -num_iterations 500 -print_iter 0

python3 neural_style_time.py -backend cudnn -optimizer adam -num_iterations 500 -print_iter 0

python3 neural_style_time.py -backend cudnn -cudnn_autotune -optimizer lbfgs -num_iterations 500 -print_iter 0

python3 neural_style_time.py -backend cudnn -cudnn_autotune -optimizer adam -num_iterations 500 -print_iter 0
Each test is run 3 times, and the average of those 3 runs is rounded to the nearest second.
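The averaging procedure above can be sketched roughly as follows. This is an illustrative reconstruction, not code from neural_style_time.py: each command is run 3 times, timed with a wall clock, and the mean is rounded to the nearest second. The helper names (`time_run`, `average_runtime`) are hypothetical.

```python
import shlex
import statistics
import subprocess
import time

def time_run(command):
    """Run one style-transfer command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(shlex.split(command), check=True)
    return time.perf_counter() - start

def average_runtime(times_sec):
    """Average per-run times and round to the nearest second, as in the tables below."""
    return round(statistics.mean(times_sec))

if __name__ == "__main__":
    cmd = ("python3 neural_style_time.py -backend cudnn -cudnn_autotune "
           "-optimizer adam -num_iterations 500 -print_iter 0")
    print(average_runtime([time_run(cmd) for _ in range(3)]))
```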

Speed varies considerably depending on the backend and the optimizer. Here are some times for running 500 iterations with -image_size=512 on a Tesla K80 with different settings:

  • -backend nn -optimizer lbfgs: 117 seconds
  • -backend nn -optimizer adam: 100 seconds
  • -backend cudnn -optimizer lbfgs: 124 seconds
  • -backend cudnn -optimizer adam: 107 seconds
  • -backend cudnn -cudnn_autotune -optimizer lbfgs: 109 seconds
  • -backend cudnn -cudnn_autotune -optimizer adam: 91 seconds

Here are the same benchmarks on a GTX 1080:

  • -backend nn -optimizer lbfgs: 56 seconds
  • -backend nn -optimizer adam: 38 seconds
  • -backend cudnn -optimizer lbfgs: 40 seconds
  • -backend cudnn -optimizer adam: 40 seconds
  • -backend cudnn -cudnn_autotune -optimizer lbfgs: 23 seconds
  • -backend cudnn -cudnn_autotune -optimizer adam: 24 seconds

Here are the same benchmarks on an NVIDIA GRID K520:

  • -backend nn -optimizer lbfgs: 236 seconds
  • -backend nn -optimizer adam: 209 seconds
  • -backend cudnn -optimizer lbfgs: 226 seconds
  • -backend cudnn -optimizer adam: 200 seconds
  • -backend cudnn -cudnn_autotune -optimizer lbfgs: 226 seconds
  • -backend cudnn -cudnn_autotune -optimizer adam: 200 seconds

Here are the same benchmarks on a Tesla T4:

  • -backend nn -optimizer lbfgs: 72 seconds
  • -backend nn -optimizer adam: 66 seconds
  • -backend cudnn -optimizer lbfgs: 48 seconds
  • -backend cudnn -optimizer adam: 40 seconds
  • -backend cudnn -cudnn_autotune -optimizer lbfgs: 51 seconds
  • -backend cudnn -cudnn_autotune -optimizer adam: 43 seconds

Here are the same benchmarks on a Tesla P100-PCIE-16GB:

  • -backend nn -optimizer lbfgs: 61 seconds
  • -backend nn -optimizer adam: 47 seconds
  • -backend cudnn -optimizer lbfgs: 37 seconds
  • -backend cudnn -optimizer adam: 23 seconds
  • -backend cudnn -cudnn_autotune -optimizer lbfgs: 39 seconds
  • -backend cudnn -cudnn_autotune -optimizer adam: 25 seconds
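To put the tables in perspective, a short script can compute how much the fastest listed configuration gains over the -backend nn -optimizer lbfgs baseline on each GPU. The times are copied from the lists above; the dictionary layout and `speedup` helper are just for illustration.

```python
# Seconds for 500 iterations at -image_size=512, copied from the tables above.
# baseline = -backend nn -optimizer lbfgs; best = fastest configuration listed.
benchmarks = {
    "Tesla K80":  {"baseline": 117, "best": 91},
    "GTX 1080":   {"baseline": 56,  "best": 23},
    "GRID K520":  {"baseline": 236, "best": 200},
    "Tesla T4":   {"baseline": 72,  "best": 40},
    "Tesla P100": {"baseline": 61,  "best": 23},
}

def speedup(gpu):
    """Ratio of baseline time to best time, rounded to two decimals."""
    t = benchmarks[gpu]
    return round(t["baseline"] / t["best"], 2)

for gpu in benchmarks:
    print(f"{gpu}: {speedup(gpu)}x faster than nn/lbfgs")
```

Note how uneven the gains are: cuDNN with autotuning more than doubles throughput on the GTX 1080, while the older GRID K520 barely benefits.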