Perf
While it is convenient to time a particular function with a given set of parameters, it is even better to generate a plot of performance over a range of parameters. clFFT can generate performance plots with the help of Python scripts. The Python scripts are located at ./src/scripts/perf, but when the INSTALL target is built from the build environment, the scripts are copied into the ./bin/clFFT/develop/vs10x64/package directory along with the rest of the built binaries.
There are two primary Python scripts that users interact with: measurePerformance.py and plotPerformance.py.
measurePerformance.py is responsible for measuring and gathering performance data and recording it in a log file. It calls the clFFT client program in a loop, modifying program parameters in an organized fashion, and scrapes stdout for performance information. It provides an interface that simplifies specifying test ranges and strides, and it prints extensive help information with the --help (-h) parameter:
C:\clFFT\src\scripts\perf>measurePerformance.py -h
usage: measurePerformance.py [-h] [--device DEVICE] [-b BATCHSIZE]
[-a CONSTPROBSIZE] [-x LENGTHX]
[-y LENGTHY] [-z LENGTHZ]
[--problemsize PROBLEMSIZE]
[-i INPUTLAYOUT] [-o OUTPUTLAYOUT]
[-p PLACENESS] [-r PRECISION]
[--ldscomplex]
[--ldsfraction LDSFRACTION]
[--cachesize CACHESIZE]
[--xfactor XFACTOR]
[--library {clfft}] [--label LABEL]
[--createini CREATEINIFILENAME]
[--ini INIFILENAME]
[--tablefile TABLEOUTPUTFILENAME]
Measure performance of the clFFT library
optional arguments:
-h, --help show this help message and exit
--device DEVICE device(s) to run on; may be a comma-delimited list.
choices are ['gpu', 'cpu']. (default gpu)
-b BATCHSIZE, --batchsize BATCHSIZE
number of FFTs to perform with one invocation of the
client. the special value 'max' may be used to adjust
the batch size on a per-transform basis to the maximum
problem size possible on the device. may be a range or
a comma-delimited list. if a range is entered, you may
follow it with ':X', where X is the stepping of the
range (if omitted, it defaults to a stepping of 1).
e.g., 1-15 or 12,18 or 7,10-30:10,1050-1054. the
special value 'pow10' expands to '1-9,10-90:10,100-900
:100,1000-9000:1000,10000-90000:10000,100000-900000:10
0000,1000000-9000000:1000000'. Note that 'max' and
'pow10' may not be used in a list; they must be used
by themselves; max may only be used with --library
clfft. (default 1)
-a CONSTPROBSIZE, --adaptivemax CONSTPROBSIZE
Max problem size that you want to maintain across the
invocations of client with different lengths. This is
adaptive and adjusts itself automtically.
-x LENGTHX, --lengthx LENGTHX
length(s) of x to test; must be factors of 1, 2, 3, or
5 with clFft; may be a range or a comma-delimited
list. e.g., 16-128 or 1200 or 16,2048-32768 (default
1)
-y LENGTHY, --lengthy LENGTHY
length(s) of y to test; must be factors of 1, 2, 3, or
5 with clFft; may be a range or a comma-delimited
list. e.g., 16-128 or 1200 or 16,32768 (default 1)
-z LENGTHZ, --lengthz LENGTHZ
length(s) of z to test; must be factors of 1, 2, 3, or
5 with clFft; may be a range or a comma-delimited
list. e.g., 16-128 or 1200 or 16,32768 (default 1)
--problemsize PROBLEMSIZE
additional problems of a set size. may be used in
addition to lengthx/y/z. each indicated problem size
will be added to the list of FFTs to perform. should
be entered in AxBxC:D format. A, B, and C indicate the
sizes of the X, Y, and Z dimensions (respectively). D
is the batch size. All values except the length of X
are optional. may enter multiple in a comma-delimited
list. e.g., 2x2x2:32768 or 256x256:100,512x512:256
-i INPUTLAYOUT, --inputlayout INPUTLAYOUT
may enter multiple in a comma-delimited list. choices
are ['cp', 'ci']. ci = complex interleaved, cp =
complex planar (default ci)
-o OUTPUTLAYOUT, --outputlayout OUTPUTLAYOUT
may enter multiple in a comma-delimited list. choices
are ['cp', 'ci']. ci = complex interleaved, cp =
complex planar (default ci)
-p PLACENESS, --placeness PLACENESS
may enter multiple in a comma-delimited list. choices
are ['in', 'out']. in = in place, out = out of place
(default in)
-r PRECISION, --precision PRECISION
may enter multiple in a comma-delimited list. choices
are ['single', 'double']. (default single)
--ldscomplex turn on complex LDS (default off)
--ldsfraction LDSFRACTION
fraction of the LDS to use; should be 0 or an integer
2-8. library automatically chooses the value on 0. may
be a range or a comma-delimited list. (default 0)
--cachesize CACHESIZE
size of the cache; should be 0 or a positive integer
between one and two times the problem size. library
automatically chooses the value on a 0. may be a range
or a comma-delimited list. (default 0)
--xfactor XFACTOR size of the X dimension to use when dividing up large
problems; should be 0 or a power of 2. library
automatically chooses the value on a 0. may be a range
or a comma-delimited list. (default 0)
--library {clfft} indicates the library to use for testing on this run
--label LABEL a label to be associated with all transforms performed
in this run. if LABEL includes any spaces, it must be
in "double quotes". note that the label is not saved
to an .ini file. e.g., --label cayman may indicate
that a test was performed on a cayman card or --label
"Windows 32" may indicate that the test was performed
on Windows 32
--createini CREATEINIFILENAME
create an .ini file with the given name that saves the
other parameters given at the command line, then quit.
e.g., 'performance.py -x 2048 --createini
my_favorite_setup.ini' will create an .ini file that
will save the configuration for a 2048-datapoint 1D
FFT.
--ini INIFILENAME use the parameters in the named .ini file instead of
the command line parameters.
--tablefile TABLEOUTPUTFILENAME
save the results to a plaintext table with the file
name indicated. this can be used with
plotPerformance.py to generate graphs of the
data (default: table prints to screen)
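The range syntax described in the help text for -b (a comma-delimited list whose items are single values or ranges with an optional ':X' stepping) can be illustrated with a short sketch. This is not the code measurePerformance.py uses; the function name expand_spec is hypothetical, and inclusive range endpoints are an assumption inferred from the way 'pow10' expands to '1-9,10-90:10,...'.

```python
def expand_spec(spec):
    """Expand a spec such as '7,10-30:10,1050-1054' into a list of ints."""
    values = []
    for item in spec.split(","):
        item = item.strip()
        # An optional ':X' suffix gives the stepping; it defaults to 1.
        body, _, step_text = item.partition(":")
        step = int(step_text) if step_text else 1
        if "-" in body:
            start_text, end_text = body.split("-", 1)
            start, end = int(start_text), int(end_text)
            # Assumption: both endpoints of a range are included.
            values.extend(range(start, end + 1, step))
        else:
            values.append(int(body))
    return values


# The example from the help text.
print(expand_spec("7,10-30:10,1050-1054"))
# → [7, 10, 20, 30, 1050, 1051, 1052, 1053, 1054]
```

The special values 'max' and 'pow10' are deliberately left out of this sketch, since the help text says they must be used by themselves rather than in a list.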
The example below uses this script to gather performance numbers for a few sizes: 4, 16, 64, 256, and 1024.
C:\clFFT\src\scripts\perf>measurePerformance.py -x 4,16,64,256,1024 -b max
A subdirectory or file perfLog already exists.
=========================MEASURE PERFORMANCE START===========================
Process id of Measure Performance:14592
Executing measure performance for label: None
Executing for label: None
table header---->lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
Total combinations = 5
preparing command: 1
Executing Command: ['Client.exe', '--gpu', '-x', '4', '-y', '1', '-z', '1', '--batchSize', '1048576', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003DD08C0
OutEvents: 000000000480F390
Length: (4)
Batch: 1048576
Input Stride: (1)
Output Stride: (1)
Global Work: (2097152)
Gflops: 83.3251
Time (ns): 503,366
stderr:
Execution Successfull---------------
preparing command: 2
Executing Command: ['Client.exe', '--gpu', '-x', '16', '-y', '1', '-z', '1', '--batchSize', '262144', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003DD0940
OutEvents: 000000000627B6B0
Length: (16)
Batch: 262144
Input Stride: (1)
Output Stride: (1)
Global Work: (1048576)
Gflops: 174.583
Time (ns): 480,493
stderr:
Execution Successfull---------------
preparing command: 3
Executing Command: ['Client.exe', '--gpu', '-x', '64', '-y', '1', '-z', '1', '--batchSize', '65536', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003DDCA00
OutEvents: 0000000004DBFE50
Length: (64)
Batch: 65536
Input Stride: (1)
Output Stride: (1)
Global Work: (1048576)
Gflops: 235.951
Time (ns): 533,284
stderr:
Execution Successfull---------------
preparing command: 4
Executing Command: ['Client.exe', '--gpu', '-x', '256', '-y', '1', '-z', '1', '--batchSize', '16384', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 1 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003EDC8D0
OutEvents: 0000000004C18E30
Length: (256)
Batch: 16384
Input Stride: (1)
Output Stride: (1)
Global Work: (1048576)
Gflops: 343.413
Time (ns): 488,543
stderr:
Execution Successfull---------------
preparing command: 5
Executing Command: ['Client.exe', '--gpu', '-x', '1024', '-y', '1', '-z', '1', '--batchSize', '4096', '--inLayout', '1', '--outLayout', '1', '', '', '-p', '10']
stdout:
========================StdDev ( 2 )========================
clFFT[ 0 ]: Pruning 0 samples out of 10
===========================clFFT============================
Handle: 1
Kernel: 0000000003C508C0
OutEvents: 000000000621C200
Length: (1024)
Batch: 4096
Input Stride: (1)
Output Stride: (1)
Global Work: (524288)
Gflops: 420.946
Time (ns): 498,200
stderr:
Execution Successfull---------------
=========================MEASURE PERFORMANCE ENDS===========================
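Each client invocation in the transcript above reports its batch size and throughput on lines beginning with "Batch:" and "Gflops:". The sketch below shows one way such output could be scraped with regular expressions. It only illustrates the idea and is not taken from measurePerformance.py; the function name scrape_client_output and the patterns are assumptions, and the sample text is copied from the first run above.

```python
import re

# Patterns for the two fields of interest in the client's stdout.
BATCH_RE = re.compile(r"^[ \t]*Batch:\s*(\d+)", re.MULTILINE)
GFLOPS_RE = re.compile(r"^[ \t]*Gflops:\s*([0-9.]+)", re.MULTILINE)


def scrape_client_output(stdout_text):
    """Return (batch, gflops) from client output, or None if either is missing."""
    batch = BATCH_RE.search(stdout_text)
    gflops = GFLOPS_RE.search(stdout_text)
    if batch is None or gflops is None:
        return None
    return int(batch.group(1)), float(gflops.group(1))


sample = """\
===========================clFFT============================
Handle: 1
Length: (4)
Batch: 1048576
Input Stride: (1)
Output Stride: (1)
Global Work: (2097152)
Gflops: 83.3251
Time (ns): 503,366
"""

print(scrape_client_output(sample))
# → (1048576, 83.3251)
```

Returning None when a field is missing lets a caller skip or flag a failed run instead of recording a bogus number.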
The run generates a log file in the current directory that contains the details of the parameters tested along with the performance numbers:
C:\clFFT\src\scripts\perf>type results2013-07-23T16.01.52.791000.txt
lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
4,1,1,1048576,gpu,ci,ci,in,single,None,83.3251
16,1,1,262144,gpu,ci,ci,in,single,None,174.583
64,1,1,65536,gpu,ci,ci,in,single,None,235.951
256,1,1,16384,gpu,ci,ci,in,single,None,343.413
1024,1,1,4096,gpu,ci,ci,in,single,None,420.946
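The results table is plain comma-separated text, so it can also be read outside of the provided scripts. The sketch below parses the rows shown above with Python's csv module and computes the problem size as x*y*z*batchsize, the definition given in the plotPerformance.py help text. The function name load_rows is hypothetical, and the table is embedded as a string so the sketch runs without the log file.

```python
import csv
import io

# The table printed above, embedded so the sketch is self-contained.
TABLE = """\
lengthx,lengthy,lengthz,batch,device,inlay,outlay,place,precision,label,GFLOPS
4,1,1,1048576,gpu,ci,ci,in,single,None,83.3251
16,1,1,262144,gpu,ci,ci,in,single,None,174.583
64,1,1,65536,gpu,ci,ci,in,single,None,235.951
256,1,1,16384,gpu,ci,ci,in,single,None,343.413
1024,1,1,4096,gpu,ci,ci,in,single,None,420.946
"""


def load_rows(handle):
    """Read a results table and add the derived problem size to each row."""
    rows = []
    for record in csv.DictReader(handle):
        x = int(record["lengthx"])
        y = int(record["lengthy"])
        z = int(record["lengthz"])
        batch = int(record["batch"])
        rows.append({
            "x": x,
            "problemsize": x * y * z * batch,  # x*y*z*batchsize
            "gflops": float(record["GFLOPS"]),
        })
    return rows


rows = load_rows(io.StringIO(TABLE))
for row in rows:
    print(row["x"], row["problemsize"], row["gflops"])
```

For this run every row has the same problem size, 4194304 points: with -b max the batch size shrinks as the transform length grows (1048576 for length 4 down to 4096 for length 1024).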
The log file is then fed into the plotPerformance.py script, which consumes the records and plots the results in a graph.
While the log file generated by measurePerformance.py is sufficient for gathering performance data, plots make it easier to compare and contrast different sets of data. This is the purpose of plotPerformance.py; this Python script uses the freely available matplotlib library to either open a window with an interactive graph or write an image file straight to disk. It prints extensive help information with the --help (-h) parameter:
C:\clFFT\src\scripts\perf>plotPerformance.py -h
usage: plotPerformance.py [-h] -d DATAFILE -x
{x,y,z,batchsize,problemsize} [-y {gflops}]
[--plot {device,precision,label}]
[--title GRAPHTITLE]
[--x_axis_label XAXISLABEL]
[--x_axis_scale {linear,log2,log10}]
[--y_axis_label YAXISLABEL]
[--outputfile OUTPUTFILENAME]
Plot performance of the clFFT library. plotPerformance.py reads in
data tables from measurePerformance.py and plots their values
optional arguments:
-h, --help show this help message and exit
-d DATAFILE, --datafile DATAFILE
indicate a file to use as input. must be in the format
output by measurePerformance.py. may be used
multiple times to indicate multiple input files. e.g.,
-d cypressOutput.txt -d caymanOutput.txt
-x {x,y,z,batchsize,problemsize}, --x_axis {x,y,z,batchsize,problemsize}
indicate which value will be represented on the x
axis. problemsize is defined as x*y*z*batchsize
-y {gflops}, --y_axis {gflops}
indicate which value will be represented on the y axis
--plot {device,precision,label}
indicate which of ['device', 'precision', 'label']
should be used to differentiate multiple plots. this
will be chosen automatically if not specified
--title GRAPHTITLE the desired title for the graph generated by this
execution. if GRAPHTITLE contains any spaces, it must
be entered in "double quotes". if this option is not
specified, the title will be autogenerated
--x_axis_label XAXISLABEL
the desired label for the graph's x-axis. if
XAXISLABEL contains any spaces, it must be entered in
"double quotes". if this option is not specified, the
x-axis label will be autogenerated
--x_axis_scale {linear,log2,log10}
the desired scale for the graph's x-axis. if nothing
is specified, it will be selected automatically
--y_axis_label YAXISLABEL
the desired label for the graph's y-axis. if
YAXISLABEL contains any spaces, it must be entered in
"double quotes". if this option is not specified, the
y-axis label will be autogenerated
--outputfile OUTPUTFILENAME
name of the file to output graphs. Supported formats:
emf, eps, pdf, png, ps, raw, rgba, svg, svgz.
Once the performance of a particular run has been saved to a log file, you can instruct plotPerformance.py to parse the log file and create a line graph from that data. The command below plots the performance over the data points measured.
C:\clFFT\src\scripts\perf>plotPerformance.py -x x -d results2013-07-23T16.01.52.791000.txt
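For readers who want to reproduce a similar graph by hand, here is a minimal matplotlib sketch that plots GFLOPS against the x length for the five measured points. It is not the implementation of plotPerformance.py: the function name plot_gflops, the log2 x-axis, and the marker style are assumptions chosen for illustration, and the `base` keyword of set_xscale assumes matplotlib 3.3 or newer.

```python
# Data points taken from the results table of the example run.
XS = [4, 16, 64, 256, 1024]
GFLOPS = [83.3251, 174.583, 235.951, 343.413, 420.946]


def plot_gflops(xs, gflops, output_path):
    """Write a line graph of GFLOPS versus transform length to output_path."""
    # Imported here so the data above is usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, gflops, marker="o")
    # Assumption: a log2 axis suits lengths that are powers of two.
    ax.set_xscale("log", base=2)
    ax.set_xlabel("x")
    ax.set_ylabel("GFLOPS")
    fig.savefig(output_path)
    plt.close(fig)
```

Calling plot_gflops(XS, GFLOPS, "clfft_perf.png") writes the graph to a PNG file; the file name is only an example.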