You can run these from the stock Anaconda 3 docker container. Numba is already installed.
Use their instructions or docker-compose up using this docker-scripts repository.
A guide can also be found on my blog.
Note GPU drivers are not available as pass through when running containers on Windows 10 systems (as of 3/2020)
Install linux using the anaconda 3 instructions on the anaconda web site. You can also look on by blog for links and linux instructions.
These numbers represent the amount of time it takes to iterate 10,000,000 times using various approaches
The various approaches are structured as similar as possible without walking away from implementation specific optimizations.
- 10,000,000 long vector [[a,b,c],[a,b,c],...]
- Calculate
math.log(param_triple[0])*math.log(param_triple[1])*math.log(param_triple[2])
- Sum the results of all operations
- Z820 2X E5-2640 v2 @ 2.00GHz 128GB memory
- 2 - 8 Core Hyper threaded
- 32 Virtual Cores
- Nvidia RTX 2060 Super
cuda-demo.py
GPU grid: 0.1630 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU jit parallel: 0.0922 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU jit serial: 1.1733 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU serial nojit: 12.4751 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
dask-demo.py
DASK jit thrd multi 2.5249 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
DASK jit thrd sing: 1.7951 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
DASK jit proc mult: 1.9432 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
DASK jit proc sing: 3.8063 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU jit parallel: 0.0922 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU jit serial: 1.1381 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU nojit serial: 10.7825 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
- Multi-threaded: 8 processors 32 threads
- Single-threaded: 1 processor 1 thread
Dell Latitude I5-4310U @ 2.00Ghz / 2.60Ghz Docker 2 cores 4GB
dask-demo.py
DASK thrd multi: 1.6286 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
DASK thrd sing: 1.2028 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
DASK proc mult: 3.0543 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
DASK proc sing: 2.6753 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU jit parallel: 0.4176 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU jit serial: 0.6670 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
CPU serial nojit: 13.8456 Iterations: 10,000,000 dtype= float64 numcalc: sum( 10,000,000 ): 34986983045.74297
- Multi-threaded: 2 processors 2 threads total
- Single-threaded: 1 processor 1 thread total