Describe the bug
Seeing different performance relative to Numba than suggested in the DaCe benchmarking notebook.
To Reproduce
https://nbviewer.org/github/spcl/dace/blob/master/tutorials/benchmarking.ipynb
In the above notebook, it is suggested that the following code should run faster with DaCe than with Numba:
import numpy as np
import numba
from numba import njit
import dace


# Element-wise update kernels (one per framework, identical body)
@njit
def element_update_numba(a):
    return a * 5


def element_update_dace(a):
    return a * 5


def element_update(a):
    return a * 5


# Pure-Python baseline
def someforloop(A):
    for i in range(1000):
        for j in range(1000):
            A[i, j] = element_update(A[i, j])


@njit(parallel=True)
def someforloop_numba_parallel(A):
    for i in numba.prange(1000):
        for j in numba.prange(1000):
            A[i, j] = element_update_numba(A[i, j])


@njit
def someforloop_numba(A):
    for i in range(1000):
        for j in range(1000):
            A[i, j] = element_update_numba(A[i, j])


@dace.program(auto_optimize=True, device=dace.DeviceType.CPU)
def someforloop_dace_parallel(A: dace.float64[1000, 1000]):
    for i in dace.map[0:1000]:
        for j in dace.map[0:1000]:
            A[i, j] = element_update_dace(A[i, j])


@dace.program(auto_optimize=True, device=dace.DeviceType.CPU)
def someforloop_dace(A: dace.float64[1000, 1000]):
    for i in range(1000):
        for j in range(1000):
            A[i, j] = element_update_dace(A[i, j])


# Compile the DaCe programs ahead of time to exclude dispatch overhead
someforloop_dace_parallel_compiled = someforloop_dace_parallel.compile()
someforloop_dace_compiled = someforloop_dace.compile()

a_orig = np.random.rand(1000, 1000)
TIMES = {}
a = a_orig.copy()
TIMES['numpy'] = %timeit -o someforloop(a)
a = a_orig.copy()
TIMES['numba'] = %timeit -o someforloop_numba(a)
a = a_orig.copy()
TIMES['numba_parallel'] = %timeit -o someforloop_numba_parallel(a)
a = a_orig.copy()
TIMES['dace_njit'] = %timeit -o someforloop_dace(a)
a = a_orig.copy()
TIMES['dace_compiled'] = %timeit -o someforloop_dace_compiled(a)
a = a_orig.copy()
TIMES['dace_parallel_njit'] = %timeit -o someforloop_dace_parallel(a)
a = a_orig.copy()
TIMES['dace_parallel_compiled'] = %timeit -o someforloop_dace_parallel_compiled(a)
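The timings are then printed from the collected results (a minimal sketch; %timeit -o returns IPython TimeitResult objects whose str() is the familiar "mean ± std. dev." line):

# Print each benchmark's TimeitResult; str() gives the summary lines below
for name, result in TIMES.items():
    print(f"{name}: {result}")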
However, I get the following results:
numpy: 285 ms ± 6.08 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numba: 174 µs ± 4.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
numba_parallel: 63.9 µs ± 5.91 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
dace_njit: 725 µs ± 39 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
dace_compiled: 161 µs ± 3.59 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
dace_parallel_njit: 679 µs ± 21.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
dace_parallel_compiled: 90.1 µs ± 4.89 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
I'm wondering whether there are any optimisations I'm missing, or anything missing from my laptop's installation, that would make this code run faster.
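For reference, these are the only knobs I am aware of (a minimal sketch; the config keys assume the compiler.cpu entries in DaCe's config schema, and the thread count is just an example value):

import os
import dace

# OpenMP thread count used by DaCe's parallel CPU maps; must be set
# before the program is run (example value, adjust to your machine)
os.environ['OMP_NUM_THREADS'] = '8'

# Inspect which compiler and flags DaCe picked on this machine
# (e.g. whether an optimising flag set is actually in effect)
print(dace.Config.get('compiler', 'cpu', 'executable'))
print(dace.Config.get('compiler', 'cpu', 'args'))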
Expected behavior
DaCe to be faster than Numba, as suggested in the tutorial notebook.
Desktop (please complete the following information):
OS: Windows 10
IDE: VSCode
Version: latest