Especially on newer CPU architectures, it may be faster not to exploit sparsity, e.g. when multiplying homogeneous transforms.
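For reference, a sketch of the arithmetic behind the two variants. The symbols are not from the original post and are assumed here for illustration: $R_a, R_b$ are the 3×3 rotation blocks and $t_a, t_b$ the translation vectors of the two transforms.

```math
\begin{bmatrix} R_a & t_a \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} R_b & t_b \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} R_a R_b & R_a t_b + t_a \\ 0 & 1 \end{bmatrix}
```

Exploiting sparsity means computing only $R_a R_b$ and $R_a t_b + t_a$, which takes 36 multiplications and 27 additions when evaluated naively. The full 4×4 product takes 64 multiplications and 48 additions. The scalar count therefore favors the sparse form, so any advantage of the dense form would presumably come from how well it vectorizes (a column of four `Float64` values fills one 256-bit register exactly), not from doing less arithmetic.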
AVX2-capable machine:

    Julia Version 0.6.0-pre.beta.295
    Commit dc907c7 (2017-04-24 04:37 UTC)
    Platform Info:
      OS: Linux (x86_64-pc-linux-gnu)
      CPU: Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
      LAPACK: libopenblas64_
      LIBM: libopenlibm
      LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

Older, non-AVX2-capable machine:

    Julia Version 0.6.0-pre.beta.295
    Commit dc907c760f (2017-04-24 04:37 UTC)
    Platform Info:
      OS: macOS (x86_64-apple-darwin13.4.0)
      CPU: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
      WORD_SIZE: 64
      BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
      LAPACK: libopenblas64_
      LIBM: libopenlibm
      LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

In each case, I rebuilt the system image for the native architecture.
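Exploiting sparsity

    using BenchmarkTools, StaticArrays

    # Rotation and translation parts are multiplied separately.
    @benchmark (arot * brot, atrans + arot * btrans) setup = begin
        arot = rand(SMatrix{3, 3})
        brot = rand(SMatrix{3, 3})
        atrans = rand(SVector{3})
        btrans = rand(SVector{3})
    end
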
AVX2:

    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     12.423 ns (0.00% GC)
      median time:      12.496 ns (0.00% GC)
      mean time:        12.906 ns (0.00% GC)
      maximum time:     32.182 ns (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     999

Non-AVX2:

    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     10.598 ns (0.00% GC)
      median time:      11.208 ns (0.00% GC)
      mean time:        11.527 ns (0.00% GC)
      maximum time:     91.898 ns (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     999
      time tolerance:   5.00%
      memory tolerance: 1.00%

Not exploiting sparsity

    @benchmark a * b setup = (a = rand(SMatrix{4, 4}); b = rand(SMatrix{4, 4}))

AVX2:

    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     5.331 ns (0.00% GC)
      median time:      5.344 ns (0.00% GC)
      mean time:        5.565 ns (0.00% GC)
      maximum time:     26.861 ns (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     1000

Non-AVX2:

    BenchmarkTools.Trial:
      memory estimate:  0 bytes
      allocs estimate:  0
      --------------
      minimum time:     7.679 ns (0.00% GC)
      median time:      8.138 ns (0.00% GC)
      mean time:        8.520 ns (0.00% GC)
      maximum time:     54.870 ns (0.00% GC)
      --------------
      samples:          10000
      evals/sample:     999
      time tolerance:   5.00%
      memory tolerance: 1.00%

Doing this before #207 would simplify #207.