Disabling OpenMP parallel pragma for CPU tensors causes performance regression #187

alextnewman · 2016-12-07T14:40:49Z

Removing OpenMP from tensor_cpu_inl.h in commit f225763 caused a massive performance regression for us on Windows (MSVC 2013), Mac (Clang), and Linux (gcc).

Locally, we reverted this commit and saw a large improvement (training time more than 20% faster). It would be very helpful to have an option or flag that enables OpenMP parallelization for this function, so we don't have to maintain an internal fork.
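Here is a minimal sketch of the kind of opt-in switch being requested, assuming a compile-time macro. `MSHADOW_HYPOTHETICAL_OMP_MAPPLAN` and `MapRows` are hypothetical names and are not part of mshadow. The loop only mirrors the general shape of an element-wise CPU map flattened to two dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opt-in switch (NOT an existing mshadow macro). Build with
// -fopenmp (gcc/clang) or /openmp (MSVC) plus
// -DMSHADOW_HYPOTHETICAL_OMP_MAPPLAN=1 to parallelize the outer loop.
// Otherwise the loop runs serially, as it does after f225763.
#ifndef MSHADOW_HYPOTHETICAL_OMP_MAPPLAN
#define MSHADOW_HYPOTHETICAL_OMP_MAPPLAN 0
#endif

// Hypothetical helper: applies `op` element-wise over a row-major
// rows x cols buffer, writing the results into `dst`.
template <typename Op>
void MapRows(std::vector<float>& dst, const std::vector<float>& src,
             long rows, long cols, Op op) {
  assert(rows >= 0 && cols >= 0);
  assert(dst.size() == static_cast<std::size_t>(rows * cols));
  assert(src.size() == dst.size());
#if MSHADOW_HYPOTHETICAL_OMP_MAPPLAN && defined(_OPENMP)
  #pragma omp parallel for
#endif
  // Signed loop index, because MSVC's OpenMP 2.0 implementation requires one.
  for (long y = 0; y < rows; ++y) {
    for (long x = 0; x < cols; ++x) {
      const std::size_t i = static_cast<std::size_t>(y * cols + x);
      dst[i] = op(src[i]);
    }
  }
}
```

When the macro is unset, or the compiler is not invoked with OpenMP enabled, the pragma is compiled out and the code runs serially. Users who don't opt in therefore keep the current default behavior.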


szha · 2019-08-04T00:51:19Z

This code base has been donated to the Apache MXNet project per #373, and this repo is deprecated. Future development and issue tracking should continue in Apache MXNet.
