This repository has been archived by the owner on Aug 11, 2020. It is now read-only.

Disabling OpenMP parallel pragma for CPU tensors causes performance regression #187

Open
alextnewman opened this issue Dec 7, 2016 · 1 comment

Comments

@alextnewman
Contributor

Removing the OpenMP pragma from tensor_cpu_inl.h in commit f225763 caused a massive performance regression for us on Windows (MSVC 2013), Mac (Clang), and Linux (gcc).

Locally, we've reverted this commit and seen a 20%+ improvement in training time. It would be very helpful to have an option or flag that enables OpenMP parallelization for this function, so that we don't have to maintain an internal fork.
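To illustrate the kind of opt-in flag being requested, here is a minimal sketch. It is not the actual mshadow code: the macro name `EXAMPLE_USE_OPENMP_CPU_MAP` and the function `ScaledAdd` are hypothetical, and the loop is a simple stand-in for the element-wise CPU loop that the commit touched. The pragma is only emitted when the macro is set and the compiler has OpenMP enabled, so the default build stays serial.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL opt-in macro for illustration only; this is not an existing
// mshadow flag. It defaults to off, so behavior is unchanged unless a user
// explicitly asks for the parallel loop.
#ifndef EXAMPLE_USE_OPENMP_CPU_MAP
#define EXAMPLE_USE_OPENMP_CPU_MAP 0
#endif

// Stand-in for an element-wise CPU tensor operation: dst[i] += scale * src[i].
inline void ScaledAdd(std::vector<float>& dst,
                      const std::vector<float>& src,
                      float scale) {
  assert(dst.size() == src.size());
  // A signed loop index keeps the loop valid under OpenMP 2.0, which is the
  // version MSVC 2013 implements.
  const long n = static_cast<long>(dst.size());
#if EXAMPLE_USE_OPENMP_CPU_MAP && defined(_OPENMP)
  #pragma omp parallel for
#endif
  for (long i = 0; i < n; ++i) {
    dst[i] += scale * src[i];  // iterations are independent, so this is safe to split
  }
}
```

With something along these lines, a build could opt in by defining the macro and turning on the compiler's OpenMP support (for example `-DEXAMPLE_USE_OPENMP_CPU_MAP=1 -fopenmp` with gcc, or `/openmp` with MSVC), while everyone else keeps the current serial behavior.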

@szha
Member

szha commented Aug 4, 2019

This code base has been donated to the Apache MXNet project per #373, and this repo is deprecated. Future development and issue tracking should continue in Apache MXNet.
