Optimization techniques for code generation #58

felippezacarias · 2016-01-12T19:03:13Z

| Pull request | Why | Reference | Code parameters used | Time before → after (Xeon) | Time before → after (Xeon Phi) |
|---|---|---|---|---|---|
| #51 | Thread-blocked access is achieved with the directive `schedule(static,1)` on the outermost loop. This lets threads processing a z-plane reuse y and x planes that are already in cache. | Muhong Zhou and William W. Symes (Rice University), *Wave Equation Based Stencil Optimizations on Multi-core CPU*, section "Reducing L3 Cache Misses: Blocking thread accesses" | Xeon: 8th-order code, grid 512x512x512; Xeon Phi: 8th-order code, grid 420x420x420 | 288 sec → 258 sec | 123 sec → 112 sec |
| #52 | Changes the array access pattern by applying loop fission to the innermost loop and reordering the accesses by stride. The change also reduces register pressure in the vectorized loop. | Borges, L., 2011, *3D finite differences on multi-core processors*, available online at [https://software.intel.com/en-us/articles/3d-finite-differences-on-multi-core-processors](https://software.intel.com/en-us/articles/3d-finite-differences-on-multi-core-processors) | Xeon: 8th-order code, grid 512x512x512; Xeon Phi: 8th-order code, grid 420x420x420 | 258 sec → 158 sec | 112 sec → 196 sec |
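
As a rough illustration of the thread-blocking idea in #51, here is a minimal C sketch. It assumes a row-major nz x ny x nx grid with x as the unit-stride axis and an 8th-order Laplacian kernel; the function name `laplacian_8th`, the `IDX` macro, the radius `R` and the weight table `W` are hypothetical and not taken from the code generated by the PR. The only part that corresponds to the table row is the `schedule(static,1)` clause on the outermost (z) loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical setup: stencil radius for an 8th-order central 2nd derivative. */
#define R 4

/* 8th-order central-difference weights for d2/dx2 with unit spacing:
   W[0] is the centre weight, W[k] multiplies u[i-k] + u[i+k]. */
static const double W[R + 1] = {
    -205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0
};

/* Row-major index into an nz x ny x nx grid (x has unit stride).
   Expects variables named ny and nx in the calling scope. */
#define IDX(z, y, x) (((size_t)(z) * ny + (y)) * nx + (x))

/* Hypothetical kernel: out = Laplacian(in) on interior points.
   schedule(static, 1) deals z-planes to threads round-robin, so at any
   moment the team works on a band of adjacent z-planes whose +/-R
   neighbour planes overlap and can be shared in the last-level cache.
   Without -fopenmp the pragma is ignored and the loop runs serially. */
void laplacian_8th(const double *in, double *out, int nz, int ny, int nx)
{
    assert(nz > 2 * R && ny > 2 * R && nx > 2 * R);
    #pragma omp parallel for schedule(static, 1)
    for (int z = R; z < nz - R; z++) {
        for (int y = R; y < ny - R; y++) {
            for (int x = R; x < nx - R; x++) {
                /* Centre weight counted once per axis. */
                double acc = 3.0 * W[0] * in[IDX(z, y, x)];
                for (int k = 1; k <= R; k++) {
                    acc += W[k] * (in[IDX(z, y, x - k)] + in[IDX(z, y, x + k)]
                                 + in[IDX(z, y - k, x)] + in[IDX(z, y + k, x)]
                                 + in[IDX(z - k, y, x)] + in[IDX(z + k, y, x)]);
                }
                out[IDX(z, y, x)] = acc;
            }
        }
    }
}
```

The clause only changes how iterations are distributed across threads, so the result is the same as the serial loop. With plain `schedule(static)`, each thread would instead receive one contiguous block of roughly (nz - 2R)/threads planes, which keeps the planes different threads touch far apart and gives less opportunity to share them in cache.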

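A similarly hedged sketch of the loop fission described in #52, under the same assumptions (row-major grid, x unit-stride, 8th-order Laplacian). This is one possible reading of "fission on the innermost loop, rearranged by stride", not necessarily how the PR or the Borges article structures it; the function name `laplacian_8th_fissioned` and the split into three loops are hypothetical. For each (z, y) row, the single fused x-loop is split into one loop per access stride, so each vectorized loop keeps fewer input streams live at once.

```c
#include <assert.h>
#include <stddef.h>

#define R 4  /* hypothetical: stencil radius for 8th order */

/* 8th-order central-difference weights for d2/dx2 with unit spacing. */
static const double W[R + 1] = {
    -205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0
};

/* Hypothetical fissioned kernel: out = Laplacian(in) on interior points.
   The inner x work is split into three loops grouped by memory stride:
   unit stride (x), stride nx (y) and stride nx*ny (z). restrict tells the
   compiler in and out do not alias, which helps it vectorize along x. */
void laplacian_8th_fissioned(const double *restrict in, double *restrict out,
                             int nz, int ny, int nx)
{
    assert(nz > 2 * R && ny > 2 * R && nx > 2 * R);
    const ptrdiff_t sy = nx;                  /* stride between y rows   */
    const ptrdiff_t sz = (ptrdiff_t)nx * ny;  /* stride between z planes */
    for (int z = R; z < nz - R; z++) {
        for (int y = R; y < ny - R; y++) {
            const double *c = in + z * sz + y * sy;  /* row start, x = 0 */
            double *o = out + z * sz + y * sy;

            /* Loop 1: centre point and unit-stride x neighbours. */
            for (int x = R; x < nx - R; x++) {
                double acc = 3.0 * W[0] * c[x];
                for (int k = 1; k <= R; k++)
                    acc += W[k] * (c[x - k] + c[x + k]);
                o[x] = acc;
            }
            /* Loop 2: y neighbours, stride nx. */
            for (int k = 1; k <= R; k++)
                for (int x = R; x < nx - R; x++)
                    o[x] += W[k] * (c[x - k * sy] + c[x + k * sy]);
            /* Loop 3: z neighbours, stride nx*ny. */
            for (int k = 1; k <= R; k++)
                for (int x = R; x < nx - R; x++)
                    o[x] += W[k] * (c[x - k * sz] + c[x + k * sz]);
        }
    }
}
```

The trade-off is that the output row `o` is read and written again in loops 2 and 3 in exchange for fewer simultaneous input streams per loop. Whether that pays off depends on the target; the #52 figures in the table (faster on Xeon, slower on Xeon Phi) are consistent with that.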