update roofline for high order #48
@tj-sun - can you help @felippezacarias get your branch running to do the benchmarking? @felippezacarias - can you run with a 512^3 domain so that we can be sure most of the problem is not sitting in L3. Also - to ensure you are not messing up the alignment, can you carefully set the domain size, n, such that n + boundary_depth*2 == 512.
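To make that concrete, here is a minimal sketch of picking the interior size so the padded dimension is exactly 512. It assumes the allocated dimension is n + 2*boundary_depth and that boundary_depth is half the spatial order (one ghost layer of order/2 cells per side); the names are illustrative, not the actual script parameters.

```python
# Hedged sketch: choose the interior grid size n so that the padded
# (allocated) dimension is exactly 512 and alignment is preserved.
# Assumes padded = n + 2 * boundary_depth, with boundary_depth ghost
# cells on each side of every dimension (assumption, not project code).

def interior_size(padded=512, spatial_order=4):
    boundary_depth = spatial_order // 2   # ghost layers per side (assumption)
    n = padded - 2 * boundary_depth
    assert n + 2 * boundary_depth == padded
    return n

for so in (4, 8):
    print(so, interior_size(spatial_order=so))
# e.g. 4th order -> n = 508, 8th order -> n = 504 under these assumptions
```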
Please use the feature_higher_spatial_order branch.
I also just added output of kernel AI when you run `python tests/eigenwave3d.py`.
Also note that the number of ghost cells equals the spatial order.
I've just done some amendments to our AI calculation in the new commit. Currently I see 4th order weighted AI=1.46 and 8th order 2.74, which I think is about right for float (the article below seems to be using doubles?). I guess we will see when we get some results. https://redmine.scorec.rpi.edu/attachments/111/roofline_for_FastMath.pdf
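For reference, this is roughly how I sanity-check a "weighted" AI figure, assuming it means combining the velocity and stress kernels through their total flop and byte counts. The per-kernel counts below are made-up placeholders; the real numbers come from the generated kernels, so treat this as a sketch of the arithmetic only.

```python
# Sketch of an arithmetic-intensity estimate: AI = flops / bytes moved.
# The flop and byte counts here are illustrative placeholders, not values
# taken from the generated velocity/stress kernels.

def ai(flops, bytes_moved):
    return flops / bytes_moved

kernels = {
    # name: (flops per timestep, bytes moved per timestep) - made up
    "velocity": (2.0e9, 1.4e9),
    "stress":   (3.5e9, 2.4e9),
}

total_flops = sum(f for f, _ in kernels.values())
total_bytes = sum(b for _, b in kernels.values())

for name, (f, b) in kernels.items():
    print(f"{name}: AI = {ai(f, b):.2f}")
print(f"overall (weighted) AI = {ai(total_flops, total_bytes):.2f}")
```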
@ggorman should I use the --profiling flag and get the Mflops and walltime from PAPI, or instrument the velocity and stress kernels with time measurements like we did before? @tj-sun I generated the code for different orders here, but it seems that no matter what grid size or order I use, dim1, dim2 and dim3 always come out as grid_size + 5. Is that correct?
Hi,

> should I use the --profiling flag and get the Mflops and walltime from PAPI, or instrument the velocity and stress kernels with time measurements like we did before?
Why don't you do both (papi + hand instrument) and compare? If there is a big difference we will want to know why. |
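In case it helps with the hand-instrumented side, a minimal sketch of what timing the two kernels directly could look like; `update_velocity` and `update_stress` are placeholder names standing in for the generated kernel entry points, not the actual functions in the code.

```python
import time

def timed(label, fn, *args, **kwargs):
    # Wall-clock timing of one kernel call, to compare against PAPI's
    # numbers from --profiling.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.6f} s")
    return result

# Placeholder usage - update_velocity / update_stress are hypothetical names:
# for step in range(num_timesteps):
#     timed("velocity", update_velocity, grid)
#     timed("stress",   update_stress,   grid)
```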
@felippezacarias - you are absolutely right on the grid_size. I didn't recalculate the grid_size after setting the new order. It's fixed now.
@tj-sun going back to your comment above: "Currently I see 4th order weighted AI=1.46 and 8th order 2.74. Which I think is about right for float. (The article below seems to be using doubles?)" This is not making sense to me. Previously we estimated that the AI for 4th order was ~0.8 - remember that initially @felippezacarias reported 1.7 and then you pointed out that this has to be divided by two to take floats into account. I could buy that figure because it was consistent with the figure of 0.94 reported in roofline_for_FastMath.pdf (BTW - your suggestion that the article was talking about doubles would imply that the AI for floats would be twice that again). Can we focus on getting this right, as it is a key metric?
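Just to make the factor of two explicit (a trivial sketch with made-up counts, nothing project specific): AI is flops over bytes, and a float is 4 bytes against 8 for a double, so the same kernel has exactly twice the AI in single precision.

```python
flops, words = 100.0, 20.0           # made-up counts for one kernel
ai_double = flops / (words * 8)      # 8 bytes per double
ai_float  = flops / (words * 4)      # 4 bytes per float
assert ai_float == 2 * ai_double
print(ai_double, ai_float)           # 0.625 1.25 for these counts
```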
I read the article again yesterday and I think the 0.94 in the article is double precision, so I began to think our AI is too low. I checked again and found that the earlier overall calculation was wrong. I also added boundary condition and ghost cell adjustments (according to page 31 of the article).
@felippezacarias please note that in the new commit f337943 the behaviour of setting the spatial order has changed: `-so=4` now sets 4th order instead of 8th order. This is to address issue #41.
Repeat benchmarks on SENAI machine (Xeon and Xeon Phi) for different spatial orders (2,4,6,8,10,12).
Need the OI and peak flops for both so we can update the roofline plot.