update roofline for high order #48
@tj-sun - can you help @felippezacarias get your branch running to do the benchmarking? @felippezacarias - can you run with a 512^3 domain so that we can be sure most of the problem is not sitting in L3. Also - to ensure you are not messing up the alignment, can you carefully set the domain size, n, such that n + boundary_depth*2 == 512.
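To make that concrete, here is a minimal sketch of picking the interior size so the padded dimension is exactly 512. It assumes the allocated dimension is n + 2*boundary_depth and that boundary_depth is half the spatial order (one ghost layer of order/2 cells per side); the names are illustrative, not the actual script parameters.

```python
# Hedged sketch: choose the interior grid size n so that the padded
# (allocated) dimension is exactly 512 and alignment is preserved.
# Assumes padded = n + 2 * boundary_depth, with boundary_depth ghost
# cells on each side of every dimension (assumption, not project code).

def interior_size(padded=512, spatial_order=4):
    boundary_depth = spatial_order // 2   # ghost layers per side (assumption)
    n = padded - 2 * boundary_depth
    assert n + 2 * boundary_depth == padded
    return n

for so in (4, 8):
    print(so, interior_size(spatial_order=so))
# e.g. 4th order -> n = 508, 8th order -> n = 504 under these assumptions
```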
Please use the feature_higher_spatial_order branch.
I also just added output of kernel AI when you run `python tests/eigenwave3d.py`.
Also note that the number of ghost cells equals the spatial order.
I've just done some amendments to our AI calculation in the new commit. Currently I see 4th order weighted AI=1.46 and 8th order 2.74, which I think is about right for float (the article below seems to be using doubles?). I guess we will see when we get some results. https://redmine.scorec.rpi.edu/attachments/111/roofline_for_FastMath.pdf
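For reference, this is roughly how I sanity-check a "weighted" AI figure, assuming it means combining the velocity and stress kernels through their total flop and byte counts. The per-kernel counts below are made-up placeholders; the real numbers come from the generated kernels, so treat this as a sketch of the arithmetic only.

```python
# Sketch of an arithmetic-intensity estimate: AI = flops / bytes moved.
# The flop and byte counts here are illustrative placeholders, not values
# taken from the generated velocity/stress kernels.

def ai(flops, bytes_moved):
    return flops / bytes_moved

kernels = {
    # name: (flops per timestep, bytes moved per timestep) - made up
    "velocity": (2.0e9, 1.4e9),
    "stress":   (3.5e9, 2.4e9),
}

total_flops = sum(f for f, _ in kernels.values())
total_bytes = sum(b for _, b in kernels.values())

for name, (f, b) in kernels.items():
    print(f"{name}: AI = {ai(f, b):.2f}")
print(f"overall (weighted) AI = {ai(total_flops, total_bytes):.2f}")
```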
@ggorman should I use the --profiling flag and get the Mflops and walltime from PAPI, or instrument the velocity and stress kernels with time measurements like we did before? @tj-sun I generated the code for different orders here, but it seems that no matter what grid size or order I use, dim1, dim2 and dim3 always come out as grid_size + 5. Is that correct?
Hi,

> should I use the --profiling flag and get the Mflops and walltime from PAPI, or instrument the velocity and stress kernels with time measurements like we did before?
Why don't you do both (papi + hand instrument) and compare? If there is a big difference we will want to know why. |
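In case it helps with the hand-instrumented side, a minimal sketch of what timing the two kernels directly could look like; `update_velocity` and `update_stress` are placeholder names standing in for the generated kernel entry points, not the actual functions in the code.

```python
import time

def timed(label, fn, *args, **kwargs):
    # Wall-clock timing of one kernel call, to compare against PAPI's
    # numbers from --profiling.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - t0:.6f} s")
    return result

# Placeholder usage - update_velocity / update_stress are hypothetical names:
# for step in range(num_timesteps):
#     timed("velocity", update_velocity, grid)
#     timed("stress",   update_stress,   grid)
```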
@felippezacarias - you are absolutely right on the grid_size. I didn't recalculate the grid_size after setting the new order. It's fixed now.
@tj-sun going back to your comment above: "Currently I see 4th order weighted AI=1.46 and 8th order 2.74. Which I think is about right for float. (The article below seems to be using doubles?)" This is not making sense to me. Previously we estimated that the AI for 4th order was ~0.8 - remember that initially @felippezacarias reported 1.7 and then you pointed out that this has to be divided by two to take floats into account. I could buy that figure because it was consistent with the figure of 0.94 reported in roofline_for_FastMath.pdf (BTW - your suggestion that the article was talking about doubles would imply that the AI for floats would be twice that again). Can we focus on getting this right, as it is a key metric?
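Just to make the factor of two explicit (a trivial sketch with made-up counts, nothing project specific): AI is flops over bytes, and a float is 4 bytes against 8 for a double, so the same kernel has exactly twice the AI in single precision.

```python
flops, words = 100.0, 20.0           # made-up counts for one kernel
ai_double = flops / (words * 8)      # 8 bytes per double
ai_float  = flops / (words * 4)      # 4 bytes per float
assert ai_float == 2 * ai_double
print(ai_double, ai_float)           # 0.625 1.25 for these counts
```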
I read the article again yesterday and I think the 0.94 in the article is double precision, so I began to think our AI is too low. I checked again and found that the earlier overall calculation was wrong. I also added boundary condition and ghost cell adjustments (according to page 31 of the article).
@felippezacarias please note that in the new commit f337943 the behaviour of setting the spatial order has changed: `-so=4` now sets 4th order instead of 8th order. This is to address issue #41.
Repeat benchmarks on SENAI machine (Xeon and Xeon Phi) for different spatial orders (2,4,6,8,10,12).
Need the OI and peak flops for both so we can update the roofline plot.