Compute-Intensive Kernel Hits NaN #71

frobnitzem · 2020-12-22T19:49:38Z

This kernel:

Line 234 in e241152

A[i] = _mm256_fmadd_pd(A[i], A[i], A[i]);

repeatedly applies
A = A*A + A (i.e. fmadd(A, A, A))
which quickly escalates to A=NaN.

To avoid this, the kernel can be changed to
A = -A*A + A (i.e. fnmadd(A, A, A)).

This makes the iteration equivalent to the logistic map x -> r*x*(1 - x) with r = 1.
Adjusting the initial condition to A = 0.7 (or anything between 0 and 1) keeps the values bounded: the iterates converge slowly to 0 over time.
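For anyone who wants to check the behaviour without building the benchmark, here is a minimal scalar sketch of the two update rules. It uses plain `double` arithmetic instead of the AVX2 intrinsics, so it only approximates what one vector lane does. The function names and step counts are made up for illustration and are not taken from the Task Bench source.

```c
#include <float.h>
#include <stddef.h>

/* Original update rule, a <- a*a + a, applied `steps` times.
   Scalar stand-in for one lane of the vector kernel (an assumption:
   the real kernel uses fused multiply-add, so rounding may differ). */
double iterate_original(double a, size_t steps) {
    for (size_t i = 0; i < steps; ++i) {
        a = a * a + a;
    }
    return a;
}

/* Proposed update rule, a <- -(a*a) + a, i.e. the logistic map with r = 1. */
double iterate_logistic(double a, size_t steps) {
    for (size_t i = 0; i < steps; ++i) {
        a = -(a * a) + a;
    }
    return a;
}

/* Returns 1 if x is an ordinary finite value, 0 for infinity or NaN
   (both comparisons are false for NaN). */
int is_finite_double(double x) {
    return x >= -DBL_MAX && x <= DBL_MAX;
}
```

Starting from 0.7, `is_finite_double(iterate_original(0.7, 64))` should come back 0, while `iterate_logistic(0.7, 1000)` should stay a small positive number.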

hyviquel · 2021-01-05T19:58:39Z

@elliottslaughter Do you think this problem can be linked to #69 ?

elliottslaughter · 2021-01-07T20:41:22Z

@hyviquel It shouldn't have anything to do with that issue since the results of the compute bound kernel are thrown away, and the "result" that is actually written into the output region is instead a tuple containing the timestep and point (column) of the task.

@frobnitzem Thanks for pointing this out. Overall this looks good to me. I don't think it will change the practical results on any platforms we tested (since we already verified that we hit peak FLOPS), but it is true that a future system might hypothetically add an early-out for NaNs (which would then cause Task Bench to over-report its achieved FLOPS).

I'm happy to take a PR on this or may get back to it myself in a week or so. (Currently digging myself out of things that have been piling up since the break.)

frobnitzem · 2021-01-08T21:46:36Z

I agree; my timings were the same after changing the code. I didn't make a PR because one of the AVX cases doesn't have a clear fix.

