Track possible multiple correlation support in nifty/ducc gridders #169
The ducc gridder does not yet support multiple correlations, but it's on my todo list. I assume that the most typical situation will be four correlations, i.e. four complex numbers for every entry in the MS instead of just one at the moment, correct? Will the weight array stay the same, or will it also grow by a factor of four?
Hi @mreineck. Thanks for keeping this on the horizon. I would say one, two and four correlations are the typical cases. The WEIGHT and WEIGHT_SPECTRUM arrays always have a correlation dimension which is the same as that in the DATA, FLAG, CORRECTED_DATA etc. arrays. See the CASA MSv2.0 spec for confirmation: https://casa.nrao.edu/Memos/229.html#SECTION00061000000000000000

From admittedly hazy memory, I think I just decided to pass non-contiguous arrays into the nifty gridder. I remember it sent many warning messages about non-contiguous arrays to the console, which I think you removed?
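For concreteness, a small sketch (the shapes are illustrative, not taken from any particular MS) of how the shared correlation axis appears on DATA and WEIGHT_SPECTRUM, and why slicing out a single correlation yields a non-contiguous array:

```python
import numpy as np

# Illustrative MSv2.0 column shapes: (row, channel, correlation)
nrow, nchan, ncorr = 1000, 64, 4
data = np.zeros((nrow, nchan, ncorr), dtype=np.complex64)
weight_spectrum = np.ones((nrow, nchan, ncorr), dtype=np.float32)

# WEIGHT_SPECTRUM shares the correlation dimension with DATA
assert data.shape == weight_spectrum.shape

# Selecting one correlation strides over the last axis, so the
# result is not C-contiguous -- this is the kind of array that
# triggered the nifty gridder's non-contiguity warnings
xx = data[:, :, 0]
print(xx.flags["C_CONTIGUOUS"])  # False
```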
OK, I'll see how that fits into the design. My hope is that I can use vector instructions when gridding/degridding multiple values that are at identical (u,v,w) locations. If that works, then a single pass doing four correlations might be noticeably faster than four separate passes doing one correlation at a time. Passing non-contiguous arrays is probably not such a big deal. There will be slightly more cache misses, but the effect on overall performance is probably small.
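A hypothetical sketch of the single-pass idea (the function name and the toy Gaussian kernel are invented for illustration; real gridders use a tuned convolution kernel): the kernel value is evaluated once per grid cell and reused across all correlations, and the innermost loop over correlations is a natural candidate for vector instructions.

```python
import numpy as np

def grid_single_pass(uv_idx, vis, grid, support=4):
    """Grid all correlations of each visibility in one pass.

    uv_idx: (nvis, 2) integer grid positions (illustrative).
    vis:    (nvis, ncorr) complex visibilities.
    grid:   (ncorr, nu, nv) complex grids, updated in place.
    """
    half = support // 2
    for i in range(uv_idx.shape[0]):
        u0, v0 = uv_idx[i]
        for du in range(-half, half):
            for dv in range(-half, half):
                # toy kernel, evaluated once per grid cell ...
                w = np.exp(-0.5 * (du * du + dv * dv))
                # ... and shared by all correlations at this (u,v)
                for c in range(vis.shape[1]):
                    grid[c, u0 + du, v0 + dv] += w * vis[i, c]
```

Four separate passes would instead re-evaluate the kernel once per correlation, which is the redundant work a single pass avoids.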
@landmanbester, I can't imagine another RA use case, but pinging you in case there's something exotic that I may have missed.
This is useful to know. I'm currently re-implementing a Discrete Wavelet Transform in numba: ratt-ru/pfb-imaging#10, and the thought of applying the 1D DWT on non-contiguous data made me pause: the original implementation copies rows which are non-contiguous before applying the filter. Do you have any thoughts you'd be willing to share on the matter in ratt-ru/pfb-imaging#10? I'm not asking for a code review -- I just find this level of discussion useful.
Whether unaligned accesses are a problem depends primarily on how many operations you do on an array element before going to the next. In the gridder, I read each value from the visibility array once and then do on the order of support**2 operations with it, so the cost of the initial non-contiguous access is amortized. If this was a 1D gridder, things would already look worse, as the amortizing work would only scale linearly with the support size.
I think that about covers it. The weights are per correlation, so, strictly speaking, you have to grid the correlations before converting to Stokes components. I don't know if it makes much of a difference, but I think the diagonal components of the brightness matrix are always real-valued.
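As a concrete illustration of gridding correlations first and converting afterwards (assuming linear feeds XX, XY, YX, YY; this is the textbook conversion, not code from either gridder):

```python
import numpy as np

def grids_to_stokes(gxx, gxy, gyx, gyy):
    """Convert four correlation grids (linear feeds) to Stokes grids."""
    I = 0.5 * (gxx + gyy)
    Q = 0.5 * (gxx - gyy)
    U = 0.5 * (gxy + gyx)
    V = 0.5j * (gyx - gxy)  # V = (XY - YX) / (2i)
    return I, Q, U, V
```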
Unaligned accesses can create a bit of havoc when vectorizing code using special instruction sets. I got some speedup (though not the full 8-fold) when I did this vectorization by hand once, but that was a long time ago. Compilers have likely become smarter in the meantime. I don't know whether numba exploits special instruction sets, though.
Yes. In my particular situation it wouldn't be a problem though, since during gridding I'd basically have a (potentially) unaligned load per visibility, and the remaining support**2 memory accesses would be aligned. I still don't expect anything near an 8-fold speedup, however...
Yeah, I don't expect it to make any difference in your case. Sorry, I forgot you wrote a stacking gridder. With w-projection it makes more of a difference.
numba defers to LLVM for SIMD: http://numba.pydata.org/numba-doc/latest/user/faq.html?highlight=simd#does-numba-vectorize-array-computations-simd

I've seen numba generate vectorised assembly code. Something like the following should show you the ASM:

```python
import numba

@numba.njit
def function(...):
    pass

function(...)  # trigger compilation first
print(function.inspect_asm())
```
After finding another way of vectorizing the inner loops of the gridding/degridding functions, I think I won't support simultaneous processing of multiple correlations after all. The issue is that if we process, say, four correlations together, the amount of data touched in the innermost loops also becomes four times larger, and this can make the difference between very few and very many L1 cache misses. It would also mean reserving memory for four complex arrays of size nu*nv instead of just one, which is probably not the best thing to do for really large dirty images. So unless there are important reasons for switching, I propose to keep the current implementation. (It's of course true that with separate processing we have to do the same kernel evaluations repeatedly, but I don't think this is an important factor any more.)

You don't need to grid all the data unless you are forming multiple Stokes parameters (which isn't the typical use case). Handling two correlations at once is, however, quite possible, and avoids the large memory reservations you would otherwise need. See for instance the pull request going in at the moment for my facet-based numba gridding and degridding scheme: https://github.com/ska-sa/codex-africanus/blob/681ca8b46dad4693ca0484608c863f9f9ba77f08/africanus/gridding/perleypolyhedron/policies/stokes_conversion_policies.py The one exception is of course DFT-based RM synthesis (e.g. Brentjens et al.), where you will need to allocate nu*nv for three grids (IQU).

Cheers
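To illustrate the point about handling two correlations at once (a hypothetical sketch with an invented function name, not the code from the linked pull request): forming only Stokes I on the fly requires a single nu*nv grid rather than one grid per correlation plus one per Stokes parameter.

```python
import numpy as np

def grid_stokes_I(uv_idx, vis_xx, vis_yy, nu, nv):
    """Accumulate Stokes I = (XX + YY) / 2 directly onto one grid.

    uv_idx: (nvis, 2) integer grid positions (convolution kernel
    omitted for brevity). Only one complex (nu, nv) grid is allocated,
    instead of four correlation grids of size nu*nv.
    """
    grid = np.zeros((nu, nv), dtype=np.complex128)
    for i in range(uv_idx.shape[0]):
        u, v = uv_idx[i]
        grid[u, v] += 0.5 * (vis_xx[i] + vis_yy[i])
    return grid
```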
I see, thanks!