Question about interplay between parameters #474

Closed
AKHughes1994 opened this issue Jun 10, 2023 · 8 comments

@AKHughes1994

Hello,

First of all sorry if this is the wrong forum to be asking these questions but I have a few regarding the interplay between --data-time-chunk, --data-freq-chunk, and --sel-ddid.

For example, let's consider a single VLA-style 8-bit MS file where the total bandwidth is broken into two sub-bands, each composed of 8 spectral windows (spws), and each spw has 64 2 MHz frequency channels. Thus, each sub-band has 512 channels and the total bandwidth has 1024 channels. I want to perform self-calibration on each of the sub-bands separately, so I make two parsets and specify ddid=0~7 for one and ddid=8~15 for the other.

Now, if I wanted to load all of the data and calibrate the entire bandwidth, I would put --data-freq-chunk=1024. But what do I do for sub-bands? Would I put --data-freq-chunk=512, as there are only 512 channels in the sub-band, or would I still put --data-freq-chunk=1024 and let the ddid selection automatically pick the 512 channels within the specified range of spws? In practice, I doubt it makes a difference, but I'm curious to know the order in which the two options are applied; moreover, the scenario where I only need to specify 512 channels might reduce memory cost.

A related question: does the program automatically cap --data-freq-chunk? Say I were to put --data-freq-chunk=10000 despite only having 1024 channels; would it be identical to --data-freq-chunk=1024, or would it do something undesirable?

The third question: without units specified, does --data-time-chunk interpret the integer as a number of integrations (dump times)?

Thanks for your help

@o-smirnov
Collaborator

> First of all sorry if this is the wrong forum to be asking these questions but I have a few regarding the interplay between --data-time-chunk, --data-freq-chunk, and --sel-ddid.

This forum is fine, but you can also use Discussions if you're not sure it's a bug; we can always migrate things to an issue.

> Now, if I wanted to load all of the data and calibrate the entire bandwidth, I would put --data-freq-chunk=1024. But what do I do for sub-bands? Would I put --data-freq-chunk=512, as there are only 512 channels in the sub-band, or would I still put --data-freq-chunk=1024 and let the ddid selection automatically pick the 512 channels within the specified range of spws? In practice, I doubt it makes a difference, but I'm curious to know the order in which the two options are applied; moreover, the scenario where I only need to specify 512 channels might reduce memory cost.

I think since #424 went in, multiple-spw support is in there, but it is probably not as well tested as we would like. The chunking only determines how much data is read in at once (and, less obviously, the parallelism). Within each chunk, the per-Jones frequency solution interval then determines the frequency cadence of the solutions. Thus in your scenario, it would load 512 or 1024 channels per worker and run one solution per interval within a chunk, as determined by --x-freq-int. The only difference in practice should be that with a 512-channel chunk you would get two workers operating in parallel, while with a 1024-channel chunk there would only be one worker. Memory cost should be similar either way, since the same amount of data goes into RAM in both scenarios (it is rather the time chunking that drives RAM up and down).
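
To make the chunk-versus-interval distinction concrete, here is a rough sketch of the bookkeeping described above. It is plain arithmetic, not CubiCal code: the helper `chunk_layout` and its arguments are made up for illustration, the 64-channel solution interval is an assumed value, and the selected channels are treated as one contiguous axis with one worker per chunk, as in the explanation above.

```python
import math

def chunk_layout(n_chan, freq_chunk, freq_int):
    """Return (number of chunks, solution intervals per full chunk).

    Hypothetical helper. n_chan is the number of selected channels,
    freq_chunk mirrors --data-freq-chunk and freq_int mirrors a per-Jones
    frequency interval, all counted in channels.
    Assumes 0 < freq_chunk <= n_chan and freq_int > 0.
    """
    n_chunks = math.ceil(n_chan / freq_chunk)
    intervals_per_chunk = math.ceil(freq_chunk / freq_int)
    return n_chunks, intervals_per_chunk

# 1024 channels, assumed solution interval of 64 channels
print(chunk_layout(1024, 512, 64))   # (2, 8)
print(chunk_layout(1024, 1024, 64))  # (1, 16)
```

Both layouts give 16 solution intervals across the band. Only the number of chunks that can be handed to separate workers changes.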

> A related question: does the program automatically cap --data-freq-chunk? Say I were to put --data-freq-chunk=10000 despite only having 1024 channels; would it be identical to --data-freq-chunk=1024, or would it do something undesirable?

Nope, no difference at all, anything over the maximum means full band.

> The third question: without units specified, does --data-time-chunk interpret the integer as a number of integrations (dump times)?

Yes, exactly.

> Thanks for your help

Any time! Just forgive the slow response time at times.

@JSKenyon
Collaborator

You beat me to it @o-smirnov (and probably explained it better than I would have). @AKHughes1994 One thing which may not work is overlapping spectral windows. I know that this is the case for some (all?) of the VLA bands between SPW 7 and 8. It is a difficult situation to handle, as it becomes a little unclear what the "correct" thing to do is. Was your intention to calibrate over all 16 SPWs? Or was your goal to solve separately in each sub-band using a single parset?
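
For anyone who wants to check whether their own spectral windows overlap, a minimal sketch follows. It assumes you have already extracted the lower and upper frequency edge of each spw (for example from the CHAN_FREQ and CHAN_WIDTH columns of the SPECTRAL_WINDOW subtable). The function name `overlapping_pairs` and the edge values are invented for illustration and are not real VLA numbers.

```python
def overlapping_pairs(spw_edges):
    """Return index pairs of spectral windows whose frequency ranges overlap.

    spw_edges: list of (low_hz, high_hz) tuples, one per spectral window.
    Windows that merely touch at an edge are not counted as overlapping.
    """
    pairs = []
    for i in range(len(spw_edges)):
        for j in range(i + 1, len(spw_edges)):
            lo_i, hi_i = spw_edges[i]
            lo_j, hi_j = spw_edges[j]
            # Two intervals overlap if each starts before the other ends
            if lo_i < hi_j and lo_j < hi_i:
                pairs.append((i, j))
    return pairs

# Invented edges: three 128 MHz windows, the last one starting 28 MHz early
edges = [(1.000e9, 1.128e9), (1.128e9, 1.256e9), (1.228e9, 1.356e9)]
print(overlapping_pairs(edges))  # [(1, 2)]
```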

@AKHughes1994
Author

AKHughes1994 commented Jun 13, 2023

@o-smirnov Thanks!

@JSKenyon The need to break it up into two sub-bands came from the inability to handle overlapping SPWs. If possible I would prefer to do all 16 spws, but the sub-band routine seems to work fine!

Can I tack on another couple of questions about the parameters?

  1. As I understand it, if I set time-chunk = 0 or freq-chunk = 0, it loads the entire time/frequency range. Does this also apply to time-int=0 and freq-int=0 for the gain terms? Would that make the solution interval span all of the loaded data?

  2. Furthermore, how would time-chunk=0 interact with chunk-by-jump=SCAN_NUMBER? Would it break the data up into individual scans and then load all of the data in each scan?

  3. Lastly, I've seen parsets where the two terms max-prior-error and max-post-error are both set to 0.0. Does this turn off prior/post error flagging?

Thanks

@o-smirnov
Collaborator

Yes exactly to all three. :)
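
The first two conventions confirmed here can be sketched in a few lines. This is an illustration only, not CubiCal's implementation: `resolve_size` and `split_time_chunks` are hypothetical names, and the per-timeslot scan numbers are invented.

```python
from itertools import groupby

def resolve_size(requested, axis_len):
    """0, or anything larger than the axis, means 'use the whole axis'."""
    if requested == 0 or requested > axis_len:
        return axis_len
    return requested

def split_time_chunks(scan_numbers, time_chunk):
    """Split timeslots into (start, stop) chunks, stop exclusive.

    A new chunk starts whenever the scan number changes. Within a scan,
    chunks hold at most time_chunk timeslots (0 = the whole scan).
    """
    chunks = []
    start = 0
    for _, group in groupby(scan_numbers):
        scan_len = len(list(group))
        size = resolve_size(time_chunk, scan_len)
        for offset in range(0, scan_len, size):
            stop = min(start + offset + size, start + scan_len)
            chunks.append((start + offset, stop))
        start += scan_len
    return chunks

scans = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]  # invented scan number per timeslot
print(split_time_chunks(scans, 0))  # [(0, 4), (4, 7), (7, 12)]
print(split_time_chunks(scans, 3))  # [(0, 3), (3, 4), (4, 7), (7, 10), (10, 12)]
```

With a time chunk of 0, each scan becomes exactly one chunk, which matches the behaviour described in question 2.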

@AKHughes1994
Author

AKHughes1994 commented Jun 13, 2023

@o-smirnov Thanks again, let me throw out one more for the road:

What are the units of --sol-min-bl?

@o-smirnov
Collaborator

Meters (we don't have a uv-wavelength cutoff, do we @JSKenyon?)

@JSKenyon
Collaborator

Yep, --sol-min-bl and --sol-max-bl are both in meters. No, we don't currently have a uv-wavelength cutoff.
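
Since the cut is in meters, anyone thinking in uv-wavelengths has to convert by hand. A small sketch of that conversion is below, using the standard relation that a baseline's length in wavelengths equals its length in meters times frequency divided by the speed of light. The function names and the example frequencies are illustrative assumptions, not part of CubiCal.

```python
C = 299_792_458.0  # speed of light in m/s

def baseline_in_wavelengths(length_m, freq_hz):
    """Convert a baseline length in meters to units of wavelengths."""
    return length_m * freq_hz / C

def baseline_in_meters(length_lambda, freq_hz):
    """Convert a baseline length in wavelengths back to meters."""
    return length_lambda * C / freq_hz

# The same 300 m cut corresponds to different uv-distances across the band
for freq_hz in (1.0e9, 2.0e9):  # assumed example frequencies
    print(freq_hz, baseline_in_wavelengths(300.0, freq_hz))
```

Note that a fixed cut in meters therefore corresponds to a uv-distance that grows linearly with frequency, which matters for wide fractional bandwidths.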

@AKHughes1994
Author

Thanks, @o-smirnov @JSKenyon, I'll close this for now and direct future questions of this sort to the discussion section.
