Ensure auto_nbands handler doesn't always terminate jobs #342
Conversation
Just a side note: for Lobster, we need a certain minimum number of bands. I would be worried that such a fix wouldn't leave enough.
Thanks @JaGeo, do you mean NBANDS should be increased beyond the minimum needed to get the calculation to run? Or just avoiding decreasing NBANDS as a fix?
I think the latter, as I am not sure you can determine this minimum for each run. There might be requirements for GW runs as well.
That's a reasonable point. For a normal band structure run, I think it's best to have a fix like this where the number of bands is increased just enough for the calc to run. For GW or BSE calculations, hopefully the user is more attuned to the need for a higher number of bands.
@esoteric-ephemera: Would it be better to issue a warning but not fail the job in the case of a seemingly large NBANDS? Actually, that was my original intention, but I suppose that may not be the case here...
@Andrew-S-Rosen: Not sure. For the reasons that @JaGeo and I discussed above, having custodian automatically decrease the number of bands isn't always a good idea. But I'm not opposed to drafting this up.
@esoteric-ephemera: What I meant was that the auto nbands warning I had added in #324 was not actually meant to kill the job. It was only meant to raise a warning to the user and then proceed as usual (with no changes). |
Ah ok! I can look into changing that |
To be fair, that was apparently my bad. 😅 So obviously don't feel obligated to do so. But I just wanted to make it known that the issue you ran into was in part because of a mistake from me... |
Is it possible to get a new release with this fix included? |
Done. |
@shyuep: Something seems up here with the new release: |
@Andrew-S-Rosen @shyuep : looking in the 2024.10.14 custodian dir, I only have these files: |
Is it just that the pyproject.toml should have: |
The new release has fixed it. |
Minor update to the logic of the `auto_nbands` check for `VaspErrorHandler`. This check sees whether the number of bands has been updated by VASP; currently it only checks whether that updated number is very large.

However, there are cases where the user specifies an NBANDS that is incompatible with their parallelization settings, since NBANDS must be divisible by $(\mathrm{ranks}) / (\mathrm{KPAR} \times \mathrm{NCORE})$. In these cases, VASP increases the number of bands to ensure the calculation can still proceed. This can happen in MP's band structure workflows with uniform $k$-point densities.
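The divisibility constraint above can be sketched as follows — a minimal illustration (not custodian's actual code) of how an NBANDS value might be padded up to the nearest compatible multiple, mirroring the adjustment VASP performs itself:

```python
import math

def pad_nbands(nbands: int, ranks: int, kpar: int, ncore: int) -> int:
    """Round NBANDS up to the nearest multiple of ranks / (KPAR * NCORE).

    Illustrative helper only; assumes ranks is divisible by KPAR * NCORE.
    """
    band_groups = ranks // (kpar * ncore)
    return math.ceil(nbands / band_groups) * band_groups

# e.g. 96 ranks with KPAR=2, NCORE=4 gives 12 band groups,
# so NBANDS=100 would be padded up to 108
print(pad_nbands(100, 96, 2, 4))
```

With these settings, a user-specified NBANDS of 100 is not divisible by 12, so VASP would bump it to 108, which is exactly the situation that trips the current `auto_nbands` check.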
However, since the current `auto_nbands` handler applies no corrections to the job, these otherwise successful runs are killed.

This PR adds logic to ensure that the calculation is rerun with a higher number of bands appropriate to the parallelization setting. This is somewhat redundant, since VASP already does this, but I think it has to occur this way because `VaspErrorHandler` is monitoring the job and flags it with an `auto_nbands` error.

Another implementation concern: it's generally safer to decrease the number of bands, since this requires a lower energy cutoff to converge each band. It might be safer to decrease NBANDS as a fix.