Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Don't kill a VASP job if NBANDS is very high with no other warning #344

Closed
3 tasks
Andrew-S-Rosen opened this issue Jul 15, 2024 · 6 comments
Closed
3 tasks
Labels

Comments

@Andrew-S-Rosen
Copy link
Member

Andrew-S-Rosen commented Jul 15, 2024

Code snippet

No response

What happened?

In the following block of code, if VASP updates the number of bands automatically and it's > 2x the number of electrons, Custodian will kill the job even if VASP completes with no errors.

if "auto_nbands" in self.errors and (nbands := self._get_nbands_from_outcar(directory)):
outcar = load_outcar(os.path.join(directory, "OUTCAR"))
if (nelect := outcar.nelect) and (nbands > 2 * nelect):
self.error_count["auto_nbands"] += 1
warnings.warn(
"NBANDS seems to be too high. The electronic structure may be inaccurate. "
"You may want to rerun this job with a smaller number of cores.",
UserWarning,
)

In practice, having such a large number of bands is a sign of a problem by the user and can lead to spurious energetic states. I have observed this for gas-phase CO in a box. But we ultimately should not kill the job. We should just warn the user (even though it's unlikely they'll read the logs).

This was indirectly noted by @esoteric-ephemera in #342.

I can patch this, but it admittedly won't be for a little while. We should rethink how the bands stuff is being handled because in practice, it doesn't actually seem ideal...

Version

v2024.6.24

Which OS?

  • MacOS
  • Windows
  • Linux

Log output

No response

@Andrew-S-Rosen
Copy link
Member Author

Andrew-S-Rosen commented Jul 15, 2024

My proposal is as follows.

  1. Remove "auto_nbands" from the error_msgs list. It is not an error but rather just a useful warning from VASP. Keeping it here could cause some mistakes later on by developers.

  2. Instead of relying on the if "auto_nbands" check, have the following block of checks be done in a "bespoke" manner where it is not tied to a specific VASP warning or error message:

if "auto_nbands" in self.errors and (nbands := self._get_nbands_from_outcar(directory)):
outcar = load_outcar(os.path.join(directory, "OUTCAR"))
if (nelect := outcar.nelect) and (nbands > 2 * nelect):
self.error_count["auto_nbands"] += 1
warnings.warn(
"NBANDS seems to be too high. The electronic structure may be inaccurate. "
"You may want to rerun this job with a smaller number of cores.",
UserWarning,
)
elif nbands := vi["INCAR"].get("NBANDS"):
kpar = vi["INCAR"].get("KPAR", 1)
ncore = vi["INCAR"].get("NCORE", 1)
# If the user set an NBANDS that isn't compatible with parallelization settings,
# increase NBANDS to ensure correct task distribution and issue a UserWarning.
# The number of ranks per band is (number of MPI ranks) / (KPAR * NCORE)
if (ranks := outcar.run_stats.get("cores")) and (rem_bands := nbands % (ranks // (kpar * ncore))) != 0:
actions.append({"dict": "INCAR", "action": {"_set": {"NBANDS": nbands + rem_bands}}})
warnings.warn(
f"Your NBANDS={nbands} setting was incompatible with your parallelization "
f"settings, KPAR={kpar}, NCORE={ncore}, over {ranks} ranks. "
f"The number of bands has been decreased accordingly to {nbands + rem_bands}.",
UserWarning,
)

  1. For the case of the UserWarning "NBANDS seems to be too high...", don't kill the job. Just warn the user.

This probably means we should be treating this as a new handler. Just like how we have an UnconvergedErrorHandler that checks for convergence, we can have an NbandsErrorHandler that checks for the appropriateness of nbands independent of any specific VASP error message.

@shyuep
Copy link
Member

shyuep commented Oct 16, 2024

This has been fixed. But I have added a update_incar setting for VaspJob that solves this problem to some extent.

@shyuep shyuep closed this as completed Oct 16, 2024
@yuan-gist
Copy link

I encountered a Custodian nbands issue when running the Lobster workflow: issue

As long as there is enough number of basis functions, the number of empty bands should not be issue, so I do not think Custodian should settle in when NBANDS is automatically changed by VASP.

@shyuep
Copy link
Member

shyuep commented Nov 12, 2024

This was already fixed in the latest custodian release.

@yuan-gist
Copy link

Sorry, forgot to mention my Custodian version is 2024.10.16, which is the latest. It seems the issue is still there, though I have not try update_incar setting for this.

@JaGeo
Copy link
Member

JaGeo commented Nov 13, 2024

@shyuep thanks for the help! The new version has solved the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants