Tango should skip unavailable backends when using distDocker #248

Open
20wildmanj opened this issue Feb 15, 2024 · 0 comments


From Chaskiel, re: an error that recently occurred on CMU prod:

The error was triggered by an operating system error: an LDAP lookup failed when Tango ssh'd to the Docker node. There has been only one such failure since the beginning of the semester.

ERROR|2024-02-08 10:43:41,556|TangoREST|addJob request failed: Command '['ssh', '-o', 'BatchMode=yes', '-i', '/usr/local/lib/Tango/vmms/id_rsa', '-o', 'StrictHostKeyChecking=no', '-o', 'GSSAPIAuthentication=no', '[email protected]', '(docker images)']' returned non-zero exit status 255.

For the devs:
The failure was in getImages, not in waitVM (which has retry logic) or in a later step (where retries are more complicated because you'd have to restart the process from the copyIn phase).
It may make sense for DistDocker.getImages (and maybe DistDocker.getVms) to skip backends that are unavailable.
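
A rough sketch of what "skip unavailable backends" could look like, in case it helps the discussion. This is not the current DistDocker code: `parse_docker_images`, `get_images_skipping_unavailable`, and the `list_images` callable are hypothetical names, and the sketch assumes each backend is queried with a single `subprocess` call that raises on a non-zero ssh exit status (as in the log above).

```python
import logging
import subprocess


def parse_docker_images(output):
    """Return repository names from plain `docker images` output.

    Assumes the default table format: one header row, then one row per
    image with the repository name in the first column.
    """
    rows = output.splitlines()[1:]  # drop the header row
    return {row.split()[0] for row in rows if row.strip()}


def get_images_skipping_unavailable(hosts, list_images, log=None):
    """Collect images from every reachable backend (hypothetical helper).

    `list_images(host)` is an assumed callable that returns a set of image
    names for one host and raises if the host cannot be queried.
    Returns (images, skipped_hosts) instead of failing the whole request.
    """
    log = log or logging.getLogger(__name__)
    images = set()
    skipped = []
    for host in hosts:
        try:
            images |= list_images(host)
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired,
                OSError) as err:
            # One bad backend (e.g. ssh exit status 255) should not fail
            # the whole lookup; record it and move on.
            log.warning("Skipping unavailable backend %s: %s", host, err)
            skipped.append(host)
    return images, skipped
```

Returning the skipped hosts alongside the images lets the caller decide whether an empty result means "no images" or "no reachable backends", which probably still deserves an error.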
