We want to automatically reboot all CRAB schedulers (schedds) at least once a year.
prerequisite
We should figure out how to be nice to HammerCloud on schedd restart if we want to automatically reboot our schedds: #7410
implementation
We had a brainstorming session in Udine, and so far the simplest approach we could think of is:
Move the list of enabled schedds that TW should use from GitLab to Puppet.
Every schedd in the list should have a new parameter alongside the current "enabled: 0/1": the desired date of the schedd reboot. Make sure that these dates do not overlap! If you want to be fancy, this could also be a list of dates, to accommodate multiple reboots per year.
TW reads the configuration and, if "(current time - reboot time) < 1 week", stops using that schedd.
Add a daily cronjob to the schedd. If "(current time - reboot time) < 2d", reboot the schedd (using 2 days with a daily cronjob should avoid negative results, which might be a problem in a bash script). The reboot procedure should be:
hold all the running dagmans, specifying "schedd reboot" as hold reason
condor_off
reboot
(condor_on should be automatic)
Add a systemd service unit, a systemd timer, an hourly cronjob, or whatever mechanism you prefer, that releases all the held dagmans whose hold reason matches "schedd reboot".
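
The date handling on the TW side could be sketched roughly as below. This is only an illustration under several assumptions that are not part of the proposal: the schedd names, the dates, the `reboot_dates` key and all function names are hypothetical; the one-week rule is read as "the reboot is less than one week in the future"; and "no overlap" is read as "reboot dates of different schedds are at least one week apart".

```python
from datetime import datetime, timedelta
from itertools import combinations

# Assumed reading of the "1 week" rule: a schedd is drained during the week
# that precedes its scheduled reboot.
DRAIN_WINDOW = timedelta(weeks=1)

# Hypothetical configuration; schedd names and dates are placeholders.
SCHEDDS = {
    "schedd-a": {"enabled": 1,
                 "reboot_dates": [datetime(2025, 3, 3), datetime(2025, 9, 1)]},
    "schedd-b": {"enabled": 1, "reboot_dates": [datetime(2025, 6, 2)]},
    "schedd-c": {"enabled": 0, "reboot_dates": [datetime(2025, 11, 3)]},
}


def find_overlaps(schedds, window=DRAIN_WINDOW):
    """Return pairs of (schedd, date) entries from different schedds
    whose reboot dates are closer together than `window`."""
    entries = [(name, date)
               for name, cfg in schedds.items()
               for date in cfg["reboot_dates"]]
    return [(a, b) for a, b in combinations(entries, 2)
            if a[0] != b[0] and abs(a[1] - b[1]) < window]


def is_draining(now, reboot_dates, window=DRAIN_WINDOW):
    """True when one of the reboot dates lies in the future,
    less than `window` away from `now`."""
    return any(timedelta(0) <= date - now < window for date in reboot_dates)


def usable_schedds(schedds, now):
    """Names of the schedds that are enabled and not about to be rebooted."""
    return [name for name, cfg in schedds.items()
            if cfg["enabled"] == 1
            and not is_draining(now, cfg["reboot_dates"])]
```

With this placeholder configuration, `find_overlaps(SCHEDDS)` returns an empty list, and `usable_schedds(SCHEDDS, datetime(2025, 2, 27))` leaves out `schedd-a` because its reboot is four days away. The same `is_draining` helper, called with a two-day window, could serve as the check for the daily cronjob on the schedd.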