Adjustments for expired dot token Jenkins jobs #731

Open
3 tasks
robrap opened this issue Jul 17, 2024 · 1 comment

@robrap
Contributor

robrap commented Jul 17, 2024

The job https://tools-edx-jenkins.edx.org/job/oauth/job/stage-edx-delete_expired_dot_tokens/ failed, and I'm guessing it will pass on the next run, which is currently configured for a day later.

Here are some adjustments we'd like to make:

  • Adjust the 3 oauth Jenkins jobs so arch-bom has permissions to run the job (to clean up after a failure). (SRE request?)
  • Adjust in OpsGenie to only report on the 3rd failure, so it has a chance to re-run and fix itself.
    • Adjust the stage job to run hourly during the working day.
      • Note that the 3rd failure will alert at whatever time it occurs, which is why the runs should be scheduled during working hours.
    • Do we want to configure the 3rd failure alerting for the prod jobs as well? If so, should they only be during working hours?
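
The "only report on the 3rd failure" idea above can be sketched as follows. This is a minimal illustration, not OpsGenie or Jenkins configuration: the function names, the `"SUCCESS"`/`"FAILURE"` result strings, and the example schedule hours are all assumptions made for the example.

```python
from typing import Sequence

# Threshold taken from the proposal above: alert on the 3rd failure in a row.
FAILURE_THRESHOLD = 3

# Assumed Jenkins schedule for the stage job (the hours are a guess, not from the issue):
#   H 9-17 * * 1-5   -> once per hour, 09:00-17:59, Monday to Friday


def consecutive_failures(results: Sequence[str]) -> int:
    """Count the failures at the end of the run history (oldest run first)."""
    count = 0
    for result in reversed(results):
        if result != "FAILURE":
            break
        count += 1
    return count


def should_alert(results: Sequence[str], threshold: int = FAILURE_THRESHOLD) -> bool:
    """Return True only on the run where the failure streak reaches the threshold."""
    return consecutive_failures(results) == threshold
```

Comparing with `==` rather than `>=` means the alert fires once, on the 3rd consecutive failure, and a later successful run resets the streak. With hourly runs during the working day, that 3rd failure would land about two hours after the first one.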
robrap added the on-call label Jul 17, 2024
@jristau1984

jristau1984 commented Jul 22, 2024

From Grooming discussion:

  • Just have SRE grant Arch-BOM re-run permissions so they can rerun the job after a failure. This can become a smaller subtask specific to that request, using the escalate label to get it onto the SRE board.

Status: Backlog