Chief of Monitors #669

Closed
usercont-release-bot opened this issue Nov 1, 2024 · 0 comments

Sentry

  • On Monday, check all alerts from the past weekend
  • Fix recurring alerts
    • Deploy fixes to production for issues that cause major disruption or complete downtime
    • Verify no alerts are being triggered anymore
  • Bring more complicated issues to the team so they can be prioritized correctly
  • Clean up all past alerts so the dashboard stays easy to navigate
  • Link GitHub and Sentry issues

Grafana

React to alerts arriving by email: check the SLO monitoring page (Packit section) and reply to the email so others know what is happening. Suggest updates to the alert thresholds if needed.

Watch our other two Grafana dashboards as well:

SLO1 issues investigation

We are investigating SLO1 issues. They could be related to short-running tasks taking more than half a minute to complete.
When looking at the Celery monitoring dashboard, pay attention to short-running tasks and how long they took to complete.
For the moment, we can report misbehavior here.
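
The check described above can be sketched in a few lines of Python. This is only an illustration, not part of the Packit tooling: the `TaskRun` record, the task names, and the sample data are all hypothetical, and the 30-second threshold is simply "half a minute" from the text. It assumes the task runtimes have already been copied out of the Celery monitoring dashboard by some means.

```python
from dataclasses import dataclass

# Threshold taken from the text: "more than half a minute".
SLOW_THRESHOLD_SECONDS = 30.0


@dataclass
class TaskRun:
    """One finished task run (hypothetical record shape, not a Celery type)."""
    name: str
    runtime_seconds: float


def find_slow_short_tasks(runs, short_task_names, threshold=SLOW_THRESHOLD_SECONDS):
    """Return runs of short-running tasks that exceeded the threshold, slowest first."""
    slow = [
        run for run in runs
        if run.name in short_task_names and run.runtime_seconds > threshold
    ]
    return sorted(slow, key=lambda run: run.runtime_seconds, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_runs = [
        TaskRun("example.short_task", 2.5),
        TaskRun("example.short_task", 41.0),
        TaskRun("example.long_task", 300.0),
        TaskRun("example.short_task", 95.2),
    ]
    for run in find_slow_short_tasks(sample_runs, {"example.short_task"}):
        print(f"{run.name} took {run.runtime_seconds:.1f}s")
```

Only tasks listed as short-running are checked, so a task that is expected to run for minutes does not show up as a false positive.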

CI/Zuul

Throughout the week, you are responsible for keeping the CI green, that is, for spotting systematic CI failures and driving their resolution.

A CI system can have an outage. For problems related to Zuul, please reach out to the team in the #sf-ops channel on matrix.org or in the #rhos-ops Slack channel.

pre-commit-ci

Once the pre-commit-ci user creates updates to our pre-commit configs, take care of the pull requests:

OpenShift

If you think there's something wrong with the OpenShift instance we're running in:

  • Automotive cluster - ask in packit-auto-shared-infra in the internal Google Chat or email [email protected]
  • Managed Platform Plus - ask in #help-it-cloud-openshift in internal Slack