The recommended way to resolve failures is to 1) root-cause the failure by checking the logs, 2) determine the appropriate action, and 3) make changes to the CI pipeline if needed.
Logs are exposed via GCB (Google Cloud Build) and can be reached through the “Details” link of the public-pr GitHub CI status. If the results are unavailable for some reason, you can instead open the Cloud project and find the corresponding build there.
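When the “Details” link works but you want to pull the log from the command line, the project and build ID can be read off the link's URL. The sketch below assumes the console URL has the shape `https://console.cloud.google.com/cloud-build/builds[;region=R]/<BUILD_ID>?project=<PROJECT>`; that shape and the helper name `parse_gcb_details_url` are assumptions for illustration, not part of the CI pipeline.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_gcb_details_url(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (project, build_id) parsed from a Cloud Build console URL.

    Hypothetical helper. Assumed URL shape:
      https://console.cloud.google.com/cloud-build/builds[;region=R]/<BUILD_ID>?project=<PROJECT>
    Either element is None if it cannot be found.
    """
    parts = urlsplit(url)
    # parse_qs maps each query key to a list of values.
    project = parse_qs(parts.query).get("project", [None])[0]
    segments = [s for s in parts.path.split("/") if s]
    build_id = None
    for i, seg in enumerate(segments):
        # Regional builds may carry a ";region=..." suffix on this segment.
        if seg.split(";")[0] == "builds" and i + 1 < len(segments):
            build_id = segments[i + 1]
            break
    return project, build_id


if __name__ == "__main__":
    example = ("https://console.cloud.google.com/cloud-build/builds/"
               "1234-abcd?project=my-ci-project")
    print(parse_gcb_details_url(example))  # ('my-ci-project', '1234-abcd')
```

With the project and build ID in hand, `gcloud builds log BUILD_ID --project=PROJECT` prints that build's log.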
When checking the logs, it helps to keep in mind the CI steps outlined here, so that you can tell where in the CI pipeline the error occurred.
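Mapping an error back to a CI step can be partly automated by scanning the build log for the first step whose output contains an error marker. This is a minimal sketch assuming that GCB prefixes each output line with `Step #N - "id":` (or `Step #N:` when the step has no id) and that failures surface as lines containing `ERROR` or `FAIL`; both the prefix format and the markers are assumptions to adjust against a real log, and `first_failing_step` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed GCB log prefix: 'Step #3 - "lint": ...' (step with an id)
# or 'Step #3: ...' (step without an id).
STEP_PREFIX = re.compile(r'^Step #(\d+)(?: - "([^"]*)")?:')


def first_failing_step(log_text: str,
                       markers: Tuple[str, ...] = ("ERROR", "FAIL")) -> Optional[str]:
    """Return a label for the first step whose output contains a marker.

    Lines without a step prefix (for example GCB's own summary lines) are
    skipped. Marker matching is case-sensitive. Returns None if no step matches.
    """
    for line in log_text.splitlines():
        m = STEP_PREFIX.match(line)
        if not m:
            continue
        body = line[m.end():]
        if any(marker in body for marker in markers):
            num, step_id = m.group(1), m.group(2)
            return f"Step #{num} ({step_id})" if step_id else f"Step #{num}"
    return None
```

The label it returns can then be matched against the CI steps to decide whether the failure is in a validator, in the pipeline's own setup, or in an external dependency.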
While incidents can be caused by bugs in the CI pipeline’s implementation, GitHub or GCB can also be the culprit. For example, if too many GitHub API calls are made (because CI is running on too many PRs concurrently), some statuses might remain in the “pending” state; this only requires a re-run. GCB might also introduce a backward-incompatible change that requires updating the pipeline. In the past, GitHub has changed its SSH host key pair, which required updating the known_hosts file that GCB needs in order to run.
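When a host key change is suspected, one way to confirm it is to compute the SHA256 fingerprints of the github.com entries in the pipeline's known_hosts file and compare them with the fingerprints GitHub publishes in its documentation. The sketch below uses the standard OpenSSH fingerprint format (unpadded base64 of the SHA256 digest of the key blob, prefixed with `SHA256:`), which is what `ssh-keygen -l -f known_hosts` prints. The helper name `host_fingerprints` is hypothetical, and the sketch only handles plain (unhashed) host entries.

```python
import base64
import hashlib
from typing import Dict, List


def host_fingerprints(known_hosts_text: str,
                      host: str = "github.com") -> Dict[str, List[str]]:
    """Map key type -> SHA256 fingerprints for plain known_hosts entries of host."""
    result: Dict[str, List[str]] = {}
    for raw in known_hosts_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and @cert-authority / @revoked marker lines.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        hosts, key_type, key_b64 = fields[0], fields[1], fields[2]
        # Hashed "|1|..." entries never match this plain comparison.
        if host not in hosts.split(","):
            continue
        digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
        fingerprint = "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
        result.setdefault(key_type, []).append(fingerprint)
    return result
```

If a fingerprint returned here no longer appears in GitHub's published list, the known_hosts file used by GCB needs to be refreshed.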
If you determine that the CI pipeline needs a change, identify which step needs to change and open a PR with the fix. You can test the fix by opening a test PR in openconfig/public that changes the branch used when pulling the CI pipeline code to your fix branch. Once the fix is approved, cut a new release of the CI pipeline following these guidelines. Lastly, if needed, submit a PR in the openconfig/public repo, for example to bump the CI pipeline version it uses.
Example 1: Making a fix
Example 2: Adding a feature to the CI pipeline
Example 3: Adding a validator to the CI pipeline
Example 4: Building the CI image at a regular frequency to avoid stale dependencies