Switch base image to UBI for AMD rocm install #620

Merged: 1 commit into opendatahub-io:main on Jul 23, 2024

@harshad16 (Member) commented Jul 15, 2024

Description

Switch base image to UBI for AMD rocm install
Related-to: https://issues.redhat.com/browse/RHOAIENG-7501

How Has This Been Tested?

1. Build the base image: `podman build -t amd-base .`
2. Check that the necessary packages are installed in `amd-base` with `rpm -qa`.
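Step 2 can be scripted so the check is repeatable. The following is a minimal sketch, not part of this PR: the helper name `check_packages` is hypothetical, and it assumes the packages of interest can be matched by name patterns such as `^rocm` (the actual package list the image needs is not stated here).

```shell
# Hypothetical helper (not part of this PR): reads an `rpm -qa` listing on
# stdin and checks that each extended-regex pattern given as an argument
# matches at least one installed package name (case-insensitive).
# Prints one line per pattern; returns 0 if all matched, 1 otherwise.
check_packages() {
  local listing pattern missing=0
  listing="$(cat)"  # capture the whole package list once
  for pattern in "$@"; do
    if grep -qiE -- "$pattern" <<<"$listing"; then
      echo "found:   $pattern"
    else
      echo "MISSING: $pattern" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example usage, assuming the image from step 1 exists locally and that the ROCm packages carry a `rocm` name prefix: `podman run --rm --entrypoint rpm amd-base -qa | check_packages '^rocm'`. Overriding the entrypoint keeps the command working even if the image defines one.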

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

openshift-ci bot (Contributor) commented Jul 15, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jiridanek added the trivy-scan label (allows Trivy to create a security report on the pull request) Jul 15, 2024
@harshad16 marked this pull request as ready for review July 16, 2024 15:35
@openshift-ci bot requested review from atheo89 and jstourac July 16, 2024 15:35
@jstourac (Member) commented:

Looks like we need to update the prow configuration then too:

@jiridanek (Member) commented:

Also, the Makefile should be updated in this PR to reference the ubi9- directories instead of the c9s- ones.

@atheo89 (Member) left a comment

Great transition to UBI 🙂
I added some minor suggestions; overall it looks good to me.

I was able to build this image; you can find it here:
quay.io/rh_ee_atheodor/workbench-images:amd-ubi9-python-3.9-2024a_20240717

amd/ubi9-python-3.9/Dockerfile (outdated, resolved)
amd/ubi9-python-3.9/Dockerfile (outdated, resolved)
@openshift-ci bot removed the lgtm label Jul 17, 2024
@harshad16 changed the title from "Switch the base image to UBI, use centos stream for CRB" to "Switch base image to UBI for AMD rocm install" Jul 17, 2024
@jiridanek (Member) commented:

We're still running out of disk space on one of the images, as @caponetto observed yesterday.

@jiridanek (Member) commented:

ci/prow/notebook-amd-c9s-python-3-9-pr-image-mirror: Job failed. BaseSHA: fecd10c13ce13d66b57498abc48976f74121dd63

So the openshift-ci configuration needs to be updated.

@jiridanek (Member) commented:

@jstourac so, now that we are approvers, approving in the GitHub UI adds BOTH the lgtm and approved labels. Gotta be careful.

Makefile (outdated, resolved)
@caponetto (Contributor) commented:

> We're still running out of disk space on one of the images, as @caponetto observed yesterday.

At first, I thought that was happening only during the Trivy scan, because it has to copy data around. But then I saw that CI is running out of disk space even during the build step for amd-jupyter-pytorch-ubi9-python-3.9 (examples are this PR and some builds triggered after a merge). Since amd-jupyter-pytorch-ubi9-python-3.9 is ~60 GB uncompressed, the CI is operating at its limit. If we ever need to add new things to this image, we'll probably hit these storage issues more often.

@caponetto (Contributor) commented:

Apparently, other people are also concerned about the size of ROCm + PyTorch images (see ROCm/ROCm-docker#120).

@atheo89 (Member) commented Jul 19, 2024

To fix the CI, please check PR #627; it explains how to update the notebook matrix.

- switch to rocm naming convention
- adjust the makefile with ubi changes

Signed-off-by: Harshad Reddy Nalla <[email protected]>
openshift-ci bot (Contributor) commented Jul 22, 2024

@harshad16: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/notebooks-ubi8-e2e-tests | b8a1d3f | link | true | /test notebooks-ubi8-e2e-tests |
| ci/prow/notebook-amd-c9s-python-3-9-pr-image-mirror | 3cfe039 | link | true | /test notebook-amd-c9s-python-3-9-pr-image-mirror |
| ci/prow/notebook-amd-jupyter-minimal-c9s-python-3-9-pr-image-mirror | 3cfe039 | link | true | /test notebook-amd-jupyter-minimal-c9s-python-3-9-pr-image-mirror |
| ci/prow/amd-runtimes-ubi9-e2e-tests | 3cfe039 | link | true | /test amd-runtimes-ubi9-e2e-tests |
| ci/prow/runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror | 3cfe039 | link | true | /test runtime-rocm-pytorch-ubi9-python-3-9-pr-image-mirror |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@atheo89 (Member) commented Jul 23, 2024

/lgtm
/approve

/override ci/prow/images
/override /ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
/override ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
/override ci/prow/rocm-notebooks-e2e-tests

@openshift-ci bot added the lgtm label Jul 23, 2024
openshift-ci bot (Contributor) commented Jul 23, 2024

@atheo89: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • /ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror

Only the following failed contexts/checkruns were expected:

  • build (rocm-jupyter-pytorch-ubi9-python-3.9) / build
  • ci/prow/images
  • ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
  • ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
  • ci/prow/notebook-rocm-ubi9-python-3-9-pr-image-mirror
  • ci/prow/rocm-notebooks-e2e-tests
  • pull-ci-opendatahub-io-notebooks-2023a-images
  • pull-ci-opendatahub-io-notebooks-main-notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
  • pull-ci-opendatahub-io-notebooks-main-notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
  • pull-ci-opendatahub-io-notebooks-main-notebook-rocm-ubi9-python-3-9-pr-image-mirror
  • pull-ci-opendatahub-io-notebooks-main-rocm-notebooks-e2e-tests
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

openshift-ci bot (Contributor) commented Jul 23, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atheo89, jiridanek, jstourac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [atheo89,jiridanek,jstourac]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek (Member) commented:

let me try

/override "build (rocm-jupyter-pytorch-ubi9-python-3.9) / build"
/override ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
/override ci/prow/rocm-notebooks-e2e-tests
/override ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
/override ci/prow/images

openshift-ci bot (Contributor) commented Jul 23, 2024

@jiridanek: Overrode contexts on behalf of jiridanek: build (rocm-jupyter-pytorch-ubi9-python-3.9) / build, ci/prow/images, ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror, ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror, ci/prow/rocm-notebooks-e2e-tests

@jiridanek (Member) commented:

/override "build (rocm-jupyter-pytorch-ubi9-python-3.9) / build"

openshift-ci bot (Contributor) commented Jul 23, 2024

@jiridanek: /override requires failed status contexts, check run or a prowjob name to operate on.
The following unknown contexts/checkruns were given:

  • build (rocm-jupyter-pytorch-ubi9-python-3.9) / build

Only the following failed contexts/checkruns were expected:

  • ci/prow/images
  • ci/prow/notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
  • ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
  • ci/prow/notebook-rocm-ubi9-python-3-9-pr-image-mirror
  • ci/prow/rocm-notebooks-e2e-tests
  • pull-ci-opendatahub-io-notebooks-2023a-images
  • pull-ci-opendatahub-io-notebooks-main-notebook-rocm-jupyter-pyt-ubi9-python-3-9-pr-image-mirror
  • pull-ci-opendatahub-io-notebooks-main-notebook-rocm-jupyter-tf-ubi9-python-3-9-pr-image-mirror
  • pull-ci-opendatahub-io-notebooks-main-notebook-rocm-ubi9-python-3-9-pr-image-mirror
  • pull-ci-opendatahub-io-notebooks-main-rocm-notebooks-e2e-tests
  • tide

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

@openshift-merge-bot bot merged commit bc3b27e into opendatahub-io:main Jul 23, 2024
12 of 13 checks passed
jiridanek added a commit to jiridanek/notebooks that referenced this pull request Jul 23, 2024
Labels: approved, lgtm, trivy-scan
5 participants