Split up the notebooks-e2e-tests into separate groups #53010

Closed
wants to merge 1 commit into from


atheo89
Contributor

atheo89 commented Jun 10, 2024

Related to: https://issues.redhat.com/browse/RHOAIENG-8399

This PR aims to improve testing efficiency and reduce testing time.

To achieve that, we had to change the ocp-ci configuration slightly.

Currently, all notebook tests are grouped under a single test suite called "notebooks-e2e-tests". This approach has several drawbacks:

  • Execution takes a long time.
  • A single failing notebook test requires rerunning the entire suite.
  • Tests often fail due to timeouts and resource shortages.
  • Notebooks unaffected by a change are tested anyway, because there is no selective testing.

To address these issues, we propose breaking the unified test suite into separate tests, each triggered only when relevant files change, using the run_if_changed: (regex) option. This reduces testing time and makes problematic notebooks easier to spot.
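A minimal sketch of the triggering rule, assuming Prow's behaviour that a presubmit with run_if_changed runs when at least one changed file path matches the regex anywhere in the path (an unanchored search, not a full match). Prow itself is written in Go; Python's `re.search` behaves the same way for simple patterns like these. The helper `should_run` and the example paths are hypothetical; the pattern is the codeserver one from this PR.

```python
import re

# run_if_changed pattern for codeserver-notebook-e2e-tests, copied from this PR description.
CODESERVER_PATTERN = r"(base/ubi9-python-3.9/)|(codeserver/ubi9-python-3.9/)"


def should_run(pattern: str, changed_files: list[str]) -> bool:
    """Return True if any changed path matches the pattern anywhere (unanchored search)."""
    regex = re.compile(pattern)
    return any(regex.search(path) for path in changed_files)


# Hypothetical changed-file lists for illustration.
print(should_run(CODESERVER_PATTERN, ["codeserver/ubi9-python-3.9/Dockerfile"]))  # True
print(should_run(CODESERVER_PATTERN, ["rstudio/c9s-python-3.9/Dockerfile"]))  # False
print(should_run(CODESERVER_PATTERN, ["README.md", "base/ubi9-python-3.9/Pipfile"]))  # True
```

A single matching file is enough, so a PR that touches one image directory plus unrelated files still triggers the job.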

The notebooks-e2e-tests suite is therefore split into the following jobs:

  1. notebooks-ubi8-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/ubi8-python-3.8/)|(jupyter/minimal/ubi8-python-3.8/)|(jupyter/datascience/ubi8-python-3.8/)|(jupyter/pytorch/ubi8-python-3.8/)|(jupyter/tensorflow/ubi8-python-3.8/)|(jupyter/trustyai/ubi8-python-3.8/)

  2. anaconda-ubi8-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/anaconda-python-3.8/)|(jupyter/datascience/anaconda-python-3.8/)

  3. notebooks-ubi9-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/ubi9-python-3.9/)|(jupyter/minimal/ubi9-python-3.9/)|(jupyter/datascience/ubi9-python-3.9/)|(jupyter/pytorch/ubi9-python-3.9/)|(jupyter/tensorflow/ubi9-python-3.9/)|(jupyter/trustyai/ubi9-python-3.9/)

  4. codeserver-notebook-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/ubi9-python-3.9/)|(codeserver/ubi9-python-3.9/)

  5. rstudio-notebook-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/c9s-python-3.9/)|(rstudio/c9s-python-3.9/)

  6. runtimes-ubi8-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/ubi8-python-3.8/)|(runtimes/datascience/ubi8-python-3.8/)|(runtimes/pytorch/ubi8-python-3.8/)|(runtimes/tensorlow/ubi8-python-3.8/)

  7. runtimes-ubi9-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (base/ubi9-python-3.9/)|(runtimes/datascience/ubi9-python-3.9/)|(runtimes/pytorch/ubi9-python-3.9/)|(runtimes/tensorlow/ubi9-python-3.9/)

  8. intel-notebooks-e2e-tests
    Triggered when changes occur in the following repository folders:
    run_if_changed: (intel/base/gpu/ubi9-python-3.9/)|(jupyter/intel/pytorch/ubi9-python-3.9/)|(intel/runtimes/tensorflow/ubi9-python-3.9/*)

PS: The habana-notebooks-e2e-tests job was already separate.
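To see how the eight patterns listed above interact, here is a small sketch that maps a list of changed file paths to the jobs it would trigger. It assumes Prow's behaviour that a job runs when any changed path matches its run_if_changed regex (unanchored search). The patterns are copied verbatim from the list (including the `tensorlow` spelling in the two runtimes entries), the job names follow the rehearsal notifier's list of affected presubmits, and `triggered_jobs` plus the example paths are hypothetical.

```python
import re

# run_if_changed patterns copied verbatim from the PR description (including the
# "tensorlow" spelling in the two runtimes entries).
RUN_IF_CHANGED = {
    "notebooks-ubi8-e2e-tests": (
        r"(base/ubi8-python-3.8/)|(jupyter/minimal/ubi8-python-3.8/)"
        r"|(jupyter/datascience/ubi8-python-3.8/)|(jupyter/pytorch/ubi8-python-3.8/)"
        r"|(jupyter/tensorflow/ubi8-python-3.8/)|(jupyter/trustyai/ubi8-python-3.8/)"
    ),
    "anaconda-ubi8-e2e-tests": (
        r"(base/anaconda-python-3.8/)|(jupyter/datascience/anaconda-python-3.8/)"
    ),
    "notebooks-ubi9-e2e-tests": (
        r"(base/ubi9-python-3.9/)|(jupyter/minimal/ubi9-python-3.9/)"
        r"|(jupyter/datascience/ubi9-python-3.9/)|(jupyter/pytorch/ubi9-python-3.9/)"
        r"|(jupyter/tensorflow/ubi9-python-3.9/)|(jupyter/trustyai/ubi9-python-3.9/)"
    ),
    "codeserver-notebook-e2e-tests": r"(base/ubi9-python-3.9/)|(codeserver/ubi9-python-3.9/)",
    "rstudio-notebook-e2e-tests": r"(base/c9s-python-3.9/)|(rstudio/c9s-python-3.9/)",
    "runtimes-ubi8-e2e-tests": (
        r"(base/ubi8-python-3.8/)|(runtimes/datascience/ubi8-python-3.8/)"
        r"|(runtimes/pytorch/ubi8-python-3.8/)|(runtimes/tensorlow/ubi8-python-3.8/)"
    ),
    "runtimes-ubi9-e2e-tests": (
        r"(base/ubi9-python-3.9/)|(runtimes/datascience/ubi9-python-3.9/)"
        r"|(runtimes/pytorch/ubi9-python-3.9/)|(runtimes/tensorlow/ubi9-python-3.9/)"
    ),
    "intel-notebooks-e2e-tests": (
        r"(intel/base/gpu/ubi9-python-3.9/)|(jupyter/intel/pytorch/ubi9-python-3.9/)"
        r"|(intel/runtimes/tensorflow/ubi9-python-3.9/*)"
    ),
}


def triggered_jobs(changed_files: list[str]) -> list[str]:
    """Return, sorted, the jobs whose pattern matches at least one changed path."""
    return sorted(
        job
        for job, pattern in RUN_IF_CHANGED.items()
        if any(re.search(pattern, path) for path in changed_files)
    )


# Hypothetical changed-file lists.
print(triggered_jobs(["base/ubi9-python-3.9/Dockerfile"]))
# ['codeserver-notebook-e2e-tests', 'notebooks-ubi9-e2e-tests', 'runtimes-ubi9-e2e-tests']
print(triggered_jobs(["rstudio/c9s-python-3.9/Dockerfile"]))
# ['rstudio-notebook-e2e-tests']
print(triggered_jobs(["runtimes/tensorflow/ubi8-python-3.8/Pipfile"]))
# []
```

Because `base/ubi9-python-3.9/` appears in three patterns, a change to that base image triggers three jobs. The last call shows a consequence of the patterns as quoted: since the runtimes entries spell `tensorlow`, a change confined to `runtimes/tensorflow/ubi8-python-3.8/` would trigger none of these jobs, if the merged config uses the same spelling.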

@atheo89
Contributor Author

atheo89 commented Jun 10, 2024

/pj-rehearse more

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jun 10, 2024
@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

/pj-rehearse ci/rehearse/opendatahub-io/notebooks/main/notebooks-ubi9-e2e-tests

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@atheo89: job(s): ci/rehearse/opendatahub-io/notebooks/main/notebooks-ubi9-e2e-tests either don't exist or were not found to be affected, and cannot be rehearsed

@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

/pj-rehearse ci/rehearse/opendatahub-io/notebooks/main/codeserver-notebook-e2e-tests

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@atheo89: job(s): ci/rehearse/opendatahub-io/notebooks/main/codeserver-notebook-e2e-tests either don't exist or were not found to be affected, and cannot be rehearsed

@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

/pj-rehearse more

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@jiridanek
Member

jiridanek commented Jun 11, 2024

Looks legit to me. I am not at all familiar with OCP-CI YAMLs, so I'd want to see this running before I'd have the confidence to lgtm it.

I am slightly concerned that the jobs end up touching images they are not testing. For example, INFO[2024-06-11T09:21:02Z] Building amd-c9s-python-3.9 appeared in logs of rehearse-53010-pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests #1800452668397719552 at https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/53010/rehearse-53010-pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests/1800452668397719552. Does this mean that since each e2e subset builds all images, the images get built repeatedly? While this is indeed wasteful, I consider this PR an overall improvement on the current situation, so I'm still in favour despite this.

@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

OK, 3 out of 8 rehearsal jobs failed, a 62.5% success rate. They might pass on a retest.
The failures were caused either by infrastructure limitations or by an image build failing in the first step.
However, all the e2e tests would run together only on a major release update PR; most PRs would trigger only some of the e2e subsets.

Does this mean that since each e2e subset builds all images, the images get built repeatedly? While this is indeed wasteful, I consider this PR an overall improvement on the current situation, so I'm still in favour despite this.

@jiridanek Indeed. However, the ocp-ci documentation doesn't describe any way to filter image builds, so I think we cannot avoid it :/

@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

/pj-rehearse pull-ci-opendatahub-io-notebooks-main-notebooks-ubi8-e2e-tests
/pj-rehearse pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests
/pj-rehearse pull-ci-opendatahub-io-notebooks-main-rstudio-notebook-e2e-tests

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@atheo89: requesting more than one rehearsal in one comment is not supported. If you would like to rehearse multiple specific jobs, please separate the job names by a space in a single command.


@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

/pj-rehearse pull-ci-opendatahub-io-notebooks-main-notebooks-ubi8-e2e-tests

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@atheo89
Contributor Author

atheo89 commented Jun 11, 2024

/pj-rehearse pull-ci-opendatahub-io-notebooks-main-rstudio-notebook-e2e-tests

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

openshift-ci bot commented Jun 11, 2024

@atheo89: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/rehearse/opendatahub-io/notebooks/main/intel-notebooks-e2e-tests | 8215b9b | link | unknown | /pj-rehearse pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@atheo89
Contributor Author

atheo89 commented Jun 12, 2024

/pj-rehearse pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests

@openshift-ci-robot
Contributor

@atheo89: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@atheo89, pj-rehearse: unable to determine affected jobs ERROR:

could not load configuration from base revision of release repo: could not checkout worktree: '[git checkout f74bd26fdb04b87225558789148d313150bed522]' failed with out:  and error exec: Stdout already set

If the problem persists, please contact Test Platform.

@jiridanek
Member

jiridanek commented Jun 13, 2024

/pj-rehearse pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests

That's currently broken, so it won't pass: https://issues.redhat.com/browse/RHOAIENG-8388

@atheo89
Contributor Author

atheo89 commented Jun 14, 2024

Upstream tracking issue: opendatahub-io/notebooks#562

Contributor

openshift-ci bot commented Jun 14, 2024

@jiridanek: changing LGTM is restricted to collaborators


@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@atheo89: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name | Repo | Type | Reason
pull-ci-opendatahub-io-notebooks-main-anaconda-ubi8-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-codeserver-notebook-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-intel-notebooks-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-notebooks-ubi8-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-notebooks-ubi9-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-rstudio-notebook-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-runtimes-ubi8-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
pull-ci-opendatahub-io-notebooks-main-runtimes-ubi9-e2e-tests | opendatahub-io/notebooks | presubmit | Presubmit changed
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

Contributor

openshift-ci bot commented Jun 20, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: atheo89, jiridanek
Once this PR has been reviewed and has the lgtm label, please assign davidvossel for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot removed the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Jun 20, 2024
Member


Isn't changing this file a mistake?

Contributor Author


I didn't change it; maybe something went wrong during the rebase...

Contributor Author


Maybe I will open a separate PR and close this one.

Contributor Author


Here it is #53476

@atheo89
Contributor Author

atheo89 commented Jun 21, 2024

Closing this in favor of #53476.

atheo89 closed this Jun 21, 2024