Split edpm into nodeset and deployment stages #197

Merged
merged 1 commit into openstack-k8s-operators:main from edpm_stages on Apr 29, 2024

Conversation


@fao89 fao89 commented Apr 23, 2024

No description provided.

@fao89 fao89 force-pushed the edpm_stages branch 2 times, most recently from 7194363 to 23641e6 Compare April 23, 2024 16:02
@fao89 fao89 marked this pull request as ready for review April 23, 2024 16:02
Contributor
I think we'll also have to modify this file to use split data plane

Contributor Author
Apparently, it is fine, but how can I test this?

Contributor

I guess I'm confused, because I thought we wanted to break the dataplane stages into two stages, such that OpenStackDataPlaneNodeSet can reach Ready state in the first stage and then OpenStackDataPlaneDeployment reaches Ready in the second stage. We check for these via the separate stages' wait_conditions, i.e. `oc wait osdpns openstack-edpm -n openstack --for condition=Ready --timeout=1200s` and then `oc wait osdpd edpm-deployment -n openstack --for condition=Ready --timeout=1200s`. If we don't separate these into two different stages, won't we still be applying OpenStackDataPlaneNodeSet and OpenStackDataPlaneDeployment at the same time?
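For reference, a minimal sketch of the intended two-stage sequence (the manifest file names `dataplane-nodeset.yaml` and `dataplane-deployment.yaml` are hypothetical; the wait commands are the ones discussed above and need a live cluster):

```shell
# Stage 1: create only the OpenStackDataPlaneNodeSet CR and wait for Ready
oc apply -f dataplane-nodeset.yaml
oc wait osdpns openstack-edpm -n openstack --for condition=Ready --timeout=1200s

# Stage 2: only after the nodeset is Ready, create the
# OpenStackDataPlaneDeployment CR and wait for it separately
oc apply -f dataplane-deployment.yaml
oc wait osdpd edpm-deployment -n openstack --for condition=Ready --timeout=1200s
```

Applying both CRs in a single stage would skip the intermediate wait, which is exactly the situation this PR is meant to avoid.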

Contributor Author

yes, you are right!


abays commented Apr 23, 2024

We'll also need to update the various example data plane .md files to reflect the new approach

@fao89 fao89 force-pushed the edpm_stages branch 4 times, most recently from 7e77fad to ffb6e43 Compare April 24, 2024 11:22
@fao89 fao89 requested a review from fultonj April 24, 2024 12:14
```
oc wait osdpns openstack-edpm --for condition=SetupReady --timeout=600s
```
Create the nodeset CR
Contributor

s/nodeset/deployment

Contributor

@abays abays Apr 24, 2024

I think it would be better to rewrite this using the Component pattern like va/hci/edpm-pre-ceph/kustomization.yaml, so that we're not duplicating the replacements logic here and in va/hci/edpm-post-ceph/deployment/kustomization.yaml and va/hci/edpm-post-ceph/nodeset/kustomization.yaml. I know there's also a components include here of - ../../../lib/control-plane, but perhaps it would be okay to place it in va/hci/edpm-post-ceph/nodeset/kustomization.yaml? Or maybe we need a separate stage altogether, something like va/hci/control-plane-post-ceph, to house the control plane update (and thus the EDPM post-Ceph stage would really become three stages: control-plane, nodeset and deployment)?
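To illustrate the Component pattern being suggested, a sketch along these lines might work (the `lib/dataplane-nodeset` name, the `replacements.yaml` file, and the exact path depths are all illustrative, not taken from the repo):

```yaml
# lib/dataplane-nodeset/kustomization.yaml -- hypothetical shared Component
# that holds the replacements logic exactly once
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
replacements:
  - path: replacements.yaml   # illustrative file carrying the shared replacements
---
# va/hci/edpm-post-ceph/nodeset/kustomization.yaml -- consumer sketch
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
components:
  - ../../../../lib/dataplane-nodeset   # shared replacements, not duplicated
  - ../../../../lib/control-plane       # existing include mentioned above
```

With this layout, both the nodeset and deployment kustomizations would pull the same Component instead of each carrying its own copy of the replacements block.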

Contributor Author

The duplicated replacements were a leftover from my previous attempt.

@fao89 fao89 force-pushed the edpm_stages branch 2 times, most recently from de4eb31 to 73aa700 Compare April 24, 2024 14:30
@cjeanner
Contributor

From my side, I'm ready. I want to get some reviews from the team, but openstack-k8s-operators/ci-framework#1547 works.

Since we'll break current downstream jobs if they inject any hook or kustomization (they usually target stage_2, which would now be stage_3), I stacked a second patch in the PR to allow using a "better" name, built from the build_output.

Contributor

@fultonj fultonj left a comment

+2

Overall I'm fine with this change.

Before we merge, though, someone (because we don't yet have fully automated gating) should ensure that a full ci-framework reproducer run with the updated automation/vars/default.yaml completes successfully. Testing va1 and uni01alpha seems prudent. Let's make sure someone from ci-team agrees before merging.

With VA1 we usually get:

```
├── dataplane-post-ceph.yaml
├── dataplane-pre-ceph.yaml
```

After this PR merges we'll get:

```
├── dataplane-deployment-pre-ceph.yaml
├── dataplane-nodeset-pre-ceph.yaml
├── deployment-post-ceph.yaml
└── nodeset-post-ceph.yaml
```

I built both and then compared the new pre ceph from this branch against the old:

```
cd new
(cat dataplane-deployment-pre-ceph.yaml ; echo "---"; cat dataplane-nodeset-pre-ceph.yaml) > pre.yaml
cd ..
diff -u old/dataplane-pre-ceph.yaml new/pre.yaml
```

It looked fine (i.e. no meaningful difference).

I compared the new post-ceph output from this branch against the main branch (old) too: the output of `diff -u old/dataplane-post-ceph.yaml new/nodeset-post-ceph.yaml` is the same as the content of deployment-post-ceph.yaml, which is how it should be.

Nit: I didn't like the name dataplane-post-ceph.yaml in VA1, since it also included an update to the control plane. Similarly, nodeset-post-ceph.yaml has a nodeset but also an update to the control plane CR, so I think calling it post-ceph.yaml would be better. We don't have to block this PR on that, however.

@psathyan
Contributor

uni01alpha testing results

```
PLAY RECAP *********************************************************************
compute-0                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
compute-1                  : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
localhost                  : ok=388  changed=120  unreachable=0    failed=0    skipped=141  rescued=1    ignored=0

Thursday 25 April 2024  01:29:40 -0400 (0:00:00.186)       0:46:44.479 ********
===============================================================================
kustomize_deploy : Run Wait Conditions for examples/dt/uni01alpha ------------------------------------ 768.64s
kustomize_deploy : Run Wait Conditions for examples/dt/uni01alpha/networker -------------------------- 471.60s
kustomize_deploy : Run Wait Conditions for examples/dt/uni01alpha/control-plane ---------------------- 450.02s
test_operator : Wait for the last job to be Completed - tempest -------------------------------------- 396.85s
os_must_gather : Run openstack-must-gather command --------------------------------------------------- 185.09s
openshift_obs : Wait for observability-operator pod --------------------------------------------------- 60.61s
ci_setup : Install needed packages -------------------------------------------------------------------- 54.03s
kustomize_deploy : Wait for cert-manager-operator pods ------------------------------------------------ 40.62s
kustomize_deploy : Wait for MetalLB speaker pods ------------------------------------------------------ 36.04s
test_operator : Wait until the test-operator csv is present ------------------------------------------- 33.56s
test_operator : Wait for the test-operator csv to Succeed --------------------------------------------- 32.11s
openshift_setup : Wait for openshift-marketplace pods to be running ----------------------------------- 31.83s
kustomize_deploy : Wait for certmanager pods ---------------------------------------------------------- 30.93s
kustomize_deploy : Wait for NMstate webhook deployment ------------------------------------------------ 20.75s
kustomize_deploy : Wait for NMstate handler pods ------------------------------------------------------ 20.66s
artifacts : Ensure we have proper rights on the gathered content -------------------------------------- 20.00s
kustomize_deploy : Wait for cainjector pods ----------------------------------------------------------- 11.00s
test_operator : Ensure that the test-operator-logs-pod is Running ------------------------------------- 10.53s
kustomize_deploy : Run Wait Conditions for examples/dt/uni01alpha/control-plane/nncp ------------------- 6.57s
kustomize_deploy : Wait for webhook pods --------------------------------------------------------------- 5.52s
```

@abays abays force-pushed the edpm_stages branch 2 times, most recently from 6330fef to 1c87769 Compare April 25, 2024 16:30

fultonj commented Apr 25, 2024

Is this PR (197) being done in response to this change to the dataplane operator?

openstack-k8s-operators/dataplane-operator#839

I think @abays added his most recent change to this patch (197) because he is still getting stuck on the check introduced by PR839 above.

https://github.com/openstack-k8s-operators/architecture/compare/1c87769bd24b95cbc96003a5741fb20f3ec7e63c..eca199600cd164fde4550a398794e9eb9c91fb76


fultonj commented Apr 26, 2024

> Is this PR (197) being done in response to this change to the dataplane operator?
>
> openstack-k8s-operators/dataplane-operator#839
>
> I think @abays added his most recent change to this patch (197) because he is still getting stuck on the check introduced by PR839 above.

That is fixed by openstack-k8s-operators/dataplane-operator#859

Contributor

@abays abays left a comment

/lgtm

@abays abays merged commit 3e9cb57 into openstack-k8s-operators:main Apr 29, 2024
4 checks passed
@fultonj fultonj mentioned this pull request Apr 30, 2024
@karelyatin karelyatin mentioned this pull request May 1, 2024