Ansible operator v1.25.2 misses custom resource modify events #6202

Closed
venkataramanam opened this issue Nov 24, 2022 · 6 comments
Labels
language/ansible: Issue is related to an Ansible operator project
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
triage/needs-information: Indicates an issue needs more information in order to work on it.
triage/support: Indicates an issue that is a support question.
Milestone

Comments

@venkataramanam

Bug Report

What did you do?

We have multiple instances of a custom resource in our application. We treat one of them as a kind of master instance and the others as logical instances. Whenever the master is updated, for example upgraded by setting the field .spec.version, the operator does all the hard work of upgrading that instance. At the very end of the reconcile, the operator code updates .spec.version in all the logical instances to synchronize them with the master. Once the master CR reconcile had completed, the operator used to pick up the change events on all the logical instances and reconcile them as well. Please note that we let the operator reconcile one CR at a time (sequentially, not in parallel: --max-concurrent-reconciles=1). All of this worked fine until we upgraded to v1.25.2; now the change events on the logical instances (the .spec.version updates) are no longer being reconciled. We don't know why.
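
As an illustration of the sync step described above, the final task of the master reconcile might look roughly like the sketch below. This is not the reporter's actual playbook, which is not shown in this issue. The variable names `logical_instance_names` and `target_version` are hypothetical, and the sketch assumes the `kubernetes.core` collection (2.0 or later, for `state: patched`) is available in the operator image.

```yaml
# Hypothetical last task of reconcile.yaml for the master instance.
# Assumptions (not from the issue): logical_instance_names is a list of CR
# names, target_version holds the new version, and all logical CRs live in
# the same namespace as the master CR.
- name: Propagate .spec.version to the logical WOService instances
  kubernetes.core.k8s:
    state: patched
    api_version: wos.cpd.ibm.com/v1
    kind: WOService
    name: "{{ item }}"
    namespace: "{{ ansible_operator_meta.namespace }}"
    definition:
      spec:
        version: "{{ target_version }}"
  loop: "{{ logical_instance_names }}"
```

Each patch changes the spec of a logical CR, so each one is expected to produce a modify event on the operator's WOService watch. Those are the events this report says are no longer being reconciled.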

This is the ansible-operator command line with its args...

ENTRYPOINT ["/tini", "--", "/usr/local/bin/ansible-operator", "run", \
    "--watches-file=./watches.yaml", \
    "--zap-log-level=error", \
    "--max-concurrent-reconciles=1", \
    "--ansible-verbosity=0", \
    "--reconcile-period=0s" \
    ]

The watches.yaml

- version: v1
  group: wos.cpd.ibm.com
  kind: WOService
  snakeCaseParameters: False
  playbook: reconcile.yaml
  watchDependentResources: False
  finalizer:
    name: wos.cpd.ibm.com/finalizer
    playbook: finalize.yaml

NOTE:

  1. The reconcile-period is set to 0 so that CRs are not reconciled periodically, only when a change is made to a CR.
  2. The logical instance CRs are not owned by the master instance CR, in case this has any bearing.

What did you expect to see?

We expected the change events in the logical instance CRs to be reconciled by the Operator.

What did you see instead? Under which circumstances?

The change events in the logical instance CRs are not being reconciled by the Operator. This worked before we upgraded to v1.25.2; the last version where it worked was v1.25.0.

Environment

Operator type:

/language ansible

Kubernetes cluster type:

OpenShift
Server Version: 4.10.37
Kubernetes Version: v1.23.5+8471591
$ operator-sdk version

v1.25.2

$ go version (if language is Go)

$ kubectl version

Possible Solution

Additional context

@openshift-ci openshift-ci bot added the language/ansible Issue is related to an Ansible operator project label Nov 24, 2022
@venkataramanam venkataramanam changed the title Ansible operator v1.25.2 misses CR events Ansible operator v1.25.2 misses custom resource modify events Nov 24, 2022
@varshaprasad96
Member

varshaprasad96 commented Nov 28, 2022

@venkataramanam we don't have any particular changes that could affect the way ansible operator works in 1.25.2 from 1.25.0 (https://sdk.operatorframework.io/docs/upgrading-sdk-version/v1.25.2/). Could you add any error logs if they appear?
Is there a change in openshift version on which the operator is being run? Could you also check if there are any events associated with the logical instances which are different from the previous working release?

@varshaprasad96 varshaprasad96 added the triage/support Indicates an issue that is a support question. label Nov 28, 2022
@varshaprasad96 varshaprasad96 added this to the Backlog milestone Nov 28, 2022
@venkataramanam
Author

@varshaprasad96

we don't have any particular changes that could affect the way ansible operator works in 1.25.2 from 1.25.0

Yes, I did see there aren't changes around this area.

Could you add any error logs if they appear?

Unfortunately, there aren't any error logs.

Is there a change in openshift version on which the operator is being run?

Yes, there is a change in the OpenShift version. I'm not sure whether that is the reason the change events are being missed.

Could you also check if there are any events associated with the logical instances which are different from the previous working release?

No, there is absolutely no change in events. We only update .spec.version in all logical instances at the very end of the master CR reconcile.

Is there some sort of timeout after which change events would no longer be processed?

@everettraven everettraven added the triage/needs-information Indicates an issue needs more information in order to work on it. label Feb 20, 2023
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2023
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 21, 2023
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci

openshift-ci bot commented Jul 22, 2023

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot closed this as completed Jul 22, 2023

4 participants