fix wait for idle on slow network #183

Merged: 5 commits, Dec 7, 2023

@rgildein rgildein commented Dec 1, 2023

During my manual testing, I found two issues:

  1. The pre-upgrade step that verifies the model is idle used an 11s timeout, which causes a problem on a slow network. It raised an exception saying that the apps were not idle. This was caused by a slower network connection, since I was running cou locally while the model was on an external OpenStack cloud. After changing the timeout to 60 it works fine. I also tested that running cou directly on the external cloud does not cause any issue. Maybe we could set the value to 60 (instead of using DEFAULT_TIMEOUT), but that can cause an unwanted long wait if any app is not idle.
$ cou plan
...
Verify that all OpenStack applications are in idle state ✖
2023-12-01 17:44:45 [ERROR] Timed out waiting for model:
  rabbitmq-server/0 [idle] active: Unit is ready
  neutron-api/0 [idle] active: Unit is ready
  glance/0 [idle] active: Unit is ready
  cinder/0 [idle] active: Unit is ready
  ...  # all apps are in idle
# after change
$ COU_TIMEOUT=60 cou plan
...
Running cloud upgrade...
Verify that all OpenStack applications are in idle state ✔
Backup mysql databases ✔
...
  2. The error message from wait_for_idle is not formatted correctly. Example before the fix:
2023-12-01 17:09:14 [ERROR] Timed out waiting for model:
  rabbitmq-server/0 [idle] active: Unit is ready
    neutron-api/0 [idle] active: Unit is ready
    glance/0 [idle] active: Unit is ready
    cinder/0 [idle] active: Unit is ready
    ...
# Example after the fix
2023-12-01 17:09:14 [ERROR] Timed out waiting for model:
  rabbitmq-server/0 [idle] active: Unit is ready
  neutron-api/0 [idle] active: Unit is ready
  glance/0 [idle] active: Unit is ready
  cinder/0 [idle] active: Unit is ready

Example before the fix:
```bash
2023-12-01 17:09:14 [ERROR] Timed out waiting for model:
  rabbitmq-server/0 [idle] active: Unit is ready
    neutron-api/0 [idle] active: Unit is ready
    glance/0 [idle] active: Unit is ready
    cinder/0 [idle] active: Unit is ready
    ...
```
Example after the fix:
```bash
2023-12-01 17:09:14 [ERROR] Timed out waiting for model:
  rabbitmq-server/0 [idle] active: Unit is ready
  neutron-api/0 [idle] active: Unit is ready
  glance/0 [idle] active: Unit is ready
  cinder/0 [idle] active: Unit is ready
  ...
```
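For illustration only, here is a minimal sketch of one way to get the consistent two-space indentation shown in the "after" output. The helper name `format_timeout_message` is hypothetical and is not taken from cou; the actual change in this PR may look different.

```python
from textwrap import indent


def format_timeout_message(unit_lines):
    """Build the timeout message with every unit line indented by two spaces.

    Hypothetical helper for illustration; not the cou implementation.
    """
    header = "Timed out waiting for model:"
    # Join first, then indent once, so all lines get the same prefix
    # regardless of how the individual unit lines were produced.
    body = indent("\n".join(unit_lines), "  ")
    return f"{header}\n{body}"


if __name__ == "__main__":
    units = [
        "rabbitmq-server/0 [idle] active: Unit is ready",
        "neutron-api/0 [idle] active: Unit is ready",
    ]
    print(format_timeout_message(units))
```

Indenting the joined text in a single step avoids the case where only the first line picks up the intended prefix.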
raise_on_blocked=True is used in the pre-upgrade step so that an exception is
raised immediately if any app is in a blocked state.
We need to use DEFAULT_TIMEOUT so that the timeout can be configured when cou is
run from a slower network.
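
As a rough sketch of the behaviour described above (fail fast on blocked, otherwise wait up to a timeout), the loop below polls a status callback. Everything here is an assumption made for illustration: `wait_until_idle`, `BlockedError`, `IdleTimeoutError`, and the `(agent, workload)` status tuples are hypothetical names, and this is not how cou or python-libjuju implement it.

```python
import time


class BlockedError(Exception):
    """Raised as soon as a unit reports a blocked workload (hypothetical)."""


class IdleTimeoutError(Exception):
    """Raised when units are still not idle at the deadline (hypothetical)."""


def wait_until_idle(get_statuses, timeout, raise_on_blocked=False,
                    check_freq=0.5, sleep=time.sleep, clock=time.monotonic):
    """Poll until every unit agent is idle, or fail.

    get_statuses() returns {unit_name: (agent_status, workload_status)}.
    """
    deadline = clock() + timeout
    while True:
        statuses = get_statuses()
        if raise_on_blocked:
            blocked = sorted(
                unit for unit, (_, workload) in statuses.items()
                if workload == "blocked"
            )
            if blocked:
                # Fail immediately instead of waiting for the full timeout.
                raise BlockedError(f"Blocked units: {', '.join(blocked)}")
        if all(agent == "idle" for agent, _ in statuses.values()):
            return
        if clock() >= deadline:
            raise IdleTimeoutError("Timed out waiting for model")
        sleep(check_freq)
```

With `raise_on_blocked=True`, a blocked app surfaces right away, so a larger timeout only costs time when units are genuinely still settling.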

Example without changing timeout:
```bash
$ cou plan
...
Verify that all OpenStack applications are in idle state ✖
2023-12-01 17:44:45 [ERROR] Timed out waiting for model:
  rabbitmq-server/0 [idle] active: Unit is ready
  neutron-api/0 [idle] active: Unit is ready
  glance/0 [idle] active: Unit is ready
  cinder/0 [idle] active: Unit is ready
  ...  # all apps are in idle
```
Example after the change:
```bash
$ cou plan
...
Running cloud upgrade...
Verify that all OpenStack applications are in idle state ✔
Backup mysql databases ✔
...
```
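
A minimal sketch of how a timeout could be read from the `COU_TIMEOUT` environment variable, assuming that is what feeds `DEFAULT_TIMEOUT`. The helper `read_timeout` and the fallback value of 10 are placeholders chosen for this example, not values confirmed by the PR.

```python
import os

FALLBACK_TIMEOUT = 10  # placeholder value for this sketch only


def read_timeout(environ=os.environ, fallback=FALLBACK_TIMEOUT):
    """Return the timeout in seconds from COU_TIMEOUT, or the fallback.

    Hypothetical helper for illustration; not the cou implementation.
    """
    raw = environ.get("COU_TIMEOUT")
    if raw is None:
        return fallback
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"COU_TIMEOUT must be a whole number of seconds, got {raw!r}"
        ) from None
    if value <= 0:
        raise ValueError(f"COU_TIMEOUT must be positive, got {value}")
    return value


if __name__ == "__main__":
    # e.g. `COU_TIMEOUT=60 python this_file.py` prints 60
    print(read_timeout())
```

Validating the value up front gives a clear error for a mistyped setting instead of a confusing failure later in the wait.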
@rgildein rgildein added the bug Something isn't working label Dec 1, 2023
@rgildein rgildein self-assigned this Dec 1, 2023
@rgildein rgildein requested a review from a team as a code owner December 1, 2023 17:23
gabrielcocenza previously approved these changes Dec 6, 2023
@gabrielcocenza gabrielcocenza left a comment

LGTM. I left a non-blocking suggestion.

@agileshaw agileshaw left a comment

Just one in-line suggestion to simplify the explanatory comment. Otherwise looks good.

@agileshaw agileshaw left a comment

LGTM

@rgildein rgildein merged commit c716836 into canonical:main Dec 7, 2023
3 checks passed
@rgildein rgildein deleted the bug/fix-wait_for_idle branch December 7, 2023 08:26