Allow restarting openqa-webui-daemon without downtime #5820

Merged (3 commits) on Aug 6, 2024

@Martchus Martchus commented Aug 5, 2024


@okurz okurz left a comment

+1

Martchus commented Aug 5, 2024

It works locally when installing the packages from the OBS check. (If you want to reproduce, be sure to also install openQA-common because the reuse=1 change is part of that sub package.)

I tested this by hammering the F5 key in the web browser while reloading the service via `sleep 5 && sudo systemctl reload openqa-webui`. Without this change, there is a window of around a second during which no connection is possible; with the change, this does not happen.
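
The manual F5 test could also be scripted. The following is only a minimal sketch, not part of this PR: it polls the web UI in a tight loop and counts requests that get no connection. The function names `fetch_ok` and `probe`, the polling interval and the duration are assumptions made for illustration; the URL is the one shown in the journal output.

```python
import http.client
import time
import urllib.error
import urllib.request


def fetch_ok(url, timeout=1.0):
    """Return True if a server answered at all, False if no connection was possible."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means a server accepted the connection.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, truncated response, ...
        return False


def probe(url, duration=10.0, interval=0.05, fetch=fetch_ok,
          clock=time.monotonic, sleep=time.sleep):
    """Poll `url` for `duration` seconds and return (attempts, failures)."""
    attempts = failures = 0
    deadline = clock() + duration
    while clock() < deadline:
        attempts += 1
        if not fetch(url):
            failures += 1
        sleep(interval)
    return attempts, failures


if __name__ == "__main__":
    attempts, failures = probe("http://127.0.0.1:9526")
    print(f"{failures} of {attempts} requests failed")
```

Started in one terminal while `sleep 5 && sudo systemctl reload openqa-webui` runs in another, a non-zero failure count would indicate a downtime window. The `fetch`, `clock` and `sleep` parameters exist only so the loop can be exercised without a running server.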

The journal also looks good - so the old service is really only stopped once the new one is starting:

Aug 05 17:20:33 linux-9lzf systemd[1]: Reloaded The openQA web UI.
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Listening at "http://127.0.0.1:9526?reuse=1"
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: Web application available at http://127.0.0.1:9526
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Listening at "http://[::1]:9526?reuse=1"
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: Web application available at http://[::1]:9526
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Manager 52867 started
…
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52908]: [info] Worker 52908 started
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Creating process id file "/var/lib/openqa/webui/prefork-1.pid"
Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52909]: [info] Worker 52909 started
Aug 05 17:20:36 linux-9lzf openqa-webui-daemon[52799]: [warn] Stopping worker 52833 immediately
Aug 05 17:20:36 linux-9lzf openqa-webui-daemon[52799]: [warn] Stopping worker 52831 immediately
…
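
The ordering claim (old workers are only stopped once the new manager is up) can be checked mechanically. This is a rough sketch under the assumption that journal lines look exactly like the excerpt above and all fall on the same day; `parse` and `handover_is_clean` are hypothetical helper names, not openQA code.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Manager 52867 started"
LINE = re.compile(
    r"^\w{3} \d{2} (\d{2}:\d{2}:\d{2}) \S+ openqa-webui-daemon\[(\d+)\]: (.*)$"
)

SAMPLE = [
    'Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52867]: [info] Manager 52867 started',
    'Aug 05 17:20:35 linux-9lzf openqa-webui-daemon[52908]: [info] Worker 52908 started',
    'Aug 05 17:20:36 linux-9lzf openqa-webui-daemon[52799]: [warn] Stopping worker 52833 immediately',
]


def parse(lines):
    """Yield (time, pid, message) for every line that matches the daemon log format."""
    for line in lines:
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S").time()
            yield stamp, int(match.group(2)), match.group(3)


def handover_is_clean(lines):
    """True if a manager start is logged no later than the first worker stop,
    and the two messages come from different processes."""
    started = first_stop = None
    for stamp, pid, message in parse(lines):
        if started is None and re.search(r"Manager \d+ started", message):
            started = (stamp, pid)
        elif first_stop is None and "Stopping worker" in message:
            first_stop = (stamp, pid)
    if started is None or first_stop is None:
        return False
    return started[0] <= first_stop[0] and started[1] != first_stop[1]
```

Calling `handover_is_clean(SAMPLE)` on the three lines copied from the journal above checks that the start of manager 52867 is logged before process 52799 begins stopping its workers. The journal has one-second resolution, so events within the same second cannot be ordered this way.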

The output of systemctl status also looks good. All the PIDs of prefork processes are replaced after a reload and there are no leftover processes.

I also already have a fix for the failing test.

I still need to ensure that the service is not restarted on updates via the rpm scripts and that other services being restarted don't trigger a restart of the web UI. And I also need to add reload: True in our salt states (according to https://docs.saltproject.io/en/latest/ref/states/all/salt.states.service.html).

* Allow restarting `openqa-webui-daemon` without downtime by sending SIGHUP
  to the process or reloading the systemd unit `openqa-webui.service`
* Start the Mojolicious application with `reuse=1` as mentioned on
  https://docs.mojolicious.org/Mojolicious/Guides/Cookbook#Zero-downtime-software-upgrades
* Note that other services are not covered but those are also not user
  facing or retried and thus not required
* See https://progress.opensuse.org/issues/162533

This helps minimize downtime. It is also generally acceptable to show
the web UI even though not all other services can be started.

Related ticket: https://progress.opensuse.org/issues/162533

* Avoid restarting the main web UI service via the rpm postrun script; only
  reload the service
* See https://progress.opensuse.org/issues/162533
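
The Mojolicious documentation describes the `reuse` listen parameter from the commit messages above as enabling the `SO_REUSEPORT` socket option, which lets a second server bind the same port while the first is still listening. The sketch below illustrates that socket-level idea in Python; it is not openQA's or Mojolicious' Perl code, and `listen_with_reuseport` and `demo_handover` are names made up for this example. It assumes a platform that provides `SO_REUSEPORT` (for instance Linux).

```python
import socket


def listen_with_reuseport(host="127.0.0.1", port=0):
    """Open a listening TCP socket with SO_REUSEPORT set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def demo_handover():
    """Start an 'old' listener, start a 'new' one on the same port,
    retire the old one and check that the port is still reachable."""
    old = listen_with_reuseport()           # port 0: let the kernel pick a free port
    port = old.getsockname()[1]
    new = listen_with_reuseport(port=port)  # would fail with EADDRINUSE without SO_REUSEPORT
    old.close()                             # the new listener keeps the port open
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1):
            return True
    finally:
        new.close()
```

Because both listeners hold the port at the same time, there is no moment at which a connection attempt is refused, which matches the behaviour reported in the test above.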

codecov bot commented Aug 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.50%. Comparing base (40fce5a) to head (a1f44e4).
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5820   +/-   ##
=======================================
  Coverage   98.50%   98.50%           
=======================================
  Files         395      395           
  Lines       38715    38715           
=======================================
  Hits        38136    38136           
  Misses        579      579           


Martchus commented Aug 6, 2024

I tested the package from OBS checks locally and it works. So reinstalling/updating the package now causes the main service to reload and other services are still restarted.

I also didn't run into any limits regarding PostgreSQL connections. However, in production we might have other limits, so I'll check whether I can run two prefork instances (with all the usual settings) in parallel on OSD and o3.

EDIT: I can run sudo -u geekotest /usr/share/openqa/script/openqa prefork -m production --proxy -i 100 -H 400 -w 45 -c 1 -G 800 -l 'http://[::]:8080' on OSD/o3 and it works. So we have enough headroom for database connections.
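
The concern behind this check is that two prefork instances briefly run side by side during a reload, so the number of database connections can roughly double. A back-of-the-envelope sketch follows, assuming one PostgreSQL connection per worker process (an assumption, not something stated in this PR); the function names and the limit values used in the usage note are hypothetical and not measurements from OSD or o3.

```python
def peak_connections(workers, connections_per_worker=1, instances=2):
    """Estimate connections while `instances` prefork servers overlap during a reload."""
    return workers * connections_per_worker * instances


def has_headroom(workers, max_connections, other_connections=0,
                 connections_per_worker=1, instances=2):
    """True if the estimated peak plus connections held by other services fits the limit."""
    peak = peak_connections(workers, connections_per_worker, instances)
    return peak + other_connections <= max_connections
```

With the `-w 45` from the command above, `peak_connections(45)` gives 90. Whether that fits depends on the server's `max_connections` setting and on what other services hold open, for example `has_headroom(45, max_connections=100, other_connections=5)`.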

@Martchus Martchus removed the not-ready label Aug 6, 2024
@mergify mergify bot merged commit ca7a942 into os-autoinst:master Aug 6, 2024
42 checks passed
@Martchus Martchus deleted the zero-downtime-restart branch August 6, 2024 10:09