From add4ac277b510ee3f2b03e45515cfabf2f3142a7 Mon Sep 17 00:00:00 2001
From: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
Date: Tue, 10 Oct 2023 09:42:40 -0400
Subject: [PATCH] Add docs for agent upgrade restart feature (#568)

* Add docs for agent upgrade restart feature

* Add note in upgrading summary about restarting
---
 .../fleet/upgrade-elastic-agent.asciidoc      | 61 ++++++++++++++++++-
 1 file changed, 58 insertions(+), 3 deletions(-)

diff --git a/docs/en/ingest-management/fleet/upgrade-elastic-agent.asciidoc b/docs/en/ingest-management/fleet/upgrade-elastic-agent.asciidoc
index 273266781..ac5a45110 100644
--- a/docs/en/ingest-management/fleet/upgrade-elastic-agent.asciidoc
+++ b/docs/en/ingest-management/fleet/upgrade-elastic-agent.asciidoc
@@ -16,11 +16,13 @@ date and time.
 
 In most failure cases the {agent} may retry an upgrade after a short wait. The
 wait durations between retries are: 1m, 5m, 10m, 15m, 30m, and 1h. During this
-time, the {agent} may show up as "retrying" in the {fleet} UI.
-//Note that you can abort an upgrade that is being retried. See <>.
+time, the {agent} may show up as "retrying" in the {fleet} UI. As well, if agent
+upgrades have been detected to have stalled, you can restart the upgrade process
+for a <> or in bulk for
+<>.
 
 This approach simplifies the process of keeping your agents up to date. It also
-saves you time because you don’t need third-party tools or processes to
+saves you time because you don't need third-party tools or processes to
 manage upgrades.
 
 By default, {agent}s require internet access to perform binary upgrades from
@@ -50,6 +52,12 @@ can perform the following upgrade-related actions:
 |<>
 |View the status of an upgrade, including upgrade metrics and agent logs.
 
+|<>
+|Restart an upgrade process that has stalled for a single agent.
+
+|<>
+|Do a bulk restart of the upgrade process for a set of agents.
+
 |===
 
 
@@ -144,3 +152,50 @@ don't see the host name, try refreshing the page.
 +
 [role="screenshot"]
 image::images/upgrade-failure.png[Agent logs showing upgrade failure]
+
+[discrete]
+[[restart-upgrade-single]]
+== Restart an upgrade for a single agent
+
+An {agent} upgrade process may sometimes stall. This can happen for various
+reasons, including, for example, network connectivity issues or a delayed shutdown.
+
+When an {agent} upgrade has been detected to be stuck, a warning indicator
+appears on the UI. When this occurs, you can restart the upgrade from either the
+*Agents* tab on the main {fleet} page or from the details page for any individual
+agent.
+
+Restart from main {fleet} page:
+
+. From the **Actions** menu next to an agent that is stuck in an `Updating`
+state, choose **Restart upgrade**.
+. In the **Restart upgrade** window, select an upgrade version and click
+**Upgrade agent**.
+
+Restart from an agent details page:
+
+. In {fleet}, in the **Host** column, click the agent's name. On the
+**Agent details** tab, a warning notice appears if the agent is detected to have
+stalled during an upgrade.
+. Click *Restart upgrade*.
+. In the **Restart upgrade** window, select an upgrade version and click
+**Upgrade agent**.
+
+[discrete]
+[[restart-upgrade-multiple]]
+== Restart an upgrade for multiple agents
+
+When the upgrade process for multiple agents has been detected to have stalled,
+you can restart the upgrade process in bulk.
+
+. On the **Agents** tab, select any set of the agents that are indicated to be stuck, and click **Actions**.
+. From the **Actions** menu, select **Restart upgrade agents**.
+. In the **Restart upgrade...** window, select an upgrade version.
+. Select the amount of time available for the maintenance window. The upgrades
+are spread out uniformly across this maintenance window to avoid exhausting
+network resources.
++
+To force selected agents to upgrade immediately when the upgrade is
+triggered, select **Immediately**. Avoid using this setting for batches of more
+than 10 agents.
+. Restart the upgrades.