
Helm reconciler aborts and uninstalls chart when installing multi-node #3758

Closed
danj-replicated opened this issue Nov 24, 2023 · 8 comments · Fixed by #4515
Labels
area/helm bug Something isn't working


danj-replicated (Contributor) commented Nov 24, 2023

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 GNU/Linux
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version

v1.28.4+k0s.0

Sysinfo

`k0s sysinfo`
Machine ID: "d8dd4971601b5be156336f209867f26f1d6e2c560d0e815ccbd4021bc3de859b" (from machine) (pass)
Total memory: 3.8 GiB (pass)
Disk space available for /var/lib/k0s: 102.6 GiB (pass)
Name resolution: localhost: [127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.15.0-88-generic (pass)
  Max. file descriptors per process: current: 1048576 / max: 1048576 (pass)
  AppArmor: active (pass)
  Executable in PATH: modprobe: /usr/sbin/modprobe (pass)
  Executable in PATH: mount: /usr/bin/mount (pass)
  Executable in PATH: umount: /usr/bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (pass)
    cgroup controller "memory": available (pass)
    cgroup controller "devices": available (assumed) (pass)
    cgroup controller "freezer": available (assumed) (pass)
    cgroup controller "pids": available (pass)
    cgroup controller "hugetlb": available (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: built-in (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

When k0s is installed via k0sctl on multiple nodes, the install of the first ordered Helm chart fails when it runs into an etcd leader re-election (`etcdserver: leader changed`). Because the install is atomic, this causes the Helm chart to be removed.

Steps to reproduce

  1. Install k0s with k0sctl on 2 or more nodes and specify a few Helm charts with ordering.
  2. Wait for the cluster to install.
  3. Check the chart objects, e.g.: `k0s kubectl describe chart -n kube-system k0s-addon-chart-openebs`

Expected behavior

Helm charts should be installed, or the install should be retried if an etcd leader re-election interferes with it.

Actual behavior

The Helm chart install aborts and the chart is uninstalled because of the etcd leader re-election.

Screenshots and logs

  Error:        can't install loadedChart `openebs`: release openebs failed, and has been uninstalled due to atomic being set: etcdserver: leader changed
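The "atomic being set" part of this error explains why the chart disappears: with an atomic install, any error during the install, including a one-off `etcdserver: leader changed`, triggers an uninstall of the release. A minimal Go sketch of that control flow, assuming a simplified install/uninstall pair; `installAtomic` and `errLeaderChanged` are hypothetical names used for illustration, not Helm's API:

```go
package main

import (
	"errors"
	"fmt"
)

// errLeaderChanged is a hypothetical stand-in for the transient error the
// API server returned while the chart was being installed.
var errLeaderChanged = errors.New("etcdserver: leader changed")

// installAtomic is a hypothetical, simplified model of an "atomic" install:
// if the install step fails for any reason, the release is uninstalled and
// the install error is returned. It is not Helm's implementation.
func installAtomic(install, uninstall func() error) error {
	err := install()
	if err == nil {
		return nil
	}
	if uerr := uninstall(); uerr != nil {
		return fmt.Errorf("install failed: %w; uninstall also failed: %v", err, uerr)
	}
	return fmt.Errorf("release failed, and has been uninstalled due to atomic being set: %w", err)
}

func main() {
	uninstalled := false
	err := installAtomic(
		func() error { return errLeaderChanged },         // a single transient API error...
		func() error { uninstalled = true; return nil }, // ...is enough to trigger the rollback
	)
	fmt.Println(err)
	fmt.Println("transient:", errors.Is(err, errLeaderChanged), "uninstalled:", uninstalled)
}
```

Nothing in this flow distinguishes a transient error from a real failure, which is why a retry has to happen below it, at the level of individual API requests.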

Additional context

No response

danj-replicated added the bug (Something isn't working) label Nov 24, 2023

twz123 (Member) commented Nov 24, 2023

/xref #3651
/cc @juanluisvaladas

juanluisvaladas self-assigned this Dec 4, 2023

github-actions bot (Contributor) commented Jan 3, 2024

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot added the Stale label Jan 3, 2024
twz123 removed the Stale label Jan 4, 2024

github-actions bot (Contributor) commented Feb 3, 2024

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot added the Stale label Feb 3, 2024
twz123 removed the Stale label Feb 4, 2024

github-actions bot (Contributor) commented Mar 5, 2024

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot (Contributor)

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot added the Stale label Apr 10, 2024
twz123 removed the Stale label Apr 11, 2024

github-actions bot (Contributor)

The issue is marked as stale since no activity has been recorded in 30 days

github-actions bot added the Stale label May 11, 2024
twz123 removed the Stale label May 12, 2024

jnummelin (Member) commented:

Adding some references

helm/helm#7637

This is fixed in upstream Helm in helm/helm@b5378b3

Unfortunately, we do not construct the client this way, so by default we do not use this fix. 😢

Doubly unfortunately, the retrying round-tripper implementation is private, so if we want to fix this we need to either "inline" it or adopt it in some other way.
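Since Helm's retrying round-tripper is private, "inlining" it amounts to writing an equivalent `http.RoundTripper` wrapper. A minimal Go sketch of the idea, assuming the API server reports the transient failure as a 5xx JSON response whose `message` ends in `etcdserver: leader changed` (the text in the error above). `retryingRoundTripper`, `isLeaderChanged`, `runDemo` and the `maxRetries` limit are hypothetical; this is not the code from helm/helm@b5378b3:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// retryingRoundTripper (hypothetical) re-sends a request when the API server
// answers with the transient etcd leader-change error, up to maxRetries times.
type retryingRoundTripper struct {
	wrapped    http.RoundTripper
	maxRetries int
}

// isLeaderChanged reports whether resp is a 5xx JSON response whose "message"
// ends in "etcdserver: leader changed". It restores resp.Body afterwards.
func isLeaderChanged(resp *http.Response) bool {
	if resp.StatusCode < 500 || !strings.HasPrefix(resp.Header.Get("Content-Type"), "application/json") {
		return false
	}
	b, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	resp.Body = io.NopCloser(bytes.NewReader(b))
	if err != nil {
		return false
	}
	var status struct {
		Message string `json:"message"`
	}
	if json.Unmarshal(b, &status) != nil {
		return false
	}
	return strings.HasSuffix(status.Message, "etcdserver: leader changed")
}

func (rt *retryingRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := rt.wrapped.RoundTrip(req)
	for attempt := 0; attempt < rt.maxRetries; attempt++ {
		if err != nil || !isLeaderChanged(resp) {
			return resp, err
		}
		// Never mutate the caller's request; retry with a clone and a fresh body.
		retry := req.Clone(req.Context())
		if req.Body != nil && req.Body != http.NoBody {
			if req.GetBody == nil {
				return resp, nil // body cannot be replayed: surface the error response
			}
			body, gerr := req.GetBody()
			if gerr != nil {
				return resp, nil
			}
			retry.Body = body
		}
		resp.Body.Close()
		resp, err = rt.wrapped.RoundTrip(retry)
	}
	return resp, err
}

// runDemo starts a fake API server that reports a leader change `failures`
// times before succeeding, and returns the final status and request count.
func runDemo(failures, maxRetries int) (int, int32, error) {
	var hits int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if int(atomic.AddInt32(&hits, 1)) <= failures {
			w.Header().Set("Content-Type", "application/json")
			w.WriteHeader(http.StatusInternalServerError)
			fmt.Fprint(w, `{"kind":"Status","code":500,"message":"etcdserver: leader changed"}`)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	client := &http.Client{Transport: &retryingRoundTripper{wrapped: http.DefaultTransport, maxRetries: maxRetries}}
	resp, err := client.Post(srv.URL, "application/json", strings.NewReader(`{}`))
	if err != nil {
		return 0, atomic.LoadInt32(&hits), err
	}
	resp.Body.Close()
	return resp.StatusCode, atomic.LoadInt32(&hits), nil
}

func main() {
	status, hits, err := runDemo(2, 3)
	fmt.Println("status:", status, "requests:", hits, "err:", err)
}
```

The sketch retries only on the leader-change message rather than on every 5xx, since blindly replaying non-idempotent requests on arbitrary server errors could repeat writes; requests whose body cannot be replayed (no `GetBody`) are not retried. With client-go, a wrapper like this would typically be installed through the `WrapTransport` field of `rest.Config`.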

jnummelin added a commit to jnummelin/k0s that referenced this issue May 29, 2024
This way we get the retrying round-tripper setup and Helm now retries some known transient errors
such as the etcd leader change.

Fixes k0sproject#3758

Signed-off-by: Jussi Nummelin <[email protected]>
k0s-bot pushed a commit that referenced this issue May 29, 2024
This way we get the retrying round-tripper setup and Helm now retries some known transient errors
such as the etcd leader change.

Fixes #3758

Signed-off-by: Jussi Nummelin <[email protected]>
(cherry picked from commit 11da64f)
ianb-mp (Contributor) commented Jun 11, 2024

I've just started seeing this issue on v1.30.1+k0s.0 ... but I'm thinking that version doesn't include the fix from #4515 - is that right?
