Shoot worker node hostname changes after machine reboot #569
Labels:
area/robustness: Robustness, reliability, resilience related
kind/bug: Bug
lifecycle/rotten: Nobody worked on this for 12 months (final aging stage)
platform/openstack: OpenStack platform/infrastructure
How to categorize this issue?
/area robustness
/kind bug
/platform openstack
What happened:
When rebooting a shoot worker node, its hostname changes.
This causes kubelet to fail to start after the machine reboot, because it cannot get the Node object with the new name:
Note: in our installation, the default dns_domain for the Neutron network is novalocal, and it is appended to the server name. Because the entire FQDN hostname is too long, it is shortened in the above example. provider-openstack does not explicitly set the dns_domain in the Neutron networks it creates.
What you expected to happen:
The hostname should be stable and kubelet should be able to start again after a node reboot.
How to reproduce it (as minimally and precisely as possible):
Reboot a shoot worker node: the Node is not able to recover from state Unready.
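The name mismatch behind the Unready Node can be made concrete with a small Go model. It is a sketch under assumptions: the server name is hypothetical, and it assumes the Node was registered under the plain server name on the first boot while the rebooted machine ends up with the server name plus the novalocal dns_domain. kubelet uses the machine hostname as its node name by default, so any such drift points it at a Node object that does not exist.

```go
package main

import (
	"fmt"
	"strings"
)

// fqdn appends a Neutron dns_domain to an OpenStack server name.
// A trailing dot (absolute domain notation) is stripped first.
func fqdn(serverName, dnsDomain string) string {
	d := strings.TrimSuffix(dnsDomain, ".")
	if d == "" {
		return serverName
	}
	return serverName + "." + d
}

// shortHostname returns the first DNS label, i.e. what `hostname -s` prints.
func shortHostname(h string) string {
	label, _, _ := strings.Cut(h, ".")
	return label
}

func main() {
	server := "shoot--dev--demo-worker-z1-abcde" // hypothetical server name
	registered := server                          // assumption: Node name from the first boot
	afterReboot := fqdn(server, "novalocal.")     // assumption: hostname after the reboot

	fmt.Println(afterReboot)                              // shoot--dev--demo-worker-z1-abcde.novalocal
	fmt.Println(afterReboot == registered)                // false: kubelet looks up the wrong Node
	fmt.Println(shortHostname(afterReboot) == registered) // true: only the domain suffix differs
}
```

Under these assumptions the two names differ only by the appended domain, which is why a hostname that is not pinned to one form can flip between them across reboots.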
Anything else we need to know?:
This extension adds an ExecStartPre directive that changes the hostname to the kubelet unit:
gardener-extension-provider-openstack/pkg/webhook/controlplane/ensurer.go
Lines 265 to 267 in a9035cb
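systemd runs every ExecStartPre command before ExecStart each time a unit is started, so a hostname command attached this way runs on every boot, not only the first one. The Go sketch below shows the general shape of idempotently adding such a directive to a unit's option list and rendering it. The unitOption type is a local stand-in loosely modeled on go-systemd's unit.UnitOption, and /opt/bin/set-hostname.sh is a hypothetical placeholder; the actual command in ensurer.go is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// unitOption is a local stand-in for a systemd unit option (section, key, value).
type unitOption struct {
	Section, Name, Value string
}

// ensureExecStartPre appends an ExecStartPre option to the [Service] section
// unless an identical option is already present, so repeated calls are idempotent.
func ensureExecStartPre(opts []unitOption, cmd string) []unitOption {
	want := unitOption{Section: "Service", Name: "ExecStartPre", Value: cmd}
	for _, o := range opts {
		if o == want {
			return opts
		}
	}
	return append(opts, want)
}

// render serializes options grouped by section, in first-seen section order.
func render(opts []unitOption) string {
	var order []string
	bySection := map[string][]unitOption{}
	for _, o := range opts {
		if _, ok := bySection[o.Section]; !ok {
			order = append(order, o.Section)
		}
		bySection[o.Section] = append(bySection[o.Section], o)
	}
	var b strings.Builder
	for i, s := range order {
		if i > 0 {
			b.WriteString("\n")
		}
		fmt.Fprintf(&b, "[%s]\n", s)
		for _, o := range bySection[s] {
			fmt.Fprintf(&b, "%s=%s\n", o.Name, o.Value)
		}
	}
	return b.String()
}

func main() {
	kubelet := []unitOption{
		{"Unit", "Description", "kubelet daemon"},
		{"Service", "ExecStart", "/opt/bin/kubelet"},
	}
	// Hypothetical placeholder command, not the one used in ensurer.go.
	cmd := "/opt/bin/set-hostname.sh"
	kubelet = ensureExecStartPre(kubelet, cmd)
	kubelet = ensureExecStartPre(kubelet, cmd) // second call is a no-op
	fmt.Print(render(kubelet))
}
```

The position of ExecStartPre inside the [Service] section does not matter: systemd still runs it before ExecStart. Nothing in the unit orders it relative to cloud-init on later boots, which is the race described below.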
On the initial boot of the machine, this always works, because the kubelet unit and its hostnamectl command are always invoked after all cloud-init mechanisms (the unit is only present after the first successful run of the cloud-config downloader/executor). However, after a reboot, the kubelet unit and its hostnamectl command race with other cloud-init mechanisms, which can lead to a changed hostname.
Environment:
Kubernetes version (use kubectl version): v1.24.8