Conversation
Previously we were using the client incorrectly by not passing the context; this was causing internal network errors in the client because the API server was not ready. However, we continued and blocked on the event channel. With the new approach, we can determine when the API server is ready; this way we return a FATAL error and the process exits. This allows the init process to retry until the API server becomes ready. Signed-off-by: Nino Kodabande <[email protected]>
Force-pushed from 721bb53 to b1c38c4
pkg/kube/servicewatcher.go
Outdated
@@ -48,14 +48,17 @@ func watchServices(ctx context.Context, client *kubernetes.Clientset) (<-chan ev
 	informerFactory := informers.NewSharedInformerFactory(client, 1*time.Hour)
 	serviceInformer := informerFactory.Core().V1().Services()
 	sharedInformer := serviceInformer.Informer()
-	_, _ = sharedInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
+	_, _ = serviceInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
Is there a reason we're no longer using sharedInformer here? It seems to be the same thing.
	AddFunc: func(obj interface{}) {
		log.Debugf("Service Informer: Add func called with: %+v", obj)
Not that it matters, but these logs should be trace?
AFAIK, for trace logging to work it needs to be handled from main. I don't think just changing these to trace would work 🤔
Also, I don't recall if we handle trace logging at all in RD anywhere.
Yeah, trace would mean it's basically off and needs a code change to turn on. We don't really want to log everything in release builds (even with debug turned on).
	if err != nil {
		return nil, nil, fmt.Errorf("error listing services: %w", err)
	}
	log.Debugf("coreV1 services list :%+v", services.Items)
Same here — trace?
@@ -101,15 +104,16 @@ func watchServices(ctx context.Context, client *kubernetes.Clientset) (<-chan ev
 	informerFactory.WaitForCacheSync(ctx.Done())
 	informerFactory.Start(ctx.Done())

-	services, err := serviceInformer.Lister().List(labels.Everything())
+	services, err := client.CoreV1().Services(corev1.NamespaceAll).List(ctx, v1.ListOptions{})
Discussed with Nino: This manages to return an error (good!), but we don't handle it (bad!) and exit the whole process. At least init
restarts us, but we should avoid needing to do that.
Force-pushed from 90febd7 to 37f265a
This will prevent the process from exiting with a FATAL error, which in turn keeps the init process from having to restart the guest agent again. The error is now handled internally within the process. Signed-off-by: Nino Kodabande <[email protected]>
Force-pushed from 37f265a to f8f722a
Signed-off-by: Nino Kodabande <[email protected]>
pkg/kube/watcher.go
Outdated
	case isTimeout(err):
	case errors.Is(err, unix.ENETUNREACH):
	case errors.Is(err, unix.ECONNREFUSED):
	case apierrors.IsInternalError(err):
That's for all internal errors; I guess that's fine?
@@ -258,6 +267,10 @@ func isTimeout(err error) bool {
 	return false
 }

+func isAPINotReady(err error) bool {
+	return strings.Contains(err.Error(), "apiserver not ready")
It would be good to have a comment about where this comes from (k3s, over HTTP so we can't test for a specific error object). Ideally we also check that err
is of the right type here, but I guess that can be optional.
pkg/kube/watcher.go
Outdated
	// If it's a time out, the server may not be running yet
	switch {
	case isTimeout(err):
		fallthrough
This might be clearer as:
switch {
default:
return err
case isTimeout(err):
case errors.Is(err, unix.ENETUNREACH):
case errors.Is(err, unix.ECONNREFUSED):
case isAPINotReady(err):
}
// Sleep and continue for all the expected errors
time.Sleep(time.Second)
continue
@@ -181,7 +181,7 @@ func main() {
 	k8sServiceListenerIP := net.ParseIP(*k8sServiceListenerAddr)

 	if k8sServiceListenerIP == nil || !(k8sServiceListenerIP.Equal(net.IPv4zero) ||
-		k8sServiceListenerIP.Equal(net.IPv4(127, 0, 0, 1))) { //nolint:gomnd // IPv4 addr localhost
+		k8sServiceListenerIP.Equal(net.IPv4(127, 0, 0, 1))) { //nolint:gomnd,mnd // IPv4 addr localhost
It might be better to update the lint config to allow magic numbers for net.IPv4
instead.
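One way that could look in the lint config (a sketch only; key names vary by golangci-lint version, and `gomnd` was renamed `mnd` in newer releases, so this is not verified against this repo's config):

```yaml
# .golangci.yaml
linters-settings:
  gomnd:
    # Treat arguments to net.IPv4 as acceptable magic numbers, so the
    # //nolint:gomnd comments at each call site become unnecessary.
    ignored-functions:
      - 'net\.IPv4'
```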
Force-pushed from 5634943 to d6b729e
Signed-off-by: Nino Kodabande <[email protected]>
Signed-off-by: Nino Kodabande <[email protected]>
Force-pushed from d6b729e to ae6abd7
Previously we were using the client incorrectly by not passing the context; this was causing internal network errors in the client because the API server was not ready, as you can see in the logs below:
However, we continued and blocked on the event channel.
With the new approach, we can determine when the API server is ready; this way we return a FATAL error and the process exits. This allows the init process to retry until the API server becomes ready.
The logs below are for the new approach:
Fixes: rancher-sandbox/rancher-desktop#6809