source-controller appears to be hanging, checking a git repo over ssh #1154
Comments
What happens when you reconcile the GitRepository manually by running |
I tried that before; it was hanging as well, on |
I rebooted my router, which killed the hanging connection and generated two error messages with stack traces; maybe that can help:
{
"level": "error",
"ts": "2023-07-05T08:16:47.392Z",
"msg": "failed to checkout and determine revision: unable to list remote for 'ssh://[email protected]/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
"controller": "gitrepository",
"controllerGroup": "source.toolkit.fluxcd.io",
"controllerKind": "GitRepository",
"GitRepository": {
"name": "flux-system",
"namespace": "flux-system"
},
"namespace": "flux-system",
"name": "flux-system",
"reconcileID": "b19c779d-28aa-4163-aa59-1cb7ed4f3373",
"error": "failed to checkout and determine revision: unable to list remote for 'ssh://[email protected]/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
"stacktrace": "github.com/fluxcd/source-controller/internal/reconcile/summarize.logError\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/processor.go:99\ngithub.com/fluxcd/source-controller/internal/reconcile/summarize.ErrorActionHandler\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/processor.go:77\ngithub.com/fluxcd/source-controller/internal/reconcile/summarize.(*Helper).SummarizeAndPatch\n\tgithub.com/fluxcd/source-controller/internal/reconcile/summarize/summary.go:193\ngithub.com/fluxcd/source-controller/internal/controller.(*GitRepositoryReconciler).Reconcile.func1\n\tgithub.com/fluxcd/source-controller/internal/controller/gitrepository_controller.go:204\ngithub.com/fluxcd/source-controller/internal/controller.(*GitRepositoryReconciler).Reconcile\n\tgithub.com/fluxcd/source-controller/internal/controller/gitrepository_controller.go:240\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"
}
{
"level": "debug",
"ts": "2023-07-05T08:16:47.393Z",
"logger": "events",
"msg": "failed to checkout and determine revision: unable to list remote for 'ssh://[email protected]/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
"type": "Warning",
"object": {
"kind": "GitRepository",
"namespace": "flux-system",
"name": "flux-system",
"uid": "df67c776-a9e3-4d82-8534-30823b917661",
"apiVersion": "source.toolkit.fluxcd.io/v1",
"resourceVersion": "293766"
},
"reason": "GitOperationFailed"
}
{
"level": "error",
"ts": "2023-07-05T08:16:47.412Z",
"msg": "Reconciler error",
"controller": "gitrepository",
"controllerGroup": "source.toolkit.fluxcd.io",
"controllerKind": "GitRepository",
"GitRepository": {
"name": "flux-system",
"namespace": "flux-system"
},
"namespace": "flux-system",
"name": "flux-system",
"reconcileID": "b19c779d-28aa-4163-aa59-1cb7ed4f3373",
"error": "failed to checkout and determine revision: unable to list remote for 'ssh://[email protected]/...': ssh: handshake failed: read tcp 10.42.0.15:56574->140.82.114.4:22: read: connection reset by peer",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:226"
} |
This error, combined with the fact that the thread essentially gets stuck, leads me to believe the issue is caused by a connection that hangs forever without completing or terminating. When the router is rebooted, the connection is finally dropped, which produces the error above. |
That sounds like a reasonable explanation. |
I have the source-controller configured to watch a single git repo over ssh, with an interval of 1 minute and no explicit timeout (should default to 60s). After a little while (about 10 minutes since reboot in my latest case), the source-controller stops checking the repo, stops logging anything (logging bumped to debug to investigate), and never recovers from that state.

The kustomize-controller, configured to reconcile every 10 minutes, keeps working and logging properly, but never sees any update after that point.

From http://...:8080/metrics:
Additional context:
vagrant up
I'll be happy to provide any further details if needed.
Please let me know how I can help resolve this issue.
Thanks