Combine all initializer commands with && to catch any failing commands #453

frittentheke · 2024-08-23T07:21:31Z

Because two commands are run instead of one (the second being the `cat | grep`), any failure (non-zero exit code) of the first part (containing `k6 inspect`) is masked.

By chaining them all with `&&`, the first non-zero RC fails the whole command and is returned.
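
A minimal sketch of the shell semantics behind this change, not the operator's actual command: `false` and `true` are stand-ins for a failing first part and the final check. With `;` only the status of the last command counts; with `&&` the first non-zero status ends the chain and becomes the result.

```shell
#!/bin/sh
# 'false' stands in for a failing first part, 'true' for the final check.

# Separated by ';': the exit status is that of the last command only.
rc_semicolon=0
sh -c 'false ; true' || rc_semicolon=$?

# Chained with '&&': the first non-zero status stops the chain.
rc_and=0
sh -c 'false && true' || rc_and=$?

echo "';'  variant exit code: $rc_semicolon"
echo "'&&' variant exit code: $rc_and"
```

The `;` variant reports success even though its first command failed, while the `&&` variant reports a non-zero exit code.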

Fixes: #435

yorugac · 2024-08-23T15:34:37Z

@frittentheke, thank you for the PR! It seems like you have opened 2 PRs for the same issue? Could you please leave only one?

Quickly skimming the code, the change seems reasonable at first glance, but I need to look up the details there again. Additionally, this change requires testing. I'll be able to fully review this in approximately a couple of weeks. Hope that's alright!

frittentheke · 2024-08-23T16:22:36Z

> @frittentheke, thank you for the PR! It seems like you have opened 2 PRs for the same issue? Could you please leave only one?

I changed one to draft and shall remove the "fixes" reference to this issue. It is simply not "just" a bugfix.

Thanks for looking into and testing #453

yorugac · 2024-09-23T14:42:47Z

@frittentheke, sorry for the delay! I've finally managed to test this PR a bit. So there are 2 parts:

1. correct error return in `common.go`: that's a great catch; thank you!
2. the `k6 inspect` command: I'm afraid this change makes debugging more complicated than before. The current flow, with `;`, allows one to see the error in the initializer's logs. Example user flow: the user sees an `initializer job has failed` message in k6-operator's logs and can then go and see the exact error in the initializer's logs. But with `&&`, the initializer fails and leaves no logs afterwards. AFAIS, it becomes harder to debug the problem. If you could describe a more specific case of initializer failure that you were looking into, please do! Otherwise, I'd request to omit this part as I don't see how it helps ATM...

yorugac · 2024-09-30T13:45:41Z

Hi @frittentheke, would you be able to modify the PR to leave only the first change, in common.go? We're going to have a release this week, and it'd be preferable to merge that fix before that.
We can continue discussion about the second change at a later point / in another PR, of course.

frittentheke · 2024-10-01T12:01:22Z

> @frittentheke, sorry for the delay! I've finally managed to test this PR a bit. So there are 2 parts:
>
> 1. correct error return in `common.go`: that's a great catch; thank you!
> 2. the `k6 inspect` command: I'm afraid this change makes debugging more complicated than before. The current flow, with `;`, allows one to see the error in the initializer's logs. Example user flow: the user sees an `initializer job has failed` message in k6-operator's logs and can then go and see the exact error in the initializer's logs. But with `&&`, the initializer fails and leaves no logs afterwards. AFAIS, it becomes harder to debug the problem. If you could describe a more specific case of initializer failure that you were looking into, please do! Otherwise, I'd request to omit this part as I don't see how it helps ATM...

@yorugac sorry for the delay ...

Regarding 1) I created a new PR with just the var typo fixed: #465

As for 2):

I'd like to argue that the current flow only handles certain types of errors: those that end up in the output file `/tmp/k6logs` and are then caught by the grep filter for `level=error`. It misses others: any issue that causes a non-zero exit code but emits none of those error lines is lost, and the initializer falsely succeeds, because the original return code of the inspect is masked and only the one from the grep command counts for the pod termination. In my case, which made me write up the PR, the inspect command was not able to start (due to some imports being wrong), yet the initializer still succeeded as there was no output with `level=error`, and the k6-operator got stuck in the next phase (no JSON output, no jobs, ...).
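
To illustrate the gap described above, here is a small sketch that mimics the `;` flow with a stand-in function. `fake_inspect`, `semicolon_flow`, and the scenario names are hypothetical and not part of k6 or k6-operator; the negated `grep` is an assumption about how the check decides success, and a temporary file plays the role of `/tmp/k6logs`.

```shell
#!/bin/sh
# Hypothetical stand-in for the first part of the initializer command.
# $1 selects the scenario; stderr is redirected to the log file by the caller.
fake_inspect() {
    case "$1" in
        ok)       return 0 ;;
        logged)   echo 'level=error msg="bad script"' >&2; return 1 ;;
        unlogged) echo 'some other failure output' >&2; return 1 ;;
    esac
}

# Mimics the ';' flow: the status of fake_inspect is discarded and only the
# (assumed, negated) grep check decides the overall result.
semicolon_flow() {
    log=$(mktemp)
    fake_inspect "$1" 2>"$log"; ! grep -q 'level=error' "$log"
    status=$?
    rm -f "$log"
    return "$status"
}

for scenario in ok logged unlogged; do
    rc=0
    semicolon_flow "$scenario" || rc=$?
    echo "$scenario: exit code $rc"
done
```

Only the `logged` scenario is reported as a failure. The `unlogged` scenario, which corresponds to a command that fails without emitting a `level=error` line, is reported as a success.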

To me, any unexpected non-zero return code should be handled as an error (I know about grafana/k6#2804, #75, ...). A while back I started a bigger PR (#450) completely reworking the way the initializer works through the various tasks and then sends off its verdict via the termination log, the Kubernetes-native way of handling this phase (https://kubernetes.io/docs/tasks/debug/debug-application/determine-reason-pod-failure/). But that PR / discussion is for another day I suppose, if you even like the idea of reworking the whole thing to make it a lot more robust.
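
As a rough idea of what reporting a verdict via the termination log could look like (a sketch only, not the code from #450): `run_step` is a hypothetical helper, and `TERMINATION_LOG` is an assumed variable that defaults to `/dev/termination-log`, the default `terminationMessagePath` in Kubernetes. The helper keeps the step's stderr visible in the container logs and preserves its exit code.

```shell
#!/bin/sh
# Assumed variable; defaults to the Kubernetes default terminationMessagePath.
TERMINATION_LOG="${TERMINATION_LOG:-/dev/termination-log}"

# Hypothetical helper: run a command, and on failure replay its stderr to the
# container logs, write a short verdict to the termination log, and return
# the command's original exit code.
run_step() {
    step_log=$(mktemp)
    step_rc=0
    "$@" 2>"$step_log" || step_rc=$?
    if [ "$step_rc" -ne 0 ]; then
        cat "$step_log" >&2
        printf 'step "%s" failed with exit code %s\n' "$1" "$step_rc" >"$TERMINATION_LOG"
    fi
    rm -f "$step_log"
    return "$step_rc"
}
```

Several steps wrapped this way could be chained with `&&`, so the first failing step determines the exit code of the container while its error output stays available for debugging.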

frittentheke · 2024-10-23T07:18:40Z

@yorugac, with 0.17.0 released, would you consider discussing 2) some more? If you agree that my chain of thought is not totally off and would consider a rework along the lines of #450, I'd gladly take some feedback there to finish the PR.

yorugac · 2024-11-05T10:45:20Z

Hi @frittentheke, apologies for this delay! I've got swamped with internal work. And thank you for your patience! 😂

You described this case:

> In my case, which made me write up the PR, the inspect command was not able to start (due to some imports being wrong), yet the initializer still succeeded as there was no output with `level=error`, and the k6-operator got stuck in the next phase (no JSON output, no jobs, ...).

AFAIR, when imports are off, there should be logs left in the initializer, which should then cause an error in k6-operator on JSON unmarshal. It sounds like that wasn't the case for you, so no logs despite the import error? If so, could you please share a script that reproduces that situation?

frittentheke force-pushed the failInitOnNonZero_435 branch 2 times, most recently from 24e2177 to a054bf7 (August 23, 2024 07:47)

frittentheke mentioned this pull request Aug 23, 2024: Issues within initalizer error handling if script is incorrect #435 (Open)

frittentheke mentioned this pull request Oct 1, 2024: Fix typo in variable containing error when fetching pod logs #465 (Merged)

frittentheke force-pushed the failInitOnNonZero_435 branch from a054bf7 to 8008a90 (October 1, 2024 12:02)

frittentheke mentioned this pull request Oct 30, 2024: Make reading logs by initializer a more reliable operation #193 (Open)

yorugac mentioned this pull request Nov 5, 2024: Rework Initializer container and handling its verdict #450 (Draft)
