Null client received in TestPromiseOrdering #543

lthibault · 2023-08-23T15:47:05Z

davidhubbard · 2023-08-24T03:47:17Z

For context, Apr 28 @zenhack wrote the following to @lthibault in the go-capnp Matrix channel:

how tolerant are you to regressions? I'm getting to the point where I'm debating whether just gutting some sections of this and re-implemening, getting the bones right without worrying about keeping it bug-free every step of the way, and then hunting stuff down later would make this easier. Hopefully the way I do it would mean fewer bugs in the long run.

...I may not regardless, but it's a thought that's beginning to haunt me

@lthibault and I decided it'd be informative, at least, to do a quick git bisect from the last tagged release at v0.30 (May 15) through the present, though it will mean I'll have to backport TestPromiseOrdering which was added in #530 to main on Jun 29. Should be very doable.

The bisect will attempt to track down if any particular commit by @zenhack - or anyone else for that matter - stands out as a moment in time when TestPromiseOrdering broke.

tl;dr: if we had TestPromiseOrdering back when v0.30 was released, this issue might have been caught. The idea is to apply TestPromiseOrdering at each commit and see if this theory holds, and we can point at a single commit and say, "this is why TestPromiseOrdering is failing."

davidhubbard · 2023-08-29T03:05:33Z

I've successfully ported TestPromiseOrdering() on top of v0.30 (May 15) by sort-of-cherry-picking @zenhack's #530 of Jun 29. The unit test now passes against that May 15 code.

I've already found and fixed an E-Ordering issue, which appeared immediately when the unit test was first merged on Jun 29.

I'll continue working on a branch. The next step will be to rework the commit on top of Jun 26 which introduces ClientSnapshot. While challenging, I think I see how to do it. At that point I'll be "caught up" with where @zenhack was when he first merged the unit test on Jun 29.

The final step to reach a pull request is to catch up to the present day -- there are a few more commits from 390b049 to 163c79d mostly around the new structured logger interface. They touch a different section of the code.

I'll send it when I'm all caught up. The nice thing about using a side-branch while I time-travel back 3 months (git is so cool), is I can keep running the fixed unit test as I work my way forward in time. The commits in my side-branch will be overwritten. I plan to force-push the branch to bring it forward in time.

That the unit test wasn't working in 7e1bedd suggests @zenhack had a solution way back then. I would've really liked to hear his thoughts. :(

tl;dr: it's looking doable to fix TestPromiseOrdering(), though we're sorely missing @zenhack and will probably not fully grasp his vision.

lthibault · 2023-08-29T15:33:49Z

This is awesome! I'd very much welcome the opportunity to sync up on this with you, at your convenience, just to keep the proverbial bus factor above 1.

Does it make sense to eagerly merge your fix into main?

davidhubbard · 2023-09-01T04:40:00Z

Quick dev update: I've force-pushed on my branch, now only 28 commits behind HEAD.

~~It panics in a different test (TestSendDisembargo) with the impent getting Shutdown() from the Client and releasing the import entirely -- I'm still working on the reference counting.~~

Update: I've force-pushed again and it now passes all tests ... except TestPromiseOrdering, which runs to completion. i.e. TestPromiseOrdering no longer gets the Null client which this issue is all about; instead the problem now is that promises are not fulfilled in E-Order... But this is pretty exciting.

The previous commit is still present until it is garbage-collected.

mologie · 2024-06-13T14:03:31Z

Just chiming in here while investigating #572: I can unfortunately still reproduce the "call on null client" error that this test was suffering from even on @davidhubbard's branch on commit 924f821. If it was related, chances are the changes are just hiding the issue in that specific test now, or my issue is unrelated.

I will try to find a minimal reproducer eventually, as the tool to detect this is currently part of a larger proprietary project with C++ capnp components, and takes 30s+ to hit the bug during a benchmark. Maybe that will contribute towards finding a solution for this and my issue.

lthibault added the bug label Aug 23, 2023

lthibault assigned davidhubbard Aug 23, 2023

lthibault mentioned this issue Oct 12, 2023

zenhack.net is Gone #548

Closed

mologie mentioned this issue Jun 13, 2024

Sporadic error call on null client for first RPC call after upgrading from 3.0.0-alpha.30 to 3.0.1-alpha.1, ~1% chance #572

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Null client received in TestPromiseOrdering #543

Null client received in TestPromiseOrdering #543

lthibault commented Aug 23, 2023 •

edited

Loading

davidhubbard commented Aug 24, 2023 •

edited

Loading

davidhubbard commented Aug 29, 2023

lthibault commented Aug 29, 2023

davidhubbard commented Sep 1, 2023 •

edited

Loading

mologie commented Jun 13, 2024

Null client received in TestPromiseOrdering #543

Null client received in TestPromiseOrdering #543

Comments

lthibault commented Aug 23, 2023 • edited Loading

davidhubbard commented Aug 24, 2023 • edited Loading

davidhubbard commented Aug 29, 2023

lthibault commented Aug 29, 2023

davidhubbard commented Sep 1, 2023 • edited Loading

mologie commented Jun 13, 2024

lthibault commented Aug 23, 2023 •

edited

Loading

davidhubbard commented Aug 24, 2023 •

edited

Loading

davidhubbard commented Sep 1, 2023 •

edited

Loading