Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null client received in TestPromiseOrdering #543

Open
lthibault opened this issue Aug 23, 2023 · 5 comments
Open

Null client received in TestPromiseOrdering #543

lthibault opened this issue Aug 23, 2023 · 5 comments
Assignees
Labels

Comments

@lthibault
Copy link
Collaborator

lthibault commented Aug 23, 2023

Diff of test case.

@davidhubbard
Copy link
Collaborator

davidhubbard commented Aug 24, 2023

For context, Apr 28 @zenhack wrote the following to @lthibault in the go-capnp Matrix channel:

how tolerant are you to regressions? I'm getting to the point where I'm debating whether just gutting some sections of this and re-implemening, getting the bones right without worrying about keeping it bug-free every step of the way, and then hunting stuff down later would make this easier. Hopefully the way I do it would mean fewer bugs in the long run.

...I may not regardless, but it's a thought that's beginning to haunt me

@lthibault and I decided it'd be informative, at least, to do a quick git bisect from the last tagged release at v0.30 (May 15) through the present, though it will mean I'll have to backport TestPromiseOrdering which was added in #530 to main on Jun 29. Should be very doable.

The bisect will attempt to track down if any particular commit by @zenhack - or anyone else for that matter - stands out as a moment in time when TestPromiseOrdering broke.

tl;dr: if we had TestPromiseOrdering back when v0.30 was released, this issue might have been caught. The idea is to apply TestPromiseOrdering at each commit and see if this theory holds, and we can point at a single commit and say, "this is why TestPromiseOrdering is failing."

@davidhubbard
Copy link
Collaborator

I've successfully ported TestPromiseOrdering() on top of v0.30 (May 15) by sort-of-cherry-picking @zenhack's #530 of Jun 29. The unit test now passes against that May 15 code.

I've already found and fixed an E-Ordering issue, which appeared immediately when the unit test was first merged on Jun 29.

I'll continue working on a branch. The next step will be to rework the commit on top of Jun 26 which introduces ClientSnapshot. While challenging, I think I see how to do it. At that point I'll be "caught up" with where @zenhack was when he first merged the unit test on Jun 29.

The final step to reach a pull request is to catch up to the present day -- there are a few more commits from 390b049 to 163c79d mostly around the new structured logger interface. They touch a different section of the code.

I'll send it when I'm all caught up. The nice thing about using a side-branch while I time-travel back 3 months (git is so cool), is I can keep running the fixed unit test as I work my way forward in time. The commits in my side-branch will be overwritten. I plan to force-push the branch to bring it forward in time.

That the unit test wasn't working in 7e1bedd suggests @zenhack had a solution way back then. I would've really liked to hear his thoughts. :(

tl;dr: it's looking doable to fix TestPromiseOrdering(), though we're sorely missing @zenhack and will probably not fully grasp his vision.

@lthibault
Copy link
Collaborator Author

This is awesome! I'd very much welcome the opportunity to sync up on this with you, at your convenience, just to keep the proverbial bus factor above 1.

Does it make sense to eagerly merge your fix into main?

@davidhubbard
Copy link
Collaborator

davidhubbard commented Sep 1, 2023

Quick dev update: I've force-pushed on my branch, now only 28 commits behind HEAD.

It panics in a different test (TestSendDisembargo) with the impent getting Shutdown() from the Client and releasing the import entirely -- I'm still working on the reference counting.

Update: I've force-pushed again and it now passes all tests ... except TestPromiseOrdering, which runs to completion. i.e. TestPromiseOrdering no longer gets the Null client which this issue is all about; instead the problem now is that promises are not fulfilled in E-Order... But this is pretty exciting.

The previous commit is still present until it is garbage-collected.

@mologie
Copy link

mologie commented Jun 13, 2024

Just chiming in here while investigating #572: I can unfortunately still reproduce the "call on null client" error that this test was suffering from even on @davidhubbard's branch on commit 924f821. If it was related, chances are the changes are just hiding the issue in that specific test now, or my issue is unrelated.

I will try to find a minimal reproducer eventually, as the tool to detect this is currently part of a larger proprietary project with C++ capnp components, and takes 30s+ to hit the bug during a benchmark. Maybe that will contribute towards finding a solution for this and my issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants