-
Notifications
You must be signed in to change notification settings - Fork 136
uv_append: Initialize barrier upon creation. #462
Conversation
Signed-off-by: Mathieu Borderé <[email protected]>
We haven't released anything since then, but I'd rather merge this quickly as LXD builds from master, will just merge it, looks like an obvious improvement. |
Wow, this is a nasty bug! |
yeah :-( |
Was the snapshot throttling experiment run with the un-fixed code? IOW does this bug explain why latency increased instead of decreasing? Or is there something else yet to be investigated? |
I ran those tests on June 21st, which predates the work that introduced that bug. I'll revisit the throttling anyway as I want to understand the mechanism behind the phenomenon and double-check I didn't mess something up during the test :-) |
Mathieu re-ran the benchmark against a raft commit that wasn't affected by this issue, and confirmed that tail latency for client requests improves when snapshotting does less I/O. |
yes, but I think Free is talking about the throttling of the snapshot write. |
@cole-miller @MathieuBordere btw even after our emails and conversations I still don't know what the real-world problem being solved is, and/or the details about possible performance issues that Canonical is hitting with dqlite (e.g. which application has issues? what are these issues exactly? was the underlying problem/cause unequivocally identified and measured?). However, assuming that Canonical has an application for which:
then one would reasonably expect that throttling the snapshot I/O to a low enough rate should be enough to maintain latency within acceptable values, even without implementing incremental snapshots in order to reduce the total amount of I/O being performed (vs just its rate). Compression via lz4 should probably be disabled in order to reduce CPU load, as already mentioned by Mathieu IIRC. This expectation is based on the fact that I/O should be the only resource that becomes contended when taking a snapshot without compression, while CPU and memory should be fine (I believe the only cost involved there is a memory copy of the WAL, since the database pages are not being copied anymore these days after canonical/dqlite#356). Or are there other impactful costs, e.g. checkpoint-related? If assumption 1. is not true, one would intuitively look at a disk-based dqlite FSM as a solution. However, in case of really large datasets, dqlite might not be an appropriate database choice anymore considering that:
unless of course one plans to change/fix those two aspects too. |
Will rerun the throttling tests with code from this tree https://github.com/MathieuBordere/raft/tree/throttle It's hacked together, so don't pay too much attention to the quality ... |
Thanks, didn't look at the details but looks good. It'd be interesting to experiment a bit with different values of sleep time and batch size, to see how they affect latency. Expectation would be of course that the smaller the batch size and the higher the sleep time, the lower will be the latency. If results don't match the expectation and latency is perfectly stable without snapshots at all, then there's probably something else going on that needs investigation and understanding. |
Fixes #461.
barrier->blocking
was not initialized, resulting inbarrier->blocking
beingtrue
in most of the cases, causing the effect seen in the issue.The tests didn't catch the bug as the barriers are properly initialized in the tests.
edit: not going to spend time on test for now.
Will still add a test that uses a slowUvSnapshot
call and ensures append requests get through.