Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Client hang on KEYS command #878

Open
keithchew opened this issue Oct 20, 2024 · 1 comment
Open

[BUG] Client hang on KEYS command #878

keithchew opened this issue Oct 20, 2024 · 1 comment

Comments

@keithchew
Copy link

keithchew commented Oct 20, 2024

Testing on async_flash branch, this is a strange one.

I have a test client calling KEYS repeatedly, and other clients (group A) not calling the KEYS command. In order to trigger the bug, group A clients are periodically disconnected and reconnected to the server. When the bug happens, the client calling the KEYS will not get a reply and just hangs waiting.

I have added additional logs to the keysCommand() method in db.cpp, it appears that the code completes the method (ie enters and exits) and did not get blocked anywhere. I am currently narrowing this down as to why there is no reply to the client...

Just logging an issue first and will report more when I have tracked it down.

@keithchew
Copy link
Author

keithchew commented Oct 21, 2024

After further stress testing, it appears the KEYS command is very unstable. There are layers of locks on the async code path and I have encountered either crashes and freezes. Another freeze case here:

7:61:M 20 Oct 2024 22:53:29.818 # === ASSERTION FAILED ===
7:61:M 20 Oct 2024 22:53:29.818 # ==> networking.cpp:252 'FCorrectThread(c)' is not true

------ STACK TRACE ------

Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(clientInstallWriteHandler(client*)+0x204) [0x55b4e5bbc544]
/opt/KeyDB/bin/keydb-server *:6379(prepareClientToWrite(client*)+0x208) [0x55b4e5bbe8c8]
/opt/KeyDB/bin/keydb-server *:6379(addReplyProto(client*, char const*, unsigned long)+0x17) [0x55b4e5bbec77]
/opt/KeyDB/bin/keydb-server *:6379(keysCommandCore(client*, redisDbPersistentDataSnapshot const*, char*, char*)+0x284) [0x55b4e5bd79e4]
/opt/KeyDB/bin/keydb-server *:6379(+0x26eaa7) [0x55b4e5be0aa7]
/opt/KeyDB/bin/keydb-server *:6379(AsyncWorkQueue::WorkerThreadMain()+0x448) [0x55b4e5ccfd48]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7f988be36df4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f988bbdb609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f988bafe353]

On my local, I have disabled the async code in keysCommand, ie always complete the command in sync mode. So far, it is stable, so there is definitely something fishy going on in the async code flow. Reviewing the code, this is the only place where it creates a fake client and writes the replies from the fake client to the real client. I believe this approach is messing up the low level code (in networking.cpp) for locking and sending replies to the client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant