Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CRASH] prepareClientToWrite #867

Open
keithchew opened this issue Sep 25, 2024 · 2 comments
Open

[CRASH] prepareClientToWrite #867

keithchew opened this issue Sep 25, 2024 · 2 comments

Comments

@keithchew
Copy link

Testing on async_flash branch, but this bug has been around before that:

7:63:M 25 Sep 2024 22:19:36.640 # === ASSERTION FAILED ===
7:63:M 25 Sep 2024 22:19:36.640 # ==> networking.cpp:300 'c->conn == nullptr || c->lock.fOwnLock()' is not true

------ STACK TRACE ------

Backtrace:
/opt/KeyDB/bin/keydb-server *:6379(prepareClientToWrite(client*)+0x139) [0x55bb6d4c2189]
/opt/KeyDB/bin/keydb-server *:6379(addReplyProto(client*, char const*, unsigned long)+0x17) [0x55bb6d4c2307]
/opt/KeyDB/bin/keydb-server *:6379(keysCommandCore(client*, redisDbPersistentDataSnapshot const*, char*, char*)+0x1a8) [0x55bb6d4db0d8]
/opt/KeyDB/bin/keydb-server *:6379(AsyncWorkQueue::WorkerThreadMain()+0x450) [0x55bb6d5d0110]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6df4) [0x7fbb43866df4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fbb4360b609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fbb4352e353]

Looking at the code, it looks like the assert check should be done after the fast exit checks? ie:

...
    if (!c->conn) return C_ERR; /* Fake client for AOF loading. */

    bool fAsync = !FCorrectThread(c);  <--- MOVE this from start of method above to here
...

This will not fix the crash if the fast exit checks pass, so will try it on my local first and see if the crash happens again.

@keithchew keithchew changed the title [CRASH] [CRASH] prepareClientToWrite Sep 25, 2024
@keithchew
Copy link
Author

I can confirm that the above did not solve the root cause of the crash. I reviewed the code a bit more, and here is likely the correct fix in keysCommandCore() in db.cpp:

    aeAcquireLock();
    std::unique_lock<decltype(cIn->lock)> lock(cIn->lock); <---- ADD THIS
    addReplyProto(cIn, c->buf, c->bufpos);
...
    lock.unlock(); <---- ADD THIS
    aeReleaseLock();

The above will ensure the

c->lock.fOwnLock()

condition is satisfied. Will continue testing to make sure all is stable.

@keithchew
Copy link
Author

keithchew commented Oct 3, 2024

After reviewing the code a bit more on locks, I have tidied up the above a little:

  AeLocker ae;
  std::unique_lock<decltype(cIn->lock)> lock(cIn->lock, std::defer_lock);
  bool fAsync = !FCorrectThread(cIn); 
  if (!fAsync) {
    lock.lock();
    ae.arm(cIn);
  } else {
    aeAcquireLock();
  }
...
  if (!fAsync) {
    ae.disarm();
    lock.unlock();
  } else {
    aeReleaseLock();
  }

Based on my testing, so far the keys command is solid and no crashes or deadlocks...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant