Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blocking commands may block permanently due to lost wakeup #2473

Open
1 of 2 tasks
PokIsemaine opened this issue Aug 7, 2024 · 4 comments
Open
1 of 2 tasks

Blocking commands may block permanently due to lost wakeup #2473

PokIsemaine opened this issue Aug 7, 2024 · 4 comments
Labels
bug type bug

Comments

@PokIsemaine
Copy link
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Version

unstable

Minimal reproduce step

I discovered this issue while testing a PR(#2332), even though the PR increased the likelihood of it occurring, the problem actually existed beforehand.
It is difficult to reproduce manually without modifying the code, and it almost exclusively occurs during Go's integration tests.
Here's an example with TestRegression:

func TestRegression(t *testing.T) {
    srv := util.StartServer(t, map[string]string{})
    defer srv.Close()

    ctx := context.Background()
    rdb := srv.NewClient()
    defer func() { require.NoError(t, rdb.Close()) }()

    c := srv.NewTCPClient()
    defer func() { require.NoError(t, c.Close()) }()

    proto := "*3\r\n$5\r\nBLPOP\r\n$6\r\nhandle\r\n$1\r\n0\r\n"
    require.NoError(t, c.Write(fmt.Sprintf("%s%s", proto, proto)))

    resList := []string{"*2", "$6", "handle", "$1", "a"}

    v := rdb.RPush(ctx, "handle", "a")
    require.EqualValues(t, 1, v.Val())
    for _, res := range resList {
        c.MustRead(t, res)
    }

    v = rdb.RPush(ctx, "handle", "a")
    require.EqualValues(t, 1, v.Val())

    for _, res := range resList {
        c.MustRead(t, res)
    }
}

By printing some information, I was able to trace the error:

Start Execute BPOP
TryPopFormList

CommandPush::Execute
WakeupBlockingConns: handle, n_conns: 1
CommandPush::Execute OK

BlockingCommander::StartBlocking
BlockKeys

The issue occurs here:

// CommandBPop
  Status Execute(Server *srv, Connection *conn, std::string *output) override {
    srv_ = srv;
    InitConnection(conn);

    auto s = TryPopFromList();
    if (s.ok() || !s.IsNotFound()) {
      return Status::OK();  // error has already output in TryPopFromList
    }
    // <==HERE: Push is done after TryPopFromList, but not yet Blocking
    return StartBlocking(timeout_, output);
  }

To manually reproduce the issue consistently, you can add a sleep and complete a PUSH within 5 seconds:

  Status Execute(Server *srv, Connection *conn, std::string *output) override {
    srv_ = srv;
    InitConnection(conn);

    auto s = TryPopFromList();
    if (s.ok() || !s.IsNotFound()) {
      return Status::OK();  // error has already output in TryPopFromList
    }
    std::cout << "Sleep for 5 seconds" << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(5));
    std::cout << "Wake up" << std::endl;
    return StartBlocking(timeout_, output);
  }

image

What did you expect to see?

Blocking commands will not be blocked after new key values ​​are added.

What did you see instead?

Blocking command blocks permanently.

Anything Else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@git-hulk
Copy link
Member

git-hulk commented Aug 7, 2024

@PokIsemaine I'm not sure if I understood your point correctly:

  1. Client-A sends B[L|R]POP command to Kvrocks and no element in the list at that time
  2. Client-B publishes an element but Client-A hasn't entered the blocking mode, so Client-B's message won't wake up the Client-A.
  3. Client-A needs to be pending until the next publish message

Is that right?

@PokIsemaine
Copy link
Contributor Author

@PokIsemaine I'm not sure if I understood your point correctly:

  1. Client-A sends B[L|R]POP command to Kvrocks and no element in the list at that time
  2. Client-B publishes an element but Client-A hasn't entered the blocking mode, so Client-B's message won't wake up the Client-A.
  3. Client-A needs to be pending until the next publish message

Is that right?

Yeah

@git-hulk
Copy link
Member

git-hulk commented Aug 7, 2024

I see. Thank you for raising this issue. I knew this issue indeed, but didn't find a proper way to solve the data race. It's nice to have an issue to track this.

PokIsemaine added a commit to PokIsemaine/kvrocks that referenced this issue Aug 7, 2024
@PokIsemaine
Copy link
Contributor Author

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug type bug
Projects
None yet
Development

No branches or pull requests

2 participants