feat: replace worker connection's mutex with tbb::concurrent_hash_map
#2461
base: unstable
Conversation
Force-pushed from ee539f1 to b42ca2d
Force-pushed from 8c8951b to 1104241
Would you mind trying redis-benchmark to test some commands? I don't know how this would improve our performance; it would be great to have some data.
Hi @mapleFU, Thanks for your suggestion! Best Regards,
```cpp
  for (const auto &conn : to_be_killed_conns) {
    FreeConnectionByID(conn.first, conn.second);
  }
}

std::vector<int> Worker::getConnFds() const {
  return tbb::parallel_reduce(
```
IMO parallel_for looks suited to CPU-bound operations; would it occupy more threads than expected here?
Hi @mapleFU,
Thanks for your review suggestion.
I read tbb's official documentation and found that the scheduler uses all cores by default. Maybe we should set a limit on the tbb scheduler.
In fact, I'm not sure whether we really need the tbb hashmap to replace the current mutex now. 🤔
Best Regards,
Edward
> In fact, I'm not sure whether we really need the tbb hashmap to replace the current mutex now.

I think it's worth doing if we don't traverse all conns_ frequently.
Hi @mapleFU, It seems the performance changes only slightly. Here is my computer's basic info.
This is the unstable branch's report with commit 6473d6c.
This is current tbb feature branch's report.
Here is the delta report of tbb minus unstable.
Best Regards,
Thanks for your benchmarking! Maybe I'd like to run this and try to perf the mutex contention when I have time. I've checked the benchmark result, but it turns out to be unstable 🤔 The graph below shows that results differ at different percentiles... So maybe we should analyze the performance bottleneck of connection handling and check whether this could bring a performance improvement.
Hi @mapleFU, Thanks very much for your help! 😊 Best Regards,
@mapleFU This PR generally looks good to me. Huge thanks for your efforts @LindaSummer.
I don't have time for this today; I'll take a round tomorrow.
src/server/worker.cc (outdated)

```diff
-  for (const auto &iter : monitor_conns_) {
-    conns.emplace_back(iter.second);
+  auto collect_conns_fn = [](ConnMap &conns) {
+    return parallel_reduce(
```
Is parallel_reduce necessary here?
Hi @PragmaTwice,
I use a reduce action before iteration because parallel_for acquires a range rather than a single item. So, I want to move the time-costly action Close() outside of the parallel_for.
But since this function is inside the dtor, if ownership is managed correctly, it should be the only function using the internal state of the Worker object. I think the parallel_reduce can be replaced by parallel_for or even removed.
I will refactor it and run tests before pushing to the current PR.
Best Regards,
Edward
I mean maybe a single-threaded for is enough here? I didn't have a deep look yet. cc @mapleFU
Hi @PragmaTwice,
I have removed the parallel_reduce from the dtor, and all tests passed in my repo's GitHub Actions.
I think the concurrency protection of internal state in the dtor may be redundant, since all other references should have been released by the time the dtor runs.
Best Regards,
Edward
> I mean maybe a single-threaded for is enough here? I didn't have a deep look yet.

Yeah, a parallel task also has a dop (degree of parallelism), and I guess the default for parallel_for would be high; it is geared toward CPU-bound tasks, like OpenMP. I think maybe a lock is preferred here.
Other part LGTM
Generally LGTM. Maybe I'll test it this week; if the operations are just raw reads/erases, it should work better.
So sorry for the delayed reply.
After another round of thinking, I think:

Pros:
- Maybe faster on connection handling. When the conn mutex is a throughput bottleneck, it could actually make things faster.
- Where the code holds a worker's mutex for a long time, the new tbb map brings a smaller critical section, so front-end user threads get faster responses, which benefits latency.

Cons:
- Would conns be a large number? If there are only a few connections, it brings up inconsistency between monitor_conns_ and conns_. It's much more difficult to reason about correctness than with std::mutex, especially in functions like KickoutIdleClients. It improves the latency blocked by the global lock, but when connections are not that numerous, it doesn't make much sense.

@git-hulk tbb::parallel_for would go here: https://github.com/oneapi-src/oneTBB/blob/2d516c871b79d81032f0ca44d76e5072bc62d479/src/tbb/task_dispatcher.h#L243, which requires a task_scheduler and an independent pool.
Hi @mapleFU, Thanks very much for your review suggestions! 😄
I agree with this concern. Removing the lock will lead to inconsistency between the two connection collections.
Thanks for your investigation! By the way, after reading our codebase, I think the tbb container may be more suitable for containers that only need to protect themselves, rather than additional related state. Best Regards,
```diff
-  redis::Reply(iter->second->Output(), reply);
+  if (ConnMap::accessor accessor; conns_.find(accessor, fd)) {
+    accessor->second->SetLastInteraction();
+    redis::Reply(accessor->second->Output(), reply);
     return Status::OK();
   }

   return {Status::NotOK, "connection doesn't exist"};
 }

 void Worker::BecomeMonitorConn(redis::Connection *conn) {
```
Hi @mapleFU,
Sorry for the delayed reply.
I'd like to use this approach for all accessors to mitigate potential inconsistency between monitor_conns_ and conns_.
Suggested change:

```cpp
void Worker::BecomeMonitorConn(redis::Connection *conn) {
  ConnMap::accessor accessor;
  ConnMap::accessor monitor_accessor;
  bool find_conn = conns_.find(accessor, conn->GetFD());
  bool find_monitor = monitor_conns_.find(monitor_accessor, conn->GetFD());
  if (find_conn) {
    conns_.erase(accessor);
  }
  if (find_monitor) {
    monitor_accessor->second = conn;
  } else {
    monitor_conns_.insert(monitor_accessor, std::make_pair(conn->GetFD(), conn));
  }
  srv->IncrMonitorClientNum();
  conn->EnableFlag(redis::Connection::kMonitor);
}
```
Best Regards,
Edward
Hi @mapleFU, Sorry for the delayed reply. Best Regards,
Quality Gate passed
Issue
#1389
Proposed Changes
- std::map to tbb::concurrent_hash_map
Comment
During my testing and research on tbb, I found that tbb::concurrent_hash_map is not thread-safe for traversing and erasing concurrently. Here is the related issue: oneapi-src/oneTBB#183. So, I have to keep the lock during iteration and erasing.
After reading the tbb code, I found that tbb::concurrent_hash_map should use range() with tbb::parallel_for and tbb::parallel_reduce for traversing.