Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEBDUTY-965: fix thread sanitizer errors in ut #883

Merged
merged 4 commits into from
Apr 4, 2024

Conversation

budevg
Copy link
Collaborator

@budevg budevg commented Apr 3, 2024

  • NEBDUTY-965: promise object passed as pointer to different thread released too early
  • NEBDUTY-965: Remove unsafe debug code in vhost-server aio backend
  • NEBDUTY-965: help tsan to deduce happens-before relation properly in vhost-server
  • NEBDUTY-965: avoid data race when reading WaitMode from rdma client config

…eased too early

promise object is passed as pointer to different thread which will invoke
SetValue on promise. The main thread which is waiting in
`promise.GetFuture().Wait()` will get control and delete the promise variable
keeping dangling pointer in the other thread.

Since the calling thread won't access the dangling pointer nothing bad should happen
but this triggers failure of unit tests compiled with tsan
This code is unsafe because it attempts to print the contents of a batch after
it has been submitted. This can lead to use-after-free errors if the IOs in the
batch are freed before the print function is called.

Additionally, this code caused
`make --sanitize=address -tt cloud/blockstore/vhost-server/ut`
to fail due to use-after-free errors.
…vhost-server

vhost-server unit tests with tsan fail with error similar to

```
WARNING: ThreadSanitizer: data race (pid=2340732)
  Read of size 8 at 0x7b1c000070b0 by thread T2:
    #0 NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::CompletionThreadFunc() /cloud/blockstore/vhost-server/backend_aio.cpp:391:46 (cloud-blockstore-vhost-server-ut+0x1e9def9) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #1 NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0::operator()() const /cloud/blockstore/vhost-server/backend_aio.cpp:229:45 (cloud-blockstore-vhost-server-ut+0x1e9d5a5) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #2 decltype(std::declval<NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0>()()) std::__y1::__invoke[abi:v15000]<NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0>(NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0&&) /contrib/libs/cxxsupp/libcxx/include/__functional/invoke.h:403:23 (cloud-blockstore-vhost-server-ut+0x1e9d505) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #3 void std::__y1::__thread_execute[abi:v15000]<std::__y1::unique_ptr<std::__y1::__thread_struct, std::__y1::default_delete<std::__y1::__thread_struct>>, NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0>(std::__y1::tuple<std::__y1::unique_ptr<std::__y1::__thread_struct, std::__y1::default_delete<std::__y1::__thread_struct>>, NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0>&, std::__y1::__tuple_indices<>) /contrib/libs/cxxsupp/libcxx/include/thread:284:5 (cloud-blockstore-vhost-server-ut+0x1e9d3cd) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #4 void* std::__y1::__thread_proxy[abi:v15000]<std::__y1::tuple<std::__y1::unique_ptr<std::__y1::__thread_struct, std::__y1::default_delete<std::__y1::__thread_struct>>, NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::Start()::$_0>>(void*) /contrib/libs/cxxsupp/libcxx/include/thread:295:5 (cloud-blockstore-vhost-server-ut+0x1e9cb72) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
  Previous write of size 8 at 0x7b1c000070b0 by thread T7:
    #0 calloc /contrib/libs/clang16-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:712:5 (cloud-blockstore-vhost-server-ut+0x1fd12e1) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #1 NCloud::NBlockStore::NVHostServer::PrepareIO(TLog&, TVector<NCloud::NBlockStore::NVHostServer::TAioDevice, std::__y1::allocator<NCloud::NBlockStore::NVHostServer::TAioDevice>> const&, vhd_io*, TVector<iocb*, std::__y1::allocator<iocb*>>&, unsigned long) /cloud/blockstore/vhost-server/request_aio.cpp:152:43 (cloud-blockstore-vhost-server-ut+0x1ee9700) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #2 NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::PrepareBatch(vhd_request_queue*, TVector<iocb*, std::__y1::allocator<iocb*>>&, unsigned long) /cloud/blockstore/vhost-server/backend_aio.cpp:320:9 (cloud-blockstore-vhost-server-ut+0x1ea5e52) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #3 NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TAioBackend::ProcessQueue(unsigned int, vhd_request_queue*, NCloud::NBlockStore::NVHostServer::TStats<unsigned long>&) /cloud/blockstore/vhost-server/backend_aio.cpp:256:32 (cloud-blockstore-vhost-server-ut+0x1e9a3f7) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #4 NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TServer::QueueThreadFunc(unsigned int) /cloud/blockstore/vhost-server/server.cpp:252:18 (cloud-blockstore-vhost-server-ut+0x1f36661) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #5 NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TServer::Start(NCloud::NBlockStore::NVHostServer::TOptions const&)::$_0::operator()() const /cloud/blockstore/vhost-server/server.cpp:173:47 (cloud-blockstore-vhost-server-ut+0x1f362ed) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)
    #6 decltype(std::declval<NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TServer::Start(NCloud::NBlockStore::NVHostServer::TOptions const&)::$_0>()()) std::__y1::__invoke[abi:v15000]<NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TServer::Start(NCloud::NBlockStore::NVHostServer::TOptions const&)::$_0>(NCloud::NBlockStore::NVHostServer::(anonymous namespace)::TServer::Start(NCloud::NBlockStore::NVHostServer::TOptions const&)::$_0&&) /contrib/libs/cxxsupp/libcxx/include/__functional/invoke.h:403:23 (cloud-blockstore-vhost-server-ut+0x1f36235) (BuildId: 8e691127638c86e2d0906170455404ea4e367474)

```

This is since TSAN doesn't understand that request allocated in VHOST thread is
later read in AIO thread when completion happens. This patch adds explicit tsan
annotations to mitigate these false positives
@budevg budevg changed the base branch from main to stable-23-3 April 3, 2024 12:36
@budevg budevg changed the title users/evgenybud/merge 23 3 v1 NEBDUTY-965: fix thread sanitizer errors in ut Apr 3, 2024
…onfig

Rdma client config can be modified during initial hand shake that's why CM
thread overwrites it by original config when Qp created  in
`TClientEndpoint::CreateQP`

thread sanitizer reports errors since this Config.WaitMode is used in
completion poller thread. To avoid this error WaitMode is read only once during
the initialization and of the endpoint and we don't modify it any more

Thread sanitizer error looks like that
```
WARNING: ThreadSanitizer: data race (pid=2354960)
  Write of size 8 at 0x7b5c00000118 by thread T2:
    #0 __tsan_memcpy contrib/libs/clang16-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:3165:3 (cloud-blockstore-libs-rdma-impl-ut+0x22f79f4) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #1 NCloud::NBlockStore::NRdma::(anonymous namespace)::TClientEndpoint::CreateQP() cloud/blockstore/libs/rdma/impl/client.cpp:591:16 (cloud-blockstore-libs-rdma-impl-ut+0x49e974a) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #2 NCloud::NBlockStore::NRdma::(anonymous namespace)::TClient::BeginConnect(NCloud::NBlockStore::NRdma::(anonymous namespace)::TClientEndpoint*) cloud/blockstore/libs/rdma/impl/client.cpp:1876:19 (cloud-blockstore-libs-rdma-impl-ut+0x49e85dc) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #3 NCloud::NBlockStore::NRdma::(anonymous namespace)::TClient::HandleConnectionEvent(std::__y1::unique_ptr<rdma_cm_event, int (*)(rdma_cm_event*)>) cloud/blockstore/libs/rdma/impl/client.cpp:1736:13 (cloud-blockstore-libs-rdma-impl-ut+0x49bbfb0) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #4 non-virtual thunk to NCloud::NBlockStore::NRdma::(anonymous namespace)::TClient::HandleConnectionEvent(std::__y1::unique_ptr<rdma_cm_event, int (*)(rdma_cm_event*)>) cloud/blockstore/libs/rdma/impl/client.cpp (cloud-blockstore-libs-rdma-impl-ut+0x49bc761) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #5 NCloud::NBlockStore::NRdma::(anonymous namespace)::TConnectionPoller::HandleConnectionEvents() cloud/blockstore/libs/rdma/impl/client.cpp:1282:27 (cloud-blockstore-libs-rdma-impl-ut+0x49d3dca) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #6 NCloud::NBlockStore::NRdma::(anonymous namespace)::TConnectionPoller::ThreadProc() cloud/blockstore/libs/rdma/impl/client.cpp:1254:25 (cloud-blockstore-libs-rdma-impl-ut+0x49d3870) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #7 non-virtual thunk to NCloud::NBlockStore::NRdma::(anonymous namespace)::TConnectionPoller::ThreadProc() cloud/blockstore/libs/rdma/impl/client.cpp (cloud-blockstore-libs-rdma-impl-ut+0x49d39d9) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #8 void* (anonymous namespace)::ThreadProcWrapper<ISimpleThread>(void*) util/system/thread.cpp:383:45 (cloud-blockstore-libs-rdma-impl-ut+0x2680796) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #9 (anonymous namespace)::TPosixThread::ThreadProxy(void*) util/system/thread.cpp:244:20 (cloud-blockstore-libs-rdma-impl-ut+0x268b82e) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
  Previous read of size 4 at 0x7b5c00000118 by thread T1:
    #0 NCloud::NBlockStore::NRdma::(anonymous namespace)::TClientEndpoint::AbortRequests() cloud/blockstore/libs/rdma/impl/client.cpp:814:16 (cloud-blockstore-libs-rdma-impl-ut+0x49c4494) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #1 NCloud::NBlockStore::NRdma::(anonymous namespace)::TCompletionPoller::HandlePollEvent(epoll_event const&) cloud/blockstore/libs/rdma/impl/client.cpp:1428:31 (cloud-blockstore-libs-rdma-impl-ut+0x49c4042) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #2 NCloud::NBlockStore::NRdma::(anonymous namespace)::TCompletionPoller::HandlePollEvents() cloud/blockstore/libs/rdma/impl/client.cpp:1446:17 (cloud-blockstore-libs-rdma-impl-ut+0x49c3736) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #3 void NCloud::NBlockStore::NRdma::(anonymous namespace)::TCompletionPoller::Execute<(NCloud::NBlockStore::NRdma::EWaitMode)0>() cloud/blockstore/libs/rdma/impl/client.cpp:1530:21 (cloud-blockstore-libs-rdma-impl-ut+0x49c3341) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #4 NCloud::NBlockStore::NRdma::(anonymous namespace)::TCompletionPoller::ThreadProc() cloud/blockstore/libs/rdma/impl/client.cpp:1398:17 (cloud-blockstore-libs-rdma-impl-ut+0x49c1c18) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #5 non-virtual thunk to NCloud::NBlockStore::NRdma::(anonymous namespace)::TCompletionPoller::ThreadProc() cloud/blockstore/libs/rdma/impl/client.cpp (cloud-blockstore-libs-rdma-impl-ut+0x49c1cf9) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #6 void* (anonymous namespace)::ThreadProcWrapper<ISimpleThread>(void*) util/system/thread.cpp:383:45 (cloud-blockstore-libs-rdma-impl-ut+0x2680796) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
    #7 (anonymous namespace)::TPosixThread::ThreadProxy(void*) util/system/thread.cpp:244:20 (cloud-blockstore-libs-rdma-impl-ut+0x268b82e) (BuildId: 49c919960fac17dc20567e23b8a866c9294bf128)
```
@budevg budevg force-pushed the users/evgenybud/merge-23-3-v1 branch from e412341 to ff0d3a0 Compare April 3, 2024 15:41
Copy link
Contributor

github-actions bot commented Apr 3, 2024

Note

This is an automated comment that will be appended during run.

🟢 default-linux-x86-64-relwithdebinfo: all tests PASSED for commit ff0d3a0.

TESTS PASSED ERRORS FAILED SKIPPED MUTED?
1519 1519 0 0 0 0

@budevg budevg requested a review from tpashkin April 3, 2024 16:51
@budevg budevg merged commit 2e0f0f2 into stable-23-3 Apr 4, 2024
6 checks passed
@budevg budevg deleted the users/evgenybud/merge-23-3-v1 branch April 4, 2024 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants