Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ race condition with Communicator::stringToProxy and Communicator::destroy #2585

Closed
externl opened this issue Jul 29, 2024 · 4 comments · Fixed by #2784
Closed

C++ race condition with Communicator::stringToProxy and Communicator::destroy #2585

externl opened this issue Jul 29, 2024 · 4 comments · Fixed by #2784
Assignees
Labels
Milestone

Comments

@externl
Copy link
Member

externl commented Jul 29, 2024

While porting Swift to use async/await I got this crash

* thread #2, queue = 'com.apple.root.default-qos.cooperative', stop reason = EXC_BAD_ACCESS (code=1, address=0x5f)
  * frame #0: 0x0000000105081c6c TestDriver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__is_long[abi:ue170006](this="") const at string:1734:33
    frame #1: 0x0000000105081c1c TestDriver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__get_pointer[abi:ue170006](this="") const at string:1869:17
    frame #2: 0x0000000105081bdc TestDriver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::data[abi:ue170006](this="") const at string:1559:73
    frame #3: 0x00000001050ca1c4 TestDriver`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:ue170006]<char, std::__1::char_traits<char>, std::__1::allocator<char>>(__os=0x000000016af15020, __str="") at ostream:1093:56
    frame #4: 0x00000001051272c4 TestDriver`IceInternal::EndpointI::initWithOptions(this=0x000060000166c418, args=size=2) at EndpointI.cpp:49:17
    frame #5: 0x0000000105138c54 TestDriver`IceInternal::IPEndpointI::initWithOptions(this=0x000060000166c418, args=size=2, oaEndpoint=false) at IPEndpointI.cpp:378:16
    frame #6: 0x000000010546ed10 TestDriver`IceInternal::TcpEndpointFactory::create(this=0x00006000024743d8, args=size=2, oaEndpoint=false) const at TcpEndpointI.cpp:364:12
    frame #7: 0x000000010512328c TestDriver`IceInternal::EndpointFactoryManager::create(this=0x0000600000474398, str="tcp -p 12010", oaEndpoint=false) const at EndpointFactoryManager.cpp:108:35
    frame #8: 0x00000001053fdeb8 TestDriver`IceInternal::ReferenceFactory::create(this=0x0000600001f74018, str="test:tcp -p 12010", propertyPrefix="") at ReferenceFactory.cpp:469:74
    frame #9: 0x00000001050989e8 TestDriver`Ice::Communicator::_stringToProxy(this=0x0000600003f4c658, s="test:tcp -p 12010") const at Communicator.cpp:84:43
    frame #10: 0x00000001056e37d0 TestDriver`std::__1::optional<Ice::ObjectPrx> Ice::Communicator::stringToProxy<Ice::ObjectPrx, true>(this=0x0000600003f4c658, str="test:tcp -p 12010") const at Communicator.h:95:30
    frame #11: 0x00000001056e35c8 TestDriver`-[ICECommunicator stringToProxy:error:](self=0x0000600002a703a0, _cmd="stringToProxy:error:", str="test:tcp -p 12010", error=0x000000016af16a70) at Communicator.mm:51:39
    frame #12: 0x0000000104fab6bc TestDriver`closure #1 in CommunicatorI.stringToProxyImpl<ProxyImpl>(self=0x0000600000e50d80, str="test:tcp -p 12010") at CommunicatorI.swift:251:46
    frame #13: 0x0000000104fad9c4 TestDriver`partial apply for closure #1 in CommunicatorI.stringToProxyImpl<A>(_:) at <compiler-generated>:0
    frame #14: 0x00000001a2d1c54c libswiftObjectiveC.dylib`ObjectiveC.autoreleasepool<τ_0_0>(invoking: () throws -> τ_0_0) throws -> τ_0_0 + 64
    frame #15: 0x0000000104fab14c TestDriver`CommunicatorI.stringToProxyImpl<ProxyImpl>(str="test:tcp -p 12010", self=0x0000600000e50d80) at CommunicatorI.swift:250:20
    frame #16: 0x0000000104fa6ca8 TestDriver`CommunicatorI.stringToProxy(str="test:tcp -p 12010", self=0x0000600000e50d80) at CommunicatorI.swift:48:13
    frame #17: 0x0000000104fabb64 TestDriver`protocol witness for Communicator.stringToProxy(_:) in conformance CommunicatorI at <compiler-generated>:0
    frame #18: 0x00000001057c0358 TestDriver`closure #1 in allTests(output=0x0000600002860230, comm=0x0000600000e50d80, ref="test:tcp -p 12010") at AllTests.swift:60:36
    frame #19: 0x00000001057c0d44 TestDriver`partial apply for closure #1 in allTests(_:) at <compiler-generated>:0
    frame #20: 0x0000000104fa2214 TestDriver`thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) at <compiler-generated>:0
    frame #21: 0x0000000104fa2370 TestDriver`partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) at <compiler-generated>:0

The code in question was running stringToProxy and destroy at the same time.

Task {
    try communicator.stringToProxy(ref)
}
communicator.destory()
@externl externl added the cpp label Jul 29, 2024
@externl externl added this to the 3.8.0 milestone Jul 29, 2024
@pepone
Copy link
Member

pepone commented Aug 14, 2024

It is not clear to me that this is a race condition issue. Doesn't Task create a detached task that could run after main exits?

@externl
Copy link
Member Author

externl commented Aug 14, 2024

That was just a snippet of the test that was failing. It was not exiting main for a while after.

It's reproducible in just C++ .

#include <Ice/Ice.h>

int main(int argc, char *argv[])
{
    int count = 0;
    for (;;)
    {
        Ice::CommunicatorHolder ich(argc, argv);
        auto communicator = ich.communicator();

        std::thread t([communicator]
                      {
                          for (;;)
                          {
                            try {
                                communicator->stringToProxy("test:tcp -h 127.0.0.1 -p 10000");
                            } catch (Ice::CommunicatorDestroyedException &e) {
                                break;
                            }

                          } });
        communicator->destroy();
        t.join();
    }

    return 0;
}
❯ clang++ -std=c++17 test.cpp -lIce -I/Users/joe/Developer/zeroc-ice/ice/cpp/include -I/Users/joe/Developer/zeroc-ice/ice/cpp/include/generated -L/Users/joe/Developer/zeroc-ice/ice/cpp/lib -Wl,-rpath,/Users/joe/Developer/zeroc-ice/ice/cpp/lib  -o test

❯ ./test
fish: Job 1, './test' terminated by signal SIGSEGV (Address boundary error)

@pepone pepone self-assigned this Aug 14, 2024
@externl
Copy link
Member Author

externl commented Aug 14, 2024

After running this a bunch there seems to be a variety of failures you can get.

  • invalid endpoint
  • SIGSEGV (Address boundary error)
  • EXC_BAD_ACCESS

@pepone
Copy link
Member

pepone commented Aug 14, 2024

One issue is that endpoint factories set the instance member to nullptr during destroy, but access it without chekcing.

For example:

void
IceInternal::TcpEndpointFactory::destroy()
{
_instance = nullptr;
}

and:

string
IceInternal::TcpEndpointFactory::protocol() const
{
return _instance->protocol();
}

A second issue is the endpoint factory manager clears the factories vector, while it is in use:

For example:

IceInternal::EndpointFactoryManager::destroy()
{
for (vector<EndpointFactoryPtr>::size_type i = 0; i < _factories.size(); i++)
{
_factories[i]->destroy();
}
_factories.clear();
}

and:

lock_guard lock(_mutex); // TODO: Necessary?
//
// TODO: Optimize with a map?
//
for (vector<EndpointFactoryPtr>::size_type i = 0; i < _factories.size(); i++)
{
if (_factories[i]->protocol() == protocol)
{
factory = _factories[i];
}
}

We can fix this one by holding the mutex in destroy.

For "invalid endpoint" it is probably because, the factory was destroyed and removed, before it has a chance to parse the endpoint. We can fix ReferenceFactory to throw CommunicatorDestroyedException in this case.

I also got this crash with the same test:

* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x180)
  * frame #0: 0x00000001023abb00 libIce.38a0.dylib`std::__1::shared_ptr<IceInternal::DefaultsAndOverrides>::get[abi:ue170006](this=0x0000000000000180) const at shared_ptr.h:871:16
    frame #1: 0x000000010239bee0 libIce.38a0.dylib`std::__1::shared_ptr<IceInternal::DefaultsAndOverrides>::operator bool[abi:ue170006](this=0x0000000000000180) const at shared_ptr.h:903:16
    frame #2: 0x000000010239be80 libIce.38a0.dylib`IceInternal::Instance::defaultsAndOverrides(this=0x0000000000000000) const at Instance.cpp:289:5
    frame #3: 0x000000010259234c libIce.38a0.dylib`IceInternal::ProtocolInstance::defaultSourceAddress(this=0x0000600000b14518) const at ProtocolInstance.cpp:103:23
    frame #4: 0x0000000102351264 libIce.38a0.dylib`IceInternal::IPEndpointI::initWithOptions(this=0x0000600001914018, args=size=0, oaEndpoint=false) at IPEndpointI.cpp:408:56

(lldb) frame select 3
frame #3: 0x00000001025922a8 libIce.38a0.dylib`IceInternal::ProtocolInstance::defaultSourceAddress(this=0x0000600001f80218) const at ProtocolInstance.cpp:103:23
   100 	const Address&
   101 	IceInternal::ProtocolInstance::defaultSourceAddress() const
   102 	{
-> 103 	    return _instance->defaultsAndOverrides()->defaultSourceAddress;
   104 	}
   105 	
   106 	const EncodingVersion&
(lldb) print _instance
(const IceInternal::InstancePtr) nullptr {
  __ptr_ = nullptr

@externl externl changed the title C++ data race with Communicator::stringToProxy and Communicator::destroy C++ race condition with Communicator::stringToProxy and Communicator::destroy Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants