dnsrr discovery method does not work when "healthcheck" used #47

mnoky · 2019-07-30T12:59:55Z

Great project, I'm excited to get this working for a service I have deployed in a docker cluster! Currently testing 1.0-RC14 and I've hit the following snag:

The dnsrr discovery method does not work when a docker "healthcheck" is used. The reason: during startup, the service name cannot be resolved, because the name only becomes available after the healthcheck succeeds and the service is up and running. So it is a bit of a chicken-and-egg problem. The following exception is thrown at startup and the service cannot start (only relevant lines shown):

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.hazelcast.core.HazelcastInstance]: Factory method 'hazelcastInstance' threw exception; nested exception is com.hazelcast.config.ConfigurationException: Cannot create a new instance of MemberAddressProvider 'class org.bitsofinfo.hazelcast.spi.docker.swarm.dnsrr.DockerDNSRRMemberAddressProvider'
...
Caused by: com.hazelcast.config.ConfigurationException: Cannot create a new instance of MemberAddressProvider 'class org.bitsofinfo.hazelcast.spi.docker.swarm.dnsrr.DockerDNSRRMemberAddressProvider'
...
    at com.hazelcast.instance.DefaultNodeContext.newMemberAddressProviderInstance(DefaultNodeContext.java:94)
    ... 63 more
    Caused by: java.net.UnknownHostException: my_service: Name or service not known
...
    at org.bitsofinfo.hazelcast.spi.docker.swarm.dnsrr.DockerDNSRRMemberAddressProvider.resolveServiceName(DockerDNSRRMemberAddressProvider.java:130)

When I disable the healthcheck for my service, the dns resolution works right away and there are no problems.

Is it possible to delay the dns lookup in DockerDNSRRMemberAddressProvider? Or does it need to be available right away?
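
To illustrate what "delaying the lookup" could look like, here is a minimal sketch of a retrying resolver. It is not code from hazelcast-docker-swarm: the class `RetryingResolver`, the `Lookup` interface, and the parameters `maxAttempts` and `delay` are all hypothetical names made up for this example. It would only help if the service name becomes resolvable while the application is still waiting in the loop.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.Duration;

// Hypothetical helper, not part of hazelcast-docker-swarm.
public class RetryingResolver {

    /** Hypothetical seam so the real DNS lookup can be replaced in tests. */
    @FunctionalInterface
    public interface Lookup {
        InetAddress[] resolve(String name) throws UnknownHostException;
    }

    /**
     * Calls the lookup up to maxAttempts times, sleeping for delay between
     * failed attempts. Rethrows the last UnknownHostException if every
     * attempt fails.
     */
    public static InetAddress[] resolveWithRetry(Lookup lookup, String serviceName,
                                                 int maxAttempts, Duration delay)
            throws UnknownHostException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        UnknownHostException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.resolve(serviceName);
            } catch (UnknownHostException e) {
                last = e;
                // Do not sleep after the final failed attempt.
                if (attempt < maxAttempts) {
                    Thread.sleep(delay.toMillis());
                }
            }
        }
        throw last;
    }

    /** Convenience overload that uses the JVM's own resolver. */
    public static InetAddress[] resolveWithRetry(String serviceName,
                                                 int maxAttempts, Duration delay)
            throws UnknownHostException, InterruptedException {
        return resolveWithRetry(InetAddress::getAllByName, serviceName, maxAttempts, delay);
    }
}
```

For example, `RetryingResolver.resolveWithRetry("my_service", 30, Duration.ofSeconds(2))` would keep trying for roughly a minute. One caveat: the JVM caches failed lookups for a short time (the `networkaddress.cache.negative.ttl` security property, 10 seconds by default), so very short delays may just hit the cached failure again.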

mnoky · 2019-07-30T13:23:08Z

Looks like others have encountered this problem as well:

moby/moby#35451

bitsofinfo · 2019-07-30T13:37:59Z

I don't think there is a way to do this out of the box. I think @Cardds would have to add an option for some kind of artificial sleep for such a thing, but I'm not sure even that would be reliable. @Cardds ?

mnoky · 2019-07-30T13:54:09Z

I haven't yet tried the DockerSwarmDiscoveryStrategy + SwarmMemberAddressProvider solution. I'm guessing the use of healthcheck will also be problematic here... Do you know offhand if this would be the case?

bitsofinfo · 2019-07-30T14:08:58Z

That method uses the actual swarm APIs to discover peers, so it's not reliant on the auto-generated swarm peer-level host/DNS names the way the DockerDNSRRMemberAddressProvider method is. So it should work.

bitsofinfo · 2019-07-30T14:10:31Z

btw @mnoky, on that moby issue, I highly doubt it will ever be resolved. They've seemingly relegated swarm to minimal-maintenance mode at this point.

vinsgithub · 2022-05-25T20:31:14Z

Hi @bitsofinfo @mnoky, I've found a workaround for this in my scenario (it does not necessarily cover all cases) and I hope it can help someone.

A quick note about swarm:
Swarm is not dead and is still maintained. Mirantis also continues to develop parts of it, because lots of companies are still using it. Many things have changed since 2019, sure, but swarm is still out there for those who don't need kubernetes and cloud services in general.

To overcome the initialization problem in my springboot (jhipster) microservice using your awesome hazelcast-docker-swarm solution, I've set this in my docker-compose:

   healthcheck:
      test: (echo 'exit' | curl -v telnet://localhost:8082 2>&1 | grep -c refused > /dev/null) || (curl -sS http://localhost:8082/management/health | grep -c UP > /dev/null)
      interval: 5s
      timeout: 30s
      retries: 4
      start_period: 3s #must be less than JHIPSTER_SLEEP

The rationale behind this is that the application needs to resolve the docker service name during startup, but swarm does not make the name resolvable until the healthcheck itself passes. So we first let the healthcheck pass while the local service port (8082) is still refusing connections (the application is starting); as soon as the local port responds, the healthcheck tests the real application health output instead.
It's not ideal, but it's a good compromise.
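
For anyone who would rather not depend on curl and grep in the image, the same decision logic can be sketched in Java. This is only an illustration of the two-stage check above, not something I run in production: the class name `StartupTolerantHealthCheck` and the timeouts are assumptions, and the URL is just the one from my compose file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the shell healthcheck above.
public class StartupTolerantHealthCheck {

    /**
     * Returns true if the port refuses the connection (application still
     * starting) or if the health endpoint answers with a body containing "UP".
     * Any other failure (timeout, HTTP error status, other body) is unhealthy.
     */
    public static boolean isHealthy(String healthUrl) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(healthUrl).toURL().openConnection();
            conn.setConnectTimeout(2000); // assumed values, tune as needed
            conn.setReadTimeout(5000);
            try (InputStream in = conn.getInputStream()) {
                String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return body.contains("UP");
            }
        } catch (ConnectException refused) {
            // Nothing is listening yet: treat startup as healthy.
            return true;
        } catch (IOException other) {
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "http://localhost:8082/management/health";
        // Exit code 0 means healthy, 1 means unhealthy, as docker expects.
        System.exit(isHealthy(url) ? 0 : 1);
    }
}
```

As with the shell version, a service that never opens its port would be reported healthy forever, so this trades some safety for a working startup.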

bitsofinfo added enhancement question labels Jul 30, 2019

vinsgithub mentioned this issue May 25, 2022

Using healthcheck on swarm disturbs nameservices moby/moby#35451

Open
