Cluster not able to reach quorum with single master. #100

Closed
prudhvigodithi opened this issue Apr 5, 2022 · 4 comments
Labels
bug Something isn't working infra

Comments

@prudhvigodithi
Collaborator

prudhvigodithi commented Apr 5, 2022

When creating a cluster with a single master, I get the error "master not discovered or elected yet".

Cluster.yaml

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: os-logs
  namespace: os
spec:
  security:
    tls:
      http:
        generate: true
      transport:
        generate: true
        perNode: true
  general:
    httpPort: 9200
    vendor: opensearch
    version: 1.2.3
    serviceName: os-svc
    setVMMaxMapCount: true
  confMgmt:
    autoScaler: false
    monitoring: false
  dashboards:
    enable: true
    version: 1.2.0
    replicas: 1
  nodePools:
    - component: master
      replicas: 1
      diskSize: "100Gi"
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          memory: 1Gi
      roles:
        - master
    - component: nodes
      replicas: 1
      diskSize: 1000Gi
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          memory: 2Gi
      jvm: "-Xmx1G -Xms1G"
      roles:
        - data
    - component: client
      replicas: 1
      diskSize: 100Gi
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          memory: 2Gi
      jvm: "-Xmx1G -Xms1G"
      roles:
        - data

Log:

[2022-04-04T12:05:00,713][WARN ][o.o.c.NodeConnectionsService] [os-logs-master-0] failed to connect to {os-logs-bootstrap-0}{zs52XaaoT0mHtvYMg3N_Aw}{0NSI6IQXSEqq-0WJv0bICQ}{os-logs-bootstrap-0}{192.168.4.104:9300}{m}{shard_indexing_pressure_enabled=true} (tried [25] times)
org.opensearch.transport.ConnectTransportException: [os-logs-bootstrap-0][192.168.4.104:9300] connect_exception
	at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1064) ~[opensearch-1.2.3.jar:1.2.3]
	at org.opensearch.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:213) ~[opensearch-1.2.3.jar:1.2.3]
	at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:55) ~[opensearch-core-1.2.3.jar:1.2.3]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2152) ~[?:?]
	at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:70) ~[opensearch-core-1.2.3.jar:1.2.3]
	at org.opensearch.transport.netty4.Netty4TcpChannel.lambda$addListener$0(Netty4TcpChannel.java:81) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) ~[?:?]
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:321) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:337) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: os-logs-bootstrap-0/192.168.4.104:9300
Caused by: java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
	at sun.nio.ch.Net.pollConnectNow(Net.java:660) ~[?:?]
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:875) ~[?:?]
	at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330) ~[?:?]
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[?:?]
	... 7 more

master not discovered or elected yet, an election requires a node with id [zs52XaaoT0mHtvYMg3N_Aw], have discovered [{os-logs-master-0}{T08PHfy_TgOcFmrxp8t-Xg}{G4DypmTBQn63VlsqqgI7fw}{os-logs-master-0}{192.168.20.231:9300}{m}{shard_indexing_pressure_enabled=true}] which is not a quorum; discovery will continue using [192.168.39.238:9300, 192.168.50.75:9300] from hosts providers and [{os-logs-bootstrap-0}{zs52XaaoT0mHtvYMg3N_Aw}{0NSI6IQXSEqq-0WJv0bICQ}{os-logs-bootstrap-0}{192.168.4.104:9300}{m}{shard_indexing_pressure_enabled=true}, {os-logs-master-0}{T08PHfy_TgOcFmrxp8t-Xg}{G4DypmTBQn63VlsqqgI7fw}{os-logs-master-0}{192.168.20.231:9300}{m}{shard_indexing_pressure_enabled=true}] from last-known cluster state; node term 1, last-accepted version 29 in term 1

Cause:
The operator spins up the cluster with effectively two master nodes (the bootstrap node and the single configured master), then removes the bootstrap node. The remaining master cannot form a quorum on its own.
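To make the quorum failure concrete, here is a minimal sketch of the majority check. It assumes the voting configuration contains only the bootstrap node, which is what the log line "an election requires a node with id [zs52XaaoT0mHtvYMg3N_Aw]" suggests. The function name `has_quorum` is made up for illustration and is not part of OpenSearch or the operator.

```python
def has_quorum(voting_config, reachable):
    """Return True if a strict majority of the voting configuration is reachable."""
    votes = len(set(voting_config) & set(reachable))
    return votes > len(voting_config) // 2


# Assumption: the voting configuration holds only the bootstrap node,
# as the "an election requires a node with id [...]" message indicates.
voting_config = {"os-logs-bootstrap-0"}

# While the bootstrap pod is running, the single voter is reachable.
print(has_quorum(voting_config, {"os-logs-bootstrap-0", "os-logs-master-0"}))  # True

# After the bootstrap pod is removed, os-logs-master-0 is master-eligible
# but is not in the voting configuration, so no election can succeed.
print(has_quorum(voting_config, {"os-logs-master-0"}))  # False
```

Under this assumption the remaining master can never win an election until the voting configuration changes, which matches the repeated "not a quorum" message.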

Possible fix:
Before removing the bootstrap node, the operator has to make sure that it is no longer part of the voting configuration.
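One way this could be done is through the voting configuration exclusions endpoint that OpenSearch inherits from Elasticsearch 7.x (`POST /_cluster/voting_config_exclusions?node_names=...`). The sketch below only builds the request and does not send it. The helper name and the base URL are hypothetical, the endpoint should be checked against the OpenSearch version in use, and this is not necessarily how the operator implements the fix.

```python
from urllib.parse import urlencode


def voting_exclusion_request(base_url, node_name):
    """Build the (method, url) pair that asks the cluster to drop a node
    from the voting configuration. Nothing is sent over the network."""
    query = urlencode({"node_names": node_name})
    url = f"{base_url.rstrip('/')}/_cluster/voting_config_exclusions?{query}"
    return "POST", url


# Assumption: the base URL is derived from serviceName "os-svc" and
# namespace "os" in the Cluster.yaml above; adjust for a real cluster.
method, url = voting_exclusion_request(
    "https://os-svc.os.svc.cluster.local:9200", "os-logs-bootstrap-0"
)
print(method, url)
```

The request would have to be issued while the bootstrap node is still reachable, so that the cluster can move the vote to os-logs-master-0 before the pod is deleted.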

@segalziv segalziv added bug Something isn't working infra labels Apr 19, 2022
@brentmjohnson

Upvoting this. Essentially, this means adding support in the operator for discovery.type: single-node, which OpenSearch already supports.

@elimumford

A possible workaround that I got working with a single-node cluster was to add:

nodePools:
    - component: masters
      additionalConfig:
        discovery.seed_hosts: <cluster-name>-masters-0
        cluster.initial_master_nodes: <cluster-name>-masters-0
      replicas: 1

I'm not sure if you will have to add that to each component group or not.

@idanl21
Collaborator

idanl21 commented Jul 3, 2022

Fixed

@pasztorl

When I add the additional config, the security update does not run successfully:

OpenSearch Security not initialized.
**************************************************************************
** This tool will be deprecated in the next major release of OpenSearch **
** opensearch-project/security#1755 **

Security Admin v7
Will connect to graylog-opensearch.graylog.svc.cluster.local:9200 ... done
Connected as "CN=admin,OU=graylog-opensearch"
OpenSearch Version: 2.2.1
Contacting opensearch cluster 'opensearch' and wait for YELLOW clusterstate ...
Clustername: graylog-opensearch
Clusterstate: GREEN
Number of nodes: 1
Number of data nodes: 0
.opendistro_security index does not exists, attempt to create it ... ERR: An unexpected SocketTimeoutException occured: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
Trace:
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-5 [ACTIVE]
	at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:936)
	at org.opensearch.client.RestClient.performRequest(RestClient.java:332)

6 participants