Rolling upgrades not done in proper order #156
In 1.2.3 we didn't have any change in the order of rolling deploys. Can you share the Druid CR, and maybe some logs or screenshots?
@itamar-marom @cyril-corbon have either of you faced this issue?
@AdheipSingh I just tested introducing a change to my Druid CR:
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
name: analytics
spec:
image: sample-image
imagePullPolicy: Always
imagePullSecrets:
- name: docker-registry-credentials
startScript: /druid.sh
podLabels:
environment: staging
release: stable
securityContext:
fsGroup: 1000
runAsUser: 1000
runAsGroup: 1000
rollingDeploy: true
defaultProbes: false
commonConfigMountPath: "/opt/druid/conf/druid/cluster/_common"
jvm.options: |-
-server
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/opt/druid/var/tmp/
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dorg.jboss.logging.provider=slf4j
-Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.SLF4JLogger
-Dlog4j.shutdownCallbackRegistry=org.apache.druid.common.config.Log4jShutdown
-Dlog4j.shutdownHookEnabled=true
-XX:HeapDumpPath=/opt/druid/var/historical.hprof
-XX:+ExitOnOutOfMemoryError
--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED
--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
log4j.config: |-
<?xml version="1.0" encoding="UTF-8" ?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/>
</Console>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="Console"/>
</Root>
<Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="debug">
<AppenderRef ref="Console"/>
</Logger>
</Loggers>
</Configuration>
common.runtime.properties: |
# Zookeeper
# https://druid.apache.org/docs/latest/tutorials/cluster.html#configure-zookeeper-connection
# https://druid.apache.org/docs/latest/configuration/index.html#zookeeper
druid.zk.service.host=druid-zookeeper.druid.svc
druid.zk.paths.base=/druid
druid.zk.service.compress=false
# Metadata Store
# https://druid.apache.org/docs/latest/configuration/index.html#metadata-storage
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://druid-postgresql-cluster.druid.svc:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password={ "type": "environment", "variable": "METADATA_STORAGE_PASSWORD" }
druid.metadata.storage.connector.createTables=true
# Deep Storage
# https://druid.apache.org/docs/latest/configuration/index.html#deep-storage
druid.storage.type=s3
druid.storage.bucket=sample-bucket
druid.storage.baseKey=segments
druid.storage.disableAcl=true
druid.s3.accessKey={ "type": "environment", "variable": "AWS_ACCESS_KEY_ID" }
druid.s3.secretKey={ "type": "environment", "variable": "AWS_SECRET_ACCESS_KEY" }
# Extensions
druid.extensions.loadList=["druid-basic-security", "postgresql-metadata-storage", "druid-kafka-indexing-service", "druid-s3-extensions", "druid-datasketches", "druid-lookups-cached-global", "druid-protobuf-extensions", "druid-parquet-extensions", "druid-distinctcount", "prometheus-emitter"]
# Lookups
# https://druid.apache.org/docs/latest/querying/lookups.html#saving-configuration-across-restarts
druid.lookup.enableLookupSyncOnStartup=false
# Logging
# https://druid.apache.org/docs/latest/configuration/index.html#startup-logging
druid.startup.logging.logProperties=true
# Task Logging
# https://druid.apache.org/docs/latest/configuration/index.html#task-logging
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=sample-bucket
druid.indexer.logs.s3Prefix=tasks
druid.indexer.logs.disableAcl=true
# Query request logging
# https://druid.apache.org/docs/28.0.1/configuration/#request-logging
druid.request.logging.type=filtered
# https://druid.apache.org/docs/28.0.1/configuration/#filtered-request-logging
druid.request.logging.delegate.type=slf4j
druid.request.logging.queryTimeThresholdMs=60000
druid.request.logging.sqlQueryTimeThresholdMs=600000
# Monitoring metrics
# https://druid.apache.org/docs/latest/configuration/index.html#enabling-metrics
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor"]
druid.monitoring.emissionPeriod=PT10S
# Metrics emitters
# https://druid.apache.org/docs/latest/configuration/index.html#metrics-emitters
druid.emitter=prometheus
# Prometheus Emitter
# https://druid.apache.org/docs/0.23.0/development/extensions-contrib/prometheus.html#configuration
druid.emitter.prometheus.strategy=exporter
druid.emitter.prometheus.port=9001
druid.emitter.prometheus.namespace=druid_native
druid.emitter.prometheus.addServiceAsLabel=true
druid.emitter.prometheus.dimensionMapPath=/opt/druid/conf/druid/cluster/_common/metricsMapping.json
# Cache
druid.cache.type=caffeine
# Security (Basic)
# https://druid.apache.org/docs/latest/development/extensions-core/druid-basic-security.html
# https://druid.apache.org/docs/latest/operations/security-overview.html#enable-an-authenticator
# https://druid.apache.org/docs/latest/design/auth.html
# Authenticator
druid.auth.authenticatorChain=["MyBasicMetadataAuthenticator"]
# MyBasicMetadataAuthenticator
druid.auth.authenticator.MyBasicMetadataAuthenticator.type=basic
druid.auth.authenticator.MyBasicMetadataAuthenticator.initialAdminPassword={ "type": "environment", "variable": "DRUID_ADMIN_PASSWORD" }
druid.auth.authenticator.MyBasicMetadataAuthenticator.initialInternalClientPassword={ "type": "environment", "variable": "DRUID_INTERNAL_PASSWORD" }
druid.auth.authenticator.MyBasicMetadataAuthenticator.credentialIterations=10000
druid.auth.authenticator.MyBasicMetadataAuthenticator.credentialsValidator.type=metadata
druid.auth.authenticator.MyBasicMetadataAuthenticator.skipOnFailure=false
druid.auth.authenticator.MyBasicMetadataAuthenticator.authorizerName=MyBasicMetadataAuthorizer
# Escalator
druid.escalator.type=basic
druid.escalator.internalClientUsername=druid_system
druid.escalator.internalClientPassword={ "type": "environment", "variable": "DRUID_INTERNAL_PASSWORD" }
druid.escalator.authorizerName=MyBasicMetadataAuthorizer
# Authorizer
druid.auth.authorizers=["MyBasicMetadataAuthorizer"]
# MyBasicMetadataAuthorizer
druid.auth.authorizer.MyBasicMetadataAuthorizer.type=basic
druid.auth.authorizer.MyBasicMetadataAuthorizer.initialAdminUser=admin
druid.auth.authorizer.MyBasicMetadataAuthorizer.initialAdminRole=admin
# Query
druid.generic.useThreeValueLogicForNativeFilters=true
druid.expressions.useStrictBooleans=true
druid.generic.useDefaultValueForNull=false
extraCommonConfig:
- name: druid-metrics-mapping
namespace: druid
volumeMounts:
- mountPath: /opt/druid/var
name: var-volume
volumes:
- name: var-volume
emptyDir: {}
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
envFrom:
- secretRef:
name: druid-credentials
- secretRef:
name: druid-kafka-credentials
- secretRef:
name: druid-s3-credentials
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
nodes:
############################## Druid Master #################################
overlords:
kind: StatefulSet
nodeType: "overlord"
podLabels:
druid-process: overlord
druid.port: 8090
# Requires this mount path due to Druid's start script design
# https://github.com/druid-io/druid-operator/issues/25
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
replicas: 1
podDisruptionBudgetSpec:
maxUnavailable: 1
extra.jvm.options: |-
-Xms2G
-Xmx2G
runtime.properties: |
# https://druid.apache.org/docs/latest/configuration/index.html#overlord
druid.service=druid/overlord
druid.plaintextPort=8090
# https://druid.apache.org/docs/latest/configuration/index.html#overlord-operations
druid.indexer.runner.type=httpRemote
druid.indexer.storage.type=metadata
druid.indexer.storage.recentlyFinishedThreshold=PT12H
druid.indexer.queue.startDelay=PT30S
# Monitoring
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.server.metrics.TaskCountStatsMonitor", "org.apache.druid.server.metrics.TaskSlotCountStatsMonitor"]
## Tasks Metadata/Logs Management
## https://druid.apache.org/docs/latest/operations/clean-metadata-store/#indexer-task-logs
# Cleanup of task logs and their associated metadata
druid.indexer.logs.kill.enabled=true
# 12 hours in milliseconds
druid.indexer.logs.kill.durationToRetain=43200000
# 5 min in milliseconds
druid.indexer.logs.kill.initialDelay=300000
# 6 hours in milliseconds
druid.indexer.logs.kill.delay=21600000
resources:
requests:
cpu: 1500m
memory: 6Gi
limits:
cpu: 2
memory: 10Gi
livenessProbe:
httpGet:
path: /status/health
port: 8090
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /status/health
port: 8090
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- druid-master
tolerations:
- key: druid
value: master
operator: Equal
effect: NoSchedule
ports:
- name: metrics
containerPort: 9001
services:
- spec:
type: ClusterIP
ports:
- name: service
port: 8090
- name: metrics
port: 9001
targetPort: metrics
coordinators:
kind: StatefulSet
nodeType: "coordinator"
podLabels:
druid-process: coordinator
druid.port: 8081
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/master/coordinator-overlord"
replicas: 1
podDisruptionBudgetSpec:
maxUnavailable: 1
extra.jvm.options: |-
-Xms4G
-Xmx4G
runtime.properties: |
# https://druid.apache.org/docs/latest/configuration/index.html#coordinator
druid.service=druid/coordinator
druid.plaintextPort=8081
# https://druid.apache.org/docs/latest/configuration/index.html#coordinator-operation
druid.coordinator.period=PT60S
druid.coordinator.startDelay=PT300S
druid.coordinator.period.indexingPeriod=PT600S
# Coordinator's Compaction duty
druid.coordinator.dutyGroups=["compaction"]
druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT120S
## Metadata Management
## https://druid.apache.org/docs/latest/operations/clean-metadata-store/#configure-automated-metadata-cleanup
druid.coordinator.period.metadataStoreManagementPeriod=PT1H
# Cleanup unused segments older than 3 months
druid.coordinator.kill.on=true
druid.coordinator.kill.period=P1D
druid.coordinator.kill.durationToRetain=P90D
druid.coordinator.kill.maxSegments=1000
# Cleanup audit records older than 1 month
druid.coordinator.kill.audit.on=true
druid.coordinator.kill.audit.period=P1D
druid.coordinator.kill.audit.durationToRetain=P30D
# Cleanup supervisors records older than 1 month
druid.coordinator.kill.supervisor.on=true
druid.coordinator.kill.supervisor.period=P1D
druid.coordinator.kill.supervisor.durationToRetain=P30D
# Cleanup rules records older than 1 day
druid.coordinator.kill.rule.on=true
druid.coordinator.kill.rule.period=P1D
druid.coordinator.kill.rule.durationToRetain=P1D
# Cleanup auto-compaction configuration records on a daily basis
# only applies to datasources with no segments (used or unused)
druid.coordinator.kill.compaction.on=true
druid.coordinator.kill.compaction.period=P1D
# Cleanup supervisors' datasource records older than 7 days
# only applies when the supervisor has been terminated
druid.coordinator.kill.datasource.on=true
druid.coordinator.kill.datasource.period=P1D
druid.coordinator.kill.datasource.durationToRetain=P7D
resources:
requests:
cpu: 1500m
memory: 4Gi
limits:
cpu: 2
memory: 6Gi
livenessProbe:
httpGet:
path: /status/health
port: 8081
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /status/health
port: 8081
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- druid-master
tolerations:
- key: druid
value: master
operator: Equal
effect: NoSchedule
ports:
- name: metrics
containerPort: 9001
services:
- spec:
type: ClusterIP
ports:
- name: service
port: 8081
- name: metrics
port: 9001
targetPort: metrics
############################## Druid Data #################################
historicals:
kind: StatefulSet
nodeType: "historical"
podLabels:
druid-process: historical
druid.port: 8083
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/historical"
replicas: 1
podDisruptionBudgetSpec:
maxUnavailable: 1
extra.jvm.options: |-
-Xms5G
-Xmx5G
-XX:MaxDirectMemorySize=6G
runtime.properties: |
# https://druid.apache.org/docs/latest/configuration/index.html#historical
druid.service=druid/historical
druid.plaintextPort=8083
# HTTP server
# Sum of `druid.broker.http.numConnections` across all the brokers in the cluster
druid.server.http.numThreads=70
# Processing threads and buffers
druid.processing.buffer.sizeBytes=500M
druid.processing.numMergeBuffers=4
druid.processing.numThreads=7
# Segment storage
# https://druid.apache.org/docs/latest/configuration/index.html#historical-general-configuration
druid.server.maxSize=500G
# https://druid.apache.org/docs/latest/configuration/index.html#storing-segments
druid.segmentCache.locations=[{\"path\":\"/druid/data/segments\",\"maxSize\":\"500G\"}]
# Segment loading
druid.segmentCache.numLoadingThreads=2
# Query cache
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.sizeInBytes=1G
# Monitoring
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]
## Query performance
# Default timeout (60 seconds) can be overridden in the query context
druid.server.http.defaultQueryTimeout=60000
# GroupBy merging buffer per-query spilling to disk (1 GB)
druid.query.groupBy.maxOnDiskStorage=1000000000
resources:
requests:
cpu: 7
memory: 11Gi
limits:
cpu: 8
# 19GB for mapping segments to memory
memory: 30Gi
volumeClaimTemplates:
- metadata:
name: data-volume
spec:
storageClassName: gp3
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 500Gi
volumeMounts:
- name: data-volume
mountPath: /druid/data
livenessProbe:
httpGet:
path: /status/health
port: 8083
initialDelaySeconds: 20
periodSeconds: 20
timeoutSeconds: 5
# 10 minutes
failureThreshold: 30
readinessProbe:
httpGet:
path: /druid/historical/v1/readiness
port: 8083
initialDelaySeconds: 20
periodSeconds: 30
timeoutSeconds: 5
# 100 minutes
failureThreshold: 200
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- druid-data
tolerations:
- key: druid
value: data
operator: Equal
effect: NoSchedule
ports:
- name: metrics
containerPort: 9001
services:
- spec:
type: ClusterIP
ports:
- name: service
port: 8083
- name: metrics
port: 9001
targetPort: metrics
middlemanagers:
kind: StatefulSet
nodeType: "middleManager"
podLabels:
druid-process: middleManager
druid.port: 8091
# Requires this mount path due to Druid's start script design
# https://github.com/druid-io/druid-operator/issues/25
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/data/middleManager"
replicas: 1
podDisruptionBudgetSpec:
maxUnavailable: 1
extra.jvm.options: |-
-Xms256M
-Xmx256M
runtime.properties: |-
# https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-and-peons
druid.service=druid/middleManager
druid.plaintextPort=8091
# https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-configuration
druid.indexer.runner.javaOptsArray=["-server", "-Xms2200M", "-Xmx2200M", "-XX:MaxDirectMemorySize=1800M", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.io.tmpdir=var/data/tmp/peons", "-XX:+ExitOnOutOfMemoryError", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/druid/data/peon.%t.%p.hprof", "--add-exports=java.base/jdk.internal.ref=ALL-UNNAMED", "--add-exports=java.base/jdk.internal.misc=ALL-UNNAMED", "--add-opens=java.base/java.lang=ALL-UNNAMED", "--add-opens=java.base/java.io=ALL-UNNAMED", "--add-opens=java.base/java.nio=ALL-UNNAMED", "--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED", "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"]
druid.worker.capacity=14
# The MiddleManager's peon processes open ports inside the Pod,
# so they must also be exposed on the Pod to allow communication with them.
# Please be careful with this
druid.indexer.runner.ports=[8100, 8101, 8102, 8103, 8104, 8105, 8106, 8107, 8108, 8109, 8110, 8111, 8112, 8113]
# HTTP server
# https://druid.apache.org/docs/latest/configuration/index.html#indexer-concurrent-requests
# Sum of `druid.broker.http.numConnections` across all the brokers in the cluster
druid.server.http.numThreads=70
# Monitoring
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.client.cache.CacheMonitor"]
## Query Performance
# GroupBy merging buffer per-query spilling to disk (1GB)
druid.query.groupBy.maxOnDiskStorage=1000000000
# Query cache
druid.realtime.cache.useCache=true
druid.realtime.cache.populateCache=true
druid.cache.sizeInBytes=200Mi
# Additional Peons config:
# https://druid.apache.org/docs/latest/configuration/index.html#middlemanager-configuration
druid.indexer.task.baseTaskDir=/druid/data/persistent/task
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numThreads=4
druid.indexer.fork.property.druid.processing.numMergeBuffers=4
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=200MiB
# Monitoring
druid.indexer.fork.property.druid.emitter.prometheus.strategy=pushgateway
druid.indexer.fork.property.druid.emitter.prometheus.pushGatewayAddress=http://prometheus-pushgateway.kube-prometheus-stack.svc.cluster.local:9091
resources:
requests:
cpu: "15"
memory: 57G
limits:
cpu: "15.9"
memory: 58G
volumes:
- name: data-volume
emptyDir: {}
volumeMounts:
- name: data-volume
mountPath: /druid/data
livenessProbe:
httpGet:
path: /status/health
port: 8091
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /status/health
port: 8091
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
terminationGracePeriodSeconds: 1200
lifecycle:
preStop:
exec:
command:
- "/bin/sh"
- "/opt/druid/resources/scripts/mm_shutdown_hook.sh"
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- druid-data-mm
tolerations:
- key: druid
value: data-mm
operator: Equal
effect: NoSchedule
ports:
- name: metrics
containerPort: 9001
# The MiddleManager's peon processes open ports inside the Pod,
# so they must also be exposed on the Pod to allow communication with them.
# They are configured in 'middlemanagers.runtime.properties.druid.indexer.runner.ports'.
# Please be careful with this
- name: peon-0
containerPort: 8100
- name: peon-1
containerPort: 8101
- name: peon-2
containerPort: 8102
- name: peon-3
containerPort: 8103
- name: peon-4
containerPort: 8104
- name: peon-5
containerPort: 8105
- name: peon-6
containerPort: 8106
- name: peon-7
containerPort: 8107
- name: peon-8
containerPort: 8108
- name: peon-9
containerPort: 8109
- name: peon-10
containerPort: 8110
- name: peon-11
containerPort: 8111
- name: peon-12
containerPort: 8112
- name: peon-13
containerPort: 8113
services:
- spec:
type: ClusterIP
ports:
- name: service
port: 8091
- name: metrics
port: 9001
targetPort: metrics
############################## Druid Query #################################
brokers:
kind: StatefulSet
nodeType: "broker"
podLabels:
druid-process: broker
druid.port: 8082
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/broker"
replicas: 3
podDisruptionBudgetSpec:
maxUnavailable: 1
extra.jvm.options: |-
-Xms8G
-Xmx8G
-XX:MaxDirectMemorySize=5g
runtime.properties: |
# https://druid.apache.org/docs/latest/configuration/index.html#broker
druid.service=druid/broker
druid.plaintextPort=8082
# HTTP server
druid.server.http.numThreads=30
# HTTP client
druid.broker.http.numConnections=20
druid.broker.http.maxQueuedBytes=20MiB
druid.broker.http.readTimeout=PT5M
# ~80% of druid.broker.http.readTimeout
druid.broker.http.unusedConnectionTimeout=PT4M
# Processing threads and buffers
druid.processing.buffer.sizeBytes=1G
druid.processing.numMergeBuffers=4
druid.processing.numThreads=1
druid.processing.tmpDir=/druid/data/processing
# Query cache disabled -- push down caching and merging instead
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
# Monitoring
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.java.util.metrics.JvmCpuMonitor", "org.apache.druid.java.util.metrics.JvmThreadsMonitor", "org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]
# SQL settings
druid.sql.enable=true
druid.sql.planner.useNativeQueryExplain=true
## Query performance
# Default timeout (60 seconds) can be overridden in the query context
druid.server.http.defaultQueryTimeout=60000
# GroupBy merging buffer per-query spilling to disk (1 GB)
druid.query.groupBy.maxOnDiskStorage=1000000000
# Subqueries
druid.server.http.maxSubqueryRows=800000
resources:
requests:
cpu: "3.5"
memory: 13Gi
limits:
cpu: "6"
# +1-2 GB overhead to allow for usage spikes
memory: 14Gi
volumes:
- name: data-volume
emptyDir: {}
volumeMounts:
- name: data-volume
mountPath: /druid/data
livenessProbe:
httpGet:
path: /status/health
port: 8082
initialDelaySeconds: 20
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /druid/broker/v1/readiness
port: 8082
initialDelaySeconds: 20
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 20
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- druid-query
tolerations:
- key: druid
value: query
operator: Equal
effect: NoSchedule
ports:
- name: metrics
containerPort: 9001
services:
- spec:
type: ClusterIP
ports:
- name: service
port: 8082
- name: metrics
port: 9001
targetPort: metrics
routers:
kind: StatefulSet
nodeType: "router"
podLabels:
druid-process: router
druid.port: 8888
nodeConfigMountPath: "/opt/druid/conf/druid/cluster/query/router"
replicas: 1
podDisruptionBudgetSpec:
maxUnavailable: 1
extra.jvm.options: |-
-Xms4G
-Xmx4G
runtime.properties: |
# https://druid.apache.org/docs/latest/configuration/index.html#router
druid.service=druid/router
druid.plaintextPort=8888
# https://druid.apache.org/docs/latest/configuration/index.html#runtime-configuration
# Service discovery
druid.router.defaultBrokerServiceName=druid/broker
# HTTP server
druid.router.http.numConnections=50
druid.router.http.numMaxThreads=80
druid.router.http.readTimeout=PT5M
# Management proxy to coordinator/overlord: required for unified web console.
# https://druid.apache.org/docs/latest/design/router.html#router-as-management-proxy
druid.router.managementProxy.enabled=true
resources:
requests:
cpu: "3.5"
memory: 4Gi
limits:
cpu: "4"
memory: 5Gi
livenessProbe:
httpGet:
path: /status/health
port: 8888
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /status/health
port: 8888
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values:
- druid-query
tolerations:
- key: druid
value: query
operator: Equal
effect: NoSchedule
ports:
- name: metrics
containerPort: 9001
services:
- spec:
type: ClusterIP
ports:
- name: http
port: 80
targetPort: 8888
- name: metrics
port: 9001
targetPort: metrics
Config LGTM. Sadly I haven't seen this issue. Any logs or screenshots would be helpful.
@AdheipSingh I provided a Google Drive link to a video showing the pod update process. Do you have issues watching it, or is it not enough? https://drive.google.com/file/d/15GxhZZZWlhWiz-EXXAIarG81jMNMmu49/view?usp=sharing
I did face the same issue when upgrading Druid from version 28.0.1 to 29.0.1 (operator version v1.2.3). The screenshot shows the Druid pods sorted by creation timestamp, and as layoaster said, the Overlords and the Historicals are being updated at the same time, before the MiddleManagers.
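For reference, the creation-order view from the screenshot can be reproduced with kubectl get pods -n druid --sort-by=.metadata.creationTimestamp (adjust the namespace), or programmatically with client-go as in the minimal sketch below. The namespace and label selector used here are assumptions and will differ per cluster.

// Sketch: list Druid pods sorted by creation timestamp, i.e. the same view as
// the screenshot above. Namespace ("druid") and label selector ("app=druid")
// are assumptions, not something mandated by the operator.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"sort"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	pods, err := client.CoreV1().Pods("druid").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "app=druid",
	})
	if err != nil {
		panic(err)
	}

	// Sort by creation timestamp so the rollout order becomes visible.
	sort.Slice(pods.Items, func(i, j int) bool {
		return pods.Items[i].CreationTimestamp.Before(&pods.Items[j].CreationTimestamp)
	})
	for _, p := range pods.Items {
		fmt.Printf("%s  %s\n", p.CreationTimestamp.Format("2006-01-02T15:04:05Z07:00"), p.Name)
	}
}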
After upgrading to v1.2.3 with rollingDeploy=true, I see that rolling updates are no longer performed in the usual order. Instead, the overlord and the historical instances are being updated at the same time, and before the MiddleManagers.
Could you restore the previous order or adopt the recommended order?
Note: the Druid docs state that Overlords can be updated before MiddleManagers when using "autoscaling-based replacement". However, this is only possible when deploying Druid on standalone EC2 instances.
EDIT: just to add the Druid version: 28.0.1
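To make the expectation concrete, here is a minimal sketch of the deterministic ordering I would expect the operator to apply when rollingDeploy is true, following the sequence recommended in the Druid rolling-updates docs (Historical, then MiddleManager/Indexer, then Broker, Router, Coordinator and finally Overlord). This is illustrative only; it is not the operator's actual code, and sortNodeTypes is a hypothetical helper.

// Sketch: enforce a fixed rolling-update order across Druid node types.
// Not the druid-operator's implementation; just the ordering being asked for.
package main

import (
	"fmt"
	"sort"
)

// rollOrder maps the nodeType (as used in the CR's `nodes` entries) to its
// position in the recommended upgrade sequence.
var rollOrder = map[string]int{
	"historical":    0,
	"middleManager": 1,
	"indexer":       1,
	"broker":        2,
	"router":        3,
	"coordinator":   4,
	"overlord":      5,
}

// sortNodeTypes returns the node types in the order they should be rolled.
func sortNodeTypes(nodeTypes []string) []string {
	sorted := append([]string(nil), nodeTypes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return rollOrder[sorted[i]] < rollOrder[sorted[j]]
	})
	return sorted
}

func main() {
	// Node types as they appear in the CR above.
	fromCR := []string{"overlord", "coordinator", "historical", "middleManager", "broker", "router"}
	fmt.Println(sortNodeTypes(fromCR))
	// Prints: [historical middleManager broker router coordinator overlord]
}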