Let's describe the ClickHouse Custom Resource in detail.
The full example is available in the 99-clickhouseinstallation-max.yaml file.
The best way to work with this doc is to open 99-clickhouseinstallation-max.yaml in a separate tab
and look into it while reading this explanation.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: "clickhouse-installation-test"
Create a resource of kind: "ClickHouseInstallation" named "clickhouse-installation-max".
It is accessible with kubectl as:
kubectl get clickhouseinstallations.clickhouse.altinity.com
NAME                          AGE
clickhouse-installation-max   23h
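For comparison with the full example, the smallest useful ClickHouseInstallation is just a name and one cluster, leaving everything else to operator defaults. The sketch below is illustrative only - the resource name "minimal-demo" and cluster name "demo" are arbitrary, not taken from the example file:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "minimal-demo"      # arbitrary example name
spec:
  configuration:
    clusters:
      - name: "demo"        # single cluster; layout defaults to 1 shard x 1 replica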
defaults:
replicasUseFQDN: "no"
distributedDDL:
profile: default
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
serviceTemplate: chi-service-template
.spec.defaults section represents default values for the sections below.
.spec.defaults.replicasUseFQDN - whether replicas should be specified by FQDN in <host></host>
.spec.defaults.distributedDDL - reference to <yandex><distributed_ddl></distributed_ddl></yandex>
.spec.defaults.templates would be used everywhere where templates is needed.
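These defaults act as a fallback: a template named in .spec.defaults.templates is applied wherever a more specific level (cluster, shard or replica) does not name its own. A rough sketch, reusing the template names from this document (the cluster names inherits-defaults and overrides-defaults are hypothetical):
defaults:
  templates:
    podTemplate: clickhouse-v18.16.1          # used by anything that does not override it
    dataVolumeClaimTemplate: default-volume-claim
configuration:
  clusters:
    - name: inherits-defaults                 # no templates section - picks up .spec.defaults.templates
    - name: overrides-defaults
      templates:
        podTemplate: clickhouse-v19.11.3.11   # cluster-level template takes precedence over the default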
configuration:
.spec.configuration section represents the sources for ClickHouse configuration files, such as users, remote servers, and other configuration files.
zookeeper:
nodes:
- host: zookeeper-0.zookeepers.zoo3ns.svc.cluster.local
port: 2181
- host: zookeeper-1.zookeepers.zoo3ns.svc.cluster.local
port: 2181
- host: zookeeper-2.zookeepers.zoo3ns.svc.cluster.local
port: 2181
session_timeout_ms: 30000
operation_timeout_ms: 10000
root: /path/to/zookeeper/node
identity: user:password
.spec.configuration.zookeeper refers to the <yandex><zookeeper></zookeeper></yandex> config section.
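The keys session_timeout_ms, operation_timeout_ms, root and identity are optional tuning parameters; a minimal sketch with a single node could look like this (the service name zookeeper.zoo1ns is hypothetical):
zookeeper:
  nodes:
    - host: zookeeper.zoo1ns   # hypothetical single-node ZooKeeper service
      port: 2181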
.spec.configuration.profiles refers to the <yandex><profiles></profiles></yandex> settings section.
profiles:
readonly/readonly: 1
expands into
<profiles>
<readonly>
<readonly>1</readonly>
</readonly>
</profiles>
.spec.configuration.quotas refers to the <yandex><quotas></quotas></yandex> settings section.
quotas:
default/interval/duration: 3600
default/interval/queries: 10000
expands into
<quotas>
<default>
<interval>
<duration>3600</duration>
<queries>10000</queries>
</interval>
</default>
</quotas>
.spec.configuration.users refers to the <yandex><users></users></yandex> settings section.
users:
test/networks/ip:
- "127.0.0.1"
- "::/0"
expands into
<users>
<test>
<networks>
<ip>127.0.0.1</ip>
<ip>::/0</ip>
</networks>
</test>
</users>
settings:
compression/case/method: "zstd"
# <compression>
# <case>
# <method>zstd</method>
# </case>
# </compression>
.spec.configuration.settings
refers to <yandex><profiles></profiles><users></users></yandex> settings sections.
files:
dict1.xml: |
<yandex>
<!-- ref to file /etc/clickhouse-data/config.d/source1.csv -->
</yandex>
source1.csv: |
a1,b1,c1,d1
a2,b2,c2,d2
.spec.configuration.files allows introducing custom files into ClickHouse via the YAML manifest.
This can be used to create complex custom configurations. One possible usage example is an external dictionary:
spec:
configuration:
settings:
dictionaries_config: config.d/*.dict
files:
dict_one.dict: |
<yandex>
<dictionary>
<name>one</name>
<source>
<clickhouse>
<host>localhost</host>
<port>9000</port>
<user>default</user>
<password/>
<db>system</db>
<table>one</table>
</clickhouse>
</source>
<lifetime>60</lifetime>
<layout><flat/></layout>
<structure>
<id>
<name>dummy</name>
</id>
<attribute>
<name>one</name>
<expression>dummy</expression>
<type>UInt8</type>
<null_value>0</null_value>
</attribute>
</structure>
</dictionary>
</yandex>
clusters:
.spec.configuration.clusters represents an array of ClickHouse cluster definitions.
The layout of ClickHouse instances within a cluster is described with the .clusters.layout section.
- name: all-counts
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
layout:
shardsCount: 3
replicasCount: 2
Pod and VolumeClaim templates to be used can be specified explicitly for each replica:
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
layout is specified either with basic layout dimensions:
layout:
shardsCount: 3
replicasCount: 2
or with a detailed specification of shards and replicas.
In the example below, shard0 has replicasCount specified, while shard1 has 3 replicas explicitly listed, with the possibility to customize each replica.
- name: customized
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
layout:
shards:
- name: shard0
replicasCount: 3
weight: 1
internalReplication: Disabled
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
- name: shard1
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
replicas:
- name: replica0
- name: replica1
- name: replica2
A combination is also possible, as presented in the shard2 specification, where 3 replicas in total are requested with replicasCount and one of them is explicitly specified with a different podTemplate:
- name: customized
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
layout:
shards:
- name: shard2
replicasCount: 3
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
replicas:
- name: replica0
port: 9000
templates:
podTemplate: clickhouse-v19.11.3.11
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
The ClickHouse cluster named all-counts is represented by a layout with 3 shards of 2 replicas each (6 pods total).
Pods will be created and fully managed by the operator.
In the ClickHouse config file this would be represented as:
<yandex>
<remote_servers>
<all-counts>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.1.1</host>
<port>9000</port>
</replica>
<replica>
<host>192.168.1.2</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.1.3</host>
<port>9000</port>
</replica>
<replica>
<host>192.168.1.4</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.1.5</host>
<port>9000</port>
</replica>
<replica>
<host>192.168.1.6</host>
<port>9000</port>
</replica>
</shard>
</all-counts>
</remote_servers>
</yandex>
with full IP and DNS management provided by Kubernetes and the operator.
- name: shards-only
layout:
shardsCount: 3 # replicasCount = 1, by default
The ClickHouse cluster named shards-only is represented by a layout with 3 shards of 1 replica each (3 pods total).
Pods will be created and fully managed by the operator.
In the ClickHouse config file this would be represented as:
<yandex>
<remote_servers>
<shards-only>
<shard>
<replica>
<host>192.168.1.1</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>192.168.1.2</host>
<port>9000</port>
</replica>
</shard>
<shard>
<replica>
<host>192.168.1.3</host>
<port>9000</port>
</replica>
</shard>
</shards-only>
</remote_servers>
</yandex>
- name: replicas-only
layout:
replicasCount: 3 # shardsCount = 1, by default
The ClickHouse cluster named replicas-only is represented by a layout with 1 shard of 3 replicas (3 pods total).
Pods will be created and fully managed by the operator.
In the ClickHouse config file this would be represented as:
<yandex>
<remote_servers>
<replicas-only>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>192.168.1.1</host>
<port>9000</port>
</replica>
<replica>
<host>192.168.1.2</host>
<port>9000</port>
</replica>
<replica>
<host>192.168.1.3</host>
<port>9000</port>
</replica>
</shard>
</replicas-only>
</remote_servers>
</yandex>
layout provides the possibility to explicitly define each shard and replica with .spec.configuration.clusters.layout.shards:
- name: customized
layout:
shards:
- replicas:
so we can specify shards and replicas explicitly - either all shards and replicas, or selectively, only those we'd like to differ from the default template.
Full specification of replicas in a shard. Note - no replicasCount is specified; all replicas are described by the replicas array:
- name: shard1
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
replicas:
- name: replica0
- name: replica1
- name: replica2
Another example with selectively described replicas. Note - replicasCount is specified and one replica is described explicitly:
- name: shard2
replicasCount: 3
templates:
podTemplate: clickhouse-v18.16.1
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
replicas:
- name: replica0
port: 9000
templates:
podTemplate: clickhouse-v19.11.3.11
dataVolumeClaimTemplate: default-volume-claim
logVolumeClaimTemplate: default-volume-claim
templates:
serviceTemplates:
- name: chi-service-template
# generateName understands different sets of macros,
# depending on the level of the object, for which Service is being created:
#
# For CHI-level Service:
# 1. {chi} - ClickHouseInstallation name
# 2. {chiID} - short hashed ClickHouseInstallation name (BEWARE, this is an experimental feature)
#
# For Cluster-level Service:
# 1. {chi} - ClickHouseInstallation name
# 2. {chiID} - short hashed ClickHouseInstallation name (BEWARE, this is an experimental feature)
# 3. {cluster} - cluster name
# 4. {clusterID} - short hashed cluster name (BEWARE, this is an experimental feature)
# 5. {clusterIndex} - 0-based index of the cluster in the CHI (BEWARE, this is an experimental feature)
#
# For Shard-level Service:
# 1. {chi} - ClickHouseInstallation name
# 2. {chiID} - short hashed ClickHouseInstallation name (BEWARE, this is an experimental feature)
# 3. {cluster} - cluster name
# 4. {clusterID} - short hashed cluster name (BEWARE, this is an experimental feature)
# 5. {clusterIndex} - 0-based index of the cluster in the CHI (BEWARE, this is an experimental feature)
# 6. {shard} - shard name
# 7. {shardID} - short hashed shard name (BEWARE, this is an experimental feature)
# 8. {shardIndex} - 0-based index of the shard in the cluster (BEWARE, this is an experimental feature)
#
# For Replica-level Service:
# 1. {chi} - ClickHouseInstallation name
# 2. {chiID} - short hashed ClickHouseInstallation name (BEWARE, this is an experimental feature)
# 3. {cluster} - cluster name
# 4. {clusterID} - short hashed cluster name (BEWARE, this is an experimental feature)
# 5. {clusterIndex} - 0-based index of the cluster in the CHI (BEWARE, this is an experimental feature)
# 6. {shard} - shard name
# 7. {shardID} - short hashed shard name (BEWARE, this is an experimental feature)
# 8. {shardIndex} - 0-based index of the shard in the cluster (BEWARE, this is an experimental feature)
# 9. {replica} - replica name
# 10. {replicaID} - short hashed replica name (BEWARE, this is an experimental feature)
# 11. {replicaIndex} - 0-based index of the replica in the shard (BEWARE, this is an experimental feature)
generateName: "service-{chi}"
# type ObjectMeta struct from k8s.io/meta/v1
metadata:
labels:
custom.label: "custom.value"
annotations:
cloud.google.com/load-balancer-type: "Internal"
service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
service.beta.kubernetes.io/azure-load-balancer-internal: "true"
service.beta.kubernetes.io/openstack-internal-load-balancer: "true"
service.beta.kubernetes.io/cce-load-balancer-internal-vpc: "true"
# type ServiceSpec struct from k8s.io/core/v1
spec:
ports:
- name: http
port: 8123
- name: client
port: 9000
type: LoadBalancer
.spec.templates.serviceTemplates represents Service templates with additional sections, such as:
generateName
generateName is used to explicitly specify the name of the Service to be created. generateName provides the following macro substitutions:
{chi} - ClickHouseInstallation name
{chiID} - short hashed ClickHouseInstallation name (BEWARE, this is an experimental feature)
{cluster} - cluster name
{clusterID} - short hashed cluster name (BEWARE, this is an experimental feature)
{clusterIndex} - 0-based index of the cluster in the CHI (BEWARE, this is an experimental feature)
{shard} - shard name
{shardID} - short hashed shard name (BEWARE, this is an experimental feature)
{shardIndex} - 0-based index of the shard in the cluster (BEWARE, this is an experimental feature)
{replica} - replica name
{replicaID} - short hashed replica name (BEWARE, this is an experimental feature)
{replicaIndex} - 0-based index of the replica in the shard (BEWARE, this is an experimental feature)
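As an illustration of these macros, a hypothetical shard-level Service template could combine several of them in generateName; the template name, port list and Service type below are illustrative, not taken from the example file:
templates:
  serviceTemplates:
    - name: shard-service-template                      # hypothetical template name
      generateName: "service-{chi}-{cluster}-{shard}"   # resolves per shard the Service is created for
      spec:
        ports:
          - name: client
            port: 9000
        type: ClusterIP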
templates:
volumeClaimTemplates:
- name: default-volume-claim
# type PersistentVolumeClaimSpec struct from k8s.io/core/v1
spec:
# 1. If storageClassName is not specified, default StorageClass
# (must be specified by cluster administrator) would be used for provisioning
# 2. If storageClassName is set to an empty string (''), no storage class will be used and
# dynamic provisioning is disabled for this PVC. Existing, "Available", PVs
# (that do not have a specified storageClassName) will be considered for binding to the PVC
#storageClassName: gold
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
.spec.templates.volumeClaimTemplates represents PersistentVolumeClaim templates.
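As a variation on the commented-out storageClassName above, a claim template can pin an explicit storage class and request more space. A sketch - the template name ssd-volume-claim and class fast-ssd are illustrative and must exist in your cluster:
templates:
  volumeClaimTemplates:
    - name: ssd-volume-claim          # hypothetical template name
      spec:
        storageClassName: fast-ssd    # illustrative StorageClass name
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi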
templates:
podTemplates:
# multiple pod templates make it possible to update versions smoothly
# pod template for ClickHouse v18.16.1
- name: clickhouse-v18.16.1
# We may need to label nodes with clickhouse=allow label for this example to run
# See ./label_nodes.sh for this purpose
zone:
key: "clickhouse"
values:
- "allow"
# Shortcut version for AWS installations
#zone:
# values:
# - "us-east-1a"
# Possible values for distribution are:
# Unspecified
# OnePerHost
distribution: "Unspecified"
# type PodSpec struct {} from k8s.io/core/v1
spec:
containers:
- name: clickhouse
image: yandex/clickhouse-server:18.16.1
volumeMounts:
- name: default-volume-claim
mountPath: /var/lib/clickhouse
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "64Mi"
cpu: "100m"
.spec.templates.podTemplates represents Pod Templates with additional sections, such as:
zone
distribution
zone and distribution together define the zoned layout of ClickHouse instances over nodes.
Internally, it is a shortcut to affinity.nodeAffinity and affinity.podAntiAffinity properly filled.
Example - how to place ClickHouse instances in the AWS us-east-1a availability zone with one ClickHouse per host:
zone:
values:
- "us-east-1a"
distribution: "OnePerHost"
Example - how to place ClickHouse instances on nodes labeled clickhouse=allow, with one ClickHouse per host:
zone:
key: "clickhouse"
values:
- "allow"
distribution: "OnePerHost"