Node configuration

The node configuration allows you to customize and optimize the settings for individual nodes in your cluster. It is divided into several sections:

  • Common configuration settings: shared top-level properties
  • REST settings: defined in the rest section
  • gRPC settings: defined in the grpc section
  • Storage settings: defined in the storage section
  • Metastore settings: defined in the metastore section
  • Ingest settings: defined in the ingest_api section
  • Indexer settings: defined in the indexer section
  • Searcher settings: defined in the searcher section
  • Jaeger settings: defined in the jaeger section

A commented example is available here: quickwit.yaml.

Common configuration

| Property | Description | Env variable | Default value |
|---|---|---|---|
| version | Config file version. 0.7 is the only available value, with backward compatibility for 0.5 and 0.4. | | |
| cluster_id | Unique identifier of the cluster the node will join. Clusters sharing the same network should use distinct cluster IDs. | QW_CLUSTER_ID | quickwit-default-cluster |
| node_id | Unique identifier of the node. It must be distinct from the node IDs of its cluster peers. Defaults to the instance's short hostname if not set. | QW_NODE_ID | short hostname |
| enabled_services | Enabled services (control_plane, indexer, janitor, metastore, searcher). | QW_ENABLED_SERVICES | all services |
| listen_address | The IP address or hostname that Quickwit binds to when starting the REST and gRPC servers and connecting this node to other nodes. By default, Quickwit binds to 127.0.0.1 (localhost). This default is not valid when forming a cluster. | QW_LISTEN_ADDRESS | 127.0.0.1 |
| advertise_address | IP address advertised by the node, i.e. the IP address that peer nodes should use to connect to the node for RPCs. | QW_ADVERTISE_ADDRESS | listen_address |
| gossip_listen_port | The port on which to listen for the Gossip cluster membership service (UDP). | QW_GOSSIP_LISTEN_PORT | rest.listen_port |
| grpc_listen_port | The port on which gRPC services listen for traffic. | QW_GRPC_LISTEN_PORT | rest.listen_port + 1 |
| peer_seeds | List of IP addresses or hostnames used to bootstrap the cluster and discover the complete set of nodes. This list may contain the current node's address and does not need to be exhaustive. If the list contains a hostname, Quickwit resolves it by querying DNS every minute. On Kubernetes, for instance, it is good practice to set it to a headless service. | QW_PEER_SEEDS | |
| data_dir | Path to the directory where data (temporary data, splits kept for caching purposes) is persisted. This is mostly used for indexing. | QW_DATA_DIR | ./qwdata |
| metastore_uri | Metastore URI. Can be a local directory, s3://my-bucket/indexes, or postgres://username:password@localhost:5432/metastore. Learn more about the metastore configuration. | QW_METASTORE_URI | {data_dir}/indexes |
| default_index_root_uri | Default index root URI, which defines the location where index data (splits) is stored. Index URIs are built following the scheme {default_index_root_uri}/{index-id}. | QW_DEFAULT_INDEX_ROOT_URI | {data_dir}/indexes |
| (environment variable only) | Log level of Quickwit. Can be a single log level or a comma-separated list of module_name=level. | RUST_LOG | info |
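Here is a minimal sketch of the common settings for one node of a multi-node cluster. The cluster ID, node ID, IP address, data path, bucket name and the Kubernetes headless service hostname are all hypothetical placeholders. This node runs only the indexer and searcher services. Other nodes of the same hypothetical cluster would run the remaining services.

```yaml
version: 0.7
cluster_id: my-cluster              # hypothetical; shared by every node of this cluster
node_id: indexer-0                  # hypothetical; must be unique within the cluster
enabled_services:
  - indexer
  - searcher
listen_address: 0.0.0.0             # bind on all interfaces; the 127.0.0.1 default cannot form a cluster
advertise_address: 10.0.0.12        # hypothetical IP that peers use to reach this node
rest:
  listen_port: 7280                 # gossip (UDP) then defaults to 7280 and gRPC to 7281
peer_seeds:
  - quickwit-headless.default.svc.cluster.local  # hypothetical headless service, re-resolved every minute
data_dir: /var/lib/quickwit/qwdata  # hypothetical path
metastore_uri: s3://my-bucket/indexes
default_index_root_uri: s3://my-bucket/indexes
```

The advertise_address is set explicitly because it defaults to listen_address. Peers cannot use 0.0.0.0 to reach the node.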

REST configuration

This section contains the REST API configuration options.

| Property | Description | Env variable | Default value |
|---|---|---|---|
| listen_port | The port on which the REST API listens for HTTP traffic. | QW_REST_LISTEN_PORT | 7280 |
| cors_allow_origins | Configures the CORS origins that are allowed to access the API. Read more in the CORS section below. | | |
| extra_headers | List of header names and values. | | |

Configuring CORS (Cross-origin resource sharing)

CORS (Cross-origin resource sharing) describes which addresses or origins can access the REST API from the browser. By default, cross-origin resource sharing is not allowed.

A wildcard, a single origin, or multiple origins can be specified in the cors_allow_origins parameter, as in this example REST configuration:

```yaml
rest:
  listen_port: 1789
  extra_headers:
    x-header-1: header-value-1
    x-header-2: header-value-2
  cors_allow_origins: '*'

#   cors_allow_origins: https://my-hdfs-logs.domain.com   # Optionally we can specify one domain
#   cors_allow_origins:                                   # Or allow multiple origins
#     - https://my-hdfs-logs.domain.com
#     - https://my-hdfs.other-domain.com
```

gRPC configuration

This section contains the configuration options for gRPC services and clients used for internal communication between nodes.

| Property | Description | Env variable | Default value |
|---|---|---|---|
| max_message_size | The maximum size (in bytes) of messages exchanged by internal gRPC clients and services. | | 20 MiB |

Example of a gRPC configuration:

```yaml
grpc:
  max_message_size: 30 MiB
```

:::warning
We advise changing the default value of 20 MiB only if you encounter the following error: Error, message length too large: found 24732228 bytes, the limit is: 20971520 bytes. In that case, increase max_message_size in increments of 10 MiB until the issue disappears. This is a temporary fix: the next version of Quickwit, 0.8, will rely exclusively on gRPC streaming endpoints and handle messages of any length.
:::

Storage configuration

Please refer to the dedicated storage configuration page to learn more about configuring Quickwit for various storage providers.

Here are also some minimal examples of how to configure Quickwit with Amazon S3 or Alibaba OSS:

```shell
export AWS_ACCESS_KEY_ID=<your access key ID>
export AWS_SECRET_ACCESS_KEY=<your secret access key>
```

Amazon S3

```yaml
storage:
  s3:
    region: us-east-1
```

Alibaba

```yaml
storage:
  s3:
    region: us-east-1
    endpoint: https://oss-us-east-1.aliyuncs.com
```

Metastore configuration

This section may contain one configuration subsection per available metastore implementation. The specific configuration parameters for each implementation may vary. Currently, the available metastore implementations are:

  • File-backed
  • PostgreSQL

File-backed metastore configuration

The file-backed metastore doesn't have any node-level configuration. You can configure the poll interval at the index level.

PostgreSQL metastore configuration

| Property | Description | Default value |
|---|---|---|
| min_connections | Minimum number of connections to maintain in the pool at all times. | 0 |
| max_connections | Maximum number of connections to maintain in the pool. | 10 |
| acquire_connection_timeout | Maximum amount of time to spend waiting for an available connection before aborting a query. | 10s |
| idle_connection_timeout | Maximum idle duration before closing individual connections. | 10min |
| max_connection_lifetime | Maximum lifetime of individual connections. | 30min |

Example of a metastore configuration for PostgreSQL in YAML format:

```yaml
metastore:
  postgres:
    min_connections: 10
    max_connections: 50
    acquire_connection_timeout: 30s
    idle_connection_timeout: 1h
    max_connection_lifetime: 1d
```

Indexer configuration

This section contains the configuration options for an indexer. The split store is documented in the indexing document.

| Property | Description | Default value |
|---|---|---|
| split_store_max_num_bytes | Maximum size in bytes allowed in the split store. | 100G |
| split_store_max_num_splits | Maximum number of files allowed in the split store. | 1000 |
| max_concurrent_split_uploads | Maximum number of concurrent split uploads allowed on the node. | 12 |
| merge_concurrency | Maximum number of merge operations that can run on the node at the same time. | (2 x number of available threads) / 3 |
| enable_otlp_endpoint | If true, enables the OpenTelemetry exporter endpoint to ingest logs and traces via the OpenTelemetry Protocol (OTLP). | false |
| cpu_capacity | Advisory parameter used by the control plane. The value can be expressed in threads (e.g. 2) or in millicpus (e.g. 2000m). The control plane attempts to schedule indexing pipelines on the different nodes in proportion to the CPU capacity advertised by each indexer. It is NOT used as a limit: all pipelines are scheduled whether or not the cluster has sufficient capacity. The control plane does not attempt to spread the work equally when the load is well below the cpu_capacity. Users who need a balanced load on all of their indexer nodes can set cpu_capacity to an arbitrarily low value, as long as they keep it proportional to the number of threads available. | number of available threads |

Example:

```yaml
indexer:
  split_store_max_num_bytes: 100G
  split_store_max_num_splits: 1000
  max_concurrent_split_uploads: 12
  enable_otlp_endpoint: true
```
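The cpu_capacity guidance above is easier to see with two nodes of different sizes. Suppose a cluster has two indexers: a hypothetical node A with 8 threads and a hypothetical node B with 16 threads. To get a balanced load even when the cluster is lightly loaded, each node can advertise a deliberately low capacity. Each value stays proportional to the node's thread count, here a quarter of its threads. The two YAML documents below are excerpts of two separate node config files. The values are illustrative, not recommendations.

```yaml
# Excerpt of node A's config file (hypothetical node with 8 threads).
indexer:
  cpu_capacity: 2000m   # 2 threads advertised: 1/4 of the 8 available
---
# Excerpt of node B's config file (hypothetical node with 16 threads).
indexer:
  cpu_capacity: 4000m   # 4 threads advertised: the same 1/4 ratio
```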

Ingest API configuration

| Property | Description | Default value |
|---|---|---|
| max_queue_memory_usage | Maximum size in bytes of the in-memory Ingest queue. | 2GiB |
| max_queue_disk_usage | Maximum disk space in bytes taken by the Ingest queue. It must be at least 256M and no smaller than max_queue_memory_usage. | 4GiB |

Example:

```yaml
ingest_api:
  max_queue_memory_usage: 2GiB
  max_queue_disk_usage: 4GiB
```
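Here is a sketch of a smaller ingest queue for a node with less memory to spare. The values are illustrative assumptions. They respect both constraints stated above: max_queue_disk_usage is at least 256M, and it is not smaller than max_queue_memory_usage.

```yaml
ingest_api:
  max_queue_memory_usage: 512M   # illustrative value, below the 2GiB default
  max_queue_disk_usage: 1GiB     # >= 256M and >= max_queue_memory_usage
```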

Searcher configuration

This section contains the configuration options for a Searcher.

| Property | Description | Default value |
|---|---|---|
| aggregation_memory_limit | Maximum amount of memory that can be used for aggregations before aborting. This limit is per searcher node: concurrent queries on a node share it, and the first query to hit the limit is aborted and frees its memory. It prevents excessive memory usage during the aggregation phase, which can lead to performance degradation or crashes. | 500M |
| aggregation_bucket_limit | Maximum number of buckets returned to the client. | 65000 |
| fast_field_cache_capacity | Capacity of the in-memory fast field cache on a Searcher. If you filter by dates, run aggregations or range queries, use the search stream API, or use Quickwit for tracing, it might be worth increasing this parameter. The metrics starting with quickwit_cache_fastfields_cache can help you make an informed choice when setting this value. | 1G |
| split_footer_cache_capacity | Capacity of the in-memory split footer cache (essentially the hotcache) on a Searcher. | 500M |
| partial_request_cache_capacity | Capacity of the in-memory partial request cache on a Searcher. It caches the intermediate state of a request, possibly making subsequent requests faster. Set it to 0 to disable it. | 64M |
| max_num_concurrent_split_searches | Maximum number of concurrent split search requests running on a Searcher. | 100 |
| max_num_concurrent_split_streams | Maximum number of concurrent split stream requests running on a Searcher. | 100 |
| split_cache | Searcher split cache configuration options, defined in the section below. | disabled if unspecified |
| request_timeout_secs | Time in seconds before a search request is cancelled. This should match the timeout of the calling stack, if one is set. | 30 |

Searcher split cache configuration

This section contains the configuration options for the on-disk searcher split cache.

| Property | Description | Default value |
|---|---|---|
| max_num_bytes | Maximum disk size in bytes allowed in the split cache. Can be exceeded by the size of one split. | |
| max_num_splits | Maximum number of splits allowed in the split cache. | 10000 |
| num_concurrent_downloads | Maximum number of concurrent split downloads. | 1 |

Example:

```yaml
searcher:
  fast_field_cache_capacity: 1G
  split_footer_cache_capacity: 500M
  partial_request_cache_capacity: 64M
  split_cache:
    max_num_bytes: 1G
    max_num_splits: 10000
    num_concurrent_downloads: 1
```
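The settings described above but not shown in that example can be tuned the same way. In this sketch, the searcher sits behind a hypothetical proxy that gives up on requests after 60 seconds. Following the advice for request_timeout_secs, the searcher timeout is aligned with the proxy's. All values are assumptions for illustration, not recommendations.

```yaml
searcher:
  aggregation_memory_limit: 1G          # shared by all concurrent aggregation queries on this node
  aggregation_bucket_limit: 65000       # default value, shown for completeness
  max_num_concurrent_split_searches: 100
  request_timeout_secs: 60              # matches the hypothetical 60 s timeout of the calling proxy
```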

Jaeger configuration

| Property | Description | Default value |
|---|---|---|
| enable_endpoint | If true, enables the gRPC endpoint that allows the Jaeger Query Service to connect and retrieve traces. | false |

Example:

```yaml
jaeger:
  enable_endpoint: true
```
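The Jaeger Query Service retrieves traces stored in Quickwit, so these traces first have to be ingested. Here is a minimal sketch that combines two settings documented on this page: the indexer's OTLP endpoint for ingesting traces and the Jaeger endpoint for reading them back. It assumes a single node running both the indexer and searcher services. How services are split across nodes depends on your deployment.

```yaml
enabled_services:
  - indexer
  - searcher
indexer:
  enable_otlp_endpoint: true   # ingest logs and traces sent over OTLP
jaeger:
  enable_endpoint: true        # let the Jaeger Query Service connect and retrieve traces
```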

Using environment variables in the configuration

You can use environment variable references in the config file to set values that need to be configurable during deployment. To do this, use:

${VAR_NAME}

where VAR_NAME is the name of the environment variable.

Each variable reference is replaced at startup by the value of the environment variable. The replacement is case-sensitive and occurs before the configuration file is parsed. Referencing undefined variables throws an error unless you specify a default value or custom error text.

To specify a default value, use:

${VAR_NAME:-default_value}

where default_value is the value to use if the environment variable is unset.

```yaml
<config_field>: ${VAR_NAME}
# or
<config_field>: ${VAR_NAME:-default value}
```

For example:

```shell
export QW_LISTEN_ADDRESS=0.0.0.0
```

```yaml
# config.yaml
version: 0.7
cluster_id: quickwit-cluster
node_id: my-unique-node-id
listen_address: ${QW_LISTEN_ADDRESS}
rest:
  listen_port: ${QW_LISTEN_PORT:-1111}
```

Will be interpreted by Quickwit as:

```yaml
version: 0.7
cluster_id: quickwit-cluster
node_id: my-unique-node-id
listen_address: 0.0.0.0
rest:
  listen_port: 1111
```