Merge remote-tracking branch 'origin/main' into issue/3443-ssd-cache
imotov committed Aug 14, 2023
2 parents 76ad046 + c9472e4 commit b1dc6e5
Showing 129 changed files with 10,645 additions and 1,469 deletions.
6 changes: 3 additions & 3 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -88,9 +88,9 @@ archive:
workspace-deps-tree:
$(MAKE) -C $(QUICKWIT_SRC) workspace-deps-tree

.PHONY: build-docs
build-docs:
$(MAKE) -C $(QUICKWIT_SRC) build-docs
.PHONY: build-rustdoc
build-rustdoc:
$(MAKE) -C $(QUICKWIT_SRC) build-rustdoc

.PHONY: build-ui
build-ui:
8 changes: 5 additions & 3 deletions docs/configuration/index-config.md
@@ -1,6 +1,6 @@
---
title: Index configuration
sidebar_position: 2
sidebar_position: 3
---

This page describes how to configure an index.
Expand Down Expand Up @@ -53,6 +53,7 @@ doc_mapping:
tokenizer: raw
tag_fields: ["resource.service"]
timestamp_field: timestamp
index_field_presence: true

search_settings:
default_search_fields: [severity_text, body]
Expand All @@ -69,7 +70,7 @@ The index ID is a string that uniquely identifies the index within the metastore
## Index uri

The index-uri defines where the index files (also called splits) should be stored.
This parameter expects a [storage uri](../reference/storage-uri).
This parameter expects a [storage uri](storage-config#storage-uris).

The `index-uri` parameter is optional.
By default, the `index-uri` will be computed by concatenating the `index-id` with the
Expand All @@ -93,6 +94,7 @@ The doc mapping defines how a document and the fields it contains are stored and
| `timestamp_field` | Timestamp field* used for sharding documents in splits. The field has to be of type `datetime`. [Learn more about time sharding](./../overview/architecture.md). | `None` |
| `partition_key` | If set, quickwit will route documents into different splits depending on the field name declared as the `partition_key`. | `null` |
| `max_num_partitions` | Limits the number of splits created through partitioning. (See [Partitioning](../overview/concepts/querying.md#partitioning)) | `200` |
| `index_field_presence` | Must be enabled to support exists queries. Enabling it can significantly increase CPU usage during indexing. | `false` |

*: tags fields and timestamp field are expressed as a path from the root of the JSON object to the given field. If a field name contains a `.` character, it needs to be escaped with a `\` character.

Expand Down Expand Up @@ -127,7 +129,7 @@ fast:
| ------------- | ------------- | ------------- |
| `description` | Optional description for the field. | `None` |
| `stored` | Whether value is stored in the document store | `true` |
| `indexed` | Whether value should be indexed so it can be searhced | `true` |
| `indexed` | Whether value should be indexed so it can be searched | `true` |
| `tokenizer` | Name of the `Tokenizer`. ([See tokenizers](#description-of-available-tokenizers)) for a list of available tokenizers. | `default` |
| `record` | Describes the amount of information indexed, choices between `basic`, `freq` and `position` | `basic` |
| `fieldnorms` | Whether to store fieldnorms for the field. Fieldnorms are required to calculate the BM25 Score of the document. | `false` |
20 changes: 8 additions & 12 deletions docs/configuration/metastore-config.md
@@ -1,6 +1,6 @@
---
title: Metastore configuration
sidebar_position: 3
sidebar_position: 4
---

Quickwit needs a place to store meta-information about its indexes.
Expand Down Expand Up @@ -48,28 +48,27 @@ Likewise, if you upgrade Quickwit to a version that includes some changes in the

For convenience, Quickwit also makes it possible to store its metadata in files using a file-backed metastore. In that case, Quickwit will write one file per index.

The metastore is then configured by passing a [Storage URI](../reference/storage-uri.md) that will serve as the root of the metastore storage.
The metastore is then configured by passing a [storage URI](storage-config#storage-uris) that will serve as the root of the metastore storage.

The metadata file associated with a given index will then be stored under
The metadata file associated with a given index will then be stored under

`[storage_uri]/[index_id]/metastore.json`
`[storage_uri]/[index_id]/metastore.json`

For the moment, Quickwit supports two storage types:

- a local file system URI (e.g., `file:///opt/toto`). It is also valid to pass a file path directly, without the `file://` prefix (e.g., `/var/quickwit`). Relative paths are resolved with respect to the current working directory.
- S3-compatible storage URI (e.g. `s3://my-bucket/some-path`] ). See the [Storage URI](../reference/storage-uri.md) documentation to configure S3 or S3-compatible storage.
- S3-compatible storage URI (e.g., `s3://my-bucket/some-path`). See the [storage config](storage-config) documentation to configure S3 or S3-compatible storage providers.

### Polling configuration

By default, the File-Backed Metastore is only read once when you start a Quickwit process (searcher, indexer,...).
By default, the File-Backed Metastore is only read once when you start a Quickwit process (searcher, indexer, ...).

You can also configure it to poll the File-Backed Metastore periodically to keep a fresh view of it. This is useful for a Searcher instance that needs to be aware of new splits published by an Indexer running in parallel.

To configure the polling interval (in seconds only), add a URI fragment to the storage URI like this: `s3://quickwit/my-indexes#polling_interval=30s`
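As a rough illustration of the URI fragment convention described above, the following Python sketch splits a metastore URI into its base URI and polling interval. The function name `split_polling_interval` is hypothetical and this is not Quickwit's own parsing code; it only assumes the `#polling_interval=<N>s` form shown in this section.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def split_polling_interval(metastore_uri: str) -> Tuple[str, Optional[int]]:
    """Return (base_uri, polling_interval_in_seconds or None).

    Hypothetical helper: it mirrors the documented fragment syntax only.
    """
    # Everything after the first '#' is the URI fragment.
    base_uri, _, fragment = metastore_uri.partition("#")
    values = parse_qs(fragment).get("polling_interval")
    if not values:
        return base_uri, None
    raw = values[0]
    # The documentation states that the interval is expressed in seconds only.
    if not raw.endswith("s") or not raw[:-1].isdigit():
        raise ValueError(f"expected whole seconds such as '30s', got {raw!r}")
    return base_uri, int(raw[:-1])


if __name__ == "__main__":
    print(split_polling_interval("s3://quickwit/my-indexes#polling_interval=30s"))
```

A URI without a fragment yields `None` for the interval, which matches the default behavior of reading the metastore once at startup.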

:::tip
Amazon S3 charges $0.0004 per 1000 GET requests. Polling a metastore every 30 seconds will induce a cost of $0.04 per month and per index.

Amazon S3 charges $0.0004 per 1000 GET requests. Polling a metastore every 30 seconds costs roughly $0.04 per month per index.
:::
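The estimate in the tip can be reproduced with a short calculation. The sketch below assumes one GET request per poll per index and a 30-day month; both are assumptions made for illustration, and the function name `monthly_polling_cost` is hypothetical.

```python
def monthly_polling_cost(
    interval_secs: int,
    price_per_1000_gets: float = 0.0004,  # price quoted in the tip above
    days_per_month: int = 30,             # assumption: 30-day month
) -> float:
    """Estimate the monthly GET cost of polling one index's metastore file."""
    # Assumption: each poll issues exactly one GET request per index.
    polls_per_month = days_per_month * 24 * 3600 / interval_secs
    return polls_per_month / 1000 * price_per_1000_gets


if __name__ == "__main__":
    print(f"${monthly_polling_cost(30):.4f} per month per index")
```

With a 30-second interval this comes to 86,400 requests per month, or about $0.035, in line with the approximate figure quoted in the tip.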

### Examples
Expand All @@ -87,8 +86,5 @@ file:///local/indices#polling_interval=30s
```

:::caution
The file-backed metastore does not allow concurrent writes. For this reason, it should not be used in distributed settings.
Running several indexer services on the same file-backed metastore can lead to the corruption of the metastore.
Running several search services, on the other hand, is perfectly safe.

The file-backed metastore does not support multiple instances running at the same time because it does not implement any locking mechanism to prevent concurrent writes from overwriting each other. Ensure that only one file-backed metastore instance is running at all times.
:::
84 changes: 12 additions & 72 deletions docs/configuration/node-config.md
Expand Up @@ -34,92 +34,32 @@ A commented example is available here: [quickwit.yaml](https://github.com/quickw
| `default_index_root_uri` | Default index root URI that defines the location where index data (splits) is stored. The index URI is built following the scheme: `{default_index_root_uri}/{index-id}` | `QW_DEFAULT_INDEX_ROOT_URI` | `{data_dir}/indexes` |
| `rest_cors_allow_origins` | Configure the CORS origins which are allowed to access the API. [Read more](#configuring-cors-cross-origin-resource-sharing) | |


There are also other parameters that can be only defined by env variables:

| Env variable | Description |
| --- | --- |
| `QW_S3_ENDPOINT` | Custom S3 endpoint. |
| `QW_S3_MAX_CONCURRENCY` | Limit the number of concurent requests to S3 |
| `QW_ENABLE_JAEGER_EXPORTER` | Enable trace export to Jaeger. |
| `QW_AZURE_STORAGE_ACCOUNT` | Azure Blob Storage account name. |
| `QW_AZURE_STORAGE_ACCESS_KEY` | Azure Blob Storage account access key. |

More details about [storage configuration](../reference/storage-uri.md).

## Storage configuration

This section may contain one configuration subsection per storage provider. The specific configuration parameters for each provider may vary. Currently, the supported storage providers are:
- Azure
- Amazon S3 or S3-compatible providers
Please refer to the dedicated [storage configuration](storage-config) page to learn more about configuring Quickwit for various storage providers.

If a storage configuration is not explicitly set, Quickwit will rely on the default settings provided by the SDK ([Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust), [AWS SDK for Rust](https://github.com/awslabs/aws-sdk-rust)) of each storage provider.
Here are also some minimal examples of how to configure Quickwit with Amazon S3 or Alibaba OSS:

### Azure storage configuration

| Property | Description | Default value |
| --- | --- | --- |
| `account` | The Azure storage account name. | |
| `access_key` | The Azure storage account access key. | |
```bash
AWS_ACCESS_KEY_ID=<your access key ID>
AWS_SECRET_ACCESS_KEY=<your secret access key>
```

Example of a storage configuration for Azure in YAML format:
*Amazon S3*

```yaml
storage:
azure:
account: your-azure-account-name
access_key: your-azure-access-key
s3:
region: us-east-1
```
### S3 storage configuration
| Property | Description | Default value |
| --- | --- | --- |
| `flavor` | The optional storage flavor to use. Available flavors are `digital_ocean`, `garage`, `gcs`, and `minio`. | |
| `access_key_id` | The AWS access key ID. | |
| `secret_access_key` | The AWS secret access key. | |
| `region` | The AWS region to send requests to. | `us-east-1` (SDK default) |
| `endpoint` | Custom endpoint for use with S3-compatible providers. | SDK default |
| `force_path_style_access` | Disables [virtual-hosted–style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) requests. Required by some S3-compatible providers (Ceph, MinIO). | `false` |
| `disable_multi_object_delete` | Disables [Multi-Object Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) requests. Required by some S3-compatible providers (GCS). | `false` |
| `disable_multipart_upload` | Disables [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) of objects. Required by some S3-compatible providers (GCS). | `false` |

:::warning
Hardcoding credentials into configuration files is not secure and strongly discouraged. Prefer the alternative authentication methods that your storage backend may provide.
:::

**Storage flavors**

Storage flavors ensure that Quickwit works correctly with storage providers that deviate from the S3 API by automatically configuring the appropriate settings. The available flavors are:
- `digital_ocean`
- `garage`
- `gcs`
- `minio`

*Digital Ocean*

The Digital Ocean flavor (`digital_ocean`) forces path-style access and turns off multi-object delete requests.

*Garage flavor*

The Garage flavor (`garage`) overrides the `region` parameter to `garage` and forces path-style access.

*Google Cloud Storage*

The Google Cloud Storage flavor (`gcs`) turns off multi-object delete requests and multipart uploads.

*MinIO flavor*

The MinIO flavor (`minio`) forces path-style access.

Example of a storage configuration for Google Cloud Storage in YAML format:
*Alibaba*
```yaml
storage:
s3:
flavor: gcs
region: us-east1
endpoint: https://storage.googleapis.com
region: us-east-1
endpoint: https://oss-us-east-1.aliyuncs.com
```
## Metastore configuration
2 changes: 1 addition & 1 deletion docs/configuration/ports-config.md
@@ -1,6 +1,6 @@
---
title: Ports configuration
sidebar_position: 5
sidebar_position: 6
---

When starting a quickwit search server, one important parameter that can be configured is
2 changes: 1 addition & 1 deletion docs/configuration/source-config.md
@@ -1,6 +1,6 @@
---
title: Source configuration
sidebar_position: 4
sidebar_position: 5
---

Quickwit can insert data into an index from one or multiple sources.
148 changes: 148 additions & 0 deletions docs/configuration/storage-config.md
@@ -0,0 +1,148 @@
---
title: Storage configuration
sidebar_position: 2
---

## Supported Storage Providers

Quickwit currently supports three types of storage providers:
- Amazon S3 and S3-compatible (Garage, MinIO, ...)
- Azure Blob Storage
- Local file storage*

## Storage URIs

Storage URIs refer to different storage providers identified by a URI "protocol" or "scheme". Quickwit supports the following storage URI protocols:
- `s3://` for Amazon S3 and S3-compatible
- `azure://` for Azure Blob Storage
- `file://` for local file systems
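The protocol list above can be sketched as a small lookup. This is an illustrative Python snippet only: the function name `storage_provider` and the returned labels are hypothetical, and treating a string without `://` as a local path follows the rule described for local file storage URIs.

```python
def storage_provider(uri_or_path: str) -> str:
    """Map a storage URI or plain file path to a provider label (illustrative)."""
    providers = {
        "s3": "Amazon S3 or S3-compatible",
        "azure": "Azure Blob Storage",
        "file": "local file system",
    }
    scheme, separator, _ = uri_or_path.partition("://")
    if not separator:
        # No protocol: regular file paths are treated as local file system URIs.
        return providers["file"]
    if scheme not in providers:
        raise ValueError(f"unsupported storage URI protocol: {scheme!r}")
    return providers[scheme]


if __name__ == "__main__":
    for uri in ("s3://my-bucket/some-path", "file:///var/quickwit", "./quickwit"):
        print(uri, "->", storage_provider(uri))
```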

In general, you can use a storage URI or a file path anywhere you would intuitively expect a file path. For instance:
- when setting the `index_uri` of an index to specify the storage provider and location;
- when setting the `metastore_uri` in a node config to set up a file-backed metastore;
- when passing a file path as a command line argument.

### Local file storage URIs

Quickwit interprets regular file paths as local file system URIs. Relative file paths are allowed and are resolved relative to the current working directory (CWD). `~` can be used as a shortcut to refer to the user’s home directory. The following are valid local file system URIs:

```markdown
- /var/quickwit
- file:///var/quickwit
- /home/quickwit/data
- ~/data
- ./quickwit
```

:::caution
When using the `file://` protocol, a third `/` is necessary to express an absolute path. For instance, the URI `file://home/quickwit/` is interpreted as `./home/quickwit`.
:::
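The resolution rules for local paths can be illustrated with a minimal Python sketch. It is not Quickwit's implementation: the function name `resolve_local_path` is hypothetical, it handles POSIX-style paths only, and the working and home directories are passed in explicitly to keep the result deterministic.

```python
import posixpath


def resolve_local_path(uri_or_path: str, cwd: str, home: str) -> str:
    """Resolve a local storage URI or file path to an absolute POSIX path."""
    path = uri_or_path
    # Strip the protocol. "file:///var/x" leaves "/var/x" (absolute), while
    # "file://home/x" leaves "home/x", which is therefore a relative path.
    if path.startswith("file://"):
        path = path[len("file://"):]
    # "~" is a shortcut for the user's home directory.
    if path == "~" or path.startswith("~/"):
        path = home + path[1:]
    # Relative paths are resolved against the current working directory.
    if not path.startswith("/"):
        path = cwd.rstrip("/") + "/" + path
    return posixpath.normpath(path)


if __name__ == "__main__":
    for example in ("file:///var/quickwit", "~/data", "./quickwit", "file://home/quickwit/"):
        print(example, "->", resolve_local_path(example, cwd="/srv", home="/home/alice"))
```

The last example shows the pitfall from the caution above: with only two slashes, the path ends up under the current working directory rather than at the file system root.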

## Storage configuration

This section contains one configuration subsection per storage provider. If a storage configuration parameter is not explicitly set, Quickwit relies on the default values provided by the storage provider SDKs ([Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust), [AWS SDK for Rust](https://github.com/awslabs/aws-sdk-rust)).

### S3 storage configuration

| Property | Description | Default value |
| --- | --- | --- |
| `flavor` | The optional storage flavor to use. Available flavors are `digital_ocean`, `garage`, `gcs`, and `minio`. | |
| `access_key_id` | The AWS access key ID. | |
| `secret_access_key` | The AWS secret access key. | |
| `region` | The AWS region to send requests to. | `us-east-1` (SDK default) |
| `endpoint` | Custom endpoint for use with S3-compatible providers. | SDK default |
| `force_path_style_access` | Disables [virtual-hosted–style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) requests. Required by some S3-compatible providers (Ceph, MinIO). | `false` |
| `disable_multi_object_delete` | Disables [Multi-Object Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) requests. Required by some S3-compatible providers (GCS). | `false` |
| `disable_multipart_upload` | Disables [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) of objects. Required by some S3-compatible providers (GCS). | `false` |

:::warning
Hardcoding credentials into configuration files is not secure and strongly discouraged. Prefer the alternative authentication methods that your storage backend may provide.
:::

#### Environment variables

| Env variable | Description |
| --- | --- |
| `QW_S3_ENDPOINT` | Custom S3 endpoint. |
| `QW_S3_MAX_CONCURRENCY` | Limit the number of concurrent requests to S3. |

#### Storage flavors

Storage flavors ensure that Quickwit works correctly with storage providers that deviate from the S3 API by automatically configuring the appropriate settings. The available flavors are:
- `digital_ocean`
- `garage`
- `gcs`
- `minio`

*Digital Ocean*

The Digital Ocean flavor (`digital_ocean`) forces path-style access and turns off multi-object delete requests.

*Garage flavor*

The Garage flavor (`garage`) overrides the `region` parameter to `garage` and forces path-style access.

*Google Cloud Storage*

The Google Cloud Storage flavor (`gcs`) turns off multi-object delete requests and multipart uploads.

*MinIO flavor*

The MinIO flavor (`minio`) forces path-style access.

Example of a storage configuration for Google Cloud Storage in YAML format:

```yaml
storage:
  s3:
    flavor: gcs
    region: us-east1
    endpoint: https://storage.googleapis.com
```
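To make the effect of each flavor concrete, the following Python sketch expands a flavor into the settings listed in the flavor descriptions above. It is an illustration, not Quickwit's code: the names `FLAVOR_OVERRIDES` and `effective_s3_settings` are hypothetical, and letting flavor overrides win over explicitly set values is an assumption based on the words "forces" and "overrides".

```python
# Settings each flavor applies, taken from the flavor descriptions above.
FLAVOR_OVERRIDES = {
    "digital_ocean": {"force_path_style_access": True, "disable_multi_object_delete": True},
    "garage": {"region": "garage", "force_path_style_access": True},
    "gcs": {"disable_multi_object_delete": True, "disable_multipart_upload": True},
    "minio": {"force_path_style_access": True},
}

# Defaults from the S3 storage configuration table.
DEFAULTS = {
    "force_path_style_access": False,
    "disable_multi_object_delete": False,
    "disable_multipart_upload": False,
}


def effective_s3_settings(config: dict) -> dict:
    """Return the S3 settings after applying the optional `flavor`."""
    settings = {**DEFAULTS, **config}
    flavor = settings.pop("flavor", None)
    if flavor is not None:
        if flavor not in FLAVOR_OVERRIDES:
            raise ValueError(f"unknown storage flavor: {flavor!r}")
        # Assumption: flavor overrides take precedence over explicit values.
        settings.update(FLAVOR_OVERRIDES[flavor])
    return settings


if __name__ == "__main__":
    print(effective_s3_settings({"flavor": "gcs", "region": "us-east1",
                                 "endpoint": "https://storage.googleapis.com"}))
```

Under these assumptions, the `gcs` example resolves to a configuration with multi-object delete and multipart upload disabled, while path-style access keeps its default of `false`.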
### Azure storage configuration
| Property | Description | Default value |
| --- | --- | --- |
| `account` | The Azure storage account name. | |
| `access_key` | The Azure storage account access key. | |

#### Environment variables

| Env variable | Description |
| --- | --- |
| `QW_AZURE_STORAGE_ACCOUNT` | Azure Blob Storage account name. |
| `QW_AZURE_STORAGE_ACCESS_KEY` | Azure Blob Storage account access key. |

Example of a storage configuration for Azure in YAML format:

```yaml
storage:
  azure:
    account: your-azure-account-name
    access_key: your-azure-access-key
```

## Storage configuration examples for various object storage providers

### Garage

[Garage](https://garagehq.deuxfleurs.fr/) is an open-source distributed object storage service tailored for self-hosting.

```yaml
storage:
  s3:
    flavor: garage
    endpoint: http://127.0.0.1:3900
```

### MinIO

[MinIO](https://min.io/) is a high-performance object storage.

```yaml
storage:
  s3:
    flavor: minio
    endpoint: http://127.0.0.1:9000
```