
Commit

Revamp storage config pages
guilload committed Aug 10, 2023
1 parent 6968e43 commit 0c51a7f
Showing 13 changed files with 388 additions and 569 deletions.
2 changes: 1 addition & 1 deletion docs/configuration/index-config.md
@@ -1,6 +1,6 @@
---
title: Index configuration
sidebar_position: 2
sidebar_position: 3
---

This page describes how to configure an index.
8 changes: 4 additions & 4 deletions docs/configuration/metastore-config.md
@@ -1,6 +1,6 @@
---
title: Metastore configuration
sidebar_position: 3
sidebar_position: 4
---

Quickwit needs a place to store meta-information about its indexes.
@@ -50,9 +50,9 @@ For convenience, Quickwit also makes it possible to store its metadata in files

The metastore is then configured by passing a [Storage URI](../reference/storage-uri.md) that will serve as the root of the metastore storage.

The metadata file associated with a given index will then be stored under
The metadata file associated with a given index will then be stored under

`[storage_uri]/[index_id]/metastore.json`
`[storage_uri]/[index_id]/metastore.json`
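
The path rule above amounts to a simple join. The following Python snippet is a minimal sketch, not Quickwit's implementation; the helper name `metastore_file_uri` is hypothetical, and it assumes that a URI fragment (such as the `#polling_interval=30s` suffix used with file-backed metastores) is not part of the storage location.

```python
def metastore_file_uri(storage_uri: str, index_id: str) -> str:
    """Hypothetical helper: `[storage_uri]/[index_id]/metastore.json`."""
    # Assumption: a fragment such as `#polling_interval=30s` configures the
    # metastore and is not part of the storage location, so drop it.
    root = storage_uri.split("#", 1)[0]
    # Avoid a double slash when the storage URI ends with `/`.
    if root.endswith("/"):
        root = root[:-1]
    return f"{root}/{index_id}/metastore.json"


print(metastore_file_uri("s3://my-bucket/metastore", "hdfs-logs"))
# s3://my-bucket/metastore/hdfs-logs/metastore.json
```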

For the moment, Quickwit supports two types of storage:

@@ -87,7 +87,7 @@ file:///local/indices#polling_interval=30s
```

:::caution
The file-backed metastore does not allow concurrent writes. For this reason, it should not be used in distributed settings.
The file-backed metastore does not allow concurrent writes. For this reason, it should not be used in distributed settings.
Running several indexer services on the same file-backed metastore can lead to the corruption of the metastore.
Running several search services, on the other hand, is perfectly safe.

86 changes: 13 additions & 73 deletions docs/configuration/node-config.md
@@ -34,94 +34,34 @@ A commented example is available here: [quickwit.yaml](https://github.com/quickw
| `default_index_root_uri` | Default index root URI that defines the location where index data (splits) is stored. The index URI is built following the scheme: `{default_index_root_uri}/{index-id}` | `QW_DEFAULT_INDEX_ROOT_URI` | `{data_dir}/indexes` |
| `rest_cors_allow_origins` | Configure the CORS origins which are allowed to access the API. [Read more](#configuring-cors-cross-origin-resource-sharing) | | |
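
For illustration, the index URI scheme and its default value can be sketched in Python as follows. The function is hypothetical, and the sketch deliberately ignores the `QW_DEFAULT_INDEX_ROOT_URI` environment variable, since the precedence between the config file and environment variables is not described here.

```python
from typing import Optional


def _strip_trailing_slash(uri: str) -> str:
    return uri[:-1] if uri.endswith("/") else uri


def index_uri(index_id: str, data_dir: str, default_index_root_uri: Optional[str] = None) -> str:
    """Hypothetical helper building `{default_index_root_uri}/{index-id}`."""
    if default_index_root_uri is None:
        # Documented default value: `{data_dir}/indexes`.
        default_index_root_uri = f"{_strip_trailing_slash(data_dir)}/indexes"
    return f"{_strip_trailing_slash(default_index_root_uri)}/{index_id}"


print(index_uri("hdfs-logs", data_dir="./qwdata"))  # ./qwdata/indexes/hdfs-logs
print(index_uri("hdfs-logs", data_dir="./qwdata", default_index_root_uri="s3://my-bucket/indexes"))
# s3://my-bucket/indexes/hdfs-logs
```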


There are also other parameters that can only be defined by environment variables:

| Env variable | Description |
| --- | --- |
| `QW_S3_ENDPOINT` | Custom S3 endpoint. |
| `QW_S3_MAX_CONCURRENCY` | Limit the number of concurrent requests to S3. |
| `QW_ENABLE_JAEGER_EXPORTER` | Enable trace export to Jaeger. |
| `QW_AZURE_STORAGE_ACCOUNT` | Azure Blob Storage account name. |
| `QW_AZURE_STORAGE_ACCESS_KEY` | Azure Blob Storage account access key. |

See the [storage configuration](../reference/storage-uri.md) reference for more details.

## Storage configuration

This section may contain one configuration subsection per storage provider. The specific configuration parameters for each provider may vary. Currently, the supported storage providers are:
- Azure
- Amazon S3 or S3-compatible providers
Here is a minimal example of how to configure Quickwit with Amazon S3 or Alibaba OSS:

If a storage configuration is not explicitly set, Quickwit will rely on the default settings provided by the SDK ([Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust), [AWS SDK for Rust](https://github.com/awslabs/aws-sdk-rust)) of each storage provider.

### Azure storage configuration

| Property | Description | Default value |
| --- | --- | --- |
| `account` | The Azure storage account name. | |
| `access_key` | The Azure storage account access key. | |
```bash
AWS_ACCESS_KEY_ID=<your access key ID>
AWS_SECRET_ACCESS_KEY=<your secret access key>
```

Example of a storage configuration for Azure in YAML format:
*Amazon S3*

```yaml
storage:
azure:
account: your-azure-account-name
access_key: your-azure-access-key
s3:
region: us-east-1
```
### S3 storage configuration
| Property | Description | Default value |
| --- | --- | --- |
| `flavor` | The optional storage flavor to use. Available flavors are `digital_ocean`, `garage`, `gcs`, and `minio`. | |
| `access_key_id` | The AWS access key ID. | |
| `secret_access_key` | The AWS secret access key. | |
| `region` | The AWS region to send requests to. | `us-east-1` (SDK default) |
| `endpoint` | Custom endpoint for use with S3-compatible providers. | SDK default |
| `force_path_style_access` | Disables [virtual-hosted–style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) requests. Required by some S3-compatible providers (Ceph, MinIO). | `false` |
| `disable_multi_object_delete` | Disables [Multi-Object Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) requests. Required by some S3-compatible providers (GCS). | `false` |
| `disable_multipart_upload` | Disables [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) of objects. Required by some S3-compatible providers (GCS). | `false` |

:::warning
Hardcoding credentials into configuration files is not secure and strongly discouraged. Prefer the alternative authentication methods that your storage backend may provide.
:::

**Storage flavors**

Storage flavors ensure that Quickwit works correctly with storage providers that deviate from the S3 API by automatically configuring the appropriate settings. The available flavors are:
- `digital_ocean`
- `garage`
- `gcs`
- `minio`

*Digital Ocean*

The Digital Ocean flavor (`digital_ocean`) forces path-style access and turns off multi-object delete requests.

*Garage flavor*

The Garage flavor (`garage`) overrides the `region` parameter to `garage` and forces path-style access.

*Google Cloud Storage*

The Google Cloud Storage flavor (`gcs`) turns off multi-object delete requests and multipart uploads.

*MinIO flavor*

The MinIO flavor (`minio`) forces path-style access.

Example of a storage configuration for Google Cloud Storage in YAML format:
*Alibaba*
```yaml
storage:
s3:
flavor: gcs
region: us-east1
endpoint: https://storage.googleapis.com
region: us-east-1
endpoint: https://oss-us-east-1.aliyuncs.com
```
Please refer to the dedicated [storage configuration](./storage-config.md) page to learn more about configuring Quickwit for various storage providers and setting additional parameters.
## Metastore configuration
This section may contain one configuration subsection per available metastore implementation. The specific configuration parameters for each implementation may vary. Currently, the available metastore implementations are:
2 changes: 1 addition & 1 deletion docs/configuration/ports-config.md
@@ -1,6 +1,6 @@
---
title: Ports configuration
sidebar_position: 5
sidebar_position: 6
---

When starting a quickwit search server, one important parameter that can be configured is
2 changes: 1 addition & 1 deletion docs/configuration/source-config.md
@@ -1,6 +1,6 @@
---
title: Source configuration
sidebar_position: 4
sidebar_position: 5
---

Quickwit can insert data into an index from one or multiple sources.
148 changes: 148 additions & 0 deletions docs/configuration/storage-config.md
@@ -0,0 +1,148 @@
---
title: Storage configuration
sidebar_position: 2
---

## Supported Storage Providers

Quickwit currently supports three types of storage providers:
- Amazon S3 and S3-compatible (Garage, MinIO, ...)
- Azure Blob Storage
- Local file storage*

## Storage URIs

Storage URIs identify their storage provider through a URI "protocol" or "scheme". Quickwit supports the following storage URI protocols:
- `s3://` for Amazon S3 and S3-compatible
- `azure://` for Azure Blob Storage
- `file://` for local file systems
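
The protocol-based dispatch can be sketched with Python's standard `urllib.parse.urlsplit`. This is an illustration under stated assumptions, not Quickwit's code: the provider labels and the `storage_provider` helper are hypothetical, and any input without a protocol is treated as a plain local file path, following the local file storage rules.

```python
from urllib.parse import urlsplit

# Hypothetical labels for the protocols listed above.
PROVIDERS = {
    "s3": "Amazon S3 / S3-compatible",
    "azure": "Azure Blob Storage",
    "file": "Local file storage",
}


def storage_provider(uri: str) -> str:
    """Return the storage provider a URI refers to, based on its protocol."""
    scheme = urlsplit(uri).scheme
    if not scheme:
        # No protocol: a plain (absolute, relative, or `~`) local file path.
        return PROVIDERS["file"]
    try:
        return PROVIDERS[scheme]
    except KeyError:
        raise ValueError(f"unsupported storage URI protocol: {scheme}://") from None


print(storage_provider("s3://my-bucket/indexes"))  # Amazon S3 / S3-compatible
print(storage_provider("~/data"))  # Local file storage
```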

In general, you can use a storage URI or a file path anywhere you would intuitively expect a file path. For instance:
- when setting the `index_uri` of an index to specify the storage provider and location;
- when setting the `metastore_uri` in a node config to set up a file-backed metastore;
- when passing a file path as a command line argument.

### Local file storage URIs

Quickwit interprets regular file paths as local file system URIs. Relative file paths are allowed and are resolved relative to the current working directory (CWD). `~` can be used as a shortcut to refer to the user’s home directory. The following are valid local file system URIs:

```markdown
- /var/quickwit
- file:///var/quickwit
- /home/quickwit/data
- ~/data
- ./quickwit
```

:::caution
When using the `file://` protocol, a third `/` is necessary to express an absolute path. For instance, the URI `file://home/quickwit/` is interpreted as the relative path `./home/quickwit`.
:::
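
The local path rules above, including the `file://` pitfall, can be modeled in a few lines of Python. This is a minimal sketch rather than Quickwit's implementation; `resolve_local_uri` is a hypothetical helper, and it takes the working directory as an explicit argument so that the result is deterministic.

```python
import os
from pathlib import Path


def resolve_local_uri(uri: str, cwd: str) -> Path:
    """Resolve a local storage URI or file path to an absolute path (hypothetical helper)."""
    if uri.startswith("file://"):
        # Keep everything after `file://` as the path:
        # `file:///var/quickwit` -> `/var/quickwit` (absolute),
        # `file://home/quickwit/` -> `home/quickwit/` (relative).
        uri = uri[len("file://"):]
    # Expand a leading `~` to the user's home directory.
    path = Path(os.path.expanduser(uri))
    # Relative paths are resolved against the working directory.
    return path if path.is_absolute() else Path(cwd) / path


print(resolve_local_uri("file://home/quickwit/", cwd="/srv"))  # /srv/home/quickwit
```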

## Storage configuration

This section contains one configuration subsection per storage provider. If a storage configuration parameter is not explicitly set, Quickwit relies on the default values provided by the storage provider SDKs ([Azure SDK for Rust](https://github.com/Azure/azure-sdk-for-rust), [AWS SDK for Rust](https://github.com/awslabs/aws-sdk-rust)).

### S3 storage configuration

| Property | Description | Default value |
| --- | --- | --- |
| `flavor` | The optional storage flavor to use. Available flavors are `digital_ocean`, `garage`, `gcs`, and `minio`. | |
| `access_key_id` | The AWS access key ID. | |
| `secret_access_key` | The AWS secret access key. | |
| `region` | The AWS region to send requests to. | `us-east-1` (SDK default) |
| `endpoint` | Custom endpoint for use with S3-compatible providers. | SDK default |
| `force_path_style_access` | Disables [virtual-hosted–style](https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html) requests. Required by some S3-compatible providers (Ceph, MinIO). | `false` |
| `disable_multi_object_delete` | Disables [Multi-Object Delete](https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html) requests. Required by some S3-compatible providers (GCS). | `false` |
| `disable_multipart_upload` | Disables [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html) of objects. Required by some S3-compatible providers (GCS). | `false` |

:::warning
Hardcoding credentials into configuration files is not secure and strongly discouraged. Prefer the alternative authentication methods that your storage backend may provide.
:::

#### Environment variables

| Env variable | Description |
| --- | --- |
| `QW_S3_ENDPOINT` | Custom S3 endpoint. |
| `QW_S3_MAX_CONCURRENCY` | Limit the number of concurrent requests to S3. |

#### Storage flavors

Storage flavors ensure that Quickwit works correctly with storage providers that deviate from the S3 API by automatically configuring the appropriate settings. The available flavors are:
- `digital_ocean`
- `garage`
- `gcs`
- `minio`

*Digital Ocean*

The Digital Ocean flavor (`digital_ocean`) forces path-style access and turns off multi-object delete requests.

*Garage flavor*

The Garage flavor (`garage`) overrides the `region` parameter to `garage` and forces path-style access.

*Google Cloud Storage*

The Google Cloud Storage flavor (`gcs`) turns off multi-object delete requests and multipart uploads.

*MinIO flavor*

The MinIO flavor (`minio`) forces path-style access.
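
The effect of each flavor can be summarized as a table of forced settings. The Python sketch below is hypothetical (Quickwit is not configured through Python, and `S3StorageConfig` and `apply_flavor` are illustrative names); it assumes that a flavor's settings take precedence over values set explicitly, consistent with the Garage flavor overriding `region`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class S3StorageConfig:
    """Hypothetical Python model of the S3 storage settings in the table above."""
    flavor: Optional[str] = None
    region: Optional[str] = None  # None: fall back to the SDK default
    endpoint: Optional[str] = None
    force_path_style_access: bool = False
    disable_multi_object_delete: bool = False
    disable_multipart_upload: bool = False


# Settings each flavor forces, as described above.
FLAVOR_OVERRIDES = {
    "digital_ocean": {"force_path_style_access": True, "disable_multi_object_delete": True},
    "garage": {"region": "garage", "force_path_style_access": True},
    "gcs": {"disable_multi_object_delete": True, "disable_multipart_upload": True},
    "minio": {"force_path_style_access": True},
}


def apply_flavor(config: S3StorageConfig) -> S3StorageConfig:
    """Return a copy of `config` with the flavor's settings applied.

    Assumption: flavor settings win over explicitly set values.
    """
    if config.flavor is None:
        return config
    try:
        overrides = FLAVOR_OVERRIDES[config.flavor]
    except KeyError:
        raise ValueError(f"unknown storage flavor: {config.flavor}") from None
    return replace(config, **overrides)


print(apply_flavor(S3StorageConfig(flavor="minio")).force_path_style_access)  # True
```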

Example of a storage configuration for Google Cloud Storage in YAML format:

```yaml
storage:
  s3:
    flavor: gcs
    region: us-east1
    endpoint: https://storage.googleapis.com
```

### Azure storage configuration

| Property | Description | Default value |
| --- | --- | --- |
| `account` | The Azure storage account name. | |
| `access_key` | The Azure storage account access key. | |

#### Environment variables

| Env variable | Description |
| --- | --- |
| `QW_AZURE_STORAGE_ACCOUNT` | Azure Blob Storage account name. |
| `QW_AZURE_STORAGE_ACCESS_KEY` | Azure Blob Storage account access key. |

Example of a storage configuration for Azure in YAML format:

```yaml
storage:
  azure:
    account: your-azure-account-name
    access_key: your-azure-access-key
```

## Storage configuration examples for various object storage providers

### Garage

[Garage](https://garagehq.deuxfleurs.fr/) is an open-source distributed object storage service tailored for self-hosting.

```yaml
storage:
  s3:
    flavor: garage
    endpoint: http://127.0.0.1:3900
```

### MinIO

[MinIO](https://min.io/) is a high-performance object storage server.

```yaml
storage:
  s3:
    flavor: minio
    endpoint: http://127.0.0.1:9000
```
8 changes: 2 additions & 6 deletions docs/deployment/deployment-modes.md
@@ -28,12 +28,8 @@ One indexer running on a small instance (4 vCPUs) can ingest documents at a thro
## Multiple indexers, multiple searchers

Indexing a single [data source](../configuration/source-config.md) on several indexers is only possible with a [Kafka source](../configuration/source-config.md#kafka-source).
Support for distributed indexing for Pulsar and the Ingest API is planned for Quickwit 0.7 (Q3). Stay tuned!
Support for native distributed indexing is planned for Quickwit 0.7 (Q4). Stay tuned!

## File-backed metastore limitations

The file-backed metastore is a good fit for standalone and small deployments. However, it has the following limitations:
- It does not support multiple instances.
- It caches metastore data and polls files periodically to update its cache. Thus it has a delayed view of the metastore state.

As long as you can guarantee that no more than one metastore is running at any given time, the file-backed metastore is safe to use. For heavier workloads, we recommend using a PostgreSQL metastore.
The file-backed metastore is a good fit for standalone and small deployments. However, it does not support multiple instances running at the same time. As long as you can guarantee that no more than one metastore is running at any given time, the file-backed metastore is safe to use. For heavy workloads, we recommend using a PostgreSQL metastore.
44 changes: 0 additions & 44 deletions docs/guides/storage-setup/azure-setup.md

This file was deleted.
