feat(ruler): enables ruler store that uses clients from thanos-io/objstore pkg #11713

Draft
wants to merge 12 commits into
base: main
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -51,6 +51,7 @@
* [11654](https://github.com/grafana/loki/pull/11654) **dannykopping** Cache: atomically check background cache size limit correctly.
* [11682](https://github.com/grafana/loki/pull/11682) **ashwanthgoli** Metadata cache: Adds `frontend.max-metadata-cache-freshness` to configure the time window for which metadata results are not cached. This helps avoid returning inaccurate results by not caching recent results.
* [11679](https://github.com/grafana/loki/pull/11679) **dannykopping** Cache: extending #11535 to align custom ingester query split with cache keys for correct caching of results.
* [11713](https://github.com/grafana/loki/pull/11713) **ashwanthgoli** Ruler: Adds a new ruler storage layer that uses clients created from the thanos-io/objstore package. The existing storage layer is now deprecated.

##### Fixes
* [11074](https://github.com/grafana/loki/pull/11074) **hainenber** Fix panic in lambda-promtail due to mishandling of empty DROP_LABELS env var.
286 changes: 286 additions & 0 deletions docs/sources/configure/_index.md
@@ -147,6 +147,9 @@ Pass the `-config.expand-env` flag at the command line to enable this way of set
# The ruler block configures the Loki ruler.
[ruler: <ruler>]

# The ruler_storage_config block configures the ruler storage backend.
[ruler_storage: <ruler_storage_config>]

# The ingester_client block configures how the distributor will connect to
# ingesters. Only appropriate when running all components, the distributor, or
# the querier.
@@ -5276,6 +5279,289 @@ Named store from this example can be used by setting object_store to store-1 in
[cos: <map of string to cos_storage_config>]
```

### ruler_storage_config

The `ruler_storage_config` block configures the ruler storage backend.

```yaml
# Backend storage to use. Supported backends are: s3, gcs, azure, swift,
# filesystem, local.
# CLI flag: -ruler-storage.backend
[backend: <string> | default = "s3"]

s3:
  # The S3 bucket endpoint. It could be an AWS S3 endpoint listed at
  # https://docs.aws.amazon.com/general/latest/gr/s3.html or the address of an
  # S3-compatible service in hostname:port format.
  # CLI flag: -ruler-storage.s3.endpoint
  [endpoint: <string> | default = ""]

  # S3 region. If unset, the client will issue an S3 GetBucketLocation API call
  # to autodetect it.
  # CLI flag: -ruler-storage.s3.region
  [region: <string> | default = ""]

  # S3 bucket name
  # CLI flag: -ruler-storage.s3.bucket-name
  [bucket_name: <string> | default = ""]

  # S3 secret access key
  # CLI flag: -ruler-storage.s3.secret-access-key
  [secret_access_key: <string> | default = ""]

  # S3 session token
  # CLI flag: -ruler-storage.s3.session-token
  [session_token: <string> | default = ""]

  # S3 access key ID
  # CLI flag: -ruler-storage.s3.access-key-id
  [access_key_id: <string> | default = ""]

  # If enabled, use http:// for the S3 endpoint instead of https://. This could
  # be useful in local dev/test environments while using an S3-compatible
  # backend storage, like Minio.
  # CLI flag: -ruler-storage.s3.insecure
  [insecure: <boolean> | default = false]

  # The signature version to use for authenticating against S3. Supported values
  # are: v4.
  # CLI flag: -ruler-storage.s3.signature-version
  [signature_version: <string> | default = "v4"]

  # The S3 storage class to use. Details can be found at
  # https://aws.amazon.com/s3/storage-classes/.
  # CLI flag: -ruler-storage.s3.storage-class
  [storage_class: <string> | default = "STANDARD"]

  sse:
    # Enable AWS Server Side Encryption. Supported values: SSE-KMS, SSE-S3.
    # CLI flag: -ruler-storage.s3.sse.type
    [type: <string> | default = ""]

    # KMS Key ID used to encrypt objects in S3
    # CLI flag: -ruler-storage.s3.sse.kms-key-id
    [kms_key_id: <string> | default = ""]

    # KMS Encryption Context used for object encryption. It expects a
    # JSON-formatted string.
    # CLI flag: -ruler-storage.s3.sse.kms-encryption-context
    [kms_encryption_context: <string> | default = ""]

  http_config:
    # The time an idle connection will remain idle before closing.
    # CLI flag: -ruler-storage.s3.http.idle-conn-timeout
    [idle_conn_timeout: <duration> | default = 1m30s]

    # The amount of time the client will wait for a server's response headers.
    # CLI flag: -ruler-storage.s3.http.response-header-timeout
    [response_header_timeout: <duration> | default = 2m]

    # If the client connects via HTTPS and this option is enabled, the client
    # will accept any certificate and hostname.
    # CLI flag: -ruler-storage.s3.http.insecure-skip-verify
    [insecure_skip_verify: <boolean> | default = false]

    # Maximum time to wait for a TLS handshake. 0 means no limit.
    # CLI flag: -ruler-storage.s3.tls-handshake-timeout
    [tls_handshake_timeout: <duration> | default = 10s]

    # The time to wait for a server's first response headers after fully writing
    # the request headers if the request has an Expect header. 0 to send the
    # request body immediately.
    # CLI flag: -ruler-storage.s3.expect-continue-timeout
    [expect_continue_timeout: <duration> | default = 1s]

    # Maximum number of idle (keep-alive) connections across all hosts. 0 means
    # no limit.
    # CLI flag: -ruler-storage.s3.max-idle-connections
    [max_idle_connections: <int> | default = 100]

    # Maximum number of idle (keep-alive) connections to keep per-host. If 0, a
    # built-in default value is used.
    # CLI flag: -ruler-storage.s3.max-idle-connections-per-host
    [max_idle_connections_per_host: <int> | default = 100]

    # Maximum number of connections per host. 0 means no limit.
    # CLI flag: -ruler-storage.s3.max-connections-per-host
    [max_connections_per_host: <int> | default = 0]

gcs:
  # GCS bucket name
  # CLI flag: -ruler-storage.gcs.bucketname
  [bucket_name: <string> | default = ""]

  # JSON representing either a Google Developers Console client_credentials.json
  # file or a Google Developers service account key file. If empty, fall back to
  # Google default logic.
  # CLI flag: -ruler-storage.gcs.service-account
  [service_account: <string> | default = ""]

azure:
  # Azure storage account name
  # CLI flag: -ruler-storage.azure.account-name
  [account_name: <string> | default = ""]

  # Azure storage account key. If unset, Azure managed identities will be used
  # for authentication instead.
  # CLI flag: -ruler-storage.azure.account-key
  [account_key: <string> | default = ""]

  # If `connection-string` is set, the values of `account-name` and
  # `endpoint-suffix` are not used. Use this method over `account-key` if you
  # need to authenticate via a SAS token or if you use the Azurite emulator.
  # CLI flag: -ruler-storage.azure.connection-string
  [connection_string: <string> | default = ""]

  # Azure storage container name
  # CLI flag: -ruler-storage.azure.container-name
  [container_name: <string> | default = "loki"]

  # Azure storage endpoint suffix without schema. The account name will be
  # prefixed to this value to create the FQDN. If set to an empty string, the
  # default endpoint suffix is used.
  # CLI flag: -ruler-storage.azure.endpoint-suffix
  [endpoint_suffix: <string> | default = ""]

  # Number of retries for recoverable errors
  # CLI flag: -ruler-storage.azure.max-retries
  [max_retries: <int> | default = 20]

  # User-assigned managed identity. If empty, the system-assigned identity is
  # used.
  # CLI flag: -ruler-storage.azure.user-assigned-id
  [user_assigned_id: <string> | default = ""]

  http_config:
    # The time an idle connection will remain idle before closing.
    # CLI flag: -ruler-storage.azure.http.idle-conn-timeout
    [idle_conn_timeout: <duration> | default = 1m30s]

    # The amount of time the client will wait for a server's response headers.
    # CLI flag: -ruler-storage.azure.http.response-header-timeout
    [response_header_timeout: <duration> | default = 2m]

    # If the client connects via HTTPS and this option is enabled, the client
    # will accept any certificate and hostname.
    # CLI flag: -ruler-storage.azure.http.insecure-skip-verify
    [insecure_skip_verify: <boolean> | default = false]

    # Maximum time to wait for a TLS handshake. 0 means no limit.
    # CLI flag: -ruler-storage.azure.tls-handshake-timeout
    [tls_handshake_timeout: <duration> | default = 10s]

    # The time to wait for a server's first response headers after fully writing
    # the request headers if the request has an Expect header. 0 to send the
    # request body immediately.
    # CLI flag: -ruler-storage.azure.expect-continue-timeout
    [expect_continue_timeout: <duration> | default = 1s]

    # Maximum number of idle (keep-alive) connections across all hosts. 0 means
    # no limit.
    # CLI flag: -ruler-storage.azure.max-idle-connections
    [max_idle_connections: <int> | default = 100]

    # Maximum number of idle (keep-alive) connections to keep per-host. If 0, a
    # built-in default value is used.
    # CLI flag: -ruler-storage.azure.max-idle-connections-per-host
    [max_idle_connections_per_host: <int> | default = 100]

    # Maximum number of connections per host. 0 means no limit.
    # CLI flag: -ruler-storage.azure.max-connections-per-host
    [max_connections_per_host: <int> | default = 0]

swift:
  # OpenStack Swift authentication API version. 0 to autodetect.
  # CLI flag: -ruler-storage.swift.auth-version
  [auth_version: <int> | default = 0]

  # OpenStack Swift authentication URL
  # CLI flag: -ruler-storage.swift.auth-url
  [auth_url: <string> | default = ""]

  # Set this to true to use the internal OpenStack Swift endpoint URL
  # CLI flag: -ruler-storage.swift.internal
  [internal: <boolean> | default = false]

  # OpenStack Swift username.
  # CLI flag: -ruler-storage.swift.username
  [username: <string> | default = ""]

  # OpenStack Swift user's domain name.
  # CLI flag: -ruler-storage.swift.user-domain-name
  [user_domain_name: <string> | default = ""]

  # OpenStack Swift user's domain ID.
  # CLI flag: -ruler-storage.swift.user-domain-id
  [user_domain_id: <string> | default = ""]

  # OpenStack Swift user ID.
  # CLI flag: -ruler-storage.swift.user-id
  [user_id: <string> | default = ""]

  # OpenStack Swift API key.
  # CLI flag: -ruler-storage.swift.password
  [password: <string> | default = ""]

  # OpenStack Swift user's domain ID.
  # CLI flag: -ruler-storage.swift.domain-id
  [domain_id: <string> | default = ""]

  # OpenStack Swift user's domain name.
  # CLI flag: -ruler-storage.swift.domain-name
  [domain_name: <string> | default = ""]

  # OpenStack Swift project ID (v2,v3 auth only).
  # CLI flag: -ruler-storage.swift.project-id
  [project_id: <string> | default = ""]

  # OpenStack Swift project name (v2,v3 auth only).
  # CLI flag: -ruler-storage.swift.project-name
  [project_name: <string> | default = ""]

  # ID of the OpenStack Swift project's domain (v3 auth only), only needed if it
  # differs from the user domain.
  # CLI flag: -ruler-storage.swift.project-domain-id
  [project_domain_id: <string> | default = ""]

  # Name of the OpenStack Swift project's domain (v3 auth only), only needed if
  # it differs from the user domain.
  # CLI flag: -ruler-storage.swift.project-domain-name
  [project_domain_name: <string> | default = ""]

  # OpenStack Swift Region to use (v2,v3 auth only).
  # CLI flag: -ruler-storage.swift.region-name
  [region_name: <string> | default = ""]

  # Name of the OpenStack Swift container to put chunks in.
  # CLI flag: -ruler-storage.swift.container-name
  [container_name: <string> | default = ""]

  # Maximum number of retries on request errors.
  # CLI flag: -ruler-storage.swift.max-retries
  [max_retries: <int> | default = 3]

  # Time after which a connection attempt is aborted.
  # CLI flag: -ruler-storage.swift.connect-timeout
  [connect_timeout: <duration> | default = 10s]

  # Time after which an idle request is aborted. The timeout watchdog is reset
  # each time some data is received, so the timeout triggers only after no data
  # has been received on a request for this long.
  # CLI flag: -ruler-storage.swift.request-timeout
  [request_timeout: <duration> | default = 5s]

filesystem:
  # Local filesystem storage directory.
  # CLI flag: -ruler-storage.filesystem.dir
  [dir: <string> | default = ""]

local:
  # Directory to scan for rules
  # CLI flag: -ruler-storage.local.directory
  [directory: <string> | default = ""]
```
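
As a rough illustration of how the reference above might be filled in, the following sketch points the ruler at an S3-compatible service such as MinIO. It assumes that per-backend keys nest under their backend name (`s3:` here) and that the whole block sits under the top-level `ruler_storage` key. The hostname, port, bucket name, and environment variable names are hypothetical. The `${...}` references are only substituted when Loki is started with `-config.expand-env`.

```yaml
# Hypothetical values throughout; only the key names come from the reference above.
ruler_storage:
  backend: s3
  s3:
    # Assumed address of an S3-compatible service, in hostname:port format.
    endpoint: minio.example.internal:9000
    bucket_name: loki-ruler
    # Hypothetical environment variables; requires -config.expand-env.
    access_key_id: ${RULER_S3_ACCESS_KEY_ID}
    secret_access_key: ${RULER_S3_SECRET_ACCESS_KEY}
    # Use http:// instead of https://, intended for local dev/test setups only.
    insecure: true
```

Any key left out keeps the default listed in the reference, for example `signature_version: v4` and `storage_class: STANDARD`.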

## Runtime Configuration file

Loki supports a "runtime config" file, which is reloaded while Loki is running. Some Loki components use it to let operators change parts of the Loki configuration without a restart. Specify the file with the `-runtime-config.file=<filename>` flag; the reload period, which defaults to 10 seconds, can be changed with the `-runtime-config.reload-period=<duration>` flag. Previously this mechanism was used only for limits overrides, and the flags were called `-limits.per-user-override-config=<filename>` and `-limits.per-user-override-period=10s` respectively. These flags are still used if `-runtime-config.file=<filename>` is not specified.
2 changes: 1 addition & 1 deletion go.mod
@@ -132,7 +132,7 @@ require (
 	github.com/prometheus/alertmanager v0.26.0
 	github.com/prometheus/common/sigv4 v0.1.0
 	github.com/richardartoul/molecule v1.0.0
-	github.com/thanos-io/objstore v0.0.0-20230829152104-1b257a36f9a3
+	github.com/thanos-io/objstore v0.0.0-20231025225615-ff7faac741fb
 	github.com/willf/bloom v2.0.3+incompatible
 	go.opentelemetry.io/collector/pdata v1.0.0-rcv0015
 	go4.org/netipx v0.0.0-20230125063823-8449b0a6169f
4 changes: 2 additions & 2 deletions go.sum
@@ -1715,8 +1715,8 @@ github.com/tedsuo/ifrit v0.0.0-20191009134036-9a97d0632f00/go.mod h1:eyZnKCc955u
 github.com/tencentcloud/tencentcloud-sdk-go v1.0.162/go.mod h1:asUz5BPXxgoPGaRgZaVm1iGcUAuHyYUo1nXqKa83cvI=
 github.com/tencentyun/cos-go-sdk-v5 v0.7.40 h1:W6vDGKCHe4wBACI1d2UgE6+50sJFhRWU4O8IB2ozzxM=
 github.com/tencentyun/cos-go-sdk-v5 v0.7.40/go.mod h1:4dCEtLHGh8QPxHEkgq+nFaky7yZxQuYwgSJM87icDaw=
-github.com/thanos-io/objstore v0.0.0-20230829152104-1b257a36f9a3 h1:avZFY25vRM35FggTBQj2WXq45yEvIKbDLUcNDrJLfKU=
-github.com/thanos-io/objstore v0.0.0-20230829152104-1b257a36f9a3/go.mod h1:oJ82xgcBDzGJrEgUsjlTj6n01+ZWUMMUR8BlZzX5xDE=
+github.com/thanos-io/objstore v0.0.0-20231025225615-ff7faac741fb h1:fZuIuOSHsaUOJqvcWlIgt1lACXLF1073TmRuzoByQqw=
+github.com/thanos-io/objstore v0.0.0-20231025225615-ff7faac741fb/go.mod h1:q369VBtseI5OQbK9IsGDfQCfcVu1fsur7ynUcojxnDA=
 github.com/tidwall/gjson v1.6.0/go.mod h1:P256ACg0Mn+j1RXIDXoss50DeIABTYK1PULOJHhxOls=
 github.com/tidwall/match v1.0.1/go.mod h1:LujAq0jyVjBy028G1WhWfIzbpQfMO8bBZ6Tyb0+pL9E=
 github.com/tidwall/pretty v1.0.0/go.mod h1:XNkn88O1ChpSDQmQeStsy+sBenx6DDtFZJxhVysOjyk=
5 changes: 5 additions & 0 deletions pkg/loki/loki.go
@@ -81,6 +81,7 @@ type Config struct {
 	Frontend       lokifrontend.Config    `yaml:"frontend,omitempty"`
 	QueryRange     queryrange.Config      `yaml:"query_range,omitempty"`
 	Ruler          ruler.Config           `yaml:"ruler,omitempty"`
+	RulerStorage   rulestore.Config       `yaml:"ruler_storage,omitempty"`
 	IngesterClient ingester_client.Config `yaml:"ingester_client,omitempty"`
 	Ingester       ingester.Config        `yaml:"ingester,omitempty"`
 	IndexGateway   indexgateway.Config    `yaml:"index_gateway"`
@@ -160,6 +161,7 @@ func (c *Config) RegisterFlags(f *flag.FlagSet) {
 	c.TableManager.RegisterFlags(f)
 	c.Frontend.RegisterFlags(f)
 	c.Ruler.RegisterFlags(f)
+	c.RulerStorage.RegisterFlags(f)
 	c.Worker.RegisterFlags(f)
 	c.QueryRange.RegisterFlags(f)
 	c.RuntimeConfig.RegisterFlags(f)
@@ -227,6 +229,9 @@ func (c *Config) Validate() error {
 	if err := c.Ruler.Validate(); err != nil {
 		return errors.Wrap(err, "invalid ruler config")
 	}
+	if err := c.RulerStorage.Validate(); err != nil {
+		return errors.Wrap(err, "invalid ruler_storage config")
+	}
 	if err := c.Ingester.Validate(); err != nil {
 		return errors.Wrap(err, "invalid ingester config")
 	}
12 changes: 9 additions & 3 deletions pkg/loki/modules.go
@@ -1029,13 +1029,15 @@ func (t *Loki) initQueryFrontend() (_ services.Service, err error) {
 }

 func (t *Loki) initRulerStorage() (_ services.Service, err error) {
+	logger := log.With(util_log.Logger, "component", "ruler-storage")
+
 	// if the ruler is not configured and we're in single binary then let's just log an error and continue.
 	// unfortunately there is no way to generate a "default" config and compare default against actual
 	// to determine if it's unconfigured. the following check, however, correctly tests this.
 	// Single binary integration tests will break if this ever drifts
 	legacyReadMode := t.Cfg.LegacyReadTarget && t.Cfg.isModuleEnabled(Read)
-	if (t.Cfg.isModuleEnabled(All) || legacyReadMode || t.Cfg.isModuleEnabled(Backend)) && t.Cfg.Ruler.StoreConfig.IsDefaults() {
-		level.Info(util_log.Logger).Log("msg", "Ruler storage is not configured; ruler will not be started.")
+	if (t.Cfg.isModuleEnabled(All) || legacyReadMode || t.Cfg.isModuleEnabled(Backend)) && t.Cfg.Ruler.StoreConfig.IsDefaults() && t.Cfg.RulerStorage.IsDefaults() {
+		level.Info(logger).Log("msg", "Ruler storage is not configured; ruler will not be started.")
 		return
 	}

@@ -1047,7 +1049,11 @@ func (t *Loki) initRulerStorage() (_ services.Service, err error) {
 		}
 	}

-	t.RulerStorage, err = base_ruler.NewLegacyRuleStore(t.Cfg.Ruler.StoreConfig, t.Cfg.StorageConfig.Hedging, t.clientMetrics, ruler.GroupLoader{}, util_log.Logger)
+	if !t.Cfg.Ruler.StoreConfig.IsDefaults() {
+		t.RulerStorage, err = base_ruler.NewLegacyRuleStore(t.Cfg.Ruler.StoreConfig, t.Cfg.StorageConfig.Hedging, t.clientMetrics, ruler.GroupLoader{}, logger)
+	} else {
+		t.RulerStorage, err = base_ruler.NewRuleStore(context.Background(), t.Cfg.RulerStorage, nil, ruler.GroupLoader{}, logger, prometheus.DefaultRegisterer)
+	}

 	return
 }
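
The precedence this change gives the two storage blocks is easiest to see in a configuration that sets both. The sketch below assumes the legacy block keeps its existing `ruler.storage` keys (`type`, `local.directory`); both directory paths are hypothetical. Because the legacy block is not at its defaults, `t.Cfg.Ruler.StoreConfig.IsDefaults()` returns false, so `initRulerStorage` calls `NewLegacyRuleStore` and the `ruler_storage` block is not used.

```yaml
# Legacy block: any non-default value here selects NewLegacyRuleStore.
ruler:
  storage:
    type: local
    local:
      directory: /loki/rules          # hypothetical path

# New block: only consulted when the legacy block above is left at its defaults.
ruler_storage:
  backend: filesystem
  filesystem:
    dir: /loki/ruler-objstore         # hypothetical path
```

Removing the `ruler.storage` block from this sketch would send `initRulerStorage` down the `else` branch, which calls `NewRuleStore` with `t.Cfg.RulerStorage`.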