Make worker pool more resilient to slow tasks (#5751)
* Make worker pool more resilient to slow tasks

* CHANGELOG.md

* fix invalid pointer
thampiotr committed Nov 20, 2023
1 parent 18befb8 commit 196e5a2
Showing 5 changed files with 366 additions and 91 deletions.
132 changes: 132 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,138 @@ This document contains a historical list of changes between releases. Only
changes that impact end-user behavior are listed; changes to documentation or
internal API changes are not present.

Main (unreleased)
-----------------

### Breaking changes

- Remove `otelcol.exporter.jaeger` component (@hainenber)

- In the mysqld exporter integration, some metrics are removed and others are renamed. (@marctc)
  - Removed metrics:
    - "mysql_last_scrape_failed" (gauge)
    - "mysql_exporter_scrapes_total" (counter)
    - "mysql_exporter_scrape_errors_total" (counter)
  - Metric names in the `info_schema.processlist` collector have been [changed](https://github.com/prometheus/mysqld_exporter/pull/603).
  - Metric names in the `info_schema.replica_host` collector have been [changed](https://github.com/prometheus/mysqld_exporter/pull/496).
  - Changes related to `replication_group_member_stats collector`:
    - metric "transaction_in_queue" was Counter instead of Gauge
    - renamed 3 metrics starting with `mysql_perf_schema_transaction_` to start with `mysql_perf_schema_transactions_` to be consistent with column names.
    - exposing only server's own stats by matching `MEMBER_ID` with `@@server_uuid` resulting "member_id" label to be dropped.

### Other changes

- Bump `mysqld_exporter` version to v0.15.0. (@marctc)
- Bump `github-exporter` version to 1.0.6. (@marctc)

### Features

- Added a new `stage.decolorize` stage to `loki.process` component which
allows to strip ANSI color codes from the log lines. (@thampiotr)

- Added a new `stage.sampling` stage to `loki.process` component which
allows to only process a fraction of logs and drop the rest. (@thampiotr)

- Added a new `stage.eventlogmessage` stage to `loki.process` component which
allows to extract data from Windows Event Log. (@thampiotr)

- Update version of River:

  - River now supports raw strings, which are strings surrounded by backticks
    instead of double quotes. Raw strings can span multiple lines, and do not
    support any escape sequences. (@erikbaranowski)

  - River now permits using `[]` to access non-existent keys in an object.
    When this is done, the access evaluates to `null`, such that `{}["foo"]
    == null` is true. (@rfratto)

- Added support for python profiling to `pyroscope.ebpf` component. (@korniltsev)

- Windows Flow Installer: Add /CONFIG /DISABLEPROFILING and /DISABLEREPORTING flag (@jkroepke)

- Add queueing logs remote write client for `loki.write` when WAL is enabled. (@thepalbi)

- New Grafana Agent Flow components:

  - `otelcol.processor.filter` - filters OTLP telemetry data using OpenTelemetry
    Transformation Language (OTTL). (@hainenber)

### Enhancements

- The `loki.write` WAL now has snappy compression enabled by default. (@thepalbi)

- Allow converting labels to structured metadata with Loki's structured_metadata stage. (@gonzalesraul)

- Improved performance of `pyroscope.scrape` component when working with a large number of targets. (@cyriltovena)

- Added support for comma-separated list of fields in `source` option and a
new `separator` option in `drop` stage of `loki.process`. (@thampiotr)

- The `loki.source.docker` component now allows connecting to Docker daemons
over HTTP(S) and setting up TLS credentials. (@tpaschalis)

- Added an `exclude_event_message` option to `loki.source.windowsevent` in flow mode,
which excludes the human-friendly event message from Windows event logs. (@ptodev)

- Improve detection of rolled log files in `loki.source.kubernetes` and
`loki.source.podlogs` (@slim-bean).

- Support clustering in `loki.source.kubernetes` (@slim-bean).

- Support clustering in `loki.source.podlogs` (@rfratto).

- Make component list sortable in web UI. (@hainenber)

- Adds new metrics (`mssql_server_total_memory_bytes`, `mssql_server_target_memory_bytes`,
and `mssql_available_commit_memory_bytes`) for `mssql` integration.

- Grafana Agent Operator: `config-reloader` container no longer runs as root.
(@rootmout)

- Added support for replaying not sent data for `loki.write` when WAL is enabled. (@thepalbi)

- Added support for unicode strings in `pyroscope.ebpf` python profiles. (@korniltsev)

- Improved resilience of graph evaluation in presence of slow components. (@thampiotr)

### Bugfixes

- Set exit code 1 on grafana-agentctl non-runnable command. (@fgouteroux)

- Fixed an issue where `loki.process` validation for stage `metric.counter` was
allowing invalid combination of configuration options. (@thampiotr)

- Fixed issue where adding a module after initial start, that failed to load then subsequently resolving the issue would cause the module to
permanently fail to load with `id already exists` error. (@mattdurham)

- Allow the usage of encodings other than UTF8 to be used with environment variable expansion. (@mattdurham)

- Fixed an issue where native histogram time series were being dropped silently. (@krajorama)

- Fix validation issue with ServiceMonitors when scrape timeout is greater than interval. (@captncraig)

- Static mode's spanmetrics processor will now prune histograms when the dimension cache is pruned.
Dimension cache was always pruned but histograms were not being pruned. This caused metric series
created by the spanmetrics processor to grow unbounded. Only static mode has this issue. Flow mode's
`otelcol.connector.spanmetrics` does not have this bug. (@nijave)

- Prevent logging errors on normal shutdown in `loki.source.journal`. (@wildum)

- Break on iterate journal failure in `loki.source.journal`. (@wildum)

- Fix file descriptor leak in `loki.source.journal`. (@wildum)

- Fixed a bug in River where passing a non-string key to an object (such as
`{}[true]`) would incorrectly report that a number type was expected instead. (@rfratto)

- Include Faro Measurement `type` field in `faro.receiver` Flow component and legacy `app_agent_receiver` integration. (@rlankfo)

- Mark `password` argument of `loki.source.kafka` as a `secret` rather than a `string`. (@harsiddhdave44)

- Fixed a bug where UDP syslog messages were never processed (@joshuapare)

- Updating configuration for `loki.write` no longer drops data. (@thepalbi)

v0.37.4 (2023-11-06)
-----------------

4 changes: 2 additions & 2 deletions pkg/flow/flow_updates_test.go
@@ -114,7 +114,7 @@ func TestController_Updates_WithQueueFull(t *testing.T) {
ModuleRegistry: newModuleRegistry(),
IsModule: false,
// The small number of workers and small queue means that a lot of updates will need to be retried.
- WorkerPool: worker.NewShardedWorkerPool(1, 1),
+ WorkerPool: worker.NewFixedWorkerPool(1, 1),
})

// Use testUpdatesFile from graph_builder_test.go.
@@ -376,6 +376,6 @@ func newTestController(t *testing.T) *Flow {
ModuleRegistry: newModuleRegistry(),
IsModule: false,
// Make sure that we have consistent number of workers for tests to make them deterministic.
- WorkerPool: worker.NewShardedWorkerPool(4, 100),
+ WorkerPool: worker.NewFixedWorkerPool(4, 100),
})
}