
BufferOverflowError when buffers are full, fluentd is shutting down, and overflow_action is set to block #35

Closed
ryn9 opened this issue Mar 19, 2022 · 3 comments
Labels: question (User forum like issues)
ryn9 commented Mar 19, 2022

(check apply)

  • read the contribution guideline
  • (optional) already reported 3rd party upstream repository or mailing list if you use k8s addon or helm charts.

Steps to replicate

Example config:

<system>
  workers 4
  root_dir /fluentd/root_dir
  log_level debug
</system>

<source>
  @type http
  port 8080
  bind 0.0.0.0
  body_size_limit 10m
  keepalive_timeout 10s
  @label @STANDARD
</source>


<label @STANDARD>

XXXFILTERSXXX

  <match **>
    @id out_opensearch
    @type opensearch
    scheme https
    ssl_verify true
    host XXXHOSTXXX
    port XXXPORTXXX
    user XXXUSERXXX
    password XXXPASSWORDXXX
    target_index_key @target_index
    remove_keys @target_index
    compression_level default_compression
    <buffer tag>
      @type file
      #path set via root_dir (for multi worker)
      compress gzip
      flush_interval 1s
      overflow_action block
      retry_type periodic
      retry_forever true
      retry_wait 5s
    </buffer>
  </match>

  <match **>
    @type null
  </match>

</label>


<label @ERROR>
  <match **>
    @type stdout
  </match>
</label>

<label @FLUENT_LOG>
  <match **>
    @type stdout
  </match>
</label>

Expected Behavior or What you need to ask

When the OpenSearch service is unavailable, the buffers are full, and fluentd is shutting down, we see BufferOverflowError errors and messages sent to the @ERROR label.

Example errors:

2022-03-19 18:29:59 +0000 [warn]: #2 send an error event stream to @ERROR: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/buffer.rb:327:in `write'" tag=""
2022-03-19 18:29:59 +0000 [warn]: #3 send an error event stream to @ERROR: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/buffer.rb:327:in `write'" tag=""
2022-03-19 18:29:59 +0000 [warn]: #0 send an error event stream to @ERROR: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/usr/local/bundle/gems/fluentd-1.14.5/lib/fluent/plugin/buffer.rb:327:in `write'" tag=""

I am not sure this issue is limited to the OpenSearch output plugin.

Note: when using monitor_agent, we see the buffer_available_buffer_space_ratio metric actually go negative.
https://docs.fluentd.org/input/monitor_agent
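
For reference, a minimal sketch of a monitor_agent source that exposes these metrics (24220 is the documented default port; it would sit alongside the existing sources):

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

The per-plugin buffer metrics, including buffer_available_buffer_space_ratio, are then served at http://localhost:24220/api/plugins.json.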

Using Fluentd and OpenSearch plugin versions

OS version: Docker image built off ghcr.io/calyptia/fluentd:v1.14.5-debian-1.0
Bare Metal or within Docker or Kubernetes or others?: Docker image built off ghcr.io/calyptia/fluentd:v1.14.5-debian-1.0
Fluentd v1.0 or later: 1.14.5
OpenSearch plugin version: fluent-plugin-opensearch version 1.0.2

cosmo0920 (Collaborator) commented

overflow_action block

This parameter is not intended to improve throughput.
overflow_action block should be used for batch operations, such as Embulk-like loads.
For ordinary cases, you should use the throw_exception or drop_oldest_chunk actions when the buffer is full.

FYI: Fluentd official document says that:

block: wait until buffer can store more data.
After buffer is ready for storing more data, writing buffer is retried.
Because of such behavior, block is suitable for processing batch execution,
so do not use for improving processing throughput or performance.

ref: https://docs.fluentd.org/configuration/buffer-section#flushing-parameters
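
A minimal sketch of the original buffer section with the suggested action swapped in (drop_oldest_chunk shown; throw_exception is configured the same way):

<buffer tag>
  @type file
  compress gzip
  flush_interval 1s
  # or: overflow_action throw_exception
  overflow_action drop_oldest_chunk
  retry_type periodic
  retry_forever true
  retry_wait 5s
</buffer>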

cosmo0920 added the question label on Mar 22, 2022.

ryn9 (Author) commented Mar 22, 2022

@cosmo0920 the aforementioned configuration is not intended to improve throughput.

It is intended to serve as an 'at-least-once' style aggregator. With the noted behavior, messages are arguably lost, in that they do not stay in the standard pipeline and instead get sent off to @ERROR.

Again, I am not sure this issue is limited to the OpenSearch output plugin.
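
Not a fix for the behavior itself, but as a stopgap toward the at-least-once goal, the events routed to @ERROR could be persisted to disk instead of printed; a minimal sketch using the stock file output (the path is illustrative):

<label @ERROR>
  <match **>
    @type file
    path /fluentd/error_events
    append true
  </match>
</label>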

cosmo0920 (Collaborator) commented

This occurs when the wrong option is selected. You should use the throw_exception or drop_oldest_chunk actions. This is Fluentd's specified behavior.
