Metric comments generated even when no metrics #207

Open
patrick-stephens opened this issue Aug 23, 2022 · 3 comments
patrick-stephens commented Aug 23, 2022

Following the guidance here: https://docs.fluentd.org/monitoring-fluentd/monitoring-prometheus#step-1-counting-incoming-records-by-prometheus-filter-plugin

Setting up the output plugin monitor also generates HELP and TYPE comments for metrics that could exist but for which no samples are actually provided.

While the spec technically allows this (the wording only says that each of these lines may appear at most once per metric name), it can confuse scraping tools and does not seem right: we should only generate those special comments when a metric exists.

An example using this config:

<source>
  @type forward
  port 5000
</source>

<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>

<match **>
  @type null
</match>

When run, this produces HELP and TYPE comments with no samples for some metrics, which some scrapers do not like:

$ docker run --user 0 --rm -it -p 24231:24231 -v $PWD:/fluentd/etc ghcr.io/calyptia/fluentd:edge-debian sh -c "fluent-gem install fluent-plugin-prometheus; su fluentd; tini -- /bin/entrypoint.sh -c /fluentd/etc/fluentd.conf"
...
$ curl localhost:24231/metrics
# TYPE fluentd_output_status_buffer_total_bytes gauge
# HELP fluentd_output_status_buffer_total_bytes Current total size of stage and queue buffers.
# TYPE fluentd_output_status_buffer_stage_length gauge
# HELP fluentd_output_status_buffer_stage_length Current length of stage buffers.
# TYPE fluentd_output_status_buffer_stage_byte_size gauge
# HELP fluentd_output_status_buffer_stage_byte_size Current total size of stage buffers.
# TYPE fluentd_output_status_buffer_queue_length gauge
# HELP fluentd_output_status_buffer_queue_length Current length of queue buffers.
# TYPE fluentd_output_status_queue_byte_size gauge
# HELP fluentd_output_status_queue_byte_size Current total size of queue buffers.
# TYPE fluentd_output_status_buffer_available_space_ratio gauge
# HELP fluentd_output_status_buffer_available_space_ratio Ratio of available space in buffer.
# TYPE fluentd_output_status_buffer_newest_timekey gauge
# HELP fluentd_output_status_buffer_newest_timekey Newest timekey in buffer.
# TYPE fluentd_output_status_buffer_oldest_timekey gauge
# HELP fluentd_output_status_buffer_oldest_timekey Oldest timekey in buffer.
# TYPE fluentd_output_status_retry_count gauge
# HELP fluentd_output_status_retry_count Current retry counts.
fluentd_output_status_retry_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
# TYPE fluentd_output_status_num_errors gauge
# HELP fluentd_output_status_num_errors Current number of errors.
fluentd_output_status_num_errors{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
# TYPE fluentd_output_status_emit_count gauge
# HELP fluentd_output_status_emit_count Current emit counts.
fluentd_output_status_emit_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 3.0
# TYPE fluentd_output_status_emit_records gauge
# HELP fluentd_output_status_emit_records Current emit records.
fluentd_output_status_emit_records{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 3.0
# TYPE fluentd_output_status_write_count gauge
# HELP fluentd_output_status_write_count Current write counts.
fluentd_output_status_write_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
# TYPE fluentd_output_status_rollback_count gauge
# HELP fluentd_output_status_rollback_count Current rollback counts.
fluentd_output_status_rollback_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
# TYPE fluentd_output_status_flush_time_count gauge
# HELP fluentd_output_status_flush_time_count Total flush time.
fluentd_output_status_flush_time_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
# TYPE fluentd_output_status_slow_flush_count gauge
# HELP fluentd_output_status_slow_flush_count Current slow flush counts.
fluentd_output_status_slow_flush_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
# TYPE fluentd_output_status_retry_wait gauge
# HELP fluentd_output_status_retry_wait Current retry wait
fluentd_output_status_retry_wait{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
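
To make the problem in the output above easier to spot, here is a minimal Python sketch that lists metric names which have HELP/TYPE comments but no sample line. It is not part of fluentd or fluent-plugin-prometheus; the function name `families_without_samples` is hypothetical, and the sketch assumes simple gauge or counter families (it does not map histogram or summary suffixes such as `_bucket` back to their family name).

```python
import re

# Matches "# HELP <name> ..." or "# TYPE <name> ..." comment lines.
META_RE = re.compile(r"^#\s+(HELP|TYPE)\s+([a-zA-Z_:][a-zA-Z0-9_:]*)")
# Matches the metric name at the start of a sample line.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def families_without_samples(text):
    """Return metric names that have HELP/TYPE comments but no sample line."""
    declared = []    # names seen in HELP/TYPE comments, in first-seen order
    sampled = set()  # names that appear on at least one sample line
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        meta = META_RE.match(line)
        if meta:
            name = meta.group(2)
            if name not in declared:
                declared.append(name)
            continue
        if line.startswith("#"):
            continue  # any other comment line
        sample = SAMPLE_RE.match(line)
        if sample:
            sampled.add(sample.group(1))
    return [name for name in declared if name not in sampled]


# Excerpt of the /metrics output shown above.
example = """\
# TYPE fluentd_output_status_buffer_total_bytes gauge
# HELP fluentd_output_status_buffer_total_bytes Current total size of stage and queue buffers.
# TYPE fluentd_output_status_retry_count gauge
# HELP fluentd_output_status_retry_count Current retry counts.
fluentd_output_status_retry_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 0.0
"""

print(families_without_samples(example))
# prints ['fluentd_output_status_buffer_total_bytes']
```

Run against the full output above, this would report the buffer-related names, since those have comments but no samples.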

AntoineC44 (Contributor) commented

To me, having the comments is more user-friendly: it shows that the metrics exist but have not received events yet. Which Prometheus parser are you using?

patrick-stephens (Author) commented

cmetrics, i.e. the Fluent Bit one. I believe it will be updated to resolve this, but in the meantime it means that two tools in the same ecosystem do not work together.

These metrics never receive events: in my example they are never populated with anything, which meant I could not scrape metrics from fluentd at all until I disabled them completely. This is the main reason I raised the issue. If this were just a transient failure resolved on the next scrape then fine, but my concern is that other scrapers could fail as well.
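
For a scraper that rejects HELP/TYPE comments without samples, one possible stopgap is to drop those comments before handing the payload to the parser. The following is a minimal Python sketch of that idea only; it is not how cmetrics, Fluent Bit, or fluentd behave, the function name `strip_orphan_metadata` is hypothetical, and it assumes simple gauge or counter families whose sample lines start with the same name as their comments.

```python
import re

# Captures the metric name from "# HELP <name> ..." / "# TYPE <name> ..." lines.
META_RE = re.compile(r"^#\s+(?:HELP|TYPE)\s+([a-zA-Z_:][a-zA-Z0-9_:]*)")
# Captures the metric name at the start of a sample line.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def strip_orphan_metadata(text):
    """Remove HELP/TYPE lines whose metric name has no sample line."""
    lines = text.splitlines()

    # First pass: collect every metric name that has at least one sample.
    sampled = set()
    for line in lines:
        if line and not line.startswith("#"):
            match = SAMPLE_RE.match(line)
            if match:
                sampled.add(match.group(1))

    # Second pass: keep everything except metadata for unsampled names.
    kept = []
    for line in lines:
        meta = META_RE.match(line)
        if meta and meta.group(1) not in sampled:
            continue
        kept.append(line)
    return "".join(line + "\n" for line in kept)


# Hypothetical payload modelled on the output in the issue description.
payload = """\
# TYPE fluentd_output_status_buffer_queue_length gauge
# HELP fluentd_output_status_buffer_queue_length Current length of queue buffers.
# TYPE fluentd_output_status_emit_count gauge
# HELP fluentd_output_status_emit_count Current emit counts.
fluentd_output_status_emit_count{hostname="df5808e5adb2",plugin_id="object:834",type="null"} 3.0
"""

cleaned = strip_orphan_metadata(payload)
print(cleaned)
```

This only hides the symptom on the consuming side; whether the comments should be emitted at all is the question raised in this issue.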

AntoineC44 (Contributor) commented

Thanks for the explanation. I don't know whether this cmetrics parser behavior is the norm or the exception; if it is the exception, maybe opening an issue there for a fix would be better?
