Added prometheus client and metrics server to Ghost #21192

@cmraible cmraible commented Oct 2, 2024

ref https://linear.app/tryghost/issue/ENG-1505/add-prometheus-metrics-server-to-allow-monitoring-ghost-metrics

Summary

This commit includes two main components: a Prometheus client class that collects metrics from Ghost, and a standalone metrics server that exposes a /metrics endpoint on its own port (9416 by default), separate from the main Ghost app.

The prometheus client is a very thin wrapper around [prom-client](https://github.com/siimon/prom-client). We could use prom-client directly, but this approach should make it easier to switch to a different prometheus client package (or make our own) if we ever need to down the line.
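
To illustrate the thin-wrapper idea, here is a minimal sketch. It is not the class from this PR: the name `PrometheusClient`, its method names, and the constructor injection are assumptions made for illustration. Only `collectDefaultMetrics()`, `register.metrics()`, and `register.contentType` are prom-client's own API.

```javascript
// Hypothetical thin wrapper; callers depend on this class, not on prom-client.
class PrometheusClient {
    // `promClient` is expected to be the prom-client module, e.g.
    // new PrometheusClient(require('prom-client')). Injecting it keeps this
    // sketch free of hard dependencies and makes the backend swappable.
    constructor(promClient) {
        this.client = promClient;
    }

    // Start collecting the default Node.js process metrics.
    init() {
        this.client.collectDefaultMetrics();
    }

    // Resolves to the metrics in Prometheus text exposition format.
    getMetrics() {
        return this.client.register.metrics();
    }

    // Content-Type header value to send with the /metrics response.
    getContentType() {
        return this.client.register.contentType;
    }
}
```

Because the rest of the codebase would only call `init()`, `getMetrics()`, and `getContentType()`, replacing the underlying package should only require changes inside this one class.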

The list of default metrics this enables is specified in an e2e test here. This also gives us the ability to create and collect custom metrics, although none are included in this commit yet.

Configuration

The prometheus client and the metrics server are both enabled by default, but can be disabled by setting the metrics_server:disabled flag to true.
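
For reference, a sketch of how the flag might be read, assuming the colon-delimited key maps onto nested config the same way Ghost's other settings do. The `getConfig` and `isMetricsServerEnabled` helpers are hypothetical and only mimic that lookup; they are not part of this PR.

```javascript
// Hypothetical helper: resolves a colon-delimited key against a nested object.
function getConfig(config, key) {
    return key
        .split(':')
        .reduce((node, part) => (node == null ? undefined : node[part]), config);
}

// Example config shape that would disable the client and the metrics server.
const exampleConfig = {
    metrics_server: {
        disabled: true
    }
};

// Enabled unless the flag is explicitly set to true, matching the
// "enabled by default" behaviour described above.
function isMetricsServerEnabled(config) {
    return getConfig(config, 'metrics_server:disabled') !== true;
}
```

With this shape, an empty config leaves metrics enabled, and only an explicit `true` turns them off.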

Why not expose the /metrics endpoint in one of the existing express apps?

The standalone express app exists for two main reasons:

  1. We don't want these metrics to be public, and the easiest way to accomplish that is to expose the /metrics endpoint at a different port that won't be exposed to the internet.
  2. Creating a standalone express instance decouples the metrics endpoint from the Ghost server, so if Ghost is not responding for whatever reason, we should still be able to scrape metrics to understand what's going on internally.
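
The two reasons above can be sketched roughly as follows. To stay dependency-free, this sketch uses Node's built-in `http` module rather than the express app this PR adds; `resolveRoute`, `createMetricsServer`, and the `getMetrics` callback are hypothetical names, and binding to `127.0.0.1` is just one possible way to keep the port off the public internet.

```javascript
const http = require('node:http');

const METRICS_PORT = 9416; // default port named in this PR

// Pure routing decision: only GET /metrics is served.
function resolveRoute(method, url) {
    const path = new URL(url, 'http://localhost').pathname;
    return method === 'GET' && path === '/metrics' ? 200 : 404;
}

// `getMetrics` is assumed to return a Promise of Prometheus-formatted text.
function createMetricsServer(getMetrics) {
    return http.createServer(async (req, res) => {
        if (resolveRoute(req.method, req.url) !== 200) {
            res.statusCode = 404;
            res.end();
            return;
        }
        try {
            const body = await getMetrics();
            res.statusCode = 200;
            res.setHeader('Content-Type', 'text/plain; version=0.0.4; charset=utf-8');
            res.end(body);
        } catch (err) {
            // A failing collector should not take the metrics server down.
            res.statusCode = 500;
            res.end();
        }
    });
}

// Usage (not executed here): listen on a separate, non-public port.
// createMetricsServer(() => client.getMetrics()).listen(METRICS_PORT, '127.0.0.1');
```

Because this server has its own listener, a scrape of `/metrics` does not pass through the main Ghost request pipeline at all.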

Impact on Boot & Shutdown time

The prometheus client is initialized early in the boot process so we can collect metrics during the boot sequence. Testing locally has shown that this increases boot time by ~20ms. The metrics server, which exposes the /metrics endpoint, is not initialized until after the background services, and its startup is not awaited, to avoid impacting boot time. None of this code, including the requires, will run if the metrics_server:disabled flag is set to true.
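
The ordering described above might look roughly like this. It is a simplified sketch, not Ghost's actual boot code: `boot` and the injected `initClient`, `initBackgroundServices`, and `startMetricsServer` functions are hypothetical stand-ins (in real code the first and last would sit behind lazy `require` calls inside the `if` blocks).

```javascript
// Hypothetical boot sequence showing where the metrics pieces would fit.
async function boot({config, initClient, initBackgroundServices, startMetricsServer}) {
    const metricsEnabled = config.metrics_server?.disabled !== true;

    // 1. Client first, so metrics can be collected during the rest of boot.
    if (metricsEnabled) {
        initClient();
    }

    // 2. The rest of boot, represented here by the background services.
    await initBackgroundServices();

    // 3. Metrics server last and deliberately NOT awaited, so a slow
    //    listener never delays boot. Errors are caught to avoid an
    //    unhandled rejection.
    if (metricsEnabled) {
        startMetricsServer().catch((err) => {
            console.error('Metrics server failed to start', err);
        });
    }
}
```

With the flag set, neither `initClient` nor `startMetricsServer` is called, which is the point of gating both behind the same check.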

Shutting down the metrics server is added as a cleanup task for the main Ghost server instance, and it is set up to shut down with a grace period of 0 to avoid impacting shutdown time.

@cmraible cmraible closed this Oct 3, 2024