Useful metrics #106

jan-g · 2017-11-09T13:13:03Z

Operationally, there are some obvious things to measure per flow node. These should be exposed via /metrics if they aren't already:

DB connectivity:

number of active pool connections (vs. idle)
sql span histograms for journalling

One upper limit on how many stage operations we can sustain per second is (max pool connections) / <sql query span>, since each operation holds a pool connection for the duration of its query.
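As a rough illustration of that bound, here is a minimal sketch. It assumes each stage operation issues one journalling query and holds exactly one pool connection while it runs; the function name and the example numbers are made up for illustration.

```python
def max_stage_ops_per_second(max_pool_connections: int, sql_span_seconds: float) -> float:
    """Upper bound on stage operations per second.

    Assumption: one operation == one SQL query == one pooled connection
    held for sql_span_seconds. Real throughput will be lower.
    """
    if max_pool_connections < 0:
        raise ValueError("max_pool_connections must be non-negative")
    if sql_span_seconds <= 0:
        raise ValueError("sql_span_seconds must be positive")
    return max_pool_connections / sql_span_seconds


# Hypothetical numbers: 10 connections, 250 ms per journalling query.
print(max_stage_ops_per_second(10, 0.25))
```

Feeding the measured span from the SQL histograms into this gives a ceiling to compare against the observed stage rate.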

Executor connectivity:

number of active fn invocations the executor is waiting on.

Error counts:

fn failures
db errors
lower-level errors: e.g., socket availability (we might conceivably bump into this if we have a naive HTTP/1.1 connection to the fn API).

hhexo · 2017-11-09T13:20:56Z

For DB connectivity we have spans of the time taken by SQL persistence operations (by operation). These are then collected into histograms by the Prometheus mapper. I would argue that we also need quantiles (and should therefore use Prometheus Summaries instead of Histograms), so I can add those.
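To make the histogram/quantile distinction concrete, here is a small standard-library sketch (not the Prometheus client, and not what the mapper does internally). It shows the two views of the same span durations: cumulative bucket counts, and a nearest-rank quantile over the raw observations. The bucket bounds and sample durations are invented.

```python
import bisect
import math


def bucket_counts(durations, upper_bounds):
    """Cumulative counts: how many observations are <= each upper bound."""
    ordered = sorted(durations)
    return [bisect.bisect_right(ordered, bound) for bound in upper_bounds]


def quantile(durations, q):
    """Nearest-rank quantile over the raw observations (0 < q <= 1)."""
    if not durations:
        raise ValueError("no observations")
    ordered = sorted(durations)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


# Hypothetical SQL span durations in seconds.
spans = [0.002, 0.004, 0.004, 0.010, 0.250]
print(bucket_counts(spans, [0.005, 0.05, 0.5]))
print(quantile(spans, 0.5), quantile(spans, 0.99))
```

The bucket view only says that one span landed somewhere between 0.05 s and 0.5 s, while the quantile view reports the slow outlier directly.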

Counters (number of connections, number of fn invocations, ...) are supported in Prometheus, but I'm not sure whether OpenTracing has a concept for those (still, I'm a bit ignorant when it comes to OpenTracing).

jan-g · 2017-11-10T09:42:53Z

I don't mind using raw Prometheus (or something wrapped around it) if it means we can get counters out for useful things.

zootalures · 2017-11-10T12:15:15Z

I don't think OpenTracing concerns itself with metrics/gauge stuff, and retconning numbers from the events is a bad idea. I assume we'll need to generate Prometheus metrics from internal gauges/counters alongside the event metrics.
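One possible shape for such an internal gauge/counter pair, sketched with the standard library only. The class name `InFlightGauge` and its methods are hypothetical, not part of the flow service or of any Prometheus client; the idea is just that the code path that starts and finishes an fn invocation updates the numbers directly, instead of reconstructing them from trace events.

```python
import threading
from contextlib import contextmanager


class InFlightGauge:
    """Tracks how many operations are in flight and how many were ever started."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0   # gauge: goes up and down
        self._started = 0     # counter: only goes up

    @property
    def in_flight(self):
        with self._lock:
            return self._in_flight

    @property
    def started(self):
        with self._lock:
            return self._started

    @contextmanager
    def track(self):
        """Wrap one operation; the gauge is decremented even if it raises."""
        with self._lock:
            self._in_flight += 1
            self._started += 1
        try:
            yield
        finally:
            with self._lock:
                self._in_flight -= 1


fn_invocations = InFlightGauge()
with fn_invocations.track():
    print(fn_invocations.in_flight)   # inside the tracked block
print(fn_invocations.in_flight, fn_invocations.started)
```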

hhexo · 2017-11-20T18:18:05Z

Note: #114 adds a few of the mentioned metrics.

DB timings (already there)
API call timings
Number of currently active Flows
Number of currently active Fn invocations
Duration of individual flows (aggregated as histogram / quantiles)
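For reference, gauges like the "currently active" numbers above would appear on /metrics roughly as follows in the Prometheus text exposition format. This is a hand-rolled sketch to show the output shape only; a real service would use a Prometheus client library, and the metric names here are invented rather than the ones used in #114.

```python
def render_gauges(gauges):
    """Render {name: (help_text, value)} as Prometheus text exposition lines."""
    out = []
    for name, (help_text, value) in sorted(gauges.items()):
        out.append(f"# HELP {name} {help_text}\n")
        out.append(f"# TYPE {name} gauge\n")
        out.append(f"{name} {value}\n")
    return "".join(out)


# Hypothetical metric names and values.
print(render_gauges({
    "flow_active_flows": ("Number of currently active flows.", 3),
    "flow_active_fn_invocations": ("Number of currently active fn invocations.", 7),
}), end="")
```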

hhexo · 2017-11-21T16:37:10Z

#114 is now closed; that work will be done as part of #84 instead, since #84 changes the API.
