From 187d10a1d7b719c05f9d5665fe8c2072d5de11df Mon Sep 17 00:00:00 2001
From: Anna Kapuscinska
Date: Wed, 28 Feb 2024 16:09:12 +0000
Subject: [PATCH] docs: Add autogenerated metrics reference page

Signed-off-by: Anna Kapuscinska
---
 .gitattributes                            |   1 +
 docs/content/en/docs/reference/metrics.md | 422 ++++++++++++++++++++++
 2 files changed, 423 insertions(+)
 create mode 100644 docs/content/en/docs/reference/metrics.md

diff --git a/.gitattributes b/.gitattributes
index 381565e053a..835b4faafda 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -7,4 +7,5 @@
 # docs
 /docs/content/en/docs/reference/helm-chart.md linguist-generated
 /docs/content/en/docs/reference/grpc-api.md linguist-generated
+/docs/content/en/docs/reference/metrics.md linguist-generated
 /docs/data/tetragon_flags.yaml linguist-generated
diff --git a/docs/content/en/docs/reference/metrics.md b/docs/content/en/docs/reference/metrics.md
new file mode 100644
index 00000000000..b3bdac51a08
--- /dev/null
+++ b/docs/content/en/docs/reference/metrics.md
@@ -0,0 +1,422 @@
+---
+title: "Metrics Reference"
+description: >
+  This reference documents Prometheus metrics exposed by Tetragon.
+weight: 4
+---
+
+
+# Tetragon Health Metrics
+
+## `tetragon_build_info`
+
+Build information about tetragon
+
+| label | values |
+| ----- | ------ |
+| `commit` | `931b70f2c9878ba985ba6b589827bea17da6ec33` |
+| `go_version` | `go1.22.0` |
+| `modified` | `false` |
+| `time ` | `2022-05-13T15:54:45Z` |
+
+## `tetragon_data_event_size`
+
+The size of received data events.
+
+| label | values |
+| ----- | ------ |
+| `op ` | `bad, ok` |
+
+## `tetragon_data_events_total`
+
+The number of data events by type. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `event` | `Added, Appended, Bad, Matched, NotMatched, Received` |
+
+## `tetragon_errors_total`
+
+The total number of Tetragon errors. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `type ` | `event_finalize_process_info_failed, event_missing_process_info, handler_error, process_cache_evicted, process_cache_miss_on_get, process_cache_miss_on_remove, process_pid_tid_mismatch` |
+
+## `tetragon_event_cache_accesses_total`
+
+The total number of Tetragon event cache accesses. For internal use only.
+
+## `tetragon_event_cache_errors_total`
+
+The total number of errors encountered while fetching process exec information from the cache.
+
+| label | values |
+| ----- | ------ |
+| `error` | `nil_process_pid` |
+| `event_type` | `PROCESS_EXEC, PROCESS_EXIT, PROCESS_KPROBE, PROCESS_LOADER, PROCESS_TRACEPOINT, PROCESS_UPROBE, RATE_LIMIT_INFO` |
+
+## `tetragon_event_cache_parent_info_errors_total`
+
+The total number of times we failed to fetch cached parent info for a given event type.
+
+| label | values |
+| ----- | ------ |
+| `event_type` | `PROCESS_EXEC, PROCESS_EXIT, PROCESS_KPROBE, PROCESS_LOADER, PROCESS_TRACEPOINT, PROCESS_UPROBE, RATE_LIMIT_INFO` |
+
+## `tetragon_event_cache_pod_info_errors_total`
+
+The total number of times we failed to fetch cached pod info for a given event type.
+
+| label | values |
+| ----- | ------ |
+| `event_type` | `PROCESS_EXEC, PROCESS_EXIT, PROCESS_KPROBE, PROCESS_LOADER, PROCESS_TRACEPOINT, PROCESS_UPROBE, RATE_LIMIT_INFO` |
+
+## `tetragon_event_cache_process_info_errors_total`
+
+The total number of times we failed to fetch cached process info for a given event type.
+
+| label | values |
+| ----- | ------ |
+| `event_type` | `PROCESS_EXEC, PROCESS_EXIT, PROCESS_KPROBE, PROCESS_LOADER, PROCESS_TRACEPOINT, PROCESS_UPROBE, RATE_LIMIT_INFO` |
+
+## `tetragon_event_cache_retries_total`
+
+The total number of retries for event caching per entry type.
+
+| label | values |
+| ----- | ------ |
+| `entry_type` | `parent_info, pod_info, process_info` |
+
+## `tetragon_flags_total`
+
+The total number of Tetragon flags. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `type ` | `auid, clone, errorArgs, errorCWD, errorCgroupID, errorCgroupKn, errorCgroupName, errorCgroupSubsys, errorCgroupSubsysCgrp, errorCgroups, errorFilename, errorPathResolutionCwd, execve, execveat, miss, nocwd, procFS, rootcwd, taskWalk, truncArgs, truncFilename` |
+
+## `tetragon_generic_kprobe_merge_errors_total`
+
+The total number of failed attempts to merge a kprobe and kretprobe event.
+
+| label | values |
+| ----- | ------ |
+| `curr_fn` | `example_kprobe` |
+| `curr_type` | `enter, exit` |
+| `prev_fn` | `example_kprobe` |
+| `prev_type` | `enter, exit` |
+
+## `tetragon_generic_kprobe_merge_ok_total`
+
+The total number of successful attempts to merge a kprobe and kretprobe event.
+
+## `tetragon_generic_kprobe_merge_pushed_total`
+
+The total number of pushed events for later merge.
+
+## `tetragon_handler_errors_total`
+
+The total number of event handler errors. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `error_type` | `event_handler_failed, unknown_opcode` |
+| `opcode` | `0, 11, 13, 14, 15, 23, 24, 25, 26, 5, 7` |
+
+## `tetragon_handling_latency`
+
+The latency of handling messages in microseconds.
+
+| label | values |
+| ----- | ------ |
+| `op ` | `11, 13, 14, 15, 23, 24, 25, 26, 5, 7` |
+
+## `tetragon_map_errors_total`
+
+The total number of entries dropped per LRU map.
+
+| label | values |
+| ----- | ------ |
+| `map ` | `execve_map, tg_execve_joined_info_map` |
+
+## `tetragon_map_in_use_gauge`
+
+The total number of in-use entries per map.
+
+| label | values |
+| ----- | ------ |
+| `map ` | `execve_map, tg_execve_joined_info_map` |
+| `total` | ` 0` |
+
+## `tetragon_missed_events_total`
+
+The total number of Tetragon events per type that failed to be sent from the kernel.
+
+| label | values |
+| ----- | ------ |
+| `msg_op` | `11, 13, 14, 15, 23, 24, 25, 26, 5, 7` |
+
+## `tetragon_msg_op_total`
+
+The total number of times we encounter a given message opcode. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `msg_op` | `11, 13, 14, 15, 23, 24, 25, 26, 5, 7` |
+
+## `tetragon_notify_overflowed_events_total`
+
+The total number of events dropped because the listener buffer was full
+
+## `tetragon_policyfilter_metrics_total`
+
+Policy filter metrics. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `op ` | `add, add-container, delete, update` |
+| `subsys` | `pod-handlers, rthooks` |
+
+## `tetragon_process_loader_stats`
+
+Process Loader event statistics. For internal use only.
+
+| label | values |
+| ----- | ------ |
+| `count` | `LoaderReceived, LoaderResolvedImm, LoaderResolvedRetry` |
+
+## `tetragon_ratelimit_dropped_total`
+
+The total number of rate limit Tetragon drops
+
+## `tetragon_ringbuf_perf_event_errors_total`
+
+The total number of errors when reading the Tetragon ringbuf.
+
+## `tetragon_ringbuf_perf_event_lost_total`
+
+The total number of Tetragon ringbuf perf events lost.
+
+## `tetragon_ringbuf_perf_event_received_total`
+
+The total number of Tetragon ringbuf perf events received.
+
+## `tetragon_ringbuf_queue_lost_total`
+
+The total number of Tetragon events ring buffer queue lost.
+
+## `tetragon_ringbuf_queue_received_total`
+
+The total number of Tetragon events ring buffer queue received.
+
+## `tetragon_tracingpolicy_loaded`
+
+The number of loaded tracing policies by state.
+
+| label | values |
+| ----- | ------ |
+| `state` | `disabled, enabled, error, load_error` |
+
+## `tetragon_watcher_errors_total`
+
+The total number of errors for a given watcher type.
+
+| label | values |
+| ----- | ------ |
+| `error` | `failed_to_get_pod` |
+| `watcher` | ` k8s` |
+
+## `tetragon_watcher_events_total`
+
+The total number of events for a given watcher type.
+
+| label | values |
+| ----- | ------ |
+| `watcher` | ` k8s` |
+
+
+
+# Tetragon Resources Metrics
+
+## `go_gc_duration_seconds`
+
+A summary of the pause duration of garbage collection cycles.
+
+## `go_goroutines`
+
+Number of goroutines that currently exist.
+
+## `go_info`
+
+Information about the Go environment.
+
+| label | values |
+| ----- | ------ |
+| `version` | `go1.22.0` |
+
+## `go_memstats_alloc_bytes`
+
+Number of bytes allocated and still in use.
+
+## `go_memstats_alloc_bytes_total`
+
+Total number of bytes allocated, even if freed.
+
+## `go_memstats_buck_hash_sys_bytes`
+
+Number of bytes used by the profiling bucket hash table.
+
+## `go_memstats_frees_total`
+
+Total number of frees.
+
+## `go_memstats_gc_sys_bytes`
+
+Number of bytes used for garbage collection system metadata.
+
+## `go_memstats_heap_alloc_bytes`
+
+Number of heap bytes allocated and still in use.
+
+## `go_memstats_heap_idle_bytes`
+
+Number of heap bytes waiting to be used.
+
+## `go_memstats_heap_inuse_bytes`
+
+Number of heap bytes that are in use.
+
+## `go_memstats_heap_objects`
+
+Number of allocated objects.
+
+## `go_memstats_heap_released_bytes`
+
+Number of heap bytes released to OS.
+
+## `go_memstats_heap_sys_bytes`
+
+Number of heap bytes obtained from system.
+
+## `go_memstats_last_gc_time_seconds`
+
+Number of seconds since 1970 of last garbage collection.
+
+## `go_memstats_lookups_total`
+
+Total number of pointer lookups.
+
+## `go_memstats_mallocs_total`
+
+Total number of mallocs.
+
+## `go_memstats_mcache_inuse_bytes`
+
+Number of bytes in use by mcache structures.
+
+## `go_memstats_mcache_sys_bytes`
+
+Number of bytes used for mcache structures obtained from system.
+
+## `go_memstats_mspan_inuse_bytes`
+
+Number of bytes in use by mspan structures.
+
+## `go_memstats_mspan_sys_bytes`
+
+Number of bytes used for mspan structures obtained from system.
+
+## `go_memstats_next_gc_bytes`
+
+Number of heap bytes when next garbage collection will take place.
+
+## `go_memstats_other_sys_bytes`
+
+Number of bytes used for other system allocations.
+
+## `go_memstats_stack_inuse_bytes`
+
+Number of bytes in use by the stack allocator.
+
+## `go_memstats_stack_sys_bytes`
+
+Number of bytes obtained from system for stack allocator.
+
+## `go_memstats_sys_bytes`
+
+Number of bytes obtained from system.
+
+## `go_threads`
+
+Number of OS threads created.
+
+## `process_cpu_seconds_total`
+
+Total user and system CPU time spent in seconds.
+
+## `process_max_fds`
+
+Maximum number of open file descriptors.
+
+## `process_open_fds`
+
+Number of open file descriptors.
+
+## `process_resident_memory_bytes`
+
+Resident memory size in bytes.
+
+## `process_start_time_seconds`
+
+Start time of the process since unix epoch in seconds.
+
+## `process_virtual_memory_bytes`
+
+Virtual memory size in bytes.
+
+## `process_virtual_memory_max_bytes`
+
+Maximum amount of virtual memory available in bytes.
+
+
+
+# Tetragon Events Metrics
+
+## `tetragon_events_total`
+
+The total number of Tetragon events
+
+| label | values |
+| ----- | ------ |
+| `binary` | `example-binary` |
+| `namespace` | `example-namespace` |
+| `pod ` | `example-pod` |
+| `type ` | `PROCESS_EXEC, PROCESS_EXIT, PROCESS_KPROBE, PROCESS_LOADER, PROCESS_TRACEPOINT, PROCESS_UPROBE, RATE_LIMIT_INFO` |
+| `workload` | `example-workload` |
+
+## `tetragon_policy_events_total`
+
+Policy events calls observed.
+
+| label | values |
+| ----- | ------ |
+| `binary` | `example-binary` |
+| `hook ` | `example_kprobe` |
+| `namespace` | `example-namespace` |
+| `pod ` | `example-pod` |
+| `policy` | `example-tracingpolicy` |
+| `workload` | `example-workload` |
+
+## `tetragon_syscalls_total`
+
+System calls observed.
+
+| label | values |
+| ----- | ------ |
+| `binary` | `example-binary` |
+| `namespace` | `example-namespace` |
+| `pod ` | `example-pod` |
+| `syscall` | `example_syscall` |
+| `workload` | `example-workload` |
+