In modern software systems, understanding the performance characteristics of functions—especially those critical to the operation of your application—is paramount. One key metric in performance analysis is function latency, which is the time taken by a function to execute from start to finish. By analyzing function latency, developers can identify bottlenecks, optimize performance, and ensure that their systems operate efficiently under various conditions.
This blog post will dive into how to measure function latency using eBPF, an incredibly powerful tool for tracing and monitoring both kernel and user-space programs.
eBPF (Extended Berkeley Packet Filter) is a revolutionary technology that allows developers to write small programs that run in the Linux kernel. Originally designed for packet filtering, eBPF has evolved into a versatile tool for tracing, monitoring, and profiling system behavior. With eBPF, you can instrument almost any part of the Linux kernel or user-space programs to collect performance data, enforce security policies, or even debug systems in real time—all without the need to modify the kernel source code or restart the system.
eBPF programs are executed in a sandboxed environment within the kernel, ensuring safety and stability. These programs can attach to various hooks within the kernel, such as system calls, network events, and tracepoints, or even user-space functions using uprobes (user-level probes). The data collected by eBPF programs can then be exported to user space for analysis, making it an invaluable tool for system observability. Uprobe
in kernel mode eBPF runtime may also cause relatively large performance overhead. In this case, you can also consider using user mode eBPF runtime, such as bpftime.
Function latency is a critical metric in performance analysis for both kernel and user-space applications. It provides insights into how long a particular function takes to execute, which is crucial for:
- Identifying Performance Bottlenecks: High function latency may indicate inefficiencies or issues within the code that need optimization.
- Ensuring System Responsiveness: In real-time systems or latency-sensitive applications, understanding and minimizing function latency is essential to maintain responsiveness.
- Profiling and Benchmarking: By measuring the latency of various functions, developers can benchmark their systems and compare the performance of different implementations or configurations.
- Debugging and Diagnostics: When a system exhibits unexpected behavior or performance degradation, measuring function latency can help pinpoint the source of the problem.
Both kernel-space (e.g., system calls, file operations) and user-space (e.g., library functions) functions can be profiled for latency, providing a comprehensive view of system performance.
Below is an eBPF program designed to measure the latency of a function by hooking into its entry and exit points. The program uses kprobes and kretprobes (for kernel functions) or uprobes and uretprobes (for user-space functions) to capture the start and end times of the function execution.
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2021 Google LLC. */
#include "vmlinux.h"
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "funclatency.h"
#include "bits.bpf.h"
const volatile pid_t targ_tgid = 0;
const volatile int units = 0;
/* key: pid. value: start time */
struct {
__uint(type, BPF_MAP_TYPE_HASH);
__uint(max_entries, MAX_PIDS);
__type(key, u32);
__type(value, u64);
} starts SEC(".maps");
__u32 hist[MAX_SLOTS] = {};
static void entry(void)
{
u64 id = bpf_get_current_pid_tgid();
u32 tgid = id >> 32;
u32 pid = id;
u64 nsec;
if (targ_tgid && targ_tgid != tgid)
return;
nsec = bpf_ktime_get_ns();
bpf_map_update_elem(&starts, &pid, &nsec, BPF_ANY);
}
SEC("kprobe/dummy_kprobe")
int BPF_KPROBE(dummy_kprobe)
{
entry();
return 0;
}
static void exit(void)
{
u64 *start;
u64 nsec = bpf_ktime_get_ns();
u64 id = bpf_get_current_pid_tgid();
u32 pid = id;
u64 slot, delta;
start = bpf_map_lookup_elem(&starts, &pid);
if (!start)
return;
delta = nsec - *start;
switch (units) {
case USEC:
delta /= 1000;
break;
case MSEC:
delta /= 1000000;
break;
}
slot = log2l(delta);
if (slot >= MAX_SLOTS)
slot = MAX_SLOTS - 1;
__sync_fetch_and_add(&hist[slot], 1);
}
SEC("kretprobe/dummy_kretprobe")
int BPF_KRETPROBE(dummy_kretprobe)
{
exit();
return 0;
}
char LICENSE[] SEC("license") = "GPL";
-
Header Files: The code begins by including the necessary headers like
vmlinux.h
(which provides kernel definitions) andbpf_helpers.h
(which offers helper functions for eBPF programs). -
Global Variables:
targ_tgid
is a target process ID (or thread group ID), andunits
determines the time unit for latency measurement (e.g., microseconds or milliseconds). -
BPF Maps: A hash map (
starts
) is defined to store the start time of function executions for each process ID. Another array (hist
) is used to store the latency distribution. -
Entry Function: The
entry()
function captures the current timestamp when the function is entered and stores it in thestarts
map keyed by the process ID. -
Exit Function: The
exit()
function calculates the latency by subtracting the stored start time from the current time. The result is then categorized into a histogram slot, which is incremented to record the occurrence of that latency range. -
Probes: The
kprobe
andkretprobe
are used to attach to the entry and exit points of the function, respectively. These probes trigger theentry()
andexit()
functions to measure the latency. -
License: The program is licensed under GPL to ensure compliance with kernel licensing requirements.
To trace the latency of a user-space function, such as the read
function in the libc
library, you can run the following command:
# ./funclatency /usr/lib/x86_64-linux-gnu/libc.so.6:read
tracing /usr/lib/x86_64-linux-gnu/libc.so.6:read...
tracing func read in /usr/lib/x86_64-linux-gnu/libc.so.6...
Tracing /usr/lib/x86_64-linux-gnu/libc.so.6:read. Hit Ctrl-C to exit
^C
nsec : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
128 -> 255 : 0 | |
512 -> 1023 : 0 | |
65536 -> 131071 : 651 |****************************************+|
131072 -> 262143 : 107 |****** |
262144 -> 524287 : 36 |** |
524288 -> 1048575 : 8 | |
8388608 -> 16777215 : 2 | |
Exiting trace of /usr/lib/x86_64-linux-gnu/libc.so.6:read
To trace the latency of a kernel-space function, such as vfs_read
, run the following command:
# sudo ./funclatency -u vfs_read
Tracing vfs_read. Hit Ctrl-C to exit
^C
usec : count distribution
0 -> 1 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 3397 |****************************************|
32 -> 63 : 2175 |************************* |
64 -> 127 : 184 |** |
1024 -> 2047 : 0 | |
4096 -> 8191 : 5 | |
2097152 -> 4194303 : 2 | |
Exiting trace of vfs_read
These commands trace the execution of the specified function, either in user-space or kernel-space, and print a histogram of the observed latencies, showing the distribution of function execution times.
You can find the source code in https://github.com/eunomia-bpf/bpf-developer-tutorial/blob/main/src/33-funclatency
Measuring function latency with eBPF offers deep insights into the performance of both user-space and kernel-space code. By understanding function latency, developers can identify performance bottlenecks, improve system responsiveness, and ensure the smooth operation of their applications.
This
blog post covered the basics of using eBPF to trace function latency, including an overview of the eBPF kernel code used to perform the tracing. The examples provided demonstrated how to run the tool to trace both user-space and kernel-space functions.
For those interested in learning more about eBPF, including more advanced examples and tutorials, please visit our https://github.com/eunomia-bpf/bpf-developer-tutorial or our website https://eunomia.dev/tutorials/.
If you are looking for a production-ready tool for function latency measurement, you might want to check out the full implementation available in the BCC repository.