DTrace D Compiler Design
The current compiler implementation in DTrace on Linux is largely based on the design found in the legacy implementation that provided D-to-DIF compilation.
Let's first look at how the first implementation of DTrace on Linux handled compilation. Then we will look at the current DTrace compiler process, and after introducing some of the functional requirements that form the basis for needing a substantial change, we will give an overview of the new design.
The highest-level unit of D code presented to DTrace is a program, which comprises a mix of any number of type declarations, variable declarations, and statements. A statement can also be referred to as a clause, and in this document we favour that term because statement is more commonly used in the context of source code (along with expression).
TODO: DESCRIBE TYPE AND VAR DECLARATIONS
D clauses consist of three components:
- one or more probe specifications (that may contain wildcards)
- an optional predicate (a D code expression that evaluates true or false)
- a D code block (internally modeled as a sequence of actions)
Programs can be specified on the command line (using the -P, -m, -f, -n, -i options) or in scripts (using the -s option). Programs are compiled and loaded in the order they are specified, with the clauses they contain being loaded in the order they appear in their containing program. This is important because the order of the clauses is often part of the functional implementation of the tracing session.
A given clause is compiled for each of its associated probe specifications. Note that the list of probe specifications is just that: a comma-separated list of probe specifications. Since each probe specification may contain wildcard elements, each probe specification may resolve into multiple probes that match the specification. Still, a clause is only compiled once for each set of probes that matches a given specification. Simply put: if there are N probe specifications for a clause, the clause will be compiled N times (regardless of how many probes match the different specifications).
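For illustration, a hypothetical script such as the following contains a single clause with a two-entry probe specification list, so the clause is compiled twice, even though each specification here happens to match exactly one probe:

```d
syscall::read:entry,
syscall::write:entry
/pid == $target/
{
	@counts[probefunc] = count();
}
```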
A clause is broken up into multiple DIF instruction sequences, typically one per statement in the clause. A simple example illustrates this:
{
	a = 1;
	b = 2;
	trace(a + b);
}
is compiled into three DIF instruction sequences:
Statement: a = 1;
INS OFF OPCODE INSTRUCTION
000 000: 25000001 setx DT_INTEGER[0], %r1 ! 0x1
001 004: 2a050001 stgs %r1, DT_VAR(1280) ! DT_VAR(1280) = "a"
002 008: 23000001 ret %r1
Statement: b = 2;
INS OFF OPCODE INSTRUCTION
000 000: 25000001 setx DT_INTEGER[0], %r1 ! 0x2
001 004: 2a050101 stgs %r1, DT_VAR(1281) ! DT_VAR(1281) = "b"
002 008: 23000001 ret %r1
Statement: trace(a + b);
INS OFF OPCODE INSTRUCTION
000 000: 29050001 ldgs DT_VAR(1280), %r1 ! DT_VAR(1280) = "a"
001 004: 29050102 ldgs DT_VAR(1281), %r2 ! DT_VAR(1281) = "b"
002 008: 07010201 add %r1, %r2, %r1
003 012: 23000001 ret %r1
As mentioned before, this code is generated for every probe specification (rather than for every matching probe). There are cases where compilation will fail as a function of the matching probe set for a given probe specification. This happens when the clause uses the args[] built-in array variable and the set of matching probes does not provide a consistent argument set. An alternative approach would be to compile the code for each subset of matching probes for which there is a consistent argument set, but that is not done.
When the time comes to actually execute probe scripts, the list of clauses is processed in the order they were specified. For each probe specification in a given clause, a list of DIF instruction sequences is passed to the kernel along with the probe specification. The kernel DTrace component performs the matching of probes against the probe specification, and associates the DIF code sequences with each probe.
With the choice to move away from a DTrace-specific kernel component and instead build tracing upon core Linux kernel tracing facilities, certain limitations came to light. An important one is the fact that BPF programs are passed a program-type-specific context, whereas DTrace probe clauses are expected to be passed a uniform context. The DTrace context ensures that D clauses can e.g. access probe arguments using the arg0 through arg9 variables, or through the args[] array, regardless of the probe. In BPF, some programs can access the arguments directly from the BPF context (e.g. system call probes), whereas others need to dig them out of the register set.
Given that the D compiler is designed to generate code in a largely probe-independent manner (aside from some type checking, where possible), the way to reconcile BPF's program-type-specific contexts with DTrace's generic context is the use of a trampoline. The BPF program that is attached to a probe is a small trampoline that is probe-type (and therefore program-type) specific. Its main job is to take the BPF context and construct a DTrace context from it. Once that is done, a call is made to the BPF code that implements the actual clause.
With this basic knowledge about the need for trampoline code in the current DTrace on Linux, we can discuss the compilation process. Just as with DTrace in its first implementation, a given clause is compiled for each of its associated probe specifications. Again, the list of probe specifications is just that: a comma-separated list of probe specifications. Since each probe specification may contain wildcard elements, each probe specification may resolve into multiple probes that match the specification.
As before, a clause is only compiled once for each set of probes that matches a given specification. Simply put: if there are N probe specifications for a clause, the clause will be compiled N times (regardless of how many probes match the different specifications). It is also important to note that a clause is compiled into a single BPF program (note how this is different from what DTrace v1 does as described above).
The compilation of a clause for (one of) its probe specification(s) starts with a call to the trampoline generator in a provider implementation that matches the probe specification. Here the first complication pops up. If the probe specification matches more than one probe, and the matching probes do not belong to the same provider or do not share a consistent argument set, compilation is not possible. In practical terms this means that almost all probe specifications that match more than one probe cause compilation to fail. More on this in the next section.
As an example, let's look at the trampoline code generated for a syscall probe (in C syntax):
struct syscall_data {
	dt_pt_regs	*regs;
	long		syscall_nr;
	long		arg[6];
};
int main(struct syscall_data *ctx)
{
	struct dt_bpf_context	dctx;
	int			rc;
	int			i;

	memset(&dctx, 0, sizeof(dctx));
	dctx.epid = EPID;
	for (i = 0; i < ARGC; i++)
		dctx.argv[i] = ctx->arg[i];

#ifdef HAS_PREDICATE
	rc = dt_predicate(ctx, &dctx);
	if (rc == 0)
		goto exit;
#endif
	rc = dt_program(ctx, &dctx);
exit:
	return rc;
}
The syscall_data struct provides a definition of the BPF context that is passed to BPF programs attached to system call entry and return probes. In the C code above EPID is an identifier that will be filled in when the code is loaded, ARGC is the number of arguments that the probe has, and HAS_PREDICATE is true if the clause has a predicate.
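Since EPID is only known at load time, the loader must patch it into the compiled instruction stream. The following is a minimal user-space sketch of that kind of load-time patching; the instruction layout is simplified and the EPID_PLACEHOLDER marker is a hypothetical stand-in for however the compiler flags the slot to be filled in:

```c
#include <stdint.h>

/* Simplified BPF instruction encoding: only the fields needed here. */
struct bpf_insn_sketch {
	uint8_t	code;	/* opcode */
	int32_t	imm;	/* immediate operand */
};

/* Hypothetical marker the compiler emits where the EPID belongs. */
#define EPID_PLACEHOLDER	(-1)

/*
 * Walk the instruction stream and replace every EPID placeholder
 * immediate with the enabled-probe ID assigned at load time.
 */
static void
patch_epid(struct bpf_insn_sketch *prog, int ninsns, int32_t epid)
{
	int	i;

	for (i = 0; i < ninsns; i++) {
		if (prog[i].imm == EPID_PLACEHOLDER)
			prog[i].imm = epid;
	}
}
```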
Next, if there is a predicate, code is generated for it in the same instruction stream as the trampoline. In other words, the predicate-implementing code is appended to the trampoline (after the exit instruction).
Then, the actual D code block is compiled and the BPF instructions are appended to the instruction stream that starts with the trampoline and also contains the predicate code (if there is one). The generated code starts off with a prologue that establishes the environment for executing the body of the clause (and, more often than not, for recording tracing information in the output buffer). The prologue is (in C syntax):
int dt_program(void *ctx, struct dt_bpf_context *dctx)
{
	int	rc;	/* %r0 */
	int	key;
	int	cpu;
	char	*buf;	/* %r9 */

	cpu = bpf_get_smp_processor_id();
	key = 0;
	buf = bpf_map_lookup_elem(&mem, &key);
	if (buf == NULL)
		return 0;

	/* Store a 4-byte filler (0) in front of the actual content. */
	*((uint32_t *)&buf[0]) = 0;
	buf += 4;
	*((uint32_t *)&buf[0]) = dctx->epid;
	*((uint32_t *)&buf[4]) = 0;
	/* END OF PROLOGUE */
The code generated for the rest of the D code block follows this prologue, and is followed by an epilogue (in C syntax):
	/* START OF EPILOGUE */
	if (dctx->fault)
		return dctx->fault;
	return bpf_perf_event_output(ctx, &buffers, cpu, buf - 4, SIZE + 4);
}
The details of how the buffer is managed are outside the scope of this document. The important aspect here is that the prologue initializes some important data that is expected to exist during the code generation phase of compiling the D code block, and the epilogue is important because it submits the buffer data to the output buffer that the DTrace consumer will process.
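The record layout the prologue establishes can be sketched in plain user-space C; this is only an illustration of the byte layout implied by the prologue and epilogue above (4-byte filler, then EPID, then a 4-byte pad, with data following), and `record_init` is a hypothetical helper, not part of the actual implementation:

```c
#include <stdint.h>

/*
 * Lay out the head of one output record as the prologue does:
 * a 4-byte filler (0), the enabled-probe ID, and a 4-byte pad.
 * `mem` must have room for at least 12 bytes plus the traced data.
 * Returns the pointer the clause body would use as `buf`; the
 * epilogue submits (buf - 4) with length SIZE + 4.
 */
static char *
record_init(char *mem, uint32_t epid)
{
	char	*buf = mem;

	*((uint32_t *)&buf[0]) = 0;		/* filler */
	buf += 4;
	*((uint32_t *)&buf[0]) = epid;		/* EPID */
	*((uint32_t *)&buf[4]) = 0;		/* pad */

	return buf;
}
```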
Once the D code block has been compiled, the entire "program" (trampoline, compiled predicate (if any), and compiled D code block) is assembled. It is at this stage that the calls to dt_predicate and dt_program are resolved based on the actual start addresses of those pieces of code.
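The call resolution step can be sketched as follows. BPF-to-BPF calls are pc-relative, with the immediate holding the distance to the target in instructions (target index minus call index minus one). The placeholder values and struct layout below are hypothetical simplifications for illustration only:

```c
#include <stdint.h>

/* Minimal stand-in for a BPF instruction: only the fields needed here. */
struct insn {
	uint8_t	code;	/* opcode */
	int32_t	imm;	/* immediate: call target distance */
};

#define BPF_CALL_OP	0x85	/* BPF call opcode */
/* Hypothetical placeholders emitted before the final layout is known. */
#define CALL_PREDICATE	(-2)
#define CALL_PROGRAM	(-3)

/*
 * Resolve placeholder calls once the start indices of dt_predicate
 * and dt_program within the assembled program are known.  BPF calls
 * are pc-relative: imm = target index - call index - 1.
 */
static void
resolve_calls(struct insn *prog, int n, int pred_at, int prog_at)
{
	int	i;

	for (i = 0; i < n; i++) {
		if (prog[i].code != BPF_CALL_OP)
			continue;
		if (prog[i].imm == CALL_PREDICATE)
			prog[i].imm = pred_at - i - 1;
		else if (prog[i].imm == CALL_PROGRAM)
			prog[i].imm = prog_at - i - 1;
	}
}
```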
When the time comes to actually execute probe scripts, global BPF maps are created and then the list of clauses is processed in the order they were specified. For each probe specification in a given clause there will be exactly one program (remember: all actions are compiled into a single program, along with the trampoline and predicate).