pahole -j BTF conversion is not reproducible #42

martinetd · 2023-05-14T03:57:48Z

Hi,

The multithreaded DWARF -> BTF conversion just spawns threads that each consume the next DWARF CU whenever they're ready and write their output whenever they're done. Since the processing time of each CU isn't guaranteed, the output order varies between runs and the result is not reproducible.

I don't see an obvious solution with the current code (there's some reordering for Rust, would that work without too big of a slowdown?), but I figured I'd bring it up here first for ideas.
The workaround that'll likely be used for NixOS is disabling threads if SOURCE_DATE_EPOCH is set (as that most likely means a reproducible build was intended), but we'll be happy to try something else.
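
To make the workaround concrete, here is a minimal sketch of the idea, not the actual NixOS patch. The helper name `effective_jobs` is hypothetical and does not exist in pahole; it only shows the decision "fall back to one thread when SOURCE_DATE_EPOCH is set".

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not part of pahole): pick the number of worker
 * threads, falling back to a single thread when SOURCE_DATE_EPOCH is set,
 * on the assumption that a reproducible build was intended.
 */
static int effective_jobs(int requested_jobs)
{
	if (getenv("SOURCE_DATE_EPOCH") != NULL)
		return 1;

	/* Never return fewer than one job. */
	return requested_jobs > 0 ? requested_jobs : 1;
}
```

With this, a build that exports SOURCE_DATE_EPOCH gets single-threaded (and therefore ordered) output without any caller having to pass a new option.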

acmel · 2023-05-15T21:06:19Z

Humm, perhaps we can add an extra pass to create just the CUs, sorted by name, then make the BTF encoding ordered by CU name somehow. That will probably end up causing some performance penalty, as sometimes a BTF encoder thread would have to wait for the next (sorted by name) CU to have its DWARF processed, so it would require some command line option for enabling it, maybe --reproducible-output.

martinetd · 2023-05-16T00:06:20Z

I agree sorting is probably the most straightforward solution.
It seems a bit of a shame to sort before the parallel BTF processing, as that'll require threads to wait for each other as you pointed out. Would sorting the final output be more difficult?
It also doesn't have to be a costly sort like CU name; it could be pure input order, e.g. something like adding a counter:

- have two global counters, one for the current input CU number and one for the current output CU number;
- in dwarf_cus__nextcu, remember the input CU number and increment it there under lock;
- when processing is done, if the CU's number matches the output CU number, output directly (and increment the output number), otherwise park the result in a temporary list;
- while we're there, after having successfully output anything, check whether the temporary list holds the next CU(s) in line and output those too, incrementing the output counter each time.

That should be possible without too much of a slowdown, just holding the memory associated with the output in a temporary list until it's ready to be copied off.
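
As a rough illustration of the counter scheme above, here is a self-contained sketch. Every name in it (`ordered_out`, `ordered_out__take`, `ordered_out__done`, `emit_append`) is hypothetical and not taken from pahole; the payload string stands in for whatever per-CU output would be held back, and emitting under the lock is a simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and names; none of this exists in pahole. */
struct pending {
	unsigned int seq;      /* input number of the finished CU */
	const char *payload;   /* stands in for the CU's encoded output */
	struct pending *next;
};

struct ordered_out {
	pthread_mutex_t lock;
	unsigned int next_in;   /* next input number to hand out */
	unsigned int next_out;  /* next input number allowed to be emitted */
	struct pending *parked; /* results that finished early, sorted by seq */
	void (*emit)(const char *payload, void *arg);
	void *arg;
};

static void ordered_out__init(struct ordered_out *o,
			      void (*emit)(const char *, void *), void *arg)
{
	pthread_mutex_init(&o->lock, NULL);
	o->next_in = o->next_out = 0;
	o->parked = NULL;
	o->emit = emit;
	o->arg = arg;
}

/* Called where a thread picks up the next CU: hand out its input number. */
static unsigned int ordered_out__take(struct ordered_out *o)
{
	pthread_mutex_lock(&o->lock);
	unsigned int seq = o->next_in++;
	pthread_mutex_unlock(&o->lock);
	return seq;
}

/* Called when a thread finishes a CU. Returns 0, or -1 if malloc fails. */
static int ordered_out__done(struct ordered_out *o, unsigned int seq,
			     const char *payload)
{
	int err = 0;

	pthread_mutex_lock(&o->lock);
	assert(seq >= o->next_out); /* each number is completed only once */

	if (seq != o->next_out) {
		/* Not our turn yet: park the result, keeping the list sorted. */
		struct pending *p = malloc(sizeof(*p));
		if (p == NULL) {
			err = -1;
			goto out;
		}
		p->seq = seq;
		p->payload = payload;
		struct pending **pos = &o->parked;
		while (*pos != NULL && (*pos)->seq < seq)
			pos = &(*pos)->next;
		p->next = *pos;
		*pos = p;
		goto out;
	}

	/* Our turn: emit, then drain any parked results that are now next. */
	o->emit(payload, o->arg);
	o->next_out++;
	while (o->parked != NULL && o->parked->seq == o->next_out) {
		struct pending *p = o->parked;
		o->parked = p->next;
		o->emit(p->payload, o->arg);
		o->next_out++;
		free(p);
	}
out:
	pthread_mutex_unlock(&o->lock);
	return err;
}

/* Demo callback: append the payload to a caller-supplied string buffer. */
static void emit_append(const char *payload, void *arg)
{
	strcat((char *)arg, payload);
}
```

Threads never wait for each other here; a result that finishes early only costs the memory it occupies while parked, which matches the trade-off described above.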

Regarding an extra command line switch (if the slowdown requires it), it's not trivial to add options to all users of pahole (e.g. the Linux build), so basing the decision on SOURCE_DATE_EPOCH or another env var might be easier to use, but I guess we can figure that out later.

martinetd mentioned this issue May 14, 2023

pahole: patch to force single-threaded mode if reproducibility is desired NixOS/nixpkgs#231768

Merged

brycekahle mentioned this issue Oct 16, 2023

Reproducible BTF tarballs aquasecurity/btfhub#98

Closed
