-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interpret: make typed copies lossy wrt provenance and padding #129778
Conversation
rustbot has assigned @michaelwoerister. Use |
Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri Some changes occurred to the CTFE / Miri engine cc @rust-lang/miri The Miri subtree was changed cc @rust-lang/miri |
ea84409
to
8d243c6
Compare
This comment has been minimized.
This comment has been minimized.
8d243c6
to
c9f962d
Compare
This comment has been minimized.
This comment has been minimized.
3c0e117
to
cd76305
Compare
This comment has been minimized.
This comment has been minimized.
cd76305
to
327cf31
Compare
r? compiler |
r? saethlin |
@@ -65,6 +65,9 @@ pub struct CompileTimeMachine<'tcx> { | |||
/// storing the result in the given `AllocId`. | |||
/// Used to prevent reads from a static's base allocation, as that may allow for self-initialization loops. | |||
pub(crate) static_root_ids: Option<(AllocId, LocalDefId)>, | |||
|
|||
/// A cache of "data range" computations for unions (i.e., the offsets of non-padding bytes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the sort of thing that would normally be a query. Is there a reason it isn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we even want this cached during a normal compilation (but it probably won't hurt much either), and the RangeSet
type is currently only declared in the interpreter so can't be used as a query. Also I am largely unfamiliar with the query system so adding a query is scary. ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, because we won't encounter unions repeatedly in const eval. That makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have many constants of the same (union) type, then we would encounter it repeatedly and caching might be worth it. But we don't do validation during const-eval by default so the cache is not nearly as necessary as for Miri.
I can see how making this a query makes sense. But it would probably be a query we wouldn't want to serialize for incremental (I assume there's a way to do that), and as I said I am not sure where we would then put the RangeSet
. Probably rustc_data_structures
, but that seems odd for such a specific data structure. I'm open for suggestions though. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've made a good case that we shouldn't make this into a query in this PR, I'm filing away this conversation for the next time I'm squinting at interpreter profiles :)
.validate_operand( | ||
&allocated.into(), | ||
/*recursive*/ false, | ||
/*reset_provenance_and_padding*/ false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this single caller not want provenance and padding reset? I looked at the docs for this function for context and I think they refer to a function might_permit_raw_init
that doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used for assert_zero_valid / assert_mem_uninitialized_valid. The CompileTimeMachine that we operate in here is discarded at the end of this function, alongside the data we are validating here. So there's no reason to reset its padding and provenance, nobody will ever see those writes anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function parameter and all the branching on it is supposed to be a performance optimization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I have taken to always using the hyperlink rustdoc syntax when mentioning other functions so that the link checker ensures that I can't mention a function that doesn't exist. The habit has already prevented a few of my doc comments from rotting. It's extra work, up to you if you want to adopt it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this function parameter and all the branching on it is supposed to be a performance optimization?
That's one part.
Also this is not the only caller that doesn't to the reset -- const_validate_operand
also doesn't reset. Basically I wanted this change to be Miri-only.
And validate_operand
in the recursive case, when it validates behind references, also doesn't reset padding and provenance -- and that is crucial, it would be wrong to reset this (and indeed the Miri test suite fails when you try, since if you ever have a reference to read-only memory it then blows up when trying to reset padding and provenance there). So branching in the validator on whether we reset provenance and padding is definitely required.
I'm very happy to see less interior mutability. The compiler uses more casual Cell and RefCell than any other codebase I've worked in which seems ironic. @rustbot author |
e140716
to
9529feb
Compare
Finished benchmarking commit (304b7f8): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)Results (secondary -2.8%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (secondary -2.3%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 757.734s -> 756.628s (-0.15%) |
Oh wow, just having these branches sit there, even if they are never taken, shows up so strongly in the benchmarks? I could add a |
It only got a significant regression on CTFE stress tests. If you don't think that this should affect real programs, I think it would be fine to just take the hit (as it sounds that the proposed optimization would introduce a lot of complexity into the code). |
It wouldn't be a lot of complexity, but it would make the (nightly-only) const-eval UB check less effective. But those checks already miss a bunch of UB anyway so maybe that's okay. |
…venance-reset, r=<try> interpreter typed copy validation: statically disable padding/provenance in CTFE machine Let's see if this fixes the perf impact in rust-lang#129778. r? `@saethlin`
experiment: see where the perf regressions in rust-lang#129778 come from Let's see if we can figure out what caused the perf impact in rust-lang#129778. There are some extra functions in a few places so maybe more `inline(always)` helps... r? `@saethlin`
recovers some of the perf regressions from rust-lang#129778
Is it intended that after this change I ran into this just now in rkyv, as I was using |
Function returns are a typed copy. This PR only changed the behavior of the const eval interpreter, codegen has always been capable of breaking such code. |
Yes, this is intended -- padding bytes are always uninitialized. And for very unfortunate reasons related to Happy to hear that is already found a bug in the wild. :D Is there something we can link to for this bug? If you wouldn't mind I'd like to add this to the "bugs Miri found" list. |
Of course. I think the more salient question here is "what does a typed copy mean for unions?". Is there any documentation about what bytes are considered padding in unions? Is just
I think the best I have is this failed CI run. I haven't checked for other potentially-affected sources yet. Feel free to add whatever you'd like! 🙂 |
Yes, that is indeed non-trivial. I would love it to be a fully copy of all bytes, but sadly, that does not work. So currently we use "a union has padding where all fields have padding". I'd love to change this (see rust-lang/unsafe-code-guidelines#518), but I am not sure if that is possible. |
interpret: mark some hot functions inline(always) That seems to recover a good part of the perf impact of rust-lang#129778. r? `@saethlin`
interpret: mark some hot functions inline(always) That seems to recover a good part of the perf impact of rust-lang/rust#129778. r? `@saethlin`
MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).
MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).
Rollup merge of rust-lang#130214 - RalfJung:zeroed, r=Mark-Simulacrum MaybeUninit::zeroed: mention that padding is not zeroed That should clarify cases like [this](rust-lang#129778 (comment)).
recovers some of the perf regressions from rust-lang#129778
A "typed copy" in Rust can be a lossy process: when copying at type
usize
(or any other non-pointer type), if the original memory had any provenance, that provenance is lost. When copying at pointer type, if the original memory had partial provenance (i.e., not the same provenance for all bytes), that provenance is lost. When copying any type with padding, the contents of padding are lost.This PR equips our validity-checking pass with the ability to reset provenance and padding according to those rules. Can be reviewed commit-by-commit. The first three commits are just preparation without any functional change.
Fixes rust-lang/miri#845
Fixes rust-lang/miri#2182