Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reads without init events #624

Merged
merged 21 commits into from
Mar 9, 2024
Merged

Reads without init events #624

merged 21 commits into from
Mar 9, 2024

Conversation

ThomasHaas
Copy link
Collaborator

@ThomasHaas ThomasHaas commented Feb 23, 2024

This PR attempts to address #623 .

There is some cleanup to do.
There is also a problem with removing init events from litmus tests. I need to debug this (I suppose it is related to co-last memory values?)

EDIT: It also partially addresses #136

@ThomasHaas ThomasHaas changed the title Reads without init events [DRAFT] Reads without init events Feb 24, 2024
@ThomasHaas
Copy link
Collaborator Author

ThomasHaas commented Mar 2, 2024

I need some feedback on how far we want to go in this branch.
(1) If I make sure that we generate all init events as before (even for dynamic allocations), everything works(*) and this PR is pretty much done.
(2) If we don't create init events for dynamic allocations, at least C11 needs some rewriting (this is why a test fails). The current implementation already changes the definition of fr which, together with SC-per-loc, seems to be sufficient for most models to handle to reads from uninitialized memory (I cannot give a 100% guarantee on correctness though!).
However, C11 defines it's own fr (which we can remove) and does not have SC-per-loc and thus requires the rewriting of an axiom.

(*) As long as the programs are really memory-safe, there should be no change to our current semantics. For programs that can read memory out-of-bounds, we previously could give a spurious PASS as the read had no possible rf-edges and we always required the existence of one. This has changed in this PR: the out-of-bounds reads are possible.


EDIT: I added two new options to (i) toggle between zeroed/non-deterministic default values (only relevant to uninitialized memory) and (ii) to toggle between creating/not creating init events for dynamic allocations. The default values are such that they match our current behaviour. This should also make the above question obsolete.

…dynamic allocations (both are enabled by default)

- Added new test showing that we can read uninit memory.
@ThomasHaas ThomasHaas changed the title [DRAFT] Reads without init events Reads without init events Mar 2, 2024
@hernanponcedeleon
Copy link
Owner

A few questions / comments that come to my mind

  • the current implementation would return PASS ("be correct" might be an over-claim since the program has UB) for this variant of the uninitRead.c, right?
int main()
{
    array[SIZE] = 42;
    assert (array[SIZE] == 42);
    return 0;
}

At some point I thought handling out-of-memory would not work out-of-box due to our alias analysis relying on defined behavior

  • we need to be sure we properly handle all variants of alloc (malloc, calloc, ...). AFAIR they vary in how (if) the initialize memory which used to be irrelevant for us when we zeroed out all memory
  • I don't think the last question is irrelevant now: if one set the new options to false, we return unsound results now, so this still needs to be discussed / fixed
  • which is the axiom giving problems in c11? The chain of co and rf in Coh?
  • we should add a TODO to the part where you programmatically set the new fr. This should be moved to some common cat once we manage imports

@ThomasHaas
Copy link
Collaborator Author

A few questions / comments that come to my mind

* the current implementation would return PASS ("be correct" might be an over-claim since the program has UB) for this variant of the `uninitRead.c`, right?
int main()
{
    array[SIZE] = 42;
    assert (array[SIZE] == 42);
    return 0;
}

I think the alias analysis will prohibit those two from aliasing in both the new and the old version, so you will get unsound results in general. In this particular case I don't know what result you will get (possibly PASS for both, but for the wrong reasons).
Anyway, this is simply due to the alias analysis prohibiting this and not due to our anarchic semantics.
For example, if you would load the index from memory, say int index = SIZE; .... array[index] = 42; assert (...) the alias analysis will think they can alias and then we still get correct results.
Also note that everything works fine if you access uninitialized memory that has been allocated (dynamic allocs) even if that is UB: the alias analysis handles those correctly!
In that sense, this PR is a conservative extension of our current code: we soundly handle some more types of UB memory accesses than before but we still do not handle all of them. Also, by adding flag ~empty (UR) you can now check if the program has UB.

At some point I thought handling out-of-memory would not work out-of-box due to our alias analysis relying on defined behavior

* we need to be sure we properly handle all variants of alloc (malloc, calloc, ...). AFAIR they vary in how (if) the initialize memory which used to be irrelevant for us when we zeroed out all memory

Yes, calloc zeroes out the memory and thus would create init events as opposed to malloc.

* I don't think the last question is irrelevant now: if one set the new options to false, we return unsound results now, so this still needs to be discussed / fixed

Of course, but now we have the possibility to defer the discussion/the fixing. A proper fix would likely need a lot of practical testing.
And there are a lot of things one needs to think about:
(1) For every memory model you need to check if the new fr is sufficient or if other changes are necessary. I think in particular C11 and the GPU models need rework. But you could also argue for now that those models are just not compatible with uninit reads and so combining them is a user error rather than a tool error.
(2) What about final memory values if there is no store to that location? Our current encoding requires at least one store to exist and, e.g., litmus tests would fail if you removed init events.

* which is the axiom giving problems in c11? The chain of co and rf in Coh?

It's the Coh axiom. You need to change it like this I believe:

irreflexive ((rf^-1)? ; mo ; rf? ; hb) as Coh // old version
irreflexive ((fr | mo) ; rf? ; hb) as Coh // new version
* we should add a TODO to the part where you programmatically set the new fr. This should be moved to some common cat once we manage imports

I don't think a TODO is required specifically for fr because all relations defined in getOrCreatePredefinedRelation need to be revisited once you add imports. In particular, even if you kept the current fr, you would still need to remove its definition from getOrCreatePredefinedRelation when adding imports.

@hernanponcedeleon
Copy link
Owner

  • we need to be sure we properly handle all variants of alloc (malloc, calloc, ...). AFAIR they vary in how (if) the initialize memory which used to be irrelevant for us when we zeroed out all memory

Yes, calloc zeroes out the memory and thus would create init events as opposed to malloc.

Both intrinsics seem to be handled in a similar way. The only diff I see is that calloc takes an extra argument and the size is computed differently, but I don't yet see how we would enforce initialization in the calloc case, but not on malloc.

* I don't think the last question is irrelevant now: if one set the new options to false, we return unsound results now, so this still needs to be discussed / fixed

Of course, but now we have the possibility to defer the discussion/the fixing. A proper fix would likely need a lot of practical testing. And there are a lot of things one needs to think about: (1) For every memory model you need to check if the new fr is sufficient or if other changes are necessary. I think in particular C11 and the GPU models need rework. But you could also argue for now that those models are just not compatible with uninit reads and so combining them is a user error rather than a tool error. (2) What about final memory values if there is no store to that location? Our current encoding requires at least one store to exist and, e.g., litmus tests would fail if you removed init events.

If the option is not compatible with a given model, we should at least check this and throw an exception if the combination is used.

From the testing perspective, the minimum would be to fix the cases we already known are wrong. I just run all unit tests and I saw only two problems: (1) related to C11 which might be fixed changing the model as you explained, (2) a timeout on RISCV, maybe related to the rf^-1 ; rf (or some similar pattern).

* we should add a TODO to the part where you programmatically set the new fr. This should be moved to some common cat once we manage imports

I don't think a TODO is required specifically for fr because all relations defined in getOrCreatePredefinedRelation need to be revisited once you add imports. In particular, even if you kept the current fr, you would still need to remove its definition from getOrCreatePredefinedRelation when adding imports.

I was under the impression that this was the only remaining non-base relation that reamined there, but you are right we still have a few, e.g. {rf, co} \cap external

@ThomasHaas
Copy link
Collaborator Author

ThomasHaas commented Mar 6, 2024

Both intrinsics seem to be handled in a similar way. The only diff I see is that calloc takes an extra argument and the size is computed differently, but I don't yet see how we would enforce initialization in the calloc case, but not on malloc.

That should be straightforward by just adding an extra field to Alloc saying whether the memory gets zeroed out or not.

If the option is not compatible with a given model, we should at least check this and throw an exception if the combination is used.

I don't think this is so straightforward because how would you know that the provided model is incompatible? We could
legitimately have two C11 models, one being the original (incompatible one) and one being the new one.
We don't even throw exceptions if the CAT model and the target architecture are mismatching and that one is a bigger source of errors.
Basically, throwing exceptions based on the CAT model is not feasible because the only way we can detect what memory model is used is by checking its name/filepath, but I don't think this is a smart idea.

From the testing perspective, the minimum would be to fix the cases we already known are wrong. I just run all unit tests and I saw only two problems: (1) related to C11 which might be fixed changing the model as you explained, (2) a timeout on RISCV, maybe related to the rf^-1 ; rf (or some similar pattern).

The question is if the change to the model is a real fix or if it just hides one problem but creates different one's?
All the GPU models are also wrong but you don't see it in the unit tests because litmus tests don't have dynamic allocation.
I can "fix" C11 and leave the original axiom commented out in the file (similar to how we commented out all the lock stuff we don't support) though (EDIT: I suspect that more parts that need changes but the proposed one is sufficient for now).
Btw. what RISC-V timed out? That's surprising because the rf^-1;rf pattern is only used in the original RISC-V but not in my rewritten version. rf^-1;rf would also need to get updated to be compatible with missing init events, though I cannot imagine a case where something breaks.

@ThomasHaas
Copy link
Collaborator Author

ThomasHaas commented Mar 6, 2024

I changed C11 and reworked calloc. Since the unit tests should work now with the new settings, should I enable them by default? (EDIT: I accidentally enabled the new setting already)

@hernanponcedeleon
Copy link
Owner

EDIT: It also partially addresses #136

Out of curiosity (no need to get this done in this PR unless it is trivial): would non-det allocs work out of the box if we simple remove the exception when the size if not constant or what else needs to be implemented to handle those?

@hernanponcedeleon
Copy link
Owner

I changed C11 and reworked calloc. Since the unit tests should work now with the new settings, should I enable them by default? (EDIT: I accidentally enabled the new setting already)

I think it is good to get this option enabled so it is always tested (and I'm trying to come up with an easy way to extend all our test by enabling some options so we get most options tested, maybe not in every PR, but at least in some nightly/weekly job) and we can eventually find problems

@ThomasHaas
Copy link
Collaborator Author

EDIT: It also partially addresses #136

Out of curiosity (no need to get this done in this PR unless it is trivial): would non-det allocs work out of the box if we simple remove the exception when the size if not constant or what else needs to be implemented to handle those?

There are two more things:
(1) The alias analysis needs to support this. IIRC, the current one won't but the new one in #616 can handle it (maybe with small adaptions)
(2) The encoding of the memory layout cannot be completely static anymore. In particular, MemoryObject.size and MemoryObject.address have to become proper Expressions. Actually, MemoryObject.address should go away completely because the MemoryObject itself already represents its address. Instead, you would introduce a Map<MemoryObject, Formula> obj2Addr in EncodingContext that captures the address formulas.
Then ProgramEncoder.encodeMemory would need to generate constraints to make sure all objects are non-overlapping and properly aligned.
A little trick here is to reorder the memory so that all objects with known size come first because then you could still generate constant addresses for them.

cat/c11.cat Outdated Show resolved Hide resolved
- Improved sharing of subrelation "([R] \ [range(rf)]);loc" (it is used twice)
@hernanponcedeleon hernanponcedeleon merged commit 2d816d0 into development Mar 9, 2024
1 check passed
@hernanponcedeleon hernanponcedeleon deleted the uninitReads branch March 9, 2024 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants