Reads without init events #624

ThomasHaas · 2024-02-23T15:32:07Z

This PR attempts to address #623 .

There is some cleanup to do.
There is also a problem with removing init events from litmus tests. I need to debug this (I suppose it is related to co-last memory values?)

EDIT: It also partially addresses #136

…tions

- Some changes in CAT models to support uninit reads - Minor other changes

- Automatically add empty(UR;loc;[IW]) axiom to all memory models.

# Conflicts: # dartagnan/src/main/java/com/dat3m/dartagnan/verification/solving/ModelChecker.java

ThomasHaas · 2024-03-02T09:05:00Z

I need some feedback on how far we want to go in this branch.
(1) If I make sure that we generate all init events as before (even for dynamic allocations), everything works(*) and this PR is pretty much done.
(2) If we don't create init events for dynamic allocations, at least C11 needs some rewriting (this is why a test fails). The current implementation already changes the definition of fr which, together with SC-per-loc, seems to be sufficient for most models to handle to reads from uninitialized memory (I cannot give a 100% guarantee on correctness though!).
However, C11 defines it's own fr (which we can remove) and does not have SC-per-loc and thus requires the rewriting of an axiom.

(*) As long as the programs are really memory-safe, there should be no change to our current semantics. For programs that can read memory out-of-bounds, we previously could give a spurious PASS as the read had no possible rf-edges and we always required the existence of one. This has changed in this PR: the out-of-bounds reads are possible.

EDIT: I added two new options to (i) toggle between zeroed/non-deterministic default values (only relevant to uninitialized memory) and (ii) to toggle between creating/not creating init events for dynamic allocations. The default values are such that they match our current behaviour. This should also make the above question obsolete.

…dynamic allocations (both are enabled by default) - Added new test showing that we can read uninit memory.

hernanponcedeleon · 2024-03-02T11:24:33Z

A few questions / comments that come to my mind

the current implementation would return PASS ("be correct" might be an over-claim since the program has UB) for this variant of the uninitRead.c, right?

int main()
{
    array[SIZE] = 42;
    assert (array[SIZE] == 42);
    return 0;
}

At some point I thought handling out-of-memory would not work out-of-box due to our alias analysis relying on defined behavior

we need to be sure we properly handle all variants of alloc (malloc, calloc, ...). AFAIR they vary in how (if) the initialize memory which used to be irrelevant for us when we zeroed out all memory
I don't think the last question is irrelevant now: if one set the new options to false, we return unsound results now, so this still needs to be discussed / fixed
which is the axiom giving problems in c11? The chain of co and rf in Coh?
we should add a TODO to the part where you programmatically set the new fr. This should be moved to some common cat once we manage imports

ThomasHaas · 2024-03-02T11:59:39Z

A few questions / comments that come to my mind

* the current implementation would return PASS ("be correct" might be an over-claim since the program has UB) for this variant of the `uninitRead.c`, right?

int main()
{
    array[SIZE] = 42;
    assert (array[SIZE] == 42);
    return 0;
}

I think the alias analysis will prohibit those two from aliasing in both the new and the old version, so you will get unsound results in general. In this particular case I don't know what result you will get (possibly PASS for both, but for the wrong reasons).
Anyway, this is simply due to the alias analysis prohibiting this and not due to our anarchic semantics.
For example, if you would load the index from memory, say int index = SIZE; .... array[index] = 42; assert (...) the alias analysis will think they can alias and then we still get correct results.
Also note that everything works fine if you access uninitialized memory that has been allocated (dynamic allocs) even if that is UB: the alias analysis handles those correctly!
In that sense, this PR is a conservative extension of our current code: we soundly handle some more types of UB memory accesses than before but we still do not handle all of them. Also, by adding flag ~empty (UR) you can now check if the program has UB.

At some point I thought handling out-of-memory would not work out-of-box due to our alias analysis relying on defined behavior
* we need to be sure we properly handle all variants of alloc (malloc, calloc, ...). AFAIR they vary in how (if) the initialize memory which used to be irrelevant for us when we zeroed out all memory

Yes, calloc zeroes out the memory and thus would create init events as opposed to malloc.

* I don't think the last question is irrelevant now: if one set the new options to false, we return unsound results now, so this still needs to be discussed / fixed

Of course, but now we have the possibility to defer the discussion/the fixing. A proper fix would likely need a lot of practical testing.
And there are a lot of things one needs to think about:
(1) For every memory model you need to check if the new fr is sufficient or if other changes are necessary. I think in particular C11 and the GPU models need rework. But you could also argue for now that those models are just not compatible with uninit reads and so combining them is a user error rather than a tool error.
(2) What about final memory values if there is no store to that location? Our current encoding requires at least one store to exist and, e.g., litmus tests would fail if you removed init events.

* which is the axiom giving problems in c11? The chain of co and rf in Coh?

It's the Coh axiom. You need to change it like this I believe:

irreflexive ((rf^-1)? ; mo ; rf? ; hb) as Coh // old version
irreflexive ((fr | mo) ; rf? ; hb) as Coh // new version

* we should add a TODO to the part where you programmatically set the new fr. This should be moved to some common cat once we manage imports

I don't think a TODO is required specifically for fr because all relations defined in getOrCreatePredefinedRelation need to be revisited once you add imports. In particular, even if you kept the current fr, you would still need to remove its definition from getOrCreatePredefinedRelation when adding imports.

hernanponcedeleon · 2024-03-06T01:33:52Z

we need to be sure we properly handle all variants of alloc (malloc, calloc, ...). AFAIR they vary in how (if) the initialize memory which used to be irrelevant for us when we zeroed out all memory
Yes, calloc zeroes out the memory and thus would create init events as opposed to malloc.

Both intrinsics seem to be handled in a similar way. The only diff I see is that calloc takes an extra argument and the size is computed differently, but I don't yet see how we would enforce initialization in the calloc case, but not on malloc.

* I don't think the last question is irrelevant now: if one set the new options to false, we return unsound results now, so this still needs to be discussed / fixed
Of course, but now we have the possibility to defer the discussion/the fixing. A proper fix would likely need a lot of practical testing. And there are a lot of things one needs to think about: (1) For every memory model you need to check if the new fr is sufficient or if other changes are necessary. I think in particular C11 and the GPU models need rework. But you could also argue for now that those models are just not compatible with uninit reads and so combining them is a user error rather than a tool error. (2) What about final memory values if there is no store to that location? Our current encoding requires at least one store to exist and, e.g., litmus tests would fail if you removed init events.

If the option is not compatible with a given model, we should at least check this and throw an exception if the combination is used.

From the testing perspective, the minimum would be to fix the cases we already known are wrong. I just run all unit tests and I saw only two problems: (1) related to C11 which might be fixed changing the model as you explained, (2) a timeout on RISCV, maybe related to the rf^-1 ; rf (or some similar pattern).

* we should add a TODO to the part where you programmatically set the new fr. This should be moved to some common cat once we manage imports
I don't think a TODO is required specifically for fr because all relations defined in getOrCreatePredefinedRelation need to be revisited once you add imports. In particular, even if you kept the current fr, you would still need to remove its definition from getOrCreatePredefinedRelation when adding imports.

I was under the impression that this was the only remaining non-base relation that reamined there, but you are right we still have a few, e.g. {rf, co} \cap external

ThomasHaas · 2024-03-06T08:35:52Z

Both intrinsics seem to be handled in a similar way. The only diff I see is that calloc takes an extra argument and the size is computed differently, but I don't yet see how we would enforce initialization in the calloc case, but not on malloc.

That should be straightforward by just adding an extra field to Alloc saying whether the memory gets zeroed out or not.

If the option is not compatible with a given model, we should at least check this and throw an exception if the combination is used.

I don't think this is so straightforward because how would you know that the provided model is incompatible? We could
legitimately have two C11 models, one being the original (incompatible one) and one being the new one.
We don't even throw exceptions if the CAT model and the target architecture are mismatching and that one is a bigger source of errors.
Basically, throwing exceptions based on the CAT model is not feasible because the only way we can detect what memory model is used is by checking its name/filepath, but I don't think this is a smart idea.

From the testing perspective, the minimum would be to fix the cases we already known are wrong. I just run all unit tests and I saw only two problems: (1) related to C11 which might be fixed changing the model as you explained, (2) a timeout on RISCV, maybe related to the rf^-1 ; rf (or some similar pattern).

The question is if the change to the model is a real fix or if it just hides one problem but creates different one's?
All the GPU models are also wrong but you don't see it in the unit tests because litmus tests don't have dynamic allocation.
I can "fix" C11 and leave the original axiom commented out in the file (similar to how we commented out all the lock stuff we don't support) though (EDIT: I suspect that more parts that need changes but the proposed one is sufficient for now).
Btw. what RISC-V timed out? That's surprising because the rf^-1;rf pattern is only used in the original RISC-V but not in my rewritten version. rf^-1;rf would also need to get updated to be compatible with missing init events, though I cannot imagine a case where something breaks.

…and dynamically allocated objects. - Added zeroing-out for Calloc.

ThomasHaas · 2024-03-06T09:37:45Z

I changed C11 and reworked calloc. Since the unit tests should work now with the new settings, should I enable them by default? (EDIT: I accidentally enabled the new setting already)

cat/c11.cat

dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/MemoryAllocation.java

dartagnan/src/main/java/com/dat3m/dartagnan/wmm/RelationNameRepository.java

dartagnan/src/main/java/com/dat3m/dartagnan/wmm/processing/AddInitReadHandling.java

hernanponcedeleon · 2024-03-08T01:29:14Z

EDIT: It also partially addresses #136

Out of curiosity (no need to get this done in this PR unless it is trivial): would non-det allocs work out of the box if we simple remove the exception when the size if not constant or what else needs to be implemented to handle those?

hernanponcedeleon · 2024-03-08T01:35:59Z

I changed C11 and reworked calloc. Since the unit tests should work now with the new settings, should I enable them by default? (EDIT: I accidentally enabled the new setting already)

I think it is good to get this option enabled so it is always tested (and I'm trying to come up with an easy way to extend all our test by enabling some options so we get most options tested, maybe not in every PR, but at least in some nightly/weekly job) and we can eventually find problems

ThomasHaas · 2024-03-08T08:09:02Z

EDIT: It also partially addresses #136

Out of curiosity (no need to get this done in this PR unless it is trivial): would non-det allocs work out of the box if we simple remove the exception when the size if not constant or what else needs to be implemented to handle those?

There are two more things:
(1) The alias analysis needs to support this. IIRC, the current one won't but the new one in #616 can handle it (maybe with small adaptions)
(2) The encoding of the memory layout cannot be completely static anymore. In particular, MemoryObject.size and MemoryObject.address have to become proper Expressions. Actually, MemoryObject.address should go away completely because the MemoryObject itself already represents its address. Instead, you would introduce a Map<MemoryObject, Formula> obj2Addr in EncodingContext that captures the address formulas.
Then ProgramEncoder.encodeMemory would need to generate constraints to make sure all objects are non-overlapping and properly aligned.
A little trick here is to reorder the memory so that all objects with known size come first because then you could still generate constant addresses for them.

- Updated description of what makes c11.cat different from c11-orig.cat

- Some overall cleanup in the tests

cat/c11.cat

dartagnan/src/test/java/com/dat3m/dartagnan/c/C11OrigLFDSTest.java

dartagnan/src/test/java/com/dat3m/dartagnan/c/C11OrigLocksTest.java

…f)])

dartagnan/src/main/java/com/dat3m/dartagnan/wmm/Wmm.java

dartagnan/src/test/java/com/dat3m/dartagnan/c/C11OrigLocksTest.java

- Improved sharing of subrelation "([R] \ [range(rf)]);loc" (it is used twice)

ThomasHaas added 5 commits February 23, 2024 14:56

Added initial version of uninit reads (does not work for CAAT yet)

b7a1583

Fixed issues and added CAAT support for new uninit read relation

9659ad3

Updated MemoryAllocation to not create init events for dynamic alloca…

59e830a

…tions

DRAFT COMMIT:

3115f3b

- Some changes in CAT models to support uninit reads - Minor other changes

DRAFT COMMIT:

eca51b5

- Automatically add empty(UR;loc;[IW]) axiom to all memory models.

ThomasHaas changed the title ~~Reads without init events~~ [DRAFT] Reads without init events Feb 24, 2024

ThomasHaas mentioned this pull request Feb 24, 2024

Getting rid of (unnecessary) init events #623

Closed

ThomasHaas added 4 commits March 1, 2024 17:53

Reverted CAT model changes

00387d2

Merge branch 'development' into uninitReads

02a4657

# Conflicts: # dartagnan/src/main/java/com/dat3m/dartagnan/verification/solving/ModelChecker.java

Moved init read axiom instrumentation into Wmm processor pass.

dc359dc

Reverted some changes that were temporary fixes.

a689ac6

- Added new options to zero-out memory and to create init events for …

20ade00

…dynamic allocations (both are enabled by default) - Added new test showing that we can read uninit memory.

ThomasHaas changed the title ~~[DRAFT] Reads without init events~~ Reads without init events Mar 2, 2024

Merge branch 'development' into uninitReads

8379814

ThomasHaas added 2 commits March 6, 2024 09:47

Updated C11 to be (more) compatible with the lack of init events.

071212e

- Generalized MemoryObject.initialValues to work for both statically …

e0dfd05

…and dynamically allocated objects. - Added zeroing-out for Calloc.

Merge branch 'development' into uninitReads

dc42dd6