New evaluator that doesn't depend on flattening #1495

bugarela · 2024-09-02T16:50:18Z

Hello

This is a huge PR because it changes the core of our evaluation process to not depend on flattening anymore. I've moved a lot of things around and attempted many different things throughout this process, and I finally have a version that I think is good enough to ship.

Since this is pretty much a rewrite, we are getting rid of a lot of tech debt and fixing many bugs. Here's a list of what I could find:

Closes Hidden error when calling get with no key inside a fold #1471
Closes Simulator crashes with invalid init action #1439
Closes Promote effect and type checking errors in repl to errors #1330
Closes Issue loading modules when importing something in a loaded REPL #1342
Closes Use Either instead of Option in RuntimeValue #1058
Closes Remove compileFromCode and use more appropriate interfaces #1052
Closes Make name resolution incremental #934
Closes Name resolution can fail in the simulator depending on operator order #1035
Closes <undefined value> in the REPL when evaluating #924
Closes Refactor code compilation #830
Closes Random search should react to SIGINT #734
Closes Extend _lastTrace when evaluating actions #288
Closes redefining a function gives a "syntax error" #434
Closes How to run tests in deep modules? #1184
Closes Unhandled error if init and step params in the CLI have invalid syntax. #1151
Closes Make (the new) flattening incremental #1103 (as it won't be needed anymore)

The following issues are consequences of flattening bugs. After this PR, the issues stop happening for test, run, and repl, and only happen in verify and compile commands, as we still need flattening for Apalache. So the impact of these issues was significantly reduced.

Change overview

The main change is that, while the old evaluator can evaluate a single flattened module, this version can evaluate expressions, definitions, simulations and tests over the original modules, avoiding the flattening pass (which is slow and buggy).

Related to that, the new version compiles from a starting point: either an expression/definition typed into the REPL, a combiation of init/step/inv given to simulate, or from a test definition. It won't even touch code that is not a dependency of this entrypoint. The old evaluator would go through the entire flattened module and compile everything beforehand.

The biggest challenge is to handle constants. This was previously done by the flattener, and now we need to take care of them while simulating. A good example of this is the instances.qnt file:

quint/examples/language-features/instances.qnt

Lines 38 to 45 in 351b52a

    
           action init = all { V1::init, V2::init } 
        
           action step = all { 
        
             V1::counter' = V1::counter + 1, 
        
             V2::counter' = V2::counter + 2, 
        
           } 
        
           // The two variables should increment independently 
        
           val inv = V1::counter == 0 or V2::counter > V1::counter

Evaluating V1::init and V2::init are two different things, even though both point to the same definition in the lookup table. To make sure we use different constant values for different constants, and that we use different variable registers for variables that come from the same module but through different instances, we need to track a some extra information while building (compiling) expressions.

Structure

The new evaluator has two main steps:

Building (builder.ts). Take Quint expressions and build typescript arrow functions that can be called to obtain the value of that expression under a certain context
Evaluating (`evaluator.ts). Take the built arrow functions and call them, keeping track of the context.

Performance

Although performance changes are not consistent, my evaluation is that it it overall better.

For the REPL, it is a complete game changer. For the malachite project, evaluating any expression in the REPL would take around 20 seconds, while now it is completely instant.

For runs and tests, it is faster for most cases, sometimes taking half of the time. For some cases, it is slower, but I couldn't find any case where it takes double of the time or more. I want to further improve this version so we can get faster results for all scenarios, but I think this version is good enough to release, specially considering the REPL performance and how many bugs it fixes. See benchmark details below.

Benchmarks

"old version" in the benchmarks is the last released version. Ideally I should benchmark against main, but this is easier. Also, we are measuring simulation/testing time, and performance changes outside of this execution scope are not accounted.

Cases where the new version is better than the old one:

quint run --main=ics20Test --invariant=BalanceNonNegative ../examples/cosmos/ics20/ics20.qnt  --seed=0xbf95eeae47ada
old: (50803ms)
new: (25292ms)

quint run --main=prisoners3 --invariant='countInv and safetyInv' ../examples/puzzles/prisoners/prisoners.qnt
old: (8033ms)
new: (4848ms)

quint run --main=prisoners3 --invariant='countInv and safetyInv' ../examples/puzzles/prisoners/prisoners.qnt --max-samples=50000
old: (39185ms)
new: (23397ms)

quint test mysticeti_paper_test.qnt --max-samples=100
old: (24329ms)
new: (12704ms)

quint run mysticeti_paper_test.qnt --init=init_with_dag --max-steps=1 --max-samples=100
old: (9157ms)
new: (4187ms)

PS: The mysticeti ones actually have a flattening bug in the old version and I have to make a workaround

quint test --max-samples 1000 specs/quint/tests/disagreement/disagreementTest.qnt
old: (17998ms)
new: (16574ms)

This is the command that runs on Malachite's CI:
time bash scripts/quint-forall.sh "test --max-samples 100" specs/quint/tests/**/*Test.qnt
new:
Executed in  117.45 secs    fish           external
   usr time  152.82 secs    0.00 micros  152.82 secs
   sys time    6.73 secs  663.00 micros    6.73 secs
old:
Executed in  163.25 secs    fish           external
   usr time  196.00 secs  348.00 micros  196.00 secs
   sys time    7.89 secs  210.00 micros    7.89 secs

Cases where the new version is worse than the old one:

quint test --max-samples=100 --main TendermintModels ../examples/cosmos/tendermint/TendermintModels.qnt
old: (3653ms)
new: (6870ms)

quint test --main ics20Test ../examples/cosmos/ics20/ics20.qnt
old: (61450ms)
new: (101720ms)

quint test --main=properChannelsTests ../examples/cosmos/ics20/denomTrace.qnt
old: (4635ms)
new: (6770ms)

quint test --max-samples 100 specs/quint/tests/line28/line28Test.qnt
old: (20624ms)
new: (25692ms)

Tests added for any new code
Documentation added for any new functionality
Entries added to the respective CHANGELOG.md for any new functionality
Feature table on README.md updated for any listed functionality

…ntly)

…stance-constant-callgraph" This change is no longer necessary as this dependency no longer exists. I've checked that the reported problem doesn't happen after the revert This reverts commit 384c28e, reversing changes made to 576d72b.

This test case describes a scenario where we don't get the desired result. The behavior changed, not for the better nor for the worse.

…utput

bugarela · 2024-09-02T18:36:47Z

Hi @romac @MahtabNorouzi! I realize this is way too big to review. Unfortunately, the nature of this work and my current resources made it so this is the only feasible option. @josef-widder has been using a version of this since last week as a beta tester, and I'll also be using this heavily over this week. I've also been testing this with our set of examples and I'm confident with its current state.

So I'd like to ask that you give this a look, however it fits in your time and priorities right now. Let me know if you think this is worth merging/releasing, or if there are any other requirements I should meet before putting it out.

I'm also happy to answer any questions or go through stuff in detail either in here, slack or zoom.

bugarela · 2024-09-02T18:39:31Z

@konnov if you want to take a look at this, I'm sure you'll have valuable feedback. I think this is an evolution of what you wrote, and I would not be able to come up with it without the base you provided through your implementation.

konnov · 2024-09-03T10:25:08Z

@konnov if you want to take a look at this, I'm sure you'll have valuable feedback. I think this is an evolution of what you wrote, and I would not be able to come up with it without the base you provided through your implementation.

It's a massive change. I think it would make more sense to break it down into much smaller PRs. Reviewing such a big change in the simulator is a full-time job for a week. If you need help with that, Informal management knows my hourly rates.

bugarela · 2024-09-03T11:51:16Z

It's a massive change. I think it would make more sense to break it down into much smaller PRs. Reviewing such a big change in the simulator is a full-time job for a week. If you need help with that, Informal management knows my hourly rates.

Of course, yes. I've always be very in favor of small PRs and feel bad that this is so big. However, in this particular case, I don't think the effort taken (probably more than a week) to break this down would be compensated in significantly more informed reviews, considering the people available to review it at this point.

On top of that, I mentioned you here to make you aware of this change. I've tested the new version in many of our internal and public Quint projects and it's all good. I know you are also writing Quint specs, and I'd rather have you informed than surprised by such meaningful change in the simulator.

And sorry if I implied that I expected you to work for free! I'm just trying to be considerate of your participation in this as best as I can.

konnov · 2024-09-04T09:35:54Z

It's a massive change. I think it would make more sense to break it down into much smaller PRs. Reviewing such a big change in the simulator is a full-time job for a week. If you need help with that, Informal management knows my hourly rates.

Of course, yes. I've always be very in favor of small PRs and feel bad that this is so big. However, in this particular case, I don't think the effort taken (probably more than a week) to break this down would be compensated in significantly more informed reviews, considering the people available to review it at this point.

On top of that, I mentioned you here to make you aware of this change. I've tested the new version in many of our internal and public Quint projects and it's all good. I know you are also writing Quint specs, and I'd rather have you informed than surprised by such meaningful change in the simulator.

And sorry if I implied that I expected you to work for free! I'm just trying to be considerate of your participation in this as best as I can.

Hey @bugarela! No worries! I am just saying that I could have a look at smaller PRs. Understanding a massive change is quite hard without having context. Normally the context exchange happens during regular meetings. Since we don't have those anymore, reviewing such a big change would be similar to a security audit to me, of course, the focus would not be on the security bugs.

If you have no other choice to refactor this big PR into smaller chunks, I would recommend you to introduce a flag to run both the old simulator and the new one in a lockstep, so you would be able to quickly detect when they start to deviate.

bugarela · 2024-09-09T12:47:08Z

quint/src/names/collector.ts

-      // Update the definition to point to the expression being overriden
-      constDef.id = ex.id


This was a kind of a hack so the compiler would evaluate the constant as the expression it was overridden to. We now do it on the level of registers on the evaluator.

bugarela · 2024-09-09T14:12:01Z

quint/src/static/callgraph.ts

-      decl.overrides.forEach(([_param, expr]) => {
-        this.graphAddAll(expr.id, Set([decl.id]))
-      })


This is from a reverted commit, which added a hack for something we don't need anymore

different version numbers

bugarela added 30 commits August 20, 2024 09:46

Rewrite evaluator to not depend on flattening

1bf92b6

Fix many evaluation issues

36bf10e

Fix memo by making it mutable again

4a2ab82

Attempt some improvements over context performance

f3c519c

Avoid many lookups

b43affb

Use registers for parameters

04ceb06

Save constants in registers as well

76e069c

Implement then and add recorder calls

ff60acc

Use registers for variables

19a63bb

Optimize a few things

cefbc6f

Go back to using RuntimeValue in traces

1f7cb83

Fix and update many integration tests

d135fc3

Track nondet picks in the trace

57be2a2

Track action taken in the trace

5f067fd

Fix/update all integration tests

e1e0e89

Organize between files

ec043f0

Separate and memoize building

6e8576f

Improve performance by caching val (temporarly) and pure val (permane…

d4ad6bc

…ntly)

Cache constants and fix metadata tracking

dbda074

Update unit tests

cd3198a

Ensure REPL still works after name errors

0b36c5c

Improve error handling in REPL and add tests for that

3d61827

Reorganize and delete unused code

ba55e38

Fix and update most tests

0923a83

Document new evaluator's code

017c5d8

Update test for unsupported scenario

2e9d0af

This test case describes a scenario where we don't get the desired result. The behavior changed, not for the better nor for the worse.

Merge remote-tracking branch 'origin/main' into gabriela/new-evaluator

c866d20

Fix formatting issue

4ff1347

Fix integration test by making sure the proper table is used in the o…

002bd29

…utput

bugarela added 4 commits September 2, 2024 14:20

Update fixtures, consts now keep their proper id in the table

b486185

Add missing progress bar update (that got lost in conflict resolution)

87dfdfc

Add CHANGELOG entries

3116a43

Remove remainder from old behavior left as comment

2537fdb

bugarela marked this pull request as ready for review September 2, 2024 18:32

bugarela requested review from romac and MahtabNorouzi September 2, 2024 18:32

Fix hint text

949408c

bugarela commented Sep 9, 2024

View reviewed changes

Improve code quality a bit

76bbfb8

MahtabNorouzi approved these changes Sep 9, 2024

View reviewed changes

bugarela added 2 commits September 9, 2024 13:34

Fix REPL issue when evaluating a variable that was never referenced

be10877

Merge remote-tracking branch 'origin/main' into gabriela/new-evaluator

c641178

bugarela enabled auto-merge September 9, 2024 18:15

Use -q flag in new REPL integration tests to avoid dealing with

c2d27e6

different version numbers

bugarela merged commit e9a0b38 into main Sep 9, 2024
14 checks passed

bugarela deleted the gabriela/new-evaluator branch September 9, 2024 18:53

bugarela mentioned this pull request Sep 9, 2024

Release v0.22.0 #1505

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New evaluator that doesn't depend on flattening #1495

New evaluator that doesn't depend on flattening #1495

bugarela commented Sep 2, 2024 •

edited

Loading

bugarela commented Sep 2, 2024

bugarela commented Sep 2, 2024

konnov commented Sep 3, 2024

bugarela commented Sep 3, 2024

konnov commented Sep 4, 2024

bugarela Sep 9, 2024

bugarela Sep 9, 2024

	action init = all { V1::init, V2::init }
	action step = all {
	V1::counter' = V1::counter + 1,
	V2::counter' = V2::counter + 2,
	}

	// The two variables should increment independently
	val inv = V1::counter == 0 or V2::counter > V1::counter

		// Update the definition to point to the expression being overriden
		constDef.id = ex.id

New evaluator that doesn't depend on flattening #1495

New evaluator that doesn't depend on flattening #1495

Conversation

bugarela commented Sep 2, 2024 • edited Loading

Change overview

Structure

Performance

bugarela commented Sep 2, 2024

bugarela commented Sep 2, 2024

konnov commented Sep 3, 2024

bugarela commented Sep 3, 2024

konnov commented Sep 4, 2024

bugarela Sep 9, 2024

Choose a reason for hiding this comment

bugarela Sep 9, 2024

Choose a reason for hiding this comment

bugarela commented Sep 2, 2024 •

edited

Loading