Adds support for marking exprs as subsumed #301

saulshanabrook · 2023-11-27T15:41:40Z

EDIT: This PR has been changed to allow "subsuming" nodes, which are both unextractable and cannot be matched during rules: #301 (comment)

Previously, we had support for marking a whole function as unextractable. This PR extends that to allow marking individual rows of a function as unextractable.

It also adds a high-level :unextractable parameter to the rewrite command, which marks the LHS as unextractable after the rewrite has been applied.
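To make the row-level distinction concrete, here is a minimal Rust sketch (not the egglog implementation) of an extractor that skips any row carrying a per-row `unextractable` flag. The `Node` struct, its fields, and `best_costs` are hypothetical names; costs are tree costs computed by fixpoint iteration.

```rust
use std::collections::HashMap;

/// Hypothetical e-node: an operator, child e-class ids, its own cost,
/// and a per-row flag (as opposed to a per-function flag).
#[derive(Debug, Clone)]
struct Node {
    op: &'static str,
    children: Vec<usize>,
    cost: u64,
    unextractable: bool,
}

/// Cheapest tree cost per e-class, ignoring unextractable rows.
/// Classes with no extractable node are absent from the result.
fn best_costs(classes: &HashMap<usize, Vec<Node>>) -> HashMap<usize, u64> {
    let mut best: HashMap<usize, u64> = HashMap::new();
    loop {
        let mut changed = false;
        for (&id, nodes) in classes {
            for node in nodes.iter().filter(|n| !n.unextractable) {
                // A node is only usable once every child class has a cost.
                let child_costs: Option<Vec<u64>> =
                    node.children.iter().map(|c| best.get(c).copied()).collect();
                if let Some(cs) = child_costs {
                    let total = node.cost + cs.iter().sum::<u64>();
                    if best.get(&id).map_or(true, |&b| total < b) {
                        best.insert(id, total);
                        changed = true;
                    }
                }
            }
        }
        if !changed {
            return best;
        }
    }
}
```

With the cheaper `mean` row marked unextractable, the more expensive rewritten row in the same e-class is chosen instead; the function itself stays extractable elsewhere.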

This is one way to support directional transformations/subsumption (closes #256 for now).

As I mentioned in that issue, one concrete use case for this is my rewrites for numba compatibility of array expressions. In that example, I have to translate mean into sum expressions when an axis parameter is passed to mean. I want the rewritten version to be extracted, but by default its cost would be higher, so it wouldn't be. My current workaround is to give the whole mean function a huge cost so that it is always avoided when possible.

However, that solution isn't ideal for a few reasons:

It mixes the rewrite logic with the original function definition. I would like all the numba logic to be contained in that separate module.
It makes extraction slower, because the extractor still considers these high-cost options instead of ignoring them altogether.

With this PR, I could move the logic entirely to the rewrite.

Implementation

This PR implements the feature by storing, for each function, a set of "unextractable" input values. I am not sure whether this is safe with respect to rebuilding, so I would love some feedback on that. If there is a better way to store this information, I am happy to change it.
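A small Rust sketch of why a side set keyed by input ids is fragile under rebuilding, assuming a plain union-find (`UnionFind`, `is_unextractable`, and `rebuild_set` are hypothetical names, not egglog code): after a union, the stored key is no longer canonical, so lookups miss until the set itself is re-canonicalized.

```rust
use std::collections::HashSet;

/// Minimal union-find over e-class ids.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    /// Find the canonical id, compressing the path.
    fn find(&mut self, x: usize) -> usize {
        let p = self.parent[x];
        if p == x {
            x
        } else {
            let r = self.find(p);
            self.parent[x] = r;
            r
        }
    }

    /// Merge two classes; the smaller root wins, for determinism.
    fn union(&mut self, a: usize, b: usize) -> usize {
        let (ra, rb) = (self.find(a), self.find(b));
        let (root, child) = if ra < rb { (ra, rb) } else { (rb, ra) };
        self.parent[child] = root;
        root
    }
}

/// Canonicalizes the query, but the stored keys are whatever was
/// canonical when they were inserted.
fn is_unextractable(set: &HashSet<Vec<usize>>, uf: &mut UnionFind, inputs: &[usize]) -> bool {
    let canon: Vec<usize> = inputs.iter().map(|&i| uf.find(i)).collect();
    set.contains(&canon)
}

/// What rebuilding would have to do for the side set: re-canonicalize every key.
fn rebuild_set(set: &HashSet<Vec<usize>>, uf: &mut UnionFind) -> HashSet<Vec<usize>> {
    set.iter()
        .map(|k| k.iter().map(|&i| uf.find(i)).collect::<Vec<usize>>())
        .collect()
}
```

Storing the flag in the table rows instead avoids this extra step, because the table is already re-canonicalized during rebuilding.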

oflatt · 2023-11-27T19:19:38Z

A couple of high level thoughts:

Looks like a useful feature
This probably doesn't work with rebuilding. The approach I would take would be to store in the tables in the back-end which rows are unextractable. Later, any e-node equal to one of those rows should also be marked unextractable.
It would be cool, while we are at it, to add a way to mark things as subsumed: they will be extractable but not e-match-able.
We could also have things that are neither e-matchable nor extractable if they have both flags.
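A minimal Rust sketch of the table-based approach, assuming two independent per-row flags (`RowFlags`, `Row`, and `rebuild` are hypothetical): when rebuilding makes two rows collide, they are the same e-node, so each flag is OR-ed together. Merging the two outputs is elided here; a real rebuild would union them.

```rust
use std::collections::HashMap;

/// Hypothetical per-row status bits, stored next to the output value.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct RowFlags {
    unextractable: bool,
    subsumed: bool,
}

impl RowFlags {
    /// Colliding rows describe the same e-node, so a flag set on either survives.
    fn merge(self, other: RowFlags) -> RowFlags {
        RowFlags {
            unextractable: self.unextractable || other.unextractable,
            subsumed: self.subsumed || other.subsumed,
        }
    }
}

#[derive(Debug, Clone)]
struct Row {
    output: usize,
    flags: RowFlags,
}

/// Re-key every row by its canonical inputs, merging the flags of rows that
/// collide. The first output seen is kept (output merging is elided).
fn rebuild(table: HashMap<Vec<usize>, Row>, canon: impl Fn(usize) -> usize) -> HashMap<Vec<usize>, Row> {
    let mut out: HashMap<Vec<usize>, Row> = HashMap::new();
    for (inputs, row) in table {
        let key: Vec<usize> = inputs.iter().map(|&i| canon(i)).collect();
        let row = Row { output: canon(row.output), flags: row.flags };
        let flags = row.flags;
        out.entry(key)
            .and_modify(|existing| existing.flags = existing.flags.merge(flags))
            .or_insert(row);
    }
    out
}
```

Keeping the two flags separate leaves room for the "extractable but not matchable" and "matchable but not extractable" combinations discussed below.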

saulshanabrook · 2023-11-27T21:33:25Z

Thanks for taking a look @oflatt!

This probably doesn't work with rebuilding. The approach I would take would be to store in the tables in the back-end which rows are unextractable. Later, any e-node equal to one of those rows should also be marked unextractable.

Will do, I will move it into the table, as part of the vals.

It would be cool, while we are at it, to add a way to mark things as subsumed: they will be extractable but not e-match-able.
We could also have things that are neither e-matchable nor extractable if they have both flags.

Oh yeah, I like this behavior. I was already worried that if you mark something as unextractable, it could still be matched and produce a similar expression that would then be extractable. What if we just had one flag called subsumed that made rows both unextractable and unmatchable? Can you think of a use case where you would want only one and not the other?

This would also give the rewrite flag a better name, :subsume instead of :unextractable, which I think is ambiguous and confusing.

oflatt · 2023-11-28T23:20:16Z

I'm guessing there are cases where we might want something to be not extractable but still matchable, or vice versa.
For example, you may have found something you know is better, but you still want to be able to match on this e-node to find something even better than that.

saulshanabrook · 2023-11-29T01:32:52Z

Ok, then I can add separate actions for both of them and store the info separately.

Maybe the high-level rewrite flag could enable both, though, and be called something like :replace, since the combination is semantically like deleting the old expression, except that it can also never be re-added?


saulshanabrook · 2023-11-29T22:08:28Z

This is ready for review again. I have updated it to:

Add a subsume action, which marks a row as subsumed so that it cannot be queried when running rules. It can still be queried when running a check.
Replace the :unextractable argument to rewrite with :replace, which marks the LHS as subsumed and unextractable.
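As a rough Rust sketch of the query semantics described above (hypothetical `QueryMode`, `Row`, and `visible_rows`, not egglog's actual query code), rule queries filter out subsumed rows while `check` queries see every row.

```rust
/// Who is querying the table.
#[derive(Clone, Copy, PartialEq, Debug)]
enum QueryMode {
    /// Pattern matching while running rules.
    Rule,
    /// A `check` command.
    Check,
}

/// Hypothetical table row with a per-row subsumed flag.
struct Row {
    inputs: Vec<i64>,
    output: i64,
    subsumed: bool,
}

/// Rows visible to the caller: rules skip subsumed rows, `check` sees all of them.
fn visible_rows(rows: &[Row], mode: QueryMode) -> Vec<&Row> {
    rows.iter()
        .filter(|r| mode == QueryMode::Check || !r.subsumed)
        .collect()
}
```

Filtering at query time, rather than deleting rows, keeps the subsumed row in the table so that it can never be re-added as a fresh, matchable row.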


oflatt · 2023-12-27T03:09:13Z

Do you think we should add some sort of warning if a whole e-class has been subsumed? I think it might be a potential foot-gun, and hard to prevent.

Example:

(datatype Math
  (Num i64)
  (Add Math Math))

;; let's subsume this
(subsume (Add (Num 2) (Num 1)))
;; for some reason we like this version better
(Add (Add (Num 1) (Num 1)) (Num 1))

;; later we do constant folding
(union (Add (Num 1) (Num 1)) (Num 2))

;; now as far as I can tell we only have (Add (Num 2) (Num 1)) in the e-class, and it is subsumed.
;; for unextractable, that means we can't even extract anything
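One possible shape for such a warning, sketched in Rust under the assumption that each e-node is recorded with its canonical e-class and a `subsumed` flag (`ENode` and `fully_subsumed_classes` are hypothetical names): count nodes per class and report the classes in which every node is subsumed. In the example above, after rebuilding, the class of `(Add (Num 2) (Num 1))` is the one that would be reported.

```rust
use std::collections::HashMap;

/// Hypothetical e-node record: its canonical e-class and subsumed flag.
struct ENode {
    class: usize,
    subsumed: bool,
}

/// E-classes in which every e-node is subsumed, sorted by id.
fn fully_subsumed_classes(nodes: &[ENode]) -> Vec<usize> {
    // class id -> (total nodes, subsumed nodes)
    let mut counts: HashMap<usize, (usize, usize)> = HashMap::new();
    for n in nodes {
        let e = counts.entry(n.class).or_insert((0, 0));
        e.0 += 1;
        if n.subsumed {
            e.1 += 1;
        }
    }
    let mut out: Vec<usize> = counts
        .into_iter()
        .filter(|&(_, (total, sub))| total == sub)
        .map(|(class, _)| class)
        .collect();
    out.sort();
    out
}
```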

saulshanabrook · 2023-12-27T15:22:52Z

Do you think we should add some sort of warning if a whole e-class has been subsumed? I think it might be a potential foot-gun, and hard to prevent.

Maybe we could expose the unextractable and subsumed status in the visualization? I was also thinking this would be helpful for debugging.

If we do this, though, I would prefer to do it in a follow-up PR, since it requires a bit of coordination with the egraph-serialize repo.

saulshanabrook · 2024-02-08T00:03:43Z

I wanted to post here to re-clarify the driving use case for this PR to help with considering whether it should be updated and merged.

This feature would be useful when trying to rewrite a smaller expression into a larger one and extract that larger one.

In the array API example, since Numba doesn't support mean with the axis argument (numba/numba#1269), we have to rewrite it to use sum, which does support the axis argument. Ideally, this rewrite would make the original expression unextractable.

Instead, I have to work around its absence by giving the mean function a huge cost.

I still want to be able to extract mean as is when I don't apply the numba ruleset (and in cases without the axis parameter), so I can't mark the whole function as unextractable.

This workaround is not ideal for two reasons:

It mixes the numba logic in with the core array API definition. Ideally, the numba rewrite rules could be written in such a way that they don't require any changes to the main array API. They could live in a different package, and there could be others like them for other backends, so it wouldn't be tenable to force all of these ad hoc cost changes into the core array API rules.
The cost is arbitrary and it's unclear how high it has to be to effectively avoid the mean when the sum is available.
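To make the second point concrete, here is a worked cost comparison in Rust under stated assumptions: a tree-cost extractor that counts every occurrence of a subterm, unit cost per operator, and a hypothetical rewrite of `mean(x, axis)` to `sum(x, axis) / shape(x)[axis]` in which `x` appears twice. Under these assumptions the sum form costs `2*c(x) + 6` and the penalized mean costs `c(x) + p + 2`, so the sum form wins only when the penalty `p` exceeds `c(x) + 4`, a bound that grows with the size of `x`.

```rust
/// Hypothetical term representation for array expressions.
enum Term {
    /// An opaque subterm with a fixed cost.
    Leaf(u64),
    /// An operator with its own cost and its children.
    Op(u64, Vec<Term>),
}

/// Tree cost: every occurrence of a subterm is counted separately.
fn tree_cost(t: &Term) -> u64 {
    match t {
        Term::Leaf(c) => *c,
        Term::Op(c, kids) => c + kids.iter().map(tree_cost).sum::<u64>(),
    }
}

/// mean(x, axis), with `penalty` added to mean's own unit cost.
fn mean_term(x_cost: u64, penalty: u64) -> Term {
    Term::Op(1 + penalty, vec![Term::Leaf(x_cost), Term::Leaf(1)])
}

/// sum(x, axis) / shape(x)[axis]: `x` occurs twice.
fn sum_term(x_cost: u64) -> Term {
    let sum = Term::Op(1, vec![Term::Leaf(x_cost), Term::Leaf(1)]);
    let shape = Term::Op(1, vec![Term::Leaf(x_cost)]);
    let dim = Term::Op(1, vec![shape, Term::Leaf(1)]);
    Term::Op(1, vec![sum, dim])
}
```

With `p = 100`, the sum form is chosen for `c(x) = 10` (26 vs. 112) but not for `c(x) = 200` (406 vs. 302), so no single fixed penalty is large enough for every input.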

So with the features in this PR it would become easier to define a separate ruleset that translates smaller expressions into larger ones, and extract those larger ones if that ruleset is run, and if not extract the original smaller ones.

Also, I am not sure if this is relevant, but I did see an open issue in eggcc that also might be related (egraphs-good/eggcc#281), asking for a way to replace an expression with another one. It is possible the subsume and unextractable actions could help there too.

saulshanabrook · 2024-02-08T17:59:50Z

I joined the egglog meeting today (w/ @mwillsey, @ztatlock, @oflatt, and @ajpal) and we came up with a proposal that we:

Remove the unextractable keyword
Never extract things that are subsumed
Don't allow marking functions with custom merge functions as subsumed.
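A minimal Rust sketch of the first and third points, assuming a row-level `subsumed` flag and a per-function `has_custom_merge` marker (all names are hypothetical; the thread does not record why custom merges are excluded, so the sketch only enforces the rule): extraction candidates are simply the non-subsumed rows, and subsuming a row of a function with a custom merge is rejected.

```rust
/// Hypothetical row: input ids, output id, and the subsumed flag.
struct Row {
    inputs: Vec<usize>,
    output: usize,
    subsumed: bool,
}

/// Hypothetical function metadata plus its rows.
struct Function {
    name: String,
    has_custom_merge: bool,
    rows: Vec<Row>,
}

#[derive(Debug, PartialEq)]
enum SubsumeError {
    CustomMerge(String),
    NoSuchRow,
}

impl Function {
    /// With no separate unextractable keyword, subsumption alone decides extractability.
    fn extraction_candidates(&self) -> impl Iterator<Item = &Row> + '_ {
        self.rows.iter().filter(|r| !r.subsumed)
    }

    /// Mark the row with the given inputs as subsumed, refusing custom-merge functions.
    fn subsume(&mut self, inputs: &[usize]) -> Result<(), SubsumeError> {
        if self.has_custom_merge {
            return Err(SubsumeError::CustomMerge(self.name.clone()));
        }
        let row = self
            .rows
            .iter_mut()
            .find(|r| r.inputs == inputs)
            .ok_or(SubsumeError::NoSuchRow)?;
        row.subsumed = true;
        Ok(())
    }
}
```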

Thank you all!

I will update this PR to reflect these comments.

We also talked about some other possible workarounds, including separating the unextractable flags and costs of functions from the function definitions, so that different ones could be used.

We also noted that if multiple rules subsume the same expression, they can all match it in the same iteration, and all of them will fire even though the first one already subsumes it. We agreed this was OK, because other actions also apply simultaneously after matching, even if that is unexpected. You can avoid it, if you want, by splitting the rules into separate rulesets and running them one at a time.
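A toy Rust model of this behavior, assuming the usual two-phase iteration in which every rule is searched against the same database snapshot before any action runs (`Db`, `Rule`, and `run_iteration` are hypothetical): two rules that both subsume expression `0` both fire in one iteration, while running them as separate rulesets fires only the first.

```rust
/// Hypothetical database: one subsumed flag per expression id, plus a log of fired actions.
struct Db {
    subsumed: Vec<bool>,
    fired: Vec<(&'static str, usize)>,
}

/// Hypothetical rule: matches any listed expression that is not subsumed;
/// its action subsumes that expression.
struct Rule {
    name: &'static str,
    targets: Vec<usize>,
}

fn run_iteration(db: &mut Db, rules: &[Rule]) {
    // Phase 1: search every rule against the same snapshot.
    let mut matches: Vec<(&'static str, usize)> = Vec::new();
    for rule in rules {
        for &t in &rule.targets {
            if !db.subsumed[t] {
                matches.push((rule.name, t));
            }
        }
    }
    // Phase 2: run every action. A match found in phase 1 still fires even if
    // an earlier action in this phase already subsumed its target.
    for (name, t) in matches {
        db.subsumed[t] = true;
        db.fired.push((name, t));
    }
}
```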

saulshanabrook · 2024-02-12T18:54:21Z

I have updated the PR to address the points from the discussion. This is ready for another review.

yihozhang

This looks great! Thanks for adapting the PR to the new refactored codebase and incorporating conclusions from discussions! I left a small question. I believe once @oflatt approves, we should be able to merge this PR.


oflatt

Looks almost ready to merge!


oflatt

Looks great! 🎉 🎉

saulshanabrook · 2024-03-20T20:25:21Z

I wanted to note that I am now integrating these changes into the Python code and I can now remove my custom magic cost numbers! https://github.com/egraphs-good/egglog-python/pull/127/files#diff-9a9a81a04022815662adeb365e546523cb4376d5c1c797be3131968605f1c4c8

Thank you all for the time/energy spent refining this.

saulshanabrook added 3 commits November 27, 2023 10:03

Add support for making expression unextractable

42c9eba

Add support for unextractable arg to functions

50fa8b4

Add unextractable example

ad4cec2

saulshanabrook requested a review from a team as a code owner November 27, 2023 15:41

saulshanabrook requested review from ajpal and removed request for a team November 27, 2023 15:41

Add two failing tests about rebuilding

190dcd5

saulshanabrook marked this pull request as draft November 28, 2023 19:57

Fix breaking test to be accurate

3fcc3bd

saulshanabrook added 2 commits November 29, 2023 10:31

Add another test

e7b430a

Move unextractable to table

78bcdb5

* Make row a struct so we can add attribute easier
* make function and table debug friendly

saulshanabrook force-pushed the unextractable branch from c0d174b to 78bcdb5 November 29, 2023 15:31

saulshanabrook added 7 commits November 29, 2023 11:25

Add support for subsuming nodes

fb58cf8

Change rewrite arg to subsume and mark as unextractable

5433414

Fix example file

6ab1817

Add subsumption to example file

1dcbd72

Fix subsumption handling by moving to querying

043fdb7

Allow replace tests to fail with term encoding

5b56a67

Skip those tests all together

f0ea440

saulshanabrook marked this pull request as ready for review November 29, 2023 22:05