Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support marking exprs as subsumed #301

Merged
merged 40 commits into from
Feb 23, 2024

Conversation

saulshanabrook
Copy link
Member

@saulshanabrook saulshanabrook commented Nov 27, 2023

EDIT: This PR has been changed to allow "subsuming" nodes, which are both unextractable and cannot be matched during rules: #301 (comment)


Previously, we had support for marking a whole function as unextractable. This PR extends that to also allow marking individual rows of a function as unextractable as well.

It also adds a high-level :unextractable parameter for the rewrite command to mark the LHS as unextractable after its been applied.

This is one way to support directional transformations/subsumption (closes #256 for now).

As I mentioned in that issue, one concrete use case for this is in my rewrites for numba compatibility of array expressions. In that example, I have to translate mean into sum expressions, when an axis parameter is passed to mean. I want the rewritten version extracted, but by default, the cost would be higher so it wouldn't be. My workaround at the moment was to give the whole mean function a huge cost so it would always be avoided if possible.

However, that solution isn't ideal for a few reasons:

  1. It mixes the rewrite logic with the original function definition. I would like all the numba logic to be contained in that separate module.
  2. It makes the extraction take longer because it's considering these large costs instead of just ignoring those options altogether.

With this PR, I could move the logic entirely to the rewrite.

Implementation

This PR implements the feature by storing a set of "unextractable" input values to each function. I am not sure if this is like safe for rebuilding, so I would love some feedback on that. If there is a better way of storing this information, I am happy to change it.

@saulshanabrook saulshanabrook requested a review from a team as a code owner November 27, 2023 15:41
@saulshanabrook saulshanabrook requested review from ajpal and removed request for a team November 27, 2023 15:41
@oflatt
Copy link
Member

oflatt commented Nov 27, 2023

A couple of high level thoughts:

  • Looks like a useful feature
  • This probably doesn't work with rebuilding- the approach I would take would be to store in the tables in the back-end which rows are unextractable. Later, any e-node equal to it should also be marked unextractable.
  • It would be cool, while we are add it, to add a way to mark things as subsumed- they will be extractable but not e-match-able.
  • We could also have things that are neither e-matchable nor extractable if they have both flags

@saulshanabrook
Copy link
Member Author

Thanks for taking a look @oflatt!

This probably doesn't work with rebuilding- the approach I would take would be to store in the tables in the back-end which rows are unextractable. Later, any e-node equal to it should also be marked unextractable.

Will do, I will move it into the table, as part of the vals.

It would be cool, while we are add it, to add a way to mark things as subsumed- they will be extractable but not e-match-able.
We could also have things that are neither e-matchable nor extractable if they have both flags

Oh yeah, I like this behavior. I was already worried that if like you mark something as unextractable, then it could still be matched and return a similar expression that would then be extractable. What if we just had one flag called subsumed that made rows un-extractable and un-matchable? Cna you think of a use case where you would only want one and not the other?

This would also give a better name to the rewrite flag, of :subsume instead of :unextractable which is a bit ambiguous and confusing I think

@saulshanabrook saulshanabrook marked this pull request as draft November 28, 2023 19:57
@oflatt
Copy link
Member

oflatt commented Nov 28, 2023

I'm guessing there are cases where we might want something to be not extractable, but still matchable or vice-versa.
For example, if you find something you know is better, but you still want to be able to match on this e-node to find something even better than that.

@saulshanabrook
Copy link
Member Author

saulshanabrook commented Nov 29, 2023

Ok I can add actions then for both of them and store the info separately.

Maybe the high-level rewrite flag though could enable both and be called like :replace, since the combination semantically is like you delete the old one, but it just can also never be re-added?

* Make row a struct so we can add attribute easier
* make function and table debug friendly
@saulshanabrook saulshanabrook marked this pull request as ready for review November 29, 2023 22:05
@saulshanabrook
Copy link
Member Author

This is ready for review again. I have updated it to:

  1. Add a subsume action which marks a row as subsumed and unable to be queried when running rules. It can still be queried when running a check.
  2. Add the :replace instead of the :unextractable argument to rewrite, which marks the LHS as subsumed and unextractable.

src/ast/mod.rs Outdated Show resolved Hide resolved
src/function/mod.rs Outdated Show resolved Hide resolved
src/function/table.rs Outdated Show resolved Hide resolved
src/gj.rs Outdated Show resolved Hide resolved
@oflatt
Copy link
Member

oflatt commented Dec 27, 2023

Do you think we should add some sort of warning if a whole e-class has been subsumed? I think it might be a potential foot-gun, and hard to prevent.

Example:

(datatype Math
  (Num i64)
  (Add Math Math))

;; lets subsume this  
(subsume (Add (Num 2) (Num 1)))
;; for some reason we like this version better
(Add (Add (Num 1) (Num 1)) (Num 1))

;; later we do constant folding
(union (Add (Num 1) (Num 1)) (Num 2))

;; now as far as I can tell we only have (Add (Num 2) (Num 1)) in the e-class, and it is subsumed.
;; for unextractable, that means we can't even extract anything

@saulshanabrook
Copy link
Member Author

Do you think we should add some sort of warning if a whole e-class has been subsumed? I think it might be a potential foot-gun, and hard to prevent.

Maybe we could expose the unextractable and subsumed status in the visualization? I was also thinking this would be helpful for debugging.

If we do this though, I would prefer to do it in a follow up PR, since it requires a bit of coordination with the egraph-serialized repo.

@saulshanabrook
Copy link
Member Author

I wanted to post here to re-clarify the driving use case for this PR to help with considering whether it should be updated and merged.

This feature would be useful when trying to rewrite a smaller expression into a larger one and extract that larger one.

In the array API example, for example, since Numab doesn't support mean with the axis argument (numba/numba#1269), we have to rewrite it to use the sum which does support the axis argument. Ideally, this rewrite would make the original expression unextractable.

Instead, I have to work around it's absence by making the mean function have a huge cost.

I want to be able to support extracting it as is when I don't apply the numba ruleset (and for instances that don't have the axis parameter), so I can't set it to unextractable.

This workaround is not ideal for two reasons:

  1. It mixes the numba logic in with the core array API definition. Ideally, the numba rewrite rules could written in such a way they don't require any changes to the main array API. They could live in a different package and there could be others like them for other backends, so it wouldn't be tenable to force all of these ad hoc cost changes into the core array API rules.
  2. The cost is arbitrary and it's unclear how high it has to be to effectively avoid the mean when the sum is available.

So with the features in this PR it would become easier to define a separate ruleset that translates smaller expressions into larger ones, and extract those larger ones if that ruleset is run, and if not extract the original smaller ones.


Also, I am not sure if this is relevant, but I did see an open issue in eggcc that also might be related (egraphs-good/eggcc#281), asking for a way to replace an expression with another one. It is possible the subsume and unextractable actions could help there too.

@saulshanabrook
Copy link
Member Author

saulshanabrook commented Feb 8, 2024

I joined the egglog meeting today (w/ @mwillsey, @ztatlock, @oflatt, and @ajpal) and we came up with a proposal that we:

  • Remove the unextractable keyword
  • Never extract things that are subsumed
  • Don't allow marking customs with custom merge functions as subsumed.

Thank you all!

I will update this PR to reflect these comments.


We also talked about some other possible workarounds, including separating unextractable/costs for functions from the function definitions so you could use different ones.

We also remarked that if you have multiple rules that subsume an expression, those could match at the same time and they would both be matched even though it is subsumed. It was expressed this was OK because this is similar to other actions that also act simultaneously after matching, even if it's unexpected. You could avoid this if you want by splitting rules into rulesets and running them one at a time.

@saulshanabrook saulshanabrook changed the title Adds support marking exprs as subsumed and/or unextractable Adds support marking exprs as subsumed Feb 8, 2024
@saulshanabrook
Copy link
Member Author

I have updated the PR to address the points from the discussion. This is ready for another review.

Copy link
Collaborator

@yihozhang yihozhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for adapting the PR to the new refactored codebase and incorporating conclusions from discussions! I left a small question. I believe once @oflatt approves, we should be able to merge this PR.

src/ast/mod.rs Outdated Show resolved Hide resolved
src/function/table.rs Outdated Show resolved Hide resolved
Copy link
Member

@oflatt oflatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost ready to merge!

src/actions.rs Outdated Show resolved Hide resolved
src/ast/mod.rs Show resolved Hide resolved
Copy link
Member

@oflatt oflatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 🎉 🎉

@saulshanabrook saulshanabrook merged commit 77ffa1f into egraphs-good:main Feb 23, 2024
3 checks passed
@saulshanabrook saulshanabrook deleted the unextractable branch February 23, 2024 21:05
@saulshanabrook
Copy link
Member Author

I wanted to note that I am now integrating these changes into the Python code and I can now remove my custom magic cost numbers! https://github.com/egraphs-good/egglog-python/pull/127/files#diff-9a9a81a04022815662adeb365e546523cb4376d5c1c797be3131968605f1c4c8

Thank you all for the time/energy spent refining this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to do directional transform?
3 participants