-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add action to set custom cost #355
Conversation
c05cec8
to
919a246
Compare
I like this approach in general. Where does the |
Thanks for taking a look Max! I updated the description to include the egglog comparison of APIs as well. Regarding min, I believe currently it's similar to a on merge function that just takes the new value. It does it here. We could add a way to customize this, or have a different default. I could imagine adding the ability to do something like For both of these, I wonder about waiting to add them until/if they are needed for a particular use case? Or maybe there is a way to add them that is easy enough, just hesitant to add extra complexity until it's needed. |
Hmm now I'm questioning whether either this or my PR provide the functionality needed from the zulip thread. We need to write out a full example that actually calculates the cost. So in your world, it would look something like this right?
I think we need a way to read the cost of any expression of that sort, otherwise I'm not how the rules to set cost should work. |
Here's how you might do it by defining a table to store the minimum cost for an eclass
|
Right, okay that makes sense. It's quite verbose, and the first rule is pure boilerplate, it should never really look different than that. You'd also need to make rules for every other constructor (even if you were still using AST size as their cost). Maybe I'm missing something? |
Yeah, it's pretty annoying I agree. We could generate these rules or bake them in somehow |
Ok, I propose that we close/hold both this and #353 because it seems like a realistic use case requires something like @oflatt's code sample above, and I think neither PR actually makes a big dent in the boilerplate that you'd need. We could do something with more code generation, but given that I think we don't know what parts should be generated vs left alone, we should maybe just hold off. We do however need someway of actually doing the extraction w.r.t. different costs. I think @yihozhang proposed something like passing in pairs of |
Right I'm, not sure how much should be generated. I'm against extending the language in a way that is limited or a hack |
I would support waiting on an implementation until we have the use cases more clearly defined. One possibly way to do that would be some sort of needs/design doc that at least articulates the different needs, and from there we could then see how different options achieve them most elegently. |
Closing this for now, since we might want to refine the design space here a bit more. |
How stable is this PR to use? I'm working on a project that needs this feature but I cannot wait for the redesigns and brainstorming (although that's good, thank you for your efforts and high standards). |
@AzizZayed I will merge in the latest commits to this branch so that it is suitable for you to try working with. We have started merging in some things with an "unstable" prefix, so that might be an option here, to give folks a chance to play with this before committing to a final design. |
This PR is now up to date and the action is now named I will try seeing if @tylerhou's original example (which motivated this work) could be made workable using this PR https://egraphs.zulipchat.com/#narrow/stream/328972-general/topic/How.20can.20I.20find.20the.20tree.20associated.20with.20an.20extraction.3F |
e1e0d79
to
35d2186
Compare
I added the ability to set costs to be expressions, in order to support the tree join example. This is how you would implement it with this PR: (datatype JoinTree
(Rel String)
(Join JoinTree JoinTree))
(function size (JoinTree) i64 :merge (min old new))
(let ra (Rel "a"))
(let rb (Rel "b"))
(let rc (Rel "c"))
(let rd (Rel "d"))
(let re (Rel "e"))
(let rf (Rel "f"))
(let query (Join (Join (Join (Join (Join ra rb) rc) rd) re) rf))
; Size
(set (size ra) 50)
(set (size rb) 200)
(set (size rc) 10)
(set (size rd) 123)
(set (size re) 10000)
(set (size rf) 1)
;; cost of relation is its size minus 1, since the string arg will have a cost of 1 as well
(rule ((= (size (Rel a)) asize))
((cost (Rel a) (- asize 1))))
;; cost/size of join is product of sizes
(rule ((Join a b)
(= (size a) asize)
(= (size b) bsize))
((set (size (Join a b)) (* asize bsize))
(cost (Join a b) (* asize bsize))))
; Associativity
(rewrite (Join a b) (Join b a))
; Commutativity
(rewrite (Join (Join a b) c) (Join a (Join b c)))
(run 10000)
(extract query)
(extract (size query)) Compared with the original example: (datatype JoinTree (Rel String) (Join JoinTree JoinTree))
(function cost (JoinTree) i64 :merge (min old new))
(function size (JoinTree) i64 :merge (min old new))
(let ra (Rel "a"))
(let rb (Rel "b"))
(let rc (Rel "c"))
(let rd (Rel "d"))
(let re (Rel "e"))
(let rf (Rel "f"))
(let query (Join (Join (Join (Join (Join ra rb) rc) rd) re) rf))
; Size
(set (size ra) 50)
(set (size rb) 200)
(set (size rc) 10)
(set (size rd) 123)
(set (size re) 10000)
(set (size rf) 1)
(rule ((Join a b) (= (size a) asize) (= (size b) bsize))
((set (size (Join a b)) (* asize bsize))))
; Cost
(rule ((Rel a)) ((set (cost (Rel a)) (size (Rel a)))))
(rule ((Join a b) (= (size a) asize)
(= (cost a) acost)
(= (size b) bsize)
(= (cost b) bcost))
((set (cost (Join a b)) (+ (+ (* asize bsize) acost) bcost))))
; Associativity
(rewrite (Join a b) (Join b a))
; Commutativity
(rewrite (Join (Join a b) c) (Join a (Join b c)))
(run 10000)
(extract (cost query))
(extract (size query)) The computed cost for both is the same, It doesn't allow for setting costs or defining custom merge functions for costs, but as you can see here you can use a companion table to do that. The cost will always just be set as the most recent value. |
Will I be able to query a |
@AzizZayed No you are not able to query it like a regular function, it only shows up in the extraction calculation or when serializing the nodes. What do you need that for? |
@saulshanabrook Just for debugging purposes. It's not very important. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial thoughts
@AzizZayed There is some concern that this won't work with an updated backend that is planning to be integrated soon, so it seems like we might hold off on merging this feature in. I will still try to update this PR to keep it in good status. The web UI won't work with this feature. |
7620ac8
to
361e9a3
Compare
0607400
to
cbe174a
Compare
I have updated this PR with changes from main, moved cost to its own action, and changed cost back from |
CodSpeed Performance ReportMerging #355 will not alter performanceComparing Summary
Benchmarks breakdown
|
I have updated this PR to take the minimum instead of an arbitrary cost if two outputs are combined that have different costs. Basically, taking a step back, then PR adds some additional ability to set custom costs per function but does not allow you to set cumulative costs including children. Why? Because doing so is incompatible with the current extractor as well as all extractors in extraction-gym. The serialized e-graph must contain an integer per node for the cost, it cannot be a function that depends on which child you select. The motivation here is that this does allow a larger subset of extraction than before, without going full hog. The sense is that this is useful enough to support more use cases, as seen by the examples included in this PR that were built from real world use cases people brought forward. cc @ezrosent |
Update from team meeting: We don't want to support this feature now, because we are focused on stabilizing the core over adding new features that aren't directly necessary for our immediate work. We may consider looking at it again towards the end of this release cycle or in the next one. We are also unsure if this API is the best one to present to users or if it could be implemented as a non-core extension. For now, if you want to have custom costs, please extract the e-graph yourself and do your own extraction how you see fit. If you have particular use cases that would be solved by this PR or more generally would require some form of custom costs, place post a comment on the underlying issue #294 with details of your use case. That is helpful to guide the discussion of if and how we want to support this type of feature. |
Alternative to #353 that uses a new action instead of specifying a new table.
A few differences over the other approach:
This PR API:
Other PR API:
Python API comparisons
This PR:
Other PR:
Closes #294