Add EGraph Visualizations #147

Closed
wants to merge 88 commits into from

saulshanabrook
Member

@saulshanabrook saulshanabrook commented May 22, 2023

Adds the ability to visualize the state of an EGraph using Graphviz (addressing #144).

Goals:

  • Help educate new users on what e-graphs are and how egglog implements them
  • Help with debugging the state of an e-graph after some transitions
  • Serve as a launching point for interactive visualizations down the road

Features:

  • Colors each cluster based on its sort
  • Caps size of graph to reduce memory/time blowups with unreadable output
  • Shows container values as nodes by adding inner_values method to Sorts
  • Creates an intermediate graph IR before encoding to Graphviz. This could be exposed later to allow other visualization frontends from, say, Python.
  • Adds visualizer to web view, with dynamic transitions, and ability to pan and zoom.
  • Adds CLI flags --output-dot and --output-svg to visualize any program
  • Adds make graphs command to create SVG & dots of all examples
  • Tests graph creation in CI
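
As a rough illustration of the features above, here is a minimal sketch of how e-classes could be emitted as Graphviz clusters, colored by sort, with a cap on the number of nodes. It is not the code from this PR: `ENode`, `EClass`, `sort_color`, `to_dot`, the palette, and the `max_nodes` parameter are all hypothetical names chosen for the example, and only the standard library is used.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fmt::Write;

/// Hypothetical, simplified e-node: an operator name plus child e-class ids.
pub struct ENode {
    pub op: String,
    pub children: Vec<usize>,
}

/// Hypothetical e-class: a sort name and the e-nodes it contains.
pub struct EClass {
    pub sort: String,
    pub nodes: Vec<ENode>,
}

/// Pick a fill color from a small fixed palette based on the sort name
/// (an assumption for this sketch; any stable mapping would do).
fn sort_color(sort: &str) -> &'static str {
    const PALETTE: [&str; 4] = ["lightblue", "lightyellow", "lightpink", "lightgreen"];
    let sum: usize = sort.bytes().map(|b| b as usize).sum();
    PALETTE[sum % PALETTE.len()]
}

/// Render e-classes as Graphviz DOT, one cluster per e-class.
/// Stops emitting nodes once `max_nodes` have been written.
pub fn to_dot(classes: &BTreeMap<usize, EClass>, max_nodes: usize) -> String {
    let mut out = String::from("digraph egraph {\n  compound=true;\n");
    let mut edges: Vec<(String, usize)> = Vec::new();
    let mut drawn: BTreeSet<usize> = BTreeSet::new();
    let mut emitted = 0;
    for (id, class) in classes {
        if emitted >= max_nodes {
            break;
        }
        writeln!(out, "  subgraph cluster_{} {{", id).unwrap();
        writeln!(
            out,
            "    label=\"{}\"; style=\"dotted,filled\"; fillcolor={};",
            class.sort,
            sort_color(&class.sort)
        )
        .unwrap();
        for (i, node) in class.nodes.iter().enumerate() {
            if emitted >= max_nodes {
                break;
            }
            let name = format!("n_{}_{}", id, i);
            writeln!(out, "    {} [label=\"{}\", shape=box];", name, node.op).unwrap();
            for child in &node.children {
                edges.push((name.clone(), *child));
            }
            drawn.insert(*id);
            emitted += 1;
        }
        out.push_str("  }\n");
    }
    // Edges point at the first node of the target cluster; `lhead` clips
    // the arrow at the cluster border. Edges to undrawn classes are skipped.
    for (src, child) in &edges {
        if drawn.contains(child) {
            writeln!(out, "  {} -> n_{}_0 [lhead=cluster_{}];", src, child, child).unwrap();
        }
    }
    out.push_str("}\n");
    out
}
```

The resulting string can be written to a `.dot` file and rendered with the `dot` command-line tool, for example `dot -Tsvg graph.dot -o graph.svg`.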

Possible next steps/follow-up issues:

Examples

eqsat-basic in web viewer

[Image: Screenshot 2023-06-07 at 1 43 16 PM]

It will also transition between graphs smoothly:

Recording.mov

eqsat-basic

[Image: eqsat-basic]

fibonacci

[Image: fibonacci]

fibonacci-demand

[Image: fibonacci-demand]

map

[Image: map]

rw-analysis

[Image: rw-analysis]

proofs

[Image: proofs]

eqsat-basic in Python

I've also been working on some Python examples, though they're not part of this PR. I wanted to show them here to give a sense of how that could eventually work:

Note that these screenshots were taken at a previous @recursecenter presentation and don't reflect the current Graphviz styles.

In this notebook we can see how the graph changes before and after running:
Screenshot 2023-05-15 at 4 05 02 PM

In this other notebook we use d3-graphviz to animate the transitions between different graphs, doing one run in between each snapshot:

Untitled.3.mov

TODO

  • Resolve all clippy errors
  • Rename Graph to something more descriptive, such as EGraphState or ExportedEGraph
  • Initially, keep Graph structure private to this crate. This could later be useful for the Python bindings to use with other visualization formats like cytoscape.js
  • Add CLI commands to output .dot as well as .svg, instead of always emitting it
  • Test graph output on all examples (make test-graphs)
  • Investigate why some functions are added twice (realized it was based on ordering of arguments)
  • Make the style closer to the egraphs-good website by making e-class borders dotted, decreasing the size of arrowheads, making nodes square, and making borders round
  • Ensure all docstring comments use ///
  • Make node IDs stable
  • Fix self arrows (to make lhead work with self arrows, seem to need a dummy point outside the cluster, which ends up looking odd, so leaving as is for now)
  • Switch functions that return built-in values to represent those as nodes instead of as strings, so that we can represent those that point to eqsats, like maps, sets, or vecs.
  • Position where arrows are coming from based on argument ordering.
  • Add graphviz to web
  • On web, set size to size of container
  • On web, animate transitions
  • On web, test on large graphs
  • Fix proof execution
  • Add CLI flag to switch between showing temp nodes, and not
  • Fix inconsistent node spacing
  • Switch so that unit isn't combined
  • Add sort as cluster label
  • Change outline of clusters which are e-classes vs not
  • Try adding colors based on sort of clusters
  • Change font of nodes to helvetica

@oflatt
Member

oflatt commented Jun 16, 2023

Thanks! Worked for me

@oflatt
Member

oflatt commented Jun 29, 2023

What's the status on this? Does @mwillsey have time to review?
Saul and I discussed it some. I think the conclusion was to somewhat simplify the intermediate representation Saul made. Eventually we want only one format that extraction, visualization, and dump to text all use.

@saulshanabrook
Member Author

> Saul and I discussed it some. I think the conclusion was to somewhat simplify the intermediate representation Saul made. Eventually we want only one format that extraction, visualization, and dump to text all use.

@oflatt I saw that you opened your PR that also has its own IR for the graph. I would be happy to commit to merging our two IRs once both PRs are merged, if that's easier as well. In that merge, it might also be good to think about having a format we could expose in Python to allow other forms of extraction easily, like the work @philzook58 was experimenting with.

@oflatt
Member

oflatt commented Jul 3, 2023

Yep, I'm in favor of merging this and fixing it later then.
We should also coordinate later with @mwillsey's new format for the extraction gym:
https://github.com/egraphs-good/extraction-gym

@mwillsey
Member

mwillsey commented Jul 6, 2023

After a conversation with @oflatt, I really think we can use a common format for both this and the extraction gym. So I'm going to close this PR for now, as I do not intend to merge it in its current state. Details about the format are coming soon!

@mwillsey mwillsey closed this Jul 6, 2023
@saulshanabrook
Member Author

@mwillsey So would you recommend implementing this on top of the new common format and then re-opening?

FWIW the Python bindings are already published with this fork and I will continue including this code in my fork and keeping it up to date, because I have been finding the graphviz helpful for education.

@mwillsey
Member

mwillsey commented Jul 9, 2023

Yes! I think the idea is that you’d use the egraph serialize library as a dependency. Egglog (and egg) will hopefully soon have support for exporting to the in-memory representation of that library, thereby supporting not only serialization but also hopefully visualization. So the Python bindings could use mainline and still have visualization.

saulshanabrook added a commit to saulshanabrook/egraph-serialize that referenced this pull request Jul 11, 2023
This adds a mapping of e-class id to class type to the format.

One use case for this was in the visualizer in egglog
(egraphs-good/egglog#147) to display the sort
on each e-class.
mwillsey added a commit to egraphs-good/egraph-serialize that referenced this pull request Jul 11, 2023
* Add sorts/types/names to classes

This adds a mapping of e-class id to class type to the format.

One use case for this was in the visualizer in egglog
(egraphs-good/egglog#147) to display the sort
on each e-class.

* Use local test files (#2)

This changes the tests to use the local files instead of those in the
extraction gym repo. I made this change so I could test the addition of
classes.

Feel free to disregard if you like.

* Make class_data a separate object

---------

Co-authored-by: Max Willsey <[email protected]>
@saulshanabrook
Member Author

saulshanabrook commented Jul 21, 2023

My plan for following up with this work is as follows:

  • Add serialization support to egglog (Add serialization support #171)
  • Add support for converting the serialized format to Graphviz in the https://github.com/egraphs-good/egraph-serialize/ repo (under a feature flag, so Graphviz requirements are optional), with methods to produce the Graphviz string and to save it as .dot and .svg files
  • Add support for splitting out all nodes of certain sorts into their own e-classes as a method on Egraph in egraph-serialize (so that functions which return or take primitives do not have to share them in the visualization)
  • Update this branch to use the graphviz support from serialize to save svgs to disk and in the web UI.

Let me know what you think!

@mwillsey
Member

Yes, that sounds good! I would even go further and say that any "singleton" e-class (an e-class with just a single node, like all primitives) could be inlined directly into the parent for easier visualization.
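
To make the singleton idea concrete, here is a minimal sketch that finds the e-classes holding exactly one node, which would be the candidates for inlining. The `Classes` type and the `singleton_classes` function are hypothetical and only stand in for whatever the serialized format ends up providing.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical simplified e-graph: e-class id -> the e-nodes in that class.
/// Each e-node is (operator, child e-class ids).
pub type Classes = BTreeMap<usize, Vec<(String, Vec<usize>)>>;

/// Return the ids of all e-classes that contain exactly one e-node.
/// These are the classes that could be inlined into their parents.
pub fn singleton_classes(classes: &Classes) -> BTreeSet<usize> {
    classes
        .iter()
        .filter(|(_, nodes)| nodes.len() == 1)
        .map(|(id, _)| *id)
        .collect()
}
```

A renderer could consult this set when drawing a parent node and print the child's operator inside the parent label instead of drawing an edge to a separate cluster.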

@saulshanabrook
Member Author

saulshanabrook commented Jul 21, 2023

Yeah, I think that might work too... let me take a look at some examples from this branch to see what would make sense....

Here are a couple of examples from three consecutive commits in the history of this branch:

git checkout <hash>
cargo run tests/<name>.egg --save-svg
# Screenshot to convert to cropped PNG due to bug at these commits with size

| Example | 77e80d4 | 9da623c | 7ddabb8 |
|---|---|---|---|
| Behavior | All equal primitives in shared node. Current behavior of exporter. | Unit primitives in their own nodes. | All primitives in their own nodes. Current behavior of this branch. |
| fibonacci | Screenshot 2023-07-21 at 12 16 00 PM | Same as ← | Screenshot 2023-07-21 at 12 15 34 PM |
| fibonacci-demand | Screenshot 2023-07-21 at 12 13 48 PM | Same as ← | Screenshot 2023-07-21 at 12 13 48 PM |
| path | Screenshot 2023-07-21 at 12 28 09 PM | Screenshot 2023-07-21 at 12 55 20 PM | Screenshot 2023-07-21 at 12 56 42 PM |

Does anyone have thoughts on which are preferable? Happy to add other examples too.

I ended up settling on the last method because I thought it was closest to the current semantics of egglog.

@mwillsey
Member

For things of type unit, what makes the most sense to me is to group by function name. So basically approach 1, but split up by function name; so all the path tuples in one box, edge tuples in another, and so on. I think we can just elide the actual () node.

For other primitives (i64, etc), I'd like to see what the inlining approach looks like for when the primitives are used as inputs to functions. A quick example:

Node
|  |
v  v
1  foo

Could become:

Node(1, ·)
|
v
foo

in the situation where 1 is in a singleton e-class (or maybe is a primitive or something) but foo is not.
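
The label transformation in this example could be sketched roughly as follows. This is only an illustration of the idea: the `Child` enum and the `inline_label` function are hypothetical names, and the decision of which children count as inlinable is left to the caller.

```rust
/// Hypothetical child reference of an e-node.
pub enum Child {
    /// A value printed directly inside the parent label
    /// (for example a primitive or a singleton e-class).
    Inline(String),
    /// A reference to another e-class that keeps its own edge.
    Class(usize),
}

/// Build the label for one e-node and the list of e-class ids it still
/// needs edges to. Inlined children appear by value; the remaining
/// children appear as a `·` placeholder in argument position.
pub fn inline_label(op: &str, children: &[Child]) -> (String, Vec<usize>) {
    if children.is_empty() {
        return (op.to_string(), Vec::new());
    }
    let mut parts = Vec::new();
    let mut edges = Vec::new();
    for child in children {
        match child {
            Child::Inline(value) => parts.push(value.clone()),
            Child::Class(id) => {
                parts.push("·".to_string());
                edges.push(*id);
            }
        }
    }
    (format!("{}({})", op, parts.join(", ")), edges)
}
```

For the example above, calling `inline_label("Node", &[Child::Inline("1".to_string()), Child::Class(7)])` yields the label `Node(1, ·)` and a single remaining edge to e-class 7, where 7 is an arbitrary id standing in for the class that contains foo.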

@saulshanabrook
Member Author

> For other primitives (i64, etc), I'd like to see what the inlining approach looks like for when the primitives are used as inputs to functions. A quick example:

That's a cool idea! It would definitely cut down on the number of nodes...

We would still put collection primitives as nodes b/c they can point to e-classes...?

For functions that return primitives, we could do Node(1, ·) -> 2 too?

@mwillsey
Member

Sure! We'll have to try it to see

@saulshanabrook
Member Author

I have added these examples, along with the existing examples, to the e-graph serialize PR, so that we can see how inlining compares! https://github.com/saulshanabrook/egraph-serialize/tree/viz/tests-viz

@saulshanabrook
Member Author

saulshanabrook commented Aug 8, 2023

@mwillsey I have updated this PR to use the e-graph serialize graphviz implementation (egraphs-good/egraph-serialize#4 and #171).

The changes seem out of date in the GitHub diff, even though I pushed them to the branch. Maybe if you re-open it they will be refreshed?


EDIT: I opened a new PR since maybe that's simpler, with the same branch: #186
