Construct graphs resulting from arbitrary applications of rewrites #1082
brandonwillard started this conversation in Ideas
## `Apply` Nodes

We need the ability to efficiently construct the graphs resulting from arbitrarily ordered applications of individual rewrites.

This functionality is required for our work in AeMCMC, which, after aesara-devs/aemcmc#45, has moved the rewrite considerations into the realm of applied vs. unapplied rewrites. More specifically, AeMCMC needs to "temporarily" consider scale-mixture expanded random variables (see aesara-devs/aemcmc#46) as a means of discovering custom Gibbs samplers. Those expansions need to be ephemeral, because some may end up being pointless, or potentially deleterious, when they don't serve to produce a sampler.
Consider the following two rewrites:
and the following example graph:
In other words, we're rewriting the expression $\log\exp\left(x + x\right) + x + x$ and we want to record all the possible graphs resulting from iteratively applying each rewrite in different orders.
We can anticipate rewrites like the following:
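For instance, writing $\mathrm{LE}$ for the $\log\exp\left(u\right) \to u$ reduction and $\mathrm{RA}$ for $u + u \to 2u$, two of the possible orderings are (an illustrative reconstruction):

- $\log\exp\left(x + x\right) + x + x \xrightarrow{\mathrm{LE}} \left(x + x\right) + x + x \xrightarrow{\mathrm{RA}} 2x + x + x \xrightarrow{\mathrm{RA}} 2x + 2x$
- $\log\exp\left(x + x\right) + x + x \xrightarrow{\mathrm{RA}} \log\exp\left(2x\right) + x + x \xrightarrow{\mathrm{RA}} \log\exp\left(2x\right) + 2x \xrightarrow{\mathrm{LE}} 2x + 2x$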
If we want to apply the above two rewrites to a graph, we must first choose a means of traversing the graph and applying them to each node. In this case, nearly any topological traversal order will do (as long as it visits every node), but the traversal must be performed repeatedly so that the graphs resulting from each successfully applied rewrite can themselves be rewritten.
The `EquilibriumOptimizer` is currently the most suitable `Rewriter` for this kind of job. It will construct a `LocalOptGroup` and apply the two rewrites until a fixed point is reached.

N.B. `LocalOptGroup` uses a "tracking" mechanism to make sure that the above two rewrites are only attempted when a given node uses an `at.log` or `at.add` `Op`, i.e. the two `Op`s registered by each rewrite in their `local_optimizer` decorators.

As with all rewrites, we need to construct a `FunctionGraph` object and apply a `Rewriter` to it.

As we know, only one of the anticipated sequences of rewrites has been applied. If we want to observe all sequences, we need to alter the steps applied by `EquilibriumOptimizer` so that a new `FunctionGraph` instance is produced each time a rewrite is successfully applied; then a queue/"stream" of `FunctionGraph` instances can be constructed and processed by each rewrite in varying orders.

First, why are new
`FunctionGraph` instances needed? Because our rewriting interfaces are all based on `FunctionGraph`s, and information like `FunctionGraph.clients` and other data structures are maintained by `Feature`s attached to a `FunctionGraph`. Information in the attached `Feature`s is updated incrementally through callbacks triggered by a `FunctionGraph` when nodes are added, removed, and updated.

Since the incrementally built data structures in a `FunctionGraph` and its `Feature`s are used all throughout our rewrite implementations, we need them to work in this new situation where independent graphs are repeatedly produced by rewrites.

At first, it might seem like we can simply clone
`FunctionGraph`s at a certain point; however, since `FunctionGraph.replace` works by altering `Apply` nodes in-place, every single `Apply` node in a graph must be cloned, and this invalidates much of the information tracked by `Feature`s and the like, because that information often consists of `dict`s mapping `Apply` nodes and/or their output `Variable`s.

One might notice that the situation can be reformulated into `FunctionGraph` "diffs" (i.e. each rewrite applies changes to a graph via `FunctionGraph.replace`, and those replacement pairs/maps can be collected and applied to a reference/initial `FunctionGraph`). Functionality based on this reformulation is provided by the `History` `Feature`, which "snapshots" a `FunctionGraph` instance, collects the changes/diffs, and can apply them in "reverse" to revert a `FunctionGraph` to a previous state. Because `History` uses `FunctionGraph.change_node_input`, all the listening `Feature`s can update their internal data structures accordingly.

This approach still doesn't address the main issue: `Apply` nodes are updated in-place, making it necessary to clone entire `FunctionGraph`s at some point.

N.B. #666 addresses some of these cloning issues, but it still needs to recurse into the clients of a cloned `Apply` node and clone the client nodes. In other words, any time an `Apply` node's inputs are changed, that node needs to be cloned, as well as all the `Apply` nodes that depend on the updated node's outputs.

Using our example graph
`z`, if we want to replace the node/`Variable` labeled `E` (as we would with `local_reduce_add`) and produce a new, independent graph, we need to clone the "downstream" nodes (i.e. `D`, `C`, `B`, and `A`), because `E` is an input to all of them.

This cloning simplifies some things, but it also demands resources and doesn't scale well with the size of a graph. The same goes for all the information collected about the cloned `Apply` nodes; it all needs to be recomputed. These recomputations can be very redundant as well. Just consider the `FunctionGraph.clients` information. The only client of node `B` is `A`, and that relationship doesn't change when `A` and `B` are cloned; nevertheless, the entry for `B`'s clients in `FunctionGraph.clients` needs to be recomputed using the cloned values.

So how can we avoid the need to clone downstream `Apply` nodes and invalidate collected information?

## Generalized `Apply` Nodes

One way to look at this cloning situation:
`Apply` nodes can only have one set of inputs, which limits their reuse.

Recall that `Apply` nodes are simply containers that hold an `Op` and lists of input and output `Variable`s. If we allow `Apply` nodes to have multiple sets of inputs, then the example replacement mentioned above would only involve the addition of an alternative input to node `D`.

Using this approach, an `Apply` node is now determined by its `Op`, output `Variable`(s), and input `Type`s, instead of input `Variable`s. A "context" is needed to determine the input `Variable`(s), and, if we connect these contexts to `FunctionGraph`s, we get a little closer to our goal.

Here's a MWE illustrating these alternative `Apply` nodes:

These contexts still require that one clone `FunctionGraph`s, but at least they make it possible to do so without cloning most of the `Variable`s and `Apply` nodes involved. Instead, it might be possible to get away with only shallow clones of `FunctionGraph`s and the data structures in `Feature`s that track node-related information.

## Traversing the Space of Rewrites
Concerns similar to the ones above arise when we consider rewrites that require the use of associative-commutative (AC) properties, because we don't want, for instance, to permute the order of arguments to a commutative `Op`, construct the new `Apply` node, make the change in a `FunctionGraph`, call all the listening `Feature`s, let them perform their updates, and then see if that ordering leads to a graph that matches the next step in a rewrite. Again, in this sort of scenario, we need to temporarily manifest these graphs in a lower-cost way and then reason about them.

In #634, AC properties are implemented using miniKanren (via `kanren`). miniKanren allows one to easily traverse and evaluate a few, or all, graphs resulting from every application order of each rewrite.
Let's take a look at what miniKanren does with our graphs.
miniKanren works by producing branching streams of "state" objects. The states in miniKanren are usually unification states: i.e. maps of logic variables and their unified values. As miniKanren processes each goal, those goals add new states to the stream (states which satisfy their goals, of course). When a goal-satisfying state cannot be produced in a given stream, that stream ends, yielding no results.
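The stream behavior can be sketched in plain Python (a toy, not `kanren`'s actual implementation): states are dicts mapping logic variables to values, goals are functions from a state to an iterator of states, and disjunction branches the stream:

```python
from itertools import chain


def eq(lvar, val):
    """Goal: bind `lvar` to `val`, or confirm an existing binding."""
    def goal(S):
        if lvar in S:
            if S[lvar] == val:
                yield S
            # otherwise: nothing is yielded -- this stream ends here
        else:
            yield {**S, lvar: val}
    return goal


def disj(g1, g2):
    """Branch the stream: states satisfying either goal."""
    def goal(S):
        return chain(g1(S), g2(S))
    return goal


def conj(g1, g2):
    """States satisfying both goals, applied in sequence."""
    def goal(S):
        return (S2 for S1 in g1(S) for S2 in g2(S1))
    return goal
```

For example, `disj(eq("x", 1), eq("x", 2))({})` branches into two states, while `conj(eq("x", 1), eq("x", 2))({})` yields nothing, since no state can satisfy both bindings.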
Let's take a look at the internal states produced by `kanren`.

The second state example demonstrates a failure of the state to satisfy the relation implemented by the goal (i.e. `cons(1, cdr(cdr_lv)) != z_et` for any `cdr_lv`).

In Aesara, graphs are walked via recursive calls to nodes' inputs. In `kanren`, nodes are effectively `etuple`s, and their inputs are obtained via a call to `cdr`, making recursive `conso` goals a relational means of walking a graph. See here for detailed illustrations of relational graph walking in `kanren`.

The miniKanren states produced when graphs are walked look similar to the example state printed above: they contain logic variables for the `car` and `cdr` values of each node in a graph.

Starting at the "output" logic variable `res_lv`, one can see that `graph` is being deconstructed into `ConsPair`s. `ConsPair`s are the base representation of two non-sequence objects that have been `cons`ed together (e.g. `cons(1, 2) == ConsPair(1, 2)`). Those `ConsPair`s hold logic variables created during recursive iterations within `walko`. Some of the logic variables map directly to the sub-graphs that haven't been traversed/"deconstructed" into `cons`es yet. (N.B. a sequence represented in `cons` form is terminated by an empty sequence, e.g. `cons(a, ()) == (a,)`.)

These `cons` chains produced by `walko` are convenient representations for mixing and matching arbitrary parts of a graph. For example:

The "multi-input"
`Apply` node functionality illustrated in the section above can be reproduced using these unification states, because the "context" arrays in that framework are effectively encoded by the `cons` pair relationships in a unification state. This can be seen in the toy example above; by changing the value of a single logic variable, we were able to arbitrarily alter a nested tuple. If such a tuple corresponds to the inputs of a node in an Aesara graph, the resulting state corresponds to a new context array.

These unification states are considerably more general/flexible than the very Aesara-specific "multi-input" contexts, but they don't translate as directly to Aesara graphs and/or `FunctionGraph`s. Calling `reify` on a state will construct an Aesara graph almost from scratch (i.e. by walking the graph, creating `etuple`s via `cons`es, and then evaluating those `etuple`s to produce the `Apply` nodes). There is some caching in `etuple`s, though, so a `cons` of parts that have already been `cons`ed before won't be re-evaluated in all instances.

In general, the graphs produced by miniKanren are not automatically reified at any point in the miniKanren process, so they don't ever take the form of standard Aesara graphs or `FunctionGraph` objects. This means it's not possible to use our existing Aesara rewrites and tooling on these values without costly reifications.

Nevertheless, it should be possible to combine the two techniques and leverage the streaming framework of miniKanren for our graph-space traversal needs.
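To see where such syncing could hook in, here is the shape of a goal constructor, sketched in plain Python over dict-like states (`kanren`'s real goals unify terms, but the structure is the same; the "rewrite" performed here is a stand-in):

```python
def rewriteo(src, tgt):
    """Goal constructor: build a goal relating `src` to a rewritten `tgt`.

    The returned goal maps a state `S` (any MutableMapping of logic
    variables to values) to a stream of states satisfying the relation.
    """
    def goal(S):
        # A goal may do anything to `S` -- e.g. sync it with an associated
        # FunctionGraph -- before yielding the satisfying states.
        if src in S:
            S_new = dict(S)  # toy "rewrite": bind tgt based on src's value
            S_new[tgt] = ("rewritten", S[src])
            yield S_new
    return goal


states = list(rewriteo("g_in", "g_out")({"g_in": "log(exp(x))"}))
```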
To clarify, a `kanren` goal constructor and goal take the following general form:

As the template above illustrates, we are able to do anything to the miniKanren state object (i.e. `S` in the example) within a goal. Also, we are able to use any type of state object that implements the `MutableMapping` interface.

Within this framework, we could sync the miniKanren states and their associated Aesara graphs and/or `FunctionGraph`s.

More on this later…