Rethinking the GraphBuilder #176

jobrachem · 2024-01-17T14:05:48Z

In recent weeks, a number of issues have come up that relate in some way to the lsl.GraphBuilder class.

Add transformation method to lsl.Var #173 (With regard to this issue, we discussed the question of what constitutes a change to a model vs. a change to a variable/node.)
Raise error if duplicate nodes are detected in GraphBuilder.transform #167
Refactor GraphBuilder.transform() method #93
Let GraphBuilder.replace_node, GraphBuilder.replace_var and GraphBuilder.transform find existing objects by name #88
Add a convenience function to get a GraphBuilder instance directly from a lsl.Model #87

I think it may be worthwhile to take a look at the GraphBuilder and see how it might evolve in the future.

Three claims

To start off the discussion, I claim the following:

The GraphBuilder class is not in fact where we build the Liesel graph.
The GraphBuilder instead offers methods to change an existing graph or to change nodes in that graph.
A large chunk of the functionality of the GraphBuilder is concentrated in private methods that are being used in the GraphBuilder.build_model method.

Details on the claims

Claim 1

The typical way to build a model in Liesel is by initializing variables and connecting them with calculators. Only once in this process, and most likely near the end of it, do you use the GraphBuilder, by adding the "terminal nodes": nodes without outputs that can be used to build the whole graph by recursively following their inputs. In my usage, I often have a one-liner of the form:

model = lsl.GraphBuilder().add(response).build_model()
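
To make the role of the "terminal nodes" concrete, here is a minimal, library-independent sketch of how a graph can be collected by recursively following inputs. The `Node` class and the `collect_graph` function are hypothetical names used only for illustration. They are not Liesel's API and do not claim to mirror its implementation.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity-based comparison and hashing
class Node:
    """Hypothetical stand-in for a graph node (not Liesel's Node class)."""

    name: str
    inputs: tuple["Node", ...] = ()


def collect_graph(*terminal_nodes: Node) -> list[Node]:
    """Return every node reachable from the terminal nodes, inputs first."""
    seen: set[int] = set()
    ordered: list[Node] = []

    def visit(node: Node) -> None:
        if id(node) in seen:
            return  # already collected via another path
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        ordered.append(node)

    for node in terminal_nodes:
        visit(node)
    return ordered


loc = Node("loc")
scale = Node("scale")
response = Node("response", inputs=(loc, scale))

# Only the terminal node is passed in; its inputs are found automatically.
graph = collect_graph(response)
print([n.name for n in graph])  # ['loc', 'scale', 'response']
```

In this toy version, adding `response` is enough to recover the whole graph, which is the reason the one-liner above only needs a single `add` call.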

Claim 2

The graph- or var-manipulating methods of the graph builder are:

GraphBuilder.replace_x (x can be "node" or "var")
GraphBuilder.rename_x (x can be "node" or "var")
GraphBuilder.transform

Claim 3

Here's an excerpt from the GraphBuilder.build_model method:

        gb._set_missing_names()
        gb._add_model_log_lik_node()
        gb._add_model_log_prior_node()
        gb._add_model_log_prob_node()
        gb._add_model_seed_nodes()

        nodes, _vars = gb._all_nodes_and_vars()
        nodes_and_vars = nodes + _vars

        model = Model(nodes_and_vars, grow=False, copy=copy)

What follows from these claims?

For me, it currently seems like the GraphBuilder may not need to be its own class. I think its functionality can be allocated as follows:

Var- and node-changing functionality can live directly on Vars and nodes.
Model setup can be done in the Model init. As a side note, the issue addressed in Raise error if duplicate nodes are detected in GraphBuilder.transform #167 can be solved in the process by requiring users to exclusively provide the "terminal nodes" to the model class, such that the graph is always built from the set of minimally required nodes by recursively following their inputs.
Methods for changing an existing model could become independent functions instead. This would only really apply to the GraphBuilder.replace_x methods, as far as I can see. I am aware that the current model class is intentionally static.
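
As a basis for discussion, the allocation above could look roughly like the following sketch. Everything here is hypothetical: `Node`, `ToyModel` and `replace_node` are illustrative names, not existing Liesel classes or functions, and the duplicate check by name is only one assumed way of handling the situation in #167. The model init collects the graph from the terminal nodes, and replacement is a free function that returns a new model and leaves the old one untouched.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a graph node (not Liesel's Node class)."""

    name: str
    inputs: tuple["Node", ...] = ()


class ToyModel:
    """Hypothetical static model that is built only from terminal nodes."""

    def __init__(self, *terminal_nodes: Node) -> None:
        self.terminal_nodes = terminal_nodes
        nodes: dict[str, Node] = {}

        def visit(node: Node) -> None:
            existing = nodes.get(node.name)
            if existing is node:
                return  # same object reached via another path
            if existing is not None:
                raise ValueError(f"Duplicate node name: {node.name!r}")
            nodes[node.name] = node
            for parent in node.inputs:
                visit(parent)

        for node in terminal_nodes:
            visit(node)
        self.nodes = nodes


def replace_node(model: ToyModel, old_name: str, new: Node) -> ToyModel:
    """Return a new model in which the node named old_name is swapped for new."""
    if old_name not in model.nodes:
        raise KeyError(old_name)
    rebuilt: dict[int, Node] = {}

    def rebuild(node: Node) -> Node:
        if node.name == old_name:
            return new
        if id(node) not in rebuilt:
            # Copy the node so that the original model stays unchanged.
            new_inputs = tuple(rebuild(p) for p in node.inputs)
            rebuilt[id(node)] = Node(node.name, new_inputs)
        return rebuilt[id(node)]

    return ToyModel(*(rebuild(t) for t in model.terminal_nodes))


loc = Node("loc")
scale = Node("scale")
model = ToyModel(Node("response", (loc, scale)))

new_model = replace_node(model, "scale", Node("log_scale"))
print(sorted(model.nodes))      # ['loc', 'response', 'scale']
print(sorted(new_model.nodes))  # ['loc', 'log_scale', 'response']
```

The sketch copies every node on replacement, which is simple but wasteful. Whether a real implementation should copy, mutate in place, or go through an intermediate builder is exactly the kind of question this issue is meant to open up.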

Concluding words

This issue is not intended as a fully worked-out solution that only needs approval. It is instead intended to provide a basis for discussion about how the GraphBuilder might evolve.

jobrachem · 2024-01-17T16:04:27Z

We had a first exchange of thoughts on the matter. If we tackle the GraphBuilder, we will go through three stages:

Decision about whether we want to tackle a bigger change to the GraphBuilder. We are in this stage. If "no", the process ends here.
Decision about what this change should look like.
Decision about the time-frame: When do we want to implement the change?

Here are some notes from today's meeting.

There is not much disagreement about the claims.
Paul originally thought of a GraphBuilder as some sort of context manager: Nodes that are created within the context manager are recognized automatically, such that you do not have to add your "terminal nodes" manually.
- This is not how it works today: You do add your terminal node to the graph builder manually.
- Johannes worries about using a context manager for building a large model: For one thing, the indentation becomes annoying when the model code grows large. Another point (not discussed): Using a context manager makes it hard to pre-package model parts or build up a model in a distributed way.
- Generally, the issue is: How can we make model building both smooth and understandable?
Another original idea was: Every node belongs to one GraphBuilder only.
- Personal note from me: We have not discussed this yet, but I am not entirely sure why this is important.
The discussion about "what follows" from the claims has not really started at this point.

jobrachem · 2024-01-17T17:13:11Z

@wiep @GianmarcoCallegher, my notes are included above. Feel free to comment for additions or corrections.

jobrachem added comp:model This issue is related to the model module team Work processes and organization of the team labels Jan 17, 2024
