-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Flow.metadata
attribute used in Flow.update_metadata
method
#679
base: main
Are you sure you want to change the base?
Conversation
…in update_metadata method
to use equivalent code as Flow
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #679 +/- ##
=======================================
Coverage 99.23% 99.24%
=======================================
Files 21 21
Lines 1573 1583 +10
Branches 427 431 +4
=======================================
+ Hits 1561 1571 +10
Misses 10 10
Partials 2 2
|
Flow.metadata
attribute now used in Flow.update_metadata
methodFlow.metadata
attribute used in Flow.update_metadata
method
@utf i'm sure you're strapped for time but even partial feedback here would be much appreciated! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @janosh. My only concern is about breaking the API of update_metadata
between Flow
s and Job
s. If we add the target
option to Job.update_metadata
that should make this function easier to use and the implementation cleaner.
src/jobflow/core/flow.py
Outdated
self.metadata.update(update) | ||
|
||
if target in ["jobs", "both"]: | ||
for job in self: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here job
could actually be a nested flow. So you should still iterate over these if target = "flow"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps the naming could be improved. target = "flow"
isn't meant to update the metadata of all nested flows (to the exclusion of jobs) but rather only the flow itself, not any of its jobs or nested flows.
how about we rename from target: Literal["flow", "jobs", "both"] = "both"
to target: Literal["self", "nested", "both"] = "both"
? unless you think it's important to have more control, i.e. be able to only update nested Flow
or Job
metadata. there might be use cases for that. in which case maybe we want target: Literal["self", "nested", "jobs", "flows", "both"] = "both"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to clarify this could be the doc string:
target
Specifies where to apply the metadata update. Options are:
- "self": Update only the Flow's own metadata
- "nested": Update only the metadata of Jobs/Flows within the Flow
- "jobs": Update only the metadata of Jobs within the Flow
- "flows": Update only the metadata of Flows within the Flow
- "all": Update both the Flow's metadata and nested Job+Flow metadata (default)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is overcomplicating it. If you want to select specific flows or jobs then you can pass in a name or class filter. So no need for the extra options. The main thing is the API should be consistent between jobs/flows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. then the easiest thing would be to get rid of target
altogether and use the "all"
behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting point. I think the benefit of target is that it would be a shortcut for class=Flow
vs class=Job
. I would be happy either having the target option or specifying the shortcut in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different issue but while we're on the topic of API design, what's your take on adding a callback_filter: Callable[[Flow | Job], bool]
? users would pass in a function which takes the Flow
or Job
instance on which you invoke update_metadata
(or update_config
, ...) and returns True
if updates should be applied. perhaps more prone to user error but also very versatile. usage example:
Flow().update_metadata(
{"material_id": 42},
callback_filter=lambda flow: SomeMaker in map(type, flow)
and flow.name == "flow name"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the benefit of target is that it would be a shortcut for
class=Flow
vsclass=Job
. I would be happy either having the target option or specifying the shortcut in the docstring.
i don't follow. by class=Job|Flow
, did you mean filter_function=Job|Flow
? the job variant filter_function=Job
doesn't seem to work the way you're suggesting so maybe you meant sth else?
EDIT: i guess you meant class_filter
on Maker.update_kwargs
but that isn't implemented for Flow/Job.update_metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was thinking of class_filter
. I think the callback_filter
you proposed sounds very flexible and would be useful!
Flow.update_metadata
is now much more similar to and has reached feature parity withJob.update_metadata
:jobflow/src/jobflow/core/job.py
Lines 922 to 930 in 367836b