-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add __getitem__
and __getattr__
🧙magic🧙♀️ methods for Job
#450
base: main
Are you sure you want to change the base?
Add __getitem__
and __getattr__
🧙magic🧙♀️ methods for Job
#450
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #450 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 20 20
Lines 1521 1523 +2
Branches 419 419
=========================================
+ Hits 1521 1523 +2
|
__getitem__
magic method for Job
__getitem__
🧙magic🧙 method for Job
__getitem__
🧙magic🧙 method for Job
__getitem__
🧙magic🧙♀️ method for Job
This is a very interesting idea, thanks! I have a couple of comments: Incompatibility with output attributesOne potential confusion is that you aren't able to do something like: job2 = capitalize(job1.hello) We could add that functionality in, but it then raises an issue with A potential fix would be to rename all of the Job attributes and functions to Incompatibility with jobs without a dictionary outputAnother issue is with jobs that output a python primitive directly. E.g., your @job
def capitalize(s):
return s.upper()
@job
def decapitalize(s):
return s.lower() This change might think users than can do: job1 = capitalize("hello")
job2 = decapitalize(job1)
flow = Flow([job1, job2]) But that will fail since jobs can only accept To enable this behaviour we could make it so that if a I think supporting both attributes and whole input jobs would actually be a really nice feature and make jobflow easier to use. FlowsFinally, just a note that we could also do something similar with Flows. E.g., this would enable you to write: @job
def add(a, b):
return a + b
add_first = add(1, 5)
add_second = add(add_first, 5)
flow = Flow([add_first, add_second], output=add_second)
add_third = add(flow, 5) Together with #416 we would have a very nice API, e.g. @job
def add(a, b):
return a + b
@flow
def multi_add(a, b):
add_first = add(1, 5)
return add(add_first, 5)
result = add(multi_add(1, 5), 5) |
@utf. Thanks for expanding on my initial idea to transform this into a great discussion about the API as a whole. I think these are all fantastic. In particular
This was a feature I was actually hoping already existed a few weeks back --- I tried it out, and of course it wasn't the case. But I think this would be elegant because I can't really see many use cases for people passing On a practical note, I am happy to get help on this from anyone reading this message! I probably can't tackle your suggestions imminently, but it's important enough in my opinion that it will be high up on my to-do list. |
@Andrew-S-Rosen Thanks for tagging me. These sound like great UX improvements! |
FWIW: Going to try to tackle more of @utf's ideas next week! |
__getitem__
🧙magic🧙♀️ method for Job
__getitem__
and __getattr__
🧙magic🧙♀️ method for Job
__getitem__
and __getattr__
🧙magic🧙♀️ method for Job
__getitem__
and __getattr__
🧙magic🧙♀️ method2 for Job
__getitem__
and __getattr__
🧙magic🧙♀️ method2 for Job
__getitem__
and __getattr__
🧙magic🧙♀️ methods for Job
@utf: Alright, we are getting there! First, a summary and then a question for you. SummaryI have now added a I have also made it so that if a QuestionsQuestion 1I have the following test: from jobflow import Flow, job, run_locally
from dataclasses import dataclass
@job
def make_str(s):
@dataclass
class MyClass:
hello: str = s
return MyClass
@job
def capitalize(s):
return s.upper()
job1 = make_str("world")
job2 = capitalize(job1.hello)
flow = Flow([job1, job2])
responses = run_locally(flow, ensure_success=True)
assert responses[job2.uuid][1].output == "WORLD" The traceback is as follows: Traceback (most recent call last):
File "C:\Users\asros\github\jobflow\src\jobflow\managers\local.py", line 101, in _run_job
response = job.run(store=store)
File "C:\Users\asros\github\jobflow\src\jobflow\core\job.py", line 605, in run
# if Job was created using the job decorator, then access the original function
File "C:\Users\asros\github\jobflow\src\jobflow\core\job.py", line 711, in resolve_args
store,
File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 451, in find_and_resolve_references
resolved_references = resolve_references(
File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 346, in resolve_references
resolved_references[ref] = ref.resolve(
File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 178, in resolve
data = data[attr] if attr_type == "i" else getattr(data, attr)
AttributeError: 'dict' object has no attribute 'hello'
2023-10-12 14:45:24,767 INFO Finished executing jobs locally To my untrained eye, it seems that the output is being serialized into a dictionary format automatically (MSONable-related?) rather than staying a class. Do you have any suggestions? Question 2There is a new CommentYou mentioned earlier that this could result in clashes with "first-class" Todo
|
Awesome, thanks @Andrew-S-Rosen. For question 1, I think this is because the class is defined in the scope of from jobflow import Flow, job, run_locally
from dataclasses import dataclass
@dataclass
class MyClass:
hello: str
@job
def make_str(s):
return MyClass(hello=s)
@job
def capitalize(s):
return s.upper()
job1 = make_str("world")
job2 = capitalize(job1.hello)
flow = Flow([job1, job2])
responses = run_locally(flow, ensure_success=True)
assert responses[job2.uuid][1].output == "WORLD" |
Hello everyone, I think this is definitely a very nice addition! I have one question. This is related to the "real" attributes of the Job object, also raised by @utf with a potential solution with the renaming to _ or something, which hopefully does not clash... (someone could still define an output with e.g. output_ as a key, even it would be rather weird). I am just wondering if anyone thought of subclassing the Job (and if it makes sense ? maybe it wouldn't even work ?). If that is the case, also the attributes of these subclasses should be appended with _ (or any other modification to avoid clash with the Anyway, very nice thinking to improve user friendliness! |
Looks like the issue here is def __getattr__(self, name: str) -> OutputReference:
if attr := getattr(self.output, name, None):
return attr
raise AttributeError(f"{type(self).__name__} has no attribute {name!r}") |
I was gonna start writing something about this and then I read your comment @utf . Indeed, it would really be nice to be able to do that as well, otherwise we are stuck with a difference of behavior between jobs and flows, i.e. you can use job directly without the .output to access a job output but for flows is different. And this difference is I believe worse than the one with I will try to get back a bit on the |
@janosh: Thanks! For your example though, we don't know if @davidwaroquiers: Yes, I agree. I view the |
In full disclosure, I am going to need to put this PR on hold. Anyone willing to can feel free to expand on it with a follow-up PR if they wish! If we start seeing some movement with the |
Summary
Following in @janosh's footsteps, I propose a
__getitem__
magic method forJob
, such that if you index aJob
object by a key (as if it were a dictionary) or an index (as if it were a list), the.output
is implicitly called for the user and theOutputReference
is returned.Motivation
Consider these two toy functions:
If you want to make a
Flow
of them, you have to currently do:However, new users to Jobflow often stumble on this in my experience because they expect to be able to do:
They can now do that without any issues 😄