Add `getitem` and `getattr` 🧙magic🧙‍♀️ methods for `Job` #450

Andrew-S-Rosen · 2023-10-10T00:31:30Z

Summary

Following in @janosh's footsteps, I propose a __getitem__ magic method for Job, such that if you index a Job object by a key (as if it were a dictionary) or an index (as if it were a list), the .output is implicitly called for the user and the OutputReference is returned.

Motivation

Consider these two toy functions:

@job
def make_str(s):
    return {"hello": s}

@job
def capitalize(s):
    return s.upper()

If you want to make a Flow of them, you have to currently do:

from jobflow import Flow

job1 = make_str("world")
job2 = capitalize(job1.output["hello"])
flow = Flow([job1, job2])

However, new users to Jobflow often stumble on this in my experience because they expect to be able to do:

from jobflow import Flow

job1 = make_str("world")
job2 = capitalize(job1["hello"])
flow = Flow([job1, job2])

They can now do that without any issues 😄

…nvenience

codecov · 2023-10-10T00:46:40Z

Codecov Report

Merging #450 (e29fb7d) into main (cdbcbd0) will not change coverage.
The diff coverage is 100.00%.

❗ Current head e29fb7d differs from pull request most recent head 57402ee. Consider uploading reports for the commit 57402ee to get more accurate results

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #450   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           20        20           
  Lines         1521      1523    +2     
  Branches       419       419           
=========================================
+ Hits          1521      1523    +2

Files	Coverage Δ
src/jobflow/core/job.py	`100.00% <100.00%> (ø)`

utf · 2023-10-10T07:29:59Z

This is a very interesting idea, thanks! I have a couple of comments:

Incompatibility with output attributes

One potential confusion is that you aren't able to do something like:

job2 = capitalize(job1.hello)

We could add that functionality in, but it then raises an issue with job1.output e.g., if your output document has an attribute called output this will return the OutputReference and not OutputReference.output.

A potential fix would be to rename all of the Job attributes and functions to attribute_, E.g., job.output_, job.graph_, job.update_maker_kwargs_ etc. This would be a breaking change but not too onerous to fix.

Incompatibility with jobs without a dictionary output

Another issue is with jobs that output a python primitive directly. E.g., your capitalize function. Consider these jobs.

@job
def capitalize(s):
    return s.upper()

@job
def decapitalize(s):
    return s.lower()

This change might think users than can do:

job1 = capitalize("hello")
job2 = decapitalize(job1)
flow = Flow([job1, job2])

But that will fail since jobs can only accept OutputReferences (or other python objects) as inputs.

To enable this behaviour we could make it so that if a Job is given as an input, then we use the whole OutputReference by default.

I think supporting both attributes and whole input jobs would actually be a really nice feature and make jobflow easier to use.

Flows

Finally, just a note that we could also do something similar with Flows. E.g., this would enable you to write:

@job
def add(a, b):
    return a + b

add_first = add(1, 5)
add_second = add(add_first, 5)

flow = Flow([add_first, add_second], output=add_second)

add_third = add(flow, 5)

Together with #416 we would have a very nice API, e.g.

@job
def add(a, b):
    return a + b

@flow
def multi_add(a, b):
    add_first = add(1, 5)
    return add(add_first, 5)

result = add(multi_add(1, 5), 5)

Andrew-S-Rosen · 2023-10-10T21:49:04Z

@utf. Thanks for expanding on my initial idea to transform this into a great discussion about the API as a whole. I think these are all fantastic. In particular

"To enable this behaviour we could make it so that if a Job is given as an input, then we use the whole OutputReference by default."

This was a feature I was actually hoping already existed a few weeks back --- I tried it out, and of course it wasn't the case. But I think this would be elegant because I can't really see many use cases for people passing Job objects between jobs and wanting the behavior as it stands right now. As an aside that's not too crucial for this discussion, it would also help with interconversion between several workflow tools since that kind of pattern is pretty common.

On a practical note, I am happy to get help on this from anyone reading this message! I probably can't tackle your suggestions imminently, but it's important enough in my opinion that it will be high up on my to-do list.

janosh · 2023-10-10T22:36:41Z

@Andrew-S-Rosen Thanks for tagging me. These sound like great UX improvements!

Andrew-S-Rosen · 2023-10-11T18:51:05Z

FWIW: Going to try to tackle more of @utf's ideas next week!

Andrew-S-Rosen · 2023-10-12T21:48:01Z

@utf: Alright, we are getting there! First, a summary and then a question for you.

Summary

I have now added a __getattr__ method, in addition to __getitem__, that allows for (implicit) parsing of attributes from the Job output. Tests have been added.

I have also made it so that if a Job is passed to another Job, the .output is called. This is a breaking change because before it would raise a UserWarning, but I think it is more logical. The UserWarning tests have been removed, and a test for the job-in-job behavior was added.

Questions

Question 1

I have the following test:

from jobflow import Flow, job, run_locally
from dataclasses import dataclass

@job
def make_str(s):

    @dataclass
    class MyClass:
        hello: str = s

    return MyClass

@job
def capitalize(s):
    return s.upper()

job1 = make_str("world")
job2 = capitalize(job1.hello)

flow = Flow([job1, job2])

responses = run_locally(flow, ensure_success=True)
assert responses[job2.uuid][1].output == "WORLD"

The traceback is as follows:

Traceback (most recent call last):
  File "C:\Users\asros\github\jobflow\src\jobflow\managers\local.py", line 101, in _run_job
    response = job.run(store=store)
  File "C:\Users\asros\github\jobflow\src\jobflow\core\job.py", line 605, in run
    # if Job was created using the job decorator, then access the original function
  File "C:\Users\asros\github\jobflow\src\jobflow\core\job.py", line 711, in resolve_args
    store,
  File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 451, in find_and_resolve_references
    resolved_references = resolve_references(
  File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 346, in resolve_references
    resolved_references[ref] = ref.resolve(
  File "C:\Users\asros\github\jobflow\src\jobflow\core\reference.py", line 178, in resolve
    data = data[attr] if attr_type == "i" else getattr(data, attr)
AttributeError: 'dict' object has no attribute 'hello'

2023-10-12 14:45:24,767 INFO Finished executing jobs locally

To my untrained eye, it seems that the output is being serialized into a dictionary format automatically (MSONable-related?) rather than staying a class. Do you have any suggestions?

Question 2

There is a new E RecursionError: maximum recursion depth exceeded in comparison introduced in a few tests now. I'm tackling one issue at a time, but if you know the answer on how to fix it, do let me know!

Comment

You mentioned earlier that this could result in clashes with "first-class" Job attributes, like output. While true, I just want to note the __getattr__ will only be activated if the attribute doesn't exist in the first place. That means if the user called .output, it will return the OutputReference (even if the returned class has an attribute .output). We can either rename the native attributes like you suggested or be clear in the documentation somewhere about what is not allowed.

Todo

Apply the same logic to Flow (perhaps in a new PR?).

utf · 2023-10-13T09:00:38Z

Awesome, thanks @Andrew-S-Rosen.

For question 1, I think this is because the class is defined in the scope of make_str, so it won't be accessible outside of that function. This means monty decoder can't deserialise it as it can't find the definition to import. If adjust your test to the follow, it should work:

from jobflow import Flow, job, run_locally
from dataclasses import dataclass


@dataclass
class MyClass:
    hello: str 

@job
def make_str(s):
    return MyClass(hello=s)

@job
def capitalize(s):
    return s.upper()

job1 = make_str("world")
job2 = capitalize(job1.hello)

flow = Flow([job1, job2])

responses = run_locally(flow, ensure_success=True)
assert responses[job2.uuid][1].output == "WORLD"

davidwaroquiers · 2023-10-13T13:45:02Z

Hello everyone,

I think this is definitely a very nice addition! I have one question. This is related to the "real" attributes of the Job object, also raised by @utf with a potential solution with the renaming to _ or something, which hopefully does not clash... (someone could still define an output with e.g. output_ as a key, even it would be rather weird). I am just wondering if anyone thought of subclassing the Job (and if it makes sense ? maybe it wouldn't even work ?). If that is the case, also the attributes of these subclasses should be appended with _ (or any other modification to avoid clash with the __getattr__).

Anyway, very nice thinking to improve user friendliness!

Andrew-S-Rosen · 2023-10-13T13:49:26Z

@utf: Thank you! That fixes it and makes a lot of sense.

@janosh: It seems this change caused some of your magic method tests to fail with a recursion error, such as flow = flow + job. Do you happen to know why this might be?

… job's output

janosh · 2023-10-13T15:56:13Z

Looks like the issue here is __getattr__ calls getattr on self.output which itself calls __getattr__ if the attribute doesn't exist in self.output, leading to a loop. Safer to raise AttributeError if name can't be found in self.output:

def __getattr__(self, name: str) -> OutputReference:
    if attr := getattr(self.output, name, None):
        return attr
    raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")

davidwaroquiers · 2023-10-13T16:03:41Z

Flows

Finally, just a note that we could also do something similar with Flows. E.g., this would enable you to write:

@job
def add(a, b):
    return a + b

add_first = add(1, 5)
add_second = add(add_first, 5)

flow = Flow([add_first, add_second], output=add_second)

add_third = add(flow, 5)

Together with #416 we would have a very nice API, e.g.

@job
def add(a, b):
    return a + b

@flow
def multi_add(a, b):
    add_first = add(1, 5)
    return add(add_first, 5)

result = add(multi_add(1, 5), 5)

I was gonna start writing something about this and then I read your comment @utf . Indeed, it would really be nice to be able to do that as well, otherwise we are stuck with a difference of behavior between jobs and flows, i.e. you can use job directly without the .output to access a job output but for flows is different. And this difference is I believe worse than the one with @job and no @flow because its on the user, while for jobs having @job but flows not having @flow, at least it "only" concerns developers (or expert users).

I will try to get back a bit on the @flow decorator proof of concept implementation I did to see if I manage to get it to work properly when it is inside jobflow.

Andrew-S-Rosen · 2023-10-13T16:59:47Z

@janosh: Thanks! For your example though, we don't know if name is in the output until the OutputReference is resolved at runtime, right? That seems like that would cause a problem with your proposed change. But I'll look into this.

@davidwaroquiers: Yes, I agree. I view the Flow analogue as being a requirement. Thanks for taking a look at @flow again! I'm really excited about this prospect.

Andrew-S-Rosen · 2023-11-02T06:13:49Z

In full disclosure, I am going to need to put this PR on hold. Anyone willing to can feel free to expand on it with a follow-up PR if they wish!

If we start seeing some movement with the @flow decorator though, I'd be substantially inspired to revisit this ASAP 😉

Andrew-S-Rosen added 8 commits October 9, 2023 17:27

Add __getitem__ magic method to Job

3abc1ce

Add newline

f509b0f

black

3e40fd9

Merge remote-tracking branch 'origin/rosen-convenience' into rosen-co…

a711bbb

…nvenience

Fix import

3e55bc5

ruff

cfd867d

patch

132a38d

patch

94a1511

Andrew-S-Rosen changed the title ~~Add __getitem__ magic method for Job~~ Add __getitem__ 🧙magic🧙 method for Job Oct 10, 2023

Andrew-S-Rosen changed the title ~~Add __getitem__ 🧙magic🧙 method for Job~~ Add __getitem__ 🧙magic🧙‍♀️ method for Job Oct 10, 2023

Merge branch 'main' into rosen-convenience

2daaefb

Andrew-S-Rosen mentioned this pull request Oct 11, 2023

Allow for the use of @flows with Jobflow Quantum-Accelerators/quacc#1061

Open

Andrew-S-Rosen added 2 commits October 11, 2023 11:51

Merge branch 'main' into rosen-convenience

e29fb7d

Add __getattr__ support Job-in-Job support

184d86c

Andrew-S-Rosen changed the title ~~Add __getitem__ 🧙magic🧙‍♀️ method for Job~~ Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ method for Job Oct 12, 2023

Andrew-S-Rosen changed the title ~~Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ method for Job~~ Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ method2 for Job Oct 12, 2023

Andrew-S-Rosen changed the title ~~Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ method2 for Job~~ Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ methods for Job Oct 12, 2023

black

8818241

Andrew-S-Rosen added 3 commits October 12, 2023 14:48

update

8363c14

black

18f75a7

lint

6893d7f

Andrew-S-Rosen added 2 commits October 13, 2023 06:35

Update test_flow.py

7c835d3

black

2a84836

lint

6e1c45d

Job.__getattr__ raise AttributeError if attribute is not found in the…

57402ee

… job's output

utf mentioned this pull request Oct 29, 2023

Simplify code by using Flow methods __len__ and __getitem__ #467

Merged

davidwaroquiers mentioned this pull request Feb 19, 2024

WIP: Flow decorator #547

Draft

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `getitem` and `getattr` 🧙magic🧙‍♀️ methods for `Job` #450

Add `getitem` and `getattr` 🧙magic🧙‍♀️ methods for `Job` #450

Andrew-S-Rosen commented Oct 10, 2023 •

edited

Loading

codecov bot commented Oct 10, 2023 •

edited

Loading

utf commented Oct 10, 2023 •

edited

Loading

Andrew-S-Rosen commented Oct 10, 2023

janosh commented Oct 10, 2023

Andrew-S-Rosen commented Oct 11, 2023

Andrew-S-Rosen commented Oct 12, 2023 •

edited

Loading

utf commented Oct 13, 2023

davidwaroquiers commented Oct 13, 2023 •

edited

Loading

Andrew-S-Rosen commented Oct 13, 2023

janosh commented Oct 13, 2023 •

edited

Loading

davidwaroquiers commented Oct 13, 2023 •

edited

Loading

Flows

Andrew-S-Rosen commented Oct 13, 2023

Andrew-S-Rosen commented Nov 2, 2023 •

edited

Loading

Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ methods for Job #450

Are you sure you want to change the base?

Add __getitem__ and __getattr__ 🧙magic🧙‍♀️ methods for Job #450

Conversation

Andrew-S-Rosen commented Oct 10, 2023 • edited Loading

Summary

Motivation

codecov bot commented Oct 10, 2023 • edited Loading

Codecov Report

utf commented Oct 10, 2023 • edited Loading

Incompatibility with output attributes

Incompatibility with jobs without a dictionary output

Flows

Andrew-S-Rosen commented Oct 10, 2023

janosh commented Oct 10, 2023

Andrew-S-Rosen commented Oct 11, 2023

Andrew-S-Rosen commented Oct 12, 2023 • edited Loading

Summary

Questions

Question 1

Question 2

Comment

Todo

utf commented Oct 13, 2023

davidwaroquiers commented Oct 13, 2023 • edited Loading

Andrew-S-Rosen commented Oct 13, 2023

janosh commented Oct 13, 2023 • edited Loading

davidwaroquiers commented Oct 13, 2023 • edited Loading

Flows

Andrew-S-Rosen commented Oct 13, 2023

Andrew-S-Rosen commented Nov 2, 2023 • edited Loading

Add `getitem` and `getattr` 🧙magic🧙‍♀️ methods for `Job` #450

Add `getitem` and `getattr` 🧙magic🧙‍♀️ methods for `Job` #450

Andrew-S-Rosen commented Oct 10, 2023 •

edited

Loading

codecov bot commented Oct 10, 2023 •

edited

Loading

utf commented Oct 10, 2023 •

edited

Loading

Andrew-S-Rosen commented Oct 12, 2023 •

edited

Loading

davidwaroquiers commented Oct 13, 2023 •

edited

Loading

janosh commented Oct 13, 2023 •

edited

Loading

davidwaroquiers commented Oct 13, 2023 •

edited

Loading

Andrew-S-Rosen commented Nov 2, 2023 •

edited

Loading