Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-38041: rewrite pre-exec-init logic to work without QGs and respect storage class differences #444

Merged
merged 16 commits into from
Sep 12, 2024

Conversation

TallJimbo
Copy link
Member

@TallJimbo TallJimbo commented Sep 10, 2024

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

Copy link

codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 96.39831% with 17 lines in your changes missing coverage. Please review.

Project coverage is 83.52%. Comparing base (3efb6ca) to head (9a3e471).
Report is 17 commits behind head on main.

Files with missing lines Patch % Lines
...n/lsst/pipe/base/pipeline_graph/_pipeline_graph.py 88.33% 8 Missing and 6 partials ⚠️
python/lsst/pipe/base/graph/graph.py 96.51% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #444      +/-   ##
==========================================
+ Coverage   82.30%   83.52%   +1.22%     
==========================================
  Files          96       97       +1     
  Lines       10971    11426     +455     
  Branches     2092     2192     +100     
==========================================
+ Hits         9030     9544     +514     
+ Misses       1590     1528      -62     
- Partials      351      354       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TallJimbo TallJimbo force-pushed the tickets/DM-38041 branch 2 times, most recently from d787880 to 048018f Compare September 10, 2024 20:15
Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, few minor questions. I only looked superficially at the unit tests.

Whether this task can run without any datasets for the given
connection.
"""
return getattr(self.get_connections(), "minimum", 1) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm confused here, should it be getattr(getattr(self.get_connections(), connection_name), "minimum", 1)? Or maybe even a bit more type-safe with isinstance(..., BaseInput)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍, I've switched to a version that uses isinstance(..., BaseInput) as that's a lot more readable than nested getattr calls, too.

Comment on lines +2186 to +2193
Previously recorded package versions. Updated in place to include
any new packages that weren't present before.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add that it is only updated if existing versions are consistent (no exception raised)?

Comment on lines +1849 to +1856
if (ref := butler.find_dataset(self.packages_dataset_type)) is not None:
packages = butler.get(ref)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we never have a situation with multiple concurrent clients trying to execute this code, it is full of races 🙂

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 I think we've been pretty clear about pre-exec-init not being something you can parallelize.

Comment on lines +1816 to +1823
# We do writes at the end to minimize the mess we leave behind when we
# raise an exception.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to delay all writes, including packages and init outputs, in case one of these three methods raises?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about that too, but I think the extra complexity involved just isn't worth it, since this kind of mess hasn't really been a problem in practice that I'm aware of.

python/lsst/pipe/base/pipeline_graph/_pipeline_graph.py Outdated Show resolved Hide resolved
python/lsst/pipe/base/graph/graph.py Outdated Show resolved Hide resolved
@contextmanager
def make_butler(self) -> Iterator[Butler]:
"""Wrap a temporary local butler repository in a context manager."""
with tempfile.TemporaryDirectory() as root:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth adding ignore_cleanup_errors=False, I think we saw random cleanup failures that resulted in exceptions which you probably want to ignore in tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder. I actually intentionally went with tempfile.TemporaryDirectory over lsst.utils.tests.temporaryDirectory because I knew the standard library could now do that (and the only reason we invented our own was that the standard library couldn't before), and then I forgot to actually use that flag 🤦‍♂️.

tests/test_init_output_run.py Outdated Show resolved Hide resolved
tests/test_init_output_run.py Outdated Show resolved Hide resolved
tests/test_init_output_run.py Outdated Show resolved Hide resolved
This does not use TaskFactory (which is in ctrl_mpexec anyway) because
it takes a lot more care with storage class conversions and components,
and it saves all writes until after it's done all checks, in order to
reduce the chance that we leave a repository messy when we encounter
an error.
The old ones that take TaskDef will be deprecated on DM-40442.
This code was moved here from ctrl.mpexec.CmdLineFwk.preExecInitQBB
with minimal changes.
This reimplements functionality previously in ctrl_mpexec's PreExecInit
by delegating to the new PipelineGraph.instantiate_tasks instead.

PreExecInit.saveInitOutputs now delegates to this method.
@TallJimbo TallJimbo merged commit 8292ef4 into main Sep 12, 2024
12 of 13 checks passed
@TallJimbo TallJimbo deleted the tickets/DM-38041 branch September 12, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants