-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-42188: Make RemoteButler usable from services #930
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #930 +/- ##
==========================================
+ Coverage 87.94% 87.99% +0.05%
==========================================
Files 301 309 +8
Lines 39211 39468 +257
Branches 8277 8311 +34
==========================================
+ Hits 34484 34730 +246
- Misses 3525 3531 +6
- Partials 1202 1207 +5 ☔ View full report in Codecov by Sentry. |
3df6f7f
to
9e42b8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really good.
|
||
factory = RemoteButlerFactory.create_factory_from_config(butler_config) | ||
return factory.create_butler_with_credentials_from_environment(butler_options=options) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a raise
here for clarity if there is no match to REMOTE or DIRECT? I know that there are currently only two values in the enum but this ties it up nicely. Consider using match
here since that is designed for enums.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because mypy's exhaustiveness checking will ensure that all the cases are handled, there's some value in not having the raise. Without the raise, when someone adds an enum value, mypy will tell them all the places that need to do something with it.
I don't feel that strongly about it one way or the other though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match
naturally comes with a fall through clause that can raise. I feel better if the code looks like it's dealing with all the options without relying on mypy.
) -> Butler: | ||
"""Return a new Butler instance connected to the same repository | ||
as this one, but overriding ``collections``, ``run``, | ||
``inferDefaults``, and default data ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this method becomes public I think it will trigger the numpydoc warning. Might want to add the public docs here now rather than later.
# missing the ``cls`` property. | ||
return ButlerType.DIRECT | ||
elif butler_class_name == "lsst.daf.butler.direct_butler.DirectButler": | ||
return ButlerType.DIRECT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a cls
entry to one of the direct butler config files in tests/config/basic
so we can trigger this code path?
return ButlerType.REMOTE | ||
else: | ||
raise ValueError( | ||
f"Butler configuration requests to load unknown Butler class {butler_class_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"Butler configuration requests to load unknown Butler class {butler_class_name}" | |
f"Butler configuration requests to load unknown Butler class {butler_class_name!r}" |
so the class name is quoted.
|
||
Parameters | ||
---------- | ||
repositories : `Mapping`[`str`, `str`], optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repositories : `Mapping`[`str`, `str`], optional | |
repositories : `~collections.abc.Mapping`[`str`, `str`], optional |
server_url : `str` | ||
The URL of the Butler server that RemoteButler instances created by | ||
this factory will connect to. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove blank line. Not needed.
@@ -188,15 +153,19 @@ def isWriteable(self) -> bool: | |||
@property | |||
def dimensions(self) -> DimensionUniverse: | |||
# Docstring inherited. | |||
if self._dimensions is not None: | |||
return self._dimensions | |||
with self._cache.access() as cache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need the read lock? If the value is not None
then no other thread is going to make it None
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not been able to locate any definitive information about Python's memory model for threading other than "if you use the synchronization primitives from the threading module then it works how you would expect".
We probably don't need this on the read in the current version of CPython because of the GIL but that's not documented as permanent behavior anywhere that I am aware of.
@@ -629,18 +598,21 @@ def _get_url(self, path: str, version: str = "v1") -> str: | |||
path : `str` | |||
The full path to the endpoint. | |||
""" | |||
return f"{version}/{path}" | |||
slash = "" if self._server_url.endswith("/") else "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we force self._server_url
to always have the /
so we don't need to check every time on get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I had considered that but I was concerned someone might assign the server URL from somewhere else, forgetting the slash, and it would break intermittently. By putting this check here we enforce it at the point where the code depends on it having a slash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't _server_url
private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but RemoteButler is already almost 700 lines long and we haven't implemented 95% of its methods yet. I don't see any reason not to enforce correctness at the point of use instead of adding a comment like "make extra sure this variable always has a slash at the end of it!"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing something. _server_url
is set in one place in the constructor so my suggestion was to move this fixup line up to that point and normalize the string there. Then it's fixed once and never needs to be tested again.
I wasn't suggesting we document that all places that construct a RemoteButler must ensure the / is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is, but with the rate that these constructors are proliferating we may end up with it getting set more than one place.
The only place that must have a slash is _get_url, so what is the benefit to doing this logic in the constructor? The constructor is plenty busy with other issues already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Leave it. I don't really understand why we want to do a check and append every time the server is called rather than once when we construct the object. _server_url
is entirely internal. We are in charge of how that variable is set.
|
||
def test_factory_via_global_repository_index(self): | ||
with tempfile.NamedTemporaryFile(suffix=".yaml") as index_file: | ||
index_file.write(f"test_repo: {self.config_file_uri}\n".encode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
index_file.write(f"test_repo: {self.config_file_uri}\n".encode()) | |
print(f"test_repo: {self.config_file_uri}", file=index_file) |
(and open temp file with mode="w"
)? It is shorter and seems more natural.
tests/test_thread_safe_cache.py
Outdated
|
||
import unittest | ||
|
||
from lsst.daf.butler._utilities.thread_safe_cache import ThreadSafeCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put the three thread safety tests in a single test file? They are very small.
|
||
|
||
class ButlerFactory: | ||
"""Factory for efficiently instantiating Butler instances from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm reading #931, we seem to be saying that server is required to use a label from a DAF_BUTLER_REPOSITORY_INDEX. This makes me ponder that the name of the class is a bit too generic since DirectButler is not required to use the index labels and we don't expect this class to be used in non-server context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's DAF_BUTLER_REPOSITORY_INDEX or "pass in whatever set of labels and config files you want." I'd be open to renaming if you have any suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it has to be able to return DirectButler and RemoteButler for now the class can't really be ServerButlerFactory (as is the name of the variable you assign to). LabeledButlerFactory
? NamedButlerFactory
?
Decoupled the set of parameter passed to Butler() from those passed to its subclasses. RemoteButler construction will be changing to require injection of some long-lived objects, so it needs a different constructor signature than the original Butler constructor. There are several parameters to Butler like "collections" and "run" which control the behavior of a Butler instance and are common to both DirectButler and RemoteButler. Upcoming changes to Butler construction will require these to be passed around through multiple functions. To avoid repeating them they are pulled out to a dataclass object.
To provide a place for storing state that is shared between multiple instances of RemoteButler, add a RemoteButlerFactory class that can be used to get instances of RemoteButler. This also simplifies the constructor logic for both Butler types by making Butler.from_config() the only place responsible for knowing about the configuration search logic.
The DirectButler constructor used to have two forms that were logically separate but combined into one interface: 1. Standard new instance creation 2. Copy construction This became more awkward when part of the construction process was moved to Butler.from_config, because 'config' and 'butler' are mutually exclusive options but we always need to instantiate a config to find out which class to load. Moved the copy construction form into a separate instance method _clone(). This will likely become public at some point in the near future.
Python does not guarantee that dict mutations are threadsafe, and in particular the current CPython implementation is not if __eq__ is defined for the object used as the key. (Almost every Butler-related object defines a custom __eq__.)
Moved most of the logic from DirectButler.__new__ into a factory function DirectButler.create_from_config. This allows us to use __new__ as a common constructor shared by create_from_config() and clone(). This will help ensure that any new instance properties we add are correctly set up for all construction paths.
We need the logic for mapping the ButlerConfig 'cls' field to Butler type in both Butler() and the upcoming ButlerFactory, so move it to ButlerConfig.
Add a factory class for creating multiple Butler instances for multiple repositories in the repository index. This is intended for use in long-lived multi-threaded services that will instantiate a new Butler for each end-user HTTP request.
We now inject cached data from RemoteButlerFactory into RemoteButler, so that multiple RemoteButler instances connected to the same server can share a cache.
With the changes to Butler construction, creating a RemoteButler in the Docker smoke test is now faster. This was causing the Docker smoke test to be unreliable, because the test was sometimes trying to connect before the server had booted up.
Tim is not comfortable with relying on mypy and the existing exception in get_butler_type to ensure that all cases are handled.
9e42b8b
to
2d24a30
Compare
Checklist
doc/changes