-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-41162: Create minimalist RemoteButler client and FastAPI server #897
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #897 +/- ##
==========================================
+ Coverage 87.23% 87.66% +0.43%
==========================================
Files 270 276 +6
Lines 36345 36165 -180
Branches 7598 7547 -51
==========================================
+ Hits 31704 31705 +1
+ Misses 3484 3305 -179
+ Partials 1157 1155 -2
☔ View full report in Codecov by Sentry. |
69dfcf9
to
fc9af88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reorganization is okay. The main place where I have an issue is how the butler gets configured. This needs fixing but I'm fine with you fixing it on a different ticket since this is not production code. Butler(server_uri)
has to work and it doesn't look like it does at the moment (or at least the test code is not demonstrating that)
butler_config = ButlerConfig(config, searchPaths, without_datastore=True) | ||
self._config = RemoteButlerConfigModel.model_validate(butler_config) | ||
self._dimensions: DimensionUniverse | None = None | ||
# TODO: RegistryDefaults should have finish() called on it, but this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we also need to work out what to do with kwargs since people can say instrument="LATISS"
and then in the defaults we check that that's a real instrument so we can default it in later queries when there is a need for an instrument. We can decide what to do about that later. It will fold in to how Query objects work and maybe we will pass the defaults to the Query object and let the server sort it out. I don't know if we want to pass kwargs to the server for verification/expansion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that just passing the defaults to the Query object will probably be the right thing. To stop the execution flow from getting too tangled up between the client and server, I think in general it's going to be the right approach to just record what the user asked for and give it to the server to actually do things with.
http_client: httpx.Client | None = None, | ||
**kwargs: Any, | ||
): | ||
butler_config = ButlerConfig(config, searchPaths, without_datastore=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a note eventually the datastore will be needed.
**kwargs: Any, | ||
): | ||
butler_config = ButlerConfig(config, searchPaths, without_datastore=True) | ||
self._config = RemoteButlerConfigModel.model_validate(butler_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this validate because ButlerConfig
looks like a generic mapping and it happens to have a remote_butler.url
in the hierarchy? In general ButlerConfig
is not a pydantic model even though we have talked about it being one. What happens to all the other parts of ButlerConfig
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's correct. Currently the other parts of ButlerConfig are being discarded since we're not using them. I figured as we added usage of the other parts of ButlerConfig, we could add the appropriate validation for them.
It's still not clear to me which parts of the existing DirectButler ButlerConfig are generated internally in the client code, which are potentially configured locally in a client-side config file, and which are only server-side concepts that the client never sees in a configuration file.
I do wonder if some portion of the configuration would be returned from a versioned "init" endpoint (which maybe gives you back some config chunks, server capabilities, dimensions, collection names, etc all in one shot.) If there is only one "config file" endpoint that configuration format becomes an unversionable permanent part of the API
# Docstring inherited. | ||
raise NotImplementedError() | ||
|
||
def transaction(self) -> AbstractContextManager[None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TallJimbo I'm assuming transaction
shouldn't really be in the ABC?
# | ||
# You should have received a copy of the GNU General Public License | ||
# along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need to add __all__
to all these files. We do always tend to do the __all__
in the files and then from x import *
variant in the __init__.py
. I think that it also helps sphinx to know what docs should be built and doesn't get confused by other symbols being imported from elsewhere.
|
||
def test_bad_config(self): | ||
with self.assertRaises(ValidationError): | ||
Butler({"cls": "lsst.daf.butler.remote_butler.RemoteButler", "remote_butler": {"url": "!"}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the if
at the end as we do for the other tests so that in theory python tests/test_remote_butler.py
does work. Sometimes it's useful to be able to run outside of pytest if that's possible.
from lsst.daf.butler.tests.utils import MetricTestRepo, makeTestTempDir, removeTestTempDir | ||
|
||
TESTDIR = os.path.abspath(os.path.dirname(__file__)) | ||
|
||
|
||
def _make_remote_butler(http_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not how a normal user is going to create a RemoteButler. They are going to say:
butler = Butler("https://test.example")
and it is meant to work automatically. It's meant to automatically look for the /butler.json
endpoint,
It's this code in the prototype server:
app.get("/butler/butler.json", response_model=dict[str, Any])
def read_server_config() -> Mapping:
"""Return the butler configuration that the client should use."""
config_str = f"""
datastore:
root: {BUTLER_ROOT}
registry:
cls: lsst.daf.butler.registries.remote.RemoteRegistry
db: <butlerRoot>
"""
config = Config.fromString(config_str, format="yaml")
return config.toDict()
This would clearly need to change to return back the cls
. There is a <butlerRoot>
convention in the configs that allows the config to say "use what ever URI you found this butler config at" versus a hard-coded server name. So if you go to https://192.168.1.1/butler.json
and the server is at https://192.168.1.1
that all works if you set the root
field to <butlerRoot>
and if you don't like root
we can call it server_url
or something. Alternatively the butler.json could return a hard-coded server URL that is distinct from where the config came from.
In the test code below that was deleted you will see that it downloads the Config, then injects the httpx client into the config such that the Butler constructor will get the Client object for the server URI rather than having to make it itself from a URI.
We can leave it like this for now but it needs to be fixed to work the standard way soon. Remember that an end user doesn't have to know whether they ended up with a DirectButler or RemoteButler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By convention in Phalanx/Safir for REST APIs they don't allow external access to things at the top level of the HTTP path -- that's reserved for internal endpoints like Kubernetes health checks. The externally accessible parts are expected to be nested under "/butler/" or something. Not sure if we can override that or not, I can find out.
As it stands you can point the constructor to a config file on disk or an HTTP server and get back a remote butler instance the same way as you would with a direct butler. Whether it makes sense to serve that file from the REST server itself I'm not sure. If it does it may be more like "http://butler.lsst.cloud/configs/dp02" rather than just the root URL.
I kind of think that any configuration chunks that must be provided by the server should come from a versioned endpoint instead of a static file. So if we are keeping the convention from DirectButler that you can load a configuration file from an http server, we might want to do like "butler://butler.lsst.cloud" to mean "create a remote butler pointing at butler.lsst.cloud with all client-side options set to defaults"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped this file also on my branch that has been just merged. Hopefully git will figure it out when you rebase.
Add a no-op RemoteButler that can be instantiated via the Butler() constructor.
Extract dimensions and some httpx setup logic from RemoteRegistry and move them to RemoteButler. Delete RemoteRegistry and tests associated with it.
Instead of manually setting a global variable to configure the test server's Butler, use FastAPI's dependency injection framework.
7eab7d5
to
6f6001b
Compare
Checklist
doc/changes