Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-41162: Create minimalist RemoteButler client and FastAPI server #897

Merged
merged 8 commits into from
Oct 26, 2023

Conversation

dhirving
Copy link
Contributor

@dhirving dhirving commented Oct 23, 2023

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes

@codecov
Copy link

codecov bot commented Oct 23, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (dd9cc5e) 87.23% compared to head (6f6001b) 87.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #897      +/-   ##
==========================================
+ Coverage   87.23%   87.66%   +0.43%     
==========================================
  Files         270      276       +6     
  Lines       36345    36165     -180     
  Branches     7598     7547      -51     
==========================================
+ Hits        31704    31705       +1     
+ Misses       3484     3305     -179     
+ Partials     1157     1155       -2     
Files Coverage Δ
python/lsst/daf/butler/remote_butler/__init__.py 100.00% <100.00%> (ø)
python/lsst/daf/butler/remote_butler/_config.py 100.00% <100.00%> (ø)
...n/lsst/daf/butler/remote_butler/server/__init__.py 100.00% <100.00%> (ø)
...n/lsst/daf/butler/remote_butler/server/_factory.py 100.00% <100.00%> (ø)
tests/test_server.py 91.89% <100.00%> (+66.57%) ⬆️
.../daf/butler/remote_butler/server/_server_models.py 0.00% <0.00%> (ø)
...on/lsst/daf/butler/remote_butler/server/_server.py 90.90% <90.90%> (ø)
tests/test_remote_butler.py 88.23% <88.23%> (ø)
...on/lsst/daf/butler/remote_butler/_remote_butler.py 95.83% <95.83%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dhirving dhirving force-pushed the tickets/DM-41162 branch 4 times, most recently from 69dfcf9 to fc9af88 Compare October 24, 2023 21:22
@dhirving dhirving marked this pull request as ready for review October 24, 2023 21:52
Copy link
Member

@timj timj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reorganization is okay. The main place where I have an issue is how the butler gets configured. This needs fixing but I'm fine with you fixing it on a different ticket since this is not production code. Butler(server_uri) has to work and it doesn't look like it does at the moment (or at least the test code is not demonstrating that)

butler_config = ButlerConfig(config, searchPaths, without_datastore=True)
self._config = RemoteButlerConfigModel.model_validate(butler_config)
self._dimensions: DimensionUniverse | None = None
# TODO: RegistryDefaults should have finish() called on it, but this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we also need to work out what to do with kwargs since people can say instrument="LATISS" and then in the defaults we check that that's a real instrument so we can default it in later queries when there is a need for an instrument. We can decide what to do about that later. It will fold in to how Query objects work and maybe we will pass the defaults to the Query object and let the server sort it out. I don't know if we want to pass kwargs to the server for verification/expansion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that just passing the defaults to the Query object will probably be the right thing. To stop the execution flow from getting too tangled up between the client and server, I think in general it's going to be the right approach to just record what the user asked for and give it to the server to actually do things with.

http_client: httpx.Client | None = None,
**kwargs: Any,
):
butler_config = ButlerConfig(config, searchPaths, without_datastore=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note eventually the datastore will be needed.

**kwargs: Any,
):
butler_config = ButlerConfig(config, searchPaths, without_datastore=True)
self._config = RemoteButlerConfigModel.model_validate(butler_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this validate because ButlerConfig looks like a generic mapping and it happens to have a remote_butler.url in the hierarchy? In general ButlerConfig is not a pydantic model even though we have talked about it being one. What happens to all the other parts of ButlerConfig?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. Currently the other parts of ButlerConfig are being discarded since we're not using them. I figured as we added usage of the other parts of ButlerConfig, we could add the appropriate validation for them.

It's still not clear to me which parts of the existing DirectButler ButlerConfig are generated internally in the client code, which are potentially configured locally in a client-side config file, and which are only server-side concepts that the client never sees in a configuration file.

I do wonder if some portion of the configuration would be returned from a versioned "init" endpoint (which maybe gives you back some config chunks, server capabilities, dimensions, collection names, etc all in one shot.) If there is only one "config file" endpoint that configuration format becomes an unversionable permanent part of the API

# Docstring inherited.
raise NotImplementedError()

def transaction(self) -> AbstractContextManager[None]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TallJimbo I'm assuming transaction shouldn't really be in the ABC?

#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to add __all__ to all these files. We do always tend to do the __all__ in the files and then from x import * variant in the __init__.py. I think that it also helps sphinx to know what docs should be built and doesn't get confused by other symbols being imported from elsewhere.


def test_bad_config(self):
with self.assertRaises(ValidationError):
Butler({"cls": "lsst.daf.butler.remote_butler.RemoteButler", "remote_butler": {"url": "!"}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the if at the end as we do for the other tests so that in theory python tests/test_remote_butler.py does work. Sometimes it's useful to be able to run outside of pytest if that's possible.

from lsst.daf.butler.tests.utils import MetricTestRepo, makeTestTempDir, removeTestTempDir

TESTDIR = os.path.abspath(os.path.dirname(__file__))


def _make_remote_butler(http_client):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not how a normal user is going to create a RemoteButler. They are going to say:

butler = Butler("https://test.example")

and it is meant to work automatically. It's meant to automatically look for the /butler.json endpoint,

It's this code in the prototype server:

app.get("/butler/butler.json", response_model=dict[str, Any])
 def read_server_config() -> Mapping:
     """Return the butler configuration that the client should use."""
     config_str = f"""
 datastore:
     root: {BUTLER_ROOT}
 registry:
     cls: lsst.daf.butler.registries.remote.RemoteRegistry
     db: <butlerRoot>
 """
     config = Config.fromString(config_str, format="yaml")
     return config.toDict()

This would clearly need to change to return back the cls. There is a <butlerRoot> convention in the configs that allows the config to say "use what ever URI you found this butler config at" versus a hard-coded server name. So if you go to https://192.168.1.1/butler.json and the server is at https://192.168.1.1 that all works if you set the root field to <butlerRoot> and if you don't like root we can call it server_url or something. Alternatively the butler.json could return a hard-coded server URL that is distinct from where the config came from.

In the test code below that was deleted you will see that it downloads the Config, then injects the httpx client into the config such that the Butler constructor will get the Client object for the server URI rather than having to make it itself from a URI.

We can leave it like this for now but it needs to be fixed to work the standard way soon. Remember that an end user doesn't have to know whether they ended up with a DirectButler or RemoteButler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention in Phalanx/Safir for REST APIs they don't allow external access to things at the top level of the HTTP path -- that's reserved for internal endpoints like Kubernetes health checks. The externally accessible parts are expected to be nested under "/butler/" or something. Not sure if we can override that or not, I can find out.

As it stands you can point the constructor to a config file on disk or an HTTP server and get back a remote butler instance the same way as you would with a direct butler. Whether it makes sense to serve that file from the REST server itself I'm not sure. If it does it may be more like "http://butler.lsst.cloud/configs/dp02" rather than just the root URL.

I kind of think that any configuration chunks that must be provided by the server should come from a versioned endpoint instead of a static file. So if we are keeping the convention from DirectButler that you can load a configuration file from an http server, we might want to do like "butler://butler.lsst.cloud" to mean "create a remote butler pointing at butler.lsst.cloud with all client-side options set to defaults"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped this file also on my branch that has been just merged. Hopefully git will figure it out when you rebase.

Add a no-op RemoteButler that can be instantiated via the Butler() constructor.
Extract dimensions and some httpx setup logic from RemoteRegistry and
move them to RemoteButler.  Delete RemoteRegistry and tests associated
with it.
Instead of manually setting a global variable to configure the test
server's Butler, use FastAPI's dependency injection framework.
@dhirving dhirving merged commit e89626a into main Oct 26, 2023
16 checks passed
@dhirving dhirving deleted the tickets/DM-41162 branch October 26, 2023 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants