-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-40053: Add Datastore records to DatasetRef and use them in butler.get #875
Conversation
@timj, @TallJimbo, here is the current state of my branch, if you want to check it. Not so much code changes, but there were a lot of experiments along the way. Any comments are welcome! |
I finally had a chance to look at this and given,
then this looks like a reasonable approach. We do have to decide whether it's only ephemeral datastores that don't have to return records. Does SasquatchDatastore have to return empty records or only do that if |
I'm not familiar with SasquatchDatastore, but I can imagine that if it does not need datastore records stored in registry then it can return |
4c9f3eb
to
5eb7f94
Compare
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #875 +/- ##
==========================================
- Coverage 87.73% 87.58% -0.15%
==========================================
Files 271 271
Lines 36364 36506 +142
Branches 7573 7610 +37
==========================================
+ Hits 31904 31974 +70
- Misses 3278 3347 +69
- Partials 1182 1185 +3
☔ View full report in Codecov by Sentry. |
a5a9732
to
47c9238
Compare
|
||
__slots__ = {"table_spec", "record_class"} | ||
table_spec: ddl.TableSpec | ||
record_class: type[StoredDatastoreItemInfo] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really a minor side issue, but I'd prefer not to assume anywhere that opaque tables are just for datastores, and I'd be great to cut down on the number of conversions (SQLAlchemy row <-> StoredDatastoreItemInfo
<-> pydantic/json mapping).
So, could we put this OpaqueTableDefinition
struct in registry.interfaces.opaque
?
And could we make record_class
just "some pydantic model type"? Or maybe even just use bare dict
here? Dynamic class types aren't getting us any static-analysis advantages, and runtime checking is probably a waste of cycles (except at the JSON boundary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be OK to move OpaqueTableDefinition
to registry.interfaces
. I was worried that this may cause core
to be (logically) dependent on registry
, but we can do it as mypy-only dependency, as we do for a couple of other classes already. We do need then to replace StoredDatastoreItemInfo
with registry-side type (or possibly use dict
), I need to look at the code and figure out a reasonable option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea of completely abandoning typing and using dict for arbitrary datastore records. The StoredDatastoreItemInfo
dataclass definition is seemingly very close to defining the database schema directly through introspection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have typed datastore records, but it makes life a bit harder for registry. But it's probably unavoidable.
I think this is broadly consistent with what I think I'd have put in DMTN-249 if I hadn't been overzealous about optimizing the back-and-forth in |
47c9238
to
a653f7b
Compare
87a138c
to
e10be0f
Compare
6a3b2e9
to
1c5ee3e
Compare
163e135
to
b0643eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a lot of little comments and questions but I have no big-picture concerns; it's all a step in the right direction.
FWIW, your put_new
does not match the one I have been prototyping with transaction support over on DMTN-249 anymore, and I think it'll probably end up changing again when we do transactions and/or client-server put
, but my prototyping doesn't include ChainedDatastore
so I'm not confident it will work as-is either.
f2d0608
to
d7d3769
Compare
There are few changes in Registry and Datastore classes that reduce use of the bridge and opaque tables by datastores: - New method `Datastore.get_opaque_table_definitions` returns opaque table definitoons which are used by Registry method `make_datastore_tables` to create those tables. - `Registry.findDataset` has an option to retrieve datastore records and attach them to returned ref. - `Datastore.get` can use the datastore records from the ref instead of using opaque tables. Note that `Datastore.get` can still work with refs that don't have datastore records, this is mostly to keep the unit tests running, a future commit will remove that option and update all unit tests.
Adds `Datastore.put_new`, we want to keep old `Datastore.put` to keep unit tests running (no unit tests for `put_new()` yet). `put_new()` returns DatasetRef with datastore records, and new Registry method is called by Butler to store those records in database. Better approach would be for Registry to store datastore records in `insertDatasets`/`_importDatasets` but that needs a bit of code re-arrangement, will do that later.
This is to make it more similar to the `OpaqueTableValues` in DMTN-249 prototype. The `DatasetDatastoreRecords` type alias is similar to `OpaqueTableBatch` from prototype. `StoredDatastoreItemInfo` is always used with its corresponding `DatasetRef` or `DatasetId`, so keeping dataset_id in each record is wasteful.
This is only to suppress errors in the tests.
Old name was too generic, in reality this is very datastore-specific. I do not think we need generic structure of the same kind, at least not right now.
Jim suggested that we use new `OpaqueRecordSet` class to keep datastore records, but after discussion we figured it would requre lots of additional structure to support it. For now there is no clear need for that class and we can continue using `StoredDatastoreItemInfo` in DatasetRefs, but want to make records private to give us freedom to change it later.
Datastore records in DatasetRef can become invalidated in some cases, this is mostly true for our unit tests that do crazy things to test consistency. This patch adds a flag parameter to some methods that tells them to ignore ref-supplied datastore records.
After trying to re-implement datastore unit tests I realized that the `Datastore.put_new` and `Registry.store_datastore_records` combination (used to implement `Butler.put`) is not very compatible with transaction rollback system that we now have in place. So I decided to keep `Butler.put` unchanged for now until we do something more drastic with transactions. I keep `Datastore.put_new` and `Registry.store_datastore_records` implementations, they are not used but may be useful in the future.
Co-authored-by: Jim Bosch <[email protected]>
d7d3769
to
2a1b9c1
Compare
Checklist
doc/changes