Auth: Fix missing snapshots and backups from storage pool used-by URLs #14324
Heads up @mionaalex - the "Documentation" label was applied to this issue.
tests are sad
a51ac04 to 86d9a32
@tomponline tests are mostly green except for one: https://github.com/canonical/lxd/actions/runs/11463430112/job/31913802942#step:12:52804 I'm not certain why this is failing, as it doesn't seem to have anything to do with this PR. It is potentially related to #14315 since Edit: Note that this also doesn't fail locally. I'll have to get a tmate session running.
@hamistao please can you check this out, thanks. Seems like a panic.
@tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though, as I don't want to introduce any races, especially when they may be causing a panic.
I've been investigating this for an hour or so with no progress. It would be very useful to surface panics in the test logs. I'm trying to figure out a way to do this.
Did you identify which commit introduced it yet? Did you try reverting the earlier profiles PR?
With it being intermittent, I didn't think reverting the profiles PR would tell me very much (e.g. I'll need to figure out where that panic is occurring in either case). I've added a commit to check LXD logs for panics. It's failing on standalone tests but not in the cluster tests, which is a bit odd. Still investigating.
4b56433 to bea0552
I've re-run the test 8 times now and the panic only occurred on the first two runs. I've added a PR to handle panics a bit more cleanly in the future (#14346). If it happens again it should be obvious where it occurred.
Of course it fails again as soon as I move the panic checker work into another PR 🤦
320408e to 6b8b335
a0a21d4 to ef70dd8
Signed-off-by: Mark Laing <[email protected]>
Adds instance and storage volume snapshots and backups to the OpenFGA model. These entitlements cannot be assigned to identities, service accounts, or group members. Instead they are inherited from the parent instance or volume. Signed-off-by: Mark Laing <[email protected]>
…d backups. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
The auth.ValidateEntitlement function validates all entitlements that can be granted via the API. Since the new entitlements on snapshots and backups cannot be granted via the API, this check fails. The OpenFGA server will return an error if an invalid query is performed based on its own understanding of the authorization model. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
Previously the only entities that had inherited relations were project and server. Now that we are linking instances and storage volumes to their snapshots and backups, the OpenFGADatastore implementation needs to handle these relations. On Read, we can connect a snapshot or backup to its parent instance or storage volume using the information stored in its URL. For example, the storage volume backup URL `/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1` is related to its parent `/1.0/storage-pools/default/volumes/custom/vol1?project=project1` via the `storage_volume` relation. Signed-off-by: Mark Laing <[email protected]>
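To illustrate the URL relationship described in this commit message, here is a minimal Go sketch of deriving a parent URL from a backup or snapshot URL. It is not the code from the PR: `parentURL` is a hypothetical helper, and it assumes the child URL always ends in `/backups/<name>` or `/snapshots/<name>` with the project carried in the query string.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parentURL is a hypothetical helper (not part of LXD). It assumes the child
// URL path ends in ".../<backups|snapshots>/<name>" and returns the URL of the
// parent instance or storage volume, keeping the query string (the project).
func parentURL(child string) (string, error) {
	u, err := url.Parse(child)
	if err != nil {
		return "", err
	}

	segments := strings.Split(u.Path, "/")
	if len(segments) < 3 {
		return "", fmt.Errorf("URL %q is too short to have a parent", child)
	}

	kind := segments[len(segments)-2]
	if kind != "backups" && kind != "snapshots" {
		return "", fmt.Errorf("URL %q is not a backup or snapshot URL", child)
	}

	// Drop the trailing "<kind>/<name>" pair to get the parent path.
	u.Path = strings.Join(segments[:len(segments)-2], "/")
	u.RawPath = ""

	return u.String(), nil
}

func main() {
	parent, err := parentURL("/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(parent)
}
```

A datastore Read could then report a relation between the child and the derived parent, rather than storing that relation separately.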
…tartingWithUser. Previously the only entities that had inherited relations were project and server. Now that we are linking instances and storage volumes to their snapshots and backups, the OpenFGADatastore implementation needs to handle these relations. On ReadStartingWithUser, the function needs to return all backups or snapshots that are related to a parent instance or storage volume. This is used in the `ListObjects` call to the OpenFGA server, which is used by `(auth.Authorizer).GetPermissionChecker`. To do this, I have naively queried for all snapshots or backups in the project, and filtered out those that don't have the correct parent. This keeps the implementation simple and makes use of `GetEntityURLs`, which performs as few queries as possible. Further optimisation may be needed. Signed-off-by: Mark Laing <[email protected]>
We can now use the `can_view`, `can_edit`, and `can_delete` entitlements with instance backups and snapshots. We should do this so that our checks more accurately reflect the authorization model. Signed-off-by: Mark Laing <[email protected]>
The access handler was performing some logic to determine the location of the storage volume for use in the access check. This was based on whether the storage pool is remote, and if not, the cluster member where the volume is located. This commit removes that logic and adds a "location" field to `storageVolumeDetails` so that it can be used in the handlers. The logic for determining the location is modified to suit the call site. It is only set when the pool is not remote. Signed-off-by: Mark Laing <[email protected]>
The storage volume snapshot and backup access handlers need to share almost identical logic with the storage volume access handler, including getting the storage pool, understanding if the storage volume is located on another cluster member, and so forth. This commit parameterises the function so that it can be used by the snapshot and backup entity types as well, creating and checking against the correct URL when called. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
We can now check `can_view`, `can_edit`, and `can_delete` against the backup/snapshot itself. We should do so to more accurately reflect the authorization model. Signed-off-by: Mark Laing <[email protected]>
Signed-off-by: Mark Laing <[email protected]>
97046fa to 9b67811
Update on this. I've set up a tmate session 3 times and in each case I:
It didn't fail once. I have also run the full suite ~15 times over the last week and haven't seen the failure again. At this point I'm pretty baffled 🤷
I've just spotted another failure in another PR (#14434) that might be related: https://github.com/canonical/lxd/actions/runs/11777602677/job/32802823358#step:12:38935
are your test fails always happening on ceph too?
No, it was failing with the
Signed-off-by: Mark Laing <[email protected]>
The underlying cause of this bug was that general filtering of used-by URLs assumes that the `can_view` entitlement is available for all entity types. It is a fair assumption, but wasn't true for storage volume or instance backups or snapshots.

To fix this, four new entity types have been added to the authorization model:

- `instance_backup`
- `instance_snapshot`
- `storage_volume_backup`
- `storage_volume_snapshot`

Each has associated entitlements:

- `can_edit`
- `can_view`
- `can_delete`

It is still not possible to grant these entitlements via the API. Instead, they are granted via `can_manage_snapshots` or `can_manage_backups` on the associated instance or storage volume.

The OpenFGADatastore implementation has been updated to handle `instance` and `storage_volume` relations between the parent and its snapshots/backups.

Closes #14291
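As a rough illustration of the inheritance described above, the following Go sketch maps an entitlement check on a snapshot or backup to the entitlement that would be needed on its parent. It is not the OpenFGA model from the PR: `requiredParentEntitlement` and both lookup tables are hypothetical, and the pairing of snapshot types with `can_manage_snapshots` and backup types with `can_manage_backups` is an assumption drawn from the description.

```go
package main

import "fmt"

// parentEntitlement is an assumed mapping from each new entity type to the
// entitlement on the parent instance or storage volume that grants access.
var parentEntitlement = map[string]string{
	"instance_snapshot":       "can_manage_snapshots",
	"instance_backup":         "can_manage_backups",
	"storage_volume_snapshot": "can_manage_snapshots",
	"storage_volume_backup":   "can_manage_backups",
}

// inheritedEntitlements lists the entitlements the description says exist on
// the new entity types.
var inheritedEntitlements = map[string]bool{
	"can_view":   true,
	"can_edit":   true,
	"can_delete": true,
}

// requiredParentEntitlement is a hypothetical helper. It returns the parent
// entitlement that would imply the given entitlement on a snapshot or backup.
func requiredParentEntitlement(entityType string, entitlement string) (string, error) {
	parent, ok := parentEntitlement[entityType]
	if !ok {
		return "", fmt.Errorf("entity type %q has no inherited entitlements", entityType)
	}

	if !inheritedEntitlements[entitlement] {
		return "", fmt.Errorf("entitlement %q is not defined for %q", entitlement, entityType)
	}

	return parent, nil
}

func main() {
	parent, err := requiredParentEntitlement("storage_volume_backup", "can_view")
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Println(parent)
}
```

In the real model this resolution is performed by the OpenFGA server from the authorization model, so nothing like this lookup needs to exist in LXD itself.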