
Auth: Fix missing snapshots and backups from storage pool used-by URLs #14324

Draft · wants to merge 16 commits into main from used-by-bug

Conversation

@markylaing (Contributor) commented Oct 22, 2024

The underlying cause of this bug was that the general filtering of used-by URLs assumes the can_view entitlement is available for all entity types. That is a fair assumption, but it wasn't true for instance or storage volume backups and snapshots.

To fix this, four new entity types have been added to the authorization model:

  • instance_backup
  • instance_snapshot
  • storage_volume_backup
  • storage_volume_snapshot

Each has associated entitlements:

  • can_edit
  • can_view
  • can_delete

It is still not possible to grant these entitlements via the API. Instead, they are granted via can_manage_snapshots or can_manage_backups on the associated instance or storage volume.
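As a sketch, the inheritance described above could be expressed in OpenFGA's modeling DSL roughly as follows. The type and entitlement names come from the PR description; the exact model in LXD's source may differ:

```
type instance_snapshot
  relations
    define instance: [instance]
    define can_view: can_manage_snapshots from instance
    define can_edit: can_manage_snapshots from instance
    define can_delete: can_manage_snapshots from instance
```

Because the three entitlements are defined only via `can_manage_snapshots from instance`, there is no direct assignment to identities or groups, matching the restriction above.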

The OpenFGADatastore implementation has been updated to handle instance and storage_volume relations between the parent and its snapshots/backups.

  • Updates OpenFGADatastore comments: they previously stated that an instance is not a relation, but after this PR it is.

Closes #14291

@markylaing markylaing added the Bug Confirmed to be a bug label Oct 22, 2024
@markylaing markylaing added this to the lxd-6.2 milestone Oct 22, 2024
@markylaing markylaing self-assigned this Oct 22, 2024
@github-actions github-actions bot added the Documentation Documentation needs updating label Oct 22, 2024

Heads up @mionaalex - the "Documentation" label was applied to this issue.

@markylaing (Contributor Author)

CC @mas-who @edlerd

@tomponline (Member)

tests are sad

@markylaing (Contributor Author) commented Oct 23, 2024

@tomponline tests are mostly green except for one: https://github.com/canonical/lxd/actions/runs/11463430112/job/31913802942#step:12:52804

I'm not certain why this is failing as it doesn't seem to have anything to do with this PR. It is potentially related to #14315 since lxc profile assign calls PUT /1.0/instances/{name} which does some Profile.ToAPI work. I also don't understand why it only failed with the dir storage backend.

Edit: Note that this also doesn't fail locally. I'll have to get a tmate session running.

@tomponline (Member) commented Oct 23, 2024

> I'm not certain why this is failing as it doesn't seem to have anything to do with this PR. It is potentially related to #14315 since lxc profile assign calls PUT /1.0/instances/{name} which does some Profile.ToAPI work. I also don't understand why it only failed with the dir storage backend.

@hamistao please can you check this out, thanks

Seems like a panic.

@markylaing (Contributor Author)

@tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic.

@markylaing (Contributor Author)

> @tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic.

I've been investigating this for an hour or so with no progress. It would be very useful to surface panics in the test logs. I'm trying to figure out a way to do this.

@tomponline (Member)

> @tomponline @hamistao The CI passed on the third attempt. I'll investigate a bit more though as I don't want to introduce any races, especially when they may be causing a panic.
>
> I've been investigating this for an hour or so with no progress. It would be very useful to surface panics in the test logs. I'm trying to figure out a way to do this.

Did you identify which commit introduced it yet?

Did you try reverting the earlier profiles PR?

@markylaing (Contributor Author)

> Did you identify which commit introduced it yet?
>
> Did you try reverting the earlier profiles PR?

With it being intermittent, I didn't think reverting the profiles PR would tell me very much (i.e. I'd need to figure out where the panic is occurring in either case). I've added a commit to check LXD logs for panics. It's failing on standalone tests but not in the cluster tests, which is a bit odd. Still investigating.

@markylaing (Contributor Author)

I've re-run the test 8 times now and the panic only occurred on the first two runs. I've added a PR to handle panics a bit more cleanly in the future (#14346). If it happens again it should be obvious where it occurred.

@markylaing (Contributor Author)

Of course it fails again as soon as I move the panic checker work into another PR 🤦

@markylaing markylaing marked this pull request as draft October 25, 2024 12:26
@markylaing markylaing force-pushed the used-by-bug branch 5 times, most recently from 320408e to 6b8b335 Compare October 31, 2024 15:02
@markylaing markylaing force-pushed the used-by-bug branch 2 times, most recently from a0a21d4 to ef70dd8 Compare November 6, 2024 14:36
Adds instance and storage volume snapshots and backups to the OpenFGA
model. These entitlements cannot be assigned to identities, service
accounts, or group members. Instead they are inherited from the parent
instance or volume.

Signed-off-by: Mark Laing <[email protected]>
The auth.ValidateEntitlement function validates all entitlements that
can be granted via the API. Since the new entitlements on snapshots and
backups cannot be granted via the API, this check fails.

The OpenFGA server will return an error if an invalid query is performed
based on its own understanding of the authorization model.

Signed-off-by: Mark Laing <[email protected]>
Previously the only entities that had inherited relations were project and
server. Now that we are linking instances and storage volumes to their
snapshots and backups, the OpenFGADatastore implementation needs to handle
these relations.

On Read, we can connect a snapshot or backup to its parent instance or
storage volume using the information stored in its URL. For example, the
storage volume backup URL:

/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1

is related to its parent:

/1.0/storage-pools/default/volumes/custom/vol1?project=project1

via the `storage_volume` relation.

Signed-off-by: Mark Laing <[email protected]>
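The parent derivation described in this commit can be sketched as follows. This is an illustrative helper, not the actual LXD implementation; it simply strips the trailing `/backups/<name>` or `/snapshots/<name>` segment while preserving the query string:

```go
package main

import (
	"fmt"
	"strings"
)

// parentURL derives the parent entity URL from a snapshot or backup URL by
// stripping the final "/backups/<name>" or "/snapshots/<name>" path segment
// while preserving the query string (e.g. the project).
func parentURL(u string) string {
	path, query, hasQuery := strings.Cut(u, "?")
	for _, seg := range []string{"/backups/", "/snapshots/"} {
		if i := strings.LastIndex(path, seg); i >= 0 {
			path = path[:i]
			break
		}
	}
	if hasQuery {
		return path + "?" + query
	}
	return path
}

func main() {
	fmt.Println(parentURL("/1.0/storage-pools/default/volumes/custom/vol1/backups/backup1?project=project1"))
}
```

Running this prints the parent volume URL from the example above, `/1.0/storage-pools/default/volumes/custom/vol1?project=project1`.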
…tartingWithUser.

Previously the only entities that had inherited relations were project and
server. Now that we are linking instances and storage volumes to their
snapshots and backups, the OpenFGADatastore implementation needs to handle
these relations.

On ReadStartingWithUser, the function needs to return all backups or snapshots that
are related to a parent instance or storage volume. This is used in the `ListObjects`
call to the OpenFGA server, which is used by `(auth.Authorizer).GetPermissionChecker`.

To do this, I have naively queried for all snapshots or backups in the project, and
filtered out those that don't have the correct parent. This keeps the implementation
simple and makes use of `GetEntityURLs`, which performs as few queries as possible.
Further optimisation may be needed.

Signed-off-by: Mark Laing <[email protected]>
We can now use the `can_view`, `can_edit`, and `can_delete` entitlements
with instance backups and snapshots. We should do this so that our checks
more accurately reflect the authorization model.

Signed-off-by: Mark Laing <[email protected]>
The access handler was performing some logic to determine
the location of the storage volume for use in the access check.
This was based on whether the storage pool is remote, and if not,
the cluster member where the volume is located.

This commit removes that logic and adds a "location" field to
`storageVolumeDetails` so that it can be used in the handlers.
The logic for determining the location is modified to suit the call
site. It is only set when the pool is not remote.

Signed-off-by: Mark Laing <[email protected]>
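The rule this commit describes can be sketched as below. The struct and function names here are hypothetical, not LXD's actual internals; the point is that the location is only populated when the pool is not remote, since volumes on remote pools are reachable from any cluster member:

```go
package main

import "fmt"

// storageVolumeDetails is a sketch of the details struct described above
// (field names hypothetical).
type storageVolumeDetails struct {
	volumeName string
	location   string // empty when the pool is remote
}

// newDetails records the cluster member as the volume's location only when
// the pool is not remote.
func newDetails(volumeName, memberName string, poolIsRemote bool) storageVolumeDetails {
	d := storageVolumeDetails{volumeName: volumeName}
	if !poolIsRemote {
		d.location = memberName
	}
	return d
}

func main() {
	fmt.Printf("%+v\n", newDetails("vol1", "node1", false))
}
```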
The storage volume snapshot and backup access handlers need to share
almost identical logic with the storage volume access handler, including
getting the storage pool and determining whether the storage volume is
located on another cluster member.

This commit parameterises the function so that it can be used by the
snapshot and backup entity types as well, creating and checking against
the correct URL when called.

Signed-off-by: Mark Laing <[email protected]>
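The "correct URL per entity type" part of this parameterisation can be sketched as follows. Function and parameter names are illustrative, not LXD's actual code; the idea is that the entity type the handler was parameterised with selects which URL the entitlement is checked against:

```go
package main

import "fmt"

// entityURL builds the URL the entitlement is checked against, based on the
// entity type the access handler was parameterised with. For the plain
// storage_volume type, sub is unused and the volume URL itself is returned.
func entityURL(entityType, pool, volType, vol, project, sub string) string {
	base := fmt.Sprintf("/1.0/storage-pools/%s/volumes/%s/%s", pool, volType, vol)
	switch entityType {
	case "storage_volume_snapshot":
		base += "/snapshots/" + sub
	case "storage_volume_backup":
		base += "/backups/" + sub
	}
	return base + "?project=" + project
}

func main() {
	fmt.Println(entityURL("storage_volume_backup", "default", "custom", "vol1", "project1", "backup1"))
}
```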
We can now check `can_view`, `can_edit`, and `can_delete` against
the backup/snapshot itself. We should do so to more accurately reflect
the authorization model.

Signed-off-by: Mark Laing <[email protected]>
@markylaing markylaing force-pushed the used-by-bug branch 6 times, most recently from 97046fa to 9b67811 Compare November 8, 2024 11:42
@markylaing (Contributor Author)

Update on this. I've set up a tmate session 3 times and in each case I:

  • Ran the standalone suite up to the intermittently failing test (config_profiles)
  • Ran the config_profiles test in a loop 60 times without tearing down the test harness (i.e. keeping the test environment). To do this I had to edit the test slightly, but only to clean things up (deleting leftover profiles and directories)

It didn't fail once. I have also run the full suite ~15 times over the last week and haven't seen the failure again.

At this point I'm pretty baffled 🤷

@markylaing (Contributor Author)

I've just spotted another failure in another PR (#14434) that might be related: https://github.com/canonical/lxd/actions/runs/11777602677/job/32802823358#step:12:38935

@tomponline (Member)

> I've just spotted another failure in another PR (#14434) that might be related: https://github.com/canonical/lxd/actions/runs/11777602677/job/32802823358#step:12:38935

are your test fails always happening on ceph too?

@markylaing (Contributor Author)

> I've just spotted another failure in another PR (#14434) that might be related: https://github.com/canonical/lxd/actions/runs/11777602677/job/32802823358#step:12:38935
>
> are your test fails always happening on ceph too?

No it was failing with the dir backend.


Successfully merging this pull request may close these issues.

Snapshots missing in used_by for custom volumes and storage pools on latest/edge LXD build