Problem Statement
In order to even attempt cost-based query planning, SpiceDB needs to gather some metrics about the relationships that it is trying to traverse. This proposal discusses in more detail the potential stats module alluded to in #1573.
Solution Brainstorm
It has been a while since I filed #1573, but I have been noodling on the stats module problem, and I wanted to capture some of my thoughts to see what other ideas they might spark from other folks.
There are a few high-level problems here that I'll try to ideate on:
Stats to track
HLL storage
Syncing
Deletes
Stats to track
This one is short. The stats module should be flexible enough to track additional metrics over time, but the clearest need is to track relationship cardinality. As such, the stats module's interface should expose relationship cardinality estimates. Tracking cardinality is touched on in #1573, but I think the clearest path is to use a probabilistic structure like HyperLogLog (HLL) to estimate cardinality in a memory-efficient way.
Whenever a relationship is created, an HLL for each end of that relationship would be updated (more on deletes later).
Additional stats may be integrated as needed.
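To make the HLL idea concrete, here is a minimal Python sketch of a HyperLogLog with add, merge, and estimate, plus a hypothetical `record_relationship` helper that updates one sketch per end of a newly created relationship. The keying scheme (one sketch of distinct resource IDs and one of distinct subject IDs per `(resource_type, relation, subject_type)`) is an assumption for illustration, not something this proposal settles; the precision `p=10` and the BLAKE2b hash are likewise arbitrary choices.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Minimal HyperLogLog sketch for illustration (not tuned for production)."""

    def __init__(self, p: int = 10):
        self.p = p                      # precision: 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Stable 64-bit hash (BLAKE2b), so sketches agree across processes.
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        width = 64 - self.p
        idx = h >> width                # top p bits choose the register
        rest = h & ((1 << width) - 1)   # remaining 64 - p bits
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # Sketch of the union: element-wise max of the registers.
        if other.p != self.p:
            raise ValueError("cannot merge sketches with different precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def record_relationship(stats, resource_type, resource_id, relation, subject_type, subject_id):
    """Hypothetical helper: update one sketch per end of a newly created relationship."""
    key = (resource_type, relation, subject_type)
    stats[(key, "resource")].add(resource_id)   # distinct resources on this edge type
    stats[(key, "subject")].add(subject_id)     # distinct subjects on this edge type


stats = defaultdict(HyperLogLog)  # in-memory sketches, created lazily per key
```

Re-adding an existing relationship leaves every register unchanged, so duplicate writes do not inflate the estimates; each sketch costs one register per slot regardless of how many relationships it has seen.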
HLL storage
This one might be tricky. I see a few options which I'll try to outline exhaustively:
❌ Store HLLs in a DB-native way
Most DBs don't support HLLs natively, and that is certainly true of the storage backends supported by SpiceDB. Some may require an extension to be installed, and others might not have a native option at all. Even if extensions existed for every backend, requiring users to install and upgrade special DB extensions is likely an unacceptable experience for SpiceDB customers.
❌ Stats-specific storage
A dedicated storage option (like Redis) could be pulled in for storing stats. While Redis has fantastic support for HLLs, I personally do not like this approach: it complicates the SpiceDB deployment story and comes with some persistence concerns. Adding yet another piece to deploy and maintain is likely unacceptable, so I won't spend more time on it.
(Yes?) App-level HLLs stored as byte arrays
Byte array support is reasonably table stakes for a DB, so this requirement should be easy to meet: CockroachDB, Spanner, Postgres, and MySQL all support binary columns. In this world, SpiceDB is responsible for operating on the HLLs and serializing them out as bytes within the DB. This adds complexity within SpiceDB itself, but is likely worth the tradeoff for a simpler deploy/customer experience.
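A sketch of what serializing an HLL to a byte array might look like, assuming the registers are kept as a list of small integers and a 64-bit hash is used (so each register holds at most 64 - p + 1 and fits in one byte). The layout (a version byte, a precision byte, then one byte per register) and the function names are hypothetical; a versioned header is simply a cheap way to leave room for changing the encoding later.

```python
import struct

FORMAT_VERSION = 1  # hypothetical encoding version


def serialize_registers(p: int, registers: list) -> bytes:
    """Encode HLL registers as: version byte, precision byte, one byte per register."""
    if len(registers) != 1 << p:
        raise ValueError("expected 2**p registers")
    # With a 64-bit hash a register holds at most 64 - p + 1, so each fits in a byte;
    # bytes() raises ValueError if a value were ever out of range.
    return struct.pack("BB", FORMAT_VERSION, p) + bytes(registers)


def deserialize_registers(blob: bytes) -> tuple:
    """Decode a blob produced by serialize_registers into (p, registers)."""
    version, p = struct.unpack_from("BB", blob)
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported HLL encoding version {version}")
    registers = list(blob[2:])
    if len(registers) != 1 << p:
        raise ValueError("corrupt HLL blob: wrong register count")
    return p, registers
```

At p = 10 this is 1026 bytes per sketch, which any of the listed backends can store in an ordinary binary column.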
Syncing
This section assumes app-level HLLs.
These stats are heuristics by nature, so it should be okay for the data to be a little stale. It may make sense to have a goroutine running on the side that receives events over a channel and updates some in-memory HLLs as they arrive. Separately, on some cadence (every X minutes, every X events, etc.), the in-memory HLLs should be merged into the HLLs stored in the database. Since we lack DB-native HLL operations, this sync requires a read-modify-write pattern: the stored HLLs need to be read, deserialized, merged with the in-memory HLLs, and serialized back out to the DB. The sync cadence has to strike a balance: infrequent enough not to cause DB lock contention, yet frequent enough that the stats don't become uselessly out of date.
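The buffer-then-merge flow can be sketched as follows. This is a single-threaded Python stand-in for the goroutine-plus-channel design: `on_event` plays the role of receiving from the channel, and `flush` does the read-modify-write. `StatsSyncer`, `observe`, `flush_every`, and the dict standing in for the DB table are all hypothetical; registers are stored raw (one byte each), so deserializing is just reading the bytes. Merging is an element-wise max, which is what lets in-memory and stored sketches be combined in any order.

```python
import hashlib

P = 10        # hypothetical precision: 2**P one-byte registers per sketch
M = 1 << P


def observe(registers: bytearray, item: str) -> None:
    """Fold one item into a raw HLL register array (standard HLL add)."""
    h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
    width = 64 - P
    idx, rest = h >> width, h & ((1 << width) - 1)
    registers[idx] = max(registers[idx], width - rest.bit_length() + 1)


class StatsSyncer:
    """Buffers sketch updates in memory and periodically merges them into a store.

    `store` is a dict standing in for a DB table of key -> register bytes. In a
    real DB the read and write below would need a transaction (or compare-and-swap)
    so that two concurrent flushers cannot overwrite each other's merges.
    """

    def __init__(self, store: dict, flush_every: int = 1000):
        self.store = store
        self.flush_every = flush_every      # cadence measured in events
        self.pending = {}                   # key -> in-memory bytearray registers
        self.events_since_flush = 0

    def on_event(self, key: str, item: str) -> None:
        regs = self.pending.get(key)
        if regs is None:
            regs = self.pending[key] = bytearray(M)
        observe(regs, item)
        self.events_since_flush += 1
        if self.events_since_flush >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        for key, in_memory in self.pending.items():
            stored = self.store.get(key, bytes(M))                        # read
            merged = bytes(max(a, b) for a, b in zip(stored, in_memory))  # merge
            self.store[key] = merged                                      # write
        self.pending.clear()
        self.events_since_flush = 0
```

Because the max-merge is idempotent, re-flushing items the store has already absorbed is harmless; the cadence knob (`flush_every` here, or a timer) only trades staleness against write load.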
Deletes
Deletes get tricky and can throw off stats when performed in significant volume. Due to the probabilistic nature of HLLs, you can't really "delete" an element from one. Separate HLLs could be used to track deletes, but the result is still ultimately a guess with potential to go wrong. This isn't unique to SpiceDB, as all DB stats modules have edge cases where stats tracking may go awry, but it's worth calling out, being aware of, and having some sort of plan for.
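Why the "separate HLLs for deletes" approach stays a guess can be illustrated with a small hypothetical `AddDeleteCounter`. To isolate the structural problem from estimation error, Python sets stand in for the two HLLs (a set answers the same distinct-count question, just exactly). Even with exact counts, an element that is deleted and then re-created is counted once in each sketch and cancels out.

```python
class AddDeleteCounter:
    """Hypothetical live-count estimate from two distinct-count sketches.

    Python sets stand in for HLLs so there is no estimation error at all;
    the miscount shown below is purely structural.
    """

    def __init__(self):
        self.added = set()     # would be an HLL of created items
        self.deleted = set()   # would be an HLL of deleted items

    def on_create(self, item: str) -> None:
        self.added.add(item)

    def on_delete(self, item: str) -> None:
        self.deleted.add(item)

    def estimate(self) -> int:
        # Distinct creates minus distinct deletes, floored at zero.
        return max(0, len(self.added) - len(self.deleted))


counter = AddDeleteCounter()
counter.on_create("user:1")
counter.on_create("user:2")
counter.on_delete("user:1")
counter.on_create("user:1")   # user:1 comes back
# Two subjects are live again, but each sketch saw user:1 once, so they cancel:
live_estimate = counter.estimate()  # 1, not 2
```

With real HLLs the error compounds: each sketch's error scales with its own size, so after a large delete the difference of two large, slightly-off estimates can be far off relative to the small true remainder.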
Other Thoughts?
Is it worth considering some equivalent of the ANALYZE command to allow SpiceDB users to force their stats to be updated after major operations? This could be useful in scenarios where large deletes have occurred and thrown off stats.
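One way an ANALYZE-style operation could work is to discard the existing sketches and rebuild them from a full scan of the relationships that currently exist, which sidesteps the delete problem at the cost of a full scan. A minimal sketch: `analyze`, the relationship tuple shape, and the `make_sketch` factory are hypothetical, and sketches are keyed by `(resource_type, relation, subject_type)` with one per relationship end (an assumed keying scheme).

```python
from collections import defaultdict
from typing import Callable, Iterable, Tuple

# Hypothetical relationship shape:
# (resource_type, resource_id, relation, subject_type, subject_id)
Relationship = Tuple[str, str, str, str, str]


def analyze(relationships: Iterable[Relationship], make_sketch: Callable[[], object]) -> dict:
    """Rebuild every cardinality sketch from scratch via a full scan.

    `make_sketch` returns an empty sketch with an `add` method (e.g. an HLL).
    Only currently-existing relationships are scanned, so the rebuilt
    sketches carry no residue from past deletes.
    """
    fresh = defaultdict(make_sketch)
    for resource_type, resource_id, relation, subject_type, subject_id in relationships:
        key = (resource_type, relation, subject_type)
        fresh[(key, "resource")].add(resource_id)
        fresh[(key, "subject")].add(subject_id)
    return dict(fresh)  # caller would serialize these and swap them in for the old ones
```

Any object with an `add` method works as the sketch; passing `set` gives exact distinct sets, which is handy for checking the keying logic.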
Copying over some great context from @josephschorr in the Discord
could be a process which uses the watch API and updates in the background and out of band
That would prevent multiple writers
Could even be sharded
And if it provided its own grpc API, SpiceDB could call that to get the stats in-memory
Likely a great way to prototype
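If an out-of-band stats process consumes the Watch API and is sharded, each sketch key needs a single owning shard so there is still only one writer per sketch. A minimal routing sketch, assuming updates arrive as plain dicts (hypothetical stand-ins, not the actual Watch API message types) and that sketches are keyed by resource type and relation. It uses a stable hash (BLAKE2b) rather than Python's built-in `hash()`, which is randomized per process for strings and would move keys between shards on restart.

```python
import hashlib


def shard_for(resource_type: str, relation: str, num_shards: int) -> int:
    """Pick the owning shard for a stats key; stable across processes and restarts."""
    digest = hashlib.blake2b(f"{resource_type}#{relation}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_shards


def route_updates(updates, num_shards: int) -> dict:
    """Group watch updates by owning shard.

    `updates` are hypothetical dicts such as
    {"operation": "CREATE", "resource_type": "document", "relation": "viewer", ...}.
    """
    batches = {i: [] for i in range(num_shards)}
    for u in updates:
        batches[shard_for(u["resource_type"], u["relation"], num_shards)].append(u)
    return batches
```

Routing every update for a given key to the same shard keeps the single-writer property per sketch, so each shard can do its read-modify-write without coordinating with the others.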