Proposal: Add relationship count API #1860
Comments
My concerns with this idea are two-fold:
If the intention is to only have this used during the initial hydration and subsequently for a small window, I believe a service that consumes the Watch API and keeps a real-time count (bucketed by whatever filters are useful) might be a better solution. It could be an open source solution as well, configurable from the command line to take in filters and keep a real-time count, and, using our ability to go backwards in time with Watch, able to "pick up" where it left off if the watcher goes down.
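As a rough illustration of the watcher idea in the comment above, here is a minimal in-memory sketch. It does not call the real SpiceDB client: the `Update` record, the `WatchCounter` class, and the integer `revision` checkpoint are hypothetical stand-ins for Watch API updates and ZedTokens. It tracks relationship keys in sets rather than bare integers, on the assumption that a TOUCH may rewrite a relationship that already exists and must not be counted twice.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    """Hypothetical stand-in for one relationship update seen on a watch stream."""
    revision: int        # stand-in for the ZedToken of the change
    operation: str       # "TOUCH" (create or upsert) or "DELETE"
    resource_type: str
    resource_id: str
    relation: str
    subject: str


@dataclass
class WatchCounter:
    """Keeps per-bucket counts, bucketed here by (resource_type, relation)."""
    last_revision: int = 0
    _seen: dict = field(default_factory=lambda: defaultdict(set))

    def apply(self, update: Update) -> None:
        bucket = (update.resource_type, update.relation)
        key = (update.resource_id, update.subject)
        if update.operation == "DELETE":
            self._seen[bucket].discard(key)
        else:
            # Set membership makes a repeated TOUCH idempotent.
            self._seen[bucket].add(key)
        # Checkpoint: after a crash, resume watching from this revision.
        self.last_revision = max(self.last_revision, update.revision)

    def count(self, resource_type: str, relation: str) -> int:
        return len(self._seen.get((resource_type, relation), ()))


def demo() -> WatchCounter:
    counter = WatchCounter()
    stream = [
        Update(1, "TOUCH", "document", "d1", "viewer", "user:alice"),
        Update(1, "TOUCH", "document", "d2", "viewer", "user:bob"),
        Update(2, "TOUCH", "document", "d1", "viewer", "user:alice"),  # re-touch
        Update(3, "DELETE", "document", "d2", "viewer", "user:bob"),
    ]
    for update in stream:
        counter.apply(update)
    return counter
```

Because adds and removes are idempotent, replaying the stream in order from an older checkpoint converges to the same counts, which is what makes the "pick up" behaviour workable. The trade-off is memory: this sketch holds every counted key.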
Thanks for the response, @josephschorr. I'll do my best to answer your questions:
This is a legitimate concern, but I do think the existing indexes will help here quite a bit. I ran multiple different
As a note there are a total Furthermore, it would be acceptable in my mind to execute these queries with If this really does become a problem where people are abusing this API I think adding some form of rate limiting is reasonable and is probably something that should be considered for APIs in general at some point in time.
It depends on the scenario. If it is a systemic issue, then a bulk export on a subset of the data is likely required to investigate the root cause. If it is a one-off issue, then dehydration/rehydration of the data is likely the simplest fix.
So it occurred to us that the #1785 issue might be an avenue to solve this problem:
Codifying this into a proposal:
CountRelationships Proposal
Add the ability to track and retrieve the count of relationships based on registered filters in SpiceDB. This will be used both by external users for data validation purposes and, eventually, by SpiceDB itself for use in the eventual query planner (see #1785 and #1573).
Proposal
SpiceDB will add the following APIs:
// RegisterRelationshipCounter registers a new filter for counting relationships. A filter must be registered before
// a count can be requested.
rpc RegisterRelationshipCounter(RegisterRelationshipCounterRequest) returns (RegisterRelationshipCounterResponse);
// CountRelationships returns the count of relationships for a *pre-registered* filter.
rpc CountRelationships(CountRelationshipsRequest) returns (CountRelationshipsResponse);
// UnregisterRelationshipCounter unregisters a previously registered filter for counting relationships.
rpc UnregisterRelationshipCounter(UnregisterRelationshipCounterRequest) returns (UnregisterRelationshipCounterResponse);
message RegisterRelationshipCounterRequest {
// relationship_filter defines the filter to be applied to the relationships
// to be counted.
RelationshipFilter relationship_filter = 1
[ (validate.rules).message.required = true ];
}
message RegisterRelationshipCounterResponse {}
message CountRelationshipsRequest {
Consistency consistency = 1;
// relationship_filter defines the filter to be applied to the relationships
// to be counted.
RelationshipFilter relationship_filter = 2
[ (validate.rules).message.required = true ];
}
message CountRelationshipsResponse {
// read_at is the ZedToken at which the relationship count was performed
ZedToken read_at = 1 [ (validate.rules).message.required = true ];
uint64 relationship_count = 2;
}
message UnregisterRelationshipCounterRequest {
RelationshipFilter relationship_filter = 1
[ (validate.rules).message.required = true ];
}
message UnregisterRelationshipCounterResponse {}
Workflow and Implementation
The workflow will consist of registering one (or more) relationship filters, which will be stored in the datastore. For the first-pass, naive implementation, the For the second-pass, proper implementation, a distinct service will be added to SpiceDB to use the watch API to keep the counts up-to-date in-memory, with
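The register / count / unregister workflow proposed above can be modelled in a few lines. This is only an in-memory sketch of the first-pass, naive behaviour, not SpiceDB's implementation: `CounterRegistry` and the tuple shape of a relationship are hypothetical, the `RelationshipFilter` here models only a resource type and an optional relation, and raising `LookupError` for an unregistered filter is an assumption about how the "must be registered first" rule would surface.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Hypothetical relationship shape: (resource_type, resource_id, relation, subject)
Relationship = Tuple[str, str, str, str]


@dataclass(frozen=True)
class RelationshipFilter:
    """Simplified stand-in for the proposal's RelationshipFilter message."""
    resource_type: str
    optional_relation: Optional[str] = None

    def matches(self, rel: Relationship) -> bool:
        resource_type, _resource_id, relation, _subject = rel
        if resource_type != self.resource_type:
            return False
        return self.optional_relation in (None, relation)


class CounterRegistry:
    """Models RegisterRelationshipCounter / CountRelationships / Unregister."""

    def __init__(self) -> None:
        self._registered = set()

    def register(self, flt: RelationshipFilter) -> None:
        self._registered.add(flt)

    def unregister(self, flt: RelationshipFilter) -> None:
        self._registered.discard(flt)

    def count(self, flt: RelationshipFilter, store: Iterable[Relationship]) -> int:
        if flt not in self._registered:
            # Assumption: counting with an unregistered filter is an error.
            raise LookupError("filter must be registered before counting")
        # Naive first pass: scan everything on every call, like a count(*).
        return sum(1 for rel in store if flt.matches(rel))


STORE = [
    ("document", "d1", "viewer", "user:alice"),
    ("document", "d1", "editor", "user:bob"),
    ("document", "d2", "viewer", "user:carol"),
    ("folder", "f1", "viewer", "user:alice"),
]
```

Requiring registration up front is what would later let a background service maintain the counts incrementally, since it would know in advance which filters to track.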
Instead of a distinct service, we could use the DB to coordinate a leader election among SpiceDB replicas and choose one to hold the watch / do the counts.
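One common way to coordinate this kind of election through a database is a lease row with an expiry. The sketch below uses SQLite from the Python standard library purely for illustration; the `lease` table, the `try_acquire` helper, and the explicit `now`/`ttl` arguments are all hypothetical, and nothing here reflects how SpiceDB would actually do it. A production datastore would also need row locking or a conditional update to make the check-and-set atomic across replicas.

```python
import sqlite3


def open_lease_store() -> sqlite3.Connection:
    """Create an in-memory database holding a single named lease."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lease (name TEXT PRIMARY KEY, holder TEXT, expires_at REAL)"
    )
    return conn


def try_acquire(conn: sqlite3.Connection, replica_id: str, now: float, ttl: float) -> bool:
    """Return True if replica_id holds (or renews) the 'counter' lease."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT holder, expires_at FROM lease WHERE name = 'counter'"
        ).fetchone()
        if row is None:
            # Nobody has ever held the lease: take it.
            conn.execute(
                "INSERT INTO lease (name, holder, expires_at) VALUES ('counter', ?, ?)",
                (replica_id, now + ttl),
            )
            return True
        holder, expires_at = row
        if holder == replica_id or expires_at <= now:
            # Renew our own lease, or take over an expired one.
            conn.execute(
                "UPDATE lease SET holder = ?, expires_at = ? WHERE name = 'counter'",
                (replica_id, now + ttl),
            )
            return True
        return False
```

The replica that holds the lease would run the watch and maintain the counts; the others would retry periodically and take over once the lease expires.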
Thanks for putting together this official proposal, @josephschorr! Do you anticipate having any kind of guardrails in terms of the number of relationship counters that can be registered at a given time? Also, the RelationshipFilter allows the user to specify an
Yes, there will likely be a (configurable) limit.
Yes, but fortunately it should be an index scan in most cases, and once there is a real counting service, it will be handled in memory.
This supports registration, unregistration and counting, but does not yet use a service, which means *all* CountRelationships calls will invoke a `count(*)` on the underlying datastore, which could be slow. Fixes authzed#1860
Closing since this is now merged. A followup with a service for counting will be added if need be. Fixes #1901
See below for the current proposal
Problem Statement
We are migrating our data from a legacy authorization system into SpiceDB and would like a way to easily verify the relationship counts in order to validate the accuracy of our data. Specifically, we would like to be able to do this in the following scenarios:
We can currently solve scenario 1 above by subscribing to the Watch API while performing a hydration and maintaining the counts internally. For scenario 2 we can do a filtered ReadRelationships call for each relationship, but this requires retrieving more data than necessary from the database, since we only care about the counts in the end. The ReadRelationships API also has the limitation of requiring us to make the calls in chunks of 1000, which forces us to process things serially.
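To make the cost of counting by paging concrete, here is a small self-contained comparison using SQLite from the Python standard library. It is only a sketch: the `relation_tuple` columns are a simplified, hypothetical schema rather than SpiceDB's real one, and `count_by_paging` imitates serial, cursor-based reads of 1000 rows instead of calling ReadRelationships.

```python
import sqlite3

PAGE_SIZE = 1000  # mirrors the chunk size mentioned above


def make_db(n: int) -> sqlite3.Connection:
    """Build an in-memory table of n relationships, half viewers, half editors."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE relation_tuple "
        "(namespace TEXT, object_id TEXT, relation TEXT, subject TEXT)"
    )
    conn.executemany(
        "INSERT INTO relation_tuple VALUES (?, ?, ?, ?)",
        (
            ("document", f"d{i}", "viewer" if i % 2 == 0 else "editor", f"user:{i}")
            for i in range(n)
        ),
    )
    return conn


def count_by_paging(conn: sqlite3.Connection, namespace: str, relation: str):
    """Count by fetching rows page by page; each page depends on the previous cursor."""
    total, pages, cursor = 0, 0, 0
    while True:
        rows = conn.execute(
            "SELECT rowid FROM relation_tuple "
            "WHERE namespace = ? AND relation = ? AND rowid > ? "
            "ORDER BY rowid LIMIT ?",
            (namespace, relation, cursor, PAGE_SIZE),
        ).fetchall()
        if not rows:
            break
        total += len(rows)
        pages += 1
        cursor = rows[-1][0]  # the next page starts after the last row seen
    return total, pages


def count_in_db(conn: sqlite3.Connection, namespace: str, relation: str) -> int:
    """Count with a single aggregate query; only one number leaves the database."""
    return conn.execute(
        "SELECT count(*) FROM relation_tuple WHERE namespace = ? AND relation = ?",
        (namespace, relation),
    ).fetchone()[0]
```

With 5000 rows, the paging version needs three sequential round trips that return data plus a final empty one to count 2500 viewers, while the aggregate version needs a single query.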
In both cases, having a relationships count API that simply performs a `SELECT count(*) from relation_tuple WHERE ...` query against the database would be much more efficient and user friendly.
Solution Brainstorm
Implement a `CountRelationships` API with the following spec: