KVStore: Add the capability to list namespaces #3387

Open
tnull opened this issue Oct 28, 2024 · 2 comments

@tnull
Contributor

tnull commented Oct 28, 2024

We previously discussed this in the original KVStore PR, but ended up not including it there:

We should add the capability to list primary and secondary namespaces to the KVStore trait. This would allow users to explore all stored keys based on trait methods only, which would be a prerequisite for writing generalized migration logic between KVStore implementations.
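A minimal sketch of what such generalized migration logic could look like, assuming the trait gains two listing methods. `NamespacedStore`, `MemStore`, `list_primary_namespaces`, `list_secondary_namespaces`, and `migrate` are hypothetical names used for illustration only; the trait below is a simplified stand-in, not the actual KVStore definition.

```rust
use std::collections::BTreeMap;
use std::io;

/// Simplified stand-in for a KVStore-like trait (hypothetical).
/// The two `list_*_namespaces` methods are what this issue proposes to add.
pub trait NamespacedStore {
    fn read(&self, primary: &str, secondary: &str, key: &str) -> io::Result<Vec<u8>>;
    fn write(&mut self, primary: &str, secondary: &str, key: &str, buf: &[u8]) -> io::Result<()>;
    fn list(&self, primary: &str, secondary: &str) -> io::Result<Vec<String>>;
    fn list_primary_namespaces(&self) -> io::Result<Vec<String>>;
    fn list_secondary_namespaces(&self, primary: &str) -> io::Result<Vec<String>>;
}

/// Toy in-memory backend keyed by (primary, secondary, key).
#[derive(Default)]
pub struct MemStore {
    entries: BTreeMap<(String, String, String), Vec<u8>>,
}

impl NamespacedStore for MemStore {
    fn read(&self, primary: &str, secondary: &str, key: &str) -> io::Result<Vec<u8>> {
        let k = (primary.to_string(), secondary.to_string(), key.to_string());
        self.entries
            .get(&k)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "key not found"))
    }

    fn write(&mut self, primary: &str, secondary: &str, key: &str, buf: &[u8]) -> io::Result<()> {
        let k = (primary.to_string(), secondary.to_string(), key.to_string());
        self.entries.insert(k, buf.to_vec());
        Ok(())
    }

    fn list(&self, primary: &str, secondary: &str) -> io::Result<Vec<String>> {
        Ok(self
            .entries
            .keys()
            .filter(|(p, s, _)| p.as_str() == primary && s.as_str() == secondary)
            .map(|(_, _, k)| k.clone())
            .collect())
    }

    fn list_primary_namespaces(&self) -> io::Result<Vec<String>> {
        // Keys are sorted, so equal primaries are adjacent and `dedup` suffices.
        let mut out: Vec<String> = self.entries.keys().map(|(p, _, _)| p.clone()).collect();
        out.dedup();
        Ok(out)
    }

    fn list_secondary_namespaces(&self, primary: &str) -> io::Result<Vec<String>> {
        let mut out: Vec<String> = self
            .entries
            .keys()
            .filter(|(p, _, _)| p.as_str() == primary)
            .map(|(_, s, _)| s.clone())
            .collect();
        out.dedup();
        Ok(out)
    }
}

/// Copies every entry from `src` to `dst` using trait methods only.
/// Returns the number of entries copied.
pub fn migrate<S: NamespacedStore, D: NamespacedStore>(src: &S, dst: &mut D) -> io::Result<usize> {
    let mut copied = 0;
    for primary in src.list_primary_namespaces()? {
        for secondary in src.list_secondary_namespaces(&primary)? {
            for key in src.list(&primary, &secondary)? {
                let value = src.read(&primary, &secondary, &key)?;
                dst.write(&primary, &secondary, &key, &value)?;
                copied += 1;
            }
        }
    }
    Ok(copied)
}
```

The point of the sketch is that `migrate` never needs to know which namespaces exist ahead of time; it discovers them through the store itself.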

Tagging this 0.1 as a user indicated a need for such migration logic.

@tnull tnull added this to the 0.1 milestone Oct 28, 2024
@G8XSU
Contributor

G8XSU commented Oct 29, 2024

Problem with fn list_namespaces()

I don’t fully agree with the approach of listing namespaces in the KVStore API.

An application must be aware of the namespaces it is using and should avoid any dynamic namespaces. (An analogy would be an application requesting list_tables from a SQL db.)

Problem with fn list_secondary_namespaces()

Depending on how a client has architected its storage, this could be a non-backward-compatible change or could be inefficient.

Consider a client whose KV store uses a hash+range layout: partition_key = namespace, sort_key = secondary_namespace+key.

  • Requesting the list of secondary namespaces might require listing all the keys, depending on the implementation.
  • In DynamoDB there is no way to list all partition_keys afaik; you can only list sort_keys within a partition_key.

Consider a client with a composite key, namespace+secondary_namespace+key, and support for prefix search.

  • Requesting the list of namespaces or secondary namespaces might require listing all the keys, depending on the implementation.
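To make the cost concern concrete, here is a rough sketch of the hash+range layout described above, modeled in memory. `HashRangeTable`, its methods, and the `/` separator inside the sort key are assumptions made for illustration; they do not correspond to any real backend's API.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Toy hash+range table: partition key -> (sort key -> value).
/// The partition key is the namespace; the sort key is
/// "<secondary_namespace>/<key>" (the '/' separator is an assumption).
/// Deliberately, there is no method to enumerate partitions.
#[derive(Default)]
pub struct HashRangeTable {
    partitions: HashMap<String, BTreeMap<String, Vec<u8>>>,
}

impl HashRangeTable {
    pub fn put(&mut self, namespace: &str, secondary: &str, key: &str, value: &[u8]) {
        let sort_key = format!("{}/{}", secondary, key);
        self.partitions
            .entry(namespace.to_string())
            .or_default()
            .insert(sort_key, value.to_vec());
    }

    /// Returns the distinct secondary namespaces in one partition together
    /// with the number of sort keys that had to be visited to find them.
    pub fn list_secondary_namespaces(&self, namespace: &str) -> (BTreeSet<String>, usize) {
        let mut found = BTreeSet::new();
        let mut scanned = 0;
        if let Some(partition) = self.partitions.get(namespace) {
            for sort_key in partition.keys() {
                scanned += 1;
                if let Some((secondary, _key)) = sort_key.split_once('/') {
                    found.insert(secondary.to_string());
                }
            }
        }
        (found, scanned)
    }
}
```

In this model, producing two secondary namespaces out of five stored entries still visits all five sort keys, which is the inefficiency being pointed out.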

Proposal

Since the purpose of listing namespaces or secondary namespaces is ultimately to list keys (for migration etc.), we should ask for the keys directly and extend the list API to support this.

By this I mean: either change fn list(&self, primary_namespace: &str, secondary_namespace: &str) to make secondary_namespace optional,

or introduce fn list(&self, primary_namespace: &str) -> Vec<secondary_namespace, key>

  • Don't ask for list<namespaces>; this solves the DynamoDB case where we can't list partition_keys.
  • Solves the case where listing secondary_namespaces requires listing all keys (in both the hash+range and composite-key cases).
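One possible reading of this proposal, sketched under assumptions: a single `list` that takes an optional secondary namespace and returns (secondary_namespace, key) pairs. `PairListingStore` and the exact signature are hypothetical and only meant to make the two variants above easier to compare.

```rust
use std::collections::BTreeMap;

/// Toy in-memory store keyed by (primary, secondary, key). Hypothetical.
#[derive(Default)]
pub struct PairListingStore {
    entries: BTreeMap<(String, String, String), Vec<u8>>,
}

impl PairListingStore {
    pub fn write(&mut self, primary: &str, secondary: &str, key: &str, buf: &[u8]) {
        let k = (primary.to_string(), secondary.to_string(), key.to_string());
        self.entries.insert(k, buf.to_vec());
    }

    /// With `Some(secondary)`, lists only the keys in that secondary namespace.
    /// With `None`, lists every (secondary_namespace, key) pair under `primary`,
    /// so the caller never has to enumerate secondary namespaces separately.
    pub fn list(&self, primary: &str, secondary: Option<&str>) -> Vec<(String, String)> {
        self.entries
            .keys()
            .filter(|(p, s, _)| {
                p.as_str() == primary && secondary.map_or(true, |want| s.as_str() == want)
            })
            .map(|(_, s, k)| (s.clone(), k.clone()))
            .collect()
    }
}
```

Note that this still requires the caller to know the primary namespaces up front, which is consistent with the first bullet above.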

In short, we might have complicated this by going for two levels of namespaces instead of supporting prefix listing of keys.

@tnull
Contributor Author

tnull commented Oct 30, 2024

> An application must be aware of the namespaces it is using

Potentially, yes, but our migration logic might not be aware of all the namespaces.

> and should avoid any dynamic namespaces.

Huh? That can't be a requirement, as the namespaces have been explicitly introduced to allow for dynamic namespaces in MonitorUpdatingPersister?

> Depending on how client has architected their storage, this could be a non-backward compatible change or could be inefficient.

I guess at this point we have a pretty clear picture of the currently deployed KVStore implementations. Do any of them stand out as unable to do this?

> Requesting list of secondary namespaces might require them to list all the keys depending on implementation.

Yes, this is exactly what we try to do here, no?

> In dynamo-db there is no way to list all partition_keys afaik, you can only list sort_keys within a partition_key

Right, that means that the implementation would need to iterate over all partitions? The same goes for a file store backend, for example? I'm not sure I'm following what the problem is exactly?

> By this i mean, either change fn list(&self, primary_namespace: &str, secondary_namespace: &str) to make secondary_namespace optional.

What do you mean by "make secondary_namespace optional"?

> or introduce fn list(&self, primary_namespace: &str) -> Vec<secondary_namespace, key>
>
> * Don't ask for `list<namespaces>`, this solves the dynamo-db case where we can't list partition_keys.
>
> * Solves the case where listing secondary_namespaces require them to list all keys. (both in hash+range & composite key case.)

I think you're losing me here, what exactly is your proposal and what is it solving?

> In short we might have complicated this by not supporting prefix listing of keys and going for 2 levels of namespaces.

How would prefix matching be any more efficient here? I think all (?) the most common backends would still need to do this by iterating over all keys and filtering?
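For reference, the two strategies in question can be compared on a toy ordered map. This is only a sketch: `list_by_filter` and `list_by_range` are hypothetical helpers, the `/`-joined composite key is an assumption, and whether a given real backend offers an ordered index at all is exactly the open question here.

```rust
use std::collections::BTreeMap;

/// Full scan: visit every key and keep those that start with `prefix`.
/// Returns the matches and the number of keys visited.
pub fn list_by_filter(map: &BTreeMap<String, Vec<u8>>, prefix: &str) -> (Vec<String>, usize) {
    let mut visited = 0;
    let mut out = Vec::new();
    for k in map.keys() {
        visited += 1;
        if k.starts_with(prefix) {
            out.push(k.clone());
        }
    }
    (out, visited)
}

/// Range scan: seek to `prefix` in sorted order and stop at the first key
/// that no longer matches. Only possible when the backend keeps keys ordered.
pub fn list_by_range(map: &BTreeMap<String, Vec<u8>>, prefix: &str) -> (Vec<String>, usize) {
    let mut visited = 0;
    let mut out = Vec::new();
    for (k, _) in map.range(prefix.to_string()..) {
        visited += 1;
        if !k.starts_with(prefix) {
            break;
        }
        out.push(k.clone());
    }
    (out, visited)
}
```

Both functions return the same keys; they differ only in how many entries are touched, and the range variant has no equivalent on a backend without ordered keys.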
