Support embeddings workflows #36

Open
alexanderatallah opened this issue Apr 18, 2023 · 1 comment

Comments

@alexanderatallah
Owner

alexanderatallah commented Apr 18, 2023

Support workflows involving embeddings and vector databases. Need to discuss:

  • whether the storage of embeddings should be left to the developer
  • if not, ephemeral vs persistent embeddings use cases
  • the most expensive "tinkering" use cases that require vector databases and could be made low/zero cost for devs
  • cases where multiple apps are computing embeddings on the same data
  • costs of switching between embedding models (recomputing embeddings)

alexanderatallah converted this from a draft issue Apr 18, 2023

@handrew
Collaborator

handrew commented Apr 20, 2023

Working on a demo of embeddings w/ Window, so I have a few thoughts here.

  1. Whether storage should be left to the developer:
  • I think it should be fairly easy for Window to stay neutral here and let the developer store embeddings wherever they want on the backend.
  • For instance, the demo I'm working on saves embeddings to local disk. It's easy to imagine saving them to a vector store or a plain old SQL db in a hosted service instead, and letting the developer handle CRUD operations, caching, etc.
  • However, it would be nice if Window also had some way to store embeddings on the client side :). Either in the browser's local storage, or even just in memory. To this end, a few tools (Huggingface.js and Chroma come to mind) could be used in Window “out of the box”.
  2. Ephemeral vs. persistent embeddings:
  • I don't have a strong view yet on which use cases would prefer one or the other, but per the above I think it could be fairly simple to support both in-memory / ephemeral and locally-stored / managed embeddings. Could be a potential "Infura"-like offering too?
  3. Most expensive "tinkering":
  • My prior is that OpenAI's embeddings are so cheap that the "zero cost" value prop might not be felt as acutely. Also, I've seen some vector dbs (e.g., Chroma) use free, off-the-shelf HuggingFace embedding models, so those might be an option for getting started.
  4. Multiple apps computing embeddings on the same data:
  • I'm not quite sure what you mean here. Could you give an example?
  5. Costs of switching:
  • This will of course depend on the provider. Unless I'm misunderstanding, I'd probably leave this up to the developer.
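
To make the storage-neutral idea in points 1 and 2 a bit more concrete, here is a rough sketch in Python. It is not Window's API: `EmbeddingStore`, `MemoryStore`, `JsonFileStore`, `cache_key`, and `embed_cached` are all hypothetical names, and `embed_fn` stands in for whatever provider call the developer actually uses. The only point is that the same caching code could run against an ephemeral or a persistent backend.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Protocol

Vector = List[float]


def cache_key(model: str, text: str) -> str:
    """Key by model name plus a hash of the text (hypothetical scheme)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{model}:{digest}"


class EmbeddingStore(Protocol):
    """Minimal interface a backend would need to provide (assumed, not Window's)."""

    def get(self, key: str) -> Optional[Vector]: ...
    def put(self, key: str, vector: Vector) -> None: ...


class MemoryStore:
    """Ephemeral backend: vectors are lost when the process exits."""

    def __init__(self) -> None:
        self._data: Dict[str, Vector] = {}

    def get(self, key: str) -> Optional[Vector]:
        return self._data.get(key)

    def put(self, key: str, vector: Vector) -> None:
        self._data[key] = vector


class JsonFileStore:
    """Persistent backend: vectors are kept in a JSON file on local disk."""

    def __init__(self, path: Path) -> None:
        self._path = path
        # Load any vectors saved by a previous run.
        self._data: Dict[str, Vector] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def get(self, key: str) -> Optional[Vector]:
        return self._data.get(key)

    def put(self, key: str, vector: Vector) -> None:
        self._data[key] = vector
        # Rewrite the whole file on every put; fine for a demo, not for scale.
        self._path.write_text(json.dumps(self._data))


def embed_cached(
    store: EmbeddingStore,
    model: str,
    text: str,
    embed_fn: Callable[[str], Vector],
) -> Vector:
    """Return a stored vector if present; otherwise compute and store it."""
    key = cache_key(model, text)
    vector = store.get(key)
    if vector is None:
        vector = embed_fn(text)  # the only place the provider is called
        store.put(key, vector)
    return vector
```

Because the key includes the model name, switching models in this sketch means every text is recomputed on first use, which is the switching cost from point 5. Identical text under the same model maps to the same key, which may be one reading of point 4, though that is a guess.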
