Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OPFS support #1490

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add OPFS support #1490

wants to merge 2 commits into from

Conversation

dengkunli
Copy link
Contributor

This PR adds OPFS (Origin Private File System) support for duckdb-wasm. With OPFS support, we could persist the database in the browser 🔥

API

To remain simple and intuitive, no new api is added, just pass a path starting with opfs:// to db.open and we'll know we're using OPFS and finish all the preparation work needed:

await db.open({
    path: 'opfs://test.db',
    accessMode: duckdb.DuckDBAccessMode.READ_WRITE,
});
const conn = await db.connect();

List of Changes

  • Add OPFS Support
  • Improve Debuggability: use -g option to enable all DWARF debugging capabilities (before this, we could breakpoint but debug symbol information are missing)
  • Fix the coi worker build: the duckdb-browser-coi.pthread.worker.ts override the global onmessage handler but forgot to set startWorker in the load cmd (when missing, the coi worker will not start properly)
  • Update examples/esbuild-browser: add opfs example & support serving coi build
  • Add tests

@carlopi carlopi self-requested a review November 17, 2023 11:34
@carlopi
Copy link
Collaborator

carlopi commented Nov 17, 2023

Thanks a lot for this PR! I will have to go properly through it and do a review, but looks a very cool addition

@carlopi
Copy link
Collaborator

carlopi commented Nov 20, 2023

This PR seems to me addresses 2 separate features, OPFS and support for threads in the example.

I think it's clearer to have them land separately, I did a partial cherry-pick of this PR in #1501.

OPFS part I would need to more closely review and experiment with, but I will get to it in the next days.

@dengkunli
Copy link
Contributor Author

Thanks a lot! I just did a rebase to resolve the conflict in examples/esbuild-browser/package.json after the partial cherry-pick

@dengkunli
Copy link
Contributor Author

@carlopi Hi, checking in on the progress here, any updates? or any assistance i can offer?

@Alex-Monahan
Copy link

Howdy @dengkunli! We are pretty excited about this capability - thank you! One quick question - does this also enable you to query a file that is located in OPFS? Could this enable a user to drop in a .csv file for example, and then query that file with DuckDB Wasm in a way that persists across browser sessions? If so, mind adding a test to show that sweet feature working?

@carlopi
Copy link
Collaborator

carlopi commented Jan 2, 2024

Hi @dengkunli, sorry for the delay, work was undergoing in other areas of the code, and this has not gotten yet the required attention. I will send a proper review in the next few days, but this is very interesting to have in duckdb-wasm.

@dengkunli
Copy link
Contributor Author

@Alex-Monahan of course! I've pushed a commit adding tests including:

  1. export csv to OPFS
  2. import csv from OPFS
  3. import parquet from OPFS using read_parquet function (works as well without read_parquet function)

@carlopi Thanks a lot for taking the time to review !

@binwangcn
Copy link

does this PR merged in release version?
any update?

@charlespickettau
Copy link

Is there any update on getting this merged/released? We are currently bringing in our own version of the package which is not ideal.

I understand you are open source but if the PR is ready then can we get an ETA on merging it please?

@lpfy
Copy link

lpfy commented Mar 25, 2024

Is there any update on getting this merged/released? We are currently bringing in our own version of the package which is not ideal.

I understand you are open source but if the PR is ready then can we get an ETA on merging it please?

Hi, I am also interested to know the ETA. Thanks

@Alex-Monahan
Copy link

Alex-Monahan commented Apr 3, 2024

I am experimenting with this PR, and tried to ATTACH a database stored within the OPFS (note, this database doesn't exist yet - I want to create it as a part of the attach operation). It couldn't find a registered file handle. Any ideas on what needs to be added to support attaches?
Ex: ATTACH 'opfs://my_db_to_attach.duckdb' AS woot

@carlopi
Copy link
Collaborator

carlopi commented Apr 3, 2024

I am experimenting with this PR, and tried to ATTACH a database stored within the OPFS (note, this database doesn't exist yet - I want to create it as a part of the attach operation). It couldn't find a registered file handle. Any ideas on what needs to be added to support attaches? Ex: ATTACH 'opfs://my_db_to_attach.duckdb' AS woot

There have been very recent rework of the file system in mainline duckdb, see duckdb/duckdb#11297 or duckdb/duckdb#11343.
Part of the motivation for those changes is more powerful support for attaching remote databases in duckdb.
Those changes will be releases as duckdb v0.10.2.

Those changes of the interface are not compatible with current duckdb-wasm, that will need to be slightly changed to support those new APIs. I do not know, but it's possible that with the newer interface more stuff will work out of the box, or possible some different methods needs to be supported.

@ravwojdyla
Copy link
Contributor

ravwojdyla commented Apr 4, 2024

First and foremost: awesome work/library, thank you so much!! On the topic of this PR: would love to use this feature asap, even if it is released as experimental and/or there are upcoming changes that may (or may not) streamline this use-case. A bird in the hand is worth two in the bush? It seems like there's high demand for this feature based on:

🙏

@hangxingliu
Copy link

Hi @Alex-Monahan . @ravwojdyla and @carlopi

Actually we have maintained a DuckDB WASM with OPFS support since last year October. It seems like that it supports the future changes @carlopi mentioned.
And we used it for our app few months. You can build and use it if you want, there are scripts in these two repos for quickly building and you can build it by yourself to exclude our custom extension. (datadocs/duckdb-wasm@7c494e7...5bf5ab4)

And our forked version support opening db, attaching db, reading/writing files from OPFS. and it is similar to this PR:

import { createOPFSFileHandle } from '@datadocs/duckdb-wasm';
...
const dirHandle = await navigator.storage.getDirectory();
const dbHandle = await createOPFSFileHandle(`attach.db`, dirHandle, {
  create: true,
  emptyAsAbsent: true,
});
const walHandle = await createOPFSFileHandle(`attach.db.wal`, dirHandle, {
  create: true,
  emptyAsAbsent: true,
});
...
await duckdbConnection.query('ATTACH 'attach.db');

The repositories at here:

I originally intended to create these support as a PR into DuckDB repository. However, after several months app testing, I found that this PR already exists.

@dengkunli
Copy link
Contributor Author

Those changes of the interface are not compatible with current duckdb-wasm, that will need to be slightly changed to support those new APIs. I do not know, but it's possible that with the newer interface more stuff will work out of the box, or possible some different methods needs to be supported.

@carlopi Got it, thanks for the info, I'll provide an update accordingly

@hamilton
Copy link

hamilton commented May 2, 2024

Hi @dengkunli! Do you need any help on getting this over the finish line?

@dengkunli
Copy link
Contributor Author

Hi @dengkunli! Do you need any help on getting this over the finish line?

yes actually, following recent rework of the file system in mainline duckdb, i've work out the mvp & eh build, but not the coi build (cross-origin isolation build that uses pthreads). In coi mode, an error is frequently thrown (not always, but very likely) in pthread worker: the error occurs when js invokes duckdb_web_fs_get_file_info_by_id and parse the result string as json, when parsing, instead of receiving a valid json, it receives a malformed json whose first character is not { and instead is \u0000, example: \u0000"cacheEpoch":3,"fileId":1,"fileName":"opfs://test.db","fileSize":12288.0,"dataProtocol":3.0,"dataUrl":"opfs://test.db","collectStatistics":false}

i've spent some time digging in, but could not find the cause, any idea what kind of error it may be?

Thanks a lot @hamilton @carlopi

@HiveTechDev
Copy link

HiveTechDev commented Jul 12, 2024

@carlopi have you approved this merge or should we work from the branch?

@Alex-Monahan
Copy link

Based on this article, it may be best to have a separate OPFS DB per web worker and not share it scross browser tabs:
https://www.notion.so/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite

@derekperkins
Copy link

Based on this article, it may be best to have a separate OPFS DB per web worker and not share it scross browser tabs: https://www.notion.so/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite

That would be a huge mistake IMO, for the same reason they didn't do that in the article. Multiple tabs should be able to share the same backing dataset, without duckdb-wasm enforcing a per tab restriction. That being said, while it might be nice for this package to handle concurrency internally, I think it would be preferable to ship something first, with the caveat being that users have to manage concurrency / race conditions themselves.

@warrenbhw
Copy link

any updates on this PR? What's required to land this?

@warrenbhw
Copy link

@dengkunli @derekperkins is there any reason that this isn't yet merged? Happy to pitch in if there's any additional work needed to land this in the next couple weeks.

@warrenbhw
Copy link

gentle bump @carlopi

@derekperkins
Copy link

If @hangxingliu has a working implementation, and this still requires significant reworking with the v1 filesystem changes, should we look at just upstreaming that?

@hangxingliu
Copy link

If @hangxingliu has a working implementation, and this still requires significant reworking with the v1 filesystem changes, should we look at just upstreaming that?

Sorry @derekperkins , I just read all changes between our forked version and the latest code in this repo, and confirmed that our implementation is also outdated with the latest DuckDB's version. But the good news is that the migration is not very difficult and we will update our code recently because our app also needs to use the latest version of DuckDB as one of frontend storage solution. But I am not sure does DuckDB official team has any plan to implement this feature. We would use official solution if they have.

@derekperkins
Copy link

@hangxingliu thanks for the update. If this PR isn't updated by the time you update your fork, you should submit a separate PR. Maybe we can take the best ideas from both.

@warrenbhw
Copy link

@derekperkins upstreaming the changes made by @hangxingliu would be ideal. I'm very interested in having some kind of built-in persistence story, and I think there have been a few others threads on GH about this - would make duckdb-wasm much more powerful!

@warrenbhw
Copy link

warrenbhw commented Sep 10, 2024

gentle bump again @hangxingliu and @derekperkins :)

Wondering if anyone is planning on getting these changes upstreamed. Would be a big win!

I don't yet have the duckdb expertise to do this without spending some time digging in, so it would be lovely if someone who's more familiar is planning to merge the changes

Apologies for all of the comments on the thread. Just would be good to understand if there are plans to make this happen, or if I should assume that it's not on anyone's near-term roadmap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.