First draft for SheetReader extension #1

Open: wants to merge 1 commit into base branch main


@freddie-freeloader (Owner) commented Oct 3, 2024

Hi!

Last semester, I took part in a programming project organized by the DIMA group at TU Berlin. We created a small DuckDB extension named sheetreader that uses sheetreader-core (a fast, multi-threaded XLSX parser) to import XLSX files into DuckDB.

We ran a few benchmarks comparing our extension to the import function provided by the spatial extension (st_read). Our initial benchmarks indicate that, depending on several factors, the sheetreader extension is around 5 to 10 times faster than the spatial extension at parsing XLSX files and loading them into DuckDB (https://github.com/polydbms/sheetreader-duckdb/?tab=readme-ov-file#benchmarks).

We would like to offer this extension as a DuckDB community extension.

A note regarding the repository structure of our extension:

  • The branch benchmark-version contains a version with additional code dedicated to benchmarking.
  • The branch main provides a “slimmed-down” version with that code removed. This is the version we would like to offer as the community extension.

@freddie-freeloader force-pushed the add-sheetreader-extension branch 3 times, most recently from 4d7e446 to 7070d94 (October 4, 2024 17:37)