Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support un-nesting JSONB arrays #90

Open
georgelza opened this issue Aug 22, 2024 · 1 comment
Open

Support un-nesting JSONB arrays #90

georgelza opened this issue Aug 22, 2024 · 1 comment
Labels
feature New feature or request good first issue Good for newcomers priority-medium Medium priority issue

Comments

@georgelza
Copy link

What feature are you requesting?

I have a dataset/document as follows, which is pushed to iceberg/s3 into parquet format.
i've configured paradedb to be able to query the data.

i need to be able to flatten/select from the documents in the basket items array.
i need to be able to filter based on values in the individual documents in the basket items array.

{ "invoiceNumber": "1341243123341232", "saleDateTime_Ltz": "2023-12-23T16:53:39.911+02:00", "salesTimetamp_Epoc": "1718117619911", "store" : { "id": "1033", "name": "Derry" }, "clerk": { "id": "231", "name": "Martin", "surname": "Smith" }, "terminalPoint": "14", "basketItems":[ { "id": "234123412", "name": "Minty Frsh", "brand": "Colgate", "category": "Healthcare", "price":12412.00, "quantity":3 }, { "id": "234123421", "name": "All Bran", "brand": "Kellog's", "category": "Cereal", "price":12.00, "quantity":3 }, { "id": "534123412", "name": "Sugar Free", "brand": "Coke", "category": "Cool drinks", "price":112.00, "quantity":2 }, { "id": "224123412", "name": "Auto Wash", "brand": "OMO", "category": "Cleaning", "price":22.00, "quantity":4 } ], "nett": 442.23, "vat":10.00, "total":452.23 }

Why are you requesting this feature?

paradedb is to be a front end for out analytics.
our source data is going via stream via flink into a iceberg table format on S3 into parquet file format.
source data is multi multi level json structured, far t complex to flatten into old shape column/rows.

What is your proposed implementation for this feature?

.

Full Name:

George Leonard

Affiliation:

none

@georgelza georgelza changed the title unseating jsonb arrays unnesting jsonb arrays Aug 22, 2024
@rebasedming rebasedming added good first issue Good for newcomers priority-medium Medium priority issue labels Aug 23, 2024
@philippemnoel philippemnoel changed the title unnesting jsonb arrays Support unnesting JSONB arrays Aug 23, 2024
@philippemnoel philippemnoel changed the title Support unnesting JSONB arrays Support un-nesting JSONB arrays Aug 23, 2024
@philippemnoel philippemnoel added the feature New feature or request label Aug 23, 2024
@rebasedming
Copy link
Contributor

rebasedming commented Aug 26, 2024

As of #103 there is a workaround for this. You need to unnest at CREATE FOREIGN TABLE time but the following is now possible:

-- without unnest
CREATE FOREIGN TABLE nested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'struct_col');

select * from nested;
        struct_col
--------------------------
 {"a": "abc", "b": "def"}

-- with unnest
CREATE FOREIGN TABLE unnested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'unnest(struct_col)');

select * from unnested;
  a  |  b
-----+-----
 abc | def
(1 row)

In the above example, test_duckdb_types.parquet has a JSON field called struct_col.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request good first issue Good for newcomers priority-medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

3 participants