Support un-nesting JSONB arrays #90

georgelza · 2024-08-22T14:37:01Z

What feature are you requesting?

I have a dataset/document as follows, which is pushed to Iceberg on S3 in Parquet format.
I've configured ParadeDB to be able to query the data.

I need to be able to flatten/select from the documents in the basketItems array.
I need to be able to filter based on values in the individual documents in the basketItems array.

{
  "invoiceNumber": "1341243123341232",
  "saleDateTime_Ltz": "2023-12-23T16:53:39.911+02:00",
  "salesTimetamp_Epoc": "1718117619911",
  "store": { "id": "1033", "name": "Derry" },
  "clerk": { "id": "231", "name": "Martin", "surname": "Smith" },
  "terminalPoint": "14",
  "basketItems": [
    { "id": "234123412", "name": "Minty Frsh", "brand": "Colgate", "category": "Healthcare", "price": 12412.00, "quantity": 3 },
    { "id": "234123421", "name": "All Bran", "brand": "Kellog's", "category": "Cereal", "price": 12.00, "quantity": 3 },
    { "id": "534123412", "name": "Sugar Free", "brand": "Coke", "category": "Cool drinks", "price": 112.00, "quantity": 2 },
    { "id": "224123412", "name": "Auto Wash", "brand": "OMO", "category": "Cleaning", "price": 22.00, "quantity": 4 }
  ],
  "nett": 442.23,
  "vat": 10.00,
  "total": 452.23
}
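
For illustration, the kind of query being asked for can be sketched in plain PostgreSQL. This assumes the document is stored whole in a jsonb column of a regular table; the table name sales and column name doc are hypothetical and not part of the actual setup. It uses the standard jsonb_array_elements function to produce one row per basket item and then filters on a field of the item.

```sql
-- Hypothetical table: one sales document per row in a jsonb column.
CREATE TABLE sales (doc jsonb);

-- Expand basketItems into one row per item, then filter on an item field.
SELECT
    s.doc ->> 'invoiceNumber'         AS invoice_number,
    t.item ->> 'name'                 AS item_name,
    t.item ->> 'category'             AS category,
    (t.item ->> 'price')::numeric     AS price,
    (t.item ->> 'quantity')::int      AS quantity
FROM sales AS s
CROSS JOIN LATERAL jsonb_array_elements(s.doc -> 'basketItems') AS t(item)
WHERE t.item ->> 'category' = 'Cereal';
```

With the sample document above loaded into sales, this would return only the "All Bran" item. The request is for an equivalent capability when the data lives in Parquet behind a foreign table.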

Why are you requesting this feature?

ParadeDB is to be the front end for our analytics.
Our source data is streamed via Flink into an Iceberg table on S3, stored as Parquet files.
The source data is deeply nested, multi-level JSON, far too complex to flatten into the traditional column/row shape.

What is your proposed implementation for this feature?

.

Full Name:

George Leonard

Affiliation:

none

rebasedming · 2024-08-26T18:27:42Z

As of #103 there is a workaround for this. You need to unnest at CREATE FOREIGN TABLE time but the following is now possible:

-- without unnest
CREATE FOREIGN TABLE nested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'struct_col');

select * from nested;
        struct_col
--------------------------
 {"a": "abc", "b": "def"}

-- with unnest
CREATE FOREIGN TABLE unnested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'unnest(struct_col)');

select * from unnested;
  a  |  b
-----+-----
 abc | def
(1 row)

In the above example, test_duckdb_types.parquet has a JSON field called struct_col.
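
Applied to the document from the original request, the same pattern might look like the sketch below. This is untested and rests on several assumptions: the file path and the table name basket_items are hypothetical, basketItems is assumed to be stored in the Parquet file as a list of structs rather than as a JSON string, and the select option is assumed to accept DuckDB's unnest(..., recursive := true) form, which expands a list of structs into one row per element with one column per struct field.

```sql
-- Hypothetical path and table name; assumes basketItems is a list of structs.
CREATE FOREIGN TABLE basket_items ()
SERVER parquet_server
OPTIONS (
    files '~/Downloads/sales.parquet',
    select 'invoiceNumber, unnest(basketItems, recursive := true)'
);

-- If the unnest works as assumed, item fields become ordinary columns
-- and can be filtered directly.
SELECT * FROM basket_items WHERE category = 'Cereal';
```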

georgelza changed the title ~~unseating jsonb arrays~~ unnesting jsonb arrays Aug 22, 2024

rebasedming added the good first issue (Good for newcomers) and priority-medium (Medium priority issue) labels Aug 23, 2024

philippemnoel changed the title ~~unnesting jsonb arrays~~ Support unnesting JSONB arrays Aug 23, 2024

philippemnoel changed the title ~~Support unnesting JSONB arrays~~ Support un-nesting JSONB arrays Aug 23, 2024

philippemnoel added the feature (New feature or request) label Aug 23, 2024
