Add parquet read TableDefinition support #4831

Merged

devinrsmith
Member

Additionally, adds explicit entry points for single, flat-partitioned, and kv-partitioned reads.

Fixes #4746
Partial workaround for #871

@devinrsmith added the feature request, parquet, DocumentationNeeded, and ReleaseNotesNeeded labels on Nov 15, 2023
@devinrsmith added this to the November 2023 milestone on Nov 15, 2023
@devinrsmith self-assigned this on Nov 15, 2023
Member

@rcaudy left a comment

Minimal comments from me. I think this is a nice change. Deferring approval to the Python reviewers.

py/server/deephaven/parquet.py (outdated, resolved)

return builder.build()

def _j_table_definition(table_definition: Union[Dict[str, DType], List[Column], None]):
Member

@jmao-denver Should this go in table.py or some other location where it could be reused?
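For illustration only, here is a minimal sketch of what a reusable normalizing helper could look like. It is not the PR's implementation: `Column` is a plain-Python stand-in, type names are plain strings rather than `DType` values, and `normalize_table_definition` is a hypothetical name. The real helper builds a Java table definition, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Column:
    """Stand-in for a column description (assumption: a name plus a type name)."""
    name: str
    data_type: str


def normalize_table_definition(
    table_definition: Union[Dict[str, str], List[Column], None],
) -> Optional[List[Tuple[str, str]]]:
    """Collapse the accepted input shapes into one list of (name, type) pairs.

    Hypothetical helper: None passes through unchanged so callers can fall
    back to inferring the definition from the data.
    """
    if table_definition is None:
        return None
    if isinstance(table_definition, dict):
        # Dicts preserve insertion order, so column order is kept.
        return [(name, dtype) for name, dtype in table_definition.items()]
    if isinstance(table_definition, list):
        return [(col.name, col.data_type) for col in table_definition]
    raise TypeError(
        f"Unexpected table_definition type: {type(table_definition).__name__}"
    )
```

Keeping the normalization in one function means every caller that accepts a table definition treats the dict form, the list form, and `None` the same way.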

py/server/deephaven/parquet.py (six outdated, resolved threads)
elif type == ParquetType.KV_PARTITIONED:
    j_table = _JParquetTools.readKeyValuePartitionedTable(_JFile(path), read_instructions, j_table_definition)
elif type == ParquetType.METADATA_PARTITIONED:
    raise DHError(f"{ParquetType.METADATA_PARTITIONED} with table_definition not currently supported")
Member

Do you want the f-string here?

Member Author

Just to cleanly support refactoring if METADATA_PARTITIONED is ever renamed.

Member

Sounds good.
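To illustrate the point of this thread, here is a minimal, self-contained sketch. `ParquetType` and `DHError` are stand-ins defined locally (the member values are assumptions), and `describe_layout` is a hypothetical function that is not part of the PR. Interpolating the enum member into the message, rather than hard-coding its name, means a rename done by a refactoring tool also updates the error text.

```python
from enum import Enum


class ParquetType(Enum):
    """Stand-in for the enum in the diff; the numeric values are assumptions."""
    SINGLE = 1
    FLAT_PARTITIONED = 2
    KV_PARTITIONED = 3
    METADATA_PARTITIONED = 4


class DHError(Exception):
    """Stand-in for the error type raised in the diff."""


def describe_layout(layout: ParquetType) -> str:
    """Hypothetical dispatcher: return a label for supported layouts."""
    if layout == ParquetType.SINGLE:
        return "single"
    elif layout == ParquetType.FLAT_PARTITIONED:
        return "flat-partitioned"
    elif layout == ParquetType.KV_PARTITIONED:
        return "kv-partitioned"
    elif layout == ParquetType.METADATA_PARTITIONED:
        # The member is referenced, not spelled out in the string, so the
        # message follows the identifier if it is renamed.
        raise DHError(
            f"{ParquetType.METADATA_PARTITIONED} with table_definition not currently supported"
        )
    raise DHError(f"Unrecognized layout: {layout}")
```

With a plain `Enum`, the member formats as `ParquetType.METADATA_PARTITIONED`, so the message reads the same as a hard-coded string would today.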

@@ -51,15 +49,15 @@ def test_crd(self):
         with self.subTest(msg="write_table(Table, str)"):
             write(table, file_location)
             self.assertTrue(os.path.exists(file_location))
-            table2 = read(file_location)
+            table2 = read(file_location, type=ParquetType.SINGLE)
Member

These type specs are not needed, correct? If they are needed, I'm sure user code will break.

Member Author

This is mostly about having the test code be more targeted - no reason to test general parquet layout inference for every single test IMO. There are tests specifically designed to test layout inference.

Member

I'm good with that. Just confirming that we weren't breaking users.
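The compatibility point in this thread can be sketched with stand-ins. Nothing below is Deephaven code: `read` here returns the layout it would use instead of a table, and `infer_layout` is a hypothetical, deliberately simplistic rule. The only behavior taken from the discussion is that `type` is optional, so existing calls that omit it keep working, while targeted tests can pin the layout and skip inference.

```python
from enum import Enum
from typing import Optional


class ParquetType(Enum):
    """Stand-in enum; only two layouts are needed for this sketch."""
    SINGLE = 1
    FLAT_PARTITIONED = 2


def infer_layout(path: str) -> ParquetType:
    """Hypothetical inference rule, for this sketch only."""
    if path.endswith(".parquet"):
        return ParquetType.SINGLE
    return ParquetType.FLAT_PARTITIONED


def read(path: str, type: Optional[ParquetType] = None) -> ParquetType:
    """Stand-in reader: report which layout would be used for the read."""
    # An explicit layout bypasses inference; omitting it keeps the old behavior.
    return type if type is not None else infer_layout(path)


# Existing user code that omits `type` still works through inference.
inferred = read("data.parquet")

# A targeted test pins the layout and never exercises inference.
pinned = read("data.parquet", type=ParquetType.SINGLE)
```

Both calls resolve to the same layout here, which is why adding the explicit argument in the tests does not imply a breaking change for callers.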

malhotrashivam previously approved these changes Nov 15, 2023
Contributor

@jmao-denver left a comment

The Python changes LGTM

@devinrsmith merged commit 8123f7c into deephaven:main on Nov 16, 2023
10 checks passed
@devinrsmith deleted the nightly/explicit-parquet-definitions branch on November 16, 2023, 22:03
@github-actions bot locked and limited conversation to collaborators on Nov 16, 2023
@deephaven-internal
Contributor

Labels indicate documentation is required. Issues for documentation have been opened:

How-to: https://github.com/deephaven/deephaven.io/issues/3446
Reference: https://github.com/deephaven/deephaven.io/issues/3447

Successfully merging this pull request may close these issues.

Initially empty hive partitioned parquet support
6 participants