-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parquet read TableDefinition support #4831
Add parquet read TableDefinition support #4831
Conversation
Additionally, adds explicit entry points for single, flat-partitioned, and kv-partitioned reads. Fixes deephaven#4746 Partial workaround for deephaven#871
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minimal comments from me. I think this is a nice change. Deferring approval for Python reviewers.
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetTools.java
Outdated
Show resolved
Hide resolved
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetTools.java
Outdated
Show resolved
Hide resolved
py/server/deephaven/parquet.py
Outdated
|
||
return builder.build() | ||
|
||
def _j_table_definition(table_definition: Union[Dict[str, DType], List[Column], None]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmao-denver Should this go in table.py
or some other location where it could be reused?
py/server/deephaven/parquet.py
Outdated
elif type == ParquetType.KV_PARTITIONED: | ||
j_table = _JParquetTools.readKeyValuePartitionedTable(_JFile(path), read_instructions, j_table_definition) | ||
elif type == ParquetType.METADATA_PARTITIONED: | ||
raise DHError(f"{ParquetType.METADATA_PARTITIONED} with table_definition not currently supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want the f-string here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just cleanly support refactoring if METADATA_PARTITIONED
was renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
py/server/tests/test_parquet.py
Outdated
@@ -51,15 +49,15 @@ def test_crd(self): | |||
with self.subTest(msg="write_table(Table, str)"): | |||
write(table, file_location) | |||
self.assertTrue(os.path.exists(file_location)) | |||
table2 = read(file_location) | |||
table2 = read(file_location, type=ParquetType.SINGLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These type specs are not needed, correct? If they are needed, I'm sure user code will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly about having the test code be more targeted - no reason to test general parquet layout inference for every single test IMO. There are tests specifically designed to test layout inference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with that. Just confirming that we weren't breaking users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Python changes LGTM
Labels indicate documentation is required. Issues for documentation have been opened: How-to: https://github.com/deephaven/deephaven.io/issues/3446 |
Additionally, adds explicit entry points for single, flat-partitioned, and kv-partitioned reads.
Fixes #4746
Partial workaround for #871