-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further improvements to parquet page materializaition #5670
Conversation
...et/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToPageWithDictionary.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review your earlier PR, but I see other usages of org.apache.parquet.io.api.Binary
that could potentially be improved without needing to copy the bytes.
...s/parquet/base/src/main/java/io/deephaven/parquet/base/materializers/ObjectMaterializer.java
Outdated
Show resolved
Hide resolved
...ns/parquet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToObjectPage.java
Outdated
Show resolved
Hide resolved
/** | ||
* @return the factory to create the materializers for this column. | ||
*/ | ||
PageMaterializerFactory getPageMaterializerFactory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not immediately obvious to me that this belongs as part of ToPage
. No doubt it's convenient (and, we might argue is in the same style as the other "helpers" ToPage#getChunkType
, ToPage#getNativeType
). Generally, I'm trying to minimize the expansion of "public" interfaces. That said, this is already an implementation exposing interface, and I don't think this additional exposure is egregious in any way; just that I wish we were better at having a separation between public interfaces and implementation details. Oh well. (I don't think you need to necessarily make any changes here, just venting a bit.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can safely consider ToPage
to be internal, rather than part of our API. It's literally just the layer that takes Parquet data and formats it for the engine. The materializer is intrinsically tied to the ToPage
implementation, in the sense that the two must be mutually compatible. I strongly encouraged Shivam to make this refactoring to avoid future bugs like the one that triggered this work, and to allow for further optimization in the same manner as his last several PRs, by removing unnecessary layering and data copying.
...ons/parquet/base/src/main/java/io/deephaven/parquet/base/materializers/BlobMaterializer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial code review.
/** | ||
* @return the factory to create the materializers for this column. | ||
*/ | ||
PageMaterializerFactory getPageMaterializerFactory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can safely consider ToPage
to be internal, rather than part of our API. It's literally just the layer that takes Parquet data and formats it for the engine. The materializer is intrinsically tied to the ToPage
implementation, in the sense that the two must be mutually compatible. I strongly encouraged Shivam to make this refactoring to avoid future bugs like the one that triggered this work, and to allow for further optimization in the same manner as his last several PRs, by removing unnecessary layering and data copying.
@@ -26,15 +28,14 @@ public static <ATTR extends Any> ToPage<ATTR, String[]> create( | |||
new ChunkDictionary<>( | |||
(dictionary, key) -> dictionary.decodeToBinary(key).toStringUsingUTF8(), | |||
dictionarySupplier), | |||
INSTANCE::convertResult); | |||
INSTANCE::convertResult, | |||
INSTANCE.getPageMaterializerFactory()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's possible to defer this, but it's not obvious that it matters even a little bit.
...ions/parquet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToLongPage.java
Outdated
Show resolved
Hide resolved
/** | ||
* @return the factory to create the materializers for this column. | ||
*/ | ||
PageMaterializerFactory getPageMaterializerFactory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner if instead of having a PageMaterializerFactory
, ToPages
implemented PageMaterializerFactory
themselves? That seems more aligned with the intent, here.
extensions/parquet/base/src/main/java/io/deephaven/parquet/base/PageMaterializer.java
Outdated
Show resolved
Hide resolved
@@ -20,16 +21,15 @@ | |||
|
|||
public class ToBigDecimalPage<ATTR extends Any> implements ToPage<ATTR, BigDecimal[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious if we should delete ToBigDecimalPage
and ToBigIntegerPage
in order to just use ToObjectPage
.
...table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToBigDecimalFromNumeric.java
Show resolved
Hide resolved
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetSchemaReader.java
Outdated
Show resolved
Hide resolved
...uet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToBooleanAsBytePage.java
Outdated
Show resolved
Hide resolved
...s/parquet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToInstantPage.java
Outdated
Show resolved
Hide resolved
...s/parquet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToInstantPage.java
Outdated
Show resolved
Hide resolved
...s/parquet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToInstantPage.java
Outdated
Show resolved
Hide resolved
...uet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToLocalDateTimePage.java
Outdated
Show resolved
Hide resolved
...parquet/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToLocalTimePage.java
Outdated
Show resolved
Hide resolved
...et/table/src/main/java/io/deephaven/parquet/table/pagestore/topage/ToPageWithDictionary.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Extension of #5582 and #5638
This PR fixes a bug related to codecs which was introduced in #5638 and does further code reorganization to reduce chances of such bugs in the future by moving the decisions for page materializer and toPage in the same place rather than the two decisions being made independently.