Decimal Support for Binary Precision #91

Open
3 of 4 tasks
wilwade opened this issue Jun 23, 2023 · 5 comments
Labels: good first issue, help wanted

Comments

@wilwade
Member

wilwade commented Jun 23, 2023

Currently, this library only supports reading and writing DECIMAL values when the precision is <= 18.

To quote the Parquet spec: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

DECIMAL can be used to annotate the following types:

  • int32: for 1 <= precision <= 9
  • int64: for 1 <= precision <= 18; precision < 10 will produce a
    warning
  • fixed_len_byte_array: precision is limited by the array size. Length n
    can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
  • binary: precision is not limited, but is required. The minimum number of
    bytes to store the unscaled value should be used.
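
The fixed_len_byte_array bound quoted above can be checked numerically. The following is a minimal sketch, not code from this library: the function names `maxPrecisionForByteLength` and `minBytesForPrecision` are hypothetical, and the floating-point shortcut is assumed to be accurate for the byte lengths that occur in practice.

```typescript
// Hypothetical helpers for illustration; not part of the library's API.

// Largest DECIMAL precision that fits in a signed (two's complement)
// byte array of length n: floor(log10(2^(8n - 1) - 1)).
function maxPrecisionForByteLength(n: number): number {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("byte length must be a positive integer");
  }
  // 2^(8n - 1) is never a power of ten, so dropping the "- 1" does not
  // change the floor. Floating point is assumed adequate for realistic n.
  return Math.floor((8 * n - 1) * Math.log10(2));
}

// Smallest byte length whose maximum precision covers the requested one.
function minBytesForPrecision(precision: number): number {
  if (!Number.isInteger(precision) || precision < 1) {
    throw new RangeError("precision must be a positive integer");
  }
  let n = 1;
  while (maxPrecisionForByteLength(n) < precision) {
    n++;
  }
  return n;
}
```

Under these assumptions, lengths 4 and 8 yield precisions 9 and 18, matching the int32 and int64 limits in the list, and a 16-byte array yields 38.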

Test Files:

Related Issues:

@YECHUNAN

I made a PR attempting to add rudimentary support for Decimal fields that are represented by byte arrays, which may have precision over 18.

wilwade added a commit that referenced this issue Aug 14, 2023
Problem
=======
Address #91

Solution
========
When encountering such byte array represented "Decimal" fields, parse
them into raw buffers.

Change summary:
---------------
- Added code to parse "Decimal" type fields represented by byte arrays
(fixed length or non-fixed length) into raw buffer values for further
client side processing.
- Added two test cases verifying the added code.
- Loosened the precision check to allow values greater than 18 for byte
array represented "Decimal" fields.

Steps to Verify:
----------------
- Use the library to open a parquet file which contains a "Decimal"
field represented by a byte array whose precision is greater than 18.
- Before the change, the library throws an error saying precision cannot
be greater than 18.
- After the change, the library parses those fields to their raw buffer
values and returns records normally.

---------

Co-authored-by: Wil Wade
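
Since the change above returns raw buffers "for further client side processing", a client still has to turn each buffer into a number. The sketch below shows one way that might look, assuming the buffer holds the unscaled value as big-endian two's complement (as the Parquet spec defines for byte-array DECIMALs) and that the caller already knows the column's scale. `decimalBytesToString` is a hypothetical helper, not something the library exports.

```typescript
// Hypothetical client-side helper; not part of the library's API.
// Converts a big-endian two's complement byte array (the unscaled DECIMAL
// value) plus a scale into an exact decimal string.
function decimalBytesToString(bytes: Uint8Array, scale: number): string {
  if (bytes.length === 0) {
    throw new RangeError("decimal byte array must not be empty");
  }
  if (!Number.isInteger(scale) || scale < 0) {
    throw new RangeError("scale must be a non-negative integer");
  }

  // Accumulate the bytes as an unsigned big-endian integer.
  let unscaled = BigInt(0);
  for (let i = 0; i < bytes.length; i++) {
    unscaled = unscaled * BigInt(256) + BigInt(bytes[i]);
  }

  // If the sign bit is set, reinterpret as negative: subtract 2^(8 * length).
  if ((bytes[0] & 0x80) !== 0) {
    unscaled -= BigInt(1) << BigInt(8 * bytes.length);
  }

  const negative = unscaled < BigInt(0);
  let digits = (negative ? -unscaled : unscaled).toString();

  // Insert the decimal point `scale` digits from the right, left-padding
  // with zeros so there is always at least one digit before the point.
  if (scale > 0) {
    digits = digits.padStart(scale + 1, "0");
    digits = digits.slice(0, -scale) + "." + digits.slice(-scale);
  }
  return (negative ? "-" : "") + digits;
}
```

Because Node's `Buffer` is a `Uint8Array` subclass, a returned buffer could be passed in directly. A string result is used here to avoid losing digits, since a JavaScript `number` cannot represent values with precision over 18 exactly.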
wilwade changed the title from "Decimal Support for Precision > 18" to "Decimal Support for Binary Precision" on Aug 14, 2023
@craxal

craxal commented Mar 12, 2024

I suspect that the earlier pull request has caused some regression issues related to DECIMAL values. Some folks are reporting the following error:

missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)

From what I can gather, this occurs even if there are no FIXED_LEN_BYTE_ARRAY backed DECIMAL values (only INT64 in one case).

@wilwade
Member Author

wilwade commented Mar 13, 2024

@craxal the fix from @JasonYeMSFT released in v1.6.1 (just this morning) should fix it.

@craxal

craxal commented Mar 13, 2024

@wilwade Ah, yes, I think it does. Just tested it myself. Sorry, I thought the pull request had already been released.

@craxal

craxal commented Sep 9, 2024

Is there any status update on this item? We're hoping we can start parsing fixed length array decimals in the near future.
