Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for some operations with decimals #988

Merged
merged 14 commits into from
Oct 16, 2024

Conversation

philss
Copy link
Member

@philss philss commented Sep 20, 2024

Adding support for decimals in some of functions that work with integers and floats. This is adding support for creating decimal series using the Explorer.Series.from_list/2 function as well.

The decimals support is still in the experimental phase in Polars, so we won't be able to support all the operations now. Some of the operations are returning f64 series or float results, and some of the functions are raising exceptions from Polars because they are not implemented at the backend.

@philss philss force-pushed the ps-add-decimal-dtype-part-ii branch from 2d16ef0 to 83afad1 Compare October 8, 2024 00:24
Instead of relying on the calculated dtype from the backend, we
take the maximum scale from the numbers passed in `from_list/2`,
or we cast to the scale given at the creation.
@philss philss marked this pull request as ready for review October 9, 2024 19:05
Copy link
Contributor

@billylanchantin billylanchantin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! This looks like it was a ton of work. And I think the tradeoffs (like the not implemented errors) are very reasonable.

Comment on lines 1696 to 1699
# Casting works, but the df will be different if we don't pass the precision,
# because it will take the default precision for decimals, which is "38".
# The second problem of not passing precision is that scale will be handled differently.
# This example would divide the integers by 10^2 without the precision.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems important. Would it be feasible to write tests to demonstrate the behavior it references?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to add some test cases for that. The only problem is that we have a constraint that raises when the "out DF" is different from what we calculate. I will check if it's possible to ignore that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm covering most of what I said in the comment, specially after removing the support for nil precision and scale. There is still an issue when we do an arithmetic operation and the backend returns a dtype without a precision, but we are casting in the end.

lib/explorer/shared.ex Outdated Show resolved Hide resolved
@@ -2577,6 +2601,7 @@ defmodule Explorer.Series do

* floats: #{Shared.inspect_dtypes(@float_dtypes, backsticks: true)}
* integers: #{Shared.inspect_dtypes(@integer_types, backsticks: true)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, there is a typo here, it should be backticks but we fix it later. :D

lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
@@ -3395,8 +3427,18 @@ defmodule Explorer.Series do
defp cast_to_add({:datetime, p, tz}, {:duration, p}), do: {:datetime, p, tz}
defp cast_to_add({:duration, p}, {:datetime, p, tz}), do: {:datetime, p, tz}
defp cast_to_add({:duration, p}, {:duration, p}), do: {:duration, p}

defp cast_to_add({:decimal, p1, s1}, {:decimal, p2, s2}),
do: {:decimal, maybe_max(p1, p2), maybe_max(s1, s2)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some rule we need to follow here? For example, is there a maximum value for precision and scale?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to ChatGPT:


Yes, in Apache Arrow's decimal128 type, the precision and scale are constrained as follows:

Precision: This represents the total number of digits that can be stored, both before and after the decimal point. For decimal128, the maximum precision is 38 digits. This means that it can store up to 38 significant digits.

Scale: This defines how many of the digits are allocated to the fractional part (i.e., after the decimal point). The scale can be any value between 0 and the precision value. For example, if you have a precision of 38 and set a scale of 10, then 28 digits can be used before the decimal point and 10 digits after.

Thus, the maximum precision is 38, and the scale can be anywhere from 0 to 38, depending on the application needs.


So I think we are good, but I'd encapsulate this logic in a function. :)

lib/explorer/shared.ex Outdated Show resolved Hide resolved
lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/series.ex Outdated Show resolved Hide resolved
lib/explorer/shared.ex Outdated Show resolved Hide resolved
Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nits and ship it!!!

@philss philss merged commit 425c329 into main Oct 16, 2024
4 checks passed
@philss philss deleted the ps-add-decimal-dtype-part-ii branch October 16, 2024 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants