Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Add multi-column support to AggFormula #6206

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

lbooker42
Copy link
Contributor

@lbooker42 lbooker42 commented Oct 15, 2024

Python examples:

from deephaven import empty_table
from deephaven import agg

source = empty_table(20).update(
    ["X = i", "Y = 2 * i", "Z = 3 * i", "Letter = (X % 2 == 0) ? `A` : `B`"]
)

result = source.agg_by([
    agg.formula(formula="out_a=sqrt(5.0)"),
    agg.formula(formula="out_b=min(X)"),
    agg.formula(formula="out_c=min(X) + max(Y)"),
    agg.formula(formula="out_d=sum(X + Y + Z)"),
], by=["Letter"])

Groovy examples:

source = emptyTable(20).update("X = i", "Y = 2 * i", "Z = 3 * i", "Letter = (X % 2 == 0) ? `A` : `B`")

result = source.aggBy([
    AggFormula("out_a=sqrt(5.0)"),
    AggFormula("out_b=min(X)"),
    AggFormula("out_c=min(X) + max(Y)"),
    AggFormula("out_d=sum(X + Y + Z)"),
], "Letter")

@lbooker42 lbooker42 self-assigned this Oct 16, 2024
@lbooker42 lbooker42 added this to the 0.37.0 milestone Oct 16, 2024
py/server/deephaven/agg.py Outdated Show resolved Hide resolved
Comment on lines 183 to 185
cols (Union[str, List[str]]): If provided, supplies the column(s) to aggregate on, can be renaming expressions,
i.e. "new_col = col". Default is None, which can be valid when the `formula` argument supplies the input
column names or when used in Table agg_all_by operation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting cols as a List[str] seems to suggest that we might want to support a list of formulas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure

formula(['output_col1=(input_col1 + input_col2) * input_col3', 'output_col2=(input_col21 + input_col22) * input_col23'])

is better than

[formula('output_col1=(input_col1 + input_col2) * input_col3'), formula('output_col2=(input_col21 + input_col22) * input_col23')]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our other methods do the second. I'm pointing out the inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a technical reason why we can't do that immediately for Java (already using String... var args to support Pair args). We could do it for Python but then we'd have a mismatch.

When/if we completely sunset the parameterized version, we can implement this without any conflict.

py/client/pydeephaven/agg.py Outdated Show resolved Hide resolved
Copy link
Member

@devinrsmith devinrsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worthwhile bringing up the gRPC structuring layer given work that Charles want to do there too...

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants