Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#674] feat(all): add a unified Expression system #721

Merged
merged 8 commits into from
Nov 17, 2023

Conversation

mchades
Copy link
Contributor

@mchades mchades commented Nov 10, 2023

What changes were proposed in this pull request?

  • Add a unified Expression system
  • refactor expression of partitioning, sortOrder, and distribution

Why are the changes needed?

clarify Expression, Partitioning, SortOrder, and Distribution

Fix: #674

Does this PR introduce any user-facing change?

Yep. There have been changes in the representation of Partitioning, SortOrderDTO, and DistributionDTO.

How was this patch tested?

existing tests

@mchades mchades self-assigned this Nov 10, 2023
Copy link

github-actions bot commented Nov 10, 2023

Code Coverage Report

Overall Project 66.11% -4.66% 🟢
Files changed 55.73% 🔴

Module Coverage
server 87.96% 🟢
catalog-lakehouse-iceberg 82.04% -3.34% 🟢
client-java 76.11% 🟢
core 74.85% 🟢
catalog-hive 66.79% -0.08% 🟢
api 58.09% -21.26% 🔴
common 41.86% -9.46% 🟢
Files
Module File Coverage
server TableOperations.java 96.11% 🟢
catalog-lakehouse-iceberg FromIcebergSortOrder.java 94.26% 🟢
FromIcebergPartitionSpec.java 87.6% -6.61% 🟢
IcebergTable.java 85.39% 🟢
IcebergCatalogOperations.java 80.32% 🟢
ToIcebergPartitionSpec.java 79.01% -6.17% 🟢
ToIcebergSortOrder.java 52.08% -43.4% 🔴
client-java RelationalCatalog.java 97.58% 🟢
core EntityCombinedTable.java 92.92% 🟢
CatalogOperationDispatcher.java 88.75% 🟢
BaseTable.java 57.89% 🟢
catalog-hive HiveTable.java 93.89% -0.5% 🟢
HiveCatalogOperations.java 73.82% 🟢
api FunctionExpression.java 89.66% -10.34% 🟢
TableChange.java 85.09% 🟢
SortOrders.java 63.04% -36.96% 🟢
NamedReference.java 59.32% -40.68% 🔴
NullOrdering.java 53.33% -46.67% 🔴
Strategy.java 45.95% -54.05% 🔴
Distributions.java 45.1% -54.9% 🔴
SortDirection.java 40.96% -59.04% 🔴
Literal.java 32.1% -67.9% 🔴
Transforms.java 19.93% -80.07% 🔴
Expression.java 18.42% -81.58% 🔴
Transform.java 11.67% -88.33% 🔴
Table.java 0% -36.36% 🔴
TableCatalog.java 0% 🟢
Distribution.java 0% 🔴
SortOrder.java 0% 🔴
common MonthPartitioningDTO.java 100% 🟢
YearPartitioningDTO.java 100% 🟢
HourPartitioningDTO.java 100% 🟢
TableDTO.java 83.44% -3.97% 🟢
Partitioning.java 81.28% -18.72% 🟢
JsonUtils.java 78.51% -4.56% 🟢
FunctionArg.java 76% -24% 🟢
DayPartitioningDTO.java 75% -25% 🟢
IdentityPartitioningDTO.java 75% -25% 🟢
DistributionDTO.java 70.18% -11.11% 🟢
SortOrderDTO.java 63.33% -3.33% 🟢
FuncExpressionDTO.java 63.03% -36.97% 🟢
LiteralDTO.java 61.83% -38.17% 🟢
RangePartitioningDTO.java 55.17% -44.83% 🔴
ListPartitioningDTO.java 50% -50% 🔴
TruncatePartitioningDTO.java 49.45% -50.55% 🔴
FunctionPartitioningDTO.java 47.71% -52.29% 🔴
BucketPartitioningDTO.java 46.39% -53.61% 🔴
FieldReferenceDTO.java 45.98% -54.02% 🔴
TableCreateRequest.java 0% -10.37% 🔴
PartitionUtils.java 0% 🟢
DTOConverters.java 0% -76.17% 🔴

import org.apache.iceberg.expressions.Expressions;
import org.apache.iceberg.expressions.UnboundTerm;

/** Implement iceberg sort order converter to gravitino sort order. */
/** Implement gravitino sort order converter to iceberg sort order. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment seems incorrect, right? This is a class that converts Gravitino classes to Iceberg classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct after my modification

break;
case Transforms.NAME_OF_YEAR:
expression = Expressions.year(colName);
case "year":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that in the Transforms class, there are relevant constant strings. Why not consider using constants here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they're in different scopes, the constant strings in Transforms mean the partitioning method name, but here is FunctionExpression's function name, they just have the same name. Therefore, I don't reuse the constant strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

/** List of fields or columns that are referenced by this expression. */
default NamedReference[] references() {
// SPARK-40398: Replace `Arrays.stream()...distinct()`
// to this for perf gain, the result order is not important.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this comment here, since it is not valid for us.

return arguments();
}

final class FuncExpressionValue implements FunctionExpression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FuncExpressionValue seems not so accurate, may be better change to FuncExpressionImpl?

}

@EqualsAndHashCode
final class LiteralValue<T> implements Literal<T> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also.

return new NamedReference[] {this};
}

@EqualsAndHashCode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to implement this method by our own, not generate by lombok in API module.

return of(strategy, number, expressions);
}

public static final class DistributionValue implements Distribution {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, is it better change to DistributionImpl?

return new SortValue(expression, direction, nullOrdering);
}

public static final class SortValue implements SortOrder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here.

return arguments();
}

final class FuncExpressionValue implements FunctionExpression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name FuncExpressionValue is confusing.

/** Represents a field or column reference in the public logical expression API. */
public interface NamedReference extends Expression {

static FieldReference field(String... fieldName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding some explanation as to why we use a string array instead of a string here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we use a string array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we use a string array?

@diqiu50 comment added

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use "NameIdentifier" as the parameter type

try {
return valueOf(strategy.toUpperCase());
} catch (IllegalArgumentException e) {
throw new IllegalArgumentException(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's strange here because we throw the same kind of exception here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I want to improve the error message for IllegalArgumentException, as it is currently not user-friendly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you using the following code to handle it.

    public static Strategy getByName(String name) {
      for (Strategy st : Strategy.values()) {
        if (st.name().equalsIgnoreCase(name)) {
          return st;
        }
      }
      throw new IllegalArgumentException(" your error message: " + name);
    }

*/
public interface Transform extends Expression {
/** Returns the transform function name. */
String name();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the different between Transform and FunctionExpression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage scenarios are different. Transform is used for partitioning, while FunctionExpression is used for ordinary expressions. The difference here has already been explained in the design doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the definitions are almost the same, so I was confused by their usage.

@qqqttt123 qqqttt123 changed the title [#674] feat(all): add unify Expression system [#674] feat(all): add a unified Expression system Nov 15, 2023
Transforms.field(new String[] {"field1"}), Transforms.function("now", new Transform[0])
Expression[] distributionArg2 =
new Expression[] {
field("field1"), FunctionExpression.of("now", Expression.EMPTY_EXPRESSION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can define a FunctionExpression.of(String) function that takes no parameters

public enum Strategy {
HASH,
RANGE,
EVEN;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added


public static BucketTransform bucket(int numBuckets, String[]... fieldNames) {
return new BucketTransform(
Literal.of(numBuckets, TypeCreator.REQUIRED.I32),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding StringLiteral.of() and IntegerLiteral.of() functions would make it easier to use.

Expression expression, SortDirection direction, NullOrdering nullOrdering) {
return new SortImpl(expression, direction, nullOrdering);
}

Copy link
Contributor

@diqiu50 diqiu50 Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give more funcitons to make easier to use.
of(Expression expression)
of(Expression expression, SortDirection direction)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SortOrders.of(Expression expression, SortDirection direction) already exists.

@mchades
Copy link
Contributor Author

mchades commented Nov 16, 2023

@Clearvive @jerryshao @diqiu50 @yuqi1129 I have feedback on all the comments you left, please review the PR again, thanks!

/** Returns the distribution strategy name. */
Strategy strategy();

int number();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add java doc for this interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

public static final Distribution NONE =
new DistributionImpl(Strategy.HASH, 0, Expression.EMPTY_EXPRESSION);

public static Distribution ofEVEN(int number, Expression... expressions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better rename to "even", "hash"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add javadoc for all the public API methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc for all the public API methods was added

* nameReferenceDistribution(Strategy.EVEN, 1, new String[]{"a"});
* </pre>
*/
public static Distribution ofFields(Strategy strategy, int number, String[] fieldNames) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be String[][] for fieldNames?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the javadoc you mentioned here is unmatched to the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

return arguments();
}

abstract class SingleFieldTransform implements Transform {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we currently have transform with more than one fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "SingleField" refers to the transform arguments, for example, ListTransform and BucketTransform may require referencing multiple fields.

public static final String NAME_OF_LIST = "list";
public static final String NAME_OF_RANGE = "range";

public static IdentityTransform identity(String... fieldName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would better change to identity(String[] fieldName), here String[] field represents a field with name as an array fo string. Using String... will give the user a feeling that identity supports multiple fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I change it to identity(String[] fieldName) and add another method identity(String columnName) for convenient to use

Arrays.stream(fieldNames).map(NamedReference::field).toArray(NamedReference[]::new));
}

public static ListTransform list(String[][] fieldNames) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should unify with either String[][] or String[]....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unified into String[]...

import com.datastrato.gravitino.rel.expressions.Expression;
import java.util.Arrays;

public interface FunctionArg extends Expression {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the usage of this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface is added to more clearly distinguish the definition of function argument expression and the function expression on the client side. Otherwise, LiteralDTO and FieldReferenceDTO will be mixed up with FuncExpressionDTO.

@mchades
Copy link
Contributor Author

mchades commented Nov 16, 2023

@Clearvive Can you help to review the Iceberg catalog part again?

@mchades
Copy link
Contributor Author

mchades commented Nov 16, 2023

@yuqi1129 Can you please review the Hive catalog part again?

} else if (gravitinoExpr.arguments().length == 2) {
colName =
String.join(
DOT, ((NamedReference.FieldReference) gravitinoExpr.arguments()[1]).fieldName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you not considering the values stored in [0]? Shouldn't we log them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for pointing out my issue. My latest commit refactors this part of the code and improves readability. I welcome your further review.

break;
case Transforms.NAME_OF_YEAR:
expression = Expressions.year(colName);
case "year":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

* @param expression The expression to sort by
* @return The created sort order
*/
public static SortImpl of(Expression expression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better rename to "ascending", also adding a new method called "descending"? I think only of cannot clearly describe the meaning of the method, and users have to read the javadoc to understand the default behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@jerryshao
Copy link
Contributor

Just left one comment, overall LGTM.

@jerryshao jerryshao merged commit c11c996 into apache:main Nov 17, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Develop Generic Expression system
5 participants