Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-5918] Add MAP function (enabled in Spark library) #3459

Merged
merged 1 commit into from
Oct 27, 2023

Conversation

chucheng92
Copy link
Member

https://issues.apache.org/jira/browse/CALCITE-5918

the spark map function has 2 different points with calcite map constructor:

  1. use map(k, v, ...) rather than map[k, v, ...]
  2. allows empty map such as map(); however calcite map constructor not support it.

@chucheng92 chucheng92 force-pushed the CALCITE-5918 branch 2 times, most recently from e592b6c to 752772f Compare October 8, 2023 04:04
site/_docs/reference.md Outdated Show resolved Hide resolved
return SqlStdOperatorTable.MAP_QUERY.createCall(s.end(this), args.get(0));
} else {
// MAP function e.g. "MAP(1, 2)" equivalent to standard "MAP[1, 2]"
return SqlLibraryOperators.MAP.createCall(s.end(this), args.getList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can check if the size of args is an even number.

Copy link
Member Author

@chucheng92 chucheng92 Oct 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to retain the original parser logic here, because it will be detected later on OperandTypeChecker. And here is similar with spark's array handling and calcite array constructor. pls see: https://github.com/apache/calcite/pull/3141/files

Another benefit is that we can use some more human-readable error messages in predefined CalciteResource.

@chucheng92 chucheng92 force-pushed the CALCITE-5918 branch 3 times, most recently from a993e1e to 184d3f4 Compare October 11, 2023 07:30
@@ -873,6 +874,7 @@ Builder populate2() {
map.put(MAP_VALUE_CONSTRUCTOR, value);
map.put(ARRAY_VALUE_CONSTRUCTOR, value);
defineMethod(ARRAY, BuiltInMethod.ARRAYS_AS_LIST.method, NullPolicy.NONE);
defineMethod(MAP, BuiltInMethod.MAP_FUNCTION.method, NullPolicy.NONE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call it just MAP to be consistent with other functions?

Copy link
Member Author

@chucheng92 chucheng92 Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, there is a MAP object in BuiltInMethod, occupy this name. but yes, if we rename that, we can use MAP. I have updated it.

@@ -5251,6 +5251,17 @@ public static Map mapFromArrays(List keysArray, List valuesArray) {
return map;
}

/** Support the MAP function. */
public static Map mapFunction(Object... args) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -568,6 +569,9 @@ public static SqlOperandTypeChecker variadic(
public static final SqlSingleOperandTypeChecker MAP_FROM_ENTRIES =
new MapFromEntriesOperandTypeChecker();

public static final SqlSingleOperandTypeChecker MAP_FUNCTION =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same naming comment as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

RelDataTypeFactory typeFactory,
List<RelDataType> argTypes) {
return Pair.of(
typeFactory.leastRestrictive(Util.quotientList(argTypes, 2, 0)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are these constants coming from and what do they signify?

Copy link
Member Author

@chucheng92 chucheng92 Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Util.quotientList(argTypes, 2, 0):
This extracts all elements at even indices from argTypes.
It represents the types of keys in the map as they are placed at even positions
e.g. 0, 2, 4, etc. details please see Util.quotientList.

std MapValueConstructor has same logic. see:

typeFactory.leastRestrictive(Util.quotientList(argTypes, 2, 0)),

I've added some comments here for better readability.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add UT for typeFactory.leastRestrictive?

How about?
MAP('k1', 1, 'k2', 2.0, 'k3', '3')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add UT for typeFactory.leastRestrictive?

How about? MAP('k1', 1, 'k2', 2.0, 'k3', '3')

yes, added it. thanks for reviewing.

site/_docs/reference.md Outdated Show resolved Hide resolved
site/_docs/reference.md Outdated Show resolved Hide resolved
@chucheng92
Copy link
Member Author

chucheng92 commented Oct 13, 2023

hi, @tanclary I have solved the comments and updated the PR.

btw, I have rebased the main to use CALCITE-5570, I hope it won’t cause any trouble to your review.

@chucheng92
Copy link
Member Author

hi, @tanclary Sorry to pin you, I have solved the comments. could you help to re-check it?

Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good. Left one naming comment

@@ -97,7 +97,7 @@ public <T> RelNode translate(Queryable<T> queryable) {
public RelNode translate(Expression expression) {
if (expression instanceof MethodCallExpression) {
final MethodCallExpression call = (MethodCallExpression) expression;
BuiltInMethod method = BuiltInMethod.MAP.get(call.method);
BuiltInMethod method = BuiltInMethod.FUNCTIONS_MAPS.get(call.method);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why call it FUNCTIONS_MAPS?

Copy link
Member Author

@chucheng92 chucheng92 Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this is a static field in BuiltInMethod to store all the methods. it occupy 'MAP' name before, so I can't continue to use this name as our spark map method name, so i changed it, then we can use 'MAP' as method name as you suggested for consistency in naming. (I set 'MAP_FUNCTION' in BuiltInMethod previously).

one disadvantage I can think of here is that it may cause compatibility issues? If there is an upper-layer application/engine use java reflection to call this field. But is this as expected?

I'm ok if we need to reset spark MAP method to 'MAP_FUNCTION' name in BuiltInMethod and keep here unchanged. @tanclary what do you think?

Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, you might find some of the sonarcloud reports helpful, it looks like they have some suggestions on refactoring the for loops and providing types for the generic Map, not sure how feasible these are in your case but you can just check and see if any of them prove useful.

@chucheng92
Copy link
Member Author

chucheng92 commented Oct 26, 2023

@tanclary

Thanks for the reminder. I checked sonar report and problems are roughly divided into three categories:

  1. Assignment of index i in for-loop
  2. The MAP of SqlFunctions does not specify a generic type
  3. Reuse string constants

I have modified 1, but 2 and 3 actually not applicable for us.

2: The methods of SqlFunctions are all used by reflection. It can be written as generics or not. However, the current collection types Map and List in SqlFunctions are expressed using raw, I hope we can keep consistent.

3: is something like "CHAR(10) NOT NULL", there are multiple checkScalar, sonar wants us to define a variable and reuse it. In fact, it is not necessary. The current way of writing is more intuitive.

If you agree with these, I will squash the commits. WDYT?

Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, think there might be some checkstyle issues but feel free to squash and ping me after!

@chucheng92
Copy link
Member Author

chucheng92 commented Oct 26, 2023

thank you all for patient reviewing! all comments are resolved.

@tanclary Thanks for approve! could you help to merge it? so we can add it in up-coming 1.36.

@sonarcloud
Copy link

sonarcloud bot commented Oct 26, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 14 Code Smells

98.5% 98.5% Coverage
0.0% 0.0% Duplication

Copy link
Contributor

@tanclary tanclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sorry for the delay! Congrats on becoming a committer!

@tanclary tanclary merged commit b11f179 into apache:main Oct 27, 2023
18 checks passed
@chucheng92
Copy link
Member Author

LGTM, sorry for the delay! Congrats on becoming a committer!

thanks tanner. you helped me to a lot. :)

@chucheng92 chucheng92 deleted the CALCITE-5918 branch October 30, 2023 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants