-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-5862] Incorrect semantics of ARRAY function (Spark library) when elements have Numeric and Character types #3324
Conversation
List<RelDataType> nullTypes = new ArrayList<>(); | ||
for (int i = 0; i < opBinding.getOperandCount(); i++) { | ||
SqlTypeFamily family = opBinding.getOperandType(i).getSqlTypeName().getFamily(); | ||
if (family == SqlTypeFamily.NUMERIC) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be rewritten as a switch case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments. Thanks for filing and opening a fix for this bug!
} | ||
} | ||
|
||
boolean onlyNumericCharacterTypes = opBinding.getOperandCount() > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be misunderstanding, but if you only care about the size of the lists and not the elements themselves, you could just use boolean flags to keep track of which types have appeared. I think it will simplify the expression below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final List<RelDataType> types = opBinding.collectOperandTypes();
boolean hasNumeric = false;
boolean hasCharacter = false
types.forEach(type -> {
// null check here maybe
switch (type.getSqlTypeName().getFamily()) {
case x:
hasNumeric = true;
break;
case y:
hasCharacter = true;
break;
default:
}
if hasNumeric || hasCharacter || c {....} ```
you could do something like this maybe? (Excuse the pseudocode it's just an example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You are right. I have fixed this. PTAL~
@tanclary thanks for reviewing. I will address these comments later. |
65f4e2f
to
b93fd1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think array must have same type.
You should use TypeCoercionImpl
to solve your problem.
Hi, Jiajun. thanks for reviewing. However, this PR does not modify the
calcite/core/src/main/java/org/apache/calcite/sql/type/SameOperandTypeChecker.java Line 92 in bdfb170
The so currently run |
btw, this PR just modify array function with Spark semantics, and the |
9846350
to
009e2d6
Compare
@JiajunBernoulli @tanclary Hi, Sorry to ping you. It has been a long time for this PR. Do you have any other concerns? If you have any comments, I hope to receive your feedback. If there are some problems in my implementation, maybe you can give some specific opinions. I will appreciate it. |
So to clarify: you can have an array with multiple types, but only if they are NUMERIC/CHARACTER? What if you have another combination like [1, Date '2008-12-25', 'a']? |
@tanclary thanks for reviewing. I have updated the PR, if you help time, pls help to re-check it. btw, the cases which have multi-types just like the case '[1, Date '2008-12-25', 'a']' can't enter in |
fc60f5f
to
f3d434e
Compare
f3d434e
to
ef178dd
Compare
hi, @tanclary It’s been a long time since last review, could we continue on this PR (if you have time)? thanks. |
@chucheng92 Yes I think this looks good but I will wait for @JiajunBernoulli to approve/give + 1 because they also had some comments if I remember correctly. |
@chucheng92 , Compared to NPE, the is indeed an improvement. LGTM, remember to squash your commits before merging. |
f28e385
to
5f632c0
Compare
@tanclary @JiajunBernoulli thanks for your patient review. all comments resolved. @tanclary If you have time, could you help to merge this? |
@chucheng92 Hi yes, would you mind editing the commit message to add a space between "function" and "(Spark library)"? Apologies for the nit. Re-ping me after and I will help merge 👍 |
5f632c0
to
ae17bd2
Compare
…when elements have Numeric and Character types
ae17bd2
to
60797bf
Compare
Kudos, SonarCloud Quality Gate passed! |
Thanks, fixed it. |
This PR introduce a regression when building an array with structs. I defined in my codebase public class NamedStructUtils {
private NamedStructUtils() {
throw new UnsupportedOperationException("This is a utility class and cannot be instantiated");
}
public static SqlBasicFunction create() {
return SqlBasicFunction.create(
"NAMED_STRUCT",
NamedStructUtils::returnType,
OPERAND_TYPE_CHECKER);
}
private static String namedStructRequiresLiteralStringKeysGotExpression() {
return "named_struct key is not a string literal, got an expression";
}
private static String namedStructRequiresLiteralStringKeysGotOtherType(String tpe) {
return String.format("named_struct key is not a string literal, found type '%s'", tpe);
}
private static String namedStructRequiresEvenNumberOfArgs() {
return "named_struct requires an even number for arguments";
}
private static RelDataType returnType(SqlOperatorBinding operatorBinding) {
final List<String> keys = new ArrayList<String>();
final List<RelDataType> values = new ArrayList<RelDataType>();
int i = 0;
for (RelDataType type : operatorBinding.collectOperandTypes()) {
if (i % 2 == 0) {
Object key = operatorBinding.getOperandLiteralValue(i, Object.class);
if (key == null) {
throw new CalciteException(
namedStructRequiresLiteralStringKeysGotExpression(), null);
}
if (key instanceof NlsString) {
keys.add(((NlsString) key).getValue());
} else {
String tpe = key.getClass().getSimpleName();
throw new CalciteException(
namedStructRequiresLiteralStringKeysGotOtherType(tpe), null);
}
} else {
values.add(type);
}
i++;
}
final RelDataTypeFactory typeFactory = operatorBinding.getTypeFactory();
return typeFactory.createStructType(values, keys);
}
private static final SqlOperandTypeChecker TWO_OR_MORE =
OperandTypes.variadic(SqlOperandCountRanges.from(2));
private static final SqlOperandTypeChecker OPERAND_TYPE_CHECKER =
new SqlOperandTypeChecker() {
@Override public boolean checkOperandTypes(
SqlCallBinding callBinding,
boolean throwOnFailure) {
final boolean isRangeOk = TWO_OR_MORE.checkOperandTypes(callBinding, throwOnFailure);
final int operandCount = callBinding.operands().size();
final boolean isEvenNumberOfArgs = operandCount % 2 != 0;
if (throwOnFailure && isEvenNumberOfArgs) {
throw new CalciteException(namedStructRequiresEvenNumberOfArgs(), null);
}
return isRangeOk && isEvenNumberOfArgs;
}
@Override public SqlOperandCountRange getOperandCountRange() {
return TWO_OR_MORE.getOperandCountRange();
}
@Override public String getAllowedSignatures(SqlOperator op, String opName) {
return opName + "(...)";
}
};
} Since eg
|
@MasseGuillaume thanks for reporting this. I will try to reproduce/fix this. |
opened a ticket to track this: https://issues.apache.org/jira/browse/CALCITE-6127 |
What is the purpose of the change
https://issues.apache.org/jira/browse/CALCITE-5862
Brief change log
Fix incorrect spark semantics of array function when array elements have Numeric and Character types.
Verifying this change
SqlOperatorTests#testArrayFunc()