Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-5825] Add URL_ENCODE and URL_DECODE function (enabled in Spark library) #3318

Merged
merged 1 commit into from
Oct 27, 2023

Conversation

herunkang2018
Copy link
Contributor

No description provided.

@sonarcloud
Copy link

sonarcloud bot commented Jul 14, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

89.2% 89.2% Coverage
0.0% 0.0% Duplication

@herunkang2018
Copy link
Contributor Author

@NobiGo would you like to help review if you have time? Thanks very much.

import static org.apache.calcite.util.Static.RESOURCE;

/**
* A collection of functions used in Url processing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is UrlFunctions only available in Spark?

If so, I think the comments could be more detailed in the description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an util class for all the general Url processing funcitons, not only for Spark.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it

private static final Charset UTF_8 = StandardCharsets.UTF_8;

/** The "URL_DECODE(string)" function. */
public static String urlDecode(String value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value is null or empty string "",What is the correct behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. The correct behavior is to return empty string. I will add two unit tests to cover it.

Copy link
Contributor Author

@herunkang2018 herunkang2018 Oct 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For null check, the ARG0 of NullPolicy will do null check in code generation phase, so I think there is no need to implement null check in each function implementation additionally.

String url;
try {
url = URLDecoder.decode(value, UTF_8.name());
} catch (UnsupportedEncodingException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides UnsupportedEncodingException, are there any other exceptions to handle?

Like some RuntimeException,such as NullPointerException.

Copy link
Contributor Author

@herunkang2018 herunkang2018 Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for remind. RuntimeException does not need to explicitly throw, so easy to forget to handle it. I check the RuntimeException in URLDecoder.decode, there will be IllegalArgumentException when encounter the unencoded %.

The next I will do are two things:

  1. Keep current exception handling way. Since IllegalArgumentException exception messages are very diverse, I think that not handling with it explicitly here is reasonable, and just throws IllegalArgumentException when it encountered.
  2. Add exception test cases to cover IllegalArgumentException.

Copy link
Contributor Author

@herunkang2018 herunkang2018 Oct 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test cases to cover IllegalArgumentException is added.

}

/** The "URL_ENCODE(string)" function. */
public static String urlEncode(String url) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

@herunkang2018 herunkang2018 Sep 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be RuntimeException, including NullPointerException/IllegalCharsetNameException/UnsupportedCharsetException, since we use UTF-8 charset, and this charset always exists. So these RuntimeException will never appear. I think we can ignore them.

@LakeShen
Copy link
Contributor

@herunkang2018 ,thank you for opening this PR, I left a couple of comments,overall looks good.

@herunkang2018
Copy link
Contributor Author

@LakeShen Thanks a lot for the review. All comments resolved, please help to review again if you have time.

@LakeShen
Copy link
Contributor

LakeShen commented Oct 2, 2023

@herunkang2018 LGTM.

And need the second people to review it:)

@@ -2854,6 +2854,8 @@ BigQuery's type system uses confusingly different names for types and functions:
| b | UNIX_MILLIS(timestamp) | Returns the number of milliseconds since 1970-01-01 00:00:00
| b | UNIX_SECONDS(timestamp) | Returns the number of seconds since 1970-01-01 00:00:00
| b | UNIX_DATE(date) | Returns the number of days since 1970-01-01
| s | URL_DECODE(string) | Decodes a *string* in 'application/x-www-form-urlencoded' format using a specific encoding scheme
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should specify what happens if the function fails to encode/decode. Is the result NULL or a runtime error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for remind, I will check it soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will return original string when decoded error. I have fixed this and added the behaviour to doc.

Copy link
Contributor

@JiajunBernoulli JiajunBernoulli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good, would you please squash your commits?

@sonarcloud
Copy link

sonarcloud bot commented Oct 27, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 4 Code Smells

90.2% 90.2% Coverage
0.0% 0.0% Duplication

@herunkang2018
Copy link
Contributor Author

@JiajunBernoulli Sure, the commit is squashed.

@JiajunBernoulli JiajunBernoulli merged commit ad2e843 into apache:main Oct 27, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants