Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setup for Wrangler SQL Execution Support #690

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

harshdeeppruthi
Copy link

This PR is responsible for the following:

  1. Adding the initial setup for the SQL Execution support for Wrangler Directives using CDAP's relational API
  2. Adding a UI toggle to enable this SQL Execution of directives

This includes implementation of sql support of drop directive as an example. Once this is merged, the SQL support for other directives can be extended in a similar way

@@ -126,4 +129,11 @@ default List<EntityCountMetric> getCountMetrics() {
// no op
return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a description for this method similar to getCountMetrics()

@Override
default Relation transform(RelationalTranformContext relationalTranformContext,
Relation relation) {
// no-op
Copy link
Contributor

@albertshau albertshau Jan 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking just at this class, it seems like every existing directive would become a no-op in SQL, which would be incorrect. It only makes sense when seeing the isSQLSupported() method in the RelationalTransform interface, and that it defaults to false.

I think it would be cleaner if the Directive interface did not extend the RelationalTransform interface, with only directives that do support SQL (like Drop) implementing it. Then the isSQLSupported() method is not needed, Wrangler just checks if the directive is an instance of RelationTransform. This way it is impossible to implement a directive where the transform() and isSQLSupported() methods don't match up.

@Description("Toggle to configure execution language between JEXL and SQL")
@Macro
@Nullable
private String sqlExecution;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a newline after this line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to name this something like 'executionMode' and have an internal Enum for it. This is more extensible than a boolean toggle for SQL or no SQL, as it allows us to add new modes in the future if needed. (As a side note, when a property is meant to be a toggle, it should be of Boolean type, not String type).

For example, we may eventually want to add an 'auto' mode that lets the engine choose, or maybe some BigQuery specific SQL mode or something like that.

@@ -646,6 +717,11 @@ public static class Config extends PluginConfig {
static final String NAME_SCHEMA = "schema";
static final String NAME_ON_ERROR = "on-error";

@Name(NAME_SQL_EXECUTION)
@Description("Toggle to configure execution language between JEXL and SQL")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JEXL is a technical term that many users won't understand. It's also not technically correct, as many directives are direct java implementations that don't use jexl scripting. Not sure what would be a better term though. Native? Default?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants