Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1] Add simplified AST classes and visitors #1579

Merged
merged 11 commits into from
Oct 8, 2024
Merged

[v1] Add simplified AST classes and visitors #1579

merged 11 commits into from
Oct 8, 2024

Conversation

alancai98
Copy link
Member

@alancai98 alancai98 commented Sep 13, 2024

Relevant Issues

Description

  • Adds simplified AST classes and visitors written in Java
  • Also unnests existing AST classes to 1 level of nesting maximum (e.g. will have foo.bar)
  • Subsequent PRs will
    • Add docs for new classes
    • Add equals/hashcode impls
    • Integrate w/ existing v1 code
    • Add factory functions + builders
  • Note: DDL has also been removed for now. DDL will be added back at some point before the v1 release to show we can add new features without breaking existing code.

Other Information

  • Updated Unreleased Section in CHANGELOG: [NO]

    • No -- v1 branch
  • Any backward-incompatible changes? [NO]

  • Any new external dependencies? [NO]

  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? [YES]

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 self-assigned this Sep 13, 2024
Base automatically changed from v1-upgrade-1.9 to v1 September 16, 2024 23:41
@alancai98 alancai98 marked this pull request as draft September 17, 2024 23:49
@alancai98
Copy link
Member Author

Converting to a draft. From offline discussion, will look into

  • rewriting using simple Java classes
  • see the viability of using Lombok w/ the Java classes
  • unnest some AST nodes

@alancai98 alancai98 marked this pull request as ready for review September 23, 2024 23:48
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I've already brought this up in person, I'm adding some context and links here.

Thinking about the backwards/forwards compatibility, I think it would be a better approach to use integers rather than enums. Some Java libraries have done this, and I've also seen a similar approach from protobuf. Here's an interesting issue that discusses the downsides of using closed enums.

We could potentially have a model that looks something like this:

public interface IdentifierUnqualified {
    ...
    CaseSensitivity getCase();
}

public interface Enum {
    int getValue();
}

public interface CaseSensitivity implements Enum {
    static const int UNKNOWN = 0;
    static const int SENSITIVE = 1;
    static const int INSENSITIVE = 2;
}

OR, we could just use the integers directly without the Enum. Though, the enum does provide the possibility for better readability (toString overriding):

public interface IdentifierUnqualified {
    ...
    /**
     * Valid returns are specified as constants in: `CaseSensitivity.CASE_*`
     */
    int getCase();
}

public class CaseSensitivity {
    static const int CASE_UNKNOWN = 0;
    static const int CASE_SENSITIVE = 1;
    static const int CASE_INSENSITIVE = 2;
}

This modeling enforces that Java users use the default branch for switch statements, protecting us from backward compatibility issues. Also, the UNKNOWN case helps us in forward compatibility scenarios for deserialization. This is an approach that protobuf takes.

Our internal implementations, following the first model from above, could look like:

// Internal impl file
val id = idUnqualified("some_table", CaseSensitivity.SENSITIVE)
when (id.getCase().getValue()) {
    CaseSensitivity.SENSITIVE -> TODO()
    CaseSensitivity.INSENSITIVE -> TODO()
    else -> error("Unsupported. Or we can also convert to the UNKNOWN variant if need be.")
}

fun idUnqualified(name: String, case: Int): IdentifierUnqualified {
    return IdentifierUnqualifiedImpl(name, case)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the dive-deep into enum compatibility. Agree w/ the points that Java's closed enums are an issue and shouldn't be relied upon in our public APIs.

I'm open to using integers in some form. Probably will discuss w/ you offline further on the general API we should use:

  • interfaces?
  • classes w/ constants?
  • do we still need an unknown/other variant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion, chose to define an internal Enum interface within the partiql-ast package. Enums we which to add will be classes that implement this interface with a default constant of UNKNOWN set to 0.

There will be a private constructor with public functions to create.

E.g. for Order:

public class Order implements Enum {
    public static final int UNKNOWN = 0;
    public static final int ASC = 1;
    public static final int DESC = 2;

    public static Order UNKNOWN() {
        return new Order(UNKNOWN);
    }

    public static Order ASC() {
        return new Order(ASC);
    }

    public static Order DESC() {
        return new Order(DESC);
    }

    private final int code;

    private Order(int code) {
        this.code = code;
    }

    @Override
    public int code() {
        return code;
    }
}

And example client code calling and using this class:

Order order = Order.ASC();
switch (order.code()) {
    case Order.ASC:
        System.out.println("Ascending order");
        break;
    case Order.DESC:
        System.out.println("Descending order");
        break;
    default:
        System.out.println("Unknown order");
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to follow SQL more closely.

/**
 * <element reference> ::= <array value expression> <left bracket or trigraph> <numeric value expression> <right bracket or trigraph>
 * Example: some_array_ref[2 + 2], some_ref['hello'], some_ref[CAST('' || 'a' AS STRING)]
 */
class ElementReference extends Expr {
    @NotNull
    Expr root;

    @NotNull
    Expr element;
}

/**
 * <field reference> ::= <value expression primary> <period> <field name>
 */
class FieldReference {
    @NotNull
    Expr root;

    @NotNull
    Identifier field;
}

/**
 * This is used as part of the projection list in SQL, but it directly uses <qualified asterisk>.
 * <all fields reference> ::= <value expression primary> <period> <asterisk>
 */
class AllFieldsReference {
  @NotNull
  Expr root;
}


/**
 * Below is not specified by SQL:1999.
 * <all elements reference> ::= <value expression primary> <left bracket or trigraph> <asterisk> <right bracket or trigraph>
 */
class AllElementsReference {
  @NotNull
  Expr root;
}

Anyway, you could scrap the use of ExprPath and ExprPathStep, and builders would help with code-readability.

// SELECT * FROM t WHERE t['a'].b[0] > 2
val query = QueryBuilder()
    .selectStar()
    .from("t")
    .where(ExprBuilder().id("t").elementRef("a").fieldRef("b").elementRef(0).gt(2))
    .build()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I like the existing modeling of path expressions for a couple reasons:

  1. Path is a commonly used term in the PartiQL spec whereas reference is somewhat overloaded (e.g. variable references)
  2. Modeling the steps as a list with steps ordered left to right rather than a recursive chain seems a bit more intuitive to me (probably since that's what I've often been working with in this code base).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion, will model w/ a root and steps but will model the steps using a linked list rather than a java collection list. Also I'm coming around to the naming you had suggested for the path steps. I made some slight adjustments to be a less verbose (i.e. removing Reference).

/**
* TODO docs, equals, hashcode
*/
public class TypeAny extends Type {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd defer to @jpschorr, but this may be a good opportunity for us to use the TypeDynamic naming since these changes are breaking anyways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if TypeAny/TypeDynamic was used anywhere. I'll delete this class for now. Can always add back once there's a need.

Comment on lines +14 to +15
* TODO docs, equals, hashcode
*/
public class ExprIsType extends Expr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm aware, IS [NOT] <type> is still in the implementation. Are you suggestion to remove it?

/**
* TODO docs, equals, hashcode
*/
public class ExprDateDiff extends Expr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really hope we don't need to add this to the AST. Same goes for DATE_ADD. I wonder if there's a way we can rewrite this to be +/- for datetimes and intervals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I don't have much context for DATE_ADD/DATE_DIFF being added. It's not in the SQL/PartiQL specs. Probably was added in lieu of interval support in the v0.1 version of PLK.

It's somewhat strange that we chose to model the first arg, DatetimeField, as a keyword rather than a string, which means we needed to add a special parsing rule into the grammar. I'm not opposed to removing it for v1 but have concerns if some customers are relying on the existing syntax already.

Will discuss offline w/ you and other PLK folks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From offline discussion: we'll remove DATE_ADD and DATE_DIFF from the AST with the eventual plan to support the +/- syntax with intervals.

To support customers dependent on these existing functions and special keyword syntax, we will still support parsing the special syntax but

  • convert the datetime keyword (e.g. month, day) into a string or
  • map to different functions for each variant (e.g. date_add_month, date_diff_day)

DATE_ADD and DATE_DIFF will still have support in the plan and evaluator.

Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! I've gone through most of the files, though several remain. I can re-review once some modifications have been made.

Copy link

github-actions bot commented Oct 7, 2024

CROSS-ENGINE-REPORT ❌

BASE (LEGACY-V0.14.8) TARGET (EVAL-469780F) +/-
% Passing 89.67% 94.22% 4.55% ✅
Passing 5287 5555 268 ✅
Failing 609 54 -555 ✅
Ignored 0 287 287 🔶
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: 469780f
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2642
  • Failing in both: 18
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 1
  • PASSING in BASE but now IGNORED in TARGET: 111
  • FAILING in BASE but now PASSING in TARGET: 179
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 1 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. data type mismatch in logical expression, compileOption: PERMISSIVE

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

179 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-469780F). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ✅

BASE (EVAL-9B16D1D) TARGET (EVAL-469780F) +/-
% Passing 94.22% 94.22% 0.00% ✅
Passing 5555 5555 0 ✅
Failing 54 54 0 ✅
Ignored 287 287 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: 9b16d1d
  • Base Engine: EVAL
  • Target Commit: 469780f
  • Target Engine: EVAL

Result Details

  • Passing in both: 5555
  • Failing in both: 54
  • Ignored in both: 287
  • PASSING in BASE but now FAILING in TARGET: 0
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0
  • IGNORED in BASE but now PASSING in TARGET: 0

@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to enforce immutable structures, we could write getters (no setters). Or use final. Or, in the javadocs state that properties should not be modified.

@alancai98
Copy link
Member Author

If we want to enforce immutable structures, we could write getters (no setters). Or use final. Or, in the javadocs state that properties should not be modified.

I'll save the discussion for mutable vs immutable fields for another PR/discussion. I like immutable structures for most purposes. However I've dealt with some annoying patterns when I needed to modify parts of the AST which is a common use case for tree rewriting.

Copy link

github-actions bot commented Oct 7, 2024

CROSS-ENGINE-REPORT ❌

BASE (LEGACY-V0.14.8) TARGET (EVAL-DE2BFAF) +/-
% Passing 89.67% 94.22% 4.55% ✅
Passing 5287 5555 268 ✅
Failing 609 54 -555 ✅
Ignored 0 287 287 🔶
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: de2bfaf
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2642
  • Failing in both: 18
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 1
  • PASSING in BASE but now IGNORED in TARGET: 111
  • FAILING in BASE but now PASSING in TARGET: 179
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 1 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

Click here to see
  1. data type mismatch in logical expression, compileOption: PERMISSIVE

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

179 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-DE2BFAF). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ✅

BASE (EVAL-9B16D1D) TARGET (EVAL-DE2BFAF) +/-
% Passing 94.22% 94.22% 0.00% ✅
Passing 5555 5555 0 ✅
Failing 54 54 0 ✅
Ignored 287 287 0 ✅
Total Tests 5896 5896 0 ✅

Testing Details

  • Base Commit: 9b16d1d
  • Base Engine: EVAL
  • Target Commit: de2bfaf
  • Target Engine: EVAL

Result Details

  • Passing in both: 5555
  • Failing in both: 54
  • Ignored in both: 287
  • PASSING in BASE but now FAILING in TARGET: 0
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 0
  • IGNORED in BASE but now PASSING in TARGET: 0

@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@alancai98
Copy link
Member Author

Hm not sure why the conformance test commenting is publishing a new comment for every commit rather than editing an existing comment. Deleting the stale comments for now. I'll investigate further if this happens for other PRs.

@partiql partiql deleted a comment from github-actions bot Oct 7, 2024
@alancai98 alancai98 merged commit 2fc23c7 into v1 Oct 8, 2024
14 checks passed
@alancai98 alancai98 deleted the v1-simplify-ast branch October 8, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants