[Improvement] consider using ColumnPosition.LAST to replace columnPosition is null #401

Closed
FANNG1 opened this issue Sep 15, 2023 · 1 comment · Fixed by #927
Labels
improvement Improvements on everything

FANNG1 commented Sep 15, 2023

What would you like to be improved?

ColumnPosition can be first or after; a null value currently means last in Graviton. From a programming perspective this is confusing.
For SparkCatalog in Iceberg, null position is not allowed.

    if (add.position() instanceof TableChange.After) {
      pendingUpdate.moveAfter(DOT.join(add.fieldNames()), referenceField);
    } else if (add.position() instanceof TableChange.First) {
      pendingUpdate.moveFirst(DOT.join(add.fieldNames()));
    } else {
      Preconditions.checkArgument(
          add.position() == null,
          "Cannot add '%s' at unknown position: %s",
          DOT.join(add.fieldNames()),
          add.position());
    }

How should we improve?

For end users, if ColumnPosition is not specified, the Graviton server should fill in ColumnPosition.LAST as the default value. A null ColumnPosition should be illegal inside the Graviton server.
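
A minimal sketch of this proposal, assuming a simplified position model: the class `ColumnPositionSketch`, the `Simple` enum, and the helper `orLast` are hypothetical names for illustration and are not the project's actual API. The idea is that null is normalized once at the server boundary, so downstream code never has to branch on null.

```java
import java.util.Objects;

// Hypothetical, simplified model of a column position; not the project's real API.
final class ColumnPositionSketch {
  interface ColumnPosition {}

  // The two positions that need no extra data.
  enum Simple implements ColumnPosition { FIRST, LAST }

  // "After" carries the name of the column the new column should follow.
  static final class After implements ColumnPosition {
    final String column;

    After(String column) {
      this.column = Objects.requireNonNull(column, "column");
    }
  }

  // Normalizes a user-supplied position: an unspecified (null) position becomes LAST.
  static ColumnPosition orLast(ColumnPosition position) {
    return position == null ? Simple.LAST : position;
  }

  // Downstream code only sees non-null positions after normalization.
  static String describe(ColumnPosition position) {
    ColumnPosition p = orLast(position);
    if (p == Simple.FIRST) {
      return "first";
    }
    if (p == Simple.LAST) {
      return "last";
    }
    if (p instanceof After) {
      return "after " + ((After) p).column;
    }
    throw new IllegalArgumentException("Cannot add column at unknown position: " + p);
  }
}
```

With this sketch, `ColumnPositionSketch.describe(null)` and `ColumnPositionSketch.describe(ColumnPositionSketch.Simple.LAST)` are treated identically, which is exactly the behavior being proposed here.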

mchades commented Nov 29, 2023

I hold the view that Gravitino should not replace a null position with ColumnPosition.LAST by default.
Instead, handling null values should be the responsibility of each catalog.

This is because catalogs handle a null position differently. For example, Hive defaults to adding new columns at the end of the non-partitioned columns (#871 follows the Hive default behavior), while Iceberg adds them after the last column (native Iceberg throws an exception, but #383 made an implicit conversion).

Filling in a uniform default value would make it difficult for catalogs to distinguish between "Last" and null.
@jerryshao What's your opinion?
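
One way to illustrate this concern is a sketch in which "unspecified" is kept as its own value rather than being folded into "last". Everything here is hypothetical: the class `CatalogDefaultPositionSketch`, the `Position` enum, and both helper methods are illustrative names, and the column lists stand in for real table schemas. It only models the two behaviors described above, under the assumption that a Hive-like table keeps its partition columns after the data columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from a real catalog API.
final class CatalogDefaultPositionSketch {
  // DEFAULT means "the caller did not specify a position"; it is distinct from an explicit choice.
  enum Position { FIRST, DEFAULT }

  // An unspecified (null) position becomes DEFAULT, and each catalog interprets DEFAULT itself.
  static Position orDefault(Position position) {
    return position == null ? Position.DEFAULT : position;
  }

  // Hive-like: DEFAULT appends after the last non-partition column; partition columns stay last.
  static List<String> addHiveLike(
      List<String> dataColumns, List<String> partitionColumns, String newColumn, Position position) {
    List<String> result = new ArrayList<>(dataColumns);
    if (orDefault(position) == Position.FIRST) {
      result.add(0, newColumn);
    } else {
      result.add(newColumn);
    }
    result.addAll(partitionColumns);
    return result;
  }

  // Iceberg-like (with an implicit conversion): DEFAULT appends after the last column of the schema.
  static List<String> addIcebergLike(List<String> columns, String newColumn, Position position) {
    List<String> result = new ArrayList<>(columns);
    if (orDefault(position) == Position.FIRST) {
      result.add(0, newColumn);
    } else {
      result.add(newColumn);
    }
    return result;
  }
}
```

In this sketch, adding column `c` with a null position to data columns `a, b` and partition column `dt` yields `a, b, c, dt` for the Hive-like helper, whereas the Iceberg-like helper applied to the flat list `a, b, dt` yields `a, b, dt, c`. A single server-side "last" could not express both outcomes.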

jerryshao pushed a commit that referenced this issue Dec 5, 2023
### What changes were proposed in this pull request?

Introduce a default column position for unspecified positions when
adding a column.

### Why are the changes needed?

Passing null values in code carries risks and uncertainties. Using
default values can solve this problem and improve the robustness of the
code.


Fix: #401 

### Does this PR introduce _any_ user-facing change?

No. Users can still pass null, but it will be internally converted to the
default value.

### How was this patch tested?
Added new unit tests.