[SPARK-48824][SQL] Add Identity Column SQL syntax
### What changes were proposed in this pull request?

Add SQL support for creating identity columns. Users can declare a column as `GENERATED ALWAYS AS IDENTITY(identityColumnSpec)`, where identity values are **always** generated by the system, or as `GENERATED BY DEFAULT AS IDENTITY(identityColumnSpec)`, where users can specify the identity values themselves.

Users can optionally specify the starting value of the column (default = 1) and the increment/step of the column (default = 1). Both of the following orderings are accepted:
`START WITH <start> INCREMENT BY <step>`
and
`INCREMENT BY <step> START WITH <start>`

Either ordering of the increment and starting values is accepted because both variants are used in the wild by other systems (e.g. [PostgreSQL](https://www.postgresql.org/docs/current/sql-createsequence.html), [Oracle](https://docs.oracle.com/en/database/oracle/oracle-database/23/sqlrf/CREATE-SEQUENCE.html#GUID-E9C78A8C-615A-4757-B2A8-5E6EFB130571)).

For example, we can define

```
CREATE TABLE default.example (
  id LONG GENERATED ALWAYS AS IDENTITY,
  id1 LONG GENERATED ALWAYS AS IDENTITY(),
  id2 LONG GENERATED BY DEFAULT AS IDENTITY(START WITH 0),
  id3 LONG GENERATED ALWAYS AS IDENTITY(INCREMENT BY 2),
  id4 LONG GENERATED BY DEFAULT AS IDENTITY(START WITH 0 INCREMENT BY -10),
  id5 LONG GENERATED ALWAYS AS IDENTITY(INCREMENT BY 2 START WITH -8),
  value LONG
)
```
This will enable defining identity columns in Spark SQL for data sources that support it.
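
The option handling described above (defaults of 1 for start and step, either ordering accepted), together with the duplicate-option and zero-step errors added in `error-conditions.json`, can be sketched in plain Java as follows. This is a minimal illustration, not the parser code from this PR: the class `IdentityOptionsSketch`, its nested `Option` and `Spec` types, and the `resolve` method are hypothetical names, and mapping `BY DEFAULT` to `allowExplicitInsert = true` is an assumption based on the description of the two variants.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the names below are illustrative and are not part of Spark.
final class IdentityOptionsSketch {

    /** One option as written in SQL: kind is "START WITH" or "INCREMENT BY". */
    static final class Option {
        final String kind;
        final long value;

        Option(String kind, long value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Resolved specification, loosely mirroring the fields of IdentityColumnSpec. */
    static final class Spec {
        final long start;
        final long step;
        final boolean allowExplicitInsert;

        Spec(long start, long step, boolean allowExplicitInsert) {
            this.start = start;
            this.step = step;
            this.allowExplicitInsert = allowExplicitInsert;
        }
    }

    /** Accepts the options in any order, rejects duplicates and a zero step. */
    static Spec resolve(List<Option> options, boolean byDefault) {
        Long start = null;
        Long step = null;
        for (Option o : options) {
            if (o.kind.equals("START WITH")) {
                if (start != null) {
                    throw new IllegalArgumentException(
                        "Duplicated IDENTITY column sequence generator option: START WITH.");
                }
                start = o.value;
            } else if (o.kind.equals("INCREMENT BY")) {
                if (step != null) {
                    throw new IllegalArgumentException(
                        "Duplicated IDENTITY column sequence generator option: INCREMENT BY.");
                }
                if (o.value == 0) {
                    throw new IllegalArgumentException("IDENTITY column step cannot be 0.");
                }
                step = o.value;
            } else {
                throw new IllegalArgumentException("Unknown option: " + o.kind);
            }
        }
        // Both defaults are 1, as stated in the PR description.
        // Assumption: BY DEFAULT corresponds to allowing explicit inserts.
        return new Spec(start == null ? 1L : start, step == null ? 1L : step, byDefault);
    }

    public static void main(String[] args) {
        Spec s = resolve(
            Arrays.asList(new Option("INCREMENT BY", 2), new Option("START WITH", -8)),
            false);
        System.out.println(s.start + " " + s.step + " " + s.allowExplicitInsert);
    }
}
```

With this sketch, `INCREMENT BY 2 START WITH -8` and `START WITH -8 INCREMENT BY 2` resolve to the same specification, and an empty option list yields a start of 1 and a step of 1.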

To be more specific, this PR:

- Adds parser support for `GENERATED { BY DEFAULT | ALWAYS } AS IDENTITY` in create/replace table statements. Identity column specifications are temporarily stored in the field's metadata, then parsed and verified in DataSourceV2Strategy and used to instantiate the v2 [Column].
- Adds TableCatalog::capabilities() and TableCatalogCapability.SUPPORTS_CREATE_TABLE_WITH_IDENTITY_COLUMNS. This capability is used to determine whether to allow specifying identity columns or to throw an exception.
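
The capability check described in the second bullet can be sketched as below. This is an illustration under stated assumptions, not the analysis code from this PR: the `Capability` enum is a local stand-in for `TableCatalogCapability`, and `checkIdentityColumnsAllowed`, its parameters, and the exception type and message are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; it does not depend on Spark and only mirrors the idea.
final class IdentityCapabilitySketch {

    /** Local stand-in for the capability names that appear in this commit. */
    enum Capability {
        SUPPORT_COLUMN_DEFAULT_VALUE,
        SUPPORTS_CREATE_TABLE_WITH_IDENTITY_COLUMNS
    }

    /**
     * Throws if the schema declares an identity column but the catalog
     * does not advertise the identity-column capability.
     */
    static void checkIdentityColumnsAllowed(
            Set<Capability> catalogCapabilities,
            boolean schemaHasIdentityColumn) {
        if (schemaHasIdentityColumn
                && !catalogCapabilities.contains(
                    Capability.SUPPORTS_CREATE_TABLE_WITH_IDENTITY_COLUMNS)) {
            // Assumed error type and wording, chosen only for this sketch.
            throw new UnsupportedOperationException(
                "Catalog does not support creating tables with identity columns");
        }
    }

    public static void main(String[] args) {
        Set<Capability> supported =
            EnumSet.of(Capability.SUPPORTS_CREATE_TABLE_WITH_IDENTITY_COLUMNS);
        checkIdentityColumnsAllowed(supported, true); // passes silently
        System.out.println("identity columns allowed");
    }
}
```

A catalog that does not list the capability still accepts tables without identity columns; only a schema that actually declares one is rejected in this sketch.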

### Why are the changes needed?

A SQL API is needed to create Identity Columns.

### Does this PR introduce _any_ user-facing change?

It allows the aforementioned SQL syntax to create identity columns in a table.

### How was this patch tested?

Positive and negative unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47614 from zhipengmao-db/zhipengmao-db/SPARK-48824-id-syntax.

Authored-by: zhipeng.mao <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
zhipengmao-db authored and cloud-fan committed Sep 15, 2024
1 parent 1346531 commit 931ab06
Showing 22 changed files with 724 additions and 41 deletions.
24 changes: 24 additions & 0 deletions common/utils/src/main/resources/error/error-conditions.json
@@ -1589,6 +1589,30 @@
],
"sqlState" : "42601"
},
"IDENTITY_COLUMNS_DUPLICATED_SEQUENCE_GENERATOR_OPTION" : {
"message" : [
"Duplicated IDENTITY column sequence generator option: <sequenceGeneratorOption>."
],
"sqlState" : "42601"
},
"IDENTITY_COLUMNS_ILLEGAL_STEP" : {
"message" : [
"IDENTITY column step cannot be 0."
],
"sqlState" : "42611"
},
"IDENTITY_COLUMNS_UNSUPPORTED_DATA_TYPE" : {
"message" : [
"DataType <dataType> is not supported for IDENTITY columns."
],
"sqlState" : "428H2"
},
"IDENTITY_COLUMN_WITH_DEFAULT_VALUE" : {
"message" : [
"A column cannot have both a default value and an identity column specification but column <colName> has default value: (<defaultValue>) and identity column specification: (<identityColumnSpec>)."
],
"sqlState" : "42623"
},
"ILLEGAL_DAY_OF_WEEK" : {
"message" : [
"Illegal input for day of week: <string>."
2 changes: 2 additions & 0 deletions docs/sql-ref-ansi-compliance.md
@@ -536,12 +536,14 @@ Below is a list of all the keywords in Spark SQL.
|HOUR|non-reserved|non-reserved|non-reserved|
|HOURS|non-reserved|non-reserved|non-reserved|
|IDENTIFIER|non-reserved|non-reserved|non-reserved|
|IDENTITY|non-reserved|non-reserved|non-reserved|
|IF|non-reserved|non-reserved|not a keyword|
|IGNORE|non-reserved|non-reserved|non-reserved|
|IMMEDIATE|non-reserved|non-reserved|non-reserved|
|IMPORT|non-reserved|non-reserved|non-reserved|
|IN|reserved|non-reserved|reserved|
|INCLUDE|non-reserved|non-reserved|non-reserved|
|INCREMENT|non-reserved|non-reserved|non-reserved|
|INDEX|non-reserved|non-reserved|non-reserved|
|INDEXES|non-reserved|non-reserved|non-reserved|
|INNER|reserved|strict-non-reserved|reserved|
@@ -256,12 +256,14 @@ BINARY_HEX: 'X';
HOUR: 'HOUR';
HOURS: 'HOURS';
IDENTIFIER_KW: 'IDENTIFIER';
IDENTITY: 'IDENTITY';
IF: 'IF';
IGNORE: 'IGNORE';
IMMEDIATE: 'IMMEDIATE';
IMPORT: 'IMPORT';
IN: 'IN';
INCLUDE: 'INCLUDE';
INCREMENT: 'INCREMENT';
INDEX: 'INDEX';
INDEXES: 'INDEXES';
INNER: 'INNER';
@@ -1297,7 +1297,22 @@ colDefinitionOption
;

generationExpression
: GENERATED ALWAYS AS LEFT_PAREN expression RIGHT_PAREN
: GENERATED ALWAYS AS LEFT_PAREN expression RIGHT_PAREN #generatedColumn
| GENERATED (ALWAYS | BY DEFAULT) AS IDENTITY identityColSpec? #identityColumn
;

identityColSpec
: LEFT_PAREN sequenceGeneratorOption* RIGHT_PAREN
;

sequenceGeneratorOption
: START WITH start=sequenceGeneratorStartOrStep
| INCREMENT BY step=sequenceGeneratorStartOrStep
;

sequenceGeneratorStartOrStep
: MINUS? INTEGER_VALUE
| MINUS? BIGINT_LITERAL
;

complexColTypeList
@@ -1591,11 +1606,13 @@ ansiNonReserved
| HOUR
| HOURS
| IDENTIFIER_KW
| IDENTITY
| IF
| IGNORE
| IMMEDIATE
| IMPORT
| INCLUDE
| INCREMENT
| INDEX
| INDEXES
| INPATH
@@ -1942,12 +1959,14 @@ nonReserved
| HOUR
| HOURS
| IDENTIFIER_KW
| IDENTITY
| IF
| IGNORE
| IMMEDIATE
| IMPORT
| IN
| INCLUDE
| INCREMENT
| INDEX
| INDEXES
| INPATH
@@ -0,0 +1,88 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.connector.catalog;
import org.apache.spark.annotation.Evolving;

import java.util.Objects;

/**
* Identity column specification.
*/
@Evolving
public class IdentityColumnSpec {
private final long start;
private final long step;
private final boolean allowExplicitInsert;

/**
* Creates an identity column specification.
* @param start the start value to generate the identity values
* @param step the step value to generate the identity values
* @param allowExplicitInsert whether the identity column allows explicit insertion of values
*/
public IdentityColumnSpec(long start, long step, boolean allowExplicitInsert) {
this.start = start;
this.step = step;
this.allowExplicitInsert = allowExplicitInsert;
}

/**
* @return the start value to generate the identity values
*/
public long getStart() {
return start;
}

/**
* @return the step value to generate the identity values
*/
public long getStep() {
return step;
}

/**
* @return whether the identity column allows explicit insertion of values
*/
public boolean isAllowExplicitInsert() {
return allowExplicitInsert;
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
IdentityColumnSpec that = (IdentityColumnSpec) o;
return start == that.start &&
step == that.step &&
allowExplicitInsert == that.allowExplicitInsert;
}

@Override
public int hashCode() {
return Objects.hash(start, step, allowExplicitInsert);
}

@Override
public String toString() {
return "IdentityColumnSpec{" +
"start=" + start +
", step=" + step +
", allowExplicitInsert=" + allowExplicitInsert +
"}";
}
}
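
To make the `start` and `step` fields above concrete, the sketch below lists the first few values of an arithmetic sequence defined by them. It assumes a gap-free sequence `start + i * step`, which this commit does not specify; a data source is free to generate identity values differently. `IdentitySequenceSketch` and `firstValues` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch; assumes a gap-free arithmetic sequence, which is not
// something this commit defines.
final class IdentitySequenceSketch {

    /** Returns the first `count` values of start, start + step, start + 2 * step, ... */
    static long[] firstValues(long start, long step, int count) {
        long[] values = new long[count];
        for (int i = 0; i < count; i++) {
            // The exact-arithmetic helpers throw ArithmeticException on long overflow.
            values[i] = Math.addExact(start, Math.multiplyExact(step, (long) i));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstValues(-8L, 2L, 3)));
    }
}
```

For the `id5` column in the example table (`INCREMENT BY 2 START WITH -8`), this yields -8, -6, -4 under that assumption.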
@@ -556,6 +556,25 @@ private[sql] object QueryParsingErrors extends DataTypeErrorsBase {
ctx)
}

def identityColumnUnsupportedDataType(
ctx: IdentityColumnContext,
dataType: String): Throwable = {
new ParseException("IDENTITY_COLUMNS_UNSUPPORTED_DATA_TYPE", Map("dataType" -> dataType), ctx)
}

def identityColumnIllegalStep(ctx: IdentityColSpecContext): Throwable = {
new ParseException("IDENTITY_COLUMNS_ILLEGAL_STEP", Map.empty, ctx)
}

def identityColumnDuplicatedSequenceGeneratorOption(
ctx: IdentityColSpecContext,
sequenceGeneratorOption: String): Throwable = {
new ParseException(
"IDENTITY_COLUMNS_DUPLICATED_SEQUENCE_GENERATOR_OPTION",
Map("sequenceGeneratorOption" -> sequenceGeneratorOption),
ctx)
}

def createViewWithBothIfNotExistsAndReplaceError(ctx: CreateViewContext): Throwable = {
new ParseException(errorClass = "_LEGACY_ERROR_TEMP_0052", ctx)
}
@@ -53,7 +53,7 @@ static Column create(
boolean nullable,
String comment,
String metadataInJSON) {
return new ColumnImpl(name, dataType, nullable, comment, null, null, metadataInJSON);
return new ColumnImpl(name, dataType, nullable, comment, null, null, null, metadataInJSON);
}

static Column create(
@@ -63,7 +63,8 @@ static Column create(
String comment,
ColumnDefaultValue defaultValue,
String metadataInJSON) {
return new ColumnImpl(name, dataType, nullable, comment, defaultValue, null, metadataInJSON);
return new ColumnImpl(name, dataType, nullable, comment, defaultValue,
null, null, metadataInJSON);
}

static Column create(
@@ -74,7 +75,18 @@ static Column create(
String generationExpression,
String metadataInJSON) {
return new ColumnImpl(name, dataType, nullable, comment, null,
generationExpression, metadataInJSON);
generationExpression, null, metadataInJSON);
}

static Column create(
String name,
DataType dataType,
boolean nullable,
String comment,
IdentityColumnSpec identityColumnSpec,
String metadataInJSON) {
return new ColumnImpl(name, dataType, nullable, comment, null,
null, identityColumnSpec, metadataInJSON);
}

/**
@@ -113,6 +125,12 @@ static Column create(
@Nullable
String generationExpression();

/**
* Returns the identity column specification of this table column. Null means no identity column.
*/
@Nullable
IdentityColumnSpec identityColumnSpec();

/**
* Returns the column metadata in JSON format.
*/
@@ -59,5 +59,23 @@ public enum TableCatalogCapability {
* {@link TableCatalog#createTable}.
* See {@link Column#defaultValue()}.
*/
SUPPORT_COLUMN_DEFAULT_VALUE
SUPPORT_COLUMN_DEFAULT_VALUE,

/**
* Signals that the TableCatalog supports defining identity columns upon table creation in SQL.
* <p>
* Without this capability, any create/replace table statements with an identity column defined
* in the table schema will throw an exception during analysis.
* <p>
* An identity column is defined with syntax:
* {@code colName colType GENERATED ALWAYS AS IDENTITY(identityColumnSpec)}
* or
* {@code colName colType GENERATED BY DEFAULT AS IDENTITY(identityColumnSpec)}
* identityColumnSpec is defined with syntax: {@code [START WITH start | INCREMENT BY step]*}
* <p>
* IdentitySpec is included in the column definition for APIs like
* {@link TableCatalog#createTable}.
* See {@link Column#identityColumnSpec()}.
*/
SUPPORTS_CREATE_TABLE_WITH_IDENTITY_COLUMNS
}