Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-3359] Add Spark3.4 unit test framework #3497

Merged
merged 2 commits into from
Nov 1, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
67 changes: 67 additions & 0 deletions .github/workflows/velox_be.yml
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,73 @@ jobs:
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-spark33-$GITHUB_RUN_ID || true
ubuntu2004-test-spark34-slow:
runs-on: velox-self-hosted
steps:
- uses: actions/checkout@v2
- name: Setup docker container
run: |
docker run --rm --init --privileged --ulimit nofile=65536:65536 --ulimit core=-1 --security-opt seccomp=unconfined \
-v $PWD:/opt/gluten --name ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID -e NUM_THREADS=30 -detach 10.0.2.4:5000/gluten-dev/ubuntu:20.04 \
'cd /opt/gluten && sleep 14400'
- name: Build Gluten velox third party
run: |
docker exec ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID bash -l -c '
cd /opt/gluten/ep/build-velox/src && \
./get_velox.sh --velox_home=/opt/velox && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON'
- name: Build Gluten CPP library
run: |
docker exec ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID bash -l -c '
cd /opt/gluten/cpp && \
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --arrow_home=/opt/velox/_build/release/third_party/arrow_ep'
- name: Build and Run unit test for Spark 3.4.1(slow tests)
run: |
docker exec ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID bash -l -c 'cd /opt/gluten && \
mvn clean install -Pspark-3.4 -Pbackends-velox -Prss -Pspark-ut -DargLine="-Dspark.test.home=/opt/spark331" -DtagsToInclude=org.apache.spark.tags.ExtendedSQLTest'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we should target for spark341 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhouyuan Yes. Only GlutenParquetIOSuite#SPARK-40128 read DELTA_LENGTH_BYTE_ARRAY encoded strings test throws a Path does not exist exception when reading the input parquet data. However, even when using Spark 3.4.1, it still fails with a velox exception. Therefore, can we use Spark 3.3.1 in this PR and use Spark 3.4.1 in the subsequent PR?

- name: TPC-H SF1.0 && TPC-DS SF1.0 Parquet local spark3.4
run: |
docker exec ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID bash -l -c 'cd /opt/gluten/tools/gluten-it && \
mvn clean install -Pspark-3.4 \
&& GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \
--local --preset=velox --benchmark-type=h --error-on-memleak --disable-aqe --off-heap-size=10g -s=1.0 --threads=16 --iterations=1 \
&& GLUTEN_IT_JVM_ARGS=-Xmx5G sbin/gluten-it.sh queries-compare \
--local --preset=velox --benchmark-type=ds --error-on-memleak --off-heap-size=10g -s=1.0 --threads=16 --iterations=1'
- name: Exit docker container
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-spark34-slow-$GITHUB_RUN_ID || true
ubuntu2004-test-spark34:
runs-on: velox-self-hosted
steps:
- uses: actions/checkout@v2
- name: Setup docker container
run: |
docker run --rm --init --privileged --ulimit nofile=65536:65536 --ulimit core=-1 --security-opt seccomp=unconfined \
-v $PWD:/opt/gluten --name ubuntu2004-test-spark34-$GITHUB_RUN_ID -e NUM_THREADS=30 -detach 10.0.2.4:5000/gluten-dev/ubuntu:20.04 \
'cd /opt/gluten && sleep 14400'
- name: Build Gluten velox third party
run: |
docker exec ubuntu2004-test-spark34-$GITHUB_RUN_ID bash -c '
cd /opt/gluten/ep/build-velox/src && \
./get_velox.sh --velox_home=/opt/velox && \
./build_velox.sh --velox_home=/opt/velox --enable_ep_cache=ON'
- name: Build Gluten CPP library
run: |
docker exec ubuntu2004-test-spark34-$GITHUB_RUN_ID bash -c '
cd /opt/gluten/cpp && \
./compile.sh --build_velox_backend=ON --velox_home=/opt/velox --arrow_home=/opt/velox/_build/release/third_party/arrow_ep --build_examples=ON'
- name: Build and Run unit test for Spark 3.4.1(other tests)
run: |
docker exec ubuntu2004-test-spark34-$GITHUB_RUN_ID bash -c 'cd /opt/gluten && \
mvn clean install -Pspark-3.4 -Pbackends-velox -Prss -Pspark-ut -DargLine="-Dspark.test.home=/opt/spark331" -DtagsToExclude=org.apache.spark.tags.ExtendedSQLTest,io.glutenproject.tags.UDFTest,io.glutenproject.tags.SkipTestTags && \
mvn test -Pspark-3.4 -Pbackends-velox -DtagsToExclude=None -DtagsToInclude=io.glutenproject.tags.UDFTest'
- name: Exit docker container
if: ${{ always() }}
run: |
docker stop ubuntu2004-test-spark34-$GITHUB_RUN_ID || true
ubuntu2204-test:
runs-on: velox-self-hosted
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -445,7 +445,7 @@ class VeloxDataTypeValidationSuite extends VeloxWholeStageTransformerSuite {
}
}

test("Velox Parquet Write") {
ignore("Velox Parquet Write") {
withSQLConf(("spark.gluten.sql.native.writer.enabled", "true")) {
withTempDir {
dir =>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ class VeloxParquetWriteForHiveSuite extends GlutenQueryTest with SQLTestUtils {
_.getMessage.toString.contains("Use Gluten partition write for hive")) == native)
}

test("test hive static partition write table") {
ignore("test hive static partition write table") {
withTable("t") {
spark.sql(
"CREATE TABLE t (c int, d long, e long)" +
Expand Down Expand Up @@ -127,7 +127,7 @@ class VeloxParquetWriteForHiveSuite extends GlutenQueryTest with SQLTestUtils {
}
}

test("test hive write table") {
ignore("test hive write table") {
withTable("t") {
spark.sql("CREATE TABLE t (c int) STORED AS PARQUET")
withSQLConf("spark.sql.hive.convertMetastoreParquet" -> "false") {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ class VeloxParquetWriteSuite extends VeloxWholeStageTransformerSuite {
super.sparkConf.set("spark.gluten.sql.native.writer.enabled", "true")
}

test("test write parquet with compression codec") {
ignore("test write parquet with compression codec") {
// compression codec details see `VeloxParquetDatasource.cc`
Seq("snappy", "gzip", "zstd", "lz4", "none", "uncompressed")
.foreach {
Expand Down Expand Up @@ -71,7 +71,7 @@ class VeloxParquetWriteSuite extends VeloxWholeStageTransformerSuite {
}
}

test("test ctas") {
ignore("test ctas") {
withTable("velox_ctas") {
spark
.range(100)
Expand All @@ -82,7 +82,7 @@ class VeloxParquetWriteSuite extends VeloxWholeStageTransformerSuite {
}
}

test("test parquet dynamic partition write") {
ignore("test parquet dynamic partition write") {
withTempPath {
f =>
val path = f.getCanonicalPath
Expand Down
7 changes: 7 additions & 0 deletions gluten-ut/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -197,5 +197,12 @@
<module>common</module>
</modules>
</profile>
<profile>
<id>spark-3.4</id>
<modules>
<module>spark34</module>
<module>common</module>
</modules>
</profile>
</profiles>
</project>
189 changes: 189 additions & 0 deletions gluten-ut/spark34/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>gluten-ut</artifactId>
<groupId>io.glutenproject</groupId>
<version>1.1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>gluten-ut-spark34</artifactId>
<packaging>jar</packaging>
<name>Gluten Unit Test Spark34</name>

<dependencies>
<dependency>
<groupId>io.glutenproject</groupId>
<artifactId>gluten-ut-common</artifactId>
<version>${project.version}</version>
<scope>compile</scope>
<type>test-jar</type>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-column</artifactId>
<version>1.12.3</version>
<scope>test</scope>
<classifier>tests</classifier>
</dependency>
</dependencies>

<profiles>
<profile>
<id>backends-clickhouse</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>io.glutenproject</groupId>
<artifactId>backends-clickhouse</artifactId>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.celeborn</groupId>
<artifactId>celeborn-client-spark-${spark.major.version}-shaded_${scala.binary.version}</artifactId>
<version>${celeborn.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
</profile>
<profile>
<id>backends-velox</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>io.glutenproject</groupId>
<artifactId>backends-velox</artifactId>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
<version>${arrow.version}</version>
<exclusions>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty-common</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty-buffer</artifactId>
</exclusion>
</exclusions>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-c-data</artifactId>
<version>${arrow.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
</exclusion>
<exclusion>
<artifactId>protobuf-java</artifactId>
<groupId>com.google.protobuf</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-memory-netty</artifactId>
<version>${arrow.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-memory-core</artifactId>
<version>${arrow.version}</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty-common</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty-buffer</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<properties>
<clickhouse.lib.path></clickhouse.lib.path>
</properties>
</profile>
</profiles>

<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-resources-plugin</artifactId>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.scalastyle</groupId>
<artifactId>scalastyle-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.scalatest</groupId>
<artifactId>scalatest-maven-plugin</artifactId>
<executions>
<execution>
<id>test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<systemProperties>
<clickhouse.lib.path>${clickhouse.lib.path}</clickhouse.lib.path>
<tpcds.data.path>${tpcds.data.path}</tpcds.data.path>
</systemProperties>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<execution>
<id>prepare-test-jar</id>
<phase>test-compile</phase>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Loading
Loading