[#614] feat(docker): Gravitino Trino Docker image #702

Merged: 3 commits merged on Nov 8, 2023.
25 changes: 20 additions & 5 deletions .github/workflows/docker-image.yml
@@ -3,6 +3,14 @@ name: Publish Docker Image
 on:
   workflow_dispatch:
     inputs:
+      image:
+        type: choice
+        description: 'Choose the image to build'
+        required: true
+        default: 'gravitino-ci-hive'
+        options:
+          - 'gravitino-ci-hive'
+          - 'gravitino-ci-trino'
       tag:
         description: 'Docker tag to apply to this image'
         required: true
@@ -11,8 +19,6 @@ on:
         description: 'Publish Docker token'
         required: true
         type: string
-env:
-  HIVE_IMAGE_NAME: datastrato/gravitino-ci-hive
 
 jobs:
   publish-docker-image:
@@ -22,6 +28,16 @@ jobs:
       input_token: ${{ github.event.inputs.token }}
       secrets_token: ${{ secrets.PUBLISH_DOCKER_TOKEN }}
     steps:
+      - name: Set environment variables
+        run: |
+          if [ "${{ github.event.inputs.image }}" == "gravitino-ci-hive" ]; then
+            echo "image_type=hive" >> $GITHUB_ENV
+            echo "image_name=datastrato/gravitino-ci-hive" >> $GITHUB_ENV
+          elif [ "${{ github.event.inputs.image }}" == "gravitino-ci-trino" ]; then
+            echo "image_type=trino" >> $GITHUB_ENV
+            echo "image_name=datastrato/gravitino-ci-trino" >> $GITHUB_ENV
+          fi
+
       - uses: actions/checkout@v3
 
       - name: Check publish Docker token
@@ -45,8 +61,7 @@ jobs:
 
       - name: Build and Push the main branch Docker image
         if: ${{ github.ref_name == 'main' }}
-        run: ./dev/docker/hive/build-docker.sh --platform all --image ${HIVE_IMAGE_NAME} --tag ${{ github.event.inputs.tag }} --latest
-
+        run: ./dev/docker/build-docker.sh --platform all --type ${image_type} --image ${image_name} --tag ${{ github.event.inputs.tag }} --latest
       - name: Build and Push the other branch Docker image
         if: ${{ github.ref_name != 'main' }}
-        run: ./dev/docker/hive/build-docker.sh --platform all --image ${HIVE_IMAGE_NAME} --tag ${{ github.event.inputs.tag }}
+        run: ./dev/docker/build-docker.sh --platform all --type ${image_type} --image ${image_name} --tag ${{ github.event.inputs.tag }}
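The "Set environment variables" step maps the `image` input to `image_type` and `image_name` with an if/elif chain, so a value matching neither branch would leave both variables unset and the build steps would run with empty `--type` and `--image` arguments. Because `image` is a `choice` input this should not happen from the UI, but a `case` statement with an explicit error branch is a common defensive alternative. A minimal sketch, assuming every image is published as `datastrato/<image>` (true for the two options in this PR); `resolve_image` is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: map the workflow's "image" choice to the component type
# passed to build-docker.sh (--type) and the Docker Hub repository (--image).
resolve_image() {
  local choice="$1"
  case "${choice}" in
    gravitino-ci-hive)  image_type="hive" ;;
    gravitino-ci-trino) image_type="trino" ;;
    *)
      # Fail loudly instead of leaving image_type/image_name unset.
      echo "ERROR : unsupported image '${choice}'" >&2
      return 1
      ;;
  esac
  # Assumption: every image is published as datastrato/<choice>.
  image_name="datastrato/${choice}"
}
```

In the workflow step, the function would be called with `"${{ github.event.inputs.image }}"` and its results appended to `$GITHUB_ENV` as `image_type=...` and `image_name=...`, exactly as the existing step does.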
40 changes: 25 additions & 15 deletions dev/docker/hive/README.md → dev/docker/README.md
@@ -2,29 +2,26 @@
   Copyright 2023 Datastrato.
   This software is licensed under the Apache License version 2.
 -->
-# Hadoop and Hive Docker image
-This Docker image is used to support Gravitino integration testing.
-It includes Hadoop-2.x and Hive-2.x, you can use this Docker image to test the Gravitino catalog-hive module.
+# Gravitino Docker images
+These Docker images are designed to facilitate Gravitino integration testing.
Review thread:
Contributor: Remove the trailing whitespace.
Contributor: This is not fixed, don't forget to do it.
Member Author: DONE
+The images can be used to test all catalog and connector modules within Gravitino.
 
-## Build Docker image
-```
-./build-docker.sh --platform [all|linux/amd64|linux/arm64] --image {image_name} --tag {tag_name} --latest
-```
+# Datastrato Docker Hub repository
+- [Datastrato Docker Hub repository address](https://hub.docker.com/r/datastrato)
 
-## Run container
+## How to build Docker image
 ```
-docker run --rm -d -p 8022:22 -p 8088:8088 -p 9000:9000 -p 9083:9083 -p 10000:10000 -p 10002:10002 -p 50070:50070 -p 50075:50075 -p 50010:50010 datastrato/gravitino-ci-hive
+./build-docker.sh --platform [all|linux/amd64|linux/arm64] --type [hive|trino] --image {image_name} --tag {tag_name} --latest
 ```
 
-## Login Docker container
+# Version change history
+## Gravitino CI Hive
+
+### Container startup commands
 ```
-ssh -p 8022 datastrato@localhost (password: ds123, this is a sudo user)
+docker run --rm -d -p 8022:22 -p 8088:8088 -p 9000:9000 -p 9083:9083 -p 10000:10000 -p 10002:10002 -p 50070:50070 -p 50075:50075 -p 50010:50010 datastrato/gravitino-ci-hive
 ```
 
-# Docker hub repository
-- [datastrato/gravitino-ci-hive](https://hub.docker.com/r/datastrato/gravitino-ci-hive)
-
-## Version change history
 ### 0.1.0
 - Docker image `datastrato/gravitino-ci-hive:0.1.0`
 - `hadoop-2.7.3`
@@ -62,3 +59,16 @@ ssh -p 8022 datastrato@localhost (password: ds123, this is a sudo user)
 
 ### 0.1.5
 - Rollback `Map container hostname to 127.0.0.1 before starting Hadoop` of `datastrato/gravitino-ci-hive:0.1.4`
+
+## Gravitino CI Trino
+
+### Container startup commands
+```
+docker run --rm -it -p 8080:8080 datastrato/gravitino-ci-trino
+```
+
+### 0.1.0
+- Docker image `datastrato/gravitino-ci-trino:0.1.0`
+- Based on `trinodb/trino:426`, with unused plugins removed.
+- Expose ports:
+  - `8080` Trino HTTP port (used for JDBC connections)
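The Trino container above is started in the foreground with `-it`; for scripted integration tests it is usually started detached (`-d`) and polled until the coordinator has finished starting. A minimal sketch, assuming Trino's `/v1/info` endpoint on the published port 8080 returns JSON with a `"starting"` flag, and that `curl` is available on the host; `trino_info_ready` and `wait_for_trino` are hypothetical names:

```shell
#!/bin/bash
# Sketch: poll a Trino coordinator until it reports that startup is complete.
# Assumption: GET /v1/info returns JSON containing a "starting" boolean.

# Succeeds when the given /v1/info response body contains "starting": false.
trino_info_ready() {
  local body="$1"
  # Allow optional whitespace around the colon in the JSON.
  local re='"starting"[[:space:]]*:[[:space:]]*false'
  [[ "${body}" =~ ${re} ]]
}

# Polls the info URL about once per second, giving up after timeout_secs attempts.
wait_for_trino() {
  local url="${1:-http://localhost:8080/v1/info}"
  local timeout_secs="${2:-60}"
  local waited=0 body
  while (( waited < timeout_secs )); do
    # Connection errors are expected while the container is still booting.
    body="$(curl -fsS "${url}" 2>/dev/null || true)"
    if trino_info_ready "${body}"; then
      echo "Trino is ready at ${url}"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Timed out after ${timeout_secs} attempts waiting for ${url}" >&2
  return 1
}
```

For example: `docker run --rm -d -p 8080:8080 datastrato/gravitino-ci-trino && wait_for_trino http://localhost:8080/v1/info 120`.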
99 changes: 99 additions & 0 deletions dev/docker/build-docker.sh
@@ -0,0 +1,99 @@
#!/bin/bash
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
set -ex
script_dir="$(dirname "${BASH_SOURCE-$0}")"
script_dir="$(cd "${script_dir}">/dev/null; pwd)"

# Build docker image for multi-arch
USAGE="-e Usage: ./build-docker.sh --platform [all|linux/amd64|linux/arm64] --type [hive|trino] --image {image_name} --tag {tag_name} --latest"

# Get platform type
if [[ "$1" == "--platform" ]]; then
  shift
  platform_type="$1"
  if [[ "${platform_type}" == "linux/amd64" || "${platform_type}" == "linux/arm64" || "${platform_type}" == "all" ]]; then
    echo "INFO : platform type is ${platform_type}"
  else
    echo "ERROR : ${platform_type} is not a valid platform type"
    echo ${USAGE}
    exit 1
  fi
  shift
else
  platform_type="all"
fi

# Get component type
if [[ "$1" == "--type" ]]; then
  shift
  component_type="$1"
  shift
else
  echo "ERROR : must specify component type"
  echo ${USAGE}
  exit 1
fi

# Get docker image name
if [[ "$1" == "--image" ]]; then
  shift
  image_name="$1"
  shift
else
  echo "ERROR : must specify image name"
  echo ${USAGE}
  exit 1
fi

# Get docker image tag
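if [[ "$1" == "--tag" ]]; then
  shift
  tag_name="$1"
  shift
else
  # The build commands below always use ${image_name}:${tag_name}
  echo "ERROR : must specify tag name"
  echo ${USAGE}
  exit 1
fi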

# Get latest flag
build_latest=0
if [[ "$1" == "--latest" ]]; then
  shift
  build_latest=1
fi

if [[ "${component_type}" == "hive" ]]; then
  . ${script_dir}/hive/hive-dependency.sh
  build_args="--build-arg HADOOP_PACKAGE_NAME=${HADOOP_PACKAGE_NAME} --build-arg HIVE_PACKAGE_NAME=${HIVE_PACKAGE_NAME}"
elif [[ "${component_type}" == "trino" ]]; then
  true # Placeholder, do nothing
else
  echo "ERROR : ${component_type} is not a valid component type"
  echo ${USAGE}
  exit 1
fi

# Create multi-arch builder
BUILDER_NAME="gravitino-builder"
builders=$(docker buildx ls)
if echo "${builders}" | grep -q "${BUILDER_NAME}"; then
  echo "BuildKit builder '${BUILDER_NAME}' already exists."
else
  echo "BuildKit builder '${BUILDER_NAME}' does not exist."
  docker buildx create --platform linux/amd64,linux/arm64 --use --name ${BUILDER_NAME}
fi

cd ${script_dir}/${component_type}
if [[ "${platform_type}" == "all" ]]; then
  if [ ${build_latest} -eq 1 ]; then
    docker buildx build --platform=linux/amd64,linux/arm64 ${build_args} --push --progress plain -f Dockerfile -t ${image_name}:latest -t ${image_name}:${tag_name} .
  else
    docker buildx build --platform=linux/amd64,linux/arm64 ${build_args} --push --progress plain -f Dockerfile -t ${image_name}:${tag_name} .
  fi
else
  if [ ${build_latest} -eq 1 ]; then
    docker buildx build --platform=${platform_type} ${build_args} --output type=docker --progress plain -f Dockerfile -t ${image_name}:latest -t ${image_name}:${tag_name} .
  else
    docker buildx build --platform=${platform_type} ${build_args} --output type=docker --progress plain -f Dockerfile -t ${image_name}:${tag_name} .
  fi
fi
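`build-docker.sh` checks each flag at a fixed position, so flags must appear in the order `--platform`, `--type`, `--image`, `--tag`, `--latest`; out-of-order flags are either rejected or silently ignored. A `while`/`case` loop is the usual way to accept flags in any order and validate them afterwards. A sketch of that approach, reusing the script's flag names, variable names and default platform; `parse_build_args` is a hypothetical name, and the sketch also requires a value for `--tag`, since the build commands always use `${image_name}:${tag_name}`:

```shell
#!/bin/bash
# Sketch: order-independent parsing of the build-docker.sh flags.
# Sets the same globals the script uses: platform_type, component_type,
# image_name, tag_name and build_latest.
parse_build_args() {
  platform_type="all"   # same default as build-docker.sh
  component_type=""
  image_name=""
  tag_name=""
  build_latest=0
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --platform|--type|--image|--tag)
        # Guard against a trailing flag with no value: `shift 2` would fail
        # without shifting and the loop would never terminate.
        if [[ $# -lt 2 ]]; then
          echo "ERROR : $1 requires a value" >&2
          return 1
        fi
        case "$1" in
          --platform) platform_type="$2" ;;
          --type)     component_type="$2" ;;
          --image)    image_name="$2" ;;
          --tag)      tag_name="$2" ;;
        esac
        shift 2
        ;;
      --latest)
        build_latest=1
        shift
        ;;
      *)
        echo "ERROR : unknown argument '$1'" >&2
        return 1
        ;;
    esac
  done
  # Validate after parsing, so the order of the flags no longer matters.
  case "${platform_type}" in
    all|linux/amd64|linux/arm64) ;;
    *) echo "ERROR : ${platform_type} is not a valid platform type" >&2; return 1 ;;
  esac
  case "${component_type}" in
    hive|trino) ;;
    *) echo "ERROR : --type must be hive or trino" >&2; return 1 ;;
  esac
  if [[ -z "${image_name}" || -z "${tag_name}" ]]; then
    echo "ERROR : must specify both --image and --tag" >&2
    return 1
  fi
}
```

Near the top of the script, `parse_build_args "$@" || exit 1` would then replace the five positional checks.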
100 changes: 0 additions & 100 deletions dev/docker/hive/build-docker.sh

This file was deleted.

31 changes: 31 additions & 0 deletions dev/docker/hive/hive-dependency.sh
xunliu marked this conversation as resolved.
@@ -0,0 +1,31 @@
#!/bin/bash
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
set -ex
hive_dir="$(dirname "${BASH_SOURCE-$0}")"
hive_dir="$(cd "${hive_dir}">/dev/null; pwd)"

# Environment variables definition
HADOOP_VERSION="2.7.3"
HIVE_VERSION="2.3.9"

HADOOP_PACKAGE_NAME="hadoop-${HADOOP_VERSION}.tar.gz" # Passed to the Dockerfile via --build-arg in build-docker.sh
HADOOP_DOWNLOAD_URL="http://archive.apache.org/dist/hadoop/core/hadoop-${HADOOP_VERSION}/${HADOOP_PACKAGE_NAME}"

HIVE_PACKAGE_NAME="apache-hive-${HIVE_VERSION}-bin.tar.gz" # Passed to the Dockerfile via --build-arg in build-docker.sh
HIVE_DOWNLOAD_URL="https://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${HIVE_PACKAGE_NAME}"

# Prepare download packages
if [[ ! -d "${hive_dir}/packages" ]]; then
  mkdir -p "${hive_dir}/packages"
fi

if [ ! -f "${hive_dir}/packages/${HADOOP_PACKAGE_NAME}" ]; then
  curl -s -o "${hive_dir}/packages/${HADOOP_PACKAGE_NAME}" ${HADOOP_DOWNLOAD_URL}
fi

if [ ! -f "${hive_dir}/packages/${HIVE_PACKAGE_NAME}" ]; then
  curl -s -o "${hive_dir}/packages/${HIVE_PACKAGE_NAME}" ${HIVE_DOWNLOAD_URL}
fi
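`hive-dependency.sh` skips a download when the package file already exists, but `curl -s` without `-f` writes the server's error page to the package path on a 404 and exits successfully, and that file is then treated as cached on every later run. A minimal sketch of a more defensive variant, assuming `curl` and `mktemp` are available: it uses `curl -f` so HTTP errors fail the step, and downloads to a temporary file that is renamed into place only on success. `fetch_if_missing` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: download a package only if it is not cached yet, without ever
# caching a failed or partial download.
fetch_if_missing() {
  local url="$1" dest="$2" tmp
  if [[ -f "${dest}" ]]; then
    echo "Using cached ${dest}"
    return 0
  fi
  mkdir -p "$(dirname "${dest}")" || return 1
  # Temporary file next to the destination, so the final mv is a rename
  # within the same directory.
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  # -f: treat HTTP errors as failures; -sS: quiet but still print errors;
  # -L: follow redirects.
  if curl -fsSL -o "${tmp}" "${url}"; then
    mv -f "${tmp}" "${dest}"
  else
    rm -f "${tmp}"
    echo "ERROR : failed to download ${url}" >&2
    return 1
  fi
}
```

Each download in the script would then become, for example, `fetch_if_missing "${HADOOP_DOWNLOAD_URL}" "${hive_dir}/packages/${HADOOP_PACKAGE_NAME}"`.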
47 changes: 47 additions & 0 deletions dev/docker/trino/Dockerfile
@@ -0,0 +1,47 @@
#
# Copyright 2023 Datastrato.
# This software is licensed under the Apache License version 2.
#
FROM trinodb/trino:426
LABEL maintainer="[email protected]"

# Only the mysql, iceberg, mariadb, jmx, memory, tpch, tpcds, and hive plugins are kept
RUN rm -rf /usr/lib/trino/plugin/accumulo \
&& rm -rf /usr/lib/trino/plugin/blackhole \
&& rm -rf /usr/lib/trino/plugin/delta-lake \
&& rm -rf /usr/lib/trino/plugin/example-http \
&& rm -rf /usr/lib/trino/plugin/geospatial \
&& rm -rf /usr/lib/trino/plugin/kafka \
&& rm -rf /usr/lib/trino/plugin/local-file \
&& rm -rf /usr/lib/trino/plugin/ml \
&& rm -rf /usr/lib/trino/plugin/mysql-event-listener \
&& rm -rf /usr/lib/trino/plugin/phoenix5 \
&& rm -rf /usr/lib/trino/plugin/prometheus \
&& rm -rf /usr/lib/trino/plugin/redshift \
&& rm -rf /usr/lib/trino/plugin/singlestore \
&& rm -rf /usr/lib/trino/plugin/thrift \
&& rm -rf /usr/lib/trino/plugin/atop \
&& rm -rf /usr/lib/trino/plugin/cassandra \
&& rm -rf /usr/lib/trino/plugin/druid \
&& rm -rf /usr/lib/trino/plugin/exchange-filesystem \
&& rm -rf /usr/lib/trino/plugin/google-sheets \
&& rm -rf /usr/lib/trino/plugin/http-event-listener \
&& rm -rf /usr/lib/trino/plugin/ignite \
&& rm -rf /usr/lib/trino/plugin/kinesis \
&& rm -rf /usr/lib/trino/plugin/mongodb \
&& rm -rf /usr/lib/trino/plugin/oracle \
&& rm -rf /usr/lib/trino/plugin/pinot \
&& rm -rf /usr/lib/trino/plugin/raptor-legacy \
&& rm -rf /usr/lib/trino/plugin/resource-group-managers \
&& rm -rf /usr/lib/trino/plugin/sqlserver \
&& rm -rf /usr/lib/trino/plugin/bigquery \
&& rm -rf /usr/lib/trino/plugin/clickhouse \
&& rm -rf /usr/lib/trino/plugin/elasticsearch \
&& rm -rf /usr/lib/trino/plugin/exchange-hdfs \
&& rm -rf /usr/lib/trino/plugin/hudi \
&& rm -rf /usr/lib/trino/plugin/kudu \
&& rm -rf /usr/lib/trino/plugin/password-authenticators \
&& rm -rf /usr/lib/trino/plugin/postgresql \
&& rm -rf /usr/lib/trino/plugin/redis \
&& rm -rf /usr/lib/trino/plugin/session-property-managers \
&& rm -rf /usr/lib/trino/plugin/teradata-functions
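The Trino Dockerfile removes unwanted plugins with an explicit deny-list, so any plugin added by a newer `trinodb/trino` base image would be kept by default. An allow-list inverts that trade-off: only the named plugins survive, at the cost of silently dropping new ones that might be wanted. A sketch of the allow-list approach as a shell function, assuming a bash-compatible shell and the `/usr/lib/trino/plugin/<name>` layout used above; `prune_plugins` is a hypothetical name, and in a Dockerfile the logic would have to be inlined into a `RUN` step:

```shell
#!/bin/bash
# Sketch: allow-list pruning of Trino plugin directories.
# Usage: prune_plugins <plugin_dir> <plugin_to_keep>...
prune_plugins() {
  local plugin_dir="$1"
  shift
  # Space-padded list so each name is matched as a whole word.
  local keep=" $* " path name
  for path in "${plugin_dir}"/*/; do
    # Skips the unexpanded pattern when the directory has no subdirectories.
    [[ -d "${path}" ]] || continue
    name="$(basename "${path}")"
    if [[ "${keep}" != *" ${name} "* ]]; then
      rm -rf "${path}"
    fi
  done
}
```

Keeping the plugins this Dockerfile is meant to retain would look like `prune_plugins /usr/lib/trino/plugin hive iceberg jmx mariadb memory mysql tpcds tpch`; `hudi` is left out because the `RUN` step above deletes it.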
Binary file modified docs/assets/publish-docker-image.png