Use Hosted Control Planes for ROSA to speed up cluster creation
- Remove multi-az cluster create options
- Remove need for ccoctl when provisioning EFS as the
  cloud-credential-operator is not installed on HCP clusters

Resolves #673

Signed-off-by: Ryan Emerson <[email protected]>
ryanemerson authored and ahus1 committed Apr 2, 2024
1 parent 16fb1f7 commit 9e59dfb
Showing 37 changed files with 1,436 additions and 200 deletions.
23 changes: 6 additions & 17 deletions .github/workflows/rosa-cluster-create.yml
@@ -8,7 +8,6 @@ on:
type: string
computeMachineType:
description: 'Instance type for the compute nodes'
default: m5.xlarge
type: string
multiAz:
description: 'Deploy to multiple availability zones in the region'
@@ -20,7 +19,6 @@ on:
type: string
replicas:
description: 'Number of worker nodes to provision'
default: '2'
type: string
region:
description: 'The AWS region to create the cluster in. Defaults to "vars.AWS_DEFAULT_REGION" if omitted.'
@@ -33,23 +31,10 @@ on:
type: string
computeMachineType:
description: 'Instance type for the compute nodes'
required: true
default: m5.xlarge
type: string
multiAz:
description: 'Deploy to multiple availability zones in the region'
required: true
default: false
type: boolean
availabilityZones:
description: 'Availability zones to deploy to'
required: false
default: ''
type: string
replicas:
description: 'Number of worker nodes to provision'
required: true
default: '2'
type: string
region:
description: 'The AWS region to create the cluster in. Defaults to "vars.AWS_DEFAULT_REGION" if omitted.'
@@ -76,16 +61,20 @@ jobs:
aws-default-region: ${{ vars.AWS_DEFAULT_REGION }}
rosa-token: ${{ secrets.ROSA_TOKEN }}

- name: Setup OpenTofu
uses: opentofu/setup-opentofu@v1
with:
tofu_wrapper: false

- name: Create ROSA Cluster
run: ./rosa_create_cluster.sh
working-directory: provision/aws
env:
VERSION: ${{ env.OPENSHIFT_VERSION }}
CLUSTER_NAME: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}
COMPUTE_MACHINE_TYPE: ${{ inputs.computeMachineType }}
MULTI_AZ: ${{ inputs.multiAz }}
AVAILABILITY_ZONES: ${{ inputs.availabilityZones }}
REPLICAS: ${{ inputs.replicas }}
TF_VAR_rhcs_token: ${{ secrets.ROSA_TOKEN }}

- name: Archive ROSA logs
uses: actions/upload-artifact@v4
8 changes: 8 additions & 0 deletions .github/workflows/rosa-cluster-delete.yml
@@ -45,19 +45,27 @@ jobs:
with:
clusterName: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}

- name: Setup OpenTofu
uses: opentofu/setup-opentofu@v1
with:
tofu_wrapper: false

- name: Delete a ROSA Cluster
if: ${{ inputs.deleteAll == 'no' }}
shell: bash
run: ./rosa_delete_cluster.sh
working-directory: provision/aws
env:
CLUSTER_NAME: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}
TF_VAR_rhcs_token: ${{ secrets.ROSA_TOKEN }}

- name: Delete all ROSA Clusters
if: ${{ inputs.deleteAll == 'yes' }}
shell: bash
run: ./rosa_cluster_reaper.sh
working-directory: provision/aws
env:
TF_VAR_rhcs_token: ${{ secrets.ROSA_TOKEN }}

- name: Archive ROSA logs
uses: actions/upload-artifact@v4
4 changes: 2 additions & 2 deletions .github/workflows/rosa-multi-az-cluster-create.yml
@@ -86,8 +86,8 @@ jobs:

- name: Scale ROSA clusters
run: |
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-a --min-replicas 3 scaling
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-b --min-replicas 3 scaling
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-a --min-replicas 3 --max-replicas 10 scaling
rosa edit machinepool -c ${{ env.CLUSTER_PREFIX }}-b --min-replicas 3 --max-replicas 10 scaling
- name: Setup Go Task
uses: ./.github/actions/task-setup
4 changes: 4 additions & 0 deletions .gitignore
@@ -96,3 +96,7 @@ quarkus/data/*.db
# Horreum #
###########
provision/environment_data.json

# OpenTofu / Terraform
**/*.tfstate*
**/*.terraform*
@@ -14,6 +14,7 @@ See <<aws-efs-as-readwritemany-storage>> for more information.
== Prerequisites

. xref:prerequisite/prerequisite-awscli.adoc[]
. https://opentofu.org/docs/intro/install/[Install OpenTofu]
. Perform the steps outlined in the https://console.redhat.com/openshift/create/rosa/getstarted[ROSA installation guide]:
.. Enable ROSA Service in AWS account
.. Download and install the ROSA command line tool
@@ -47,8 +48,6 @@ If no `ADMIN_PASSWORD` is provided in the configuration, it reads it from the AW
`VERSION`:: OpenShift cluster version.
`REGION`:: AWS region where the cluster should run.
`COMPUTE_MACHINE_TYPE`:: https://aws.amazon.com/ec2/instance-types/[AWS instance type] for the default OpenShift worker machine pool.
`MULTI_AZ`:: Boolean parameter to indicate whether the OpenShift cluster should span many Availability Zones within the selected region.
`AVAILABILITY_ZONES`:: Comma separated list of Availability Zones to use for the cluster. For example, `eu-central-1a,eu-central-1b`.
`REPLICAS`:: Number of worker nodes.
If multi-AZ installation is selected, then this needs to be a multiple of the number of AZs available in the region.
For example, if the region has 3 AZs, then replicas need to be set to some multiple of 3.
1 change: 1 addition & 0 deletions provision/aws/efs/.gitignore
@@ -1,2 +1,3 @@
manifests
ccoctl
iam-trust.json

This file was deleted.

44 changes: 44 additions & 0 deletions provision/aws/efs/iam-policy.json
@@ -0,0 +1,44 @@
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:DescribeMountTargets",
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite",
        "elasticfilesystem:CreateTags",
        "elasticfilesystem:CreateMountTarget",
        "elasticfilesystem:DeleteMountTarget",
        "elasticfilesystem:DeleteTags",
        "elasticfilesystem:TagResource",
        "elasticfilesystem:UntagResource"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticfilesystem:CreateAccessPoint"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:RequestTag/efs.csi.aws.com/cluster": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "elasticfilesystem:DeleteAccessPoint",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
        }
      }
    }
  ]
}
2 changes: 1 addition & 1 deletion provision/aws/rds/aurora_create_peering_connection.sh
@@ -64,7 +64,7 @@ aws ec2 accept-vpc-peering-connection \

# Update the ROSA Cluster VPC's Route Table
ROSA_PUBLIC_ROUTE_TABLE_ID=$(aws ec2 describe-route-tables \
--filters "Name=vpc-id,Values=${ROSA_VPC}" "Name=association.main,Values=true" \
--filters "Name=vpc-id,Values=${ROSA_VPC}" "Name=tag:Name,Values=*public*" \
--query "RouteTables[*].RouteTableId" \
--output text
)
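
Note on the hunk above: the route table lookup switches from the VPC's main route table (`association.main`) to any table whose `Name` tag matches `*public*`. A tag match can return zero or several IDs, and `--output text` would hand that result on as-is. The sketch below shows one way to guard the value before it is used. `require_single_id` is a hypothetical helper, not part of this commit, and the assumption that exactly one public route table should match is mine.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of the commit: succeeds only when the given
# string holds exactly one whitespace-separated ID, and prints that ID.
require_single_id() {
  local ids=()
  # Split on spaces, tabs and newlines; read returns non-zero at end of
  # input when -d '' is used, so the status is deliberately ignored.
  read -r -d '' -a ids <<< "$1" || true
  if [ "${#ids[@]}" -ne 1 ]; then
    echo "Expected exactly one route table ID, found ${#ids[@]}: '$1'" >&2
    return 1
  fi
  printf '%s\n' "${ids[0]}"
}

# Usage (not executed here), assuming the variable from the script above:
#   ROSA_PUBLIC_ROUTE_TABLE_ID=$(require_single_id "${ROSA_PUBLIC_ROUTE_TABLE_ID}") || exit 1
```

If more than one table can legitimately match, the filter itself would need tightening instead; the guard only makes the mismatch visible early.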
134 changes: 54 additions & 80 deletions provision/aws/rosa_create_cluster.sh
@@ -1,100 +1,74 @@
#!/usr/bin/env bash
set -e

if [[ "$RUNNER_DEBUG" == "1" ]]; then
set -x
fi

if [ -f ./.env ]; then
source ./.env
fi

function requiredEnv() {
for ENV in $@; do
if [ -z "${!ENV}" ]; then
echo "${ENV} variable must be set"
exit 1
fi
done
}

SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

AWS_ACCOUNT=${AWS_ACCOUNT:-$(aws sts get-caller-identity --query "Account" --output text)}
if [ -z "$AWS_ACCOUNT" ]; then echo "Variable AWS_ACCOUNT needs to be set."; exit 1; fi

if [ -z "$VERSION" ]; then echo "Variable VERSION needs to be set."; exit 1; fi
CLUSTER_NAME=${CLUSTER_NAME:-$(whoami)}
if [ -z "$CLUSTER_NAME" ]; then echo "Variable CLUSTER_NAME needs to be set."; exit 1; fi
if [ -z "$REGION" ]; then echo "Variable REGION needs to be set."; exit 1; fi
if [ -z "$COMPUTE_MACHINE_TYPE" ]; then echo "Variable COMPUTE_MACHINE_TYPE needs to be set."; exit 1; fi

if [ "$MULTI_AZ" = "true" ]; then MULTI_AZ_PARAM="--multi-az"; else MULTI_AZ_PARAM=""; fi
if [ -z "$AVAILABILITY_ZONES" ]; then AVAILABILITY_ZONES_PARAM=""; else AVAILABILITY_ZONES_PARAM="--availability-zones $AVAILABILITY_ZONES"; fi
if [ -z "$REPLICAS" ]; then echo "Variable REPLICAS needs to be set."; exit 1; fi

echo "Checking if cluster ${CLUSTER_NAME} already exists."
if rosa describe cluster --cluster="${CLUSTER_NAME}"; then
echo "Cluster ${CLUSTER_NAME} already exists."
else
echo "Verifying ROSA prerequisites."
echo "Check if AWS CLI is installed."; aws --version
echo "Check if ROSA CLI is installed."; rosa version
echo "Check if ELB service role is enabled."
if ! aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" --no-cli-pager; then
aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
fi
rosa whoami
rosa verify quota

echo "Installing ROSA cluster ${CLUSTER_NAME}"

MACHINE_CIDR=$(./rosa_machine_cidr.sh)

ROSA_CMD="rosa create cluster \
--sts \
--cluster-name ${CLUSTER_NAME} \
--version ${VERSION} \
--role-arn arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Installer-Role \
--support-role-arn arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Support-Role \
--controlplane-iam-role arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-ControlPlane-Role \
--worker-iam-role arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Worker-Role \
--operator-roles-prefix ${CLUSTER_NAME} \
--region ${REGION} ${MULTI_AZ_PARAM} ${AVAILABILITY_ZONES_PARAM} \
--replicas ${REPLICAS} \
--compute-machine-type ${COMPUTE_MACHINE_TYPE} \
--machine-cidr ${MACHINE_CIDR} \
--service-cidr 172.30.0.0/16 \
--pod-cidr 10.128.0.0/14 \
--host-prefix 23"

echo $ROSA_CMD
$ROSA_CMD

requiredEnv AWS_ACCOUNT CLUSTER_NAME REGION

export CLUSTER_NAME=${CLUSTER_NAME:-$(whoami)}

echo "Verifying ROSA prerequisites."
echo "Check if AWS CLI is installed."; aws --version
echo "Check if ROSA CLI is installed."; rosa version
echo "Check if ELB service role is enabled."
if ! aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" --no-cli-pager; then
aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
fi
rosa whoami
rosa verify quota

mkdir -p "logs/${CLUSTER_NAME}"
echo "Installing ROSA cluster ${CLUSTER_NAME}"

function custom_date() {
date '+%Y%m%d-%H%M%S'
}
cd ${SCRIPT_DIR}/../opentofu/modules/rosa/hcp
tofu init
tofu workspace new ${CLUSTER_NAME} || true
export TF_WORKSPACE=${CLUSTER_NAME}

TOFU_CMD="tofu apply -auto-approve \
-var cluster_name=${CLUSTER_NAME} \
-var region=${REGION}"

if [ -n "${COMPUTE_MACHINE_TYPE}" ]; then
TOFU_CMD+=" -var instance_type=${COMPUTE_MACHINE_TYPE}"
fi

if [ -n "${VERSION}" ]; then
TOFU_CMD+=" -var openshift_version=${VERSION}"
fi

if [ -n "${REPLICAS}" ]; then
TOFU_CMD+=" -var replicas=${REPLICAS}"
fi

echo "Creating operator roles."
rosa create operator-roles --cluster "${CLUSTER_NAME}" --mode auto --yes > "logs/${CLUSTER_NAME}/$(custom_date)_create-operator-roles.log"

echo "Creating OIDC provider."
rosa create oidc-provider --cluster "${CLUSTER_NAME}" --mode auto --yes > "logs/${CLUSTER_NAME}/$(custom_date)_create-oidc-provider.log"

echo "Waiting for cluster installation to finish."
# There have been failures with 'ERR: Failed to watch logs for cluster ... connection reset by peer' probably because services in the cluster were restarting during the cluster initialization.
# Those errors don't show an installation problem, and installation will continue asynchronously. Therefore, retry.
TIMEOUT=$(($(date +%s) + 3600))
while true ; do
if (rosa logs install --cluster "${CLUSTER_NAME}" --watch --tail=1000000 >> "logs/${CLUSTER_NAME}/$(custom_date)_create-cluster.log"); then
break
fi
if (( TIMEOUT < $(date +%s))); then
echo "Timeout exceeded"
exit 1
fi
echo "retrying watching logs after failure"
sleep 1
done

echo "Cluster installation complete."
echo

./rosa_recreate_admin.sh
echo ${TOFU_CMD}
${TOFU_CMD}

SCALING_MACHINE_POOL=$(rosa list machinepools -c "${CLUSTER_NAME}" -o json | jq -r '.[] | select(.id == "scaling") | .id')
if [[ "${SCALING_MACHINE_POOL}" != "scaling" ]]; then
rosa create machinepool -c "${CLUSTER_NAME}" --instance-type m5.4xlarge --max-replicas 10 --min-replicas 0 --name scaling --enable-autoscaling
rosa create machinepool -c "${CLUSTER_NAME}" --instance-type m5.4xlarge --max-replicas 10 --min-replicas 1 --name scaling --enable-autoscaling
fi

cd ${SCRIPT_DIR}
./rosa_oc_login.sh
./rosa_efs_create.sh
../infinispan/install_operator.sh
