Scheduler Migration Guide

Eric Holmes edited this page Jun 14, 2017 · 16 revisions

Migrating an app to the CloudFormation backend

This document guides you through the steps needed to safely migrate applications to the new CloudFormation backend with minimal downtime.

Prerequisites

This guide assumes:

  1. You are using Empire v0.11.1 (note that v0.11.0 has a bug that can prevent migrations from working properly).
  2. You have launched Empire with --scheduler=cloudformation-migration (default in 0.11)

Step 1 - Create the CloudFormation stack

The first step creates the new CloudFormation stack, including its ECS/ELB resources, without altering the <app>.empire CNAME record.

To perform the first step, run:

$ emp set EMPIRE_SCHEDULER_MIGRATION=step1

Then wait for the new CloudFormation stack to finish creating: https://console.aws.amazon.com/cloudformation/home
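If you prefer the command line to the console, the wait can be scripted with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the right account and region. The function name is hypothetical, and the stack name is a placeholder: this guide doesn't say how Empire names the stack, so look it up in the console first.

```shell
#!/bin/sh
# wait_for_stack_create STACK_NAME
# Blocks until the stack finishes creating, then prints its final status.
# STACK_NAME is a placeholder: look up the real name in the CloudFormation console.
wait_for_stack_create() {
  stack="$1"
  # The waiter exits non-zero if the stack fails to create or is rolled back.
  if aws cloudformation wait stack-create-complete --stack-name "$stack"; then
    result=0
  else
    result=1
  fi
  # Print the current status either way (e.g. CREATE_COMPLETE or ROLLBACK_COMPLETE).
  aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text
  return "$result"
}
```

Call it as `wait_for_stack_create <stack-name>` after setting `EMPIRE_SCHEDULER_MIGRATION=step1`. A non-zero exit status means the stack did not reach a completed state and its events should be inspected.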

If the stack creation is rolled back, note the error in the stack events and fix the underlying issue (usually Empire is misconfigured, or you haven't given Empire the necessary permissions). Then destroy the stack and run step 1 again:

$ emp set EMPIRE_SCHEDULER_MIGRATION=step1
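The failed events can also be read, and the broken stack destroyed, from the command line. This is a sketch under the same assumptions as any AWS CLI usage (installed, configured, sufficient permissions). Both function names are hypothetical, and the stack name is a placeholder you need to look up yourself.

```shell
#!/bin/sh
# show_stack_failures STACK_NAME
# Lists events whose status contains "FAILED", with the reason AWS recorded.
show_stack_failures() {
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query "StackEvents[?contains(ResourceStatus, 'FAILED')].[LogicalResourceId, ResourceStatus, ResourceStatusReason]" \
    --output text
}

# destroy_stack STACK_NAME
# Deletes the stack and blocks until the deletion has finished.
destroy_stack() {
  aws cloudformation delete-stack --stack-name "$1" &&
    aws cloudformation wait stack-delete-complete --stack-name "$1"
}
```

Run `show_stack_failures <stack-name>` first, since the events are no longer easy to reach once the stack is deleted, then `destroy_stack <stack-name>` before repeating step 1.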

Step 2 - Swap DNS

You have two options:

  1. Manually remove the <app>.empire CNAME and then proceed to step 3.
  2. Manually remove the <app>.empire CNAME, then set the DNS parameter to true in the newly created CloudFormation stack. Use this if you want to test the new stack before completely removing the old ECS/ELB resources.

If you're running a high-load service, you may need to contact AWS support to pre-warm the new ELB.
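For option 2, the DNS parameter can be changed in the console or with the AWS CLI. The sketch below assumes the parameter key is literally `DNS` and that every other parameter should keep its current value. The function name is hypothetical and the stack name is a placeholder.

```shell
#!/bin/sh
# set_stack_dns STACK_NAME true|false
# Updates only the DNS parameter of an existing stack, reusing its template.
# ASSUMPTION: the parameter key is literally "DNS".
set_stack_dns() {
  stack="$1"
  value="$2"
  case "$value" in
    true|false) ;;
    *) echo "usage: set_stack_dns STACK_NAME true|false" >&2; return 2 ;;
  esac
  # List the stack's current parameter keys (text output is whitespace-separated).
  keys=$(aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].Parameters[].ParameterKey' --output text) || return 1
  # Override DNS and keep the previous value of every other parameter.
  params=""
  for key in $keys; do
    if [ "$key" = "DNS" ]; then
      params="$params ParameterKey=DNS,ParameterValue=$value"
    else
      params="$params ParameterKey=$key,UsePreviousValue=true"
    fi
  done
  # $params is intentionally unquoted so that each entry becomes its own argument.
  aws cloudformation update-stack --stack-name "$stack" \
    --use-previous-template --parameters $params
}
```

For example, `set_stack_dns <stack-name> true` after removing the old CNAME. If the template creates IAM resources, `update-stack` will also need a `--capabilities` flag. Whether that applies to Empire's template is not covered by this guide.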

Step 3 - Remove the old ECS/ELB resources

This removes the old ECS services and ELBs.

$ emp set EMPIRE_SCHEDULER_MIGRATION=step2

Step 4 - Finalize the migration

$ emp unset EMPIRE_SCHEDULER_MIGRATION

Downtime

During the migration, the only downtime that should occur is within step 2, when DNS is swapped from the old load balancer to the new load balancer. This downtime should be minimal (if any) since both the old load balancer and the new load balancer will be accepting traffic.

To minimize downtime, we recommend increasing the TTL of the <app>.empire CNAME to ~5 minutes (and waiting a few minutes for the change to propagate) before removing the CNAME and setting the DNS parameter in the stack. A longer TTL lets resolvers keep serving the cached record during the brief window in which the CNAME does not exist. We used this method at Remind to migrate the majority of our services without any downtime.
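Before removing the CNAME, it can help to confirm that the longer TTL is actually being served. The following sketch assumes `dig` and `awk` are available and that the command runs somewhere the <app>.empire name resolves. The function name is hypothetical.

```shell
#!/bin/sh
# cname_ttl NAME
# Prints the TTL (in seconds) that the resolver reports for NAME's CNAME record.
cname_ttl() {
  # Answer lines look like: NAME TTL CLASS TYPE TARGET
  dig +noall +answer "$1" CNAME | awk '$4 == "CNAME" { print $2; exit }'
}
```

For example, `cname_ttl <app>.empire`. Note that a caching resolver reports the time remaining on its cached copy, so the value may be lower than the TTL configured on the record.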

Rollback

If, for whatever reason, you need to roll back to the old scheduler, you can do so manually with these steps:

  1. Update the backend column in the scheduler_migration table to ecs.
  2. Manually update the CloudFormation stack and set the DNS parameter to false.
  3. Deploy the application again.
  4. Manually remove the CloudFormation stack.
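Step 1 of the rollback can be sketched with `psql`. This rests on two assumptions: that Empire's database is reachable through a connection string in `$DATABASE_URL`, and that rows in scheduler_migration are keyed by an `app_id` column. Only the table name and the backend column come from this guide, so check your schema before running anything like it. The function name is hypothetical.

```shell
#!/bin/sh
# mark_backend_ecs APP_ID
# Step 1 of the rollback: point the app back at the old ECS backend.
# ASSUMPTIONS: $DATABASE_URL points at Empire's database, and scheduler_migration
# has an app_id column (hypothetical: verify against your schema).
mark_backend_ecs() {
  app_id="$1"
  # Accept only letters, digits, and dashes so the value is safe to inline in SQL.
  case "$app_id" in
    ''|*[!A-Za-z0-9-]*) echo "invalid app id: $app_id" >&2; return 2 ;;
  esac
  psql "$DATABASE_URL" -c \
    "UPDATE scheduler_migration SET backend = 'ecs' WHERE app_id = '$app_id';"
}
```

After `mark_backend_ecs <app-id>` succeeds, continue with steps 2 through 4 in the order listed above.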