
ECS and a lot more

The content for the coming two weeks (week 6 and week 7) is so closely related that we are combining them, giving my fellow bootcampers and myself plenty of time to implement everything and get completely caught up.

We can't use the psql command we previously used to connect to the database, as it isn't installed here and isn't worth setting up. Instead, we just test the connection. We created a new script file in ‘/backend-flask/db/’ named test.

#!/usr/bin/env python3

import psycopg
import os
import sys

connection_url = os.getenv("CONNECTION_URL")

conn = None
try:
  print('attempting connection')
  conn = psycopg.connect(connection_url)
  print("Connection successful!")
except psycopg.Error as e:
  print("Unable to connect to the database:", e)
finally:
  if conn is not None:
    conn.close()

test-connection

We need to implement a health check endpoint into our app as well. We begin in the backend. In our ‘app.py’ file, we add a health check.

@app.route('/api/health-check')
def health_check():
  return {'success': True}, 200

The reason for not using tools like curl for this:

Shell Injection: If your application dynamically constructs curl commands using user input or other untrusted data, it could be vulnerable to shell injection attacks, allowing attackers to execute arbitrary commands within the container.

#!/usr/bin/env python3

import urllib.request

try:
  response = urllib.request.urlopen('http://localhost:4567/api/health-check')
  if response.getcode() == 200:
    print("[OK] Flask server is running")
    exit(0) # success
  else:
    print("[BAD] Flask server is not running")
    exit(1) # false
# This for some reason is not capturing the error....
#except ConnectionRefusedError as e:
# so we'll just catch on all even though this is a bad practice
except Exception as e:
  print(e)
  exit(1) # false

app-health-check

We will also need a new AWS CloudWatch log group.

We log in to the AWS Console, go to CloudWatch, and view Logs > Log groups. Then, back in our codespace, we create the log group from the CLI.
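The exact command isn't captured in my screenshots, but it would have been something along these lines (the group name here is only illustrative, and as we find out later, the name we actually used turned out to be wrong):

aws logs create-log-group --log-group-name cruddur-fargate-cluster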

log-group

log-group

We created our ECS cluster. We did it through the CLI instead of the console, because AWS changes their UI so frequently that there's no point in getting familiar with one layout.

aws ecs create-cluster \
--cluster-name cruddur \
--service-connect-defaults namespace=cruddur

The ‘--service-connect-defaults’ flag sets the default Service Connect namespace for our cluster. It's a nicer way of mapping things internally using AWS Cloud Map.

Screenshot 2024-01-29 163620 Screenshot 2024-01-29 163728 Screenshot 2024-01-29 163857

We're going to use AWS ECR to house our container images. To do this, we must first create a repository.

aws ecr create-repository \
  --repository-name cruddur-python \
  --image-tag-mutability MUTABLE

Screenshot 2024-01-29 191335

Screenshot 2024-01-29 191504 Screenshot 2024-01-29 191512

This gives us a repository named ‘cruddur-python’ with image tag mutability set to MUTABLE, meaning tags like ‘latest’ can be overwritten when we push new images.

Next, we must login to ECR using our AWS credentials. The command here uses our env variables we’ve already set in our environment.

aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

Screenshot 2024-01-29 191803

We can now push container images. We set our path to the repo.

export ECR_PYTHON_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/cruddur-python"

We pull a version of Python image

docker pull python:3.10-slim-buster

We then tag the image.

docker tag python:3.10-slim-buster $ECR_PYTHON_URL:3.10-slim-buster

Next, we push the image.

docker push $ECR_PYTHON_URL:3.10-slim-buster

Screenshot 2024-01-29 192331

From ECR in the AWS console, we can now see our image in the repository.

Screenshot 2024-01-29 192345

Now we must update our Flask app to use this. We navigate to our ‘backend-flask’ location, then edit our Dockerfile.

FROM 774944129490.dkr.ecr.us-east-1.amazonaws.com/cruddur-python:3.10-slim-buster
# Inside Container
# make a new folder inside container
WORKDIR /backend-flask

# Outside Container -> Inside Container
# this contains the libraries want to install to run the app
COPY requirements.txt requirements.txt

# Inside Container
# Install the python libraries used for the app
RUN pip3 install -r requirements.txt

# Outside Container -> Inside Container
# . means everything in the current directory
# first period . - /backend-flask (outside container)
# second period . /backend-flask (inside container)
COPY . .

ENV PYTHONUNBUFFERED=1

EXPOSE ${PORT}

# CMD (Command)
# python3 -m flask run --host=0.0.0.0 --port=4567
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0", "--port=4567", "--debug"]

To test the new configuration in our Dockerfile, we run select services from the CLI.

docker compose up backend-flask db

After this completes, we can see that the backend is running, as the port is now open. We test the health-check.
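A quick way to verify from the host terminal is to hit the published port directly (this check is my own addition rather than a step from the video):

curl -i http://localhost:4567/api/health-check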

We can now start pushing this. So we again make another repo.

aws ecr create-repository \
  --repository-name backend-flask \
  --image-tag-mutability MUTABLE

Next we set the URL:

export ECR_BACKEND_FLASK_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/backend-flask"
echo $ECR_BACKEND_FLASK_URL

Now we build the image. On our previous container, we didn’t need to build an image, we pulled it. Andrew confirms we must make sure we’re in the backend-flask directory prior to running the command.

docker build -t backend-flask .

We then tag and push the image.

docker tag backend-flask:latest $ECR_BACKEND_FLASK_URL:latest

We tag the image with ‘:latest’, although this isn't strictly necessary, as it gets tagged this way by default. Andrew also explained that when using AWS, it will always look for the ‘:latest’ tag.

docker push $ECR_BACKEND_FLASK_URL:latest

Screenshot 2024-01-29 200136

From here, we go back to ECS in the AWS console. Andrew walks us through the UI of the existing options and configuration of setting up a service. While explaining task definitions, we find that the Cloudwatch log group we created earlier is improperly named.

We navigate back to Cloudwatch, and from the UI, we manually create a new log group named cruddur, with a retention period of 1 day.
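We did this through the UI; a rough CLI equivalent would be (my sketch, not commands we actually ran here):

aws logs create-log-group --log-group-name cruddur
aws logs put-retention-policy --log-group-name cruddur --retention-in-days 1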

image

Back in our code, we now need to finish creating our roles and setup the policy for our task definitions.

Our service-execution-policy.json followed by our service-assume-role-execution-policy.json:

{
    "Version":"2012-10-17",
    "Statement":[{
        "Effect": "Allow",
        "Action": [
          "ssm:GetParameters",
          "ssm:GetParameter"
        ],
        "Resource": "arn:aws:ssm:us-east-1:554621479919:parameter/cruddur/backend-flask/*"        
    }]
}
{
    "Version":"2012-10-17",
    "Statement":[{
      "Action":["sts:AssumeRole"],
      "Effect":"Allow",
      "Principal":{
        "Service":["ecs-tasks.amazonaws.com"]
      }}]
}

We run the files from the CLI to create the role and trust relationship in IAM.

aws iam create-role \
--role-name CruddurServiceExecutionRole \
--assume-role-policy-document file://aws/policies/service-assume-role-execution-policy.json
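My notes don't show attaching the execution policy itself to this role; a minimal sketch of how it could be attached as an inline policy (the put-role-policy call is my assumption, though the policy name matches the CruddurServiceExecutionPolicy referenced later):

aws iam put-role-policy \
  --role-name CruddurServiceExecutionRole \
  --policy-name CruddurServiceExecutionPolicy \
  --policy-document file://aws/policies/service-execution-policy.json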

Screenshot 2024-01-29 203747

Screenshot 2024-01-30 031056

We then grant the CruddurTaskRole full access to Cloudwatch and write access to the AWS XRay Daemon:

aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/CloudWatchFullAccess --role-name CruddurTaskRole
aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess --role-name CruddurTaskRole
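Before attaching those policies, the CruddurTaskRole itself had to exist; its creation isn't captured in my notes, but it would look roughly like this, assuming it reuses the same ecs-tasks trust policy file as the execution role:

aws iam create-role \
--role-name CruddurTaskRole \
--assume-role-policy-document file://aws/policies/service-assume-role-execution-policy.json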

We can now begin working on our task definitions. From our workspace, we create a new folder from the aws directory named ‘task-definitions’ and then create a backend-flask.json file and a frontend-react-js.json file, filling in our own information.

{
  "family": "backend-flask",
  "executionRoleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/CruddurServiceExecutionRole",
  "taskRoleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/CruddurTaskRole",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "backend-flask",
      "image": "BACKEND_FLASK_IMAGE_URL",
      "cpu": 256,
      "memory": 512,
      "essential": true,
      "portMappings": [
        {
          "name": "backend-flask",
          "containerPort": 4567,
          "protocol": "tcp", 
          "appProtocol": "http"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "cruddur",
            "awslogs-region": "us-east-1",
            "awslogs-stream-prefix": "backend-flask"
        }
      },
      "environment": [
        {"name": "OTEL_SERVICE_NAME", "value": "backend-flask"},
        {"name": "OTEL_EXPORTER_OTLP_ENDPOINT", "value": "https://api.honeycomb.io"},
        {"name": "AWS_COGNITO_USER_POOL_ID", "value": ""},
        {"name": "AWS_COGNITO_USER_POOL_CLIENT_ID", "value": ""},
        {"name": "FRONTEND_URL", "value": "*"},
        {"name": "BACKEND_URL", "value": "*"},
        {"name": "AWS_DEFAULT_REGION", "value": "us-east-1"}
      ],
      "secrets": [
        {"name": "AWS_ACCESS_KEY_ID"    , "valueFrom": "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:parameter/cruddur/backend-flask/AWS_ACCESS_KEY_ID"},
        {"name": "AWS_SECRET_ACCESS_KEY", "valueFrom": "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:parameter/cruddur/backend-flask/AWS_SECRET_ACCESS_KEY"},
        {"name": "CONNECTION_URL"       , "valueFrom": "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:parameter/cruddur/backend-flask/CONNECTION_URL" },
        {"name": "ROLLBAR_ACCESS_TOKEN" , "valueFrom": "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:parameter/cruddur/backend-flask/ROLLBAR_ACCESS_TOKEN" },
        {"name": "OTEL_EXPORTER_OTLP_HEADERS" , "valueFrom": "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:parameter/cruddur/backend-flask/OTEL_EXPORTER_OTLP_HEADERS" }
        
      ]
    }
  ]
}

{
  "family": "frontend-react-js",
  "executionRoleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/CruddurServiceExecutionRole",
  "taskRoleArn": "arn:aws:iam::AWS_ACCOUNT_ID:role/CruddurTaskRole",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "frontend-react-js",
      "image": "BACKEND_FLASK_IMAGE_URL",
      "cpu": 256,
      "memory": 256,
      "essential": true,
      "portMappings": [
        {
          "name": "frontend-react-js",
          "containerPort": 3000,
          "protocol": "tcp", 
          "appProtocol": "http"
        }
      ],

      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "cruddur",
            "awslogs-region": "us-east-1",
            "awslogs-stream-prefix": "frontend-react"
        }
      }
    }
  ]
}
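The secrets in the backend task definition are pulled from SSM Parameter Store, so those parameters have to exist before the task can start. A minimal sketch of creating one of them (the put-parameter call is my assumption about how they were populated; the name mirrors the ARNs above, and $PROD_CONNECTION_URL is assumed to hold the production connection string):

aws ssm put-parameter --type "SecureString" --name "/cruddur/backend-flask/CONNECTION_URL" --value $PROD_CONNECTION_URL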

After this is completed, we register our task definitions from the CLI.

aws ecs register-task-definition --cli-input-json file://aws/task-definitions/backend-flask.json
aws ecs register-task-definition --cli-input-json file://aws/task-definitions/frontend-react-js.json

Next, we set a variable for the default VPC ID in AWS by running this:

export DEFAULT_VPC_ID=$(aws ec2 describe-vpcs \
--filters "Name=isDefault, Values=true" \
--query "Vpcs[0].VpcId" \
--output text)
echo $DEFAULT_VPC_ID

We then use it to setup our security group:

export CRUD_SERVICE_SG=$(aws ec2 create-security-group \
  --group-name "crud-srv-sg" \
  --description "Security group for Cruddur services on ECS" \
  --vpc-id $DEFAULT_VPC_ID \
  --query "GroupId" --output text)
echo $CRUD_SERVICE_SG

Then authorize port 80 for the security group:

aws ec2 authorize-security-group-ingress \
  --group-id $CRUD_SERVICE_SG \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0

Next, we create our backend-flask service through ECS in the AWS console manually. From ECS it looks like there’s an issue with our backend-flask cluster service. It’s giving an error regarding the permissions to ECR and the logs:CreateLogStream action. So to fix this, we go back to IAM and edit the policy for our CruddurServiceExecutionPolicy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameters",
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:us-east-1:554621479919:parameter/cruddur/backend-flask/*"
        }
    ]
}

Screenshot 2024-01-30 140051 Screenshot 2024-01-30 183005 Screenshot 2024-01-30 180938

We go back to ECS and force a new deployment of our service. When we check the task itself, its health status came back as unknown. Screenshot 2024-01-30 183033

To troubleshoot the issue, we shelled into the task itself by running the following from CLI:

aws ecs execute-command \
--region $AWS_DEFAULT_REGION \
--cluster cruddur \
--task 99999999999999999999 \
--container backend-flask \
--command "/bin/bash" \
--interactive

Prior to this, we needed to install the Session Manager plugin for our CLI:

curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/ubuntu_64bit/session-manager-plugin.deb" -o "session-manager-plugin.deb"

sudo dpkg -i session-manager-plugin.deb
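To confirm the plugin installed correctly, running the binary with no arguments should print a short success message (this verification step is my own addition):

session-manager-plugin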

We’re still unable to shell into the task. As it turns out, we need to enable an option for the service. This can only be done through the CLI, so we create a new file in our ‘aws/json’ directory named ‘service-backend-flask.json’ to create the service, with our own information:

{
    "cluster": "cruddur",
    "launchType": "FARGATE",
    "desiredCount": 1,
    "enableECSManagedTags": true,
    "enableExecuteCommand": true,
    "loadBalancers": [
      {
          "targetGroupArn": "",
          "containerName": "backend-flask",
          "containerPort": 4567
      }
  ],
    "networkConfiguration": {
      "awsvpcConfiguration": {
        "assignPublicIp": "ENABLED",
        "securityGroups": [
          "sg-99999999999"
        ],
        "subnets": [
          "subnet-",
          "subnet-",
          "subnet-"
        ]
      }
    },
    "propagateTags": "SERVICE",
    "serviceName": "backend-flask",
    "taskDefinition": "backend-flask"
}

The ‘“enableExecuteCommand”: true’ option above is what we were needing to set. We relaunch the service, this time from the CLI:

aws ecs create-service --cli-input-json file://aws/json/service-backend-flask.json

Screenshot 2024-01-30 230939

Screenshot 2024-01-31 031628 Screenshot 2024-01-31 031617 Screenshot 2024-01-31 023945 Screenshot 2024-01-31 031657

We go back to ECS, grab the number from the recently started task, then again try to shell into the service task:

image

This time it works. We’re able to perform a health check on the task:

./bin/flask/health-check

The health check returns saying the Flask server is running. When we go back to ECS, the task is showing healthy there as well.

We create a new script for this process by creating a new folder in our ‘backend-flask/bin’ directory named ‘ecs’, then a file inside named ‘connect-to-service’, into which we copy the execute-command from above. Then, in our gitpod.yml file, to make sure Session Manager is installed in our environment at all times, we add a section for Fargate:

  - name: fargate 
    before: |
      cd /workspace
      curl "https://s3.amazonaws.com/session-manager-downloads/plugin/latest/ubuntu_64bit/session-manager-plugin.deb" -o "session-manager-plugin.deb"
      sudo dpkg -i session-manager-plugin.deb 
      cd $THEIA_WORKSPACE_ROOT
      cd backend-flask
And the connect-to-service script itself:

#! /usr/bin/bash
if [ -z "$1" ]; then
    echo "no TASK_ID argument supplied eg ./bin/ecs/connect-to-service 89a18169c70f41bd873e0395255291fa backend-flask"
    exit 1
fi
TASK_ID=$1

if [ -z "$2" ]; then
    echo "no CONTAINER_NAME argument supplied eg ./bin/ecs/connect-to-service 89a18169c70f41bd873e0395255291fa backend-flask"
    exit 1
fi
CONTAINER_NAME=$2

aws ecs execute-command \
--region $AWS_DEFAULT_REGION \
--cluster cruddur \
--task $TASK_ID \
--container $CONTAINER_NAME \
--command "/bin/bash" \
--interactive

Screenshot 2024-01-31 031841

From here, we go back to the AWS console, access EC2, then go to security groups. We must edit the inbound rules of our earlier created security group to open port 4567 for our backend-flask service to run. We also edit the default security group’s inbound rules, this way our service can interact with our backend.
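We made these changes through the console; a rough CLI equivalent for opening the backend port on the service security group would be something like this (the 0.0.0.0/0 source is an assumption, mirroring how we opened port 80 earlier):

aws ec2 authorize-security-group-ingress \
  --group-id $CRUD_SERVICE_SG \
  --protocol tcp \
  --port 4567 \
  --cidr 0.0.0.0/0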

Screenshot 2024-01-31 032047

Earlier when creating our service-backend-flask.json file, we had removed code that we reinsert now:

    "serviceConnectConfiguration": {
      "enabled": true,
      "namespace": "cruddur",
      "services": [
        {
          "portName": "backend-flask",
          "discoveryName": "backend-flask",
          "clientAliases": [{"port": 4567}]
        }
      ]
    },

We again relaunch the service:

aws ecs create-service --cli-input-json file://aws/json/service-backend-flask.json

Screenshot 2024-01-31 130647 Screenshot 2024-01-31 130633 Screenshot 2024-01-31 124116 Screenshot 2024-01-31 124023

image

We now needed an application load balancer in place. We started by creating a new security group named cruddur-alb-sg.

Screenshot 2024-01-31 153824

From there, we edited the inbound rules of the crud-srv-sg security group to allow access from the ALB's security group as well.

Screenshot 2024-01-31 153728

Then we created a new target group with a target type of IP addresses named cruddur-backend-flask-tg, and another for the frontend named frontend-react-js. We then created an application load balancer named cruddur-alb using the cruddur-alb-sg security group and the cruddur-backend-flask-tg and frontend-react-js target groups.
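These target groups and the ALB were created through the console; a hedged CLI sketch of the backend target group is below (the health check path is an assumption, and the frontend group would be analogous on port 3000):

aws elbv2 create-target-group \
  --name cruddur-backend-flask-tg \
  --protocol HTTP \
  --port 4567 \
  --target-type ip \
  --vpc-id $DEFAULT_VPC_ID \
  --health-check-path "/api/health-check"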

Screenshot 2024-01-31 154128

Screenshot 2024-01-31 183106 Screenshot 2024-01-31 182004 Screenshot 2024-01-31 181831 Screenshot 2024-01-31 181205 Screenshot 2024-01-31 174542 Screenshot 2024-01-31 154148 Screenshot 2024-01-31 154128 Screenshot 2024-01-31 153824 Screenshot 2024-01-31 153728 Screenshot 2024-01-31 183202

In reviewing our frontend-react-js.json file, we decide we need to make a separate Dockerfile for production. We navigate to our frontend-react-js folder in our workspace, then create ‘Dockerfile.prod’

# Base Image ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM node:16.18 AS build

ARG REACT_APP_BACKEND_URL
ARG REACT_APP_AWS_PROJECT_REGION
ARG REACT_APP_AWS_COGNITO_REGION
ARG REACT_APP_AWS_USER_POOLS_ID
ARG REACT_APP_CLIENT_ID

ENV REACT_APP_BACKEND_URL=$REACT_APP_BACKEND_URL
ENV REACT_APP_AWS_PROJECT_REGION=$REACT_APP_AWS_PROJECT_REGION
ENV REACT_APP_AWS_COGNITO_REGION=$REACT_APP_AWS_COGNITO_REGION
ENV REACT_APP_AWS_USER_POOLS_ID=$REACT_APP_AWS_USER_POOLS_ID
ENV REACT_APP_CLIENT_ID=$REACT_APP_CLIENT_ID

COPY . ./frontend-react-js
WORKDIR /frontend-react-js
RUN npm install
RUN npm run build

# New Base Image ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FROM nginx:1.23.3-alpine

# --from build is coming from the Base Image
COPY --from=build /frontend-react-js/build /usr/share/nginx/html
COPY --from=build /frontend-react-js/nginx.conf /etc/nginx/nginx.conf

EXPOSE 3000

For the above file to work, we must also implement an nginx.conf or configuration file.

# Set the worker processes
worker_processes 1;

# Set the events module
events {
  worker_connections 1024;
}

# Set the http module
http {
  # Set the MIME types
  include /etc/nginx/mime.types;
  default_type application/octet-stream;

  # Set the log format
  log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';

  # Set the access log
  access_log  /var/log/nginx/access.log main;

  # Set the error log
  error_log /var/log/nginx/error.log;

  # Set the server section
  server {
    # Set the listen port
    listen 3000;

    # Set the root directory for the app
    root /usr/share/nginx/html;

    # Set the default file to serve
    index index.html;

    location / {
        # First attempt to serve request as file, then
        # as directory, then fall back to redirecting to index.html
        try_files $uri $uri/ $uri.html /index.html;
    }

    # Set the error page
    error_page  404 /404.html;
    location = /404.html {
      internal;
    }

    # Set the error page for 500 errors
    error_page  500 502 503 504  /50x.html;
    location = /50x.html {
      internal;
    }
  }
}

The nginx.conf file in the Dockerfile is used to configure the Nginx web server that is being used to serve the static content generated by our React application. The configuration file sets up the server to listen on port 3000 and serve the static files located in the /usr/share/nginx/html directory. It also sets up error pages and logging.

The location / block in the configuration file is particularly important, as it specifies how Nginx will handle incoming requests. In this case, it uses the try_files directive to first attempt to serve the request as a file, then as a directory, and finally fall back to redirecting to index.html.

We cd into our frontend-react-js directory, then do an ‘npm run build’. We’re now told from the terminal that our build folder is ready to be deployed.

Screenshot 2024-01-31 191826

We build the image for the frontend from CLI:

docker build \
--build-arg REACT_APP_BACKEND_URL="https://4567-$GITPOD_WORKSPACE_ID.$GITPOD_WORKSPACE_CLUSTER_HOST" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_99999999" \
--build-arg REACT_APP_CLIENT_ID="9999999999999999" \
-t frontend-react-js \
-f Dockerfile.prod \
.

We also have to create our repository for the frontend still:

aws ecr create-repository \
--repository-name frontend-react-js \
--image-tag-mutability MUTABLE

We set the URL, then tag and push the Docker image.

export ECR_FRONTEND_REACT_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/frontend-react-js"
echo $ECR_FRONTEND_REACT_URL
docker tag frontend-react-js:latest $ECR_FRONTEND_REACT_URL:latest
docker push $ECR_FRONTEND_REACT_URL:latest

We now decide to create our frontend-react-js service. To do so, from our ‘aws/json’ folder, we create a new file named ‘service-frontend-react-js.json’

{
    "cluster": "cruddur",
    "launchType": "FARGATE",
    "desiredCount": 1,
    "enableECSManagedTags": true,
    "enableExecuteCommand": true,
    "loadBalancers": [
      {
          "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:99999999999:targetgroup/cruddur-frontend-react-js/9999999999999",
          "containerName": "frontend-react-js",
          "containerPort": 3000
      }
  ],        
    "networkConfiguration": {
      "awsvpcConfiguration": {
        "assignPublicIp": "ENABLED",
        "securityGroups": [
          "sg-9999999999999"
        ],
        "subnets": [
            "subnet-",
            "subnet-",
            "subnet-"
          ]
      }
    },
    "propagateTags": "SERVICE",
    "serviceName": "frontend-react-js",
    "taskDefinition": "frontend-react-js",
    "serviceConnectConfiguration": {
      "enabled": true,
      "namespace": "cruddur",
      "services": [
        {
          "portName": "frontend-react-js",
          "discoveryName": "frontend-react-js",
          "clientAliases": [{"port": 3000}]
        }
      ]
    }
  }

We can now create the service from the CLI:

aws ecs create-service --cli-input-json file://aws/json/service-frontend-react-js.json

Back in ECS, the service deploys a task, but the task shows unhealthy in the logs. We stop the task from the AWS console, then go back to our workspace. We edit the code for our service-frontend-react-js.json file, removing the load balancer so we can get into the task to troubleshoot.

Code removed:

    "loadBalancers": [
      {
          "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:99999999999:targetgroup/cruddur-frontend-react-js/9999999999999",
          "containerName": "frontend-react-js",
          "containerPort": 3000
      }

We create the service from the CLI again, running the command from above. We next run one of our bash scripts to connect to the service.

./bin/ecs/connect-to-service _9999999999_ frontend-react-js

This fails.

image

We decide to rebuild the production environment locally to troubleshoot.

docker build \
--build-arg REACT_APP_BACKEND_URL="https://4567-$GITPOD_WORKSPACE_ID.$GITPOD_WORKSPACE_CLUSTER_HOST" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_99999999999" \
--build-arg REACT_APP_CLIENT_ID="9999999999999" \
-t frontend-react-js \
-f Dockerfile.prod \
.

Then we run it:

docker run --rm -p 3000:3000 -it frontend-react-js

We find that since the container is running on Alpine, we can't shell into it with /bin/bash, as bash isn't installed by default in that image; we have to use /bin/sh instead. We duplicate our connect-to-service script from earlier into two files, one named connect-to-frontend-react-js and the other connect-to-backend-flask.

#! /usr/bin/bash
if [ -z "$1" ]; then
    echo "no TASK_ID argument supplied eg ./bin/ecs/connect-to-frontend-react-js 89a18169c70f41bd873e0395255291fa"
    exit 1
fi
TASK_ID=$1

CONTAINER_NAME=frontend-react-js

aws ecs execute-command \
--region $AWS_DEFAULT_REGION \
--cluster cruddur \
--task $TASK_ID \
--container $CONTAINER_NAME \
--command "/bin/sh" \
--interactive
And the connect-to-backend-flask script:

#! /usr/bin/bash
if [ -z "$1" ]; then
    echo "no TASK_ID argument supplied eg ./bin/backend/connect-to-backend-flask 89a18169c70f41bd873e0395255291fa"
    exit 1
fi
TASK_ID=$1

CONTAINER_NAME=backend-flask

aws ecs execute-command \
--region $AWS_DEFAULT_REGION \
--cluster cruddur \
--task $TASK_ID \
--container $CONTAINER_NAME \
--command "/bin/bash" \
--interactive

./bin/ecs/connect-to-frontend-react-js <taskid>

This connection is successful. We find that we have curl, so Andrew asks ChatGPT to write a curl for a health check on a task definition running in Fargate. In the generated code, we find the health check, and add it to our ‘frontend-react-js.json’ file.

"healthCheck": {
  "command": [
    "CMD-SHELL",
    "curl -f http://localhost:3000 || exit 1"
    ],
    "interval": 30,
    "timeout": 5,
    "retries": 3
    }

After this, we re-register our task definition for ‘frontend-react-js.json’

aws ecs register-task-definition --cli-input-json file://aws/task-definitions/frontend-react-js.json

We go into EC2 in the AWS Console and check the target group for the frontend. In reviewing the health check, we find we may need to override the port to port 3000, so we do so.

Back in our workspace, we go back into our ‘service-frontend-react-js.json’ file and add our Load Balancer code back in to test and see if the port override on the target group was the issue.

    "loadBalancers": [
      {
          "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:99999999999:targetgroup/cruddur-frontend-react-js/9999999999999",
          "containerName": "frontend-react-js",
          "containerPort": 3000
      }

We then create the service again, after removing the service from ECS:

aws ecs create-service --cli-input-json file://aws/json/service-frontend-react-js.json

In reviewing the target from target groups in EC2, the target shows a status of unhealthy because the request timed out.

We review our service security group we setup previously and find that we hadn’t setup port information for the frontend yet. We edit the inbound rules, allowing port 3000.

We go back into our target groups and remove the port override, then check the status of our service. It now shows a healthy task!

Route53

Moving forward, we now decide to setup our custom domain. We open Route53 in AWS, then go to Hosted Zones.

Screenshot 2024-02-02 204058

We create a new hosted zone using our custom domain we purchased prior to the bootcamp.

Next, we needed an SSL certificate, so we went to AWS Certificate Manager > Request a certificate > Request a public certificate. I entered my FQDN, then added a wildcard, and asked for DNS validation.

My certificate was pending validation, even when Andrew’s completed. I still needed to update my DNS settings on my domain registrar to use the AWS nameservers setup in Route53. I updated this, then waited over the weekend for the change to propagate. When I returned to Certificate Manager, my domains showed a success status.

Screenshot 2024-04-05 045745

Back in Route53, we now have a CNAME record added. We move over to EC2, then select our load balancer. From there, we edit the listeners. We set the frontend listener on port 80 to forward to port 443 for https. Next, we set another rule, forwarding port 443 to our cruddur-frontend-react-js target group.
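The listener changes were made in the console; roughly, the CLI equivalent would look like the following ($ALB_ARN, $CERT_ARN and $FRONTEND_TG_ARN are placeholders I'm assuming, not values from my notes):

aws elbv2 create-listener \
  --load-balancer-arn $ALB_ARN \
  --protocol HTTP --port 80 \
  --default-actions 'Type=redirect,RedirectConfig={Protocol=HTTPS,Port=443,StatusCode=HTTP_301}'

aws elbv2 create-listener \
  --load-balancer-arn $ALB_ARN \
  --protocol HTTPS --port 443 \
  --certificates CertificateArn=$CERT_ARN \
  --default-actions Type=forward,TargetGroupArn=$FRONTEND_TG_ARN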

image

We then remove our previously setup listeners on port 3000 and 4567.

image

We go back into our listener for port 443, editing the rule.

Host header is api.gooddesignsolutions.in, then forward to cruddur-backend-flask target group.
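The equivalent rule from the CLI might look like this ($HTTPS_LISTENER_ARN and $BACKEND_TG_ARN are assumed placeholders):

aws elbv2 create-rule \
  --listener-arn $HTTPS_LISTENER_ARN \
  --priority 1 \
  --conditions Field=host-header,Values=api.gooddesignsolutions.in \
  --actions Type=forward,TargetGroupArn=$BACKEND_TG_ARN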

Back in Route53, we go back to Hosted Zones and create a new record. It's an A record, routing traffic to "Alias to Application and Classic Load Balancer" in us-east-1 using our ALB.

The routing policy is set to simple, then we save it. Next we create another A record with all of the same settings as before, but this time for the api subdomain of gooddesignsolutions.in. Since we have this, we decide we do not need the added /api/ path when reaching our health-check, so we go back into our workspace.
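For reference, the same alias record could be created from the CLI; this is only a sketch, since we actually did this in the console (the describe-load-balancers lookups and $HOSTED_ZONE_ID are my assumptions):

# Look up the ALB's DNS name and its canonical hosted zone id
ALB_DNS_NAME=$(aws elbv2 describe-load-balancers --names cruddur-alb \
  --query 'LoadBalancers[0].DNSName' --output text)
ALB_ZONE_ID=$(aws elbv2 describe-load-balancers --names cruddur-alb \
  --query 'LoadBalancers[0].CanonicalHostedZoneId' --output text)

# Create the alias A record pointing the subdomain at the ALB
aws route53 change-resource-record-sets \
  --hosted-zone-id $HOSTED_ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "api.gooddesignsolutions.in",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "'"$ALB_ZONE_ID"'",
          "DNSName": "'"$ALB_DNS_NAME"'",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'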

We open our app.py file, and update our @app.routes to remove the /api/ from the path.

From this:

image

To this:

image

We started to clean up the remaining @app.routes, but Andrew recalled we will need these from the frontend, so we instead go back and revert these changes. Instead we go into our task-definitions folder, select our backend-flask.json file, and edit the environment variables for “FRONTEND_URL” and “BACKEND_URL”.

      {"name": "FRONTEND_URL", "value": "gooddesignsolutions.in"},
      {"name": "BACKEND_URL", "value": "api.gooddesignsolutions.in"},

With this change, we now have to go and update our task definition from the CLI to make the change in ECS:

aws ecs register-task-definition --cli-input-json file://aws/task-definitions/backend-flask.json

We now have to push the image for the frontend-react again. We make sure we’re still logged into ECR first:

aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

We edit our build a bit, changing the variable for REACT_APP_BACKEND_URL to reflect our new subdomain, then build it.

We first set the URL.

export ECR_FRONTEND_REACT_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/frontend-react-js"
echo $ECR_FRONTEND_REACT_URL

Then build.

docker build \
--build-arg REACT_APP_BACKEND_URL="https://api.cruddur.com" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_99999999999" \
--build-arg REACT_APP_CLIENT_ID="9999999999999" \
-t frontend-react-js \
-f Dockerfile.prod \
.

We tag and push the image.

docker tag frontend-react-js:latest $ECR_FRONTEND_REACT_URL:latest

docker push $ECR_FRONTEND_REACT_URL:latest

With the task definitions for the backend updated, we go back to ECS and update the service, forcing a new deployment, using the latest revision.

The frontend uses the latest revision by default, so all we need to do is update the service, forcing a new deployment. Both tasks show as healthy after deployment, and when we check EC2 > Target Groups, both the frontend and backend targets are now showing as healthy.

image

When we load the app through the browser, it displays. However, there's no data returned, and if we inspect it, there's a CORS error for the subdomain api.gooddesignsolutions.in.

Andrew is getting the same for api.cruddur.com. We go back into ECS and grab the task id for the backend task.

Then, back in our workspace, we use our script to connect to it.

Once in the task, we type env to see what environment variables are set.

image

After scrolling up, we see that the FRONTEND_URL and BACKEND_URL are being set. But they do not include a protocol, so we again go back to our backend-flask.json task definition file and edit the variables for FRONTEND_URL and BACKEND_URL.

      {"name": "FRONTEND_URL", "value": "https://gooddesignsolutions.in"},
      {"name": "BACKEND_URL", "value": "https://api.gooddesignsolutions.in"},

We again register the task definitions through the CLI, then force a new deployment through ECS. After waiting several moments for the new deployment, we can test the app again, and it's now returning data!

In investigating our app now that it’s deployed, we ran into some debugging menus that we need to remove once we’re in a production mode environment.

Andrew finds documentation online regarding debugging application errors in production.

We found that with debugging enabled in a production environment, “The debugger allows executing arbitrary Python code from the browser.”

Upon learning this, we navigate in AWS over to EC2 and the security group for our load balancer. We edit the inbound rules, removing the open ports for 3000 and 4567, then for the time being, only allow My IP for both HTTPS and HTTP and their respective ports, 443 and 80. This will lock the app down so that only I can access it for now.

Back in our workspace, we navigate to our backend-flask folder, select our Dockerfile and edit it, adding --debug to our CMD. This allows debugging in our development:

# CMD (Command)
# python3 -m flask run --host=0.0.0.0 --port=4567
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0", "--port=4567", "--debug"]

Then we create a new Dockerfile, Dockerfile.prod. This will be for production. Notice the flags on our CMD are slightly different than our development Dockerfile:

FROM 99999999999.dkr.ecr.us-east-1.amazonaws.com/cruddur-python:3.10-slim-buster

# [TODO] For debugging, don't leave these in
#RUN apt-get update -y
#RUN apt-get install iputils-ping -y
# -------

#  Inside Container
# Make a new folder inside container
WORKDIR /backend-flask

# Outside Container -> Inside Container
# this contains the libraries we want to install to run the app
COPY requirements.txt requirements.txt

# Inside Container
# Install the python libraries used for the app
RUN pip3 install -r requirements.txt

# Outside Container -> Inside Container
# . means everything in the current directory
# first period . - /backend-flask (outside container)
# second period ./backend-flask (inside container)
COPY . .

EXPOSE ${PORT}

# CMD (Command)
# python3 -m flask run --host=0.0.0.0 --port=4567
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0", "--port=4567", "--no-debug", "--no-debugger", "--no-reload"]

To build our production Dockerfile separately, we go to the CLI. First we login to ECR again:

aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

With logging into the ECR being such a repetitive task, we decide to create a script for it. From our backend-flask/bin folder, we create a new folder named ecr then a new file named login.

#! /usr/bin/bash

aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

After chmod’ing the file, it’s executable. We run the file, then proceed with building our production Dockerfile.

./bin/ecr/login


docker build -f Dockerfile.prod -t backend-flask-prod .

To test this production build, we have to run the Dockerfile, passing our environment variables to it.

#! /usr/bin/bash

docker run --rm \
-p 4567:4567 \
--env AWS_ENDPOINT_URL="http://dynamodb-local:8000" \
--env CONNECTION_URL="postgresql://postgres:***************@db:5432/cruddur" \
--env FRONTEND_URL="https://3000-${GITPOD_WORKSPACE_ID}.${GITPOD_WORKSPACE_CLUSTER_HOST}" \
--env BACKEND_URL="https://4567-${GITPOD_WORKSPACE_ID}.${GITPOD_WORKSPACE_CLUSTER_HOST}" \
--env OTEL_SERVICE_NAME='backend-flask' \
--env OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io" \
--env OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=${HONEYCOMB_API_KEY}" \
--env AWS_XRAY_URL="*4567-${GITPOD_WORKSPACE_ID}.${GITPOD_WORKSPACE_CLUSTER_HOST}*" \
--env AWS_XRAY_DAEMON_ADDRESS="xray-daemon:2000" \
--env AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION}" \
--env AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID}" \
--env AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY}" \
--env ROLLBAR_ACCESS_TOKEN="${ROLLBAR_ACCESS_TOKEN}" \
--env AWS_COGNITO_USER_POOL_ID="${AWS_COGNITO_USER_POOL_ID}" \
--env AWS_COGNITO_USER_POOL_CLIENT_ID="99999999999999999" \
-it backend-flask-prod

Before running this, we save it to a new folder in our bin directory, naming it docker/backend-flask-prod. We make the file executable by chmod'ing it, then run it.

image

We get connection pool errors in the console, but this is because our PostgreSQL db is not running in the current environment, so we do a docker compose up on selective services, selecting our db, and let it compose up.

image

We're still having connection issues with the database, but that's not what we're concerned with here.

We’re trying to see if errors are logged in debug mode, so instead we go to our app.py file and introduce an error in the health-check.

Then we go back and create a new folder within our backend-flask/docker folder named build then create two new files backend-flask-prod and frontend-react-js-prod.

#! /usr/bin/bash

docker build -f Dockerfile.prod -t backend-flask-prod .
#! /usr/bin/bash

docker build \
--build-arg REACT_APP_BACKEND_URL="https://4567-$GITPOD_WORKSPACE_ID.$GITPOD_WORKSPACE_CLUSTER_HOST" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_9999999" \
--build-arg REACT_APP_CLIENT_ID="999999999999999" \
-t frontend-react-js \
-f Dockerfile.prod \
.

We then spin up our environment with the regular Dockerfile from the backend. With our non-production environment now running, we launch the backend of our app from our workspace, then modify the URL to direct to our health-check, adding /api/health-check to the end of our URL.

The page returns a TypeError, due to the error we introduced earlier.

image

Since we don’t want to see the TypeError page, we test modifying the CMD from our Dockerfile.

CMD [ "python3, "-m" , "flask, "run", "--host=0.0.0.0", "--port=4567", "--no-debug"]

We then compose up our environment again. Once it loads, we again launch our backend and modify the URL to direct to our health-check, which we introduced an error into previously.

image

It now returns an Internal Server Error page, which means the flag we passed in our development Dockerfile worked. It’s not in debug mode. We modify the development Dockerfile back, removing the --no-debug flag from the CMD.

We now chmod our two files we created earlier in the docker folder to make them executable. Then, we run the backend-flask-prod file to build the production environment again.

./bin/docker/build/backend-flask-prod

We create another script and folder, this time from the docker directory, a push folder, with a file named backend-flask-prod, then chmod the file.

#! /usr/bin/bash

ECR_BACKEND_FLASK_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/backend-flask"
echo $ECR_BACKEND_FLASK_URL
docker tag backend-flask-prod:latest $ECR_BACKEND_FLASK_URL:latest
docker push $ECR_BACKEND_FLASK_URL:latest

We run this file, and it tags and pushes the image. Instead of manually running all of these commands from the CLI, we decide to simplify things and begin working on creating scripts for all of this.

In our ecs folder created previously, we create a new file force-deploy-backend-flask. Our goal with this file is to force a new deployment of the backend-flask service from ECS.


#! /usr/bin/bash

CLUSTER_NAME="cruddur"
SERVICE_NAME="backend-flask"
TASK_DEFINITION_FAMILY="backend-flask"

LATEST_TASK_DEFINITION_ARN=$(aws ecs describe-task-definition \
--task-definition $TASK_DEFINITION_FAMILY \
--query 'taskDefinition.taskDefinitionArn' \
--output text)

echo "TASK DEF ARN:"
echo $LATEST_TASK_DEFINITION_ARN

aws ecs update-service \
--cluster $CLUSTER_NAME \
--service $SERVICE_NAME \
--task-definition $LATEST_TASK_DEFINITION_ARN \
--force-new-deployment

We were running into issues with pathing in our scripts, as we are now moving them to new directories, and additional folders have been created/removed. We moved the /bin/ directory to the root of our workspace.

A fellow bootcamper reached out to Andrew with a possible solution to the problem: getting the absolute path and implementing it into our scripts. We start with the frontend-react-js-prod script from earlier.

#! /usr/bin/bash 

ABS_PATH=$(readlink -f "$0")
BUILD_PATH=$(dirname $ABS_PATH)
DOCKER_PATH=$(dirname $BUILD_PATH)
BIN_PATH=$(dirname $DOCKER_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
FRONTEND_REACT_JS_PATH="$PROJECT_PATH/frontend-react-js"

docker build \
--build-arg REACT_APP_BACKEND_URL="https://4567-$GITPOD_WORKSPACE_ID.$GITPOD_WORKSPACE_CLUSTER_HOST" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_N7WWGl3KC" \
--build-arg REACT_APP_CLIENT_ID="575n8ecqc551iscnosab6e0un3" \
-t frontend-react-js \
-f "$FRONTEND_REACT_JS_PATH/Dockerfile.prod" \
"$FRONTEND_REACT_JS_PATH/."
We test the file and it builds. While waiting on this to build, we do the same to backend-flask.prod
#! /usr/bin/bash 

ABS_PATH=$(readlink -f "$0")
BUILD_PATH=$(dirname $ABS_PATH)
DOCKER_PATH=$(dirname $BUILD_PATH)
BIN_PATH=$(dirname $DOCKER_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"

docker build \
-f "$BACKEND_FLASK_PATH/Dockerfile.prod" \
-t backend-flask-prod \
"$BACKEND_FLASK_PATH/."

The next file we update pathing for is ./bin/ddb/seed

current_path = os.path.dirname(os.path.abspath(__file__))
parent_path = os.path.abspath(os.path.join(current_path, '..', '..','backend-flask'))
sys.path.append(parent_path)
from lib.db import db

Then ./bin/db/schema-load

#! /usr/bin/bash

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-schema-load"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

ABS_PATH=$(readlink -f "$0")
BIN_PATH=$(dirname $ABS_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"
schema_path="$BACKEND_FLASK_PATH/db/schema.sql"
echo $schema_path

if [ "$1" = "prod" ]; then
  echo "Running in production mode"
  URL=$PROD_CONNECTION_URL
else
  URL=$CONNECTION_URL
fi

psql $URL cruddur < $schema_path

Moving onto ./bin/db/seed

#! /usr/bin/bash

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-seed"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

ABS_PATH=$(readlink -f "$0")
BIN_PATH=$(dirname $ABS_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"
seed_path="$BACKEND_FLASK_PATH/db/seed.sql"
echo $seed_path

if [ "$1" = "prod" ]; then
  echo "Running in production mode"
  URL=$PROD_CONNECTION_URL
else
  URL=$CONNECTION_URL
fi

psql $URL cruddur < $seed_path

From there ./bin/db/setup

#! /usr/bin/bash
set -e # stop if it fails at any point

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-setup"
printf "${CYAN}==== ${LABEL}${NO_COLOR}\n"

ABS_PATH=$(readlink -f "$0")
DB_PATH=$(dirname $ABS_PATH)

source "$DB_PATH/drop"
source "$DB_PATH/create"
source "$DB_PATH/schema-load"
source "$DB_PATH/seed"
python "$DB_PATH/update_cognito_user_ids"

And then ./bin/db/update_cognito_user_ids

current_path = os.path.dirname(os.path.abspath(__file__))
parent_path = os.path.abspath(os.path.join(current_path, '..', '..','backend-flask'))
sys.path.append(parent_path)
from lib.db import db

We then updated the path for Postgres in our .gitpod.yml file:

    command: |
      export GITPOD_IP=$(curl ifconfig.me)
      source  "$THEIA_WORKSPACE_ROOT/bin/rds/update-sg-rule"   

At this point, the Dockerfile we were building has completed. We now need to push and tag it.

We already created a script to do this for our backend-flask-prod; now we need to create one for our frontend, so we create frontend-react-js-prod in the ./bin/docker/push directory.

#! /usr/bin/bash

ECR_FRONTEND_REACT_URL="$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/frontend-react-js"
echo $ECR_FRONTEND_REACT_URL

docker tag frontend-react-js:latest $ECR_FRONTEND_REACT_URL:latest
docker push $ECR_FRONTEND_REACT_URL:latest

After making the file executable with a chmod, we run the file to tag and push our frontend build. We decide to create a new force-deploy file for the frontend as well, since we already have one for the backend. In the ./bin/ecs directory, we create force-deploy-frontend-react-js.

#! /usr/bin/bash

CLUSTER_NAME="cruddur"
SERVICE_NAME="frontend-react-js"
TASK_DEFINITION_FAMILY="frontend-react-js"

LATEST_TASK_DEFINITION_ARN=$(aws ecs describe-task-definition \
--task-definition $TASK_DEFINITION_FAMILY \
--query 'taskDefinition.taskDefinitionArn' \
--output text)

echo "TASK DEF ARN:"
echo $LATEST_TASK_DEFINITION_ARN

aws ecs update-service \
--cluster $CLUSTER_NAME \
--service $SERVICE_NAME \
--task-definition $LATEST_TASK_DEFINITION_ARN \
--force-new-deployment

We make this file executable with a chmod, then run the file. This forces a new deployment of the frontend image. We test our production backend by navigating to our custom domain, adding the additional subdomain and path: https://api.gooddesignsolutions.in/api/health-check - this returns true. Next we introduce an error in the URL purposely, to make sure our debug menu doesn't appear - this returns an "Internal Server Error" so our debugging menu is not present.

Andrew begins to take us down the path to make sure that we’re using Python in a safe way. We begin researching Flask debugging. Andrew begins telling us about Ruby Rack, and looks up the Python equivalent, which turns out to be WSGI. The built-in debugger for Flask is Werkzeug, which is a utility library for WSGI. We begin searching to see if we can run this in production mode. According to their documentation, it’s intended only during local development.

image

image

Andrew then discussed several different options for debugging including possibly Gunicorn, but said we’ll have to see how things go moving forward.

Moving on, Andrew mentions that DynamoDB isn’t working in production mode, so we are going to debug that. We go back to our production app, through the web browser. We’re logged in, but there’s no data.

image

After inspecting the page, we find that it's doing a GET from the wrong location. Andrew suspects this is from pushing the image earlier. We go back to our ./bin/docker/build/frontend-react-js-prod file and review our code.

We update this:

docker build \
--build-arg REACT_APP_BACKEND_URL="https://4567-$GITPOD_WORKSPACE_ID.$GITPOD_WORKSPACE_CLUSTER_HOST" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_9999999999" \
--build-arg REACT_APP_CLIENT_ID="99999999999999999999" \
-t frontend-react-js \
-f "$FRONTEND_REACT_JS_PATH/Dockerfile.prod" \
"$FRONTEND_REACT_JS_PATH/."

To this:

docker build \
--build-arg REACT_APP_BACKEND_URL="https://api.thejoshdev.com" \
--build-arg REACT_APP_AWS_PROJECT_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_COGNITO_REGION="$AWS_DEFAULT_REGION" \
--build-arg REACT_APP_AWS_USER_POOLS_ID="us-east-1_9999999999" \
--build-arg REACT_APP_CLIENT_ID="99999999999999999999" \
-t frontend-react-js \
-f "$FRONTEND_REACT_JS_PATH/Dockerfile.prod" \
"$FRONTEND_REACT_JS_PATH/."

Our environment variable for REACT_APP_BACKEND_URL was incorrect. We rebuild.

./bin/docker/build/frontend-react-js-prod

After it builds, we push and tag it again as well:

./bin/docker/push/frontend-react-js-prod

We then deploy it.

./bin/ecs/force-deploy-frontend-react-js


We go back into ECS to check the status of the new deployment. After several moments, the deployment still hasn't shown as healthy yet, so we move over to EC2 and check the target groups. The old target group is still draining. This leads Andrew to tell us about types of deployments and that when we deploy, we're doing an ECS rolling deployment, which according to AWS is "replacing the current running version of the container with the latest version. The number of containers ECS adds or removes from the service during a rolling update is controlled by adjusting the minimum and maximum number of healthy tasks allowed during a service deployment, as specified in the DeploymentConfiguration."
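Those minimum and maximum healthy percentages can also be set directly when updating the service; a hedged sketch (100/200 are the rolling-update defaults, not values we changed here):

aws ecs update-service \
  --cluster cruddur \
  --service backend-flask \
  --deployment-configuration "minimumHealthyPercent=100,maximumPercent=200" \
  --force-new-deployment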

After waiting a bit longer for the target group to drain, we decide to go ahead and navigate to our production app anyway. The page loads, and Inspect in the browser shows no errors this time. Through this process, Andrew decides that finding the correct scripts to run from which directory is becoming quite cumbersome due to the number of scripts that we have. To alleviate this, from within our /bin/ directory, we create two folders: frontend and backend. We begin moving and renaming our various scripts, with those related to frontend-react-js going to the frontend folder and those related to backend-flask to the backend folder. In the build scripts for both the frontend and the backend, we had to update the pathing, as it has changed.

#for the backend

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"

#for the frontend

ABS_PATH=$(readlink -f "$0")
FRONTEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $FRONTEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
FRONTEND_REACT_JS_PATH="$PROJECT_PATH/frontend-react-js"

We reload our production app at this point and we now have data! Now that we have fixed that issue, we move onto the Messages section. Messages are not populating, and Andrew suspects this is due to our local users not existing in the production database. We need to seed some users. We connect to the remote database through the Terminal.

./bin/db/connect prod

After connecting to the database, we run a SELECT * FROM users; which returns our Cognito account that we created previously. We need to populate some users so we have something to work with. For reference, we pull up ./backend-flask/db/seed.sql and copy the command into the terminal connected to our Postgres database.

INSERT INTO public.users (display_name, email, handle, cognito_user_id) VALUES ('Andrew Bayko','[email protected]' , 'bayko' ,'MOCK') 

Andrew accidentally copies the line with his own user information again instead for the manual insert above, so when he tests the /messages/new/bayko page of his web app, the Inspect page comes back with a 500 error returned on the GET for the users short endpoint.

That leads us to go check out our Rollbar account to see if we have any error tracking.

Screenshot 2024-02-05 002846

image

Andrew shows us where the short that was returning the 500 error is coming from. We navigate to /backend-flask/services/users_short.py

image

We go back to Rollbar, check Cloudwatch logs, view RDS; none of these are showing us what the problem was. Eventually, from the terminal, we run a query:

SELECT * FROM users;

At this point, Andrew sees that he's added himself twice to the database, but it raises the question: if our app did not find /messages/new/bayko, why wasn't there an error other than a 500? We navigate to our ./backend-flask/lib/db.py file and pull up our code:

    with self.pool.connection() as conn:
      with conn.cursor() as cur:
        cur.execute(wrapped_sql,params)
        json = cur.fetchone()
        if json == None:
          "{}"
        else:
          return json[0]

We want to see what’s returning, so we docker compose up our local environment.

We need to seed data again, so we use this opportunity to run another script, ./bin/db/setup. The script won't run, however, because the pathing is now incorrect since we restructured our /bin/ folder. While working through the pathing, we're repeatedly running ./bin/db/setup, but it's failing because there are sessions connected to the database that we're attempting to drop through part of the script.

Eventually after composing our environment up and down several times with the same issue when trying to drop the database, we close out of our workspace, and start up a new one through Gitpod. We again run our ./bin/db/setup file, this time it errors on an issue with pathing in the ./bin/db/seed file. With the pathing fixed, we again try the setup file, but it again tells us the database is being accessed by other users when we attempt to drop the table. We quickly search for and find sufficient code to kill sessions to a Postgres database, then in our ./backend-flask/db directory, we create kill-all-connections.sql.

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE   
-- don't kill my own connection!
pid <> pg_backend_pid()
-- don't kill the connections to other databases
AND datname = 'cruddur';    

Then, in ./bin/db we create kill-all.

#! /usr/bin/bash

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-kill-all"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

ABS_PATH=$(readlink -f "$0")
DB_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $DB_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"
kill_path="$BACKEND_FLASK_PATH/db/kill-all-connections.sql"
echo $kill_path

psql $CONNECTION_URL cruddur < $kill_path

We test the script.

./bin/db/kill-all

It completes. We attempt to drop the database again.

./bin/db/drop

The database drops successfully this time. We again run our setup file.

image

We view the pathing in our update_cognito_user_ids file. We needed to add 'backend-flask' to the path.

current_path = os.path.dirname(os.path.abspath(__file__))
parent_path = os.path.abspath(os.path.join(current_path, '..', '..','backend-flask'))
sys.path.append(parent_path)
from lib.db import db

We continue on, loading our Message data.

./bin/ddb/schema-load

We attempt to seed the data by running ./bin/ddb/seed, but we obtain a ModuleNotFoundError: No module named 'lib' error. We open ./bin/ddb/seed and view the pathing. It needed to be updated as well, just the same as above. We make the same pathing change as above, then try running the seed file again. This time, it seeds our data.

With our local environment completely running now, we open the backend port in a browser, then check the path: /api/users/@bayko/short

image

We open the frontend port instead, then sign into our web app. We see that we are not returning any data. A quick refresh of the browser, and data returns. We now navigate to the path /messages/new/bayko and Inspect the page.

Screenshot 2024-02-05 140723

It's returning what it's supposed to now. We alter the path to a user that does not exist in our database, /messages/new/asdfasdf. The short endpoint now returns a 500 error.

image

We check the backend-flask logs through our terminal:

image

The data returning is not what is expected. We open our users_short.py and our backend-flask/lib/db.py files. Andrew explains that he had thought json would return None, but it's returning something. So we begin debugging by printing a few lines.

image

    with self.pool.connection() as conn:
      with conn.cursor() as cur:
        cur.execute(wrapped_sql,params)
        json = cur.fetchone()
        if json == None:
          return "{}"
        else:
          return json[0]

With the return added, we refresh the web app, then inspect the page again.

image

It’s returning what it’s supposed to return now. We’re ready to push the changes to production, so we run our build script for our backend.

./bin/backend/build

image

We need to update the pathing in our build script.

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"

We go ahead and fix the pathing in our ./bin/frontend/build script as well:

ABS_PATH=$(readlink -f "$0")
FRONTEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $FRONTEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
FRONTEND_REACT_JS_PATH="$PROJECT_PATH/frontend-react-js"

We run the build script for the backend again, this time it works. We tag and push the image again as well.

./bin/backend/push

./bin/backend/deploy

We connect to our production database through the terminal.

./bin/db/connect prod

From the terminal, we manually insert a user into the database:

INSERT INTO public.users (display_name, email, handle, cognito_user_id) VALUES ('Andrew Bayko', '[email protected]' , 'bayko' ,'MOCK');

From our production web app, we go again to /messages/new/bayko, then inspect the page. We return all 200 statuses. When we test the app to send a message, it works!

We now direct our attention towards an issue we’ve been having with our Cognito token. Our frontend-react-js/src/lib/CheckAuth.js file declares a const for checkAuth that we thought would attempt to renew our Cognito token.

It has not been doing so. Andrew explains we're going to have to wrap this in another function to make sure this gets set.

We research token refresh for AWS Cognito using Amplify, noting a possible solution using Auth.currentSession, then go back to our code for CheckAuth.js

const checkAuth = async (setUser) => {
  Auth.currentAuthenticatedUser({
    // Optional, By default is false.
    // If set to true, this call will send a 
    // request to Cognito to get the latest user data
    bypassCache: false
  })
  .then((user) => {
    console.log('user',user);
    return Auth.currentAuthenticatedUser()
  }).then((cognito_user) => {
      console.log('cognito_user',cognito_user);
      setUser({
        display_name: cognito_user.attributes.name,
        handle: cognito_user.attributes.preferred_username
      })
  })
  .catch((err) => console.log(err));
};

It looks like Auth.currentAuthenticatedUser is getting called twice, and Andrew isn't sure why, so we log out the data. Then we reload the page and inspect it.

image

The same information is being returned twice. Looks like we will be using Auth.currentSession. We update our code.

const checkAuth = async (setUser) => {
  Auth.currentAuthenticatedUser({
    // Optional, By default is false.
    // If set to true, this call will send a 
    // request to Cognito to get the latest user data
    bypassCache: false
  })
  .then((cognito_user) => {
    console.log('cognito_user',cognito_user);
    setUser({
        display_name: cognito_user.attributes.name,
        handle: cognito_user.attributes.preferred_username
      })
    return Auth.currentSession()
  }).then((cognito_user_session) => {
      console.log('cognito_user_session',cognito_user_session);
      localStorage.setItem("access_token", cognito_user_session.accessToken.jwtToken)
  })
  .catch((err) => console.log(err));
};

Andrew explains that we’re going to have to do this check every time we do API calls as well, so we will likely need to wrap this around other functions. From CheckAuth.js we create a new function, making a call to the Auth.currentSession.

const getAccessToken = async () => {
  Auth.currentSession()
  .then((cognito_user_session) => {
    localStorage.setItem("access_token", cognito_user_session.accessToken.jwtToken);
    return localStorage.getItem("access_token")
    
  })   
  .catch((err) => console.log(err));
}

From our HomeFeedPage.js we adjust our API call to pass along getAccessToken. We also import it from CheckAuth.js:

import {checkAuth, getAccessToken} from 'lib/CheckAuth';

  const loadData = async () => {
    try {
      const backend_url = `${process.env.REACT_APP_BACKEND_URL}/api/activities/home`
      const access_token = getAccessToken()
      console.log('access_token',access_token)
      const res = await fetch(backend_url, {
        headers: {
          Authorization: `Bearer ${access_token}`
        },
        method: "GET"
      });

We test our local web app.

image

This could be due to the updated code, so we find and replace all instances of import CheckAuth with import {checkAuth}. Then we refresh the page. This makes no change. As it turns out, we need to export our functions from CheckAuth.js.

export async function getAccessToken(){
  Auth.currentSession()
  .then((cognito_user_session) => {
    localStorage.setItem("access_token", cognito_user_session.accessToken.jwtToken);
    return localStorage.getItem("access_token")
    
  })   
  .catch((err) => console.log(err));
}

export async function checkAuth(setUser){
  Auth.currentAuthenticatedUser({
    // Optional, By default is false.
    // If set to true, this call will send a 
    // request to Cognito to get the latest user data
    bypassCache: false
  })
  .then((cognito_user) => {
    console.log('cognito_user',cognito_user);
    setUser({
        display_name: cognito_user.attributes.name,
        handle: cognito_user.attributes.preferred_username
      })
    return Auth.currentSession()
  }).then((cognito_user_session) => {
      console.log('cognito_user_session',cognito_user_session);
      localStorage.setItem("access_token", cognito_user_session.accessToken.jwtToken)
  })
  .catch((err) => console.log(err));
}

We reload the web app again, and it is again displaying correctly. However, one of our console.logs is coming back with access_token undefined. We again review our code, both the API call in HomeFeedPage.js and the getAccessToken function in CheckAuth.js:

export async function getAccessToken(){
  Auth.currentSession()
  .then((cognito_user_session) => {
    const access_token = cognito_user_session.accessToken.jwtToken
    console.log('11',access_token)
    localStorage.setItem("access_token", access_token)
  })   
  .catch((err) => console.log(err));
}

Note that we console.log the same information in both files.

We again refresh the page and inspect it. It’s identical information.

image

We do a find and replace on every file in our workspace that contains Authorization, then add a line to import our new functions.

import {checkAuth, getAccessToken} from '../lib/CheckAuth';

Then, wherever Authorization appears (in these cases, where we pass our Authorization headers for CORS), we also add two lines of code above it:

await getAccessToken()
const access_token = localStorage.getItem("access_token")
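
As a rough sketch of how one of these calls ends up looking after the change (the endpoint and variable names here are illustrative, not the exact code in every page):

import { getAccessToken } from '../lib/CheckAuth';

// Illustrative pattern: refresh the session first, then read the freshly
// stored token from localStorage before attaching it to the request.
const loadData = async () => {
  await getAccessToken()
  const access_token = localStorage.getItem("access_token")
  const res = await fetch(`${process.env.REACT_APP_BACKEND_URL}/api/activities/home`, {
    headers: {
      Authorization: `Bearer ${access_token}`
    },
    method: "GET"
  });
  if (res.status === 200) {
    console.log('activities', await res.json());
  }
};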

We refresh our web app, then click through the various pages. Of the pages currently connected correctly, our token stays logged in now. When we inspect, we get back nothing but 200 status codes, so we’re in good shape.

Moving on, we decide we want to implement XRay back into our application. Andrew shows us a Cloudformation task definition file where we view the code for XRay.

- Name: xray
  Image: public.ecr.aws/xray/aws-xray-daemon
  Essential: true
  User: '1337'
  LogConfiguration:
      LogDriver: awslogs
      Options:
        awslogs-group: !Ref AWS::StackName
        awslogs-region: !Ref AWS::Region
        awslogs-stream-prefix: app

We copy the code and begin converting it to .json by writing it to our ./aws/task-definitions/backend-flask.json file.

{
        "name": "xray",
        "image": "public.ecr.aws/xray/aws-xray-daemon",
        "essential": true,       
        "user": "1337",
        "portMappings": [
          {
            "name": "xray",
            "containerPort": 2000,
            "protocol": "udp"
          }
        ]        
      },

We continue prepping for another deployment, now creating a new script to easily register task definitions so we don’t have to manually run the command. We create ./bin/backend/register

#! /usr/bin/bash

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
TASK_DEF_PATH="$PROJECT_PATH/aws/task-definitions/backend-flask.json"

echo $TASK_DEF_PATH

aws ecs register-task-definition \
--cli-input-json "file://$TASK_DEF_PATH"

While troubleshooting the pathing for this script, we found that our build scripts had incorrect pathing since the bin directory move, so we fix this as well.

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
BACKEND_FLASK_PATH="$PROJECT_PATH/backend-flask"

ABS_PATH=$(readlink -f "$0")
FRONTEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $FRONTEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
FRONTEND_REACT_JS_PATH="$PROJECT_PATH/frontend-react-js"

Then we create one for the frontend.

#! /usr/bin/bash

ABS_PATH=$(readlink -f "$0")
FRONTEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $FRONTEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
TASK_DEF_PATH="$PROJECT_PATH/aws/task-definitions/frontend-react-js.json"

echo $TASK_DEF_PATH

aws ecs register-task-definition \
--cli-input-json "file://$TASK_DEF_PATH"

We chmod both files to make them executable, then we run the file for the backend.

./bin/backend/register

After fixing some syntax issues with the task definition file for the backend, our script works. XRay should now be available, but only on the next deployment. So we again need to deploy.

./bin/backend/deploy

We go back into ECS from the AWS Console to check our deployment. We go into our cruddur cluster, then select the backend-flask service. When we check, the service has not deployed the latest task. To troubleshoot, we go back to our workspace and check which task definition is the latest from the CLI.

aws ecs describe-task-definition \
--task-definition $TASK_DEFINITION_FAMILY \
--query 'taskDefinition.taskDefinitionArn' \
--output text
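
Here TASK_DEFINITION_FAMILY is assumed to already be set in the shell to the task definition family name; for the backend that would be something along the lines of:

export TASK_DEFINITION_FAMILY="backend-flask"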

The CLI outputs backend-flask:13. This IS our latest task. We deploy again running our ./bin/backend/deploy script. Next, we go back into ECS and view our cluster. We can see the backend-flask service is working on the deployment, so we click into the service, then view the latest task.

image

XRay is running, but its health status is unknown. While researching possible ways to implement a health check for XRay, we come back to ECS to find that our backend-flask task is unhealthy. We check the logs for the task, and all it is returning is 200 statuses, passing our /api/health-check.

We decide to force a new deployment from the AWS console. This new deployment is coming back unhealthy as well, yet our logs indicate it's passing its health checks.

We update our aws/task-definitions/backend-flask.json by removing the portion of the definition for XRay, then we again register the task definition, then again deploy it. After this, we check ECS and again the task is failing due to health checks.

image

Since the XRay container isn’t causing the problem, we add it back to our task definition, then re-register it. To continue troubleshooting, we build our environment locally:

./bin/backend/build

Then we Docker compose up selected services, just db and dynamodb-local.

We create our data.

./bin/db/setup

We create another script, this time ./bin/backend/run.

#! /usr/bin/bash

docker run --rm \
  -p 4567:4567 \
  -it backend-flask-prod

We chmod the file to make it executable, then we try to run it, but it fails. Andrew explains this is because it is missing our environment variables. It’s looking for an .env file containing our variables.

From the root of our workspace, we create two new files: .env.backend-flask and .env.frontend-react-js. We cut all of our env vars from our docker-compose.yml and paste them into their respective .env file. We then correct the syntax of the variables going from a .yml format to an .env. We then update our docker-compose.yml to point to our new .env files.

services:
  backend-flask:
    env_file:
      - .env.backend-flask
  frontend-react-js:
    env_file:
      - .env.frontend-react-js
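
The syntax change itself is small: a variable that was a YAML mapping in docker-compose.yml becomes a plain KEY=value assignment in the .env file. A minimal sketch, using CONNECTION_URL with an illustrative value:

# docker-compose.yml (YAML mapping, before):
#   CONNECTION_URL: "${CONNECTION_URL}"

# .env.backend-flask (plain assignment, after) -- value shown is illustrative:
CONNECTION_URL=postgresql://postgres:password@db:5432/cruddur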

Next, we have to update our run file to use the env file.

#! /usr/bin/bash 

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
ENVFILE_PATH="$PROJECT_PATH/.env.backend-flask"

docker run --rm \
  --env-file $ENVFILE_PATH \
  --publish 4567:4567 \
  -it backend-flask-prod

Before we move further with this, we decide to test the changes we’ve made by composing down our environment, then doing a Docker compose up for the whole environment. We view the frontend of the app; no data is returned. We attach a shell to the backend container through the terminal, then run the env command. All of our environment variables return as expected.

We compose down our entire environment, then again Docker compose up, this time just the selected services db, dynamodb-local, and xray-daemon. To make sure we have seeded data, we again run our ./bin/db/setup.

We start our environment locally.

./bin/backend/run

Amongst other errors in the terminal, we are getting this as well:

image

After some time researching this, Andrew believes the issue is due to how the containers handle networking. Since XRay runs as its own container, it's using localhost. However, if we add a shared network definition to our docker-compose.yml and attach everything to it, that could fix the issue. We create the network through the CLI:

docker network create cruddur-net

We read further into our docker-compose.yml file and see that a network named cruddur is already defined. We check Docker's networks from the CLI:

docker network list

image

We need to find out how to name user-defined networks in Docker. After checking the Docker documentation, we update docker-compose.yml:

networks: 
  default:
    driver: bridge
    name: cruddur-net

We also update ./bin/backend/run

#! /usr/bin/bash 

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
ENVFILE_PATH="$PROJECT_PATH/.env.backend-flask"

docker run --rm \
  --env-file $ENVFILE_PATH \
  --network cruddur-net \
  --publish 4567:4567 \
  -it backend-flask-prod

We compose down our environment again, then compose up our three selected services again: db, dynamodb-local, and xray-daemon. We again seed our data, then try again to run our environment.

It's returning the same error as above. Andrew gets a second look at the code with Bayko, and it turns out the issue is how we're passing the env vars. We decide to generate our environment variables each time our environment loads. To do this, we use Ruby.

We begin by creating a new script in ./bin/backend named generate-env.

#!/usr/bin/env ruby

require 'erb'

template = File.read 'env.backend-flask.erb'
script_content = ERB.new(template).result(binding)

This is going to read from an .erb template for our env vars, so in the root of our workspace we create .env.backend-flask.erb. We copy/paste all of our environment variables from .env.backend-flask into the new file, changing the syntax to the .erb format. We have to render the .erb file and write the result out to a file, so we update ./bin/backend/generate-env.

#!/usr/bin/env ruby

require 'erb'

template = File.read '.env.backend-flask.erb'
content = ERB.new(template).result(binding)
filename = ".env.backend-flask"
File.write(filename, content)
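
For reference, each line in the .erb template follows the same pattern: a plain KEY=value assignment whose value ERB pulls from the environment at generation time. A minimal sketch (the variable names are just for illustration):

AWS_DEFAULT_REGION=<%= ENV['AWS_DEFAULT_REGION'] %>
CONNECTION_URL=<%= ENV['CONNECTION_URL'] %>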

After testing that the script works, we delete the manually created .env.backend-flask file from earlier. Next, we create a new folder at the root of the workspace and name it erb. We move the .env.backend-flask.erb file we created previously into this folder. Then we create a new file, .env.frontend-react-js.erb, and copy/paste our env vars from .env.frontend-react-js into the new file, making the same syntax changes for the .erb format.

In ./bin/frontend we create a new file named generate-env as well. It's during the creation of this script that we realize our .gitignore is set to ignore files ending in .env, which doesn't match how we've named these files, so we rename them accordingly. We go back and fix ./bin/backend/generate-env, then complete ./bin/frontend/generate-env:

#!/usr/bin/env ruby

require 'erb'

template = File.read 'erb/backend-flask.env.erb'
content = ERB.new(template).result(binding)
filename = "backend-flask.env"
File.write(filename, content)

#!/usr/bin/env ruby

require 'erb'

template = File.read 'erb/frontend-react-js.env.erb'
content = ERB.new(template).result(binding)
filename = 'frontend-react-js.env'
File.write(filename, content)

In our erb directory, we rename these files as well.

image

We test both scripts.

./bin/frontend/generate-env
./bin/backend/generate-env

This generates our backend-flask.env and frontend-react-js.env files.

image

To make sure these variables are generated whenever we launch our workspace, we add these scripts to run from our .gitpod.yml file.

For our frontend:

- name: react-js
  command: |
    ruby "$THEIA_WORKSPACE_ROOT/bin/frontend/generate-env"

For our backend:

- name: flask
  command: |
    ruby "$THEIA_WORKSPACE_ROOT/bin/backend/generate-env"

Initially in the video, the code above is using source "$THEIA_WORKSPACE_ROOT/bin/frontend/generate-env" and source "$THEIA_WORKSPACE_ROOT/bin/backend/generate-env" but this did not work, as these are Ruby scripts. Changing the command to ruby fixed the issue.

We go back to our docker-compose.yml file and update the env_file names to match the change.

  backend-flask:
    env_file:
      - backend-flask.env
  frontend-react-js:
    env_file:
      - frontend-react-js.env

Next we update the pathing in our ./bin/backend/run file.

#! /usr/bin/bash 

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
ENVFILE_PATH="$PROJECT_PATH/backend-flask.env"

docker run --rm \
  --env-file $ENVFILE_PATH \
  --network cruddur-net \
  --publish 4567:4567 \
  -it backend-flask-prod

We find we never made a run file for the frontend, so we do so now. ./bin/frontend/run

#! /usr/bin/bash 

ABS_PATH=$(readlink -f "$0")
BACKEND_PATH=$(dirname $ABS_PATH)
BIN_PATH=$(dirname $BACKEND_PATH)
PROJECT_PATH=$(dirname $BIN_PATH)
ENVFILE_PATH="$PROJECT_PATH/frontend-react-js.env"

docker run --rm \
  --env-file $ENVFILE_PATH \
  --network cruddur-net \
  --publish 4567:4567 \
  -it frontend-react-js-prod

We also take this time to update the network for each service in our docker-compose.yml file. Andrew stresses that all of our containers have to connect to the same network.

networks: 
  cruddur-net:
    driver: bridge
    name: cruddur-net
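
To sanity-check that everything really is attached to the same network, we can list and inspect it from the CLI (a quick check, not part of any script):

# cruddur-net should appear with the bridge driver
docker network list

# shows which running containers are attached to cruddur-net
docker network inspect cruddur-net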

We again run ./bin/backend/run.

image

We’re getting the same errors again. Andrew mentions we can use Busybox to debug these connections. From ./bin we create busybox.

#! /usr/bin/bash 


docker run --rm \
  --network cruddur-net \
  --publish 4567:4567 \
  -it busybox

We make the file executable, then run the file.

./bin/busybox

We’re able to ping XRay to see that it’s running.
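
From inside the busybox shell, the check is along these lines (assuming the daemon is reachable by its compose service name, xray-daemon, on the shared cruddur-net network):

# confirm the service name resolves on cruddur-net
nslookup xray-daemon

# confirm the container responds
ping -c 3 xray-daemon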

image

To test the issue in our production image, we temporarily install ping and telnet. We open backend-flask/Dockerfile.prod and add lines to install them.

# [TODO] For debugging, don't leave these in
RUN apt-get update -y
RUN apt-get install iputils-ping -y
# -------------

Then we build the environment.

./bin/backend/build

We temporarily append /bin/bash after the -it backend-flask-prod argument in ./bin/backend/run, so when we run the script, we automatically shell into the container.

./bin/backend/run

We have access to ping from here. We’re again able to ping XRay.

image

We again shell into our backend container and view our environment variables, cross-referencing them with what's generated in backend-flask.env. We find that the generated file has quotation marks around the values, whereas the environment variables shown inside the container have none. We manually update the backend-flask.env file to remove all quotes around the values, then try ./bin/backend/run again. It's running.

image

To fix this permanently, we must update our .erb templates in our erb directory. We remove all quotes (singles and doubles) from all variables. We manually delete our existing .env files, then run our frontend and backend generate-env scripts to recreate them. When we check these .env files, they do not have any quotations. We want to make sure that Docker compose will still work, so we compose down our environment, then do a Docker compose up. Everything is in working order again!

We comment out the debugging code we added to backend-flask/Dockerfile.prod. Back to our original issue with the health check failing for the backend, we open up our aws/task-definitions/backend-flask.json file. Andrew believes he's found the problem.

image

When we moved our bin directory, the path to this health check moved. To fix this issue, we create a new bin folder in backend-flask. We then move our health-check file from ./bin/flask to backend-flask/bin and delete the ./bin/flask folder. We then update the path to the health check in ./aws/task-definitions/backend-flask.json.

"name": "backend-flask",
        "image": "999999999.dkr.ecr.us-east-1.amazonaws.com/backend-flask",
        "essential": true,
        "healthCheck": {
          "command": [
            "CMD-SHELL",
            "python /backend-flask/bin/health-check"
          ],

With our task definition updated, we have to rebuild the backend container again.

./bin/backend/build

We next push the backend container.

./bin/backend/push

Then, when we try to register our task definition, we get an AccessDeniedException when calling the RegisterTaskDefinition operation: there's an issue with IAM in AWS. As it turns out, while committing our code along with Andrew, a temp file was committed as well, exposing an AWS key. We had to log in to IAM as the root user, remove the explicit deny from the user created for our workspace, then rotate out my AWS keys. I then updated the env vars in our workspace with export and gp env.

With the issue resolved, we again register the backend task definition.

./bin/backend/register

We check on our backend service through ECS in AWS again. The service task is showing unhealthy. We go back to our task definition code, and update the timeout from 15 to 5 seconds.

"interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60

Then, we re-register the task definition.

./bin/backend/register

For good measure, we redeploy the service.

./bin/backend/deploy

Back over in ECS, we again check the backend service.

image

The task status is pending, so we wait for it to complete. A few moments later, the task is now healthy!

image

XRay continues to show Unknown for Health status, but that’s because we have no way of adding a health check to it since curl is not installed in the default container. Speaking of XRay, we add it to our frontend task definition.

"containerDefinitions": [
      {
        "name": "xray",
        "image": "public.ecr.aws/xray/aws-xray-daemon",
        "essential": true,       
        "user": "1337",
        "portMappings": [
          {
            "name": "xray",
            "containerPort": 2000,
            "protocol": "udp"
          }
        ]        
      },

Then we register the task definition.

./bin/frontend/register

Then we deploy it.

./bin/frontend/deploy

Back over in ECS, we again view our services. They’re running.

image

From our cluster, we select Update Cluster, then expand Monitoring and select "Use Container Insights." Over time, this will continue to gather data about our containers. It's good to set this now, in preparation for later.

I also watched Ashish Rajan’s Security Considerations video for Weeks 6–7. I kept detailed notes:

Week 6-7 Security Considerations Notes

What types of container services are in AWS?
- Virtual machines

Managed services:
- ECS
- Fargate
- EKS

Launch types in AWS:
- EC2 – ELB to auto scaling groups – contains EC2 instances – AWS manages the VM and physical server
- ECS – ELB to ECS cluster to auto scaling groups – contains EC2 instances – not fully managed
- Fargate – ELB to ECS cluster to Fargate tasks – Amazon manages everything but the containerized apps

Security challenges with Fargate: 
- No visibility of infrastructure
- Ephemeral resources make it hard to do triage or forensics for detected threats
- No file/network monitoring
- Cannot run traditional security agents in Fargate
- User can run unverified Container images
- Containers can run as root and even with elevated privileges 

Amazon ECS – Security Best Practices - AWS
- Cloud control plane configuration – access control, container images, etc
- Choosing the right public or private ECR for images
- Amazon ECR Scan Images to “Scan on Push” using Basic or Enhanced (Inspector + Snyk)
- Use VPC Endpoints or security groups with known sources only
- Compliance standard is what your business requires
- Amazon Organizations SCP – to manage ECS Task deletion, ECS creation, region lock, etc
- AWS CloudTrail is enabled and monitored to trigger alerts on malicious ECS behavior by an identity in AWS
- AWS Config Rules are enabled in the account and region of ECS (as there is no GuardDuty support for ECS, even as of March 2023)


Amazon ECS – Security Best Practices – Application

- Access control – roles or IAM Users for ECS Clusters/Services/Tasks
- Most recent version of ECR Agent daemon on EC2
- Container Control Plane Configuration – root privileges, resource limitations, etc
- No secrets/passwords in ECS Task Definitions (e.g. db passwords) – consider AWS Secrets Manager
- No secrets/passwords in Containers – consider AWS Secrets Manager
- Only use Trusted Containers from ECR with no HIGH/CRITICAL vulnerabilities
- Limit ability to ssh into EC2 container to read only file systems – use APIs or GitOps to pull information for troubleshooting
- Amazon CloudWatch to monitor malicious ECS Configuration changes
- Only using Authorized Container Images (hopefully some image signing in the future e.g. sigstore)

What AWS services can help deliver a website or web application hosted in AWS?

- Route53
- Cloudfront
- API Gateway

How does DNS Route53 work? 

1. DNS request: When a user enters a domain name in their web browser, it sends a DNS request to a DNS resolver server to translate the domain name into an IP address.

2. DNS resolver: The DNS resolver looks up the IP address associated with the domain name. If the resolver doesn't already have the IP address cached, it sends a query to the DNS root servers.

3. DNS root servers: The root servers respond with a list of TLD (Top-Level Domain) servers for the appropriate TLD.

4. TLD servers: The TLD servers respond with a list of authoritative name servers for the domain.

5. Authoritative name servers: The authoritative name servers respond with the IP address associated with the domain name.

6. Route 53: Once the DNS resolver has the IP address, it sends the request to the appropriate AWS resource, such as an EC2 instance or an S3 bucket.
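
To watch these steps in practice, the resolution chain can be traced from a terminal (illustrative domain):

# follow the full chain: root servers -> TLD servers -> authoritative name servers
dig +trace example.com

# ask the local resolver for just the final answer
dig +short example.com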

Amazon Route53 – Security Best Practices – AWS
- Integration with ACM (Amazon Certificate Manager) for TLS
- Compliance standard is what your business requires for a DNS provider
- Amazon Organizations SCP – to manage Route53 actions like creation, deletion, modification of production URIs, etc
- AWS CloudTrail is enabled and monitored to trigger alerts for malicious activities e.g. Associate VPC with Hosted Zone, Change Resource Record sets, register domain, etc.
- GuardDuty is enabled for monitoring suspicious DNS communications (ex. Crypto-mining) and automated for auto-remediation
- AWS Config Rules are enabled in the account and region of ECS

Amazon Route53 – Security Best Practices – Application
- Access Control – Roles or IAM users for making DNS changes in Amazon Route53
- Public vs Private hosted zones
- All Route53 records should point to an existing DNS, ELB, ALB, or AWS S3 – Watch out for dangling DNS domains
- Hosted Zone Configuration changes limited to small set of people
- Enable encryption in transit using TLS/SSL certificates (HTTPS URLs)
- Only use Trusted Domain Providers for requesting new DNS
- Set TTLs appropriately so you can afford to wait for a change to take effect
- Ensure Root Domain Alias Record Points to ELB
- Develop process for continuously verifying if DNS and hosted zone are all current and valid