
This week we learned about Postgres and RDS.

https://www.postgresql.org/

PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance. RDS is a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud. Database engines available for use with RDS include Aurora (MySQL- or PostgreSQL-compatible), MySQL, MariaDB, PostgreSQL, Oracle, and Microsoft SQL Server.


  • Created the RDS Postgres instance from the AWS CLI in Gitpod
aws rds create-db-instance \
  --db-instance-identifier cruddur-db-instance \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --engine-version  14.6 \
  --master-username root \
  --master-user-password huEE33z2Qvl383 \
  --allocated-storage 20 \
  --availability-zone ca-central-1a \
  --backup-retention-period 0 \
  --port 5432 \
  --no-multi-az \
  --db-name cruddur \
  --storage-type gp2 \
  --publicly-accessible \
  --storage-encrypted \
  --enable-performance-insights \
  --performance-insights-retention-period 7 \
  --no-deletion-protection



To connect to Postgres locally, use this command:

psql -U postgres --host localhost

After entering our credentials at the prompt, we have access to the account.

Useful psql commands:

\x on -- expanded display when looking at data
\q -- Quit PSQL
\l -- List all databases
\c database_name -- Connect to a specific database
\dt -- List all tables in the current database
\d table_name -- Describe a specific table
\du -- List all users and their roles
\dn -- List all schemas in the current database
CREATE DATABASE database_name; -- Create a new database
DROP DATABASE database_name; -- Delete a database
CREATE TABLE table_name (column1 datatype1, column2 datatype2, ...); -- Create a new table
DROP TABLE table_name; -- Delete a table
SELECT column1, column2, ... FROM table_name WHERE condition; -- Select data from a table
INSERT INTO table_name (column1, column2, ...) VALUES (value1, value2, ...); -- Insert data into a table
UPDATE table_name SET column1 = value1, column2 = value2, ... WHERE condition; -- Update data in a table
DELETE FROM table_name WHERE condition; -- Delete data from a table
Once we were logged into the local database, we created a database named “cruddur”.

CREATE database cruddur;


Next, we set up tables for the database. Frameworks like Ruby on Rails have a schema file that defines the entire database; since Flask doesn’t, we are going to write the entire thing by hand instead. To begin, we created a new folder in ‘backend-flask’ named ‘db’, then created a new SQL file named schema.sql inside of it. As part of schema.sql, we need Postgres to generate UUIDs (universally unique identifiers), randomly generated 128-bit identifiers. To implement this, we needed to add an extension for it.

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

After this, from our postgres:bash terminal, we ran:

psql cruddur < db/schema.sql -h localhost -U postgres

This created the extension needed. To make doing this easier, we created a connection url string for our local database.

export CONNECTION_URL="postgresql://postgres:thisisntmypassword@localhost:5432/cruddur"
gp env CONNECTION_URL="postgresql://postgres:thisisntmypassword@localhost:5432/cruddur"

The format for a connection url string for a Postgres database is the following:

postgresql://[user[:password]@][netlocation][:port][/dbname][?parameter1=value1]

We then created a new folder in ‘backend-flask’ named ‘bin’ which stands for binary. In this folder, we can store batch scripts to execute commands for our database. We then made several new files: db-create, db-drop, db-schema-load


Each of the files needs a shebang. The shebang tells the system which interpreter should run the file. Since we want it to run as bash, we added:

#! /usr/bin/bash

We started off testing if we can drop our database. So to do this, we add a line to db-drop.

psql $CONNECTION_URL -c "drop database cruddur;"

This command uses psql to connect to our local Postgres database via our connection URL string, then the ‘-c’ flag issues the SQL command “drop database cruddur;”.

We run db-drop from our terminal and it gives us permission denied. This is because the file is not executable by default. To make it or any file executable after creation, we have to run an additional command for it:

chmod u+x <filename>

We run this to make our file executable, then do the same for our remaining batch scripts. We again test our batch script and get an error letting us know that we cannot drop the currently open database. Since we’re going to be working with this frequently, we need to strip the database name out of our CONNECTION_URL so that we’re not connected to the database we’re trying to drop.

To do this, we must use sed. Sed is a text stream editor used on Unix systems to edit text quickly and efficiently; it searches through, replaces, adds, and deletes lines without opening a file in a text editor. We use sed to strip the database name from the URL and store the result in a new variable.

#! /usr/bin/bash

echo "db-drop"

NO_DB_CONNECTION_URL=$(sed 's/\/cruddur//g' <<<"$CONNECTION_URL")
psql $NO_DB_CONNECTION_URL -c "DROP database cruddur;"

We then go through and do the same for our db-create, passing a different SQL command.

#! /usr/bin/bash

echo "db-create"

NO_DB_CONNECTION_URL=$(sed 's/\/cruddur//g' <<<"$CONNECTION_URL")
psql $NO_DB_CONNECTION_URL -c "CREATE database cruddur;"

We move on to db-schema-load and add a different command to connect to our database. Since the script runs relative to wherever we execute it from, we use realpath, which resolves the actual path of the schema file.

#! /usr/bin/bash

echo "db-schema-load"

schema_path="$(realpath .)/db/schema.sql"
echo $schema_path

psql $CONNECTION_URL cruddur < $schema_path

We need a way to determine whether we’re running against our production environment (prod) or our local Postgres environment. To do this, we added an if statement to the script.

#! /usr/bin/bash

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-schema-load"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

schema_path="$(realpath .)/db/schema.sql"
echo $schema_path

if [ "$1" = "prod" ]; then
  echo "Running in production mode"
  URL=$PROD_CONNECTION_URL
else
  URL=$CONNECTION_URL
fi

psql $URL cruddur < $schema_path

The additional code under our shebang was also added to print each script’s label in a different color in the CLI, so we can see when it is being run when viewing our Postgres logs. We went through and added it to our other batch scripts as well.

Back on our main task of adding tables to our database, we go back to our schema.sql, and add SQL commands to create our tables:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
DROP TABLE IF EXISTS public.users;
DROP TABLE IF EXISTS public.activities;


CREATE TABLE public.users (
  uuid UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
  display_name text,
  handle text,
  email text,
  cognito_user_id text,
  created_at TIMESTAMP default current_timestamp NOT NULL
);

CREATE TABLE public.activities (
  uuid UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
  user_uuid UUID NOT NULL,
  message text NOT NULL,
  replies_count integer DEFAULT 0,
  reposts_count integer DEFAULT 0,
  likes_count integer DEFAULT 0,
  reply_to_activity_uuid integer,
  expires_at TIMESTAMP,
  created_at TIMESTAMP default current_timestamp NOT NULL
);

We then made a new batch script named ‘db-connect’, made it executable by running ‘chmod u+x ./bin/db-connect’ then ran the file we just created.

#! /usr/bin/bash

psql $CONNECTION_URL


We then created one more batch script named ‘db-seed’ and made it executable as well.

#! /usr/bin/bash

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-seed"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

seed_path="$(realpath .)/db/seed.sql"
echo $seed_path

if [ "$1" = "prod" ]; then
  echo "Running in production mode"
  URL=$PROD_CONNECTION_URL
else
  URL=$CONNECTION_URL
fi

psql $URL cruddur < $seed_path

This batch script will run our seed.sql file that we created in ‘backend-flask/db’.

-- this file was manually created
INSERT INTO public.users (display_name, email, handle, cognito_user_id)
VALUES
  ('bhanumalhotra','[email protected]', 'bhanumalhotra' ,'MOCK'),
  ('bhanu', '[email protected]', 'bhanuman','MOCK'),
  ('Bhanu Malhotra','[email protected]', 'Bhanu', 'MOCK');

INSERT INTO public.activities (user_uuid, message, expires_at)
VALUES
  (
    (SELECT uuid from public.users WHERE users.handle = 'bhanumalhotra' LIMIT 1),
    'This was imported as seed data!',
    current_timestamp + interval '10 day'
  )

When we test the ‘db-seed’ file, it works.


We may need to see what connections are open against our Postgres database. For this, we implement ‘db-sessions’ and make it executable.

#! /usr/bin/bash
CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-sessions"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

if [ "$1" = "prod" ]; then
  echo "Running in production mode"
  URL=$PROD_CONNECTION_URL
else
  URL=$CONNECTION_URL
fi

NO_DB_URL=$(sed 's/\/cruddur//g' <<<"$URL")
psql $NO_DB_URL -c "select pid as process_id, \
       usename as user,  \
       datname as db, \
       client_addr, \
       application_name as app,\
       state \
from pg_stat_activity;"

When we run this, we’re able to see the active connections to our database, and close whatever connections we need to close.


We then drop our database by running ‘db-drop’, but decide we should create a script to run all of our scripts, so we don’t have to run them individually. We create ‘db-setup’ and make it executable.

#! /usr/bin/bash
set -e # stop if it fails at any point

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="db-setup"
printf "${CYAN}==== ${LABEL}${NO_COLOR}\n"

bin_path="$(realpath .)/bin"

source "$bin_path/db-drop"
source "$bin_path/db-create"
source "$bin_path/db-schema-load"
source "$bin_path/db-seed"

With that set up, we could now install the Postgres driver in our backend application. We’re also going to use an AWS Lambda to insert users into our database; since AWS lacks the required PostgreSQL libraries in the Lambda runtime image, that will later need a custom-compiled psycopg2 C library supplied as a layer. For the backend, we add the driver to our requirements.txt file.

psycopg[binary]
psycopg[pool]

We then do a ‘pip install -r requirements.txt’ to install the driver.

We’re going to use connection pooling as a way of dealing with connections to our database. A relational database can only handle a limited number of connections, and with concurrent requests some connections are actively running queries while others sit idle. A connection pool lets us keep a fixed set of connections open and reuse them across those concurrent requests instead of opening a new one each time.

Since we’re running Lambda functions, if our app became widely popular we’d need to use a proxy, as each Lambda invocation can open its own database connection, which could become expensive.

To create our connection pool, we create a new file in ‘backend-flask/lib’ named ‘db.py’

# Importing the ConnectionPool class from the psycopg_pool module
from psycopg_pool import ConnectionPool
# Importing the os module to access environment variables
import os

# Function to wrap a SQL query result representing a single object
def query_wrap_object(template):
    # Constructing a SQL query to select a single object and convert it to JSON format
    sql = f"""
    -- This query selects a single object and converts it to JSON format.
    -- The COALESCE function is used to handle cases where the result might be NULL, 
    -- returning an empty JSON object in such cases.
    (SELECT COALESCE(row_to_json(object_row),'{{}}'::json) FROM (
    {template}   -- 'template' is a placeholder for the SQL query template provided as argument
    ) object_row);
    """
    return sql

# Function to wrap a SQL query result representing an array of objects
def query_wrap_array(template):
    # Constructing a SQL query to select an array of objects and convert it to JSON format
    sql = f"""
    -- This query selects an array of objects and converts it to JSON format.
    -- The COALESCE function is used to handle cases where the result might be NULL, 
    -- returning an empty JSON array '[]' in such cases.
    (SELECT COALESCE(array_to_json(array_agg(row_to_json(array_row))),'[]'::json) FROM (
    {template}   -- 'template' is a placeholder for the SQL query template provided as argument
    ) array_row);  
    """   
    return sql

# Retrieving the database connection URL from an environment variable named "CONNECTION_URL"
connection_url = os.getenv("CONNECTION_URL")

# Creating a connection pool using the ConnectionPool class with the provided connection URL
pool = ConnectionPool(connection_url)

https://www.psycopg.org/psycopg3/docs/api/pool.html#:~:text=min_size%20(%20int%20%2C%20default%3A%204,connections%20the%20pool%20will%20hold.

min_size (int, default: 4) – The minimum number of connections the pool will hold. The pool will actively try to create new connections if some are lost (closed, broken) and will try to never go below min_size. max_size (int, default: None) – The maximum number of connections the pool will hold.
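To illustrate how those knobs could be applied, here is a minimal sketch of creating the pool with explicit sizing; the numbers are illustrative assumptions, not values the project necessarily uses:

from psycopg_pool import ConnectionPool
import os

connection_url = os.getenv("CONNECTION_URL")

# Illustrative sizing: keep at least 2 connections warm and cap the pool at 10,
# staying well under Postgres' max_connections limit.
pool = ConnectionPool(connection_url, min_size=2, max_size=10)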

We open docker-compose.yml and add an environment variable for our CONNECTION_URL.

CONNECTION_URL: "postgresql://postgres:password@localhost:5432/cruddur"

Next we open home_activities.py to import our connection pool, remove our mock data, and add our query to establish our connection.

from lib.db import pool, query_wrap_array

...........................

# Constructing a SQL query to select an array of objects representing activities
# Joining the public.activities table with public.users table to get additional information about the user associated with each activity
# Ordering the results by the creation timestamp of activities in descending order

sql = query_wrap_array("""
SELECT
    activities.uuid,
    users.display_name,
    users.handle,
    activities.message,
    activities.replies_count,
    activities.reposts_count,
    activities.likes_count,
    activities.reply_to_activity_uuid,
    activities.expires_at,
    activities.created_at
FROM public.activities
LEFT JOIN public.users ON users.uuid = activities.user_uuid
ORDER BY activities.created_at DESC
""")

# Printing the constructed SQL query for debugging purposes
print("SQL=========")    
print(sql)
print("SQL+++++++++")       

# Establishing a connection to the database using the connection pool
with pool.connection() as conn:
    # Creating a cursor object to execute SQL queries
    with conn.cursor() as cur:
        # Executing the SQL query
        cur.execute(sql)
        # Fetching the result (which is expected to be a tuple containing JSON data)
        # Note: This assumes that the SQL query returns a single row of JSON data
        json = cur.fetchone()

# Printing a delimiter for debugging purposes
print("---------") 
# Printing the JSON data extracted from the database result
print(json[0])      
# Returning the extracted JSON data
return json[0]

This will fetch the data and return the results. Since we’re writing raw SQL, it also allows us to return JSON directly from Postgres.
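For completeness, the single-object wrapper from db.py can be used the same way. A minimal sketch, with an illustrative query rather than code from the project:

from lib.db import pool, query_wrap_object

# Illustrative: fetch the most recent activity as a single JSON object.
sql = query_wrap_object("""
SELECT activities.uuid, activities.message, activities.created_at
FROM public.activities
ORDER BY activities.created_at DESC
LIMIT 1
""")

with pool.connection() as conn:
    with conn.cursor() as cur:
        cur.execute(sql)
        json = cur.fetchone()
print(json[0])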

After working through some SQL errors, we pointed our attention back towards RDS. We spin up our RDS database, then test connecting to it using the terminal.

psql $PROD_CONNECTION_URL

Since we have not yet set up access through the security group for our RDS instance, the connection hangs. We must get the IP address of our Gitpod environment, then add it to an inbound rule in our security group in AWS. We manually set up the inbound rule through the AWS console.

To do this, from the terminal, we run ‘curl ifconfig.me’, which outputs our Gitpod IP address.

We next set GITPOD_IP=$(curl ifconfig.me) as a variable so we can grab GITPOD_IP for RDS whenever needed. This allowed us to store our current IP address as an environment variable.

We again test the ‘psql’ command above, and this time it works. Since our IP is going to change every time we launch our workspace, we will need to manually update the IP stored in the inbound rule every time as well.

There are several environment variables we then set after this, passing our security group ID and our security group rule ID as variables: DB_SG_ID and DB_SG_RULE_ID.

We also store our env setup in ‘.gitpod.yml’, and create a new batch script named ‘rds-update-sg-rule’ (shown after the .gitpod.yml snippet) to run every time our environment launches:

  - name: postgres
    init: |
      curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc|sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/postgresql.gpg
      echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" |sudo tee  /etc/apt/sources.list.d/pgdg.list
      sudo apt update
      sudo apt install -y postgresql-client-13 libpq-dev      
    command: |
      export GITPOD_IP=$(curl ifconfig.me)
      source  "$THEIA_WORKSPACE_ROOT/backend-flask/bin/rds-update-sg-rule" 
#! /usr/bin/bash

CYAN='\033[1;36m'
NO_COLOR='\033[0m'
LABEL="rds-update-sg-rule"
printf "${CYAN}== ${LABEL}${NO_COLOR}\n"

aws ec2 modify-security-group-rules \
    --group-id $DB_SG_ID \
    --security-group-rules "SecurityGroupRuleId=$DB_SG_RULE_ID,SecurityGroupRule{Description=GITPOD,IpProtocol=tcp,FromPort=5432,ToPort=5432,CidrIpv4=$GITPOD_IP/32}"


After confirming the connection to RDS from Gitpod, we modified docker-compose.yml to pass a different env var for CONNECTION_URL.

CONNECTION_URL: "${PROD_CONNECTION_URL}"

The new variable didn’t work at first because we did not have a schema loaded. I accessed backend-flask, then ran ‘./bin/db-schema-load prod’. This got rid of the ‘psycopg.errors.UndefinedTable: relation “public.activities” does not exist’ error we were receiving.

When we load the webpage, it’s still blank, but this is because there’s no data. To create data, we need a user to sign up. When a user signs up through Cognito, we need to create a record for them in our database, which can be done through a Lambda.

We set up the Lambda function through AWS and needed to set an environment variable for it. We created CONNECTION_URL, set to the PROD_CONNECTION_URL we defined earlier through the AWS CLI for Gitpod.

Next we needed to add a layer so the function could interact with the psycopg2 library, so we referenced https://github.com/jetbridge/psycopg2-lambda-layer, where I found the Python 3.8 layer for my region (us-east-1). I then inserted the ARN of the layer into the console and verified it’s compatible, so it was added to the function.
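The handler code in cruddur-post-confirmation.py is referenced below but not reproduced in these notes, so here is a minimal sketch of what a post-confirmation handler along these lines might look like. It assumes psycopg2 from the layer above, the standard Cognito attributes (name, email, sub), a preferred_username collected at sign-up for the handle, and the email column on public.users; names and structure are illustrative rather than the exact code:

import os
import psycopg2

def lambda_handler(event, context):
    # Cognito passes the confirmed user's attributes on the post-confirmation event.
    user = event['request']['userAttributes']
    print('userAttributes:', user)

    conn = None
    try:
        sql = """
          INSERT INTO public.users (display_name, email, handle, cognito_user_id)
          VALUES (%s, %s, %s, %s)
        """
        params = [
            user['name'],
            user['email'],
            user.get('preferred_username'),   # assumes a handle is collected at sign-up
            user['sub'],                      # Cognito's unique user id
        ]
        conn = psycopg2.connect(os.getenv('CONNECTION_URL'))
        cur = conn.cursor()
        cur.execute(sql, params)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')
    return event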

We added the Lambda function to our Cognito user pool through Cognito > User pools, selecting our user pool and adding a Lambda trigger: trigger type sign-up, post confirmation trigger. I then selected the function we already configured as the function to use.

We tested the Lambda function by opening the CloudWatch logs, then going to Cognito and removing the existing user we created a couple of weeks ago. We then attempted to recreate our user account through our site, but received a “PostConfirmation failed with error local variable ‘conn’ referenced before assignment.” error. We viewed the CloudWatch log for this error through the AWS console.

The error indicates a problem with our code in the cruddur-post-confirmation.py we created earlier. We updated the indentation of the Lambda function code, copied the new code to the Lambda function, and deployed the changes. We tried the web site again, resending a new authentication code. This time, we received a new error.

We updated our Lambda function again, this time removing conn = psycopg2.connect(os.getenv('CONNECTION_URL')) and cur = conn.cursor() from the ‘try’ block and adding print statements to output a “SQL Statement ----” label and the sql itself. We again went back to Cognito and removed our user, then signed up through the web site again. We again got a timeout error, but we knew this would happen; we just wanted to see what information was passing through.

After some corrections to syntax, we went back to Lambda and found that we needed to attach a VPC to our function. To do this, we needed to grant the Lambda additional permissions. We went to the Lambda Configuration, selected Permissions, and selected the execution role, which redirected us to the IAM page for the function’s execution role. Under Add permissions > Attach policies, we found we needed to create a custom policy allowing Lambda to create a network interface to reach the database. We then attached the policy to the function’s execution role through IAM to add the permissions.

We again went back and configured the VPC for our function, this time choosing our VPC, 2 subnets for better availability, and our default VPC security group.

Back in Cognito, we again removed our user, then went back to our web app to test again. We got the same error from our website. I found that Andrew did not, so I repeated the same steps: removed the user from Cognito, then went back to our web app and created an account again. This time, I got the same response as Andrew. In the latest CloudWatch log, we were now getting a syntax error.

We reviewed our function once again, and after clearing up a few more syntax errors in our SQL command, we tested again. I came back with a successful new user created, but Andrew got an error during the video regarding the email column because his schema was not loaded. This led us to look at our schema.sql file and edit the public.users table so its columns do not allow NULL data, and we also reran the command to load the schema from the CLI: “./bin/db-schema-load prod”.

Once more we tested our function by removing the user from Cognito, creating a new account through our web app, then again checking our Cloudwatch logs. This time, the user was successfully created without errors.

To test this from the terminal, we selected our Postgres bash terminal in Gitpod, then connected to our production database with ./bin/db-connect prod. Once connected, we queried the users table with select * from users;, which returned 1 row: our newly created user account!



Next, we spun up our environment, logged in, and found there’s no data. This is because we need to create it.

We begin by creating activities. While working out the code for activities, we found a function to add to db.py for error handling, from https://kb.objectrocket.com/postgresql/python-error-handling-with-the-psycopg2-postgresql-adapter-645.
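The helper itself isn’t reproduced in these notes; a rough sketch of the idea, written against psycopg (v3) as used in db.py rather than psycopg2 from the article, could look like this (the name print_sql_err is illustrative):

import traceback
import psycopg

def print_sql_err(err):
    # Surface what Postgres reported and where it happened,
    # so SQL problems show up clearly in the backend logs.
    print("psycopg ERROR:", err)
    print("SQLSTATE:", getattr(err, "sqlstate", None))
    print(traceback.format_exc())

Queries can then be wrapped in try/except psycopg.Error and passed through this helper instead of failing silently.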

We defined a new function in db.py to clean up the code in create_activity.py, and called that function in home_activities.py instead. We then wrapped all of the functions in db.py by defining a new Db class and importing it into home_activities.py.
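The refactor isn’t shown here, but a minimal sketch of a Db wrapper in this spirit might look like the following; method names and details are illustrative assumptions, not the project’s exact implementation:

from psycopg_pool import ConnectionPool
import os

class Db:
    def __init__(self):
        # One pool per process, created from the same CONNECTION_URL as before.
        self.pool = ConnectionPool(os.getenv("CONNECTION_URL"))

    def query_array_json(self, sql):
        # Execute a query already wrapped with query_wrap_array and
        # return the JSON array from the single result row.
        with self.pool.connection() as conn:
            with conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchone()[0]

    def query_object_json(self, sql):
        # Same idea for a query wrapped with query_wrap_object.
        with self.pool.connection() as conn:
            with conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchone()[0]

# Shared instance that callers such as home_activities.py can import.
db = Db()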