Merge pull request #6 from kjs222/add-documentation
add documentation
kjs222 authored Jan 20, 2024
2 parents 4d4534b + bbd2ab0 commit 97b432e
Showing 7 changed files with 168 additions and 43 deletions.
211 changes: 168 additions & 43 deletions README.md
@@ -1,66 +1,191 @@
### Data Collector

The data collector is deployed on AWS and functioning in production. It is collecting data every night and persisting the data in the data store.

The architecture of the Data Collector is as follows:

- Data Store:
  - DynamoDB (NoSQL, schemaless database)
- Runtime:
  - AWS Lambda (serverless)
  - TypeScript/Node
- Trigger:
  - AWS EventBridge scheduled event (10pm MT every evening)

### Datastore

The datastore for the data collector is created in `lib/congressional-app-backend-stack.ts` (code excerpted below). There is no schema other than a requirement to set up a partition and sort key:

## Note

It may be easiest to read this README (with images) on GitHub: https://github.com/kjs222/congressional-app

## Tech Stack

- React TypeScript frontend
- Node TypeScript backend
- Infrastructure as Code via AWS CDK
- Deployed on AWS (see details below on architecture)

## Architecture

- Deployed URL: http://kjs222-congressional-application.s3-website-us-east-1.amazonaws.com
- Repository: https://github.com/kjs222/congressional-app

![Architecture Diagram](images/arch_diagram.png)

The architecture is depicted above and described below. The application is deployed on AWS and managed as infrastructure as code (IaC) with AWS CDK.

- backend IAC: backend/lib/congressional-app-backend-stack.ts
- frontend IAC: frontend/infrastructure/lib/congressional-app-frontend-stack.ts

#### Data Persistence

The application uses AWS DynamoDB (NoSQL) for the data persistence layer. Two tables support the application:

- congressDataCollectorRaw
- congressAnalyzedVotes

The schema is flexible on both tables. Type safety is imposed within the application using [zod](https://zod.dev/) schemas: all data going into or coming out of the database is validated against them.

See example data access here: backend/src/api/adapters/dynamo-analyzed-vote-repository.ts
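
To illustrate validating data at the database boundary, here is a minimal sketch. It is not the repository's code: it uses a hand-written, dependency-free parser as a stand-in for a zod schema, and the `RawVoteItem` name and `parseRawVoteItem` function are assumptions made for illustration.

```typescript
// Hypothetical item shape with a partition key, a sort key, and a raw payload.
interface RawVoteItem {
  part: string;
  sort: string;
  raw: string;
}

// Stand-in for a zod schema's parse step: accept unknown data from the
// database and either return a typed item or throw.
function parseRawVoteItem(input: unknown): RawVoteItem {
  if (typeof input !== "object" || input === null) {
    throw new Error("expected an object");
  }
  const { part, sort, raw } = input as Record<string, unknown>;
  if (
    typeof part !== "string" ||
    typeof sort !== "string" ||
    typeof raw !== "string"
  ) {
    throw new Error("part, sort and raw must all be strings");
  }
  // Only the validated fields are returned, so extra attributes are dropped.
  return { part, sort, raw };
}
```

Because the table itself enforces no schema, a parser like this is the single place where malformed items get rejected before they reach the rest of the application.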

I chose a NoSQL database for several reasons:

- the application is an MVP, and I will learn more about its data access patterns as it evolves. Changing schemas in a NoSQL datastore is trivial compared to a SQL database. Once I understand the access patterns better, I may transition to a SQL datastore.
- the API aspect of the application is read-only, which limits some of the challenges I have encountered with NoSQL in the past.
- the application uses a hexagonal design (aka ports and adapters), which makes changing datastores relatively smooth.
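
The ports-and-adapters point can be sketched as follows. This is an illustrative example under stated assumptions, not the repository's code: the `AnalyzedVote` shape, the `AnalyzedVoteRepository` port, and the in-memory adapter are all hypothetical, and the methods are synchronous only to keep the sketch short (a real database adapter would be asynchronous).

```typescript
// Hypothetical domain type; the real analyzed-vote shape is not shown here.
interface AnalyzedVote {
  id: string;
  summary: string;
}

// Port: the only persistence contract the application code depends on.
interface AnalyzedVoteRepository {
  getById(id: string): AnalyzedVote | undefined;
  save(vote: AnalyzedVote): void;
}

// Adapter: one interchangeable implementation of the port. A DynamoDB or SQL
// adapter would implement the same interface.
class InMemoryAnalyzedVoteRepository implements AnalyzedVoteRepository {
  private readonly items = new Map<string, AnalyzedVote>();

  getById(id: string): AnalyzedVote | undefined {
    return this.items.get(id);
  }

  save(vote: AnalyzedVote): void {
    this.items.set(vote.id, vote);
  }
}

// Application code receives the port, never a concrete datastore.
function describeVote(repo: AnalyzedVoteRepository, id: string): string {
  const vote = repo.getById(id);
  return vote ? vote.summary : "not found";
}
```

Swapping datastores then means writing a new adapter and changing the wiring, while functions like `describeVote` stay untouched.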

#### Data Collector

- Deployed on: AWS Lambda (serverless)
- Invoked by: Scheduled AWS EventBridge event
- Code path: backend/src/data-collector

Purpose:

- calls the ProPublica API to get congressional votes from the prior day
- persists the raw vote information in AWS DynamoDB
- sends an event on AWS SQS for the data analyzer

##### Event Collaboration

To decouple the data collector from the data analyzer (and allow them to scale independently), the two components collaborate through events on an SQS queue. After the data collector persists raw votes obtained from the ProPublica API, it emits an event onto the SQS queue, which is picked up by the data analyzer.
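
The collaboration described above can be sketched with an in-memory queue standing in for SQS. Everything here is an assumption made for illustration (the `RawVotesSavedEvent` shape, the `InMemoryQueue` class, and both functions); the real components use AWS SQS and Lambda.

```typescript
// Hypothetical event payload: identifies the batch of raw votes just saved.
interface RawVotesSavedEvent {
  batchId: string;
}

// In-memory FIFO queue used here in place of SQS.
class InMemoryQueue<T> {
  private readonly messages: T[] = [];

  send(message: T): void {
    this.messages.push(message);
  }

  receive(): T | undefined {
    return this.messages.shift();
  }
}

// Collector side: persist first, then announce the batch on the queue.
function collect(
  queue: InMemoryQueue<RawVotesSavedEvent>,
  savedBatchIds: string[],
  batchId: string
): void {
  savedBatchIds.push(batchId); // stand-in for writing raw votes to the table
  queue.send({ batchId });
}

// Analyzer side: knows only the event shape, not the collector.
function analyzeNext(
  queue: InMemoryQueue<RawVotesSavedEvent>
): string | undefined {
  const event = queue.receive();
  return event ? `analyzed ${event.batchId}` : undefined;
}
```

Neither function calls the other; the queue is the only shared dependency, which is what lets the two sides be deployed and scaled separately.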

#### Data Analyzer

- Deployed on: AWS Lambda (serverless)
- Invoked by: SQS Event
- Code path: backend/src/data-analyzer

Purpose:

- Perform analysis on raw data collected by data collector
- Persist analyzed data to be served up by API

#### API

- Deployed on: AWS Lambda (serverless) with AWS API Gateway (routing, etc)
- Invoked by: HTTP API
- Code path: backend/src/api

Purpose:

- API for frontend application

#### Frontend

- Deployed on: AWS S3 Static Website
- Code path: frontend/src
- URL: http://kjs222-congressional-application.s3-website-us-east-1.amazonaws.com

A React application that interacts with the API described above to display analyzed vote information.

## CI/CD

CI/CD is implemented using GitHub Actions:

- CI Workflow: .github/workflows/pr.yml
- CD Workflow: .github/workflows/deploy.yml

#### CI

The CI workflow is initiated when a PR is opened against the `main` branch. It builds the application and runs all unit and integration tests.

See examples [here](https://github.com/kjs222/congressional-app/actions/workflows/pr.yml)

![CI runs](images/CI_1.png)
![CI detail](images/CI_2.png)

#### CD

The CD workflow is initiated on a push to main. It deploys the frontend and backend applications on AWS.

See examples [here](https://github.com/kjs222/congressional-app/actions/workflows/deploy.yml)

![CD detail](images/CD_1.png)

## Instrumentation + Metrics

Given that the application is entirely serverless, using Prometheus (which requires a server) seemed like an odd choice. Most production metrics tools (Datadog, etc.) integrate with AWS, and if the application evolves I would likely move to one of those paid tools. In the interim, AWS CloudWatch provides sufficient monitoring and instrumentation for the application.

See some examples below:

![Metrics Exp 1](images/Metrics_1.png)
![Metrics Exp 2](images/Metrics_2.png)

## Testing

The application contains both Unit and Integration tests.

Backend tests use the following testing frameworks:

- mocha with chai
- sinon for stubs, spies and mocks

### Unit Tests

Requirements:

- node - version 20, but likely lower versions work
- npm - should be installed with node

To run:

```
cd backend
npm install
npm run unit-test
```

```
const rawVotesTable = new dynamo.Table(
  this,
  "congressDataCollectorRawVotes",
  {
    partitionKey: { name: "part", type: dynamo.AttributeType.STRING },
    sortKey: { name: "sort", type: dynamo.AttributeType.STRING },
    tableName: "congressDataCollectorRaw",
    removalPolicy: cdk.RemovalPolicy.DESTROY,
    // change below to billingMode: dynamo.BillingMode.PAY_PER_REQUEST
    readCapacity: 25,
    writeCapacity: 25,
  }
);
```

This table is used to persist two types of data:
##### Mocking

As indicated above, sinon is used for mocking.

See example usage here: backend/test/unit/api/vote-handler.spec.ts

### Integration Tests

1. a marker of the last vote processed so I know where to "end" on the following day
2. raw vote data for processing by the data analyzer (not built yet)
Additional requirements:

Again, there is no schema for a dynamoDB table, so the best that I can show is the shape of the data I am inserting. See `backend/src/data-collector/adapters/dynamo-raw-data-repository.ts`
- docker

The shape of the data for #1 above is:

```
part: string,
sort: string,
rollCall: number,
date: string,
batchId: string
```

To run:

```
cd backend
npm install
npm run integration-test-local
```

The shape of the data for #2 above is:

```
part: string,
sort: string,
raw: string
```

## To Run Locally

#### Frontend:

```
cd frontend
npm install
npm run start
```

Application will be at localhost:3000

I am not exposing the env variables needed to actually run the application however (sorry)

#### Backend

The backend is serverless, so there is no server to run. The "handler" is the entry point of each application component and can, in principle, be invoked directly.

Some things to note:

- to spin up a dockerized database

```
cd backend
npm install
npm run start-local-ddb
```

I am not exposing the env variables needed to actually run the application however (sorry)

### API for Data collection

The API for data collection is ProPublica's Congressional API. This requires an API key that I am not exposing (it is saved in AWS Secrets Manager).

The data fetcher is: `backend/src/data-collector/adapters/propublica-vote-fetcher.ts`

And the service that orchestrates the fetching and saving of the data is: `backend/src/data-collector/services/data-collection-service.ts`
Binary file added images/CD_1.png
Binary file added images/CI_1.png
Binary file added images/CI_2.png
Binary file added images/Metrics_1.png
Binary file added images/Metrics_2.png
Binary file added images/arch_diagram.png