Merge pull request #6 from kjs222/add-documentation: add documentation
## Note

It may be easiest to read this README (with images) on GitHub: https://github.com/kjs222/congressional-app

## Tech Stack

- React TypeScript frontend
- Node TypeScript backend
- Infrastructure as code via AWS CDK
- Deployed on AWS (see details below on architecture)
## Architecture

- Deployed URL: http://kjs222-congressional-application.s3-website-us-east-1.amazonaws.com
- Repository: https://github.com/kjs222/congressional-app

![Architecture Diagram](images/arch_diagram.png)

The architecture is described below and depicted above. The application is deployed on AWS and is managed as infrastructure as code using AWS CDK:

- backend IaC: backend/lib/congressional-app-backend-stack.ts
- frontend IaC: frontend/infrastructure/lib/congressional-app-frontend-stack.ts
#### Data Persistence

The application uses AWS DynamoDB (NoSQL) for the data persistence layer. Two tables support the application:

- congressDataCollectorRaw
- congressAnalyzedVotes

The tables are created in backend/lib/congressional-app-backend-stack.ts; DynamoDB imposes no schema other than a requirement to set up a partition key and a sort key. Type safety is imposed within the application using [zod](https://zod.dev/) schemas: all data going into or coming out of the database is validated against those schemas.

See example data access here: backend/src/api/adapters/dynamo-analyzed-vote-repository.ts

I chose a NoSQL database for a few reasons:

- The application is an MVP, and I will learn more about its data access patterns as it evolves. Changing schemas in a NoSQL datastore is trivial compared to a SQL database; once I understand the access patterns better, I may transition to a SQL datastore.
- The API side of the application is read-only, which limits some of the challenges I have encountered with NoSQL in the past.
- The application uses a hexagonal design (aka ports and adapters), making a change of datastores relatively smooth.
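The repo validates with zod; purely as an illustration of the kind of runtime check involved, a hand-rolled type guard for a raw-table record (which uses `part`, `sort`, and `raw` string fields) might look like this. The function name is hypothetical, not the project's actual code:

```typescript
// Hypothetical guard illustrating the runtime validation the application
// performs (via zod) before data enters or leaves DynamoDB.
interface RawVoteRecord {
  part: string; // partition key
  sort: string; // sort key
  raw: string;  // raw vote payload, stored as a JSON string
}

function isRawVoteRecord(value: unknown): value is RawVoteRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.part === "string" &&
    typeof v.sort === "string" &&
    typeof v.raw === "string"
  );
}
```

With zod, the equivalent is a `z.object({ ... })` schema plus `schema.parse(item)`, which throws on malformed data instead of returning a boolean.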
#### Data Collector

- Deployed on: AWS Lambda (serverless)
- Invoked by: a scheduled AWS EventBridge event (10pm MT every evening)
- Code path: backend/src/data-collector

The data collector is deployed and functioning in production: it collects data every night and persists it in the data store.

Purpose:

- makes API calls to the ProPublica API to get the prior day's congressional votes
- persists the raw vote information in AWS DynamoDB
- sends an event on AWS SQS for the data analyzer

Key files: the fetcher is backend/src/data-collector/adapters/propublica-vote-fetcher.ts, and the service that orchestrates fetching and saving is backend/src/data-collector/services/data-collection-service.ts.
##### Event Collaboration

To decouple the data collector from the data analyzer (and allow them to scale independently), the collaboration between the two components happens through an SQS queue. After the data collector persists the raw votes obtained from the ProPublica API, it emits an event onto the queue, which is picked up by the data analyzer.
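As a rough sketch of the consuming side (payload fields hypothetical; the `Records`/`body` envelope is the shape AWS uses when delivering SQS messages to a Lambda), the analyzer's entry point receives a batch of queue records and processes each body:

```typescript
// Minimal sketch of an SQS-triggered Lambda handler. The CollectorEvent
// payload is hypothetical; AWS delivers queue messages in batches under
// event.Records, each with a string body the sender serialized as JSON.
interface SqsEvent {
  Records: { body: string }[];
}

interface CollectorEvent {
  batchId: string; // hypothetical: identifies the batch of raw votes to analyze
}

async function handler(event: SqsEvent): Promise<string[]> {
  const processed: string[] = [];
  for (const record of event.Records) {
    const message = JSON.parse(record.body) as CollectorEvent;
    // ...here the real analyzer would load and analyze the raw votes...
    processed.push(message.batchId);
  }
  return processed;
}
```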
#### Data Analyzer

- Deployed on: AWS Lambda (serverless)
- Invoked by: SQS event
- Code path: backend/src/data-analyzer

Purpose:

- performs analysis on the raw data collected by the data collector
- persists the analyzed data to be served up by the API
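The actual analysis logic lives under backend/src/data-analyzer; purely as a hypothetical illustration of the kind of transformation involved (field names are invented, not the project's schema), analyzed data might aggregate raw member votes by position:

```typescript
// Hypothetical sketch: tally vote positions from raw vote data.
// Illustrative only; not the application's actual analysis code.
interface MemberVote {
  memberId: string;
  position: "Yes" | "No" | "Not Voting";
}

function tallyPositions(votes: MemberVote[]): Record<string, number> {
  const tally: Record<string, number> = {};
  for (const vote of votes) {
    // Count each position, defaulting missing keys to zero.
    tally[vote.position] = (tally[vote.position] ?? 0) + 1;
  }
  return tally;
}
```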
#### API

- Deployed on: AWS Lambda (serverless) with AWS API Gateway (routing, etc.)
- Invoked by: HTTP
- Code path: backend/src/api

Purpose:

- serves as the API for the frontend application
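A Lambda behind API Gateway (proxy integration) must return a response in a fixed shape: a numeric `statusCode`, optional `headers`, and a *string* `body`, so JSON results are serialized. A minimal hypothetical helper, not the project's actual code:

```typescript
// Sketch of an API Gateway proxy-integration response. The payload is
// hypothetical; the statusCode/headers/body shape is what API Gateway
// expects from a Lambda proxy integration.
interface ApiResponse {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

function ok(payload: unknown): ApiResponse {
  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload), // API Gateway requires a string body
  };
}
```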
#### Frontend

- Deployed on: AWS S3 static website
- Code path: frontend/src
- URL: http://kjs222-congressional-application.s3-website-us-east-1.amazonaws.com

A React application that interacts with the API described above to display analyzed vote information.
## CI/CD

CI/CD is implemented using GitHub Actions:

- CI workflow: .github/workflows/pr.yml
- CD workflow: .github/workflows/deploy.yml

#### CI

The CI workflow is initiated when a PR is opened against the `main` branch. It builds the application and runs all unit and integration tests.

See examples [here](https://github.com/kjs222/congressional-app/actions/workflows/pr.yml)

![CI runs](images/CI_1.png)
![CI detail](images/CI_2.png)

#### CD

The CD workflow is initiated on a push to `main`. It deploys the frontend and backend applications on AWS.

See examples [here](https://github.com/kjs222/congressional-app/actions/workflows/deploy.yml)

![CD detail](images/CD_1.png)
## Instrumentation + Metrics

Given that the application is entirely serverless, using Prometheus (which requires a server) seemed like an odd choice. Most production metrics tools (Datadog, etc.) integrate with AWS, and if the application evolves I would likely move to one of those paid tools. In the interim, AWS CloudWatch provides sufficient monitoring and instrumentation for the application.

See some examples below:

![Metrics Exp 1](images/Metrics_1.png)
![Metrics Exp 2](images/Metrics_2.png)
## Testing

The application contains both unit and integration tests.

Backend tests use the following testing frameworks:

- mocha with chai
- sinon for stubs, spies, and mocks

### Unit Tests

Requirements:

- node: version 20, though lower versions likely work
- npm (installed with node)

To run:
```
cd backend
npm install
npm run unit-test
```
##### Mocking

As indicated above, sinon is used for mocking.

See example usage here: backend/test/unit/api/vote-handler.spec.ts
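As a conceptual illustration of what that mocking achieves (this is not the project's test code, and the repository interface here is invented), stubbing a dependency lets a handler be unit-tested without touching DynamoDB:

```typescript
// Conceptual sketch of stubbing a dependency in a unit test. With sinon
// this would be sinon.stub(repo, "getVotes").resolves([...]) instead of a
// hand-built object.
interface VoteRepository {
  getVotes(): Promise<string[]>;
}

// A stub that returns canned data instead of querying DynamoDB.
const stubRepo: VoteRepository = {
  getVotes: async () => ["vote-1", "vote-2"],
};

// Code under test: depends only on the interface, so the stub slots in.
async function countVotes(repo: VoteRepository): Promise<number> {
  const votes = await repo.getVotes();
  return votes.length;
}
```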
### Integration Tests

Additional requirements:

- docker

To run:

```
cd backend
npm install
npm run integration-test-local
```
## To Run Locally

#### Frontend

```
cd frontend
npm install
npm run start
```
The application will be available at localhost:3000. Note, however, that I am not exposing the env variables needed to actually run the application (sorry).

#### Backend

The backend application is serverless, so there is no server to run. The "handler" is the entry point into each application component, and it can conceivably be invoked directly.

Some things to note:

- The API for data collection is ProPublica's Congressional API. This requires an API key that I am not exposing (it is saved in AWS Secrets Manager).
- To spin up a dockerized local database:

```
cd backend
npm install
npm run start-local-ddb
```