Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: v1 wip #141

Merged
merged 99 commits into from
Nov 10, 2023
Merged

feat: v1 wip #141

merged 99 commits into from
Nov 10, 2023

Conversation

chris13524
Copy link
Member

@chris13524 chris13524 commented Oct 23, 2023

Description

Implements WalletConnect/walletconnect-specs#161

Resolves #70
Resolves #39

Changes

Queuing functionality:

  • Added asynchronous task that processes queued notifications from the database
  • New v1 /notify endpoint which saves notifications to this queue
    • Supports idempotency
    • Supports sending multiple notifications
      Relay:
  • @chris13524 Retry all relay message publishes
  • Add TTL to HTTP relay JWT
  • Add user agent to HTTP relay client

/v1/subscribers

  • Returns the scopes each subscriber is subscribed to

Metrics:

  • RDS Grafana dashboards
  • Total subscribed topics
    • @chris13524 Don't have 1 metric all 3, just have the two project and subscribers metric, and then add them together in Grafana?
  • Count + latency + status code for each HTTP endpoint
    • @chris13524 Grafana panel for rate of each HTTP endpoint
    • @chris13524 Grafana panel for latency readings of each HTTP endpoint
    • @chris13524 Grafana panel for showing status codes somehow
  • extracted to issue: Postgres handler metrics #184
  • @chris13524 Count + latency + status for each websocket message processed
    • @chris13524 Grafana panel for rate of each handler function
    • @chris13524 Grafana panel for latency readings of each handler function
    • @chris13524 Grafana panel for showing status responses somehow
  • @chris13524 Count + latency + attributes: temporary/permanent failures & tag for each message published to the relay
    • @chris13524 Grafana panel showing rate of messages sent to relay, should be no more than 50 msgs/sec for initial launch
    • @chris13524 Grafana panel showing rate of temporary and permanent failures

Refactors:

  • MongoDB completely removed from application, CI, and tests
  • anyhow removed in-favor of thiserror
  • Rename integration tests to deployment tests, and rename storage tests to integration tests
  • Various refactors:
    • Keep items in key and Url types instead of strings
    • Improve module structure, placement of routes, relay client helpers, simplify websocket module
    • Add bind_ip to allow local development to bind to localhost, avoiding popups on macOS
    • Notification type IDs in UUID form rather than String, as this is what Cloud serves. Existing database unchanged, but invalid entries will be ignored. In the future we can migrate the database and drop invalid values and change from VARCHAR to UUID, but for now we will keep.
    • Remove anyhow

Tasks

Microservice:
To process notifications async from the API server we've decided to go with the separate microservice that processes messages from the queue:

  • Separate entry points for the notify API server and new microservice (7a44f),
  • Implement microservice:
    • Messages processing queue in PostgreSQL table with the sqlx PgListener for new records (and existing records back to the awaiting for processing status) event notification.
    • Microservice bootstrap.
    • Messages processing worker.
    • Logging and debugging.
    • Expose metrics and analytics for microservice.
    • Deploying mechanism updates for the new microservice.
    • Consolidate binaries into 1 process for now
    • Why double pg_notify trigger?
    • Idempotency key
    • Prevent updating already sent notification
  • @chris13524: convert migration tests into integration tests and finish removing MongoDB from fix: remove DocDB from runtime #167
  • /v1/notify endpoint

Deferred:

  • /v1/ get messages & status endpoints
  • Separate microservice

How Has This Been Tested?

Test cases

Due Diligence

  • Breaking change
  • Requires a documentation update
  • Requires a e2e/integration test update


info!("Response: {response:?} for /v1/notify from project: {project_id}");

if let Some(metrics) = &state.metrics {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably remove these metrics in a follow-up PR, as these have been replaced with other ones.

@@ -468,7 +597,7 @@ pub async fn get_subscription_watchers_for_account_by_app_or_all_app(
SELECT project, did_key, sym_key
FROM subscription_watcher
LEFT JOIN project ON project.id=subscription_watcher.project
WHERE account=$1 AND (project IS NULL OR project.app_domain=$2)
WHERE expiry > now() AND account=$1 AND (project IS NULL OR project.app_domain=$2)
Copy link
Member Author

@chris13524 chris13524 Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixed query returning to expired watchers, not a major issue, but a fix nevertheless

&self.wsclient,
&self.state.metrics,
)
async fn connect(state: &AppState, client: &Client) -> Result<()> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary wrapper struct. Refactored to simplify

Copy link
Contributor

@xav xav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see anything obviously wrong, but it's almost impossible to review this, it's just too big. Each of the items in the least could have been a single PR.

@@ -30,7 +30,7 @@ pub struct SubscriberNotification {
/// Hash of the CAIP-10 account of the subscriber
pub account_hash: String,
/// The notification type ID
pub notification_type: Arc<str>,
pub notification_type: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you change to String, is it because of the DB mapping?

Copy link
Member Author

@chris13524 chris13524 Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the notification type was changed from Arc<str> to a Uuid in most of the codebase. And here it's creating a new String representation for it via .to_string() and converting the String to an Arc<str> doesn't provide any value

@chris13524 chris13524 merged commit 0b4e6af into main Nov 10, 2023
14 checks passed
@chris13524 chris13524 deleted the chore/messaging-refactor branch November 10, 2023 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: allow more than 500 notifications to be sent at once feat: webhook retry & queuing
3 participants