Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add database setup #74

Merged
merged 6 commits into from
Oct 10, 2024
Merged

Add database setup #74

merged 6 commits into from
Oct 10, 2024

Conversation

fhenneke
Copy link
Contributor

@fhenneke fhenneke commented Oct 8, 2024

This PR adds a setup file for the database and a docker image to create a local database.

The database is set up to have four tables

  • A table with settlement transaction hashes and timestamps.
  • A table with settlement transaction hashes and token addresses. It should be populated with all tokens traded in a settlement. Together with the first table we have all tokens traded with times when they were traded.
  • A table for prices from different sources.

This table structure will be mirrored once per chain.

The database can be set up locally using the commands in the README or using the make file command make test_db.

The code does not use this table layout yet. I am planning to add another PR which uses those tables instead of the current tables raw_token_imbalances, slippage_prices, and fees_new.

@@ -0,0 +1,23 @@
CREATE TABLE token_info (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make it chain-specific? I would rename it to

token_info_mainnet

given that we will probably use one version of the db for all chains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we have one database for all chains instead of one database per chain? We use the latter for all other purposes.

If we go with having information on all chain in one database, I would expect that having an additional column network in all tables is better that duplicating tables for each chain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we have one database for all chains instead of one database per chain? We use the latter for all other purposes.

Don't have a strong opinion here, but this would add overhead, as we (i.e., devops) would need to set up different databases, accounts etc

If we go with having information on all chain in one database, I would expect that having an additional column network in all tables is better that duplicating tables for each chain.

Although postgres most likely takes care of everything, having a single table where multiple daemons might try to add things at the same time could create some race conditions. Also, could it make things slower, as network should be part of the primary key now?

Somehow having a separate table per chain looks safer/more flexible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think, @ahhda?

@@ -0,0 +1,23 @@
CREATE TABLE token_info (
token_address bytea PRIMARY KEY,
symbol varchar NOT NULL,
Copy link
Contributor

@harisang harisang Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh the symbol is not needed, so i would remove it (as i am also always nervous with the crazy characters some tokens use that might cause encoding issues here). And if you want to include it, i would definitely make it optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then lets just remove it. No need to store unused data here.

decimals int NOT NULL
);

CREATE TABLE token_times (
Copy link
Contributor

@harisang harisang Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the name a bit unintuitive. I would probably change it to something like

transaction_timestamps

or imbalance_timestamps

Copy link
Contributor Author

@fhenneke fhenneke Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would token_timestamps work? Or token_transactions?

This table is not about imbalances and it is not about transactions (edit:) it is not about timestamps of transactions but about tokens. So not having token in the name sounds misleading to me. (end edit) The table might be superseded by a token_imbalances table for linking tokens to transactions, maybe in combination with a settlements table linking transaction hashes and times.

Copy link
Contributor

@harisang harisang Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table is not about imbalances and it is not about transactions.

Hm, then what is it about? There is a tx hash associated with each entry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a clarification in the comment above.

  • For a table with the name transaction_timestamps i would expect that (tx_hash, time) is a key. Having token addresses in that would surprise me.
  • For a table with the name imbalance_timestamps i would expect that it stores imbalances in some form.

Maybe the clean design would be to have a transaction_timestamps table and a transaction_tokens table. To avoid overhead due to additional tables it could also be a transaction_token_timestamps table with columns (tx_hash, token_address, time) or something like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the distinction between the transaction_timestamps and transaction_tokens tables make sense. And actually we could easily generate the transaction_tokens table already by reducing the scope of the part of the code that computes imbalances (it could simply just keep track of what tokens are being transferred and record those in this table). I.e., repurpose the raw_token_imbalances table into a transaction_tokens table until further notice.

Copy link
Contributor

@bram-vdberg bram-vdberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I added a makefile so we can start the database easier.

Would it be worth adding an index? It would make querying specific tokens faster:

CREATE INDEX idx_prices_token_time ON prices (token_address, time);

Copy link
Contributor

@harisang harisang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitions look good! I cannot really comment on the rest

@fhenneke fhenneke marked this pull request as ready for review October 9, 2024 13:12
PRIMARY KEY (tx_hash, token_address)
);

CREATE TYPE PriceSource AS ENUM ('coingecko', 'moralis', 'dune', 'native');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fhenneke Do you know if this is easy to change, in case we end up using some other price feeds as well?

@harisang harisang merged commit 21727e4 into main Oct 10, 2024
3 checks passed
@fhenneke fhenneke deleted the add_new_database branch October 10, 2024 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants