Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add database setup #74

Merged
merged 6 commits into from
Oct 10, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions Dockerfile.test_db
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
FROM postgres
ENV POSTGRES_PASSWORD postgres
ENV POSTGRES_DB mainnet
COPY ./database/01_table_creation.sql /docker-entrypoint-initdb.d/
9 changes: 7 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,5 +24,10 @@ python -m src.imbalances_script
python -m src.daemon
```

**To run the basic test in the `tests/` folder:**
using pytest, simply run the command: `pytest`
## Tests

To build and start a local database for testing use the command
```sh
docker build -t test_db_image -f Dockerfile.test_db .
docker run -d --name test_db_container -p 5432:5432 test_db_image
```
23 changes: 23 additions & 0 deletions database/01_table_creation.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
CREATE TABLE token_info (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we make it chain-specific? I would rename it to

token_info_mainnet

given that we will probably use one version of the db for all chains

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we have one database for all chains instead of one database per chain? We use the latter for all other purposes.

If we go with having information on all chain in one database, I would expect that having an additional column network in all tables is better that duplicating tables for each chain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we have one database for all chains instead of one database per chain? We use the latter for all other purposes.

Don't have a strong opinion here, but this would add overhead, as we (i.e., devops) would need to set up different databases, accounts etc

If we go with having information on all chain in one database, I would expect that having an additional column network in all tables is better that duplicating tables for each chain.

Although postgres most likely takes care of everything, having a single table where multiple daemons might try to add things at the same time could create some race conditions. Also, could it make things slower, as network should be part of the primary key now?

Somehow having a separate table per chain looks safer/more flexible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think, @ahhda?

token_address bytea PRIMARY KEY,
symbol varchar NOT NULL,
Copy link
Contributor

@harisang harisang Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh the symbol is not needed, so i would remove it (as i am also always nervous with the crazy characters some tokens use that might cause encoding issues here). And if you want to include it, i would definitely make it optional

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then lets just remove it. No need to store unused data here.

decimals int NOT NULL
);

CREATE TABLE token_times (
Copy link
Contributor

@harisang harisang Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the name a bit unintuitive. I would probably change it to something like

transaction_timestamps

or imbalance_timestamps

Copy link
Contributor Author

@fhenneke fhenneke Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would token_timestamps work? Or token_transactions?

This table is not about imbalances and it is not about transactions (edit:) it is not about timestamps of transactions but about tokens. So not having token in the name sounds misleading to me. (end edit) The table might be superseded by a token_imbalances table for linking tokens to transactions, maybe in combination with a settlements table linking transaction hashes and times.

Copy link
Contributor

@harisang harisang Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table is not about imbalances and it is not about transactions.

Hm, then what is it about? There is a tx hash associated with each entry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a clarification in the comment above.

  • For a table with the name transaction_timestamps i would expect that (tx_hash, time) is a key. Having token addresses in that would surprise me.
  • For a table with the name imbalance_timestamps i would expect that it stores imbalances in some form.

Maybe the clean design would be to have a transaction_timestamps table and a transaction_tokens table. To avoid overhead due to additional tables it could also be a transaction_token_timestamps table with columns (tx_hash, token_address, time) or something like that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the distinction between the transaction_timestamps and transaction_tokens tables make sense. And actually we could easily generate the transaction_tokens table already by reducing the scope of the part of the code that computes imbalances (it could simply just keep track of what tokens are being transferred and record those in this table). I.e., repurpose the raw_token_imbalances table into a transaction_tokens table until further notice.

time timestamp NOT NULL,
token_address bytea NOT NULL,
block_number bigint NOT NULL,
tx_hash bytea NOT NULL,

PRIMARY KEY (time, token_address, tx_hash)
);

CREATE TABLE prices (
time timestamp NOT NULL,
token_address bytea NOT NULL,
price numeric(60, 18) NOT NULL,
source varchar NOT NULL,

PRIMARY KEY (time, token_address, source)
);
Loading