Hyperdot - Powerful data analysis and creations platform — RFP #1815

cattania · 2023-06-21T16:32:55Z

Project Abstract

Please replace these instructions with a brief description of your project summarising key points (1-2 paragraphs).

If your application is a follow-up to a previous grant, please mention which one in the first line of the abstract and include a link to previous pull requests if applicable.

Grant level

Level 1: Up to $10,000, 2 approvals
Level 2: Up to $30,000, 3 approvals
Level 3: Unlimited, 5 approvals (for >$100k: Web3 Foundation Council approval)

Application Checklist

The application template has been copied and aptly renamed (project_name.md).
I have read the application guidelines.
Payment details have been provided (bank details via email or BTC, Ethereum (USDC/DAI) or Polkadot/Kusama (USDT) address in the application).
The software delivered for this grant will be released under an open-source license specified in the application.
The initial PR contains only one commit (squash and force-push if needed).
The grant will only be announced once the first milestone has been accepted (see the announcement guidelines).
I prefer the discussion of this application to take place in a private Element/Matrix channel. My username is: @tania.infra3:matrix.org (change the homeserver if you use a different one)

CLAassistant · 2023-06-21T16:33:02Z

All committers have signed the CLA.

dsm-w3f

@cattania thank you for the application. I notice that the deliverables in the milestones are not compliant with our application template. Could you please adjust to include the mandatory ones, especially in milestones 2 and 3? You can take a look at other applications to see examples of how to describe the milestone deliverables.

As I understand it, the main objective of the application is a tool with great performance; correct me if I'm wrong. Since you already have an MVP, have you run any benchmarks of query performance and storage size in comparison with other tools? If so, could you share them with us or give us an idea of the tool's performance?

Furthermore, I think the RFP in question proposes a tool that is as flexible as Dune for generating dynamic dashboards that can be shared with other people. I see that creating queries is within the scope of this application, but will it be possible to share them with other people to use?

cattania · 2023-06-23T02:54:18Z

Thanks for your reply. I have adjusted the milestones to comply with the template.

On performance: Dune customizes its data engine for crypto data, but even so, running complex analytics over tens of millions of blocks can be very time-consuming, because Postgres is not well suited to analytical workloads. hyperdot currently uses Postgres, so we expect its benchmark results to be similar to Dune's. We will also build suitable indexes on block data and enable data compression to further improve performance.

One difference is that hyperdot focuses more on scalability, flexibility, and performance across different analysis scenarios; it uses multiple data engines to address this.

dsm-w3f · 2023-06-23T17:45:43Z

@cattania thank you for the answer. What is your long-term vision for this project? Is this intended to be an open-source library or a business, or do you have some other long-term vision for it? Do you plan to maintain an index of Polkadot/Kusama data for usage? If so, how do you plan to fund the storage cost? How do you plan for this project to be financially sustainable over time?

cattania · 2023-06-25T02:34:18Z

What is your long-term vision for this project? Is this intended to be an open-source library or a business, or do you have some other long-term vision for it?

First, hyperdot will operate as a business.

In the coming months, we hope to provide users with a platform for analysing Polkadot/Kusama/Substrate chain data with SQL queries, and to let them share dashboards. This is exactly what hyperdot proposes for this RFP.

Next, we want to get hyperdot in front of more people with the Foundation's support. We will then continue to improve our technical capabilities: hyperdot will implement more Polkadot/Kusama/Substrate data models, data analysis engines, and query interactions (such as ChatGPT). We are also working to expose hyperdot's indexed data and analytics capabilities to outside developers (as The Graph and SubQuery do) via APIs and Wasm. These efforts will make hyperdot more competitive.

Finally, hyperdot's long-term vision is to become the most competitive platform for on-chain crypto data analysis and creations in web3.

Do you plan to maintain an index of Polkadot/Kusama data for usage? If so, how do you plan to fund the storage cost?

Yes, this is in line with hyperdot's goals. We plan to cover storage costs in the following ways:

hyperdot is free for small queries and for newly created user queries. Large queries will be charged a fee through a smart contract. By balancing the two, hyperdot can easily cover its storage costs.
We plan to distinguish historical data from new data and archive historical data to inexpensive storage (e.g. S3 or IPFS).
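
To make the two cost measures above concrete, here is a minimal sketch of how a free tier for small queries and a hot/cold split for block data could be expressed. All thresholds, fee units, and function names are assumptions for illustration; the thread gives no numbers, and the smart-contract payment step itself is not modelled.

```python
GB = 1_000_000_000

# Hypothetical values; nothing in the proposal fixes these numbers.
FREE_SCAN_LIMIT_BYTES = 1 * GB   # queries scanning up to this much are free
FEE_UNITS_PER_GB = 10            # illustrative fee units per started GB above the limit
HOT_WINDOW_BLOCKS = 1_000_000    # newest blocks kept in the primary database


def query_fee(bytes_scanned: int) -> int:
    """Return 0 for small queries, otherwise a fee per started GB above the free limit."""
    excess = max(0, bytes_scanned - FREE_SCAN_LIMIT_BYTES)
    excess_gb = -(-excess // GB)  # ceiling division
    return excess_gb * FEE_UNITS_PER_GB


def storage_tier(block_number: int, chain_head: int) -> str:
    """Classify a block as 'hot' (primary DB) or 'cold' (archive, e.g. S3 or IPFS)."""
    return "hot" if chain_head - block_number < HOT_WINDOW_BLOCKS else "cold"
```

In this sketch a query scanning 3 GB would be billed for the 2 GB above the free limit, and any block more than `HOT_WINDOW_BLOCKS` behind the chain head would be a candidate for archival.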

How do you plan for this project to be financially sustainable over time?

hyperdot is sustainable because:

We will charge a reasonable fee for queries and dashboard data. This cost model allows us to balance costs and revenue.
As mentioned earlier, we will expose our data indexing and analysis capabilities to more developers through APIs and Wasm, so that we can charge for on-demand indexing and execution.

dsm-w3f · 2023-06-26T17:15:40Z

@cattania Thanks for the answer. Are you aware of these two projects that recently received grants from us?

#1768
#1716

Could you please provide a brief comparison of your project with them? Also note the Data Alliance bounty mentioned in the discussions of these PRs. How could your project fit into this bounty program? It has not launched yet, but it is planned.

cattania · 2023-06-27T01:58:12Z

Thanks for your reply. I have looked at #1768 and #1716.

#1716 is an ETL tool; coincidentally, we used to maintain ethereum-etl. A brief comparison:

Both hyperdot and Dot-ETL Project Proposal #1716 have ETL capabilities, but hyperdot also provides a post-ETL data engine, i.e. $\text{extract} \to \text{transform} \to \text{load data} \to \text{postgres}$
hyperdot provides a data analysis website that allows anyone to run interactive SQL queries and share visual dashboards
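
The extract, transform, load, query flow described above can be sketched as follows. This is an illustration only: the raw block records and the table layout are hypothetical, and `sqlite3` stands in for Postgres so that the example runs without a database server.

```python
import sqlite3

# Hypothetical raw block records, standing in for what a chain client would return.
RAW_BLOCKS = [
    {"number": "0x1", "hash": "0xaa", "extrinsics": ["a", "b"]},
    {"number": "0x2", "hash": "0xbb", "extrinsics": []},
]


def extract():
    """Extract: yield raw records from the (simulated) chain source."""
    yield from RAW_BLOCKS


def transform(raw):
    """Transform: decode the hex block number and count extrinsics."""
    return (int(raw["number"], 16), raw["hash"], len(raw["extrinsics"]))


def load(conn, rows):
    """Load: write the transformed rows into the data engine."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "number INTEGER PRIMARY KEY, hash TEXT, extrinsic_count INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO blocks VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
load(conn, (transform(raw) for raw in extract()))

# The post-ETL step: the loaded data can now be queried with SQL.
total = conn.execute("SELECT SUM(extrinsic_count) FROM blocks").fetchone()[0]
```

Once the data sits in a SQL engine, the website only needs to pass user queries through to it, which is the difference from a pure ETL tool that the comparison points at.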

#1768 goes one step further on top of Dot-ETL by storing data in Google BigQuery, similar to how hyperdot provides a post-ETL data engine. It also provides a dashboard. A brief comparison:

Both hyperdot and Deep Account Analytics in Three Tiers for the Polkadot Data Alliance #1768 can collect data and store it in a data engine. The difference is that hyperdot favors Postgres, which is free and open source, so anyone can easily run hyperdot. hyperdot also supports multiple data engines, so it can work with Postgres as well as others.
Deep Account Analytics in Three Tiers for the Polkadot Data Alliance #1768 provides a data-exploration dashboard similar to Subscan, but focuses more on deep account analytics. hyperdot is closer to Dune: it focuses on interactive SQL queries, dashboards that visualize query results, and shared authoring of data analyses.

Regarding the Polkadot Data Alliance bounty: we have noticed it, and we think hyperdot fits it very well:

hyperdot implements a transparent, open-source data infrastructure that can support ETL-like pipelines for multiple Substrate chains in the Polkadot ecosystem
hyperdot provides multiple data engines for analysis and querying, all accessed through a unified SQL engine (even when the underlying engine does not support SQL), similar to the dotlake mentioned in this document
hyperdot-front-end provides metrics computation and intermediate table analysis similar to Dune

Furthermore, we have thought seriously about the Polkadot Data Alliance bounty's idea of "providing a comprehensive, accurate, and accessible data warehouse and rewarding users who contribute to data analysis". hyperdot also aims to provide a multi-chain, multi-engine data warehouse with unified SQL querying, and we plan to add creation incentives in the future.

dsm-w3f · 2023-06-27T18:56:54Z

@cattania thank you for the answer. I have marked the application as ready for review. The committee will take a look and may ask further questions. We will provide feedback soon.

cattania · 2023-06-28T07:01:53Z

Thank you for your reply. We are ready.

In addition, we would like to add one more point to the comparison between #1716 and hyperdot. Dot-ETL is built on SubQuery, while hyperdot is built on subxt. hyperdot is therefore more flexible, as it is not bound by SubQuery's limitations (e.g. the data types that can be indexed and the set of supported chains).

semuelle · 2023-06-28T10:06:44Z

@KarimJedda, could you have a look?

cattania · 2023-07-07T10:39:40Z

Any progress?

KarimJedda · 2023-07-07T12:15:17Z

Hello @semuelle & @cattania ! This is on my list, I'll aim to get back to you by next week. Due to Decoded happening last week there wasn't much time yet. Hope that's fine ✌️

cattania · 2023-07-07T13:14:01Z

Thanks for your reply, I guessed it might be because of Decoded 🤣

takahser · 2023-07-20T13:56:40Z

@KarimJedda any update? 🙃

cattania · 2023-07-25T08:21:19Z

Is there any progress on the project? We'd love to hear your feedback

KarimJedda · 2023-07-25T10:43:24Z

It took me a while to make time because of another conference (EuroPython) last week and an urgent data request this week, my apologies.

High level

As far as I understand, this project has two components: an indexer (hyperdot-node) and a frontend (hyperdot-front-end), to build, in essence, a Dune Analytics for Substrate and the Polkadot ecosystem.

From what I can see in the POC frontend you shared, it currently queries a Postgres database from the browser. What would be beneficial here is to see how that ties in with online charting capabilities, like what Colorful Notion is proposing with the Apache Superset project they're integrating: How do you see users interacting with and using the system in a way that would be similar to (or different from) Dune?

You mention it also provides a post-ETL data engine, leveraging subxt to ingest data directly from the chains themselves.

A few general questions:

How do you plan to cover the costs of accessing all the nodes in the ecosystem?
How do you plan to maintain the system (system operations, DB backups)?
Are you familiar with the operation and maintenance of Postgres databases?
How do you plan to share the data, if people would like direct access? Is this a use case you plan to cover?

Milestones

Milestone 1 - Backend

0b. You mention sending test transactions; will this also be used to submit transactions somehow? If yes, can you elaborate a little on how that fits into the infrastructure diagram?

0d. Will you be providing the built docker containers too?

1. How will you handle metadata upgrades and data backfills of the data? Some words on that would help.

3. Do you already have a POC of this part? From my own experience, it's a bit challenging; I just want to get your thoughts.

4. This is the central part of the project, what advantage would JSON-RPC for the backend component bring to the system? As for the pub/sub with the frontend, I guess you'll use Websockets right?

5. On this part, I think it's alright to store it in PostgreSQL and I'm sure it'll handle the load. However, it's not an easy thing to maintain and share with the ecosystem. Would an ELT-type system work too? You list in the infrastructure diagram that you can query using DuckDB on files; wouldn't that make the system a bit simpler and more portable? If this is meant to be open sourced, its mode of operation should be simple.

9. I'd add data migrations here too, since I believe the data model might be subject to changes.

Milestone 2 - Frontend

I'm not a frontend expert but I think this looks ok. I'd focus though on very fast iteration and making the frontend agnostic of data sources (ie designing the API the frontend uses in a way that it's not tightly coupled to the storage layer or DB). Meaning, it should work with DuckDB, Postgres, BigQuery etc. Similar to what Apache Superset or Metabase is providing but prettier and more tailored to a "Dune" use case:

public queries
shared dashboards
the possibility of embedding text analysis in the frontend
comments

That way hyperdot can be used for investigations and people sharing their results, instead of sharing screenshots of dashboards. Just my 2 cents.
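
The suggestion to keep the frontend agnostic of data sources can be illustrated with a small sketch: the API the frontend talks to depends only on an abstract engine interface, and each storage backend is an adapter behind it. The class and function names below are hypothetical, `sqlite3` stands in for a real engine such as Postgres, DuckDB, or BigQuery, and the second backend simply returns a canned result to show that backends are interchangeable.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Result = Dict[str, List[Any]]


class QueryEngine(ABC):
    """The only thing the frontend-facing API depends on (hypothetical interface)."""

    @abstractmethod
    def run(self, sql: str) -> Result:
        """Execute SQL and return {"columns": [...], "rows": [[...], ...]}."""


class SqliteEngine(QueryEngine):
    """Adapter for one backend; sqlite3 stands in for a real engine here."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def run(self, sql: str) -> Result:
        cur = self.conn.execute(sql)
        columns = [d[0] for d in cur.description]
        return {"columns": columns, "rows": [list(r) for r in cur.fetchall()]}


class CannedEngine(QueryEngine):
    """A second, trivial backend that always returns a fixed result."""

    def __init__(self, result: Result) -> None:
        self.result = result

    def run(self, sql: str) -> Result:
        return self.result


def execute_query(engines: Dict[str, QueryEngine], engine_name: str, sql: str) -> Result:
    """API entry point: pick a backend by name; the response shape never changes."""
    return engines[engine_name].run(sql)


engines = {
    "sqlite": SqliteEngine(sqlite3.connect(":memory:")),
    "canned": CannedEngine({"columns": ["n"], "rows": [[1]]}),
}
result = execute_query(engines, "sqlite", "SELECT 1 AS n")
```

Because both backends return the same shape, public queries, shared dashboards, and comments can be built once against that shape, and switching the storage layer later only means adding another adapter.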

Conclusion

In general, I would focus on the frontend a whole lot more, it's the main differentiator and user facing capability and I would believe a massive positive point for the community. For the backend, I'd advise considering collaborating with Substrate-ETL (for data directly from chains) or with Dot-ETL together to integrate things like real-time capabilities on top of their solutions. But this is your project, you decide.

Believe me when I say it will simplify the infrastructure a lot. Down the line, if you see the need to build a custom backend or indexer for your own tailored use case (as a value prop), it would be easy to switch. There's massive heavy lifting and operations involved in the backend part, which I believe are underestimated here (don't take it as criticism, please).

Would you consider building your solution on top of a hybrid of what Substrate-ETL/DOT-ETL is providing and focusing on the frontend / user facing part?

Could you also please list the number of chains to be integrated if you end up deciding to do it yourself rather than leveraging substrate-etl/dot-etl? This will help validate the timeline a bit, since it's very ambitious.

My recommendation

Collaborate with Substrate-ETL/Dot-ETL on the "where and how do we get data"
Rework the proposal to be much more frontend focused "Dune for Substrate, but better"
Come up with a value proposition to make it viable: "how would the project stay sustainable and capture value?" to sustain operating costs
Interview a few parachains/ecosystem teams & researchers on the top 3 "dashboards" or interactive data use cases they'd like to see and include how you would solve that as part of the analysis and data platform RFP.

I'm aware this might go a bit beyond the RFP, but there is some possible synergy with other ecosystem teams here that we should definitely leverage. As software developers, I know it's hard but it'll make things much easier.

Going forward I would also recommend:

Reaching out to the Colorful Notion team and DOT-ETL team about their roadmaps and data access
Revising your plan to include more of the frontend/user-facing part and less of the backend part

This is my feedback so far. Hope it helps.

dsm-w3f · 2023-07-31T11:05:45Z

@KarimJedda thank you for the review. @cattania could you give us feedback about the comments from Karim? Are you willing to incorporate his suggestions in the application? What is your opinion on that?

cattania · 2023-08-01T10:28:18Z

Thank you very much, @KarimJedda, for the review. We have thought seriously about Karim's suggestions, and these are the main points at present:

Karim suggests combining hyperdot with Substrate-ETL/Dot-ETL:

We looked at Substrate-ETL and see that the interface it provides is based on Google BigQuery, which has its own SQL dialect. We want to provide ANSI SQL compliant queries, so Substrate-ETL can be used, but it does not seem to meet our ultimate goal.
Dot-ETL seems to be a work in progress at the moment, and it is unclear what the final interface will look like.
As Karim said, developing an ETL is very complicated. We have already implemented the ETL functionality on top of subxt, but if Dot-ETL's functionality meets our needs, we can integrate it.

We have no problem with Karim's suggestion to be more UI-oriented.

Therefore, we will adopt part of @KarimJedda's suggestions:

Simplify the storage part of hyperdot. If Dot-ETL or, after further exploration, Substrate-ETL can meet our needs, we will integrate the data sources these systems provide
hyperdot focuses more on UI, "Dune for Substrate, but better"

Finally, if this is adopted, we expect hyperdot's development costs to come down, and we will adjust the amount of funding we request accordingly.

dsm-w3f · 2023-08-01T11:04:58Z

@cattania thank you for the answer. I think it is a reasonable approach. Could you please incorporate these changes in the application document? After that, I think we will be ready for the review of the committee.

cattania · 2023-08-07T04:02:39Z

Thanks for your reply, we are in the process of changing the proposal

dsm-w3f · 2023-08-17T18:42:57Z

@cattania how are the changes to the proposal going? Any forecast for delivering them?

cattania · 2023-08-18T10:27:10Z

Yes, we will modify the proposal this week in line with the previous discussions, focusing more on the UI and on integrating substrate-etl and dot-etl.

cattania · 2023-08-21T06:24:31Z

We have adapted the contents of the proposal in accordance with the previous discussion, including:

We re-evaluated substrate-etl; integrating it reduces our development cost, so we have reduced the grant amount requested
As suggested by @KarimJedda, we will focus more on the dashboard
We carefully investigated other projects in the community, talked to some users on X (Twitter), and added a Q&A section to answer some questions

dsm-w3f · 2023-08-22T18:48:32Z

@cattania thank you for the changes. I have some doubts about the application document. I notice that there are three main components: substrate-etl, hyperdot-frontend and hyperdot-node. However, the relation between these components is not clear to me. Where will the data be stored? How will the user connect to a dashboard? Will they need to run a node to be able to see a dashboard? How will the data extraction work?

Furthermore, I expected to see some prototypes of the tool. Will it be possible to use charts in the dashboards, or only SQL queries?

cattania · 2023-08-23T06:46:22Z

@dsm-w3f Thanks for your reply. The relationship is as follows:

substrate-etl is an on-chain ETL tool that currently stores data in the Google BigQuery public data warehouse
hyperdot-node is a backend that integrates substrate-etl. It will use the BigQuery data warehouse provided by substrate-etl and expose APIs for hyperdot-front-end to access
hyperdot-front-end is the UI that allows users to query through interactive SQL

The dashboard supports both charts and SQL queries
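
As a rough illustration of the relationship described above, the sketch below plays the role of the backend: it runs a user's SQL against a warehouse and returns both the tabular result and a chart series that a frontend could render. The handler name, table layout, and JSON shape are assumptions, and an in-memory `sqlite3` database stands in for the BigQuery warehouse that substrate-etl populates.

```python
import json
import sqlite3


def make_warehouse() -> sqlite3.Connection:
    """Stand-in for the warehouse filled by the ETL tool (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blocks (number INTEGER, extrinsic_count INTEGER)")
    conn.executemany("INSERT INTO blocks VALUES (?, ?)", [(1, 5), (2, 7)])
    return conn


def handle_query(conn: sqlite3.Connection, sql: str, x: str, y: str) -> str:
    """Backend role: run the SQL, then return rows plus a chart series as JSON."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    rows = [list(r) for r in cur.fetchall()]
    xi, yi = columns.index(x), columns.index(y)
    return json.dumps({
        "columns": columns,
        "rows": rows,
        "chart": {"type": "bar", "x": [r[xi] for r in rows], "y": [r[yi] for r in rows]},
    })


# Frontend role: send SQL and the columns to plot, receive table and chart data.
response = handle_query(
    make_warehouse(),
    "SELECT number, extrinsic_count FROM blocks ORDER BY number",
    x="number",
    y="extrinsic_count",
)
```

In this arrangement the user never runs a node: the browser only talks to the backend API, and the backend is the only component that touches the warehouse.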

dsm-w3f

@cattania thank you for the answer. I think it is a good approach. It would be nice to have more prototypes to see the screens of the application, but since you confirm that it will be possible to query and generate charts as well as share them, it looks good to me. Happy to go forward with the project. I'll also ping the other members of the committee to re-evaluate this application.

Noc2

Thanks for the application. Could @bytesleak also sign the terms and conditions?

cattania · 2023-08-24T01:43:08Z

Thank you for your reply, @dsm-w3f. We are sure that the query charts can be shared with others, for example as generated images or in other, better ways. We are still working on the design prototype. Of course, I will immediately ask @bytesleak to sign. Thank you again for your reply.

github-actions · 2023-08-24T11:27:37Z

Congratulations and welcome to the Web3 Foundation Grants Program! Please refer to our Milestone Delivery repository for instructions on how to submit milestones and invoices, our FAQ for frequently asked questions and the support section of our README for more ways to find answers to your questions.

Before you start, take a moment to read through our announcement guidelines for all communications related to the grant or make them known to the right person in your organisation. In particular, please don't announce the grant publicly before at least the first milestone of your project has been approved. At that point or shortly before, you can get in touch with us at [email protected] and we'll be happy to collaborate on an announcement about the work you’re doing.

Lastly, please remember to let us know in case you run into any delays or deviate from the deliverables in your application. You can either leave a comment here or directly request to amend your application via PR. We wish you luck with your project! 🚀

add hyperdot rfc

92826ff

dsm-w3f self-assigned this Jun 22, 2023

dsm-w3f suggested changes Jun 22, 2023

fix: milestone compatibility and delivery shared dashboards

6fe6791

dsm-w3f self-requested a review June 27, 2023 18:45

dsm-w3f added the "ready for review" label Jun 27, 2023

keeganquigley mentioned this pull request Jul 28, 2023

P2P data platform proposal #1866

Merged

10 tasks

dsm-w3f mentioned this pull request Aug 9, 2023

Adding application: Polkadot Analytics Platform #1883

Merged

10 tasks

semuelle added the "ready for review" and "changes requested" labels and removed the "ready for review" label Aug 11, 2023

update: changed somethings in the rfc

498f172

dsm-w3f approved these changes Aug 23, 2023

dsm-w3f removed the "changes requested" label Aug 23, 2023

Noc2 requested changes Aug 23, 2023

Noc2 approved these changes Aug 24, 2023

takahser approved these changes Aug 24, 2023

takahser merged commit 149ef98 into w3f:master Aug 24, 2023
6 of 7 checks passed

w3f deleted a comment from github-actions bot Aug 24, 2023

cattania mentioned this pull request Nov 14, 2023

Delivery hyperdot milestone 1 w3f/Grant-Milestone-Delivery#1058

Merged

5 tasks

cattania mentioned this pull request Dec 25, 2023

Delivery hyperdot milestone 2 w3f/Grant-Milestone-Delivery#1091

Merged

5 tasks
