Hyperdot - Powerful data analysis and creations platform — RFP #1815

Merged
merged 3 commits into from
Aug 24, 2023

Conversation

cattania
Contributor

@cattania cattania commented Jun 21, 2023

Project Abstract

Please replace these instructions with a brief description of your project summarising key points (1-2 paragraphs).

If your application is a follow-up to a previous grant, please mention which one in the first line of the abstract and include a link to previous pull requests if applicable.

Grant level

  • Level 1: Up to $10,000, 2 approvals
  • Level 2: Up to $30,000, 3 approvals
  • Level 3: Unlimited, 5 approvals (for >$100k: Web3 Foundation Council approval)

Application Checklist

  • The application template has been copied and aptly renamed (project_name.md).
  • I have read the application guidelines.
  • Payment details have been provided (bank details via email or BTC, Ethereum (USDC/DAI) or Polkadot/Kusama (USDT) address in the application).
  • The software delivered for this grant will be released under an open-source license specified in the application.
  • The initial PR contains only one commit (squash and force-push if needed).
  • The grant will only be announced once the first milestone has been accepted (see the announcement guidelines).
  • I prefer the discussion of this application to take place in a private Element/Matrix channel. My username is: @tania.infra3:matrix.org (change the homeserver if you use a different one)

@CLAassistant

CLAassistant commented Jun 21, 2023

CLA assistant check
All committers have signed the CLA.

@dsm-w3f dsm-w3f self-assigned this Jun 22, 2023
Contributor

@dsm-w3f dsm-w3f left a comment

@cattania thank you for the application. I notice that the deliverables in the milestones are not compliant with our application template. Could you please adjust to include the mandatory ones, especially in milestones 2 and 3? You can take a look at other applications to see examples of how to describe the milestone deliverables.

My understanding is that the main objective of the application is a tool with great performance; correct me if I am wrong. Since you already have an MVP, have you run any benchmarks of query performance and storage size in comparison with other tools? If so, could you share them with us or give us an idea of the tool's performance?

Furthermore, I think the RFP mentioned proposes a tool that is flexible like Dune and can generate dynamic dashboards that can be shared with other people. I see that creating queries is within the scope of this application, but will it also be possible to share them with other people?

@cattania
Contributor Author

cattania commented Jun 23, 2023

Thanks for your reply. I have adjusted the milestones to comply with the template.

In terms of performance, I note that Dune customizes its data engine for crypto data, but even so, running complex analytics over tens of millions of blocks can be very time-consuming, as Postgres is not well suited to analytics workloads. hyperdot currently uses Postgres, so we expect its benchmark results to be similar to Dune's. At the same time, we will build good indexes for the block data and enable data compression to further optimize performance.

In addition, hyperdot differs somewhat in that it focuses more on scalability, flexibility, and performance across different analysis scenarios; it uses multiple data engines to address this.

@dsm-w3f
Contributor

dsm-w3f commented Jun 23, 2023

@cattania thank you for the answer. What is your long-term vision for this project? Is it intended to be an open-source library or a business, or do you have some other long-term vision? Do you plan to maintain an index of Polkadot/Kusama data for general use? If so, how do you plan to fund the storage cost? How do you plan for this project to be financially sustainable over time?

@cattania
Contributor Author

What is your long-term vision for this project? Is it intended to be an open-source library or a business, or do you have some other long-term vision?

First of all, hyperdot will operate as a business.

In the coming months, we hope to provide users with a platform for analyzing Polkadot/Kusama/Substrate chain data with SQL queries, and to let them share dashboards. This is exactly what hyperdot proposes in response to this RFP.

Next, we want to get hyperdot in front of more people with the support of the Foundation. We will then continue to improve our technical capabilities: hyperdot will implement more Polkadot/Kusama/Substrate data models, more data analysis engines, and new query interactions (such as ChatGPT). We are also working to expose hyperdot's indexed data and analytics capabilities to outside developers (as The Graph and SubQuery do) via APIs and Wasm. These efforts will make hyperdot more competitive.

Finally, hyperdot's long-term vision is to become the most competitive platform for on-chain crypto data analysis and creations in web3.

Do you plan to maintain an index of Polkadot/Kusama data for general use? If so, how do you plan to fund the storage cost?

Yes, this is in line with hyperdot's goals. We plan to cover storage costs in the following ways:

  1. hyperdot is free for small queries and for newly created user queries. For large queries, a fee will be charged through a smart contract. By balancing the two, hyperdot can cover its storage costs.

  2. We plan to distinguish historical data from new data, and archive historical data to inexpensive storage (e.g. S3 or IPFS).
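
A minimal sketch of the hot/cold split described in point 2, assuming "historical" is defined by distance from the chain head. `Block`, `split_hot_cold`, and `hot_window` are hypothetical names for illustration and are not part of hyperdot; the upload to S3 or IPFS is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    number: int
    size_bytes: int


def split_hot_cold(blocks, head, hot_window):
    """Partition blocks into (hot, cold) by distance from the chain head.

    Blocks within `hot_window` blocks of `head` stay in the primary
    database; older ones are candidates for archiving to cheaper storage.
    """
    cutoff = head - hot_window
    hot = [b for b in blocks if b.number > cutoff]
    cold = [b for b in blocks if b.number <= cutoff]
    return hot, cold


# Hypothetical data: ten blocks of 1,000 bytes each.
blocks = [Block(number=n, size_bytes=1_000) for n in range(1, 11)]
hot, cold = split_hot_cold(blocks, head=10, hot_window=3)

print([b.number for b in hot])           # [8, 9, 10]
print(sum(b.size_bytes for b in cold))   # 7000 bytes eligible for archiving
```

The size of the cold set is what would drive the storage saving, so in practice `hot_window` would be chosen from how far back typical queries reach.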

How do you plan for this project to be financially sustainable over time?

hyperdot is sustainable because:

  1. We will charge a reasonable fee for queries and dashboard data. This pricing model lets us balance costs against revenue.
  2. As mentioned earlier, we will expose our data indexing and analysis capabilities to more developers through APIs and Wasm, so that we can charge for on-demand indexing and execution.

@dsm-w3f
Contributor

dsm-w3f commented Jun 26, 2023

@cattania Thanks for the answer. Are you aware of these two projects that recently received grants from us?

#1768
#1716

Could you please provide a brief comparison of your project with them? Also notice the Data Alliance bounty mentioned in the discussions of these PRs. How could your project fit in this bounty program? It is not launched yet, but it is planned to happen.

@cattania
Contributor Author

cattania commented Jun 27, 2023

Thanks for your reply. I have looked at #1768 and #1716.

#1716 is an ETL tool. Coincidentally, we used to maintain ethereum-etl. A brief comparison:

  1. hyperdot and Dot-ETL Project Proposal #1716 both have ETL capabilities, but hyperdot also provides a post-ETL data engine, i.e. $\text{extract} \to \text{transform} \to \text{load data} \to \text{postgres}$
  2. hyperdot provides a data analysis website that allows anyone to run interactive SQL queries and share visual dashboards
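
The extract, transform, load chain in point 1 can be sketched as below. This is an illustration only, not hyperdot's implementation: the raw records are made up, the table layout is hypothetical, and the standard-library `sqlite3` module stands in for Postgres so the example runs without a server.

```python
import sqlite3

# Hypothetical raw records, standing in for what an extractor reads from a node.
RAW_BLOCKS = [
    {"number": "0x1", "hash": "0xaa", "extrinsics": ["a", "b"]},
    {"number": "0x2", "hash": "0xbb", "extrinsics": []},
]


def extract():
    """Yield raw block records one at a time."""
    yield from RAW_BLOCKS


def transform(raw):
    """Decode the hex block number and flatten the record into a table row."""
    return (int(raw["number"], 16), raw["hash"], len(raw["extrinsics"]))


def load(conn, rows):
    """Create the target table if needed and insert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocks ("
        "number INTEGER PRIMARY KEY, hash TEXT, extrinsic_count INTEGER)"
    )
    conn.executemany("INSERT INTO blocks VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: SQLite in place of Postgres
load(conn, (transform(r) for r in extract()))

# Once loaded, the data is queryable with plain SQL.
total = conn.execute("SELECT SUM(extrinsic_count) FROM blocks").fetchone()[0]
print(total)  # 2
```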

#1768 goes one step further: on top of Dot-ETL, it stores data in Google BigQuery, similar to how hyperdot provides a post-ETL data engine. It also provides a dashboard. A brief comparison:

  1. Both hyperdot and Deep Account Analytics in Three Tiers for the Polkadot Data Alliance #1768 can collect data and store it in a data engine. The difference is that hyperdot favors Postgres, which is free and open source, so that anyone can easily run hyperdot. hyperdot also supports multiple data engines, so it can work with Postgres as well as others.

  2. Deep Account Analytics in Three Tiers for the Polkadot Data Alliance #1768 provides a data exploration dashboard similar to Subscan, but focuses more on deep account analytics. hyperdot is more similar to Dune, with its focus on interactive SQL queries, dashboards that visualize query results, and shared authoring of data analyses.

Regarding the Polkadot Data Alliance bounty: I have looked at it, and we think hyperdot is a very good fit:

  1. hyperdot implements transparent, open-source data infrastructure that can support ETL-like pipelines for multiple Substrate chains in the Polkadot ecosystem

  2. hyperdot provides multiple data engines to support data analysis and querying, with processing via a unified SQL engine (even for engines that don't support SQL), similar to the dotlake mentioned in this document

  3. hyperdot-fronted-end provides metrics computation and intermediate table analysis similar to Dune

Furthermore, we have thought seriously about the Polkadot Data Alliance bounty's idea of "providing a comprehensive, accurate, and accessible data warehouse and rewarding users who contribute to data analysis". hyperdot also aims to provide a multi-chain, multi-engine data warehouse with unified SQL querying, and we plan to add incentives for creators in the future.

@dsm-w3f dsm-w3f self-requested a review June 27, 2023 18:45
@dsm-w3f dsm-w3f added the ready for review The project is ready to be reviewed by the committee members. label Jun 27, 2023
@dsm-w3f
Contributor

dsm-w3f commented Jun 27, 2023

@cattania thank you for the answer. I marked the application as ready for review. The committee will take a look and may ask more questions. We will provide feedback soon.

@cattania
Contributor Author

Thank you for your reply. We are ready.

In addition, we would like to add one more point to the comparison between #1716 and hyperdot. Dot-ETL is built on SubQuery, while hyperdot is built on subxt. This makes hyperdot more flexible, as it is not bound by SubQuery's limitations (e.g. the data types that can be indexed, the limited set of supported chains).

@semuelle
Member

@KarimJedda, could you have a look?

@cattania
Contributor Author

cattania commented Jul 7, 2023

Any progress?

@KarimJedda

Hello @semuelle & @cattania ! This is on my list, I'll aim to get back to you by next week. Due to Decoded happening last week there wasn't much time yet. Hope that's fine ✌️

@cattania
Contributor Author

cattania commented Jul 7, 2023

Thanks for your reply, I guessed it might be because of Decoded 🤣

@takahser
Collaborator

@KarimJedda any update? 🙃

@cattania
Contributor Author

Is there any progress on the project? We'd love to hear your feedback

@KarimJedda

It took me a while to make time because of another conference (EuroPython) last week and an urgent data request this week, my apologies.

High level

As far as I understand, this project has two components: an indexer (hyperdot-node) and a frontend (hyperdot-fronted-end) to build in essence a Dune analytics for Substrate and the Polkadot ecosystem.

From what I can see in the frontend of the POC you shared, it currently queries a Postgres database from the browser. It would be beneficial to see how that ties in with online charting capabilities, like what Colorful Notion is proposing with the Apache Superset project they're integrating. How do you see users interacting with the system in ways that are similar to (or different from) Dune?

You mention that it also provides a post-ETL data engine, leveraging subxt to ingest data directly from the chains themselves.

A few general questions:

  • How do you plan to cover the costs of accessing all the nodes in the ecosystem?
  • How do you plan to maintain the system (system operations, DB backups)?
  • Are you familiar with the operation and maintenance of a Postgres database?
  • How do you plan to share the data, if people would like direct access? Is this a use case you plan to cover?

Milestones

Milestone 1 - Backend

0b. You mention sending test transactions. Will this also be used to submit transactions somehow? If yes, can you elaborate a little on how that fits into the infrastructure diagram?

0d. Will you be providing the built Docker containers too?

1. How will you handle metadata upgrades and data backfills? Some words on that would help.

3. Do you already have a POC of this part? From my own experience it's a bit challenging, so I just want to get your thoughts.

4. This is the central part of the project. What advantage would JSON-RPC for the backend component bring to the system? As for the pub/sub with the frontend, I guess you'll use WebSockets, right?

5. On this part, I think it's alright to store it in PostgreSQL and I'm sure it'll handle the load. However, it's not an easy thing to maintain and share with the ecosystem. Would an ELT-type system work too? You show in the infrastructure diagram that you can query files using DuckDB; wouldn't that make the system a bit simpler and more portable? If this is meant to be open sourced, its operation mode should be simple.

9. I'd add data migrations here too, since I believe the data model might be subject to change.

Milestone 2 - Frontend

I'm not a frontend expert, but I think this looks OK. I'd focus, though, on very fast iteration and on making the frontend agnostic of data sources (i.e. designing the API the frontend uses so that it's not tightly coupled to the storage layer or DB). Meaning, it should work with DuckDB, Postgres, BigQuery, etc. Similar to what Apache Superset or Metabase provide, but prettier and more tailored to a "Dune" use case:

  • public queries
  • shared dashboards
  • possibilities of embedding text analysis to the frontend
  • comments

That way hyperdot can be used for investigations and people sharing their results, instead of sharing screenshots of dashboards. Just my 2 cents.
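
One way to read the "agnostic of data sources" advice is an API that depends only on a small query interface, sketched below under stated assumptions. `QueryBackend`, `SqliteBackend`, `CannedBackend`, and `query_api` are hypothetical names, the `transfers` table and its rows are made up, and `sqlite3` stands in for whichever engine (DuckDB, Postgres, BigQuery) sits behind the interface.

```python
import sqlite3
from typing import Any, Protocol


class QueryBackend(Protocol):
    """The only thing the API layer knows about a storage engine."""

    def run(self, sql: str) -> list[tuple[Any, ...]]: ...


class SqliteBackend:
    """Local engine, seeded with hypothetical transfer data."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE transfers (chain TEXT, amount INTEGER)")
        self.conn.executemany(
            "INSERT INTO transfers VALUES (?, ?)",
            [("polkadot", 5), ("kusama", 7), ("polkadot", 1)],
        )

    def run(self, sql: str) -> list[tuple[Any, ...]]:
        return self.conn.execute(sql).fetchall()


class CannedBackend:
    """Returns fixed rows; stands in for a remote warehouse client."""

    def __init__(self, rows: list[tuple[Any, ...]]) -> None:
        self.rows = rows

    def run(self, sql: str) -> list[tuple[Any, ...]]:
        return list(self.rows)


def query_api(backend: QueryBackend, sql: str) -> dict:
    """Shape a result for the frontend without knowing which engine ran it."""
    rows = backend.run(sql)
    return {"row_count": len(rows), "rows": [list(r) for r in rows]}


sql = "SELECT chain, SUM(amount) FROM transfers GROUP BY chain ORDER BY chain"
local = query_api(SqliteBackend(), sql)
remote = query_api(CannedBackend([("kusama", 7), ("polkadot", 6)]), sql)
print(local == remote)  # True: the frontend sees the same shape either way
```

Because `query_api` never imports an engine-specific client, swapping the storage layer later only means adding another class with a `run` method.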

Conclusion

In general, I would focus a whole lot more on the frontend. It's the main differentiator and user-facing capability, and I believe it would be a massive positive for the community. For the backend, I'd advise considering a collaboration with Substrate-ETL (for data directly from chains) or with Dot-ETL to integrate things like real-time capabilities on top of their solutions. But this is your project, you decide.

Believe me when I say it will simplify the infrastructure a lot. Down the line, if you see the need to build a custom backend or indexer for your own tailored use case (as a value prop), it would be easy to switch. There's massive heavy lifting and operations work involved in the backend part, which I believe is underestimated here (please don't take it as criticism).

Would you consider building your solution on top of a hybrid of what Substrate-ETL/DOT-ETL is providing and focusing on the frontend / user-facing part?

Could you also please list the number of chains to be integrated if you end up deciding to do it yourself rather than leveraging substrate-etl/dot-etl? This will help validate the timeline a bit, since it's very ambitious.

My recommendation

  • Collaborate with Substrate-ETL/Dot-ETL on the "where and how do we get data"
  • Rework the proposal to be much more frontend focused "Dune for Substrate, but better"
  • Come up with a value proposition to make it viable: "how would the project stay sustainable and capture value?" to sustain operating costs
  • Interview a few parachains/ecosystem teams & researchers on the top 3 "dashboards" or interactive data use cases they'd like to see and include how you would solve that as part of the analysis and data platform RFP.

I'm aware this might go a bit beyond the RFP, but there is some possible synergy with other ecosystem teams here that we should definitely leverage. As software developers, I know it's hard but it'll make things much easier.

Going forward I would also recommend:

  • Reaching out to the Colorful Notion team and DOT-ETL team about their roadmaps and data access
  • Revising your plan to put more weight on the frontend/user-facing part and less on the backend

This is my feedback so far. Hope it helps.

@keeganquigley keeganquigley mentioned this pull request Jul 28, 2023
@dsm-w3f
Contributor

dsm-w3f commented Jul 31, 2023

@KarimJedda thank you for the review. @cattania could you give us feedback about the comments from Karim? Are you willing to incorporate his suggestions in the application? What is your opinion on that?

@cattania
Contributor Author

cattania commented Aug 1, 2023

Thank you very much to @KarimJedda for the review. We have thought seriously about Karim's suggestions, and these are our main points at present:

  1. Karim suggests combining Substrate-ETL/Dot-ETL with hyperdot.
  • We looked at Substrate-ETL. The interface it provides is based on Google BigQuery, which uses its own SQL dialect, whereas we want to provide ANSI SQL-compliant queries. So Substrate-ETL can be used, but it doesn't seem to meet our ultimate goal.

  • Dot-ETL seems to be a work in progress at the moment, and it's unclear what the final interface will look like.

  • As Karim said, developing an ETL is very complicated. We have already implemented the ETL functionality based on subxt, but if Dot-ETL's functionality meets our needs, we can integrate it.

  2. We have no problem with Karim's suggestion to be more UI-oriented.

Therefore, we adopt part of @KarimJedda's suggestions:

  1. Simplify the storage part of hyperdot. If Dot-ETL or, after further exploration, Substrate-ETL can meet our needs, we will integrate the data sources provided by these systems
  2. hyperdot focuses more on the UI: "Dune for Substrate, but better"

Finally, if this is adopted, we expect hyperdot's development costs to come down, and we will change the amount of funding we request.

@dsm-w3f
Contributor

dsm-w3f commented Aug 1, 2023

@cattania thank you for the answer. I think it is a reasonable approach. Could you please incorporate these changes in the application document? After that, I think we will be ready for the review of the committee.

@cattania
Contributor Author

cattania commented Aug 7, 2023

Thanks for your reply, we are in the process of changing the proposal

@semuelle semuelle added ready for review The project is ready to be reviewed by the committee members. changes requested The team needs to clarify a few things first. and removed ready for review The project is ready to be reviewed by the committee members. labels Aug 11, 2023
@dsm-w3f
Contributor

dsm-w3f commented Aug 17, 2023

@cattania how are the changes to the proposal going? Any forecast for delivering them?

@cattania
Contributor Author

Yes, we will modify the proposal this week according to the previous discussions, focusing more on the UI and on integrating substrate-etl and dot-etl.

@cattania
Contributor Author

We have adapted the proposal in accordance with the previous discussion:

  1. We re-evaluated substrate-etl, which helps reduce our development cost, so we reduced the grant amount requested
  2. As suggested by @KarimJedda, we will focus more on the dashboard
  3. We carefully investigated other projects in the community, communicated with some users on X (Twitter), and added a Q&A section to answer some questions

@dsm-w3f
Contributor

dsm-w3f commented Aug 22, 2023

@cattania thank you for the changes. I have some doubts about the application document. I notice that there are three main components: substrate-etl, hyperdot-frontend and hyperdot-node. However, the relation between these components is not clear to me. Where will the data be stored? How will the user connect to a dashboard? Will they need to run a node to be able to see a dashboard? How will the data extraction work?

Furthermore, I expected to see some prototypes of the tool. Will it be possible to use charts in the dashboards, or only SQL queries?

@cattania
Contributor Author

@dsm-w3f Thanks for your reply. The relationship is as follows:

  1. substrate-etl is an on-chain ETL tool that currently stores data in the Google BigQuery public data warehouse
  2. hyperdot-node is a backend that integrates substrate-etl. It will use the BigQuery data warehouse provided by substrate-etl and expose APIs for hyperdot-front-end
  3. hyperdot-front-end is the UI that allows users to run interactive SQL queries

The dashboard supports both charts and SQL queries.
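
As a rough illustration of how a SQL result could feed a chart on the dashboard, the sketch below reshapes tabular rows into x/y series. The function name `to_chart_series`, the column names, and the sample rows are all hypothetical; the thread does not specify the actual response format between hyperdot-node and the front end.

```python
def to_chart_series(columns, rows, x, y):
    """Pick two columns out of a tabular result and return them as chart axes."""
    xi, yi = columns.index(x), columns.index(y)
    return {
        "x_label": x,
        "y_label": y,
        "x": [row[xi] for row in rows],
        "y": [row[yi] for row in rows],
    }


# Hypothetical query result: one row per day.
columns = ["day", "transfers"]
rows = [("2023-08-01", 120), ("2023-08-02", 95)]

series = to_chart_series(columns, rows, x="day", y="transfers")
print(series["x"])  # ['2023-08-01', '2023-08-02']
print(series["y"])  # [120, 95]
```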

Contributor

@dsm-w3f dsm-w3f left a comment

@cattania thank you for the answer. I think it is a good approach. It would be nice to have more prototypes to see the screens of the application, but as you are confirming that it will be possible to query and generate charts as well as share them, it looks good to me. Happy to go forward with the project. I'll also ping the other members of the committee to re-evaluate this application.

@dsm-w3f dsm-w3f removed the changes requested The team needs to clarify a few things first. label Aug 23, 2023
Collaborator

@Noc2 Noc2 left a comment

Thanks for the application. Could @bytesleak also sign the terms and conditions?

@cattania
Contributor Author

Thank you for your reply, @dsm-w3f. We confirm that query charts can be shared with others, for example by generating pictures or in other, better ways. We are still working on the design prototype. I will get in touch with @bytesleak right away about signing. Thank you again for your reply.

@takahser takahser merged commit 149ef98 into w3f:master Aug 24, 2023
6 of 7 checks passed
@github-actions
Contributor

Congratulations and welcome to the Web3 Foundation Grants Program! Please refer to our Milestone Delivery repository for instructions on how to submit milestones and invoices, our FAQ for frequently asked questions and the support section of our README for more ways to find answers to your questions.

Before you start, take a moment to read through our announcement guidelines for all communications related to the grant or make them known to the right person in your organisation. In particular, please don't announce the grant publicly before at least the first milestone of your project has been approved. At that point or shortly before, you can get in touch with us at [email protected] and we'll be happy to collaborate on an announcement about the work you’re doing.

Lastly, please remember to let us know in case you run into any delays or deviate from the deliverables in your application. You can either leave a comment here or directly request to amend your application via PR. We wish you luck with your project! 🚀

@w3f w3f deleted a comment from github-actions bot Aug 24, 2023
ainhoa-a pushed a commit to ainhoa-a/Grants-Program that referenced this pull request Jan 26, 2024
)

* add hyperdot rfc

* fix: milestone compatibility and delivery shared dashboards

* update: changed somethings in the rfc

---------

Co-authored-by: alloctor <[email protected]>
taqtiqa-mark pushed a commit to taqtiqa-mark/Grants-Program that referenced this pull request Jun 6, 2024
)

* add hyperdot rfc

* fix: milestone compatibility and delivery shared dashboards

* update: changed somethings in the rfc

---------

Co-authored-by: alloctor <[email protected]>
Labels
ready for review The project is ready to be reviewed by the committee members.
8 participants