
Commit

adding ui query cost estimation
sydneynotthecity committed Jun 30, 2023
1 parent c2f3d3e commit df1f9ca
Showing 3 changed files with 36 additions and 3 deletions.
6 changes: 4 additions & 2 deletions docs/accessing-data/connecting.mdx
@@ -3,15 +3,15 @@ title: "Connecting"
sidebar_position: 10
---

BigQuery offers multiple connection methods to the Hubble dataset. This guide details three common methods:
BigQuery offers multiple connection methods to Hubble. This guide details three common methods:

- [BigQuery UI](#bigquery-ui) - analysts that need to perform ad hoc analysis using SQL
- [BigQuery SDK](#bigquery-sdk) - developers that need to integrate data into applications
- [Looker Studio](#looker-studio) - business people that need to visualize data

## Prerequisites

To access the Hubble dataset, you will need a Google Cloud Project with billing and the BigQuery API enabled. For more information, please follow the instructions provided by [Google Cloud](https://cloud.google.com/bigquery/docs/quickstarts/query-public-dataset-console).
To access Hubble, you will need a Google Cloud Project with billing and the BigQuery API enabled. For more information, please follow the instructions provided by [Google Cloud](https://cloud.google.com/bigquery/docs/quickstarts/query-public-dataset-console).

Google also provides a free BigQuery Sandbox that lets users explore datasets in a limited capacity.

@@ -58,6 +58,8 @@ Install the client library locally, and configure your environment to use your G
python3 --version
# if you do not have pip, install it
python -m pip install --upgrade pip

# install bigquery client library
pip install --upgrade google-cloud-bigquery
gcloud config set project PROJECT_ID
```
31 changes: 31 additions & 0 deletions docs/accessing-data/optimizing-queries.mdx
@@ -76,6 +76,8 @@ order by `month`

**Performance Summary**

By pruning partitions and aggregating on a clustered field, query processing costs drop by a factor of 8.

| | Bytes Processed | Cost |
| ---------------- | --------------- | ------ |
| Original Query | 408.1 GB | $2.041 |
@@ -127,6 +129,8 @@ where batch_run_date >= '2023-05-01'

**Performance Summary**

Hubble stores wide tables. Query performance is greatly improved by selecting only the data you need. This principle is critical when exploring the operations and transactions tables, which are the largest tables in Hubble.

| | Bytes Processed | Cost |
| -------------- | --------------- | ------ |
| Original Query | 769.45 GB | $3.847 |
@@ -156,6 +160,33 @@ If you need to estimate costs before running a query, there are several options

### BigQuery Console

The BigQuery Console comes with a built-in query validator. It verifies query syntax and provides an estimate of the number of bytes processed. The validator can be found in the upper right-hand corner of the Query Editor, next to the green checkmark.

To calculate the query cost, convert the number of bytes processed into terabytes, and multiply the result by $5:

`(estimated bytes read / 1TB) * $5`

Paste the following query into the Editor to view the estimated bytes processed.

<CodeExample>

```sql
select timestamp_trunc(closed_at, month) as month,
sum(tx_set_operation_count) as total_operations
from `crypto-stellar.crypto_stellar.history_ledgers`
where batch_run_date >= '2023-01-01T00:00:00'
and batch_run_date < '2023-06-01T00:00:00'
and closed_at >= '2023-01-01T00:00:00'
and closed_at < '2023-06-01T00:00:00'
group by month
```

</CodeExample>

The validator estimates that 51.95MB of data will be read.

0.00005195 TB * $5 ≈ $0.00026. _That’s a cheap query!_
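
The arithmetic above can also be sketched as a small helper. This is a minimal illustration, not part of the Hubble tooling: the function name `estimate_query_cost` is hypothetical, the $5 per TB rate is the one quoted in this section, and decimal units (1 TB = 10^12 bytes) are assumed so that the result matches the worked example.

```python
# Hypothetical helper for illustration only.
# Assumptions: $5 per TB (rate quoted in this guide) and decimal units.
BYTES_PER_TB = 10**12
PRICE_PER_TB_USD = 5.0


def estimate_query_cost(bytes_processed: float,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated cost in USD: (bytes read / 1 TB) * price per TB."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / BYTES_PER_TB * price_per_tb


# Validator estimate for the query above: 51.95 MB
cost = estimate_query_cost(51.95 * 10**6)
print(f"Estimated cost: ${cost:.6f}")
```

Passing the rate as a parameter keeps the helper usable if the price per TB differs for your project or region.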

### dryRun Config Parameter

If you are submitting a query through a [BigQuery client library](https://cloud.google.com/bigquery/docs/reference/libraries), you can perform a dry run to estimate the total bytes processed before submitting the query job.
2 changes: 1 addition & 1 deletion docs/accessing-data/overview.mdx
@@ -7,7 +7,7 @@ sidebar_position: 0

Hubble is an open-source, publicly available dataset that provides a complete historical record of the Stellar network. Similar to Horizon, it ingests and presents the data produced by the Stellar network in a format that is easier to consume than the performance-oriented data representations used by Stellar Core. The dataset is hosted on BigQuery, meaning it is suitable for large analytic workloads, historical data retrieval and complex data aggregation. **Hubble should not be used for real-time data retrieval and cannot submit transactions to the network.** For real-time use cases, we recommend [running an API server](/docs/run-api-server).

This guide describes when to use Hubble and how to connect. For more information regarding underlying data structures, queries and examples, please refer to [Viewing Metadata](/docs/accessing-data/viewing-metadata) and [Optimizing Queries](/docs/accessing-data/optimizing-queries).
This guide describes when to use Hubble and how to connect. To view the underlying data structures, queries and examples, use the [Viewing Metadata](/docs/accessing-data/viewing-metadata) and [Optimizing Queries](/docs/accessing-data/optimizing-queries) tutorials.

## Why Use Hubble?
