Write docs #140

Open · wants to merge 14 commits into `main`
115 changes: 3 additions & 112 deletions README.md
@@ -8,14 +8,16 @@
<hr/>
</div>

## Intro
## Introduction

_Scrolls_ is a tool for building and maintaining read-optimized collections of Cardano's on-chain entities. It crawls the history of the chain and aggregates all data to reflect the current state of affairs. Once the whole history has been processed, _Scrolls_ watches the tip of the chain to keep the collections up-to-date.

Examples of collections are: "utxo by address", "chain parameters by epoch", "pool metadata by pool id", "tx cbor by hash", etc.

> In other words, _Scrolls_ is just a map-reduce algorithm that aggregates the history of the chain into use-case-specific, key-value dictionaries.
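
As a rough illustration of that idea, the following sketch (hypothetical code, not Scrolls' actual API or data model) folds transactions into a "UTXO by Address" collection:

```python
# Hypothetical sketch only -- not Scrolls' actual API or data model.
from collections import defaultdict

state = defaultdict(set)  # "c1.<address>" -> set of "tx_hash:output_index"
utxo_owner = {}           # "tx_hash:output_index" -> address (the role of "enrich")

def apply_tx(tx):
    # spent inputs remove entries from the collection...
    for ref in tx["inputs"]:
        addr = utxo_owner.pop(ref, None)
        if addr is not None:
            state[f"c1.{addr}"].discard(ref)
    # ...while new outputs add entries to it
    for index, output in enumerate(tx["outputs"]):
        ref = f"{tx['hash']}:{index}"
        utxo_owner[ref] = output["address"]
        state[f"c1.{output['address']}"].add(ref)
```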

Check our [documentation](https://txpipe.github.io/scrolls) for detailed information on how to start working with Scrolls.

:warning: This tool is under heavy development. The library API, configuration schema, and storage structure may change drastically. Several important features are still missing. Use at your own peril.

## Storage
@@ -101,99 +103,6 @@ Scrolls is a pipeline that takes block data as input and outputs DB update commands
- [ ] By Mint Policy / Asset
- [ ] By Pool

## Testdrive

In the `testdrive` folder you'll find a minimal example that uses docker-compose to spin up a local Redis instance and a Scrolls daemon. You'll need Docker and docker-compose installed on your local machine. Run the following commands to start it:

```sh
cd testdrive
docker-compose up
```

You should see logs from both _Redis_ and _Scrolls_ as the chain is crawled from a remote relay node. If you're familiar with the Redis CLI, you can run the following commands to see the data being cached:

```sh
redis:6379> KEYS *
1) "c1.addr1qx0w02a2ez32tzh2wveu80nyml9hd50yp0udly07u5crl6x57nfgdzya4axrl8mfx450sxpyzskkl95sx5l7hcfw59psvu6ysx"
2) "c1.addr1qx68j552ywp6engr2s9xt7aawgpmr526krzt4mmzc8qe7p8qwjaawywglaawe74mwu726w49e8e0l9mexcwjk4kqm2tq5lmpd8"
3) "c1.addr1q90z7ujdyyct0jhcncrpv5ypzwytd3p7t0wv93anthmzvadjcq6ss65vaupzmy59dxj43lchlra0l482rh0qrw474snsgnq3df"
4) "c1.addr1w8vg4e5xdpad2jt0z775rt0alwmku3my2dmw8ezd884zvtssrq6fg"
5) "c1.addr1q9tj3tdhaxqyph568h7efh6h0f078m2pxyd0xgzq47htwe3vep55nfane06hggrc2gvnpdj4gcf26kzhkd3fs874hzhszja3lh"
6) "c1.addr1w8tqqyccvj7402zns2tea78d42etw520fzvf22zmyasjdtsv3e5rz"
redis:6379> SMEMBERS c1.addr1w8tqqyccvj7402zns2tea78d42etw520fzvf22zmyasjdtsv3e5rz
1) "2548228522837ea580bc55a3e6a09479deca499b5e7f3c08602a1f3191a178e7:20"
2) "04086c503512833c7a0c11fc85f7d0f0422db9d14b31275b3d4327c40c6fd73b:25"
redis:6379>
```

Once you're done with the testdrive, you can clean up your environment by running:

```sh
docker-compose down
```

## Installing

We currently provide the following ways to install _Scrolls_:

- Using one of the pre-compiled binaries shared via [GitHub Releases](https://github.com/txpipe/scrolls/releases)
- Using the Docker image shared via [GitHub Packages](https://github.com/txpipe/scrolls/pkgs/container/scrolls)
- By compiling from source code using the instructions provided in this README.


## Configuration

This is an example configuration file:

```toml
# get data from a relay node
[source]
type = "N2N"
address = "relays-new.cardano-mainnet.iohk.io:3001"

# You can optionally enable enrichment (a local db with transaction data); this is needed by some reducers
[enrich]
type = "Sled"
db_path = "/opt/scrolls/sled_db"

# enable the "UTXO by Address" collection
[[reducers]]
type = "UtxoByAddress"
# you can optionally prefix the keys in the collection
key_prefix = "c1"
# you can optionally only process UTXO from a set of predetermined addresses
filter = ["addr1qy8jecz3nal788f8t2zy6vj2l9ply3trpnkn2xuvv5rgu4m7y853av2nt8wc33agu3kuakvg0kaee0tfqhgelh2eeyyqgxmxw3"]

# enable the "Point by Tx" collection
[[reducers]]
type = "PointByTx"
key_prefix = "c2"

# store the collections in a local Redis
[storage]
type = "Redis"
connection_params = "redis://127.0.0.1:6379"

# start reading from an arbitrary point in the chain
[intersect]
type = "Point"
value = [57867490, "c491c5006192de2c55a95fb3544f60b96bd1665accaf2dfa2ab12fc7191f016b"]

# let Scrolls know that we're working with mainnet
[chain]
type = "Mainnet"
```

## Compiling from Source

To compile from source, you'll need the Rust toolchain installed on your development machine. Execute the following commands to clone and build the project:

```sh
git clone https://github.com/txpipe/scrolls.git
cd scrolls
cargo build
```

## FAQ

### Don't we have tools for this already?
@@ -211,24 +120,6 @@ Yes, we do. We have excellent tools such as: [Kupo](https://github.com/CardanoSo
There's some overlap between _Oura_ and _Scrolls_. Both tools read on-chain data and output some data results. The main difference is that Oura is meant to **_react_** to events, to watch the chain and act upon certain patterns. In contrast, _Scrolls_ is meant to provide a snapshot of the current state of the chain by aggregating the whole history.

They were built to work well together. For example, let's say you're building an app that uses Oura to process transaction data; you could then integrate _Scrolls_ as a way to look up the source address of the transaction's input.

### How do I read the data using Python?

Assuming you're using Redis as a storage backend (the only one available at the moment), we recommend using the [redis-py](https://github.com/redis/redis-py) package to talk directly to the Redis instance. This is a very simple code snippet to query the UTXOs by address:

```python
>>> import redis
>>> r = redis.Redis(host='localhost', port=6379, db=0)
>>> r.smembers("c1.addr1w8tqqyccvj7402zns2tea78d42etw520fzvf22zmyasjdtsv3e5rz")
{b'2548228522837ea580bc55a3e6a09479deca499b5e7f3c08602a1f3191a178e7:20', b'04086c503512833c7a0c11fc85f7d0f0422db9d14b31275b3d4327c40c6fd73b:25'}
```

The Redis operation being used is `smembers`, which returns the members of the set stored under a particular key. In this case, we query by the value `c1.addr1w8tqqyccvj7402zns2tea78d42etw520fzvf22zmyasjdtsv3e5rz`, where `c1` is the key prefix specified in the config for our particular collection and `addr1w8tqqyccvj7402zns2tea78d42etw520fzvf22zmyasjdtsv3e5rz` is the address we're interested in querying. The response from Redis is the list of UTXOs (in the format `{tx-hash}:{output-index}`) associated with that particular address.
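
If you need the individual components, each member can be split back into its transaction hash and output index. Continuing the snippet above:

```python
# Decode each member (bytes) and split it into its tx hash and output index
members = r.smembers("c1.addr1w8tqqyccvj7402zns2tea78d42etw520fzvf22zmyasjdtsv3e5rz")
for member in members:
    tx_hash, output_index = member.decode().rsplit(":", 1)
    print(tx_hash, int(output_index))
```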

### How do I read the data using NodeJS?

TODO

### What is "swarm mode"?

Swarm mode is a way to speed up rebuilding collections from scratch: the history of the chain is partitioned into smaller fragments, and each fragment is processed by a concurrent instance of the _Scrolls_ daemon.
23 changes: 23 additions & 0 deletions book/src/SUMMARY.md
@@ -1,3 +1,26 @@
# Summary

- [Introduction](./introduction.md)
- [Installation](./installation/README.md)
- [Binary Release](./installation/binary_release.md)
- [From Source](./installation/from_source.md)
- [Docker](./installation/docker.md)
- [Usage](./usage/README.md)
- [Configuration](./configuration/README.md)
- [Sources](./configuration/sources.md)
- [Reducers](./configuration/reducers/README.md)
- [Predicates](./configuration/reducers/predicates.md)
- [Storage](./configuration/storage.md)
- [Enrich](./configuration/enrich.md)
- [Intersect](./configuration/intersect.md)
- [Chain](./configuration/chain.md)
- [Policy](./configuration/policy.md)
- [Advanced Features](./advanced/README.md)
- [Swarm Mode](./advanced/swarm_mode.md)
- [Troubleshooting](./troubleshooting/README.md)
- [Guides](./guides/README.md)
- [Testdrive](./guides/testdrive.md)
- [NodeJS Client](./guides/nodejs.md)
- [Python Client](./guides/python.md)
- [Redis-cli](./guides/redis.md)

1 change: 1 addition & 0 deletions book/src/advanced/README.md
@@ -0,0 +1 @@
# Advanced features
4 changes: 4 additions & 0 deletions book/src/advanced/swarm_mode.md
@@ -0,0 +1,4 @@
# Swarm mode

Swarm mode is a way to speed up rebuilding collections from scratch: the history of the chain is partitioned into smaller fragments, and each fragment is processed by a concurrent instance of the Scrolls daemon.
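
The following sketch shows the partitioning idea in rough terms (the slot arithmetic is illustrative only, not the actual swarm-mode mechanism):

```python
# Illustrative only: split the slot range [0, tip_slot] into N fragments,
# one per concurrent Scrolls instance.
def partition_history(tip_slot: int, instances: int) -> list[tuple[int, int]]:
    step = tip_slot // instances
    return [(i * step, tip_slot if i == instances - 1 else (i + 1) * step)
            for i in range(instances)]

# e.g. four instances covering slots 0..57_867_490
for start, end in partition_history(57_867_490, 4):
    print(f"instance processes slots {start}..{end}")
```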

48 changes: 48 additions & 0 deletions book/src/configuration/README.md
@@ -0,0 +1,48 @@
# Configuration
For the purpose of testing out Scrolls, you can use the provided configuration located in `testdrive/simple/daemon.toml`. See below for another example with explanations, and check the following sections of this book for a detailed description of each part of the configuration file.

## Format
The Scrolls daemon supports `.toml` and `.json` configuration files. Unlike JSON, TOML supports comments, which are very handy for turning declarations on and off, especially during the early stages of development, debugging, or learning. On the other hand, deeply nested filters can be difficult to express in TOML syntax, so you can either declare the whole configuration in JSON, or rely on tools like [toml2json](https://github.com/woodruffw/toml2json) and [remarshal](https://github.com/remarshal-project/remarshal) to translate small chunks of JSON (such as complex, deeply nested filters) for use in TOML configuration files.

When working with TOML configuration files, it sometimes also helps to translate the whole configuration to JSON and use [jq](https://stedolan.github.io/jq/)/[bat](https://github.com/sharkdp/bat) to make the JSON human-friendly. This often helps in understanding the structure of the filters. Example: `toml2json ./configuration.toml | jq | bat -l json`

## Configuration Example
```toml
# get data from a relay node
[source]
type = "N2N"
address = "relays-new.cardano-mainnet.iohk.io:3001"

# You can optionally enable enrichment (a local db with transaction data); this is needed by some reducers
[enrich]
type = "Sled"
db_path = "/opt/scrolls/sled_db"

# enable the "UTXO by Address" collection
[[reducers]]
type = "UtxoByAddress"
# you can optionally prefix the keys in the collection
key_prefix = "c1"
# you can optionally only process UTXO from a set of predetermined addresses
filter = ["addr1qy8jecz3nal788f8t2zy6vj2l9ply3trpnkn2xuvv5rgu4m7y853av2nt8wc33agu3kuakvg0kaee0tfqhgelh2eeyyqgxmxw3"]

# enable the "Point by Tx" collection
[[reducers]]
type = "PointByTx"
key_prefix = "c2"

# store the collections in a local Redis
[storage]
type = "Redis"
connection_params = "redis://127.0.0.1:6379"

# start reading from an arbitrary point in the chain
[intersect]
type = "Point"
value = [57867490, "c491c5006192de2c55a95fb3544f60b96bd1665accaf2dfa2ab12fc7191f016b"]

# let Scrolls know that we're working with mainnet
[chain]
type = "Mainnet"
```

50 changes: 50 additions & 0 deletions book/src/configuration/chain.md
@@ -0,0 +1,50 @@
# Chain

Specify which network to fetch data from.

## Fields
- type: `"Mainnet" | "Testnet" | "PreProd" | "Preview" | "Custom"`
- magic (*): `u64`
- byron_epoch_length (*): `u32`
- byron_slot_length (*): `u32`
- byron_known_slot (*): `u64`
- byron_known_hash (*): `String`
- byron_known_time (*): `u64`
- shelley_epoch_length (*): `u32`
- shelley_slot_length (*): `u32`
- shelley_known_slot (*): `u64`
- shelley_known_hash (*): `String`
- shelley_known_time (*): `u64`
- address_network_id (*): `u8`
- adahandle_policy (*): `String`


(*) Use only with `type = "Custom"`

## Examples

Using mainnet:
``` toml
[chain]
type = "Mainnet"
```

Using custom values (mainnet):
``` toml
[chain]
type = "Custom"
magic = 764824073
byron_epoch_length = 432000
byron_slot_length = 20
byron_known_slot = 0
byron_known_time = 1506203091
byron_known_hash = "f0f7892b5c333cffc4b3c4344de48af4cc63f55e44936196f365a9ef2244134f"
shelley_epoch_length = 432000
shelley_slot_length = 1
shelley_known_slot = 4492800
shelley_known_hash = "aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de"
shelley_known_time = 1596059091
address_network_id = 1
adahandle_policy = "f0ff48bbb7bbe9d59a40f1ce90e9e9d0ff5002ec48f232b49ca0fb9a"
```
16 changes: 16 additions & 0 deletions book/src/configuration/enrich.md
@@ -0,0 +1,16 @@
# Enrich
Store UTXO information in a local DB; this is needed for some reducers to work. Currently, only [Sled](https://github.com/spacejam/sled) databases are supported.

## Fields
- type: `"Sled" | "Skip"`
- db_path (*): `String`

(*) Use only with `type = "Sled"`

## Example

``` toml
[enrich]
type = "Sled"
db_path = "/opt/scrolls/sled_db"
```
34 changes: 34 additions & 0 deletions book/src/configuration/intersect.md
@@ -0,0 +1,34 @@
# Intersect

Scrolls provides four different strategies for finding the intersection point within the chain-sync process.

- `Origin`: Scrolls will start reading from the beginning of the chain.
- `Tip`: Scrolls will start reading from the current tip of the chain.
- `Point`: Scrolls will start reading from a particular point (slot, hash) in the chain. If the point is not found, the process will be terminated with a non-zero exit code.
- `Fallbacks`: Scrolls will start reading from the first valid point within a set of alternative positions. If a point is not valid, the process will fall back to the next available point in the list of options (as sketched below). If none of the points are valid, the process will be terminated with a non-zero exit code.
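
The `Fallbacks` behavior can be pictured with a small sketch (illustrative logic only; `node_has_point` is a hypothetical check, not part of Scrolls):

```python
import sys

def resolve_intersect(points, node_has_point):
    # points: list of (slot, block_hash) pairs, in order of preference
    for slot, block_hash in points:
        if node_has_point(slot, block_hash):
            return (slot, block_hash)
    sys.exit(1)  # no valid point found: terminate with a non-zero exit code
```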


## Fields
- type: `"Tip" | "Origin" | "Point" | "Fallbacks"`
- value (*): `(u64, String) | Vec<(u64, String)>`

(*) Use a value of type `(u64, String)` with `type = "Point"`, and a value of type `Vec<(u64, String)>` with `type = "Fallbacks"`.

## Examples

Using **Point**:
``` toml
[intersect]
type = "Point"
value = [57867490, "c491c5006192de2c55a95fb3544f60b96bd1665accaf2dfa2ab12fc7191f016b"]
```

Using **Fallbacks**:
``` toml
[intersect]
type = "Fallbacks"
value = [
[12345678, "this_is_not_a_valid_hash_ff1b93cdfd997d4ea93e7d930908aa5905d788f"],
[57867490, "c491c5006192de2c55a95fb3544f60b96bd1665accaf2dfa2ab12fc7191f016b"]
]
```
17 changes: 17 additions & 0 deletions book/src/configuration/policy.md
@@ -0,0 +1,17 @@
# Policy

Specify how Scrolls should handle different categories of errors encountered while processing data.

## Fields
- missing_data: `"Skip" | "Warn" | "Default"`
- cbor_errors: `"Skip" | "Warn" | "Default"`
- ledger_errors: `"Skip" | "Warn" | "Default"`
- any_error: `"Skip" | "Warn" | "Default"`


## Example

``` toml
[policy]
cbor_errors = "Skip"
missing_data = "Warn"
```
