feat(wasm): split-up build #3289

TalDerei · 2023-11-07T18:16:49Z

References #3236 and #3274 and #3238. The build method has been refactored to enable separate web-workers to operate on sub-components of build process in parallel in the web context. The appropriate methods will be exposed to wasm using the wasm-bindgen attribute in the wasm crate.

The PR includes a preliminary wasm unit testing suite using wasm-bindgen-test that mocks indexDB calls using an interactive browser. This simplified integration testing rather than having to compile and test locally in my webgpu repo every time a change was made. The command to run the test suite is wasm-pack test --chrome -- --test test_build --target wasm32-unknown-unknown --release.

TalDerei · 2023-11-08T21:34:53Z

The parallel build process is split into two phases:

Phase 1: Spawn a distinct web-worker for each ActionPlan, and generate an Action by invoking the action_builder wasm function. The action_builder is generic on the ActionPlan. Under the hood, it uses pattern matching to distinguish between different ActionPlan variants, and invokes Build(BuildPlan) to construct the proper Action. Each web worker will be supplied with an instance of the private witness. The result of the first phase will a list of Actions [Action].
Phase 2: A top-level build method, build_parallel, will be exposed for constructing the transaction. Among other fields, build_parallel will require the TransactionPlan and [Action] and AuthorizationData. Internally, it will call the build_tx_parallel and authorize methods for constructing the Transaction and blinding signature.

I generally noticed as I expose more of the lower-level methods in the build process to WASM, some of the internal structs are not serializable since they're exported from external libraries (arkworks) that don't implement the serialize / deserialize traits (for instance the synthetic_blinding_factor field in UnauthTransaction: https://github.com/penumbra-zone/penumbra/blob/main/crates/core/transaction/src/plan/build.rs#L434). This construction was designed around this limitation. In the future, we shouldn't have to define extra data formats unless we're specifically exposing features from an external library like Arkworks.

hdevalence · 2023-11-08T23:09:09Z

@TalDerei I'm not totally sure I understand the phase 1 / phase 2 distinction. Are these phases of implementation, or different phases of using the same API? If we want to expose parallelism to the web context, the unit of parallelism has to be tasks assigned to different web workers, right? I'm also not sure about adding a new BuildPlan variant to the ActionPlan, I think building is something we want to be able to do to any ActionPlan, rather than a type of ActionPlan itself.

It would be good to see if we could reduce some of the code duplication we currently have between the existing parallel and non-parallel Rust methods, as a side effect of doing this wasm work.

What about something like the following:

Add an ActionPlan::build_unauth(&self, witness_data: &WitnessData) -> Action method to the ActionPlan enum. Internally, this method would match on the ActionPlan variant and call the action-specific build methods inside. Because it doesn't have access to the AuthorizationData, it should fill any required authorization data in with dummy values (like an all 0s array).
Add a TransactionPlan::build_unauth_with_actions(&self, witness_data: &WitnessData, actions: Vec<Action>) -> Transaction method to the TransactionPlan. This would work like the existing TransactionPlan::build method, but it would slot in the provided prebuilt actions instead of using the ActionPlans in the TransactionPlan. Because it doesn't have access to the AuthorizationData, it would need to fill in the binding signature with dummy values (like an all 0s array).
Add a TransactionPlan::authorize(&self, transaction: &Transaction, auth_data: &AuthorizationData) -> Transaction method that would use the provided auth_data to overwrite all the dummy values in all of the transaction's actions, then use the blinding factors in the transaction plan (&self) to derive the synthetic blinding factor needed to compute the binding signature.

The use of dummy values is unappealing, but the big upside of this approach is that we can use it in the Typescript/WASM context without having to add any additional serialization formats, and I think we can also clean up the existing Rust code:

We can replace the existing TransactionPlan::build with an all-in-one method TransactionPlan::build(&self, witness_data: &WitnessData, auth_data: &AuthorizationData) -> Transaction that (1) calls ActionPlan::build_unauth on each action, (2) passes those to TransactionPlan::build_unauth_with_actions, (3) slots in the auth data with TransactionPlan::authorize, and then returns the completed transaction.
We can replace the existing TransactionPlan::build_concurrent with an all-in-one method TransactionPlan::build_concurrent(&self, witness_data: &WitnessData, auth_data: &AuthorizationData) -> Transaction that is exactly like TransactionPlan::build but it wraps each call to ActionPlan::build_unath in a tokio::spawn. This gives almost perfect code reuse between build and build_concurrent.
The web code can farm out the calls to ActionPlan::build_unauth to individual web workers, then do the build_unauth_with_actions and authorize steps once it gets the AuthorizationData. This means we can start the expensive part (proving) before getting the auth signatures, but still only have one step once we get the AuthorizationData.
We can eliminate the UnauthTransaction type, I'm not sure any of the Rust code is using the delayed authorization feature, and the reason it exists is for this use case, which this new design would solve better.

TalDerei · 2023-11-08T23:42:57Z

I should have clarified. phases 1 / 2 are different phases of the same API. I think your suggestions for refactoring this way make a lot of sense, and seem to be more idiomatic while reducing code duplication at the same time!

The only thing I'm wondering is why ActionPlan::build_unauth needs to do anything with the AuthorizationData dummy values in the first place? It would only need to perform this work.

hdevalence · 2023-11-09T03:46:14Z

Well, suppose ActionPlan::build_unauth is called on an ActionPlan::SpendPlan. This has to return an Action::Spend with a fully constructed Spend inside, but that Spend has an auth_sig field. That field needs to be set to something, and the AuthorizationData is not available, so we can set it to a placeholder value instead.

TalDerei · 2023-11-09T04:10:04Z

I see now, the auth_sig placeholder value was set to resemble the code screaming AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.

TalDerei · 2023-11-13T04:06:20Z

@hdevalence Currently, the memo field isn't being handled properly. There's a divergence in the wrappedMemoKey field in an Action between the serial and concurrent build methods.

Update: performing mem::take inside WasmPlanner returns the default planner state. Consequently, I reverted to passing in the entire TransactionPlan to build_action, which internally calls build_unauth after performing some necessary deserialization work. This seems like an overkill since we only need the memo_plan field. But, we can't access the memo field from self since it now contains the default memo. The memo field is now being handled properly, but it's ugly.

I'm trying to think through the best way to handle the memo field. There's also code duplication happening between the build_action, build_unauth_with_actions and build methods with respect to the memo field that need to be addressed.

TalDerei · 2023-11-14T00:20:54Z

@hdevalence breaking API changes:

penumbra-wasm package

new exposed build_action WASM method in wasm_planner.rs
new exposed build_parallel WASM method in tx.rs
modified the internals of the serial WASM build method in tx.rs to reduce code duplication

penumbra-transaction package

build_unauth method in action.rs
build_unauth_with_actions method in build.rs
authorize_with_auth method in build.rs
build method with modified internals to reduce code duplication in build.rs
build_concurrent with modified internals to reduce code duplication in build.rs
removal of UnauthTransaction type

Our all-in-one build method internally calls build_unauth_with_actions and authorize_with_auth per your suggestions. The conflict with this approach is that existing calls to the build method in main separate the build and authorize calls since authorize takes an extra field for randomness. I suggest we strip out the authorize_with_auth from the build method, and call the authorize_with_auth method separately to handle this? The downside to doing this is we'll need to clone the TransactionPlan between the build and authorize_with_auth calls. Unless they can take a reference to self rather than taking ownership?

Testing

I've manually tested the equivalence of the refactored serial and parallel build methods against the original build method. They yield the same transaction payload and proofs, except with different blinding signatures as expected.

Additionally, I'm currently running the existing unit / integration tests to check for failures. What's the idiomatic approach to testing correctness here?

TODOs

The methods currently support Spend and Output actions. Add support for other actions like Swap, SwapClaim, etc.

Relevant

According to one of the comments in effect_hash: "TransactionPlan::build builds the actions sorted by type, so hash the actions in the order they'll appear in the final transaction." The web-workers will need to implement some ordering mechanism to place the actions in the correct order.

Fix

CI pipeline state is currently failing.

hdevalence · 2023-11-15T02:27:53Z

According to one of the comments in effect_hash: "TransactionPlan::build builds the actions sorted by type, so hash the actions in the order they'll appear in the final transaction." The web-workers will need to implement some ordering mechanism to place the actions in the correct order.

This is wrong (or should be wrong, we should check that the implementation actually doesn't do this). At one point in the past we attempted to do this, and then changed course.

The EffectHash should be computed on the actions in the order they are in the plan. Otherwise (as we experienced), there are all kinds of possibilities for subtle mismatches of ordering. Instead, choosing an ordering of the actions is the responsibility of the planner; once the TransactionPlan is generated, it should be used as-is.

hdevalence · 2023-11-15T02:29:18Z

Additionally, I'm currently running the existing unit / integration tests to check for failures. What's the idiomatic approach to testing correctness here?

It should be sufficient to run the Rust tests (either with cargo test or cargo nextest) and to push a PR and see that the smoke test (running pcli against pd) passes.

hdevalence · 2023-11-15T02:49:26Z

Our all-in-one build method internally calls build_unauth_with_actions and authorize_with_auth per your suggestions. The conflict with this approach is that existing calls to the build method in main separate the build and authorize calls since authorize takes an extra field for randomness. I suggest we strip out the authorize_with_auth from the build method, and call the authorize_with_auth method separately to handle this? The downside to doing this is we'll need to clone the TransactionPlan between the build and authorize_with_auth calls. Unless they can take a reference to self rather than taking ownership?

This is a good catch. I think, though, that we should keep the existing API, and figure out a way to make it work.

At a high level, the TransactionPlan is supposed to be a fully deterministic description of the transaction to be built. So why do we need new randomness at all? Especially, why do we need randomness after the authorization signatures are already computed? Why does build_dao_transaction have to jump through hoops to add fake randomness?

Looking through the existing code, where is the randomness actually used? It's only used in the computation of the binding signature:

        // Compute the binding signature and assemble the transaction.
        let binding_signing_key = rdsa::SigningKey::from(synthetic_blinding_factor);
        let auth_hash = transaction.transaction_body.auth_hash();

        let binding_sig = binding_signing_key.sign(rng, auth_hash.as_bytes());
        tracing::debug!(bvk = ?rdsa::VerificationKey::from(&binding_signing_key), ?auth_hash);

Why does the decaf377-rdsa sign take an RNG? decaf377-rdsa is an EdDSA variant, so it should support deterministic signing. If we changed the decaf377-rdsa API to support deterministic signing, we wouldn't need any RNG at all, and then we wouldn't have this problem.

crates/core/transaction/src/plan/action.rs

crates/core/transaction/src/plan/build.rs

hdevalence · 2023-11-15T03:43:17Z

crates/core/transaction/src/plan/build.rs

        self,
-        fvk: &FullViewingKey,
+        mut actions: Vec<Action>,


Shouldn't actions be immutable, since it's preconstructed data?

Right now, I'd call the actions semi preconstructed data. If we think about this from the perspective of a web-worker, they each call build_action which internally matches on the Action type. build_action should only be called for actions that are computationally intensive to compute. It therefore only precomputes the output, spend, swap, swap-claim, and delegator-vote actions.

If we don't mark it as mutable, how should we handle pushing other actions to the transaction body?

We explicitly shouldn't handle that, we should require that all of the actions are built and passed in, rather than trying to manage things case by case.

As a first-pass implementation strategy, I think it would be a good idea to unconditionally spawn a new web worker for each ActionPlan, but even if we did something more sophisticated, it wouldn't change this interface.

In general we should try to avoid action-by-action special case handling and instead handle ActionPlans and Actions uniformly, as with some of the other PR suggestions.

The web-workers are heavy instances (rather than simply being lightweight threads) since they replicate the underlying VM state. Spawning unnecessary web-workers would degrade performance since it increases the message passing overhead.

There are also many variants for an ActionPlan, which means we'd need to spawn more threads than the number of physical cores on the device.

I implemented your suggestion as a first-pass strategy. The pre-built actions are now immutable by default. I'll see how the performance scales in terms of execution time and latency in the webgpu repository and report back.

Not to bikeshed, but this will increase the size / complexity of the exposed build_action wasm function. I don't think this would have any affect, if anything it would be minor.

crates/core/transaction/src/plan/build.rs

hdevalence · 2023-11-17T06:22:19Z

I'm worried about this work getting tripped up on other changes -- like I mentioned on Discord, I didn't realize that the change to remove the custom ordering logic from the TransactionPlan's EffectHash implementation didn't happen over the summer, so the plan I suggested was more complex than I'd intended, and now this PR has a larger scope than we would have liked.

For that reason, I tried carrying it further, filling in some missing match arms and cleaning up some of the code.

hdevalence

Happy to merge this on green CI unless there are other changes you'd like to get in.

We should squash the commits when merging, the history is a mess because of the rebasing.

We can fix any straggler bits in follow-up PRs.

hdevalence · 2023-11-17T07:04:05Z

crates/bin/pcli/tests/network_integration.rs

+    // TODO: the first may no longer be a spend because of ordering changes.
+    // Let's not try to fix this at the moment. Later we can put a "canonical ordering" into the planner.
+    /*


As follow-up, we should add a TransactionPlan::sort_actions(&mut self) and change the Planner to call it just before returning the final TransactionPlan, then re-enable this test (which implicitly depended on action ordering)

TalDerei · 2023-11-17T15:01:58Z

We still need to make the relevant updates to the docs, for instance updating transaction_signing.md to reflect the removal of the UnauthTransaction type.

grod220 · 2023-11-17T14:37:54Z

crates/wasm/src/error.rs

@@ -10,6 +10,7 @@ use web_sys::DomException;
 use penumbra_tct::error::{InsertBlockError, InsertEpochError, InsertError};

 pub type WasmResult<T> = Result<T, WasmError>;
+pub type WasmOption<T> = Option<T>;


Unused, can remove

grod220 · 2023-11-17T14:39:13Z

crates/wasm/src/lib.rs

+pub mod note_record;
+pub mod planner;
+pub mod storage;
+pub mod swap_record;
+pub mod tx;
+pub mod utils;
+pub mod view_server;
+pub mod wasm_planner;


Any particular reason to expose these?

In order to make the modules visible to wasm_bindgen_test, otherwise the test suite will fail since the modules are private.

grod220 · 2023-11-17T14:41:53Z

crates/wasm/Cargo.toml

 web-sys                  = { version = "0.3.64", features = ["console"] }
+serde_json               = "1.0.107"


Not a big deal, but can move this to dev-dependencies as it's only in tests

moved to dev-dependencies

grod220 · 2023-11-17T14:43:20Z

crates/wasm/src/storage.rs

+                    .key_path(Some(&IdbKeyPath::new(note_key)))
+                    .to_owned();
+                let note_object_store = evt.db().create_object_store_with_params(
+                    "SPENDABLE_NOTES",


We should use constants.tables for table names

grod220 · 2023-11-17T14:43:48Z

crates/wasm/src/storage.rs

-        let db_req: OpenDbRequest = IdbDatabase::open_u32(&constants.name, constants.version)?;
+        let mut db_req: OpenDbRequest = IdbDatabase::open_u32(&constants.name, constants.version)?;
+
+        // Conditionally create object stores in the `IdbDatabase` database for testing purposes


This feels a bit odd being only for testing purposes, but existing in the production code. Can we move this to our test suite?

Attempting to migrate this to the test suite has been extremely unsuccessful over the past couple of days. It boils down to 1. our decision to use IdbDatabase, a wrapper for the IndexDB database, and 2. the tight coupling of the IndexedDBStorage instance when we initialize the ViewServer / WasmPlanner. Triggering the creation of an object store in the database requires a an onupgradeneeded event, which requires a connection handle to the database instance. Unfortunately, the handle is consumed in the creation of the database. Anyways, we can't create the database in the test suite directly because any changes made to it won't affect the database instance specific to the ViewServer / WasmPlanner.

Dependency injection via a constructor or setter would resolve this, but It doesn't seem to work in this environment since Serialize is not implemented for indexed_db_futures::IdbDatabase.

For instance, something like

pub async fn set_storage(&mut self, storage: JsValue) -> WasmResult<()>{ let storage = serde_wasm_bindgen::from_value(storage)?; self.storage = Some(storage) }

would require somehow converting a JsValue into IndexedDBStorage or vice-versa.

TLDR; the conditional inside the storage module is the simplest way to mock IndexDB calls and bypass the aforementioned limitations in this environment. I think we can keep it for now.

Ah, indeed that is tricky. But if the db instance is expected, we couldn't create the object store? Like this in the test:

let database: *const IdbDatabase = storage_ref.get_database(); database.create_object_store(...);

However, if this is not possible, think it would be worth extracting this logic into a separate method (create_tables_for_test()) so the ::new() method isn't pull of this business logic.

This won't work since it'll complain that the creation of an object store requires an upgradeneeded event.

grod220 · 2023-11-17T14:59:34Z

crates/wasm/src/wasm_planner.rs

+            memo_key = Some(memo_plan.key);
+        }
+
+        //


Looks like a left over comment

grod220 · 2023-11-17T15:03:26Z

crates/wasm/tests/test_build.rs

+        // Limit the use of Penumbra Rust libraries since we're mocking JS calls
+        // that are based on constructing objects according to protobuf definitions.


Why can't we import the structs instead of re-defining them here?

we'd need to mark the fields on the IndexedDbConstants and Tables structs as pub, since the test suite can't access the private fields.

I think that would be a worthwhile tradeoff instead of re-writing the structs. What do you think?

@hdevalence are there any issues with making these fields pub?

grod220 · 2023-11-17T15:04:45Z

crates/wasm/tests/test_build.rs

+
+        // Convert note to `SpendableNoteRecord`.
+        let spendable_note: SpendableNoteRecord =
+            serde_json::from_str(spendable_note_json).unwrap();


I wonder if there's a way we can use serde_wasm_bindgen with JsValue here instead of serde_json 🤔 . Maybe it doesn't matter though.

grod220 · 2023-11-17T15:07:16Z

crates/wasm/src/tx.rs

+///     auth_data: `pb::AuthorizationData`
+/// Returns: `pb::Transaction`
+#[wasm_bindgen]
+pub fn build_parallel(


I can't really tell, but is this where web workers will be spawned later?

The web-workers will call the build_action method in the wasm planner, and then the prebuilt actions will be slotted into build_parallel where the final transaction can be assembled.

grod220 · 2023-11-17T15:07:55Z

crates/wasm/tests/test_build.rs

+        .unwrap();
+        console_log!("Serial transaction is: {:?}", serial_transaction);
+    }
+}


I didn't see any assertions in the test, should we add some?

The assertions here would be tricky because we're essentially comparing if a serial transaction and parallel transaction match. The only difference in their transaction payload should be the binding signature since it requires randomness. I've been instead manually inspecting if they match in the dev-console, but some kind of assertion would be nice.

hdevalence · 2023-11-17T16:16:22Z

Sounds good. I think it would be good to make some issues to track follow up work, but to merge this in the meantime for the avoidance of other possible conflicts.

Corresponds to changes in [0], which split up the `build_concurrent` logic. [0] penumbra-zone/penumbra#3289

TalDerei had a problem deploying to smoke-test November 7, 2023 18:16 — with GitHub Actions Failure

TalDerei requested review from hdevalence and removed request for hdevalence November 7, 2023 18:35

TalDerei had a problem deploying to smoke-test November 8, 2023 21:40 — with GitHub Actions Failure

TalDerei self-assigned this Nov 8, 2023

TalDerei mentioned this pull request Nov 9, 2023

Tracking issue: Proposal penumbra-zone/webgpu#2

Open

12 tasks

TalDerei had a problem deploying to smoke-test November 13, 2023 20:04 — with GitHub Actions Failure

conorsch force-pushed the tx-decompose branch from e9fb717 to 99dce1f Compare November 13, 2023 21:24

conorsch had a problem deploying to smoke-test November 13, 2023 21:24 — with GitHub Actions Failure

TalDerei had a problem deploying to smoke-test November 13, 2023 22:05 — with GitHub Actions Failure

TalDerei temporarily deployed to smoke-test November 13, 2023 22:14 — with GitHub Actions Inactive

TalDerei temporarily deployed to smoke-test November 13, 2023 23:44 — with GitHub Actions Inactive