
BLS12-381 Crypto Primitives #1071

Merged (35 commits) on Jul 18, 2023
Conversation

mschoenebeck (Contributor) commented Apr 22, 2023:

Added the library mschoenebeck/bls12-381 as a submodule. For x86_64 there are high-speed asm implementations of all low-level arithmetic functions, which optionally make use of the adc and bmi2 CPU extensions (selected at runtime by a live dispatcher via an init call, if available on the host machine).

Added the following types to leap:

bls_scalar_type  = std::array<uint64_t, 4>;
bls_g1_type      = bls12_381::g1;
bls_g2_type      = bls12_381::g2;
bls_gt_type      = bls12_381::fp12;
bls_fp_type      = bls12_381::fp;
bls_fp2_type     = bls12_381::fp2;

Added the following host functions to leap (according to https://eips.ethereum.org/EIPS/eip-2537):

bls_g1_add(bls_g1_type op1, bls_g1_type op2, bls_g1_type result);
bls_g2_add(bls_g2_type op1, bls_g2_type op2, bls_g2_type result);
bls_g1_mul(bls_g1_type point, bls_scalar_type scalar, bls_g1_type result);
bls_g2_mul(bls_g2_type point, bls_scalar_type scalar, bls_g2_type result);
bls_g1_exp(bls_g1_type[] points, bls_scalar_type[] scalars, bls_g1_type result);
bls_g2_exp(bls_g2_type[] points, bls_scalar_type[] scalars, bls_g2_type result);
bls_pairing(bls_g1_type[] g1_points, bls_g2_type[] g2_points, bls_gt_type result);
bls_g1_map(bls_fp_type e, bls_g1_type result);
bls_g2_map(bls_fp2_type e, bls_g2_type result);
bls_fp_mod(std::array<uint8_t, 64> s, bls_fp_type result);

Check out mschoenebeck/aggsigtest for a sample contract using those new host functions.
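For readers unfamiliar with the operation, bls_g1_exp/bls_g2_exp perform a multi-exponentiation, i.e. result = sum over i of scalars[i] * points[i] in the respective group. The sketch below is purely illustrative (it stands in integers mod a small prime for curve points) to show the shape of the computation; the real host functions operate on G1/G2 elements:

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: the group operation is stood in by addition mod p,
// and scalar multiplication by integer multiplication mod p. The real
// bls_g1_exp/bls_g2_exp host functions work on elliptic-curve points.
uint64_t multi_exp_mod(const std::vector<uint64_t>& points,
                       const std::vector<uint64_t>& scalars,
                       uint64_t p) {
    uint64_t acc = 0;
    for (std::size_t i = 0; i < points.size() && i < scalars.size(); ++i)
        acc = (acc + (points[i] % p) * (scalars[i] % p)) % p;
    return acc;
}
```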

@@ -338,6 +338,8 @@ struct controller_impl {
set_activation_handler<builtin_protocol_feature_t::get_code_hash>();
set_activation_handler<builtin_protocol_feature_t::get_block_num>();
set_activation_handler<builtin_protocol_feature_t::crypto_primitives>();
set_activation_handler<builtin_protocol_feature_t::bls_primitives>();
bls12_381::init();
mschoenebeck (Contributor, author):

Is this the correct place for the library's init function? The call to init only needs to happen once when nodeos starts. It is a live dispatcher that checks whether the adc and bmi2 CPU features are available and, if so, selects the faster asm routines.
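The live-dispatch pattern described here can be sketched as follows. This is a minimal illustration with hypothetical names, not the bls12-381 library's actual code: a portable fallback is installed by default, and init() swaps in the faster path when the CPU reports the needed extensions.

```cpp
#include <cstdint>

// Hypothetical sketch of a live dispatcher: a portable C++ fallback plus a
// faster routine selected once at init() time based on runtime CPU features.
// (The real library checks both the add-with-carry and bmi2 extensions.)
using add_fn = uint64_t (*)(uint64_t, uint64_t);

static uint64_t add_portable(uint64_t a, uint64_t b) { return a + b; }
static uint64_t add_asm_path(uint64_t a, uint64_t b) { return a + b; } // stand-in for the asm routine

static add_fn g_add = add_portable;

void init() {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();                    // populate CPUID info
    if (__builtin_cpu_supports("bmi2"))      // runtime feature check
        g_add = add_asm_path;
#endif
}
```

Placing the call where controller_impl registers its activation handlers does satisfy the "run once at startup" requirement, since that code executes once during node initialization.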

} )
( builtin_protocol_feature_t::bls_primitives, builtin_protocol_feature_spec{
"BLS_PRIMITIVES",
fc::variant("2c302889ce2db124878f7c6092cf27519d450559ed2fdfbad14f532d90fc5139").as<digest_type>(),
mschoenebeck (Contributor, author):

The SHA256 hash of the feature description message below is probably incorrect. I haven't been able to reproduce any of the other hashes, so I assume the one I generated is wrong.

Member:

The SHA256 hash includes the trailing newline.
For example, for GET_BLOCK_NUM, hash the string between the /* */ delimiters, quoted below, including the trailing newline but not the newline before "Builtin".

"Builtin protocol feature: GET_BLOCK_NUM

Enables new `get_block_num` intrinsic which returns the current block number.
"

Member:

So yours would be 01969c44de35999b924095ae7f50081a7f274409fdbccb9fc54fa7836c76089c

mschoenebeck (Contributor, author):

Hello Kevin. Thank you very much for clarifying that. I tried a few different approaches, but not that one. I am now able to reproduce all the hashes.

Corrected the hash value.

sv.reserve(n);
for(uint32_t i = 0; i < n; i++)
{
bls12_381::g1 p = bls12_381::g1::fromJacobianBytesLE({reinterpret_cast<const uint8_t*>(points.data() + i*144), 144}, false, true);
Member:

Before using the span<>s passed as arguments, we need to validate that the sizes of the spans are what is required.

Consider, for example, a contract currently with 64KB of memory calling bls_g1_exp(64KB-8, 4, ..., n=1, ...). The host function validation wrappers will allow this call because the points span is 4 bytes long and within the 0 to 64KB-1 valid range of memory. However, bls12_381::g1::fromJacobianBytesLE({reinterpret_cast<const uint8_t*>(points.data() + i*144), 144} then goes on to use 144 bytes starting from 64KB-8. This will access invalid WASM memory leading to undefined behavior (what will likely happen in this case and at this point of time is a recovered SEGV which leaks the memory allocated by the std::vector<>::reserve() above).

Host functions must ensure they only access memory within the passed span<> ranges. So instead, need to validate that points.size() == n*144. Consider, for example, this host function

int32_t interface::alt_bn128_add(span<const char> op1, span<const char> op2, span<char> result) const {
   if (op1.size() != 64 || op2.size() != 64 || result.size() < 64 ||
       bn256::g1_add(std::span<const uint8_t, 64>{(const uint8_t*)op1.data(), 64},
                     std::span<const uint8_t, 64>{(const uint8_t*)op2.data(), 64},
                     std::span<uint8_t, 64>{(uint8_t*)result.data(), 64}) == -1)
      return return_code::failure;
   return return_code::success;
}

Notice how an error is returned if op1 is not 64 bytes. Since bls_g1_exp() does not return anything, the only reasonable option for bls_g1_exp may be to assert that points.size() == n*144 (i.e. fail the transaction).

This same guidance applies to result as well. Notice that the above alt_bn128_add example checks that result is at least 64 bytes. That may not be ideal and may have been an oversight in the original implementation -- I would suggest enforcing that the result here is exactly 144 bytes.

(this feedback applies to all the added host functions)

mschoenebeck (Contributor, author) commented Jun 8, 2023:

Thanks for the detailed explanation and example scenario. I was expecting execution to fail in those cases but didn't consider potential memory leaks. That is a pretty big deal, especially for network nodes that run indefinitely. Of course, these range checks are very important.

Updates in my most recent commit(s) include:

  • added checks for all span<> ranges
  • make use of return_code::success/return_code::failure as in BN host functions
  • also see related changes in cdt (proper error propagation back to smart contract and then handling with eosio::check, again, exactly as in BN host functions)
  • fixed indentation of tabs (3 ws instead of 4)

bls12_381::g2 p_g2 = bls12_381::g2::fromJacobianBytesLE({reinterpret_cast<const uint8_t*>(g2_points.data() + i*288), 288}, false, true);
bls12_381::pairing::add_pair(v, p_g1, p_g2);
}
bls12_381::fp12 r = bls12_381::pairing::calculate(v);
Member:

Whenever seeing a loop in a host function, we need to be mindful of whether a malicious contract can cause the loop to take "too long".

In this case, how long does bls_pairing take if n = 80000 (I believe the approximate realistic maximum currently)?

We don't have hard rules, but historical guidance is that a host function should "yield" (allow the deadline timer to cancel the transaction) at least once per millisecond. But this is a manual yield the host function must perform. As an example,

void interface::sha3( span<const char> input, span<char> output, int32_t keccak ) const {
   bool _keccak = keccak == 1;
   const size_t bs = eosio::chain::config::hashing_checktime_block_size;
   const char* data = input.data();
   uint32_t datalen = input.size();
   fc::sha3::encoder enc;
   while ( datalen > bs ) {
      enc.write( data, bs );
      data += bs;
      datalen -= bs;
      context.trx_context.checktime();
   }

See how every eosio::chain::config::hashing_checktime_block_size worth of input to hash results in a call to context.trx_context.checktime().

And, general historical guidance is that host functions with conditions that fail to yield in 5+ms would be considered a security defect and must be fixed.

So, we may need to add context.trx_context.checktime() calls in this loop depending on how long it takes to perform these operations.

mschoenebeck (Contributor, author) commented Jun 8, 2023:

Again, thanks for taking the time to explain everything so well. I really appreciate it. I had already noticed calls to context.trx_context.checktime() here and there, but forgot to ask about them.

There are three host functions with loops. Following your rule of thumb (approximately one check per ms), I did some time measurements on my machine to get an idea of how long each loop iteration runs. Here are the results and the reasoning behind the intervals I chose for the checktime() calls:

  • g1_exp: 155µs per loop iteration => times 10 = 1.55ms
  • g2_exp: 260µs per loop iteration => times 6 = 1.56ms
  • pairing: 395µs per loop iteration => times 4 = 1.58ms

Assuming significantly better hardware on BP nodes, I thought aiming for 1.5ms on my machine should be fine. Also, in the case of pairing, having at least 3 pairs go through without checktime() overhead would be a very common case for smart contracts that perform groth16 proof verifications (fwiw).

Updates in the most recent commit(s) include:

  • added context.trx_context.checktime() calls in reasonable intervals

if(i%4 == 0)
context.trx_context.checktime();
}
bls12_381::fp12 r = bls12_381::pairing::calculate(v);
Member:

In the case of v.size() == 80000 how long does bls12_381::pairing::calculate() take?

mschoenebeck (Contributor, author) commented Jun 8, 2023:

On my machine, about 20 seconds. I was thinking about it after I read your previous comment: does that mean we need to be able to pass the yield() function to pairing::calculate() and execute it there at reasonable intervals as well? (Edit: probably not, since with the checktime() inside the loop it is no longer possible to add that many pairs to v.)

Member:

since now with the checktime() inside the loop it wouldn't be possible to add that many pairs to v anymore

yeah, good point. But is there some number that takes, for example, 25ms to make it through the loop, but then takes 100ms to make it through bls12_381::pairing::calculate()? That too would be considered a security defect. I'm not sure of the timing for these operations so maybe it's a non issue.

mschoenebeck (Contributor, author) commented Jun 8, 2023:

Good question. So I did some measuring again:

It takes v.size() == 58 pairs in order for the entire loop to take 25ms (on my machine). The call to pairing::calculate(v) then takes about an additional 14.7ms.

So the pairing is about half of the cost of the loop. I'm not exactly sure how those numbers would vary on a faster machine, but I would guess the ratio between loop execution and pairing calculation stays similar.

Member:

It's "only" ~10ms but I think we should be prudent and protect against that -- it definitely would fail previous guidelines we had back in the bug bounty days.

Since we control the library, the best choice is to pass in the ability for bls12_381::pairing::calculate() to poll a yield function. An example of that is,

int32_t interface::alt_bn128_pair(span<const char> g1_g2_pairs) const {
   auto checktime = [this]() { context.trx_context.checktime(); };
   auto res = bn256::pairing_check({(const uint8_t*)g1_g2_pairs.data(), g1_g2_pairs.size()}, checktime);

where the declaration of that is

/// pairing_check calculates the Optimal Ate pairing for a set of points.
///  @param marshaled_g1g2_pairs marshaled g1 g2 pair sequence
///  @return -1 for unmarshal error, 0 for unsuccessful pairing and 1 for successful pairing
int32_t pairing_check(std::span<const uint8_t> marshaled_g1g2_pairs, std::function<void()> yield);

if you wanted to make the yield optional for users, you could potentially do something like

int32_t pairing_check(std::span<const uint8_t> marshaled_g1g2_pairs, std::function<void()> yield = std::function<void()>());

or just require users to pass in a stub function. Either way is fine; importantly, either way still allows the library to be used outside of leap.

(there is another option: add a subjective limit to the size of n; for example

int32_t interface::mod_exp(span<const char> base,
                           span<const char> exp,
                           span<const char> modulus,
                           span<char> out) const {
   if (context.control.is_speculative_block()) {
      unsigned int base_modulus_size = std::max(base.size(), modulus.size());
      if (base_modulus_size < exp.size()) {
         EOS_THROW(subjective_block_production_exception,
                   "mod_exp restriction: exponent bit size cannot exceed bit size of either base or modulus");
      }
but I can't find any compelling reason to pick this choice due to our control of the bls lib)
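The suggested pattern, threading an optional yield callback through a long-running library routine, can be sketched as follows (hypothetical names and a trivial stand-in workload; not the bls12-381 library's actual interface):

```cpp
#include <cstddef>
#include <functional>

// Hedged sketch: the routine polls an optional yield() callback at a fixed
// iteration interval. Callers inside leap pass a checktime lambda that may
// throw to cancel the transaction; callers outside leap can omit it.
long calculate(std::size_t n_pairs,
               const std::function<void()>& yield = {}) {
    long acc = 0;
    for (std::size_t i = 0; i < n_pairs; ++i) {
        acc += static_cast<long>(i);   // stand-in for Miller-loop work
        if (yield && i % 16 == 0)      // interval is illustrative
            yield();
    }
    return acc;
}
```

Inside leap the caller would pass something like `[&]{ context.trx_context.checktime(); }`, which throws once the deadline has passed, so the loop unwinds promptly instead of running to completion.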

mschoenebeck (Contributor, author) commented Jun 10, 2023:

Agree. Since there is a leap branch of the library now, let's try to tailor it perfectly for leap integration.

I added a yield function pointer parameter to pairing::calculate() (pairing::miller_loop(), respectively) and to g2::multiExp(), since the latter could also run for about 15ms in the 25ms-loop worst case. Here are the numbers for all three functions (my machine):

  • G1 multi-exponentiation: with v.size() == 161 the loop takes 25ms => g1::multiexp(v) then takes only 5.1ms
  • G2 multi-exponentiation: with v.size() == 96 the loop takes 25ms => g2::multiexp(v) then takes 15.3ms
  • Pairing: with v.size() == 58 the loop takes 25ms => pairing::calculate(v) then takes 14.7ms

The timings of g1::multiExp() should be fine, I guess, since 25ms + 5.1ms ≅ 30ms. So if 5ms to 7ms is an acceptable 'overhead' for the execution time of either pairing::calculate() or g1/g2::multiExp() after their corresponding loops (assuming the worst-case vector sizes listed above with 25ms of preceding loop execution time), then only one yield() call approximately in the middle of the execution of pairing and multiexp should be fine, right? The worst-case scenario is then about 25ms + ~7ms ≅ 32ms total execution time in case the transaction fails.

It was a bit tricky, though, to integrate the yield() call accurately into the middle of the pairing and multiexp functions, because there are multiple loops (including nested loops), some of which depend on v.size(). However, I believe I found a good way to add it approximately in the middle of each function's execution path, so that the time check is performed once if v exceeds a certain size.

arhag (Member) commented Jul 14, 2023:

@mschoenebeck: I think we should pass the yield function into g1::multiExp() as well for consistency.

We should not assume that there will always be a limit of 30 ms in a transaction. For example, we are proposing that the network upgrade to a higher max transaction time. With a max transaction time of 150 ms, the user could pass a 5x larger array and thus increase the time of the g1::multiexp function call by 5x as well.

Additionally, I think the yield functions should be called frequently enough within the implementation so that the host functions cannot be abused to exceed the deadline by more than 5 ms even if we allow the transaction to execute indefinitely. We are proposing to raise the limit to 150 ms. We may push it to 250 ms in future. But in the longer term future, we may be able to take advantage of parallelism to even allow a transaction to execute for longer than even a block interval! And even if that does not happen, we may do that with read-only transactions which could potentially call these host functions. And we need to ensure read-only transactions respect our deadlines as well to maintain availability of the various features of the system.

mschoenebeck (Contributor, author):

Added the yield() call to g1::multiexp. Within the implementation, yield() is called inside the loop that reads the parameters passed to the host function, every 10 (g1) or 6 (g2) iterations. That should be frequent enough, taking the above numbers into consideration, right?

mschoenebeck (Contributor, author) commented Jun 28, 2023:
I have one question about the correct place to link the bls library. I noticed that bn256 is actually linked to the chain library despite being a submodule of fc. It is not linked against fc, though.

https://github.com/AntelopeIO/leap/blob/7c0694b4c965bf12078680cb4062f092554be0c5/libraries/chain/CMakeLists.txt#L132C1-L134C23

I linked the bls lib to fc since it is a submodule of that library. Probably a minor thing, but what is the correct way?

https://github.com/mschoenebeck/leap/blob/90071617799738820350538f632ca5d5fd54e5e8/libraries/libfc/CMakeLists.txt#L121C1-L122C174

BOOST_CHECK_EQUAL(p.inCorrectSubgroup(), false);
g2 p_res = g2::fromJacobianBytesBE(hexToBytes<288>("0x158a2a1e3ce68c49f9795908aa3779c6919ed5de5cbcd1d2a331d0742d1eb3cb28014006b5f686204adb5fdca73aea570ee0f0d58880907c8de5867dd99b6b7306b2c3de4a1537e6d042f2b8e44c8086853728cc246726016b0fcf993db3d759005f8ac0cb55113c857c5cf3f83d9b624ce9a2a0a00a1206777cf935721c857b322a611ed0703cf3e922bfb8b19a1f5e10a341b2191ab5a15d35f69850d2adb633e5425eecb7f38dd486a95b3f74d60f3ee6cf692b3c76813407710630763f7605b3828c19203f661732a02f7f546ab354694128bbe5a792a9db4a443c0fe10af0df2bc1b8d07aee99bd6f8c6b26847011aa31634f42f722d52022c736369db470576687fdf819cf15a0db4c01a0bd7028ee17cefdf6d66557d47fb725b6d00f"), false, true).value();
BOOST_CHECK_EQUAL(p.equal(p_res), true);
} FC_LOG_AND_RETHROW();
arhag (Member) commented Jul 14, 2023:

These tests of garbage input are good, but I would like to see unit tests that directly exercise the code paths of the host functions integrated into Leap, run with its unit tests. They should cover: these garbage cases; failure cases that return -1; and at least one successful case for each function (taken from the library's test vectors).

This will require adding a new test contract.

See how it was done for the alt_bn_128 host functions: https://github.com/eosnetworkfoundation/mandel/pull/316/files#diff-87c8848613f6d5f711354edd61feeebcaae9b6acd4f5f16ca651075b989dc929

mschoenebeck (Contributor, author) commented Jul 17, 2023:

Thanks for clarifying that. I added a test contract, bls_primitives_test, to test all host functions. All tests pass except one: for some reason, the pairing test fails on eos-vm-oc with a segmentation fault. I have no idea why; the same test passes on eos-vm and eos-vm-jit. I am not exactly sure how to debug this. Any ideas what could go wrong here?

Member:

Yeah, that failure is because the host function is named bls_pairing, which doesn't match the name registered for it. Fixing the entry in intrinsic_mapping.hpp will resolve it.

(In 2.0 this mismatch would have ended up as a compile error; we lost that somewhere between 2.0 and 3.1 -- #621)

mschoenebeck (Contributor, author):

Thanks, Matt. Fixed in: 59cc737

mschoenebeck (Contributor, author):

I synced with main. However, I had to resolve a merge conflict in unittests/deep-mind/deep-mind.log. I'm pretty sure it is not correctly resolved, since it shows a ton of changes relative to the main branch. Please take a look.

mschoenebeck (Contributor, author):

Quick update: I was able to resolve the deep-mind.log conflicts myself. Fixed in: 59cc737

@arhag arhag changed the base branch from main to bls_integration July 18, 2023 18:14
arhag (Member) left a review:

Overall this looks good to me. There are some small tweaks the ENF will do, including getting CI to fully pass. But we are good to merge this into the bls_integration branch as is.
