Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIP-0139? | Plutus Core Builtin Type - ByteStringMap #921

Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
152 changes: 152 additions & 0 deletions CIP-bytestring-map-builtin/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
---
CIP: ????
ana-pantilie marked this conversation as resolved.
Show resolved Hide resolved
Title: Plutus Core Builtin Type - `ByteStringMap`
Category: Plutus
Status: Proposed
Authors:
- Michael Peyton Jones <[email protected]>
- Ana Pantilie <[email protected]>
Implementors: []
Discussions:
- https://github.com/cardano-foundation/CIPs/pull/921
Created: YYYY-MM-DD
ana-pantilie marked this conversation as resolved.
Show resolved Hide resolved
License: CC-BY-4.0
---

## Abstract

We propose a `ByteStringMap` type for Plutus Core. This type will have logarithmic time lookups, which are difficult to achieve in Plutus Core programs today.


## Motivation: why is this CIP necessary?

### Map data structures

A _map_ is a collection of _values_ indexed by _keys_. We consider a map to be _well-defined_ if each key uniquely identifies each value. A basic map implementation provides, at the very least, a way to define an empty map, an operation to insert new key-value associations into an existing map, an operation to lookup values based on an input key and an operation to delete entries from the map.

### Performance and Plutus Core implementations

To provide map operations with good asymptotic performance, it is necessary to implement a fairly complex data structure such as a red-black tree. Implementing such data structures in Plutus Core programs themselves has been attempted multiple times in the past, but unfortunately we have found that the high overhead and large memory cost outweigh any theoretical algorithmic improvements (see [attempt 1][3] and [attempt 2][2]).

The only practical way to implement maps in Plutus Core is by using an association list representation. Lists are known to have poor asymptotic performance for usual map operations. To make matters worse, maps are in many cases central to developing smart contract logic, specifically maps with `ByteString` keys. See [CPS-0013][1] for more details on how smart contracts heavily depend on `ByteString` maps.

Another attempt to mitigate the performance restrictions imposed by the list implementation was to specialise the Plutus Core implementation [into an association list with `ByteString` keys][4]. Unfortunately, this approach also failed.

We concluded that embedding maps into Plutus Core cannot be done without incurring high performance costs.

### Map as a builtin type

The only remaining solution is to consider introducing a builtin map type.

Unfortunately, the builtin language has a significant limitation: it does not allow for higher-order builtin functions. As of the time of writing, builtin functions may not receive other functions as arguments, but there is ongoing work to lift this restriction. However, due to complex design considerations, repeated applications of function arguments will incur high performance costs. Since maps are recursive data structures, these high performance costs will be accumulated and such an approach will still not be feasible.

Why are higher-order builtin functions necessary? Map keys must be processed (or simply, just compared) in order to be able to build and operate over well-defined maps. For arbitrary key types, that would only be possible by passing the required processing or comparison functions as arguments to the builtin functions.

The only way to implement a builtin map with a (builtin) polymorphic key type would be to modify the builtins language itself. A compiler would have to only allow the construction of maps with key types which are known a priori to posess some property (such as a total ordering). The Plutus Core evaluator would have to check for this as well. The implementation itself would not be too complicated, but we must first consider how this would change the formal semantics of the builtins language.
rphair marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed that and we don't really need to modify anything much? We can just fail at runtime for built-in types that cannot implement Ord (or PlutusOrd or whatever). And then we'll expose a safe interface in Plinth forbidding one from using such built-in types as keys in maps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment still stands, I don't believe you'd need to pass a comparison function around, keys can only be of built-in types and we either know how to compare those (in most cases) or don't (like those pesky crypto types) and in the latter case we can just throw. UPLC is a low-level language, it's fine for it to throw like that, and high-level languages will be able to provide a type-safe interface. No meta-theoretical properties are hurt. General maps are gonna be a bit less efficient though, but if done right, probably insignificantly so.


Given the above, the only immediate solution is to implement a builtin map type which is specialised to `ByteString` keys but remains polymorphic in the value type. We believe that this solution should be good enough for most practical use-cases, and maybe even surpass a fully polymorphic map type in performance.


## Specification

We propose the addition of the following builtin type: `ByteStringMap` of kind `Type -> Type`.
- It represents a map associating keys of type `ByteString` with values of any builtin type.
- The new builtin type should be implemented using a tree data structure allowing for logarithmic-time map operations.
- We consider the costing size of a `ByteStringMap` to be equal to the size of its corresponding representation as an association list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that, we'll have to carry some additional info around for balancing purposes, so it sounds like costing should reflect that. And an element in an association list is a pointer to a pair with two pointers to actual content, in our case it'll probably be just the latter two pointers. So yeah, I don't think we should assume that costing is going to be exactly the same, although I suppose that could be a good enough approximation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwxm what should we say here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could simply say "roughly equal" instead of "equal" and that'll be fine, this is all to be figured out during the costing phase anyway, it's fine not to have the exact answer now.

Copy link
Contributor

@kwxm kwxm Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwxm what should we say here?

@ana-pantilie: sorry, I missed this. We actually have to caculate the size of the map object before we carry out any operations on it, and that might be expensive if we have to traverse a tree to findi ts size. One way to alleviate this might be to have the maps carry their sizes about with them and make sure that the update operations update the size as well (and during deserialisation we'd have to check that maps embedded in scripts don't lie about their size). We might even need two measures of size: one telling you the number of elements of the map (for converting to a list, for example), and one telling you the depth of the map (for insertion etc.). We could do this by using newtype wrappers to choose which size measure should be applied. Maybe we could just store the size of the object and take the log instead of measuring the depth, since these should be proportional if the tree's always kept balanced. This is all probably too detailed for a CIP though, so saying that the size should be something like the number of entries should be fine: we can work out the details later.

- The binary encoding of `ByteStringMap`s is equivalent to its encoding as an association list.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwxm - please review. This is based on the standard flat implementation for Haskell's Data.Map.


**Note**: Here `Type` is the universe of all builtin types, since we do not consider types formed out of applying builtin types to arbitrary types to be inhabited.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a type variable or a type application reducing to a built-in type etc, so it's not just the built-in types.


We propose the following set of builtin functions to accompany the new builtin type:
1. `insertBSMap :: forall v . ByteString -> v -> ByteStringMap v -> ByteStringMap v`
- it returns a map with the input key updated to the new value, silently discarding any previous value
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument, and logarithmic memory in the `ByteStringMap` argument
2. `lookupBSMap :: forall a v . ByteString -> (() -> a) -> ByteStringMap v -> a`
ana-pantilie marked this conversation as resolved.
Show resolved Hide resolved
- it returns the value for a given key in the map; if the key is missing, it returns the default value of type `() -> a`
- its implementation depends on, as of the time of writing, ongoing work in supporting higher-order builtin functions; see the next section for the motivation behind this choice
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument and constant memory
3. `deleteBSMap :: forall v . ByteString -> ByteStringMap v -> ByteStringMap v`
- it returns a map without the key-value association identified by the input key
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument and logarithmic memory in the `ByteStringMap` argument
4. `unionBSMap :: forall v . ByteStringMap v -> ByteStringMap v -> ByteStringMap v`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important use-case for this builtin data-structure is to facilitate fast operations over the Value type defined as Map.Map CurrencySymbol (Map.Map TokenName Integer). These operations are by far one of the biggest performance bottlenecks for onchain code.

It is extremely expensive to apply validation logic to Values in onchain code because of the high cost of map operations. union is by far the most expensive value operation, but it also used nearly universally by dApps and is a huge performance bottleneck. This builtin data-structure needs to have a semigroup collision handling for Values either unionBSMap with some bool flag indicating a value or just a separate unionBSMapValue builtin for use with Value types.

The common pattern shared across most protocols is as follows:

  1. Given an input, construct the value expected to be possessed by the output which seeks to satisfy the spending conditions of said input.
  2. Check that the expected output value matches the actual output value

The construction of the expected output value is in the majority of cases a union of the input value and a value delta (the quantities of
assets to be added to / removed from the input, ie. the difference in value between the input UTxO and the output UTxO).

processRequest :: ProtocolStateDatum -> TxInInfo -> TxOut -> Bool 
processRequest stateDatum txIn txOut = 
  let resolvedInput = (txInInfoResolved txIn)
      inputValue = txOutValue resolvedInput
      outputValue = txOutValue txOut
      -- Given a tx input and the relevant state datum compute the value delta between the input value 
      -- and the expected value delta between the input value and the expected. (ie the quantities of 
      -- assets to be added to / removed from the input.  
      inputValueDelta = computeValueDelta txIn stateDatum 
      expectedOutputValue = inputValue <> inputValueDelta
   in expectedOutputValue == outputValue

For examples in production dApps see:
https://github.com/SundaeSwap-finance/sundae-contracts/blob/edc118880d3baffcb7d5bd277faec2e7dc54c59b/lib/calculation/swap.ak#L120C1-L124C64
https://github.com/minswap/minswap-dex-v2/blob/c3293ad435792ba502f5a2b1691c6736e5f167e6/lib/amm_dex_v2/order_validation.ak#L530
https://github.com/minswap/minswap-dex-v2/blob/c3293ad435792ba502f5a2b1691c6736e5f167e6/lib/amm_dex_v2/order_validation.ak#L136
https://github.com/lenfiLabs/lenfi-smart-contracts/blob/eb7dc685f3888c940d0da027c46b5f1c3d9578f1/validators/order_contract.ak#L164
https://github.com/minswap/minswap-lbe-v2/blob/6fea91d881d6037e9c944ba39f6068a1d6df4e42/lib/lb_v2/validation.ak#L429
https://github.com/minswap/minswap-lbe-v2/blob/6fea91d881d6037e9c944ba39f6068a1d6df4e42/lib/lb_v2/order_validation.ak#L237
https://github.com/minswap/minswap-lbe-v2/blob/6fea91d881d6037e9c944ba39f6068a1d6df4e42/lib/lb_v2/order_validation.ak#L82
https://github.com/Danogo2023/bond-dex/blob/f29effc2ca1f354fd8a9c3695ec58926f940de84/lib/bond/limit_bid/withdraw.ak#L392-L397

For explanation on why many of these are using value.add (insert with semigroup collision handling instead of left/right-based collision)
instead of merge see https://github.com/txpipe-shop/sundae-swap/blob/d8adea189f16b3a585ea311d6467f5fe2fda78d4/src/audit.typ#L1099-L1119
probably due to the overhead introduced by recursion in onchain code, merge is more expensive than add and thus should be preferred when you have a fixed number of assets to update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colll78 Thanks for all the pointers! Sounds like we should have insertWithBSMap as a builtin. With higher order function support coming soon, this makes total sense.

I'm unsure about the following:

  • Should we keep insertBSMap
  • Do we need a unionWithBSMap builtin, or does insertWithBSMap suffice
  • What can we do to speed up equality checks on Values? It's not simply map equality due to potential zero-value entries. Perhaps filterBSMap + equalsBSMap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map equality is cheap because we currently take advantage of equalsData.

equalsData (mergeValue ownInputValue tokensToAdd) ownOutputValue

To handle zero-value entries mergeValue simply deletes them, I would expect that the union function would similarly be capable of deleting zero value entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need unionWithBSMap and I think filterBSMap would be extremely helpful as-well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have insertWithBSMap or unionWithBSMap. The upcoming higher-order builtins allow one to return an application from a builtin call, not to execute one within the builtin (we used to have that functionality but it was removed a long time ago) and we need the latter to be able to implement insertWithBSMap or unionWithBSMap as a builtin. We can try to design such functionality, but it's probably not gonna be pretty. I've spent some time several years ago thinking about roughly this problem and had an idea back then, so if you want me to try it out, I can do that (it'll be a few days of work before I can tell whether it's promising or not).

Copy link
Contributor

@colll78 colll78 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then just a default mergeValue and insertValue (that you don't provide a function to, and only works on canonical ledger constructed Value as BuiltinData) would be just as impactful. Dealing with Values is the main use-case and one of the main efficiency bottlenecks in most protocols, it makes sense to have specialized builtins for interacting with Values given they are one of the two most foundational components of smart contracts (the purpose of smart contracts is to validate the transfer of and change in Value and Data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, perhaps one of the most universally beloved smart contract platforms in the blockchain space defines a programming paradigm "Asset oriented programming", the entire language has been built around the notion that it natively understands assets and provides native first-class features for assets. The introduction of such builtins would also vastly enhance the security of the ecosystem because the implementation of such functions in onchain code are extremely error prone, especially when you look at all the black magic optimization that libraries are using (which wouldn't be necessary with such builtins).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colll78 sounds interesting. Wanna open an issue in the Plutus repo about that?

However I personally prefer to be as general as possible until we're sure that that's not enough. So I'd rather investigate unionWithBSMap, maybe we can do that and then we can keep domain-specific logic in the user space. If that's too hard or isn't going to fly, then I agree with you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is reasonable.

mergeValue :: BuiltinData -> BuiltinData -> BuiltinData 
mergeValue v1 v2 = 
  let valueMap1 = listToBSMap (BI.unsafeDataAsMap v1) 
       valueMap2 = listToBSMap (BI.unsafeDataAsMap v2) 
       combinedVal = unionWithBSMap (\innerA innerB -> unionWithBsMap BI.addInteger (listToBSMap (BI.unsafeDataAsMap innerA)) (listToBSMap (BI.unsafeDataAsMap innerB))) valueMap1 valueMap2
   in toBuiltinData $ filterWithBSMap ((/=) 0) combinedVal

In this case filterWithBSMap and equalsBSMap are definitely required.

- it merges the two input maps, left-biased when there are key collisions
- it uses linear time in the sum of the first and second arguments, and linear memory in the sum of the first and second arguments
5. `bsMapToList :: forall v . ByteStringMap v -> List (Pair ByteString v)`
- it transforms the map into an association list
- the resulting list should be ordered (in ascending order of the keys)
- it is linear in the size of the `ByteStringMap`
6. `listToBSMap :: forall v . List (Pair ByteString v) -> ByteStringMap v`
- it transforms an association list into a map, right-biased in the case of duplicate keys
- it has `O(n * logn)` time complexity, where `n` is the size of the list
- it has linear space complexity
- if the input list does not contain duplicate keys and is ordered (in ascending order of the keys), the space and time complexity will be linear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following should also be added:

keysBSMap ::  forall v . BuiltinByteStringMap v -> BuiltinList ByteString
valuesBSMap :: forall v. BuiltinByteStringMap v -> BuiltinList v

## Rationale: how does this CIP achieve its goals?

### Constructing `ByteStringMap`s

The first concern we have to address is the reasoning behind not introducing a builtin function for constructing empty `ByteStringMap`s. Since maps are polymorphic in the value type, such a function would require receiving a type tag to know which type of concrete empty map to construct. This provides for an awkward interface to users, and it can easily be avoided by delegating the responsibility of constructing empty map constants to the compiler.

`insertBSMap` and `deleteBSMap` remain fundamental operations which we must expose as builtin functions.

One seemingly unnecessary addition is `unionBSMap`, as it may be implemented in Plutus Core by traversing the elements of one of the maps and calling `insertBSMap`. However, as outlined by [CPS-0013][1], `ByteStringMap` should provide an optimal implementation of the Plutus Tx `Value` type, and the operations on `Value`s depend heavily on map unions. Therefore, for performance reasons, we prefer to expose this functionality as a builtin function.

The final map-constructing operation we have decided to include is `listToBSMap`. The motivation for introducing this builtin is providing quick internalisation of `Data` encoded maps. The `Data` encoding of maps is list-based, and is internalised into the builtin language by decoding the data `Map` constructor as a builtin list of pairs. For maps where the key type is `ByteString`, we now offer the optimised representation of `ByteStringMap`, and it is natural that we provide the most cost-efficient way of transforming such lists of pairs into `ByteStringMap`s.

### Querying `ByteStringMap`s

Map lookups are essential, so we must provide `lookupBSMap`, but there remains the question of how to provide a total function with this functionality.

As mentioned earlier, we have decided to base the implementation of this builtin on an upcoming feature, because it is the least invasive way of providing a total implementation and it leverages future improvements to the builtins language.

With the proposed signature of `lookupBSMap`, users will be able to provide any default value for when the lookup fails, including `error` or `False`. Therefore, `lookupBSMap` subsumes a membership check operation as well.

### Traversing `ByteStringMap`s

Unfortunately, due to the limitations presented earlier we cannot provide an effective builtin fold over maps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do it we implemented Allow for returning application trees from builtins a.k.a. folding builtins, which is a straightforward generalization of pattern matching builtins and shouldn't be too hard to get done (famous last words).


However, we can provide a transformation to builtin lists, which can be traversed using the existing list builtins. Iteration over the list representation of a map offers access to both the key and value at each step, therefore it is the most general bulk-operator we can expose.

## Path to Active

## Acceptance Criteria

- [ ] The feature is implemented according to the implementation plan and merged into
the master branch of the [plutus repository][5].
- [ ] [cardano-ledger][6] is updated to include new protocol parameters to control costing of
the new builtins.
- [ ] The feature is integrated into [cardano-node][7] and released as part of a hard fork.

### Implementation Plan

The implementation of this CIP should not proceed without an empirical assessment of the effectiveness of the new primitives, as per the following plan:

1. Implement the new primitives according to the specification.
2. Assign a preliminary cost to the new builtin functions. Consider similar operations and their current costs.
3. Create variants of the [existing benchmarks][8] and potentially add some more.
4. Check that the `ByteStringMap` variants are indeed significantly faster.

If the preliminary performance investigation was not successful, this CIP should be revised
according to the findings of the experiment. Otherwise, the implementation can proceed:

5. Determine the most appropriate costing functions for modelling the builtin's performance
and assign costs accordingly.
6. Add the new builtin type and functions to the appropriate sections in the [Plutus Core
Specification][9].
7. Formalize the new builtin type and functions in the [plutus-metatheory][10].
8. The final version of the feature is ready to be merged into [plutus][5] and accepted by
the Plutus Core team.

## Copyright
This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode).

[1]: https://cips.cardano.org/cps/CPS-0013 "CPS-0013"
[2]: https://github.com/IntersectMBO/plutus/pull/5697 "red-black-2"
[3]: https://github.com/IntersectMBO/plutus/pull/692 "red-black-1"
[4]: https://github.com/IntersectMBO/plutus/pull/5779 "specialised-bytestringmap"
[5]: https://github.com/IntersectMBO/plutus "plutus"
[6]: https://github.com/IntersectMBO/cardano-ledger "cardano-ledger"
[7]: https://github.com/IntersectMBO/cardano-node "cardano-node"
[8]: https://github.com/IntersectMBO/plutus/tree/master/plutus-benchmark/script-contexts "script-context-benchmarks"
[9]: https://plutus.cardano.intersectmbo.org/resources/plutus-core-spec.pdf "Formal Specification of the Plutus Core Language"
[10]: https://github.com/IntersectMBO/plutus/tree/master/plutus-metatheory "plutus-metatheory"