Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIP-0139? | Plutus Core Builtin Type - ByteStringMap #921

Conversation

ana-pantilie
Copy link
Contributor

@ana-pantilie ana-pantilie commented Oct 3, 2024

We propose a ByteStringMap type for Plutus Core. This type will have logarithmic time lookups, which are difficult to achieve in Plutus Core programs today.

(Rendered Version)

@ana-pantilie ana-pantilie changed the title Plutus Core Builtin Type - ByteStringMap CIP-???? | Plutus Core Builtin Type - ByteStringMap Oct 3, 2024
Comment on lines 55 to 56
- We consider the costing size of a `ByteStringMap` to be equal to the size of its corresponding representation as an association list.
- The binary encoding of `ByteStringMap`s is equivalent to its encoding as an association list.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwxm - please review. This is based on the standard flat implementation for Haskell's Data.Map.

@rphair rphair added Category: Plutus Proposals belonging to the 'Plutus' category. State: Triage Applied to new PR afer editor cleanup on GitHub, pending CIP meeting introduction. labels Oct 3, 2024
3. `deleteBSMap :: forall v . ByteString -> ByteStringMap v -> ByteStringMap v`
- it returns a map without the key-value association identified by the input key
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument and logarithmic memory in the `ByteStringMap` argument
4. `unionBSMap :: forall v . ByteStringMap v -> ByteStringMap v -> ByteStringMap v`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important use-case for this builtin data-structure is to facilitate fast operations over the Value type defined as Map.Map CurrencySymbol (Map.Map TokenName Integer). These operations are by far one of the biggest performance bottlenecks for onchain code.

It is extremely expensive to apply validation logic to Values in onchain code because of the high cost of map operations. union is by far the most expensive value operation, but it also used nearly universally by dApps and is a huge performance bottleneck. This builtin data-structure needs to have a semigroup collision handling for Values either unionBSMap with some bool flag indicating a value or just a separate unionBSMapValue builtin for use with Value types.

The common pattern shared across most protocols is as follows:

  1. Given an input, construct the value expected to be possessed by the output which seeks to satisfy the spending conditions of said input.
  2. Check that the expected output value matches the actual output value

The construction of the expected output value is in the majority of cases a union of the input value and a value delta (the quantities of
assets to be added to / removed from the input, ie. the difference in value between the input UTxO and the output UTxO).

processRequest :: ProtocolStateDatum -> TxInInfo -> TxOut -> Bool 
processRequest stateDatum txIn txOut = 
  let resolvedInput = (txInInfoResolved txIn)
      inputValue = txOutValue resolvedInput
      outputValue = txOutValue txOut
      -- Given a tx input and the relevant state datum compute the value delta between the input value 
      -- and the expected value delta between the input value and the expected. (ie the quantities of 
      -- assets to be added to / removed from the input.  
      inputValueDelta = computeValueDelta txIn stateDatum 
      expectedOutputValue = inputValue <> inputValueDelta
   in expectedOutputValue == outputValue

For examples in production dApps see:
https://github.com/SundaeSwap-finance/sundae-contracts/blob/edc118880d3baffcb7d5bd277faec2e7dc54c59b/lib/calculation/swap.ak#L120C1-L124C64
https://github.com/minswap/minswap-dex-v2/blob/c3293ad435792ba502f5a2b1691c6736e5f167e6/lib/amm_dex_v2/order_validation.ak#L530
https://github.com/minswap/minswap-dex-v2/blob/c3293ad435792ba502f5a2b1691c6736e5f167e6/lib/amm_dex_v2/order_validation.ak#L136
https://github.com/lenfiLabs/lenfi-smart-contracts/blob/eb7dc685f3888c940d0da027c46b5f1c3d9578f1/validators/order_contract.ak#L164
https://github.com/minswap/minswap-lbe-v2/blob/6fea91d881d6037e9c944ba39f6068a1d6df4e42/lib/lb_v2/validation.ak#L429
https://github.com/minswap/minswap-lbe-v2/blob/6fea91d881d6037e9c944ba39f6068a1d6df4e42/lib/lb_v2/order_validation.ak#L237
https://github.com/minswap/minswap-lbe-v2/blob/6fea91d881d6037e9c944ba39f6068a1d6df4e42/lib/lb_v2/order_validation.ak#L82
https://github.com/Danogo2023/bond-dex/blob/f29effc2ca1f354fd8a9c3695ec58926f940de84/lib/bond/limit_bid/withdraw.ak#L392-L397

For explanation on why many of these are using value.add (insert with semigroup collision handling instead of left/right-based collision)
instead of merge see https://github.com/txpipe-shop/sundae-swap/blob/d8adea189f16b3a585ea311d6467f5fe2fda78d4/src/audit.typ#L1099-L1119
probably due to the overhead introduced by recursion in onchain code, merge is more expensive than add and thus should be preferred when you have a fixed number of assets to update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colll78 Thanks for all the pointers! Sounds like we should have insertWithBSMap as a builtin. With higher order function support coming soon, this makes total sense.

I'm unsure about the following:

  • Should we keep insertBSMap
  • Do we need a unionWithBSMap builtin, or does insertWithBSMap suffice
  • What can we do to speed up equality checks on Values? It's not simply map equality due to potential zero-value entries. Perhaps filterBSMap + equalsBSMap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map equality is cheap because we currently take advantage of equalsData.

equalsData (mergeValue ownInputValue tokensToAdd) ownOutputValue

To handle zero-value entries mergeValue simply deletes them, I would expect that the union function would similarly be capable of deleting zero value entries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need unionWithBSMap and I think filterBSMap would be extremely helpful as-well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have insertWithBSMap or unionWithBSMap. The upcoming higher-order builtins allow one to return an application from a builtin call, not to execute one within the builtin (we used to have that functionality but it was removed a long time ago) and we need the latter to be able to implement insertWithBSMap or unionWithBSMap as a builtin. We can try to design such functionality, but it's probably not gonna be pretty. I've spent some time several years ago thinking about roughly this problem and had an idea back then, so if you want me to try it out, I can do that (it'll be a few days of work before I can tell whether it's promising or not).

Copy link
Contributor

@colll78 colll78 Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then just a default mergeValue and insertValue (that you don't provide a function to, and only works on canonical ledger constructed Value as BuiltinData) would be just as impactful. Dealing with Values is the main use-case and one of the main efficiency bottlenecks in most protocols, it makes sense to have specialized builtins for interacting with Values given they are one of the two most foundational components of smart contracts (the purpose of smart contracts is to validate the transfer of and change in Value and Data)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, perhaps one of the most universally beloved smart contract platforms in the blockchain space defines a programming paradigm "Asset oriented programming", the entire language has been built around the notion that it natively understands assets and provides native first-class features for assets. The introduction of such builtins would also vastly enhance the security of the ecosystem because the implementation of such functions in onchain code are extremely error prone, especially when you look at all the black magic optimization that libraries are using (which wouldn't be necessary with such builtins).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colll78 sounds interesting. Wanna open an issue in the Plutus repo about that?

However I personally prefer to be as general as possible until we're sure that that's not enough. So I'd rather investigate unionWithBSMap, maybe we can do that and then we can keep domain-specific logic in the user space. If that's too hard or isn't going to fly, then I agree with you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is reasonable.

mergeValue :: BuiltinData -> BuiltinData -> BuiltinData 
mergeValue v1 v2 = 
  let valueMap1 = listToBSMap (BI.unsafeDataAsMap v1) 
       valueMap2 = listToBSMap (BI.unsafeDataAsMap v2) 
       combinedVal = unionWithBSMap (\innerA innerB -> unionWithBsMap BI.addInteger (listToBSMap (BI.unsafeDataAsMap innerA)) (listToBSMap (BI.unsafeDataAsMap innerB))) valueMap1 valueMap2
   in toBuiltinData $ filterWithBSMap ((/=) 0) combinedVal

In this case filterWithBSMap and equalsBSMap are definitely required.

- it has `O(n * logn)` time complexity, where `n` is the size of the list
- it has linear space complexity
- if the input list does not contain duplicate keys and is ordered (in ascending order of the keys), the space and time complexity will be linear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following should also be added:

keysBSMap ::  forall v . BuiltinByteStringMap v -> BuiltinList ByteString
valuesBSMap :: forall v. BuiltinByteStringMap v -> BuiltinList v

Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well-written, thank you.

Sorry for being a party pooper on insertWithBSMap/unionWithBSMap

CIP-bytestring-map-builtin/README.md Outdated Show resolved Hide resolved

Why are higher-order builtin functions necessary? Map keys must be processed (or simply, just compared) in order to be able to build and operate over well-defined maps. For arbitrary key types, that would only be possible by passing the required processing or comparison functions as arguments to the builtin functions.

The only way to implement a builtin map with a (builtin) polymorphic key type would be to modify the builtins language itself. A compiler would have to only allow the construction of maps with key types which are known a priori to posess some property (such as a total ordering). The Plutus Core evaluator would have to check for this as well. The implementation itself would not be too complicated, but we must first consider how this would change the formal semantics of the builtins language.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed that and we don't really need to modify anything much? We can just fail at runtime for built-in types that cannot implement Ord (or PlutusOrd or whatever). And then we'll expose a safe interface in Plinth forbidding one from using such built-in types as keys in maps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment still stands, I don't believe you'd need to pass a comparison function around, keys can only be of built-in types and we either know how to compare those (in most cases) or don't (like those pesky crypto types) and in the latter case we can just throw. UPLC is a low-level language, it's fine for it to throw like that, and high-level languages will be able to provide a type-safe interface. No meta-theoretical properties are hurt. General maps are gonna be a bit less efficient though, but if done right, probably insignificantly so.

We propose the addition of the following builtin type: `ByteStringMap` of kind `Type -> Type`.
- It represents a map associating keys of type `ByteString` with values of any builtin type.
- The new builtin type should be implemented using a tree data structure allowing for logarithmic-time map operations.
- We consider the costing size of a `ByteStringMap` to be equal to the size of its corresponding representation as an association list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that, we'll have to carry some additional info around for balancing purposes, so it sounds like costing should reflect that. And an element in an association list is a pointer to a pair with two pointers to actual content, in our case it'll probably be just the latter two pointers. So yeah, I don't think we should assume that costing is going to be exactly the same, although I suppose that could be a good enough approximation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwxm what should we say here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could simply say "roughly equal" instead of "equal" and that'll be fine, this is all to be figured out during the costing phase anyway, it's fine not to have the exact answer now.

Copy link
Contributor

@kwxm kwxm Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwxm what should we say here?

@ana-pantilie: sorry, I missed this. We actually have to caculate the size of the map object before we carry out any operations on it, and that might be expensive if we have to traverse a tree to findi ts size. One way to alleviate this might be to have the maps carry their sizes about with them and make sure that the update operations update the size as well (and during deserialisation we'd have to check that maps embedded in scripts don't lie about their size). We might even need two measures of size: one telling you the number of elements of the map (for converting to a list, for example), and one telling you the depth of the map (for insertion etc.). We could do this by using newtype wrappers to choose which size measure should be applied. Maybe we could just store the size of the object and take the log instead of measuring the depth, since these should be proportional if the tree's always kept balanced. This is all probably too detailed for a CIP though, so saying that the size should be something like the number of entries should be fine: we can work out the details later.

- We consider the costing size of a `ByteStringMap` to be equal to the size of its corresponding representation as an association list.
- The binary encoding of `ByteStringMap`s is equivalent to its encoding as an association list.

**Note**: Here `Type` is the universe of all builtin types, since we do not consider types formed out of applying builtin types to arbitrary types to be inhabited.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a type variable or a type application reducing to a built-in type etc, so it's not just the built-in types.

3. `deleteBSMap :: forall v . ByteString -> ByteStringMap v -> ByteStringMap v`
- it returns a map without the key-value association identified by the input key
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument and logarithmic memory in the `ByteStringMap` argument
4. `unionBSMap :: forall v . ByteStringMap v -> ByteStringMap v -> ByteStringMap v`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have insertWithBSMap or unionWithBSMap. The upcoming higher-order builtins allow one to return an application from a builtin call, not to execute one within the builtin (we used to have that functionality but it was removed a long time ago) and we need the latter to be able to implement insertWithBSMap or unionWithBSMap as a builtin. We can try to design such functionality, but it's probably not gonna be pretty. I've spent some time several years ago thinking about roughly this problem and had an idea back then, so if you want me to try it out, I can do that (it'll be a few days of work before I can tell whether it's promising or not).


### Traversing `ByteStringMap`s

Unfortunately, due to the limitations presented earlier we cannot provide an effective builtin fold over maps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do it we implemented Allow for returning application trees from builtins a.k.a. folding builtins, which is a straightforward generalization of pattern matching builtins and shouldn't be too hard to get done (famous last words).

@rphair rphair changed the title CIP-???? | Plutus Core Builtin Type - ByteStringMap CIP-0138? | Plutus Core Builtin Type - ByteStringMap Oct 15, 2024
@rphair rphair changed the title CIP-0138? | Plutus Core Builtin Type - ByteStringMap CIP-0139? | Plutus Core Builtin Type - ByteStringMap Oct 15, 2024
Copy link
Collaborator

@rphair rphair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ana-pantilie - a well received pair of proposals in general (with #915) as confirmed at today's CIP meeting; please rename the containing directory accordingly & update the proposal link in the initial posting 🎉

CIP-bytestring-map-builtin/README.md Outdated Show resolved Hide resolved
@rphair rphair added State: Confirmed Candiate with CIP number (new PR) or update under review. and removed State: Triage Applied to new PR afer editor cleanup on GitHub, pending CIP meeting introduction. labels Oct 15, 2024
@ana-pantilie
Copy link
Contributor Author

@rphair Great news, thank you!

Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments. Unfortunately, there's something that I didn't realize the first time and it's that listToBSMap is unusable, because at the moment it's not possible to construct arbitrary lists of pairs, only lists of pairs of Data. Neither the specification nor the formalization have this issue, but there's been zero work done to address it on the implementation side. And it is very non-trivial if we want to do it properly.


The only remaining solution is to consider introducing a builtin map type.

Unfortunately, the builtin language has a significant limitation: it does not allow for higher-order builtin functions. As of the time of writing, builtin functions may not receive other functions as arguments, but there is ongoing work to lift this restriction. However, due to complex design considerations, repeated applications of function arguments will incur high performance costs. Since maps are recursive data structures, these high performance costs will be accumulated and such an approach will still not be feasible.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of the time of writing, builtin functions may not receive other functions as arguments, but there is ongoing work to lift this restriction.

The restriction has been lifted, so this builtin:

lookupBSMap :: forall v . ByteString -> (() -> v) -> ByteStringMap v -> v

is now expressible. BTW, the same logic as in the array CIP applies, so it should probably be

lookupBSMap :: forall v . (() -> v) -> ByteStringMap v -> ByteString -> v

We propose the following set of builtin functions to accompany the new builtin type:
1. `insertBSMap :: forall v . ByteString -> v -> ByteStringMap v -> ByteStringMap v`
- it returns a map with the input key updated to the new value, silently discarding any previous value
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument, and logarithmic memory in the `ByteStringMap` argument
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please talk about insertWith/unionWith a bit? Specifically, that we want to have them, but cannot unless one of

is implemented.

3. `deleteBSMap :: forall v . ByteString -> ByteStringMap v -> ByteStringMap v`
- it returns a map without the key-value association identified by the input key
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument and logarithmic memory in the `ByteStringMap` argument
4. `unionBSMap :: forall v . ByteStringMap v -> ByteStringMap v -> ByteStringMap v`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we want intersectionBSMap and differenceBSMap then.

A horrible nit btw: I'd personally prefer a different capitalization: unionBsMap. But feel free to ignore.

- it transforms the map into an association list
- the resulting list should be ordered (in ascending order of the keys)
- it is linear in the size of the `ByteStringMap`
6. `listToBSMap :: forall v . List (Pair ByteString v) -> ByteStringMap v`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this has a big problem. I.e. at the moment this builtin is definable but basically unusable. Please reference that issue here.


`insertBSMap` and `deleteBSMap` remain fundamental operations which we must expose as builtin functions.

One seemingly unnecessary addition is `unionBSMap`, as it may be implemented in Plutus Core by traversing the elements of one of the maps and calling `insertBSMap`. However, as outlined by [CPS-0013][1], `ByteStringMap` should provide an optimal implementation of the Plutus Tx `Value` type, and the operations on `Value`s depend heavily on map unions. Therefore, for performance reasons, we prefer to expose this functionality as a builtin function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt that Map-backed Value is going to happen, it's way too much complexity just to get a slow version of a hypothetical built-in Value.

We propose the following set of builtin functions to accompany the new builtin type:
1. `insertBSMap :: forall v . ByteString -> v -> ByteStringMap v -> ByteStringMap v`
- it returns a map with the input key updated to the new value, silently discarding any previous value
- it uses logarithmic time in the `ByteStringMap` argument, linear time in the `ByteString` argument, and logarithmic memory in the `ByteStringMap` argument
Copy link
Contributor

@kwxm kwxm Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the memory usage relly be logarithmic in the size of the map? In most functional data structures adding a new entry just requires a constant amount of space. Maybe it's not that simple if you need to keep your structure balanced though: I should go and look up how you actually implement these things.


We propose the addition of the following builtin type: `ByteStringMap` of kind `Type -> Type`.
- It represents a map associating keys of type `ByteString` with values of any builtin type.
- The new builtin type should be implemented using a tree data structure allowing for logarithmic-time map operations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we have to implement all of this ourselves, or is there some existing library that we can use?

@ana-pantilie
Copy link
Contributor Author

After discussing internally, I've unfortunately decided to close this CIP.

The reason behind this decision is that we have discovered that it will not actually solve the original problem: introducing a builtin type for representing Value optimally. Together with the limitation pointed out by @effectfully in regards to listToBSMap, this makes a builtin ByteStringMap into an oddly specific and inflexible builtin with an awkward API.

@colll78 and @effectfully believe that all the overhead required to implement the Value logic on top of ByteStringMap will result in poor performance, so in this sense ByteStringMap is too general for this use-case.

To make matters worse, the limitation regarding listToBSMap further restricts its usage. Not having a dedicated Pair constructor in Data breaks a lot of type transformations between ByteStringMap and other types, resulting in a misleading and incomplete implementation.

Whether a builtin map could ever be useful, @colll78 believes a more general builtin Map, polymorphic in both input types does have some use cases, however a current implementation would suffer from even more limitations which we need to address first.

To conclude, the language isn't ready yet for a builtin Map type.

I'm glad that at least this CIP has generated discussions which have made that clear and we have a better idea of what to try next. @rphair thank you for your time considering and reviewing this CIP.

@rphair
Copy link
Collaborator

rphair commented Nov 12, 2024

Thank you as well @ana-pantilie for your honesty & comprehensive documentation, along with your reviewers, of this problem space. ⭐

@rphair rphair removed the State: Confirmed Candiate with CIP number (new PR) or update under review. label Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category: Plutus Proposals belonging to the 'Plutus' category.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants