-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Implement parser for mapping key lookup (e.g. m[0][1]
)
#3943
base: wrap
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think my biggest comment is that I think we should distinguish string literals from other literals. But see below for more detailed comments.
I guess some of these comments are like "is this supposed to be Solidity or JS", and I guess the answer is Solidity, so, uh, I guess it's the Solidity branch of those conditionals that you want to take. :P
export const stringP = stringEntriesP.pipe(many(), stringify(), between('"')); | ||
} | ||
|
||
const numberP = int().pipe(or(float())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you're relying on builtins here? You may want to make sure these actually matches Solidity's numeric literal syntax, which for instance allows underscores (no more than one consecutively) between digits for grouping purposes.
Edit: Removed comment about JS numeric literals since those aren't relevant
|
||
const stringP = Strings.stringP; | ||
|
||
const literalP = numberP.pipe( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solidity has other sorts of literals that I think need to be accounted for. Specifically:
- Boolean literals (
true
andfalse
) - Hex string literals -- I assume static bytestring literals
0x1234
(and address literals) fall under numeric literals, but there's also dynamic bytestring / hex string literalshex"deadbeef"
. (Edit: This also has a single-quoted form.) - It also might be worth supporting Solidity units here, as those are effectively part of numeric literals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, one more I forgot about -- string literals are normally restricted to ASCII, but you can also do unicode string literals as unicode"..."
; better account for that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Boolean literals
-
hex"deadbeef"
literals -
unicode"..."
literals - Solidity units (
gwei
, etc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm so unicode literals are a bit annoying, it seems, because Parjs seems to just assume its input is in the BMP (See their README section Parsing Unicode).
It should be doable to restrict ""
literals to ASCII, but it might be a bit annoying to write the parser for that?
What do you think of this:
- Exclude
unicode"..."
as unsupported for now (since BMP restriction suggests that we shouldn't make the promise of full-unicode support) - Don't restrict
""
to ASCII in the meantime, to afford some way of specifying literals that include BMP characters.
Alternatively, if you want to be a stickler about ""
meaning ASCII, we could just define our own literal bmp""
to be more explicit.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to actually enforce the ASCII restriction. Especially as that's relatively recent, older Solidity doesn't make a distinction.
I think it's also OK to exclude unicode"..."
for now. Proper Unicode handling will be a pain regardless of how we do it, so maybe it's OK if we just don't support it all properly for now. (Especially the \xNN
escape codes...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Especially the \xNN escape codes...)
I don't want to know 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too bad I'm going to explain anyway! :P In Solidity, you can do "\xNN"
to insert a particular byte. Note, not a particular character -- a particular byte, regardless of whether or not this makes valid UTF-8. However, if it's invalid UTF-8, then you won't be able to use it as a string
, only as a bytes
or bytesN
(since string literals can be used as either). So, e.g., unicode"\xA1"
is not the same as unicode"\u00A1"
or unicode"¡"
, but rather is the same as hex"A1"
, and so is an error if used as a string
rather than a bytes
/bytesN
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! Does the Solidity compiler catch this up-front? Or is it a runtime problem?
Too bad I'm going to explain anyway!
Good cause I was lying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Solidity compiler detects it upfront. It is possible to put bad UTF-8 inside a string at runtime; there are no runtime checks on whether your UTF-8 is good. But attempting to do so directly from a literal (whether unicode
or hex
or whatever) will trip the compile-time check and cause a compile error. If you really want to put bad UTF-8 inside a string
, you'll need to be sneakier about it. (Putting it in a bytes
and the converting that bytes
to a string
will do the trick. :) )
540659d
to
c4d40de
Compare
- Ensure Literal interface includes `type` field as discriminator - Add some generics infrasturcture for building AST constructors - Add support for boolean literals
- Define custom type names for each form kind in the AST - Update tests to look for one-of `{ trace, value }`
Addresses #3877
Had this lying around from when I discovered parjs a few weeks ago. Decided to clean it up as a draft for review, in case it's something we can utilize in @truffle/decoder.
As it stands, this PR adds a new package @truffle/parse-mapping-lookup, whose name is terrible and needs to change. This new package exposes a function
parse()
, to accept a struct/mapping lookup expression (m[0]
,ledger.balances[0]
, etc.) and return an AST representation of that expression.This handles literal floating point numbers and strings, but it doesn't do anything special with those values. It does handle escape sequences (thanks to parjs's own example). It currently loses information, in that
m["0x"]
will parse the same asm["\"0x\""]
. @haltman-at I figure this is something that would be handled by @truffle/encoder, if we want it, so I didn't get fancy.This also squashes all possibly useful error handling; that will likely be something to improve even slightly before merging.
See test cases for an overview of supported syntax.
Possibly minor note: I borrowed the original tsconfig for this new package from @truffle/db, so it uses
src/
, notlib/
). This was because it seemed to be the fastest way to get jest working for tests, but we might want to update this to borrow from one of the codec packages instead.