Add support for JSON5 #211

Open: anjohnson wants to merge 18 commits into master

anjohnson commented Sep 1, 2018

The changes in this PR add support to yajl-2.1 for parsing and generating JSON5.
Included are the code and tests for:

  • Object keys may be an unquoted identifier name if they match the regex [$_A-Za-z][$_A-Za-z0-9]*.
  • Objects may have a trailing comma.
  • Arrays may have a trailing comma.
  • Strings may be single-quoted.
  • Strings may span multiple lines by escaping the newline character(s), which are then excluded from the string.
  • Strings may include hex character escapes of the form \xXX, where XX are 2 hex digits giving a character number in the Basic Latin or Latin-1 Supplement Unicode character ranges (U+0000 through U+00FF).
  • Numbers may be hex integers.
  • Numbers may have a leading or trailing decimal point.
  • Numbers may be ±Infinity or NaN.
  • Numbers may begin with a + sign.
  • Turning on JSON5 for the parser also allows comments, which are part of the JSON5 spec.
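A `\xXX` escape always names a code point between U+0000 and U+00FF, so if the decoded string is stored as UTF-8 (an assumption for this sketch), values below 0x80 take one byte and the rest take two. The following is a minimal illustration, not the code in this PR; `hex_value` and `decode_hex_escape` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical decoder for the two digits that follow "\x".
 * Writes the UTF-8 encoding of U+0000..U+00FF into out (room for 2 bytes)
 * and returns the number of bytes written, or 0 if a digit is invalid. */
static size_t decode_hex_escape(const char digits[2], unsigned char out[2])
{
    int hi = hex_value(digits[0]);
    int lo = hex_value(digits[1]);
    unsigned int cp;

    if (hi < 0 || lo < 0)
        return 0;                                  /* bad hex digit */
    cp = (unsigned int)(hi * 16 + lo);

    if (cp < 0x80) {                               /* Basic Latin: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));    /* Latin-1 Supplement: two bytes */
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}
```

For example, `\xE9` (é) decodes to the two bytes 0xC3 0xA9, and a return value of 0 marks the point where a parser would report a bad hex digit.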

The yajl generator also supports JSON5 through a configuration flag. When the flag is set, the generator can output ±Infinity and NaN values, and it omits the quotes from object keys that follow the identifier rule above.

The json_reformat and json_verify (and yajl_test) programs take a -5 option to allow JSON5 input, and json_reformat has a separate -g flag to generate JSON5 output. The yajl_tree code enables JSON5 support unilaterally, as there was no existing configuration API for it.

In addition to documenting the JSON5 additions, I also worked on the Doxygen markup. Some of those changes are probably more extensive than they needed to be, so they are the last commits in the series.

And finally, one other small but important change to know about: the characters [ and ] are brackets, while { and } are braces. I swapped the token names that the lexer returns for them and adjusted the parser to match (my brain was getting too confused, or maybe it was just OCD, but I had to fix these).

This branch and PR were modified in August 2020 to correct some misunderstandings of the JSON5 spec on my part: I added the \x hex character escapes and removed the checks for JavaScript keywords.

Added yajl_allow_json5 config flag, pass it around.

Added -5 option to yajl_test, json_reformat and json_verify.

If configured for JSON5, the lexer now allows a leading or trailing
decimal point on doubles, and an explicit leading + sign on integers
or double numbers.
Added tests to check these.
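The decimal-number rules from this commit (optional leading sign, leading or trailing decimal point) can be illustrated with a small prefix scanner. This is a sketch of the accepted syntax only, not the PR's lexer; `json5_decimal_span` is a hypothetical name. Following the JSON5 grammar, it rejects a bare `.` and treats a leading `0` as a complete integer part.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner: returns the length of the JSON5 decimal number at
 * the start of s, or 0 if s does not start with one. Hex integers,
 * Infinity and NaN are deliberately not handled here. */
static size_t json5_decimal_span(const char *s)
{
    const char *p = s;
    int int_digits = 0, frac_digits = 0;

    if (*p == '+' || *p == '-')          /* JSON5 allows an explicit '+' */
        p++;

    /* Integer part: "0" on its own, or a non-zero digit followed by digits. */
    if (*p == '0') {
        p++;
        int_digits = 1;
    } else {
        while (isdigit((unsigned char)*p)) {
            p++;
            int_digits++;
        }
    }

    /* Fraction: the point may lead (".5") or trail ("5."), but not stand alone. */
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p)) {
            p++;
            frac_digits++;
        }
    }
    if (int_digits == 0 && frac_digits == 0)
        return 0;

    /* Optional exponent, which must contain at least one digit. */
    if (*p == 'e' || *p == 'E') {
        const char *q = p + 1;
        if (*q == '+' || *q == '-')
            q++;
        if (!isdigit((unsigned char)*q))
            return 0;
        while (isdigit((unsigned char)*q))
            q++;
        p = q;
    }
    return (size_t)(p - s);
}
```

A lexer built this way would still need to check that the character after the span is a valid delimiter, so that input such as `01` is rejected rather than read as `0` followed by `1`.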

Includes the simple test case.
yajl_parse_integer still doesn't handle LLONG_MIN in base 10 or 16.
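On two's-complement platforms the magnitude of LLONG_MIN is one larger than LLONG_MAX, so a parser that builds a positive `long long` and negates it at the end cannot represent LLONG_MIN. One common workaround is to accumulate the magnitude in an `unsigned long long` and check it against a sign-dependent limit. The sketch below shows that technique; it is not a patch to `yajl_parse_integer`, and `parse_ll` and its errno-style return codes are hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical parser: reads an optionally signed decimal or 0x-prefixed hex
 * integer of length len into *out. Assumes two's complement, i.e. that
 * |LLONG_MIN| == LLONG_MAX + 1. Returns 0, ERANGE on overflow, or EINVAL. */
static int parse_ll(const char *s, size_t len, long long *out)
{
    const char *p = s, *end = s + len;
    int negative = 0;
    unsigned int base = 10;
    unsigned long long limit, mag = 0;

    if (p < end && (*p == '+' || *p == '-'))
        negative = (*p++ == '-');
    if (end - p > 2 && p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;
        p += 2;
    }
    if (p == end)
        return EINVAL;

    /* Largest magnitude allowed for this sign. */
    limit = negative ? (unsigned long long)LLONG_MAX + 1ULL
                     : (unsigned long long)LLONG_MAX;

    for (; p < end; p++) {
        unsigned int d;
        if (*p >= '0' && *p <= '9')
            d = (unsigned int)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            d = (unsigned int)(*p - 'a' + 10);
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            d = (unsigned int)(*p - 'A' + 10);
        else
            return EINVAL;

        if (mag > (limit - d) / base)    /* mag * base + d would exceed limit */
            return ERANGE;
        mag = mag * base + d;
    }

    if (!negative)
        *out = (long long)mag;
    else if (mag == (unsigned long long)LLONG_MAX + 1ULL)
        *out = LLONG_MIN;                /* cannot be produced by negating */
    else
        *out = -(long long)mag;
    return 0;
}
```

The overflow check is done before the multiply, so the unsigned accumulator never wraps.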

NaN and both Infinities.
Special handling was added to yajl_test since different OSs don't
always generate the same output for special numbers (nan/NaN/...).

When this flag is set, the yajl_gen_double() routine can output
the values NaN, -Infinity and +Infinity.
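Because C libraries disagree on how `printf` spells non-finite values (`nan`, `-nan`, `inf`, and so on), a portable JSON5 writer can test for them explicitly instead of relying on `%g`. This is a minimal sketch, not yajl's `yajl_gen_double()`. `format_json5_double` is a hypothetical name, and the choice to write `Infinity` rather than `+Infinity` for positive infinity is an assumption; JSON5 accepts both.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical formatter: writes d into buf as JSON5 number text and returns
 * what snprintf returns (the length of the full output). Non-finite values
 * are spelled out explicitly so the result does not depend on the C library. */
static int format_json5_double(double d, char *buf, size_t size)
{
    if (isnan(d))
        return snprintf(buf, size, "NaN");
    if (isinf(d))
        return snprintf(buf, size, "%s", d < 0 ? "-Infinity" : "Infinity");
    /* 17 significant digits are enough to round-trip an IEEE 754 double. */
    return snprintf(buf, size, "%.17g", d);
}
```

The same explicit checks are one way to make test output comparable across operating systems, which is the problem the yajl_test special handling addresses.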

Set yajl_gen_json5.

Replace reformat_number with reformat_integer and reformat_double.

Added a new routine to yajl_encode.c that validates bare identifiers.
Use this in yajl_gen_string() to avoid quoting keys that don't need it.
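A check like the one described here only needs to match the regex `[$_A-Za-z][$_A-Za-z0-9]*` from the feature list. The sketch below is not the routine added to yajl_encode.c; `is_bare_identifier` is a hypothetical name. It conservatively rejects the non-ASCII identifier characters that JSON5 also permits, which is always safe because a quoted key is valid too.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if the len bytes at key match
 * [$_A-Za-z][$_A-Za-z0-9]* and can be written as an unquoted JSON5
 * object key, 0 if the key must be quoted. */
static int is_bare_identifier(const char *key, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;                        /* the empty key must be quoted */
    for (i = 0; i < len; i++) {
        char c = key[i];
        int ok = c == '$' || c == '_'
              || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
              || (i > 0 && c >= '0' && c <= '9');   /* no leading digit */
        if (!ok)
            return 0;
    }
    return 1;
}
```

A generator would call such a check before emitting a map key and fall back to the usual quoted, escaped form when it returns 0. Since JSON5 allows reserved words as keys, no keyword list is consulted.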

Added a separate -g option to json_reformat to distinguish JSON5 output
from the -5 option that flags input as being JSON5.

Adds another lexer entry point for lexing map keys only, and adjusts
the parser to use it instead of the general lexer.
Also defines another lexer token for internal use only.

Any character other than the digits 1-9 may be preceded by a
reverse solidus '\', and unless the combination has an explicitly
defined expansion the character is included without the solidus.
JSON5 adds \', \0 and \v to the set of defined escapes, and an
escaped newline is omitted from a string.
Teach the lexer/parser to recognize and decode them in JSON5 mode.
Teach the encoder to use them in JSON5 mode.

Add another error message for bad hex digits.
Test cases to show they work, and that the bad-digit check fires.
Also adds missing character flag VIC for 'r'.
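The JSON5 escape rules in the reverse-solidus commit message reduce to a lookup on the character that follows the `\`. The sketch below is a hypothetical illustration, not yajl's lexer: `json5_simple_escape` and its return codes are invented names, the multi-character `\xXX` and `\uXXXX` forms are only flagged for the caller, and the Unicode line terminators U+2028 and U+2029 (which need UTF-8 lookahead) are not handled.

```c
/* Hypothetical return codes for the sketch below (not yajl tokens). */
enum {
    JSON5_ESC_SKIP  = -1,   /* escaped line terminator: emit nothing */
    JSON5_ESC_ERROR = -2,   /* \1 .. \9 are not allowed */
    JSON5_ESC_MULTI = -3    /* \x or \u: caller must decode the hex digits */
};

/* Decodes the single character c that follows a '\' in a JSON5 string.
 * Returns the byte to emit, or one of the codes above. For "\r\n" the caller
 * must also consume the '\n' after this returns JSON5_ESC_SKIP for '\r'. */
static int json5_simple_escape(char c)
{
    switch (c) {
    case 'b':  return '\b';
    case 'f':  return '\f';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case 't':  return '\t';
    case 'v':  return '\v';            /* added by JSON5 */
    case '0':  return '\0';            /* added by JSON5; a following digit is invalid */
    case 'x':
    case 'u':  return JSON5_ESC_MULTI;
    case '\n':
    case '\r': return JSON5_ESC_SKIP;  /* escaped newline is dropped */
    default:
        if (c >= '1' && c <= '9')
            return JSON5_ESC_ERROR;
        /* \' \" \\ \/ and any other character: keep it, drop the '\' */
        return (unsigned char)c;
    }
}
```

Mapping unknown escapes to the character itself is what lets `\'` and `\q` both decode without a dedicated case.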

The a5_spec_example test was copied from the JSON5 spec.

Includes some major text reformatting, so this could trigger
merge conflicts against other pull requests.