From 3236a7edd02c43840ea404dfc0a6986ac9369682 Mon Sep 17 00:00:00 2001
From: Jason Evans
Date: Fri, 25 Mar 2022 13:05:34 -0700
Subject: [PATCH] Begin implementing `hocc`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add hocc syntax highlighting for kakoune. There's probably a way to avoid the
  near-complete duplication with the Hemlock syntax highlighting configuration,
  but it may involve shell evaluation of all highlighter additions.
- Add `Os.argv`. Command line parameters may not be UTF-8-encoded, so it's not
  safe to use OCaml's `Sys.argv` (see the sketch below).

Finish implementing command line parameter validation

Begin implementing `hocc` specification data structures

Implement `hocc` scanner

Revise `hocc` grammar, define corresponding syntax tree types

Enable `hocc` syntax highlighting for the `.hmhi` extension

Misc

- Revise `hocc` grammar to use ε productions where appropriate.
- Implement hmhi parsing.
- Add `Hmc.Scan.cursor` as a convenience function.
- Revise `hocc` grammar productions to use right recursion, and to support
  Hemlock code sequences well enough to pass muster for `hocc`'s purposes.

Add `Parse.Error`, improve error reporting

Add ctx type for parser functions

Incorporate token malformations into parse error reporting

Add `Hmc.Scan.AbstractToken.malformations`

This convenience function extracts malformations, which centralizes the verbose
exhaustive pattern matching necessary to do this reliably.

Revise `hocc` command line usage to make parser generation optional

Start implementing `.hmh` parsing

Finish initial implementation of `hocc` parser (doesn't work correctly)

Implement parse tree pretty printing, start implementing tracing

Fix `hocc` parsing bugs (successfully parses `Example1.hmh[i]`)

Add `hocc` parsing tests, fix grammar/implementation bugs

Make production parameter lexical bindings optional

Add more `hocc` parsing tests

Add more `hocc` parser tests

Add `Hocc` grammar parsing test

Enhance `hocc` grammar to omit `-> ()` for unit-type non-terminals

Add token alias syntax

`hocc` documentation edits

Expand `hocc` documentation

Finish initial draft of `hocc.md`

Add `hocc` parse error tests

Add remaining `hocc` parser tests

Add Lyken grammar spec (two forms, one of which uses DAG prec by construction)

Remove `UIDENT` from `ProdParamType`

Restrict precedence relationships to `<` with previously defined precedences

Remove the `-dot`/`-graphviz` command line options, since cycles are no longer
possible and the textual precedence relationships closely correspond to the
graph.

Add `Io` module

Implement precedence validation/closure

Incorporate `stderr` into `Io.t` and add `Io.fatal`

Implement html precedence reporting

Implement hocc report generation

Make `hocc` scanner lazy

Create the `hocc` report directory if needed

Expose `Hmc.Scan.ConcreteToken.t` definition to enable pattern matching

Implement various modules on which `Spec` depends

Implement more of `Spec.t` initialization

Reduction initialization still isn't implemented, nor any reporting besides
`prec`.
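The `Os.argv` note above deserves a concrete illustration: process arguments
arrive as raw bytes and need not be valid UTF-8, which is why the patch exposes
them as a `Bytes.t array` and converts explicitly (`Bytes.to_string_replace`,
`Path.of_bytes`). The following minimal sketch uses only the plain OCaml
standard library (4.14+), not the Basis API added by this patch, to check each
`Sys.argv` entry for UTF-8 validity before treating it as text:

(* Sketch only: demonstrates that argv entries are arbitrary bytes. The patch
 * itself routes arguments through Basis (`Os.argv`, `Bytes.Slice`); this
 * standalone version sticks to the stdlib. *)
let is_valid_utf8 (s : string) : bool =
  let n = String.length s in
  let rec check i =
    if i >= n then true
    else
      let d = String.get_utf_8_uchar s i in
      Uchar.utf_decode_is_valid d && check (i + Uchar.utf_decode_length d)
  in
  check 0

let () =
  Array.iteri
    (fun i arg ->
      Printf.printf "argv.(%d) = %S (%s)\n" i arg
        (if is_valid_utf8 arg then "valid UTF-8" else "not valid UTF-8"))
    Sys.argv

Invoking this with, say, a Latin-1-encoded filename as an argument reports that
entry as not valid UTF-8, which is the case the `Os.argv`-based code paths are
meant to handle.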
Implement token/nonterm report generation

Misc refactoring, start of reduction processing

Combine duplicate code into `QualifiedType`

Encapsulate `Reduction.param` as `Reduction.Param.t`

Enhance first pass to gather qualified type in addition to index

Implement `Reduction.Params`

Refactor production parameter processing to be universally usable

Implement reduction processing

Synthesize `PSEUDO_END`, prep for start symbol wrapper synthesis

Synthesize wrappers for start symbols

Implement symbol first/follow set closure

Implement `Lr[01]Item{,set}`

Start implementing LR(1) item set closure

Implement `Lr1ItemsetClosure.compat_ident`

Implement `Lr1ItemsetClosure.compat_weak`

Finish implementing `Lr1ItemsetClosure`

Implement LR(1) item set closure generation
---
 .editorconfig | 2 +-
 bootstrap/bin/{ => hmc}/dune | 0
 bootstrap/bin/{ => hmc}/hmc.ml | 5 +-
 bootstrap/bin/hocc/assoc.ml | 11 +
 bootstrap/bin/hocc/assoc.mli | 7 +
 bootstrap/bin/hocc/conf.ml | 236 ++
 bootstrap/bin/hocc/conf.mli | 24 +
 bootstrap/bin/hocc/dune | 7 +
 bootstrap/bin/hocc/hocc.ml | 59 +
 bootstrap/bin/hocc/io.ml | 179 ++
 bootstrap/bin/hocc/io.mli | 29 +
 bootstrap/bin/hocc/lr0Item.ml | 32 +
 bootstrap/bin/hocc/lr0Item.mli | 11 +
 bootstrap/bin/hocc/lr0Itemset.ml | 26 +
 bootstrap/bin/hocc/lr0Itemset.mli | 12 +
 bootstrap/bin/hocc/lr1Item.ml | 76 +
 bootstrap/bin/hocc/lr1Item.mli | 23 +
 bootstrap/bin/hocc/lr1Itemset.ml | 97 +
 bootstrap/bin/hocc/lr1Itemset.mli | 38 +
 bootstrap/bin/hocc/lr1ItemsetClosure.ml | 167 ++
 bootstrap/bin/hocc/lr1ItemsetClosure.mli | 35 +
 bootstrap/bin/hocc/parse.ml | 1918 +++++++++++++++++
 bootstrap/bin/hocc/prec.ml | 22 +
 bootstrap/bin/hocc/prec.mli | 15 +
 bootstrap/bin/hocc/prod.ml | 40 +
 bootstrap/bin/hocc/prod.mli | 19 +
 bootstrap/bin/hocc/qualifiedType.ml | 58 +
 bootstrap/bin/hocc/qualifiedType.mli | 16 +
 bootstrap/bin/hocc/reduction.ml | 140 ++
 bootstrap/bin/hocc/reduction.mli | 42 +
 bootstrap/bin/hocc/scan.ml | 113 +
 bootstrap/bin/hocc/scan.mli | 69 +
 bootstrap/bin/hocc/spec.ml | 1473 +++++++++++++
 bootstrap/bin/hocc/spec.mli | 16 +
 bootstrap/bin/hocc/symbol.ml | 120 ++
 bootstrap/bin/hocc/symbol.mli | 43 +
 bootstrap/bin/hocc/workq.ml | 40 +
 bootstrap/bin/hocc/workq.mli | 18 +
 bootstrap/src/hmc/scan.ml | 101 +
 bootstrap/src/hmc/scan.mli | 17 +-
 bootstrap/test/basis/seed/test_seed0.ml | 2 +-
 bootstrap/test/basis/seed/test_seed42.ml | 2 +-
 bootstrap/test/hocc/A.expected | 4 +
 bootstrap/test/hocc/A.hmh | 22 +
 bootstrap/test/hocc/B.expected | 4 +
 bootstrap/test/hocc/B.hmh | 19 +
 bootstrap/test/hocc/C.expected | 4 +
 bootstrap/test/hocc/C.hmh | 15 +
 bootstrap/test/hocc/D.expected | 4 +
 bootstrap/test/hocc/D.hmh | 15 +
 bootstrap/test/hocc/E.expected | 4 +
 bootstrap/test/hocc/E.hmh | 10 +
 bootstrap/test/hocc/Example1.expected | 5 +
 bootstrap/test/hocc/Example1.hmh | 63 +
 bootstrap/test/hocc/Example1.hmhi | 9 +
 bootstrap/test/hocc/F.expected | 4 +
 bootstrap/test/hocc/F.hmh | 16 +
 bootstrap/test/hocc/Hocc.expected | 4 +
 bootstrap/test/hocc/Hocc.hmh | 179 ++
 bootstrap/test/hocc/Lyken.expected | 4 +
 bootstrap/test/hocc/Lyken.hmh | 1254 +++++++++++
 bootstrap/test/hocc/M.expected | 4 +
 bootstrap/test/hocc/M.hmh | 30 +
 bootstrap/test/hocc/N.expected | 4 +
 bootstrap/test/hocc/N.hmh | 23 +
 bootstrap/test/hocc/Parse_a.expected | 5 +
 bootstrap/test/hocc/Parse_a.hmh | 55 +
 bootstrap/test/hocc/Parse_a.hmhi | 5 +
 bootstrap/test/hocc/Parse_b.expected | 4 +
 bootstrap/test/hocc/Parse_b.hmh | 2 +
 bootstrap/test/hocc/Parse_b.hmhi | 1 +
 .../test/hocc/Parse_error_cident.expected | 2 +
 bootstrap/test/hocc/Parse_error_cident.hmh | 2 +
bootstrap/test/hocc/Parse_error_code.expected | 2 + bootstrap/test/hocc/Parse_error_code.hmh | 2 + .../Parse_error_delimited_rarray.expected | 2 + .../hocc/Parse_error_delimited_rarray.hmh | 3 + .../Parse_error_delimited_rbrack.expected | 2 + .../hocc/Parse_error_delimited_rbrack.hmh | 3 + .../Parse_error_delimited_rcapture.expected | 2 + .../hocc/Parse_error_delimited_rcapture.hmh | 3 + .../Parse_error_delimited_rcurly.expected | 2 + .../hocc/Parse_error_delimited_rcurly.hmh | 3 + .../Parse_error_delimited_rparen.expected | 2 + .../hocc/Parse_error_delimited_rparen.hmh | 3 + bootstrap/test/hocc/Parse_error_hmhi.expected | 2 + bootstrap/test/hocc/Parse_error_hmhi.hmh | 2 + bootstrap/test/hocc/Parse_error_hmhi.hmhi | 1 + bootstrap/test/hocc/Parse_error_hocc.expected | 2 + bootstrap/test/hocc/Parse_error_hocc.hmh | 1 + .../test/hocc/Parse_error_malformed.expected | 3 + bootstrap/test/hocc/Parse_error_malformed.hmh | 3 + .../hocc/Parse_error_nonterm_cce.expected | 2 + .../test/hocc/Parse_error_nonterm_cce.hmh | 2 + .../hocc/Parse_error_of_type_dot.expected | 2 + .../test/hocc/Parse_error_of_type_dot.hmh | 2 + .../test/hocc/Parse_error_precs.expected | 2 + bootstrap/test/hocc/Parse_error_precs.hmh | 2 + .../test/hocc/Parse_error_precs_lt.expected | 2 + bootstrap/test/hocc/Parse_error_precs_lt.hmh | 2 + .../hocc/Parse_error_prod_param_type.expected | 2 + .../test/hocc/Parse_error_prod_param_type.hmh | 3 + .../hocc/Parse_error_reduction_arrow.expected | 2 + .../test/hocc/Parse_error_reduction_arrow.hmh | 3 + .../test/hocc/Parse_error_uident.expected | 2 + bootstrap/test/hocc/Parse_error_uident.hmh | 2 + bootstrap/test/hocc/dune | 381 ++++ bootstrap/test/hocc/help_a.expected | 25 + bootstrap/test/hocc/help_b.expected | 26 + doc/design/index.md | 2 +- doc/index.md | 5 + doc/tools/hocc.md | 906 ++++++++ ide/kakoune/hocc.kak | 122 ++ 113 files changed, 8668 insertions(+), 8 deletions(-) rename bootstrap/bin/{ => hmc}/dune (100%) rename bootstrap/bin/{ => hmc}/hmc.ml (89%) create mode 100644 bootstrap/bin/hocc/assoc.ml create mode 100644 bootstrap/bin/hocc/assoc.mli create mode 100644 bootstrap/bin/hocc/conf.ml create mode 100644 bootstrap/bin/hocc/conf.mli create mode 100644 bootstrap/bin/hocc/dune create mode 100644 bootstrap/bin/hocc/hocc.ml create mode 100644 bootstrap/bin/hocc/io.ml create mode 100644 bootstrap/bin/hocc/io.mli create mode 100644 bootstrap/bin/hocc/lr0Item.ml create mode 100644 bootstrap/bin/hocc/lr0Item.mli create mode 100644 bootstrap/bin/hocc/lr0Itemset.ml create mode 100644 bootstrap/bin/hocc/lr0Itemset.mli create mode 100644 bootstrap/bin/hocc/lr1Item.ml create mode 100644 bootstrap/bin/hocc/lr1Item.mli create mode 100644 bootstrap/bin/hocc/lr1Itemset.ml create mode 100644 bootstrap/bin/hocc/lr1Itemset.mli create mode 100644 bootstrap/bin/hocc/lr1ItemsetClosure.ml create mode 100644 bootstrap/bin/hocc/lr1ItemsetClosure.mli create mode 100644 bootstrap/bin/hocc/parse.ml create mode 100644 bootstrap/bin/hocc/prec.ml create mode 100644 bootstrap/bin/hocc/prec.mli create mode 100644 bootstrap/bin/hocc/prod.ml create mode 100644 bootstrap/bin/hocc/prod.mli create mode 100644 bootstrap/bin/hocc/qualifiedType.ml create mode 100644 bootstrap/bin/hocc/qualifiedType.mli create mode 100644 bootstrap/bin/hocc/reduction.ml create mode 100644 bootstrap/bin/hocc/reduction.mli create mode 100644 bootstrap/bin/hocc/scan.ml create mode 100644 bootstrap/bin/hocc/scan.mli create mode 100644 bootstrap/bin/hocc/spec.ml create mode 100644 bootstrap/bin/hocc/spec.mli create mode 100644 
bootstrap/bin/hocc/symbol.ml create mode 100644 bootstrap/bin/hocc/symbol.mli create mode 100644 bootstrap/bin/hocc/workq.ml create mode 100644 bootstrap/bin/hocc/workq.mli create mode 100644 bootstrap/test/hocc/A.expected create mode 100644 bootstrap/test/hocc/A.hmh create mode 100644 bootstrap/test/hocc/B.expected create mode 100644 bootstrap/test/hocc/B.hmh create mode 100644 bootstrap/test/hocc/C.expected create mode 100644 bootstrap/test/hocc/C.hmh create mode 100644 bootstrap/test/hocc/D.expected create mode 100644 bootstrap/test/hocc/D.hmh create mode 100644 bootstrap/test/hocc/E.expected create mode 100644 bootstrap/test/hocc/E.hmh create mode 100644 bootstrap/test/hocc/Example1.expected create mode 100644 bootstrap/test/hocc/Example1.hmh create mode 100644 bootstrap/test/hocc/Example1.hmhi create mode 100644 bootstrap/test/hocc/F.expected create mode 100644 bootstrap/test/hocc/F.hmh create mode 100644 bootstrap/test/hocc/Hocc.expected create mode 100644 bootstrap/test/hocc/Hocc.hmh create mode 100644 bootstrap/test/hocc/Lyken.expected create mode 100644 bootstrap/test/hocc/Lyken.hmh create mode 100644 bootstrap/test/hocc/M.expected create mode 100644 bootstrap/test/hocc/M.hmh create mode 100644 bootstrap/test/hocc/N.expected create mode 100644 bootstrap/test/hocc/N.hmh create mode 100644 bootstrap/test/hocc/Parse_a.expected create mode 100644 bootstrap/test/hocc/Parse_a.hmh create mode 100644 bootstrap/test/hocc/Parse_a.hmhi create mode 100644 bootstrap/test/hocc/Parse_b.expected create mode 100644 bootstrap/test/hocc/Parse_b.hmh create mode 100644 bootstrap/test/hocc/Parse_b.hmhi create mode 100644 bootstrap/test/hocc/Parse_error_cident.expected create mode 100644 bootstrap/test/hocc/Parse_error_cident.hmh create mode 100644 bootstrap/test/hocc/Parse_error_code.expected create mode 100644 bootstrap/test/hocc/Parse_error_code.hmh create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rarray.expected create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rarray.hmh create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rbrack.expected create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rbrack.hmh create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rcapture.expected create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rcapture.hmh create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rcurly.expected create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rcurly.hmh create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rparen.expected create mode 100644 bootstrap/test/hocc/Parse_error_delimited_rparen.hmh create mode 100644 bootstrap/test/hocc/Parse_error_hmhi.expected create mode 100644 bootstrap/test/hocc/Parse_error_hmhi.hmh create mode 100644 bootstrap/test/hocc/Parse_error_hmhi.hmhi create mode 100644 bootstrap/test/hocc/Parse_error_hocc.expected create mode 100644 bootstrap/test/hocc/Parse_error_hocc.hmh create mode 100644 bootstrap/test/hocc/Parse_error_malformed.expected create mode 100644 bootstrap/test/hocc/Parse_error_malformed.hmh create mode 100644 bootstrap/test/hocc/Parse_error_nonterm_cce.expected create mode 100644 bootstrap/test/hocc/Parse_error_nonterm_cce.hmh create mode 100644 bootstrap/test/hocc/Parse_error_of_type_dot.expected create mode 100644 bootstrap/test/hocc/Parse_error_of_type_dot.hmh create mode 100644 bootstrap/test/hocc/Parse_error_precs.expected create mode 100644 bootstrap/test/hocc/Parse_error_precs.hmh create mode 100644 bootstrap/test/hocc/Parse_error_precs_lt.expected create mode 
100644 bootstrap/test/hocc/Parse_error_precs_lt.hmh create mode 100644 bootstrap/test/hocc/Parse_error_prod_param_type.expected create mode 100644 bootstrap/test/hocc/Parse_error_prod_param_type.hmh create mode 100644 bootstrap/test/hocc/Parse_error_reduction_arrow.expected create mode 100644 bootstrap/test/hocc/Parse_error_reduction_arrow.hmh create mode 100644 bootstrap/test/hocc/Parse_error_uident.expected create mode 100644 bootstrap/test/hocc/Parse_error_uident.hmh create mode 100644 bootstrap/test/hocc/dune create mode 100644 bootstrap/test/hocc/help_a.expected create mode 100644 bootstrap/test/hocc/help_b.expected create mode 100644 doc/index.md create mode 100644 doc/tools/hocc.md create mode 100644 ide/kakoune/hocc.kak diff --git a/.editorconfig b/.editorconfig index 61a49e703..ab94a347d 100644 --- a/.editorconfig +++ b/.editorconfig @@ -1,6 +1,6 @@ # https://editorconfig.org/ -[*.{hm,hmi}] +[*.{hm,hmi,hmh,hmhi}] indent_style = space indent_size = 4 tab_width = 8 diff --git a/bootstrap/bin/dune b/bootstrap/bin/hmc/dune similarity index 100% rename from bootstrap/bin/dune rename to bootstrap/bin/hmc/dune diff --git a/bootstrap/bin/hmc.ml b/bootstrap/bin/hmc/hmc.ml similarity index 89% rename from bootstrap/bin/hmc.ml rename to bootstrap/bin/hmc/hmc.ml index 3a04732dd..b61fb2aca 100644 --- a/bootstrap/bin/hmc.ml +++ b/bootstrap/bin/hmc/hmc.ml @@ -36,10 +36,9 @@ let scan_file path = () let _ = - match Array.length Sys.argv with + match Array.length Os.argv with | 0L | 1L -> halt "hmc usage: hmc " | _ -> begin - let path_str = Array.get 1L Sys.argv in - let path = Path.of_string path_str in + let path = Path.of_bytes (Bytes.Slice.init (Array.get 1L Os.argv)) in scan_file path end diff --git a/bootstrap/bin/hocc/assoc.ml b/bootstrap/bin/hocc/assoc.ml new file mode 100644 index 000000000..60badce54 --- /dev/null +++ b/bootstrap/bin/hocc/assoc.ml @@ -0,0 +1,11 @@ +open Basis + +type t = + | Left + | Right + +let pp t formatter = + formatter |> Fmt.fmt (match t with + | Left -> "Left" + | Right -> "Right" + ) diff --git a/bootstrap/bin/hocc/assoc.mli b/bootstrap/bin/hocc/assoc.mli new file mode 100644 index 000000000..3884790d4 --- /dev/null +++ b/bootstrap/bin/hocc/assoc.mli @@ -0,0 +1,7 @@ +open Basis + +type t = + | Left + | Right + +include FormattableIntf.SMono with type t := t diff --git a/bootstrap/bin/hocc/conf.ml b/bootstrap/bin/hocc/conf.ml new file mode 100644 index 000000000..73547c265 --- /dev/null +++ b/bootstrap/bin/hocc/conf.ml @@ -0,0 +1,236 @@ +open! 
Basis +include Basis.Rudiments + +type algorithm = + | LR1Compact + | LR1Canonical + +let pp_algorithm algorithm formatter = + formatter |> Fmt.fmt (match algorithm with + | LR1Compact -> "LR1Compact" + | LR1Canonical -> "LR1Canonical" + ) + +type t = { + verbose: bool; + text: bool; + html: bool; + hocc: bool; + algorithm: algorithm; + hemlock: bool; + ocaml: bool; + srcdir_opt: Path.t option; + module_opt: Path.Segment.t option; + dstdir_opt: Path.t option; +} + +let pp {verbose; text; html; hocc; algorithm; hemlock; ocaml; srcdir_opt; module_opt; dstdir_opt} + formatter = + formatter + |> Fmt.fmt "{verbose=" |> Bool.pp verbose + |> Fmt.fmt "; text=" |> Bool.pp text + |> Fmt.fmt "; html=" |> Bool.pp html + |> Fmt.fmt "; hocc=" |> Bool.pp hocc + |> Fmt.fmt "; algorithm=" |> pp_algorithm algorithm + |> Fmt.fmt "; hemlock=" |> Bool.pp hemlock + |> Fmt.fmt "; ocaml=" |> Bool.pp ocaml + |> Fmt.fmt "; srcdir_opt=" |> (Option.pp Path.pp) srcdir_opt + |> Fmt.fmt "; module_opt=" |> (Option.pp Path.Segment.pp) module_opt + |> Fmt.fmt "; dstdir_opt=" |> (Option.pp Path.pp) dstdir_opt + |> Fmt.fmt "}" + +let default = { + verbose=false; + text=false; + html=false; + hocc=false; + algorithm=LR1Compact; + hemlock=false; + ocaml=false; + srcdir_opt=None; + module_opt=None; + dstdir_opt=None; +} + +let usage error = + let exit_code, formatter = match error with + | false -> 0, File.Fmt.stdout + | true -> 1, File.Fmt.stderr + in + formatter + |> Fmt.fmt {|hocc usage: hocc + +Parameters: + -h[elp] : Print command usage and exit. + -v[erbose] : Print progress information during parser generation. + -txt | -text : Write a detailed automaton description in plain text format + to "/hocc/.txt". + -html : Write a detailed automaton description in internally + hyperlinked HTML format to "/hocc/.html". + -hmh | -hocc : Write a complete grammar specification in hocc format to + "/hocc/.hmh", but with all non-terminal + types and reduction code omitted. + -c[anonical] : Generate a canonical LR(1) parser rather than a compact + LR(1) parser. + -hm | -hemlock : Generate a Hemlock-based parser implementation and write it + to "/.hm[i]". + -ml | -ocaml : Generate an OCaml-based parser implementation and write it + to "/.ml[i]". This is brittle + functionality intended only for Hemlock bootstrapping. + -s[rc] : Path and module name of input source, where inputs match + ".hmh[i]" and "" comprises the source directory + and module name, "[/]". +-d[stdir] : Path to directory in which to place generated output, such + that output file paths match "/[hocc/].*". + Defaults to "". 
+|} + |> ignore; + Stdlib.exit exit_code + +let is_segment_cident segment = + let rec cont cursor past = begin + match String.C.Cursor.(<) cursor past with + | false -> true + | true -> begin + let cp, cursor' = String.C.Cursor.next cursor in + match cp with + | cp when Codepoint.(cp >= of_char 'A' && cp <= of_char 'Z') -> cont cursor' past + | cp when Codepoint.(cp >= of_char 'a' && cp <= of_char 'z') -> cont cursor' past + | cp when Codepoint.(cp >= of_char '0' && cp <= of_char '9') -> cont cursor' past + | cp when Codepoint.(cp = of_char '_') -> cont cursor' past + | cp when Codepoint.(cp = of_char '\'') -> cont cursor' past + | _ -> false + end + end in + let rec start cursor past = begin + match String.C.Cursor.(<) cursor past with + | false -> false + | true -> begin + let cp, cursor' = String.C.Cursor.next cursor in + match cp with + | cp when Codepoint.(cp = of_char '_') -> start cursor' past + | cp when Codepoint.(cp >= of_char 'A' && cp <= of_char 'Z') -> cont cursor' past + | _ -> false + end + end in + match Path.Segment.to_string segment with + | None -> false + | Some s -> begin + let sslice = String.C.Slice.of_string s in + let base = String.C.Slice.base sslice in + let past = String.C.Slice.past sslice in + start base past + end + +let of_argv argv = + let arg_arg argv i = begin + let i' = succ i in + match i' < Array.length argv with + | false -> begin + let arg = Bytes.to_string_replace (Array.get i argv) in + File.Fmt.stderr |> Fmt.fmt "hocc: " |> Fmt.fmt arg |> Fmt.fmt " argument missing\n" + |> ignore; + usage true + end + | true -> Array.get i' argv + end in + let rec f t argv i = begin + match i < Array.length argv with + | false -> t + | true -> begin + let arg_bytes = Array.get i argv in + let arg_string = Bytes.to_string_replace arg_bytes in + match arg_string with + | "-help" | "-h" -> usage false + | "-verbose" | "-v" -> f {t with verbose=true} argv (succ i) + | "-txt" | "-text" -> f {t with text=true} argv (succ i) + | "-html" -> f {t with html=true} argv (succ i) + | "-hmh" | "-hocc" -> f {t with hocc=true} argv (succ i) + | "-canonical" | "-c" -> f {t with algorithm=LR1Canonical} argv (succ i) + | "-hm" | "-hemlock" -> f {t with hemlock=true} argv (succ i) + | "-ml" | "-ocaml" -> f {t with ocaml=true} argv (succ i) + | "-src" | "-s" -> begin + let path = Path.of_bytes (Bytes.Slice.init (arg_arg argv i)) in + let dirname, basename_opt = Path.split path in + let srcdir_opt = match Path.is_empty dirname with + | true -> None + | false -> Some dirname + in + let module_opt = match basename_opt with + | None -> begin + File.Fmt.stderr + |> Fmt.fmt "hocc: Invalid source: " + |> Path.pp path + |> Fmt.fmt "\n" + |> ignore; + usage true + end + | Some segment -> begin + match is_segment_cident segment with + | false -> begin + File.Fmt.stderr |> Fmt.fmt "hocc: Invalid source module name: " + |> Path.Segment.pp segment |> Fmt.fmt "\n" |> ignore; + usage true + end + | true -> Some segment + end + in + f {t with srcdir_opt; module_opt} argv (i + 2L) + end + | "-dstdir" | "-d" -> begin + let dstdir = Path.of_bytes (Bytes.Slice.init (arg_arg argv i)) in + f {t with dstdir_opt=Some dstdir} argv (i + 2L) + end + | _ -> begin + File.Fmt.stderr + |> Fmt.fmt "hocc: Invalid command line parameter: " + |> String.pp arg_string + |> Fmt.fmt "\n" + |> ignore; + usage true + end + end + end in + let t = f default argv 1L in + match t.module_opt with + | None -> begin + File.Fmt.stderr |> Fmt.fmt "hocc: Source unspecified\n" |> ignore; + usage true + end + | Some _ -> t + +let 
verbose {verbose; _} = + verbose + +let text {text; _} = + text + +let html {html; _} = + html + +let hocc {hocc; _} = + hocc + +let algorithm {algorithm; _} = + algorithm + +let hemlock {hemlock; _} = + hemlock + +let ocaml {ocaml; _} = + ocaml + +let srcdir {srcdir_opt; _} = + match srcdir_opt with + | None -> Path.of_string "." + | Some srcdir -> srcdir + +let module_ {module_opt; _} = + match module_opt with + | None -> not_reached () + | Some m -> m + +let dstdir {dstdir_opt; _} = + match dstdir_opt with + | None -> Path.of_string "." + | Some dstdir -> dstdir diff --git a/bootstrap/bin/hocc/conf.mli b/bootstrap/bin/hocc/conf.mli new file mode 100644 index 000000000..bc15f1f03 --- /dev/null +++ b/bootstrap/bin/hocc/conf.mli @@ -0,0 +1,24 @@ +open Basis + +type algorithm = + | LR1Compact + | LR1Canonical + +val pp_algorithm: algorithm -> (module Fmt.Formatter) -> (module Fmt.Formatter) + +type t + +include FormattableIntf.SMono with type t := t + +val of_argv: Bytes.t array -> t + +val verbose: t -> bool +val text: t -> bool +val html: t -> bool +val hocc: t -> bool +val algorithm: t -> algorithm +val hemlock: t -> bool +val ocaml: t -> bool +val srcdir: t -> Path.t +val module_: t -> Path.Segment.t +val dstdir: t -> Path.t diff --git a/bootstrap/bin/hocc/dune b/bootstrap/bin/hocc/dune new file mode 100644 index 000000000..15f70f33e --- /dev/null +++ b/bootstrap/bin/hocc/dune @@ -0,0 +1,7 @@ +(executables + (names hocc) + (libraries Basis Hmc)) + +(install + (section bin) + (files (hocc.exe as hocc))) diff --git a/bootstrap/bin/hocc/hocc.ml b/bootstrap/bin/hocc/hocc.ml new file mode 100644 index 000000000..30981b114 --- /dev/null +++ b/bootstrap/bin/hocc/hocc.ml @@ -0,0 +1,59 @@ +open Basis +open! Basis.Rudiments + +let parse_hmhi (Io.{hmhi; _} as io) = + match hmhi with + | Some text -> begin + let scanner = Scan.init text in + let io = + io.log + |> Fmt.fmt "hocc: Parsing " |> Path.pp (Option.value_hlt (Text.path text)) |> Fmt.fmt "\n" + |> Io.with_log io in + let _scanner', hmhi = Parse.hmhi scanner in + match hmhi with + | Error errors -> begin + List.iter (List.sort errors ~cmp:Parse.Error.cmp) ~f:(fun error -> + File.Fmt.stderr |> Parse.Error.fmt ~alt:true error |> ignore + ); + Stdlib.exit 1 + end + | Ok hmhi -> io, Some hmhi + end + | None -> io, None + +let parse_hmh (Io.{hmh; _} as io) = + let scanner = Scan.init hmh in + let io = + io.log + |> Fmt.fmt "hocc: Parsing " |> Path.pp (Option.value_hlt (Text.path hmh)) |> Fmt.fmt "\n" + |> Io.with_log io in + let _scanner', hmh = Parse.hmh scanner in + match hmh with + | Error errors -> begin + List.iter (List.sort errors ~cmp:Parse.Error.cmp) ~f:(fun error -> + File.Fmt.stderr |> Parse.Error.fmt ~alt:true error |> ignore + ); + Stdlib.exit 1 + end + | Ok hmh -> io, hmh + +let _ = + let conf = Conf.of_argv Os.argv in + let io = Io.init conf in + let io, _hmhi_opt = parse_hmhi io in + let io, hmh = parse_hmh io in + let io, spec = Spec.init conf io hmh in + let io = match Conf.text conf with + | false -> io + | true -> Spec.to_txt conf io spec + in + let io = match Conf.html conf with + | false -> io + | true -> Spec.to_html conf io spec + in + let io = match Conf.hocc conf with + | false -> io + | true -> Spec.to_hocc io spec + in + let _io = Io.fini conf io in + () diff --git a/bootstrap/bin/hocc/io.ml b/bootstrap/bin/hocc/io.ml new file mode 100644 index 000000000..227af7987 --- /dev/null +++ b/bootstrap/bin/hocc/io.ml @@ -0,0 +1,179 @@ +open Basis +include Basis.Rudiments + +type t = { + err: (module Fmt.Formatter); + hmhi: 
Text.t option; + hmh: Text.t; + log: (module Fmt.Formatter); + txt: (module Fmt.Formatter); + html: (module Fmt.Formatter); + hocc: (module Fmt.Formatter); + hmi: (module Fmt.Formatter); + hm: (module Fmt.Formatter); + mli: (module Fmt.Formatter); + ml: (module Fmt.Formatter); +} + +let init_err _conf = + File.Fmt.stderr + +let path_with_suffix ?(is_report=false) conf suffix = + Path.join [ + (Conf.srcdir conf); + Path.of_string (match is_report with false -> "" | true -> "hocc"); + Path.of_segment (Path.Segment.join [ + (Conf.module_ conf); + Option.value_hlt Path.(basename (of_string suffix)) + ]); + ] + +let open_infile_as_text path = + match File.of_path path with + | Ok f -> begin + let stream = File.Stream.of_file f in + let text = Text.of_bytes_stream ~path stream in + Ok text + end + | Error _ as error -> error + +let init_hmhi conf = + let path = path_with_suffix conf ".hmhi" in + match open_infile_as_text path with + | Ok text -> Some text + | Error _ -> None + +let open_error ~err path error = + let _err = + err + |> Fmt.fmt "hocc: File.of_path " |> Path.pp path |> Fmt.fmt ": " + |> Fmt.fmt (Errno.to_string error) + |> Fmt.fmt "\n" + in + Stdlib.exit 1 + +let init_hmh conf ~err = + let path = path_with_suffix conf ".hmh" in + match open_infile_as_text path with + | Ok text -> text + | Error error -> open_error ~err path error + +let init_log conf = + match Conf.verbose conf with + | false -> File.Fmt.sink + | true -> File.Fmt.stdout + +let init_txt conf = + match Conf.text conf with + | false -> File.Fmt.sink + | true -> String.Fmt.empty + +let init_html conf = + match Conf.html conf with + | false -> File.Fmt.sink + | true -> String.Fmt.empty + +let init_hocc conf = + match Conf.hocc conf with + | false -> File.Fmt.sink + | true -> String.Fmt.empty + +let init_hmi conf hmhi = + match Conf.hemlock conf, hmhi with + | false, _ + | _, None -> File.Fmt.sink + | true, Some _ -> String.Fmt.empty + +let init_hm conf = + match Conf.hemlock conf with + | false -> File.Fmt.sink + | true -> String.Fmt.empty + +let init_mli conf hmhi = + match Conf.ocaml conf, hmhi with + | false, _ + | _, None -> File.Fmt.sink + | true, Some _ -> String.Fmt.empty + +let init_ml conf = + match Conf.ocaml conf with + | false -> File.Fmt.sink + | true -> String.Fmt.empty + +let init conf = + let err = init_err conf in + let hmhi = init_hmhi conf in + let hmh = init_hmh conf ~err in + let log = init_log conf in + let txt = init_txt conf in + let html = init_html conf in + let hocc = init_hocc conf in + let hmi = init_hmi conf hmhi in + let hm = init_hm conf in + let mli = init_mli conf hmhi in + let ml = init_ml conf in + + {err; hmhi; hmh; log; txt; html; hocc; hmi; hm; mli; ml} + +let open_outfile_as_formatter ~is_report ~err path = + let _ = match is_report with + | false -> () + | true -> Os.mkdirat (Path.dirname path) |> ignore + in + match File.of_path ~flag:File.Flag.W path with + | Ok f -> File.Fmt.of_t f + | Error error -> open_error ~err path error + +let fini_formatter ?(is_report=false) conf ~err ~log formatter suffix = + match Fmt.sync formatter with + | To_string s -> begin + let path = path_with_suffix ~is_report conf suffix in + let log' = log |> Fmt.fmt "hocc: Writing " |> Path.pp path |> Fmt.fmt "\n" in + let formatter = open_outfile_as_formatter ~is_report ~err path in + let formatter' = formatter |> Fmt.fmt s |> Fmt.flush in + log', formatter' + end + | Synced formatter' -> log, formatter' + +let fini conf ({err; log; txt; html; hocc; hmi; hm; mli; ml; _} as t) = + let log, txt = 
fini_formatter ~is_report:true conf ~err ~log txt ".txt" in + let log, html = fini_formatter ~is_report:true conf ~err ~log html ".html" in + let log, hocc = fini_formatter ~is_report:true conf ~err ~log hocc ".hmh" in + let log, hmi = fini_formatter conf ~err ~log hmi ".hmi" in + let log, hm = fini_formatter conf ~err ~log hm ".hm" in + let log, mli = fini_formatter conf ~err ~log mli ".mli" in + let log, ml = fini_formatter conf ~err ~log ml ".ml" in + let log = Fmt.flush log in + {t with log; txt; html; hocc; hmi; hm; mli; ml} + +let fatal {err; _} = + let _err = Fmt.flush err in + Stdlib.exit 1 + +let with_log t log = + let log = Fmt.flush log in + {t with log} + +let with_err t err = + {t with err} + +let with_txt t txt = + {t with txt} + +let with_html t html = + {t with html} + +let with_hocc t hocc = + {t with hocc} + +let with_hmi t hmi = + {t with hmi} + +let with_hm t hm = + {t with hm} + +let with_mli t mli = + {t with mli} + +let with_ml t ml = + {t with ml} diff --git a/bootstrap/bin/hocc/io.mli b/bootstrap/bin/hocc/io.mli new file mode 100644 index 000000000..14866abd7 --- /dev/null +++ b/bootstrap/bin/hocc/io.mli @@ -0,0 +1,29 @@ +open Basis + +type t = { + err: (module Fmt.Formatter); + hmhi: Text.t option; + hmh: Text.t; + log: (module Fmt.Formatter); + txt: (module Fmt.Formatter); + html: (module Fmt.Formatter); + hocc: (module Fmt.Formatter); + hmi: (module Fmt.Formatter); + hm: (module Fmt.Formatter); + mli: (module Fmt.Formatter); + ml: (module Fmt.Formatter); +} + +val init: Conf.t -> t +val fini: Conf.t -> t -> t +val fatal: t -> 'a + +val with_err: t -> (module Fmt.Formatter) -> t +val with_log: t -> (module Fmt.Formatter) -> t +val with_txt: t -> (module Fmt.Formatter) -> t +val with_html: t -> (module Fmt.Formatter) -> t +val with_hocc: t -> (module Fmt.Formatter) -> t +val with_hmi: t -> (module Fmt.Formatter) -> t +val with_hm: t -> (module Fmt.Formatter) -> t +val with_mli: t -> (module Fmt.Formatter) -> t +val with_ml: t -> (module Fmt.Formatter) -> t diff --git a/bootstrap/bin/hocc/lr0Item.ml b/bootstrap/bin/hocc/lr0Item.ml new file mode 100644 index 000000000..1fbd3f182 --- /dev/null +++ b/bootstrap/bin/hocc/lr0Item.ml @@ -0,0 +1,32 @@ +open Basis +open Basis.Rudiments + +module T = struct + type t = { + prod: Prod.t; + dot: uns; + } + + let hash_fold {prod; dot} state = + state + |> Prod.hash_fold prod + |> Uns.hash_fold dot + + let cmp {prod=p0; dot=d0} {prod=p1; dot=d1} = + let open Cmp in + match Prod.cmp p0 p1 with + | Lt -> Lt + | Eq -> Uns.cmp d0 d1 + | Gt -> Gt + + let pp {prod; dot} formatter = + formatter + |> Fmt.fmt "{prod=" |> Prod.pp prod + |> Fmt.fmt "; dot=" |> Uns.pp dot + |> Fmt.fmt "}" +end +include T +include Identifiable.Make(T) + +let init ~prod ~dot = + {prod; dot} diff --git a/bootstrap/bin/hocc/lr0Item.mli b/bootstrap/bin/hocc/lr0Item.mli new file mode 100644 index 000000000..81b2a72f6 --- /dev/null +++ b/bootstrap/bin/hocc/lr0Item.mli @@ -0,0 +1,11 @@ +open Basis +open Basis.Rudiments + +type t = { + prod: Prod.t; + dot: uns; +} + +include IdentifiableIntf.S with type t := t + +val init: prod:Prod.t -> dot:uns -> t diff --git a/bootstrap/bin/hocc/lr0Itemset.ml b/bootstrap/bin/hocc/lr0Itemset.ml new file mode 100644 index 000000000..f97fc3bb4 --- /dev/null +++ b/bootstrap/bin/hocc/lr0Itemset.ml @@ -0,0 +1,26 @@ +open Basis +open! 
Basis.Rudiments + +module T = struct + type t = (Lr0Item.t, Lr0Item.cmper_witness) Ordset.t + + let hash_fold t = + Ordset.hash_fold t + + let cmp t0 t1 = + Ordset.cmp t0 t1 + + let pp t formatter = + formatter + |> Ordset.pp t +end +include T +include Identifiable.Make(T) + +let empty = Ordset.empty (module Lr0Item) + +let insert lr0item t = + Ordset.insert lr0item t + +let union t0 t1 = + Ordset.union t0 t1 diff --git a/bootstrap/bin/hocc/lr0Itemset.mli b/bootstrap/bin/hocc/lr0Itemset.mli new file mode 100644 index 000000000..d26cf1cd0 --- /dev/null +++ b/bootstrap/bin/hocc/lr0Itemset.mli @@ -0,0 +1,12 @@ +open Basis +open! Basis.Rudiments + +type t = (Lr0Item.t, Lr0Item.cmper_witness) Ordset.t + +include IdentifiableIntf.S with type t := t + +val empty: t + +val insert: Lr0Item.t -> t -> t + +val union: t -> t -> t diff --git a/bootstrap/bin/hocc/lr1Item.ml b/bootstrap/bin/hocc/lr1Item.ml new file mode 100644 index 000000000..dd7c02f82 --- /dev/null +++ b/bootstrap/bin/hocc/lr1Item.ml @@ -0,0 +1,76 @@ +open Basis +open Basis.Rudiments + +module T = struct + type t = { + lr0item: Lr0Item.t; + follow: (uns, Uns.cmper_witness) Ordset.t; + } + + let hash_fold {lr0item; follow} state = + state + |> Lr0Item.hash_fold lr0item + |> Ordset.hash_fold follow + + let cmp {lr0item=l0; follow=f0} {lr0item=l1; follow=f1} = + let open Cmp in + match Lr0Item.cmp l0 l1 with + | Lt -> Lt + | Eq -> Ordset.cmp f0 f1 + | Gt -> Gt + + let pp {lr0item; follow} formatter = + formatter + |> Fmt.fmt "{lr0item=" |> Lr0Item.pp lr0item + |> Fmt.fmt "; follow=" |> Ordset.pp follow +end +include T +include Identifiable.Make(T) + +let init ~lr0item ~follow = + assert (not (Ordset.is_empty follow)); + {lr0item; follow} + +(* The concatenation of the RHS symbols to the right of the dot and the follow set comprise an + * ordered sequence of symbols to be expected. Merge-fold the symbols' first sets (excluding "ε"), + * until a preceding symbol's first set does not contain "ε". Similarly, if all symbols contain "ε", + * merge the follow set (excluding "ε"). Merge "ε" if all symbols' first sets and the follow set + * contain "ε". *) +let first ~epsilon ~symbols {lr0item; follow} = + let append_symbol_set first merge_epsilon symbol_set = begin + let symbol_set_sans_epsilon = Ordset.remove Symbol.(epsilon.index) symbol_set in + let first' = Ordset.union symbol_set_sans_epsilon first in + let contains_epsilon = Ordset.mem epsilon.index symbol_set in + let merge_epsilon' = match contains_epsilon with + | false -> false + | true -> merge_epsilon + in + first', merge_epsilon' + end in + let rhs_indexes = lr0item.prod.rhs_indexes in + let rhs_slice = Array.Slice.init ~range:(lr0item.dot =:< Array.length rhs_indexes) rhs_indexes in + (* Merge-fold RHS symbols' first sets. *) + let first, merge_epsilon = Array.Slice.fold_until + ~init:(Ordset.empty (module Uns), true) + ~f:(fun (first, merge_epsilon) symbol_index -> + let symbol = Array.get symbol_index symbols in + let first', merge_epsilon' = append_symbol_set first merge_epsilon Symbol.(symbol.first) in + (first', merge_epsilon'), not merge_epsilon' + ) rhs_slice + in + (* Append the follow set only if all RHS symbols to the right of the dot contain "ε". 
*) + match merge_epsilon with + | false -> first + | true -> begin + let first', merge_epsilon' = append_symbol_set first merge_epsilon follow in + match merge_epsilon' with + | false -> first' + | true -> Ordset.insert epsilon.index first' + end + +let is_kernel_item {lr0item={prod; dot}; _} = + Uns.(dot > 0L) || (Prod.is_synthetic prod) + +let follow_union symbol_indexes t = + let follow = Ordset.union symbol_indexes t.follow in + {t with follow} diff --git a/bootstrap/bin/hocc/lr1Item.mli b/bootstrap/bin/hocc/lr1Item.mli new file mode 100644 index 000000000..ee0e91776 --- /dev/null +++ b/bootstrap/bin/hocc/lr1Item.mli @@ -0,0 +1,23 @@ +open Basis +open Basis.Rudiments + +type t = { + lr0item: Lr0Item.t; + follow: (uns, Uns.cmper_witness) Ordset.t; +} + +include IdentifiableIntf.S with type t := t + +val init: lr0item:Lr0Item.t -> follow:(uns, Uns.cmper_witness) Ordset.t -> t + +val first: epsilon:Symbol.t -> symbols:Symbol.t array -> t -> (uns, Uns.cmper_witness) Ordset.t +(** [first ~epsilon ~symbols t] computes the first set of [t]. The first set is not memoized because + it is only needed during closure computation in [Lr1ItemsetClosure] (the [init] and [merge] + functions), whereas many items may be created as goto set elements, but only compatible goto + sets are merged. *) + +val is_kernel_item: t -> bool +(** [is_kernel_item t] returns true iff [t] would be a valid kernel item. Kernel items must have non-zero + production dot positions unless they reduce to synthetic start symbol wrappers. *) + +val follow_union: (uns, Uns.cmper_witness) Ordset.t -> t -> t diff --git a/bootstrap/bin/hocc/lr1Itemset.ml b/bootstrap/bin/hocc/lr1Itemset.ml new file mode 100644 index 000000000..4d46aa355 --- /dev/null +++ b/bootstrap/bin/hocc/lr1Itemset.ml @@ -0,0 +1,97 @@ +open Basis +open! 
Basis.Rudiments + +module T = struct + type t = (Lr0Item.t, Lr1Item.t, Lr0Item.cmper_witness) Ordmap.t + + let hash_fold = Ordmap.hash_fold Lr1Item.hash_fold + + let cmp = Ordmap.cmp Lr1Item.cmp + + let pp = Ordmap.pp Lr1Item.pp +end +include T +include Identifiable.Make(T) + +module Seq = struct + type container = t + type elm = Lr1Item.t + type t = (Lr0Item.t, Lr1Item.t, Lr0Item.cmper_witness) Ordmap.Seq.t + + let length = Ordmap.Seq.length + let next t = + match Ordmap.Seq.next t with (_, lr1item), t' -> lr1item, t' + + let next_opt t = + match Ordmap.Seq.next_opt t with + | None -> None + | Some ((_, lr1item), t') -> Some (lr1item, t') + + let init = Ordmap.Seq.init +end + +let empty = Ordmap.empty (module Lr0Item) + +let singleton (Lr1Item.{lr0item; _} as lr1item) = + Ordmap.singleton (module Lr0Item) ~k:lr0item ~v:lr1item + +let length = Ordmap.length + +let is_empty = Ordmap.is_empty + +let mem Lr1Item.{lr0item; follow} t = + match Ordmap.mem lr0item t with + | false -> false + | true -> begin + let Lr1Item.{follow=t_follow; _} = Ordmap.get_hlt lr0item t in + Ordset.subset t_follow follow + end + +let choose t = + match Ordmap.choose t with + | None -> None + | Some (_, lr1item) -> Some lr1item + +let insert (Lr1Item.{lr0item; follow} as lr1item) t = + match Ordmap.get lr0item t with + | None -> Ordmap.insert ~k:lr0item ~v:lr1item t + | Some Lr1Item.{follow=t_follow; _} -> begin + let lr1item' = Lr1Item.init ~lr0item ~follow:(Ordset.union follow t_follow) in + Ordmap.update_hlt ~k:lr0item ~v:lr1item' t + end + +let insert_hlt (Lr1Item.{lr0item; follow} as lr1item) t = + match Ordmap.get lr0item t with + | None -> Ordmap.insert ~k:lr0item ~v:lr1item t + | Some Lr1Item.{follow=t_follow; _} -> begin + let t_follow' = Ordset.union follow t_follow in + match Cmp.is_eq (Ordset.cmp t_follow t_follow') with + | true -> halt "Item already present" + | false -> begin + let lr1item' = Lr1Item.init ~lr0item ~follow:t_follow' in + Ordmap.update_hlt ~k:lr0item ~v:lr1item' t + end + end + +let remove Lr1Item.{lr0item; follow} t = + match Ordmap.get lr0item t with + | None -> t + | Some Lr1Item.{follow=t_follow; _} -> begin + let follow' = Ordset.diff t_follow follow in + match Ordset.is_empty follow' with + | true -> Ordmap.remove lr0item t + | false -> begin + let lr1item' = Lr1Item.init ~lr0item ~follow:follow' in + Ordmap.update_hlt ~k:lr0item ~v:lr1item' t + end + end + +let equal = Ordmap.equal Lr1Item.(=) + +let fold ~init ~f t = + Ordmap.fold ~init ~f:(fun accum (_, lr1item) -> f accum lr1item) t + +let lr0itemset t = + Ordmap.fold ~init:Lr0Itemset.empty ~f:(fun lr0itemset (lr0item, _) -> + Ordset.insert lr0item lr0itemset + ) t diff --git a/bootstrap/bin/hocc/lr1Itemset.mli b/bootstrap/bin/hocc/lr1Itemset.mli new file mode 100644 index 000000000..085fd3265 --- /dev/null +++ b/bootstrap/bin/hocc/lr1Itemset.mli @@ -0,0 +1,38 @@ +open Basis +open! 
Basis.Rudiments + +type t + +include IdentifiableIntf.S with type t := t + +module Seq : sig + type container = t + + include SeqIntf.SMonoDef with type elm = Lr1Item.t + + val init: container -> t +end + +val empty: t + +val singleton: Lr1Item.t -> t + +val length: t -> uns + +val is_empty: t -> bool + +val mem: Lr1Item.t -> t -> bool + +val choose: t -> Lr1Item.t option + +val insert: Lr1Item.t -> t -> t + +val insert_hlt: Lr1Item.t -> t -> t + +val remove: Lr1Item.t -> t -> t + +val equal: t -> t -> bool + +val fold: init:'accum -> f:('accum -> Lr1Item.t -> 'accum) -> t -> 'accum + +val lr0itemset: t -> Lr0Itemset.t diff --git a/bootstrap/bin/hocc/lr1ItemsetClosure.ml b/bootstrap/bin/hocc/lr1ItemsetClosure.ml new file mode 100644 index 000000000..c770de45c --- /dev/null +++ b/bootstrap/bin/hocc/lr1ItemsetClosure.ml @@ -0,0 +1,167 @@ +open Basis +open Basis.Rudiments + +module T = struct + type t = { + index: uns; + kernel: Lr1Itemset.t; + added: Lr1Itemset.t; + } + + let hash_fold {index; _} state = + state |> Uns.hash_fold index + + let cmp {index=i0; _} {index=i1; _} = + Uns.cmp i0 i1 + + let pp {index; kernel; added} formatter = + formatter + |> Fmt.fmt "{index=" |> Uns.pp index + |> Fmt.fmt "; kernel=" |> Lr1Itemset.pp kernel + |> Fmt.fmt "; added=" |> Lr1Itemset.pp added + |> Fmt.fmt "}" +end +include T +include Identifiable.Make(T) + +(* Update closure to incorporate `lr1itemset`. *) +let add_lr1itemset ~epsilon ~symbols lr1itemset t = + let rec f ~epsilon ~symbols lr1itemset t = begin + match Lr1Itemset.choose lr1itemset with + | None -> t + | Some (Lr1Item.{lr0item={prod={rhs_indexes; _} as prod; dot}; follow} as lr1item) -> begin + let lr1itemset' = Lr1Itemset.remove lr1item lr1itemset in + match Uns.(dot < Array.length rhs_indexes) with + | false -> f ~epsilon ~symbols lr1itemset' t + | true -> begin + let rhs_symbol_index = Array.get dot rhs_indexes in + let rhs_symbol = Array.get rhs_symbol_index symbols in + match Symbol.is_nonterm rhs_symbol with + | false -> f ~epsilon ~symbols lr1itemset' t + | true -> begin + let lhs = rhs_symbol in + let follow' = Lr1Item.first ~epsilon ~symbols + (Lr1Item.init ~lr0item:(Lr0Item.init ~prod ~dot:(succ dot)) ~follow) in + let lr1itemset', t' = Ordset.fold ~init:(lr1itemset', t) + ~f:(fun (lr1itemset, t) prod -> + let lr0item = Lr0Item.init ~prod ~dot:0L in + let lr1item = Lr1Item.init ~lr0item ~follow:follow' in + match Lr1Itemset.mem lr1item t.added with + | true -> lr1itemset, t + | false -> begin + let lr1itemset' = Lr1Itemset.insert lr1item lr1itemset in + let added' = Lr1Itemset.insert_hlt lr1item t.added in + lr1itemset', {t with added=added'} + end + ) lhs.prods in + f ~epsilon ~symbols lr1itemset' t' + end + end + end + end in + f ~epsilon ~symbols lr1itemset t + +(* Merge the kernel represented by `lr1itemset` into `t`'s kernel, then update the closure. 
*) +let merge ~epsilon ~symbols lr1itemset t = + let lr1itemset', kernel' = Lr1Itemset.fold + ~init:(Lr1Itemset.empty, t.kernel) + ~f:(fun (lr1itemset, kernel) lr1item -> + assert (Lr1Item.is_kernel_item lr1item); + match Lr1Itemset.mem lr1item kernel with + | true -> lr1itemset, kernel + | false -> begin + let lr1itemset' = Lr1Itemset.insert_hlt lr1item lr1itemset in + let kernel' = Lr1Itemset.insert_hlt lr1item kernel in + lr1itemset', kernel' + end + ) lr1itemset in + assert (Bool.(=) (Lr1Itemset.is_empty lr1itemset') (Lr1Itemset.equal t.kernel kernel')); + match Lr1Itemset.is_empty lr1itemset' with + | true -> false, t + | false -> begin + let t' = add_lr1itemset ~epsilon ~symbols lr1itemset' {t with kernel=kernel'} in + true, t' + end + +let init ~epsilon ~symbols ~index lr1itemset = + match merge ~epsilon ~symbols lr1itemset { + index; + kernel=Lr1Itemset.empty; + added=Lr1Itemset.empty; + } with _, t -> t + +let next t = + let f symbol_indexes Lr1Item.{lr0item={prod={rhs_indexes; _}; dot}; _} = begin + match Uns.(dot < Array.length rhs_indexes) with + | false -> symbol_indexes + | true -> begin + let symbol_index = Array.get dot rhs_indexes in + Ordset.insert symbol_index symbol_indexes + end + end in + Lr1Itemset.fold ~init:(Lr1Itemset.fold ~init:(Ordset.empty (module Uns)) ~f t.kernel) ~f t.added + +let goto symbol t = + let f lr1itemset (Lr1Item.{lr0item={prod={rhs_indexes; _} as prod; dot}; _} as lr1item) = begin + match Uns.(dot < Array.length rhs_indexes) && + Uns.(Array.get dot rhs_indexes = Symbol.(symbol.index)) with + | false -> lr1itemset + | true -> begin + let lr0item' = Lr0Item.init ~prod ~dot:(succ dot) in + let lr1item' = Lr1Item.init ~lr0item:lr0item' ~follow:lr1item.follow in + assert (Lr1Item.is_kernel_item lr1item'); + Lr1Itemset.insert lr1item' lr1itemset + end + end in + Lr1Itemset.fold ~init:(Lr1Itemset.fold ~init:Lr1Itemset.empty ~f t.kernel) ~f t.added + +let compat_weak lr1itemset t = + let rec ident_lr0 o_seq t_seq = begin + match Lr1Itemset.Seq.next_opt o_seq, Lr1Itemset.Seq.next_opt t_seq with + | None, None -> true + | Some ({lr0item=o_lr0item; _}, o_seq'), Some ({lr0item=t_lr0item; _}, t_seq') -> begin + (* Require identical LR(0) item. *) + match Lr0Item.(o_lr0item = t_lr0item) with + | false -> false + | true -> ident_lr0 o_seq' t_seq' + end + | None, Some _ + | Some _, None -> not_reached () + end in + let rec compat_follow o_seq t_seq = begin + let rec compat_follow_inner o_seq t_seq o_follow t_follow = begin + match Lr1Itemset.Seq.next_opt o_seq, Lr1Itemset.Seq.next_opt t_seq with + | None, None -> true + | Some ({follow=o_follow'; _}, o_seq'), Some ({follow=t_follow'; _}, t_seq') + -> begin + (* Require weakly compatible follow sets for all follow set pairings, as defined by the + * Pager(1977) algorithm, and as refined by Menhir to prevent phantom conflicts + * accompanying actual conflicts. 
*) + match + (Ordset.subset (Ordset.union t_follow o_follow') (Ordset.inter o_follow t_follow')), + (Ordset.subset (Ordset.union o_follow t_follow') (Ordset.inter t_follow o_follow')) + with + | true, true -> compat_follow_inner o_seq' t_seq' o_follow t_follow + | _ -> false + end + | None, Some _ + | Some _, None -> not_reached () + + end in + match Lr1Itemset.Seq.next_opt o_seq, Lr1Itemset.Seq.next_opt t_seq with + | None, None -> true + | Some ({follow=o_follow; _}, o_seq'), Some ({follow=t_follow; _}, t_seq') -> + compat_follow_inner o_seq' t_seq' o_follow t_follow && compat_follow o_seq' t_seq' + | None, Some _ + | Some _, None -> not_reached () + end in + match Uns.(=) (Lr1Itemset.length lr1itemset) (Lr1Itemset.length t.kernel) with + | false -> false + | true -> begin + let o_seq = Lr1Itemset.Seq.init lr1itemset in + let t_seq = Lr1Itemset.Seq.init t.kernel in + ident_lr0 o_seq t_seq && compat_follow o_seq t_seq + end + +let compat_ident lr1itemset {kernel; _} = + Lr1Itemset.equal lr1itemset kernel diff --git a/bootstrap/bin/hocc/lr1ItemsetClosure.mli b/bootstrap/bin/hocc/lr1ItemsetClosure.mli new file mode 100644 index 000000000..fb13bb0ea --- /dev/null +++ b/bootstrap/bin/hocc/lr1ItemsetClosure.mli @@ -0,0 +1,35 @@ +open Basis +open Basis.Rudiments + +type t = { + index: uns; + kernel: Lr1Itemset.t; + added: Lr1Itemset.t; +} + +include IdentifiableIntf.S with type t := t + +val init: epsilon:Symbol.t -> symbols:Symbol.t array -> index:uns -> Lr1Itemset.t -> t +(** [init ~epsilon ~symbols ~index lr1itemset] creates the closure of the kernel represented by + [lr1itemset], with index set to [index]. *) + +val merge: epsilon:Symbol.t -> symbols:Symbol.t array -> Lr1Itemset.t -> t -> bool * t +(** [merge ~epsilon ~symbols lr1itemset t] merges the kernel represented by [lr1itemset] into [t]'s + kernel and creates the closure of the merged kernel. The boolean result indicates whether + items were merged into the kernel. *) + +val next: t -> (uns, Uns.cmper_witness) Ordset.t +(** [next t] returns the set of symbol indexes that may appear next, i.e. the symbol indexes + corresponding to the symbols for which [goto] returns a non-empty set. *) + +val goto: Symbol.t -> t -> Lr1Itemset.t +(** [goto symbol t] computes the kernel of the goto set reachable from [t], given [symbol]. *) + +val compat_weak: Lr1Itemset.t -> t -> bool +(** [compat_weak lr1itemset t] determines if [lr1itemset] is weakly compatible, as defined by the + Pager(1977) algorithm, and as refined by Menhir to prevent phantom conflicts accompanying actual + conflicts. *) + +val compat_ident: Lr1Itemset.t -> t -> bool +(** [compat_ident lr1itemset t] determines if [lr1itemset] and [t] have identical kernels, which is + the basis of the canonical LR(1) algorithm. *) diff --git a/bootstrap/bin/hocc/parse.ml b/bootstrap/bin/hocc/parse.ml new file mode 100644 index 000000000..f6070538b --- /dev/null +++ b/bootstrap/bin/hocc/parse.ml @@ -0,0 +1,1918 @@ +open Basis +open! 
Basis.Rudiments + +module Error = struct + module T = struct + type t = { + source: Hmc.Source.Slice.t; + msg: string; + } + + let cmp t0 t1 = + Hmc.Source.Slice.cmp t0.source t1.source + + let pp {source; msg} formatter = + formatter + |> Fmt.fmt "{source=" |> Hmc.Source.Slice.pp source + |> Fmt.fmt "; msg=" |> String.pp msg + |> Fmt.fmt "}" + + let fmt ?(alt=false) ({source; msg} as t) formatter = + match alt with + | false -> pp t formatter + | true -> begin + formatter + |> Fmt.fmt "hocc: At " + |> Hmc.Source.Slice.pp source + |> Fmt.fmt ": " + |> Fmt.fmt msg + |> Fmt.fmt "\n" + end + end + include T + include Cmpable.Make(T) + + let init_token token msg = + {source=Scan.Token.source token; msg} + + let init_mal mal = + let open Hmc.Scan.AbstractToken.Rendition.Malformation in + {source=source mal; msg=description mal} + + let init_scanner scanner msg = + let cursor = Scan.cursor scanner in + let source = Hmc.Source.Slice.of_cursors ~base:cursor ~past:cursor in + {source; msg} +end + +type uident = + | Uident of {uident: Scan.Token.t} +and cident = + | Cident of {cident: Scan.Token.t} +and ident = + | IdentUident of {uident: uident} + | IdentCident of {cident: cident} + | IdentUscore of {uscore: Scan.Token.t} +and precs_tl = + | PrecsTlCommaUident of {comma: Scan.Token.t; uident: uident; precs_tl: precs_tl} + | PrecsTlEpsilon +and precs = + | Precs of {uident: uident; precs_tl: precs_tl} +and prec_rels = + | PrecRelsLtPrecs of {lt: Scan.Token.t; precs: precs} + | PrecRelsEpsilon +and prec_type = + | PrecTypePrec of {prec: Scan.Token.t} + | PrecTypeLeft of {left: Scan.Token.t} + | PrecTypeRight of {right: Scan.Token.t} +and prec = + | Prec of {prec_type: prec_type; uident: uident; prec_rels: prec_rels} +and of_type = + | OfType of {of_: Scan.Token.t; type_module: cident; dot: Scan.Token.t; type_type: uident} +and of_type0 = + | OfType0OfType of {of_type: of_type} + | OfType0Epsilon +and prec_ref = + | PrecRefPrecUident of {prec: Scan.Token.t; uident: uident} + | PrecRefEpsilon +and token_alias = + | TokenAlias of {alias: Scan.Token.t} + | TokenAliasEpsilon +and token = + | Token of {token: Scan.Token.t; cident: cident; token_alias: token_alias; of_type0: of_type0; + prec_ref: prec_ref} +and sep = + | SepLineDelim of {line_delim: Scan.Token.t} + | SepSemi of {semi: Scan.Token.t} + | SepBar of {bar: Scan.Token.t} +and codes_tl = + | CodesTlSepCode of {sep: sep; code: code; codes_tl: codes_tl} + | CodesTlEpsilon +and codes = + | Codes of {code: code; codes_tl: codes_tl} +and codes0 = + | Codes0Codes of {codes: codes} + | Codes0Epsilon +and delimited = + | DelimitedBlock of {indent: Scan.Token.t; codes: codes; dedent: Scan.Token.t} + | DelimitedParen of {lparen: Scan.Token.t; codes0: codes0; rparen: Scan.Token.t} + | DelimitedCapture of {lcapture: Scan.Token.t; codes0: codes0; rcapture: Scan.Token.t} + | DelimitedList of {lbrack: Scan.Token.t; codes0: codes0; rbrack: Scan.Token.t} + | DelimitedArray of {larray: Scan.Token.t; codes0: codes0; rarray: Scan.Token.t} + | DelimitedModule of {lcurly: Scan.Token.t; codes0: codes0; rcurly: Scan.Token.t} +and code_tl = + | CodeTlDelimited of {delimited: delimited; code_tl: code_tl} + | CodeTlToken of {token: Scan.Token.t; code_tl: code_tl} + | CodeTlEpsilon +and code = + | CodeDelimited of {delimited: delimited; code_tl: code_tl} + | CodeToken of {token: Scan.Token.t; code_tl: code_tl} +and prod_param_symbol = + | ProdParamSymbolCident of {cident: cident} + | ProdParamSymbolAlias of {alias: Scan.Token.t} +and prod_param_binding = + | 
ProdParamBindingIdentColon of {ident: ident; colon: Scan.Token.t} + | ProdParamBindingEpsilon +and prod_param = + | ProdParam of {prod_param_binding: prod_param_binding; prod_param_symbol: prod_param_symbol} +and prod_params_tl = + | ProdParamsTlProdParam of {prod_param: prod_param; prod_params_tl: prod_params_tl} + | ProdParamsTlEpsilon +and prod_params = + | ProdParamsProdParam of {prod_param: prod_param; prod_params_tl: prod_params_tl} +and prod_pattern = + | ProdPatternParams of {prod_params: prod_params} + | ProdPatternEpsilon of {epsilon: Scan.Token.t} +and prod = + | Prod of {prod_pattern: prod_pattern; prec_ref: prec_ref} +and prods_tl = + | ProdsTlBarProd of {bar: Scan.Token.t; prod: prod; prods_tl: prods_tl} + | ProdsTlEpsilon +and prods = + | ProdsBarProd of {bar: Scan.Token.t; prod: prod; prods_tl: prods_tl} + | ProdsProd of {prod: prod; prods_tl: prods_tl} +and reduction = + | Reduction of {prods: prods; arrow: Scan.Token.t; code: code} +and reductions_tl = + | ReductionsTlBarReduction of {bar: Scan.Token.t; reduction: reduction; + reductions_tl: reductions_tl} + | ReductionsTlEpsilon +and reductions = + | ReductionsBarReduction of {bar: Scan.Token.t; reduction: reduction; + reductions_tl: reductions_tl} + | ReductionsReduction of {reduction: reduction; reductions_tl: reductions_tl} +and nonterm_type = + | NontermTypeNonterm of {nonterm: Scan.Token.t} + | NontermTypeStart of {start: Scan.Token.t} +and nonterm = + | NontermProds of {nonterm_type: nonterm_type; cident: cident; prec_ref: prec_ref; + cce: Scan.Token.t; prods: prods} + | NontermReductions of {nonterm_type: nonterm_type; cident: cident; of_type: of_type; + prec_ref: prec_ref; cce: Scan.Token.t; reductions: reductions} +and stmt = + | StmtPrec of {prec: prec} + | StmtToken of {token: token} + | StmtNonterm of {nonterm: nonterm} + | StmtCode of {code: code} +and stmts_tl = + | StmtsTl of {line_delim: Scan.Token.t; stmt: stmt; stmts_tl: stmts_tl} + | StmtsTlEpsilon +and stmts = + | Stmts of {stmt: stmt; stmts_tl: stmts_tl} +and hocc = + | Hocc of {hocc: Scan.Token.t; indent: Scan.Token.t; stmts: stmts; dedent: Scan.Token.t} +and eoi = + | Eoi of {eoi: Scan.Token.t} +and matter = + | Matter of {token: Scan.Token.t; matter: matter} + | MatterEpsilon +and hmh = + | Hmh of {prelude: matter; hocc: hocc; postlude: matter; eoi: eoi} +and hmhi = + | Hmhi of {prelude: matter; hocc: Scan.Token.t; postlude: matter; eoi: eoi} + +(**************************************************************************************************) +(* source_of_* functions. *) + +(* Not to be confused with joining forces. *) +let join_sources source0_opt source1_opt = + match source0_opt, source1_opt with + | None, None -> None + | Some _, None -> source0_opt + | None, Some _ -> source1_opt + | Some source0, Some source1 -> begin + let open Hmc.Source in + let base0, past0 = Slice.cursors source0 in + let base1, past1 = Slice.cursors source1 in + let open Cmp in + let base = match Cursor.cmp base0 base1 with + | Lt + | Eq -> base0 + | Gt -> base1 + in + let past = match Cursor.cmp past0 past1 with + | Lt + | Eq -> past1 + | Gt -> past0 + in + Some (Slice.of_cursors ~base ~past) + end + +(* Not to be confused with a token force. 
*) +let token_source token = + Some (Scan.Token.source token) + +let rec source_of_uident = function + | Uident {uident} -> token_source uident + +and source_of_cident = function + | Cident {cident} -> token_source cident + +and source_of_ident = function + | IdentUident {uident} -> source_of_uident uident + | IdentCident {cident} -> source_of_cident cident + | IdentUscore {uscore} -> token_source uscore + +and source_of_precs_tl = function + | PrecsTlCommaUident {comma; uident; precs_tl} -> + token_source comma + |> join_sources (source_of_uident uident) + |> join_sources (source_of_precs_tl precs_tl) + | PrecsTlEpsilon -> None + +and source_of_precs = function + | Precs {uident; precs_tl} -> + source_of_uident uident + |> join_sources (source_of_precs_tl precs_tl) + +and source_of_prec_rels = function + | PrecRelsLtPrecs {lt; precs} -> + token_source lt + |> join_sources (source_of_precs precs) + | PrecRelsEpsilon -> None + +and source_of_prec_type = function + | PrecTypePrec {prec} -> token_source prec + | PrecTypeLeft {left} -> token_source left + | PrecTypeRight {right} -> token_source right + +and source_of_prec = function + | Prec {prec_type; uident; prec_rels} -> + source_of_prec_type prec_type + |> join_sources (source_of_uident uident) + |> join_sources (source_of_prec_rels prec_rels) + +and source_of_of_type = function + | OfType {of_; type_module=_; dot; type_type} -> + token_source of_ + |> join_sources (token_source dot) + |> join_sources (source_of_uident type_type) + +and source_of_of_type0 = function + | OfType0OfType {of_type} -> source_of_of_type of_type + | OfType0Epsilon -> None + +and source_of_prec_ref = function + | PrecRefPrecUident {prec; uident} -> + token_source prec + |> join_sources (source_of_uident uident) + | PrecRefEpsilon -> None + +and source_of_token_alias = function + | TokenAlias {alias} -> token_source alias + | TokenAliasEpsilon -> None + +and source_of_token = function + | Token {token; cident; token_alias; of_type0; prec_ref} -> + token_source token + |> join_sources (source_of_cident cident) + |> join_sources (source_of_token_alias token_alias) + |> join_sources (source_of_of_type0 of_type0) + |> join_sources (source_of_prec_ref prec_ref) + +and source_of_sep = function + | SepLineDelim {line_delim} -> token_source line_delim + | SepSemi {semi} -> token_source semi + | SepBar {bar} -> token_source bar + +and source_of_codes_tl = function + | CodesTlSepCode {sep; code; codes_tl} -> + source_of_sep sep + |> join_sources (source_of_code code) + |> join_sources (source_of_codes_tl codes_tl) + | CodesTlEpsilon -> None + +and source_of_codes = function + | Codes {code; codes_tl} -> + source_of_code code + |> join_sources (source_of_codes_tl codes_tl) + +and source_of_codes0 = function + | Codes0Codes {codes} -> source_of_codes codes + | Codes0Epsilon -> None + +and source_of_delimited = function + | DelimitedBlock {indent=ldelim; codes=_; dedent=rdelim} + | DelimitedParen {lparen=ldelim; codes0=_; rparen=rdelim} + | DelimitedCapture {lcapture=ldelim; codes0=_; rcapture=rdelim} + | DelimitedList {lbrack=ldelim; codes0=_; rbrack=rdelim} + | DelimitedArray {larray=ldelim; codes0=_; rarray=rdelim} + | DelimitedModule {lcurly=ldelim; codes0=_; rcurly=rdelim} -> + token_source ldelim + |> join_sources (token_source rdelim) + +and source_of_code_tl = function + | CodeTlDelimited {delimited; code_tl} -> + source_of_delimited delimited + |> join_sources (source_of_code_tl code_tl) + | CodeTlToken {token; code_tl} -> + token_source token + |> join_sources 
(source_of_code_tl code_tl) + | CodeTlEpsilon -> None + +and source_of_code = function + | CodeDelimited {delimited; code_tl} -> + source_of_delimited delimited + |> join_sources (source_of_code_tl code_tl) + | CodeToken {token; code_tl} -> + token_source token + |> join_sources (source_of_code_tl code_tl) + +and source_of_prod_param_symbol = function + | ProdParamSymbolCident {cident} -> source_of_cident cident + | ProdParamSymbolAlias {alias} -> token_source alias + +and source_of_prod_param_binding = function + | ProdParamBindingIdentColon {ident; colon} -> + source_of_ident ident + |> join_sources (token_source colon) + | ProdParamBindingEpsilon -> None + +and source_of_prod_param = function + | ProdParam {prod_param_binding; prod_param_symbol} -> + source_of_prod_param_binding prod_param_binding + |> join_sources (source_of_prod_param_symbol prod_param_symbol) + +and source_of_prod_params_tl = function + | ProdParamsTlProdParam {prod_param; prod_params_tl} -> + source_of_prod_param prod_param + |> join_sources (source_of_prod_params_tl prod_params_tl) + | ProdParamsTlEpsilon -> None + +and source_of_prod_params = function + | ProdParamsProdParam {prod_param; prod_params_tl} -> + source_of_prod_param prod_param + |> join_sources (source_of_prod_params_tl prod_params_tl) + +and source_of_prod_pattern = function + | ProdPatternParams {prod_params} -> source_of_prod_params prod_params + | ProdPatternEpsilon {epsilon} -> token_source epsilon + +and source_of_prod = function + | Prod {prod_pattern; prec_ref} -> + source_of_prod_pattern prod_pattern + |> join_sources (source_of_prec_ref prec_ref) + +and source_of_prods_tl = function + | ProdsTlBarProd {bar; prod; prods_tl} -> + token_source bar + |> join_sources (source_of_prod prod) + |> join_sources (source_of_prods_tl prods_tl) + | ProdsTlEpsilon -> None + +and source_of_prods = function + | ProdsBarProd {bar; prod; prods_tl} -> + token_source bar + |> join_sources (source_of_prod prod) + |> join_sources (source_of_prods_tl prods_tl) + | ProdsProd {prod; prods_tl} -> + source_of_prod prod + |> join_sources (source_of_prods_tl prods_tl) + +and source_of_reduction = function + | Reduction {prods; arrow=_; code} -> + source_of_prods prods + |> join_sources (source_of_code code) + +and source_of_reductions_tl = function + | ReductionsTlBarReduction {bar; reduction; reductions_tl} -> + token_source bar + |> join_sources (source_of_reduction reduction) + |> join_sources (source_of_reductions_tl reductions_tl) + | ReductionsTlEpsilon -> None + +and source_of_reductions = function + | ReductionsBarReduction {bar; reduction; reductions_tl} -> + token_source bar + |> join_sources (source_of_reduction reduction) + |> join_sources (source_of_reductions_tl reductions_tl) + | ReductionsReduction {reduction; reductions_tl} -> + source_of_reduction reduction + |> join_sources (source_of_reductions_tl reductions_tl) + +and source_of_nonterm_type = function + | NontermTypeNonterm {nonterm} -> token_source nonterm + | NontermTypeStart {start} -> token_source start + +and source_of_nonterm = function + | NontermProds {nonterm_type; cident=_; prec_ref=_; cce=_; prods} -> + source_of_nonterm_type nonterm_type + |> join_sources (source_of_prods prods) + | NontermReductions {nonterm_type; cident=_; of_type=_; prec_ref=_; cce=_; reductions} -> + source_of_nonterm_type nonterm_type + |> join_sources (source_of_reductions reductions) + +and source_of_stmt = function + | StmtPrec {prec} -> source_of_prec prec + | StmtToken {token} -> source_of_token token + | 
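+(* Each source_of_* function computes the source extent of a syntax tree node by folding
+ * the optional extents of its children through join_sources, which keeps the earliest
+ * base cursor and the latest past cursor; the *Epsilon variants carry no tokens and
+ * therefore yield None. For example, a Precs node's extent runs from its leading uident
+ * through the last uident reachable via its precs_tl chain. *)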
StmtNonterm {nonterm} -> source_of_nonterm nonterm + | StmtCode {code} -> source_of_code code + +and source_of_stmts_tl = function + | StmtsTl {line_delim; stmt; stmts_tl} -> + token_source line_delim + |> join_sources (source_of_stmt stmt) + |> join_sources (source_of_stmts_tl stmts_tl) + | StmtsTlEpsilon -> None + +and source_of_stmts = function + | Stmts {stmt; stmts_tl} -> + source_of_stmt stmt + |> join_sources (source_of_stmts_tl stmts_tl) + +and source_of_hocc = function + | Hocc {hocc; indent=_; stmts=_; dedent} -> + token_source hocc + |> join_sources (token_source dedent) + +and source_of_eoi = function + | Eoi {eoi} -> token_source eoi + +and source_of_matter = function + | Matter {token; matter} -> + token_source token + |> join_sources (source_of_matter matter) + | MatterEpsilon -> None + +and source_of_hmh = function + | Hmh {prelude; hocc; postlude=_; eoi} -> + source_of_matter prelude + |> join_sources (source_of_hocc hocc) + |> join_sources (source_of_eoi eoi) + +and source_of_hmhi = function + | Hmhi {prelude; hocc; postlude=_; eoi} -> + source_of_matter prelude + |> join_sources (token_source hocc) + |> join_sources (source_of_eoi eoi) + +(**************************************************************************************************) +(* fmt_* functions. *) + +let fmt_lcurly ~alt ~width formatter = + match alt with + | false -> formatter |> Fmt.fmt "{" + | true -> + formatter + |> Fmt.fmt "{\n" + |> Fmt.fmt ~pad:" " ~just:Fmt.Left ~width:(width + 4L) "" + +let fmt_semi ~alt ~width formatter = + match alt with + | false -> formatter |> Fmt.fmt "; " + | true -> + formatter + |> Fmt.fmt "\n" + |> Fmt.fmt ~pad:" " ~just:Fmt.Left ~width:(width + 4L) "" + +let fmt_rcurly ~alt ~width formatter = + match alt with + | false -> formatter |> Fmt.fmt "}" + | true -> + formatter + |> Fmt.fmt "\n" + |> Fmt.fmt ~pad:" " ~just:Fmt.Left ~width:(width + 2L) "" + |> Fmt.fmt "}" + +let rec fmt_uident ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) uident formatter = + match uident with + | Uident {uident} -> + formatter + |> Fmt.fmt "Uident " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "uident=" |> Scan.Token.pp uident + |> fmt_rcurly ~alt ~width +and pp_uident uident formatter = + fmt_uident uident formatter + +let rec fmt_cident ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) cident formatter = + match cident with + | Cident {cident} -> + formatter + |> Fmt.fmt "Cident " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "cident=" |> Scan.Token.pp cident + |> fmt_rcurly ~alt ~width +and pp_cident cident formatter = + fmt_cident cident formatter + +and fmt_ident ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) ident formatter = + match ident with + | IdentUident {uident} -> + formatter |> Fmt.fmt "IdentUident " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "uident=" |> pp_uident uident + |> fmt_rcurly ~alt ~width + | IdentCident {cident} -> + formatter |> Fmt.fmt "IdentCident " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "cident=" |> pp_cident cident + |> fmt_rcurly ~alt ~width + | IdentUscore {uscore} -> + formatter |> Fmt.fmt "IdentUscore " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "uscore=" |> Scan.Token.pp uscore + |> fmt_rcurly ~alt ~width +and pp_ident ident formatter = + fmt_ident ident formatter + +and fmt_precs_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) precs_tl + formatter = + let width' = width + 4L in + match precs_tl with + | PrecsTlCommaUident {comma; uident; precs_tl} -> + formatter |> Fmt.fmt "PrecsTlCommaUident " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "comma=" |> 
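+(* The fmt_*/pp_* functions render syntax tree nodes, e.g. for the parser trace output
+ * further below. With the default ~alt:false they produce a compact single-line form;
+ * with ~alt:true, fmt_lcurly, fmt_semi, and fmt_rcurly switch to a multi-line layout whose
+ * indentation is driven by ~width, which grows by four columns per nesting level (width').
+ * The pp_* wrappers simply apply the defaults. As an illustration (with <tok> standing for
+ * Scan.Token.pp output), a PrecTypePrec node renders compactly as
+ *   PrecTypePrec {prec=<tok>}
+ * and with ~alt:true as
+ *   PrecTypePrec {
+ *       prec=<tok>
+ *     }
+ *)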
Scan.Token.pp comma
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "uident=" |> fmt_uident ~alt ~width:width' uident
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "precs_tl=" |> fmt_precs_tl ~alt ~width:width' precs_tl
+ |> fmt_rcurly ~alt ~width
+ | PrecsTlEpsilon ->
+ formatter |> Fmt.fmt "PrecsTlEpsilon"
+and pp_precs_tl precs_tl formatter =
+ fmt_precs_tl precs_tl formatter
+
+and fmt_precs ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) precs formatter =
+ let width' = width + 4L in
+ match precs with
+ | Precs {uident; precs_tl} ->
+ formatter |> Fmt.fmt "Precs "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "uident=" |> fmt_uident ~alt ~width:width' uident
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "precs_tl=" |> fmt_precs_tl ~alt ~width:width' precs_tl
+ |> fmt_rcurly ~alt ~width
+and pp_precs precs formatter =
+ fmt_precs precs formatter
+
+and fmt_prec_rels ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prec_rels formatter =
+ let width' = width + 4L in
+ match prec_rels with
+ | PrecRelsLtPrecs {lt; precs} ->
+ formatter |> Fmt.fmt "PrecRelsLtPrecs "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "lt=" |> Scan.Token.pp lt
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "precs=" |> fmt_precs ~alt ~width:width' precs
+ |> fmt_rcurly ~alt ~width
+ | PrecRelsEpsilon ->
+ formatter |> Fmt.fmt "PrecRelsEpsilon"
+and pp_prec_rels prec_rels formatter =
+ fmt_prec_rels prec_rels formatter
+
+and fmt_prec_type ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prec_type formatter =
+ match prec_type with
+ | PrecTypePrec {prec} ->
+ formatter |> Fmt.fmt "PrecTypePrec "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "prec=" |> Scan.Token.pp prec
+ |> fmt_rcurly ~alt ~width
+ | PrecTypeLeft {left} ->
+ formatter |> Fmt.fmt "PrecTypeLeft "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "left=" |> Scan.Token.pp left
+ |> fmt_rcurly ~alt ~width
+ | PrecTypeRight {right} ->
+ formatter |> Fmt.fmt "PrecTypeRight "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "right=" |> Scan.Token.pp right
+ |> fmt_rcurly ~alt ~width
+and pp_prec_type prec_type formatter =
+ fmt_prec_type prec_type formatter
+
+and fmt_prec ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prec formatter =
+ let width' = width + 4L in
+ match prec with
+ | Prec {prec_type; uident; prec_rels} ->
+ formatter |> Fmt.fmt "Prec "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "prec_type=" |> fmt_prec_type ~alt ~width:width' prec_type
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "uident=" |> fmt_uident ~alt ~width:width' uident
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "prec_rels=" |> fmt_prec_rels ~alt ~width:width' prec_rels
+ |> fmt_rcurly ~alt ~width
+and pp_prec prec formatter =
+ fmt_prec prec formatter
+
+and fmt_of_type ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) of_type formatter =
+ let width' = width + 4L in
+ match of_type with
+ | OfType {of_; type_module; dot; type_type} ->
+ formatter |> Fmt.fmt "OfType "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "of_=" |> Scan.Token.pp of_
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "type_module=" |> fmt_cident ~alt ~width:width' type_module
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "dot=" |> Scan.Token.pp dot
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "type_type=" |> fmt_uident ~alt ~width:width' type_type
+ |> fmt_rcurly ~alt ~width
+and pp_of_type of_type formatter =
+ fmt_of_type of_type formatter
+
+and fmt_of_type0 ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) of_type0 formatter =
+ let width' = width + 4L in
+ match of_type0 with
+ | OfType0OfType {of_type} ->
+ formatter |> Fmt.fmt "OfType0OfType "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "of_type=" |> fmt_of_type ~alt ~width:width' of_type
+ |> fmt_rcurly ~alt ~width
+ | OfType0Epsilon ->
+ formatter |> Fmt.fmt "OfType0Epsilon"
+and pp_of_type0 of_type0 formatter =
+ fmt_of_type0 of_type0 formatter
+
+and fmt_prec_ref ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prec_ref formatter =
+ let width' = width + 4L in
+ match prec_ref with
+ | PrecRefPrecUident {prec; uident} ->
+ formatter |> Fmt.fmt "PrecRefPrecUident "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "prec=" |> Scan.Token.pp prec
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "uident=" |> fmt_uident ~alt ~width:width' uident
+ |> fmt_rcurly ~alt ~width
+ | PrecRefEpsilon ->
+ formatter |> Fmt.fmt "PrecRefEpsilon"
+and pp_prec_ref prec_ref formatter =
+ fmt_prec_ref prec_ref formatter
+
+and fmt_token_alias ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) token_alias formatter =
+ match token_alias with
+ | TokenAlias {alias} ->
+ formatter |> Fmt.fmt "TokenAlias "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "alias=" |> Scan.Token.pp alias
+ |> fmt_rcurly ~alt ~width
+ | TokenAliasEpsilon ->
+ formatter |> Fmt.fmt "TokenAliasEpsilon"
+and pp_token_alias token_alias formatter =
+ fmt_token_alias token_alias formatter
+
+and fmt_token ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) token formatter =
+ let width' = width + 4L in
+ match token with
+ | Token {token; cident; token_alias; of_type0; prec_ref} ->
+ formatter |> Fmt.fmt "Token "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "token=" |> Scan.Token.pp token
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "cident=" |> fmt_cident ~alt ~width:width' cident
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "token_alias=" |> fmt_token_alias ~alt ~width:width' token_alias
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "of_type0=" |> fmt_of_type0 ~alt ~width:width' of_type0
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "prec_ref=" |> fmt_prec_ref ~alt ~width:width' prec_ref
+ |> fmt_rcurly ~alt ~width
+and pp_token token formatter =
+ fmt_token token formatter
+
+and fmt_sep ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) sep formatter =
+ match sep with
+ | SepLineDelim {line_delim} ->
+ formatter |> Fmt.fmt "SepLineDelim "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "line_delim=" |> Scan.Token.pp line_delim
+ |> fmt_rcurly ~alt ~width
+ | SepSemi {semi} ->
+ formatter |> Fmt.fmt "SepSemi "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "semi=" |> Scan.Token.pp semi
+ |> fmt_rcurly ~alt ~width
+ | SepBar {bar} ->
+ formatter |> Fmt.fmt "SepBar "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "bar=" |> Scan.Token.pp bar
+ |> fmt_rcurly ~alt ~width
+and pp_sep sep formatter =
+ fmt_sep sep formatter
+
+and fmt_codes_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) codes_tl formatter =
+ let width' = width + 4L in
+ match codes_tl with
+ | CodesTlSepCode {sep; code; codes_tl} ->
+ formatter |> Fmt.fmt "CodesTlSepCode "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "sep=" |> fmt_sep ~alt ~width:width' sep
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "code=" |> fmt_code ~alt ~width:width' code
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes_tl=" |> fmt_codes_tl ~alt ~width:width' codes_tl
+ |> fmt_rcurly ~alt ~width
+ | CodesTlEpsilon -> formatter |> Fmt.fmt "CodesTlEpsilon"
+and pp_codes_tl codes_tl formatter =
+ fmt_codes_tl codes_tl formatter
+
+and fmt_codes ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) codes formatter =
+ let width' = width + 4L in
+ match codes with
+ | Codes {code; codes_tl} ->
+ formatter |> Fmt.fmt "Codes "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "code=" |> fmt_code ~alt ~width:width' code
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes_tl=" |> fmt_codes_tl ~alt ~width:width' codes_tl
+ |> fmt_rcurly ~alt ~width
+and pp_codes codes formatter =
+ fmt_codes codes formatter
+
+and fmt_codes0 ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) codes0 formatter =
+ let width' = width + 4L in
+ match codes0 with
+ | Codes0Codes {codes} ->
+ formatter |> Fmt.fmt "Codes0Codes "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "codes=" |> fmt_codes ~alt ~width:width' codes
+ |> fmt_rcurly ~alt ~width
+ | Codes0Epsilon ->
+ formatter |> Fmt.fmt "Codes0Epsilon"
+and pp_codes0 codes0 formatter =
+ fmt_codes0 codes0 formatter
+
+and fmt_delimited ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) delimited formatter =
+ let width' = width + 4L in
+ match delimited with
+ | DelimitedBlock {indent; codes; dedent} ->
+ formatter |> Fmt.fmt "DelimitedBlock "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "indent=" |> Scan.Token.pp indent
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes=" |> fmt_codes ~alt ~width:width' codes
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "dedent=" |> Scan.Token.pp dedent
+ |> fmt_rcurly ~alt ~width
+ | DelimitedParen {lparen; codes0; rparen} ->
+ formatter |> Fmt.fmt "DelimitedParen "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "lparen=" |> Scan.Token.pp lparen
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes0=" |> fmt_codes0 ~alt ~width:width' codes0
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "rparen=" |> Scan.Token.pp rparen
+ |> fmt_rcurly ~alt ~width
+ | DelimitedCapture {lcapture; codes0; rcapture} ->
+ formatter |> Fmt.fmt "DelimitedCapture "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "lcapture=" |> Scan.Token.pp lcapture
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes0=" |> fmt_codes0 ~alt ~width:width' codes0
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "rcapture=" |> Scan.Token.pp rcapture
+ |> fmt_rcurly ~alt ~width
+ | DelimitedList {lbrack; codes0; rbrack} ->
+ formatter |> Fmt.fmt "DelimitedList "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "lbrack=" |> Scan.Token.pp lbrack
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes0=" |> fmt_codes0 ~alt ~width:width' codes0
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "rbrack=" |> Scan.Token.pp rbrack
+ |> fmt_rcurly ~alt ~width
+ | DelimitedArray {larray; codes0; rarray} ->
+ formatter |> Fmt.fmt "DelimitedArray "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "larray=" |> Scan.Token.pp larray
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes0=" |> fmt_codes0 ~alt ~width:width' codes0
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "rarray=" |> Scan.Token.pp rarray
+ |> fmt_rcurly ~alt ~width
+ | DelimitedModule {lcurly; codes0; rcurly} ->
+ formatter |> Fmt.fmt "DelimitedModule "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "lcurly=" |> Scan.Token.pp lcurly
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "codes0=" |> fmt_codes0 ~alt ~width:width' codes0
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "rcurly=" |> Scan.Token.pp rcurly
+ |> fmt_rcurly ~alt ~width
+and pp_delimited delimited formatter =
+ fmt_delimited delimited formatter
+
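+(* The delimited/code_tl/code formatters (and the corresponding parsing functions further
+ * below) reflect how embedded Hemlock code is carried through the parse: a code fragment
+ * is an uninterpreted run of tokens and recursively matched delimiter pairs (indent/dedent,
+ * parens, captures, brackets, arrays, curly braces; see the Tok_l*/Tok_r* cases in
+ * delimited), terminated by a dedent, closing delimiter, line delimiter, ';', or '|'.
+ * For instance, a reduction body written as (x + 1) followed by a line delimiter parses to
+ * CodeDelimited {delimited=DelimitedParen {...}; code_tl=CodeTlEpsilon}. *)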
+and fmt_code_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) code_tl formatter =
+ let width' = width + 4L in
+ match code_tl with
+ | CodeTlDelimited {delimited; code_tl} ->
+ formatter |> Fmt.fmt "CodeTlDelimited "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "delimited=" |> fmt_delimited ~alt ~width:width' delimited
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "code_tl=" |> fmt_code_tl ~alt ~width:width' code_tl
+ |> fmt_rcurly ~alt ~width
+ | CodeTlToken {token; code_tl} ->
+ formatter |> Fmt.fmt "CodeTlToken "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "token=" |> Scan.Token.pp token
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "code_tl=" |> fmt_code_tl ~alt ~width:width' code_tl
+ |> fmt_rcurly ~alt ~width
+ | CodeTlEpsilon ->
+ formatter |> Fmt.fmt "CodeTlEpsilon"
+and pp_code_tl code_tl formatter =
+ fmt_code_tl code_tl formatter
+
+and fmt_code ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) code formatter =
+ let width' = width + 4L in
+ match code with
+ | CodeDelimited {delimited; code_tl} ->
+ formatter |> Fmt.fmt "CodeDelimited "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "delimited=" |> fmt_delimited ~alt ~width:width' delimited
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "code_tl=" |> fmt_code_tl ~alt ~width:width' code_tl
+ |> fmt_rcurly ~alt ~width
+ | CodeToken {token; code_tl} ->
+ formatter |> Fmt.fmt "CodeToken "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "token=" |> Scan.Token.pp token
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "code_tl=" |> fmt_code_tl ~alt ~width:width' code_tl
+ |> fmt_rcurly ~alt ~width
+and pp_code code formatter =
+ fmt_code code formatter
+
+and fmt_prod_param_symbol ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod_param_symbol
+ formatter =
+ let width' = width + 4L in
+ match prod_param_symbol with
+ | ProdParamSymbolCident {cident} ->
+ formatter |> Fmt.fmt "ProdParamSymbolCident "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "cident=" |> fmt_cident ~alt ~width:width' cident
+ |> fmt_rcurly ~alt ~width
+ | ProdParamSymbolAlias {alias} ->
+ formatter |> Fmt.fmt "ProdParamSymbolAlias "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "alias=" |> Scan.Token.pp alias
+ |> fmt_rcurly ~alt ~width
+and pp_prod_param_symbol prod_param_symbol formatter =
+ fmt_prod_param_symbol prod_param_symbol formatter
+
+and fmt_prod_param_binding ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod_param_binding
+ formatter =
+ let width' = width + 4L in
+ match prod_param_binding with
+ | ProdParamBindingIdentColon {ident; colon} ->
+ formatter |> Fmt.fmt "ProdParamBindingIdentColon "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "ident=" |> fmt_ident ~alt ~width:width' ident
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "colon=" |> Scan.Token.pp colon
+ |> fmt_rcurly ~alt ~width
+ | ProdParamBindingEpsilon ->
+ formatter |> Fmt.fmt "ProdParamBindingEpsilon"
+and pp_prod_param_binding prod_param_binding formatter =
+ fmt_prod_param_binding prod_param_binding formatter
+
+and fmt_prod_param ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod_param formatter =
+ let width' = width + 4L in
+ match prod_param with
+ | ProdParam {prod_param_binding; prod_param_symbol} ->
+ formatter |> Fmt.fmt "ProdParam "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "prod_param_binding=" |> fmt_prod_param_binding ~alt ~width:width' prod_param_binding
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "prod_param_symbol=" |> fmt_prod_param_symbol ~alt ~width:width' prod_param_symbol
+ |> fmt_rcurly ~alt ~width
+and pp_prod_param prod_param formatter =
+ fmt_prod_param prod_param formatter
+
+and fmt_prod_params_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod_params_tl formatter =
+ let width' = width + 4L in
+ match prod_params_tl with
+ | ProdParamsTlProdParam {prod_param; prod_params_tl} ->
+ formatter |> Fmt.fmt "ProdParamsTlProdParam "
+ |> fmt_lcurly ~alt ~width
+ |> Fmt.fmt "prod_param=" |> fmt_prod_param ~alt ~width:width' prod_param
+ |> fmt_semi ~alt ~width
+ |> Fmt.fmt "prod_params_tl=" |> fmt_prod_params_tl ~alt ~width:width' prod_params_tl
+ |> fmt_rcurly ~alt ~width
+ | ProdParamsTlEpsilon ->
+ formatter |> Fmt.fmt "ProdParamsTlEpsilon"
+and pp_prod_params_tl prod_params_tl formatter = +
fmt_prod_params_tl prod_params_tl formatter + +and fmt_prod_params ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod_params formatter = + let width' = width + 4L in + match prod_params with + | ProdParamsProdParam {prod_param; prod_params_tl} -> + formatter |> Fmt.fmt "ProdParamsProdParam " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prod_param=" |> fmt_prod_param ~alt ~width:width' prod_param + |> fmt_semi ~alt ~width + |> Fmt.fmt "prod_params_tl=" |> fmt_prod_params_tl ~alt ~width:width' prod_params_tl + |> fmt_rcurly ~alt ~width +and pp_prod_params prod_params formatter = + fmt_prod_params prod_params formatter + +and fmt_prod_pattern ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod_pattern formatter = + let width' = width + 4L in + match prod_pattern with + | ProdPatternParams {prod_params} -> + formatter |> Fmt.fmt "ProdPatternParams " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prod_params=" |> fmt_prod_params ~alt ~width:width' prod_params + |> fmt_rcurly ~alt ~width + | ProdPatternEpsilon {epsilon} -> + formatter |> Fmt.fmt "ProdPatternEpsilon " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "epsilon=" |> Scan.Token.pp epsilon + |> fmt_rcurly ~alt ~width +and pp_prod_pattern prod_pattern formatter = + fmt_prod_pattern prod_pattern formatter + +and fmt_prod ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prod formatter = + let width' = width + 4L in + match prod with + | Prod {prod_pattern; prec_ref} -> + formatter |> Fmt.fmt "Prod " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prod_pattern=" |> fmt_prod_pattern ~alt ~width:width' prod_pattern + |> fmt_semi ~alt ~width + |> Fmt.fmt "prec_ref=" |> fmt_prec_ref ~alt ~width:width' prec_ref + |> fmt_rcurly ~alt ~width +and pp_prod prod formatter = + fmt_prod prod formatter + +and fmt_prods_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prods_tl formatter = + let width' = width + 4L in + match prods_tl with + | ProdsTlBarProd {bar; prod; prods_tl} -> + formatter |> Fmt.fmt "ProdsTlBarProd " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "bar=" |> Scan.Token.pp bar + |> fmt_semi ~alt ~width + |> Fmt.fmt "prod=" |> fmt_prod ~alt ~width:width' prod + |> fmt_semi ~alt ~width + |> Fmt.fmt "prods_tl=" |> fmt_prods_tl ~alt ~width:width' prods_tl + |> fmt_rcurly ~alt ~width + | ProdsTlEpsilon -> + formatter |> Fmt.fmt "ProdsTlEpsilon" +and pp_prods_tl prods_tl formatter = + fmt_prods_tl prods_tl formatter + +and fmt_prods ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) prods formatter = + let width' = width + 4L in + match prods with + | ProdsBarProd {bar; prod; prods_tl} -> + formatter |> Fmt.fmt "ProdsBarProd " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "bar=" |> Scan.Token.pp bar + |> fmt_semi ~alt ~width + |> Fmt.fmt "prod=" |> fmt_prod ~alt ~width:width' prod + |> fmt_semi ~alt ~width + |> Fmt.fmt "prods_tl=" |> fmt_prods_tl ~alt ~width:width' prods_tl + |> fmt_rcurly ~alt ~width + | ProdsProd {prod; prods_tl} -> + formatter |> Fmt.fmt "ProdsProd " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prod=" |> fmt_prod ~alt ~width:width' prod + |> fmt_semi ~alt ~width + |> Fmt.fmt "prods_tl=" |> fmt_prods_tl ~alt ~width:width' prods_tl + |> fmt_rcurly ~alt ~width +and pp_prods prods formatter = + fmt_prods prods formatter + +and fmt_reduction ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) reduction formatter = + let width' = width + 4L in + match reduction with + | Reduction {prods; arrow; code} -> + formatter |> Fmt.fmt "Reduction " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prods=" |> fmt_prods ~alt ~width:width' prods + |> fmt_semi ~alt ~width + 
|> Fmt.fmt "arrow=" |> Scan.Token.pp arrow + |> fmt_semi ~alt ~width + |> Fmt.fmt "code=" |> pp_code code + |> fmt_rcurly ~alt ~width +and pp_reduction reduction formatter = + fmt_reduction reduction formatter + +and fmt_reductions_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) reductions_tl formatter = + let width' = width + 4L in + match reductions_tl with + | ReductionsTlBarReduction {bar; reduction; reductions_tl} -> + formatter |> Fmt.fmt "ReductionsTlBarReduction " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "bar=" |> Scan.Token.pp bar + |> fmt_semi ~alt ~width + |> Fmt.fmt "reduction=" |> fmt_reduction ~alt ~width:width' reduction + |> fmt_semi ~alt ~width + |> Fmt.fmt "reductions_tl=" |> fmt_reductions_tl ~alt ~width:width' reductions_tl + |> fmt_rcurly ~alt ~width + | ReductionsTlEpsilon -> + formatter |> Fmt.fmt "ReductionsTlEpsilon" +and pp_reductions_tl reductions_tl formatter = + fmt_reductions_tl reductions_tl formatter + +and fmt_reductions ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) reductions formatter = + let width' = width + 4L in + match reductions with + | ReductionsBarReduction {bar; reduction; reductions_tl} -> + formatter |> Fmt.fmt "ReductionsBarReduction " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "bar=" |> Scan.Token.pp bar + |> fmt_semi ~alt ~width + |> Fmt.fmt "reduction=" |> fmt_reduction ~alt ~width:width' reduction + |> fmt_semi ~alt ~width + |> Fmt.fmt "reductions_tl=" |> fmt_reductions_tl ~alt ~width:width' reductions_tl + |> fmt_rcurly ~alt ~width + | ReductionsReduction {reduction; reductions_tl} -> + formatter |> Fmt.fmt "ReductionsReduction " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "reduction=" |> fmt_reduction ~alt ~width:width' reduction + |> fmt_semi ~alt ~width + |> Fmt.fmt "reductions_tl=" |> fmt_reductions_tl ~alt ~width:width' reductions_tl + |> fmt_rcurly ~alt ~width +and pp_reductions reductions formatter = + fmt_reductions reductions formatter + +and fmt_nonterm_type ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) nonterm_type formatter = + match nonterm_type with + | NontermTypeNonterm {nonterm} -> + formatter |> Fmt.fmt "NontermTypeNonterm " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "nonterm=" |> Scan.Token.pp nonterm + |> fmt_rcurly ~alt ~width + | NontermTypeStart {start} -> + formatter |> Fmt.fmt "NontermTypeStart " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "start=" |> Scan.Token.pp start + |> fmt_rcurly ~alt ~width +and pp_nonterm_type nonterm_type formatter = + fmt_nonterm_type nonterm_type formatter + +and fmt_nonterm ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) nonterm formatter = + let width' = width + 4L in + match nonterm with + | NontermProds {nonterm_type; cident; prec_ref; cce; prods} -> + formatter |> Fmt.fmt "NontermProds " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "nonterm_type=" |> fmt_nonterm_type ~alt ~width:width' nonterm_type + |> fmt_semi ~alt ~width + |> Fmt.fmt "cident=" |> fmt_cident ~alt ~width:width' cident + |> fmt_semi ~alt ~width + |> Fmt.fmt "prec_ref=" |> fmt_prec_ref ~alt ~width:width' prec_ref + |> fmt_semi ~alt ~width + |> Fmt.fmt "cce=" |> Scan.Token.pp cce + |> fmt_semi ~alt ~width + |> Fmt.fmt "prods=" |> fmt_prods ~alt ~width:width' prods + |> fmt_rcurly ~alt ~width + | NontermReductions {nonterm_type; cident; of_type; prec_ref; cce; reductions} -> + formatter |> Fmt.fmt "NontermReductions " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "nonterm_type=" |> fmt_nonterm_type ~alt ~width:width' nonterm_type + |> fmt_semi ~alt ~width + |> Fmt.fmt "cident=" |> fmt_cident ~alt ~width:width' cident + 
|> fmt_semi ~alt ~width + |> Fmt.fmt "of_type=" |> fmt_of_type ~alt ~width:width' of_type + |> fmt_semi ~alt ~width + |> Fmt.fmt "prec_ref=" |> fmt_prec_ref ~alt ~width:width' prec_ref + |> fmt_semi ~alt ~width + |> Fmt.fmt "cce=" |> Scan.Token.pp cce + |> fmt_semi ~alt ~width + |> Fmt.fmt "reductions=" |> fmt_reductions ~alt ~width:width' reductions + |> fmt_rcurly ~alt ~width +and pp_nonterm nonterm formatter = + fmt_nonterm nonterm formatter + +and fmt_stmt ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) stmt formatter = + let width' = width + 4L in + match stmt with + | StmtPrec {prec} -> + formatter |> Fmt.fmt "StmtPrec " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prec=" |> fmt_prec ~alt ~width:width' prec + |> fmt_rcurly ~alt ~width + | StmtToken {token} -> + formatter |> Fmt.fmt "StmtToken " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "token=" |> fmt_token ~alt ~width:width' token + |> fmt_rcurly ~alt ~width + | StmtNonterm {nonterm} -> + formatter |> Fmt.fmt "StmtNonterm " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "nonterm=" |> fmt_nonterm ~alt ~width:width' nonterm + |> fmt_rcurly ~alt ~width + | StmtCode {code} -> + formatter |> Fmt.fmt "StmtCode " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "code=" |> fmt_code ~alt ~width:width' code + |> fmt_rcurly ~alt ~width +and pp_stmt stmt formatter = + fmt_stmt stmt formatter + +and fmt_stmts_tl ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) stmts_tl formatter = + let width' = width + 4L in + match stmts_tl with + | StmtsTl {line_delim; stmt; stmts_tl} -> + formatter |> Fmt.fmt "StmtsTl " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "line_delim=" |> Scan.Token.pp line_delim + |> fmt_semi ~alt ~width + |> Fmt.fmt "stmt=" |> fmt_stmt ~alt ~width:width' stmt + |> fmt_semi ~alt ~width + |> Fmt.fmt "stmts_tl=" |> fmt_stmts_tl ~alt ~width:width' stmts_tl + |> fmt_rcurly ~alt ~width + | StmtsTlEpsilon -> + formatter |> Fmt.fmt "StmtsTlEpsilon" +and pp_stmts_tl stmts_tl formatter = + fmt_stmts_tl stmts_tl formatter + +and fmt_stmts ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) stmts formatter = + let width' = width + 4L in + match stmts with + | Stmts {stmt; stmts_tl} -> + formatter |> Fmt.fmt "Stmts " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "stmt=" |> fmt_stmt ~alt ~width:width' stmt + |> fmt_semi ~alt ~width + |> Fmt.fmt "stmts_tl=" |> fmt_stmts_tl ~alt ~width:width' stmts_tl + |> fmt_rcurly ~alt ~width +and pp_stmts stmts formatter = + fmt_stmts stmts formatter + +and fmt_hocc ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) hocc formatter = + let width' = width + 4L in + match hocc with + | Hocc {hocc; indent; stmts; dedent} -> + formatter |> Fmt.fmt "Hocc " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "hocc=" |> Scan.Token.pp hocc + |> fmt_semi ~alt ~width + |> Fmt.fmt "indent=" |> Scan.Token.pp indent + |> fmt_semi ~alt ~width + |> Fmt.fmt "stmts=" |> fmt_stmts ~alt ~width:width' stmts + |> fmt_semi ~alt ~width + |> Fmt.fmt "dedent=" |> Scan.Token.pp dedent + |> fmt_rcurly ~alt ~width +and pp_hocc hocc formatter = + fmt_hocc hocc formatter + +and fmt_eoi ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) eoi formatter = + match eoi with + | Eoi {eoi} -> + formatter |> Fmt.fmt "Eoi " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "eoi=" |> Scan.Token.pp eoi + |> fmt_rcurly ~alt ~width +and pp_eoi eoi formatter = + fmt_eoi eoi formatter + +and fmt_matter ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) matter formatter = + let width' = width + 4L in + match matter with + | Matter {token; matter} -> + formatter |> Fmt.fmt "Matter " + |> fmt_lcurly ~alt 
~width + |> Fmt.fmt "token=" |> Scan.Token.pp token + |> fmt_semi ~alt ~width + |> Fmt.fmt "matter=" |> fmt_matter ~alt ~width:width' matter + |> fmt_rcurly ~alt ~width + | MatterEpsilon -> + formatter |> Fmt.fmt "MatterEpsilon" +and pp_matter matter formatter = + fmt_matter matter formatter + +and fmt_hmh ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) hmh formatter = + let width' = width + 4L in + match hmh with + | Hmh {prelude; hocc; postlude; eoi} -> + formatter |> Fmt.fmt "Hmh " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prelude=" |> pp_matter prelude + |> fmt_semi ~alt ~width + |> Fmt.fmt "hocc=" |> fmt_hocc ~alt ~width:width' hocc + |> fmt_semi ~alt ~width + |> Fmt.fmt "postlude=" |> pp_matter postlude + |> fmt_semi ~alt ~width + |> Fmt.fmt "eoi=" |> fmt_eoi ~alt ~width:width' eoi + |> fmt_rcurly ~alt ~width +and pp_hmh hmh formatter = + fmt_hmh hmh formatter + +and fmt_hmhi ?(alt=Fmt.alt_default) ?(width=Fmt.width_default) hmhi formatter = + let width' = width + 4L in + match hmhi with + | Hmhi {prelude; hocc; postlude; eoi} -> + formatter |> Fmt.fmt "Hmhi " + |> fmt_lcurly ~alt ~width + |> Fmt.fmt "prelude=" |> pp_matter prelude + |> fmt_semi ~alt ~width + |> Fmt.fmt "hocc=" |> Scan.Token.pp hocc + |> fmt_semi ~alt ~width + |> Fmt.fmt "postlude=" |> pp_matter postlude + |> fmt_semi ~alt ~width + |> Fmt.fmt "eoi=" |> fmt_eoi ~alt ~width:width' eoi + |> fmt_rcurly ~alt ~width +and pp_hmhi hmhi formatter = + fmt_hmhi hmhi formatter + +(**************************************************************************************************) +(* Recursive descent parser. *) + +let trace = false + +type ctx = { + scanner: Scan.t; + errs: Error.t list; +} + +let pp_ctx {scanner; errs} formatter = + formatter + |> Fmt.fmt "{scanner=" |> Scan.pp scanner + |> Fmt.fmt "; errs=" |> (List.pp Error.pp) errs + |> Fmt.fmt "}" + +let rec next ?(all=false) spine ({scanner; errs} as ctx) = + let scanner', tok = Scan.next scanner in + let _ = if trace then + File.Fmt.stderr + |> Fmt.fmt "hocc (trace): next ~all:" |> Bool.pp all + |> Fmt.fmt " " |> (List.pp String.pp) (List.rev spine) + |> Fmt.fmt " " |> pp_ctx ctx |> ignore + in + let errs' = List.fold (Scan.Token.malformations tok) ~init:errs ~f:(fun accum mal -> + Error.init_mal mal :: accum) in + let ctx' = {scanner=scanner'; errs=errs'} in + match all, tok with + | _, HmcToken {atok=Tok_whitespace; _} + | false, HmcToken {atok=(Tok_hash_comment|Tok_paren_comment _); _} -> begin + let _ = if trace then + File.Fmt.stderr |> Fmt.fmt " -> recurse (" |> Scan.Token.pp tok |> Fmt.fmt ")\n" |> ignore + in + next ~all spine ctx' + end + | _ -> begin + let _ = if trace then + File.Fmt.stderr |> Fmt.fmt " -> " |> Scan.Token.pp tok |> Fmt.fmt "\n" |> ignore in + ctx', tok + end + +let err msg {scanner; errs} = + {scanner; errs=(Error.init_scanner scanner msg) :: errs} + +let err_token tok msg {scanner; errs} = + {scanner; errs=(Error.init_token tok msg) :: errs} + +let reduce ?(alt=true) spine ctx + (fmt_t: ?alt:bool -> ?width:uns -> 'a -> (module Fmt.Formatter) -> (module Fmt.Formatter)) t = + let _ = if trace then + File.Fmt.stderr |> Fmt.fmt "hocc (trace): reduce " |> (List.pp String.pp) (List.rev spine) + |> Fmt.fmt " " |> pp_ctx ctx |> Fmt.fmt " " |> fmt_t ~alt t |> Fmt.fmt "\n" |> ignore + in + ctx, Some t + +(* Map optional subtree result, passing the resulting ctx in to enable tail recursion. 
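+   `mapr ~child ~f` runs the `child` parser and, when it yields a node, applies `f` to the
+   updated context and that node, so parsing of the following siblings continues in tail
+   position; when the child fails, the accumulated errors propagate and None is returned.
+   `map` covers the common case in which `f` merely wraps the child node in its parent
+   constructor, immediately handing the result to `reduce`.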
*) +let mapr ~child ~f spine ctx = + let ctx', child_opt = child spine ctx in + match child_opt with + | None -> ctx', None + | Some c -> f spine ctx' c + +(* Map optional subtree result, without support for tail recursion. *) +let map ~child ~f + ~(fmt_child: ?alt:bool -> ?width:uns -> 'a -> (module Fmt.Formatter) -> (module Fmt.Formatter)) + spine ctx = + mapr ~child ~f:(fun spine ctx' c -> reduce spine ctx' fmt_child (f c)) spine ctx + +let rec uident spine ctx = + let spine = match trace with true -> "uident" :: spine | false -> [] in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_uident _; _} as uident -> + reduce spine ctx' fmt_uident (Uident {uident}) + | _ -> err_token tok "Expected uident" ctx, None + +and cident spine ctx = + let spine = "cident" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_cident _; _} as cident -> + reduce spine ctx' fmt_cident (Cident {cident}) + | _ -> err_token tok "Expected cident" ctx, None + +and ident spine ctx = + let spine = "ident" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_uident _; _} as uident -> + reduce spine ctx' fmt_ident (IdentUident {uident=Uident {uident}}) + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_cident _; _} as cident -> + reduce spine ctx' fmt_ident (IdentCident {cident=Cident {cident}}) + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_uscore; _} as uscore -> + reduce spine ctx' fmt_ident (IdentUscore {uscore}) + | _ -> err_token tok "Expected ident" ctx, None + +and precs_tl spine ctx = + let spine = "precs_tl" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_comma; _} as comma -> + mapr ~child:uident ~f:(fun spine ctx' uident -> + map ~child:precs_tl ~f:(fun precs_tl -> + PrecsTlCommaUident {comma; uident; precs_tl} + ) ~fmt_child:fmt_precs_tl spine ctx' + ) spine ctx' + | _ -> reduce spine ctx fmt_precs_tl PrecsTlEpsilon + +and precs spine ctx = + let spine = "precs" :: spine in + mapr ~child:uident ~f:(fun spine ctx' uident -> + map ~child:precs_tl ~f:(fun precs_tl -> + Precs {uident; precs_tl} + ) ~fmt_child:fmt_precs spine ctx' + ) spine ctx + +and prec_rels spine ctx = + let spine = "prec_rels" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_lt; _} as lt -> + map ~child:precs ~f:(fun precs -> + PrecRelsLtPrecs {lt; precs} + ) ~fmt_child:fmt_prec_rels spine ctx' + | _ -> reduce spine ctx fmt_prec_rels PrecRelsEpsilon + +and prec_type spine ctx = + let spine = "prec_type" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.Tok_prec; _} as prec -> + reduce spine ctx' fmt_prec_type (PrecTypePrec {prec}) + | HoccToken {atok=Scan.AbstractToken.Tok_left; _} as left -> + reduce spine ctx' fmt_prec_type (PrecTypeLeft {left}) + | HoccToken {atok=Scan.AbstractToken.Tok_right; _} as right -> + reduce spine ctx' fmt_prec_type (PrecTypeRight {right}) + | _ -> err_token tok "Expected precedence type" ctx, None + +and prec spine ctx = + let spine = "prec" :: spine in + mapr ~child:prec_type ~f:(fun spine ctx' prec_type -> + mapr ~child:uident ~f:(fun spine ctx' uident -> + map ~child:prec_rels ~f:(fun prec_rels -> + Prec {prec_type; uident; prec_rels} + ) ~fmt_child:fmt_prec spine ctx' + ) spine ctx' + ) spine ctx + +and of_type spine ctx = + let spine = "of_type" :: spine in + let ctx', 
tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_of; _} as of_ -> + let dot spine ctx = begin + let spine = "dot" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_dot; _} -> ctx', Some tok + | _ -> err_token tok "Expected '.'" ctx, None + end in + mapr ~child:cident ~f:(fun spine ctx' type_module -> + mapr ~child:dot ~f:(fun spine ctx' dot -> + map ~child:uident ~f:(fun type_type -> + OfType {of_; type_module; dot; type_type} + ) ~fmt_child:fmt_of_type spine ctx' + ) spine ctx' + ) spine ctx' + | _ -> err_token tok "Expected 'of'" ctx, None + +and of_type0 spine ctx = + let spine = "of_type0" :: spine in + let _ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_of; _} -> + map ~child:of_type ~f:(fun of_type -> + OfType0OfType {of_type} + ) ~fmt_child:fmt_of_type0 spine ctx + | _ -> reduce spine ctx fmt_of_type0 OfType0Epsilon + +and prec_ref spine ctx = + let spine = "prec_ref" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.Tok_prec; _} as prec -> + map ~child:uident ~f:(fun uident -> + PrecRefPrecUident {prec; uident} + ) ~fmt_child:fmt_prec_ref spine ctx' + | _ -> reduce spine ctx fmt_prec_ref PrecRefEpsilon + +and token_alias spine ctx = + let spine = "token_alias" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_istring _; _} as alias -> + reduce spine ctx' fmt_token_alias (TokenAlias {alias}) + | _ -> reduce spine ctx fmt_token_alias TokenAliasEpsilon + +and token spine ctx = + let ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.Tok_token; _} as token -> + mapr ~child:cident ~f:(fun spine ctx' cident -> + mapr ~child:token_alias ~f:(fun spine ctx' token_alias -> + mapr ~child:of_type0 ~f:(fun spine ctx' of_type0 -> + map ~child:prec_ref ~f:(fun prec_ref -> + Token {token; cident; token_alias; of_type0; prec_ref} + ) ~fmt_child:fmt_token spine ctx' + ) spine ctx' + ) spine ctx' + ) spine ctx' + | _ -> err_token tok "Expected 'token' statement" ctx, None + +and sep spine ctx = + let spine = "sep" :: spine in + let ctx', tok = next ~all:true spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_line_delim; _} as line_delim -> + reduce spine ctx' fmt_sep (SepLineDelim {line_delim}) + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_semi; _} as semi -> + reduce spine ctx' fmt_sep (SepSemi {semi}) + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_bar; _} as bar -> + reduce spine ctx' fmt_sep (SepBar {bar}) + | _ -> ctx, None + +and codes_tl spine ctx = + let spine = "codes_tl" :: spine in + let _ctx', tok = next ~all:true spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_line_delim; _} + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_semi; _} + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_bar; _} -> + mapr ~child:sep ~f:(fun spine ctx' sep -> + mapr ~child:code ~f:(fun spine ctx' code -> + map ~child:codes_tl ~f:(fun codes_tl -> + CodesTlSepCode {sep; code; codes_tl} + ) ~fmt_child:fmt_codes_tl spine ctx' + ) spine ctx' + ) spine ctx + | _ -> reduce spine ctx fmt_codes_tl CodesTlEpsilon + +and codes spine ctx = + let spine = "codes" :: spine in + mapr ~child:code ~f:(fun spine ctx' code -> + map ~child:codes_tl ~f:(fun codes_tl -> + Codes {code; codes_tl} + ) ~fmt_child:fmt_codes spine ctx' + ) spine ctx + +and codes0 spine ctx = + let spine = "codes0" :: spine 
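+(* The parsing functions follow a peek-and-commit convention: `next` is called to look at
+ * the following token, and the advanced context ctx' is threaded onward only when that
+ * token matches; otherwise the original ctx is returned (as in the `| _ -> reduce spine
+ * ctx ... Epsilon` cases), so the token is effectively pushed back and re-scanned by the
+ * caller. Token malformations are folded into ctx.errs by `next`, which lets parsing
+ * continue and report multiple errors in one pass. *)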
in + let ctx', codes_opt = codes spine ctx in + match codes_opt with + | Some codes -> reduce spine ctx' fmt_codes0 (Codes0Codes {codes}) + | None -> reduce spine ctx fmt_codes0 Codes0Epsilon + +and indent spine ctx = + let spine = "indent" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_indent _; _} -> ctx', Some tok + | _ -> err_token tok "Expected indent" ctx, None + +and dedent ?all spine ctx = + let spine = "dedent" :: spine in + let ctx', tok = next ?all spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_dedent _; _} -> ctx', Some tok + | _ -> err_token tok "Expected dedent" ctx, None + +and rparen ?all spine ctx = + let spine = "rparen" :: spine in + let ctx', tok = next ?all spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_rparen; _} -> ctx', Some tok + | _ -> err_token tok "Expected ')'" ctx, None + +and rcapture ?all spine ctx = + let spine = "rcapture" :: spine in + let ctx', tok = next ?all spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_rcapture; _} -> ctx', Some tok + | _ -> err_token tok "Expected '|)'" ctx, None + +and rbrack ?all spine ctx = + let spine = "rbrack" :: spine in + let ctx', tok = next ?all spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_rbrack; _} -> ctx', Some tok + | _ -> err_token tok "Expected ']'" ctx, None + +and rarray ?all spine ctx = + let spine = "rarray" :: spine in + let ctx', tok = next ?all spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_rarray; _} -> ctx', Some tok + | _ -> err_token tok "Expected '|]'" ctx, None + +and rcurly ?all spine ctx = + let spine = "rcurly" :: spine in + let ctx', tok = next ?all spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_rcurly; _} -> ctx', Some tok + | _ -> err_token tok "Expected '}'" ctx, None + +and delimited spine ctx = + let spine = "delimited" :: spine in + let ctx', tok = next ~all:true spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_indent _; _} as indent -> + mapr ~child:codes ~f:(fun spine ctx' codes -> + map ~child:(dedent ~all:true) ~f:(fun dedent -> + DelimitedBlock {indent; codes; dedent} + ) ~fmt_child:fmt_delimited spine ctx' + ) spine ctx' + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_lparen; _} as lparen -> + mapr ~child:codes0 ~f:(fun spine ctx' codes0 -> + map ~child:(rparen ~all:true) ~f:(fun rparen -> + DelimitedParen {lparen; codes0; rparen} + ) ~fmt_child:fmt_delimited spine ctx' + ) spine ctx' + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_lcapture; _} as lcapture -> + mapr ~child:codes0 ~f:(fun spine ctx' codes0 -> + map ~child:(rcapture ~all:true) ~f:(fun rcapture -> + DelimitedCapture {lcapture; codes0; rcapture} + ) ~fmt_child:fmt_delimited spine ctx' + ) spine ctx' + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_lbrack; _} as lbrack -> + mapr ~child:codes0 ~f:(fun spine ctx' codes0 -> + map ~child:(rbrack ~all:true) ~f:(fun rbrack -> + DelimitedList {lbrack; codes0; rbrack} + ) ~fmt_child:fmt_delimited spine ctx' + ) spine ctx' + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_larray; _} as larray -> + mapr ~child:codes0 ~f:(fun spine ctx' codes0 -> + map ~child:(rarray ~all:true) ~f:(fun rarray -> + DelimitedArray {larray; codes0; rarray} + ) ~fmt_child:fmt_delimited spine ctx' + ) spine ctx' + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_lcurly; _} as lcurly -> + mapr ~child:codes0 ~f:(fun spine ctx' codes0 -> + map ~child:(rcurly 
~all:true) ~f:(fun rcurly -> + DelimitedModule {lcurly; codes0; rcurly} + ) ~fmt_child:fmt_delimited spine ctx' + ) spine ctx' + | _ -> err_token tok "Expected left delimiter" ctx, None + +and code_tl spine ctx = + let spine = "code_tl" :: spine in + let ctx', tok = next ~all:true spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.(Tok_indent _|Tok_lparen|Tok_lcapture|Tok_lbrack + |Tok_larray|Tok_lcurly); _} -> + mapr ~child:delimited ~f:(fun spine ctx' delimited -> + map ~child:code_tl ~f:(fun code_tl -> + CodeTlDelimited {delimited; code_tl} + ) ~fmt_child:fmt_code_tl spine ctx' + ) spine ctx + | HmcToken {atok=Hmc.Scan.AbstractToken.(Tok_dedent _|Tok_rparen|Tok_rcapture|Tok_rbrack + |Tok_rarray|Tok_rcurly + |Tok_line_delim|Tok_semi|Tok_bar); _} -> + reduce spine ctx fmt_code_tl CodeTlEpsilon + | HmcToken _ as token -> + map ~child:code_tl ~f:(fun code_tl -> + CodeTlToken {token; code_tl} + ) ~fmt_child:fmt_code_tl spine ctx' + | _ -> reduce spine ctx fmt_code_tl CodeTlEpsilon + +and code spine ctx = + let spine = "code" :: spine in + let ctx', tok = next ~all:true spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.(Tok_indent _|Tok_lparen|Tok_lcapture|Tok_lbrack + |Tok_larray|Tok_lcurly); _} -> + mapr ~child:delimited ~f:(fun spine ctx' delimited -> + map ~child:code_tl ~f:(fun code_tl -> + CodeDelimited {delimited; code_tl} + ) ~fmt_child:fmt_code spine ctx' + ) spine ctx + | HmcToken {atok=Hmc.Scan.AbstractToken.(Tok_dedent _|Tok_rparen|Tok_rcapture|Tok_rbrack + |Tok_rarray|Tok_rcurly + |Tok_line_delim|Tok_semi|Tok_bar); _} -> + err_token tok "Expected Hemlock code" ctx, None + | HmcToken _ as token -> + map ~child:code_tl ~f:(fun code_tl -> + CodeToken {token; code_tl} + ) ~fmt_child:fmt_code spine ctx' + | _ -> err_token tok "Expected Hemlock code" ctx, None + +and prod_param_symbol spine ctx = + let spine = "prod_param_symbol" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_cident _; _} as cident -> + reduce spine ctx' fmt_prod_param_symbol (ProdParamSymbolCident {cident=Cident {cident}}) + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_istring _; _} as alias -> + reduce spine ctx' fmt_prod_param_symbol (ProdParamSymbolAlias {alias}) + | _ -> err_token tok "Expected production parameter symbol" ctx, None + +and prod_param_binding spine ctx = + let spine = "prod_param_binding" :: spine in + let colon spine ctx = begin + let spine = "colon" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_colon; _} -> ctx', Some tok + | _ -> err_token tok "Expected ':'" ctx, None + end in + let ctx', ident_colon_opt = mapr ~child:ident ~f:(fun spine ctx' ident -> + map ~child:colon ~f:(fun colon -> + ProdParamBindingIdentColon {ident; colon} + ) ~fmt_child:fmt_prod_param_binding spine ctx' + ) spine ctx + in + match ident_colon_opt with + | Some _ -> ctx', ident_colon_opt + | None -> reduce spine ctx fmt_prod_param_binding ProdParamBindingEpsilon + +and prod_param spine ctx = + let spine = "prod_param" :: spine in + mapr ~child:prod_param_binding ~f:(fun spine ctx' prod_param_binding -> + map ~child:prod_param_symbol ~f:(fun prod_param_symbol -> + ProdParam {prod_param_binding; prod_param_symbol} + ) ~fmt_child:fmt_prod_param spine ctx' + ) spine ctx + +and prod_params_tl spine ctx = + let spine = "prod_params_tl" :: spine in + let ctx', prod_param_opt = prod_param spine ctx in + match prod_param_opt with + | Some prod_param -> + map 
~child:prod_params_tl ~f:(fun prod_params_tl -> + ProdParamsTlProdParam {prod_param; prod_params_tl} + ) ~fmt_child:fmt_prod_params_tl spine ctx' + | None -> reduce spine ctx fmt_prod_params_tl ProdParamsTlEpsilon + +and prod_params spine ctx = + let spine = "prod_params" :: spine in + mapr ~child:prod_param ~f:(fun spine ctx' prod_param -> + map ~child:prod_params_tl ~f:(fun prod_params_tl -> + ProdParamsProdParam {prod_param; prod_params_tl} + ) ~fmt_child:fmt_prod_params spine ctx' + ) spine ctx + +and prod_pattern spine ctx = + let spine = "prod_pattern" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.Tok_epsilon; _} as epsilon -> + reduce spine ctx' fmt_prod_pattern (ProdPatternEpsilon {epsilon}) + | _ -> + map ~child:prod_params ~f:(fun prod_params -> + ProdPatternParams {prod_params} + ) ~fmt_child:fmt_prod_pattern spine ctx + +and prod spine ctx = + let spine = "prod" :: spine in + mapr ~child:prod_pattern ~f:(fun spine ctx' prod_pattern -> + map ~child:prec_ref ~f:(fun prec_ref -> + Prod {prod_pattern; prec_ref} + ) ~fmt_child:fmt_prod spine ctx' + ) spine ctx + +and prods_tl spine ctx = + let spine = "prods_tl" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_bar; _} as bar -> + mapr ~child:prod ~f:(fun spine ctx' prod -> + map ~child:prods_tl ~f:(fun prods_tl -> + ProdsTlBarProd {bar; prod; prods_tl} + ) ~fmt_child:fmt_prods_tl spine ctx' + ) spine ctx' + | _ -> reduce spine ctx fmt_prods_tl ProdsTlEpsilon + +and prods spine ctx = + let spine = "prods" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_bar; _} as bar -> + mapr ~child:prod ~f:(fun spine ctx' prod -> + map ~child:prods_tl ~f:(fun prods_tl -> + ProdsBarProd {bar; prod; prods_tl} + ) ~fmt_child:fmt_prods spine ctx' + ) spine ctx' + | _ -> + mapr ~child:prod ~f:(fun spine ctx' prod -> + map ~child:prods_tl ~f:(fun prods_tl -> + ProdsProd {prod; prods_tl} + ) ~fmt_child:fmt_prods spine ctx' + ) spine ctx + +and reduction spine ctx = + let spine = "reduction" :: spine in + let arrow spine ctx = begin + let spine = "arrow" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_arrow; _} -> ctx', Some tok + | _ -> err_token tok "Expected '->'" ctx, None + end in + mapr ~child:prods ~f:(fun spine ctx' prods -> + mapr ~child:arrow ~f:(fun spine ctx' arrow -> + map ~child:code ~f:(fun code -> + Reduction {prods; arrow; code} + ) ~fmt_child:fmt_reduction spine ctx' + ) spine ctx' + ) spine ctx + +and reductions_tl spine ctx = + let spine = "reductions_tl" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_bar; _} as bar -> + mapr ~child:reduction ~f:(fun spine ctx' reduction -> + map ~child:reductions_tl ~f:(fun reductions_tl -> + ReductionsTlBarReduction {bar; reduction; reductions_tl} + ) ~fmt_child:fmt_reductions_tl spine ctx' + ) spine ctx' + | _ -> reduce spine ctx fmt_reductions_tl ReductionsTlEpsilon + +and reductions spine ctx = + let spine = "reductions" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_bar; _} as bar -> + mapr ~child:reduction ~f:(fun spine ctx' reduction -> + map ~child:reductions_tl ~f:(fun reductions_tl -> + ReductionsBarReduction {bar; reduction; reductions_tl} + ) ~fmt_child:fmt_reductions spine ctx' + ) spine ctx' + | _ -> + mapr 
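+(* A reduction is one or more productions that share a single `-> code` body, and
+ * reductions are separated by '|'; NontermReductions nonterminals use these, whereas
+ * NontermProds nonterminals list bare productions (each with an optional prec reference)
+ * and no reduction code. *)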
~child:reduction ~f:(fun spine ctx' reduction -> + map ~child:reductions_tl ~f:(fun reductions_tl -> + ReductionsReduction {reduction; reductions_tl} + ) ~fmt_child:fmt_reductions spine ctx' + ) spine ctx + +and nonterm_type spine ctx = + let spine = "nonterm_type" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.Tok_nonterm; _} as nonterm -> + reduce spine ctx' fmt_nonterm_type (NontermTypeNonterm {nonterm}) + | HoccToken {atok=Scan.AbstractToken.Tok_start; _} as start -> + reduce spine ctx' fmt_nonterm_type (NontermTypeStart {start}) + | _ -> err_token tok "Expected 'nonterm'/'start'" ctx, None + +and nonterm spine ctx = + let spine = "nonterm" :: spine in + let cce spine ctx = begin + let spine = "cce" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_colon_op "::="; _} -> ctx', Some tok + | _ -> err_token tok "Expected '::='" ctx, None + end in + mapr ~child:nonterm_type ~f:(fun spine ctx' nonterm_type -> + mapr ~child:cident ~f:(fun spine ctx' cident -> + let _ctx'', tok = next spine ctx' in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_of; _} -> + mapr ~child:of_type ~f:(fun spine ctx' of_type -> + mapr ~child:prec_ref ~f:(fun spine ctx' prec_ref -> + mapr ~child:cce ~f:(fun spine ctx' cce -> + map ~child:reductions ~f:(fun reductions -> + NontermReductions {nonterm_type; cident; of_type; prec_ref; cce; reductions} + ) ~fmt_child:fmt_nonterm spine ctx' + ) spine ctx' + ) spine ctx' + ) spine ctx' + | _ -> + mapr ~child:prec_ref ~f:(fun spine ctx' prec_ref -> + mapr ~child:cce ~f:(fun spine ctx' cce -> + map ~child:prods ~f:(fun prods -> + NontermProds {nonterm_type; cident; prec_ref; cce; prods} + ) ~fmt_child:fmt_nonterm spine ctx' + ) spine ctx' + ) spine ctx' + ) spine ctx' + ) spine ctx + +and stmt spine ctx = + let spine = "stmt" :: spine in + let _ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.(Tok_prec|Tok_left|Tok_right); _} -> + map ~child:prec ~f:(fun prec -> StmtPrec {prec}) ~fmt_child:fmt_stmt spine ctx + | HoccToken {atok=Scan.AbstractToken.Tok_token; _} -> + map ~child:token ~f:(fun token -> StmtToken {token}) ~fmt_child:fmt_stmt spine ctx + | HoccToken {atok=Scan.AbstractToken.(Tok_nonterm|Tok_start); _} -> + map ~child:nonterm ~f:(fun nonterm -> StmtNonterm {nonterm}) ~fmt_child:fmt_stmt spine ctx + | _ -> map ~child:code ~f:(fun code -> StmtCode {code}) ~fmt_child:fmt_stmt spine ctx + +and stmts_tl spine ctx = + let spine = "stmts_tl" :: spine in + let line_delim spine ctx = begin + let spine = "line_delim" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_line_delim; _} -> ctx', Some tok + | _ -> err_token tok "Expected line delimiter" ctx, None + end in + let ctx', line_delim_opt = line_delim spine ctx in + match line_delim_opt with + | Some line_delim -> begin + mapr ~child:stmt ~f:(fun spine ctx' stmt -> + map ~child:stmts_tl ~f:(fun stmts_tl -> + StmtsTl {line_delim; stmt; stmts_tl} + ) ~fmt_child:fmt_stmts_tl spine ctx' + ) spine ctx' + end + | None -> reduce spine ctx fmt_stmts_tl StmtsTlEpsilon + +and stmts spine ctx = + let spine = "stmts" :: spine in + mapr ~child:stmt ~f:(fun spine ctx' stmt -> + map ~child:stmts_tl ~f:(fun stmts_tl -> + Stmts {stmt; stmts_tl} + ) ~fmt_child:fmt_stmts spine ctx' + ) spine ctx + +and hocc spine ctx = + let spine = "hocc" :: spine in + let ctx', tok = next spine ctx in + match tok with + | 
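+(* hmh and hmhi below are the entry points: each threads a fresh ctx through prelude
+ * matter, the hocc block (a bare `hocc` keyword in the .hmhi case), postlude matter, and
+ * end of input, and returns the final scanner paired with Ok on success or Error with the
+ * accumulated errors otherwise. A sketch of intended use, assuming this module is referred
+ * to as Parse (as in Parse.fmt_prec elsewhere in this patch):
+ *   match Parse.hmh scanner with
+ *   | _, Ok hmh -> (* ... use the parse tree ... *) ()
+ *   | _, Error errs -> (* ... report errs ... *) ()
+ * *)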
HoccToken {atok=Scan.AbstractToken.Tok_hocc; _} as hocc -> + mapr ~child:indent ~f:(fun spine ctx' indent -> + mapr ~child:stmts ~f:(fun spine ctx' stmts -> + map ~child:dedent ~f:(fun dedent -> + Hocc {hocc; indent; stmts; dedent} + ) ~fmt_child:fmt_hocc spine ctx' + ) spine ctx' + ) spine ctx' + | _ -> err_token tok "Expected 'hocc' statement" ctx, None + +and eoi spine ctx = + let spine = "eoi" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_end_of_input; _} as eoi -> + reduce spine ctx' fmt_eoi (Eoi {eoi}) + | _ -> err "Unexpected token before eoi" ctx, None + +and matter spine ctx = + let spine = "matter" :: spine in + let rec f ctx = begin + let ctx', tok = next ~all:true spine ctx in + match tok with + | HoccToken _ + | HmcToken {atok=Hmc.Scan.AbstractToken.Tok_end_of_input; _} -> ctx, MatterEpsilon + | HmcToken _ -> begin + let ctx', matter_rchild = f ctx' in + ctx', Matter {token=tok; matter=matter_rchild} + end + end in + let ctx', matter = f ctx in + reduce ~alt:false spine ctx' fmt_matter matter + +and hmh scanner = + let spine = ["hmh"] in + let ctx = {scanner; errs=[]} in + let ctx', hmh_opt = + mapr ~child:matter ~f:(fun spine ctx' prelude -> + mapr ~child:hocc ~f:(fun spine ctx' hocc -> + mapr ~child:matter ~f:(fun spine ctx' postlude -> + map ~child:eoi ~f:(fun eoi -> + Hmh {prelude; hocc; postlude; eoi} + ) ~fmt_child:fmt_hmh spine ctx' + ) spine ctx' + ) spine ctx' + ) spine ctx + in + match ctx', hmh_opt with + | {errs=(_ :: _); _}, _ + | _, None -> ctx'.scanner, Error ctx'.errs + | {errs=[]; _}, Some hmh -> ctx'.scanner, Ok hmh + +and hmhi scanner = + let spine = ["hmhi"] in + let hocc spine ctx = begin + let spine = "hocc" :: spine in + let ctx', tok = next spine ctx in + match tok with + | HoccToken {atok=Scan.AbstractToken.Tok_hocc; _} as hocc -> ctx', Some hocc + | _ -> err "Expected 'hocc' keyword" ctx, None + end in + let ctx = {scanner; errs=[]} in + let ctx', hmh_opt = + mapr ~child:matter ~f:(fun spine ctx' prelude -> + mapr ~child:hocc ~f:(fun spine ctx' hocc -> + mapr ~child:matter ~f:(fun spine ctx' postlude -> + map ~child:eoi ~f:(fun eoi -> + Hmhi {prelude; hocc; postlude; eoi} + ) ~fmt_child:fmt_hmhi spine ctx' + ) spine ctx' + ) spine ctx' + ) spine ctx + in + match ctx', hmh_opt with + | {errs=(_ :: _); _}, _ + | _, None -> ctx'.scanner, Error ctx'.errs + | {errs=[]; _}, Some hmh -> ctx'.scanner, Ok hmh diff --git a/bootstrap/bin/hocc/prec.ml b/bootstrap/bin/hocc/prec.ml new file mode 100644 index 000000000..982bc6c6a --- /dev/null +++ b/bootstrap/bin/hocc/prec.ml @@ -0,0 +1,22 @@ +open Basis +open Basis.Rudiments + +type t = { + index: uns; + name: string; + assoc: Assoc.t option; + doms: (uns, Uns.cmper_witness) Ordset.t; + stmt: Parse.prec; +} + +let pp {index; name; assoc; doms; stmt} formatter = + formatter + |> Fmt.fmt "{index=" |> Uns.pp index + |> Fmt.fmt "; name=" |> String.pp name + |> Fmt.fmt "; assoc=" |> (Option.pp Assoc.pp) assoc + |> Fmt.fmt "; doms=" |> Ordset.pp doms + |> Fmt.fmt "; stmt=" |> Parse.fmt_prec stmt + |> Fmt.fmt "}" + +let init ~index ~name ~assoc ~doms ~stmt = + {index; name; assoc; doms; stmt} diff --git a/bootstrap/bin/hocc/prec.mli b/bootstrap/bin/hocc/prec.mli new file mode 100644 index 000000000..d1f8c2b47 --- /dev/null +++ b/bootstrap/bin/hocc/prec.mli @@ -0,0 +1,15 @@ +open Basis +open Basis.Rudiments + +type t = { + index: uns; + name: string; + assoc: Assoc.t option; + doms: (uns, Uns.cmper_witness) Ordset.t; + stmt: Parse.prec; +} + +include 
FormattableIntf.SMono with type t := t + +val init: index:uns -> name:string -> assoc:(Assoc.t option) + -> doms:(uns, Uns.cmper_witness) Ordset.t -> stmt:Parse.prec -> t diff --git a/bootstrap/bin/hocc/prod.ml b/bootstrap/bin/hocc/prod.ml new file mode 100644 index 000000000..c1e37f006 --- /dev/null +++ b/bootstrap/bin/hocc/prod.ml @@ -0,0 +1,40 @@ +open Basis +open Basis.Rudiments + +module T = struct + type t = { + index: uns; + lhs_index: uns; + rhs_indexes: uns array; + prec: Prec.t option; + stmt: Parse.prod option; + reduction: Reduction.t; + } + + let hash_fold {index; _} state = + Uns.hash_fold index state + + let cmp {index=index0; _} {index=index1; _} = + Uns.cmp index0 index1 + + let pp {index; lhs_index; rhs_indexes; prec; stmt; reduction} formatter = + formatter + |> Fmt.fmt "{index=" |> Uns.pp index + |> Fmt.fmt "; lhs_index=" |> Uns.pp lhs_index + |> Fmt.fmt "; rhs_indexes=" |> (Array.pp Uns.pp) rhs_indexes + |> Fmt.fmt "; prec=" |> (Option.pp Prec.pp) prec + |> Fmt.fmt "; stmt=" |> (Option.pp Parse.fmt_prod) stmt + |> Fmt.fmt "; reduction=" |> Reduction.pp reduction + |> Fmt.fmt "}" +end +include T +include Identifiable.Make(T) + +let init ~index ~lhs_index ~rhs_indexes ~prec ~stmt ~reduction = + {index; lhs_index; rhs_indexes; prec; stmt; reduction} + +let is_synthetic {stmt; _} = + Option.is_none stmt + +let is_epsilon {rhs_indexes; _} = + Array.is_empty rhs_indexes diff --git a/bootstrap/bin/hocc/prod.mli b/bootstrap/bin/hocc/prod.mli new file mode 100644 index 000000000..b06085b8b --- /dev/null +++ b/bootstrap/bin/hocc/prod.mli @@ -0,0 +1,19 @@ +open Basis +open Basis.Rudiments + +type t = { + index: uns; + lhs_index: uns; + rhs_indexes: uns array; + prec: Prec.t option; + stmt: Parse.prod option; + reduction: Reduction.t; +} + +include IdentifiableIntf.S with type t := t + +val init: index:uns -> lhs_index:uns -> rhs_indexes:uns array -> prec:Prec.t option + -> stmt:Parse.prod option -> reduction:Reduction.t -> t + +val is_synthetic: t -> bool +val is_epsilon: t -> bool diff --git a/bootstrap/bin/hocc/qualifiedType.ml b/bootstrap/bin/hocc/qualifiedType.ml new file mode 100644 index 000000000..572ec1d4a --- /dev/null +++ b/bootstrap/bin/hocc/qualifiedType.ml @@ -0,0 +1,58 @@ +open Basis +open! 
Basis.Rudiments + +module T = struct + type t = + | Synthetic + | Implicit + | Explicit of { + module_: string; + type_: string; + } + + let hash_fold t state = + match t with + | Synthetic -> state |> Uns.hash_fold 0L + | Implicit -> state |> Uns.hash_fold 1L + | Explicit {module_; type_} -> begin + state + |> Uns.hash_fold 2L + |> String.hash_fold module_ + |> String.hash_fold type_ + end + + let cmp t0 t1 = + let open Cmp in + match t0, t1 with + | Synthetic, Synthetic -> Eq + | Synthetic, (Implicit|Explicit _) -> Lt + | Implicit, Synthetic -> Gt + | Implicit, Implicit -> Eq + | Implicit, Explicit _ -> Lt + | Explicit _, (Synthetic|Implicit) -> Gt + | Explicit {module_=m0; type_=t0}, Explicit {module_=m1; type_=t1} -> begin + match String.cmp m0 m1 with + | Lt -> Lt + | Eq -> String.cmp t0 t1 + | Gt -> Gt + end + + let pp t formatter = + match t with + | Synthetic -> formatter |> Fmt.fmt "Synthetic" + | Implicit -> formatter |> Fmt.fmt "Implicit" + | Explicit {module_; type_} -> + formatter + |> Fmt.fmt "Explicit {module_=" |> String.pp module_ + |> Fmt.fmt "; type_=" |> String.pp type_ + |> Fmt.fmt "}" +end +include T +include Identifiable.Make(T) + +let synthetic = Synthetic + +let implicit = Implicit + +let init ~module_ ~type_ = + Explicit {module_; type_} diff --git a/bootstrap/bin/hocc/qualifiedType.mli b/bootstrap/bin/hocc/qualifiedType.mli new file mode 100644 index 000000000..fc5ab1d2b --- /dev/null +++ b/bootstrap/bin/hocc/qualifiedType.mli @@ -0,0 +1,16 @@ +open Basis +open! Basis.Rudiments + +type t = + | Synthetic + | Implicit + | Explicit of { + module_: string; + type_: string; + } + +include IdentifiableIntf.S with type t := t + +val synthetic: t +val implicit: t +val init: module_:string -> type_:string -> t diff --git a/bootstrap/bin/hocc/reduction.ml b/bootstrap/bin/hocc/reduction.ml new file mode 100644 index 000000000..4f1f35028 --- /dev/null +++ b/bootstrap/bin/hocc/reduction.ml @@ -0,0 +1,140 @@ +open Basis +open Basis.Rudiments + +module T = struct + module Param = struct + module U = struct + type t = { + binding: string option; + symbol_name: string; + qtype: QualifiedType.t; + prod_param: Parse.prod_param option; + } + + let hash_fold {binding; symbol_name; _} state = + state + |> Option.hash_fold String.hash_fold binding + |> String.hash_fold symbol_name + + let cmp {binding=b0; symbol_name=s0; _} {binding=b1; symbol_name=s1; _} = + let open Cmp in + match Option.cmp String.cmp b0 b1 with + | Lt -> Lt + | Eq -> String.cmp s0 s1 + | Gt -> Gt + + let pp {binding; symbol_name; qtype; prod_param} formatter = + formatter + |> Fmt.fmt "{binding=" |> (Option.pp String.pp) binding + |> Fmt.fmt "; symbol_name=" |> String.pp symbol_name + |> Fmt.fmt "; qtype=" |> QualifiedType.pp qtype + |> Fmt.fmt "; prod_param=" |> (Option.pp Parse.fmt_prod_param) prod_param + |> Fmt.fmt "}" + end + include U + include Identifiable.Make(U) + + let init ~binding ~symbol_name ~qtype ~prod_param = + {binding; symbol_name; qtype; prod_param} + end + + module Params = struct + module U = struct + type t = Param.t array + type elm = Param.t + + let hash_fold t state = + state |> Array.hash_fold Param.hash_fold t + + let cmp t0 t1 = + Array.cmp Param.cmp t0 t1 + + let pp t formatter = + formatter |> (Array.pp Param.pp) t + + let init io params = + Array.fold ~init:(Set.empty (module String)) + ~f:(fun bindings Param.{binding; prod_param; _} -> + match binding with + | None -> bindings + | Some binding -> begin + match Set.mem binding bindings with + | true -> begin + match prod_param with + 
| Some ProdParam {prod_param_binding=ProdParamBindingIdentColon { + ident=((IdentUident {uident=Uident {uident=binding_token}}) | + (IdentCident {cident=Cident {cident=binding_token}})); _}; _} -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " + |> Hmc.Source.Slice.pp (Scan.Token.source binding_token) + |> Fmt.fmt ": Duplicate parameter binding: " + |> Fmt.fmt (Hmc.Source.Slice.to_string (Scan.Token.source binding_token)) + |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | _ -> not_reached () + end + | false -> Set.insert binding bindings + end + ) params |> ignore; + io, params + + module Cursor = struct + module V = struct + type t = Param.t Array.Cursor.t + + let cmp = Array.Cursor.cmp + let hd = Array.Cursor.hd + let tl = Array.Cursor.tl + let pred = Array.Cursor.pred + let succ = Array.Cursor.succ + let lget = Array.Cursor.lget + let rget = Array.Cursor.rget + let prev = Array.Cursor.prev + let next = Array.Cursor.next + end + include V + include Cmpable.Make(V) + end + let length = Array.length + end + include U + include Identifiable.Make(U) + include Container.MakeMonoIndex(U) + + let to_array t = t + let length = Array.length + let range = Array.range + let get = Array.get + let map = Array.map + end + + type t = { + index: uns; + lhs: QualifiedType.t; + rhs: Params.t; + code: Parse.code option; + } + + let hash_fold {index; _} state = + Uns.hash_fold index state + + let cmp {index=index0; _} {index=index1; _} = + Uns.cmp index0 index1 + + let pp {index; lhs; rhs; code} formatter = + formatter + |> Fmt.fmt "{index=" |> Uns.pp index + |> Fmt.fmt "; lhs=" |> QualifiedType.pp lhs + |> Fmt.fmt "; rhs=" |> Params.pp rhs + |> Fmt.fmt "; code=" |> (Option.pp Parse.fmt_code) code + |> Fmt.fmt "}" +end +include T +include Identifiable.Make(T) + +let init ~index ~lhs ~rhs ~code = + {index; lhs; rhs; code} diff --git a/bootstrap/bin/hocc/reduction.mli b/bootstrap/bin/hocc/reduction.mli new file mode 100644 index 000000000..cccb747e6 --- /dev/null +++ b/bootstrap/bin/hocc/reduction.mli @@ -0,0 +1,42 @@ +open Basis +open Basis.Rudiments + +module Param : sig + type t = { + binding: string option; + symbol_name: string; + qtype: QualifiedType.t; + prod_param: Parse.prod_param option; + } + + include IdentifiableIntf.S with type t := t + + val init: binding:string option -> symbol_name:string -> qtype:QualifiedType.t + -> prod_param:Parse.prod_param option -> t +end + +module Params : sig + type t + type elm = Param.t + + include IdentifiableIntf.S with type t := t + include ContainerIntf.SMonoArray with type t := t with type elm := elm + include ContainerIntf.SMonoIndex with type t := t with type elm := elm + + val init: Io.t -> Param.t array -> Io.t * t + val length: t -> uns + val range: t -> range + val get: uns -> t -> Param.t + val map: f:(Param.t -> 'a) -> t -> 'a array +end + +type t = { + index: uns; + lhs: QualifiedType.t; + rhs: Params.t; + code: Parse.code option; +} + +include IdentifiableIntf.S with type t := t + +val init: index:uns -> lhs:QualifiedType.t -> rhs:Params.t -> code:Parse.code option -> t diff --git a/bootstrap/bin/hocc/scan.ml b/bootstrap/bin/hocc/scan.ml new file mode 100644 index 000000000..63a528bc5 --- /dev/null +++ b/bootstrap/bin/hocc/scan.ml @@ -0,0 +1,113 @@ +open Basis +open! 
Basis.Rudiments + +module AbstractToken = struct + type t = + | Tok_hocc + | Tok_token + | Tok_nonterm + | Tok_start + | Tok_epsilon + | Tok_prec + | Tok_left + | Tok_right + + let pp t formatter = + formatter |> Fmt.fmt (match t with + | Tok_hocc -> "Tok_hocc" + | Tok_token -> "Tok_token" + | Tok_nonterm -> "Tok_nonterm" + | Tok_start -> "Tok_start" + | Tok_epsilon -> "Tok_epsilon" + | Tok_prec -> "Tok_prec" + | Tok_left -> "Tok_left" + | Tok_right -> "Tok_right" + ) + + let malformations = function + | Tok_hocc | Tok_token | Tok_nonterm | Tok_start | Tok_epsilon | Tok_prec | Tok_left | Tok_right + -> [] +end + +module ConcreteToken = struct + type t = { + atok: AbstractToken.t; + source: Hmc.Source.Slice.t; + } + + let atok t = + t.atok + + let source t = + t.source + + let pp t formatter = + formatter + |> Fmt.fmt "{atok=" |> AbstractToken.pp t.atok + |> Fmt.fmt "; source=" |> Hmc.Source.Slice.pp t.source + |> Fmt.fmt "}" +end + +module Token = struct + type t = + | HmcToken of Hmc.Scan.ConcreteToken.t + | HoccToken of ConcreteToken.t + + let source = function + | HmcToken ctok -> Hmc.Scan.ConcreteToken.source ctok + | HoccToken ctok -> ConcreteToken.source ctok + + let pp t formatter = + match t with + | HmcToken ctok -> formatter |> Fmt.fmt "HmcToken " |> Hmc.Scan.ConcreteToken.pp ctok + | HoccToken ctok -> formatter |> Fmt.fmt "HoccToken " |> ConcreteToken.pp ctok + + let malformations = function + | HmcToken {atok; _} -> Hmc.Scan.AbstractToken.malformations atok + | HoccToken {atok; _} -> AbstractToken.malformations atok +end + +type t = { + scan: Hmc.Scan.t; + next: (t * Token.t) Lazy.t; +} + +let pp {scan; _} formatter = + Hmc.Scan.pp scan formatter + +let rec susp_next scan = lazy begin + let scan', ctok = Hmc.Scan.next scan in + let ctok' = match Hmc.Scan.ConcreteToken.atok ctok with + | Tok_uident (Constant uident) -> begin + let open AbstractToken in + let source = Hmc.Scan.ConcreteToken.source ctok in + match uident with + | "hocc" -> Token.HoccToken {atok=Tok_hocc; source} + | "token" -> Token.HoccToken {atok=Tok_token; source} + | "nonterm" -> Token.HoccToken {atok=Tok_nonterm; source} + | "start" -> Token.HoccToken {atok=Tok_start; source} + | "epsilon" -> Token.HoccToken {atok=Tok_epsilon; source} + | "prec" -> Token.HoccToken {atok=Tok_prec; source} + | "left" -> Token.HoccToken {atok=Tok_left; source} + | "right" -> Token.HoccToken {atok=Tok_right; source} + | _ -> Token.HmcToken ctok + end + | _ -> Token.HmcToken ctok + in + let t' = {scan=scan'; next=susp_next scan'} in + t', ctok' +end + +let init text = + let scan = Hmc.Scan.init text in + let next = susp_next scan in + {scan; next} + +let text {scan; _} = + Hmc.Scan.text scan + +let cursor {scan; _} = + Hmc.Scan.cursor scan + +let next {next; _} = + Lazy.force next diff --git a/bootstrap/bin/hocc/scan.mli b/bootstrap/bin/hocc/scan.mli new file mode 100644 index 000000000..a550a9226 --- /dev/null +++ b/bootstrap/bin/hocc/scan.mli @@ -0,0 +1,69 @@ +(** Thin wrapper around Hmc's scanner that adds hocc-specific keywords. *) + +open Basis +open! Basis.Rudiments + +module AbstractToken: sig + type t = + | Tok_hocc + | Tok_token + | Tok_nonterm + | Tok_start + | Tok_epsilon + | Tok_prec + | Tok_left + | Tok_right + + val pp: t -> (module Fmt.Formatter) -> (module Fmt.Formatter) + + val malformations: t -> Hmc.Scan.AbstractToken.Rendition.Malformation.t list + (** [malformations t] returns a list of malformations associated with [t], or an empty list if + there are no malformations. 
This function can be used on any token variant, even if no + malformations are possible. *) +end + +module ConcreteToken : sig + type t = { + atok: AbstractToken.t; + source: Hmc.Source.Slice.t; + } + + val atok: t -> AbstractToken.t + val source: t -> Hmc.Source.Slice.t + + include FormattableIntf.SMono with type t := t +end + +module Token: sig + type t = + | HmcToken of Hmc.Scan.ConcreteToken.t + | HoccToken of ConcreteToken.t + + val source: t -> Hmc.Source.Slice.t + + include FormattableIntf.SMono with type t := t + + val malformations: t -> Hmc.Scan.AbstractToken.Rendition.Malformation.t list + (** [malformations t] returns a list of malformations associated with [t], or an empty list if + there are no malformations. This function can be used on any token variant, even if no + malformations are possible. *) +end + +type t + +include FormattableIntf.SMono with type t := t + +val init: Text.t -> t +(** [init text] initializes scanner to scan [text]. *) + +val text: t -> Text.t +(** [text t] returns the source text for [t]. *) + +val cursor: t -> Hmc.Source.Cursor.t +(** [cursor t] returns the cursor at the scanner's current position. This cursor is equivalent to + the base of the token returned by [next t]. *) + +val next: t -> t * Token.t +(** [next t] scans the next token past the tokens scanned by [t]'s predecessor state(s) and returns + the scanner's successor state along with a token. If [t] is at the end of input, there is no + successor state, and [t, (HmcToken EndOfInput)] is returned. *) diff --git a/bootstrap/bin/hocc/spec.ml b/bootstrap/bin/hocc/spec.ml new file mode 100644 index 000000000..76ff28ba2 --- /dev/null +++ b/bootstrap/bin/hocc/spec.ml @@ -0,0 +1,1473 @@ +open Basis +open! Basis.Rudiments + +type t = { + precs: Prec.t array; + symbols: Symbol.t array; + prods: Prod.t array; + reductions: Reduction.t array; + lr1itemsetclosures: Lr1ItemsetClosure.t array; +} + +let string_of_token token = + Hmc.Source.Slice.to_string (Scan.Token.source token) + +let string_of_alias_token token = + match token with + | Scan.Token.HmcToken {atok=Tok_istring (Constant istring); _} -> istring + | _ -> not_reached () + +let precs_init io hmh = + let rec fold_precs_tl io ~precs_map rels doms precs_tl = begin + match precs_tl with + | Parse.PrecsTlCommaUident {uident=Uident {uident}; precs_tl; _} -> begin + let name = string_of_token uident in + let rels = match Set.mem name rels with + | true -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Redundant relation to precedence: " |> Fmt.fmt name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | false -> Set.insert name rels + in + let doms = match Map.get name precs_map with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Relation to undefined precedence: " |> Fmt.fmt name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some Prec.{index; doms=rel_doms; _} -> Ordset.insert index doms |> Ordset.union rel_doms + in + fold_precs_tl io ~precs_map rels doms precs_tl + end + | PrecsTlEpsilon -> io, doms + end in + let fold_precs io ~precs_map precs = begin + match precs with + | Parse.Precs {uident=Uident {uident}; precs_tl} -> begin + let name = string_of_token uident in + let rels = Set.singleton (module String) name in + let doms = match Map.get name precs_map with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp 
(Scan.Token.source uident) + |> Fmt.fmt ": Relation to undefined precedence: " |> Fmt.fmt name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some Prec.{index; doms; _} -> Ordset.insert index doms + in + fold_precs_tl io ~precs_map rels doms precs_tl + end + end in + let fold_prec io ~precs_map prec : (Io.t * (string, Prec.t, String.cmper_witness) Map.t) = begin + match prec with + | Parse.Prec {prec_type; uident=Uident {uident}; prec_rels} -> begin + let index = Map.length precs_map in + let name = string_of_token uident in + let assoc = match prec_type with + | PrecTypePrec _ -> None + | PrecTypeLeft _ -> Some Assoc.Left + | PrecTypeRight _ -> Some Assoc.Right + in + let io, doms = match prec_rels with + | PrecRelsLtPrecs {precs; _} -> fold_precs io ~precs_map precs + | PrecRelsEpsilon -> io, Ordset.empty (module Uns) + in + let p = Prec.init ~index ~name ~assoc ~doms ~stmt:prec in + let precs_map = match Map.mem name precs_map with + | true -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Redefined precedence: " |> Fmt.fmt name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | false -> Map.insert_hlt ~k:name ~v:p precs_map + in + io, precs_map + end + end in + let fold_stmt io ~precs_map stmt = begin + match stmt with + | Parse.StmtPrec {prec} -> fold_prec io ~precs_map prec + | _ -> io, precs_map + end in + let rec fold_stmts_tl io ~precs_map stmts_tl = begin + match stmts_tl with + | Parse.StmtsTl {stmt; stmts_tl; _} -> begin + let io, precs_map = fold_stmt io ~precs_map stmt in + fold_stmts_tl io ~precs_map stmts_tl + end + | StmtsTlEpsilon -> io, precs_map + end in + let fold_stmts io ~precs_map stmts = begin + match stmts with + | Parse.Stmts {stmt; stmts_tl} -> begin + let io, precs_map = fold_stmt io ~precs_map stmt in + fold_stmts_tl io ~precs_map stmts_tl + end + end in + let io, precs_map = match hmh with Parse.Hmh {hocc=Hocc {stmts; _}; _} -> + fold_stmts io ~precs_map:(Map.empty (module String)) stmts + in + let precs_kvpairs = + Map.to_array precs_map + |> Array.sort ~cmp:(fun (_, Prec.{index=a; _}) (_, {index=b; _}) -> Uns.cmp a b) in + let precs = Array.init (Array.range precs_kvpairs) ~f:(fun i -> + match Array.get i precs_kvpairs with + | (_, prec) -> prec + ) in + io, precs_map, precs + +let tokens_init io ~precs_map hmh = + let fold_token io ~precs_map ~tokens_map ~aliases_map token = begin + match token with + | Parse.Token {cident=Cident {cident}; token_alias; of_type0; prec_ref; _} -> begin + let index = Map.length tokens_map in + let name = string_of_token cident in + let alias = match token_alias with + | TokenAlias {alias} -> Some (string_of_alias_token alias) + | TokenAliasEpsilon -> None + in + let qtype = match of_type0 with + | OfType0OfType {of_type=OfType { + type_module=Cident {cident}; type_type=Uident {uident}; _}} -> begin + let module_ = string_of_token cident in + let type_ = string_of_token uident in + QualifiedType.init ~module_ ~type_ + end + | OfType0Epsilon -> QualifiedType.implicit + in + let prec = match prec_ref with + | PrecRefPrecUident {uident=Uident {uident}; _} -> begin + let prec_name = string_of_token uident in + match Map.get prec_name precs_map with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Undefined precedence: " |> Fmt.fmt prec_name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some _ as prec -> prec + end + | PrecRefEpsilon 
-> None + in + let token = + Symbol.init_token ~index ~name ~qtype ~prec ~stmt:(Some token) ~alias in + let tokens_map = match Map.mem name tokens_map with + | true -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source cident) + |> Fmt.fmt ": Redefined token: " |> Fmt.fmt name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | false -> Map.insert_hlt ~k:name ~v:token tokens_map + in + let aliases_map = match token_alias with + | TokenAlias {alias=a} -> begin + let alias_name = string_of_alias_token a in + match Map.mem alias_name aliases_map with + | true -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source a) + |> Fmt.fmt ": Redefined token alias: " |> Fmt.fmt alias_name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | false -> Map.insert_hlt ~k:alias_name ~v:name aliases_map + end + | TokenAliasEpsilon -> aliases_map + in + io, tokens_map, aliases_map + end + end in + let fold_stmt io ~precs_map ~tokens_map ~aliases_map stmt = begin + match stmt with + | Parse.StmtToken {token} -> fold_token io ~precs_map ~tokens_map ~aliases_map token + | _ -> io, tokens_map, aliases_map + end in + let rec fold_stmts_tl io ~precs_map ~tokens_map ~aliases_map stmts_tl = begin + match stmts_tl with + | Parse.StmtsTl {stmt; stmts_tl; _} -> begin + let io, tokens_map, aliases_map = fold_stmt io ~precs_map ~tokens_map ~aliases_map stmt in + fold_stmts_tl io ~precs_map ~tokens_map ~aliases_map stmts_tl + end + | StmtsTlEpsilon -> io, tokens_map, aliases_map + end in + let fold_stmts io ~precs_map ~tokens_map ~aliases_map stmts = begin + match stmts with + | Parse.Stmts {stmt; stmts_tl} -> begin + let io, tokens_map, aliases_map = fold_stmt io ~precs_map ~tokens_map ~aliases_map stmt in + fold_stmts_tl io ~precs_map ~tokens_map ~aliases_map stmts_tl + end + end in + let tokens_map, aliases_map = List.fold + [("EPSILON", "ε"); ("PSEUDO_END", "⊥")] + ~init:(Map.empty (module String), Map.empty (module String)) + ~f:(fun (tokens_map, aliases_map) (name, alias) -> + let index = Map.length tokens_map in + let token = Symbol.init_token ~index ~name ~qtype:QualifiedType.implicit ~prec:None + ~stmt:None ~alias:(Some alias) in + let tokens_map' = Map.insert ~k:name ~v:token tokens_map in + let aliases_map' = Map.insert ~k:alias ~v:name aliases_map in + tokens_map', aliases_map' + ) + in + let io, tokens_map, aliases_map = match hmh with Parse.Hmh {hocc=Hocc {stmts; _}; _} -> + fold_stmts io ~precs_map ~tokens_map ~aliases_map stmts + in + io, tokens_map, aliases_map + +(* Ephemeral symbol information. Symbols have to be processed in two passes due to their mutually + * recursive form. `SymbolInfo.t` captures only the name->metadata required of the first pass. 
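+ * For example, a production may reference a non-terminal whose `nonterm` statement appears later
+ * in the source, so every symbol's index and qualified type must be known before any productions
+ * are processed.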
*) +module SymbolInfo = struct + type t = { + index: uns; + name: string; + alias: string option; + qtype: QualifiedType.t; + } + + let init ~index ~name ~alias ~qtype = + {index; name; alias; qtype} +end + +let symbol_infos_init io ~tokens_map hmh = + let insert_symbol_info ~k ~v k_token symbol_infos = begin + match Map.mem k symbol_infos with + | true -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source k_token) + |> Fmt.fmt ": Redefined symbol: " |> Fmt.fmt k |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | false -> Map.insert_hlt ~k ~v symbol_infos + end in + let fold_nonterm io ~symbol_infos nonterm = begin + let name, qtype = match nonterm with + | Parse.NontermProds {cident=Cident {cident=nonterm_cident}; _} -> + string_of_token nonterm_cident, QualifiedType.implicit + | NontermReductions {cident=Cident {cident=nonterm_cident}; of_type=OfType { + type_module=Cident {cident}; type_type=Uident {uident}; _}; _} -> begin + let name = string_of_token nonterm_cident in + let module_ = string_of_token cident in + let type_ = string_of_token uident in + name, QualifiedType.init ~module_ ~type_ + end + in + match nonterm with + | NontermProds {nonterm_type; cident=Cident {cident}; _} + | NontermReductions {nonterm_type; cident=Cident {cident}; _} -> begin + let index = Map.length symbol_infos in + let symbol_info = SymbolInfo.init ~index ~name ~alias:None ~qtype in + let symbol_infos = insert_symbol_info ~k:name ~v:symbol_info cident symbol_infos in + let io, symbol_infos = match nonterm_type with + | NontermTypeNonterm _ -> io, symbol_infos + | NontermTypeStart _ -> begin + (* Synthesize start symbol wrapper. *) + let index' = Map.length symbol_infos in + let name' = name ^ "'" in + let symbol_info' = SymbolInfo.init ~index:index' ~name:name' ~alias:None + ~qtype:QualifiedType.Synthetic in + let symbol_infos = insert_symbol_info ~k:name' ~v:symbol_info' cident symbol_infos in + io, symbol_infos + end + in + io, symbol_infos + end + end in + let fold_stmt io ~symbol_infos stmt = begin + match stmt with + | Parse.StmtNonterm {nonterm} -> fold_nonterm io ~symbol_infos nonterm + | _ -> io, symbol_infos + end in + let rec fold_stmts_tl io ~symbol_infos stmts_tl = begin + match stmts_tl with + | Parse.StmtsTl {stmt; stmts_tl; _} -> begin + let io, symbol_infos = fold_stmt io ~symbol_infos stmt in + fold_stmts_tl io ~symbol_infos stmts_tl + end + | StmtsTlEpsilon -> io, symbol_infos + end in + let fold_stmts io ~symbol_infos stmts = begin + match stmts with + | Parse.Stmts {stmt; stmts_tl} -> begin + let io, symbol_infos = fold_stmt io ~symbol_infos stmt in + fold_stmts_tl io ~symbol_infos stmts_tl + end + end in + let symbol_infos = Map.fold ~init:(Map.empty (module String)) + ~f:(fun symbol_infos (_name, Symbol.{index; name; alias; qtype; _}) -> + let symbol_info = SymbolInfo.init ~index ~name ~alias ~qtype in + Map.insert_hlt ~k:name ~v:symbol_info symbol_infos + ) tokens_map in + let io, symbol_infos = match hmh with Parse.Hmh {hocc=Hocc {stmts; _}; _} -> + fold_stmts io ~symbol_infos stmts + in + io, symbol_infos + +let symbols_init io ~precs_map ~tokens_map ~aliases_map ~symbol_infos hmh = + let fold_prod_param io ~aliases_map ~symbol_infos prod_params prod_param = begin + match prod_param with + | Parse.ProdParam {prod_param_binding; prod_param_symbol} -> begin + let binding = match prod_param_binding with + | ProdParamBindingIdentColon {ident=IdentUident {uident=Uident {uident=ident}}; _} + | ProdParamBindingIdentColon 
{ident=IdentCident {cident=Cident {cident=ident}}; _} -> + Some (string_of_token ident) + | ProdParamBindingIdentColon {ident=IdentUscore _; _} + | ProdParamBindingEpsilon -> None + in + let io, symbol_name, qtype = match prod_param_symbol with + | ProdParamSymbolCident {cident=Cident {cident}} -> begin + let symbol_name = string_of_token cident in + match Map.get symbol_name symbol_infos with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source cident) + |> Fmt.fmt ": Undefined symbol: " |> Fmt.fmt symbol_name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some SymbolInfo.{name; alias; qtype; _} -> begin + let io = match alias with + | Some alias -> begin + io.log + |> Fmt.fmt "hocc: At " + |> Hmc.Source.Slice.pp (Scan.Token.source cident) + |> Fmt.fmt ": Unused token alias " |> String.pp alias |> Fmt.fmt " for " + |> Fmt.fmt symbol_name |> Fmt.fmt "\n" + |> Io.with_log io + end + | None -> io + in + io, name, qtype + end + end + | ProdParamSymbolAlias {alias} -> begin + let alias_name = string_of_alias_token alias in + match Map.get alias_name aliases_map with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source alias) + |> Fmt.fmt ": Undefined alias: " |> Fmt.fmt alias_name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some token_name -> begin + match Map.get_hlt token_name symbol_infos with + | SymbolInfo.{name; qtype; _} -> io, name, qtype + end + end + in + let param = + Reduction.Param.init ~binding ~symbol_name ~qtype ~prod_param:(Some prod_param) in + io, param :: prod_params + end + end in + let rec fold_prod_params_tl io ~aliases_map ~symbol_infos prod_params + prod_params_tl = begin + match prod_params_tl with + | Parse.ProdParamsTlProdParam {prod_param; prod_params_tl} -> begin + let io, prod_params = + fold_prod_param io ~aliases_map ~symbol_infos prod_params prod_param in + fold_prod_params_tl io ~aliases_map ~symbol_infos prod_params prod_params_tl + end + | ProdParamsTlEpsilon -> Reduction.Params.init io (Array.of_list_rev prod_params) + end in + let fold_prod_pattern io ~aliases_map ~symbol_infos prod_pattern = begin + match prod_pattern with + | Parse.ProdPatternParams {prod_params=ProdParamsProdParam {prod_param; prod_params_tl}} + -> begin + let io, prod_params = fold_prod_param io ~aliases_map ~symbol_infos [] prod_param in + fold_prod_params_tl io ~aliases_map ~symbol_infos prod_params prod_params_tl + end + | ProdPatternEpsilon _ -> Reduction.Params.init io [||] + end in + let fold_prod io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code ~reduction nonterm_prods_set prod = begin + match prod with + | Parse.Prod {prod_pattern; prec_ref} -> begin + let lhs_index = Symbol.(nonterm_prodless.index) in + let io, rhs = fold_prod_pattern io ~aliases_map ~symbol_infos prod_pattern in + let io = match code with + | Some _ -> io + | None -> begin + (* Codeless productions have no use for parameter bindings. 
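+ * Any explicit binding in such a production is therefore reported (via the log) as unused.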
*) + Reduction.Params.fold ~init:io ~f:(fun io Reduction.Param.{binding; prod_param; _} -> + match binding with + | Some binding -> begin + let binding_token = match prod_param with + | Some ProdParam {prod_param_binding=ProdParamBindingIdentColon { + ident=(IdentUident {uident=Uident {uident=token}}) | + (IdentCident {cident=Cident {cident=token}}); _}; _} -> token + | _ -> not_reached () + in + io.log + |> Fmt.fmt "hocc: At " + |> Hmc.Source.Slice.pp (Scan.Token.source binding_token) + |> Fmt.fmt ": Unused parameter binding: " |> Fmt.fmt binding |> Fmt.fmt "\n" + |> Io.with_log io + end + | None -> io + ) rhs + end + in + let rhs_indexes = Reduction.Params.map ~f:(fun Reduction.Param.{symbol_name; _} -> + (Map.get_hlt symbol_name symbol_infos).index + ) rhs in + let prec = match prec_ref with + | PrecRefPrecUident {uident=Uident {uident}; _} -> begin + let prec_name = string_of_token uident in + match Map.get prec_name precs_map with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Undefined precedence: " |> Fmt.fmt prec_name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some _ as prec -> begin + match nonterm_prodless.prec with + | Some _ -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Precedence already specified by nonterm\n" + |> Io.with_err io + in + Io.fatal io + end + | None -> prec + end + end + | PrecRefEpsilon -> None + in + let index = Ordset.length reductions_set in + let lhs = Symbol.(nonterm_prodless.qtype) in + let reduction, reductions_set = match reduction with + | Some reduction -> reduction, reductions_set + | None -> begin + let reduction = Reduction.init ~index ~lhs ~rhs ~code in + let reductions_set = Ordset.insert reduction reductions_set in + reduction, reductions_set + end + in + let index = Ordset.length prods_set in + let prod = Prod.init ~index ~lhs_index ~rhs_indexes ~prec ~stmt:(Some prod) ~reduction in + let nonterm_prods_set = Ordset.insert prod nonterm_prods_set in + let prods_set = Ordset.insert prod prods_set in + io, nonterm_prods_set, prods_set, reductions_set, prod + end + end in + let rec fold_prods_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code ~reduction nonterm_prods_set prods_tl = begin + match prods_tl with + | Parse.ProdsTlBarProd {prod; prods_tl; _} -> begin + let io, nonterm_prods_set, prods_set, reductions_set, _prod = + fold_prod io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code ~reduction nonterm_prods_set prod in + fold_prods_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code ~reduction nonterm_prods_set prods_tl + end + | ProdsTlEpsilon -> io, nonterm_prods_set, prods_set, reductions_set + end in + let fold_prods io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless prods = begin + match prods with + | Parse.ProdsBarProd {prod; prods_tl; _} + | ProdsProd {prod; prods_tl} -> begin + let code = None in + let reduction = None in + let nonterm_prods_set = Ordset.empty (module Prod) in + let io, nonterm_prods_set, prods_set, reductions_set, _prod = + fold_prod io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code ~reduction nonterm_prods_set prod in + fold_prods_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code 
~reduction nonterm_prods_set prods_tl + end + end in + let fold_reduction io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless nonterm_prods_set reduction = begin + match reduction with + | Parse.Reduction {prods; code; _} -> begin + (* Map one or more prods to a single reduction. *) + match prods with + | ProdsBarProd {prod; prods_tl; _} + | ProdsProd {prod; prods_tl} -> begin + let reduction_prods = Ordset.empty (module Prod) in + let io, reduction_prods_merge, prods_set, reductions_set, prod = + fold_prod io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code:(Some code) ~reduction:None reduction_prods prod in + let reduction_prods = Ordset.union reduction_prods_merge reduction_prods in + let io, reduction_prods_merge, prods_set, reductions_set = + fold_prods_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless ~code:(Some code) ~reduction:(Some prod.reduction) reduction_prods + prods_tl in + let reduction_prods = Ordset.union reduction_prods_merge reduction_prods in + (* Verify that the prods' parameters are uniform. *) + let () = Ordset.iter ~f:(fun prod1 -> + let open Cmp in + match Reduction.Params.cmp Prod.(prod.reduction.rhs) Prod.(prod1.reduction.rhs) with + | Lt + | Gt -> begin + let pattern_source = Option.value_hlt ( + match prod1.stmt with + | Some (Prod {prod_pattern; _}) -> Parse.source_of_prod_pattern prod_pattern + | None -> not_reached () + ) in + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp pattern_source + |> Fmt.fmt ": Inconsistent production parametrization\n" + |> Io.with_err io + in + Io.fatal io + end + | Eq -> () + ) reduction_prods in + let nonterm_prods_set = Ordset.union reduction_prods nonterm_prods_set in + io, nonterm_prods_set, prods_set, reductions_set + end + end + end in + let rec fold_reductions_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless nonterm_prods_set reductions_tl = begin + match reductions_tl with + | Parse.ReductionsTlBarReduction {reduction; reductions_tl; _} -> begin + let io, nonterm_prods_set, prods_set, reductions_set = + fold_reduction io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless nonterm_prods_set reduction in + fold_reductions_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless nonterm_prods_set reductions_tl + end + | ReductionsTlEpsilon -> io, nonterm_prods_set, prods_set, reductions_set + end in + let fold_reductions io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless reductions = begin + match reductions with + | Parse.ReductionsBarReduction {reduction; reductions_tl; _} + | ReductionsReduction {reduction; reductions_tl} -> begin + let nonterm_prods_set = Ordset.empty (module Prod) in + let io, nonterm_prods_set, prods_set, reductions_set = + fold_reduction io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless nonterm_prods_set reduction in + fold_reductions_tl io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless nonterm_prods_set reductions_tl + end + end in + let fold_nonterm io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set nonterm = begin + let start, name, prec = match nonterm with + | Parse.NontermProds {nonterm_type; cident=Cident {cident}; prec_ref; _} + | NontermReductions {nonterm_type; cident=Cident {cident}; prec_ref; _} -> begin 
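+ (* Determine whether this non-terminal is a start symbol and resolve its optional precedence
+ * reference before folding its productions/reductions. *)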
+ let start = match nonterm_type with + | NontermTypeNonterm _ -> false + | NontermTypeStart _ -> true + in + let name = string_of_token cident in + let prec = match prec_ref with + | PrecRefPrecUident {uident=Uident {uident}; _} -> begin + let prec_name = string_of_token uident in + match Map.get prec_name precs_map with + | None -> begin + let io = + io.err + |> Fmt.fmt "hocc: At " |> Hmc.Source.Slice.pp (Scan.Token.source uident) + |> Fmt.fmt ": Undefined precedence: " |> Fmt.fmt prec_name |> Fmt.fmt "\n" + |> Io.with_err io + in + Io.fatal io + end + | Some _ as prec -> prec + end + | PrecRefEpsilon -> None + in + start, name, prec + end + in + let SymbolInfo.{index; qtype; _} = Map.get_hlt name symbol_infos in + let epsilon = Map.get_hlt "EPSILON" symbols_map in + let nonterm_prodless = Symbol.init_nonterm ~index ~name ~qtype ~prec ~stmt:(Some nonterm) ~start + ~prods:(Ordset.empty (module Prod)) ~epsilon in + let io, prods, prods_set, reductions_set = match nonterm with + | NontermProds {prods; _} -> + fold_prods io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless prods + | NontermReductions {reductions; _} -> + fold_reductions io ~precs_map ~aliases_map ~symbol_infos ~prods_set ~reductions_set + ~nonterm_prodless reductions + in + let nonterm = + Symbol.init_nonterm ~index ~name ~qtype ~prec ~start ~prods ~stmt:(Some nonterm) ~epsilon in + let symbols_map = Map.insert_hlt ~k:name ~v:nonterm symbols_map in + let io, symbols_map, prods_set, reductions_set = match start with + | false -> io, symbols_map, prods_set, reductions_set + | true -> begin + (* Synthesize wrapper for start symbol. *) + let name' = name ^ "'" in + let SymbolInfo.{index=index'; _} = Map.get_hlt name' symbol_infos in + let SymbolInfo.{index=pe_index; name=pe_name; qtype=pe_qtype; _} = + Map.get_hlt "PSEUDO_END" symbol_infos in + let io, rhs = Reduction.Params.init io [| + Reduction.Param.init ~binding:(Some "start") ~symbol_name:name ~qtype ~prod_param:None; + Reduction.Param.init ~binding:None ~symbol_name:pe_name ~qtype:pe_qtype + ~prod_param:None; + |] in + let reduction = Reduction.init ~index:(Ordset.length reductions_set) + ~lhs:QualifiedType.synthetic ~rhs ~code:None in + let reductions_set = Ordset.insert reduction reductions_set in + let prod = Prod.init ~index:(Ordset.length prods_set) ~lhs_index:index' + ~rhs_indexes:[|index; pe_index|] ~prec:None ~stmt:None ~reduction in + let prods = Ordset.singleton (module Prod) prod in + let prods_set = Ordset.insert prod prods_set in + let nonterm' = Symbol.init_nonterm ~index:index' ~name:name' + ~qtype:QualifiedType.synthetic ~prec:None ~stmt:None ~start ~prods ~epsilon in + let symbols_map = Map.insert_hlt ~k:name' ~v:nonterm' symbols_map in + io, symbols_map, prods_set, reductions_set + end + in + io, symbols_map, prods_set, reductions_set + end in + let fold_stmt io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set ~reductions_set + stmt = begin + match stmt with + | Parse.StmtNonterm {nonterm} -> + fold_nonterm io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set nonterm + | _ -> io, symbols_map, prods_set, reductions_set + end in + let rec fold_stmts_tl io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set stmts_tl = begin + match stmts_tl with + | Parse.StmtsTl {stmt; stmts_tl; _} -> begin + let io, symbols_map, prods_set, reductions_set = + fold_stmt io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set stmt in + 
fold_stmts_tl io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set stmts_tl + end + | StmtsTlEpsilon -> io, symbols_map, prods_set, reductions_set + end in + let fold_stmts io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set ~reductions_set + stmts = begin + match stmts with + | Parse.Stmts {stmt; stmts_tl} -> begin + let io, symbols_map, prods_set, reductions_set = + fold_stmt io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set stmt in + fold_stmts_tl io ~precs_map ~aliases_map ~symbol_infos ~symbols_map ~prods_set + ~reductions_set stmts_tl + end + end in + (* Compute first/follow sets for all symbols. *) + let close_symbols nonterm_range epsilon symbols = begin + (* Iterate to a fixed point, given a per prod folding function. *) + let close_impl nonterm_range epsilon symbols ~fold_prod = begin + let fold_prods epsilon symbols ~fold_prod symbol = begin + Ordset.fold ~init:(symbols, symbol, false) ~f:(fun (symbols, symbol, merged) prod -> + match fold_prod ~epsilon symbols symbol prod with + | symbols', false -> symbols', symbol, merged + | symbols', true -> symbols', Array.get Symbol.(symbol.index) symbols', true + ) Symbol.(symbol.prods) + end in + let fold_nonterms nonterm_range epsilon symbols ~fold_prod = begin + Range.Uns.fold ~init:(symbols, false) ~f:(fun (symbols, merged) i -> + let symbol = Array.get i symbols in + match fold_prods epsilon symbols ~fold_prod symbol with + | _, _, false -> symbols, merged + | symbols', _, true -> symbols', true + ) nonterm_range + end in + let rec f nonterm_range epsilon symbols ~fold_prod = begin + match fold_nonterms nonterm_range epsilon symbols ~fold_prod with + | _, false -> symbols + | symbols', true -> f nonterm_range epsilon symbols' ~fold_prod + end in + f nonterm_range epsilon symbols ~fold_prod + end in + let close_first nonterm_range epsilon symbols = begin + let fold_prod ~epsilon symbols symbol prod = begin + let lr0item = Lr0Item.init ~prod ~dot:0L in + let lr1item = Lr1Item.init ~lr0item + ~follow:(Ordset.singleton (module Uns) Symbol.(epsilon.index)) in + let rhs_first = Lr1Item.first ~epsilon ~symbols lr1item in + (* Merge the RHS's first set into symbol's first set. *) + match Symbol.first_has_diff rhs_first symbol with + | false -> symbols, false + | true -> begin + let symbol' = Symbol.first_union rhs_first symbol in + let symbols' = Array.set symbol'.index symbol' symbols in + symbols', true + end + end in + close_impl nonterm_range epsilon symbols ~fold_prod + end in + let close_follow nonterm_range epsilon symbols = begin + let fold_prod ~epsilon symbols symbol prod = begin + match Array.length Prod.(prod.rhs_indexes) with + | 0L -> symbols, false + | _rhs_length -> begin + Array.Slice.foldi (Array.Slice.init prod.rhs_indexes) + ~init:(symbols, false) ~f:(fun i (symbols, merged) b_index -> + (* A ::= αBβ *) + let b = Array.get b_index symbols in + let lr0item = Lr0Item.init ~prod ~dot:(succ i) in + let lr1item = Lr1Item.init ~lr0item + ~follow:(Ordset.singleton (module Uns) Symbol.(epsilon.index)) in + let first_beta = Lr1Item.first ~epsilon ~symbols lr1item in + let first_beta_sans_epsilon = Ordset.remove epsilon.index first_beta in + (* Merge β's first set (sans "ε") into B's follow set. 
*) + let symbols', b', merged' = + match Symbol.follow_has_diff first_beta_sans_epsilon b with + | false -> symbols, b, merged + | true -> begin + let b' = Symbol.follow_union first_beta_sans_epsilon b in + let symbols' = Array.set b_index b' symbols in + symbols', b', true + end + in + (* If β's first set contains "ε", merge A's follow set into B's follow set. *) + let symbols', merged' = match Ordset.mem epsilon.index first_beta && + Symbol.follow_has_diff Symbol.(symbol.follow) b' with + | false -> symbols', merged' + | true -> begin + let b' = Symbol.follow_union symbol.follow b' in + let symbols' = Array.set b_index b' symbols in + symbols', true + end + in + symbols', merged' + ) + end + end in + close_impl nonterm_range epsilon symbols ~fold_prod + end in + symbols + |> close_first nonterm_range epsilon + |> close_follow nonterm_range epsilon + end in + (* Extract the non-terminal specifications from the AST. The end result will be + * symbols/prods/reductions arrays, each of which contains records that themselves encode their + * array offsets. Tokens have already been extracted into `tokens_map`, and symbols have already + * been assigned indices viz `symbol_infos`; prod/reduction indexes are incrementally assigned + * during AST traversal. *) + let reductions_set = Ordset.empty (module Reduction) in + let prods_set = Ordset.empty (module Prod) in + let io, symbols_map, prods_set, reductions_set = + match hmh with Parse.Hmh {hocc=Hocc {stmts; _}; _} -> + fold_stmts io ~precs_map ~aliases_map ~symbol_infos ~symbols_map:tokens_map ~prods_set + ~reductions_set stmts + in + (* Convert symbols map to array and close on the first/follow sets. *) + let epsilon = Map.get_hlt "EPSILON" symbols_map in + let symbols_kvpairs = + Map.to_array symbols_map + |> Array.sort ~cmp:(fun (_, symbol0) (_, symbol1) -> Symbol.cmp symbol0 symbol1) in + let nonterm_range = Map.length tokens_map =:< Map.length symbols_map in + let symbols = Array.init (Array.range symbols_kvpairs) ~f:(fun i -> + match Array.get i symbols_kvpairs with + | (_, symbol) -> symbol + ) |> close_symbols nonterm_range epsilon in + (* Convert productions set to array. *) + let prods = Ordset.to_array prods_set in + (* Convert reductions map to array. *) + let reductions = Ordset.to_array reductions_set in + io, epsilon, symbols, prods, reductions + +let lr1itemsetclosures_init io ~epsilon ~symbols ~nonterm_range ~algorithm = + (* Collect the initial LR(1) item set closures that comprise the initial work queue. There is one + * such closure for each synthetic start symbol. *) + let init ~epsilon ~symbols ~nonterm_range = begin + let lr1itemsetclosures_map, kernels, workq = Range.Uns.fold + ~init:(Ordmap.empty (module Uns), Map.empty (module Lr0Itemset), Workq.empty) + ~f:(fun ((lr1itemsetclosures_map, kernels, workq) as accum) symbol_index -> + let symbol = Array.get symbol_index symbols in + match Symbol.is_synthetic symbol with + | false -> accum + | true -> begin + assert (Uns.(=) (Ordset.length symbol.prods) 1L); + let prod = Ordset.choose_hlt symbol.prods in (* There can be only one. 
⚔ *) + let dot = 0L in + let lr0item = Lr0Item.init ~prod ~dot in + let lr1item = Lr1Item.init ~lr0item ~follow:symbol.follow in + let lr1itemset = Lr1Itemset.singleton lr1item in + let lr0itemset = Lr1Itemset.lr0itemset lr1itemset in + let index = Ordmap.length lr1itemsetclosures_map in + let lr1itemsetclosure = Lr1ItemsetClosure.init ~epsilon ~symbols ~index lr1itemset in + let lr1itemsetclosures_map' = + Ordmap.insert ~k:index ~v:lr1itemsetclosure lr1itemsetclosures_map in + let kernels' = + Map.insert_hlt ~k:lr0itemset ~v:(Ordset.singleton (module Uns) index) kernels in + let workq' = Workq.push_back index workq in + lr1itemsetclosures_map', kernels', workq' + end + ) nonterm_range in + lr1itemsetclosures_map, kernels, workq + end in + (* Iteratively process the work queue until no work remains. *) + let rec close_lr1itemsets io ~epsilon ~symbols ~compat ~lr1itemsetclosures_map ~kernels ~workq + = begin + match Workq.is_empty workq with + | true -> io, lr1itemsetclosures_map + | false -> begin + let index, workq' = Workq.pop workq in + let lr1itemsetclosure = Ordmap.get_hlt index lr1itemsetclosures_map in + let io, lr1itemsetclosures_map', kernels', workq' = Ordset.fold + ~init:(io, lr1itemsetclosures_map, kernels, workq') + ~f:(fun (io, lr1itemsetclosures_map, kernels, workq) symbol_index -> + let symbol = Array.get symbol_index symbols in + let goto = Lr1ItemsetClosure.goto symbol lr1itemsetclosure in + let goto_kernel = Lr1Itemset.lr0itemset goto in + match Map.get goto_kernel kernels with + | None -> begin + (* Create a new LR(1) item set closure. *) + let goto_index = Ordmap.length lr1itemsetclosures_map in + let goto_closure = + Lr1ItemsetClosure.init ~epsilon ~symbols ~index:goto_index goto in + let lr1itemsetclosures_map' = Ordmap.insert_hlt ~k:goto_index ~v:goto_closure + lr1itemsetclosures_map in + let io = io.log |> Fmt.fmt "+" |> Io.with_log io in + let indexes = Ordset.singleton (module Uns) goto_index in + let kernels' = Map.insert_hlt ~k:goto_kernel ~v:indexes kernels in + let workq' = Workq.push_back goto_index workq in + io, lr1itemsetclosures_map', kernels', workq' + end + | Some indexes -> begin + match Ordset.fold_until ~init:None ~f:(fun _ merge_index -> + let merge_lr1itemsetclosure = + Ordmap.get_hlt merge_index lr1itemsetclosures_map in + match compat goto merge_lr1itemsetclosure with + | false -> None, false + | true -> Some merge_index, true + ) indexes with + | None -> begin + (* Create a new LR(1) item set closure; non-unique goto kernel. *) + let goto_index = Ordmap.length lr1itemsetclosures_map in + let goto_closure = + Lr1ItemsetClosure.init ~epsilon ~symbols ~index:goto_index goto in + let lr1itemsetclosures_map' = Ordmap.insert_hlt ~k:goto_index + ~v:goto_closure lr1itemsetclosures_map in + let io = io.log |> Fmt.fmt "*" |> Io.with_log io in + let indexes' = Ordset.insert goto_index indexes in + let kernels' = Map.update_hlt ~k:goto_kernel ~v:indexes' kernels in + let workq' = Workq.push_back goto_index workq in + io, lr1itemsetclosures_map', kernels', workq' + end + | Some merge_index -> begin + (* Merge into existing LR(1) item set closure. 
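+ * If the merge changes the closure, re-enqueue it so that its goto successors are recomputed.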
*) + let merge_lr1itemsetclosure = + Ordmap.get_hlt merge_index lr1itemsetclosures_map in + let merged, merge_lr1itemsetclosure' = + Lr1ItemsetClosure.merge ~epsilon ~symbols goto merge_lr1itemsetclosure in + let io, lr1itemsetclosures_map', workq' = match merged with + | false -> io, lr1itemsetclosures_map, workq + | true -> begin + let lr1itemsetclosures_map' = Ordmap.update_hlt ~k:merge_index + ~v:merge_lr1itemsetclosure' lr1itemsetclosures_map in + let workq' = match Workq.mem merge_index workq with + | true -> workq + | false -> Workq.push merge_index workq + in + let io = io.log |> Fmt.fmt "." |> Io.with_log io in + io, lr1itemsetclosures_map', workq' + end + in + io, lr1itemsetclosures_map', kernels, workq' + end + end + ) (Lr1ItemsetClosure.next lr1itemsetclosure) in + close_lr1itemsets io ~epsilon ~symbols ~compat + ~lr1itemsetclosures_map:lr1itemsetclosures_map' ~kernels:kernels' ~workq:workq' + end + end in + let lr1itemsetclosures_map, kernels, workq = init ~epsilon ~symbols ~nonterm_range in + let compat = match algorithm with + | Conf.LR1Compact -> Lr1ItemsetClosure.compat_weak + | Conf.LR1Canonical -> Lr1ItemsetClosure.compat_ident + in + let io = + io.log + |> Fmt.fmt "hocc: LR(1) item set compatibility: " + |> Fmt.fmt ( + match algorithm with + | Conf.LR1Compact -> "weak" + | Conf.LR1Canonical -> "ident" + ) + |> Fmt.fmt "\n" + |> Io.with_log io + in + let io, lr1itemsetclosures_map = + close_lr1itemsets io ~epsilon ~symbols ~compat ~lr1itemsetclosures_map ~kernels ~workq in + (* XXX Log unused symbols. *) + (* XXX Log unused productions. *) + (* Convert lr1itemsetclosures map to array. *) + let lr1itemsetclosures = + Ordmap.to_array lr1itemsetclosures_map + |> Array.map ~f:(fun (_, lr1itemsetclosure) -> lr1itemsetclosure) in + io, lr1itemsetclosures + +let init conf io hmh = + let io = io.log |> Fmt.fmt "hocc: Generating specification\n" |> Io.with_log io in + let io, precs_map, precs = precs_init io hmh in + let io, tokens_map, aliases_map = tokens_init io ~precs_map hmh in + let io, symbol_infos = symbol_infos_init io ~tokens_map hmh in + let io, epsilon, symbols, prods, reductions = + symbols_init io ~precs_map ~tokens_map ~aliases_map ~symbol_infos hmh in + let nonterm_range = Map.length tokens_map =:< Array.length symbols in + let io, lr1itemsetclosures = + lr1itemsetclosures_init io ~epsilon ~symbols ~nonterm_range ~algorithm:(Conf.algorithm conf) in + io, {precs; symbols; prods; reductions; lr1itemsetclosures} + +let to_txt conf io t = + let pp_symbol_index symbol_index formatter = begin + let symbol = Array.get symbol_index t.symbols in + match symbol.alias with + | None -> formatter |> Fmt.fmt symbol.name + | Some alias -> formatter |> String.pp alias + end in + let pp_prod Prod.{lhs_index; rhs_indexes; prec; _} formatter = begin + formatter + |> Fmt.fmt (Symbol.name (Array.get lhs_index t.symbols)) + |> Fmt.fmt " ::=" + |> (fun formatter -> + match Array.length rhs_indexes with + | 0L -> formatter |> Fmt.fmt " epsilon" + | _ -> begin + Array.fold ~init:formatter ~f:(fun formatter rhs_index -> + formatter + |> Fmt.fmt " " + |> pp_symbol_index rhs_index + ) rhs_indexes + end + ) + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + end in + let io = io.log |> Fmt.fmt "hocc: Generating text report\n" |> Io.with_log io in + io.txt + |> Fmt.fmt (Path.Segment.to_string_hlt (Conf.module_ conf)) |> Fmt.fmt " grammar\n\n" + |> Fmt.fmt "Precedences\n" + |> (fun formatter -> + Array.fold 
~init:formatter ~f:(fun formatter Prec.{name; assoc; doms; _} -> + formatter + |> Fmt.fmt (match assoc with + | None -> " prec " + | Some Left -> " left " + | Some Right -> " right " + ) + |> Fmt.fmt name + |> (fun formatter -> + match Ordset.is_empty doms with + | true -> formatter + | false -> begin + let _, formatter = Ordset.fold ~init:(true, formatter) + ~f:(fun (first, formatter) prec_ind -> + let formatter = + formatter + |> Fmt.fmt (match first with + | true -> " < " + | false -> ", " + ) + |> Fmt.fmt (Array.get prec_ind t.precs).name + in + (false, formatter) + ) doms + in + formatter + end + ) + |> Fmt.fmt "\n" + ) t.precs + ) + |> Fmt.fmt "Tokens\n" + |> (fun formatter -> + Array.fold ~init:formatter + ~f:(fun formatter (Symbol.{name; alias; qtype; prec; first; follow; _} as symbol) -> + match Symbol.is_token symbol with + | false -> formatter + | true -> begin + formatter + |> Fmt.fmt " token " + |> Fmt.fmt name + |> (fun formatter -> + match alias with + | None -> formatter + | Some alias -> formatter |> Fmt.fmt " " |> String.pp alias + ) + |> (fun formatter -> + match qtype with + | Synthetic + | Implicit -> formatter + | Explicit {module_; type_} -> + formatter |> Fmt.fmt " of " |> Fmt.fmt module_ |> Fmt.fmt "." |> Fmt.fmt type_ + ) + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + |> Fmt.fmt "\n First: " + |> (List.pp pp_symbol_index) (Ordset.to_list first) + |> Fmt.fmt "\n follow: " + |> (List.pp pp_symbol_index) (Ordset.to_list follow) + |> Fmt.fmt "\n" + end + ) t.symbols + ) + |> Fmt.fmt "Non-terminals\n" + |> (fun formatter -> + Array.fold ~init:formatter + ~f:(fun formatter (Symbol.{name; start; qtype; prods; first; follow; _} as symbol) -> + match Symbol.is_nonterm symbol with + | false -> formatter + | true -> begin + formatter + |> Fmt.fmt (match start with + | true -> " start " + | false -> " nonterm " + ) + |> Fmt.fmt name + |> (fun formatter -> + match qtype with + | Synthetic + | Implicit -> formatter + | Explicit {module_; type_} -> + formatter |> Fmt.fmt " of " |> Fmt.fmt module_ |> Fmt.fmt "." 
|> Fmt.fmt type_ + ) + |> Fmt.fmt "\n First: " + |> (List.pp pp_symbol_index) (Ordset.to_list first) + |> Fmt.fmt "\n Follow: " + |> (List.pp pp_symbol_index) (Ordset.to_list follow) + |> Fmt.fmt "\n Productions\n" + |> (fun formatter -> + Ordset.fold ~init:formatter + ~f:(fun formatter prod -> + formatter + |> Fmt.fmt " " + |> pp_prod prod + |> Fmt.fmt "\n" + ) prods + ) + end + ) t.symbols + ) + |> Fmt.fmt "Item sets\n" + |> Array.fmt ~alt:true (fun lr1itemsetclosure formatter -> + let pp_lr0item lr0item formatter = begin + let Lr0Item.{prod; dot} = lr0item in + let Prod.{lhs_index; rhs_indexes; _} = prod in + formatter + |> Fmt.fmt (Symbol.name (Array.get lhs_index t.symbols)) + |> Fmt.fmt " ::=" + |> (fun formatter -> + Array.foldi ~init:formatter ~f:(fun i formatter rhs_index -> + formatter + |> Fmt.fmt (match i = dot with + | false -> "" + | true -> " ·" + ) + |> Fmt.fmt " " + |> pp_symbol_index rhs_index + ) rhs_indexes + |> Fmt.fmt ( + match Array.length rhs_indexes = dot with + | false -> "" + | true -> " ·" + ) + ) + end in + let pp_lr1item lr1item formatter = begin + let Lr1Item.{lr0item; _} = lr1item in + let Lr0Item.{prod; _} = lr0item in + let Prod.{prec; _} = prod in + formatter + |> Fmt.fmt "[" + |> pp_lr0item lr0item + |> Fmt.fmt ", {" + |> (fun formatter -> + Array.foldi ~init:formatter ~f:(fun i formatter symbol_index -> + formatter + |> Fmt.fmt (match i with + | 0L -> "" + | _ -> ", " + ) + |> pp_symbol_index symbol_index + ) (Ordset.to_array Lr1Item.(lr1item.follow)) + ) + |> Fmt.fmt "}]" + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + end in + formatter + |> Uns.pp Lr1ItemsetClosure.(lr1itemsetclosure.index) + |> Fmt.fmt ": kernel=" + |> (fun formatter -> + let lr1item_list = Lr1Itemset.fold ~init:[] ~f:(fun accum lr1itemset -> + lr1itemset :: accum + ) Lr1ItemsetClosure.(lr1itemsetclosure.kernel) in + formatter |> List.fmt ~alt:true ~width:4L pp_lr1item lr1item_list + ) + |> Fmt.fmt "; added=" + |> (fun formatter -> + let lr1item_list = Lr1Itemset.fold ~init:[] ~f:(fun accum lr1itemset -> + lr1itemset :: accum + ) Lr1ItemsetClosure.(lr1itemsetclosure.added) in + formatter |> List.fmt ~alt:true ~width:4L pp_lr1item lr1item_list + ) + ) t.lr1itemsetclosures + (* XXX Item sets. *) + |> Fmt.fmt "\n" + |> Fmt.fmt "States\n" + (* XXX States. *) + |> Io.with_txt io + +let to_html conf io t = + let pp_symbol_index symbol_index formatter = begin + let symbol = Array.get symbol_index t.symbols in + let pretty_name = match symbol.alias with + | None -> symbol.name + | Some alias -> + String.Fmt.empty |> Fmt.fmt "“" |> Fmt.fmt alias |> Fmt.fmt "”" |> Fmt.to_string + in + formatter |> Fmt.fmt " Fmt.fmt symbol.name |> Fmt.fmt "\">" + |> Fmt.fmt pretty_name |> Fmt.fmt "" + end in + let io = io.log |> Fmt.fmt "hocc: Generating html report\n" |> Io.with_log io in + io.html + |> Fmt.fmt "\n" + |> Fmt.fmt "\n" + |> Fmt.fmt "
" |> Fmt.fmt (Path.Segment.to_string_hlt (Conf.module_ conf)) + |> Fmt.fmt " grammar\n" + |> Fmt.fmt "Sections\n" + |> Fmt.fmt "\n" + |> Fmt.fmt "\n" + |> Fmt.fmt "Precedences\n" + |> Fmt.fmt "
    \n" + |> (fun formatter -> + Array.fold ~init:formatter ~f:(fun formatter Prec.{name; assoc; doms; _} -> + formatter + |> Fmt.fmt "
  • " + |> Fmt.fmt (match assoc with + | None -> "prec" + | Some Left -> "left" + | Some Right -> "right" + ) + |> Fmt.fmt " Fmt.fmt name |> Fmt.fmt "\">" + |> Fmt.fmt name + |> Fmt.fmt "" + |> (fun formatter -> + match Ordset.is_empty doms with + | true -> formatter + | false -> begin + let _, formatter = Ordset.fold ~init:(true, formatter) + ~f:(fun (first, formatter) prec_ind -> + let ref_name = (Array.get prec_ind t.precs).name in + let formatter = + formatter + |> Fmt.fmt (match first with + | true -> " < " + | false -> ", " + ) + |> Fmt.fmt " Fmt.fmt ref_name |> Fmt.fmt "\">" + |> Fmt.fmt ref_name + |> Fmt.fmt "" + in + (false, formatter) + ) doms + in + formatter + end + ) + |> Fmt.fmt "
  • \n" + ) t.precs + ) + |> Fmt.fmt "
\n" + |> Fmt.fmt "Tokens\n" + |> Fmt.fmt "
    \n" + |> (fun formatter -> + Array.fold ~init:formatter + ~f:(fun formatter (Symbol.{name; alias; qtype; prec; first; follow; _} as symbol) -> + match Symbol.is_token symbol with + | false -> formatter + | true -> begin + formatter + |> Fmt.fmt "
  • token " + |> Fmt.fmt " Fmt.fmt name |> Fmt.fmt "\">" + |> Fmt.fmt name + |> Fmt.fmt "" + |> (fun formatter -> + match alias with + | None -> formatter + | Some alias -> formatter |> Fmt.fmt " " |> String.pp alias + ) + |> (fun formatter -> + match qtype with + | Synthetic + | Implicit -> formatter + | Explicit {module_; type_} -> + formatter |> Fmt.fmt " of " |> Fmt.fmt module_ |> Fmt.fmt "." |> Fmt.fmt type_ + ) + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> + formatter + |> Fmt.fmt " prec " + |> Fmt.fmt " Fmt.fmt name |> Fmt.fmt "\">" + |> Fmt.fmt name + |> Fmt.fmt "" + ) + |> Fmt.fmt "\n" + |> Fmt.fmt "
      \n" + |> Fmt.fmt "
    • First: " + |> (List.pp pp_symbol_index) (Ordset.to_list first) + |> Fmt.fmt "
    • \n" + |> Fmt.fmt "
    • Follow: " + |> (List.pp pp_symbol_index) (Ordset.to_list follow) + |> Fmt.fmt "
    • \n" + |> Fmt.fmt "
    \n" + |> Fmt.fmt "
  • \n" + end + ) t.symbols + ) + |> Fmt.fmt "
\n" + |> Fmt.fmt "Non-terminals\n" + |> Fmt.fmt "
    \n" + |> (fun formatter -> + Array.fold ~init:formatter + ~f:(fun formatter (Symbol.{name; start; qtype; prods; first; follow; _} as symbol) -> + match Symbol.is_nonterm symbol with + | false -> formatter + | true -> begin + formatter + |> Fmt.fmt (match start with + | true -> "
  • start " + | false -> "
  • nonterm " + ) + |> Fmt.fmt " Fmt.fmt name |> Fmt.fmt "\">" + |> Fmt.fmt name + |> Fmt.fmt "" + |> (fun formatter -> + match qtype with + | Synthetic + | Implicit -> formatter + | Explicit {module_; type_} -> + formatter |> Fmt.fmt " of " |> Fmt.fmt module_ |> Fmt.fmt "." |> Fmt.fmt type_ + ) + |> Fmt.fmt "\n" + |> Fmt.fmt "
      \n" + |> Fmt.fmt "
    • First: " + |> (List.pp pp_symbol_index) (Ordset.to_list first) + |> Fmt.fmt "
    • \n" + |> Fmt.fmt "
    • Follow: " + |> (List.pp pp_symbol_index) (Ordset.to_list follow) + |> Fmt.fmt "
    • \n" + |> Fmt.fmt "
    • Productions\n" + |> Fmt.fmt "
        \n" + |> (fun formatter -> + Ordset.fold ~init:formatter + ~f:(fun formatter Prod.{lhs_index; rhs_indexes; prec; _} -> + let lhs_name = Symbol.name (Array.get lhs_index t.symbols) in + formatter + |> Fmt.fmt "
      • " + |> Fmt.fmt " Fmt.fmt lhs_name |> Fmt.fmt "\">" + |> Fmt.fmt lhs_name + |> Fmt.fmt " ::=" + |> (fun formatter -> + match Array.length rhs_indexes with + | 0L -> formatter |> Fmt.fmt " epsilon" + | _ -> begin + Array.fold ~init:formatter ~f:(fun formatter rhs_index -> + let rhs_name = Symbol.name (Array.get rhs_index t.symbols) in + formatter + |> Fmt.fmt " " + |> Fmt.fmt " Fmt.fmt rhs_name |> Fmt.fmt "\">" + |> pp_symbol_index rhs_index + |> Fmt.fmt "" + ) rhs_indexes + end + ) + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + |> Fmt.fmt "
      • \n" + ) prods + |> Fmt.fmt "
      \n" + |> Fmt.fmt "
    • \n" + ) + |> Fmt.fmt "
    \n" + end + ) t.symbols + ) + |> Fmt.fmt "
\n" + |> Fmt.fmt "Item sets\n" + |> Fmt.fmt "    \n" + (* XXX Item sets. *) + |> Fmt.fmt "\n" + |> Fmt.fmt "States\n" + |> Fmt.fmt "    \n" + (* XXX States. *) + |> Fmt.fmt "
\n" + |> Fmt.fmt "\n" + |> Fmt.fmt "\n" + |> Io.with_html io + +let to_hocc io t = + let pp_symbol_index symbol_index formatter = begin + let symbol = Array.get symbol_index t.symbols in + match symbol.alias with + | None -> formatter |> Fmt.fmt symbol.name + | Some alias -> formatter |> String.pp alias + end in + let io = io.log |> Fmt.fmt "hocc: Generating hocc report\n" |> Io.with_log io in + io.hocc + |> Fmt.fmt "hocc\n" + |> (fun formatter -> + Array.fold ~init:formatter ~f:(fun formatter Prec.{name; assoc; stmt; _} -> + formatter + |> Fmt.fmt (match assoc with + | None -> " prec " + | Some Left -> " left " + | Some Right -> " right " + ) + |> Fmt.fmt name + |> (fun formatter -> + match stmt with + | Prec {prec_rels=PrecRelsLtPrecs {precs=Precs {uident=Uident {uident}; precs_tl}; _}; _} -> + begin + let rec fmt_precs_tl precs_tl formatter = begin + match precs_tl with + | Parse.PrecsTlCommaUident {uident=Uident {uident}; precs_tl; _} -> begin + formatter + |> Fmt.fmt ", " |> Fmt.fmt (string_of_token uident) + |> fmt_precs_tl precs_tl + end + | PrecsTlEpsilon -> formatter + end in + formatter + |> Fmt.fmt " < " |> Fmt.fmt (string_of_token uident) + |> fmt_precs_tl precs_tl + end + | Prec {prec_rels=PrecRelsEpsilon; _} -> formatter + ) + |> Fmt.fmt "\n" + ) t.precs + ) + |> (fun formatter -> + Array.fold ~init:formatter ~f:(fun formatter (Symbol.{name; alias; prec; _} as symbol) -> + match Symbol.is_token symbol && not (Symbol.is_synthetic symbol) with + | false -> formatter + | true -> begin + formatter + |> Fmt.fmt " token " + |> Fmt.fmt name + |> (fun formatter -> + match alias with + | None -> formatter + | Some alias -> formatter |> Fmt.fmt " " |> String.pp alias + ) + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + |> Fmt.fmt "\n" + end + ) t.symbols + ) + |> (fun formatter -> + Array.fold ~init:formatter ~f:(fun formatter (Symbol.{name; start; prec; prods; _} as symbol) -> + match Symbol.is_nonterm symbol && not (Symbol.is_synthetic symbol) with + | false -> formatter + | true -> begin + formatter + |> Fmt.fmt (match start with + | true -> " start " + | false -> " nonterm " + ) + |> Fmt.fmt name + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> + formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + |> Fmt.fmt " ::=" + |> Fmt.fmt (match Ordset.length prods with + | 1L -> "" + | _ -> "\n" + ) + |> (fun formatter -> + Ordset.fold ~init:formatter ~f:(fun formatter Prod.{rhs_indexes; prec; _} -> + formatter + |> Fmt.fmt (match Ordset.length prods with + | 1L -> "" + | _ -> " |" + ) + |> (fun formatter -> + match Array.length rhs_indexes with + | 0L -> formatter |> Fmt.fmt " epsilon" + | _ -> begin + Array.fold ~init:formatter ~f:(fun formatter rhs_index -> + formatter + |> Fmt.fmt " " + |> pp_symbol_index rhs_index + ) rhs_indexes + end + ) + |> (fun formatter -> + match prec with + | None -> formatter + | Some {name; _} -> formatter |> Fmt.fmt " prec " |> Fmt.fmt name + ) + |> Fmt.fmt "\n" + ) prods + ) + end + ) t.symbols + ) + |> Io.with_hocc io diff --git a/bootstrap/bin/hocc/spec.mli b/bootstrap/bin/hocc/spec.mli new file mode 100644 index 000000000..c13db8b28 --- /dev/null +++ b/bootstrap/bin/hocc/spec.mli @@ -0,0 +1,16 @@ +open! Basis +open! 
Basis.Rudiments + +type t = { + precs: Prec.t array; + symbols: Symbol.t array; + prods: Prod.t array; + reductions: Reduction.t array; + lr1itemsetclosures: Lr1ItemsetClosure.t array; +} + +val init: Conf.t -> Io.t -> Parse.hmh -> Io.t * t + +val to_txt: Conf.t -> Io.t -> t -> Io.t +val to_html: Conf.t -> Io.t -> t -> Io.t +val to_hocc: Io.t -> t -> Io.t diff --git a/bootstrap/bin/hocc/symbol.ml b/bootstrap/bin/hocc/symbol.ml new file mode 100644 index 000000000..1652c77ef --- /dev/null +++ b/bootstrap/bin/hocc/symbol.ml @@ -0,0 +1,120 @@ +open Basis +open! Basis.Rudiments + +module T = struct + type stmt = + | Token of Parse.token + | Nonterm of Parse.nonterm + + let pp_stmt stmt formatter = + match stmt with + | Token token -> formatter |> Fmt.fmt "Token " |> Parse.fmt_token token + | Nonterm nonterm -> formatter |> Fmt.fmt "Nonterm " |> Parse.fmt_nonterm nonterm + + type t = { + index: uns; + name: string; + qtype: QualifiedType.t; + prec: Prec.t option; + stmt: stmt option; + alias: string option; + start: bool; + prods: (Prod.t, Prod.cmper_witness) Ordset.t; + first: (uns, Uns.cmper_witness) Ordset.t; + follow: (uns, Uns.cmper_witness) Ordset.t; + } + + let hash_fold {index; _} state = + state |> Uns.hash_fold index + + let cmp {index=index0; _} {index=index1; _} = + Uns.cmp index0 index1 + + let pp {index; name; qtype; prec; stmt; alias; start; prods; first; follow} formatter = + formatter + |> Fmt.fmt "{index=" |> Uns.pp index + |> Fmt.fmt "; name=" |> String.pp name + |> Fmt.fmt "; qtype=" |> QualifiedType.pp qtype + |> Fmt.fmt "; prec=" |> (Option.pp Prec.pp) prec + |> Fmt.fmt "; stmt=" |> (Option.pp pp_stmt) stmt + |> Fmt.fmt "; alias=" |> (Option.pp String.pp) alias + |> Fmt.fmt "; start=" |> Bool.pp start + |> Fmt.fmt "; prods=" |> Ordset.pp prods + |> Fmt.fmt "; first=" |> Ordset.pp first + |> Fmt.fmt "; follow=" |> Ordset.pp follow + |> Fmt.fmt "}" +end +include T +include Identifiable.Make(T) + +let init_token ~index ~name ~qtype ~prec ~stmt ~alias = + let stmt = match stmt with + | None -> None + | Some stmt -> Some (Token stmt) + in + let start = false in + let prods = Ordset.empty (module Prod) in + (* Tokens are in their own `first` sets. *) + let first = Ordset.singleton (module Uns) index in + let follow = Ordset.empty (module Uns) in + {index; name; qtype; prec; stmt; alias; start; prods; first; follow} + +let init_nonterm ~index ~name ~qtype ~prec ~stmt ~start ~prods ~epsilon = + let stmt = match stmt with + | None -> None + | Some stmt -> Some (Nonterm stmt) + in + let alias = None in + (* Insert "ε" into the `first` set if there is an epsilon production. *) + let has_epsilon_prod = Ordset.fold_until ~init:false ~f:(fun _has_epsilon_prod prod -> + let is_epsilon = Prod.is_epsilon prod in + is_epsilon, is_epsilon + ) prods in + let first = match has_epsilon_prod with + | false -> Ordset.empty (module Uns) + | true -> Ordset.singleton (module Uns) epsilon.index + in + (* Insert "ε" into the `follow` set for synthetic wrapper symbols. 
*) + let follow = match stmt with + | Some _ -> Ordset.empty (module Uns) + | None -> Ordset.singleton (module Uns) epsilon.index + in + {index; name; qtype; prec; stmt; alias; start; prods; first; follow} + +let is_token {prods; _} = + Ordset.is_empty prods + +let is_nonterm t = + not (is_token t) + +let is_synthetic {stmt; _} = + match stmt with + | None -> true + | Some _ -> false + +let index {index; _} = + index + +let name {name; _} = + name + +let first_mem ~other t = + Ordset.mem other.index t.first + +let first_has_diff symbol_indexes t = + not (Ordset.is_empty (Ordset.diff symbol_indexes t.first)) + +let first_insert ~other t = + let first = Ordset.insert other.index t.first in + {t with first} + +let first_union symbol_indexes t = + let first = Ordset.union symbol_indexes t.first in + {t with first} + +let follow_has_diff symbol_indexes t = + not (Ordset.is_empty (Ordset.diff symbol_indexes t.follow)) + +let follow_union symbol_indexes t = + let follow = Ordset.union symbol_indexes t.follow in + {t with follow} diff --git a/bootstrap/bin/hocc/symbol.mli b/bootstrap/bin/hocc/symbol.mli new file mode 100644 index 000000000..c34d5b430 --- /dev/null +++ b/bootstrap/bin/hocc/symbol.mli @@ -0,0 +1,43 @@ +open Basis +open! Basis.Rudiments + +type stmt = + | Token of Parse.token + | Nonterm of Parse.nonterm + +type t = { + index: uns; + name: string; + qtype: QualifiedType.t; + prec: Prec.t option; + stmt: stmt option; + alias: string option; + start: bool; + prods: (Prod.t, Prod.cmper_witness) Ordset.t; + first: (uns, Uns.cmper_witness) Ordset.t; + follow: (uns, Uns.cmper_witness) Ordset.t; +} + +val init_token: index:uns -> name:string -> qtype:QualifiedType.t -> prec:Prec.t option + -> stmt:Parse.token option -> alias:string option -> t + +val init_nonterm: index:uns -> name:string -> qtype:QualifiedType.t -> prec:Prec.t option + -> stmt:Parse.nonterm option -> start:bool -> prods:(Prod.t, Prod.cmper_witness) Ordset.t + -> epsilon:t -> t + +val is_token: t -> bool +val is_nonterm: t -> bool +val is_synthetic: t -> bool + +include IdentifiableIntf.S with type t := t + +val index: t -> uns +val name: t -> string + +val first_mem: other:t -> t -> bool +val first_has_diff: (uns, Uns.cmper_witness) Ordset.t -> t -> bool +val first_insert: other:t -> t -> t +val first_union: (uns, Uns.cmper_witness) Ordset.t -> t -> t + +val follow_has_diff: (uns, Uns.cmper_witness) Ordset.t -> t -> bool +val follow_union: (uns, Uns.cmper_witness) Ordset.t -> t -> t diff --git a/bootstrap/bin/hocc/workq.ml b/bootstrap/bin/hocc/workq.ml new file mode 100644 index 000000000..b63d38e48 --- /dev/null +++ b/bootstrap/bin/hocc/workq.ml @@ -0,0 +1,40 @@ +open Basis +open! 
Basis.Rudiments + +type t = { + deq: uns Deq.t; + set: (uns, Uns.cmper_witness) Set.t; +} + +let empty = { + deq=Deq.empty; + set=Set.empty (module Uns); +} + +let length {set; _} = + Set.length set + +let is_empty {set; _} = + Set.is_empty set + +let push lr1itemsetclosure_index {deq; set} = + assert (not (Set.mem lr1itemsetclosure_index set)); + { + deq=Deq.push lr1itemsetclosure_index deq; + set=Set.insert lr1itemsetclosure_index set; + } + +let push_back lr1itemsetclosure_index {deq; set} = + assert (not (Set.mem lr1itemsetclosure_index set)); + { + deq=Deq.push_back lr1itemsetclosure_index deq; + set=Set.insert lr1itemsetclosure_index set; + } + +let pop {deq; set} = + let lr1itemsetclosure_index, deq' = Deq.pop deq in + let set' = Set.remove lr1itemsetclosure_index set in + lr1itemsetclosure_index, {deq=deq'; set=set'} + +let mem lr1itemsetclosure_index {set; _} = + Set.mem lr1itemsetclosure_index set diff --git a/bootstrap/bin/hocc/workq.mli b/bootstrap/bin/hocc/workq.mli new file mode 100644 index 000000000..7b8a3ce6f --- /dev/null +++ b/bootstrap/bin/hocc/workq.mli @@ -0,0 +1,18 @@ +open! Basis +open! Basis.Rudiments + +type t + +val empty: t + +val length: t -> uns + +val is_empty: t -> bool + +val push: uns -> t -> t + +val push_back: uns -> t -> t + +val pop: t -> uns * t + +val mem: uns -> t -> bool diff --git a/bootstrap/src/hmc/scan.ml b/bootstrap/src/hmc/scan.ml index 7c302ccf7..4baf742d7 100644 --- a/bootstrap/src/hmc/scan.ml +++ b/bootstrap/src/hmc/scan.ml @@ -576,6 +576,104 @@ module AbstractToken = struct formatter |> Fmt.fmt "Tok_error " |> (List.pp Rendition.Malformation.pp) mals ) |> Fmt.fmt ">" + + let malformations = function + (* Keywords. *) + | Tok_and | Tok_also | Tok_as | Tok_conceal | Tok_effect | Tok_else | Tok_expose | Tok_external + | Tok_false | Tok_fn | Tok_function | Tok_if | Tok_import | Tok_include | Tok_lazy | Tok_let + | Tok_match | Tok_mutability | Tok_of | Tok_open | Tok_or | Tok_rec | Tok_then | Tok_true + | Tok_type | Tok_when | Tok_with + (* Operators. *) + | Tok_tilde_op _ | Tok_qmark_op _ | Tok_star_star_op _ | Tok_star_op _ | Tok_slash_op _ + | Tok_pct_op _ | Tok_plus_op _ | Tok_minus_op _ | Tok_at_op _ | Tok_caret_op _ | Tok_dollar_op _ + | Tok_lt_op _ | Tok_eq_op _ | Tok_gt_op _ | Tok_bar_op _ | Tok_colon_op _ | Tok_dot_op _ + (* Punctuation. *) + | Tok_tilde | Tok_qmark | Tok_minus | Tok_lt | Tok_lt_eq | Tok_eq | Tok_lt_gt | Tok_gt_eq + | Tok_gt | Tok_comma | Tok_dot | Tok_dot_dot | Tok_semi | Tok_colon | Tok_colon_colon + | Tok_colon_eq | Tok_lparen | Tok_rparen | Tok_lbrack | Tok_rbrack | Tok_lcurly | Tok_rcurly + | Tok_bar | Tok_lcapture | Tok_rcapture | Tok_larray | Tok_rarray | Tok_bslash | Tok_tick + | Tok_caret | Tok_amp | Tok_xmark | Tok_arrow | Tok_carrow + (* Miscellaneous. 
*) + | Tok_source_directive (Constant _) + | Tok_line_delim + | Tok_indent (Constant _) + | Tok_dedent (Constant _) + | Tok_whitespace|Tok_hash_comment + | Tok_paren_comment (Constant _) + | Tok_uscore + | Tok_uident (Constant _) + | Tok_cident _ + | Tok_codepoint (Constant _) + | Tok_rstring (Constant _) + | Tok_istring (Constant _) + | Tok_fstring_lditto + | Tok_fstring_interpolated (Constant _) + | Tok_fstring_pct + | Tok_fstring_pad (Constant _) + | Tok_fstring_just _ | Tok_fstring_sign _ | Tok_fstring_alt | Tok_fstring_zpad + | Tok_fstring_width_star + | Tok_fstring_width (Constant _) + | Tok_fstring_pmode _ | Tok_fstring_precision_star + | Tok_fstring_precision (Constant _) + | Tok_fstring_radix _ | Tok_fstring_notation _ | Tok_fstring_pretty + | Tok_fstring_fmt (Constant _) + | Tok_fstring_sep (Constant _) + | Tok_fstring_label _ | Tok_fstring_lparen_caret | Tok_fstring_caret_rparen | Tok_fstring_rditto + | Tok_r32 (Constant _) + | Tok_r64 (Constant _) + | Tok_u8 (Constant _) + | Tok_i8 (Constant _) + | Tok_u16 (Constant _) + | Tok_i16 (Constant _) + | Tok_u32 (Constant _) + | Tok_i32 (Constant _) + | Tok_u64 (Constant _) + | Tok_i64 (Constant _) + | Tok_u128 (Constant _) + | Tok_i128 (Constant _) + | Tok_u256 (Constant _) + | Tok_i256 (Constant _) + | Tok_u512 (Constant _) + | Tok_i512 (Constant _) + | Tok_nat (Constant _) + | Tok_zint (Constant _) + | Tok_end_of_input | Tok_misaligned + -> [] + (* Malformations. *) + | Tok_source_directive (Malformed mals) + | Tok_indent (Malformed mals) + | Tok_dedent (Malformed mals) + | Tok_paren_comment (Malformed mals) + | Tok_uident (Malformed mals) + | Tok_codepoint (Malformed mals) + | Tok_rstring (Malformed mals) + | Tok_istring (Malformed mals) + | Tok_fstring_interpolated (Malformed mals) + | Tok_fstring_pad (Malformed mals) + | Tok_fstring_width (Malformed mals) + | Tok_fstring_precision (Malformed mals) + | Tok_fstring_fmt (Malformed mals) + | Tok_fstring_sep (Malformed mals) + | Tok_r32 (Malformed mals) + | Tok_r64 (Malformed mals) + | Tok_u8 (Malformed mals) + | Tok_i8 (Malformed mals) + | Tok_u16 (Malformed mals) + | Tok_i16 (Malformed mals) + | Tok_u32 (Malformed mals) + | Tok_i32 (Malformed mals) + | Tok_u64 (Malformed mals) + | Tok_i64 (Malformed mals) + | Tok_u128 (Malformed mals) + | Tok_i128 (Malformed mals) + | Tok_u256 (Malformed mals) + | Tok_i256 (Malformed mals) + | Tok_u512 (Malformed mals) + | Tok_i512 (Malformed mals) + | Tok_nat (Malformed mals) + | Tok_zint (Malformed mals) + | Tok_error mals + -> mals end module ConcreteToken = struct @@ -812,6 +910,9 @@ let view_of_t t = let text t = Source.(text (Cursor.container t.tok_base)) +let cursor {tok_base; _} = + tok_base + let str_of_cursor cursor t = Source.Slice.to_string (Source.Slice.of_cursors ~base:t.tok_base ~past:cursor) diff --git a/bootstrap/src/hmc/scan.mli b/bootstrap/src/hmc/scan.mli index 37491c002..564706a88 100644 --- a/bootstrap/src/hmc/scan.mli +++ b/bootstrap/src/hmc/scan.mli @@ -184,6 +184,7 @@ module AbstractToken : sig | Tok_arrow | Tok_carrow + (* Miscellaneous. *) | Tok_source_directive of source_directive Rendition.t | Tok_line_delim | Tok_indent of unit Rendition.t @@ -242,11 +243,19 @@ module AbstractToken : sig | Tok_error of Rendition.Malformation.t list val pp: t -> (module Fmt.Formatter) -> (module Fmt.Formatter) + + val malformations: t -> Rendition.Malformation.t list + (** [malformations t] returns a list of malformations associated with [t], or an empty list if + there are no malformations. 
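For example, [malformations Tok_uscore] returns [[]], whereas [malformations (Tok_error mals)] returns [mals].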
This function can be used on any token variant, even if no + malformations are possible. *) end (** Concrete tokens augment abstract tokens with source locations. *) module ConcreteToken : sig - type t + type t = { + atok: AbstractToken.t; + source: Source.Slice.t; + } val atok: t -> AbstractToken.t val source: t -> Source.Slice.t @@ -256,12 +265,18 @@ end type t +include FormattableIntf.SMono with type t := t + val init: Text.t -> t (** [init text] initializes scanner to scan [text]. *) val text: t -> Text.t (** [text t] returns the source text for [t]. *) +val cursor: t -> Source.Cursor.t +(** [cursor t] returns the cursor at the scanner's current position. This cursor is equivalent to + the base of the token returned by [next t]. *) + val next: t -> t * ConcreteToken.t (** [next t] scans the next token past the tokens scanned by [t]'s predecessor state(s) and returns the scanner's successor state along with a token. If [t] is at the end of input, there is no diff --git a/bootstrap/test/basis/seed/test_seed0.ml b/bootstrap/test/basis/seed/test_seed0.ml index f8e1b1e91..ad4899332 100644 --- a/bootstrap/test/basis/seed/test_seed0.ml +++ b/bootstrap/test/basis/seed/test_seed0.ml @@ -3,7 +3,7 @@ open Basis let () = File.Fmt.stdout |> Fmt.fmt "HEMLOCK_ENTROPY=" - |> String.pp (Sys.getenv "HEMLOCK_ENTROPY") + |> String.pp (Stdlib.Sys.getenv "HEMLOCK_ENTROPY") |> Fmt.fmt " -> seed=" |> Hash.State.pp Hash.State.seed |> Fmt.fmt "\n" diff --git a/bootstrap/test/basis/seed/test_seed42.ml b/bootstrap/test/basis/seed/test_seed42.ml index f8e1b1e91..ad4899332 100644 --- a/bootstrap/test/basis/seed/test_seed42.ml +++ b/bootstrap/test/basis/seed/test_seed42.ml @@ -3,7 +3,7 @@ open Basis let () = File.Fmt.stdout |> Fmt.fmt "HEMLOCK_ENTROPY=" - |> String.pp (Sys.getenv "HEMLOCK_ENTROPY") + |> String.pp (Stdlib.Sys.getenv "HEMLOCK_ENTROPY") |> Fmt.fmt " -> seed=" |> Hash.State.pp Hash.State.seed |> Fmt.fmt "\n" diff --git a/bootstrap/test/hocc/A.expected b/bootstrap/test/hocc/A.expected new file mode 100644 index 000000000..439cadf7a --- /dev/null +++ b/bootstrap/test/hocc/A.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./A.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++.++++.++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/A.hmh b/bootstrap/test/hocc/A.hmh new file mode 100644 index 000000000..5efa06634 --- /dev/null +++ b/bootstrap/test/hocc/A.hmh @@ -0,0 +1,22 @@ +hocc + left p1 + left p2 < p1 + + token PLUS + token STAR + token LPAREN + token RPAREN prec p1 + token ID + token EOI + + start E ::= + | E PLUS T EOI prec p2 + | T EOI + + nonterm T ::= + | T STAR F + | F + + nonterm F prec p1 ::= + | LPAREN E RPAREN + | ID diff --git a/bootstrap/test/hocc/B.expected b/bootstrap/test/hocc/B.expected new file mode 100644 index 000000000..f267b5ee9 --- /dev/null +++ b/bootstrap/test/hocc/B.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./B.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++.++++.++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/B.hmh b/bootstrap/test/hocc/B.hmh new file mode 100644 index 000000000..59ad744b7 --- /dev/null +++ b/bootstrap/test/hocc/B.hmh @@ -0,0 +1,19 @@ +hocc + token PLUS + token STAR + token LPAREN + token RPAREN + token ID + token EOI + + start E ::= + | E PLUS T EOI + | T EOI + + nonterm T ::= + | T STAR F + | F + + nonterm F ::= + | LPAREN E RPAREN + | ID diff --git a/bootstrap/test/hocc/C.expected b/bootstrap/test/hocc/C.expected new file mode 100644 index 000000000..54fc54557 
--- /dev/null +++ b/bootstrap/test/hocc/C.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./C.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++++++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/C.hmh b/bootstrap/test/hocc/C.hmh new file mode 100644 index 000000000..68d76f532 --- /dev/null +++ b/bootstrap/test/hocc/C.hmh @@ -0,0 +1,15 @@ +hocc + token EQUAL + token STAR + token ID + token EOI + + start S ::= + | L EQUAL R EOI + | R EOI + + nonterm L ::= + | STAR R + | ID + + nonterm R ::= L diff --git a/bootstrap/test/hocc/D.expected b/bootstrap/test/hocc/D.expected new file mode 100644 index 000000000..300f8d00d --- /dev/null +++ b/bootstrap/test/hocc/D.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./D.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak ++++++++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/D.hmh b/bootstrap/test/hocc/D.hmh new file mode 100644 index 000000000..bfdacc522 --- /dev/null +++ b/bootstrap/test/hocc/D.hmh @@ -0,0 +1,15 @@ +hocc + left p1 + left p2 < p1 + + token STAR "*" prec p1 + token PLUS "+" prec p2 + token ID + token EOI + + start S ::= E EOI + + nonterm E ::= + | ID + | E "*" E prec p1 + | E "+" E prec p2 diff --git a/bootstrap/test/hocc/E.expected b/bootstrap/test/hocc/E.expected new file mode 100644 index 000000000..1c112106a --- /dev/null +++ b/bootstrap/test/hocc/E.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./E.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++..+.+ \ No newline at end of file diff --git a/bootstrap/test/hocc/E.hmh b/bootstrap/test/hocc/E.hmh new file mode 100644 index 000000000..2a343971c --- /dev/null +++ b/bootstrap/test/hocc/E.hmh @@ -0,0 +1,10 @@ +hocc + token C + token D + token EOI + + start S ::= N N EOI + + nonterm N ::= + | C N + | D diff --git a/bootstrap/test/hocc/Example1.expected b/bootstrap/test/hocc/Example1.expected new file mode 100644 index 000000000..ecca4bcbc --- /dev/null +++ b/bootstrap/test/hocc/Example1.expected @@ -0,0 +1,5 @@ +hocc: Parsing "./Example1.hmhi" +hocc: Parsing "./Example1.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak ++++++++++++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/Example1.hmh b/bootstrap/test/hocc/Example1.hmh new file mode 100644 index 000000000..553247646 --- /dev/null +++ b/bootstrap/test/hocc/Example1.hmh @@ -0,0 +1,63 @@ +open import Basis + +# Specify the parser. `hocc ...` expands to a module implementation, `{ ... }`. +include hocc + left mul + token STAR "*" prec mul + token SLASH "/" prec mul + nonterm MulOp of Token.t ::= + | "*" -> STAR + | "/" -> SLASH + + left add < mul + token PLUS "+" prec add + token MINUS "-" prec add + nonterm AddOp of Token.t ::= + | "+" -> PLUS + | "-" -> MINUS + + token INT of Zint.t + nonterm Expr of Zint.t ::= + | e0:Expr op:MulOp e1:Expr -> + match op with + | MulOp STAR -> Zint.(e0 * e1) + | MulOp SLASH -> Zint.(e0 / e1) + | e0:Expr op:AddOp e1:Expr prec add -> + match op with + | AddOp PLUS -> Zint.(e0 + e1) + | AddOp MINUS -> Zint.(e0 - e1) + | x:INT -> x + + token EOI + start Answer of Zint.t ::= + | e:Expr EOI -> e + +# Tokenize `s`, e.g. "2 + 3 * 4", and append an `EOI` token. 
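+# For example, tokenize "2 + 3 * 4" yields (schematically) INT 2, PLUS, INT 3, STAR, INT 4, EOI.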
+tokenize s = + s |> String.split_rev ~f:(fn cp -> Codepoint.O.(cp = ' ')) + |> List.rev_filter ~f:(fn s -> String.length s <> 0) + |> List.rev_map ~f:fn s -> + let open Token + match s with + | "*" -> STAR + | "/" -> SLASH + | "+" -> PLUS + | "-" -> MINUS + | _ -> INT (Zint.of_string s) + |> List.push Token.EOI + |> List.rev + +# Calculate the result of the arithmetic expression expressed in `s`, e.g. "2 + 3 * 4". +calculate s = + List.fold_until (tokenize s) ~init:Start.Answer.boi ~f:fn parser tok -> + let parser' = Start.Answer.next tok parser + let done = match status parser' with + | Prefix -> false + | Accept _ + | Error _ -> true + parser', done + |> + function + | Accept answer -> answer + | Prefix _ -> halt "Partial input" + | Error _ -> halt "Parse error" diff --git a/bootstrap/test/hocc/Example1.hmhi b/bootstrap/test/hocc/Example1.hmhi new file mode 100644 index 000000000..b49400610 --- /dev/null +++ b/bootstrap/test/hocc/Example1.hmhi @@ -0,0 +1,9 @@ +open import Basis + +# Export the parser API so that alternatives to `calculate` can be implemented. `hocc` expands to a +# module signature. +include hocc + +calulate: string -> zint + [@@doc "Calculate the result of a simple arithmetic expression comprising non-negative integers + and `+`, `-`, `*`, and `/` operators. Tokens must be separated by one or more spaces."] diff --git a/bootstrap/test/hocc/F.expected b/bootstrap/test/hocc/F.expected new file mode 100644 index 000000000..287e00e28 --- /dev/null +++ b/bootstrap/test/hocc/F.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./F.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++*+++++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/F.hmh b/bootstrap/test/hocc/F.hmh new file mode 100644 index 000000000..41380461c --- /dev/null +++ b/bootstrap/test/hocc/F.hmh @@ -0,0 +1,16 @@ +hocc + token A + token B + token C + token D + token E + + start S ::= + | A M D + | B N D + | A N E + | B M E + + nonterm M ::= C + + nonterm N ::= C diff --git a/bootstrap/test/hocc/Hocc.expected b/bootstrap/test/hocc/Hocc.expected new file mode 100644 index 000000000..b7f5b0686 --- /dev/null +++ b/bootstrap/test/hocc/Hocc.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./Hocc.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++++++++.+.++++++++++++++++++++++++++.......+.+........++............+.............+..............+...............+........+++++++++++++++++++++++++++++++.++++++++++.+.+++++++++++++++++++.+.....++++++++++.+.+++.++.++++++++++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/Hocc.hmh b/bootstrap/test/hocc/Hocc.hmh new file mode 100644 index 000000000..108941ab8 --- /dev/null +++ b/bootstrap/test/hocc/Hocc.hmh @@ -0,0 +1,179 @@ +hocc + # hocc-specific keywords + token HOCC "hocc" + token NONTERM "nonterm" + token EPSILON_ "epsilon" + token START "start" + token TOKEN "token" + token PREC "prec" + token LEFT "left" + token RIGHT "right" + + # Identifiers + token UIDENT # Uncapitalized + token CIDENT # Capitalized + token USCORE "_" + + # Token alias + token STRING + + # Punctuation/separators + token COLON_COLON_EQ "::=" + token OF "of" + token COLON ":" + token DOT "." 
+ token ARROW "->" + token BAR "|" + token LT "<" + token COMMA "," + token SEMI ";" + token LINE_DELIM + + # Left-right paired delimiters + token INDENT + token DEDENT + token LPAREN "(" + token RPAREN ")" + token LCAPTURE "(|" + token RCAPTURE "|)" + token LBRACK "[" + token RBRACK "]" + token LARRAY "[|" + token RARRAY "|]" + token LCURLY "{" + token RCURLY "}" + + # Miscellaneous Hemlock token in embedded code + token CODE_TOKEN + + # End of input, used to terminate start symbols + token EOI + + nonterm Ident ::= UIDENT | CIDENT | "_" + + nonterm PrecsTl ::= + | "," UIDENT PrecsTl + | epsilon + + nonterm Precs ::= UIDENT PrecsTl + + nonterm PrecRels ::= + | "<" Precs + | epsilon + + nonterm PrecType ::= "prec" | "left" | "right" + + nonterm Prec ::= PrecType UIDENT PrecRels + + nonterm OfType ::= "of" CIDENT "." UIDENT + + nonterm OfType0 ::= + | OfType + | epsilon + + nonterm PrecRef ::= + | "prec" UIDENT + | epsilon + + nonterm TokenAlias ::= + | STRING + | epsilon + + nonterm Token ::= "token" CIDENT TokenAlias OfType0 PrecRef + + nonterm Sep ::= LINE_DELIM | ";" | "|" + + nonterm CodesTl ::= + | Sep Code CodesTl + | epsilon + + nonterm Codes ::= Code CodesTl + + nonterm Codes0 ::= + | Codes + | epsilon + + nonterm Delimited ::= + | INDENT Codes DEDENT + | "(" Codes0 ")" + | "(|" Codes0 "|)" + | "[" Codes0 "]" + | "[|" Codes0 "|]" + | "{" Codes0 "}" + + nonterm CodeTl ::= + | Delimited CodeTl + | CODE_TOKEN CodeTl + | epsilon + + nonterm Code ::= + | Delimited CodeTl + | CODE_TOKEN CodeTl + + nonterm ProdParamType ::= + | CIDENT + | STRING + + nonterm ProdParamIdent ::= + | Ident ":" + | epsilon + + nonterm ProdParam ::= ProdParamIdent ProdParamType + + nonterm ProdParamsTl ::= + | ProdParam ProdParamsTl + | epsilon + + nonterm ProdParams ::= ProdParam ProdParamsTl + + nonterm ProdPattern ::= + | ProdParams + | "epsilon" + + nonterm Prod ::= ProdPattern PrecRef + + nonterm ProdsTl ::= + | "|" Prod ProdsTl + | epsilon + + nonterm Prods ::= + | "|" Prod ProdsTl + | Prod ProdsTl + + nonterm Reduction ::= Prods "->" Code + + nonterm ReductionsTl ::= + | "|" Reduction ReductionsTl + | epsilon + + nonterm Reductions ::= + | "|" Reduction ReductionsTl + | Reduction ReductionsTl + + nonterm NontermType ::= "nonterm" | "start" + + nonterm Nonterm ::= + | NontermType CIDENT PrecRef "::=" Prods + | NontermType CIDENT OfType PrecRef "::=" Reductions + + nonterm Stmt ::= + | Prec + | Token + | Nonterm + | Code + + nonterm StmtsTl ::= + | LINE_DELIM Stmt StmtsTl + | epsilon + + nonterm Stmts ::= Stmt StmtsTl + + nonterm Hocc ::= "hocc" INDENT Stmts DEDENT + + nonterm Matter ::= + | CODE_TOKEN Matter + | epsilon + + start Hmh ::= Matter Hocc Matter EOI + + start Hmhi ::= Matter "hocc" Matter EOI diff --git a/bootstrap/test/hocc/Lyken.expected b/bootstrap/test/hocc/Lyken.expected new file mode 100644 index 000000000..7a2c43c07 --- /dev/null +++ b/bootstrap/test/hocc/Lyken.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./Lyken.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak 
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.+.+++++++++.+++++++++.+++++++++++++++++++..............................+..........................+.+.........+++++++.++++.++++++++++++++++++++++++++++.....................................+...................+....+................++.................................................................+.......................+.....+.++..........................................+...........+....................................................................................................+................+....+.........................................++...................................................................................+....+....+....................................++...................................................+++.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++...+...+...+...................................................+++++...........+++....+.++++++++.................++..+++++..+..+..+.+.+++++++++++++++++++++++++++++..+++....+++.++++++++++++++++++++++++++.++....++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.++++++++++++.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++...+.+++.....+++++++....+++++.+++++++.+.....+++.+++.....................................+.........................................+.+..................................................................................................................................................................................................................................................................................................................+++++++++++++++++++++++++++++++.+.+.++++++++..++..+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++..+..........+..............++++++++++++++++++++++.++.++++++++++++++++++++++++++++++++..++++++++++++++++++++++++++++++++++++++++++++++++..++++++++++++++++++++++++++++++++++..++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.+.++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.+.+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++.+...++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/Lyken.hmh b/bootstrap/test/hocc/Lyken.hmh new file mode 100644 index 000000000..9661c113d --- /dev/null +++ b/bootstrap/test/hocc/Lyken.hmh @@ -0,0 +1,1254 @@ +hocc + ################################################################################################ + # Precedences. 
+ + prec pGenericParamOne + left pColon < pGenericParamOne + + prec pProcParmBody + left pRparen < pProcParmBody + prec pList < pRparen, pColon + + prec pId + prec pOptionalGenericParamList < pId + left pLparen < pOptionalGenericParamList, pList + left pRarrow < pOptionalGenericParamList + + prec pProcParmRequired2 + prec pProcParmRequired1 < pProcParmRequired2 + + prec pTypeName2 + prec pVarRestId + left pDot < pTypeName2, pVarRestId, pOptionalGenericParamList + left pLbrace + left pLbracket < pVarRestId, pOptionalGenericParamList + prec pThis < pDot, pLbrace, pLbracket + + prec pUnaryExpr2 < pDot, pLbracket, pLparen + right pPowOp < pDot, pLbracket, pLparen, pUnaryExpr2 + left pMulOp < pPowOp + left pPlusOp < pMulOp + + prec pLvalListBody < pLbracket, pPlusOp + left pEq < pLvalListBody + prec pDoWhileExpr2 + + left pCmpOp1 < pPlusOp, pLvalListBody, pDoWhileExpr2 + prec pAssnExpr1 < pCmpOp1 + + prec pExprListBodyA < pCmpOp1, pDot, pLbracket, pLparen + prec pExprListBodyB < pLbracket, pCmpOp1, pPlusOp + + prec pDictElm + prec pProcParmRequiredList + prec pProcParmOptional + prec pInitFieldList + prec pCallNamed + prec pCallPargs + prec pCallThis + left pComma < pDictElm, pProcParmRequiredList, pProcParmOptional, pInitFieldList, pCallNamed, + pCallPargs, pCallThis, pLvalListBody, pUnaryExpr2, pPlusOp, pAssnExpr1, pExprListBodyA, + pExprListBodyB + + left pCmpOp2 < pCmpOp1, pComma, pExprListBodyA + prec pUnaryExpr1 < pDot, pCmpOp2 + prec pAssnExpr2 < pCmpOp2, pDot, pLparen, pPlusOp + left pCmpOp3 < pCmpOp2, pUnaryExpr1, pAssnExpr1, pAssnExpr2 + + prec pAssnExpr3 + + left pCmpOp4 < pCmpOp3 + left pCmpOp5 < pCmpOp4 + + prec pTypeNameSuffixElm < pDot + prec pTypeNameSuffix1 < pDot, pLbracket + prec pTypeName1 < pTypeNameSuffix1 + prec pLvalPrefix1 < pDot, pLbracket + prec pLvalPrefix2 < pLbracket + prec pLvalSuffix < pTypeNameSuffix1 + prec pLval1 < pDot + prec pLval2 < pRparen + prec pAnnotations2 + prec pAnnotations1 < pAnnotations2 + prec pImplements + prec pImplementsInterface < pImplements + prec pImportModuleRelpath < pPowOp + prec pImportModuleName < pDot + prec pImportVars < pComma + prec pImportLval < pComma + prec pVar1 < pDot + prec pVar2 < pVar1 + prec pElseClause < pCmpOp2 + prec pDoWhileExpr1 < pCmpOp5, pComma, pDot, pLbracket, pLparen, pCmpOp2 + prec pAssertExpr < pCmpOp5, pComma + left pRdangle + prec pNondelimitedExpr1 < pRdangle + prec pNondelimitedExpr2 < pRparen + prec pLvalSubscript + prec pExprSlice < pLvalSubscript + prec pExprSuffix < pLbracket, pLvalSuffix + + prec pReturnStmt1 + prec pThrowStmt1 + prec pAssnExprLeft4 + prec pAssnExprLeft2 < pComma + prec pAssnExprLeft1 < pAssnExprLeft2, pAssnExprLeft4 + prec pAssnExprLeft3 < pReturnStmt1, pThrowStmt1, pComma, pRparen + prec pAssnExprLeft5 < pComma + + prec pExpr1 + prec pStmtList < pExpr1 + prec pExpr2 < pDot + prec pStmt + left pRbrace < pStmt + prec pExprList1 < pComma, pRparen, pRbrace + prec pExprList2 < pAssnExpr3, pPlusOp, pPowOp, pMulOp + + left pIs + left pBar + left pLdangle + left pRbracket + + ################################################################################################ + # Tokens. + + token BOI + token INT + token FLOAT + token STR + token BUF + token CBLOCK + token ID prec pId + token COMMENT + token WHITESPACE + + # Keywords. 
+ token AND "and" prec pCmpOp3 + token ASSERT "assert" + token ATTR "attr" + token BLANK "_" + token BREAK "break" + token CATCH "catch" + token CLASS "class" + token CONST "const" + token CONTINUE "continue" + token DEBUG "$debug" + token DO "do" + token ELIF "elif" + token ELSE "else" + token ENUM "enum" + token EXTENDS "extends" + token FALSE "false" + token FILE "$file" + token FINAL "final" + token FOR "for" + token FROM "from" + token GUARD "guard" + token IF "if" + token IMPLEMENTS "implements" prec pImplements + token IMPORT "import" + token IN "in" prec pCmpOp2 + token INF "Inf" + token INIT "init" + token INTERFACE "interface" + token IS "is" prec pIs + token LINE "$line" + token MEMBER "member" + token METH "meth" + token MODULE "module" + token NAN "NaN" + token NOT "not" prec pCmpOp2 + token NULL "null" + token OR "or" prec pCmpOp5 + token PRELUDE "prelude" + token PRIVATE "private" + token PROC "proc" + token PROTECTED "protected" + token PUBLIC "public" + token PURE "pure" + token RETURN "return" + token SELECT "select" + token STATIC "static" + token THIS "this" prec pThis + token THROW "throw" + token TRUE "true" + token VAR "var" + token VIRTUAL "virtual" + token WHERE "where" + token WHILE "while" + token XOR "xor" prec pCmpOp4 + + # Symbols. + token POW "^" prec pPowOp + + token MUL "*" prec pMulOp + token DIV "/" prec pMulOp + token MOD "\%" prec pMulOp + + token PLUS "+" prec pPlusOp + token MINUS "-" prec pPlusOp + + token EQEQ "==" prec pCmpOp2 + token NOTEQ "!=" prec pCmpOp2 + token EQEQEQ "===" prec pCmpOp2 + token NOTEQEQ "!==" prec pCmpOp2 + + token LT "<" prec pCmpOp1 + token LE "<=" prec pCmpOp1 + token GT ">" prec pCmpOp1 + token GE ">=" prec pCmpOp1 + + token EQ "=" prec pEq + + token PLUSEQ "+=" + token MINUSEQ "-=" + token MULEQ "*=" + token DIVEQ "/=" + token MODEQ "\%=" + token POWEQ "^=" + + token BAR "|" prec pBar + token RARROW "->" prec pRarrow + token COMMA "," prec pComma + token LPAREN "(" prec pLparen + token LBRACE "{" prec pLbrace + token LBRACKET "[" prec pLbracket + token LDANGLE "«" prec pLdangle + token COLON ":" prec pColon + token SEMICOLON ";" + token RPAREN ")" prec pRparen + token RBRACE "}" prec pRbrace + token RBRACKET "]" prec pRbracket + token RDANGLE "»" prec pRdangle + token DOT "." prec pDot + + ################################################################################################ + # Non-terminals. + + nonterm SliceTerm ::= + | Expr + | Lval + | epsilon + + nonterm Slice ::= + | SliceTerm ":" SliceTerm ":" SliceTerm + | SliceTerm ":" SliceTerm + | Expr + | Lval + + nonterm ParamTypeList ::= + | TypeSpec + | ParamTypeList "," TypeSpec + + nonterm DerivationConstraint ::= + | ID "extends" TypeName ImplementsInterface + | ID "implements" InterfaceList + + nonterm DerivationConstraintList ::= + | DerivationConstraint + | DerivationConstraintList "and" DerivationConstraint + + nonterm GenericParamOne ::= + | "«" TypeSpec "»" + | "«" TypeSpec "where" DerivationConstraint "»" + | epsilon prec pGenericParamOne + + nonterm GenericParamTwo ::= + | "«" TypeSpec "," TypeSpec "»" + | "«" TypeSpec "," TypeSpec "where" DerivationConstraintList "»" + | epsilon + + nonterm OptionalGenericParamList prec pOptionalGenericParamList ::= + | "«" ParamTypeList "»" + | "«" ParamTypeList "where" DerivationConstraintList "»" + | epsilon + + nonterm TypeNameSuffixElm ::= + | ID OptionalGenericParamList prec pTypeNameSuffixElm + + nonterm TypeNameSuffix ::= + | TypeNameSuffixElm prec pTypeNameSuffix1 + | TypeNameSuffix "." 
TypeNameSuffixElm + + nonterm TypeName ::= + | ID OptionalGenericParamList prec pTypeName1 + | ID OptionalGenericParamList "." TypeNameSuffix prec pTypeName2 + | TypeName "." TypeNameSuffix prec pTypeName2 + + nonterm Proc ::= + | "proc" + | "meth" + + nonterm ProtoType ::= + | Proc OptionalGenericParamList + | Proc OptionalGenericParamList "->" "(" ")" + | Proc OptionalGenericParamList "->" "(" ProcRetBody ")" + + | Proc OptionalGenericParamList "(" ProcParmBody ")" + | Proc OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ")" + | Proc OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ProcRetBody ")" + + nonterm TypeSpec ::= + | TypeName + | TypeSpec "|" TypeName + + | ProtoType + | TypeSpec "|" ProtoType + + nonterm Blank ::= + | "_" + + nonterm LvalSubscript prec pLvalSubscript ::= + | "[" Slice "]" + | "[" "]" + + nonterm LvalPrefix ::= + # TypeName-conforming syntax; may not actually refer to type name. + | TypeName prec pLvalPrefix1 + | TypeName LvalSubscript + + | "(" ImportLval ")" prec pLvalPrefix2 + | "(" ImportLval ")" LvalSubscript + + | "this" prec pThis + | "this" LvalSubscript + + nonterm LvalSuffix ::= + | ID prec pLvalSuffix + | ID LvalSubscript + + | LvalSuffix "." ID prec pLvalSuffix + | LvalSuffix "." ID LvalSubscript + + nonterm Lval ::= + | ImportLval prec pLval2 + | Blank + | LvalPrefix prec pLval1 + | LvalPrefix "." LvalSuffix prec pLval1 + | TypeName "." LvalSuffix prec pLval1 + | "(" Lval ")" + + nonterm LvalListBody prec pLvalListBody ::= + | Annotations Var "," Lval + | Var "," Lval + | Lval "," Lval + | Lval "," Annotations Var + | Lval "," Var + | Annotations Var "," Var + | Var "," Var + + | LvalListBody "," Annotations Var + | LvalListBody "," Var + | LvalListBody "," Lval + + nonterm DictElm prec pDictElm ::= + | Expr ":" Expr + | Expr ":" Lval + | Lval ":" Expr + | Lval ":" Lval + nonterm DictList ::= + | DictElm + | DictList "," DictElm + nonterm Dict ::= + | "{" ":" "}" GenericParamTwo + | "{" DictList "}" GenericParamTwo + + nonterm List ::= + | "[" Expr "]" GenericParamOne + | "[" Lval "]" GenericParamOne + | "[" ExprListBody "]" GenericParamOne + | "[" "]" GenericParamOne prec pList + + nonterm Annotation ::= + | "«" Expr "»" + | "«" Lval "»" + | "«" ExprListBody "»" + | "«" Str "»" + | "const" + | "final" + | "prelude" + | "private" + | "protected" + | "public" + | "pure" + | "static" + | "virtual" + | "«" "»" + nonterm Annotations ::= + | Annotation prec pAnnotations1 + | Annotations Annotation prec pAnnotations2 + + nonterm ProcParmRequired ::= + | Annotations TypeSpec ID prec pProcParmRequired2 + | TypeSpec ID prec pProcParmRequired1 + | Annotations ID prec pProcParmRequired2 + | ID prec pProcParmRequired1 + | Annotations TypeSpec "_" prec pProcParmRequired2 + | TypeSpec "_" prec pProcParmRequired1 + | Annotations "_" prec pProcParmRequired2 + | "_" prec pProcParmRequired1 + nonterm ProcParmRequiredList ::= + | ProcParmRequired prec pProcParmRequiredList + | ProcParmRequiredList "," ProcParmRequired + + nonterm ProcParmOptional prec pProcParmOptional ::= + | Annotations TypeSpec ID "=" Expr + | Annotations TypeSpec ID "=" Lval + + | TypeSpec ID "=" Expr + | TypeSpec ID "=" Lval + + | Annotations ID "=" Expr + | Annotations ID "=" Lval + | ID "=" Expr + | ID "=" Lval + | Annotations TypeSpec "_" "=" Expr + | Annotations TypeSpec "_" "=" Lval + | TypeSpec "_" "=" Expr + | TypeSpec "_" "=" Lval + | Annotations "_" "=" Expr + | Annotations "_" "=" Lval + | "_" "=" Expr + | "_" "=" Lval + nonterm ProcParmOptionalList ::= + | ProcParmOptional 
+ | ProcParmOptionalList "," ProcParmOptional + + nonterm ProcParmPargs ::= + | Annotations "[" "]" GenericParamOne ID + | "[" "]" GenericParamOne ID + | Annotations "[" "]" GenericParamOne "_" + | "[" "]" GenericParamOne "_" + + nonterm ProcParmKargs ::= + | Annotations "{" ":" "}" GenericParamTwo ID + | "{" ":" "}" GenericParamTwo ID + | Annotations "{" ":" "}" GenericParamTwo "_" + | "{" ":" "}" GenericParamTwo "_" + + nonterm ProcParmBody ::= + | ProcRetBody + | epsilon prec pProcParmBody + + nonterm ProcRetBody ::= + | ProcParmRequiredList "," ProcParmOptionalList "," ProcParmPargs "," ProcParmKargs + | ProcParmRequiredList "," ProcParmOptionalList "," ProcParmPargs + | ProcParmRequiredList "," ProcParmOptionalList "," ProcParmKargs + | ProcParmRequiredList "," ProcParmOptionalList + | ProcParmRequiredList "," ProcParmPargs "," ProcParmKargs + | ProcParmRequiredList "," ProcParmPargs + | ProcParmRequiredList "," ProcParmKargs + | ProcParmRequiredList + | ProcParmOptionalList "," ProcParmPargs "," ProcParmKargs + | ProcParmOptionalList "," ProcParmPargs + | ProcParmOptionalList "," ProcParmKargs + | ProcParmOptionalList + | ProcParmPargs "," ProcParmKargs + | ProcParmPargs + | ProcParmKargs + + nonterm Str ::= + | STR + | Str STR + + nonterm Buf ::= + | BUF + | Buf BUF + + nonterm ProcDecl ::= + | Annotations Proc OptionalGenericParamList + | Annotations Proc OptionalGenericParamList "->" ProcRetBody + | Annotations Proc OptionalGenericParamList "->" "(" ")" + | Annotations Proc OptionalGenericParamList "->" "(" ProcRetBody ")" + + | Proc OptionalGenericParamList + | Proc OptionalGenericParamList "->" ProcRetBody + | Proc OptionalGenericParamList "->" "(" ")" + | Proc OptionalGenericParamList "->" "(" ProcRetBody ")" + + | Annotations Proc ID OptionalGenericParamList + | Annotations Proc ID OptionalGenericParamList "->" ProcRetBody + | Annotations Proc ID OptionalGenericParamList "->" "(" ")" + | Annotations Proc ID OptionalGenericParamList "->" "(" ProcRetBody ")" + + | Proc ID OptionalGenericParamList + | Proc ID OptionalGenericParamList "->" ProcRetBody + | Proc ID OptionalGenericParamList "->" "(" ")" + | Proc ID OptionalGenericParamList "->" "(" ProcRetBody ")" + + | Annotations Proc OptionalGenericParamList "(" ProcParmBody ")" + | Annotations Proc OptionalGenericParamList "(" ProcParmBody ")" "->" ProcRetBody + | Annotations Proc OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ")" + | Annotations Proc OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ProcRetBody ")" + + | Proc OptionalGenericParamList "(" ProcParmBody ")" + | Proc OptionalGenericParamList "(" ProcParmBody ")" "->" ProcRetBody + | Proc OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ")" + | Proc OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ProcRetBody ")" + + | Annotations Proc ID OptionalGenericParamList "(" ProcParmBody ")" + | Annotations Proc ID OptionalGenericParamList "(" ProcParmBody ")" "->" ProcRetBody + | Annotations Proc ID OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ")" + | Annotations Proc ID OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ProcRetBody ")" + + | Proc ID OptionalGenericParamList "(" ProcParmBody ")" + | Proc ID OptionalGenericParamList "(" ProcParmBody ")" "->" ProcRetBody + | Proc ID OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ")" + | Proc ID OptionalGenericParamList "(" ProcParmBody ")" "->" "(" ProcRetBody ")" + + nonterm ProcDeclStmt ::= + | ProcDecl + + nonterm ProcExpr ::= + | ProcDecl "{" Stmts "}" + + nonterm 
GuardVariant ::= + | "continue" + | "throw" + + nonterm GuardExpr ::= + | Annotations "guard" GuardVariant "{" Stmts "}" + | Annotations "guard" GuardVariant "->" ProcRetBody "{" Stmts "}" + | Annotations "guard" GuardVariant "->" "(" ")" "{" Stmts "}" + | Annotations "guard" GuardVariant "->" "(" ProcRetBody ")" "{" Stmts "}" + + | "guard" GuardVariant "{" Stmts "}" + | "guard" GuardVariant "->" ProcRetBody "{" Stmts "}" + | "guard" GuardVariant "->" "(" ")" "{" Stmts "}" + | "guard" GuardVariant "->" "(" ProcRetBody ")" "{" Stmts "}" + + nonterm CatchExpr ::= + | Annotations "catch" "{" Stmts "}" + | Annotations "catch" "->" ProcRetBody "{" Stmts "}" + | Annotations "catch" "->" "(" ")" "{" Stmts "}" + | Annotations "catch" "->" "(" ProcRetBody ")" "{" Stmts "}" + | "catch" "{" Stmts "}" + | "catch" "->" ProcRetBody "{" Stmts "}" + | "catch" "->" "(" ")" "{" Stmts "}" + | "catch" "->" "(" ProcRetBody ")" "{" Stmts "}" + | Annotations "catch" "(" ProcParmBody ")" "{" Stmts "}" + | Annotations "catch" "(" ProcParmBody ")" "->" ProcRetBody "{" Stmts "}" + | Annotations "catch" "(" ProcParmBody ")" "->" "(" ")" "{" Stmts "}" + | Annotations "catch" "(" ProcParmBody ")" "->" "(" ProcRetBody ")" "{" Stmts "}" + | "catch" "(" ProcParmBody ")" "{" Stmts "}" + | "catch" "(" ProcParmBody ")" "->" ProcRetBody "{" Stmts "}" + | "catch" "(" ProcParmBody ")" "->" "(" ")" "{" Stmts "}" + | "catch" "(" ProcParmBody ")" "->" "(" ProcRetBody ")" "{" Stmts "}" + + nonterm InitFieldList prec pInitFieldList ::= + | ID "=" Expr + | ID "=" Lval + | InitFieldList "," ID "=" Expr + | InitFieldList "," ID "=" Lval + + nonterm Inits ::= + | "init" "(" ")" + | "init" "(" CallList ")" + | "init" "(" Expr ")" + | "init" "(" Lval ")" + | "init" "(" ExprListBody ")" + | "init" "(" LvalListBody ")" + | "init" "(" ")" "," InitFieldList + | "init" "(" CallList ")" "," InitFieldList + | "init" "(" Expr ")" "," InitFieldList + | "init" "(" Lval ")" "," InitFieldList + | "init" "(" ExprListBody ")" "," InitFieldList + | "init" "(" LvalListBody ")" "," InitFieldList + | Lval "(" ")" + | Lval "(" CallList ")" + | Lval "(" Expr ")" + | Lval "(" Lval ")" + | Lval "(" ExprListBody ")" + | Lval "(" LvalListBody ")" + | Lval "(" ")" "," InitFieldList + | Lval "(" CallList ")" "," InitFieldList + | Lval "(" Expr ")" "," InitFieldList + | Lval "(" Lval ")" "," InitFieldList + | Lval "(" ExprListBody ")" "," InitFieldList + | Lval "(" LvalListBody ")" "," InitFieldList + | InitFieldList + + nonterm MemberBlock ::= + | Annotations "member" "{" Stmts "}" + | "member" "{" Stmts "}" + + nonterm InitDecl ::= + | Annotations "init" + | Annotations "init" ":" Inits + | Annotations "init" "(" ")" + | Annotations "init" "(" ")" ":" Inits + + | "init" + | "init" ":" Inits + | "init" "(" ")" + | "init" "(" ")" ":" Inits + + | Annotations "init" "(" ProcParmBody ")" + | Annotations "init" "(" ProcParmBody ")" ":" Inits + + | "init" "(" ProcParmBody ")" + | "init" "(" ProcParmBody ")" ":" Inits + + nonterm InitDeclStmt ::= + | InitDecl + + nonterm InitExpr ::= + | InitDecl "{" Stmts "}" + + nonterm ExtendsClass ::= + | epsilon + | "extends" TypeName + + nonterm InterfaceList ::= + | TypeName + | InterfaceList "," TypeName + nonterm ImplementsInterface ::= + | epsilon prec pImplementsInterface + | "implements" InterfaceList + + nonterm ClassDecl ::= + | Annotations "class" OptionalGenericParamList ExtendsClass ImplementsInterface + | "class" OptionalGenericParamList ExtendsClass ImplementsInterface + | Annotations "class" ID 
OptionalGenericParamList ExtendsClass ImplementsInterface + | "class" ID OptionalGenericParamList ExtendsClass ImplementsInterface + + nonterm ClassExpr ::= + | ClassDecl "{" Stmts "}" + + nonterm ExtendsInterface ::= + | epsilon + | "extends" InterfaceList + nonterm InterfaceDecl ::= + | Annotations "interface" OptionalGenericParamList ExtendsInterface + | "interface" OptionalGenericParamList ExtendsInterface + | Annotations "interface" ID OptionalGenericParamList ExtendsInterface + | "interface" ID OptionalGenericParamList ExtendsInterface + nonterm InterfaceExpr ::= + | InterfaceDecl "{" Stmts "}" + + nonterm EnumDecl ::= + | Annotations "enum" ID + | "enum" ID + nonterm Enum ::= + | Annotations ID + | ID + nonterm Enums ::= + | Enum + | Enum "=" INT + | Enums "," Enum + | Enums "," Enum "=" INT + nonterm EnumExpr ::= + | EnumDecl "{" Enums "}" + + nonterm ImportModuleRelpath ::= + | epsilon prec pImportModuleRelpath + | "^" + | ImportModuleRelpath "^" + # Use right recursion here in order to be able to specify that only the last + # component of the moduleName defines a lexical variable. + nonterm ImportModuleName ::= + | ID "." ImportModuleName + | ID prec pImportModuleName + nonterm ImportModule ::= + | ImportModuleRelpath "." ImportModuleName + | ImportModuleName + + nonterm ImportListItem ::= + | ID OptionalGenericParamList + + nonterm ImportList ::= + | ImportListItem + | ImportList "," ImportListItem + nonterm ImportVars ::= + | "*" + | ImportList prec pImportVars + | "(" ImportList ")" + + nonterm ImportLval ::= + | Annotations "import" ImportModule + | "import" ImportModule + | Annotations "from" ImportModule "import" ImportVars prec pImportLval + | "from" ImportModule "import" ImportVars prec pImportLval + + nonterm CallNamed prec pCallNamed ::= + | ID ":" Expr + | ID ":" Lval + nonterm CallNamedList ::= + | CallNamed + | CallNamedList "," CallNamed + nonterm CallPargs prec pCallPargs ::= + | ":" Expr + | ":" Lval + nonterm CallKargs ::= + | ":" ":" Expr + | ":" ":" Lval + nonterm CallList ::= + # 1111 + | Expr "," CallNamedList "," CallPargs "," CallKargs + | Lval "," CallNamedList "," CallPargs "," CallKargs + | ExprListBody "," CallNamedList "," CallPargs "," CallKargs + | LvalListBody "," CallNamedList "," CallPargs "," CallKargs + # 1110 + | Expr "," CallNamedList "," CallPargs + | Lval "," CallNamedList "," CallPargs + | ExprListBody "," CallNamedList "," CallPargs + | LvalListBody "," CallNamedList "," CallPargs + # 1101 + | Expr "," CallNamedList "," CallKargs + | Lval "," CallNamedList "," CallKargs + | ExprListBody "," CallNamedList "," CallKargs + | LvalListBody "," CallNamedList "," CallKargs + # 1100 + | Expr "," CallNamedList + | Lval "," CallNamedList + | ExprListBody "," CallNamedList + | LvalListBody "," CallNamedList + # 1011 + | Expr "," CallPargs "," CallKargs + | Lval "," CallPargs "," CallKargs + | ExprListBody "," CallPargs "," CallKargs + | LvalListBody "," CallPargs "," CallKargs + # 1010 + | Expr "," CallPargs + | Lval "," CallPargs + | ExprListBody "," CallPargs + | LvalListBody "," CallPargs + # 1001 + | Expr "," CallKargs + | Lval "," CallKargs + | ExprListBody "," CallKargs + | LvalListBody "," CallKargs + # 1000 + # Everywhere that CallList is used, there must be four companion productions with Expr, Lval, + # ExprListBody, and LvalListBody substituted, instead of the following productions. If the + # following productions were enabled, there would be no consistent way to resolve the + # resulting conflicts. 
+ # + # | Expr + # | Lval + # | ExprListBody + # | LvalListBody + # 0111 + | CallNamedList "," CallPargs "," CallKargs + # 0110 + | CallNamedList "," CallPargs + # 0101 + | CallNamedList "," CallKargs + # 0100 + | CallNamedList + # 0011 + | CallPargs "," CallKargs + # 0010 + | CallPargs + # 0001 + | CallKargs + nonterm CallThis prec pCallThis ::= + | "this" ":" Expr + | "this" ":" Lval + nonterm CallExpr ::= + | Expr "(" ")" + | Expr "(" CallThis ")" + | Expr "(" CallList ")" + | Expr "(" Expr ")" + | Expr "(" Lval ")" + | Expr "(" ExprListBody ")" + | Expr "(" LvalListBody ")" + | Expr "(" CallThis "," CallList ")" + | Expr "(" CallThis "," Expr ")" + | Expr "(" CallThis "," Lval ")" + | Expr "(" CallThis "," ExprListBody ")" + | Expr "(" CallThis "," LvalListBody ")" + + | Lval "(" ")" + | Lval "(" CallThis ")" + | Lval "(" CallList ")" + | Lval "(" Expr ")" + | Lval "(" Lval ")" + | Lval "(" ExprListBody ")" + | Lval "(" LvalListBody ")" + | Lval "(" CallThis "," CallList ")" + | Lval "(" CallThis "," Expr ")" + | Lval "(" CallThis "," Lval ")" + | Lval "(" CallThis "," ExprListBody ")" + | Lval "(" CallThis "," LvalListBody ")" + + nonterm UnaryExpr ::= + | "not" Expr prec pUnaryExpr1 + | "not" Lval prec pUnaryExpr1 + | "+" Expr prec pUnaryExpr2 + | "+" Lval prec pUnaryExpr2 + | "-" Expr prec pUnaryExpr2 + | "-" Lval prec pUnaryExpr2 + + nonterm PowOp prec pPowOp ::= + | "^" + nonterm MulOp prec pMulOp ::= + | "*" + | "/" + | "\%" + nonterm PlusOp prec pPlusOp ::= + | "+" + | "-" + + nonterm CmpOp1 prec pCmpOp1 ::= + | "<" + | "<=" + | ">=" + | ">" + nonterm CmpOp2 prec pCmpOp2 ::= + | "==" + | "!=" + + | "===" + | "!==" + nonterm CmpOp3 prec pCmpOp3 ::= + | "and" + nonterm CmpOp4 prec pCmpOp4 ::= + | "xor" + nonterm CmpOp5 prec pCmpOp5 ::= + | "or" + + nonterm InExpr prec pCmpOp2 ::= + # in. + | Expr "in" Expr + | Expr "in" Lval + | Lval "in" Expr + | Lval "in" Lval + | Expr "in" ExprList + | Lval "in" ExprList + | ExprList "in" Expr + | ExprList "in" Lval + | ExprList "in" ExprList + + # not in. + | Expr "not" "in" Expr + | Expr "not" "in" Lval + | Lval "not" "in" Expr + | Lval "not" "in" Lval + | Expr "not" "in" ExprList + | Lval "not" "in" ExprList + | ExprList "not" "in" Expr + | ExprList "not" "in" Lval + | ExprList "not" "in" ExprList + + nonterm InfixExpr ::= + # PowOp. + | Expr PowOp Expr prec pPowOp + | Expr PowOp Lval prec pPowOp + | Expr PowOp "(" ")" prec pPowOp + | Expr PowOp "(" CallList ")" prec pPowOp + # | Expr PowOp "(" Expr ")" prec pPowOp + # | Expr PowOp "(" Lval ")" prec pPowOp + | Expr PowOp "(" ExprListBody ")" prec pPowOp + | Expr PowOp "(" LvalListBody ")" prec pPowOp + + | Lval PowOp Expr prec pPowOp + | Lval PowOp Lval prec pPowOp + | Lval PowOp "(" ")" prec pPowOp + | Lval PowOp "(" CallList ")" prec pPowOp + # | Lval PowOp "(" Expr ")" prec pPowOp + # | Lval PowOp "(" Lval ")" prec pPowOp + | Lval PowOp "(" ExprListBody ")" prec pPowOp + | Lval PowOp "(" LvalListBody ")" prec pPowOp + + # MulOp. 
+ | Expr MulOp Expr prec pMulOp + | Expr MulOp Lval prec pMulOp + | Expr MulOp "(" ")" prec pMulOp + | Expr MulOp "(" CallList ")" prec pMulOp + # | Expr MulOp "(" Expr ")" prec pMulOp + # | Expr MulOp "(" Lval ")" prec pMulOp + | Expr MulOp "(" ExprListBody ")" prec pMulOp + | Expr MulOp "(" LvalListBody ")" prec pMulOp + + | Lval MulOp Expr prec pMulOp + | Lval MulOp Lval prec pMulOp + | Lval MulOp "(" ")" prec pMulOp + | Lval MulOp "(" CallList ")" prec pMulOp + # | Lval MulOp "(" Expr ")" prec pMulOp + # | Lval MulOp "(" Lval ")" prec pMulOp + | Lval MulOp "(" ExprListBody ")" prec pMulOp + | Lval MulOp "(" LvalListBody ")" prec pMulOp + + # PlusOp. + | Expr PlusOp Expr prec pPlusOp + | Expr PlusOp Lval prec pPlusOp + | Expr PlusOp "(" ")" prec pPlusOp + | Expr PlusOp "(" CallList ")" prec pPlusOp + # | Expr PlusOp "(" Expr ")" prec pPlusOp + # | Expr PlusOp "(" Lval ")" prec pPlusOp + | Expr PlusOp "(" ExprListBody ")" prec pPlusOp + | Expr PlusOp "(" LvalListBody ")" prec pPlusOp + + | Lval PlusOp Expr prec pPlusOp + | Lval PlusOp Lval prec pPlusOp + | Lval PlusOp "(" ")" prec pPlusOp + | Lval PlusOp "(" CallList ")" prec pPlusOp + # | Lval PlusOp "(" Expr ")" prec pPlusOp + # | Lval PlusOp "(" Lval ")" prec pPlusOp + | Lval PlusOp "(" ExprListBody ")" prec pPlusOp + | Lval PlusOp "(" LvalListBody ")" prec pPlusOp + + # CmpOp1. + | Expr CmpOp1 Expr prec pCmpOp1 + | Expr CmpOp1 Lval prec pCmpOp1 + | Lval CmpOp1 Expr prec pCmpOp1 + | Lval CmpOp1 Lval prec pCmpOp1 + | Expr CmpOp1 ExprList prec pCmpOp1 + | Lval CmpOp1 ExprList prec pCmpOp1 + | ExprList CmpOp1 Expr prec pCmpOp1 + | ExprList CmpOp1 Lval prec pCmpOp1 + | ExprList CmpOp1 ExprList prec pCmpOp1 + + # InExpr. + | InExpr + + # CmpOp2. + | Expr CmpOp2 Expr prec pCmpOp2 + | Expr CmpOp2 Lval prec pCmpOp2 + | Lval CmpOp2 Expr prec pCmpOp2 + | Lval CmpOp2 Lval prec pCmpOp2 + | Expr CmpOp2 ExprList prec pCmpOp2 + | Lval CmpOp2 ExprList prec pCmpOp2 + | ExprList CmpOp2 Expr prec pCmpOp2 + | ExprList CmpOp2 Lval prec pCmpOp2 + | ExprList CmpOp2 ExprList prec pCmpOp2 + + # CmpOp3. + | Expr CmpOp3 Expr prec pCmpOp3 + | Expr CmpOp3 Lval prec pCmpOp3 + | Lval CmpOp3 Expr prec pCmpOp3 + | Lval CmpOp3 Lval prec pCmpOp3 + + # CmpOp4. + | Expr CmpOp4 Expr prec pCmpOp4 + | Expr CmpOp4 Lval prec pCmpOp4 + | Lval CmpOp4 Expr prec pCmpOp4 + | Lval CmpOp4 Lval prec pCmpOp4 + + # CmpOp5. + | Expr CmpOp5 Expr prec pCmpOp5 + | Expr CmpOp5 Lval prec pCmpOp5 + | Lval CmpOp5 Expr prec pCmpOp5 + | Lval CmpOp5 Lval prec pCmpOp5 + + nonterm AssnExprLeft ::= + | "(" AssnExprLeft ")" + + | Annotations Var "," VarRestId + | Var "," VarRestId + | Lval "," VarRestId prec pAssnExprLeft4 + | LvalListBody "," VarRestId prec pAssnExprLeft4 + + | Annotations Var "," "[" "]" GenericParamOne ":" Lval prec pAssnExprLeft2 + | Var "," "[" "]" GenericParamOne ":" Lval prec pAssnExprLeft2 + | Lval "," "[" "]" GenericParamOne ":" Lval prec pAssnExprLeft2 + | LvalListBody "," "[" "]" GenericParamOne ":" Lval prec pAssnExprLeft2 + + | VarRestId prec pAssnExprLeft1 + | "[" "]" GenericParamOne ":" Lval prec pAssnExprLeft1 + + | Annotations Var prec pAssnExprLeft5 + | Var prec pAssnExprLeft5 + # Everywhere AssnExprLeft is used, there must be a companion production with Lval substituted, + # instead of the following production. 
+ # + # | Lval + | LvalListBody prec pAssnExprLeft3 + + nonterm Var ::= + | Annotations "var" ID prec pVar1 + | "var" ID prec pVar2 + | Annotations TypeSpec ID prec pVar1 + | TypeSpec ID prec pVar2 + + nonterm VarRestId ::= + | Annotations "[" "]" GenericParamOne ID + | "[" "]" GenericParamOne ID prec pVarRestId + | Annotations "[" "]" GenericParamOne "_" + | "[" "]" GenericParamOne "_" prec pVarRestId + + nonterm AttrVar ::= + | Annotations ID + | ID + + nonterm DelimitedAttrExpr ::= + | Annotations "attr" AttrVar "{" Stmts "}" + | "attr" AttrVar "{" Stmts "}" + + # Assignment operators. These can only be used with one left side Lval. + nonterm AssnOp ::= + | "+=" + | "-=" + | "*=" + | "/=" + | "\%=" + | "^=" + + nonterm AssnExpr ::= + | Lval AssnOp Expr prec pAssnExpr1 + | Lval AssnOp Lval prec pAssnExpr1 + | Lval AssnOp "(" ")" + | Lval AssnOp "(" CallList ")" + # | Lval AssnOp "(" Expr ")" + # | Lval AssnOp "(" Lval ")" + | Lval AssnOp "(" ExprListBody ")" prec pAssnExpr3 + | Lval AssnOp "(" LvalListBody ")" + + | Lval "=" Expr prec pAssnExpr2 + | Lval "=" Lval prec pAssnExpr2 + | Lval "=" ExprList prec pAssnExpr2 + + | AssnExprLeft "=" Expr prec pAssnExpr2 + | AssnExprLeft "=" Lval prec pAssnExpr2 + # | Lval "=" Expr prec pAssnExpr2 + | AssnExprLeft "=" ExprList prec pAssnExpr2 + # | Lval "=" ExprList prec pAssnExpr2 + + nonterm ForClause ::= + | "for" AssnExprLeft "in" Expr + | "for" AssnExprLeft "in" Lval + | "for" Lval "in" Expr + | "for" Lval "in" Lval + | "for" AssnExprLeft "in" ExprList + | "for" Lval "in" ExprList + nonterm ForClauseList ::= + | ForClause + | ForClauseList ForClause + nonterm IfClause ::= + | "if" Expr + | "if" Lval + nonterm IfClauseList ::= + | IfClause + | IfClauseList IfClause + nonterm DictComprehensionExpr ::= + | "{" "in" ForClauseList "select" DictList "}" GenericParamTwo + | "{" "in" ForClauseList IfClauseList "select" DictList "}" GenericParamTwo + nonterm ListComprehensionExpr ::= + | "[" "in" ForClauseList "select" Expr "]" GenericParamOne + | "[" "in" ForClauseList "select" Lval "]" GenericParamOne + | "[" "in" ForClauseList "select" ExprList "]" GenericParamOne + + | "[" "in" ForClauseList IfClauseList "select" Expr "]" GenericParamOne + | "[" "in" ForClauseList IfClauseList "select" Lval "]" GenericParamOne + | "[" "in" ForClauseList IfClauseList "select" ExprList "]" GenericParamOne + + nonterm ElifClause ::= + | "elif" Expr "{" Stmts "}" + | "elif" Lval "{" Stmts "}" + | "elif" ExprList "{" Stmts "}" + nonterm ElifList ::= + | ElifList ElifClause + nonterm ElseClause ::= + | "else" "{" Stmts "}" + nonterm IfExpr ::= + | "if" Expr "{" Stmts "}" ElifList ElseClause + | "if" Lval "{" Stmts "}" ElifList ElseClause + | "if" ExprList "{" Stmts "}" ElifList ElseClause + + nonterm IsInListElm ::= + | "not" "in" Expr "{" Stmts "}" + | "not" "in" Lval "{" Stmts "}" + | "not" "in" ExprList "{" Stmts "}" + + | "in" Expr "{" Stmts "}" + | "in" Lval "{" Stmts "}" + | "in" ExprList "{" Stmts "}" + nonterm IsInList ::= + | IsInList IsInListElm + + nonterm IsExpr ::= + | "is" InExpr "{" Stmts "}" IsInList ElseClause + + nonterm DoExpr ::= + | "{" Stmts "}" + + nonterm ForExpr ::= + | ForClause "{" Stmts "}" + + nonterm DoWhileExpr ::= + | "do" "{" Stmts "}" "while" Expr prec pDoWhileExpr1 + | "do" "{" Stmts "}" "while" Lval prec pDoWhileExpr1 + | "do" "{" Stmts "}" "while" ExprList prec pDoWhileExpr2 + + nonterm WhileExpr ::= + | "while" Expr "{" Stmts "}" + | "while" Lval "{" Stmts "}" + | "while" ExprList "{" Stmts "}" + + nonterm AssertExpr prec pAssertExpr 
::= + | "assert" Expr + | "assert" Lval + | "assert" ExprList + + nonterm DelimitedExpr ::= + | IfExpr + | IsExpr + | DoExpr + | WhileExpr + | ClassExpr + | InterfaceExpr + | EnumExpr + | DelimitedAttrExpr + | ProcExpr + | GuardExpr + | CatchExpr + | InitExpr + | MemberBlock + | ForExpr + + nonterm NondelimitedExpr ::= + | Str prec pNondelimitedExpr1 + | Buf + | "false" + | "true" + | "null" + | INT + | FLOAT + | "Inf" + | "NaN" + | "$file" + | "$line" + | "$debug" + + | "(" Expr ")" + # Everywhere that NondelimitedExpr is used, there must be a companion production with Lval + # substituted. + # + # | Lval + | Dict + | List + | CallExpr + | UnaryExpr + | InfixExpr + | AssnExpr + | AssnExprLeft prec pNondelimitedExpr2 + | DictComprehensionExpr + | ListComprehensionExpr + | DoWhileExpr + | AssertExpr + + nonterm ExprSlice prec pExprSlice ::= + | "[" Slice "]" + | "[" "]" + + nonterm ExprSuffix ::= + | ID ExprSlice + + | TypeName "." ID prec pExprSuffix + + | LvalSuffix "." ID prec pExprSuffix + | LvalSuffix "." ID ExprSlice + + | ExprSuffix "." ID prec pExprSuffix + | ExprSuffix "." ID ExprSlice + + nonterm Expr ::= + | DelimitedExpr prec pExpr1 + | NondelimitedExpr + # Every use of Expr must be accompanied by a companion Lval production + # | Lval + + | Expr ExprSlice + | Lval ExprSlice + + | Expr "." TypeNameSuffix prec pExpr2 + | Lval "." TypeNameSuffix prec pExpr2 + + | Expr "." LvalSuffix prec pExpr2 + | Lval "." LvalSuffix prec pExpr2 + + | Expr "." ExprSuffix prec pExpr2 + | Lval "." ExprSuffix prec pExpr2 + + nonterm ExprListBody ::= + | Expr "," Expr prec pExprListBodyA + | Expr "," Lval prec pExprListBodyB + | Lval "," Expr prec pExprListBodyA + | ExprListBody "," Expr prec pExprListBodyA + | ExprListBody "," Lval prec pExprListBodyA + | LvalListBody "," Expr prec pExprListBodyA + + nonterm ExprList ::= + | "(" ExprListBody ")" prec pExprList2 + | ExprListBody prec pExprList1 + + nonterm ModuleStmt ::= + | Annotations "module" + | "module" + + nonterm ReturnStmt ::= + | "return" + | "return" "(" ")" + | "return" CallList + | "return" Expr + | "return" Lval + | "return" ExprListBody + | "return" LvalListBody prec pReturnStmt1 + | "return" "(" CallList ")" + # | "return" "(" Expr ")" + # | "return" "(" Lval ")" + | "return" "(" ExprListBody ")" + | "return" "(" LvalListBody ")" + + nonterm BreakStmt ::= + | "break" INT + | "break" + + nonterm ContinueStmt ::= + | "continue" INT + | "continue" + + nonterm ThrowStmt ::= + | "throw" + | "throw" "(" ")" + | "throw" CallList + | "throw" Expr + | "throw" Lval + | "throw" ExprListBody + | "throw" LvalListBody prec pThrowStmt1 + | "throw" "(" CallList ")" + # | "throw" "(" Expr ")" + # | "throw" "(" Lval ")" + | "throw" "(" ExprListBody ")" + | "throw" "(" LvalListBody ")" + + nonterm CblockStmt ::= + | Annotations CBLOCK + | CBLOCK + + nonterm Stmt ::= + | ModuleStmt + | ClassDecl + | InterfaceDecl + | EnumDecl + | InitDeclStmt + | ProcDeclStmt + | ReturnStmt + | BreakStmt + | ContinueStmt + | ThrowStmt + + | NondelimitedExpr + | Lval + | ExprList prec pStmt + + nonterm DelimitedStmt ::= + | CblockStmt + + nonterm StmtList ::= + | Stmt ";" + | DelimitedStmt + | DelimitedExpr prec pStmtList + + | StmtList Stmt ";" + | StmtList DelimitedStmt + | StmtList DelimitedExpr prec pStmtList + | StmtList ";" + + nonterm Stmts ::= + | Stmt + | StmtList + + start Module ::= + | BOI Stmts diff --git a/bootstrap/test/hocc/M.expected b/bootstrap/test/hocc/M.expected new file mode 100644 index 000000000..1ba39570b --- /dev/null +++ 
b/bootstrap/test/hocc/M.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./M.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++++*.++++++..+*.......++*.++++... \ No newline at end of file diff --git a/bootstrap/test/hocc/M.hmh b/bootstrap/test/hocc/M.hmh new file mode 100644 index 000000000..6e8da9f81 --- /dev/null +++ b/bootstrap/test/hocc/M.hmh @@ -0,0 +1,30 @@ +# Example grammar G2 from Pager(1977), pp 256. + +hocc + token At + token Bt + token Ct + token Dt + token Et + token Tt + token Ut + + start Xn ::= + | At Yn Dt + | At Zn Ct + | At Tn + | Bt Yn Et + | Bt Zn Dt + | Bt Tn + + nonterm Yn ::= + | Tt Wn + | Ut Xn + + nonterm Zn ::= Tt Ut + + nonterm Tn ::= Ut Xn At + + nonterm Wn ::= Ut Vn + + nonterm Vn ::= epsilon diff --git a/bootstrap/test/hocc/N.expected b/bootstrap/test/hocc/N.expected new file mode 100644 index 000000000..b8fa8de0f --- /dev/null +++ b/bootstrap/test/hocc/N.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./N.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++++*++++++*++++.. \ No newline at end of file diff --git a/bootstrap/test/hocc/N.hmh b/bootstrap/test/hocc/N.hmh new file mode 100644 index 000000000..9c99825c3 --- /dev/null +++ b/bootstrap/test/hocc/N.hmh @@ -0,0 +1,23 @@ +# Example grammar G3 from Pager(1977), pp 259. Weak compatibility fails to combine states that would +# be combined if strong compatibility were used. + +hocc + token A + token B + token C + token D + token E + token T + token U + token V + token W + + start X ::= + | A Y D + | A Z C + | B Y E + | B Z D + + nonterm Y ::= T U V + + nonterm Z ::= T U W diff --git a/bootstrap/test/hocc/Parse_a.expected b/bootstrap/test/hocc/Parse_a.expected new file mode 100644 index 000000000..1fcca0cc2 --- /dev/null +++ b/bootstrap/test/hocc/Parse_a.expected @@ -0,0 +1,5 @@ +hocc: Parsing "./Parse_a.hmhi" +hocc: Parsing "./Parse_a.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak +++++ \ No newline at end of file diff --git a/bootstrap/test/hocc/Parse_a.hmh b/bootstrap/test/hocc/Parse_a.hmh new file mode 100644 index 000000000..03d29c5bb --- /dev/null +++ b/bootstrap/test/hocc/Parse_a.hmh @@ -0,0 +1,55 @@ +# Matter. + +include hocc + prec p1 + left p2 + right p3 + prec p4 < p1 + left p5 < p1, p2 + right p6 < p3, p4, p5 + + prec mul + prec add < mul + token OP + token PLUS "+" + token UNS of Uns.t + token MINUS prec add + token STAR "*" prec mul + token SLASH "/" of Unit.t prec mul + + nonterm N1 of Unit.t ::= epsilon -> + (a b) + (|a b|) + [a b] + [|a b|] + {a b} + (a + (|a + [a [|a {a b} b|] b] + b|) + b + ) + a; b; c + + nonterm N2 ::= N1 N2 | epsilon + + start N3 prec p1 ::= epsilon + + nonterm N4 of Unit.t ::= epsilon -> () + + start N5 of Unit.t prec p2 ::= epsilon -> () + + nonterm N6 of Unit.t ::= + | N2 + | OP N3 -> () + | OP N4 -> () + | N4 N5 + | epsilon -> () + + nonterm N7 of Unit.t ::= + | o:OP _:N1 N2 -> foo + + Code = { + } + +# Matter. diff --git a/bootstrap/test/hocc/Parse_a.hmhi b/bootstrap/test/hocc/Parse_a.hmhi new file mode 100644 index 000000000..d1b811bd3 --- /dev/null +++ b/bootstrap/test/hocc/Parse_a.hmhi @@ -0,0 +1,5 @@ +# Matter. + +include hocc + +# Matter. 
diff --git a/bootstrap/test/hocc/Parse_b.expected b/bootstrap/test/hocc/Parse_b.expected new file mode 100644 index 000000000..5f8ec7b4b --- /dev/null +++ b/bootstrap/test/hocc/Parse_b.expected @@ -0,0 +1,4 @@ +hocc: Parsing "./Parse_b.hmhi" +hocc: Parsing "./Parse_b.hmh" +hocc: Generating specification +hocc: LR(1) item set compatibility: weak diff --git a/bootstrap/test/hocc/Parse_b.hmh b/bootstrap/test/hocc/Parse_b.hmh new file mode 100644 index 000000000..da321f015 --- /dev/null +++ b/bootstrap/test/hocc/Parse_b.hmh @@ -0,0 +1,2 @@ +hocc + prec p \ No newline at end of file diff --git a/bootstrap/test/hocc/Parse_b.hmhi b/bootstrap/test/hocc/Parse_b.hmhi new file mode 100644 index 000000000..efa806861 --- /dev/null +++ b/bootstrap/test/hocc/Parse_b.hmhi @@ -0,0 +1 @@ +hocc \ No newline at end of file diff --git a/bootstrap/test/hocc/Parse_error_cident.expected b/bootstrap/test/hocc/Parse_error_cident.expected new file mode 100644 index 000000000..7091a0d9e --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_cident.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_cident.hmh" +hocc: At ["./Parse_error_cident.hmh":2:10.."./Parse_error_cident.hmh":2:16): Expected cident diff --git a/bootstrap/test/hocc/Parse_error_cident.hmh b/bootstrap/test/hocc/Parse_error_cident.hmh new file mode 100644 index 000000000..d37247175 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_cident.hmh @@ -0,0 +1,2 @@ +hocc + token uident diff --git a/bootstrap/test/hocc/Parse_error_code.expected b/bootstrap/test/hocc/Parse_error_code.expected new file mode 100644 index 000000000..c0ce9ea88 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_code.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_code.hmh" +hocc: At ["./Parse_error_code.hmh":3:0.."./Parse_error_code.hmh":3:0): Expected Hemlock code diff --git a/bootstrap/test/hocc/Parse_error_code.hmh b/bootstrap/test/hocc/Parse_error_code.hmh new file mode 100644 index 000000000..06403228a --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_code.hmh @@ -0,0 +1,2 @@ +hocc + nonterm N of T.t ::= epsilon -> diff --git a/bootstrap/test/hocc/Parse_error_delimited_rarray.expected b/bootstrap/test/hocc/Parse_error_delimited_rarray.expected new file mode 100644 index 000000000..f28b3708e --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rarray.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_delimited_rarray.hmh" +hocc: At ["./Parse_error_delimited_rarray.hmh":4:0.."./Parse_error_delimited_rarray.hmh":4:0): Expected '|]' diff --git a/bootstrap/test/hocc/Parse_error_delimited_rarray.hmh b/bootstrap/test/hocc/Parse_error_delimited_rarray.hmh new file mode 100644 index 000000000..39825e737 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rarray.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | epsilon -> [| diff --git a/bootstrap/test/hocc/Parse_error_delimited_rbrack.expected b/bootstrap/test/hocc/Parse_error_delimited_rbrack.expected new file mode 100644 index 000000000..a84be3ab7 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rbrack.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_delimited_rbrack.hmh" +hocc: At ["./Parse_error_delimited_rbrack.hmh":4:0.."./Parse_error_delimited_rbrack.hmh":4:0): Expected ']' diff --git a/bootstrap/test/hocc/Parse_error_delimited_rbrack.hmh b/bootstrap/test/hocc/Parse_error_delimited_rbrack.hmh new file mode 100644 index 000000000..1b31cc9c5 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rbrack.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | epsilon -> [ 
diff --git a/bootstrap/test/hocc/Parse_error_delimited_rcapture.expected b/bootstrap/test/hocc/Parse_error_delimited_rcapture.expected new file mode 100644 index 000000000..fd3b5acd7 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rcapture.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_delimited_rcapture.hmh" +hocc: At ["./Parse_error_delimited_rcapture.hmh":4:0.."./Parse_error_delimited_rcapture.hmh":4:0): Expected '|)' diff --git a/bootstrap/test/hocc/Parse_error_delimited_rcapture.hmh b/bootstrap/test/hocc/Parse_error_delimited_rcapture.hmh new file mode 100644 index 000000000..4b44ecdd9 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rcapture.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | epsilon -> (| diff --git a/bootstrap/test/hocc/Parse_error_delimited_rcurly.expected b/bootstrap/test/hocc/Parse_error_delimited_rcurly.expected new file mode 100644 index 000000000..2b32e6e92 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rcurly.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_delimited_rcurly.hmh" +hocc: At ["./Parse_error_delimited_rcurly.hmh":4:0.."./Parse_error_delimited_rcurly.hmh":4:0): Expected '}' diff --git a/bootstrap/test/hocc/Parse_error_delimited_rcurly.hmh b/bootstrap/test/hocc/Parse_error_delimited_rcurly.hmh new file mode 100644 index 000000000..cdea67ea8 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rcurly.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | epsilon -> { diff --git a/bootstrap/test/hocc/Parse_error_delimited_rparen.expected b/bootstrap/test/hocc/Parse_error_delimited_rparen.expected new file mode 100644 index 000000000..92ff73f2b --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rparen.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_delimited_rparen.hmh" +hocc: At ["./Parse_error_delimited_rparen.hmh":4:0.."./Parse_error_delimited_rparen.hmh":4:0): Expected ')' diff --git a/bootstrap/test/hocc/Parse_error_delimited_rparen.hmh b/bootstrap/test/hocc/Parse_error_delimited_rparen.hmh new file mode 100644 index 000000000..608c38ab2 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_delimited_rparen.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | epsilon -> ( diff --git a/bootstrap/test/hocc/Parse_error_hmhi.expected b/bootstrap/test/hocc/Parse_error_hmhi.expected new file mode 100644 index 000000000..6b736744e --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_hmhi.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_hmhi.hmhi" +hocc: At ["./Parse_error_hmhi.hmhi":1:3.."./Parse_error_hmhi.hmhi":1:3): Expected 'hocc' keyword diff --git a/bootstrap/test/hocc/Parse_error_hmhi.hmh b/bootstrap/test/hocc/Parse_error_hmhi.hmh new file mode 100644 index 000000000..21baf0dab --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_hmhi.hmh @@ -0,0 +1,2 @@ +hocc + prec p diff --git a/bootstrap/test/hocc/Parse_error_hmhi.hmhi b/bootstrap/test/hocc/Parse_error_hmhi.hmhi new file mode 100644 index 000000000..257cc5642 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_hmhi.hmhi @@ -0,0 +1 @@ +foo diff --git a/bootstrap/test/hocc/Parse_error_hocc.expected b/bootstrap/test/hocc/Parse_error_hocc.expected new file mode 100644 index 000000000..c8344a1c3 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_hocc.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_hocc.hmh" +hocc: At ["./Parse_error_hocc.hmh":2:0.."./Parse_error_hocc.hmh":2:0): Expected 'hocc' statement diff --git a/bootstrap/test/hocc/Parse_error_hocc.hmh b/bootstrap/test/hocc/Parse_error_hocc.hmh new 
file mode 100644 index 000000000..257cc5642 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_hocc.hmh @@ -0,0 +1 @@ +foo diff --git a/bootstrap/test/hocc/Parse_error_malformed.expected b/bootstrap/test/hocc/Parse_error_malformed.expected new file mode 100644 index 000000000..d612efee0 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_malformed.expected @@ -0,0 +1,3 @@ +hocc: Parsing "./Parse_error_malformed.hmh" +hocc: At ["./Parse_error_malformed.hmh":2:19.."./Parse_error_malformed.hmh":2:24): Invalid codepoint +hocc: At ["./Parse_error_malformed.hmh":3:25.."./Parse_error_malformed.hmh":3:32): Invalid codepoint diff --git a/bootstrap/test/hocc/Parse_error_malformed.hmh b/bootstrap/test/hocc/Parse_error_malformed.hmh new file mode 100644 index 000000000..4e48f21e7 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_malformed.hmh @@ -0,0 +1,3 @@ +hocc + token FUBAR "fu bar" + start S ::= FUBAR "fu bar" diff --git a/bootstrap/test/hocc/Parse_error_nonterm_cce.expected b/bootstrap/test/hocc/Parse_error_nonterm_cce.expected new file mode 100644 index 000000000..4a20d8840 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_nonterm_cce.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_nonterm_cce.hmh" +hocc: At ["./Parse_error_nonterm_cce.hmh":3:0.."./Parse_error_nonterm_cce.hmh":3:0): Expected '::=' diff --git a/bootstrap/test/hocc/Parse_error_nonterm_cce.hmh b/bootstrap/test/hocc/Parse_error_nonterm_cce.hmh new file mode 100644 index 000000000..ad9acc84b --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_nonterm_cce.hmh @@ -0,0 +1,2 @@ +hocc + nonterm N of T.t diff --git a/bootstrap/test/hocc/Parse_error_of_type_dot.expected b/bootstrap/test/hocc/Parse_error_of_type_dot.expected new file mode 100644 index 000000000..3f4011d31 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_of_type_dot.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_of_type_dot.hmh" +hocc: At ["./Parse_error_of_type_dot.hmh":3:0.."./Parse_error_of_type_dot.hmh":3:0): Expected '.' 
diff --git a/bootstrap/test/hocc/Parse_error_of_type_dot.hmh b/bootstrap/test/hocc/Parse_error_of_type_dot.hmh new file mode 100644 index 000000000..abec735b7 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_of_type_dot.hmh @@ -0,0 +1,2 @@ +hocc + token A of T diff --git a/bootstrap/test/hocc/Parse_error_precs.expected b/bootstrap/test/hocc/Parse_error_precs.expected new file mode 100644 index 000000000..2fbe19a6f --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_precs.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_precs.hmh" +hocc: At ["./Parse_error_precs.hmh":3:0.."./Parse_error_precs.hmh":3:0): Expected uident diff --git a/bootstrap/test/hocc/Parse_error_precs.hmh b/bootstrap/test/hocc/Parse_error_precs.hmh new file mode 100644 index 000000000..8fccb921b --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_precs.hmh @@ -0,0 +1,2 @@ +hocc + prec p < a, diff --git a/bootstrap/test/hocc/Parse_error_precs_lt.expected b/bootstrap/test/hocc/Parse_error_precs_lt.expected new file mode 100644 index 000000000..eac077b2d --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_precs_lt.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_precs_lt.hmh" +hocc: At ["./Parse_error_precs_lt.hmh":3:0.."./Parse_error_precs_lt.hmh":3:0): Expected uident diff --git a/bootstrap/test/hocc/Parse_error_precs_lt.hmh b/bootstrap/test/hocc/Parse_error_precs_lt.hmh new file mode 100644 index 000000000..a409a574a --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_precs_lt.hmh @@ -0,0 +1,2 @@ +hocc + prec p < diff --git a/bootstrap/test/hocc/Parse_error_prod_param_type.expected b/bootstrap/test/hocc/Parse_error_prod_param_type.expected new file mode 100644 index 000000000..12d05f4b3 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_prod_param_type.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_prod_param_type.hmh" +hocc: At ["./Parse_error_prod_param_type.hmh":3:10.."./Parse_error_prod_param_type.hmh":3:12): Expected production parameter symbol diff --git a/bootstrap/test/hocc/Parse_error_prod_param_type.hmh b/bootstrap/test/hocc/Parse_error_prod_param_type.hmh new file mode 100644 index 000000000..8812a56f6 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_prod_param_type.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | x:42 -> () diff --git a/bootstrap/test/hocc/Parse_error_reduction_arrow.expected b/bootstrap/test/hocc/Parse_error_reduction_arrow.expected new file mode 100644 index 000000000..f73d11882 --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_reduction_arrow.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_reduction_arrow.hmh" +hocc: At ["./Parse_error_reduction_arrow.hmh":4:0.."./Parse_error_reduction_arrow.hmh":4:0): Expected '->' diff --git a/bootstrap/test/hocc/Parse_error_reduction_arrow.hmh b/bootstrap/test/hocc/Parse_error_reduction_arrow.hmh new file mode 100644 index 000000000..d6884cbdd --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_reduction_arrow.hmh @@ -0,0 +1,3 @@ +hocc + nonterm N of T.t ::= + | epsilon diff --git a/bootstrap/test/hocc/Parse_error_uident.expected b/bootstrap/test/hocc/Parse_error_uident.expected new file mode 100644 index 000000000..513b9a32a --- /dev/null +++ b/bootstrap/test/hocc/Parse_error_uident.expected @@ -0,0 +1,2 @@ +hocc: Parsing "./Parse_error_uident.hmh" +hocc: At ["./Parse_error_uident.hmh":2:9.."./Parse_error_uident.hmh":2:15): Expected uident diff --git a/bootstrap/test/hocc/Parse_error_uident.hmh b/bootstrap/test/hocc/Parse_error_uident.hmh new file mode 100644 index 000000000..8818d5afb --- /dev/null +++ 
b/bootstrap/test/hocc/Parse_error_uident.hmh @@ -0,0 +1,2 @@ +hocc + prec Cident diff --git a/bootstrap/test/hocc/dune b/bootstrap/test/hocc/dune new file mode 100644 index 000000000..04255c1ee --- /dev/null +++ b/bootstrap/test/hocc/dune @@ -0,0 +1,381 @@ +(rule + (deps + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to help_a.out (run %{bin:hocc} -v -verbose -txt -text -html -hmh -hocc -c -canonical -hm -hemlock -ml -ocaml -s Foo -src Foo -d bar -dstdir bar -h))))) +(rule + (alias runtest) + (action (diff help_a.expected help_a.out))) + +(rule + (deps + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to help_b.out (run %{bin:hocc} -v -verbose -txt -text -html -hmh -hocc -c -canonical -hm -hemlock -ml -ocaml -s Foo -src Foo -d bar -dstdir bar -no-such-option))))) +(rule + (alias runtest) + (action (diff help_b.expected help_b.out))) + +(rule + (deps + (glob_files Parse_a.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_a.out (run %{bin:hocc} -v -s Parse_a))))) +(rule + (alias runtest) + (action (diff Parse_a.expected Parse_a.out))) + +(rule + (deps + (glob_files Parse_b.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_b.out (run %{bin:hocc} -v -s Parse_b))))) +(rule + (alias runtest) + (action (diff Parse_b.expected Parse_b.out))) + +(rule + (deps + (glob_files Parse_error_malformed.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_malformed.out (run %{bin:hocc} -v -s Parse_error_malformed))))) +(rule + (alias runtest) + (action (diff Parse_error_malformed.expected Parse_error_malformed.out))) + +(rule + (deps + (glob_files Parse_error_uident.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_uident.out (run %{bin:hocc} -v -s Parse_error_uident))))) +(rule + (alias runtest) + (action (diff Parse_error_uident.expected Parse_error_uident.out))) + +(rule + (deps + (glob_files Parse_error_cident.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_cident.out (run %{bin:hocc} -v -s Parse_error_cident))))) +(rule + (alias runtest) + (action (diff Parse_error_cident.expected Parse_error_cident.out))) + +(rule + (deps + (glob_files Parse_error_precs_lt.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_precs_lt.out (run %{bin:hocc} -v -s Parse_error_precs_lt))))) +(rule + (alias runtest) + (action (diff Parse_error_precs_lt.expected Parse_error_precs_lt.out))) + +(rule + (deps + (glob_files Parse_error_precs.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_precs.out (run %{bin:hocc} -v -s Parse_error_precs))))) +(rule + (alias runtest) + (action (diff Parse_error_precs.expected Parse_error_precs.out))) + +(rule + (deps + (glob_files Parse_error_of_type_dot.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_of_type_dot.out (run %{bin:hocc} -v -s Parse_error_of_type_dot))))) +(rule + (alias runtest) + (action (diff Parse_error_of_type_dot.expected Parse_error_of_type_dot.out))) + +(rule + (deps + (glob_files Parse_error_delimited_rparen.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_delimited_rparen.out (run %{bin:hocc} -v -s Parse_error_delimited_rparen))))) +(rule + (alias runtest) + (action (diff 
Parse_error_delimited_rparen.expected Parse_error_delimited_rparen.out))) + +(rule + (deps + (glob_files Parse_error_delimited_rcapture.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_delimited_rcapture.out (run %{bin:hocc} -v -s Parse_error_delimited_rcapture))))) +(rule + (alias runtest) + (action (diff Parse_error_delimited_rcapture.expected Parse_error_delimited_rcapture.out))) + +(rule + (deps + (glob_files Parse_error_delimited_rbrack.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_delimited_rbrack.out (run %{bin:hocc} -v -s Parse_error_delimited_rbrack))))) +(rule + (alias runtest) + (action (diff Parse_error_delimited_rbrack.expected Parse_error_delimited_rbrack.out))) + +(rule + (deps + (glob_files Parse_error_delimited_rarray.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_delimited_rarray.out (run %{bin:hocc} -v -s Parse_error_delimited_rarray))))) +(rule + (alias runtest) + (action (diff Parse_error_delimited_rarray.expected Parse_error_delimited_rarray.out))) + +(rule + (deps + (glob_files Parse_error_delimited_rcurly.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_delimited_rcurly.out (run %{bin:hocc} -v -s Parse_error_delimited_rcurly))))) +(rule + (alias runtest) + (action (diff Parse_error_delimited_rcurly.expected Parse_error_delimited_rcurly.out))) + +(rule + (deps + (glob_files Parse_error_code.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_code.out (run %{bin:hocc} -v -s Parse_error_code))))) +(rule + (alias runtest) + (action (diff Parse_error_code.expected Parse_error_code.out))) + +(rule + (deps + (glob_files Parse_error_prod_param_type.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_prod_param_type.out (run %{bin:hocc} -v -s Parse_error_prod_param_type))))) +(rule + (alias runtest) + (action (diff Parse_error_prod_param_type.expected Parse_error_prod_param_type.out))) + +(rule + (deps + (glob_files Parse_error_reduction_arrow.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_reduction_arrow.out (run %{bin:hocc} -v -s Parse_error_reduction_arrow))))) +(rule + (alias runtest) + (action (diff Parse_error_reduction_arrow.expected Parse_error_reduction_arrow.out))) + +(rule + (deps + (glob_files Parse_error_nonterm_cce.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_nonterm_cce.out (run %{bin:hocc} -v -s Parse_error_nonterm_cce))))) +(rule + (alias runtest) + (action (diff Parse_error_nonterm_cce.expected Parse_error_nonterm_cce.out))) + +(rule + (deps + (glob_files Parse_error_hocc.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_hocc.out (run %{bin:hocc} -v -s Parse_error_hocc))))) +(rule + (alias runtest) + (action (diff Parse_error_hocc.expected Parse_error_hocc.out))) + +(rule + (deps + (glob_files Parse_error_hmhi.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Parse_error_hmhi.out (run %{bin:hocc} -v -s Parse_error_hmhi))))) +(rule + (alias runtest) + (action (diff Parse_error_hmhi.expected Parse_error_hmhi.out))) + +(rule + (deps + (glob_files Example1.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Example1.out 
(run %{bin:hocc} -v -s Example1))))) +(rule + (alias runtest) + (action (diff Example1.expected Example1.out))) + +(rule + (deps + (glob_files Hocc.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Hocc.out (run %{bin:hocc} -v -s Hocc))))) +(rule + (alias runtest) + (action (diff Hocc.expected Hocc.out))) + +(rule + (deps + (glob_files A.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to A.out (run %{bin:hocc} -v -s A))))) +(rule + (alias runtest) + (action (diff A.expected A.out))) + +(rule + (deps + (glob_files B.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to B.out (run %{bin:hocc} -v -s B))))) +(rule + (alias runtest) + (action (diff B.expected B.out))) + +(rule + (deps + (glob_files C.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to C.out (run %{bin:hocc} -v -s C))))) +(rule + (alias runtest) + (action (diff C.expected C.out))) + +(rule + (deps + (glob_files D.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to D.out (run %{bin:hocc} -v -s D))))) +(rule + (alias runtest) + (action (diff D.expected D.out))) + +(rule + (deps + (glob_files E.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to E.out (run %{bin:hocc} -v -s E))))) +(rule + (alias runtest) + (action (diff E.expected E.out))) + +(rule + (deps + (glob_files F.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to F.out (run %{bin:hocc} -v -s F))))) +(rule + (alias runtest) + (action (diff F.expected F.out))) + +(rule + (deps + (glob_files M.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to M.out (run %{bin:hocc} -v -s M))))) +(rule + (alias runtest) + (action (diff M.expected M.out))) + +(rule + (deps + (glob_files N.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to N.out (run %{bin:hocc} -v -s N))))) +(rule + (alias runtest) + (action (diff N.expected N.out))) + +(rule + (deps + (glob_files Lyken.hmh*) + %{bin:hocc}) + (action + (with-accepted-exit-codes + (or 0 1) + (with-outputs-to Lyken.out (run %{bin:hocc} -v -s Lyken))))) +(rule + (alias runtest) + (action (diff Lyken.expected Lyken.out))) diff --git a/bootstrap/test/hocc/help_a.expected b/bootstrap/test/hocc/help_a.expected new file mode 100644 index 000000000..13b034918 --- /dev/null +++ b/bootstrap/test/hocc/help_a.expected @@ -0,0 +1,25 @@ +hocc usage: hocc + +Parameters: + -h[elp] : Print command usage and exit. + -v[erbose] : Print progress information during parser generation. + -txt | -text : Write a detailed automoton description in plain text format + to "/hocc/.txt". + -html : Write a detailed automoton description in internally + hyperlinked HTML format to "/hocc/.html". + -hmh | -hocc : Write a complete grammar specification in hocc format to + "/hocc/.hmh", but with all non-terminal + types and reduction code omitted. + -c[anonical] : Generate a canonical LR(1) parser rather than a compact + LR(1) parser. + -hm | -hemlock : Generate a Hemlock-based parser implementation and write it + to "/.hm[i]". + -ml | -ocaml : Generate an OCaml-based parser implementation and write it + to "/.ml[i]". This is brittle + functionality intended only for Hemlock bootstrapping. + -s[rc] : Path and module name of input source, where inputs match + ".hmh[i]" and "" comprises the source directory + and module name, "[/]". 
+-d[stdir] : Path to directory in which to place generated output, such + that output file paths match "/[hocc/].*". + Defaults to "". diff --git a/bootstrap/test/hocc/help_b.expected b/bootstrap/test/hocc/help_b.expected new file mode 100644 index 000000000..60e1782ce --- /dev/null +++ b/bootstrap/test/hocc/help_b.expected @@ -0,0 +1,26 @@ +hocc: Invalid command line parameter: "-no-such-option" +hocc usage: hocc + +Parameters: + -h[elp] : Print command usage and exit. + -v[erbose] : Print progress information during parser generation. + -txt | -text : Write a detailed automoton description in plain text format + to "/hocc/.txt". + -html : Write a detailed automoton description in internally + hyperlinked HTML format to "/hocc/.html". + -hmh | -hocc : Write a complete grammar specification in hocc format to + "/hocc/.hmh", but with all non-terminal + types and reduction code omitted. + -c[anonical] : Generate a canonical LR(1) parser rather than a compact + LR(1) parser. + -hm | -hemlock : Generate a Hemlock-based parser implementation and write it + to "/.hm[i]". + -ml | -ocaml : Generate an OCaml-based parser implementation and write it + to "/.ml[i]". This is brittle + functionality intended only for Hemlock bootstrapping. + -s[rc] : Path and module name of input source, where inputs match + ".hmh[i]" and "" comprises the source directory + and module name, "[/]". +-d[stdir] : Path to directory in which to place generated output, such + that output file paths match "/[hocc/].*". + Defaults to "". diff --git a/doc/design/index.md b/doc/design/index.md index 53cb56aeb..bad36e84a 100644 --- a/doc/design/index.md +++ b/doc/design/index.md @@ -1,4 +1,4 @@ -# Hemlock +# Hemlock Design [Hemlock](https://github.com/BranchTaken/Hemlock) is a systems programming language, but not all systems are alike. Hemlock is intentionally constrained to excel for a (large) subset of possible diff --git a/doc/index.md b/doc/index.md new file mode 100644 index 000000000..0e970b794 --- /dev/null +++ b/doc/index.md @@ -0,0 +1,5 @@ +# Hemlock Documentation + +- [Design](design/index.md) +- Tools + + [`hocc`](tools/hocc.md) diff --git a/doc/tools/hocc.md b/doc/tools/hocc.md new file mode 100644 index 000000000..20c23405e --- /dev/null +++ b/doc/tools/hocc.md @@ -0,0 +1,906 @@ +# hocc + +`hocc` is an [LR(1) parser generator](https://en.wikipedia.org/wiki/Canonical_LR_parser). Its name +carries on a long tradition, to wit: + +- [`yacc`](https://en.wikipedia.org/wiki/Yacc) stands for "Yet Another Compiler Compiler". Clearly + the name derives from "yack", as in, "Chuck's dinner didn't sit well and he yacked it." +- `hocc` stands for "Hardly Original Compiler Compiler". The name derives from "hock", as in, "Hank + hocked a loogie." + +Both programs interpret high-level human-written parser descriptions and produce output unfit for +human consumption. However `hocc` has several distinguishing features relative to `yacc`, aside from +interoperating with [Hemlock](https://github.com/BranchTaken/Hemlock) rather than +[C](https://en.wikipedia.org/wiki/The_C_Programming_Language). + +- `hocc` generates LR(1) rather than [LALR(1)](https://en.wikipedia.org/wiki/LALR_parser) parsers + using a fast behavior-preserving compaction algorithm[^pager1977] that reduces the state machine + size relative to the canonical LR(1) algorithm[^knuth1965]. +- `hocc`'s precedence facilities are more precise and easier to use without inadvertently masking + grammar ambiguities. 
Whereas `yacc` supports only a single linear precedence ordering, `hocc` + supports arbitrarily many directed acyclic precedence graphs. Given this more powerful conflict + resolution mechanism, `hocc` refuses to generate parsers for ambiguous grammars. +- `hocc` supports an automated error recovery algorithm[^diekmann2020] based on minimum-cost repair + sequences. + +## Command usage + +`hocc ` + +Parameters: + +- `-h[elp]`: Print command usage and exit. +- `-v[erbose]`: Print progress information during parser generation. +- `-txt` | `-text`: Write a detailed automoton description in plain text format to + `/hocc/.txt`. +- `-html`: Write a detailed automoton description in internally hyperlinked HTML format to + `/hocc/.html`. +- `-hmh` | `-hocc`: Write a complete grammar specification in `hocc` format to + `/hocc/.hmh`, but with all non-terminal types and reduction code omitted. +- `-c[anonical]`: Generate a canonical LR(1) parser rather than a compact LR(1) parser. +- `-hm` | `-hemlock`: Generate a Hemlock-based parser implementation and write it to + `/.hm[i]`. +- `-ml` | `-ocaml`: Generate an OCaml-based parser implementation and write it to + `/.ml[i]`. This is brittle functionality intended only for Hemlock + bootstrapping. +- `-s[rc] `: Path and module name of input source, where inputs match `.hmh[i]` and + `` comprises the source directory and module name, `[/]`. +- `-d[stdir] `: Path to directory in which to place generated output, such that output file + paths match `/[hocc/].*`. Defaults to ``. + +Syntax errors in the input file may prevent file generation. Specification errors do not prevent +report and graph file generation, but all specification errors must be resolved for parser +generation to succeed. Some syntax errors in the embedded Hemlock code may pass through `hocc` +unnoticed. + +Example invocations: + +- `hocc -hm -src Parser`: Read `Parser.hmh[i]` and generate `Parser.hm[i]`. +- `hocc -verbose -report -graph -hocc -hemlock -src src/Parser -d obj`: Verbosely read + `src/Parser.hmh[i]` and generate `obj/hocc/Parser.{report,dot,hmh}` and `obj/Parser.{hm,hmi}`. + +## Generated API + +The generated parser is encapsulated in a module with an interface similar to the following. The +interface is mainly relevant to application code which utilizes the generated parser rather than the +specification itself, with the exception that non-terminals may need to refer to the `Token.t` type. +Note that the `EPSILON` token identifier, alias `"ε"`, is reserved as the token associated with the +start state at the base of the parser stack; it remains on the stack until parsing accepts and is +therefore visible to introspection at any intermediate parse state. The `PSEUDO_END` token +identifier, alias `"⊥"`, is reserved as a terminator pseudo-token that follows start symbols; +although `"⊥"` is never constructed, it can appear in lookahead sets and is therefore exposed for +parser introspection purposes. + +The generated parser intentionally does not support effects in reduction code, so that intermediate +parser states can be used as persistent reusable snapshots. + +```hemlock +{ + Spec = { + Assoc = { + type t: t = + | Left + | Right + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + Prec = { + type t: t = { + index: uns # Index in `precs` array. + name: string + assoc: option Assoc.t + doms: Ordset.t uns # Indices in `precs` array of dominator precedences. 
+ } + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + precs: array Prec.t + [@@doc "Array of precedences, where each element's `index` field corresponds to the + element's array index."] + + Token = { + type t: t = { + index: uns # Index in `symbols` array. + name: string + alias: option string + prec: option Prec.t + } + + hash_map: t -> Hash.State.t -> Hash.State.t + cmp: t -> t -> Cmp.t + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + Prod = { + type t: t = { + index: uns # Index of corresponding reduction function in `reductions` array. + lhs_index: uns + rhs_indexes: array uns + prec: option Prec.t + } + + hash_map: t -> Hash.State.t -> Hash.State.t + cmp: t -> t -> Cmp.t + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + Nonterm = { + type t: t = { + index: uns # Index in `symbols` array. + name: string + start: bool + prec: option Prec.t + prods: Ordset.t Prod.t + } + + hash_map: t -> Hash.State.t -> Hash.State.t + cmp: t -> t -> Cmp.t + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + Symbol = { + type t: t = + | Token of Token.t + | Nonterm of Nonterm.t + + hash_map: t -> Hash.State.t -> Hash.State.t + cmp: t -> t -> Cmp.t + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + symbols: array Symbol.t + [@@doc "Array of symbols, where each element's `index` field corresponds to the element's + array index."] + + Item = { + type t: t = { + prod: Prod.t + dot_pos: uns + lookahead: Ordset.t Token.t + } + + hash_map: t -> Hash.State.t -> Hash.State.t + cmp: t -> t -> Cmp.t + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + ItemSet = { + type t: t = { + kernel: Ordset.t Item.t + added: Ordset.t Item.t + } + + hash_map: t -> Hash.State.t -> Hash.State.t + cmp: t -> t -> Cmp.t + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + Action = { + type t: t = + | Shift of uns + | Reduce of uns + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + State = { + type t: t = { + index: uns # Index in `states` array. + itemset: ItemSet.t + actions: Map.t Symbol.t Action.t Symbol + goto: Map.t Token.t uns Token + } + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + states: array State.t + [@@doc "Array of CFSM states, where each element's `index` field corresponds to the + element's array index."] + } + + Token = { + type t: t = + # One variant per `token` statement, e.g. `A` and `B`. + | EPSILON of unit # Reserved. + | PSEUDO_END of unit # Reserved. + | A of TypeA.t + | B of TypeB.t + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + + spec: t -> Spec.Token.t + } + + Nonterm = { + type t: t = + # One variant per `nonterm`/`start` statement, e.g. `S` and `N`. + | S of TypeS.t + | N of TypeN.t + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + + spec: t -> Spec.Nonterm.t + } + + Symbol = { + type t: t = + | Token of Token.t + | Nonterm of Nonterm.t + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + State = { + type t: t = uns + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + + spec: t -> Spec.State.t + } + + type stack_elm: stack_elm = (Symbol.t, State.t) + type stack: stack = list stack_elm + type reduction: reduction = stack -> stack + + reductions: array reduction + [@@doc "Array of reductions, where each element's `index` field corresponds to the element's + array index."] + + Status = { + type t: t = + # `feed`/`step` may produce these variants; `next` fast-forwards over them. + | ShiftPrefix of (Token.t, State.t) + | ShiftAccept of (Token.t, State.t) + | Reduce of reduction + # Common variants. 
+ | Prefix # Valid parse prefix; more input needed. + | Accept of Nonterm.t # Successful parse result. + | Reject of Token.t # Syntax error due to unexpected token. + + pp >e: t -> Fmt.Formatter e >e-> Fmt.Formatter e + } + + type t: t = { + stack: stack + status: status + } + + Start = { + # One submodule per `start` symbol, e.g. `S`. + S = { + boi: t + } + } + + feed: Token.t -> t -> t + [@@doc "`feed token t` returns a result with status in {`ShiftPrefix`, `ShiftAccept`, + `Reject`}. `t.status` must be `Prefix`."] + + step: t -> t + [@@doc "`step t` returns the result of applying one state transition to `t`. `t.status` must + be in {`ShiftPrefix`, `ShiftAccept`, `Reduce`}."] + + next: -> Token.t -> t -> t + [@@doc "`next token t` calls `feed token t` and fast-forwards via `step` calls to return a + result with status in {`Prefix`, `Accept`, `Reject`}. `t.status` must be `Prefix`."] + } +``` + +## Parser specification + +The `hocc` specification grammar is layered onto Hemlock's grammar via the addition of several +keywords: + +- Parser: `hocc` +- Symbols: + + [Tokens](#tokens): `token` + + [Non-terminals](#non-terminals): `nonterm`, `start` + + [Productions](#productions): `epsilon` +- [Precedence](#precedence): `prec`, `left`, `right` + +A valid parser specification is encapsulated by a `hocc` statement and describes how to construct a +parse tree of symbols. `token` statements correspond to terminal symbols, i.e. leaf nodes in the +parse tree, whereas non-terminal `start`/`nonterm` statements correspond to internal nodes in the +parse tree. A parse tree always has a non-terminal start symbol at its root. Non-terminals have +associated production patterns that specify how to construct non-terminal nodes during post-order +tree construction. Precedences may be declared via the `prec`/`left`/`right` statements and symbols +may be assigned those precedences for use during conflict resolution. + +The following subsections document specification semantics. See the `hocc` [grammar](#grammar) +specification for comprehensive syntax details. + +### Tokens + +Token identifiers match `[_]*[A-Z][A-Za-z0-9_']*` in conformance with Hemlock's capitalized +identifier syntax. By convention the `hocc` documentation restricts token identifiers to +`[A-Z][A-Z0-9_]*` to distinguish tokens from non-terminals, but other conventions can work just as +well. + +```hocc +hocc + token SOME_TOKEN +``` + +In practice, many token types serve as punctuation and have invariant contents. These token types +can be declared with a string alias, which can then be used in production patterns. + +```hocc +hocc + token LPAREN "(" +``` + +Tokens with variant contents must have a declared data type, since the implicit token data type is +`Unit.t`. The data type must be of the form `.`, where the module provides +`hash_fold`, `cmp`, and `pp` functions in support of the type. All of the `Basis` modules which +implement data types meet these requirements. + +```hocc +hocc + token VALUE of Zint.t +``` + +Tokens may be assigned [precedence](#precedence) to aid in conflict resolution. + +```hocc +hocc + left p + token X prec p +``` + +### Non-terminals + +Non-terminal identifiers match `[_]*[A-Z][A-Za-z0-9_']*` in conformance with Hemlock's capitalized +identifier syntax. By convention the `hocc` documentation restricts non-terminal identifiers to +`[A-Z][A-Za-z0-9]*` to distinguish non-terminals from tokens, but other conventions can work just as +well. + +```hocc +hocc + nonterm SomeNonterm ::= # [...] 
+ start SomeStart ::= # [...] +``` + +Note that each `start` symbol is augmented with a wrapper symbol that facilitates parser reduction +actions on the corresponding start symbol. The wrapper's name is generated by appending a `'` to the +start symbol's name. For example, `start S ...` implies the `S'` wrapper symbol. As such, `start S +...` and `nonterm S' ...` cannot coexist. + +As for tokens, non-terminals with variant contents must have a declared data type, since the +implicit non-terminal data type is `Unit.t`. A parser which universally utilizes implicitly typed +non-terminals does not construct a parse tree, but it may still be useful as a recognizer, or as an +abstract grammar specification which `hocc` can verify without generating a parser. + +```hocc +hocc + nonterm SomeNonterm of Node.t ::= # [...] + start SomeStart of Node.t ::= # [...] +``` + +#### Productions + +Each non-terminal symbol has one or more associated productions, which denote patterns for combining +symbols to construct a symbol during post-order parse tree construction. + +```hocc +hocc + token SOME_TOKEN + nonterm SomeNonterm ::= SOME_TOKEN + start SomeStart ::= SomeNonterm SOME_TOKEN +``` + +As a special case, the `epsilon` keyword denotes an empty pattern which can be used to construct a +non-terminal without combining any symbols. + +```hocc +hocc + token A + token B + nonterm N ::= + | A + | epsilon + start S ::= N B +``` + +All of the above examples use non-terminals of implicit data type `Unit.t`, which also implies +trivial `()` "reduction" code. This can be written explicitly. + +```hocc +hocc + token A + token B + nonterm N of Unit.t ::= + | A + | epsilon + -> () + start S of Unit.t ::= N B -> () +``` + +Parsers which construct a parse tree may need to associate production-specific reduction code rather +than sharing the reduction code with all of a non-terminal's productions. As for Hemlock pattern +matching, all productions which share reduction code must specify equivalently typed lexical +bindings. + +```hocc +hocc + token U of Uns.t + token PLUS "+" + start S of Uns.t ::= + | PLUS u1:U u2:U + | u1:U "+" u2:U + | u1:U u2:U _:PLUS -> + u1 + u2 +``` + +Ordinarily, the characteristic finite state machine (CFSM) corresponding to an LR(1) grammar delays +each transition until the lookahead symbol becomes available. However this poses a challenge for +start symbols because there is no concrete lookahead symbol past the end of input. The following +invalid grammar would infinitely recurse. + +```hocc +# Invalid (infinite recursion). +hocc + token U of Uns.t + start S of Uns.t ::= + | u:U s:S -> u + s + | epsilon -> 0 +``` + +A typical solution to this challenge is to require the application to signal end of input to the +CFSM via a dedicated API. However `hocc` uses the same approach as Menhir [[^fpottier]] and instead +proactively (transitively) reduces when the current symbol unambiguously delimits a valid start +symbol reduction. Some start symbols may trivially meet the requirements for proactive reduction, +e.g. for a grammar which incrementally parses a file comprising newline-separated statements, each +of which is encapsulated by a start symbol. However, many grammars do need to manually incorporate +an explicit end of input token. The above grammar can be repaired as follows. + +```hocc +# Valid, though right-associative. 
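+# The explicit EOI token replaces the epsilon production above, giving the parser a concrete
+# lookahead at which to reduce S.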
+hocc + token U of Uns.t + token EOI + start S of Uns.t ::= + | u:U s:S -> u + s + | EOI -> 0 +``` + +Note that the above grammar is right-associative only because the `EOI` repair is simpler to +demonstrate in that context. The following left-associative grammar is preferable in practice +because it incrementally constructs the parse tree rather than delaying all reductions until `EOI`. + +```hocc +# Valid and left-associative. +hocc + token U of Uns.t + token EOI + start S of Uns.t ::= + | s:S u:U -> s + u + | s:S EOI -> s + | epsilon -> 0 +``` + +### Precedence + +A parser specification may contain conflicts wherein a parser state encodes multiple valid actions +for one or more inputs. `hocc` refuses to generate parsers which contain unresolved conflicts. +Parser specifications can often be refactored or expanded to eliminate conflicts, but such revisions +may reduce clarity and maintainability. Precedences provide a mechanism for conflict resolution, +i.e. explicit choice of actions. `hocc` attempts to resolve conflicts based on the precedences +assigned to tokens and productions. + +Each production can specify its precedence, or if all of a non-terminal's productions are to have +the same precedence, the precedence can be more succinctly specified for the non-terminal as a +whole. It is an error to explicitly specify both a non-terminal's precedence and the precedence of +any of its productions. + +Precedences may be defined with any of the following associativities: + +- `prec`: Do not resolve conflicts via associativity. Non-associativity is useful for specifying + precedence-based resolutions without inadvertently masking conflicts. +- `left`: Resolve shift/reduce conflicts by reducing. This induces left associativity, e.g. + `2 + 3 + 4` is parsed as `(2 + 3) + 4`. +- `right`: Resolve shift/reduce conflicts by shifting. This induces right associativity, e.g. + `2 + 3 + 4` is parsed as `2 + (3 + 4)`. All else being equal, prefer left associativity to + minimize intermediate parser state. + +Precedences can be defined via the `prec`, `left`, and `right` statements, and they may optionally +be ordered via `<` relationships with previously defined precedences, irrespective of associativity. +These precedence relationships are used to compute the transitive closure of precedence orderings. +Precedences with disjoint relationships are incomparable, i.e. they have no relative ordering. By +default, all tokens and productions have a *lack* of precedence, which is equivalent to each such +token/production being assigned a unique disjoint `prec` precedence. + +Conflicts may occur between two or more actions, of which at least one is a reduce action. Such an +action set induces shift/reduce and/or reduce/reduce conflicts; by construction shift/shift +conflicts cannot occur. Given conflicting actions A and B, A "dominates" B if A is preferred over B. +For conflict resolution to succeed, one action must dominate all other conflicting actions. The +rules for conflict resolution are as follows. If none of the rules apply, conflict resolution fails. + +- If one action has higher precedence than all other actions, that action dominates. +- If a subset of actions has higher precedence than all other actions, and the actions in the + highest-precedence subset have equal associativity, associativity resolves the conflict under + either of the following circumstances: + + `left`: A single reduce action dominates shift actions. + + `right`: Shift actions dominate one or more reduce actions. 
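+
+As an illustration of the failure case of these rules, the following minimal sketch (not drawn from
+the examples below) assigns the non-associative `prec add` precedence to both the `PLUS` token and
+the addition production. In `2 + 3 + 4` the shift action on `PLUS` and the reduce action on the
+production then carry equal precedence but no associativity, so no action dominates and `hocc`
+reports the shift/reduce conflict rather than generating a parser. The examples that follow show how
+adding associativity or an ordering relationship resolves this same conflict.
+
+```hocc
+hocc
+    prec add
+    token PLUS prec add
+    token INT of Int.t
+    nonterm Expr of Int.t ::=
+      | x:INT -> x
+      | e0:Expr PLUS e1:Expr prec add -> Int.(e0 + e1)
+```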
+ +Associativity suffices for resolving simple shift/reduce conflicts as in e.g. `2 + 3 + 4`, so that +it is deterministically parsed as `(2 + 3) + 4` (as in the following example specification) or +`2 + (3 + 4)`. + +```hocc +hocc + left add + token PLUS prec add + token INT of Int.t + nonterm Expr of Int.t ::= + | x:INT -> x + | e0:Expr PLUS e1:Expr prec add -> Int.(e0 + e1) +``` + +Alternatively, precedence ordering can resolve shift/reduce conflicts, though associativity is +preferable when applicable. + +```hocc +hocc + prec add + prec plus < add + token PLUS prec plus + token INT of Int.t + nonterm Expr of Int.t ::= + | x:INT -> x + | e0:Expr PLUS e1:Expr prec add -> Int.(e0 + e1) +``` + +Precedence ordering can also resolve reduce/reduce conflicts between productions, which is beyond +the power of associativity. In the following parser specification, `MUL` has precedence over `PLUS` +due to the precedence relationship `add < mul`, so `2 + 3 * 4` is parsed as `2 + (3 * 4)`. + +```hocc +hocc + left mul + token MUL prec mul + left add < mul + token PLUS prec add + token INT of Int.t + nonterm Expr of Int.t ::= + | e0:Expr MUL e1:Expr prec mul -> Int.(e0 * e1) + | e0:Expr PLUS e1:Expr prec add -> Int.(e0 + e1) + | x:INT -> x +``` + +Precedence relationships are optional in precedence declarations. Contrived examples follow. + +```hocc +hocc + left a + left b < a + left c < a + left d < b, c # Transitive: a + right e + prec f < d, e # Transitive: a, b, c +``` + +Precedences are bound to tokens, non-terminals, and productions using the optional `prec` reference +clause. Omitting the `prec` reference clause is equivalent to referring to a unique disjoint `prec` +precedence. The following example demonstrates the `prec` clause syntax. + +```hocc +hocc + prec p1 + left p2 + + token FOO prec p1 + + nonterm Bar prec p2 ::= + | FOO + | epsilon + + start Biz ::= + | Bar FOO prec p1 +``` + +The `PSEUDO_END` token is implicitly defined with no precedence; any related conflicts must be +resolved by restructuring the grammar. + +```hocc +hocc + token PSEUDO_END "⊥" +``` + +## Example + +The following example implements a simple mathematical expression calculator. + +`Example.hmhi`: + +```hocc +open import Basis + +# Export the parser API so that alternatives to `calculate` can be implemented. `hocc` expands to a +# module signature. +include hocc + +calculate: string -> zint + [@@doc "Calculate the result of a simple arithmetic expression comprising non-negative integers + and `+`, `-`, `*`, and `/` operators. Tokens must be separated by one or more spaces."] +``` + +`Example.hmh`: + +```hocc +open import Basis + +# Specify the parser. `hocc ...` expands to a module implementation, `{ ... }`. +include hocc + left mul + token STAR "*" prec mul + token SLASH "/" prec mul + nonterm MulOp of Token.t ::= + | "*" -> STAR + | "/" -> SLASH + + left add < mul + token PLUS "+" prec add + token MINUS "-" prec add + nonterm AddOp of Token.t ::= + | "+" -> PLUS + | "-" -> MINUS + + token INT of Zint.t + nonterm Expr of Zint.t ::= + | e0:Expr op:MulOp e1:Expr prec mul -> + match op with + | MulOp STAR -> Zint.(e0 * e1) + | MulOp SLASH -> Zint.(e0 / e1) + | e0:Expr op:AddOp e1:Expr prec add -> + match op with + | AddOp PLUS -> Zint.(e0 + e1) + | AddOp MINUS -> Zint.(e0 - e1) + | x:INT -> x + + token EOI + start Answer of Zint.t ::= + | e:Expr EOI -> e + +# Tokenize `s`, e.g. "2 + 3 * 4", and append an `EOI` token.
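+# Tokens must be space-separated; anything that is not an operator is converted to an integer via
+# `Zint.of_string`.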
+tokenize s = + s |> String.split_rev ~f:(fn cp -> Codepoint.O.(cp = ' ')) + |> List.rev_filter ~f:(fn s -> String.length s <> 0) + |> List.rev_map ~f:fn s -> + let open Token + match s with + | "*" -> STAR + | "/" -> SLASH + | "+" -> PLUS + | "-" -> MINUS + | _ -> INT (Zint.of_string s) + |> List.push Token.EOI + |> List.rev + +# Calculate the result of the arithmetic expression expressed in `s`, e.g. "2 + 3 * 4". +calculate s = + List.fold_until (tokenize s) ~init:Start.Answer.boi ~f:fn parser tok -> + let parser' = Start.Answer.next tok parser + let done = match status parser' with + | Prefix -> false + | Accept _ + | Error _ -> true + parser', done + |> + function + | Accept answer -> answer + | Prefix _ -> halt "Partial input" + | Error _ -> halt "Parse error" +``` + +To generate Hemlock code from the above inputs, run `hocc -hm -s Example`. + +### Grammar + +The `hocc` specification language grammar is equivalent to the following specification. + +```hocc +hocc + # hocc-specific keywords + token HOCC "hocc" + token NONTERM "nonterm" + token EPSILON "epsilon" + token START "start" + token TOKEN "token" + token PREC "prec" + token LEFT "left" + token RIGHT "right" + + # Identifiers + token UIDENT # Uncapitalized + token CIDENT # Capitalized + token USCORE "_" + + # Token alias + token STRING + + # Punctuation/separators + token COLON_COLON_EQ "::=" + token OF "of" + token COLON ":" + token DOT "." + token ARROW "->" + token BAR "|" + token LT "<" + token COMMA "," + token SEMI ";" + token LINE_DELIM + + # Left-right paired delimiters + token INDENT + token DEDENT + token LPAREN "(" + token RPAREN ")" + token LCAPTURE "(|" + token RCAPTURE "|)" + token LBRACK "[" + token RBRACK "]" + token LARRAY "[|" + token RARRAY "|]" + token LCURLY "{" + token RCURLY "}" + + # Miscellaneous Hemlock token in embedded code + token CODE_TOKEN + + # End of input, used to terminate start symbols + token EOI + + nonterm Ident ::= UIDENT | CIDENT | "_" + + nonterm PrecsTl ::= + | "," UIDENT PrecsTl + | epsilon + + nonterm Precs ::= UIDENT PrecsTl + + nonterm PrecRels ::= + | "<" Precs + | epsilon + + nonterm PrecType ::= "prec" | "left" | "right" + + nonterm Prec ::= PrecType UIDENT PrecRels + + nonterm OfType ::= "of" CIDENT "." 
UIDENT + + nonterm OfType0 ::= + | OfType + | epsilon + + nonterm PrecRef ::= + | "prec" UIDENT + | epsilon + + nonterm TokenAlias ::= + | STRING + | epsilon + + nonterm Token ::= "token" CIDENT TokenAlias OfType0 PrecRef + + nonterm Sep ::= LINE_DELIM | ";" | "|" + + nonterm CodesTl ::= + | Sep Code CodesTl + | epsilon + + nonterm Codes ::= Code CodesTl + + nonterm Codes0 ::= + | Codes + | epsilon + + nonterm Delimited ::= + | INDENT Codes DEDENT + | "(" Codes0 ")" + | "(|" Codes0 "|)" + | "[" Codes0 "]" + | "[|" Codes0 "|]" + | "{" Codes0 "}" + + nonterm CodeTl ::= + | Delimited CodeTl + | CODE_TOKEN CodeTl + | epsilon + + nonterm Code ::= + | Delimited CodeTl + | CODE_TOKEN CodeTl + + nonterm ProdParamType ::= + | CIDENT + | STRING + + nonterm ProdParamIdent ::= + | Ident ":" + | epsilon + + nonterm ProdParam ::= ProdParamIdent ProdParamType + + nonterm ProdParamsTl ::= + | ProdParam ProdParamsTl + | epsilon + + nonterm ProdParams ::= ProdParam ProdParamsTl + + nonterm ProdPattern ::= + | ProdParams + | "epsilon" + + nonterm Prod ::= ProdPattern PrecRef + + nonterm ProdsTl ::= + | "|" Prod ProdsTl + | epsilon + + nonterm Prods ::= + | "|" Prod ProdsTl + | Prod ProdsTl + + nonterm Reduction ::= Prods "->" Code + + nonterm ReductionsTl ::= + | "|" Reduction ReductionsTl + | epsilon + + nonterm Reductions ::= + | "|" Reduction ReductionsTl + | Reduction ReductionsTl + + nonterm NontermType ::= "nonterm" | "start" + + nonterm Nonterm ::= + | NontermType CIDENT PrecRef "::=" Prods + | NontermType CIDENT OfType PrecRef "::=" Reductions + + nonterm Stmt ::= + | Prec + | Token + | Nonterm + | Code + + nonterm StmtsTl ::= + | LINE_DELIM Stmt StmtsTl + | epsilon + + nonterm Stmts ::= Stmt StmtsTl + + nonterm Hocc ::= "hocc" INDENT Stmts DEDENT + + nonterm Matter ::= + | CODE_TOKEN Matter + | epsilon + + start Hmh ::= Matter Hocc Matter EOI + + start Hmhi ::= Matter "hocc" Matter EOI +``` + +## Citations + +[^knuth1965]: + Donald Knuth, + “On the translation of languages from left to right,” + Information and Control 8 (6), 607–639, July 1965. + +[^pager1977]: + David Pager, + “A Practical General Method for Constructing LR(k) Parsers,” + Acta Informatica 7, 249-268, 1977. + +[^diekmann2020]: + Lukas Diekmann and Laurence Tratt, + “Don't Panic! Better, Fewer, Syntax Errors for LR Parsers,” + 34th European Conference on Object-Oriented Programming (ECOOP 2020), Article No. 6, pages 6:1–6:32. + +[^fpottier]: + François Pottier and Yann Régis-Gianas, + “Menhir LR(1) Parser Generator,” + http://gallium.inria.fr/~fpottier/menhir/ diff --git a/ide/kakoune/hocc.kak b/ide/kakoune/hocc.kak new file mode 100644 index 000000000..24cc45803 --- /dev/null +++ b/ide/kakoune/hocc.kak @@ -0,0 +1,122 @@ +# https://github.com/BranchTaken/Hemlock +# ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ + +# Detection +# ‾‾‾‾‾‾‾‾‾ + +hook global BufCreate .*\.hmhi? 
%{ + set-option buffer filetype hocc +} + +# Initialization +# ‾‾‾‾‾‾‾‾‾‾‾‾‾‾ + +hook global WinSetOption filetype=hocc %{ + require-module hocc + set-option window static_words %opt{hocc_static_words} +} + +hook -group hocc-highlight global WinSetOption filetype=hocc %{ + add-highlighter window/hocc ref hocc + hook -once -always window WinSetOption filetype=.* %{ remove-highlighter window/hocc } +} + +provide-module hocc %{ + +# Highlighters +# ‾‾‾‾‾‾‾‾‾‾‾‾ + + +add-highlighter shared/hocc regions +add-highlighter shared/hocc/code default-region group + +add-highlighter shared/hocc/code/cident regex \b[_]*[A-Z][A-Za-z0-9_']*\b 0:module +add-highlighter shared/hocc/code/uident regex \b[_]*[a-z][A-Za-z0-9_']*\b 0:Default +add-highlighter shared/hocc/code/tab regex \t 0:Error +add-highlighter shared/hocc/code/unaligned regex ^(\ \ )*\ (?![\ ]) 0:Error +add-highlighter shared/hocc/code/unaligned_continue_keyword regex ^(\ \ \ \ )*(and|also|as|else|external|of|or|then|when|with)\b 0:Error +add-highlighter shared/hocc/code/unaligned_continue_punctuation regex ^(\ \ \ \ )*([\x7D\]),!'\\\-+*/%@$<=>\|:.]) 0:Error +add-highlighter shared/hocc/code/unaligned_continue_caret regex ^(\ \ \ \ )*([\^](?![&A-Za-z_])) 0:Error +add-highlighter shared/hocc/code/trailing regex (\ )+$ 0:ExcessWhitespace +add-highlighter shared/hocc/code/interior_multispace regex (?<=\S)(\ ){2,}(?=\S) 0:ExcessWhitespace + +add-highlighter shared/hocc/comment region -recurse \Q(* \Q(* \Q*) fill comment +add-highlighter shared/hocc/line_comment region '#' '\n' fill comment + +add-highlighter shared/hocc/string region ((?]?(\+|_)?#?0?\*\(\^ () fill meta +add-highlighter shared/hocc/string/precision region \%('.')?[<^>]?(\+|_)?#?0?([1-9][0-9]*)?\.=?\*\(\^ () fill meta +add-highlighter shared/hocc/string/fmt region \%('.')?[<^>]?(\+|_)?#?0?([1-9][0-9]*)?(\.=?[1-9][0-9]*)?[bodx]?[mac]?p?f\(\^ () fill meta +add-highlighter shared/hocc/string/value region \%('.')?[<^>]?(\+|_)?#?0?([1-9][0-9]*)?(\.=?[1-9][0-9]*)?[bodx]?[mac]?p?([bnzcs]|([ui](8|16|32|64|128|256|512)?)|(r(32|64)?))([\ ]*[-+*/%@^$<=>|:.][-+*/%@$<=>|:.~?]*[\ ]*)?\(\^ () fill meta + +add-highlighter shared/hocc/string/width_precision region \^\)\.=?\*\(\^ () fill meta +add-highlighter shared/hocc/string/width_fmt region \^\)(\.=?[1-9][0-9]*)?[bodx]?[mac]?p?f\(\^ () fill meta +add-highlighter shared/hocc/string/width_value region \^\)(\.=?[1-9][0-9]*)?[bodx]?[mac]?p?([bnzcs]|([ui](8|16|32|64|128|256|512)?)|(r(32|64)?))([\ ]*[-+*/%@^$<=>|:.][-+*/%@$<=>|:.~?]*[\ ]*)?\(\^ () fill meta +add-highlighter shared/hocc/string/precision_fmt region \^\)[bodx]?[mac]?p?f\(\^ () fill meta +add-highlighter shared/hocc/string/precision_value region \^\)[bodx]?[mac]?p?([bnzcs]|([ui](8|16|32|64|128|256|512)?)|(r(32|64)?))([\ ]*[-+*/%@^$<=>|:.][-+*/%@$<=>|:.~?]*[\ ]*)?\(\^ () fill meta +add-highlighter shared/hocc/string/fmt_value region \^\)([\ ]*[-+*/%@^$<=>|:.][-+*/%@$<=>|:.~?]*[\ ]*)?\(\^ () fill meta + +add-highlighter shared/hocc/string/unprotected region (?|:.~?]*} 0:operator +add-highlighter shared/hocc/code/infix_operator regex %{[-+*/%@^$<=>|:.][-+*/%@^$<=>|:.~?]*} 0:operator + +add-highlighter shared/hocc/code/boolean regex \b(true|false)\b 0:value + +add-highlighter shared/hocc/code/bin_integer regex \b(0b)([_]*[01][01_]*)(([ui](8|16|32|64|128|256|512)?)|[zn])?\b 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/oct_integer regex \b(0o)([_]*[0-7][0-7_]*)(([ui](8|16|32|64|128|256|512)?)|[zn])?\b 1:attribute 2:value 3:attribute +add-highlighter 
shared/hocc/code/hex_integer regex \b(0x)([_]*[0-9a-f][0-9a-f_]*)(([ui](8|16|32|64|128|256|512)?)|[zn])?\b 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/integer regex \b(([1-9][0-9_]*)|0[_]*)(([ui](8|16|32|64|128|256|512)?)|[zn])?\b 1:value 3:attribute + +add-highlighter shared/hocc/code/bin_real_dot regex \b(0b)([01][01_]*\.(?!\.)[01_]*(p_*[+\-]?_*[0-9][0-9_]*)?)(r(32|64)?)? 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/bin_real_p regex \b(0b)([01][01_]*p_*[+\-]?_*[0-9][0-9_]*)(r(32|64)?)?\b 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/bin_real_r regex \b(0b)([01][01_]*)(r(32|64)?)\b 1:attribute 2:value 3:attribute + +add-highlighter shared/hocc/code/oct_real_dot regex \b(0o)([0-7][0-7_]*\.(?!\.)[0-7_]*(p_*[+\-]?_*[0-9][0-9_]*)?)(r(32|64)?)? 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/oct_real_p regex \b(0o)([0-7][0-7_]*p_*[+\-]?_*[0-9][0-9_]*)(r(32|64)?)?\b 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/oct_real_r regex \b(0o)([0-7][0-7_]*)(r(32|64)?)\b 1:attribute 2:value 3:attribute + +add-highlighter shared/hocc/code/hex_real_dot regex \b(0x)([0-9a-f][0-9a-f_]*\.(?!\.)[0-9a-f_]*(p_*[+\-]?_*[0-9][0-9_]*)?)(r(32|64)?)? 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/hex_real_p regex \b(0x)([0-9a-f][0-9a-f_]*p_*[+\-]?_*[0-9][0-9_]*)(r(32|64)?)?\b 1:attribute 2:value 3:attribute +add-highlighter shared/hocc/code/hex_real_r regex \b(0x)([0-9a-f][0-9a-f_]*)(r(32|64)?)\b 1:attribute 2:value 3:attribute + +add-highlighter shared/hocc/code/real_dot regex \b([0-9][0-9_]*\.(?!\.)[0-9_]*(e_*[+\-]?_*[0-9][0-9_]*)?)(r(32|64)?)? 1:value 2:attribute +add-highlighter shared/hocc/code/real_e regex \b([0-9][0-9_]*e_*[+\-]?_*[0-9][0-9_]*)(r(32|64)?)?\b 1:value 2:attribute +add-highlighter shared/hocc/code/real_r regex \b([0-9][0-9_]*)(r(32|64)?)\b 1:value 2:attribute + +# Macro +# ‾‾‾‾‾ + +evaluate-commands %sh{ + keywords="and|also|as|conceal|effect|else|expose|external|fn|function|if|import|include|lazy|let" + keywords="${keywords}|match|mutability|of|open|or|rec|then|type|when|with" + keywords="${keywords}|hocc|token|nonterm|start|epsilon|prec|left|right" + + printf %s\\n "declare-option str-list hocc_static_words ${keywords}" | tr '|' ' ' + + printf %s " + add-highlighter shared/hocc/code/ regex \b(${keywords})\b 0:keyword + " +} + +}