-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: structured diagnostics for the compiler #45
base: master
Are you sure you want to change the base?
Conversation
I think having structured output from the compiler is great, and I think it is wise to think carefully about versioning and subtyping. Yet I think this proposal would be strengthened with further examples, and proposed schemata for various outputs. Maybe this is too soon and that information is based saved until later? A few questions I have at this point:
I guess that's it for now, actually. I'm excited to see this posted! |
Concerning the error message part:
(see also the current implementation). Thus if we want to add a new constructor, we can add it as a derived constructor of Concerning the various schemata, my point of view is that there are two distinct situation in the current compiler. type msg = Fmt.t loc
type report = {
kind : report_kind;
main : msg;
sub : msg list;
footnote: Fmt.t option;
} and the For instance, for the error report, I knew of three points that I have considered to change at points:
From the $ ocamlc -config
version: 5.2.0
standard_library_default: /home/angeletti/.opam/5.2.0+flambda/lib/ocaml
standard_library: /home/angeletti/.opam/5.2.0+flambda/lib/ocaml
ccomp_type: cc
c_compiler: gcc
ocamlc_cflags: -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread
ocamlc_cppflags: -D_FILE_OFFSET_BITS=64
ocamlopt_cflags: -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread
ocamlopt_cppflags: -D_FILE_OFFSET_BITS=64
bytecomp_c_compiler: gcc -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread -D_FILE_OFFSET_BITS=64
native_c_compiler: gcc -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread -D_FILE_OFFSET_BITS=64
bytecomp_c_libraries: -lzstd -lm -lpthread
native_c_libraries: -lm -lpthread
native_ldflags:
native_pack_linker: ld -r -o
native_compiler: true
architecture: amd64
model: default
int_size: 63
word_size: 64
system: linux
asm: as
asm_cfi_supported: true
with_frame_pointers: false
ext_exe:
ext_obj: .o
ext_asm: .s
ext_lib: .a
ext_dll: .so
os_type: Unix
default_executable_name: a.out
systhread_supported: true
host: x86_64-pc-linux-gnu
target: x86_64-pc-linux-gnu
flambda: true
safe_string: true
default_safe_string: true
flat_float_array: true
function_sections: true
afl_instrument: false
tsan: false
windows_unicode: false
supports_shared_libraries: true
native_dynlink: true
naked_pointers: false
exec_magic_number: Caml1999X034
cmi_magic_number: Caml1999I034
cmo_magic_number: Caml1999O034
cma_magic_number: Caml1999A034
cmx_magic_number: Caml1999y034
cmxa_magic_number: Caml1999z034
ast_impl_magic_number: Caml1999M034
ast_intf_magic_number: Caml1999N034
cmxs_magic_number: Caml1999D034
cmt_magic_number: Caml1999T034
linear_magic_number: Caml1999L034 to $ ocamlc -config -log-format json {
"metadata" : { "version" : [1, 0], "valid" : "Full"},
"version" : "5.3.0+dev0-2023-12-22",
"standard_library_default" : "/usr/local/lib/ocaml",
"standard_library" : "/usr/local/lib/ocaml",
"ccomp_type" : "cc",
"c_compiler" : "gcc",
"ocamlc_cflags" :
" -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread",
"ocamlc_cppflags" : " -D_FILE_OFFSET_BITS=64 ",
"ocamlopt_cflags" :
" -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread",
"ocamlopt_cppflags" : " -D_FILE_OFFSET_BITS=64 ",
"bytecomp_c_compiler" :
"gcc -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread -D_FILE_OFFSET_BITS=64 ",
"native_c_compiler" :
"gcc -O2 -fno-strict-aliasing -fwrapv -pthread -fPIC -pthread -D_FILE_OFFSET_BITS=64 ",
"bytecomp_c_libraries" : "-lzstd -lm -lpthread",
"native_c_libraries" : " -lm -lpthread",
"native_ldflags" : "",
"native_pack_linker" : "ld -r -o ",
"native_compiler" : true,
"architecture" : "amd64",
"model" : "default",
"int_size" : 63,
"word_size" : 64,
"system" : "linux",
"asm" : "as",
"asm_cfi_supported" : true,
"with_frame_pointers" : false,
"ext_exe" : "",
"ext_obj" : ".o",
"ext_asm" : ".s",
"ext_lib" : ".a",
"ext_dll" : ".so",
"os_type" : "Unix",
"default_executable_name" : "a.out",
"systhread_supported" : true,
"host" : "x86_64-pc-linux-gnu",
"target" : "x86_64-pc-linux-gnu",
"flambda" : false,
"safe_string" : true,
"default_safe_string" : true,
"flat_float_array" : true,
"function_sections" : true,
"afl_instrument" : false,
"tsan" : false,
"windows_unicode" : false,
"supports_shared_libraries" : true,
"native_dynlink" : true,
"naked_pointers" : false,
"exec_magic_number" : "Caml1999X035",
"cmi_magic_number" : "Caml1999I035",
"cmo_magic_number" : "Caml1999O035",
"cma_magic_number" : "Caml1999A035",
"cmx_magic_number" : "Caml1999Y035",
"cmxa_magic_number" : "Caml1999Z035",
"ast_impl_magic_number" : "Caml1999M035",
"ast_intf_magic_number" : "Caml1999N035",
"cmxs_magic_number" : "Caml1999D035",
"cmt_magic_number" : "Caml1999T035",
"linear_magic_number" : "Caml1999L035"
} For the cases where the compiler is current dumping raw contents (for instance with "debug" :
{
"type" : "object",
"properties" :
{
"profile" : { "type" : "string"},
"cmm_invariant" : { "type" : "string"},
"linear" : { "type" : "array", "items" : { "type" : "string"}},
"mach" : { "type" : "array", "items" : { "type" : "string"}},
"unbox-specialised-args" : { "type" : "array", "items" : { "type" : "string"}},
"unbox-closures" : { "type" : "array", "items" : { "type" : "string"}},
"unbox-free-vars-of-closures" : { "type" : "array", "items" : { "type" : "string"}},
"remove-free-vars-equal-to-args" : { "type" : "array", "items" : { "type" : "string"}},
"cmm" : { "type" : "array", "items" : { "type" : "string"}},
"raw_clambda" : { "type" : "array", "items" : { "type" : "string"}},
"clambda" : { "type" : "array", "items" : { "type" : "string"}},
"raw_flambda" : { "type" : "array", "items" : { "type" : "string"}},
"flambda" : { "type" : "array", "items" : { "type" : "string"}},
"raw_lambda" : { "type" : "string"},
"lambda" : { "type" : "string"},
"instr" : { "type" : "string"},
"shape" : { "type" : "string"},
"typedtree" : { "type" : "string"},
"source" : { "type" : "string"},
"parsetree" : { "type" : "string"}
},
"required" : []
}, while the full compiler schema is "type" : "object",
"properties" :
{
"metadata" : { "$ref" : "#/$defs/metadata"},
"alerts" :
{ "type" : "array", "items" : { "$ref" : "#/$defs/error_report"}},
"warnings" :
{ "type" : "array", "items" : { "$ref" : "#/$defs/error_report"}},
"error" : { "$ref" : "#/$defs/error_report"},
"debug" : { "$ref" : "#/$defs/debug"}
},
"required" : ["metadata"] The possible disadvantage of this schema for debugging outout is that if we want to structure the output in a more structured way, we will need to add new fields for the structured version. On a more metalevel, I agree that it is necessary to discuss the various schemata, but I am not sure what are the best venues, nor specification to do that. Typically, json scheme is quite heavy to use to describe sum types, for instance the "error_kind" :
{
"oneOf" :
[{
"type" : "array",
"prefixItems" :
[{ "const" : "Report_warning_as_error"},
{ "type" : "string"}]
},
{
"type" : "array",
"prefixItems" :
[{ "const" : "Report_warning"}, { "type" : "string"}]
},
{
"type" : "array",
"prefixItems" :
[{ "const" : "Report_alert_as_error"}, { "type" : "string"}
]
},
{
"type" : "array",
"prefixItems" :
[{ "const" : "Report_alert"}, { "type" : "string"}]
}, { "const" : "Report_error"}]
}, when transposed to |
Maybe? I'm really more after concrete examples instead of the generic schemata. That is, I think with maybe 2 examples, it would be easy to infer the rough schemata, but if we just look at schemata, it might be hard to intuit how examples would look. |
I can certainly give some examples. At the same time, the exact schema can be updated easily. For instance, taking the functor error example from the RFC with a tree representation of document format and inlined constructor arguments: ((metadata ((version (1 0)) (valid Full)))
(error
((kind Report_error)
(main
((msg
((Box HV 0
((Text "This application of the functor ")
(Tag Inline_code ((Text "F"))) (Text " is ill-typed.")
(Simple_break 1 0) (Text "These arguments:") (Simple_break 1 2)
(Box B 0 ((Tag Preservation ((Text "Y"))))) (Simple_break 1 0)
(Text "do not match these parameters:") (Simple_break 1 2)
(Box B 0
((Text "functor") (Simple_break 1 0)
(Tag Insertion ((Text "(X : ") (Box B 2 ((Text "x"))) (Text ")")))
(Simple_break 1 0)
(Tag Preservation
((Text "(Y : ") (Box B 2 ((Text "y"))) (Text ")")))
(Simple_break 1 0) (Text "-> ...")))))))
(loc
((file "functor_example.ml") (start_line 7) (stop_line 7)
(characters (11 15))))))
(sub
(((msg
((Tab_break 0 0)
(Tbox
((Tag Insertion ((Text "1. "))) Set_tab
(Box HV 2
((Text "An argument appears to be missing with module type")
(Simple_break 1 2) (Box B 0 ((Box B 2 ((Text "x"))))))))))))
((msg
((Tab_break 0 0)
(Tbox
((Tag Preservation ((Text "2. "))) Set_tab
(Box HV 2
((Text "Module Y matches the expected module type ")
(Box B 2 ((Text "y"))))))))))))))) Another compiler output with warnings, alerts and dump output: (* example_warning_alert.ml *)
let f x = Gc.eventlog_pause () ocamlopt -dparsetree -dsource -dlambda -log-format sexp -w +A example_warning_alert.ml ((metadata ((version (1 0)) (valid Full)))
(debug
((source "let f x = Gc.eventlog_pause ()")
(lambda
"(seq\n (let\n (f/270 =\n (function x/272 : int (apply (field_imm 7 (global Stdlib__Gc!)) 0)))\n (setfield_ptr(root-init) 0 (global Example_warning_alert!) f/270))\n 0)"
)))
(warnings
(((kind (Report_warning "27 [unused-var-strict]"))
(main
((msg ((Text "unused variable x.")))
(loc
((file "example_warning_alert.ml") (start_line 2) (stop_line 2)
(characters (6 7)))))))
((kind (Report_warning "70 [missing-mli]"))
(main
((msg ((Text "Cannot find interface file.")))
(loc ((file "example_warning_alert.ml"))))))))
(alerts
(((kind (Report_alert "deprecated"))
(main
((msg
((Text "Stdlib.Gc.eventlog_pause\nUse Runtime_events.pause instead."))
)
(loc
((file "example_warning_alert.ml") (start_line 2) (stop_line 2)
(characters (10 27)))))))))) For the toplevel, I propose to put the compiler diagnostic inside a # let f x = 1 in raise Not_found; f ();; ((metadata ((version (1 0)) (valid Full)))
(output
((Box B 0 ((Text "Exception:") (Simple_break 1 0) (Text "Not_found.")))))
(compiler_diagnostic
((warnings
(((kind (Report_warning "27 [unused-var-strict]"))
(main
((msg ((Text "unused variable x.")))
(loc ((start_line 1) (stop_line 1) (characters (6 7)))))))
((kind (Report_warning "21 [nonreturning-statement]"))
(main
((msg
((Text "this statement never returns (or has an unsound type.)")))
(loc ((start_line 1) (stop_line 1) (characters (15 30)))))))))))) #warn A;; ((metadata ((version (1 0)) (valid Full)))
(errors
(((Text "Unknown directive ") (Tag Inline_code ((Text "warn"))) (Text ".")
(Flush false))))) |
Those examples are really helpful. Thanks. My reaction to those is that I'm still worried about the mix of syntax and semantics. That is, the top few nodes are all about semantics: where this message was generated from and what it's doing. Then the lower nodes are all about syntax: how should we pretty-print this? Instead, I would rather see something like
(I did not check for parenthesis matching.) Then, there would be a function |
I agree that at a later stage, it would be better to provide more semantical information. Nevertheless I think it is useful to start by exposing the current rendered error message as a reasonable way to present the error. This doesn't prevent us to add more semantics information later, and possibly deprecates the "raw" output if it becomes completely unused. To be honest, on the review side, I would be a bit worried to ask reviewers to review both a new logging mechanism and a change that alters all errors in the compiler. Typically, one important information that is missing for exploiting error messages would be having error names (and error numbers?) as proposed in #33. I would rather propose to move step by step, first with structured output, which will make #33 more appealing, and then add more semantics structure to the error reports. One alternative that has been suggested to me for the |
Whilst I generally favor an iterative approach, it is worth pointing out that some consumers of the structured output will probably need to consume every version of it, so the more versions you have the more work for your users. It is worth spending some time up front trying to get the right information for the tools that want this to avoid needing long term support for a bunch of versions. |
@lpw25 what are the tools / use-case / scenario / persona that you have in mind? |
Without thinking hard about it, I'd assume build systems and editor tooling. |
I agree! But I don't think I understand the concrete worry here. The logging mechanism would be new, for sure, but why do you say that my suggestion also alters all errors in the compiler? |
@lpw25 , I agree that it is better to avoid reporting too much complexity on the consumer side. However, when speaking to dune developer and merlin developers, the list of features required right now that I ended up was:
I think it makes sense to start with an initial version that implements those highly demanded features as soon as possible. Then once we have actual structured diagnostics, I have no doubts that other feature requests will appear. But I am bit wary to try to anticipate on hypothetical future feedbacks without more data. At the same time, if you think that I might be ignoring some possible sources of concrete feedbacks at Janestreet, don't hesitate to point them to me. A possible good compromise could be to have a ramp-up period: we could start with diagnostic at version 0.0 (0.1) to let developer tools try to use it and gather feedback, with the idea that tools may drop in the future the support for all versions 0.* once the versions 1.* are out. @goldfirere , sorry I was unclear. I meant the error report pipeline rather than the error messages themselves. Currently, we have three levels of representation for errors:
Serializing the error report is straightforward, and require only to adapt the report printers without any other changes. |
For the record, there is a history of previous work and previous discussions on this feature. It was first explicitly discussed in a feature request created in january 2016 ocaml/ocaml#7123, with implementations proposed four years later in summer-fall 2020, when @nojb implemented ocaml/ocaml#9758 and Muskan Gard and @Octachron implemented ocaml/ocaml#9538, ocaml/ocaml#9979 (during a summer internship). At the time we asked for feedback from tool authors, but the feedback was inconclusive and the various PRs ended up being stalled -- seemingly forever. In response to this feedback, @Octachron worked on schemas versioning for a long while, and the current RFC is the new iteration -- sitting on top of non-trivial buildup work, in particular ocaml/ocaml#13169 . |
We are already in such a ramp-up period, where there are zero versions of the protocol available and tools are not using any of them. No one is forcing tools to adopt/support the structured output as soon as it gets out, they can keep doing whatever they want, and give us feedback on what they would need/want to start consuming it. |
Thanks for the responses. I agree it makes sense to decouple the initial implementation of structured output from the work of updating all the code that assembles error messages. I can observe that the two steps -- add structured output and improve the error-message assembling -- can happen in either order. That is, we could go through the error messages first and add structure to their assembly without changing concrete output. (I don't necessarily think it's better to reorder. Just saying it's possible.) As for ramp-up period: that's true. But part of what's in play here is time. That is, suppose we get to a first version of this, with little semantic information (that is, we implement the version that produces the sexps as they appear in this PR). Tools might think of adopting. If everything is going to change in 6 months -- and the change would make adoption easier -- then authors might wait for things to get easier. But if it's going to be 6 years, authors might just forge ahead, even if it means awkward compatibility support later. Of course it's always hard to make rock-solid commitments in open-source software (or closed-source!). Yet I think that, if we broadcast the direction of travel to tool authors, they can make more informed decisions about when, and what, to adopt. But with all of this in mind, I think perhaps forging ahead with the proposal as written is a good next step. |
This is a proposal for implementing structured diagnostics for the compiler, to be able to render compiler errors, debugging output, config information and toplevel output in structured formats like S-expression or JSON.
As a higher level overview, this RFC is mostly focused on the definition of a backward-compatibility and versioning policy for those structured diagnostics.
Rendered version