Track specific validation errors #260

karwa · 2023-09-26T07:14:19Z

From the table, I've only done the "URL parsing" section, not the IDNA or host parsing sections.

Having this in the reference implementation has already helped me confirm a couple of issues with the validation errors reported by the standard (bugs and PRs to WHATWG/url incoming). It can also be useful for other implementations to check the errors they report. Hopefully we can add expected validation errors to the WPT test suite in due course, as well.

That said, these results are not exposed to the live viewer yet. I've hacked something together locally, but I'm not sure of the cleanest way to thread that through to the viewer.

karwa · 2023-09-26T13:20:21Z

Related to #156 and #61

As for performance: we could use a bit-mask rather than an array of strings (since the standard only has a handful of unique validation errors). The cost is that we would lose some information, namely:

The order in which errors were encountered
The number of occurrences of each error

It would also be more limiting, as we would be unable to store associated information with each error (e.g. cursor position). But, in return, we would be able to store all errors that were encountered in a fixed amount of memory.

Another alternative would be for the parser to accept a validation error callback closure. Callers could supply a closure which collects all errors in an array, or which ignores them entirely, and the overhead in the parser should not be worse than any other virtual function call; the parser would just call the user-supplied function saying "hey, this error happened". Importantly, the caller would provide the memory to store errors, if they indeed want to store them.

karwa · 2023-09-26T14:54:37Z

I've implemented the closure approach on a branch (https://github.com/karwa/whatwg-url/commits/validation-error-callback) and hooked it up roughly to the live viewer.

The constructor for the URL type is generated from IDL or something and I'm not sure how to go around it, so you'll need to forward that argument manually by editing the generated constructor in lib/URL.js (line 115) and adding:

{
  let curArg = arguments[2];
  if (curArg !== undefined && typeof curArg === "function") {
    args.push(curArg);
  }
}

But yeah, it illustrates the approach. Since it's a callback function we can also add supplementary information about the error such as the pointer position, without having to worry about the memory overhead of storing all this stuff.

domenic · 2023-09-30T05:54:45Z

This is exciting work!

I think we only want to expose this capability through the low-level URL Standard API, not the new URL() API. That should bypass any worries about conflicting with the generated code. It does mean the live viewer would need to switch which API it uses, but that should be fine.

Other things we'd need to do before merging this:

Documentation in the README, including on which validation errors are missing.
The results of the benchmark script before and after.

It may also be worth benchmarking the results of a no-op default onValidationError, versus a null/undefined default onValidationError that is always called with this.onValidationError.?(...).

domenic · 2023-09-30T05:58:30Z

I also like the onValidationError approach because we could avoid the performance cost of #59 by doing something like

if (this.onValidationError && !isURLCodePoint(c) && c !== p("%")) {
   this.onValidationError("invalid-URL-unit");
}

instead of just

if (!isURLCodePoint(c) && c !== p("%")) {
   this.onValidationError.?("invalid-URL-unit");
}

There are a few other instances of that pattern as well that might be worth optimizing out, e.g.

    if (c === p("%") &&
        (!infra.isASCIIHex(this.input[this.pointer + 1]) ||
         !infra.isASCIIHex(this.input[this.pointer + 2]))) {
      this.onValidationError("invalid-URL-unit");
    }

could remove collapse the three checks down into one.

karwa added 2 commits September 26, 2023 09:03

Track specific validation errors

ab0f586

Incorporate credentials error change from whatwg/url#793

ff5c725

Fix a URL code-unit check when parsing opaque paths

0ecf6dd

karwa force-pushed the validation-errors branch from 569c975 to 0ecf6dd Compare September 26, 2023 14:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track specific validation errors #260

Track specific validation errors #260

karwa commented Sep 26, 2023

karwa commented Sep 26, 2023

karwa commented Sep 26, 2023

domenic commented Sep 30, 2023

domenic commented Sep 30, 2023

Track specific validation errors #260

Are you sure you want to change the base?

Track specific validation errors #260

Conversation

karwa commented Sep 26, 2023

karwa commented Sep 26, 2023

karwa commented Sep 26, 2023

domenic commented Sep 30, 2023

domenic commented Sep 30, 2023