Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dependency resolution, XPath static analysis; support current() in computations, repeat-based itemsets, relative body references #166

Merged
merged 32 commits into from
Jul 25, 2024

Conversation

eyelidlessness
Copy link
Member

@eyelidlessness eyelidlessness commented Jul 12, 2024

Whew, that title is a mouthful! As it suggests, there's a fair bit going on here. But it's all interrelated: all of the functional improvements (fixes and features alike) effectively call in to the same underlying abstractions (840937b).

Working out from the lowest conceptual level to the highest:

  1. A simpler, more general, and more correct, approach to querying aspects of a given XPath expression's syntax—by traversal of parsed XPath syntax trees

  2. Semantic application of that approach, used to identify various kinds of XPath path expressions and sub-expressions, including node-set returning functions; similar application replaces existing analysis of function calls, e.g. to identify translation expressions.

  3. A more thorough (and thoroughly tested) approach to path resolution logic. This is the most substantive aspect of the change, and I'll go into more detail about the approach below.

  4. Analysis of path predicates to similarly identify path references within those predicates, along with the sub-path context for each such reference. This will also be discussed in more detail below, as it relates to the next and previous points.

  5. Combination of all of these aspects to perform dependency analysis on arbitrary sub-expressions, where (to the extent possible) the product is a set of contextualized nodeset references, which are ultimately used to establish reactive subscriptions for each of a form's computational expressions. More on this below as well.

Note on XPath syntax terminology

There will be several references to aspects of XPath syntax, both in the text below and throughout the changes in this PR. These references always refer to one (or more) of:

  • concepts directly named in XPath 1.0
  • concepts derived from that specification, typically where the spec does not explicitly name a concept
  • convenient categorical concepts which reference multiple such underlying concepts (e.g. quite a few references in the change to "Step-like")

If any reference to such a concept is unclear, feedback on that is welcome and encouraged. My hope is that the end product of this PR is unambiguous on spec and spec-derived concepts, and easily understood if/when we iterate on these foundations downstream.

Note on where this fits in the form initialization process

I want to make sure it's absolutely clear that all of the work described herein happens while parsing a form. All of this analysis and resolution is done once, upfront, before a form's instance state is initialized. Since #154, the distinction will hopefully be clearer for this and future work (as the code is clearly structured under the parse source directory). But it feels important to make the distinction clear now, as there's been some ambiguity in the past about when certain logic is effective, and what that might imply for e.g. performance, internal API usage, etc.

Path resolution

Broadly, the approach to path resolution involves:

  1. Build a partially flattened, list-like representation of the unresolved path expression, where each list item is one of these parsed syntax nodes:

    • Step (StepNode)
    • / (as a reference to the root node in an AbsoluteLocationPath; not explicitly named in the XPath grammar; internally named AbsoluteRootLocationPathNode)
    • // (shorthand for the sub-path expression /descendant-or-self::node()/, itself a Step plus the prefix to a following Step; also not explicitly named in the XPath grammar; internally named RelativeStepSyntaxLiteralNode [which is probably not ideal!])
    • FilterExpr (which is a reference to many different aspects of XPath syntax, but for our usage generally corresponds to a node-set returning function)
  2. If a particular context path exists, build the same partially flattened representation and concatenate (e.g. [...context, ...unresolved]).

  3. Iterate through the (now-contextualized, where applicable) flattened representation, resolving (where possible):

    • current() to the context path itself
    • . (and other syntax representations of self references) to the preceding Step-like path segment, by omitting the self reference
    • .. (and other syntax representations of parent references), by omitting the parent reference as well as its preceding Step-like path segment
  4. Serializing the remaining path segments. Once again, I'll give more detail on this under Dependency analysis below.

Dependency analysis

The approach to dependency analysis replaces a much less thorough/mature approach, but has the same goal: produce a set of nodeset references identified in an arbitrary XPath expression. The previous approach did some portion of what will be described below, naively cutting a variety of corners (largely introduced in our early spike on the viability of this project).

Broadly, the more thorough and mature approach involves:

  • With a given context expression (where one is known/available from a current or earlier parsing step), and
  • Given an arbitrary XPath expression

Find all of the sub-expressions referencing a path in that arbitrary expression.

  1. For every such path sub-expression found:
    1. Resolve the path itself, as described in Path resolution above
    2. Predicate resolution: for each path segment, and for each of its predicates: find any further path references in the predicate expression. For each predicate path sub-expression found: resolve that path such that context() references the same context as above, and ./self references the current path segment/Step context
  2. Serialization: For all resolved path expressions, serialize the path segments omitting predicates.

This last point is important: as the work here happens during form parsing, the resolved dependencies it produces are (and have been for some time) nodeset references, as in references to an XForm model node. We will likely revisit this in one form or another, but this is the way all existing dependency analysis works (and has been improved in various ways under the same approach). This is also the mechanism that enables the same approach to fix tracking relative references as well as current() references.

Most importantly: the product of all this analysis is a set of nodeset references used in engine internals to identify SubscribableDependencys for reactive subscriptions. The resolved expressions are never evaluated to produce results for the source expressions from which they were resolved.

To really emphasize how much this is building on a set of assumptions already at work, this is how little changed to make these changes available to downstream instance state logic.

Relative body references

This aspect of the change (3e425f1) largely builds on the same path resolution and serialization foundations above, with two key differences:

  • current() (as the initial path segment of a given body reference) is treated as equivalent to ./self; I believe this is the expectation/intent, at least as represented by the ported JavaRosa scenario test which it fixes

  • serialization of resolved paths preserves predicates; this ensures, for instance, that itemset nodesets retain any of the dynamic filtering logic defined in the form (and allows downstream dependency analysis from there)

An important note: while this aspect of path resolution is applied in nearly every aspect of body parsing where a reference might be expected, the <value> and <label> subexpressions of an <itemset> are exempted. Where these are defined as relative, it is important to preserve that so their evaluation is contextualized to the also-dynamic items produced by the <itemset nodeset> expression.

Scenario updates

The first commit in this change takes a slightly different tack from #150 when it comes to the scenario tests with implicit repeat creation. Instead of implementing such implicit repeat creation, this change:

  • Warns when implicit creation is likely expected at a given point in a test's execution, allowing the test to be updated to explicitly create the missing repeats
  • Adds an API that allows such explicit additions easily identifiable, so similar changes may be adopted by JavaRosa at that project's convenience

Beyond that... well, there's a whole bunch of newly passing tests!

Copy link

changeset-bot bot commented Jul 12, 2024

🦋 Changeset detected

Latest commit: e39b018

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@getodk/scenario Minor
@getodk/xforms-engine Minor
@getodk/ui-solid Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

eyelidlessness added a commit that referenced this pull request Jul 13, 2024
These early tests were intended to demonstrate using the parser, in a way that relates to concepts we’d build later in the project exploration. Specifically, they demonstrated querying aspects of the parsed syntax tree to identify path sub-expressions.

They have probably not had much value for quite a while, as we’ve had real logic doing this quite differently. This is especially obvious now, as of #166, where we have substantially matured that approach.

This leaves standard tree-sitter testing intact, which is more than sufficient for validating the parser grammar’s **syntax tree output**. All other XPath-related logic is tested much more thoroughly downstream (including quite a bit exercising the grammar, both in `xpath` and `xforms-engine` packages).

Small note that this also moves type tests before the standard tree-sitter tests. This is probably a change we won’t want to make in other packages, at least without some additional consideration. It makes sense here because there’s no downstream testing that relies on evaluating TypeScript source modules. In other packages, type tests failing early would prevent further test tasks from running (which would be great in CI, but probably quite annoying in dev).
eyelidlessness added a commit that referenced this pull request Jul 15, 2024
These early tests were intended to demonstrate using the parser, in a way that relates to concepts we’d build later in the project exploration. Specifically, they demonstrated querying aspects of the parsed syntax tree to identify path sub-expressions.

They have probably not had much value for quite a while, as we’ve had real logic doing this quite differently. This is especially obvious now, as of #166, where we have substantially matured that approach.

This leaves standard tree-sitter testing intact, which is more than sufficient for validating the parser grammar’s **syntax tree output**. All other XPath-related logic is tested much more thoroughly downstream (including quite a bit exercising the grammar, both in `xpath` and `xforms-engine` packages).

Small note that this also moves type tests before the standard tree-sitter tests. This is probably a change we won’t want to make in other packages, at least without some additional consideration. It makes sense here because there’s no downstream testing that relies on evaluating TypeScript source modules. In other packages, type tests failing early would prevent further test tasks from running (which would be great in CI, but probably quite annoying in dev).
@eyelidlessness eyelidlessness force-pushed the fix/dep-resolution-current-relative-body-refs branch from 539a5a8 to 0161928 Compare July 16, 2024 20:24
Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this quite overwhelming at first but started with the unit tests and those really helped me get oriented. After understanding the tests, I went back to the code and it felt much more comfortable. I read through it relatively quickly mostly to see whether it made me think of untested cases. The general approaches seem sound to me.

A few small questions inline.

this is how little changed to make these changes available to downstream instance state logic.

Very cool!

current() (as the initial path segment of a given body reference) is treated as equivalent to ./self

It still feels largely accidental that this is supported at all in JR but the interpretation is consistent.

* Treats the literal sub-expression `null` as if it were a keyword. This is a
* deviation from XPath (where `null` would be a RelativeLocationPath
* referencing a child node by that name). However, in real-world usage, some
* forms evidently use `null` as a keyword representing a null and/or blank
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example form? Wouldn't they use the 'null' string value in that case? Do Enketo and Collect support something like this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad you asked about this! It gave me an opportunity to really think about the underlying reason it existed in the first place, and whether it makes sense to carry over into this change. In short: I don't think this is necessary anymore! I'll double check a couple things locally to be absolutely sure, but I expect to be able to remove this.

Since you asked for an example form, the cases I could find were in whova. See here and here (last argument to the inner if() in both).


I do have to ask whether a 'null' string value is expected to be a special case. This would be very surprising to me!

Taking the whova usage as an example, the conditions that reference null as a NameTest will produce a blank value when they hit that condition. If those were a 'null' string Literal, in strict XPath semantics, they'll produce the that literal string value.

A quick search of the resources ported from JavaRosa doesn't reveal any usage of 'null' that I could find. A quick search in JavaRosa itself has a few hits for the string literal, but none that look like treating it as a special XPath value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here and here (last argument to the inner if() in both).

🤯 Looks like the intent is indeed to have a blank value and because it had the desired outcome for the reasons you described it stayed in.

I do have to ask whether a 'null' string value is expected to be a special case.

No, I don't think so!

I expect to be able to remove this.

🎉🎉🎉

*
* @todo XPath grammar technically also allows for `current()[some-predicate]`,
* and our `tree-sitter-xpath` grammar/parser also allow for this. But
* `@getodk/xpath` types do not currently acknowledge this possibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible in a form? The expression would need to appear in a place where it's bound to a repeat because only repeats can be filtered in this way. It would have to be in a relevance or read-only expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's possible in a form, it's possible anywhere. It's part of the XPath grammar: FilterExpr. The grammar itself is overly permissive—it allows Predicates on things you can't predicate, such as numbers and string literals—but the inferrable intent is specifically to allow Predicates on node-set returning FunctionCalls and on parenthesized Exprs (implicitly those producing node-sets as well).

I'm curious what you mean by:

The expression would need to appear in a place where it's bound to a repeat because only repeats can be filtered in this way. It would have to be in a relevance or read-only expression.

Are you saying that we should add special conditions to enforce this, restricting this aspect of XPath grammar to these specific areas of XForms functionality? Or just that expressions like that would only be produced (e.g. by XLSForm/pyxform) in those cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean that it would only be actually useful/desirable to filter a nodeset with more than one node, I think. Basically the todo feels relatively low priority unless there's a use case I'm missing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's mostly acknowledging that this is a gap in @getodk/xpath, both at the type level for SyntaxNode variants and (consequently) at the implementation of that aspect of the spec. Agree it's low priority! I don't want to lose the context completely, as it's one of very few areas xpath deviates from the spec.

* - `instance("id")/...` (where `...` represents additional steps)
* - `instance("id")//...` (^)
*
* @todo XPath grammar technically also allows for `current()[some-predicate]`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instance("id")[some-predicate] I think. Instances are (internally represented as) XML and XML must have a root node so I'm not sure this is possible.

Copy link
Member Author

@eyelidlessness eyelidlessness Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Missed in copypasta. I think it'd be better both this and the other case) to reference the grammar names for the syntax, rather than the specific function names. Which may also help clarify some of the questions raised in that other thread.

The point in both cases is that they're not accounting for predicates which may be present, because the current SyntaxNode types don't allow for it (and correcting the types would implicate fixing the xpath runtime to handle the case).


const serializedPaths = Array.from(
new Set(
resolvedPathLists.map((resovledPathList) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: resovledPathList (but it's used consistently)


contextNodeset: null,
expression: '//. | //./foo | //.. | //../foo',
expected: ['//.', '//foo', '//..'],
Copy link
Member

@lognaturel lognaturel Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't the last one //../foo? Does that expression even make any sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's resolved to //foo. The result is set-like, so that result corresponds to both //./foo (second sub-expression) and //../foo (fourth/last sub-expression) in the UnionExpr. (This surprised me for a second too, when I wrote the test case.)

But now that I think about it a bit more, I suppose there's a slight chance I'm wrong about //foo, //./foo, //../foo all being equivalent. I think the self step case has to be, but there's a bit of nuance in the parent step case: it means "foo which has one or more child elements"1. I don't know how much difference it'd make for this change now, as neither would actually resolve anything with our current downstream usage. But I'm happy to reconsider the handling of this case now, or add a comment about this nuance for future reference in case it becomes an issue when downstream can surface it.

Footnotes

  1. This kind of thing is why I find XPath expressivity both impressive and frustrating. It's kind of like regex that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, yes, I understand. How about making it //../bar so it's a distinct case? I didn't notice there were only three items in the expected set. I got tripped up by that in another place as well.

"foo which has one or more child elements"

Parent, right? But yeah, I don't think it needs to be addressed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, yes, I understand. How about making it //../bar so it's a distinct case? I didn't notice there were only three items in the expected set. I got tripped up by that in another place as well.

Despite both of us being tripped up on the same thing, I think it's valuable to have tests specifically around the set-like behavior. My instinct is:

  • where there are cases that exhibit the set-like behavior, break them up so there is a clearer connection between each individual sub-expression and the resolved path

  • add one or more dedicated cases specifically around the set-like behavior, where there are only multiple sub-expressions which would resolve to a single path

Do you think that would be clearer for your understanding too?

Parent, right? But yeah, I don't think it needs to be addressed.

Neither actually! I started to walk through the steps to explain my reasoning, but that helped me realize it is equivalent to //foo (probably retreading similar reasoning when I originally arrived at the same result):

//../foo is equivalent to:
/descendant-or-self::node()/../foo is equivalent to:

  1. All nodes in the document except attributes and namespaces (which are not children/descendants):

    • /
    • every element
    • every text node
    • every comment node
  2. The subset of 1 which are parents of any other subset of 1:

    • /
    • every element with any child node
  3. The children of 2 which pass the foo NameTest

That's equivalent to //foo, because foo can't be a child of anything filtered out in step 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that would be clearer for your understanding too?

Yes!

That's equivalent to //foo, because foo can't be a child of anything filtered out in step 2.

Ah, ok.

// parentheses can have surprising effects on a path expression's semantics!
// (For instance: see note on predicates and ordering in
// https://www.w3.org/TR/1999/REC-xpath-19991116/#node-sets).
it.fails('resolves parenthesized portions of sub-expressions', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels low priority but good to have a test for!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably something we should get some data on. I don't know how much parenthesized Exprs are used in real world.

I do know that I'd want more coverage of it in xpath as well, particularly around the note linked in this comment.


{
description:
'we do not currently check prefixes for function call analysis, typical usage has common namespace mapping',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems ok.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... it's not my favorite thing, but I really wouldn't want to connect up form-derived evaluation context to static analysis unless/until we have a compelling reason, like a bug in the wild.

@eyelidlessness
Copy link
Member Author

It still feels largely accidental that this is supported at all in JR but the interpretation is consistent.

I don't know if it's accidental? It's not testable with the standard browser evaluator: the current() function is only implemented in browsers for XSL, not for general purpose XPath usage. But it's what I would expect in XSL as well. Happy to confirm that if you think it'd be valuable.


Anyway, I think I've addressed everything else, a couple followup commits and the rest in threads. Thanks for a good thorough look! I'll keep an eye out for anything further.

@lognaturel
Copy link
Member

Happy to confirm that if you think it'd be valuable.

No, I think it's very reasonable!

Maybe consider the naming changes to the two tests that resulted in overlap in expected results if you see why I found those confusing. Certainly doesn't feel blocking to me.

@sadiqkhoja will likely want to take a look at some point but I will give approval in case we may want to merge before then.

@eyelidlessness eyelidlessness force-pushed the fix/dep-resolution-current-relative-body-refs branch from 035c934 to 0128020 Compare July 19, 2024 21:06
eyelidlessness and others added 17 commits July 22, 2024 12:11
Followup from [this discussion](#150 (comment))

The idea here is:

1. Explicit repeat creation in tests will improve test clarity

2. Introduce a clear way to make similar changes in JavaRosa as they come up

3. Detect missing repeats (with a still naive approach[^1], albeit now recursive) and **log with a stack trace** so explicit calls can be introduced (conditionally, with parameterization like many other cases where we make adjustments to the JavaRosa direct port)

4. Add a new proposed `Scenario` method which…

  - Makes clear where explicit repeat creation calls are added, in a way that can be traced directly in test source, whenever convenient

  - Assumes the call occurs in such a sub-suite parameterizing whether to explicitly add repeats as detected; adds repeats as explicitly specified in the true condition, suppresses logging in the false condition

This approach already detected one test which would have passed if adding repeats had been explicit. The test is updated here to demonstrate that.

Notice that the test’s **PORTING NOTES** have also been removed. This is because the notes were wrong! This is an excellent example of how misleading it is that tests fail for lack of this implicit behavior! The actual test logic is not substantially noisier or more complex as a result. This feels like a clear win to me.

[^1]: Keeping this naive seems fine for the limited scope of usage. The reference expressions which reach this point are limited to `Scenario.answer` calls with an explicit reference. If we’re using references of arbitrary complexity in those calls, I think we’ve got much bigger problems than this functionality being so narrowly scoped.
… resolution, dependency identification

Function call analysis is generalized and streamlined, and used to help various other aspects of path-related static analysis.

Path resolution:

- Accounts for arbitrary self/parent steps (of any number, at any depth in a path)
- Resolves relative paths to arbitray Absolute and FilterExpr contexts
- Resolves current() calls to either:
  - The step in a [partial] path expression where called in a predicate on that step
  - The context (typically parent) of an expression’s definition

Dependency identification:

- Benefits from all of the above path resolution improvements
- Accurately contextualizes dependencies referencing current() (which will improve reactivity for a variety of form logic cases)
- Accurately identifies and contextualizes arbitrary sub-expressions in predicates (which will improve reactivity for form logic cases with more complex logic)
…orm definition (typically body)

Note that this breaks up parsing of…

- `<item><label ref>`, where arbitrary references may be contextualized to their parent nodeset
- `<itemset><label ref>`, where a relative expression is intentionally kept unresolved (ensuring that evaluation of dynamic itemsets update correctly when the itemset nodeset expression is a form state reference (i.e. repeat-based itemsets)
…text

This ensures that:

- if relative, the nodeset expression itself is contextualized correctly
- in any case, that any dependencies identified in the dynamic item expression are resolved to the correct context
…ogic

This wil improve a variety of edge cases, and will be most visible in the correct tracking of:

- `current()` references in expressions
- any references found in expressions’ predicates
Test is fixed by stripping predicates in dependency analysis
This **will change** when we implement support for `jr:count` and `jr:noAddRemove`. The intent of implementing it now is to ensure, in subsequent commits, that adding explicit repeat creation does not cause false positives in new test passage
…peats will not be allowed

Technically this is a bit of a fib! The engine/client API will still permit repeat creation calls. But its passing now will ensure we continue to keep it passing when we implement the pertinent functionality
As discussed in review, it’s highly unlikely this is actually needed anymore!

Brief backstory context in case we ever have reason to doubt this:

0. The WHO VA form, which I’ve used as a frame of reference for many things, uses two references to `null` that don’t resolve to anything. The inferred intent is to treat `null` as a keyword, just as it would be in languages with the Billion Dollar Mistake.
1. In early prototyping (pre Web Forms project), the WHO VA form’s references to `null` produced errors where there was assertion logic checking that expected dependencies were found.
2. Early Web Forms work produced the same, for a time at least producing errors in the console. This produced a lot of distracting noise that made it harder to identify actual implementation errors in dependency resolution.
3. Subsequent improvements to dependency resolution have been successful enough that we’ve even eschewed logging when a dependency lookup doesn’t resolve (though I sometimes add it in local dev for testing/validation), at least as of #67.
4. As of #135, dependency resolution was expanded to be form-wide and match all possible nodes; there’s no case where a `null` reference **should** match a node and won’t.

I can imagine these potentially useful followup changes:

1. On parse, identify any path sub-expressions which **will never** resolve to any node in the form.
2. Short circuit lookup of such never-resolvable paths. There’s no sense walking the full form tree for something we know we won’t find!
3. Potentially warn when such never-resolvable paths are found. This wouldn’t be particularly useful for the `null` NameTest specifically (where the intent at least in WHO VA is clearly a null reference), but it could be useful for catching typos and other mistaken references like hierarchy confusion.
- Cases which previously exercised set-like behvaior implicitly are broken into individual tests, e.g. demonstrating cases where `current()` and `.` are treated the same way
- Adds an explicit set-like behavior test
This corrects the previous copypasta mistake, and makes the notes more general so they don’t overly influence reasoning about the gap in behavior
@eyelidlessness eyelidlessness force-pushed the fix/dep-resolution-current-relative-body-refs branch from 0128020 to 1d4de28 Compare July 22, 2024 19:11
@sadiqkhoja
Copy link
Contributor

My questions/confusions/observations after going through unit tests:

1- Why following two are different. If contextNodeset is relative, shouldn’t current() to resolve as relative nodeset:

const contextNodeset = 'whatever';
const expression = '/data[current() = 1]/bar';
const expected = ['/data/bar', 'whatever']; // shouldn't this be '/data/whatever'??
//vs
const contextNodeset = 'whatever';
const expression = '/data[whatever = 1]/bar';
const expected = ['/data/bar', '/data/whatever'];

2- I was assuming arguments to the functions would be added as dependencies and this would return ‘/bat’ as well?

description: 'Arbitrary FilterExpr with either a trailing Step or a Predicate (without any specialized handling for the FunctionName in use)',
contextNodeset: null,
expression: 'foo("bar", /bat)/quux and zig()[zag]',
expected: ['foo("bar", /bat)/quux', 'zig()', 'zig()/zag']

3- Parent’s sibling works when contextnode is absolute but not when it is relative:

const contextNodeset = 'foo';
const expression = '../bar';
const expected = ['bar']; // fail
//vs
const contextNodeset = '/foo';
const expression = '../bar';
const expected = ['/bar']; // pass

4- How are we handling text()? I am thinking parent of the text node (i.e. the element node) should be added to the dependencies

const contextNodeset = '/data/foo';
const expression = 'child::text()';
const expected = ['/data/foo'];

5- shouldn’t this be an error - text node can’t have children, right?

const contextNodeset = '/data/child::text()';
const expression = 'child::node()';
const expected = ['/data/child::text()/child::node()'];

6- I think isTranslationExpression should return true for the following expression:

/foo[jr:itext("id") = 1]

7- would be nice to have one or two unit tests for ignoreReferenceToContextPath

Two of the tests are currently marked failing:

2. This question caught a bug! I’ll fix in a subsequent commit.
4. This question suggested a valuable enhancement! I’ll timebox a solution to this, and expect to also have a fix for it in a subsequent commit.
This further clarifies that the intent is to identify where an expression **is** a translation call, not where an expression may contain such a call as any arbitrary sub-expression. As mentioned in the added JSDoc todo, it is possible we may want this in some future change. I think we should know more about whether it’s an expected use case before implementing it (as if it’s not an expected use case, it could be an indication of a form design mistake).
… and add several related tests around the general concept.
Copy link
Contributor

@sadiqkhoja sadiqkhoja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this is an excellent work 🎉

I have added few small suggestions inline, otherwise this is great.

packages/xforms-engine/src/parse/xpath/syntax-traversal.ts Outdated Show resolved Hide resolved
packages/xforms-engine/src/parse/xpath/syntax-traversal.ts Outdated Show resolved Hide resolved
if (expected == null) {
expect(result).toBeNull();
} else {
expect(result).toMatchObject(expected);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this is clever to use toMatchObject and not define the complete expected object.

packages/xforms-engine/src/parse/xpath/syntax-traversal.ts Outdated Show resolved Hide resolved
readonly expected: boolean;
}

it.each<IsTranslationExpressionCase>([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting following case to pass:

{
  description: 'argument is a string',
  expression: 'jr:itext(string(1))',
  expected: true,
},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first thought looking at this comment was "well yeah, and it should also accept anything else returning or producing a string". And then it occurred to me that... that's everything. Even the test cases where we currently reject some argument types (e.g. Number), because the xpath evaluator will cast them to a string. So the more I think about this, the more I think we shouldn't bother checking anything about the Argument's expression at all, except that it's there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, I was not sure if we expect non-core functions to be doing explicit casting of the arguments.

@eyelidlessness eyelidlessness force-pushed the fix/dep-resolution-current-relative-body-refs branch from e39b018 to b07023e Compare July 25, 2024 17:28
@sadiqkhoja sadiqkhoja merged commit db81472 into main Jul 25, 2024
86 checks passed
@sadiqkhoja sadiqkhoja deleted the fix/dep-resolution-current-relative-body-refs branch July 25, 2024 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants