Explainer: WebDriver Extension for Accessible Nodes, etc. (potential solution for #197) #203
Comments
Thanks for the detail here, @cookiecrook. This is a really solid start.
Would this return an axId or an accessible node snapshot?
Can you explain the desire to return multiple properties at once here? This is very different to elements, for example, where WebDriver only returns one thing at a time. For example, to get an element attribute, you use /session/{session id}/element/{element id}/attribute/{name}, which only gets a single attribute. Obviously, returning multiple things in one call is better for performance. However, it does add some complexity in the spec; e.g. we have to work out what set of things to return as per your discussion section. If we return a single thing at a time, we can avoid some of that complexity. Having different methods for every single thing would be tedious for extensibility. But perhaps we could have a simple attribute getter with a defined set of attribute keys we can expand over time? For example, /session/{session id}/accessibility/node/{axId}/attribute/{name}, where {name} could be "label", "role", "pressed", etc.
Further down the line, some thought needs to be given to how axId is specified. Some engines have simple globally unique 32 bit numeric ids for accessible nodes. I think Chromium does? I'm not sure about WebKit. However, Gecko does not, instead having a 64 bit unique id which is only guaranteed to be unique within the document, not across documents. So, some care needs to be taken in terms of what assumptions are made. I see WebDriver specifies that a node id is created as "a new globally unique string" and it also specifies that there is a "node id map". We might need to do something similar for axId. Or perhaps we can just specify that the id is globally unique but opaque and implementation defined? I'm not sure if that's reasonable.
Bikeshedding: Maybe this is just me, but I don't love the name event or notification for things we perform on an accessible node. I tend to think of events and notifications as things that a node fires. I would have suggested "action", but that gets conflated with default or custom actions. Maybe "interaction"? |
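To make the single-attribute getter idea concrete, here is a minimal Python sketch that builds the draft path /session/{session id}/accessibility/node/{axId}/attribute/{name}. The function name attribute_path and the KNOWN_ATTRIBUTES set are assumptions for illustration; the thread only names "label", "role" and "pressed" as candidate keys, and nothing here is specified anywhere.

```python
from urllib.parse import quote

# Hypothetical key set: the thread only names "label", "role" and "pressed".
KNOWN_ATTRIBUTES = frozenset({"label", "role", "pressed"})


def attribute_path(session_id: str, ax_id: str, name: str) -> str:
    """Build the draft endpoint path for reading one accessible node attribute."""
    if name not in KNOWN_ATTRIBUTES:
        raise ValueError(f"unknown accessibility attribute: {name!r}")
    # Ids are treated as opaque strings, so percent-encode them for the path.
    return (
        f"/session/{quote(session_id, safe='')}"
        f"/accessibility/node/{quote(ax_id, safe='')}"
        f"/attribute/{name}"
    )
```

Growing the supported set over time would then mean adding keys to one list rather than adding a new endpoint per property.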
@jcsteh wrote:
That's an open question. Initially I thought either "ax node from element" or "ax node from id" should return the same snapshot object, but I don't have a strong preference for or against making the additional call...
Mainly to avoid tedium and perf hits... In the tree walker use case, for example, making each attribute/property separate calls could turn one call per element into dozens or hundreds per. But I acknowledge it could work either way.
The Gecko GUID question seems worthy of researching sooner rather than later. Obviously the spec should be limited to features anticipated to be readily implementable in all engines. I agree with all your other points, and I acknowledge those are open questions too. |
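For the tree walker use case, a rough Python sketch of what the snapshot design buys: one call per accessible node. The get_node callable stands in for a WebDriver round trip, and FAKE_TREE, fake_get_node and the "role"/"children" keys are hypothetical stand-ins, not a real API.

```python
from typing import Any, Callable, Dict, List


def walk(root_id: str, get_node: Callable[[str], Dict[str, Any]]) -> List[str]:
    """Depth-first walk that issues exactly one get_node call per node."""
    roles: List[str] = []
    stack = [root_id]
    while stack:
        node = get_node(stack.pop())
        roles.append(node["role"])
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.get("children", [])))
    return roles


# Hypothetical in-memory tree standing in for the browser.
FAKE_TREE = {
    "ax1": {"role": "list", "children": ["ax2", "ax3"]},
    "ax2": {"role": "listitem", "children": []},
    "ax3": {"role": "listitem", "children": []},
}
calls: List[str] = []


def fake_get_node(ax_id: str) -> Dict[str, Any]:
    calls.append(ax_id)  # record each simulated round trip
    return FAKE_TREE[ax_id]


roles = walk("ax1", fake_get_node)
```

With per-attribute getters, each node visited here would instead cost one call per property read (role, children, and so on), which is where the "dozens or hundreds" concern comes from.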
It probably doesn't matter that much. If these calls return a snapshot, the snapshot should include the axId. The answer to this question will depend heavily on whether we go with snapshots or individual getters.
That's certainly true. This seems to be something that was considered acceptable for DOM elements in WebDriver and it'd be nice to have a similar interface for simplicity/consistency. On the flip side, we don't need to use WebDriver to walk the DOM tree, whereas we have no choice for the accessibility tree, so I realise the use case is quite different. |
If an opaque, implementation defined, globally unique id string is acceptable, I think this should be implementable in all engines. That said, when I first raised this, I didn't realise that a WebDriver session had a "current browsing context". As I understand it, a browsing context is associated with a document. If the accessibility methods use this browsing context, that means we only need to look at the document associated with the current browsing context, not all documents everywhere. That does make this a lot more feasible. I guess we probably still want the id string to be globally unique though, even across browsing contexts? |
Some scattered (sorry, it's that kind of day) thoughts: It might be helpful to guide the discussion if we could document some of the types of things we'd like to be able to test in WPTs using these APIs. Specifically, I think questions like those @jcsteh is asking around returning a property bag vs. returning discrete properties (as is done for Element properties), and questions around including/excluding ignored nodes when tree walking, might be easier to answer with a solid understanding of what we're going to do with the output.
Some of this makes me wonder whether we'd want to require accessibility to be "enabled" before an accessiblenode can be retrieved, so that we can ensure that the accessibility IDs are consistent (AFAICT currently Chrome at least implements computedname/computedrole on top of the CDP). I guess all the property names will be based on the ARIA names, as the best platform-independent vocabulary we have available? |
Possibly need a way to register for outward notifications too… e.g. When a live region changes. |
Live region changes in particular might be tricky to standardise. Each API does them differently, which I suspect means core browser implementations vary wildly. Notably, IAccessible2 and ATK don't have specific live region events, but instead rely on generalised text inserted/removed events and the client checking live region properties on the object. This is not to suggest that registering for outgoing events isn't something we need. It very probably is. However, I think it might take longer to iron out the details there and it might not make sense to block this work on standardising live region events. Is there some other outward notification we can start with to get the core concept working? Focus or selection perhaps? |
Spoke with @OrKoN today who mentioned a related use case for accessibility in webdriver... possibly w3c/webdriver-bidi#443 |
TPAC-related updates summarized in #197 (comment) |
Most relevant from above linked notes:
So for the sake of near-term interop, the minimum viable product could focus on those that could ship near-term in all three engines:
But not an outgoing notification snarfer, for example. [Update Nov 7: As an example, this likely means that outgoing ARIA Live Region notifications would not be testable in WebDriver Classic.] |
Actually we could even remove (3. Trigger/Synthesize Accessibility Event/Notification) from the MVP, but it seems achievable and useful, so I'm keeping it in the short list for now. |
Potential error codes:
|
I've started prototyping this in Gecko. This has raised some questions with regard to the shape of the proposed API.
Abstraction
Currently, session/{session_id}/element/{elId}/accessiblenode returns a map of properties, including the id of the accessible node and ids of parent, children, etc. You can then interrogate that accessible node (or other accessible nodes) using session/{session_id}/accessibility/node/{axId}. This works, but I'm realising it doesn't fit so well with other WebDriver abstractions like web element, web frame, web window, shadow root, etc. Do we need such an abstraction for accessibility? It does introduce complexity, but perhaps it's important enough to justify that. I guess this potentially makes things easier for clients using libraries, since they probably get an object representing the AccessibleNode when the return value is de-serialised and can call methods directly on that, rather than having to deal with the ids directly.
If we do this, we run into a problem with session/{session_id}/element/{elId}/accessiblenode. I think the abstraction would require us to return just the accessible node reference object (like we do for elements, shadow roots, etc.), but we ideally want to return the properties for that node, rather than having to make an additional call to get those. How could we work around that? I guess we could have a property "accessiblenode" or similar which provides the node reference object, if that's acceptable. That does feel a bit weird though. The abstraction would also modify the returned data for parent, children, etc. so that they too would be accessible node reference objects rather than just ids.
Other thoughts
Edit: Corrected some terminology and added some additional stuff about the abstraction. |
In my prototype, I've implemented this using a UUID and a map, similar to how it's implemented for DOM nodes. So we should be fine here as long as the id can be a string. |
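A minimal sketch of the "UUID and a map" approach, assuming the engine's internal id is only unique within a document. The class AccessibleNodeIdMap and its method names are hypothetical; this is not the prototype's code, just one way the idea could look in Python.

```python
import uuid
from typing import Dict, Hashable, Tuple

NodeKey = Tuple[Hashable, int]


class AccessibleNodeIdMap:
    """Maps engine-internal node keys to opaque, unique id strings."""

    def __init__(self) -> None:
        self._id_by_key: Dict[NodeKey, str] = {}
        self._key_by_id: Dict[str, NodeKey] = {}

    def id_for(self, document_id: Hashable, local_id: int) -> str:
        # Pairing the document with the per-document id keeps keys distinct
        # even when the engine's ids are only unique within one document.
        key = (document_id, local_id)
        if key not in self._id_by_key:
            ax_id = str(uuid.uuid4())
            self._id_by_key[key] = ax_id
            self._key_by_id[ax_id] = key
        return self._id_by_key[key]

    def resolve(self, ax_id: str) -> NodeKey:
        # Raises KeyError for unknown ids and for nodes that were forgotten.
        return self._key_by_id[ax_id]

    def forget(self, ax_id: str) -> None:
        # Called when a node leaves the accessibility tree.
        key = self._key_by_id.pop(ax_id)
        del self._id_by_key[key]
```

Because the client only ever sees the UUID string, the width or scope of the engine's own ids never leaks into the protocol.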
If I'm understanding you correctly, I agree that the JSON property bag for the current accessible node should be returned with either call... along with the ID of the accessible node, the related element ID (if it exists), and the related accessibility nodes (parent, children, etc. if they exist)...
If I understand your specific question correctly, the frame element or shadow host could return a placeholder object that contains one accessibility child ID... Walking down into this descendant chain with subsequent WebDriver callbacks would eventually lead you to the contents of the frame or web component. While slightly more tedious to author, this path seems more resilient to change and implementation details. Of note, these are likely to expose some implementation detail differences that IMO we don't need to solve in the initial release. We'll have similar differences on scrollable divs for example, and possibly generated content, or other CSSOM constructs. We have agreed that tests exposing those differences should not land in WPT proper (except as tentative explorations)… |
PS. I'm excited you're making progress! |
@nmlapre it's a bad time for Jamie, but if you two could sync today, we could discuss it in tomorrow's WPT call. |
That's not quite what I mean. The current proposal just exposes the id of the accessible node as a string. That means that all clients (even using object oriented libraries) have to take the id and pass it as an argument to any other accessibility calls. In contrast, web elements, web frames, etc. have an id, but they use a specific format to allow de-serialisation as an object by object oriented libraries. For example, with the Python webdriver package, you have a WebElement object and you call methods on that object:
Note that the details of passing the id argument are encapsulated by the WebElement class. The protocol provides a mechanism for both arguments and return values to serialise and de-serialise such objects transparently. If I understand correctly, the protocol handles this by using a JSON object which looks something like this:
Whenever the de-serialiser sees this, it creates a WebElement object. Similarly, whenever a server wants to return a WebElement, it serialises it into such a JSON object. Getting back to accessibility, if we were using the Python webdriver package, calling:
would return a dict of properties. One of the keys in that dict would be "parent". Instead of
In terms of the protocol, accessible node properties would need to look something like this:
This obviously adds a lot of complexity in the client implementations. The question is whether that is required and/or worthwhile. On the other hand, if we don't do it, I'm not quite sure how the client libraries will handle this because there's no logical place to put the /accessibility/node methods. They obviously can't go on WebElement. |
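A rough Python sketch of the reference-object idea discussed above. The web element key is the identifier defined by the WebDriver specification; the accessible node key, the two dataclasses and the deserialise function are hypothetical names invented for illustration and are not part of any client library or of the prototype.

```python
from dataclasses import dataclass
from typing import Any

# Web element identifier defined by the WebDriver specification.
WEB_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"
# Hypothetical key for illustration only; no such identifier is specified.
ACCESSIBLE_NODE_KEY = "accessible-node-hypothetical"


@dataclass(frozen=True)
class WebElementRef:
    id: str


@dataclass(frozen=True)
class AccessibleNodeRef:
    id: str


def deserialise(value: Any) -> Any:
    """Recursively replace reference objects with typed Python objects."""
    if isinstance(value, list):
        return [deserialise(item) for item in value]
    if isinstance(value, dict):
        if WEB_ELEMENT_KEY in value:
            return WebElementRef(value[WEB_ELEMENT_KEY])
        if ACCESSIBLE_NODE_KEY in value:
            return AccessibleNodeRef(value[ACCESSIBLE_NODE_KEY])
        return {key: deserialise(item) for key, item in value.items()}
    return value
```

With something like this in the client, "parent" and "children" in a returned property dict would arrive as AccessibleNodeRef objects rather than bare id strings, which is the extra client-side complexity being weighed here.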
For anyone that's interested, I've posted my WIP code on this Mozilla bug. In particular, this test might be of interest. While very basic, it proves that simple states such as checked and pressed can be tested (I have this passing locally). |
@jgraham noted on Matrix that we'd ideally design closer to what we would for BiDi and that we'd want something closer to just an id rather than the web element type abstraction. However, I still don't understand how this would fit into the current object oriented client libraries and how we'd maintain compatibility with the existing implementation of elements, etc. In particular, a client isn't going to know to create an AccessibleNode object without the web element type abstraction unless each method has special code which massages the result first. |
I now have a proof of concept test for accessibility tree traversal which is passing locally. The comments marked XXX in this patch might help explain the questions I have around abstraction, as the implementation is a bit ugly without those addressed. |
While we could expose this, it's not going to be useful in WPT. WPT doesn't have a way to de-serialise (or even serialise really) a DOM node. When you pass an element to a WPT test_driver method, it builds a CSS selector and passes it to wptrunner. wptrunner then asks WPT to find a matching element and then executes the method on that element. There are no WPT methods that return elements because there isn't a way to resolve the element from its WebDriver UUID. We might be able to add a WebDriver method to build a selector for an element or something, but I'm not sure it's worth the squeeze. Instead of this, we could:
|
Thanks for the clarification. I understand now, and will tag in @gsnedders for their advice on resolving this. One potential easier solution (while bidi is not universally supported) is to postpone including the element references in the return. We can reap the benefits of element-based accessibility backing node testing in the meantime, without prematurely committing ourselves to any particular solution on in-document or cross-document object references. So in the short term a single getter |
Interestingly, if the client libraries do de-serialise accessibleNode references to objects, this actually causes a problem for wptrunner because it needs to pass the accessibleNode references back to the WPT test running in JS. The only way it can do this is to re-serialise them as ids, so now we've just pointlessly de-serialised to Python objects only to serialise them back to ids. So, code/design cleanliness and consistency aside, in a lot of ways, it'd be much simpler to avoid the abstraction. The abstraction is still probably nicer for non-WPT consumers of WebDriver, though. |
I've just extended this framework further with a utility function to allow testing of a full tree using a simple JS object. For example:
|
Do we want to support relations here? Is this something that is mapped in the internal, cross-platform tree in all engines (as opposed to only being used to compute labels, etc.)? I know Gecko and Chromium both map relations in the cross-platform tree, but I'm not sure about WebKit. For example:
What about reverse relations; e.g. labelFor? |
@jcsteh Are these commands available only via WPT or they can also be used via GeckoDriver HTTP calls? I'm curious if using this API via Selenium directly is possible. Sorry for derailing the discussion in a different direction. |
This is just a prototype for now, so they're only available via WPT or Gecko's own marionette protocol (and only with my patches applied; they're not landed). However, should these become part of the WebDriver spec (which I believe is the proposed plan), they would need to be implemented in GeckoDriver as well. |
@jcsteh Thank you for explaining. Is there any Mozilla ticket I could follow to know when the patches land in the main tree? I assume that would be part of the Firefox source code, though I am not sure where the boundaries are between Marionette, Gecko and GeckoDriver. |
I'd start with following Mozilla bug 1929144. Even if I don't land all the patches there, I'll comment or link other bugs as needed. That said, this is still some way off being finalised - we don't have consensus on some important open questions (see above) - so I wouldn't expect this to land imminently. |
Chromium recently deleted all of its AccessibleNode code. It was almost 9000 lines, and hadn't been used in 10 years. |
WPT Roadmap is here: Abandoned
Active
The active projects are parallel paths that test different parts of the stack... |
Rather than muddy the problem issue #197 with a specific proposed solution, I'm posting this as a standalone issue. Ideally we could turn this Issue into an Explainer and eventually a Spec, but the goal is to get wider approval of the idea first, during a few meetings at TPAC 2023 Sept 11–15 in Spain.
Note: I will be editing this problem description, so expect changes.
Note on WebDriver-BiDi
This explainer does not use BiDi examples, but we don't anticipate problems converting to the other format and welcome accessibility additions to Classic and/or BiDi. It's been suggested that this be added to the BiDi roadmap.
Current State of Cross-Browser Web Accessibility Testing
Existing WebDriver accessibility testing methods go through DOM Element, to AX Element, then to its label or role.
get a string value from the backing accessibility object (if it exists) of a given DOM element
In 2023, we added over 1000 automated accessibility tests to the WPT Interop 2023 Accessibility Investigation using the above two WebDriver methods, but there is so much more to test, and no way available to test it in WPT/WebDriver.
Potential Changes
See also: #197
…a new WebDriver accessibility extension might look something like this:
1. Way to access the backing "accessible node" of a DOM element (if one exists).
Note
Only one of the following two accessors 👇 is needed, not both
get accessible node from its mainstream DOM element (if one exists)
EITHER a new method in a new accessibility-specific webdriver extension.
OR a new method on the existing webdriver element interface.
Note
Only one of the preceding two accessors 👆 is needed, not both. Currently prototyping option 2.
2. Way to access an "accessible node" by its WebDriver ID directly (e.g. you may receive this ID from a parent/child cross-reference).
Regardless of whether an accessible node is associated with a DOM element (some are not), once you already have the accessible node id:
get accessible node by its WebDriver ID
3. Way to Trigger an Accessibility Event/Notification.
We also need a way to trigger a notification on the accessibility object.
Note
synthesizeevent is just a draft name. Very open to change on every aspect of this.
Common Events
Click/Press
where the minimum payload is the notification type (e.g. a screen reader “click” would fire):
Explanation: “AX Press” almost always results in a DOM “click” but the event object on a “press on AX object” event can end up very different from a “click on DOM element.” For example:
AT Focus (pulls keyboard focus if the element is focusable)
AT Focus should be verifiable, b/c it will pull standard keyboard focus along with it, if the AT-focused element is keyboard focusable.
Other Events/Notifications
Trigger “Action” (lower priority for v1/MVP)
It could also be used for non-default “actions” (e.g. trigger the associated “reply” action):
This one 👆 has native precedent, but the proposed Web API hasn’t yet shipped, so it may be lower priority.
Scroll into view a.k.a. “scroll to visible” (lower priority for v1/MVP)
This might not be needed as it’s usually called downstream from focus, rather than directly from AT.
Show Menu (lower priority for v1/MVP)
Show menu (VO and other AT’s equivalent to show the “right-click” menu). This sometimes results in a different AT-vs-mainstream behavior when a web site has overridden the “right-click” mouse behavior.
I don’t know if “showMenu” would be interoperable on other systems, but it’s in WebKit because Mac VO and other AT support it. I assume Windows has something similar.
4. Test-Only (WebDriver-only for now?) Interface for accessible node.
Return value for the accessibleNode would be a static snapshot of the element at the time of the request:
Example return object for accessible node getter.
Discussion Points
Getter Interface for ~“accessibleNode”
There’s a balance between returning “a limited scope of known things to query” and “over-exposing” as much as possible about the backing accessible object… Some relationships or properties are costly or slow to return, so we’ll probably need to start with a subset of the things that all implementations can return reasonably quickly.
Perhaps multiple getters: a default set of the easy ones (role, label, required, checked, yadda, yadda) and then we don’t include the ones with a significant perf cost or other complications unless requested specifically.
/session/{sID}/accessibility/~ax_element/{axID} for defaults
/session/{sID}/accessibility/~ax_element/{axID}/~full for everything
/session/{sID}/accessibility/~ax_element/{axID}/~partial for a specific set, with an array of keys in the post payload
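A minimal sketch of how the three getters could relate, in Python. DEFAULT_KEYS, select_properties and the mode strings are assumptions for illustration; which properties count as "easy" is exactly the open question, and the four keys below are only the ones named in this section.

```python
from typing import Any, Dict, Iterable, Optional

# Hypothetical "easy" set, taken from the examples named in this section.
DEFAULT_KEYS = ("role", "label", "required", "checked")


def select_properties(
    node: Dict[str, Any],
    mode: str = "default",
    keys: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Pick which properties of an accessible node snapshot to return."""
    if mode == "default":
        wanted = DEFAULT_KEYS
    elif mode == "full":
        wanted = tuple(node)
    elif mode == "partial":
        if keys is None:
            raise ValueError("partial mode needs an array of keys")
        wanted = tuple(keys)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Keys the node does not have are skipped rather than reported as errors.
    return {key: node[key] for key in wanted if key in node}
```

Costly relationships such as children would then only be computed when a caller asks for them through the full or partial form.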
Note
Note that ~ above indicates TBD draft name proposals... Open to changes, of course.
Object/Node Persistence
API should be clear that Accessibility Objects/Nodes are not expected to persist once removed from the accessibility tree. Though this may be possible in some implementations, it is unlikely to be readily achievable in all implementations, so: