User gestures #142
Comments
Thanks for proposing this, @AshleyScirra!
Can you say more about this? Are there any cc @mustaqahmed who's done a lot of work on user gestures in Chrome. |
We have a game engine that supports running entirely in a web worker with OffscreenCanvas. In our architecture input events are forwarded to the worker via postMessage(), it runs the entire engine and all logic in the worker, and then it posts back for calls to any APIs not available in a worker (window.open, WebRTC, etc.). However, this means calls are never made synchronously in a user input event, and so in some browsers we lose the ability to use any user-gesture limited APIs at all. So we have to turn off using a web worker and stop using OffscreenCanvas, just so we can keep within the user gesture rules. I know OffscreenCanvas isn't supported outside of Chromium yet, but even when it is, this is a blocking issue for us to switch to using it. |
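The round trip described above can be sketched roughly as follows. This is a minimal illustration and not the engine's actual code: the message shapes (`InputMessage`, `ApiRequest`), the `MainThreadApis` interface, and the `runGameLogic` and `handleWorkerRequest` functions are all hypothetical names, and the browser calls are injected so the sketch can run outside a browser. In a real page, the main thread would post each input event to the worker and run `handleWorkerRequest` from the worker's `message` event, by which point the code is no longer synchronously inside the original input event.

```typescript
// Hypothetical message shapes; not taken from any real engine.
type InputMessage = { kind: "input"; type: string; x: number; y: number };
type ApiRequest =
  | { kind: "open-window"; url: string }
  | { kind: "request-fullscreen" };

// Main-thread-only APIs, injected so the sketch is testable without a DOM.
// The boolean results are a simplification of the real return values.
interface MainThreadApis {
  openWindow(url: string): boolean; // e.g. could wrap window.open()
  requestFullscreen(): boolean; // e.g. could wrap element.requestFullscreen()
}

// Worker side: only the game logic knows what an input event means.
// Hypothetical rule: a click in the top-left 100x100 area hits the
// in-game "fullscreen" button.
function runGameLogic(msg: InputMessage): ApiRequest | null {
  if (msg.type === "click" && msg.x < 100 && msg.y < 100) {
    return { kind: "request-fullscreen" };
  }
  return null;
}

// Main-thread side: runs when the worker posts a request back. This is
// asynchronous relative to the input event that triggered it, so the
// call only succeeds if the browser still considers the gesture valid.
function handleWorkerRequest(req: ApiRequest, apis: MainThreadApis): boolean {
  if (req.kind === "open-window") {
    return apis.openWindow(req.url);
  }
  return apis.requestFullscreen();
}
```

The main thread cannot tell in advance which input events will lead to an activation-gated call, which is why it cannot make the call synchronously itself.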
I see, so an example might be when there's a "fullscreen" button in the game's UI, and you need to actually run the game engine to know that it was clicked, and then do the |
To the best of my knowledge, most other browsers use the old HTML model that required calling activation-gated APIs synchronously in appropriate event handlers, or asynchronously after a (undefined) propagation. That undefined part was known to have fundamental problems, and every browser (including pre-M72 Chrome) added unpredictable workarounds to make it work. The primary motivation for the current HTML model was to support async usage of user activation in a well-defined manner that is agnostic to any propagation scheme in JS. |
Yes, exactly. It might seem a roundabout approach to making the call, but if you make middleware/game engines/libraries that run in a worker, you can end up with that situation; by the time you try to call I guess this is a variant of "difficulty delegating work to Web Workers", but I called it out separately as giving up on running code off main thread is a pretty significant consequence. |
User gestures is a P2 for Google's closure library "Allows more work to be done asynchronously, which unblocks migration to Promise." |
WebKit implements the "new" model in HTML, and has for a long time. About the problem @AshleyScirra outlines, understood. I guess it would be good to see some examples. WebKit, for instance, allows quite a bit of time before consuming the activation (this obviously shouldn't be relied on and can change without notice!). So, I guess I'm wondering why one would do:
Instead of:
|
Just so we have some more context, the APIs in WebKit that either consume user activation or check for transient activation are:
So the discussion needs to be framed around when the above are a problem (the list is not exhaustive, other engines may support more). |
@marcoscaceres - it doesn't look to me like Safari does support this. This WebKit bug is still open and some of the cases, such as reading a blob, still prevent a user gesture from working based on a quick test now in Safari 16.1. As for the requestFullscreen workaround, middleware like Construct's game engine, and probably also anything compiled to WebAssembly and running in a worker, doesn't know what the intent is with any given input event. It uses an architecture where it simply forwards all input events to the worker, runs logic, and then posts messages back to the DOM to run any APIs not available in a worker. We can't request fullscreen and then cancel it on every single input event just in case it does want to enter fullscreen - it would be unusable with constant flickering for the user. |
I still don't know what "this" is exactly. I assume it's "the model implemented by Chrome", or is it https://mustaqahmed.github.io/user-activation-v2/ ? WebKit supports what's in the HTML spec today, which sounds like it differs from the model implemented by Chrome? Is there some other proposal/PR to HTML that you are referring to that specifies that proposal? (Hi @mustaqahmed, do you know?)
Understood. It sounds like we need some kind of thing that requests more time or somehow links a set of operations started by a user gesture. |
Yes - that. As far as I understand it, Chrome's model follows what is now in the spec.
I'm really confused by this remark because it appears to me WebKit has never supported the latest spec on this and it still looks like it doesn't in Safari 16.1. I've never seen it referenced in Safari's release notes, which I have closely followed for years. The WebKit issue is still open, and I would have assumed it would be closed if it was supported. Try the second button ("Open popup after async blob read") in this test case: https://downloads.scirra.com/labs/safariusergesture/index.html It successfully opens a popup in Chrome, because its user gesture rules allow a ~1 second timeout. It does not work in Safari, which I always thought was because it never updated to the latest spec around user gesture rules: the fact there is an |
I don't actually know what exactly Chrome implements today, but the large spec change happened in whatwg/html@8f8c1f5. This largely split the prior "user activation" into transient and sticky activation, and defined which events are considered user activation.
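As a rough illustration of that split, both states can be modelled with a single "last activation" timestamp, which is approximately how the HTML spec describes it. This is a simplified sketch and not any browser's implementation: the `ActivationModel` class is a hypothetical name, and the 5000 ms default duration is an assumption (the spec leaves the transient activation duration to the user agent).

```typescript
// Simplified model of sticky vs transient activation for one window.
class ActivationModel {
  // +Infinity means "never activated"; -Infinity means "consumed".
  private lastActivation: number = Number.POSITIVE_INFINITY;
  private readonly durationMs: number;

  // durationMs is an assumed value; real browsers choose their own.
  constructor(durationMs: number = 5000) {
    this.durationMs = durationMs;
  }

  // Called when an activation-triggering input event is dispatched.
  notifyActivation(nowMs: number): void {
    this.lastActivation = nowMs;
  }

  // Sticky: the user has interacted at some point, and it never expires.
  hasStickyActivation(nowMs: number): boolean {
    return nowMs >= this.lastActivation;
  }

  // Transient: the user interacted recently and nothing consumed it yet.
  hasTransientActivation(nowMs: number): boolean {
    return (
      nowMs >= this.lastActivation &&
      nowMs < this.lastActivation + this.durationMs
    );
  }

  // Activation-consuming APIs end transient activation but keep sticky.
  consume(): void {
    if (this.lastActivation !== Number.POSITIVE_INFINITY) {
      this.lastActivation = Number.NEGATIVE_INFINITY;
    }
  }
}
```

In this model a call made one or two frames after the input event still sees transient activation, while a call made after `consume()` or after the duration has elapsed does not.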
WebKit/WebKit@941560c added transient activation; this shipped in Safari 14.
I think the problem is that we didn't uniformly move all existing usage of user gestures over to transient activation—including |
I'm still confused - WebKit/WebKit@941560c appears to refer to using Web Share after AJAX - one of Safari's many API-specific carve-outs - rather than a general implementation of the modern user activation model. Web content can feature detect the modern user activation model via |
Gecko has implemented the behavior defined in the spec, and we have not yet implemented But our |
What @gsnedders and @EdgarChen mentioned above matches Chrome's experience: lots of APIs, both exposed and Chrome internal, rely on user activation, and Chrome needed quite a bit of time to migrate all to UAv2! Good point about |
I think part of the confusion here is between: A. The core user activation implementation as per the HTML spec (which now includes B. How dependent APIs use it---every API here needs to fix its spec and add WPTs to clarify its reliance on core user activation model (say, transient vs sticky). For progress tracking on B, we have meta issue whatwg/html#5129 for external spec changes only, and now I think we need a similar one in this repository for WPTs. Thoughts? |
@EdgarChen, about the tests (mostly related to We've identified a bunch of non-conforming things in the tests during implementation of the API in WebKit. Would love your (or other Gecko folks') input there! Cases in point:
|
@AshleyScirra wrote:
No, Web Share is using the V2 model in WebKit. The problem is that the V2 model itself can't cater for the case outlined in the bug: the v2 model appears to not handle that case (not WebKit's fault). We need a Model v2.1 that can extend the timeout before transient activation expires or something. We should discuss that elsewhere, however.
Again, no. This is a completely incorrect assumption. You are confusing transient activation, the As @gsnedders mentioned, some APIs are using transient activation in WebKit (my list in #142 (comment)) - others are using "the old model": this is likely the case for all browsers. I filed this bug on WebKit to track where the "old model" is used in WebKit that MAY need to be migrated over: Hopefully that clarifies things. |
Forgive me but I'm still trying to understand the situation here. What I need as a web developer is:
If the V2 model is insufficient for some specific cases, I'm happy for browsers to extend it at their discretion - after all, the wording of the V2 model refers to "an expiry time defined by the browser" which could be interpreted in a suitably flexible way. I don't think I really understand what Safari's specific exemptions around AJAX and Web Share are beyond the V2 model. But what I'm really after is the V2 model to be a minimum baseline that can always be relied upon for all APIs. Any exceptions beyond that are welcome! It only makes it easier to use user-gesture limited APIs. So if I understand correctly, my proposal would involve using V2 activation consistently for all APIs as a minimum guarantee, completely removing the "old model", and implementing I apologise for any frustration in trying to communicate this but user activation inconsistencies have been a pain point for us for years now and I'm keen to make sure it is clear what web developers need and what solving this would mean in the eyes of a web developer. |
Hi @AshleyScirra,
Agree. That's the shared goal. However, it keeps sounding like you are conflating whatever Chrome implements with the HTML spec's activation model. Those things might not align so please keep that in mind. I gave some examples above where Chrome and other browsers currently disagree:
And there are places where Chrome may be ahead (see the WebKit bug I filed), and @mustaqahmed mentioned that Google has done a bunch of work to migrate things over.
That would be greatly appreciated. However, please note that there is nothing in the HTML spec that connects You may have identified literally "unspecified behavior".
It doesn't, AFAIK: if you click on "consume user activation" in HTML or you go to the definition of
I don't know what
Assuming the above is just "consume the user activation" (which does the magic of starting the timer).
Respectfully, no. This needs to be in each spec. It's not a detectable thing. How you know, is because the specs that use "consume user activation" or "has transient activation" check.
Again - please please please don't do that! that's an extremely bad assumption: the timing across users agents is going to differ and your transient activation could be consumed by basically anything (including the browser for whatever reason). You can't make any assumptions there. For instance, your page could be BFCached and the script wouldn't have a clue that the world has changed and you've lost transient activation.
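One defensive pattern consistent with this advice is to treat every activation-gated call as something that may fail, and to fall back to asking the user for a fresh gesture. The following is a minimal sketch under that assumption: `openOrAskAgain` is a hypothetical helper, and both callbacks are injected so it can run outside a browser. In a page, `tryOpen` might wrap `window.open()`, which returns `null` when the popup is blocked, and `askForFreshGesture` might show a button the user can click.

```typescript
// Attempts an activation-gated action without assuming that transient
// activation is still available. Returns true if the action succeeded.
function openOrAskAgain(
  tryOpen: () => object | null,
  askForFreshGesture: () => void,
): boolean {
  let opened: object | null = null;
  try {
    opened = tryOpen();
  } catch {
    // Some gated APIs throw instead of returning null; treat both alike.
    opened = null;
  }
  if (opened === null) {
    // Activation was missing, expired, or already consumed.
    askForFreshGesture();
    return false;
  }
  return true;
}
```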
No. You've raised valid issues around this... and it has resulted in good bugs! (again, the bug I filed for WebKit).
Again, there are no "Safari's specific exemptions". We are not doing anything HTML doesn't say to do. If other browsers are doing something different, they are in violation of HTML. That's not a bad thing if it's "doing the right thing"™️ by developers. We just need to get that fixed in HTML. The problem is that the V2 model appears to be either too limiting or broken.
Again, that's totally the goal. But please please please, don't come in with a mindset of "Safari is doing the wrong thing". It's entirely possible for another browser to exhibit what you consider to be the right behavior, while not adhering to the standards. I'm not suggesting anyone should regress their behavior, just that the behavior might not be specified.
Again, no 🥲. If the
You don't need to apologize. I 100% understand where you are coming from. I know this is not your problem and you "just want stuff to work"™️. And I understand you are trying to hack around the problems because you have a real product that relies on all this... and it's all quite an incompatible mess. But I absolutely promise you we are working on fixing the things you said above. Just please, don't rely on the UserActivation API to mean what you said above.
Understood. That's quite evident now. I think the path forward is pretty clear:
Folk here, agree with the above? |
This is not correct.
There is. Step 8 of rules for choosing a navigable, which is the part of
I don't really agree with that. I think it's reasonable for someone to expect that a browser implements In other words, I think it's reasonable for web developers not to expect browsers to ship with two separate user activation models, with different APIs using each. |
Right, sorry - and that makes sense. But it doesn't consume it (different discussion, I know).
I was thinking about this also. It would only be prudent to expose the UserActivation API once all the APIs had been migrated over. However, migrating all the APIs could take a Very. Long. Time. My point is that developers (and browser vendors) could benefit from the API being exposed without the above requirement. There are already tests in WPT appearing that are relying on That's why I'm saying that the
We all agree with this - this is the ideal world we want to get to... but that's not the world we are in, hence this bug. |
So I guess what I'm asking for here is also "please update the spec to talk about the new user activation model where appropriate". In fact I would request that the spec is changed to align with what Chrome ships today, as that is the most useful form of user activation (and what I thought corresponded to the spec). I had assumed the spec got updated accordingly for all user-activation-restricted APIs when the v2 activation model went in to the spec; if not then that would also need to be done to make the spec consistent and then result in consistent behavior across browsers. I am indeed coming from a "just want it to work" perspective though!
I was looking at https://mustaqahmed.github.io/user-activation-v2/, but it does say it's out of date, I probably shouldn't refer to it any more. I think it is the same thing as "transient activation" in the spec wording.
I think this is the kind of thing where there is a difference between the perspective of spec authors/browser developers and a perhaps more pragmatic just-get-it-working view of a web developer. We have an entire game engine that can run in a web worker with OffscreenCanvas. However this only works if v2-style transient activation is supported (as it is in Chrome), because all inputs are sent via postMessage() to the worker, and all API calls not available in a worker are made by posting back to the main thread. This typically happens within one or two frames (16-32ms) as the game ticks its logic. This is well within any reasonable "transient activation" timeout and so everything still works: Now suppose Safari or Firefox implement OffscreenCanvas, but not user activation equivalent to Chrome. If we feature-detect just OffscreenCanvas and enable it based on that, it means postMessage() can lose the user activation and so a subsequent post back to the main thread will fail to access some APIs. That's a big breaking change for us and would mean suddenly lots of content is broken. I want to avoid that. So I need a way to feature-detect user activation rules that support this architecture. The best thing I've been able to find is Another example is our web app does a very short await and then calls As a rule of thumb, web developers generally need feature-detection for any observable difference between browsers. If it doesn't exist, then we still have a problem that needs solving (and often customers actively complaining about it who want it solved), so we hack something if we can. If |
I'm confused as to why postMessage() would consume user activation? Transient activation is associated with the window, not with anything else (aside from the APIs that explicitly consume it). Also, which APIs in a worker are depending on having transient activation? that seems wrong...
You can't assume that... there may be APIs that continue to use the old model forever for web compat reasons. Thus, you just can't make that arbitrary assumption. The UserActivation API only exposes two things: "isActive" and "hasBeenActive", that's it! It has nothing to do with "does this browser implement all the user activation things everywhere?". Further, there is nothing in HTML that says "user agents MUST NOT expose the User Activation API unless they implement V2 Model everywhere". That would be unreasonable and impractical.
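For reference, reading those two flags might look like the sketch below. The `NavigatorLike` and `ActivationSnapshot` types and the `readUserActivation` function are hypothetical names introduced for illustration; in a page one would pass the real `navigator` object, whose `userActivation` property exposes `isActive` and `hasBeenActive` where the API is available. As the comment above stresses, an `exposed: true` result only says the API exists, not that every activation-gated API in that browser uses the new model.

```typescript
// Structural types so the sketch runs without DOM typings.
interface UserActivationLike {
  readonly isActive: boolean; // transient activation right now
  readonly hasBeenActive: boolean; // sticky activation
}

interface NavigatorLike {
  readonly userActivation?: UserActivationLike;
}

type ActivationSnapshot =
  | { exposed: false }
  | { exposed: true; isActive: boolean; hasBeenActive: boolean };

// Reports the two flags, or that the API is not exposed at all.
function readUserActivation(nav: NavigatorLike): ActivationSnapshot {
  const activation = nav.userActivation;
  if (activation === undefined) {
    return { exposed: false };
  }
  return {
    exposed: true,
    isActive: activation.isActive,
    hasBeenActive: activation.hasBeenActive,
  };
}
```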
You are doing it at your own and your users' detriment 😢
It needs to be handled on a API-by-API basis (see below!).
This sounds like "old model"... This is good. I can look into this, it's actionable, and something we can probably fix! 🥰
I've been developing web pages since I was 16... I'm now 43. Believe me, I know this game well.
Just come here and tell us what's broken. Half joking, the way to "feature detect" is: "is the bug for X open in bugs.webkit.org"? We are making a concerted effort to prioritize and fix stuff as part of this interop effort.
That's great. Please keep doing that. We will hold up our end of the bargain by fixing bugs also. Just to finish off, this is really helpful @AshleyScirra. I really appreciate that you've provided the detailed responses and the amount of thought you've given this stuff. I know we've gone around in circles quite a bit, but it's been quite fruitful. If you can keep telling us specific APIs that are affecting your work/app, then we can try to prioritize those. |
To be clear, the ask here is not for the existence of
I don't think that would be unreasonable and impractical. We could add that to HTML if it would help web developers. Indeed, the only reason we haven't so far is that we assumed all implementations would follow the path of only exposing web developers to a single model. That's what we did in Chrome: we developed the new model in parallel to the old one, behind a flag, and once we flipped the flag, all call sites were updated to the new model, in a single release. That seems like the most beneficial strategy for web developers, which is why it makes sense they might assume that browsers would follow it. This approach makes way more sense than adding a feature detection API for every API that relies on the user activation spec. E.g. Earlier you stated concern about such switches taking
Maybe @mustaqahmed can comment on how much time it took for Chromium, but my impression was that it wasn't so hard to find all call sites of the old C++ functions, and update them to the new ones. Most of the time was in designing the model (which is done), and checking on the web compatibility of any changes (which is done for at least Chromium, and in general wasn't so hard because the new model is generally more permissive than the old one). |
It doesn't consume user activation. Under the old model, anything async in a user input event means you lose user activation (in this case, posting to a worker and waiting for a message to come back is fundamentally async). Under the v2 model, a short async bit of work can be done and still have transient activation. So the v2 model means you can do a short bit of async work and then successfully do something that consumes activation.
I know it kind of sucks, but I don't see a better option unfortunately. I will 100% use a better option if one is provided, but I don't believe there is one yet (unless this does become an official feature detection signal). We already deal with a bunch of browser bugs and inconsistencies in various hacky ways - usually filing browser bugs along the way, but sometimes they don't get fixed long-term - so to me this doesn't seem that much different to that kind of thing anyway.
As I understand it the v2 model is backwards compatible. The old model seems to be "user activation is only in a synchronous user input event". The v2 model is roughly "user activation is in a synchronous user input event and a short time period afterwards". So perhaps there is little backwards compatibility risk to changing this? From @domenic's comment it sounds like the Chrome team didn't struggle too much with backwards compatibility.
As I mentioned it is indeed useful to have a way to feature detect the new model. If the presence of |
I agree: converging to the correct model plus gradually addressing compat problems needed significant bandwidth from us (Chrome), and any new implementation work would greatly benefit from this. However, another significant chunk of our effort went into fixing (many!) internal test failures caused by historical/incremental assumptions about how the then-semi-defined activation model should work. It's likely any old code-base would have to face such a cleanup job, so let's defer some of these topics to a later (2024) goal.
Please check/file HTML issues so that we can track any longer term discussion separately without blocking this interop discussion. I thought it would be great if we can carve out a 2023 goal before 2023 starts, and avoid "too many So, my proposal for 2023 is to target user activation interop as exposed by the following "Bucket 1" APIs. For convenience, I have created the 3 buckets of "user APIs", let me know if I missed or misplaced some APIs.
Bucket 1: specs that properly reference the HTML activation concept and are on a standards track
Bucket 2: specs that properly reference the HTML activation concept and are not on a standards track
Bucket 3: specs that need to be edited to properly reference the HTML activation concept
|
(Note: I have edited the comment above a few times to correct API buckets.) |
About the user gesture bucket 1 APIs, we are unlikely to focus on the following APIs in 2023 due to resource and priority reasons.
So we would like to propose excluding them from the list. We support Fullscreen API and clipboard API and WebAudio (cc @alastor0325 for WebAudio). Note the scope of the original user gesture proposal was ambiguous. It took longer for people to narrow down the scope. The current GH proposal came after the exclusion deadline, so we are also not sure if the “partial support” is still viable. @jgraham |
Well we are just over a week from the deadline for making a final decision on which protocol areas to adopt, and more than a week beyond the proposed deadline for making the proposals detailed enough that they could be clearly assessed. However given there's real pain here, it seems to me like it would be worthwhile to consider a clearly scoped proposal, even at this stage. But for others it may already be at the point where there isn't time to reassess the proposal in light of any changes. It seems obvious to me that this can't end up requiring support for APIs that happen to require activation which UAs are otherwise unable to implement. So I think we'd need to see a clear set of tests which cover the points of difference between browsers but don't depend on features that aren't universally implemented. |
Based on the comments above, I am splitting "Bucket 1" (from my previous comment) into two sub-buckets as follows, and proposing to target Bucket 1.1 for 2023:
Bucket 1: specs that properly reference the HTML activation concept and are on a standards track
Bucket 1.1: already supported by major browsers
Bucket 1.2: not yet supported by major browsers
(I moved Fullscreen to Bucket 1.2 because Safari doesn't support consumption yet.) |
Sounds good, thanks for your responses. :) |
The above seems reasonable... with the hope that at least Fullscreen will also get included (it's hopefully small change, so I'd encourage us to add it). With Payment Request, at least Chrome and Safari should be fully interoperable and already doing the right thing per spec. And even though Web Share is not available across all platforms, it should be broadly interoperable with respect to user activation (for where it is available). |
To complete @marcoscaceres's list above, here is Blink's Web Share code. At this point, we can broaden our 2023 Interop goal above (i.e. Bucket 1.1) to: Please vote for |
|
We were working on the basis of the "Bucket 1.1" scope, and don't think the other work is the same level of priority. Also, as a process point, this is extremely late to be changing the scope, given that people are already firming up positions, and trying to broaden the scope at this stage will invalidate that work, and thus is unlikely to be well received. |
@jgraham, Happy to push this to 2024, as the APIs listed are broadly interoperable anyway. I guess in parallel, we can all come up with a broader interop plan by expanding on @mustaqahmed's list (#142 (comment)). I'd really like to cross reference that with the larger list of APIs that depend on some kind of user gesture. |
Thank you for proposing user gestures for inclusion in Interop 2023. We wanted to let you know that this proposal was not selected to be part of Interop this year. We had many strong proposals, and could not accept them all. As discussed in the issue comments, it was hard to find a subset of this proposal that was itself an interop priority and only depends on features that are themselves widely implemented. For an overview of our process, see the proposal selection summary. Thank you again for contributing to Interop 2023! Posted on behalf of the Interop team. |
We should definitely see if we can get this into 2024. Hopefully we can just reopen the issue when the time is right? |
(In other news, the User Activation API is enabled in Safari Tech Preview… so, that’s something 😊) |
That would be great. We haven't defined the process for next time yet, but I expect a new issue will be clearer. It would need to explain what's changed since last time. |
Can I request that this proposal is resubmitted for Interop 2024? I could file a new issue but it seems valuable to preserve the existing discussion. (I don't appear to have permission to reopen this issue myself.) |
@AshleyScirra please file a new issue and link to this one. I'd suggest naming it user activation to match spec terminology. |
Re-submitted for 2024 at #428. |
Description
User gestures (aka user activation) restrict certain sensitive actions, such as opening a popup window, to only be allowed in response to a user gesture, i.e. some kind of user input event. Browsers currently significantly differ in their implementation of this though, and it causes awkward compatibility problems.
Rationale
The model implemented by Chrome works well: there are essentially two flags and a short timeout. Other browsers do not support this though, only considering synchronous code run inside a user input event as a user gesture.
This causes problems such as:
- await can mean you lose the gesture
- postMessage can mean you lose the gesture
Chrome's model solves all these problems, assuming there is not too long a wait. However in many cases that is perfectly sufficient. For example converting a Blob to an ArrayBuffer is async but will likely complete very quickly for small blobs; in Chrome a user gesture can still be used afterwards, but in other browsers it cannot.
Specification
"Tracking user activation" in the HTML spec: https://html.spec.whatwg.org/multipage/interaction.html#tracking-user-activation
Tests
https://wpt.fyi/results/html/user-activation