WebXR / WebRTC Overlaps and Integrations #1295
The main reasons I proposed a separate Using
There are also some general advantages of the getUserMedia / MediaStream approach:
So in general I'd be happy to throw my (admittedly insignificant!) weight behind this approach rather than
One item to raise, though I haven't done much thinking about it personally, is whether users would need a timestamp for the current camera image. For spawning anchors into a tracked local space, just having the correct spatial relationship feels sufficient (i.e. the camera's pose would be the pose it was in when the frame was captured). I think that would allow correctly mapping poses from the camera into the local space without needing timestamps. One area where timestamp knowledge is likely required is if there is also a tracked controller and you want to know where it appears in the camera image. I assume that by default, using `XRFrame.getPose()` to request the pose of the controller in the camera space would just do the spatial conversion, i.e. it would return where the controller's predicted position for rendering the next frame is relative to the pose the camera was in when it captured the latest camera frame. That's not the same as asking where the controller was relative to the camera at the timestamp the image was captured. I can't think of many reasons people would want to know that, but it serves as a thought experiment for whether the difference in timestamps needs to be made explicit. I would note that a separate
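For concreteness, a minimal sketch of that getPose() behaviour, assuming a hypothetical `cameraSpace` XRSpace representing the pose the camera was in when it captured the latest frame (nothing exposes such a space today):

```ts
// Hypothetical: an XRSpace pinned to the camera's capture pose. Neither core
// WebXR nor the raw-camera-access draft exposes this today.
declare const cameraSpace: XRSpace;

function onXRFrame(time: DOMHighResTimeStamp, frame: XRFrame) {
  const session = frame.session;
  for (const inputSource of session.inputSources) {
    if (!inputSource.gripSpace) continue;
    // getPose() is a purely spatial conversion: it returns the controller's
    // predicted pose for the upcoming render frame, expressed relative to the
    // pose the camera had when it captured the latest image. It does not
    // rewind the controller to the image's capture timestamp.
    const pose = frame.getPose(inputSource.gripSpace, cameraSpace);
    if (pose) {
      console.log('controller relative to camera capture pose:',
                  pose.transform.position);
    }
  }
  session.requestAnimationFrame(onXRFrame);
}
```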
With TPAC coming up, there was a suggestion that the Immersive Web and WebRTC groups could meet to discuss potential integrations of XR data into the WebRTC spec. WebRTC has come up in response to various discussions over the years, so it might be worth narrowing down what is practically being discussed.
There’s the use-case outlined here for remote assistance.
However, on top of that, `getUserMedia` has often been suggested as the general way to support access to a camera stream for local processing, either on CPU or GPU, as an alternative to the current raw-camera-access draft spec for "synchronous" access to ARCore frames.

Media Streams in general don't seem great for per-frame metadata, as I mentioned in immersive-web/webxr-ar-module#78 - `requestVideoFrameCallback` is best-effort (not good enough for WebXR), and `MediaStreamTrackProcessor` isn't widely supported and is aimed at CPU processing, which might involve some costly readback / format conversion.
Using `ImageCapture.grabFrame()` might be the best bet (although that is also Chromium-only at the moment, it doesn't feel like a massive spec).
`grabFrame()` returns a promise that resolves to an `ImageBitmap` of the next frame - which, despite the name, is just a handle to the data and can live on the GPU. As currently written in the spec, there's no metadata with the `ImageBitmap`. I did notice this has been previously suggested by @thetuvix in the raw-camera-access discussion here.
With that proposal from @thetuvix, the `ImageBitmap` from `grabFrame()` is passed into an `XRSession` to retrieve the pose data for that specific frame, which feels like a reasonable suggestion to me.
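As a sketch of that shape - the `grabFrame()` half is real (Chromium) API, while the pose lookup is purely hypothetical and just stands in for the kind of association being proposed:

```ts
// ImageCapture.grabFrame() resolves to an ImageBitmap, which is only a handle
// to the pixel data and may stay GPU-resident.
async function grabCameraFrame(xrSession: XRSession) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();
  const capture = new ImageCapture(track);

  const bitmap = await capture.grabFrame();

  // Hypothetical, not in any spec: hand the bitmap to the XRSession to look
  // up the camera pose (and capture time) for that specific frame.
  // const cameraPose = xrSession.getCameraPoseForImage(bitmap);

  return bitmap;
}
```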
One downside of requiring a running `XRSession` to access the pose data is that on mobile platforms `immersive-ar` would still be required, which has some practical issues. An `inline-tracked` session type in combination with an XR MediaStream would do it though, so that can certainly be addressed down the line.
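Purely for illustration, the rough shape would be something like the following - neither the `inline-tracked` mode nor any association between the session and the stream exists today:

```ts
// Illustrative only: 'inline-tracked' is a proposed session mode, not a
// shipped one, so this will not run anywhere today.
async function startTrackedInlineWithCamera() {
  const session = await navigator.xr!.requestSession(
    'inline-tracked' as unknown as XRSessionMode);
  const refSpace = await session.requestReferenceSpace('local');

  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: 'environment' },
  });

  // The open question is how frames from `stream` get tied back to poses
  // queried via `session` / `refSpace`.
  return { session, refSpace, stream };
}
```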