Three.js From Zero · Article s11-13

S11-13 WebXR Layers + Depth Sensing in Three.js

Season 11 · Article 13

WebXR Layers + Depth Sensing — the two underused APIs that change quality dramatically

Two APIs in WebXR that most Three.js tutorials skip and most VR/AR demos need. Layers fix aliased text and 360 video — the compositor resamples once instead of three times. Depth Sensing turns Quest 3's environment into a real-time depth buffer your shader can sample. Together they raise quality from "browser demo" to "comfortable to wear."

1. Why this article exists

Two things kill WebXR quality faster than anything else:

  • Soft, aliased text and UI — because everything you render goes through XRWebGLLayer, runs at headset framerate, and gets resampled at least twice by the compositor.
  • Virtual objects that always render on top of real surfaces — because without depth, the renderer has no idea your couch is in front of the ball.

Both have fixes that have been in the spec for years. Most projects don't use them. After this article you will know why they matter and exactly how to wire them in.

2. WebXR Layers — what they are

By default, three.js renders an XR session into a single XRWebGLLayer: your whole scene lands in one framebuffer that the headset compositor takes and reprojects to the displays. Anything in that framebuffer pays a render-then-resample cost. Text, UI, and 360 video suffer most.

The Layers API lets you submit composition layers instead — typed primitives the compositor handles natively:

Layer type         Use it for
XRQuadLayer        Floating UI panels, slate windows, world-space menus, video posters
XRCylinderLayer    Curved seated-mode video (Apple Cinema-style), curved menus
XREquirectLayer    360° photos and videos, mono or stereo
XRCubeLayer        Skyboxes
XRProjectionLayer  Your normal 3D scene framebuffer (this is the default if you do nothing)

The trick: an XRQuadLayer is rendered once at native resolution, and the compositor resamples it exactly once. A 1024×512 quad layer with text can look pixel-sharp because it never goes through the GL pipeline at headset rate. The same goes for equirect 360 video: no skybox sphere, the compositor projects it directly.

In a typical XRWebGLLayer pipeline a UI panel is sampled two to three times: render to an FBO, post-effects, compositor reprojection. Quad layers cut that to one sample. On Quest 3, the visible difference in text-heavy menus is large.

3. Layers code pattern (vanilla three.js)

const session = await navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['layers'],
});

const xrGL = new XRWebGLBinding(session, renderer.getContext());
const refSpace = await session.requestReferenceSpace('local');

// 1) projection layer for the 3D scene
const projection = xrGL.createProjectionLayer();   // depth buffer attached by default
session.updateRenderState({ layers: [projection] });

// 2) a quad layer for our menu
const menuTex = renderToTexture(menuScene, 1024, 512);   // helper sketched after this section
const quad = xrGL.createQuadLayer({
  space: refSpace,
  viewPixelWidth:  1024,
  viewPixelHeight: 512,
  width:  0.6,                    // metres
  height: 0.3,
});
quad.transform = new XRRigidTransform({ x: 0, y: 1.4, z: -1.2 });

// keep both
session.updateRenderState({ layers: [projection, quad] });

// in the frame callback, push the texture into the quad
session.requestAnimationFrame(function onFrame(time, frame) {
  const subImage = xrGL.getSubImage(quad, frame);
  // copy menuTex into subImage.colorTexture: attach the layer texture to a
  // framebuffer and blit, or re-render the menu straight into it
  ...
  session.requestAnimationFrame(onFrame);
});

Patterns to know:

  • Reference space. Quad layers anchor to an XRReferenceSpace: 'viewer' for head-locked, 'local' for world-anchored near head height, 'local-floor' for floor-anchored.
  • Texture push, not draw. You feed an existing GPU texture into the layer's sub-image — typically a Three render target. The compositor handles sampling.
  • Order matters. The layers array is composited back-to-front: projection layer first, UI quad second, so the quad draws on top of the scene behind it.
  • Polyfill exists. The webxr-layers-polyfill emulates layer types via the projection layer for browsers without native support — same code, smaller win, but at least it works.
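
One way the renderToTexture helper above could look; this is a sketch, not an API. menuCamera is an assumed orthographic camera framing the menu scene, and renderer is the existing WebGLRenderer:

// Hypothetical helper: render a scene into a texture the quad layer can consume.
function renderToTexture(scene, width, height) {
  const target = new THREE.WebGLRenderTarget(width, height);
  renderer.setRenderTarget(target);
  renderer.render(scene, menuCamera);   // assumed orthographic menu camera
  renderer.setRenderTarget(null);       // restore the default framebuffer
  return target.texture;
}

For the copy step, the underlying WebGLTexture lives at renderer.properties.get(target.texture).__webglTexture, an internal but widely used escape hatch.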

4. WebXR Depth Sensing — what it is

Quest 3 and 3S ship a runtime depth pass: the headset's cameras + on-device ML produce a coarse per-pixel depth map of the room, ~5 m range, updated every frame. The Depth Sensing API hands you that map.

Practical applications:

  • Real-world occlusion. Your virtual ball goes behind your real couch. The biggest immersion lift in mixed reality.
  • Shadow projection. Cast a virtual shadow onto a real surface.
  • Physics surface contact. Roll a ball, have it stop on the real coffee table.

None of this needs a precomputed scene mesh. Depth gives you the surface shape this frame, in head space.

5. Depth Sensing code pattern

const session = await navigator.xr.requestSession('immersive-ar', {
  requiredFeatures: ['depth-sensing'],
  depthSensing: {
    usagePreference: ['gpu-optimized'],         // sample directly from shader
    dataFormatPreference: ['luminance-alpha'],   // 16-bit packed
  },
});

// GPU path: depth comes from an XRWebGLBinding, not from the frame itself
// (frame.getDepthInformation is the cpu-optimized entry point)
const glBinding = new XRWebGLBinding(session, renderer.getContext());

// In the frame callback, per view
const pose = frame.getViewerPose(refSpace);
for (const view of pose.views) {
  const depth = glBinding.getDepthInformation(view);   // XRWebGLDepthInformation
  if (!depth) continue;
  // depth.texture is a WebGLTexture you sample in your shader
  // depth.normDepthBufferFromNormView is an XRRigidTransform; upload its
  //   .matrix as a mat4 uniform to map normalized view UVs to depth UVs
  // depth.rawValueToMeters converts raw texel values to metres
  // your fragment shader then does:
  //   vec2 depthUv = (uUvTransform * vec4(viewUv, 0.0, 1.0)).xy;
  //   float realZ  = sampleDepthMeters(depthUv);
  //   if (realZ < fragViewDepth) discard;   // pixel is behind real geometry
}

Two flavors:

  • gpu-optimized. Depth arrives as a WebGL texture you sample in a shader. Fast, no CPU copy.
  • cpu-optimized. Depth arrives as a CPU-side buffer (XRCPUDepthInformation). Useful for raycasts and physics (snippet below); far too slow to consult per fragment.
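
A minimal sketch of the cpu-optimized flavor, assuming the session was requested with usagePreference: ['cpu-optimized'] and reusing the frame and view variables from the loop above:

// cpu-optimized path: the frame itself hands back XRCPUDepthInformation
const depth = frame.getDepthInformation(view);
if (depth) {
  // distance in metres to the real surface straight ahead of this view;
  // (0.5, 0.5) are normalized view coordinates
  const metersAhead = depth.getDepthInMeters(0.5, 0.5);
  console.log(`real surface ${metersAhead.toFixed(2)} m ahead`);
}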

Wiring occlusion into three.js is one extra fragment branch in each material you want occluded; the render loop reads depth once per frame and pushes the texture and matrix in as uniforms. A sketch of that branch follows.
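
A hedged sketch of that branch via onBeforeCompile. Every uniform name here (uDepthTexture, uUvTransform, uRawValueToMeters, uResolution) is illustrative rather than a three.js API, and the decode assumes the 16-bit luminance-alpha packing requested above:

// Inject a real-world-occlusion branch into a standard material.
function makeOccludableMaterial(color) {
  const material = new THREE.MeshStandardMaterial({ color });
  material.onBeforeCompile = (shader) => {
    shader.uniforms.uDepthTexture     = { value: null };  // wrap depth.texture each frame
    shader.uniforms.uUvTransform      = { value: new THREE.Matrix4() };
    shader.uniforms.uRawValueToMeters = { value: 1.0 };
    shader.uniforms.uResolution       = { value: new THREE.Vector2(1, 1) };
    material.userData.depthUniforms   = shader.uniforms;  // updated from the render loop

    shader.fragmentShader = shader.fragmentShader
      .replace('#include <common>', `
        #include <common>
        uniform sampler2D uDepthTexture;
        uniform mat4  uUvTransform;
        uniform float uRawValueToMeters;
        uniform vec2  uResolution;
      `)
      .replace('#include <dithering_fragment>', `
        #include <dithering_fragment>
        // normalized view coords -> depth buffer coords (a Y flip may be
        // needed depending on runtime convention)
        vec2 viewUv  = gl_FragCoord.xy / uResolution;
        vec2 depthUv = (uUvTransform * vec4(viewUv, 0.0, 1.0)).xy;
        // luminance-alpha packs a 16-bit raw value into two 8-bit channels
        vec2 packedD = texture2D(uDepthTexture, depthUv).ra;
        float realMeters = dot(packedD, vec2(255.0, 255.0 * 256.0)) * uRawValueToMeters;
        // vViewPosition.z is negative in front of the camera in three.js
        if (realMeters < -vViewPosition.z) discard;  // real surface is closer
      `);
  };
  return material;
}

Per frame, copy depth.texture (wrapped in a THREE.Texture), normDepthBufferFromNormView.matrix, rawValueToMeters, and the viewport size into material.userData.depthUniforms.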

6. Live demo — the simulated effect

Both APIs need a real headset to demonstrate. Most readers don't have a Quest browser open. So here is a non-VR scene that simulates the visual benefits:

  • Sharp text plane — rendered at 4× canvas resolution to a render target, then drawn as a flat quad. Stands in for what a quad layer does for free.
  • Depth occlusion sim — a "real-world surface" rectangle sits at z = -1.2. The "virtual cube" in front passes through it as you pull a slider. Toggle occlusion on, and a depth comparison in the fragment shader discards pixels behind the surface — the same trick the depth API gives you in real MR.

7. Quest 3 + Vision Pro support — April 2026

Feature                            Quest 3 / 3S Browser        Vision Pro Safari (visionOS 2.x)   Chromium Android
WebXR VR mode                      Yes                         Yes                                Yes
WebXR AR mode (immersive-ar)       Yes (passthrough)           No (VR-only on visionOS 2)         Yes (ARCore)
Quad / cylinder / equirect layers  Yes (long-standing)         Partial (basic layers in VR mode)  Partial
Depth Sensing API                  Yes (Horizon Browser 40+)   No                                 Limited
Persistent anchors                 Yes (8 per origin)          No                                 Limited
Hand tracking                      Yes                         Yes (Natural Input)                Limited

The asymmetry is real: Quest is the depth + AR target; Vision Pro is the hand-input + VR-mode-only target. Production apps either pick a primary headset or branch on capabilities at session start.

8. Fallbacks for unsupported devices

Both APIs are progressive enhancement candidates. Sane defaults (a session-start probe is sketched after the list):

  • Layers fallback. Use the webxr-layers-polyfill: same code, with the layer types emulated through the projection framebuffer. Crispness drops to baseline; the API surface stays.
  • Depth Sensing fallback. Probe at session start; if absent, fall back to plane detection ('plane-detection' feature) or hit-test against a flat floor. Occlusion suffers; physics still works on detected planes.
  • visionOS path. Detect 'immersive-ar' rejection, ship a VR-mode flat-floor experience instead, and use Natural Input (gaze + pinch) for selection.
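
A minimal probing sketch, assuming nothing beyond standard WebXR; usePlaneFallback is a hypothetical stand-in for your plane-detection path:

// Probe capabilities at session start and pick the degradation path.
async function startSession() {
  // visionOS Safari rejects 'immersive-ar'; fall back to VR mode there
  const arOk = await navigator.xr.isSessionSupported('immersive-ar');
  const mode = arOk ? 'immersive-ar' : 'immersive-vr';

  const session = await navigator.xr.requestSession(mode, {
    optionalFeatures: ['layers', 'depth-sensing', 'plane-detection'],
    depthSensing: {
      usagePreference: ['gpu-optimized'],
      dataFormatPreference: ['luminance-alpha'],
    },
  });

  // enabledFeatures reports what the runtime actually granted
  const granted = session.enabledFeatures ?? [];
  if (!granted.includes('depth-sensing')) {
    usePlaneFallback(granted.includes('plane-detection'));  // hypothetical
  }
  // with webxr-layers-polyfill installed up front, the same layer
  // code runs even when 'layers' is missing here
  return session;
}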

9. The R3F counterpart

If you're on R3F, @react-three/xr v6 wraps both APIs:

import { XR, Layers, useXRDepth } from '@react-three/xr';

<XR sessionInit={{ requiredFeatures: ['layers', 'depth-sensing'] }}>
  <Layers>
    <Layers.Quad position={[0, 1.4, -1.2]} width={0.6} height={0.3}>
      <MenuUI /> { /* normal R3F children, rendered into a quad layer */ }
    </Layers.Quad>
  </Layers>
  <OcclusionMesh /> { /* depth-aware mesh — uses useXRDepth uniform */ }
</XR>

Same APIs, expressed as JSX. The depth uniform is wired automatically into materials that opt in.

10. Tooling — how to verify it actually worked

  • Quest 3 dev tooling. chrome://inspect over USB for remote DevTools; the headset-side OpenXR perf overlay shows the GPU vs compositor split. Layers should reduce GL render time.
  • Pixel-perfect comparison. Render the same UI through projection vs quad layer. Photograph the lens. Quad will be visibly sharper at small text sizes.
  • Depth visualizer. A debug shader that maps depth values to a heatmap (sketched below) quickly tells you whether occlusion is working or whether your matrix transform is wrong.
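
One way such a visualizer could look, reusing the illustrative uniform names from the occlusion sketch in section 5; render it on a PlaneGeometry(2, 2) so the quad fills the screen:

// Debug material: paint the raw depth map as a near-green / far-red heatmap.
const depthDebugMaterial = new THREE.ShaderMaterial({
  uniforms: {
    uDepthTexture:     { value: null },   // wrap depth.texture each frame
    uRawValueToMeters: { value: 1.0 },
  },
  vertexShader: `
    varying vec2 vUv;
    void main() {
      vUv = uv;
      gl_Position = vec4(position.xy, 0.0, 1.0);  // fullscreen in clip space
    }
  `,
  fragmentShader: `
    varying vec2 vUv;
    uniform sampler2D uDepthTexture;
    uniform float uRawValueToMeters;
    void main() {
      vec2 packedD = texture2D(uDepthTexture, vUv).ra;
      float meters = dot(packedD, vec2(255.0, 255.0 * 256.0)) * uRawValueToMeters;
      float t = clamp(meters / 5.0, 0.0, 1.0);    // ~5 m usable range on Quest 3
      gl_FragColor = vec4(t, 1.0 - t, 0.0, 1.0);  // green = near, red = far
    }
  `,
});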

11. Pitfalls

  • Layers + bloom = no. Composition layers don't go through your post chain. Bloom or vignette only affect the projection layer. Either accept the look or render UI back into the projection layer.
  • Quad layer alpha. Premultiplied vs straight differs by browser; test on actual hardware.
  • Depth latency. Depth maps are typically 1-2 frames behind. Fast head turns can show "trailing" occlusion. Smooth with a small EMA in the shader.
  • Stereo equirect. Stereo mode requires the correct layout ('stereo-top-bottom' vs 'stereo-left-right'; see the snippet after this list). The wrong layout means crossed eyes.
  • Privacy. Depth and camera access trigger separate permission prompts on Quest. Plan onboarding around it.
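
A minimal layout sketch, assuming the xrGL binding and refSpace from section 3 and a top-bottom-packed source; the pixel sizes are illustrative:

// Each view is one half of a top-bottom-packed 2880×2880 video frame.
const equirect = xrGL.createEquirectLayer({
  space: refSpace,
  layout: 'stereo-top-bottom',   // must match the source's packing
  viewPixelWidth:  2880,
  viewPixelHeight: 1440,
});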

12. Takeaways

  • Layers turn UI and 360 video into compositor-native primitives — sharper text, fewer dropped frames.
  • Depth Sensing turns Quest 3's runtime depth pass into a shader-readable texture — real-world occlusion in <30 lines.
  • Both are progressive enhancement: polyfill or branch when missing.
  • Vision Pro Safari still has no AR mode and no Depth API as of April 2026; build a VR-mode fallback.
  • R3F users get JSX shapes via @react-three/xr v6.