Three.js From Zero · Article s5-07
S5-07 Clustered Forward Lighting
Clustered Forward Lighting
Deferred scales to many lights but breaks MSAA + transparency. Clustered forward = 3D grid over the view frustum, lights binned per cell, shader loops only its cell's lights. Scales AND handles transparency.
1. The motivation
Forward + 100 lights = each fragment loops 100 lights. Dead.
Forward+ / clustered: slice the frustum into a 16×9×24 grid (~3500 cells). Assign each light to the cells its radius intersects. Shader samples its cell → loops 4-20 lights instead of 100.
2. The two passes
// CPU (or compute): bin lights
for each light:
for each cluster cell the light's bounding sphere overlaps:
add light.id to cell.lightList
// Upload two buffers:
// - lightIndices: concatenated list of all light IDs
// - clusterHeads: per-cell (offset, count) into lightIndices
// GPU shading
vec3 shade(fragment) {
uvec3 cluster = computeCluster(gl_FragCoord, depth);
(offset, count) = clusterHeads[cluster];
for (uint i = 0; i < count; i++) {
uint lightId = lightIndices[offset + i];
color += lighting(lights[lightId]);
}
}
3. The cluster math
Grid is in view space. X/Y are 2D screen tiles. Z is logarithmic slices (tight near camera, looser far away).
float slice(float viewZ, float near, float far, int N) {
return float(N) * log(viewZ / near) / log(far / near);
}
Logarithmic Z gives even distribution of density.
4. Live demo — clustered forward visualization
Many dynamic lights over a scene. Toggle cluster visualization overlay → see the 2D tile grid and light bin counts.
5. Tile size tradeoff
- Small tiles (8×8 px): precise culling, big buffer.
- Large tiles (32×32 px): fewer cells, looser cull.
- Typical: 16×16 or 32×32. Z-slices 16-32.
6. Cluster vs. tile
Tiled forward: 2D screen grid only (no depth slice). Simpler, but long narrow lights overfill tiles.
Clustered forward: adds Z slices. Much tighter cull along view direction.
Unity URP/HDRP, Unreal, Detroit Become Human, Doom Eternal — all clustered.
7. Transparency
Transparent objects sample same cluster buffer. No separate pass. This is the killer feature over deferred.
8. Compute-shader build
Modern: light-list build is itself a compute shader. Each workgroup = one cell, threads fan out over lights, atomic append on hit.
Build cost: sub-millisecond for 1024 lights × 3500 cells on RTX-tier cards.
9. Three.js status
WebGPURenderer has the pieces — compute shaders, storage buffers. No stock clustered lighter yet.
Library: WebGPU samples or roll your own via TSL compute (S4-10).
10. Takeaways
- Clustered = 3D grid over view frustum, lights binned per cell.
- Log-Z slices → even density.
- Per-fragment: loop only your cell's lights (4-20 instead of 100+).
- Supports transparency, unlike deferred.
- Modern AAA default.