Three.js From Zero · Article s5-09

GPU Particle Systems

CPU particles: 10k at 60fps. GPU particles: 1M+ at 60fps. Simulate on the GPU, draw without a CPU roundtrip. Every spark, fog wisp, spell effect in modern games uses this.

1. Where CPU particles break

Each frame: update N particles' positions/velocities/lifetimes on CPU, upload the buffer to GPU, draw as points or quads.

  • Upload cost: O(N) bandwidth per frame.
  • Simulation cost: O(N) on a serial CPU.
  • At 100k particles: frame-budget gone.
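As a minimal JavaScript sketch of that per-frame loop (the flat-array layout and constants are illustrative, not from any demo):

```javascript
// CPU particle update: O(N) work plus an O(N) buffer re-upload per frame.
// Layout per particle: [x, y, z, vx, vy, vz, life] in one flat array.
const STRIDE = 7;
const G = -9.8;

function updateCPU(buf, dt) {
  for (let i = 0; i < buf.length; i += STRIDE) {
    buf[i + 4] += G * dt;          // gravity on vy
    buf[i + 0] += buf[i + 3] * dt; // integrate position
    buf[i + 1] += buf[i + 4] * dt;
    buf[i + 2] += buf[i + 5] * dt;
    buf[i + 6] -= dt;              // age
  }
  // ...then the whole buffer goes back over the bus to the GPU every frame.
}
```

That final re-upload is exactly the O(N) bandwidth cost the GPU approach eliminates.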

2. GPU particle architecture

  1. Storage buffer: position, velocity, life, color per particle. Lives in VRAM.
  2. Compute pass: runs once per particle per frame. Updates fields.
  3. Render pass: draws particles as quads (billboarded) or points. No CPU upload.
// Compute (WGSL / TSL)
@compute @workgroup_size(64)
fn sim(@builtin(global_invocation_id) id: vec3u) {
  var p = particles[id.x];   // `var`, not `let`: the copy is mutated below
  p.velocity += gravity * dt;
  p.position += p.velocity * dt;
  p.life -= dt;
  if (p.life < 0.0) { respawn(&p); }
  particles[id.x] = p;
}

3. WebGL-era trick: fake compute via FBO ping-pong

Before WebGPU, no compute shaders. Workaround:

  1. Store particle state in an RGBA float texture (position in .xyz, life in .w).
  2. Render a fullscreen quad with a "sim" shader that reads the texture and writes the new state to a second RT.
  3. Swap. Draw particles as GL_POINTS, sampling the latest texture for position.

That's GPGPU. Season 2 Article 09 covered it.
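The ping-pong bookkeeping reduces to a two-target swap. A sketch with the actual render call stubbed out (names hypothetical):

```javascript
// Two render targets hold current and next particle state.
let read = { id: "rtA" };
let write = { id: "rtB" };

// renderSim(read, write) stands in for drawing the fullscreen sim quad:
// the shader samples `read` and the new state lands in `write`.
function simStep(renderSim) {
  renderSim(read, write);
  [read, write] = [write, read]; // swap: latest state is now in `read`
  return read;                   // the particle draw pass samples this target
}
```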

4. WebGPU-era: real compute

Via TSL (S4-10): Fn(), storage(), renderer.compute(). Far cleaner.

5. Billboarding

Particles face the camera. In the vertex shader, expand a quad around each particle point:

// Pass particle center as an attribute; quad corners via instanced draw.
// cameraMatrix is the camera's world matrix: its first two columns are the
// camera's right and up axes in world space.
vec3 right = cameraMatrix[0].xyz;
vec3 up    = cameraMatrix[1].xyz;
vec3 world = particleCenter + (corner.x * right + corner.y * up) * size;
gl_Position = viewProj * vec4(world, 1.0);
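The same expansion in plain JavaScript, assuming a flat column-major 4×4 camera world matrix (the layout Three.js uses); names are illustrative:

```javascript
// Expand one quad corner around a particle center so the quad faces the camera.
// camWorld: flat column-major 4x4; columns 0 and 1 are right and up.
function billboardCorner(camWorld, center, corner, size) {
  const right = [camWorld[0], camWorld[1], camWorld[2]];
  const up    = [camWorld[4], camWorld[5], camWorld[6]];
  return center.map((c, i) =>
    c + (corner[0] * right[i] + corner[1] * up[i]) * size);
}
```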

6. Sorting for transparency

Additive particles (fire, sparks) don't need sorting — the blend is order-independent. Alpha-blended particles (smoke) do.

GPU bitonic sort: O(n log² n) parallel. Runs in a compute pass pre-render. 1M particles sort in ~1ms.
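A CPU reference of the bitonic network; the compute version runs the inner loop's compare-and-swaps in parallel, one dispatch per (k, j) pair. Requires a power-of-two length:

```javascript
// Bitonic sorting network, ascending. O(n log^2 n) compare-and-swaps.
function bitonicSort(a) {
  const n = a.length;                       // must be a power of two
  for (let k = 2; k <= n; k <<= 1)          // size of runs being merged
    for (let j = k >> 1; j > 0; j >>= 1)    // compare distance
      for (let i = 0; i < n; i++) {         // this loop is parallel on GPU
        const l = i ^ j;                    // partner index
        const desc = (i & k) !== 0;         // direction of this sub-merge
        if (l > i && (desc ? a[i] < a[l] : a[i] > a[l]))
          [a[i], a[l]] = [a[l], a[i]];
      }
  return a;
}
```

For particles, the keys would be camera-space depths and the swaps would move particle indices alongside them.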

7. Live demo — 200k fire particles

The demo fakes the GPU simulation: it draws Points via an instanced draw while the CPU updates a flat buffer once per frame (still comfortably within budget at 200k for demo purposes). The shape of the code is identical to a real compute shader.

8. Force fields

At each sim step, accumulate forces:

  • Gravity: v += g * dt
  • Wind: v += windField(p) * dt
  • Curl noise: v += curlNoise(p * scale) * strength * dt — organic swirls.
  • Attractor: v += normalize(attractor - p) * (strength / dist²) * dt
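Accumulated in order, the list above looks like this in plain JavaScript (`windField` and the attractor parameters are stand-ins; curl noise is omitted to keep the sketch short):

```javascript
// One sim step of force accumulation: gravity, wind, point attractor.
function applyForces(p, v, dt, { g, windField, attractor, strength }) {
  const out = v.slice();
  const w = windField(p);                        // wind sampled at position
  const d = attractor.map((a, i) => a - p[i]);   // vector toward attractor
  const dist = Math.hypot(d[0], d[1], d[2]);
  for (let i = 0; i < 3; i++) {
    out[i] += g[i] * dt;                                       // gravity
    out[i] += w[i] * dt;                                       // wind
    out[i] += (d[i] / dist) * (strength / (dist * dist)) * dt; // attractor
  }
  return out;
}
```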

9. Collision via depth buffer

For ground/obstacle collision, sample last frame's depth buffer:

vec2 uv = toScreenUv(particle.pos);
float sceneDepth = texture(uDepthBuffer, uv).r;
// particle.depth must use the same encoding as the depth buffer
if (particle.depth > sceneDepth) {                   // behind scene geometry
  particle.velocity.y = -particle.velocity.y * 0.5;  // damped bounce
  particle.position.y += eps;                        // push back out in world
}                                                    // space, not depth units
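The bounce branch, reduced to a ground plane at y = 0 in plain JavaScript (restitution 0.5 as in the shader; `eps` keeps the particle out of the surface):

```javascript
// Collision response against a flat ground plane.
const eps = 1e-4;
function collideGround(p, v) {
  if (p[1] < 0) {        // particle penetrated the ground
    v[1] = -v[1] * 0.5;  // damped bounce
    p[1] = eps;          // push back just above the surface
  }
  return [p, v];
}
```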

10. Who ships this

  • Unreal Niagara: fully GPU. Node graph UI. Fluids, VAT, everything.
  • Unity VFX Graph: same concept, HDRP required.
  • UE5 Niagara Fluids: millions of sparks with fluid-sim drag.
  • Three.js: three.quarks and three-nebula are CPU-based. Roll your own with TSL compute.

11. Takeaways

  • CPU: 10k particle ceiling. GPU: millions.
  • Compute pass updates state. Render pass draws. No CPU round trip.
  • Billboarding in vertex shader expands points to camera-facing quads.
  • Additive blend needs no sort. Alpha blend needs a depth sort (bitonic on GPU).
  • Depth-buffer collision is cheap and convincing.