Three.js From Zero · Article s5-09
GPU Particle Systems
CPU particles: 10k at 60fps. GPU particles: 1M+ at 60fps. Simulate on the GPU, draw without a CPU roundtrip. Every spark, fog wisp, spell effect in modern games uses this.
1. Where CPU particles break
Each frame: update N particles' positions/velocities/lifetimes on CPU, upload the buffer to GPU, draw as points or quads.
- Upload cost: O(N) bandwidth per frame.
- Simulation cost: O(N) on a serial CPU.
- At 100k particles: frame-budget gone.
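The per-frame cost is easy to make concrete with a minimal CPU-side sketch (buffer names are illustrative). Every field of every particle is touched serially, and then the whole buffer has to be re-uploaded to the GPU:

```javascript
const N = 100_000;
const pos = new Float32Array(N * 3);
const vel = new Float32Array(N * 3);
const life = new Float32Array(N).fill(1.0);

function updateCPU(dt, gravity = -9.8) {
  for (let i = 0; i < N; i++) {
    vel[i * 3 + 1] += gravity * dt;        // integrate velocity (y only here)
    pos[i * 3 + 0] += vel[i * 3 + 0] * dt; // integrate position
    pos[i * 3 + 1] += vel[i * 3 + 1] * dt;
    pos[i * 3 + 2] += vel[i * 3 + 2] * dt;
    life[i] -= dt;                         // age the particle
  }
  // geometry.attributes.position.needsUpdate = true; // <- the O(N) upload
}
```

Both the loop and the upload scale linearly with N, which is exactly what the GPU path eliminates.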
2. GPU particle architecture
- Storage buffer: position, velocity, life, color per particle. Lives in VRAM.
- Compute pass: runs once per particle per frame. Updates fields.
- Render pass: draws particles as quads (billboarded) or points. No CPU upload.
// Compute (WGSL / TSL) — one invocation per particle
@compute @workgroup_size(64)
fn sim(@builtin(global_invocation_id) id: vec3u) {
  var p = particles[id.x];            // `var`: local copy must be mutable
  p.velocity += gravity * dt;
  p.position += p.velocity * dt;
  p.life -= dt;
  if (p.life < 0.0) { respawn(&p); }  // WGSL requires braces
  particles[id.x] = p;
}
3. WebGL-era trick: fake compute via FBO ping-pong
Before WebGPU, no compute shaders. Workaround:
- Store particle state in an RGBA float texture (position in .xyz, life in .w).
- Render a fullscreen quad with a "sim" shader that reads the texture and writes the new state to a second RT.
- Swap. Draw particles as GL_POINTS, sampling the latest texture for position.
That's GPGPU. Season 2 Article 09 covered it.
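The ping-pong pattern is easier to see in a CPU model: two buffers standing in for the two float render targets, each "texel" holding RGBA = (pos.xyz, life). The sim pass reads one and writes the other, then the roles swap (on the GPU these would be two WebGLRenderTargets and a fullscreen-quad shader):

```javascript
const COUNT = 4;
let read  = new Float32Array(COUNT * 4);
let write = new Float32Array(COUNT * 4);
for (let i = 0; i < COUNT; i++) read[i * 4 + 3] = 1.0; // life = 1

function simStep(dt) {
  for (let i = 0; i < COUNT; i++) {      // one "fragment" per texel
    const o = i * 4;
    write[o + 0] = read[o + 0];          // x unchanged
    write[o + 1] = read[o + 1] + 1 * dt; // y drifts up (stand-in velocity)
    write[o + 2] = read[o + 2];
    write[o + 3] = read[o + 3] - dt;     // life decays
  }
  [read, write] = [write, read];         // the swap
}
```

The draw pass then samples `read` for positions — state never leaves the GPU.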
4. WebGPU-era: real compute
Via TSL (S4-10): Fn(), storage(), renderer.compute(). Far cleaner.
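For flavor, a rough TSL sketch of that trio. This is not runnable outside a WebGPU-capable browser, and the API names follow the three.js TSL examples — check them against your release:

```javascript
import * as THREE from 'three/webgpu';
import { Fn, storage, instanceIndex } from 'three/tsl';

const count = 1_000_000;

// Persistent GPU-side state: one vec3 per particle, never touched by the CPU.
const posAttr = new THREE.StorageInstancedBufferAttribute(count, 3);
const positions = storage(posAttr, 'vec3', count);

// Fn() builds the compute shader; instanceIndex is this invocation's particle.
const update = Fn(() => {
  const p = positions.element(instanceIndex);
  p.y.subAssign(0.01); // stand-in for the velocity/life update above
})().compute(count);

// Per frame, instead of uploading a buffer (renderer: THREE.WebGPURenderer):
renderer.compute(update);
```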
5. Billboarding
Particles face the camera. In the vertex shader, expand a quad around each particle point:
// Particle center as an instanced attribute; quad corners in [-0.5, 0.5]²
// cameraMatrix = camera world matrix: column 0 is right, column 1 is up
vec3 right = cameraMatrix[0].xyz;
vec3 up = cameraMatrix[1].xyz;
vec3 world = particleCenter + (corner.x * right + corner.y * up) * size;
gl_Position = viewProj * vec4(world, 1.0);
6. Sorting for transparency
Additive particles (fire, sparks) don't need sorting — blending is order-independent. Alpha-blended particles (smoke) do.
GPU bitonic sort: O(n log² n) parallel. Runs in a compute pass pre-render. 1M particles sort in ~1ms.
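A CPU model of the bitonic network makes the pass structure visible. Each (k, j) pair below is one compare-and-swap sweep over all elements — on the GPU that's one compute dispatch, O(log² n) dispatches total. This sketch sorts depths descending, i.e. back-to-front for alpha blending:

```javascript
function bitonicSortDescending(depth /* Float32Array, length = power of two */) {
  const n = depth.length;
  for (let k = 2; k <= n; k <<= 1) {        // size of bitonic subsequences
    for (let j = k >> 1; j > 0; j >>= 1) {  // compare distance within them
      for (let i = 0; i < n; i++) {         // one "thread" per element
        const partner = i ^ j;
        if (partner > i) {
          const up = (i & k) === 0;         // comparator direction per block
          const swap = up ? depth[i] < depth[partner]
                          : depth[i] > depth[partner];
          if (swap) {
            const t = depth[i]; depth[i] = depth[partner]; depth[partner] = t;
          }
        }
      }
    }
  }
}
```

In a real system the same swaps are applied to particle indices, not raw depths, so the render pass can draw in sorted order.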
7. Live demo — 200k fire particles
Faked GPU simulation: the demo draws Points while the CPU updates a flat buffer once per frame (still plenty of performance at 200k for demo purposes). The shape of the code is identical to a real compute shader, so porting it to actual compute is mechanical.
8. Force fields
At each sim step, accumulate forces:
- Gravity: v += g * dt
- Wind: v += windField(p) * dt
- Curl noise: v += curlNoise(p * scale) * strength (organic swirls)
- Attractor: v += normalize(attractor - p) * strength / dist²
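A minimal sketch of one sim step accumulating those forces (names are illustrative; curl noise is omitted since it needs a noise field, and the wind field is a constant here):

```javascript
function applyForces(p, v, dt, { g, wind, attractor, strength }) {
  // gravity: constant downward acceleration
  v[1] += g * dt;
  // wind: constant field standing in for windField(p)
  v[0] += wind[0] * dt; v[1] += wind[1] * dt; v[2] += wind[2] * dt;
  // attractor: pull toward a point, strength falling off with distance²
  const dx = attractor[0] - p[0], dy = attractor[1] - p[1], dz = attractor[2] - p[2];
  const d2 = dx * dx + dy * dy + dz * dz + 1e-6; // avoid divide-by-zero
  const d = Math.sqrt(d2);
  const f = (strength / d2) * dt;
  v[0] += (dx / d) * f; v[1] += (dy / d) * f; v[2] += (dz / d) * f;
}
```

In the GPU version this body runs per particle inside the compute pass, before the position integration.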
9. Collision via depth buffer
For ground/obstacle collision, sample last frame's depth buffer and compare the particle's depth against the scene (helper names here are illustrative):
vec2 uv = toScreenUv(particle.pos);
float sceneDepth = texture(uDepthBuffer, uv).r;
if (particle.depth > sceneDepth) {
  // particle is behind scene geometry: bounce with damping
  particle.velocity.y = -particle.velocity.y * 0.5;
  // push back onto the surface — reproject the depth to world space;
  // raw depth is not a world-space coordinate, so don't add eps to it directly
  particle.position = reconstructWorldPos(uv, sceneDepth) + vec3(0.0, eps, 0.0);
}
10. Who ships this
- Unreal Niagara: fully GPU. Node graph UI. Fluids, VAT, everything.
- Unity VFX Graph: same concept, HDRP required.
- UE5 Niagara Fluids: millions of sparks with fluid-sim drag.
- Three.js: three.quarks and three-nebula are CPU-side. Roll your own for TSL compute.
11. Takeaways
- CPU: 10k particle ceiling. GPU: millions.
- Compute pass updates state. Render pass draws. No CPU round trip.
- Billboarding in vertex shader expands points to camera-facing quads.
- Additive blend needs no sort. Alpha blend needs bitonic sort.
- Depth-buffer collision is cheap and convincing.