Three.js From Zero · Article 05
Loading 3D Models (glTF)
Up to this point every mesh we've built has been code-generated — primitives, custom geometry, procedural textures. In the real world you load art. This article is about loading models you didn't author, specifically in glTF — the format Khronos designed for real-time 3D on the web, and the only format you should be shipping in 2026.
The demo loads a fully rigged glTF character with multiple embedded animation clips. Pick a clip from the dropdown and the character plays it with a smooth crossfade from whatever was playing before.
Where to actually see the crossfade: build a sequence in the State Machine panel (hit the preset button for an instant demo, or pick your own clips), press ▶ run, and watch both the weight bars on the left AND the chip highlight sliding through the sequence. Bump crossfade up to 1.5 seconds and the blend between each pair of states becomes slow enough to see every interpolated frame.
Why glTF (and not FBX, OBJ, or Collada)
glTF is sometimes called "the JPEG of 3D". It's a runtime-optimized format designed to be loaded fast and rendered as-is — not edited. Three.js has loaders for FBX, OBJ, DAE, STL, etc., but you should pick glTF by default:
| Format | Size | Features | Loader speed |
|---|---|---|---|
| .glb / .gltf | Tiny (with Draco/KTX2) | PBR, animation, skinning, morph targets, lights, cameras | Fast |
| .fbx | Big | Full DCC format, spotty PBR | Slow |
| .obj | Medium | Meshes + materials, no animation | Medium |
| .dae | Big (XML) | Legacy Collada | Slow |
Tell your artist: "export as glTF 2.0 binary (.glb)". If they can't, ask again. Every modern DCC tool supports it — Blender native, Maya via plugin, 3ds Max, C4D, Substance Painter.
.gltf vs .glb
Same format, two packagings:
- .gltf — JSON manifest + separate .bin file + image files. Human-readable, easy to inspect and diff.
- .glb — everything in one binary blob. Smaller, one HTTP request. Ship this.
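If you ever need to tell the two apart at runtime (say, when accepting user uploads), the binary container is easy to sniff: per the glTF 2.0 spec, a .glb file opens with a 12-byte header whose first four bytes are the ASCII magic 'glTF' (0x46546C67 as a little-endian uint32). A minimal check (the function name is ours, not a three.js API):

```javascript
// Sniff whether an ArrayBuffer holds a binary .glb.
// The GLB header is: magic 'glTF' (0x46546C67), version, total length.
function isGlb(buffer) {
  if (buffer.byteLength < 12) return false; // too short to hold the header
  const view = new DataView(buffer);
  return view.getUint32(0, true) === 0x46546c67; // 'glTF', little-endian
}
```

A JSON-first `.gltf` file fails this check immediately, since it starts with `{`.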
Loading a glTF
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';
const loader = new GLTFLoader();
const gltf = await loader.loadAsync('/models/character.glb');
scene.add(gltf.scene);
That's it. `gltf.scene` is a regular `Object3D` — you can position, rotate, scale, traverse, and parent it like any other group.
What's in the gltf object
gltf.scene // Object3D — the visible result
gltf.scenes // Object3D[] — if the file has multiple scenes (rare)
gltf.animations // AnimationClip[] — feed these to an AnimationMixer
gltf.cameras // Camera[] — if the artist included cameras
gltf.asset // { version, generator, ... } metadata
gltf.userData // arbitrary metadata from the artist
gltf.parser // internal parser (rarely needed)
The canonical boilerplate
loader.load(
url,
(gltf) => { // onLoad
scene.add(gltf.scene);
gltf.scene.traverse((o) => {
if (o.isMesh) {
o.castShadow = true;
o.receiveShadow = true;
}
});
},
(e) => console.log('loaded', (e.loaded / e.total * 100).toFixed(1) + '%'), // onProgress
(err) => console.error(err), // onError
);
The progress callback is useful for a loading bar. Three things to know:
- `e.total` is 0 if the server doesn't send `Content-Length`. Guard against divide-by-zero.
- Textures inside the glTF load after `onLoad` fires — there's a small frame or two where materials render white. Usually invisible.
- Errors are often CORS issues in development. Serve from the same origin or configure headers.
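The first point is worth encoding as a tiny helper. A sketch (the function name is ours, not a three.js API) that returns `null` instead of `NaN` when `Content-Length` is missing, so the loading bar can fall back to an indeterminate spinner:

```javascript
// Convert a ProgressEvent-like object into a 0–1 fraction, or null
// when the server sent no Content-Length (e.total === 0).
function progressFraction(e) {
  if (!e.total) return null;            // indeterminate — show a spinner
  return Math.min(e.loaded / e.total, 1); // clamp in case loaded overshoots
}
```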
Compressed glTF: Draco + KTX2
A raw glTF of a medium character is 5–15 MB. That's fine for a game but slow on a mobile landing page. Three compression options stack:
Draco — for meshes
import { DRACOLoader } from 'three/addons/loaders/DRACOLoader.js';
const draco = new DRACOLoader()
.setDecoderPath('https://www.gstatic.com/draco/versioned/decoders/1.5.7/');
loader.setDRACOLoader(draco);
Typical compression: 4–10× smaller geometry. CPU cost to decode: milliseconds. Author Draco'd glTF with gltfpack or glTF-Transform.
KTX2 — for textures inside the glTF
import { KTX2Loader } from 'three/addons/loaders/KTX2Loader.js';
const ktx2 = new KTX2Loader()
.setTranscoderPath('https://www.gstatic.com/basis/versioned/2022-04-04/')
.detectSupport(renderer);
loader.setKTX2Loader(ktx2);
Meshopt — for quantized geometry and animation
Meshopt compression (the codec gltfpack emits) needs its own decoder wired in:
import { MeshoptDecoder } from 'three/addons/libs/meshopt_decoder.module.js';
loader.setMeshoptDecoder(MeshoptDecoder);
With all three set (Draco + KTX2 + Meshopt), you load files that are 20–50× smaller than raw FBX and still get full PBR + animation.
Animation — the AnimationMixer pattern
glTF ships animations as an array of AnimationClip objects. To actually play
them you need an AnimationMixer — a per-object animation controller.
const mixer = new THREE.AnimationMixer(gltf.scene);
// An action is a scheduled playback of a clip.
const idleClip = THREE.AnimationClip.findByName(gltf.animations, 'Idle');
const idleAction = mixer.clipAction(idleClip);
idleAction.play();
// In the loop:
const clock = new THREE.Clock();
renderer.setAnimationLoop(() => {
mixer.update(clock.getDelta());
renderer.render(scene, camera);
});
What is a crossfade, actually?
Here's the mental model — and the reason the live weight bars in the demo exist:
Every AnimationAction has a weight between 0 and 1. Weight is how loudly that clip contributes to the final pose. The mixer blends all playing actions in proportion to their weights. If only Idle is playing with weight 1.0, you see pure Idle. If only Dance is playing with weight 1.0, you see pure Dance.
A crossfade is what happens in between:
- Start with Idle running at weight 1, Dance not playing.
- Call `Dance.fadeIn(0.8)` and `Idle.fadeOut(0.8)`.
- For the next 0.8 seconds, both clips play simultaneously. Dance's weight ramps 0 → 1; Idle's weight ramps 1 → 0.
- At any moment in that window, the mixer is blending the two poses together. At weight 0.5 / 0.5 it's the literal average of both poses.
- After 0.8 seconds Idle's weight is 0 (silent) and Dance's is 1 (fully visible).
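The weight ramp in those steps is nothing more than a clamped linear interpolation over the fade window. A standalone sketch (function and field names are ours, for illustration — three.js computes this internally):

```javascript
// Weights of the fading-out and fading-in actions at time t into
// a crossfade of the given duration (both in seconds).
function crossfadeWeights(t, duration = 0.8) {
  const k = Math.min(Math.max(t / duration, 0), 1); // clamp progress to [0, 1]
  return { out: 1 - k, in: k }; // out: old clip's weight, in: new clip's weight
}
```

At `t = 0.4` of a 0.8 s fade this returns `{ out: 0.5, in: 0.5 }` — the literal 50/50 average pose described above.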
Turn on the auto-loop button in the demo and watch the weight bars — the orange/green fill slides from one clip to the other, and that graph is the crossfade. If you drop the fade slider to 0.05s the bars snap instantly and the character pops. Push it to 1.5s and the blend is slow enough that you see a clearly interpolated hybrid pose for most of that second and a half.
This is why game character movement feels smooth. Every transition in every modern game — idle → walk, walk → run, run → jump, attack → idle — is a crossfade between clips, usually 0.1–0.3 seconds long. No keyframe editing at runtime, just weights.
From crossfades to a state machine
A state machine is what you build on top of crossfading to drive a character. It's two pieces:
- States — each one maps to an animation clip. Idle, Walking, Running, Jump, Attack.
- Transitions — the rules for switching. In a real game, transitions are triggered by inputs or conditions ("joystick pushed → Walking", "attack button → Attack", "attack clip finished → Idle").
The state machine only decides when to change state. The actual visual blending
is the mixer's job — it calls the same fadeIn / fadeOut pair you saw above on
every transition. That's literally the entire relationship between the two concepts.
The State Machine panel under the demo is a minimal sequencer built on this idea. Instead of game inputs driving transitions, a timer does — every N seconds (the "dwell" slider), it advances to the next state in the sequence you defined. Press preset for a sample sequence, then ▶ run. Watch:
- The active chip slides along the sequence — that's the "current state".
- The weight bars show two clips overlapping at every transition — that's the crossfade.
- The character smoothly morphs from Idle pose → Walking stride → Running → Dance → Wave → back to Idle on loop.
Try: build your own sequence with the add dropdown. Try a deliberately awkward one like Sitting → Jump → Death and see how much work the crossfade is doing to make even nonsense transitions look plausible.
The minimal state machine code
Stripped to essence, a FSM over animation clips is this:
const sm = { sequence: ['Idle', 'Walking', 'Running'], idx: 0, timer: null };
function next() {
const name = sm.sequence[sm.idx % sm.sequence.length];
playClip(name); // the crossfade helper from above
sm.idx++;
sm.timer = setTimeout(next, 2000);
}
next();
Everything else — chips, add/remove UI, dwell control — is UI. The core is 7 lines.
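Strip the timer away too and what's left is a modulo counter over the sequence. A timer-free version of the same stepper (our naming), handy for unit-testing the wrap-around without `setTimeout`:

```javascript
// Returns a function that yields the next state name each call,
// wrapping back to the start of the sequence forever.
function makeStepper(sequence) {
  let idx = 0;
  return () => sequence[idx++ % sequence.length];
}
```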
Two separate times: dwell vs crossfade
The sequencer in the demo has two time-based controls, and they confuse people.
| Control | What it sets | Lives on |
|---|---|---|
| crossfade | Blend duration between two clips | The action (`fadeIn` / `fadeOut` argument) |
| dwell | How long each state holds before moving on | The state machine (`setTimeout` delay) |
Rule of thumb: dwell should be larger than crossfade. If dwell = 0.3s and crossfade = 0.8s, the next transition starts before the previous one has finished — fades pile up and the character looks twitchy. In the demo, the default is dwell 2s, crossfade 0.6s. The character has ~1.4s of "pure state" between blends.
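Put as arithmetic, the "pure state" window is simply dwell minus crossfade, floored at zero. A one-line sketch (helper name is ours):

```javascript
// Milliseconds of unblended pose a state holds between crossfades.
// Zero means the next fade starts before the previous one finishes.
function pureStateMs(dwellMs, fadeMs) {
  return Math.max(dwellMs - fadeMs, 0);
}
```

With the demo defaults this gives `pureStateMs(2000, 600) === 1400` — the ~1.4 s of pure state mentioned above.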
A full sequencer, 40 lines
This is the complete demo logic, stripped of UI-only concerns:
// One state = one clip name.
const sm = {
sequence: [],
idx: 0,
timer: null,
running: false,
};
function smAdd(name) { sm.sequence.push(name); renderChips(); }
function smRemove(i) { sm.sequence.splice(i, 1); renderChips(); }
function smClear() { sm.sequence = []; renderChips(); }
function smStart(dwellMs = 2000) {
if (sm.running || !sm.sequence.length) return;
sm.running = true;
sm.idx = 0;
const step = () => {
const name = sm.sequence[sm.idx % sm.sequence.length];
playClip(name); // uses mixer crossfade
renderChips(); // highlight current chip
sm.idx++;
sm.timer = setTimeout(step, dwellMs);
};
step();
}
function smStop() {
sm.running = false;
if (sm.timer) { clearTimeout(sm.timer); sm.timer = null; }
renderChips();
}
Rendering the chip row
The visible list of states in the demo is rendered every time the sequence or the active index changes — there's no framework, just a function that rewrites innerHTML:
function renderChips() {
container.innerHTML = '';
sm.sequence.forEach((name, i) => {
const chip = document.createElement('span');
chip.className = 'chip' + (sm.running && i === sm.idx ? ' active' : '');
chip.innerHTML =
`<span class="idx">${i + 1}</span>` +
`<span>${name}</span>` +
`<button class="x">×</button>`;
chip.querySelector('.x').addEventListener('click', () => smRemove(i));
container.appendChild(chip);
});
}
Note the active class — that's what paints the currently-running state
green. When smStart advances, it calls renderChips() after
incrementing sm.idx, which re-runs this function and the green highlight moves
to the next chip.
The "don't stomp on me" pattern for the main dropdown
There's a subtle bug the demo avoids: the state machine programmatically sets
ui.clip.value = name on every transition, and the main dropdown has a
change listener that also calls playClip. If the programmatic update ever
fires that listener, each call re-triggers the other — a re-entrancy loop.
Fix: a suppression flag:
let suppressClipChange = false;
ui.clip.addEventListener('change', () => {
if (suppressClipChange) return;
if (sm.running) smStop(); // user took manual control
playClip(ui.clip.value);
});
// Inside the sequencer:
suppressClipChange = true;
ui.clip.value = name;
suppressClipChange = false;
playClip(name);
Same pattern applies any time you're syncing a control's value from code — form inputs, segmented toggles, even the URL hash.
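The flag generalizes into a small reusable helper — a sketch (our naming, not a library API) that raises the flag for the duration of a callback and guarantees it drops again even if the callback throws:

```javascript
// Reusable suppression flag: listeners check isSuppressed() to tell
// "this change came from code" apart from "the user did this".
function makeSilentSetter() {
  let suppressed = false;
  return {
    isSuppressed: () => suppressed,
    silently(fn) {
      suppressed = true;
      try { fn(); }               // perform the programmatic mutation
      finally { suppressed = false; } // always drop the flag, even on throw
    },
  };
}
```

Usage mirrors the sequencer above: `guard.silently(() => { ui.clip.value = name; })`, with the change listener bailing out when `guard.isSuppressed()` is true.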
From timer-driven to event-driven
A real game FSM isn't timer-driven — transitions are triggered by events. The code
structure is identical, you just replace setTimeout with event handlers:
const states = {
Idle: { clip: 'Idle', next: { run: 'Running', attack: 'Punch' } },
Running: { clip: 'Running', next: { stop: 'Idle', attack: 'Punch' } },
Punch: { clip: 'Punch', next: { finished: 'Idle' } },
};
let current = 'Idle';
playClip(states[current].clip);
function trigger(event) {
const target = states[current].next[event];
if (target) {
current = target;
playClip(states[current].clip);
}
}
// Hook up inputs:
addEventListener('keydown', (e) => {
if (e.key === 'Shift') trigger('run');
if (e.key === ' ') trigger('attack');
});
// Clip-finished events from the mixer:
mixer.addEventListener('finished', (e) => {
if (e.action.getClip().name === 'Punch') trigger('finished');
});
That's a game-ready FSM in 25 lines. Transitions are guarded — if you're in
Punch and press Shift, nothing happens because Punch.next doesn't
include a run event. That guarding is the whole reason you'd build a state
machine instead of just "play whatever the user asks for".
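Because the transition table is plain data, the machine's logic is testable with playClip stubbed out. A factory-style sketch of the same FSM (makeFsm is our name, not a three.js API) — identical transition and guarding logic, just without module-level state:

```javascript
// Guarded FSM over clip names. `playClip` is injected so the logic
// can run without a mixer: pass the real crossfade helper in production.
function makeFsm(states, initial, playClip) {
  let current = initial;
  playClip(states[current].clip); // play the initial state's clip
  return {
    get state() { return current; },
    trigger(event) {
      const target = states[current].next[event];
      if (!target) return false;  // guarded: invalid events are ignored
      current = target;
      playClip(states[current].clip);
      return true;
    },
  };
}
```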
Stopping cleanly
Every route change, scene change, or model unload needs the sequencer to stop:
function teardown() {
smStop(); // clear the setTimeout
mixer.stopAllAction(); // stop every clip the mixer is holding
mixer.uncacheRoot(gltfScene);// release cached bindings (tracks to targets)
disposeGltf(gltfScene); // dispose geom, materials, textures
}
The crossfade code again, in full
For reference, here's the playClip helper that the state machine calls:
let current = null;
function playClip(name, fade = 0.4) {
  const clip = THREE.AnimationClip.findByName(gltf.animations, name);
  if (!clip) return; // name typo — see the pitfalls list below
  const next = mixer.clipAction(clip);
  if (current === next) return;
  next.reset().setEffectiveWeight(1).fadeIn(fade).play();
  current?.fadeOut(fade);
  current = next;
}
Three notes:
- `fadeIn` and `fadeOut` work on action weights over time. Both running at once = a crossfade.
- `reset()` before `fadeIn` resets time AND weight, avoiding stale state.
- `action.timeScale = 0.5` plays the clip at half speed. Negative values play it backwards.
Loop modes
action.setLoop(THREE.LoopOnce); // play once, stop
action.setLoop(THREE.LoopRepeat); // default
action.setLoop(THREE.LoopPingPong); // forward, reverse, forward, ...
action.clampWhenFinished = true; // hold last frame instead of returning to rest
Traversing the loaded scene
After load you'll often want to find specific parts of the model — tag the hero mesh for outlining, hide placeholder geometry the artist left in, or swap a texture:
gltf.scene.traverse((o) => {
if (o.isMesh) {
o.castShadow = true;
o.receiveShadow = true;
// Find by name (set in the DCC tool)
if (o.name === 'Helmet_Screen') {
o.material.emissiveIntensity = 2;
}
// Or enumerate all materials in the file:
// console.log(o.name, o.material.name);
}
});
// Or grab by name directly:
const head = gltf.scene.getObjectByName('Head');
Disposing a loaded model cleanly
Because glTFs pull in geometry, materials, textures, and skeletons, unloading them correctly matters. A full traversal:
function disposeGltf(gltfScene) {
gltfScene.traverse((o) => {
if (o.geometry) o.geometry.dispose();
if (o.material) {
const mats = Array.isArray(o.material) ? o.material : [o.material];
for (const m of mats) {
for (const key in m) {
const v = m[key];
if (v && v.isTexture) v.dispose();
}
m.dispose();
}
}
});
scene.remove(gltfScene);
}
The for...in loop catches every texture slot on the material (map, normalMap,
roughnessMap, etc.) without you having to enumerate them by hand.
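The sweep works on anything shaped like a material, which also makes it easy to sanity-check without three.js. A mock-friendly extraction of just the for...in part (function name is ours):

```javascript
// Dispose every texture-valued property on a material-like object
// and return the property names that were disposed.
function disposeMaterialTextures(material) {
  const disposed = [];
  for (const key in material) {
    const v = material[key];
    if (v && v.isTexture) { // every three.js Texture sets isTexture = true
      v.dispose();
      disposed.push(key);
    }
  }
  return disposed;
}
```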
Free model sources
| Site | What's there | License |
|---|---|---|
| Poly Haven | High-end PBR models, all glTF-ready | CC0 (free even commercially) |
| Kenney Assets | Low-poly game assets, huge libraries | CC0 |
| Quaternius | Stylized low-poly, huge collection | CC0 |
| Sketchfab | Largest library, quality varies | Mixed — filter for CC0 / CC-BY |
| Khronos samples | Reference glTFs (DamagedHelmet, etc.) | Various, all open |
Common first-time pitfalls
- Model loads but is invisible. It's huge or tiny — no camera can see it. `console.log(gltf.scene)`, check the scale. Or use a `BoxHelper` to find it.
- Model is black / no lighting. You forgot `scene.environment`. PBR materials need it.
- Animations don't play. You created the mixer but don't call `mixer.update(delta)` every frame.
- Clip not found. The name in the dropdown doesn't match. `console.log(gltf.animations.map(c => c.name))` to see what's there.
- Model flickers for a frame. Textures arrive after `onLoad`. Hide the scene for one frame, or use `manager.onLoad`.
- CORS errors. Serve from the same origin, or enable `Access-Control-Allow-Origin` headers. `file://` won't work.
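For the "clip not found" pitfall, you can bake the debugging tip into the lookup itself — a sketch (the helper name is ours, not a three.js API) that fails loudly with the list of clip names actually in the file:

```javascript
// Look up a clip by name; on a miss, throw an error that lists
// every available clip name, so typos diagnose themselves.
function findClipOrThrow(clips, name) {
  const clip = clips.find((c) => c.name === name);
  if (!clip) {
    const available = clips.map((c) => c.name).join(', ');
    throw new Error(`Clip "${name}" not found. Available: ${available}`);
  }
  return clip;
}
```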
Exercises
- Load Khronos's DamagedHelmet glTF, apply an HDR envMap (Article 04 recipe), and get the final render looking like the reference image.
- Download a Poly Haven model, compress it with glTF-Transform (Draco + KTX2), and compare file sizes before/after. Write down the number.
- Take the character in the demo and add a "ping-pong" button that plays Idle → Walking → Running forward, then reverses back to Idle. Hint: `LoopOnce` + `finished` event on the mixer.
What's next
Article 06 — Interactivity: Raycaster, Controls, Events. How to click on things in 3D, build camera controls, drag objects around, and manage pointer events unambiguously across mouse / touch / pen.