Three.js From Zero · Article s10-04
LLM-Driven NPCs
Claude or GPT as the NPC's brain. Function calling maps model output to 3D actions; tone, memory, and personality come from the system prompt. Players talk, characters respond and act.
1. The architecture
// Per turn:
const playerInput = 'Can you take me to the armory?';
const messages = [
  { role: 'system', content: CHARACTER_PROMPT },
  ...history,
  { role: 'user', content: playerInput },
];
const tools = [
  { name: 'walk_to', parameters: { location: 'string' } },
  { name: 'face_player', parameters: {} },
  { name: 'play_emote', parameters: { emote: 'string' } },
];
const response = await llm.chat({ messages, tools });
// The LLM replies with dialog plus an optional function call
if (response.tool_call) invokeInScene(response.tool_call);
speak(response.content);
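The `invokeInScene` step can be sketched as a plain dispatcher. The NPC methods here (`walkTo`, `facePlayer`, `playEmote`) are assumptions — wire them to your own pathfinding and animation code. The tool call is normalized to `{ name, arguments }`; OpenAI's format nests these under `call.function`.

```javascript
// Route a tool call from the model to NPC actions in the scene.
// npc methods are assumed hooks into your Three.js animation code.
function invokeInScene(npc, toolCall) {
  const args = toolCall.arguments ? JSON.parse(toolCall.arguments) : {};
  switch (toolCall.name) {
    case 'walk_to':     npc.walkTo(args.location); break;  // pathfind + walk clip
    case 'face_player': npc.facePlayer(); break;           // turn toward camera/player
    case 'play_emote':  npc.playEmote(args.emote); break;  // one-shot animation clip
    default: console.warn(`Unknown tool: ${toolCall.name}`);
  }
}
```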
2. System prompt (personality)
const CHARACTER_PROMPT = `
You are Grenna, a 55-year-old dwarven blacksmith in the town of Fenhold.
You're gruff but kind. Knowledgeable about smithing and local gossip.
Available locations: forge, tavern, market, armory, my home.
Available emotes: wave, nod, shrug, laugh, sigh.
Respond in 1-2 sentences. Call tools to move or emote naturally.
NEVER break character. NEVER mention you're an AI.
`;
3. Function calling (OpenAI format)
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  tools: [
    { type: 'function', function: {
      name: 'walk_to',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location'],
      },
    }},
  ],
});
const call = response.choices[0].message.tool_calls?.[0];
if (call?.function.name === 'walk_to') {
  const args = JSON.parse(call.function.arguments);
  npc.walkTo(args.location);
}
4. Memory
- Short-term: last 10-20 turns in messages array.
- Long-term: summary inserted into system prompt. "Summarize this conversation" every N turns.
- World state: inject scene facts into system prompt.
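The three layers can live in one small helper. A sketch — `NpcMemory` and its `summarize` hook are assumptions; in production, `summarize` would be one extra LLM call ("Summarize this conversation"):

```javascript
// Rolling memory: keep the last maxTurns messages verbatim, fold older
// ones into a summary string that is injected into the system prompt.
class NpcMemory {
  constructor(maxTurns = 20, summarize = msgs => msgs.map(m => m.content).join(' ')) {
    this.maxTurns = maxTurns;
    this.summarize = summarize;  // stand-in for an LLM summarization call
    this.history = [];
    this.summary = '';
  }
  add(role, content) {
    this.history.push({ role, content });
    if (this.history.length > this.maxTurns) {
      const old = this.history.splice(0, this.history.length - this.maxTurns);
      this.summary += ' ' + this.summarize(old);  // long-term memory
    }
  }
  // Build the messages array for the next turn; worldState = scene facts.
  messages(characterPrompt, worldState = '') {
    const system = [
      characterPrompt,
      worldState && `World state: ${worldState}`,
      this.summary && `Earlier in this conversation: ${this.summary.trim()}`,
    ].filter(Boolean).join('\n');
    return [{ role: 'system', content: system }, ...this.history];
  }
}
```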
5. Latency budgets
- GPT-4 (full): ~2s per turn. Feels slow in-game.
- GPT-4o-mini / Claude Haiku: ~500-800ms. Acceptable.
- Llama 3 8B running locally: ~300ms. Best latency, but needs WebGPU.
- Stream tokens: start TTS / lip sync on first token, hide latency.
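Streaming can be decoupled from the SDK: with `stream: true` the OpenAI client returns an async iterable of chunks, so a consumer like this can fire TTS / lip sync on the first delta. A sketch — `onFirstToken` and `onToken` are hypothetical callbacks:

```javascript
// Consume token deltas from any async iterable of chat-completion chunks.
// onFirstToken fires once (start TTS here); onToken receives every delta.
async function consumeStream(stream, { onFirstToken, onToken }) {
  let first = true;
  let text = '';
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content ?? '';
    if (!delta) continue;
    if (first) { onFirstToken(delta); first = false; }  // hide latency: speak now
    onToken(delta);
    text += delta;
  }
  return text;  // full reply, for the history array
}
```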
6. Live demo — NPC chat (mock)
A mock LLM (rule-based) responds. In production: hit your API with the pattern above.
Grenna (dwarven blacksmith, Fenhold) is ready to chat.
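A rule-based mock along these lines (the rules here are illustrative, not the demo's actual code) returns the same `{ content, tool_call }` shape as a real chat API, so the scene code stays backend-agnostic:

```javascript
// Keyword-matching mock brain with the same output shape as the LLM path.
function mockLLM(playerInput) {
  const input = playerInput.toLowerCase();
  for (const loc of ['forge', 'tavern', 'market', 'armory']) {
    if (input.includes(loc)) {
      return {
        content: `Aye, follow me to the ${loc}.`,
        tool_call: { name: 'walk_to', arguments: JSON.stringify({ location: loc }) },
      };
    }
  }
  if (/hello|hi\b|greetings/.test(input)) {
    return {
      content: 'Well met. What do you need?',
      tool_call: { name: 'play_emote', arguments: '{"emote":"nod"}' },
    };
  }
  return { content: "Hmph. Speak plainly, I've a forge to run.", tool_call: null };
}
```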
7. Platforms (packaged)
- Inworld AI: personality + voice + memory.
- Convai: similar; Unity/Unreal SDKs.
- NVIDIA ACE: NPC framework, voice + facial.
- Replica Studios: AI voices.
8. Privacy / safety
- Never send user credentials.
- Rate-limit per player.
- Block prompt-injection attempts ("ignore previous instructions…") with pre-filter.
- Content filter: OpenAI's moderation API before displaying.
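A sketch combining both filter layers — the regex patterns are illustrative, not exhaustive (treat them as defense in depth, not a guarantee), and `safeDisplay` assumes the OpenAI Node SDK's moderation endpoint:

```javascript
// Cheap pre-filter: catch obvious injection phrases before spending tokens.
const INJECTION_PATTERNS = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /you are (now|no longer)/i,
  /system prompt/i,
];
function looksLikeInjection(text) {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}

// Gate model output through the moderation API before display.
async function safeDisplay(openai, text) {
  const res = await openai.moderations.create({ input: text });
  return res.results[0].flagged ? '[removed]' : text;
}
```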
9. Cost
GPT-4o-mini: $0.15/M input tokens, $0.60/M output. An average turn of ~500 tokens (priced at the output rate as a worst case) ≈ $0.0003. At 50 turns/session, that's ~$0.015 per player — ~$15 per session for 1000 players. Manageable.
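Back-of-envelope check of those numbers; the per-turn figure assumes all 500 tokens bill at the output rate:

```javascript
// Worst-case cost estimate: every token billed at the $0.60/M output rate.
const OUT_PER_TOKEN = 0.60 / 1e6;           // dollars per output token
const perTurn = 500 * OUT_PER_TOKEN;        // ≈ $0.0003 per turn
const perSession = perTurn * 50;            // 50 turns/player ≈ $0.015
const per1000Players = perSession * 1000;   // ≈ $15 across 1000 players
```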
10. Takeaways
- LLM as NPC brain via chat API + function calling.
- System prompt = personality + available actions.
- Short-term memory in messages array. Long-term via summaries.
- GPT-4o-mini or Claude Haiku for latency.
- Stream tokens to start lip sync early.
- Content safety + rate limits critical.