Three.js From Zero · Article s10-04
LLM-Driven NPCs
Claude or GPT as the NPC's brain. Function calling maps model output to 3D actions; tone, memory, and personality come from the system prompt. Players talk, characters respond and act.
1. The architecture
// Per turn:
const playerInput = 'Can you take me to the armory?';
const messages = [
  { role: 'system', content: CHARACTER_PROMPT },
  ...history,
  { role: 'user', content: playerInput },
];
const tools = [
  { name: 'walk_to', parameters: { location: 'string' } },
  { name: 'face_player', parameters: {} },
  { name: 'play_emote', parameters: { emote: 'string' } },
];
const response = await llm.chat({ messages, tools });
// The LLM replies with dialog plus an optional function call
if (response.tool_call) invokeInScene(response.tool_call);
speak(response.content);
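The `invokeInScene` step can be sketched as a plain dispatcher. The NPC methods here (`walkTo`, `facePlayer`, `playEmote`) are assumptions — wire them to your own pathfinding and animation code. The tool call is normalized to `{ name, arguments }`; OpenAI's format nests these under `call.function`.

```javascript
// Route a tool call from the model to NPC actions in the scene.
// npc methods are assumed hooks into your Three.js animation code.
function invokeInScene(npc, toolCall) {
  const args = toolCall.arguments ? JSON.parse(toolCall.arguments) : {};
  switch (toolCall.name) {
    case 'walk_to':     npc.walkTo(args.location); break;  // pathfind + walk clip
    case 'face_player': npc.facePlayer(); break;           // turn toward camera/player
    case 'play_emote':  npc.playEmote(args.emote); break;  // one-shot animation clip
    default: console.warn(`Unknown tool: ${toolCall.name}`);
  }
}
```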
2. System prompt (personality)
const CHARACTER_PROMPT = `
You are Grenna, a 55-year-old dwarven blacksmith in the town of Fenhold.
You're gruff but kind. Knowledgeable about smithing and local gossip.
Available locations: forge, tavern, market, armory, my home.
Available emotes: wave, nod, shrug, laugh, sigh.
Respond in 1-2 sentences. Call tools to move or emote naturally.
NEVER break character. NEVER mention you're an AI.
`;
3. Function calling (OpenAI format)
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  tools: [
    { type: 'function', function: {
      name: 'walk_to',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location'],
      },
    }},
  ],
});
const call = response.choices[0].message.tool_calls?.[0];
if (call?.function.name === 'walk_to') {
  const args = JSON.parse(call.function.arguments);
  npc.walkTo(args.location);
}
4. Memory
- Short-term: last 10-20 turns in messages array.
- Long-term: summary inserted into system prompt. "Summarize this conversation" every N turns.
- World state: inject scene facts into system prompt.
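The three layers can live in one small helper. A sketch — `NpcMemory` and its `summarize` hook are assumptions; in production, `summarize` would be one extra LLM call ("Summarize this conversation"):

```javascript
// Rolling memory: keep the last maxTurns messages verbatim, fold older
// ones into a summary string that is injected into the system prompt.
class NpcMemory {
  constructor(maxTurns = 20, summarize = msgs => msgs.map(m => m.content).join(' ')) {
    this.maxTurns = maxTurns;
    this.summarize = summarize;  // stand-in for an LLM summarization call
    this.history = [];
    this.summary = '';
  }
  add(role, content) {
    this.history.push({ role, content });
    if (this.history.length > this.maxTurns) {
      const old = this.history.splice(0, this.history.length - this.maxTurns);
      this.summary += ' ' + this.summarize(old);  // long-term memory
    }
  }
  // Build the messages array for the next turn; worldState = scene facts.
  messages(characterPrompt, worldState = '') {
    const system = [
      characterPrompt,
      worldState && `World state: ${worldState}`,
      this.summary && `Earlier in this conversation: ${this.summary.trim()}`,
    ].filter(Boolean).join('\n');
    return [{ role: 'system', content: system }, ...this.history];
  }
}
```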
5. Latency budgets
- GPT-4 (full): ~2s per turn. Feels slow in-game.
- GPT-4o-mini / Claude Haiku: ~500-800ms. Acceptable.
- Llama 3 8B running locally: ~300ms. Best latency, but needs WebGPU.
- Stream tokens: start TTS / lip sync on first token, hide latency.
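Streaming can be decoupled from the SDK: with `stream: true` the OpenAI client returns an async iterable of chunks, so a consumer like this can fire TTS / lip sync on the first delta. A sketch — `onFirstToken` and `onToken` are hypothetical callbacks:

```javascript
// Consume token deltas from any async iterable of chat-completion chunks.
// onFirstToken fires once (start TTS here); onToken receives every delta.
async function consumeStream(stream, { onFirstToken, onToken }) {
  let first = true;
  let text = '';
  for await (const chunk of stream) {
    const delta = chunk.choices?.[0]?.delta?.content ?? '';
    if (!delta) continue;
    if (first) { onFirstToken(delta); first = false; }  // hide latency: speak now
    onToken(delta);
    text += delta;
  }
  return text;  // full reply, for the history array
}
```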
6. Live demo — NPC chat (mock)
A mock LLM (rule-based) responds. In production: hit your API with the pattern above.
Grenna (dwarven blacksmith, Fenhold) is ready to chat.
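A rule-based mock along these lines (the rules here are illustrative, not the demo's actual code) returns the same `{ content, tool_call }` shape as a real chat API, so the scene code stays backend-agnostic:

```javascript
// Keyword-matching mock brain with the same output shape as the LLM path.
function mockLLM(playerInput) {
  const input = playerInput.toLowerCase();
  for (const loc of ['forge', 'tavern', 'market', 'armory']) {
    if (input.includes(loc)) {
      return {
        content: `Aye, follow me to the ${loc}.`,
        tool_call: { name: 'walk_to', arguments: JSON.stringify({ location: loc }) },
      };
    }
  }
  if (/hello|hi\b|greetings/.test(input)) {
    return {
      content: 'Well met. What do you need?',
      tool_call: { name: 'play_emote', arguments: '{"emote":"nod"}' },
    };
  }
  return { content: "Hmph. Speak plainly, I've a forge to run.", tool_call: null };
}
```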
7. Platforms (packaged)
- Inworld AI: personality + voice + memory.
- Convai: similar; Unity/Unreal SDKs.
- NVIDIA ACE: NPC framework, voice + facial.
- Replica Studios: AI voices.
8. Privacy / safety
- Never send user credentials.
- Rate-limit per player.
- Block prompt-injection attempts ("ignore previous instructions…") with pre-filter.
- Content filter: OpenAI's moderation API before displaying.
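A sketch combining both filter layers — the regex patterns are illustrative, not exhaustive (treat them as defense in depth, not a guarantee), and `safeDisplay` assumes the OpenAI Node SDK's moderation endpoint:

```javascript
// Cheap pre-filter: catch obvious injection phrases before spending tokens.
const INJECTION_PATTERNS = [
  /ignore (all |any )?(previous|prior|above) instructions/i,
  /you are (now|no longer)/i,
  /system prompt/i,
];
function looksLikeInjection(text) {
  return INJECTION_PATTERNS.some((p) => p.test(text));
}

// Gate model output through the moderation API before display.
async function safeDisplay(openai, text) {
  const res = await openai.moderations.create({ input: text });
  return res.results[0].flagged ? '[removed]' : text;
}
```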
9. Cost
GPT-4o-mini: $0.15/M input tokens, $0.60/M output. An average turn of ~500 tokens (priced at the output rate as a worst case) ≈ $0.0003. At 50 turns/session, that's ~$0.015 per player — ~$15 per session for 1000 players. Manageable.
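Back-of-envelope check of those numbers; the per-turn figure assumes all 500 tokens bill at the output rate:

```javascript
// Worst-case cost estimate: every token billed at the $0.60/M output rate.
const OUT_PER_TOKEN = 0.60 / 1e6;           // dollars per output token
const perTurn = 500 * OUT_PER_TOKEN;        // ≈ $0.0003 per turn
const perSession = perTurn * 50;            // 50 turns/player ≈ $0.015
const per1000Players = perSession * 1000;   // ≈ $15 across 1000 players
```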
10. Takeaways
- LLM as NPC brain via chat API + function calling.
- System prompt = personality + available actions.
- Short-term memory in messages array. Long-term via summaries.
- GPT-4o-mini or Claude Haiku for latency.
- Stream tokens to start lip sync early.
- Content safety + rate limits critical.