LLM Integration
A step-by-step guide for integrating the Vault inference SDK into a Next.js 15 App Router application. Covers client setup, prompt engineering, streaming, and production error handling.
Overview
This guide builds a production-ready inference layer: a singleton Vault client, a server action that calls the inference API, and a streaming UI component that renders tokens as they arrive.
Setup
Create a singleton client. Instantiating per-request wastes connections and bypasses the built-in connection pool.
npm install @vault/sdk
# .env.local
VAULT_API_KEY=vlt_live_xxxxxxxxxxxxxxxxxxxxxxxx
VAULT_WORKSPACE=my-workspace
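Optionally, fail fast at boot when a variable is missing instead of erroring mid-request. A small guard sketch (requireEnv is a hypothetical helper, not part of the SDK):

```typescript
// lib/env.ts — hypothetical helper; throws at startup rather than on first request
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In the client config below you could then write apiKey: requireEnv('VAULT_API_KEY') instead of the non-null assertion.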
// lib/vault.ts
import { VaultClient } from '@vault/sdk';
export const vault = new VaultClient({
  apiKey: process.env.VAULT_API_KEY!,
  workspace: process.env.VAULT_WORKSPACE ?? 'default',
  timeout: 30_000,
});
First Request
Use a Next.js server action so the API key never reaches the client. Call vault.infer() with your model and prompt.
'use server';
import { vault } from '@/lib/vault';
export async function runInference(prompt: string): Promise<string> {
  const result = await vault.infer({
    model: 'vault-3-turbo',
    prompt,
    maxTokens: 256,
  });
  return result.text;
}
Keep maxTokens low in development to reduce cost and latency during iteration.
Streaming
For long completions, stream the response via a Route Handler and consume it with the Fetch API on the client.
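On the client, the returned byte stream can be read incrementally with a ReadableStream reader. A minimal helper sketch (the function name and the /api/infer path are assumptions, adjust to match your route):

```typescript
// Reads a text/plain token stream and invokes onToken for each decoded chunk.
// Returns the full accumulated text once the stream ends.
export async function consumeStream(
  res: Response,
  onToken: (text: string) => void,
): Promise<string> {
  if (!res.ok || !res.body) throw new Error(`Stream failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const text = decoder.decode(value, { stream: true });
    full += text;
    onToken(text);
  }
  return full;
}
```

Call it with the Response from fetch('/api/infer', { method: 'POST', body: JSON.stringify({ prompt }) }) and append each token to component state as it arrives.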
import { vault } from '@/lib/vault';
export async function POST(req: Request) {
  const { prompt } = await req.json();
  const stream = vault.stream({ model: 'vault-3-turbo', prompt });
  const encoder = new TextEncoder();
  return new Response(
    new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of stream) {
            controller.enqueue(encoder.encode(chunk.delta));
          }
          controller.close(); // close once the SDK iterator is exhausted
        } catch (err) {
          controller.error(err); // surface upstream failures to the client
        }
      },
    }),
    { headers: { 'Content-Type': 'text/plain; charset=utf-8' } },
  );
}
Error Handling
The SDK throws typed errors. Use isVaultError to distinguish SDK errors from unexpected throws.
import { vault } from '@/lib/vault';
import { isVaultError } from '@vault/sdk';

try {
  const result = await vault.infer({ model: 'vault-3-turbo', prompt });
  return result.text;
} catch (err) {
  if (isVaultError(err)) {
    if (err.status === 429) {
      // Rate limited — back off and retry
    }
    if (err.status === 401) {
      // API key invalid — alert and stop
    }
    console.error('Vault error:', err.code, err.message);
  }
  throw err; // Re-throw unexpected errors
}
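The back-off-and-retry branch for 429s can be factored into a generic helper. A sketch (withBackoff, its attempt count, and its delays are illustrative defaults, not SDK behavior):

```typescript
// Retries fn with exponential backoff when isRetryable(err) is true.
// Delays are baseDelayMs, 2x, 4x, ... between attempts.
export async function withBackoff<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-retryable errors or once attempts are exhausted
      if (!isRetryable(err) || attempt === maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

For Vault, the retry predicate would be (e) => isVaultError(e) && e.status === 429, wrapping a call such as () => vault.infer({ model: 'vault-3-turbo', prompt }).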