
LLM Integration

A step-by-step guide for integrating the Vault inference SDK into a Next.js 15 App Router application. Covers client setup, prompt engineering, streaming, and production error handling.

Overview

This guide builds a production-ready inference layer: a singleton Vault client, a server action that calls the inference API, and a streaming UI component that renders tokens as they arrive.

1. Initialize a singleton VaultClient in lib/vault.ts
2. Write a server action that calls vault.infer()
3. Add streaming with vault.stream() and a ReadableStream
4. Handle errors with VaultError type guards

Setup

Create a singleton client. Instantiating per-request wastes connections and bypasses the built-in connection pool.

1. Install the SDK

npm install @vault/sdk

2. Add environment variables

# .env.local
VAULT_API_KEY=vlt_live_xxxxxxxxxxxxxxxxxxxxxxxx
VAULT_WORKSPACE=my-workspace

3. Create the client singleton
// lib/vault.ts
import { VaultClient } from '@vault/sdk';

export const vault = new VaultClient({
  apiKey:    process.env.VAULT_API_KEY!,
  workspace: process.env.VAULT_WORKSPACE ?? 'default',
  timeout:   30_000,
});

First Request

Use a Next.js server action so the API key never reaches the client. Call vault.infer() with your model and prompt.

app/actions/infer.ts
'use server';
import { vault } from '@/lib/vault';

export async function runInference(prompt: string): Promise<string> {
  const result = await vault.infer({
    model:     'vault-3-turbo',
    prompt,
    maxTokens: 256,
  });
  return result.text;
}
Keep maxTokens low in development to reduce cost and latency during iteration.
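Server actions accept arbitrary client input, so it is worth validating the prompt before spending tokens. A minimal sketch; normalizePrompt and its 4,000-character cap are illustrative choices, not part of @vault/sdk:

```typescript
// Sketch: validate and normalize untrusted prompt input before inference.
// The length cap (4000 chars) is illustrative, not a Vault requirement.
export function normalizePrompt(raw: unknown, maxChars = 4000): string {
  if (typeof raw !== 'string') throw new TypeError('prompt must be a string');
  const prompt = raw.trim();
  if (prompt.length === 0) throw new RangeError('prompt is empty');
  // Truncate rather than reject: keeps the UX forgiving while bounding cost.
  return prompt.slice(0, maxChars);
}
```

Call it at the top of the server action (`const prompt = normalizePrompt(input)`) so malformed input fails before the API call.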

Streaming

For long completions, stream the response via a Route Handler and consume it with the Fetch API on the client.

app/api/infer/route.ts
import { vault } from '@/lib/vault';

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const stream = vault.stream({ model: 'vault-3-turbo', prompt });
  const encoder = new TextEncoder();

  return new Response(
    new ReadableStream({
      async start(controller) {
        try {
          // Forward each token as it arrives from the SDK iterator.
          for await (const chunk of stream) {
            controller.enqueue(encoder.encode(chunk.delta));
          }
          // Close once the iterator is exhausted; closing on a per-chunk
          // done flag can leave the stream open if no such chunk arrives.
          controller.close();
        } catch (err) {
          controller.error(err);
        }
      },
    }),
    { headers: { 'Content-Type': 'text/plain; charset=utf-8' } },
  );
}
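On the client, the route's byte stream can be consumed with fetch and a ReadableStream reader. A minimal sketch; readTokens is an illustrative helper, not part of the SDK:

```typescript
// Sketch: decode a byte stream token by token as chunks arrive.
// readTokens is an illustrative helper, not part of @vault/sdk.
export async function readTokens(
  stream: ReadableStream<Uint8Array>,
  onToken: (token: string) => void,
): Promise<string> {
  const decoder = new TextDecoder();
  const reader = stream.getReader();
  let full = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // { stream: true } keeps multi-byte characters split across chunks intact.
    const token = decoder.decode(value, { stream: true });
    full += token;
    onToken(token);
  }
  return full;
}

// Usage in a client component (hypothetical handler):
// const res = await fetch('/api/infer', {
//   method: 'POST',
//   body: JSON.stringify({ prompt }),
// });
// await readTokens(res.body!, (t) => setText((prev) => prev + t));
```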

Error Handling

The SDK throws typed errors. Use isVaultError to distinguish SDK errors from unexpected throws.

app/actions/infer.ts
'use server';
import { isVaultError } from '@vault/sdk';
import { vault } from '@/lib/vault';

export async function runInference(prompt: string): Promise<string> {
  try {
    const result = await vault.infer({ model: 'vault-3-turbo', prompt });
    return result.text;
  } catch (err) {
    if (isVaultError(err)) {
      if (err.status === 429) {
        // Rate limited: back off and retry
      }
      if (err.status === 401) {
        // API key invalid: alert and stop
      }
      console.error('Vault error:', err.code, err.message);
    }
    throw err; // Re-throw unexpected errors
  }
}
The SDK retries 429 and 5xx responses automatically. You only need to handle errors that exhaust the retry budget.
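If you want retries beyond the SDK's built-in budget, a generic exponential-backoff wrapper is one option. A sketch; withRetry and its defaults are illustrative, not part of @vault/sdk:

```typescript
// Sketch: retry an async operation with exponential backoff.
// withRetry is a generic illustrative helper, not part of @vault/sdk.
export async function withRetry<T>(
  fn: () => Promise<T>,
  opts: {
    retries?: number;
    baseMs?: number;
    isRetryable?: (err: unknown) => boolean;
  } = {},
): Promise<T> {
  const { retries = 3, baseMs = 250, isRetryable = () => true } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries || !isRetryable(err)) throw err;
      // Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```

With the SDK's typed errors you might pass `isRetryable: (err) => isVaultError(err) && err.status === 429` so only rate limits are retried.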