Observability
Instrument your Vault integration with structured logging, distributed tracing, and cost tracking. Know what your models are doing in production before your users notice something is wrong.
Request Logging
Log every inference request with timing and token counts. Use structured JSON so logs are queryable.
lib/vault.ts
import { VaultClient } from '@vault/sdk';
const client = new VaultClient({
apiKey: process.env.VAULT_API_KEY!,
hooks: {
onRequest({ model, prompt }) {
console.log(JSON.stringify({
event: 'vault.request',
model,
promptChars: prompt.length, // character count, not tokens; real token counts arrive in onResponse
ts: Date.now(),
}));
},
onResponse({ model, usage, latencyMs }) {
console.log(JSON.stringify({
event: 'vault.response',
model,
inputTokens: usage.inputTokens,
outputTokens: usage.outputTokens,
latencyMs,
ts: Date.now(),
}));
},
},
});
export { client as vault };

Avoid logging prompt content in production. It may contain PII. Log token counts and model names only.
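If you still need to correlate repeated prompts across requests, one option is to log a short hash of the prompt instead of its content. A minimal sketch; the `promptFingerprint` helper is illustrative, not part of the SDK:

```typescript
import { createHash } from 'crypto';

// Hypothetical helper: a stable, non-reversible fingerprint of the prompt.
// Identical prompts produce identical fingerprints, so requests can be
// grouped in logs without the prompt text itself ever being written out.
export function promptFingerprint(prompt: string): string {
  return createHash('sha256').update(prompt).digest('hex').slice(0, 12);
}
```

Logging `promptFingerprint(prompt)` in the `onRequest` hook keeps logs queryable by prompt identity while remaining PII-free.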
Tracing
Attach a trace ID to each inference call to correlate it with the upstream request in your observability platform.
app/api/infer/route.ts
import { vault } from '@/lib/vault';
import { randomUUID } from 'crypto';
export async function POST(req: Request) {
const traceId = req.headers.get('x-trace-id') ?? randomUUID();
const result = await vault.infer({
model: 'vault-3-turbo',
prompt: await req.text(),
metadata: { traceId },
});
return Response.json({ text: result.text }, {
headers: { 'x-trace-id': traceId },
});
}

Cost Tracking
Use token counts from the response to calculate cost per request. Aggregate daily to track spend trends.
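Given the per-token prices listed below, cost per request is a multiply-and-sum over the usage counts. A sketch; the `requestCost` helper and the `PRICES` map are illustrative, not part of the SDK:

```typescript
// Per-million-token prices (USD), mirroring the pricing table in this section.
const PRICES: Record<string, { input: number; output: number }> = {
  'vault-3-turbo': { input: 0.5, output: 1.5 },
  'vault-3-pro': { input: 3.0, output: 15.0 },
  'vault-3-mini': { input: 0.1, output: 0.3 },
};

// Cost in USD for a single request, from the usage block of the response.
export function requestCost(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

Summing `requestCost` into a daily counter keyed by model gives the spend trend to alert on.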
vault-3-turbo: $0.50 / 1M input tokens, $1.50 / 1M output tokens
vault-3-pro: $3.00 / 1M input tokens, $15.00 / 1M output tokens
vault-3-mini: $0.10 / 1M input tokens, $0.30 / 1M output tokens

Alerts
Set up alerts on these signals to catch problems before they impact users.
p95 latency > 5s: Check model load; consider switching this endpoint to vault-3-mini.
error_rate > 1%: Inspect error codes. 429 means rate limit, 5xx means model instability.
daily_cost > threshold: Audit token usage. Common cause: prompts growing unbounded with context.
output_tokens spike: Check the maxTokens cap. A missing cap allows unbounded completions.
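The first three thresholds can be evaluated from aggregated metrics. A minimal sketch; the `Signals` shape and function name are assumptions, not a Vault API:

```typescript
interface Signals {
  p95LatencyMs: number; // from latencyMs in the response logs
  errorRate: number;    // failed requests / total requests
  dailyCostUsd: number; // aggregated from token counts and the price table
}

// Return the names of any alert conditions the current metrics window trips.
export function trippedAlerts(s: Signals, dailyCostThresholdUsd: number): string[] {
  const alerts: string[] = [];
  if (s.p95LatencyMs > 5000) alerts.push('p95_latency');
  if (s.errorRate > 0.01) alerts.push('error_rate');
  if (s.dailyCostUsd > dailyCostThresholdUsd) alerts.push('daily_cost');
  return alerts;
}
```

Detecting an output_tokens spike needs a baseline window rather than a fixed threshold, so it is omitted from this sketch.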