
Observability

Instrument your Vault integration with structured logging, distributed tracing, and cost tracking. Know what your models are doing in production before your users notice something is wrong.

Request Logging

Log every inference request with timing and token counts. Use structured JSON so logs are queryable.

lib/vault.ts
import { VaultClient } from '@vault/sdk';

const client = new VaultClient({
  apiKey: process.env.VAULT_API_KEY!,
  hooks: {
    onRequest({ model, prompt }) {
      console.log(JSON.stringify({
        event: 'vault.request',
        model,
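        // prompt.length is a character count, not a token count.
        // Real token counts arrive with the response (usage.inputTokens).
        promptChars: prompt.length,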
        ts: Date.now(),
      }));
    },
    onResponse({ model, usage, latencyMs }) {
      console.log(JSON.stringify({
        event:         'vault.response',
        model,
        inputTokens:   usage.inputTokens,
        outputTokens:  usage.outputTokens,
        latencyMs,
        ts: Date.now(),
      }));
    },
  },
});

export { client as vault };
Avoid logging prompt content in production, since it may contain PII. Log only metadata such as model names, token counts, and latency.
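One way to enforce this is to pass every log record through an allowlist before serializing it. The following is a minimal sketch, not part of the Vault SDK: `SAFE_FIELDS` and `toSafeLog` are hypothetical names, and the field list assumes the log shapes used in the hooks above.

```typescript
// Hypothetical allowlist of metadata fields that are safe to log.
// Anything not listed here (for example a `prompt` field) is dropped.
const SAFE_FIELDS = [
  'event',
  'model',
  'inputTokens',
  'outputTokens',
  'latencyMs',
  'ts',
] as const;

// Copies only allowlisted fields into a new object and serializes it.
function toSafeLog(record: Record<string, unknown>): string {
  const safe: Record<string, unknown> = {};
  for (const field of SAFE_FIELDS) {
    if (field in record) {
      safe[field] = record[field];
    }
  }
  return JSON.stringify(safe);
}
```

An allowlist fails closed: a new field added to a log record stays out of the logs until someone deliberately adds it to the list.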

Tracing

Attach a trace ID to each inference call to correlate it with the upstream request in your observability platform.

app/api/infer/route.ts
import { vault } from '@/lib/vault';
import { randomUUID } from 'crypto';

export async function POST(req: Request) {
  const traceId = req.headers.get('x-trace-id') ?? randomUUID();

  const result = await vault.infer({
    model: 'vault-3-turbo',
    prompt: await req.text(),
    metadata: { traceId },
  });

  return Response.json({ text: result.text }, {
    headers: { 'x-trace-id': traceId },
  });
}

Cost Tracking

Use token counts from the response to calculate cost per request. Aggregate daily to track spend trends.

Model            Input (per 1M tokens)    Output (per 1M tokens)
vault-3-turbo    $0.50                    $1.50
vault-3-pro      $3.00                    $15.00
vault-3-mini     $0.10                    $0.30
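The arithmetic can be sketched as follows, using the prices from the table above. This is an illustrative sketch rather than an SDK feature: `PRICING`, `requestCostUsd`, `dailyCostUsd`, and `ResponseLog` are hypothetical names, and `ResponseLog` assumes records shaped like the `vault.response` log event from the Request Logging section.

```typescript
// USD per 1M tokens, copied from the pricing table.
type Pricing = { inputPerM: number; outputPerM: number };

const PRICING: Record<string, Pricing> = {
  'vault-3-turbo': { inputPerM: 0.5, outputPerM: 1.5 },
  'vault-3-pro': { inputPerM: 3.0, outputPerM: 15.0 },
  'vault-3-mini': { inputPerM: 0.1, outputPerM: 0.3 },
};

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Cost of a single request in USD.
function requestCostUsd(model: string, usage: Usage): number {
  const price = PRICING[model];
  if (!price) {
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (
    (usage.inputTokens * price.inputPerM +
      usage.outputTokens * price.outputPerM) /
    1_000_000
  );
}

// Assumed shape of a parsed `vault.response` log line.
interface ResponseLog extends Usage {
  model: string;
  ts: number; // epoch milliseconds, as written by Date.now()
}

// Sums request costs per UTC day, keyed by YYYY-MM-DD.
function dailyCostUsd(logs: ResponseLog[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const log of logs) {
    const day = new Date(log.ts).toISOString().slice(0, 10);
    totals.set(day, (totals.get(day) ?? 0) + requestCostUsd(log.model, log));
  }
  return totals;
}
```

For example, a `vault-3-pro` request with 2,000 input tokens and 500 output tokens costs (2,000 × 3.00 + 500 × 15.00) / 1,000,000 = $0.0135. Throwing on an unknown model, rather than returning zero, keeps an unpriced model from silently disappearing from the spend totals.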

Alerts

Set up alerts on these signals to catch problems before they impact users.

p95 latency > 5s: Check model load, and consider switching to vault-3-mini for this endpoint.
error_rate > 1%: Inspect error codes. 429 means rate limit, 5xx means model instability.
daily_cost > threshold: Audit token usage. Common cause: prompts growing unbounded with context.
output_tokens spike: Check the maxTokens cap. A missing cap allows unbounded completions.
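In practice these checks usually live in your observability platform, but the logic behind the first three signals can be sketched in a few lines. Everything here is an assumption for illustration: `RequestSample`, `Thresholds`, `percentile`, and `evaluateAlerts` are hypothetical names, the samples are assumed to cover one evaluation window, and the percentile uses the nearest-rank method. Spike detection for output tokens needs a baseline and is left out.

```typescript
// One request within the evaluation window (hypothetical shape).
interface RequestSample {
  latencyMs: number;
  ok: boolean; // false for 429, 5xx, or any other failed request
}

interface Thresholds {
  p95LatencyMs: number; // e.g. 5000 for the "p95 latency > 5s" signal
  errorRate: number;    // e.g. 0.01 for the "error_rate > 1%" signal
  dailyCostUsd: number; // a budget you choose
}

// Nearest-rank percentile: the smallest value with at least p% of
// the samples at or below it. Returns 0 for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Returns the names of the signals that exceeded their thresholds.
function evaluateAlerts(
  samples: RequestSample[],
  dailyCost: number,
  t: Thresholds,
): string[] {
  const fired: string[] = [];
  if (samples.length > 0) {
    const p95 = percentile(samples.map((s) => s.latencyMs), 95);
    if (p95 > t.p95LatencyMs) fired.push('p95_latency');

    const failures = samples.filter((s) => !s.ok).length;
    if (failures / samples.length > t.errorRate) fired.push('error_rate');
  }
  if (dailyCost > t.dailyCostUsd) fired.push('daily_cost');
  return fired;
}
```

Evaluating over a fixed window (for example, the last five minutes of requests) keeps a single slow request from paging anyone while still surfacing a sustained regression.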