
Core Concepts

Before building with Vault, it helps to understand the three primitives that everything else is built on: workspaces, models, and the inference pipeline.

Workspaces

A workspace is an isolated environment with its own API keys, usage quotas, and model access. Use separate workspaces for production, staging, and local development so usage and costs stay segmented.

lib/vault.ts
// Production client
export const vaultProd = new VaultClient({
  apiKey:    process.env.VAULT_API_KEY_PROD!,
  workspace: 'production',
});

// Staging client
export const vaultStaging = new VaultClient({
  apiKey:    process.env.VAULT_API_KEY_STAGING!,
  workspace: 'staging',
});
Each workspace has independent rate limits. Running evals or batch jobs in a staging workspace will not affect your production quota.
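The non-null assertions (`!`) in lib/vault.ts above silence the compiler, so a missing key only surfaces on the first request. A minimal sketch of a startup check follows. `resolveWorkspace` and `KEY_VARS` are hypothetical names, not part of the Vault SDK, and the returned object only assumes the `apiKey` and `workspace` fields shown above.

```typescript
// Hypothetical helper, not part of the Vault SDK.
type Environment = 'production' | 'staging';

interface WorkspaceConfig {
  workspace: string; // workspace name, as passed to VaultClient above
  apiKey: string;    // key read from the matching environment variable
}

// Environment variable names follow lib/vault.ts above.
const KEY_VARS: Record<Environment, string> = {
  production: 'VAULT_API_KEY_PROD',
  staging: 'VAULT_API_KEY_STAGING',
};

function resolveWorkspace(
  env: Environment,
  vars: Record<string, string | undefined>,
): WorkspaceConfig {
  const varName = KEY_VARS[env];
  const apiKey = vars[varName];
  if (!apiKey) {
    // Fail at startup rather than on the first request.
    throw new Error(`Missing ${varName} for workspace "${env}"`);
  }
  return { workspace: env, apiKey };
}
```

Under these assumptions, `new VaultClient(resolveWorkspace('staging', process.env))` would build the staging client and throw immediately if the key is not set.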

Models

Vault exposes a curated set of foundation models through a unified API surface. Each model has a fixed context window and a cost per token.

Model          Context window  Description
vault-3-turbo  128k            Fast, low cost. Best for structured extraction and classification.
vault-3-pro    200k            Highest accuracy. Best for complex reasoning and long documents.
vault-3-mini   32k             Ultra-low latency. Best for real-time or edge use cases.
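Because each context window is fixed, one way to choose a model is by input size. The sketch below picks the smallest window that fits, using the context windows listed above. It assumes "k" means 1,000 tokens and that the token estimate covers both the prompt and the expected output. `pickModel` is a hypothetical helper, and it ignores accuracy and latency needs, which may matter more than size for a given task.

```typescript
// Context windows from the model list above, assuming "k" means 1,000 tokens.
const CONTEXT_WINDOWS = {
  'vault-3-mini': 32_000,
  'vault-3-turbo': 128_000,
  'vault-3-pro': 200_000,
} as const;

type ModelId = keyof typeof CONTEXT_WINDOWS;

// Hypothetical helper: returns the model with the smallest window that fits.
function pickModel(estimatedTokens: number): ModelId {
  const candidates = (Object.entries(CONTEXT_WINDOWS) as [ModelId, number][])
    .sort((a, b) => a[1] - b[1]); // smallest window first
  for (const [id, limit] of candidates) {
    if (estimatedTokens <= limit) return id;
  }
  throw new RangeError(`No model fits ${estimatedTokens} tokens`);
}
```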

Inference Pipeline

Every call to vault.infer() goes through a five-stage pipeline: validation, routing, queuing, inference, and response parsing.

1. Validation

Zod validates the request shape before any network call. Invalid inputs throw synchronously.

2. Routing

The client selects the model endpoint based on the workspace configuration and model ID.

3. Queuing

Requests are queued with concurrency control to stay within your rate limit envelope.

4. Inference

The model processes the prompt and returns raw token output.

5. Parsing

The response is deserialized and typed. Malformed responses throw a VaultParseError.
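The five stages above can be sketched in miniature as follows. This is an illustration under stated assumptions, not the SDK's implementation: validation is hand-rolled here instead of using Zod, the URL scheme, `ConcurrencyQueue`, and the `Send` callback are hypothetical, and the local `VaultParseError` class only stands in for the SDK's error of the same name.

```typescript
// Stand-in for the SDK's VaultParseError; the real class may differ.
class VaultParseError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'VaultParseError';
  }
}

interface InferRequest { model: string; prompt: string }
interface InferResponse { text: string }
// Hypothetical transport: performs the model call and returns the raw body.
type Send = (url: string, req: InferRequest) => Promise<string>;

// Stage 1: validation (hand-rolled here; the SDK uses Zod).
function validate(input: unknown): InferRequest {
  const r = input as Partial<InferRequest> | null;
  if (!r || typeof r.model !== 'string' || typeof r.prompt !== 'string') {
    throw new TypeError('Request needs string "model" and "prompt" fields');
  }
  return { model: r.model, prompt: r.prompt };
}

// Stage 2: routing (hypothetical URL scheme).
function route(workspace: string, model: string): string {
  return `https://api.example.invalid/${workspace}/models/${model}`;
}

// Stage 3: queuing. At most `limit` tasks run at once; others wait in FIFO order.
class ConcurrencyQueue {
  private active = 0;
  private waiting: Array<() => void> = [];
  constructor(private readonly limit: number) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // Wait until a finishing task hands over its slot.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next();   // pass the slot directly to the next waiter
      else this.active--; // nobody waiting: free the slot
    }
  }
}

// Stage 5: parsing. Malformed bodies raise VaultParseError.
function parse(raw: string): InferResponse {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (e) {
    throw new VaultParseError('Response is not valid JSON');
  }
  const text = (data as { text?: unknown } | null)?.text;
  if (typeof text !== 'string') {
    throw new VaultParseError('Response is missing a "text" field');
  }
  return { text };
}

// Not declared async, so a validation failure throws synchronously.
function infer(
  input: unknown,
  workspace: string,
  queue: ConcurrencyQueue,
  send: Send,
): Promise<InferResponse> {
  const req = validate(input);             // 1. validation
  const url = route(workspace, req.model); // 2. routing
  return queue
    .run(() => send(url, req))             // 3. queuing + 4. inference
    .then(parse);                          // 5. parsing
}
```

Handing a finished task's slot straight to the next waiter, rather than decrementing and re-incrementing the counter, keeps a newly arriving call from slipping in ahead of the waiter and exceeding the limit.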

Streaming

Streaming uses the same pipeline but returns an async generator instead of a single response. Each iteration yields a chunk with a delta string and a done flag.

app/api/stream/route.ts
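// lib/vault.ts exports vaultProd and vaultStaging; alias the one this route should use.
import { vaultProd as vault } from '@/lib/vault';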

export async function GET() {
  const stream = vault.stream({
    model: 'vault-3-turbo',
    prompt: 'Summarize the Vault SDK in three sentences.',
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
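    async start(controller) {
      try {
        for await (const chunk of stream) {
          // Forward each text delta to the response as UTF-8 bytes.
          controller.enqueue(encoder.encode(chunk.delta));
          if (chunk.done) break;
        }
        // Close exactly once, even if the generator ends without a done chunk.
        controller.close();
      } catch (err) {
        // Surface pipeline failures instead of leaving the response hanging.
        controller.error(err);
      }
    },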
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
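Outside an HTTP handler, the same chunk shape can be consumed directly. The sketch below collects a stream into a single string. `StreamChunk` mirrors the delta string and done flag described above, while `collect` and `mockStream` are hypothetical helpers, with `mockStream` standing in for `vault.stream()` so the example runs without network access.

```typescript
// Chunk shape as described above: a text delta plus a completion flag.
interface StreamChunk {
  delta: string;
  done: boolean;
}

// Hypothetical helper: concatenates all deltas into one string.
async function collect(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    text += chunk.delta;
    if (chunk.done) break; // stop reading once the final chunk arrives
  }
  return text;
}

// Mock generator standing in for vault.stream(); marks the last part as done.
async function* mockStream(parts: string[]): AsyncGenerator<StreamChunk> {
  for (let i = 0; i < parts.length; i++) {
    yield { delta: parts[i], done: i === parts.length - 1 };
  }
}
```

Because `collect` accepts any `AsyncIterable<StreamChunk>`, the same function should work with a real stream or with the mock in unit tests.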