Core Concepts
Before building with Vault, it helps to understand the three primitives that everything else is built on: workspaces, models, and the inference pipeline.
Workspaces
A workspace is an isolated environment with its own API keys, usage quotas, and model access. Use separate workspaces for production, staging, and local development so usage and costs stay segmented.
```ts
// Production client
export const vaultProd = new VaultClient({
  apiKey: process.env.VAULT_API_KEY_PROD!,
  workspace: 'production',
});

// Staging client
export const vaultStaging = new VaultClient({
  apiKey: process.env.VAULT_API_KEY_STAGING!,
  workspace: 'staging',
});
```

Models
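With one client per workspace, the choice between them usually reduces to a lookup keyed on the deployment environment. The helper below is purely illustrative and not part of the Vault SDK; the assumption that local development shares the staging workspace is ours, not the library's.

```typescript
// Illustrative helper (not part of the Vault SDK): map a deployment
// environment name to the workspace its client should target.
const WORKSPACE_BY_ENV: Record<string, 'production' | 'staging'> = {
  production: 'production',
  staging: 'staging',
  development: 'staging', // assumption: local dev shares the staging workspace
};

export function workspaceFor(env: string): 'production' | 'staging' {
  // Fall back to staging for unrecognized environments, so a misconfigured
  // process can never silently spend against the production quota.
  return WORKSPACE_BY_ENV[env] ?? 'staging';
}
```

Defaulting to staging on an unknown environment is a deliberately conservative choice: a typo in `NODE_ENV` should fail cheap, not expensive.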
Vault exposes a curated set of foundation models through a unified API surface. Each model has a fixed context window and a cost per token.
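Because each model's context window is fixed, a caller can pick the smallest model whose window fits the request. This selection helper is a sketch of that idea, not an SDK feature; the window sizes are read from the model table, interpreted as plain token counts.

```typescript
// Sketch only: choose the smallest model whose context window fits the
// request. The windows come from the model table; interpreting "128k" as
// 128,000 tokens is an assumption.
const CONTEXT_WINDOWS: Record<string, number> = {
  'vault-3-mini': 32_000,
  'vault-3-turbo': 128_000,
  'vault-3-pro': 200_000,
};

export function pickModel(tokensNeeded: number): string {
  // Sort ascending by window size so the tightest fit wins.
  const fit = Object.entries(CONTEXT_WINDOWS)
    .sort(([, a], [, b]) => a - b)
    .find(([, window]) => window >= tokensNeeded);
  if (!fit) throw new Error(`no model fits ${tokensNeeded} tokens`);
  return fit[0];
}
```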
| Model | Context window | Best for |
| --- | --- | --- |
| vault-3-turbo | 128k | Fast, low cost. Structured extraction and classification. |
| vault-3-pro | 200k | Highest accuracy. Complex reasoning and long documents. |
| vault-3-mini | 32k | Ultra-low latency. Real-time or edge use cases. |

Inference Pipeline
Every call to vault.infer() goes through a five-stage pipeline: validation, routing, queuing, inference, and response parsing.
1. Validation: Zod validates the request shape before any network call. Invalid inputs throw synchronously.
2. Routing: The client selects the model endpoint based on the workspace configuration and model ID.
3. Queuing: Requests are queued with concurrency control to stay within your rate limit envelope.
4. Inference: The model processes the prompt and returns raw token output.
5. Response parsing: The response is deserialized and typed. Malformed responses throw a VaultParseError.
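The queuing stage is the one with cross-request state. As an illustration of the technique (not Vault's internal code), a minimal concurrency limiter can be written in a few lines:

```typescript
// Minimal concurrency limiter, sketching the queuing stage. Illustrative
// only; Vault's actual rate-limit handling is not specified here.
export function createLimiter(maxConcurrent: number) {
  let active = 0;
  const waiting: Array<() => void> = [];

  const release = () => {
    active--;
    // Wake exactly one waiter per freed slot, preserving the invariant
    // active <= maxConcurrent.
    const next = waiting.shift();
    if (next) next();
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      await new Promise<void>((resolve) => waiting.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      release();
    }
  };
}
```

A real client would layer retries and rate-limit backoff on top of this, but the slot-conservation idea is the same.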
Streaming
Streaming uses the same pipeline but returns an async generator instead of a single response. Each iteration yields a chunk with a delta string and a done flag.
```ts
import { vault } from '@/lib/vault';

export async function GET() {
  const stream = vault.stream({
    model: 'vault-3-turbo',
    prompt: 'Summarize the Vault SDK in three sentences.',
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.delta));
        // Stop on the done flag; closing inside the loop risks enqueueing
        // into a closed controller if another chunk arrives.
        if (chunk.done) break;
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain; charset=utf-8' },
  });
}
```
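On the consuming side, the `text/plain` body produced by the route above can be read incrementally with the standard web-streams reader. This consumer is a generic sketch (the `onDelta` callback is our own naming, not a Vault API) and works against any streamed text response:

```typescript
// Hypothetical consumer for a streamed text/plain response body.
// Uses only standard web-streams APIs (ReadableStream, TextDecoder).
export async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onDelta: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact.
    const text = decoder.decode(value, { stream: true });
    full += text;
    onDelta(text);
  }
  return full;
}
```

In a browser you would call this as `readTextStream(response.body!, append)` after `fetch('/api/summary')`, appending each delta to the UI as it arrives.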