Usage Guide
Start with the canonical path.
The default OmniLLM path is intentionally conservative: one typed generation request, one gateway, multiple provider backends. Application code stays centered on LlmRequest and LlmResponse while the runtime handles operational concerns.
Add the crate, configure an endpoint, and label every key so pool status is understandable in production.
Keep product code centered on LlmRequest, Message, RequestItem, and LlmResponse unless raw provider payloads are required.
Let the runtime handle key selection, RPM pressure, timeout, circuit state, cancellation, and budget reservation.
Use replay fixtures and conversion reports to make provider behavior reviewable before rolling changes forward.
Use Gateway::call for non-streaming generation and Gateway::stream for canonical streaming. This is the right entry point when your product wants provider-neutral generation behavior without hand-writing provider adapters.
```rust
let gateway = GatewayBuilder::new(ProviderEndpoint::openai_responses())
    .add_key(KeyConfig::new("sk-key-1", "prod-1").rpm_limit(500))
    .budget_limit_usd(50.0)
    .request_timeout(Duration::from_secs(45))
    .build()?;

let response = gateway.call(request, CancellationToken::new()).await?;
```

Architecture
Two modes, one budget ledger.
OmniLLM exposes a normalized canonical mode and an explicit primitive mode. The first keeps generation portable. The second preserves provider-native request and response bodies for APIs that do not map cleanly onto a single generation schema.
Normalize the application contract. Preserve the provider contract when the provider API is the product.

OmniLLM protocol posture
| Canonical Responses | Gateway::call and Gateway::stream use LlmRequest, LlmResponse, and LlmStreamEvent for provider-neutral generation. |
|---|---|
| Provider Primitive | primitive_call, primitive_stream, and primitive_realtime preserve raw provider payloads for Images, Audio, Realtime, Count Tokens, Gemini Live, and compatible wrappers. |
| Shared Ledger | Both modes settle through BudgetTracker, so raw provider usage participates in pre-reserve and settle accounting. |
| Shared Protection | Both modes use the same key pool, RPM protection, timeout model, and circuit state instead of creating a second execution stack. |
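The shared pre-reserve/settle flow can be sketched as a small ledger. This is a minimal sketch under assumed names — `BudgetLedger`, `reserve`, and `settle` are illustrative, not the real BudgetTracker API:

```rust
// Sketch of a pre-reserve/settle budget ledger. The names here
// (BudgetLedger, reserve, settle) are illustrative, not the actual
// OmniLLM BudgetTracker surface.
struct BudgetLedger {
    limit_usd: f64,
    reserved_usd: f64,
    spent_usd: f64,
}

impl BudgetLedger {
    fn new(limit_usd: f64) -> Self {
        Self { limit_usd, reserved_usd: 0.0, spent_usd: 0.0 }
    }

    /// Pre-reserve a worst-case cost before dispatch; refuse if the
    /// committed spend would pass the limit.
    fn reserve(&mut self, estimate_usd: f64) -> Result<(), String> {
        if self.spent_usd + self.reserved_usd + estimate_usd > self.limit_usd {
            return Err("BudgetExceeded".into());
        }
        self.reserved_usd += estimate_usd;
        Ok(())
    }

    /// Settle with the provider-reported cost and release the reservation.
    fn settle(&mut self, estimate_usd: f64, actual_usd: f64) {
        self.reserved_usd -= estimate_usd;
        self.spent_usd += actual_usd;
    }

    fn remaining_usd(&self) -> f64 {
        self.limit_usd - self.spent_usd - self.reserved_usd
    }
}

fn main() {
    let mut ledger = BudgetLedger::new(50.0);
    ledger.reserve(2.0).unwrap(); // worst-case estimate before dispatch
    ledger.settle(2.0, 0.75);     // provider-reported usage after the call
    assert!((ledger.remaining_usd() - 49.25).abs() < 1e-9);
    assert!(ledger.reserve(60.0).is_err()); // spend policy blocks dispatch
    println!("remaining: {:.2}", ledger.remaining_usd());
}
```

Because both canonical and primitive calls settle through the same ledger, raw provider usage cannot silently escape budget accounting.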
Runtime Profiles
Endpoint names describe wire shape.
Runtime configuration uses EndpointProtocol. Low-level parsing, emission, and transcoding use ProviderProtocol. Names such as ClaudeMessages and GeminiGenerateContent are wire-shape identifiers, not marketing preferences.
| Official endpoints | Use official endpoint variants when OmniLLM should derive the standard upstream path from a host or prefix. |
|---|---|
| Compatible endpoints | Use *_compat variants when the upstream wrapper already exposes the full request URL. |
| Wire formats | Use WireFormat when converting raw API bodies between OpenAI, Anthropic, Gemini, and compatible request shapes. |
| Transport profile | EndpointProtocol chooses the runtime request shape while application code keeps typed OmniLLM surfaces. |
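The official-versus-compat distinction can be pictured as two ways of producing a request URL. This sketch uses an invented `Endpoint` enum; the real variants live on ProviderEndpoint and EndpointProtocol:

```rust
// Sketch of official-vs-compat URL derivation. The enum and the paths
// below are illustrative, not OmniLLM's real endpoint types.
enum Endpoint {
    /// Official: the runtime derives the standard upstream path from a host.
    OpenAiOfficial { host: String },
    /// Compat: the wrapper already exposes the full request URL.
    OpenAiCompat { url: String },
}

fn request_url(endpoint: &Endpoint) -> String {
    match endpoint {
        Endpoint::OpenAiOfficial { host } => format!("{host}/v1/responses"),
        Endpoint::OpenAiCompat { url } => url.clone(),
    }
}

fn main() {
    let official = Endpoint::OpenAiOfficial { host: "https://api.openai.com".into() };
    assert_eq!(request_url(&official), "https://api.openai.com/v1/responses");

    let compat = Endpoint::OpenAiCompat { url: "https://proxy.internal/llm/responses".into() };
    assert_eq!(request_url(&compat), "https://proxy.internal/llm/responses");
}
```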
Native APIs
Primitive mode keeps raw provider APIs intact.
Primitive calls are additive and explicit. They are the escape hatch for provider APIs where the raw provider contract is part of your product behavior. PrimitiveRequest, primitive_call, primitive_stream, and primitive_realtime keep primitive paths additive instead of forcing raw provider APIs through canonical generation conversion.
```rust
let request = PrimitiveRequest::json(
    ProviderPrimitiveKind::ImagesGenerate,
    serde_json::json!({
        "model": "gpt-image-1",
        "prompt": "A blueprint-style gateway diagram"
    }),
);

let raw = gateway.primitive_call(request, CancellationToken::new()).await?;
```

Conversion
Loss is reported, not hidden.
Transcoding returns explicit bridge metadata through ConversionReport<T>. Callers can inspect bridged, lossy, and loss_reasons instead of guessing which provider-specific fields were dropped.
- bridged — the request crossed provider formats instead of staying native.
- lossy — at least one field could not be represented safely in the target wire shape.
- loss_reasons — human-readable reasons that make test assertions and operator logs actionable.
- emit_transport_request — turns typed requests into inspectable method, path, headers, and body before dispatch.
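The report shape described above can be pictured as a plain struct. This is a sketch: only the field names `bridged`, `lossy`, and `loss_reasons` come from the documentation; the rest is assumed for illustration:

```rust
// Sketch of the bridge metadata described above. Field names bridged,
// lossy, and loss_reasons come from the docs; the overall shape is assumed.
struct ConversionReport<T> {
    output: T,
    bridged: bool,
    lossy: bool,
    loss_reasons: Vec<String>,
}

fn main() {
    let report = ConversionReport {
        output: String::from("converted body"),
        bridged: true,
        lossy: true,
        loss_reasons: vec!["dropped a provider-specific field".into()],
    };
    // A lossy bridge must explain itself instead of silently downgrading.
    assert!(report.bridged);
    assert_eq!(report.lossy, !report.loss_reasons.is_empty());
    println!("{}: {} loss reason(s)", report.output, report.loss_reasons.len());
}
```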
```rust
let report = transcode_api_request(
    WireFormat::OpenAiChatCompletions,
    WireFormat::OpenAiResponses,
    raw_chat,
)?;

assert!(report.bridged);
assert!(!report.lossy);
for reason in report.loss_reasons {
    tracing::warn!(%reason, "provider bridge dropped data");
}
```

API Reference
API surface is layered by intent.
Treat the API as layers. Start with the stable application contract, choose the endpoint family that matches the job, drop to primitive mode only when the raw provider contract matters, then keep transport, replay, and operational signals attached to every call.
| Application contract | LlmRequest, Message, RequestItem, LlmResponse, and LlmStreamEvent keep generation portable across provider wire formats. |
|---|---|
| Gateway runtime | GatewayBuilder, ProviderEndpoint, KeyConfig, Gateway::call, and Gateway::stream bind endpoint choice to key pools, limits, timeout, cancellation, and budget settlement. |
| Endpoint families | EmbeddingRequest, image generation, audio transcription, audio speech, and rerank request families add typed APIs beyond text generation. |
| Primitive APIs | PrimitiveRequest plus primitive_call, primitive_stream, and primitive_realtime preserve raw provider JSON, SSE, realtime, or media payloads. |
| Bridge and transport | WireFormat, EndpointProtocol, ProviderProtocol, ConversionReport, and emit_transport_request make conversion and HTTP emission inspectable. |
| Operations and tests | pool_status, budget_remaining_usd, ReplayFixture, sanitizer helpers, and OmniLLM error classes make production behavior reviewable. |
Canonical generation contract
Use this layer when product code wants provider-neutral generation and stable response handling.
- LlmRequest carries messages, model options, tools, and provider-neutral generation intent.
- Message, MessageRole, and RequestItem keep chat content typed instead of passing raw JSON around.
- Gateway::call returns LlmResponse for non-streaming generation.
- Gateway::stream returns LlmStreamEvent for canonical streaming behavior.
Gateway construction and runtime policy
Use this layer to bind endpoint choice to operational controls before the first request is sent.
- GatewayBuilder selects ProviderEndpoint and wires key pools, budgets, retry posture, and request timeout.
- KeyConfig labels keys, applies RPM limits, and makes pool status readable for operators.
- CancellationToken is part of every call path, including streaming and primitive calls.
- BudgetTracker pre-reserves and settles usage without introducing a second budget subsystem.
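Cooperative cancellation on every call path can be sketched with a shared flag. The `Token` type below is a stand-in built on std atomics, not OmniLLM's CancellationToken:

```rust
use std::sync::{Arc, atomic::{AtomicBool, Ordering}};

// Sketch of cooperative cancellation. This Token is a stand-in;
// OmniLLM's CancellationToken is the real type on every call path.
#[derive(Clone)]
struct Token(Arc<AtomicBool>);

impl Token {
    fn new() -> Self { Token(Arc::new(AtomicBool::new(false))) }
    fn cancel(&self) { self.0.store(true, Ordering::SeqCst); }
    fn is_cancelled(&self) -> bool { self.0.load(Ordering::SeqCst) }
}

// A call path checks the token between units of work instead of
// running to completion after the caller has given up.
fn run_steps(token: &Token, steps: u32) -> Result<u32, String> {
    let mut done = 0;
    for _ in 0..steps {
        if token.is_cancelled() {
            return Err("cancelled".into());
        }
        done += 1;
    }
    Ok(done)
}

fn main() {
    let token = Token::new();
    assert_eq!(run_steps(&token, 3), Ok(3));
    token.cancel();
    assert_eq!(run_steps(&token, 3), Err("cancelled".to_string()));
}
```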
Typed non-generation endpoint families
Use this layer when the task is still portable enough to deserve a typed OmniLLM request family.
- EmbeddingRequest and EmbeddingResponse cover vector generation and OpenAI-compatible embeddings emission.
- Image generation request families let media workloads share gateway keys, timeout, and budget controls.
- Audio transcription and speech request families keep media endpoints inside the same runtime posture.
- Rerank request families model retrieval ranking without pretending it is text generation.
Provider primitive APIs
Use this layer when the provider API shape is itself the product contract and must be preserved.
- PrimitiveRequest carries raw provider-native request payloads without LlmRequest or ApiRequest conversion.
- primitive_call handles one-shot provider-native APIs such as images, audio, token counting, or metadata.
- primitive_stream keeps provider-native SSE events intact when canonical stream events would lose detail.
- primitive_realtime is reserved for realtime transports such as OpenAI Realtime and Gemini Live.
Protocol bridge and transport emission
Use this layer when you need to inspect, test, or report the exact wire shape before dispatch.
- EndpointProtocol describes runtime endpoint behavior; ProviderProtocol describes low-level provider wire shape.
- WireFormat names the source and target body format for explicit transcoding.
- ConversionReport<T> reports bridged, lossy, and loss_reasons instead of hiding downgrade behavior.
- emit_transport_request exposes method, path, headers, and body so tests can assert the emitted request.
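An inspectable pre-dispatch request can be sketched as a struct that tests assert against. The `TransportRequest` shape and the `emit_sketch` helper are assumptions for illustration; the real surface is emit_transport_request:

```rust
// Sketch of an inspectable transport request, so tests can assert the
// emitted wire shape before dispatch. TransportRequest and emit_sketch
// are illustrative names, not the real emit_transport_request output.
struct TransportRequest {
    method: String,
    path: String,
    headers: Vec<(String, String)>,
    body: String,
}

fn emit_sketch(model: &str, prompt: &str) -> TransportRequest {
    TransportRequest {
        method: "POST".into(),
        path: "/v1/responses".into(),
        headers: vec![("content-type".into(), "application/json".into())],
        body: format!(r#"{{"model":"{model}","input":"{prompt}"}}"#),
    }
}

fn main() {
    let req = emit_sketch("gpt-4.1", "hello");
    // Tests assert on the concrete wire shape instead of trusting it.
    assert_eq!(req.method, "POST");
    assert_eq!(req.path, "/v1/responses");
    assert!(req.body.contains(r#""model":"gpt-4.1""#));
    println!("{} {} ({} header(s))", req.method, req.path, req.headers.len());
}
```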
Operational and replay surfaces
Use this layer to turn gateway behavior into observable service behavior.
- pool_status reports key availability, limiter pressure, inflight work, and circuit state.
- budget_remaining_usd shows remaining budget after reservation and settlement accounting.
- ReplayFixture plus sanitizer helpers produce safe request/response fixtures for protocol regression tests.
- NoAvailableKey, BudgetExceeded, and Protocol(...) separate pool, spend, and conversion failures.
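Routing each failure family to its owner can be sketched with a plain enum and match. The variant names come from the documentation; the enum shape and `owner` helper are assumed:

```rust
// Sketch of the three failure families named above. Variant names come
// from the docs; the enum shape and owner() helper are illustrative.
enum GatewayError {
    NoAvailableKey,
    BudgetExceeded,
    Protocol(String),
}

/// Route a failure to the concern that owns it: pool, spend, or conversion.
fn owner(err: &GatewayError) -> &'static str {
    match err {
        GatewayError::NoAvailableKey => "pool availability",
        GatewayError::BudgetExceeded => "spend policy",
        GatewayError::Protocol(_) => "protocol conversion",
    }
}

fn main() {
    assert_eq!(owner(&GatewayError::NoAvailableKey), "pool availability");
    assert_eq!(owner(&GatewayError::BudgetExceeded), "spend policy");
    assert_eq!(owner(&GatewayError::Protocol("bad SSE frame".into())), "protocol conversion");
}
```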
Registry
Provider coverage is capability-scoped.
Provider support is expressed as endpoint and primitive capabilities rather than SDK parity claims. This keeps the registry honest about transport shape, request path, response preservation, and settlement behavior.
| OpenAI / Azure OpenAI | Canonical Responses, Chat Completions compatibility, embeddings, images, audio, realtime primitives, and OpenAI-compatible wrappers. |
|---|---|
| Anthropic Claude | Messages wire shape, canonical generation bridge, tool/message conversion, and primitive extension points. |
| Gemini / Vertex AI | GenerateContent profiles, Gemini-family wire conversion, Vertex-style deployment posture, and Live API primitive scope. |
| Bedrock | Provider registry integration and runtime routing hooks for cloud-hosted model families. |
| Compatible providers | Explicit compat endpoints for wrappers that already expose OpenAI-shaped URLs or provider-native proxy paths. |
Testing
Fixtures should be useful and safe.
Record/replay tests need reviewable artifacts that do not leak secrets. OmniLLM ships ReplayFixture, sanitize_transport_request, sanitize_transport_response, and sanitize_json_value for safe fixture workflows.
- Record real transport requests and responses only in controlled integration runs.
- Sanitize auth headers, query tokens, JSON secrets, and large binary or base64 payload fields.
- Review fixture diffs as API contracts instead of opaque snapshots.
- Replay against deterministic fixtures before modifying protocol bridges or provider profiles.
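The header-redaction step can be sketched as follows. The redaction rules and function name here are illustrative; OmniLLM ships sanitize_transport_request and friends for the real workflow:

```rust
// Sketch of auth-header redaction for replay fixtures. The rules and the
// sanitize_headers name are illustrative; the real helpers are
// sanitize_transport_request / sanitize_transport_response.
fn sanitize_headers(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .map(|(name, value)| {
            let lower = name.to_ascii_lowercase();
            if lower == "authorization" || lower == "x-api-key" {
                (name, "[REDACTED]".to_string()) // never commit secrets to fixtures
            } else {
                (name, value)
            }
        })
        .collect()
}

fn main() {
    let sanitized = sanitize_headers(vec![
        ("Authorization".into(), "Bearer sk-secret".into()),
        ("content-type".into(), "application/json".into()),
    ]);
    assert_eq!(sanitized[0].1, "[REDACTED]");
    assert_eq!(sanitized[1].1, "application/json");
}
```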
```rust
let sanitized = sanitize_transport_request(recorded_request);
let fixture = ReplayFixture::from_transport(sanitized, response);
fixture.write_json("tests/fixtures/openai_responses.basic.json")?;
```

Operations
Runtime status belongs in the interface.
Production services can inspect gateway.pool_status() and gateway.budget_remaining_usd(). These surfaces expose key availability, inflight token pressure, RPM pressure, circuit state, and remaining budget after pre-reserve plus settlement accounting.
| pool_status() | Shows key availability, limiter pressure, circuit state, and whether the pool can accept more work. |
|---|---|
| budget_remaining_usd() | Reports remaining budget after reservation and settlement, so operators see real spend pressure. |
| NoAvailableKey | Indicates pool exhaustion, cooldown, or circuit-open state rather than a provider protocol failure. |
| BudgetExceeded | Indicates spend policy blocked dispatch before or during settlement. |
| Protocol(...) | Indicates bridge, parsing, emission, or provider wire-shape mismatch. |
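A readiness probe over these two surfaces can be sketched as follows. The `PoolStatus` shape and thresholds are assumptions; the real readouts are gateway.pool_status() and gateway.budget_remaining_usd():

```rust
// Sketch of a readiness probe built on the operational surfaces above.
// PoolStatus fields and the thresholds are illustrative, not the real
// return type of gateway.pool_status().
struct PoolStatus {
    available_keys: usize,
    circuits_open: usize,
}

/// Ready only when at least one key can dispatch and spend policy
/// still has headroom.
fn ready(pool: &PoolStatus, budget_remaining_usd: f64) -> bool {
    pool.available_keys > 0 && budget_remaining_usd > 0.0
}

/// Degraded when any key's circuit is open, even if the pool still accepts work.
fn degraded(pool: &PoolStatus) -> bool {
    pool.circuits_open > 0
}

fn main() {
    let pool = PoolStatus { available_keys: 2, circuits_open: 1 };
    assert!(ready(&pool, 12.5));
    assert!(degraded(&pool));

    let drained = PoolStatus { available_keys: 0, circuits_open: 3 };
    assert!(!ready(&drained, 12.5));
}
```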
Signals

When a request fails, OmniLLM-specific errors tell operators whether the failure belongs to pool availability, spend policy, or protocol conversion.
Production
Before shipping, make the operational contract explicit.
The runtime gives you the primitives, but production readiness comes from making budget, cancellation, replay, and provider behavior visible in your service boundary.
- Choose the canonical protocol for portable generation before reaching for provider primitives.
- Configure every production key with explicit labels, RPM limits, cooldown behavior, and budget assumptions.
- Use CancellationToken and request timeouts for every gateway path, including primitive and stream calls.
- Treat ConversionReport as part of acceptance criteria when transcoding between provider formats.
- Record replay fixtures only after sanitization has removed auth headers, query secrets, and large binary fields.
- Expose pool_status, budget_remaining_usd, and OmniLLM error classes to operators before launch.
AI-native Project
The Skill teaches agents the real library.
The bundled OmniLLM Skill gives coding agents repository-native signals instead of generic Rust SDK guesses. It is tuned to GatewayBuilder, ProviderEndpoint, EndpointProtocol, WireFormat, ReplayFixture, primitive calls, and OmniLLM runtime errors.
- Install the Skill into Claude Code, Codex, OpenCode, or Claude-compatible skill runners.
- Ask for gateway setup, endpoint selection, protocol transcoding, replay fixture generation, or OmniLLM-specific error debugging.
- Verify answers against real examples, tests, endpoint profiles, conversion reports, and runtime surfaces.
```shell
# Codex / Claude Code / OpenCode
npx @vercel-labs/skills install github:aiomni/omnillm/skill

# Rust runtime
cargo add omnillm
```

About this manual. This field manual turns OmniLLM’s runtime, protocol bridge, primitive provider support, replay fixtures, docs, and bundled Skill into one readable entry point for Rust teams shipping production LLM systems.