Usage Guide
Start with the canonical path.
The default OmniLLM path is intentionally conservative: one typed generation request, one gateway, multiple provider backends. Application code stays centered on LlmRequest and LlmResponse while the runtime handles operational concerns.
Add the crate, configure an endpoint, and label every key so pool status is understandable in production.
Keep product code centered on LlmRequest, Message, RequestItem, and LlmResponse unless raw provider payloads are required.
Let the runtime handle key selection, RPM pressure, timeout, circuit state, cancellation, and budget reservation.
Use replay fixtures and conversion reports to make provider behavior reviewable before rolling changes forward.
Use Gateway::call for non-streaming generation and Gateway::stream for canonical streaming. This is the right entry point when your product wants provider-neutral generation behavior without hand-writing provider adapters.
```rust
let gateway = GatewayBuilder::new(ProviderEndpoint::openai_responses())
    .add_key(KeyConfig::new("sk-key-1", "prod-1").rpm_limit(500))
    .budget_limit_usd(50.0)
    .request_timeout(Duration::from_secs(45))
    .build()?;

let response = gateway.call(request, CancellationToken::new()).await?;
```

Architecture
Two modes, one budget ledger.
OmniLLM exposes a normalized canonical mode and an explicit primitive mode. The first keeps generation portable. The second preserves provider-native request and response bodies for APIs that do not map cleanly onto a single generation schema.
Normalize the application contract. Preserve the provider contract when the provider API is the product.

OmniLLM protocol posture
| Canonical Responses | Gateway::call and Gateway::stream use LlmRequest, LlmResponse, and LlmStreamEvent for provider-neutral generation. |
|---|---|
| Provider Primitive | primitive_call, primitive_stream, and primitive_realtime preserve raw provider payloads for Images, Audio, Realtime, Count Tokens, Gemini Live, and compatible wrappers. |
| Shared Ledger | Both modes settle through BudgetTracker, so raw provider usage participates in pre-reserve and settle accounting. |
| Shared Protection | Both modes use the same key pool, RPM protection, timeout model, and circuit state instead of creating a second execution stack. |
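The shared pre-reserve/settle flow can be sketched as a small ledger. This is a minimal sketch under assumed names — `BudgetLedger`, `reserve`, and `settle` are illustrative, not the real BudgetTracker API:

```rust
// Sketch of a pre-reserve/settle budget ledger. The names here
// (BudgetLedger, reserve, settle) are illustrative, not the actual
// OmniLLM BudgetTracker surface.
struct BudgetLedger {
    limit_usd: f64,
    reserved_usd: f64,
    spent_usd: f64,
}

impl BudgetLedger {
    fn new(limit_usd: f64) -> Self {
        Self { limit_usd, reserved_usd: 0.0, spent_usd: 0.0 }
    }

    /// Pre-reserve a worst-case cost before dispatch; refuse if the
    /// committed spend would pass the limit.
    fn reserve(&mut self, estimate_usd: f64) -> Result<(), String> {
        if self.spent_usd + self.reserved_usd + estimate_usd > self.limit_usd {
            return Err("BudgetExceeded".into());
        }
        self.reserved_usd += estimate_usd;
        Ok(())
    }

    /// Settle with the provider-reported cost and release the reservation.
    fn settle(&mut self, estimate_usd: f64, actual_usd: f64) {
        self.reserved_usd -= estimate_usd;
        self.spent_usd += actual_usd;
    }

    fn remaining_usd(&self) -> f64 {
        self.limit_usd - self.spent_usd - self.reserved_usd
    }
}

fn main() {
    let mut ledger = BudgetLedger::new(50.0);
    ledger.reserve(2.0).unwrap(); // worst-case estimate before dispatch
    ledger.settle(2.0, 0.75);     // provider-reported usage after the call
    assert!((ledger.remaining_usd() - 49.25).abs() < 1e-9);
    assert!(ledger.reserve(60.0).is_err()); // spend policy blocks dispatch
    println!("remaining: {:.2}", ledger.remaining_usd());
}
```

Because both canonical and primitive calls settle through the same ledger, raw provider usage cannot silently escape budget accounting.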
Runtime Profiles
Endpoint names describe wire shape.
Runtime configuration uses EndpointProtocol. Low-level parsing, emission, and transcoding use ProviderProtocol. Names such as ClaudeMessages and GeminiGenerateContent are wire-shape identifiers, not marketing preferences.
| Official endpoints | Use official endpoint variants when OmniLLM should derive the standard upstream path from a host or prefix. |
|---|---|
| Compatible endpoints | Use *_compat variants when the upstream wrapper already exposes the full request URL. |
| Wire formats | Use WireFormat when converting raw API bodies between OpenAI, Anthropic, Gemini, and compatible request shapes. |
| Transport profile | EndpointProtocol chooses the runtime request shape while application code keeps typed OmniLLM surfaces. |
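The official-versus-compat distinction can be pictured as two ways of producing a request URL. This sketch uses an invented `Endpoint` enum; the real variants live on ProviderEndpoint and EndpointProtocol:

```rust
// Sketch of official-vs-compat URL derivation. The enum and the paths
// below are illustrative, not OmniLLM's real endpoint types.
enum Endpoint {
    /// Official: the runtime derives the standard upstream path from a host.
    OpenAiOfficial { host: String },
    /// Compat: the wrapper already exposes the full request URL.
    OpenAiCompat { url: String },
}

fn request_url(endpoint: &Endpoint) -> String {
    match endpoint {
        Endpoint::OpenAiOfficial { host } => format!("{host}/v1/responses"),
        Endpoint::OpenAiCompat { url } => url.clone(),
    }
}

fn main() {
    let official = Endpoint::OpenAiOfficial { host: "https://api.openai.com".into() };
    assert_eq!(request_url(&official), "https://api.openai.com/v1/responses");

    let compat = Endpoint::OpenAiCompat { url: "https://proxy.internal/llm/responses".into() };
    assert_eq!(request_url(&compat), "https://proxy.internal/llm/responses");
}
```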
Native APIs
Primitive mode keeps raw provider APIs intact.
Primitive calls are additive and explicit. They are the escape hatch for provider APIs where the raw provider contract is part of your product behavior. PrimitiveRequest, primitive_call, primitive_stream, and primitive_realtime keep primitive paths additive instead of forcing raw provider APIs through canonical generation conversion.
```rust
let request = PrimitiveRequest::json(
    ProviderPrimitiveKind::ImagesGenerate,
    serde_json::json!({
        "model": "gpt-image-1",
        "prompt": "A blueprint-style gateway diagram"
    }),
);

let raw = gateway.primitive_call(request, CancellationToken::new()).await?;
```

Conversion
Loss is reported, not hidden.
Transcoding returns explicit bridge metadata through ConversionReport<T>. Callers can inspect bridged, lossy, and loss_reasons instead of guessing which provider-specific fields were dropped.
- bridged — the request crossed provider formats instead of staying native.
- lossy — at least one field could not be represented safely in the target wire shape.
- loss_reasons — human-readable reasons that make test assertions and operator logs actionable.
- emit_transport_request — turns typed requests into inspectable method, path, headers, and body before dispatch.
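The report shape described above can be pictured as a plain struct. This is a sketch: only the field names `bridged`, `lossy`, and `loss_reasons` come from the documentation; the rest is assumed for illustration:

```rust
// Sketch of the bridge metadata described above. Field names bridged,
// lossy, and loss_reasons come from the docs; the overall shape is assumed.
struct ConversionReport<T> {
    output: T,
    bridged: bool,
    lossy: bool,
    loss_reasons: Vec<String>,
}

fn main() {
    let report = ConversionReport {
        output: String::from("converted body"),
        bridged: true,
        lossy: true,
        loss_reasons: vec!["dropped a provider-specific field".into()],
    };
    // A lossy bridge must explain itself instead of silently downgrading.
    assert!(report.bridged);
    assert_eq!(report.lossy, !report.loss_reasons.is_empty());
    println!("{}: {} loss reason(s)", report.output, report.loss_reasons.len());
}
```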
```rust
let report = transcode_api_request(
    WireFormat::OpenAiChatCompletions,
    WireFormat::OpenAiResponses,
    raw_chat,
)?;

assert!(report.bridged);
assert!(!report.lossy);
for reason in report.loss_reasons {
    tracing::warn!(%reason, "provider bridge dropped data");
}
```

API Reference
API surface is layered by intent.
Treat the API as layers. Start with the stable application contract, choose the endpoint family that matches the job, drop to primitive mode only when the raw provider contract matters, then keep transport, replay, and operational signals attached to every call.
| Application contract | LlmRequest, Message, RequestItem, LlmResponse, and LlmStreamEvent keep generation portable across provider wire formats. |
|---|---|
| Gateway runtime | GatewayBuilder, ProviderEndpoint, KeyConfig, Gateway::call, and Gateway::stream bind endpoint choice to key pools, limits, timeout, cancellation, and budget settlement. |
| Endpoint families | EmbeddingRequest, image generation, audio transcription, audio speech, and rerank request families add typed APIs beyond text generation. |
| Primitive APIs | PrimitiveRequest plus primitive_call, primitive_stream, and primitive_realtime preserve raw provider JSON, SSE, realtime, or media payloads. |
| Bridge and transport | WireFormat, EndpointProtocol, ProviderProtocol, ConversionReport, and emit_transport_request make conversion and HTTP emission inspectable. |
| Operations and tests | pool_status, budget_remaining_usd, ReplayFixture, sanitizer helpers, and OmniLLM error classes make production behavior reviewable. |
Canonical generation contract
Use this layer when product code wants provider-neutral generation and stable response handling.
- LlmRequest carries messages, model options, tools, and provider-neutral generation intent.
- Message, MessageRole, and RequestItem keep chat content typed instead of passing raw JSON around.
- Gateway::call returns LlmResponse for non-streaming generation.
- Gateway::stream returns LlmStreamEvent for canonical streaming behavior.
Gateway construction and runtime policy
Use this layer to bind endpoint choice to operational controls before the first request is sent.
- GatewayBuilder selects ProviderEndpoint and wires key pools, budgets, retry posture, and request timeout.
- KeyConfig labels keys, applies RPM limits, and makes pool status readable for operators.
- CancellationToken is part of every call path, including streaming and primitive calls.
- BudgetTracker pre-reserves and settles usage without introducing a second budget subsystem.
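Cooperative cancellation on every call path can be sketched with a shared flag. The `Token` type below is a stand-in built on std atomics, not OmniLLM's CancellationToken:

```rust
use std::sync::{Arc, atomic::{AtomicBool, Ordering}};

// Sketch of cooperative cancellation. This Token is a stand-in;
// OmniLLM's CancellationToken is the real type on every call path.
#[derive(Clone)]
struct Token(Arc<AtomicBool>);

impl Token {
    fn new() -> Self { Token(Arc::new(AtomicBool::new(false))) }
    fn cancel(&self) { self.0.store(true, Ordering::SeqCst); }
    fn is_cancelled(&self) -> bool { self.0.load(Ordering::SeqCst) }
}

// A call path checks the token between units of work instead of
// running to completion after the caller has given up.
fn run_steps(token: &Token, steps: u32) -> Result<u32, String> {
    let mut done = 0;
    for _ in 0..steps {
        if token.is_cancelled() {
            return Err("cancelled".into());
        }
        done += 1;
    }
    Ok(done)
}

fn main() {
    let token = Token::new();
    assert_eq!(run_steps(&token, 3), Ok(3));
    token.cancel();
    assert_eq!(run_steps(&token, 3), Err("cancelled".to_string()));
}
```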
Typed non-generation endpoint families
Use this layer when the task is still portable enough to deserve a typed OmniLLM request family.
- EmbeddingRequest and EmbeddingResponse cover vector generation and OpenAI-compatible embeddings emission.
- Image generation request families let media workloads share gateway keys, timeout, and budget controls.
- Audio transcription and speech request families keep media endpoints inside the same runtime posture.
- Rerank request families model retrieval ranking without pretending it is text generation.
Provider primitive APIs
Use this layer when the provider API shape is itself the product contract and must be preserved.
- PrimitiveRequest carries raw provider-native request payloads without LlmRequest or ApiRequest conversion.
- primitive_call handles one-shot provider-native APIs such as images, audio, token counting, or metadata.
- primitive_stream keeps provider-native SSE events intact when canonical stream events would lose detail.
- primitive_realtime is reserved for realtime transports such as OpenAI Realtime and Gemini Live.
Protocol bridge and transport emission
Use this layer when you need to inspect, test, or report the exact wire shape before dispatch.
- EndpointProtocol describes runtime endpoint behavior; ProviderProtocol describes low-level provider wire shape.
- WireFormat names the source and target body format for explicit transcoding.
- ConversionReport<T> reports bridged, lossy, and loss_reasons instead of hiding downgrade behavior.
- emit_transport_request exposes method, path, headers, and body so tests can assert the emitted request.
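An inspectable pre-dispatch request can be sketched as a struct that tests assert against. The `TransportRequest` shape and the `emit_sketch` helper are assumptions for illustration; the real surface is emit_transport_request:

```rust
// Sketch of an inspectable transport request, so tests can assert the
// emitted wire shape before dispatch. TransportRequest and emit_sketch
// are illustrative names, not the real emit_transport_request output.
struct TransportRequest {
    method: String,
    path: String,
    headers: Vec<(String, String)>,
    body: String,
}

fn emit_sketch(model: &str, prompt: &str) -> TransportRequest {
    TransportRequest {
        method: "POST".into(),
        path: "/v1/responses".into(),
        headers: vec![("content-type".into(), "application/json".into())],
        body: format!(r#"{{"model":"{model}","input":"{prompt}"}}"#),
    }
}

fn main() {
    let req = emit_sketch("gpt-4.1", "hello");
    // Tests assert on the concrete wire shape instead of trusting it.
    assert_eq!(req.method, "POST");
    assert_eq!(req.path, "/v1/responses");
    assert!(req.body.contains(r#""model":"gpt-4.1""#));
    println!("{} {} ({} header(s))", req.method, req.path, req.headers.len());
}
```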
Operational and replay surfaces
Use this layer to turn gateway behavior into observable service behavior.
- pool_status reports key availability, limiter pressure, inflight work, and circuit state.
- budget_remaining_usd shows remaining budget after reservation and settlement accounting.
- ReplayFixture plus sanitizer helpers produce safe request/response fixtures for protocol regression tests.
- NoAvailableKey, BudgetExceeded, and Protocol(...) separate pool, spend, and conversion failures.
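Routing each failure family to its owner can be sketched with a plain enum and match. The variant names come from the documentation; the enum shape and `owner` helper are assumed:

```rust
// Sketch of the three failure families named above. Variant names come
// from the docs; the enum shape and owner() helper are illustrative.
enum GatewayError {
    NoAvailableKey,
    BudgetExceeded,
    Protocol(String),
}

/// Route a failure to the concern that owns it: pool, spend, or conversion.
fn owner(err: &GatewayError) -> &'static str {
    match err {
        GatewayError::NoAvailableKey => "pool availability",
        GatewayError::BudgetExceeded => "spend policy",
        GatewayError::Protocol(_) => "protocol conversion",
    }
}

fn main() {
    assert_eq!(owner(&GatewayError::NoAvailableKey), "pool availability");
    assert_eq!(owner(&GatewayError::BudgetExceeded), "spend policy");
    assert_eq!(owner(&GatewayError::Protocol("bad SSE frame".into())), "protocol conversion");
}
```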
Registry
Provider coverage is capability-scoped.
Provider support is expressed as endpoint and primitive capabilities rather than SDK parity claims. This keeps the registry honest about transport shape, request path, response preservation, and settlement behavior.
| OpenAI / Azure OpenAI | Canonical Responses, Chat Completions compatibility, embeddings, images, audio, realtime primitives, and OpenAI-compatible wrappers. |
|---|---|
| Anthropic Claude | Messages wire shape, canonical generation bridge, tool/message conversion, and primitive extension points. |
| Gemini / Vertex AI | GenerateContent profiles, Gemini-family wire conversion, Vertex-style deployment posture, and Live API primitive scope. |
| Bedrock | Provider registry integration and runtime routing hooks for cloud-hosted model families. |
| Compatible providers | Explicit compat endpoints for wrappers that already expose OpenAI-shaped URLs or provider-native proxy paths. |
Testing
Fixtures should be useful and safe.
Record/replay tests need reviewable artifacts that do not leak secrets. OmniLLM ships ReplayFixture, sanitize_transport_request, sanitize_transport_response, and sanitize_json_value for safe fixture workflows.
- Record real transport requests and responses only in controlled integration runs.
- Sanitize auth headers, query tokens, JSON secrets, and large binary or base64 payload fields.
- Review fixture diffs as API contracts instead of opaque snapshots.
- Replay against deterministic fixtures before modifying protocol bridges or provider profiles.
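The header-redaction step can be sketched as follows. The redaction rules and function name here are illustrative; OmniLLM ships sanitize_transport_request and friends for the real workflow:

```rust
// Sketch of auth-header redaction for replay fixtures. The rules and the
// sanitize_headers name are illustrative; the real helpers are
// sanitize_transport_request / sanitize_transport_response.
fn sanitize_headers(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .map(|(name, value)| {
            let lower = name.to_ascii_lowercase();
            if lower == "authorization" || lower == "x-api-key" {
                (name, "[REDACTED]".to_string()) // never commit secrets to fixtures
            } else {
                (name, value)
            }
        })
        .collect()
}

fn main() {
    let sanitized = sanitize_headers(vec![
        ("Authorization".into(), "Bearer sk-secret".into()),
        ("content-type".into(), "application/json".into()),
    ]);
    assert_eq!(sanitized[0].1, "[REDACTED]");
    assert_eq!(sanitized[1].1, "application/json");
}
```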
```rust
let sanitized = sanitize_transport_request(recorded_request);
let fixture = ReplayFixture::from_transport(sanitized, response);
fixture.write_json("tests/fixtures/openai_responses.basic.json")?;
```

Operations
Runtime status belongs in the interface.
Production services can inspect gateway.pool_status() and gateway.budget_remaining_usd(). These surfaces expose key availability, inflight token pressure, RPM pressure, circuit state, and remaining budget after pre-reserve plus settlement accounting.
| pool_status() | Shows key availability, limiter pressure, circuit state, and whether the pool can accept more work. |
|---|---|
| budget_remaining_usd() | Reports remaining budget after reservation and settlement, so operators see real spend pressure. |
| NoAvailableKey | Indicates pool exhaustion, cooldown, or circuit-open state rather than a provider protocol failure. |
| BudgetExceeded | Indicates spend policy blocked dispatch before or during settlement. |
| Protocol(...) | Indicates bridge, parsing, emission, or provider wire-shape mismatch. |
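A readiness probe over these two surfaces can be sketched as follows. The `PoolStatus` shape and thresholds are assumptions; the real readouts are gateway.pool_status() and gateway.budget_remaining_usd():

```rust
// Sketch of a readiness probe built on the operational surfaces above.
// PoolStatus fields and the thresholds are illustrative, not the real
// return type of gateway.pool_status().
struct PoolStatus {
    available_keys: usize,
    circuits_open: usize,
}

/// Ready only when at least one key can dispatch and spend policy
/// still has headroom.
fn ready(pool: &PoolStatus, budget_remaining_usd: f64) -> bool {
    pool.available_keys > 0 && budget_remaining_usd > 0.0
}

/// Degraded when any key's circuit is open, even if the pool still accepts work.
fn degraded(pool: &PoolStatus) -> bool {
    pool.circuits_open > 0
}

fn main() {
    let pool = PoolStatus { available_keys: 2, circuits_open: 1 };
    assert!(ready(&pool, 12.5));
    assert!(degraded(&pool));

    let drained = PoolStatus { available_keys: 0, circuits_open: 3 };
    assert!(!ready(&drained, 12.5));
}
```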
Signals

When a request fails, OmniLLM-specific errors tell operators whether the failure belongs to pool availability, spend policy, or protocol conversion.
Production
Before shipping, make the operational contract explicit.
The runtime gives you the primitives, but production readiness comes from making budget, cancellation, replay, and provider behavior visible in your service boundary.
- Choose the canonical protocol for portable generation before reaching for provider primitives.
- Configure every production key with explicit labels, RPM limits, cooldown behavior, and budget assumptions.
- Use CancellationToken and request timeouts for every gateway path, including primitive and stream calls.
- Treat ConversionReport as part of acceptance criteria when transcoding between provider formats.
- Record replay fixtures only after sanitization has removed auth headers, query secrets, and large binary fields.
- Expose pool_status, budget_remaining_usd, and OmniLLM error classes to operators before launch.
AI-native Project
The Skill teaches agents the real library.
The bundled OmniLLM Skill gives coding agents repository-native signals instead of generic Rust SDK guesses. It is tuned to GatewayBuilder, ProviderEndpoint, EndpointProtocol, WireFormat, ReplayFixture, primitive calls, and OmniLLM runtime errors.
- Install the Skill into Claude Code, Codex, OpenCode, or Claude-compatible skill runners.
- Ask for gateway setup, endpoint selection, protocol transcoding, replay fixture generation, or OmniLLM-specific error debugging.
- Verify answers against real examples, tests, endpoint profiles, conversion reports, and runtime surfaces.
```shell
# Codex / Claude Code / OpenCode
npx @vercel-labs/skills install github:aiomni/omnillm/skill

# Rust runtime
cargo add omnillm
```

About this manual. This field manual turns OmniLLM’s runtime, protocol bridge, primitive provider support, replay fixtures, docs, and bundled Skill into one readable entry point for Rust teams shipping production LLM systems.