BLOCK 4 · AI GATEWAY (PRODUCTION) · 4:08 – 4:20 · 12 min · Lecture (paywalled tour + cache aside)
TOPIC 4.8 / 10 · PAYWALL TOUR + CACHE ASIDE
Why these matter at enterprise scale
"Here's what you can configure today in OSS. Here's what the enterprise tier unlocks. Either way,
here's why this concern matters at production scale." Plus a lecture-only Semantic Cache aside
— OSS gives you the mechanism; you bring the storage layer.
What attendees see (verbatim, verified 2026-05-19):
Unlock guardrails for better security
This feature is a part of the Bifrost enterprise license. We would love to know more about your use case and how we can help you.
Two sub-sections under Guardrails:
Rules — the policies themselves: CEL expressions over messages, block-or-redact actions, sampling rates, PII patterns
Providers — the engines that execute those rules: Bedrock Guardrails · Azure Content Safety · GraySwan · Patronus · regex scanners · secrets scanners
Bullets (what makes a Rule):
CEL expressions over messages, e.g. m.role == 'user' && m.content.contains('SSN')
Block-or-redact actions
Sampling rates
PII patterns
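Speaker aside: the four Rule ingredients above can be sketched in a few lines of plain Python. This is an illustration only, not Bifrost's rule engine and not CEL. The names Rule, apply_rule and SSN_PATTERN are hypothetical, and the SSN regex is an assumed example of a PII pattern.

```python
import random
import re
from dataclasses import dataclass

# Assumed example of a PII pattern: US SSNs written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Rule:
    pattern: re.Pattern    # what to look for in message content
    action: str            # "block" or "redact"
    sampling_rate: float   # fraction of requests inspected, 0.0 to 1.0


def apply_rule(rule, messages, rng=random.random):
    """Return (allowed, messages) after applying one rule to user messages."""
    if rng() >= rule.sampling_rate:
        return True, messages  # request not sampled: pass through untouched
    checked = []
    for m in messages:
        if m["role"] == "user" and rule.pattern.search(m["content"]):
            if rule.action == "block":
                return False, messages  # reject the whole request
            # Redact: replace every match, leave the rest of the text intact.
            m = {**m, "content": rule.pattern.sub("[REDACTED]", m["content"])}
        checked.append(m)
    return True, checked
```

The rng parameter exists only so the sampling decision can be pinned in a test; a real gateway would also need to decide what happens to the unsampled traffic.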
Slide 2 / 2 · Why this lives at the gateway
Guardrails belong at the gateway layer
Wire once, govern everywhere — applies to every call from every app
Without gateway guardrails: every app re-implements (and forgets some checks)
Same pattern in Portkey, AWS Bedrock Guardrails, NeMo Guardrails, Lakera, Guardrails AI
The mechanism is portable; the implementation is paywalled
Wire once, govern everywhere.
4.8.B
Audit Logs (locked)
2 min · Lecture · 4:10 – 4:12
Slide 1 / 2 · Audit Logs (locked screen)
Governance — Audit Logs
Who called what, when, with what cost, against what VK
Filterable by user, team, customer
Mandatory for SOC 2 / HIPAA / GDPR if your AI touches regulated data
Tamper-resistant, long retention, security-tool integrations
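Speaker aside: one common way to make a log tamper-evident is hash chaining, where each entry's hash covers the previous entry's hash. The source does not say how Bifrost enterprise implements its audit trail, so treat this as a generic sketch. append_entry, verify and the record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log, record):
    """Append a record, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": _digest(prev, record)})


def verify(log):
    """Recompute every hash; editing or reordering entries breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Note the limit of the sketch: dropping entries from the tail is not detected unless the latest hash is also stored somewhere the writer cannot change.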
Slide 2 / 2 · OSS vs Enterprise
OSS gives you logs. Enterprise gives you audit.
Layer | OSS Bifrost (you have this) | Enterprise Bifrost
LLM Logs | ✓ Recent requests, basic filtering | ✓ Same
Audit Trail | ✗ | ✓ Full, filterable, tamper-resistant
Retention | Local / short | Long, compliance-grade
SIEM integration | ✗ | ✓
4.8.C
Advanced Governance (locked)
2 min · Lecture · 4:12 – 4:14
Slide 1 / 2 · Multi-tenancy (locked)
Teams · Business Units · Customers · RBAC
Tenant isolation — separate VKs, budgets, audit views per customer
Required the moment you ship AI to external customers
User provisioning syncs with your identity provider (Okta, Azure AD)
Roles & Permissions, Access Profiles
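Speaker aside: if you roll tenant isolation at the app layer, the core of it is a lookup from virtual key to tenant, with spend tracked per tenant. A minimal sketch, assuming one VK per tenant and a flat USD budget. Tenant, TenantRegistry and the key strings are hypothetical, not Bifrost APIs.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    budget_usd: float
    spent_usd: float = 0.0


class TenantRegistry:
    """Maps each virtual key to exactly one tenant, so spend never mixes."""

    def __init__(self):
        self._by_vk = {}

    def register(self, virtual_key, tenant):
        self._by_vk[virtual_key] = tenant

    def charge(self, virtual_key, cost_usd):
        """Record spend for the key's tenant; return False if it would exceed budget."""
        tenant = self._by_vk.get(virtual_key)
        if tenant is None:
            raise KeyError("unknown virtual key")
        if tenant.spent_usd + cost_usd > tenant.budget_usd:
            return False
        tenant.spent_usd += cost_usd
        return True
```

RBAC, audit views and identity-provider sync are deliberately left out; they are the parts that make the enterprise tier worth pricing.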
Slide 2 / 2 · When you need this
Internal AI? Maybe. External AI? Yes.
Internal-only AI tools — you can get away without multi-tenancy
External AI features (sold to your customers) — you need this
Either Bifrost enterprise, or roll multi-tenancy at the app layer
The concern doesn't go away
4.8.D
Adaptive Routing (locked)
2 min · Lecture · 4:14 – 4:16
Slide 1 / 2 · Adaptive Routing (locked)
Beyond static rules
Watches provider health, latency, error rates in real time
Shifts traffic dynamically based on observed signals
Closed-loop control on top of the routing layer
Static weighted rules (which you configured in 4.6) are the floor
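Speaker aside: "closed-loop control on top of static weights" can be pictured as rescaling each provider's configured weight by its observed success rate. This is a toy sketch, not Bifrost's adaptive routing algorithm, which the source does not describe. adapt_weights, the floor parameter and the provider names are hypothetical, and static weights are assumed to be positive.

```python
def adapt_weights(static_weights, error_rates, floor=0.05):
    """Scale static weights by observed success rate, then renormalize to sum to 1.

    The floor keeps a small share of traffic on a failing provider so that
    its recovery can still be observed.
    """
    raw = {}
    for name, weight in static_weights.items():
        success = 1.0 - error_rates.get(name, 0.0)
        raw[name] = max(weight * success, weight * floor)
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}
```

With no observed errors the function returns the static split unchanged, which is the sense in which static rules are the floor.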
Slide 2 / 2 · Static vs adaptive
When to graduate
Static routing — predictable, debuggable, fine for most teams
Adaptive routing — earns its keep at high traffic
Most teams start static and adopt adaptive when their traffic warrants it
Don't reach for adaptive on day one
4.8.E
Cluster Config (locked)
2 min · Lecture · 4:16 – 4:18
Slide 1 / 2 · Cluster Config (locked)
Multi-node Bifrost
HA mode — multiple gateway nodes with shared state
Leader election, rolling deploys
At production scale: never run just one gateway instance
Cluster Config is the control plane
Slide 2 / 2 · Where Apache 2.0 ends
Single-instance OSS → cluster Enterprise
Single-instance OSS Bifrost: fine for dev, staging, small prod
When you can't tolerate a gateway restart → cluster
Storage scales with it: SQLite (single-node) → PostgreSQL (shared) — same swap you saw in Beat 4.5.5
Apache 2.0 ends at the cluster boundary
Same scaling shape in other gateways: Portkey, LiteLLM, etc.
4.8.F
Semantic Cache (OSS but needs assembly)
2 min · Lecture · 4:18 – 4:20 · Relocated from 4.6.B (2026-05-19)
Slide 1 / 2 · Two cache modes
Semantic Cache: hash vs semantic
Mode | What it matches | What it needs
Hash (direct) | Exact string match (dimension: 1 in plugin config) | Vector store + x-bf-cache-key header per request
Semantic | Similarity by embedding | Vector store + embedding model + x-bf-cache-key header
Bifrost gives you the mechanism. You bring the storage layer.
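Speaker aside: the difference between the two modes is easiest to see side by side. The sketch below is generic Python, not the plugin's code. hash_key, semantic_lookup and the 0.9 threshold are hypothetical, and the embedding vectors are assumed to come from whatever embedding model you configure.

```python
import hashlib
import math


def hash_key(prompt):
    """Hash mode: identical strings share a key; any other string misses."""
    return hashlib.sha256(prompt.encode()).hexdigest()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_lookup(query_vec, entries, threshold=0.9):
    """Semantic mode: return the closest cached response if it is close enough."""
    best = max(entries, key=lambda e: cosine(query_vec, e["vec"]), default=None)
    if best is not None and cosine(query_vec, best["vec"]) >= threshold:
        return best["response"]
    return None
```

Hash mode never returns a wrong answer but misses on any rewording; semantic mode trades that safety for hit rate, and the threshold is where you tune the trade.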
Slide 2 / 2 · What production looks like
Three pieces, all in OSS
Plugin config in config.json — semantic_cache entry under plugins[], enabled: true, dimension: 1 (hash) or higher (semantic)
Vector store config in config.json — Weaviate / Redis / etc. via vector_store block (Bifrost's storage layer; not bundled, you supply it)
x-bf-cache-key header on each request — per-session or per-tenant cache namespacing; without it the plugin doesn't fire
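Speaker aside: the three pieces can be assembled roughly as below. Only semantic_cache, enabled, dimension, vector_store and x-bf-cache-key come from the slide. The nesting of the config keys ("name", "config", "type"), the /v1/chat/completions path and the helper names are assumptions, so check them against the Bifrost documentation before relying on them. The request is built but never sent.

```python
import json
import urllib.request


def cache_config(dimension):
    """Assemble the plugin and vector store pieces of config.json (layout assumed)."""
    return {
        "plugins": [
            {
                "name": "semantic_cache",            # key name "name" is assumed
                "enabled": True,
                "config": {"dimension": dimension},  # 1 = hash mode, higher = semantic
            }
        ],
        "vector_store": {
            "enabled": True,      # assumed field
            "type": "weaviate",   # assumed field; you supply and run the store
        },
    }


def cached_request(base_url, cache_key, body):
    """Build (without sending) a chat request that carries the cache namespace header."""
    return urllib.request.Request(
        base_url + "/v1/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json", "x-bf-cache-key": cache_key},
        method="POST",
    )
```

Choosing the cache key per tenant rather than per session widens reuse but means tenants' users share cached answers, so the key strategy is a product decision as much as a technical one.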
"OSS gives you the mechanism. Vector store + cache-key strategy is the assembly your team owns."