Designed for tablet+. View on a tablet or larger screen for the intended layout.
CORE_WORKSHOP_v1.0
SPEAKER NOTES MODE — press s to hide
BLOCK 4 · AI GATEWAY (PRODUCTION) 4:08 – 4:20 · 12 min · Lecture (paywalled tour + cache aside)

TOPIC 4.8 / 10 · PAYWALL TOUR + CACHE ASIDE

Why these matter at enterprise scale

"Here's what you can configure today in OSS. Here's what the enterprise tier unlocks. Either way, here's why this concern matters at production scale." Plus a lecture-only Semantic Cache aside — OSS gives you the mechanism; you bring the storage layer.

4.8.A

Guardrails (locked)

2 min Lecture 4:08 – 4:10 ★ LOAD-BEARING (OSS-vs-Enterprise honesty)
Slide 1 / 2 · Guardrails (locked screen)

Guardrails — Rules + Providers

What attendees see (verbatim, verified 2026-05-19):

Unlock guardrails for better security
This feature is a part of the Bifrost enterprise license. We would love to know more about your use case and how we can help you.

Two sub-sections under Guardrails:

  • Rules — the policies themselves: CEL expressions over messages, block-or-redact actions, sampling rates, PII patterns
  • Providers — the engines that execute those rules: Bedrock Guardrails · Azure Content Safety · GraySwan · Patronus · regex scanners · secrets scanners

Bullets (what makes a Rule):

  • CEL expressions over messages — m.role == 'user' && contains(m.content, 'SSN')
  • Block-or-redact actions
  • Sampling rates
  • PII patterns
Slide 2 / 2 · Why this lives at the gateway

Guardrails belong at the gateway layer

  • Wire once, govern everywhere — applies to every call from every app
  • Without gateway guardrails: every app re-implements (and forgets some checks)
  • Same pattern in PortKey, AWS Bedrock Guardrails, NeMo, Lakera, Guardrails AI
  • The mechanism is portable; the implementation is paywalled

Wire once, govern everywhere.

4.8.B

Audit Logs (locked)

2 min Lecture 4:10 – 4:12
Slide 1 / 2 · Audit Logs (locked screen)

Governance — Audit Logs

  • Who called what, when, with what cost, against what VK
  • Filterable by user, team, customer
  • Mandatory for SOC 2 / HIPAA / GDPR if your AI touches regulated data
  • Tamper-resistant, long retention, security-tool integrations
Slide 2 / 2 · OSS vs Enterprise

OSS gives you logs. Enterprise gives you audit.

Layer OSS Bifrost (you have this) Enterprise Bifrost
LLM Logs✓ Recent requests, basic filtering✓ same
Audit Trail✓ Full, filterable, tamper-resistant
RetentionLocal / shortLong, compliance-grade
SIEM integration
4.8.C

Advanced Governance (locked)

2 min Lecture 4:12 – 4:14
Slide 1 / 2 · Multi-tenancy (locked)

Teams · Business Units · Customers · RBAC

  • Tenant isolation — separate VKs, budgets, audit views per customer
  • Required the moment you ship AI to external customers
  • User provisioning syncs with your identity provider (Okta, Azure AD)
  • Roles & Permissions, Access Profiles
Slide 2 / 2 · When you need this

Internal AI? Maybe. External AI? Yes.

  • Internal-only AI tools — you can get away without multi-tenancy
  • External AI features (sold to your customers) — you need this
  • Either Bifrost enterprise, or roll multi-tenancy at the app layer
  • The concern doesn't go away
4.8.D

Adaptive Routing (locked)

2 min Lecture 4:14 – 4:16
Slide 1 / 2 · Adaptive Routing (locked)

Beyond static rules

  • Watches provider health, latency, error rates in real time
  • Shifts traffic dynamically based on observed signals
  • Closed-loop control on top of the routing layer
  • Static weighted rules (which you configured in 4.6) are the floor
Slide 2 / 2 · Static vs adaptive

When to graduate

  • Static routing — predictable, debuggable, fine for most teams
  • Adaptive routing — earns its keep at high traffic
  • Most teams start static and adopt adaptive when their traffic warrants it
  • Don't reach for adaptive on day one
4.8.E

Cluster Config (locked)

2 min Lecture 4:16 – 4:18
Slide 1 / 2 · Cluster Config (locked)

Multi-node Bifrost

  • HA mode — multiple gateway nodes with shared state
  • Leader election, rolling deploys
  • For production scale: never run one gateway instance
  • Cluster Config is the control plane
Slide 2 / 2 · Where Apache 2.0 ends

Single-instance OSS → cluster Enterprise

  • Single-instance OSS Bifrost: fine for dev, staging, small prod
  • When you can't tolerate a gateway restart → cluster
  • Storage scales with it: SQLite (single-node) → PostgreSQL (shared) — same swap you saw in Beat 4.5.5
  • Apache 2.0 ends at the cluster boundary
  • Same scaling shape in every gateway — PortKey, LiteLLM, etc.
4.8.F

Semantic Cache (OSS but needs assembly)

2 min Lecture 4:18 – 4:20 Relocated from 4.6.B (2026-05-19)
Slide 1 / 2 · Two cache modes

Semantic Cache: hash vs semantic

Mode What it matches What it needs
Hash (direct) Exact string match — dimension: 1 in plugin config Vector store + x-bf-cache-key header per request
Semantic Similarity by embedding Vector store + embedding model + x-bf-cache-key header

Bifrost gives you the mechanism. You bring the storage layer.

Slide 2 / 2 · What production looks like

Three pieces, all in OSS

  • Plugin config in config.jsonsemantic_cache entry under plugins[], enabled: true, dimension: 1 (hash) or higher (semantic)
  • Vector store config in config.json — Weaviate / Redis / etc. via vector_store block (Bifrost's storage layer; not bundled, you supply it)
  • x-bf-cache-key header on each request — per-session or per-tenant cache namespacing; without it the plugin doesn't fire

"OSS gives you the mechanism. Vector store + cache-key strategy is the assembly your team owns."