Block 4 / Topic 4.8 — Paywall Tour + Cache aside

4.8.A

Guardrails (locked)

2 min Lecture 4:08 – 4:10 ★ LOAD-BEARING (OSS-vs-Enterprise honesty)

Slide 1 / 2 · Guardrails (locked screen)

Guardrails — Rules + Providers

What attendees see (verbatim, verified 2026-05-19):

Unlock guardrails for better security
This feature is a part of the Bifrost enterprise license. We would love to know more about your use case and how we can help you.

Two sub-sections under Guardrails:

Rules — the policies themselves: CEL expressions over messages, block-or-redact actions, sampling rates, PII patterns
Providers — the engines that execute those rules: Bedrock Guardrails · Azure Content Safety · GraySwan · Patronus · regex scanners · secrets scanners

Bullets (what makes a Rule):

CEL expressions over messages — m.role == 'user' && m.content.contains('SSN')
Block-or-redact actions
Sampling rates
PII patterns
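The four parts of a Rule fit together roughly as follows. This is a minimal TypeScript sketch, not Bifrost's implementation. The `Message`/`Rule` shapes, `applyRules`, and the `[REDACTED]` placeholder are hypothetical, and a plain predicate stands in for the CEL expression.

```typescript
// Hypothetical shapes, not Bifrost's schema: they just name the four parts of a Rule.
type Message = { role: "system" | "user" | "assistant"; content: string };

type Rule = {
  name: string;
  // Plain predicate standing in for a CEL expression such as
  // m.role == 'user' && m.content.contains('SSN')
  matches: (m: Message) => boolean;
  action: "block" | "redact";
  // Fraction of requests this rule scans (0..1); 1 scans every request.
  sampleRate: number;
  // PII pattern replaced when action is "redact".
  pattern: RegExp;
};

type Verdict =
  | { kind: "allow"; messages: Message[] }
  | { kind: "block"; rule: string };

function applyRules(
  messages: Message[],
  rules: Rule[],
  random: () => number = Math.random,
): Verdict {
  let out = messages;
  for (const rule of rules) {
    // Sampling: skip this rule on a share of requests to cut scanning cost.
    if (random() >= rule.sampleRate) continue;
    if (!out.some(rule.matches)) continue;
    if (rule.action === "block") return { kind: "block", rule: rule.name };
    // Redact every occurrence: rebuild the pattern with the global flag.
    const flags = rule.pattern.flags.includes("g") ? rule.pattern.flags : rule.pattern.flags + "g";
    const allMatches = new RegExp(rule.pattern.source, flags);
    out = out.map((m) =>
      rule.matches(m) ? { ...m, content: m.content.replace(allMatches, "[REDACTED]") } : m,
    );
  }
  return { kind: "allow", messages: out };
}
```

Passing a fixed `random` function makes the sampling decision deterministic, which is handy when testing rules before they go live.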

Slide 2 / 2 · Why this lives at the gateway

Guardrails belong at the gateway layer

Wire once, govern everywhere — applies to every call from every app
Without gateway guardrails: every app re-implements (and forgets some checks)
Same pattern in Portkey, AWS Bedrock Guardrails, NVIDIA NeMo Guardrails, Lakera, Guardrails AI
The mechanism is portable; the implementation is paywalled

Wire once, govern everywhere.

Speaker notes

"Sidebar → Guardrails → Rules. Locked screen. The CTA says 'Unlock guardrails for better security. This feature is a part of the Bifrost enterprise license. We would love to know more about your use case and how we can help you.' Notice the last sentence: that's how Enterprise conversations actually start. Not 'pay us first'; 'tell us your use case.' Don't apologize for the paywall; narrate what would be there. Production teams write CEL expressions like 'if a user message contains SSN, block.' Block-or-redact actions. Sampling rates so you don't pay to scan every message. PII patterns. And the Providers sub-section holds the engines that execute those rules: integrations with Bedrock Guardrails, Azure Content Safety, GraySwan, Patronus, regex scanners, secrets scanners."

"Why does this live at the gateway? Wire once, govern everywhere. Every app, every endpoint. Without gateway guardrails, every app team re-implements them, and some checks get forgotten. Same pattern in Portkey, AWS Bedrock Guardrails, NVIDIA NeMo Guardrails, Lakera, Guardrails AI. The mechanism is portable. The implementation in Bifrost is paywalled. Whichever path you take (OSS Bifrost plus app-layer guardrails, or the enterprise tier), this concern doesn't go away."

4.8.B

Audit Logs (locked)

2 min Lecture 4:10 – 4:12

Slide 1 / 2 · Audit Logs (locked screen)

Governance — Audit Logs

Who called what, when, with what cost, against what VK
Filterable by user, team, customer
Effectively required for SOC 2 / HIPAA / GDPR compliance if your AI touches regulated data
Tamper-resistant, long retention, security-tool integrations
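Tamper-resistant usually means append-only storage plus a way to detect edits after the fact. One common technique is hash chaining: each entry stores the hash of the previous one, so changing any past entry breaks every link after it. Below is a TypeScript sketch under that assumption. The field names are hypothetical, and this is not a description of how Bifrost Enterprise stores its audit trail.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit entry carrying "who called what, when, with what cost, against what VK".
type AuditEntry = {
  ts: string; // ISO timestamp
  user: string;
  team: string;
  virtualKey: string;
  model: string;
  costUsd: number;
  prevHash: string; // hash of the previous entry ("" for the first one)
  hash: string; // hash over this entry's fields plus prevHash
};

type AuditFields = Omit<AuditEntry, "prevHash" | "hash">;

// Key order comes from the stored object; a production log would use a canonical serialization.
function digest(fields: AuditFields, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify({ ...fields, prevHash })).digest("hex");
}

// Append-only: returns a new log with the entry linked to the current tail.
function append(log: AuditEntry[], fields: AuditFields): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : "";
  return [...log, { ...fields, prevHash, hash: digest(fields, prevHash) }];
}

// Recompute every link; any edited, removed, or reordered entry makes this fail.
function verify(log: AuditEntry[]): boolean {
  let expectedPrev = "";
  for (const entry of log) {
    const { prevHash, hash, ...fields } = entry;
    if (prevHash !== expectedPrev || hash !== digest(fields, prevHash)) return false;
    expectedPrev = hash;
  }
  return true;
}
```

A chain alone doesn't stop someone who rewrites the entire log. Periodically shipping the latest hash to an external system (the SIEM integration row in the OSS vs Enterprise table) is what anchors it.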

Slide 2 / 2 · OSS vs Enterprise

OSS gives you logs. Enterprise gives you audit.

Layer	OSS Bifrost (you have this)	Enterprise Bifrost
LLM Logs	✓ Recent requests, basic filtering	✓ same
Audit Trail	✗	✓ Full, filterable, tamper-resistant
Retention	Local / short	Long, compliance-grade
SIEM integration	✗	✓

4.8.C

Advanced Governance (locked)

2 min Lecture 4:12 – 4:14

Slide 1 / 2 · Multi-tenancy (locked)

Teams · Business Units · Customers · RBAC

Tenant isolation — separate VKs, budgets, audit views per customer
Required the moment you ship AI to external customers
User provisioning syncs with your identity provider (Okta, Microsoft Entra ID, formerly Azure AD)
Roles & Permissions, Access Profiles

Slide 2 / 2 · When you need this

Internal AI? Maybe. External AI? Yes.

Internal-only AI tools — you can get away without multi-tenancy
External AI features (sold to your customers) — you need this
Either Bifrost enterprise, or roll multi-tenancy at the app layer
The concern doesn't go away
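If you roll multi-tenancy at the app layer, the core bookkeeping is small: one virtual key and one budget per customer, checked before each call and updated after. This is a hypothetical in-memory TypeScript sketch. `TenantRegistry` and its methods are illustrative names, and a real version would persist spend and guard against concurrent requests.

```typescript
// Hypothetical app-layer tenant registry: each external customer gets its own
// virtual key and budget, so one customer can't spend another's allowance.
type Tenant = { id: string; virtualKey: string; budgetUsd: number; spentUsd: number };

class TenantRegistry {
  private readonly tenants = new Map<string, Tenant>();

  add(id: string, virtualKey: string, budgetUsd: number): void {
    this.tenants.set(id, { id, virtualKey, budgetUsd, spentUsd: 0 });
  }

  // Returns the VK for this tenant's next call, or throws if it would exceed the budget.
  keyFor(id: string, estimatedCostUsd: number): string {
    const t = this.tenants.get(id);
    if (!t) throw new Error(`unknown tenant: ${id}`);
    if (t.spentUsd + estimatedCostUsd > t.budgetUsd) {
      throw new Error(`tenant ${id} is over budget`);
    }
    return t.virtualKey;
  }

  // Record actual spend once the gateway response reports the cost.
  record(id: string, costUsd: number): void {
    const t = this.tenants.get(id);
    if (!t) throw new Error(`unknown tenant: ${id}`);
    t.spentUsd += costUsd;
  }

  spent(id: string): number {
    return this.tenants.get(id)?.spentUsd ?? 0;
  }
}
```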

4.8.D

Adaptive Routing (locked)

2 min Lecture 4:14 – 4:16

Slide 1 / 2 · Adaptive Routing (locked)

Beyond static rules

Watches provider health, latency, error rates in real time
Shifts traffic dynamically based on observed signals
Closed-loop control on top of the routing layer
Static weighted rules (which you configured in 4.6) are the floor
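One way to close the loop, sketched under stated assumptions: keep the static weights as the baseline, track an exponentially weighted moving average (EWMA) of latency and error rate per provider, and scale each weight by a health score. The `AdaptiveRouter` class, its health formula (health halves at 1 s of latency), and the 0.2 / 0.05 constants are illustrative choices, not Bifrost's algorithm.

```typescript
// Hypothetical adaptive router: static weights (the 4.6 config) are the baseline,
// and observed health scales them. Not Bifrost's algorithm.
type Health = { ewmaLatencyMs: number; ewmaErrorRate: number };

class AdaptiveRouter {
  private readonly staticWeights: Record<string, number>;
  private readonly alpha: number; // EWMA smoothing: higher reacts faster
  private readonly minHealth: number; // keeps a trickle of traffic to probe recovery
  private readonly health = new Map<string, Health>();

  constructor(staticWeights: Record<string, number>, alpha = 0.2, minHealth = 0.05) {
    this.staticWeights = staticWeights;
    this.alpha = alpha;
    this.minHealth = minHealth;
    for (const provider of Object.keys(staticWeights)) {
      this.health.set(provider, { ewmaLatencyMs: 0, ewmaErrorRate: 0 });
    }
  }

  // Feed every completed call back in: this is the closed loop.
  observe(provider: string, latencyMs: number, ok: boolean): void {
    const h = this.health.get(provider);
    if (!h) return;
    h.ewmaLatencyMs = this.alpha * latencyMs + (1 - this.alpha) * h.ewmaLatencyMs;
    h.ewmaErrorRate = this.alpha * (ok ? 0 : 1) + (1 - this.alpha) * h.ewmaErrorRate;
  }

  // Effective traffic shares: static weight times health score, normalized to sum to 1.
  weights(): Record<string, number> {
    const raw: Record<string, number> = {};
    let total = 0;
    for (const [provider, w] of Object.entries(this.staticWeights)) {
      const h = this.health.get(provider)!;
      const score = Math.max(this.minHealth, (1 - h.ewmaErrorRate) / (1 + h.ewmaLatencyMs / 1000));
      raw[provider] = w * score;
      total += raw[provider];
    }
    const shares: Record<string, number> = {};
    for (const [provider, v] of Object.entries(raw)) shares[provider] = v / total;
    return shares;
  }

  // Weighted random choice; pass r explicitly for deterministic tests.
  pick(r: number = Math.random()): string {
    const entries = Object.entries(this.weights());
    let acc = 0;
    for (const [provider, share] of entries) {
      acc += share;
      if (r < acc) return provider;
    }
    return entries[entries.length - 1][0]; // guard against float rounding
  }
}
```

With no observations yet, every health score is 1 and the router reproduces the static weights exactly, which is the "static is the floor" idea in code.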

Slide 2 / 2 · Static vs adaptive

When to graduate

Static routing — predictable, debuggable, fine for most teams
Adaptive routing — earns its keep at high traffic
Most teams start static and adopt adaptive when their traffic warrants it
Don't reach for adaptive on day one

4.8.E

Cluster Config (locked)

2 min Lecture 4:16 – 4:18

Slide 1 / 2 · Cluster Config (locked)

Multi-node Bifrost

HA mode — multiple gateway nodes with shared state
Leader election, rolling deploys
For production scale: never run one gateway instance
Cluster Config is the control plane
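Leader election is the least familiar item on that list. A common pattern is a time-limited lease held in shared storage; below is a TypeScript sketch under that assumption, with an in-memory `SharedStore` standing in for a database row. `SharedStore`, `GatewayNode`, and the 3-second TTL are hypothetical and say nothing about how Bifrost's cluster mode actually elects a leader.

```typescript
// Hypothetical lease-based leader election. SharedStore is an in-memory stand-in
// for one row in shared storage (e.g. a PostgreSQL table updated atomically).
type Lease = { holder: string | null; expiresAt: number };

class SharedStore {
  private lease: Lease = { holder: null, expiresAt: 0 };

  // Compare-and-set: grant the lease if it is free, expired, or already held by
  // this node (a renewal). A real store must make this check-and-write atomic.
  tryAcquire(node: string, now: number, ttlMs: number): boolean {
    const available = this.lease.holder === null || now >= this.lease.expiresAt;
    if (available || this.lease.holder === node) {
      this.lease = { holder: node, expiresAt: now + ttlMs };
      return true;
    }
    return false;
  }
}

class GatewayNode {
  readonly id: string;
  private readonly store: SharedStore;
  isLeader = false;

  constructor(id: string, store: SharedStore) {
    this.id = id;
    this.store = store;
  }

  // Run on a timer well inside the TTL. Every node serves traffic either way;
  // only the leader runs cluster-wide chores.
  tick(now: number, ttlMs = 3000): void {
    this.isLeader = this.store.tryAcquire(this.id, now, ttlMs);
  }
}
```

Note the gap this leaves: a node that stops ticking still believes it leads until its next tick. Real implementations add fencing tokens or similar safeguards for that window.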

Slide 2 / 2 · Where Apache 2.0 ends

Single-instance OSS → cluster Enterprise

Single-instance OSS Bifrost: fine for dev, staging, small prod
When you can't tolerate a gateway restart → cluster
Storage scales with it: SQLite (single-node) → PostgreSQL (shared) — same swap you saw in Beat 4.5.5
Apache 2.0 ends at the cluster boundary
Same scaling shape in every gateway — Portkey, LiteLLM, etc.

Speaker notes

"Sidebar → Cluster Config. Locked. Multi-node Bifrost. HA mode. Shared state, leader election, rolling deploys. For production scale, you don't run one gateway instance — you run a cluster. Cluster Config is the control plane."

Callback to Beat 4.5.5: "Remember the storage layer we looked at: SQLite by default, swap to PostgreSQL in config.json? That swap is what makes the cluster work. Cluster Config orchestrates the nodes; PostgreSQL is the shared state they all read from and write to. Same Bifrost binary, same UI, different backend. The cluster doesn't change your app code. Your app keeps calling one base URL (localhost:8080 in the workshop, a load balancer in front of the nodes in production), and whichever node answers serves the request."

"Single-instance OSS Bifrost is fine for dev, staging, and small prod. The moment you can't tolerate a gateway restart — meaning your AI traffic is critical-path — you go cluster. That's where Apache 2.0 ends. Same scaling shape in every gateway alternative."

Closing bridge: "That's the tour of the locked features. One more concern before we wire Nuxt — semantic cache. This one's actually OSS, but it needs a little more than a config flag, so we'll walk it instead of wiring it."

4.8.F

Semantic Cache (OSS but needs assembly)

2 min Lecture 4:18 – 4:20 Relocated from 4.6.B (2026-05-19)

Slide 1 / 2 · Two cache modes

Semantic Cache: hash vs semantic

Mode	What it matches	What it needs
Hash (direct)	Exact string match — `dimension: 1` in plugin config	Vector store + `x-bf-cache-key` header per request
Semantic	Similarity by embedding	Vector store + embedding model + `x-bf-cache-key` header
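The difference between the two modes is the lookup. Below is a toy in-memory TypeScript sketch: hash mode matches a digest of the exact prompt, semantic mode returns the nearest stored embedding above a similarity threshold, and both are scoped to a namespace playing the role of `x-bf-cache-key`. `ToyCache`, the 0.9 cosine threshold, and the caller-supplied vectors (standing in for embedding-model output) are assumptions; in Bifrost, storage and search are delegated to the vector store you configure.

```typescript
import { createHash } from "node:crypto";

// Hypothetical toy cache showing the two lookup modes. Every entry is scoped to a
// namespace, which plays the role of the x-bf-cache-key header.
type SemanticEntry = { vector: number[]; response: string };

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class ToyCache {
  private readonly exact = new Map<string, string>();
  private readonly semantic = new Map<string, SemanticEntry[]>();

  // Hash mode: the key is a digest of namespace + exact prompt text.
  private digest(ns: string, prompt: string): string {
    return createHash("sha256").update(ns).update("\u0000").update(prompt).digest("hex");
  }

  putExact(ns: string, prompt: string, response: string): void {
    this.exact.set(this.digest(ns, prompt), response);
  }

  getExact(ns: string, prompt: string): string | undefined {
    return this.exact.get(this.digest(ns, prompt));
  }

  // Semantic mode: vectors come from an embedding model (supplied by the caller here).
  putSemantic(ns: string, vector: number[], response: string): void {
    const list = this.semantic.get(ns) ?? [];
    list.push({ vector, response });
    this.semantic.set(ns, list);
  }

  // Return the closest entry in this namespace if it clears the similarity threshold.
  getSemantic(ns: string, vector: number[], threshold = 0.9): string | undefined {
    let best: SemanticEntry | undefined;
    let bestScore = -Infinity;
    for (const entry of this.semantic.get(ns) ?? []) {
      const score = cosine(entry.vector, vector);
      if (score > bestScore) {
        bestScore = score;
        best = entry;
      }
    }
    return best && bestScore >= threshold ? best.response : undefined;
  }
}
```

Hash mode misses on any change in the text, even casing; semantic mode trades an embedding call per request for catching paraphrases.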

Bifrost gives you the mechanism. You bring the storage layer.

Slide 2 / 2 · What production looks like

Three pieces, all in OSS

Plugin config in config.json — semantic_cache entry under plugins[], enabled: true, dimension: 1 (hash) or the embedding model's output dimension (semantic)
Vector store config in config.json — Weaviate / Redis / etc. via vector_store block (Bifrost's storage layer; not bundled, you supply it)
x-bf-cache-key header on each request — per-session or per-tenant cache namespacing; without it the plugin doesn't fire
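The third piece lives in your app, not in config. A minimal TypeScript sketch of a client that always attaches the header, assuming (check your deployment) Bifrost's OpenAI-compatible route at `/v1/chat/completions` on `localhost:8080` and a provider-prefixed model name. `buildChatRequest` and the `tenant-a:session-42` key format are hypothetical.

```typescript
// Hypothetical helper for the third piece: every request carries x-bf-cache-key.
// Assumed route and model naming; adjust to your deployment.
function buildChatRequest(
  cacheKey: string,
  userMessage: string,
  baseUrl = "http://localhost:8080",
  model = "openai/gpt-4o-mini",
): Request {
  // Without the header the cache plugin doesn't fire, so refuse to build one without it.
  if (!cacheKey) throw new Error("missing cache key");
  return new Request(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Namespace per session, or per tenant + session, e.g. "tenant-a:session-42".
      "x-bf-cache-key": cacheKey,
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: userMessage }] }),
  });
}
```

Inside an async handler you would send it with `await fetch(buildChatRequest("tenant-a:session-42", "Hello"))`. Putting the tenant in the key keeps one customer's cached answers from being served to another.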

"OSS gives you the mechanism. Vector store + cache-key strategy is the assembly your team owns."

Speaker notes

Slide 1 — name the two modes: "Bifrost's semantic_cache plugin has two modes. Hash mode — exact string match — sets dimension to 1 and skips embedding computation. Use this for chatbots, deterministic templates, common queries. Semantic mode — similarity matching — needs an embedding model on top, catches paraphrases and re-asks. Both modes share the same plumbing: vector store as the underlying KV, and an x-bf-cache-key header on every request to namespace cache entries by session or tenant."

Slide 2 — the honest production picture: "Three pieces. Plugin config in config.json — the semantic_cache entry under plugins[]. Vector store — Bifrost doesn't bundle one; you wire Weaviate or Redis or whatever your stack runs. That's a vector_store block in config.json. And the cache-key header — x-bf-cache-key: <session-id> on every request, otherwise the plugin doesn't fire. All three pieces live in OSS. The Apache 2.0 license includes the cache mechanism. The vector store is your team's storage choice — same shape as choosing Redis vs Memcached for HTTP caching. We didn't wire this live in the workshop because vector-store setup is a workshop's worth of work on its own. But the mechanism is there, free, ready when you are."

Closing bridge to 4.9: "That's the tour. The OSS gateway you configured today has the foundation. The enterprise tier adds the production-scale concerns we just walked — guardrails, audit, multi-tenancy, adaptive routing, clustering. Cache is OSS, just needs your storage choice. Either way — the concerns themselves don't change. They live at the gateway, not in your app. Let's prove that by wiring the gateway into a real Vue app."