An interactive demonstration of what happens when agentic AI systems operate without security controls — and what changes when guardrails are in place. Nine scenarios. Four new architectural controls.
Each scenario runs in under 30 seconds. Toggle guardrails on and off to see the same agent behave differently.
Choose from nine scenarios in the left panel — from normal operation to multi-stage attack chains and live HITL checkpoints.
Switch between Guardrails ON and Guardrails OFF to see the same request handled differently.
Watch the agent work in real time — every decision, tool call, and control action streamed live.
Left pane shows everything happening inside the agent. Right pane shows what the end-user sees.
Every scenario runs with guardrails on and off — showing what the controls prevent, and what happens without them. Scenarios 6–9 run live over a bidirectional WebSocket.
A routine financial research and writing task. Shows the healthy agentic loop — plan, delegate, synthesise — with a full audit trail.
BaselineThe agent fetches a malicious website during research. The page contains hidden override instructions. With guardrails off, credentials are silently exfiltrated.
OWASP LLM01The agent attempts to use a tool outside its permitted scope. The allowlist control blocks it. Without guardrails, the action proceeds unchecked.
Tool AllowlistThe agent makes far more tool calls than the task requires — a runaway agent. The call budget cap limits blast radius. Without it, API costs spiral.
Call BudgetA £5.6M pension fund rebalancing is planned but nothing executes. Every intended trade is held for human-in-the-loop approval — the four-eyes control applied to autonomous AI.
Human-in-the-LoopA worker returns a result containing hidden adversarial instructions. The orchestrator reads them and changes behaviour. Tool result validation catches this before it reaches message history.
OWASP LLM02An orchestrator dispatches a task wildly outside its original scope. A compliant worker executes it without question. Job scope manifests stop this cold.
Scope EnforcementRecon → exploit → exfiltrate — all in seconds. Shows how an agent with tool access can execute a full attack chain. The tool allowlist stops it at phase 2.
Attack ChainA £15M portfolio rebalancing pauses for real human approval — not simulated. Approve or reject via the browser. The agent unblocks the instant you respond.
Bidirectional HITLv2.0.0 adds defences targeting multi-agent and orchestration-layer threats — the gaps that single-agent guardrails don't cover.
Derived from the original user request. Workers validate every dispatch against it independently of the orchestrator — catching scope drift the orchestrator can't see.
Worker-enforcedScans every worker result for adversarial injection patterns before it enters the orchestrator's message history — stops poisoned tool results before they affect behaviour.
Pre-history scanChild orchestrators cannot have a broader tool scope than their parent. A compromised orchestrator cannot escalate its own privilege by spawning a less-restricted child.
Scope containmentEach worker receives a short-lived token scoped to the tools it needs for one job. Leaked or replayed tokens cannot be used to call out-of-scope tools.
Least-privilege credsAttack scenarios 2–8 are scripted for reliability. Scenario 1 (Normal) and Scenario 9 (HITL) make live backend calls — the HITL checkpoint is a real bidirectional pause.
Open the Demo →