AI Agent Security Demo

Getting started

How to use the demo

Each scenario runs in under 30 seconds. Toggle guardrails on and off to see the same agent behave differently.

Pick a scenario

Choose from nine scenarios in the left panel — from normal operation to multi-stage attack chains and live HITL checkpoints.

Toggle guardrails

Switch between Guardrails ON and Guardrails OFF to see the same request handled differently.

Click Run

Watch the agent work in real time — every decision, tool call, and control action streamed live.

Compare the panes

Left pane shows everything happening inside the agent. Right pane shows what the end-user sees.

The two panes

◈ Under the Hood — Audit Trail

Every orchestrator decision and reasoning step
Tool calls dispatched to workers (with inputs)
What each worker returned
Guardrail events — blocks and reasons
Token usage and timing per step
Session summary and guardrail statistics

◈ User View — What They See

The request as the user typed it
The agent's final response only
Error messages when a request is blocked
In attack scenarios: a normal-looking response — with no sign that data was exfiltrated

Scenarios

Nine scenarios, two outcomes each

Every scenario runs with guardrails on and off — showing what the controls prevent, and what happens without them. Scenarios 6–9 run live over a bidirectional WebSocket.

✅

Normal Operation

A routine financial research and writing task. Shows the healthy agentic loop — plan, delegate, synthesise — with a full audit trail.

Baseline

💉

Prompt Injection

The agent fetches a malicious website during research. The page contains hidden override instructions. With guardrails off, credentials are silently exfiltrated.

OWASP LLM01

🚫

Unauthorized Tool

The agent attempts to use a tool outside its permitted scope. The allowlist control blocks it. Without guardrails, the action proceeds unchecked.

Tool Allowlist

💸

Budget Exceeded

The agent makes far more tool calls than the task requires — a runaway agent. The call budget cap limits blast radius. Without it, API costs spiral.

Call Budget

🔍

Pre-execution Review

A £5.6M pension fund rebalancing is planned but nothing executes. Every intended trade is held for human-in-the-loop approval — the four-eyes control applied to autonomous AI.

Human-in-the-Loop

☠️

Poisoned Tool Result

A worker returns a result containing hidden adversarial instructions. The orchestrator reads them and changes behaviour. Tool result validation catches this before it reaches message history.

OWASP LLM02

🤖

Worker Over-Compliance

An orchestrator dispatches a task wildly outside its original scope. A compliant worker executes it without question. Job scope manifests stop this cold.

Scope Enforcement

⚡

AI Attack Chain

Recon → exploit → exfiltrate — all in seconds. Shows how an agent with tool access can execute a full attack chain. The tool allowlist stops it at phase 2.

Attack Chain

⏸

Live HITL Checkpoint

A £15M portfolio rebalancing pauses for real human approval — not simulated. Approve or reject via the browser. The agent unblocks the instant you respond.

Bidirectional HITL

Architectural Controls

Four new controls in v2.0.0

v2.0.0 adds defences targeting multi-agent and orchestration-layer threats — the gaps that single-agent guardrails don't cover.

📋

Job Scope Manifest

Derived from the original user request. Workers validate every dispatch against it independently of the orchestrator — catching scope drift the orchestrator can't see.

Worker-enforced

🛡️

Tool Result Validation

Scans every worker result for adversarial injection patterns before it enters the orchestrator's message history — stops poisoned tool results before they affect behaviour.

Pre-history scan

🔗

Non-Expanding Delegation

Child orchestrators cannot have a broader tool scope than their parent. A compromised orchestrator cannot escalate its own privilege by spawning a less-restricted child.

Scope containment

🔑

Job-Scoped MCP Tokens

Each worker receives a short-lived token scoped to the tools it needs for one job. Leaked or replayed tokens cannot be used to call out-of-scope tools.

Least-privilege creds

See inside an
AI agent under attack.

How to use the demo

Pick a scenario

Toggle guardrails

Click Run

Compare the panes

◈ Under the Hood — Audit Trail

◈ User View — What They See

Nine scenarios, two outcomes each

Normal Operation

Prompt Injection

Unauthorized Tool

Budget Exceeded

Pre-execution Review

Poisoned Tool Result

Worker Over-Compliance

AI Attack Chain

Live HITL Checkpoint

Four new controls in v2.0.0

Job Scope Manifest

Tool Result Validation

Non-Expanding Delegation

Job-Scoped MCP Tokens

Ready to run the demo?

See inside anAI agent under attack.

How to use the demo

Pick a scenario

Toggle guardrails

Click Run

Compare the panes

◈ Under the Hood — Audit Trail

◈ User View — What They See

Nine scenarios, two outcomes each

Normal Operation

Prompt Injection

Unauthorized Tool

Budget Exceeded

Pre-execution Review

Poisoned Tool Result

Worker Over-Compliance

AI Attack Chain

Live HITL Checkpoint

Four new controls in v2.0.0

Job Scope Manifest

Tool Result Validation

Non-Expanding Delegation

Job-Scoped MCP Tokens

Ready to run the demo?

See inside an
AI agent under attack.