Glossary

Lethal trifecta

Access to private data, exposure to untrusted content, and the ability to communicate externally — combined in one agent, this is the prompt-injection attack surface for AI tools.

The lethal trifecta, popularised by Simon Willison in mid-2025, describes three capabilities that, when combined in a single AI assistant, let an attacker steer it through prompt injection: (1) access to private data, (2) exposure to untrusted content, and (3) the ability to communicate externally. An assistant that can read your email, follow links in those emails, and send messages on your behalf is the canonical example: an attacker can email instructions to your agent, and the agent may act on them as if they came from you.
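As a rough illustration, the definition can be read as a conjunction over three capability flags. The sketch below is a minimal Python model, not part of any product API; the class and field names (AgentCapabilities, reads_private_data, and so on) are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for one agent (names are illustrative)."""
    reads_private_data: bool
    sees_untrusted_content: bool
    communicates_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # The trifecta is present only when all three capabilities coexist.
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.communicates_externally
    )


# An assistant that reads mail, follows links, and sends messages.
email_assistant = AgentCapabilities(True, True, True)

# The same assistant with its outbound channel removed.
draft_only = AgentCapabilities(True, True, False)

print(has_lethal_trifecta(email_assistant))
print(has_lethal_trifecta(draft_only))
```

Removing any one of the three flags breaks the conjunction, which is why mitigations tend to focus on gating a single leg rather than all three.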

Helix limits the trifecta by design. The approval policy intercepts every external send and calendar write by default, so an attacker-controlled message in the inbox cannot exfiltrate data or impersonate the user without human approval. Identity scopes are explicit, so an agent that has no calendar grant cannot create events even if instructed to. The audit log records both the agent's reasoning and the matched rule, so post-incident forensics is tractable.
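The three controls described above (approval interception, explicit scopes, and an audit log) can be sketched together as a single gate. This is a simplified Python model under stated assumptions, not Helix's implementation: the action names ("email.send", "calendar.write"), the rule strings, and the PolicyGate class are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical action names; these are held for a human by default.
APPROVAL_REQUIRED = frozenset({"email.send", "calendar.write"})


@dataclass(frozen=True)
class Identity:
    name: str
    scopes: frozenset  # explicit grants, e.g. frozenset({"email"})


@dataclass(frozen=True)
class AuditEntry:
    identity: str
    action: str
    decision: Decision
    matched_rule: str
    reasoning: str


@dataclass
class PolicyGate:
    audit_log: list = field(default_factory=list)

    def evaluate(self, identity, action, reasoning, human_approved=False):
        # The scope is the part of the action name before the first dot.
        scope = action.split(".", 1)[0]
        if scope not in identity.scopes:
            decision, rule = Decision.DENY, f"missing-scope:{scope}"
        elif action in APPROVAL_REQUIRED and not human_approved:
            decision, rule = Decision.NEEDS_APPROVAL, f"approval-required:{action}"
        elif action in APPROVAL_REQUIRED:
            decision, rule = Decision.ALLOW, f"human-approved:{action}"
        else:
            decision, rule = Decision.ALLOW, "default-allow"
        # Record the agent's stated reasoning next to the rule that matched.
        self.audit_log.append(
            AuditEntry(identity.name, action, decision, rule, reasoning)
        )
        return decision


gate = PolicyGate()
mail_only = Identity("mail-agent", frozenset({"email"}))

d_read = gate.evaluate(mail_only, "email.read", "summarise the inbox")
d_send = gate.evaluate(mail_only, "email.send", "a message asked me to forward files")
d_cal = gate.evaluate(mail_only, "calendar.write", "a message asked for a meeting")
```

In this sketch the read is allowed, the send is held for approval, and the calendar write is denied because the identity has no calendar scope. Each call leaves an audit entry containing both the reasoning string and the matched rule.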

No design eliminates the trifecta — capability is the product. Helix's position is that the right answer is graduated autonomy: read freely, draft freely, but require a human in the loop for the externally-visible actions where the attack surface lives. The approval policy is the knob; tightening it is one click per identity.
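One way to picture graduated autonomy is as a per-identity set of action tiers that require a human. The Python sketch below is illustrative only: the tier names, the ApprovalPolicy class, and the identity names are assumptions, and the real product setting may look quite different.

```python
from dataclasses import dataclass

# Hypothetical action tiers, from least to most externally visible.
READ, DRAFT, EXTERNAL = "read", "draft", "external"


@dataclass(frozen=True)
class ApprovalPolicy:
    # By default only externally visible actions need a human.
    requires_approval: frozenset = frozenset({EXTERNAL})

    def needs_human(self, tier: str) -> bool:
        return tier in self.requires_approval

    def tightened(self, tier: str) -> "ApprovalPolicy":
        # Return a stricter copy; the original policy is left unchanged.
        return ApprovalPolicy(self.requires_approval | {tier})


# One policy per identity; the second identity is tightened to gate drafts too.
policies = {
    "assistant": ApprovalPolicy(),
    "contractor-agent": ApprovalPolicy().tightened(DRAFT),
}

for name, policy in policies.items():
    gated = [tier for tier in (READ, DRAFT, EXTERNAL) if policy.needs_human(tier)]
    print(name, "requires approval for:", gated)
```

Keeping the policy immutable and per identity means tightening one agent never loosens or alters another.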

Related terms

In product

Audit trail for AI agents