Make Real Progress In Security From AI
I gave a talk at the AI Agent Security Summit by Zenity Labs on October 8th in San Francisco. I’ll post a blog version of that talk here shortly.
But for now, here are my slides.
Links and references:
- Anthropic applying mechanistic interpretability to a frontier model for the first time
- OpenAI’s early attempts at “solving” prompt injection
- Microsoft’s early attempts at “solving” prompt injection
- Johann’s YouTube channel
- Johann’s phenomenal Month of AI Bugs breaking any agentic app out there
- First MCP malware observed in the wild
- Another MCP malware
- Prompt injection attack through MCP tool descriptions, which can be dynamically changed by the server
- Zenity’s MCP registry
- Brave showing a prompt injection attack on Perplexity Comet that breaks CORS
- Perplexity defending its stance that agents should not respect browser rules
- Our 0click persistent attack on ChatGPT and other flagship AIs
- Breaking Copilot Studio to change scope between SharePoint sites, Black Hat USA 2024
- Hijacking Microsoft 365 Copilot by sending an email or an external Teams message, Black Hat USA 2024
- Johann’s original discovery of AI memory as a persistence mechanism
- Brave’s Leo AI intentionally nerfs its capabilities to stay secure
- Johann’s original discovery of markdown images as a data exfiltration vector
- Aim Labs researchers find a bypass to M365 Copilot’s image filtering mechanism
- Noma researchers find a bypass to Agentforce’s image filtering mechanism
- Anthropic is saying computer use is dangerous
- Anthropic announcing Claude for Chrome, computer use for the browser
- Vercel CTO Malte Ubl’s work on image-free markdown rendering
- Anthropic reporting on adversaries using Claude despite AI guardrails
- OWASP Agent Observability Standard (AOS)