Assume Breach When Building AI Apps
Cybersecurity In-Depth: Feature articles on security strategy, latest trends, and people to know.
Assume Breach When Building AI Apps
AI jailbreaks are not vulnerabilities; they are expected behavior.
Michael Bargury, CTO & Co-Founder, Zenity
August 19, 2024
4 Min Read
Source: Pinkypills via iStock
LinkedinFacebookTwitterRedditEmail
COMMENTARY
If you are still a skeptic about artificial intelligence (AI), you won’t be for long. I was recently using Claude.ai to model security data I had at hand into a graph for attack path analysis. While I can do this myself, Claude took care of the task in minutes. More importantly, Claude was just as quick to adapt the script when significant changes were made to the initial requirements. Instead of me having to switch between being a security researcher and data engineer — exploring the graph, identifying a missing property or relation, and adapting the script — I could keep on my researcher hat while Claude played the engineer.
These are moments of clarity, when you realize your toolbox has been upgraded, saving you hours or days of work. It seems like many people have been having those moments, becoming more convinced of the impact AI is going to have in the enterprise.
But AI isn’t infallible. There have been a number of public examples of AI jailbreaking, where the generative AI model was fed carefully crafted prompts to do or say unintended things. It can mean bypassing built-in safety features and guardrails or accessing capabilities that are supposed to be restricted. AI companies are trying to solve jailbreaking; some say they have either done so or are making significant progress. Jailbreaking is treated as a fixable problem — a quirk we’ll soon get rid of.
As part of that mindset, AI vendors are treating jailbreaks as vulnerabilities. They expect researchers to submit their latest prompts to a bug-bounty program instead of publishing them on social media for laughs. Some security leaders are talking about AI jailbreaks in terms of responsible disclosure, creating a clear contrast with those supposedly irresponsible people who disclose jailbreaks publicly.
Reality Sees Things Differently
Meanwhile, AI jailbreaking communities are popping up on social media and community platforms, such as Discord and Reddit, like mushrooms after the rain. These communities are more akin to gaming speedrunners than to security researchers. Whenever a new generative AI model is released, these communities race to see who can find a jailbreak first. It usually takes minutes, and they never fail. These communities do not know about, of care about, responsible disclosure.
To quote an X post from Pliny the Prompter, a popular social media account from the AI breaking community: “circumventing AI ‘safety’ measures is getting easier as they become more powerful, not harder. this may seem counterintuitive but it’s all about the surface area of attack, which seems to be expanding much faster than anyone on defense can keep up with.”
Let’s imagine for a second that vulnerability disclosure could work — that we can get every person on the planet to submit their evil prompts to a National Vulnerability Database-style repository before sharing it with their friends. Would that actually help? Last year at DEF CON, the AI village hosted the largest public AI red-teaming event, where they reportedly collected over 17,000 jailbreaking conversations. This was an incredible effort with huge benefits to our understanding of securing AI, but it did not make any significant change to the rate at which AI jailbreaks are discovered.
Vulnerabilities are quirks of the application in which they were found. If the application is complex, it has more surface for vulnerabilities. AI captures human languages so well, but can we really hope to enumerate all quirks of the human experience?
Stop Worrying About Jailbreaks
We need to operate under the assumption that AI jailbreaks are trivial. Don’t give your AI application capabilities it should not be using. If the AI application can perform actions and relies on people not knowing those prompts as a defense mechanism, expect those actions to be eventually exploited by a persistent user.
AI startups are suggesting we think of AI agents as employees who know a lot of facts but need guidance on applying their knowledge to the real world. As security professionals, I believe we need a different analogy: I suggest you think of an AI agent as an expert you want to hire, even though that expert defrauded their previous employer. You really need this employee, so you put a bunch of guardrails in place to ensure this employee won’t defraud you as well. But at the end of the day, every data and access you give this problematic employee exposes your organization and is risky. Instead of trying to create systems that can’t be jailbroken, let’s focus on applications that are easy to monitor for when they inevitably are, so we can quickly respond and limit the impact.
LinkedinFacebookTwitterRedditEmail
About the Author
CTO & Co-Founder, Zenity
Michael Bargury is an industry expert in cybersecurity focused on cloud security, SaaS security, and AppSec. Michael is the CTO and co-founder of Zenity.io, a startup that enables security governance for low-code/no-code enterprise applications without disrupting business. Prior to Zenity, Michael was a senior architect at Microsoft Cloud Security CTO Office, where he founded and headed security product efforts for IoT, APIs, IaC, Dynamics, and confidential computing. Michael holds 15 patents in the field of cybersecurity and a BSc in Mathematics and Computer Science from Tel Aviv University. Michael is leading the OWASP community effort on low-code/no-code security.
Keep up with the latest cybersecurity threats, newly discovered vulnerabilities, data breach information, and emerging trends. Delivered daily or weekly right to your email inbox.
More Insights
Webinars
- Think Like a Cybercriminal to Stop the Next Potential Attack Jul 22, 2025
- Elevating Database Security: Harnessing Data Threat Analytics and Security Posture Jul 23, 2025
- The DOGE-effect on Cyber: What’s happened and what’s next? Jul 24, 2025
- Solving ICS/OT Patching and Vulnerability Management Conundrum Jul 30, 2025
- Creating a Roadmap for More Effective Security Partnerships Aug 14, 2025
Events
- [Virtual Event] Strategic Security for the Modern Enterprise Jun 26, 2025
- [Virtual Event] Anatomy of a Data Breach Jun 18, 2025
- [Conference] Black Hat USA - August 2-7 - Learn More Aug 2, 2025
You May Also Like
Application Security
‘Void Banshee’ Exploits Second Microsoft Zero-Day
Application Security
Microsoft VS Code Undermined in Asian Spy Attack
Application Security
Hackers Use Rare Stealth Techniques to Down Asian Military, Gov’t Orgs
Application Security
Microsoft Talks Kernel Drivers Post CrowdStrike Outage
Edge Picks
Browser Extensions Pose Heightened, but Manageable, Security Risks Browser Extensions Pose Heightened, but Manageable, Security Risks
URL bar of a browser showing part of a website address Endpoint Security
Gartner: Secure Enterprise Browser Adoption to Hit 25% by 2028 Gartner: Secure Enterprise Browser Adoption to Hit 25% by 2028
Icons for Chrome, Edge, and Firefox browsers on a screen Endpoint Security
ClickFix Spin-Off Attack Bypasses Key Browser Safeguards ClickFix Spin-Off Attack Bypasses Key Browser Safeguards
Stream of 0s and 1s running alongside padlock icons Endpoint Security
Extension Poisoning Campaign Highlights Gaps in Browser Security Extension Poisoning Campaign Highlights Gaps in Browser Security
Latest Articles in The Edge
5 Min Read
- AI Is Reshaping How Attorneys Practice Law Jul 15, 2025 |
5 Min Read
- Browser Exploits Wane as Users Become the Attack Surface Jul 9, 2025 |
6 Min Read
- Unlock Security Operations Success With Data Analysis Jul 8, 2025 |
2 Min Read
Cookies Button
About Cookies On This Site
We and our partners use cookies to enhance your website experience, learn how our site is used, offer personalised features, measure the effectiveness of our services, and tailor content and ads to your interests while you navigate on the web or interact with us across devices. By clicking “Continue” or continuing to browse our site you are agreeing to our and our partners use of cookies. For more information see Privacy Policy
CONTINUE
Cookie Policy
When you visit any website, it may store or retrieve information on your browser, mostly in the form of cookies. This information might be about you, your preferences or your device and is mostly used to make the site work as you expect it to. The information does not usually directly identify you, but it can give you a more personalized web experience. Because we respect your right to privacy, you can choose not to allow some types of cookies. Click on the different category headings to find out more and change our default settings. However, blocking some types of cookies may impact your experience of the site and the services we are able to offer.
Allow All
Manage Consent Preferences
Strictly Necessary Cookies
Always Active
These cookies are necessary for the website to function and cannot be switched off in our systems. They are usually only set in response to actions made by you which amount to a request for services, such as setting your privacy preferences, logging in or filling in forms. You can set your browser to block or alert you about these cookies, but some parts of the site will not then work. These cookies do not store any personally identifiable information.
Performance Cookies
Always Active
These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us to know which pages are the most and least popular and see how visitors move around the site. All information these cookies collect is aggregated and therefore anonymous. If you do not allow these cookies we will not know when you have visited our site, and will not be able to monitor its performance.
Functional Cookies
Always Active
These cookies enable the website to provide enhanced functionality and personalisation. They may be set by us or by third party providers whose services we have added to our pages. If you do not allow these cookies then some or all of these services may not function properly.
Targeting Cookies
Always Active
These cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.
Back Button
Cookie List
Search Icon
Filter Icon
Clear
checkbox labellabel
ApplyCancel
ConsentLeg.Interest
checkbox labellabel
checkbox labellabel
checkbox labellabel
Confirm My Choices