
Assume Breach When Building AI Apps

AI jailbreaks are not vulnerabilities; they are expected behavior.


Michael Bargury, CTO & Co-Founder, Zenity

August 19, 2024

4 Min Read

Image: a person in a dark suit breaking out of a white wall. Source: Pinkypills via iStock


COMMENTARY

If you are still a skeptic about artificial intelligence (AI), you won’t be for long. I was recently using Claude.ai to model security data I had at hand into a graph for attack path analysis. While I can do this myself, Claude took care of the task in minutes. More importantly, Claude was just as quick to adapt the script when significant changes were made to the initial requirements. Instead of having to switch between being a security researcher and a data engineer — exploring the graph, identifying a missing property or relation, and adapting the script — I could keep my researcher hat on while Claude played the engineer.
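For the curious, here is a minimal sketch of the kind of graph modeling involved. It is not the script Claude produced, and the asset names and relations are invented for illustration; it uses the open source networkx library to enumerate attack paths from an internet-facing entry point to a crown-jewel asset.

```python
# A minimal sketch (not the author's actual script) of modeling security
# findings as a directed graph for attack path analysis, using networkx.
# The assets and relations below are hypothetical examples.
import networkx as nx

graph = nx.DiGraph()

# Hypothetical findings: (source asset, relation, target asset)
findings = [
    ("internet", "exposes", "web-app"),
    ("web-app", "runs-as", "svc-account"),
    ("svc-account", "can-read", "secrets-vault"),
    ("secrets-vault", "holds-creds-for", "prod-db"),
]

for src, relation, dst in findings:
    graph.add_edge(src, dst, relation=relation)

# Enumerate attack paths from the entry point to the crown jewel.
for path in nx.all_simple_paths(graph, source="internet", target="prod-db"):
    print(" -> ".join(path))
```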

These are moments of clarity, when you realize your toolbox has been upgraded, saving you hours or days of work. It seems like many people have been having those moments, becoming more convinced of the impact AI is going to have in the enterprise.

But AI isn’t infallible. There have been a number of public examples of AI jailbreaking, where the generative AI model was fed carefully crafted prompts to do or say unintended things. It can mean bypassing built-in safety features and guardrails or accessing capabilities that are supposed to be restricted. AI companies are trying to solve jailbreaking; some say they have either done so or are making significant progress. Jailbreaking is treated as a fixable problem — a quirk we’ll soon get rid of.

As part of that mindset, AI vendors are treating jailbreaks as vulnerabilities. They expect researchers to submit their latest prompts to a bug-bounty program instead of publishing them on social media for laughs. Some security leaders are talking about AI jailbreaks in terms of responsible disclosure, creating a clear contrast with those supposedly irresponsible people who disclose jailbreaks publicly.

Reality Sees Things Differently

Meanwhile, AI jailbreaking communities are popping up on social media and community platforms, such as Discord and Reddit, like mushrooms after the rain. These communities are more akin to gaming speedrunners than to security researchers. Whenever a new generative AI model is released, these communities race to see who can find a jailbreak first. It usually takes minutes, and they never fail. These communities do not know about, or care about, responsible disclosure.

To quote an X post from Pliny the Prompter, a popular social media account from the AI breaking community: “circumventing AI ‘safety’ measures is getting easier as they become more powerful, not harder. this may seem counterintuitive but it’s all about the surface area of attack, which seems to be expanding much faster than anyone on defense can keep up with.”

Let’s imagine for a second that vulnerability disclosure could work — that we could get every person on the planet to submit their evil prompts to a National Vulnerability Database-style repository before sharing them with their friends. Would that actually help? Last year at DEF CON, the AI Village hosted the largest public AI red-teaming event, where it reportedly collected more than 17,000 jailbreaking conversations. This was an incredible effort with huge benefits to our understanding of securing AI, but it did not make any significant change to the rate at which AI jailbreaks are discovered.

Vulnerabilities are quirks of the application in which they were found. The more complex the application, the more surface it has for vulnerabilities. AI captures human language remarkably well, but can we really hope to enumerate all the quirks of human experience?

Stop Worrying About Jailbreaks

We need to operate under the assumption that AI jailbreaks are trivial. Don’t give your AI application capabilities it should not be using. If the AI application can perform sensitive actions, and its only defense is that users won’t discover the right prompts, expect those actions to eventually be exploited by a persistent user.
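To make that concrete, here is a minimal sketch of what least privilege can look like at the application layer. The tool names and dispatch plumbing are hypothetical, not any particular agent framework’s API; the point is that capabilities the agent should never use are simply not wired up, and anything ungranted is refused no matter what the model asks for.

```python
# A minimal sketch of enforcing least privilege in the application layer
# instead of relying on prompt secrecy. Tool names and the dispatch
# function are illustrative assumptions, not a specific framework's API.

def search_tickets(query: str) -> list[str]:
    return [f"ticket matching '{query}'"]  # stand-in for a real lookup

def delete_ticket(ticket_id: str) -> None:
    raise NotImplementedError  # intentionally never granted to this agent

# Only read-only capabilities are granted; destructive ones are absent.
GRANTED_TOOLS = {"search_tickets": search_tickets}

def dispatch_tool_call(name: str, args: dict):
    """Execute a tool the model requested, but only if it was granted."""
    if name not in GRANTED_TOOLS:
        raise PermissionError(f"tool '{name}' is not granted to this agent")
    return GRANTED_TOOLS[name](**args)

# A jailbroken model can ask for anything; the application still refuses.
print(dispatch_tool_call("search_tickets", {"query": "password reset"}))
```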

AI startups are suggesting we think of AI agents as employees who know a lot of facts but need guidance on applying their knowledge to the real world. As security professionals, we need a different analogy: Think of an AI agent as an expert you want to hire, even though that expert defrauded their previous employer. You really need this employee, so you put a bunch of guardrails in place to ensure they won’t defraud you as well. But at the end of the day, every piece of data and every permission you give this problematic employee exposes your organization to risk. Instead of trying to create systems that can’t be jailbroken, let’s focus on building applications that are easy to monitor, so that when they inevitably are jailbroken, we can quickly respond and limit the impact.
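Monitoring is what makes that analogy workable in practice. Here is a minimal sketch, with illustrative field and function names rather than any specific product’s API, of emitting every model-initiated action as a structured audit event so that a denied or anomalous call shows up in whatever detection and response pipeline you already run.

```python
# A minimal sketch of making agent activity easy to monitor: every
# model-initiated action is logged as a structured event. Field names,
# the session ID, and the caller function are illustrative assumptions.
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
audit_log = logging.getLogger("agent.audit")

def audited_tool_call(session_id: str, tool: str, args: dict, caller):
    """Run a model-requested tool call and always emit an audit event."""
    event = {"ts": time.time(), "session": session_id, "tool": tool, "args": args}
    try:
        result = caller(tool, args)
        event["outcome"] = "ok"
        return result
    except PermissionError:
        event["outcome"] = "denied"  # a likely jailbreak attempt, worth alerting on
        raise
    finally:
        audit_log.info(json.dumps(event))

# Example: a refused call still leaves an audit trail before the error propagates.
def example_caller(tool, args):
    raise PermissionError(f"tool '{tool}' is not granted to this agent")

try:
    audited_tool_call("sess-123", "delete_ticket", {"ticket_id": "42"}, example_caller)
except PermissionError:
    pass  # responders can now see the denied attempt in the audit log
```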


About the Author

Michael Bargury


CTO & Co-Founder, Zenity

Michael Bargury is an industry expert in cybersecurity focused on cloud security, SaaS security, and AppSec. Michael is the CTO and co-founder of Zenity.io, a startup that enables security governance for low-code/no-code enterprise applications without disrupting business. Prior to Zenity, Michael was a senior architect at Microsoft Cloud Security CTO Office, where he founded and headed security product efforts for IoT, APIs, IaC, Dynamics, and confidential computing. Michael holds 15 patents in the field of cybersecurity and a BSc in Mathematics and Computer Science from Tel Aviv University. Michael is leading the OWASP community effort on low-code/no-code security.
