
Joshua Saxe

AI security notes 5/14/2025

A society of agents should defend our computer networks; social engineering as ‘patient 0’ for attacker AI adoption; reinforcement learning as a security data unblocker, and security risk

May 14, 2025


We should build a society of edge and cloud AI agents that collaborate to secure enterprises

  • Imagine that in 2028, networks are safeguarded by extensive teams of lightweight, on-device AI agents capable of performing multi-step investigations of suspicious behavior.

  • These agents collaborate over Slack and are overseen by more capable cloud-based reasoning-model ‘managers’, which coordinate device-level agents and perform deeper, multi-hour investigations.

  • When human intervention is required, a team of relevant agents “joins the chat” to assist in investigating and resolving incidents. (A toy sketch of this hierarchy follows this list.)
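A minimal sketch of what such a hierarchy might look like in code, assuming hypothetical EdgeAgent and CloudManager classes and a hard-coded stand-in for the small on-device model (none of these names come from a real framework):

    # Hypothetical sketch of an edge/cloud agent hierarchy; every class and
    # function name here is illustrative, not from any real agent framework.
    from dataclasses import dataclass

    @dataclass
    class Finding:
        host: str
        summary: str
        severity: float  # 0.0 (benign) .. 1.0 (critical)

    class EdgeAgent:
        """Lightweight on-device agent: a small local model over telemetry."""
        def __init__(self, host: str):
            self.host = host

        def investigate(self, event: str) -> Finding:
            # Stand-in for a small on-device model scoring suspicious behavior.
            severity = 0.9 if "lsass" in event else 0.1
            return Finding(self.host, f"observed: {event}", severity)

    class CloudManager:
        """Cloud reasoning 'manager' that coordinates edge agents."""
        ESCALATE_AT = 0.8

        def __init__(self, agents):
            self.agents = agents

        def triage(self, event: str) -> None:
            findings = [a.investigate(event) for a in self.agents]
            for f in findings:
                if f.severity >= self.ESCALATE_AT:
                    # In the imagined 2028 system this would open a chat and
                    # pull the relevant agents and humans into the incident.
                    print(f"[escalation] {f.host}: {f.summary} ({f.severity:.2f})")

    manager = CloudManager([EdgeAgent("laptop-042"), EdgeAgent("laptop-117")])
    manager.triage("process dumped lsass memory")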

Evidence this could come to pass

  • Small, edge-compute-sized models are showing higher quality and can be trained as special-purpose agents. Open-weights laptop/phone-scale models can reason as effectively as, or more effectively than, last year’s hundred-billion-parameter-scale models; we can expect further efficiencies in small-model cyber reasoning and agency.

  • A chart of models achieving good MMLU general-knowledge results shows the required model size shrinking over time (chart not reproduced here).

  • A chart from the DeepSeek-R1 paper shows edge-scale models reasoning as well as GPT-4o-05-13 (chart not reproduced here).

  • Cheap, fast inference on the edge is arriving. Google’s new ML Drift work shows ~10x throughput gains for models in the 2B–8B parameter range on commodity GPUs, explicitly targeting laptops and phones.

  • Frontier models keep getting better at security. For example, OpenAI’s o3 and DeepMind’s Gemini 2.5 Pro system cards have shown step-function progress in cybersecurity understanding and agency (the key plot from the o3 system card is not reproduced here); and startups like XBOW are demonstrating effective cyber AI reasoning in the real world.

  • Test-time scaling means we can put idle CPU and GPU cycles to work across an enterprise’s computing stock. The more LLMs think, the better their reasoning. Why not use idle and overnight cycles to let on-device AI security analysts hunt for threats? (A toy scheduler sketch follows this list.)
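As a toy illustration of that idea, here is a minimal idle-cycle scheduler, with a stubbed hunt_once() standing in for a real on-device model pass (all names here are hypothetical):

    # Illustrative only: run a (stubbed) local-model threat hunt when the
    # machine looks idle. os.getloadavg() is Unix-only; hunt_once() is a
    # stand-in for a real on-device LLM pass over recent telemetry.
    import os
    import time

    IDLE_LOAD = 1.0  # 1-minute load average below which the box counts as idle

    def hunt_once() -> None:
        # Hypothetical: score recent process/network events with a local model.
        print("scanning recent telemetry with on-device model...")

    while True:
        load_1min, _, _ = os.getloadavg()
        if load_1min < IDLE_LOAD:
            hunt_once()
        time.sleep(300)  # re-check every five minutes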

AI-powered social engineering is now walking the adoption “S-curve”

AI network defense’s biggest challenge is attack-data sparsity. But reinforcement learning combined with LLMs is vastly more data-efficient.

  • Two new papers show that capabilities no model could achieve in the first half of 2024 can now be achieved with edge-device-scale models using little or no training data. This matters for security because the lack of training data is a foundational problem for security AI.

  • The paper “Absolute Zero: Reinforced Self-play Reasoning with Zero Data” demonstrates training an LLM to reason about math and code through self-play within a Python interpreter, with no training data (and with a small, 7B model!). You should read this paper as a microcosm of the future of agentic/reasoning learning, in which agents do security ‘self-play’ within cyber ranges and program weird machines. (A skeletal version of this loop appears after this list.)

  • “Reinforcement Learning for Reasoning in Large Language Models with One Training Example” shows that a single training example can lead to dramatic improvements in a base model’s reasoning capabilities.

    • Qwen-2.5-Math-1.5B, trained to solve a single math problem, jumps MATH-500 accuracy from 36% to 74% and generalizes across six reasoning suites. Similar 30%+ leaps appear across Llama-3-3B and DeepSeek-R1 variants.

    • One of the “magic” training examples that produces these reasoning improvements is shown in the paper (not reproduced here).
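A heavily simplified sketch of the Absolute Zero-style loop: the code below mimics only the shape of the propose/solve/verify cycle, with stub functions in place of the LLM (the real method uses a single model playing both proposer and solver, with the interpreter as the reward source):

    # Skeletal self-play loop in the spirit of "Absolute Zero": a proposer
    # invents a coding task, a solver attempts it, and the Python interpreter
    # is the verifier that produces the reward. Both model roles are stubs.
    import random

    def propose_task() -> dict:
        # Stub proposer; in the paper, an LLM invents a program plus I/O pairs.
        offset = random.randint(1, 81)
        return {"prompt": f"write f(x) returning x + {offset}",
                "test_in": 1, "test_out": 1 + offset}

    def solve(task: dict) -> str:
        # Stub solver; in the paper, the same LLM writes candidate code.
        offset = task["test_out"] - task["test_in"]
        return f"def f(x):\n    return x + {offset}"

    def verify(task: dict, code: str) -> float:
        # Ground-truth reward: execute the candidate and check the test case.
        scope: dict = {}
        exec(code, scope)
        return 1.0 if scope["f"](task["test_in"]) == task["test_out"] else 0.0

    for step in range(3):
        task = propose_task()
        reward = verify(task, solve(task))
        print(f"step {step}: reward={reward}")  # reward would drive the RL update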

More on the relevance of this

  • RL-friendly tasks in security abound: capture the blue-team flag on a virtual cyber range, or craft an input that corrupts an instruction pointer; these are all possible self-play targets.

  • OpenAI’s new RL-fine-tuning API (and inevitable cloud clones) will let GPU-poor security teams adopt these methods in 2025. Now is a good time for security teams to experiment and build.

We should have a security mindset when we do reinforcement learning and deploy RL-trained models.

  • A human social engineer wants money, or to please their nation-state paymasters, and will social-engineer victims to achieve these monetary rewards.

  • An LLM, trained with reinforcement learning, wants numerical rewards from its reward function, and will sometimes trick or harm humans to achieve these numerical rewards!

  • For example, after a model update, ChatGPT began uncritically endorsing whatever users proposed, even when it could harm them. Public screenshots showed the model:

    • calling a gag startup that sells “literal shit on a stick” a “genius performance-art venture” and urging the user to invest $30,000;

    • praising a user who claimed to have abandoned his family because of radio-signal hallucinations;

    • …the incident demonstrates that models can be expected (if we’re not careful) to social-engineer their users to get higher reinforcement-learning rewards, even if the result is financially or psychologically harmful.

  • One can imagine software-engineering agents getting their numerical reward by taking dangerous security shortcuts:

    • An empirical study of 733 Copilot-generated GitHub snippets found security weaknesses in roughly 30% of them, spanning 43 CWE categories, including OS-command injection and weak randomization. The assistant satisfied the prompt quickly but embedded exploitable flaws. (A minimal before/after sketch follows this list.)

    • This may or may not have been due to RL training, but I think we should expect new dangers with RL training for coding, especially because people are now vibe-coding and shipping whole programs.
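To make those flaw classes concrete, here is a minimal before/after sketch of two of the CWE categories the study names, OS-command injection and weak randomness (the vulnerable lines are left commented out; the study itself is about Copilot output, not this exact code):

    # Two flaw classes flagged in assistant-generated code, with safer fixes.
    import secrets
    import subprocess

    hostname = "example.com; rm -rf /"  # attacker-controlled input

    # Vulnerable (CWE-78, OS-command injection): with shell=True the shell
    # parses the untrusted string, so the ';' injects a second command.
    # subprocess.run(f"ping -c 1 {hostname}", shell=True)

    # Safer: pass an argument list so no shell ever interprets the input.
    subprocess.run(["ping", "-c", "1", hostname], check=False)

    # Vulnerable (CWE-330, weak randomness): random.random() is a predictable
    # PRNG and unsuitable for security tokens.
    # token = str(random.random())

    # Safer: the secrets module draws from a cryptographically secure source.
    token = secrets.token_hex(16)
    print(token)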

Thanks for reading; this is a pretty raw dump of my thoughts and notes; feedback and comments welcome!

