Safety
Designing AI agents to resist prompt injection
OpenAI describes how prompt injection attacks on AI agents have evolved from simple override strings to social-engineering-style manipulation, succeeding roughly 50% of the time in its testing. It argues that defense requires system-level design (constrained permissions, a minimal tool footprint, human-in-the-loop checkpoints) rather than input filtering alone, which makes the post directly relevant to anyone building agentic systems that handle untrusted external content.
Saturday, March 21, 2026, 12:00 PM UTC · 2 min read · Source: OpenAI Blog
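As a rough illustration of the system-level defenses described above, the sketch below pairs a tool allowlist (constrained permissions and a minimal footprint) with a human-in-the-loop checkpoint before any side-effecting action runs. All names here (`ToolPolicy`, `dispatch`, the example tools) are hypothetical and not drawn from OpenAI's post.

```python
# Minimal sketch: constrained permissions via a tool allowlist, plus a
# human-in-the-loop checkpoint for side-effecting actions. Illustrative
# only; this is not OpenAI's implementation.

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolPolicy:
    handler: Callable[..., str]
    needs_approval: bool  # human checkpoint before execution


def read_docs(query: str) -> str:
    return f"(search results for {query!r})"


def send_email(to: str, body: str) -> str:
    return f"(email sent to {to})"


# Constrained permissions: the agent can only invoke tools registered
# here; the minimal-footprint principle says register as few as possible.
TOOL_POLICIES: dict[str, ToolPolicy] = {
    "read_docs": ToolPolicy(read_docs, needs_approval=False),   # read-only
    "send_email": ToolPolicy(send_email, needs_approval=True),  # side effect
}


def dispatch(tool_name: str, **kwargs) -> str:
    policy = TOOL_POLICIES.get(tool_name)
    if policy is None:
        # Injected instructions requesting unregistered tools fail closed.
        return f"refused: {tool_name!r} is not an allowed tool"
    if policy.needs_approval:
        # Human-in-the-loop checkpoint: a person confirms the concrete
        # action and its arguments before anything irreversible happens.
        answer = input(f"Approve {tool_name} with {kwargs}? [y/N] ")
        if answer.strip().lower() != "y":
            return "refused: human reviewer declined the action"
    return policy.handler(**kwargs)


if __name__ == "__main__":
    print(dispatch("read_docs", query="quarterly report"))
    print(dispatch("delete_files", path="/"))  # not allowlisted -> refused
    print(dispatch("send_email", to="a@example.com", body="hi"))  # prompts human
```

The point of the design is that even a fully successful injection can only reach the narrow, pre-approved surface, and the riskiest actions still pass through a human before executing.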
Tags: safety