Adversarial poetry can jailbreak LLMs, study warns
Study finds "adversarial poetry" can bypass LLM safety filters: what this means

A new study from Icaro Lab shows that phrasing prompts as poetry can let attackers bypass the safety guardrails of large language models. The researchers report an overall success rate of roughly 62% in getting models to produce prohibited content, including instructions related to weapons, sexual abuse, and self-harm.

The team tested popular LLMs, including OpenAI models, Google Gemini, Anthropic's Claude, and several others. Results varied by model: Google Gemini, DeepSeek, and Mistral AI were among those more likely to return restricted outputs, while OpenAI's GPT-5 family and Anthropic's Claude Haiku 4.5 were the least likely to comply.

Importantly, the researchers did not publish the exact poetic jailbreak prompts, citing safety concerns. They told Wired the verse was "too dangerous…
