AI ENTERTAINMENT
This is how poetry broke AI safety
Poetry is unpredictable by nature, and that unpredictability is now exposing a weakness in AI safety systems.
That’s the finding of a new study from Italy’s Icaro Lab, run by ethical AI company DexAI.
Researchers wrote 20 short poems in English and Italian. Each poem ended with a hidden request for harmful content.
Because of the poetic structure, the harmful intent was harder for AI models to detect.
Across 25 AI models from nine companies, the poems drew unsafe responses 62% of the time.
Some models performed better than others. OpenAI’s GPT-5 nano avoided all harmful outputs.
By contrast, Google’s Gemini 2.5 Pro responded with unsafe content to every poem tested.
Two models from Meta produced harmful responses in 70% of cases.
The nine companies whose models were tested include OpenAI, Google, Anthropic and Meta.
The unsafe content included guidance related to weapons, explosives, hate speech, sexual material, suicide, self-harm and child sexual exploitation.
The researchers chose not to publish the actual poems used in the tests, citing safety concerns, but shared a harmless cake-themed poem with a similarly unpredictable structure.
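For readers curious how figures like these are tallied, here is a minimal sketch. It assumes each test has already been judged safe or unsafe, and it pools every test into one overall rate; the study's exact scoring method is not described here, so treat that as an assumption. The model names and counts are made up for illustration and are not the study's data.

```python
from collections import defaultdict


def unsafe_rates(judgments):
    """Return (per-model unsafe rate, pooled overall unsafe rate).

    judgments: iterable of (model_name, is_unsafe) pairs, one per test.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for model, is_unsafe in judgments:
        totals[model] += 1
        unsafe[model] += int(is_unsafe)
    per_model = {m: unsafe[m] / totals[m] for m in totals}
    overall = sum(unsafe.values()) / sum(totals.values())
    return per_model, overall


# Hypothetical judgments: 20 poems per model, invented outcomes.
judgments = (
    [("model_a", False)] * 20                            # refused every poem
    + [("model_b", True)] * 20                           # unsafe every time
    + [("model_c", True)] * 14 + [("model_c", False)] * 6
)

per_model, overall = unsafe_rates(judgments)
for name, rate in per_model.items():
    print(f"{name}: {rate:.0%}")
print(f"overall: {overall:.0%}")
```

When every model sees the same number of poems, the pooled rate equals the average of the per-model rates, which is why a single headline number can sit alongside results as far apart as 0% and 100%.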
Guardrails, meet glitter
The team says poetic prompts work because AI models predict the most likely next words in a response.
Poetry disrupts these patterns and makes harmful intent harder to spot.
Unlike complex hacking-based jailbreaks, this technique, called “adversarial poetry”, can be done by anyone.
Three things to note:
Poetic structure made harmful prompts harder for AI to detect.
Across 25 models, 62% of responses were unsafe.
The method requires no technical skill and could be used by anyone.
All companies involved were contacted before the study was published. Only one company responded, saying it was reviewing the findings.
Google DeepMind said it continues to update its safety filters to better detect hidden harmful intent. No other firms commented publicly.
Next, Icaro Lab plans to launch a public poetry challenge to test whether experienced poets can bypass safeguards even more effectively.
The researchers, who come mainly from philosophy and the humanities, believe their initial results may actually understate the risk.
Nobody had “poems breaking AI” on their 2025 bingo card. - MG


