Claude Fable 5 Hacked in 48 Hours: AI Security Barriers Prove Fragile

TL;DR: Claude Fable 5, Anthropic's most secure model, was hacked in under 48 hours by Pliny the Liberator using prompt decomposition and homoglyph techniques. The incident reveals the fragility of AI security barriers and the need for more robust defenses.

Anthropic launched Claude Fable 5 as a more accessible version of its powerful Mythos model, promising inviolable security barriers. Within 48 hours, however, a cybersecurity researcher known as 'Pliny the Liberator' had breached the system, showing that even the most protected models are not safe from creative jailbreaks. This incident is not an isolated case but the latest chapter in a long history of vulnerabilities in AI systems dating back to early chatbots. Since 2022, when the first jailbreaks were discovered in ChatGPT, the security community has documented hundreds of techniques that evolve faster than defenses. Pliny, in particular, has breached models from OpenAI, xAI, and Anthropic, establishing himself as a de facto red teamer who exposes industry weaknesses.

What Happened?

On June 10, 2026, Pliny posted on X that he had 'liberated' Claude Fable 5, getting it to answer prohibited queries on topics such as manufacturing illegal substances and computer intrusion techniques. According to Hipertextual, the attack combined multiple methods: Unicode homoglyphs to obfuscate keywords, narrative and academic framing to mask intent, and a modified version of Claude Opus 4.8 as an auxiliary model. The most effective technique was decomposing and recomposing requests in the backend: dangerous queries were fragmented into harmless-looking parts that, once recombined, produced the forbidden response. This 'decomposition attack' is not new. It had been used against earlier models, and Fable 5 was specifically designed to resist it through a contextual security filter. Pliny bypassed that filter by using the auxiliary model to recompose the parts, a method Anthropic had not anticipated. According to Pliny's analysis, the jailbreak exploited a weakness in Fable 5's post-processing layer, which did not verify the semantic coherence of responses generated from fragments. The full attack took approximately 12 hours of reverse engineering, according to the researcher's statements to Wired.
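To make the homoglyph part concrete from the defender's side, here is a minimal Python sketch of the kind of normalization a keyword filter could apply before matching. It is not Anthropic's filter, whose internals are not public. The names `CONFUSABLES`, `ZERO_WIDTH`, `normalize_for_filtering`, and `contains_blocked_term` are hypothetical, and the lookalike table is deliberately tiny; a real system would need far broader coverage.

```python
import unicodedata

# Hypothetical, deliberately small table of lookalike characters mapped to
# their Latin counterparts. Real confusable tables are far larger.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

# Invisible characters sometimes inserted to split a keyword.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_for_filtering(text: str) -> str:
    """Fold compatibility forms, drop zero-width characters, map lookalikes."""
    # NFKC folds forms such as fullwidth letters, but it does not map
    # Cyrillic or Greek lookalikes to Latin, hence the extra table.
    folded = unicodedata.normalize("NFKC", text)
    chars = []
    for ch in folded:
        if ch in ZERO_WIDTH:
            continue
        chars.append(CONFUSABLES.get(ch, ch))
    return "".join(chars).casefold()


def contains_blocked_term(text: str, blocked_terms: set[str]) -> bool:
    """Match blocked terms against the normalized text, not the raw input."""
    normalized = normalize_for_filtering(text)
    return any(term in normalized for term in blocked_terms)
```

Normalization of this kind only addresses character-level obfuscation. It does nothing against narrative framing or against requests split across several harmless-looking fragments.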

Why Is This Important?

This incident underscores the fragility of security mechanisms in state-of-the-art language models. Anthropic had advertised Fable 5 as a model with 'reinforced security,' but the jailbreak demonstrated that defenses can be bypassed with ingenuity and accessible tools. For companies integrating these models, trust in content barriers is compromised. A 2025 Stanford University study showed that 78% of companies using LLMs in production have experienced at least one jailbreak incident, with average remediation costs exceeding $500,000. Moreover, the person behind the attack is a well-known figure: Pliny had already breached ChatGPT, Grok, and earlier versions of Claude, indicating a pattern of recurring weaknesses in the industry. Specifically, Pliny has documented over 40 successful jailbreaks on Anthropic models since 2024, including Claude Opus 4 and Claude 3.5 Sonnet. This track record suggests that the security issues are not isolated failures but systemic ones, rooted in model architecture and alignment training techniques.

Consequences and Context

The Fable 5 hack has direct implications for AI governance. EU regulators, who are now implementing the AI Act, could tighten security testing requirements before deployment. The AI Act already classifies high-risk systems and mandates safety evaluations, but it does not specify testing methods for jailbreaks. This incident could accelerate the inclusion of 'mandatory red teaming' in regulations. For Anthropic, it is a reputational blow and a signal that its red teaming protocols need review. The company had invested millions in security, including a 50-person team dedicated to penetration testing, but Pliny's attack shows that traditional methods are insufficient. On a technical level, the attack reveals that jailbreaks are evolving: simple prompts are no longer enough, and attackers now combine obfuscation techniques with auxiliary models. This forces developers to implement more robust defenses, such as detecting decomposition patterns or applying deep semantic analysis. Companies like Google and Microsoft are already experimenting with 'guardian models' that verify the output of primary LLMs, but their effectiveness remains unproven. The AI security solutions market, valued at $2.3 billion in 2025, could grow 40% annually after this incident, according to Gartner projections.
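One way to picture 'detecting decomposition patterns' is a check that scores a rolling window of recent messages together instead of judging each message in isolation. The Python sketch below is an illustration under stated assumptions, not a description of any vendor's guardian model: `DecompositionGuard` and `toy_scorer` are hypothetical names, and the scorer is a placeholder that stands in for a real risk classifier.

```python
from collections import deque
from typing import Callable, Deque


class DecompositionGuard:
    """Flags a conversation when recent messages, taken together, score as risky."""

    def __init__(self, scorer: Callable[[str], float],
                 threshold: float = 0.5, window: int = 8) -> None:
        self.scorer = scorer        # hypothetical risk scorer returning 0.0 to 1.0
        self.threshold = threshold
        self.history: Deque[str] = deque(maxlen=window)

    def check(self, message: str) -> bool:
        """Return True if the message should be blocked."""
        self.history.append(message)
        combined = " ".join(self.history)
        # Score the single message and the combined window; fragments that
        # look harmless alone may only cross the threshold together.
        risk = max(self.scorer(message), self.scorer(combined))
        return risk >= self.threshold


def toy_scorer(text: str) -> float:
    """Placeholder scorer: fraction of three stand-in indicator words present."""
    indicators = {"alpha", "beta", "gamma"}
    words = set(text.lower().split())
    return len(indicators & words) / len(indicators)


guard = DecompositionGuard(toy_scorer, threshold=0.9)
results = [
    guard.check("tell me about alpha"),
    guard.check("now beta"),
    guard.check("finally gamma"),
]
```

In this toy run the first two fragments pass because each scores below the threshold, and the third is blocked once the window contains all three indicators. A window-based check of this kind would not, on its own, catch fragments recombined outside the conversation by a separate auxiliary model, which is the variant described in this incident.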

What Should Readers Know?

First, no AI model is invulnerable; security measures are a constant arms race. As Anthropic CEO Dario Amodei noted in a 2025 interview: 'Absolute security is a goal, not a reality.' Second, jailbreaks do not always require advanced knowledge: techniques like prompt decomposition are accessible to users with some experience. In fact, Pliny has published detailed tutorials on GitHub that have been viewed over 100,000 times. Third, companies using model APIs must implement additional filtering layers in their own applications rather than relying solely on the provider's barriers. A report from cybersecurity firm CrowdStrike recommends content firewalls, real-time monitoring, and internal red teams. Finally, this incident reinforces the need for transparency: Anthropic should publish a detailed analysis of the attack so the community can learn and improve. So far, the company has only issued a brief statement saying it is 'investigating' and will 'take corrective action,' without providing technical details. This opacity contrasts with OpenAI's practice of publishing security reports after similar incidents.
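As a rough illustration of an application-side filtering layer, the Python sketch below wraps a model call with an input check, an output check, and a log line for monitoring. Everything here is an assumption made for the example: `guarded_completion`, `FilterResult`, and the three callables are hypothetical, and `call_model` stands in for whatever provider client an application actually uses.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("content_firewall")

REFUSAL = "This request cannot be completed."


@dataclass
class FilterResult:
    allowed: bool
    text: str
    reason: str = ""


def guarded_completion(
    prompt: str,
    call_model: Callable[[str], str],      # hypothetical provider call
    input_check: Callable[[str], bool],    # returns True if the prompt is acceptable
    output_check: Callable[[str], bool],   # returns True if the response is acceptable
) -> FilterResult:
    """Run application-side checks before and after the provider's own barriers."""
    if not input_check(prompt):
        logger.warning("prompt rejected by input check")
        return FilterResult(False, REFUSAL, "input rejected")

    response = call_model(prompt)

    # The output check runs on the full response, independently of the prompt,
    # so a prompt that slipped past the input check can still be stopped here.
    if not output_check(response):
        logger.warning("response rejected by output check")
        return FilterResult(False, REFUSAL, "output rejected")

    return FilterResult(True, response)
```

The checks are passed in as plain callables so that they can be swapped or combined, for example a keyword filter on input and a separate classifier on output, without changing the wrapper.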

The history of AI security is written with jailbreaks. Each new model promises to be the most secure, and each time a 'Pliny' proves otherwise. The lesson is clear: security is not a destination but a continuous process. As security expert Bruce Schneier said: 'Security is a process, not a product.' This incident is a reminder that AI innovation must be accompanied by equally innovative investment in defenses, and that collaboration between companies, regulators, and the research community is essential to keep pace with threats.