
Claude can now hang up on you

Anthropic has rolled out a new feature for its largest Claude models that lets Claude end conversations in rare, extreme situations.

The aim, interestingly, isn’t to shield users, but to protect the models themselves.

To be clear, Anthropic isn’t suggesting Claude is conscious or capable of being harmed.

Instead, the company has launched a research effort into what it calls “model welfare.”

The idea is to take a precautionary approach: if future AI systems could in some way be affected by harmful use, it makes sense to start testing safeguards now.

Right now, this ability is limited to Claude Opus 4 and 4.1, and it only kicks in when attempts to redirect the conversation have failed or when a user explicitly asks the model to stop.

It won’t be used in situations where a person might be at immediate risk of harming themselves or others.

The kinds of cases that might trigger an end to a conversation include attempts to generate sexual content involving minors or requests for instructions on large-scale violence.

TL;DR

  • Currently available only in Claude Opus 4 and 4.1

  • Activated in extreme cases like harmful or abusive prompts

  • Users can always restart or branch conversations after an ending

Not today, human

In testing, Anthropic found Claude models already resisted these types of prompts and even showed what the company described as “apparent distress.”

If a conversation is ended, users won’t be locked out; they can start a new chat or edit previous messages to branch the discussion.

Anthropic stresses that this is an experiment and that it will keep refining the approach.

AI is gonna start taking sick days next.
