Title shortened to remove clickbait, original was “Anthropic Researchers Startled When an AI Model Turned Evil and Told a User to Drink Bleach”
So the human introduced a scenario involving drinking bleach, and a test version of an LLM (not an approved one) gave an overly reassuring answer. It did not “turn evil and tell them to drink bleach”.
There’s so much to criticise about this industry without lazy clickbait.