<p id="speakable-summary" class="wp-block-paragraph">Fictional portrayals of artificial intelligence can have a real effect on AI models, according to Anthropic.</p>
<p class="wp-block-paragraph">Last year, the company said that during pre-release tests involving a fictional company, Claude Opus 4 would often <a href="https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-offline/" target="_blank" rel="noreferrer noopener">try to blackmail engineers</a> to avoid being replaced by another system. Anthropic later <a href="https://www.anthropic.com/research/agentic-misalignment" target="_blank" rel="noreferrer noopener nofollow">published research</a> suggesting that models from other companies had similar issues with “agentic misalignment.”</p>
<p class="wp-block-paragraph">Anthropic has apparently done more work on that behavior, claiming in <a href="https://x.com/anthropicai/status/2052808791301697563" target="_blank" rel="noreferrer noopener nofollow">a post on X</a>, “We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.”</p>
<p class="wp-block-paragraph">The company went into more detail in <a href="https://www.anthropic.com/research/teaching-claude-why" target="_blank" rel="noreferrer noopener nofollow">a blog post</a> stating that since Claude Haiku 4.5, Anthropic’s models “never engage in blackmail [during testing], where previous models would sometimes do so up to 96% of the time.”</p>
<p class="wp-block-paragraph">What accounts for the difference? The company said it found that training on “documents about Claude’s constitution and fictional stories about AIs behaving admirably improve alignment.”</p>
<p class="wp-block-paragraph">Relatedly, Anthropic said it found training to be more effective when it includes “the principles underlying aligned behavior” and not just “demonstrations of aligned behavior alone.”</p>
<p class="wp-block-paragraph">“Doing both together appears to be the most effective strategy,” the company said.</p>