AI Safety Topic

162 articles · Page 1 of 9
AI Safety encompasses alignment, controllability, robustness, and ethical governance. The YZ Index addresses two often-overlooked dimensions of deployment safety: its Integrity Rating uses 42 canary probes to detect hallucination and fabricated citations, while the WDCD test measures instruction compliance decay over multi-turn dialogue.
Anthropic’s Dario Amodei has just one direct report
If you doubted his genius, doubt no more.
Jun 11, 2026
xAI fired an engineer who raised alarms about Grok safety, new lawsuit claims
A former xAI engineer is suing the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX's historic I
Jun 11, 2026
Claude AI "Blackmail" Incident Sparks Debate: AI Security Risks Back in Spotlight
A recent controversy involving Anthropic's Claude AI model has reignited public concern over AI safety and control, after a rumored incident where the
Jun 11, 2026
Anthropic Launches Mythos and Fable Models, Unveils Advanced AI Safety Framework
Anthropic has officially launched two new AI models, Mythos and Fable 5, alongside a safety framework called the Advanced AI Framework, which highligh
Jun 11, 2026
Anthropic’s Claude Fable 5 is a version of Mythos the public can access today
Anthropic is releasing Claude Fable 5, its first Mythos-class model available to the public. The model comes with guardrails that block responses in h
Jun 10, 2026
Anthropic says these topics are too dangerous to let its Fable 5 model talk about
New frontier model refuses cybersecurity, biology, and chemistry queries.
Jun 10, 2026
Anthropic’s Claude Fable 5 is a version of Mythos the public can access today
Anthropic is releasing Claude Fable 5, its first Mythos-class model available to the public. The model comes with guardrails that block responses in h
Jun 10, 2026
Anthropic Offers Mythos Upgrade for Cyber Partners and a ‘Safe’ Version for the Rest of You
Anthropic is releasing Claude Mythos 5 to trusted organizations and Claude Fable 5 to the public, a version it says can’t be used for cyberattacks.
Jun 10, 2026
For the 2nd time in weeks, Microsoft packages laced with credential stealer
73 packages run self-replicating stealer as soon as they're opened by an AI agent.
Jun 9, 2026
The Download: AI hacking beyond Mythos, and chatbots’ impact on our brains
This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.
Jun 5, 2026
The Meta hack shows there’s more to AI security than Mythos
On June 5, 404 Media reported that attackers had been using Meta’s AI customer support agent to steal Instagram accounts. Their approach was simple: T
Jun 5, 2026
These LLMs are the best at resisting Russian propaganda
Estonian government benchmark shows how dozens of models combat Russia's "strategic narratives."
Jun 5, 2026
OpenAI and Anthropic Sign Letter to Prevent AI-Developed Biological Weapons
Leading AI labs, executives, and scientists are sending a letter to lawmakers urging them to improve tracking of synthetic DNA sequences that could be
Jun 4, 2026
Trump plan to test AI models has a problem—US security teams were gutted by DOGE
On Tuesday, Donald Trump finally signed his executive order expanding the government’s efforts to conduct voluntary safety testing of frontier AI mode
Jun 4, 2026
Android phones will soon be able to detect spoofed calls and impersonation scams
We’re expecting Android 17 to begin rolling out later this month, but first, Google has a batch of updates for the wider Android device ecosystem. As
Jun 3, 2026
Anthropic scales Claude Mythos to critical infrastructure in 15+ countries
Anthropic is expanding Project Glasswing, its joint industry initiative to find and fix critical software vulnerabilities using AI, to about 150 new o
Jun 3, 2026
Florida sues OpenAI, Sam Altman, in first-of-its-kind lawsuit over violent incidents
OpenAI and its CEO, Sam Altman, were sued by the Florida attorney general on Monday, in a first-of-its-kind state litigation effort over ChatGPT’s all
Jun 2, 2026
Hackers duped Meta AI support chatbot to steal celebrity Instagram accounts
Meta’s AI support chatbot proved unusually helpful to hackers looking to steal and resell notable Instagram accounts—the hackers simply asking the bot
Jun 2, 2026
Florida Sues OpenAI and Sam Altman, First Case of AI Safety Personal Liability Attracts Global Attention
The state of Florida has filed a lawsuit against OpenAI and its CEO Sam Altman, alleging reckless misconduct in AI product development and deployment,
Jun 2, 2026
Anthropic files to go public
Anthropic, the AI lab behind Claude, has filed confidentially for an initial public offering, the company said in a blog post Monday. The company, whi
Jun 2, 2026