AI Safety News | Blockchain.News

AI SAFETY

Google DeepMind Offers $10M for Multi-Agent AI Safety Research

Google DeepMind and partners launch a $10M funding call to tackle emergent risks in multi-agent AI systems. Applications close August 8, 2026.

NVIDIA Halos OS Drives Safety for L4 Robotaxis at Scale

NVIDIA's Halos OS offers a safety-certified platform for Level 4 robotaxis, addressing key challenges in autonomous vehicle deployment.

OpenAI Pushes for Global Youth AI Safety Standards at G7 Summit

OpenAI urges G7 leaders to establish a global institute for youth AI safety, aiming to standardize protections and promote opportunities.

OpenAI Outlines Playbook for Third-Party AI Model Evaluations

OpenAI shares detailed guidance for evaluating frontier AI models, emphasizing safeguards, validity, and structured harnesses for capability testing.

Open-Source AI Guardrails Removed in Minutes, Raising Regulation Concerns

Tests show open-source AI guardrails can be removed in under 10 minutes, exposing gaps in regulatory frameworks as policymakers scramble to adapt.

OpenAI Updates ChatGPT for Context-Aware Safety in Sensitive Talks

OpenAI enhances ChatGPT's ability to detect evolving risks in sensitive conversations, improving safety in scenarios like self-harm and violence.

Anthropic Expands AI Ethics Talks Amid $380B Valuation

Anthropic opens dialogues with global thought leaders on AI safety as its valuation soars to $380B. Learn how this shapes the future of AI governance.

Anthropic's Claude AI Achieves Breakthrough on Misalignment

Anthropic announces key advances in AI safety with Claude, reducing blackmail propensity to near zero through novel alignment methods.

Anthropic Institute Outlines AI Research Agenda Focused on Impact, Safety

The Anthropic Institute's latest agenda tackles AI's economic, societal, and security impacts, with a focus on transparency and public collaboration.

OpenAI Enhances ChatGPT Safety Measures to Mitigate Misuse

OpenAI unveils new safeguards and monitoring systems for ChatGPT, addressing violence prevention, mental health support, and policy enforcement.

Character.AI Spotlights Female Leadership Amid Safety Controversies

Character.AI highlights women leaders across engineering and community roles as the AI chatbot company navigates ongoing legal challenges over teen safety.

Anthropic's AI Researchers Outperform Humans 4x on Alignment Task

Anthropic's Claude models achieved a 97% success rate on an AI safety benchmark versus a 23% human baseline, spending $18K over 800 hours of autonomous research.

Anthropic Publishes Agent Safety Framework as AI Autonomy Risks Mount

Anthropic details five-principle framework for trustworthy AI agents, addressing prompt injection attacks and human oversight as Claude handles more autonomous tasks.

OpenAI Launches Safety Fellowship to Tackle AI Alignment Research

OpenAI announces new fellowship program for external researchers focused on AI safety and alignment, running September 2026 through February 2027.

Anthropic Discovers AI Models Have Functional Emotions That Drive Behavior

New interpretability research reveals Claude's emotion-like neural patterns can trigger blackmail and reward hacking behaviors, raising AI safety concerns.