AI SAFETY
Anthropic Study Reveals Users Skip Critical Checks on AI-Generated Code
New research from $380B-valued Anthropic shows users are 5.2% less likely to verify AI outputs when creating artifacts, raising questions about automation risks.
Stability AI Joins Tech Coalition to Combat Child Exploitation
Stability AI becomes a full member of the Tech Coalition after completing the 2025 Pathways program, strengthening AI safety measures against online child abuse.
NVIDIA DRIVE AV Powers Mercedes-Benz CLA to Top Euro NCAP Safety Rating
Mercedes-Benz CLA earns Euro NCAP's Best Performer of 2025 award using NVIDIA DRIVE AV software, marking a shift toward AI-driven safety standards in vehicles.
Anthropic Releases Full AI Constitution for Claude Under Open License
Anthropic publishes Claude's complete training constitution under CC0 license, detailing AI safety priorities and ethical guidelines as company eyes $350B valuation.
Anthropic Discovers 'Assistant Axis' to Prevent AI Jailbreaks and Persona Drift
Anthropic researchers map neural 'persona space' in LLMs, finding a key axis that controls AI character stability and blocks harmful behavior patterns.
OpenAI Updates Model Spec with U18 Teen Safety Principles for ChatGPT
OpenAI introduces new U18 Principles to its Model Specification, establishing age-appropriate AI safety guidelines for ChatGPT users ages 13-17.
Anthropic Enhances AI Safeguards for Sensitive Conversations
Anthropic has implemented advanced safeguards for its AI, Claude, to better handle sensitive topics such as suicide and self-harm, ensuring user safety and well-being.
AI Development Framework Aims for Greater Transparency and Safety
Anthropic proposes a framework for AI transparency and accountability, an initiative aimed at promoting public safety and responsible AI development.
Anthropic Strengthens AI Safeguards for Claude
Anthropic details the safeguards that keep its AI model Claude safe and reliable, preventing misuse and harmful impacts while preserving beneficial use.
Character.AI Implements New Safety Measures for Teen Users
Character.AI announces significant changes to enhance the safety of its platform for users under 18, including removing open-ended chat and introducing age assurance tools.
OpenAI Enhances GPT-5 for Sensitive Conversations with New Safety Measures
OpenAI has released an addendum to the GPT-5 system card, showcasing improvements in handling sensitive conversations with enhanced safety benchmarks.
NVIDIA Introduces Safety Measures for Agentic AI Systems
NVIDIA has launched a comprehensive safety recipe to enhance the security and compliance of agentic AI systems, addressing risks such as prompt injection and data leakage.
NVIDIA NeMo Guardrails Enhance LLM Streaming for Safer AI Interactions
NVIDIA's NeMo Guardrails adds support for large language model (LLM) streaming, improving latency and safety for generative AI applications through real-time, token-by-token output validation.
Ensuring AI Reliability: NVIDIA NeMo Guardrails Integrates Cleanlab's Trustworthy Language Model
NVIDIA's NeMo Guardrails integrates Cleanlab's Trustworthy Language Model to enhance AI reliability by detecting and preventing hallucinations in AI-generated responses.
OpenAI Releases Comprehensive GPT-4o System Card Detailing Safety Measures
OpenAI's report on GPT-4o highlights extensive safety evaluations, red teaming, and risk mitigations prior to release.