AI Safety News

AI Safety

OpenAI Enhances GPT-5 for Sensitive Conversations with New Safety Measures

OpenAI has released an addendum to the GPT-5 system card, showcasing improvements in handling sensitive conversations with enhanced safety benchmarks.

by Jessie A Ellis
Oct 28, 2025

NVIDIA Introduces Safety Measures for Agentic AI Systems

NVIDIA has launched a comprehensive safety recipe to enhance the security and compliance of agentic AI systems, addressing risks such as prompt injection and data leakage.

by Luisa Crawford
Jul 18, 2025

NVIDIA NeMo Guardrails Enhance LLM Streaming for Safer AI Interactions

NVIDIA introduces NeMo Guardrails to enhance large language model (LLM) streaming, improving latency and safety for generative AI applications through real-time, token-by-token output validation.

by Jessie A Ellis
May 23, 2025

Ensuring AI Reliability: NVIDIA NeMo Guardrails Integrates Cleanlab's Trustworthy Language Model

NVIDIA's NeMo Guardrails, in collaboration with Cleanlab's Trustworthy Language Model, aims to enhance AI reliability by preventing hallucinations in AI-generated responses.

by Caroline Bishop
Apr 11, 2025

OpenAI Releases Comprehensive GPT-4o System Card Detailing Safety Measures

OpenAI's report on GPT-4o highlights extensive safety evaluations, red teaming, and risk mitigations prior to release.

by Rebeca Moen
Aug 09, 2024

Anthropic Expands AI Model Safety Bug Bounty Program

Anthropic broadens its AI model safety bug bounty program to address universal jailbreak vulnerabilities, offering rewards up to $15,000.

by Darius Baruo
Aug 08, 2024

Anthropic Unveils Initiative to Enhance Third-Party AI Model Evaluations

Anthropic announces a new initiative aimed at funding third-party evaluations to better assess AI capabilities and risks, addressing the growing demand in the field.

by Peter Zhang
Jul 02, 2024

Guaranteed Safe AI Systems: A Solution for the Future of AI Safety?

Exploring the potential of guaranteed safe AI systems in ensuring the safety and reliability of artificial general intelligence (AGI).

by Joerg Hiller
Jun 26, 2024

Exploring AGI Hallucination: A Comprehensive Survey of Challenges and Mitigation Strategies

A new survey delves into the phenomenon of AGI hallucination, categorizing its types, causes, and current mitigation approaches while discussing future research directions.

by Massar Tanya Ming Yau Chong
Mar 07, 2024

British Standards Institution Pioneers International AI Safety Guidelines for Sustainable Future

BSI's release of the first international AI safety guideline, BS ISO/IEC 42001, marks a significant step in standardizing the safe and ethical use of AI, reflecting global demand for robust AI governance.

by Zach Anderson
Jan 17, 2024

Exploring AI Stability: Navigating Non-Power-Seeking Behavior Across Environments

The research examines how stable non-power-seeking behavior is in AI systems, finding that certain policies that do not resist shutdown continue not to resist it across similar environments. The result offers insight into mitigating the risks of power-seeking AI.

by Massar Tanya Ming Yau Chong
Jan 10, 2024

Google DeepMind: Subtle Adversarial Image Manipulation Influences Both AI Model and Human Perception

Recent DeepMind research reveals that subtle adversarial image manipulations, originally designed to deceive AI models, also influence human perception. The finding highlights both similarities and differences between human and machine vision and points to the need for further research in AI safety and security.

by Massar Tanya Ming Yau Chong
Jan 08, 2024

California Spearheads AI Ethics and Safety with Senate Bills 892 and 893

California takes a pioneering role in AI regulation with Senate Bills 892 and 893, aiming to ensure AI safety, ethics, and public benefits.

by Zach Anderson
Jan 05, 2024

NIST's Call for Public Input on AI Safety in Response to Biden's Executive Order

NIST is seeking public input to create AI safety guidelines following President Biden's Executive Order, aiming to ensure a secure AI environment, mitigate risks, and foster innovation.

by Jessie A Ellis
Dec 21, 2023

OpenAI Introduces the "Preparedness Framework" for AI Safety and Policy Integration

OpenAI has introduced the "Preparedness Framework," which gives its board veto power over CEO decisions and adds risk scorecards for AI risk management, demonstrating its commitment to responsible AI development.

by Jessie A Ellis
Dec 20, 2023
