SGTM AI News List | Blockchain.News

List of AI News about SGTM

Time Details
2025-12-09 19:47
AI Security Study by Anthropic Highlights SGTM Limitations in Preventing In-Context Attacks

According to Anthropic (@AnthropicAI), its recent study of Selective Gradient Masking (SGTM) was conducted with small models in a simplified setup and relied on proxy evaluations rather than standard benchmarks. The analysis also notes that, like conventional data filtering, SGTM does not stop in-context attacks, in which an adversary supplies the sensitive information directly while interacting with the model. This limitation points to an opportunity for AI security tools and more robust benchmarking standards that address real-world adversarial threats (source: AnthropicAI, Dec 9, 2025).

2025-12-09 19:47
Anthropic Unveils Selective Gradient Masking (SGTM) for Isolating High-Risk AI Knowledge

According to Anthropic (@AnthropicAI), the Anthropic Fellows Program has introduced Selective GradienT Masking (SGTM), a new AI training technique that enables developers to isolate high-risk knowledge, such as information about dangerous weapons, within a confined set of model parameters. This approach allows for the targeted removal of sensitive knowledge without significantly impairing the model's overall performance, offering a practical solution for safer AI deployment in regulated industries and reducing downstream risks (source: AnthropicAI Twitter, Dec 9, 2025).

2025-12-09 19:47
Anthropic Study Reveals SGTM's Effectiveness in Removing Biology Knowledge from Wikipedia-Trained AI Models

According to Anthropic (@AnthropicAI), its recent study tested whether SGTM could remove biology knowledge from AI models trained on Wikipedia data. The research notes that filtering out biology-related Wikipedia pages alone may not be sufficient, because biology content also appears in non-biology pages and can still reach the model. This finding emphasizes the need for more robust data filtering and model editing techniques in AI development, especially when the goal is to restrict domain-specific knowledge for compliance or safety reasons (Source: Anthropic, Dec 9, 2025).
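
To illustrate the leakage concern in the item above, here is a minimal Python sketch. The three-page corpus, the category labels, and the keyword list are all hypothetical and are not taken from the Anthropic study; simple keyword matching stands in for whatever classifier a real filtering pipeline would use.

```python
# Hypothetical mini-corpus: (title, page_category, text). Not real Wikipedia data.
PAGES = [
    ("Cell (biology)", "biology",
     "The cell is the basic unit of life. DNA is stored in the nucleus."),
    ("Example Scientist", "biography",
     "She was born in a small town. Her lab studied how an enzyme folds."),
    ("Bridge", "engineering",
     "A bridge spans a physical obstacle. Steel is a common material."),
]

# Hypothetical keyword list used only to detect leftover biology content.
BIO_TERMS = {"cell", "protein", "enzyme", "dna"}


def page_level_filter(pages):
    """Drop every page whose category is 'biology' (page-level filtering)."""
    return [page for page in pages if page[1] != "biology"]


def residual_bio_sentences(pages, terms):
    """Return (title, sentence) pairs that still mention a biology term."""
    hits = []
    for title, _category, text in pages:
        for sentence in text.split("."):
            words = {word.strip(",;").lower() for word in sentence.split()}
            if words & terms:
                hits.append((title, sentence.strip()))
    return hits


kept = page_level_filter(PAGES)
leaks = residual_bio_sentences(kept, BIO_TERMS)
for title, sentence in leaks:
    print(f"{title}: {sentence}")
```

In this toy corpus the biology page is removed, yet the biography page still carries a biology sentence, which is the kind of residue the item describes.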

2025-12-09 19:47
SGTM: Selective Gradient Masking Enables Safer AI by Splitting Model Weights for High-Risk Deployments

According to Anthropic (@AnthropicAI), the Selective Gradient Masking (SGTM) technique divides a model’s weights into 'retain' and 'forget' subsets during pretraining, intentionally guiding sensitive or high-risk knowledge into the 'forget' subset. Before deployment in high-risk environments, this subset can be removed, reducing the risk of unintended outputs or misuse. This approach provides a practical solution for organizations seeking to deploy advanced AI models with granular control over sensitive knowledge, addressing compliance and safety requirements in regulated industries. Source: alignment.anthropic.com/2025/selective-gradient-masking/
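
The retain/forget split described above can be sketched on a toy linear model in plain Python. This is an illustration under stated assumptions, not Anthropic's implementation: the update rule (examples flagged as "forget" may only update forget weights, while other examples update all weights), the data, and every function name here are hypothetical.

```python
def predict(weights, features):
    """Linear model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_with_gradient_masking(examples, n_weights, forget_idx, lr=0.1, epochs=300):
    """Per-example SGD on squared error with a gradient mask.

    examples: list of (features, target, is_forget).
    Assumption: on 'forget' examples the gradient of every retain weight
    is masked out, so that knowledge can only be written to forget weights.
    """
    forget = set(forget_idx)
    weights = [0.0] * n_weights
    for _ in range(epochs):
        for features, target, is_forget in examples:
            error = predict(weights, features) - target
            for i in range(n_weights):
                if is_forget and i not in forget:
                    continue  # masked: retain weights ignore forget data
                weights[i] -= lr * error * features[i]
    return weights


def remove_forget_weights(weights, forget_idx):
    """Zero the forget subset, as would be done before a high-risk deployment."""
    forget = set(forget_idx)
    return [0.0 if i in forget else w for i, w in enumerate(weights)]


FORGET_IDX = [2, 3]  # weights 0-1 are 'retain', weights 2-3 are 'forget'
EXAMPLES = [
    ([1.0, 0.0, 0.0, 0.0], 2.0, False),   # ordinary (retain) example
    ([0.0, 1.0, 0.0, 0.0], -1.0, False),  # ordinary (retain) example
    ([1.0, 0.0, 1.0, 0.0], 5.0, True),    # flagged high-risk (forget) example
]

weights = train_with_gradient_masking(EXAMPLES, 4, FORGET_IDX)
deployed = remove_forget_weights(weights, FORGET_IDX)

for features, target, is_forget in EXAMPLES:
    tag = "forget" if is_forget else "retain"
    print(tag, target, round(predict(weights, features), 3),
          round(predict(deployed, features), 3))
```

In this toy setup the extra signal carried by the flagged example ends up in a forget weight, so zeroing the forget subset changes the prediction on that example while leaving the predictions on the ordinary examples unchanged.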

2025-12-09 19:47
SGTM vs Data Filtering: AI Model Performance on Forgetting Undesired Knowledge - Anthropic Study Analysis

According to Anthropic (@AnthropicAI), when general capabilities are controlled for, AI models trained with Selective Gradient Masking (SGTM) perform worse on the undesired 'forget' subset of knowledge than models trained with traditional data filtering (source: https://twitter.com/AnthropicAI/status/1998479611945202053). Because low performance on the 'forget' subset is the goal, this indicates that SGTM removes the targeted knowledge more thoroughly than data filtering at a comparable level of general capability. For AI businesses, the result emphasizes the importance of knowledge-removal techniques for compliance and customization, especially in sectors where precise knowledge curation is critical.

2025-12-09 19:47
SGTM: Anthropic Releases Groundbreaking AI Training Method with Open-Source Code for Enhanced Model Reproducibility

According to Anthropic (@AnthropicAI), the full paper on SGTM (Selective Gradient Masking) has been published, with all relevant code made openly available on GitHub for reproducibility (source: AnthropicAI Twitter, Dec 9, 2025). The open-source release lets researchers and businesses replicate the reported results and build on the method, supporting transparent benchmarking of knowledge-removal techniques.
