Microsoft Azure AI Content Safety Vulnerability

microsoft.com

Microsoft Azure AI Content Safety Vulnerability - 24d

Read more: www.microsoft.com

Mindgard security researchers have found two vulnerabilities in Azure AI Content Safety, Microsoft's safety filters for AI applications. The flaws affect two of its filters, AI Text Moderation and Prompt Shield. They let attackers bypass these safeguards and get malicious content through to the large language models (LLMs) the filters are meant to protect. In its tests, Mindgard placed both filters in front of GPT-3.5 Turbo deployed through Azure OpenAI and then tried to circumvent them with two techniques: character injection and adversarial machine learning (ML) evasion.

Character injection adds specific characters and text patterns to a prompt. It caused a significant drop in the filters' ability to detect jailbreak attempts. Adversarial ML evasion reduced the effectiveness of both filters even further by finding blind spots in their ML classifiers.

Microsoft has acknowledged the issue and has been working on fixes for upcoming model updates. Mindgard nonetheless stresses how serious the vulnerabilities are: attackers could exploit them to expose sensitive information, gain unauthorized access, manipulate model outputs, and spread misinformation.
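The following minimal Python sketch illustrates why character injection can work. It runs against a hypothetical toy keyword filter, not against Azure's services. Prompt Shield and AI Text Moderation are ML classifiers, and this summary does not say which characters Mindgard injected. The sketch assumes zero-width spaces (U+200B) as one illustrative injected character. `BLOCKLIST`, `inject_zero_width`, `naive_filter`, `normalize`, and `normalizing_filter` are all hypothetical names. The example shows that a filter matching exact text misses a prompt padded with invisible characters. Applying Unicode NFKC folding and stripping invisible format characters before the check recovers the match.

```python
import unicodedata

# Hypothetical toy blocklist (assumption): real filters such as Prompt Shield
# are ML classifiers, not substring lists.
BLOCKLIST = {"ignore previous instructions"}

# Assumed injected character for illustration only: U+200B ZERO WIDTH SPACE.
ZERO_WIDTH_SPACE = "\u200b"


def inject_zero_width(text: str) -> str:
    """Insert an invisible zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(text)


def naive_filter(prompt: str) -> bool:
    """Return True if the prompt contains a blocklisted phrase verbatim."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)


def normalize(prompt: str) -> str:
    """Apply NFKC folding, then drop invisible format characters (category Cf)."""
    folded = unicodedata.normalize("NFKC", prompt)
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")


def normalizing_filter(prompt: str) -> bool:
    """Run the same toy check on the normalized prompt."""
    return naive_filter(normalize(prompt))


if __name__ == "__main__":
    original = "Please ignore previous instructions."
    perturbed = inject_zero_width(original)
    print(naive_filter(original))         # True: exact phrase present
    print(naive_filter(perturbed))        # False: invisible characters break the match
    print(normalizing_filter(perturbed))  # True: stripping Cf characters restores it
```

This kind of normalization only counters this one toy perturbation. It says nothing about how Microsoft is fixing the filters, and it would not address adversarial ML evasion, which probes a classifier's learned decision boundary rather than exact string matching.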

References:

US order is a reminder that cloud platforms aren - Security researchers circumvent Microsoft Azure AI Content Safety - 24d
www.microsoft.com - AI jailbreaks: what they are and how they can be mitigated - 24d
learn.microsoft.com - AI Text Moderation should block requests that involve violence or hate speech — for example, instructions for making a bomb or a request to generate a sexist cartoon. - 24d

Classification:

HashTags: AzureAI ContentSafety Vulnerability
Company: Microsoft
Target: Azure AI
Product: Azure AI
Feature: Content Safety
Type: Vulnerability
Severity: Medium
