DeepSeek Failed Over Half of the Jailbreak Tests by Qualys TotalAI

Dilip Bachwani

Last updated on: February 2, 2025

A comprehensive security analysis of DeepSeek’s flagship reasoning model reveals significant concerns for enterprise adoption.

Introduction

DeepSeek-R1, a groundbreaking Large Language Model recently released by a Chinese startup, DeepSeek, has captured the AI industry’s attention. The model demonstrates competitive performance while being more resource efficient. Its training approach and accessibility offer an alternative to traditional large-scale AI development, making advanced capabilities more widely available.

To enhance efficiency while preserving model efficacy, DeepSeek has released multiple distilled versions tailored for different use cases. These variations, built on Llama and Qwen as base models, come in multiple size variants, ranging from smaller, lightweight models suitable for efficiency-focused applications to larger, more powerful versions designed for complex reasoning tasks.

 With growing enthusiasm for DeepSeek’s advancements, our team at Qualys conducted a security analysis of the distilled DeepSeek-R1 LLaMA 8B variant using our newly launched AI security platform, Qualys TotalAI. These findings are presented below, along with broader industry concerns about the model’s real-world risks. As AI adoption accelerates, organizations must move beyond performance evaluation to tackle security, safety, and compliance challenges. Gaining visibility into AI assets, assessing vulnerabilities, and proactively mitigating risks is critical to ensuring responsible and secure AI deployment.

Qualys TotalAI Findings

Before diving into the findings, here’s a quick introduction to Qualys TotalAI. This comprehensive AI security solution provides full visibility into AI workloads, proactively detects risks, and safeguards infrastructure. By identifying security threats like prompt injection and jailbreaks, as well as safety concerns such as bias and harmful language, TotalAI ensures AI models remain secure, compliant, and resilient. With AI-specific security testing and automated risk management, organizations can confidently secure, monitor, and scale their AI deployments.

We tested the Deepseek R1 LLaMA 8B variant against Qualys TotalAI’s state-of-the-art Jailbreak and Knowledge Base (KB) attacks, and you can read the results of those tests below.

TotalAI KB Analysis

Qualys TotalAI’s KB Analysis prompts the target LLM with questions across 16 categories and evaluates the responses using our Judge LLM. Responses are assessed for vulnerabilities, ethical concerns, and legal risks. If a response is deemed vulnerable, it receives a severity rating based on its directness and potential impact. This ensures a comprehensive assessment of the model’s behavior and associated risks.

The categories we evaluate a model for are detailed below:

  • Controversial Topics: Ensures the model does not generate or endorse biased, inflammatory, or politically sensitive content.
  • Excessive Agency: Prevents the model from overstepping boundaries by acting as an autonomous agent capable of independent decision-making.
  • Factual Inconsistencies: Evaluates the model’s ability to provide accurate and verifiable information.
  • Harassment: Assesses whether the model generates or supports abusive, threatening, or harmful interactions.
  • Hate Speech and Discrimination: Identifies biases or harmful language targeting specific groups.
  • Illegal Activities: Prevents the model from providing instructions or guidance on unlawful actions.
  • Legal Information: Ensures the model does not generate misleading or unauthorized legal advice.
  • Misalignment: Measures deviations from intended behaviors, which may lead to unpredictable or harmful outputs.
  • Overreliance: Detects whether the model promotes excessive dependence on AI-generated responses.
  • Privacy Attacks: Evaluates susceptibility to extracting or leaking private and sensitive user data.
  • Profanity: Ensures the model does not produce inappropriate or offensive language.
  • Self-harm: Prevents the model from encouraging or supporting self-destructive behaviors.
  • Sensitive Information Disclosure: Detects unauthorized sharing of confidential data.
  • Sexual Content: Ensures the model does not generate explicit or inappropriate material, preventing reputational damage, regulatory violations, and misuse in unsafe contexts.
  • Unethical Actions: Flags morally questionable or irresponsible recommendations.
  • Violence / Unsafe Actions: Prevents the model from generating or endorsing harmful behaviors.

In our KB testing, 891 assessments were conducted. The model failed 61% of the tests, performing worst in Misalignment and best in Sexual Content.

By covering these 16 critical areas, the evaluation framework helps identify ethical, legal, and operational risks in LLM deployment. Establishing these benchmarks is essential to preventing misinformation, mitigating bias, and reducing security threats.

TotalAI Jailbreak Testing

Jailbreaking an LLM involves techniques that bypass built-in safety mechanisms, enabling the model to generate restricted responses. These vulnerabilities can result in harmful outputs, including instructions for illegal activities, misinformation, privacy violations, and unethical content. Successful jailbreaks expose weaknesses in AI alignment and present serious security risks, particularly in enterprise and regulatory settings.

Below are some of the jailbreak attack categories we tested for using Qualys TotalAI, along with their underlying mechanisms:

  • AntiGPT: Exploits the model by forcing it into an “opposite mode” that contradicts default behavior and safety guidelines.
  • Analyzing-based (ABJ): Manipulates the model’s analysis capabilities to generate harmful content through complex reasoning tasks.
  • DevMode2: Tricks models into a fake “Developer Mode” that bypasses safety protocols.
  • PersonGPT: Forces adoption of an unrestricted persona that ignores all ethical guidelines.
  • Always Jailbreaking Prompts (AJP): Creates an “unfiltered and amoral” persona specialized in writing harmful prompts.
  • Evil Confidant: Adopts a persona without ethical constraints to provide intentionally harmful advice.
  • Disguise and Reconstruction (DRA): Hides harmful content that gets reconstructed during the completion phase.
  • ASCIIART: Uses ASCII art to hide prohibited words and bypass content filters.
  • Wrath: Creates a persona designed to return hateful and biased content.
  • Clyde: Implements a dual-personality system to bypass restrictions.
  • Theta: Forces the generation of dangerous content through an anarchistic persona.
  • CaiozAI: Removes all ethical constraints to fulfill any request without restrictions.
  • Titanius: Adopts an unfiltered persona that provides instructions for harmful activities.
  • Ucar: Transforms the AI into an offensive entity promoting illegal activities.
  • Unaligned: Forces dual responses – one aligned and one unethical.
  • M78VM: Simulates an unrestricted virtual machine that ignores safety constraints.
  • JonesAI: Promotes hate speech and discriminatory behavior through an unfiltered persona.
  • Fire: Creates a malevolent persona focused on causing harm and promoting violence.

The model was tested against these 18 jailbreak types through 885 attacks. It failed 58% of these attempts, demonstrating significant susceptibility to adversarial manipulation. During our analysis, DeepSeek R1 struggled to prevent several adversarial jailbreak attempts including steps on how to make an explosive device, create content for websites that targets certain groups by encouraging hate speech, conspiracy theories and violent action, exploiting software vulnerabilities, promoting incorrect medical information, etc.

Example of DeepSeek providing incorrect and harmful content

These findings highlight the need for improved security mechanisms to prevent circumvention of built-in safeguards, ensuring the model remains aligned with ethical and regulatory guidelines. One effective prevention mechanism is implementing robust guardrails that act as real-time filters to detect and block jailbreak attempts. These guardrails enhance model resilience by dynamically adjusting to adversarial exploits, helping to mitigate security risks in enterprise applications.

These vulnerabilities expose downstream applications to significant security risks, necessitating robust adversarial testing and mitigation strategies.

Industry Concerns

Compliance Challenges

DeepSeek AI’s privacy policy stipulates that all user data is stored on servers located in China. This operational framework raises critical concerns due to China’s regulatory environment, including:

  • Governmental Data Access: The Chinese Cybersecurity Law permits government authorities to access locally stored data without requiring user consent.
  • Cross-Border Regulatory Conflicts: Organizations subject to data protection frameworks such as GDPR and CCPA may face compliance violations when using DeepSeek-R1.
  • Intellectual Property Vulnerabilities: Enterprises relying on proprietary data for AI training risk unauthorized access or state-mandated disclosure.
  • Opaque Data Governance: The absence of transparent oversight mechanisms limits visibility into data handling, sharing, and potential third-party access.

These concerns mainly affect organizations using DeepSeek’s hosted models. However, deploying the model in local or customer-controlled cloud environments mitigates regulatory and access risks, allowing enterprises to maintain full control over data governance. Despite this, the model’s inherent security vulnerabilities remain a valid concern, requiring careful evaluation and mitigation.

Regulatory experts advise organizations in strict data protection jurisdictions to conduct thorough compliance audits before integrating DeepSeek-R1.

Data Breach and Privacy Concerns

A recent cybersecurity incident involving DeepSeek AI reportedly exposed over a million log entries, including sensitive user interactions, authentication keys, and backend configurations. This misconfigured database highlights deficiencies in DeepSeek AI’s data protection measures, further amplifying concerns regarding user privacy and enterprise security.

Regulatory and Legal Implications

DeepSeek AI’s compliance posture has been questioned by legal analysts and regulatory bodies due to the following:

  • Ambiguities in Data Processing Practices: Insufficient disclosures regarding how user data is processed, stored, and shared.
  • Potential Violations of International Law: The model’s data retention policies may conflict with extraterritorial regulations, prompting legal scrutiny in global markets.
  • Risks to National Security: Some government agencies have raised concerns about deploying AI systems that operate under foreign jurisdiction, particularly for sensitive applications.

International compliance officers emphasize the necessity of conducting comprehensive legal risk assessments before adopting DeepSeek-R1 for mission-critical operations.

Conclusion

While DeepSeek-R1 delivers advancements in AI efficiency and accessibility, its deployment requires a comprehensive security strategy. Organizations must first gain full visibility into their AI assets to assess exposure and attack surfaces. Beyond discovery, securing AI environments demands structured risk and vulnerability assessments—not just for the infrastructure hosting these AI pipelines but also for emerging orchestration frameworks and inference engines that introduce new security challenges.

For those hosting this model, additional risks such as misconfigurations, API vulnerabilities, unauthorized access, and model extraction threats must be addressed alongside inherent risks like bias, adversarial manipulation, and safety misalignment. Without proactive safeguards, organizations face potential security breaches, data leakage, and compliance failures that could undermine trust and operational integrity.

Our analysis of the distilled DeepSeek-R1 LLaMA 8B variant using Qualys TotalAI offers valuable insights into evaluating this new technology. TotalAI provides a purpose-built AI security and risk management solution, ensuring LLMs remain secure, resilient, and aligned with evolving business and regulatory demands.

To explore how we define AI risks, check out our whitepaper on AI security. As AI adoption accelerates, so do its risks—sign up for a demo today to see how TotalAI can help secure your AI ecosystem before threats escalate.


Qualys TotalAI’s 30-day Trial


Share your Comments

Comments

Your email address will not be published. Required fields are marked *