Cloudforce One reveals AI reasoning security risks

Cloudforce One has released new research examining how adversarial deception techniques can bypass AI reasoning systems. The study analysed seven leading AI models, including both frontier and non-frontier systems, to assess how threat actors could exploit weaknesses in AI reasoning.

The research found that attackers are now using lures – blocks of text designed to emotionally manipulate or confuse AI models – to trick security auditors into white-listing malicious code. This research is a technical reality check. As organizations shift to lean heavily on autonomous systems, and LLMs, the perimeter is changing. The attack surface has expanded beyond the network, with a massive target now shifting to the model’s reasoning itself… so what happens if the models that run critical parts of your business are tampered with?

Some High-Level Takeaways:

The 1% bypass zone: Subtle deception is most effective. When safety lures—i.e., comments claiming the code is benign—make up less than 1% of a file, AI detection rates plummeted to 53%. In this case, the lures subtly nudge model reasoning without triggering the protesting too much suspicion.
The U-curve of deception: Moderate attempts to trick AI often work, but protesting too much (over 1,000 comments) triggers a repetition alarm that causes the AI to flag the code as fraudulent
The context trap: The greatest threat isn’t linguistic, it’s structural. By burying malicious payloads inside large library bundles (like React SDKs), attackers crashed detection rates to just 12%, effectively exhausting the AI’s focus.
Linguistic profiling: The study found that AI models have developed stereotypes. For example, some models flagged Russian or Chinese comments as high-risk signals regardless of the code’s actual function, while being more trusting of languages like Estonian.
AI reasoning as an attack surface: Threat actors are increasingly focusing on manipulating model cognition rather than breaking traditional security controls.
Structural obfuscation is highly effective: Embedding malicious code within large, legitimate-looking software packages significantly reduces detection rates.
Bias in model interpretation: Language-based heuristics may introduce unintended bias, affecting security decisions in inconsistent ways.
Scaling complexity increases vulnerability: Larger and more complex code contexts reduce the ability of models to accurately identify malicious intent.

Industry Implications

As enterprises accelerate adoption of AI-driven security, automation, and development pipelines, the findings suggest a pressing need to reassess how trust is established in AI-generated decisions.

Cloudforce One emphasizes that organizations must move beyond traditional prompt safety approaches and adopt more robust model evaluation, adversarial testing, and context-aware security frameworks.

Cloudforce One is the company’s dedicated threat intelligence and research team focused on tracking advanced cyber threats, emerging attacker techniques, and security risks impacting global digital infrastructure