Creating Noise: The Emerging Obfuscation Technique Designed to Evade Email Security NLP Detection Capabilities

James Dyer | Mar 16, 2026

Our Threat Intelligence team has observed an emerging obfuscation technique designed to make Natural Language Processing (NLP) detection capabilities less effective. Broadly, malicious actors are appending extra characters, break lines, and legitimate links to the end of a phishing email in an attempt to hide their malicious payloads in the noise and evade NLP detection.

For this threat alert, our team analyzed 40 emerging attacks identified by KnowBe4 Defend that used this technique to understand how it works, why attackers are employing it, and its potential for success.

Of the analyzed attacks, the most common legitimate part of an email appended to the attack was the Bank of America email signature, while ‘Uber.com’ and ‘Bofa.com’ were the most frequently used legitimate links.

Quick Attack Summary

  • Vector and type: Email phishing
  • Technique: Natural Language Processing (NLP) obfuscation
  • Targets: Organizations in North America
  • Platform: Microsoft 365, layered with Integrated Cloud Email Security (ICES) solutions

In an effort to obfuscate malicious payloads such as links and attachments, threat actors are appending an additional email body to their malicious phishing email. The appended emails are usually harmless and often include benign language that will not trigger NLP detection and legitimate links that are not present on any block list.

The phishing email can be divided into two key components: the malicious content immediately visible to the recipient at the top of the email and the obfuscation element appended at the bottom. This obfuscation element is usually where the benign links and language are present. These two parts are often separated by numerous HTML break lines (empty whitespace) that aim to deter the recipient from scrolling all the way down to notice the obfuscation element. Our Threat Intelligence team has identified only eight instances where fewer than 100 break lines were used, with an average of 157 break lines observed. In addition, cybercriminals are significantly increasing the character count in this part of the email to give NLP more data to process.
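The two-part structure described above can be illustrated with a minimal sketch that splits an HTML body at its longest run of consecutive break tags. The function name `split_on_longest_break_run` and the `min_breaks` threshold of 50 are assumptions made for illustration; they are not taken from KnowBe4 Defend or any other product.

```python
import re

# One <br>, <br/> or <br /> tag plus any trailing whitespace.
BR_TAG = r"<br\s*/?>\s*"
BR_RUN = re.compile(rf"(?:{BR_TAG})+", re.IGNORECASE)
BR_ONE = re.compile(BR_TAG, re.IGNORECASE)


def split_on_longest_break_run(html: str, min_breaks: int = 50):
    """Return (visible_part, appended_part, break_count).

    The body is split at the longest run of consecutive <br> tags.
    If that run is shorter than `min_breaks` (a hypothetical threshold),
    the whole body is returned as the visible part.
    """
    longest = None
    longest_count = 0
    for match in BR_RUN.finditer(html):
        count = len(BR_ONE.findall(match.group(0)))
        if count > longest_count:
            longest, longest_count = match, count
    if longest is None or longest_count < min_breaks:
        return html, "", longest_count
    return html[:longest.start()], html[longest.end():], longest_count


# Illustrative body: a lure, 157 break lines, then appended benign content.
body = "<p>Listen to your voicemail</p>" + "<br>" * 157 + "<p>Rent a vehicle</p>"
visible, appended, breaks = split_on_longest_break_run(body)
```

Separating the two parts this way would let each be inspected on its own, rather than letting the appended content blend into the top of the message.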

Average percentage split of the benign elements present in an email using this technique:

  • Randomized text: 5.93%
  • Graymail: 62.6%
  • Legitimate email chain: 31.47%

The most common legitimate obfuscation element appended to one of these attacks was the Bank of America email signature.

When attackers included links in the attack, they used an average of 4.68 legitimate links per attack, compared to just 1.87 malicious links. Our Threat Intelligence team identified ‘Uber.com’ and ‘Bofa.com’ as the most frequently used legitimate links.
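As a rough worked example, the two averages above can be turned into a share of legitimate links. This is a ratio of averages, not an average of per-email ratios, so it should be read as an approximation only. The helper name `legit_link_share` is hypothetical.

```python
def legit_link_share(legit: float, malicious: float) -> float:
    """Fraction of all links that are legitimate."""
    return legit / (legit + malicious)


# Averages reported in this analysis.
avg_legit = 4.68
avg_malicious = 1.87

share = legit_link_share(avg_legit, avg_malicious)
ratio = avg_legit / avg_malicious

print(f"Legitimate share of links: {share:.1%}")
print(f"Legitimate links per malicious link: {ratio:.2f}")
```

On these figures, roughly seven in ten links in an average attack are legitimate, or about 2.5 legitimate links for every malicious one.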

What The Attack Looks Like

Example 1

The Malicious Part

Screenshot of a phishing email impersonating a voicemail service with KnowBe4 Defend anti-phishing banners applied.

In the above screenshot, the malicious part of the attack is present. In this case, the email impersonates a voicemail service, prompting the recipient to click on a malicious HTML attachment to listen to the supposed message. Sent as part of a wider phishing campaign, the attack includes a polymorphic element, with the subject line and attachment name being randomized. This tactic hinders security teams from performing mass manual remediation of emails with the same subject line or conducting a direct search for a specific attachment, as each has been personalized to the recipient.
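To illustrate why randomized subject lines defeat a direct search, the sketch below groups messages by a fingerprint of the body text instead of the subject. It assumes the body stays largely constant across the campaign, which may not hold for every attack. The message dictionaries and the function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def body_fingerprint(body: str) -> str:
    """Hash the body after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_by_body(messages):
    """Map each body fingerprint to the subjects of the messages sharing it."""
    groups = defaultdict(list)
    for msg in messages:
        groups[body_fingerprint(msg["body"])].append(msg["subject"])
    return dict(groups)


# Hypothetical messages: different subjects, effectively the same body.
messages = [
    {"subject": "VM-83921 for alice", "body": "You have a new voicemail.\nOpen the attachment."},
    {"subject": "VM-10477 for bob", "body": "You have a new  voicemail. Open the attachment."},
    {"subject": "Lunch on Friday?", "body": "Are you free at noon?"},
]
groups = group_by_body(messages)
```

A subject search would treat the first two messages as unrelated, while the body fingerprint places them in the same group.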

The Obfuscation Part

Screenshot showing the lower section of the same phishing email, where the attacker has inserted random characters to mask a malicious attachment from NLP detection.

If a recipient scrolls down, they will eventually encounter the second part: the obfuscation text. In the example shown in the screenshot above, the attacker has included random characters in the form of an email chain to increase the character count of the email, which achieves two things:

  1. Overall, there are more benign elements for NLP detection to pick up on.
  2. The length of the email has increased. Some email security tools release an email before the scan is complete if it takes too long to scan, so a phishing email can get through without being classified as malicious.

Example 2

The Malicious Part

Screenshot of a phishing email impersonating Adobe with KnowBe4 Defend anti-phishing banners applied.

In this second example, the attack impersonates Adobe, a well-known organization whose software has a high adoption rate among professionals. The email is sent from a compromised account, posing as the recipient's HR team and urging them to click a malicious link to learn more about employee benefits.

The Obfuscation Part

Screenshot showing the lower section of the same phishing email, where the attacker has inserted an Uber advertisement with legitimate links to mask a malicious link from NLP detection.

Upon scrolling, the recipient would come across what appears to be an Uber advertisement, where the attacker is once again impersonating a well-known brand. It contains a number of legitimate links, such as the ‘rent a vehicle’ hyperlink and the various pages linked in the sign-off.

KnowBe4 Analysis

A Balance of Probabilities

The primary objective of this obfuscation technique is to bypass NLP detection. But how do extra characters, benign language, and legitimate links facilitate this? Some NLP solutions operate on a probability scale; if enough benign elements are present to outweigh a single suspicious link or attachment, tools may not classify it as a phishing email with high confidence, and others may not flag it at all.

The attackers' hope is that, by stacking enough benign elements at the bottom of an email, an NLP tool will conclude that the email is more likely safe than malicious and deliver it to the recipient’s inbox.
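The dilution effect can be shown with a toy model. This is a deliberately simplified sketch, not a description of how any particular NLP product scores email: each element of a message gets a hypothetical suspicion score between 0 and 1, and the verdict depends on how those scores are combined. All scores and the 0.3 threshold are invented for illustration.

```python
from statistics import mean

THRESHOLD = 0.3  # hypothetical cut-off for flagging an email


def flagged_by_average(scores) -> bool:
    """Flag when the average suspicion across all elements is high."""
    return mean(scores) > THRESHOLD


def flagged_by_worst(scores) -> bool:
    """Flag when any single element is highly suspicious."""
    return max(scores) > THRESHOLD


# Hypothetical scores: one malicious link plus two pieces of lure text.
original = [0.9, 0.2, 0.1]

# The same email with 20 benign elements appended at the bottom.
padded = original + [0.05] * 20

for name, scores in (("original", original), ("padded", padded)):
    print(name, flagged_by_average(scores), flagged_by_worst(scores))
```

In this toy model the averaging scorer flags the original email but not the padded one, while the scorer that looks at the single worst element flags both. This is the balance the obfuscation element is trying to tip.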

Can Attackers Outsmart an ICES?

Our Threat Intelligence team suspects that these types of attacks are designed to bypass advanced tools like integrated cloud email security (ICES) anti-phishing solutions, as secure email gateway (SEG) systems do not typically utilize NLP functionality. This suggests that attackers are aware of the shift toward cloud-based email security and are tailoring their tactics based on the technology stack used by their targets.

Identifying Advanced Phishing Threats

While these attacks seem to have been created to bypass ICES solutions, cybersecurity leaders shouldn’t start reverting to the SEG just yet. As can be seen in both examples above, the attacks have been flagged as high-confidence phishing emails by KnowBe4 Defend. For the KnowBe4 solution specifically, one of the main reasons is that we utilize a zero-trust approach to detect and neutralize emerging threats.

Ultimately, identifying and preventing this type of threat requires a sophisticated tool capable of detecting and neutralizing the various techniques employed in each attack, including polymorphic subject lines, impersonation, and account compromise. KnowBe4 Defend takes a holistic approach to detection, combining AI with zero-trust principles to detect and neutralize emerging threats like impersonation and zero-day attacks.

Obfuscation FAQs

What is NLP obfuscation in phishing emails?

NLP (Natural Language Processing) obfuscation is a technique where attackers "bury" a malicious link or attachment under a mountain of benign text, legitimate links, and random characters. The goal is to confuse AI detection tools into thinking the email is statistically more "safe" than "malicious" based on the sheer volume of legitimate-looking content.

Why do attackers use hundreds of empty break lines in these emails?

Attackers use an average of 157 HTML break lines to create a massive amount of whitespace. This serves two purposes: it hides the "junk" text from the human recipient so they don't get suspicious, and it increases the file size and character count to potentially trigger "timeout" or "bypass" rules in certain email security scanners.

How do legitimate links like 'Uber.com' help a phishing attack?

Security tools often check the reputation of links within an email. By including multiple high-reputation, legitimate links (like Bank of America or Uber) alongside a single malicious one, the attacker shifts the "probability scale" of the email. Some NLP tools may decide the email is safe because, for example, 80% of the links are verified and legitimate.

Why are these attacks specifically targeting ICES solutions instead of SEGs?

Integrated Cloud Email Security (ICES) solutions typically use advanced AI and NLP to scan message context, whereas older Secure Email Gateways (SEGs) often rely on simpler blocklists. Attackers are evolving their tactics to specifically "poison" the data that AI-based ICES tools rely on to make decisions.

How can organizations detect polymorphic phishing attacks?

Since polymorphic attacks change their subject lines and filenames for every recipient, manual remediation is nearly impossible. Effective defense requires a zero-trust, AI-driven solution that analyzes the intent and behavior of the email (such as account compromise or impersonation) rather than just looking for known malicious signatures.



